
MULTIAGENT SYSTEMS

Algorithmic, Game Theoretic, and Logical Foundations

Yoav Shoham
Stanford University
http://cs.stanford.edu/~shoham
[email protected]

Kevin Leyton-Brown
University of British Columbia
http://cs.ubc.ca/~kevinlb
[email protected]

© all rights reserved

Draft of August 27, 2007

For use with the authors' permission only. Please do not distribute.


Contents

Credits and Acknowledgments ix

Introduction xi

1 Distributed Problem Solving 1
1.1 Distributed constraint satisfaction 2
1.1.1 Domain-pruning algorithms 4
1.1.2 Trial-and-backtrack algorithms 8
1.2 Distributed dynamic programming for path planning 11
1.2.1 Asynchronous dynamic programming 11
1.2.2 Learning real-time A∗ 13
1.3 Negotiation, auctions and optimization 15
1.3.1 Introduction: from contract nets to auction-like optimization 15
1.3.2 The assignment problem and linear programming 19
1.3.3 The scheduling problem and integer programming 25
1.4 Action selection in multiagent MDPs 32
1.5 History and references 35

2 Introduction to Non-Cooperative Game Theory: Games in Normal Form 37
2.1 Self-interested agents 37
2.1.1 Example: friends and enemies 38
2.1.2 Preferences and utility 39
2.2 Games in normal form 43
2.2.1 Example: The TCP user's game 44
2.2.2 Definition of games in normal form 45
2.2.3 More examples of normal-form games 46
2.2.4 Strategies in normal-form games 49
2.3 Solution concepts for normal-form games 50
2.3.1 Pareto optimality 50
2.3.2 Best response and Nash equilibrium 51
2.3.3 Maxmin and minmax strategies 55
2.3.4 Minimax regret 58
2.3.5 Removal of dominated strategies 60
2.3.6 Rationalizability 63
2.3.7 Correlated equilibrium 64
2.3.8 Trembling-hand perfect equilibrium 66
2.3.9 ε-Nash equilibrium 66
2.4 History and references 67

3 Computing Solution Concepts of Normal-Form Games 69
3.1 Computing Nash equilibria of two-player zero-sum games 69
3.2 Computing Nash equilibria of two-player general-sum games 71
3.2.1 Complexity of computing Nash equilibria 71
3.2.2 An LCP formulation and the Lemke-Howson algorithm 73
3.2.3 Searching the space of supports 79
3.3 Computing Nash equilibria of n-player general-sum games 81
3.4 Computing all Nash equilibria of general-sum games 84
3.5 Computing specific Nash equilibria of general-sum games 85
3.6 Computing maxmin and minmax strategies for two-player general-sum games 86
3.7 Removing dominated strategies 86
3.7.1 Domination by a pure strategy 87
3.7.2 Domination by a mixed strategy 88
3.7.3 Iterated dominance 89
3.8 Computing correlated equilibria of n-player normal-form games 91
3.9 History and references 92

4 Playing Games with Time: Reasoning and Computing with the Extensive Form 95
4.1 Perfect-information extensive-form games 95
4.1.1 Definition 96
4.1.2 Strategies and equilibria 96
4.1.3 Subgame-perfect equilibrium 99
4.1.4 Computing equilibria: backward induction 101
4.1.5 An example, and criticisms of backward induction 105
4.2 Imperfect-information extensive-form games 106
4.2.1 Definition 107
4.2.2 Strategies and equilibria 108
4.2.3 Computing equilibria: the sequence form 110
4.2.4 Sequential equilibrium 118
4.3 History and references 121

5 Richer Representations: Beyond the Normal and Extensive Forms 123
5.1 Repeated games 124
5.1.1 Finitely repeated games 125
5.1.2 Infinitely repeated games 125
5.1.3 "Bounded rationality": repeated games played by automata 128
5.2 Stochastic games 134
5.2.1 Definition 134
5.2.2 Strategies and equilibria 135
5.2.3 Computing equilibria 136
5.3 Bayesian games 137
5.3.1 Definition 139
5.3.2 Strategies and equilibria 142
5.3.3 Computing equilibria 144
5.3.4 Ex-post equilibria 147
5.4 Congestion games 147
5.4.1 Definition 147
5.4.2 Computing equilibria 149
5.4.3 Potential games 150
5.4.4 Nonatomic congestion games 152
5.4.5 Selfish routing and the price of anarchy 153
5.5 Computationally-motivated compact representations 158
5.5.1 The expected utility problem 158
5.5.2 Graphical games 161
5.5.3 Action-graph games 162
5.5.4 Multiagent influence diagrams 165
5.5.5 GALA 167
5.6 History and references 168

6 Learning and Teaching 171
6.1 Why the subject of 'learning' is complex 171
6.1.1 The interaction between learning and teaching 171
6.1.2 What constitutes learning? 173
6.1.3 If learning is the answer, what is the question? 174
6.2 Fictitious play 177
6.3 Rational learning 182
6.4 Reinforcement learning 186
6.5 Learning in unknown MDPs 186
6.5.1 Reinforcement learning in zero-sum stochastic games 187
6.5.2 Beyond zero-sum stochastic games 191
6.5.3 Belief-based reinforcement learning 191
6.6 No-regret learning and universal consistency 191
6.7 Targeted learning 193
6.8 Evolutionary learning and other large-population models 194
6.8.1 The replicator dynamic 195
6.8.2 Evolutionarily stable strategies 198
6.8.3 Agent-based simulation and emergent conventions 201
6.9 History and references 203

7 Communication 205
7.1 "Doing by talking" I: cheap talk 205
7.2 "Talking by doing": signaling games 209
7.3 "Doing by talking" II: speech-act theory 211
7.3.1 Speech acts 211
7.3.2 Rules of conversation 212
7.3.3 A game theoretic view of speech acts 214
7.3.4 Applications 217
7.4 History and references 220

8 Aggregating Preferences: Social Choice 223
8.1 Introduction 223
8.1.1 Example: plurality voting 223
8.2 A formal model 224
8.2.1 Social choice functions 225
8.2.2 Social welfare functions 226
8.3 Voting 226
8.3.1 Voting methods 226
8.3.2 Voting paradoxes 228
8.4 Existence of social functions 229
8.4.1 Social welfare functions 230
8.4.2 Social choice functions 232
8.5 Ranking systems 236
8.6 History and references 239

9 Protocols for Strategic Agents: Mechanism Design 241
9.1 Introduction 241
9.1.1 Example: strategic voting 241
9.1.2 Example: selfish routing 242
9.2 Mechanism design with unrestricted preferences 243
9.2.1 Implementation 243
9.2.2 The revelation principle 245
9.2.3 Impossibility of general, dominant-strategy implementation 247
9.3 Quasilinear preferences 248
9.3.1 Risk attitudes 249
9.3.2 Mechanism design in the quasilinear setting 251
9.4 Groves mechanisms 254
9.4.1 The VCG mechanism 258
9.4.2 Drawbacks of VCG 260
9.4.3 VCG and individual rationality 262
9.4.4 Weak budget balance 263
9.5 Budget balance and efficiency 264
9.5.1 Impossibility results 264
9.5.2 The AGV mechanism 265
9.6 Beyond efficiency 266
9.6.1 What else can be implemented in dominant strategies? 266
9.6.2 Tractable Groves mechanisms 268
9.7 Applications of mechanism design 270
9.8 History and references 270

10 Protocols for Multiagent Resource Allocation: Auctions 273
10.1 Single-good auctions 273
10.1.1 Canonical auction families 274
10.1.2 Auctions as Bayesian mechanisms 276
10.1.3 Second-price, Japanese and English auctions 277
10.1.4 First-price and Dutch auctions 279
10.1.5 Revenue equivalence 281
10.1.6 Risk attitudes 284
10.1.7 Auction variations 285
10.1.8 "Optimal" (revenue-maximizing) auctions 286
10.1.9 Collusion 288
10.1.10 Interdependent values 291
10.2 Multiunit auctions 293
10.2.1 Canonical auction families 294
10.2.2 Single-unit demand 295
10.2.3 Beyond single-unit demand 297
10.2.4 Unlimited supply 299
10.2.5 Position auctions 301
10.3 Combinatorial auctions 304
10.3.1 Simple combinatorial auction mechanisms 305
10.3.2 The winner determination problem 306
10.3.3 Expressing a bid: bidding languages 310
10.3.4 Iterative mechanisms 314
10.3.5 A tractable mechanism 316
10.4 Exchanges 318
10.4.1 Two-sided auctions 318
10.4.2 Prediction markets 319
10.5 History and references 320

11 Teams of Selfish Agents: An Introduction to Coalitional Game Theory 323
11.1 Definition of coalitional games 323
11.2 Classes of coalitional games 325
11.3 Analyzing coalitional games 326
11.3.1 The core 327
11.3.2 The Shapley value 329
11.3.3 Other solution concepts 331
11.4 Computing coalitional game theory solution concepts 331
11.5 Compact representations of coalitional games 332
11.5.1 Weighted graph games 333
11.5.2 Weighted majority games 335
11.5.3 Capturing synergies: a representation for super-additive games 335
11.5.4 A decomposition approach: multi-issue representation 336
11.5.5 A logical approach: marginal contribution nets 337
11.6 History and references 338

12 Logics of Knowledge and Belief 341
12.1 The partition model of knowledge 341
12.1.1 Muddy children and warring generals 341
12.1.2 Formalizing intuitions about the partition model 342
12.2 A detour to modal logic 345
12.2.1 Syntax 346
12.2.2 Semantics 346
12.2.3 Axiomatics 347
12.2.4 Modal logics with multiple modal operators 347
12.2.5 Remarks about first-order modal logic 348
12.3 S5: An axiomatic theory of the partition model 348
12.4 Common knowledge, and an application to distributed systems 351
12.5 Doing time, and an application to robotics 354
12.5.1 Termination conditions for motion planning 354
12.5.2 Coordinating robots 358
12.6 From knowledge to belief 360
12.7 Combining knowledge and belief (and revisiting knowledge) 361
12.8 History and references 366

13 Beyond Belief: Probability, Dynamics and Intention 369
13.1 Knowledge and probability 369
13.2 Dynamics of knowledge and belief 373
13.2.1 Belief revision 374
13.2.2 Beyond AGM: update, arbitration, fusion, and friends 379
13.2.3 Theories of belief change: a summary 383
13.3 Logic, games, and coalition logic 384
13.4 Towards a logic of 'intention' 386
13.4.1 Some pre-formal intuitions 386
13.4.2 The road to hell: elements of a formal theory of intention 388
13.4.3 Group intentions 392
13.5 History and references 393

Appendices: Technical Background 395

A Probability Theory 397
A.1 Probabilistic models 397
A.2 Axioms of probability theory 397
A.3 Marginal probabilities 398
A.4 Conditional probabilities 398

B Linear and Integer Programming 399
B.1 Linear programs 399
B.2 Integer programs 401

C Markov Decision Problems (MDPs) 403
C.1 The model 403
C.2 Solving known MDPs via value iteration 403

D Classical Logic 405
D.1 Propositional calculus 405
D.2 First-order logic 406

Bibliography 409

Index 425


Credits and Acknowledgments

We should start off by explaining the order of authorship. Yoav conceived of the project, and started it, in late 2001, working on it alone and with several colleagues (see below). Sometime in 2004 Yoav realized he needed help if the project were ever to come to conclusion, and he enlisted the help of Kevin. The result was a true partnership and a complete overhaul of the material. The current book is vastly different from the draft that existed when the partnership was formed—in depth, breadth, and form. Yoav and Kevin have made equal contributions to the book; the order of authorship reflects the history of the book, but nothing else.

In six years of book-writing we accumulated many debts. The following is our best effort to acknowledge those. If we omit any names it is due solely to our poor memories and record keeping, and we apologize in advance.

When the book started out, Teg Grenager served as a prolific ghost writer. While little of the original writing remains (though some does, e.g., in the section on speech acts), the project would not have gotten off the ground without him.

Several past and present graduate students made substantial contributions. The material on coalitional games was written by Sam Ieong, with minimal editing on our part. Some of the material on computing solution concepts is based on writing by Ryan Porter, who also wrote much of the material on bounded rationality. The material on multiagent learning is based in part on joint work with Rob Powers, who also contributed text. The material on graphical games is based on writing by Albert Jiang, who also contributed to the proof of the minmax theorem.

Many colleagues across the world generously gave us comments on drafts, or provided counsel otherwise. Felix Brandt and Vince Conitzer deserve special mention for their particularly detailed and insightful comments. Other colleagues to whom we are indebted include Alon Altman, Krzysztof Apt, Ronen Brafman, Yiling Chen, Nando de Freitas, Joe Halpern, Jason Hartline, Jean-Jacques Herings, Bobby Kleinberg, Daphne Koller, David Parkes, Baharak Rastegari, Tuomas Sandholm, Peter Stone, Moshe Tennenholtz, Nikos Vlasis, Mike Wellman, and Mike Wooldridge.

Several of our colleagues generously contributed material to the book, in addition to lending their insight. They include Geoff Gordon (Matlab code to generate the saddle point figure for zero-sum games), Carlos Guestrin (on action selection in distributed MDPs, and a sensor-nets figure), Michael Littman (on subgame perfection), Christos Papadimitriou (on the Lemke-Howson algorithm, taken from his chapter in [Nisan et al., 2007]), Marc Pauly (on coalition logic), and Christian Shelton (on computing Nash equilibria for n-person games).


Several people provided critical editorial and production assistance of various kinds. Chris Manning was kind enough to let us use the LaTeX macros from his own book, and Ben Galin added a few miracles of his own. (Ben also composed several of the examples, found some bugs, drew many figures, and more generally for two years served as an intelligent jack of all trades on this project.) David Thompson overhauled our figures, code formatting and index, and also gave some feedback on drafts. Erik Zawadzki helped with the bibliography and with some figures. Maia Shoham helped with some historical notes and bibliography entries, as well as with some copy-editing.

We thank all these friends and colleagues. Their input has contributed to a better book, but of course they are not to be held accountable for any remaining shortcomings. We claim sole credit for those.

We also wish to thank [[[publisher]]] for agreeing to publish the book, for their patience and conscientious production process, and for their enlightened online-publishing policy which has enabled us to provide the broadest possible access to the book.

Last, and certainly not the least, we thank our families [[[...]]]


Introduction

Imagine a personal software agent engaging in electronic commerce on your behalf. Say the task of this agent is to track goods available for sale in various online venues over time, and to purchase some of them on your behalf for an attractive price. In order to be successful, your agent will need to embody your preferences for products, your budget, and in general your knowledge about the environment in which it will operate. Moreover, the agent will need to embody your knowledge of other similar agents with which it will interact (for example, agents who might compete with it in an auction, or agents representing store owners)—including their own preferences and knowledge. A collection of such agents forms a multiagent system. The goal of this book is to bring under one roof a variety of ideas and techniques that provide foundations for modeling, reasoning about, and building multiagent systems.

Somewhat strangely for a book that purports to be rigorous, we will not give a precise definition of a multiagent system. The reason is that many competing, mutually inconsistent answers have been offered in the past. Indeed, even the seemingly simpler question—What is a (single) agent?—has resisted a definitive answer. For our purposes, the following loose definition will suffice: Multiagent systems are those systems that include multiple autonomous entities with either diverging information or diverging interests, or both.

Scope of the book

The motivation for studying multiagent systems often stems from the interest in artificial (software or hardware) agents, for example software agents living on the Internet. Indeed, the Internet can be viewed as the ultimate platform for interaction among self-interested, distributed computational entities. Such agents can be trading agents of the sort discussed above, "interface agents" that facilitate the interaction between the user and various computational resources (including other interface agents), game-playing agents that assist (or replace) a human player in a multi-player game, or autonomous robots in a multi-robot setting. However, while the material is written by computer scientists with computational sensibilities, it is quite interdisciplinary and the material is in general fairly abstract. Many of the ideas apply to—and indeed are often taken from—inquiries about human individuals and institutions.

The material spans disciplines as diverse as computer science (including artificial intelligence, distributed systems, and algorithms), economics, operations research, analytic philosophy and linguistics. The technical material includes logic, probability theory, game theory, and optimization. Each of the topics covered easily supports multiple independent books and courses, and this book does not aim to replace them. Rather, the goal has been to gather the most important elements from each discipline, and weave them together into a balanced and accurate introduction to this broad field. The intended reader is an advanced undergraduate (prototypically, but not necessarily, in computer science), or a graduate student making a first foray into the area.

Since the umbrella of multiagent systems is so broad, the questions of what to include in any book on the topic and how to organize the selected material are crucial. To begin with, this book concentrates on foundational topics rather than surface applications. Although we will occasionally make reference to real-world applications, we will do so primarily to clarify the concepts involved; this is despite the practical motivations professed above. And so this is the wrong text for the reader interested in a practical guide into building this or that sort of software. The emphasis is rather on important concepts and the essential mathematics behind them. The intention is to delve in enough detail into each topic to be able to tackle some technical material, and then to point the reader in the right directions for further education on particular topics.

The decision was thus to include predominantly established, rigorous material that is likely to withstand the test of time. This still leaves vast material from which to choose. In understanding the selection made here, it is useful to keep in mind the keywords in the book's (rather long) subtitle: coordination, competition, algorithms, game theory, and logic. These terms will help frame the overview of the chapters, which comes next.

Overview of the chapters

Starting with issues of coordination, we begin in Chapter 1 with distributed problem solving. In these multi-agent settings there is no question of agents' individual preferences; there is some global problem to be solved, but for one reason or another it is either necessary or advantageous to distribute the task among multiple agents, whose actions may require coordination. The chapter thus is strongly algorithmic. Specifically, we look at four algorithmic methods: distributed constraint satisfaction, distributed dynamic programming, auction-like optimization procedures of linear- and integer-programming, and action selection in distributed MDPs.

We then begin to embrace issues of competition as well as coordination. While the area of multi-agent systems is not synonymous with game theory, there is no question that game theory is a key tool to master within the field, and so we devote several chapters to it. Chapters 2, 4 and 5 constitute a crash course in non-cooperative game theory (they cover, respectively, the normal form, the extensive form, and a host of other game representations). In these chapters, as in others which draw on game theory, we culled the material that in our judgment is needed in order to be a knowledgeable consumer of modern-day game theory. Unlike traditional game theory texts, we include also discussion of algorithmic considerations. In the context of the normal-form representation that material is sufficiently substantial to warrant its own chapter, Chapter 3.

We then switch to two specialized topics in multi-agent systems. In Chapter 6 we cover multi-agent learning. The topic is interesting for several reasons. First, it is a key facet of multi-agent systems. Second, the very problems addressed in the area are diverse and sometimes ill understood. Finally, the techniques used, which draw equally on computer science and game theory (as well as some other disciplines), are not straightforward extensions of learning in the single-agent case.

In Chapter 7 we cover another element unique to multi-agent systems, communication. We cover communication in a game theoretic setting, as well as in cooperative settings traditionally considered by linguists and philosophers (except that we see that there too a game theoretic perspective can creep in).

Next is a three-chapter sequence that might be called "protocols for groups." Chapter 8 covers social-choice theory, including voting methods. This is a non-strategic theory, in that it assumes that the preferences of agents are known, and the only question is how to aggregate them properly. Chapter 9 covers mechanism design, which looks at how such preferences can be aggregated by a central designer even without the cooperation of the agents involved. Finally, Chapter 10 looks at the special case of auctions.

Chapter 11 covers coalitional game theory, in recent times somewhat neglected within game theory and certainly under-appreciated in computer science.

The material in Chapters 1–11 is mostly Bayesian and/or algorithmic in nature. And thus the tools used in them include probability theory, utility theory, algorithms, Markov Decision Problems (MDPs), and linear/integer programming. We conclude with two chapters on logical theories in multi-agent systems. In Chapter 12 we cover modal logic of knowledge and belief. This material hails from philosophy and computer science, but it turns out to dovetail very nicely with the discussion of Bayesian games in Chapter 5. Finally, in Chapter 13 we extend the discussion in several directions—we discuss how beliefs change over time, logical models of games, and how one might begin to use logic to model motivational attitudes (such as 'intention') in addition to the informational ones (knowledge, belief).

Required background

The book is rigorous, and requires mathematical thinking, but only basic specific background knowledge. In much of the book we assume knowledge of basic computer science (algorithms, complexity) and basic probability theory. In more specific parts we assume familiarity with Markov Decision Problems (MDPs), mathematical programming (specifically, linear and integer programming), and classical logic. All of these (except basic computer science) are covered briefly in appendices, but those are meant as refreshers and to establish notation, not as a substitute for background in those subjects (this is true in particular of probability theory). However, above all, a prerequisite is a capacity for clear thinking.

How to teach (and learn) from this book

There is partial dependence among the thirteen chapters. To understand them, it is useful to think of the book as consisting of the following "blocks":

Block 1, Chapter 1: Distributed problem solving


Block 2, Chapters 2–5: Non-cooperative game theory

Block 3, Chapter 6: Learning

Block 4, Chapter 7: Communication

Block 5, Chapters 8–10: Protocols for groups

Block 6, Chapter 11: Coalitional game theory

Block 7, Chapters 12–13: Logical theories

Within every block there is a sequential dependence. Among the blocks, however, there is only one strong dependence: Blocks 3, 4 and 5 each depend on some elements of non-cooperative game theory, and thus on block 2 (though none require the entire block). Otherwise there are some interesting local pairwise connections between blocks, but none that require that both blocks be covered, whether sequentially or in parallel.

Given this weak dependence among the chapters, there are many ways to craft a course out of the material, depending on the background of the students, their interests, and the time available. Here are two examples:

• A format Yoav has used in a course consisting of up to twenty 75-minute lectures is as follows: 3 lectures on Chapter 1, 4 on Chapters 2–5, 3 on the material from Chapter 6, 3 on Chapters 8–10, 1 on Chapter 11, 3 on the material from Chapters 12–13, and 1 on Chapter 7. The experienced instructor will recognize that adding up these days easily brings you to a total of 20. This structure is appropriate for students interested in a broad, yet technical introduction to the area.

• Kevin teaches a course consisting of 23 lectures which goes into more detail but which concentrates on individual, strategic theories. Specifically, he structures his course as follows: 1 lecture introduction, 3 based on Chapter 2, 1 on Chapter 3, 2 on 4, 2 on 5, 2 on Chapter 8, 5 on Chapter 9, 5 on Chapter 10 and 2 on Chapter 1.

Many other formats are possible, and the book's website contains additional resources for instructors.

On pronouns and gender

We use the male form throughout the book. We debated this between us, not being happy with any of the alternatives. In the end the "standard" male convention won over the newer, reverse female convention. We hope the reader is not offended, and if she is, that she forgives us.


1 Distributed Problem Solving

In this chapter we discuss cooperative situations, in which the agents collaborate to achieve a common goal. This goal can be viewed as the shared goal of the agents, or alternatively as the goal of a central designer who is designing the various agents. Of course, if such a designer exists, a natural question is why it matters that there exist multiple agents; they can be viewed merely as end sensors and effectors for executing the plan devised by the designer. However, there exist situations in which a problem needs to be solved in a distributed fashion, either because a central controller is not feasible or because one wants to make good use of the distributed resources. A good example is provided by sensor networks. Such networks consist of multiple processing units, each with very local sensor capabilities, limited processing power, limited power supply, and limited communication bandwidth. Despite these limitations, these networks aim to provide some global service. Figure 1.1 shows an example of a fielded sensor network for monitoring environmental quantities like humidity, temperature and pressure in an office environment. Each sensor can monitor only its local area, and similarly can communicate only with other sensors in its local vicinity. The question is what algorithm the individual sensors should run so that the center can still piece together a reliable global picture.

Distributed algorithms have been studied widely in computer science. We will concentrate on distributed problem solving algorithms of the sort studied in Artificial Intelligence. Specifically, we will consider four families of techniques, and associated sample problems. They are, in order:

• Distributed constraint satisfaction algorithms.

• Distributed dynamic programming (as applied to path planning problems).

• Auction-like optimization algorithms (as applied to matching and scheduling problems).

• Distributed solutions to Markov Decision Problems (MDPs).

Later in the book we will encounter two additional examples of distributed problem solving. In Chapter 6 we will encounter a family of techniques that involve learning, some of them targeted at purely cooperative situations. We delay the discussion to that chapter since it requires some discussion of non-cooperative game theory (discussed in Chapter 2) as well as general discussion of multiagent learning (discussed earlier in Chapter 6). In addition, in Chapter 12, we will see an application of logics of knowledge to distributed control in the context of robotics. That material too must be postponed, since it relies on the preceding material in Chapter 12.

Figure 1.1 Part of a real sensor network used for indoor environmental monitoring.

1.1 Distributed constraint satisfaction

A constraint satisfaction problem (CSP) is defined by a set of variables, domains for each of the variables, and constraints on the values that the variables might take on simultaneously. The role of constraint satisfaction algorithms is to assign values to the variables in a way that is consistent with all the constraints, or to determine that no such assignment exists.

Constraint satisfaction techniques have been applied in diverse domains, including machine vision, natural language processing, theorem proving, planning and scheduling, and many more. Here is a simple example taken from the domain of sensor networks. Figure 1.2 depicts three radar-equipped units scattered in a certain terrain. Each of the radars has a certain radius which, in combination with the obstacles in the environment, gives rise to a particular coverage area for each unit. As seen in the figure, some of the coverage areas overlap. We consider a very specific problem in this setting.


Suppose that each unit can choose one of three possible radar frequencies. All of the frequencies work equally well, so long as no two radars with overlapping coverage areas use the same frequency. The question is which algorithm the units should employ to select this frequency, assuming that the frequencies cannot be assigned centrally.

Figure 1.2 A simple sensor net problem

As is the case for many other CSPs, the essence of this problem can be captured as a graph coloring problem. Figure 1.3 shows such a graph, corresponding to the sensor network CSP above. The nodes represent the individual units, the different frequencies are represented by 'colors', and two nodes are connected by an undirected edge iff the coverage areas of the corresponding radars overlap. The goal of graph coloring is to choose one color for each node so that no two adjacent nodes have the same color.

Figure 1.3 A graph coloring problem equivalent to the sensor net problem of Figure 1.2

Formally speaking, a CSP consists of a finite set of variables X = {X1, . . . , Xn}, a domain Di for each variable Xi, and a set of constraints {C1, . . . , Cm}. Although in general CSPs allow infinite domains, we will assume here that all the domains are finite. In the graph coloring example above there were three variables, and they all had the same domain, {red, green, blue}. Each constraint is a predicate on some subset of the variables, say {Xi1, . . . , Xij}; the predicate defines a relation which is a subset of the Cartesian product Di1 × · · · × Dij. Each such constraint restricts the values that may be simultaneously assigned to the variables participating in the constraint. In this chapter we will restrict the discussion to binary constraints, each of which constrains exactly two variables. For example, in the map coloring case, each "not equal" constraint applied to two nodes.

Given a subset S of the variables, an instantiation of S is an assignment of a unique domain value for each variable in S; it is legal if it does not violate any constraint that mentions only variables in S. A solution to a network is a legal instantiation of all variables. Typical tasks associated with constraint networks are to determine whether a solution exists, to find one or all solutions, to determine whether a legal instantiation of some of the variables can be extended to a solution, and so on. We will concentrate on the most common task, which is finding one solution to a CSP—or proving that none exist.
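To make these definitions concrete, the following minimal Python sketch (ours, purely illustrative) encodes the three-variable graph coloring CSP of Figure 1.3 as variables, domains, and binary not-equal constraints, together with a helper that checks whether a (possibly partial) instantiation is legal.

# A CSP as variables, domains, and binary constraints (here: not-equal per edge).
variables = ["x1", "x2", "x3"]
domains = {v: {"red", "green", "blue"} for v in variables}
constraints = {("x1", "x2"), ("x1", "x3"), ("x2", "x3")}

def legal(instantiation):
    """True if the instantiation violates no constraint among its assigned variables."""
    return all(instantiation[a] != instantiation[b]
               for a, b in constraints
               if a in instantiation and b in instantiation)

print(legal({"x1": "red", "x2": "green", "x3": "blue"}))  # True: a solution
print(legal({"x1": "red", "x2": "red"}))                  # False: violates x1 != x2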

In a distributed CSP, each variable is owned by a different agent. The goal is still to find a global variable assignment that meets the constraints, but each agent decides on the value of his own variable with relative autonomy. While it does not have a global view, each agent can communicate with its neighbors in the constraint graph. A distributed algorithm for solving a CSP has each agent engage in some protocol that combines local computation with communication with its neighbors. A good algorithm ensures that such a process terminates with a legal solution (or with a realization that no legal solution exists), and does so quickly.

We will discuss two types of algorithm. Algorithms of the first kind embody a least-commitment approach, and attempt to rule out impossible variable values without losing any possible solutions. Algorithms of the second kind embody a more adventurous spirit and select tentative variable values, backtracking when those choices prove unsuccessful. In both cases we assume that the communication between neighboring nodes is perfect, but nothing about its timing; messages can take more or less time without rhyme or reason. We do assume however that if node i sends multiple messages to node j, those messages arrive in the order in which they were sent.

1.1.1 Domain-pruning algorithms

Under domain-pruning algorithms nodes communicate with their neighbors in order to eliminate values from their domains. We will consider two such algorithms. In the first, the filtering algorithm, each node communicates its domain to its neighbors, eliminates from its domain the values that are not consistent with the values received from the neighbors, and the process repeats. Specifically, each node xi with domain Di repeatedly executes the procedure Revise(xi, xj) for each neighbor xj:

procedure Revise(xi, xj)
    forall vi ∈ Di do
        if there is no value vj ∈ Dj such that vi is consistent with vj then
            delete vi from Di

The process, known also under the general term arc consistency, terminates when no further elimination takes place, or when one of the domains becomes empty (in which case the problem has no solution). If the process terminates with one value in each domain, that set of values constitutes a solution. If it terminates with multiple values in each domain the result is inconclusive; the problem might or might not have a solution.
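The following small, centralized Python simulation of the filtering loop is ours, for illustration only; each call to revise plays the role of one node pruning its domain against one neighbor, and the not-equal test hard-codes the graph coloring constraints. In the distributed algorithm each node would run only its own revise steps, learning its neighbors' domains from messages.

def revise(xi, xj, domains):
    """Remove from xi's domain every value with no consistent value left in xj's domain."""
    removed = False
    for vi in list(domains[xi]):
        if not any(vi != vj for vj in domains[xj]):  # not-equal constraint on the edge
            domains[xi].discard(vi)
            removed = True
    return removed

def filtering(neighbors, domains):
    """Repeat Revise over all arcs until no domain changes (arc consistency)."""
    changed = True
    while changed:
        changed = False
        for xi, adjacent in neighbors.items():
            for xj in adjacent:
                changed |= revise(xi, xj, domains)
    return domains

# Instance (b) of Figure 1.4: x1 can only be red, its neighbors only red or blue.
neighbors = {"x1": ["x2", "x3"], "x2": ["x1", "x3"], "x3": ["x1", "x2"]}
domains = {"x1": {"red"}, "x2": {"red", "blue"}, "x3": {"red", "blue"}}
print(filtering(neighbors, domains))  # x2 and x3 end up empty: no solution exists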

Clearly the algorithm is guaranteed to terminate, and furthermore it is sound (in that if it announced a solution, or announced that no solution exists, it is correct), but it is not complete (that is, it may fail to pronounce a verdict). Consider, for example, the family of very simple graph coloring problems shown in Figure 1.4 (note that problem (d) is identical to the problem in Figure 1.3).

Figure 1.4 A family of graph coloring problems

In this family of CSPs the three variables (i.e., nodes) are fixed, as are the 'not-equal' constraints between them. What are not fixed are the domains of the variables. Consider the four instances of Figure 1.4:

(a) Initially, as the nodes communicate to one another, only x1's messages result in any change. Specifically, when either x2 or x3 receive x1's message they remove red from their domains, ending up with D2 = {blue} and D3 = {blue, green}. Then, when x2 communicates his new domain to x3, x3 further reduces his domain to {green}. At this point no further changes take place, and the algorithm terminates with a correct solution.

(b) The algorithm starts as before, but once x2 and x3 receive x1's message they each reduce their domain to {blue}. Now, when they update each other on their new domain, they each reduce their domain to {}, the empty set. At this point the algorithm terminates, and correctly announces that no solution exists.

(c) In this case the initial set of messages yields no reduction in any domain. The algorithm terminates, but all the nodes have multiple values remaining. And so the algorithm is not able to show that the problem is over-constrained and has no solution.

(d) Filtering can also fail when a solution exists. For similar reasons as in instance (c), the algorithm is unable to show that in this case the problem does have a solution.

In general, filtering is a very weak method, and at best is used as a pre-processing step for more sophisticated methods. The algorithm is based directly on the notion of unit resolution from propositional logic. Unit resolution is the following inference rule:

    A1
    ¬(A1 ∧ A2 ∧ . . . ∧ An)
    ─────────────────────────
    ¬(A2 ∧ . . . ∧ An)

To see how the filtering algorithm corresponds to unit resolution, we must first write the constraints as forbidden value combinations, called nogoods. For example, the constraint that x1 and x2 cannot both take the value "red" would give rise to the propositional sentence ¬(x1 = red ∧ x2 = red), which we write as the nogood {x1, x2}. In instance (b) from Figure 1.4 agent X2 updated his domain based on agent X1's announcement that x1 = red and the nogood {x1 = red, x2 = red}:

    x1 = red
    ¬(x1 = red ∧ x2 = red)
    ─────────────────────────
    ¬(x2 = red)

Unit resolution is a weak inference rule, and so it is not surprising that the filtering algorithm is weak as well. Hyper-resolution is a generalization of unit resolution, and has the following form:

    A1 ∨ A2 ∨ . . . ∨ Am
    ¬(A1 ∧ A1,1 ∧ A1,2 ∧ . . .)
    ¬(A2 ∧ A2,1 ∧ A2,2 ∧ . . .)
    ...
    ¬(Am ∧ Am,1 ∧ Am,2 ∧ . . .)
    ───────────────────────────────────────────────
    ¬(A1,1 ∧ . . . ∧ A2,1 ∧ . . . ∧ Am,1 ∧ . . .)


Hyper-resolution is both sound and complete for propositional logic, and indeed it gives rise to a complete distributed CSP algorithm. In this algorithm, each agent repeatedly generates new constraints for its neighbors, notifies them of these new constraints, and prunes its own domain based on new constraints passed to it by its neighbors. Specifically, it executes the following algorithm, where NGi is the set of all nogoods of which agent i is aware, and NG*j is a set of new nogoods communicated from agent j to agent i:

procedure ReviseHR(NGi, NG*j)
    repeat
        NGi ← NGi ∪ NG*j
        let NG*i denote the set of new nogoods that i can derive from NGi and its domain using hyper-resolution
        if NG*i is nonempty then
            NGi ← NGi ∪ NG*i
            send the nogoods NG*i to all neighbors of i
            if {} ∈ NG*i then
                stop
    until there is no change in your set of nogoods NGi

The algorithm is guaranteed to converge, in the sense that after sending and receiving a finite number of messages each agent will stop sending messages and generating nogoods. Furthermore, the algorithm is complete. The problem has a solution iff, upon completion, no agent has generated the empty nogood (obviously every superset of a nogood is also forbidden, and thus if a node ever generates an empty nogood then the problem has no solution).

Consider again instance (c) of the CSP problem in Figure 1.4. In contrast to the filtering algorithm, the hyper-resolution-based algorithm proceeds as follows. Initially, x1 maintains four nogoods – {x1 = red, x2 = red}, {x1 = red, x3 = red}, {x1 = blue, x2 = blue}, {x1 = blue, x3 = blue} – which are derived directly from the constraints involving x1. Furthermore, x1 must adopt one of the values in its domain, so x1 = red ∨ x1 = blue. Using hyper-resolution, x1 can reason:

    x1 = red ∨ x1 = blue
    ¬(x1 = red ∧ x2 = red)
    ¬(x1 = blue ∧ x3 = blue)
    ───────────────────────────
    ¬(x2 = red ∧ x3 = blue)

Thus x1 constructs the new nogood {x2 = red, x3 = blue}; in a similar way it can also construct the nogood {x2 = blue, x3 = red}. x1 then sends both nogoods to its neighbors x2 and x3. Using its domain, an existing nogood and one of these new nogoods, x2 can reason:

    x2 = red ∨ x2 = blue
    ¬(x2 = red ∧ x3 = blue)
    ¬(x2 = blue ∧ x3 = blue)
    ───────────────────────────
    ¬(x3 = blue)

Using the other new nogood from x1, x2 can also construct the nogood {x3 = red}. These two singleton nogoods are communicated to x3, and allow it to generate the empty nogood. This proves that the problem does not have a solution.
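The core derivation step, choosing one known nogood for each value in the agent's domain and resolving it against the disjunction of those values, can be sketched in Python as follows (our illustrative code, not from the book; nogoods are represented as frozensets of (variable, value) pairs).

from itertools import product

def hyper_resolve(var, domain, nogoods):
    """Derive new nogoods for var's neighbors: pick, for every value v in var's
    domain, one known nogood containing (var, v); the union of the remaining
    assignments in the picked nogoods is a new nogood (unions that mention
    the same variable twice are skipped)."""
    per_value = []
    for v in domain:
        covering = [ng for ng in nogoods if (var, v) in ng]
        if not covering:               # some value of var is not ruled out: nothing derivable
            return set()
        per_value.append(covering)
    derived = set()
    for combo in product(*per_value):
        rest = frozenset(a for ng in combo for a in ng if a[0] != var)
        if len({x for x, _ in rest}) == len(rest):   # each variable appears only once
            derived.add(rest)
    return derived

# x1's initial nogoods in instance (c): not-equal constraints with x2 and x3.
nogoods = [frozenset({("x1", "red"), ("x2", "red")}),
           frozenset({("x1", "red"), ("x3", "red")}),
           frozenset({("x1", "blue"), ("x2", "blue")}),
           frozenset({("x1", "blue"), ("x3", "blue")})]
for ng in hyper_resolve("x1", ["red", "blue"], nogoods):
    print(sorted(ng))   # the two nogoods {x2=red, x3=blue} and {x2=blue, x3=red}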

This example, while demonstrating the greater power of the hyper-resolution-based algorithm relative to the filtering algorithm, also exposes its weakness; the number of nogoods generated grows to be unmanageably large. (Indeed, we only described the minimal number of nogoods needed to derive the empty nogood; many others would be created as all the agents processed each others' messages in parallel. Can you find an example?) Thus, the situation in which we find ourselves is that we have one algorithm that is too weak, and another that is impractical. The problem lies in the least-commitment nature of these algorithms; they are restricted to removing only provably impossible value combinations. The alternative to such 'safe' procedures is to explore a subset of the space, making tentative value selections for variables, and backtracking when necessary. This is the topic of the next section. However, the algorithms covered in this section are not irrelevant; the filtering algorithm is an effective preprocessing step, and the algorithm we will discuss next is based on the hyper-resolution-based algorithm.

1.1.2 Trial-and-backtrack algorithms

A straightforward centralized trial-and-error solution to a CSP is first to order the variables (for example, alphabetically). Then, given the ordering x1, x2, . . . , xn, invoke the procedure ChooseValue(x1, {}). The procedure ChooseValue is defined recursively as follows, where {v1, v2, . . . , vi−1} is the set of values assigned to variables x1, . . . , xi−1.

procedure ChooseValue(xi, {v1, v2, . . . , vi−1})
    vi ← value from the domain of xi that is consistent with {v1, v2, . . . , vi−1}
    if no such value exists then
        backtrack¹
    else if i = n then
        stop
    else
        ChooseValue(xi+1, {v1, v2, . . . , vi})
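A runnable Python rendering of this centralized procedure might look as follows (our sketch; the variable ordering, the not-equal constraints, and the use of a return value in place of the 'backtrack' and 'stop' steps are illustrative choices).

def solve(variables, domains, constraints):
    """Chronological backtracking over the variables in a fixed order."""
    assignment = {}

    def consistent(var, val):
        # val is consistent if no constrained, already-assigned neighbor has the same value
        return all(assignment.get(other) != val
                   for a, b in constraints
                   for other in ((b,) if a == var else (a,) if b == var else ()))

    def choose_value(i):
        if i == len(variables):
            return True                     # all variables assigned: stop
        var = variables[i]
        for val in domains[var]:
            if consistent(var, val):
                assignment[var] = val
                if choose_value(i + 1):
                    return True
                del assignment[var]         # undo the choice: chronological backtracking
        return False                        # no consistent value exists: backtrack

    return assignment if choose_value(0) else None

# The graph coloring problem of Figure 1.3.
variables = ["x1", "x2", "x3"]
domains = {v: ["red", "green", "blue"] for v in variables}
constraints = {("x1", "x2"), ("x1", "x3"), ("x2", "x3")}
print(solve(variables, domains, constraints))  # e.g. {'x1': 'red', 'x2': 'green', 'x3': 'blue'}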

This exhaustive search of the space of assignments has the advantage of completeness. But it is only "distributable" in the uninteresting sense that the different agents execute sequentially, mimicking the execution of a centralized algorithm. The question is whether there exists an algorithm that allows the agents to execute in parallel and asynchronously, and still guarantee soundness and completeness.

1. There are various ways to implement the backtracking in this procedure. The most straightforward way is to undo the choices made thus far in reverse chronological order, a procedure known as chronological backtracking. It is well known that more sophisticated backtracking procedures can be more efficient, but that does not concern us here.

Consider the following naive procedure, executed by all agents in parallel and asynchronously:

select a value from your domain
repeat
    if your current value is consistent with the current values of your neighbors, or if none of the values in your domain are consistent with them then
        do nothing
    else
        select a value in your domain that is consistent with the values of your neighbors and notify your neighbors of your new value
until there is no change in your value

Clearly when the algorithm terminates, either the current values constitute a solution, or no solution exists. The problem is of course that this procedure may never terminate. First, this procedure has no way of proving that no solution exists: it provides no means for reducing the size of an agent's domain. Second, even if there is a solution, we may keep selecting wrong values from domains without ever reaching it. (For example, consider example (d) from Figure 1.4: if every agent cycles sequentially between red, green and blue, the algorithm will never terminate.) This problem is less severe than the first, however, since it can be largely overcome through randomization.

Overall, the problem with this algorithm is essentially that, since the agents are executing asynchronously and with partial knowledge of the state of other agents, agents make choices based on stale information; what one agent may consider a consistent choice may not actually be consistent given that other agents have changed their choices. This suggests that good algorithms are likely to be somewhat more involved. Indeed it is the case that distributed backtracking algorithms do tend to be relatively complex.

Here is an example that is still fairly simple, and which combines elements of both the standard sequential algorithm and the naive asynchronous procedure above. From the sequential algorithm we borrow the idea to order the agents sequentially (for example, again alphabetically by name; more on that later). As in the naive algorithm, we will have agents execute in parallel and asynchronously. However, both the choice of values and the communication will be guided by the ordering on the agents. Each agent will at all times maintain a local view. This local view consists of this agent's view of the choice of variable values by the other agents. This view will be based on messages received from other agents (and in general will be out of date, as discussed above).

The messages among agents are as follows:

• When an agent selects a new value, it communicates the value to all its neighbors who are lower in the ordering; they incorporate it into their local views.

• When an agent discovers that none of the values in his domain are consistent with his local view he generates a new nogood using the hyper-resolution rule, and communicates this nogood to the lowest ranked among the neighbors occurring in the nogood.

Because of its reliance on building up a set of nogoods, we can see that this algorithm resembles a greedy version of the hyper-resolution algorithm. In the latter, all possible nogoods are generated by any given node, and communicated to all neighbors, even though the vast majority of those are not useful. Here, agents make tentative choices of values for their variables, only generate nogoods that incorporate values already generated by the agents above them in the order, and only communicate each new nogood to one agent.

Before we give the exact algorithm, here is an example of its operation. Consider again instance (c) of the CSP in Figure 1.4, and assume the agents are ordered alphabetically, x1, x2, x3. They initially select values at random; suppose they all select blue. x1 notifies x2 and x3 of its choice, and x2 notifies x3. x2's local view is thus {x1 = blue}, and x3's local view is {x1 = blue, x2 = blue}. x2 and x3 must check for consistency of their local views with their own values. x2 detects the conflict, changes its own value to red, and notifies x3. In the meantime x3 also checks for consistency, and similarly changes its value to red; it, however, notifies no one. Then the second message from x2 arrives at x3, which updates its local view to {x1 = blue, x2 = red}. At this point it cannot find a value from its domain consistent with its local view, and, using hyper-resolution, generates the nogood {x1 = blue, x2 = red}. It communicates this nogood to x2, the lowest ranked agent participating in the nogood. x2 now cannot find a value consistent with its local view, generates the nogood {x1 = blue}, and communicates it to x1. x1 detects the inconsistency with its current value, changes its value to red, and communicates the new value to x2 and x3. The process now continues as before; x2 changes its value back to blue, x3 finds no consistent value and generates the nogood {x1 = red, x2 = blue}, and then x2 generates the nogood {x1 = red}. At this point x1 has the nogood {x1 = blue} as well as the nogood {x1 = red}, and using hyper-resolution generates the empty nogood {}, and the algorithm terminates having determined that the problem has no solution.

The precise algorithm performed by agent xi is as follows.

when received (new_val, (xj, dj)) do
    add (xj, dj) to local_view
    check local_view

when received (nogood, nogood) do
    record nogood as a new constraint
    forall (xk, dk) ∈ nogood, if xk is not a neighbor of xi do
        add xk to list of neighbors
        add (xk, dk) to local_view
        request xk to add xi as a neighbor
    check local_view

Note that the algorithm has one more wrinkle that was not mentioned thus far, having to do with adding new links between nodes. Consider for example a slightly modified example, shown in Figure 1.5.

Figure 1.5 Asynchronous backtracking with dynamic link addition


procedure check_local_view
    when local_view and current_value are not consistent do
        if no value in Di is consistent with local_view then
            resolve a new nogood using the hyper-resolution rule
            send the nogood to the lowest priority process in the nogood
            when an empty nogood is found do
                broadcast to other processes that there is no solution
                terminate this algorithm
        else
            select d ∈ Di where local_view and d are consistent
            current_value ← d
            send (new_val, (xi, d)) to neighbors

As in the previous example, here too x3 generates the nogood {x1 = blue, x2 = red}, and notifies x2. x2 is not able to regain consistency itself by changing its own value. However, x1 is not a neighbor of x2, and so x2 does not have the value x1 = blue in its local view, and is not able to send the nogood {x1 = blue} to x1. So x2 sends a request to x1 to add x2 to its list of neighbors, and to send x2 its current value. From there onwards the algorithm proceeds as before.
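For concreteness, here is a tiny Python sketch (ours) of the per-agent state that these handlers manipulate; the message transport, the priority ordering, and the hyper-resolution step inside check_local_view are omitted, and all field and method names are our own.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ABTAgent:
    """Per-agent state for asynchronous backtracking (illustrative names only)."""
    name: str
    domain: frozenset
    neighbors: set = field(default_factory=set)
    current_value: Optional[str] = None
    local_view: dict = field(default_factory=dict)   # believed values of other agents
    nogoods: list = field(default_factory=list)      # learned forbidden value combinations

    def receive_new_val(self, sender, value):
        # incorporate a higher-priority agent's choice; a full agent would then
        # run the check_local_view step from the pseudocode above
        self.local_view[sender] = value

x2 = ABTAgent("x2", frozenset({"red", "blue"}))
x2.receive_new_val("x1", "blue")
print(x2.local_view)  # {'x1': 'blue'}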

1.2 Distributed dynamic programming for path planning

Like graph coloring, path planning constitutes another common abstract problem-solving framework. A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links L, a weight function w : L → R+, and two nodes s, t ∈ N. The goal is to find a directed path from s to t of minimal total weight. More generally, we consider a set of goal nodes T ⊂ N, and are interested in the shortest path from s to any of the goal nodes t ∈ T.

This abstract framework applies in many domains. Certainly it applies when there is some concrete network at hand (for example, a transportation or telecommunication network). But it also applies in more roundabout ways. For example, in a planning problem the nodes can be states of the world, the arcs actions available to the agent, and the weights the cost (or, alternatively, time) of each action.

1.2.1 Asynchronous dynamic programming

Path planning is a well-studied problem in computer science and operations research. We will be interested in distributed solutions, in which each node performs a local computation, with access only to the state of its neighbors. Underlying our solutions will be the principle of optimality: if node x lies on a shortest path from s to t, then the portion of the path from s to x (or, respectively, from x to t) must also be the shortest path between s and x (resp., x and t). This allows an incremental divide-and-conquer procedure, also known as dynamic programming.


Let us represent the shortest distance from any node i to the goal t as h∗(i). Thus the shortest distance from i to t via a node j neighboring i is given by f∗(i, j) = w(i, j) + h∗(j), and h∗(i) = minj f∗(i, j). Based on these facts, the ASYNCHDP (asynchronous dynamic programming) algorithm has each node repeatedly perform the following procedure. In this procedure, given in Figure 1.6, each node i maintains a variable h(i), which is an estimate of h∗(i).

procedure ASYNCHDP (node i)
    if i is a goal node then
        set h(i) to 0
    else
        initialize h(i) arbitrarily (for example, to ∞ or 0)
    repeat
        forall neighbors j do
            f(j) ← w(i, j) + h(j)
        h(i) ← minj f(j)

Figure 1.6 The asynchronous dynamic programming algorithm.
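As a concrete illustration (ours, not the book's), the following Python code simulates the ASYNCHDP updates synchronously on a small hypothetical graph; a truly distributed implementation would instead run each node's update concurrently, reading its neighbors' h values from messages.

import math

# Directed weighted graph as {node: {neighbor: edge weight}} (hypothetical instance).
graph = {
    "s": {"a": 1, "b": 2},
    "a": {"c": 2, "t": 5},
    "b": {"c": 1},
    "c": {"t": 1},
    "t": {},
}

def asynch_dp(graph, goal, sweeps=10):
    """Each non-goal node repeatedly sets h(i) <- min_j [w(i, j) + h(j)]; the goal stays at 0."""
    h = {i: (0.0 if i == goal else math.inf) for i in graph}
    for _ in range(sweeps):                 # here: synchronous sweeps over all nodes
        for i, edges in graph.items():
            if i != goal and edges:
                h[i] = min(w + h[j] for j, w in edges.items())
    return h

print(asynch_dp(graph, "t"))  # h converges to the true shortest distances h*
# e.g. {'s': 4.0, 'a': 3.0, 'b': 2.0, 'c': 1.0, 't': 0.0}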

Figure 1.7 shows this algorithm in action. The h values are initialized to ∞, and incrementally decrease to their correct values. The figure shows three iterations; note that after the first iteration, not all finite h values are correct; in particular, the value 3 in node d still overestimates the true distance, which is corrected in the next iteration.

Figure 1.7 Asynchronous dynamic programming in action

One can prove that the ASYNCHDP procedure is guaranteed to converge to the true values, that is, h will converge to h∗. Specifically, convergence will require one step for each node in the shortest path, meaning that in the worst case convergence will require n iterations. However, for realistic problems this is of little comfort. Not only can convergence be slow, but this procedure assumes a process (or agent) for each node. In typical search spaces one cannot effectively enumerate all nodes, let alone allocate them each a process (for example, chess has approximately 10^120 board positions, while there are fewer than 10^81 atoms in the universe and there have only been 10^26 nanoseconds since the Big Bang). So to be practical we turn to heuristic versions of the procedure, which require a smaller number of agents. Let us start by considering the opposite extreme in which we have only one agent.

1.2.2 Learning real-time A∗

In the learning real-time A∗ (LRTA∗) algorithm, the agent starts at a given node, performs an operation similar to that of asynchronous dynamic programming, and then moves to the neighboring node with the shortest estimated distance to the goal, and repeats. The procedure is given in Figure 1.8.

As above, we assume that the set of nodes is finite, and that all weights w(i, j) are positive and finite. Note that this procedure uses a given heuristic function h(·) that serves as the initial value for each newly encountered node. For our purposes it is not important what the precise function is. However, to guarantee certain properties of LRTA∗, we must assume that h is admissible. This means that h never overestimates the distance to the goal, that is, h(i) ≤ h∗(i).


procedure LRTA∗
    i ← s  // the start node
    while i is not a goal node do
        foreach neighbor j do
            f(j) ← w(i, j) + h(j)
        i′ ← arg minj f(j)  // breaking ties at random
        h(i) ← max(h(i), f(i′))
        i ← i′

Figure 1.8 The learning real-time A∗ algorithm.
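Here is a single-agent Python sketch of LRTA∗ (ours; the small example graph, the start and goal names, and the zero-initialized heuristic are illustrative assumptions):

import random

# Directed weighted graph as {node: {neighbor: edge weight}} (hypothetical instance).
graph = {
    "s": {"a": 1, "b": 2},
    "a": {"c": 2, "t": 5},
    "b": {"c": 1},
    "c": {"t": 1},
    "t": {},
}

def lrta_star(graph, start, goal, h=None):
    """One trial of LRTA*: walk from start to goal, updating h along the way."""
    h = dict(h) if h else {i: 0.0 for i in graph}   # h = 0 is trivially admissible
    path, i = [start], start
    while i != goal:
        f = {j: w + h[j] for j, w in graph[i].items()}
        best = min(f.values())
        j = random.choice([k for k, v in f.items() if v == best])  # break ties at random
        h[i] = max(h[i], best)          # h-values never decrease and remain admissible
        i = j
        path.append(i)
    return path, h

# Repeating trials while carrying h forward eventually yields a shortest path.
path, h = lrta_star(graph, "s", "t")
path, h = lrta_star(graph, "s", "t", h)
print(path, h)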


Because weights are non-negative we can ensure admissibility by setting h(i) = 0 for all i, although less conservative admissible heuristic functions (built using knowledge of the problem domain) can speed up the convergence to the optimal solution. Finally, we must assume that there exists some path from every node in the graph to a goal node. With these assumptions, LRTA∗ has the following properties:

• theh-values never decrease, and remain admissible.

• LRTA∗ terminates; the complete execution from the start node to termination at thegoal node is called atrial .

• If LRTA ∗ is repeated while maintaining theh-values from one trial to the next, iteventually discovers the shortest path from the start to a goal node.

• If LRTA ∗ find the same path on two sequential trials, the shortest pathhas beenfound. (However, this path may also be found in one or more previous trials beforeit is found twice in a row. Do you see why?)

Figure 1.9 shows four trials of LRTA∗. (Do you see why admissibility of the heuristicis necessary?)

LRTA∗ is a centralized procedure. However, we note that rather than have a single agent execute this procedure, one can have multiple agents execute it. The properties of the algorithm (call it LRTA∗(n), with n agents) are not altered, but the convergence to the shortest path can be sped up dramatically. First, if the agents each break ties differently, some will reach the goal much faster than others. Furthermore, if they all have access to a shared h-value table, the learning of one agent can teach the others. Specifically, after every round and for every i, h(i) = max_j h_j(i), where h_j(i) is agent j's updated value for h(i). Figure 1.10 shows an execution of LRTA∗(2), that is, LRTA∗ with two agents, starting from the same initial state as in Figure 1.9. (The hollow arrows show paths traversed by a single agent, while the dark arrows show paths traversed by both agents.)

1.3 Negotiation, auctions and optimization

1.3.1 Introduction: from contract nets to auction-like optimization

In this section we consider distributed problem solving that has a certain economic flavor. In this introductory subsection we will informally give the general philosophy and background; in the next two sections we will be more precise.

Contract nets were one of the earliest proposals for such an economic approach. Contract nets are not a specific algorithm, but a framework, a protocol for implementing specific algorithms. In a contract net the global problem is broken down into subtasks, and those are distributed among a set of agents. Each agent has different capabilities; for each agent i there is a function ci such that for any set of tasks T, ci(T) is the cost for agent i to achieve all the tasks in T. Each agent starts out with some initial set of tasks, but in general this assignment is not optimal, in the sense that the sum of all agents' costs is not minimal.


Figure 1.9 Four trials of LRTA∗ (the initial state, then the first, second, third, and fourth trials; the h values at each node are shown as they are updated along the traversed paths).

The agents then enter into a negotiation process which improves on the assignment, and hopefully culminates in an optimal assignment, that is, one with minimal cost. Furthermore, the process can have a so-called anytime property; even if it is interrupted prior to achieving optimality, it can achieve significant improvements over the initial allocation.

The negotiation consists of agents repeatedly contracting out assignments among themselves, each contract involving the exchange of tasks as well as money. The question is how the bidding process takes place, and what contracts hold based on this bidding. The general contract net protocol is open on these issues. One particular approach has each agent bid for each set of tasks the agent's marginal cost for the task, that is, the agent's additional cost for adding that task to its current set. The tasks are allocated to the highest bidders, and the process repeats. It can be shown that there always exists a sequence of contracts that results in the optimal allocation.


Figure 1.10 Three trials of LRTA∗(2) (the two agents, a and b, traverse the graph of Figure 1.9 while sharing an h-value table; the h values after each trial are shown at each node).

If one is restricted to basic contract types, in which one agent contracts a single task to another agent and receives from him some money in return, then achieving optimality in general requires that agents enter into “money-losing” contracts in the process. However, there exist more complex contracts—which involve contracting for a bundle of tasks (“cluster contracts”), a swap of tasks among two agents (“swap contracts”), or simultaneous transfers among many agents (“multiagent contracts”)—whose combination allows for a sequence of contracts that are not money losing and which culminate in the optimal solution.

At this point several questions may naturally occur to the reader:

• We start with some global problem to be solved, but then speak about minimizing the total cost to the agents. What is the connection between the two?

• When exactly do agents make offers, and what is the precise method by which the contracts are decided on?

• Since we are in a cooperative setting, why does it matter whether agents “lose money” or not on a given contract?

We will provide an answer to the first question in the next section. We will see that, in certain settings (specifically, those of linear programming and integer programming), finding an optimal solution is closely related to the individual utilities of the agents.

Regarding the second question, indeed one can provide several instantiations of even the specific, marginal-cost version of the contract net protocol.


In the next two sections we will be much more specific. We will look at a particular class of negotiation schemes, namely (specific kinds of) auctions. Every negotiation scheme consists of three elements: (1) permissible ways of making offers (bidding rules), (2) definition of the outcome based on the offers (market clearing rules), and (3) the information made available to the agents throughout the process (information dissemination rules). Auctions are a structured way of settling each of these dimensions, and we will look at auctions that do so in specific ways. It should be mentioned, however, that this specificity is not without a price. While convergence to optimality in contract nets depends on particular sequences of contracts taking place, and thus on some coordinating hand, the process is inherently distributed. The auction algorithms we will study, in contrast, include an auctioneer, an explicit centralized component.

The last question deserves particular attention. As we said, we start with some problem to be solved. We then proceed to define an auction-like process for solving it in a distributed fashion. However it is no accident that this section precedes our (rather detailed) discussion of auctions in Chapter 10. As we see there, auctions are a way to allocate scarce resources among self-interested agents. Auction theory thus belongs to the realm of game theory. In this chapter we also speak about auctions, but the discussion has little to do with game theory. In the spirit of the contract net paradigm, in our auctions agents will engage in a series of bids for resources, and at the end of the auction the assignment of the resources to the “winners” of the auction will constitute an optimal (or near optimal, in some cases) solution. However, there is an important difference between auctions as discussed in Chapter 10 and the procedures we discuss here. In the standard treatment of auctions, the bidders are assumed to be self-interested, and to bid in a way that maximizes their personal payoff. Here however we are merely looking at a way to perform an optimization; the process is distributed, and the protocol assigned to the individual entities is auction-like, but there is no question of the entities deviating from the protocol for personal gain. For this reason, despite the surface similarity, the discussion of these auction-like methods makes no reference to game theory or mechanism design. In particular, while these methods have some nice properties—for example, they are intuitive, provably correct, naturally parallelizable, appropriate for deployment in distributed systems settings, and tend to be robust to slight perturbations of the problem specification—no claim is made about their usefulness in adversarial situations. For this reason it is indeed something of a red herring, in this cooperative setting, to focus on questions such as whether a given contract is profitable for a given agent. In non-cooperative settings, where contract nets are also sometimes pressed into service, the situation is of course different.

In the next two sections we will be looking at two classical optimization problems, one representable as a linear program (LP) and one only as an integer program (IP) (for a brief review of LP and IP, see Appendix B). There exists a vast literature on how to solve LP and IP problems, and it is not our aim in this chapter (or in the appendix) to capture this broad literature. Our more limited aim here is to look at the auction-style solutions for them. First we will look at an LP problem—the problem of weighted matching in a bipartite graph, also known as the assignment problem. We will then look at a more complex, IP problem—that of scheduling. As we shall see, since the LP problem is relatively easy (specifically, solvable in polynomial time), it admits an auction-like procedure with tight guarantees. The IP problem is NP-complete, and so


it is not surprising that the auction-like procedure does not come with such guarantees.

1.3.2 The assignment problem and linear programming

The problem and its LP formulation

The problem of weighted matching in a bipartite graph, otherwise known as the assignment problem, is defined as follows.

Definition 1.3.1 (Assignment problem) A (symmetric) assignment problem consists of

• a set N of n agents,

• a set X of n objects,

• a set M ⊆ N × X of possible assignment pairs, and

• a function v : M → R giving the value of each assignment pair.

An assignment is a set of pairs S ⊆ M such that each agent i ∈ N and each object j ∈ X is in at most one pair in S. A feasible assignment is one in which all agents are assigned an object. A feasible assignment S is optimal if it maximizes the sum Σ_{(i,j)∈S} v(i, j).

An example of an assignment problem is the following (in this example X = {x1, x2, x3} and N = {1, 2, 3}):

          x1   x2   x3
    1      2    4    0
    2      1    5    0
    3      1    3    2

In this small example it is not hard to see that {(1, x1), (2, x2), (3, x3)} is an optimal assignment. In larger problems, however, the solution is not obvious, and the question is how to compute it algorithmically.

We first note that an assignment problem can be encoded as a linear program. Given a general assignment problem as defined above, we introduce the indicator matrix x; x_{i,j} > 0 when the pair (i, j) is selected, and 0 otherwise. Then we express the linear program as follows.

    maximize    Σ_{(i,j)∈M} v(i, j) x_{i,j}
    subject to  Σ_{j : (i,j)∈M} x_{i,j} ≤ 1    for all i ∈ N
                Σ_{i : (i,j)∈M} x_{i,j} ≤ 1    for all j ∈ X
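For the small example above, this LP can be handed directly to an off-the-shelf solver; the sketch below uses scipy.optimize.linprog, with a variable ordering and names of our own choosing. For this instance the optimum happens to be integral (see Lemma 1.3.2 below).

    # A sketch of solving the 3x3 assignment example as an LP with scipy (names are ours).
    import numpy as np
    from scipy.optimize import linprog

    v = np.array([[2, 4, 0],
                  [1, 5, 0],
                  [1, 3, 2]])          # v[i][j], agents 1-3 by objects x1-x3
    n = v.shape[0]

    c = -v.flatten()                   # linprog minimizes, so negate to maximize
    A_ub, b_ub = [], []
    for i in range(n):                 # each agent assigned at most one object
        row = np.zeros(n * n); row[i * n:(i + 1) * n] = 1
        A_ub.append(row); b_ub.append(1)
    for j in range(n):                 # each object assigned to at most one agent
        row = np.zeros(n * n); row[j::n] = 1
        A_ub.append(row); b_ub.append(1)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))
    print(res.x.reshape(n, n))         # should print the diagonal matching (1,x1), (2,x2), (3,x3)
    print(-res.fun)                    # optimal total value, 9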


On the face of it the LP formulation is inappropriate since it allows for fractional matches (that is, for 0 < x_{i,j} < 1). But as it turns out this LP has integral solutions:

Lemma 1.3.2 The LP encoding of the assignment problem has a solution such that for every i, j it is the case that x_{i,j} = 0 or x_{i,j} = 1. Furthermore, any optimal fractional solution can be converted in polynomial time to an optimal integral solution.

Since LP can be solved in polynomial time, we have the following:

Corollary 1.3.3 The assignment problem can be solved in polynomial time.

This corollary might suggest that we are done. However, there are a number of reasons to not stop there. First, the polynomial-time solution to the LP problem is of complexity O(n³), which may be too high in some cases. Furthermore, the solution is not obviously parallelizable, and is not particularly robust to changes in the problem specification (if one of the input parameters changes, the program must essentially be solved from scratch). One solution that suffers less from these shortcomings is based on the economic notion of competitive equilibrium, which we explore next.

The assignment problem and competitive equilibrium

Imagine that each of the objects in X has an associated price; the price vector is p = (p1, . . . , pn), where pj is the price of object j. Given an assignment S ⊆ M and a price vector p, define the “utility” from an assignment j to agent i as u(i, j) = v(i, j) − pj. An assignment and a set of prices are in competitive equilibrium when each agent is assigned the object that maximizes his utility given the current prices. More formally, we have the following.

Definition 1.3.4 (Competitive equilibrium) A feasible assignment S and a price vector p are in competitive equilibrium when for every pairing (i, j) ∈ S it is the case that ∀k, u(i, j) ≥ u(i, k).

It might seem strange to drag an economic notion into a discussion of combinatorial optimization, but as the following theorem shows there are good reasons for doing so:

Theorem 1.3.5 If a feasible assignment S and a price vector p satisfy the competitive equilibrium condition, then S is an optimal assignment. Furthermore, for any optimal solution S, there exists a price vector p such that p and S satisfy the competitive equilibrium condition.

For example, in the above example, it is not hard to see that the optimal assignment {(1, x1), (2, x2), (3, x3)} is a competitive equilibrium given the price vector (0, 4, 0); the “utilities” of the agents are 2, 1, and 2 respectively, and none of them can increase their profit by bidding for one of the other objects at the current prices. We outline the proof of a more general form of the theorem in the next section.
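The equilibrium condition is easy to check mechanically; here is a small sketch (with names of our own choosing) that verifies the price vector (0, 4, 0) against Definition 1.3.4 for the example above.

    # Check the competitive-equilibrium condition for the 3x3 example (names are ours).
    v = [[2, 4, 0], [1, 5, 0], [1, 3, 2]]    # v[i][j]
    prices = [0, 4, 0]
    assignment = {0: 0, 1: 1, 2: 2}          # agent i -> object j (the diagonal matching)

    def utility(i, j):
        return v[i][j] - prices[j]

    in_equilibrium = all(
        utility(i, j) >= max(utility(i, k) for k in range(3))
        for i, j in assignment.items()
    )
    print(in_equilibrium)                     # True: each agent holds a utility-maximizing object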

This last theorem means that one way to search for solutions of the LP is to search the space of competitive equilibria. And a natural way to search that space involves


auction-like procedures, in which the individual agents “bid” for the different resources in a pre-specified way. We will look at open-outcry, ascending auction-like procedures, resembling the English auction discussed in Chapter 10. Before that, however, we take a slightly closer look at the connection between the optimization problem and competitive equilibrium.

Competitive equilibrium and primal-dual problems

Theorem 1.3.5 may seem at first almost magical; why would an economic notion prove relevant to an optimization problem? However, a slightly closer look removes some of the mystery. Rather than look at the specific LP corresponding to the assignment problem, consider the general form of an LP:

    maximize    Σ_{i=1}^{n} c_i x_i
    subject to  Σ_{i=1}^{n} a_{ij} x_i ≤ b_j    for all j ∈ {1, . . . , m}
                x_i ≥ 0                          for all i ∈ {1, . . . , n}

Note that this formulation makes reverse use of the ≤ and ≥ signs as compared to the formulation in Appendix B. As we remark there, this is simply a matter of the signs of the constants used.

The primal problem has a natural economic interpretation, regardless of its actual origin. Imagine a production economy, in which you have a set of resources and a set of products. Each product consumes a certain amount of each resource, and each product is sold at a certain price. Interpret x_i as the amount of product i produced, and c_i as the price of product i. Then the optimization problem can be interpreted as profit maximization. Of course, this must be done within the constraints of available resources. If we interpret b_j as the available amount of resource j and a_{ij} as the amount of resource j needed to produce a unit of product i, then the constraint Σ_i a_{ij} x_i ≤ b_j appropriately captures the limitation on resource j.

Now consider the dual problem:

    minimize    Σ_{i=1}^{m} b_i y_i
    subject to  Σ_{i=1}^{m} a_{ij} y_i ≥ c_j    for all j ∈ {1, . . . , n}
                y_i ≥ 0                          for all i ∈ {1, . . . , m}

It turns out that y_i can also be given a meaningful economic interpretation, namely as the marginal value of resource i, also known as its shadow price. The shadow price captures the sensitivity of the optimal solution to a small change in the availability of


that particular resource, holding everything else constant. A high shadow price means that increasing its availability would have a large impact on the optimal solution, and vice versa.2

This helps explain why the economic perspective on optimization, at least in the context of linear programming, is not that odd. Indeed, armed with these intuitions, one can look at traditional algorithms such as the Simplex method and give them an economic interpretation. In the next section we look at a specific auction-like algorithm, which is overtly economic in nature.
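To make the primal–dual relationship concrete, the sketch below solves a tiny made-up production LP and its dual with scipy.optimize.linprog and checks that the two optimal objective values coincide (strong duality); all the numbers are invented purely for illustration.

    # Strong duality on a toy production LP (all numbers are illustrative).
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([3.0, 5.0])                 # product prices
    A = np.array([[1.0, 2.0],                # resource usage per unit of each product
                  [3.0, 1.0]])
    b = np.array([14.0, 18.0])               # available amounts of each resource

    # Primal: maximize c.x  s.t.  A x <= b, x >= 0   (negate c because linprog minimizes)
    primal = linprog(-c, A_ub=A, b_ub=b)
    # Dual:   minimize b.y  s.t.  A^T y >= c, y >= 0
    dual = linprog(b, A_ub=-A.T, b_ub=-c)

    print(-primal.fun, dual.fun)             # equal at the optimum (strong duality)
    print(dual.x)                            # shadow prices of the two resources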

A naive auction algorithm

We start with a naive auction-like procedure which is “almost” right; it contains the main ideas, but has a major flaw. In the next section we will fix that flaw. The naive procedure begins with no objects allocated, and terminates once it has found a feasible solution. We define the naive auction algorithm formally as follows:

Naive Auction Algorithm
// Initialization:
S = ∅
forall j ∈ X do
    pj = 0
repeat
    // Bidding step:
    let i ∈ N be an unassigned agent
    // Find an object j ∈ X which offers i maximal value at current prices:
    j ∈ arg max_{k | (i,k)∈M} (v(i, k) − pk)
    // Compute i's bid increment for j:
    bi = (v(i, j) − pj) − max_{k | (i,k)∈M; k≠j} (v(i, k) − pk)
    // which is the difference between the value to i of the best and second-best objects
    // at current prices (note that i's bid will be the current price plus this bid increment).
    // Assignment step:
    add the pair (i, j) to the assignment S
    if there is another pair (i′, j) then
        remove it from the assignment S
    increase the price pj by the increment bi
until S is feasible   // that is, it contains an assignment for all i ∈ N

It is not hard to verify that the following is true of the algorithm:

Theorem 1.3.6 The naive algorithm only terminates at a competitive equilibrium.

Here, for example, is a possible execution of the algorithm on our current example. The following table shows each round of bidding. In this execution we pick the unassigned agents in order, round-robin style.

2. To be precise, the shadow price is the value of the Lagrange multiplier at the optimal solution.


    round   p1  p2  p3   bidder   preferred object   bid incr.   current assignment
      0      0   0   0      1           x2               2       (1, x2)
      1      0   2   0      2           x2               2       (2, x2)
      2      0   4   0      3           x3               1       (2, x2), (3, x3)
      3      0   4   1      1           x1               2       (2, x2), (3, x3), (1, x1)

Thus, at first agents 1 and 2 compete for x2, but quickly x2 becomes too expensive for agent 1, who opts for x1. By the time agent 3 gets to bid he is priced out of his preferred item, x2, and settles for x3.

Thus when the procedure terminates we have our solution. The problem though is that it may not terminate. This can occur when more than one object offers maximal value for a given agent; in this case the agent's bid increment will be zero. If these two items also happen to be the best items for another agent, they will enter into an infinite bidding war in which the price never rises. Consider a modification of our previous example, in which the value function is given by the following table:

          x1   x2   x3
    1      1    1    0
    2      1    1    0
    3      1    1    0

The naive auction protocol would proceed as follows.

    round   p1  p2  p3   bidder   preferred object   bid incr.   current assignment
      0      0   0   0      1           x1               0       (1, x1)
      1      0   0   0      2           x2               0       (1, x1), (2, x2)
      2      0   0   0      3           x1               0       (3, x1), (2, x2)
      3      0   0   0      1           x2               0       (3, x1), (1, x2)
      4      0   0   0      2           x1               0       (2, x1), (1, x2)
      ⋮      ⋮   ⋮   ⋮      ⋮           ⋮                ⋮       ⋮

Clearly, in this example the naive algorithm will have the three agents forever fight over the two desired objects.

A terminating auction algorithm

To remedy the flaw exposed above, we must ensure that prices continue to increase when objects are contested by a group of agents. The extension is quite straightforward: we add a small amount to the bidding increment. Thus we calculate the bid increment of agent i ∈ N as follows.

    bi = ( u(i, j) − max_{k | (i,k)∈M; k≠j} u(i, k) ) + ǫ

Otherwise, the algorithm is as stated above.

Consider again the problematic assignment problem on which the naive algorithm did not terminate. The terminating auction protocol would proceed as follows.


    round   p1   p2   p3   bidder   preferred object   bidding increment   assignment
      0      ǫ    0    0      1            x1                  ǫ           (1, x1)
      1      ǫ    2ǫ   0      2            x2                  2ǫ          (1, x1), (2, x2)
      2      3ǫ   2ǫ   0      3            x1                  2ǫ          (3, x1), (2, x2)
      3      3ǫ   4ǫ   0      1            x2                  2ǫ          (3, x1), (1, x2)
      4      5ǫ   4ǫ   0      2            x1                  2ǫ          (2, x1), (1, x2)

Note that at each iteration, the price for the preferred item increases by at least ǫ. This gives us some hope that we will avoid nontermination. We must first, though, make sure that if we terminate, we terminate with the “right” results.

First, because the prices must increase by at least ǫ at every round, the competitive equilibrium property is no longer preserved over the iterations. Agents may “overbid” on some objects. For this reason we will need to define a notion of ǫ-competitive equilibrium.

Definition 1.3.7 (ǫ-competitive equilibrium) S and p satisfy ǫ-competitive equilibrium when for each i ∈ N, if there exists a pair (i, j) ∈ S then ∀k, u(i, j) + ǫ ≥ u(i, k).

In other words, in an ǫ-equilibrium no agent can profit more than ǫ by bidding for an object other than his assigned one, given current prices.

Theorem 1.3.8 A feasible assignment S with n goods that forms an ǫ-competitive equilibrium with some price vector is within nǫ of optimal.

Corollary 1.3.9 Consider a feasible assignment problem with an integer valuation function v : M → Z. If ǫ < 1/n then any feasible assignment found by the terminating auction algorithm will be optimal.

This leaves the question of whether the algorithm indeed terminates, and if so how quickly. To see why the algorithm must terminate, note that if an object receives a bid in k iterations, its price must exceed its initial price by at least kǫ. Thus, for sufficiently large k, the object will become expensive enough to be judged inferior to some object that has not received a bid so far. The total number of iterations in which an object receives a bid must be no more than

    ( max_{(i,j)} v(i, j) − min_{(i,j)} v(i, j) ) / ǫ.

Once all objects receive at least one bid, the auction terminates (do you see why?). If each iteration involves a bid by a single agent, the total number of iterations is no more than n times the preceding quantity. Thus, since each bid requires O(n) operations, the running time of the algorithm is O(n² max_{(i,j)} |v(i, j)| / ǫ). Observe that if ǫ = O(1/n) (as discussed in Corollary 1.3.9), the algorithm's running time is O(n³ k), where k is a constant that does not depend on n, yielding performance similar to linear programming.
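The terminating auction is short enough to sketch in full. The following Python version (function and variable names are ours) runs it on the 3×3 example from above; with ǫ < 1/3 it ends at an optimal assignment, as Corollary 1.3.9 predicts.

    # A sketch of the terminating (epsilon) auction for the assignment problem.
    def epsilon_auction(v, eps):
        """v[i][j]: value of object j to agent i; returns (assignment, prices)."""
        n = len(v)
        prices = [0.0] * n
        assigned = {}                              # object j -> agent i
        while len(assigned) < n:
            i = next(a for a in range(n) if a not in assigned.values())  # an unassigned agent
            utils = [v[i][j] - prices[j] for j in range(n)]
            j = max(range(n), key=lambda k: utils[k])                    # preferred object
            second = max(u for k, u in enumerate(utils) if k != j)
            bid_incr = (utils[j] - second) + eps
            assigned[j] = i                        # displaces any previous holder of j
            prices[j] += bid_incr
        return {agent: obj for obj, agent in assigned.items()}, prices

    v = [[2, 4, 0], [1, 5, 0], [1, 3, 2]]
    print(epsilon_auction(v, eps=0.25))            # an optimal assignment and the final prices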


1.3.3 The scheduling problem and integer programming

The problem and its integer program

The scheduling problem involves a set of time slots and a set of agents. Each agent requires some number of time slots, and has a deadline. Intuitively, the agents each have a task that requires the use of a shared resource, and that task lasts a certain number of hours and has a certain deadline. Each agent also has some value for completing the task by the deadline. Formally, we have the following definition.

Definition 1.3.10 (Scheduling problem) A scheduling problem consists of a tuple C = (N, X, q, v), where

• N is a set of n agents;

• X is a set of m discrete and consecutive time slots;

• q = (q1, . . . , qm) is a reserve price vector, where qj is a reserve value for time slot xj; qj can be thought of as the value of the slot to the owner of the resource, the value he could get for it by allocating it other than to one of the n agents;

• v = (v1, . . . , vn), where vi, the valuation function of agent i, is a function over possible allocations of time slots that is parameterized by two arguments: di, the deadline of agent i, and λi, the number of time slots required by agent i. Thus for an allocation Fi ⊆ X, we have that

    vi(Fi) = wi   if Fi includes λi hours before di
           = 0    otherwise

A solution to a scheduling problem is a vector F = (F∅, F1, . . . , Fn), where Fi is the set of time slots assigned to agent i, and F∅ is the set of time slots that are not assigned. The value of a solution is defined as

    V(F) = Σ_{j : xj∈F∅} qj + Σ_{i∈N} vi(Fi).

A solution is optimal if no other solution has a higher value.

Here is an example, involving scheduling jobs on a busy processor. The processor has several discrete time slots for the day—specifically, eight one-hour time slots from 9:00 am to 5:00 pm. Its operating costs force it to have a reserve price of $3 per hour. There are four jobs, each with its own length, deadline, and worth. They are shown in the following table:

    job   length (λ)   deadline (d)   worth (w)
     1    2 hours      1:00 pm        $10.00
     2    2            12:00 pm       16.00
     3    1            12:00 pm       6.00
     4    4            5:00 pm        14.50


Even in this small example it takes a moment to see that an optimal solution is to allocate the machine as follows.

    time slot    agent
    9:00 am        2
    10:00 am       2
    11:00 am       1
    12:00 pm       1
    1:00 pm        4
    2:00 pm        4
    3:00 pm        4
    4:00 pm        4

The question is again how to find the optimal schedule algorithmically. The scheduling problem is inherently more complex than the assignment problem. The reason is that the dependence of agents' valuation functions on the job length and deadline exhibits both complementarity and substitutability. For example, for agent 1 any two blocks of two hours prior to 1:00 pm are perfect substitutes. On the other hand, any two single time slots before the deadline are strongly complementary; alone they are worth nothing, but together they are worth the full $10. This makes for a more complex search space than in the case of the assignment problem, and whereas the assignment problem is polynomial, the scheduling problem is NP-complete. Indeed, the scheduling application is merely an instance of the general set packing problem.3

The complex nature of the scheduling problem has many ramifications. Among other things, this means that we cannot hope to find a polynomial LP encoding of the problem (since linear programming has a polynomial-time solution). We can, however, encode it as an integer program. In the following, for every subset S ⊆ X, the boolean variable x_{i,S} will represent the fact that agent i was allocated the bundle S, and vi(S) his valuation for that bundle.

    maximize    Σ_{S⊆X, i∈N} vi(S) x_{i,S}
    subject to  Σ_{S⊆X} x_{i,S} ≤ 1              for all i ∈ N
                Σ_{S⊆X : j∈S, i∈N} x_{i,S} ≤ 1   for all j ∈ X
                x_{i,S} ∈ {0, 1}                 for all S ⊆ X, i ∈ N

In general, the length of the optimized quantity is exponential in the size of X. In practice, many of the terms can be assumed to be zero, and thus dropped.

3. The scheduling problem can be defined much more broadly. It could involve earliest start times as well as deadlines, could require contiguous blocks of time for a given agent (it turns out that this requirement does not matter in our current formulation), could involve more than one resource, and so on. But the current problem formulation is rich enough for our purposes.


However, even when the IP is small, our problems are not over. IPs are not in general solvable in polynomial time, so we cannot hope for easy answers. However, it turns out that a generalization of the auction-like procedure can be applied in this case too. The price we will pay for the higher complexity of the problem is that the generalized algorithm will not come with the same guarantees as we had in the case of the assignment problem.

A more general form of competitive equilibrium

We start by revisiting the notion of competitive equilibrium. The definition really does not change, but rather is generalized to apply to assignments of bundles of time slots rather than single objects.

Definition 1.3.11 (Competitive equilibrium, generalized form) Given a scheduling problem, a solution F is in competitive equilibrium at prices p if and only if

• for all i ∈ N it is the case that Fi = arg max_{T⊆X} ( vi(T) − Σ_{j : xj∈T} pj ) (the set of time slots allocated to agent i maximizes his surplus at prices p);

• for all j such that xj ∈ F∅ it is the case that pj = qj (the price of every unallocated time slot is its reserve price);

• for all j such that xj ∉ F∅ it is the case that pj ≥ qj (the price of every allocated time slot is at least its reserve price).

As was the case in the assignment problem, a solution that is in competitive equilibrium is guaranteed to be optimal.

Theorem 1.3.12 If a solution F to a scheduling problem C is in equilibrium at prices p, then F is also optimal for C.

We give an informal proof to facilitate understanding of the theorem. Assume that F is in equilibrium at prices p; we would like to show that the total value of F is at least as high as the total value of any other solution F′. Starting with the definition of the total value of the solution F, the following chain of relations shows this for an arbitrary F′.

    V(F) = Σ_{j : xj∈F∅} qj + Σ_{i∈N} vi(Fi)
         = Σ_{j : xj∈F∅} pj + Σ_{i∈N} vi(Fi)
         = Σ_{j : xj∈X} pj + Σ_{i∈N} ( vi(Fi) − Σ_{j : xj∈Fi} pj )
         ≥ Σ_{j : xj∈X} pj + Σ_{i∈N} ( vi(F′i) − Σ_{j : xj∈F′i} pj )
         = Σ_{j : xj∈F′∅} pj + Σ_{i∈N} vi(F′i)
         ≥ Σ_{j : xj∈F′∅} qj + Σ_{i∈N} vi(F′i)  =  V(F′)


The second line holds because, in equilibrium, every unallocated slot is priced at its reserve value, and the third and fifth lines simply regroup the sums. The inequality comes from the definition of a competitive equilibrium: for each agent i, there does not exist another allocation F′i that would yield a larger surplus at the current prices (formally, for all i and F′i, vi(Fi) − Σ_{j : xj∈Fi} pj ≥ vi(F′i) − Σ_{j : xj∈F′i} pj). The final inequality holds because every slot is priced at or above its reserve value. Applying these conditions, it follows that no alternative allocation F′ has a higher total value.

Consider our sample scheduling problem. A competitive equilibrium for that problem is shown in the following table:

    time slot    agent    price
    9:00 am        2      $6.25
    10:00 am       2      $6.25
    11:00 am       1      $6.25
    12:00 pm       1      $3.25
    1:00 pm        4      $3.25
    2:00 pm        4      $3.25
    3:00 pm        4      $3.25
    4:00 pm        4      $3.25

Note that the price of each allocated time slot is higher than the reserve price of $3.00. Also note that the allocation of time slots to each agent maximizes his surplus at the prices p. Finally, also notice that the solution is stable, in that no agent can profit by making an offer for an alternative bundle at the current prices.

Even before we ask how we might find such a competitive equilibrium, we should note that one does not always exist. Consider a modified version of our scheduling example, in which the processor has two one-hour time slots, at 9:00 am and at 10:00 am, and there are two jobs as follows:

    job   length (λ)   deadline (d)   worth (w)
    1     2 hours      11:00 am       $10.00
    2     1            11:00 am       6.00

Table 1.1 A problematic scheduling example.

The reserve price is $3 per hour. We show that no competitive equilibrium exists by case analysis. Clearly, if agent 1 is allocated a slot he must be allocated both slots. But then their combined price cannot exceed $10, and thus for at least one of those hours the price must not exceed $5. However, agent 2 is willing to pay as much as $6 for that hour, and thus we are out of equilibrium. Similarly, if agent 2 is allocated at least one of the two slots, the combined price of the two slots cannot exceed $9: agent 2 will pay at most $6, his value, for the slots he holds, and any unallocated slot is priced at the $3 reserve. But then agent 1 would happily pay more and get both slots. Finally, we cannot have both slots unallocated, since in this case their combined price would be $6, the sum of the reserve prices, in which case both agents would have the incentive to buy.

This instability arises from the fact that the agents' utility functions are superadditive (or, equivalently, that there are complementary goods). This suggests some restrictive conditions under which we are guaranteed the existence of a competitive equilibrium solution. The first theorem captures the essential connection to linear programming:


Theorem 1.3.13 A scheduling problem has a competitive equilibrium solution if and only if the LP relaxation of the associated integer program has an integer solution.

The following theorem captures weaker sufficient conditions for the existence of a competitive equilibrium solution:

Theorem 1.3.14 A scheduling problem has a competitive equilibrium solution if any one of the following conditions holds:

• For all agents i ∈ N, there exists a time slot x ∈ X such that for all T ⊆ X, vi(T) = vi(x) (each agent desires only a single time slot, which must be the first one in the current formulation).

• For all agents i ∈ N, and for all R, T ⊆ X such that R ∩ T = ∅, vi(R ∪ T) = vi(R) + vi(T) (the utility functions are additive).

• Time slots are gross substitutes; demand for one time slot does not decrease if the price of another time slot increases.

An auction algorithm

Perhaps the best known distributed protocol for finding a competitive equilibrium is the so-called ascending-auction algorithm. In this protocol, the center advertises an ask price, and the agents bid the ask price for bundles of time slots that maximize their surplus at the given ask prices. This process repeats until there is no change.

Let b = (b1, . . . , bm) be the bid price vector, where bj is the highest bid so far for time slot xj ∈ X. Let F = (F1, . . . , Fn) be the set of allocated slots for each agent. Finally, let ǫ be the price increment. The ascending-auction algorithm is as follows.


Ascending-auction algorithm
foreach slot xj do
    set bj = qj
    // Set the initial bids to be the reserve prices
foreach agent i do
    set Fi = ∅
repeat
    foreach agent i = 1 to n do
        foreach slot xj do
            if xj ∈ Fi then
                pj = bj
            else
                pj = bj + ǫ
        // Agents assume that they will get slots they are currently the high bidder on
        // at that price, while they must increment the bid by ǫ to get any other slot.
        let S∗ = arg max_{S⊆X | S⊇Fi} ( vi(S) − Σ_{j∈S} pj )
        // Find the best subset of slots, given your current outstanding bids.
        // Agent i becomes the high bidder for all slots in S∗ \ Fi.
        foreach slot xj ∈ S∗ \ Fi do
            bj = bj + ǫ
            if there exists an agent k such that xj ∈ Fk then
                set Fk = Fk \ {xj}
        // Update the bidding prices and current allocations of the other bidders.
        set Fi = S∗
until F does not change
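The following is a rough, self-contained Python sketch of this protocol on the four-job example above. The encoding (slots indexed 0–7 for the hours starting at 9:00 am, deadlines given as a last usable slot index) and all helper names are ours; the sketch is meant only to make the bundle-bidding loop concrete, not to reproduce the exact rounds shown in the tables below.

    from itertools import combinations

    SLOTS = list(range(8))                       # hours starting 9:00 am ... 4:00 pm
    RESERVE = [3.0] * 8
    # job -> (slots required, last usable slot index, worth)
    JOBS = {1: (2, 3, 10.0), 2: (2, 2, 16.0), 3: (1, 2, 6.0), 4: (4, 7, 14.5)}

    def value(i, bundle):
        """v_i(bundle): worth if the bundle has enough slots before the deadline, else 0."""
        need, last, worth = JOBS[i]
        return worth if sum(1 for j in bundle if j <= last) >= need else 0.0

    def ascending_auction(eps):
        b = RESERVE[:]                           # highest bid so far per slot
        F = {i: frozenset() for i in JOBS}       # slots currently held by each agent
        while True:
            changed = False
            for i in JOBS:
                # ask prices: own slots at the standing bid, all others at bid + eps
                p = [b[j] if j in F[i] else b[j] + eps for j in SLOTS]
                free = [j for j in SLOTS if j not in F[i]]
                best, best_surplus = F[i], value(i, F[i]) - sum(p[j] for j in F[i])
                for r in range(len(free) + 1):   # search all bundles containing F[i]
                    for extra in combinations(free, r):
                        S = F[i] | frozenset(extra)
                        surplus = value(i, S) - sum(p[j] for j in S)
                        if surplus > best_surplus + 1e-9:
                            best, best_surplus = S, surplus
                for j in best - F[i]:            # outbid and displace previous holders
                    b[j] += eps
                    for k in JOBS:
                        if k != i and j in F[k]:
                            F[k] = F[k] - {j}
                    changed = True
                F[i] = best
            if not changed:
                return F, b

    print(ascending_auction(eps=0.25))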

The ascending-auction algorithm is very similar to the assignment problem auction presented in the previous section, with one notable difference. Instead of calculating a bid increment from the difference between the surplus gained from the best and second-best object, the bid increment here is always constant.

Let us consider a possible execution of the algorithm on the sample scheduling problem discussed above. We use an increment of $0.25 for this execution of the algorithm; in the tables below, slots are labeled by their starting hour (9 through 16).

    round   bidder   slots bid on   F = (F1, F2, F3, F4)                   b
      0        1      {9, 10}       ({9,10}, ∅, ∅, ∅)                      (3.25, 3.25, 3, 3, 3, 3, 3, 3)
      1        2      {10, 11}      ({9}, {10,11}, ∅, ∅)                   (3.25, 3.5, 3.25, 3, 3, 3, 3, 3)
      2        3      {9}           (∅, {10,11}, {9}, ∅)                   (3.5, 3.5, 3.25, 3, 3, 3, 3, 3)
      ⋮        ⋮       ⋮             ⋮                                      ⋮
     24        1       ∅            ({11,12}, {9,10}, ∅, {13,14,15,16})    (6.25, 6.25, 6.25, 3.25, 3.25, 3.25, 3.25, 3.25)

At this point, no agent has a profitable bid, and the algorithm terminates.


However, this convergence depended on our choice of the increment. Let us consider what happens if we select an increment of $1.

    round   bidder   slots bid on        F = (F1, F2, F3, F4)               b
      0        1      {9, 10}            ({9,10}, ∅, ∅, ∅)                  (4, 4, 3, 3, 3, 3, 3, 3)
      1        2      {10, 11}           ({9}, {10,11}, ∅, ∅)               (4, 5, 4, 3, 3, 3, 3, 3)
      2        3      {9}                (∅, {10,11}, {9}, ∅)               (5, 5, 4, 3, 3, 3, 3, 3)
      3        4      {12, 13, 14, 15}   (∅, {10,11}, {9}, {12,13,14,15})   (5, 5, 4, 4, 4, 4, 4, 3)
      4        1      {11, 12}           ({11,12}, {10}, {9}, {13,14,15})   (5, 5, 5, 5, 4, 4, 4, 3)
      5        2      {9, 10}            ({11,12}, {9,10}, ∅, {13,14,15})   (6, 6, 5, 5, 4, 4, 4, 3)
      6        3      {11}               ({12}, {9,10}, {11}, {13,14,15})   (6, 6, 6, 5, 4, 4, 4, 3)
      7        4       ∅                 ({12}, {9,10}, {11}, {13,14,15})   (6, 6, 6, 5, 4, 4, 4, 3)
      8        1       ∅                 ({12}, {9,10}, {11}, {13,14,15})   (6, 6, 6, 5, 4, 4, 4, 3)

Unfortunately, this bidding process does not reach the competitive equilibrium because the bidding increment of $1 is not small enough.

It is also possible for the ascending-auction algorithm not to converge to an equilibrium no matter how small the increment is. Consider another problem of scheduling jobs on a busy processor. The processor has three one-hour time slots, at 9:00 am, 10:00 am, and 11:00 am, and there are three jobs as shown in the following table. The reserve price is $0 per hour.

    job   length (λ)   deadline (d)   worth (w)
    1     1 hour       11:00 am       $2.00
    2     2 hours      12:00 pm       20.00
    3     2 hours      12:00 pm       8.00

Here an equilibrium exists, but the ascending auction can miss it, if agent 2 bids up the 11:00 am slot.

Despite the lack of a guarantee of convergence, we might still like to be able to claim that if we do converge then we converge to an optimal solution. Unfortunately, not only can we not do that, we cannot even bound how far the solution is from optimal. Consider the following problem. The processor has two one-hour time slots, at 9:00 am and 10:00 am (with reserve prices of $1 and $9, respectively), and there are two jobs as shown in the following table.

    job   length (λ)   deadline (d)   worth (w)
    1     1 hour       10:00 am       $3.00
    2     2 hours      11:00 am       11.00

The ascending-auction algorithm will stop with the first slot allocated to agent 1 and the second to agent 2. By adjusting the value to agent 2 and the reserve price of the 10:00 am time slot, we can create examples in which the allocation is arbitrarily far from optimal.

One property we can guarantee, however, is termination, and we show this by contradiction. Assume that the algorithm does not converge. It must be the case that at each round at least one agent bids on at least one time slot, causing the price of that slot


to increase. After some finite number of bids on bundles that include a particular time slot, it must be the case that the price of this slot is so high that every agent prefers the empty bundle to all bundles that include this slot. Eventually, this condition will hold for all time slots, and thus no agent will bid on a non-empty bundle, contradicting the assumption that the algorithm does not converge. In the worst case, in each iteration only one of the n agents bids, and this bid is on a single slot. Once the sum of the prices exceeds the maximum total value for the agents, the algorithm must terminate, giving us the following bound on the running time: O( n · max_F Σ_{i∈N} vi(Fi) / ǫ ).

1.4 Action selection in multiagent MDPs

In this section we discuss the problem of optimal action selection in multiagent MDPs. Recall that in a single-agent MDP the optimal policy π∗ is characterized by the mutually recursive Bellman equations:

    Q^{π∗}(s, a) = r(s, a) + β Σ_{ŝ} p(s, a, ŝ) V^{π∗}(ŝ)

    V^{π∗}(s) = max_a Q^{π∗}(s, a)

Furthermore, these equations turn into an algorithm—specifically, the dynamic-programming-style value iteration algorithm—by replacing the equality signs “=” with assignment operators “←” and iterating repeatedly through those assignments.4
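For reference, here is a minimal value-iteration sketch for a single-agent MDP; the toy MDP it runs on (two states, two actions, and the particular rewards and transition probabilities) is invented purely for illustration.

    # Value iteration on a tiny made-up MDP: states s0, s1; actions a0, a1.
    STATES, ACTIONS, BETA = ["s0", "s1"], ["a0", "a1"], 0.9
    r = {("s0", "a0"): 0.0, ("s0", "a1"): 1.0, ("s1", "a0"): 2.0, ("s1", "a1"): 0.0}
    p = {  # p[(s, a)][s2] = transition probability
        ("s0", "a0"): {"s0": 0.5, "s1": 0.5}, ("s0", "a1"): {"s0": 1.0, "s1": 0.0},
        ("s1", "a0"): {"s0": 0.0, "s1": 1.0}, ("s1", "a1"): {"s0": 0.2, "s1": 0.8},
    }

    V = {s: 0.0 for s in STATES}
    for _ in range(200):                                  # iterate the Bellman assignments
        Q = {(s, a): r[s, a] + BETA * sum(p[s, a][s2] * V[s2] for s2 in STATES)
             for s in STATES for a in ACTIONS}
        V = {s: max(Q[s, a] for a in ACTIONS) for s in STATES}

    policy = {s: max(ACTIONS, key=lambda a: Q[s, a]) for s in STATES}
    print(V, policy)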

However, in real-world applications the situation is not that simple. For example, the MDP may not be known by the planning agent, and must be learned. This case is discussed in Chapter 6. But more basically, the MDP may simply be too large to iterate over all instances of the equations. In this case, one approach is to exploit independence properties of the MDP. One case where this arises is when the states can be described by feature vectors; each feature can take on many values, and thus the number of states is exponential in the number of features. One would ideally like to solve the MDP in time polynomial in the number of features rather than the number of states, and indeed techniques have been developed to tackle such MDPs with factored state spaces.

We do not address that problem here, but focus instead on a similar one that has to do with the modularity of actions rather than of states. In a multiagent MDP any (global) action a is really a vector of local actions (a1, . . . , an), one by each of n agents. The assumption here is that the reward is common, so there is no issue of competition among the agents. There is not even a problem of coordination; we have the luxury of a central planner (but see the discussion of parallelizability at the end of this section). The only problem is that the number of global actions is exponential in the number of agents. Can we somehow solve the MDP other than by enumerating all possible action combinations?

We will not address this problem, which is quite involved, in full generality. Instead we will focus on an easier subproblem. Suppose that the Q values for the optimal policy

4. The basics of single-agent MDPs are covered in Appendix C.


have already been computed; how hard is it to decide on which action each agent should take? Since we are assuming away the problem of coordination by positing a central planner, on the face of it the problem is straightforward. In Appendix C we state that once the optimal (or close to optimal) Q values are computed, the optimal policy is ‘easily’ recovered; the optimal action in state s is arg max_a Q^{π∗}(s, a). But of course if a ranges over an exponential number of choices by all agents, ‘easy’ becomes ‘hard’. Can we do better than naively enumerating over all action combinations by the agents?

In general the answer is no, but in practice the interaction among the agents' actions can be quite limited, which can be exploited in both the representation of the Q function and in the maximization process. Specifically, in some cases we can associate an individual Qi function with each agent i, and express the Q function (either precisely or approximately) as a linear sum of the individual Qi's:

    Q(s, a) = Σ_{i=1}^{n} Qi(s, a)

The maximization problem now becomes

    arg max_a Σ_{i=1}^{n} Qi(s, a).

This in and of itself is not very useful, as one still needs to look at the set of all global actions a, which is exponential in n, the number of agents. However, it is often also the case that each individual Qi depends only on a small subset of the variables.

For example, imagine a metal reprocessing plant with four locations, each with a distinct function: one for loading contaminated material and unloading reprocessed material; one for cleaning the incoming material; one for reprocessing the cleaned material; and one for eliminating the waste. The material flow among them is as depicted in Figure 1.11.

Figure 1.11 A metal reprocessing plant (the material flow among the four stations: 1, load and unload; 2, eliminate waste; 3, clean; 4, process).

Each station can be in one of several states, depending on the load at that time. The operator of the station has two actions available—‘pass material to next station in process’, and ‘suspend flow’. The state of the plant is a function of the state of each


of the stations; the higher the utilization of existing capacity the better, but exceeding full capacity is detrimental. Clearly, in any given global state of the system, the optimal action of each local station depends only on the action of the station directly ‘upstream’ from it. Thus in our example the global Q function becomes

    Q(a1, a2, a3, a4) = Q1(a1, a2) + Q2(a2, a4) + Q3(a1, a3) + Q4(a3, a4),

and we wish to compute

    arg max_{(a1,a2,a3,a4)} [ Q1(a1, a2) + Q2(a2, a4) + Q3(a1, a3) + Q4(a3, a4) ].

(Note that in the above expressions we omit the state argument, since that is being held fixed; we are looking at optimal action selection at a given state.)

held fixed; we are looking at optimal action selection at a given state.)In this case we can employ avariable eliminationalgorithm, which optimizes thevariable

elimination choice for the agents one at a time. We explain the operation of the algorithm via ourexample. Let us begin our optimization with agent 4. To optimize a4, functionsQ1

andQ3 are irrelevant. Hence, we obtain:

maxa1,a2,a3

Q1(a1, a2) +Q3(a1, a3) + maxa4

[Q2(a2, a4) +Q4(a3, a4)].

We see that to make the optimal choice over a4, the values of a2 and a3 must be known. In effect, what is being computed for agent 4 is a conditional strategy, with a (possibly) different action choice for each action choice of agents 2 and 3. The value which agent 4 brings to the system in the different circumstances can be summarized using a new function e4(A2, A3) whose value at the point a2, a3 is the value of the internal max expression:

    e4(a2, a3) = max_{a4} [ Q2(a2, a4) + Q4(a3, a4) ].

Agent 4 has now been “eliminated”, and our problem now reduces to computing

    max_{a1,a2,a3} [ Q1(a1, a2) + Q3(a1, a3) + e4(a2, a3) ],

having one fewer agent involved in the maximization. Next, the choice for agent 3 is made, giving

    max_{a1,a2} [ Q1(a1, a2) + e3(a1, a2) ],

where e3(a1, a2) = max_{a3} [ Q3(a1, a3) + e4(a2, a3) ]. Next, the choice for agent 2 is made:

    e2(a1) = max_{a2} [ Q1(a1, a2) + e3(a1, a2) ].

The remaining decision for agent 1 is now the following maximization:

    e1 = max_{a1} e2(a1).


The result e1 is simply a number, the value of the required maximization over a1, . . . , a4. Note that although this expression is short, there is no free lunch; in order to perform this optimization, one needs to iterate not only over all actions a1 of the first agent, but also over the actions of the other agents as needed to unwind the internal maximizations. However, in general the total number of combinations will be smaller than the full exponential combination of agent actions.5

We can recover the maximizing set of actions by performing the process in reverse: the maximizing choice for e1 defines the action a∗1 for agent 1:

    a∗1 = arg max_{a1} e2(a1).

To fulfill its commitment to agent 1, agent 2 must choose the value a∗2 which yielded e2(a∗1):

    a∗2 = arg max_{a2} [ Q1(a∗1, a2) + e3(a∗1, a2) ].

This, in turn, forces agent 3 and then agent 4 to select their actions appropriately:

    a∗3 = arg max_{a3} [ Q3(a∗1, a3) + e4(a∗2, a3) ],

and

    a∗4 = arg max_{a4} [ Q2(a∗2, a4) + Q4(a∗3, a4) ].
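The whole forward-and-backward procedure is compact in code. The sketch below runs variable elimination for the four-agent decomposition above, using small Q tables with invented numbers (two actions per agent), and checks the result against brute-force enumeration.

    # Variable elimination for Q = Q1(a1,a2) + Q2(a2,a4) + Q3(a1,a3) + Q4(a3,a4).
    from itertools import product

    ACTIONS = [0, 1]
    Q1 = {(a1, a2): 2*a1 - a1*a2 for a1, a2 in product(ACTIONS, repeat=2)}   # made-up values
    Q2 = {(a2, a4): a2 + a4 for a2, a4 in product(ACTIONS, repeat=2)}
    Q3 = {(a1, a3): 3*a3 - a1 for a1, a3 in product(ACTIONS, repeat=2)}
    Q4 = {(a3, a4): a3*a4 for a3, a4 in product(ACTIONS, repeat=2)}

    # Forward pass: eliminate agents 4, 3, 2, 1 in turn.
    e4 = {(a2, a3): max(Q2[a2, a4] + Q4[a3, a4] for a4 in ACTIONS)
          for a2, a3 in product(ACTIONS, repeat=2)}
    e3 = {(a1, a2): max(Q3[a1, a3] + e4[a2, a3] for a3 in ACTIONS)
          for a1, a2 in product(ACTIONS, repeat=2)}
    e2 = {a1: max(Q1[a1, a2] + e3[a1, a2] for a2 in ACTIONS) for a1 in ACTIONS}
    e1 = max(e2.values())                      # the optimal total Q value

    # Backward pass: recover the maximizing joint action.
    a1 = max(ACTIONS, key=lambda a: e2[a])
    a2 = max(ACTIONS, key=lambda a: Q1[a1, a] + e3[a1, a])
    a3 = max(ACTIONS, key=lambda a: Q3[a1, a] + e4[a2, a])
    a4 = max(ACTIONS, key=lambda a: Q2[a2, a] + Q4[a3, a])

    # Sanity check against brute-force enumeration of all joint actions.
    brute = max(Q1[b1, b2] + Q2[b2, b4] + Q3[b1, b3] + Q4[b3, b4]
                for b1, b2, b3, b4 in product(ACTIONS, repeat=4))
    assert abs(e1 - brute) < 1e-9
    print((a1, a2, a3, a4), e1)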

Two final comments. First, note that this procedure is naturally parallelizable. Each agent only needs to know when to perform his part, and to whom to communicate the results. Second, the variable-elimination technique is quite general. We discussed it in the particular context of multiagent MDPs, but it is relevant in any context in which multiple agents wish to perform a distributed optimization of a factorable objective function.

1.5 History and references

Distributed constraint satisfaction is discussed in detail in [Yokoo, 2001], and reviewed in [Yokoo & Hirayama, 2000]. Our section on distributed CSP follows the approach initially introduced in [Yokoo, 1994].

Distributed dynamic programming is discussed in detail in [Bertsekas, 1982]. LRTA∗ is introduced in [Korf, 1990], and our section follows that material, as well as [Yokoo & Ishida, 1999].

5. Full discussion of this point is beyond the scope of the book, but for the record, the complexity of the algorithm is exponential in the tree width of the coordination graph; this is the graph whose nodes are the agents and whose edges connect agents whose Q values share one or more arguments. The tree width is also the maximum clique size minus one in the triangulation of the graph; each triangulation essentially corresponds to one of the variable elimination orders. Unfortunately, it is NP-hard to compute the optimal ordering. The notes at the end of the chapter provide additional references on the topic.


Contract nets were introduced in [Smith, 1980], and [Davis & Smith, 1983] is perhaps the most influential publication on the topic. The marginal-cost interpretation of contract nets was introduced in [Sandholm, 1993], and the discussion of the capabilities and limitations of the various contract types (O, C, S and M) followed in [Sandholm, 1998]. Auction algorithms for linear programming are discussed broadly in [Bertsekas, 1991]. The specific algorithm for the matching problem is taken from [Bertsekas, 1992]. Auction algorithms for combinatorial problems in general are introduced in [Wellman, 1993], and the specific auction algorithms for the scheduling problem appear in [Wellman et al., 2001].

Distributed solutions to Markov Decision Problems are discussed in detail in [Guestrin, 2003]; the discussion there goes far beyond the specific problem of joint action selection covered here. The real-world sensor-net figure is also due to Carlos Guestrin.


2 Introduction to Non-Cooperative Game Theory: Games in Normal Form

Game theory is a mathematical study of interaction among independent, self-interested agents. It is the most developed theory of such interactions, and has been applied to disciplines as diverse as economics (historically, its main area of application), political science, biology, psychology, linguistics—and computer science. Here we concentrate on what has become the dominant branch of game theory, called non-cooperative game theory, and specifically on what can be said about normal-form games, a canonical representation in this discipline.

As an aside, the name “non-cooperative game theory” could be misleading, since it may suggest that the theory applies exclusively in situations in which the interests of different agents conflict. This is not the case, although it is fair to say that the theory is most interesting in such situations. By the same token, in Chapter 11 we will see that coalitional game theory (also known as cooperative game theory) does not apply only in situations in which the interests of the agents align with each other. The essential difference between the two branches is that in non-cooperative game theory the basic modeling unit is the individual (including his beliefs, preferences, and possible actions) while in coalitional game theory the basic modeling unit is the group. We will return to that later in Chapter 11, but for now let us proceed with the individualistic approach.

2.1 Self-interested agents

What does it mean to say that agents are self-interested? It does not necessarily mean that agents want to cause harm to each other, or even that they only care about themselves. Instead, it means that each agent has his own description of which states of the world he likes—which can include good things happening to other agents—and that he acts in an attempt to bring about these states of the world. In this section we will consider how to model such interests.

The dominant approach to modeling an agent's interests is utility theory. This theoretical approach aims to quantify an agent's degree of preference across a set of available alternatives. The theory also aims to understand how these preferences change when an agent faces uncertainty about which alternative he will receive. When we refer to an agent's utility function, as we will do throughout much of this book, we will be making an implicit assumption that the agent has desires about how to act which are consistent with utility-theoretic assumptions. Thus, before we discuss game theory (and thus interactions between multiple utility-theoretic agents), we should examine some key properties of utility functions and explain why they are believed to form a solid basis for a theory of preference and rational action.

38 2 Introduction to Non-Cooperative Game Theory: Games in Normal Form

be making an implicit assumption that the agent has desires about how to act whichare consistent with utility-theoretic assumptions. Thus,before we discuss game theory(and thus interactions betweenmultiple utility-theoretic agents), we should examinesome key properties of utility functions and explain why they are believed to form asolid basis for a theory of preference and rational action.

A utility function is a mapping from states of the world to real numbers. These numbers are interpreted as measures of an agent's level of happiness in the given states. When the agent is uncertain about which state of the world he faces, his utility is defined as the expected value of his utility function with respect to the appropriate probability distribution over states.

2.1.1 Example: friends and enemies

We begin with a simple example of how utility functions can be used as a basis for making decisions. Consider an agent Alice, who has three options: going to the club (c), going to a movie (m), or watching a video at home (h). If she is on her own, Alice has a utility of 100 for c, 50 for m and 50 for h. However, Alice is also interested in the activities of two other agents, Bob and Carol, who frequent both the club and the movie theater. Bob is Alice's nemesis; he's downright painful to be around. If Alice runs into Bob at the movies, she can try to ignore him and only suffers a disutility of 40; however, if she sees him at the club he'll pester her endlessly, yielding her a disutility of 90. Unfortunately, Bob prefers the club: he's there 60% of the time, spending the rest of his time at the movie theater. Carol, on the other hand, is Alice's friend. She makes everything more fun. Specifically, Carol increases Alice's utility for either activity by a factor of 1.5 (after taking into account the possible disutility of running into Bob). Carol can be found at the club 25% of the time, and the movie theater 75% of the time.

It will be easier to determine Alice's best course of action if we list Alice's utility for each possible state of the world. There are twelve outcomes that can occur: Bob and Carol can each be in either the club or the movie theater, and Alice can be in the club, the movie theater or at home. Alice has a baseline level of utility for each of her three actions, and this baseline is adjusted if either Bob, Carol or both are present. Following the description above, we see that Alice's utility is always 50 when she stays home, and for her other two activities it is given by Figure 2.1.

              B = c   B = m                     B = c   B = m
    C = c      15      150            C = c      50      10
    C = m      10      100            C = m      75      15

               A = c                             A = m

Figure 2.1 Alice's utility for the actions c and m.

So how should Alice choose among her three activities?


To answer this question we need to combine her utility function with her knowledge of Bob and Carol's randomized entertainment habits. Alice's expected utility for going to the club can be calculated as 0.25(0.6 · 15 + 0.4 · 150) + 0.75(0.6 · 10 + 0.4 · 100) = 51.75. In the same way, we can calculate her expected utility for going to the movies as 0.25(0.6 · 50 + 0.4 · 10) + 0.75(0.6 · 75 + 0.4 · 15) = 46.75. Of course, Alice gets an expected utility of 50 for staying home. Thus, Alice prefers to go to the club (even though Bob is often there and Carol rarely is), and prefers staying home to going to the movies (even though Bob is usually not at the movies and Carol almost always is).
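These two expected-utility calculations are easy to check mechanically; the short sketch below does so (the dictionary layout and names are ours, with the utilities taken from Figure 2.1).

    # A quick check of Alice's expected utilities.
    p_bob = {"club": 0.6, "movie": 0.4}
    p_carol = {"club": 0.25, "movie": 0.75}
    u = {  # Alice's utility, indexed by (Alice's action, Bob's location, Carol's location)
        ("c", "club", "club"): 15, ("c", "movie", "club"): 150,
        ("c", "club", "movie"): 10, ("c", "movie", "movie"): 100,
        ("m", "club", "club"): 50, ("m", "movie", "club"): 10,
        ("m", "club", "movie"): 75, ("m", "movie", "movie"): 15,
    }

    def expected_utility(action):
        return sum(p_bob[b] * p_carol[c] * u[(action, b, c)]
                   for b in p_bob for c in p_carol)

    print(expected_utility("c"))  # 51.75
    print(expected_utility("m"))  # 46.75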

2.1.2 Preferences and utility

Because the idea of utility is so pervasive, it may be hard to see why anyone would argue with the claim that it provides a sensible formal model for reasoning about an agent's happiness in different situations. However, when considered more carefully this claim turns out to be substantive, and hence requires justification. For example, why should a single-dimensional function be enough to explain preferences over an arbitrarily complicated set of alternatives (rather than, say, a function that maps to a point in a three-dimensional space, or to a point in a space whose dimensionality depends on the number of alternatives being considered)? And why should an agent's response to uncertainty be captured purely by the expected value of his utility function, rather than also depending on other properties of the distribution such as its standard deviation or number of modes?

Utility theorists respond to such questions by showing that the idea of utility can be grounded in a more basic concept of preferences. The most influential such theory is due to von Neumann and Morgenstern, and thus the utility functions are sometimes called von Neumann–Morgenstern utility functions to distinguish them from other varieties. We will present that theory here.

LetO denote a finite set of outcomes. For any pairo1, o2 ∈ O let o1 o2 denote theproposition that the agent weakly preferso1 to o2. Let o1 ∼ o2 denote the propositionthat the agent is indifferent betweeno1 ando2. Finally, byo1 ≻ o2 denote the proposi-tion that the agent strictly preferso1 to o2. Note that while the second two relations arenotationally convenient, the first relation is the only one we actually need. This isbecause we can defineo1 ≻ o2 as “o1 o2 and noto2 o1”, ando1 ∼ o2 as “o1 o2ando2 o1.”

We need a way to talk about how preferences interact with uncertainty about whichoutcome will be selected. In utility theory this is achievedthrough the concept oflotteries. A lottery is the random selection of one of a set of outcomes according to lotteriesspecified probabilities. Formally, a lottery is a probability distribution over outcomeswritten [p1 : o1, . . . , pk : ok], where eachoi ∈ O, eachpi ≥ 0 and

∑ki=1 pi = 1.

We consider lotteries to be outcomes themselves. Thus, let us extend the relation toapply to lotteries as well as to the elements ofO.

We are now able to begin stating the axioms of utility theory.These are constraintson the relation which, we will argue, make it consistent with our ideas of how pref-erences should behave.

Axiom 2.1.1 (Completeness)∀o1, o2, o1 ≻ o2 or o2 ≻ o1 or o1 ∼ o2.


The completeness axiom states that the relation ⪰ induces an ordering over the outcomes (which allows ties). For every pair of outcomes, either the agent prefers one to the other or he is indifferent between them.

Axiom 2.1.2 (Transitivity) If o1 ⪰ o2 and o2 ⪰ o3 then o1 ⪰ o3.

There is good reason to feel that every agent should have transitive preferences. If an agent's preferences were non-transitive, then there would exist some triple of outcomes o1, o2 and o3 for which o1 ⪰ o2, o2 ⪰ o3 and o3 ≻ o1. We can show that such an agent would be willing to engage in behavior that is hard to call rational. Consider a world in which o1, o2 and o3 correspond to owning three different items, and an agent who currently owns the item o3. Since o2 ⪰ o3, there must be some non-negative amount of money that the agent would be willing to pay in order to exchange o3 for o2. (If o2 ≻ o3 then this amount would be strictly positive; if o2 ∼ o3 then it would be zero.) Similarly, the agent would pay a non-negative amount of money to exchange o2 for o1. However, from non-transitivity (o3 ≻ o1) the agent would also pay a strictly positive amount of money to exchange o1 for o3. The agent would thus be willing to pay a strictly positive sum to exchange o3 for o3 in three steps. Such an agent could quickly be separated from any amount of money, which is why such a scheme is known as a money pump.

Axiom 2.1.3 (Substitutability) If o1 ∼ o2 then for all sequences of one or more outcomes o3, . . . , ok and sets of probabilities p, p3, . . . , pk for which p + ∑_{i=3}^{k} pi = 1, [p : o1, p3 : o3, . . . , pk : ok] ∼ [p : o2, p3 : o3, . . . , pk : ok].

Let Pℓ(oi) denote the probability that outcome oi is selected by lottery ℓ. For example, if ℓ = [0.3 : o1; 0.7 : [0.8 : o2; 0.2 : o1]] then Pℓ(o1) = 0.44 and Pℓ(o3) = 0.

Axiom 2.1.4 (Decomposability) If ∀oi ∈ O, Pℓ1(oi) = Pℓ2(oi) then ℓ1 ∼ ℓ2.

These axioms describe the way preferences change when lotteries are introduced. Substitutability states that if an agent is indifferent between two outcomes, he is also indifferent between two lotteries that differ only in which of these outcomes is offered. Decomposability states that an agent is always indifferent between lotteries that induce the same probabilities over outcomes, no matter whether these probabilities are expressed through a single lottery or nested in a lottery over lotteries. For example, [p : o1, 1 − p : [q : o2, 1 − q : o3]] ∼ [p : o1, (1 − p)q : o2, (1 − p)(1 − q) : o3]. Decomposability is sometimes called the "no fun in gambling" axiom because it implies that, all else being equal, the number of times an agent "rolls dice" has no effect on his preferences.
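The probabilities Pℓ(oi) used by decomposability can be computed by recursively flattening a nested lottery. The sketch below (our own illustration, not part of the text) represents a lottery as a list of (probability, prize) pairs, where a prize is either an outcome name or another lottery, and it reproduces the example above.

```python
def flatten(lottery):
    """Return the induced probability P_l(o) over outcomes for a possibly nested lottery.

    A lottery is a list of (probability, prize) pairs; a prize is either an
    outcome (a string) or itself a lottery (a list).
    """
    probs = {}
    for p, prize in lottery:
        if isinstance(prize, list):            # nested lottery: recurse and scale
            for o, q in flatten(prize).items():
                probs[o] = probs.get(o, 0.0) + p * q
        else:                                  # base case: an outcome
            probs[prize] = probs.get(prize, 0.0) + p
    return probs

# The lottery l = [0.3 : o1, 0.7 : [0.8 : o2, 0.2 : o1]] from the text:
l = [(0.3, "o1"), (0.7, [(0.8, "o2"), (0.2, "o1")])]
print(flatten(l))   # {'o1': 0.44, 'o2': 0.56}; P_l(o3) = 0
```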

Axiom 2.1.5 (Monotonicity) If o1 ≻ o2 and p > q then [p : o1, 1 − p : o2] ≻ [q : o1, 1 − q : o2].

The monotonicity axiom says that agents prefer more of a good thing. When an agent prefers o1 to o2 and considers two lotteries over these outcomes, he prefers the lottery that assigns the larger probability to o1. This property is called monotonicity because it does not depend on the numerical values of the probabilities—the more weight o1 receives, the happier the agent will be.


Figure 2.2  Relationship between o2 and ℓ(p).

Lemma 2.1.6 If a preference relation ⪰ satisfies the axioms Completeness, Transitivity, Decomposability and Monotonicity, and if o1 ≻ o2 and o2 ≻ o3 then there exists some probability p such that for all p′ < p, o2 ≻ [p′ : o1; (1 − p′) : o3], and for all p′′ > p, [p′′ : o1; (1 − p′′) : o3] ≻ o2.

Proof. Denote the lottery [p : o1; (1 − p) : o3] as ℓ(p). Consider some plow for which o2 ≻ ℓ(plow). Such a plow must exist since o2 ≻ o3; for example, by decomposability plow = 0 satisfies this condition. By monotonicity, ℓ(plow) ≻ ℓ(p′) for any 0 ≤ p′ < plow, and so by transitivity ∀p′ ≤ plow, o2 ≻ ℓ(p′). Consider some phigh for which ℓ(phigh) ≻ o2. By monotonicity, ℓ(p′) ≻ ℓ(phigh) for any 1 ≥ p′ > phigh, and so by transitivity ∀p′ ≥ phigh, ℓ(p′) ≻ o2. We thus know the relationship between ℓ(p) and o2 for all values of p except those on the interval (plow, phigh). This is illustrated in Figure 2.2 (left).

Consider p∗ = (plow + phigh)/2, the midpoint of our interval. By completeness, o2 ≻ ℓ(p∗) or ℓ(p∗) ≻ o2 or o2 ∼ ℓ(p∗). First consider the case o2 ∼ ℓ(p∗). It cannot be that there is also another point p′ ≠ p∗ for which o2 ∼ ℓ(p′): this would entail ℓ(p∗) ∼ ℓ(p′) by transitivity, and since o1 ≻ o3 this would violate monotonicity. For all p′ ≠ p∗, then, it must be that either o2 ≻ ℓ(p′) or ℓ(p′) ≻ o2. By the arguments above, if there was a point p′ > p∗ for which o2 ≻ ℓ(p′) then ∀p′′ < p′, o2 ≻ ℓ(p′′), contradicting o2 ∼ ℓ(p∗). Similarly there cannot be a point p′ < p∗ for which ℓ(p′) ≻ o2. The relationship that must therefore hold between o2 and ℓ(p) is illustrated in Figure 2.2 (right). Thus in the case o2 ∼ ℓ(p∗) we have our result.

Otherwise if o2 ≻ ℓ(p∗) then by the argument above o2 ≻ ℓ(p′) for all p′ ≤ p∗. Thus we can redefine plow—the lower bound of the interval of values for which we do not know the relationship between o2 and ℓ(p)—to be p∗. Likewise, if ℓ(p∗) ≻ o2 then we can redefine phigh = p∗. Either way, our interval (plow, phigh) is halved. We can continue to iterate the above argument, examining the midpoint of the updated interval (plow, phigh). Either we will encounter a p∗ for which o2 ∼ ℓ(p∗), or in the limit plow will approach some p from below, and phigh will approach that p from above.
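The halving argument in the proof is essentially a bisection search. As an illustration only (going beyond the axioms by assuming we can answer each preference query numerically with a utility function; the numbers below are hypothetical), the sketch locates the threshold probability p of Lemma 2.1.6.

```python
def indifference_probability(u1, u2, u3, tol=1e-9):
    """Bisection for the p of Lemma 2.1.6, assuming numeric utilities with u1 > u2 > u3.

    prefers_lottery(p) answers the query 'is l(p) = [p : o1, (1-p) : o3] strictly
    preferred to o2?' -- here simulated via expected utility (an assumption, not the axioms).
    """
    def prefers_lottery(p):
        return p * u1 + (1 - p) * u3 > u2

    p_low, p_high = 0.0, 1.0          # o2 beats l(0), and l(1) beats o2, since u1 > u2 > u3
    while p_high - p_low > tol:
        p_mid = (p_low + p_high) / 2
        if prefers_lottery(p_mid):
            p_high = p_mid            # l(p_mid) preferred to o2: shrink from above
        else:
            p_low = p_mid             # o2 at least as good as l(p_mid): shrink from below
    return (p_low + p_high) / 2

print(indifference_probability(100, 46.75, 0))   # about 0.4675
```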

Something our axioms do not tell us is what preference relation holds between o2 and the lottery [p : o1; (1 − p) : o3]. It could be that the agent strictly prefers o2 in this case, that the agent strictly prefers the lottery, or that the agent is indifferent. Our final axiom says that the third alternative—depicted in Figure 2.2 (right)—always holds.

Axiom 2.1.7 (Continuity) If o1 ≻ o2 and o2 ≻ o3 then ∃p ∈ [0, 1] such that o2 ∼ [p : o1, 1 − p : o3].

If we accept Axioms 2.1.1–2.1.5 and 2.1.7, it turns out that we have no choice but to accept the existence of single-dimensional utility functions whose expected values agents want to maximize. (And if we do not want to reach this conclusion, we must therefore give up at least one of the axioms.) This fact is stated as the following theorem.

Theorem 2.1.8 (von Neumann and Morgenstern, 1944) If a preference relation ⪰ satisfies the axioms Completeness, Transitivity, Substitutability, Decomposability, Monotonicity and Continuity, then there exists a function u : O → [0, 1] with the properties that

1. u(o1) ≥ u(o2) iff o1 ⪰ o2, and

2. u([p1 : o1, . . . , pk : ok]) = ∑_{i=1}^{k} pi u(oi).

Proof. If the agent is indifferent between all outcomes, then for all oi ∈ O set u(oi) = 0. In this case Part 1 follows trivially (both sides of the implication are always true), and Part 2 is immediate from decomposability.

Otherwise, there must be a set of one or more most-preferred outcomes and a disjoint set of one or more least-preferred outcomes. (There may of course be other outcomes belonging to neither set.) Label one of the most-preferred outcomes as o̅ and one of the least-preferred outcomes as o̲. For any outcome oi, define u(oi) to be the number pi such that oi ∼ [pi : o̅, (1 − pi) : o̲]. By continuity such a number exists; by Lemma 2.1.6 it is unique.

Part 1: u(o1) ≥ u(o2) iff o1 ⪰ o2.
Let ℓ1 be the lottery such that o1 ∼ ℓ1 = [u(o1) : o̅; 1 − u(o1) : o̲]; similarly, let ℓ2 be the lottery such that o2 ∼ ℓ2 = [u(o2) : o̅; 1 − u(o2) : o̲]. First, we show that u(o1) ≥ u(o2) ⇒ o1 ⪰ o2. If u(o1) > u(o2) then, since o̅ ≻ o̲ we can conclude that ℓ1 ≻ ℓ2 by monotonicity. Thus we have o1 ∼ ℓ1 ≻ ℓ2 ∼ o2; by transitivity, substitutability and decomposability this gives o1 ≻ o2. If u(o1) = u(o2) then ℓ1 and ℓ2 are identical lotteries; thus o1 ∼ ℓ1 ≡ ℓ2 ∼ o2, and transitivity gives o1 ∼ o2.

Now we must show that o1 ⪰ o2 ⇒ u(o1) ≥ u(o2). It suffices to prove the contrapositive of this statement, u(o1) ≱ u(o2) ⇒ o1 ⋡ o2, which can be rewritten as u(o2) > u(o1) ⇒ o2 ≻ o1 by completeness. This statement was already proven above (with the labels o1 and o2 swapped).

Part 2: u([p1 : o1, . . . , pk : ok]) = ∑_{i=1}^{k} pi u(oi).
Let u∗ = u([p1 : o1, . . . , pk : ok]). From the construction of u we know that oi ∼ [u(oi) : o̅, (1 − u(oi)) : o̲]. By substitutability, we can replace each oi in the definition of u∗ by the lottery [u(oi) : o̅, (1 − u(oi)) : o̲], giving us u∗ = u([p1 : [u(o1) : o̅, (1 − u(o1)) : o̲], . . . , pk : [u(ok) : o̅, (1 − u(ok)) : o̲]]). This nested lottery only selects between the two outcomes o̅ and o̲. This means that we can use decomposability to conclude u∗ = u([(∑_{i=1}^{k} pi u(oi)) : o̅, (1 − ∑_{i=1}^{k} pi u(oi)) : o̲]). By our definition of u, u∗ = ∑_{i=1}^{k} pi u(oi).

One might wonder why we do not use money to express the real-valued quantity that rational agents want to maximize, rather than inventing the new concept of utility. The reason is that while it is reasonable to assume that all agents get happier the more money they have, it is often not reasonable to assume that agents care only about the expected values of their bank balances. For example, consider a situation in which an agent is offered a gamble between a payoff of two million and a payoff of zero, with even odds. When the outcomes are measured in units of utility ("utils") then Theorem 2.1.8 tells us that the agent would prefer this gamble to a sure payoff of 999,999 utils. However, if the outcomes were measured in money, few of us would prefer to gamble—most people would prefer a guaranteed payment of nearly a million dollars to a double-or-nothing bet. This is not to say that utility-theoretic reasoning goes out the window when money is involved. It simply points out that utility and money are often not linearly related. This issue is discussed in more detail in Section 9.3.1.

What if we want a utility function that is not confined to the range [0, 1], such as the one we had in our friends and enemies example? Luckily, Theorem 2.1.8 does not require that every utility function maps to this range; it simply shows that one such utility function must exist for every set of preferences that satisfy the required axioms. Indeed, the absolute magnitudes of the utility function evaluated at different outcomes are unimportant. Instead, every positive affine transformation of a utility function yields another utility function for the same agent (in the sense that it will also satisfy both properties of Theorem 2.1.8). In other words, if u(o) is a utility function for a given agent then u′(o) = au(o) + b is also a utility function for the same agent, as long as a and b are constants and a is positive.
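As a quick illustrative check (our own example, using hypothetical utility numbers), a positive affine transformation leaves the expected-utility ranking of lotteries unchanged, because expectation is linear.

```python
def expected_utility(lottery, u):
    """Expected utility of a simple (non-nested) lottery given as [(p, outcome), ...]."""
    return sum(p * u[o] for p, o in lottery)

u = {"o1": 0.9, "o2": 0.4, "o3": 0.1}                  # hypothetical utilities in [0, 1]
u_affine = {o: 7 * v + 3 for o, v in u.items()}        # u'(o) = 7 u(o) + 3, with a = 7 > 0

l1 = [(0.5, "o1"), (0.5, "o3")]
l2 = [(1.0, "o2")]

# The ranking of the two lotteries is the same under both utility functions:
assert (expected_utility(l1, u) > expected_utility(l2, u)) == \
       (expected_utility(l1, u_affine) > expected_utility(l2, u_affine))
```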

2.2 Games in normal form

We have seen that under reasonable assumptions about preferences, agents will always have utility functions whose expected values they want to maximize. This suggests that acting optimally in an uncertain environment is conceptually straightforward—at least as long as the outcomes and their probabilities are known to the agent and can be succinctly represented. However, things can get considerably more complicated when the world contains two or more utility-maximizing agents whose actions can affect each other's utility. (To augment our example from Section 2.1.1, what if Bob hates Alice and wants to avoid her too, while Carol is indifferent to seeing Alice and has a crush on Bob? In this case, we might want to revisit our previous assumption that Bob and Carol will act randomly without caring about what the other two agents do.) To study such settings, we will turn to game theory.


2.2.1 Example: The TCP user’s game

Let us begin with a simpler example to provide some intuition about the type of phenomena we would like to study. Imagine that you and another colleague are the only people using the internet. Internet traffic is governed by the TCP protocol. One feature of TCP is the backoff mechanism; if the rates at which you and your colleague send information packets into the network cause congestion, you each back off and reduce the rate for a while until the congestion subsides. This is how a correct implementation works. A defective one, however, will not back off when congestion occurs. You have two possible strategies: C (for using a Correct implementation) and D (for using a Defective one). If both you and your colleague adopt C then your average packet delay is 1ms (millisecond). If you both adopt D the delay is 3ms, because of additional overhead at the network router. Finally, if one of you adopts D and the other adopts C then the D adopter will experience no delay at all, but the C adopter will experience a delay of 4ms.

These consequences are shown in Figure 2.3. Your options are the two rows, and your colleague's options are the columns. In each cell, the first number represents your payoff (or, minus your delay), and the second number represents your colleague's payoff.1

           C          D
   C    −1, −1     −4, 0
   D     0, −4     −3, −3

Figure 2.3  The TCP user's (aka the Prisoner's) Dilemma.

Given these options what should you adopt, C or D? Does it depend on what you think your colleague will do? Furthermore, from the perspective of the network operator, what kind of behavior can he expect from the two users? Will any two users behave the same when presented with this scenario? Will the behavior change if the network operator allows the users to communicate with each other before making a decision? Under what changes to the delays would the users' decisions still be the same? How would the users behave if they have the opportunity to face this same decision with the same counterpart multiple times? Do answers to the above questions depend on how rational the agents are and how they view each other's rationality?

Game theory gives answers to many of these questions. It tells us that any rational user, when presented with this scenario once, will adopt D—regardless of what the other user does. It tells us that allowing the users to communicate beforehand will not change the outcome. It tells us that for perfectly rational agents, the decision will remain the same even if they play multiple times; however, if the number of times that the agents will play this is infinite, or even uncertain, we may see them adopt C.

1. A more standard name for this game is the Prisoner’s dilemma; we return to this in Section 2.2.3.


2.2.2 Definition of games in normal form

The normal form, also known as the strategic or matrix form, is the most familiar representation of strategic interactions in game theory. A game written in this way amounts to a representation of every player's utility for every state of the world, in the special case where states of the world depend only on the players' combined actions. Consideration of this special case may seem uninteresting. However, it turns out that settings in which the state of the world also depends on randomness in the environment—called Bayesian games and introduced in Section 5.3—can be reduced to (much larger) normal form games. Indeed, there also exist normal-form reductions for other game representations, such as games that involve an element of time (extensive form games, introduced in Chapter 4). Because most other representations of interest can be reduced to it, the normal form representation is arguably the most fundamental in game theory.

Definition 2.2.1 (Normal-form game) A (finite, n-person) normal-form game is a tuple (N, A, O, µ, u), where

• N is a finite set of n players, indexed by i;

• A = (A1, . . . , An), where Ai is a finite set of actions (or pure strategies; we will use the terms interchangeably) available to player i. Each vector a = (a1, . . . , an) ∈ A is called an action profile (or pure strategy profile);

• O is a set of outcomes;

• µ : A → O determines the outcome as a function of the action profile; and

• u = (u1, . . . , un) where ui : O → R is a real-valued utility (or payoff) function for player i.

It may not be obvious why we need the notion of an outcome as distinct from a strategy profile. Indeed, often we do not. In such cases the game has the simpler form (N, A, u), and in much of the remainder of the book we will adopt this form. However, two comments are in order here. First, from the formal standpoint (N, A, u) is viewed as shorthand for the game (N, A, A, I, u) where I is the identity mapping. Second, sometimes it will be quite important to distinguish action profiles and outcomes; in particular this will be true when we speak about mechanism design and auctions in Chapters 9 and 10.

A natural way to represent games is via an n-dimensional matrix. Figure 2.4 illustrates this for the special case of 2-player, 2-strategy games. In this representation, each row corresponds to a possible action for player 1, each column corresponds to a possible action for player 2, and each cell corresponds to one possible outcome. Each player's utility for an outcome is written in the cell corresponding to that outcome, with player 1's utility listed first.


                                        Player 2

                           a_2^1                                a_2^2

Player 1   a_1^1   u_1(a_1^1, a_2^1), u_2(a_1^1, a_2^1)   u_1(a_1^1, a_2^2), u_2(a_1^1, a_2^2)

           a_1^2   u_1(a_1^2, a_2^1), u_2(a_1^2, a_2^1)   u_1(a_1^2, a_2^2), u_2(a_1^2, a_2^2)

Figure 2.4  The matrix representation of a two-player, two-strategy normal-form game.
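Concretely, a finite game in the simpler (N, A, u) form can be stored as a payoff table indexed by action profiles. The sketch below (our own illustration; the variable names are not from the text) encodes the TCP user's game of Figure 2.3 this way and looks up each player's utility for every pure action profile.

```python
from itertools import product

# The TCP user's game of Figure 2.3 in (N, A, u) form.
players = [1, 2]
actions = {1: ["C", "D"], 2: ["C", "D"]}
# payoffs[(a1, a2)] = (u1, u2)
payoffs = {
    ("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),  ("D", "D"): (-3, -3),
}

def u(i, profile):
    """Utility of player i (1 or 2) for a pure action profile."""
    return payoffs[profile][i - 1]

for profile in product(actions[1], actions[2]):
    print(profile, u(1, profile), u(2, profile))
```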

2.2.3 More examples of normal-form games

Prisoner’s dilemma

Above we already saw an example of a game in normal form, namely the Prisoner's (or the TCP user's) Dilemma. As discussed in Section 2.1.2, the precise payoff numbers play a limited role. The essence of the Prisoner's Dilemma example would not change if the −4 was replaced by −5, or if 100 was added to each of the numbers. In its most general form, the Prisoner's Dilemma is any normal-form game shown in Figure 2.5 in which c > a > d > b.2

           C        D
   C     a, a     b, c
   D     c, b     d, d

Figure 2.5  Any c > a > d > b define an instance of Prisoner's Dilemma.

Incidentally, the name "Prisoner's Dilemma" for this famous game-theoretic situation derives from the original story accompanying the numbers. The players of the game are two prisoners suspected of a crime rather than two network users. The prisoners are taken to separate interrogation rooms, and each can either Confess to the crime or Deny it (or, alternatively, Cooperate or Defect). The absolute values of the numbers represent the length of jail term each prisoner will get in each scenario.

Common-payoff games

There are some restricted classes of normal-form games that deserve special mention. The first is the class of common-payoff games. These are games in which, for every action profile, all players have the same payoff.

2. Under some definitions, there is the further requirement that a > (b + c)/2, which guarantees that the outcome (C, C) maximizes the sum of the agents' utilities.


Definition 2.2.2 (Common-payoff game) A common-payoff game, or team game, is a game in which for all action profiles a ∈ A1 × · · · × An and any pair of agents i, j, it is the case that ui(a) = uj(a).

Common-payoff games are also called pure coordination games, since in such games the agents have no conflicting interests; their sole challenge is to coordinate on an action that is maximally beneficial to all.

Because of their special nature, we often represent common-payoff games with an abbreviated form of the matrix in which we list only one payoff in each of the cells.

As an example, imagine two drivers driving towards each other in a country without traffic rules, and who must independently decide whether to drive on the left or on the right. If the players choose the same side (left or right) they have some high utility, and otherwise they have a low utility. The game matrix is shown in Figure 2.6.

            Left     Right
   Left     1, 1     0, 0
   Right    0, 0     1, 1

Figure 2.6  Coordination game.

Zero-sum games

At the other end of the spectrum from pure coordination games lie zero-sum games, which (bearing in mind the comment we made earlier about positive affine transformations) are more properly called constant-sum games. Unlike common-payoff games, constant-sum games are meaningful primarily in the context of two-player (though not necessarily two-strategy) games.

Definition 2.2.3 (Constant-sum game) A two-player normal-form game is constant-sum if there exists a constant c such that for each strategy profile a ∈ A1 × A2 it is the case that u1(a) + u2(a) = c.

For convenience, when we talk of constant-sum games going forward we will always assume that c = 0, that is, that we have a zero-sum game. If common-payoff games represent situations of pure coordination, zero-sum games represent situations of pure competition; one player's gain must come at the expense of the other player. The reason zero-sum games are most meaningful for two agents is that if you allow more agents, any game can be turned into a zero-sum game by adding a dummy player whose actions don't impact the payoffs to the other agents, and whose own payoffs are chosen to make the sum of payoffs in each outcome zero.

As in the case of common-payoff games, we can use an abbreviated matrix form to represent zero-sum games, in which we write only one payoff value in each cell. This value represents the payoff of player 1, and thus the negative of the payoff of player 2. Note, though, that whereas the full matrix representation is unambiguous, when we use the abbreviation we must explicitly state whether this matrix represents a common-payoff game or a zero-sum one.

A classical example of a zero-sum game is the game of matching pennies. In this game, each of the two players has a penny, and independently chooses to display either heads or tails. The two players then compare their pennies. If they are the same then player 1 pockets both, and otherwise player 2 pockets them. The payoff matrix is shown in Figure 2.7.

            Heads     Tails
   Heads    1, −1     −1, 1
   Tails    −1, 1     1, −1

Figure 2.7  Matching Pennies game.

The popular children's game of Rock, Paper, Scissors, also known as Rochambeau, provides a three-strategy generalization of the matching-pennies game. The payoff matrix of this zero-sum game is shown in Figure 2.8. In this game, each of the two players can choose either Rock, Paper, or Scissors. If both players choose the same action, there is no winner, and the utilities are zero. Otherwise, each of the actions wins over one of the other actions, and loses to the other remaining action.

               Rock      Paper     Scissors
   Rock        0, 0     −1, 1       1, −1
   Paper       1, −1     0, 0      −1, 1
   Scissors   −1, 1      1, −1      0, 0

Figure 2.8  Rock, Paper, Scissors game.

Battle of the sexes

In general, games tend to include elements of both coordination and competition. Prisoner's Dilemma does, although in a rather paradoxical way. Here is another well-known game that includes both elements. In this game, called Battle of the Sexes, a husband and wife wish to go to the movies, and they can select among two movies: "Violence Galore (VG)" and "Gentle Love (GL)". They much prefer to go together rather than to separate movies, but while the wife (player 1) prefers VG, the husband (player 2) prefers GL. The payoff matrix is shown in Figure 2.9. We will return to this game shortly.

                 Husband
               VG       GL
   Wife  VG   2, 1     0, 0
         GL   0, 0     1, 2

Figure 2.9  Battle of the Sexes game.

2.2.4 Strategies in normal-form games

We have so far defined the actions available to each player in a game, but not yet his set of strategies, or his available choices. Certainly one kind of strategy is to select a single action and play it. We call such a strategy a pure strategy, and we will use the notation we have already developed for actions to represent it. As we mentioned above, we call a choice of pure strategy for each agent a pure strategy profile.

Players could also follow another, less obvious type of strategy: randomizing over the set of available actions according to some probability distribution. Such a strategy is called a mixed strategy. Although it may not be immediately obvious why a player should introduce randomness into his choice of action, in fact in a multiagent setting the role of mixed strategies is critical. We will return to this when we discuss solution concepts for games in the next section.

We define a mixed strategy for a normal-form game as follows.

Definition 2.2.4 (Mixed strategy) Let (N, (A1, . . . , An), O, µ, u) be a normal-form game, and for any set X let Π(X) be the set of all probability distributions over X. Then the set of mixed strategies for player i is Si = Π(Ai).

Definition 2.2.5 (Mixed strategy profile) The set of mixed strategy profiles is simply the Cartesian product of the individual mixed strategy sets, S1 × · · · × Sn.

By si(ai) we denote the probability that an action ai will be played under mixed strategy si. The subset of actions that are assigned positive probability by the mixed strategy si is called the support of si.

Definition 2.2.6 (Support) The support of a mixed strategy si for a player i is the set of pure strategies {ai | si(ai) > 0}.

Note that a pure strategy is a special case of a mixed strategy, in which the support is a single action. At the other end of the spectrum we have fully mixed strategies. A strategy is fully mixed if it has full support (that is, if it assigns every action a non-zero probability).

We have not yet defined the payoffs of players given a particular strategy profile, since the payoff matrix defines those directly only for the special case of pure strategy profiles. But the generalization to mixed strategies is straightforward, and relies on the basic notion of decision theory—expected utility. Intuitively, we first calculate the probability of reaching each outcome given the strategy profile, and then we calculate the average of the payoffs of the outcomes, weighted by the probabilities of each outcome. Formally, we define the expected utility as follows (overloading notation, we use ui for both utility and expected utility).

Definition 2.2.7 (Expected utility of a mixed strategy) Given a normal-form game (N, A, u), the expected utility ui for player i of the mixed strategy profile s = (s1, . . . , sn) is defined as

ui(s) = ∑_{a∈A} ui(a) ∏_{j=1}^{n} sj(aj).
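This definition translates directly into code. The sketch below (our own illustration, not from the text) computes ui(s) for the Battle of the Sexes payoffs of Figure 2.9 under an example mixed-strategy profile.

```python
from itertools import product

def expected_utility(payoffs, actions, mixed_profile, i):
    """u_i(s) = sum over pure action profiles a of u_i(a) * prod_j s_j(a_j)."""
    total = 0.0
    for profile in product(*actions):
        prob = 1.0
        for j, a_j in enumerate(profile):
            prob *= mixed_profile[j][a_j]
        total += prob * payoffs[profile][i]
    return total

# Battle of the Sexes (Figure 2.9): player 0 is the wife, player 1 the husband.
actions = (["VG", "GL"], ["VG", "GL"])
payoffs = {("VG", "VG"): (2, 1), ("VG", "GL"): (0, 0),
           ("GL", "VG"): (0, 0), ("GL", "GL"): (1, 2)}

# An example mixed-strategy profile: wife plays VG with probability 2/3, husband with 1/3.
s = ({"VG": 2/3, "GL": 1/3}, {"VG": 1/3, "GL": 2/3})
print(expected_utility(payoffs, actions, s, 0))  # 2/3
print(expected_utility(payoffs, actions, s, 1))  # 2/3
```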

2.3 Solution concepts for normal-form games

Now that we have defined what a game in normal form is and what strategies are available to players in them, the question is how to reason about such games. In single-agent decision theory the key notion is that of an optimal strategy, that is, a strategy that maximizes the agent's expected payoff for a given environment in which the agent operates. The situation in the single-agent case can be fraught with uncertainty, since the environment might be stochastic, partially observable, and spring all kinds of surprises on the agent. However, the situation is even more complex in a multiagent setting. In this case the environment includes—or, in many cases we discuss, consists entirely of—other agents, all of which are also hoping to maximize their payoffs. Thus the notion of an optimal strategy for a given agent is not meaningful; the best strategy depends on the choices of others.

Generally speaking, we reason about multi-player games using solution concepts, principles according to which we identify interesting subsets of the outcomes of a game. The most important solution concept is the Nash equilibrium, which we will discuss in a moment. There are also a large number of other solution concepts, only some of which we will discuss here. Many of these concepts relate to the Nash equilibrium in one way or another—some are more restrictive than the Nash equilibrium, some less so, and some noncomparable. In Chapters 4 and 5 we will introduce some additional solution concepts that are only applicable to game representations other than the normal form.

2.3.1 Pareto optimality

First, let us consider games from the point of view of an outside observer. Can some outcomes of a game be said to be better than others?


Observe that this question is complicated because we have no way of saying that one agent's interests are more important than another's. For example, it might be tempting to say that we should prefer outcomes in which the sum of agents' utilities is higher. However, recall from Section 2.1.2 that we can apply any positive affine transformation to an agent's utility function and obtain another valid utility function. For example, we could multiply all of player 1's payoffs by 1000—this could clearly change which outcome maximized the sum of agents' utilities.

Thus, our problem is to find a way of saying that some outcomes are better than others, even when we only know agents' utility functions up to a positive affine transformation. Imagine that each agent's utility is a monetary payment that you will receive, but that each payment comes in a different currency, and you do not know anything about the exchange rates. Which outcomes should you prefer? Observe that, while it is not usually possible to identify the best outcome, there are situations in which you can be sure that one outcome is better than another. For example, it is better to get 10 units of currency 1 and 3 units of currency 2 than to get 9 units of currency 1 and 3 units of currency 2, regardless of the exchange rate. We formalize this intuition in the following definition.

Definition 2.3.1 (Pareto domination) Strategy profile s Pareto dominates strategy profile s′ if for all i ∈ N, ui(s) ≥ ui(s′), and there exists some j ∈ N for whom uj(s) > uj(s′).

In other words, in a Pareto dominated outcome some player can be made better off without making any other player worse off. Observe that we define Pareto domination over strategy profiles, not just action profiles.

Pareto domination gives us a partial ordering over the outcomes in a game. Thus we cannot generally identify a single "best" outcome; instead, we may have a set of optima that we cannot decide between.

Definition 2.3.2 (Pareto optimality) Strategy profile s is Pareto optimal, or strictly Pareto efficient, if there does not exist another strategy profile s′ ∈ S that Pareto dominates s.

We can easily draw several conclusions about Pareto optimal outcomes. First, every game must have at least one such optimum. Second, some games will have more: for example, in zero-sum games, all strategy profiles are strictly Pareto efficient. Finally, in common-payoff games, every Pareto optimal outcome has the same payoffs.
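For pure action profiles, Pareto-dominated and Pareto-optimal outcomes can be found by direct enumeration. A minimal sketch (our own), applied to the Prisoner's Dilemma of Figure 2.3:

```python
def pareto_dominates(u_a, u_b):
    """True if payoff vector u_a Pareto dominates payoff vector u_b."""
    return all(x >= y for x, y in zip(u_a, u_b)) and any(x > y for x, y in zip(u_a, u_b))

def pareto_optimal(payoffs):
    """Return the action profiles not Pareto dominated by any other profile."""
    return [a for a, u_a in payoffs.items()
            if not any(pareto_dominates(u_b, u_a) for b, u_b in payoffs.items() if b != a)]

# Prisoner's Dilemma payoffs (Figure 2.3):
pd = {("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
      ("D", "C"): (0, -4),  ("D", "D"): (-3, -3)}
print(pareto_optimal(pd))   # every profile except (D, D)
```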

2.3.2 Best response and Nash equilibrium

Now let us look at games from an individual agent's point of view, rather than as an outside observer. Our first observation is that if an agent knew how the others were going to play, his strategic problem would become simple. Specifically, he would be left with the single-agent problem of choosing a utility-maximizing action that we discussed in Section 2.1.

Formally, define s−i = (s1, . . . , si−1, si+1, . . . , sn), a strategy profile s without agent i's strategy. Thus we can write s = (si, s−i). If the agents other than i were to commit to play s−i, a utility-maximizing agent i would face the problem of determining his best response.

Definition 2.3.3 (Best response) Player i's best response to the strategy profile s−i is a mixed strategy s∗i ∈ Si such that ui(s∗i, s−i) ≥ ui(si, s−i) for all strategies si ∈ Si.

The best response is not necessarily unique. Indeed, except in the extreme case in which there is a unique best response that is a pure strategy, the number of best responses is always infinite. When the support of a best response s∗ includes two or more actions, the agent must be indifferent between them—otherwise the agent would prefer to reduce the probability of playing at least one of the actions to zero. But then any mixture of these actions must also be a best response, not only the particular mixture in s∗. Similarly, if there are two pure strategies that are individually best responses, any mixture of the two is necessarily also a best response.

Of course, in general an agent won't know what strategies the other players will adopt. Thus, the notion of best response is not a solution concept—it does not identify an interesting set of outcomes in this general case. However, we can leverage the idea of best response to define what is arguably the most central notion in non-cooperative game theory, the Nash equilibrium.

Definition 2.3.4 (Nash equilibrium) A strategy profile s = (s1, . . . , sn) is a Nash equilibrium if, for all agents i, si is a best response to s−i.

Intuitively, a Nash equilibrium is a stable strategy profile: no agent would want to change his strategy if he knew what strategies the other agents were following. This is because in a Nash equilibrium all of the agents simultaneously play best responses to each other's strategies.

We can divide Nash equilibria into two categories, strict and weak, depending on whether or not every agent's strategy constitutes a unique best response to the other agents' strategies.

Definition 2.3.5 (Strict Nash) A strategy profile s = (s1, . . . , sn) is a strict Nash equilibrium if, for all agents i and for all strategies s′i ≠ si, ui(si, s−i) > ui(s′i, s−i).

Definition 2.3.6 (Weak Nash) A strategy profile s = (s1, . . . , sn) is a weak Nash equilibrium if, for all agents i and for all strategies s′i ≠ si, ui(si, s−i) ≥ ui(s′i, s−i).

Intuitively, weak Nash equilibria are less stable than strict Nash equilibria, because in the former case at least one player has a best response to the other players' strategies that is not his equilibrium strategy. Mixed-strategy Nash equilibria are always weak (can you see why?), while pure-strategy Nash equilibria can be either strict or weak, depending on the game.
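For small matrix games, pure-strategy Nash equilibria can be found by brute force: check every action profile for a profitable unilateral deviation. A minimal sketch (our own, for two-player games given as a payoff dictionary), applied to the Prisoner's Dilemma of Figure 2.3:

```python
def pure_nash_equilibria(actions, payoffs):
    """All pure action profiles where neither player can gain by unilaterally deviating."""
    equilibria = []
    for a1 in actions[0]:
        for a2 in actions[1]:
            u1, u2 = payoffs[(a1, a2)]
            if all(payoffs[(d, a2)][0] <= u1 for d in actions[0]) and \
               all(payoffs[(a1, d)][1] <= u2 for d in actions[1]):
                equilibria.append((a1, a2))
    return equilibria

# Prisoner's Dilemma (Figure 2.3):
pd = {("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
      ("D", "C"): (0, -4),  ("D", "D"): (-3, -3)}
print(pure_nash_equilibria((["C", "D"], ["C", "D"]), pd))   # [('D', 'D')]
```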

Consider again the Battle of the Sexes game. We immediately see that it has two pure-strategy Nash equilibria, depicted in Figure 2.10.

We can check that these are Nash equilibria by confirming that whenever one of the players plays the given (pure) strategy, the other player would only lose by deviating.


          VG       GL
   VG    2, 1     0, 0
   GL    0, 0     1, 2

Figure 2.10  Pure-strategy Nash equilibria in the Battle of the Sexes game.

Are these the only Nash equilibria? The answer is no; although they are indeed the only pure-strategy equilibria (why?), there is also another mixed-strategy equilibrium. In general, it is tricky to compute a game's mixed-strategy equilibria; we consider this problem in detail in Chapter 3. It is easy when we can guess the support of the equilibrium strategies, however. Let us now guess that both players randomize, and let us assume that husband's strategy is to play VG with probability p and GL with probability 1 − p. Then if the wife, the row player, also mixes between her two actions, she must be indifferent between them, given the husband's strategy. (Otherwise, she would be better off switching to a pure strategy according to which she only played the better of her actions.) Then we can write the following equations.

Uwife(VG) = Uwife(GL)

2 · p + 0 · (1 − p) = 0 · p + 1 · (1 − p)

p = 1/3

We get that p = 1/3; that is, in order to make the wife indifferent between her actions, the husband must choose VG with probability 1/3 and GL with probability 2/3. Of course, since the husband plays a mixed strategy he must also be indifferent between his actions. By a similar calculation it can be shown that to make the husband indifferent, the wife must choose GL with probability 1/3 and VG with probability 2/3. Now we can confirm that we have indeed found an equilibrium: since both players play in a way that makes the other indifferent between their actions, they are both best-responding to each other. Like all mixed-strategy equilibria, this is a weak Nash equilibrium. The expected payoff of both agents is 2/3 in this equilibrium, which means that each of the pure-strategy equilibria Pareto-dominates the mixed-strategy equilibrium.
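The indifference condition used above can be solved mechanically for any 2 × 2 game that has a fully mixed equilibrium: each player's mixing probability is chosen so that the other player is indifferent between his two actions. A small sketch under that assumption (our own, not the book's algorithm):

```python
def mixed_equilibrium_2x2(payoffs, actions):
    """Fully mixed equilibrium of a 2x2 game, assuming one exists.

    Returns (p, q): p = probability player 1 plays her first action,
    q = probability player 2 plays his first action.
    """
    (a1, a2), (b1, b2) = actions
    u = lambda i, x, y: payoffs[(x, y)][i]
    # q makes player 1 indifferent between a1 and a2:
    q = (u(0, a2, b2) - u(0, a1, b2)) / (
        u(0, a1, b1) - u(0, a1, b2) - u(0, a2, b1) + u(0, a2, b2))
    # p makes player 2 indifferent between b1 and b2:
    p = (u(1, a2, b2) - u(1, a2, b1)) / (
        u(1, a1, b1) - u(1, a2, b1) - u(1, a1, b2) + u(1, a2, b2))
    return p, q

bos = {("VG", "VG"): (2, 1), ("VG", "GL"): (0, 0),
       ("GL", "VG"): (0, 0), ("GL", "GL"): (1, 2)}
print(mixed_equilibrium_2x2(bos, (("VG", "GL"), ("VG", "GL"))))  # (2/3, 1/3)
```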

            Heads     Tails
   Heads    1, −1     −1, 1
   Tails    −1, 1     1, −1

Figure 2.11  The Matching Pennies game.


We mentioned briefly earlier that mixed strategies play an important role. The above example may not make it obvious, but now consider again the Matching Pennies game, reproduced in Figure 2.11. It is not hard to see that no pure strategy could be part of an equilibrium in this game of pure competition. (Therefore, likewise there can be no strict Nash equilibrium in this game.) But again, using the same procedure as above, the reader can verify that there exists a mixed-strategy equilibrium in which each player chooses one of the two available actions with probability 1/2.

We have now given two examples in which we managed to find Nash equilibria (three equilibria in one example, one equilibrium in the other). Did we just luck out? Here there is some good news—it was not just luck.

Theorem 2.3.7 (Nash, 1950) Every finite normal-form game has a Nash equilibrium.

Let us give a feel for the proof of Nash's theorem, since it provides insight into the nature of mixed strategies and equilibria. There are several ways to prove the theorem, but all have the following structure: they first appeal to some general theorem from topology or combinatorics, and then show how Nash's theorem follows as a special case of that general theorem. There is a tradeoff here; the more powerful the general theorem, the easier it is to prove Nash's theorem based on it, but at the same time less insight is revealed because much of the insight is hidden inside the general theorem. The three general theorems most commonly appealed to are Sperner's lemma, Brouwer's fixed-point theorem, and Kakutani's theorem. We will outline the proof based on the latter, most general theorem.

Kakutani's theorem can be phrased as follows (the following is not the most general form of it, but it is general enough for our purposes):

Theorem 2.3.8 (Kakutani, 1941) Let X be a nonempty subset of n-dimensional Euclidean space, and f : X → 2^X be a function from X to its power set (the set of all its subsets). Then the following are sufficient conditions for f to have a fixed point, that is, a point x∗ ∈ X such that x∗ ∈ f(x∗):

1. X is compact, that is, any sequence in X has a subsequence which tends towards some limit element of X.

2. X is convex, that is, for any two points x, y ∈ X and any α ∈ [0, 1] it is the case that αx + (1 − α)y ∈ X.

3. For any x ∈ X, f(x) is nonempty and convex.

4. For any sequence of pairs (xi, x∗i) such that xi, x∗i ∈ X and x∗i ∈ f(xi), if (xi, x∗i) → (x, x∗) then x∗ ∈ f(x).3

To apply Kakutani's theorem, we first choose X to consist of all mixed strategy profiles. Note that each individual mixed strategy over k pure strategies can be viewed as an element of a (k − 1)-dimensional simplex, that is, a tuple (p1, p2, . . . , pk) such that pi ∈ [0, 1] and ∑_{i=1}^{k} pi = 1. X is thus the Cartesian product of n such simplexes, one for each agent; it is easily verified that X is compact and convex. Next we define f to be the best-response function. That is, given a strategy profile s = (s1, s2, . . . , sn), f(s) consists of all strategy profiles (s′1, s′2, . . . , s′n) such that s′i is a best response by player i to s−i. It can be verified, albeit with a little more effort, that f satisfies the remaining conditions of Kakutani's theorem, and thus has a fixed point. Finally, we note that, by definition, this fixed point is a Nash equilibrium.

Because mixed strategies and Nash equilibrium play such a central role in game theory, it is worth discussing their possible interpretation further. One well-known motivating example for mixed strategies involves soccer: specifically, a kicker and a goalie getting ready for a penalty kick. The kicker can kick to the left or the right, and the goalie can jump to the left or the right. The kicker scores if and only if he kicks to one side and the goalie jumps to the other; this is thus best modelled as Matching Pennies, the zero-sum game in Figure 2.7. Any pure strategy on the part of either player invites a winning best response on the part of the other player. It is only by kicking or jumping in either direction with equal probability, goes the argument, that the opponent cannot exploit your strategy.

Of course, this argument is not uncontroversial. In particular, it can be argued that the strategies of each player are deterministic, but each player has uncertainty regarding the other player's strategy. This is indeed a second possible interpretation of mixed strategy: The mixed strategy of player i is everyone else's assessment of how likely i is to play each pure strategy. In equilibrium, i's mixed strategy has the further property that every action in its support is a best response to player i's beliefs about the other agents' strategies.

Finally, there are two interpretations that are related to learning in multiagent systems. In one interpretation, the game is actually played many times repeatedly, and the probability of a pure strategy is the fraction of the time it is played in the limit (its so-called empirical frequency). In the other interpretation, not only is the game played repeatedly, but each time it involves two different agents selected at random from a large population. In this interpretation, each agent in the population plays a pure strategy, and the probability of a pure strategy represents the fraction of agents playing that strategy. We return to these learning interpretations in Chapter 6.

2.3.3 Maxmin and minmax strategies

The maxmin strategy of player i in an n-player, general-sum game is a (not necessarily unique, and in general mixed) strategy that maximizes i's worst-case payoff, in the situation where all the other players (whom we denote −i) happen to play the strategies which cause the greatest harm to i. The maxmin value (or security level) of the game for player i is that minimum amount of payoff guaranteed by a maxmin strategy.

Definition 2.3.9 (Maxmin) The maxmin strategy for player i is arg max_{si} min_{s−i} ui(si, s−i), and the maxmin value for player i is max_{si} min_{s−i} ui(si, s−i).

These concepts are useful in a variety of different settings. For example, the maxmin strategy is the best choice for a conservative agent who wants to maximize his expected utility without having to make any assumptions about the other agents. (For example, he does not want to assume that the other agents will act rationally according to their own interests, or that they will draw their action choices from some known distributions.) In the Battle of the Sexes game (Figure 2.9), the maxmin value for either player is 2/3, and requires the maximizing agent to play a mixed strategy. (Do you see why?)

3. For the record, this property is a special case of what is called upper hemi-continuity.

The minmax strategy and minmax value play a dual role to their maxmin counterparts. In two-player games the minmax strategy for player i against player −i is a strategy that keeps the maximum payoff of −i at a minimum, and the minmax value of player −i is that minimum. This is useful when we want to consider the amount that one player can punish another without regard for his own payoff. Such punishment can arise in repeated games, as we will see in Section 5.1. The formal definitions are:

Definition 2.3.10 (Minmax, 2-player) In a two-player game, the minmax strategy for player i against player −i is arg min_{si} max_{s−i} u−i(si, s−i), and player −i's minmax value is min_{si} max_{s−i} u−i(si, s−i).

In n-player games with n > 2, defining player i's minmax strategy against player j is a bit more complicated. This is because i will not usually be able to guarantee that j achieves minimal payoff by acting unilaterally. However, if we assume that all the players other than j choose to "gang up" on j—and that they are able to coordinate appropriately when there is more than one joint strategy that would yield the same minimal payoff for j—then we can define minmax strategies for the n-player case.

Definition 2.3.11 (Minmax, n-player) In an n-player game, the minmax strategy for player i against player j ≠ i is i's component of the mixed strategy profile s−j in the expression arg min_{s−j} max_{sj} uj(sj, s−j), where −j denotes the set of players other than j. As before, the minmax value for player j is min_{s−j} max_{sj} uj(sj, s−j).

In two-player games, a player's minmax value is always equal to his maxmin value. For games with more than two players a weaker condition holds: a player's maxmin value is always less than or equal to his minmax value. (Can you explain why this is?)

Since neither an agent's maxmin nor minmax strategy depends on the strategies that the other agents actually choose, the maxmin and minmax strategies give rise to solution concepts in a straightforward way. We will call a mixed-strategy profile s = (s1, s2, . . .) a maxmin strategy profile of a given game if s1 is a maxmin strategy for player 1, s2 is a maxmin strategy for player 2 and so on. In two-player games, we can also define minmax strategy profiles analogously. In two-player, zero-sum games, there is a very tight connection between minmax and maxmin strategy profiles. Furthermore, these solution concepts are also linked to the Nash equilibrium.

Theorem 2.3.12 (Minimax theorem (von Neumann, 1928)) In any finite, two-player, zero-sum game, in any Nash equilibrium4 each player receives a payoff that is equal to both his maxmin value and his minmax value.

4. The attentive reader might wonder how a theorem from 1928 can use the term "Nash equilibrium," when Nash's work was published in 1950. Von Neumann used different terminology and proved the theorem in a different way; however, the above presentation is probably clearer in the context of modern game theory.


Proof. Let (s′i, s′−i) be a Nash equilibrium, and denote i's equilibrium payoff as vi. Denote i's maxmin value as v̲i and i's minmax value as v̅i.

First, show that vi = v̲i. Clearly we cannot have v̲i > vi, as if this were true then i would profit by deviating from s′i to his maxmin strategy, and hence (s′i, s′−i) would not be a Nash equilibrium. Thus it remains to show that v̲i cannot be less than vi.

Assume that v̲i < vi. By definition, in equilibrium each player plays a best response to the other. Thus

v−i = max_{s−i} u−i(s′i, s−i).

Equivalently, we can write that −i minimizes the negative of his payoff, given i's strategy,

−v−i = min_{s−i} −u−i(s′i, s−i).

Since the game is zero-sum, vi = −v−i and ui = −u−i. Thus,

vi = min_{s−i} ui(s′i, s−i).

We defined v̲i as max_{si} min_{s−i} ui(si, s−i). By the definition of max, we must have

max_{si} min_{s−i} ui(si, s−i) ≥ min_{s−i} ui(s′i, s−i).

Thus v̲i ≥ vi, contradicting our assumption. We have shown that vi = v̲i. The proof that vi = v̅i is similar, and is left as an exercise.

Why is the minmax theorem important? It demonstrates that maxmin strategies, minmax strategies and Nash equilibria coincide in two-player zero-sum games. In particular, Theorem 2.3.12 allows us to conclude that in two-player, zero-sum games:

1. Each player's maxmin value is equal to his minmax value. By convention, the maxmin value for player 1 is called the value of the game.

2. For both players, the set of maxmin strategies coincides with the set of minmax strategies.

3. Any maxmin strategy profile (or, equivalently, minmax strategy profile) is a Nash equilibrium. Furthermore, these are all the Nash equilibria. Consequently, all Nash equilibria have the same payoff vector (namely, those in which player 1 gets the value of the game).

For example, in the Matching Pennies game in Figure 2.7, the value of the game is 0. The unique Nash equilibrium consists of both players randomizing between Heads and Tails with equal probability, which is both the maxmin strategy and the minmax strategy for each player.
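The value of a two-player zero-sum game, together with a maxmin strategy for player 1, can be computed with a small linear program: maximize v subject to every pure strategy of player 2 giving player 1 at least v. The sketch below is our own illustration using scipy (the book's own treatment of this computation appears in Chapter 3), applied to Matching Pennies.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(U):
    """Value of a two-player zero-sum game and a maxmin mixed strategy for player 1.

    U[i][j] is player 1's payoff when player 1 plays row i and player 2 plays column j.
    LP: max v  s.t.  sum_i x_i * U[i][j] >= v for every column j, x a distribution over rows.
    """
    U = np.asarray(U, dtype=float)
    m, n = U.shape
    c = np.concatenate([np.zeros(m), [-1.0]])          # variables (x, v); minimize -v
    A_ub = np.hstack([-U.T, np.ones((n, 1))])          # v - sum_i U[i][j] x_i <= 0, per column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]          # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

value, strategy = zero_sum_value([[1, -1], [-1, 1]])   # Matching Pennies (Figure 2.7)
print(value, strategy)                                  # roughly 0.0, [0.5, 0.5]
```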


Nash equilibria in zero-sum games can be viewed graphically as a "saddle" in a high-dimensional space. At a saddle point, any deviation of the agent lowers his utility and increases the utility of the other agent. It is easy to visualize in the simple case in which each agent has two pure strategies. In this case the space of strategy profiles can be viewed as all points on the square between (0,0) and (1,1), with each point in the square describing the mixed strategy of each agent. The payoff to player 1 (and thus the negative of the payoff to player 2) is indeed a saddle-shaped, three-dimensional surface above this square. Figure 2.12 (left) gives a pictorial example, illustrating player 1's expected utility in matching pennies as a function of both players' probabilities of playing heads. Figure 2.12 (right) adds a plane at z = 0 to make it easier to see that it is an equilibrium for both players to play heads 50% of the time, and that zero is both the maxmin value and the minmax value for both players.

Figure 2.12  The saddle point in matching pennies, with and without a plane at z = 0.

2.3.4 Minimax regret

We argued above that one reason agents might play maxmin strategies is that they want to achieve good payoffs in the worst case, even though the game might not be zero-sum. However, consider a setting in which the other agent is not believed to be malicious, but is instead believed to be entirely unpredictable. (Crucially, in this section we do not approach the problem as Bayesians, saying that agent i's beliefs can be described by a probability distribution; instead, we use a "pre-Bayesian" model in which i does not know such a distribution and indeed has no beliefs about it.) In such a setting, it can also make sense for agents to care about minimizing their worst-case loss, rather than simply maximizing their worst-case payoff.

Consider the game in Figure 2.13. Interpret the payoff numbers as pertaining to agent 1 only; for this example it does not matter what agent 2's payoffs are, and we can imagine that agent 1 does not know these values. (Indeed, this could be one reason why player 1 is unable to form beliefs about how player 2 will play, even if he believes player 2 to be rational.) In this game it is easy to see that player 1's maxmin strategy is to play B; this is because player 2's minmax strategy is to play R, and B is a best response to R. If player 1 does not believe that player 2 is malicious, however, he might reason as follows. If player 2 were to play R then it would not matter very much how player 1 plays: the most he could lose by playing the wrong way would be ǫ. On the other hand, if player 2 were to play L then player 1's action would be very significant: if player 1 were to make the wrong choice here then his utility would be decreased by 98. Thus player 1 might choose to play T in order to minimize his worst-case loss; this is the opposite of what he would choose if he followed his maxmin strategy.

          L        R
   T     100     1 − ǫ
   B      2        1

Figure 2.13  A game for contrasting maxmin with minimax regret. The numbers refer only to player 1's payoffs; let ǫ be an arbitrarily small positive constant.

Let us now formalize this idea. We begin with the notion of regret.

Definition 2.3.13 (Regret) An agent i's regret for playing an action ai if the other agents adopt action profile a−i is defined as

[max_{a′i∈Ai} ui(a′i, a−i)] − ui(ai, a−i).

In words, this is the amount that i loses by playing ai, rather than playing his best response to a−i. Of course, i does not know what actions the other players will take; however, he can consider those actions that would give him the highest regret for playing ai.

Definition 2.3.14 (Max regret) An agent i's maximum regret for playing an action ai is defined as

max_{a−i∈A−i} ([max_{a′i∈Ai} ui(a′i, a−i)] − ui(ai, a−i)).

This is the amount that i loses by playing ai rather than playing his best response to a−i, if the other agents chose the a−i that makes this loss as large as possible. Finally, i can choose his action in order to minimize this worst-case regret.

Definition 2.3.15 (Minimax regret) Minimax regret actions for agent i are defined as

arg min_{ai∈Ai} [max_{a−i∈A−i} ([max_{a′i∈Ai} ui(a′i, a−i)] − ui(ai, a−i))].


Thus, an agent's minimax regret action is an action that yields the smallest maximum regret. Minimax regret can be extended to a solution concept in the natural way, by identifying action profiles that consist of minimax regret actions for each player. Note that we can safely restrict ourselves to actions rather than mixed strategies in the definitions above (that is, maximizing over the sets Ai and A−i instead of Si and S−i), because of the linearity of expectation. We leave the proof of this fact as an exercise.
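Computing a minimax regret action by direct enumeration is straightforward. The sketch below (our own illustration) does so for the row player in the game of Figure 2.13, using a small numeric value standing in for ǫ.

```python
def minimax_regret_action(U, actions, opponent_actions):
    """Row player's minimax regret action; U[(a, b)] is the row player's payoff."""
    def max_regret(a):
        return max((max(U[(a2, b)] for a2 in actions) - U[(a, b)])
                   for b in opponent_actions)
    return min(actions, key=max_regret)

eps = 0.01                              # stands in for the arbitrarily small epsilon
U = {("T", "L"): 100, ("T", "R"): 1 - eps,
     ("B", "L"): 2,   ("B", "R"): 1}
print(minimax_regret_action(U, ["T", "B"], ["L", "R"]))   # 'T'
```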

2.3.5 Removal of dominated strategies

We first define what it means for one strategy to dominate another. Intuitively, one strategy dominates another for a player i if the first strategy yields i a greater payoff than the second strategy, for any strategy profile of the remaining players.5 There are, however, three gradations of dominance, which are captured in the following definition:

Definition 2.3.16 (Domination) Let s_i and s_i' be two strategies of player i, and S_{-i} the set of all strategy profiles of the remaining players. Then

1. s_i strictly dominates s_i' if for all s_{-i} ∈ S_{-i} it is the case that u_i(s_i, s_{-i}) > u_i(s_i', s_{-i}).

2. s_i weakly dominates s_i' if for all s_{-i} ∈ S_{-i} it is the case that u_i(s_i, s_{-i}) ≥ u_i(s_i', s_{-i}), and for at least one s_{-i} ∈ S_{-i} it is the case that u_i(s_i, s_{-i}) > u_i(s_i', s_{-i}).

3. s_i very weakly dominates s_i' if for all s_{-i} ∈ S_{-i} it is the case that u_i(s_i, s_{-i}) ≥ u_i(s_i', s_{-i}).

If one strategy dominates all others, we say that it is (strictly, weakly, or very weakly) dominant.

Definition 2.3.17 (Dominant strategy) A strategy is strictly (resp., weakly; very weakly) dominant for an agent if it strictly (weakly; very weakly) dominates any other strategy for that agent.

It is obvious that a strategy profile (s_1, ..., s_n) in which every s_i is dominant for player i (whether strictly, weakly, or very weakly) is a Nash equilibrium. Such a strategy profile forms what is called a dominant strategy equilibrium with the appropriate modifier (strictly, etc.). A strictly dominant equilibrium is necessarily the unique Nash equilibrium. For example, consider again the Prisoner's Dilemma game. For each player, the strategy D is strictly dominant, and indeed (D, D) is the unique Nash equilibrium. Indeed, we can now explain the "dilemma" which is particularly troubling about the Prisoner's Dilemma game: the outcome reached in the strictly dominant equilibrium is also the only outcome that is not Pareto optimal.

Games with dominant strategies will play an important role when we discuss mechanism design later in the book. However, dominant strategies are rare. More common are dominated strategies.

Definition 2.3.18 (Dominated strategy) A strategy s_i is strictly (weakly; very weakly) dominated for an agent i if some other strategy s_i' strictly (weakly; very weakly) dominates s_i.

5. Note that here we consider strategy domination from one individual player's point of view; thus, this notion is unrelated to the concept of Pareto domination discussed earlier.

Let us focus for the moment on strictly dominated strategies. Intuitively, all strictly dominated pure strategies can be ignored, since they will never participate in any equilibrium (why?). There are several subtleties, however. First, once a pure strategy is eliminated, another strategy which was not dominated can become dominated. And so this process of elimination can be continued. Second, a pure strategy may be dominated by a mixture of other pure strategies without being dominated by any of them independently. To see this, consider the game in Figure 2.14.

          L       C       R
  U    3, 1    0, 1    0, 0
  M    1, 1    1, 1    5, 0
  D    0, 1    4, 1    0, 0

Figure 2.14 A game with dominated strategies.

Column R can be eliminated, since it is dominated by, e.g., column L. We are left with the reduced game in Figure 2.15.

          L       C
  U    3, 1    0, 1
  M    1, 1    1, 1
  D    0, 1    4, 1

Figure 2.15 The game from Figure 2.14 after removing the dominated strategy R.

In this game M is dominated by neither U nor D, but it is dominated by the mixed strategy that selects either U or D with equal probability. (Note, however, that it was not dominated before the elimination of the R column.) And so we are left with the maximally reduced game in Figure 2.16.


          L       C
  U    3, 1    0, 1
  D    0, 1    4, 1

Figure 2.16 The game from Figure 2.15 after removing the dominated strategy M.

This yields us a solution concept: the set of all strategy profiles that assign zero probability to playing any action that would be removed through iterated removal of strictly dominated strategies. Note that this is a much weaker solution concept than Nash equilibrium: the set of strategy profiles will include all the Nash equilibria, but it will include many other mixed strategies as well. In some games, it will be equal to S, the set of all possible mixed strategies.

Since iterated removal of strictly dominated strategies preserves Nash equilibria, we can use this technique to computational advantage. In the example above, rather than computing the Nash equilibria in the original 3 × 3 game, we can now compute them in this 2 × 2 game, applying the technique described earlier. In some cases, the procedure ends with a single cell; this is the case, for example, with the Prisoner's Dilemma game. In this case we say that the game is solvable by iterated elimination.
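When only domination by pure strategies is considered, iterated elimination is straightforward to implement. The sketch below is our own and not from the text (Section 3.7 treats the computational questions properly); in particular, because it ignores domination by mixed strategies, it does not remove M in the example above.

    import numpy as np

    def iterated_elimination(u1, u2):
        """Iteratively remove pure strategies that are strictly dominated by some
        other pure strategy; u1 and u2 are the row and column players' payoff
        matrices.  Returns the indices of the surviving rows and columns."""
        rows, cols = list(range(u1.shape[0])), list(range(u1.shape[1]))
        changed = True
        while changed:
            changed = False
            # Row i is dominated by row k if u1[k, j] > u1[i, j] for every surviving column j.
            for i in list(rows):
                if any(all(u1[k, j] > u1[i, j] for j in cols) for k in rows if k != i):
                    rows.remove(i); changed = True
            # Column j is dominated by column k if u2[i, k] > u2[i, j] for every surviving row i.
            for j in list(cols):
                if any(all(u2[i, k] > u2[i, j] for i in rows) for k in cols if k != j):
                    cols.remove(j); changed = True
        return rows, cols

    # The game of Figure 2.14.
    u1 = np.array([[3, 0, 0], [1, 1, 5], [0, 4, 0]])
    u2 = np.array([[1, 1, 0], [1, 1, 0], [1, 1, 0]])
    print(iterated_elimination(u1, u2))   # ([0, 1, 2], [0, 1]): only R is eliminated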

Clearly, in any finite game, iterated elimination ends after a finite number of iterations. One might worry that, in general, the order of elimination might affect the final outcome. It turns out that this elimination order does not matter when we remove strictly dominated strategies (this is called a Church-Rosser property). However, the elimination order can make a difference to the final reduced game if we remove weakly or very weakly dominated strategies.

Which flavor of domination should we concern ourselves with? In fact, each flavor has advantages and disadvantages, which is why we present all of them here. Strict domination leads to better-behaved iterated elimination: it yields a reduced game which is independent of the elimination order, and iterated elimination is more computationally manageable (this and other computational issues regarding domination are discussed in Section 3.7.3). There is also a further related advantage which we will defer to Section 2.3.6. Weak domination can yield smaller reduced games, but under iterated elimination the reduced game can depend on the elimination order. Very weak domination can yield even smaller reduced games, but again these reduced games depend on elimination order. Furthermore, very weak domination does not impose a strict order on strategies: when two strategies are equivalent, each very weakly dominates the other. For this reason, this last form of domination is generally considered the least important by game theorists.


2.3.6 Rationalizability

A strategy is rationalizable if a perfectly rational player could justifiably play it against one or more perfectly rational opponents. Informally, a strategy for player i is rationalizable if it is a best response to some beliefs that i could have about the strategies that the other players will take. The wrinkle, however, is that i cannot have arbitrary beliefs about the other players' actions: his beliefs must take into account his knowledge of their rationality, which incorporates their knowledge of his rationality, their knowledge of his knowledge of their rationality, and so on in an infinite regress. A rationalizable strategy profile is a strategy profile which consists only of rationalizable strategies.

For example, in the Matching Pennies game given in Figure 2.7, the pure strategy heads is rationalizable for the row player. First, the strategy heads is a best response to the pure strategy heads by the column player. Second, believing that the column player would also play heads is consistent with the column player's rationality: the column player could believe that the row player would play tails, to which the column player's best response is heads. It would be rational for the column player to believe that the row player would play tails because the column player could believe that the row player believed that the column player would play tails, to which tails is a best response. Arguing in the same way, we can make our way up the chain of beliefs.

However, not every strategy can be justified in this way. For example, considering the Prisoner's Dilemma game given in Figure 2.3, the strategy C is not rationalizable for the row player, because C is not a best response to any strategy that the column player could play. Similarly, consider the game from Figure 2.14. M is not a rationalizable strategy for the row player: although it is a best response to a strategy of the column player's (R), there do not exist any beliefs that the column player could hold about the row player's strategy to which R would be a best response.

Because of the infinite regress, the formal definition of rationalizability is somewhat involved; however, it turns out that there are some intuitive things that we can say about rationalizable strategies. First, note that Nash equilibrium strategies are always rationalizable: thus, the set of rationalizable strategies (and strategy profiles) is always nonempty. Second, in two-player games rationalizable strategies have a simple characterization: they are those strategies which survive the iterated elimination of strictly dominated strategies. In n-player games there exist strategies which survive iterated removal of dominated strategies but are not rationalizable. In this more general case, rationalizable strategies are those strategies which survive iterative removal of strategies that are never a best response to any strategy profile by the other players.

We now define rationalizability more formally. First we will define an infinite sequence of (possibly mixed) strategies S_i^0, S_i^1, S_i^2, ... for each player i. Let S_i^0 = S_i; thus for each agent i the first element in the sequence is the set of all i's mixed strategies. Let CH(S) denote the convex hull of a set S: the smallest convex set containing all the elements of S. Now we define S_i^k as the set of all strategies s_i ∈ S_i^{k−1} for which there exists some s_{-i} ∈ ×_{j≠i} CH(S_j^{k−1}) such that for all s_i' ∈ S_i^{k−1}, u_i(s_i, s_{-i}) ≥ u_i(s_i', s_{-i}). That is, a strategy belongs to S_i^k if there is some strategy s_{-i} for the other players in response to which s_i is at least as good as any other strategy from S_i^{k−1}. The convex hull operation allows i to best-respond to uncertain beliefs about which strategies from S_j^{k−1} another player j will adopt. CH(S_j^{k−1}) is used instead of Π(S_j^{k−1}), the set of all probability distributions over S_j^{k−1}, because the latter would allow consideration of mixed strategies that are dominated by some pure strategies for j. Player i could not believe that j would play such a strategy because such a belief would be inconsistent with i's knowledge of j's rationality.

Now we define the set of rationalizable strategies for player i as the intersection of the sets S_i^0, S_i^1, S_i^2, ...

Definition 2.3.19 (Rationalizable strategies) The rationalizable strategies for player i are ⋂_{k=0}^∞ S_i^k.

2.3.7 Correlated equilibrium

The correlated equilibrium is a solution concept which generalizes the Nash equilibrium. Some people feel that this is the most fundamental solution concept of all.6

In a standard game, each player mixes his pure strategies independently. For example, consider again the Battle of the Sexes game (reproduced here as Figure 2.17) and its mixed-strategy equilibrium.

           VG      GL
  VG     2, 1    0, 0
  GL     0, 0    1, 2

Figure 2.17 Battle of the Sexes game.

As we saw in Section 2.3.2, this game's unique mixed-strategy equilibrium yields each player an expected payoff of 2/3. But now imagine that the two players can observe the result of a fair coin flip, and can condition their strategies based on that outcome. They can now adopt strategies from a richer set; for example, they could choose "GL if heads, VG if tails." Indeed, this pair forms an equilibrium in this richer strategy space; given that one player plays the strategy, the other player only loses by adopting another. Furthermore, the expected payoff to each player in this so-called correlated equilibrium is 0.5 · 2 + 0.5 · 1 = 1.5. Thus both agents receive higher utility than they do under the mixed-strategy equilibrium in the uncorrelated case (which had an expected payoff of 2/3 for both agents), and the outcome is fairer than either of the pure-strategy equilibria in the sense that the worst-off player achieves higher expected utility. Correlating devices can thus be quite useful.

The above example had both players observe the exact outcome of the coin flip, but the general setting doesn't require this. Generally, the setting includes some random variable (the "external event") with a commonly known probability distribution, and a private signal to each player about the instantiation of the random variable. A player's signal can be correlated with the random variable's value and with the signals received by other players, without uniquely identifying any of them. Standard games can be viewed as the degenerate case in which the signals of the different agents are probabilistically independent.

6. One noted game theorist, R. Myerson, has gone so far as to say that "If there is intelligent life on other planets, in a majority of them, they would have discovered correlated equilibrium before Nash equilibrium."

To model this formally, consider n random variables, with a joint distribution over these variables. Imagine that nature chooses according to this distribution, but reveals to each agent only the realized value of his variable, and the agent can condition his action on this value.7

Definition 2.3.20 (Correlated equilibrium) Given an n-agent game G = (N, A, u) and a set of random variables v = v_1, ..., v_n with respective domains D = D_1, ..., D_n, consider a vector of mappings σ = σ_1, ..., σ_n with σ_i : D_i → A_i. σ is said to be a correlated equilibrium of G if there exists a joint distribution π over v such that for each agent i and any mapping σ_i' : D_i → A_i it is the case that

    ∑_{d ∈ D} π(d) u_i(σ_1(d_1), ..., σ_i(d_i), ..., σ_n(d_n)) ≥ ∑_{d ∈ D} π(d) u_i(σ_1(d_1), ..., σ_i'(d_i), ..., σ_n(d_n)).

Note that the mapping is to an action, that is, to a pure strategy rather than a mixed one. One could allow a mapping to mixed strategies, but that would add no power (do you see why?).
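To make the definition concrete, the following sketch (our own, not from the text) checks the correlated-equilibrium inequality for the Battle of the Sexes game of Figure 2.17 with the fair-coin signal described above; the variable names and the numerical tolerance are ours.

    import itertools
    import numpy as np

    # Battle of the Sexes payoffs (Figure 2.17): action index 0 = VG, 1 = GL.
    u = {1: np.array([[2, 0], [0, 1]]),    # row player's payoffs
         2: np.array([[1, 0], [0, 2]])}    # column player's payoffs

    # Both players observe the same fair coin, so only matching signals have probability.
    pi = {('heads', 'heads'): 0.5, ('tails', 'tails'): 0.5}
    # Candidate equilibrium mappings: "GL if heads, VG if tails" for both players.
    sigma = {1: {'heads': 1, 'tails': 0}, 2: {'heads': 1, 'tails': 0}}

    def expected_utility(i, sigma_i):
        """Agent i's expected utility when it uses sigma_i and the other uses sigma."""
        total = 0.0
        for d, p in pi.items():
            a1 = sigma_i[d[0]] if i == 1 else sigma[1][d[0]]
            a2 = sigma_i[d[1]] if i == 2 else sigma[2][d[1]]
            total += p * u[i][a1, a2]
        return total

    for i in (1, 2):
        eq_value = expected_utility(i, sigma[i])
        # Enumerate every alternative mapping sigma_i' : {heads, tails} -> {VG, GL}.
        deviations = [dict(zip(('heads', 'tails'), acts))
                      for acts in itertools.product((0, 1), repeat=2)]
        assert all(expected_utility(i, dev) <= eq_value + 1e-9 for dev in deviations)
        print(i, eq_value)    # each agent's equilibrium value is 1.5, as computed above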

Clearly, for every Nash equilibrium, we can construct an equivalent correlated equilibrium, in the sense that they induce the same distribution on outcomes:

Theorem 2.3.21 For every Nash equilibrium σ* there exists a correlated equilibrium σ as above such that for every joint action a = (a_1, ..., a_n) it is the case that

    σ*(a) = ∑_{d ∈ D : ∀i, σ_i(d_i) = a_i} π(d).

The proof is straightforward; one can always define an uninformative signal (for example, with a single value in the domain of each variable), in which case the correlated and uncorrelated strategies coincide. However, not every correlated equilibrium is equivalent to a Nash equilibrium; the Battle-of-the-Sexes example above provides a counterexample. Thus, correlated equilibrium is a strictly weaker notion than Nash equilibrium.

Finally, we note that correlated equilibria can be combined together to form new correlated equilibria. Thus, if the set of correlated equilibria of a game G does not contain a single element, it is infinite. Indeed, any convex combination of correlated equilibrium payoffs can itself be realized as the payoff profile of some correlated equilibrium. The easiest way to understand this claim is to imagine a public random device that selects which of the correlated equilibria will be played; next, another random number is chosen in order to allow the chosen equilibrium to be played. Overall, each agent's expected payoff is the weighted sum of the payoffs from the correlated equilibria that were combined. Of course, these two stages of random number generation are not necessary: we can simply derive a new Ω and π from the Ω's and π's of the original correlated equilibria, and then perform the random number generation in one step.

7. This construction is closely related to two other constructions later in the book, one in connection with Bayesian Games in Chapter 5, and one in connection with knowledge and probability (KP) structures in Chapter 12.

2.3.8 Trembling-hand perfect equilibrium

Another important solution concept is the trembling-hand perfect equilibrium, or simply perfect equilibrium. While rationalizability is a weaker notion than that of a Nash equilibrium, perfection is a stronger one. Several equivalent definitions of the concept exist. In the following definition, recall that a completely mixed strategy is one that assigns every action a strictly positive probability:

Definition 2.3.22 (Trembling-hand perfect equilibrium) A mixed-strategy profile S is a (trembling-hand) perfect equilibrium of a normal-form game G if there exists a sequence S^0, S^1, ... of completely mixed-strategy profiles such that lim_{n→∞} S^n = S, and such that for each S^k in the sequence and each player i, the strategy s_i is a best response to the strategies s_{-i}^k.

Perfect equilibria are relevant to one aspect of multiagent learning (see Chapter 6), which is why we mention them here. However, we do not discuss them in any detail; they are an involved topic, and relate to other subtle refinements of the Nash equilibrium such as the proper equilibrium. The notes at the end of the chapter point the reader to further readings on this topic. We should, however, at least explain the term "trembling hand." One way to think about the concept is as requiring that the equilibrium be robust against slight errors ("trembles") on the part of players. In other words, one's action ought to be the best response not only against the opponents' equilibrium strategies, but also against small perturbations of those. However, since the mathematical definition speaks about arbitrarily small perturbations, whether these trembles in fact model player fallibility or are merely a mathematical device is open to debate.

2.3.9 ε-Nash equilibrium

Our final solution concept reflects the idea that players might not care about changing their strategies to a best response when the amount of utility that they could gain by doing so is very small. This leads us to the idea of an ε-Nash equilibrium.

Definition 2.3.23 (ε-Nash) A strategy profile s = (s_1, ..., s_n) is an ε-Nash equilibrium if, for all agents i and for all strategies s_i' ≠ s_i, u_i(s_i, s_{-i}) ≥ u_i(s_i', s_{-i}) − ε.

This concept has various attractive properties. ε-Nash equilibria always exist; indeed, every Nash equilibrium is surrounded by a region of ε-Nash equilibria. The argument that agents are indifferent to infinitesimally small gains is convincing to many. Further, the concept can be computationally useful: algorithms that aim to identify ε-Nash equilibria need to consider only a finite set of mixed-strategy profiles rather than the whole continuous space. (Of course, the size of this finite set depends on both ε and on the game's payoffs.) Since computers generally represent real numbers using a floating-point approximation, it is usually the case that even methods for the "exact" computation of Nash equilibria (see, e.g., Section 3.2) actually find only ε-equilibria where ε is roughly the "machine precision" (on the order of 10^{−16} or less for most modern computers). ε-Nash equilibria are also important to multiagent learning algorithms; we discuss them in that context in Section 6.3.

However, ε-Nash equilibria also have several drawbacks. First, although Nash equilibria are always surrounded by ε-Nash equilibria, the reverse is not true. Thus, a given ε-Nash equilibrium is not necessarily close to any Nash equilibrium. This undermines the sense in which ε-Nash equilibria can be understood as approximations of Nash equilibria. Consider the game in Figure 2.18.

                L              R
  U          1, 1           0, 0
  D     1 + ε/2, 1       500, 500

Figure 2.18 A game with an interesting ε-Nash equilibrium.

This game has a unique Nash equilibrium of (D, R), which can be identified through the iterated removal of dominated strategies. (D dominates U for player 1; upon the removal of U, R dominates L for player 2.) (D, R) is also an ε-Nash equilibrium, of course. However, there is also another ε-Nash equilibrium: (U, L). This game illustrates two things.

First, neither player's payoff under the ε-Nash equilibrium is within ε of his payoff in a Nash equilibrium; indeed, in general both players' payoffs under an ε-Nash equilibrium can be arbitrarily less than in any Nash equilibrium. The problem is that the requirement that player 1 cannot gain more than ε by deviating from the ε-Nash equilibrium strategy profile of (U, L) does not imply that player 2 would not be able to gain more than ε by best-responding to player 1's deviation.

Second, some ε-Nash equilibria might be very unlikely to arise in play. Although player 1 might not care about a gain of ε/2, he might reason that the fact that D dominates U would lead player 2 to expect him to play D, and that player 2 would thus play R in response. Player 1 might thus play D because it is his best response to R. Overall, the idea of ε-approximation is much messier when applied to the identification of a fixed point than when it is applied to a (single-objective) optimization problem.
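The ε-Nash claims about the game in Figure 2.18 are easy to verify computationally. The sketch below is our own; the helper name and the concrete value ε = 0.01 are ours.

    import numpy as np

    def max_deviation_gain(u1, u2, row, col):
        """Largest gain available to either player from unilaterally deviating
        from the pure-strategy profile (row, col); the profile is an eps-Nash
        equilibrium exactly when this quantity is at most eps."""
        gain1 = u1[:, col].max() - u1[row, col]    # player 1 deviates within the column
        gain2 = u2[row, :].max() - u2[row, col]    # player 2 deviates within the row
        return max(gain1, gain2)

    eps = 0.01
    # The game of Figure 2.18: actions U, D for player 1 and L, R for player 2.
    u1 = np.array([[1.0, 0.0], [1.0 + eps / 2, 500.0]])
    u2 = np.array([[1.0, 0.0], [1.0, 500.0]])

    print(max_deviation_gain(u1, u2, 0, 0))   # eps/2: (U, L) is an eps-Nash equilibrium
    print(max_deviation_gain(u1, u2, 1, 1))   # 0.0:   (D, R) is an exact Nash equilibrium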

2.4 History and references

There exist several excellent technical introductory textbooks for game theory, including [Osborne & Rubinstein, 1994; Fudenberg & Tirole, 1991; Myerson, 1991]. The reader interested in gaining deeper insight into game theory should consult not only these, but also the most relevant strands of the vast literature on game theory which has evolved over the years.

The origins of the material covered in the chapter are as follows. In 1928 von Neumann derived the "maximin" solution concept to solve zero-sum normal-form games [von Neumann, 1928]. In 1944, he together with Oskar Morgenstern authored what was to become the founding document of game theory [von Neumann & Morgenstern, 1944]; a second edition [von Neumann & Morgenstern, 1947] quickly followed in 1947. Among the many contributions of this work are the axiomatic foundations for "objective probabilities" and what became known as von Neumann-Morgenstern utility theory. (The classical foundation of "subjective probabilities" is [Savage, 1954], but we don't cover those since they do not play a role in the book. A comprehensive overview of these foundational topics is provided by Kreps [Kreps, 1988], among others. Our own treatment draws on Poole et al. [1997]; see also Russell and Norvig [2003].) But von Neumann and Morgenstern [1944] did much more; they introduced the normal-form game, the extensive form (to be discussed in Chapter 4), the concepts of pure and mixed strategies, as well as other notions central to game theory. Shortly afterwards John Nash introduced the concept of what would become known as the "Nash equilibrium" [Nash, 1950; Nash, 1951], without a doubt the most influential concept in game theory to this date.

That opened the floodgates to a series of refinements and alternative solution concepts, which continues to this day. We covered several of these solution concepts. Iterated removal of dominated strategies, and the closely related rationalizability, enjoy a long history, though modern discussion of them is most firmly anchored in two independent and concurrent publications, Pearce [1984] and Bernheim [1984]. The literature on Pareto optimality and social optimization is even longer, dating back to the early 20th century, including seminal work by Pareto and Pigou, but perhaps was best established by Arrow in his seminal work on social choice [Arrow, 1970]. Correlated equilibria were introduced in [Aumann, 1974]; Myerson's quote is taken from [Solan & Vohra, 2002]. Trembling-hand perfection was introduced in [Selten, 1975]. But there are many more equilibrium concepts that we did not introduce here. For example, an even stronger notion than (trembling-hand) perfect equilibrium is that of proper equilibrium [Myerson, 1978]. In Chapter 6 we discuss the concept of evolutionarily stable strategies [Smith, 1972] and their connection to Nash equilibria. In addition to such single-equilibrium concepts, there are concepts that apply to sets of equilibria, not single ones. Of note are the notions of stable equilibria as originally defined in Kohlberg and Mertens [1986], and various later refinements such as hyperstable sets defined in Govindan and Wilson [2005a]. Good surveys of many of these concepts can be found in Hillas and Kohlberg [2002] and Govindan and Wilson [2005b].


3 Computing Solution Concepts of Normal-Form Games

The discussion of strategies and solution concepts in Chapter 2 largely ignored issues of computation. We start by asking the most basic question: How hard is it to compute the Nash equilibria of a game? The answer turns out to be quite subtle, and to depend on the class of games being considered.

We have already seen how to compute the Nash equilibria of simple matrix games. These calculations were deceptively simple, partly because there were only two players, and partly because each player had only two actions. In this chapter we discuss several different classes of games, starting with the simple two-player, zero-sum normal-form game. Dropping only the zero-sum restriction yields a problem of different complexity: while it is generally believed that any algorithm that guarantees a solution must have an exponential worst-case complexity, it is also believed that a proof to this effect may not emerge for some time. We also consider procedures for n-player games. In each case, we describe how to formulate the problem, the algorithm (or algorithms) commonly used to solve them, and the complexity of the problem. While we focus on the problem of finding a sample Nash equilibrium, we will briefly discuss the problem of finding all Nash equilibria and finding equilibria with specific properties. Along the way we also discuss the computation of other game-theoretic solution concepts: maxmin and minmax strategies, strategies that survive iterated removal of dominated strategies, and correlated equilibria.

3.1 Computing Nash equilibria of two-player zero-sum games

The class of two-player, zero-sum games is the easiest to solve. The Nash equilibrium problem for such games can be expressed as a linear program (LP), which means that equilibria can be computed in polynomial time.1 Consider a two-player zero-sum game G = ({1, 2}, A_1, A_2, u_1, u_2). Let U_i* be the expected utility for player i in equilibrium (the value of the game); since the game is zero-sum, U_1* = −U_2*. The minmax theorem (see Section 2.3.3 and Theorem 2.3.12) tells us that U_1* holds constant in all equilibria, and that it is the same as the value that player 1 achieves under a minmax strategy by player 2. Using this result, we can construct the linear program shown in Figure 3.1.

1. Appendix B reviews the basics of linear programming.


minimize    U_1*                                                          (3.1)
subject to  ∑_{k ∈ A_2} u_1(a_1^j, a_2^k) · s_2^k ≤ U_1*    ∀j ∈ A_1      (3.2)
            ∑_{k ∈ A_2} s_2^k = 1                                         (3.3)
            s_2^k ≥ 0    ∀k ∈ A_2                                         (3.4)

Figure 3.1 Linear program for computing a (possibly mixed-strategy) equilibrium strategy for Player 2 in a two-player zero-sum game.

Note first of all that the utility terms u_1(·) are constants in the linear program, while the mixed-strategy terms s_2^k and U_1* are variables. Let us start by looking at constraint (3.2). This states that for every pure strategy j of player 1, his expected utility for playing any action j ∈ A_1 given player 2's mixed strategy s_2 is at most U_1*. Those pure strategies for which the expected utility is exactly U_1* will be in player 1's best-response set, while those pure strategies leading to lower expected utility will not. Of course, as mentioned above U_1* is a variable; the linear program will choose player 2's mixed strategy in order to minimize U_1* subject to the constraint just discussed. Thus, lines (3.1) and (3.2) state that player 2 plays the mixed strategy that minimizes the utility player 1 can gain by playing his best response. This is almost exactly what we want. All that is left is to ensure that the values of the variables s_2^k are consistent with their interpretation as probabilities. Thus, the linear program also expresses the constraints that these variables must sum to one (3.3) and must each be non-negative (3.4).

This linear program gives us player 2's mixed strategy in equilibrium. In the same fashion, we could construct a linear program to give us player 1's mixed strategy. This program would reverse the roles of player 1 and player 2; it would also be a maximization problem, because player 1 wants to minimize player 2's expected utility, and since player 2's payoffs are the negatives of player 1's payoffs, this amounts to maximizing U_1*. The program thus still refers to player 1's utility, since player 2's utility is just the negative of this amount. This gives us the dual of player 2's program; it is given in Figure 3.2.

Finally, we give a formulation equivalent to the linear program in Figure 3.1, which will be useful in the next section. This program works by introducing slack variables r_1^j for every j ∈ A_1, and then replacing the inequality constraints with equality constraints. This LP formulation is given in Figure 3.3.

Comparing the LP formulation in Figure 3.3 with our first formulation in Figure 3.1, observe that constraint (3.2) changed to constraint (3.10), and that a new constraint (3.13) was introduced. To see why the two formulations are equivalent, note that since constraint (3.13) requires only that each slack variable must be non-negative, the requirement of equality in constraint (3.10) is equivalent to the inequality in constraint (3.2).


maximize    U_1*                                                          (3.5)
subject to  ∑_{j ∈ A_1} u_1(a_1^j, a_2^k) · s_1^j ≥ U_1*    ∀k ∈ A_2      (3.6)
            ∑_{j ∈ A_1} s_1^j = 1                                         (3.7)
            s_1^j ≥ 0    ∀j ∈ A_1                                         (3.8)

Figure 3.2 Linear program for computing a (possibly mixed-strategy) equilibrium strategy for Player 1 in a two-player zero-sum game.

minimize    U_1*                                                               (3.9)
subject to  ∑_{k ∈ A_2} u_1(a_1^j, a_2^k) · s_2^k + r_1^j = U_1*    ∀j ∈ A_1   (3.10)
            ∑_{k ∈ A_2} s_2^k = 1                                              (3.11)
            s_2^k ≥ 0    ∀k ∈ A_2                                              (3.12)
            r_1^j ≥ 0    ∀j ∈ A_1                                              (3.13)

Figure 3.3 Slack-variable-based linear program for computing a (possibly mixed-strategy) equilibrium strategy for Player 2 in a two-player zero-sum game.

3.2 Computing Nash equilibria of two-player general-sum games

Unfortunately, the problem of finding a Nash equilibrium of a general-sum two-player game cannot be formulated as a linear program. Essentially, this is because the two players' interests are no longer diametrically opposed. Thus, we cannot state our problem as an optimization problem: one player is not trying to minimize the other's utility.

3.2.1 Complexity of computing Nash equilibria

The issue of characterizing the complexity of computing a sample Nash equilibrium is tricky. No known reduction exists from our problem to a decision problem which is NP-complete (nor has our problem been shown to be easier). An intuitive stumbling block is that every game has at least one Nash equilibrium, whereas known NP-complete problems are expressible in terms of decision problems that do not always have solutions.

Current knowledge about the complexity of computing a sample Nash equilibrium thus relies upon another, less familiar complexity class which describes the problem of finding a solution which always exists. This class is called PPAD, which stands for "polynomial parity argument, directed version." To describe this class we must first define a family of directed graphs which we will denote G(n). Let each graph in this family be defined on a set N of 2^n nodes. Although each graph in G(n) thus contains a number of nodes that is exponential in n, we want to restrict our attention to graphs that can be described in polynomial space. There is no need to encode the set of nodes explicitly; we encode the set of edges in a given graph as follows. Let Parent : N → N and Child : N → N be two functions that can be encoded as arithmetic circuits with sizes polynomial in n.2 Let there be one graph G ∈ G(n) for every such pair of Parent and Child functions, as long as G satisfies one additional restriction which is described below. Given such a graph G, an edge exists from a node j to a node k iff Parent(k) = j and Child(j) = k. Thus, each node has either zero parents or one parent and either zero children or one child. The additional restriction is that there must exist one distinguished node 0 ∈ N with exactly zero parents.

The above constraints on the in- and out-degrees of the nodes in graphs G ∈ G(n) imply that every node is either part of a cycle or part of a path from a source (a parentless node) to a sink (a childless node). The computational task of problems in the class PPAD is finding either a sink or a source other than 0 for a given graph G ∈ G(n). Such a solution always exists: because the node 0 is a source, there must be some sink which is either a descendant of 0 or 0 itself.
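To illustrate the succinct edge encoding, the following toy sketch (our own; the particular Parent and Child circuits are invented for illustration) walks the Child pointers from the distinguished node 0 of a graph on 2^n nodes until it reaches a sink. In the worst case such a walk can take exponentially many steps, which is exactly why PPAD problems are not obviously easy.

    def find_sink_from_zero(parent, child, max_steps):
        """Follow the Child pointers from the distinguished source node 0 until a
        node with no outgoing edge (a sink) is reached."""
        node = 0
        for _ in range(max_steps):
            k = child(node)
            # The edge node -> k exists only if the Parent circuit agrees;
            # otherwise node has no child and is therefore a sink.
            if parent(k) != node:
                return node
            node = k
        raise RuntimeError("did not reach a sink within max_steps")

    # A toy instance on 2**n nodes: a single directed path 0 -> 1 -> ... -> 2**n - 1.
    n = 4
    child = lambda j: min(j + 1, 2 ** n - 1)
    parent = lambda j: max(j - 1, 0)
    print(find_sink_from_zero(parent, child, 2 ** n))   # 15, the unique sink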

We can now state the main complexity result.

Theorem 3.2.1 The problem of finding a sample Nash equilibrium of a general-sum finite game with two or more players is PPAD-complete.

Of course, this proof is achieved by showing that the problem is in PPAD and that any other problem in PPAD can be reduced to it in polynomial time. To show that the problem is in PPAD, a reduction is given which expresses the problem of finding a Nash equilibrium as the problem of finding source or sink nodes in a graph as described above. This reduction proceeds quite directly from the proof that every game has a Nash equilibrium which appeals to Sperner's lemma. The harder part is the other half of the puzzle: showing that Nash equilibrium computation is PPAD-hard, or in other words that every problem in PPAD can be reduced to finding a Nash equilibrium of some game with size polynomial in the size of the original problem. This result, obtained in 2005, is a culmination of a series of intermediate results obtained over more than a decade. The initial results relied in part on the concept of graphical games (see Section 5.5.2) which, in equilibrium, simulate the behavior of the arithmetic circuits Parent and Child used in the definition of PPAD. More details are given in the notes at the end of the chapter.

What are the practical implications of the result that the problem of finding a sample Nash equilibrium is PPAD-complete? As is the case with other complexity classes such as NP, it is not known whether or not P = PPAD. However, it is generally believed (e.g., due to oracle arguments) that the two classes are not equivalent. Thus, the common belief is that in the worst case, computing a sample Nash equilibrium will take time that is exponential in the size of the game. We do know for sure that finding a Nash equilibrium of a two-player game is no easier than finding an equilibrium of an n-player game (a result that may be surprising, given that in practice different algorithms are used for the two-player case than for the n-player case), and that finding a Nash equilibrium is no easier than finding an arbitrary Brouwer fixed point.

2. We warn the reader that some technical details are glossed over here.

3.2.2 An LCP formulation and the Lemke-Howson algorithm

We now turn to algorithms for computing sample Nash equilibria, notwithstanding the discouraging computational complexity of this problem. We start with the Lemke-Howson algorithm, for two reasons. First, it is the best-known algorithm for the general-sum, two-player case (however, it must be said, not the fastest algorithm, experimentally speaking). Second, it provides insight into the structure of Nash equilibria, and indeed constitutes an independent, constructive proof of Nash's theorem (Theorem 2.3.7).

Unlike in the special zero-sum case, the problem of finding a sample Nash equilibrium cannot be formulated as a linear program. However, the problem of finding a Nash equilibrium of a general-sum two-player game can be formulated as a linear complementarity problem (LCP). In this section we show how to construct this formulation by starting with the slack-variable formulation from Figure 3.3. After giving the formulation, we present the Lemke-Howson algorithm, which can be used to solve this LCP.

As it turns out, our LCP will have no objective function at all, and is thus a constraint satisfaction problem, or a feasibility program, rather than an optimization problem. Also, we can no longer determine one player's equilibrium strategy by only considering the other player's payoff; instead, we will need to discuss both players explicitly. The LCP for computing the Nash equilibrium of a general-sum two-player game is given in Figure 3.4.

∑_{k ∈ A_2} u_1(a_1^j, a_2^k) · s_2^k + r_1^j = U_1*    ∀j ∈ A_1              (3.14)
∑_{j ∈ A_1} u_2(a_1^j, a_2^k) · s_1^j + r_2^k = U_2*    ∀k ∈ A_2              (3.15)
∑_{j ∈ A_1} s_1^j = 1,    ∑_{k ∈ A_2} s_2^k = 1                                (3.16)
s_1^j ≥ 0,    s_2^k ≥ 0    ∀j ∈ A_1, ∀k ∈ A_2                                 (3.17)
r_1^j ≥ 0,    r_2^k ≥ 0    ∀j ∈ A_1, ∀k ∈ A_2                                 (3.18)
r_1^j · s_1^j = 0,    r_2^k · s_2^k = 0    ∀j ∈ A_1, ∀k ∈ A_2                 (3.19)

Figure 3.4 LCP formulation for computing a sample Nash equilibrium of a two-player general-sum game.

Observe that this formulation bears a strong resemblance to the LP formulation with slack variables given in Figure 3.3. Let us go through the differences. First, as discussed above, the LCP has no objective function. Second, constraint (3.14) is the same as constraint (3.10) in our LP formulation; however, here we also include constraint (3.15), which constrains player 2's actions in the same way. We also give the standard constraints that probabilities sum to one (3.16), that probabilities are non-negative (3.17), and that slack variables are non-negative (3.18), but now state these constraints for both players rather than only for player 1.

If we included only constraints (3.14)-(3.18), we would still have a linear program. However, we would also have a flaw in our formulation: the variables U_1* and U_2* would be insufficiently constrained. We want these values to express the expected utility that each player would achieve by playing his best response to the other player's chosen mixed strategy. However, with the constraints we have described so far, U_1* and U_2* would be allowed to take unboundedly large values, because all of these constraints remain satisfied when both U_i* and r_i^j are increased by the same constant, for any given i and j. We solve this problem by adding the nonlinear constraint (3.19), called the complementarity condition. The addition of this constraint means that we no longer have a linear program; instead, we have a linear complementarity problem.

Why does the complementarity condition fix our problem formulation? This constraint requires that whenever an action is played by a given player with positive probability (i.e., whenever an action is in the support of a given player's mixed strategy) then the corresponding slack variable must be zero. Under this requirement, each slack variable can be viewed as the player's incentive to deviate from the corresponding action. Thus, the complementarity condition captures the fact that, in equilibrium, all strategies that are played with positive probability must yield the same expected payoff, while all strategies that lead to lower expected payoffs are not played. Taking all of our constraints together, we are left with the requirement that each player plays a best response to the other player's mixed strategy: the definition of a Nash equilibrium.

The primary algorithm used to solve this LCP formulation is the Lemke-Howson algorithm. We will explain how it works through a graphical exposition. Consider the game in Figure 3.5.

             a_2^1      a_2^2
  a_1^1     0, 1       6, 0
  a_1^2     2, 0       5, 2
  a_1^3     3, 4       3, 3

Figure 3.5 A game for the exposition of the Lemke-Howson algorithm.

Figure 3.6 shows a graphical representation of the two players' mixed-strategy spaces in this game. Each player's strategy space is shown in a separate graph; within a graph, each axis corresponds to one of the corresponding player's pure strategies. For example, in the right-hand side of the figure, the two dots show player 2's two pure strategies, and the line connecting them represents all his possible mixed strategies.

Similarly, player 1's three pure strategies are represented by the points (0, 0, 1), (0, 1, 0), and (1, 0, 0), while his mixed strategies are represented by the region bounded by the triangle having these three points as its vertices.

Figure 3.6 Strategy spaces for player 1 (left) and player 2 (right) in the game from Figure 3.5.

Our next step in defining the Lemke-Howson algorithm is to define a labeling on the strategies. Every possible mixed strategy s_i is given a set of labels L(s_i) ⊆ A_1 ∪ A_2 drawn from the set of available actions for both players. Denoting a given player as i and the other player as −i, mixed strategy s_i for player i is labeled as follows (see the sketch after this list):

• with each of player i's actions a_i^j that is not in the support of s_i; and

• with each of player −i's actions a_{-i}^j that is a best response by player −i to s_i.
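The sketch below (our own, not from the text) computes these label sets for the game of Figure 3.5, writing each label as a (player, action) pair; the function names and the numerical tolerance are ours.

    import numpy as np

    def labels(u1, u2, s1, s2, tol=1e-9):
        """Return the label sets L(s1) and L(s2) for a mixed-strategy pair; u1 and
        u2 are the players' payoff matrices (rows: player 1, columns: player 2)."""
        def label_set(i, si, br_values):
            out = {(i, a) for a, p in enumerate(si) if p <= tol}        # actions not in the support
            other = 2 if i == 1 else 1
            out |= {(other, a) for a, v in enumerate(br_values)
                    if v >= br_values.max() - tol}                      # the other player's best responses
            return out
        L1 = label_set(1, s1, u2.T @ s1)    # player 2's expected payoffs against s1
        L2 = label_set(2, s2, u1 @ s2)      # player 1's expected payoffs against s2
        return L1, L2

    # The game of Figure 3.5 and the mixed-strategy pair ((0, 1/3, 2/3), (2/3, 1/3)).
    u1 = np.array([[0., 6.], [2., 5.], [3., 3.]])
    u2 = np.array([[1., 0.], [0., 2.], [4., 3.]])
    L1, L2 = labels(u1, u2, np.array([0, 1/3, 2/3]), np.array([2/3, 1/3]))
    all_labels = {(1, a) for a in range(3)} | {(2, a) for a in range(2)}
    print((L1 | L2) == all_labels)   # True: all five labels appear for this pair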

This labeling is useful because a pair of strategies (s_1, s_2) is a Nash equilibrium if and only if it is completely labeled (that is, L(s_1) ∪ L(s_2) = A_1 ∪ A_2). For a pair to be completely labeled, each action a_i^j must either be played by player i with zero probability, or be a best response by player i to the mixed strategy of player −i.3 The requirement that a pair of mixed strategies must be completely labeled can be understood as a restatement of the complementarity condition given in constraint (3.19) in Figure 3.4, because the slack variable r_i^j is zero exactly when its corresponding action a_i^j is a best response to the mixed strategy s_{-i}.

Returning to our running example, the labeled version of the strategy spaces is given in Figure 3.7. Consider first the right side of Figure 3.7, which describes player 2's strategy space, and examine the two regions labeled with player 2's actions. The line from (0, 0) to (0, 1) is labeled with a_2^1, because none of these mixed strategies assign any probability to playing action a_2^1. In the same way, the line from (0, 0) to (1, 0) is labeled with a_2^2.

3. We must introduce a certain caveat here. In general, it is possible that some actions will satisfy both of these conditions, and thus belong to both L(s_1) and L(s_2); however, this will not occur when a game is nondegenerate. Full discussion of degeneracy lies beyond the scope of the book, but for the record, one definition is as follows: A two-player game is degenerate if there exists some mixed strategy for either player such that the number of pure-strategy best responses of the other player is greater than the size of the support of the mixed strategy. Here we will assume that the game is nondegenerate.


Figure 3.7 Labeled strategy spaces for player 1 (left) and player 2 (right) in the game from Figure 3.5.

Now consider the three regions labeled with player 1's actions. Examining the payoff matrix in Figure 3.5, you can verify that, e.g., the action a_1^1 is a best response by player 1 to any of the mixed strategies represented by the line from (0, 1) to (1/3, 2/3). Notice that the point (1/3, 2/3) is labeled by both a_1^1 and a_1^2, because both of these actions are best responses by player 1 to the mixed strategy (1/3, 2/3) by player 2.

Now consider the left side of Figure 3.7, representing player 1's strategy space. There is a region labeled with each action a_1^j of player 1, which is the triangle having a vertex at the origin and running orthogonal to the axis s_1^j. (Can you see why these are the only mixed strategies for player 1 that do not involve the action a_1^j?) The two regions for the labels corresponding to actions of player 2 (a_2^1 and a_2^2) divide the outer triangle. As above, note that some mixed strategies are multiply labeled: for example, the point (0, 1/3, 2/3) is labeled with a_2^1, a_2^2, and a_1^1.

The Lemke-Howson algorithm can be understood as searching these pairs of graphs.

It turns out that it is convenient to initialize the algorithm at the point (0, 0, 0) for player 1 and (0, 0) for player 2; thus, we want to be able to consider these points as belonging to the players' strategy spaces. While discussing this algorithm, therefore, we redefine the players' strategy spaces to be the convex hull of their true strategy spaces and the origin of the graph. (This can be understood as replacing the constraint that ∑_j s_i^j = 1 with the constraint that ∑_j s_i^j ≤ 1.) Thus, player 2's strategy space is a triangle with vertices (0, 0), (1, 0), and (0, 1), while player 1's strategy space is a pyramid with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1).

We can now abstract away from these two simplexes, and define a pair of graphs. Define G_1 and G_2 to be the graphs (for players 1 and 2 respectively) in which the nodes are these points, and in which an edge exists between pairs of points that differ in exactly one label. These graphs for our example are shown in Figure 3.8; each node is annotated with the mixed strategy to which it corresponds as well as the actions with which it is labeled.

Figure 3.8 Graph of triply-labeled strategies for player 1 (left) and doubly-labeled strategies for player 2 (right) derived from the game in Figure 3.5. The nodes of G_1 are (0, 0, 0): {a_1^1, a_1^2, a_1^3}; (1, 0, 0): {a_1^2, a_1^3, a_2^1}; (0, 1, 0): {a_1^1, a_1^3, a_2^2}; (0, 0, 1): {a_1^1, a_1^2, a_2^1}; (2/3, 1/3, 0): {a_1^3, a_2^1, a_2^2}; and (0, 1/3, 2/3): {a_1^1, a_2^1, a_2^2}. The nodes of G_2 are (0, 0): {a_2^1, a_2^2}; (1, 0): {a_1^3, a_2^2}; (0, 1): {a_1^1, a_2^1}; (2/3, 1/3): {a_1^2, a_1^3}; and (1/3, 2/3): {a_1^1, a_1^2}.

When the game is nondegenerate, there are no points with more labels than the given player has actions, which implies that a completely labeled pair of strategies must consist of two points that have no labels in common. In our example it is easy to find the three Nash equilibria of the game by inspection: ((0, 0, 1), (1, 0)), ((0, 1/3, 2/3), (2/3, 1/3)), and ((2/3, 1/3, 0), (1/3, 2/3)). However, such a search was only viable because the game is so small: in general the number of possible pairs that we would have to search through can be exponential in the number of actions.
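The complete-labeling condition is just the complementarity constraint (3.19) in disguise, so it can be checked mechanically. The sketch below (our own; the function name and tolerance are ours) recovers the slack variables for a candidate strategy pair in the game of Figure 3.5 and tests the constraints of Figure 3.4, confirming one of the three equilibria listed above.

    import numpy as np

    def satisfies_lcp(u1, u2, s1, s2, tol=1e-9):
        """Check whether the mixed-strategy pair (s1, s2) satisfies the LCP of
        Figure 3.4, recovering U1*, U2* and the slack variables from (s1, s2)."""
        U1 = (u1 @ s2).max()          # player 1's best-response value against s2
        U2 = (u2.T @ s1).max()        # player 2's best-response value against s1
        r1 = U1 - u1 @ s2             # slacks in constraint (3.14)
        r2 = U2 - u2.T @ s1           # slacks in constraint (3.15)
        probs_ok = (abs(s1.sum() - 1) < tol and abs(s2.sum() - 1) < tol
                    and (s1 >= -tol).all() and (s2 >= -tol).all())        # (3.16), (3.17)
        slacks_ok = (r1 >= -tol).all() and (r2 >= -tol).all()             # (3.18)
        complementary = (abs(r1 * s1) < tol).all() and (abs(r2 * s2) < tol).all()  # (3.19)
        return probs_ok and slacks_ok and complementary

    # The game of Figure 3.5 and the equilibrium ((0, 1/3, 2/3), (2/3, 1/3)).
    u1 = np.array([[0., 6.], [2., 5.], [3., 3.]])
    u2 = np.array([[1., 0.], [0., 2.], [4., 3.]])
    print(satisfies_lcp(u1, u2, np.array([0, 1/3, 2/3]), np.array([2/3, 1/3])))   # True
    print(satisfies_lcp(u1, u2, np.array([1., 0., 0.]), np.array([1., 0.])))      # False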

The Lemke-Howson algorithm finds an equilibrium by following a path through pairs (s_1, s_2) ∈ G_1 × G_2 in the cross product of the two graphs. Alternating between the two graphs, each iteration changes one of the two points to a new point that is connected by an edge to the original point. The starting point is the origin in both graphs; we denote this pair as (0, 0), where a bold 0 denotes a vector of zeros.4 Starting from (0, 0), which is completely labeled, the algorithm picks one of the two graphs, and moves from 0 in that graph to some adjacent node x. The node x and the 0 from the other graph together form an almost completely labeled pair, in that between them they miss exactly one label. The algorithm then moves from the remaining 0 to a neighboring node which picks up that missing label, but in the process possibly loses a different label. The process thus proceeds, alternating between the two graphs, until an equilibrium (that is, a totally labeled pair) is reached.

In our running example, a possible execution of the algorithm starts at (0, 0) and then changes s_1 to (0, 1, 0). Now, our pair ((0, 1, 0), (0, 0)) is a_1^2-almost completely labeled, and the duplicate label is a_2^2. For its next step in G_2 the algorithm moves to (0, 1) because the other possible choice, (1, 0), has the label a_2^2. Returning to G_1 for the next iteration, we move to (2/3, 1/3, 0) because it is the point adjacent to (0, 1, 0) that does not have the duplicate label a_1^1. The final step is to change s_2 to (1/3, 2/3) in order to move away from the label a_2^1. We have now reached the completely labeled pair ((2/3, 1/3, 0), (1/3, 2/3)), and the algorithm terminates. This execution trace can be summarized by the following path:

    ((0, 0, 0), (0, 0)) → ((0, 1, 0), (0, 0)) → ((0, 1, 0), (0, 1)) → ((2/3, 1/3, 0), (0, 1)) → ((2/3, 1/3, 0), (1/3, 2/3))

4. Incidentally, this pair is called the extraneous solution. It is the only extra solution to the LCP when we change constraint (3.16) to require that the probabilities in a mixed strategy sum to no more than one.

This is a good time to point out an aspect of the algorithm that we have so far swept under the rug. In abstracting away to the two-graph representation we did not specify how to compute the graph nodes from the game description. In fact, we do not compute the nodes in advance at all. Instead, we compute them incrementally along the path being explored. At each step, once we decide on the label to be added, we solve a relatively simple optimization problem and compute its coefficient as well as the new coefficients of the other variables (in particular, losing one of the variables, except at the solution point).5

The Lemke-Howson algorithm has some good properties. First, it is guaranteed to find a sample Nash equilibrium. Indeed, its constructive nature constitutes an alternative proof regarding the existence of a Nash equilibrium (Theorem 2.3.7). Also, note the following interesting fact: since the algorithm repeatedly seeks to cover a missing label, after choosing the initial move away from (0, 0), the path through almost completely labeled pairs to an equilibrium is unique. So while the algorithm is nondeterministic, all the nondeterminism is concentrated in its first move. Finally, it can be used to find more than one Nash equilibrium. The reason the algorithm is initialized to start at the origin is that this is the only pair that is known a priori to be completely labeled. However, once we have found another completely labeled pair, we can use it as the starting point, allowing us to reach additional equilibria. For example, starting at the equilibrium we just found and making an appropriate first choice, we can quickly find another equilibrium by the following path:

    ((2/3, 1/3, 0), (1/3, 2/3)) → ((0, 1/3, 2/3), (1/3, 2/3)) → ((0, 1/3, 2/3), (2/3, 1/3))

The remaining equilibrium can be found using the following path from the origin:

    ((0, 0, 0), (0, 0)) → ((0, 0, 1), (0, 0)) → ((0, 0, 1), (1, 0))

However, the algorithm is not without its limitations. While we were able to use the algorithm to find all equilibria in our running example, in general we are not guaranteed to be able to do so. As we have seen, the Lemke-Howson algorithm can be thought of as exploring a graph of all completely and almost completely labeled pairs. The bad news is that this graph can be disconnected, and the algorithm is only able to find the equilibria in the connected component that contains the origin (although luckily, there is guaranteed to be at least one such equilibrium). Not only are we unable to guarantee that we will find all equilibria; there is not even an efficient way to determine whether or not all equilibria have been found.

5. For the reader familiar with methods such as the simplex algorithm for linear programming, we remark that it is a similar pivoting mechanism. The section at the end of the chapter provides pointers to more complete descriptions of this procedure.


Even with respect to finding a single equilibrium we are not trouble free. First, there is still indeterminacy in the first move, and the algorithm provides no guidance on how to make a good first choice, one that will lead to a relatively short path to the equilibrium, if one exists. And one may not exist: there are cases in which all paths are of exponential length (and thus the time complexity of the Lemke-Howson algorithm is provably exponential). Finally, even if one gives up on worst-case guarantees and hopes for good heuristics, the fact that the algorithm has no objective function means that it provides no obvious guideline to assess how close it is to a solution before actually finding one.

Nevertheless, despite all these limitations, the Lemke-Howson algorithm remains a key element in understanding the algorithmic structure of Nash equilibria in general two-person games.

3.2.3 Searching the space of supports

One can identify a spectrum of approaches to the design of algorithms. At one end of the spectrum one can develop deep insight into the structure of the problem, and craft a highly specialized algorithm based on this insight. The Lemke-Howson algorithm lies close to this end of the spectrum. At the other end of the spectrum, one identifies relatively shallow heuristics, and hopes that these, coupled with ever-increasing computing power, will do the job. Of course, in order to be effective, even these heuristics must embody some insight into the problem. However, this insight tends to be limited and local, yielding rules of thumb that aid in guiding the search through the space of possible solutions, but that do not directly yield a solution. One of the lessons from computer science is that sometimes heuristic approaches can outperform more sophisticated algorithms in practice. In this section we discuss such a heuristic algorithm.

The basic idea behind the algorithm is straightforward. We first note that while the general problem of computing a Nash equilibrium (NE) is a complementarity problem, computing whether there exists a NE with a particular support6 for each player is a relatively simple feasibility program. So the problem is reduced to searching the space of supports. Of course the size of this space is exponential in the number of actions, and this is where the heuristics come in.

We start with the feasibility program. Given a support profile S = (S_1, S_2) as input (S_i is a subset of player i's actions), TGS (for "Test Given Supports") below gives the formal description of a program for finding a NE p consistent with S, or proving that no such strategy profile exists. In this program, v_i corresponds to the expected utility of player i in an equilibrium, and the subscript −i indicates the player other than i as usual. The complete program is given in Figure 3.9.

Constraints (3.20) and (3.21) require that each player mustbe indifferent betweenall actions within his support, and must not strictly preferan action outside of his sup-port. These imply that neither player can deviate to a pure strategy that improves hisexpected utility, which is exactly the condition for the strategy profile to be a NE. Con-straints (3.22) and (3.23) ensure that eachSi can be interpreted as the support of playeri’s mixed strategy: the pure strategies inSi must be played with zero or positive prob-

6. Recall that the support specifies the pure strategies played with nonzero probability; see Definition 2.2.6.


∑_{a−i ∈ S−i} p(a−i) ui(ai, a−i) = vi        ∀i ∈ {1, 2}, ai ∈ Si      (3.20)
∑_{a−i ∈ S−i} p(a−i) ui(ai, a−i) ≤ vi        ∀i ∈ {1, 2}, ai ∉ Si      (3.21)
pi(ai) ≥ 0                                    ∀i ∈ {1, 2}, ai ∈ Si      (3.22)
pi(ai) = 0                                    ∀i ∈ {1, 2}, ai ∉ Si      (3.23)
∑_{ai ∈ Si} pi(ai) = 1                        ∀i ∈ {1, 2}               (3.24)

Figure 3.9  Feasibility program TGS.

Specifically, the pure strategies in Si must be played with zero or positive probability, and the pure strategies not in Si must be played with zero probability.7 Finally, Constraint (3.24) ensures that each pi can be interpreted as a probability distribution. A solution will be returned only when there exists an equilibrium with support S (subject to the caveat in Footnote 7).
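For concreteness, the following sketch shows how TGS can be posed to an off-the-shelf LP solver for a two-player game. It uses Python with numpy and scipy, which are not discussed in this book; the function name, the matrix encoding of utilities (u1, u2 of shape (|A1|, |A2|)), and the variable layout are our own illustrative choices rather than part of the text.

import numpy as np
from scipy.optimize import linprog

def test_given_supports(u1, u2, S1, S2):
    """Return mixed strategies (p1, p2) consistent with supports (S1, S2), or None."""
    n1, n2 = u1.shape
    k1, k2 = len(S1), len(S2)
    nvar = k1 + k2 + 2                    # variables: p1 on S1, p2 on S2, v1, v2
    A_eq, b_eq, A_ub, b_ub = [], [], [], []
    # Player 1: indifference (3.20) on S1, no profitable deviation (3.21) off S1.
    for a1 in range(n1):
        row = np.zeros(nvar)
        for j, a2 in enumerate(S2):
            row[k1 + j] = u1[a1, a2]      # sum over a2 in S2 of p2(a2) u1(a1, a2) ...
        row[k1 + k2] = -1.0               # ... minus v1
        (A_eq if a1 in S1 else A_ub).append(row)
        (b_eq if a1 in S1 else b_ub).append(0.0)
    # Player 2: the symmetric constraints, with v2.
    for a2 in range(n2):
        row = np.zeros(nvar)
        for j, a1 in enumerate(S1):
            row[j] = u2[a1, a2]
        row[k1 + k2 + 1] = -1.0
        (A_eq if a2 in S2 else A_ub).append(row)
        (b_eq if a2 in S2 else b_ub).append(0.0)
    # Each support's probabilities sum to one (3.24).
    row = np.zeros(nvar); row[:k1] = 1.0
    A_eq.append(row); b_eq.append(1.0)
    row = np.zeros(nvar); row[k1:k1 + k2] = 1.0
    A_eq.append(row); b_eq.append(1.0)
    # (3.22): p >= 0 on the supports; v1, v2 free.  Actions outside the supports
    # get no variable at all, which is how (3.23) is enforced here.
    bounds = [(0, None)] * (k1 + k2) + [(None, None), (None, None)]
    res = linprog(np.zeros(nvar),
                  A_ub=np.array(A_ub) if A_ub else None,
                  b_ub=np.array(b_ub) if b_ub else None,
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=bounds, method="highs")
    if not res.success:
        return None
    p1 = np.zeros(n1); p1[list(S1)] = res.x[:k1]
    p2 = np.zeros(n2); p2[list(S2)] = res.x[k1:k1 + k2]
    return p1, p2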

With this feasibility program in our arsenal, we can proceed to search the space of supports. There are three keys to the efficiency of the following algorithm, called SEM (for Support-Enumeration Method). The first two are the factors used to order the search space. Specifically, SEM considers every possible support size profile separately, favoring support sizes that are balanced and small. The third key to SEM is that it separately instantiates each player's support, making use of what we will call conditional strict dominance to prune the search space.

Definition 3.2.2 (Conditionally strictly dominated action) An action ai ∈ Ai is conditionally strictly dominated, given a profile of sets of available actions R−i ⊆ A−i for the remaining agents, if the following condition holds: ∃a′i ∈ Ai such that ∀a−i ∈ R−i : ui(ai, a−i) < ui(a′i, a−i).

Observe that this definition is strict because, in a Nash equilibrium, no action that is played with positive probability can be conditionally dominated given the actions in the support of the opponents' strategies. The problem of checking whether an action is conditionally strictly dominated is equivalent to the problem of checking whether the action is strictly dominated by a pure strategy in a reduced version of the original game. As we show in Section 3.7.1, this problem can be solved in time linear in the size of the game.
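As an illustration of this reduced-game check, the following small helper (our own hypothetical naming, not the book's code) tests whether one of player 2's actions is conditionally strictly dominated by a pure action, given a candidate support S1 for player 1; u2 is player 2's payoff matrix of shape (|A1|, |A2|).

import numpy as np

def conditionally_dominated(u2, a2, S1):
    """True iff some pure a2' does strictly better than a2 against every a1 in S1."""
    restricted = u2[list(S1), :]          # keep only the rows player 1 might play
    col = restricted[:, a2]
    return any(np.all(restricted[:, b] > col)
               for b in range(u2.shape[1]) if b != a2)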

The preference for small support sizes amplifies the advantages of checking for conditional dominance. For example, after instantiating a support of size two for the first player, it will often be the case that many of the second player's actions are pruned, because only two inequalities must hold for one action to conditionally dominate another.

7. Note that Constraint (3.22) allows an action ai ∈ Si to be played with zero probability, and so the feasibility program may sometimes find a solution even when some Si includes actions which are not in the support. However, player i must still be indifferent between action ai and each other action a′i ∈ Si. Thus, simply substituting in Si = Ai would not necessarily yield a Nash equilibrium as a solution.


Pseudo-code for SEM is given in Figure 3.10.

forall support size profiles x = (x1, x2), sorted in increasing order of, first, |x1 − x2| and, second, (x1 + x2) do
    forall S1 ⊆ A1 s.t. |S1| = x1 do
        A′2 ← {a2 ∈ A2 not conditionally dominated, given S1}
        if ∄a1 ∈ S1 conditionally dominated, given A′2 then
            forall S2 ⊆ A′2 s.t. |S2| = x2 do
                if ∄a1 ∈ S1 conditionally dominated, given S2, and TGS is satisfiable for S = (S1, S2) then
                    return the solution found; it is a NE

Figure 3.10  The SEM algorithm.

Note that SEM is complete, because it considers all support size profiles, and because it only prunes actions that are strictly dominated. As mentioned above, the number of supports is exponential in the number of actions, and hence this algorithm has an exponential worst-case running time.

Of course, any enumeration order would yield a solution; the particular ordering here has simply been shown to yield solutions quickly in practice. In fact, extensive testing on a wide variety of games encountered throughout the literature has shown SEM to perform better than the more sophisticated algorithms. Of course, this result tells us as much about the games in the literature (e.g., they tend to have small-support equilibria) as it tells us about the algorithms.
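A bare-bones rendering of the two-player search, reusing test_given_supports from the sketch above, might look as follows. The conditional-dominance pruning of Figure 3.10 is omitted here for brevity, so this shows only the enumeration order plus the feasibility test.

from itertools import combinations

def sem(u1, u2):
    n1, n2 = u1.shape
    sizes = sorted(((x1, x2) for x1 in range(1, n1 + 1) for x2 in range(1, n2 + 1)),
                   key=lambda x: (abs(x[0] - x[1]), x[0] + x[1]))
    for x1, x2 in sizes:                      # balanced and small supports first
        for S1 in combinations(range(n1), x1):
            for S2 in combinations(range(n2), x2):
                result = test_given_supports(u1, u2, list(S1), list(S2))
                if result is not None:
                    return result             # a sample Nash equilibrium (p1, p2)
    return None                               # never reached for a finite game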

3.3 Computing Nash equilibria of n-player general-sum games

For n-player games where n ≥ 3, the problem of finding a Nash equilibrium can no longer be represented even as an LCP. While it does allow a formulation as a non-linear complementarity problem, such problems are often hopelessly impractical to solve exactly. Unlike the two-player case, therefore, it is unclear how to best formulate the problem as input to an algorithm. In this section we discuss three possibilities.

Instead of solving the non-linear complementarity problem exactly, there has been some success approximating the solution using a sequence of linear complementarity problems (SLCP). Each LCP is an approximation of the problem, and its solution is used to create the next approximation in the sequence. This method can be thought of as a generalization of Newton's method of approximating the local maximum of a quadratic equation. Although this method is not globally convergent, in practice it is often possible to try a number of different starting points because of its relative speed.

Another approach is to formulate the problem as a minimum of a function. First, we need to define some more notation. Starting from a strategy profile s, let c_i^j(s) be the change in utility to player i if he switches to playing action a_i^j as a pure strategy. Then, define d_i^j(s) as c_i^j(s) bounded from below by zero.


c_i^j(s) = u_i(a_i^j, s_{−i}) − u_i(s)
d_i^j(s) = max(c_i^j(s), 0)

Note that d_i^j(s) is positive if and only if player i has an incentive to deviate to action a_i^j. Thus, strategy profile s is a Nash equilibrium if and only if d_i^j(s) = 0 for all players i and all actions j for each player.

minimize     ∑_{i∈N} ∑_{j∈A_i} (d_i^j(s))^2                          (3.25)
subject to   ∑_{j∈A_i} s_i^j = 1        ∀i ∈ N                        (3.26)
             s_i^j ≥ 0                  ∀i ∈ N, ∀j ∈ A_i              (3.27)

Figure 3.11  Computing a sample Nash equilibrium of an n-player game as a constrained optimization problem.

We capture this property in the objective function of Figure 3.11 (Equation (3.25)); we will refer to this function as f(s). This function has one or more global minima at 0, and the set of all s such that f(s) = 0 is exactly the set of Nash equilibria. Of course, this property holds even if we did not square each d_i^j(s), but doing so makes the function differentiable everywhere. The constraints on the function are the obvious ones: each player's distribution over actions must sum to one, and all probabilities must be nonnegative. The advantage of this method is its flexibility. We can now apply any method for constrained optimization.
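To make the objective concrete, here is a sketch (our own encoding, not the authors') that evaluates f(s) for a game given as a list of numpy payoff tensors, where payoffs[i][a1, ..., an] is player i's utility and a strategy profile s is a list of probability vectors.

import itertools
import numpy as np

def deviation_values(payoffs, s):
    """Return u_i(s) for each i and u_i(a_i^j, s_-i) for each i, j."""
    n = len(payoffs)
    u_s = np.zeros(n)
    u_dev = [np.zeros(len(si)) for si in s]
    for a in itertools.product(*[range(len(si)) for si in s]):
        p_all = np.prod([s[i][a[i]] for i in range(n)])
        for i in range(n):
            u_s[i] += p_all * payoffs[i][a]
            p_others = np.prod([s[k][a[k]] for k in range(n) if k != i])
            u_dev[i][a[i]] += p_others * payoffs[i][a]
    return u_s, u_dev

def f(payoffs, s):
    u_s, u_dev = deviation_values(payoffs, s)
    return sum(np.sum(np.maximum(u_dev[i] - u_s[i], 0.0) ** 2)   # (d_i^j(s))^2
               for i in range(len(payoffs)))                     # zero exactly at a NE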

minimize   ∑_{i∈N} ∑_{j∈A_i} (d_i^j(s))^2  +  ∑_{i∈N} ∑_{j∈A_i} (min(s_i^j, 0))^2  +  ∑_{i∈N} ( 1 − ∑_{j∈A_i} s_i^j )^2

Figure 3.12  Computing a sample Nash equilibrium of an n-player game as an unconstrained optimization problem.

If we instead want to use an unconstrained optimization method, we can roll the constraints into the objective function (which we now call g(s)) in such a way that we still have a differentiable function that is zero if and only if s is a Nash equilibrium. This optimization problem is shown in Figure 3.12. Observe that the first term in g(s) is just f(s) from Figure 3.11. The second and third terms in g(s) enforce the first and second constraints from Figure 3.11 respectively.
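Continuing the sketch above, g(s) can be handed directly to a generic unconstrained optimizer; this is an illustrative use of scipy.optimize.minimize, and, as discussed next, a local minimum with g(s) > 0 is not an equilibrium, so the result must be checked and the search restarted if necessary.

import numpy as np
from scipy.optimize import minimize

def g(payoffs, sizes, x):
    s, idx = [], 0
    for m in sizes:                              # unpack the flat vector x per player
        s.append(x[idx:idx + m]); idx += m
    val = f(payoffs, s)                          # first term: f(s) from Figure 3.11
    for si in s:
        val += np.sum(np.minimum(si, 0.0) ** 2)  # penalty for negative "probabilities"
        val += (1.0 - np.sum(si)) ** 2           # penalty for not summing to one
    return val

def equilibrium_candidate(payoffs, restarts=20, seed=0):
    sizes = list(payoffs[0].shape)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        x0 = np.concatenate([rng.dirichlet(np.ones(m)) for m in sizes])
        res = minimize(lambda x: g(payoffs, sizes, x), x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best          # trust best.x only if best.fun is (numerically) zero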

A disadvantage of the formulations given in both Figure 3.11 and Figure 3.12 is that both optimization problems have local minima which do not correspond to Nash equilibria.


Thus, global convergence is an issue. For example, considering the commonly used optimization methods hill-climbing and simulated annealing, the former gets stuck in local minima while the latter often converges globally only for parameter settings that yield an impractically long running time.

When global convergence is required, a common choice is to turn to the class of simplicial subdivision algorithms. Before describing these algorithms we will revisit some properties of the Nash equilibrium. Recall from the Nash existence theorem (Theorem 2.3.7) that Nash equilibria are fixed points of the best response function, f. (As defined previously, given a strategy profile s = (s1, s2, . . . , sn), f(s) consists of all strategy profiles (s′1, s′2, . . . , s′n) such that s′i is a best response by player i to s−i.) Since the space of mixed-strategy profiles can be viewed as a product of simplexes, a so-called simplotope, f is a function mapping from a simplotope to a set of simplotopes.

Scarf's algorithm is a simplicial subdivision method for finding the fixed point of any function on a simplex or simplotope. It divides the simplotope into small regions and then searches over the regions. Unfortunately, such a search is approximate, since a continuous space is approximated by a mesh of small regions. The quality of the approximation can be controlled by refining the meshes into smaller and smaller subdivisions. One way to do this is by restarting the algorithm with a finer mesh after an initial solution has been found. Alternately, a homotopy method can be used. In this approach, a new variable is added that represents the fidelity of the approximation, and the variable's value is gradually adjusted until the algorithm converges.

An alternative approach, due to Govindan and Wilson, uses a homotopy method in a different way. (This homotopy method actually turns out to be an n-player extension of the Lemke-Howson algorithm, although the correspondence is not immediately apparent.) Instead of varying between coarse and fine approximations, the new added variable interpolates between the given game and an easy-to-solve game. That is, we define a set of games indexed by a scalar λ ∈ [0, 1] such that when λ = 0 we have our original game and when λ = 1 we have a very simple game. (One way to do this is to change the original game by adding a "bonus" λk to each player's payoff in one outcome a = (a1, . . . , an). Consider a choice of k big enough that for each player i, playing ai is a strictly dominant strategy. Then, when λ = 1, a will be a (unique) Nash equilibrium, and when λ = 0 we will have our original game.) We begin with an equilibrium to the simple game and λ = 1 and let both the equilibrium to the game and the index vary in a continuous fashion to trace the path of game-equilibrium pairs. Along this path λ may both decrease and increase; however, if the path is followed correctly, it will necessarily pass through a point where λ = 0. This point's corresponding equilibrium is a sample Nash equilibrium of the original game.

Finally, it is possible to generalize the SEM algorithm to the n-player case. Unfortunately, the feasibility program becomes non-linear; see Feasibility program TGS-n in Figure 3.13. The expression p(a−i) from Constraints (3.20) and (3.21) is no longer a single variable, but must now be written as ∏_{j≠i} pj(aj) in Constraints (3.28) and (3.29). The resulting feasibility problem can be solved using standard numerical techniques for non-linear optimization.


∑_{a−i ∈ S−i} ( ∏_{j≠i} pj(aj) ) ui(ai, a−i) = vi        ∀i ∈ N, ai ∈ Si      (3.28)
∑_{a−i ∈ S−i} ( ∏_{j≠i} pj(aj) ) ui(ai, a−i) ≤ vi        ∀i ∈ N, ai ∉ Si      (3.29)
pi(ai) ≥ 0                                                ∀i ∈ N, ai ∈ Si      (3.30)
pi(ai) = 0                                                ∀i ∈ N, ai ∉ Si      (3.31)
∑_{ai ∈ Si} pi(ai) = 1                                    ∀i ∈ N               (3.32)

Figure 3.13  Feasibility program TGS-n.

As with two-player games, in principle any enumeration method would work; the question is which search heuristic works the fastest. It turns out that a minor modification of the SEM heuristic described in Figure 3.10 is effective for the general case as well: one simply reverses the lexicographic ordering between size and balance of supports (the two-player SEM of Figure 3.10 sorts support size profiles first by a measure of balance and then by size; in the n-player case we sort first by size and then by balance). The resulting heuristic algorithm performs very well in practice, and better than the algorithms discussed above. We should note that while the ordering between balance and size becomes extremely important to the efficiency of the algorithm as n increases, this reverse ordering does not perform substantially worse than SEM in the two-player case, because the smallest of the balanced support size profiles still appears very early in the ordering.

3.4 Computing all Nash equilibria of general-sum games

While most research has focused on the problem of finding a sample equilibrium, another interesting problem is to determine all equilibria of a game. This problem is often motivated by the mechanism design perspective, an area discussed in detail in Chapter 9. For example, if we are designing a game that a set of agents will play, it may be important to consider all of the possible stable outcomes of each game we consider. We will briefly discuss this problem for two-player general-sum games.

Theorem 3.4.1 Computing all of the equilibria of a two-player general-sum game requires worst-case time that is exponential in the number of actions for each player.

Proof. First we show that all equilibria can be computed in time exponential in the number of actions for each player. This can be achieved by running the SEM algorithm to completion (i.e., not terminating the algorithm when the first equilibrium is found), or simply by iterating through all supports. Verifying each support takes polynomial time because it involves solving a linear program; as the number of supports is exponential in the number of actions, the total running time is exponential.

Now we show that no other procedure could find all equilibria more quickly.


The reason is that a game with k actions can have 2^k − 1 Nash equilibria, even if the game is nondegenerate. (Of course, when the game is degenerate, it can have an infinite number of equilibria.) Consider a two-player coordination game in which both players have k actions and a utility function given by the identity matrix; this game possesses 2^k − 1 Nash equilibria, one for each non-empty subset of the k actions. The equilibrium for each subset is for both players to randomize uniformly over each action in the subset. Any algorithm that finds all of these equilibria must have a running time that is at least exponential in k.

3.5 Computing specific Nash equilibria of general-sum games

A third problem, which lies between finding a sample equilibrium and finding all equilibria, is finding an equilibrium with a specific property. Listed below are several different questions we could ask about the existence of such an equilibrium.

1. (Uniqueness) Given a game G, does there exist a unique equilibrium in G?

2. (Pareto Optimality) Given a game G, does there exist a strictly Pareto efficient equilibrium in G?

3. (Guaranteed Payoff) Given a game G and a value v, does there exist an equilibrium in G in which some player i obtains an expected payoff of at least v?

4. (Guaranteed Social Welfare) Given a game G and a value k, does there exist an equilibrium in which the sum of agents' utilities is at least k?

5. (Action Inclusion) Given a game G and an action ai ∈ Ai for some player i ∈ N, does there exist an equilibrium of G in which player i plays action ai with strictly positive probability?

6. (Action Exclusion) Given a game G and an action ai ∈ Ai for some player i ∈ N, does there exist an equilibrium of G in which player i plays action ai with zero probability?

The answers to these questions are more useful than they might appear at first glance. For example, the ability to answer the Guaranteed Payoff question in polynomial time could be used to find, in polynomial time, the maximum expected payoff that can be guaranteed in a Nash equilibrium. Unfortunately, all of these questions are hard in the worst case.

Theorem 3.5.1 The following problems are NP-hard when applied to Nash equilibria: Uniqueness, Pareto Optimality, Guaranteed Payoff, Guaranteed Social Welfare, Action Inclusion, and Action Exclusion.

This result holds even for two-player games. Further, it is possible to show that the Guaranteed Payoff and Guaranteed Social Welfare properties cannot even be approximated to any constant factor by a polynomial-time algorithm. However, not all solution concepts are so difficult to compute, even in the case of n-player games, as we will see in the next section.


3.6 Computing maxmin and minmax strategies for two-player general-sum games

Recall from Section 2.3.3 that in a two-player general-sum game a maxmin strategy for player i is a strategy that maximizes his worst-case payoff, presuming that the other player j follows the strategy which will cause the greatest harm to i. A minmax strategy for j against i is such a maximum-harm strategy. Maxmin and minmax strategies can be computed in polynomial time because they correspond to Nash equilibrium strategies in related zero-sum games.

Let G be an arbitrary two-player game G = ({1, 2}, (A1, A2), (u1, u2)). Let us consider how to compute a maxmin strategy for player 1. It will be useful to define the zero-sum game G′ = ({1, 2}, (A1, A2), (u1, −u1)), in which player 1's utility function is unchanged and player 2's utility is the negative of player 1's. By the minmax theorem (Theorem 2.3.12), since G′ is zero-sum, every strategy for player 1 which is part of a Nash equilibrium strategy profile for G′ is a maxmin strategy for player 1 in G′. Notice that by definition, player 1's maxmin strategy is independent of player 2's utility function. Thus, player 1's maxmin strategy will be the same in G and in G′. Our problem of finding a maxmin strategy in G thus reduces to finding a Nash equilibrium of G′, a two-player zero-sum game. We can thus solve the problem by applying the techniques given above in Section 3.1.
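To illustrate the reduction, here is a sketch that computes player 1's maxmin strategy by solving the zero-sum game G′ = (u1, −u1) with the standard maxmin LP of Section 3.1. The function name and encoding are our own; u1 is player 1's payoff matrix of shape (|A1|, |A2|).

import numpy as np
from scipy.optimize import linprog

def maxmin_strategy(u1):
    n1, n2 = u1.shape
    # Variables: [x_1, ..., x_{n1}, v]; maximize v, i.e. minimize -v.
    c = np.zeros(n1 + 1); c[-1] = -1.0
    # For every pure column a2 of player 2:  v - sum_j x_j u1(j, a2) <= 0.
    A_ub = np.hstack([-u1.T, np.ones((n2, 1))])
    b_ub = np.zeros(n2)
    A_eq = np.zeros((1, n1 + 1)); A_eq[0, :n1] = 1.0      # x sums to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n1 + [(None, None)]            # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n1], res.x[-1]          # maxmin strategy and its guaranteed value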

The computation of minmax strategies follows the same pattern. We can again use the minmax theorem to argue that player 2's Nash equilibrium strategy in G′ is a minmax strategy for him against player 1 in G. (If we wanted to compute player 1's minmax strategy, we would have to construct another game G′′ where player 1's payoff is −u2, the negative of player 2's payoff in G.) Thus, both maxmin and minmax strategies can be computed efficiently for two-player games.

3.7 Removing dominated strategies

Recall that one strategy dominates another when the first strategy is always at least as good as the second, regardless of the other players' actions. (Section 2.3.5 gave the formal definitions.) In this section we discuss some computational uses for identifying dominated strategies, and consider the computational complexity of this process.

As discussed earlier, iterated removal of strictly dominated strategies is conceptually straightforward: the same set of strategies will be identified regardless of the elimination order, and all Nash equilibria of the original game will be contained in this set. Thus, this method can be used to narrow down the set of strategies to consider before attempting to identify a sample Nash equilibrium. In the worst case this procedure will have no effect; many games have no dominated strategies. In practice, however, it can make a big difference to iteratively remove dominated strategies before attempting to compute an equilibrium.

Things are a bit trickier with the iterated removal of weakly or very weakly dominated strategies. In this case the elimination order does make a difference: the set of strategies that survive iterated removal can differ depending on the order in which dominated strategies are removed.


As a consequence, removing weakly or very weakly dominated strategies can eliminate some equilibria of the original game. There is still a computational benefit to this technique, however. Since no new equilibria are ever created by this elimination (and since every game has at least one equilibrium), at least one of the original equilibria always survives. This is enough if all we want to do is to identify a sample Nash equilibrium. Furthermore, iterative removal of weakly or very weakly dominated strategies can eliminate a larger set of strategies than iterative removal of strictly dominated strategies, and so will often produce a smaller game.

What is the complexity of determining whether a given strategy can be removed? This depends on whether we are interested in checking the strategy for domination by a pure or a mixed strategy, whether we are interested in strict, weak or very weak domination, and whether we are interested only in domination or in survival under iterated removal of dominated strategies.

3.7.1 Domination by a pure strategy

The simplest case is checking whether a (not necessarily pure) strategy si for player i is (strictly; weakly; very weakly) dominated by any pure strategy for i. For concreteness, let's consider the case of strict dominance. To solve the problem we must check every pure strategy ai for player i and every pure strategy profile for the other players to determine whether there exists some ai for which it is never weakly better for i to play si instead of ai. If so, si is strictly dominated. An algorithm for this case is given in Figure 3.14.

forall pure strategies ai ∈ Ai for player i where ai ≠ si do
    dom ← true
    forall pure strategy profiles a−i ∈ A−i for the players other than i do
        if ui(si, a−i) ≥ ui(ai, a−i) then
            dom ← false
            break
    if dom = true then
        return true
return false

Figure 3.14  Algorithm for determining whether si is strictly dominated by any pure strategy.

Observe that this algorithm works because we do not need to check every mixed-strategy profile of the other players, even though the definition of dominance refers to such strategies. Why can we get away with this? If it is the case (as the inner loop of our algorithm attempts to prove) that for every pure strategy profile a−i ∈ A−i, ui(si, a−i) < ui(ai, a−i), then there cannot exist any mixed strategy profile s−i ∈ S−i for which ui(si, s−i) ≥ ui(ai, s−i). This holds because of the linearity of expectation.

The case of very weak dominance can be tested using essentially the same algorithm as in Figure 3.14, except that the inner test becomes ui(si, a−i) > ui(ai, a−i). For weak dominance we need to do a bit more book-keeping:


we can test the same condition as for very weak dominance, but we must also set dom ← false if there is not at least one a−i for which ui(si, a−i) < ui(ai, a−i). For all of the definitions of domination, the complexity of the procedure is O(|A|), linear in the size of the normal-form game.
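A compact, vectorized rendering of these three tests (an illustrative sketch, not the book's code) is shown below; U_i is player i's payoff tensor, s_i a mixed strategy for player i, and i the index of the axis holding player i's actions. The optional exclude argument skips the action equal to s_i when s_i is itself pure.

import numpy as np

def dominated_by_pure(U_i, s_i, i, flavor="strict", exclude=None):
    """Is s_i dominated by some pure action of player i, in the given sense?"""
    u_si = np.tensordot(s_i, U_i, axes=([0], [i]))     # payoff of s_i vs every pure a_-i
    for a_i in range(U_i.shape[i]):
        if a_i == exclude:
            continue
        u_ai = np.take(U_i, a_i, axis=i)               # payoff of a_i vs every pure a_-i
        if flavor == "strict" and np.all(u_ai > u_si):
            return True
        if flavor == "very_weak" and np.all(u_ai >= u_si):
            return True
        if flavor == "weak" and np.all(u_ai >= u_si) and np.any(u_ai > u_si):
            return True
    return False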

3.7.2 Domination by a mixed strategy

Recall that sometimes a strategy is not dominated by any pure strategy, but is dominated by some mixed strategy. (We saw an example of this in Figure 2.15.) We cannot use a simple algorithm like the one in Figure 3.14 to test whether a given strategy si is dominated by a mixed strategy, because these strategies cannot be enumerated. However, it turns out that we can still answer the question in polynomial time by solving a linear program. In this section, we will assume that player i's utilities are strictly positive. This assumption is without loss of generality because if any player i's utilities were negative, we could add a constant to all of i's payoffs without changing the game (see Section 2.1.2).

Each flavor of domination requires a somewhat different linear program. First, let us consider strict domination by a mixed strategy. This would seem to have the straightforward solution shown in Figure 3.15.

∑_{j∈Ai} pj ui(aj, a−i) > ui(si, a−i)        ∀a−i ∈ A−i      (3.33)
pj ≥ 0                                        ∀j ∈ Ai          (3.34)
∑_{j∈Ai} pj = 1                                                (3.35)

Figure 3.15  Constraints for determining whether player i's strategy si is strictly dominated by any other mixed strategy (but not a valid LP because of the strict inequality in Constraint (3.33)).

While the constraints in Figure 3.15 do indeed describe strict domination by a mixed strategy, they do not constitute a linear program. The problem is that the constraints in linear programs must be weak inequalities (see Appendix B), and thus we cannot write Constraint (3.33) as we have done here. Instead, we must use the LP given in Figure 3.16.

minimize     ∑_{j∈Ai} pj                                                      (3.36)
subject to   ∑_{j∈Ai} pj ui(aj, a−i) ≥ ui(si, a−i)        ∀a−i ∈ A−i          (3.37)

Figure 3.16  Linear program for determining whether player i's strategy si is strictly dominated by any other mixed strategy.


This linear program simulates the strict inequality of Constraint (3.33) through the objective function, as we will describe in a moment. Because no constraints restrict the pj's, this LP will always be feasible. An optimal solution will have all pj ≥ 0 (following from our assumption that ui is strictly positive everywhere); however, the pj's may not sum to one. In the optimal solution the pj's will be set so that their sum cannot be reduced any further without violating Constraint (3.37). Thus for at least some a−i ∈ A−i we will have ∑_{j∈Ai} pj ui(aj, a−i) = ui(si, a−i). A strictly dominating mixed strategy therefore exists if and only if the optimal solution to the LP has objective function value strictly less than 1. In this case, we can add a positive amount to each pj in order to cause Constraint (3.37) to hold in its strict version everywhere while achieving the condition ∑_j pj = 1.
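In code, the test based on the LP of Figure 3.16 might look as follows; this is again a sketch built on scipy.optimize.linprog, and it assumes, as the text does, that utilities have been shifted to be strictly positive. U_i and the axis convention are the same as in the earlier dominance sketch.

import numpy as np
from scipy.optimize import linprog

def strictly_dominated_by_mixed(U_i, s_i, i):
    n_i = U_i.shape[i]
    u_si = np.tensordot(s_i, U_i, axes=([0], [i])).ravel()        # rhs of (3.37), one entry per a_-i
    cols = [np.take(U_i, j, axis=i).ravel() for j in range(n_i)]  # u_i(a_j, .) for each pure a_j
    A_ub = -np.column_stack(cols)                                 # flip the >= of (3.37) into <=
    res = linprog(np.ones(n_i), A_ub=A_ub, b_ub=-u_si,
                  bounds=[(0, None)] * n_i, method="highs")
    return bool(res.success and res.fun < 1 - 1e-9)               # dominated iff min sum p_j < 1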

Next, let us consider very weak domination. This flavor of domination does not require any strict inequalities, so things are easy here. Here we can construct a feasibility program, nearly identical to our earlier failed attempt from Figure 3.15, which is given in Figure 3.17.

∑_{j∈Ai} pj ui(aj, a−i) ≥ ui(si, a−i)        ∀a−i ∈ A−i      (3.38)
pj ≥ 0                                        ∀j ∈ Ai          (3.39)
∑_{j∈Ai} pj = 1                                                (3.40)

Figure 3.17  Feasibility program for determining whether player i's strategy si is very weakly dominated by any other mixed strategy.

Finally, let us consider weak domination by a mixed strategy. Again our inability to write a strict inequality will make things more complicated. However, we can derive an LP by adding an objective function to the feasibility program in Figure 3.17, as shown in Figure 3.18.

Because of Constraint (3.42), any feasible solution will have a non-negative objective value. If the optimal solution has a strictly positive objective, the mixed strategy given by the pj's achieves a strictly higher expected utility than si against at least one a−i ∈ A−i, meaning that si is weakly dominated by this mixed strategy.

As a closing remark, observe that all of our linear programs can be modified to check whether a strategy si is dominated by any mixed strategy which only places positive probability on some subset of i's actions T ⊂ Ai. This can be achieved simply by replacing all occurrences of Ai by T in Figures 3.16, 3.17 and 3.18.

3.7.3 Iterated dominance

Finally, we consider the iterated removal of dominated strategies. We only consider pure strategies as candidates for removal; indeed, as it turns out, it never helps to remove dominated mixed strategies when performing iterated removal.


maximize     ∑_{a−i∈A−i} [ ( ∑_{j∈Ai} pj ui(aj, a−i) ) − ui(si, a−i) ]                  (3.41)
subject to   ∑_{j∈Ai} pj ui(aj, a−i) ≥ ui(si, a−i)        ∀a−i ∈ A−i                    (3.42)
             pj ≥ 0                                        ∀j ∈ Ai                       (3.43)
             ∑_{j∈Ai} pj = 1                                                             (3.44)

Figure 3.18  Linear program for determining whether player i's strategy si is weakly dominated by any other mixed strategy.

It is important, however, that we consider the possibility that pure strategies may be dominated by mixed strategies, as we saw in Section 2.3.5.

For all three flavors of domination, it requires only polynomial time to iteratively remove dominated strategies until the game has been maximally reduced (i.e., no strategy is dominated for any player). A single step of this process consists of checking whether every pure strategy of every player is dominated by any other mixed strategy, which requires us to solve at worst ∑_{i∈N} |Ai| linear programs. Each step removes one pure strategy for one player, so there can be at most ∑_{i∈N} (|Ai| − 1) steps.
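As an illustration of this loop (for strict dominance by mixed strategies, reusing strictly_dominated_by_mixed from the sketch above and therefore inheriting its strictly-positive-utilities assumption), the following sketch repeatedly removes one dominated pure strategy at a time; labels records which original actions survive for each player.

import numpy as np

def iterated_strict_dominance(payoffs):
    """payoffs: list of per-player payoff tensors (player i's actions along axis i)."""
    labels = [list(range(p.shape[i])) for i, p in enumerate(payoffs)]
    changed = True
    while changed:
        changed = False
        for i in range(len(payoffs)):
            U_i = payoffs[i]
            for pos in range(U_i.shape[i]):
                e = np.zeros(U_i.shape[i]); e[pos] = 1.0        # pure strategy at position pos
                if strictly_dominated_by_mixed(U_i, e, i):
                    del labels[i][pos]                          # drop the action everywhere
                    payoffs = [np.delete(p, pos, axis=i) for p in payoffs]
                    changed = True
                    break                                       # re-index before continuing
            if changed:
                break
    return labels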

However, recall that some forms of dominance can produce different reduced games depending on the order in which dominated strategies are removed. We might therefore want to ask other computational questions regarding which strategies remain in reduced games. Listed below are some such questions.

1. (Strategy Elimination) Does there exist some elimination path under which the strategy si is eliminated?

2. (Reduction Identity) Given action subsets A′i ⊆ Ai for each player i, does there exist a maximally reduced game where each player i has the actions A′i?

3. (Reduction Size) Given constants ki for each player i, does there exist a maximally reduced game where each player i has exactly ki actions?

It turns out that the complexity of answering these questions depends on the form of domination under consideration.

Theorem 3.7.1 For iterated strict dominance, the strategy elimination, reduction identity, uniqueness and reduction size problems are in P. For iterated weak dominance, these problems are NP-complete.

The first part of this result, considering iterated strict dominance, is straightforward: it follows from the fact that iterated strict dominance always arrives at the same set of strategies regardless of elimination order.


The second part is trickier; indeed, our statement of this theorem sweeps under the carpet some subtleties about whether domination by mixed strategies is considered (it is in some cases, and is not in others) and the minimum number of utility values permitted for each player. For all the details, the reader should consult the papers cited at the end of the chapter.

3.8 Computing correlated equilibria of n-player normal-form games

The final solution concept that we will consider is correlated equilibrium. It turns out that correlated equilibria are (probably) easier to compute than Nash equilibria: a sample correlated equilibrium can be found in polynomial time using a linear programming formulation. Observe that every game has a correlated equilibrium in which the outcomes over which nature randomizes are the game's pure strategy profiles. (Of course, some of these outcomes could be assigned probability zero.) Thus, we can find a sample correlated equilibrium if we can find a probability distribution over pure action profiles with the property that each agent would prefer to play the action corresponding to a chosen outcome when told to do so, given that the other agents are doing the same.

As in Section 2.2, let a ∈ A denote a pure strategy profile, and let ai ∈ Ai denote a pure strategy for player i. The variables in our linear program are p(a), the probability of realizing a given pure strategy profile a; since there is a variable for every pure strategy profile there are thus |A| variables. Observe that as above the values ui(a) are constants. The linear program is given in Figure 3.19.

∑_{a∈A : ai∈a} p(a) ui(a) ≥ ∑_{a∈A : ai∈a} p(a) ui(a′i, a−i)        ∀i ∈ N, ∀ai, a′i ∈ Ai      (3.45)
p(a) ≥ 0                                                             ∀a ∈ A                     (3.46)
∑_{a∈A} p(a) = 1                                                                                 (3.47)

Figure 3.19  Linear program for computing a sample correlated equilibrium of an n-player general-sum game.

Constraints (3.46) and (3.47) ensure that p is a valid probability distribution. The interesting constraint is (3.45), which expresses the requirement that player i must be (weakly) better off playing action ai when he is told to do so than playing any other action a′i, given that other agents play their prescribed actions. This constraint effectively restates the definition of a correlated equilibrium given in Definition 2.3.20.

We can select a desired correlated equilibrium by adding an objective function to the linear program.


For example, we can find a correlated equilibrium that maximizes the sum of the agents' expected utilities by adding the objective function

maximize   ∑_{a∈A} p(a) ∑_{i∈N} ui(a).      (3.48)
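The following sketch poses the LP of Figure 3.19 with the social-welfare objective (3.48) to scipy.optimize.linprog, for a game given as a list of payoff tensors; the function name and the profile-indexing scheme are illustrative choices of our own.

import itertools
import numpy as np
from scipy.optimize import linprog

def max_welfare_correlated_eq(payoffs):
    n = len(payoffs)
    shape = payoffs[0].shape
    profiles = list(itertools.product(*[range(k) for k in shape]))
    index = {a: t for t, a in enumerate(profiles)}
    A_ub, b_ub = [], []
    # Incentive constraints (3.45): the recommended a_i is at least as good as any a_i'.
    for i in range(n):
        for a_i in range(shape[i]):
            for a_i_alt in range(shape[i]):
                if a_i_alt == a_i:
                    continue
                row = np.zeros(len(profiles))
                for a in profiles:
                    if a[i] != a_i:
                        continue
                    a_dev = a[:i] + (a_i_alt,) + a[i + 1:]
                    row[index[a]] = payoffs[i][a_dev] - payoffs[i][a]   # gain from deviating
                A_ub.append(row); b_ub.append(0.0)
    # Objective: maximize total utility, i.e. minimize its negative.
    c = -np.array([sum(payoffs[i][a] for i in range(n)) for a in profiles])
    A_eq = np.ones((1, len(profiles))); b_eq = np.array([1.0])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(profiles), method="highs")
    return dict(zip(profiles, res.x)) if res.success else None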

Furthermore, all of the questions discussed in Section 3.5 can be answered about correlated equilibria in polynomial time, making them (most likely) fundamentally easier problems.

Theorem 3.8.1 The following problems are in the complexity class P when applied to correlated equilibria: uniqueness, Pareto optimal, guaranteed payoff, subset inclusion, and subset containment.

Finally, it is worthwhile to consider the reason for the computational difference between correlated equilibria and Nash equilibria. Why can we express the definition of a correlated equilibrium as a linear constraint (3.45), while we cannot do the same with the definition of a Nash equilibrium, even though both definitions are quite similar? The difference is that a correlated equilibrium involves a single randomization over action profiles, while in a Nash equilibrium agents randomize separately. Thus, the (non-linear) version of Constraint (3.45) which would instruct a feasibility program to find a Nash equilibrium would be:

∑_{a∈A} ui(a) ∏_{j∈N} pj(aj)  ≥  ∑_{a−i∈A−i} ui(a′i, a−i) ∏_{j∈N\{i}} pj(aj)        ∀i ∈ N, ∀a′i ∈ Ai.

This constraint now mimics Constraint (3.45), directly expressing the definition of Nash equilibrium. It states that each player i attains at least as much expected utility from following his mixed strategy pi as from any pure-strategy deviation a′i, given the mixed strategies of the other players. However, the constraint is non-linear because of the product ∏_{j∈N} pj(aj).

3.9 History and references

The complexity of finding a sample Nash equilibrium is explored in a series of articles. First came the original definition of the class TFNP, a super-class of PPAD, by Papadimitriou [1994]. Next Goldberg and Papadimitriou [2006] showed that finding an equilibrium of a game with any constant number of players is no harder than finding the equilibrium of a four-player game, and Daskalakis et al. [2006b] showed that these computational problems are PPAD-complete. The result was almost immediately tightened to encompass two-player games by Chen and Deng [2006]. The NP-completeness results for Nash equilibria with specific properties are due to Gilboa and Zemel [1989] and Conitzer and Sandholm [2003b]; the inapproximability result appeared in [Conitzer, 2006].

A general survey of the classical algorithms for computing Nash equilibria in two-person games is provided in von Stengel [2002]; another good survey is [McKelvey & McLennan, 1996].


Some specific references, both to these classical algorithms and the newer ones discussed in the chapter, are as follows. The Lemke-Howson algorithm is based on Lemke's pivoting procedure for solving linear complementarity problems in general [Lemke, 1978], and the specialization for games in normal form (also called bimatrix games) in Lemke and Howson [1964]. The graphical exposition of the Lemke-Howson algorithm appeared first in [Shapley, 1974], and then in a modified version in von Stengel [2002]. Scarf's simplicial-subdivision-based algorithm is described in Scarf [1967]. Homotopy-based approximation methods are covered, for example, in García and Zangwill [1981]. Govindan and Wilson's homotopy method was presented in [Govindan & Wilson, 2003]; its path-following procedure depends on topological results due to Kohlberg and Mertens [1986]. The support-enumeration method for finding a sample Nash equilibrium is described in [Porter et al., 2004a]. The complexity of eliminating dominated strategies is described in Gilboa et al. [1989] and Conitzer and Sandholm [2005].

Two online resources are of particular note. GAMBIT [McKelvey et al., 2006] (http://econweb.tamu.edu/gambit) is a library of game-theoretic algorithms for finite normal-form and extensive-form games. It includes many different algorithms for finding Nash equilibria. In addition to several algorithms that can be used on general-sum, n-player games, it also includes implementations of algorithms designed for special cases, including 2-person games, zero-sum games and finding all equilibria. GAMUT [Nudelman et al., 2004] (http://gamut.stanford.edu) is a suite of game generators designed for testing game-theoretic algorithms.


4  Playing Games with Time: Reasoning and Computing with the Extensive Form

In Chapter 2 we assumed that a game is represented in normal form: effectively, as a big table. In some sense, this is reasonable. The normal form is conceptually straightforward, and most game theorists see it as fundamental. While many other representations exist to describe finite games, we will see in this chapter and in Chapter 5 that each of them has an "induced normal form": a corresponding normal-form representation which preserves game-theoretic properties such as Nash equilibria. Thus the results given in Chapter 2 hold for all finite games, no matter how they are represented; in that sense the normal form representation is universal.

In this chapter we will look at extensive-form games, a finite representation which does not always assume that players act simultaneously. This representation is in general exponentially smaller than its induced normal form, and furthermore can be much more natural to reason about. While the Nash equilibria of an extensive-form game can be found through its induced normal form, computational benefit can be had by working with the extensive form directly. Furthermore, there are other solution concepts such as subgame-perfect equilibria (see Section 4.1.3) which explicitly refer to the sequence in which players act, and which are therefore not meaningful when applied to normal-form games.

4.1 Perfect-information extensive-form games

The normal-form game representation does not incorporate any notion of sequence, or time, of the actions of the players. The extensive (or tree) form is an alternative representation that makes the temporal structure explicit. We start by discussing the special case of perfect-information extensive-form games, and then move on to discuss the more general class of imperfect-information extensive-form games in Section 4.2. In both cases we will restrict the discussion to finite games, that is, to games represented as finite trees.


4.1.1 Definition

Informally speaking, a perfect-information game in extensive form (or, more simply, a perfect-information game) is a tree in the sense of graph theory, in which each node represents the choice of one of the players, each edge represents a possible action, and the leaves represent final outcomes over which each player has a utility function. More formally, we define the following.

Definition 4.1.1 (Perfect-information game) A (finite) perfect-information game (in extensive form) is a tuple G = (N, A, H, Z, χ, ρ, σ, u), where

• N is a set of n players;

• A is a (single) set of actions;

• H is a set of non-terminal choice nodes;

• Z is a set of terminal nodes, disjoint from H;

• χ : H → 2^A is the action function, which assigns to each choice node a set of possible actions;

• ρ : H → N is the player function, which assigns to each non-terminal node a player i ∈ N who chooses an action at that node;

• σ : H × A → H ∪ Z is the successor function, which maps a choice node and an action to a new choice node or terminal node such that for all h1, h2 ∈ H and a1, a2 ∈ A, if σ(h1, a1) = σ(h2, a2) then h1 = h2 and a1 = a2;

• u = (u1, . . . , un), where ui : Z → R is a real-valued utility function for player i on the terminal nodes Z.

Since the choice nodes form a tree, we can unambiguously identify a node with its history, that is, the sequence of choices leading from the root node to it. We can also define the descendants of a node h, namely all the choice and terminal nodes in the subtree rooted in h.

An example of such a game is the sharing game. Imagine a brother and sister following the following protocol for sharing two indivisible and identical presents from their parents. First the brother suggests a split, which can be one of three: he keeps both, she keeps both, or they each keep one. Then the sister chooses whether to accept or reject the split. If she accepts they each get their allocated present(s), and otherwise neither gets any gift. Assuming both siblings value the two presents equally and additively, the graphical tree representation of this game is shown in Figure 4.1.

4.1.2 Strategies and equilibria

A pure strategy for a player in a perfect-information game is a complete specification of which deterministic action to take at every node belonging to that player. More formally:


[Figure 4.1 The sharing game. The brother moves first, proposing the split 2–0, 1–1, or 0–2; at each resulting node the sister chooses yes or no. An accepted split yields payoffs (2,0), (1,1), or (0,2) respectively; every rejection yields (0,0).]
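Concretely, one might write this tree down in Python as nested tuples (an illustrative encoding of our own, not part of the text, and reused in the backward-induction sketch of Section 4.1.4): an internal node is a pair (player, {action: child}) and a leaf is a payoff tuple (u_brother, u_sister).

sharing_game = (1, {
    "2-0": (2, {"yes": (2, 0), "no": (0, 0)}),
    "1-1": (2, {"yes": (1, 1), "no": (0, 0)}),
    "0-2": (2, {"yes": (0, 2), "no": (0, 0)}),
})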

Definition 4.1.2 (Pure strategies) Let G = (N, A, H, Z, χ, ρ, σ, u) be a perfect-information extensive-form game. Then the pure strategies of player i consist of the cross product ×_{h∈H : ρ(h)=i} χ(h).

Notice that the definition contains a subtlety. An agent's strategy requires a decision at each choice node, regardless of whether or not it is possible to reach that node given the other choice nodes. In the sharing game above the situation is straightforward: player 1 has three pure strategies, and player 2 has eight, as follows.

S1 = {2–0, 1–1, 0–2}

S2 = {(yes, yes, yes), (yes, yes, no), (yes, no, yes), (yes, no, no), (no, yes, yes), (no, yes, no), (no, no, yes), (no, no, no)}

But now consider the game shown in Figure 4.2.

[Figure 4.2 A perfect-information game in extensive form. Player 1 moves first, choosing A or B. After A, player 2 chooses between C, ending the game with payoffs (3,8), and D, ending it with (8,3). After B, player 2 chooses between E, ending the game with (5,5), and F, after which player 1 chooses between G, with payoffs (2,10), and H, with payoffs (1,0).]

In order to define a complete strategy for this game, each of the players must choose an action at each of his two choice nodes. Thus we can enumerate the pure strategies of the players as follows.


S1 = {(A, G), (A, H), (B, G), (B, H)}

S2 = {(C, E), (C, F), (D, E), (D, F)}

It is important to note that we have to include the strategies (A, G) and (A, H), even though once player 1 has chosen A then his own G-versus-H choice is moot.

The definitions of best response and Nash equilibrium in this game are exactly as they are for normal-form games. Indeed, this example illustrates how every perfect-information game can be converted to an equivalent normal-form game. For example, the perfect-information game of Figure 4.2 can be converted into the normal-form image of the game, shown in Figure 4.3. Clearly, the strategy spaces of the two games are the same, as are the pure-strategy Nash equilibria. (Indeed, both the mixed strategies and the mixed-strategy Nash equilibria of the two games are also the same; however, we defer further discussion of mixed strategies until we consider imperfect-information games in Section 4.2.)

            (C,E)    (C,F)    (D,E)    (D,F)
  (A,G)     3,8      3,8      8,3      8,3
  (A,H)     3,8      3,8      8,3      8,3
  (B,G)     5,5      2,10     5,5      2,10
  (B,H)     5,5      1,0      5,5      1,0

Figure 4.3  The game from Figure 4.2 in normal form.

In this way, for every perfect-information game there exists a corresponding normal-form game. Note, however, that the temporal structure of the extensive-form representation can result in a certain redundancy within the normal form. For example, in Figure 4.3 there are 16 different outcomes, while in Figure 4.2 there are only five; the payoff (3, 8) occurs only once in Figure 4.2 but four times in Figure 4.3. One general lesson is that while this transformation can always be performed, it can result in an exponential blowup of the game representation. This is an important lesson, since the didactic examples of normal-form games are very small, wrongly suggesting that this form is more compact.

The normal form gets its revenge, however, since the reverse transformation, from the normal form to the perfect-information extensive form, does not always exist. Consider, for example, the Prisoner's Dilemma game from Figure 2.3. A little experimentation will convince the reader that there does not exist a perfect-information game that is equivalent in the sense of having the same strategy profiles and the same payoffs. Intuitively, the problem is that perfect-information extensive-form games cannot model simultaneity.


The general characterization of the class of normal-form games for which there exist corresponding perfect-information games in extensive form is somewhat complex.

The reader will have noticed that we have so far concentrated on pure strategies and pure Nash equilibria in extensive-form games. There are two reasons for this, or perhaps one reason and one excuse. The reason is that the discussion of mixed strategies introduces a new subtlety, and it is convenient to postpone discussion of it. The excuse (which also allows the postponement, though not for long) is the following theorem:

Theorem 4.1.3 Every (finite) perfect-information game in extensive form has a pure-strategy Nash equilibrium.

This is perhaps the earliest result in game theory, due to Zermelo in 1913 (see Historical Notes at the end of the chapter). The intuition here should be clear; since players take turns, and everyone gets to see everything that happened thus far before making a move, it is never necessary to introduce randomness into the action selection in order to find an equilibrium.1 Both this intuition and the theorem will cease to hold true when we discuss more general classes of games such as imperfect-information games in extensive form. First, however, we discuss an important refinement of the concept of Nash equilibrium.

4.1.3 Subgame-perfect equilibrium

As we have discussed, the notion of Nash equilibrium is as well defined in perfect-information games in extensive form as it is in the normal form. However, as the following example shows, the Nash equilibrium can be too weak a notion for the extensive form. Consider again the perfect-information extensive-form game shown in Figure 4.2. There are three pure-strategy Nash equilibria in this game: {(A, G), (C, F)}, {(A, H), (C, F)} and {(B, H), (C, E)}. This can be determined by examining the normal-form image of the game, as indicated in Figure 4.4.

            (C,E)    (C,F)    (D,E)    (D,F)
  (A,G)     3,8      3,8      8,3      8,3
  (A,H)     3,8      3,8      8,3      8,3
  (B,G)     5,5      2,10     5,5      2,10
  (B,H)     5,5      1,0      5,5      1,0

Figure 4.4  Equilibria of the game from Figure 4.2.

1. Nevertheless, mixed-strategy equilibria can still exist in perfect-information extensive-form games.


However, examining the normal-form image of an extensive-form game obscures the game's temporal nature. To illustrate a problem that can arise in certain equilibria of extensive-form games, in Figure 4.5 we contrast the equilibria {(A, G), (C, F)} and {(B, H), (C, E)} by drawing them on the extensive-form game tree.

[Figure 4.5 Equilibria of the game from Figure 4.2, drawn on two copies of the game tree. Bold edges indicate players' action choices, and the circled leaf node is the outcome that arises in play. Left: {(A, G), (C, F)}; Right: {(B, H), (C, E)}.]

First consider the equilibrium {(A, G), (C, F)}. If player 1 chooses A then player 2 receives a higher payoff by choosing C than by choosing D. If player 2 played the strategy (C, E) rather than (C, F) then player 1 would prefer to play B at the first node in the tree; as it is, player 1 gets a payoff of 3 by playing A rather than a payoff of 2 by playing B. Hence we have an equilibrium.

The second equilibrium {(B, H), (C, E)} is less intuitive. First, note that {(B, G), (C, E)} is not an equilibrium: player 2's best response to (B, G) is (C, F). Thus, the only reason that player 2 chooses to play the action E is that he knows that player 1 would play H in his second decision node. This behavior by player 1 is called a threat: by committing to choose an action that is harmful to player 2 in his second decision node, player 1 can cause player 2 to avoid that part of the tree. (Note that player 1 benefits from making this threat: he gets a payoff of 5 instead of 2 by playing (B, H) instead of (B, G).) So far so good. The problem, however, is that player 2 may not consider player 1's threat to be credible: if player 1 did reach his final decision node, actually choosing H over G would also reduce player 1's own utility. If player 2 played F, would player 1 really follow through on his threat and play H, or would he relent and pick G instead?

To formally capture the reason why the {(B, H), (C, E)} equilibrium is unsatisfying, and to define an equilibrium refinement concept that does not suffer from this problem, we first define the notion of a subgame:

Definition 4.1.4 (Subgame) Given a perfect-information extensive-form game G, the subgame of G rooted at node h is the restriction of G to the descendants of h. The set of subgames of G consists of all of the subgames of G rooted at some node in G.

Now we can define the notion of a subgame-perfect equilibrium, a refinement of the Nash equilibrium in perfect-information games in extensive form, which eliminates those unwanted Nash equilibria.2

2. The word “perfect” is used in two different senses here.


Definition 4.1.5 (Subgame-perfect equilibrium) The subgame-perfect equilibria (SPE) of a game G are all strategy profiles s such that for any subgame G′ of G, the restriction of s to G′ is a Nash equilibrium of G′.

Since G is in particular its own subgame, every SPE is also a Nash equilibrium. Furthermore, although SPE is a stronger concept than Nash equilibrium (that is, every SPE is a NE, but not every NE is a SPE) it is still the case that every perfect-information extensive-form game has at least one subgame-perfect equilibrium.

This definition rules out "non-credible threats" of the sort illustrated in the above example. In particular, note that the extensive-form game in Figure 4.2 has only one subgame-perfect equilibrium, {(A, G), (C, F)}. Neither of the other Nash equilibria is subgame-perfect. Consider the subgame rooted in player 1's second choice node. The unique Nash equilibrium of this (trivial) game is for player 1 to play G. Thus the action H, the restriction of the strategies (A, H) and (B, H) to this subgame, is not optimal in this subgame, and cannot be part of a subgame-perfect equilibrium of the larger game.

4.1.4 Computing equilibria: backward induction

General-sum n-player games: the backward induction algorithm

Inherent in the concept of subgame-perfect equilibrium is the principle of backward induction. One identifies the equilibria in the "bottom-most" subgame trees, and assumes that those will be played as one backs up and considers larger and larger trees. We can use this procedure to compute a sample Nash equilibrium. This is good news: not only are we guaranteed to find a subgame-perfect equilibrium (rather than possibly finding a Nash equilibrium that involves non-credible threats) but also this procedure is computationally simple. In particular, it can be implemented as a single depth-first traversal of the game tree, and thus requires time linear in the size of the game representation. Recall in contrast that the best known methods for finding Nash equilibria of general games require time exponential in the size of the normal form; remember as well that the induced normal form of an extensive-form game is exponentially larger than the original representation.

function BackwardInduction(node h) returns u(h)
    if h ∈ Z then
        return u(h)    // h is a terminal node
    best_util ← −∞
    forall a ∈ χ(h) do
        util_at_child ← BackwardInduction(σ(h, a))
        if util_at_child_ρ(h) > best_util_ρ(h) then
            best_util ← util_at_child
    return best_util

Figure 4.6  Procedure for finding the value of a sample (subgame-perfect) Nash equilibrium of a perfect-information extensive-form game.


The algorithm BackwardInduction is described in Figure 4.6. The variable util_at_child is a vector denoting the utility for each player at the child node; util_at_child_ρ(h) denotes the element of this vector corresponding to the utility for player ρ(h) (the player who gets to move at node h). Similarly, best_util is a vector giving utilities for each player.

Observe that this procedure does not return an equilibrium strategy for each of the n players, but rather describes how to label each node with a vector of n real numbers. This labeling can be seen as an extension of the game's utility function to the non-terminal nodes H. The players' equilibrium strategies follow straightforwardly from this extended utility function: every time a given player i has the opportunity to act at a given node h ∈ H (that is, ρ(h) = i), that player will choose an action ai ∈ χ(h) that solves arg max_{ai∈χ(h)} ui(σ(h, ai)). These strategies can also be returned by BackwardInduction given some extra bookkeeping.
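A minimal Python sketch of the procedure, over the nested-tuple game encoding introduced with the sharing game in Section 4.1.1 (again an illustrative encoding of our own), is shown below. On sharing_game it returns the utility vector (2, 0); note that the sister is indifferent at the 2–0 node, and this sketch breaks such ties in favor of the first-listed action.

def backward_induction(node):
    """Return the equilibrium utility vector of the subgame rooted at node."""
    if not isinstance(node[1], dict):        # a leaf is just a tuple of utilities
        return node
    player, children = node
    best = None
    for action, child in children.items():
        util = backward_induction(child)
        if best is None or util[player - 1] > best[player - 1]:
            best = util                      # the mover keeps the child maximizing his own utility
    return best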

From the way we have described the complexity of BackwardInduction, it might seem that it is already so fast that there would be no need to improve upon it. However, perfect-information extensive-form games are often used to describe interactions so complex that the whole game tree could never even be written down. This makes our procedure's linear time complexity more onerous than it might seem. For example, consider chess: the extensive-form representation of this game has around 10^150 nodes, which is vastly too large to represent explicitly. For such games it is more common to discuss the size of the game tree in terms of the average branching factor b (the average number of actions which are possible at each node) and a maximum depth m (the maximum number of sequential actions). A procedure which requires time linear in the size of the representation thus expands O(b^m) nodes. Nevertheless, we can unfortunately do no better than this on arbitrary perfect-information games.

Zero-sum two-player games: minimax and alpha-beta pruning

We can make some computational headway in the widely applicable case of two-player zero-sum games. Before we describe how to do this, we note that BackwardInduction has another name in the two-player zero-sum context: the minimax algorithm. Recall that in such games, only a single payoff number is required to characterize any outcome. Player 1 wants to maximize this number, while player 2 wants to minimize it. In this context BackwardInduction can be understood as propagating these single payoff numbers from the leaves of the tree up to the root. Each decision node for player 1 is labeled with the maximum of the labels of its child nodes (representing the fact that player 1 would choose the corresponding action), and each decision node for player 2 is labeled with the minimum of that node's children's labels. The label on the root node is the value of the game: player 1's payoff in equilibrium.

How can we improve on the minimax algorithm? The fact that player 1 and player 2 always have strictly opposing interests means that we can prune away some parts of the game tree: we can recognize that certain subtrees will never be reached in equilibrium, even without examining the nodes in these subtrees. This leads us to a new algorithm called AlphaBetaPruning, which is given in Figure 4.7.

There are several ways in which AlphaBetaPruning differs from BackwardInduction.


function ALPHABETAPRUNING (node h, real α, real β) returns u_1(h)
    if h ∈ Z then
        return u_1(h)    // h is a terminal node
    best_util ← (2ρ(h) − 3) × ∞    // −∞ for player 1; ∞ for player 2
    forall a ∈ χ(h) do
        if ρ(h) = 1 then
            best_util ← max(best_util, ALPHABETAPRUNING(σ(h, a), α, β))
            if best_util ≥ β then
                return best_util
            α ← max(α, best_util)
        else
            best_util ← min(best_util, ALPHABETAPRUNING(σ(h, a), α, β))
            if best_util ≤ α then
                return best_util
            β ← min(β, best_util)
    return best_util

Figure 4.7 The alpha-beta pruning algorithm. It is invoked at the root node h as ALPHABETAPRUNING(h, −∞, ∞).

There are several ways in which ALPHABETAPRUNING differs from BACKWARD INDUCTION. Some concern the fact that we have now restricted ourselves to a setting where there are only two players, and one player's utility is the negative of the other's. We thus deal only with the utility for player 1. This is why we treat the two players separately, maximizing for player 1 and minimizing for player 2.

At each node h either α or β is updated. These variables take the value of the previously-encountered node that their corresponding player (player 1 for α and player 2 for β) would most prefer to choose instead of h. For example, the value of the variable β at a given node h is the value of a previously-encountered terminal node that player 2 could guarantee by making a different choice at an ancestor of h, a choice that would prevent h from ever being reached. The players do not have any alternative to starting at the root of the tree, which is why at the beginning of the search α = −∞ and β = ∞.

We can now concentrate on the important difference between BACKWARD INDUCTION and ALPHABETAPRUNING: in the latter procedure, the search can backtrack at a node that is not terminal. Let's think about things from the point of view of player 1, who is considering what action to play at node h. (As we encourage you to check for yourself, a similar argument holds when it is player 2's turn to move at node h.) For player 1, this backtracking occurs on the line that reads "if best_util ≥ β then return best_util." What is going on here? We have just explored some, but not all, of the children of player 1's decision node h; the highest value among these explored nodes is best_util. The value of node h is therefore lower-bounded by best_util (it is best_util if h has no children with larger values, and is some larger amount otherwise). Either way, if best_util ≥ β then player 1 knows that player 2 prefers choosing his best alternative (at some ancestor node of h) rather than allowing player 1 to act at node h.


Thus node h cannot be on the equilibrium path³ and so there is no need to continue exploring the game tree below h.

3. In fact, in the case best_util = β it is possible that h could be reached on an equilibrium path; however, in this case there is still always an equilibrium in which player 2 plays his best alternative and h is not reached.

[Figure 4.8: a game tree for the pruning example. Player 1 moves at the root; the left subtree is a player 2 node whose two terminal children have values 10 and 8, and the right subtree is the shaded player 2 node whose first terminal child has value 6, followed by further, unexplored children.]

Figure 4.8 An example of alpha-beta pruning. We can backtrack after expanding the first child of the shaded node.

A simple example of ALPHABETAPRUNING in action is given in Figure 4.8. The search begins by heading down the left branch and visiting both terminal nodes, and eventually setting β = 8. (Do you see why?) It then returns the value 8 as the value of this subgame, which causes α to be set to 8 at the root node. In the right subgame the search visits the first terminal node and so sets best_util = 6 at the shaded node, which we will call h. Now at h we have best_util ≤ α, which means that we can backtrack. This is safe to do because we have just shown that player 1 would never choose this subgame: he can guarantee himself a payoff of 8 by choosing the left subgame, whereas his utility in the right subgame would be no more than 6.
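For concreteness, the following Python sketch (ours; the tree encoding and the dummy values standing in for the unexplored children of Figure 4.8 are assumptions) mirrors the pseudocode of Figure 4.7:

    import math

    # Sketch of ALPHABETAPRUNING for trees encoded as ("max"/"min", [children])
    # internal nodes and plain numbers (player 1's payoff) at the leaves.
    def alphabeta(h, alpha=-math.inf, beta=math.inf):
        if isinstance(h, (int, float)):          # terminal node
            return h
        kind, children = h
        best = -math.inf if kind == "max" else math.inf
        for child in children:
            val = alphabeta(child, alpha, beta)
            if kind == "max":                    # player 1 maximizes
                best = max(best, val)
                if best >= beta:
                    return best                  # backtrack: player 2 would avoid this node
                alpha = max(alpha, best)
            else:                                # player 2 minimizes
                best = min(best, val)
                if best <= alpha:
                    return best                  # backtrack: player 1 would avoid this node
                beta = min(beta, best)
        return best

    # The tree of Figure 4.8; 999 is a hypothetical stand-in for the unexplored
    # children, which are never evaluated because the node is pruned.
    tree = ("max", [("min", [10, 8]), ("min", [6, 999, 999])])
    print(alphabeta(tree))                       # 8: the value of the game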

The effectiveness of the alpha-beta pruning algorithm depends on the order in which nodes are considered. For example, if player 1 considers nodes in increasing order of their value, and player 2 considers nodes in decreasing order of value, then no nodes will ever be pruned. In the best case (where nodes are ordered in decreasing value for player 1 and in increasing order for player 2), alpha-beta pruning has complexity of O(b^{m/2}). We can rewrite this expression as O((√b)^m), making more explicit the fact that the game's branching factor would effectively be cut to the square root of its original value. If nodes are examined in random order then the analysis becomes somewhat more complicated; when b is fairly small, the complexity of alpha-beta pruning is O(b^{3m/4}), which is still an exponential improvement. In practice, it is usually possible to achieve performance somewhere between the best case and the random case. This technique thus offers substantial practical benefit over straightforward backward induction in two-player zero-sum games for which the game tree is represented implicitly.

Techniques like alpha-beta pruning are commonly used to build strong computer players for two-player board games such as chess. (However, they perform poorly on games with extremely large branching factors, such as go.) Of course building a good computer player involves a great deal of engineering, and requires considerable attention to game-specific heuristics such as those used to order actions. One general technique is required by many such systems, however, and so is worth discussing here. The game tree in practical games can be so large that it is infeasible to search all the way down to leaf nodes. Instead, the search proceeds to some shallower depth (which is chosen either statically or dynamically). Where do we get the node values to propagate up using backward induction? The trick is to use an evaluation function to estimate the value of the deepest node reached (taking into account game-relevant features such as board position, number of pieces for each player, who gets to move next, etc., and either built by hand or learned). When the search has reached an appropriate depth, the node is treated as terminal with a call to the evaluation function replacing the evaluation of the utility function at that node. This requires a small change to the beginning of ALPHABETAPRUNING; otherwise, the algorithm works unchanged.
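A minimal sketch of this change, under the same assumed tree encoding as in the earlier sketch and with a caller-supplied evaluation function standing in for a hand-built or learned heuristic:

    import math

    # Depth-limited variant: when the depth budget is exhausted, the node is
    # treated as terminal and scored by `evaluate` (a hypothetical heuristic).
    def alphabeta_depth_limited(h, depth, evaluate, alpha=-math.inf, beta=math.inf):
        if isinstance(h, (int, float)):          # true terminal node
            return h
        if depth == 0:                           # cutoff: estimate instead of recursing
            return evaluate(h)
        kind, children = h
        best = -math.inf if kind == "max" else math.inf
        for child in children:
            val = alphabeta_depth_limited(child, depth - 1, evaluate, alpha, beta)
            if kind == "max":
                best = max(best, val)
                if best >= beta:
                    return best
                alpha = max(alpha, best)
            else:
                best = min(best, val)
                if best <= alpha:
                    return best
                beta = min(beta, best)
        return best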

General-sum two-player games: computing all subgame-perfect equilibria

While the BACKWARD INDUCTION procedure identifies one subgame-perfect equilibrium in linear time, it does not provide an efficient way of finding all of them. One might wonder how there could even be more than one SPE in a perfect-information game. Multiple subgame-perfect equilibria can exist when there exist one or more decision nodes at which a player chooses between subgames in which he receives the same utility. In such cases BACKWARD INDUCTION simply chooses the first subgame it encountered. It could be useful to find the set of all subgame-perfect equilibria if we wanted to identify a specific SPE, such as the one that maximizes social welfare (as we did with Nash equilibria of normal-form games in Section 3.5).

Here let us restrict ourselves to two-player perfect-information extensive-form games, but lift our previous restriction that the game be zero-sum. A somewhat more complicated algorithm can find the set of all subgame-perfect equilibrium values in worst-case cubic time.

Theorem 4.1.6 Given a two-player perfect-information extensive-form game with ℓ leaves, the set of subgame-perfect equilibrium payoffs can be represented as the union of O(ℓ^2) axis-aligned rectangles and can be computed in time O(ℓ^3).

Intuitively, the algorithm works much like BACKWARD INDUCTION, but the variable util_at_child holds a representation of all equilibrium values instead of just one. The "max" operation we had previously implemented through best_util is replaced by a subroutine that returns a representation of all the values that can be obtained in subgame-perfect equilibria of the node's children. This can include mixed strategies if multiple children are simultaneously best responses. More information about this algorithm can be found in the reference cited in the chapter notes.

4.1.5 An example, and criticisms of backward induction

Despite the fact that strong arguments can be made in its favor, the concept of backward induction is not without controversy. To see why this is, consider the well-known


centipede game, depicted in Figure 4.9. (The game starts at the node at the upper left.) In this game two players alternate in making decisions, at each turn choosing between going "down" and ending the game or going "across" and continuing it (except at the last node, where going "across" also ends the game). The payoffs are constructed in such a way that the only SPE is for each player to always choose to go down. To see why, consider the last choice. Clearly at that point the best choice for the player is to go down. Since this is the case, going down is also the best choice for the other player at the previous choice point. By induction the same argument holds for all choice points.

[Figure 4.9: the centipede game. The players alternate, starting with player 1; at each of the five nodes the mover chooses D (down), ending the game, or A (across), continuing it. Going down at the successive nodes yields the payoffs (1,0), (0,2), (3,1), (2,4), and (4,3); going across at the last node yields (3,5).]

Figure 4.9 The centipede game

This would seem to be the end of this story, except for two pesky factors. The first is common sense: the SPE prediction in this case flies in the face of intuition. Indeed, in laboratory experiments subjects in fact continue to play "across" until close to the end of the game. The second problem is theoretical. Imagine that you are the second player in the game, and in the first step of the game the first player actually goes across. What should you do? The SPE suggests you should go down, but the same analysis suggests that you wouldn't have gotten to this choice point in the first place. In other words, you have reached a state to which your analysis has given a probability of zero. How should you amend your beliefs and course of action based on this measure-zero event? It turns out this seemingly small inconvenience actually raises a fundamental problem in game theory. We will not develop the subject further here, but let us only mention that there exist different accounts of this situation, and they depend on the probabilistic assumptions made, on what is common knowledge (in particular, whether there is common knowledge of rationality), and on exactly how one revises one's beliefs in the face of measure-zero events. The last question is intimately related to the subject of belief revision discussed in Chapter 13.

4.2 Imperfect-information extensive-form games

Up to this point, in our discussion of extensive-form games we have allowed players to specify the action that they would take at every choice node of the game. This implies that players know the node they are in and, recalling that in such games we equate nodes with the histories that led to them, all the prior choices, including those of other agents. For this reason we have called these perfect-information games.

We might not always want to make such a strong assumption about our players and our environment. In many situations we may want to model agents needing to act with partial or no knowledge of the actions taken by others, or even agents with limited memory of their own past actions. The sequencing of choices allows us to represent such ignorance to a limited degree; an "earlier" choice might be interpreted as a choice


made without knowing the "later" choices. However, we cannot represent two choices made in the same play of the game in mutual ignorance of each other. The normal form, of course, is optimized for such modelling.

4.2.1 Definition

Imperfect-information games in extensive form address this limitation. An imperfect-information game is an extensive-form game in which each player's choice nodes are partitioned into information sets; intuitively, if two choice nodes are in the same information set then the agent cannot distinguish between them.⁴

Definition 4.2.1 (Imperfect-information game) An imperfect-information game (in extensive form) is a tuple (N, A, H, Z, χ, ρ, σ, u, I), where

• (N, A, H, Z, χ, ρ, σ, u) is a perfect-information extensive-form game, and

• I = (I_1, . . . , I_n), where I_i = (I_{i,1}, . . . , I_{i,k_i}) is an equivalence relation on (that is, a partition of) {h ∈ H : ρ(h) = i} with the property that χ(h) = χ(h′) and ρ(h) = ρ(h′) whenever there exists a j for which h ∈ I_{i,j} and h′ ∈ I_{i,j}.

Note that in order for the choice nodes to be truly indistinguishable, we require that the set of actions at each choice node in an information set be the same (otherwise, the player would be able to distinguish the nodes). Thus, if I_{i,j} ∈ I_i is an equivalence class, we can unambiguously use the notation χ(I_{i,j}) to denote the set of actions available to player i at any node in information set I_{i,j}.
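A small illustration (ours, with hypothetical node and information-set names) of why χ(I_{i,j}) is well defined:

    # Sketch: an information set groups nodes the acting player cannot tell apart,
    # and all nodes in it must offer the same actions, so chi(I) is unambiguous.
    game_nodes = {
        "h1": {"player": 2, "actions": {"A", "B"}},
        "h2": {"player": 2, "actions": {"A", "B"}},
    }
    information_sets = {"I_2_1": {"h1", "h2"}}   # hypothetical labels

    def actions_at(info_set):
        """chi(I): the common action set, well defined because all nodes agree."""
        action_sets = {frozenset(game_nodes[h]["actions"]) for h in information_sets[info_set]}
        assert len(action_sets) == 1, "nodes in one information set must share actions"
        return set(next(iter(action_sets)))

    print(actions_at("I_2_1"))                   # {'A', 'B'}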

[Figure 4.10: an imperfect-information game tree. Player 1 moves first, choosing L or R; R leads immediately to the payoff (1,1). After L, player 2 chooses A or B, and then player 1, who cannot tell whether A or B was chosen (her two lower nodes lie in one information set), chooses ℓ or r. The resulting payoffs are (0,0) and (2,4) following A, and (2,4) and (0,0) following B.]

Figure 4.10 An imperfect-information game.

Consider the imperfect-information extensive-form game shown in Figure 4.10. In this game, player 1 has two information sets: the set including the top choice node, and the set including the bottom choice nodes. Note that the two bottom choice nodes in the second information set have the same set of possible actions. We can regard player 1 as not knowing whether player 2 chose A or B when she makes her choice between ℓ and r.

4. From the technical point of view, imperfect-information games are obtained by overlaying a partition structure, as defined in Chapter 12 in connection with models of knowledge, over a perfect-information game.


4.2.2 Strategies and equilibria

A pure strategy for an agent in an imperfect-information game selects one of the available actions in each information set of that agent:

Definition 4.2.2 (Pure strategies) Let G = (N, A, H, Z, χ, ρ, σ, u, I) be an imperfect-information extensive-form game. Then the pure strategies of player i consist of the cross product ×_{I_{i,j} ∈ I_i} χ(I_{i,j}).

Thus perfect-information games can be thought of as a special case of imperfect-information games, in which every equivalence class of each partition is a singleton.
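As a concrete illustration (ours; the information sets below are modeled on player 1's in Figure 4.10), the cross product can be enumerated directly:

    from itertools import product

    # Sketch: enumerate player i's pure strategies as the cross product of the
    # action sets of her information sets (the labels are hypothetical).
    info_sets_of_player = {            # chi(I_{i,j}) for each information set I_{i,j}
        "I_1": ["L", "R"],
        "I_2": ["l", "r"],
    }
    labels = list(info_sets_of_player)
    pure_strategies = [dict(zip(labels, choice))
                       for choice in product(*(info_sets_of_player[I] for I in labels))]
    print(pure_strategies)
    # [{'I_1': 'L', 'I_2': 'l'}, {'I_1': 'L', 'I_2': 'r'},
    #  {'I_1': 'R', 'I_2': 'l'}, {'I_1': 'R', 'I_2': 'r'}]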

Consider again the Prisoner's Dilemma game, shown as a normal-form game in Figure 2.3. An equivalent imperfect-information game in extensive form is given in Figure 4.11.

[Figure 4.11: the Prisoner's Dilemma in extensive form. Player 1 chooses C or D; player 2 then chooses c or d without observing player 1's choice (his two nodes form a single information set). The four leaves carry the Prisoner's Dilemma payoffs (−1,−1), (−4,0), (0,−4), and (−3,−3), one for each combination of the players' choices.]

Figure 4.11 The Prisoner's Dilemma game in extensive form.

Note that we could have chosen to make player 2 choose first and player 1 choose second.

Recall that perfect-information games were not expressive enough to capture the Prisoner's Dilemma game and many other ones. In contrast, as is obvious from this example, any normal-form game can be trivially transformed into an equivalent imperfect-information game. However, this example is also special in that the Prisoner's Dilemma is a game with a dominant strategy solution, and thus in particular a pure-strategy Nash equilibrium. This is not true in general for imperfect-information games. To be precise about the equivalence between a normal-form game and its extensive-form image we must consider mixed strategies, and this is where we encounter a new subtlety.

As we did for perfect-information games, we can define the normal-form game corresponding to any given imperfect-information game; this normal-form game is again defined by enumerating the pure strategies of each agent. Now, we define the set of mixed strategies of an imperfect-information game as simply the set of mixed strategies in its image normal-form game; in the same way, we can also define the set of Nash equilibria.⁵ However, we can also define the set of behavioral strategies in the extensive-form game.



These are the strategies in which each agent's (potentially probabilistic) choice at each node is made independently of his choices at other nodes. The difference is substantive, and we illustrate it in the special case of perfect-information games. For example, consider the game of Figure 4.2. A strategy for player 1 that selects A with probability .5 and G with probability .3 is a behavioral strategy. In contrast, the mixed strategy (.6(A,G), .4(B,H)) is not a behavioral strategy for that player, since the choices made by him at the two nodes are not independent (in fact, they are perfectly correlated).

5. Note that we have defined two transformations: one from any normal-form game to an imperfect-information game, and one in the other direction. However, the first transformation is not one-to-one, and so if we transform a normal-form game to an extensive-form one and then back to normal form, we will not in general get back the same game we started out with. However, we will get a game with identical strategy spaces and equilibria.

In general, the expressive power of behavioral strategies and the expressive power of mixed strategies are noncomparable; in some games there are outcomes that are achieved via mixed strategies but not any behavioral strategies, and in some games it is the other way around.

[Figure 4.12: a game with imperfect recall. Player 1 moves at the root, choosing L or R. After L, player 1 moves again without recalling that he has already moved (both of his nodes lie in a single information set), choosing L, with payoff (1,0), or R, with payoff (100,100). After R at the root, player 2 chooses U, with payoff (5,1), or D, with payoff (2,2).]

Figure 4.12 A game with imperfect recall

Consider for example the game in Figure 4.12. In this game, when considering mixed strategies (but not behavioral strategies), R is a strictly dominant strategy for agent 1, D is agent 2's strict best response, and thus (R,D) is the unique Nash equilibrium. Note in particular that in a mixed strategy, agent 1 decides probabilistically whether to play L or R in his information set, but once he decides he plays that pure strategy consistently. Thus the payoff of 100 is irrelevant in the context of mixed strategies. On the other hand, with behavioral strategies agent 1 gets to randomize afresh each time he finds himself in the information set. Noting that the pure strategy D is weakly dominant for agent 2 (and in fact is the unique best response to all strategies of agent 1 other than the pure strategy L), agent 1 computes the best response to D as follows. If he uses the behavioral strategy (p, 1 − p) (that is, choosing L with probability p each time he finds himself in the information set), his expected payoff is

1 · p^2 + 100 · p(1 − p) + 2 · (1 − p).



The expression simplifies to −99p^2 + 98p + 2, whose maximum is obtained at p = 98/198. Thus (R,D) = ((0, 1), (0, 1)) is no longer an equilibrium in behavioral strategies, and instead we get the equilibrium ((98/198, 100/198), (0, 1)).
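A quick numerical sanity check of this maximization (ours, not part of the text):

    # Maximize the expected payoff 1*p^2 + 100*p*(1-p) + 2*(1-p) over p in [0, 1]
    # by brute-force grid search, and compare with the analytic answer 98/198.
    payoff = lambda p: 1 * p**2 + 100 * p * (1 - p) + 2 * (1 - p)
    best_p = max((i / 100000 for i in range(100001)), key=payoff)
    print(best_p, 98 / 198)    # both are approximately 0.4949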

There is, however, a broad class of imperfect-information games in which the expressive power of mixed and behavioral strategies coincides. This is the class of games of perfect recall. Intuitively speaking, in these games no player forgets any information he knew about moves made so far; in particular, he remembers precisely all his own moves. Formally:

Definition 4.2.3 (Perfect recall) Player i has perfect recall in an imperfect-information game G if for any two nodes h, h′ that are in the same information set for player i, for any path h_0, a_0, h_1, a_1, h_2, . . . , h_n, a_n, h from the root of the game to h (where the h_j are decision nodes and the a_j are actions) and any path h_0, a′_0, h′_1, a′_1, h′_2, . . . , h′_m, a′_m, h′ from the root to h′ it must be the case that:

1. n = m;

2. for all 0 ≤ j ≤ n, h_j and h′_j are in the same equivalence class for player i; and

3. for all 0 ≤ j ≤ n, if ρ(h_j) = i (that is, h_j is a decision node of player i), then a_j = a′_j.

G is a game of perfect recall if every player has perfect recall in it.

Clearly, every perfect-information game is a game of perfect recall.

Theorem 4.2.4 (Kuhn, 1953) In a game of perfect recall, any mixed strategy of a given agent can be replaced by an equivalent behavioral strategy, and any behavioral strategy can be replaced by an equivalent mixed strategy. Here two strategies are equivalent in the sense that they induce the same probabilities on outcomes, for any fixed strategy profile (mixed or behavioral) of the remaining agents.

As a corollary we can conclude that the set of Nash equilibria does not change if we restrict ourselves to behavioral strategies. This is true only in games of perfect recall, and thus, for example, in perfect-information games. We stress again, however, that in general imperfect-information games, mixed and behavioral strategies yield noncomparable sets of equilibria.

4.2.3 Computing equilibria: the sequence form

Because any extensive-form game can be converted into an equivalent normal-form game, an obvious way to find an equilibrium of an extensive-form game is to first convert it into a normal-form game, and then find the equilibria using (e.g.) the Lemke-Howson algorithm. This method is inefficient, however, because the number of actions in the normal-form game is exponential in the size of the extensive-form game. The normal-form game is created by considering all combinations of information set actions for each player, and the payoffs that result when these strategies are employed.

One way to avoid this problem is to operate directly on the extensive-form representation. This can be done by employing behavior strategies to express a game using a description called the sequence form.


Defining the sequence form

The sequence form is (primarily) useful for representing imperfect-information extensive-form games of perfect recall. Definition 4.2.5 describes the elements of the sequence-form representation of such games; we then go on to explain what each of these elements means.

Definition 4.2.5 (Sequence-form representation) Let G be an imperfect-information game of perfect recall. The sequence-form representation of G is a tuple (N, Σ, g, C), where

• N is a set of agents;

• Σ = (Σ_1, . . . , Σ_n), where Σ_i is the set of sequences available to agent i;

• g = (g_1, . . . , g_n), where g_i : Σ → R is the payoff function for agent i;

• C = (C_1, . . . , C_n), where C_i is a set of linear constraints on the realization probabilities of agent i.

Now let us define all these terms. To begin with, what is a sequence? The key insight of the sequence form is that, while there are exponentially many pure strategies in an extensive-form game, there are only a small number of nodes in the game tree. Rather than building a player's strategy around the idea of pure strategies, the sequence form builds it around paths in the tree from the root to each node.

Definition 4.2.6 (Sequence) A sequence of actions of player i ∈ N, defined by a node h ∈ H ∪ Z of the game tree, is the ordered set of player i's actions that lie on the path from the root to h. Let ∅ denote the sequence corresponding to the root node. The set of sequences of player i is denoted Σ_i, and Σ = Σ_1 × . . . × Σ_n is the set of all sequences.

A sequence can thus be thought of as a string listing the action choices that player i would have to take in order to get from the root to a given node h. Observe that h may or may not be a leaf node; observe also that the other players' actions that form part of this path are not part of the sequence.
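A small sketch (ours; the path encoding is an assumption) of extracting a player's sequence from a path:

    # Sketch: a path is a list of (node, action) pairs from the root, where each
    # node records the player who moves there; player i's sequence is her own
    # actions along that path, in order.
    def sequence_of(player, path):
        return tuple(action for node, action in path if node["player"] == player)

    # Hypothetical path through the game of Figure 4.10: 1 plays L, 2 plays A, 1 plays r.
    path = [({"player": 1}, "L"), ({"player": 2}, "A"), ({"player": 1}, "r")]
    print(sequence_of(1, path))   # ('L', 'r')  -- written Lr in the text
    print(sequence_of(2, path))   # ('A',)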

Definition 4.2.7 (Payoff function) The payoff function g_i : Σ → R for agent i is given by g_i(σ) = u_i(z) if a leaf node z ∈ Z would be reached when each player played his sequence σ_i ∈ σ, and by g_i(σ) = 0 otherwise.

Given the set of sequences Σ and the payoff function g, we can think of the sequence form as defining a tabular representation of an imperfect-information extensive-form game, much as the induced normal form does. Consider the game given in Figure 4.10. The sets of sequences for the two players are Σ_1 = {∅, L, R, Lℓ, Lr} and Σ_2 = {∅, A, B}. The payoff function is given in Figure 4.13. For comparison, the induced normal form of the same game is given in Figure 4.14. Written this way, the sequence form is larger than the induced normal form. However, many of the entries in the game matrix in Figure 4.13 correspond to cases where the payoff function is


          ∅        A        B
    ∅   0, 0     0, 0     0, 0
    L   0, 0     0, 0     0, 0
    R   1, 1     0, 0     0, 0
    Lℓ  0, 0     0, 0     2, 4
    Lr  0, 0     2, 4     0, 0

Figure 4.13 The sequence form of the game from Figure 4.10.

          A        B
    Lℓ  0, 0     2, 4
    Lr  2, 4     0, 0
    Rℓ  1, 1     1, 1
    Rr  1, 1     1, 1

Figure 4.14 The induced normal form of the game from Figure 4.10.

defined to be zero because the given pair of sequences does not correspond to a leaf node in the game tree. These entries are shaded in gray to indicate that they could not arise in play. Each payoff that is defined at a leaf in the game tree occurs exactly once in the sequence form table. Thus, if g was represented using a sparse encoding, only 5 values would have to be stored. Compare this to the induced normal form, where all of the (8) entries correspond to leaf nodes from the game tree.

We now have a set of players, a set of sequences and a mapping from sequences to payoffs. At first glance this may look like everything we need to describe our game. However, sequences do not quite take the place of actions. In particular, a player cannot simply select a single sequence in the way that he would select a pure strategy: the other player(s) might not play in a way that would allow him to follow it to its end. Put another way, players still need to define what they would do in every information set that could be reached in the game tree.

What we want is for agents to select behavior strategies. (Since we have assumed that our game G has perfect recall, Theorem 4.2.4 tells us that any equilibrium will be expressible using behavior strategies.) However, it turns out that it is not a good idea to work with behavior strategies directly; if we did so, the optimization problems we develop later would be computationally harder to solve. Instead, we will develop the alternate concept of a realization plan, which corresponds to the probability that a given sequence would arise under a given behavior strategy.

Consider an agent i following a behavior strategy that assigned probability β_i(h, a_i) to taking action a_i at a given decision node h. Then we can construct a realization plan that assigns probabilities to sequences in a way that recovers i's behavior strategy β.

Definition 4.2.8 (Realization plan of β_i) The realization plan of β_i for player i ∈ N is a mapping r_i : Σ_i → [0, 1] defined as r_i(σ_i) = ∏_{c ∈ σ_i} β_i(c). Each value r_i(σ_i) is called a realization probability.
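A minimal sketch of Definition 4.2.8 (ours; the behavior strategy below is a hypothetical one for player 1 in the game of Figure 4.10, with ℓ and r written as "l" and "r"):

    from math import prod

    # The realization probability of a sequence is the product of the
    # behavior-strategy probabilities of its actions.
    def realization_plan(beta, sequence):
        return prod(beta[c] for c in sequence)

    beta_1 = {"L": 0.5, "R": 0.5, "l": 0.25, "r": 0.75}   # hypothetical behavior strategy
    for seq in [(), ("L",), ("R",), ("L", "l"), ("L", "r")]:
        print(seq, realization_plan(beta_1, seq))
    # (): 1 (empty product), ('L',): 0.5, ('R',): 0.5, ('L','l'): 0.125, ('L','r'): 0.375

Note that these numbers already satisfy the constraints of Definition 4.2.9 below: the empty sequence gets probability 1, and the probabilities of the extensions of an information set sum to the probability of the sequence leading to it.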


Definition 4.2.8 is not the most useful way of defining realization probabilities. There is a second, equivalent definition with the advantage that it involves a set of linear equations, although it is a bit more complicated. This definition relies on two functions that we will make extensive use of in this section.

To define the first function, we make use of our assumption that G is a game of perfect recall. This entails that, given an information set I ∈ I_i, there must be one single sequence that player i can play to reach all of his non-terminal choice nodes h ∈ I. We denote this mapping as seq_i : I_i → Σ_i, and call seq_i(I) the sequence leading to information set I. Note that while there is only one sequence that leads to a given information set, a given sequence can lead to multiple different information sets. For example, if player 1 moves first and player 2 observes his move, then the sequence ∅ will lead to multiple information sets for player 2.

The second function considers ways that sequences can be built from other sequences. By σ_i a_i denote a sequence that consists of the sequence σ_i followed by the single action a_i. As long as the new sequence still belongs to Σ_i, we say that the sequence σ_i a_i extends the sequence σ_i. A sequence can often be extended in multiple ways; for example, perhaps agent i could have chosen an action a′_i instead of a_i after playing sequence σ_i. We denote by Ext_i : Σ_i → 2^{Σ_i} a function mapping from sequences to sets of sequences, where Ext_i(σ_i) denotes the set of sequences that extend the sequence σ_i. We define Ext_i(∅) to be the set of all single-action sequences. Note that extension always refers to playing a single action beyond a given sequence; thus, σ_i a_i a′_i does not belong to Ext_i(σ_i), even if it is a valid sequence. (It does belong to Ext_i(σ_i a_i).) Also note that not all sequences have extensions; one example is sequences leading to leaf nodes. In such cases Ext_i(σ) returns the empty set. Finally, to reduce notation we introduce the shorthand Ext_i(I) = Ext_i(seq_i(I)): the sequences extending an information set are the sequences extending the (unique) sequence leading to that information set.

Definition 4.2.9 (Realization plan) A realization plan for player i ∈ N is a function r_i : Σ_i → [0, 1] satisfying the following constraints.

    r_i(∅) = 1                                                            (4.1)
    ∑_{σ′_i ∈ Ext_i(I)} r_i(σ′_i) = r_i(seq_i(I))        ∀ I ∈ I_i        (4.2)
    r_i(σ_i) ≥ 0                                         ∀ σ_i ∈ Σ_i      (4.3)

If a player i follows a realization plan r_i, we must be able to recover a behavior strategy β_i from it. For a decision node h for player i that is in information set I ∈ I_i, and for any sequence (seq_i(I) a_i) ∈ Ext_i(I), β_i(h, a_i) is defined as r_i(seq_i(I) a_i) / r_i(seq_i(I)), as long as r_i(seq_i(I)) > 0. If r_i(seq_i(I)) = 0 then we can assign β_i(h, a_i) an arbitrary value from [0, 1]: here β_i describes the player's behavioral strategy at a node that could never be reached in play because of the player's own previous decisions, and so the value we assign to β_i is irrelevant.


Let C_i be the set of constraints (4.2) on realization plans of player i. Let C = (C_1, . . . , C_n). We have now defined all the elements⁶ of a sequence-form representation G = (N, Σ, g, C), as laid out in Definition 4.2.5.

What is the space complexity of the sequence form representation? Unlike the normal form, the size of this representation is linear in the size of the extensive-form game. There is one sequence for each node in the game tree, plus the ∅ sequence for each player. As argued above, the payoff function g can be represented sparsely, so that each payoff corresponding to a leaf node is stored only once, and no other payoffs are stored at all. There is one version of Constraint (4.2) for each edge in the game tree. Each such constraint for player i references only |Ext_i(I)| + 1 variables, again allowing sparse encoding.

Computing best responses in two-player games

The sequence form representation can be leveraged to allow the computation of equilibria far more efficiently than can be done using the induced normal form. Here we will consider the case of two-player games, as it is these games for which the strongest results hold. First we consider the problem of determining player 1's best response to a fixed behavioral strategy of player 2 (represented as a realization plan). This problem can be written as the linear program in Figure 4.15.

    maximize    ∑_{σ_1 ∈ Σ_1} ( ∑_{σ_2 ∈ Σ_2} g_1(σ_1, σ_2) r_2(σ_2) ) r_1(σ_1)              (4.4)
    subject to  r_1(∅) = 1                                                                   (4.5)
                ∑_{σ′_1 ∈ Ext_1(I)} r_1(σ′_1) = r_1(seq_1(I))        ∀ I ∈ I_1               (4.6)
                r_1(σ_1) ≥ 0                                         ∀ σ_1 ∈ Σ_1             (4.7)

Figure 4.15 Linear program expressing the problem of finding player 1's best response to a realization plan of player 2. Observe that g_1(·) and r_2(·) are constants, while r_1(·) are variables.

This linear program is straightforward. It states that player 1 should choose r_1 to maximize his expected utility (given in the objective function (4.4)) subject to Constraints (4.5)–(4.7), which require that r_1 corresponds to a valid realization plan.
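For concreteness, a sketch (ours, not the book's) that builds and solves this LP with scipy for the game of Figure 4.10, against an assumed realization plan for player 2:

    import numpy as np
    from scipy.optimize import linprog

    # Player 1's best-response LP (Figure 4.15) for the game of Figure 4.10,
    # against the assumed realization plan r2(∅)=1, r2(A)=0.25, r2(B)=0.75.
    seqs1 = ["empty", "L", "R", "Ll", "Lr"]
    r2 = {"empty": 1.0, "A": 0.25, "B": 0.75}
    g1 = {("R", "empty"): 1, ("Ll", "B"): 2, ("Lr", "A"): 2}   # sparse payoffs; others 0

    # Objective coefficients: sum over sigma_2 of g1(sigma_1, sigma_2) * r2(sigma_2).
    c = np.array([sum(g1.get((s1, s2), 0) * p for s2, p in r2.items()) for s1 in seqs1])

    # Constraints (4.5)-(4.6): r1(empty)=1; r1(L)+r1(R)=r1(empty); r1(Ll)+r1(Lr)=r1(L).
    A_eq = np.array([[ 1,  0, 0, 0, 0],
                     [-1,  1, 1, 0, 0],
                     [ 0, -1, 0, 1, 1]])
    b_eq = np.array([1, 0, 0])

    res = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 5)   # maximize c.x
    print(dict(zip(seqs1, np.round(res.x, 3))), "value:", -res.fun)
    # e.g. {'empty': 1, 'L': 1, 'R': 0, 'Ll': 1, 'Lr': 0} with value 1.5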

In an equilibrium, player 1 and player 2 best-respond simultaneously. However, if we treated both r_1 and r_2 as variables in Figure 4.15 then the objective function (4.4) would no longer be linear. Happily, this problem does not arise in the dual of this linear program.⁷ Denote the variables of our dual LP as v; there will be one v_I for every information set I ∈ I_1 (corresponding to Constraint (4.6) from the primal) and one additional variable v_0 (corresponding to Constraint (4.5)). For notational convenience, we define a "dummy" information set 0 for player 1; thus, we can consider every dual variable to correspond to an information set.

6. We do not need to explicitly store Constraints (4.1) and (4.3), because they are always the same for every sequence-form representation.

7. The dual of a linear program is defined in Appendix B.

We now define one more function. Let I_i : Σ_i → I_i ∪ {0} be a mapping from player i's sequences to information sets. We define I_i(σ_i) to be 0 iff σ_i = ∅, and to be the information set I ∈ I_i in which the final action in σ_i was taken otherwise. Note that the information set in which each action in a sequence was taken is unambiguous because of our assumption that the game has perfect recall. Finally, we again overload notation to simplify the expressions that follow. Given a set of sequences Σ′, let I_i(Σ′) denote {I_i(σ′) | σ′ ∈ Σ′}. Thus, for example, I_1(Ext_1(σ_1)) is the (possibly empty) set of final information sets encountered in the (possibly empty) set of extensions of σ_1. The dual LP is given in Figure 4.16.

    minimize    v_0                                                                          (4.8)
    subject to  v_{I_1(σ_1)} − ∑_{I′ ∈ I_1(Ext_1(σ_1))} v_{I′} ≥ ∑_{σ_2 ∈ Σ_2} g_1(σ_1, σ_2) r_2(σ_2)        ∀ σ_1 ∈ Σ_1        (4.9)

Figure 4.16 The dual to the linear program in Figure 4.15.

The variable v_0 represents player 1's expected utility under the realization plan he chooses to play, given player 2's realization plan. In the optimal solution v_0 will correspond to player 1's expected utility when he plays his best response. (This follows from LP duality: primal and dual linear programs always have the same optimal objective value.) Each other variable v_I can be understood as the portion of this expected utility that player 1 will achieve under his best response realization plan in the subgame starting from information set I, again given player 2's realization plan r_2.

There is one version of Constraint (4.9) for every sequence σ_1 of player 1. Observe that there is always exactly one positive variable on the left-hand side of the inequality, corresponding to the information set of the last action in the sequence. There can also be zero or more negative variables, each of which corresponds to a different information set in which player 1 can end up after playing the given sequence. To understand this constraint, we will consider three different cases.

First, there are zero of these negative variables when the sequence cannot be extended, that is, when player 1 never gets to move again after playing σ_1, no matter what player 2 does. In this case, the right-hand side of the constraint will evaluate to player 1's expected payoff from the subgame beyond σ_1, given player 2's realization probabilities r_2. (This subgame is either a terminal node or one or more decision nodes for player 2 leading ultimately to terminal nodes.) Thus, here the constraint states that the expected utility from a decision at information set I_1(σ_1) must be at least as large as the expected utility from making the decision according to σ_1. In the optimal solution this constraint will be realized as equality if σ_1 is played with positive probability; contrapositively, if the inequality is strict, σ_1 will never be played.

The second case is when the structure of the game is such that player 1 will face another decision node no matter how he plays at information set I_1(σ_1). For example,


this occurs if σ_1 = ∅ and player 1 moves at the root node: then I_1(Ext_1(σ_1)) = {1} (the first information set). As another example, if player 2 takes one of two moves at the root node and player 1 observes this move before choosing his own move, then for σ_1 = ∅ we will have I_1(Ext_1(σ_1)) = {1, 2}. Whenever player 1 is guaranteed to face another decision node, the right-hand side of Constraint (4.9) will evaluate to zero because g_1(σ_1, σ_2) will equal 0 for all σ_2. Thus the constraint can be interpreted as stating that player 1's expected utility at information set I_1(σ_1) must be at least the sum of the expected utilities at the information sets I_1(Ext_1(σ_1)). In the optimal solution, where v_0 is minimized, these constraints are always realized as equality.

Finally, there is the case where there exist extensions of sequence σ_1, but where it is also possible that player 2 will play in a way that will deny player 1 another move. For example, consider the game in Figure 4.2 from earlier in the chapter. If player 1 adopts the sequence B at his first information set, then he will reach his second information set if player 2 plays F, and will reach a leaf node otherwise. In this case there will be both negative terms on the left-hand side of Constraint (4.9) (one for every information set that player 1 could reach beyond sequence σ_1) and positive terms on the right-hand side (expressing the expected utility player 1 achieves for reaching a leaf node). Here the constraint can be interpreted as asserting that player 1's expected utility at I_1(σ_1) can only exceed the sum of the expected utilities of his successor information sets by the amount of the expected payoff due to reaching leaf nodes from player 2's move(s).

Computing equilibria of two-player zero-sum games

For two-player zero-sum games the sequence form allows us to write a linear program for computing a Nash equilibrium that can be solved in time polynomial in the size of the extensive form. Note that in contrast, the methods described in Section 3.1 would require time exponential in the size of the extensive form, because they require construction of an LP with a constraint for each pure strategy of each player and a variable for each pure strategy of one of the players.

This new linear program for games in sequence form can be constructed quite directly from the dual LP in Figure 4.16. Intuitively, we simply treat the terms r_2(·) as variables rather than constants, and add in the constraints from Definition 4.2.9 to ensure that r_2 is a valid realization plan. The program is given in Figure 4.17.

The fact that r_2 is now a variable means that player 2's realization plan will now be selected to minimize player 1's expected utility when player 1 best responds to it. In other words, we find a minmax strategy for player 2 against player 1, and since we have a two-player zero-sum game it is also a Nash equilibrium by Theorem 2.3.12. Observe that if we had tried this same trick with the primal LP in Figure 4.15 we would have ended up with a quadratic objective function, and hence not a linear program.

    minimize    v_0                                                                          (4.10)
    subject to  v_{I_1(σ_1)} − ∑_{I′ ∈ I_1(Ext_1(σ_1))} v_{I′} ≥ ∑_{σ_2 ∈ Σ_2} g_1(σ_1, σ_2) r_2(σ_2)        ∀ σ_1 ∈ Σ_1        (4.11)
                r_2(∅) = 1                                                                   (4.12)
                ∑_{σ′_2 ∈ Ext_2(I)} r_2(σ′_2) = r_2(seq_2(I))        ∀ I ∈ I_2               (4.13)
                r_2(σ_2) ≥ 0                                         ∀ σ_2 ∈ Σ_2             (4.14)

Figure 4.17 Linear program for finding the equilibrium of a two-player zero-sum game represented in sequence form. The variables are v_· and r_2(·).
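As an illustration (ours; the game and its encoding are assumptions), the LP of Figure 4.17 for matching pennies written as a simultaneous-move game in extensive form:

    import numpy as np
    from scipy.optimize import linprog

    # Variables: v0, v1 (player 1's dummy and real information sets) and
    # r2(empty), r2(h), r2(t). Constraints (4.11), written as A_ub x <= 0:
    #   sigma_1 = empty:  v0 - v1 >= 0
    #   sigma_1 = H:      v1 >= r2(h) - r2(t)
    #   sigma_1 = T:      v1 >= r2(t) - r2(h)
    #                  [v0, v1, r2(e), r2(h), r2(t)]
    A_ub = np.array([[-1,  1,  0,  0,  0],
                     [ 0, -1,  0,  1, -1],
                     [ 0, -1,  0, -1,  1]])
    b_ub = np.zeros(3)
    # Constraints (4.12)-(4.13): r2(empty) = 1 and r2(h) + r2(t) = r2(empty).
    A_eq = np.array([[0, 0,  1, 0, 0],
                     [0, 0, -1, 1, 1]])
    b_eq = np.array([1, 0])
    c = np.array([1, 0, 0, 0, 0])          # minimize v0
    bounds = [(None, None), (None, None), (0, None), (0, None), (0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print("game value:", res.fun, "r2:", np.round(res.x[2:], 3))   # value 0, r2 ≈ (1, .5, .5)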

Computing equilibria of two-player general-sum games

For two-player general-sum games, the problem of finding a Nash equilibrium can be formulated as a linear complementarity problem, which is given in Figure 4.18.

Like the linear complementarity problem for two-player games in normal form given in Figure 3.4, this is a feasibility problem consisting of linear constraints and complementary slackness conditions.



The linear constraints are those from the primal LP for player 1 (Constraints (4.15), (4.17), (4.19)), from the dual LP for player 1 (Constraint (4.21)), and from the corresponding versions of these primal and dual programs for player 2 (Constraints (4.16), (4.18), (4.20) and (4.22)). Note that we have rearranged some of these constraints by moving all terms to the left side, and have superscripted the v's with the appropriate player number.

If we stopped at Constraint (4.22) we would have a linear program, but the variables v would be allowed to take arbitrarily large values. The complementary slackness conditions (Constraints (4.23) and (4.24)) fix this problem at the expense of shifting us from a linear program to a linear complementarity problem. Let's examine Constraint (4.23). It states that either sequence σ_1 is never played (i.e., r_1(σ_1) = 0) or that

    v^1_{I_1(σ_1)} − ∑_{I′ ∈ I_1(Ext_1(σ_1))} v^1_{I′} = ∑_{σ_2 ∈ Σ_2} g_1(σ_1, σ_2) r_2(σ_2).        (4.25)

What does it mean for Equation (4.25) to hold? The short answer is that this equation requires a property that we previously observed of the optimal solution to the dual LP in Figure 4.16: that the weak inequality in Constraint (4.9) will be realized as strict equality whenever the corresponding sequence is played with positive probability. We were able to achieve this property in the dual LP by minimizing v_0; however, this does not work in the two-player general-sum case where we have both v^1_0 and v^2_0. Instead, we use the complementary slackness idea that we previously applied in the LCP for normal-form games (Constraint (3.19) in Figure 3.4).

This linear complementarity program cannot be solved using the Lemke-Howson algorithm, as we were able to do with our LCP for normal-form games. However, it can be solved using the Lemke algorithm, a more general version of Lemke-Howson. Neither algorithm is polynomial-time in the worst case. However, it is exponentially faster to run the Lemke algorithm on a game in sequence form than it is to run the Lemke-Howson algorithm on the game's induced normal form. We omit the details of how to apply the Lemke algorithm to sequence-form games, but refer the interested reader to the reference given at the end of the chapter.


    r_1(∅) = 1                                                                                               (4.15)
    r_2(∅) = 1                                                                                               (4.16)
    ∑_{σ′_1 ∈ Ext_1(I)} r_1(σ′_1) = r_1(seq_1(I))                                            ∀ I ∈ I_1       (4.17)
    ∑_{σ′_2 ∈ Ext_2(I)} r_2(σ′_2) = r_2(seq_2(I))                                            ∀ I ∈ I_2       (4.18)
    r_1(σ_1) ≥ 0                                                                             ∀ σ_1 ∈ Σ_1     (4.19)
    r_2(σ_2) ≥ 0                                                                             ∀ σ_2 ∈ Σ_2     (4.20)
    ( v^1_{I_1(σ_1)} − ∑_{I′ ∈ I_1(Ext_1(σ_1))} v^1_{I′} ) − ( ∑_{σ_2 ∈ Σ_2} g_1(σ_1, σ_2) r_2(σ_2) ) ≥ 0    ∀ σ_1 ∈ Σ_1     (4.21)
    ( v^2_{I_2(σ_2)} − ∑_{I′ ∈ I_2(Ext_2(σ_2))} v^2_{I′} ) − ( ∑_{σ_1 ∈ Σ_1} g_2(σ_1, σ_2) r_1(σ_1) ) ≥ 0    ∀ σ_2 ∈ Σ_2     (4.22)
    r_1(σ_1) [ ( v^1_{I_1(σ_1)} − ∑_{I′ ∈ I_1(Ext_1(σ_1))} v^1_{I′} ) − ( ∑_{σ_2 ∈ Σ_2} g_1(σ_1, σ_2) r_2(σ_2) ) ] = 0    ∀ σ_1 ∈ Σ_1     (4.23)
    r_2(σ_2) [ ( v^2_{I_2(σ_2)} − ∑_{I′ ∈ I_2(Ext_2(σ_2))} v^2_{I′} ) − ( ∑_{σ_1 ∈ Σ_1} g_2(σ_1, σ_2) r_1(σ_1) ) ] = 0    ∀ σ_2 ∈ Σ_2     (4.24)

Figure 4.18 A linear complementarity problem for finding Nash equilibria of two-player general-sum games represented in sequence form. The variables are v^1_·, v^2_·, r_1(·) and r_2(·).

4.2.4 Sequential equilibrium

We have already seen that the Nash equilibrium concept is too weak for perfect-information games, and how the more selective notion of subgame-perfect equilibrium can be more instructive. The question is whether this essential idea can be applied to the broader class of imperfect-information games; it turns out that it can, although the details are considerably more involved.

Recall that in a subgame-perfect equilibrium we require that the strategy of each agent be a best response in every subgame, not only the whole game. It is immediately apparent that the definition doesn't apply in imperfect-information games, if for no other reason than the fact that we no longer have a well-defined notion of a subgame. What we have instead at each information set is a "sub-forest", or a collection of subgames. We could require that each player's strategy be a best response in each subgame in each forest, but that would be both too strong a requirement and too weak.


To see why it is too strong, consider the game in Figure 4.19.

[Figure 4.19: player 1 moves at the root, choosing L, C, or R; L leads directly to the payoff (1,1). After C or R, player 2, who cannot tell which of the two was chosen (his two nodes form a single information set), chooses U or D; the payoffs are (0,1000) and (0,0) following C, and (1,0) and (3,1) following R.]

Figure 4.19 Player 2 knows where in the information set he is.

The pure strategies of player 1 are {L, C, R} and of player 2 {U, D}. Note also that the two pure Nash equilibria are (L,U) and (R,D). But should either of these be considered subgame-perfect? On the face of it the answer is ambiguous, since in one subtree U (dramatically) dominates D and in the other D dominates U. However, consider the following argument. R dominates C for player 1, and player 2 knows this. So although player 2 does not have explicit information about which of the two nodes he is in within his information set, he can deduce that he is in the rightmost one based on player 1's incentives, and hence will go D. Furthermore player 1 knows that player 2 can deduce this, and therefore player 1 should go R. Thus (R,D) is the only subgame-perfect equilibrium.

This example shows how a requirement that a sub-strategy be a best response in all subgames is too simplistic. However, in general it is not the case that subtrees of an information set can be pruned as in the above example so that all remaining ones agree on the best strategy for the player. In this case the naive application of the SPE intuition would rule out all strategies.

There have been several related proposals that apply the intuition underlying subgame-perfection in more sophisticated ways. One of the more influential notions has been that of sequential equilibrium (SE). It shares some features with the notion of trembling-hand perfection, discussed in Section 2.3. Note that indeed trembling-hand perfection, which was defined for normal-form games, applies here just as well; just think of the normal form induced by the extensive-form game. However, this notion makes no reference to the tree structure of the game. SE does, but at the expense of additional complexity.

SE is defined for games of perfect recall. As we have seen, in such games we can restrict our attention to behavioral strategies. Consider for the moment a completely mixed strategy profile.⁸ Such a strategy profile induces a positive probability on every node in the game tree. This means in particular that every information set is given a positive probability. Therefore, for a given completely mixed strategy profile, one can meaningfully speak of i's expected utility, given that he finds himself in any particular information set (the expected utility of starting at any node is well defined, and since each node is given positive probability, one can apply Bayes' rule to aggregate the expected utilities of the different nodes in the information set). If the completely mixed strategy profile constitutes an equilibrium, it must be that each agent's strategy maximizes his expected utility in each of his information sets, holding the strategies of the other agents fixed.

8. Again, recall that a strategy is completely mixed if, at every information set, each action is given some positive probability.

All of the above is for a completely mixed strategy profile. The problem is that equilibria are rarely fully mixed, and strategy profiles that are not fully mixed do not induce a positive probability on every information set. The expected utility of starting in information sets whose probability is zero under the given strategy profile is simply not well defined. This is where the ingenious device of SE comes in. Given any strategy profile S (not necessarily completely mixed), imagine a probability distribution µ(h) over each information set. µ has to be consistent with S, in the sense that for sets whose probability is non-zero under their parents' conditional distribution S, this distribution is precisely the one defined by Bayes' rule. However, for other information sets, it can be any distribution. Intuitively, one can think of these distributions as the new beliefs of the agents, if they are surprised and find themselves in a situation they thought would not occur.⁹ This means that agents' expected utility is now well defined in any information set, including those having measure zero. For information set h belonging to agent i, with the associated probability distribution µ(h), the expected utility under strategy profile S is denoted by u_i(S | h, µ(h)).
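As a small illustration (ours; the notation Pr_S is not the text's), for an information set I that is reached with positive probability under S, consistency pins down the belief over its nodes by Bayes' rule:

    µ(h | I) = Pr_S(h) / ∑_{h′ ∈ I} Pr_S(h′)        for each node h ∈ I,

where Pr_S(h) denotes the probability that node h is reached when play follows S. When Pr_S(h′) = 0 for every node in I this ratio is undefined, which is exactly the case the limiting construction below is designed to handle.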

With this, the precise definition of SE is as follows:

Definition 4.2.10 (Sequential equilibrium) A strategy profile S is a sequential equilibrium of an extensive-form game G if there exist probability distributions µ(h) for each information set h in G, such that the following two conditions hold:

1. (S, µ) = lim_{n→∞} (S^n, µ^n) for some sequence (S^1, µ^1), (S^2, µ^2), . . ., where S^n is completely mixed, and µ^n is consistent with S^n (in fact, since S^n is completely mixed, µ^n is uniquely determined by S^n); and

2. for any information set h belonging to agent i, and any alternative strategy S′_i of i, we have that

    u_i(S | h, µ(h)) ≥ u_i((S′_i, S_{−i}) | h, µ(h)).

Analogous to subgame-perfection in games of perfect information, sequential equilibria are guaranteed to always exist:

Theorem 4.2.11Every finite game of perfect recall has a sequential equilibrium.

Finally, while sequential equilibria are defined for games of imperfect information, they are obviously also well defined for the special case of games of perfect information. This raises the question of whether, in the context of games of perfect information, the two solution concepts coincide. The answer is that they almost do, but not quite:

9. This construction is essentially that of an LPS, discussed in Chapter 12.


Theorem 4.2.12 Every subgame-perfect equilibrium is a sequential equilibrium, but the converse is not true in general.¹⁰

4.3 History and references

As in Chapter 2, much of the material in this chapter is covered in modern game theory textbooks. Some of the historical references are as follows. The earliest game-theoretic publication is arguably that of Zermelo, who in 1913 introduced the notions of a game tree and backward induction, and argued that in principle chess admits a trivial solution [Zermelo, 1913]. It was already mentioned in Chapter 2 that extensive-form games were discussed explicitly in [von Neumann & Morgenstern, 1944], as was backward induction. Subgame-perfection was introduced by Selten [1965]; the material on computing all subgame-perfect equilibria is based on [Littman et al., 2006]. The centipede game was introduced by Rosenthal [1981]; many other papers discuss the rationality of backward induction in such games [Aumann, 1995; Binmore, 1996; Aumann, 1996].

In 1953 Kuhn introduced extensive-form games of imperfect information, including the distinction and connection between mixed and behavioral strategies [Kuhn, 1953]. The sequence form, and its application to computing the equilibria of zero-sum games of imperfect information with perfect recall, is due to von Stengel [1996]. Many of the same ideas were developed earlier by Koller and Megiddo [1992]; see [von Stengel, 1996], pp. 242–3, for the distinctions. The use of the sequence form for computing the equilibria of general-sum two-player games of imperfect information is explained by Koller et al. [1996]. Sequential equilibria were introduced by Kreps and Wilson [1982]. Here, as in normal-form games, the full list of alternative solution concepts and connections among them is long, and the interested reader is referred to Hillas and Kohlberg [2002] and Govindan and Wilson [2005b] for a more extensive survey than is possible here.

10. For the record, the converse is true in so-called generic games, but we do not discuss those here. We do discuss genericity in normal-form games in Chapter 6, though there too only briefly.


5 Richer Representations: Beyond the Normal and Extensive Forms

In this chapter we will go beyond the normal and extensive forms by considering a variety of richer game representations. These further representations are important because the normal and extensive forms are not always suitable for modeling large or realistic game-theoretic settings.

First, we may be interested in games which are not finite, and which therefore cannot be represented in normal or extensive form. For example, we may want to consider what happens when a simple normal-form game such as the Prisoner's Dilemma is repeated infinitely. We might want to consider a game played by an uncountably infinite set of agents. Or we may want to use an interval of the real numbers as each player's action space.¹

Second, both of the representations we have studied so far presume that agents have perfect knowledge of everyone's payoffs. This seems like a poor model of many realistic situations, where for example agents might have private information that affects their own payoffs, and other agents might only have probabilistic information about each others' private information. An elaboration like this can have a big impact, because one agent's actions can depend on what he knows about another agent's payoffs.

Finally, as the numbers of players and actions in a game grow (even if they remain finite), games can quickly become far too large to reason about or even to write down using the representations we have studied so far. Luckily, we're not usually interested in studying arbitrary strategic situations. The sorts of non-cooperative settings that are most interesting in practice tend to involve highly structured payoffs. This can occur because of constraints imposed by the fact that the play of a game actually unfolds over time (for example, because a large game actually corresponds to finitely repeated play of a small game). It can also occur because of the nature of the problem domain (for example, while the world may involve many agents, the number of agents who are able to directly affect any given agent's payoff is small). If we understand the way in which agents' payoffs are structured, we can represent them much more compactly than we would be able to do using the normal or extensive forms. Often, these compact representations also allow us to reason more efficiently about the games they describe (e.g., the computation of Nash equilibria can be provably faster, or pure-strategy Nash equilibria can be proven to always exist).

1. We will explore the first example in detail in this chapter. A thorough treatment of infinite sets of players or action spaces is beyond the scope of this book; nevertheless, we will consider certain games with infinite sets of players in Section 5.4.4, and with infinite action spaces in Chapters 9 and 10.




In this chapter we will present various different representations that address these limitations of the normal and extensive forms. In Section 5.1 we will begin by considering the special case of extensive-form games which are constructed by repeatedly playing a normal-form game, and then we will extend our consideration to the case where the normal form is repeated infinitely. This will lead us to stochastic games in Section 5.2, which are like repeated games but do not require that the same normal-form game is played in each time step. In Section 5.3 we will consider structure of a different kind: instead of considering time, we will consider games involving uncertainty. Specifically, in Bayesian games agents face uncertainty, and hold private information, about the game's payoffs. Section 5.4 describes congestion games, which model situations in which agents contend for scarce resources. Finally, in Section 5.5 we will consider representations that are motivated primarily by compactness and by their usefulness for permitting efficient computation (e.g., of Nash equilibria). Such compact representations can extend upon any other existing representation such as normal-form games, extensive-form games or Bayesian games.

5.1 Repeated games

In repeated games, a given game (often thought of in normal form) is played multiple times by the same set of players. The game being repeated is called the stage game. For example, Figure 5.1 depicts two players playing the Prisoner's Dilemma exactly twice in a row.

This representation of the repeated game, while intuitive, obscures some key factors. Do agents see what the other agents played earlier? Do they remember what they knew? And, while the utility of each stage game is specified, what is the utility of the entire repeated game?

We answer these questions in two steps. We first consider the case in which the game is repeated a finite and commonly known number of times. Then we consider the case in which the game is repeated infinitely often, or a finite but unknown number of times.


5.1.1 Finitely repeated games

One way to completely disambiguate the semantics of a finitely repeated game is to specify it as an imperfect-information game in extensive form. Figure 5.2 describes the twice-played Prisoner's Dilemma game in extensive form. Note that it captures the assumption that at each iteration the players do not know what the other player is playing, but afterwards they do. Also note that the payoff function of each agent is additive, that is, it is the sum of payoffs in the two stage games.


Figure 5.2 Twice-played Prisoner’s Dilemma in extensive form.

The extensive form also makes it clear that the strategy space of the repeated game is much richer than the strategy space in the stage game. Certainly one strategy in the repeated game is to adopt the same strategy in each stage game; clearly, this memoryless strategy, called a stationary strategy, is a behavioral strategy in the extensive-form representation of the game. But in general, the action (or mixture of actions) played at a stage game can depend on the history of play thus far. Since this fact plays a particularly important role in infinitely repeated games, we postpone further discussion of it to the next section. Indeed, in the finite, known-repetition case we encounter again the phenomenon of backward induction, which we first encountered when we introduced subgame-perfect equilibria. Recall that in the centipede game, discussed in Section 4.1.3, the unique SPE was to go down and terminate the game at every node. Now consider the finitely repeated Prisoner's Dilemma. Again, it can be argued that in the last round it is a dominant strategy to defect, no matter what happened so far. This is common knowledge, and no choice of action in the preceding rounds will impact the play in the last round. Thus in the second-to-last round too it is a dominant strategy to defect. Similarly, by induction, it can be argued that the only equilibrium in this case is to always defect. However, as in the case of the centipede game, this argument is vulnerable to both empirical and theoretical criticisms.

5.1.2 Infinitely repeated games

When the infinitely repeated game is transformed into extensive form, the result is an infinite tree. So the payoffs cannot be attached to any terminal nodes, nor can they be defined as the sum of the payoffs in the stage games (which in general will be infinite).


There are two definitions of payoff that are common. The first is the average payoff of the stage game in the limit:2

Definition 5.1.1 (Average reward) Given an infinite sequence of payoffs $r_i^{(1)}, r_i^{(2)}, \ldots$ for player $i$, the average reward of $i$ is

$$\lim_{k\to\infty} \frac{\sum_{j=1}^{k} r_i^{(j)}}{k}.$$

2. The observant reader will notice a potential difficulty in this definition, since the limit may not exist. One can extend the definition to cover these cases by using the $\limsup$ operator rather than $\lim$.

The future discounted reward to a player at a certain point of the game is the sum of his payoff in the immediate stage game, plus the sum of future rewards discounted by a constant factor. This is a recursive definition, since the future rewards are themselves discounted sums; the effect is to give a higher weight to early payoffs than to later ones. Formally:

Definition 5.1.2 (Discounted reward) Given an infinite sequence of payoffs $r_i^{(1)}, r_i^{(2)}, \ldots$ for player $i$, and a discount factor $\beta$ with $0 \le \beta \le 1$, the future discounted reward of $i$ is

$$\sum_{j=1}^{\infty} \beta^{j} r_i^{(j)}.$$

The discount factor can be interpreted in two ways. First, it can be taken to represent the fact that the agent cares more about his well-being in the near term than in the long term. Alternatively, it can be assumed that the agent cares about the future just as much as he cares about the present, but with some probability the game will be stopped at any given round; $1 - \beta$ represents that probability. The analysis of the game is not affected by which perspective is adopted.
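To make the two aggregation methods concrete, the following is a small illustrative Python sketch (not from the text): it computes the truncated average reward and the truncated discounted reward for a finite prefix of a payoff stream. The particular stream and discount factor are arbitrary choices for illustration.

```python
# Illustrative sketch: truncated average and discounted rewards for a payoff stream.
# The payoff stream and discount factor below are arbitrary illustrative choices.

def average_reward(payoffs):
    """Average of a finite prefix r^(1), ..., r^(k) of the payoff stream."""
    return sum(payoffs) / len(payoffs)

def discounted_reward(payoffs, beta):
    """Sum_{j>=1} beta^j * r^(j), truncated to the given finite prefix."""
    return sum(beta ** j * r for j, r in enumerate(payoffs, start=1))

if __name__ == "__main__":
    # A player facing mutual cooperation in the Prisoner's Dilemma of Figure 5.1
    # receives -1 every round.
    stream = [-1] * 1000
    print(average_reward(stream))          # -1.0
    print(discounted_reward(stream, 0.9))  # approximately -9.0 = -beta/(1-beta)
```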

Now let’s consider strategy spaces in an infinitely repeatedgame. In particular,consider the infinitely repeated prisoners’ dilemma game. As we discussed, there aremany strategies other than stationary ones. One of the most famous ones isTit-for-Tat (TfT). TfT is the strategy in which the player starts by cooperating, and thereafterTit-for-Tat (TfT)chooses in roundj + 1 the action chosen by the other player in roundj. Besides beingboth simple and easy to compute, this strategy is notoriously hard to beat; it was thewinner in several repeated-prisoners’-dilemma competitions for computer programs.

Since the space of strategies is so large, a natural question is whether we can characterize all the Nash equilibria of the repeated game. For example, if the discount factor is large enough, both players playing TfT is a Nash equilibrium. But there is an infinite number of others. For example, consider the trigger strategy. This is a draconian version of TfT; in the trigger strategy, a player starts by cooperating, but if ever the other player defects then the first defects forever. Again, for a sufficiently large discount factor, the trigger strategy forms a Nash equilibrium not only with itself but also with TfT.
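The following illustrative sketch (not from the text) simulates a finite number of rounds of the Prisoner's Dilemma of Figure 5.1 under history-dependent strategies such as Tit-for-Tat and the trigger strategy, and reports each player's average reward; the number of rounds is an arbitrary choice.

```python
# Illustrative simulation of history-dependent strategies in the repeated
# Prisoner's Dilemma of Figure 5.1.

PAYOFFS = {('C', 'C'): (-1, -1), ('C', 'D'): (-4, 0),
           ('D', 'C'): (0, -4), ('D', 'D'): (-3, -3)}

def tit_for_tat(my_history, their_history):
    return 'C' if not their_history else their_history[-1]

def trigger(my_history, their_history):
    return 'D' if 'D' in their_history else 'C'

def always_defect(my_history, their_history):
    return 'D'

def play(strategy1, strategy2, rounds):
    h1, h2, totals = [], [], [0, 0]
    for _ in range(rounds):
        a1, a2 = strategy1(h1, h2), strategy2(h2, h1)
        u1, u2 = PAYOFFS[(a1, a2)]
        totals[0] += u1
        totals[1] += u2
        h1.append(a1)
        h2.append(a2)
    return [t / rounds for t in totals]   # average reward per round

print(play(tit_for_tat, trigger, 100))        # mutual cooperation: [-1.0, -1.0]
print(play(always_defect, tit_for_tat, 100))  # defection is punished after round 1
```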

The Folk Theorem—so-called because it was part of the common lore before it wasformally written down—helps us to understand the space of allNash equilibria of aninfinitely repeated game, by answering a related question. It doesn’t characterize theequilibrium strategy profiles, but rather the payoffs obtained in them. Roughly speak-ing, it states that in an infinitely repeated game the set of average rewards attainable in




More formally, consider any $n$-player game $G = (N, (A_i), (u_i))$ and any payoff profile $r = (r_1, r_2, \ldots, r_n)$. Let

$$v_i = \min_{s_{-i} \in S_{-i}} \; \max_{s_i \in S_i} \; u_i(s_{-i}, s_i).$$

In words, $v_i$ is player $i$'s minmax value: his utility when the other players play minmax strategies against him, and he plays his best response.
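As an illustration (not from the text), a player's minmax value in a two-player game can be computed with a linear program: the opponent picks a mixed strategy minimizing the largest payoff the player can secure with any pure response. The sketch below assumes SciPy's linprog is available and uses the Prisoner's Dilemma payoffs of Figure 5.1.

```python
# Illustrative sketch: player 1's minmax value in a two-player game, via an LP.
# Variables: the opponent's mixed strategy s2 and a bound v on player 1's best response.
# Minimize v subject to: for every pure action a1, sum_a2 s2(a2) * u1(a1, a2) <= v.
import numpy as np
from scipy.optimize import linprog

def minmax_value(u1):
    """u1[a1][a2] = player 1's payoff; returns player 1's minmax value."""
    u1 = np.asarray(u1, dtype=float)
    n1, n2 = u1.shape
    c = np.zeros(n2 + 1); c[-1] = 1.0              # minimize v
    A_ub = np.hstack([u1, -np.ones((n1, 1))])      # u1[a1] . s2 - v <= 0 for each a1
    b_ub = np.zeros(n1)
    A_eq = np.hstack([np.ones((1, n2)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                         # s2 is a probability distribution
    bounds = [(0, None)] * n2 + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

# Prisoner's Dilemma of Figure 5.1: actions (C, D) for both players.
u1 = [[-1, -4],   # row player cooperates
      [ 0, -3]]   # row player defects
print(minmax_value(u1))   # approximately -3: the opponent defects; best response is to defect
```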

Before giving the theorem, we provide some more definitions.

Definition 5.1.3 (Enforceable) A payoff profile $r = (r_1, r_2, \ldots, r_n)$ is enforceable if $r_i \ge v_i$ for every player $i$.

Definition 5.1.4 (Feasible) A payoff profile $r = (r_1, r_2, \ldots, r_n)$ is feasible if there exist rational, non-negative values $\alpha_a$ such that for all $i$, we can express $r_i$ as $\sum_{a \in A} \alpha_a u_i(a)$, with $\sum_{a \in A} \alpha_a = 1$.

In other words, a payoff profile is feasible if it is a convex, rational combination of the outcomes in $G$.

Theorem 5.1.5 (Folk Theorem) Consider any $n$-player normal-form game $G$ and any payoff profile $r = (r_1, r_2, \ldots, r_n)$.

1. If $r$ is the payoff profile for any Nash equilibrium of the infinitely repeated $G$ with average rewards, then for each player $i$, $r_i$ is enforceable.

2. If $r$ is both feasible and enforceable, then $r$ is the payoff profile for some Nash equilibrium of the infinitely repeated $G$ with average rewards.

This proof is quite intuitive. The first part uses the definition of minmax and best response to show that an agent can never receive less than his minmax value in equilibrium. The second part shows how to construct an equilibrium that yields each agent the average payoffs given in any feasible and enforceable payoff profile $r$. This equilibrium has the agents cycle in perfect lock-step through a sequence of game outcomes that achieve the desired average payoffs. If any agent deviates, the others punish him forever by playing their minmax strategies against him.

Proof. Part 1: Suppose $r$ is not enforceable, i.e., $r_i < v_i$ for some $i$. Then consider a deviation of this player $i$ to $b_i(s_{-i}(h))$ for any history $h$ of the repeated game, where $b_i$ is any best-response action in the stage game and $s_{-i}(h)$ is the equilibrium strategy of the other players given the current history $h$. By definition of a minmax strategy, player $i$ will receive a payoff of at least $v_i$ in every stage game if he plays $b_i(s_{-i}(h))$, and so $i$'s average reward is also at least $v_i$. Thus $i$ cannot receive the payoff $r_i < v_i$ in any Nash equilibrium.

Part 2: Since $r$ is a feasible and enforceable payoff profile, we can write it as $r_i = \sum_{a \in A} \left(\frac{\beta_a}{\gamma}\right) u_i(a)$, where $\beta_a$ and $\gamma$ are non-negative integers. (Recall that the $\alpha_a$ were required to be rational, so we can take $\gamma$ to be their common denominator.) Since the combination was convex, we have $\gamma = \sum_{a \in A} \beta_a$.

We’re going to construct a strategy profile that will cycle through all outcomesa ∈ A ofG with cycles of lengthγ, each cycle repeating actiona exactlyβa times.Let (at) be such a sequence of outcomes. Let’s define a strategysi of playeri tobe a trigger version of playing(at): if nobody deviates, thensi playsat

i in periodt. However, if there was a periodt′ in which some playerj 6= i deviated, thensi will play (p−j)i, where(p−j) is a solution to the minimization problem in thedefinition ofvj .

First observe that if everybody plays according to $s_i$, then, by construction, player $i$ receives an average payoff of $r_i$ (look at averages over periods of length $\gamma$). Second, this strategy profile is a Nash equilibrium. Suppose everybody plays according to $s_i$, and player $j$ deviates at some point. Then, forever after, player $j$ will receive his minmax payoff $v_j \le r_j$, rendering the deviation unprofitable.

The above theorem is actually an instance of a large family of folk theorems. The theorem as stated is restricted to infinitely repeated games, to average reward, to the Nash equilibrium, and to games of complete information. There are folk theorems for other versions of each of these conditions, as well as for other conditions not mentioned here. In particular, there are folk theorems for infinitely repeated games with discounted reward (for a large enough discount factor), for finitely repeated games, for subgame-perfect equilibria (that is, where agents only administer finite punishments to deviators), and for games of incomplete information. We do not review them here, but the message of each of them is fundamentally the same: the payoffs in the equilibria of a repeated game are essentially constrained only by enforceability and feasibility.

5.1.3 “Bounded rationality”: repeated games played by automata

Until now we have assumed that players can engage in arbitrarily deep reasoning and mutual modelling, regardless of its complexity. In particular, consider the fact that we have tended to rely on equilibrium concepts as predictions of—or prescriptions for—behavior. Even in the relatively uncontroversial case of zero-sum games, this is a questionable stance in practice; otherwise there would be no point in (e.g.) chess competitions. While we continue to make this questionable assumption in most of the remainder of the book, we pause here to revisit it. We ask what happens when agents are not perfectly rational expected-utility maximizers. In particular, we ask what happens when we impose specific computational limitations on players.

Consider (yet again) an instance of the Prisoner's Dilemma, which is reproduced in Figure 5.3. In the finitely repeated Prisoner's Dilemma game, we know that each player's dominant strategy (and thus the only Nash equilibrium) is to choose the strategy Defect in each iteration of the game. In reality, when people actually play the game, we typically observe a significant amount of cooperation, especially in the earlier iterations of the game. While much of game theory is open to the criticism that it does not match well with human behavior, this is a particularly stark example of this divergence. What models might explain this fact?

One early proposal in the literature is based on the notion of an $\epsilon$-equilibrium: a strategy profile in which no agent can gain more than $\epsilon$ by changing his strategy. (A Nash equilibrium is thus the special case of a 0-equilibrium.)


              C        D

     C      3, 3     0, 4

     D      4, 0     1, 1

     Figure 5.3  The Prisoner's Dilemma game.

This notion is motivated by the idea that agents' rationality may be bounded in their ability to distinguish between small differences in payoffs, and that they therefore may be willing to settle for payoffs that are close to the best-response payoffs. In the finitely repeated Prisoner's Dilemma game, as the number of repetitions increases, the corresponding sets of $\epsilon$-equilibria include outcomes with longer and longer sequences of the Cooperate strategy.
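As a small aside (not from the text), checking whether a given profile of a finite two-player game is an $\epsilon$-equilibrium amounts to verifying that neither player has a pure-strategy deviation gaining more than $\epsilon$; the sketch below does this for the single-shot Prisoner's Dilemma of Figure 5.3.

```python
# Illustrative check of the epsilon-equilibrium condition in a two-player game.
# A (possibly mixed) profile is an epsilon-equilibrium if no player can gain more
# than epsilon by deviating; it suffices to check pure-strategy deviations.
import numpy as np

def is_epsilon_equilibrium(u1, u2, s1, s2, epsilon):
    u1, u2 = np.asarray(u1, float), np.asarray(u2, float)
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    v1 = s1 @ u1 @ s2                      # player 1's expected payoff
    v2 = s1 @ u2 @ s2                      # player 2's expected payoff
    best1 = max(u1 @ s2)                   # best pure deviation for player 1
    best2 = max(s1 @ u2)                   # best pure deviation for player 2
    return best1 - v1 <= epsilon and best2 - v2 <= epsilon

# Prisoner's Dilemma of Figure 5.3 (actions C, D).
u1 = [[3, 0], [4, 1]]
u2 = [[3, 4], [0, 1]]
print(is_epsilon_equilibrium(u1, u2, [1, 0], [1, 0], epsilon=0.5))  # (C,C): False, gain is 1
print(is_epsilon_equilibrium(u1, u2, [1, 0], [1, 0], epsilon=1.0))  # (C,C): True
print(is_epsilon_equilibrium(u1, u2, [0, 1], [0, 1], epsilon=0.0))  # (D,D) is a Nash equilibrium
```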

Various other aspects of bounded rationality exist, but we will focus on what has proved to be the richest source of results so far, namely restricting agents' strategies to those implemented by automata of the sort investigated in computer science.

Finite state automata

The motivation for using automata becomes apparent when we consider the representation of a strategy in a repeated game. Recall that a finitely repeated game is an imperfect-information extensive-form game, and that a strategy for player $i$ in such a game is a specification of an action for every information set belonging to player $i$. A strategy for $k$ repetitions of an $m$-action game is thus a specification of $\frac{m^k - 1}{m - 1}$ different actions. However, a naive encoding of a strategy as a table mapping each possible history to an action can be extremely inefficient. For example, the strategy of choosing Defect in every round can be represented using just the single-stage strategy Defect, and the Tit-for-Tat strategy can be represented simply by specifying that the player mimic what his opponent did in the previous round. One representation that exploits this structure is the finite state automaton, or Moore machine. The formal definition of a finite state automaton in the context of a repeated game is as follows.

Definition 5.1.6 (Automaton) Given a repeated game of the normal-form game $G = (N, A_1, \ldots, A_n, u_1, \ldots, u_n)$, an automaton $M_i$ for player $i$ is a four-tuple $(Q_i, q_i^0, \delta_i, f_i)$, where

• $Q_i$ is a set of states;

• $q_i^0$ is the start state;


• $\delta_i : Q_i \times A_1 \times \cdots \times A_n \to Q_i$ is the transition function, which maps the current state and an action profile to a new state; and

• $f_i : Q_i \to A_i$ is the strategy function, associating with every state an action for player $i$.

An automaton is used to represent one player's repeated-game strategy as follows. The machine begins in the start state $q_i^0$, and in the first round plays the action given by $f_i(q_i^0)$. Using the transition function and the actions played by the other players in the first round, it then transitions automatically to the new state $\delta_i(q_i^0, a_1, \ldots, a_n)$ before the beginning of round two. It then plays the action $f_i(\delta_i(q_i^0, a_1, \ldots, a_n))$ in round two, and so on. More generally, we can specify the current strategy and state at round $t$ using the following initial condition and recursive definitions:

$$q_i^1 = q_i^0,$$
$$a_i^t = f_i(q_i^t),$$
$$q_i^{t+1} = \delta_i(q_i^t, a_1^t, \ldots, a_n^t).$$

Automata representations of strategies are very intuitive when viewed graphically. The following figures show compact automata representations of some common strategies for the repeated Prisoner's Dilemma game. Each circle is a state in the automaton, and its label is the action to play at that state. The transitions are represented as labelled arrows. From the current state, we transition along the arrow labelled with the move the opponent played in the current game. The unlabelled arrow points to the initial state.

The automaton represented by Figure 5.4 plays the constant Defect strategy, while Figure 5.5 encodes the more interesting Tit-for-Tat strategy. The latter starts in the Cooperate state, and its transitions are constructed so that the automaton always mimics the opponent's last action.

Figure 5.4  An automaton representing the repeated Defect action: a single state labelled D, with a self-loop on both of the opponent's actions C and D.

Figure 5.5  An automaton representing the Tit-for-Tat strategy: two states labelled C and D; play starts in C, moves to D when the opponent plays D, and returns to C when the opponent plays C.
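To make Definition 5.1.6 concrete, here is an illustrative Python encoding (not from the text) of Moore machines for the constant-Defect and Tit-for-Tat strategies of Figures 5.4 and 5.5, together with a simulator that plays two automata against each other; the dictionary-based representation is an assumption made for the example.

```python
# Illustrative Moore-machine encoding of repeated-game strategies (Definition 5.1.6).
# An automaton is (states, start state, strategy function f, transition function delta).
# Here delta looks only at the opponent's last action, as in Figures 5.4 and 5.5.

class Automaton:
    def __init__(self, start, f, delta):
        self.start = start
        self.f = f          # f[state] = action to play in that state
        self.delta = delta  # delta[(state, opponent_action)] = next state

TIT_FOR_TAT = Automaton(
    start='qC',
    f={'qC': 'C', 'qD': 'D'},
    delta={('qC', 'C'): 'qC', ('qC', 'D'): 'qD',
           ('qD', 'C'): 'qC', ('qD', 'D'): 'qD'})

ALWAYS_DEFECT = Automaton(
    start='q0',
    f={'q0': 'D'},
    delta={('q0', 'C'): 'q0', ('q0', 'D'): 'q0'})

def run(machine1, machine2, rounds):
    """Play two automata against each other; return the sequence of action profiles."""
    q1, q2, history = machine1.start, machine2.start, []
    for _ in range(rounds):
        a1, a2 = machine1.f[q1], machine2.f[q2]
        history.append((a1, a2))
        q1 = machine1.delta[(q1, a2)]
        q2 = machine2.delta[(q2, a1)]
    return history

print(run(TIT_FOR_TAT, ALWAYS_DEFECT, 4))  # [('C','D'), ('D','D'), ('D','D'), ('D','D')]
```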

We can now define a new class of games, called machine games, in which each player selects an automaton representing a repeated-game strategy.


Definition 5.1.7 (Machine game) A two-player machine game $G^M = (\{1,2\}, \mathcal{M}_1, \mathcal{M}_2, G)$ of the $k$-period repeated game $G$ is defined by a set of available automata $\mathcal{M}_i$ for each player $i$, and a normal-form game $G = (\{1,2\}, A_1, A_2, u_1, u_2)$. A pair $M_1 \in \mathcal{M}_1$ and $M_2 \in \mathcal{M}_2$ deterministically yields an outcome $o^t(M_1, M_2)$ at each iteration $t$ of the repeated game. Thus, $G^M$ induces a normal-form game $(\{1,2\}, \mathcal{M}_1, \mathcal{M}_2, U_1, U_2)$, in which each player $i$ chooses an automaton $M_i \in \mathcal{M}_i$ and obtains a utility defined by $U_i(M_1, M_2) = \sum_{t=1}^{k} u_i(o^t(M_1, M_2))$.

Note that we can easily replace the $k$-period repeated game with a discounted (or limit-of-means) infinitely repeated game, with a corresponding change to $U_i(M_1, M_2)$ in the induced normal-form game.

Notation: In the sequel, the function $s(M)$ denotes the number of states of an automaton $M$, and the function $S(\mathcal{M}) = \max_{M \in \mathcal{M}} s(M)$ denotes the size of the largest automaton among a set of automata $\mathcal{M}$.

Automata of bounded size

Automata present a natural way to bound the rationality of players in a repeated game. Intuitively, it seems that automata with fewer states represent simpler strategies. Thus, one way to bound the rationality of a player is to limit the number of states in the automaton.

Placing severe restrictions on the number of states not only induces an equilibrium in which cooperation always occurs, but also causes the always-defect equilibrium to disappear. This equilibrium in a finitely repeated Prisoner's Dilemma game depends on the assumption that each player can use backward induction to find his dominant strategy. In order to perform backward induction in a $k$-period repeated game, each player needs to keep track of at least $k$ distinct states: one state to represent the choice of strategy in each repetition of the game. It turns out that if $2 < S(\mathcal{M}_1), S(\mathcal{M}_2) < k$ holds, then the constant-defect strategy does not yield a symmetric equilibrium, while the Tit-for-Tat automaton does.

When the size of the automata is not restricted to be less than $k$, the constant-defect equilibrium does exist. However, there is still a large class of machine games in which other equilibria, involving some amount of cooperation, exist, as shown in the following result.

Theorem 5.1.8 For any integer $x$, there exists an integer $k_0$ such that for all $k > k_0$, any machine game $G^M = (\{1,2\}, \mathcal{M}_1, \mathcal{M}_2, G)$ of the $k$-period repeated Prisoner's Dilemma game $G$ in which $k^{1/x} \le \min\{S(\mathcal{M}_1), S(\mathcal{M}_2)\} \le \max\{S(\mathcal{M}_1), S(\mathcal{M}_2)\} \le k^x$ holds has a Nash equilibrium in which the average payoffs to each player are at least $3 - \frac{1}{x}$.

Thus the average payoffs to each player can be much higher than $(1, 1)$; in fact they can be arbitrarily close to $(3, 3)$, depending on the choice of $x$. While this result uses pure strategies for both players, a stronger result can be proved through the use of a mixed-strategy equilibrium.


Theorem 5.1.9 For every $\epsilon > 0$, there exists an integer $k_0$ such that for all $k > k_0$, any machine game $G^M = (\{1,2\}, \mathcal{M}_1, \mathcal{M}_2, G)$ of the $k$-period repeated Prisoner's Dilemma game $G$ in which $\min\{S(\mathcal{M}_1), S(\mathcal{M}_2)\} < 2^{\frac{\epsilon k}{12(1+\epsilon)}}$ has a Nash equilibrium in which the average payoff to each player is at least $3 - \epsilon$.

Thus, if even one of the players' automata has a size that is less than exponential in the length of the game, an equilibrium with some degree of cooperation exists.

Automata with a cost of complexity

Now, instead of imposing constraints on the complexity of the automata, we will incorporate it as a cost in the agent's utility function. This could reflect, for example, the implementation cost of a strategy or the cost of learning it. While we cannot show theorems similar to those in the preceding section, it turns out that we can get mileage out of this idea even when we incorporate it in a minimal way. Specifically, an agent's disutility for complexity will play only a tie-breaking role. We will say that agents have a lexicographic disutility for complexity in a machine game if their utility function $U_i(\cdot)$ in the induced normal-form game is replaced by a preference ordering $\succeq_i$ such that $(M_1, M_2) \succ_i (M_1', M_2')$ whenever either $U_i(M_1, M_2) > U_i(M_1', M_2')$, or $U_i(M_1, M_2) = U_i(M_1', M_2')$ and $s(M_i) < s(M_i')$.

Consider a machine game $G^M$ of the discounted infinitely repeated Prisoner's Dilemma in which both players have a lexicographic disutility for complexity. Recall that we showed that the trigger strategy is an equilibrium strategy in the infinitely repeated Prisoner's Dilemma game with discounting. When the discount factor $\delta$ is large enough, if player 2 is using the trigger strategy, then player 1 cannot achieve a higher payoff by using any strategy other than the trigger strategy as well. We can represent the trigger strategy using the machine $M$ shown in Figure 5.6. Notice that while no other machine can give player 1 a higher payoff, there does exist another machine that can achieve the same payoff but is less complex. Player 1's machine $M$ never enters the state D during play; it is designed only as a threat to the other player. Thus the machine which contains only the state C will achieve the same payoff as the machine $M$, but with less complexity. As a result, the outcome $(M, M)$ is not a Nash equilibrium of the machine game $G^M$.

Figure 5.6  An automaton representing the Trigger strategy: two states labelled C and D; play starts in C, stays in C as long as the opponent plays C, moves to D once the opponent plays D, and remains in D thereafter.

We can also show several interesting properties of the equilibria of machine games in which agents have a lexicographic disutility for complexity. First, because machines in equilibrium must minimize complexity, they have no unused states. Thus we know that in an infinite game, every state must be visited in some period. Furthermore, it turns out that the two machines selected in an equilibrium must have the same number of states. Finally, the strategies represented by the machines in a Nash equilibrium of the machine game also form a Nash equilibrium of the infinitely repeated game.


Computing best response automata

In the previous sections we limited the rationality of agents in repeated games by bounding the number of states that they can use to represent their strategies. However, it could be the case that the number of states used by the equilibrium strategies is small, but the time required to compute them is prohibitively large. Furthermore, one can argue (by introspection, for example) that bounding the computation of an agent is a more appropriate means of capturing bounded rationality than bounding the number of states.

It seems reasonable that an equilibrium must be at least verifiable by agents. But this does not appear to be the case for finite automata. (The results in this section are for the limit-average case, but can be adapted to the discounted case as well.)

Theorem 5.1.10 Given a machine game $G^M = (N, \mathcal{M}_1, \ldots, \mathcal{M}_n, G)$ of a limit-average infinitely repeated game $G = (N, A_1, \ldots, A_n, u_1, \ldots, u_n)$ with unknown $N$, and a choice of automata $M_1, \ldots, M_n$ for all players, there does not exist a polynomial-time algorithm that solves the problem of verifying whether $M_i$ is a best-response automaton for player $i$.

The news is not all bad, though: if we hold $N$ fixed, then the problem falls back into P, the class of problems computable in polynomial time. We can explain this informally by noting that player $i$ does not have to scan all of his possible strategies in order to decide whether the automaton $M_i$ is the best response; since he knows the strategies of the other players, he merely needs to scan the actual path taken on the game tree, which is bounded by the length of the game tree.

Notice that the previous result held even when the other players were assumed to play pure strategies. The following result shows that the verification problem is hard even in the two-player case when the players can randomize over machines.

Theorem 5.1.11 Given a two-player machine game $G^M = (\{1,2\}, \mathcal{M}_1, \mathcal{M}_2, G)$ of a limit-average infinitely repeated game $G = (\{1,2\}, A_1, A_2, u_1, u_2)$, and a mixed strategy for player 2 in which the set of automata that are played with positive probability is finite, the problem of verifying that an automaton $M_1$ is the best-response automaton for player 1 is NP-complete.

So far we have abandoned the bounds on the number of states in the automata, and one might wonder whether such bounds could improve the worst-case complexity. However, for the repeated Prisoner's Dilemma game, they have the opposite effect: limiting the size of the automata under consideration increases the complexity of computing a best response. From the discussion following Theorem 5.1.10 we know that when the size of the automata under consideration is unbounded and the number of agents is two, the problem of computing the best response is in the class P. The following result shows that when the automata under consideration are instead bounded, the problem becomes NP-complete.

Theorem 5.1.12 Given a machine game $G^M = (\{1,2\}, \mathcal{M}_1, \mathcal{M}_2, G)$ of the limit-average infinitely repeated Prisoner's Dilemma game $G$, an automaton $M_2$, and an integer $k$, the problem of computing a best-response automaton $M_1$ for player 1 such that $s(M_1) \le k$ is NP-complete.


From finite automata to Turing machines

Turing machines are more powerful than finite-state automata due to their infinite memory. One might expect that in this richer model, unlike with finite automata, game-theoretic results will be preserved. But they are not. For example, there is strong evidence (if not yet proof) that a game of two Turing machines can have equilibria that are arbitrarily close to the repeated Cooperate payoff. Thus cooperative play can be approximated even if the machines memorize the entire history of the game and are capable of counting the number of repetitions.

The problem of computing a best response yields another unintuitive result. Even if we restrict the opponent to strategies for which the best-response Turing machine is computable, the general problem of finding the best response for any such input is not Turing computable when the discount factor is sufficiently close to one.

Theorem 5.1.13 For the discounted infinitely repeated Prisoner's Dilemma game $G$, there exists a discount factor $\bar{\delta} > 0$ such that for any rational discount factor $\delta \in (\bar{\delta}, 1)$ there is no Turing-computable procedure for computing a best response to a strategy drawn from the set of all computable strategies that admit a computable best response.

Furthermore, even before worrying about computing a best response, there is a more basic challenge: the best response to a Turing machine may not be a Turing machine!

Theorem 5.1.14 For the discounted infinitely repeated Prisoner's Dilemma game $G$, there exists a discount factor $\bar{\delta} > 0$ such that for any rational discount factor $\delta \in (\bar{\delta}, 1)$ there exists an equilibrium profile $(s_1, s_2)$ such that $s_2$ can be implemented by a Turing machine, but no best response to $s_2$ can be implemented by a Turing machine.

5.2 Stochastic games

Intuitively speaking, a stochastic game is a collection of normal-form games; the agents repeatedly play games from this collection, and the particular game played at any given iteration depends probabilistically on the previous game played and on the actions taken by all agents in that game.

5.2.1 Definition

Definition 5.2.1 (Stochastic game) A stochastic game (also known as a Markov game) is a tuple $(Q, N, A_1, \ldots, A_n, P, R)$, where

• $Q$ is a finite set of states,

• $N$ is a finite set of $n$ players,

• $A = A_1 \times \cdots \times A_n$, where $A_i$ is a finite set of actions available to player $i$,

• $P : Q \times A \times Q \to [0, 1]$ is the transition probability function; $P(q, a, \hat{q})$ is the probability of transitioning from state $q$ to state $\hat{q}$ after joint action $a$, and


• $R = (r_1, \ldots, r_n)$, where $r_i : Q \times A \to \mathbb{R}$ is a real-valued payoff function for player $i$.

In this definition we have assumed that the strategy space of the agents is the same in all games, and thus that the difference between the games is only in the payoff function. Removing this assumption adds notation, but otherwise presents no major difficulties or insights. Restricting $Q$ and each $A_i$ to be finite is a substantive restriction, but we do so for a reason; the infinite case raises a number of complications that we wish to avoid.

So far we have specified the payoff of a player at each stage game (or in each state), but not how these payoffs are aggregated into an overall payoff. The two most commonly used aggregation methods are the ones already seen in connection with repeated games—average reward and future discounted reward.

Stochastic games are a very broad framework, generalizing both Markov Decision Processes (MDPs) and repeated games. An MDP is simply a stochastic game with only one player, while a repeated game is a stochastic game in which there is only one state (or stage game). Another interesting subclass of stochastic games is zero-sum stochastic games, in which each stage game is a zero-sum game (that is, for any $q \in Q$ and $a \in A$ we have $\sum_i r_i(q, a) = 0$). Finally, in a single-controller stochastic game the transition probabilities depend only on the actions of one particular agent.

5.2.2 Strategies and equilibria

We will now define the strategy space of an agent. Let $h_t = (q^0, a^0, q^1, a^1, \ldots, a^{t-1}, q^t)$ denote a history of $t$ stages of a stochastic game, and let $H_t$ be the set of all possible histories of this length. The set of deterministic strategies is the cross product $\times_{t, H_t} A_i$, which requires a choice for each possible history at each point in time. As in the previous game forms, an agent's strategy can consist of any mixture over deterministic strategies. However, there are several restricted classes of strategies that are of interest, and they form the following hierarchy. The first restriction is that the mixing take place at each history independently; this is the restriction to behavioral strategies seen in connection with extensive-form games.

Definition 5.2.2 (Behavioral strategy) A behavioral strategy $s_i(h_t, a_{i_j})$ returns the probability of playing action $a_{i_j}$ for history $h_t$.

A Markov strategy further restricts a behavioral strategy so that, for a given time $t$, the distribution over actions depends only on the current state.

Definition 5.2.3 (Markov strategy) A Markov strategy $s_i$ is a behavioral strategy in which $s_i(h_t, a_{i_j}) = s_i(h_t', a_{i_j})$ if $q_t = q_t'$, where $q_t$ and $q_t'$ are the final states of $h_t$ and $h_t'$, respectively.

The final restriction is to remove the possible dependence on the time $t$.

Definition 5.2.4 (Stationary strategy) A stationary strategy $s_i$ is a Markov strategy in which $s_i(h_{t_1}, a_{i_j}) = s_i(h_{t_2}', a_{i_j})$ if $q_{t_1} = q_{t_2}'$, where $q_{t_1}$ and $q_{t_2}'$ are the final states of $h_{t_1}$ and $h_{t_2}'$, respectively.


Now we can consider the equilibria of stochastic games, a topic that turns out to be fraught with subtleties. The discounted-reward case is the less problematic one. In this case it can be shown that a Nash equilibrium exists in every stochastic game. In fact, we can state an even stronger property. A strategy profile is called a Markov perfect equilibrium (MPE) if it consists of only Markov strategies and is a Nash equilibrium regardless of the starting state. In a sense, MPE plays a role analogous to that of the subgame-perfect equilibrium defined for perfect-information games.

Theorem 5.2.5 Every $n$-player, general-sum, discounted-reward stochastic game has a Markov perfect equilibrium.

The case of average rewards presents greater challenges. For one thing, the limit average may not exist (although the stage-game payoffs are bounded, their average may cycle and not converge). However, there is a class of stochastic games which is well behaved in this regard: the class of irreducible stochastic games. A stochastic game is irreducible if every strategy profile gives rise to an irreducible Markov chain over the set of games. In such games the limit averages are well defined, and in fact we have the following theorem:

Theorem 5.2.6 Every 2-player, general-sum, average-reward, irreducible stochastic game has a Nash equilibrium.

Indeed, under the same condition we can state a folk theorem similar to that of repeated games. That is, as long as we give each player an expected payoff that is at least as large as his minmax value, we can achieve any feasible payoff pair through the use of threats.

Theorem 5.2.7 For every 2-player, general-sum, irreducible stochastic game, and every feasible outcome with a payoff vector $r$ that provides to each player at least his minmax value, there exists a Nash equilibrium with payoff vector $r$. This is true for games with average rewards, as well as for games with large enough discount factors (or, with players that are sufficiently patient).

5.2.3 Computing equilibria

The algorithms and results for stochastic games depend greatly on whether we use the discounted reward or the average reward for the agents' utility functions. We will discuss the two cases separately, starting with the discounted-reward case. The first question to ask about the problem of finding a Nash equilibrium is whether a polynomial-time procedure is available. The fact that there exists a linear programming formulation for solving MDPs (for both the discounted-reward and the average-reward case) gives us a reason for optimism, since stochastic games are a generalization of MDPs. While such a formulation does not exist for the full class of stochastic games, it does for several nontrivial subclasses.

One such subclass is the set of 2-player, general-sum, discounted-reward stochastic games in which the transitions are determined by a single player. The single-controller condition is formally defined as follows.


Definition 5.2.8 (Single-controller stochastic game) A stochastic game as defined in Definition 5.2.1 is said to be single-controller if there exists a player $i$ such that $\forall q, q' \in Q$ and $\forall a, a' \in A$, $P(q, a, q') = P(q, a', q')$ whenever $a_i = a_i'$.

The same result holds when we replace the single-controller restriction with the following pair of restrictions: that the state and the action profile have independent effects on the reward achieved by each agent, and that the transition function depends only on the action profile. Formally, this pair is called the Separable Reward State Independent Transition condition.

Definition 5.2.9 (SR-SIT stochastic game) A stochastic game as defined in Definition 5.2.1 is said to be Separable Reward State Independent Transition (SR-SIT) if the following two conditions hold:

• There exist functions $\alpha, \gamma$ such that for all $i$, $q \in Q$, and $a \in A$ it is the case that $r_i(q, a) = \alpha(q) + \gamma(a)$.

• For all $q, q', q'' \in Q$ and $a \in A$ it is the case that $P(q, a, q'') = P(q', a, q'')$.

Even when the problem does not fall into one of these subclasses, practical solutions still exist for the discounted case. One such solution is to apply a modified version of Newton's method to a nonlinear program formulation of the problem. An advantage of this method is that no local minima exist. For zero-sum games, an alternative is to use an algorithm developed by Shapley that is related to value iteration, a commonly used method for solving MDPs.
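To convey the flavor of the value-iteration approach for zero-sum discounted stochastic games, here is an illustrative sketch (an interpretation in the spirit of Shapley's algorithm, not an implementation from any particular source, and assuming SciPy's linprog is available): each iteration solves, in every state, the zero-sum matrix game whose entries combine the immediate payoff with the discounted value of successor states.

```python
# Illustrative sketch of value iteration for a two-player zero-sum discounted
# stochastic game, in the spirit of Shapley's algorithm. Payoffs are to player 1.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A (row player maximizes), via an LP."""
    A = np.asarray(A, dtype=float)
    n1, n2 = A.shape
    # Variables: the row player's mixed strategy x and the game value v; maximize v.
    c = np.zeros(n1 + 1); c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n2, 1))])      # for every column j: x . A[:, j] >= v
    b_ub = np.zeros(n2)
    A_eq = np.hstack([np.ones((1, n1)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n1 + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_value_iteration(r, P, beta, iters=200):
    """r[q] is the stage-game payoff matrix in state q (to player 1);
    P[q][a1][a2] is a vector of transition probabilities over next states."""
    n_states = len(r)
    V = np.zeros(n_states)
    for _ in range(iters):
        V_new = np.empty(n_states)
        for q in range(n_states):
            # Auxiliary matrix game: immediate payoff plus discounted continuation value.
            G = r[q] + beta * np.einsum('ijk,k->ij', P[q], V)
            V_new[q] = matrix_game_value(G)
        V = V_new
    return V

# A tiny made-up example with 2 states and 2 actions per player.
r = np.array([[[1.0, -1.0], [-1.0, 1.0]],      # state 0: matching pennies
              [[0.0, 2.0], [2.0, 0.0]]])       # state 1
P = np.array([[[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5]]],
              [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]]]])
print(shapley_value_iteration(r, P, beta=0.9))
```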

Moving on to the average-reward case, we have to impose more restrictions in order to use a linear program than we did for the discounted-reward case. Specifically, for the class of 2-player, general-sum, average-reward stochastic games, the single-controller assumption no longer suffices—we also need the game to be zero-sum in order to use a linear program.

Even when we cannot use a linear program, irreducibility allows us to use an algorithm that is guaranteed to converge and that combines policy iteration (another method used for solving MDPs) with successive approximation.

5.3 Bayesian games

All of the game forms discussed so far assumed that all players know what game is being played. Specifically, the number of players, the actions available to each player, and the payoff associated with each action vector have all been assumed to be common knowledge among the players. (Note that this is true even of imperfect-information games; the actual moves of agents are not common knowledge, but the game is.) In contrast, Bayesian games, or games of incomplete information, allow us to represent players' uncertainties about the very game being played.3 This uncertainty is represented as a probability distribution over a set of possible games. We make two assumptions:

3. It is easy to confuse the term "incomplete information" with "imperfect information"; don't.


1. All possible games have the same number of agents and the same strategy space for each agent; they differ only in their payoffs.

2. The beliefs of the different agents are posteriors, obtained by conditioning a common prior on individual private signals.

The second assumption is substantive, and we return to it shortly. The first is not particularly restrictive, although at first it might seem to be. One can imagine many other potential types of uncertainty that players might have about the game—how many players are involved, what actions are available to each player, and perhaps other aspects of the situation. It might seem that we have severely limited the discussion by ruling these out. However, it turns out that these other types of uncertainty can be reduced to uncertainty only about payoffs via problem reformulation.

For example, imagine that we want to model a situation in which one player is uncertain about the number of actions available to the other players. We can reduce this uncertainty to uncertainty about payoffs by padding the game with irrelevant actions. For example, consider the following two-player game, in which the row player does not know whether his opponent has only the two strategies L and R or also the third one C:

              L        R                           L        C        R

     U      1, 1     1, 3                 U      1, 1     0, 2     1, 3

     D      0, 5     1, 13                D      0, 5     2, 8     1, 13

Now consider replacing the leftmost, smaller game by a padded version, in which we add a new C column:

              L          C          R

     U      1, 1      0,−100      1, 3

     D      0, 5      2,−100      1, 13

Clearly the newly added column is dominated by the others and will not participate in any Nash equilibrium (or any other reasonable solution concept). Indeed, there is an isomorphism between Nash equilibria of the original game and the padded one. Thus the uncertainty about the strategy space is reduced to uncertainty about payoffs.

Using similar tactics, it can be shown that it is also possible to reduce uncertainty about other aspects of the game to uncertainty about payoffs only. This is not a mathematical claim, since we have given no mathematical characterization of all the possible forms of uncertainty, but it is the case that such reductions have been shown for all the common forms of uncertainty.



The second assumption about Bayesian games is the common-prior assumption, familiar from the discussion of multiagent probabilities and KP-structures in Chapter 12. As discussed there, a Bayesian game thus defines not only the uncertainties of agents about the game being played, but also their beliefs about the beliefs of other agents about the game being played, and indeed an entire infinite hierarchy of nested beliefs (the so-called epistemic type space). As also discussed in Chapter 12, the common-prior assumption is a substantive assumption which limits the scope of applicability. We nonetheless make this assumption since it allows us to formulate the main ideas in Bayesian games, and without the assumption the subject matter becomes much more involved than is appropriate for this text. Indeed, most (but not all) work in game theory makes this assumption.

5.3.1 Definition

There are several different ways of presenting Bayesian games; we will offer three (equivalent) definitions. Each formulation is useful in different settings, and offers different intuition about the underlying structure of this family of games.

First definition: Information sets

First, we present a definition that is based on information sets. Under this definition, a Bayesian game consists of a set of games that differ only in their payoffs, a common prior defined over them, and a partition structure over the games for each agent.4

Definition 5.3.1 (Bayesian game: information sets) A Bayesian game is a tuple $(N, G, P, I)$ where

• $N$ is a set of agents,

• $G$ is a set of games with $N$ agents each such that if $g, g' \in G$ then for each agent $i \in N$ the strategy space in $g$ is identical to the strategy space in $g'$,

• $P \in \Pi(G)$ is a common prior over games, where $\Pi(G)$ is the set of all probability distributions over $G$, and

• $I = (I_1, \ldots, I_N)$ is a tuple of partitions of $G$, one for each agent.

Figure 5.7 gives an example of a Bayesian game. It consists of eight $2 \times 2$ zero-sum games, and each agent's partition consists of two equivalence classes.

4. This combination of a common prior and a set of partitions over states of the world turns out to correspond to a KP-structure, defined in Chapter 12.


                         I2,1                                      I2,2

              A:  1  0        B:  2  1              C:  1  3        D:  1  0
     I1,1         0  3            1  4                  2  1            4  0
              p = 0.2         p = 0.1               p = 0.1         p = 0

              E:  3  2        F:  2  4              G:  0  1        H:  2  0
     I1,2         0  3            1  1                  3  1            3  4
              p = 0.1         p = 0.1               p = 0.25        p = 0.15

     Figure 5.7  A Bayesian game. Each cell is a 2 × 2 zero-sum game showing player 1's payoffs
     (rows U, D; columns L, R). The row blocks are player 1's information sets I1,1 and I1,2; the
     column blocks are player 2's information sets I2,1 and I2,2.

Second definition: Extensive form with chance moves

A second way of capturing the common prior is to hypothesize a special agent called Nature which makes probabilistic choices. While we could have Nature's choices be interspersed arbitrarily with the agents' moves, it does not imply a loss of generality to assume that Nature makes all its choices at the outset. Nature does not have a utility function (or, alternatively, it can be viewed as having a constant one), and has the unique strategy of randomizing in a commonly known way. The agents receive individual signals about Nature's choice, and these are captured by their information sets in a standard way. The agents have no additional information; in particular, the information sets capture the fact that agents make their choices without knowing the choices of others. Thus, we have reduced games of incomplete information to games of imperfect information, albeit ones with chance moves. These chance moves of Nature require minor adjustments of existing definitions, replacing payoffs by their expectations given Nature's moves.5

For example, the Bayesian game of Figure 5.7 can be represented in extensive form as depicted in Figure 5.8.

Although this second definition of Bayesian games can initially be more intuitive than our first definition, it can also be more cumbersome to work with. This is because we use an extensive-form representation in a setting where players are unable to observe each other's moves. (Indeed, for the same reason we do not routinely use extensive-form games of imperfect information to model simultaneous interactions such as the Prisoner's Dilemma, though we could do so if we wished.) For this reason, we will not make any further use of this definition. We close by noting one advantage that it does have, however: it extends very naturally to Bayesian games in which players move sequentially and do (at least sometimes) learn about previous players' moves.

5. Note that the special structure of this extensive form means that we do not have to agonize over the refinements of Nash equilibrium; since agents have no information about prior choices made other than by Nature, all Nash equilibria are also sequential equilibria.


Figure 5.8  The Bayesian game from Figure 5.7 in extensive form: Nature moves first, selecting among the game pairs (A,B), (C,D), (E,F), and (G,H); players 1 and 2 then choose U/D and L/R respectively, with information sets reflecting which games each player can distinguish.


Third definition: (epistemic) types

Recall that a game may be defined by a set of players, actions, and utility functions. Observe that in our first definition agents are uncertain about which game they are playing; however, each possible game has the same sets of actions and players, and so agents are really only uncertain about the game's utility function.

Our third definition uses the notion of an epistemic type, or simply type, as a way of defining uncertainty directly over a game's utility function.

Definition 5.3.2 (Bayesian game: types) A Bayesian game is a tuple $(N, A, \Theta, p, u)$ where

• $N$ is a set of agents,

• $A = (A_1, \ldots, A_n)$, where $A_i$ is the set of actions available to player $i$,

• $\Theta = (\Theta_1, \ldots, \Theta_n)$, where $\Theta_i$ is the type space of player $i$,

• $p : \Theta \to [0, 1]$ is the common prior over types, and

• $u = (u_1, \ldots, u_n)$, where $u_i : A \times \Theta \to \mathbb{R}$ is the utility function for player $i$.

The assumption is that all of the above is common knowledge among the players, and that each agent knows his own type. This definition can seem mysterious, because the notion of type can be rather opaque. In general, the type of an agent encapsulates all the information possessed by the agent that is not common knowledge. This is often quite simple (for example, the agent's knowledge of his private payoff function), but can also include his beliefs about other agents' payoffs, about their beliefs about his own payoff, and any other higher-order beliefs.

We can get further insight into the notion of a type by relating it to the formulation at the beginning of this section. Consider again the Bayesian game in Figure 5.7. For each of the agents we have two types, corresponding to his two information sets.


player 1’s actions as U and D, player 2’s actions as L and R. Call the types of the firstagentθ1,1 andθ1,2, and those of the second agentθ2,1 andθ2,2. The joint distributionon these types is as follows:p(θ1,1, θ2,1) = .3, p(θ1,1, θ2,2) = .1, p(θ1,2, θ2,1) = .2,p(θ1,2, θ2,2) = .4. The conditional probabilities for the first player arep(θ2,1 | θ1,1) =3/4, p(θ2,2 | θ1,1) = 1/4, p(θ2,1 | θ1,2) = 1/3, andp(θ2,2 | θ1,2) = 2/3. Theutility function for player 1ui is defined in Figure 5.9; player 2’s utility is the negativeof player 1’s utility. For example,u1(U,L, θ1,1, θ2,1) was calculated as 0.2

0.2+0.1 (1) +0.1

0.2+0.1 (2).

  a1  a2  θ1    θ2    u1            a1  a2  θ1    θ2    u1
  U   L   θ1,1  θ2,1  4/3           D   L   θ1,1  θ2,1  1/3
  U   L   θ1,1  θ2,2  1             D   L   θ1,1  θ2,2  2
  U   L   θ1,2  θ2,1  5/2           D   L   θ1,2  θ2,1  1/2
  U   L   θ1,2  θ2,2  3/4           D   L   θ1,2  θ2,2  3
  U   R   θ1,1  θ2,1  1/3           D   R   θ1,1  θ2,1  10/3
  U   R   θ1,1  θ2,2  3             D   R   θ1,1  θ2,2  1
  U   R   θ1,2  θ2,1  3             D   R   θ1,2  θ2,1  2
  U   R   θ1,2  θ2,2  5/8           D   R   θ1,2  θ2,2  17/8

  Figure 5.9  Utility function u1 for the Bayesian game from Figure 5.7.

5.3.2 Strategies and equilibria

Now that we have defined Bayesian games, we must explain how to reason about them. We will do this using the epistemic type definition above, because that is the definition most commonly used in mechanism design (discussed in Chapter 9), one of the main applications of Bayesian games. All of the concepts defined below can also be expressed in terms of the first two Bayesian game definitions, of course; we leave this exercise to the reader.

The first task is to define an agent's strategy space in a Bayesian game. Recall that in an imperfect-information extensive-form game a pure strategy was a mapping from information sets to actions. The definition is similar in Bayesian games: a pure strategy $\alpha_i : \Theta_i \to A_i$ is a mapping from every type agent $i$ could have to the action he would play if he had that type. We can then define mixed strategies in the natural way as probability distributions over pure strategies. As before, we denote a mixed strategy for $i$ as $s_i \in S_i$, where $S_i$ is the set of all $i$'s mixed strategies. Furthermore, we use the notation $s_j(a_j \mid \theta_j)$ to denote the probability under mixed strategy $s_j$ that agent $j$ plays action $a_j$, given that $j$'s type is $\theta_j$.

Next, since we have defined an environment with multiple sources of uncertainty, we will pause to reconsider the definition of an agent's expected utility. In a Bayesian game setting, there are three meaningful notions of expected utility: ex-post, ex-interim and ex-ante. The first is computed based on all agents' actual types, the second considers the setting in which an agent knows his own type but not the types of the other agents, and in the third case the agent does not know anybody's type.

Definition 5.3.3 (Ex-post expected utility) Agent $i$'s ex-post expected utility in a Bayesian game $(N, A, \Theta, p, u)$, where the agents' strategies are given by $s$ and the agents' types are given by $\theta$, is defined as

$$EU_i(s, \theta) = \sum_{a \in A} \left( \prod_{j \in N} s_j(a_j \mid \theta_j) \right) u_i(a, \theta). \qquad (5.1)$$

In this case, the only uncertainty concerns the other agents' mixed strategies, since agent $i$'s ex-post expected utility is computed based on the other agents' actual types. Of course, in a Bayesian game no agent will know the others' types; while that does not prevent us from offering the definition above, it might make the reader question its usefulness. We will see that this notion of expected utility is useful both for defining the other two and also for defining a specialized equilibrium concept.

Definition 5.3.4 (Ex-interim expected utility) Agent $i$'s ex-interim expected utility in a Bayesian game $(N, A, \Theta, p, u)$, where $i$'s type is $\theta_i$ and where the agents' strategies are given by the mixed-strategy profile $s$, is defined as

$$EU_i(s, \theta_i) = \sum_{\theta_{-i} \in \Theta_{-i}} p(\theta_{-i} \mid \theta_i) \sum_{a \in A} \left( \prod_{j \in N} s_j(a_j \mid \theta_j) \right) u_i(a, \theta_{-i}, \theta_i), \qquad (5.2)$$

or equivalently as

$$EU_i(s, \theta_i) = \sum_{\theta_{-i} \in \Theta_{-i}} p(\theta_{-i} \mid \theta_i) \, EU_i(s, (\theta_i, \theta_{-i})). \qquad (5.3)$$

Thus, $i$ must consider every assignment of types to the other agents $\theta_{-i}$ and every pure action profile $a$ in order to evaluate his utility function $u_i(a, \theta_i, \theta_{-i})$. He must weight this utility value by two amounts: the probability that the other players' types would be $\theta_{-i}$ given that his own type is $\theta_i$, and the probability that the pure action profile $a$ would be realized given all players' mixed strategies and types. (Observe that agents' types may be correlated.) Because uncertainty over mixed strategies was already handled in the ex-post case, we can also write ex-interim expected utility as a weighted sum of $EU_i(s, \theta)$ terms.

Finally, there is the ex-ante case, where we compute $i$'s expected utility under the joint mixed strategy $s$ without observing any agents' types.

Definition 5.3.5 (Ex-ante expected utility) Agent $i$'s ex-ante expected utility in a Bayesian game $(N, A, \Theta, p, u)$, where the agents' strategies are given by the mixed-strategy profile $s$, is defined as

$$EU_i(s) = \sum_{\theta \in \Theta} p(\theta) \sum_{a \in A} \left( \prod_{j \in N} s_j(a_j \mid \theta_j) \right) u_i(a, \theta), \qquad (5.4)$$

or equivalently as

$$EU_i(s) = \sum_{\theta \in \Theta} p(\theta) \, EU_i(s, \theta), \qquad (5.5)$$


or again equivalently as

$$EU_i(s) = \sum_{\theta_i \in \Theta_i} p(\theta_i) \, EU_i(s, \theta_i). \qquad (5.6)$$
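As an illustration (not from the text), the sketch below computes the three notions of expected utility for player 1 in the Bayesian game of Figures 5.7 and 5.9, using the joint prior and the type-based utility table given above; the pure strategies chosen for the example are arbitrary.

```python
# Illustrative computation of ex-post, ex-interim and ex-ante expected utility
# (Equations 5.1, 5.3 and 5.5) for player 1 in the Bayesian game of Figure 5.9.
from itertools import product

THETA1, THETA2 = ['t11', 't12'], ['t21', 't22']
A1, A2 = ['U', 'D'], ['L', 'R']

prior = {('t11', 't21'): 0.3, ('t11', 't22'): 0.1,
         ('t12', 't21'): 0.2, ('t12', 't22'): 0.4}

# u1[(a1, a2, theta1, theta2)] from Figure 5.9; player 2's utility is -u1.
u1 = {('U', 'L', 't11', 't21'): 4/3,  ('U', 'L', 't11', 't22'): 1,
      ('U', 'L', 't12', 't21'): 5/2,  ('U', 'L', 't12', 't22'): 3/4,
      ('U', 'R', 't11', 't21'): 1/3,  ('U', 'R', 't11', 't22'): 3,
      ('U', 'R', 't12', 't21'): 3,    ('U', 'R', 't12', 't22'): 5/8,
      ('D', 'L', 't11', 't21'): 1/3,  ('D', 'L', 't11', 't22'): 2,
      ('D', 'L', 't12', 't21'): 1/2,  ('D', 'L', 't12', 't22'): 3,
      ('D', 'R', 't11', 't21'): 10/3, ('D', 'R', 't11', 't22'): 1,
      ('D', 'R', 't12', 't21'): 2,    ('D', 'R', 't12', 't22'): 17/8}

# Mixed strategies s1[theta1][a1] and s2[theta2][a2]; here the pure strategies UD and LR.
s1 = {'t11': {'U': 1.0, 'D': 0.0}, 't12': {'U': 0.0, 'D': 1.0}}
s2 = {'t21': {'L': 1.0, 'R': 0.0}, 't22': {'L': 0.0, 'R': 1.0}}

def ex_post(t1, t2):
    return sum(s1[t1][a1] * s2[t2][a2] * u1[(a1, a2, t1, t2)]
               for a1, a2 in product(A1, A2))

def ex_interim(t1):
    marginal = sum(prior[(t1, t2)] for t2 in THETA2)
    return sum(prior[(t1, t2)] / marginal * ex_post(t1, t2) for t2 in THETA2)

def ex_ante():
    return sum(prior[(t1, t2)] * ex_post(t1, t2)
               for t1, t2 in product(THETA1, THETA2))

print(ex_post('t11', 't21'), ex_interim('t11'), ex_ante())  # approx. 1.33, 1.75, 1.65
```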

Next, we define best response.

Definition 5.3.6 (Best response in a Bayesian game) The set of agent $i$'s best responses to the mixed-strategy profile $s_{-i}$ is given by

$$BR_i(s_{-i}) = \arg\max_{s_i' \in S_i} EU_i(s_i', s_{-i}). \qquad (5.7)$$

Note that $BR_i$ is a set because there may be many strategies for $i$ that yield the same expected utility. It may seem odd that $BR$ is calculated based on $i$'s ex-ante expected utility. However, write $EU_i(s)$ as $\sum_{\theta_i \in \Theta_i} p(\theta_i) EU_i(s, \theta_i)$ and observe that $EU_i(s_i', s_{-i}, \theta_i)$ does not depend on strategies that $i$ would play if his type were not $\theta_i$. Thus, we are in fact performing independent maximization of $i$'s ex-interim expected utility conditioned on each type that he could have. Intuitively speaking, if a certain action is best after the signal is received, it is also the best conditional plan devised ahead of time for what to do should that signal be received.

We are now able to define the Bayes-Nash equilibrium.

Definition 5.3.7 (Bayes-Nash equilibrium) A Bayes-Nash equilibrium is a mixed-strategy profile $s$ that satisfies $\forall i \; s_i \in BR_i(s_{-i})$.

This is exactly the definition we gave for the Nash equilibrium in Definition 2.3.4: each agent plays a best response to the strategies of the other players. The difference from Nash equilibrium, of course, is that the definition of Bayes-Nash equilibrium is built on top of the Bayesian game definitions of best response and expected utility. Observe that we would not be able to define equilibrium in this way if an agent's strategies were not defined for every possible type. In order for a given agent $i$ to play a best response to the other agents $-i$, $i$ must know what strategy each agent would play for each of his possible types. Without this information, it would be impossible to evaluate the term $EU_i(s_i', s_{-i})$ in Equation (5.7).

5.3.3 Computing equilibria

Despite its similarity to the Nash equilibrium, the Bayes-Nash equilibrium may seem more conceptually complicated. However, as we did with extensive-form games, we can construct a normal-form representation that corresponds to a given Bayesian game.

As with games in extensive form, the induced normal form for Bayesian games has an action for every pure strategy. That is, the actions for an agent $i$ are the distinct mappings from $\Theta_i$ to $A_i$. Each agent $i$'s payoff given a pure-strategy profile $s$ is his ex-ante expected utility under $s$.


Then, as it turns out, the Bayes-Nash equilibria of a Bayesian game are precisely the Nash equilibria of its induced normal form. This fact allows us to note that Nash's theorem applies directly to Bayesian games, and hence Bayes-Nash equilibria always exist.

An example will help. Consider again the Bayesian game from Figure 5.9. Note that in this game each agent has four possible pure strategies (two types and two actions). Then player 1's four strategies in the Bayesian game can be labelled UU, UD, DU and DD: UU means that 1 chooses U regardless of his type, UD that he chooses U when he has type $\theta_{1,1}$ and D when he has type $\theta_{1,2}$, and so forth. Similarly, we can denote the strategies of player 2 in the Bayesian game by RR, RL, LR, and LL.

We now define a4 × 4 normal form game in which these are the four strategies ofthe two agents, and the payoffs are the expected payoffs in the individual games, giventhe agents’ common prior beliefs. For example, the value of(UU,LL) is calculated asfollows:

u_1(UU, LL) = Σ_{θ ∈ Θ} p(θ) u_1(U, L, θ)
  = p(θ_{1,1}, θ_{2,1}) u_1(U, L, θ_{1,1}, θ_{2,1}) + p(θ_{1,1}, θ_{2,2}) u_1(U, L, θ_{1,1}, θ_{2,2})
    + p(θ_{1,2}, θ_{2,1}) u_1(U, L, θ_{1,2}, θ_{2,1}) + p(θ_{1,2}, θ_{2,2}) u_1(U, L, θ_{1,2}, θ_{2,2})
  = 0.3 (4/3) + 0.1 (1) + 0.2 (5/2) + 0.4 (3/4) = 1.3
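To make the arithmetic concrete, here is a minimal Python sketch of this single-cell computation. It uses only the numbers appearing in the calculation above (the common prior and player 1's payoffs for the action profile (U, L) under each type profile); the variable names are ours, not part of the formal definitions.

    # Common prior over type profiles (theta_1, theta_2), taken from the calculation above.
    prior = {
        ("theta11", "theta21"): 0.3,
        ("theta11", "theta22"): 0.1,
        ("theta12", "theta21"): 0.2,
        ("theta12", "theta22"): 0.4,
    }
    # Player 1's payoff for the action profile (U, L) under each type profile, as above.
    u1_UL = {
        ("theta11", "theta21"): 4 / 3,
        ("theta11", "theta22"): 1.0,
        ("theta12", "theta21"): 5 / 2,
        ("theta12", "theta22"): 3 / 4,
    }
    # Under (UU, LL) both players play (U, L) regardless of type, so the ex-ante
    # expected utility is just the prior-weighted average of u1(U, L, theta).
    ex_ante = sum(prior[theta] * u1_UL[theta] for theta in prior)
    print(round(ex_ante, 3))  # 1.3

Computing the remaining fifteen entries of Figure 5.10 works the same way, except that the action each strategy plays in a given term depends on the type profile.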

Continuing in this manner, the complete payoff matrix can be constructed as indicated in Figure 5.10.

LL LR RL RR

UU 1.3 1.45 1.1 1.25

UD 1.8 1.65 1.8 1.65

DU 1.1 0.7 2.0 1.95

DD 1.5 1.15 2.8 2.35

Figure 5.10 Induced normal form of the game from Figure 5.9.

Recall that we calculate only one payoff per outcome because the game is zero-sum. Now note that the strategy profile (UD, LR) is a Nash equilibrium of this normal-form game because it is the maximum of its column and the minimum of its row.

Given a particular signal, the agent can compute the posterior probabilities and recompute the expected utility of any given strategy vector. Thus in the previous example, once the row agent gets the signal θ_{1,1} he can update the expected payoffs and compute the new game shown in Figure 5.11.

LL LR RL RR

UU 1.25 1.75 0.5 1.0

UD 1.25 1.75 0.5 1.0

DU 0.75 0.5 3.0 2.75

DD 0.75 0.5 3.0 2.75

Figure 5.11 Ex-interim induced normal-form game, where player 1 has type θ_{1,1}.

Note that UD is still the row player's best response to LR; what has changed is how much better it is compared to the other three strategies. However, it is meaningless to analyze the Nash equilibria in this payoff matrix; these expected payoffs are not common knowledge, and indeed if the column player were to condition on his signal, he would arrive at a different set of numbers (though, again, for him LR would still be the best response to UD). Ironically, it is only in the induced normal form, in which the payoffs do not correspond to any ex-interim assessment of any agent, that the Nash equilibria are meaningful.

Other computational techniques exist for Bayesian games that also have temporal structure, that is, for Bayesian games written using the "extensive form with chance moves" formulation, for which the game tree is smaller than its induced normal form. First, there is an algorithm for Bayesian games of perfect information that generalizes backward induction (defined in Section 4.1.4). This algorithm is called expectimax. Intuitively, this algorithm is very much like the standard backward induction algorithm given in Figure 4.6. Like that algorithm, expectimax recursively explores a game tree, labeling each non-leaf node h with a payoff vector by examining the labels of each of h's child nodes (the actual payoffs when these child nodes are leaf nodes) and keeping the payoff vector in which the agent who moves at h achieves maximal utility. The new wrinkle is that chance nodes must also receive labels. Expectimax labels a chance node h with a weighted sum of the labels of its child nodes, where the weights are the probabilities that each child node will be selected. The same idea of labeling chance nodes with the expected value of the next node's label can also be applied to extend the minimax algorithm (from which expectimax gets its name) and alpha-beta pruning (see Figure 4.7) in order to solve zero-sum games. This is a popular algorithmic framework for building computer players for perfect-information games of chance such as Backgammon.
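As an illustration of the labeling rule just described, the following Python sketch runs expectimax on a small hand-built game tree. The tree encoding (nested dictionaries with "decision", "chance", and "leaf" nodes) and the payoff numbers are our own illustrative assumptions, not taken from the text.

    def expectimax(node):
        """Return the payoff vector (one entry per player) labeling this node."""
        if node["kind"] == "leaf":
            return node["payoffs"]
        if node["kind"] == "chance":
            # A chance node is labeled with the probability-weighted sum of its children's labels.
            labeled = [(p, expectimax(child)) for p, child in node["children"]]
            players = len(labeled[0][1])
            return tuple(sum(p * lab[k] for p, lab in labeled) for k in range(players))
        # A decision node keeps the child label that maximizes the moving agent's own utility.
        mover = node["player"]
        return max((expectimax(child) for child in node["children"]),
                   key=lambda lab: lab[mover])

    # Tiny two-player example: player 0 moves, and one of his choices leads to a chance node.
    tree = {"kind": "decision", "player": 0, "children": [
        {"kind": "chance", "children": [
            (0.5, {"kind": "leaf", "payoffs": (3.0, 1.0)}),
            (0.5, {"kind": "leaf", "payoffs": (0.0, 2.0)}),
        ]},
        {"kind": "leaf", "payoffs": (1.0, 1.0)},
    ]}
    print(expectimax(tree))  # (1.5, 1.5): the gamble is worth 1.5 > 1.0 to player 0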

There are also efficient computational techniques for computing sample equilibria of imperfect-information extensive-form games with chance nodes. In fact, all the computational results for computing with the sequence form that we discussed in Section 4.2.3 still hold when chance nodes are added. Intuitively, the only change we need to make is to replace our definition of the payoff function (Definition 4.2.7) with an expected payoff that supplies the expected value, ranging over Nature's possible actions, of the payoff the agent would achieve by following a given sequence. This means that we can sometimes achieve a substantial computational savings by working with the extensive-form representation of a Bayesian game, rather than considering the game's induced normal form.

5.3.4 Ex-post equilibria

Finally, working with ex-post utilities allows us to define an equilibrium concept that is stronger than the Bayes-Nash equilibrium.

Definition 5.3.8 (Ex-post equilibrium) An ex-post equilibrium is a mixed strategy profile s that satisfies ∀θ, ∀i, s_i ∈ arg max_{s'_i ∈ S_i} EU_i(s'_i, s_{-i}, θ).

Observe that this definition does not presume that each agent actually does know the others' types; instead, it says that no agent would ever want to deviate from his mixed strategy even if he knew the complete type vector θ. This form of equilibrium is appealing because it is unaffected by perturbations in the type distribution p(θ). Said another way, an ex-post equilibrium does not ever require any agent to believe that the others have accurate beliefs about his own type distribution. (Note that a standard Bayes-Nash equilibrium can imply this requirement.) The ex-post equilibrium is thus similar in flavor to equilibria in dominant strategies, which do not require agents to believe that other agents act rationally. In fact, equilibria in dominant strategies are a yet stronger equilibrium concept: every equilibrium in dominant strategies is an ex-post equilibrium, but the reverse is not true. This is because under an ex-post equilibrium it still matters to an agent what strategies the others adopt; it simply does not matter what types they turn out to have. Finally, ex-post equilibria share another unfortunate similarity with equilibria in dominant strategies: they are not guaranteed to exist.

5.4 Congestion games

Congestion games are a restricted class of games that are useful for modeling some important real-world settings and that also have attractive theoretical properties. Intuitively, they simplify the representation of a game by imposing constraints on the effects that a single agent's action can have on other agents' utilities.

5.4.1 Definition

Formally, a congestion game is a single-shot n-player game, defined as follows.

Definition 5.4.1 (Congestion game) A congestion game is a tuple (N, R, A, c), where

• N is a set of n agents;

• R is a set of r resources;

• A = (A_1, . . . , A_n), where A_i ⊆ 2^R \ {∅} is the set of actions for agent i; and

• c = (c_1, . . . , c_r), where c_k : ℕ → ℝ is a cost function for resource k ∈ R.

The players' actions are defined using the set of resources R. Let A' denote 2^R \ {∅}, the set of all subsets of the resources except for the empty set. Now for each player i, let his set of actions A_i ⊆ A' be an arbitrary subset of A'. By the notation r ∈ a_i we will denote the proposition that action a_i corresponds to a set of resources that has r as a member.

The players' utility functions are defined in terms of the cost functions c_k. Define # : R × A → ℕ as a function that counts the number of players who took any action that involves resource r under action profile a. For each resource k, define a cost function c_k : ℕ → ℝ. Now we are ready to state the utility function,^6 which is the same for all players. Given a pure strategy profile a = (a_i, a_{-i}),

u_i(a) = Σ_{r ∈ R | r ∈ a_i} c_r(#(r, a)).

Observe that while the agents can have different actions available to them, they all have the same utility function. Furthermore, observe that congestion games have an anonymity property: players care about how many others use a given resource, but they do not care about which others do so.
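The Python sketch below evaluates the utility function just defined on a toy two-resource congestion game. The resources, action sets, and cost functions are invented for illustration; footnote 6 notes that negative c_r values are permitted, and we use c_r(k) = -k so that heavier congestion yields lower utility.

    # A toy congestion game with two resources and two players (illustrative values only).
    resources = ["link1", "link2"]
    actions = {  # each action is a set of resources
        1: [frozenset({"link1"}), frozenset({"link2"})],
        2: [frozenset({"link1"}), frozenset({"link1", "link2"})],
    }
    cost = {r: (lambda k: -k) for r in resources}  # c_r(k) = -k: more congestion, lower utility

    def count(r, profile):
        """#(r, a): the number of players whose chosen action involves resource r."""
        return sum(1 for a_j in profile.values() if r in a_j)

    def utility(i, profile):
        """u_i(a) = sum of c_r(#(r, a)) over the resources in i's chosen action."""
        return sum(cost[r](count(r, profile)) for r in profile[i])

    a = {1: frozenset({"link1"}), 2: frozenset({"link1", "link2"})}
    print(utility(1, a), utility(2, a))  # -2 -3: link1 is shared by both players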

One motivating example for this formulation is a computer network in which several users want to send a message from one node to another at approximately the same time. Each link connecting two nodes is a resource, and an action for a user is to select a path of links connecting their source and target node. The cost function for each resource expresses the latency on each link as a function of its congestion.

As the name suggests, a congestion game typically features functions c_k(·) that are increasing in the number of people who choose that resource, as would be the case in the network example. However, congestion games can just as easily handle positive externalities (or even cost functions that oscillate). A popular formulation that captures both types of externalities is the Santa Fe (or, El Farol) Bar Problem, in which each of a set of people independently selects whether or not to go to the bar. The utility of attending increases with the number of other people who select the same night, up to the capacity of the bar. Beyond this point, utility decreases because the bar gets too crowded. Deciding not to attend yields a baseline utility that does not depend on the actions of the participants.^7

6. This utility function is negated because the cost functions are historically understood as penalties that the agents want to minimize. We note that the c_r functions are also permitted to be negative.
7. Incidentally, this problem is typically studied in a repeated game context, in which (possibly boundedly rational) agents must learn to play an equilibrium. It is famous partly for not having a symmetric pure-strategy equilibrium, and has been generalized with the concept of minority games, in which agents get the highest payoff for choosing a minority action.


5.4.2 Computing equilibria

Congestion games are interesting for reasons beyond the fact that they can compactly represent realistic n-player games like the examples above. In particular,

Theorem 5.4.2 Every congestion game has a pure-strategy Nash equilibrium.

We defer the proof for the moment, though we note that the property is important because mixed-strategy equilibria are open to criticisms that they are less likely than pure-strategy equilibria to arise in practice (see Section ??). Furthermore, this theorem tells us that if we want to compute a sample Nash equilibrium of a congestion game, we can look for a pure-strategy equilibrium. Consider the myopic best response process described in Figure 5.12.

function MYOPICBESTRESPONSE(game G, action profile a) returns a
    while there exists an agent i for whom a_i is not a best response to a_{-i} do
        a'_i ← some best response by i to a_{-i}
        a ← (a'_i, a_{-i})
    return a

Figure 5.12 Myopic best response algorithm. It is invoked starting with an arbitrary (e.g., random) action profile a.
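Below is a minimal Python rendering of this procedure for games given as explicit payoff functions. The encoding of the game, the iteration cap, and the tie-breaking rule (take the first improving player and his first maximizing action) are our own choices; the pseudocode in Figure 5.12 leaves them unspecified.

    def myopic_best_response(actions, utility, profile, max_steps=10_000):
        """actions: {player: list of actions}; utility(player, profile_dict) -> number.
        Repeatedly lets some player who is not best responding switch to a best response."""
        profile = dict(profile)
        for _ in range(max_steps):
            deviator = None
            for i, acts in actions.items():
                best = max(acts, key=lambda ai: utility(i, {**profile, i: ai}))
                if utility(i, {**profile, i: best}) > utility(i, profile):
                    deviator, profile = i, {**profile, i: best}
                    break
            if deviator is None:      # every player is already playing a best response,
                return profile        # so the profile is a pure-strategy Nash equilibrium
        raise RuntimeError("no convergence (possible in general games, as Figure 5.13 shows)")

    # A two-player, two-resource congestion game in which players prefer uncrowded resources.
    acts = {1: ["r1", "r2"], 2: ["r1", "r2"]}
    def u(i, prof):
        return -sum(1 for a in prof.values() if a == prof[i])   # c_r(k) = -k on the chosen resource
    print(myopic_best_response(acts, u, {1: "r1", 2: "r1"}))    # the players separate onto different resources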

By the definition of equilibrium, MYOPICBESTRESPONSE returns a pure-strategy Nash equilibrium if it terminates. Because this procedure is so simple, it is an appealing way to search for an equilibrium. However, in general games MYOPICBESTRESPONSE can get caught in a cycle, even when a pure-strategy Nash equilibrium exists. For example, consider the game in Figure 5.13.

L C R

U −1, 1 1,−1 −2,−2

M 1,−1 −1, 1 −2,−2

D −2,−2 −2,−2 2, 2

Figure 5.13 A game on which MYOPICBESTRESPONSE can fail to terminate.

This game has one pure-strategy Nash equilibrium: (D, R). However, if we run MYOPICBESTRESPONSE with a = (L, U) the procedure will cycle forever. (Do you see why?) This suggests that MYOPICBESTRESPONSE may be too simplistic to be useful in practice. Interestingly, it is useful for congestion games.

Theorem 5.4.3 The MYOPICBESTRESPONSE procedure is guaranteed to find a pure-strategy Nash equilibrium of a congestion game.


5.4.3 Potential games

To prove the two theorems from the previous section, it is useful to introduce the concept of potential games.^8

Definition 5.4.4 (Potential game) A game G = (N, A, u) is a potential game if there exists a function P : A → ℝ such that, for all i ∈ N, all a_{-i} ∈ A_{-i} and a_i, a'_i ∈ A_i,

u_i(a_i, a_{-i}) − u_i(a'_i, a_{-i}) = P(a_i, a_{-i}) − P(a'_i, a_{-i}).   (5.8)

It is easy to prove the following property:

Theorem 5.4.5 Every (finite) potential game has a pure-strategy Nash equilibrium.

Proof. Let a* = arg max_{a ∈ A} P(a). Clearly for any other action profile a', P(a*) ≥ P(a'). Thus by the definition of a potential function, for any agent i who can change the action profile from a* to a' by changing his own action, u_i(a*) ≥ u_i(a').

Let I_{r ∈ a_i} be an indicator function that returns 1 if r ∈ a_i for a given action a_i, and 0 otherwise. We also overload the notation # to give the expression #(r, a_{-i}) its obvious meaning. Now we can show:

Theorem 5.4.6 Every congestion game is a potential game.

Proof. We demonstrate that every congestion game has the potential function P(a) = Σ_{r ∈ R} Σ_{j=1}^{#(r,a)} c_r(j). To accomplish this, we must show that for any agent i and any action profiles (a_i, a_{-i}) and (a'_i, a_{-i}), the difference between the potential function evaluated at these action profiles is the same as i's difference in utility.

P(a_i, a_{-i}) − P(a'_i, a_{-i})
  = [ Σ_{r ∈ R} Σ_{j=1}^{#(r,(a_i,a_{-i}))} c_r(j) ] − [ Σ_{r ∈ R} Σ_{j=1}^{#(r,(a'_i,a_{-i}))} c_r(j) ]
  = [ Σ_{r ∈ R} ( Σ_{j=1}^{#(r,a_{-i})} c_r(j) + I_{r ∈ a_i} c_r(#(r, a_{-i}) + 1) ) ] − [ Σ_{r ∈ R} ( Σ_{j=1}^{#(r,a_{-i})} c_r(j) + I_{r ∈ a'_i} c_r(#(r, a_{-i}) + 1) ) ]
  = [ Σ_{r ∈ R} I_{r ∈ a_i} c_r(#(r, a_{-i}) + 1) ] − [ Σ_{r ∈ R} I_{r ∈ a'_i} c_r(#(r, a_{-i}) + 1) ]
  = [ Σ_{r ∈ R | r ∈ a_i} c_r(#(r, (a_i, a_{-i}))) ] − [ Σ_{r ∈ R | r ∈ a'_i} c_r(#(r, (a'_i, a_{-i}))) ]
  = u_i(a_i, a_{-i}) − u_i(a'_i, a_{-i})

8. The potential games we discuss here are more formally known as exact potential games, though it is correct to shorten their name to the term potential games. There are other variants with somewhat different properties, such as weighted potential games and ordinal potential games. These variants differ in the expression that appears in Equation (5.8); for example, ordinal potential games generalize potential games with the condition u_i(a_i, a_{-i}) − u_i(a'_i, a_{-i}) > 0 iff P(a_i, a_{-i}) − P(a'_i, a_{-i}) > 0. More can be learned about these distinctions by consulting the reference given in the chapter notes.

Now that we have this result, the proof of Theorem 5.4.2 (stating that every congestion game has a pure-strategy Nash equilibrium) follows directly from Theorem 5.4.5 and Theorem 5.4.6. Furthermore, though we do not state this result formally, it turns out that the mapping given in Theorem 5.4.6 also holds in the other direction: every potential game can be represented as a congestion game.
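As a numerical sanity check, the following Python sketch evaluates the potential function used in the proof above, P(a) = Σ_{r} Σ_{j=1}^{#(r,a)} c_r(j), on a small invented congestion game and confirms Equation (5.8) for one unilateral deviation.

    # Illustrative two-player congestion game: resource costs c_r1(k) = k, c_r2(k) = 3k.
    cost = {"r1": (lambda k: k), "r2": (lambda k: 3 * k)}

    def count(r, profile):
        return sum(1 for a in profile.values() if r in a)

    def utility(i, profile):
        return sum(cost[r](count(r, profile)) for r in profile[i])

    def potential(profile):
        # P(a) = sum over resources r of c_r(1) + c_r(2) + ... + c_r(#(r, a)).
        return sum(sum(cost[r](j) for j in range(1, count(r, profile) + 1)) for r in cost)

    a       = {1: {"r1"}, 2: {"r1", "r2"}}   # player 1 plays {r1}
    deviate = {1: {"r2"}, 2: {"r1", "r2"}}   # player 1 unilaterally switches to {r2}
    print(utility(1, a) - utility(1, deviate),   # change in player 1's utility ...
          potential(a) - potential(deviate))     # ... equals the change in the potential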

Potential games (along with their equivalence to congestion games) also make it easy to prove Theorem 5.4.3 (stating that MYOPICBESTRESPONSE will always find a pure-strategy Nash equilibrium), which we had previously deferred:

Proof of Theorem 5.4.3. By Theorem 5.4.6 it is sufficient to show that MYOPICBESTRESPONSE finds a pure-strategy Nash equilibrium of any potential game. With every step of the while loop, P(a) strictly increases, because by construction u_i(a'_i, a_{-i}) > u_i(a_i, a_{-i}), and thus by the definition of a potential function P(a'_i, a_{-i}) > P(a_i, a_{-i}). Since there are only a finite number of action profiles, the algorithm must terminate.

Thus, when given a congestion game MYOPICBESTRESPONSE will converge regardless of the cost functions (e.g., they do not need to be monotonic), the action profile with which the algorithm is initialized, and which agent we choose as agent i in the while loop (when there is more than one agent who is not playing a best response). Furthermore, we can see from the proof that it is not even necessary that agents best-respond at every step. The algorithm will still converge to a pure-strategy Nash equilibrium by the same argument as long as agents deviate to a better response. On the other hand, it has recently been shown that the problem of finding a pure Nash equilibrium in a congestion game is PLS-complete: as hard to find as any object whose existence is guaranteed by a potential function argument. Intuitively, this means that our problem is as hard as finding a local minimum in a traveling salesman problem using local search. This cautions us to expect that MYOPICBESTRESPONSE will be inefficient in the worst case.

5.4.4 Nonatomic congestion games

A nonatomic congestion game is a congestion game that is played by an uncountably infinite number of players. These games are used to model congestion scenarios in which the number of agents is very large, and each agent's effect on the level of congestion is very small. For example, consider modeling traffic congestion in a freeway system.

Definition 5.4.7 (Nonatomic congestion game) A nonatomic congestion game is a tuple (N, µ, R, A, ρ, c), where

• N = {1, . . . , n} is a set of types of players;

• µ = (µ_1, . . . , µ_n); for each i ∈ N there is a continuum of players represented by the interval [0, µ_i];

• R is a set of k resources;

• A = (A_1, . . . , A_n), where A_i ⊆ 2^R \ {∅} is the set of actions for agents of type i;

• ρ = (ρ_1, . . . , ρ_n), where for each i ∈ N, ρ_i : A_i × R → ℝ_+ denotes the amount of congestion contributed to a given resource r ∈ R by players of type i selecting a given action a_i ∈ A_i; and

• c = (c_1, . . . , c_k), where c_r : ℝ_+ → ℝ is a cost function for resource r ∈ R, and c_r is nonnegative, continuous, and nondecreasing.

To simplify notation, assume that A_1, . . . , A_n are disjoint; denote their union as S. An action distribution s indicates how many players choose each action; by s(a_i) we denote the element of s that corresponds to the measure of the set of players of type i who select action a_i ∈ A_i. An action distribution s must have the properties that all entries are nonnegative real numbers and that Σ_{a_i ∈ A_i} s(a_i) = µ_i. Note that ρ_i(a_i, r) = 0 when r ∉ a_i. Overloading notation, we write as s_r the amount of congestion induced on resource r ∈ R by action distribution s:

s_r = Σ_{i ∈ N} Σ_{a_i ∈ A_i} ρ_i(a_i, r) s(a_i).

We can now express the utility function. As in (atomic) congestion games, all agents have the same utility function, and the function depends only on how many agents choose each action rather than on their identities. By c_{a_i}(s) we denote the cost, under an action distribution s, to agents of type i who choose action a_i. Then

c_{a_i}(s) = Σ_{r ∈ a_i} ρ_i(a_i, r) c_r(s_r),


and so we have u_i(a_i, s) = −c_{a_i}(s). Finally, we can define the social cost of an action distribution as the total cost borne by all the agents,

C(s) = Σ_{i ∈ N} Σ_{a_i ∈ A_i} s(a_i) c_{a_i}(s).
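The quantities s_r, c_{a_i}(s), and C(s) are easy to evaluate once a game is written down; the Python fragment below does so for a toy single-type example whose traffic rate, actions, and cost functions are our own illustrative choices.

    # One player type with mass mu = 1.0 and two single-resource actions (illustrative).
    actions = {"top": ["r_top"], "bottom": ["r_bottom"]}            # action -> resources used
    rho = {a: {r: 1.0 for r in rs} for a, rs in actions.items()}    # unit congestion contribution
    cost = {"r_top": (lambda x: 2.0), "r_bottom": (lambda x: x ** 2)}

    def congestion(r, s):
        """s_r: total congestion on resource r under action distribution s."""
        return sum(rho[a].get(r, 0.0) * s[a] for a in s)

    def action_cost(a, s):
        """c_a(s): cost borne by agents choosing action a under distribution s."""
        return sum(rho[a][r] * cost[r](congestion(r, s)) for r in actions[a])

    def social_cost(s):
        """C(s): total cost borne by all the agents."""
        return sum(s[a] * action_cost(a, s) for a in s)

    s = {"top": 0.5, "bottom": 0.5}   # half the traffic on each action
    print(social_cost(s))             # 1.125 for these particular cost functions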

Despite the fact that we have an uncountably infinite number of agents, we can still define a Nash equilibrium in the usual way.

Definition 5.4.8 (Pure-strategy Nash equilibrium of a nonatomic congestion game) An action distribution s arises in a pure-strategy equilibrium of a nonatomic congestion game if for each player type i ∈ N and each pair of actions a_1, a_2 ∈ A_i with s(a_1) > 0, u_i(a_1, s) ≥ u_i(a_2, s) (and hence c_{a_1}(s) ≤ c_{a_2}(s)).

A couple of warnings are in order. First, the attentive reader will have noticed that we have glossed over the difference between actions and strategies. This is to simplify notation, and because we will only be concerned with pure-strategy equilibria. We do note that results exist concerning mixed-strategy equilibria of nonatomic congestion games; see the references cited at the end of the chapter. Second, we say only that an action distribution arises in an equilibrium because an action distribution does not identify the action taken by every individual agent, and hence cannot constitute an equilibrium. Nevertheless, from this point on we will ignore this distinction.

We can now state some properties of nonatomic congestion games.

Theorem 5.4.9 Every nonatomic congestion game has a pure-strategy Nash equilibrium.

Furthermore, limiting ourselves to considering only pure-strategy equilibria is in some sense not restrictive.

Theorem 5.4.10 All equilibria of a nonatomic congestion game have equal social cost.

Intuitively, because the players are nonatomic, any mixed-strategy equilibrium corresponds to an "equivalent" pure-strategy equilibrium where the number of agents playing a given action is the expected number under the original equilibrium.

5.4.5 Selfish routing and the price of anarchy

Selfish routing is a model of how self-interested agents would route traffic through a congested network. This model was studied as early as 1920, long before game theory developed as a field. Today, we can understand these problems as nonatomic congestion games.

Defining selfish routing

First, let us formally define the problem. Let G = (V, E) be a directed graph having n source-sink pairs (s_1, t_1), . . . , (s_n, t_n). Some volume of traffic must be routed from each source to each sink. For a given source-sink pair (s_i, t_i), let P_i denote the set of simple paths from s_i to t_i. We assume that P_i ≠ ∅ for all i; it is permitted for there to be multiple "parallel" edges between the same pair of nodes in V, and for paths from P_i and P_j (j ≠ i) to share edges. Let µ ∈ ℝ^n_+ denote a vector of traffic rates; µ_i denotes the amount of traffic that must be routed from s_i to t_i. Finally, every edge e ∈ E is associated with a cost function c_e : ℝ_+ → ℝ (think of it as an amount of delay) that can depend on the amount of traffic carried by the edge. The problem in selfish routing is to determine how the given traffic rates will lead traffic to flow along each edge, assuming that agents are selfish and will direct their traffic to minimize the sum of their own costs.

Selfish routing problems can be encoded as nonatomic congestion games as follows:

• N is the set of source-sink pairs;

• µ is the vector of traffic rates;

• R is the set of edges E;

• A_i is the set of paths P_i from s_i to t_i;

• ρ_i is always 1; and

• c_r is the edge cost function c_e.

The price of anarchy

From the above reduction to nonatomic congestion games and from Theorems 5.4.9 and 5.4.10 we can conclude that every selfish routing problem has at least one pure-strategy Nash equilibrium,^9 and that all of a selfish routing problem's equilibria have equal social cost. These properties allow us to ask an interesting question: how similar is the optimal social cost to the social cost under an equilibrium action distribution?

Definition 5.4.11 (Price of anarchy) The price of anarchy of a nonatomic congestion game (N, µ, R, A, ρ, c) having equilibrium s and social-cost-minimizing action distribution s* is defined as C(s)/C(s*), unless C(s*) = 0, in which case the price of anarchy is defined to be 1.

Intuitively, the price of anarchy is the proportion of additional social cost that is incurred because of agents' self-interested behavior. When this ratio is close to 1 for a selfish routing problem, one can conclude that the agents are routing traffic about as well as possible, given the traffic rates and network structure. When this ratio is large, however, the agents' selfish behavior is causing significantly suboptimal network performance. In this latter case one might want to seek ways of changing either the network or the agents' behavior in order to reduce the social cost.

To gain a better understanding of the price of anarchy, and to lay the groundwork for some theoretical results, consider the examples in Figure 5.14.

9. In the selfish routing literature these equilibria are known as Wardrop equilibria, after the author who first proposed their use. For consistency we avoid that term here.


(a) Linear version: two parallel edges from s to t, with cost functions c(x) = 1 and c(x) = x.
(b) Nonlinear version: two parallel edges from s to t, with cost functions c(x) = 1 and c(x) = x^p.

Figure 5.14 Pigou's example: a selfish routing problem with an interesting price of anarchy.

In this example there is only one type of agent (n = 1) and the rate of traffic is 1 (µ_1 = 1). There are two paths from s to t, one of which is relatively slow but immune to congestion, and the other of which has congestion-dependent cost.

Consider first the linear version of the problem given in Figure 5.14(a). It is not hard to see that the Nash equilibrium is for all agents to choose the lower edge; indeed, this is a Nash equilibrium in dominant strategies. The social cost of this Nash equilibrium is 1. Consider what would happen if we required half of the agents to choose the upper edge, and the other half of the agents to choose the lower edge. In this case the social cost would be 3/4, because half the agents would continue to pay a cost of 1 while half the agents would now pay a cost of only 1/2. It is easy to show that this is the smallest social cost that can be achieved in this example, meaning that the price of anarchy here is 4/3.

Now consider the nonlinear problem given in Figure 5.14(b), where p is some large value. Again in the Nash equilibrium all agents will choose the lower edge, and again the social cost of this equilibrium is 1. Now, however, if even ε of the agents could be compelled to choose the upper edge, the social cost would fall to ε + (1 − ε)^p, which approaches 0 as ε → 0 and p → ∞. Thus we can see that the price of anarchy tends to infinity in the nonlinear version of Pigou's example as p grows.
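To make these numbers concrete, the Python sketch below grid-searches over the fraction of traffic assigned to the congestion-dependent edge in Pigou's example and compares the equilibrium social cost (all traffic on that edge) with the best split found; the grid resolution is an arbitrary choice of ours.

    def social_cost(x, p):
        """Cost of routing fraction x on the c(x) = x^p edge and 1 - x on the c(x) = 1 edge."""
        return x * x ** p + (1 - x) * 1.0

    def best_split(p, steps=10_000):
        return min(social_cost(i / steps, p) for i in range(steps + 1))

    for p in (1, 2, 8, 100):
        eq = social_cost(1.0, p)      # in equilibrium everyone uses the congestible edge
        opt = best_split(p)           # (approximately) optimal routing
        print(p, eq, round(opt, 4), round(eq / opt, 2))
    # For p = 1 the ratio is 4/3; it grows without bound as p increases.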

Bounding the price of anarchy

These examples illustrate that the price of anarchy is unbounded for unrestricted cost functions. On the other hand, it turns out to be possible to offer bounds in the case where cost functions are restricted to a particular set C. First, we must define the so-called Pigou bound:

α(C) = sup_{c ∈ C} sup_{x,r ≥ 0}  r · c(r) / ( x · c(x) + (r − x) c(r) ).

When α(C) evaluates to 0/0, we define it to be 1. We can now state a surprisingly strong result.

Theorem 5.4.12 The price of anarchy of a selfish routing problem whose cost functions are taken from the set C is never more than α(C).

Observe that Theorem 5.4.12 makes a very broad statement, bounding a selfish routing problem's price of anarchy regardless of network structure and for any given family of cost functions. Because α appears difficult to evaluate, one might find it hard to get excited about this result. However, α can be evaluated for a variety of interesting sets of cost functions. For example, when C is the set of linear functions ax + b with a, b ≥ 0, α(C) = 4/3. Indeed, α(C) takes the same value when C is the set of all concave functions. This means that the bound from Theorem 5.4.12 is tight for this set of functions: Pigou's linear example from Figure 5.14(a) uses only concave cost functions and we have already shown that this problem has a price of anarchy of precisely 4/3. The linear version of Pigou's example thus serves as a worst case for the price of anarchy among all selfish routing problems with concave cost functions. Because the price of anarchy is relatively close to 1 for networks with concave edge costs, this result indicates that centralized control of traffic offers limited benefit in this case.

What about other families of cost functions, such as polynomials with nonnegative coefficients and bounded degree? It turns out that the Pigou bound is also tight for this family, and that the nonlinear variant of Pigou's example offers the worst-possible price of anarchy in this case (where p is the bound on the polynomials' degree). For this family α(C) = [1 − p · (p + 1)^{−(p+1)/p}]^{−1}. To give some examples, this means that the price of anarchy is about 1.6 for p = 2, about 2.2 for p = 4, about 18 for p = 100 and, as we saw earlier, unbounded as p → ∞.
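A quick Python check of these values (our own verification, not part of the text):

    def pigou_bound_poly(p):
        """alpha(C) for polynomials of degree at most p with nonnegative coefficients."""
        return 1.0 / (1.0 - p * (p + 1) ** (-(p + 1) / p))

    for p in (1, 2, 4, 100):
        print(p, round(pigou_bound_poly(p), 2))   # approximately 1.33, 1.63, 2.15, and 18.3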

Results also exist bounding the price of anarchy for general nonatomic congestion games. It is beyond the scope of this section to state these results formally, but we note that they are qualitatively similar to the results given above. More information can be found in the references cited in the chapter notes.

Reducing the social cost of selfish routing

When the equilibrium social cost is undesirably high, a network operator might want to intervene in some way in order to reduce it. First, we give an example to show that such interventions are possible, known as Braess' paradox.

(a) Original network: two two-edge paths from s to t through intermediate nodes v and w, each path containing one edge with cost c(x) = x and one with cost c(x) = 1, plus a zero-cost edge from v to w.
(b) After edge removal: the same network with the zero-cost edge v-w removed.

Figure 5.15 Braess' paradox: removing an edge that has zero cost can improve social welfare.

Consider first the example in Figure 5.15(a). This selfish routing problem is essentially a more complicated version of the linear version of Pigou's example from Figure 5.14(a). Again n = 1 and µ_1 = 1. Agents have a dominant strategy of choosing the path s-v-w-t, and so in equilibrium all traffic will flow along this path. The social cost in equilibrium is therefore 1. Minimal social cost is achieved by having half of the agents choose the path s-v-t and having the other half of the agents choose the path s-w-t; the social cost in this case is 3/4. Like the linear version of Pigou's example, therefore, the price of anarchy is 4/3.

The interesting thing about this new example is the role played by the edge v-w. One might intuitively believe that zero-cost edges can only help in routing problems, because they provide agents with a costless way of routing traffic from one node to another. At worst, one might reason, such edges would be ignored. However, this intuition is wrong. Consider the network in Figure 5.15(b). This network was constructed from the network in Figure 5.15(a) by removing the zero-cost edge v-w. In this modified problem agents no longer have a dominant strategy; the equilibrium is for half of them to choose each path. This is also the optimal action distribution, and hence the price of anarchy in this case is 1. We can now see the (apparent) paradox: removing even a zero-cost edge can transform a selfish routing problem from the very worst (the highest price of anarchy possible given its family of cost functions) to the very best (a network in which selfish agents will choose to route themselves optimally).

A network operator facing a high price of anarchy might therefore want to remove one or more edges in order to improve the network's social cost in equilibrium. Unfortunately, however, the problem of determining which edges to remove is computationally hard.

Theorem 5.4.13 It is NP-complete to determine whether there exists any set of edges whose removal from a selfish routing problem would reduce the social cost in equilibrium.

In particular, this result implies that identifying the optimal set of edges to remove from a selfish routing problem in order to minimize the social cost in equilibrium is also NP-complete.

Of course, it is always possible to reduce a network's social cost in equilibrium by reducing all of the edge costs. (This could be done in an electronic network, for example, by installing faster routers.) Interestingly, even in the case where the edge functions are unconstrained and the price of anarchy is therefore unbounded, a relatively modest reduction in edge costs can outperform the imposition of centralized control in the original network.

Theorem 5.4.14 Let Γ be a selfish routing problem, and let Γ' be identical to Γ except that each edge cost c_e(x) is replaced by c'_e(x) = c_e(x/2)/2. The social cost in equilibrium of Γ' is less than or equal to the optimal social cost in Γ.

This result suggests that when it is relatively inexpensive to speed up a network, doing so can have more significant benefits than getting agents to change their behavior.

Finally, we will briefly mention two other methods of reducing social cost in equilibrium. First, in so-called Stackelberg routing a small fraction of agents are routed centrally, and the remaining population of agents is free to choose their own actions. It should already be apparent from the example in Figure 5.14(b) that such an approach can be very effective in certain networks. Second, taxes can be imposed on certain edges in the graph in order to encourage agents to adopt more socially beneficial behavior. The dominant idea here is to charge agents according to "marginal cost pricing": each agent pays the amount that his presence costs the other agents who are using the same edge.^10 Under certain assumptions taxes can be set up in a way that induces optimal action distributions; however, the taxes themselves can be very large. Various papers in the literature elaborate upon and refine both of these ideas.

10. Here we anticipate the idea of mechanism design, introduced in Chapter 9, and especially the VCG mechanism from Section 9.4.

5.5 Computationally-motivated compact representations

So far we have examined game representations that are motivated primarily by the goals of capturing relevant details of real-world domains and of showing that all games expressible in the representation share useful theoretical properties. Many of these representations, especially the normal and extensive forms, suffer from the problem that their encodings of interesting games are so large as to be impractical. For example, when you describe to someone the rules of poker, you do not give them a normal- or extensive-form description; such a description would fill volumes and be almost unintelligible. Instead, you describe the rules of the game in a very compact form, which is possible because of the inherent structure of the game. In this section we will explore some computationally motivated alternative representations that allow certain large games to be compactly described and also make it possible to efficiently find an equilibrium. The first two representations, graphical games and action-graph games, apply to normal-form games, while the following two, multiagent influence diagrams and the GALA language, apply to extensive-form games.

5.5.1 The expected utility problem

We begin by defining a problem that is fundamental to the discussion of computationally motivated compact representations.

Definition 5.5.1 (EXPECTEDUTILITY) Given a game (possibly represented in a compact form), a mixed strategy profile s, and i ∈ N, the EXPECTEDUTILITY problem is to compute EU_i(s), the expected utility of player i.

Our chief interest in this section will be in the computational complexity of the EXPECTEDUTILITY problem for different game representations. When we considered normal-form games, we previously showed (Definition 2.2.7) that EXPECTEDUTILITY can be computed as

EU_i(s) = Σ_{a ∈ A} u_i(a) Π_{j=1}^{n} s_j(a_j).   (5.9)

If we interpret Equation (5.9) as a simple algorithm, we have a way of solving EXPECTEDUTILITY in time exponential in the number of agents. (This is because, assuming for simplicity that all agents have the same number of actions, the size of A is |A_i|^n.) However, since the representation size of a normal-form game is itself exponential in the number of agents (it is O(|A_i|^n)), the problem can in fact be solved in time linear in the size of the representation. Thus EXPECTEDUTILITY does not appear to be very computationally difficult.
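For concreteness, here is the straightforward algorithm suggested by Equation (5.9), written in Python; the tiny matching-pennies-style game used to exercise it is our own example.

    from itertools import product

    def expected_utility(i, strategies, u):
        """Naive EXPECTEDUTILITY: sum over all pure action profiles of u_i(a) times
        the probability of a under the mixed-strategy profile (Equation 5.9)."""
        players = sorted(strategies)
        total = 0.0
        for profile in product(*(strategies[j].keys() for j in players)):
            prob = 1.0
            for j, a_j in zip(players, profile):
                prob *= strategies[j][a_j]
            total += prob * u[profile][i]
        return total

    # Matching-pennies-style example: strategies map each action to its probability.
    s = {0: {"H": 0.5, "T": 0.5}, 1: {"H": 0.5, "T": 0.5}}
    u = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
         ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}
    print(expected_utility(0, s, u))   # 0.0

The loop enumerates every pure action profile, which is exactly the exponential blow-up in n discussed above.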

Interestingly though, as game representations become exponentially more compact than the normal form, it grows more challenging to solve the EXPECTEDUTILITY problem efficiently. This is because our simple algorithm given by Equation (5.9) requires time exponential in the size of such more compact representations. The trick with compact representations, therefore, will not be simply finding some way of representing payoffs compactly; indeed, there are any number of schemes from the compression literature that could achieve this goal. Rather, we will want the additional property that the compactness of the representation can be leveraged by an efficient algorithm for computing EXPECTEDUTILITY.

The first challenge is to ensure that the inputs to EXPECTEDUTILITY can be specified compactly.

Definition 5.5.2 (Polynomial type) A game representation has polynomial type if the number of agents n and the sizes of the pure strategy sets |S_i| are polynomially bounded in the size of the representation.

Representations always have polynomial type when their action sets are specified explicitly. However, some representations, such as the extensive form, implicitly specify pure-strategy spaces that are exponential in the size of the representation, and so do not have polynomial type.

When we combine the polynomial type requirement with a further requirement about EXPECTEDUTILITY being efficiently computable, we obtain the following theorem.

Theorem 5.5.3 If a game representation satisfies the following properties:

1. the representation has polynomial type, and

2. EXPECTEDUTILITY can be computed using an arithmetic binary circuit with polynomial length, with nodes evaluating to constant values or performing addition, subtraction, or multiplication on their inputs,

then the problem of finding a Nash equilibrium in this representation can be reduced to the problem of finding a Nash equilibrium in a two-player normal-form game that is only polynomially larger.

We know from Theorem 3.2.1 in Section 3.2 that the problem of finding a Nash equilibrium in a two-player normal-form game is PPAD-complete. Therefore this theorem implies that if the above conditions hold, the problem of finding a Nash equilibrium for a compact game representation is in PPAD. This should be understood as a positive result: if a game in its compact representation is exponentially smaller than its induced normal form, and if computing an equilibrium for this representation belongs to the same complexity class as computing an equilibrium of a normal-form game, then equilibria can be computed exponentially more quickly using the compact representation.

Observe that the second condition in Theorem 5.5.3 implies that the EXPECTEDUTILITY algorithm takes polynomial time; however, not every polynomial-time algorithm will satisfy this condition. Congestion games are an example of games that do meet the conditions of Theorem 5.5.3. We will see two more such representations in the next sections. What about extensive-form games, which do not have polynomial type: might it be harder to compute their Nash equilibria? Luckily we can use behavioral strategies, which can be represented linearly in the size of the game tree. Then we obtain the following result.

Theorem 5.5.4 The problem of computing a Nash equilibrium in behavioral strategies in an extensive-form game can be polynomially reduced to finding a Nash equilibrium in a two-player normal-form game.

This shows that the speedups we achieved by using the sequence form in Section 4.2.3 were not achieved simply because of inefficiency in our algorithms for normal-form games. Instead, there is a fundamental computational benefit to working with extensive-form games, at least when we restrict ourselves to behavioral strategies.

Fast algorithms for solving EXPECTEDUTILITY are useful for more than just demonstrating the worst-case complexity of finding a Nash equilibrium for a game representation. EXPECTEDUTILITY is also a bottleneck step in several practical algorithms for computing Nash equilibria, such as the Govindan-Wilson algorithm and simplicial subdivision methods (see Section 3.3). Plugging a fast method for solving EXPECTEDUTILITY into one of these algorithms offers a simple way of more quickly computing a Nash equilibrium of a compactly represented game.

The complexity of the EXPECTEDUTILITY problem is also relevant to the computation of solution concepts other than the Nash equilibrium.

Theorem 5.5.5 If a game representation has polynomial type and has a polynomial algorithm for computing EXPECTEDUTILITY, then a correlated equilibrium can be computed in polynomial time.

The attentive reader may recall that we have already showed (in Section 3.8) that correlated equilibria can be identified in polynomial time by solving a linear program (Figure 3.19). Thus, Theorem 5.5.5 may not seem very interesting. The catch, as with expected utility, is that while this LP has size polynomial in the size of the normal form, its size would be exponential in the size of many compact representations. Specifically, there is one variable in the linear program for each joint action profile, and so overall the linear program will have size exponential in any representation for which the simple EXPECTEDUTILITY algorithm discussed above is inadequate. Indeed, in these cases even representing a correlated equilibrium using these probabilities of action profiles would be exponential. Theorem 5.5.5 is thus a much deeper result than it may first seem. Its proof begins by showing that there exists a correlated equilibrium of every compactly represented game that can be written as the mixture of a polynomial number of product distributions, where a product distribution is a joint probability distribution over action profiles arising from each player independently randomizing over his actions (i.e., adopting a mixed strategy profile). Since the theorem requires that the game representation has polynomial type, each of these product distributions can be compactly represented. Thus a polynomial mixture of product distributions can also be represented polynomially. The rest of the proof appeals to linear programming duality and to properties of the ellipsoid algorithm.


5.5.2 Graphical games

Graphical games are a compact representation of normal-form games that use graphical models to capture the payoff independence structure of the game. Intuitively, a player's payoff matrix can be written compactly if his payoff is affected by only a subset of the other players.

Let us begin with an example. Consider n agents, each of whom has purchased a piece of land alongside a road. Each agent has to decide what to build on his land. His payoff depends on what he builds himself, what is built on the land to either side of his own, and what is built across the road. Intuitively, the payoff relationships in this situation can be understood using the graph shown in Figure 5.16, where each node represents an agent.

Figure 5.16 Graphical game representation of the road example.

Now let us define the representation formally. First, we define a neighborhood relation on a graph: the set of nodes connected to a given node, plus the node itself.

Definition 5.5.6 (Neighborhood relation) For a graph defined on a set of nodes N and edges E, for every i ∈ N define the neighborhood relation ν : N → 2^N as ν(i) = {i} ∪ {j | (j, i) ∈ E}.

Now we can define the graphical game representation itself.

Definition 5.5.7 (Graphical game) A graphical game is a tuple 〈N, E, A, u〉, where

• N is a set of n vertices, representing agents;

• E is a set of undirected edges connecting the nodes N;

• A = (A_1, . . . , A_n), where A_i is the set of actions available to agent i; and

• u = (u_1, . . . , u_n), where u_i : A_{ν(i)} → ℝ and A_{ν(i)} = ×_{j ∈ ν(i)} A_j.

An edge between two vertices in the graph can be interpreted as meaning that the two agents are able to affect each other's payoffs. In other words, whenever two nodes i and j are not connected in the graph, agent i must always receive the same payoff under any action profiles (a_j, a_{-j}) and (a'_j, a_{-j}), for all a_j, a'_j ∈ A_j. Graphical games can represent any game, but of course they are not always compact. The space complexity of the representation is exponential in the size of the largest ν(i). In the example above the size of the largest ν(i) is 4, and this is independent of the total number of agents. As a result, the graphical game representation of the example requires polynomial space in n, while a normal-form representation would require space exponential in n.

The following is sufficient to show that the properties we discussed above in Section 5.5.1 hold for graphical games.

Lemma 5.5.8 The EXPECTEDUTILITY problem can be computed in polynomial time for graphical games, and such an algorithm can be translated to an arithmetic circuit as required by Theorem 5.5.3.
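The idea behind the lemma can be seen in the following Python sketch: because u_i depends only on the actions of ν(i), the sum ranges over neighborhood action profiles rather than over full profiles. The particular neighborhood, strategies, and payoffs used to exercise it are invented for illustration.

    from itertools import product

    def graphical_expected_utility(i, neighborhood, strategies, u_i):
        """EU_i(s) = sum over neighborhood action profiles of u_i(a) * prod of s_j(a_j).
        neighborhood: ordered list of the players in nu(i); u_i is keyed by tuples over it."""
        total = 0.0
        for profile in product(*(strategies[j].keys() for j in neighborhood)):
            prob = 1.0
            for j, a_j in zip(neighborhood, profile):
                prob *= strategies[j][a_j]
            total += prob * u_i[profile]
        return total

    # Player 0's neighborhood is {0, 1}; all other players are irrelevant to u_0.
    nbhd = [0, 1]
    s = {0: {"a": 0.5, "b": 0.5}, 1: {"a": 1.0, "b": 0.0}}
    u0 = {("a", "a"): 2.0, ("a", "b"): 0.0, ("b", "a"): 1.0, ("b", "b"): 0.0}
    print(graphical_expected_utility(0, nbhd, s, u0))   # 1.5

The work grows with the number of neighborhood profiles rather than with |A_1| × ... × |A_n|, which is why bounded-degree graphs admit a polynomial-time computation.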

The way that graphical games capture payoff independence in games is similar to the way that Bayesian networks and Markov random fields capture conditional independence in multivariate probability distributions. It should therefore be unsurprising that many computations on graphical games can be performed efficiently using algorithms similar to those proposed in the graphical models literature. For example, when the graph (N, E) defines a tree, a message-passing algorithm called NASHPROP can compute an ǫ-Nash equilibrium in time polynomial in 1/ǫ and the size of the representation. NASHPROP consists of two phases: a "downstream" pass in which messages are passed from the leaves to the root, and then an "upstream" pass in which messages are passed from the root to the leaves. When the graph is a path, a similar algorithm can find an exact equilibrium in polynomial time.

We may also be interested in finding pure-strategy Nash equilibria. Determining whether a pure-strategy equilibrium exists in a graphical game is NP-complete. However, the problem can be formulated as a Markov random field, and solved using standard probabilistic inference algorithms. In particular, when the graph has constant or log-sized treewidth,^11 the problem can be solved in polynomial time.

Graphical games have also been useful as a theoretical tool. In fact, they are instrumental in the proof of Theorem 3.2.1, which showed that finding a sample Nash equilibrium of a normal-form game is PPAD-complete. Intuitively, graphical games are important to this proof because such games can be constructed that simulate arithmetic circuits in their equilibria.

5.5.3 Action-graph games

Consider a scenario similar to the road example in Section 5.5.2, with one major difference: instead of deciding what to build, here agents need to decide where to build. Suppose each of the n agents is interested in opening a business (say a coffee shop), and can choose to locate in any block along either side of a road. Multiple agents can choose the same block. Agent i's payoff depends on the number of agents who chose the same block as he did, as well as the numbers of agents who chose each of the adjacent blocks of land. This game has an obvious graphical structure, which is illustrated in Figure 5.17. Here nodes correspond to actions, and each edge indicates that an agent who takes one action affects the payoffs of other agents who take some second action.

11. A graph's treewidth is a measure of how close the graph is to a tree. It is defined using the tree decomposition of the graph. Many NP-complete problems on graphs can be solved efficiently when the graph has small treewidth.


[Figure: an action graph whose nodes are the blocks T1 through T8 on one side of the road and B1 through B8 on the other.]

Figure 5.17 Modified road example.

Notice that any pair of agents can potentially affect each other's payoffs by choosing adjacent locations. This means that the graphical game representation of this game is a clique, and its space complexity is the same as that of the normal form (exponential in n). The problem is that graphical games are only compact for games with strict payoff independencies, i.e., where there exist pairs of players who can never (directly) affect each other. This game instead exhibits context-specific independence: whether two agents are able to affect each other's payoffs depends on the actions they choose. The action-graph game (AGG) representation exploits this kind of context-specific independence. Intuitively, this representation is built around the graph structure shown in Figure 5.17. Since this graph has actions rather than agents serving as the nodes, it is referred to as an action graph.

Definition 5.5.9 (Action graph) An action graph is a tuple (𝒜, E), where 𝒜 is a set of nodes corresponding to actions and E is a set of directed edges.

We want to allow for settings where agents have different actions available to them, and hence where an agent's action set is not identical to 𝒜. (For example, no two agents could be able to take the "same" action, or every agent could have the same action set as in Figure 5.17.) We thus define as usual a set of action profiles A = (A_1, . . . , A_n), and then let 𝒜 = ⋃_{i ∈ N} A_i. If two actions have the same name, they will collapse to the same element of 𝒜; otherwise they will correspond to two different elements of 𝒜.

Given an action graph and a set of agents, we can further define a configuration, which is a possible arrangement of agents over nodes in an action graph.

Definition 5.5.10 (Configuration) Given an action graph (𝒜, E) and a set of action profiles A, a configuration c is a tuple of |𝒜| nonnegative integers, where the j-th element c_j is interpreted as the number of agents who chose the j-th action a_j ∈ 𝒜, and where there exists some a ∈ A that would give rise to c. Denote the set of all configurations as C.

Observe that multiple action profiles might give rise to the same configuration, because configurations simply count the number of agents who took each action without worrying about which agent took which action. For example, in the road example all action profiles in which exactly half of the agents take action T1 and exactly half the agents take action B8 give rise to the same configuration. Intuitively, configurations will allow action-graph games to compactly represent anonymity structure: cases where an agent's payoffs depend on the aggregate behavior of other agents, but not on which particular agents take which actions. Recall that we saw such structure in congestion games (Section 5.4).
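A configuration is straightforward to compute from an action profile, as the short Python sketch below illustrates on an invented six-node action graph; it also shows two different profiles collapsing to the same configuration.

    from collections import Counter

    def configuration(action_nodes, profile):
        """Map an action profile (agent -> chosen node) to the tuple c whose j-th entry
        counts the agents who chose the j-th action node."""
        counts = Counter(profile.values())
        return tuple(counts.get(node, 0) for node in action_nodes)

    nodes = ["T1", "T2", "T3", "B1", "B2", "B3"]
    profile_1 = {1: "T1", 2: "T1", 3: "B3", 4: "B3"}
    profile_2 = {1: "B3", 2: "T1", 3: "T1", 4: "B3"}   # same counts, different identities
    print(configuration(nodes, profile_1) == configuration(nodes, profile_2))   # True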

Intuitively, we will use the edges of the action graph to denote context-specific independence relations in the game. Just as we did with graphical games, we will define a utility function that depends on the actions taken in some local neighborhood. As it was for graphical games, the neighborhood ν will be defined by the edges E; indeed, we will use exactly the same definition (Definition 5.5.6). In action-graph games the idea will be that the payoff of a player playing an action j depends only on the configuration over the neighbors of j. We must therefore define notation for such a configuration over a neighborhood. Let C_{ν(j)} denote the set of all restrictions of configurations to the elements corresponding to the neighborhood of j. (That is, each c ∈ C_{ν(j)} is a tuple of length |ν(j)|.) Then u_j, the utility for any agent who takes action j ∈ 𝒜, will be a mapping from C_{ν(j)} to the real numbers.

Summing up, we can state the formal definition of action-graph games as follows.

Definition 5.5.11 An action-graph game (AGG) is a tuple (N, A, (𝒜, E), u), where

• N is the set of agents;

• A = (A_1, . . . , A_n), where A_i is the set of actions available to agent i;

• (𝒜, E) is an action graph, where 𝒜 = ⋃_{i ∈ N} A_i is the set of distinct actions; and

• u = (u_1, . . . , u_{|𝒜|}), where u_j : C_{ν(j)} → ℝ.

Since each utility function is a mapping only from the possible configurations over the neighborhood of a given action, the utility function can be represented concisely. In the road example, since each node has at most four incoming edges, we only need to store O(n^4) numbers for each node, and O(|𝒜| n^4) numbers for the entire game. In general, when the in-degree of the action graph is bounded by a constant, the space complexity of the AGG representation is polynomial in n.

Like graphical games, AGGs are fully expressive. Arbitrary normal-form games can be represented as AGGs with non-overlapping action sets. Graphical games can be encoded in the same way, but with a sparser edge structure. Indeed, the AGG encoding of a graphical game is just as compact as the original graphical game.

Although it is somewhat involved to show why this is true, action-graph games have the theoretical properties we have come to expect from a compact representation.

Theorem 5.5.12 Given an AGG, EXPECTEDUTILITY can be computed in time polynomial in the size of the AGG representation by an algorithm that can be represented as an arithmetic circuit as required by Theorem 5.5.3. In particular, if the in-degree of the action graph is bounded by a constant, the time complexity is polynomial in n.

Finally, the AGG representation can be extended to include function nodes, which are special nodes in the action graph that do not correspond to actions. For each function node p, c_p is defined as a deterministic function of the configuration of its neighbors ν(p). Function nodes can be used to represent a utility function's intermediate parameters, allowing the compact representation of games with additional forms of independence structure. Computationally, when a game with function nodes has the property that each player affects the configuration c independently, EXPECTEDUTILITY can still be computed in polynomial time.

5.5.4 Multiagent influence diagrams

Multiagent influence diagrams (MAIDs) are a generalization of influence diagrams (IDs), a compact representation for decision-theoretic reasoning in the single-agent case. Intuitively, MAIDs can be seen as a combination of graphical games and extensive-form games with chance moves (see Section 5.3). Not all variables (moves by nature) and action nodes depend on all other variables and action nodes, and only the dependencies need to be represented and reasoned about.

We will give a brief overview of MAIDs using the following example. Alice is considering building a patio behind her house, and the patio would be more valuable to her if she could get a clear view of the ocean. Unfortunately, there is a tree in her neighbor Bob's yard that blocks her view. Being somewhat unscrupulous, Alice considers poisoning Bob's tree, which might cause it to become sick. Bob cannot tell whether Alice has poisoned his tree, but he can tell if the tree is getting sick, and he has the option of calling in a tree doctor (at some cost). The attention of a tree doctor reduces the chance that the tree will die during the coming winter. Meanwhile, Alice must make a decision about building her patio before the weather gets too cold. When she makes this decision, she knows whether a tree doctor has come, but she cannot observe the health of the tree directly. A MAID for this scenario is shown in Figure 5.18.

[Figure: a MAID for this scenario, with Alice's decision nodes PoisonTree and BuildPatio, Bob's decision node TreeDoctor, chance nodes TreeSick and TreeDead, and utility nodes Cost, Tree, and View.]

Figure 5.18 A multiagent influence diagram. Nodes for Alice are in light gray, while Bob's are in dark gray.

Multiagent Systems, draft of August 27, 2007

Page 182: Mas kevin b-2007

166 5 Richer Representations: Beyond the Normal and Extensive Forms

Chance variables are represented as ovals, decision variables as rectangles, and utility variables as diamonds. Each variable has a set of parents, which may be chance variables or decision variables. Each chance node is characterized by a conditional probability distribution, which defines a distribution over the variable's domain for each possible instantiation of its parents. Similarly, each utility node records the conditional value for the corresponding agent. If multiple utility nodes exist for the same agent, as they do in this example for Bob, the total utility is simply the sum of the values from each node. Decision variables differ in that their parents (connected by dotted arrows) are the variables that an agent observes when making his decision. This allows us to represent the information sets in a compact way.

For each decision node, the corresponding agent constructs a decision rule, which is a distribution over the domain of the decision variable for each possible instantiation of this node's parents. A strategy for an agent consists of a decision rule for each of his decision nodes. Since a decision node acts as a chance node once its decision rule is set, we can calculate the expected utility of an agent given a strategy profile. As you would expect, a strategy profile is a Nash equilibrium in a MAID if no agent can improve its expected utility by switching to a different set of decision rules.

This example shows several of the advantages of the MAID representation over the equivalent extensive-form game representation. Since there are a total of five chance and decision nodes, and all variables are binary, the game tree would have 32 leaves, each with a value for both agents. In the MAID, however, we only need four values for each agent to fill tables for the utility nodes. Similarly, redundant chance nodes of the game tree are replaced by small conditional probability tables. In general, the space savings of MAIDs can be exponential (although it is possible that this relationship is reversed if the game tree is sufficiently asymmetric).

The most important advantage of MAIDs is that they allow more efficient algorithms for computing equilibria, as we will informally show for the example. The efficiency of the algorithm comes from exploiting the property of strategic relevance in a way that is related to backward induction in perfect-information games. A decision node D2 is strategically relevant to another decision node D1 if, to optimize the rule at D1, the agent needs to consider the rule at D2. We omit a formal definition of strategic relevance, but point out that it can be computed in polynomial time.

No decision nodes are strategically relevant to BuildPatio for Alice, because she observes both of the decision nodes (PoisonTree and TreeDoctor) that could affect her utility before she has to make this decision. Thus, when finding an equilibrium, we can optimize this decision rule independently of the others and effectively convert it into a chance node. Next, we observe that PoisonTree is not strategically relevant to TreeDoctor, because any influence that PoisonTree has on a utility node for Bob must go through TreeSick, which is a parent of TreeDoctor. After optimizing this decision node, we can optimize PoisonTree by itself, yielding an equilibrium strategy profile.

Obviously not all games allow such a convenient decomposition. However, as long as there exists some subset of the decision nodes such that no node outside of this subset is relevant to any node in the subset, then we can achieve some computational savings by jointly optimizing the decision rules for this subset before tackling the rest of the problem. Using this general idea, an equilibrium can often be found exponentially faster than in standard extensive-form games. An efficient algorithm also exists for computing EXPECTEDUTILITY for MAIDs.

Theorem 5.5.13 The EXPECTEDUTILITY problem for MAIDs can be computed in time polynomial in the size of the MAID representation.

Unfortunately the only known algorithm for efficiently solving EXPECTEDUTILITY in MAIDs uses division, and so cannot be directly translated to an arithmetic circuit as required in Theorem 5.5.3, which does not allow division operations. It is unknown whether the problem of finding a Nash equilibrium in a MAID can be reduced to finding a Nash equilibrium in a two-player game. Nevertheless, many of the other applications for computing EXPECTEDUTILITY that we discussed in Section 5.5.1 apply to MAIDs. In particular, the EXPECTEDUTILITY algorithm can be used as a subroutine in Govindan and Wilson's algorithm for computing Nash equilibria in extensive-form games.

5.5.5 GALA

While MAIDs allow us to capture exactly the relevant information needed to make a decision at each point in the game, we still need to explicitly record each choice point of the game. When, instead of modelling a real-world setting, we are modelling a board or card game, this task would be rather cumbersome, if not impossible. The key property of these games that is not being exploited is their repetitive nature—the game alternates between the opponents, whose possible moves are independent of the depth of the game tree and can instead be defined in terms of the current state of the game and an unchanging set of rules. The Prolog-based language GALA exploits this fact to allow concise specifications of large, complex games.

We present the main ideas of the language using the code in Figure 5.19 for an imperfect-information variant of tic-tac-toe. Each player can mark a square with either an 'x' or an 'o', but the opponent sees only the position of the mark, not its type. A player wins if his move creates a line of the same type of mark.

    game(blind_tic_tac_toe,                                           (1)
      [ players : [a,b],                                              (2)
        objects : [grid_board : array('$size', '$size')],             (3)
        params  : [size],                                             (4)
        flow    : (take_turns(mark, unless(full), until(win))),       (5)
        mark    : (choose('$player', (X, Y, Mark),                    (6)
                      (empty(X,Y), member(Mark, [x,o]))),             (7)
                   reveal('$opponent', (X,Y)),                        (8)
                   place((X,Y), Mark)),                               (9)
        full    : ( \+(empty(_,_)) -> outcome(draw)),                (10)
        win     : (straight_line(_, _, length = 3,                   (11)
                      contains(Mark)) -> outcome(wins('$player')))]). (12)

Figure 5.19 A GALA description of blind tic-tac-toe.

Lines 3 and 5 define the central components of the representation—the object grid_board that records all marks, and the flow of the game, which is defined as two players alternating moves until either the board is full or one of them wins the game. Lines 6–12 then provide the definitions of the terms used in line 5. Three of the functions found in these lines are particularly important because of their relation to the corresponding extensive-form game: choose (line 6) defines the available actions at each node, reveal (line 8) determines the information sets of the players, and outcome (lines 10 and 12) defines the payoffs at the leaves.

Reading through the code in Figure 5.19, one finds not only primitives like array, but also several high-level modules, like straight_line, that are not defined. The GALA language contains many such predicates, built up from primitives, that were added to handle conditions common to games people play. For example, the high-level predicate straight_line is defined using the intermediate-level predicate chain, which in turn is defined to take a predicate and a set as input and return true if the predicate holds for the entire set. The idea behind intermediate-level predicates is that they make it easier to define the high-level predicates specific to a game. For example, chain can be used in poker to define a flush.

On top of the language, the GALA system was implemented to take a description of a game in the GALA language, generate the corresponding game tree, and then solve the game using the sequence form of the game. In fact the representation uses the sequence form, defined in Section 4.2.3.

Since we lose the space savings of the GALA language when we actually solve the game, the main advantage of the language is the ease with which it allows a human to describe a game to the program that will solve it.

5.6 History and references

Some of the earliest and most influential work on repeated games is [Luce & Raiffa, 1957; Aumann, 1959]. Of particular note is that the former provided the main ideas behind the folk theorem, and that the latter explored the theoretical differences between finitely and infinitely repeated games. Our own proof of the folk theorem is based on [Osborne & Rubinstein, 1994]. For an extensive discussion of the Tit-for-Tat strategy in the repeated Prisoner's Dilemma game, and in particular its strong performance in a tournament of computer programs, see [Axelrod, 1984a].

While most game theory textbooks have material on so-called "bounded rationality," the most comprehensive repository of results in the area was assembled by Rubinstein [1998]. Some of the specific references are as follows. Theorem 5.1.8 is due to Neyman [1985], while Theorem 5.1.9 is due to Papadimitriou and Yannakakis [1994]. Theorem 5.1.10 is due to Gilboa [1988], and Theorem 5.1.11 is due to Ben-Porath [1990]. Theorem 5.1.12 is due to Papadimitriou [1992]. Finally, Theorems 5.1.13 and 5.1.14 are due to [Nachbar & Zame, 1996].

In 1953 Shapley discussed stochastic games [Shapley, 1953]. Bayesian games were introduced by Harsanyi [1967-8].

Congestion games were first defined by Rosenthal [1973]; later potential games were introduced by Monderer and Shapley [1996a] and were shown to be equivalent to congestion games (up to isomorphism). The PLS-completeness result is due to Fabrikant et al. [2004]. Nonatomic congestion games are due to Schmeidler [1973]. Selfish routing was first studied as early as 1920 [Pigou, 1920; Beckmann et al., 1956]. Pigou's example comes from the former reference; Braess' paradox was introduced in [Braess, 1968]. The Wardrop equilibrium is due to Wardrop [1952]. The concept of the price of anarchy is due to Koutsoupias and Papadimitriou [1999]. Most of the results in Section 5.4.5 are due to Roughgarden and his coauthors; see his recent book [Roughgarden, 2005]. Similar results have also been shown for broader classes of nonatomic congestion games; see [Roughgarden & Tardos, 2004; Correa et al., 2005].

Theorems 5.5.3 and 5.5.4 are due to Daskalakis et al. [2006a]. Theorem 5.5.5 is due to Papadimitriou [2005]. Graphical games were introduced in [Kearns et al., 2001]. The problem of finding pure Nash equilibria in graphical games was analyzed in [Gottlob et al., 2003] and [Daskalakis & Papadimitriou, 2006]. Action graph games were defined in [Bhat & Leyton-Brown, 2004] and extended in [Jiang & Leyton-Brown, 2006]. Multiagent influence diagrams (MAIDs) were introduced in [Koller & Milch, 2001], which also contained the running example we used for that section. A related notion of game networks was concurrently introduced by LaMura [2000]. Theorem 5.5.13 is due to Blum et al. [2003]. GALA is described in [Koller & Pfeffer, 1995], which also contained the sample code for the tic-tac-toe example.


6 Learning and Teaching

The capacity to learn is a key facet of intelligent behavior, and it is no surprise that much attention has been devoted to the subject in the various disciplines that study intelligence and rationality. We will concentrate on techniques drawn primarily from two such disciplines—artificial intelligence and game theory—although those in turn borrow from a variety of disciplines, including control theory, statistics, psychology and biology, to name a few. We start with an informal discussion of the various subtle aspects of learning in multiagent systems, and then discuss representative theories in this area.

6.1 Why the subject of ‘learning’ is complex

The subject matter of this chapter is fraught with subtleties, and so we begin with an informal discussion of the area. We address three issues—the interaction between learning and teaching, the settings in which learning takes place and what constitutes learning in those settings, and the yardsticks by which to measure this or that theory of learning in multiagent systems.

6.1.1 The interaction between learning and teaching

Most work in artificial intelligence concerns the learning performed by an individual agent. In that setting the goal is to design an agent that learns to function successfully in an environment that is unknown and potentially also changes as the agent is learning. A broad range of techniques have been developed, and learning rules have become quite sophisticated.

In a multiagent setting, however, an additional complication arises, since the environment contains (or perhaps consists entirely of) other agents. The problem is not only that the other agents' learning will change the environment for our protagonist agent—dynamic environments feature already in the single-agent case—but that these changes will depend in part on the actions of the protagonist agent. That is, the learning of the other agents will be impacted by the learning performed by our protagonist.

The simultaneous learning of the agents means that every learning rule leads to a dynamical system, and sometimes even very simple learning rules can lead to complex global behaviors of the system. Beyond this mathematical fact, however, lies a conceptual one. In the context of multiagent systems one cannot separate the phenomenon of learning from that of teaching; when choosing a course of action, an agent must take into account not only what it has learned from other agents' past behavior, but also how it wishes to influence their future behavior.

The following example illustrates this point. Consider the infinitely repeated game with limit average reward (that is, where the payoff to a given agent is the limit average of his payoffs in the individual stage games), in which the stage game is the normal-form game shown in Figure 6.1.

             L       R
    T      1, 0    3, 2
    B      2, 1    4, 0

Figure 6.1 Stackelberg game: player 1 must teach player 2.

First note that player 1 (the row player) has a dominant strategy, namely B. Also note that (B,L) is the unique Nash equilibrium of the game. Indeed, if player 1 were to play B repeatedly, it is reasonable to expect that player 2 would always respond with L. Of course, if player 1 were to choose T instead, then player 2's best response would be R, yielding player 1 a payoff of 3, which is greater than player 1's Nash equilibrium payoff. In a single-stage game it would be hard for player 1 to convince player 2 that it (player 1) will play T, since it is a strictly dominated strategy.1 However, in a repeated-game setting agent 1 has an opportunity to put his payoff where his mouth is, and adopt the role of a teacher. That is, player 1 could repeatedly play T; presumably, after a while player 2, if she has any sense at all, would get the message and start responding with R.
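To see how quickly the "message" gets through, the following is a minimal sketch (not from the book) of a stubborn teacher who always plays T against a hypothetical opponent who best-responds to the empirical frequency of the teacher's past actions. The payoffs follow Figure 6.1; the opponent's prior counts are an illustrative assumption.

    # Teaching in the Stackelberg game of Figure 6.1 (illustrative sketch).
    U2 = {("T", "L"): 0, ("T", "R"): 2, ("B", "L"): 1, ("B", "R"): 0}   # player 2's payoffs

    counts = {"T": 0.5, "B": 1.5}        # player 2's assumed prior beliefs about player 1
    for t in range(10):
        p_T = counts["T"] / (counts["T"] + counts["B"])
        # player 2 best-responds to the assessed mixed strategy (p_T, 1 - p_T)
        ev = {a2: p_T * U2[("T", a2)] + (1 - p_T) * U2[("B", a2)] for a2 in ("L", "R")}
        a2 = max(ev, key=ev.get)
        a1 = "T"                         # the teacher stubbornly plays T
        counts[a1] += 1
        print(t, a1, a2)

Under these assumptions player 2 switches from L to R after only a couple of rounds, after which the teacher earns 3 in every round rather than the stage-game equilibrium payoff of 2.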

In the above example it is pretty clear who the natural candidate for adopting the teacher role is. But consider now the repetition of the coordination game, reproduced in Figure 6.2. In this case, either player could play the teacher with equal success.

              Left    Right
    Left        1       0
    Right       0       1

Figure 6.2 Who's the teacher here?

However, if both decide to play teacher and happen to select uncoordinated actions (Left,Right) or (Right,Left) then the players will receive a payoff of zero forever.2 Is there a learning rule that will enable them to coordinate without an external designation of a teacher?

1. See related discussion on signaling and cheap talk in Chapter 7.

6.1.2 What constitutes learning?

In the examples above the setting was a repeated game. We consider this a 'learning' setting because of the temporal nature of the domain, and the regularity across time (at each time the same players are involved, and they play the same game as before). This allows us to consider strategies in which future action is selected based on the experience gained so far. When discussing repeated games in Chapter 5 we mentioned a few simple strategies. For example, in the context of the repeated Prisoner's Dilemma, we mentioned the Tit-for-Tat (TfT) and trigger strategies. These, in particular TfT, can be viewed as very rudimentary forms of learning strategies. But one can imagine much more complex strategies, in which an agent's next choice depends on the history of play in more sophisticated ways. For example, the agent could guess that the frequency of actions played by his opponent in the past might be his current mixed strategy, and play a best response to that mixed strategy (as we shall see, this basic learning rule is called fictitious play).

Repeated games are not the only context in which learning takes place. Certainly the more general category of stochastic games (also discussed in Chapter 5) is also one in which regularity across time allows one to meaningfully discuss learning. Indeed, most of the techniques discussed in the context of repeated games are applicable more generally in stochastic games, though the specific results obtained for repeated games do not always generalize.

In both cases—repeated and stochastic games—there are additional aspects of the settings worth discussing. These have to do with whether the (e.g., repeated) game is commonly known by the players. If it is, any 'learning' that takes place is only about the strategies employed by the other. If the game is not known, the agent can in addition learn about the structure of the game itself. For example, in a stochastic game setting, the agent may start out not knowing the payoff functions at a given stage game or the transition probabilities, but learn those over time in the course of playing the game. It is most interesting to consider the case in which the game being played is unknown; in this case there is a genuine process of discovery going on. Some of the remarkable results are that, with certain learning strategies, agents can sometimes converge to an equilibrium of the game even without knowing the game being played. Additionally, there is the question of whether the game is observable: do the players see each other's actions? Each other's payoffs? (In the case of a known game, of course, the actions also reveal the payoffs.)

While repeated and stochastic games constitute the main setting in which we will investigate learning, there are other settings as well. Chief among them are models of large populations. These models, which were largely inspired by evolutionary models in biology, are superficially quite different from the setting of repeated or stochastic games. Unlike the latter, which involve a small number of players, the evolutionary models consist of a large number of players, who repeatedly play a given game among themselves (for example, pairwise in the case of two-player games). A closer look, however, shows that these models are in fact closely related to the models of repeated games. We discuss this further in the last section of this chapter.

2. This is reminiscent of the "sidewalk shuffle", that awkward process of trying to get by the person walking towards you while he is doing the same thing, the result being that you keep blocking each other.

              Yield    Dare
    Yield     2, 2     1, 3
    Dare      3, 1     0, 0

Figure 6.3 The game of chicken.

6.1.3 If learning is the answer, what is the question?

It is very important to be clear on why we study learning in multiagent systems, and how we judge whether a given learning theory is successful or not. These might seem like trivial questions, but in fact the answers are not obvious, and not unique.

First, note that we do not judge 'learning' per se, but rather complete strategies that involve a learning component. That is, the strategies tell us how to act, not only how to accumulate beliefs. One consequence of this is that learning in the sense of "accumulated knowledge" is not always beneficial. In the abstract, accumulating knowledge never hurts, since one can always ignore what has been learned. But when one pre-commits to a particular strategy of how to act upon the accumulated knowledge, sometimes less is more.

This point is related to the inseparability of learning from teaching, discussed earlier. For example, consider a protagonist agent planning to play an infinitely repeated game of chicken. In the presence of any opponent who attempts to learn the protagonist agent's strategy and play a best response, an optimal strategy for an agent is to play the stationary policy of always daring; this is the "watch out I'm crazy" policy. The opponent will learn to always yield, a worse outcome for him than if he had never learned anything.3

In the following, when we speak about learning strategies, these should be understood as complete strategies, which involve learning in the strict sense as well as choices of action.

Broadly speaking, we can divide theories of learning in multiagent systems into two categories—descriptive theories and prescriptive theories. Descriptive theories attempt to study the way in which learning takes place in real life—usually by people, but sometimes by other entities such as organizations or certain animal species. The goal here is to show experimentally that a certain model of learning agrees with people's behavior (typically, in laboratory experiments), and then show interesting properties of the formal model.

3. The literary-minded reader may be reminded of the quote from Oscar Wilde's A Woman of No Importance: "[...] the worst tyranny the world has ever known; the tyranny of the weak over the strong. It is the only tyranny that ever lasts." Except here it is the tyranny of the simpleton over the sophisticated.

The ideal descriptive theory would have two properties:

Realism: A good match between the formal theory and the natural phenomenon being studied.

Convergence: Interesting behavioral properties of the formal theory, in particular convergence of the strategy profile being played to some equilibrium of the game being played.

One approach to demonstrating realism is to apply the experimental methodology of the social sciences. While we will not focus on this area, there are several good examples of this approach in economics and game theory. But there can be other supports for studying a given learning process. For example, to the extent that one accepts the Bayesian model as at least an idealized model of human decision making, this provides support for the model of rational learning which we discuss later.

Convergence properties come in various flavors. Here are some of them:

• The holy grail is to show convergence to stationary strategies which form a Nash equilibrium of the stage game. In fact often this is the hidden motive of the research. It has been noted that game theory is somewhat unusual in having the notion of an equilibrium without associated dynamics that give rise to the equilibrium. Showing that the equilibrium arises naturally would correct this anomaly.

• Convergence to Nash equilibria is a rare occurrence. One alternative is not to require that the agents converge to a strategy profile that is a Nash equilibrium, but that the empirical frequency of play converge to such an equilibrium. For example, consider a repeated game of Matching Pennies. If both agents repeatedly play (H,H) and (T,T) the frequency of their plays will each converge to (.5, .5), the strategy in the unique Nash equilibrium, even though the payoffs obtained will be very different from the equilibrium payoffs.

• An alternative is to give up on Nash equilibrium as the relevant solution concept. One option is to show convergence to a correlated equilibrium of the stage game. This is interesting in a number of ways. First, no-regret learning, which we discuss below, can be shown to converge to correlated equilibria in certain cases. Second, convergence to a correlated equilibrium provides a justification for this concept; the "correlating device" is not an abstract notion, but the prior history of play.

• Another interesting alternative is to give up on convergence to stationary policies, but require that the non-stationary policies converge to an interesting state. In particular, learning strategies that include building an explicit model of the opponents' strategies (as we shall see, these are called model-based learning rules) can be required to converge to correct models of the opponents' strategies.

In contrast with descriptive theories, prescriptive theories ask how agents—people, programs, or otherwise—should learn. As such they are not required to show a match with real-world phenomena. By the same token, their main focus is not on behavioral properties, though they may investigate convergence issues as well. For the most part, we will concentrate on strategic normative theories, in which the individual agents are self-motivated (recall the distinction drawn between non-strategic and strategic theories in the Introduction).

In zero-sum games, and even in repeated or stochastic zero-sum games, it is meaningful to ask whether an agent is learning in an optimal fashion. But in general it is not, since the result depends not only on the learning being done but also on the behavior of the other agents in the system. When all agents adopt the same strategy (for example, they all adopt TfT, or all adopt reinforcement learning, to be discussed shortly), this is called self play. One way to judge learning procedures is based on their performance in self play. However, learning agents can be judged also by how they do in the context of other types of agents; a TfT agent may perform well against another TfT agent, but less well against an agent using reinforcement learning.

No learning procedure is optimal against all possible opponent behaviors. This observation is simply an instance of the general move in game theory away from the notion of 'optimal strategy' and towards 'best response' and equilibrium. Indeed, in the broad sense in which we use the term, a 'learning strategy' is simply a strategy in a game that has a particular structure (namely, the structure of a repeated or stochastic game), and a strategy that happens to have in it a component that is naturally viewed as performing some kind of 'learning' in the standard sense.

So how do we evaluate a prescriptive learning strategy? There are several answers. The first is to adopt the standard game-theoretic stance, give up on judging a strategy in isolation, and instead ask which learning rules are in equilibrium with each other. Note that requiring that repeated-game learning strategies be in equilibrium with each other is very different from the convergence requirements discussed above; those speak about equilibrium in the stage game, not the repeated game. For example, Tit-for-Tat (TfT) is in equilibrium with itself in an infinitely repeated Prisoner's Dilemma game, but does not lead to the repeated Defect play, the only Nash equilibrium of the stage game. This 'equilibrium of learning strategies' approach is not common, but we shall see one example of it later on.

A more modest, but by far more common and perhaps more practical, approach is to ask whether a learning strategy achieves payoffs that are "high enough". This approach is both stronger and weaker than the requirement of 'best response'. Best response requires that the strategy yield the highest possible payoff against a particular strategy of the opponent(s). The new approach considers a broad class of opponents, but makes weaker requirements regarding the payoffs, which are allowed to fall short of best response.

There are several different versions of such high-payoff requirements, each adopting and/or combining different basic requirements. Here is a list of such basic requirements:

Safety. A learning rule is safe if it guarantees the agent at least his maximin payoff, or security value (recall that this is the payoff the agent can guarantee to himself regardless of the strategies adopted by the opponents).

Rationality. A learning rule is rational if, whenever the opponent settles on a stationary strategy of the stage game (that is, the opponent adopts the same mixed strategy each time, regardless of the past), the agent settles on a best response to that strategy.

No-regret or universal consistency. A learning rule is universally consistent, or (equivalently) exhibits no regret, if, loosely speaking, against any set of opponents it yields a payoff that is no less than the payoff the agent could have obtained by playing any one of his pure strategies throughout. We discuss this condition further later in the chapter.
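To make the no-regret requirement concrete, here is a tiny sketch (with made-up payoffs and history, not the book's notation) that computes the external regret of a realized play sequence against the best fixed pure strategy in hindsight; a no-regret rule asks that this quantity grow sublinearly in the length of play.

    # Illustrative regret computation (payoffs and history are assumed examples).
    payoff = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
    history = [("H", "T"), ("T", "T"), ("H", "H")]    # (my action, opponent action) pairs

    realized = sum(payoff[pair] for pair in history)
    best_fixed = max(sum(payoff[(a, their)] for _, their in history) for a in ("H", "T"))
    print(best_fixed - realized)                      # the agent's external regret so far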

Some of these basic requirements are quite strong, and can be weakened in a variety of ways. One way is to allow slight deviations, either in terms of the magnitude of the payoff obtained, or the probability of obtaining it, or both. For example, rather than require optimality, one can require ε, δ-optimality, meaning that with probability of at least 1 − δ the agent's payoff comes within ε of the payoff obtained by the best response. Another way of weakening the requirements is to limit the class of opponents against which the requirement holds. A special case of this restriction is self play, the case in which the agent is playing a copy of itself (note that while the learning strategies are identical, the game being played may not be symmetric). For example, one might require that the learning rule guarantee convergence in self play. Or, as in the case of targeted optimality, which we discuss later, one might require a best response only against a particular class of opponents.

In the next sections, as we discuss several learning rules, we will encounter various versions of these requirements and their combinations. For the most part we will concentrate on repeated, two-player games, though in some cases we will broaden the discussion and discuss stochastic games and games with more than two players. We will cover the following types of learning in order:

• fictitious play

• rational (or Bayesian) learning

• reinforcement learning

• no-regret learning

• targeted learning

• evolutionary and other large-population models of learning

6.2 Fictitious play

Fictitious play is one of the earliest learning rules. It was actually not proposed initially as a learning model at all, but rather as an iterative method for computing Nash equilibria in zero-sum games. It happens to not be a particularly effective way of performing this computation, but since it employs an intuitive learning rule, it is usually viewed as a model of learning, albeit a simplistic one, and subjected mostly to convergence analyses of the sort discussed in the previous section.

Fictitious play is an instance of model-based learning, in which the learner explicitly maintains a belief regarding the opponent's strategy. The structure of such techniques is straightforward:


    Initialize the beliefs regarding the opponent's strategy
    repeat
        Play a best response to the assessed strategy of the opponent
        Observe the opponent's actual play and update the beliefs accordingly

Note that in this scheme the agent is oblivious to the payoffs obtained or obtainable by other agents. We do however assume that the agents know their own payoff matrix in the stage game (that is, the payoff they would get in each action profile, whether or not encountered in the past).

In fictitious play, an agent believes that his opponent is playing the mixed strategy given by the empirical distribution of the opponent's previous actions. That is, the player maintains a tally of the actions the opponent played in previous rounds, and takes the ratio between them to be the opponent's current mixed strategy. That is, if A is the set of the opponent's actions, and for every a ∈ A we let w(a) be the number of times that the opponent has played action a, then the agent assesses the probability of a in the opponent's mixed strategy as

    P(a) = w(a) / ∑_{a′∈A} w(a′)

For example, in a repeated Prisoner's Dilemma game, if the opponent has played C, C, D, C, D in the first five games, before the sixth game he is assumed to play the mixed strategy (0.6, 0.4). Note that we can represent a player's beliefs with either a probability measure, or with the set of counts (w(a1), . . . , w(ak)).

Fictitious play is very sensitive to the players' initial beliefs. This choice, which can be interpreted as action counts that were observed before the start of the game, can have a radical impact on the learning process. Note that one must pick some non-empty prior belief for each agent; the prior beliefs cannot be (0, . . . , 0), since they do not define a meaningful mixed strategy.

Fictitious play is somewhat paradoxical in that each agent assumes a stationary policy of the opponent, yet no agent plays a stationary policy except when the process happens to converge to one. The following example illustrates the operation of fictitious play. Recall the Matching Pennies game from Chapter 2, reproduced in Figure 6.7. Two players are playing a repeated game of Matching Pennies. Each player is using the fictitious play learning rule to update its beliefs and select actions. Player 1 begins the game with the prior belief that player 2 has played heads 1.5 times and tails 2 times. Player 2 begins with the prior belief that player 1 has played heads 2 times and tails 1.5 times. How will the players play?

The first 7 rounds of play of the game are shown in Table 6.1. As you can see, each player ends up alternating back and forth between playing heads and tails. In fact, as the number of rounds tends to infinity, the empirical distribution of the play of each player converges to (0.5, 0.5). If we take this distribution to be the mixed strategy of each player, the play converges to the unique Nash equilibrium of the normal-form stage game, that in which each player plays the mixed strategy (0.5, 0.5).


Round 1’s action 2’s action 1’s beliefs 2’s beliefs0 (1.5,2) (2,1.5)1 T T (1.5,3) (2,2.5)2 T H (2.5,3) (2,3.5)3 T H (3.5,3) (2,4.5)4 H H (4.5,3) (3,4.5)5 H H (5.5,3) (4,4.5)6 H H (6.5,3) (5,4.5)7 H T (6.5,4) (6,4.5)...

......

......

Table 6.1 Fictitious play of a repeated game of matching pennies.

We have not fully specified fictitious play. There exist different versions of fictitious play, which differ on the tie-breaking method used to select an action when there is more than one best response to the particular mixed strategy induced by an agent's beliefs. In general the tie-breaking rule chosen has little effect on the results of fictitious play.

Fictitious play has several nice properties. First, nice connections can be shown to pure-strategy Nash equilibria, when they exist. In the following, we define an action vector to be an absorbing state or steady state of fictitious play if it is the case that whenever that action vector is played at round t it is also played at round t + 1 (and hence in all future rounds as well). The following two theorems establish a tight connection between steady states and pure-strategy Nash equilibria.

Theorem 6.2.1 If an action vector is a strict Nash equilibrium of a stage game, then it is a steady state of fictitious play in the repeated game.

Note that the action vector must be a strict Nash equilibrium, which means that no agent can deviate to another action without strictly decreasing its payoff. We also have a converse result.

Theorem 6.2.2 If a vector of pure strategies is a steady state of fictitious play in the repeated game, then it is a (possibly weak) Nash equilibrium in the stage game.

Of course, one cannot guarantee that fictitious play always converges to a Nash equilibrium, if only because agents can only play pure strategies, and a pure-strategy Nash equilibrium may not exist in a given game. However, while the stage-game strategies may not converge, the empirical distribution of the stage-game strategies over multiple iterations may. And indeed this was the case in the Matching Pennies example above, where the empirical distribution of each player's strategy converged to their mixed strategy in the (unique) Nash equilibrium of the game. The following theorem shows that this was no accident.

Theorem 6.2.3 If the empirical distribution of each player's strategies converges in fictitious play, then it converges to a Nash equilibrium.


This seems like a powerful result. However, notice that although the theorem gives sufficient conditions for the empirical distribution of the players' actions to converge to a mixed-strategy equilibrium, we have not made any claims about the distribution of the particular outcomes played.

To better understand this point, consider the following example. Consider the anti-coordination game shown in Figure 6.4.

             A       B
    A      0, 0    1, 1
    B      1, 1    0, 0

Figure 6.4 The anti-coordination game.

Clearly there are two pure Nash equilibria of this game, (A,B) and (B,A), and one mixed Nash equilibrium, in which each agent mixes A and B with probability 0.5. Either of the two pure-strategy equilibria earns each player a payoff of 1, and the mixed-strategy equilibrium earns each player a payoff of 0.5.

Now let's see what happens when we have agents play the repeated anti-coordination game using fictitious play. Let's assume that the weight function for each player is initialized to (1, 0.5). The play of the first few rounds is shown in Table 6.2.

Round 1’s action 2’s action 1’s beliefs 2’s beliefs0 (1,0.5) (1,0.5)1 B B (1,1.5) (1,1.5)2 A A (2,1.5) (2,1.5)3 B B (2,2.5) (2,2.5)4 A A (3,2.5) (3,2.5)...

......

......

Table 6.2 Fictitious play of a repeated anti-coordination game.

As you can see, the play of each player converges to the mixed strategy (0.5, 0.5), which is the mixed-strategy Nash equilibrium. However, the payoff received by each player is 0, since the players never hit the outcomes with positive payoff. Thus, although the empirical distribution of the strategies converges to the mixed-strategy Nash equilibrium, the players may not receive the expected payoff of the Nash equilibrium, because their actions are mis-correlated.

Finally, the empirical distributions of players' actions need not converge at all. Consider the game in Figure 6.5.4

4. Note that this example, due to Shapley, is a modification of the Rock-Scissors-Paper game; this game is not constant sum.


               Rock    Paper   Scissors
    Rock       0, 0    0, 1    1, 0
    Paper      1, 0    0, 0    0, 1
    Scissors   0, 1    1, 0    0, 0

Figure 6.5 Shapley's almost-rock-paper-scissors game.

The unique Nash equilibrium of this game is for each player to play the mixed strategy (1/3, 1/3, 1/3). However, consider the fictitious play of the game when player 1's weight function has been initialized to (0, 0, 0.5) and player 2's weight function has been initialized to (0, 0.5, 0). The play of this game is shown in Table 6.3.

Round 1’s action 2’s action 1’s beliefs 2’s beliefs0 (0,0,0.5) (0,0.5,0)1 Rock Scissors (0,0,1.5) (1,0.5,0)2 Rock Paper (0,1,1.5) (2,0.5,0)3 Rock Paper (0,2,1.5) (3,0.5,0)4 Scissors Paper (0,3,1.5) (3,0.5,1)5 Scissors Paper (0,1.5,0) (1,0,0.5)...

......

......

Table 6.3 Fictitious play of a repeated game of the almost-rock-paper-scissors game.

Although it is not obvious from these first few rounds, the empirical play of this game does not converge to any fixed distribution.

However, under some conditions we are guaranteed to reach convergence:

Theorem 6.2.4 The following are each a sufficient condition for the empirical frequencies of play to converge in fictitious play:

• The game is zero sum.

• The game is solvable by iterated elimination of strictly dominated strategies.

• The game is a potential game.5

• The game is 2 × N and has generic payoffs.6

5. Actually an even more general condition applies here, that the players have "identical interests," but we will not discuss this further here.
6. Full discussion of genericity in games lies outside the scope of this book, but here is the essential idea, at least for games in normal form. Roughly speaking, a game in normal form is generic if it does not have any interesting property that does not also hold with probability 1 when the payoffs are selected independently from a sufficiently rich distribution (for example, the uniform distribution over a fixed interval). Of course, to make this precise we would need to define 'interesting' and 'sufficiently'. Intuitively, though, this means that the payoffs do not have accidental properties. A game whose payoffs are all distinct is necessarily generic.


Overall, fictitious play is an interesting model of learning in multiagent systems not because it is realistic or because it provides strong guarantees, but because it is very simple to state and gives rise to non-trivial properties. But it is a very limited model; its model of beliefs and belief update is mathematically constraining, and clearly implausible as a model of human learning. There exist various variants of fictitious play that score somewhat better on both fronts. We will mention one of them—called smooth fictitious play—when we discuss no-regret learning methods.

6.3 Rational learning

Rational learning (also sometimes called Bayesian learning) adopts the same general model-based scheme as fictitious play. Unlike fictitious play, however, it allows players to have a much richer set of beliefs about opponents' strategies. First, the set of strategies of the opponent can include repeated-game strategies such as TfT in the Prisoner's Dilemma game, and not only repeated stage-game strategies. Second, the beliefs of each player about his opponent's strategies are captured by any probability distribution over the set of all possible strategies.

As in fictitious play, each player begins the game with some prior belief. After each round, the player uses Bayesian updating to update these beliefs. Let S be the set of the opponent's strategies considered possible by player i, and H be the set of possible histories of the game. Then we can use Bayes' rule to express the probability assigned by player i to the event in which the opponent is playing a particular strategy s ∈ S given the observation of history h ∈ H, as follows.

    Pi(s | h) = Pi(h | s) Pi(s) / ∑_{s′∈S} Pi(h | s′) Pi(s′)

For example, consider two players playing the infinitely repeated Prisoner's Dilemma game, reproduced in Figure 6.6.

             C       D
    C      3, 3    0, 4
    D      4, 0    1, 1

Figure 6.6 Prisoner's dilemma game

Suppose that the support of the prior belief of each player (that is, the strategies of the opponent to which he ascribes non-zero probability) consists of the strategies g1, g2, . . . , g∞, defined as follows. g∞ is the trigger strategy that was presented in Chapter 4. A player using the trigger strategy begins the repeated game by cooperating, and if his opponent defects in any round, he defects in every subsequent round. For T < ∞, gT coincides with g∞ at all histories shorter than T but prescribes unprovoked defection starting from time T on. Following this convention, strategy g0 is the strategy of constant defection.

Suppose furthermore that each player happens indeed to select a best response from among g0, g1, . . . , g∞ (there are of course infinitely many additional best responses outside this set). Thus each round of the game will be played according to some strategy profile (gT1, gT2).

After playing each round of the repeated game, each player performs Bayesian updating. For example, if player i has observed that player j has always cooperated, the Bayesian updating after history ht ∈ H of length t reduces to

    Pi(gT | ht) = 0                                      if T ≤ t
    Pi(gT | ht) = Pi(gT) / ∑_{k=t+1}^{∞} Pi(gk)          if T > t
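For a finite truncation of this family of strategies, the update is straightforward to compute. The following sketch (an illustration, not the book's code; the uniform prior and the cutoff T_MAX = 10 are arbitrary assumptions) computes the posterior over gT after observing t rounds in which the opponent always cooperated.

    # Posterior over trigger-family strategies g_T after an all-cooperate history
    # of length t, following the update above (prior and T_MAX are assumptions).
    T_MAX = 10
    strategies = list(range(T_MAX + 1)) + [float("inf")]        # g_0, ..., g_10 and g_infinity
    prior = {T: 1.0 / len(strategies) for T in strategies}

    def posterior_after_all_cooperate(prior, t):
        surviving = {T: p for T, p in prior.items() if T > t}   # g_T with T <= t are ruled out
        z = sum(surviving.values())
        return {T: surviving.get(T, 0.0) / z for T in prior}

    print(posterior_after_all_cooperate(prior, t=3))            # mass spread evenly over g_4, ..., g_infinity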

Rational learning is a very intuitive model of learning, but its analysis is quite involved. The formal analysis focuses on self-play, that is, on properties of the repeated game in which all agents employ rational learning (though they may start with different priors). We trace the main ideas below, but here are the main highlights:

• Under some conditions, in self play rational learning results in agents having close to correct beliefs about the observable portion of their opponent's strategy.

• Under some conditions, in self play rational learning causes the agents to converge toward a Nash equilibrium with high probability.

• Chief among these 'conditions' is an assumption of absolute continuity, a strong assumption.

In the remainder of this section we discuss these points in more detail, starting with the notion of absolute continuity.

Definition 6.3.1 (Absolute continuity) Let X be a set and let µ, µ′ ∈ Π(X) be probability distributions over X. Then the distribution µ is said to be absolutely continuous with respect to the distribution µ′ iff for every x ⊆ X that is measurable,7 it is the case that if µ(x) > 0 then µ′(x) > 0.

Note that the players’ beliefs and the actual strategies each induce probability dis-tributions over the set of historiesH. Let s = (s1, . . . , sn) be a vector of strategiesused by players1, . . . n respectively. If we assume that these strategies are used bythe players, we can calculate the probability of each history of the game occurring,thus inducing a distribution overH. We can also induce such a distribution with a

make this precise we would need to define ‘interesting’ and ‘sufficiently’. Intuitively, though, this means thatthe payoffs do not have accidental properties. A game whose payoffs are all distinct is necessarily generic.7. Recall that a probability distribution over a domainX does not necessarily give a value for all subsets ofX, but only over someσ-algebra ofX, the collection of measurable sets.

Multiagent Systems, draft of August 27, 2007

Page 200: Mas kevin b-2007

184 6 Learning and Teaching

player’s beliefs about players’ strategies. LetSij be a set of strategies thati believes

possible forj, andP ij ∈ Π(Si

j) be the distribution overSij believed by playeri. Let

Pi = (P i1, . . . , P

in) be the vector of beliefs about the possible strategies of every player.

Now, if playeri assumes that all players (including himself) will play according to hisbeliefs, he can also calculate the probability of each history of the game occurring, thusinducing a distribution overH. The results that follow all require that the distributionover histories induced by the actual strategies is absolutely continuous with respectto the distribution induced by a player’s beliefs; in other words, if there is a positiveprobability of some history given the actual strategies, then the player’s beliefs shouldalso assign the history positive probability (colloquially, it is sometimes said that thebeliefs of the players must contain agrain of truth). Although the results that followgrain of truthare very elegant, it must be said that the absolute continuity assumption is a significantlimitation of the theoretical results associated with rational learning.

In the prisoners’ dilemma example above, it is easy to see that the distribution ofhistories induced by the actual strategies is absolutely continuous with respect to thedistribution predicted by the prior beliefs of the players.All positive probability histo-ries in the game are assigned positive probability by the original beliefs of both play-ers: if the true strategies aregT1

, gT2, players assign positive probability to the history

with cooperation up to timet < min(T1, T2) and defection in all times exceeding themin(T1, T2).

The rational learning model is interesting because it has some very desirable prop-erties. Roughly speaking, players satisfying the assumptions of the rational learningmodel will have beliefs about the play of the other players that converge to the truth,and furthermore, players will in finite time converge to playthat is arbitrarily close tothe Nash equilibrium. Before we can state these results we need to define a measure ofthe similarity of two probability measures.

Definition 6.3.2 (ε-closeness) Given an ε > 0 and two probability measures µ and µ′ on the same space, we say that µ is ε-close to µ′ if there is a measurable set Q satisfying:

• µ(Q) and µ′(Q) are each greater than 1 − ε, and

• For every measurable set A ⊆ Q, we have that

    (1 − ε) µ′(A) ≤ µ(A) ≤ (1 + ε) µ′(A).

Now we can state a result about the accuracy of the beliefs of a player using rational learning.

Theorem 6.3.3 (Rational Learning and Belief Accuracy) Let s be a repeated-game strategy profile for a given N-player game,8 and let P = P1, . . . , PN be a vector of probability distributions over such strategy profiles (Pi is interpreted as player i's beliefs). Let µs and µP be the distributions over infinite game histories induced by the strategy vector s and the belief vector P, respectively. If we have that


• at each round, each player i plays a best-response strategy given his beliefs Pi;

• after each round, each player i updates Pi using Bayesian updating; and

• µs is absolutely continuous with respect to µPi,

then for every ε > 0 and for almost every history in the support of µs (that is, every possible history given the actual strategy profile s), there is a time T such that for all t ≥ T, the play µPi predicted by player i's beliefs is ε-close to the distribution of play µs predicted by the actual strategies.

8. That is, a vector of repeated-game strategies, one for each player.

Thus a player’s beliefs will eventually converge to the truth if he is using Bayesianupdating, is playing a best response strategy, and the play predicted by the other play-ers’ real strategies is absolutely continuous with respectto that predicted by his beliefs.In other words, he will correctly predict the on-path portion of the other players’ strate-gies.

Note that this result doesnot state that players will learn the true strategy beingplayed by their opponents. As stated above, there are an infinite number of possiblestrategies that their opponent could be playing, and each player begins with a priordistribution that assigns positive probability to only some subset of the possible strate-gies. Instead, players’ beliefs will accurately predict the play of the game, and noclaim is made about their accuracy in predicting the off-path portion of the opponents’strategies.

Consider again two players are playing the infinitely repeated prisoners’ dilemmagame, as described in the previous example. Let us verify that, as Theorem 6.3.3dictates, the future play of this game will be correctly predicted by the players. IfT1 <T2 then from timeT1 + 1 on, then player 2’s posterior beliefs will assign probability 1to player 1’s strategy,gT1

. On the other hand, player 1 will never fully know player 2’sstrategy, but will know thatT2 > T1. However, this is sufficient information to predictthat player 2 will always choose to defect in the future.

A player’s beliefs must converge to the truth even when his strategy space is incorrect(does not include the opponent’s actual strategy), as long as they satisfy the absolutecontinuity assumption. Suppose, for instance, that player1 is playing the trigger strat-egyg∞, and player 2 is playing tit-for-tat. However, player 1 believes that player 2 isalso playing the trigger strategy. Thus player 1’s beliefs about player 2’s strategy arecompletely incorrect. Nevertheless, his beliefs will correctly predict the future historiesof the game.

We have so far spoken about the accuracy of beliefs in rational learning. Note thatthe conditions of the theorem are identical to those of Theorem 6.3.3. Note also that thedefinition refers to the concept of anǫ-Nash Equilibrium from Section 2.3.9, as well asǫ-closeness defined above.

Theorem 6.3.4 (Rational Learning and Nash)Lets be a repeated-game strategy pro-file for a givenN -player game, and letP = P1, . . . , PN be a a vector of probabilitydistribution over such strategy profiles. Letµs andµP be the distributions over infinitegame histories induced by the strategy vectors and the belief vectorP , respectively. Ifwe have that


• at each round, each player i plays a best-response strategy given his beliefs Pi;

• after each round, each player i updates Pi using Bayesian updating; and

• µs is absolutely continuous with respect to µPi,

then for every ε > 0 and for almost every history in the support of µs there is a time T such that for every t ≥ T there exists an ε-equilibrium s∗ of the repeated game in which the play µPi predicted by player i's beliefs is ε-close to the play µs∗ of the equilibrium.

In other words, if utility-maximizing players start with individual subjective beliefs with respect to which the true strategies are absolutely continuous, then in the long run their behavior must be essentially the same as a behavior described by an ε-Nash equilibrium.

Of course, the space of repeated-game equilibria is huge, which leaves open the question of which equilibrium will be reached. Here notice a certain self-fulfilling property; players' optimism can lead to high rewards, and vice versa. For example, in a repeated Prisoner's Dilemma game, if both players assign a high prior to a TfT strategy by their opponent, they will tend to cooperate, leading to mutual cooperation. If on the other hand they place a high prior on constant defection, or the grim-trigger strategy, they will tend to defect.

6.4 Reinforcement learning

In this section we look at multiagent extensions of learning in MDPs, that is, in single-agent stochastic games (see Appendix C for a review of MDP essentials). Unlike the first two learning techniques discussed, and with one exception discussed in Section 6.5.3, reinforcement learning does not explicitly model the opponent's strategy. The specific family of techniques we look at are derived from the Q-learning algorithm for learning in unknown (single-agent) MDPs. Q-learning is described in the next section, after which we present its extension to zero-sum stochastic games. We then briefly discuss the difficulty in extending the methods to general-sum stochastic games.

6.5 Learning in unknown MDPs

Value iteration, as described in Appendix C, assumes that the MDP is known. What if we don't know the rewards or transition probabilities of the MDP? It turns out that, if we always know what state we are in and the reward received in each iteration, then we can still converge to the correct Q-values.

Q-learning


Definition 6.5.1 (Q-learning) Q-learning is the following procedure:

    Initialize st to your given initial state
    Initialize the Q-function and V values (arbitrarily, for example)
    repeat
        Select action at and take it. Observe your reward r(st, at)
        Perform the following updates (and do not update any other Q-values):
            Qt+1(st, at) ← (1 − αt) Qt(st, at) + αt (r(st, at) + β Vt(st+1))
            Vt(s) ← maxa Qt(s, a)

Theorem 6.5.2 Q-learning guarantees that the Q and V values converge to those of the optimal policy, provided the following conditions hold:

• Each state-action pair is sampled an infinite number of times.

• The time-dependent learning rate αt obeys the following constraints:

    0 ≤ αt < 1,    ∑_{t=0}^{∞} αt = ∞,    ∑_{t=0}^{∞} αt² < ∞

The intuition behind this approach is that we are approximating the unknown transition probability by using the actual distribution of states reached in the game itself. Notice that this still leaves us a lot of room in designing the order in which the algorithm selects actions.

Note that this theorem says nothing about the rate of convergence. Furthermore, it gives no assurance regarding the accumulation of optimal future discounted rewards by the agent; it could well be, depending on the discount factor, that by the time the agent converges on the optimal policy it has paid too high a cost, which cannot be recouped by exploiting the policy going forward. This is not a concern if the learning takes place during training sessions, and only when learning has converged sufficiently is the agent unleashed on the world (for example, think of a fighter pilot being trained on a simulator before going into combat). But in general Q-learning must be thought of as guaranteeing good learning, but neither quick learning nor high future discounted rewards.
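As a concrete illustration of the procedure and of a learning-rate schedule satisfying the conditions of Theorem 6.5.2, here is a minimal tabular Q-learning sketch. The toy two-state MDP, the ε-greedy action selection, and the per-pair 1/k learning rate are illustrative assumptions rather than anything prescribed by the text.

    import random
    from collections import defaultdict

    # Toy two-state, two-action MDP (an assumption made for illustration).
    def transition(s, a):
        return (s + a) % 2

    def reward(s, a):
        return 1.0 if (s, a) == (0, 1) else 0.0

    beta = 0.9                      # discount factor
    Q = defaultdict(float)          # Q[(s, a)], initialized to 0
    visits = defaultdict(int)       # per state-action counts, giving alpha = 1/k

    s = 0
    for t in range(10000):
        # epsilon-greedy action selection keeps every state-action pair sampled
        a = random.choice([0, 1]) if random.random() < 0.1 else max((0, 1), key=lambda x: Q[(s, x)])
        r, s_next = reward(s, a), transition(s, a)
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]        # sum of alphas diverges, sum of squares converges
        v_next = max(Q[(s_next, 0)], Q[(s_next, 1)])
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + beta * v_next)
        s = s_next

    print({sa: round(q, 2) for sa, q in Q.items()})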

6.5.1 Reinforcement learning in zero-sum stochastic games

In order to adapt the method presented from the setting of MDPs to stochastic games,we will need to make a few modifications. The simplest possible modification wouldbe to have each agent ignore the existence of the other agents. We then defineQπ

i : S×Ai → R to be the value for playeri if all players follow joint strategyπ after starting instates and playeri chooses the actiona. We can now apply the Q-learning algorithmas described previously. As mentioned earlier in the chapter, we are forced to foregoour search for an ‘optimal’ policy, and instead focus on one that achieves performs wellagainst its opponents; for example, we might require than itsatisfy Hannan consistency.The procedure can be shown to be Hannan-consistent for an agent in a stochastic game


against opponents playing stationary policies. However, against opponents using more complex strategies, such as Q-learning itself, we lose the guarantee.

The above approach, assuming stationary opponents, seems unmotivated, since we are completely ignoring the existence of the other agents in the problem. Instead, if the agent is aware of what actions its opponent(s) selected at each point in its history, we can use a modified Q-function, Q^π_i : S × A → R, defined over states and joint action pairs, where A = A_1 × A_2 × · · · × A_n. The formula to update Q is simple to modify, and would be the following for a two-player game:

    Q_{i,t+1}(s_t, a_t, o_t) = (1 − α_t) Q_{i,t}(s_t, a_t, o_t) + α_t ( r_i(s_t, a_t, o_t) + β V_t(s_{t+1}) )

Now that the actions range over both our agent's actions and those of its competitors, how can we calculate the value of a state? As discussed in Chapter 2, the maxmin strategy of an agent is the one that maximizes his payoff against a worst-case opponent or set of opponents. Recall that for zero-sum games, the joint policy in which each agent plays his maxmin strategy forms a Nash equilibrium. The payoff to the first agent (and its negation, the payoff to the second agent) is called the value of the game, and it forms the basis for our revised value function for Q-learning:

    V_t(s) = max_{Π_i} min_o Q_{i,t}(s, Π_i(s), o).

Like the basic Q-learning algorithm, the above minimax-Q learning algorithm is guaranteed to converge in the limit of infinite samples of each state and action profile pair. While this will guarantee the agent a payoff at least equal to that of its maxmin strategy, it no longer satisfies Hannan consistency. If the opponent is playing a suboptimal strategy, minimax-Q will be unable to exploit it in most games.

A pseudo-code version of an implementation of the minimax-Q algorithm is shown below. Note that this pseudo-code specifies not only how to update the Q and V values, but also how to update the strategy Π. There are still some free parameters within this implementation. For instance, there are various choices of how to update the learning parameter, α. One is to simply use a decay rate, so that α is set to α · decay after each Q-value update, for some value of decay < 1. Another possibility from the Q-learning literature is to keep a separate α for each state and action profile pair. In this case, a common method is to use α = 1/k, where k equals the number of times that particular Q-value has been updated, including the current one. So, when first encountering a reward for a state s where an action profile a was played, the Q-value is set entirely to the observed reward plus the discounted value of the successor state (α = 1). The next time that state and action profile pair is encountered, the Q-value will be set to 1/2 of the old Q-value plus 1/2 of the new reward and discounted successor-state value.

The following example demonstrates the operation of minimax-Q learning in a simple repeated game (recall that a repeated game is just a stochastic game with a single stage game, and thus a deterministic transition from the stage game back to itself). We look at the case in which an agent is using minimax-Q learning to learn a Q-function in the repeated matching pennies game (reproduced here in Figure 6.7) against an unknown opponent.

Note that the convergence results for Q-learning impose only weak constraints on how to select actions and visit states. In this example, we follow the pseudo-code given below.


// Initialize:
forall s in S, a in A, and o in O do
    let Q(s,a,o) := 1
forall s in S do
    let V(s) := 1
forall s in S and a in A do
    let Π(s,a) := 1/|A|
let α := 1.0

// Take an action:
when in state s, with probability explor choose an action uniformly at random,
and with probability (1 − explor) choose action a with probability Π(s,a)

// Learn:
after receiving reward rew for moving from state s to s' via action a and opponent's action o,
    let Q(s,a,o) := (1 − α) · Q(s,a,o) + α · (rew + γ · V(s'))
    let Π(s,·) := arg max_{Π'(s,·)} ( min_{o'} Σ_{a'} Π'(s,a') · Q(s,a',o') )
    // The above can be done, for example, by linear programming
    let V(s) := min_{o'} ( Σ_{a'} Π(s,a') · Q(s,a',o') )
    update α
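The linear-programming step referred to in the pseudo-code can be sketched as follows. This is only an illustration, using the scipy library; the function and variable names are ours, and Q_s stands for the matrix of Q-values Q(s, a', o') at a single state s.

    import numpy as np
    from scipy.optimize import linprog

    def maxmin_strategy(Q_s):
        # Q_s[a, o] holds Q(s, a, o). We maximize v subject to
        # sum_a pi[a] * Q_s[a, o] >= v for every opponent action o, with pi a distribution.
        n_a, n_o = Q_s.shape
        c = np.zeros(n_a + 1)
        c[-1] = -1.0                                   # linprog minimizes, so minimize -v
        A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])  # v - sum_a pi[a] Q_s[a, o] <= 0
        b_ub = np.zeros(n_o)
        A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
        b_eq = np.array([1.0])                         # probabilities sum to one
        bounds = [(0, 1)] * n_a + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:n_a], res.x[-1]                  # Pi(s, .) and V(s)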

              Heads        Tails
   Heads      1, −1       −1, 1
   Tails     −1, 1         1, −1

Figure 6.7   Matching pennies game.

We assume that the agent chooses an action randomly some fraction of the time (denoted with a * in Table 6.4), and plays according to her current best strategy otherwise. For the learning rate, we have chosen the second method discussed earlier, with α = 1/k, where k is the number of times the state and action profile pair has been observed. Assume that the Q-values are initialized to 1 and that the discount factor of the game is 0.9.

Table 6.4 shows the values of player 1's Q-function in the first few iterations of this game, as well as her best strategy at each step. We see that the value of the game, 0, is being approached, albeit slowly. This is not an accident:

Theorem 6.5.3 Under the same conditions that assure convergence of Q-learning to the optimal policy in MDPs, in zero-sum games minimax-Q converges to the value of the game in self-play.

Here again, no guarantee is made about the rate of convergence or about the accumulation of optimal rewards. We can achieve more rapid convergence if we are willing


   t     Actions   Reward_1   Q_t(H,H)   Q_t(H,T)   Q_t(T,H)   Q_t(T,T)   V(s)     π_1(H)
   0                            1          1          1          1         1        0.5
   1     (H*,H)       1         1.9        1          1          1         1        0.5
   2     (T,H)       −1         1.9        1         −0.1        1         1        0.55
   3     (T,T)        1         1.9        1         −0.1        1.9       1.279    0.690
   4     (H*,T)      −1         1.9        0.151     −0.1        1.9       0.967    0.534
   5     (T,H)       −1         1.9        0.151     −0.115      1.9       0.964    0.535
   6     (T,T)        1         1.9        0.151     −0.115      1.884     0.960    0.533
   7     (T,H)       −1         1.9        0.151     −0.122      1.884     0.958    0.534
   8     (H,T)       −1         1.9        0.007     −0.122      1.884     0.918    0.514
   ...
   100   (H,H)        1         1.716     −0.269     −0.277      1.730     0.725    0.503
   ...
   1000  (T,T)        1         1.564     −0.426     −0.415      1.564     0.574    0.500
   ...

Table 6.4   Minimax-Q learning in a repeated matching pennies game.

to sacrifice the guarantee of finding a perfectly optimal maxmin strategy. In particular, we can consider the framework of Probably Approximately Correct (PAC) learning. In this setting, we choose some ǫ > 0 and 1 > δ > 0, and seek an algorithm that can guarantee with probability (1 − δ) a payoff of at least that of the maxmin strategy minus ǫ versus any opponent. The advantage of this weaker guarantee is that the algorithm is now also guaranteed to achieve at least this value in a polynomially bounded number of time steps.

One example of such an algorithm is the model-based learning algorithm R-max. It first initializes its estimate of the value of each state to be the highest reward that might be returned in the game (hence the name). This philosophy has been referred to as optimism in the face of uncertainty, and helps guarantee that the agent will explore its environment to the best of its ability. The agent then uses these optimistic values to calculate a maxmin strategy for the game. Unlike normal Q-learning, the algorithm does not update its values for any state and action profile pair until it has visited them 'enough' times to have a good estimate of the reward and transition probabilities. Using a theoretical result called Chernoff bounds, it is possible to polynomially bound the number of samples necessary to guarantee that the average over the samples deviates from the true average by at most ǫ with probability (1 − δ), for any selected values of ǫ and δ. The polynomial is in N, k, T, 1/ǫ, and 1/δ, where N is the number of states (or games) in the stochastic game, k is the number of actions available to each agent in a game (without loss of generality we can assume that this is the same for all agents and all games), and T is the ǫ-return mixing time of the optimal policy, that is, the smallest length of time after which the optimal policy is guaranteed to yield an expected payoff that is at most ǫ away from optimal. The notes at the end of the chapter point to further reading on R-max, and on a predecessor algorithm called E3 (pronounced 'E cubed').
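To give a feel for the algorithm, here is a heavily simplified, single-agent sketch of the R-max idea; the actual algorithm computes maxmin strategies of stage games rather than simple maxima, and sets its parameters from ǫ, δ and T. The environment interface step(s, a), the value-iteration loop, and all constants below are illustrative assumptions of ours.

    import numpy as np

    def rmax(n_states, n_actions, step, r_max, m, gamma=0.95, horizon=100000):
        counts = np.zeros((n_states, n_actions), dtype=int)
        reward_sum = np.zeros((n_states, n_actions))
        trans = np.zeros((n_states, n_actions, n_states))
        # Optimism in the face of uncertainty: unknown pairs are assumed worth r_max forever.
        Q = np.full((n_states, n_actions), r_max / (1 - gamma))
        s = 0
        for _ in range(horizon):
            a = int(np.argmax(Q[s]))                  # act greedily w.r.t. optimistic values
            r, s_next = step(s, a)
            if counts[s, a] < m:
                counts[s, a] += 1
                reward_sum[s, a] += r
                trans[s, a, s_next] += 1
                if counts[s, a] == m:                 # the pair just became 'known'
                    for _ in range(200):              # re-solve the empirical model
                        V = Q.max(axis=1)
                        for si, ai in zip(*np.nonzero(counts >= m)):
                            p = trans[si, ai] / counts[si, ai]
                            Q[si, ai] = reward_sum[si, ai] / counts[si, ai] + gamma * (p @ V)
            s = s_next
        return Q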


6.5.2 Beyond zero-sum stochastic games

So far we have shown results for the class of zero-sum stochastic games. Although the algorithms discussed, in particular minimax-Q, are still well defined in the general-sum case, the guarantee of achieving the maxmin strategy payoff is less compelling. Another subclass of stochastic games that has been addressed is that of common-payoff (pure coordination) games, in which all agents receive the same reward for an outcome. This class has the advantage of removing the competitive aspect and reducing the problem to that of identifying an optimal action profile and coordinating with the other agents to play it. In many ways this problem can really be seen as a single-agent problem of distributed control. This is a relatively well understood problem, and various algorithms exist for it, depending on the precise definition of the problem.

Expanding the learning algorithms to the general-sum case is quite problematic, however. There have been attempts to generalize Q-learning to general-sum games, but they have not yet been truly successful. As was discussed at the beginning of this chapter, the question of what it means to learn in general-sum games is subtle. One yardstick we have discussed is convergence to a Nash equilibrium of the stage game during self-play (that is, against an agent or agents with the same learning strategy). No generalization of Q-learning has been put forward that has this property.

6.5.3 Belief-based reinforcement learning

There is also a version of reinforcement learning that includes explicit modeling of the other agent(s). The updates are as follows:

    Q_{t+1}(s_t, a_t) ← (1 − α_t) Q_t(s_t, a_t) + α_t ( r(s_t, a_t) + β V_t(s_{t+1}) )

    V_t(s) ← max_{a_i} Σ_{a_{−i}∈A_{−i}} Q_t(s, (a_i, a_{−i})) Pr_i(a_{−i})

In this version, the agent updates the value of the game using the probability it assigns to the opponent(s) playing each action profile. Of course, the belief function must be updated after each play. How it is updated depends on what the function is. Indeed, belief-based reinforcement learning is not a single procedure but a family, each member characterized by how beliefs are formed and updated. For example, in one version the belief is of the kind considered in fictitious play, and in another it is a Bayesian belief in the style of rational learning. There are some experimental results that show convergence to equilibrium in self-play for some versions of belief-based reinforcement learning and some classes of games, but no theoretical results.
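As an illustration of the value update above, consider two agents and a tabular Q-function; the array indexing and the particular belief used below (empirical frequencies of the opponent's actions, as in fictitious play) are our own assumptions.

    import numpy as np

    def belief_based_value(Q_s, belief):
        # Q_s[a_i, a_minus_i] holds Q_t(s, (a_i, a_-i)); belief[a_minus_i] is Pr_i(a_-i).
        expected = Q_s @ belief        # expected value of each of our own actions
        return expected.max()

    Q_s = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])
    belief = np.array([0.7, 0.3])      # e.g., observed frequencies of the opponent's actions
    print(belief_based_value(Q_s, belief))   # 0.4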

6.6 No-regret learning and universal consistency

A learning rule is universally consistent, or (equivalently) exhibits no regret, if, loosely speaking, against any set of opponents it yields a payoff that is no less than the payoff the agent could have obtained by playing any one of his pure strategies throughout.


More precisely, let α^t be the average per-period reward the agent received up until time t, and let α^t(s) be the average per-period reward the agent would have received up until time t had she played pure strategy s instead, assuming all other agents played as they did. Now define the regret the agent experiences at time t for not having played s as

    R^t(s) = α^t(s) − α^t.

A learning rule is said to exhibit no regret if it guarantees that with high probability the agent will experience no positive regret. (There are actually several versions of regret; the one described here is called external regret in computer science, and unconditional regret in game theory.) That is, for any pure strategy s of the agent we have

    Pr( [ lim sup_t R^t(s) ] ≤ 0 ) = 1.

The quantification is over all of the agent's pure strategies of the stage game, but note that it would make a difference if instead one quantified over all mixed strategies of the stage game (why is this?). Note also that this guarantee is only in expectation, since the agent's strategy will in general be mixed, and thus the payoff obtained at any given time, u^t_i, is uncertain.

It is important to realize that this "in hindsight" requirement ignores the possibility that the opponents' play might change as a result of the agent's own play. Ignoring this is harmless against stationary opponents, and might be a reasonable approximation in the context of a large number of opponents (such as in a public securities market), but less so in the context of a small number of agents, of the sort on which game theory tends to focus. For example, in the finitely repeated Prisoner's Dilemma game, the only strategy exhibiting no regret is to always defect (do you see why this is so?). This precludes strategies that capitalize on cooperative behavior by the opponent, such as Tit-for-Tat. In this connection see our earlier discussion of the inseparability of learning and teaching.

Over the years, a variety of no-regret learning techniques have been developed. Here are two, regret matching and smooth fictitious play:

• Regret matching. At each time step each action is chosen with probability proportional to its regret. That is,

    σ^{t+1}_i(s) = R^t(s) / Σ_{s′∈S_i} R^t(s′),

where σ^{t+1}_i(s) is the probability that agent i plays pure strategy s at time t + 1. (A small code sketch of this rule appears below.)

• Smooth fictitious play. Instead of playing the best response to the empirical frequency of the opponent's play, as fictitious play prescribes, one introduces a perturbation, which gradually diminishes over time. That is, rather than adopting at time t + 1 a pure strategy s_i that maximizes u_i(s_i, P^t), where P^t is the empirical distribution of the opponent's play until time t, agent i adopts a mixed strategy σ_i that maximizes u_i(σ_i, P^t) + λ v_i(σ_i). Here λ is any constant, and v_i is a smooth, concave function with boundaries at the unit simplex. For example, v_i can be the entropy function, v_i(σ_i) = −Σ_{s_i∈S_i} σ_i(s_i) log σ_i(s_i).
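When v_i is the entropy function, the perturbed maximization above has a well-known closed form: a logit (softmax) best response with temperature λ. The following minimal sketch assumes the expected utilities u_i(s_i, P^t) have already been computed; the example values are arbitrary.

    import numpy as np

    def smooth_fictitious_play_strategy(expected_utilities, lam):
        # sigma_i(s_i) proportional to exp(u_i(s_i, P^t) / lambda)
        z = np.exp(np.asarray(expected_utilities) / lam)
        return z / z.sum()

    print(smooth_fictitious_play_strategy([1.0, 0.0], lam=0.5))   # close to the best response
    print(smooth_fictitious_play_strategy([1.0, 0.0], lam=10.0))  # close to uniform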

Regret matching can be shown to exhibit no regret, and smooth fictitious play approaches no regret as λ tends to zero. The proofs are based on Blackwell's Approachability Theorem; the notes at the end of the chapter provide pointers for further reading on it, as well as on other no-regret techniques.
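The regret-matching rule can likewise be sketched in a few lines. Here the payoff matrix and the fixed opponent sequence are hypothetical, and, following the common convention, only the positive part of each regret is used; when no regret is positive the sketch plays uniformly at random.

    import numpy as np

    def regret_matching(payoff, opponent_actions, rng=np.random.default_rng(0)):
        # payoff[s, o]: the agent's payoff for pure strategy s against opponent action o.
        n = payoff.shape[0]
        actual = 0.0                      # cumulative payoff actually obtained
        counterfactual = np.zeros(n)      # cumulative payoff each pure strategy would have obtained
        sigma = np.full(n, 1.0 / n)
        for t, o in enumerate(opponent_actions, start=1):
            s = rng.choice(n, p=sigma)
            actual += payoff[s, o]
            counterfactual += payoff[:, o]
            regrets = np.maximum(counterfactual / t - actual / t, 0.0)
            sigma = regrets / regrets.sum() if regrets.sum() > 0 else np.full(n, 1.0 / n)
        return sigma

    pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies, from Figure 6.7
    print(regret_matching(pennies, [0, 1] * 500))    # the mixed strategy after 1000 steps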

6.7 Targeted learning

No-regret learning was one approach to ensuring good rewards, but as we discussed, this sense of 'good' has some drawbacks. Here we discuss an alternative sense of 'good', which retains the requirement of best response, but limits it to a particular class of opponents. The intuition guiding this approach is that in any strategic setting, in particular a multiagent learning setting, one has some sense of the agents in the environment. A chess player has studied previous plays of his opponent, a skipper in a sailing competition knows a lot about his competitors, and so on. And so it makes sense to try to optimize against this set of opponents, rather than against completely unknown opponents.

Technically speaking, the model of targeted learning takes as a parameter a class of likely opponents (the "target class"), and the learner is required to perform particularly well against these likely opponents. At the same time, one does not want to give up all guarantees against opponents outside the target class. Finally, an additional desirable property is for the algorithm to perform well in self-play; the algorithm should be designed to "cooperate" with itself.

For games with only two agents, these intuitions can be stated formally as follows.

Targeted Optimality: Against any opponent in the target class, the expected payoff is the best-response payoff. (The expectation here is over the joint mixed strategies, but not over opponents; the requirement is for any fixed opponent.)

Safety: Against any opponent, the expected payoff is at least the individual security (or maxmin) value for the game.

Auto-Compatibility: Self-play, in which both agents adopt the learning procedure in question, is strictly Pareto efficient. (Recall that strict Pareto efficiency means that one agent's expected payoff cannot increase without the other's decreasing. Also note that we do not restrict the discussion to symmetric games, and so self-play does not in general mean identical play by the agents, nor identical payoffs. We abbreviate "strictly Pareto efficient" as "Pareto efficient.")

We introduce one additional twist. Since we are interested in quick learning, not only learning in the limit, we need to allow some departure from the ideal. And so we amend the requirements as follows:


• Efficient targeted learning: For every ǫ > 0 and 1 > δ > 0 there exists an M, polynomial in 1/ǫ and 1/δ, such that after M time steps, with probability greater than 1 − δ, all three payoff requirements listed above are achieved to within ǫ.

Note the difference from no-regret learning. For example, consider learning in a repeated Prisoner's Dilemma game. Suppose that the target class consists of all opponents whose strategies rely only on the past iteration; note that this includes the Tit-for-Tat strategy. In this case successful targeted learning will result in constant cooperation, while no-regret learning prescribes constant defection.

How hard is it to achieve efficient targeted learning? The answer depends of course on the target class. Provably correct (with respect to this criterion) learning procedures exist for the class of stationary opponents, and for the class of opponents whose memory is limited to a finite window into the past. The basic approach is to construct a number of building blocks, and then specialize and combine them differently depending on the precise setting. The details of the algorithms can get involved, especially in the interesting case of non-stationary opponents, but the essential flow is as follows:

1. Start by assuming that the opponent is in the target set, and learn a best response tothe particular agent under this assumption. If the payoffs you obtain stray too muchfrom your expectation, move on.

2. Signal to the opponent to find out whether he is employing the same learning strategy. If he is, coordinate on a Pareto efficient outcome. If your payoffs stray too far off, move on.

3. Play your security-level strategy.

Note that so far we have restricted the discussion to two-player games. Can we generalize the criteria, and the algorithms, to games with more players? The answer is yes, but various new subtleties creep in. For example, in the two-agent case we needed to worry about three cases, corresponding to whether the opponent is in the target set, a self-play agent, or neither. We must now consider three sets of agents: self-play agents (that is, agents using the algorithm in question), agents in the target set, and unconstrained agents. We must ask how agents in the first set can jointly achieve a Pareto-efficient outcome against the second set, and yet protect themselves from exploitation by agents in the third set. This raises questions about possible coordination among the agents:

• Can self-play agents coordinate other than implicitly through their actions?

• Can opponents, whether in the target set or outside it, coordinate other than through their actions?

The section at the end of the chapter points to further reading on this topic.

6.8 Evolutionary learning and other large-population models

In this section we shift our focus from models of the learning of individual agents to models of the learning of populations of agents (although, as we shall see, we will


not abandon the single-agent perspective altogether). When we speak about learning in a population of agents, we mean the change in the constitution and behavior of that population over time. These models were originally developed by population biologists to model the process of biological evolution, and later adopted and adapted by other fields.

In the first subsection we present the model of the replicator dynamic, a simple model inspired by evolutionary biology. In the second subsection we present the concept of evolutionarily stable strategies, a stability concept that is related to the replicator dynamic. We conclude with a somewhat different model of agent-based simulation and the concept of emergent conventions.

6.8.1 The replicator dynamic

The replicator dynamic models a population undergoing frequent interactions among its members. We will concentrate on the symmetric, two-player case, in which the agents repeatedly play a two-player symmetric normal-form stage game against each other. (There exist much more general notions of symmetric normal-form games, with multiple actions and players, but the following is sufficient for our purposes.)

Definition 6.8.1 (Symmetric 2 × 2 game) Let a two-player, two-action normal-form game be called a symmetric game if it has the following form.

            A          B
   A      x, x        u, v
   B      v, u        y, y

Intuitively, this requirement says that the agents do not have distinct roles in the game, and the payoff to an agent does not depend on the agent's name. We have already seen several instances of such games, including the Prisoner's Dilemma game. (This restriction to symmetric games is very convenient, simplifying both the substance and the notation. However, there exist more complicated evolutionary models, including ones with different strategy spaces for different agents and non-symmetric payoffs. At the end of the chapter we point the reader to further reading on these models.)

The replicator dynamic model describes a population of agents playing such a game in an ongoing fashion. The agents only play pure strategies; at any point in time each agent is associated with one given pure strategy. Informally speaking, the model then has the agents play each other, each obtaining some average payoff. This payoff is called the agent's fitness. At this point the biological inspiration kicks in: each agent now "reproduces" in a manner proportional to this fitness, and the process repeats. The question is whether the process converges to a fixed proportion of the various pure strategies within the population, and if so, to which fixed proportions.

The verbal description above is only meant to be suggestive. The actual mathematical model is a little different. First, we never explicitly model the play of the game


between particular sets of players; we only model the proportions of the population associated with each strategy. Second, the model is not one of discrete repetitions of play, but rather one of continuous evolution. Third, besides the fitness-based reproduction, there is also a random element that impacts the proportions in the population (again, because of the biological inspiration, this random element is called mutation).

The formal model is as follows. Given a normal-form game G = ({1, 2}, A, u), let ϕ_t(a) denote the number of players playing action a at time t. Also, let

    θ_t(a) = ϕ_t(a) / Σ_{a′∈A} ϕ_t(a′)

be the proportion of players playing action a at time t. We denote by ϕ_t the vector of measures of players playing each action, and by θ_t the vector of population shares for each action.

The expected payoff to any player of playing action a at time t is

    u_t(a) = Σ_{a′} θ_t(a′) u(a, a′).

The change in the number of agents playing action a at time t is defined to be proportional to his fitness, that is, his average payoff at the current time:

    ϕ̇_t(a) = ϕ_t(a) u_t(a).

The absolute numbers of agents of each type are not important, only the relative ratios are. Defining the average expected payoff of the whole population as

    u*_t = Σ_a θ_t(a) u_t(a),

we have that the change in the fraction of agents playing action a at time t is

    θ̇_t(a) = [ ϕ̇_t(a) Σ_{a′∈A} ϕ_t(a′) − ϕ_t(a) Σ_{a′∈A} ϕ̇_t(a′) ] / ( Σ_{a′∈A} ϕ_t(a′) )² = θ_t(a) [ u_t(a) − u*_t ].

The system we have defined has a very intuitive quality. If an action does better than the population average then the proportion of the population playing this action increases, and vice versa. Note that even an action that is not a best response to the current population state can grow as a proportion of the population, when its expected payoff is better than the population average.

How should we interpret this evolutionary model? A straightforward way is as agents repeatedly interacting and replicating within a large population. However, we can also interpret the fraction of agents playing a certain strategy as one mixed strategy of a single agent, and the process as that of two identical agents repeatedly updating their identical mixed strategy based on the previous interaction. Seen in this light, except for its continuous-time nature, the evolutionary model is not as different from the repeated-game model as it seems at first glance.


We would like to examine the equilibrium points of this system. Before we do, we need a definition of stability. We define a steady state of a population using the replicator dynamic to be a population state θ such that for all a ∈ A, θ̇(a) = 0. In other words, a steady state is a state in which the population shares of each action are constant.

This stability concept has a major flaw. Any state in which all players play the same action is a steady state. The population shares of the actions will remain constant because the replicator dynamic does not allow the "entry" of strategies that are not already being played. To disallow these states, we will often require that our steady states be stable. A steady state θ of a replicator dynamic is stable if for every neighborhood U of θ there is another neighborhood U′ of θ such that if θ_0 ∈ U′ then θ_t ∈ U for all t > 0. That is, if the system starts close enough to the steady state, it remains nearby.

Finally, we might like to define an equilibrium state which, if perturbed, will eventually return to the state. We call this asymptotic stability. A steady state θ of a replicator dynamic is asymptotically stable if it is stable and, in addition, there is a neighborhood U of θ such that if θ_0 ∈ U then lim_{t→∞} θ_t = θ.

The following example illustrates some of these concepts. Consider a homogeneous population playing the anti-coordination game, repeated in Figure 6.8.

            A          B
   A      0, 0        1, 1
   B      1, 1        0, 0

Figure 6.8   The anti-coordination game.

The game has two pure-strategy Nash equilibria, (A, B) and (B, A), and one mixed-strategy equilibrium in which both players select actions from the distribution (0.5, 0.5). Because of the symmetric nature of the setting, there is no way for the replicator dynamic to converge to the pure-strategy equilibria. However, note that the state corresponding to the mixed-strategy equilibrium is a steady state, because when half of the players are playing A and half are playing B, both strategies have equal expected payoff (0.5) and the population shares of each are constant. Moreover, notice that this state is also asymptotically stable. The replicator dynamic, when started in any other state of the population (where the share of players playing A is more or less than 0.5), will converge back to the state (0.5, 0.5). More formally, we can express this as

    θ̇(A) = θ(A) ( 1 − θ(A) − 2θ(A)(1 − θ(A)) )
          = θ(A) ( 1 − 3θ(A) + 2θ(A)² ).

This expression is positive for θ(A) < 0.5, exactly 0 at 0.5, and negative for θ(A) > 0.5, so that the state (0.5, 0.5) is asymptotically stable.
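This convergence is easy to check numerically. The following sketch simply Euler-integrates the replicator dynamic for the anti-coordination game of Figure 6.8; the step size, horizon and starting point are arbitrary choices.

    import numpy as np

    payoff = np.array([[0.0, 1.0],     # row player's payoffs for actions A and B
                       [1.0, 0.0]])
    theta = np.array([0.9, 0.1])       # start far from the mixed equilibrium
    dt = 0.01
    for _ in range(5000):
        fitness = payoff @ theta                 # u_t(a) for each action
        average = theta @ fitness                # u*_t
        theta = theta + dt * theta * (fitness - average)
        theta = theta / theta.sum()              # guard against numerical drift
    print(theta)                                 # close to (0.5, 0.5)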

This example suggests that there may be a special relationship between Nash equilibria and states in the replicator dynamic. Indeed, this is the case, as the following results indicate.


Theorem 6.8.2 Given a normal-form game G = ({1, 2}, A = {a_1, . . . , a_k}, u), if the strategy profile (S, S) is a mixed-strategy Nash equilibrium of G, then the population share vector θ = (S(a_1), . . . , S(a_k)) is a steady state of the replicator dynamic of G.

In other words, every symmetric Nash equilibrium is a steady state. The reason for this is quite simple. In a state corresponding to a Nash equilibrium, all strategies being played have the same average payoff, so the population shares remain constant.

It is not the case, however, that every steady state of the replicator dynamic is a Nash equilibrium. In particular, states in which not all actions are played may be steady states, because the replicator dynamic cannot introduce new actions, even when the corresponding mixed-strategy profile is not a Nash equilibrium.

Theorem 6.8.3 Given a normal-form game G = ({1, 2}, A = {a_1, . . . , a_k}, u) and a mixed strategy S, if the population share vector θ = (S(a_1), . . . , S(a_k)) is a stable steady state of the replicator dynamic of G, then the strategy profile (S, S) is a mixed-strategy Nash equilibrium of G.

In other words, every stable steady state is a Nash equilibrium. It is easier to understand the contrapositive of this. If a mixed-strategy profile is not a Nash equilibrium, then some action must have a higher payoff than some of the actions in its support. Then in the replicator dynamic the share of the population using this better action will increase, provided it is present in the population. Then it is not possible that the population state corresponding to this mixed-strategy profile is a stable steady state.

Finally, we show that asymptotic stability corresponds to a notion that is stronger than Nash equilibrium. Recall the definition of trembling-hand perfection, reproduced here for convenience:

Definition 6.8.4 (Repetition of Definition 2.3.22) A mixed-strategy profile S is a (trembling-hand) perfect equilibrium of a normal-form game G if there exists a sequence S^0, S^1, . . . of completely mixed-strategy profiles such that lim_{n→∞} S^n = S, and such that for each S^k in the sequence and each player i, the strategy s_i is a best response to the strategies s^k_{−i}.

Trembling-hand perfection is related to the replicator dynamic as follows.

Theorem 6.8.5 Given a normal-form game G = ({1, 2}, A, u) and a mixed strategy S, if the population share vector θ = (S(a_1), . . . , S(a_k)) is an asymptotically stable steady state of the replicator dynamic of G, then the strategy profile (S, S) is a Nash equilibrium of G that is trembling-hand perfect and isolated. (An equilibrium strategy profile is isolated if there does not exist another equilibrium strategy profile in the neighborhood of the original profile, that is, reachable via small perturbations of the strategies.)

6.8.2 Evolutionarily stable strategies

An evolutionarily stable strategy (ESS) is a stability concept that was inspired by the replicator dynamic. However, unlike the steady states discussed above, it does not require the replicator dynamic, or any dynamic process, explicitly; rather it is a static


solution concept. Although in principle it is not inherently linked to learning, in fact it bears a close relationship to the replicator dynamic.

Roughly speaking, an evolutionarily stable strategy is a mixed strategy that is "resistant to invasion" by new strategies. Inspiration for the concept of evolutionarily stable strategy comes from the replicator dynamic. Suppose that a population of players is playing a particular mixed strategy in the replicator dynamic. Then suppose that a small population of "invaders," playing a different mixed strategy, is added to the population. The original strategy is considered to be an ESS if it gets a higher payoff against the resulting mixture of the new and old strategies than the invaders do, thereby "chasing out" the invaders.

More formally, we have the following:

Definition 6.8.6 (Evolutionarily stable strategy (ESS)) Given a symmetric two-player normal-form game G = ({1, 2}, A, u) and a mixed strategy S, we say that S is an evolutionarily stable strategy if and only if for some ǫ > 0 and for all other strategies S′ it is the case that

    u(S, (1 − ǫ)S + ǫS′) > u(S′, (1 − ǫ)S + ǫS′).

We can use properties of expectation to state the above equivalently as

    (1 − ǫ) u(S, S) + ǫ u(S, S′) > (1 − ǫ) u(S′, S) + ǫ u(S′, S′).

Note that, since this need only hold for small ǫ, it is equivalent to requiring that either u(S, S) > u(S′, S) holds, or else both u(S, S) = u(S′, S) and u(S, S′) > u(S′, S′) hold. Note that this is a strict definition. We can also state a weaker definition of ESS. We say that S is a weak evolutionarily stable strategy if and only if for some ǫ > 0 and for all S′ it is the case that either u(S, S) > u(S′, S) holds, or else both u(S, S) = u(S′, S) and u(S, S′) ≥ u(S′, S′) hold.

This weaker definition includes strategies in which the invader does just as well against the original population as it does against itself. In these cases the population using the invading strategy will not grow, but it will also not shrink.

As noted earlier, the definition of an ESS does not require a dynamic process (the replicator dynamic or any other). An ESS is a static notion, defined solely in terms of the expected utility earned by playing one mixed strategy against another.

We illustrate the concept of an ESS with the following example. The hawk-dove game is a symmetric two-player game represented by the matrix shown in Figure 6.9.

              D                        H
   D      1/2, 1/2                  0, 1
   H      1, 0                      (1−c)/2, (1−c)/2

Figure 6.9   The hawk-dove game.


In this game, two players are hunting together, and each can choose to behave either as a hawk (H) or as a dove (D). If both players choose D, then they split the value of the prey. If both players choose H, then they fight, and so the value of the prey is reduced by c and then split. Finally, if one player chooses H and the other D, the one choosing H takes all of the prey, and the other receives nothing.

Consider the (pure) strategy in which each player chooses the action H with probability 1, and assume that c < 1, so that fighting still leaves each hawk with a positive share of the prey. Any other mixed strategy must assign some positive probability to choosing the action D. If there were a set of players playing H together, and they were invaded by a small population of players who were playing any other mixed strategy, then the invading players would necessarily receive a lower expected payoff, and would disappear from the population. More formally, we have that

    u(H, H) > u(S′, H),

where S′ is any mixed strategy other than the pure strategy H. Therefore, the pure strategy H is the only ESS of this game.
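The equivalent conditions above are easy to check mechanically for a 2 × 2 symmetric game. The sketch below tests the pure strategy H of the hawk-dove game against a grid of alternative mixed strategies; the grid resolution, the tolerance, and the particular value of c are arbitrary illustrative choices.

    import numpy as np

    def u(s1, s2, A):
        # Expected payoff of mixed strategy s1 against s2, given row-player payoff matrix A.
        return s1 @ A @ s2

    def looks_like_ess(S, A, grid=201, tol=1e-9):
        # Check: u(S,S) > u(S',S), or u(S,S) = u(S',S) and u(S,S') > u(S',S'), for all S' != S.
        for p in np.linspace(0.0, 1.0, grid):
            Sp = np.array([p, 1.0 - p])
            if np.allclose(Sp, S):
                continue
            if u(S, S, A) > u(Sp, S, A) + tol:
                continue
            if abs(u(S, S, A) - u(Sp, S, A)) <= tol and u(S, Sp, A) > u(Sp, Sp, A) + tol:
                continue
            return False
        return True

    c = 0.5                                              # fight cost, assumed < 1
    hawk_dove = np.array([[0.5, 0.0],                    # actions ordered (D, H), from Figure 6.9
                          [1.0, 0.5 * (1 - c)]])
    print(looks_like_ess(np.array([0.0, 1.0]), hawk_dove))   # True: pure H is an ESS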

Not surprisingly, the evolutionarily stable strategy concept is closely related to Nash equilibrium.

Theorem 6.8.7 Given a symmetric two-player normal-form game G = ({1, 2}, A, u) and a mixed strategy S, if S is an evolutionarily stable strategy then it is a Nash equilibrium of G.

This is easy to show. Note that by definition an ESS S must satisfy

    u(S, S) ≥ u(S′, S).

In other words, it is a best response to itself, and thus must be a Nash equilibrium. Not every Nash equilibrium is an ESS, however; a strict Nash equilibrium, on the other hand, always is.

Theorem 6.8.8 Given a symmetric two-player normal-form game G = ({1, 2}, A, u) and a mixed strategy S, if S is a strict Nash equilibrium of G, then it is an evolutionarily stable strategy.

This is also easy to show. Note that for any strict Nash equilibrium S it must be the case that

    u(S, S) > u(S′, S).

But this satisfies the first criterion of an ESS. The ESS concept is also related to the idea of stability in the replicator dynamic.

Theorem 6.8.9 Given a symmetric two-player normal-form game G = ({1, 2}, A, u) and a mixed strategy S, if S is an evolutionarily stable strategy then it is an asymptotically stable steady state of the replicator dynamic of G.

Intuitively, if a state is an ESS then we know that it will be resistant to invasions by other strategies. Thus, when this strategy is represented by a population in the replicator dynamic, it will be resistant to small perturbations. What is interesting, however, is that the converse is not true. The reason for this is that in the replicator dynamic, only pure strategies can be inherited. Thus some states that are asymptotically stable would actually not be resistant to invasion by a mixed strategy, and thus are not an ESS.


6.8.3 Agent-based simulation and emergent conventions

It was mentioned in Section 6.8.1 that, while motivated by a notion of a dynamic process within a population, in fact the replicator dynamic only models the gross statistics of the process, not its details. There are other large-population models that provide a more fine-grained model of the process, with many parameters that can impact the dynamics. We call such models, which explicitly model the individual agents, agent-based simulation models.

In this section we look at one such model, geared toward the investigation of how conventions emerge in a society. In any realistic multiagent system it is crucial that the agents agree on certain social rules, in order to decrease conflicts among them and promote cooperative behavior. Without such rules even the simplest goals might become unattainable by any of the agents, or at least not efficiently attainable (just imagine driving in the absence of traffic rules). A social rule restricts the options available to each agent. A special case of social rules are social conventions, which limit the agents to exactly one option from the many available ones (for example, always driving on the right side of the road). A good social rule or convention strikes a balance between, on the one hand, allowing agents sufficient freedom to achieve their goals, and on the other hand restricting them so that they do not interfere too much with one another.

One can ask how social rules and conventions can be designed by a social designer (this is related to our discussion of social welfare in Chapter 8), but here we ask how such conventions emerge organically. Roughly speaking, the process we aim to study is one in which individual agents occasionally interact with one another, and as a result gain some new information. Based on its personal accumulated information, each agent updates its behavior over time. This process is reminiscent of the replicator dynamic, but there are crucial differences. We start in the same way, and restrict the discussion to symmetric 2-player, 2-choice games. Here too one can look at much more general settings, but we will restrict ourselves to the game in Figure 6.10.

            A          B
   A      x, x        u, v
   B      v, u        y, y

Figure 6.10   A game for agent-based simulation models.

However, unlike the replicator dynamic, here we assume a discrete process, and furthermore assume that at each stage exactly one pair of agents, selected at random from the population, plays. This contrasts sharply with the replicator dynamic, which can be interpreted as implicitly assuming that almost all pairs of agents play before updating their choice of action. In this discrete model each agent is tracked individually, and indeed different agents end up possessing very different information.


Most importantly, in contrast with the replicator dynamic, the evolution of the system is not defined by some global statistics of the system. Instead, each agent decides how to play the next game based on his individual accumulated experience thus far. There are two constraints we impose on such rules:

Anonymity: The selection function cannot be based on the identities of agents or the names of actions.

Locality: The selection function is purely a function of the agent's personal history; in particular, it is not a function of global system properties.

The requirement of anonymity deserves some discussion. We are interested in how social conventions emerge when we cannot anticipate in advance the games that will be played. For example, if we know that the coordination problem will be that of deciding whether to drive on the left of the road or on the right, we can very well use the names 'left' and 'right' in the action-selection rule; in particular, we can admit the trivial update rule which has all agents drive on the right immediately. Instead, the type of coordination problem we are concerned with is better typified by the following example. Consider a collection of manufacturing robots that have been operating at a plant for five years, at which time a new collection of parts arrives that must be assembled. The assembly requires using one of two available attachment widgets, which were introduced three years ago (and hence were unknown to the designer of the robots five years ago). Either of the widgets will do, but if two robots use different ones then they incur the high cost of conversion when it is time for them to mate their respective parts. Our goal is that the robots learn to use the same kind of widget. The point to emphasize about this example is that five years ago the designer could have stated rules of the general form "if in the future you have several choices, each of which has been tried this many times and has yielded this much payoff, then next time make the following choice"; the designer could not, however, have referred to the specific choices of widget, since those were only invented two years later.

The prohibition on using agent identities in the rules (e.g., "if you see Robot 17 use a widget of a certain type then do the same, but if you see Robot 5 do it then never mind") is similarly motivated. In a dynamic society agents appear and disappear, denying the designer the ability to anticipate membership in advance. One can sometimes refer to the roles of agents (such as Head Robot), and have them be treated in a special manner, but we will not discuss this interesting aspect here.

Finally, the notion of 'personal history' can be further honed. We will assume that the agent has access to the action he has taken and the reward he received at each instance. One could assume further that the agent observes the choices of others in the games in which he participated, and perhaps also their payoffs. But we will look specifically at an action-selection rule that does not make this assumption. This rule, called the Highest Cumulative Reward (HCR) rule, is the following learning procedure:

1. Initialize the cumulative reward for each action (for example, to zero).

2. Pick an initial action.

3. Play according to the current action and update its cumulative reward.

4. Switch to a new action iff the total payoff obtained from that action in the latestm


iterations is greater than the payoff obtained from the currently-chosen action in thesame time period.

5. Go to step 3.
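One simple reading of this procedure is sketched below; the window data structure, the convention that an action not played in the latest m iterations has cumulative reward zero, and the tie-breaking rule are our own illustrative interpretations, not part of the rule as stated.

    from collections import deque

    class HCRAgent:
        def __init__(self, n_actions, m, initial_action=0):
            self.n_actions = n_actions
            self.current = initial_action
            self.window = deque(maxlen=m)      # (action, reward) pairs from the latest m plays

        def act(self):
            return self.current

        def observe(self, reward):
            self.window.append((self.current, reward))
            totals = [0.0] * self.n_actions    # actions not in the window count as zero
            for a, r in self.window:
                totals[a] += r
            best = max(range(self.n_actions), key=lambda a: totals[a])
            if totals[best] > totals[self.current]:
                self.current = best            # step 4: switch iff another action strictly wins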

The parameter m in the procedure denotes a finite bound, but the bound may vary. HCR is a simple and natural procedure, but it admits many variants: one can consider rules that use a weighted accumulation of feedback rather than simple accumulation, or ones that normalize the reward somehow rather than look at absolute numbers. However, even this basic rule gives rise to interesting properties. In particular, under certain conditions it guarantees convergence to a "good" convention:

Theorem 6.8.10 Let g be a symmetric game as above, with x > 0 or y > 0 or x = y > 0, and either u < 0 or v < 0 or x < 0 or y < 0. Then if all agents employ the HCR rule, it is the case that for every ǫ > 0 there exists an integer M such that after M iterations of the process the probability that a social convention is reached is greater than 1 − ǫ. Once a convention is reached, it is never left. Furthermore, this convention guarantees to the agent a payoff which is no less than the maxmin value of g.

There are many more questions to ask about the evolution of conventions: How quickly does a convention evolve? How does this time depend on the various parameters, for example m, the history remembered? How does it depend on the initial choices of action? How does the particular convention reached (since there are many) depend on these variables? The discussion at the end of the chapter points the reader to further reading on this topic.

6.9 History and references

There are quite a few broad introductions to, and textbooks on, single-agent learning. In contrast, there are few general introductions to the area of multiagent learning. Fudenberg and Levine [1998] provide a comprehensive survey of the area from a game-theoretic perspective, as does Young [2004]. A special issue of the Journal of Artificial Intelligence [Vohra & (Eds.), 2007] looked at the foundations of the area. Parts of this chapter are based on [Shoham et al., 2007] from that special issue.

Some of the specific references are as follows. Fictitious play was introduced in [Brown, 1951; Robinson, 1951]. The convergence results for fictitious play in Theorem 6.2.4 are taken respectively from [Robinson, 1951], [Nachbar, 1990], [Monderer & Shapley, 1996b] and [Berger, 2005]. The non-convergence example appeared in [Shapley, 1964].

Rational learning was introduced and analyzed by Kalai and Lehrer [1993]. A rich literature followed, but this remains the seminal paper on the topic.

Single-agent reinforcement learning is surveyed in [Kaelbling et al., 1996]. Some key publications in the literature include [Bellman, 1957] on value iteration in known MDPs, and [Watkins, 1989; Watkins & Dayan, 1992] on Q-learning in unknown MDPs. The literature on multiagent reinforcement learning begins with [Littman, 1994]. Some other milestones in this line of research are as follows: [Littman & Szepesvari, 1996]


completed the story regarding zero-sum games, [Claus & Boutilier, 1998] tackled the case of pure coordination (or team) games, and [Hu & Wellman, 1998; Bowling & Veloso, 2001; Littman, 2001] attempted to generalize the approach to general-sum games. The R-max algorithm was introduced in [Brafman & Tennenholtz, 2002], and its predecessor, the E3 algorithm, in [Kearns & Singh, 1998].

The notion of no-regret learning can be traced to Blackwell's Approachability Theorem [Blackwell, 1956] and Hannan's notion of Universal Consistency [Hannan, 1957]. A good review of the history of this line of thought is provided in [Foster & Vohra, 1999]. The regret-matching algorithm and the analysis of its convergence to correlated equilibria appear in [Hart & Mas-Colell, 2000]. Modifications of fictitious play that exhibit no regret are discussed in [Fudenberg & Levine, 1995; Fudenberg & Levine, 1999].

Targeted learning was introduced in [Powers & Shoham, 2005b], and further refined and extended in [Powers & Shoham, 2005a; Vu et al., 2006] (however, the term targeted learning was invented later to apply to this approach to learning).

The replicator dynamic is borrowed from biology. While the concept can be traced back at least to Darwin's work, the work that had the most influence on game theory is perhaps [Taylor & Jonker, 1978]. The specific model of replicator dynamics discussed here appears in [Schuster & Sigmund, 1982].

The concept of evolutionarily stable strategies (ESSs) again has a long history, but was most explicitly put forward in [Smith & Price, 1973], and figured prominently a decade later in the seminal [Smith, 1982]. Experimental work on learning and the evolution of cooperation appears in [Axelrod, 1984b]. It includes discussion of a celebrated tournament among computer programs that played a finitely repeated Prisoner's Dilemma game and in which the simple Tit-for-Tat strategy emerged victorious.

Emergent conventions and the HCR rule were introduced in [Shoham & Tennenholtz, 1997].


7 Communication

Agents communicate; this is one of the defining characteristics of a multiagent system. In traditional linguistic analysis, the communication is taken to have a certain form (syntax), to carry a certain meaning (semantics), and to be influenced by various circumstances of the communication (pragmatics). As we shall see, a closer look at communication adds to the complexity of the story. Following our taxonomy of agent theories from the Introduction, we can distinguish between purely informational theories of communication and motivational ones. In informational communication, agents simply inform each other of different facts. The theories of belief change, discussed in Chapter 13, look at ways in which beliefs change in the face of new information, whether the beliefs are logical or probabilistic, consistent with prior beliefs or not. In this chapter we broaden the discussion and consider motivational theories of communication, involving agents with individual motivations and possible courses of action.

We divide the discussion into three parts. The first concerns cheap talk, and describes a situation in which self-motivated agents can engage in costless communication before taking action. As we will see, in some situations this talk influences future behavior, and in some it does not. Cheap talk can be viewed as "doing by talking"; in contrast, signaling games can be viewed as "talking by doing." In signaling games an agent can take actions which, by virtue of the underlying incentives, communicate to the other agent something new. Taken from game theory, cheap talk and signaling both apply in cooperative as well as in competitive situations. In contrast, speech-act theory, taken from philosophy and linguistics, applies in purely cooperative situations. It describes pragmatic ways in which language is used not only to convey information but to effect change; as such, it too has the flavor of "doing by talking."

7.1 “Doing by talking” I: cheap talk

Consider the Prisoner's Dilemma game, reproduced here in Figure 7.1.

             C             D
   C      −1, −1        −4, 0
   D       0, −4        −3, −3

Figure 7.1   The Prisoner's Dilemma game.

Recall that the game has a unique dominant-strategy equilibrium, the strategy profile (D, D), which ironically is also the only outcome that is not Pareto optimal; both players would do better if they both chose C instead. Suppose now that the prisoners are allowed to communicate before they play; will this change the outcome of the game? Intuitively, the answer is no. Regardless of the other agent's action, the given agent's best action is still D; his talk is indeed cheap. Furthermore, regardless of his true intention, it is in the interest of a given agent to get the other agent to play C; his talk is not only cheap but also not credible (or, as the saying goes, the talk is free, and worth every penny).

Contrast this with cheap talk prior to the coordination game shown in Figure 7.2.

             Left        Right
   Left      1, 1        0, 0
   Right     0, 0        1, 1

Figure 7.2   Coordination game.

Here, if the row player declares "I will play Left" prior to playing the game, the column player should take this seriously. Indeed, this utterance by the row player is both self-committing and self-revealing. These are two related but subtly different notions. A declaration of intent is self-committing if, once uttered, and assuming it is believed, the optimal course of action for the player is indeed to act as declared. In this example, if the column player believes the utterance "I will play Left," then his best response is to play Left. But then the row player's best response is indeed to play Left. In contrast, an utterance is self-revealing if, assuming that it is uttered with the expectation that it will be believed, it is only uttered when indeed the intention was to act that way. In our case, a row player intending to play Right will never announce the intention to play Left, and so the utterance is self-revealing.

It must be mentioned that the precise analysis of this example, as well as of the later examples, is subtle in a number of ways. In particular, the equilibrium analysis reveals other, less desirable equilibria besides the ones in which a meaningful message is transmitted and received. For example, this example has another, less obvious equilibrium. The column player could ignore anything the row player says, allowing his beliefs to be unaffected by signals. In this case, the row player has no incentive to say anything in particular, and she might as well "babble," that is, send signals that are uncorrelated with her intentions. For this reason, we call this a babbling equilibrium. In theory, every cheap talk game has a babbling equilibrium; it is always an equilibrium for one party to send a meaningless signal and for the other party to ignore it. An equilibrium that is not


a babbling equilibrium is called a revealing equilibrium. In a similar fashion one can have odd equilibria in which messages are not ignored, but are used in a non-standard way. For example, the row player might send the signal "Left" when she means Right and vice versa, so long as the column player adopts the same convention. However, going forward we will ignore these complications, and assume meaningful and straightforward communication among the parties.

It might seem that self-commitment and self-revelation are inseparable, but this is an artifact of the pure coordination nature of the game. In such games the utterance creates a so-called focal point, a signal on which the agents can coordinate their actions. But now consider the well-known stag-hunt game.

Artemis and Calliope are about to go hunting, and are trying to decide whether they want to hunt stag or hare. If both hunt stag, they do very well; if one tries to hunt stag alone, she fails completely. On the other hand, if one hunts hare alone, she will do well, for there is no competition; if both hunt hare together, they only do OK, for they each have competition.

The payoff matrix for this game is shown in Figure 7.3.

            Stag        Hare
   Stag     9, 9        0, 8
   Hare     8, 0        7, 7

Figure 7.3   Payoff matrix for the stag-hunt game.

In each cell of the matrix, Artemis' payoff is listed first, and Calliope's payoff is listed second. This game has a symmetric mixed-strategy equilibrium, in which each player hunts stag with probability 7/8, yielding an expected utility of 7 7/8. (If the other player hunts stag with probability p, hunting stag yields 9p while hunting hare yields 7 + p, so indifference requires p = 7/8.) But now suppose Artemis can speak to Calliope before the game; can she do any better? The answer is arguably yes. Consider the message "I plan to hunt stag." It is not self-revealing; Artemis would like Calliope to believe this even if she does not actually plan to hunt stag. However, it is self-committing; if Artemis were to think that Calliope believed her, then Artemis would actually prefer to hunt stag.

There is, however, the question of whether Calliope would believe the utterance, knowing that it is not self-revealing on the part of Artemis; some view self-commitment without self-revelation as a notion lacking force.

Let us define the game more generally. Consider the game in Figure 7.4. Here, if x is less than 7, then the message "I plan to hunt stag" is possibly credible. However, if x is greater than 7, then that message is not at all credible, because it is in Artemis' best interest to get Calliope to hunt stag, no matter what Artemis actually intends to play.

            Stag        Hare
   Stag     9, 9        0, x
   Hare     x, 0        7, 7

Figure 7.4   More general payoff matrix for the stag-hunt game.

We have so far spoken about communication in the context of games of perfect information. In such games all that can possibly be revealed is the intention to act a certain way. In games of incomplete information, however, there is an opportunity to reveal one's own private information prior to acting.

Consider the following example. The Acme Corporation wants to hire Sally in one of two positions: a demanding and an undemanding position. Sally may have high or low ability. Sally prefers the demanding position if she has high ability (presumably because of salary and intellectual challenge), and she prefers the undemanding position if she instead has low ability (presumably because it will be more manageable). Acme too prefers that Sally be in the demanding position if she has high ability, and that she be in the undemanding position if she is of low ability. The situation is modeled by the two games in Figure 7.5.

High-Ability Game:
    Signal High Ability    3, 1
    Signal Low Ability     0, 0

Low-Ability Game:
    Signal High Ability    0, 0
    Signal Low Ability     2, 1

Figure 7.5  Payoff matrix for Sally's job-hunt game.

In each cell of the matrix, Sally's payoff is listed first, and Acme's payoff is listed second. The actual game being played is determined by Nature; for concreteness, let us assume that selection is done with uniform probability. Importantly, however, only Sally knows what her true ability level is. Before they play the game, Sally can send Acme a signal about her ability level. Suppose for the sake of simplicity that Sally can only choose from two signals: "My ability is low," and "My ability is high." Note that Sally may choose to be either sincere or insincere.

What signal should Sally send? It seems obvious that she should tell the truth. She has no incentive to lie about her ability. If she were to lie, and Acme were to believe her, then she would receive a lower payoff than if she had told the truth. Acme knows that she has no reason to lie, and will believe her. Thus there is an equilibrium in which when Sally has low ability she says so, and Acme gives her an undemanding job, and when Sally has high ability she also says so, and Acme gives her a demanding job.


The message is therefore self-signaling; assuming she will be believed, Sally will send the message only if it is true.

7.2 “Talking by doing”: signaling games

We have so far discussed the situation in which cheap talk precedes action. But sometimes actions speak louder than words. In this section we consider a class of imperfect-information games called signaling games.

Definition 7.2.1 (Signaling game) A signaling game is a two-player game in which Nature selects a game to be played (we will assume that it does so according to a commonly known distribution), player 1 is informed of that choice and chooses an action, and player 2 then chooses an action not knowing Nature's choice, but knowing player 1's choice.

In other words, a signaling game is an extensive-form game of incomplete information for player 2.

It is tempting to model player 2's decision problem as follows. Since each of the possible games has a different set of payoffs, player 2 must first calculate the posterior probability distribution over possible games, given the message that she received from player 1. She can calculate this using Bayes' rule with the prior distribution over games, and the conditional probabilities of player 1's message given the game. More precisely, the expected payoff for each action is as follows.

u2(a, m) = E(u2(g, m, a) | m, a)
         = Σ_{g∈G} u2(g, m, a) P(g | m, a)
         = Σ_{g∈G} u2(g, m, a) P(g | m)
         = Σ_{g∈G} u2(g, m, a) P(m | g) P(g) / P(m)
         = Σ_{g∈G} u2(g, m, a) P(m | g) P(g) / ( Σ_{g'∈G} P(m | g') P(g') )

One problem with this formulation is that the use of Bayes' rule requires that the probabilities involved be non-zero. But more acutely, how does player 2 calculate the probability of player 1's message given a certain game? This is not at all obvious, in light of the fact that player 2 knows that player 1 knows that player 2 will go through such reasoning, et cetera. Indeed, even if player 1 has a dominant strategy in the game being played the situation is not straightforward. Consider the following signaling game. Nature chooses with equal probability one of the following two zero-sum normal-form games:

        L   R                   L   R
   U    4   1              U    1   3
   D    3   0              D    2   5

Multiagent Systems, draft of August 27, 2007

Page 226: Mas kevin b-2007

210 7 Communication

Recall that player 1 knows which game is being played, and will choose his message first (U or D), and then player 2, who does not know which game is being played, will choose his action (L or R). What should player 1 do?

Note that in the leftmost game (U,R) is a dominant strategy equilibrium, and in the rightmost game (D,L) is a dominant strategy equilibrium. Since player 2's preferred action depends entirely on the game being played, and he is confident that player 1 will play his dominant strategy, his best response is R if player 1 chooses U, and L if player 1 chooses D. If player 2 plays in this fashion, we can calculate the expected payoff to player 1 as follows.

E(u1) = (0.5)1 + (0.5)2 = 1.5

This seems like an optimal strategy. However, consider a different strategy for player 1. If player 1 always chooses D, regardless of what game he is playing, then his payoff is independent of player 2's action. We calculate the expected payoff to player 1 as follows, assuming that player 2 plays L with probability p, and R with probability (1 − p).

E(u1) = (0.5)(3p + 0(1 − p)) + (0.5)(2p + 5(1 − p)) = 2.5

Thus player 1 has a higher expected payoff if he always chooses the message D.

The example highlights an interesting property of signaling games. Although player 1 has privileged information, it may not always be to his advantage to exploit it. This is because by exploiting the advantage, he is effectively telling player 2 what game is being played, and thus losing his advantage. Thus in some cases player 1 can receive a higher payoff by ignoring his information.
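The two expected-payoff calculations above are easy to verify mechanically. The following is a minimal sketch (not part of the original text; the data structures and names are ours) that evaluates player 1's expected payoff under the separating strategy and under the pooling strategy of always sending D.

# Player 1's payoffs in the two zero-sum games chosen by Nature.
game_left  = {('U', 'L'): 4, ('U', 'R'): 1, ('D', 'L'): 3, ('D', 'R'): 0}
game_right = {('U', 'L'): 1, ('U', 'R'): 3, ('D', 'L'): 2, ('D', 'R'): 5}
prior = 0.5  # each game is selected with equal probability

# Separating strategy: U in the left game, D in the right game. Player 2 then
# infers the game and best-responds (minimizing player 1's payoff): R after U, L after D.
separating = prior * game_left[('U', 'R')] + prior * game_right[('D', 'L')]

# Pooling strategy: always D. Player 2 learns nothing and plays L with some
# probability p; the result turns out not to depend on p.
def pooling(p):
    left  = p * game_left[('D', 'L')] + (1 - p) * game_left[('D', 'R')]
    right = p * game_right[('D', 'L')] + (1 - p) * game_right[('D', 'R')]
    return prior * left + prior * right

print(separating)                    # 1.5
print(pooling(0.0), pooling(1.0))    # 2.5 2.5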

Signaling games fall under the umbrella term of games of asymmetric information. One of the best known examples is the so-called Spence signaling game, which offers a rationale for enrolling in a difficult academic program. Consider the situation in which an employer is trying to decide how much to pay a new worker. The worker may or may not be talented, and can signal this to the employer by choosing to get either a high or low level of education. Specifically, we can model the setting as a Bayesian game between an employer and a worker in which Nature first chooses the level of the worker's talent, θ, to be either θL or θH, such that θL < θH. This value of θ defines two different possible games. In each possible game, the worker's strategy space is the level of education e to get for each possible type, or level of talent. We use eL to refer to the level of education chosen by the worker if her talent level is θL, and eH for the education chosen if her talent level is θH. We assume that the worker knows her talent.

Finally, the employer's strategy specifies two wages, wH and wL, to offer a worker based on whether her signal is eH or eL. We assume that the employer does not know the level of talent of the worker, but does get to observe the level of education of the worker. The employer is assumed to have two choices. One is to ignore the signal and set wH = wL = pHθH + pLθL, where pH and pL (with pH + pL = 1) are the probabilities with which Nature chooses a high and low talent for the worker, respectively. The other is to pay a worker with a high education θH and a worker with a low education θL.

The payoff to the employer is θ − w, the difference between the talent of the worker and the payment to her. The payoff to the worker is w − e/θ, reflecting the assumption that the education is easier when the talent is higher.


This game has two equilibria. The first is a pooling equilibrium, in which the worker will choose exactly the same level of education regardless of her type (eL = eH = e*), and the employer pays all workers the same amount. The other is a separating equilibrium, in which the worker will choose a different level of education depending on her type. In this case a low-talent worker will choose to get no education, eL = 0, because the wage paid to this worker is wL, independent of eL. The education chosen by a high-talent worker is set in such a way as to make it unprofitable for either type of worker to mimic the other. This is the case only if the following two inequalities are satisfied.

θL ≥ θH − eH/θL

θL ≤ θH − eH/θH

These inequalities can be rewritten in terms of eH as the following.

θL(θH − θL) ≤ eH ≤ θH(θH − θL)

Note that since θH > θL, a separating equilibrium always exists.
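As a quick sanity check on these inequalities, here is a minimal sketch (not part of the original text; the function names are ours) that computes, for given talent levels, the interval of education levels eH supporting a separating equilibrium, and that tests the two incentive constraints directly.

def separating_range(theta_L, theta_H):
    """Bounds on eH: theta_L*(theta_H - theta_L) <= eH <= theta_H*(theta_H - theta_L)."""
    gap = theta_H - theta_L
    return theta_L * gap, theta_H * gap

def is_separating(e_H, theta_L, theta_H):
    # Low type prefers wage theta_L with no education to mimicking the high type:
    low_ok = theta_L >= theta_H - e_H / theta_L
    # High type prefers its own education level to mimicking the low type:
    high_ok = theta_L <= theta_H - e_H / theta_H
    return low_ok and high_ok

lo, hi = separating_range(1.0, 2.0)
print(lo, hi)                          # 1.0 2.0
print(is_separating(1.5, 1.0, 2.0))    # True: 1.5 lies inside the interval
print(is_separating(2.5, 1.0, 2.0))    # False: the high type would rather mimic the low type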

7.3 “Doing by talking” II: speech-act theory

Humans communicate in an extremely sophisticated manner. Human communications are as rich and imprecise as natural language, tone, affect, and body language permit, and human motivations are similarly complex. It is not surprising that philosophers and linguists have attempted to model such communication. As mentioned at the start of the chapter, human communication is analyzed on many different levels of abstraction, among them the syntactic, semantic, and pragmatic levels. The discussion of speech acts lies squarely within the pragmatic level, although it should be noted that there are legitimate arguments against a crisp separation among these layers.

7.3.1 Speech acts

The traditional view of communication was that it is the sharing of information. Speech-act theory, due to the philosopher J.L. Austin, embodies the insight that some communications can instead be viewed as actions, intended to achieve some goal.

Speech-act theory distinguishes between three different kinds of speech acts, or, if you wish, three levels at which an utterance can be analyzed. The locutionary act is merely the emission of a signal carrying a certain meaning. When I say "there's a car coming your way", the locution refers to the content transmitted. Locutions establish a proposition, which may be true or false. However, the utterance can also be viewed as an illocutionary act, which in this case is a warning. In general, an illocution is the invocation of a conventional force on the receiver through the utterance. Other illocutions can be making a request, telling a joke, or, indeed, simply informing.

Finally, if the illocution captures the intention of the speaker, the perlocutionary act is the bringing about of an effect on the hearer as a result of an utterance. Although the illocutionary and perlocutionary acts may seem similar, it is important to distinguish between an illocutionary act and its perlocutionary consequences.


Illocutionary acts do something in saying something, while perlocutionary acts do something by saying something. Perlocutionary acts include scaring, convincing, and saddening. In our car example above, the perlocution would be an understanding by the hearer of the imminent danger, causing him to jump from in front of the car.

Illocutions thus may or may not be successful. Performatives constitute a type of act that is inherently successful. Merely saying something achieves the desired effect. For example, the utterance "please get off my foot" (or, somewhat more stiffly, "I hereby request you to get off my foot") is a performative. The speaker asserts that the utterance is a request, and is thereby successful in communicating the request to the listener, because the listener assumes that the speaker is an expert on his own mental state. Some utterances are performatives only under some circumstances. For example, the statement "I hereby pronounce you man and wife" is a performative only if the speaker is empowered to conduct marriage ceremonies in that time and place, if the rest of the ceremony follows protocol, if the bride and groom are eligible for marriage, and so on.1

7.3.2 Rules of conversation

Building on the notion of speech acts as a foundation, another important contribution to language pragmatics takes the form of rules of conversation, as developed by P. Grice, another philosopher. The simple observation is that humans seem to undertake the act of conversation cooperatively. Humans generally seek to understand and be understood when engaging in conversation, even when other motivations may be at odds. It is in both parties' best interest to communicate clearly and efficiently. This is called the Cooperative Principle.

It is also the case that humans generally follow some basic rules when conversing, which presumably help them to achieve the larger shared goal of the Cooperative Principle. These rules have come to be known as the Gricean maxims. The four Gricean maxims are quantity, quality, relation, and manner. We discuss each one in turn.

The rule of quantity states that humans tend to provide listeners with exactly the amount of information required in the current conversation, even when they have access to more information. As an example, imagine that a waitress asks you, "how do you like your coffee?" You would probably answer, "Cream, no sugar, please," or something similar. You would probably not answer, "I like arabica beans, grown in the mountains of Guatemala. I prefer the medium roast from Peet's Coffee. I like to buy whole beans, which I keep in the freezer, and grind them just before brewing. I like the coffee strong, and served with a dash of cream." The latter response clearly provides the waitress with much more information than she needs. You also probably wouldn't respond, "no sugar," because this doesn't give the waitress enough information to do her job.

The rule of quality states that humans usually only say things that they actually believe. More specifically, humans don't say things they know to be false, and don't say things for which they lack adequate evidence. For example, if someone asks you about the weather outside, you only respond that it is raining if in fact you believe that it is raining, and if you have evidence to support that belief.

1. It is however interesting to contemplate a world in which any such utterance results in a marriage.


The rule of relation states that humans tend to say things that are relevant to the current conversation. If a stranger approaches you on the street to ask for directions to the nearest gas station, they would be quite surprised if you began to tell them a story about your grandmother's cooking.

Finally, the rule of manner states that humans generally say things in a manner that is brief and clear. When you are asked at the airport whether anyone unknown to you has asked you to carry something in your luggage, the appropriate answer is either "yes" or "no," not "many people assume that they know their family members, but what does that really mean?" In general, humans tend to avoid obscurity, ambiguity, prolixity, and disorganization.

These maxims help explain a surprising phenomenon about human speech, namely that we often succeed in communicating much more meaning than is contained directly in the words we say. This phenomenon is called implicature. For example, suppose that A and B are talking about a mutual friend, C, who is now working in a bank. A asks B how C is getting on in his job, and B replies, "Oh quite well, I think; he likes his colleagues, and he hasn't been to prison yet." Clearly, by stating the simple fact that C hasn't been to prison yet, which is a truism for most people, B is implying, suggesting, or meaning something else. He may mean that C is the kind of person who is likely to yield to temptation, or that C's colleagues are really very treacherous people, for example. In this case the implicature may be clear from the context of their conversation, or A may have to ask B what he means.

Grice distinguished between conventional and nonconventional implicature. The former refers to the case in which the conventional meaning of the words used determines what is implicated. In the latter, the implication does not follow directly from the conventional meaning of the words, but instead follows from context, or from the structure of the conversation, as is the case in conversational implicatures.

In conversational implicatures, the implied meaning relies upon the fact that the hearer assumes that the speaker is following the Gricean maxims. Let's begin with an example. A is standing by an immobilized car, and is approached by B. A says, "I am out of gas." B says, "There is a garage around the corner." Although B does not explicitly say it, she implicates effectively that she thinks that the garage is open and sells gasoline. This follows immediately from the assumption that B is following the Gricean maxims of relation and quality. If she were not following the maxim of relation, her utterance about the garage could be a non sequitur; if she were not following the maxim of quality, she could be lying. In order for a conversational implicature to occur, (1) the hearer must assume that the speaker is following the maxims, (2) this assumption is necessary for the hearer to get the implied meaning, and (3) it is common knowledge that the hearer can work out the implication.

Grice offers three types of conversational implicature. In the first, no maxim is violated, as in the above example. In the second, a maxim is violated, but the hearer assumes that the violation is because of a clash with another maxim. For example, if A asks, "Where does C live?" and B responds, "Somewhere in the South of France," A can presume that B does not know more, and thus violates the maxim of quantity in order to obey the maxim of quality. Finally, in the third type of conversational implicature, a maxim is flouted, and the hearer assumes that there must be another reason for it.


For example, when a recommendation letter says very little about the candidate in question, the maxim of quantity is flouted, and the reader can safely assume that there is very little positive to say.

We give some examples of commonly occurring conversational implicatures. Humans often use an if statement to implicate an if and only if statement. Suppose A says to B, "If you teach me speech act theory I'll kiss you." In this case, if A didn't mean if and only if, then A might kiss B whether or not B teaches A speech act theory. Then A would have been violating the maxim of quantity, telling B something that did not contain any useful information.

In another common case, people often make a direct statement as a way to implicate that they believe the statement. When A says to B, "Austin was right," B is meant to infer, "A believes Austin was right." Otherwise, A would have been violating the maxim of quality.

Finally, humans use a presupposition to implicate that the presupposition is true. When A says to B, "Grice's maxims are incomplete," A intends B to assume that Grice has maxims. Otherwise, A would have been violating the maxim of quality.

Note that conversational implicatures enable indirect speech acts. Consider the classic Eddie Murphy skit in which his mother says to him, "It's cold in here, Eddie." Although her utterance is on the surface merely an informational locution, it is in fact implicating a request for Eddie to do something to warm up the room.

7.3.3 A game theoretic view of speech acts

The discussion of speech acts so far has clearly been relatively discursive and informal as compared to the discussion in the other sections, and indeed to most of the book. This reflects the nature of the work in the field. There are advantages to the relative laxness; it enables a very broad and multi-faceted theory. Indeed, quite a number of researchers and practitioners in several disciplines have drawn inspiration from speech-act theory. But it also comes at a price, as the theory can be pushed only so far before the missing details halt progress.

One could look in a number of directions for such formal foundations. Since the definition of speech acts appeals to the mental state of the speaker and hearer, one could plausibly try to apply the formal theories of mental state discussed earlier in the book, and in particular theories of attitudes such as belief, desire and intention. Section 13.4 outlines one such theory, but also makes it clear that so-called BDI theories are not yet fully developed. Here we will explore a different direction. Our starting point is the fact that there are at least two agents involved in communication, the speaker and the hearer. So why not model this as a game between them, in the sense of game theory, and analyze that game?

Although this direction too is not yet well developed, we shall see that some insights can be gleaned from the game theoretic perspective. We illustrate this via the phenomenon of disambiguation in language. One of the factors that render natural language understanding so hard is that speech is rife with ambiguities at all levels, from the phonemic through the lexical to the sentence and whole text level. We will analyze the following sentence-level ambiguity:


Every ten minutes a person gets mugged in New York City.

The intended interpretation is of course that every ten minutes some different person gets mugged. The unintended, but still permissible, alternative interpretation is that the same person gets mugged over and over again (indeed, if one adds the sentence "I feel very bad for him", the implausible interpretation becomes the only permissible one). How do the hearer and speaker implicitly understand which interpretation is intended?

One way is to set this up as a common-payoff game of incomplete information between the speaker and hearer (indeed, as we shall see, in this example we end up with a signaling game as defined in Section 7.2, albeit a purely cooperative one). The game proceeds as follows:

1. There exist two situations:

s: Muggings of different people take place at ten-minute intervals in NYC.

t: The same person is repeatedly mugged every ten minutes in NYC.

2. Nature selects between s and t according to a distribution known commonly to A and B.

3. Nature’s choice is revealed to A but not to B.

4. A decides between uttering one of three possible sentences:

p: “Every ten minutes a person gets mugged in New York City.”

q: “Every ten minutes some person or another gets mugged in New York City.”

r: “There is a person who gets mugged every ten minutes in New York City.”

5. B hears A, and must decide whether s or t obtains.

This is a simplified view of the world (more on this shortly), but let us simplify it even further. Let us assume that A cannot utter r when s obtains, and cannot utter q when t obtains (that is, he can be ambiguous, but not deceptive). Let us furthermore assume that when B hears either q or r he has no interpretation decision, and knows exactly which situation obtains (s or t, respectively).

In order to analyze the game, we must supply some numbers. Let us assume that the probability of s is much higher than that of t. Say, P(s) = .99 and P(t) = .01. Finally, we need to decide on the payoffs. We assume that this is a game of pure coordination, that is, a common-payoff game. A and B jointly have the goal that B reach the right interpretation. In addition, though, both A and B have a preference for simple sentences, since long sentences place a cognitive burden on them and waste time. And so the payoffs are as follows: if the sentence used is p and a correct interpretation is reached, the payoff is +10; if either q or r is uttered (after which by assumption a correct interpretation is reached), the payoff is +7; and if an incorrect interpretation is reached the payoff is -10.

The resulting game is depicted in Figure 7.6.


[Figure 7.6  Communication as a signaling game. Nature selects s with probability 0.99 or t with probability 0.01 and reveals its choice to A. In s, A can say q (B then infers s; payoff +7) or say p; in t, A can say r (B then infers t; payoff +7) or say p. After hearing p, B must infer s or t: a correct inference yields +10 and an incorrect one yields -10.]

What are the equilibria of this game? Here are two:

1. A's strategy: say q in s and r in t. B's strategy: when hearing p, select between the s and t interpretations with equal probability.

2. A's strategy: say p in s and r in t. B's strategy: when hearing p, select the s interpretation.

First, you should persuade yourself that these are in fact Nash equilibria (and even subgame-perfect ones, or, to be more precise since this is a game of imperfect information, sequential equilibria). But then we might ask: is there a reason to prefer one over the other? Well, one way to select is based on the expected payoff to the players. After all, this is a cooperative game, and it makes sense to expect the players to coordinate on the equilibrium with the highest payoff. Indeed, this would be one way to implement the Cooperative Principle of Grice. Note that in the first equilibrium the (common) payoff is +7, while in the second equilibrium the expected payoff is .99 × +10 + .01 × +7 = +9.97. And so it would seem that we have a winner on our hands, and a particularly pleasing one, since this use of language accords well with real-life usage. Intuitively, to economize we use shorthand for commonly occurring situations, which allows the hearer to make some default assumptions, but use more verbose language in the relatively rare situations in which those defaults are misleading.
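The payoff comparison above is easy to check mechanically. The sketch below (not part of the original text; the representation is ours) evaluates the common expected payoff of the two equilibria under the stated probabilities and payoffs.

P = {'s': 0.99, 't': 0.01}  # Nature's distribution over situations

def payoff(situation, sentence, interpretation):
    # q and r are unambiguous but verbose (+7); p yields +10 on a correct
    # interpretation and -10 on an incorrect one.
    if sentence in ('q', 'r'):
        return 7
    return 10 if interpretation == situation else -10

def expected_payoff(a_strategy, b_after_p):
    """a_strategy: situation -> sentence; b_after_p: interpretation -> probability,
    used only when A says the ambiguous sentence p."""
    total = 0.0
    for situation, prob in P.items():
        sentence = a_strategy[situation]
        if sentence == 'p':
            total += prob * sum(bp * payoff(situation, 'p', interp)
                                for interp, bp in b_after_p.items())
        else:
            total += prob * payoff(situation, sentence, situation)
    return total

eq1 = expected_payoff({'s': 'q', 't': 'r'}, {'s': 0.5, 't': 0.5})
eq2 = expected_payoff({'s': 'p', 't': 'r'}, {'s': 1.0, 't': 0.0})
print(eq1, eq2)  # approximately 7.0 and 9.97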

This example can be extended in various ways. A can be given the freedom to say other sentences, and B can be given greater freedom to interpret them. Not only could A say q in s, but A could even say "I like cucumbers" in s. This is no less useful a sentence than p, so long as B conditions its interpretation correctly on it. The problem is of course that we end up with infinitely many good equilibria, and payoff-maximization cannot distinguish between them.


And so language can be seen to have evolved so as to provide focal points among these equilibria; the "straightforward interpretation" of the sentence is a device to coordinate on one of the optimal equilibria.

Although we are still far from being able to account for the entire pragmatics of language in this fashion, one can apply similar analysis to more complex linguistic phenomena, and it remains an interesting area of investigation.

7.3.4 Applications

The framework of speech act theory has been put to practical use in a number of computer science and artificial intelligence applications. We give a brief description of some of these applications below.

Intelligent dialog systems

One obvious application of speech act theory is a dialog system, which communicates with human users through a natural language dialog interface. In order to communicate efficiently and naturally with the user, dialog systems must obey the principles of human conversation, including those from Austin and Grice presented in this chapter.

The TRAINS/TRIPS project at the University of Rochester is one of the best known dialog systems. The system is designed to assist the user in accomplishing tasks in a transportation domain. The system has access to information about the state of the transportation network, and the user makes decisions about what actions to take. The system maintains an ongoing conversation with the user about possible actions and the state of the network.

Discourse Level        Act Type       Sample Acts
Multi-Discourse Acts   Argumentation  elaborate, summarize, clarify, convince
Discourse              Speech Acts    inform, accept, request, suggest, offer, promise
Utterance Acts         Grounding      initiate, continue, acknowledge, repair
Sub-Utterance Acts     Turn-Taking    take-turn, keep-turn, release-turn, assign-turn

Table 7.1  Conversation acts used by the TRAINS/TRIPS system.

The TRAINS/TRIPS dialog system both uses and extends the principles of speech act theory. It incorporates a Speech Act Interpreter, which hypothesizes what speech acts the user is making, and a Dialog Manager, which uses knowledge of those acts to maintain the dialogue. It extends speech act theory by creating a hierarchy of conversation acts, as shown in Table 7.1. As you can see, speech acts appear in this framework as the conversation acts that occur at the discourse level.


Workflow systems

Another useful application of speech act theory is in workflow software, software used to track and manage complex interactions within and between human organizations. These interactions range from simple business transactions to long-term collaborative projects, and each requires the involvement of many different human participants. To track and manage the interactions effectively, workflow software provides a medium for structured communications between all of the participants.

Many workflow applications are designed around an information processing framework, in which, for example, interactions may be modeled as assertions and queries to a database. This perspective is useful, but lacks an explicit understanding and representation of the pragmatic structure of human communications. An alternative is to view each communication as an illocutionary speech act, which states an intention on the part of the sender, and places constraints on the possible responses of the recipient. Instead of generic messages, as in the case of email communications, users must choose from a set of communication types when composing messages to other participants. Within this framework, they can write freely. For example, when responding to a request, users might be given the following options.

• Acknowledge

• Promise

• Free-Form

• Counter-offer

• Commit-to-commit

• Decline

• Interim-report

• Report-completion

The speech act framework confers a number of advantages to developers and users of workflow software. Because the basic unit of communication is a conversation, rather than a message, the organization of communications is straightforward, and retrieval simple. Furthermore, the status and urgency of messages is clear. Users can ask "In which conversations is someone waiting for me to do something?" or "In which conversations have I promised to do things?". Finally, access to messages can be organized and controlled easily, depending on project involvement and authorization levels. The downside is that it involves additional overhead in the communication, which may not be justified by the benefits, especially if the conversational structures implemented in the system do not capture well the rich set of communications that takes place in the workplace.

Agent communication languages

Perhaps the most widespread use of speech act theory within the field of computer science is for communication between software applications. Increasingly, computer systems are structured in such a way that individual applications can act as agents (i.e., with the popularization of the Internet and electronic commerce), each with its own goals and planning mechanisms.


In such a system, the software applications must communicate with each other, and with their human users: to enlist the support of other agents to achieve goals, to commit to helping another agent, to report their own status, to request a status report from another, etc.

Not surprisingly, several proposals have been made for artificial languages to serve as the medium for this inter-application communication. A relatively simple example is presented by the Knowledge Query and Manipulation Language (KQML), which was developed in the early 1990s by the External Interfaces Working Group. KQML incorporates some of the ideas from speech act theory, especially the idea of performatives. It has a built-in set of performatives, such as ACHIEVE, ADVERTISE, BROKER, REGISTER, TELL, etc.

The following is an example of a KQML message, taken from a communication between two applications that are operating in the blocks world domain.

(tell
  :sender    Agent1
  :receiver  Agent2
  :language  KIF
  :ontology  Blocks-World
  :content   (AND (Block A) (Block B) (On A B)))

Note that the message is a performative. The content of the message uses blocks world semantics, which are completely independent of the semantics of the performative itself.

Since that time, with the advent of abstract markup languages such as XML and the so-called Semantic Web, these ideas continue to influence the various standards being developed.

Rational programming

In the previous section we described how speech act theory is used in communication between software applications. Some authors have proposed to use it directly in the development of the software applications, that is, as part of the programming language itself. This proposal is part of a more general effort to introduce elements of rationality into programming languages. This new programming paradigm has been termed rational programming. Just as object-oriented programming shifted the paradigm from writing procedures to creating objects, rational programming shifts the paradigm from creating informational objects to creating motivational agents.

So where does communication come in? The motivational agents created by rational programming must act in the world, and because the agents are not likely to have a physical embodiment, their actions consist of sending and receiving signals; in other words, their actions will be speech acts. Of course, as shown in the previous section, it is possible to construct communicating agents within existing programming paradigms. However, by incorporating speech acts as primitives, rational programming constructs make such programs more powerful, easier to create, and more readable.

We give a few examples for clarity. Elephant2000 is a programming language described by McCarthy which explicitly incorporates speech acts. Thus, for example, an Elephant2000 program can make a promise to another, and cannot renege on a promise.


The following is an example statement from Elephant2000, taken from a hypothetical travel agency program.

if ¬full(flight)
then accept.request(
       make(commitment(admit(psgr, flight))))

The intuitive reading of this statement is "if a passenger has requested to reserve a seat on a given flight, and that flight is not full, then make the reservation."

Agent Oriented Programming (AOP) is a separate proposal that is similar in several respects. It too embraces speech acts as the form and meaning of the communication among agents. The most significant difference from Elephant2000 is that AOP also embraces the notion of mental state, consisting of beliefs and commitments. Thus the result of an inform speech act is a new belief. AOP is not actually a single language, but a general design that allows multiple languages; one particular simple language, Agent0, was defined and implemented. The following is an example statement in Agent0, taken from a print server application.

IF
  MSG COND:    (?msgId ?someone REQUEST ((FREE-PRINTER 5min) ?time))
  MENTAL COND: ((NOT (CMT ?other (PRINT ?doc (?time+10min))))
                (B (FRIENDLY ?someone)))
THEN COMMIT
  ((?someone (FREE-PRINTER 5min) ?time)
   (myself (INFORM ?someone (ACCEPT ?msgId)) now))

The approximate reading of this statement is "if you get a request to free the printer for five minutes at a future time, if you're not committed to finishing a print job within ten minutes of that time, and if you believe the requester to be friendly, then accept the request and tell them that you did."

7.4 History and references

The literature on language and natural language understanding is of course vast and we cannot do it justice here. We will focus on the part of the literature that bears most directly on the material presented in the chapter.

Two early seminal discussions on cheap talk are due to Crawford and Sobel [1982] and Farrell [1987]. Later references include Rabin [1990] and Farrell [1993]. Good overviews are given by Farrell [1995] and Farrell and Rabin [1996].

The literature on signaling games dates considerably farther back. The Stackelberg leadership model, later couched in game-theoretic terminology (as we do in the book), was introduced by Heinrich von Stackelberg, a German economist, as a model of duopoly in economics [von Stackelberg, 1934]. The literature on information economics, and in particular on asymmetric information, continued to flourish, culminating in the 2001 Nobel Prize in Economic Sciences awarded to three pioneers in the area (Akerlof, Spence, and Stiglitz).


The Spence signaling game, which we cover in the chapter, appeared in [Spence, 1973].

Austin's seminal book, How to Do Things with Words, was published in 1962 [Austin, 1962], but is also available in a more recent second edition [Austin, 2006]. Grice's ideas were developed in several publications starting in the late 1960s, for example [Grice, 1969]. This and many other of his relevant publications were collected in [Grice, 1989]. Another important reference is [Searle, 1979].

The game theoretic perspective on speech acts is more recent. The discussion here for the most part follows Parikh [2001]. Another recent reference covering a number of issues at the interface of language and economics is [Rubinstein, 2000].

The TRAINS dialogue system is described by Allen et al. [1995], and the TRIPS system is described by Ferguson and Allen [1998]. The speech-act-based approach to workflow systems follows the ideas of Winograd and Flores [1986] and Flores et al. [1988]. The KQML language is described by Finin et al. [1997].

The term rational programming was coined by Shoham [1997]. Elements of the Elephant 2000 programming language are described by McCarthy [1994]. The Agent Oriented Programming (AOP) framework was described by Shoham [1993].


8 Aggregating Preferences: Social Choice

In the preceding chapters we adopted what might be called the "agent perspective": we asked what an agent believes or wants, and how an agent should or would act in a given situation. We now adopt a complementary, "designer perspective": we ask what rules should be put in place by the authority (the "designer") orchestrating a set of agents. In this chapter this will take us away from game theory, but in the end (the next two chapters) it will bring us right back to it.

8.1 Introduction

A simple example of the designer perspective is voting. How should a central authority pool the preferences of different agents so as to best reflect the wishes of the population as a whole? It turns out that voting, the kind familiar from our political and other institutions, is only a special case of the general class of social choice problems. Social choice is a motivational but non-strategic theory—agents have preferences, but do not try to camouflage them in order to manipulate the outcome (of voting, for example) to their personal advantage.1 This problem is thus analogous to the problem of belief fusion that we present in Section 13.2.1, which is also non-strategic; here, however, we examine the problem of aggregating preferences rather than beliefs.

We start with a brief and informal discussion of the most familiar voting scheme, plurality. We then give the formal model of social choice theory, consider other voting schemes, and present two seminal results about the sorts of preference aggregation rules that it is possible to construct. Finally, we consider the problem of building ranking systems, where agents rate each other.

8.1.1 Example: plurality voting

To get a feel for social choice theory, consider an example in which you are babysitting three children—Will, Liam, Vic—and need to decide on an activity for them. You can choose among going to the video arcade (a), playing basketball (b), and driving around in a car (c). Each kid has a different preference over these activities, which is represented as a strict total ordering over the activities, and which he reveals to you truthfully.

1. Some sources use the term "social choice" to refer to both strategic and non-strategic theories; we do not follow that usage here.


By a ≻ b we denote the proposition that outcome a is preferred to outcome b.

Will:  a ≻ b ≻ c
Liam:  b ≻ c ≻ a
Vic:   c ≻ b ≻ a

What should you do? One straightforward approach would be to ask each kid to vote for his favorite activity, and then to pick the activity that received the largest number of votes. This amounts to what is called the plurality method. While quite standard, this method is not without problems. For one thing, we need to select a tie-breaking rule (for example, we could select the candidate ranked first alphabetically). A more disciplined way is to hold a runoff election among the candidates tied at the top.

Even absent a tie, however, the method is vulnerable to the criticism that it does not meet the Condorcet condition. This condition states that if there exists a candidate x such that for all other candidates y at least half the voters prefer x to y, then x must be chosen. If each child votes for his top choice, the plurality method would declare a tie between all three candidates, and in our example would choose a (say, by breaking the tie alphabetically). However, the Condorcet condition would rule out both a (since in two of the preferences b is preferred to a) and c (since in two of the preferences b is preferred to c).

Based on this example the Condorcet rule might seem unproblematic (and actually useful, since it breaks the tie without resorting to an arbitrary choice such as alphabetical ordering), but now consider a similar example in which the preferences are as follows.

Will:  a ≻ b ≻ c
Liam:  b ≻ c ≻ a
Vic:   c ≻ a ≻ b

In this case the Condorcet condition does not tell us what to do, illustrating the fact that it does not tell us how to aggregate arbitrary sets of preferences. We'll return to the question of what properties can be guaranteed in social choice settings; for the moment, we aim simply to illustrate that social choice is not a straightforward matter. In order to study it precisely, we must establish a formal model. Our definition will cover voting, but will also handle more general situations in which agents' preferences must be aggregated.
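A quick mechanical check makes the difference between the two profiles vivid. The following minimal sketch (not part of the original text; the helper is ours) looks for a Condorcet winner (an outcome that at least half of the agents prefer to every other outcome) in each of the two babysitting profiles.

def condorcet_winner(outcomes, profile):
    """Return a Condorcet winner if one exists, else None. profile is a list of
    rankings, each ordered from most to least preferred. With the weak
    ("at least half") condition, ties could admit several such outcomes;
    this sketch simply returns the first one found."""
    def count(x, y):  # number of agents preferring x to y
        return sum(r.index(x) < r.index(y) for r in profile)
    for x in outcomes:
        if all(count(x, y) >= count(y, x) for y in outcomes if y != x):
            return x
    return None

profile1 = [['a', 'b', 'c'], ['b', 'c', 'a'], ['c', 'b', 'a']]  # first example
profile2 = [['a', 'b', 'c'], ['b', 'c', 'a'], ['c', 'a', 'b']]  # second example
print(condorcet_winner(['a', 'b', 'c'], profile1))  # b
print(condorcet_winner(['a', 'b', 'c'], profile2))  # None: the preferences form a cycle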

8.2 A formal model

Let N = {1, 2, . . . , n} denote a set of agents, and let O denote a finite set of outcomes (or alternatives, or candidates). Making a multiagent extension to the preference notation introduced in Section 2.1.2, denote the proposition that agent i prefers outcome o1 to outcome o2 by o1 ≻i o2. Because preferences are transitive, an agent's preference relation induces a preference ordering, a total ordering on O. To simplify the exposition, we will assume for most of this chapter that an agent's preferences are strict, though none of the results we present require this assumption. Thus, let L be the set of strict total orders; we will understand each agent's preference ordering as an element of L. Overloading notation, we denote an element of L using the same symbol we used for the relational operator: ≻i ∈ L.


Likewise, we define a preference profile ≻ ∈ L^n as a preference ordering for each agent.

Note that the arguments in Section 2.1.2 and Appendix ?? show that preference orderings and utility functions are tightly related. We can define an ordering ≻i ∈ L in terms of a given utility function ui : O → R for an agent i by requiring that o1 is preferred to o2 if and only if ui(o1) > ui(o2).

In what follows, we define two kinds of social functions. In both cases, the input is a preference profile. Both classes of functions aggregate these preferences, but in a different way.

8.2.1 Social choice functions

Social choice functions simply select one of the alternatives (or, in a more general version, some subset).

Definition 8.2.1 (Social choice function) A social choice function (over N and O) is a function C : L^n → O.

A social choice correspondence C : L^n → 2^O differs from a social choice function only in that it can return a set of candidates, instead of just a single one.

In our babysitting example there were three agents (Will, Liam, and Vic), and three possible outcomes (a, b, c). The social choice correspondence defined by plurality voting of course picks the subset of candidates with the most votes; in this example the subset must either be the singleton consisting of one of the candidates, or else it must include all candidates. Plurality is turned into a social choice function by any deterministic tie-breaking rule (for example, alphabetical).2

Let #(oi ≻ oj) denote the number of agents who prefer outcome oi to outcome oj under preference profile ≻. We can now give a formal statement of the Condorcet condition.

Definition 8.2.2 (Condorcet winner) An outcome o ∈ O is the Condorcet winner if ∀o′ ∈ O, #(o ≻ o′) ≥ #(o′ ≻ o).

A social choice function satisfies the Condorcet condition if it always picks a Condorcet winner when one exists. We saw above that for some sets of preferences there does not exist a Condorcet winner. (Indeed, under reasonable conditions the probability that there will exist a Condorcet winner approaches zero as the number of candidates approaches infinity.) Thus, the Condorcet condition does not always tell us anything about which outcome to choose.

An alternative is to find a rule that identifies a set of outcomes among which we can choose. Extending upon the idea of the Condorcet condition, a variety of other conditions have been proposed that are guaranteed to identify a non-empty set of outcomes. We will not describe such rules in detail; however, we give one prominent example here.

Definition 8.2.3 (Smith set) The Smith set is the smallest set S ⊆ O having the property that ∀o ∈ S, ∀o′ ∉ S, #(o ≻ o′) ≥ #(o′ ≻ o).

2. One can also define probabilistic versions of social choice functions; however, we will focus on the deterministic variety.

That is, every outcome in the Smith set is preferred by at least half of the agents to every outcome outside the set. This set always exists. When there is a Condorcet winner then that candidate is also the only member of the Smith set; otherwise, the Smith set is the set of candidates who participate in a "stalemate" (or "top cycle").
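For small examples the Smith set can be found by brute force. The sketch below (not part of the original text) enumerates candidate sets in order of increasing size and returns the first one satisfying Definition 8.2.3; with an even number of agents, pairwise ties can make the smallest such set non-unique, in which case the sketch simply returns one of them.

from itertools import combinations

def smith_set(outcomes, profile):
    """Smallest S such that every o in S is preferred to every o' outside S
    by at least half of the agents. profile: list of rankings, most preferred first."""
    def count(x, y):  # number of agents preferring x to y
        return sum(r.index(x) < r.index(y) for r in profile)
    for size in range(1, len(outcomes) + 1):
        for S in combinations(outcomes, size):
            outside = [o for o in outcomes if o not in S]
            if all(count(o, o2) >= count(o2, o) for o in S for o2 in outside):
                return set(S)

profile1 = [['a', 'b', 'c'], ['b', 'c', 'a'], ['c', 'b', 'a']]
profile2 = [['a', 'b', 'c'], ['b', 'c', 'a'], ['c', 'a', 'b']]
print(smith_set(['a', 'b', 'c'], profile1))  # {'b'}: just the Condorcet winner
print(smith_set(['a', 'b', 'c'], profile2))  # {'a', 'b', 'c'}: a top cycle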

8.2.2 Social welfare functions

Our second flavor of social function is the social welfare function. These are similar to social choice functions, but produce a richer object, a total ordering on the set of alternatives.

Definition 8.2.4 (Social welfare function) A social welfare function (over N and O) is a function W : L^n → L.

Social welfare functions take preference profiles as input; denote the preference ordering selected by the social welfare function W, given preference profile ≻′, as ≻W(≻′). When the input ordering ≻′ is understood from context, we abbreviate our notation for the social ordering as ≻W.

Social welfare functions can also be strengthened by allowing them to both input and output weak total orderings on the set of alternatives. That is, as we suggested above, both the agents' rankings and the social ranking can be total preorders, or rankings on the alternatives that can include ties. Essentially everything we will say about social welfare functions can be extended to this case. However, we nevertheless restrict ourselves to the case of total orders in order to simplify the exposition.

8.3 Voting

We now survey some of the most important voting methods, and discuss their properties. Then we demonstrate that the problem of voting is not as easy as it might appear, by showing some counterintuitive ways in which these methods can behave.

8.3.1 Voting methods

The most standard class of voting methods is called non-ranking voting, in which each agent votes for one of the candidates. We have already discussed plurality voting.

Definition 8.3.1 (Plurality voting) Each voter casts a single vote. The candidate with the most votes is selected.

As discussed above, ties must be broken according to a tie-breaking rule (e.g., based on a lexicographic ordering of the candidates; through a runoff election between the first-place candidates, etc.). Since the issue arises in the same way for all the voting methods we discuss, we will not belabor it in what follows.

Plurality voting gives each voter a very limited way of expressing his preferences. Various other rules are more generous in this regard. Consider cumulative voting.


Definition 8.3.2 (Cumulative voting) Each voter is given k votes, which can be cast arbitrarily (e.g., several votes could be cast for one candidate, with the remainder of the votes being distributed across other candidates). The candidate with the most votes is selected.

Approval voting is similar.

Definition 8.3.3 (Approval voting) Each voter can cast a single vote for as many of the candidates as he wishes; the candidate with the most votes is selected.

We have presented cumulative voting and approval voting to give a sense of the range of voting methods. We will defer discussion of such rules to Section 8.5, however, since in the (non-strategic) voting setting as we have defined it so far, it is not clear how agents should choose when to vote for more than one candidate. Furthermore, although it is more expressive than plurality, approval voting still fails to allow voters to express their full preference orderings. Voting methods that do so are called ranking voting methods. Among them, one of the best known is plurality with elimination; e.g., this method is used for some political elections. When preference orderings are elicited from agents before any elimination has occurred, the method is also known as instant runoff.

Definition 8.3.4 (Plurality with elimination) Each voter casts a single vote for their most-preferred candidate. The candidate with the fewest votes is eliminated. Each voter who cast a vote for the eliminated candidate casts a new vote for the candidate he most prefers among the candidates that have not been eliminated. This process is repeated until only one candidate remains.

Another method which has been widely studied is Borda voting.

Definition 8.3.5 (Borda voting) Each voter submits a full ordering on the candidates. This ordering contributes points to each candidate; if there are n candidates, it contributes n−1 points to the highest ranked candidate, n−2 points to the second highest, and so on; it contributes no points to the lowest ranked candidate. The winners are those whose total sum of points from all the voters is maximal.

Nanson's method is a variant of Borda that eliminates the candidate with the lowest Borda score, recomputes the remaining candidates' scores, and repeats. This method has the property that it always chooses a member of the Condorcet set if it is nonempty, and otherwise chooses a member of the Smith set.

Finally, there is pairwise elimination.

Definition 8.3.6 (Pairwise elimination) In advance, voters are given a schedule for the order in which pairs of candidates will be compared. Given two candidates (and based on each voter's preference ordering) determine the candidate that each voter prefers. The candidate who is preferred by a minority of voters is eliminated, and the next pair of non-eliminated candidates in the schedule is considered. Continue until only one candidate remains.
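The ranked-ballot rules above are straightforward to implement, and the paradoxes of the next section can be reproduced by running implementations like the following on the relevant profiles. This is a minimal sketch (not part of the original text); ballots are represented as (count, ranking) pairs and ties are broken alphabetically, as suggested earlier.

def plurality(outcomes, ballots):
    """ballots: list of (count, ranking) pairs; each ranking lists candidates
    from most to least preferred."""
    votes = {o: 0 for o in outcomes}
    for count, ranking in ballots:
        votes[ranking[0]] += count
    return max(sorted(outcomes), key=lambda o: votes[o])  # alphabetical tie-break

def borda(outcomes, ballots):
    n = len(outcomes)
    score = {o: 0 for o in outcomes}
    for count, ranking in ballots:
        for position, o in enumerate(ranking):
            score[o] += count * (n - 1 - position)
    return max(sorted(outcomes), key=lambda o: score[o])

def plurality_with_elimination(outcomes, ballots):
    remaining = set(outcomes)
    while len(remaining) > 1:
        votes = {o: 0 for o in remaining}
        for count, ranking in ballots:
            top = next(o for o in ranking if o in remaining)
            votes[top] += count
        # eliminate the candidate with the fewest current votes (ties broken alphabetically)
        remaining.remove(min(sorted(remaining), key=lambda o: votes[o]))
    return remaining.pop()

# The babysitting profile from Section 8.1.1:
ballots = [(1, ['a', 'b', 'c']), (1, ['b', 'c', 'a']), (1, ['c', 'b', 'a'])]
print(plurality(['a', 'b', 'c'], ballots))                   # a (three-way tie, broken alphabetically)
print(borda(['a', 'b', 'c'], ballots))                       # b
print(plurality_with_elimination(['a', 'b', 'c'], ballots))  # b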


8.3.2 Voting paradoxes

At this point it would be reasonable to wonder why so many voting schemes have been invented. What are their strengths and weaknesses? For that matter, is there one voting method that is appropriate for all circumstances? We will give a more formal (and more general) answer to the latter question in Section 8.4. First, however, we will consider the first question by considering some sets of preferences for which our voting methods exhibit undesirable behavior. Our aim is not to point out every problem that exists with every voting method defined above; rather, it is to illustrate the fact that voting schemes that seem reasonable can often fail in surprising ways.

Condorcet condition

Let's start by revisiting the Condorcet condition. Earlier, we saw two examples: one in which plurality voting chose the Condorcet winner, and another in which a Condorcet winner did not exist. Now consider a situation in which there are 1000 agents with three different sorts of preferences:

499 agents:  a ≻ b ≻ c
  3 agents:  b ≻ c ≻ a
498 agents:  c ≻ b ≻ a

Observe that 501 people out of 1000 prefer b to a, and 502 prefer b to c; this makes b the Condorcet winner. However, many of our voting methods would fail to select b as the winner. Plurality would pick a, as a has the largest number of first-place votes. Plurality with elimination would first eliminate b, and would subsequently pick c as the winner. In this example Borda does select b, but there are other cases where it fails to select the Condorcet winner—can you construct one?
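These claims can be confirmed by tallying the ballots directly, as in the following minimal sketch (not part of the original text).

ballots = [(499, ['a', 'b', 'c']), (3, ['b', 'c', 'a']), (498, ['c', 'b', 'a'])]

def prefer(x, y):  # number of voters preferring x to y
    return sum(n for n, r in ballots if r.index(x) < r.index(y))

# b is the Condorcet winner: it beats a 501-499 and beats c 502-498.
print(prefer('b', 'a'), prefer('b', 'c'))       # 501 502

# Plurality: first-place votes are a:499, b:3, c:498, so a wins.
first = {o: sum(n for n, r in ballots if r[0] == o) for o in 'abc'}
print(first)                                    # {'a': 499, 'b': 3, 'c': 498}

# Plurality with elimination: b has the fewest first-place votes and is
# eliminated; its 3 supporters transfer to c, which then beats a 501-499.
print(first['c'] + first['b'], first['a'])      # 501 499

# Borda (2/1/0 points per ballot): a=998, b=1003, c=999, so Borda does pick b here.
borda = {o: sum(n * (2 - r.index(o)) for n, r in ballots) for o in 'abc'}
print(borda)                                    # {'a': 998, 'b': 1003, 'c': 999}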

Sensitivity to a losing candidate

Consider the following preferences by 100 agents:

35 agents:  a ≻ c ≻ b
33 agents:  b ≻ a ≻ c
32 agents:  c ≻ b ≻ a

Plurality would pick candidate a as the winner, as would Borda. (To confirm the latter claim, observe that Borda assigns a, b and c the scores 103, 98 and 99 respectively.) However, if the candidate c didn't exist, then plurality would pick b, as would Borda. (With only two candidates, Borda is equivalent to plurality.) A third candidate who stands no chance of being selected can thus act as a "spoiler", changing the selected outcome.

Another example demonstrates that the inclusion of a least-preferred candidate can even cause the Borda method to reverse its ordering on the other candidates:

3 agents:  a ≻ b ≻ c ≻ d
2 agents:  b ≻ c ≻ d ≻ a
2 agents:  c ≻ d ≻ a ≻ b


Given these preferences, the Borda method ranks the candidates c ≻ b ≻ a ≻ d, with scores of 13, 12, 11 and 6 respectively. If the lowest-ranked candidate d is dropped, however, the Borda ranking is a ≻ b ≻ c with scores of 8, 7 and 6.

Sensitivity to the agenda setter

Finally, we examine the pairwise elimination method, and consider the influence that the agenda setter can have on the selected outcome. Consider the following preferences, which we discussed previously:

35 agents:  a ≻ c ≻ b
33 agents:  b ≻ a ≻ c
32 agents:  c ≻ b ≻ a

First, consider the order a, b, c. a is eliminated in the pairing between a and b; then c is chosen in the pairing between b and c. Second, consider the order a, c, b. a is chosen in the pairing between a and c; then b is chosen in the pairing between a and b. Finally, under the order b, c, a, we first eliminate b, and ultimately choose a. Thus, given these preferences, the agenda setter can select whichever outcome he wants by selecting the appropriate elimination order!

Next, consider the following preferences:

1 agent:  b ≻ d ≻ c ≻ a
1 agent:  a ≻ b ≻ d ≻ c
1 agent:  c ≻ a ≻ b ≻ d

Consider the elimination ordering a, b, c, d. In the pairing between a and b, a is preferred; c is preferred to a and then d is preferred to c, leaving d as the winner. However, all of the agents prefer b to d—the selected candidate is Pareto-dominated!
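The agenda-setting effect is easy to reproduce in code. Below is a minimal sketch (not part of the original text) of pairwise elimination for a fixed agenda, applied to the two profiles above.

def pairwise_elimination(agenda, ballots):
    """agenda: candidates in the order in which they enter the pairwise comparisons.
    ballots: list of (count, ranking) pairs. The majority winner of each pairing
    survives (the incumbent survives a tie)."""
    def prefer(x, y):  # number of voters preferring x to y
        return sum(n for n, r in ballots if r.index(x) < r.index(y))
    winner = agenda[0]
    for challenger in agenda[1:]:
        if prefer(challenger, winner) > prefer(winner, challenger):
            winner = challenger
    return winner

profile_100 = [(35, ['a', 'c', 'b']), (33, ['b', 'a', 'c']), (32, ['c', 'b', 'a'])]
for agenda in (['a', 'b', 'c'], ['a', 'c', 'b'], ['b', 'c', 'a']):
    print(agenda, pairwise_elimination(agenda, profile_100))   # c, b, a respectively

profile_3 = [(1, ['b', 'd', 'c', 'a']), (1, ['a', 'b', 'd', 'c']), (1, ['c', 'a', 'b', 'd'])]
print(pairwise_elimination(['a', 'b', 'c', 'd'], profile_3))   # d, although every voter prefers b to d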

Last, we give an example showing that Borda is fundamentally different from pairwise elimination, regardless of the elimination ordering. Consider the preferences:

3 agents:  a ≻ b ≻ c
2 agents:  b ≻ c ≻ a
1 agent:   b ≻ a ≻ c
1 agent:   c ≻ a ≻ b

Regardless of the elimination ordering, pairwise elimination will select the candidate a. The Borda method, on the other hand, selects candidate b.

8.4 Existence of social functions

The previous section has illustrated several senses in which some popular voting methods exhibit undesirable or unfair behavior. In this section, we consider this state of affairs from a more formal perspective, examining both social welfare functions and social choice functions.


8.4.1 Social welfare functions

In this section we examine Arrow's impossibility theorem, without a doubt the most influential result in social choice theory. Its surprising conclusion is that fairness is multi-faceted, and that it is impossible to achieve all of these kinds of fairness simultaneously. First, let us review these notions of fairness.

Definition 8.4.1 (Pareto Efficiency (PE)) W is Pareto efficient if for any o1, o2 ∈ O, ∀i o1 ≻i o2 implies that o1 ≻W o2.

In words, Pareto efficiency means that when all agents agree on the ordering of two outcomes, the social welfare function must select that ordering. Observe that this definition is effectively the same as strict Pareto efficiency as defined in Definition 2.3.2.3

Definition 8.4.2 (Independence of Irrelevant Alternatives (IIA)) W is independent of irrelevant alternatives if, for any o1, o2 ∈ O and any two preference profiles ≻′, ≻′′ ∈ Ln, ∀i (o1 ≻′i o2 if and only if o1 ≻′′i o2) implies that (o1 ≻W(≻′) o2 if and only if o1 ≻W(≻′′) o2).

That is, the selected ordering between two outcomes should depend only on the relative orderings they are given by the agents.

Definition 8.4.3 (Non-dictatorship) W does not have a dictator if ¬∃i ∀o1, o2 (o1 ≻i o2 ⇒ o1 ≻W o2).

Non-dictatorship means that there does not exist a single agent whose preferences always determine the social ordering. We say that W is dictatorial if it fails to satisfy this property.

Surprisingly, it turns out that there exists no social welfare function W that satisfies these three properties for all of its possible inputs. This result relies on our previous assumption that N is finite.

Theorem 8.4.4 (Arrow, 1951) If |O| ≥ 3, any social welfare function W that is Pareto efficient and independent of irrelevant alternatives is dictatorial.

Proof. We will assume that W is both PE and IIA, and show that W must be dictatorial. The argument proceeds in four steps.

Step 1: If every voter puts an outcome b at either the very top or the very bottom of his preference list, b must be at either the very top or very bottom of ≻W as well.

Consider an arbitrary preference profile ≻ in which every voter ranks some b ∈ O at either the very bottom or very top, and assume for contradiction that the above claim is not true. Then, there must exist some pair of distinct outcomes a, c ∈ O for which a ≻W b and b ≻W c.

Now let's modify ≻ so that every voter moves c just above a in his preference ranking, and otherwise leaves the ranking unchanged; let's call this new preference

3. One subtle difference does arise from our assumption in this chapter that all preferences are strict.


(a) Preference profile ≻1    (b) Preference profile ≻2    (c) Preference profile ≻3    (d) Preference profile ≻4

Figure 8.1 The four preference profiles used in the proof of Arrow's theorem. A higher position along the dotted line indicates a higher position in an agent's preference ordering. The outcomes indicated in bold (i.e., b in profiles ≻1, ≻2 and ≻3 and a for voter n∗ in profiles ≻3 and ≻4) must be in the exact positions shown. (In profile ≻4, a must simply be ranked above c.) The outcomes not indicated in bold are simply examples, and can occur in any relative ordering that is consistent with the placement of the bold outcomes.

profile ≻′. We know from IIA that for a ≻W b or b ≻W c to change, the pairwise relationship between a and b and/or the pairwise relationship between b and c would have to change. However, since b occupies an extremal position for all voters, c can be moved above a without changing either of these pairwise relationships. Thus in profile ≻′ it is also the case that a ≻W b and b ≻W c. From this fact and from transitivity, we have that a ≻W c. However, in ≻′ every voter ranks c above a and so PE requires that c ≻W a. We have a contradiction.

Step 2: There is some voter n∗ who is extremely pivotal in the sense that by changing his vote at some profile, he can move a given outcome b from the bottom of the social ranking to the top.

Consider a preference profile ≻ in which every voter ranks b last, and in which preferences are otherwise arbitrary. By PE, W must also rank b last. Now let voters from 1 to n successively modify ≻ by moving b from the bottom of their rankings to the top, preserving all other relative rankings. Denote as n∗ the first voter whose change causes the social ranking of b to change. There clearly must be some such voter: when the voter n moves b to the top of his ranking, PE will require that b be


ranked at the top of the social ranking.

Denote by ≻1 the set of preferences just before n∗ moves b, and denote by ≻2 the set of preferences just after n∗ has moved b to the top of his ranking. (These sets of preferences are illustrated in Figures 8.1(a) and 8.1(b), with the indicated positions of outcomes a and c in each agent's ranking serving only as examples.) In ≻1, b is at the bottom in ≻W. In ≻2, b has changed its position in ≻W, and every voter ranks b at either the top or the bottom. By the argument from Step 1, in ≻2 b must be ranked at the top of ≻W.

Step 3: n∗ (the agent who is extremely pivotal on outcome b) is a dictator over any pair ac not involving b.

We begin by choosing one element from the pair ac; without loss of generality, let's choose a. We'll construct a new preference profile ≻3 from ≻2 by making two changes. (Profile ≻3 is illustrated in Figure 8.1(c).) First, we move a to the top of n∗'s preference ordering, leaving it otherwise unchanged; thus a ≻n∗ b ≻n∗ c. Second, we arbitrarily rearrange the relative rankings of a and c for all voters other than n∗, while leaving b in its extremal position.

In ≻1 we had a ≻W b, as b was at the very bottom of ≻W. When we compare ≻1 to ≻3, relative rankings between a and b are the same for all voters. Thus, by IIA, we must have a ≻W b in ≻3 as well. In ≻2 we had b ≻W c, as b was at the very top of ≻W. Relative rankings between b and c are the same in ≻2 and ≻3. Thus in ≻3, b ≻W c. Using the two above facts about ≻3 and transitivity, we can conclude that a ≻W c in ≻3.

Now construct one more preference profile, ≻4, by changing ≻3 in two ways. First, arbitrarily change the position of b in each voter's ordering while keeping all other relative preferences the same. Second, move a to an arbitrary position in n∗'s preference ordering, with the constraint that a remains ranked higher than c. (Profile ≻4 is illustrated in Figure 8.1(d).) Observe that all voters other than n∗ have entirely arbitrary preferences in ≻4, while n∗'s preferences are arbitrary except that a ≻n∗ c. In ≻3 and ≻4 all agents have the same relative preferences between a and c; thus, since a ≻W c in ≻3 and by IIA, a ≻W c in ≻4. Thus we have determined the social preference between a and c without assuming anything except that a ≻n∗ c.

Step 4: n∗ is a dictator over all pairs ab.

Consider some third outcome c. By the argument in Step 2, there is a voter n∗∗ who is extremely pivotal for c. By the argument in Step 3, n∗∗ is a dictator over any pair αβ not involving c. Of course, ab is such a pair αβ. We have already observed that n∗ is able to affect W's ab ranking—for example, when n∗ was able to change a ≻W b in profile ≻1 into b ≻W a in profile ≻2. Hence, n∗∗ and n∗ must be the same agent.

We have now shown that n∗ is a dictator over all pairs of outcomes.

8.4.2 Social choice functions

Arrow’s theorem tells us that we cannot hope to find a voting scheme that satisfies allof the notions of fairness that we find desirable. However, maybe the problem is that


Arrow’s theorem considers too hard a problem—the identification of a social orderingoverall outcomes. We now consider the setting of social choice functions, which arerequired only to identify a single top-ranked outcome. First, we define concepts analo-gous to Pareto efficiency, independence of irrelevant alternatives and non-dictatorshipfor social choice functions.

Definition 8.4.5 (Weak Pareto Efficiency)A social choice functionC isweakly Paretoefficient if, for any preference profile≻∈ Ln, if there exist a pair of outcomeso1 and weak Pareto

efficiencyo2 such that∀i ∈ N , o1 ≻i o2, thenC(≻) 6= o2.

This definition prohibits the social choice function from selecting any outcome thatis dominated by the same alternative for all agents. (That is, if all agents prefero1 too2, the social choice rule does not have to chooseo1, but it cannot chooseo2.) Thedefinition implies that the social choice rule must respect agents’ unanimous choices:if outcomeo is the top choice according to each≻i, then we must haveC(≻) =o. Thus, the definition is less demanding thanstrict Pareto efficiencyas defined inDefinition 2.3.2—a strictly Pareto efficient choice rule would also always satisfy weakPareto efficiency, but the reverse is not true.

Definition 8.4.6 (Monotonicity) C is monotonic if, for any o ∈ O and any preference profile ≻ ∈ Ln with C(≻) = o, then for any other preference profile ≻′ with the property that ∀i ∈ N, ∀o′ ∈ O, o ≻′i o′ if o ≻i o′, it must be that C(≻′) = o.

Monotonicity says that when a social choice rule C selects the outcome o for a preference profile ≻, then for any second preference profile ≻′ in which, for every agent i, the set of outcomes to which o is preferred under ≻′i is a weak superset of the set of outcomes to which o is preferred under ≻i, the social choice rule must also choose outcome o. Intuitively, monotonicity means that an outcome o must remain the winner whenever the support for it is increased relative to a preference profile under which o was already winning. Observe that the definition imposes no constraint on the relative orderings of outcomes o1, o2 ≠ o under the two preference profiles; for example, some or all of these relative orderings could be different.

Definition 8.4.7 (Non-dictatorship) C is non-dictatorial if there does not exist an agent j such that C always selects the top choice in j's preference ordering.

Following the pattern we followed for social welfare functions, we can show that no social choice function can satisfy all three of these properties.

Theorem 8.4.8 (Muller-Satterthwaite, 1977) If |O| ≥ 3, any social choice function C that is weakly Pareto efficient and monotonic is dictatorial.

Before giving the proof, we must provide a key definition.

Definition 8.4.9 (Taking O′ to the top from ≻) Let O′ ⊂ O be a finite subset of the outcomes O, and let ≻ be a preference profile. Denote the set O \ O′ as Ō′. A second preference profile ≻′ takes O′ to the top from ≻ if, for all i ∈ N, o′ ≻′i o for all o′ ∈ O′ and o ∈ Ō′, and o′1 ≻′i o′2 if and only if o′1 ≻i o′2.


That is, ≻′ takes O′ to the top from ≻ when, under ≻′:

• each outcome from O′ is preferred by every agent to each outcome from Ō′;

• the relative preferences between pairs of outcomes in O′ for every agent are the same as the corresponding relative preferences under ≻; and

• the relative preferences between pairs of outcomes in Ō′ are arbitrary (i.e., they are not required to bear any relation to the corresponding relative preferences under ≻).

We can now state the proof. Intuitively, it works by constructing a social welfare function W from the given social choice function C. We show that the facts that C is weakly Pareto efficient and monotonic imply that W must satisfy PE and IIA, allowing us to apply Arrow's theorem.

Proof. We will assume that C satisfies weak Pareto efficiency and monotonicity, and show that it must be dictatorial. The proof proceeds in 6 steps.

Step 1: If both ≻′ and ≻′′ take O′ ⊂ O to the top from ≻, then C(≻′) = C(≻′′) and C(≻′) ∈ O′.

Under ≻′, for all i ∈ N, o′ ≻′i o for all o′ ∈ O′ and all o ∈ Ō′. Thus, by weak Pareto efficiency C(≻′) ∈ O′. For every i ∈ N, every o′ ∈ O′ and every o ≠ o′ ∈ O, o′ ≻′i o if and only if o′ ≻′′i o. Thus by monotonicity, C(≻′) = C(≻′′).

Step 2: We define a social welfare function W from C.

For every pair of outcomes o1, o2 ∈ O, construct a preference ordering ≻{o1,o2} by taking {o1, o2} to the top from ≻. By Step 1, C(≻{o1,o2}) will be either o1 or o2, and will always be the same regardless of how we choose ≻{o1,o2}. Now we will construct a social welfare function W from C. For each pair of outcomes o1, o2 ∈ O, let o1 ≻W o2 if and only if C(≻{o1,o2}) = o1.

In order to show that W is a social welfare function, we must demonstrate that it establishes a total strict ordering over the outcomes. Since W is strict and complete, it only remains to show that W is transitive. Suppose that o1 ≻W o2 and o2 ≻W o3; we must thus show that o1 ≻W o3. Let ≻′ be a preference profile that takes {o1, o2, o3} to the top from ≻. By Step 1, C(≻′) ∈ {o1, o2, o3}. We consider each possibility.

Assume for contradiction that C(≻′) = o2. Let ≻′′ be a profile that takes {o1, o2} to the top from ≻′. By monotonicity, C(≻′′) = o2 (o2 has weakly improved its ranking from ≻′ to ≻′′). Observe that ≻′′ also takes {o1, o2} to the top from ≻. Thus by our definition of W, o2 ≻W o1. But we already had o1 ≻W o2. Thus C(≻′) ≠ o2. By an analogous argument, we can show that C(≻′) ≠ o3.

Thus C(≻′) = o1. Let ≻′′ be a preference profile that takes {o1, o3} to the top from ≻′. By monotonicity C(≻′′) = o1. Observe that ≻′′ also takes {o1, o3} to the top from ≻. Thus by our definition of W, o1 ≻W o3, and hence we have shown that W is transitive.

Step 3: The highest-ranked outcome in W(≻) is always C(≻).

We have seen that C can be used to construct a social welfare function W. It turns out that C can also be recovered from W, in the sense that the outcome given the highest ranking by W(≻) will always be C(≻). Let C(≻) = o1, let o2 ∈ O


be any other outcome, and let ≻′ be a profile that takes {o1, o2} to the top from ≻. By monotonicity, C(≻′) = o1. By the definition of W, o1 ≻W o2. Thus, o1 is the outcome ranked highest by W.

Step 4: W is Pareto efficient.

Imagine that ∀i ∈ N, o1 ≻i o2. Let ≻′ take {o1, o2} to the top from ≻. Since C is weakly Pareto efficient, C(≻′) = o1. Thus by the definition of W from Step 2, o1 ≻W o2, and so W is Pareto efficient.

Step 5: W is independent of irrelevant alternatives.

Let ≻1 and ≻2 be two preference profiles with the property that for all i ∈ N and for some pair of outcomes o1 and o2 ∈ O, o1 ≻1i o2 if and only if o1 ≻2i o2. We must show that o1 ≻W(≻1) o2 if and only if o1 ≻W(≻2) o2.

Let ≻1′ take {o1, o2} to the top from ≻1, and let ≻2′ take {o1, o2} to the top from ≻2. From the definition of W in Step 2, o1 ≻W(≻1) o2 if and only if C(≻1′) = o1; likewise, o1 ≻W(≻2) o2 if and only if C(≻2′) = o1. Now observe that ≻1′ also takes {o1, o2} to the top from ≻2, because for all i ∈ N the relative ranking between o1 and o2 is the same under ≻1 and ≻2. Thus by Step 1, C(≻1′) = C(≻2′), and hence o1 ≻W(≻1) o2 if and only if o1 ≻W(≻2) o2.

Step 6: C is dictatorial.

From Steps 4 and 5 and Theorem 8.4.4, W is dictatorial. That is, there must be some agent i ∈ N such that, regardless of the preference profile ≻′, we always have o1 ≻W(≻′) o2 if and only if o1 ≻′i o2. Therefore, the highest-ranked outcome in W(≻′) must also be the outcome ranked highest by i. By Step 3, C(≻′) is always the outcome ranked highest in W(≻′). Thus C is dictatorial.

In effect, this theorem tells us that, perhaps contrary to intuition, social choice functions are no simpler than social welfare functions. Intuitively, the proof shows that we can repeatedly 'probe' a social choice function to determine the relative social ordering between given pairs of outcomes. Because the function must be defined for all inputs, we can use this technique to construct a full social welfare ordering.

To get a feel for the theorem, consider the social choice function defined by the plurality rule.4 Clearly it satisfies weak Pareto efficiency, and clearly it is not dictatorial. This means it must be non-monotonic. To see why, consider the following scenario with seven voters.

3 agents: a ≻ b ≻ c
2 agents: b ≻ c ≻ a
2 agents: c ≻ b ≻ a

Denote these preferences as ≻1. Now consider the situation where the final two agents change their votes by moving c to the bottom of their rankings as shown below; denote the new preferences as ≻2.

4. Formally, we should also specify the tie-breaking rule used here. However, in our example monotonicity fails even when ties never occur, so the tie-breaking rule does not matter here.


3 agents: a ≻ b ≻ c
2 agents: b ≻ c ≻ a
2 agents: b ≻ a ≻ c

Under ≻1 plurality chooses a. If plurality were monotonic, it would have to make the same choice under ≻2, because for all i ∈ N, a ≻2i b if a ≻1i b and a ≻2i c if a ≻1i c. However, under ≻2 plurality chooses b. Therefore plurality is not monotonic.
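As a sanity check, the violation can be verified programmatically. The sketch below (our own encoding, not from the text) confirms that a's support weakly improves from ≻1 to ≻2 in the sense required by monotonicity, yet the plurality winner changes:

profile1 = [(3, ['a', 'b', 'c']), (2, ['b', 'c', 'a']), (2, ['c', 'b', 'a'])]
profile2 = [(3, ['a', 'b', 'c']), (2, ['b', 'c', 'a']), (2, ['b', 'a', 'c'])]

def plurality(profile):
    counts = {}
    for n, ranking in profile:
        counts[ranking[0]] = counts.get(ranking[0], 0) + n
    return max(sorted(counts), key=counts.get)   # alphabetical tie-breaking

def weakly_improves(o, p1, p2):
    """True if every agent ranks o at least as well, relative to every other
    outcome, under p2 as under p1 (the premise of monotonicity)."""
    for (_, r1), (_, r2) in zip(p1, p2):
        for other in r1:
            if (other != o and r1.index(o) < r1.index(other)
                    and r2.index(o) > r2.index(other)):
                return False
    return True

o = plurality(profile1)                           # 'a'
print(weakly_improves(o, profile1, profile2))     # True: a's support only grows
print(plurality(profile2))                        # 'b', so plurality is not monotonic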

8.5 Ranking systems

We now turn to a specialization of the social choice problem that has a computational flavor, and in which some interesting progress can be made. Specifically, consider a setting in which the set of agents is the same as the set of outcomes—agents are asked to vote to express their opinions about each other, with the goal of determining a social ranking. Such settings have great practical importance. For example, search engines rank web pages by considering hyperlinks from one page to another to be votes about the importance of the destination pages. Similarly, online auction sites employ reputation systems that provide assessments of agents' trustworthiness based on ratings from past transactions.

Let us formalize this setting. To do so, we must relax our earlier assumption that agents' preferences are strict total orders, assuming instead that preferences are total preorders (i.e., agents can be indifferent between outcomes). We denote the proposition that agent i is indifferent between o1 and o2 as o1 ∼i o2. Now we can define our setting by stating two assumptions. First, N = O: the set of agents is the same as the set of outcomes. Second, agents' preferences are such that each agent divides the other agents into a set that he likes equally, and a set that he dislikes equally (or, equivalently, has no opinion about). Formally, for each i ∈ N the outcome set O (equivalent to N) is partitioned into two sets Oi,1 and Oi,2, with ∀o1 ∈ Oi,1, ∀o2 ∈ Oi,2, o1 ≻i o2, and with ∀o, o′ ∈ Oi,k for k ∈ {1, 2}, o ∼i o′. We call this the ranking systems setting, and call a social welfare function in this setting a ranking rule. Observe that a ranking rule is not required to partition the agents into two sets; it must simply return some total preordering on the agents.

Interestingly, Arrow’s impossibility system does not holdin the ranking systemssetting. The easiest way to see this is to identify a ranking rule that satisfies all ofArrow’s axioms.5

Proposition 8.5.1 In the ranking systems setting, approval voting satisfies IIA, PE and non-dictatorship.

The proof is straightforward, and is left as an easy exercise. Intuitively, the fact that agents partition the outcomes into only two sets is crucially important. We would be able to apply Arrow's argument if agents were able to partition the outcomes into as few as three sets. (Recall that the proof of Theorem 8.4.4 requires arbitrary preferences and |O| ≥ 3.)

5. Note that we defined these axioms in terms of strict total orderings; nevertheless, they generalize easily to total preorderings.


Alice → Bob        Will → Liam → Vic

Figure 8.2 Sample preferences in a ranking system, where arrows indicate votes.

Although the possibility of circumventing Arrow's theorem is encouraging, the discussion does not end here. Due to the special nature of the ranking systems setting, there are other properties that we would like a ranking rule to satisfy.

First, consider an example in which Alice votes only for Bob, Will votes only for Liam, and Liam votes only for Vic. These votes are illustrated in Figure 8.2. Who should be ranked highest? Three of the five kids have received votes (Bob, Liam and Vic); these three should presumably rank higher than the remaining two. But of the three, Vic is special: he is the only one whose voter (Liam) himself received a vote. Thus, intuitively, Vic should receive the highest rank. This intuition is captured by the idea of transitivity.

First we define strong transitivity. We will subsequently relax this definition; however, it is useful for what follows.

Definition 8.5.2 (strong transitivity) Consider a set of preferences in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1 with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1.6 Further assume that o2 receives more votes than o1 and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o2 to the voter for o1. Then the ranking rule satisfies strong transitivity if it always strictly prefers o2 to o1.

Because our transitivity definition will serve as the basis for an impossibility result, we want it to be as weak as possible. One way in which this definition is quite strong is that it does not take into account the number of votes that a voting agent places. Consider an example in which Vic votes for almost all the kids, whereas Ray votes only for one. If Vic and Ray are ranked the same by the ranking rule, strong transitivity requires that their votes must count equally. However, we might feel that Ray has been more decisive, and therefore feel that his vote should be counted more strongly than Vic's. We can allow for such rules by weakening the notion of transitivity. The new definition is exactly the same as the old one, except that it is restricted to apply only to settings in which the voters vouch for exactly the same number of candidates.

Definition 8.5.3 (weak transitivity) Consider a set of preferences in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for

6. The pairing must use each voter from o2 at most once; if there are more votes for o2 than for o1, there will be agents who voted for o2 who are not paired. If an agent voted for both o1 and o2, it is acceptable for him to be paired with himself.


o1 with voters for o2 who have both voted for exactly the same number of outcomes so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1. Further assume that o2 receives more votes than o1 and/or that there is at least one pair of voters where the ranking rule strictly prefers the voter for o2 to the voter for o1. Then the ranking rule satisfies weak transitivity if it always strictly prefers o2 to o1.

Recall the Independence of Irrelevant Alternatives (IIA) property defined earlier in Definition 8.4.2, which said that the ordering of two outcomes should only depend on agents' relative preferences between these outcomes. Such an assumption is inconsistent with even our weak transitivity definitions. However, we can broaden the scope of IIA to allow for transitive effects, and thereby still express the idea that the ranking rule should rank pairs of outcomes based only on local information.

Definition 8.5.4 (RIIA, informal) A ranking rule satisfies ranked independence of irrelevant alternatives (RIIA) if the relative rank between pairs of outcomes is always determined according to the same rule, and this rule depends only on

1. the number of votes each outcome received; and

2. the relative ranks of these voters.7

Note that this definition prohibits the ranking rule from caring about the identities of the voters, which is allowed by IIA.

Despite the fact that Arrow's theorem doesn't apply in this setting, it turns out that another, very different impossibility result does hold.

Theorem 8.5.5 There is no ranking system that always satisfies both weak transitivity and RIIA.

What hope is there then for ranking systems? The obvious way forward is to consider relaxing one axiom and keeping the other, and indeed, progress can be made both by relaxing weak transitivity and by relaxing RIIA. For example, the famous PageRank algorithm (used originally as the basis of the Google search engine) can be understood as a ranking system that satisfies weak transitivity but not RIIA. Unfortunately, an axiomatic treatment of this algorithm is quite involved, so we do not provide it here.

Instead, we will consider relaxations of transitivity. First, what happens if we simply drop the weak transitivity requirement altogether? Let us add the requirements that an agent's rank can only improve when he receives more votes ("positive response") and that the agents' identities are ignored by the ranking function ("anonymity"). Then it can be shown that approval voting, which we have already considered in this setting, is the only possible ranking function.

Theorem 8.5.6 Approval voting is the only ranking rule that satisfies RIIA, positive response and anonymity.

7. The formal definition of RIIA is more complicated than Definition 8.5.4 because it must explain precisely what is meant by depending on the relative ranks of the voters. The interested reader is invited to consult the reference cited at the end of the chapter.


Finally, what if we try to modify the transitivity requirement rather than dropping it entirely? It turns out that we can also obtain a positive result here, although this comes at the expense of guaranteeing anonymity. Note that this new transitivity requirement is a different weakening of strong transitivity which does not care about the number of outcomes that agents vote for, but instead requires strict preference only when the ranking rule strictly prefers every paired voter for o2 over the corresponding voter for o1.

Definition 8.5.7 (strong quasi-transitivity) Consider a set of preferences in which outcome o2 receives at least as many votes as o1, and it is possible to pair up all the voters for o1 with voters from o2 so that each voter for o2 is weakly preferred by the ranking rule to the corresponding voter for o1. Then the ranking rule satisfies strong quasi-transitivity if it weakly prefers o2 to o1, and strictly prefers o2 to o1 if either o1 received no votes or each paired voter for o2 is strictly preferred by the ranking rule to the corresponding voter for o1.

forall i ∈ N do initialize rank(i) = 0
repeat
    forall i ∈ N do
        if |voters for(i)| > 0 then
            rank(i) = (1 / (n + 1)) · [ |voters for(i)| + max_{j ∈ voters for(i)} rank(j) ]
        else
            rank(i) = 0
until rank converges

Figure 8.3 A ranking algorithm that satisfies strong quasi-transitivity and RIIA.

There exists a family of ranking algorithms that satisfy strong quasi-transitivity and RIIA. These algorithms work by assigning agents numerical ranks that depend on the number of votes they have received, and breaking ties in favor of the agent who received a vote from the highest-ranked voter. If this rule still yields a tie, it is applied recursively; when the recursion follows a cycle, the rank is a periodic rational number with period equal to the length of the cycle. One such algorithm is given in Figure 8.3. This algorithm can be proven to converge in n iterations; as each step takes O(n²) time (considering all votes for all agents), the worst-case complexity of the algorithm is O(n³).8
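The pseudocode of Figure 8.3 translates directly into the following Python sketch (our own transcription; the input format, a map from each agent to the set of agents who voted for him, is an assumption on our part):

def quasi_transitive_ranks(voters_for, max_iters=None):
    """Iterate the update rule from Figure 8.3 until the ranks converge.

    voters_for: dict mapping each agent to the set of agents who voted for it.
    Returns a dict of numerical ranks; a higher rank means more preferred.
    """
    agents = list(voters_for)
    n = len(agents)
    rank = {i: 0.0 for i in agents}
    for _ in range(max_iters or n + 1):          # the text argues n iterations suffice
        new_rank = {}
        for i in agents:
            if voters_for[i]:
                new_rank[i] = (len(voters_for[i]) +
                               max(rank[j] for j in voters_for[i])) / (n + 1)
            else:
                new_rank[i] = 0.0
        if new_rank == rank:
            break
        rank = new_rank
    return rank

# The votes of Figure 8.2: Alice -> Bob, Will -> Liam, Liam -> Vic.
votes = {'Alice': set(), 'Bob': {'Alice'}, 'Will': set(),
         'Liam': {'Will'}, 'Vic': {'Liam'}}
print(sorted(quasi_transitive_ranks(votes).items(), key=lambda kv: kv[1]))
# Vic ends up with the highest rank, matching the intuition in the text.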

8.6 History and references

Social choice theory is covered in many textbooks on microeconomics and game theory, as well as in some specialized texts such as [Feldman & Serrano, 2006; Gaertner, 2006]. An excellent survey is provided in [Moulin, 1994].

8. In fact, the complexity bound on this algorithm can be somewhat improved by more careful analysis; however, the argument here suffices to show that the algorithm runs in polynomial time.


The seminal individual publication in this area is [Arrow, 1970], which still remains among the best introductions to this field. The book includes Arrow's famous impossibility result, though our treatment follows the elegant first proof in [Geanakoplos, 2005]. Plurality voting is too common and natural (it is used in 43 of the 191 countries in the United Nations for either local or national elections) to have clear origins. Borda invented his system as a fair way to elect members to the French Academy of Sciences in 1770, and first published his method in 1781 as [de Borda, 1781]. In 1784 Marie Jean Antoine Nicolas Caritat, aka the Marquis de Condorcet, first published his ideas regarding voting [de Condorcet, 1784]. Somewhat later, Nanson, a Briton-turned-Australian mathematician and election reformer, published his modification of the Borda count in [Nanson, 1882]. The Smith set was introduced in [Smith, 1973]. The Muller-Satterthwaite impossibility result appears in [Muller & Satterthwaite, 1977]; our proof follows [Mas-Colell et al., 1995]. Our section on ranking systems follows [Altman & Tennenholtz, 2007b]. Other interesting directions in ranking systems include developing practical ranking rules and/or axiomatizing such rules (e.g., [Page et al., 1998; Kleinberg, 1999; Borodin et al., 2005; Altman & Tennenholtz, 2005]), and exploring personalized rankings, in which the ranking function gives a potentially different answer to each agent (e.g., [Altman & Tennenholtz, 2007a]).


9 Protocols for Strategic Agents: Mechanism Design

As we discussed in the previous chapter, social choice theory is non-strategic; it takes the preferences of the agents as given, and investigates ways in which they can be aggregated. But of course those preferences are usually not known. What you have, instead, is that the various agents declare their preferences, which they may do truthfully or not. Assuming the agents are self-interested, in general they will not reveal their true preferences. Since as a designer you wish to find an optimal outcome with respect to the agents' true preferences (for example, electing a leader that truly reflects the agents' preferences), optimizing with respect to the declared preferences will not in general achieve the objective.

9.1 Introduction

Mechanism design is a strategic version of social choice theory, in which mechanisms are designed with the assumption that agents will behave so as to maximize their individual payoffs. For example, in an election agents may not vote their true preference.

9.1.1 Example: strategic voting

Consider again our babysitting example. This time, in addition to Will, Liam, and Vic you must also babysit their devious new friend, Ray. Again, you invite each child to select their favorite among the three activities—going to the video arcade (a), playing basketball (b), and going for a leisurely car ride (c). As before, you announce that you will select the activity with the highest number of votes, breaking ties alphabetically. Consider the case in which the true preferences of the kids are as follows:

Will: b ≻ a ≻ c
Liam: b ≻ a ≻ c
Vic: a ≻ c ≻ b
Ray: c ≻ a ≻ b

Will, Liam, and Vic are sweet souls who always tell you their true preferences. But little Ray, he's always figuring things out; and so he goes through the following reasoning process. He prefers the most sedentary activity possible (hence his preference


ordering). But he knows his friends well, and in particular he knows which activity each of them will vote for. He thus knows that if he votes for his true passion—slumping in the car for a few hours (c)—he'll end up playing basketball (b). So he votes for going to the arcade (a), ensuring that this indeed is the outcome. Is there anything you can do to prevent such manipulation by little Ray?
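Ray's reasoning can be replayed mechanically. In the sketch below (our own encoding of the babysitter's rule; not from the text), the truthful votes elect b, while Ray's deviation to a swings the alphabetical tie-break in favor of a:

def plurality_alpha(votes):
    """Plurality over a dict of single votes, ties broken alphabetically."""
    counts = {}
    for v in votes.values():
        counts[v] = counts.get(v, 0) + 1
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)

truthful = {'Will': 'b', 'Liam': 'b', 'Vic': 'a', 'Ray': 'c'}
print(plurality_alpha(truthful))              # 'b'
manipulated = dict(truthful, Ray='a')
print(plurality_alpha(manipulated))           # 'a', which Ray prefers to b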

This is where mechanism design, or implementation theory, comes in. Mechanism design is sometimes colloquially called "inverse game theory." Our discussion of game theory in Chapter 2 was framed as follows: Given an interaction among a set of agents, how do we predict or prescribe the course of action of the various agents participating in the interaction? In mechanism design, rather than investigate a given strategic interaction, we start with certain desired behaviors on the part of agents, and ask what strategic interaction among these agents might give rise to these behaviors. Roughly speaking, from the technical point of view this will translate to the following: We will assume unknown individual preferences, and ask whether we can design a game such that, no matter what the secret preferences of the agents actually are, the equilibrium of the game is guaranteed to have a certain desired property.1 Mechanism design is perhaps the most "computer scientific" part of game theory, since it concerns itself with designing effective protocols for distributed systems. The key difference from the traditional work in distributed systems is that in the current setting the distributed elements are not necessarily cooperative, and must be motivated to play their part. For this reason one can think of mechanism design as an exercise in "incentive engineering."

9.1.2 Example: selfish routing

Like social choice theory, the scope of mechanism design is broader than voting. The most famous application of mechanism design is auction theory, to which we return in Chapter 10. However, mechanism design has many other applications. Consider the transportation network depicted in Figure 9.1.


Figure 9.1 Transportation network with selfish agents.

In this figure, the number next to a given edge is the cost of transporting along that edge. You are interested in finding the least-cost route from node S to node T. Unfortunately, you do not know the costs; those are private information of the various shippers. However, you do know that these shippers are interested in maximizing their

1. Actually, as we shall see, technically speaking what we design is not a game but a mechanism, that together with the secret utility functions defines a (Bayesian) game.


revenue. How can you use that knowledge to extract from them the information needed to compute the least-cost path?

9.2 Mechanism design with unrestricted preferences

Now we introduce some of the broad principles of mechanism design, placing no restriction on the preferences agents can have. (We will subsequently consider such restrictions.) We begin with a formal model.

Definition 9.2.1 (Mechanism) A mechanism (over a set of agents N and a set of outcomes O) is a pair (A, M), where

• A = A1 × · · · × An, where Ai is the set of actions available to agent i ∈ N, and

• M : A → Π(O) maps each action profile to a distribution over outcomes.

A mechanism is deterministic if for every a ∈ A there exists o ∈ O such that M(a)(o) = 1; in this case we simply write M(a) = o.

9.2.1 Implementation

A mechanism does not define a game in and of itself, but it does when coupled with a utility function over outcomes for each agent. A mechanism (A, M) over N and O and a vector u = (u1, . . . , un) of utility functions (with ui : O → R), define the normal-form game (N, A, O, M, u).2 In a similar way (discussed in a moment) we can associate a mechanism with a Bayesian game. We now define a key property of a mechanism—whether it induces outcomes that are consistent with a given social choice function.
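To make the coupling concrete, here is a minimal sketch (our own code, not the book's notation) of a deterministic mechanism M together with one agent's utility function, using the babysitting vote as the running example; composing the two gives that agent's payoffs in the induced normal-form game:

# A deterministic mechanism: action sets plus an outcome function M.
actions = {'Will': ['a', 'b', 'c'], 'Liam': ['a', 'b', 'c'],
           'Vic':  ['a', 'b', 'c'], 'Ray':  ['a', 'b', 'c']}

def M(action_profile):
    """Plurality over the submitted votes, ties broken alphabetically."""
    counts = {}
    for vote in action_profile.values():
        counts[vote] = counts.get(vote, 0) + 1
    best = max(counts.values())
    return min(o for o, n in counts.items() if n == best)

# A utility function over outcomes turns the mechanism into a game:
# agent i's payoff under action profile a is u_i(M(a)).
utility = {'Ray': {'c': 2, 'a': 1, 'b': 0}}    # Ray: c > a > b

def payoff(agent, action_profile):
    return utility[agent][M(action_profile)]

others = {'Will': 'b', 'Liam': 'b', 'Vic': 'a'}
for ray_vote in actions['Ray']:
    profile = dict(others, Ray=ray_vote)
    print(ray_vote, M(profile), payoff('Ray', profile))
# Voting a yields outcome a (payoff 1); voting truthfully (c) yields b (payoff 0).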

Definition 9.2.2 (Implementation in dominant strategies) A mechanism (A, M) (over N and O) is an implementation in dominant strategies of a social choice function C (over N and O) if for any vector of utility functions u, the game (N, A, O, M, u) has an equilibrium in dominant strategies, and in any such equilibrium a∗ we have M(a∗) = C(u).

A mechanism that gives rise to dominant strategies is sometimes called strategy-proof, because there is no need for agents to reason about each others' actions in order to maximize their utility.

In the babysitter example above, the pair consisting of "each child votes for one choice" and "the activity selected is one with the most votes" is a well-formed mechanism, since it specifies the actions available to each child and the outcome depending on the choices made. Now consider the social choice function "the selected activity is that which is the top choice of the maximal number of children, breaking ties alphabetically." Clearly the mechanism defined by the babysitter does not implement this

2. Recall from Chapter 2 that the most general formulation of a game does not equate action profiles with outcomes, but rather maps the former to the latter.


function in dominant strategies. For example, the above instance of it has no dominant strategy for Ray.

This suggests that the above definition can be relaxed, and can appeal to solution concepts that are weaker than dominant-strategy equilibrium. For example, one can appeal to the Nash equilibrium, and define Nash-implementation in a straightforward way. Note that in the current example, the babysitter's solution is not even a Nash-implementation of the social choice function; it is a Nash equilibrium for Will, Liam, Vic and Ray to vote b, b, a, a, respectively; the activity selected is a, while the social choice is b.

Nash implementation is a sensible concept when the agents know each other's utility function (but the designer does not; if he knew, he could simply select the social choice directly). For example, it would make sense in our babysitting example if the children were old friends (albeit strategically inclined), but the babysitter were new and didn't know them. But in many cases of interest the utility functions u (and thus also the game (N, A, O, M, u)) are not common knowledge among the agents. This lack of common knowledge does not get in the way when the game has dominant strategies, but when the game does not, we must replace our normal-form game with a Bayesian game, and the Nash equilibrium by the Bayes-Nash equilibrium. In mechanism design it is standard to use the definition of Bayesian games according to which agents' utilities depend on both their actions and their types. (This definition was introduced in Section 5.3.1.) We are thus able to define Bayes-Nash implementation.

Definition 9.2.3 (Bayes-Nash implementation) We begin with a mechanism (A, M) over N and O. Let Θ = Θ1 × · · · × Θn denote the set of all possible type vectors θ = (θ1, . . . , θn), and denote agent i's utility as ui : O × Θ → R. Let p be a (common prior) probability distribution on Θ (and hence on u). Then (A, M) is a Bayes-Nash implementation of a social choice function C, with respect to Θ and p, if there exists a Bayes-Nash equilibrium of the game of incomplete information (N, A, Θ, p, u) such that for every θ ∈ Θ and every action profile a ∈ A that can arise given type profile θ in this equilibrium, we have that M(a) = C(u(·, θ)).3

A classical example of Bayesian mechanism design is auction design. While we devote a more lengthy discussion to auctions in Chapter 10, the basic idea is as follows. The designer wishes (for example) to ensure that the bidder with the highest valuation for a given item will win the auction, but the valuations of the agents are all private. The outcomes consist of allocating the item (in the case of a simple, single-item auction) to one of the agents, and having the agents make or receive some payments. The auction rules define the actions available to the agents (the "bidding rules"), and the mapping from action vectors to outcomes ("allocation rules" and "payment rules": who wins and who pays what as a function of the bidding). If we assume that the valuations are drawn from some known distribution, each particular auction design and particular set of agents define a Bayesian game, in which the signal of each agent is his own valuation.
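As one concrete illustration of these ingredients (our own sketch, not a definition from the text), a sealed-bid single-item auction can be written as an action space of bids together with an allocation rule and a payment rule; the choice of payment rule is a parameter of the design:

import random

def sealed_bid_auction(bids, payment_rule='second_price'):
    """Map bids (each agent's action) to an outcome: the highest bidder wins
    and pays either his own bid or the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    if payment_rule == 'first_price':
        payment = bids[winner]
    else:                                   # 'second_price'
        payment = bids[ranked[1]] if len(ranked) > 1 else 0.0
    return winner, payment

# With valuations drawn from a known distribution, the auction plus the agents
# define a Bayesian game; here each agent's type (valuation) is uniform on [0, 1].
valuations = {i: random.random() for i in ('bidder1', 'bidder2', 'bidder3')}
print(sealed_bid_auction(valuations))       # outcome if everyone bids his valuation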

3. The careful reader will notice that because we have previously defined social choice functions as deterministic, we here end up with a mechanism that selects outcomes deterministically as well. Of course, this definition can be extended to describe randomized social choice functions and mechanisms.


Finally, there exist implementation concepts that are satisfied by a larger set of strategy profiles than implementation in dominant strategies, but that are not guaranteed to be achievable for any given social choice function and set of preferences, unlike Bayes-Nash implementation. For example, we could consider only symmetric Bayes-Nash equilibria, on the principle that strategies that depend on agent identities would be less likely to arise in practice. It turns out that symmetric Bayes-Nash equilibria always exist in symmetric Bayes-Nash games. A second implementation notion that deserves mention is ex-post implementation.4

Regardless of the implementation concept, we can require that the desired social choice function is implemented (a) in the only equilibrium; (b) in every equilibrium; or (c) in at least one equilibrium of the underlying game. There are some other properties that we often desire (we will return more formally to these concepts in Section 9.3.2 below). Individual rationality means that no agent is harmed by participating in the mechanism, as compared to non-participation. Budget balance is important for mechanisms that involve monetary payments to and from agents; it means that the mechanism pays out and receives the same amount. Finally, truthfulness means that agents truthfully disclose their preferences to the mechanism in equilibrium.

9.2.2 The revelation principle

It turns out that truthfulness, the final property discussed above, can always be achieved regardless of the social choice function to be implemented and of the agents' preferences. A direct mechanism is one in which the only action available to agents is to announce a preference function. Since in a Bayesian game the only uncertainty about an agent's preference function ui(o, θ) stems from his type, we can represent this as Ai = Θi. For example, in a simple, single-item auction setting, the only action available is to announce one's value for the item. Since the set of actions is the set of all types, an agent may lie and announce a type θ̂i that is different from his true type θi. A direct mechanism is said to be truthful (or incentive compatible) if, for any type vector θ, in the game defined by the mechanism it is a dominant strategy for every agent i to announce his true type, so that θ̂i = θi. In other words, the truthful mechanisms are precisely the strategy-proof direct mechanisms. Sometimes the term used is incentive compatibility in dominant strategies, to distinguish from the case in which the agents are truthful only in a Bayes-Nash equilibrium (called Bayes-Nash incentive compatibility).

Of course, it may not be possible to find a dominant strategy implementation of every social choice problem. Furthermore, the space of all mechanisms is unmanageably large, which makes the task of finding such a mechanism—or even ascertaining whether it exists—seem daunting. However, the following theorem teaches us that we can, without loss of coverage, limit ourselves to a small sliver of the space of all mechanisms:

4. Recall from Section 5.3.4 that an ex-post equilibrium has the property that no agent can ever gain by changing his strategy even if he observes all the other agents' strategies and also their types. Thus, unlike a Bayes-Nash equilibrium, an ex-post equilibrium does not depend on the type distribution. However, it is still a weaker notion than equilibrium in dominant strategies because an agent's ex-post equilibrium strategy is only guaranteed to be a best response if the other agents play their respective strategies from the equilibrium.


(a) Revelation principle: original mechanism        (b) Revelation principle: new mechanism

Figure 9.2 The revelation principle: how to construct a new mechanism with a truthful equilibrium, given an original mechanism with equilibrium (s1, . . . , sn).

Theorem 9.2.4 (Revelation principle) If there exists any mechanism that implements a social choice function C in dominant strategies then there exists a direct mechanism that implements C in dominant strategies and is truthful.

Proof. Consider an arbitrary mechanism for n agents which implements a social choice function C in dominant strategies. This mechanism is illustrated in Figure 9.2(a). Let s1, . . . , sn denote the dominant strategies for agents 1, . . . , n. We will construct a new mechanism which truthfully implements C. Our new mechanism will ask the agents for their utility functions, use them to determine s1, . . . , sn, the agents' dominant strategies under the original mechanism, and then choose the outcome that would have been chosen by the original mechanism for agents following the strategies s1, . . . , sn. This new mechanism is illustrated in Figure 9.2(b).

Assume that some agent i would be better off declaring a utility function u∗i to the new mechanism rather than his true utility function ui. This implies that i would have preferred to follow some different strategy s∗i in the original mechanism rather than si, contradicting our assumption that si is a dominant strategy for i. (Intuitively, if i could gain by lying to the new mechanism, he could likewise gain by "lying to himself" in the original mechanism.) Thus the new mechanism is dominant-strategy truthful.


In other words, any solution to a mechanism design problem can be converted into one in which agents always reveal their true preferences, if the new mechanism "lies for the agents" in just the way they would have chosen to lie to the original mechanism. The revelation principle is arguably the most basic result in mechanism design. It means that, while one might have thought a priori that a particular mechanism design problem calls for an arbitrarily complex strategy space, in fact one can restrict one's attention to truthful, direct mechanisms.
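The construction is mechanical enough to phrase as code. The sketch below is our own; original_mechanism and dominant_strategy are hypothetical stand-ins for the original mechanism's outcome function and for a solver that returns an agent's dominant strategy given a declared utility function:

def make_revelation_mechanism(original_mechanism, dominant_strategy):
    """Wrap an indirect mechanism into a direct, truthful one.

    original_mechanism: maps an action profile (dict agent -> action) to an outcome.
    dominant_strategy:  maps (agent, declared utility function) to that agent's
                        dominant-strategy action in the original mechanism.
    The returned direct mechanism takes declared utility functions as its only
    actions, computes each agent's dominant strategy, and plays it on his behalf.
    """
    def direct_mechanism(declared_utilities):
        simulated_actions = {agent: dominant_strategy(agent, u_i)
                             for agent, u_i in declared_utilities.items()}
        return original_mechanism(simulated_actions)
    return direct_mechanism

If some declaration u∗i were profitable in this wrapper, then playing dominant_strategy(i, u∗i) would have been profitable in the original mechanism, contradicting the dominance of si; this is exactly the argument in the proof.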

As you might have suspected, the revelation principle does not apply only to implementation in dominant strategies; we have stated the theorem in this way only to keep things simple. Following exactly the same argument we can argue that, e.g., a mechanism that implements a social choice function in a Bayes-Nash equilibrium can be converted into a direct, Bayes-Nash incentive compatible mechanism.

The argument we used to justify the revelation principle also applies to original mechanisms that are indirect (e.g., ascending auctions). The new, direct mechanism can take the agents' utility functions, construct their strategies for the indirect mechanism, and then simulate the indirect mechanism to determine which outcome to select. One caveat is that, even if the original indirect mechanism had a unique equilibrium, there is no guarantee that the new revelation mechanism will not have additional equilibria.

Before moving on, we will finally consider some computational caveats to the revelation principle. Observe that the general effect of constructing a revelation mechanism is to push an additional computational burden onto the mechanism, as is implicit in Figure 9.2(b). There are many settings in which agents' strategies are computationally difficult to determine. When this is the case, the additional burden absorbed by the mechanism may be considerable. Furthermore, the revelation mechanism forces the agents to reveal their types completely. There may be settings in which agents are not willing to compromise their privacy to this degree (observe that in general, the original mechanism will require them to reveal much less information). Finally, even if not objectionable on privacy grounds, this full revelation can sometimes place an unreasonable burden on the communication channel. For all these reasons, in practical settings one must apply the revelation principle with caution.

9.2.3 Impossibility of general, dominant-strategy implementation

We now ask what social choice functions can be implemented in dominant strategies. Given the revelation principle, we can restrict our attention to direct mechanisms. In direct mechanisms the strategy space is simply the space of agent preferences; the (A, M) mechanism structure therefore simplifies to M. Thus a direct mechanism is simply a social choice function over the revealed preferences. Furthermore, if the direct mechanism is truthful, it must be identical to the social choice function which it implements. So the question we are after is really which social choice functions are truthful. The first answer is disappointing.

Theorem 9.2.5 (Gibbard-Satterthwaite) Consider any social choice function C of N and O. If:

1. |O| ≥ 3 (there are at least three outcomes);


2. C is onto; that is, for every o ∈ O there is a preference vector ≻ such that C(≻) = o (this property is sometimes also called citizen sovereignty); and

3. C is dominant-strategy truthful,

then C is dictatorial.

If Theorem 9.2.5 is reminiscent of the Muller-Satterthwaite theorem (Theorem 8.4.8), this is no accident, since Theorem 9.2.5 is implied by that theorem as a corollary. Finally, we emphasize again that this negative result is specific to dominant-strategy implementation. It does not hold for the weaker concepts of Nash or Bayes-Nash equilibrium implementation.

9.3 Quasilinear preferences

If we are to design a dominant strategy truthful mechanism that is not dictatorial, we are going to have to relax some of the conditions of the Gibbard-Satterthwaite theorem. First, we relax the requirement that agents be able to express any preferences, and replace it with the requirement that agents be able to express any preferences in a limited set. Second, we relax the condition that the mechanism be onto. We now introduce this set of preferences.

Definition 9.3.1 (Quasilinear utility function) Agents have quasilinear utility functions (or quasilinear preferences) in an n-player Bayesian game when the set of outcomes is O = X × Rn for a finite set X, and the utility of an agent i with type θi is given by ui(o, θi) = ui(x, θi) − fi(pi), where o = (x, p) is an element of O, ui(x, θi) is an arbitrary function and fi : R → R is a strictly monotonically increasing function.

Intuitively, we split outcomes into two pieces which are linearly related. First, X represents a finite set of non-monetary outcomes, such as the allocation of an object to one of the bidders in an auction, or the selection of a candidate in an election. Second, pi is the (possibly negative) payment made by agent i to the mechanism, such as a payment to the auctioneer.

What does it mean to assume that agents' preferences are quasilinear? First, it means that we are in a setting in which the mechanism can choose to charge or reward the agents by an arbitrary monetary amount. Second, and more restrictive, it means that an agent's degree of preference for the selection of any choice x ∈ X is independent from his degree of preference for having to pay the mechanism some amount pi ∈ R. Thus an agent's utility for a choice can't depend on the total amount of money that he has (e.g., an agent can't value having a yacht more if he is rich than if he is poor). Finally, it means that agents care only about the choice selected and about their own payment: in particular, they do not care about the monetary payments made or received by other agents.
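A minimal sketch of Definition 9.3.1 in code (our own names; the identity function stands in for fi, i.e., a risk-neutral agent, a notion discussed in the next subsection):

def quasilinear_utility(value_for_choice, f=lambda p: p):
    """Build u_i(o, theta_i) = u_i(x, theta_i) - f_i(p_i) for an outcome o = (x, p_i).

    value_for_choice: maps a non-monetary choice x to the agent's value for it
                      (this plays the role of u_i(x, theta_i) for a fixed type).
    f: strictly increasing value-for-money function; the identity corresponds
       to risk neutrality.
    """
    def u(x, payment):
        return value_for_choice[x] - f(payment)
    return u

# A bidder who values winning an item at 10 and losing at 0:
u_bidder = quasilinear_utility({'win': 10.0, 'lose': 0.0})
print(u_bidder('win', 7.0))    # 3.0: wins and pays 7
print(u_bidder('lose', 0.0))   # 0.0: loses and pays nothing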

Strictly speaking, we have defined quasilinear preferences in a way that fixes the set of agents. However, we generally consider families of quasilinear problems, for any set of agents. For example, consider a voting game of the sort discussed earlier. You would want to be able to speak about a voting problem and a voting solution in a way


that is not dependent on the number of agents. So in the following we assume that a quasilinear utility function is still defined when any one agent is taken away. In this case the set of non-monetary outcomes must be updated (for example, in an auction setting the missing agent cannot be the winner), and is denoted by O−i. Similarly, the utility functions ui and the choice function C must be updated accordingly.

9.3.1 Risk attitudes

There is still one part of the definition of quasilinear preferences that we have not discussed—the functions fi. Before defining them, let's consider a question which may at first seem a bit nonsensical. Recall that we've said that pi denotes the amount an agent has to pay the mechanism. How much does this agent value a dollar? To make sense of this question, we must first note that utility is specified in its own units, rather than in units of currency, so we need to perform some kind of conversion. (Recall the discussion in Section 2.1.2.) Indeed, this is the purpose of fi. However, this conversion is nontrivial because for most people the value of a dollar depends on the amount of money they start out with in the first place. (For example, if you're broke and starving then a dollar could lead to a substantial increase in your utility; if you're a millionaire, you might not bend over to pick up the same dollar if it was lying on the street!) To make the same point in another way, consider a fair lottery in which a ticket costs $x and pays off $2x half of the time. Your willingness to participate in this lottery would probably depend on x—most people would be willing to play for sufficiently small values of x (say $1), but not for larger values (say $10,000). However, we've modeled agents as expected utility maximizers—how can we express the idea that an agent's willingness to participate in this lottery can depend on x, when the lottery's expected value is the same in both cases?

These two examples illustrate that we will often want the fi functions to be nonlinear. The curvature of fi gives i's risk attitude, which we can understand as the way that i feels about lotteries such as the one described above.

If an agent i simply wants to maximize his expected revenue, we say the agent is risk neutral. Such an agent has a linear value for money, as illustrated in Figure 9.3(a). To see why this is so, consider Figure 9.3(b). This figure describes a situation where the agent starts out with an endowment of $k, and must decide whether or not to participate in a fair lottery that awards $k + x half the time, and $k − x the other half of the time. From looking at the graph, we can see that u(k) = (1/2)u(k−x) + (1/2)u(k+x)—the agent is indifferent between participating in the lottery and not participating. This is what we would expect, as the lottery's expected value is k, the same as the value for not participating.

In contrast, consider the value-for-money curve illustrated in Figure 9.3(c). We call such an agent risk averse—he has a sublinear value for money, which means that he prefers a "sure thing" to a risky situation with the same expected value. Consider the same fair lottery described above from the point of view of a risk-averse agent, as illustrated in Figure 9.3(d). We can see that for this agent u(k) > (1/2)u(k−x) + (1/2)u(k+x)—the marginal disutility of losing $x is greater than the marginal utility of gaining $x, given an initial endowment of k.

Finally, the opposite of risk aversion is risk seeking, illustrated in Figure 9.3(e).


[Figure 9.3 shows six plots of utility u against money $: (a) risk neutrality; (b) risk neutrality: fair lottery; (c) risk aversion; (d) risk aversion: fair lottery; (e) risk seeking; (f) risk seeking: fair lottery.]

Figure 9.3 Risk attitudes: Risk aversion, risk neutrality, risk seeking, and in each case, utility for the outcomes of a fair lottery.


Such an agent has a superlinear value for money, which means that the agent prefers engaging in lotteries to a sure thing with the same expected value. This is shown in Figure 9.3(f). For example, an agent might prefer to spend $1 to buy a ticket that has a 1/1000 chance of paying off $1000, as compared to keeping the dollar.

The examples above suggest that people might exhibit different risk attitudes in different regions of f_i. For example, a person could be risk-seeking for very small amounts of money, risk-neutral for moderate amounts and risk-averse for large amounts. Nevertheless, in what follows we will assume that agents are risk-neutral unless indicated otherwise. When agents are risk neutral it is convenient to set f_i(x) = x for all i, and to think of agents' utilities for different choices as being expressed in dollars. The assumption of risk neutrality is made partly for mathematical convenience, partly because otherwise we would need to make an (often difficult to justify) assumption about the particular shape of agents' value-for-money curves, and partly because risk neutrality is reasonable when the amounts of money being exchanged through the mechanism are moderate. Considerable work in the literature extends results such as those presented in this chapter and the next to the case of agents with different risk attitudes.
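To make the role of f_i concrete, the following small sketch (our own illustration; the endowment, stake, and particular utility curves are arbitrary choices, not taken from the text) compares the utility of the sure endowment with the expected utility of the fair lottery under linear, sublinear, and superlinear value-for-money curves.

import math

# Illustrative only: compare u(k) against the expected utility of a fair lottery
# that pays k + x or k - x with equal probability, under three hypothetical curves.
def expected_lottery_utility(u, k, x):
    return 0.5 * u(k - x) + 0.5 * u(k + x)

k, x = 100.0, 50.0
curves = {
    "risk neutral (linear)":    lambda m: m,
    "risk averse (sqrt)":       lambda m: math.sqrt(m),
    "risk seeking (quadratic)": lambda m: m ** 2,
}
for name, u in curves.items():
    print(f"{name}: u(k) = {u(k):.2f}, E[u(lottery)] = {expected_lottery_utility(u, k, x):.2f}")
# The linear agent is indifferent, the sqrt agent prefers the sure thing, and the
# quadratic agent prefers the lottery, matching Figures 9.3(b), (d), and (f).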

9.3.2 Mechanism design in the quasilinear setting

Now that we have defined the quasilinear preference model, we can talk about the design of mechanisms for agents with these preferences. We concentrate on Bayesian games because most mechanism design is performed in such domains.

First, we point out that since quasilinear preferences split the outcome space into two parts, we can modify our formal definition of a mechanism accordingly.

Definition 9.3.2 (Quasilinear mechanism) A mechanism in the quasilinear setting (over a set of agents N and a set of outcomes O = X × R^n) is a triple (A, x, p), where

• A = A_1 × · · · × A_n, where A_i is the set of actions available to agent i ∈ N;

• x : A → Π(X) maps each action profile to a distribution over choices; and

• p : A → R^n maps each action profile to a payment for each agent.

In effect, we have split the function M into two functions x and p, where x is the choice rule and p is the payment rule. We will use the notation p_i to denote the payment function for agent i.

A direct revelation mechanism in the quasilinear setting is one in which each agent is asked to state his type.

Definition 9.3.3 (Direct quasilinear mechanism) A direct quasilinear mechanism (over a set of agents N and a set of outcomes O = X × R^n) is a pair (x, p). It defines a standard mechanism in the quasilinear setting, where for each i, A_i = Θ_i.

In a quasilinear setting we can also refer to an agent's valuation for choice x ∈ X, written v_i(x) = u_i(x, θ). v_i should be thought of as the maximum amount of money that i would be willing to pay to get the mechanism designer to implement choice x—in fact, having to pay this much would exactly make i indifferent about whether he was offered this deal or not.⁵ Note that an agent's valuation depends on his type, even though we do not explicitly refer to θ_i. In the future when we discuss direct quasilinear mechanisms, we will usually mean mechanisms that ask agents to declare their valuations for each choice; of course, this alternate definition is equivalent to Definition 9.3.3.⁶ Let V_i denote the set of all possible valuations for agent i. We will use the notation v̂_i ∈ V_i to denote the valuation that agent i declares to such a direct mechanism, which may be different from his true valuation v_i. We denote the vector of all agents' declared valuations as v̂, and the set of all possible valuation vectors as V. Finally, denote the vector of declared valuations from all agents other than i as v̂_{-i}.

Now we can state some properties that it is common to require of quasilinear mechanisms.

Definition 9.3.4 (Truthfulness) A quasilinear mechanism is truthful if ∀i ∀v_i, agent i's equilibrium strategy is to adopt the strategy v̂_i = v_i.

Of course, this is equivalent to the definition of truthfulness that we gave in Section 9.2.2; we have simply updated the notation for the quasilinear utility setting.

Definition 9.3.5 (Efficiency) A quasilinear mechanism is strictly Pareto efficient, or just efficient, if in equilibrium it selects a choice x such that ∀v ∀x′, ∑_i v_i(x) ≥ ∑_i v_i(x′).

That is, an efficient mechanism selects the choice that maximizes the sum of agents' utilities, disregarding the monetary payments that agents will be required to make. We describe this property as economic efficiency when there is a danger that it will be confused with other (e.g., computational) notions of efficiency. Observe that efficiency is defined in terms of agents' true valuations, not their declared valuations. This condition is also known as social-welfare maximization.

The attentive reader might wonder about the relationship between strict Pareto efficiency as defined in Definition 2.3.2 and Definition 9.3.5. The underlying concept is indeed the same. The reason why we can get away with summing agents' utilities here arises from our assumption that agents' preferences are quasilinear. Recall that we observed in Section 2.3.1 that there can be many Pareto efficient outcomes because of the fact that agents' utility functions are only unique up to positive affine transformations. In a quasilinear setting, if we include the operator of the mechanism⁷ as an agent who values money linearly and is indifferent between the mechanism's choices, it can be shown that all Pareto efficient outcomes involve the mechanism making the same choice, and differ only in monetary allocations.

Definition 9.3.6 (Budget balance) A quasilinear mechanism is budget balanced when ∀v, ∑_i p_i(v) = 0.

5. Observe that here we make the explicit assumption of risk neutrality discussed above.
6. Here we assume, as is common in the literature, that the mechanism designer knows each agent's value-for-money function f_i.
7. For example, this would be a seller in a single-sided auction, or a market maker in a double-sided market.


In other words, regardless of the agents' types, the mechanism collects and disburses the same amount of money from and to the agents, meaning that it makes neither a profit nor a loss.

Sometimes we relax this condition and require only weak budget balance, meaning that ∀v, ∑_i p_i(v) ≥ 0 (that is, the mechanism never takes a loss, but it may make a profit). Finally, we can require that either strong or weak budget balance hold ex ante, which means that E_v[∑_i p_i(v)] is either equal to or greater than zero. (That is, the mechanism must break even or make a profit on expectation, but may take a loss for certain sets of agent valuations v.)

Definition 9.3.7 (Ex-interim individual rationality) A quasilinear mechanism is ex-interim individually rational when ∀i ∀v_i, E_{v_{-i}|v_i}[v_i(x(s_i(v_i), s_{-i}(v_{-i}))) − p_i(s_i(v_i), s_{-i}(v_{-i}))] ≥ 0, where s is the equilibrium strategy profile.

This condition requires that no agent loses by participating in the mechanism. We call it ex-interim because it holds for every possible valuation for agent i, but averages over the possible valuations of the other agents. This approach makes sense because it requires that, based on the information that an agent has when he chooses to participate in a mechanism, no agent would be better off choosing not to participate. Of course, we can also strengthen the condition to say that no agent ever loses by participation:

Definition 9.3.8 (Ex-post individual rationality) A quasilinear mechanism is ex-post individually rational when ∀i ∀v, v_i(x(s(v))) − p_i(s(v)) ≥ 0, where s is the equilibrium strategy profile.

We can also restrict mechanisms based on their computational requirements rather than their economic properties.

Definition 9.3.9 (Tractability) A quasilinear mechanism is tractable when ∀v, x(v) and p(v) can be computed in polynomial time.

Finally, in some domains there will be many possible mechanisms that satisfy the constraints given above, meaning that we need to have some way of choosing between them. (And as we'll see later, for other combinations of the constraints no mechanisms exist at all.) The usual approach is to define an optimization problem that identifies the optimal outcome in the feasible set. For example, although we've defined efficiency as a constraint, it's also possible to soften the constraint and speak instead about social welfare maximization. Here we define some other quantities that a mechanism designer can seek to optimize.

First, the mechanism designer can take a selfish perspective. Interestingly, this goal turns out to be quite different from the goal of maximizing social welfare. (We give an example of the differences between these approaches when we consider single-good auctions in Section 10.1.)

Definition 9.3.10 (Revenue maximization) A quasilinear mechanism is revenue maximizing when, among the set of functions x and p that satisfy the other constraints, the mechanism selects the x and p that maximize E_θ[∑_i p_i(s(θ))], where s(θ) denotes the agents' equilibrium strategy.


Conversely, the designer might try to collect as little revenue as possible, for example if the mechanism uses payments only to provide incentives, but is not intended to make money. The budget balance constraint is the best way to solve this problem, but sometimes it is impossible to satisfy. In such cases, one approach is to set weak budget balance as a constraint and then to pick the revenue-minimizing mechanism, effectively softening the (strict) budget balance constraint. Here we present a worst-case revenue minimization objective; of course, an average-case objective is also possible.

Definition 9.3.11 (Revenue minimization) A quasilinear mechanism is revenue minimizing when, among the set of functions x and p that satisfy the other constraints, the mechanism selects the x and p that minimize max_v ∑_i p_i(v) in equilibrium.

Finally, the mechanism designer might be concerned with selecting a fair outcome. The idea of fairness has been formalized in various ways. Here we define so-called max-min fairness, which says that the fairest outcome is the one that makes the least-happy agent the happiest.

Definition 9.3.12 (Max-min fairness) A quasilinear mechanism is max-min fair when, among the set of functions x and p that satisfy the other constraints, the mechanism selects the x and p that for all v maximize min_{i∈N} v_i(x(v)) − p_i(v).

9.4 Groves mechanisms

We now define an important and very general family of mechanisms for the quasilinear setting.

Definition 9.4.1 (Groves mechanisms) Groves mechanisms are direct quasilinear mechanisms (x, p), for which

\[
x(\hat{v}) = \arg\max_{x} \sum_{i} \hat{v}_i(x),
\]
\[
p_i(\hat{v}) = h_i(\hat{v}_{-i}) - \sum_{j \neq i} \hat{v}_j(x(\hat{v})).
\]

In other words, Groves mechanisms are direct mechanisms in which agents can declare any valuation function v̂ (and thus any quasilinear utility function û). The mechanism then optimizes the choice of outcome assuming that the agents disclosed their true utility functions. An agent is made to pay an arbitrary amount h_i(v̂_{-i}) which does not depend on his own declaration, and is paid the sum of every other agent's declared valuation for the mechanism's choice. The fact that the mechanism designer has the freedom to choose the h_i functions explains why we refer to the family of Groves mechanisms rather than to a single mechanism.
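As a concrete illustration, here is a minimal Python sketch (our own, with made-up valuations; the choice h_i ≡ 0 is just one valid member of the family) of a Groves mechanism over a finite set of choices.

def groves_outcome(declared, choices, h=None):
    """declared: list of dicts mapping each choice to agent i's declared value."""
    n = len(declared)
    if h is None:
        # Each h_i may be any function of the other agents' declarations; zero here.
        h = [lambda v_minus_i: 0.0] * n
    # Choice rule: pick the choice maximizing the sum of declared valuations.
    x = max(choices, key=lambda c: sum(v[c] for v in declared))
    payments = []
    for i in range(n):
        v_minus_i = [declared[j] for j in range(n) if j != i]
        # Payment rule: h_i(v_-i) minus the other agents' declared welfare at x.
        payments.append(h[i](v_minus_i) - sum(v[x] for v in v_minus_i))
    return x, payments

declared = [{"a": 5, "b": 0}, {"a": 0, "b": 4}, {"a": 0, "b": 3}]
x, p = groves_outcome(declared, ["a", "b"])
print(x, p)  # "b" is chosen (7 > 5); with h_i = 0 every agent is paid, not charged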

The remarkable property of Groves mechanisms is that they provide a dominant-strategy truthful implementation of a social-welfare maximizing social choice function. It is easy to see that if a Groves mechanism is dominant-strategy truthful, then it must be social-welfare maximizing: the function x in Definition 9.4.1 performs exactly the maximization called for by Definition 9.3.5 when v̂ = v. Thus, it suffices to show that:


Theorem 9.4.2 Truth telling is a dominant strategy under any Groves mechanism.

Proof. Consider a situation where every agent j other than i follows some arbitrary strategy v̂_j. Consider agent i's problem of choosing the best strategy v̂_i. As a shorthand, we will write v̂ = (v̂_{-i}, v̂_i). The best strategy for i is one that solves

\[
\max_{\hat{v}_i} \left( v_i(x(\hat{v})) - p_i(\hat{v}) \right).
\]

Substituting in the payment function from the Groves mechanism, we have

\[
\max_{\hat{v}_i} \left( v_i(x(\hat{v})) - h_i(\hat{v}_{-i}) + \sum_{j \neq i} \hat{v}_j(x(\hat{v})) \right).
\]

Since h_i(v̂_{-i}) does not depend on v̂_i, it is sufficient to solve

\[
\max_{\hat{v}_i} \left( v_i(x(\hat{v})) + \sum_{j \neq i} \hat{v}_j(x(\hat{v})) \right).
\]

The only way in which the declaration v̂_i influences the maximization above is through the mechanism's choice x(v̂). If possible, i would like to pick a declaration v̂_i that will lead the mechanism to pick an x ∈ X which solves

\[
\max_{x} \left( v_i(x) + \sum_{j \neq i} \hat{v}_j(x) \right). \tag{9.1}
\]

The Groves mechanism chooses an x ∈ X as

\[
x(\hat{v}) = \arg\max_{x} \left( \sum_{i} \hat{v}_i(x) \right) = \arg\max_{x} \left( \hat{v}_i(x) + \sum_{j \neq i} \hat{v}_j(x) \right),
\]

which coincides with the maximization in Equation (9.1) exactly when v̂_i = v_i. Thus, agent i optimizes the outcome for himself by declaring v̂_i = v_i. Because this argument does not depend in any way on the declarations of the other agents, truth telling is a dominant strategy for agent i.

Intuitively, the reason that Groves mechanisms are dominant-strategy truthful is that agents' externalities are internalized. Imagine a mechanism in which agents declared their valuations for the different choices x ∈ X and the mechanism selected the efficient choice, but in which the mechanism did not impose any payments on agents. Clearly, agents would be able to change the mechanism's choice to another that they preferred by overstating their valuations. Under Groves mechanisms, however, an agent's utility doesn't depend only on the selected choice, because payments are imposed. Since agents are paid the (reported) utility of all the other agents under the chosen allocation, each agent becomes just as interested in maximizing the other agents' utilities as in maximizing his own. Thus, once payments are taken into account, all agents have the same interests.

Groves mechanisms illustrate a property that is generally true of dominant-strategy truthful mechanisms: an agent's payment does not depend on the amount of his own declaration. Although other dominant-strategy truthful mechanisms exist in the quasilinear setting, the next theorem shows that Groves mechanisms are the only mechanisms that implement an efficient allocation in dominant strategies among agents with arbitrary quasilinear utilities.

Theorem 9.4.3 An efficient social choice function C : R^{Xn} → X × R^n can be implemented in dominant strategies for agents with unrestricted quasilinear utilities only if p_i(v) = h(v_{-i}) − ∑_{j≠i} v_j(x(v)).

Proof. From the revelation principle, we can assume that C is truthfully implementable in dominant strategies. Thus, from the definition of efficiency, the outcome must be chosen as

\[
x(v) = \arg\max_{x} \sum_{i} v_i(x).
\]

We can write the payment function as

\[
p_i(v) = h(v_i, v_{-i}) - \sum_{j \neq i} v_j(x(v)).
\]

Observe that we can do this without loss of generality because h can be an arbitrary function that cancels out the second term. Now for contradiction, assume that there exist some v_i and v′_i such that h(v_i, v_{-i}) ≠ h(v′_i, v_{-i}).

Case 1: x(v_i, v_{-i}) = x(v′_i, v_{-i}). Since C is truthfully implementable in dominant strategies, an agent i whose true valuation was v_i would be better off declaring v_i than v′_i:

\[
v_i(x(v_i, v_{-i})) - p_i(v_i, v_{-i}) \;\geq\; v_i(x(v'_i, v_{-i})) - p_i(v'_i, v_{-i}),
\]
\[
p_i(v_i, v_{-i}) \;\leq\; p_i(v'_i, v_{-i}).
\]

In the same way, an agent i whose true valuation was v′_i would be better off declaring v′_i than v_i:

\[
v'_i(x(v'_i, v_{-i})) - p_i(v'_i, v_{-i}) \;\geq\; v'_i(x(v_i, v_{-i})) - p_i(v_i, v_{-i}),
\]
\[
p_i(v'_i, v_{-i}) \;\leq\; p_i(v_i, v_{-i}).
\]

Thus, we must have

\[
p_i(v_i, v_{-i}) = p_i(v'_i, v_{-i}),
\]
\[
h(v_i, v_{-i}) - \sum_{j \neq i} v_j(x(v_i, v_{-i})) \;=\; h(v'_i, v_{-i}) - \sum_{j \neq i} v_j(x(v'_i, v_{-i})).
\]

We are currently considering the case where x(v_i, v_{-i}) = x(v′_i, v_{-i}). Thus we can write

\[
h(v_i, v_{-i}) - \sum_{j \neq i} v_j(x(v_i, v_{-i})) \;=\; h(v'_i, v_{-i}) - \sum_{j \neq i} v_j(x(v_i, v_{-i})),
\]
\[
h(v_i, v_{-i}) = h(v'_i, v_{-i}).
\]

This is a contradiction.

Case 2: x(v_i, v_{-i}) ≠ x(v′_i, v_{-i}). Without loss of generality, let h(v_i, v_{-i}) < h(v′_i, v_{-i}). Since this inequality is strict, there must exist some ε ∈ R^+ such that h(v_i, v_{-i}) < h(v′_i, v_{-i}) − ε.

Our mechanism must work for every v. Consider a case where i's valuation is

\[
v''_i(x) =
\begin{cases}
-\sum_{j \neq i} v_j(x(v_i, v_{-i})) & x = x(v_i, v_{-i}), \\
-\sum_{j \neq i} v_j(x(v'_i, v_{-i})) + \varepsilon & x = x(v'_i, v_{-i}), \\
-\sum_{j \neq i} v_j(x) - \varepsilon & \text{for any other } x.
\end{cases}
\]

Note that agent i still declares his valuations as real numbers; they just happen to satisfy the constraints given above. Also note that the ε used here is the same ε ∈ R^+ mentioned above. From the fact that C is truthfully implementable in dominant strategies, an agent i whose true valuation was v′′_i would be better off declaring v′′_i than v_i:

\[
v''_i(x(v''_i, v_{-i})) - p_i(v''_i, v_{-i}) \;\geq\; v''_i(x(v_i, v_{-i})) - p_i(v_i, v_{-i}). \tag{9.2}
\]

Because our mechanism is efficient, it must pick the outcome that solves

\[
f = \max_{x} \left( v''_i(x) + \sum_{j \neq i} v_j(x) \right).
\]

Picking x = x(v′_i, v_{-i}) gives f = ε; picking x = x(v_i, v_{-i}) gives f = 0, and any other x gives f = −ε. Therefore, we can conclude that

\[
x(v''_i, v_{-i}) = x(v'_i, v_{-i}). \tag{9.3}
\]

Substituting Equation (9.3) into Equation (9.2), we get

\[
v''_i(x(v'_i, v_{-i})) - p_i(v''_i, v_{-i}) \;\geq\; v''_i(x(v_i, v_{-i})) - p_i(v_i, v_{-i}). \tag{9.4}
\]

Expanding Equation (9.4),

\[
\left( -\sum_{j \neq i} v_j(x(v'_i, v_{-i})) + \varepsilon \right)
- \left( h(v''_i, v_{-i}) - \sum_{j \neq i} v_j(x(v''_i, v_{-i})) \right)
\;\geq\;
\left( -\sum_{j \neq i} v_j(x(v_i, v_{-i})) \right)
- \left( h(v_i, v_{-i}) - \sum_{j \neq i} v_j(x(v_i, v_{-i})) \right). \tag{9.5}
\]

We can use Equation (9.3) to replace x(v′′_i, v_{-i}) by x(v′_i, v_{-i}) on the LHS of Equation (9.5). The sums then cancel out, and the inequality simplifies to

\[
h(v_i, v_{-i}) \;\geq\; h(v''_i, v_{-i}) - \varepsilon. \tag{9.6}
\]

Since x(v′′_i, v_{-i}) = x(v′_i, v_{-i}), by the argument from Case 1 we can show that

\[
h(v''_i, v_{-i}) = h(v'_i, v_{-i}). \tag{9.7}
\]

Substituting Equation (9.7) into Equation (9.6), we get

\[
h(v_i, v_{-i}) \;\geq\; h(v'_i, v_{-i}) - \varepsilon.
\]

This contradicts our assumption that h(v_i, v_{-i}) < h(v′_i, v_{-i}) − ε. We have thus shown that there cannot exist v_i, v′_i such that h(v_i, v_{-i}) ≠ h(v′_i, v_{-i}).

Although we do not give the proof here, it has also been shown that Groves mechanisms are unique among Bayes-Nash incentive compatible efficient mechanisms, in a weaker sense. Specifically, any Bayes-Nash incentive compatible efficient mechanism corresponds to a Groves mechanism in the sense that each agent makes the same ex-interim expected payments and hence has the same ex-interim expected utility under both mechanisms.

9.4.1 The VCG mechanism

So far, we have said nothing about how to set the function h_i in a Groves mechanism's payment function. Here we'll discuss the most popular answer, which is called the Clarke tax. In the subsequent sections we'll discuss some of its properties, but first we'll define it.

Definition 9.4.4 (Clarke tax) The Clarke tax sets the h_i term in a Groves mechanism as

\[
h_i(\hat{v}_{-i}) = \sum_{j \neq i} \hat{v}_j\left(x(\hat{v}_{-i})\right),
\]

where x is the Groves mechanism allocation function.

The resulting Groves mechanism goes by many names. We'll see in Chapter 10 that the Vickrey auction (invented in 1961) is a special case; thus, in resource allocation settings the mechanism is sometimes known as the Generalized Vickrey Auction. Second, the mechanism is also known as the pivot mechanism; we'll explain the rationale behind this name in a moment. From now on, though, we'll refer to it as the Vickrey-Clarke-Groves (VCG) mechanism, naming its contributors in chronological order of their contributions. We restate the full mechanism here.

Definition 9.4.5 (Vickrey-Clarke-Groves (VCG) mechanism) The Vickrey-Clarke-Groves mechanism is a direct quasilinear mechanism (x, p), where

\[
x(\hat{v}) = \arg\max_{x} \sum_{i} \hat{v}_i(x),
\]
\[
p_i(\hat{v}) = \sum_{j \neq i} \hat{v}_j\left(x(\hat{v}_{-i})\right) - \sum_{j \neq i} \hat{v}_j(x(\hat{v})).
\]

First, note that because the Clarke tax does not depend on an agent i's own declaration v̂_i, our previous arguments that Groves mechanisms are dominant-strategy truthful and efficient transfer immediately to the VCG mechanism. Now, we'll try to provide some intuition about the VCG payment rule. Assume that all agents follow their dominant strategies and declare their valuations truthfully. The second sum in the VCG payment rule pays each agent i the sum of every other agent j ≠ i's utility for the mechanism's choice. The first sum charges each agent i the sum of every other agent's utility for the choice that would have been made had i not participated in the mechanism. Thus, each agent is made to pay his social cost—the aggregate impact that his participation has on other agents' utilities.

What can we say about the amounts of different agents' payments to the mechanism? If some agent i does not change the mechanism's choice by his participation—that is, if x(v) = x(v_{-i})—then the two sums in the VCG payment function will cancel out. The social cost of i's participation is zero, and so he has to pay nothing. In order for an agent i to be made to pay a nonzero amount, he must be pivotal in the sense that the mechanism's choice x(v) is different from its choice without i, x(v_{-i}). This is why VCG is sometimes called the pivot mechanism—only pivotal agents are made to pay. Of course, it's possible that some agents will improve other agents' utility by participating; such agents will be made to pay a negative amount, or in other words will be paid by the mechanism.

Let's see an example of how the VCG mechanism works. Recall that Section 9.1.2 discussed the problem of selfish routing in a transportation network. We'll now reconsider that example, and determine what route and what payments the VCG mechanism would select. For convenience, we reproduce Figure 9.1 as Figure 9.4, and label the nodes so that we have names to refer to the agents (the edges).

Note that in this example, the numbers labeling the edges in the graph denote agents' costs rather than utilities; thus, an agent's utility is −c if a route involving his edge (having cost c) is selected, and zero otherwise. The arg max in x will amount to cost minimization. Thus, x(v) will return the shortest path in the graph, which is ABEF.

How much will agents have to pay? First, let's consider the agent AC. The shortest path taking his declaration into account has a length of 5, and imposes a cost of 5 on agents other than him (because it does not involve him). Likewise, the shortest path without AC's declaration also has a length of 5. Thus, his payment p_AC = (−5) − (−5) = 0. This is what we expect, since AC is not pivotal. Clearly, by the same argument BD, CE, CF and DF will all be made to pay zero.

[Figure 9.4 shows a directed transportation network with nodes A through F; each edge is an agent, labeled with its cost.]

Figure 9.4 Transportation network with selfish agents.

Now let's consider the pivotal agents. The shortest path taking AB's declaration into account has a length of 5, and imposes a cost of 2 on other agents. The shortest path without AB is ACEF, which has a cost of 6. Thus p_AB = (−6) − (−2) = −4: AB is paid 4 for his participation. Arguing similarly, you can verify that p_BE = (−6) − (−4) = −2, and p_EF = (−7) − (−4) = −3. Note that although EF had the same cost as BE, they are paid different amounts for the use of their edges. This occurs because EF has more market power: for the other agents, the situation without EF is worse than the situation without BE.
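The computation above can be reproduced mechanically. The sketch below is our own; the specific edge costs are read off Figure 9.4 and should be treated as an assumption, although they are consistent with the path lengths and payments derived in the text.

import itertools

# Assumed edge costs for Figure 9.4. Each directed edge is an agent whose utility
# is minus its cost if the chosen path uses it, and zero otherwise.
edges = {
    ("A", "B"): 3, ("A", "C"): 2, ("B", "D"): 2, ("B", "E"): 1,
    ("C", "E"): 3, ("C", "F"): 5, ("D", "F"): 2, ("E", "F"): 1,
}

def shortest_path(available):
    """Brute-force cheapest A-to-F path using only the given directed edges."""
    inner = sorted({n for e in available for n in e} - {"A", "F"})
    best_cost, best_hops = float("inf"), None
    for r in range(len(inner) + 1):
        for mid in itertools.permutations(inner, r):
            hops = list(zip(("A",) + mid + ("F",), mid + ("F",)))
            if all(h in available for h in hops):
                cost = sum(available[h] for h in hops)
                if cost < best_cost:
                    best_cost, best_hops = cost, hops
    return best_cost, best_hops

cost_all, path_all = shortest_path(edges)              # ABEF, total cost 5
for edge, c in edges.items():
    cost_without, _ = shortest_path({e: w for e, w in edges.items() if e != edge})
    others_welfare_with = -(cost_all - (c if edge in path_all else 0))
    payment = -cost_without - others_welfare_with      # Clarke tax minus others' welfare
    print(edge, payment)
# Non-pivotal edges pay 0; AB, BE and EF receive 4, 2 and 3 (payments -4, -2, -3).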

9.4.2 Drawbacks of VCG

The VCG mechanism is one of the most powerful positive results in mechanism design: it gives us a general way of constructing dominant-strategy truthful mechanisms to implement social welfare maximizing social choice functions in quasilinear settings. We have seen that no fundamentally different mechanism could do the same job. And VCG gives us even more—Sections 9.4.3 and 9.5 will show that under the right conditions VCG further guarantees ex-post individual rationality and weak budget balance. Thus, it's not surprising that this mechanism has been enormously influential, and continues to be widely studied in the research community. However, despite these attractive properties, VCG also has some undesirable characteristics. In this section, we'll survey four of them.

First, consider again the transportation network example above. We saw that the shortest path has a length of 5, but the VCG mechanism ends up paying 9 for it. Can we give a bound on how much more than an agent's costs VCG can pay? (In the computer science theory literature, such a bound is called a frugality ratio.) Perhaps surprisingly, there is not. Consider again the edge AC, which is not part of the shortest path. If the cost of this edge increased to 8, then our payment to AB would increase to p_AB = (−12) − (−2) = −10. And indeed, if the cost for AC were some x ≥ 2, we would select the path ABEF and would have to make a payment to AB of p_AB = (−4 − x) − (−2) = −(x + 2). Thus, the gap between agents' true costs and the payments that they could receive under VCG is unbounded!

Second, observe that VCG requires agents to fully reveal their private information (e.g., in the transportation network example, every agent has to tell the mechanism his costs exactly). In some real-world domains, this private information may have value to agents that extends beyond the current interaction—for example, the agents may know that they will compete with each other again in the future. In such settings, it is often preferable to elicit only as much information from agents as is required to determine the social welfare maximizing outcome and compute the VCG payments. We'll discuss this issue further when we come to ascending auctions in Chapter 10.

Third, consider a referendum setting in which three agents use the VCG mechanism to decide between two choices. For example, this mechanism could be useful for deciding whether or not to undertake a public project such as building a road. Figure 9.5 shows a set of valuations and the VCG payments that each agent would be required to make.

Agent   U(build road)   U(do not build road)   Payment
1       200             0                      150
2       100             0                      50
3       0               250                    0

Figure 9.5 Valuations for agents in the road building referendum example.

We know from Theorem 9.4.2 that no agent can gain by changing his declaration. However, the same cannot be said about groups of agents. It turns out that groups of colluding agents can achieve higher utility by coordinating their declarations rather than honestly reporting their valuations. For example, Figure 9.6 shows that agents 1 and 2 can reduce both of their payments without changing the mechanism's decision by both increasing their declared valuations by $50.

Agent   U(build road)   U(do not build road)   Payment
1       250             0                      100
2       150             0                      0
3       0               250                    0

Figure 9.6 Agents in the road building referendum can gain by colluding.
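The payments in Figures 9.5 and 9.6 can be checked with a short sketch of the pivot mechanism for this two-alternative setting (our own illustration; ties in the arg max are broken in favor of building, which does not affect the payments here).

choices = ["build", "dont"]

def vcg(declared):
    # Efficient choice over a subset of agents: maximize their summed declared values.
    def best(profile, agents):
        return max(choices, key=lambda c: sum(profile[j][c] for j in agents))
    everyone = range(len(declared))
    x = best(declared, everyone)
    payments = []
    for i in everyone:
        others = [j for j in everyone if j != i]
        x_minus_i = best(declared, others)
        # Clarke tax minus the others' declared welfare at the actual choice.
        payments.append(sum(declared[j][x_minus_i] for j in others)
                        - sum(declared[j][x] for j in others))
    return x, payments

truthful = [{"build": 200, "dont": 0}, {"build": 100, "dont": 0}, {"build": 0, "dont": 250}]
colluding = [{"build": 250, "dont": 0}, {"build": 150, "dont": 0}, {"build": 0, "dont": 250}]
print(vcg(truthful))    # ('build', [150, 50, 0]), as in Figure 9.5
print(vcg(colluding))   # ('build', [100, 0, 0]), as in Figure 9.6: agents 1 and 2 pay less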

Finally, in a setting such as this road building example, we may want to use VCG to induce agents to report their valuations honestly, but may not want to make a profit by collecting money from the agents. In our example this might be true if the referendum was held by a government interested only in maximizing social welfare. Thus, we might want to find some way of returning the mechanism's profits back to the agents. This turns out to be surprisingly hard to do, because the possibility of receiving a rebate after the mechanism has been run changes the agents' incentives. In fact, even if profits are given to a charity that the agents care about, or spent in a way that benefits the local economy and hence benefits the agents, the VCG mechanism can be undermined.


9.4.3 VCG and individual rationality

We've seen that Groves mechanisms are dominant-strategy truthful and efficient. We've also seen that no other mechanism has both of these properties in general quasilinear settings. Thus, we might be a bit worried that we haven't been able to guarantee either individual rationality or budget balance, two properties that are quite important in practice. (Recall that individual rationality means that no agent would prefer not to participate in the mechanism; budget balance means that the mechanism doesn't lose money.) We'll consider budget balance in Section 9.5; here we'll investigate individual rationality.

As it turns out, our worry is well-founded: even with the freedom to set h_i, we can't find a mechanism that guarantees us individual rationality in an unrestricted quasilinear setting. However, we are often able to guarantee the strongest variety of individual rationality when the setting satisfies certain mild restrictions.

Definition 9.4.6 (Choice-set monotonicity) An environment exhibits choice-set monotonicity if ∀i, X_{-i} ⊆ X (removing any agent weakly decreases—that is, never increases—the mechanism's set of possible choices X).

Definition 9.4.7 (No negative externalities) An environment exhibits no negative externalities if ∀i ∀x ∈ X_{-i}, v_i(x) ≥ 0 (every agent has zero or positive utility for any choice that can be made without his participation).

These assumptions are often quite reasonable, as we'll illustrate with two examples. First, consider again our road-building referendum setting. The set of choices is independent of the number of agents, satisfying choice-set monotonicity. No agent negatively values the project, though some might value the situation in which the project is not undertaken more highly than the situation in which it is.

Second, consider a market setting consisting of a set of agents interested in buying a single unit of a good such as a share of stock, and another set of agents interested in selling a single unit of this good. The choices in this environment are sets of buyer-seller pairings (prices are imposed through the payment function). If a new agent is introduced into the market, no previously existing pairings become infeasible, but new ones become possible; thus choice-set monotonicity is satisfied. Because agents have zero utility both for choices that involve trades between other agents and for choices that involve no trades at all, there are no negative externalities.

Under these restrictions, it turns out that the VCG mechanism ensures ex-post individual rationality.

Theorem 9.4.8 The VCG mechanism is ex-post individually rational when the choice-set monotonicity and no negative externalities properties hold.

Proof. All agents truthfully declare their valuations in equilibrium. Then we can write agent i's utility as

\[
u_i = v_i(x(v)) - \left( \sum_{j \neq i} v_j(x(v_{-i})) - \sum_{j \neq i} v_j(x(v)) \right)
    = \sum_{j} v_j(x(v)) - \sum_{j \neq i} v_j(x(v_{-i})). \tag{9.8}
\]

We know that x(v) is the outcome that maximizes social welfare, and that this optimization could have picked x(v_{-i}) instead (by choice-set monotonicity). Thus,

\[
\sum_{j} v_j(x(v)) \;\geq\; \sum_{j} v_j(x(v_{-i})).
\]

Furthermore, from no negative externalities,

\[
v_i(x(v_{-i})) \geq 0.
\]

Therefore,

\[
\sum_{i} v_i(x(v)) \;\geq\; \sum_{j \neq i} v_j(x(v_{-i})),
\]

and thus Equation (9.8) is nonnegative.

9.4.4 Weak budget balance

Finally, what about weak budget balance, the requirement that the mechanism will not lose money? Our two previous conditions, choice-set monotonicity and no negative externalities, will not be sufficient to guarantee weak budget balance: for example, the selfish routing example above satisfied these two conditions, but we saw that the VCG mechanism paid out money and did not collect any. Thus, we'll have to explore further restrictions to the quasilinear setting.

Definition 9.4.9 (No single-agent effect) An environment exhibits no single-agent effect if ∀i, ∀v_{-i}, ∀x ∈ arg max_y ∑_j v_j(y) there exists a choice x′ that is feasible without i and that has ∑_{j≠i} v_j(x′) ≥ ∑_{j≠i} v_j(x).

In other words, removing any agent does not worsen the total value of the best solution to the others, regardless of their valuations. For example, this property is satisfied in a single-sided auction—dropping an agent just reduces the amount of competition in the auction, making the others better off.

Theorem 9.4.10 The VCG mechanism is weakly budget-balanced when the no single-agent effect property holds.


Proof. As before, we start by assuming truth telling in equilibrium. We must show that the sum of transfers from agents to the center is greater than or equal to zero:

\[
\sum_{i} p_i(v) = \sum_{i} \left( \sum_{j \neq i} v_j(x(v_{-i})) - \sum_{j \neq i} v_j(x(v)) \right).
\]

From the no single-agent effect condition we have that

\[
\forall i \quad \sum_{j \neq i} v_j(x(v_{-i})) \;\geq\; \sum_{j \neq i} v_j(x(v)).
\]

Thus the result follows directly.

Indeed, we can say something more about VCG's revenue properties: restricting ourselves to settings in which VCG is ex-post individually rational as discussed above, and comparing to all other efficient and ex-interim IR mechanisms, VCG turns out to collect the maximal amount of revenue from the agents. This is somewhat surprising, since this result does not require dominant strategies, and hence compares VCG to all Bayes-Nash mechanisms. A useful corollary of this result is that VCG is as budget balanced as it can be: it satisfies weak budget balance in every case where any dominant-strategy, efficient and ex-interim IR mechanism would be able to do so.

9.5 Budget balance and efficiency

Although it is encouraging to identify a realistic case in which the VCG mechanism is budget balanced, there exist other important and practical settings in which the no single-agent effect property does not hold. For example, define a simple exchange as an environment consisting of buyers and sellers with quasilinear utility functions, all interested in trading a single identical unit of some good. The no single-agent effect property is not satisfied in a simple exchange because dropping a seller could make some buyer worse off, and vice versa. Can we find some other argument to show that VCG will remain budget balanced in this important setting?

9.5.1 Impossibility results

It turns out that neither VCG nor any other Groves mechanism is budget balanced in the simple exchange setting.

Theorem 9.5.1 No dominant-strategy incentive-compatible mechanism is always both efficient and weakly budget balanced, even if agents are restricted to the simple exchange setting.

Furthermore, another seminal result showed that a similar problem arises in the broader class of Bayes-Nash incentive compatible mechanisms (which, recall, includes the class of dominant-strategy incentive compatible mechanisms) if we also require ex-interim individual rationality and allow general quasilinear utility functions.


Theorem 9.5.2 No Bayes-Nash incentive-compatible mechanism is always simultaneously efficient, weakly budget balanced and ex-interim individually rational, even if agents are restricted to quasilinear utility functions.

On the other hand, it turns out that it is possible to design a Bayes-Nash incentive compatible mechanism that achieves any two of these three properties.

9.5.2 The AGV mechanism

Of particular interest is the AGV mechanism, which trades away ex-interim individual rationality and dominant strategies in exchange for budget balance and ex-ante individual rationality.

Definition 9.5.3 (Arrow; d'Aspremont–Gerard-Varet (AGV) mechanism) The Arrow; d'Aspremont–Gerard-Varet (AGV) mechanism is a direct quasilinear mechanism (x, p), where

\[
x(\hat{v}) = \arg\max_{x} \sum_{i} \hat{v}_i(x),
\]
\[
p_i(\hat{v}) = \frac{1}{n-1} \sum_{j \neq i} ESW_{-j}(\hat{v}_j) \;-\; ESW_{-i}(\hat{v}_i),
\]
\[
ESW_{-i}(\hat{v}_i) = E_{v_{-i}} \left[ \sum_{j \neq i} v_j\left(x(\hat{v}_i, v_{-i})\right) \right].
\]

ESW (standing for "expected social welfare") is an intermediate term that is used to make the definition of p more concise. Observe that AGV's allocation rule is the same as under Groves mechanisms. Although we will not prove this or any of the other properties we mention here, AGV is incentive-compatible, from which we can conclude that it is also efficient. Unlike Groves mechanisms, on the other hand, the payment rule charges each agent i the expected utility of the other agents when i is not present, with this expectation conditioned on i's own declared type. (Recall that Groves mechanisms base the amount i is charged on the actual declared types of the other agents.) Furthermore, whereas VCG "refunds" i according to the actual social welfare of the other agents when i is present, AGV refunds to i a 1/(n−1) share of the payments collected from the other agents. This guarantees that the mechanism is budget balanced (i.e., that it always pays back to the agents exactly the total amount that it collects from them). Two sacrifices are made in exchange for this property: AGV is truthful only in Bayes-Nash equilibrium rather than in dominant strategies, and is only ex-ante individually rational.
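The following sketch (our own; the discrete type spaces and priors below are invented for illustration) computes AGV payments for a small example and confirms numerically that they always sum to zero, as budget balance requires.

import itertools

choices = ["build", "dont"]

# Hypothetical discrete type spaces and priors for 3 agents. Each type is a
# valuation function represented as a dict from choices to values, with a probability.
types = [
    [({"build": 200, "dont": 0}, 0.5), ({"build": 50, "dont": 0}, 0.5)],
    [({"build": 100, "dont": 0}, 0.5), ({"build": 0, "dont": 80}, 0.5)],
    [({"build": 0, "dont": 250}, 0.5), ({"build": 120, "dont": 0}, 0.5)],
]
n = len(types)

def efficient_choice(profile):
    # x(v): pick the choice maximizing the sum of declared valuations.
    return max(choices, key=lambda x: sum(v[x] for v in profile))

def esw_minus_i(i, v_i):
    # ESW_{-i}(v_i): expected welfare of the others, over their types, when the
    # efficient choice is computed from (v_i, v_{-i}).
    total = 0.0
    others = [types[j] for j in range(n) if j != i]
    for combo in itertools.product(*others):
        prob, profile, k = 1.0, [], 0
        for j in range(n):
            if j == i:
                profile.append(v_i)
            else:
                v_j, p_j = combo[k]
                profile.append(v_j)
                prob *= p_j
                k += 1
        x = efficient_choice(profile)
        total += prob * sum(profile[j][x] for j in range(n) if j != i)
    return total

def agv_payments(declared):
    esw = [esw_minus_i(i, declared[i]) for i in range(n)]
    return [sum(esw[j] for j in range(n) if j != i) / (n - 1) - esw[i]
            for i in range(n)]

declared = [types[i][0][0] for i in range(n)]   # one particular declaration profile
p = agv_payments(declared)
print(p, sum(p))   # the payments sum to (numerically) zero: budget balance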

The AGV mechanism illustrates two senses in which we can discover new, useful mechanisms by relaxing properties that we had previously insisted upon. First, our move from dominant-strategy incentive compatibility to Bayes-Nash incentive compatibility allowed us to circumvent Theorem 9.4.3, which told us that efficiency can be achieved under dominant strategies only by Groves mechanisms. (AGV is also efficient, but is not a Groves mechanism.) Second, moving from ex-interim to ex-ante individual rationality is sufficient to get around the negative result from Theorem 9.5.2, that we cannot simultaneously achieve weak budget balance, efficiency and ex-interim individual rationality, even under Bayes-Nash equilibrium.

9.6 Beyond efficiency

Throughout our consideration of the quasilinear setting in this chapter we have so far focused on efficient mechanisms. However, one practical problem with efficient choice rules is that they can require unreasonable amounts of computation, as evaluating the arg max can require solving an NP-hard problem in many practical domains. (We give examples of this in Sections 10.2.3 and 10.3.2.) Thus, both Groves mechanisms and AGV can fail to satisfy the tractability property (Definition 9.3.9).

In this section we consider two ways of addressing this issue. The first is to explore dominant-strategy mechanisms that implement different social choice functions. We have already seen that the quasilinear preference setting is considerably more amenable to dominant-strategy implementation than the unrestricted preferences setting. However, there are still limits—what are they? Second, we will examine an alternate way of building mechanisms, by using a Groves payment rule with an alternate choice function, and leveraging agents' computational limitations in order to achieve the implementation.

9.6.1 What else can be implemented in dominant strategies?

Here we give some characterizations of the social choice functions that can be implemented in dominant strategies in the quasilinear setting, and of how payments must be constrained in order to enable such implementations. As always, the revelation principle allows us to restrict ourselves to truthful mechanisms without loss of generality. We also restrict ourselves to deterministic mechanisms: this restriction does turn out to be substantive.

Let X_i(v_{-i}) ⊆ O denote the set of outcomes that can be selected by the choice rule x given the declaration v_{-i} by the agents other than i (i.e., the range of x(·, v_{-i})). Now we can state conditions for dominant-strategy truthfulness that are both necessary and sufficient, and that are also intuitive and useful.

Theorem 9.6.1 A mechanism is dominant-strategy incentive-compatible if and only if, for every i ∈ N and every v_{-i} ∈ V_{-i}:

1. the payment function p_i(v) can be written as p_i(v_{-i}, x(v));

2. for every v_i ∈ V_i, x(v_i, v_{-i}) ∈ arg max_{x ∈ X_i(v_{-i})} (v_i(x) − p_i(v_{-i}, x)).

The first condition says that an agent's payment can only depend on other agents' declarations and the selected outcome; it cannot depend otherwise on the agent's own declaration. The second condition says that, taking the other agents' declarations and the payment function into account, from every player's point of view the mechanism selects the most preferable outcome. This result isn't very difficult to prove; the interested reader is encouraged to try.


As the above characterization suggests, there is a tight connection between the choice rules and payment rules of dominant-strategy truthful mechanisms. In fact, under reasonable assumptions about the valuation space, once a choice rule is chosen, all possible payment rules differ only in their choice of a function h_i(v_{-i}) that is added to the rest of the payment. We already saw an example of this with the Groves family of mechanisms: these mechanisms share the same choice rule, and their payment rules differ only in the term h_i(v_{-i}), which is constant with respect to i's own declaration.

Given this strong link between choice rules and payment rules, it is interesting to characterize a set of choice rules that can be implemented in dominant strategies, without reference to payments. Here we will consider such a characterization, though in general it turns out only to offer a necessary condition for dominant-strategy truthfulness.

Definition 9.6.2 (WMON) A social choice function C satisfies weak monotonicity (WMON) if for all i ∈ N and all v_{-i} ∈ V_{-i}, C(v_i, v_{-i}) ≠ C(v′_i, v_{-i}) implies that

\[
v_i(C(v_i, v_{-i})) - v_i(C(v'_i, v_{-i})) \;\geq\; v'_i(C(v_i, v_{-i})) - v'_i(C(v'_i, v_{-i})).
\]

In words, WMON says that any time the choice function's decision can be altered by a single agent changing his declaration, it must be the case that this change expressed a relative increase in preference for the new choice over the old choice.

Theorem 9.6.3 All social choice functions implementable by dominant-strategy incentive-compatible mechanisms in quasilinear settings satisfy WMON. Furthermore, let C be an arbitrary social choice function C : V_1 × · · · × V_n → X satisfying WMON and having the property that ∀i ∈ N, V_i is a convex set. Then C can be implemented in dominant strategies.

Although Theorem 9.6.3 does not provide a full characterization of those social choice functions that can be implemented in dominant strategies, it gets pretty close—the convexity restriction is often acceptable. A bigger problem is that WMON is a local characterization, speaking about how the mechanism treats each agent individually. It would be desirable to have a global characterization that gave the social choice function directly. This also turns out to be possible.

Definition 9.6.4 (Affine maximizer) A social choice function is an affine maximizer if it has the form

\[
\arg\max_{x \in X} \left( \gamma_x + \sum_{i \in N} w_i v_i(x) \right),
\]

where each γ_x is an arbitrary constant (which may be −∞) and each w_i ∈ R^+.

In the case of general quasilinear preferences (i.e., when each agent can have an independent valuation for each choice x ∈ X) and where the choice function selects from more than two alternatives, affine maximizers turn out to be the only social choice functions implementable in dominant strategies.


Theorem 9.6.5 (Roberts, 1979) If there are at least three outcomes that a social choice function will choose given some input, and if agents have general quasilinear preferences, then the set of social choice functions implementable in dominant strategies is precisely the set of affine maximizers.

Note that efficiency is an affine-maximizing social choice function for which ∀x ∈ X, γ_x = 0 and ∀i ∈ N, w_i = 1. Indeed, affine-maximizing mechanisms can be seen as weighted Groves mechanisms—they transform both the choices and the agents' valuations by applying linear weights, and then effectively run a Groves mechanism in the transformed space. Thus, Theorem 9.6.5 says that we can't stray very far from Groves mechanisms even if we are willing to give up on efficiency.

Is this the end of the story on dominant-strategy implementation in quasilinear settings? Not quite. It turns out that the assumption that agents have general quasilinear preferences is a strong one, and does not hold in many domains of interest. As another extreme, we can consider single-parameter valuations: each agent i partitions the set of choices X into a set X_{i,wins} in which i "wins" and receives some constant payoff v_i that does not depend on the choice x ∈ X_{i,wins}, and a set of choices X_{i,loses} = X \ X_{i,wins} in which i "loses" and receives zero payoff.⁸ Importantly, the sets X_{i,wins} and X_{i,loses} are assumed to be common knowledge, and so the agent's private information can be summarized by a single parameter, v_i. Such settings are quite practical: we will see several that satisfy these conditions in Chapter 10. Single-parameter settings are interesting because for such preferences, it is possible to go well beyond affine maximizers. In fact, additional characterizations exist describing the social choice functions that can be implemented in this and other restricted-preference settings. We will not describe them here, instead referring interested readers to the works cited at the end of the chapter. However, we note that we do present a dominant-strategy incentive-compatible, non-affine-maximizing mechanism for a single-parameter setting in Section 10.3.5.

8. The assumption that this second payoff is zero can be understood as a normalization, and does not change the set of social choice functions that can be implemented.

9.6.2 Tractable Groves mechanisms

Now we consider a general approach that attempts to implement tractable, inefficient social choice functions by sticking with Groves mechanisms, but replacing the (possibly exponential-time) computation of the arg max with some other polynomial-time algorithm. The very clever idea here is not to build mechanisms that are impossible to manipulate (indeed, in many cases it can be shown that this cannot be done), but rather to build mechanisms that agents will be unable to manipulate in practice, given their computational limitations.

First, we define the class of mechanisms being considered.

Definition 9.6.6 (Groves-based mechanisms) Groves-based mechanisms are direct quasilinear mechanisms (x, p), for which x(v̂) is an arbitrary function mapping type declarations to choices, and

\[
p_i(\hat{v}) = h_i(\hat{v}_{-i}) - \sum_{j \neq i} \hat{v}_j(x(\hat{v})).
\]

That is, a mechanism is Groves-based if it uses the Groves payment function, regardless of what allocation function it uses. (Contrast Definition 9.6.6 with Definition 9.4.1, which defined a Groves mechanism.)

Most interesting Groves-based mechanisms are not dominant-strategy truthful. For example, consider a property sometimes called reasonableness: if there exists some good g that only one agent i values above zero, g should be allocated to i. It can be shown that the only dominant-strategy truthful Groves-based mechanisms that satisfy reasonableness are Groves mechanisms themselves. This rules out the use of most greedy algorithms as candidates for the allocation function x in truthful Groves-based mechanisms, as most such algorithms would select "reasonable" allocations.

If tractable Groves-based mechanisms lose the all-important property of dominant-strategy truthfulness, why are they still interesting? The proof of Theorem 9.4.2 essentially argued that the Groves payment function aligns agents' utilities, making all agents prefer the optimal allocation. Groves-based mechanisms still have this property, but may not select the optimal allocation. We can conclude that the only way an agent can gain by lying to a Groves-based mechanism is to help it by causing it to select a more efficient allocation.

We now come to the idea of a second-chance mechanism. Intuitively, since lies by agents can only help the mechanism, the mechanism can simply ask the agents how they intend to lie, and select an outcome that would be picked because of such a lie if it turns out to be better than what the mechanism would have picked otherwise.

Definition 9.6.7 (Second chance mechanisms) Given a Groves-based mechanism (x, p), a second chance mechanism works as follows:

1. Each agent i is asked to submit a valuation declaration v̂_i ∈ V_i and an appeal function l_i : V → V.

2. The mechanism computes x(v̂), and also x(l_i(v̂)) for all i ∈ N. From the set of outcomes thus identified, the mechanism keeps one that maximizes the sum of agents' declared valuations.

3. The mechanism charges each agent i p_i(v̂).

Intuitively, an appeal function maps agents' valuations to valuations that they might instead have chosen to report by lying. It is important that the appeal functions be computationally bounded (e.g., their execution could be time-limited). Otherwise, these functions could solve the social welfare maximization problem, and then select an input that would cause x to select this outcome. When appeal functions are computationally restricted, we cannot in general say that second chance mechanisms are truthful. However, they are feasibly truthful, because an agent can use the appeal function to try out any lie that he believes might help him. Thus in a second-chance mechanism, a computationally limited agent can do no better than to declare his true valuation along with the best appeal function he is able to construct.
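To illustrate the structure of a second-chance mechanism, here is a small sketch (our own; the greedy allocation rule, the appeal functions, and the choice h_i ≡ 0 are all invented for the example, and real appeal functions would be time-limited as discussed above).

def second_chance(allocation, choices, declared, appeals):
    n = len(declared)
    welfare = lambda x: sum(v[x] for v in declared)
    # Step 2: compute x(v) and x(l_i(v)) for every agent, then keep the best
    # candidate according to the *declared* valuations.
    candidates = [allocation(declared, choices)]
    candidates += [allocation(appeal(declared), choices) for appeal in appeals]
    x = max(candidates, key=welfare)
    # Step 3: Groves-based payments (taking h_i = 0 in this sketch), computed
    # at the selected outcome.
    payments = [-sum(declared[j][x] for j in range(n) if j != i) for i in range(n)]
    return x, payments

# A deliberately crude allocation rule: pick the choice with the single highest bid.
greedy = lambda vs, cs: max(cs, key=lambda c: max(v[c] for v in vs))

declared = [{"a": 10, "b": 0}, {"a": 0, "b": 7}, {"a": 0, "b": 6}]
# Agent 1's appeal: "had I exaggerated my value for b, the rule would have chosen b".
appeals = [lambda vs: vs,
           lambda vs: [vs[0], {"a": 0, "b": sum(v["b"] for v in vs)}, vs[2]],
           lambda vs: vs]
print(second_chance(greedy, ["a", "b"], declared, appeals))
# The appeal steers the crude rule to "b", the higher-welfare choice.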

9.7 Applications of mechanism design

We now survey some applications of mechanism design, to give a sense of some more recent work that has drawn on the theory we have described so far. One caveat: without a doubt the most significant application of mechanism design is the design and analysis of auctions. Because there is so much to say about this application, we defer its discussion to Chapter 10.

9.8 History and references

Mechanism design is covered to varying degrees in modern game theory textbooks, but even better are the microeconomic textbook of Mas-Colell et al. [1995] and books on auction theory such as [Krishna, 2002a]. Good overviews from a computer science perspective are given in the introductory chapters of [Parkes, 2001] and [Nisan et al., 2007].

Specific publications that underlie some of the results covered in this chapter are as follows.

Vickrey's seminal contribution, which laid the foundations to all modern work on implementation in dominant strategies, appears in [Vickrey, 1961]. It is still recommended reading for anyone interested in the area; we expand on it further in the context of auctions, the topic of the next chapter.

Theorem 9.2.5 is due to both Satterthwaite and Gibbard, in two separate publications [Gibbard, 1973; Satterthwaite, 1975]. The VCG mechanism was anticipated by Vickrey [1961], who outlined an extension of the second-price auction to multiple identical goods. Groves [1973] explicitly considered the general family of truthful mechanisms applying to multiple distinct goods (though the result had appeared already in Groves's 1969 Ph.D. dissertation). Clarke [1971] proposed his tax for use with public goods (that is, goods such as roads and national defense that are paid for by all regardless of personal use). Theorem 9.4.3 is due to Green and Laffont [1977]; Theorem 9.5.1 is due to that paper as well as to the earlier [Hurwicz, 1975]. The fact that Groves mechanisms are payoff-equivalent to all other Bayes-Nash incentive compatible efficient mechanisms was shown by Krishna and Perry [1998] and Williams [1999]; the former reference also gave the results that VCG is ex-interim individually rational, and that VCG collects the maximal amount of revenue among all ex-interim individually rational Groves mechanisms. Taking the opposite tack, recent research has investigated VCG-like mechanisms that collect as little revenue from the agents as possible, and thus minimize the extent to which they violate (strong) budget balance [Porter et al., 2004b; Cavallo, 2006; Guo & Conitzer, 2007]. (Interestingly, the first of these papers came to this problem through a desire to achieve fair outcomes.) The Myerson-Satterthwaite theorem (9.5.2) appears in [Myerson & Satterthwaite, 1983]. The AGV mechanism is due (independently) to Arrow [1977] and d'Aspremont and Gerard-Varet [1979]. The section on implementation in dominant strategies follows Nisan [2007]; Theorem 9.6.5 is due to Roberts [1979]. Second chance mechanisms are due to Nisan and Ronen [2007]. (One difference: we preferred the term Groves-based mechanisms to Nisan and Ronen's VCG-based mechanisms.)


10 Protocols for Multiagent Resource Allocation: Auctions

In this chapter we consider the problem of allocating resources among selfish agents in a multiagent system. Auctions—an interesting and important application of mechanism design—turn out to provide a general solution to this problem. We describe various different flavors of auctions, including single-good, multiunit and combinatorial auctions. In each case, we survey some of the key theoretical, practical and computational insights from the literature.

The auction setting is important for two reasons. First, auctions are widely used in consumer, corporate, and computer science settings. Millions of people use auctions daily on Internet consumer websites to trade goods. More complex types of auctions have been used by governments around the world to sell important public resources such as access to electromagnetic spectrum. Indeed, all financial markets constitute a type of auction (one of the family of so-called double auctions). Auctions are also often used in computer science applications to efficiently allocate bandwidth and processing power to applications and users.

The second—and more fundamental—reason to care about auctions is that they provide a general theoretical framework for understanding resource allocation problems among self-interested agents. Formally speaking, an auction is any protocol that allows agents to indicate their interest in one or more resources, and that uses these indications of interest to determine both an allocation of resources and a set of payments by the agents. Thus, auctions are important for a wide range of computational settings (e.g., the sharing of computational power in a grid computer on a network) that would not normally be thought of as auctions, and that might not even use money as the basis of payments.

10.1 Single-good auctions

It is important to realize that the most familiar type of auction—the ascending-bid, English auction—is a drop in the ocean of auction types. Indeed, since auctions are simply mechanisms (see Chapter 9) for allocating goods, there is an infinite number of auction types. In the most familiar types of auctions there is one good for sale, one seller, and multiple potential buyers. Each buyer has his own valuation for the good, and each wishes to purchase it at the lowest possible price. These auctions are called single-sided, because there are multiple agents on only one side of the market. Our task is to design a protocol for this auction that satisfies certain desirable global criteria. For example, we might want an auction protocol that maximizes the expected revenue of the seller. Or, we might want an auction that is economically efficient, that is, one that guarantees that the potential buyer with the highest valuation ends up with the good.

Given the popularity of auctions on the one hand, and the diversity of auction mechanisms on the other, it is not surprising that the literature on the topic is vast. In this section we provide a taste of this literature, concentrating on auctions for selling a single good. We explore richer settings later in the chapter.

10.1.1 Canonical auction families

To give a feel for the broad space of single-good auctions, we start by describing some of the most famous families: English, Japanese, Dutch and sealed-bid auctions. We end the section by presenting a unifying view of auctions as structured negotiations.

English auctions

The English auction is perhaps the best-known family of auctions, since in one form or another such auctions are used in the venerable, old-guard auction houses, as well as most of the online consumer auction sites. The auctioneer sets a starting price for the good, and agents then have the option to announce successive bids, each of which must be higher than the previous bid (usually by some minimum increment set by the auctioneer). The rules for when the auction closes vary; in some instances the auction ends at a fixed time, in others it ends after a fixed period during which no new bids are made, in others at the latest of the two, and in still other instances at the earliest of the two. The final bidder, who by definition is the agent with the highest bid, must purchase the good for the amount of his final bid.

Japanese auctions

The Japanese auction¹ is similar to the English auction in that it is an ascending-bid auction, but is different otherwise. Here the auctioneer sets a starting price for the good, and each agent must choose whether or not to be “in”, that is, whether they are willing to purchase the good at that price. The auctioneer then calls out successively increasing prices in a regular fashion², and after each call each agent must announce whether they are still in. When they drop out it is irrevocable, and they cannot re-enter the auction. The auction ends when there is exactly one agent left in; that agent must then purchase the good for the current price.

Dutch auctions

In a Dutch auction the auctioneer begins by announcing a high price, and then proceeds to announce successively lower prices in a regular fashion. In practice, the descending prices are indicated by a clock that all of the agents can see. The auction ends when the first agent signals the auctioneer by pressing a buzzer and stopping the clock; the signaling agent must then purchase the good for the displayed price. This auction gets its name from the fact that it is used in the Amsterdam flower market; in practice, it is most often used in settings where goods must be sold quickly.

1. Unlike the terms English and Dutch, the term Japanese is not used universally; however, it is commonly used, and there is no competing name for this family of auctions.
2. In the theoretical analyses of this auction the assumption is usually that the prices rise continuously.

Sealed-bid auctions

All the auctions discussed so far are considered open outcry auctions, in that all the bidding is done by calling out the bids in public (however, as we'll discuss shortly, in the case of the Dutch auction this is something of an optical illusion). The family of sealed-bid auctions, probably the best known after English auctions, is different. In this case, each agent submits to the auctioneer a secret, “sealed” bid for the good that is not accessible to any of the other agents. The agent with the highest bid must purchase the good, but the price at which he does so depends on the type of sealed-bid auction. In a first-price sealed-bid auction (or simply first-price auction) the winning agent pays an amount equal to his own bid. This auction is also sometimes called a silent auction. In a second-price auction he pays an amount equal to the next highest bid (that is, the highest rejected bid). The second-price auction is also called the Vickrey auction. In general, in a kth-price auction the winning agent purchases the good for a price equal to the kth highest bid.

Auctions as structured negotiations

While it is useful to have reviewed the best known auction types, this list is far from exhaustive. For example, consider the following auction, consisting of a sequence of sealed bids. In the first round the lowest bidder drops out; his bid is announced and becomes the minimum bid in the next round for the remaining bidders. This process continues until only one bidder remains; this bidder wins and pays the minimum bid in the final round. This auction, called the elimination auction, is different from the auctions described above, and yet makes perfect sense. Or consider a procurement reverse auction, in which an initial sealed-bid auction is conducted among the interested suppliers, and then a reverse English auction is conducted among the three cheapest suppliers (the “finalists”) to determine the ultimate supplier. This two-phased auction is not uncommon in industry.

Indeed, a taxonomical perspective obscures the elements common to all auctions, and thus the infinite nature of the space. What is an auction? At heart it is simply a structured framework for negotiation. Each such negotiation has certain rules, which can be broken down into three categories:

1. Bidding rules: How are offers made (by whom, when, what can their content be)?

2. Clearing rules: When do trades occur, and what are those trades (who gets which goods, and what money changes hands) as a function of the bidding?

3. Information rules: Who knows what when about the state of negotiation?


The different auctions we have discussed make different choices along these three axes, but it is clear that other rules can be instituted. Indeed, when viewed this way, it becomes clear that what seem like three radically different commerce mechanisms—the hushed purchase of a Matisse at a high-end auction house in London, the mundane purchase of groceries at the local supermarket, and the one-on-one horse trading in a Middle Eastern souk—are simply auctions that make different choices along these three dimensions.

10.1.2 Auctions as Bayesian mechanisms

We now move to a more formal investigation of single-good auctions. Our starting point is the observation that the problem of choosing an auction that has various desired properties is a mechanism design problem. Ordinarily we assume that agents' utility functions in an auction setting are quasilinear. To define an auction as a quasilinear mechanism (see Definition 9.3.2) we must identify the following elements:

• A set of agents N

• A set of outcomes O = X × R^n

• A set of actions A_i available to each agent i ∈ N

• A choice function x that selects one of the outcomes given the agents' actions

• A payment function p that determines what each agent must pay given all agents' actions

In an auction, the possible outcomes O consist of all possible ways to allocate the good—the set of choices X—and all possible ways of charging the agents. The agents' actions will vary in different auction types. In a sealed-bid auction, each set A_i is an interval from R (i.e., an agent's action is the declaration of a bid amount between some minimum and maximum value). A Japanese auction is an extensive-form game with imperfect information (see Section 4.2), and so in this case the action space is the space of all policies the agent could follow (i.e., all different ways of acting conditioned on different observed histories). As in all mechanism design problems, the choice and payment functions x and p depend on the objective of the auction. If it is to maximize efficiency, these functions are defined in a straightforward way. If it is to maximize revenue, we must add the auctioneer as one of the agents, with no choice of strategy but with a decided preference over the various outcomes (namely, preferring the outcomes in which the total payments to the auctioneer are maximal), a preference that defines the x and p functions.
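
The following minimal sketch (our own illustration, not part of the text) shows one way to encode a sealed-bid single-good auction as a pair of pluggable functions: a choice function x that picks a winner and a payment function p that charges the agents. The names SealedBidAuction, highest_bidder, first_price and second_price are illustrative only.

from typing import Callable, List, Tuple


class SealedBidAuction:
    def __init__(self,
                 choice: Callable[[List[float]], int],
                 payments: Callable[[List[float], int], List[float]]):
        self.choice = choice        # x: actions -> allocation (index of winner)
        self.payments = payments    # p: actions, allocation -> payment per agent

    def run(self, bids: List[float]) -> Tuple[int, List[float]]:
        winner = self.choice(bids)
        return winner, self.payments(bids, winner)


def highest_bidder(bids: List[float]) -> int:
    # Allocate the good to the agent with the highest bid.
    return max(range(len(bids)), key=lambda i: bids[i])


def first_price(bids, winner):
    # Winner pays his own bid; everyone else pays nothing.
    return [bids[i] if i == winner else 0.0 for i in range(len(bids))]


def second_price(bids, winner):
    # Winner pays the highest rejected bid; everyone else pays nothing.
    runner_up = max(b for i, b in enumerate(bids) if i != winner)
    return [runner_up if i == winner else 0.0 for i in range(len(bids))]


if __name__ == "__main__":
    bids = [0.3, 0.8, 0.5]
    print(SealedBidAuction(highest_bidder, first_price).run(bids))
    print(SealedBidAuction(highest_bidder, second_price).run(bids))

Different auctions in this chapter differ precisely in how these two functions are defined, and (for indirect mechanisms such as Japanese auctions) in what counts as an action.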

A Bayesian game with quasilinear preferences includes two more ingredients that we need to specify: the common prior, and the agents' utility functions. We will say more about the common prior—the distribution from which the agents' types are drawn—later; here, just note that the definition of an auction as a Bayesian game is incomplete without it. Considering the agents' utility functions, first note that the quasilinearity assumption (see Definition 9.3.1) allows us to write u_i(o, θ_i) = u_i(x, θ_i) − f_i(p_i).


[Figure 10.1 A case analysis to show that honest bidding is a dominant strategy in a second-price auction with independent private values. Each panel marks i's true value, i's bid, the next-highest (or highest) bid, and the amount i pays: (a) bidding honestly, i has the highest bid; (b) i bids higher and still wins; (c) i bids lower and still wins; (d) i bids even lower and loses; (e) bidding honestly, i does not have the highest bid; (f) i bids lower and still loses; (g) i bids higher and still loses; (h) i bids even higher and wins.]

The function f_i indicates the agent's risk attitude, as discussed in Section 9.3.1. Unless we indicate otherwise, we will commonly assume risk neutrality.

We are left with the task of describing the agents' valuations: their utilities for different allocations of the good x ∈ X. Auction theory distinguishes between a number of different settings here. One of the best known and most extensively studied is the independent private value (IPV) setting. In this setting all agents' valuations are drawn independently from the same (commonly known) distribution, and an agent's type (or “signal”) consists only of his own valuation, giving him no information about the valuations of the others. An example where the IPV setting is appropriate is in auctions consisting of bidders with personal tastes who aim to buy a piece of art purely for their own enjoyment. In most of this section we will assume that agents have independent private values, though we will explore an alternative, the common value assumption, in Section 10.1.10.

10.1.3 Second-price, Japanese and English auctions

Let us now consider whether the second-price sealed-bid auction, which is a direct mechanism, is truthful (that is, whether it provides incentive for the agents to bid their true values). The following, conceptually straightforward proof shows that in the IPV case it is.

Theorem 10.1.1 In a second-price auction where bidders have independent private values, truth-telling is a dominant strategy.


The observant reader will have noticed that the second-price auction is a special case of the VCG mechanism, and hence of the Groves mechanism. Thus, Theorem 10.1.1 follows directly from Theorem 9.4.2. However, a proof of this narrower claim is considerably more intuitive than the general argument.

Proof. Assume that all bidders other than i bid in some arbitrary way, and consider i's best response. First, consider the case where i's valuation is larger than the highest of the other bidders' bids. In this case i would win and would pay the next-highest bid amount, as illustrated in Figure 10.1(a). Could i be better off by bidding dishonestly in this case? If he bid higher, he would still win and would still pay the same amount, as illustrated in Figure 10.1(b). If he bid lower, he would either still win and still pay the same amount (Figure 10.1(c)) or lose and pay zero (Figure 10.1(d)).³ Since i gets non-negative utility for receiving the good at a price less than or equal to his valuation, i cannot gain, and would sometimes lose, by bidding dishonestly in this case. Now consider the other case, where i's valuation is less than at least one other bidder's bid. In this case i would lose and pay zero (Figure 10.1(e)). If he bid less, he would still lose and pay zero (Figure 10.1(f)). If he bid more, either he would still lose and pay zero (Figure 10.1(g)) or he would win and pay more than his valuation (Figure 10.1(h)), achieving negative utility. Thus again, i cannot gain, and would sometimes lose, by bidding dishonestly in this case.

3. Figure 10.1(d) is over-simplified: the winner will not always pay i's bid in this case. (Do you see why?)

Notice that this proof does not depend on the agents' risk attitudes. Thus, an agent's dominant strategy in a second-price auction is the same regardless of whether the agent is risk-neutral, risk-averse or risk-seeking.
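
The following small Monte Carlo check (our own illustration) makes the case analysis above concrete: against one arbitrary (here uniform) distribution of opposing bids, no bid does better in expectation than bidding one's true valuation in a second-price auction.

import random

def second_price_utility(my_bid, my_value, other_bids):
    """Utility to the bidder: value minus price if he wins, zero otherwise."""
    if my_bid > max(other_bids):
        return my_value - max(other_bids)
    return 0.0

def expected_utility(my_bid, my_value, n_others=4, trials=200_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        others = [rng.random() for _ in range(n_others)]  # arbitrary opposing bids
        total += second_price_utility(my_bid, my_value, others)
    return total / trials

if __name__ == "__main__":
    value = 0.7
    for bid in (0.4, 0.55, 0.7, 0.85, 1.0):
        print(f"bid {bid:.2f}: expected utility {expected_utility(bid, value):.4f}")
    # Truthful bidding (bid = 0.70) should come out (weakly) best.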

In the IPV case, we can identify strong relationships between the second-price auction and Japanese and English auctions. Consider first the comparison between second-price and Japanese auctions. In both cases the bidder must select a number (in the sealed-bid case the number is the one written down, and in the Japanese case it is the price at which the agent will drop out); the bidder with the highest amount wins, and pays the amount selected by the second-highest bidder. The difference between the auctions is that information about other agents' bid amounts is disclosed in the Japanese auction. In the sealed-bid auction an agent's bid amount must be selected without knowing anything about the amounts selected by others, whereas in the Japanese auction the amount can be updated based on the observed prices at which lower bidders dropped out. In general, this difference can be important (see Section 10.1.10); however, it makes no difference in the IPV case. Thus, Japanese auctions are also dominant-strategy truthful when agents have independent private values.

Obviously, the Japanese and English auctions are closely related. Thus, it is not surprising to find that second-price and English auctions are also similar. One connection can be seen through proxy bidding, a service offered on some online auction sites such as eBay. Under proxy bidding, a bidder tells the system the maximum amount he is willing to pay. The user can then leave the site, and the system bids as the bidder's proxy: every time the bidder is outbid, the system will respond with a bid one increment higher, until the bidder's maximum is reached. It is easy to see that if all bidders use the proxy service and update it only once, what occurs will be identical to a second-price auction (excepting that the winner's payment may be one bid increment higher).
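
A rough simulation of this idea (our own sketch, under the simplifying assumption that every bidder enters a maximum amount exactly once) shows the connection: the bidder with the highest maximum wins, at roughly the second-highest maximum plus one increment.

def proxy_bidding(maxima, start=0.0, increment=0.05):
    """Return (winning bidder index, final price) under simple proxy bidding."""
    order = sorted(range(len(maxima)), key=lambda i: maxima[i], reverse=True)
    top, runner_up = order[0], order[1]
    if maxima[runner_up] < start:
        return top, start
    # The leader's proxy responds one increment above the runner-up's maximum,
    # capped at the leader's own maximum.
    price = min(maxima[top], maxima[runner_up] + increment)
    return top, price

if __name__ == "__main__":
    maxima = [0.62, 0.90, 0.75, 0.40]
    print(proxy_bidding(maxima))   # bidder 1 wins at about 0.80 (= 0.75 + 0.05)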

The main complication with English auctions is that bidders can place so-called jump bids: bids that are greater than the previous high bid by more than the minimum increment. Although it seems relatively innocuous, this feature complicates analysis of such auctions. Indeed, when an ascending auction is analyzed it is almost always the Japanese one, not the English.

10.1.4 First-price and Dutch auctions

Let us now consider first-price auctions. The first observation we can make is that the Dutch auction and the first-price auction, while quite different in appearance, are actually the same auction (in the technical jargon, they are strategically equivalent). In both auctions each agent must select an amount without knowing about the other agents' selections; the agent with the highest amount wins the auction, and must purchase the good for that amount. Strategic equivalence is a very strong property: it says the auctions are exactly the same no matter what risk attitudes the agents have, and no matter what valuation model describes their utility functions. This being the case, it is interesting to ask why both auction types are held in practice. One answer is that they make a trade-off between time complexity and communication complexity. First-price auctions require each bidder to send a message to the auctioneer, which could be unwieldy with a large number of bidders. Dutch auctions require only a single bit of information to be communicated to the auctioneer, but require the auctioneer to broadcast prices.

Of course, all this talk of equivalence does not help us to understand anything about how an agent should actually bid in a first-price or Dutch auction. Unfortunately, unlike the case of second-price auctions, here we do not have the luxury of dominant strategies, and must thus resort to Bayes-Nash equilibrium analysis. Let us assume that agents have independent private valuations. Furthermore, in a first-price auction, an agent's risk attitude also matters. For example, a risk-averse agent would be willing to sacrifice some expected utility (by increasing his bid over what a risk-neutral agent would bid), in order to increase his probability of winning the auction. Let us assume that agents are risk-neutral, and that their valuations are drawn uniformly from some interval, say [0, 1]. Let s_i denote the bid of player i, and v_i denote his true valuation. Thus if player i wins, his payoff is u_i = v_i − s_i; if he loses, it is u_i = 0. Now we prove in the case of two agents that there is an equilibrium in which each player bids half of his true valuation. (This also happens to be the unique symmetric equilibrium, but we do not demonstrate that here.)

Proposition 10.1.2 In a first-price auction with two risk-neutral bidders whose valuations are drawn independently and uniformly at random from the interval [0, 1], ((1/2)v_1, (1/2)v_2) is a Bayes-Nash equilibrium strategy profile.

Multiagent Systems, draft of August 27, 2007

Page 296: Mas kevin b-2007

280 10 Protocols for Multiagent Resource Allocation: Auctions

Proof. Assume that bidder 2 bids (1/2)v_2. From the fact that v_2 was drawn from a uniform distribution, all values of v_2 between 0 and 1 are equally likely. Now consider bidder 1's expected utility, in order to write an expression for his best response.

E[u_1] = ∫_0^1 u_1 dv_2    (10.1)

The integral in Equation (10.1) can be broken up into two smaller integrals that describe cases in which player 1 does and does not win the auction.

E[u_1] = ∫_0^{2s_1} u_1 dv_2 + ∫_{2s_1}^1 u_1 dv_2

We can now substitute in values for u_1. In the first case, because 2 bids (1/2)v_2, 1 wins when v_2 < 2s_1, and gains utility v_1 − s_1. In the second case 1 loses and gains utility 0. Observe that we can ignore the case where the agents tie, because this occurs with probability zero.

E[u_1] = ∫_0^{2s_1} (v_1 − s_1) dv_2 + ∫_{2s_1}^1 0 dv_2
       = (v_1 − s_1) v_2 |_0^{2s_1}
       = 2v_1 s_1 − 2s_1^2    (10.2)

We can find bidder 1's best response to bidder 2's strategy by taking the derivative of Equation (10.2) and setting it equal to zero:

∂/∂s_1 (2v_1 s_1 − 2s_1^2) = 0
2v_1 − 4s_1 = 0
s_1 = (1/2)v_1

Thus when player 2 is bidding half her valuation, player 1's best strategy is to bid half his valuation. The calculation of the optimal bid for player 2 is analogous, given the symmetry of the game and the equilibrium.

This proposition was quite narrow: it spoke about the case of only two bidders, and considered valuations that were drawn uniformly at random from a particular interval of the real numbers. Nevertheless, this is already enough for us to be able to observe that first-price auctions are not incentive compatible (and hence, unsurprisingly, are not equivalent to second-price auctions).

Somewhat more generally, we have the following theorem.

Theorem 10.1.3 In a first-price sealed-bid auction with n risk-neutral agents whose valuations are independently drawn from a uniform distribution on the same bounded interval of the real numbers, the unique symmetric equilibrium is given by the strategy profile ((n−1)/n · v_1, . . . , (n−1)/n · v_n).

In other words, the unique equilibrium of the auction occurs when each player bids (n−1)/n of his valuation. This theorem can be proven using an argument similar to that used in Proposition 10.1.2 above, although the calculus gets a bit more involved (for one thing, we must reason about the fact that each of several opposing agents may place the high bid). However, there's a broader problem: that proof only showed how to verify an equilibrium strategy. How do we identify one in the first place? Although it is also possible to do this from first principles (at least for straightforward auctions such as first-price), we will explain a simpler technique in the next section.
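
Before moving on, a numerical spot-check (our own illustration) of the theorem: if the other bidders bid (n−1)/n times their valuations, then bidding (n−1)/n of one's own valuation is approximately a best response.

import random

def expected_utility(my_bid, my_value, n, trials=200_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        others = [(n - 1) / n * rng.random() for _ in range(n - 1)]
        if my_bid > max(others):
            total += my_value - my_bid     # first-price: winner pays his own bid
    return total / trials

if __name__ == "__main__":
    n, value = 5, 0.8
    for bid in (0.5, 0.6, (n - 1) / n * value, 0.7, 0.8):
        print(f"bid {bid:.3f}: expected utility {expected_utility(bid, value, n):.4f}")
    # The bid (n-1)/n * value = 0.640 should be (near) the maximizer.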

10.1.5 Revenue equivalence

Of the large (in fact, infinite) space of auctions, which one should an auctioneer choose? To a certain degree, the choice does not matter, a result formalized by the following well-known theorem.⁴

Theorem 10.1.4 (Revenue Equivalence Theorem) Assume that each of n risk-neutral agents has an independent private valuation for a single good at auction, drawn from a common cumulative distribution F(v) that is strictly increasing and atomless on [v̲, v̄]. Then any auction mechanism in which

• the good will be allocated to the agent with the highest valuation;⁵ and

• any agent with valuation v̲ has an expected utility of zero;

yields the same expected revenue, and hence results in any bidder with valuation v making the same expected payment.

4. What is stated, in fact, is the revenue equivalence theorem for the private-value, single-good case. Similar theorems hold for other—though not all—cases.
5. We can replace this condition by the requirement that the auction has a symmetric and increasing equilibrium, and always allocates the good to an agent who placed the highest bid.

Proof. Consider any mechanism (direct or indirect) for allocating the good. Let u_i(v) be i's expected utility and let p_i(v) be i's probability of being awarded the good, in equilibrium of the mechanism, if he follows the equilibrium strategy for an agent with type v and this were in fact his type.

u_i(v_i) = v_i p_i(v_i) − E[payment by type v_i of player i]    (10.3)

From the definition of equilibrium, for any other type v̂,

u_i(v_i) ≥ u_i(v̂) + (v_i − v̂) p_i(v̂)    (10.4)

By behaving according to the equilibrium strategy for a player of type v̂, i makes all the same payments and wins the good with the same probability as an agent of type v̂. Because an agent of type v_i values the good (v_i − v̂) more than an agent of type v̂ does, we must add this term. The inequality holds because this deviation must be unprofitable. Consider v̂ = v_i + dv_i; substituting this expression into Equation (10.4) gives

u_i(v_i) ≥ u_i(v_i + dv_i) − dv_i · p_i(v_i + dv_i)    (10.5)

Likewise, considering the possibility that i's true type could be v_i + dv_i,

u_i(v_i + dv_i) ≥ u_i(v_i) + dv_i · p_i(v_i)    (10.6)

Combining (10.5) and (10.6), we have

p_i(v_i + dv_i) ≥ (u_i(v_i + dv_i) − u_i(v_i)) / dv_i ≥ p_i(v_i)    (10.7)

Taking the limit as dv_i → 0 gives

du_i/dv_i = p_i(v_i)    (10.8)

Integrating up,

u_i(v_i) = u_i(v̲) + ∫_{v̲}^{v_i} p_i(x) dx    (10.9)

Now consider any two mechanisms that satisfy the conditions given in the statement of the theorem. A bidder with valuation v̲ will never win (since the distribution is atomless), so his expected utility is u_i(v̲) = 0. Every agent i has the same p_i(v_i) (his probability of winning given his type v_i) under the two mechanisms, regardless of his type. These mechanisms must then also have the same u_i functions, by Equation (10.9). From Equation (10.3), this means that a player of any given type v_i must make the same expected payment in both mechanisms. Thus, i's ex ante expected payment is also the same in both mechanisms. Since this is true for all i, the auctioneer's expected revenue is also the same in both mechanisms.

Thus, when bidders are risk neutral and have independent private valuations, all the auctions we have spoken about so far—English, Japanese, Dutch, and all sealed-bid auction protocols—are revenue equivalent. The revenue equivalence theorem is useful beyond telling the auctioneer that it doesn't much matter which auction she holds, however. It is also a powerful analytic tool. In particular, we can make use of this theorem to identify equilibrium bidding strategies for auctions that meet the theorem's conditions.
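
A quick Monte Carlo illustration (our own, not from the text): with valuations i.i.d. uniform on [0, 1], a first-price auction in which every bidder bids (n−1)/n of his valuation and a second-price auction with truthful bidding yield the same expected revenue, although the second-price revenue is more variable (compare the discussion of seller risk attitudes in Section 10.1.6).

import random
import statistics

def sample_revenues(n=5, trials=100_000, seed=2):
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(trials):
        vals = sorted(rng.random() for _ in range(n))
        first.append((n - 1) / n * vals[-1])   # winner pays his own (shaded) bid
        second.append(vals[-2])                # winner pays second-highest valuation
    return first, second

if __name__ == "__main__":
    fp, sp = sample_revenues()
    print(f"first-price : mean {statistics.mean(fp):.4f}, stdev {statistics.pstdev(fp):.4f}")
    print(f"second-price: mean {statistics.mean(sp):.4f}, stdev {statistics.pstdev(sp):.4f}")
    # Both means should be close to (n-1)/(n+1) = 0.6667 for n = 5.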

For example, let's consider again the n-bidder first-price auction discussed in Theorem 10.1.3. Does this auction satisfy the conditions of the revenue equivalence theorem? The second condition is easy to verify; the first is harder, because it speaks about the outcomes of the auction under the equilibrium bidding strategies. For now, let's assume that the first condition is satisfied as well.


The revenue equivalence theorem only helps us, of course, if we use it to compare the revenue from a first-price auction with that of another auction that we already understand. The second-price auction will do nicely in this latter role: we already know its equilibrium strategy, and the theorem applies to this auction as well. We know from the theorem that a bidder of the same type will make the same expected payment in both auctions. In both of the auctions we're considering, a bidder's payment is zero unless he wins. Thus a bidder's expected payment conditional on being the winner of a first-price auction must be the same as his expected payment conditional on being the winner of a second-price auction. Finally, since the first-price auction is efficient, we can observe that under the symmetric equilibrium agents will bid this amount all the time: if the agent is the high bidder then he will make the right expected payment, and if he's not, his bid amount won't matter.

We must now find an expression for the expected value of the second-highest valuation, given that bidder i has the highest valuation. It's helpful to know the formula for the kth order statistic, in this case of draws from the uniform distribution. The kth order statistic of a distribution is a formula for the expected value of the kth-largest of n draws. For n IID draws from [0, v_max], the kth order statistic is

((n + 1 − k)/(n + 1)) · v_max.    (10.10)

If bidder i's valuation v_i is the highest, then there are n − 1 other valuations drawn from the uniform distribution on [0, v_i]. Thus, the expected value of the second-highest valuation is the first-order statistic of n − 1 draws from [0, v_i]. Substituting into Equation (10.10), we have (((n−1) + 1 − 1)/((n−1) + 1)) · v_i = ((n−1)/n) · v_i. This confirms the equilibrium strategy from Theorem 10.1.3. It also gives us a suspicion (that turns out to be correct) about the equilibrium strategy for first-price auctions under valuation distributions other than uniform: each bidder bids the expectation of the second-highest valuation, conditioned on the assumption that his own valuation is the highest.

A caveat must be given about the revenue equivalence theorem: this result makes an “if” statement, not an “if and only if” statement. That is, while it's true that all auctions satisfying the theorem's conditions must yield the same expected revenue, it's not true that all strategies yielding that expected revenue constitute equilibria. Thus, after using the revenue equivalence theorem to identify a strategy profile that one believes to be an equilibrium, one must then prove that this strategy profile is indeed an equilibrium. This should be done in the standard way, by assuming that all but one of the agents play according to the equilibrium, and showing that the equilibrium strategy is a best response for the remaining agent.

Finally, recall that we assumed above that the first-price auction allocates the good to the bidder with the highest valuation. The reason it was reasonable to do this (although we could instead have proven that the auction has a symmetric, increasing equilibrium) is that we have to check the strategy profile derived using the revenue equivalence theorem anyway. Given the equilibrium strategy, it's easy to confirm that the bidder with the highest valuation will indeed win the good.


10.1.6 Risk attitudes

One of the key assumptions of the revenue equivalence theorem is that agents are risk-neutral. It turns out that many of the auctions we have been discussing cease to be revenue-equivalent when agents' risk attitudes change. Recall from Section 9.3.1 that an agent's risk attitude can be understood as describing his preference between a sure payment and a gamble with the same expected value. (Risk-averse agents prefer the sure thing; risk-neutral agents are indifferent; risk-seeking agents prefer to gamble.)

To illustrate how revenue equivalence breaks down when agents are not risk-neutral, consider an auction environment involving n bidders with IPV valuations drawn uniformly from [0, 1]. Bidder i, having valuation v_i, must decide whether he would prefer to engage in a first-price auction or a second-price auction. Regardless of which auction he chooses (presuming that he, along with the other bidders, follows the chosen auction's equilibrium strategy), i knows that he'll gain positive utility only if he has the highest valuation. In the case of the first-price auction, i will always gain (1/n)v_i when he has the highest valuation. In the case of having the highest valuation in a second-price auction, i's expected gain will be (1/n)v_i, but because he will pay the second-highest actual bid, the amount of i's gain will vary based on the other bidders' valuations. Thus, in choosing between the first-price and second-price auctions and conditioning on the belief that he will have the highest valuation, i is presented with the choice between a sure payment and a risky payment with the same expected value. If i is risk-averse, he will value the sure payment more highly than the risky payment, and hence will bid more aggressively in the first-price auction, causing it to yield the auctioneer a higher revenue than the second-price auction. (Note that it's i's behavior in the first-price auction that varies: the second-price auction has the same dominant strategy regardless of i's risk attitude.) If i is risk-seeking, the opposite will hold, and the auctioneer will derive greater profit from holding a second-price auction.

The strategic equivalence of Dutch and first-price auctions continues to hold under different risk attitudes; likewise, the (weaker) equivalence of Japanese, English and second-price auctions continues to hold as long as bidders have IPV valuations. These conclusions are summarized in Table 10.1.

Risk-neutral, IPV:   Jap = Eng = 2nd = 1st = Dutch
Risk-averse, IPV:    Jap = Eng = 2nd < 1st = Dutch
Risk-seeking, IPV:   Jap = Eng = 2nd > 1st = Dutch

Table 10.1 Relationships between revenues of various single-good auction protocols.

A similar dynamic holds if the bidders are all risk-neutral, but the seller is either risk-averse or risk-seeking. The variations in bidders' payments are greater in second-price auctions than they are in first-price auctions, because the former depends on the two highest draws from the valuation distribution while the latter depends on only the highest draw. However, these payments have the same expectation in both auctions. Thus, a risk-averse seller would prefer to hold a first-price auction, while a risk-seeking seller would prefer to hold a second-price auction.


10.1.7 Auction variations

In this section we consider three variations on our auction model. First, we consider reverse auctions, in which one buyer accepts bids from a set of sellers. Second, we discuss the effect of entry costs on equilibrium strategies. Finally, we consider auctions with uncertain numbers of bidders.

Reverse auctions

So far, we have considered auctions in which there is one seller and a set of buyers. What about the opposite: an environment in which there is one buyer and a set of sellers? This is what occurs when a buyer engages in a request for quote (RFQ). Broadly, this is called a reverse auction, because in its open-outcry variety this scenario involves prices that descend rather than ascend.

It turns out that everything that we have said about auctions also applies to reverse auctions. Reverse auctions are simply auctions in which we substitute the word “seller” for “buyer” and vice versa, and furthermore place a negative sign in front of all numbers indicating prices or bid amounts. Because of this equivalence we will not discuss reverse auctions any further; note, however, that our concentration on (non-reverse) auctions is without loss of generality.

Auctions with entry costs

A second auction variation does complicate things, though we won't analyze it here. This is the introduction of an entry cost to an auction. Imagine that a first-price auction cost $1 to attend. How should bidders decide whether or not to attend, and then how should they decide to bid given that they're no longer sure how many other bidders will have chosen to attend? This is a realistic way of augmenting our auction model: for example, it can be used to model the cost of researching an auction, driving (or navigating a web browser) to it, and spending the time to bid. However, it can make equilibrium analysis much more complex.

Things are straightforward for second-price (or, for IPV valuations, Japanese and English) auctions. To decide whether to participate, bidders must evaluate their expected gain from participation. This means that the equilibrium strategy in these auctions now does depend on the distribution of other agents' valuations and on the number of these agents. The good news is that, once they have decided to bid, it remains an equilibrium for bidders to bid truthfully.

First-price auctions (and, generally, other auctions that do not have a dominant-strategy equilibrium) with entry costs are harder—though certainly not impossible—to analyze. Again, bidders must make a trade-off between their expected gain from participating in the auction and the cost of doing so. The complication here is that, since he is uncertain about other agents' valuations, a given bidder will thus also be uncertain about the number of agents who will decide that participating in the auction is in their interest. Since an agent's equilibrium strategy given that he has chosen to participate depends on the number of other participating agents, this makes that equilibrium strategy more complicated to compute. And that, in turn, makes it more difficult to determine the agent's expected gain from participating in the first place.


Auctions with uncertain numbers of bidders

Our standard model of auctions has presumed that the number of bidders is common knowledge. However, it may be the case that bidders are uncertain about the number of competitors they face, especially in a sealed-bid auction or in an auction held over the internet. The above discussion of entry costs gave another example of how this could occur. Thus, it is natural to elaborate our model to allow for the possibility that bidders might be uncertain about the number of agents participating in the auction.

It turns out that modeling this scenario is not as straightforward as it might appear. In particular, one must be careful about the fact that bidders will be able to update their ex ante beliefs about the total number of participants by conditioning on the fact of their own selection, which may lead to a situation in which bidders' beliefs about the number of participants are asymmetric. (This can be especially difficult when the model does not place an upper bound on the number of agents who can participate in an auction.) We will not discuss these modeling issues here; interested readers should consult the notes at the end of the chapter. Instead, simply assume that the bidders hold symmetric beliefs, each believing that the probability that the auction will involve j bidders is p(j).

Because the dominant strategy for bidding in second-price auctions does not depend on the number of bidders in the auction, it still holds in this environment. The same is not true of first-price auctions, however. Let F(v) be a cumulative density function indicating the probability that a bidder's valuation is less than or equal to v, and let b^e(v_i, j) be the equilibrium bid amount in a (classical) first-price auction with j bidders, for a bidder with valuation v_i. Then the symmetric equilibrium of a first-price auction with an uncertain number of bidders is

b(v_i) = Σ_{j=2}^{∞} [ F^{j−1}(v_i) p(j) / Σ_{k=2}^{∞} F^{k−1}(v_i) p(k) ] · b^e(v_i, j).

Interestingly, because the proof of the revenue equivalence theorem does not depend on the number of agents, that theorem applies directly to this environment. Thus, in this stochastic environment the seller's revenue is the same when she runs a first-price and a second-price auction. The revenue equivalence theorem can thus be used to derive the strategy above.
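
The formula above is easy to evaluate numerically. The following sketch (ours, under illustrative assumptions) takes valuations uniform on [0, 1], so that F(v) = v and b^e(v, j) = ((j−1)/j)·v by Theorem 10.1.3, and uses an arbitrary hypothetical belief p(j) over the number of bidders.

def bid_with_uncertain_n(v, p):
    """p maps each possible number of bidders j (>= 2) to its probability."""
    F = lambda x: x                      # uniform-[0,1] CDF
    b_e = lambda x, j: (j - 1) / j * x   # classical j-bidder first-price bid
    weights = {j: F(v) ** (j - 1) * pj for j, pj in p.items()}
    total = sum(weights.values())
    return sum(w / total * b_e(v, j) for j, w in weights.items())

if __name__ == "__main__":
    p = {2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2, 6: 0.1}   # hypothetical beliefs about j
    for v in (0.3, 0.6, 0.9):
        print(f"v = {v:.1f}: bid {bid_with_uncertain_n(v, p):.4f}")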

10.1.8 “Optimal” (revenue-maximizing) auctions

So far in our theoretical analysis we have only considered auctions in which the good is allocated to the high bidder and the seller imposes no reserve price. These assumptions make sense, especially when the seller wants to ensure economic efficiency—i.e., that the bidder who values the good most gets it. However, we might instead believe that the seller does not care who gets the good, but rather seeks to maximize her expected revenue. In order to do so, she may be willing to risk failing to sell the good even when there is an interested buyer, and furthermore might be willing sometimes to sell to a buyer who didn't make the highest bid, in order to encourage high bidders to bid more aggressively. Mechanisms that are designed to maximize the seller's expected revenue are known as optimal auctions.


Consider an IPV setting where bidders are risk-neutral and each bidder i's valuation is drawn from some strictly increasing cumulative density function F_i(v), having probability density function f_i(v). Note that we allow for the possibility that F_i ≠ F_j: bidders' valuations can come from different distributions. Such interactions are called asymmetric auctions. We do assume that the seller knows the distribution from which each individual bidder's valuation is drawn, and hence is able to distinguish strong bidders from weak bidders.

Define bidder i's virtual valuation as

ψ_i(v_i) = v_i − (1 − F_i(v_i)) / f_i(v_i).

Also define an agent-specific reserve price r*_i as the value for which ψ_i(r*_i) = 0. The optimal (single-good) auction is a sealed-bid auction in which every agent is asked to declare his true valuation. These declarations are used to compute a virtual valuation for each agent. The good is sold to the agent i whose virtual valuation ψ_i(v_i) is the highest, as long as this value is positive (i.e., the agent's declared valuation v_i exceeds his reserve price r*_i). If every agent's virtual valuation is negative, the seller keeps the good and achieves a revenue of zero. If the good is sold, the winning agent i is charged the smallest valuation he could have declared and still won: the valuation at which his virtual valuation equals the larger of zero and the second-highest virtual valuation, ψ_i^{−1}(max(0, max_{j≠i} ψ_j(v_j))).

How would bidders behave in this auction? Note that it can be understood as a second-price auction with a reserve price, held in virtual valuation space rather than in the space of actual valuations. However, since neither the reserve prices nor the transformation between actual and virtual valuations depends on the agent's declaration, the proof that a second-price auction is dominant-strategy truthful applies here as well, and hence the optimal auction remains strategy-proof.
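
The following sketch (our own, for illustration only) specializes the allocation and payment rule just described to the case where bidder i's valuation is drawn uniformly from [0, h_i]. Then F_i(v) = v/h_i, f_i(v) = 1/h_i, the virtual valuation is ψ_i(v) = 2v − h_i, its inverse is (y + h_i)/2, and the agent-specific reserve price is r*_i = h_i/2. The function name optimal_auction and the example numbers are ours.

def optimal_auction(declared, h):
    """declared[i]: bidder i's declared valuation; h[i]: upper bound of i's distribution."""
    psi = [2 * v - hi for v, hi in zip(declared, h)]           # virtual valuations
    winner = max(range(len(declared)), key=lambda i: psi[i])
    if psi[winner] <= 0:
        return None, 0.0                                        # good goes unsold
    competing = max([p for i, p in enumerate(psi) if i != winner] + [0.0])
    payment = (competing + h[winner]) / 2                       # inverse virtual valuation
    return winner, payment

if __name__ == "__main__":
    # A "strong" bidder (values up to 2) against two "weak" bidders (up to 1).
    print(optimal_auction(declared=[1.1, 0.8, 0.4], h=[2.0, 1.0, 1.0]))

In this example the weak bidder who declared 0.8 wins ahead of the strong bidder who declared 1.1, illustrating how virtual valuations boost weak bidders at the expense of efficiency, as discussed below.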

We began this discussion by introducing a new assumption: that different bidders' valuations could be drawn from different distributions. What happens when this does not occur, and instead all bidders' valuations come from the same distribution? In this case, the optimal auction has a simpler interpretation: it is simply a second-price auction in which the seller sets a reserve price r* at the value that satisfies r* − (1 − F(r*))/f(r*) = 0. For this reason, it is common to hear the claim that optimal auctions correspond to setting reserve prices optimally. It is important to recognize that this claim holds only in the case of symmetric IPV valuations. In the asymmetric case, the virtual valuations can be understood as artificially increasing the amount of weak bidders' bids in order to make them more competitive. This sacrifices efficiency, but more than makes up for it on expectation by forcing bidders with higher expected valuations to bid more aggressively.

Although optimal auctions are interesting from a theoretical point of view, they are rarely, if ever, used in practice. The problem is that they are not detail free: they require the seller to incorporate information about the bidders' valuation distributions into the mechanism. Such auctions are often considered impractical; famously, the Wilson doctrine urges auction designers to consider only detail free mechanisms. With this criticism in mind, it is interesting to ask the following question. In a symmetric IPV setting, is it better for the auctioneer to set an optimal reserve price (that is, if she knew the bidders' valuation distribution) or to attract one additional bidder to the auction? Interestingly, the auctioneer is better off in the latter case. Intuitively, an extra bidder is similar to a reserve price in the sense that his addition to the auction increases competition among the other bidders, but differs because he can also buy the good himself. This suggests that trying to attract as many bidders as possible (by, among other things, running an auction protocol with which bidders are comfortable) may be more important than trying to figure out the bidders' valuation distributions in order to run an optimal auction.

10.1.9 Collusion

Since we have seen that an auctioneer can increase her expected revenue by increasing competition among bidders, it is not surprising that bidders, conversely, can reduce their expected payments to the auctioneer by reducing competition among themselves. Such cooperation between bidders is called collusion. Collusion is usually illegal; interestingly, however, it is also notoriously difficult for agents to pull off. The reason is conceptually similar to the situation faced by agents playing the Prisoner's Dilemma (see Section 2.3.5): while a given agent is better off if everyone cooperates than if everyone behaves selfishly, he is even better off if everyone else cooperates and he behaves selfishly himself. An interesting question to ask about collusion, therefore, is which collusive protocols have the property that agents will gain by colluding while being unable to gain further by deviating from the protocol.

Second-price auctions

First, consider a protocol for collusion in second-price (or Japanese/English) auctions. We assume that a set of two or more colluding agents is chosen exogenously; this set of agents is called a cartel. Assume that the agents are risk-neutral and have IPV valuations. In the literature, a collusive protocol of this kind is called a bidding ring. It is sometimes necessary (as it is in this case) to assume the existence of an agent who is not interested in the good being auctioned, but who serves to run the bidding ring. This agent does not behave strategically, and hence could be a computer program; we will refer to this agent as the ring center. Observe that there may be agents who participate in the main auction and do not participate in the cartel; there may even be multiple cartels. The protocol follows:

1. Each agent in the cartel submits a bid to the ring center.

2. The ring center identifies the maximum bid that he received, v^r_1; he submits this bid in the main auction, and drops the other bids. Denote the highest dropped bid as v^r_2.

3. If the ring center's bid wins in the main auction (at the second-highest price in that auction, v_2), the ring center awards the good to the bidder who placed the maximum bid in the cartel, and requires that bidder to pay max(v_2, v^r_2).

4. The ring center gives every agent who participated in the bidding ring a payment of k, regardless of the amount of that agent's bid and regardless of whether or not the cartel's bid won the good in the main auction.


How should agents bid if they are faced with this bidding ring protocol? First of all, consider the case where k = 0. Here it is easy to see that this protocol is strategically equivalent to a second-price auction in a world where the cartel does not exist. The high bidder always wins, and always pays the globally second-highest price (the max of the second-highest prices in the cartel and in the main auction). Thus the auction is dominant-strategy truthful, and agents have no incentive to cheat each other in the bidding ring's “pre-auction”. At the same time, however, agents also do not gain by participating in the bidding ring: they would be just as happy if the cartel disbanded and they had to bid directly in the main auction.

Although for k = 0 the situation with and without the bidding ring is equivalent from the bidders' point of view, it is different from the point of view of the ring center. In particular, with positive probability v^r_2 will be the globally second-highest valuation, and hence the ring center will make a profit. (He will pay v_2 for the good in the main auction, and will be paid v^r_2 > v_2 for it by the winning bidder.) Let c > 0 denote the ring center's expected profit. If there are n_r agents in the bidding ring, the ring center could pay each agent up to k = c/n_r and still budget balance on expectation. For values of k smaller than this amount but greater than zero, the ring center will profit on expectation while still giving agents a strict preference for participation in the bidding ring.
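
A simulation sketch (ours, assuming uniform IPV valuations and truthful bids, with an illustrative cartel of three and two outside bidders) estimates the ring center's expected profit c and hence the per-member payment k he can afford.

import random

def ring_round(rng, n_ring=3, n_outside=2):
    ring_bids = sorted(rng.random() for _ in range(n_ring))
    outside_bids = sorted(rng.random() for _ in range(n_outside))
    vr1, vr2 = ring_bids[-1], ring_bids[-2]      # cartel's top two bids
    if vr1 > outside_bids[-1]:                   # ring center wins the main auction
        v2 = outside_bids[-1]                    # price paid in the main auction
        winner_pays = max(v2, vr2)               # price charged to the cartel's winner
        return winner_pays - v2                  # ring center's profit this round
    return 0.0

if __name__ == "__main__":
    rng = random.Random(3)
    trials = 200_000
    c = sum(ring_round(rng) for _ in range(trials)) / trials
    print(f"estimated ring-center expected profit c = {c:.4f}")
    print(f"so each of the 3 ring members could be paid up to k = c/3 = {c/3:.4f}")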

How are agents able to gain in this setting—doesn't the revenue equivalence theorem say that their gains should be the same in all efficient auctions? Observe that the agents' expected payments are in fact unchanged, although not all of this amount goes to the auctioneer. What does change is the unconditional payment that every agent receives from the ring center. The second condition of the revenue equivalence theorem states that a bidder with the lowest possible valuation must receive zero expected utility. This condition is violated under our bidding ring protocol, in which such an agent has an expected utility of k.

First-price auctions

The construction of bidding ring protocols is much more difficult in the first-price auction setting. This is for a number of reasons. First, in order to make a lower expected payment, the winner must actually place a lower bid. In a second-price auction, a winner can instead persuade the second-highest bidder to leave the auction and make the same bid he would have made anyway. This difference matters because in the second-price auction the second-highest bidder has no incentive to renege on his offer to drop out of the auction; by doing so, he can only make the winner pay more. In the first-price auction, the second-highest bidder could trick the highest bidder into bidding lower by offering to drop out, and then could still win the good at less than his valuation. Some sort of enforcement mechanism is therefore required for punishing cheaters. Another problem with bidding rings for first-price auctions concerns how we model what non-colluding bidders know about the presence of a bidding ring in their auction. In the second-price auction we were able to gloss over this point: the non-colluding agents didn't care whether other agents might have been colluding, because their dominant strategy was independent of the number of agents or their valuation distributions. (Observe that in our previous protocol, if the cumulative density function of bidders' valuation distribution was F, the ring center could be understood as an agent with a valuation drawn from a distribution with CDF F^{n_r}.) In a first-price auction, the number of bidders and their valuation distributions matter to bidders' equilibrium strategies. If we assume that bidders know the true number of bidders, then a collusive protocol in which bidders are dropped doesn't make much sense (the strategies of other bidders in the main auction would be unaffected). If we assume that non-colluding bidders follow the equilibrium strategy based on the number of bidders who actually bid in the main auction, bidder-dropping collusion does make sense, but the non-colluding bidders no longer follow an equilibrium strategy (they would gain on expectation if they bid more aggressively).

For the most part, the literature on collusion has side-stepped this problem by considering first-price auctions only under the assumption that all n bidders belong to the cartel. In this setting, two kinds of bidding ring protocols have been proposed.

The first assumes that the same bidders will have repeated opportunities to collude. Under this protocol all bidders except one are dropped, and this bidder bids zero (or the reserve price) in the main auction. Clearly other bidders could gain by cheating and also placing bids in the main auction; however, they are dissuaded from doing so by the threat that if they cheat, the cartel will be disbanded and they will lose the opportunity to collude in the future. Under appropriate assumptions about agents' discount rates (their valuations for profits in the future), their number, their valuation distribution, and so on, it can be shown that it constitutes an equilibrium for agents to follow this protocol. A variation on the protocol that works almost regardless of the values of these variables has the other agents forever punish any agent who cheats, following a grim trigger strategy (see Section 5.1.2).

The second protocol works in the case of a single, unrepeated first-price auction. It is similar to the protocol introduced in the previous section:

1. Each agent in the cartel submits a bid to the ring center.

2. The ring center identifies the maximum bid that he received, v_1. The bidder who placed this bid must pay the full amount of his bid to the ring center.

3. The ring center bids in the main auction at 0. Note that the bidding ring always wins in the main auction as there are no other bidders.

4. The ring center gives the good to the bidder who placed the winning bid in the pre-auction.

5. The ring center pays every bidder other than the winner (1/(n−1)) · v_1.

Observe that this protocol can be understood as holding a first-price auction for the right to bid the reserve price in the main auction, with the profits of this pre-auction split evenly among the losing bidders. (We here assume a reserve price of zero; the protocol can easily be extended to work for other reserve prices.) Let b^{n+1}(v_i) denote the amount that bidder i would bid in the (standard) equilibrium of a first-price auction with a total of n + 1 bidders. The symmetric equilibrium of the bidding ring pre-auction is for each bidder i to bid

((n − 1)/n) · b^{n+1}(v_i).


Demonstrating this fact is not trivial; details can be found in the paper cited at the end of the chapter. Here we point out only the following. First, the (n−1)/n factor has nothing to do with the equilibrium bid amount for first-price auctions with a uniform valuation distribution; indeed, the result holds for any valuation distribution. Rather, it can be interpreted as meaning that each bidder offers to pay everyone else (1/n) · b^{n+1}(v_i), and thereby also to gain utility of (1/n) · b^{n+1}(v_i) for himself. Second, although the equilibrium strategy depends on b^{n+1}, there are really only n bidders. Finally, observe that this mechanism is budget balanced (i.e., not just on expectation).
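
A small worked example (ours) makes these quantities concrete. If we additionally assume valuations uniform on [0, 1], Theorem 10.1.3 gives b^{n+1}(v) = (n/(n+1)) · v, so the pre-auction bid is ((n−1)/(n+1)) · v, and the side payment to each losing ring member is that bid divided by n − 1.

def ring_pre_auction_bid(v, n):
    b_n_plus_1 = n / (n + 1) * v                 # (n+1)-bidder first-price bid
    return (n - 1) / n * b_n_plus_1              # = (n-1)/(n+1) * v

def side_payment(winning_pre_bid, n):
    return winning_pre_bid / (n - 1)             # paid to each of the n-1 losers

if __name__ == "__main__":
    n, v = 4, 0.9
    bid = ring_pre_auction_bid(v, n)
    print(f"pre-auction bid: {bid:.4f}")                               # 0.54
    print(f"payment to each losing ring member: {side_payment(bid, n):.4f}")  # 0.18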

10.1.10 Interdependent values

So far, we have only considered the independent private values (IPV) setting. As we discussed above, this setting is reasonable for domains in which the agents' valuations are unrelated to each other, depending only on their own signals—e.g., because an agent is buying a good for his own personal use. In this section, we discuss different models, in which agents' valuations depend on both their own signals and other agents' signals.

Common values

First of all, we discuss the common value (CV) setting, in which all agents value the good at exactly the same amount. The twist is that the agents don't know this amount, though they have (common) prior beliefs about its distribution. Each agent has a private signal about the value, which allows him to condition his prior beliefs to arrive at a posterior distribution over the good's value.⁶

For example, consider the problem of buying the rights to drill for oil in a particular oil field. The field contains some (uncertain but fixed) amount of oil, the cost of extraction is about the same no matter who buys the contract, and the value of the oil will be determined by the price of oil when it is extracted. Given publicly-available information about these issues, all oil drilling companies have the same prior distribution over the value of the drilling rights. The difference between agents is that each has different geologists who estimate the amount of oil and how easy it will be to extract, and different financial analysts who estimate the way oil markets will perform in the future. These signals cause agents to arrive at different posterior distributions over the value of the drilling rights, based on which each agent i can determine an expected value v_i. How can this value v_i be interpreted? One way of understanding it is to note that if a single agent i were selected at random and made a take-it-or-leave-it offer to buy the drilling contract for price p, he would achieve positive expected utility by accepting the offer if and only if p < v_i.

6. In fact, most of what we say in this section also applies to a much more general valuation model in which each bidder may value the good differently. Specifically, in this model each bidder receives a signal drawn independently from some distribution, and bidder i's valuation for the good is some arbitrary function of all of the bidders' signals, subject to a symmetry condition that states that i's valuation does not depend on which other agents received which signals. We focus here on the common value model to simplify the exposition.


Now consider what would happen if these drilling rights were sold in a second-price auction among k risk-neutral agents. One might expect that each bidder i ought to bid v_i. However, it turns out that bidders would achieve negative expected utility by following this strategy.⁷ How can this be—didn't we previously claim that i would be happy to pay any amount up to v_i for the rights? The catch is that, since the value of the good to each bidder is the same, each bidder cares as much about other bidders' signals as he does about his own. When he finds out that he won the second-price auction, the winning bidder also learns that he had the most optimistic signal. This information causes him to downgrade his expectation about the value of the drilling rights, which can make him conclude that he paid too much! This phenomenon is called the winner's curse.

Of course, the winner's curse does not mean that in the CV setting the winner of a second-price auction always pays too much. Instead, it goes to show that truth-telling is no longer a dominant strategy (or, indeed, an equilibrium strategy) of the second-price auction in this setting. There is still an equilibrium strategy that bidders can follow in order to achieve positive expected utility from participating in the auction; this simply requires the bidders to consider how they would update their beliefs upon finding that they were the high bidder. The symmetric equilibrium of a second-price auction in this setting is for each bidder i to bid the amount b(vi) at which, if the second-highest bidder also happened to have bid b(vi), i would achieve zero expected gain for the good, conditioned on the two highest signals both being vi.8 We do not prove this result—or even state it more formally—as doing so would require the introduction of considerable notation.

What about auctions other than second-price in the CV setting? Let's consider Japanese auctions, recalling from Section 10.1.3 that this auction can be used as a model of the English auction for theoretical analysis. Here the winner of the auction has the opportunity to learn more about his opponents' signals, by observing the time steps at which each of them drops out of the auction. The winner will thus have the opportunity to condition his strategy on each of his opponents' signals, unless all of his opponents drop out at the same time. Let's assume that the sequence of prices that will be called out by the auctioneer is known: the tth price will be pt. The symmetric equilibrium of a Japanese auction in the CV setting is as follows. At each time step t, each agent i computes the expected value of the good, vi,t, given what he has learned about the signals of opponents who dropped out in previous time steps, and assuming that all remaining opponents drop out at the current time step. (Bidders can determine the signals of opponents who dropped out, at least approximately, by inverting the equilibrium strategy to determine what opponents' signals must have been in order for them to have dropped out when they did.) If vi,t > pt+1 then if all remaining agents actually did drop out at time t and made i the winner at time t + 1, i would gain on expectation. Thus i remains in the auction at time t if vi,t > pt+1, and drops out otherwise.

7. As it turns out, we can only make this statement because we assumed that k > 2. For the case of exactly two bidders, bidding vi is the right thing to do.
8. We don't need to discuss how ties are broken since i achieves zero expected utility whether he wins or loses the good.


Observe that the stated equilibrium strategy is different from the strategy given above for second-price auctions: thus, while second-price and Japanese auctions are strategically equivalent in the IPV case, this equivalence does not hold in CV domains.

Affiliated values and revenue comparisons

The common value model is generalized by another valuation model called affiliated values, which permits correlations between bidders' signals. For example, this latter model can describe cases where a bidder's valuation is divided into a private-value component (e.g., the bidder's inherent value for the good) and a common-value component (e.g., the bidder's private, noisy signal about the good's resale value). Technically, we say that agents have affiliated values when a high value of one agent's signal increases the probability that other agents will have high signals as well. A thorough treatment is beyond the scope of this book; however, we make two observations here.

First, in affiliated values settings generally—and thus in common-value settings as a special case—Japanese (and English) auctions lead to higher expected prices than sealed-bid second-price auctions. Even lower is the expected revenue from first-price sealed-bid auctions. The intuition here is that the winner's gain depends on the privacy of his information. The more the price paid depends on others' information (rather than on expectations of others' information), the more closely this price is related to the winner's information, since valuations are affiliated. As the winner loses the privacy of his information, he can extract a smaller "information rent", and so must pay more to the seller.

Second, this argument leads to a powerful result known as the linkage principle. If the seller has access to any private source of information that she knows is affiliated with the bidders' valuations, she is better off precommitting to reveal it honestly. Consider the example of an auction of used cars, where the quality of each car is a random variable about which the seller, and each bidder, receives some information. The linkage principle states that the seller is better off committing to declare everything she knows about each car's defects before the auctions, even though this will sometimes lower the price at which she will be able to sell an individual car. The reason the seller gains by this disclosure is that making her information public also reveals information about the winner's signal, and hence reduces his ability to charge information rent. Note that the seller's "commitment power" is crucial to this argument. Bidders are only affected in the desired way if the seller is able to convince them that she will always tell the truth, for example by agreeing to subject herself to an audit by a trusted third party.

10.2 Multiunit auctions

We have so far considered the problem of selling a single good to one winning bidder. In practice there will often be more than one good to allocate, and different goods may end up going to different bidders. Here we consider multiunit auctions, in which there is still only one kind of good available, but there are now multiple identical copies of that good. (Think of new cars, tickets to a movie, MP3 downloads, or shares of stock in the same company.)


Although this setting seems like only a small step beyond the single-item case we considered earlier, it turns out that there is still a lot to be said about it.

10.2.1 Canonical auction families

In Section 10.1.1 we surveyed some canonical single-good auction families. Here we review the same auctions, explaining how each can be extended to the multiunit case.

Sealed-bid auctions

Overall, sealed-bid auctions in multiunit settings differ from their single-unit cousins in several ways. First, consider payment rules. If there are three items for sale, and each of the top three bids requests a single unit, then each bid will win one good. In general, these bids will offer different amounts; the question is what each bidder should pay. In the pay-your-bid scheme (the so-called discriminatory pricing rule) each of the three top bidders pays a different amount, namely his own bid. This rule therefore generalizes the first-price auction. Under the uniform pricing rule all winners pay the same amount; this is usually either the highest among the losing bids or the lowest among the winning bids.

Second, instead of placing a single bid, bidders generally have to provide a price offer for every number of units. If a bidder simply names one number of units and is unwilling to accept any fewer, we say he has placed an all-or-nothing bid. If he names one number of units but will accept any smaller number at the same price-per-unit we call the bid divisible. We investigate some richer ways for bidders to specify multiunit valuations towards the end of Section 10.2.3.

Finally, tie-breaking can be tricky when bidders place all-or-nothing bids. For example, consider an auction for 10 units in which the highest bids are as follows, all of them all-or-nothing: 5 units for $20/unit, 3 units for $15/unit, 5 units for $15/unit, and 1 unit for $15/unit. Presumably the first bid should be satisfied, as well as two of the remaining three—but which? Here one sees different tie-breaking rules—by quantity (larger bids win over smaller ones), by time (earlier bids win over later bids), and combinations thereof.

English auctions

When moving to the multiunit case, designers of English auctions face all of the problems discussed above. However, since bidders can revise their offers from one round to the next, multiunit English auctions rarely ask bidders to specify more than one number of units along with their price offer. Auction designers still face the choice of whether to treat bids as all-or-nothing or divisible. Another subtlety arises when you consider minimum increments. Consider the following example, in which there is a total of 10 units available, and two bids: one for 5 units at $1/unit, and one for 5 units at $4/unit. What is the lowest acceptable next bid? Intuitively, it depends on the quantity—a bid for 3 units at $2/unit can be satisfied, but a bid for 7 units at $2/unit cannot. This problem is avoided if the latter bid is divisible, and hence can be partially satisfied.


Japanese auctions

Japanese auctions can be extended to the multiunit case in a similar way. Now after each price increase each agent calls out a number rather than the simple in/out declaration, signifying the number of units he is willing to buy at the current price. A common restriction is that the number must decrease over time; the agent cannot ask to buy more at a high price than he did at a lower price. The auction is over when the supply equals or exceeds the demand. Different implementations of this auction variety differ in what happens if supply exceeds demand: all bidders can pay the last price at which demand exceeded supply, with some of the dropped bidders reinserted according to one of the tie-breaking schemes above; goods can go unsold; one or more bidders can be offered partial satisfaction of their bids at the previous price; etc.

Dutch auctions

In multiunit Dutch auctions, the seller calls out descending per-unit prices, and agents must augment their signals with the quantity they wish to buy. If that is not the entire available quantity the auction continues. Here there are several options—the price can continue to descend from the current level, can be reset to a set percentage above the current price, or can be reset to the original high price.

10.2.2 Single-unit demand

Let us now investigate multiunit auctions more formally, starting with a very simple model. Specifically, consider a setting with k identical goods for sale and risk-neutral bidders who want only 1 unit each and have independent private values for these single units. Observe that restricting ourselves to this setting gets us around some of the tricky points above such as complex tie breaking.

We saw in Section 10.1.3 that the VCG mechanism can be applied to provide useful insight into auction problems, yielding the second-price auction in the single-good case. What sort of auction does VCG correspond to in our simple multiunit setting? (You may want to think about this before reading on.) As before, since we will simply apply VCG, the auction will be efficient and dominant-strategy truthful; since the market is one-sided it will also satisfy ex-post individual rationality and weak budget balance. The auction mechanism is to sell the units to the k highest bidders for the same price, and to set this price at the amount offered by the highest losing bid. Thus, instead of a 2nd-price auction we have a k + 1st-price auction.

One immediate observation that we can make about this auction mechanism is that a seller will not necessarily achieve higher profits by selling more units. For example, consider the valuations in Figure 10.2.

If the seller were to offer only a single unit using VCG, he would receive revenue of $20. If he offered two units, he would receive $30: less than before on a per-unit basis, but still more revenue overall. However, if the seller offered three units he would achieve total revenue of only $24, and if he offered four units he would get no revenue at all. What is going on? The answer points to something fundamental about markets.


Bidder    Bid Amount
1         $25
2         $20
3         $15
4         $8

Figure 10.2  Example valuations in a single-unit demand multiunit auction.

A dominant-strategy, efficient mechanism can use nothing but losing bidders' bids to set prices, and as the seller offers more and more units, there will necessarily be a weaker and weaker pool of losing bidders to draw upon. Thus the per-unit price will weakly fall as the seller offers additional units for sale, and depending on the bidders' valuations, his total revenue can fall as well. What can be done to fix this problem? As we saw for the single-good case in Section 10.1.8, the seller's revenue can be increased on expectation by permitting inefficient allocations, e.g., by using knowledge of the valuation distribution to set reserve prices. In the example above, the seller's revenue would have been maximized if he had been lucky enough to set a $15 reserve price. (To see how the auction behaves in this case, think of the reserve price simply as another bid placed in the auction by the seller.) However, these tactics only go so far. In the end, the law of supply and demand holds—as the supply goes up, the price goes down, since competition between bidders is reduced. We will return to the importance of this idea for multiunit auctions in Section 10.2.4 below.
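The k + 1st-price rule is simple enough to state in a few lines of code. The sketch below (ours) computes the winners and the uniform price for a given number of units, and reproduces the revenue figures just discussed for the valuations of Figure 10.2.

```python
def k_plus_1_price_auction(bids, k):
    """Single-unit-demand multiunit auction: the k highest bidders each win one unit,
    and every winner pays the (k+1)st-highest bid (0 if there is no such bid)."""
    ranked = sorted(bids, reverse=True)
    winners = ranked[:k]
    price = ranked[k] if len(ranked) > k else 0
    return winners, price

bids = [25, 20, 15, 8]                       # the valuations of Figure 10.2, bid truthfully
for k in range(1, 5):
    winners, price = k_plus_1_price_auction(bids, k)
    print(k, price, price * len(winners))    # revenue: 20, 30, 24, 0, as discussed above
```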

The k + 1st-price auction can be contrasted with another popular payment rule, used for example in an English auction variant by the online auction site eBay. In this auction bidders are charged the lowest winning bid rather than the highest losing bid.9 This has the advantage that winning bidders always pay a nonzero amount, even when there are fewer bids than there are units for sale. In essence, this makes the bidders' strategic problem somewhat harder (the lowest winning bidder is able to improve his utility by bidding dishonestly, and so overall bidders no longer have dominant strategies) in exchange for making the seller's strategic problem somewhat easier (while the seller can still lower his revenue by selling too many units, he does not have to worry about the possibility of giving them away for nothing).

Despite such arguments for and against different mechanisms, as in the single-good case, in some sense it does not matter what auction the seller chooses. This is because the revenue equivalence theorem for that case (Theorem 10.1.4) can be extended to cover multiunit auctions.10 The proof is similar, so we omit it.

Theorem 10.2.1 (Revenue Equivalence Theorem, multiunit version) Assume that each of n risk-neutral agents has an independent private valuation for a single unit of k identical goods at auction, drawn from a common cumulative distribution F(v) that is strictly increasing and atomless on [v, v̄]. Then any auction mechanism in which

9. Confusingly, this multiunit English auction variant is sometimes called a Dutch auction. This is a practice to be discouraged; the correct use of the term is in connection with the descending open outcry auction.
10. As before, we state a more restricted version of this revenue equivalence theorem than necessary. For example, revenue equivalence holds for all pairs of auctions that share the same allocation rule (not just for efficient auctions), and does not require our assumption of single-unit demand.


• the goods will be allocated to the agents with the highest valuations; and

• any agent with valuation v has an expected utility of zero;

yields the same expected revenue, and hence results in any bidder with valuation v making the same expected payment.

Thus all of the payment rules suggested in the previous paragraph must yield the same expected revenue to the seller. Of course, this result holds only if we believe that bidders are correctly described by the theorem's assumptions (e.g., they are risk neutral) and that they will play equilibrium strategies. The fact that auction houses like eBay opt for non-dominant-strategy mechanisms suggests that these beliefs may not always be reasonable in practice.

We can also use this revenue equivalence result to analyze another setting: repeated single-good auction mechanisms, or so-called sequential auctions. For example, imagine a car dealer auctioning off a dozen new cars to a fixed set of bidders through a sequence of second-price auctions. With a bit of effort, it can be shown that for such an auction there is a symmetric equilibrium in which bidders' bids increase from one auction to the next, and in a given auction bidders with higher valuations place higher bids. (To offer some intuition for the first of these claims, bidders still have a dominant strategy to bid truthfully in the final auction. In previous auctions, bidders have positive expected utility after losing the auction, because they can participate in future rounds. As the number of future rounds decreases, so does this expected utility; hence in equilibrium bids rise.) We can therefore conclude that the auction is efficient, and thus by Theorem 10.2.1 each bidder makes the same expected payment as under VCG. Thus there exists a symmetric equilibrium in which bidders bid honestly in the final auction k, and in each auction j < k, each bidder i bids the expected value of the kth-highest of the other bidders' valuations, conditional on the assumption that his valuation vi lies between the jth-highest and the j + 1st-highest valuations. This makes sense because in each auction the bidder who is correct in making this assumption will be the bidder who places the second-highest bid and sets the price for the winner. Thus, the winner of each auction will pay an unbiased estimate of the overall k + 1st-highest valuation, resulting in an auction that achieves the same expected revenue as VCG.

Very similar reasoning can be used to show that a symmetric equilibrium for k sequential first-price auctions is for each bidder i in each auction j ≤ k to bid the expected value of the kth-highest of the other bidders' valuations, conditional on the assumption that his valuation vi lies between the j − 1st-highest and the jth-highest valuations. Thus, each bidder conditions on the assumption that he is the highest bidder remaining; the bidder who is correct in making this assumption wins, and hence pays an amount equal to the expected value of the overall k + 1st-highest valuation.

10.2.3 Beyond single-unit demand

Now let us investigate how things change when we relax the restriction that each bidder is only interested in a single unit of the good.


VCG for general multiunit auctions

How does VCG behave in this more general setting? We no longer have something as simple as the k + 1st-price auction we encountered in Section 10.2.2. Instead, we can say that all winning bidders who won the same number of units will pay the same amount as each other. This makes sense because the change in social welfare that results from dropping any one of these bidders will be the same. Bidders who win different numbers of units will not necessarily pay the same per-unit prices. We can say, however, that bidders who win larger numbers of units will pay at least as much (in total, though not necessarily per unit) as bidders who won smaller numbers of units, as their impact on social welfare will always be at least as great.

VCG can also help us to notice another interesting phenomenon in the general multiunit auction case. For all the auctions we have considered in this chapter so far, it has always been computationally straightforward to identify an auction's winners. In this setting, however, the problem of finding the social-welfare-maximizing allocation is computationally hard. Specifically, finding a subset of bids to satisfy that maximizes the sum of bidders' valuations for them is equivalent to a weighted knapsack problem, and hence is NP-complete.

Definition 10.2.2 (winner determination problem (WDP)) The winner determination problem for a general multiunit auction, where m denotes the total number of units available and vi(k) denotes bidder i's declared valuation for being awarded k units, is to find the social-welfare maximizing allocation of goods to agents. This problem can be expressed as the integer program in Figure 10.3.

maximize    Σ_{i∈N} Σ_{1≤k≤m} vi(k) xk,i                       (10.11)

subject to  Σ_{i∈N} Σ_{1≤k≤m} k · xk,i ≤ m                     (10.12)

            Σ_{1≤k≤m} xk,i ≤ 1          ∀i ∈ N                 (10.13)

            xk,i ∈ {0, 1}               ∀ 1 ≤ k ≤ m, i ∈ N     (10.14)

Figure 10.3  Integer program for finding the winners of a general multiunit auction.

This integer program uses a variable xk,i to indicate whether bidder i is allocated exactly k units, and then seeks to maximize the sum of agents' valuations for the chosen allocation in the objective function (10.11). Constraint (10.13) ensures that no more than one of these indicator variables is nonzero for any bidder, and Constraint (10.12) ensures that the total number of units allocated does not exceed the number of units available. Constraint (10.14) requires that the indicator variables are integral; it is this constraint that makes the problem computationally hard.
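For intuition about the structure of this program, the sketch below (ours) solves it directly with a knapsack-style dynamic program over the number of units allocated so far. Its running time grows with m, the number of units, so it is only pseudo-polynomial when bids are represented more compactly than an explicit list of all m values; this is consistent with the NP-completeness noted above. The example valuations are made up.

```python
def multiunit_wdp(valuations, m):
    """valuations[i] maps a quantity k (1..m) to bidder i's declared value vi(k).
    Returns (welfare, allocation) where allocation is a list of (bidder, units) pairs."""
    best = [(0, [])] * (m + 1)            # best[u]: optimum using at most u units so far
    for i, vi in enumerate(valuations):
        new_best = list(best)
        for u in range(1, m + 1):
            for k, value in vi.items():   # try giving bidder i exactly k units
                if k <= u and best[u - k][0] + value > new_best[u][0]:
                    new_best[u] = (best[u - k][0] + value, best[u - k][1] + [(i, k)])
        best = new_best
    return best[m]

# Bidder 0 wants a single unit; bidder 1 has decreasing marginal values.
vals = [{1: 10, 2: 10, 3: 10}, {1: 8, 2: 14, 3: 18}]
print(multiunit_wdp(vals, 3))             # (24, [(0, 1), (1, 2)])
```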


Representing multiunit valuations

We have assumed that agents can communicate their complete valuations to the auctioneer. When a large number of units are available in an auction, this means that bidders must specify a valuation for every number of units. In practice, it is common that bidders would be provided with some bidding language that would allow them to convey this same information more compactly.

Of course, the usefulness of a bidding language depends on the sorts of underlying valuations that bidders will commonly want to express. We can divide these bids into symmetric and asymmetric valuations. Symmetric valuations are those in which all goods are identical from the point of view of the bidder, and for this reason we sometimes use the term multiple units of a good. A few common symmetric valuations are the following.

• Additive valuation. The bidder's valuation of a set is directly proportional to the number of goods in the set, so that vi(S) = c|S| for some constant c.

• Single item valuation. The bidder desires any single item, and only a single item, so that vi(S) = c for some constant c for all S ≠ ∅.

• Fixed budget valuation. Similar to the additive valuation, but the bidder has a maximum budget of B, so that vi(S) = min(c|S|, B).

• Majority valuation. The bidder values equally any majority of the goods, so that

vi(S) = 1 if |S| ≥ m/2, and 0 otherwise.

We can generalize all of the symmetric valuations to a general symmetric valuation.

• General symmetric valuation. Let p1, p2, . . . , pm be arbitrary non-negative prices, so that pj specifies how much the bidder is willing to pay for the jth item won. Then

vi(S) = Σ_{j=1}^{|S|} pj

• Downward sloping valuation. A downward sloping valuation is a symmetric valuation in which p1 ≥ p2 ≥ · · · ≥ pm.
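All of the symmetric valuations above are special cases of the general symmetric valuation, a point the following small sketch makes concrete. The helper name and the specific numbers are ours, chosen only for illustration.

```python
def general_symmetric(prices):
    """General symmetric valuation: v(S) is the sum of the first |S| per-unit prices."""
    return lambda n_units: sum(prices[:n_units])

m = 5                                                       # total number of goods
additive     = general_symmetric([3] * m)                   # v(S) = 3|S|
single_item  = general_symmetric([7] + [0] * (m - 1))       # any nonempty S is worth 7
fixed_budget = general_symmetric([3, 3, 3, 1, 0])           # additive at 3 per unit, budget 10
majority     = general_symmetric([0, 0, 1, 0, 0])           # worth 1 once |S| >= m/2

for n in range(m + 1):
    print(n, additive(n), single_item(n), fixed_budget(n), majority(n))
```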

10.2.4 Unlimited supply

Earlier, we suggested that MP3 downloads serve as a good example of a multiunit good. However, they differ from the other examples we gave, such as new cars, in an important way. This difference is that a seller of MP3 downloads can produce additional units of the good at zero marginal cost, and hence has an effectively unlimited supply of the good. This does not mean that the units have no value or that the seller should give them away—after all, the first unit may be very expensive to produce, requiring the seller to amortize its cost across the sale of multiple units.


What it does mean is that the seller will not face any supply restrictions other than those he imposes himself.

We thus face the following multiunit auction problem: how should a seller choose a multiunit auction mechanism for use in an unlimited supply setting if he cares about maximizing his revenue? The goal will be finding an auction mechanism that chooses among bids in a way that achieves good revenue without artificially picking a specific number of goods to sell in advance, and also without relying on distributional information about buyers' valuations. We also want the mechanism to be dominant-strategy truthful, individually rational and weakly budget balanced. Clearly, it will be necessary to artificially restrict supply (and thus cause allocative inefficiency), because otherwise bidders would be able to win units of the good in exchange for arbitrarily small payments. Although this assumption can be relaxed, to simplify the presentation we will return to our previous assumption that bidders are interested in buying at most one unit of the good.

The main insight that allows us to construct a mechanism for this case is that, if we knew bidders' valuations but had to offer the goods at the same price to all bidders, it would be easy to compute the optimal single price.

Definition 10.2.3 (Optimal single price) The optimal single price is calculated as follows:

1. Order the bidders in descending order of bid amount; let bi denote the bid amount of the ith-highest bid.

2. Calculate opt ∈ arg max_{i∈{1,...,n}} i · bi.

3. The optimal single price is b_opt.
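Definition 10.2.3 translates directly into code; the sketch below (ours) illustrates it on the bids of Figure 10.2.

```python
def optimal_single_price(bids):
    """The price b_opt maximizing i * b_i over the bids sorted in descending order."""
    ranked = sorted(bids, reverse=True)
    opt = max(range(1, len(ranked) + 1), key=lambda i: i * ranked[i - 1])
    return ranked[opt - 1]

print(optimal_single_price([25, 20, 15, 8]))   # 15, since 3 * 15 = 45 beats 25, 40 and 32
```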

Simply offering the good to the agents at the optimal single price is not a dominant-strategy truthful mechanism: bidders would have incentive to misstate their valuations. If we knew the distributions from which bidders' valuations were drawn, this would also allow us to set an optimal single price in a similar way. More interestingly, this procedure also serves as a building block for a simple dominant strategy mechanism that works even in the absence of distributional information.

Definition 10.2.4 (Random sampling optimal price auction) The random sampling optimal price auction is defined as follows:

1. Randomly partition the set of bidders N into two sets, N1 and N2 (i.e., N = N1 ∪ N2; N1 ∩ N2 = ∅; each bidder has probability 0.5 of being assigned to each set).

2. Using the procedure above find p1 and p2, where pi is the optimal single price to charge the set of bidders Ni.

3. Then set the allocation and payment rules as follows:

• For each bidder i ∈ N1, award a unit of the good if and only if bi ≥ p2, and charge the bidder p2;

• For each bidder j ∈ N2, award a unit of the good if and only if bj ≥ p1, and charge the bidder p1.
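The following sketch (ours) implements Definition 10.2.4 directly; the optimal-single-price helper from the previous sketch is repeated so that the snippet is self-contained.

```python
import random

def optimal_single_price(bids):
    ranked = sorted(bids, reverse=True)
    opt = max(range(1, len(ranked) + 1), key=lambda i: i * ranked[i - 1])
    return ranked[opt - 1]

def random_sampling_auction(bids, rng=None):
    """bids maps bidder -> bid amount; returns {winner: price paid}."""
    rng = rng or random.Random(0)
    n1 = {i for i in bids if rng.random() < 0.5}               # step 1: random partition
    n2 = set(bids) - n1
    winners = {}
    for group, other in ((n1, n2), (n2, n1)):
        if not group or not other:
            continue                                           # no price computable from an empty side
        p = optimal_single_price([bids[j] for j in other])     # step 2: price from the *other* side
        winners.update({i: p for i in group if bids[i] >= p})  # step 3: allocate and charge
    return winners

print(random_sampling_auction({"a": 25, "b": 20, "c": 15, "d": 8}))
```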


It is left as an exercise to the reader to give a formal proof that this mechanism is dominant-strategy truthful. The argument is essentially the same as in Theorem 10.1.1: bidders' declared valuations are used only to determine whether or not they win, but beyond serving as a maximum price offer do not affect the price that a bidder pays. It is also trivial to show that the mechanism satisfies weak budget balance and ex-post individual rationality. Of course the random sampling auction is not efficient, as it sometimes refuses to sell units to bidders who value them. The random sampling auction's most interesting property concerns revenue.

Theorem 10.2.5 The random sampling optimal price auction always yields expected revenue that is at least a (fixed) constant fraction of the revenue that would be achieved from charging bidders the optimal single price, subject to the constraint that at least two units of the good must be sold.

We refrain from stating the constant here to avoid introducing additional notation; instead, we refer the reader to the paper cited in the chapter notes. We do point out that a host of other auctions have been proposed in the same vein. Some of these auctions offer better revenue guarantees; e.g., one similar auction guarantees an expected revenue of at least a 1/4 fraction of the revenue from charging bidders the optimal single price described above. Other work covers additional settings such as goods for which there is a limited supply of units. (Here the trick is essentially to throw away low bidders so that the number of remaining bidders is the same as the number of goods, and then to proceed as before with the additional constraint that the highest rejected bid must not exceed the single price charged to any winning bidder.) In this limited supply case, both the random sampling optimal price auction and its more sophisticated counterparts can achieve revenues much higher than VCG, and hence also higher than all the other auctions discussed in Section 10.2.2. Still other work considers the online auction case, where bidders arrive one at a time and the auction must decide whether each bidder wins or loses before seeing the next bid.

10.2.5 Position auctions

The last auction type we consider in this section goes somewhat beyond the multiunit auctions we have defined above. Like multiunit auctions in the case of single-unit demand, position auctions never allocate more than one good per bidder, and ask bidders to specify their preferences using a single real number. The wrinkle is that these auctions sell a set of goods among which bidders are not indifferent: one of a set of ordered positions. The motivating example for these auctions is the sale of ranked advertisements on a page of search results. (For this reason, these auctions are also called sponsored search auctions.) Since goods are not identical, we cannot consider them to be multiple units of the same good. In this sense position auctions can be understood as combinatorial auctions, the topic of Section 10.3. Nevertheless, we choose to present them here because they have been called multiunit auctions in the literature, and because their bidding and allocation rules have a multiunit flavor.

Regardless of how we choose to classify them, position auctions are very important.


From a theoretical point of view they are interesting, and have good properties both in terms of incentives and computation. Practically speaking, major search engines use them to sell many billions of dollars worth of advertising space annually, and indeed did so even before the auctions' theoretical properties were well understood. In these auctions, search engines offer a set of keyword-specific "slots"—usually a list on the right-hand side of the page of search results—for sale to interested advertisers. Slots are considered to be more valuable the closer they are to the top of the page, because this affects their likelihood of being clicked by a user. Advertisers place bids on keywords of interest, which are retained by the system. Every time a user searches for a keyword on which advertisers have bid, an auction is held. The outcome of this auction is a decision about which ads appear on the search results page and in which order. Advertisers are required to pay only if a user clicks on their ad. Because sponsored search is the dominant application of position auctions, we will use it as our motivating example here.

How should position auctions be modeled? The setting can be understood as inducing an infinitely repeated Bayesian game, because a new auction is held every time a user searches for a given keyword. However, researchers have argued that it makes sense to study an unrepeated, perfect-information model of the setting. The single-shot assumption is considered reasonable because advertisers tend to value clicks additively (i.e., the value derived from a given user clicking on an ad is independent of how many other users clicked earlier), at least when bidders do not face budget constraints. The perfect-information assumption makes sense because search engines allow bidders either to observe other bids or to figure them out by probing the mechanism.

We now give a formal model. As before, let N be the set of bidders (advertisers), and let vi be i's (commonly-known) valuation for getting a click. Let bi ∈ R+ ∪ {0} denote i's bid, and let b(j) denote the jth-highest bid, or 0 if there are fewer than j bids. Let G = {1, . . . , m} denote the set of goods (slots), and let αj denote the expected number of clicks (the click-through rate) that an ad will receive if it is listed in the jth slot. Observe that we assume that α does not depend on the bidder's identity.

The generalized first-price auction was the first position auction to be used by search engine companies.

Definition 10.2.6 (Generalized first-price auction) The generalized first-price auction (GFP) awards the bidder with the jth-highest bid the jth slot. If bidder i's ad receives a click, he pays the auctioneer bi.

Unfortunately, these auctions do not give rise to any pure-strategy equilibria, even in the unrepeated, perfect-information case. For example, consider three bidders 1, 2 and 3 who value a click at $10, $4 and $2 respectively, participating in an auction for two slots, where the probability of a click for the two slots is α1 = 0.5 and α2 = 0.25 respectively. Bidder 2 needs to bid at least $2 to get a slot; suppose he bids $2.01. Then bidder 1 can win the top slot for a bid of $2.02. But bidder 2 could get the top slot for $2.03, increasing his expected utility. If the agents bid by best-responding to each other—as has been observed in practice—their bids will increase all the way up to bidder 2's valuation, at which point bidder 2 will drop out, bidder 1 will reduce his bid to bidder 3's valuation, and the cycle will begin again.

The instability of bidding behavior under the GFP led to the introduction of the generalized second-price auction, which is now the dominant mechanism in practice.


Definition 10.2.7 (Generalized second-price auction) The generalized second-price auction (GSP) awards the bidder with the jth-highest bid the jth slot. If bidder i's ad is ranked in slot j and receives a click, he pays the auctioneer b(j+1).

The GSP is more stable than the GFP. Continuing the example from above, if all bidders bid truthfully, then bidder 1 would pay $4 per click for the first slot, bidder 2 would pay $2 per click for the second slot, and bidder 3 would lose. Bidder 1's expected utility would be 0.5($10 − $4) = $3; if he bid less than $4 but more than $2 he would pay $2 per click for the second slot and achieve expected utility of 0.25($10 − $2) = $2, and if he bid even less then his expected utility would be zero. Thus bidder 1 prefers to bid truthfully in this example. If bidder 2 bid more than $10 then he would win the top slot for $10, and would achieve negative utility; thus in this example bidder 2 also prefers honest bidding.

This example suggests a connection between the GSP and the VCG mechanism. However, these two mechanisms are actually quite different, as becomes clear when we apply the VCG formula to the position auction setting.

Definition 10.2.8 (VCG) In the position auction setting, the VCG mechanism awards the bidder with the jth-highest bid the jth slot. If bidder i's ad is ranked in slot j and receives a click, he pays the auctioneer (1/αj) Σ_{k=j+1}^{m+1} b(k) (α_{k−1} − α_k).

Intuitively, the key difference between the GSP and VCG is that the former does not charge an agent his social cost, which depends on the differences between click-through rates that other agents would receive with and without his presence. Indeed, truthful bidding is not always a good idea under the GSP. Consider the same bidders as in our running examples, but change the click-through rate of slot 2 to α2 = 0.4. When all bidders bid truthfully we have already shown that bidder 1 would achieve expected utility of $3 (this argument did not depend on α2). However, if bidder 1 changed his bid to $3, he would be awarded the second slot and would achieve expected utility of 0.4($10 − $2) = $3.2. Thus truthfulness is not even an equilibrium of this auction, let alone a dominant strategy.
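To make the contrast concrete, the sketch below (ours) computes the per-click GSP and VCG payments for the running example, taking the click-through rate below the last slot to be zero and any missing bid to be zero, as in the definitions above.

```python
def gsp_payments(bids, alpha):
    """Per-click GSP payments, listed from the top slot down: each winner pays the next-highest bid."""
    b = sorted(bids, reverse=True) + [0.0]
    return [b[j + 1] for j in range(len(alpha))]

def vcg_payments(bids, alpha):
    """Per-click VCG payments (Definition 10.2.8), listed from the top slot down."""
    b = sorted(bids, reverse=True) + [0.0] * len(alpha)
    a = list(alpha) + [0.0]                      # click-through rate below the last slot is 0
    return [sum(b[k] * (a[k - 1] - a[k]) for k in range(j + 1, len(alpha) + 1)) / alpha[j]
            for j in range(len(alpha))]

bids, alpha = [10, 4, 2], [0.5, 0.4]
print(gsp_payments(bids, alpha))                           # [4, 2]
print([round(p, 2) for p in vcg_payments(bids, alpha)])    # [2.4, 2.0]: VCG charges bidder 1 his social cost
```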

What can be said about the equilibria of the GSP? Briefly, it can be shown that in the perfect-information setting, the GSP has many equilibria. The dynamic nature of the setting suggests that the most stable configurations will be locally envy-free: no bidder will wish that he could switch places with the bidder who won the slot directly above his. There exists a locally envy-free equilibrium of the GSP that achieves exactly the VCG allocations and payments. Furthermore, all other locally envy-free equilibria lead to higher revenues for the seller, and hence are worse for the bidders.

What about relaxing the perfect information assumption? Here, it is possible to construct a generalized English auction that corresponds to the GSP, and to show that this English auction has a unique equilibrium with various desirable properties. In particular, the payoffs under this equilibrium are again the same as the VCG payoffs, and the equilibrium is ex-post (see Section 5.3.4), meaning that it is independent of the underlying valuation distribution.


10.3 Combinatorial auctions

We now consider an even broader auction setting, in which a whole variety of different goods are available in the same market. This differs from the multiunit setting because we no longer assume that goods are interchangeable. Switching to a multi-good auction model is important when bidders' valuations depend strongly on which set of goods they receive. Some widely studied practical examples include governmental auctions for the electromagnetic spectrum, energy auctions, corporate procurement auctions and auctions for paths (e.g., shipping rights; bandwidth) in a network.

More formally, let us consider a setting with a set of bidders N = {1, . . . , n} (as before) and a set of goods G = {1, . . . , m}. Let v = (v1, . . . , vn) denote the true valuation functions of the different bidders, where for each i ∈ N, vi : 2^G → R. There is a substantive assumption buried inside this definition: that there are no externalities. (Indeed, we have been making this assumption almost continuously since introducing quasilinear utilities in the previous chapter; however, this is a good time to remind the reader of it.) Specifically, we have asserted that a bidder's valuation depends only on the set of goods he wins. This assumption is quite standard; however, it does not allow us to model a bidder who also cares about the allocations and payments of the other agents.

We will usually be interested in settings where bidders have non-additive valuation functions, for example valuing bundles of goods more than the sum of the values for single goods. We identify two important kinds of non-additivity. First, when two items are partial substitutes for each other (for example, a Sony TV and a Toshiba TV, or, more partially, a CD player and an MP3 player), their combined value is less than the sum of their individual values. Strengthening this condition, when two items are strict substitutes their combined value is the same as the value for either one of the goods. For example, consider two non-transferable tickets for seats on the same plane. Sets of strictly substitutable goods can also be seen as multiple units of a single good.

Definition 10.3.1 (substitutability) Bidder i's valuation vi exhibits substitutability if there exist two sets of goods G1, G2 ⊆ G, such that G1 ∩ G2 = ∅ and vi(G1 ∪ G2) < vi(G1) + vi(G2). When this condition holds, we say that the valuation function vi is subadditive.

The second form of non-additivity we will consider is complementarity. This condition is effectively the opposite of substitutability: the combined value of goods is greater than the sum of their individual values. For example, consider a left shoe and a right shoe, or two adjacent pieces of real estate.

Definition 10.3.2 (complementarity) Bidder i's valuation vi exhibits complementarity if there exist two sets of goods G1, G2 ⊆ G, such that G1 ∩ G2 = ∅ and vi(G1 ∪ G2) > vi(G1) + vi(G2). When this condition holds, we say that the valuation function vi is superadditive.

How should an auctioneer sell goods when faced with such bidders? One approach is simply to sell the goods individually, ignoring the bidders' valuations. This is easy for the seller, but it makes things difficult for the bidders.


In particular, it presents them with what is called the exposure problem: a bidder might bid aggressively for a set of goods in the hopes of winning a bundle, but succeed in winning only a subset of the goods and therefore pay too much. This problem is especially likely to arise in settings where bidders' valuations exhibit strong complementarities, because in these cases bidders might be willing to pay substantially more for bundles of goods than they would pay if the goods were sold separately.

The next-simplest method is to run essentially separate auctions for the different goods, but to connect them in certain ways. For example, one could hold a multi-round (e.g., Japanese) auction, but synchronize the rounds in the different auctions so that as a bidder bids in one auction he has a reasonably good indication of what is transpiring in the other auctions of interest. This approach can be made more effective through the establishment of constraints on bidding that span all the auctions (so-called activity rules). For example, bidders might be allowed to increase their aggregate bid amount by only a certain percentage from one round to the next, thus providing a disincentive for bidders to fail to participate in early rounds of the auction and thus improving the information transfer between auctions. Bidders might also be subject to other constraints: for example a budget constraint could require that a bidder not exceed a certain total commitment across all auctions. Both of these ideas can be seen in some government auctions for electromagnetic spectrum (where the so-called simultaneous ascending auction was used) as well as in some energy auctions. Despite some successes in practice, however, this approach has the drawback that it only mitigates the exposure problem rather than eliminating it entirely.

A third approach ties goods together in a more straightforward way: the auctioneer sells all goods in a single auction, and allows bidders to bid directly on bundles of goods. Such mechanisms are called combinatorial auctions. This approach eliminates the exposure problem because bidders are guaranteed that their bids will be satisfied "all or nothing". For example a bidder may be permitted to offer $100 for the pair (TV, DVD player), or to make a disjunctive offer "either $100 for TV1 or $90 for TV2, but not both." However, we will see that while combinatorial auctions resolve the exposure problem they raise many other questions. Indeed, these auctions have been the subject of considerable recent study in both economics and computer science, some of which we will describe in the remainder of this section.

10.3.1 Simple combinatorial auction mechanisms

The simplest reasonable combinatorial auction mechanism is probably the one in which the auctioneer computes the allocation that maximizes the social welfare of the declared valuations (i.e., x = arg max_{x∈X} Σ_{i∈N} vi(x)), and charges the winners their bids (i.e., for all i ∈ N, pi = vi). This is a direct generalization of the first-price sealed-bid auction, and like it this naive auction is not incentive compatible. Consider the following simple valuations in a combinatorial auction setting.

Bidder 1              Bidder 2                Bidder 3
v1(x, y) = 100        v2(x) = 75              v3(y) = 40
v1(x) = v1(y) = 0     v2(x, y) = v2(y) = 0    v3(x, y) = v3(x) = 0


This example makes it easy to show that the auction is not incentive compatible: e.g., if agents 1 and 2 bid truthfully, agent 3 is better off declaring, for example, v3(y) = 26. Unfortunately, it is not apparent how to characterize the equilibria of this auction using the techniques that worked in the single-good case: we do not have a simple analytic expression that describes when a bidder wins the auction, and we also lack a revenue equivalence theorem.

An obvious alternative is the method we applied most broadly in the multiunit case: the Vickrey-Clarke-Groves (VCG) mechanism. In the example above, VCG would award x to 2 and y to 3. Bidder 2 would pay 60; without him in the auction 1 would have gotten both goods, gaining 100 in value, while with bidder 2 in the auction the other bids only net a total value of 40 (from good y assigned to 3). Similarly, 3 would pay 25; the difference between 100 and 75. The reader can verify that no bidder can gain by unilaterally deviating from truthful bidding, and that bidders strictly lose from some deviations.
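These payments can be checked mechanically. The brute-force sketch below (ours) enumerates every assignment of the two goods, finds the welfare-maximizing allocation, and computes each bidder's VCG payment for the example above.

```python
from itertools import product

# Valuations from the table above; bundles not listed are worth 0.
vals = {
    1: {frozenset("xy"): 100},
    2: {frozenset("x"): 75},
    3: {frozenset("y"): 40},
}
goods = ("x", "y")

def value(i, bundle):
    return vals[i].get(frozenset(bundle), 0)

def best_allocation(bidders):
    """Try assigning each good to every bidder (or to nobody) and keep the best welfare."""
    best, best_alloc = 0, {i: set() for i in bidders}
    for assignment in product(list(bidders) + [None], repeat=len(goods)):
        alloc = {i: {g for g, who in zip(goods, assignment) if who == i} for i in bidders}
        welfare = sum(value(i, alloc[i]) for i in bidders)
        if welfare > best:
            best, best_alloc = welfare, alloc
    return best, best_alloc

welfare, alloc = best_allocation(vals)
for i in vals:
    welfare_without_i = best_allocation([j for j in vals if j != i])[0]
    payment = welfare_without_i - (welfare - value(i, alloc[i]))
    print(i, sorted(alloc[i]), payment)
# Bidder 1 wins nothing and pays 0; bidder 2 wins x and pays 60; bidder 3 wins y and pays 25.
```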

As in the multiunit case, VCG has some attractive properties when applied to combinatorial auctions. Specifically, it is dominant-strategy truthful, efficient, ex-post individually rational and weakly budget balanced (the latter by Theorems 9.4.8 and 9.4.10). The VCG combinatorial auction mechanism is not without shortcomings, however, as we already discussed in Section 9.4.2. (Indeed, though we did not discuss them above, most of these shortcomings also affect the use of VCG in the multiunit case, and some even impact second-price single-good auctions.) For example, a bidder who declares his valuation truthfully has two main reasons to worry—one is that the seller will examine his bid before the auction clears and submit a fake bid just below, thus increasing the amount that the agent would have to pay if he wins. (This is a so-called shill bid.) Another is that both his competitors and the seller will learn his true valuation and will be able to exploit this information in a future transaction. Indeed, these two reasons are often cited as reasons why VCG auctions are rarely seen in practice. Other issues include the fact that VCG is vulnerable to collusion among bidders, and, conversely, to one bidder masquerading as several different ones (so-called pseudonymous bidding or false-name bidding). Perhaps the biggest potential hurdle, however, is computational, and it is not specific to VCG. This is the subject of the next section.

10.3.2 The winner determination problem

Both our naive first-price combinatorial auction and the more sophisticated VCG version share an element in common: given the agents' individual declarations v, they must determine the allocation of goods to agents that maximizes social welfare. That is, we must compute max_{x∈X} Σ_{i∈N} vi(x). In single-good and single-unit demand multiunit auctions this was simple—we just had to satisfy the agent(s) with the highest valuation(s). In combinatorial auctions, as in the general multiunit auctions we considered in Section 10.2.3, determining the winners is a more challenging computational problem.

Definition 10.3.3 (winner determination problem (WDP)) The winner determination problem for a combinatorial auction, given the agents' declared valuations v, is to find the social-welfare maximizing allocation of goods to agents. This problem can be expressed as the integer program in Figure 10.4.


maximize    Σ_{i∈N} Σ_{S⊆G} vi(S) xS,i                    (10.15)

subject to  Σ_{S∋j} Σ_{i∈N} xS,i ≤ 1     ∀j ∈ G           (10.16)

            Σ_{S⊆G} xS,i ≤ 1             ∀i ∈ N           (10.17)

            xS,i ∈ {0, 1}                ∀S ⊆ G, i ∈ N    (10.18)

Figure 10.4  Integer program for the combinatorial auction winner determination problem.

In this integer programming formulation, the valuations vi(S) are constants and the variables are xS,i. These variables are boolean, indicating whether bundle S is allocated to agent i. The objective function (10.15) states that we want to maximize the sum of the agents' declared valuations for the goods they are allocated. Constraint (10.16) ensures that no overlapping bundles of goods are allocated, and Constraint (10.17) ensures that no agent receives more than one bundle. (This makes sense since bidders explicitly assign a valuation to every subset of the goods.) Finally, Constraint (10.18) is what makes this an integer program11 rather than a linear program: no subset can be partially assigned to an agent.

The fact that the WDP is an integer program rather than a linear program is bad news, since only the latter admit a polynomial-time solution. Indeed, a reader familiar with algorithms and complexity may recognize the combinatorial auction allocation problem as a set packing problem (SPP). Unfortunately, it is well known that the SPP is NP-complete. This means that it is not likely that a polynomial-time algorithm exists for the problem. Worse, it so happens that this problem cannot even be approximated uniformly, meaning that there does not exist a polynomial-time algorithm and a fixed constant k > 0 such that for all inputs the algorithm returns a solution that is at least (1/k)s*, where s* is the value of the optimal solution for the given input.

There are two primary approaches to getting around the computational problem. First, we can restrict ourselves to a special class of problems for which there is guaranteed to exist a polynomial time solution. Second, we can resort to heuristic methods that give up the guarantee of polynomial running time, optimality of solution, or both. In both cases, relaxation methods are a common approach. One instance of the first approach is to relax the integrality constraint, thereby transforming the problem into a linear program, which is solvable by known methods in polynomial time. In general the solution results in "fractional" allocations, in which fractions of goods are allocated to different bidders. If we are lucky, however, our solution to the LP will just happen to be integral.

11. Integer programs and linear programs are defined in Appendix B.


[Figure 10.5 shows contiguous pieces of land A through E; the bundle {A, B, C} is legal, while the bundle {A, D, E} is illegal.]

Figure 10.5  Example of legal and illegal bundles of contiguous pieces of land when we demand the consecutive ones property.

Polynomial methods

There are several sets of conditions under which such luck is assured. The most common of these is called total unimodularity (TU). In general terms, a constraint matrix A (see Appendix B) is TU if the determinant of every square submatrix is 0, 1, or −1. In the case of the combinatorial auction WDP, this condition (via constraint (10.16)) amounts to a restriction on the subsets that bidders are permitted to bid on. How do we find out if a particular matrix is TU? There are many ways. First, there exists a polynomial time algorithm to decide whether an arbitrary matrix is TU. Second, we can characterize important subclasses of TU matrices. While many of these subclasses defy an intuitive interpretation, in the following discussion we will present a few special cases that are relevant to combinatorial auctions.

One important subclass of TU matrices is the class of 0-1 matrices with the consecutive ones property. In this subclass, all nonzero entries in each column must appear consecutively. One might ask what classes of bids the consecutive ones property corresponds to in auction problems. This corresponds roughly to contiguous single-dimensional goods, such as time intervals or parcels of land along a shoreline (as shown in Figure 10.5), where bids can only be made on bundles of contiguous goods.

Another subclass of auction problems that have integral polyhedra, and thus can be easily solved using linear programming, corresponds to the set of balanced matrices. A 0-1 matrix is balanced if it has no square submatrix of odd order with exactly two 1's in each row and column. One class of auction problems that is known to have a balanced matrix is the class that allows only tree-structured bids, as illustrated in Figure 10.6. Consider that the set of goods for sale are the vertices of a tree, connected by some set of edges. All bids must be on bundles of the form (j, r), which represents the set of vertices that are within distance r of item j. The constraint matrix for this set of possible bundles is indeed balanced, and so the corresponding polyhedron is integral, and the solution can be found using linear programming.

Yet another subclass that can be solved efficiently restricts the bids to be on bundles of no more than two items. The technique here uses dynamic programming, and the algorithm runs in cubic time.


[Figure 10.6 shows goods arranged as the vertices of a tree; the bundle {A, B, C} is a legal tree-structured bid, while {A, C} is illegal.]

Figure 10.6  Example of a tree-structured bid.

Finally, there are a number of positive results covering cases where bidders' valuation functions are subadditive.

Heuristic methods

In many cases the solutions to the associated linear program will not be integral. In these cases we must resort to using heuristic methods to find solutions to the auction problem.

Heuristic methods come in two broad varieties. The first is complete heuristic methods, which are guaranteed to find an optimal solution if one exists. Despite their discouraging worst-case guarantees, in practice complete heuristic methods are able to solve many interesting problems within reasonable amounts of time. This makes such algorithms the tool of choice for many practical combinatorial auctions. One drawback is that it can be difficult to anticipate how long such algorithms will require to solve novel problem instances, as in the end their performance depends on a problem instance's combinatorial structure rather than on easily-measured parameters like the number of goods or bids. Complete heuristic algorithms tend to perform tree search (to guarantee completeness) along with some sort of pruning technique to reduce the amount of search required (the heuristic).

The second flavor of heuristic algorithm is incomplete methods, which are not guaranteed to find optimal solutions. Indeed, as was mentioned earlier, in general there does not even exist a tractable algorithm that can guarantee that you will reach an approximate solution that is within a fixed fraction of the optimal solution, no matter how small the fraction. However, methods do exist that can guarantee a solution that is within 1/√k of the optimal solution, where k is the number of goods. More importantly, like their complete cousins, incomplete heuristic algorithms often perform very well in practice despite these theoretical caveats. One example of an incomplete heuristic method is a greedy algorithm, a technique that builds up an allocation by adding one bid at a time, and never reconsidering a bid once it has been allocated.
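As an illustration, the sketch below (ours) implements one simple greedy rule: rank atomic bids by price per good and accept any bid that does not conflict with bids already accepted. The ranking criterion is just one common choice, and the method carries no optimality guarantee.

```python
def greedy_wdp(bids):
    """bids: list of (bidder, bundle, price) with bundle a set of goods.
    Greedily accepts bids in descending order of price per good, skipping conflicts."""
    accepted, taken = [], set()
    for bidder, bundle, price in sorted(bids, key=lambda b: b[2] / len(b[1]), reverse=True):
        if not bundle & taken:
            accepted.append((bidder, bundle, price))
            taken |= bundle
    return accepted

bids = [(1, {"x", "y"}, 100), (2, {"x"}, 75), (3, {"y"}, 40)]
print(greedy_wdp(bids))
# Accepts bidder 2 and then bidder 3 (welfare 115); ranking by total price instead would
# have accepted only bidder 1 (welfare 100), showing how sensitive the outcome is to the rule.
```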


Another example is a local search algorithm, in which states in the search space are complete—but possibly infeasible—allocations, and in which the search progresses by modifying these allocations either randomly or greedily.

10.3.3 Expressing a bid: bidding languages

Except for our discussion of single-minded bidders, we have so far assumed that bidders will specify a valuation for every subset of the goods at auction. Since there are an exponential number of such subsets, this will quickly become impossible as the number of goods grows. If we are to have any hope of finding tractable mechanisms for general combinatorial auctions, we must first find a way for bidders to express their bids in a more succinct manner. In this section we will present a number of bidding languages that have been proposed for encoding bids.

As we will see, these languages differ in the ways that they express different classes of bids. We can state some desirable properties that we might like to have in a bidding language. First, we want our language to be expressive enough to represent all possible valuation functions. Second, we want our language to be concise, so that expressing commonly used bids does not take space that is exponential in the number of goods. Third, we want our language to be natural for humans to both understand and create; thus the structure of the bids should reflect the way in which we think about them naturally. Finally, we want our language to be tractable for the auctioneer's algorithms to process when computing an allocation.

In the discussion that follows, for convenience we will often speak about bids as valuation functions. Indeed, in the most general case a bid will contain a valuation for every possible combination of goods. However, be aware that the bid valuations may or may not reflect the players' true underlying valuations. We also limit the scope of our discussion to valuation functions in which the following properties hold.

• Free disposal. Goods have non-negative value, so that if S ⊆ T then v_i(S) ≤ v_i(T).

• Nothing-for-nothing. v_i(∅) = 0. (In other words, a bidder who gets no goods also gets no utility.)

We already discussed symmetric valuations in Section 10.2.3 (additive; single item; fixed budget; majority; general). The whole reason to consider a combinatorial auction, however, is that bidders are expected to have asymmetric valuations. For example, there may be different classes of goods, and valuations for sets of goods may be a function of the classes of goods in the set. Imagine that our set G consists of two classes of goods: some red items and some green items, and the bidder requires only items of the same color. Alternatively, it could be the case that the bidder wants exactly one item from each class.

Atomic bids

Let’s begin to build up some languages for expressing these bids. Perhaps the mostbasic bid requests just one particular subset of the goods. We call such a bid anatomic


atomic bid. An atomic bid is a pair (S, p) that indicates that the agent is willing to pay a price of p for the subset of goods S; we denote this value by v(S) = p. Note that an atomic bid implicitly represents an AND operator between the different goods in the bundle. We stated an atomic bid above when we wanted to bid on the couch, the TV, and the VCR for $100.

OR bids

Of course, many simple bids cannot be expressed as an atomic bid; for example, it is easy to verify that an atomic bid cannot represent even the additive valuation defined earlier. In order to represent this valuation, we will need to be able to bid on disjunctions of atomic valuations. An OR bid is a disjunction of atomic bids (S_1, p_1) ∨ (S_2, p_2) ∨ · · · ∨ (S_k, p_k) that indicates that the agent is willing to pay a price of p_1 for the subset of goods S_1, or a price of p_2 for the subset of goods S_2, etc.

To define the semantics of an OR bid precisely, we interpret OR as an operator for combining valuation functions. Let V be the space of possible valuation functions, and v_1, v_2 ∈ V be arbitrary valuation functions. Then we have that

(v_1 ∨ v_2)(S) = max_{R,T ⊆ S, R ∩ T = ∅} (v_1(R) + v_2(T)).

It is easy to verify that an OR bid can express the additive valuation. As the following result shows, its power is still quite limited; for example, it cannot express the single-item valuation described above.

Theorem 10.3.4 OR bids can express all valuation functions that exhibit no substitutability, and only these.

For example, in the consumer auction example given above, we wanted to bid on either the TV and the VCR for $100, or the TV and the DVD player for $150, but not both. It is not possible for us to express this using OR bids.

XOR bids

XOR bids do not have this limitation. An XOR bid is an exclusive OR of atomic bids (S_1, p_1) ⊕ (S_2, p_2) ⊕ · · · ⊕ (S_k, p_k) that indicates that the agent is willing to accept one but no more than one of the atomic bids.

Once again, the XOR operator is actually defined on the space of valuation functions. We can define its semantics precisely as follows. Let V be the space of possible valuation functions, and v_1, v_2 ∈ V be arbitrary valuation functions. Then we have that

(v_1 ⊕ v_2)(S) = max(v_1(S), v_2(S)).

We can use XOR bids to express our example from above:

(TV,VCR, 100)⊕ (TV,DVD, 150)
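Both the atomic-bid valuation and the XOR semantics are easy to write down operationally. A minimal sketch (representing a valuation as a function from a set of goods to a price, and assuming free disposal, as above) evaluates this example bid:

```python
def atomic(bundle, price):
    """Valuation induced by an atomic bid (bundle, price), with free disposal."""
    bundle = frozenset(bundle)
    return lambda S: price if bundle <= frozenset(S) else 0.0

def xor(*valuations):
    """(v1 XOR ... XOR vk)(S) = max_i vi(S): only one atomic bid is honored."""
    return lambda S: max(v(S) for v in valuations)

# The XOR bid (TV, VCR, 100) XOR (TV, DVD, 150) from the text:
bid = xor(atomic({"TV", "VCR"}, 100.0), atomic({"TV", "DVD"}, 150.0))
print(bid({"TV", "VCR"}))           # 100.0
print(bid({"TV", "DVD"}))           # 150.0
print(bid({"TV", "VCR", "DVD"}))    # 150.0, not 250.0
```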

It is easy to see that XOR bids have unlimited representational power, since it is possible to construct a bid for an arbitrary valuation using an XOR of the atomic valuations for every possible subset S ⊆ G.


Theorem 10.3.5 XOR bids can represent all possible valuation functions.

However, this does not imply that XOR bids represent every valuation function efficiently. In fact, as the following result states, there are simple valuations that can be represented by short OR bids but that require XOR bids of exponential size.

Theorem 10.3.6 Additive valuations can be represented by OR bids in linear space, but require exponential space if represented by XOR bids.

Note that for the purposes of the present discussion, we consider the size of a bid to be the number of atomic formulas that it contains.

Combining the OR and XOR operators

We can also create bidding languages by combining the OR and XOR operators on valuation functions. Consider a language that allows bids that are of the form of an OR of XOR of atomic bids. We call these bids OR-of-XOR bids. An OR-of-XOR bid is a set of XOR bids, as defined above, such that the bidder is willing to obtain any number of these bids.

Like XOR bids, OR-of-XOR bids have unlimited representational power. However, unlike XOR bids, they can specialize to plain OR bids, which affords greater simplicity of expression, as we have seen above. As a specific example, OR-of-XOR bids can express any downward-sloping symmetric valuation on m items in size of only m^2. However, their compactness is still limited. For example, even simple asymmetric valuations can require size of at least 2^(m/2+1) to express in the OR-of-XOR language.

It is also possible to define a language of XOR-of-OR bids, and even a language allowing arbitrary nesting of OR and XOR statements (we refer to the latter as generalized OR/XOR bids). These languages vary in their compactness.

The OR* bidding language

Now we turn to a slightly different sort of bidding language that is powerful enough to simulate all of the preceding languages with a relatively succinct representation. This language results from the insight that it is possible to simulate the effect of an XOR by allowing bids to include dummy (or phantom) items. The only difference between an OR and an XOR is that the latter is exclusive; we can enforce this exclusivity in the OR by ensuring that all of the sets in the disjunction share a common item. We call this language OR*.

Definition 10.3.7 (OR* bid) Given a set of dummy items G_i for each agent i ∈ N, an OR* bid is a disjunction of atomic bids (S_1, p_1) ∨ (S_2, p_2) ∨ · · · ∨ (S_k, p_k), where for each l = 1, . . . , k, the agent is willing to pay a price of p_l for the set of items S_l ⊆ G ∪ G_i.

An example will help make this clearer. If we wanted to express our TV bid from above using dummy items, we would create a single dummy item D, and express the bid as follows.

(TV,VCR,D, 100) ∨ (TV,DVD,D, 150)


[Figure 10.7 Relationships between different bidding languages: Atomic, OR, XOR, OR-of-XOR, XOR-of-OR, OR/XOR, and OR*, compared by expressiveness and compactness.]

Any auction procedure that does not award one good to two people will select at most one of these disjuncts. The following results show us that the OR* language is surprisingly expressive and simple.

Theorem 10.3.8 Any valuation that can be represented by OR-of-XOR bids of size s can also be represented by OR* bids of size s, using at most s dummy items.

Theorem 10.3.9 Any valuation that can be represented by OR/XOR bids of size s can also be represented by OR* bids of size s, using at most s^2 dummy items.

By the definition of OR/XOR bids, we have the following corollary.

Corollary 10.3.10 Any valuation that can be represented by XOR-of-OR bids of size s can also be represented by OR* bids of size s, using at most s^2 dummy items.

Let us briefly review the properties of the languages we have discussed. First, the XOR, OR-of-XOR, XOR-of-OR, and OR* languages are all powerful enough to express all valuations. Second, the efficiencies of the OR-of-XOR and XOR-of-OR languages are incomparable: there are bids that can be expressed succinctly in one but not the other, and vice versa. Third, the OR* language is strictly more compact than both the OR-of-XOR and XOR-of-OR languages: it can efficiently simulate both languages, and can succinctly express some valuations that require exponential size in each of them. These properties are summarized in Figure 10.7.

Interpretation and verification complexity

Recall that in the auction setting these languages are used for communicating bids to the auctioneer. It is the auctioneer's job to first interpret these bids, and then calculate an allocation of goods to agents. Thus it is natural to be concerned about the computational complexity of a given bidding language. In particular, we may want to know how difficult it is to take an arbitrary bid in some language and compute the valuation of some arbitrary subset of goods according to that bid. We call this the


interpretation complexity. The interpretation complexity of a bidding language is the minimum time required to compute the valuation v(S), given as input an arbitrary subset S ⊆ G and an arbitrary bid v in the language.

Not surprisingly, the atomic bidding language has interpretation complexity that is polynomial in the size of the bid. To compute the valuation of some arbitrary subset S, one need only check whether all members of the atomic bid's bundle are contained in S. If they are, the valuation of S is just that given in the bid (because of free disposal); and if they are not, then the valuation of S is 0. The XOR bidding language also has interpretation complexity that is polynomial in the size of the bid; just perform the above procedure for each of the atomic bids in turn and take the maximum. However, all of the other bidding languages mentioned above have interpretation complexity that is exponential in the size of the bid. For example, given the OR bid (S_1, p_1) ∨ (S_2, p_2) ∨ · · · ∨ (S_k, p_k), computing the valuation of S requires checking all possible combinations of the atomic bids, and there are 2^k such possible combinations.

One might ask why we even consider bidding languages that have exponential interpretation complexity. Simply stated, the answer is that languages with only polynomial interpretation complexity are either not expressive enough or not compact enough. This brings us to a more relaxed criterion. It may be sufficient to require that a given claim about a bid's valuation for a set is verifiable in polynomial time. We define the verification complexity of a bidding language as the minimum time required to verify the valuation v(S), given as input an arbitrary subset S ⊆ G, an arbitrary bid v in the language, and a proof of the proposed valuation v(S). All of the languages we have presented in this section have polynomial verification complexity.
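A minimal sketch of these interpretation procedures (bids are lists of (bundle, price) pairs; the OR routine deliberately performs the brute-force search over combinations of atomic bids discussed above, and so is exponential in the number of atomic bids):

```python
from itertools import chain, combinations

def interpret_xor(bid, S):
    """Value of an XOR bid at the set S: a single linear scan over atomic bids."""
    S = frozenset(S)
    return max((p for bundle, p in bid if frozenset(bundle) <= S), default=0.0)

def interpret_or(bid, S):
    """Value of an OR bid at S: best set of disjoint atomic bids satisfied by S."""
    S = frozenset(S)
    best = 0.0
    for combo in chain.from_iterable(
            combinations(range(len(bid)), r) for r in range(len(bid) + 1)):
        bundles = [frozenset(bid[i][0]) for i in combo]
        if all(b <= S for b in bundles) and all(
                b1.isdisjoint(b2) for b1, b2 in combinations(bundles, 2)):
            best = max(best, sum(bid[i][1] for i in combo))
    return best

# Two single-item atomic bids combined with OR behave additively:
print(interpret_or([({"x"}, 1.0), ({"y"}, 2.0)], {"x", "y"}))   # 3.0
print(interpret_xor([({"x"}, 1.0), ({"y"}, 2.0)], {"x", "y"}))  # 2.0
```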

10.3.4 Iterative mechanisms

We have argued that in a combinatorial auction setting, agents' valuations can be so complex that they cannot be tractably communicated. The idea behind bidding languages (Section 10.3.3) is that this communication limitation can be overcome if an agent's valuation can be succinctly represented. In this section we consider another, somewhat independent idea: replacing the sealed-bid mechanism we have discussed so far with an indirect mechanism that probes agents' valuations only as necessary.

Intuitively, the use of indirect mechanisms in combinatorial auctions offers the possibility of several benefits. Most fundamentally, allowing the mechanism to query bidders selectively can reduce communication. For example, if the mechanism arrived at the desired outcome (say the VCG allocation and payments) after a small number of queries, other agents could realize that they were unable to make a bid that would improve the allocation, and could thus quit the auction without communicating any more information to the auctioneer. This sort of reduction in communication is still useful even if the auction is small enough that agents' full valuations could be tractably conveyed. First, reducing communication can benefit bidders who want to reveal as little as possible about their valuations to their competitors and/or to the auctioneer. Second, iterative mechanisms can help in cases where it is difficult for bidders to determine their own valuations. For example, in a logistics domain a bidder might have to solve a traveling salesman problem to determine his valuation for a given bundle, and thus could face a computational barrier to determining his whole valuation function.


Indirect mechanisms also have benefits that go beyond reducing communication. First, they can be easier for bidders to understand than complex direct mechanisms like VCG, and so can be seen as more transparent. This matters, for example, in government auctions of public assets like radio spectrum, where taxpayers want to be assured that the auction was conducted fairly. Finally, while no general result shows this formally, our experience with single-good auctions in the common and affiliated values case (Section 10.1.10) suggests that allowing bidders to iteratively exchange partial information about their valuations may lead to improvements in both revenue and efficiency.

Of course, considering iterative mechanisms also invites new challenges. Most of all, such mechanisms are tremendously complicated, and hence can require extensive effort to design. Furthermore, small flaws in this design can lead to huge problems. For example, iterative mechanisms can give rise to considerably richer strategy spaces than direct mechanisms do: an agent can condition his actions on everything he has learned about the actions taken previously by other agents. Beyond potentially making things complicated for the agents, this strategic flexibility can also facilitate undesirable behavior. For example, agents can bid insincerely in order to signal each other (e.g., "don't bid on my bundle and I won't bid on yours"), and thus collude against the seller. Another problem is that agents can often gain by waiting for others to reveal information, especially in settings where determining one's own valuation is costly. The auction must therefore be designed in a way that gives agents some reason to bid early. One approach is to establish activity rules that restrict an agent's future participation in the auction if he does not remain sufficiently active. This idea was already discussed at the beginning of Section 10.3 in the context of decoupled auctions for combinatorial settings such as the simultaneous ascending auction.

Because iterative mechanisms can become quite complex, we will not formally describe any of them here. (As always, interested readers should consult the references cited at the end of the chapter.) However, we will note some of the main questions addressed in this literature, and some of the general trends in the answers to these questions. The first question is what social choice functions to implement, and thus what payments to impose. Here a popular choice is to design mechanisms that converge to an efficient allocation, and that elicit enough information to guarantee that agents pay the same amounts that they would under VCG. Even if an indirect mechanism mimics VCG, it does not automatically inherit its equilibrium properties—the revelation principle only covers the transformation of indirect mechanisms into direct mechanisms. Indeed, under indirect mechanisms that mimic VCG, answering queries honestly is no longer a dominant strategy. For example, if agent i knows that agent j will overbid, agent i may also want to do so, as dishonest declarations by j can affect the queries that i will receive in the future. Nevertheless, it can be shown that it is an ex-post equilibrium (see Section 5.3.4) for agents to cooperate with any indirect mechanism that achieves the same allocation and payment as VCG when all bidders but some bidder i bid truthfully and i bids arbitrarily. In some cases mechanism designers have considered mechanisms that do not always converge to VCG payments. Here they usually also assume that agents will bid "straightforwardly"—i.e., that they will answer queries truthfully even if it is not in their interest to do so. This assumption is typically justified by demonstrations that agents can gain very little by deviations even


in the worst case (i.e., straightforward bidding is an ε-equilibrium for small ε), claims that determining a profitable deviation would be computationally intractable, and/or an appeal to a complete-information Nash analysis. As long as bidders do behave in a way consistent with this assumption, these iterative mechanisms are able to avoid some of the undesirable properties of VCG discussed in Section 9.4.2.

A second question is what sorts of queries the indirect mechanisms should ask. The most popular are probably value queries and demand queries. A mechanism asks a value query when it suggests a bundle and asks how much it is worth to a bidder. Demand queries are in some sense the reverse: the mechanism asks which bundle the bidder would prefer at given prices. Demand queries come in various different forms: the simplest have prices only on single goods and offer the same prices to all bidders, while the most complicated attach bidder-specific prices to bundles. When only demand queries are used and prices (of whatever form) are guaranteed to rise as the auction progresses, we call the auction an ascending combinatorial auction. (Observe that a Japanese auction is a single-good special case of such an auction.) Of course, all sorts of other query families are also possible. For example, bidders can be asked order queries (state which of two bundles is preferred) and bounding queries (answer whether a given bundle is worth more or less than a given amount). Some mechanisms even allow push-pull queries: bidders can answer questions they weren't asked, and can decline to answer questions they were asked.

The final general lesson to convey from the literature on iterative combinatorial auctions is that in the worst case, any mechanism that achieves an efficient or approximately-efficient12 allocation in equilibrium must receive an amount of information equal in size to a single agent's complete valuation. Since the size of an agent's valuation is exponential in the number of goods, this is discouraging. However, this result speaks only about the worst case, and only about bidders with unrestricted valuations. Researchers have managed to show theoretically that worst-case communication requirements are polynomial under some restricted classes of valuations or when the query complexity is parameterized by the minimal representation size in a given bidding language, and to demonstrate practically that iterative mechanisms can terminate after reasonable amounts of communication when used with real bidders.

10.3.5 A tractable mechanism

Recall that a tractable mechanism is one that can determine the allocation and payments using only polynomial-time computation (Definition 9.3.9). We have seen that such a mechanism can easily be achieved by restricting bidders to expressing valuations from a set that makes the winner determination problem tractable (as discussed in Section 10.3.2 above), and then using VCG. Here we look beyond such bidding restrictions, seeking more general mechanisms that nevertheless remain computationally feasible. The idea here is to build a mechanism around an optimization algorithm that is guaranteed to run in polynomial time regardless of its inputs.

We will give one example of such a mechanism: a dominant-strategy truthful

12. Formally, this result holds for any auction that always finds allocations that achieve more than a 1/n fraction of the optimal social welfare.


mechanism for combinatorial auctions that is built around a greedy algorithm. This mechanism only works for bidders with a restricted class of valuations, called single-minded.

Definition 10.3.11 (Single-minded bidder) A bidder is single-minded if he has the valuation function

    ∀s ∈ 2^G,   v_i(s) =  v̄_i > 0   if s ⊇ b_i;
                           0          otherwise.

Intuitively, a bidder is single-minded if he is only interested in a single bundle; he values this bundle and all supersets of it13 at the same amount, v̄_i, and values all other bundles at zero. Although this is a severe restriction on agents' valuations, it does not make the winner determination problem any easier. Intuitively, this is because agents remain free to choose any bundles from the set of possible bundles 2^G.

In a direct mechanism for single-minded bidders, every bidder i places a bid b_i = ⟨s_i, v̄_i⟩ indicating a bundle of interest s_i and an amount v̄_i. (Observe that we have assumed that the auctioneer does not know bidder i's bundle of interest s_i; this is why i's bid must have two parts.) Let apg_i = v̄_i/|s_i| denote bidder i's declared amount per good, and as before let n be the number of bidders. Now consider the following greedy algorithm.

Definition 10.3.12 (Greedy allocation scheme) The greedy allocation scheme is defined as follows:

    store the bidders' bids in a list L, sorted in decreasing order of apg
    let L(j) denote the jth element of L
    let a = ∅
    for j = 1 to n do
        if a ∩ s_j = ∅ then
            bid b_j wins
            a = a ∪ s_j
    foreach winning bidder i do
        look for a bidder i_next, the first bidder whose bid follows i's in L, whose
            bid does not win, and whose bid does win if the greedy allocation scheme
            is run again with i's bid omitted
        if a bidder i_next exists then
            bidder i's payment is p_i = |s_i| · v̄_{i_next} / |s_{i_next}|
        else
            p_i = 0
    foreach losing bidder i do
        p_i = 0

13. Implicitly, this amounts to an assumption of free disposal, as defined in Section 10.3.3.


Intuitively, the greedy allocation scheme ranks all bids in decreasing order of apg, and then greedily allocates bids starting from the top of L. The payment of a winning bidder i is the apg of the highest-ranked bidder that would have won but for i's participation, multiplied by the number of goods allocated to i. Bidder i pays zero if he does not win or if there is no bidder i_next.
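The scheme is short enough to implement directly. A minimal sketch (the three bids at the bottom are hypothetical, and ties in the apg ranking are broken arbitrarily here):

```python
def greedy_mechanism(bids):
    """Greedy allocation scheme (Definition 10.3.12) for single-minded bidders.

    bids: dict mapping bidder id -> (bundle, declared_value), bundle a set.
    Bids are ranked by declared amount per good (apg); a bid wins if its
    bundle is disjoint from the goods already allocated.  A winner pays the
    apg of the first losing bidder below it that would have won had the
    winner stayed out, times the number of goods the winner receives.
    """
    def apg(b):
        bundle, value = b
        return value / len(bundle)

    def run(active):
        ranked = sorted(active, key=lambda i: apg(active[i]), reverse=True)
        allocated, winners = set(), []
        for i in ranked:
            bundle, _ = active[i]
            if allocated.isdisjoint(bundle):
                winners.append(i)
                allocated |= set(bundle)
        return ranked, winners

    ranked, winners = run(bids)
    payments = {i: 0.0 for i in bids}
    for i in winners:
        _, winners_without_i = run({j: bids[j] for j in bids if j != i})
        for j in ranked[ranked.index(i) + 1:]:        # bidders following i in L
            if j not in winners and j in winners_without_i:
                payments[i] = len(bids[i][0]) * apg(bids[j])
                break
    return winners, payments

# Hypothetical example: bidder 3's bid displaces bidder 2's, so 3 pays 2's apg
# (6 per good) times the 2 goods it receives; the other winner pays nothing.
bids = {1: ({"a"}, 10.0), 2: ({"b", "c"}, 12.0), 3: ({"c", "d"}, 16.0)}
print(greedy_mechanism(bids))   # ([1, 3], {1: 0.0, 2: 0.0, 3: 12.0})
```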

Theorem 10.3.13 When bidders are single-minded, the greedy allocation scheme is dominant-strategy truthful.

We leave the proof of this theorem as an exercise; however, it is not hard. Observe that, because it is based on a greedy algorithm, this mechanism does not select an efficient allocation. It is natural to wonder whether this mechanism can come close. It turns out that the best that can be achieved comes from modifying the algorithm, replacing apg_i with the ratio v̄_i/√|s_i|. This can be shown to preserve dominant-strategy truthfulness, and to achieve the 1/√k bound on efficiency discussed above.

10.4 Exchanges

So far, we have surveyed single-good, multiunit, and combinatorial auctions. Despite the wide variety within these families, we have not yet exhausted the space of auctions. We now briefly discuss one last, important category of auctions. These are exchanges: auctions in which agents are able to act as both buyers and sellers. We discuss two varieties, which differ more in their intentions than in their mechanics. The first is intended to allocate goods; the second is designed to aggregate information.

10.4.1 Two-sided auctions

In two-sided auctions, otherwise known as double auctions, there are many buyers and sellers. A typical example is the stock market, where many people are interested in buying or selling any given stock. It is important to distinguish this setting from certain marketplaces (such as popular consumer auction sites) in which there are multiple separate single-sided auctions. We will not have much to say about double auctions, in part because of the relative dearth of theoretical results about them. However, let us mention two primary models of single-dimensional double markets, that is, markets in which there are many potential buyers and sellers of many units of the same good (for example, the shares of a given company). We distinguish here between two kinds of markets, the continuous double auction (or CDA) and the periodic double auction (otherwise known as the call market).

In both the CDA and the call market agents bid at their own pace and as many times as they want. Each bid consists of a price and quantity, where the quantity is either positive (signifying a 'buy' order) or negative (signifying a 'sell' order). There are no constraints on what the price or quantity might be. Also in both cases, the bids received are put in a central repository, the order book. Where the CDA and call market diverge is on when a trade occurs. In the CDA, as soon as the bid is received, an attempt is made to match it with one or more bids on the order book; for example, a new sell order for 10 units may be matched with one existing buy bid for 4 units and another buy bid


for 6 units, so long as both the buy-bid prices are higher than the sell price. In cases of partial matches, the remaining units (either of the new bid or of one of the order-book bids) are put back on the order book. For example, if the new sell order is for 13 units and the only buy bids on the order book with a higher price are the ones described (one buy bid for 4 units and another buy bid for 6 units), two trades are arranged—one for 4 units, and one for 6—and the remaining 3 units of the new bid are put on the order book as a sell order. (We have not mentioned the price of the trades arranged; obviously they must lie in the interval between the price in the buy bid and the price in the sell bid—the so-called bid-ask spread—but are unconstrained otherwise. Indeed, the amount paid to the seller could be less than the amount charged to the buyer, allowing a commission for the exchange or broker.)

In contrast, when a bid arrives in the call market, it is simply placed in the order book. No trade is attempted. Then, at some predetermined time, an attempt is made to arrange the maximal amount of trade possible (called clearing the market). In this case this is done simply by ranking the sell bids in ascending order, the buy bids in descending order, and finding the point at which supply meets demand. Figure 10.8 depicts a typical call market. In this example 14 units are traded when the market clears, after which the remaining bids are left on the order book awaiting the next market clear.

    before
    Sell: 5@$1   3@$2   6@$4   2@$6   4@$9
    Buy:  6@$9   4@$5   6@$4   3@$3   5@$2   2@$1

    after
    Sell: 2@$6   4@$9
    Buy:  2@$4   3@$3   5@$2   2@$1

Figure 10.8 A call-market order book, before and after the market clears.
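The clearing rule is easy to state operationally. The following sketch (a minimal model that tracks only quantities and limit prices, and leaves the prices at which matched trades execute unspecified) clears the order book of Figure 10.8 and reproduces the residual book shown there:

```python
def clear_call_market(sells, buys):
    """Clear a call market: match sells (ascending) against buys (descending).

    sells, buys: lists of (quantity, price).  Returns the number of units
    traded and the residual order book; trade prices, which may lie anywhere
    in the bid-ask spread, are not modeled here.
    """
    sells = sorted(sells, key=lambda o: o[1])               # ascending price
    buys = sorted(buys, key=lambda o: o[1], reverse=True)   # descending price
    traded = 0
    while sells and buys and buys[0][1] >= sells[0][1]:
        qty = min(sells[0][0], buys[0][0])
        traded += qty
        sells[0] = (sells[0][0] - qty, sells[0][1])
        buys[0] = (buys[0][0] - qty, buys[0][1])
        if sells[0][0] == 0:
            sells.pop(0)
        if buys[0][0] == 0:
            buys.pop(0)
    return traded, sells, buys

# The order book of Figure 10.8:
sells = [(5, 1), (3, 2), (6, 4), (2, 6), (4, 9)]
buys = [(6, 9), (4, 5), (6, 4), (3, 3), (5, 2), (2, 1)]
print(clear_call_market(sells, buys))
# -> (14, [(2, 6), (4, 9)], [(2, 4), (3, 3), (5, 2), (2, 1)])
```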

10.4.2 Prediction markets

A prediction market, also called an information market, is a double-sided auction that is used to aggregate agents' beliefs rather than allocating goods. For example, such a market could be used to assess different candidates' chances in an upcoming presidential election. To set up a prediction market, the market designer establishes contracts (c_1, . . . , c_k), where each contract c_i is a commitment to pay the bearer $1 if candidate i wins the election, and $0 otherwise. However, the market designer does not actually sell such contracts himself. Instead, he simply opens up his market, and allows interested agents to both buy and sell contracts with each other. The reason that such a market is interesting is that a risk-neutral bidder should value a contract c_i at exactly the probability that i will win the election. If a bidder believes that he has information about the election that other bidders do not have, and consequently believes that the probability of i winning is greater than the current asking price for contract c_i, then he will want to buy contracts. Conversely, if a bidder believes the true probability of i winning is less than the current price, he will want to sell contracts. In equilibrium, prices reflect bidders' aggregate beliefs. This is similar to the idea that the price of a company's


stock in a stock market will reflect the market's belief about the company's future profitability. Indeed, an even tighter connection is to futures markets, in which traders can purchase delivery of a commodity (e.g., oil) at a future date; prices in these markets are commonly treated as forecasts about future commodity prices. The key distinction is that a prediction market is designed primarily to elicit such information, rather than doing so as a side effect of solving an allocation problem. On the other hand, futures markets are used primarily for hedging risk (e.g., an airline company might buy oil futures in order to defend itself against the risk that oil prices will rise, causing it to lose money on tickets it has already sold).

Prediction markets have been used in practice for a wide variety of belief aggregation tasks, including political polling, forecasting the outcomes of sporting events and predicting the box-office returns of new movies. Of course, there are other methods that can be used for all of these tasks. In particular, opinion polls and surveys of experts are common approaches. However, there is mounting empirical evidence that prediction markets can outperform both of these methods. This is interesting because prediction markets do not need to work with unbiased samples of the population: bidders are not asked what they think, but what they think the whole population thinks, and they are given an economic incentive to answer correctly. Similarly, prediction markets do not need to explicitly identify or score experts: participants are self-selecting, and weight their input themselves by the size of the position they take in the market. Finally, prediction markets are able to update their forecasts in real time, e.g., immediately updating prices to reflect the consequences of a political scandal on an election.

A variety of different mechanisms can be used to implement prediction markets. One straightforward approach is to use continuous double auctions or call markets, as described in the previous section. These mechanisms have the drawback that they can suffer from a lack of liquidity: trades can only occur when agents want contracts on both sides of the market at the same time. Liquidity can be introduced by a market maker. However, such a market maker therefore assumes financial risk, which means the mechanism is not budget balanced. Another approach is a parimutuel market: bidders place money on different outcomes, and when the true outcome is observed, the winning bidders split the pot in proportion to the amount each one gambled. These markets are budget balanced and do not suffer from illiquidity; however, agents have no incentive to place bets before the last moment, and so the market loses the property of aggregating beliefs in real time. Yet more complicated mechanisms, such as dynamic parimutuel markets and market scoring rules, are able to achieve real-time belief aggregation, bound the market maker's worst-case loss, and still provide substantial liquidity. These markets are too involved to describe briefly here; the reader is referred to the references for details.
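As a small illustration of the parimutuel rule just described (a minimal sketch; the bettors, outcomes, and stakes are hypothetical, and the market takes no commission), the pot is simply redistributed to those who backed the realized outcome, in proportion to their stakes:

```python
def parimutuel_payouts(bets, outcome):
    """Split the whole pot among the bettors who backed the realized outcome,
    in proportion to their stakes (no commission is taken here)."""
    pot = sum(stake for _, stake in bets.values())
    winning_stake = sum(stake for o, stake in bets.values() if o == outcome)
    return {bettor: (pot * stake / winning_stake if o == outcome else 0.0)
            for bettor, (o, stake) in bets.items()}

# Hypothetical bets on a two-candidate election:
bets = {"ann": ("candidate_1", 60.0), "bob": ("candidate_2", 30.0),
        "cal": ("candidate_1", 10.0)}
print(parimutuel_payouts(bets, "candidate_1"))
# -> {'ann': 85.71..., 'bob': 0.0, 'cal': 14.28...}; payouts sum to the $100 pot
```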

10.5 History and references

[Krishna, 2002b] is an excellent book that provides a formal introduction to auction theory. [Klemperer, 1999b] is a large edited collection of many of the most important papers on the theory of auctions, preceded by a thorough survey by the editor; this survey is reproduced in [Klemperer, 1999a]. Earlier surveys include [Cassady, 1967;


Wilson, 1987a; McAfee & McMillan, 1987]. These texts cover most of the canonical single-good and multiunit auction types we discuss in the chapter. (One exception is the elimination auction, which we gave as an example of the diversity of auction types; it was introduced by Fujishima et al. [1999b].)

As was mentioned in the notes for Chapter 9, Vickrey's seminal contribution [Vickrey, 1961] is still recommended reading for anyone interested in auctions. In it Vickrey introduced the second-price auction and argued that bidders in such an auction do best when they bid sincerely. He also provided the analysis of the first-price auction under the independent private value (IPV) model with the uniform distribution described in the chapter. He even proved an early version of the revenue equivalence theorem (Theorem 10.1.4), namely that in the independent private value (IPV) case, the English, Dutch, first-price and second-price auctions all produce the same expected revenue for the seller. The more general form of the theorem, Theorem 10.1.4, is due to Myerson [1981] and Riley and Samuelson [1981], who also investigated optimal (that is, revenue-maximizing) auctions. Our proof of the theorem follows [Klemperer, 1999a]. The "auctions as structured negotiations" point of view was advanced by Wurman et al. [2001]. McAfee and McMillan [1987] introduced the notion of auctions with an uncertain number of bidders, and Harstad et al. [1990] analyzed its equilibrium. The so-called "Wilson doctrine" was articulated in [Wilson, 1987b]. The result that one additional bidder yields more revenue than an optimal reserve price is due to Bulow and Klemperer [1996]. The most influential theoretical studies of collusion were by Graham and Marshall [1987] for second-price auctions and McAfee and McMillan [1992] for first-price auctions. Early important results on the common-value (CV) model include [Wilson, 1969] and [Milgrom, 1981]. The former showed that when bidders are uncertain about their values their bids are not truthful, but rather are lower than their assessment of that value. Milgrom [1981] analyzed the symmetric equilibrium for second-price auctions under common values. The affiliated values model was introduced by [Milgrom & Weber, 1982].

The equilibrium analysis of sequential auctions is due to Milgrom and Weber [2000], from a seminal paper written in 1982 but only published recently. The random sampling optimal price auction, and the proof that this auction achieves a constant fraction of the optimal fixed-price revenue, are due to Goldberg et al. [2006]. Our discussion of position auctions generally follows Edelman et al. [2007]; see also [Varian, 2007].

Combinatorial auctions are covered in depth in the edited collection [Cramton et al., 2006], which probably provides the best single-source overview of the area. Several chapters of this book are especially worthy of note here. The computational complexity of the WDP is discussed in a chapter by Lehmann et al. Algorithms for the WDP have an involved history, and are reprised in chapters by Muller and Sandholm. A detailed discussion of bidding languages can be found in a chapter by Nisan; the OR* language is due to Fujishima et al. [1999a]. Iterative mechanisms are covered in chapters by Parkes, Ausubel and Milgrom, and Sandholm and Boutilier; the worst-case communication complexity analysis appears in a chapter by Segal. Finally, the tractable greedy mechanism is due to Lehmann et al. [2002].

There is a great deal of empirical evidence that prediction markets can effectively aggregate beliefs; two prominent examples of this literature consider election results [Berg et al., 2001] and movie box-office revenues [Spann & Skiera, 2003]. Dynamic


parimutuel markets are due to Pennock [2004], and the market scoring rule is described by Hanson [2003].


11 Teams of Selfish Agents: An Introduction to Coalitional Game Theory

In Chapter 1 we looked at how teams of cooperative agents can accomplish more together than they can achieve in isolation. Then, in Chapter 2 and many of the chapters that followed, we looked at how self-interested agents make individual choices. In this chapter we interpolate between these two extremes, asking how self-interested agents can combine to form effective teams. As the title of the chapter suggests, this chapter is essentially a crash course in coalitional game theory, also known as cooperative game theory. As was mentioned at the beginning of Chapter 2, when we introduced non-cooperative game theory, the term 'cooperative' can be misleading. It does not mean that, as in Chapter 1, each agent is agreeable and will follow arbitrary instructions. Rather, it means the basic modelling unit is the group, rather than the individual agent. More precisely, in coalitional game theory we still model the individual preferences of agents, but not their possible actions; instead we have a coarser model of the capabilities of different groups.

We will proceed as follows. We will first give several successively simpler models of coalitional games, leading to coalitional games with transferable utility, which is the focus of this chapter. We will then define its two primary and best-known solution concepts—the core and the Shapley value (we will also briefly mention several other solution concepts from the literature). Next we present some basic results concerning these solution concepts. We conclude with a discussion of complexity considerations in coalitional game theory.

11.1 Definition of coalitional games

In coalitional game theory our focus is on what groups of agents, rather than individual agents, can achieve. Given a set of agents, a coalitional game defines how well each group of agents (or coalition) can do for itself. We are not concerned with how the agents make individual choices within a coalition, how they coordinate, or any other such detail; we simply take the payoff1 to a coalition as given.

1. Alternatively, one might assign costs instead of payoffs to coalitions. Throughout this chapter, we will focus on the case of payoffs; the concepts defined herein can be extended analogously to the case of costs.


In general, the payoff to a coalition will depend on what other coalitions are formed. Thus, we define a general coalitional game as follows.

Definition 11.1.1 (Coalitional game in partition form) A coalitional game in partition form is a pair (N, p) where

• N is a finite set of players, indexed by i.

• p associates each partition π of N (also known as a coalitional structure) and a coalition S ∈ π with p(π, S) ⊆ R^S, to be interpreted as the set of payoffs that S can achieve for its members under partition π.

It is often the case that the payoff to a coalition is independent of the coalitional structure. In other words, there are no externalities among the coalitions: the value of a coalition depends only on its membership, and not on agents that do not belong to the coalition. Under this commonly made assumption, we can simplify the definition of a coalitional game.

Definition 11.1.2 (Coalitional game, non-transferable utility) A coalitional game (with non-transferable utility) is a pair (N, v) where

• N is a finite set of players, indexed by i.

• v associates each coalition S ⊆ N with v(S) ⊆ R^|S|, to be interpreted as the set of payoffs that S can achieve for its members.

In many situations the definition can be simplified even further. This occurs when the payoffs to a coalition may be freely redistributed among its members, an assumption commonly known as the transferable utility assumption. This assumption is often satisfied when there is a currency for exchange in the system. Under this assumption, the set of feasible payoffs to a coalition can be summarized as a single number.

Definition 11.1.3 (Coalitional game, transferable utility) A coalitional game with transferable utility is a pair (N, v) where

• N is a finite set of players, indexed by i.

• v associates each coalition S ⊆ N with a real-valued payoff v(S) ∈ R, known as its worth, that can be distributed in any manner among its members. The function v is sometimes referred to as the characteristic function.

Regardless of which coalitional game model we adopt, it is common to ground the payoff of the empty coalition at 0, i.e., v(∅) = 0. Consider the following two examples:

• In the Treasure of Sierra Madre game, a set N of gold prospectors find a treasure of many (more than 2|N|) gold pieces in the mountains. Each piece can be carried by two people but not by a single person. The situation can be modeled by a game (N, v), where v(S) = ⌊|S|/2⌋.


• In the three-player majority game there are 3 agents who need to agree on a course of action. If all three agree then the group gets a payoff of 1. Any two of them can guarantee a payoff of α to the pair, independently of the actions of the third player (where 0 ≤ α ≤ 1). No player can guarantee himself anything on his own. The situation can be modeled by a game (N = {1, 2, 3}, v), where v(N) = 1, v(S) = α if |S| = 2, and v(S) = 0 if |S| = 1. (Both games' characteristic functions are written out in the short sketch below.)
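A minimal sketch of the two characteristic functions (the particular value chosen for α is an arbitrary point in [0, 1]):

```python
def treasure_value(S):
    """Treasure of Sierra Madre: each gold piece must be carried by two people."""
    return len(S) // 2

def majority_value(S, alpha=0.5):
    """Three-player majority game; pairs can guarantee alpha, singletons nothing."""
    if len(S) == 3:
        return 1.0
    if len(S) == 2:
        return alpha
    return 0.0

print(treasure_value({1, 2, 3, 4, 5}))   # 2
print(majority_value({1, 2}))            # 0.5 (= alpha)
```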

Note that for both of these examples, had we chosen to represent these games as non-cooperative games, the action space of each agent would have been complex. For example, it would have to specify both which groups of agents an agent is going to work with, and how much payoff the agent should demand from the group. Even if such a representation is made possible through the discretization of payoffs, it is not useful, as it obscures the true nature of these games. The model of cooperative games abstracts away precisely such low-level details and focuses on the bigger picture of the coalitions formed and the payoffs agents can expect to receive. Note that there is nothing inherently cooperative about these games; when some agent receives more, some other agent generally receives less, because of the bargaining that takes place within a coalition.

In the remainder of this chapter, we will focus on analyzing coalitional games with transferable utility. We first describe a few important classes of coalitional games that occur often in practice. We then describe two of the most important solution concepts in coalitional games—the core and the Shapley value—which will provide us the means to answer questions such as which coalitions we expect to form and what payoffs should be made to each agent. Finally, we will look at the computational complexity aspects of coalitional games.

11.2 Classes of coalitional games

In this section we will define a few important classes of coalitional games, which have interesting applications as well as interesting formal properties. We start with the notion of super-additivity, a property often assumed for coalitional games.

Definition 11.2.1 (Super-additive game) A game G = (N, v) is super-additive if for all S, T ⊂ N, if S ∩ T = ∅, then v(S ∪ T) ≥ v(S) + v(T).

Super-additivity is justified when coalitions can always work without interfering with one another; hence, the value of two coalitions will be no less than the sum of their individual values. Note that super-additivity implies that the value of the entire set of players (the 'grand coalition') is no less than the sum of the values of any non-overlapping set of coalitions. In other words, the grand coalition has the highest payoff among all coalitional structures.

Taking the non-interference across coalitions to the extreme, when coalitions can never affect one another, either positively or negatively, we have additive (or inessential) games.

Definition 11.2.2 (Additive game) A game G = (N, v) is additive (or inessential) if for all S, T ⊂ N, if S ∩ T = ∅, then v(S ∪ T) = v(S) + v(T).


A related class of games is that of constant-sum games.

Definition 11.2.3 (Constant-sum game) A game G = (N, v) is constant-sum (sometimes also called zero-sum) if for all S ⊂ N, v(S) + v(N − S) = v(N).

Note that every additive game is necessarily constant-sum, but not vice versa. An important subclass of super-additive games is that of convex games.

Definition 11.2.4 (Convex game) A game G = (N, v) is convex if for all S, T ⊂ N, v(S ∪ T) ≥ v(S) + v(T) − v(S ∩ T).

Clearly, convexity is a stronger condition than super-additivity. While convex games may appear to be a very specialized class of coalitional games, they are actually not so rare in practice. For example, consider a typical scheduling application. Let the set of jobs be J and the set of machines be K, and suppose each job can only be scheduled on certain machines but not others. Suppose there is a value to each scheduled job; then the coalitional game (J ∪ K, v), where v(S) is the maximum value of the jobs in S that can be served by the machines in S, is a convex game.

Finally, we present a class of coalitional games with restrictions on the types of payoffs.

Definition 11.2.5 (Simple game) A game G = (N, v) is simple if for all S ⊂ N, v(S) ∈ {0, 1}.

One way to think of simple games is as games in which each coalition can either succeed or fail. Simple games are especially applicable for modeling voting situations. A successful (failed) coalition is thus sometimes referred to as a winning (losing) coalition. Therefore, it is often assumed that in simple games, if S is a winning coalition, then all T ⊃ S are also winning coalitions. If a simple game further satisfies the property of being constant-sum (that is, if S is a winning coalition, then N − S is a losing coalition), then it is known as a proper simple game.

Figure 11.1 graphically depicts the relationship between the different classes of games.

[Figure 11.1 A hierarchy of coalitional game classes (super-additive, convex, additive, constant-sum, simple, proper simple); X ⊃ Y means that class X is a superclass of class Y.]

11.3 Analyzing coalitional games

The central question in coalitional game theory is the division of the payoff to the grand coalition among the agents. As in non-cooperative game theory, coalitional games have


multiple solution concepts. Different solution concepts attempt to capture different facets of stability and fairness. Here we will focus on two solution concepts, the core and the Shapley value, and briefly mention a few others toward the end of the section.

11.3.1 The core

Why do we focus on the division of the payoff to the grand coalition? This is often justified on two grounds. Firstly, since many of the most widely studied games are super-additive, the grand coalition will be the coalition that achieves the highest payoff over all coalitional structures, and hence we can expect it to form. Secondly, games arising from public projects are often legally bound to include all participants, and hence there may be no choice for the agents but to form the grand coalition.

We start with a few basic definitions concerning payoff division.

Definition 11.3.1 (x(S)) Let x ∈ R^N. For S ⊆ N, we will use the notation x(S) to denote the value ∑_{i∈S} x_i.

Definition 11.3.2 (Feasible payoff set) Given a coalitional game (N, v),

• the feasible payoff set is defined as {x ∈ R^N | x(N) ≤ v(N)};

• the pre-imputation set, P, is defined as {x ∈ R^N | x(N) = v(N)};

• the imputation set, I, is defined as {x ∈ P | ∀i ∈ N, x_i ≥ v({i})}.

We may refer to the set of imputations as the set of payoff vectors.

We can view the pre-imputation set as the set of feasible payoffs that are efficient, as they distribute all the payoffs from the grand coalition. The imputation set further satisfies individual rationality, as each agent is guaranteed a payoff of at least the amount that he can achieve on his own.

We are now ready to define the solution concept of the core. The core is a descriptive concept that attempts to define stability of payoffs in a coalitional game.

Definition 11.3.3 (Core) A payoff vector x is in the core of a coalitional game (N, v) if and only if

    ∀S ⊆ N,   x(S) ≥ v(S).   (11.1)

In other words, a payoff is in the core if and only if no coalition has an incentive to break off from the grand coalition and form its own group. We can view this as a stronger condition than the concept of Nash equilibrium in non-cooperative games, as the latter requires stability only with respect to deviation by a single agent.

As an application of the core, consider the scheduling game we mentioned in the last section. We know that this game is convex, and hence the grand coalition is guaranteed to have the highest total payoff over all coalitional structures. But can we find a payoff that will economically enforce this efficient outcome? To do so, such a payoff must be


stable in the sense of the core. As we will show later, the core of the game is non-empty, and so the socially optimal outcome can indeed be enforced through suitable payoffs.

The concept of the core seems natural enough, but two questions arise due to its implicit definition through inequalities:

1. Is the core always non-empty?

2. Is the core always unique?

Unfortunately, the answer to both questions is no. Consider, for example, the Treasure of Sierra Madre game. If |N| ≥ 4 is even then the core consists of a single payoff vector (1/2, . . . , 1/2). However, if |N| ≥ 3 is odd then the core is empty (why?). As another example, in the three-player majority game the core consists of all payoff vectors in which the payoff to the grand coalition is 1 and the payoff to any pair is at least α. Thus the payoff vector (1/3, 1/3, 1/3) is the only element of the core when α = 2/3, the core is not unique when α < 2/3, and is empty if α > 2/3.
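These membership claims are easy to check mechanically. A minimal sketch (enumerating every coalition, so only practical for small games) that tests whether a proposed payoff vector lies in the core:

```python
from itertools import chain, combinations

def in_core(players, v, x, tol=1e-9):
    """Check whether payoff vector x (a dict) is in the core of (players, v):
    x must distribute v(N), and no coalition S may have v(S) > x(S)."""
    players = list(players)
    coalitions = chain.from_iterable(
        combinations(players, r) for r in range(1, len(players) + 1))
    if abs(sum(x[i] for i in players) - v(set(players))) > tol:
        return False                                   # not efficient
    return all(sum(x[i] for i in S) >= v(set(S)) - tol for S in coalitions)

# Three-player majority game with alpha = 2/3:
v = lambda S: 1.0 if len(S) == 3 else (2 / 3 if len(S) == 2 else 0.0)
print(in_core([1, 2, 3], v, {1: 1/3, 2: 1/3, 3: 1/3}))   # True
print(in_core([1, 2, 3], v, {1: 0.5, 2: 0.5, 3: 0.0}))   # False: pairs with 3 get < 2/3
```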

These examples call into question the universality of the core as a solution concept for coalitional games. We already saw in the context of non-cooperative game theory that solution concepts—notably, the Nash equilibrium—do not yield unique solutions in general. Here we are in an arguably worse situation, in that the solution concept may yield no solution at all. Can we characterize when a coalitional game has a non-empty core? Fortunately, that at least is possible.

Definition 11.3.4 (Characteristic vector) The characteristic vector, e_S, of a coalition S ⊆ N is the binary vector in {0, 1}^N whose ith entry is 1 if i ∈ S and 0 otherwise.

Definition 11.3.5 (Balanced weights) A set of non-negative weights (over 2^N), λ, is balanced if

    ∑_{S⊆N} λ(S) e_S = e_N.

Theorem 11.3.6 (Bondareva-Shapley) A coalitional game (N, v) has a non-empty core if and only if, for every balanced set of weights λ,

    v(N) ≥ ∑_{S⊆N} λ(S) v(S).   (11.2)

Proof. Consider the following linear program:

    min_x   x(N)
    subject to   x(S) ≥ v(S)   ∀S ⊆ N

Note that we can find a payoff vector x such that x(N) = v(N) if and only if the


core is non-empty. Now consider its dual:

    max_λ   ∑_{S⊆N} λ(S) v(S)
    subject to   ∑_{S⊆N : i∈S} λ(S) = 1   ∀i ∈ N
                 λ(S) ≥ 0   ∀S ⊆ N

Note that the linear constraints in the dual ensure that λ is balanced. By LP duality, the optimal value of the dual equals the optimal value of the primal, and hence the core is non-empty if and only if the optimal value of the dual is no greater than v(N).
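The primal program in this proof also yields a direct computational test for non-emptiness: solve it and compare the optimum to v(N). A minimal sketch (assuming SciPy is available; it enumerates all coalitions, so it is only practical for small games):

```python
from itertools import chain, combinations
import numpy as np
from scipy.optimize import linprog

def core_is_nonempty(players, v, tol=1e-9):
    """Test core non-emptiness by solving the primal LP from the proof:
    minimize x(N) subject to x(S) >= v(S) for every coalition S.  The core
    is non-empty exactly when the optimal value equals v(N)."""
    players = list(players)
    coalitions = [set(c) for c in chain.from_iterable(
        combinations(players, r) for r in range(1, len(players) + 1))]
    # linprog minimizes c @ x subject to A_ub @ x <= b_ub; rewrite each
    # constraint x(S) >= v(S) as -x(S) <= -v(S).
    c = np.ones(len(players))
    A_ub = np.array([[-1.0 if p in S else 0.0 for p in players] for S in coalitions])
    b_ub = np.array([-v(S) for S in coalitions])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * len(players))
    return res.fun <= v(set(players)) + tol

# Three-player majority game: the core is empty exactly when alpha > 2/3.
maj = lambda a: (lambda S: 1.0 if len(S) == 3 else (a if len(S) == 2 else 0.0))
print(core_is_nonempty([1, 2, 3], maj(0.6)))   # True
print(core_is_nonempty([1, 2, 3], maj(0.8)))   # False
```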

While the Bondareva-Shapley theorem completely characterizes when a coalitional game has a non-empty core, it is not always easy or feasible to check that a game satisfies Equation (11.2) for all balanced sets of weights. Instead, we list below a few results regarding emptiness of the core that are related to the classes of coalitional games we defined earlier.

Theorem 11.3.7

• Every constant-sum game that is not additive has an empty core.

• In a simple game the core is empty iff there is no veto player, that is, a player i such that v(N \ {i}) = 0. If there are veto players, the core consists of all payoff vectors in which the non-veto players get 0.

• Every convex game has a non-empty core.

11.3.2 The Shapley value

We now turn our attention to another important solution concept of coalitional games—the Shapley value. The Shapley value is a normative solution concept that prescribes a fair way of dividing the payoffs among the agents.

Definition 11.3.8 (Shapley value) Given a coalitional game (N, v), the Shapley value of player i is given by

    φ_i(v) = (1/|N|!) ∑_{S ⊆ N\{i}} |S|! (|N| − |S| − 1)! [v(S ∪ {i}) − v(S)].

This expression can be viewed as capturing the "average marginal contribution" of agent i. More specifically, imagine that the agents form a coalition by incrementally adding one agent at a time, starting with the empty set. Within any such sequence of additions, look at agent i's marginal contribution at the time he is added. If he is added to the set S, his contribution is [v(S ∪ {i}) − v(S)]. Now multiply this quantity by the |S|! different ways the set S could have been formed prior to agent i's addition, and the


(|N| − |S| − 1)! different ways the remaining agents could have been added after him. Finally, sum this over all possible sets S, and average the sum over all |N|! orderings of all the agents.

For a concrete example of the Shapley value in action, consider a parliament consisting of four different political parties, A, B, C, and D, which have 45, 25, 15, and 15 representatives, respectively. The parliament is voting on whether to pass a $100 million spending bill and how much of this amount should be controlled by each of the parties. A majority vote is required in order to pass any legislation, and if the bill does not pass every party gets zero to spend. We calculate the split of the funds according to the Shapley values. Every coalition with 51 or more members has a value of $100 million, and all other coalitions have a value of $0. Note that parties B, C, and D are interchangeable, since they add the same value to any coalition. (They add $100 million to the coalitions among {B, C}, {C, D}, and {B, D} that do not include them already, and to {A}; they add $0 to all other coalitions.) The Shapley value of A is given by:

    φ_A = (3) [(4 − 2)!(2 − 1)!/4!] (100 − 0) + (3) [(4 − 3)!(3 − 1)!/4!] (100 − 0)
              + (1) [(4 − 4)!(4 − 1)!/4!] (100 − 100)
        = (3)(2/24)(100) + (3)(2/24)(100) + 0
        = 25 + 25 = $50 million

The Shapley value for B (and, by symmetry, also for C and D) is given by:

    φ_B = [(4 − 2)!(2 − 1)!/4!] (100 − 0) + [(4 − 3)!(3 − 1)!/4!] (100 − 0)
        = (2/24)(100) + (2/24)(100)
        = 8.33 + 8.33 = $16.66 million

Thus the Shapley values are (50, 16.66, 16.66, 16.66), which add up to the entire $100 million.
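The permutation view of the Shapley value translates directly into code. A minimal sketch (brute force over all |N|! orderings, so only suitable for small games, with values in millions of dollars) reproduces the numbers above:

```python
from itertools import permutations

def shapley_values(players, v):
    """Shapley values by averaging marginal contributions over all |N|!
    orderings of the players (practical only for small games)."""
    players = list(players)
    phi = {i: 0.0 for i in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for i in order:
            phi[i] += v(coalition | {i}) - v(coalition)
            coalition.add(i)
    return {i: total / len(orderings) for i, total in phi.items()}

# The parliament example: seats A=45, B=25, C=15, D=15; a coalition is worth
# $100 million exactly when it commands a majority of the 100 seats.
seats = {"A": 45, "B": 25, "C": 15, "D": 15}
v = lambda S: 100.0 if sum(seats[p] for p in S) >= 51 else 0.0
print(shapley_values(seats, v))
# -> {'A': 50.0, 'B': 16.67, 'C': 16.67, 'D': 16.67} (values in $ millions)
```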

The Shapley value is interesting in a number of ways. For one thing, unlike the core, it always exists and is unique. It also embodies a notion of fairness. Note that this fairness comes at the price of weakening the notion of stability, compared to the group stability inherent in the definition of the core. In the example above, while A does not have a unilateral motivation to vote for a different split, certainly A and (say) B have an incentive to defect and divide the $100 million between them.

But the Shapley value is particularly striking because it can be captured axiomatically. We make the following definitions. An agent i is a dummy if v(S ∪ {i}) − v(S) = v({i}) for all S. Agents i and j are interchangeable if v(S ∪ {i}) = v(S ∪ {j}) for all S that contain neither i nor j. Now consider the following axioms:

Axiom 11.3.9 (SYM (symmetry)) For any v, if i and j are interchangeable then φ_i(v) = φ_j(v).

Axiom 11.3.10 (DUM (dummy player)) For any v, if i is a dummy player then φ_i(v) = v({i}).


Axiom 11.3.11 (ADD (additivity)) For any two v and w, we have for any player i that φ_i(v + w) = φ_i(v) + φ_i(w), where the game (N, v + w) is defined by (v + w)(S) = v(S) + w(S) for every coalition S.

These axioms—especially the third—are not without controversy, but they do give rise to the following characterization theorem:

Theorem 11.3.12 (Shapley) The Shapley value is the unique solution concept that satisfies SYM, DUM and ADD, and that distributes the entire payoff of the grand coalition (i.e., that is efficient).

Among other things, this theorem allows us to easily calculate the Shapley values in both the Treasure of Sierra Madre game and the three-player majority game; by symmetry, the Shapley value of all agents is the same, and is the result of dividing the payoff of the grand coalition equally among all the agents.

A final question we consider regards the relationship between the core and the Shapley value. We know that the core may be empty, but if it is not, is the Shapley value guaranteed to lie in the core? The answer in general is no, but the following theorem gives us a sufficient condition for this property to hold. We already know from Theorem 11.3.7 that the core of convex games is non-empty. The following theorem further tells us that the Shapley value is an element of the core:

Theorem 11.3.13 In every convex coalitional game, the Shapley value belongs to the core.

11.3.3 Other solution concepts

We wrap up the discussion on the analysis of coalitional games by briefly mentioning a few other major solution concepts that have been proposed for coalitional games.

The nucleolus is a solution concept that refines the core and addresses the possible non-existence and non-uniqueness of the core. We first need to define the concept of excess. Given a payoff vector x, we can compute the difference v(S) − x(S) for each S ⊆ N, known as the coalition's excess. The nucleolus is the payoff vector that lexicographically minimizes the (non-increasingly sorted) vector of excesses of all coalitions. Such a payoff vector is guaranteed to exist, and as it happens, it is also unique.

Other solution concepts, such as the stable sets, the bargaining set, and the kernel, attempt to address the fact that the core does not require that, when a coalition deviates, it deviate to a stable coalition itself. We will not discuss these solution concepts in detail; the notes at the end of the chapter point to further readings on this topic.

11.4 Computing coalitional game theory solution concepts

Recall that, after we covered the basic notions of non-cooperative game theory, we devoted an extensive discussion (e.g., Chapter 3) to ways in which complexity issues interact with game theory. It is natural to do the same for coalitional game theory.


Here, however, the discussion is briefer. This in part reflects the vagaries of a scientific community that over time gravitated more towards non-cooperative game theory, and in part reflects some inherent properties of coalitional game theory.

It is natural to ask about the computational complexity of answering some basic questions, such as:

• Is the core of the game empty?

• Is this given payoff in the core?

• What is the Shapley value of this game?

However, when examined closely, these complexity questions become less interesting than they seem at first. The problem lies in the very definition of a coalitional game. Recall that in order to define such a game, one must specify the value of every coalition—and there is an exponential number of those. So even if one provides an efficient (say, polynomial-time) algorithm to solve such a problem, its running time would still be exponential in the number of agents. Nonetheless, we will start by considering these questions in the most general setting, where the value function is explicitly given. We will then look at a more specific setting, which does not apply in all cases, but which gives rise to more meaningful complexity questions.

First, note that the core is defined by a set of 2^n − 1 inequalities (the constraints for no deviation) and one equality (the payoffs sum to the value of the grand coalition) over n unknown variables (the payoffs to all agents). The core thus defines a linear program (see Appendix B for a review), which can be solved in polynomial time (again, polynomial in the input size, but exponential in the number of agents).
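To make the linear-programming view concrete, the following sketch (assuming SciPy's linprog is available; the three-player majority game is used purely as an illustrative input) tests core non-emptiness by checking feasibility of the constraints x(S) ≥ v(S) for every coalition S together with x(N) = v(N).

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

# Three-player majority game (illustrative): any coalition of size >= 2 gets 1.
n = 3
players = list(range(n))


def v(coalition):
    return 1.0 if len(coalition) >= 2 else 0.0


# One inequality per nonempty proper coalition S: sum_{i in S} x_i >= v(S),
# written as -sum_{i in S} x_i <= -v(S) to fit linprog's A_ub x <= b_ub form.
A_ub, b_ub = [], []
for size in range(1, n):
    for S in combinations(players, size):
        row = np.zeros(n)
        row[list(S)] = -1.0
        A_ub.append(row)
        b_ub.append(-v(S))

# Efficiency: payoffs must sum to v(N).
A_eq = [np.ones(n)]
b_eq = [v(players)]

res = linprog(c=np.zeros(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(None, None)] * n)

print("core is non-empty" if res.success else "core is empty")
# For the three-player majority game the constraints are infeasible: the core is empty.
```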

Similarly, the Shapley value can be directly computed by a sum over all n! permutations of the n agents. As a shortcut, one only needs to compute the average marginal contribution of an agent over all possible subsets not including the agent, scaled appropriately by the factor

\[
\frac{|S|!\,(n - |S| - 1)!}{n!}.
\]

By approximating the factorials with the well-known Stirling formula (which states that n! ∼ √(2πn) (n/e)^n), the Shapley value can be computed approximately in time O(N) (where, again, N = 2^n is the size of the input).

11.5 Compact representations of coalitional games

The above results are not very illuminating, given the fact that the input size will always be exponential in the number of agents. In order to ask more interesting questions we must first find a game representation that is more compact. In general one cannot compress the representation, but many natural settings contain inherent symmetries and independencies that do give rise to more compact representations. Conceptually, this section mirrors the discussion in Section 5.5, which considered compact representations of non-cooperative games.

In the following, we will discuss a number of compactly represented coalitional games. These games can be roughly categorized into two types. First, we look at games that arise from specific applications, such as the weighted graph games and the weighted majority games. We then look at representation languages designed to represent coalitional games. This includes the synergy representation for super-additive games, and the multi-issue representation and marginal contribution nets for general games.

11.5.1 Weighted graph games

A weighted graph game (WGG) is a coalitional game defined by an undirected weighted graph (that is, a set of nodes and a real-valued weight associated with every unordered pair of nodes). Intuitively, the nodes in the graph represent the players, and the value of a coalition is obtained by summing the weights of the edges that connect pairs of vertices in the coalition. WGGs thus explicitly model only pairwise synergies among players and assume that all such synergies add to the coalition in an additive fashion. The payoff for this reduced expressiveness is a compact representation; a game with n agents is represented by n(n−1)/2 weights. Formally:

Definition 11.5.1 (Weighted graph game) The weighted graph game (WGG) defined by the undirected weighted graph (V, W), where V is the set of vertices and W ∈ R^{V×V} is the set of edge weights (w(i, j) being the weight of the edge between the vertices i and j), is the coalitional game in which:

• N = V ;

• v(S) = ∑_{i,j∈S} w(i, j).

An example that WGGs model well is the revenue-sharing game. In this game, the nodes represent cities, and the edges represent highways, with the weights corresponding to toll revenues. Solving the coalitional game then amounts to deciding how to divide the toll revenues among the cities.
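As a quick illustration of Definition 11.5.1, the sketch below builds the characteristic function of a tiny WGG; the three nodes and edge weights are invented for the example.

```python
from itertools import combinations

# Hypothetical revenue-sharing instance: cities as nodes, toll revenues as edge weights.
edge_weights = {
    frozenset({"u", "v"}): 4.0,
    frozenset({"v", "w"}): 2.0,
    frozenset({"u", "w"}): -1.0,  # negative synergies are allowed
}


def wgg_value(coalition):
    """v(S): sum the weights of edges with both endpoints in S."""
    return sum(w for edge, w in edge_weights.items() if edge <= set(coalition))


for size in range(4):
    for S in combinations("uvw", size):
        print(set(S) or "{}", wgg_value(S))
```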

The following is a direct consequence of the definitions:

Proposition 11.5.2 If all the weights are non-negative then the game is convex.

Thus we know that in this case WGGs have a non-empty core, and furthermore that it contains the Shapley solution. But the core may contain additional payoff vectors, and it is natural to ask whether testing membership is easy or hard. The answer is given by the following:

Proposition 11.5.3 If all the weights are non-negative then membership of a payoff vector in the core can be tested in polynomial time.

The proof is omitted; it relies on a max-flow-type algorithm. The Shapley value is also easy to compute. First note the following:

Theorem 11.5.4 The Shapley value of a weighted graph game (V, W) is

\[
\phi(i) \;=\; \frac{1}{2} \sum_{j \neq i} w(i, j).
\]


Proof. Consider the contribution of the edge (i, j) to φ(i). For every subset S containing both i and j, it contributes (n−|S|)!(|S|−1)!/n! · w(i, j). There are

\[
\binom{n-2}{k-2}
\]

subsets of size k that contain both i and j, so all subsets of size k together contribute

\[
\binom{n-2}{k-2}\,\frac{(n-k)!\,(k-1)!}{n!}\,w(i,j) \;=\; \frac{k-1}{n(n-1)}\,w(i,j).
\]

Summing over k = 2, 3, . . . , n we obtain the result.
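Theorem 11.5.4 reduces the computation to halving each node's total incident edge weight, which is the O(n²) procedure noted next; a minimal sketch, reusing the invented edge weights from the earlier snippet:

```python
def wgg_shapley(player, edge_weights):
    """phi(i) = (1/2) * sum of weights of edges incident to i (Theorem 11.5.4)."""
    return 0.5 * sum(w for edge, w in edge_weights.items() if player in edge)


edge_weights = {
    frozenset({"u", "v"}): 4.0,
    frozenset({"v", "w"}): 2.0,
    frozenset({"u", "w"}): -1.0,
}
print({p: wgg_shapley(p, edge_weights) for p in "uvw"})
# u: 1.5, v: 3.0, w: 0.5 -- the values sum to v(N) = 5.0
```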

It follows that we can compute the Shapley value in O(n^2). Answering questions regarding the core of WGGs is more complex. We first note the following proposition (recall that a cut in a graph is a set of edges that divides the nodes into two disjoint sets, that the weight of a cut is the sum of the weights of its edges, and that a negative cut is one whose weight is negative):

Proposition 11.5.5 The Shapley solution is in the core of a weighted graph game if and only if there is no negative cut in the weighted graph.

Proof. Note that while the value of a coalition S is the sum of the weights within S, the sum of the Shapley values of the agents in S is that same sum plus half the total weight of the edges between S and N \ S. But the edges between S and N \ S form a cut. Clearly, if that cut is negative then the Shapley value cannot be in the core; and since this holds for all sets S, the converse is also true.

And so we get as a consequence that if the weighted graph contains no negative cuts, the core cannot be empty. The next theorem turns this into a necessary and sufficient condition:

Theorem 11.5.6 The core of a weighted graph game is non-empty if and only if there is no negative cut in the weighted graph.

Proof. The if part follows from the proposition above. For the only-if part, suppose we have a negative cut in the graph between S and N − S, and write v(S, N − S) for its weight. We then have φ(S) − v(S) = φ(N − S) − v(N − S) = v(S, N − S)/2 < 0. For any payoff vector x, we have x(N) = x(S) + x(N − S) = v(N) = φ(N) = φ(S) + φ(N − S). So x(S) − v(S) + x(N − S) − v(N − S) = φ(S) − v(S) + φ(N − S) − v(N − S) = v(S, N − S) < 0, and hence one of x(S) − v(S) or x(N − S) − v(N − S) has to be negative. So x is not in the core, and the core is empty.

These theorems suggest that one test for non-emptiness of the core is to check whether the Shapley solution lies in the core. However, despite all these promising indications, the problem remains intractable for general WGGs:

Theorem 11.5.7 Testing the non-emptiness of the core of a general weighted graph game is NP-complete.

The proof, which is omitted, is based on a reduction from MAXCUT, a well-known NP-complete problem.


11.5.2 Weighted majority games

Weighted majority games are a compact form of encoding voting situations. The definition is straightforward.

Definition 11.5.8 (Weighted majority game) A weighted majority game is defined by the weights w(i) assigned to each player. Let W = ∑_{i=1,...,n} w(i). The value of a coalition S is then 1 if ∑_{i∈S} w(i) > W/2, and 0 otherwise.
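The characteristic function of Definition 11.5.8 is a one-liner; the weights below are illustrative only.

```python
def weighted_majority_value(coalition, weights):
    """v(S) = 1 iff the coalition controls a strict majority of the total weight."""
    return 1 if sum(weights[i] for i in coalition) > sum(weights.values()) / 2 else 0


weights = {"A": 45, "B": 25, "C": 15, "D": 15}  # illustrative weights
print(weighted_majority_value({"A", "B"}, weights))       # 1: 70 > 50
print(weighted_majority_value({"B", "C", "D"}, weights))  # 1: 55 > 50
print(weighted_majority_value({"A"}, weights))            # 0: 45 <= 50
```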

Since this game is simple, testing the non-emptiness of the core is equivalent to testing the existence of a veto player, which can be done quickly. However, the situation is more complex for the Shapley value.

Theorem 11.5.9 Computing the Shapley value in weighted majority games is #P-complete.³

The proof, which is omitted, is by a reduction from the counting version of KNAPSACK.

11.5.3 Capturing synergies: a representation for super-additive games

So far, we have looked at compact representations of coalitional games that arise naturally from the application, such as sharing surplus on a graph or voting. We now switch gears to consider compact representations that are designed with the intent to represent a diverse body of coalitional games.

The first one we will examine can be used to represent any super-additive game. As mentioned earlier in the chapter, super-additivity is sometimes assumed for coalitional games and is justified when coalitions do not exert negative externalities on each other.

Definition 11.5.10 (Synergy representation) The synergy representation of super-additive games is given by the pair (N, s), where N is the set of agents and s is a set function that maps each coalition to a value, to be interpreted as the synergy the coalition generates when the agents of the coalition work together. Only coalitions with strictly positive synergies are included in the specification of the game.

The underlying coalitional game (N, v) under representation (N, s) is given by

\[
v(S) \;=\; \left( \max_{\{S_1, S_2, \ldots, S_k\} \in \pi(S)} \sum_{i=1}^{k} v(S_i) \right) + s(S),
\]

where π(S) denotes the set of all partitions of S.
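The recursion in Definition 11.5.10 can be written down directly. The sketch below uses an invented synergy function and assumes that, for coalitions of two or more agents, the maximization ranges over partitions into two or more parts (so the recursion is well founded), with singleton values taken to equal their synergies. It already hints at the computational difficulty discussed next, since it searches over splits of the coalition.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical synergy function: only strictly positive synergies are listed.
synergies = {
    frozenset({"a"}): 1.0,
    frozenset({"b"}): 1.0,
    frozenset({"c"}): 1.0,
    frozenset({"a", "b"}): 2.0,
    frozenset({"a", "b", "c"}): 1.0,
}


def s(S):
    return synergies.get(frozenset(S), 0.0)


@lru_cache(maxsize=None)
def v(S):
    """Value under the synergy representation: best split into sub-coalitions, plus s(S)."""
    S = frozenset(S)
    if len(S) <= 1:
        return s(S)  # assumed base case: a lone agent's value is its own synergy
    first = min(S)  # fix one element so each two-part split is enumerated once
    rest = S - {first}
    best = 0.0
    for k in range(len(rest) + 1):
        for others in combinations(sorted(rest), k):
            part = frozenset({first, *others})
            remainder = S - part
            if remainder:  # skip the trivial "no split" case
                best = max(best, v(part) + v(remainder))
    return best + s(S)


print(v(frozenset({"a", "b", "c"})))  # 6.0 for the synergies above
```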

Note that for some super-additive games, the representation may still require space exponential in the number of agents. This is unavoidable, as the space of coalitional games of n agents, when treated as a vector space, has 2^n − 1 dimensions. However, for many games, the space required will be much less.

3. Recall that #P consists of the counting versions of the polynomial-time decision problems; for example, not deciding whether a Boolean formula is satisfiable, but rather counting the number of satisfying truth assignments.


We can evaluate the usefulness of a representation based on a number of criteria. One criterion is whether the representation exposes the structure of the underlying game to facilitate efficient computation. For example, is it easy to find out the value of a given coalition? Unfortunately, the answer is negative for the synergy representation.

Proposition 11.5.11 It is NP-complete to determine the value of some coalitions for a coalitional game specified with the synergy representation. In particular, it is NP-complete to determine the value of the grand coalition.

Intuitively, the reason why it is hard to determine the value of a coalition under the synergy representation is that we need to pick the best partitioning of the coalition, for which there is a number of choices exponential in the size of the coalition.

As a result, it is (co)NP-hard to determine whether a given payoff vector is in the core, and it is NP-hard to compute the Shapley value of the game, as a solution to either problem can be used to compute the value of the grand coalition. It is also (co)NP-hard to determine whether the core is empty or not, which follows from a reduction from Exact Cover by 3-Sets.

Interestingly, if the value of the grand coalition is given as part of the input of the problem, then the emptiness of the core can be determined efficiently.

Theorem 11.5.12 Given a super-additive coalitional game specified with the synergy representation, and the value of the grand coalition, we can determine in polynomial time whether the core of the game is empty or not.

The proof of the theorem is based on constructing a linear program for finding a payoff in the core.

11.5.4 A decomposition approach: multi-issue representation

The central idea behind the multi-issue representation, a representation based on game decomposition, is that of addition of games (in the sense of additivity in the axiomatization of the Shapley value). Formally, the multi-issue representation is given as follows.

Definition 11.5.13 (Multi-issue representation) A multi-issue representation is composed of a collection of coalitional games, each known as an issue, (N_1, v_1), (N_2, v_2), . . . , (N_k, v_k), which together constitute the coalitional game (N, v) where

• N = N_1 ∪ N_2 ∪ · · · ∪ N_k;

• for each coalition S ⊆ N, v(S) = ∑_{i=1}^{k} v_i(S ∩ N_i).
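A small sketch of how issue values compose; the two issues below are invented, loosely modeled on the robot-task example that follows.

```python
# Each issue is (set of participating agents, value function over subsets of them).
issues = [
    ({"r1", "r2"}, lambda S: 3 if {"r1", "r2"} <= S else 0),  # task 1 needs r1 and r2
    ({"r2", "r3"}, lambda S: 2 if S & {"r2", "r3"} else 0),   # task 2 needs r2 or r3
]


def multi_issue_value(coalition):
    """v(S) = sum over issues i of v_i(S intersect N_i)."""
    S = set(coalition)
    return sum(v_i(S & N_i) for N_i, v_i in issues)


print(multi_issue_value({"r1", "r2"}))        # 3 + 2 = 5
print(multi_issue_value({"r3"}))              # 0 + 2 = 2
print(multi_issue_value({"r1", "r2", "r3"}))  # 3 + 2 = 5
```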

Intuitively, each issue of the game involves some set of agents, which may partially overlap with the set of agents for another issue, and the value of a coalition in the game is the sum of the values achieved by the coalition on each issue. For example, consider a robotics domain where there are certain tasks to be performed, each of which can be performed by a certain subset of a group of robots. We can then treat each of these tasks as an issue and model the system as a coalitional game where the value of any group of robots is the sum of the values of the tasks the group can perform.

Clearly, the multi-issue representation can be used to represent any coalitional game, as we can always choose to treat the coalitional game as a single big issue.

From a computational standpoint, due to its close relation to additivity, it is perhaps not surprising that the Shapley value of a coalitional game specified in the multi-issue representation can be found efficiently. More precisely:

Proposition 11.5.14 The Shapley value of a coalitional game specified with the multi-issue representation can be computed in time linear in the size of the input.

This follows directly from our observations that the Shapley value of a game can be computed in time linear in the input when the input is a full specification of the value function, and that the Shapley value satisfies the additivity axiom.

On the other hand, the multi-issue representation does not help with the computational questions regarding the core. In particular, it is coNP-hard to determine if a given payoff vector belongs to the core when the game is specified with the multi-issue representation.

11.5.5 A logical approach: marginal contribution nets

Marginal contribution nets (MC-nets) are a representation scheme for coalitional games that attempts to harness the power of Boolean logic to reduce the space required for specifying a coalitional game. The basic idea is to treat each agent in a game as a Boolean variable and to treat the (binary) characteristic vector of a coalition as a truth assignment. This truth assignment can be used to evaluate whether a Boolean formula is satisfied, which can in turn be used to determine the value of a coalition. Formally:

Definition 11.5.15 (MC-net representation) An MC-net consists of a set of rules, each of which has the syntactic form (pattern, weight). The pattern is given by a Boolean formula, whereas the weight is a real value.

The coalitional game (N, v) specified by the MC-net (p_1, w_1), (p_2, w_2), . . . , (p_k, w_k) is given by

\[
v(S) \;=\; \sum_{i=1}^{k} p_i(e_S)\, w_i,
\]

where p_i(e_S) evaluates to 1 if the Boolean formula p_i evaluates to true for the truth assignment e_S, and to 0 otherwise.

As an example, consider an MC-net with two rules: (a ∧ b, 5) and (b, 2). The coalitional game represented has two agents, a and b, and the following value function:

v(∅) = 0        v({a}) = 0
v({b}) = 2      v({a, b}) = 5 + 2 = 7
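Rule evaluation for this example can be sketched as follows, restricting patterns to conjunctions of positive and negative literals; the rule encoding below is an assumption made for the illustration.

```python
# Each rule: (positive literals, negative literals, weight).
# The pattern is a conjunction: all positives must be in S, no negative may be.
rules = [
    ({"a", "b"}, set(), 5.0),  # (a AND b, 5)
    ({"b"}, set(), 2.0),       # (b, 2)
]


def mc_net_value(coalition, rules):
    """v(S) = sum of weights of rules whose pattern is satisfied by S."""
    S = set(coalition)
    return sum(w for pos, neg, w in rules if pos <= S and not (neg & S))


for S in [set(), {"a"}, {"b"}, {"a", "b"}]:
    print(S or "{}", mc_net_value(S, rules))
# {} 0.0, {a} 0.0, {b} 2.0, {a, b} 7.0
```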

An alternative interpretation of an MC-net is as a graphical representation. We can treat the agents as nodes of a graph and, for each pattern, draw a clique on the graph over the agents that appear in that pattern. The weight of a rule is the weight associated with the corresponding clique. This graphical representation is similar to that used for representing probability distributions, namely the Markov network.

A natural question for a representation language is whether there is a limit on the class of objects it can represent. For MC-nets, we can show the following:

Proposition 11.5.16 MC-nets can represent only convex games when the patterns are limited to conjunctive formulas over positive literals and the weights are non-negative. MC-nets can represent any game when negative literals are allowed in the patterns or when the weights can be negative.

Another question is how the language relates to other representation languages proposed for the problem. For MC-nets, we can show that they generalize two of the previously discussed representations: WGGs and the multi-issue representation. MC-nets can be viewed as a generalization of WGGs that assigns weights to hyper-edges rather than to simple edges; a pattern in this case specifies the agents that share the same edge. Moreover, since MC-nets can represent any coalitional game, they constitute a strict generalization of WGGs. MC-nets can also be viewed as a generalization of the multi-issue representation; in this case, each issue is represented by patterns that only involve agents that are relevant to the issue. Comparing the two representations, for any coalitional game, an MC-net will require at most O(n) more space than the multi-issue representation. On the other hand, there exist coalitional games for which an MC-net can be exponentially (in the number of agents) more succinct than the multi-issue representation.

From a computational standpoint, when limited to conjunctions in the Boolean formulas, MC-nets preserve the same ease of computation of the Shapley value as the multi-issue representation.

Theorem 11.5.17 Given a coalitional game specified with an MC-net limited to conjunctive patterns, the Shapley value can be computed in time linear in the size of the input.

However, when other logical connectives are allowed, there is no known algorithm for finding the Shapley value efficiently. Essentially, finding the Shapley value involves summing factors of hypergeometrically distributed variables, a problem for which there is no known closed-form formula.

As MC-nets generalize WGGs, the problems of determining whether the core is empty and whether a payoff vector belongs to the core are both coNP-hard. However, there exists an algorithm that can decide both problems in time exponential only in the tree-width of the graphical representation of the MC-net.

11.6 History and references

In the early days of game theory research, coalitional game theory was a major focus, particularly of economists. This is partly because the theory is closely related to equilibrium analysis and seemingly bridges the gap between game theory and economics.


Von Neumann and Morgenstern, for example, devoted more than half of their classic text, Theory of Games and Economic Behavior [von Neumann & Morgenstern, 1944], to an analysis of coalitional games. A large body of theoretical work on coalitional game theory has focused on the development of solution concepts, possibly in an attempt to explain the behavior of large systems such as markets. Solid explanations of the many solution concepts and their properties were given by Osborne and Rubinstein [1994] and Peleg and Sudholter [2003].

The first systematic investigation of the computational complexity of solution concepts in coalitional game theory was carried out by Deng and Papadimitriou [1994]. This paper defined the weighted graph games and studied the complexity of computing the solution concepts of the core and the Shapley value, as well as a few of the other solution concepts we mentioned. The subsequent compact representations were developed in the AI community. The superadditive representation was developed by Conitzer and Sandholm [2003a], and it has a natural extension for representing super-additive games without transferable utility, which we have not covered in the text here. The multi-issue representation was also developed by Conitzer and Sandholm [2004]. The marginal contribution nets representation was first proposed by Ieong and Shoham [2005], and later generalized in [Ieong & Shoham, 2006].


12 Logics of Knowledge and Belief

In this chapter we look at how one might represent statements such as “John knows that it is raining”, “John believes that it will rain tomorrow”, “Mary knows that John believes that it will rain tomorrow” and “It is common knowledge between Mary and John that it is raining”.

12.1 The partition model of knowledge

One reason computer scientists reason about knowledge is to model distributed systems. Distributed systems consist of multiple processors autonomously performing some joint computation. Of course, the joint nature of the computation means that the processors need to communicate with one another. One set of problems comes about when the communication is error-prone. In this case the system analyst may find himself saying something like the following: “Processor A sent the message to processor B. The message may not arrive, and processor A knows this. Furthermore, this is common knowledge, so processor A knows that processor B knows that it (A) knows that if a message was sent it may not arrive.” The question is how to make such reasoning precise, which is the topic of this chapter.

12.1.1 Muddy children and warring generals

Often the modeling is done in the context of some stylized problem, with an associated entertaining story. Thus, for example, when we return to the distributed computing application in Section 12.4, rather than speak about computer processors, we will tell the story of two generals who attempt to coordinate among themselves to gang up on a third. For now, however, consider the following less violent story:

A group of n children enters their house after having played in the mud outside. They are greeted in the hallway by their father, who notices that k of the children have mud on their foreheads. He makes the following announcement: “At least one of you has mud on his forehead.” The children can all see each other's foreheads, but not their own. The father then says, “Do any of you know that you have mud on your forehead? If you do, raise your hand now.” No one raises a hand. The father repeats the question, and again no one moves. The father does not give up and keeps repeating the question. After exactly k rounds, all the children with muddy foreheads raise their hands simultaneously.

How can this be? On the face of it only the father's initial statement conveyed new information to the children, and his subsequent questions add nothing. If a child didn't have the information at the beginning, how could he have it later on?

Here is an informal argument. Let us start with the simple case in which k = 1. In this case the single muddy child knows that all the others are clean, and when the father announces that at least one child is dirty he can conclude that he himself is that child. Note that none of the other children know at this point whether or not they are muddy (after the muddy child raises his hand, however, they do; see next). Now consider k = 2. Imagine that you are one of the two muddy children. After the father's first announcement, you look around the room and see that there is a muddy child other than you. Thus after the father's announcement you do not have enough information to know whether you are muddy (you might be, but it could also be that the other child is the only muddy one). But you note that after the father's first question the other muddy child does not raise his hand. You then realize that you yourself must be muddy as well, or else—based on the reasoning in the k = 1 case—that child would have raised his hand. So you raise your hand. Of course, so does the other muddy child.

You could extend this argument to k = 3, 4, . . ., showing in each case that all of the k muddy children raise their hands together after the kth time that the father asks the question. But of course, you would rather have a general theorem that applies to all k. In particular, you might want to prove by induction that after rounds 1, 2, . . . , k − 1 none of the children know whether they are dirty, but after the next round exactly the muddy children do. However, for this we will need a formal model of ‘know’ that applies in this example.

12.1.2 Formalizing intuitions about the partition model

Definition 12.1.1 (Partition model) An (n-agent) partition model over a language Σ is a tuple A = (W, π, I_1, . . . , I_n), where

• W is a set of possible worlds;

• π : Σ → 2^W is an interpretation function that determines which sentences in the language are true in which worlds; and

• each I_i denotes a set of possible worlds that are equivalent from the point of view of agent i. Formally, I_i is a partition of W; that is, I_i = (W_{i1}, . . . , W_{ir}) such that W_{ij} ∩ W_{ik} = ∅ for all j ≠ k, and ∪_{1≤j≤r} W_{ij} = W. We also use the following notation: I_i(w) = {w′ | w ∈ W_{ij} and w′ ∈ W_{ij}}; that is, I_i(w) includes all the worlds in the partition cell of world w, according to agent i.

Thus each possible world completely specifies the concrete state of affairs, at least insofar as the language Σ can describe it. For example, in the context of the Muddy Children Puzzle, each possible world will specify precisely which of the children are muddy. The choice of Σ is not critical for current purposes, but for concreteness we will take it to be the language of propositional logic over some set of primitive propositional symbols.¹ For example, in the context of the muddy children we will assume the primitive propositional symbols muddy1, muddy2, . . . , muddyn.

We will use the notation K_i(ϕ) (or simply K_iϕ, when no confusion arises) to mean “agent i knows that ϕ”.² The following defines when a statement is true in a partition model:

Definition 12.1.2 (Logical entailment for partition models) Let A = (W, π, I_1, . . . , I_n) be a partition model over Σ, and let w ∈ W. Then we define the |= (logical entailment) relation as follows:

• for any ϕ ∈ Σ, we say that A, w |= ϕ if and only if w ∈ π(ϕ);

• A, w |= K_iϕ if and only if for all worlds w′, if w′ ∈ I_i(w) then A, w′ |= ϕ.

The first part of the definition gives the intended meaning to the interpretation function π from Definition 12.1.1. The second part states that we can only conclude that agent i knows ϕ when ϕ is true in all possible worlds that i considers indistinguishable from the true world.
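Definition 12.1.2 can be checked mechanically. The following sketch is a toy encoding (not from the text) of the two-child instance discussed next, after the children have seen each other: worlds are pairs of muddy flags, each child's partition cell contains the worlds that agree on the other child's forehead, and K_i ϕ is evaluated by quantifying over that cell.

```python
from itertools import product

# Worlds: (muddy1, muddy2) truth values.
worlds = list(product([True, False], repeat=2))


def indistinguishable(i, w, w2):
    """Child i cannot distinguish worlds that differ only in his own forehead."""
    other = 1 - i
    return w[other] == w2[other]


def I(i, w):
    """The cell of agent i's partition containing world w."""
    return [w2 for w2 in worlds if indistinguishable(i, w, w2)]


def knows(i, prop, w):
    """A, w |= K_i prop  iff  prop holds in every world i considers possible at w."""
    return all(prop(w2) for w2 in I(i, w))


true_world = (True, True)  # both children are muddy
muddy = [lambda w: w[0], lambda w: w[1]]

print(knows(0, muddy[1], true_world))  # True:  child 1 knows child 2 is muddy
print(knows(0, muddy[0], true_world))  # False: child 1 does not know his own state
```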

Let us apply this modeling to the muddy children story. Consider the following instance of n = k = 2 (that is, the instance with two children, both muddy). There are four possible worlds, corresponding to each of the children being muddy or not. There are two equivalence relations I_1 and I_2, which allow us to express each of the children's perspectives about which possible worlds can be distinguished. There are two primitive propositional symbols—muddy1 and muddy2. At the outset, before the children see or hear anything, all four worlds form one big equivalence class for each child.

After the children see each other (and can tell apart worlds in which other children's states of cleanliness differ) but before the father speaks, the state of knowledge is as illustrated in Figure 12.1. The ovals illustrate the four possible worlds, with the dark oval indicating the true state of the world. The solid boxes indicate the equivalence classes in I_1, and the dashed boxes indicate the equivalence classes in I_2.

1. See Appendix D for a review of propositional logic.
2. The reader familiar with modal logic will note that a partition model is nothing but a special case of a propositional Kripke model with n modalities, in which each of the n accessibility relations is an equivalence relation (that is, a binary relation that is reflexive, transitive and symmetric). What is remarkable is that the modalities defined by these accessibility relations correspond well to the notion of knowledge that we have been discussing. This is why we use the modal operator K_i rather than the generic necessity operator □_i. (The reader unfamiliar with modal logic should ignore this remark; we review modal logic in the next section.)


[Figure 12.1: Partition model after the children see each other.]

Note that in this state of knowledge, in the real world both the sentences K_1 muddy2 and K_2 muddy1 are true (along with, for example, K_1 ¬K_2 muddy2). However, neither K_1 muddy1 nor K_2 muddy2 is true in the real world.

Once the father announces publicly that at least one child is dirty, the world in which neither child is muddy is ruled out. This world then becomes its own partition cell, leaving the state of knowledge as follows:

[Figure 12.2: Partition model after the father's announcement.]

Thus, were it the case that only one of the children were dirty, at this point he would be able to uniquely identify the real world (and in particular the fact that he is dirty). However, in the real world (where both children are dirty) it is still the case that neither K_1 muddy1 nor K_2 muddy2 holds. Once the children each observe that the other child does not know his state of cleanliness, the state of knowledge becomes as follows:


[Figure 12.3: Final partition model.]

And now indeed both K_1 muddy1 and K_2 muddy2 hold.

Thus, we can reason about knowledge rigorously in terms of partition models. This is a big advance over the previous, informal reasoning. But it is also quite cumbersome, especially as these models get larger. Fortunately, we can sometimes reason about such models more concisely using an axiomatic system, which provides a complementary perspective on the same notion of knowledge.

12.2 A detour to modal logic

In order to reason about the partition model of knowledge, and later about models of belief and other concepts, we must briefly discuss modal logic. This discussion presupposes familiarity with classical logic. For the most part we will consider propositional logic, but we will also make some comments about first-order logic. Both are reviewed at the end of the book.

From the syntactic point of view, modal logic augments classical logic with one or more modal operators, and a modal logic is a logic that includes one or more modal operators. Most generally speaking, a modal operator is an expression that relates a sentence to some entity. There are many different kinds of “entities” and many different kinds of “relations”, and thus many different modal operators and modal logics, each with different properties. For example, the entities can be points in time, and the relationship can be “true at” (as in “sentence ϕ is true at time t”). More relevantly to us, the entities can be agents, and the relationship can be “knows” or “believes” (as in “agent i knows ϕ but agent j believes ¬ϕ”).

The classical notation for a modal operator is □, often pronounced “necessarily” (and thus □ϕ is read as “ϕ is necessarily true”). The original motivation within analytic philosophy for introducing the modal operator is indeed to distinguish between accidental (or ‘contingent’) truths such as “it is now sunny” and analytic (or ‘necessary’) ones such as “1 + 1 = 2”. However, the formal machinery has been used for a variety of purposes. For example, some logics interpret □ temporally; in some temporal logics □ϕ is read as “ϕ is true now and will always be true in the future.” We will encounter a similar temporal operator later in the chapter when we discuss robot motion planning. Logics of knowledge and belief read □ yet differently, as “ϕ is known” (in the case of knowledge) or “ϕ is believed” (in the case of belief). In fact, in these logics the □ notation is usually replaced by other notation that is more indicative of the intended interpretation, but in this section we stick to the generic notation.

Of course, the different interpretations of □ suggest that there are different modal logics, each lending □ different properties. This indeed is the case. In this section we briefly present the generic framework of modal logic, and in later sections we specialize it to model knowledge, belief, and related notions.

As with any logic, in order to discuss modal logic we need to discuss in turn syntax, semantics, and axiomatics (or proof theory). We concentrate on propositional modal logic, but make some remarks about first-order modal logic at the end.

12.2.1 Syntax

The set of sentences in the modal logic with propositional symbols P is the smallest set L containing P such that if ϕ, ψ ∈ L then also ¬ϕ ∈ L, ϕ ∧ ψ ∈ L, and □ϕ ∈ L. As usual, other connectives such as ∨, → and ≡ can be defined in terms of ∧ and ¬. In addition, it is common to define the dual operator to □, often denoted ◇ and pronounced “possibly”. It is defined by ◇ϕ ≡ ¬□¬ϕ, which can be read as “the statement that ϕ is possibly true is equivalent to the statement that ¬ϕ is not necessarily true.”

12.2.2 Semantics

The semantics are defined in terms of possible worlds structures, also called Kripke structures. A (single-modality) Kripke structure is a pair (W, R), where W is a collection of (not necessarily distinct) classical propositional models (that is, models that give a truth value to all sentences that do not contain □), and R is a binary relation on these models. Each w ∈ W is called a possible world. R is called the accessibility relation, and sometimes also the reachability relation or alternativeness relation. It is convenient to think of Kripke structures as directed graphs, with the nodes being the classical models and the arcs representing accessibility.

This is where the discussion can start to be related back to the partition model of knowledge. The partition is of course nothing but a binary relation, albeit one with special properties. We will return to these special properties in the next section, but for now let us continue with the generic treatment of modal logic, one that allows for arbitrary accessibility relations.

The truth of a modal sentence is evaluated relative to a particular possible world w in a particular Kripke structure (W, R) (the pair is called a Kripke model). The satisfaction relation is defined recursively as follows:

• M, w |= p if p is true in w, for any primitive proposition p.

• M, w |= ϕ ∧ ψ iff M, w |= ϕ and M, w |= ψ.

• M, w |= ¬ϕ iff it is not the case that M, w |= ϕ.

• M, w |= □ϕ iff, for every w′ ∈ W such that R(w, w′), it is the case that M, w′ |= ϕ.


As in classical logic, we overload the |= symbol. In addition to denoting the satisfaction relation, it is used to denote validity. |= ϕ means that ϕ is true in all Kripke models, and, given a class of Kripke models M, |=_M ϕ means that ϕ is true in all Kripke models within M.

12.2.3 Axiomatics

Now that we have a well-defined notion of validity, we can ask whether there exists an axiom system that allows us to derive precisely all the valid sentences, or the valid sentences within a given class. Here we discuss the first question, that of capturing the sentences that are valid in all Kripke structures; in later sections we discuss specific classes of models of particular interest.

Consider the following axiom system, called the axiom system K.

Axiom 12.2.1 (Classical) All propositional tautologies are valid.

Axiom 12.2.2 (K) (□ϕ ∧ □(ϕ → ψ)) → □ψ is valid.

Rule 12.2.3 (Modus Ponens) From the validity of both ϕ and ϕ → ψ, infer the validity of ψ.

Rule 12.2.4 (Necessitation) From the validity of ϕ, infer the validity of □ϕ.

It is not hard to see that this axiom system is sound; all the sentences pronounced valid by these axioms and inference rules are indeed true in every Kripke model. What is less obvious, but nonetheless true, is that this is also a complete system for the class of all Kripke models; there do not exist additional sentences that are true in all Kripke models. Thus we have the following representation theorem:

Theorem 12.2.5 The system K is sound and complete for the class of all Kripke models.

12.2.4 Modal logics with multiple modal operators

We have so far discussed a single modal operator and a single accessibility relation corresponding to it. But it is easy to generalize the formulation to include multiple instances of each. Rather than a single modal operator □, we have operators □_1, □_2, . . . , □_n. The set of possible worlds is unchanged, but now we have n accessibility relations: R_1, R_2, . . . , R_n. The last semantic truth condition is changed to:

• For any i = 1, . . . , n, M, w |= □_iϕ iff, for every w′ ∈ W such that R_i(w, w′), it is the case that M, w′ |= ϕ.

Finally, by making similar substitutions in the axiom system K we get a sound and complete axiomatization for modal logic with n modalities. We sometimes denote the system K_n, to emphasize the fact that it contains n modalities. Again, we emphasize that this is soundness and completeness for the class of all n-modality Kripke models. The system remains sound if we restrict the class of Kripke models under consideration. In general it is no longer complete, but often we can add axioms and/or inference rules and recapture completeness.


12.2.5 Remarks about first-order modal logic

We have so far discussed propositional modal logic, that is, modal logic in which the underlying classical logic is propositional. But we can also look at the case in which the underlying logic is a richer first-order language. In such a first-order modal logic we can express sentences such as □∀x∃y Father(y, x) (read “necessarily, everyone has a father”). We will not discuss first-order modal logic in detail, since it will not play a role in what follows. And in fact the technical development is for the most part unremarkable, and simply mirrors the additional richness of first-order logic as compared to the propositional calculus.

There are, however, some interesting subtleties, which we point out briefly here. The first has to do with the question of whether the so-called Barcan formula is valid:

∀x□R(x) → □∀xR(x)

For example, when we interpret □ as ‘know’, we might ask whether, if for every person it is known individually that the person is honest, it follows that it is known that all people are honest. One can imagine some settings in which the intuitive answer is yes, and others in which it is no. From the technical point of view, the answer depends on the domains of the various models. In first-order modal logic, possible worlds are first-order models. This means in particular that each possible world has a domain, or a set of individuals to which terms in the language are mapped. It is not hard to see that the Barcan formula is valid in the class of first-order Kripke models in which all possible worlds have the same domain, but not in general.

A similar problem has to do with the interaction between modality and equality. If a = b, then is it always the case that □(a = b)? Again, for intuition, consider the knowledge interpretation of □, and the following famous philosophical example. Suppose there are two objects, the “morning star” and the “evening star”. These are two stars that are observed regularly, one in the evening and one in the morning. It so happens these are the same object, namely the planet Venus. Does it follow that one knows that they are the same object? Intuitively the answer is no. From the technical point of view, this is because, even if the domains of all possible worlds are the same, the interpretation function which maps language terms to objects might be different in the different worlds. For similar reasons, in general one cannot infer that the world's greatest jazz soprano saxophonist was born in 1897 based on the fact that Sidney Bechet was born that year, since one may not know that Sidney Bechet was indeed the best soprano sax player that ever graced this planet.

12.3 S5: An axiomatic theory of the partition model

We have seen that the axiom system K precisely captures the sentences that are valid in the class of all Kripke models. We now return to the partition model of knowledge, and search for an axiom system that is sound and complete for this more restricted class of Kripke models. Since it is a smaller class, it is reasonable to expect that it will be captured by an expanded set of axioms. This is indeed the case.

We start with system K, but now use K_i to denote the modal operators rather than □_i, reflecting the interpretation we have in mind.³

Axiom 12.3.1 (Classical) All propositional tautologies are valid.

Axiom 12.3.2 (K) (K_iϕ ∧ K_i(ϕ → ψ)) → K_iψ

Rule 12.3.3 (Modus Ponens) From ϕ and ϕ → ψ, infer ψ.

Rule 12.3.4 (Necessitation) From ϕ, infer K_iϕ.

Note that the generic axiom K takes on a special meaning under this interpretation of the modal operator—it states that an agent knows all of the tautological consequences of his knowledge. Thus, if you know that it is after midnight, and if you know that the president is always asleep after midnight, then you know that the president is asleep. Less plausibly, if you know the Peano axioms, you know all the theorems of arithmetic. For this reason, this property is sometimes called logical omniscience. One could argue that this property is too strong, that it is too much to expect of an agent. After all, humans do not seem to know the full consequences of all of their knowledge. Similar criticisms can be levied against the other axioms we discuss below. This is an important topic, and we return to it later. However, it is a fact that the idealized notion of ‘knowledge’ defined by the partition model has the property of logical omniscience. So to the extent that the partition model is useful (and it is, as we discuss later in the chapter), one must learn to live with some of the strong properties it induces on ‘knowledge.’

Other properties of knowledge do not hold for general Kripke structures, and thus are not derivable in the K axiom system. One of these properties is consistency, which states that an agent cannot know a contradiction. The following axiom, called axiom D for historical reasons, captures this property:

Axiom 12.3.5 (D) ¬K_i(p ∧ ¬p).

Note that this axiom cannot be inferred from the axiom system K, and thus is not valid in the class of all n-agent Kripke structures. It is, however, valid in the more restricted class of serial models, in which each accessibility relation is serial. (A binary relation X over domain Y is serial if and only if ∀y ∈ Y ∃y′ ∈ Y such that (y, y′) ∈ X; note that y = y′ is allowed.) Indeed, as we shall see, it is not only sound but also complete for this class. The axiom system obtained by adding axiom D to the axiom system K is called axiom system KD.

Another property that holds for our current notion of knowledge is that of veridity; it is impossible for an agent to know something that isn't actually true. Indeed, this is often taken to be the property that distinguishes knowledge from other informational attitudes, such as belief. We can express this property with the so-called axiom T:

Axiom 12.3.6 (T) K_iϕ → ϕ.

This axiom also cannot be inferred from the axiom system KD. Again, the class of Kripke structures for which axiom T is sound and complete can be defined succinctly: it consists of the Kripke structures in which each accessibility relation is reflexive. (A binary relation X over domain Y is reflexive if and only if ∀y ∈ Y, (y, y) ∈ X.) By adding axiom T to the system KD we get the axiom system KDT. However, in this system the axioms are no longer independent; the axiom system KDT is equivalent to the axiom system KT, since the axiom D can be derived from the remaining axioms and the inference rules. Indeed, this follows easily from the completeness properties of the individual axioms; every reflexive relation is trivially also serial.

There are two additional properties of knowledge induced by the partition model which are not captured by the axioms discussed thus far. They both have to do with the introspective capabilities of a given agent, or the nesting of the knowledge operator. We first consider positive introspection, the property that, when an agent knows something, he knows that he knows it. It is expressed by the following axiom, historically called axiom 4:

Axiom 12.3.7 (4) K_iϕ → K_iK_iϕ.

Again, it does not follow from the other axioms discussed so far. The class of Kripke structures for which axiom 4 is sound and complete consists of the structures in which each accessibility relation is transitive. (A binary relation X over domain Y is transitive if and only if for all y, y′, y′′ ∈ Y it is the case that if (y, y′) ∈ X and (y′, y′′) ∈ X then (y, y′′) ∈ X.)

The last property of knowledge we will consider is negative introspection. This is quite similar to positive introspection, but here we are concerned with the property that if an agent does not know something, then he knows that he does not know it. We express this with the following axiom, called axiom 5:

Axiom 12.3.8 (5) ¬K_iϕ → K_i¬K_iϕ

Again, it does not follow from the other axioms. Consider now the class of Kripke structures for which axiom 5 is sound and complete. Informally speaking, we want to ensure that if two worlds are accessible from the current world, then they are also accessible from each other. Formally, we say that the accessibility relation must be Euclidean. (A binary relation X over domain Y is Euclidean if and only if for all y, y′, y′′ ∈ Y it is the case that if (y, y′) ∈ X and (y, y′′) ∈ X then (y′, y′′) ∈ X.)

At this point the reader may feel confused. After all, we started with a very simple class of Kripke structures—partition models—in which each accessibility relation is a simple equivalence relation. Then, in our pursuit of axioms that capture the notion of knowledge defined in partition models, we have looked at increasingly baroque properties of accessibility relations and associated axioms, as summarized in Table 12.1.

What do these complicated properties have to do with the simple partition models that started this discussion? The answer lies in the following observation:

Proposition 12.3.9 A binary relation is an equivalence relation if and only if it is reflexive, transitive and Euclidean.

3. We stick to this notation for historical accuracy, but the reader should not be confused by it. The axiomatic system K, and the axiom K that gives rise to the name, have nothing to do with the syntactic symbol K_i we use to describe the knowledge of an agent.


Name       Axiom                                  Accessibility relation
Axiom K    (K_iϕ ∧ K_i(ϕ → ψ)) → K_iψ             NA
Axiom D    ¬K_i(p ∧ ¬p)                           Serial
Axiom T    K_iϕ → ϕ                               Reflexive
Axiom 4    K_iϕ → K_iK_iϕ                         Transitive
Axiom 5    ¬K_iϕ → K_i¬K_iϕ                       Euclidean

Table 12.1  Axioms and corresponding constraints on the accessibility relation.
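For a finite accessibility relation, the properties in Table 12.1 can be tested directly; a minimal sketch (the example relation is an arbitrary equivalence relation with cells {1, 2} and {3}):

```python
def is_serial(R, Y):
    return all(any((y, y2) in R for y2 in Y) for y in Y)


def is_reflexive(R, Y):
    return all((y, y) in R for y in Y)


def is_transitive(R, Y):
    return all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)


def is_euclidean(R, Y):
    return all((b, c) in R for (a, b) in R for (a2, c) in R if a == a2)


Y = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}  # equivalence relation: cells {1, 2}, {3}
print(is_serial(R, Y), is_reflexive(R, Y), is_transitive(R, Y), is_euclidean(R, Y))
# True True True True -- consistent with Proposition 12.3.9
```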

Indeed, the system KT45 (which results from adding to the axiom system K all the axioms T, 4 and 5) exactly captures the properties of knowledge as defined by the partition model. System KT45 is also known by another, more common name—the S5 axiom system. S5 is both sound and complete for the class of all partition models. However, we are able to state an even more general theorem, which will serve us well later when we discuss moving from knowledge to belief:

Theorem 12.3.10 Let X be a subset of {D, T, 4, 5} and let 𝒳 be the corresponding subset of {serial, reflexive, transitive, Euclidean}. Then K ∪ X (which is the basic axiom system K with the appropriate subset of axioms added) is a sound and complete axiomatization of K_i for the class of Kripke structures whose accessibility relations satisfy 𝒳.

12.4 Common knowledge, and an application to distributed systems

We mentioned earlier that one motivation for reasoning about knowledge in computer science stems from distributed systems. The following example illustrates the sort of reasoning one would like to perform in that context. (In case this problem does not sound like distributed computing to you, imagine that the generals are computer processes that are trying to communicate reliably over a faulty communication line.)

Two generals standing on opposing hilltops are trying to communicate in order to coordinate an attack on a third general, whose army sits in the valley between them. The two generals are communicating via messengers who must travel across enemy lines to deliver their messages. Any messenger carries the risk of being caught, in which case the message is lost (alas, the fate of the messenger is of no concern in this story). Each of the two generals wants to attack, but only if the other does; if they both attack they will win, but either one will lose if he attacks alone. Given this context, what protocol can the generals establish that will ensure that they attack simultaneously?

You might imagine the following naive communication protocol. The protocol for the first general, S, is to send an ‘attack tomorrow’ message to general R, and to keep sending this message repeatedly until he receives an acknowledgement that the message was received. The protocol for the second general, R, is to do nothing until he receives the message from S, and then to send a single acknowledgement message back to general S. The question is whether the agents can have a plan of attack based on this protocol which always guarantees—or, less ambitiously, can sometimes guarantee—that they attack simultaneously. And if not, can a different protocol achieve this guarantee?

Clearly, it would be useful to reason about this scenario rigorously. It seems intuitive that the formal notion of ‘knowledge’ should apply here, but the question is precisely how. In particular, what is the knowledge condition that must be attained in order to ensure a coordinated attack?

To apply the partition model of knowledge we first need to define the possible worlds. We will do this by first defining the local state of each agent (that is, each general); together the two local states form a global state. To reason about the naive protocol, we will have the local state for S represent two binary pieces of information: whether or not an ‘attack’ message was sent, and whether or not an acknowledgement was received. The local state for R will also represent two binary pieces of information: whether or not an ‘attack’ message was received, and whether or not an acknowledgement message was sent. Thus we end up with four possible local states for each general—(0, 0), (0, 1), (1, 0) and (1, 1)—and thus sixteen possible global states.

We are now ready to define the possible worlds of our model. The initial global state is well defined—by convention we call it 〈(0, 0), (0, 0)〉. It will then evolve based on two factors—the dictates of the protocol, and the nondeterministic effect of nature (which decides whether a message is received or not). Thus, among all the possible sequences of global states, only some are legal according to the protocol. We will call any finite prefix of such a legal sequence of global states a history. These histories will be the possible worlds in our model.

For example, consider the following possible sequence of events, given the naive protocol: S sends an ‘attack’ message to R, R receives the message and sends an acknowledgement to S, and S receives the acknowledgement. The history corresponding to this scenario is

〈(0, 0), (0, 0)〉, 〈(1, 0), (1, 0)〉, 〈(1, 1), (1, 1)〉

The structure of our possible worlds suggests a natural definition of the partition associated with each of the agents. We will say that two histories are in the same equivalence class of agent i (for i = S, R) if their respective final global states have identical local states for agent i. That is, history

〈(0, 0), (0, 0)〉, 〈xS,1, xR,1〉, . . . , 〈xS,k, xR,k〉is indistinguishable in the eyes of agenti from history

〈(0, 0), (0, 0)〉, 〈yS,1, yR,1〉, . . . , 〈yS,l, yR,l〉if and only if xi,k = yi,l. Note a very satisfying aspect of this definition; the acces-sibility relation, which in general is thought of as an abstract notion, is given here aconcrete interpretation in terms of local and global states.

We can now reason formally about the naive protocol. Consider again the possible history mentioned above:

〈(0, 0), (0, 0)〉, 〈(1, 0), (1, 0)〉, 〈(1, 1), (1, 1)〉

The reader can verify that in this possible world the following sentences are true: KS attack, KR attack, KS KR attack. However, it is also the case that this world satisfies ¬KR KS KR attack. Intuitively speaking, R does not know that S knows that R knows that S intends to attack, since for all R knows its acknowledgment could have been lost. Thus R cannot proceed to attack; it reasons that if indeed the acknowledgement was lost then S will not dare attack.

This of course suggests a fix—have S acknowledge R's acknowledgement. To accommodate this we would need to augment each of the local states with another binary variable, representing for S whether the second acknowledgement was sent and for R whether it was received. However it is not hard to see that this also has a flaw; assuming S's second acknowledgement indeed goes through, in the resulting history we will have KR KS KR attack hold, but not KS KR KS KR attack. Can this be fixed, and what is the general knowledge condition we must aim for?

It turns out that the required condition is what is known as common knowledge. This is a very intuitive notion, and we define it in two steps. We first define what it means that everybody knows a particular sentence to be true. To represent this we use the symbol EG, where G is a particular group of agents. The 'everybody knows' operator has the same syntactic rules as the knowledge operator. As one might expect, the semantics can be defined easily in terms of the basic knowledge operator. We define the semantics as follows.

Definition 12.4.1 ("Everyone knows") Let M be a Kripke structure, w be a possible world in M, G be a group of agents, and ϕ be a sentence of modal logic. Then M,w |= EGϕ if and only if M,w′ |= ϕ for all w′ ∈ ∪i∈G Pi(w). (Equivalently, we can require that ∀i ∈ G it is the case that M,w |= Kiϕ.)

In other words, everybody knows a sentence when the sentence is true in all of the worlds that are considered possible in the current world by any agent in the group.

Using this concept we can define the notion of common knowledge, or, as it is sometimes called lightheartedly, "what any fool knows." If 'any fool' knows something, then we can assume that everybody knows it, and everybody knows that everybody knows it, and so on. An example from the real world might be the assumption we use when driving a car that all of the other drivers on the road also know the rules of the road, and that they know that we know the rules of the road, and that they know that we know that they know them, and so on. We require an infinite series of "everybody knows" in order to capture this intuition. For this reason we use the following recursive definition.

Definition 12.4.2 (Common knowledge) Let M be a Kripke structure, w be a possible world in M, G be a group of agents, and ϕ be a sentence of modal logic. Then M,w |= CGϕ if and only if M,w |= EG(ϕ ∧ CGϕ).

In other words, a sentence is common knowledge if everybody knows it and knows that it is common knowledge. This formula is called the fixed-point axiom, since CGϕ can be viewed as the fixed-point solution of the function f(x) = EG(ϕ ∧ x). Fixed-point definitions are notoriously hard to understand intuitively. Fortunately, we can give alternative characterizations of CG. First, we can give a direct semantic definition:

Theorem 12.4.3 Let M be a Kripke structure, w be a possible world in M, G be a group of agents, and ϕ be a sentence of modal logic. Then M,w |= CGϕ if and only if M,w′ |= ϕ for every sequence of possible worlds (w = w0, w1, . . . , wn) for which the following holds: wn = w′, and for every 0 ≤ i < n there exists an agent j ∈ G such that wi+1 ∈ Pj(wi).
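The reachability characterization lends itself to a direct brute-force computation on finite models. The following Python sketch is our own illustration (not part of the text); the function names and the toy two-agent partition model are invented for the example. It computes the set of worlds satisfying EGϕ and, via reachability over the agents' partition cells, the set satisfying CGϕ.

```python
def everyone_knows(partitions, group, phi):
    """Worlds w at which E_G(phi) holds: phi is true throughout every
    group member's partition cell P_i(w)."""
    worlds = set().union(*(p.keys() for p in partitions.values()))
    return {w for w in worlds
            if all(partitions[i][w] <= phi for i in group)}

def common_knowledge(partitions, group, phi):
    """Worlds w at which C_G(phi) holds, using Theorem 12.4.3: phi must be
    true at every world reachable from w by steps that stay inside some
    group member's partition cell."""
    worlds = set().union(*(p.keys() for p in partitions.values()))
    result = set()
    for w in worlds:
        seen, stack = {w}, [w]
        while stack:
            u = stack.pop()
            for i in group:
                for v in partitions[i][u] - seen:
                    seen.add(v)
                    stack.append(v)
        if seen <= phi:
            result.add(w)
    return result

# A toy model: agent a cannot tell 1 from 2, or 3 from 4; agent b cannot
# tell 2 from 3.  phi holds everywhere except world 4.
partitions = {"a": {1: {1, 2}, 2: {1, 2}, 3: {3, 4}, 4: {3, 4}},
              "b": {1: {1}, 2: {2, 3}, 3: {2, 3}, 4: {4}}}
phi = {1, 2, 3}
print(everyone_knows(partitions, ["a", "b"], phi))    # {1, 2}
print(common_knowledge(partitions, ["a", "b"], phi))  # set(): no common knowledge
```

In the toy model, ϕ is "everybody knows"-known at worlds 1 and 2, yet it is common knowledge nowhere, because a chain of indistinguishable worlds leads to world 4 where ϕ fails, which is exactly the phenomenon exhibited by the generals' acknowledgements.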

Second, it is worth noting that our S5 axiomatic system can also be enriched to provide a sound and complete axiomatization of CG. It turns out that what are needed are two additional axioms and one new inference rule:

Axiom 12.4.4 (A3) EGϕ ↔ ∧i∈G Kiϕ

Axiom 12.4.5 (A4) CGϕ → EG(ϕ ∧ CGϕ)

Rule 12.4.6 (R3) From ϕ → EG(ψ ∧ ϕ) infer ϕ → CGψ

This last inference rule is a form of an Induction Rule.

Armed with this notion of common knowledge, we return to our warring generals. It turns out that, in a precise sense, whenever any communication protocol guarantees a coordinated attack in a particular history, in that history it must achieve common knowledge between the two generals that an attack is about to happen. It is not hard to see that no finite exchange of acknowledgements will ever lead to such common knowledge. And thus it follows that there is no communication protocol that solves the coordinated attack problem, at least not as the problem is stated here.

12.5 Doing time, and an application to robotics

We now move to another application of reasoning about knowledge, in robotics. The domain of robotics is characterized by uncertainty; a robot receives uncertain readings from its input devices, and its motor controls produce imprecise motion. Such uncertainty is not necessarily the end of the world, so long as its magnitude is not too large for the task at hand and the robot can reason about it effectively. We will explicitly consider the task of robot motion planning under uncertainty, and in particular the question of how a robot knows to stop despite having an imprecise sensor and perhaps also an imprecise motion controller. We first discuss the single-robot case, where we see the power of the knowledge abstraction. We then move to the multiagent case, where we show the importance of each agent being able to model the other agents in the system.

12.5.1 Termination conditions for motion planning

Imagine a point robot moving in one dimension along the positive reals from the origin to the right, with the intention of reaching the interval [2, 4], the goal region (see Figure 12.4). The robot is moving at a fixed finite velocity, so there is no question about whether it will reach the destination; the question is whether the robot will know when to stop. Assume the robot has a position sensor that is inaccurate; if the true position is L, the sensor will return any value in the interval [L − 1, L + 1]. We assume that time is continuous and that the sensor provides readings continuously, but that the reading values are not necessarily continuous. We are looking for a termination condition—a predicate on the readings such that as soon as it evaluates to "true" the robot will stop. What is a reasonable termination condition?

Figure 12.4 A 1-dimensional robot motion problem: the robot starts at the origin (the initial state) and must reach the goal area, the interval [2, 4], on an axis running from 0 to 6.

Consider the termination condition "R = 3", where R is the current reading. This is a "safe" condition, in the sense that if that is the reading then the robot is guaranteed to be in the goal region. The problem is that, because of the discontinuity of reading values, the robot may never get that reading. In other words, this termination condition is sound but not complete. Conversely, the condition "2 ≤ R ≤ 4" is complete but not sound. Is there a sound and complete termination condition?

Upon reflection it is not hard to see that "R ≥ 3" is such a condition. Although there are locations outside the goal region that can give rise to readings that satisfy the predicate (e.g., in location 10 the reading necessarily does), what matters is the first time the robot encounters such a reading. Given its starting location and its motion trajectory, the position at that first time can be no earlier than 2 and no later than 4.
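As a quick sanity check on this claim, the following Python sketch (our own; the step size and run count are arbitrary choices) drives the point robot rightward on a coarse grid standing in for its continuous motion, lets the noisy sensor return an arbitrary value in [L − 1, L + 1] at each position, and stops at the first reading of at least 3. Over many randomized runs the stopping position always lies in the goal region [2, 4], and the robot always does stop.

```python
import random

def run_once(step=0.25, noise=1.0, threshold=3.0):
    """Move right from the origin in increments of `step` (a discrete
    stand-in for continuous motion).  At true position L the sensor may
    return any value in [L - noise, L + noise]; stop at the first reading
    that is >= threshold and report the true stopping position."""
    for i in range(100):
        L = i * step
        reading = L + random.uniform(-noise, noise)
        if reading >= threshold:
            return L
    raise AssertionError("unreachable: by L = 4 every reading is >= 3")

stops = [run_once() for _ in range(10_000)]
assert all(2.0 <= L <= 4.0 for L in stops)   # soundness: stop only inside [2, 4]
print(min(stops), max(stops))                # completeness: it always stopped
```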

Let us turn now to a slightly more complex, 2-dimensional robotic motion planning problem, depicted in Figure 12.5. A point robot is given a command to move in a certain direction, and its goal is to arrive at the depicted rectangular goal region. As in the previous example, its sensor is error prone; it returns a reading that is within ρ of the true location (that is, the uncertainty is captured by a disk of radius ρ). Unlike the previous example, the robot's motion is also subject to error; given a command to move in a certain direction, the robot's actual motion can deviate up to ∆ degrees from the prescribed direction. It is again assumed that when the robot is given the command "Go in direction D" it moves at some finite velocity, always moving within ∆ degrees of that direction, but that the deviation is not consistent and can change arbitrarily within the tolerance ∆. It is not hard to see that the space of locations reachable by the robot is described by a cone whose angle is 2∆, as depicted in Figure 12.5. Again we ask—what is a good termination condition for the robot? One candidate is a rectangle inside the goal region whose sides are ρ away from the corresponding sides of the goal rectangle. Clearly this is a sound condition, but not a complete one. Figure 12.6 depicts another candidate, termed the "naive termination condition"—all the sensor readings in the region with the dashed boundary. Although it is now less obvious, this termination condition is also sound—but, again, not complete. In contrast, Figure 12.7 shows a termination condition that is both sound and complete.

Figure 12.5 A 2-dimensional robot motion problem (I: the initial state; D: the motion direction; G: the goal; ρ: the maximal deviation in readings; ∆: the maximal angle of deviation; the cone of possible positions has angle 2∆).

Figure 12.6 A sound but incomplete termination condition for the 2-dimensional robot motion problem (showing the goal boundary G, the boundary of the naive termination condition, the uncertainty disk, the cone of possible positions, and reachable positions that do not correspond to possible readings inside the naive condition).

Figure 12.7 A sound and complete termination condition for the 2-dimensional robot motion problem (the initial state I, the goal G, and the termination condition).

Consider now the two sound and complete termination conditions identified for the
two problems, the 1-dimensional one and the 2-dimensional one. Geometrically they seem to bear no similarity, and yet one's intuition might be that they embody the same principle. Can one articulate this principle, and thus perhaps later on apply it to yet other problems? To do so we abstract to what is sometimes called the knowledge level. Rather than speak about the particular geometry of the situation, let us reason more abstractly about the knowledge available to the robot. Not surprisingly, we choose to do it using possible-worlds semantics, and specifically using the S5 model of knowledge. Define a history of the robot as a mapping from time to both location and sensor reading. That is, a history tells us where the robot has been at any point in time and its sensor reading at that time. Clearly, every motion planning problem—including both the ones presented here—defines a set of legal histories. For every motion planning problem, our set of possible worlds will consist of pairs (h, t), where h is a legal history and t is a time point. We will say that (h, t) is accessible from (h′, t′) if the sensor reading in h at t is identical to the sensor reading in h′ at t′. Clearly this defines an equivalence relation. We now need a language to speak about such possible-worlds structures. As usual, we will use the K operator to denote knowledge, that is, truth in all accessible worlds. We will also use a temporal operator ◇ whose meaning will be "sometime in the past." Specifically, h, t |= ◇ϕ holds just in case there exists a t′ ≤ t such that h, t′ |= ϕ.4 Armed with this, we are ready to provide a knowledge-level condition that captures both the specific geometric ones given earlier, and is deceptively simple. Let g be the formula meaning "I am in the goal region" (which is of course instantiated differently in the two examples). Then the sound and complete termination condition is simply K◇g, reading informally "I know that either I am in the goal region now, or else sometime in the past I was." It turns out that this simple termination condition is not only sound and complete for these two examples, but is so also in a much larger set of examples, and is furthermore optimal: it causes the robot to stop no later than any other sound and complete termination condition.

4. This operator is taken from the realm of temporal logic, where it usually appears in conjunction with other modalities; however, those are not required for our example.

12.5.2 Coordinating robots

We now extend this discussion to a multiagent setting. Add to the first example a second robot which must coordinate with the first robot (number the robots 1 and 2). Specifically, robot 2 must take an action at the exact time (and never before) that robot 1 stops in the goal area. For concreteness, think of the robots as teammates in the annual robotics competition in which robotic soccer teams compete against each other. Robot 2 needs to pass the ball to robot 1 in front of the opposing goal. Robot 1 must be stopped to receive the pass, but the pass must be completed as soon as it stops, or else the defense will have time to react. The robots must coordinate on the pass without the help of communication. Instead, we provide robot 2 with a sensor of robot 1's position. Let R1 be the sensor reading of the first robot, R2 be that of the second, and p the true position of robot 1.

The sound and complete termination condition that we seek in this setting is actually a conjunction of termination conditions for the two robots. Soundness means that robot 1 never stops outside the goal area and robot 2 never takes its action when robot 1 is not stopped in the goal area. Completeness obviously means that the robots eventually coordinate on their respective actions. As in the example of the warring generals, the two robots need common knowledge in order to coordinate. For example, if both robots know that robot 1 is in the goal but robot 1 does not know that robot 2 knows this, then robot 1 cannot stop, because robot 2 may not know to pass the ball at that point in time. Thus, the sound and complete termination condition of K(◇g) in the single-robot setting becomes C1,2(◇g) in the current setting, where g now means "Robot 1 is in the goal." That is, when this fact becomes common knowledge for the first time, the robots take their respective actions.

So far, we have left the sensor of robot 2 unspecified. In the first instance we analyze, let robot 2 be equipped with a sensor identical to robot 1's. That is, the sensor returns the position of robot 1 within the interval [L − 1, L + 1], and the noise of the sensor is deterministic, so that this value is exactly equal to the value observed by robot 1. Further, assume that this setting is common knowledge between the two robots. We can formalize the setting as a partition model in which a possible world (which was (h, t) above) is a tuple 〈p, R1, R2〉. The partitions of the two robots define equivalence classes based on the sensor of that agent. That is, Pi(〈p, R1, R2〉) = {〈p′, R′1, R′2〉 | Ri = R′i}, for i = 1, 2. The common knowledge of the properties of the sensors reduces the space of possible worlds to all 〈p, R1, R2〉 such that |p − R1| ≤ 1 and R1 = R2. The latter property implies that the partition structures for the two agents are identical.

We can now analyze this partition model to find our answer. From the single-robot case, we know that R1 ≥ 3 → K1(◇g). This statement can be verified easily using the partition structure. Recall the semantic definition of common knowledge: the proposition must be true in the last world of every sequence of possible worlds that starts at the current world and only transitions to a world that is in the partition of an agent. Because the partition structures are identical, starting from a world in which R1 ≥ 3, we cannot transition out of the partition we are in, which is identical for both agents, and thus C1,2(◇g) holds. Therefore, the robots can coordinate by using the same rule as in the previous section: robot 1 stops when R1 ≥ 3 is first true, and robot 2 takes its action when R2 ≥ 3 is first true. Since any lower sensor reading does not permit either robot to know ◇g, it is also the case that this rule is optimal.

One year later, we are preparing for the next competition. We've received enough funding to purchase a perfect sensor (one that always returns the true position of robot 1). We replace the sensor for robot 2 with this sensor, but we do not have enough money left over to buy either a new sensor for robot 1 or a means of communication between the two robots. Still, we are optimistic that our improved equipment will allow us to fare better this year by improving the termination condition so that the pass can be completed earlier. Since we have different common knowledge about the sensors, we have a different set of possible worlds. In this instance, each possible world 〈p, R1, R2〉 must satisfy p = R2 instead of R1 = R2, while also satisfying |p − R1| ≤ 1 as before. This change causes the robots to no longer have identical partition structures. Analyzing the new structure, we quickly realize not only that we cannot improve the termination condition, but also that our old rule no longer works. As soon as R2 ≥ 3 becomes true, we have both K1(◇g) and K2(◇g). However, in the partition for robot 2 in which R2 = 3, we find possible worlds in which R1 < 3. In these worlds, robot 1 does not know that it is in the goal. Thus, we have ¬K2(K1(◇g)), which means that robot 2 cannot take the action because robot 1 might not stop to receive the pass.

Suppose we try to salvage the situation by implementing a later termination condition. This idea obviously fails, because if robot 2 instead waits for R2 to be some value greater than 3, then robot 1 may already be past the goal. However, our problems run deeper. Even if we only require that the robots coordinate at some time after robot 1 enters the goal area (implicitly extending the goal to the range [2,∞]), we can never find a sound and complete termination condition. Common knowledge of ◇g is impossible to achieve, because, from any state of the world 〈p, R1, R2〉, we can follow a path of possible worlds (alternating between the robot whose partition we transition in) until we get to a world in which robot 1 is not in the goal. For example, consider the world 〈5, 6, 5〉. Robot 2 considers it possible that robot 1 observes a value of 4 (in world 〈5, 4, 5〉). In that world, robot 1 would consider it possible that the true position is 3 (in world 〈3, 4, 3〉). We can then make two more transitions, to 〈3, 2, 3〉 and then to 〈1, 2, 1〉, in which robot 1 is not in the goal. Thus, we have ¬K2(K1(K2(K1(◇g)))). Since we can obviously make a similar argument for any world with a finite value for p, there does not exist a world that implies C1,2(◇g).
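Both sensor configurations can be checked mechanically with the same reachability view of common knowledge used for Theorem 12.4.3. The sketch below is our own illustration: it discretizes positions and readings to integers from 0 to 20, builds the possible worlds 〈p, R1, R2〉 for each configuration, treats two worlds as indistinguishable to robot i when they agree on Ri, and reads ◇g as "p ≥ 2" (the extended goal). With identically flawed sensors, common knowledge of ◇g holds at a world such as 〈5, 6, 6〉; with the perfect sensor for robot 2, the chain above is found automatically and common knowledge fails at 〈5, 6, 5〉.

```python
from itertools import product

def worlds(identical_sensors, max_p=20):
    """All possible worlds <p, R1, R2> on an integer grid.  R1 is within 1
    of the true position p; R2 equals R1 (identically flawed sensors) or
    p (perfect sensor for robot 2)."""
    ws = set()
    for p, r1 in product(range(max_p + 1), repeat=2):
        if abs(p - r1) <= 1:
            ws.add((p, r1, r1 if identical_sensors else p))
    return ws

def reachable(w, ws):
    """Worlds reachable from w by repeatedly moving to some world that one
    of the robots cannot distinguish from the current one."""
    seen, stack = {w}, [w]
    while stack:
        _, r1, r2 = stack.pop()
        for v in ws:
            if v not in seen and (v[1] == r1 or v[2] == r2):
                seen.add(v)
                stack.append(v)
    return seen

def common_knowledge_of_goal(w, identical_sensors):
    """Does C_{1,2}(<>g) hold at w, reading <>g as 'p >= 2'?"""
    ws = worlds(identical_sensors)
    return all(p >= 2 for (p, _, _) in reachable(w, ws))

print(common_knowledge_of_goal((5, 6, 6), identical_sensors=True))   # True
print(common_knowledge_of_goal((5, 6, 5), identical_sensors=False))  # False
```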

This example illustrates how, when coordination is necessary, knowledge of the knowledge of the other agents can be more important than objective knowledge of the state of the world. In fact, in many cases it takes common knowledge to achieve common knowledge. When robot 2 had the flawed sensor, we needed common knowledge of the fact that the sensors were identically flawed (which was encoded in the space of possible worlds) in order for the robots to achieve common knowledge of ◇g. Also, the fact that the agents have to coordinate is a key restriction in this setting. If instead we only needed robot 1 to stop in the goal and robot 2 to take its action at some point after robot 1 stopped, then all we need is K1(◇g) and K2(K1(◇g)). This is achieved by the termination conditions of R1 ≥ 3 for robot 1 and R2 ≥ 4 for robot 2.

12.6 From knowledge to belief

We have so far discussed a particular informational attitude, namely 'knowledge,' but there are others. In this section we discuss 'belief', and in the next section we'll mention a third—'certainty.'

Like knowledge, belief is a mental attitude, and concerns an agent's view of different states of affairs. Indeed, we will model belief using Kripke structures (that is, possible worlds with binary relations defined on them). However, intuition tells us that belief has different properties from those of knowledge, which suggests that these Kripke structures should be different from the partition models capturing knowledge.

We will return to the semantic model of belief shortly, but it is perhaps easiest to make the transition from knowledge to belief via the axiomatic system. Recall that in the case of knowledge we had the veridity axiom, T: Kiϕ → ϕ. Suppose we take the S5 system and simply drop this axiom—what do we get? Hidden in this simple question is a subtlety, it turns out. Recall that S5 was shorthand for the system KDT45. Recall also that KDT45 was logically equivalent to KT45. However, KD45 is not logically equivalent to K45; axiom D is not derivable without axiom T (why?). It turns out that both KD45 and K45 have been put forward as a logic of idealized belief, and in fact both have been called weak S5. There are good reasons, however, to stick to KD45, which we will do henceforth.

The standard logic of belief KD45 therefore consists of the following axioms; note that we change the notation Ki to Bi to reflect the fact that we are modeling belief rather than knowledge.

Axiom 12.6.1 (K) (Biϕ ∧Bi(ϕ→ ψ))→ Biψ

Axiom 12.6.2 (D) ¬Bi(p ∧ ¬p)

Axiom 12.6.3 (4) Biϕ→ BiBiϕ

Axiom 12.6.4 (5) ¬Biϕ→ Bi¬Biϕ

These axioms—logical omniscience, consistency, and positive and negative introspection—play the same roles as in the logic of knowledge, as do the two inference rules, Modus Ponens and Necessitation.

The next natural question is whether we can complement this axiomatic theory of belief with a semantic one. That is, is there a class of Kripke structures for which this axiomatization is sound and complete, just as S5 is sound and complete for the class of partition models? The theorem in Section 12.3 already provides us the answer; it is the class of Kripke structures in which the accessibility relations are serial, transitive and Euclidean. Just as in the case of knowledge, there is a relatively simple way to understand this class, although it is not as simple as the class of partitions. As shown in Figure 12.8, we can envision such a structure as composed of several clusters of worlds, each of which is completely connected internally. In addition there are possibly some singleton worlds, each of which is connected only to all the worlds within exactly one cluster. We call such a model a quasi-partition.

Figure 12.8 Graphical depiction of a quasi-partition structure.

We saw that it was useful to define the notion of common knowledge. Can we similarly define common belief, and would it be a useful notion? The answers are yes, and maybe. We can certainly define common belief, by mirroring the definitions for knowledge. However, the resulting notion will necessarily be weaker. In particular, it is not hard to verify the validity of the following sentence:

KiCGϕ ≡ CGϕ

This equivalence breaks down in the case of belief and common belief.

12.7 Combining knowledge and belief (and revisiting knowledge)

Up until this point we have discussed how to model knowledge and belief separately from each other. But of course we may well want to combine the two, so that we can formalize sentences such as "if Bob knows that Alice believes it is raining, then Alice knows that Bob knows it." Indeed, it is easy enough to just merge the languages of knowledge and belief, so as to allow the sentence:

KB BA rain → KA KB BA rain

Furthermore, there is no difficulty in merging the semantic structures, and having two sets of accessibility relations over the possible worlds—partition models representing the knowledge of the agents, and quasi-partition models representing their beliefs. Finally, we can merge the axiom systems of knowledge and belief, obtaining a sound and complete axiomatization of merged structures.

However, doing just these merges, while preserving the individual properties of knowledge and belief, will not capture any interaction between them. And we do have some strong intuitions about such interactions. For example, according to the intuition of most people, knowledge implies belief. That is, the following sentence ought to be valid.

Kiϕ→ Biϕ

This sentence is not valid in the class of all merged Kripke structures defined above. This means that we must introduce further restrictions on these models in order to capture this property, and any other property about the interaction between knowledge and belief that we care about.

We will introduce a particular way of tying the two notions together, which has several conceptual and technical advantages. It will force us, however, to reconsider some of the assumptions we have made about knowledge and belief thus far, and to enter into philosophical discussion more deeply than we have heretofore.

To begin, we should distinguish two types of belief. Both of them are distinguished from knowledge in that they are defeasible; the agent may believe something false. However, in one version of belief the believing agent is aware of this defeasibility, while in the other it is not. We will call the first type of belief 'mere belief' (or sometimes 'belief' for short), and the second type of belief 'certainty.' An agent who is certain of a fact will not admit that she might be wrong; to her, her beliefs look like knowledge. It is only another agent who might label her beliefs as only that, and deny them the status of knowledge. Imagine that John is certain that the bank is open, but in fact the bank is closed. If you were to ask John, he would tell you that the bank is open. If you pressed him with "Are you sure?", he would answer "What do you mean 'am I sure,' I know that it is open." But of course this cannot be knowledge, since it is false. In contrast, you can imagine that John is pretty sure that the bank is open, sufficiently so to make substantial plans based on this belief. In this case if you were to ask John "Is the bank open?" he might answer "I believe so, but I am not sure." In this case John would be the first to admit that this is a mere belief on his part, not knowledge.5 But one should not discard qualitative beliefs casually. Not only can psychological and commonsense arguments be made on behalf of this notion, but in fact the notion plays an important role in game theory, which is for the most part Bayesian in nature. We return to this later in this chapter, when we discuss belief revision in Section 13.2.1.

5. A reader steeped in the Bayesian methodology would at this point be strongly tempted to protest that if John isn't sure he should quantify his belief probabilistically, rather than make a qualitative statement. We will indeed discuss probabilistic beliefs in Section 13.1.

From a technical point of view, we can split the Bi operator into two versions, B^m_i for 'mere belief' and B^c_i for certainty. Even before entering into a formal account of such operators, we expect that the sentence B^c_i ϕ → B^c_i Ki ϕ will be valid, but the sentence B^m_i ϕ → B^m_i Ki ϕ will not.

This distinction between certainty and mere belief also calls into question some of our assumptions about knowledge, in particular the negative introspection property. Consider the following informal argument. Suppose John is certain that the bank is open (B^c_J open). According to our interpretation of certainty, we have that John is certain that he knows that the bank is open (B^c_J KJ open). Suppose that the bank is in fact closed (¬open). This means that John does not know that the bank is open, because of the veridity property (¬KJ open). Because of the negative introspection property, we can conclude that John knows that he doesn't know that the bank is open (KJ ¬KJ open). If we reasonably assume that knowledge implies certainty, we get also that John is certain that he doesn't know that the bank is open (B^c_J ¬KJ open). Thus we get that John is certain of both the sentence KJ open and its negation, which contradicts the consistency property of certainty. So something has to give—and the prime candidate is the negative introspection axiom for knowledge. And indeed, even absent a discussion of belief, this axiom has attracted criticism early on as being counterintuitive.

This is a good point at which to introduce a caveat. Commonsense can serve at best as a crude guideline for selecting formal models. Human language and thinking are infinitely flexible, and we must accept at the outset a certain discrepancy between the meaning of a commonsense term and any formalization of it. That said, some discrepancies are more harmful than others, and negative introspection for knowledge may be one of the biggest offenders. This criticism does not diminish the importance of the partition model, which as we have seen is quite useful for modeling a variety of information settings. What it does mean is that perhaps the partition model is better thought of as capturing something other than knowledge; perhaps 'possesses the implicit information that' would be a more apt descriptor of the S5 modal operator. It also means then that we need to look for an alternative model that better captures our notion of knowledge, albeit still in a highly idealized fashion.

Here is one such model. Before we present it, one final philosophical musing. Philosophers have offered two informal slogans to explain the connection between knowledge and belief. The first is "knowledge is justified, true belief." The intuition is that knowledge is a belief that not only is true, but is held with proper justification. What 'proper' means is open to debate, and we do not offer a formal account of this slogan. The second slogan is "knowledge is belief that is stable with respect to the truth." The intuition is that, when presented with evidence to the contrary, an agent would discard any of its false beliefs. It is only correct beliefs that cannot be contradicted by correct evidence, and are thus stable in the face of such evidence. Our formal model can be thought of as a formal version of the second informal slogan; we will return to this intuition when we discuss belief revision later in Section 13.2.1.

In developing our combined formal model of knowledge and belief, we need to be clear about which sense of belief we intend. In what follows we restrict attention to the certainty kind, namely B^c_i. Since there is no ambiguity, we will use the simpler notation Bi.

We have so far encountered two special classes of Kripke models—partitions, and quasi-partitions. We will now introduce a third special class, the class of total preorders (a total preorder ≤ over a domain Y is a reflexive and transitive binary relation such that for all y, y′ ∈ Y it is the case that either y ≤ y′, or y′ ≤ y, or both).

Definition 12.7.1 (KB-structure) An (n-agent) KB-structure is a tuple (W, ≤1, . . . , ≤n), where

• W is a set of possible worlds, and

• each ≤i is a finite total preorder over W.

An (n-agent) KB-model over a language Σ is a tuple A = (W, π, ≤1, . . . , ≤n), where

• W and ≤i are as above, and

• π : Σ → 2^W is an interpretation function that determines which sentences in the language are true in which worlds.

Although there is merit in considering more general cases, we will confine our attention to well-founded total preorders; that is, we will assume that for no preorder ≤i does there exist an infinite sequence of worlds w1, w2, . . . such that wj <i wj+1 for all j > 0 (where < is the anti-reflexive closure of ≤).

A graphical illustration of a KB-structure for a single agent is given in Figure 12.9. In this structure there are three clusters of pairwise connected worlds, and each world in a given cluster is connected to all worlds in lower clusters.

Figure 12.9 A KB structure.

We use KB-structures to define both knowledge and certainty-type belief. First, it is useful to attach an intuitive interpretation to the accessibility relation. Think of it as describing the cognitive bias of an agent; w′ ≤ w iff w′ is at least as easy for the agent to imagine as w. The agent's beliefs are defined as truth in the most easily imagined worlds; intuitively, those are the only worlds the agent considers, given his current evidence. Since we are considering only finite hierarchies, these consist of the 'bottom-most' cluster. However, there are other worlds that are consistent with the agent's evidence; as implausible as they are, the agent's knowledge requires truth in them as well. (Besides being relevant to defining knowledge, these worlds are relevant in the context of belief revision, discussed in Section 13.2.1.) Figure 12.10 depicts the definitions graphically; the full definitions follow.

Figure 12.10 Knowledge and belief in a KB model.

Definition 12.7.2 (Logical entailment for KB-models) Let A = (W, π, ≤1, . . . , ≤n) be a KB-model over Σ, and w ∈ W. Then we define the |= relation as

• If ϕ ∈ Σ, then A,w |= ϕ if and only if w ∈ π(ϕ).

• A,w |= Kiϕ if and only if for all worlds w′, if w′ ≤i w then A,w′ |= ϕ.

• A,w |= Biϕ if and only if ϕ is true in all worlds minimal in ≤i.
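For concreteness, here is a minimal single-agent Python sketch of this definition (our own illustration; the rank encoding and the toy model are invented). The total preorder is represented by a plausibility rank (lower means more easily imagined), so Kϕ at w requires ϕ at every world of rank at most rank(w), while Bϕ requires ϕ only at the minimal-rank worlds.

```python
def knows(ranks, truths, w, prop):
    """A,w |= K prop  iff  prop holds at every world w' with w' <= w,
    i.e. at every world at least as plausible as w (rank(w') <= rank(w))."""
    return all(truths[v][prop] for v in ranks if ranks[v] <= ranks[w])

def believes(ranks, truths, prop):
    """A,w |= B prop  iff  prop holds at every <=-minimal world
    (the most plausible cluster); note this does not depend on w."""
    best = min(ranks.values())
    return all(truths[v][prop] for v in ranks if ranks[v] == best)

# A toy single-agent KB-model: three clusters of worlds, lower rank = more
# plausible.  'open' stands for "the bank is open".
ranks  = {"w1": 0, "w2": 0, "w3": 1, "w4": 2}
truths = {"w1": {"open": True},  "w2": {"open": True},
          "w3": {"open": True},  "w4": {"open": False}}

print(believes(ranks, truths, "open"))     # True: open holds in the bottom cluster
print(knows(ranks, truths, "w4", "open"))  # False: the implausible world w4 refutes it
print(knows(ranks, truths, "w1", "open"))  # True
```

Note that at the least plausible world w4 the agent believes open even though open is false there and is not known, which is precisely the certainty-without-knowledge situation described earlier in this section.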

We can now ask whether there exists an axiom system that captures the properties of knowledge and belief, as defined by KB-models. The answer is yes; the system that does it, while somewhat more complex than S5 or KD45, is nonetheless illuminating. If we only wanted to capture the notion of knowledge, we would need the so-called S4.3 axiomatic system, a well-known system of modal logic capturing total preorders. But we want to capture knowledge as well as belief. This is done by the following axioms (to which one must add the two usual inference rules of modal logic):

Knowledge:

Axiom 12.7.3 Ki(ϕ→ ψ)→ (Kiϕ→ Kiψ)

Axiom 12.7.4 Kiϕ→ ϕ

Axiom 12.7.5 Kiϕ→ KiKiϕ

Belief:

Axiom 12.7.6 Bi(ϕ→ ψ)→ (Biϕ→ Biψ)

Axiom 12.7.7 Biϕ→ ¬Bi¬ϕ

Axiom 12.7.8 Biϕ→ BiBiϕ

Axiom 12.7.9 ¬Biϕ→ Bi¬Biϕ

Knowledge and belief:

Axiom 12.7.10 Kiϕ → Biϕ

Axiom 12.7.11 Biϕ → BiKiϕ

Axiom 12.7.12 Biϕ → KiBiϕ

Axiom 12.7.13 ¬Biϕ→ Ki¬Biϕ

The properties of belief are precisely the properties of a KD45 system, the standard logic of belief. The properties of knowledge as listed constitute the KT4 system, also known as S4. However, by virtue of the relationship between knowledge and belief, one obtains additional properties of knowledge. In particular, one obtains the property of the so-called S4.2 system, a weak form of introspection:

¬Ki¬Kiϕ→ Ki¬Ki¬Kiϕ

This property is unintuitive, until one notes another surprising connection that can be derived between knowledge and belief in our system:

¬Ki¬Kiϕ↔ Biϕ

While one can debate the merits of this property, it is certainly less opaque than the previous one; and, if this equivalence is substituted into the previous formula, then we simply get one of the introspection properties of knowledge and belief!

12.8 History and references

The most comprehensive one-stop shop for logics of knowledge and belief is Fagin et al. [1995]. It is written by computer scientists, but covers well the perspectives of philosophy and game theory. Readers who would like to go directly to the sources should start with Hintikka [1962] in philosophy, Aumann [1976] in game theory (who introduced the partition model), Moore [1985] in computer science for an artificial intelligence perspective, and Halpern and Moses [1990] in computer science for a distributed systems perspective. These are all seminal publications, in rough chronological order of development. Another good reference is the proceedings of Theoretical Aspects of Rationality and Knowledge—or TARK—which can be found online at www.tark.org (the acronym originally stood for Theoretical Aspects of Reasoning about Knowledge). There are many books on modal logic in general, not only as applied to reasoning about knowledge and belief, ranging from the classic Chellas [1980] to more modern ones such as Fitting and Mendelsohn [1999] or Blackburn et al. [2002]. The application of logics of knowledge in robotics is based on Brafman et al. [1996] and Brafman et al. [1998].


13 Beyond Belief: Probability, Dynamics and Intention

In this chapter we go beyond the model of knowledge and belief introduced in the previous chapter. Here we look at how one might represent statements such as "Mary believes that it will rain tomorrow with probability > .7", and even "Bill knows that John believes with probability .9 that Mary believes with probability > .7 that it will rain tomorrow". We will also look at rules that determine how these knowledge and belief statements can change over time, look more broadly at the connection between logic and games, and consider how to formalize the notion of intention.

13.1 Knowledge and probability

In a Kripke structure, each possible world is either possible or not possible for a given agent, and an agent knows (or believes) a sentence when the sentence is true in all of the worlds that are accessible for that agent. As a consequence, in this framework both knowledge and belief are binary notions, in that agents can only believe or not believe a sentence (and similarly for knowledge). We would now like to add a quantitative component to the picture. In our quantitative setting we will keep the notion of knowledge as is, but will be able to make statements about the degree of an agent's belief in a particular proposition. This will allow us to express not only statements of the form "the agent believes with probability .3 that it will rain" but also statements of the form "agent i believes with probability .3 that agent j believes with probability .9 that it will rain." These sorts of statements can become tricky if we are not careful – for example, what happens if i = j in the last sentence?1

1. Indeed, some years back the New Yorker magazine published a cartoon with the following caption: There is now 60% chance of rain tomorrow, but there is 70% chance that later this evening the chance of rain tomorrow will be 80%.

There are several ways of formalizing a probabilistic model of belief in a multiagent setting, which vary in their generality. We will define a relatively restrictive class of models, but will then mention a few ways in which these can be generalized.

Our technical device will be to simply take our partition model and overlay a commonly-known probability distribution (called the common prior) over the possible worlds.

Definition 13.1.1 (Multiagent probability structure) Given a set X, let Π(X) be the class of all probability distributions over X. Then we define a (common prior) multiagent probability structure M over a non-empty set Φ of primitive propositions as the tuple (W, π, I1, . . . , In, P) where

• W is a non-empty set of possible worlds.

• π : Φ → 2^W is an interpretation that associates with each primitive proposition p ∈ Φ the set of possible worlds w ∈ W in which p is true.

• Each Ii is a partition relation, just as in the original partition model (see Definition 12.1.1).

• P ∈ Π(W) is the common prior probability.

Adding the probability distribution does not change the set of worlds that an agent considers possible from a world w, but it does allow us to quantify how likely an agent considers each of these possible worlds. In world w, agent i can condition on the fact that it is in the partition Ii(w) to determine the probability that it is in each w′ ∈ Ii(w). For all w′ ∉ Ii(w) it is the case that Pi(w′ | w) = 0, and for all w′ ∈ Ii(w) we have:

Pi(w′ | w) = P(w′) / Σ_{v ∈ Ii(w)} P(v)     (13.1)

Note that this means that if w′ and w′′ lie in the same partition for agent i, then Pi(w | w′) = Pi(w | w′′). This is one of the restrictions of our formulation, to which we return below.

We will often drop the designations "common prior" and "multiagent" when they are understood from the context, and simply use the term "probability structure". Next we discuss the syntax of a language to reason about probability structures and its semantics. We define the syntax of this new language formally as follows.

Definition 13.1.2 (Well-formed probabilistic sentences) Given a set Φ of primitive propositions, the set of well-formed probabilistic sentences LP is defined as

• Φ ⊆ LP

• If ϕ, ψ ∈ LP then ϕ ∧ ψ ∈ LP and ¬ϕ ∈ LP

• If ϕ ∈ LP then Pi(ϕ) ≥ a ∈ LP, for i = 1, . . . , n, and a ∈ [0, 1]

The intuitive meaning of these statements is clear; in particular, Pi(ϕ) ≥ a states that the probability of ϕ is at least a. For convenience, we can use several other comparison operators on probabilities, including Pi(ϕ) ≤ a ≡ Pi(¬ϕ) ≥ 1 − a, Pi(ϕ) = a ≡ Pi(ϕ) ≥ a ∧ Pi(ϕ) ≤ a, and Pi(ϕ) > a ≡ Pi(ϕ) ≥ a ∧ ¬Pi(ϕ) = a.

Now note that we can use this language and its abbreviations to express formally our sample higher-order sentences from above. To express the sentence "agent 1 believes with probability 0.3 that agent 2 believes with probability 0.7 that q," we would write P1(P2(q) = 0.7) = 0.3. Similarly, to express the sentence "agent 1 believes with probability 1 that she herself believes with probability at least 0.5 that p," we would write P1(P1(p) ≥ 0.5) = 1.

Now we define |=, the satisfaction relation, which links our syntax and semantics.

Definition 13.1.3 (|= relation) Let p ∈ Φ be a primitive proposition and ϕ and ψ be sentences of modal logic. We define the |= relation as

• M,w |= p if and only if w ∈ π(p)

• M,w |= ¬ϕ if and only if M,w ⊭ ϕ

• M,w |= ϕ ∧ ψ if and only if M,w |= ϕ and M,w |= ψ

• M,w |= Pi(ϕ) ≥ a if and only if ( Σ_{v ∈ Ii(w), M,v |= ϕ} P(v) ) / ( Σ_{v ∈ Ii(w)} P(v) ) ≥ a

The following example illustrates the definitions.

Figure 13.1 A KP-structure with a common prior. The eight possible worlds are the truth assignments to p, q, and r, with prior probabilities: (p, q, r): 0.2; (p, ¬q, r): 0.1; (p, q, ¬r): 0.1; (p, ¬q, ¬r): 0; (¬p, q, r): 0.1; (¬p, ¬q, r): 0.25; (¬p, q, ¬r): 0.1; (¬p, ¬q, ¬r): 0.15. Agent 1's partition I1 groups the worlds by the truth value of p, and agent 2's partition I2 groups them by the truth value of q.

Consider Figure 13.1. The interpretation of this structure is that each agent knows the truth of exactly one proposition: p for agent 1, and q for agent 2. If the real world is w = (p, q, r), then we have M,w |= (P1(q) = 0.75), because conditioning on the partition I1(w) yields (0.2 + 0.1)/(0.2 + 0.1 + 0.1 + 0) = 0.75. We now go a level deeper to consider agent 2 modeling agent 1. In the partition for agent 2 containing the true world, I2(w), the probability that agent 2 assigns to being in the partition I1(w) is (0.2 + 0.1)/(0.2 + 0.1 + 0.1 + 0.1) = 0.6. Since the other partition for agent 1 yields a different P1(q), we have the sentence M,w |= P2(P1(q) = 0.75) = 0.6.
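These two numbers can be verified mechanically. The Python sketch below is our own illustration of Figure 13.1 (the variable names are invented): it encodes the common prior over the eight worlds, the two partitions (agent 1 knows p, agent 2 knows q), and the conditioning of Equation (13.1), and then evaluates the first- and second-order beliefs at the real world (p, q, r).

```python
# Worlds are triples of truth values (p, q, r) with the common prior of Fig. 13.1.
prior = {
    (True,  True,  True ): 0.20, (True,  False, True ): 0.10,
    (True,  True,  False): 0.10, (True,  False, False): 0.00,
    (False, True,  True ): 0.10, (False, False, True ): 0.25,
    (False, True,  False): 0.10, (False, False, False): 0.15,
}

def cell(agent, w):
    """Agent 1's partition groups worlds by p, agent 2's by q."""
    idx = 0 if agent == 1 else 1
    return [v for v in prior if v[idx] == w[idx]]

def prob(agent, w, event):
    """P_agent(event | I_agent(w)), as in Equation (13.1)."""
    c = cell(agent, w)
    return sum(prior[v] for v in c if event(v)) / sum(prior[v] for v in c)

w = (True, True, True)                      # the real world (p, q, r)
print(prob(1, w, lambda v: v[1]))           # agent 1's belief that q: ~0.75
# Agent 2's belief that agent 1 believes q with probability 0.75:
print(prob(2, w, lambda v: abs(prob(1, v, lambda u: u[1]) - 0.75) < 1e-9))  # ~0.6
```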

This is a good point at which to discuss some of the constraints of the theory as we have presented it. To begin with, we have implicitly assumed that the partition structure of each agent is common knowledge among all agents. Second, we assumed that the beliefs of the agents are based on a common prior, and are obtained by conditioning on the worlds in the partition. Both are substantive assumptions, and have ramifications. For example, we have the fact that the beliefs of an agent are the same within all worlds of any given partition. Also note that we have a strong property of discreteness in the higher-order beliefs. Specifically, the number of statements of the form "M,w |= Pi(Pj(ϕ) = a) = b" in which b > 0 is at most the number of partitions for agent j, and the sum of all b's from these statements is equal to 1. Thus, you do not need intervals to account for all of the probability mass, as you would with a continuous distribution. In fact, it is the case that for any depth of recursive modeling, you can always decompose a probability over a range into a finite number of probabilities of individual points. One could imagine an alternative formulation in which agent i did not know the prior of agent j, but instead had a continuous distribution over agent j's possible priors. In that case, you could have M,w |= Pi(Pj(ϕ) ≥ b) > 0 without there being any specific a ≥ b such that M,w |= Pi(Pj(ϕ) = a) > 0.

We could in fact endow agents with probabilistic beliefs without assuming a common prior and with a much more limited use of the partition structure. For example, we could replace, in each world and for each agent, the set of accessible worlds by a probability distribution Pi,w over a set of worlds. In this case the semantic condition for belief would change to:

• M,w |= Pi(ϕ) ≥ a if and only if Σ_{v : M,v |= ϕ} Pi,w(v) ≥ a.

If we want to retain the partition structure so we can speak about knowledge as well as (probabilistic) belief, we could add the requirement that Pi,w(w′) = 0 for all w′ ∉ Ii(w).

However, such interesting extensions bring up a variety of complexities, including, in particular, the axiomatization of such a system, and throughout this book we stay within the confines of the theory as presented.

As we have discussed, every probability space gives rise to an infinite hierarchy of beliefs for each agent: beliefs about the values of the primitive propositions, beliefs about other agents' beliefs, beliefs about other agents' beliefs about other agents' beliefs, and so on. This collection of beliefs is called the "epistemic type space" in game theory, and plays an important role in the study of games of incomplete information, or Bayesian games, discussed in Section 5.3. (There, each possible world is a different game, and the partition of each agent represents the set of games that the agent cannot distinguish between, given the signal it receives from nature.) One can ask whether the converse holds: is every type space given rise to by some multiagent probability structure? The perhaps surprising answer is yes, so long as the type space is self coherent. Self coherence, which plays a role analogous to a combination of the positive and negative introspection properties of knowledge and (qualitative) belief, is defined as follows:

Definition 13.1.4 (Self coherent belief) A collection of higher-order beliefs is self coherent iff, for any agent i, any proposition ϕ, and any a ∈ [0, 1], if Pi(ϕ) ≥ a then Pi(Pi(ϕ) ≥ a) = 1.

Recall that in modal logics of knowledge and belief we augmented the single-agent modalities with group ones. In particular, we defined the notion of common knowledge. We can now do the same for probabilistic beliefs, and define the notion of common (probabilistic) belief:

Definition 13.1.5 (Common belief) A sentence ϕ is commonly believed among two agents a and b, written C^p_{a,b}(ϕ), if Pa(ϕ) = 1, Pb(ϕ) = 1, Pa(Pb(ϕ) = 1) = 1, Pb(Pa(ϕ) = 1) = 1, Pa(Pb(Pa(ϕ) = 1) = 1) = 1, and so on. The definition extends naturally to common (probabilistic) belief among an arbitrary set G of agents, denoted C^p_G.

Now that we have the syntax and semantics of the language, we may ask whether there exists an axiomatic system for this language that is sound and complete for the class of common-prior probability spaces. The answer is that one exists, and it is the following:

Definition 13.1.6 (Axiom system AX^P) The axiom system AX^P consists of the following axioms and inference rules (in the schemas below ϕ and ψ range over all sentences, and ni, nj ∈ [0, 1]):

Axiom 13.1.7 (A1) All of the tautological schema of propositional logic

Axiom 13.1.8 (P1: non-negativity) Pi(ϕ) ≥ 0

Axiom 13.1.9 (P2: additivity) Pi(ϕ ∧ ψ) + Pi(ϕ ∧ ¬ψ) = Pi(ϕ)

Axiom 13.1.10 (P3: syntax independence) Pi(ϕ) = Pi(ψ) if ϕ ⇔ ψ is a propositional tautology

Axiom 13.1.11 (P4: positive introspection) (Pi(ϕ) ≥ a) → Pi(Pi(ϕ) ≥ a) = 1

Axiom 13.1.12 (P5: negative introspection) (¬Pi(ϕ) ≥ a) → Pi(¬Pi(ϕ) ≥ a) = 1

Axiom 13.1.13 (P6: common-prior assumption) C^p_{i,j}(Pi(ϕ) = ni ∧ Pj(ϕ) = nj) → ni = nj

Axiom 13.1.14 (R1) From ϕ and ϕ → ψ infer ψ

Axiom 13.1.15 (R2) From ϕ infer Pi(ϕ) = 1

Note that axioms P4 and P5 are the direct analogs of the positive and negative introspection axioms in the modal logics of knowledge and belief. Axiom P6 is novel, and captures the common-prior assumption.

Now we are ready to give the soundness and completeness result:

Theorem 13.1.16 The axiom system AX^P is sound and complete with respect to the class of all common-prior multiagent probability structures.

13.2 Dynamics of knowledge and belief

We have so far discussed how to represent "snapshots" of knowledge and belief. We did speak a little about how knowledge, for example, changes over time, as in the context of the Muddy Children problem. But the theories presented were all static ones. For example, in the Muddy Children problem we used the partition model to represent the knowledge state of the children after each time the father asks his question, but did not give a formal account of how the system transitions from one state to the other. This chapter is devoted to discussing such dynamics.

Multiagent Systems, draft of August 27, 2007

Page 390: Mas kevin b-2007

374 13 Beyond Belief: Probability, Dynamics and Intention

We will first consider the problem of belief revision, which is the process of revising an existing state of belief on the basis of newly learned information. We will consider the revision of both qualitative and quantitative (i.e., probabilistic) beliefs. Then we will briefly look at dynamic operations on beliefs that are different from revision, such as update and fusion.

13.2.1 Belief revision

One way in which the knowledge and belief of an agent change is when the agent learns new facts, whether by observing the world or by being informed by another agent. Whether the beliefs are categorical (as in the logical setting) or quantitative (as in the probabilistic setting), we call this process belief revision.

When the new information is consistent with the old beliefs, the process is straightforward. In the case of categorical beliefs, one simply adds the new beliefs to the old ones and takes the logical closure of the union. That is, consider a knowledge base (or belief base—here it does not matter) K and new information ϕ such that K ⊭ ¬ϕ. The result of revising K by ϕ, written K ∗ ϕ, is simply Cn(K ∪ {ϕ}), where Cn denotes the logical closure operator. Or, thought of semantically, the models of K ∗ ϕ consist of the intersection of the models of K and the models of ϕ.

The situation is equally straightforward in the probabilistic case. Consider a prior belief in the form of a probability distribution P(·) and new information ϕ such that P(ϕ) > 0. P ∗ ϕ is then simply the posterior distribution P(· | ϕ).

Note that in both the logical and probabilistic settings, the assumption that the new information is consistent with the prior belief (or knowledge) is critical. Otherwise, in the logical setting the result of revision yields the empty set of models, and in the probabilistic case the result is undefined. Thus the bulk of the work in belief revision lies in trying to capture how beliefs are revised by information that is inconsistent with the prior beliefs (and, given their defeasibility, these are really beliefs, as opposed to knowledge).

One might be tempted to argue that this is a waste of time. If an agent is misguided enough to hold false beliefs, let him suffer the consequences. But this is not a tenable position. First, it is not only the agent who suffers the consequences, but also other agents—and we, the modelers—who must reason about him. But beyond that, there are good reasons why agents might hold firm beliefs and later retract them. In the logical case, if an agent were to wait for foolproof evidence before adopting any belief, the agent would never believe anything but tautologies. Indeed, the notion of belief is intimately tied to those of default reasoning and nonmonotonic reasoning, which are motivated by just this observation.

In the probabilistic case too there are times at which it is unnatural to assign an event

a probability other than zero. We encounter this in particular in the context of non-cooperative game theory. There are situations in which it is a strictly dominant strategy (see Section 2.3.5) for an agent to take a certain action, as in the following example.

Two foes, Lance and Lot, are about to enter into a duel. Each of them must choose a weapon, and then decide on a fighting tactic. Lance can choose among two swords—an old, blunt sword, and a new, sharp one. The new one is much better than the old, regardless of the weapon selected by Lot and the tactics selected by either foe. So in selecting his fighting tactic, Lot is justified in assuming that Lance will have selected the new sword with probability one. But what should Lot do if he sees that Lance selected the old sword after all?

If this example seems a bit unnatural, the reader might refer forward to the discussion of backward induction in Chapter 4.

Indeed, a great deal of attention has been paid to belief revision by information inconsistent with the initial beliefs. We first describe the account of belief revision in the logical setting; we then show how the account in the probabilistic setting is essentially the same.

Logical belief revision: The AGM model

We start our discussion of belief revision semantically, and with a familiar structure—the KB-models of Section 12.7. In that section KB-models were used to distinguish knowledge from (a certain kind of) belief. Here we use them for an additional purpose, namely to reason about conditional beliefs. Technically, in Section 12.7 KB-models were used to give meaning to the two modal operators Ki and Bi. Given a particular world, Ki was defined as truth in all worlds "downward" from it, and Bi was defined (loosely speaking) as truth in all worlds in the "bottom-most" cluster. But the KB-model can be mined for further information, and here is the intuition. Imagine a KB-model, and a piece of evidence ϕ. Now erase from that model all the worlds that do not satisfy ϕ, and all the links to and from those worlds; the result is a new, reduced KB-model. The new beliefs of the agent, after taking ϕ into account, are the beliefs in this reduced KB-model. The following definition makes this precise:

Definition 13.2.1 (Belief revision in a KB-model) Given a KB-model A = (W, π, ≤1, . . . , ≤n) over Σ and any ϕ ∈ Σ, let W(ϕ) = {w ∈ W | A,w |= ϕ}, let ≤i(ϕ) = {(w1, w2) ∈ ≤i | A,w1 |= ϕ, A,w2 |= ϕ}, and let A(ϕ) = (W(ϕ), π, ≤1(ϕ), . . . , ≤n(ϕ)). Then the beliefs of agent i after receiving evidence ϕ, denoted B^ϕ_i, are defined as follows:

A,w |= B^ϕ_i(ψ) iff A(ϕ), w |= ψ

Figure 13.2 illustrates this definition. The checkered areas contain all worlds that do not satisfy ϕ.

Note some nice properties of this definition. First, note that Biψ ↔ Bi^true ψ, where true is any tautology. Second, recall the philosophical slogan regarding knowledge as belief that is stable with respect to the truth. We now see, in a precise sense, why our definitions of knowledge and belief can be viewed as embodying this slogan. Consider ψ such that A, w |= Kiψ. Obviously, A, w |= Biψ, but it is also the case that A, w |= Bi^ϕ ψ for any ϕ such that A, w |= ϕ. So we get one direction of the slogan—if a proposition is known, then indeed it is a belief that will be retained in the face of any correct evidence. While the converse—that any believed proposition that is not known can be falsified by some evidence—is less straightforward, we point out that it would hold if the language Σ were rich enough to capture all propositions.
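To make Definition 13.2.1 concrete, here is a minimal Python sketch of single-agent revision in a KB-model, with the preorder encoded as an integer plausibility rank (lower is more plausible); the class and function names, and the rank-based encoding, are our own illustration rather than part of the formal machinery.

```python
# A minimal sketch of single-agent belief revision in a KB-model.
# Worlds are labeled with the atoms true in them; the plausibility rank plays
# the role of the preorder (rank 0 = the "bottom-most", most plausible cluster).
from dataclasses import dataclass

@dataclass(frozen=True)
class World:
    name: str
    atoms: frozenset   # primitive propositions true in this world
    rank: int          # plausibility: lower rank = more plausible

def satisfies(world, formula):
    """formula is a predicate over a world's atoms."""
    return formula(world.atoms)

def believes(worlds, formula):
    """B(psi): psi holds in every most-plausible world."""
    if not worlds:
        return False
    best = min(w.rank for w in worlds)
    return all(satisfies(w, formula) for w in worlds if w.rank == best)

def revise(worlds, evidence):
    """A(phi): erase every world that does not satisfy the evidence."""
    return [w for w in worlds if satisfies(w, evidence)]

# Example: three worlds; initially the agent believes "rain".
worlds = [
    World("w1", frozenset({"rain"}), 0),
    World("w2", frozenset({"rain", "wind"}), 1),
    World("w3", frozenset({"wind"}), 2),
]
rain = lambda atoms: "rain" in atoms
wind = lambda atoms: "wind" in atoms

print(believes(worlds, rain))                                     # True: B(rain)
print(believes(revise(worlds, wind), rain))                       # B^wind(rain) -> True
print(believes(revise(worlds, lambda a: "rain" not in a), rain))  # B^{not-rain}(rain) -> False
```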


Figure 13.2 Example of belief revision.

As usual, we ask whether this semantic definition can be characterized through alternative means. The answer is yes, and in fact we will discuss two ways of doing so. The first is via the so-called AGM postulates, so named after Alchourrón, Gärdenfors and Makinson. These set-theoretic postulates take an arbitrary belief revision operator ∗ and an initial belief set K, and ask, for any evidence ϕ, what the properties should be of K ∗ ϕ, the revision of K by ϕ. For example, one property is the following: ϕ ∈ K ∗ ϕ. This is the prioritization rule, or the gullibility rule—new information is always accepted, and thus is given priority over old information (other rules require that the theory be consistent, which can cause old information to be discarded as a result of the revision).

The AGM postulates for revising a theory K are as follows (recall that Cn(T) denotes the set of tautological consequences of the theory T):

(∗1) K ∗ ϕ = Cn(K ∗ ϕ).

(∗2) ϕ ∈ K ∗ ϕ.

(∗3) K ∗ ϕ ⊆ Cn(K,ϕ).

(∗4) If ¬ϕ ∉ K then Cn(K, ϕ) ⊆ K ∗ ϕ.

(∗5) If ϕ is consistent then K ∗ ϕ is consistent.

(∗6) If |= ϕ ↔ ψ then K ∗ ϕ = K ∗ ψ.

(∗7) K ∗ (ϕ ∧ ψ) ⊆ Cn(K ∗ ϕ, ψ).


(∗8) If ¬ψ ∉ K ∗ ϕ then Cn(K ∗ ϕ, ψ) ⊆ K ∗ (ϕ ∧ ψ).

In a sense that we will make precise below, these exactly characterize belief revision with KB-models. But before making this precise, let us discuss an alternative axiomatic characterization of belief revision operators. This is an axiomatic theory of consequence relations. By definition a meta-theory, this axiomatic theory consists of rules of the form "If derivation x is valid then so is derivation y."

It turns out that this meta-theoretic characterization lends particularly deep insight into belief dynamics. For example, in the classical logic setting, the monotonicity rule is valid.

(Monotonicity) If α ⊢ β then α, γ ⊢ β.

The analogous rule for belief revision would be "If β ∈ K ∗ α then β ∈ K ∗ (α ∧ γ)". This does not hold for belief revision, and indeed for this reason belief revision is often called nonmonotonic reasoning. There are however other meta-rules that hold for belief revision. Consider the following rules; in these we use the symbol |∼ to represent the fact that we are reasoning about a nonmonotonic consequence relation, and not the classical ⊢.

(Left Logical Equivalence) If |= α ↔ β and α |∼ γ then β |∼ γ.

(Right Weakening) If |= α → β and γ |∼ α then γ |∼ β.

(Reflexivity) α |∼ α.

(And) If α |∼ β and α |∼ γ then α |∼ β ∧ γ.

(Or) If α |∼ γ and β |∼ γ then α ∨ β |∼ γ.

(Cautious Monotonicity) If α |∼ β and α |∼ γ then α ∧ β |∼ γ.

(Rational Monotonicity) If it is not the case that α ∧ β |∼ γ, and not the case that α |∼ ¬β, then it is not the case that α |∼ γ.

A consequence relation that satisfies all of these properties is called a rational consequence relation.

The following theorem ties together the semantic notion, the axiomatic one, and the meta-axiomatic one:

Theorem 13.2.2 Consider a propositional language L with a finite alphabet, a revision operator ∗, and a theory K ⊆ L. Then the following are equivalent:

1. ∗ is defined by a finite total preorder: There is a single-agent KB-model A and a world w such that A, w |= Bρ for each ρ ∈ K, and for each ϕ, ψ ∈ L it is the case that ψ ∈ K ∗ ϕ iff A, w |= B^ϕ ψ.

2. ∗ satisfies the AGM postulates.

3. ∗ is a rational consequence relation.
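To connect clauses 1 and 3 of the theorem, the following Python sketch derives a nonmonotonic consequence relation |∼ from a finite ranked set of worlds ("β holds in all minimal α-worlds") and checks a couple of the meta-rules on a toy example; the worlds, ranks, and predicates are invented purely for illustration.

```python
# A finite total preorder over worlds (encoded as integer ranks) induces a
# consequence relation alpha |~ beta: beta is true in all minimal alpha-worlds.
worlds = [
    ("w1", {"bird", "flies"}, 0),      # most plausible
    ("w2", {"bird", "penguin"}, 1),
    ("w3", {"penguin"}, 2),
]

def nm_entails(alpha, beta):
    """alpha |~ beta: beta holds in every minimal-rank world satisfying alpha."""
    alpha_worlds = [(n, a, r) for (n, a, r) in worlds if alpha(a)]
    if not alpha_worlds:
        return True
    best = min(r for (_, _, r) in alpha_worlds)
    return all(beta(a) for (_, a, r) in alpha_worlds if r == best)

bird        = lambda a: "bird" in a
penguin     = lambda a: "penguin" in a
flies       = lambda a: "flies" in a
not_penguin = lambda a: "penguin" not in a

print(nm_entails(bird, flies))                                   # True: birds normally fly
print(nm_entails(lambda a: bird(a) and penguin(a), flies))       # False: monotonicity fails
# bird |~ flies and bird |~ not_penguin, so Cautious Monotonicity demands:
print(nm_entails(lambda a: bird(a) and not_penguin(a), flies))   # True
```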


Probabilistic belief revision

As we have discussed, the crux of the problem in probabilistic belief revision is the inability, in the traditional Bayesian view, to condition beliefs on evidence whose prior probability is zero (also known as measure-zero events). Specifically, the definition of conditional probability:

P(A | B) = P(A ∩ B) / P(B)

leaves the conditional belief P(A | B) undefined when P(B) = 0.

There exist various extensions of traditional probability theory to deal with this problem. In one of them, based on so-called Popper functions, one takes the expression P(A | B) as primitive, rather than defined. In another, called the theory of nonstandard probabilities, one allows as probabilities not only the standard real numbers but also nonstandard reals. These two theories, as well as several others, turn out to be essentially the same, except for some rather fine mathematical subtleties. We discuss explicitly one of these theories, the theory of Lexicographic Probability Systems (LPSs).

Definition 13.2.3 (Lexicographic probability system) A (finite) lexicographic probability system (LPS) is a sequence p = p1, p2, . . . , pn of probability distributions. Given such an LPS, we say that p(A | B) = c if there is an index 1 ≤ i ≤ n such that for all 1 ≤ j < i it is the case that pj(B) = 0, pi(B) ≠ 0, and (now using the classical notion of conditional probability) pi(A | B) = c. Similarly, we say that event A has a higher probability than event B (p(A) > p(B)) if there is an index 1 ≤ i ≤ n such that for all 1 ≤ j < i it is the case that pj(A) = pj(B), and pi(A) > pi(B).

In other words, to determine the probability of an event in an LPS, one uses the first probability distribution if it is well defined for this event. If it is not, one tries the second distribution, and so on until the probability is well defined. A standard example of this involves throwing a die. It may land on one of its six faces, with equal probability. There is also a possibility that it will land on one of its twelve edges; minuscule as this probability is, it is nonetheless a possible event. Finally, even more improbably, the die may land and stay perched on one of its eight corners. In a classical probabilistic setting one usually accords each of the first six events—corresponding to the die landing on one of the faces—a probability of 1/6, and the other 20 events a probability of zero. In this more general setting, however, one can define an LPS (p1, p2, p3) as follows. p1 would be the classical distribution just described. p2 would give a probability of 1/12 to the die landing on each of the edges, and 0 to all other events. Finally, p3 would give a probability of 1/8 to the die landing on each of the corners, and 0 to all other events.
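The die example can be written out as a small lexicographic probability system; the following Python sketch (with made-up outcome labels and helper names) shows how conditioning falls through to the first distribution that assigns the conditioning event positive probability.

```python
from fractions import Fraction

# The die example as an LPS (p1, p2, p3); outcome labels are invented.
faces    = {f"face{i}" for i in range(1, 7)}
edges    = {f"edge{i}" for i in range(1, 13)}
corners  = {f"corner{i}" for i in range(1, 9)}
outcomes = faces | edges | corners

def dist(support, weight):
    return {o: (weight if o in support else Fraction(0)) for o in outcomes}

lps = [dist(faces, Fraction(1, 6)), dist(edges, Fraction(1, 12)), dist(corners, Fraction(1, 8))]

def prob(event, given=outcomes):
    """p(A | B): use the first distribution that gives the conditioning event positive probability."""
    for p in lps:
        pB = sum(p[o] for o in given)
        if pB > 0:
            return sum(p[o] for o in event & given) / pB
    raise ValueError("conditioning event has probability zero in every distribution")

print(prob({"face1"}))                       # 1/6, taken from p1
print(prob({"edge1"}, given=edges))          # 1/12; p1 gives the edges probability 0, so p2 is used
print(prob({"corner1"}, given=corners))      # 1/8, falling all the way through to p3
```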

LPSs are closely related to AGM-style belief revision:

Theorem 13.2.4 Consider a propositional language L with a finite alphabet, a revision operator ∗, and a theory K ⊆ L. Then the following are equivalent:

1. ∗ satisfies the AGM postulates for revising K.


2. There exists an LPS p = p1, . . . , pn such that p1(K) = 1, and such that for every ϕ and ψ it is the case that ψ ∈ K ∗ ϕ iff p(ψ | ϕ) = 1.

Thus, we have extended the three-way equivalence of Theorem 13.2.2 to a four-way one.

13.2.2 Beyond AGM: update, arbitration, fusion, and friends

In discussing the dynamics of belief we have so far limited ourselves to belief revision, and a specific type of revision at that (namely, AGM-style belief revision). But there are other forms of belief dynamics, and this section discusses some of them. We do so more briefly than with belief revision; at the end of the chapter we provide references to further readings on these topics.

Expansion and contraction

To begin with, there are two simple operations closely linked to revision that are worth noting. Expansion is simply the addition of a belief, regardless of whether it leads to a contradiction. The expansion of a theory K by a formula ϕ, written K + ϕ, is defined by:

K + ϕ = Cn(K ∪ {ϕ})

Contraction is the operation of removing just enough from a theory so as to make it consistent with new evidence. The contraction of a theory K by a formula ϕ is written K − ϕ. Like revision, and unlike expansion, contraction is not a unique operation, but rather a family of operations that are governed by certain rules. Rather than list these rules, here is a simple equation (called the Levi identity) that relates all three operations:

K ∗ ϕ = (K − ¬ϕ) + ϕ

Update

Belief update is another interesting operation. Similar to revision, it incorporates new evidence into an existing belief state, ensuring that consistency is maintained. But the intuition behind update is subtly different from the intuition underlying revision. In revision, the second argument represents new evidence of facts that were true all along. In update, the second argument represents facts that have possibly become true only after the original beliefs were formed. Thus if an agent believes that it is not raining and suddenly feels raindrops, in the case of revision the assumption is that she was wrong to have those original beliefs, but in the case of update the assumption is that she was right, but that since then it started raining.

The different intuitions underlying revision and update translate to different conclusions they can yield. Consider the following initial belief: "Either the room is white, or else the independence day of Micronesia is November 2nd (or both)". Now consider two scenarios. In the first one, you look in the room and see that it is green; the white hypothesis is ruled out, and you infer that the independence day of Micronesia is November 2nd. This is an instance of belief revision. But now consider a different scenario, in which a painter emerges from the room and informs you that he has just painted the room green. You now have no business inferring anything about Micronesia. This is an instance of belief update.²

Since revision and update are different, it is to be expected that the set of postulates governing update will differ from the AGM postulates. The update of a theory K by a formula ϕ is written K ⋄ ϕ. The so-called KM postulates for updating a (consistent) theory K (so named after Katsuno and Mendelzon) read as follows:

(⋄1) K ⋄ ϕ = Cn(K ⋄ ϕ).

(⋄2) ϕ ∈ K ⋄ ϕ.

(⋄3) If ϕ ∈ K then K ⋄ ϕ = K.

(⋄4) K ⋄ ϕ is inconsistent iff ϕ is inconsistent.

(⋄5) If |= ϕ ≡ ψ then K ⋄ ϕ = K ⋄ ψ.

(⋄6) K ⋄ (ϕ ∧ ψ) ⊆ (K ⋄ ϕ) + ψ.

(⋄7) If ψ ∈ K ⋄ ϕ and ϕ ∈ K ⋄ ψ then K ⋄ ϕ = K ⋄ ψ.

(⋄8) If K is complete³ then K ⋄ (ϕ ∧ ψ) ⊆ (K ⋄ ϕ) ∩ (K ⋄ ψ).

(⋄9) K ⋄ ϕ = ⋂_{M ∈ Comp(K)} (M ⋄ ϕ), where Comp(K) denotes the set of all complete theories that entail K.

As in the case of revision, the model theory of update is also well understood, and is as follows (this discussion is briefer than that for revision, and we point the reader to further reading at the end of the chapter).

Revision and update can be related by the following identity:

K ⋄ ϕ = ⋂_{M ∈ Comp(K)} (M ∗ ϕ)

Comparing this property to the last postulate for update, one can begin to gain intuition for the model theoretic characterization of the operator. In particular, it is the case that there is no distinction between revision and update when dealing with complete theories. More generally, the following completely characterizes the class of update operators that obey the KM postulates:

Theorem 13.2.5 The following two statements about an update operator ⋄ are equivalent:

1. ⋄ obeys the KM postulates.

2. There exists a function that maps each interpretation M to a partial preorder ≤M such that

Mod(K ⋄ ϕ) = ⋃_{M ∈ Comp(K)} Min(Mod(ϕ), ≤M)

² As it happens, the Federated States of Micronesia, which have been independent since 1986, celebrate independence day on November 3rd.

³ A theory K is complete if for each sentence ϕ, either ϕ ∈ K or ¬ϕ ∈ K.

Note two important contrasts with the model theoretic characterization of belief revision. First, here the preorders need not be total. Second, and more critically, in revision there is one global preorder on worlds; here each interpretation induces a different preorder.
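The following toy Python sketch contrasts the two operations on the room/Micronesia example above: revision uses one global plausibility order (K-worlds most plausible), while update revises each model of K separately; Hamming distance stands in for the per-model preorders ≤M, and all names and the distance choice are our own illustrative assumptions.

```python
from itertools import product

# Worlds are truth assignments to "white" (the room is white) and "nov2"
# (Micronesia's independence day is November 2nd).
atoms = ("white", "nov2")
worlds = [dict(zip(atoms, vals)) for vals in product([True, False], repeat=2)]

K = lambda w: w["white"] or w["nov2"]     # initial belief: white-or-nov2
evidence = lambda w: not w["white"]       # the room turns out not to be white

def entails(model_set, formula):
    return all(formula(w) for w in model_set)

def hamming(w1, w2):
    return sum(w1[a] != w2[a] for a in atoms)

def revise(belief, phi):
    """One global plausibility order: worlds satisfying K are most plausible."""
    phi_worlds = [w for w in worlds if phi(w)]
    best = min(0 if belief(w) else 1 for w in phi_worlds)
    return [w for w in phi_worlds if (0 if belief(w) else 1) == best]

def update(belief, phi):
    """Each model of K is 'revised' separately (closest phi-worlds), then unioned."""
    result = []
    phi_worlds = [w for w in worlds if phi(w)]
    for m in (w for w in worlds if belief(w)):
        d = min(hamming(m, w) for w in phi_worlds)
        result.extend(w for w in phi_worlds if hamming(m, w) == d and w not in result)
    return result

nov2 = lambda w: w["nov2"]
print(entails(revise(K, evidence), nov2))   # True: revision concludes nov2
print(entails(update(K, evidence), nov2))   # False: update does not
```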

Arbitration

Returning to AGM-style revision, we note two aspects of asymmetry. The first is blatant—new evidence is given priority over existing beliefs. The second is more subtle—the first argument to any AGM revision operator is a "richer" type of object than the second. In this subsection we discuss the first asymmetry, and in the next subsection the second.

There are certainly situations in which new evidence is not necessarily given precedence over existing beliefs. Indeed, from the multiagent perspective, each revision can be seen as involving at least two agents—the believer and the informer. When synthesizing a new belief, the believer may wish to accord itself higher priority, to not favor one of them over the other, or to give priority to one over the other depending on the subject matter. Various theories exist to capture this intuition. One of them is the theory of belief arbitration, which takes an egalitarian approach: it does not favor either of the two sides over the other. Technically speaking, it does so by jettisoning the second AGM postulate ϕ ∈ K ∗ ϕ, and replacing it with a "fairness" axiom:

If |= ¬(K ∧ ϕ) then K ⊈ K ∗ ϕ and ϕ ∉ K ∗ ϕ

The intuition is that if the new evidence is inconsistent with the initial beliefs, then each must "give up something."

Fusion

The other source of asymmetry in AGM revision is more subtle. The AGM postulates obscure the asymmetry between the two arguments. In K ∗ ϕ, K is a belief set, or a set of sentences (that happen to be tautologically closed). True, usually we think of ϕ as a single sentence, but for most purposes it would matter little if we have it represent a set of sentences. However, it must be remembered that the AGM postulates do not define the operator ∗, but only constrain it. As is seen from Theorem 13.2.2, any particular ∗ is defined with respect to a complete belief state, or total preorder on possible worlds. This belief state defines the initial belief set, but it defines much more, namely all the beliefs conditional on new evidence. Thus every specific AGM operator takes as its first input a belief state, and as its second a mere belief set. We now consider what happens when the second argument is also a belief state.

The first question to ask is what it means intuitively to take a second belief state as an argument. Here again the multiagent perspective is useful. We think of a belief set as describing "current" beliefs and a belief state as describing "conditional" (that is, "current" as well as "hypothetical") beliefs. According to AGM revision, then, the believer has access to his full conditional beliefs, but the informer only reveals (some of his) current beliefs. However, one could imagine situations in which the informer engages in "full disclosure", revealing not only all his current beliefs but also all his hypothetical ones. When this is the case, the believer is faced with the task of merging two belief states; this process is called belief fusion.

Belief fusion calls for resolving conflicts between the two belief states. From the technical standpoint, the basic unit of conflict here is no longer the inconsistency of two beliefs, but rather the inconsistency of the two orderings on possible worlds. Here is how one theory resolves such conflicts; we present it in some detail since it very explicitly adopts a multiagent perspective.

To develop intuition for the following definitions, imagine a set of information sources and a set of agents. The sources can be thought of as primitive agents with fixed belief states. Each source informs some of the agents of its belief state; in effect, each source offers the opinion that certain worlds are more likely than others, and remains neutral about other pairs.

An agent's belief state is simply the amalgamation of all these opinions, each annotated by its origin (or "pedigree"). Of course, these opinions in general conflict with one another. To resolve these conflicts, the agent places a strict "credibility" ranking on the sources, and accepts the highest-ranked opinion offered on every pair of worlds.

We define the pedigreed belief state as follows.

Definition 13.2.6 (Pedigreed belief state) Given a finite set of belief states S (over W), the pedigreed belief state (over W) induced by S is a function Ψ : W × W → 2^(S ∪ {s0}) such that Ψ(w1, w2) = {(W, ≤) ∈ S : w2 ≰ w1} ∪ {s0}.

In words, Ψ(w1, w2) is the set of all sources that do not believe that world w2 is at least as likely as w1. The source s0 has no beliefs, and is thus always present.

We will use S to denote the set of all sources over W, and in what follows we will consider pedigreed belief states that are induced by subsets of S. Note that both ∅ and {s0} induce the same pedigreed belief state; by slight abuse of notation we will denote it too by s0.

Next we define a particular policy for resolving conflicts within a pedigreed belief state, since for any two worlds w1 and w2 we have the competing camps of Ψ(w1, w2) and Ψ(w2, w1). We assume a strict ranking < on S (and thus also on the sources that induce any particular Ψ). We interpret s1 < s2 as 's2 is more credible than s1'. As usual, we define ⊑, read "as credible as", as the reflexive closure of <.

We also assume that s0 is the least credible source, which may merit some explanation. It might be asked why equate the most agnostic source with the least credible one. In fact we don't have to, but since in the definitions that follow agnosticism is overridden by any opinion regardless of credibility ranking, we might as well assume that all agnosticism originates from the least credible source, which will permit simpler definitions.

Intuitively, given a pedigreed belief state Ψ, Ψ< will retain from Ψ the highest-ranked opinion about the relative likelihood between any two worlds.


Definition 13.2.7 (Dominating belief state) Given W, S, Ψ and < as above, the dominating belief state of Ψ is the function Ψ< : W × W → S such that for all w1, w2 ∈ W the following holds: If max(Ψ(w2, w1)) < max(Ψ(w1, w2)) then Ψ<(w1, w2) = max(Ψ(w1, w2)). Otherwise, Ψ<(w1, w2) = s0.⁴

⁴ Note the use of the restrictions. Finiteness assures that a maximal source exists; we could readily replace it by weaker requirements on the infinite set. The absence of ties in the ranking < ensures that the maximal source is unique; removing this restriction is not straightforward.

Clearly, for any w1, w2 ∈ W either Ψ<(w1, w2) = s0 or Ψ<(w2, w1) = s0 or both. It is not hard to see that Ψ< induces a standard (anonymous) belief state:

Definition 13.2.8 (Ordering induced by Ψ<) The ordering induced by Ψ< is the binary relation ⪯ on W such that w1 ⪯ w2 iff Ψ<(w2, w1) = s0.

Clearly, ⪯ is a total preorder on W. Thus a dominating belief state is a generalization of the standard notion of belief state.

We are now ready to define the fusion operator.

Definition 13.2.9 (Belief fusion) Given a set of sources S and < as above, S1, S2 ⊂ S, the pedigreed belief state Ψ1 induced by S1, and the pedigreed belief state Ψ2 induced by S2, the fusion of Ψ1 and Ψ2 is the pedigreed belief state induced by S1 ∪ S2.

Figure 13.3 illustrates the operation of the fusion operator. Of the three agents, A has the highest priority at 3, followed by B and C with priorities of 2 and 1, respectively. The first three lines describe the beliefs of the agents over the four worlds a, b, c, and d in the form of a dominating belief state. An arrow from one circle to another means that all worlds in the second circle are considered at least as likely as all worlds in the first. Each arrow is labelled with the priority of the agent who holds those beliefs. The final two lines show examples of fusion, with the corresponding diagrams showing the resulting dominating belief state. When fusing the beliefs of A and C, we see that all of A's beliefs are retained because it has a higher priority. Since A has no opinion on the pairs (a, b) and (c, d), we take C's beliefs. Of C's two remaining beliefs, c ≤C a is overruled by a ≤A c, while b ≤C d is consistent with A's beliefs and thus does not show up in the output. When we fuse this dominating belief state with that of B, the only possible changes that can be made are on the pairs (a, b) and (c, d), since A has a belief on all other pairs and has a higher priority than B. Agent B has no opinion on (c, d), and disagrees with C on (a, b), causing a reversal of the arrow, which is now labelled with B's priority.

Figure 13.3 Example of a fusion operator (only the dominating belief states are shown).
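Here is a compact Python sketch of Definitions 13.2.6–13.2.9 in the spirit of Figure 13.3; the encoding of a source's opinions as pairs of worlds, the priority numbers, and all names are our own illustrative assumptions rather than the formal construction.

```python
# A source's belief state is encoded as a set of "strict opinions": the pair (w1, w2)
# means "this source does not consider w2 at least as likely as w1" (w1 beats w2).
S0 = "s0"  # the opinion-less source, ranked below every real source

def pedigreed(sources):
    """Psi(w1, w2): the sources that do not rank w2 at least as likely as w1, plus s0."""
    def psi(w1, w2):
        return {name for name, opinions in sources.items() if (w1, w2) in opinions} | {S0}
    return psi

def dominating(psi, rank):
    """Psi_<(w1, w2): keep only the highest-ranked opinion on each pair of worlds."""
    def dom(w1, w2):
        pro = max(psi(w1, w2), key=rank)   # best source claiming w1 beats w2
        con = max(psi(w2, w1), key=rank)   # best source claiming w2 beats w1
        return pro if rank(con) < rank(pro) else S0
    return dom

def fuse(sources1, sources2):
    """Fusion is simply the pedigreed state induced by the union of the sources."""
    return pedigreed({**sources1, **sources2})

# Toy example: A outranks B, which outranks C.
rank = {"A": 3, "B": 2, "C": 1, S0: 0}.get
A = {"A": {("a", "c")}}                # A: a is strictly more likely than c
C = {"C": {("c", "a"), ("d", "b")}}    # C disagrees on a vs. c, and ranks d above b

fused = dominating(fuse(A, C), rank)
print(fused("a", "c"))   # 'A'  -- A's higher-priority opinion wins
print(fused("c", "a"))   # 's0' -- C's conflicting opinion is overridden
print(fused("d", "b"))   # 'C'  -- C's unopposed opinion survives
```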

13.2.3 Theories of belief change: a summary

We have discussed a number of ways in which beliefs change over time. Starting with AGM revision and its probabilistic counterpart, we went on to discuss the operations of expansion, contraction, update, arbitration, and fusion.

It is important to emphasize that these examples are not exhaustive; indeed, we devote this short subsection to making just this point. There are other specific theories, for example, theories accounting for the iteration of belief revision; most of the theories we discussed do not say what happens if one wishes to conduct (e.g.) two revisions in sequence. Other theories are more abstract, and provide a general framework for belief change, of which revision, update, and other operators are special instances. Among them, importantly, are theories with information change operators; these are couched in propositional dynamic logic, which we will encounter in Section 13.4, but in which the modal action operators are indexed by a logical proposition, denoting (for example) the learning of that proposition. In Section 13.5 we provide some references to this literature.

13.3 Logic, games, and coalition logic

So far in this chapter we have looked at the use of logic to reason about purely "informational" aspects of agents. In the two remaining sections we broaden the discussion to include also "motivational" aspects. In this section we look (briefly) at the interaction between modal logic and game theory.

The connection between logic and games is multifaceted. Even before the advent of modern-day game theory, in the context of classical logic, it was proposed that a logical system be viewed as a game between a prover (the "asserter") and a disprover (the "respondent"), with a sentence being valid if the prover had a winning strategy. In this case, games are used to reason about logic. But much recent work has been aimed in the opposite direction, with logic being used to reason about games. Here games are meant in the formal sense of modern game theory, and the logics are modal rather than classical.


We have so far seen how modal logic can be used to model and reason about agents' knowledge and beliefs, and how those change over time. But modal logic can be used to reason also about their actions and preferences, and hence also about games and (certain) solution concepts. Much of the literature here focuses on extensive-form games of both perfect and imperfect (though not yet incomplete) information. There exists a rapidly expanding literature on the topic, which is somewhat complex. We will just mention here that these logics allow us to reason about certain solution concepts, particularly those involving only pure strategies. And so one can recapture in logic the notions of (pure-strategy) dominant strategy, iterated elimination of dominated strategies, rationalizability, and pure-strategy Nash equilibrium.

In lieu of a full discussion of this somewhat complex material, to give a feel for what can be expressed in such logics we briefly look at one particular exemplar, so-called coalition logic (CL). In the language of CL we do not model the actions of the agents, but rather the capabilities of groups (in this respect CL is similar to coalitional game theory). For any given set of agents C, the modal operator [C] is meant to capture the capability of the group. [C]ϕ means that the group can bring about ϕ (or, equivalently, can ensure that ϕ is the case), regardless of the actions of agents outside the set C.

The formal syntax of CL is defined as follows. Given a (finite, nonempty) set of agents N and a set of primitive propositions Φ0, the set Φ of well-formed formulas is the smallest set satisfying the following:

Φ0 ⊂ Φ

If ϕ1, ϕ2 ∈ Φ then ¬ϕ1 ∈ Φ and ϕ1 ∨ ϕ2 ∈ Φ.

If ϕ ∈ Φ, C ⊂ N, and ϕ is [C′]-free for all C′, then [C]ϕ ∈ Φ.

⊤ is shorthand for ¬⊥, and ∧, → and ↔ are defined as usual. ⊥ can be viewed as shorthand for p ∧ ¬p, and [i]ϕ is shorthand for [{i}]ϕ for any i ∈ N.⁵

⁵ This is in fact a simplified version of CL, which might be called flat CL. The full definition allows for the nesting of [C] operators, but that requires semantics that are too involved to include here.

The formal semantics of CL are as follows. A CL model is a triple (S, E, V), such that:

• S is a set of states, or worlds;

• V : Φ0 → 2^S is the valuation function, specifying the worlds in which primitive propositions hold; and

• E : 2^N → 2^(2^S), such that
  • E(∅) = {S}
  • if C ⊂ C′ then E(C′) is a refinement of E(C)

The satisfaction relation is defined as follows:

(S, E, V), s ⊭ ⊥

for p ∈ Φ0, (S, E, V), s |= p iff s ∈ V(p)

(S, E, V), s |= ¬ϕ iff (S, E, V), s ⊭ ϕ

(S, E, V), s |= ϕ1 ∨ ϕ2 iff (S, E, V), s |= ϕ1 or (S, E, V), s |= ϕ2

(S, E, V), s |= [C]ϕ iff there exists S′ ∈ E(C) such that for all s′ ∈ S′ it is the case that s′ |= ϕ (here |= is in the classical sense).

What can be said about the sentences that are valid in this logic? For example, clearly both ¬[C]⊥ and [C]⊤ are valid (no coalition can force a contradiction, and tautologies are true in any model, and thus in any set forced by a coalition). Equally intuitively, [C](ϕ1 ∧ ϕ2) → [C]ϕ2 is also valid (after all, if a coalition can enforce an outcome to lie within a given set of worlds, it can also enforce it to lie within a superset). Perhaps more insightful is the following valid sentence: ([C1]ϕ1 ∧ [C2]ϕ2) → [C1 ∪ C2](ϕ1 ∧ ϕ2), for any C1 ∩ C2 = ∅ (if one coalition can force some set of worlds, and a disjoint coalition can force another set of worlds, then together they can enforce their intersection). The discussion at the end of the chapter points the reader to a more complete discussion of this and related logics.
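To give a concrete feel for these semantics, here is a minimal Python sketch of model checking for (flat) coalition logic; the states, the effectivity function, and the tuple-based formula encoding are all invented for illustration, not part of the formal definition.

```python
# A CL model (S, E, V): states, an effectivity function E listing the sets of
# states each coalition can force, and a valuation V for primitive propositions.
S = {"s1", "s2", "s3"}

E = {
    frozenset(): [S],                                    # the empty coalition forces only the trivial set
    frozenset({"1"}): [S, {"s1", "s2"}],
    frozenset({"2"}): [S, {"s2", "s3"}],
    frozenset({"1", "2"}): [S, {"s1", "s2"}, {"s2", "s3"}, {"s2"}],
}

V = {"p": {"s2"}, "q": {"s1", "s2"}}                     # worlds where each primitive holds

def holds(formula, state):
    """formula is a nested tuple: ('p',), ('not', f), ('or', f, g), ('can', C, f)."""
    op = formula[0]
    if op == "not":
        return not holds(formula[1], state)
    if op == "or":
        return holds(formula[1], state) or holds(formula[2], state)
    if op == "can":   # [C]phi: some set forcible by C contains only phi-states
        _, coalition, phi = formula
        return any(all(holds(phi, s) for s in choice) for choice in E[coalition])
    return state in V[op]                                # primitive proposition

# [{1,2}]p holds (the grand coalition can force {s2}), but [{1}]p does not.
print(holds(("can", frozenset({"1", "2"}), ("p",)), "s1"))   # True
print(holds(("can", frozenset({"1"}), ("p",)), "s1"))        # False
```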

13.4 Towards a logic of ‘intention’

In this section we look, briefly again, at modal logics that have explicit "motivational" modal operators, ones that capture the motivation of agents and their reasons for action. The specific notion we will try to capture is that of intention, first as it is attributed to a single agent and then as it is attributed to a set of agents functioning as a group. As we shall see, in service of 'intention' we will need to define several auxiliary notions. These extended theories are sometimes jokingly called BDI theories because, beside the notion of belief (B), they include the notions of desire (D) and intention (I), as well as several others.

It should be said at the outset that these extended theories are considerably more complex and messy than the theories of knowledge and belief that were covered in previous sections, and mathematically less well developed than those on games and information dynamics. We will therefore not treat this subject matter with the same amount of detail as we did the earlier topics. Instead we will do the following. We will start with an informal discussion of notions such as intention, and some requirements on a formal theory of them. We will then outline only the syntax of one such theory, starting with a single-agent theory, and then adding a multiagent component.

13.4.1 Some pre-formal intuitions

Let us start by considering what we are trying to achieve. Recall that our theories of 'knowledge' and 'belief' only crudely approximated the everyday meanings of those terms, but were nonetheless useful for certain applications. We would like to strike a similar balance in a theory of 'intention'. Such theories could be used to reason about cooperative or adversarial planning agents, to create intelligent dialog systems, or to create useful personal assistants, to mention a few applications.


What are some of the requirements on such theories? Different applications will give rise to different answers. For motivation behind the particular theory we will outline, consider the following scenario:

Phil is having trouble with his new household robot, Hector. He says, "Hector, bring me a beer." The robot replies, "OK, boss." Twenty minutes later, Phil yells, "Hector, why didn't you bring that beer?" It answers, "Well, I had intended to get you the beer, but I decided to do something else." Miffed, Phil sends the wise guy back to the manufacturer, complaining about a lack of commitment. After retrofitting, Hector is returned, marked "Model C: The Committed Assistant." Again, Phil asks Hector to bring a beer. Again, it accedes, replying "Sure thing." Then Phil asks, "What kind do we have?" It answers, "Anchor Steam." Phil says, "Never mind." One minute later, Hector trundles over with an Anchor Steam in its gripper. This time, Phil angrily returns Hector for over-commitment. After still more tinkering, the manufacturer sends Hector back, promising no more problems with its commitments. So, being a somewhat trusting consumer, Phil accepts the rascal back into his household, but as a test, he asks Hector to bring him the last beer remaining in the fridge. Hector again accedes, saying, "Yes, sir." The robot gets the beer and starts toward Phil. As it approaches, it lifts its arm, wheels around, deliberately smashes the bottle, and trundles off. Back at the plant, when interrogated by customer service as to why it had abandoned its commitments, the robot replies that according to its specifications, it could not have a commitment that it believed to be unachievable. Once the last remaining bottle was smashed, the commitment became unachievable. Despite the impeccable logic, and the correct implementation, Hector is dismantled.

This example suggests that in order to capture the notion of intention we must also tackle those of commitment and capability. In addition, we will consider the notions of desire, goal and even agency. This scenario also suggests some constraints. Here are some of the intuitions that will underlie the formal development:

1. Desires are unconstrained; they need not be achievable, and need not even be consistent (one can desire smoking and good health simultaneously). Goals in contrast must be consistent with each other, and furthermore must be believed to be achievable, or at least not believed to be unachievable. The same is true of intentions, which have the additional property of being persistent in time. Loosely speaking, these three notions form a hierarchy; goals imply desires, and intentions imply goals. These mutual constraints among the different notions are sometimes called rational balance. In the formulation in the next section we will consider goals and intentions, but not desires.

2. Intentions come in two varieties—intentions to achieve a particular state (such as being in San Francisco) and intentions to take a particular action (such as boarding a particular train to San Francisco). In particular, intentions are future directed. The same is true of goals.


3. Plans consist of a set of intentions and goals. Plans are in general partial; that is, they have some goals or intentions that are not directly achievable by the agent. For example, an agent may intend to go to San Francisco, even though he may not yet know whether he wants to drive there or take the train.

4. Plans give rise to further goals and intentions. For example, an intention to go to San Francisco requires that the agent further specify some means for getting there (e.g., driving or taking the train).

5. At the same time, since the plan of an agent must be internally consistent, a given plan constrains the addition of new goals and intentions. For example, if the agent's plan already contains the intention of leaving his car at home for his wife to use, then he cannot adopt the intention of driving it to San Francisco as the means of getting there.

6. Intentions are persistent, but not irrevocable. For example, an intention to go to San Francisco may be part of a larger plan serving a higher-level goal, such as landing a new job. Our agent may find another means to achieve the same goal, and thus drop his intention to go to San Francisco. Or, he may obtain new information which makes his trip to San Francisco infeasible, in which case that intention too must be dropped.

7. Agents need not intend the anticipated side effects of their intentions. Driving to San Francisco may necessarily increase one's risk of a traffic accident, but an agent intending the former does not generally intend the latter. Similarly, an agent intending to go to the dentist does not typically have the intention of experiencing great pain, even if he may fully expect such pain.

Philosophers have written at considerably greater length on these topics, but let us stop here, and next look at what a formal theory might look like in light of some of these considerations.

13.4.2 The road to hell: elements of a formal theory of intention

We will sketch a theory aimed at capturing the notion of intention. Our sketch will be incomplete, and will serve primarily to highlight some of the surprising elements that a complete theory requires. Certainly we will only consider a propositional theory; there is no special difficulty in extending the treatment to the first-order case. More critically, we will only discuss the axiomatic theory, and will have little to say about its formal semantics.

Since intentions are future directed, we must start by representing the passage of time. Although we have just said that we will not discuss semantics, it is useful to keep in mind the picture of a sequence of events, leading from one state to another; Figure 13.4 shows an example.

Figure 13.4 An event sequence: actions lead from one state to another. (States: at home, at train station, on train, in San Francisco; actions: go to train station, board train, do nothing.)

Such event sequences will form the basis for reasoning about intentions and other notions. The language we use to speak about such events is based on dynamic logic. Dynamic logic takes the basic events (such as going to the train station) as the primitive objects. These basic events can be combined in a number of ways to make complex events; see below. By associating an agent with an event (whether basic or complex) we get an action. Since we will not consider agent-less events here, we will use the terms 'event' and 'action' interchangeably.

Thus one component of dynamic logic is a language to speak about actions. Another component is a language to speak about what is true and false in states, the end points of actions. In this language we will have two primitive modal operators, B (belief) and G (goal), and define intention in terms of those. We will not define desire in the language since our notion of intention will not depend on it.

Interestingly, the language of actions and the language of states are mutually dependent. They are defined as follows:

Definition 13.4.1 (Action and state expressions) Given a set of agents N, a set E of primitive actions, a set V of variables (which will range over actions), and a set P of primitive propositions (describing what is true and false in states), the set of action expressions A and the set of state expressions LBG are the smallest sets satisfying the following constraints.

• E ⊆ A

• If a, b ∈ A then a;b ∈ A

• If a, b ∈ A then a|b ∈ A

• If a ∈ A then a∗ ∈ A

• If ϕ ∈ LBG then ϕ? ∈ A

Here is an intuitive explanation of these complex actions. a;b denotes composition: doing a followed by b. a|b denotes nondeterministic choice: doing a or b, nondeterministically. a∗ denotes iteration: repeating a zero or more times, nondeterministically. ϕ? is perhaps the most unusual action. It is the test action; it does not change the state, but is successful only in states that satisfy ϕ. Specifically, if the action ϕ? is taken in a state, then ϕ must be true in that state. (A small executable sketch of these constructs follows the definition below.)

• P ⊆ LBG

• If ϕ, ψ ∈ LBG then ϕ ∧ ψ, ¬ϕ ∈ LBG


• If a ∈ A then JustHappened(a) ∈ LBG

• If a ∈ A then AboutToHappen(a) ∈ LBG

• If i ∈ N and a ∈ A then Agent(a) = i ∈ LBG

• If v ∈ V and ϕ ∈ LBG then ∀vϕ ∈ LBG

• If a, b ∈ A ∪ V then a < b ∈ LBG

• If i ∈ N and ϕ ∈ LBG then Biϕ ∈ LBG

• If i ∈ N and ϕ ∈ LBG then Giϕ ∈ LBG

Again, an explanation is in order. JustHappened(a) captures the fact that action a ended in the current state, and AboutToHappen(a) that it is about to start in the current state. Agent(a) = i identifies i as the agent of action a. a < b means that action a took place before action b. Finally, Bi and Gi of course denote the belief and goal operators, respectively. However, especially for Gi, it is important to be precise about the reading; Giϕ means that ϕ is true in all the states that satisfy the goals of agent i.
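The complex-action constructs of Definition 13.4.1 can be made concrete by interpreting them over a finite run of states and events, in the spirit of Figure 13.4; the following Python sketch (with invented encodings and names) computes where a complex action can end when started at a given position of the run.

```python
# A finite run: states[0], events[0], states[1], events[1], ..., states[n].
# A test phi? consumes no event and succeeds only if phi holds in the current state.
states = [{"at_home"}, {"at_station"}, {"on_train"}, {"in_SF"}]   # cf. Figure 13.4
events = ["go_to_station", "board_train", "do_nothing"]

def ends(action, start):
    """Positions at which 'action', started at position 'start', can finish."""
    kind = action[0]
    if kind == "prim":                       # a primitive event
        if start < len(events) and events[start] == action[1]:
            return {start + 1}
        return set()
    if kind == "seq":                        # a;b : a followed by b
        return {j for m in ends(action[1], start) for j in ends(action[2], m)}
    if kind == "choice":                     # a|b : nondeterministic choice
        return ends(action[1], start) | ends(action[2], start)
    if kind == "iter":                       # a* : zero or more repetitions
        reachable, frontier = {start}, {start}
        while frontier:
            frontier = {j for m in frontier for j in ends(action[1], m)} - reachable
            reachable |= frontier
        return reachable
    if kind == "test":                       # phi? : succeeds iff phi holds now
        return {start} if action[1](states[start]) else set()

go = ("prim", "go_to_station")
board = ("prim", "board_train")
at_station = ("test", lambda s: "at_station" in s)

# "go to the station; check that we are there; board the train"
print(ends(("seq", go, ("seq", at_station, board)), 0))   # {2}
# "(go | board)*" can end at positions 0, 1 and 2 when started at 0
print(ends(("iter", ("choice", go, board)), 0))           # {0, 1, 2}
```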

While we do not discuss formal semantics here, let us briefly discuss the belief and goal modal operators. They are both intended to be interpreted via possible-worlds semantics. So the question arises as to the individual properties of these operators, and the interaction between them. B is a standard (KD45) belief operator. We place no special restriction on the G operator, other than that it be serial (to ensure that goals are consistent). However, since goals must also be consistent with beliefs, we require that the goal accessibility relation be a subset of the belief accessibility relation. That is, if a possible world is ruled out by the agent's beliefs, it cannot be a goal. This assumption means that you cannot intend to be in San Francisco on Saturday if you believe that you will be in Europe at the same time. (Of course, you can intend to be in San Francisco on Saturday even though you are in Europe today.)⁶

⁶ Note that this constraint means that the formula Biϕ → Giϕ is valid in our models; this is where the careful reading of the Gi operator is required.

Before proceeding further, it will be useful to define several auxiliary operators:

Definition 13.4.2

Always(ϕ) ≡ ∀a(AboutToHappen(a) → AboutToHappen(a; ϕ?))

Eventually(ϕ) ≡ ¬Always(¬ϕ)

Later(ϕ) ≡ (¬ϕ) ∧ (Eventually(ϕ))

Before(ϕ, ψ) ≡ ∀c(AboutToHappen(c; ψ?) → ∃a((a ≤ c) ∧ AboutToHappen(a; ϕ?)))

JustDidi(a) ≡ JustHappened(a) ∧ Agent(a) = i

AboutToDoi(a) ≡ AboutToHappen(a) ∧ Agent(a) = i

With these in place, we proceed to define the notion of intention as follows. We first strengthen the notion of having a goal. The Gi operator defines a weak notion; in particular, it includes goals that are already satisfied, and thus provides no impetus for action. Achievement goal, or AGoal, focuses on the goals that are yet to be achieved:


Definition 13.4.3 (Achievement goal)

AGoaliϕ ≡ Gi(Later(ϕ)) ∧ Bi(¬ϕ)

An achievement goal is useful, but it is still not an intention. This is because an achievement goal involves no notion of commitment. An agent might form an achievement goal, only to drop it moments later, for no apparent reason. In order to model commitment, we should require that the agent not drop the goal until he reaches it. We call such a goal a persistent goal, defined as follows.

Definition 13.4.4 (Persistent goal)

PGoaliϕ ≡ AGoaliϕ ∧ Before(Bi(ϕ) ∨ Bi(Always(¬ϕ)), ¬Gi(Later(ϕ)))

In other words, a persistent goal is an achievement goal that the agent will not give up until he believes that it is true or will never be true.

The notion of a persistent goal brings us closer to a reasonable model of intention, but it still misses an essential element of intentionality. In particular, it does not capture the requirement that the agent himself do something to fulfill the goal, let alone do so knowingly. The following definitions attempt to add these ingredients.

Recall that an agent may intend either to do an action, or to achieve a state. For this reason, we give two definitions of intention. The following is a definition of intending an action:

Definition 13.4.5 (Intending an action)

IntendsAi(a) ≡ PGoali(JustDidi(Bi(AboutToDoi(a))?; a))

In other words, if an agent intends an action, he must have a persistent goal to first believe that he is about to take that action and then to actually take it.

The second definition captures the intention to bring about a state with certain properties:

Definition 13.4.6

IntendSi(ϕ) ≡ PGoali(∃e JustDidi((Bi(∃e′ AboutToDoi(e′; ϕ?)) ∧ ¬Gi(¬AboutToDoi(e; ϕ?)))?; e; ϕ?))

We explain this definition in a number of steps. Notice that to intend a state in which ϕ holds, an agent is committed to taking a number of actions e himself, after which ϕ holds. However, in order to avoid allowing him to intend ϕ by doing something accidentally, we require that he believe he is about to do some series of events e′ which brings about ϕ. In other words, we require that he has a plan e′, which he believes that he is executing, which will achieve his goal ϕ.

As was mentioned at the beginning of this section, the logic of intention is considerably more complex, messy, and controversial than that of knowledge and belief.


We will not discuss the pros and cons of this line of definitions further, nor alternative definitions; the notes at the end of the chapter provide references to these. Let us just note that these definitions do have some desirable formal properties. For example, early intentions preempt later potential intentions (namely those that undermine the objectives of the earlier intentions); agents cannot intend to achieve something if they believe that no sequence of actions on their part will bring it about; and agents do not necessarily intend all the side effects of their intentions.

13.4.3 Group intentions

The theory of intention outlined in the previous section considers a single agent. We have seen that the extension of the logic of (e.g.) knowledge to the multiagent setting is interesting; in particular, it gives rise to the notion of common knowledge, which is so central to reasoning about coordination. Is there similar motivation for considering intention in a multiagent setting?

Consider a set of agents acting as a group. Consider specifically the notion of a convoy. Consisting of multiple agents, each making local decisions, a convoy nonetheless moves purposefully as a single body. It is tempting to attribute beliefs and intentions to this body, but how do these group notions relate to the beliefs and intentions of the individual agents?

It certainly won't do to equate a group intention simply with all the individual agents having that intention; it is not even sufficient to have those individual intentions be common knowledge among the agents. Consider the following scenario.

Two ancient Inca kings, Euqsevel and Nehoc, aim to ride their horses in a convoy to the Temple of Otnorot. Specifically, since King Euqsevel knows the way, he intends to simply ride there, and King Nehoc intends to follow him. They set out, and since King Euqsevel's horse is faster he quickly loses King Nehoc, who never makes it to Otnorot.

Centuries later, in old Wales, two noblemen—Sir Ffegroeg and Sir Hgnis—set out on another journey. Here again only one of them, Sir Ffegroeg, knows the way. However, having learned from the Incan mishap, they do not simply adopt their individual intentions. Instead, Sir Ffegroeg agrees to go ahead on his faster horse, but to wait at road junctions in order to show Sir Hgnis the way. They proceed in this fashion until at some point Sir Ffegroeg discovers that snow is covering the mountain pass, and that it is impossible to get to their destination. So he correctly jettisons his intention to get there, and therefore also the dependent intention to show Sir Hgnis the way. Instead he makes his way to a comfortable hotel he knows in a nearby town; Sir Hgnis is left to wander the unfamiliar land to his last day.

Clearly, the interaction between the intentions of the individual agents and the collective mental state is involved.

One way to approach the problem is to mimic the development in the single-agent case. We first define the notion of joint persistent goal, and then use it to define the notion of a joint intention. We give both definitions informally; the formal construction is left as an exercise for the reader. We start with yet a third definition:

Definition 13.4.7 (Weak achievement goal; informal) An agent has a weak achievement goal with respect to a team of agents iff one of the following is true:

1. The agent has a standard achievement goal (AGoal) to bring about ϕ.

2. The agent believes that ϕ is true or will never be true, but has an achievement goal that the status of ϕ be common belief within the team.

Definition 13.4.8 (Joint persistent goal; informal) A team of agents has a joint persistent goal to achieve ϕ iff:

1. They have common belief that ϕ is currently false.

2. They have common (and true) belief that they each have the goal that ϕ hold eventually.

3. They have common (and true) belief that, until they come to have common belief that ϕ is either true or never will be true, they will each continue to have ϕ as a weak achievement goal.

Definition 13.4.9 (Joint intention; informal) A team of agents has a joint intention to take a set of actions A (each action by one of the agents) iff the agents have a joint persistent goal of (a) having done A and (b) having common belief throughout the execution of A that they are doing A.

In addition to avoiding the pitfalls demonstrated in the stories above, these definitions have many attractive properties, including the following:

– The joint persistent goals of a team consisting of a single agent coincide with the intentions of that agent.

– If a team has ϕ as a joint persistent goal then so does every individual agent.

– If a team has a joint intention to take a set of actions A, and action a ∈ A belongs to agent i, then agent i intends to take action a.

These definitions are nothing if not complex, which is why we omit their formal details here. Furthermore, as was said, there is not yet universal agreement on the best definitions. The references at the end of the chapter point the reader to further reading on this fascinating yet incomplete body of work.

13.5 History and references

The combination of knowledge and probability is covered in Fagin et al. [1995], which remains a good technical introduction to the topic.


Theories of belief dynamics—revision, update, and beyond—are covered in the appropriate chapters of [Zalta, 1997], as well as in [Gärdenfors & Rott, 1995]. Early seminal work includes that of Gärdenfors [1988], Alchourrón, Gärdenfors and Makinson [1985] (after whom 'AGM revision' is named), and Katsuno and Mendelzon [1991], who introduced the distinction between belief revision and belief update. The material in the chapter on knowledge, certainty and belief is based on [Boutilier, 1992; Lamarre & Shoham, 1994]. The material in the section on belief fusion is based on [Maynard-Reid & Shoham, 2001; Maynard-Zhang & Lehmann, 2003]. A broader introduction to logical theories of belief fusion can be found in [Gregoire & Konieczny, 2006]. Belief dynamics are closely related to the topic of non-monotonic logics, a rich source of material. An early comprehensive coverage of non-monotonic logics can be found in [Ginsberg, 1987], and more recent material is covered in [Schlechta, 1997].

The recent trend towards modal logics of information dynamics is heralded in [van Benthem, 1997]. The early seminal work with a game-like perspective on logic is due to Peirce [1965]. A recent review of modal logic for games and information change appears in [van der Hoek & Pauly, 2006]. Our discussion of coalition logic is covered there in detail, and originally appeared in [Pauly, 2002].

In general, belief dynamics still constitute an active area of research, and the interested reader will undoubtedly want to study the recent literature, primarily in artificial intelligence and philosophical logic.

The literature on formal theories of "motivational" attitudes, such as desires, goals and intentions, is much sparser than the literature on the "informational" attitudes discussed above. There are fewer publications on these topics, and the results are still preliminary (which is reflected in the style and length of this book section). The material presented is based largely on work in artificial intelligence by Cohen and Levesque [1990; 1991], which in turn was inspired by philosophical work such as that of Bratman [1987]. This area presents many opportunities for further research.


Appendices: Technical Background


A Probability Theory

Probability theory provides a formal framework for the discussion of chance or uncertainty. This appendix reviews some key concepts of the theory and establishes notation. However, it glosses over some details (for example, pertaining to measure theory). Therefore, the interested reader is encouraged to consult a textbook on the topic for a more comprehensive picture.

A.1 Probabilistic models

A probabilistic model is defined as a tuple (Ω, F, P), where:

• Ω is the sample space, also called the event space.

• F is a sigma algebra over Ω; that is, F ⊆ 2^Ω and F is closed under complementation and countable union.

• P : F → [0, 1] is the probability density function (PDF).

Intuitively, the sample space is a set of things that can happen in the world according to our model. For example, in a model of a six-sided die, we might have Ω = {1, 2, 3, 4, 5, 6}. The sigma field F is a collection of measurable events. F is required because some sets of outcomes in Ω may not be measurable; thus we must define our probability density function P over F rather than over Ω. However, in many cases, such as the six-sided die example, all outcomes are measurable. In those cases we can equate F with 2^Ω, and view the probability space as the pair (Ω, P) and P as P : 2^Ω → [0, 1]. We will assume this in the following.

A.2 Axioms of probability theory

The probability density function P must satisfy the following axioms:

1. For any A ⊆ Ω, P(∅) = 0 ≤ P(A) ≤ P(Ω) = 1;

2. For any pair of disjoint sets A, A′ ⊂ Ω, P(A ∪ A′) = P(A) + P(A′).


That is, all probabilities must be bounded by 0 and 1; 0 is the probability of the empty set and 1 is the probability of the whole sample space. Second, when sets of outcomes from the sample space are non-overlapping, the probability of achieving an outcome from either of the sets is the sum of the probabilities of achieving an outcome from each of the sets. We can infer from these rules that if two sets A, A′ ⊆ Ω are not disjoint, P(A ∪ A′) = P(A) + P(A′) − P(A ∩ A′).
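As a quick illustration of the axioms and of the inclusion-exclusion consequence just derived, here is a small Python sketch of a fair six-sided die; exact arithmetic via fractions and the event names are our own choices for the example.

```python
from fractions import Fraction

# A toy finite probability space: a fair six-sided die with Omega = {1, ..., 6}.
omega = {1, 2, 3, 4, 5, 6}
p_outcome = {w: Fraction(1, 6) for w in omega}   # uniform distribution over outcomes

def P(event):
    """Probability of a set of outcomes (here F is simply all subsets of omega)."""
    return sum(p_outcome[w] for w in event)

A = {1, 2, 3}   # "at most three"
B = {2, 4, 6}   # "even"

print(P(set()), P(omega))                      # 0 and 1 (axiom 1)
print(P({1} | {2}) == P({1}) + P({2}))         # True: additivity for disjoint sets (axiom 2)
print(P(A | B) == P(A) + P(B) - P(A & B))      # True: the inclusion-exclusion consequence
```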

A.3 Marginal probabilities

We are often concerned with sample spaces Ω which are defined as the Cartesian product of a set of random variables X1, . . . , Xn with domains X1, . . . , Xn respectively. Thus, in this setting Ω = X1 × · · · × Xn. Our density function P is thus defined over full assignments of values to our variables, such as P(X1 = x1, . . . , Xn = xn). However, sometimes we want to ask about marginal probabilities: the probability that a single variable Xi takes some value xi ∈ Xi. We define

P(Xi = xi) = Σ_{x1 ∈ X1} · · · Σ_{x(i−1) ∈ X(i−1)} Σ_{x(i+1) ∈ X(i+1)} · · · Σ_{xn ∈ Xn} P(X1 = x1, . . . , Xn = xn).

From this definition and from the axioms given above we can also infer that (for example)

P(Xi = xi) = Σ_{xj ∈ Xj} P(Xi = xi and Xj = xj).
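Computationally, marginalization is just a sum over the joint distribution; the sketch below uses a made-up two-variable joint (weather and traffic) purely for illustration, and the helper name is our own.

```python
from fractions import Fraction

# Joint distribution P(X1 = x1, X2 = x2) over two random variables
# (X1 = weather, X2 = traffic); the numbers are made up for illustration.
joint = {
    ("rain", "heavy"): Fraction(3, 10), ("rain", "light"): Fraction(1, 10),
    ("sun", "heavy"): Fraction(1, 10),  ("sun", "light"): Fraction(5, 10),
}

def marginal(joint, index, value):
    """P(X_index = value): sum the joint over all values of the other variable(s)."""
    return sum(p for assignment, p in joint.items() if assignment[index] == value)

print(marginal(joint, 0, "rain"))    # 2/5
print(marginal(joint, 1, "heavy"))   # 2/5
```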

A.4 Conditional probabilities

We say that two random variables Xi and Xj are independent when P(Xi = xi and Xj = xj) = P(Xi = xi) · P(Xj = xj) for all values xi ∈ Xi, xj ∈ Xj.

Often, random variables are not independent. When this is the case, it can be important to know the probability that Xi will take some value xi given that Xj = xj has already been observed. We define this probability as

P(Xi = xi | Xj = xj) = P(Xi = xi and Xj = xj) / P(Xj = xj).

We call P(Xi = xi | Xj = xj) a conditional probability; P(Xi = xi and Xj = xj) is called a joint probability, and P(Xj = xj) is a marginal probability, already discussed above.

Finally, Bayes' rule is an important identity that allows us to reverse conditional probabilities. Specifically,

P(Xi = xi | Xj = xj) = P(Xj = xj | Xi = xi) P(Xi = xi) / P(Xj = xj).
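The same made-up joint distribution can be used to compute conditional probabilities and to check Bayes' rule numerically; exact arithmetic with fractions, and all names, are our own illustrative choices.

```python
from fractions import Fraction

# P(weather, traffic): the same made-up two-variable joint as above.
joint = {
    ("rain", "heavy"): Fraction(3, 10), ("rain", "light"): Fraction(1, 10),
    ("sun", "heavy"): Fraction(1, 10),  ("sun", "light"): Fraction(5, 10),
}

def P_weather(x): return sum(p for (w, t), p in joint.items() if w == x)
def P_traffic(y): return sum(p for (w, t), p in joint.items() if t == y)

def P_weather_given_traffic(x, y):
    return joint[(x, y)] / P_traffic(y)      # joint probability over marginal

def P_traffic_given_weather(y, x):
    return joint[(x, y)] / P_weather(x)

# Bayes' rule: P(rain | heavy) = P(heavy | rain) P(rain) / P(heavy)
lhs = P_weather_given_traffic("rain", "heavy")
rhs = P_traffic_given_weather("heavy", "rain") * P_weather("rain") / P_traffic("heavy")
print(lhs, rhs, lhs == rhs)    # 3/4 3/4 True
```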


B Linear and Integer Programming

Linear programs and integer programs are optimization problems with linear objective functions and linear constraints. These problems are very general: linear programs can be used to express a wide range of problems such as bipartite matching and network flow, while integer programs can express an even larger range of problems, including all of those in NP. We will define each of these formalisms in turn.

B.1 Linear programs

Defining linear programs

A linear program is defined by:

• a set of real-valued variables;

• a linear objective function (i.e., a weighted sum of the variables);

• a set of linear constraints (i.e., the requirement that a weighted sum of the variables must be greater than or equal to some constant).

Let the set of variables be {x1, . . . , xn}, with each xi ∈ R. The objective function of a linear program, given a set of constants c1, . . . , cn, is

maximize Σ_{i=1}^{n} ci xi.

Linear programs can also express minimization problems: of course, these are just maximization problems with all weights in the objective function negated.

Constraints express the requirement that a weighted sum of the variables must be greater than or equal to some constant. For example, given a set of constants a1j, . . . , anj and a constant bj, a constraint cj is the expression

Σ_{i=1}^{n} aij xi ≥ bj.

This form actually allows us to express a broader range of constraints than might immediately be apparent. By negating all constants, we can express less-than-or-equal constraints. By providing both less-than-or-equal and greater-than-or-equal constraints with the same constants, we can express equality constraints. By setting some constants to zero, we can express constraints that do not involve all of the variables. For example, we can express the constraint that a single variable must be greater than or equal to zero. Furthermore, even problems with nonlinear constraints (e.g., involving functions like a max of linear terms) can sometimes be expressed as linear programs by adding both new constraints and new variables. Observe that we cannot always write strict inequality constraints, though sometimes such constraints can be enforced through changes to the objective function. (For an example, see the linear program in Figure 3.16, which enforces the strict inequality constraints given in Figure 3.15.)

Bringing it all together, if we have a set $C$ of different constraints, we can write a linear program as follows:
\[
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{n} c_i x_i \\
\text{subject to} \quad & \sum_{i=1}^{n} a_{ij} x_i \ge b_j \quad \forall c_j \in C \\
& x_i \ge 0 \quad \forall i = 1 \dots n
\end{aligned}
\]

Observe that the requirement that each $x_i$ must be nonnegative is not restrictive: problems involving negative variables can always be reformulated into equivalent problems that satisfy the constraint.

A linear program can also be written in matrix form. Let $c$ be a vector containing the weights $c_i$, let $x$ be a vector containing the variables $x_i$, let $A$ be a matrix of constants $a_{ij}$, and let $b$ be a vector of constants $b_j$. We can then write a linear program in matrix form as
\[
\begin{aligned}
\text{maximize} \quad & c^{T} x \\
\text{subject to} \quad & Ax \ge b \\
& x \ge 0
\end{aligned}
\]
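As an illustrative sketch (assuming Python with SciPy; the instance and numbers are invented), the program below solves a tiny two-variable problem of this form. scipy.optimize.linprog minimizes and expects less-than-or-equal constraints, so the objective is negated, and a greater-than-or-equal constraint as above would have its constants negated as well.

```python
# Illustrative only: solve  max 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 <= 3,  x >= 0.
# linprog minimizes and uses <= constraints, so the objective weights are negated
# (and A and b would be negated to convert a >= constraint into a <= one).
from scipy.optimize import linprog

c = [-3, -2]                  # negated objective weights
A_ub = [[1, 1], [1, 0]]       # coefficients of the <= constraints
b_ub = [4, 3]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)        # expected: approximately [3. 1.] and 11.0
```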

In some cases we care only to satisfy a given set of constraints, but do not have an associated objective function; any feasible solution will do. In this case the LP reduces to a constraint satisfaction problem, but we will sometimes still refer to it as an LP with an empty objective function (or, equivalently, a trivial one).

Finally, every linear program (a so-called primal problem) has a corresponding dual problem, which has the same optimal objective value. For the linear program given above, we write the dual program as
\[
\begin{aligned}
\text{minimize} \quad & b^{T} y \\
\text{subject to} \quad & A^{T} y \le c \\
& y \ge 0
\end{aligned}
\]

In this linear program our variables are $y$. Variables and constraints effectively trade places: there is one variable $y_j \in y$ in the dual problem for every constraint $c_j$ from the primal problem, and one constraint in the dual problem for every variable $x_i \in x$ from the primal problem.

Solving linear programs

In order to solve linear programs, it is useful to observe that the set of feasible solutions to a linear program corresponds to a convex polyhedron in $n$-dimensional space. This is true because all of the constraints are linear: they correspond to hyperplanes in this space, and so the set of feasible solutions is the region bounded by all of the hyperplanes. The fact that the objective function is also linear allows us to conclude two useful things: any local optimum in the feasible region will be a global optimum, and at least one optimal solution will exist at a vertex of the polyhedron. (More than one optimal solution may exist if an edge or even a whole face of the polyhedron is a local maximum.)

The most popular algorithm for solving linear programs is the simplex algorithm. This algorithm works by identifying one vertex of the polyhedron, and then taking uphill (i.e., objective-function-improving) steps to neighboring vertices until an optimum is found. This algorithm requires an exponential number of steps in the worst case, but is usually very efficient in practice.

Although the simplex algorithm is not polynomial, it can be shown that other algorithms called interior point methods solve linear programs in worst-case polynomial time. These algorithms get their name from the fact that they move through the interior region of the polyhedron rather than jumping from one vertex to another. Surprisingly, although these algorithms dominate the simplex method in the worst case, they can be much slower in practice.

B.2 Integer programs

Defining integer programs

Integer programs are linear programs in which one additional constraint holds: the variables are required to take integral (rather than real) values. This makes it possible to express combinatorial optimization problems such as satisfiability or set packing as integer programs. A useful subclass of integer programs is 0-1 integer programs, in which each variable is constrained to take either the value 0 or the value 1. These programs are sufficient to express any problem in NP. We write 0-1 integer programs as follows:

\[
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{n} c_i x_i \\
\text{subject to} \quad & \sum_{i=1}^{n} a_{ij} x_i \ge b_j \quad \forall c_j \in C \\
& x_i \in \{0, 1\} \quad \forall i = 1 \dots n
\end{aligned}
\]
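As a small illustration (the formulation is ours), consider weighted set packing: given sets $S_1, \dots, S_n$ with weights $w_1, \dots, w_n$, choose a maximum-weight collection of pairwise-disjoint sets. Letting $x_j \in \{0, 1\}$ indicate whether $S_j$ is chosen, the problem is to maximize $\sum_j w_j x_j$ subject to $\sum_{j : e \in S_j} x_j \le 1$ for every element $e$; the less-than-or-equal constraints can be negated to match the form above.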

Another useful class of integer programs is mixed-integer programs, which involve a combination of integer and real-valued variables.

Finally, as in the LP case, both integer and mixed-integer programs can come without an associated objective function, in which case they reduce to constraint-satisfaction problems.

Solving integer programs

The introduction of an integrality constraint to linear programs leads to a much harder computational problem: integer programming is NP-hard, even when each variable's domain is a finite set of integers (indeed, even in the 0-1 case). Thus, it should not be surprising that there is no known efficient procedure for solving integer programs.

The most commonly-used technique is branch-and-bound search. The space of variable assignments is explored depth-first: first one variable is assigned a value, then the next, and so on; when a constraint is violated or a complete variable assignment is achieved, the search backtracks and tries other assignments. The best feasible solution found so far is recorded as a lower bound on the value of the optimal solution. At each search node the linear program relaxation of the integer program is solved: this is the linear program where the remaining variables are allowed to take real rather than integral values between the minimum and maximum values in their domains. It is easy to see that the value of a linear program relaxation of an integer program is an upper bound on the value of that integer program, since it involves a loosening of the latter problem's constraints. Branch-and-bound search differs from standard depth-first search because it sometimes prunes the tree. Specifically, branch-and-bound backtracks whenever the upper bound at a search node is less than or equal to the lower bound. In this way it can skip over large parts of the search tree while still guaranteeing that it will find the optimal solution.
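The following is a minimal branch-and-bound sketch (assuming SciPy is available; it is not taken from the text) for a 0-1 maximization program written with less-than-or-equal constraints. At each node the LP relaxation, with unfixed variables relaxed to $[0, 1]$, supplies the upper bound; the instance at the bottom is a made-up knapsack-style example.

```python
import math
from scipy.optimize import linprog

def lp_bound(c, A, b, fixed, n):
    """Upper bound from the LP relaxation, with the variables in `fixed` pinned.
       Returns (bound, solution) or (None, None) if the node is infeasible."""
    bounds = [(fixed[i], fixed[i]) if i in fixed else (0, 1) for i in range(n)]
    res = linprog([-ci for ci in c], A_ub=A, b_ub=b, bounds=bounds)
    return (-res.fun, res.x) if res.success else (None, None)

def branch_and_bound(c, A, b, n):
    best_val, best_x = -math.inf, None

    def recurse(fixed):
        nonlocal best_val, best_x
        bound, x = lp_bound(c, A, b, fixed, n)
        if bound is None or bound <= best_val:   # infeasible, or pruned by the bound
            return
        if len(fixed) == n:                      # complete (integral) assignment
            best_val, best_x = bound, x
            return
        i = len(fixed)                           # branch on the next variable
        for v in (1, 0):
            recurse({**fixed, i: v})

    recurse({})
    return best_val, best_x

# Tiny made-up instance:  max 5x0 + 4x1 + 3x2  s.t.  2x0 + 3x1 + x2 <= 4.
print(branch_and_bound([5, 4, 3], [[2, 3, 1]], [4], 3))   # expect value 8.0
```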

Other, more complex techniques for solving integer programs include branch-and-cut and branch-and-price search. These methods offer no advantage over branch-and-bound search in the worst case, but often outperform it in practice.

Although they are computationally intractable in the worst case, sometimes integer programs are provably easy. This occurs when it can be shown that the solution to the linear programming relaxation is integral, meaning that the integer program can be solved in polynomial time. One important example is when the constraint matrix is totally unimodular and the vector $b$ is integral. A unimodular matrix is a square matrix whose determinant is either $-1$ or $1$; a totally unimodular matrix is one for which every square submatrix has determinant $-1$, $0$ or $1$. This definition implies that the entries in a totally unimodular matrix can only be $-1$, $0$ and $1$.


C Markov Decision Problems (MDPs)

We briefly review the main ingredients of MDPs, which, as we discuss in Chapter 5, can be viewed as single-agent stochastic games. The literature on MDPs is rich, and the reader is referred to the many textbooks on the subject for further reading.

C.1 The model

A Markov Decision Problem, or MDP, is a model for decision making in an uncertain, dynamic world. The (single) agent starts out in some state, takes an action, and receives some immediate reward. The state then transitions probabilistically to some other state, and the process repeats. Formally speaking, an MDP is a tuple $(S, A, p, R)$. $S$ is a set of states, and $A$ is a set of actions. The function $p : S \times A \times S \to \mathbb{R}$ specifies the transition probability among states: $p(s, a, s')$ is the probability of ending in state $s'$ when taking action $a$ in state $s$. Finally, the function $R : S \times A \to \mathbb{R}$ returns the reward for each state-action pair.

The individual rewards are aggregated in one of two ways:

The limit-average reward: $\lim_{T \to \infty} \frac{\sum_{t=1}^{T} R_t}{T}$, where $R_t$ is the reward at time $t$.

The future-discounted reward: $\sum_{t=1}^{\infty} \beta^{t} R_t$, where $0 < \beta < 1$ is the discount factor.

(The reader will notice that both of these definitions must be refined, to account for cases in which (in the first case) the limit is ill-defined, and (in the second case) in which the sum is infinite. We do not discuss these subtleties here.)

A (stationary, deterministic) policy $\pi : S \to A$ maps each state to an action. For concreteness, in the next section we will focus on the future-discounted reward case.

C.2 Solving known MDPs via value iteration

Every policy yields a reward under either of the reward-aggregation schemes. A policy that maximizes the total reward is called an optimal policy. The primary computational challenge associated with MDPs is to find an optimal policy for a given MDP.


It is possible to use linear programming techniques (see Appendix B) to calculate an optimal policy for a known MDP in polynomial time. However, we will focus on an older, dynamic-programming-style method called value iteration. We do so for two reasons. First, in typical real-world cases of interest, the LP formulation of the MDP is too large to solve. Value iteration provides the basis for a number of more practical solutions, such as those providing approximate solutions to very large MDPs. Second, value iteration is most relevant to the discussion of learning in MDPs, discussed in Chapter 6.

Value iteration defines a value function $V^{\pi} : S \to \mathbb{R}$ which specifies the value of following policy $\pi$ starting in state $s$. Similarly, we can define a state-action value function $Q^{\pi} : S \times A \to \mathbb{R}$ as a function that captures the value of starting in state $s$, taking action $a$, and then continuing according to policy $\pi$. These two functions are related to each other by the following pair of equations.
\[
Q^{\pi}(s, a) = R(s, a) + \beta \sum_{s'} p(s, a, s') V^{\pi}(s')
\]
\[
V^{\pi}(s) = Q^{\pi}(s, \pi(s))
\]

For the optimal policy $\pi^{*}$, the second equation becomes $V^{\pi^{*}}(s) = \max_a Q^{\pi^{*}}(s, a)$, and the set of equations is referred to as the Bellman equations. Note that the optimal policy is easily recovered from the Bellman equations; the optimal action in state $s$ is $\arg\max_a Q^{\pi^{*}}(s, a)$.

policy, but also—indeed, primarily—because they give rise toa procedure for calcu-lating theQ andV values of the optimal policy, and hence the optimal policy itself.Consider the following two assignment versions of the Bellman equations:

Qt+1(s, a)← r(s, a) + β∑

s

p(s, a, s)Vt(s)

Vt(s)← maxa

Qt(s, a)

Given an MDP, and starting with arbitrary initial $Q$ values, we can repeatedly iterate these two sets of assignment operators ("sets", since each choice of $s$ and $a$ produces a different instance). It is well known that any "fair" order of iteration (by which we mean that each instance of the rules is updated after a finite amount of time) converges on the $Q$ and $V$ values of an optimal policy.
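The compact Python sketch below (ours, purely illustrative) runs a fixed number of synchronous sweeps of these assignments, one particular "fair" schedule, on a small explicitly enumerated MDP and then reads off a greedy policy; the two-state instance at the bottom is made up.

```python
# A compact value-iteration sketch (illustrative only) for a small, explicitly
# enumerated MDP.  p[s][a][s2] is a transition probability, R[s][a] an immediate
# reward, and beta the discount factor.
def value_iteration(states, actions, p, R, beta, sweeps=200):
    V = {s: 0.0 for s in states}                        # arbitrary initial values
    for _ in range(sweeps):                             # synchronous sweeps
        Q = {(s, a): R[s][a] + beta * sum(p[s][a][s2] * V[s2] for s2 in states)
             for s in states for a in actions}
        V = {s: max(Q[(s, a)] for a in actions) for s in states}
    policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
    return V, policy

# Made-up two-state example: "stay" in s1 pays 2 forever, "go" from s0 pays 1.
states, actions, beta = ["s0", "s1"], ["stay", "go"], 0.9
p = {"s0": {"stay": {"s0": 1.0, "s1": 0.0}, "go": {"s0": 0.0, "s1": 1.0}},
     "s1": {"stay": {"s0": 0.0, "s1": 1.0}, "go": {"s0": 1.0, "s1": 0.0}}}
R = {"s0": {"stay": 0.0, "go": 1.0}, "s1": {"stay": 2.0, "go": 0.0}}
print(value_iteration(states, actions, p, R, beta))     # s0 -> go, s1 -> stay
```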


D Classical Logic

The following is not intended as an introduction to classical logic, but rather as a review of the concepts and a setting of notation. We start with propositional calculus, and then move to first-order logic. (We do the latter for completeness, but in fact first-order logic plays almost no role in this book.)

D.1 Propositional calculus

Syntax

Given a set $P$ of propositional symbols, the set of sentences in the propositional calculus is the smallest set $L$ containing $P$ such that if $\varphi, \psi \in L$ then also $\neg\varphi \in L$ and $\varphi \wedge \psi \in L$. Other connectives, such as $\vee$, $\to$ and $\equiv$, can be defined in terms of $\wedge$ and $\neg$.

Semantics

Given the syntax as above, an interpretation (or a model) is a set $M \subseteq P$, the subset of true primitive propositions. The satisfaction relation $\models$ between models and sentences is defined recursively as follows:

• For any $p \in P$, $M \models p$ iff $p \in M$.

• $M \models \varphi \wedge \psi$ iff $M \models \varphi$ and $M \models \psi$.

• $M \models \neg\varphi$ iff it is not the case that $M \models \varphi$.

We overload the $\models$ symbol. First, it is used to denote validity; $\models \varphi$ means that $\varphi$ is true in all propositional models. Second, it is used to denote entailment; $\varphi \models \psi$ means that any model that satisfies $\varphi$ also satisfies $\psi$.
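As a small illustration (ours), the recursive definition above translates directly into code. In the sketch below a sentence is a nested tuple built from atoms, "not" and "and", a model is the set of true propositional symbols, and the atom names are assumed not to clash with the connective tags.

```python
# Recursive satisfaction check for propositional sentences (illustrative only).
# ("p",) is an atomic sentence, ("not", s) a negation, ("and", s1, s2) a
# conjunction; a model M is the set of propositional symbols that are true.
def satisfies(M, sentence):
    op = sentence[0]
    if op == "not":
        return not satisfies(M, sentence[1])
    if op == "and":
        return satisfies(M, sentence[1]) and satisfies(M, sentence[2])
    return op in M                      # atomic case: M |= p iff p is in M

M = {"p"}
print(satisfies(M, ("and", ("p",), ("not", ("q",)))))   # True:  M satisfies p and not-q
print(satisfies(M, ("not", ("p",))))                    # False: M does not satisfy not-p
```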

Axiomatics

The following axiom system is sound and complete for the class of all propositional models:


A1. $A \to (B \to A)$

A2. $(A \to (B \to C)) \to ((A \to B) \to (A \to C))$

A3. $(\neg A \to \neg B) \to (B \to A)$

R1 (Modus Ponens). $A, A \to B \vdash B$.

D.2 First-order logic

The book makes very little reference to first-order constructs, but we include the basic material on first-order logic for completeness.

Syntax

We are given a set $C$ of constant symbols, a set $V$ of variables, a set $F$ of function symbols each of a given arity, and a set $R$ of relation (or predicate) symbols each of a given arity. The set of terms is the smallest set $T$ such that $C \cup V \subseteq T$, and if $f \in F$ is an $n$-ary function symbol and $t_1, \dots, t_n \in T$ then also $f(t_1, \dots, t_n) \in T$. The set of sentences is the smallest set $L$ satisfying the following conditions:

If $r$ is an $n$-ary relation symbol and $t_1, \dots, t_n \in T$ then $r(t_1, \dots, t_n) \in L$. These are the atomic sentences.

If $\varphi, \psi \in L$ then also $\neg\varphi \in L$ and $\varphi \wedge \psi \in L$.

If $\varphi \in L$ and $v \in V$ then $\forall v \varphi \in L$.

Semantics

Given the syntax as above, an interpretation (or a model) is a tuple $M = (D, G, S, \mu)$. $D$ is the domain of $M$, a set. $G$ is a set of functions from the domain onto itself, each of a given arity. $S$ is a set of relations over the domain, each of a given arity. $\mu$ is an (overloaded) interpretation function: $\mu : C \cup V \to D$, $\mu : F \to G$, $\mu : R \to S$ (we assume that $\mu$ respects the arities of the functions and relations). We can lift $\mu$ to apply to any term by the recursive definition $\mu(f(t_1, \dots, t_n)) = \mu(f)(\mu(t_1), \dots, \mu(t_n))$.

The satisfaction relation $\models$ between models and sentences is defined recursively as follows:

• For any atomic sentence $\varphi = r(t_1, \dots, t_n)$, $M \models \varphi$ iff $(\mu(t_1), \dots, \mu(t_n)) \in \mu(r)$.

• $M \models \varphi \wedge \psi$ iff $M \models \varphi$ and $M \models \psi$.

• $M \models \neg\varphi$ iff it is not the case that $M \models \varphi$.

• $M \models \forall v \varphi$ iff $M \models \varphi[v/t]$ for all terms $t$, where $\varphi[v/t]$ denotes $\varphi$ with all free instances of $v$ replaced by $t$.

We overload the $\models$ symbol as before.


Axiomatics

We omit the axiomatics of first-order logic since they play no role at all in the book.



Index

ǫ-Nash equilibrium,66ǫ-competitive equilibrium,24ǫ-competitive equilibrium, 24kth-price auction,275(n-agent) KB-model, 364(n-agent) KB-structure, 363(common prior) multiagent probability struc-

ture, 370ex antebudget balance,253ex-interimexpected utility,143ex-postimplementation,245ex-postequilibrium, 147EXPECTEDUTILITY problem,158

absolute continuity,183absorbing state,179accessibility relation,346, 346achievement goal,390action graph,163action profile,45, 45action,or pure strategy,45action-graph game,164actions, 45activity rules,305, 315additive game,325admissible,13affiliated values,293affine maximizer,267agency,387, 387Agent Oriented Programming (AOP),220agent-based simulation,201AGM postulates,376, 376AGoal, 390all-or-nothing bids,294alpha-beta pruning, 102alternativeness relation,346, 346anonymity,148, 164anonymous learning rule,202

anti-coordination game,180, 180, 197anytime property,16appeal function,269approval voting,227arc consistency,4, 4Arrow’s impossibility theorem,230Arrow; d’Aspremont–Gerard-Varet (AGV)

mechanism,265ascending combinatorial auction,316ascending-auction algorithm,29assignment problem,19asymmetric auctions,287asymptotically stable state,197asynchronous dynamic programming,13atomic bid,311auction theory,242auto-compatibility,193average reward,126, 135

babbling equilibrium,206backoff, 44backward induction,101, 375, 375balanced,328balanced matrices,308Barcan formula,348, 348bargaining set, 331battle of the sexes game,48, 49, 53, 64Bayes’ rule,398Bayes-Nash equilibrium,144Bayes-Nash implementation,244Bayes-Nash incentive compatibility,245Bayesian game,137, 137, 139, 139, 141,

141ex-anteexpected utility, 143ex-postexpected utility, 142

Bayesian learning,182Bayesian updating,182BDI theories,386

Page 443: Mas kevin b-2007

Index 427

behavioral strategy,108, 108,135, 135belief arbitration,381, 381belief contraction,379belief expansion,379belief fusion, 382,383belief revision,374, 374belief set,381, 381belief state,381, 381belief update,379Bellman equations,404best response,52best response in a Bayesian game,144best-response function,55bidding language,299bidding ring,288bidding rules,18Borda voting,227Braess’ paradox,156Brouwer’s fixed-point theorem,54, 54budget balance,245, 252

call market,318capability,387, 387cartel,288centipede game,106, 106characteristic function,324characteristic vector,328chess, 104chicken game,174, 174choice rule,251choice-set monotonicity,262choronological backtracking,8Church-Rosser property,62Clarke tax,258click-through rate,302cluster contracts,17coalition logic,385coalitional game in partition form,324coalitional game theory, 37coalitional game with non-transferable util-

ity, 324coalitional game with transferable utility,324collusion, 261,288, 315combinatorial auctions,305commitment,387, 387common (probabilistic) belief,372, 372common belief,361, 361common knowledge,353, 353common prior,369, 369common prior probability, 370

common value,291common-payoff game, 46,47common-prior assumption,139, 139competitive equilibrium,20, 27complementarity,26, 304complementarity condition,74conditional probability,398conditional strategy,34conditional strict dominance,80Condorcet condition,224, 228Condorcet winner,225configuration,163congestion games,147

nonatomic, 152consecutive ones property,308consistency,349, 349constant-sum game,326constant-sum games,47constraint satisfaction problem (CSP),2continuous double auction (CDA),318contract nets,15contraction, 379conversational implicature,213convex game,326Cooperative Principle,212coordination game, 47, 172, 206coordination graph, 35core,327correlated equilibrium,64, 65cumulative voting,226

default reasoning,374, 374defeasible, 362degenerate game,75demand queries,316descendant, 96descriptive theory,174desire,387, 387detail free auctions,287dialog system,217dictatorship,230, 233direct mechanism,245disambiguation,214discriminatory pricing rule,294divisible bids,294dominant strategy, 60dominant strategy equilibrium,60dominated strategy,60dominating belief state,383, 383double auction,318

dual operator, 346
dual problem, 400
dummy agent, 330
Dutch auction, 274
    multiunit, 295
dynamic programming, 11

E3 algorithm, 190
economic efficiency, 252
efficiency, 252
elimination auction, 275
empirical frequency, 55
English auction, 274
    multiunit, 294
entailment, 405
entry costs, 285
epistemic type, 141
Euclidean accessibility relation, 350
Euclidean relation, 350
evaluation function, 105
event space, 397
evolutionarily stable strategies, 68
evolutionarily stable strategy (ESS), 198
excess, 331
expansion, 379
expected utility, 50
Expectimax algorithm, 146
exposure problem, 305
extensive form, 95
    imperfect information, 95
    perfect information, 95, 96

external regret, 192

false-name bidding, 306
feasibility program, 73
feasible assignment, 19
feasible payoff set, 327
feasibly truthful, 269
fictitious play, 177
filtering algorithm, 4
finite state automaton, 129
first-price auction, 275
fitness, 195
fixed-point axiom, 353
focal point, 207
focal points, 217
Folk Theorem, 127
frugality ratio, 260
fully mixed strategy, 49
fusion, 374, 383

future discounted reward, 126, 135
futures markets, 320

GAMBIT, 93
game in matrix form, 45
game in strategic form, 45
game networks, 169
game of incomplete information, 137
games of asymmetric information, 210
GAMUT, 93
generalized first-price auction (GFP), 302
generalized second-price auction (GSP), 303
Generalized Vickrey Auction, 258
goal, 387
grain of truth, 184
graphical game, 161
graphical games, 72
greedy allocation scheme, 317
Gricean maxims, 212
Groves mechanisms, 254
Groves-based mechanisms, 268
gullibility rule, 376

hawk-dove game, 199
Highest Cumulative Reward (HCR), 202
history, 96, 352
homotopy method, 83
hyper-resolution, 6
hyperstable sets, 68

illocutionary act, 211
imperfect information, 107
imperfect-information game, 107
imperfect-information games, 107
implementation in dominant strategies, 243
implementation theory, 242
implicature, 213
imputation, 327
incentive compatibility in dominant strategies, 245
Independence of Irrelevant Alternatives (IIA), 230
independent private value (IPV), 277
independent variables, 398
indirect speech act, 214
individual rationality, 245
    ex-interim, 253
    ex-post, 253

information dissemination rules, 18
information market, 319

information set, 107
information sets, 107
integer programs, 401
    0-1, 401
intention, 386, 387
interchangeable agents, 330
interior point methods, 401
interpretation, 370, 405, 406
interpretation complexity, 314
irreducible stochastic game, 136
iterated belief revision, 383
iteration, 383

Japanese auction, 274
    multiunit, 295

job hunt game, 208
joint intention, 393
joint persistent goal, 393
joint probability, 398

Kakutani's theorem, 54
KB-model, 364
KB-structure, 363
kernel, 331
KQML, 219
Kripke model, 346
Kripke structure, 346

learning and teaching, 172
learning real-time A∗ (LRTA∗), 13
Lemke-Howson algorithm, 73, 74
Levi identity, 379
lexicographic probability system (LPS), 378
linear complementarity problem (LCP), 73
linear program, 399
linear program relaxation, 402
linkage principle, 293
local learning rule, 202
locally envy-free, 303
locutionary act, 211
logical omniscience, 349
lotteries, 39

machine games, 130
majority game, 325
marginal contribution net representation, 337

marginal probability, 398
market clearing rules, 18
Markov Decision Problem, 403
Markov Decision Process (MDP), 135
Markov game, 134
Markov perfect equilibrium (MPE), 136
Markov strategy, 135
matching pennies game, 48, 53, 189
matrix form, 45
max-min fairness, 254
maximum regret, 59
maxmin strategy, 55
maxmin value, 55
MDP, 403
mechanism, 243
mechanism design, 60, 241, 242
mechanism in the quasilinear setting, 251
minimax algorithm, 102
Minimax regret, 59
minimax-Q, 188
minmax strategy, 56
minmax strategy in a two-player game, 56
minmax strategy in an n-player game, 56
minmax value, 56
minority games, 148
mixed strategy, 49
    support of, 49
mixed strategy profile, 49
mixed-integer programs, 401
mixing time, 190
modal operator, 345
model, 405, 406
Modus Ponens, 360
money pump, 40
monotonicity, 233
monotonicity rule, 377
Moore machine, 129
multi-issue representation, 336
multiagent contracts, 17
Multiagent influence diagrams (MAIDs), 165
multiagent MDP, 32
multiagent probability structure, 370
multiunit auctions, 293
mutation, 196
myopic best response, 149

n-person normal-form game, 45
Nanson's method, 227
Nash equilibrium, 52
    ǫ-Nash, 66
    strict, 52
    weak, 52

Necessitation, 360
negative introspection, 350
neighborhood relation, 161
no externalities, 304
no negative externalities, 262
no single-agent effect, 263
no-regret, 177
no-regret learning rule, 192
non-linear complementarity problem, 81
non-monotonic logics, 394
non-ranking voting, 226
non-standard probabilities, 378
nonmonotonic reasoning, 374, 377
nucleolus, 331

observability, 173
online auction, 301
open outcry auction, 275
optimal auctions, 286
optimal policy, 403
optimal single price, 300
optimal strategy, 50
OR bid, 311
OR* bid, 312
OR-of-XOR bid, 312
order book, 318
order statistic, 283

pairwise elimination, 227
Pareto domination, 51
Pareto Efficiency (PE), 230
Pareto optimality, 51
partial substitutes, 304
partition model, 342
payment rule, 251
payoff function, 45
pedigreed belief state, 382
perfect equilibrium, 66
perfect recall, 110
perfect-information game, 96
performative, 212
periodic double auction, 318
perlocutionary act, 211
pivot mechanism, 258
plan, 388
plurality voting, 224
plurality with elimination, 227

polynomial type, 159
pooling equilibrium, 211
Popper functions, 378
position auctions, 301
positive introspection, 350
possible world, 346, 370
possible worlds structure, 346
potential games, 150
    exact, 150
    ordinal, 150
    weighted, 150

pre-imputation, 327
prediction market, 319
preference ordering, 224
preference profile, 225
preferences, 39
prescriptive theory, 174
price of anarchy, 154
primal problem, 400
principle of optimality, 11
prioritization rule, 376
prisoner's dilemma game, 44, 46, 182, 205
probability density function, 397
Probably Approximately Correct (PAC) learning, 190
production economy, 21
proper equilibrium, 66, 68
proper simple game, 326
proxy bidding, 278
pruning, 102
pseudonymous bidding, 306
pure coordination games, 47
pure strategy, 45, 49
pure strategy profile, 45
pure-coordination game, 47

Q-learning, 186
quasi-partition, 361
quasilinear mechanism
    direct, 251
quasilinear preferences, 248
quasilinear utility functions, 248

R-max algorithm, 190
random sampling optimal price auction, 300
ranked independence of irrelevant alternatives (RIIA), 238
ranking rule, 236
ranking systems setting, 236

ranking voting, 227
rational balance, 387
rational consequence relation, 377
rational learning, 182
rational programming, 219
rationality of a learning rule, 176
rationalizable, 63
rationalizable strategies, 64
reachability relation, 346
realization plan, 113
    of βi, 112
realization probability, 112
reflexive accessibility relation, 350
regret, 59, 192
regret matching, 192
reinforcement learning, 186
relaxation method, 307
replicator dynamic, 195
representation theorem, 347
request for quote, 285
revealing equilibrium, 207
Revelation Principle, 246
revenue maximization, 253
revenue minimization, 254
reverse auction, 285
ring center, 288
risk attitude, 249
risk averse, 249
risk neutral, 249
risk seeking, 249
Rock, Paper, Scissors, or Rochambeau game, 48
rules of conversation, 212

safety, 193
safety of a learning rule, 176
sample space, 397
Santa Fe Bar Problem, 148
scheduling problem, 25
sealed-bid auction, 275
    multiunit, 294
second chance mechanism, 269
second-price auction, 275
security level, 55
self coherent, 372
self coherent belief, 372
self committing utterance, 206

self play, 176, 177
self revealing utterance, 206
selfish routing, 153
Semantic Web, 219
sensor networks, 1
Separable Reward State Independent Transition (SR-SIT) stochastic game, 137
separating equilibrium, 211
sequence, 111
sequence form, 110, 147
sequence form payoff function, 111
sequence-form representation, 111
sequential auctions, 297
sequential equilibrium, 119
serial accessibility relation, 349
serial models, 349
set packing problem, 26, 307
shadow price, 21
Shapley value, 329
Shapley's almost-rock-paper-scissors game, 181
sharing game, 96
shill bid, 306
signaling game, 209
silent auction, 275
simple exchange, 264
simple game, 326
simplex, 54
simplex algorithm, 401
simplicial subdivision algorithms, 83
simplotope, 83
simultaneous ascending auction, 305, 315
single-controller stochastic game, 135, 137
single-minded, 317
single-parameter valuations, 268
single-sided auctions, 274
slack variables, 70
Smith set, 225
smooth fictitious play, 192
social choice correspondence, 225
social choice function, 225
social choice problems, 223
social convention, 201
social rule, 201
social welfare function, 226
social-welfare maximization, 252
solution concept, 50
solution concepts, 50
solvable, 62
Spence signaling game, 210

Sperner's lemma, 54
sponsored search auctions, 301
stable equilibria, 68
stable sets, 331
stable steady state, 197
Stackelberg game, 172
Stackelberg routing, 157
stag-hunt game, 207, 208
stage game, 124
stationary strategy, 125, 135
steady state, 179, 197
stochastic game, 134
strategic form, 45
strategic relevance, 166
strategies, 49
strategy-proof mechanism, 243
strict domination, 60
strict Nash equilibrium, 52
strict Pareto efficiency, 51
strict substitutes, 304
strictly dominant strategy, 374
strictly Pareto efficient, 252
strong quasi-transitivity, 239
strong transitivity, 237
subadditive valuation function, 304
subgame, 100
subgame-perfect equilibrium (SPE), 100, 101
substitutability, 26, 304
super-additive game, 325
superadditive valuation function, 304
support, 53
Support-Enumeration Method, 80
swap contracts, 17
symmetric game, 195
synergy representation, 335

targeted learning, 193
targeted optimality, 193
TCP user's game, 44
team game, 47
Tit-for-Tat (TfT), 126
total preorder, 363
total unimodularity, 308, 402
tractability, 253, 266, 316
transferable utility assumption, 324
Treasure of Sierra Madre game, 324
tree form, 95
tree-structured bid, 308

trembling-hand perfect equilibrium, 66
trigger strategy, 126, 182
truthful or incentive compatible mechanism, 245
truthfulness, 245, 252
two-sided auction, 318
type, 141

uncertainty, 354
unconditional regret, 192
uniform pricing rule, 294
unit resolution, 6
universal consistency, 177
update, 374
upper hemi-continuity, 55
utility function, 37, 45
utility theory, 37

validity, 347, 405
valuation, 251
value iteration, 404
value iteration algorithm, 32
value of a zero-sum game, 57
value of the game, 188
value queries, 316
variable elimination, 34
VCG mechanism, 258, 303
veridity, 349
verification complexity, 314
very weak domination, 60
veto player, 329
Vickrey auction, 275
Vickrey-Clarke-Groves (VCG) mechanism, 258
virtual valuation, 287

weak S5, 360
weak achievement goal, 393
weak budget balance, 253
weak domination, 60
weak monotonicity (WMON), 267
weak Nash equilibrium, 52
weak Pareto efficiency, 233
weak transitivity, 238
weighted graph game, 333
weighted majority game, 335
weighted matching, 19
well-founded total preorder, 364
Wilson doctrine, 287, 321
winner determination problem
    combinatorial auction, 306
    general multiunit auction, 298

winner's curse, 292

XML, 219
XOR bid, 311

zero-sum game, 47, 326
