Dynamic Oligopoly Games with Private Markovian Dynamics

Dynamic Oligopoly Games with Private Markovian Dynamics

Yi Ouyang, Hamidreza Tavafoghi and Demosthenis Teneketzis

Abstract— We analyze a dynamic oligopoly model withstrategic sellers and buyers/consumers over a finite horizon.Each seller has private information described by a finite-stateMarkov process; the Markov processes describing the sellers’information are mutually independent. At the beginning ofeach time/stage t the sellers simultaneously post the pricesfor their good; subsequently, consumers make their buyingdecisions; finally, after the buyers’ decisions are made, apublic signal, indicating the buyers’ consumption experiencefrom each seller’s good becomes available and the wholeprocess moves to stage t + 1. The sellers’ prices, the buyers’decisions and the signal indicating the buyers’ consumptionexperience are common knowledge among buyers and sellers.This dynamic oligopoly model arises in online shopping anddynamic spectrum sharing markets. The model gives rise to astochastic dynamic game with asymmetric information. Usingideas from the common information approach (developed in[1] for decentralized decision-making), we prove the existenceof common information based equilibria. We obtain a sequen-tial decomposition of the game and we provide a backwardinduction algorithm to determine common information-basedequilibria that are perfect Bayesian equilibria. We illustrateour results with an example.

I. INTRODUCTION

A. Background and Motivation

Stochastic dynamic games arise in many socio-technological systems that consist of many strategicdecision makers (agents). In dynamic games with symmetricinformation all the agents share the same informationand each agent makes decisions anticipating other agents’strategies. This class of dynamic games has been extensivelystudied in the literature (see [2–5] and references therein).An appropriate solution concept for this class of gamesis sub-game perfect equilibrium (SPE) which consists ofa strategy profile of agents that must satisfy sequentialrationality [2]. The common history in dynamic games withsymmetric information can be utilized to provide a sequentialdecomposition of the dynamic game. The common history(or a function of it) serves as an information state and SPEcan be computed through backward induction.

In dynamic games with asymmetric information agentshave different observations over time, and consequently dif-ferent local history. As a result, each agent needs to anticipatethe other agents’ strategy and to form a belief about theother agents’ local information. Therefore, perfect Bayesianequilibrium (PBE) is an appropriate solution concept for this

This research was supported in part by NSF grants CCF-1111061 andCNS-1238962.

Y. Ouyang, H. Tavafoghi and D. Teneketzis are with the Departmentof Electrical Engineering and Computer Science, University of Michi-gan, Ann Arbor, MI (e-mail: [email protected]; [email protected];[email protected]).

class of games. PBE consists of a pair of strategy profilesand beliefs for all agents that jointly must satisfy sequentialrationality and consistency [2]. In games with asymmetricinformation a decomposition similar to that of games withsymmetric information is not possible in general. This isbecause the computation of any belief at time t depends,in general, on the strategy of all agents up to time t − 1.As a result, only special instances of stochastic dynamicgames with asymmetric information have been studied inthe literature (see [6–12] and references therein).

In this paper, we consider a dynamic oligopoly modelwith asymmetric information and Markovian dynamics. Wedefine a class of PBE and provide a sequential decompositionof the game through an appropriate choice of informationstate using ideas from the common information approachdeveloped in [1] for decentralized decision-making. Theproposed equilibrium and the associated decomposition re-semble Markov perfect equilibrium (MPE) defined in [13]for dynamic games with symmetric information.

Dynamic oligopoly games have been studied in eco-nomics. Bergemann et al. [14] study a dynamic oligopolygame with symmetric information where the quality ofgoods is unknown and is revealed over time through publicconsumption experiences of the buyers; the solution conceptis MPE. A repeated oligopoly game with asymmetric infor-mation is investigated in [15], [16] with private productioncosts that are described by iid processes; the solution conceptis public perfect equilibrium [12]. In this paper, we considera dynamic oligopoly model where each seller has privateinformation about the quality of his product; this informationis described by a Markov process. Such a dynamic oligopolymodel results in a non-zero sum stochastic dynamic gamewith asymmetric information.

The dynamic oligopoly model investigated in this paperarises in several socio-technological areas. Such an areais the growing online shopping market. In such a market,the buyers’ shopping decisions are based on the products’posted prices and the posted consumer reviews about theirconsumption experiences [17], [18]. A product’s quality isthe seller’s private information that varies over time due tochanges in technology and other unpredictable factors. It iswidely known that the public consumer reviews, known asword of mouth [19], have great impact on the the consumers’buying decisions and, consequently, on the sellers’ pricingstrategies.

Another area is dynamic spectrum markets. It has beenargued that the static long-term allocation of the spectrumresults in inefficient and underutilization of spectrum [20].A dynamic market architecture, regulated by FCC, has been

proposed as a solution to this inefficiency. In this marketarchitecture primary users (service providers) can rent/leasespectrum (possibly along with access points) to secondaryusers for limited time and at particular locations [21], [22].In such markets the primary users have private informationabout the quality and vacancy of their own channels. Thisasymmetry of of information has not been addressed in theliterature so far. Existing work (see [23–26] and referencesin) has primarily investigated dynamic spectrum marketswith symmetric information.

The problems investigated in [6–10] are the most closelyrelated to our problem, but they are different from ourproblem. The authors of [8–10] analyze zero-sum Markovianmodels with asymmetric information. Our paper as wellas [6], [7] use a common information-based methodologyto establish the existence of common information-basedequilibria, and achieve a sequential decomposition of thedynamic game that leads to a backward induction algorithmthat determines such equilibria. The key difference betweenour problem and those in [6], [7] is in the informationstructure. The information structure in [6], [7] is such that theagents’ common information based belief about the systemstate is policy independent. Therefore, in [6], [7] there is nosignaling effect [2], [4] and one only needs to determinea strategy profile that satisfies the sequential rationalitycondition. The information structure in our problem results ina common information-based belief that is policy dependent.As a result, there is signaling involved and we need to de-termine both strategy profiles and belief systems that jointlysatisfy sequential rationality and consistency conditions.

B. Contribution

The key contributions of the paper are: (i) The exis-tence of common information-based equilibria for a classof stochastic dynamic games with asymmetric informationwhere the common information-based beliefs are policydependent; (2) The sequential decomposition of stochasticdynamic oligopoly games with the asymmetric informationstructure described in section II. This decomposition providesa backward induction algorithm to find common information-based equilibria that are PBE. The results are illustrated byan example.

C. Organization

The paper is organized as follows. We introduce the modeland formulate our problem in section II. In Section III, weanalyze the problem and present our results. We provide anexample that illustrates our results in section IV. We concludein section V. Due to space limitation, the proofs of all thetechnical results of the paper have been omitted. They canbe found in the Appendix.

D. Notation

Random variables are denoted by upper case letters, theirrealization by the corresponding lower case letter. In gen-eral, subscripts are used as time index while superscriptsare used to index sellers and buyers. For time indices

t1 ≤ t2, Xt1:t2 (resp. ft1:t2(·)) is the short hand nota-tion for the variables (Xt1 , Xt1+1, ..., Xt2) (resp. functions(ft1(·), . . . , ft1(·))). When we consider the variables (resp.functions) for all time, we drop the subscript and use Xto denote X1:T (resp. f(·) to denote f1:T (·)). For vari-ables X1

t , . . . , XNt (resp. functions f1t (·), . . . , fNt (·)), we use

Xt := (X1t , . . . , X

Nt ) (resp. ft(·) := (f1t (·), . . . , fNt (·))) to

denote the vector of the set of variables (resp. functions), andX−nt := (X1

t , . . . , Xn−1t , Xn+1

t , . . . , XNt ) (resp. f−nt (·) :=

(f1t (·), . . . , fn−1t (·), fn+1t (·), . . . , fNt (·))) to denote all the

variables (resp. functions) except that of the agent indexedby n. P(·) and E(·) are the probability and the expectation,respectively, of an event. For a set X , ∆(X ) denotes theset of all beliefs/distributions on X . For random variablesX,Y with realizations x, y, P(x|y) := P(X = x|Y = y)and E(X|y) := E(X|Y = y). For a strategy g and a belief(probability mass function) π, we use Pgπ(·) (resp. Egπ(·)) toindicate that the probability (resp. expectation) depends onthe choice of g and π.

II. SYSTEM MODEL AND PROBLEM FORMULATION

Consider a dynamic oligopoly game between N sellersindexed by n ∈ N = {1, 2, · · · , N} and M buyers indexedby m ∈ M = {1, 2, · · · ,M} over a time period T ={1, 2, · · · , T}. At time t ∈ T the state of the system is givenby (X1

t , X2t , · · · , XN

t ), where Xnt ∈ X denotes the state of

seller n which is privately observed by seller n. The privatestate Xn

t has Markovian dynamics with matrix of transitionprobabilities given by Qn = {Qn(i, j) > 0, i, j ∈ X} forn ∈ N .

At the beginning of time t, each seller n, n ∈ N , selects aprice Pnt ∈ P at which he sells a unit of his good. After theprices Pt := (P 1

t , P2t , . . . , P

Nt ) are announced, each buyer

chooses the amount of good that he wants to buy from eachseller at time t. Let Dm

t ∈ DN ,m ∈ M, denote buyerm’s decision at time t. Then, Dm

t := (Dmt (1), . . . , Dm

t (N)),and Dm

t (n) = k means that buyer m buys k units of goodfrom seller n. Let Dt := (D1

t , . . . , DMt ) denote the buyers’

decisions at time t. The decisions Dt are observed by boththe buyers and the sellers. After the buyers’ decisions aremade, each buyer receives the procured goods at time t. Theconsumption experience of good n at time t, denoted byY nt ∈ Y , is given by

P(ynt |xnt ) = Qy(xnt , ynt ) > 0, for all ynt ∈ Y, xnt ∈ X . (1)

Let Yt := (Y 1t , Y

2t , · · · , Y Nt ); Yt is observed by the sellers

and the buyers at time t. The time-ordering of events fromtime t to t + 1 is shown in Fig. 1. Double arrows (blue)indicate sellers’ and buyers’ decisions; thin arrows indicatethe realizations of the states of the Markov processes andthe consumption experiences; and dashed arrows indicateobservations of sellers and buyers. Note that t+ is used todenote the time when the buyers make decisions.

For each n ∈ N , seller n’s instantaneous revenue at timet is given by

φn,St (Pt, Dt) := (Pnt − c)∑m∈N

Dmt (n) (2)

timet t+ t+ 1

Xt Yt

seller n Xnt P

nt Dt Yt

buyer m Pt Dmt Dt Yt

Fig. 1. The time-ordering of events.

where c denote the fixed marginal production cost. For eachm ∈ M, buyer m’s instantaneous utility at time t is givenby

φm,Bt (Yt, Dt, Pt) := vmt (Yt, Dt)−∑n∈N

Dmt (n)Pnt , (3)

where vmt (Yt, Dt) denotes his utility from using the procuredgoods with consumption experience Yt which depends on thecollective decisions Dt of the buyers. Note that the depen-dency of the buyer’ utility on the aggregate decision Dt cancapture complementarity among the goods and externalityamong the buyers.

We make the following assumptions

Assumption 1. The dynamic oligopoly game is a finite game.i.e. X , P , D and Y are all finite sets.

Assumption 2. The Markov chains describing the evolutionof the state processes, and the consumption experiences Ytconditional on the state Xt of the sellers are all mutuallyindependent.

Let Hn,St denote the history of observations of seller n

at time t, and Hm,Bt denote the history of observations of

buyer m at time t. The histories Hn,St and Hm,B

t are givenby (consult Fig 1)

Hn,St = {Xn

1:t, P1:t−1, D1:t−1, Y1:t−1} for n∈N , (4)

Hm,Bt = Hc

t+ :={P1:t, D1:t−1, Y1:t−1} for m∈M. (5)

Let Hn,St and Hm,Bt denote the space of possible historiesof seller n and buyer m, respectively, at time t. Let Ht :=∪n∈N ,m∈MHn,St ∪Hm,Bt and H := ∪t∈THt denote the setof all possible histories of the buyers and the sellers overtime horizon T . A behavioral strategy for seller n at time tis a mapping gn,St : Hn,St 7→ ∆(P) such that

P(Pnt = pnt ) = gn,St (hn,St )(pnt ) for all pnt ∈ P. (6)

A behavioral strategy for buyer m at time t is a mappinggm,Bt : Hm,Bt 7→ ∆(D) such that

P(Dmt = dmt ) = gm,Bt (hm,Bt )(dmt ) for all dmt ∈ D. (7)

We define gn,S := gn,S1:T and gm,B := gm,B1:T , and call g :=

(g1,S , . . . , gN,S , g1,B , . . . , gM,B) a strategy profile. Let Gn,Stand Gm,Bt denote the set of all possible strategies of sellern and buyer m, respectively, at time t.

Let µ := (µ1, µ2 . . . , µT ) be a belief system where µt :Ht 7→ ∆(X t×N ) for any t ∈ T ; for every history hn,St ∈Ht (resp. hm,Bt ∈ Ht) µt(hn,St ) (resp. µt(h

m,Bt )) defines a

distribution/belief on X1:t for seller n (resp. for buyer m).For each n ∈ N , seller n’s objective is to maximize his

expected (under the belief system µ) profit described by

Egµ

[T∑t=1

φn,St (Pt, Dt)

](8)

where φn,St (Pt, Dt) is given (2). For each m ∈ M, buyerm’s objective is to maximize his expected (under the beliefsystem µ) utility described by

Egµ

[T∑t=1

φm,Bt (Yt, Dt, Pt)

](9)

where φm,Bt (Yt, Dt, Pt) is given by (3).A pair (g, µ) satisfies sequential rationality if for every

n ∈ N , gn,St:T is a solution to

supg′n,St:T ∈G

n,St:T

Eg′n,St:T ,g−n,S ,gB

µ

[T∑τ=t

φn,St (Pt, Dt)|hn,St

](10)

for every t ∈ T and every history hn,St ∈ Hn,St , and forevery m ∈M, gm,Bt:T is a solution to

supg′m,Bt:T ∈Gm,Bt:T

EgS ,g′m,Bt:T ,g−m,B

µ

[T∑τ=t

φm,Bt (Yt, Dt, Pt)|hm,Bt

], (11)

for every t ∈ T and for every history hm,Bt ∈ Hm,Bt .A pair (g, µ) satisfies consistency if for every t ∈ T ,

µt(hn,St+1) (resp. µt(h

m,Bt+1 )) is determined by Bayes’ rule

µt(hn,St+1)(x1:t+1) =

Pgµt(h

n,St )

(x1:t+1, pt, dt, yt|hn,St )∑x′1:t+1∈X t+1:x′nt+1=x

nt+1Pgµt(h

n,St )

(x′1:t+1, pt, dt, yt|hn,St )(12)

for every history hn,St+1 = (hn,St , pt, dt, yt, xnt+1) ∈ Hn,St+1 of

seller n, n ∈ N (respectively,

µt(hm,Bt+1 )(x1:t+1) =

Pgµt(h

m,Bt )

(x1:t+1, pt+1, dt, yt|hm,Bt )∑x′1:t+1∈X t+1 Pg

µt(hm,Bt )

(x′1:t+1, pt+1, dt, yt|hm,Bt )(13)

for every history hm,Bt+1 = (hm,Bt , pt+1, dt, yt) ∈ Hm,Bt+1 ofbuyer m, m ∈M).

A pair (g, µ) is called a perfect Bayesian equilibrium(PBE) if it satisfies sequential rationality and consistency.1

We wish to determine PBE of the dynamic game withasymmetric information defined by the dynamic oligopolymodel described in this section.

1Note that the dynamic oligopoly model defined in this paper satisfiesthe conditions of [27] [2, ch. 8], because the Markov processes describingthe private information of the sellers are mutually independent. Therefore,the set of PBEs and the set of sequential equilibria of the game defined inthis paper are the same.

III. ANALYSIS

The main results of this paper are described by Theorem1 and 2 presented in Section III-B. The proof of Theorem 1and 2, along with the proofs of all the intermediate resultsrequired to verify the theorems’ assertions can be found inAppendix I. To establish Theorems 1 and 2 we proceed asfollows.

A. Preliminaries

The common history Hct at the beginning of time t is

Hct = {P1:t−1, D1:t−1, Y1:t−1}. (14)

The common history Hct+ at time t+, the time when buyers

make decisions, is

Hct+ = {P1:t, D1:t−1, Y1:t−1}. (15)

Let Hc denote the space of all possible common histories.Note that Hm,Bt = Hct+ and hm,Bt = hct+ for all m ∈M.

Based on common histories, we define common-information based (CIB) belief systems. A map γ : Hc 7→∪t∈T∆(XN ) is called a CIB belief system. A set of func-tions ψ = {ψnt , ψnt+ , n ∈ N , t ∈ T }, where ψnt : ∆(XN )×P 7→ ∆(X ) and ψnt+ : ∆(XN )×Y 7→ ∆(X ), is called a CIBupdate rule. From any CIB update rule ψ, we can construct aCIB belief system γψ : Hc 7→ ∪t∈T∆(XN ) by the followinginductive construction:

1) γψ(hc1)(x1) = P(x1) =∏n∈N P(xn1 ) ∀x1 ∈ XN .

2) At time t+, after γψ(hct) is defined, set

γψ(hct+)(xnt ) := ψnt (γψ(hct), pnt )(xnt ), (16)

γψ(hct+)(xt) :=

N∏n=1

γψ(hct+)(xnt ), (17)

for all xt ∈ XN .3) At time t+ 1, after γψ(hct+) is defined, set

γψ(hct+1)(xnt+1) :=ψnt+(γψ(hct+), ynt )(xnt+1), (18)

γψ(hct+1)(xt+1) :=

N∏n=1

γψ(hct+1)(xnt+1), (19)

for all xt+1 ∈ XN .For the CIB belief system γψ , we use Π

γψt and Π

γψt+ to denote

the belief, under γψ , on Xt conditional on the commonhistories at time t and time t+, respectively; that is,

Πγψt := γψ(Hc

t ) ∈ ∆(XN ), (20)

Πγψt+ := γψ(Hc

t+) ∈ ∆(XN ). (21)

We also define the marginal beliefs on Xnt at time t and time

t+, respectively, as

Πn,γψt (xnt ) := γψ(Hc

t )(xnt ) ∀xnt ∈ X , (22)

Πn,γψt+ (xnt ) := γψ(Hc

t+)(xnt ) ∀xnt ∈ X . (23)

Given a CIB belief system γψ , we consider CIB strategieswhere each seller n makes his decision at time t based onXnt and Π

γψt , and each buyer m makes his decision at time

t+ based on Pt and Πγψt+ . We call a set of functions λ =

{λn,St , λm,Bt , n ∈ N ,m ∈ M, t ∈ T }, where λn,St : X ×∆(XN ) 7→ ∆(P) and λm,Bt : P ×∆(XN ) 7→ ∆(D), a CIBstrategy profile.

In analogy with PBE, we define consistent beliefs for CIBstrategies.

Definition 1 (consistency). For a given CIB strategy profileλ, we call a CIB update rule ψ consistent with λ if, for alln ∈ N ,

ψnt (πt, pnt )(xnt )

=λn,St (xnt , πt)(p

nt )πnt (xnt )∑

x′nt ∈X λn,St (x′nt , πt)(p

nt )πnt (x′nt )

(24)

when the denominator of (24) is non-zero;

ψnt+(πt+ , ynt )(xnt+1)

=

∑xnt ∈X Q

n(xnt , xnt+1)Qy(xnt , y

nt )πnt+(xnt )∑

x′nt ∈X Qy(x′nt , y

nt )πnt+(x′nt )

(25)

when the denominator of (25) is non-zero.

Note that when the denominator of (24) and (25) are zero,ψnt (·) and ψnt+(·) can be arbitrarily defined and consistencyis still satisfied.

The following lemma connects a CIB strategy profilealong with its consistent CIB belief to a strategy profile alongwith its consistent belief for the dynamic oligopoly game.

Lemma 1. If λ is a CIB strategy profile and ψ is a CIBupdate rule consistent with λ, there exists a pair, denoted by(g, µ) = f(λ, ψ), such that g ∈ G is a strategy profile with

gn,St (hn,St ) := λn,St (xnt , γψ(hct)) = λn,St (xnt , πγψt ) (26)

for all n ∈ N ,

gm,Bt (hct+) := λm,Bt (pt, γψ(hct+)) = λm,Bt (pt, πγψt+ ), (27)

for all m ∈ M, and µ : H 7→ ∪t∈T∆(X t×N ), is a beliefsystem such that, for all hn,St , hct+ ∈ H and for all xt ∈ XN ,

µ(hn,St )(xt) = 1{xnt |hn,St }∏k 6=n

γψ(hct)(xkt ), (28)

µ(hct+)(xt) =∏k∈N

γψ(hct+)(xkt ), (29)

where 1{xnt |hn,St } denotes the indicator that Xnt = xnt in the

history hn,St . Furthermore, µ is consistent with g.

Note that, (28) implies that for any seller n, his privatestate Xn

t is independent of X−nt under µ for any historyhn,St ∈ Hn,St .

B. Common-Information Based Equilibria

We focus on common-information based equilibria definedbelow.

Definition 2. A pair (λ∗, ψ∗) of a CIB strategy profile λ∗

and a CIB update rule ψ∗ is called a common-informationbased equilibrium (CIB equilibrium) if ψ∗ is consistent with

λ∗ and the pair (g∗, µ∗) = f(λ∗, ψ∗) defined in Lemma 1forms a PBE.

The following lemma plays a crucial role in establishingthe main results of this paper.

Lemma 2 (Closeness of CIB strategies). Suppose λ is aCIB strategy profile and ψ is a CIB update rule consistentwith λ. If every buyer m uses the strategy λm,B along withthe belief generated by ψ, and every seller k 6= n uses thestrategy λk,S along with the belief generated by ψ, then,there exists a CIB strategy λn,S that is a best response forseller n under the belief generated by ψ. Furthermore, ifevery seller n uses the strategy λn,S along with the beliefgenerated by ψ, and every buyer k 6= m uses the strategyλk,B along with the belief generated by ψ, then, there existsa CIB strategy λm,B that is a best response for buyer munder the belief generated by ψ.

Lemma 2 says that the set of CIB strategies is closed underthe best response mapping. Lemma 2 allows us to restrictattention to the set of CIB strategies and to search for CIBequilibria. We show that the best response mapping has afixed point within the set of CIB strategies and beliefs; thus,we establish the existence of CIB equilibria.

Theorem 1. The dynamic oligopoly game defined in SectionII has at least one CIB equilibrium.

In order to compute CIB equilibria, for a pair of CIBstrategy profile λ and CIB update rule ψ (not necessaryconsistent with λ), we define, recursively, a set of functions

V (λ, ψ):={V n,St (·), V m,Bt (·), V n,St+ (·), V m,Bt+ (·), t∈T

}(30)

as follows.V n,ST+1(·) := 0 ∀n ∈ N , V m,BT+1 (·) := 0 ∀m ∈M.For each t+, t ∈ T , for all n ∈ N and for all m ∈M,

V n,St+ (πt+ , pt, xnt ) := Eλ

Btπt+

[φn,St (pt, Dt)

+V n,St+1 (ψt+(πt+ , Yt), Xnt+1)|πt+ , pt, xnt

](31)

V m,Bt+ (πt+ , pt) :=EλBtπt+

[φm,Bt (Yt, pt, Dt)

+V m,Bt+1 (ψt+(πt+ , Yt))|πt+ , pt]

(32)

where πt+ denotes the CIB distribution of Xt at t+ in theabove expectation.For each t, t ∈ T , for all n ∈ N and for all m ∈M,

V n,St (πt, xnt ):=Eλ

Stπt

[V n,St+ (ψt(πt, Pt), Pt, x

nt )|πt, xnt

](33)

V m,Bt (πt) := EλStπt

[V m,Bt+ (ψt(πt, Pt), Pt)|πt

]. (34)

where πt denotes the CIB distribution of Xt at t in the aboveexpectation.

Using the set of functions V (λ, ψ), we provide amethod/algorithm to sequentially compute CIB equilibria inthe following theorem.

Theorem 2. A pair (λ∗, ψ∗) of a CIB strategy profile λ∗ anda CIB update rule ψ∗ is a CIB equilibrium if and only if ψ∗

is consistent with λ∗ and λ∗ solves the dynamic programdefined by functions V ∗ = V (λ∗, ψ∗). That is,

1) For all n ∈ N , for any xnt ∈ X and πt ∈ ∆(XN ),any p∗nt ∈ P , such that λ∗n,St (xnt , πt)(p

∗nt ) > 0, should

satisfy

p∗nt ∈arg maxpnt ∈P

{Eλ∗−n,Stπt

[V ∗n,St+ (

ψ∗t (πt, (pnt , P

−nt )), (pnt , P

−nt ), xnt )|πt, xnt

]}.

(35)

2) For all m ∈ M, for any pt ∈ PN and πt+ ∈ ∆(XN ),any d∗mt ∈ D, such that λ∗m,Bt (pt, πt+)(d∗mt ) > 0,should satisfy

d∗mt ∈ arg maxdmt ∈D

{Eλ∗−m,Btπt+

[φm,Bt (Yt, pt, (d

mt , D

−mt ))

+V ∗m,Bt+1 (ψ∗t+(πt+ , Yt))|pt, πt+]}

. (36)

Any CIB equilibrium (λ∗, ψ∗) determined by the algo-rithm of Theorem 2 gives rise to a PBE (g∗, µ∗) = f(λ∗, ψ∗)via the result of Lemma 1. Note that there may existmultiple CIB equilibria, and corresponding PBE equilibria,of the game formulated in this paper. In such a situation,equilibrium selection is an open problem (see [28]).

We illustrate our results with the following example.

IV. AN EXAMPLE

Consider an oligopoly game with N = 2 sellers and M =10 buyers over a time horizon T = 2. The state of eachseller’s service is either high or low, i.e. X = {H,L}, withtransition probabilities given by

Qn(H,H) = qH , Qn(H,L) = 1− qH , (37)

Qn(L,H) = qL, Qn(H,L) = 1− qL (38)

for both the buyers; the initial distribution on X11 and X2

1 aregiven by π1

1 and π21 , respectively. The quality of the good

Y nt is assumed to be either y > 0 or 0. The conditionaldistribution of Y nt is given by

P(Y nt = y) = qyH when Xnt = H, (39)

P(Y nt = y) = qyL when Xnt = L, (40)

where qyH > qyL > 0. Consequently

E[Y nt |Xnt = H] = qyHy, E[Y nt |Xn

t = L] = qyLy. (41)

Let P = {h, l}, that is, each seller can set either a highprice h or a low price l. We assume that each buyer buysat most one unit of the good from either of the sellers, i.e.D = {(0, 0), e1, e2} where e1 := (1, 0), e2 := (0, 1).

The instantaneous utility of buyer m is given by

φm,Bt (Yt, Pt, Dt)

:=∑n=1,2

Dmt (n)

Y nt − Pnt −∑k 6=m

Dkt (n)

(42)

where the term∑k 6=mD

kt (n) captures the negative con-

sumption externality created by other buyers (e.g. delayin delivery in online shopping; interference in spectrummarkets). The instantaneous revenue of seller n are givenby

φn,St (Pt, Dt) = Pnt∑m∈M

Dmt (n). (43)

Since the state is either H or L, any common belief onXnt at time t (resp. t+) can be described by a scalar πnt (resp.

πnt+ ) that denotes P(Xnt = H|hct) (resp. P(Xn

t = H|hct+)).Using Theorem 2, we compute a CIB equilibrium of the

dynamic oligopoly game described above.

A. Computation of a CIB Equilibrium

In the computation, we use the following parameters: qH=0.8, qL=0.3, qyH =0.6, qyL=0.1, y=10, h=10, l=8. Due tospace limit, see Appendix II for more details in this section.

First consider time t+, t = 1, 2. Suppose that the agentsuse a CIB strategy profile λ and a CIB update rule ψ. Thenthe dynamic problem (36) for buyer m at each time t+, t =1, 2, becomes

arg maxdmt ∈D

{Eλ−m,Btπt+

[φm,Bt (Yt, pt, (d

mt , D

−mt ))

+V m,Bt+1 (ψt+(πt+ , Yt))|pt, πt+]}

= arg maxdmt ∈D

{Eλ−m,Btπt+

[φm,Bt (Yt, pt, (d

mt , D

−mt ))|pt, πt+

]}+ Eπt+

[V m,Bt+1 (ψt+(πt+ , Yt))|pt, πt+

]. (44)

Note that the term Eπt+[V m,Bt+1 (ψt+(πt+ , Yt))|pt, πt+

]does

not depend on Dt. Therefore, a CIB strategy λBt (·)solves (44) if for any pt ∈ P and πt+ ∈ ∆(XN ),{λm,Bt (pt, πt+)(dmt ),m ∈ M} is a Bayesian Nash equilib-rium (BNE) of the game where each buyer m,m ∈ M hasthe objective:

maxdmt ∈D

{Eπt+

[φm,Bt (Yt, pt, (d

mt , d

kt ))|pt, πt+

]}=

maxdmt ∈D

∑n=1,2

dmt (n)

πnt+yD + qyLy − pnt −∑k 6=m

dkt (n)

(45)

where yD := qyHy − qyLy. It can be shown that the abovegame has a symmetric equilibrium given by

P(Dmt (n) = 1)

=1

2(M − 1)

(yD(πnt+ − π−nt+ )− pnt + p−nt

)+

1

2(46)

for t = 1, 2 and n = 1, 2. Therefore, we define the behavioralstrategy λ∗m,Bt (pt, πt+) of buyer m,m ∈M, at time t+, t =1, 2, by

λ∗m,Bt (pt, πt+)(en)

=1

2(M − 1)

(yD(πnt+ − π−nt+ )− pnt + p−nt

)+

1

2. (47)

Now suppose the buyers use strategy λ∗B and the sellersuse a (behavioral) CIB strategy λS given by

λn,S1 (xnt , π1)(h) = βn1 (π1) for both xnt = H or L, (48)

λn,S2 (H,π2)(h) = βn2 (H,π2), (49)

λn,S2 (L, π2)(h) = βn2 (L, π2), (50)

where β1 = {βn1 (π1t ) ∈ [0, 1], n = 1, 2} and β2 =

{βn2 (H,π2), βn2 (L, π2) ∈ [0, 1], n = 1, 2} are functionsthat describe the probabilities in the behavioral strategiesλn,Sn , n = 1, 2. The CIB strategy λS can be interpreted asfollows. At time t = 1 the sellers make pricing decisionsbased only on the CIB belief π1. At time t = 2 the sellersmake pricing decisions based on their private informationand the the CIB belief π2.

For any β2 consider a CIB update rule ψβ2 given by

ψn,β2

1 (π1, pn1 ) = πn1 for both pn1 = h or l, (51)

ψn,β2

1+(π1+ , y)

=πn1+q

yHqH + (1− πn1+)qyLqL

πn1+qyH + (1− πn1+)qyL

, (52)

ψn,β2

1+(π1+ , 0)

=πn1+(1− qyH)qH + (1− πn1+)(1− qyL)qL

πn1+(1− qyH) + (1− πn1+)(1− qyL), (53)

for n = 1, 2 at time t = 1; at time 2 when βn2 (H,π2) =βn2 (L, π2)

ψn,β2

2 (π2, pn2 ) = πn2 for both pn2 = h or l; (54)

when βn2 (H,π2) 6= βn2 (L, π2)

ψn,β2

2 (π2, h) =πn2 β

n2 (H,π2)

πn2 βn2 (H,π2) + (1− πn2 )βn2 (L, π2)

, (55)

ψn,β2

2 (π2, l) =

πn2 (1− βn2 (H,π2))

πn2 (1− βn2 (H,π2)) + (1− πnt )(1− βn2 (L, π2))(56)

for n = 1, 2. The CIB update rule ψβ2 is consistent with theCIB strategy profile (λS , λ∗B).

Then, under the CIB strategy profile (λS , λ∗B) and theCIB update rule ψβ2 , the expected instantaneous revenue ofseller n, n = 1, 2, at time 2 becomes

pn2E(λS ,λ∗B)π2

[ ∑m∈M

Dm2 (n)

]=rn,β2

2 (π2, pn2 ) (57)

where

rn,β2

2 (π2, pn2 ) :=

M

2(M − 1)pn2 [yD(ψn,β2

2 (π2, pn2 )− π−n2 )

− pn2 + p−n2 (π2, β2) + (M − 1)], (58)

and

pn2 (π2, β2) := (πn2 βn2 (H,π2) + (1− πn2 )βn2 (L, π2))h

+ (πn2 (1− βn2 (H,π2)) + (1− πn2 )(1− βn2 (L, π2)))l (59)

for n = 1, 2. There exists a BNE λ∗S2 (·) at time t = 2 forthe game where each seller has the objective given by (57),

0

0.5

1

0

0.5

1

0.4

0.5

0.6

0.7

0.8

πn2

β∗n2 (H,π2)

π−n2

0

0.5

1

0

0.5

10.2

0.4

0.6

0.8

1

πn2

β∗n2 (L,π2)

π−n2

Fig. 2. The strategy λ∗S2 (·) for the sellers at time t = 2.

if for each π2 ∈ ∆({H,L}2) and for each n = 1, 2 one thethe following conditions is satisfied:

1) 1 > β∗n2 (H,π2) > 0 or 1 > β∗n2 (L, π2) > 0, andrn,β∗22 (π2, h) = r

n,β∗22 (π2, l).

2) β∗n2 (H,π2) = β∗n2 (L, π2) = 1, and rn,β∗22 (π2, h) ≥

rn,β∗22 (π2, l).

3) β∗n2 (H,π2) = β∗n2 (L, π2) = 0, and rn,β∗22 (π2, h) ≤

rn,β∗22 (π2, l).

where β∗2 = {β∗n2 (H,π2), β∗n2 (L, π2), n = 1, 2} are thefunctions corresponding to λ∗S2 (·) according to (49) and (50).We numerically compute such a BNE λ∗S2 (·) along with thefunctions β∗n2 (H,π2) and β∗n2 (L, π2). They are shown inFig. 2.

Now consider at time 1. Under the CIB strategy profile(λS , λ∗B) and the CIB update rule ψβ2 , the instantaneousrevenue of seller n is


[ ∑m∈M

Dm1 (n)

]=rn,β1

1 (π1, pn1 ) (60)

where

rn,β1

1 (π1, pn1 ) :=

M

2(M − 1)pn1 [yD(πn1 − π−n1 )

− pn1 + p−n1 (π1, β1) + (M − 1)], (61)

and

pn1 (π1, β1) := βn1 (π1)h+ (1− βn1 (π1))l, (62)

for n = 1, 2. Because of (48), each seller’s expectedcontinuation revenue in the analogue of (35) for our exampledoes not depend on λS1 (·) (see Appendix II for more details).Therefore, the sellers’ problem at t = 1 is to determine a CIBequilibrium for the game with objectives given by (60).There exists a BNE λ∗S1 (·) at time t = 1 for the gamewhere each seller has the objective given by (60), if foreach π1 ∈ ∆({H,L}2) and for each n = 1, 2 one the thefollowing conditions is satisfied:

1) 1 > β∗n(π1) > 0 and rn,β∗1

1 (π1, h) = rn,β∗11 (π1, l).

2) β∗n(π1) = 1 and rn,β∗1

1 (π1, h) ≥ rn,β∗1

1 (π1, l).

3) β∗n(π1) = 0 and rn,β∗1

1 (π1, h) ≤ rn,β∗1

1 (π1, l).

where β∗1 = {β∗n1 (π2), n = 1, 2} are the functions corre-sponding to λ∗S1 (·) according (48). We numerically computesuch a BNE λ∗S1 (·) along with the function β∗n1 (π1). Theyare shown in Fig. 3. The BNE λ∗S1 (·) is a solution to the

0

0.5

1

0

0.5

10

0.2

0.4

0.6

0.8

1

πn1

βn1 (π1)

π−n1

Fig. 3. The strategy λ∗S1 (·) for the sellers at time t = 1.

analogue of (35) in Theorem 2 for our example by thereasons explained above.

From the above calculation we have a CIB strategy λ∗ ={λ∗B , λ∗S} and a CIB update rule ψ∗ := ψβ

∗2 such that

λ∗ solves the dynamic program of Theorem 2, and ψ∗ isconsistent with λ∗. Then, according to Theorem 2, (λ∗, ψ∗)forms a CIB equilibrium for the dynamic oligopoly gamedescribed in this example.

We note that at time t = 2, the CIB strategy λS∗2 givenby (49) and (50) depends on the private information of theseller at t = 2. Consequently, the CIB belief at t = 2+ givenby (55) and (56) becomes policy-dependent. The policy-dependent model and solution presented in this examplehighlight the differences between our problem and the onesanalyzed in [6], [7].

The expected profit of a seller with state H and that

0

0.5

1

0

0.5

160

70

80

90

100

110

120

πn1

V n1 (π1,H)

π−n1

0

0.5

1

0

0.5

160

70

80

90

100

110

120

πn1

V n1 (π1, L)

π−n1

0

0.5

1

0

0.5

11

1.5

2

2.5

3

3.5

4

πn1

V n1 (π1,H)− V n

1 (π1, L)

π−n1

Fig. 4. The expected profit of a seller with state H and that of a seller with state L and their difference.

of a seller with state L along with their difference at thisequilibrium are shown in Fig. 4.

V. CONCLUSION

We analyzed a dynamic oligopoly model with privateMarkovian dynamics. This model gives rise to a stochasticdynamic game with asymmetric information. We used ideasfrom the common information approach (developed in [1]for decentralized decision-making) to prove the existence ofCIB equilibria, and to obtain a sequential decomposition ofthe game that leads to a backward induction algorithm forthe computation of such equilibria. We illustrated our resultswith an example. A key feature of our problem is that theCIB beliefs are policy-dependent. As a result, signaling ispresent in the stochastic dynamic game resulting from ourmodel. This is in contrast to the models of [6], [7] wherethere is no signaling.

REFERENCES

[1] A. Nayyar, A. Mahajan, and D. Teneketzis, “Decentralized stochasticcontrol with partial history sharing: A common information approach,”IEEE Transactions on Automatic Control, vol. 58, no. 7, pp. 1644–1658, 2013.

[2] D. Fudenberg and J. Tirole, “Game theory. 1991,” Cambridge, Mas-sachusetts, vol. 393, 1991.

[3] T. Basar and G. J. Olsder, Dynamic noncooperative game theory,vol. 200. SIAM, 1995.

[4] R. B. Myerson, Game theory. Harvard university press, 2013.[5] J. Filar and K. Vrieze, Competitive Markov decision processes.

Springer-Verlag New York, Inc., 1996.[6] A. Nayyar, A. Gupta, C. Langbort, and T. Basar, “Common informa-

tion based markov perfect equilibria for stochastic games with asym-metric information: Finite games,” IEEE Transactions on AutomaticControl, vol. 59, pp. 555–570, March 2014.

[7] A. Gupta, A. Nayyar, C. Langbort, and T. Basar, “Common informa-tion based markov perfect equilibria for linear-gaussian games withasymmetric information,” SIAM Journal on Control and Optimization,vol. 52, no. 5, pp. 3228–3260, 2014.

[8] L. Li and J. Shamma, “Lp formulation of asymmetric zero-sumstochastic games,” in 2014 IEEE 53rd Annual Conference on Decisionand Control (CDC), pp. 7752–7757, 2014.

[9] J. Renault, “The value of markov chain games with lack of informationon one side,” Mathematics of Operations Research, vol. 31, no. 3,pp. 490–512, 2006.

[10] F. Gensbittel and J. Renault, “The value of markov chaingames with incomplete information on both sides,” arXiv preprintarXiv:1210.7221, 2014.

[11] J. Renault, “The value of repeated games with an informed controller,”Mathematics of operations Research, vol. 37, no. 1, pp. 154–179,2012.

[12] G. J. Mailath and L. Samuelson, Repeated games and reputations,vol. 2. Oxford university press Oxford, 2006.

[13] E. Maskin and J. Tirole, “Markov perfect equilibrium: I. observableactions,” Journal of Economic Theory, vol. 100, no. 2, pp. 191–219,2001.

[14] D. Bergemann and J. Valimaki, “Dynamic price competition,” Journalof Economic Theory, vol. 127, no. 1, pp. 232–263, 2006.

[15] S. Athey, K. Bagwell, and C. Sanchirico, “Collusion and price rigid-ity,” The Review of Economic Studies, vol. 71, no. 2, pp. 317–349,2004.

[16] D. A. Miller, “Robust collusion with private information,” The Reviewof Economic Studies, vol. 79, no. 2, pp. 778–811, 2012.

[17] Y. Chen and J. Xie, “Online consumer review: Word-of-mouth as anew element of marketing communication mix,” Management Science,vol. 54, no. 3, pp. 477–491, 2008.

[18] B. Bickart and R. M. Schindler, “Internet forums as influential sourcesof consumer information,” Journal of interactive marketing, vol. 15,no. 3, pp. 31–40, 2001.

[19] J. A. Chevalier and D. Mayzlin, “The effect of word of mouth onsales: Online book reviews,” Journal of marketing research, vol. 43,no. 3, pp. 345–354, 2006.

[20] Q. Zhao and B. M. Sadler, “A survey of dynamic spectrum access,”IEEE Signal Processing Magazine, vol. 24, no. 3, pp. 79–89, 2007.

[21] R. Berry, M. L. Honig, and R. Vohra, “Spectrum markets: motiva-tion, challenges, and implications,” IEEE Communications Magazine,vol. 48, no. 11, pp. 146–155, 2010.

[22] M. Matinmikko, M. Mustonen, D. Roberson, J. Paavola, M. Hoy-htya, S. Yrjola, and J. Roning, “Overview and comparison of re-cent spectrum sharing approaches in regulation and research: Fromopportunistic unlicensed access towards licensed shared access,” in2014 IEEE International Symposium on Dynamic Spectrum AccessNetworks (DYSPAN), pp. 92–102, 2014.

[23] O. Ileri, D. Samardzija, and N. B. Mandayam, “Demand responsivepricing and competitive spectrum allocation via a spectrum server,”in 2005 First IEEE International Symposium on New Frontiers inDynamic Spectrum Access Networks, 2005. DySPAN 2005., pp. 194–202, 2005.

[24] C. Liu and R. A. Berry, “Competition with shared spectrum,” in2014 IEEE International Symposium on Dynamic Spectrum AccessNetworks (DYSPAN), pp. 498–509, 2014.

[25] J. Acharya and R. D. Yates, “Service provider competition and pricingfor dynamic spectrum allocation,” in International Conference onGame Theory for Networks, 2009. GameNets’ 09., pp. 190–198, 2009.

[26] K. R. Liu and B. Wang, Cognitive radio networking and security: Agame-theoretic view. Cambridge University Press, 2010.

[27] P. Battigalli, “Strategic independence and perfect bayesian equilibria,”Journal of Economic Theory, vol. 70, no. 1, pp. 201–234, 1996.

[28] J. C. Harsanyi and R. Selten, “A general theory of equilibriumselection in games,” MIT Press Books, vol. 1, 1988.

[29] P. Kumar and P. Varaiya, Stochastic Systems: Estimation Identificationand Adaptive Control. Prentice-Hall, Inc., 1986.

APPENDIX I

For notational simplicity, we use hnt to denote hn,St andhct+ to denote hm,Bt in the proofs.

Proof of Lemma 1. If λ is a CIB strategy profile and ψ is aCIB update rule consistent with λ, we define g ∈ G to be astrategy profile where

gn,St (hnt ) := λn,St (xnt , γψ(hct)) = λn,St (xnt , πγψt )

for all n ∈ N , (63)

gm,Bt (hct+) := λm,Bt (pt, γψ(hct+)) = λm,Bt (pt, πγψt+ )

for all m ∈M, (64)

We proceed to recursively define a belief system µ thatsatisfies (28) and (29), and is consistent with g. For thatmatter, for any time t ∈ T , in addition to hnt , n ∈ N , hct+we consider the histories hct and hnt+ := (hnt , pt).

We define µ(hc0+)(·) := 0 and µ(hn0+)(·) := 0 for n ∈ Nat time 0+. At time t = 1 we define, for all x1 ∈ XN ,

µ(hc1)(x1) :=P(x1) (65)

µ(hn1 )(x1) :=1{xn1 |hn1 }P(x−n1 ) (66)

for any histories hn1 , n ∈ N , and hc1. Then, (29) is satisfiedat time 0, (28) is satisfied at time 1, and g is consistent withµ before time 1. (basis of induction)

Suppose µ(hct)(·), µ(hnt )(·), µ(hc(t−1)+)(·) andµ(hn(t−1)+)(·) are defined, (29) is satisfied at time (t− 1)+,(28) is satisfied at time t, and g is consistent with µ beforetime t (induction hypothesis).

We proceed to define µ(hnt+1)(·), µ(hct+1)(·), µ(hct+)(·)and µ(hnt+)(·) and prove that (29) is satisfied at time t+ and(28) is satisfied at time t + 1, and g is consistent with µbefore time t+ 1.For any histories hct+ and hnt+ , n ∈ N , define the beliefs

µ(hct+)(x1:t) :=∏k∈N

µ(hct+)(xk1:t) (67)

µ(hnt+)(x1:t) := 1{xn1:t|hnt+}∏k 6=n

µ(hct+)(xk1:t) (68)

where for any k ∈ N

µ(hct+)(xk1:t) :=

µ(hct)(x

k1:t)

γψ(hct+

)(xkt )

γψ(hct)(xkt ),

when γψ(hct)(xkt ) 6= 0

0, when γψ(hct)(xkt ) = 0.

(69)

Then, at time t+, both sides of (29) are zero whenγψ(hct)(xt) =

∏k∈N γψ(hct)(x

kt ) = 0. When γψ(hct)(xt) 6=

0 we get

µ(hct+)(xt)

=∑

x1:(t−1)∈XN(t−1)

µ(hct+)(x1:t)

=∑

x1:(t−1)∈XN(t−1)

∏k∈N

µ(hct)(xk1:t)

γψ(hct+)(xkt )

γψ(hct)(xkt )

=∏k∈N

µ(hct)(xkt )γψ(hct+)(xkt )

γψ(hct)(xkt )

=∏k∈N

γψ(hct+)(xkt ) (70)

where the first and second equalities in (70) follow from (67)and (69), respectively, and the last equality in (70) followsfrom the induction hypothesis for (29) at time t. Therefore,(29) is true at time t+.

To show consistency at time t+, we need to show thatBayes rule, given by

µ(hct+)(x1:t) =Pgµ(hct)(x1:t, pt|h

ct)∑

x′1:tPgµ(hct)(x

′1:t, pt|hct)

(71)

µ(hnt+)(x1:t) =Pgµ(hnt )(x1:t, pt|h

nt )∑

x′1:tPgµ(hnt )(x

′1:t, pt|hnt )

(72)

is satisfied for any histories hct+ and hnt+ , n ∈ N when theabove denominators are non-zero. Because of (67) and (69)the left hand side of (71) is equal to

µ(hct+)(x1:t)

=∏k∈N

µ(hct+)(xk1:t)

=∏k∈N

µ(hct)(xk1:t)

γψ(hct+)(xkt )

γψ(hct)(xkt )

=µ(hct)(x1:t)∏k∈N

γψ(hct+)(xkt )

γψ(hct)(xkt )


λk,St (xkt , γψ(hct))(pkt )∑

x′kt ∈X λk,St (x′kt , γψ(hct))(p

kt )γψ(hct)(x

′kt )

(73)

when the denominators are non-zero. The last inequality in(73) is true because of (24) (ψ is consistent with λ).On the other hand, from the specification of g, the numeratorin the right hand side of (71) is equal to

Pgµ(hct)(x1:t, pt|hct)


λk,St (xkt , γψ(hct))(pkt ). (74)

Substituting (74) back into the right hand side of (71), wefind that the right hand side of (71) equals to the right handside of (73). Therefore, (71) is true. Using similar argumentsas in (71), we can show that (72) is satisfied for all n ∈ N .Therefore, we obtain consistency of µ at time t+.

At time t + 1, for any histories hct+1, n ∈ N , and hnt+1,we define the beliefs

µ(hct+1)(x1:t+1) :=∏k∈N

µ(hct+1)(xk1:t+1) (75)

µ(hnt+1)(x1:t+1) := 1{xn1:t+1}(hnt+1)

∏k 6=n

µ(hnt+1)(xk1:t+1)

(76)

where for any k ∈ Nµ(hct+1)(xk1:t+1)

:=

µ(hct+)(xk1:t)

Qk(xkt ,xkt+1)Q

y(xkt ,ykt )∑

x′kt ∈XQy(x′kt ,y

kt )γψ(h

ct+

)(x′kt ),

when the above denominator is non-zero,1|X |t γψ(hct+1)(xkt+1)

when the above denominator is zero.

(77)

µ(hnt+1)(xk1:t+1) := µ(hct+1)(xk1:t+1) for k 6= n. (78)

For any k ∈ N , when∑x′kt ∈X Q

y(x′kt , ykt )γψ(hct+)(x′kt ) =

0, we obtain, because of (77),

µ(hct+1)(xkt+1)

=∑

xk1:t∈X tµ(hct+1)(xk1:t+1)

=∑

xk1:t∈X t

1

|X |t γψ(hct+1)(xkt+1)

=γψ(hct+1)(xkt+1); (79)

when∑x′kt ∈X Q

y(x′kt , ykt )γψ(hct+)(x′kt ) 6= 0, we get

µ(hct+1)(xkt+1)

=∑

xk1:t∈X tµ(hct+1)(xk1:t+1)

=∑

xk1:t∈X tµ(hct+)(xk1:t)

Qk(xkt , xkt+1)Qy(xkt , y

kt )∑

x′kt ∈X Qy(x′kt , y

kt )γψ(hct+)(x′kt )

=∑xkt∈X

µ(hct+)(xkt )Qk(xkt , x

kt+1)Qy(xkt , y

kt )∑



=∑xkt∈X

γψ(hct+)(xkt )Qk(xkt , x

kt+1)Qy(xkt , y

kt )∑



=γψ(hct+1)(xkt+1), (80)

where the forth equality in (80) follows from the inductionhypothesis for (28) at time t+, and the last equality in (80)is true because of (25) (ψ is consistent with λ). Then, forany n ∈ N , from (76), (79) and (80) we obtain

µ(hnt+1)(xt+1)

=∑x1:t

1{xn1:t+1}(hnt+1)

∏k 6=n

µ(hct+1)(xk1:t+1)

=1{xnt+1}(hnt+1)

∏k 6=n

µ(hct+1)(xkt+1)

=1{xnt+1}(hnt+1)

∏k 6=n

γψ(hct+1)(xkt+1). (81)

Therefore, (28) is true at time t+ 1.To show consistency at time t+ 1, we need to show that

Bayes rule, given by

µ(hct+1)(x1:t+1)

=Pgµ(hct+ )(x1:t+1, dt, yt|hct+)∑

x′1:t+1Pgµ(hct+ )(x

′1:t+1, dt, yt|hct+)

(82)

µ(hnt+1)(x1:t+1)

=1{xnt+1}(h

nt+1)Pgµ(hnt+ )(x1:t+1, dt, yt|hnt+)∑

x′1:t+11{xnt+1}(x

′1:t+1)Pgµ(hnt+ )(x

′1:t+1, dt, yt|hnt+)

(83)

is satisfied for any histories hct+1 and hnt+1, n ∈ N when theabove denominators are non-zero.Because of (75) and (77) the left hand side of (82) is equalto

µ(hct+1)(x1:t+1)

=∏k∈N

µ(hct+)(xk1:t)Qk(xkt , x

kt+1)Qy(xkt , y

kt )∑



=µ(hct+)(x1:t)∏k∈N


kt )∑



=µ(hct+)(x1:t)∏k∈N


kt )∑


kt )µ(hct+)(x′kt )

(84)

when the above denominator is non-zero; the last equality in(84) follows from the induction hypothesis for (29) at timet+.On the other hand, because of the dynamics of the Markovchains and the specification of g, the numerator in the righthand side of (82) is equal to

Pgµ(hct+ )(x1:t+1, dt, yt|hct+)

=µ(hct+)(x1:t)∏m∈M

gm,B(hct+)∏k∈N


kt ). (85)

Substituting (85) back into the right hand side of (82), wefind that the right hand side of (82) is equal to the righthand side of (84). Therefore, (82) is satisfied for any historyhct+1. Using similar arguments as in (82), we can show that(83) is satisfied for all hnt+1, n ∈ N . Therefore, we obtainconsistency of µ at time t+ 1.

Proof of Lemma 2. We only prove the lemma for sellern, n ∈ N ; the proof for buyer m,m ∈ M is the same asthat for seller n. To simply the notation, we use γ to denotethe CIB belief γψ constructed from ψ.

Let (g, µ) = f(λ, ψ) as in Lemma 1. Suppose every buyerm uses the strategy gm,B and every seller k 6= n uses thestrategy gk,S . Since the strategy of every buyer and every

seller k 6= n is fixed, the best response of seller n is thesolution to the following stochastic control problem.

maxgn,S∈Gn,S

Egn,S

µ

[T∑t=1

φn,St (Pt, Dt)

]. (86)

Below, we show that the stochastic control problem (86) isequivalent to a Markov Decision Process (MDP) with stateprocess {(Xn

t ,Πγt ), t ∈ T } and action process {Pnt , t ∈ T }

of seller n.Since P kt , k 6= n satisfies (26), the distribution of P kt

only depends on {Xkt ,Π

γt }. Each Dm

t satisfies (27), thus,the distribution of Dm

t depends on {Pt,Πγt+}. Note that

Πγt+ = ψt(Π

γt , Pt) from (18). Consequently, the distribution

of Dmt only depends on {Pnt , X−nt ,Πγ

t }.Therefore, the distribution of (Pt, Dt) only depends on{Pnt , X−nt ,Πγ

t }. Then,

Egn,S

µ

[φn,St (Pt, Dt)|Pnt , X−nt ,Πγ

t , Hnt

]=E

[φn,St (Pt, Dt)|Pnt , X−nt ,Πγ

t

]=:φn,St (Pnt , X

−nt ,Πγ

t ); (87)

the first equality in (87) is true because the conditionalexpectation is (gn,S , µ)-independent.For any realization hnt = (hct , x

n1:t) ∈ Hn,St and pnt ∈ P , let

πγt := γ(hct). From (87) we obtain

Egn,S

µ

[φn,St (Pt, Dt)|hnt , pnt

]=Eg

n,S

µ

[E[φn,St (Pt, Dt)|Pnt , X−nt ,Πγ

t , Hnt

]|hnt , pnt

]=Eµ

[φn,St (pnt , X

−nt , πγt )|hct , xn1:t, pnt

](a)=Eµ

[φn,St (pnt , X

−nt , πγt )|hct

]=:φn,St (πγt , p

nt ). (88)

Equation (a) in (88) is true because from (28), X−nt and Xnt

are independent under µ. Because of (88), seller n’s objectivefunction (86) can be written as

maxgn,S∈Gn,S

Egn,S

µ

[T∑t=1

φn,St (Πγt , P

nt )

]. (89)

From the assumption on the evolution of {Xnt , t ∈ T }, the

distribution of Xnt+1 depends only on Xn

t . From (16) and(18) we get

Πγt+1 =ψt+(Πγ

t+, Yt)

=ψt+(ψt(Πγt , Pt), Yt). (90)

Furthermore, the distribution of P kt only depends on{Xk

t ,Πγt }; the distribution of Yt depends only on

(Xnt , X

−nt ). Therefore, the distribution of (Xn

t+1,Πγt+1)

depends only on {Xnt , X

−nt , Pnt ,Π

γt }. Then, for any real-

izations xnt+1 ∈ X , πγt+1 ∈ ∆(X ), hnt = (hct , xn1:t) ∈ Hn,St

and pnt ∈ P , we obtain

Pgn,S

µ (xnt+1, πγt+1|hnt , pnt )

=∑x−nt

Pgn,S

µ (xnt+1, πγt+1|x−nt , hct , x

nt , p

nt )Pg

n,S

µ (x−nt |hnt , pnt )

=∑x−nt

P(xnt+1, πγt+1|πγt , xt, pnt )µ(hnt )(x−nt )

=∑x−nt

P(xnt+1, πγt+1|πγt , xt, pnt )

∏k 6=n

πk,γt (xkt )

=Pgn,S

µ (xnt+1, πγt+1|xnt , πγt , pnt ). (91)

The third equality in (91) follows from Lemma 1, and thelast equality follows from the same arguments as the firstthrough third equalities. Equation (91) shows that the process{(Xn

t ,Πγt ), t ∈ T } is a controlled Markov Chain with

respect to the action process {Pnt , t ∈ T } for seller n. Thisprocess along with (89) define a MDP. From the theory ofMDP (see [29, Chap. 6]), there is a best response gn,S , underthe belief system µ, of seller n such that

gn,S(Hnt ) = λn,S(Xn

t ,Πγt ) = λn,S(Xn

t , γ(Hct )) (92)

for some function λn,S(·) at each time t. This completes theproof of Lemma 2 for the sellers.

Proof of Theorem 2.(⇒)If a pair (λ∗, ψ∗) is a CIB equilibrium, let (g∗, µ∗) =f(λ∗, ψ∗) be the PBE constructed from (λ∗, ψ∗) accordingto Lemma 1. From the definition of the functions V ∗ :=V (λ∗, ψ∗), V ∗n,St (γ(hct), x

nt ) is the expected continuation

revenue from time t on, under µ∗, for seller n at hnt underthe strategy profile g∗. If (35) is not true, then there existsanother strategy of seller n that achieves reward higher thanV ∗n,St (γ(hct), x

nt ). This contradicts the fact that (g∗, µ∗) is

a PBE. Similar arguments show that (36) should also hold.(⇐)Suppose λ∗ solves the dynamic program (35)-(36) for V ∗ :=V (λ∗, ψ∗). Let (g∗, µ∗) = f(λ∗, ψ∗). Then µ∗ is consistentwith g∗ because of Lemma 1.

We only prove sequential rationality for seller n. Sequen-tial rationality for the buyers can be obtained using similararguments.

If every buyer m uses the strategy g∗m,B and every sellerk 6= n uses the strategy g∗k,S , from Lemma 2 we know thatthere is a best response gn,S , under the belief system µ∗, ofseller n such that

gn,St (hnt ) = λn,St (xnt , πγ∗

t ) (93)

for some CIB strategy λn,St (·) at each time t. Define aCIB strategy profile λ := (λn,S , λ∗−n,S , λ∗B). From thedefinition of the functions V := V (λ, ψ∗), V n,St (γ(hct), x

nt )

is the expected continuation revenue from time t on, underµ∗, for seller n at hnt under the strategy profile g. Sincegn,S is a best response, V n,St (γ(hct), x

nt ) gives seller n

the maximum expected continuation revenue from time t

on under µ∗. However, since λ∗n,St (xnt , πγ∗

t ) is one of theoptimal solutions of (35) at each time t, we can show byinduction that

V ∗n,St (γ(hct), xnt ) ≥ V n,St (γ(hct), x

nt ). (94)

Therefore, (94) implies that, at any time t, g∗n,St:T gives sellern the maximum expected continuation revenue from time ton under µ∗. This complete the proof that (g∗, µ∗) is a PBE.As a result, the pair (λ∗, ψ∗) forms a CIB equilibrium of thedynamic oligopoly game described in Section II.

Proof of Theorem 1. Consider a CIB update rule ψ∗ givenby

ψ∗nt (πt, pnt ) = πnt (95)

ψ∗nt+ (πt+ , ynt )(xnt+1)

=

∑xnt

Qn(xnt ,xnt+1)Q

y(xnt ,ynt )π

nt+

(xnt )∑x′nt ∈X

Qy(x′nt ,ynt )π

nt+

(x′nt ) ,

when the above denominator is non-zero.1{xnt+1=1}, otherwise

(96)

Based on ψ∗, we solve the dynamic program (35)-(36) toget a CIB strategy profile λ∗ and show that (λ∗, ψ∗) formsa CIB equilibrium.

To solve the dynamic program, first consider buyers’strategies. Suppose the sellers and buyers use a CIB strategyprofile λ. Since Yt is independent of Dt, the optimizationproblem (36) defined by the functions V (λ, ψ∗) becomes

arg maxdmt ∈D

{Eλ−m,Btπt+

[φm,Bt (Yt, pt, (d

mt , D

−mt ))

+V m,Bt+1 (ψ∗t+(πt+ , Yt))|pt, πt+]}

= arg maxdmt ∈D

{Eλ−m,Btπt+

[φm,Bt (Yt, pt, (d

mt , D

−mt ))|pt, πt+

]}+ Eπt+

[V m,Bt+1 (ψ∗t+(πt+ , Yt))|pt, πt+

]. (97)

Consequently, λBt solves (97) if for any pt ∈ P andπt+ ∈ ∆(XN ), {λm,Bt (pt, πt+),m ∈ M} is a BayesianNash equilibrium of the game Gt+(pt, πt+) where each agentm has the payoff

Eπt+[φm,Bt (Yt, pt, (d

mt , d

−mt ))|pt, πt+

]. (98)

Since D is a finite set, Gt+(pt, πt+) has at least one equilib-rium. Let {λ∗m,Bt (pt, πt+),m ∈ M} denote an equilibriumof Gt+(pt, πt+) for t ∈ T . Then λ∗Bt solves (36) for anytime t ∈ T .

Now suppose the buyers use the strategy λ∗B and thesellers use some strategy λS . Taking (95) into account,the optimization problem (35) defined by the functionsV ((λS , λ∗B), ψ∗) becomes

arg maxpnt ∈D

{Eλ∗−n,Stπt

[V ∗n,St+ (

ψ∗t (πt, (pnt , P

−nt )), (pnt , P


]}= arg max

pnt ∈D

{Eλ−n,Stπt

[V n,St+ (πt, zt, (p

nt , P


]}.

(99)

The function V n,St+ (·) can be computed by

V n,St+ (πt, pt, xnt )

=Eλ∗Btπt

[φn,St (pt, Dt)

+V n,St+1 (ψt+(πt, Yt), Xnt+1)|πt, pt, xnt

]=∑

dt∈DMφn,St (pt, dt)

∏m∈M

λ∗m,Bt (pt, πt)(dmt )

+ Eπt[V n,St+1 (ψt+(πt, Yt), X

nt+1)|πt, xnt

]=φn,St (pt, πt) + V n,St+1 (πt, x

nt ) (100)

where

φn,St (pt, πt)

:=∑

dt∈DMφn,St (pt, dt)

∏m∈M

λ∗m,Bt (pt, πt)(dmt ), (101)

and

V n,St+1 (πt, xnt )

:=Eπt[V n,St+1 (ψt+(πt, Yt), X

nt+1)|πt, xnt

]. (102)

Because of (100)-(102), the optimization problem (99) be-comes

arg maxpnt ∈D

{Eλ−n,Stπt

[φn,St ((pnt , P

−nt ), πt)

+V n,St+1 (πt, xnt )|xnt , πt

]}= arg max

pnt ∈D

{Eλ−n,Stπt

[φn,St ((pnt , P

−nt ), πt)|πt

]}+ V n,St+1 (πt, x

nt ). (103)

Consequently, λSt solves (99) if λn,St (xnt , πt) = λn,St (πt)and for any πt ∈ ∆(XN ); {λn,St (πt), n ∈ N} is a Nashequilibrium of the game Gt(πt) where each seller n has thepayoff

φn,St ((pnt , p−nt ), πt). (104)

Since P is a finite set, Gt(πt) has at least one equilibrium.Let {λ∗n,St (πt), n ∈ N} denote an equilibrium of Gt(πt) fort ∈ T , and set {λ∗n,St (xnt , πt) := λ∗n,St (πt), n ∈ N}. Thenλ∗St solves (35) for any time t ∈ T .

Thus, we obtain a CIB strategy profile λ∗ = {λ∗B , λ∗S}and a CIB update rule ψ∗ such that λ∗ solves the dynamicprogram defined by the functions V (λ∗, ψ∗). By its definition(96), ψ∗ satisfies (25). Furthermore, since λ∗n,St (xnt , πt) =λ∗n,St (πt),

λ∗n,St (xnt , πt)(pnt )πnt (xnt )∑

x′nt ∈X λ∗n,St (x′nt , πt)(p

nt )πnt (x′nt )

=λ∗n,St (πt)(p

nt )πnt (xnt )∑

x′nt ∈X λ∗n,St (πt)(pnt )πnt (x′nt )

=πnt (xnt )

=ψ∗nt (πt, pnt )(xnt ). (105)

Thus, ψ∗ also satisfies (24). As a result, ψ∗ is consistentwith λ∗. According to Theorem 2, (λ∗, ψ∗) forms a CIBequilibrium.

APPENDIX II

Details for Section IV. We provide detailed derivations forequations in Section IV.Derivation of (45)-(46):the objective function is equal to

Eπt+[φm,Bt (Yt, pt, (d

mt , d

kt ))|pt, πt+

]=∑n=1,2

dmt (n)

Eπt+ [Y nt ]− pnt −∑k 6=m

dkt (n)

=∑n=1,2

dmt (n)

πnt+yD + qyLy − pnt −∑k 6=m

dkt (n)

(106)

where yD := qyHy − qyLy. Suppose every buyer k ∈M usesthe same mixed strategy such that P(Dk

t (1) = 1) = α(πt+).Then this mixed strategy forms a BNE if each buyer m ∈Mis indifferent between buying from seller 1 and seller 2. Thatis,

π1t+yD + qyLy − p1t − E

∑k 6=m

Dkt (1)

=π2

t+yD + qyLy − p2t − E

∑k 6=m

Dkt (2)

. (107)

The above equation is equivalent to

π1t+yD − p1t − (M − 1)α(πt+)

=π2t+yD − p2t − (M − 1)(1− α(πt+)). (108)

Solving (108) we obtain a symmetric equilibrium given by

P(Dmt (1) = 1) = α(πt+)

=1

2(M − 1)

(yD(π1

t+ − π2t+)− p1t + p2t

)+

1

2(109)

for t = 1, 2.Derivation of (57)-(59) and (60)-(62):

Under the CIB strategy profile (λS , λ∗B) and the CIB updaterule ψβ2 , the expected instantaneous revenue of seller n, n =1, 2, at time 2 becomes


[ ∑m∈M

Dm2 (n)

]=pn2ME(λS ,λ∗B)

π2[1{Dm2 (n)=1}]

=pn2M

2(M − 1)Eλ

S2π2

[yD(Πn

2+ −Π−n2+)

−Pn2 + P−n2 +M − 1]

=pn2M

2(M − 1)

(yD(Eλ

S2π2 [Πn

2+ ]− EλS2π2 [Π−n2+

])

−pn2 + EλS2π2 [P−n2 ] +M − 1

), (110)

where the second equality in (110) follows from (109). Wecompute each expectation term in (110). The first expectationin (110) is equal to

EλS2π2 [Πn

2+ ] = EλS2π2 [ψn,β2

2 (π2, pn2 )] = ψn,β2

2 (π2, pn2 ). (111)

The second expectation in (110) is equal to

EλS2π2 [Π−n2+

]

=EλS2π2 [ψ−n,β2

2 (π2, P−n2 )]

=(π−n2 β−n2 (H,π2) + (1− π−n2 )β−n2 (H,π2))ψ−n,β2

2 (π2, h)

+ (π−n2 (1− β−n2 (H,π2))

+ (1− π−n2 )(1− β−n2 (H,π2)))ψ−n,β2

2 (π2, l)

=π−n2 β−n2 (H,π2) + π−n2 (1− β−n2 (H,π2))

=π−n2 (112)

where the third equation in (112) follows from the definition,(55)-(56), of ψ−n,β2

2 (·). The third expectation in (110) isequal to

EλS2π2 [P−n2 ]

=Eπ2 [β−n,β2

2 (X−nt , π2)h+ (1− β−n,β2

2 (X−nt , π2))l]

=π−n2 [β−n,β2

2 (H,π2)h+ (1− β−n,β2

2 (H,π2))l]

+ (1− π−n2 )[β−n,β2

2 (L, π2)h+ (1− β−n,β2

2 (L, π2))l]

=:p−n2 (π2, β2). (113)

Consequently, (57)-(59) follow from (110)-(113). Using sim-ilar arguments as in (110)-(113), we obtain (60)-(62).

Derivation of the equivalence between the analogue of(35) and (60):Because of (51), each seller’s expected continuation revenueat time 1, in the analogue of (35) for our example, is givenby

EλS1π1

[V n,S1+

(ψβ1 (π1, P1), P1, xn1 )|π1, xn1

]=Eλ

S1π1

[V n,S1+

(π1, P1, xn1 )|π1

](114)

The function V n,S1+(π1, p1, x

n1 ) can be computed by

V n,S1+(π1, p1, x

n1 )

=Eλ∗B1π1

[φn,S1 (p1, D1)

+V n,S2 (ψ1+(π1, Y1), Xn2 )|π1, p1, xn1

]=pnt E

λ∗B1π1

[ ∑m∈M

Dm1 (n)

]+ Eπ1

[V n,S2 (ψ1+(π1, Y1), Xn

2 )|π1, xn1]

(115)

Since the second term in (115) doe not depend on thesellers’ decisions pt, maximizing the analogue of (35) forour example is equivalent to maximizing the instantaneousrevenue (the first term in (115)). The instantaneous revenueof seller n is equal to (60). Therefore, the sellers’ problemat t = 1 is to determine a CIB equilibrium for the game withobjectives given by (60).

Documents

Dynamic Oligopoly Games with Private Markovian Dynamics