
[IEEE ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Florence, Italy (2014.5.4-2014.5.9)]



2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

SEQUENTIAL BAYESIAN LEARNING IN LINEAR NETWORKS WITH RANDOM DECISION MAKING

Yunlong Wang and Petar M. Djuric

Department of Electrical and Computer Engineering
Stony Brook University, Stony Brook, NY 11794
Email: {yunlong.wang,[email protected]}

ABSTRACT

In this paper, we consider the problem of social learning when decisions by agents in a network are made randomly. The agents receive private signals and use them for decision making on binary hypotheses under which the signals are generated. The agents make the decisions sequentially, one at a time, and all the agents know the decisions of the previous agents. We study a setting where, instead of making deterministic decisions by maximizing personal expected utility, the agents act randomly according to their private beliefs. We propose a method by which the agents learn from the previous agents' random decisions using Bayesian theory. We define the concept of social belief about the truthfulness of the two hypotheses and analyze its convergence. We provide performance and convergence analysis of the proposed method as well as simulation results that include comparisons with a deterministic decision making system.

Index Terms— social learning, Bayesian learning, information aggregation, multiagent system, decision

1. INTRODUCTION

It is an important issue in social learning how agents make decisions and learn from others' decisions. When individuals make decisions based on private and imperfect information, it is natural that they also learn from others' decisions made in similar situations. The agents can then make their own decisions with the purpose of maximizing some utility function. A large body of literature investigates this problem, including non-Bayesian social learning approaches [1, 2, 3, 4] and Bayesian social learning methods [5, 6, 7, 8, 9]. In this paper, we focus on the Bayesian methods. Some work on this issue can be found in [5, 10, 11], where the interest is in studying herding behavior and information cascades. In other works [6, 12], the conditions for asymptotic learning are studied. Recently, the effect of previous decisions on an agent's utility function was addressed in [13]. An overview of models and techniques for studying social learning can be found in [5, 8].

In this paper, our contribution lies in considering randomness in an agent's decision making. In most of the current literature, the agents are assumed to be "rational," which means that each agent makes its decision in order to maximize the expected value of its utility function. Once the agents obtain information and form their beliefs on the true state of nature, they behave deterministically. However, in some scenarios, the behavior of an agent may be random by design. In this work, we replace the "rational" assumption of agent behavior with a model of random decision making that establishes a random mapping from the agent's belief to the action space. We present a method that implements Bayesian learning by the agents in a sequential decision making scenario, where each agent has the ability to use Bayes' rule to form its belief on the true state of nature based on its private observation and the actions made by previous agents. We analyze the evolution of beliefs in this system theoretically and demonstrate its performance by simulations.

The paper is organized as follows. In the next section we state the problem. In Section 3, we present the proposed Bayesian learning method. We prove the convergence of the social belief and actions in Section 4. Simulation results are provided in Section 5, and concluding remarks in Section 6.

2. PROBLEM STATEMENT

We consider the sequential decision making problem in linear networks of Bayesian agents $A_n$, $n \in \mathbb{N}^+$, where each agent makes a decision after it gets its private observation and the decisions of all the previous agents. Mathematically, each agent $A_n$ receives an independent private observation $y_n$ that is generated according to one of the following two hypotheses:

$$\mathcal{H}_0:\ y_n = w_n, \qquad \mathcal{H}_1:\ y_n = \theta + w_n, \qquad (1)$$

where $\theta$ is known and $w_n$ is the observation noise modeled as a Gaussian random variable with zero mean and known variance $\sigma_w^2$. Without loss of generality, we assume that $\theta > 0$. For every agent, the prior probabilities of the hypotheses are noninformative, where we let $p(\mathcal{H}_0) = p(\mathcal{H}_1) = 1/2$.

Let $a_n \in \mathcal{A} = \{0, 1\}$ be the decision of agent $A_n$ and $a_{1:n}$ denote the decision sequence from agent $A_1$ to agent $A_n$. Then the agent $A_n$ can formulate its private belief in $\mathcal{H}_1$, $p(\mathcal{H}_1 \mid a_{1:n-1}, y_n)$, by using Bayes' rule, $\forall n \in \mathbb{N}^+$,

$$p(\mathcal{H}_1 \mid a_{1:n-1}, y_n) = \frac{\pi_{n-1}\, p(y_n \mid \mathcal{H}_1)}{\pi_{n-1}\, p(y_n \mid \mathcal{H}_1) + (1 - \pi_{n-1})\, p(y_n \mid \mathcal{H}_0)}, \qquad (2)$$

where we define $\pi_n$ to be the social belief, i.e., the posterior of $\mathcal{H}_1$ conditioned on the action sequence up to agent $A_n$,

$$\pi_n = p(\mathcal{H}_1 \mid a_{1:n}), \qquad (3)$$

with $\pi_0$ defined as $1/2$. Here we remark that the social belief $\pi_{n-1}$ serves as the prior knowledge of agent $A_n$ before it gets its private observation $y_n$.

For any agent $A_n$, after obtaining the private belief by the Bayes rule in (2), we assume that it makes its decision by drawing it from a probability mass function.

This work was supported by NSF under Award CCF-1320626.

978-1-4799-2893-4/14/$31.00 ©2014 IEEE 6404
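As a concrete illustration, the observation model (1) and the private-belief update (2) can be sketched in a few lines of Python. This is our own minimal sketch, not the authors' code; the function name `private_belief` and the parameter values are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

theta, sigma_w = 1.0, 1.0  # known signal level under H1 and known noise std

def private_belief(pi_prev, y, theta, sigma_w):
    """Private belief p(H1 | a_{1:n-1}, y_n) via Bayes' rule, eq. (2).

    The social belief pi_prev = pi_{n-1} plays the role of the prior.
    """
    lik1 = np.exp(-(y - theta) ** 2 / (2 * sigma_w ** 2))  # proportional to p(y | H1)
    lik0 = np.exp(-(y ** 2) / (2 * sigma_w ** 2))          # proportional to p(y | H0)
    return pi_prev * lik1 / (pi_prev * lik1 + (1 - pi_prev) * lik0)

# Agent A_1 has the noninformative prior pi_0 = 1/2 and observes y_1 under H1
y1 = theta + sigma_w * rng.standard_normal()
belief = private_belief(0.5, y1, theta, sigma_w)
# Random decision making: a_1 ~ Bernoulli(belief), rather than thresholding at 1/2
a1 = int(rng.random() < belief)
```

Note that at y = theta/2 the two likelihoods are equal, so the private belief stays at the prior 1/2, consistent with the threshold interpretation given later in (7) and (8).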


The decision is drawn from the following probability mass function:

$$p(a_n = 1 \mid a_{1:n-1}, y_n) = p(\mathcal{H}_1 \mid a_{1:n-1}, y_n). \qquad (4)$$

In other words, agent $A_n$ makes its decision by generating a Bernoulli random variable whose probability of being one is $p(\mathcal{H}_1 \mid a_{1:n-1}, y_n)$. By contrast, in most of the existing literature, "rational" agents make decision 1 whenever $p(\mathcal{H}_1 \mid a_{1:n-1}, y_n) > 0.5$ and decision 0 otherwise. When the utility is 1 if an agent makes the right decision and 0 otherwise, the expected utility of agent $A_n$ is maximized by this deterministic policy. After making the decision $a_n$, agent $A_n$ broadcasts its decision to all the subsequent agents, and they all update their social beliefs from $\pi_{n-1}$ to $\pi_n$. For clarity, we present a diagram that depicts the random system in Fig. 1.

Fig. 1. Sequential decision making based on private signals and all the decisions made by previous agents.

In the following sections, we propose a Bayesian learning method for updating the beliefs of the agents. We show that, according to the method, the expected value of the social belief asymptotically converges to the true state of nature, or mathematically, we claim that

$$\lim_{n \to \infty} \mathbb{E}\,\pi_n = k, \quad \forall k \in \{0, 1\}, \qquad (5)$$

where $k$ is the index of the true hypothesis.

3. THE PROPOSED BAYESIAN LEARNING METHOD

We propose a method where the agents update their social beliefs by Bayes' rule from $\pi_{n-1}$ to $\pi_n$. For all the agents $A_n$, $n \in \mathbb{N}^+$, the social belief is updated according to

$$\pi_n = \frac{\pi_{n-1}\,\big(l_n^{(1)}\big)^{a_n}\big(1 - l_n^{(1)}\big)^{1-a_n}}{\pi_{n-1}\,\big(l_n^{(1)}\big)^{a_n}\big(1 - l_n^{(1)}\big)^{1-a_n} + (1 - \pi_{n-1})\,\big(l_n^{(0)}\big)^{a_n}\big(1 - l_n^{(0)}\big)^{1-a_n}}, \qquad (6)$$

where $l_n^{(0)} = p(a_n = 1 \mid a_{1:n-1}, \mathcal{H}_0)$ and $l_n^{(1)} = p(a_n = 1 \mid a_{1:n-1}, \mathcal{H}_1)$ denote the probabilities of agent $A_n$ making decision $a_n = 1$ given the decision sequence up to $a_{n-1}$ and the true state of nature being $\mathcal{H}_0$ or $\mathcal{H}_1$, respectively. For the case where $n = 1$, we define $l_1^{(0)} = p(a_1 = 1 \mid \mathcal{H}_0)$ and $l_1^{(1)} = p(a_1 = 1 \mid \mathcal{H}_1)$.

In order to obtain $l_n^{(0)}$ and $l_n^{(1)}$, we first need the probability that $a_n = 1$ given a known observation $y_n$, and then we marginalize $y_n$. Given the observation $y_n$, the agent $A_n$ makes its decision according to its private belief, which by (4) and (2) can be written as

$$p(a_n = 1 \mid a_{1:n-1}, y_n) = p(\mathcal{H}_1 \mid a_{1:n-1}, y_n) = \frac{1}{1 + e^{-\beta(y_n - \gamma_n)}}, \qquad (7)$$

where $\beta = \theta/\sigma_w^2$,

$$\gamma_n = \frac{\theta}{2} - \frac{\sigma_w^2}{\theta}\,\log\frac{\pi_{n-1}}{1 - \pi_{n-1}}, \qquad (8)$$

and where the last equality in (7) is due to the data model in (1), which yields the log-likelihood ratio $\log\big(p(\mathcal{H}_1 \mid a_{1:n-1}, y_n)/p(\mathcal{H}_0 \mid a_{1:n-1}, y_n)\big) = \beta\,(y_n - \gamma_n)$.

With this we have shown that, given $\pi_{n-1}$, the behavior of agent $A_n$ can be modeled by a logistic function whose argument is the observation $y_n$ and whose value is the probability of making a decision of 1. Here we remark that the shaping parameter $\beta$ is defined by the signal-to-noise ratio of the data model, and the shift parameter $\gamma_n$ is just the threshold given by the Bayesian hypothesis testing method. When $y_n = \gamma_n$, we have $p(\mathcal{H}_1 \mid a_{1:n-1}, y_n) = p(\mathcal{H}_0 \mid a_{1:n-1}, y_n) = 1/2$.

The agent $A_n$ needs to implement the marginalization of $y_n$ to get $l_n^{(k)}$, $\forall k \in \{0, 1\}$, according to

$$l_n^{(k)} = p(a_n = 1 \mid a_{1:n-1}, \mathcal{H}_k) = \int_{-\infty}^{\infty} p(a_n = 1 \mid a_{1:n-1}, y_n)\, p(y_n \mid \mathcal{H}_k)\,\mathrm{d}y_n = \int_{-\infty}^{\infty} \frac{1}{1 + e^{-\beta(y_n - \gamma_n)}}\, \frac{1}{\sqrt{2\pi\sigma_w^2}}\, \exp\!\left(-\frac{(y_n - \theta_k)^2}{2\sigma_w^2}\right) \mathrm{d}y_n, \qquad (9)$$

where $\theta_0 = 0$ and $\theta_1 = \theta$.

We summarize the behavior of agent $A_n$ at time slot $t \le n$ by the following steps:

Step 1: If $t = n$, after observing $y_n$, agent $A_n$ calculates the private belief by (2) and makes a random decision by (4). Otherwise, agent $A_n$ implements Step 2.

Step 2: The agent $A_n$ calculates $\gamma_t$ from the current social belief $\pi_{t-1}$ by (8), and the two likelihoods $l_t^{(0)}$ and $l_t^{(1)}$ by the integration in (9).

Step 3: After observing the action of agent $A_t$, the agent $A_n$ updates its social belief from $\pi_{t-1}$ to $\pi_t$ by (6).

We also point out that the above sequential learning algorithm belongs to the class of "social learning filters" described in [5].

4. ANALYSIS

In this section, we analyze the asymptotic properties of the proposed sequential method. As pointed out in (3), the social belief is a deterministic function of the action sequence, whereas the action sequence $a_{1:n}$ is random due to the data model.
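The belief-update machinery of Section 3, i.e., the logistic decision probability of (7)-(8), the marginalization (9), and the social-belief update (6), can be sketched numerically as follows. This is our own illustrative sketch: the function names are ours, and a simple Riemann sum on a finite grid stands in for the integral in (9).

```python
import numpy as np

def decision_prob(y, gamma_n, beta):
    """p(a_n = 1 | a_{1:n-1}, y_n): the logistic function of eq. (7)."""
    return 1.0 / (1.0 + np.exp(-beta * (y - gamma_n)))

def action_likelihoods(pi_prev, theta, sigma_w, half_width=10.0, n_grid=4001):
    """l_n^(0) and l_n^(1) of eq. (9), marginalizing y_n over a finite grid.

    A Riemann sum stands in for the integral; any quadrature would do.
    """
    beta = theta / sigma_w ** 2
    gamma_n = theta / 2 - (sigma_w ** 2 / theta) * np.log(pi_prev / (1 - pi_prev))  # eq. (8)
    y = np.linspace(-half_width, half_width + theta, n_grid)  # grid symmetric about theta/2
    dy = y[1] - y[0]
    p_dec = decision_prob(y, gamma_n, beta)
    ls = []
    for theta_k in (0.0, theta):  # theta_0 = 0 under H0, theta_1 = theta under H1
        gauss = np.exp(-(y - theta_k) ** 2 / (2 * sigma_w ** 2)) / np.sqrt(2 * np.pi * sigma_w ** 2)
        ls.append(float(np.sum(p_dec * gauss) * dy))
    return ls  # [l0, l1]

def update_social_belief(pi_prev, a_n, l0, l1):
    """Bayesian social-belief update of eq. (6) after observing action a_n."""
    num = pi_prev * (l1 if a_n == 1 else 1.0 - l1)
    den = num + (1 - pi_prev) * (l0 if a_n == 1 else 1.0 - l0)
    return num / den
```

For pi_prev = 1/2 the threshold gamma_n sits at theta/2 and, by the symmetry of the model, l0 + l1 = 1; observing a_n = 1 then moves the social belief from 1/2 to l1 > 1/2, in agreement with (6).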



We first examine the expected value of the social belief, $\mathbb{E}\,\pi_n$, which is given by

$$\mathbb{E}\,\pi_n = \sum_{a_{1:n} \in \mathcal{A}^n} p(a_{1:n} \mid \mathcal{H}_1)\, p(\mathcal{H}_1 \mid a_{1:n}) = \sum_{a_{1:n-1} \in \mathcal{A}^{n-1}} p(a_{1:n-1} \mid \mathcal{H}_1) \sum_{a_n = 0}^{1} p(a_n \mid a_{1:n-1}, \mathcal{H}_1)\, p(\mathcal{H}_1 \mid a_{1:n}). \qquad (10)$$

Now we present our main analytical result by the following theorem.

Theorem 1. In the proposed random decision making system, if the agents update their social beliefs using Bayes' rule according to (6), the expected value of the social belief asymptotically converges to 1 when the true state of nature is $\mathcal{H}_1$, and to 0 otherwise, i.e.,

$$\lim_{n \to \infty} \mathbb{E}\,\pi_n = \begin{cases} 1, & \text{if the true state of nature is } \mathcal{H}_1, \\ 0, & \text{if the true state of nature is } \mathcal{H}_0. \end{cases} \qquad (11)$$

Proof: Considering the symmetric structure of the proposed system, we only prove the convergence of the social belief to 1 in the case where the true state of nature is $\mathcal{H}_1$; the proof of convergence to 0 can be repeated with just notational changes. Without loss of generality, we assume that the true state of nature is $\mathcal{H}_1$.

We first show that $\mathbb{E}\,\pi_n$ increases in terms of $n$. By (10), we can write

$$\mathbb{E}\,\pi_n - \mathbb{E}\,\pi_{n-1} = \sum_{a_{1:n-1} \in \mathcal{A}^{n-1}} p(a_{1:n-1} \mid \mathcal{H}_1)\, \Delta(a_{1:n-1}), \qquad (12)$$

where $\Delta(a_{1:n-1})$ is a function of the action sequence $a_{1:n-1}$ given by

$$\Delta(a_{1:n-1}) = \sum_{a_n = 0}^{1} p(a_n \mid a_{1:n-1}, \mathcal{H}_1)\, p(\mathcal{H}_1 \mid a_{1:n}) - p(\mathcal{H}_1 \mid a_{1:n-1}). \qquad (13)$$

Therefore, to prove that $\mathbb{E}\,\pi_n > \mathbb{E}\,\pi_{n-1}$, it is sufficient to show that

$$\Delta(a_{1:n-1}) > 0, \quad \forall a_{1:n-1} \in \mathcal{A}^{n-1}. \qquad (14)$$

By equation (6), we can write

$$\Delta(a_{1:n-1}) = \frac{\pi_{n-1}\,\big(l_n^{(1)}\big)^2}{\pi_{n-1}\, l_n^{(1)} + (1 - \pi_{n-1})\, l_n^{(0)}} + \frac{\pi_{n-1}\,\big(1 - l_n^{(1)}\big)^2}{\pi_{n-1}\,\big(1 - l_n^{(1)}\big) + (1 - \pi_{n-1})\,\big(1 - l_n^{(0)}\big)} - \pi_{n-1}. \qquad (15)$$

From the above equation, one can derive that

$$\Delta(a_{1:n-1}) = \frac{\pi_{n-1}\,(1 - \pi_{n-1})^2\,\big(l_n^{(1)} - l_n^{(0)}\big)^2}{Z}, \qquad (16)$$

where $Z$ is the product of the two denominators in (15), i.e., $Z = \big(\pi_{n-1}(1 - l_n^{(1)}) + (1 - \pi_{n-1})(1 - l_n^{(0)})\big)\big(\pi_{n-1} l_n^{(1)} + (1 - \pi_{n-1}) l_n^{(0)}\big)$.

We next need to show that $\forall n \in \mathbb{N}$, $0 < \pi_n < 1$. This claim can be proved by mathematical induction. Since we define $\pi_0 = 0.5$, we have $0 < \pi_0 < 1$. Assuming that $0 < \pi_{n-1} < 1$ is true, then by (6) it holds that $0 < \pi_n < 1$ if and only if $0 < l_n^{(0)} < 1$ and $0 < l_n^{(1)} < 1$. By (9), $\forall k \in \{0, 1\}$, $0 < l_n^{(k)} < 1$ when $\gamma_n = \frac{\theta}{2} - \frac{\sigma_w^2}{\theta}\log\frac{\pi_{n-1}}{1 - \pi_{n-1}}$ is finite, which is guaranteed by the assumption in the induction step that $0 < \pi_{n-1} < 1$. Thus, we have shown that

$$0 < \pi_n < 1, \quad \forall n \in \mathbb{N}. \qquad (17)$$

Following the result in (17) and (9), we have $\big(l_n^{(1)} - l_n^{(0)}\big)^2 > 0$. Then (16) shows that $\Delta(a_{1:n-1}) > 0$, which is a sufficient condition for $\mathbb{E}\,\pi_n$ to be increasing in terms of $n$. Considering the fact that $\mathbb{E}\,\pi_n \le 1$, $\forall n \in \mathbb{N}^+$, we have shown that, given that the true state of nature is $\mathcal{H}_1$, the limit of $\mathbb{E}\,\pi_n$ exists and it is 1. With this, the proof of (11) is completed. □

From Theorem 1, one can immediately state a corollary that both the social belief $\pi_n$ and the action of agent $A_n$, $a_n$, converge to $k \in \{0, 1\}$ in probability when $\mathcal{H}_k$ is the true state of nature. Then it holds that

$$\lim_{n \to \infty} p(\pi_n = k) = 1, \qquad (18)$$

$$\lim_{n \to \infty} p(a_n = k) = 1. \qquad (19)$$

The statement (18) can be shown by contradiction. Without loss of generality, we continue to assume that $k = 1$. If we assume that $p(\pi_n = 1) < 1$ when $n$ goes to infinity, then, noting that $\pi_n$ can be no larger than 1, we have $\mathbb{E}\,\pi_n < 1$, which is in contradiction with Theorem 1. Therefore (18) holds.

For the statement (19), we also assume that $k = 1$. From (2), it holds that when $\pi_n = 1$, the private belief $p(\mathcal{H}_1 \mid a_{1:n}, y_{n+1}) = 1$ regardless of the observation $y_{n+1}$. Hence, the event $\pi_n = 1$ is a subset of the event $a_{n+1} = 1$. Therefore we get that $p(a_{n+1} = 1) \ge p(\pi_n = 1)$. Noting that $p(a_n = 1) \le 1$, (19) can be proved by (18).

5. SIMULATION RESULTS

In this section, we present simulation results on the evolution of the social belief in our sequential system, along with numerical comparisons with a deterministic decision making method from [8]. In the deterministic method, given the same observation model in (1), after calculating the private belief $p(\mathcal{H}_1 \mid a_{1:n-1}, y_n)$, the agent $A_n$ makes its decision by the following rule:

$$a_n = \begin{cases} 1, & \text{if } p(\mathcal{H}_1 \mid a_{1:n-1}, y_n) > 1/2, \\ 0, & \text{otherwise.} \end{cases} \qquad (20)$$

If we set the reward to one when the agent makes a decision identical to the true hypothesis, and to zero otherwise, the expected utility of the agent $A_n$ is maximized by the above rule.

In the first experiment, we verified the analytical result on the expected social belief by Monte Carlo methods, conducted with 2,000 trials. In each trial, we set the number of agents to 1,000. The parameters were $\sigma_w = 1$, with different values $\theta = 1$, $\theta = 0.5$, and $\theta = 0.2$. The results are shown in Fig. 2. On the abscissa, we plotted the agent index and on the ordinate the estimate of the expected social belief, given by the average social belief over all 2,000 trials. The private signals of the agents were generated according to $\mathcal{H}_1$.

In Fig. 2, it is shown that the expected social belief was increasing with both methods, and we observed crossovers of the evolution of the expected social belief in the cases where $\theta = 1$ and $\theta = 0.5$. If we extended the abscissa to $n = 2{,}000$, we would also see the crossover



showing in the case where $\theta = 0.2$. It can be seen that a higher signal-to-noise ratio in the data model yielded faster convergence.

Fig. 2. The convergence of social beliefs with the two methods, where the red and blue lines correspond to the random and deterministic decision making methods, respectively.

In order to show the herding behavior of the agents, we plotted the histograms of the social belief in Fig. 3, where the parameters were $\sigma_w = 1$ and $\theta = 0.5$.

Fig. 3. The histograms of social beliefs with random and deterministic decision making.

In Fig. 3, the upper plot shows the histogram of the social belief in the random system for agents $A_{20}$, $A_{250}$, $A_{500}$, $A_{750}$, $A_{1000}$, and the lower plot displays the social belief in the deterministic system for the same agents. It can be seen that in the deterministic system the agents were more likely to show herding behavior, entailing that the social belief became almost unchanged. In [10] and [11], more analysis is provided about the herding behavior in deterministic systems. It was shown that in these systems, once several consecutive agents made a decision in favor of $\mathcal{H}_0$, it was very hard for the subsequent agents to make the opposite decision. In the random system, the agents' behaviors allow for choosing the correct hypothesis: even if several consecutive agents made identical decisions, the social belief kept evolving.

In Fig. 4, we plotted the proportion of agents that made the right decision, to verify (19). As shown in the figure, the agents with both methods had a trend to make the right decision asymptotically,


Fig. 4. The proportion of agents making the right decision.

whereas in the random system the agents' actions had some fluctuations.
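The first experiment can be reproduced in miniature. The sketch below is our own code (much smaller than the paper's 2,000-trial, 1,000-agent setup, so that it runs quickly); it simulates the random-decision system of Section 3 under H1 and averages the social-belief trajectories, which by Theorem 1 should rise toward 1.

```python
import numpy as np

def run_chain(n_agents, theta, sigma_w, rng):
    """One realization of the sequential random-decision system under H1.

    Returns the social-belief trajectory pi_1, ..., pi_n (Sec. 3, eqs. (4)-(9)).
    """
    beta = theta / sigma_w ** 2
    y_grid = np.linspace(-8 * sigma_w, theta + 8 * sigma_w, 1601)
    dy = y_grid[1] - y_grid[0]
    pi, traj = 0.5, []
    for _ in range(n_agents):
        gamma = theta / 2 - (sigma_w ** 2 / theta) * np.log(pi / (1 - pi))   # eq. (8)
        p_dec = 1 / (1 + np.exp(-beta * (y_grid - gamma)))                   # eq. (7)
        l0, l1 = [float(np.sum(p_dec * np.exp(-(y_grid - tk) ** 2 / (2 * sigma_w ** 2))
                               / np.sqrt(2 * np.pi * sigma_w ** 2)) * dy)
                  for tk in (0.0, theta)]                                    # eq. (9)
        y = theta + sigma_w * rng.standard_normal()       # private signal under H1
        p1 = 1 / (1 + np.exp(-beta * (y - gamma)))        # private belief, eq. (7)
        a = rng.random() < p1                             # random decision, eq. (4)
        num = pi * (l1 if a else 1 - l1)                  # social update, eq. (6)
        pi = num / (num + (1 - pi) * (l0 if a else 1 - l0))
        pi = min(max(pi, 1e-12), 1 - 1e-12)               # keep gamma finite numerically
        traj.append(pi)
    return np.array(traj)

rng = np.random.default_rng(1)
trials = np.stack([run_chain(200, 1.0, 1.0, rng) for _ in range(100)])
mean_belief = trials.mean(axis=0)  # Monte Carlo estimate of E[pi_n] under H1
```

Plotting `mean_belief` against the agent index reproduces, at a smaller scale, the rising curves of Fig. 2 for the random decision making method.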

Finally, we show the social belief evolution with one outlier. In the second experiment, using the same data model with $\sigma_w = 1$ and $\theta = 1$, we set the sequence of decisions $a_{1:40}$ to all ones, except that $a_{10} = 0$. Figure 5 shows the evolution of the social beliefs of the deterministic and proposed methods conditioned on this decision sequence. It can be seen that the social belief in the deterministic method decreased from 0.9 to 0.4 just because the one decision $a_{10}$ is in favor of $\mathcal{H}_0$. In contrast, the proposed random method was not as sensitive to the outlier decision.
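The outlier experiment can be mimicked as follows. This sketch is our own code, not the authors': it propagates the social belief through the fixed action sequence a_{1:40} = (1, ..., 1) with a_10 = 0, once with the logistic action likelihoods of eq. (9) (random agents) and once with threshold agents, for which l^(k) = P(y_n > gamma_n | H_k) is a Gaussian tail probability.

```python
import numpy as np
from math import erf, log, sqrt, pi as PI

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def belief_path(actions, theta, sigma_w, random_agents):
    """Social-belief evolution conditioned on a fixed action sequence.

    random_agents=True: action likelihoods from eq. (9) (logistic agents);
    False: threshold agents, l^(k) = P(y_n > gamma_n | H_k).
    """
    beta = theta / sigma_w ** 2
    y = np.linspace(-8.0 * sigma_w, theta + 8.0 * sigma_w, 1601)
    dy = y[1] - y[0]
    p, path = 0.5, []
    for a in actions:
        gamma = theta / 2 - (sigma_w ** 2 / theta) * log(p / (1 - p))   # eq. (8)
        ls = []
        for theta_k in (0.0, theta):
            if random_agents:
                sig = 1 / (1 + np.exp(-beta * (y - gamma)))             # eq. (7)
                gauss = np.exp(-(y - theta_k) ** 2 / (2 * sigma_w ** 2)) / (sqrt(2 * PI) * sigma_w)
                ls.append(float(np.sum(sig * gauss) * dy))              # eq. (9)
            else:
                ls.append(1.0 - std_normal_cdf((gamma - theta_k) / sigma_w))
        l0, l1 = ls
        num = p * (l1 if a == 1 else 1 - l1)                            # eq. (6)
        p = num / (num + (1 - p) * (l0 if a == 1 else 1 - l0))
        p = min(max(p, 1e-12), 1 - 1e-12)
        path.append(p)
    return path

acts = [1] * 40
acts[9] = 0  # a_10 = 0: the single outlier decision
rand_path = belief_path(acts, 1.0, 1.0, True)
det_path = belief_path(acts, 1.0, 1.0, False)
# The deterministic belief drops sharply at n = 10; the random one reacts mildly.
```

In our runs of this sketch, the threshold-agent belief collapses at the outlier much more strongly than the logistic-agent belief, in qualitative agreement with Fig. 5.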

Fig. 5. The evolution of social belief with one outlier decision.

6. CONCLUSION

In this paper we proposed a Bayesian learning method for agents in a sequential random decision making system. We modeled the observations of the agents as Gaussian random variables with mean either zero or a known positive value. The agents drew their decisions randomly according to their beliefs about the true state of nature. We proved that, with the proposed Bayesian learning method, both the social belief and the action converge to the true state of nature in probability. We demonstrated the performance of the proposed method by Monte Carlo simulations and compared it with that of a deterministic method. We showed that the system using the random decision policy has a faster convergence rate than the one with the deterministic policy.


7. REFERENCES

[1] A. V. Banerjee, "A simple model of herd behavior," The Quarterly Journal of Economics, vol. 107, no. 3, pp. 797-817, 1992.

[2] S. Bikhchandani, D. Hirshleifer, and I. Welch, "A theory of fads, fashion, custom, and cultural change as informational cascades," Journal of Political Economy, pp. 992-1026, 1992.

[3] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, "Non-Bayesian social learning," Games and Economic Behavior, vol. 76, no. 1, pp. 210-225, 2012.

[4] B. Golub and M. O. Jackson, "Naive learning in social networks and the wisdom of crowds," American Economic Journal: Microeconomics, vol. 2, no. 1, pp. 112-149, 2010.

[5] V. Krishnamurthy and H. V. Poor, "Social learning and Bayesian games in multiagent signal processing: How do local and global decision makers interact?," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 43-57, 2013.

[6] D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar, "Bayesian learning in social networks," The Review of Economic Studies, vol. 78, no. 4, pp. 1201-1236, 2011.

[7] D. Gale and S. Kariv, "Bayesian learning in social networks," Games and Economic Behavior, vol. 45, no. 2, pp. 329-346, 2003.

[8] P. M. Djuric and Y. Wang, "Distributed Bayesian learning in multiagent systems: Improving our understanding of its capabilities and limitations," IEEE Signal Processing Magazine, vol. 29, no. 2, pp. 65-76, 2012.

[9] Y. Wang and P. M. Djuric, "A gossip method for optimal consensus on a binary state from binary actions," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp. 274-283, 2013.

[10] V. Krishnamurthy, "Bayesian sequential detection with phase-distributed change time and nonlinear penalty: a POMDP lattice programming approach," IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 7096-7124, 2011.

[11] C. Chamley, Rational Herds: Economic Models of Social Learning, Cambridge University Press, 2004.

[12] E. Mossel, A. Sly, and O. Tamuz, "Asymptotic learning on Bayesian social networks," Probability Theory and Related Fields, pp. 1-31, 2012.

[13] C. Wang, Y. Chen, and K. J. R. Liu, "Sequential Chinese restaurant game," IEEE Transactions on Signal Processing, vol. 60, no. 3, pp. 571-584, 2012.
