1053-587X (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSP.2014.2345635, IEEE Transactions on Signal Processing
Convergence Analysis of the Variance in
Gaussian Belief Propagation
Qinliang Su and Yik-Chung Wu
Abstract
It is known that Gaussian belief propagation (BP) is a low-complexity algorithm for (approximately)
computing the marginal distributions of a high-dimensional Gaussian distribution. However, in a loopy
factor graph, it is important to determine whether Gaussian BP converges. In general, the convergence
conditions for the Gaussian BP variances and means are not necessarily the same, and this paper focuses
on the convergence condition of the Gaussian BP variances. In particular, by describing the message-passing
process of Gaussian BP as a set of updating functions, the necessary and sufficient convergence
condition of the Gaussian BP variances is derived under both synchronous and asynchronous schedulings,
with the converged variances proved to be independent of the initialization as long as it is chosen from
the proposed set. The necessary and sufficient convergence condition is further expressed in the form of a
semi-definite programming (SDP) optimization problem, and thus can be verified more efficiently than
the existing convergence condition based on the computation tree. The relationship between the proposed
convergence condition and the existing one based on the computation tree is also established analytically.
Numerical examples are presented to corroborate the established theories.
Index Terms
Gaussian belief propagation, loopy belief propagation, graphical model, factor graph, message
passing, sum-product algorithm, convergence.
Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other
purposes must be obtained from the IEEE by sending a request to [email protected].
This work was supported in part by the HKU seed Funding Programme, Project No. 201311159066.
The authors are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong,
(e-mail: [email protected], [email protected]).
July 31, 2014 DRAFT
I. INTRODUCTION
In signal processing and machine learning, many problems ultimately reduce to computing
the marginal mean and variance of an individual random variable from a high-dimensional
joint Gaussian probability density function (PDF). A direct way of marginalization
involves computing the inverse of the precision matrix of the joint Gaussian PDF. The
inverse operation is known to be computationally expensive for a high-dimensional matrix, and it
introduces heavy communication overhead when carried out in distributed scenarios.
By representing the joint PDF with a factor graph [1], Gaussian BP provides an alternative way to
(approximately) calculate the marginal mean and variance of each individual random variable
by passing messages between neighboring nodes in the factor graph [2]–[5]. It is known that if
the factor graph has a tree structure, the belief means and variances calculated with Gaussian BP
converge to the true marginal means and variances, respectively [1]. For a factor graph with
loops, if the belief means and variances in Gaussian BP converge, the true marginal means and
approximate marginal variances are obtained [6]. Recently, a novel message-passing algorithm
was proposed in [7] for inference in Gaussian graphical models by choosing a special set of
nodes to break the loops in the graph. Although this algorithm computes the true marginal means and
improves the accuracy of the variances, it still needs Gaussian BP as an underlying inference
algorithm. Thus, in this paper, we focus on the message-passing algorithm of Gaussian BP only.
With the ability to provide the true marginal means upon convergence, Gaussian BP has been
successfully applied in low-complexity detection and estimation problems arising in communi-
cation systems [8]–[10], fast solver for large sparse linear systems [11], [12], sparse Bayesian
learning [13], estimation in Gaussian graphical model [14], etc. In addition, the distributed
property inherited from message passing algorithms is particularly attractive for applications
requiring distributed implementation, such as distributed beamforming [15], inter-cell interference
mitigation [16], distributed synchronization and localization in wireless sensor networks [17]–
[19], distributed energy efficient self-deployment in mobile sensor networks [20], distributed rate
control in Ad Hoc networks [21], and distributed network utility maximization [22]. Moreover,
Gaussian BP is also exploited to provide approximate marginal variances for large-scale sparse
Bayesian learning in a computationally efficient way [23].
However, Gaussian BP only works under the prerequisite that the belief means and variances
calculated from the updating messages do converge. So far, several sufficient convergence conditions
have been proposed, which guarantee that the means and variances converge simultaneously
[6], [24], [25]. However, in general, the belief means and variances do not necessarily converge
under the same condition. It is reported in [24] that if the variances converge, convergence
of the means can always be observed when suitable damping is imposed. Thus, it is important
to ensure that the variances of Gaussian BP converge in the first place. In the pioneering work
[24], the message-passing procedure of Gaussian BP in a loopy factor graph is equivalently
translated into that in a loop-free computation tree [26]. Based on the computation tree, an
almost necessary and sufficient convergence condition for the variances is derived. It is proved that if
the spectral radius of the infinite-dimensional precision matrix of the computation tree is smaller
than one, the variances converge; and if this spectral radius is larger than one, Gaussian BP
becomes ill-posed. However, this condition is only 'almost necessary and sufficient' since it does
not cover the scenario where the spectral radius is equal to one. More crucially, this convergence
condition requires the evaluation of the spectral radius of an infinite-dimensional matrix, which
is almost impossible to calculate in practice.
In this paper, using the fact that the messages in Gaussian BP can be represented by linear
and quadratic parameters, the message-passing process of Gaussian BP is described as a set
of parameter updating functions. Based on the monotonically non-decreasing and concave
properties of the updating functions, the necessary and sufficient convergence condition of the
messages' quadratic parameters is derived first. Then, using the relation between the BP messages'
quadratic parameters and the belief variances, the necessary and sufficient convergence condition
of the belief variances is developed under both synchronous and asynchronous schedulings. The
initialization set under which the variances are guaranteed to converge is proposed as well.
The convergence condition derived in this paper is proved to be equivalent to a semi-definite
programming (SDP) problem, and thus can be easily verified. Furthermore, the relationship between
the proposed convergence condition and the existing one based on the computation tree is also
established. Our result fills in the missing part of the convergence condition in [24] for the case
in which the spectral radius of the infinite-dimensional matrix is equal to one.
The rest of this paper is organized as follows. Gaussian BP is reviewed in Section II. Section
III analyzes the updating process of Gaussian BP messages, followed by the necessary and
sufficient convergence condition of quadratic parameters in the messages in Section IV. The
July 31, 2014 DRAFT
1053-587X (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI10.1109/TSP.2014.2345635, IEEE Transactions on Signal Processing
4
derivation of the necessary and sufficient convergence condition of the belief variances is presented
in Section V. The relationship between the condition proposed in this paper and the existing one
based on the computation tree is established in Section VI. Numerical examples are presented in
Section VII, followed by conclusions in Section VIII.
The following notations are used throughout this paper. For two vectors, x₁ ≥ x₂ and x₁ > x₂
mean that the inequalities hold elementwise. Notations λ(G) and λ_max(G)
represent any eigenvalue and the maximal eigenvalue of matrix G, respectively. Notation ρ(G)
denotes the spectral radius of G. Notations [·]_i and [·]_{ij} represent the i-th element of
a vector and the (i,j)-th element of a matrix, respectively.
II. GAUSSIAN BELIEF PROPAGATION
In general, a Gaussian PDF can be written as

f(x) ∝ exp{−(1/2)x^T P x + h^T x},   (1)

where x = [x_1, x_2, ..., x_N]^T with N being the number of random variables; P ≻ 0 is the precision matrix with p_{ij} being its (i,j)-th element; and h = [h_1, h_2, ..., h_N]^T is the linear coefficient vector. In many signal processing applications, we want to find the marginalized PDF p(x_i) = ∫ f(x) dx_1 ... dx_{i−1} dx_{i+1} ... dx_N. For a Gaussian f(x), this can be done by completing the square inside the exponent of (1) as f(x) ∝ exp{−(1/2)(x − P^{−1}h)^T P (x − P^{−1}h)}, and then the marginalized PDF is obtained as

p(x_i) ∝ exp{−(x_i − µ_i)² / (2σ_i²)},   (2)

with the mean µ_i = [P^{−1}h]_i and the variance σ_i² = [P^{−1}]_{ii}. However, the required matrix inverse is computationally expensive for a high-dimensional P, and it introduces heavy communication overhead to the network when the entries p_{ij} are stored distributedly.
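To make the direct approach concrete, the following sketch computes µ_i and σ_i² exactly as in (2) by inverting the precision matrix. The 3 × 3 values of P and h are illustrative placeholders, not taken from the paper:

```python
import numpy as np

# Illustrative precision matrix P (symmetric positive definite) and
# linear coefficient vector h; any valid pair would do.
P = np.array([[2.0, 0.5, 0.3],
              [0.5, 2.0, 0.4],
              [0.3, 0.4, 2.0]])
h = np.array([1.0, 0.0, -1.0])

P_inv = np.linalg.inv(P)   # the O(N^3) step that Gaussian BP tries to avoid
mu = P_inv @ h             # marginal means:     mu_i      = [P^{-1} h]_i
var = np.diag(P_inv)       # marginal variances: sigma_i^2 = [P^{-1}]_ii
print(mu, var)
```

For large N, `np.linalg.solve(P, h)` would recover the means without forming the full inverse, but the marginal variances still require the diagonal of P^{−1}, which is the costly part that motivates Gaussian BP.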
On the other hand, the Gaussian PDF in (1) can be expanded as

f(x) ∝ ∏_{i=1}^{N} f_i(x_i) ∏_{j=1}^{N} ∏_{k=j+1}^{N} f_{jk}(x_j, x_k),

where f_i(x_i) = exp{−(p_{ii}/2)x_i² + h_i x_i} and f_{jk}(x_j, x_k) = exp{−p_{jk} x_j x_k}. Based on this expansion, a factor graph¹ can be constructed by connecting each variable x_i with its associated factors f_i(x_i) and f_{ij}(x_i, x_j). It is known that Gaussian BP can be applied on this factor graph by passing

¹ The factor graph is assumed to be connected in this paper.
messages among neighboring nodes to obtain the true marginal mean and approximate marginal
variance for each variable.
In Gaussian BP, the departing and arriving messages corresponding to variables i and j are updated as

m^d_{i→j}(x_i, t) ∝ f_i(x_i) ∏_{k∈N(i)\j} m^a_{k→i}(x_i, t),   (3)

m^a_{i→j}(x_j, t+1) ∝ ∫ f_{ij}(x_i, x_j) m^d_{i→j}(x_i, t) dx_i,   (4)

where (i,j) ∈ E, with E ≜ {(i,j) | p_{ij} ≠ 0, for i, j ∈ V} being the set of index pairs of any two connected variables and V ≜ {1, 2, ..., N} being the set of indices of all variables; t is the time index; N(i) is the set of indices of the neighboring variable nodes of node i, and N(i)\j is the set N(i) excluding node j. After obtaining the messages m^a_{k→i}(x_i, t), the belief at variable node i is equal to

b_i(x_i, t) ∝ f_i(x_i) ∏_{k∈N(i)} m^a_{k→i}(x_i, t).   (5)
It has been proved that b_i(x_i, t) always converges to p(x_i) in (2) exactly if the factor graph is
of tree structure [1], [6]. For a loopy factor graph, if b_i(x_i, t) converges, the mean of b_i(x_i, t) will
converge to the true mean µ_i, while the converged variance is an approximation of σ_i² [6].
Assume the arriving message is in the Gaussian form m^a_{i→j}(x_j, t) ∝ exp{−(v^a_{i→j}(t)/2)x_j² + β^a_{i→j}(t)x_j}, where v^a_{i→j}(t) and β^a_{i→j}(t) are the arriving precision and arriving linear coefficient, respectively. Inserting it into (3), we obtain m^d_{i→j}(x_i, t) ∝ exp{−(v^d_{i→j}(t)/2)x_i² + β^d_{i→j}(t)x_i}, where

v^d_{i→j}(t) = p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t),   (6)

β^d_{i→j}(t) = h_i + Σ_{k∈N(i)\j} β^a_{k→i}(t)   (7)

are the departing precision and linear coefficient, respectively. Furthermore, substituting the departing message m^d_{i→j}(x_i, t) into (4), we obtain

m^a_{i→j}(x_j, t+1) ∝ exp{ (p_{ij}²/(2v^d_{i→j}(t)))x_j² − (p_{ij}β^d_{i→j}(t)/v^d_{i→j}(t))x_j } × ∫ exp{ −(v^d_{i→j}(t)/2)(x_i − (β^d_{i→j}(t) − p_{ij}x_j)/v^d_{i→j}(t))² } dx_i.   (8)
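For reference, the reason the integral in (8) contributes only a constant when v^d_{i→j}(t) > 0 is the standard Gaussian integral identity (stated here for completeness; it is not numbered in the paper):

```latex
\int_{-\infty}^{\infty} \exp\left\{-\frac{v}{2}\,(x - m)^2\right\}\,\mathrm{d}x \;=\; \sqrt{\frac{2\pi}{v}}, \qquad v > 0,
```

which is finite and independent of m, so all x_j-dependence of m^a_{i→j}(x_j, t+1) resides in the factor outside the integral. For v ≤ 0 the integral diverges, which is exactly the 'not defined' case below.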
If v^d_{i→j}(t) > 0, the integration equals a constant, and thus m^a_{i→j}(x_j, t+1) ∝ exp{ (p_{ij}²/(2v^d_{i→j}(t)))x_j² − (p_{ij}β^d_{i→j}(t)/v^d_{i→j}(t))x_j }. Therefore, v^a_{i→j}(t+1) and β^a_{i→j}(t+1) are updated as

v^a_{i→j}(t+1) = −p_{ij}²/v^d_{i→j}(t), if v^d_{i→j}(t) > 0; not defined, otherwise,   (9)

β^a_{i→j}(t+1) = −p_{ij}β^d_{i→j}(t)/v^d_{i→j}(t), if v^d_{i→j}(t) > 0; not defined, otherwise,   (10)

where 'not defined' arises since, in the case v^d_{i→j}(t) ≤ 0, the integration in (8) becomes infinite and the message m^a_{i→j}(x_j, t+1) loses its Gaussian form. After obtaining v^a_{k→i}(t), the variance of the belief at each iteration is computed as

σ_i²(t) = 1 / (p_{ii} + Σ_{k∈N(i)} v^a_{k→i}(t)).   (11)

From (9) and (10), it can be seen that the messages of Gaussian BP maintain the Gaussian form only when v^d_{i→j}(t) > 0 for all t ≥ 0. Therefore, we have the following lemma.

Lemma 1. The messages of Gaussian BP are always in Gaussian form if and only if v^d_{i→j}(t) > 0 for all t ≥ 0.
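The variance recursions (6), (9) and (11) under synchronous scheduling can be sketched as follows. This is a minimal illustration, not the authors' implementation; the precision matrix is an arbitrary positive definite example and the initialization is v^a(0) = 0:

```python
import numpy as np

P = np.array([[2.0, 0.5, 0.3],
              [0.5, 2.0, 0.4],
              [0.3, 0.4, 2.0]])   # illustrative precision matrix
N = P.shape[0]
# directed edges (i, j) with p_ij != 0 and i != j
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i, j] != 0]

va = {e: 0.0 for e in edges}      # arriving precisions v^a_{i->j}(0) = 0

for t in range(100):
    # departing precisions, eq. (6): v^d_{i->j} = p_ii + sum_{k in N(i)\j} v^a_{k->i}
    vd = {(i, j): P[i, i] + sum(va[(k, i)] for k in range(N)
                                if (k, i) in va and k != j)
          for (i, j) in edges}
    # Lemma 1: messages stay in Gaussian form only while all v^d > 0
    assert all(v > 0 for v in vd.values())
    va = {e: -P[e] ** 2 / vd[e] for e in edges}        # eq. (9)

# belief variances, eq. (11)
sigma2 = [1.0 / (P[i, i] + sum(va[(k, i)] for k in range(N) if (k, i) in va))
          for i in range(N)]
print(sigma2)
```

On a tree-structured graph, sigma2 would equal diag(P^{−1}) exactly; here, with a loop, the converged values are the approximate marginal variances discussed above.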
III. ANALYSIS OF THE MESSAGE-PASSING PROCESS

In this section, we first analyze the updating process of v^a_{i→j}(t), and then derive the condition that guarantees the BP messages remain in Gaussian form.
A. Updating Function of v^a_{i→j}(t) and Its Properties

Under the Gaussian form of messages, substituting (6) into (9) gives

v^a_{i→j}(t+1) = −p_{ij}² / (p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t)).   (12)

Writing (12) in vector form, we obtain

v^a(t+1) = g(v^a(t)),   (13)

where g(·) is a vector-valued function containing the components g_{ij}(·) with (i,j) ∈ E arranged in ascending order first on j and then on i, and g_{ij}(·) is defined as

g_{ij}(w) ≜ −p_{ij}² / (p_{ii} + Σ_{k∈N(i)\j} w_{ki});   (14)
v^a(t) and w are vectors containing the elements v^a_{i→j}(t) and w_{ij}, respectively, both with (i,j) ∈ E arranged in ascending order first on j and then on i. Moreover, define the set

W ≜ {w | p_{ii} + Σ_{k∈N(i)\j} w_{ki} > 0, ∀ (i,j) ∈ E}.   (15)
Now, we have the following proposition about g(·) and W .
Proposition 1. The following claims hold:
P1) For any w1 ∈ W , if w1 ≤ w2, then w2 ∈ W;
P2) For any w1 ∈ W , if w1 ≤ w2, then g(w1) ≤ g(w2);
P3) gij(w) is a concave function with respect to w ∈ W .
Proof: Consider two vectors w and w̄, which contain elements w_{ki} and w̄_{ki} with (k,i) ∈ E arranged in ascending order first on k and then on i. For any w ∈ W, according to the definition of W in (15), we have p_{ii} + Σ_{k∈N(i)\j} w_{ki} > 0. Then, if w ≤ w̄, it can be easily seen that p_{ii} + Σ_{k∈N(i)\j} w̄_{ki} > 0 as well. Thus, we have w̄ ∈ W.

The first-order derivative of g_{ij}(w) with respect to w_{ki} for k ∈ N(i)\j is computed to be

∂g_{ij}/∂w_{ki} = p_{ij}² / (p_{ii} + Σ_{k∈N(i)\j} w_{ki})² > 0.   (16)

Thus, g_{ij}(w) is a continuous and strictly increasing function with respect to the components w_{ki} for k ∈ N(i)\j with w ∈ W. Hence, if w₁ ≤ w₂ with w₁ ∈ W, then g(w₁) ≤ g(w₂).
To prove the third property, first consider the simple function −p_{ij}²/(p_{ii} + r) on the domain r > −p_{ii}. It can be easily derived that the second-order derivative of −p_{ij}²/(p_{ii} + r) is −2p_{ij}²/(p_{ii} + r)³ < 0, where the inequality holds due to p_{ii} + r > 0. Thus, −p_{ij}²/(p_{ii} + r) is a concave function with respect to r > −p_{ii}. Since g_{ij}(w) = −p_{ij}²/(p_{ii} + Σ_{k∈N(i)\j} w_{ki}) is the composition of the concave function −p_{ij}²/(p_{ii} + r) and the affine mapping Σ_{k∈N(i)\j} w_{ki}, the composition g_{ij}(w) is also concave with respect to Σ_{k∈N(i)\j} w_{ki} > −p_{ii} [28, p. 79]. Due to W ≜ {w | p_{ii} + Σ_{k∈N(i)\j} w_{ki} > 0, ∀ (i,j) ∈ E} as defined in (15), w ∈ W implies that Σ_{k∈N(i)\j} w_{ki} > −p_{ii}. Thus, we can infer that g_{ij}(w) is a concave function with respect to w ∈ W.
Notice that convexity is also exploited in [29] to derive the convergence condition of a general
min-sum message-passing algorithm, which reduces to Gaussian BP when applied to the
particular quadratic function (1/2)x^T P x − h^T x. The convexity in [29] means that the quadratic
function can be decomposed into a summation of convex functions, which is different from the
concavity of the updating function g_{ij}(w) here.
B. Condition to Maintain the Gaussian Form of Messages

First, define the following set²

S₁ ≜ {w ∈ W | w ≤ g(w)}.   (17)

With the notations g^(t)(w) ≜ g(g^(t−1)(w)) and g^(0)(w) ≜ w, the following proposition can be established.
Proposition 2. The set S1 has the following properties:
P4) S1 is a convex set;
P5) If s ∈ S1, then s < 0;
P6) If s ∈ S1, then g(t)(s) ∈ S1 and g(t)(s) ≤ g(t+1)(s) for all t ≥ 0.
Proof: As g_{ij}(w) is a concave function for w ∈ W according to P3), g_{ij}(w) − w_{ij} is also concave for w ∈ W. The set of w ∈ W satisfying g_{ij}(w) − w_{ij} ≥ 0 is a convex set [28]. Thus, S₁ = {w ∈ W | g_{ij}(w) − w_{ij} ≥ 0 for all (i,j) ∈ E}, being the intersection of such convex sets, is also a convex set.

If s ∈ S₁, we have s ≤ g(s) and s ∈ W. According to the definition of W in (15), p_{ii} + Σ_{k∈N(i)\j} s_{ki} > 0. Putting this fact into the definition of g_{ij}(·) in (14), it is obvious that g(s) < 0. Therefore, we have the relation s ≤ g(s) < 0 for all s ∈ S₁.

Finally, if s ∈ S₁, we have s ≤ g(s) and s ∈ W. Hence, g(s) ∈ W according to P1). Applying g(·) to both sides of s ≤ g(s) and using P2), we obtain g(s) ≤ g^(2)(s). Furthermore, since g(s) ∈ W and g(s) ≤ g^(2)(s), we also have g(s) ∈ S₁. By induction, we can prove in general that g^(t)(s) ∈ S₁ and g^(t)(s) ≤ g^(t+1)(s) for all t ≥ 0.
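Propositions 1 and 2 can be probed numerically. The sketch below uses the same kind of illustrative matrix as before; the helper names `in_W` and `in_S1` are ours, not the paper's. It tests membership in S₁ of (17), i.e., w ∈ W and w ≤ g(w) elementwise, and checks that the fixed point reached from w = 0 lies in S₁:

```python
import numpy as np

P = np.array([[2.0, 0.5, 0.3],
              [0.5, 2.0, 0.4],
              [0.3, 0.4, 2.0]])   # illustrative precision matrix
N = P.shape[0]
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i, j] != 0]

def g(w):
    # the updating function g_ij of (14), acting on dicts keyed by directed edges
    return {(i, j): -P[i, j] ** 2 / (P[i, i] + sum(w[(k, i)] for k in range(N)
                                                   if (k, i) in w and k != j))
            for (i, j) in edges}

def in_W(w):
    # membership in the set W of (15): all denominators strictly positive
    return all(P[i, i] + sum(w[(k, i)] for k in range(N)
                             if (k, i) in w and k != j) > 0
               for (i, j) in edges)

def in_S1(w, tol=1e-9):
    # membership in S1 of (17): w in W and w <= g(w), up to numerical tolerance
    gw = g(w)
    return in_W(w) and all(w[e] <= gw[e] + tol for e in edges)

w = {e: 0.0 for e in edges}
for _ in range(200):
    w = g(w)          # iterate toward the fixed point
print(in_S1(w))       # the limit of g^(t)(0) lies in S1 when the iteration converges
```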
Now, we give the following theorem.

Theorem 1. There exists at least one initialization v^a(0) such that the messages of Gaussian BP are maintained in Gaussian form for all t ≥ 0 if and only if S₁ ≠ ∅.
Proof:
Sufficient Condition:
² The simpler symbol S is reserved for later use.
If S₁ ≠ ∅, by choosing an initialization v^a(0) ∈ S₁ and applying P6), it can be inferred that v^a(t) ∈ S₁ for all t ≥ 0. Moreover, according to the definitions of W in (15) and S₁ in (17), we have S₁ ⊆ W, and thereby v^a(t) ∈ W for all t ≥ 0, or equivalently p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0 for all (i,j) ∈ E according to the definition of the set W in (15). Due to v^d_{i→j}(t) = p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) as given in (6), we obtain v^d_{i→j}(t) > 0 for all (i,j) ∈ E and t ≥ 0. According to Lemma 1, it can be inferred that the messages are maintained in Gaussian form for all t ≥ 0.
Necessary Condition:
We prove the necessary condition by contradiction. When S₁ = ∅, suppose that there exists a v^a(0) such that the messages are maintained in Gaussian form for all t ≥ 0. According to Lemma 1, we can infer that v^d_{i→j}(t) = p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0 for all (i,j) ∈ E, or equivalently v^a(t) ≜ g^(t)(v^a(0)) ∈ W for all t ≥ 0 from the definition of W in (15).

Now, choose another initialization v̄^a(0) satisfying both v^a(0) ≤ v̄^a(0) and v̄^a(0) ≥ 0. Due to v^a(0) ∈ W and v^a(0) ≤ v̄^a(0), by using P2), we have g(v^a(0)) ≤ g(v̄^a(0)), that is, v^a(1) ≤ v̄^a(1). Furthermore, substituting v̄^a(0) ≥ 0 into (14) gives v̄^a(1) ≤ 0, and thereby v̄^a(1) ≤ v̄^a(0). Combining this with v^a(1) ≤ v̄^a(1) leads to v^a(1) ≤ v̄^a(1) ≤ v̄^a(0). Due to the assumption v^a(1) ∈ W, applying P2) to v^a(1) ≤ v̄^a(1) ≤ v̄^a(0) gives v^a(2) ≤ v̄^a(2) ≤ v̄^a(1). Combining this with the fact v̄^a(1) ≤ 0 proved above, we have v^a(2) ≤ v̄^a(2) ≤ v̄^a(1) ≤ 0. By induction, it can be derived that

v^a(t+1) ≤ v̄^a(t+1) ≤ v̄^a(t) ≤ 0,  ∀ t ≥ 1.   (18)

Then, from the definition of W in (15), the assumption v^a(t) ∈ W is equivalent to p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0 for all (i,j) ∈ E, where v^a_{k→i}(t) are the elements of v^a(t) with (k,i) ∈ E arranged in the same order as v^a(t). By rearranging the terms in p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0, we obtain v^a_{γ→i}(t) > −p_{ii} − Σ_{k∈N(i)\{j,γ}} v^a_{k→i}(t). Applying v^a(t) ≤ 0 shown in (18) to this inequality, it can be inferred that v^a_{γ→i}(t) > −p_{ii} − Σ_{k∈N(i)\{j,γ}} v^a_{k→i}(t) ≥ −p_{ii}, that is, v^a_{γ→i}(t) > −p_{ii} for all (γ,i) ∈ E. Combining this with v̄^a(t) ≥ v^a(t) shown in (18), for all (i,j) ∈ E, we have

v̄^a_{i→j}(t) > −p_{jj}.   (19)

It can be seen from (18) and (19) that v̄^a(t) is a monotonically non-increasing and lower bounded sequence. Thus, v̄^a(t) must converge to a vector v̄^{a∗}, that is, v̄^{a∗} = g(v̄^{a∗}).

On the other hand, substituting (12) into (19) gives −p_{ij}²/(p_{ii} + Σ_{k∈N(i)\j} v̄^a_{k→i}(t)) > −p_{jj}. Due to v^a(t) ∈ W, it can be inferred from P1) and (18) that v̄^a(t) ∈ W, or equivalently p_{ii} + Σ_{k∈N(i)\j} v̄^a_{k→i}(t) > 0. Together with the fact p_{jj} > 0, we obtain p_{ii} + Σ_{k∈N(i)\j} v̄^a_{k→i}(t) > p_{ij}²/p_{jj}. Since v̄^a(t) converges to v̄^{a∗}, taking the limit on both sides of the inequality gives p_{ii} + Σ_{k∈N(i)\j} v̄^{a∗}_{k→i} ≥ p_{ij}²/p_{jj}. Hence, p_{ii} + Σ_{k∈N(i)\j} v̄^{a∗}_{k→i} > 0. From the definition of W in (15), we have v̄^{a∗} ∈ W. Combining this with v̄^{a∗} = g(v̄^{a∗}), according to the definition of S₁ in (17), it is clear that v̄^{a∗} ∈ S₁. This contradicts the prerequisite S₁ = ∅. Therefore, S₁ cannot be ∅.
IV. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF MESSAGE PARAMETERS v^a(t)

According to Theorem 1, the Gaussian form of the messages may only be maintained for some v^a(0). Thus, the choice of the initialization v^a(0) also plays an important role in the convergence of v^a(t).
A. Synchronous Scheduling

In this section, we investigate the convergence condition of v^a(t) under synchronous scheduling, which means that v^a(t) is updated as v^a(t+1) = g(v^a(t)). The necessary and sufficient convergence condition under a nonnegative initialization set is derived first. Then, the convergence condition is proved to hold under a more general initialization set.
Lemma 2. v^a(t) converges to the same point v^{a∗} ∈ S₁ for all v^a(0) ≥ 0 if and only if S₁ ≠ ∅.
Proof: See Appendix A.
Obviously, the initialization v^a(0) = 0 used by the convergence condition in [24] is contained in the proposed initialization set v^a(0) ≥ 0 in Lemma 2. In the following, we show that the initialization set can be further expanded to the set

A ≜ {w ≥ 0} ∪ {w ≥ w₀ | w₀ ∈ int(S₁)} ∪ {w ≥ w₀ | w₀ ∈ S₁ and w₀ = lim_{t→∞} g^(t)(0)},   (20)

where int(S₁) denotes the interior of S₁. Notice that the set A consists of the union of three parts.
• The first part {w ≥ 0} corresponds to the initialization set given in Lemma 2.
• For the case S₁ ≠ ∅ but int(S₁) = ∅, the second part is empty. Also, according to Lemma 2, we have lim_{t→∞} g^(t)(0) = v^{a∗} ∈ S₁. Thus, the third part of the set A reduces to {w ≥ v^{a∗}}. Due to the fact that v^{a∗} ∈ S₁ implies v^{a∗} < 0 by P5), we have {w ≥ 0} ⊊ {w ≥ v^{a∗}}.
• For the case int(S₁) ≠ ∅, according to (63), we have w₀ ≤ v^{a∗} for any w₀ ∈ int(S₁). This implies {w ≥ v^{a∗}} ⊆ {w ≥ w₀ | w₀ ∈ int(S₁)}. Obviously, in this case, we also have {w ≥ 0} ⊊ {w ≥ w₀ | w₀ ∈ int(S₁)}.
Therefore, if S₁ ≠ ∅, the set A is always larger than the set {w ≥ 0}.
Theorem 2. v^a(t) converges to the same point v^{a∗} for all v^a(0) ∈ A if and only if S₁ ≠ ∅.
Proof: The first part {w ≥ 0} of A corresponds to the case covered in Lemma 2, and thus has already been established. To prove the sufficiency, we consider the case int(S₁) ≠ ∅ and the case int(S₁) = ∅ but S₁ ≠ ∅ separately.

For the case int(S₁) ≠ ∅, it is known from (20) that A = {w ≥ w₀ | w₀ ∈ int(S₁)}. To prove the theorem, we will first prove that v^a(t) converges to the same point v^{a∗} for all v^a(0) ∈ int(S₁). Then, noticing that any v^a(0) ∈ {w ≥ w₀ | w₀ ∈ int(S₁)} can be upper bounded by a point from {w ≥ 0} and lower bounded by a point from int(S₁), and that both bounds converge to v^{a∗}, we can easily extend the convergence to the much larger set v^a(0) ∈ {w ≥ w₀ | w₀ ∈ int(S₁)}.
First, we prove that v̄^a(t) converges to v^{a∗} for all v̄^a(0) ∈ int(S₁). To do this, we establish the following three facts.

1) For any v̄^a(0) ∈ int(S₁), according to P6), we have v̄^a(t) ≤ v̄^a(t+1) and v̄^a(t) ∈ S₁. With P5), we also have v̄^a(t) < 0. It can be seen that v̄^a(t) is a monotonically non-decreasing and upper bounded sequence; thus v̄^a(t) converges to a fixed point, which is denoted as v̄^{a∗}. On the other hand, let v^a(0) ≥ 0 correspond to a point in the first part of A; due to v̄^a(0) < 0, we have v̄^a(0) < v^a(0). By using P2), applying g(·) to v̄^a(0) < v^a(0) for t times gives v̄^a(t) ≤ v^a(t). Since v̄^a(t) and v^a(t) converge to v̄^{a∗} and v^{a∗}, respectively, taking the limit on both sides of v̄^a(t) ≤ v^a(t) gives v̄^{a∗} ≤ v^{a∗}. Due to v̄^a(0) ∈ int(S₁), the definition of S₁ in (17) implies that v̄^a(0) < g(v̄^a(0)), that is, v̄^a(0) < v̄^a(1). Combining this with v̄^a(t) ≤ v̄^a(t+1) established above, we obtain v̄^a(0) < v̄^{a∗}. Together with the fact v̄^{a∗} ≤ v^{a∗}, we obtain

v̄^a(0) < v̄^{a∗} ≤ v^{a∗}.   (21)
2) For a v̄^a(0) ∈ int(S₁), construct the vector

ṽ^a(0) = λv̄^a(0) + (1−λ)v^{a∗},   (22)

where λ ∈ (0,1). Since g_{ij}(w) is a concave function over w ∈ S₁ and v̄^a(0), v^{a∗} ∈ S₁, we have g(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λg(v̄^a(0)) + (1−λ)g(v^{a∗}), or equivalently g(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λv̄^a(1) + (1−λ)v^{a∗}. According to P2), applying g(·) to this inequality gives g^(2)(λv̄^a(0) + (1−λ)v^{a∗}) ≥ g(λv̄^a(1) + (1−λ)v^{a∗}). Because of the concavity of g_{ij}(w), it is known that g(λv̄^a(1) + (1−λ)v^{a∗}) ≥ λv̄^a(2) + (1−λ)v^{a∗}. Thus, we have g^(2)(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λv̄^a(2) + (1−λ)v^{a∗}. By induction, it can be inferred that g^(t)(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λv̄^a(t) + (1−λ)v^{a∗} for all t ≥ 0. With ṽ^a(0) = λv̄^a(0) + (1−λ)v^{a∗} given by (22), we obtain

ṽ^a(t) ≥ λv̄^a(t) + (1−λ)v^{a∗},   (23)

where ṽ^a(t) ≜ g^(t)(ṽ^a(0)). Since v̄^a(0), v^{a∗} ∈ S₁ and S₁ is a convex set as indicated in P4), we have ṽ^a(0) = λv̄^a(0) + (1−λ)v^{a∗} ∈ S₁. According to P6), ṽ^a(t) ≜ g^(t)(ṽ^a(0)) is a monotonically non-decreasing sequence. Furthermore, according to P5), ṽ^a(t) is also upper bounded by 0. Thus, ṽ^a(t) converges to a fixed point, which is denoted as ṽ^{a∗}. Taking limits on both sides of (23) gives

ṽ^{a∗} ≥ λv̄^{a∗} + (1−λ)v^{a∗} = v̄^{a∗} + (1−λ)(v^{a∗} − v̄^{a∗}).   (24)
3) From v̄^a(0) < v̄^{a∗} ≤ v^{a∗} given by (21), we have v̄^{a∗} − v̄^a(0) > 0 and v^{a∗} − v̄^a(0) > 0. If λ ∈ (0,1) is close enough to 1, the relation (1−λ)(v^{a∗} − v̄^a(0)) < v̄^{a∗} − v̄^a(0) always holds, that is,

∃ λ ∈ (0,1), s.t. (1−λ)(v^{a∗} − v̄^a(0)) < v̄^{a∗} − v̄^a(0).   (25)

Rearranging the terms in (25) gives ∃ λ ∈ (0,1), s.t. λv̄^a(0) + (1−λ)v^{a∗} < v̄^{a∗}. Due to λv̄^a(0) + (1−λ)v^{a∗} = ṽ^a(0) given in (22), (25) implies that there always exists a λ ∈ (0,1) such that ṽ^a(0) < v̄^{a∗}. Now, applying g(·) to ṽ^a(0) < v̄^{a∗} for t times, from P2), we obtain g^(t)(ṽ^a(0)) ≤ g^(t)(v̄^{a∗}), or equivalently ṽ^a(t) ≤ v̄^{a∗}. Taking the limit on both sides of ṽ^a(t) ≤ v̄^{a∗}, it can be inferred that

∃ λ ∈ (0,1), s.t. ṽ^{a∗} ≤ v̄^{a∗}.   (26)
Now, we make use of (21), (24) and (26) to prove that the convergence limit v̄^{a∗} is identical to the convergence limit v^{a∗}. Combining (21) and (26) gives

∃ λ ∈ (0,1), s.t. ṽ^{a∗} ≤ v̄^{a∗} ≤ v^{a∗}.   (27)

From the second inequality of (27), we have (1−λ)(v^{a∗} − v̄^{a∗}) ≥ 0. Substituting this relation into (24), we can infer that ṽ^{a∗} ≥ v̄^{a∗}. Comparing this result with the first inequality of (27), we obtain ṽ^{a∗} = v̄^{a∗}. Due to ṽ^{a∗} = v̄^{a∗}, (24) becomes (1−λ)(v^{a∗} − v̄^{a∗}) ≤ 0. For λ ∈ (0,1), this relation is equivalent to v^{a∗} ≤ v̄^{a∗}. Combining it with the second inequality in (27), we obtain

v̄^{a∗} = v^{a∗}.   (28)
Next, we will extend the initialization set to the whole set A = {w ≥ w0 | w0 ∈ int(S1)}. For any va(0) ∈ A, we can always find a val(0) ∈ int(S1) and a vau(0) ≥ 0 such that val(0) ≤ va(0) ≤ vau(0). According to P2), applying g(·) to this inequality repeatedly gives g(t)(val(0)) ≤ g(t)(va(0)) ≤ g(t)(vau(0)). Since val(t) = g(t)(val(0)) and vau(t) = g(t)(vau(0)) both converge to va∗, va(t) converges to va∗, too.
Now, consider the case int(S1) = ∅ but S1 ≠ ∅. In this case, according to the discussion after (20), A = {w ≥ va∗}. For any va(0) ∈ A, we can always find a vau(0) ∈ {w ≥ 0} such that va∗ ≤ va(0) ≤ vau(0). Applying g(·) to this inequality for t times, from P2), we obtain va∗ ≤ g(t)(va(0)) ≤ g(t)(vau(0)). Since vau(t) converges to va∗, va(t) also converges to va∗.

On the other hand, if S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the message parameters va(t) cannot converge in this case.
B. Asynchronous Scheduling
In this section, convergence conditions under asynchronous scheduling schemes will be investigated. Modified from the synchronous updating equation in (12), the arriving precision in asynchronous scheduling is updated as

vai→j(t + 1) = −p²ij / (pii + ∑k∈N(i)\j vak→i(τak→i(t))), ∀ t ∈ Ti→j, (29)

where Ti→j is the set of time instants at which vai→j(t) is updated, and 0 ≤ τak→i(t) ≤ t is the last updated time instant of vak→i(t) at time t. In this paper, we only consider the totally asynchronous scheduling defined as follows.

Definition 1. (Totally Asynchronous Scheduling) [27]: The sets Ti→j are infinite, and τai→j(t) satisfies 0 ≤ τai→j(t) ≤ t and lim t→∞ τai→j(t) = ∞ for all (i, j) ∈ E.
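To make the scheduling concrete, the following sketch (our illustration, not code from [27]; all names are hypothetical) refreshes one randomly chosen directed edge per step with (29), always reading the latest stored values — a simple special case of the totally asynchronous scheduling in Definition 1:

```python
import random
import numpy as np

def async_bp_variances(P, num_steps=6000, seed=0):
    """At each step, refresh one directed edge i->j via the update (29) using
    the most recently stored v^a_{k->i}; initialization is v^a(0) = 0."""
    rng = random.Random(seed)
    n = P.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(num_steps):
        i, j = rng.choice(edges)
        s = P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i and k != j)
        v[(i, j)] = -P[i, j] ** 2 / s                 # update (29)
    # belief variances via (11): sigma_i^2 = 1/(p_ii + sum_k v^a_{k->i})
    return [1.0 / (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i))
            for i in range(n)]
```

Consistent with Theorem 3 below, when S1 ≠ ∅ different schedulings (here, different random seeds) drive the messages to the same fixed point.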
In order to prove the convergence of va(t) under asynchronous scheduling given by Definition
1, we need to find a sequence of nonempty sets {U(m)} for m ≥ 0 such that the following two
sufficient conditions are satisfied [27]:
• Synchronous Convergence Condition: the sequence of sets satisfies 1) U(m + 1) ⊆ U(m); 2) g(w) ∈ U(m + 1) for all w ∈ U(m); 3) U(m) converges to a fixed point;
• Box Condition: U(m) can be written as the product of subsets of individual components Ui→j(m) for all (i, j) ∈ E.
While the formal proof of asynchronous convergence under these conditions is provided in [27], we give a brief intuitive explanation below. Suppose that when t ≥ tm, vai→j(t) ∈ Ui→j(m) for all (i, j) ∈ E, or, expressed using the Box Condition, va(t) ∈ U(m) for t ≥ tm. Now, consider the updating of an individual component vai→j(t). Due to lim t→∞ τak→i(t) = ∞, we can always find a t(i→j)m+1 > tm such that when t ≥ t(i→j)m+1, we have τak→i(t) ≥ tm and thereby vak→i(τak→i(t)) ∈ Uk→i(m) for all k ∈ N(i)\j. Putting vak→i(τak→i(t)) into the function gij(·) in (29) and applying condition 2) of the Synchronous Convergence Condition, we have vai→j(t + 1) ∈ Ui→j(m + 1) for t ≥ t(i→j)m+1. Thus, by choosing tm+1 = max(i,j)∈E t(i→j)m+1, we have that when t ≥ tm+1, vai→j(t) ∈ Ui→j(m + 1) for all (i, j) ∈ E. Further, due to conditions 1) and 3) of the Synchronous Convergence Condition, we know that va(t) will gradually converge to a fixed point.
Theorem 3. va(t) converges for any va(0) ∈ A and all choices of schedulings if and only if S1 ≠ ∅.
Proof: We first prove the sufficient condition. Since S1 ≠ ∅, as given in the proof of Theorem 2, for any va(0) ∈ A, there always exist a val(0) ∈ int(S1) ∪ {va∗} and a vau(0) ≥ 0 such that val(0) ≤ va(0) ≤ vau(0). Construct a sequence of sets as

U(m) = {w | val(m) ≤ w ≤ vau(m)}. (30)

Since val(0) ∈ int(S1) ∪ {va∗}, we have val(m) ≤ val(m + 1) according to P6). Further, for any vau(0) ≥ 0, putting vau(0) into (14), we have vau(1) < 0, and thereby vau(1) < vau(0). Since vau(m) converges according to Theorem 2, vau(m) ∈ W for all m ≥ 0. Thus, according to P2), by applying g(·) to vau(1) < vau(0) and by induction, we have vau(m + 1) ≤ vau(m). By exploiting val(m) ≤ val(m + 1), vau(m + 1) ≤ vau(m) and the definition of U(m) in (30), we have

U(m + 1) ⊆ U(m). (31)

Now, for any w ∈ U(m), by applying g(·) to val(m) ≤ w ≤ vau(m), we obtain val(m + 1) ≤ g(w) ≤ vau(m + 1). Thus, from the definition of U(m) in (30), we have the relation

g(w) ∈ U(m + 1), ∀ w ∈ U(m). (32)
Furthermore, according to Theorem 2, we have lim m→∞ val(m) = lim m→∞ vau(m) = va∗, hence U(m) will converge to the fixed point va∗. Thus, the set U(m) satisfies the Synchronous Convergence Condition [27]. Furthermore, according to the definition of U(m) in (30), va(m) ∈ U(m) means val,i→j(m) ≤ vai→j(m) ≤ vau,i→j(m) for all (i, j) ∈ E, that is, vai→j(m) ∈ Ui→j(m) with Ui→j(m) = {wij | val,i→j(m) ≤ wij ≤ vau,i→j(m)}. Thus, U(m) can be represented in the form of a product of subsets of individual components Ui→j(m) for all (i, j) ∈ E, and thereby the Box Condition is satisfied. Hence, the sufficient condition is proved.

On the other hand, if S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the message parameters va(t) cannot converge in this case.
From Theorems 2 and 3, it can be seen that the convergence condition of va(t) under
asynchronous scheduling is the same as that under synchronous scheduling.
V. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF BELIEF VARIANCE σ²i(t)
Although the convergence conditions for va(t) are derived in the last section, what we are really interested in is the convergence condition for the belief variance σ²i(t). According to (11), the variance σ²i(t) and va(t) are related by σ²i(t) = 1/(pii + ∑k∈N(i) vak→i(t)), or equivalently 1/σ²i(t) = pii + ∑k∈N(i) vak→i(t).
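As a numerical illustration of this relation (the code is our own sketch; function names are hypothetical), the snippet below iterates the synchronous recursion (12) from va(0) = 0 and then forms the belief variances through (11). For a tree-structured model, where BP is exact, the result coincides with the diagonal of P⁻¹:

```python
import numpy as np

def sync_bp_belief_variances(P, num_iter=50):
    """Synchronous variance-message recursion (12) from v^a(0) = 0, followed by
    the belief-variance relation (11): sigma_i^2 = 1/(p_ii + sum_k v^a_{k->i})."""
    n = P.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(num_iter):
        v = {(i, j): -P[i, j] ** 2 /
             (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i and k != j))
             for (i, j) in edges}
    return np.array([1.0 / (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i))
                     for i in range(n)])

# chain 0-1-2 (a tree): belief variances equal the diagonal of P^{-1}
P = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
print(sync_bp_belief_variances(P))   # approximately [1.2133, 1.3333, 1.12]
```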
A. Convergence Condition
In order to investigate the convergence condition of σ²i(t), we first present the following lemma.

Lemma 3. If S1 ≠ ∅ and va(0) ∈ A, then either lim t→∞ 1/σ²i(t) > 0 for all i ∈ V or lim t→∞ 1/σ²i(t) = 0 for all i ∈ V.

Proof: In the proof, we first consider the special case of a tree-structured factor graph as well as the impact of initializations. Then, for the case of a loopy factor graph, we prove that there exists at least one node γ ∈ V such that lim t→∞ 1/σ²γ(t) ≥ 0. At last, we demonstrate that either lim t→∞ 1/σ²γ(t) > 0 for all γ ∈ V or lim t→∞ 1/σ²γ(t) = 0 for all γ ∈ V holds.

First, if the factor graph has a tree structure, according to the properties of BP in a tree-structured factor graph, it is known that lim t→∞ σ²γ(t) = [P⁻¹]γγ [1]. Due to P⁻¹ ≻ 0, we have lim t→∞ σ²γ(t) = [P⁻¹]γγ > 0, and thereby lim t→∞ 1/σ²γ(t) > 0 for all γ ∈ V. In the following, we focus on factor graphs containing loops.
If S1 ≠ ∅, according to Theorem 2, va(t) converges to the same point for any initialization va(0) ∈ A. Due to 1/σ²i(t) = pii + ∑k∈N(i) vak→i(t), the limit lim t→∞ 1/σ²i(t) always exists and is identical for all va(0) ∈ A. Therefore, we only need to consider a specific initialization, e.g., va(0) = 0.

Second, we prove lim t→∞ 1/σ²γ(t) ≥ 0 for at least one node γ ∈ V. Since the factor graph contains loops, we can always find a node γ such that |N(γ)| ≥ 2, where |·| denotes the cardinality of a set. Denote nodes i, j as two neighbors of node γ, that is, i, j ∈ N(γ). Since va(0) = 0, from (63), we have va∗ ≤ va(t). Applying this relation to 1/σ²γ(t) = pγγ + ∑k∈N(γ)\i vak→γ(t) + vai→γ(t) gives

1/σ²γ(t) ≥ pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ. (33)

By further using va∗ ≤ va(t), we can easily obtain

pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ = pγγ + ∑k∈N(γ)\{i,j} vak→γ(t) + va∗i→γ + vaj→γ(t) ≥ pγγ + ∑k∈N(γ)\j va∗k→γ + vaj→γ(t). (34)
It can be further shown that

pγγ + ∑k∈N(γ)\j va∗k→γ + vaj→γ(t)
= [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) + ((pjj + ∑k∈N(j)\γ vak→j(t − 1)) / (pγγ + ∑k∈N(γ)\j va∗k→γ)) · vaj→γ(t)]
(a)= [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) − p²γj/(pγγ + ∑k∈N(γ)\j va∗k→γ)]
(b)= [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) + va∗γ→j], (35)

where the equality (a) holds because (pjj + ∑k∈N(j)\γ vak→j(t − 1)) vaj→γ(t) = −p²γj from (12), and (b) is obtained by using the limiting form of (12), va∗γ→j = −p²γj/(pγγ + ∑k∈N(γ)\j va∗k→γ). Combining (34) and (35) gives

pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ ≥ [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) + va∗γ→j]. (36)
Due to va∗ ∈ S1 from Lemma 2, we have va∗ ∈ W. From the definition of W in (15), for all (ν0, ν1) ∈ E, we have

pν0ν0 + ∑k∈N(ν0)\ν1 va∗k→ν0 > 0. (37)

By using (37) and va(t) ≥ va∗ implied by (63), we can infer pν0ν0 + ∑k∈N(ν0)\ν1 vak→ν0(t) > 0 for all (ν0, ν1) ∈ E, and thereby

(pν0ν0 + ∑k∈N(ν0)\ν1 va∗k→ν0) / (pν1ν1 + ∑k∈N(ν1)\ν0 vak→ν1(t)) > 0. (38)
Since the factor graph contains loops, there always exists a walk (j0, j1, · · · , jt−1, jt, i) with jt = γ such that (jℓ−1, jℓ) ∈ E and |N(jℓ)| ≥ 2 for ℓ = 0, 1, · · · , t. Then, using (36) repeatedly on such a walk gives

pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ ≥ [∏ℓ=0…t−1 (pjℓ+1jℓ+1 + ∑k∈N(jℓ+1)\jℓ va∗k→jℓ+1) / (pjℓjℓ + ∑k∈N(jℓ)\jℓ+1 vak→jℓ(ℓ))] · [pj0j0 + ∑k∈N(j0)\j1 vak→j0(0) + va∗j1→j0]
= (pj0j0 + va∗j1→j0) ∏ℓ=0…t−1 (pjℓ+1jℓ+1 + ∑k∈N(jℓ+1)\jℓ va∗k→jℓ+1) / (pjℓjℓ + ∑k∈N(jℓ)\jℓ+1 vak→jℓ(ℓ)), (39)

where the last equality holds due to va(0) = 0. Due to |N(j0)| ≥ 2, there exists a node ν such that j1 ∈ N(j0)\ν. By exploiting the fact that va∗ < 0 due to va∗ ∈ S1, we obtain pj0j0 + va∗j1→j0 ≥ pj0j0 + ∑k∈N(j0)\ν va∗k→j0. Combining with (37), we can infer that

pj0j0 + va∗j1→j0 > 0. (40)

Putting (38) and (40) into (39), we obtain pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ > 0. Applying this result to (33) gives 1/σ²γ(t) > 0 for all t ≥ 0, and thereby lim t→∞ 1/σ²γ(t) ≥ 0.
At last, we will prove that either lim t→∞ 1/σ²γ(t) > 0 for all γ ∈ V or lim t→∞ 1/σ²γ(t) = 0 for all γ ∈ V. Notice that the relation in (35) holds for any (γ, j) ∈ E. Thus, taking the limit on both sides of (35) and recognizing that lim t→∞ 1/σ²γ(t) = pγγ + ∑k∈N(γ) va∗k→γ and lim t→∞ 1/σ²j(t) = pjj + ∑k∈N(j) va∗k→j, we obtain

lim t→∞ 1/σ²γ(t) = [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ va∗k→j)] · lim t→∞ 1/σ²j(t). (41)
Since pγγ + ∑k∈N(γ)\j va∗k→γ > 0 for all (γ, j) ∈ E as given in (37), according to (41), if lim t→∞ 1/σ²γ(t) > 0, then lim t→∞ 1/σ²j(t) > 0 for all nodes j ∈ N(γ), and vice versa. Similarly, if lim t→∞ 1/σ²γ(t) = 0, then lim t→∞ 1/σ²j(t) = 0 for all nodes j ∈ N(γ), and vice versa. Recall that we have proved that there exists a node γ with lim t→∞ 1/σ²γ(t) ≥ 0. Therefore, for a connected factor graph, either lim t→∞ 1/σ²i(t) > 0 for all i ∈ V or lim t→∞ 1/σ²i(t) = 0 for all i ∈ V.
Obviously, if lim t→∞ 1/σ²i(t) > 0, then lim t→∞ σ²i(t) exists, and hence the convergence condition of σ²i(t) is the same as that of 1/σ²i(t). However, Lemma 3 reveals that the scenario lim t→∞ 1/σ²i(t) = 0 cannot be excluded. Thus, we have the following theorem.
Theorem 4. σ²i(t), i ∈ V, converges to the same positive value for all va(0) ∈ A under both synchronous and asynchronous schedulings if and only if S ≠ ∅, where

S ≜ {w | w ∈ S1 and pii + ∑k∈N(i) wki > 0, ∀ i ∈ V}. (42)
Proof: First, we prove that if S ≠ ∅, then σ²i(t) converges for any va(0) ∈ A. From the definition of S in (42), if S ≠ ∅, then S1 ≠ ∅. According to Theorems 2 and 3, va(t) converges to va∗ for all va(0) ∈ A under both synchronous and asynchronous schedulings. Thus, 1/σ²i(t) = pii + ∑k∈N(i) vak→i(t) converges to pii + ∑k∈N(i) va∗k→i. Due to S ≠ ∅, there exists a w such that w ∈ S1 and pii + ∑k∈N(i) wki > 0 for all i ∈ V. Due to w ∈ S1, it can be inferred from (63) that w ≤ va∗. Putting this result into pii + ∑k∈N(i) wki > 0, we obtain pii + ∑k∈N(i) va∗k→i > 0, so its inverse exists. Thus, if S ≠ ∅, the variance σ²i(t) = 1/(pii + ∑k∈N(i) vak→i(t)) converges to 1/(pii + ∑k∈N(i) va∗k→i) > 0.
Next, we prove by contradiction that if σ²i(t) converges for all va(0) ∈ A, then S ≠ ∅. Suppose that S = ∅. This assumption contains two cases: 1) S1 = ∅; 2) S1 ≠ ∅ but, for every w ∈ S1, pii + ∑k∈N(i) wki ≤ 0 for some i ∈ V. First, consider the case S1 = ∅. Due to S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot maintain the Gaussian form, hence va(t) becomes undefined. Therefore, the variance σ²i(t) cannot converge in this case, which contradicts the prerequisite that σ²i(t) converges. Second, consider the case S1 ≠ ∅. Due to S1 ≠ ∅, according to Theorems 2 and 3, va(t) converges to va∗ ∈ S1 for any va(0) ∈ A. According to Lemma 3, we further have lim t→∞ 1/σ²i(t) = pii + ∑k∈N(i) va∗k→i > 0 for all i ∈ V or lim t→∞ 1/σ²i(t) = pii + ∑k∈N(i) va∗k→i = 0 for all i ∈ V. If pii + ∑k∈N(i) va∗k→i > 0 for all i ∈ V, combining with va∗ ∈ S1, we can infer from the definition of S in (42) that va∗ ∈ S, which contradicts the assumption S = ∅. On the other hand, if pii + ∑k∈N(i) va∗k→i = 0 for all i ∈ V, obviously, the variance σ²i(t) = 1/(pii + ∑k∈N(i) vak→i(t)) cannot converge, which contradicts the prerequisite that σ²i(t) converges. In summary, we have proved that if σ²i(t) converges for all va(0) ∈ A, then S = ∅ cannot hold, or equivalently we must have S ≠ ∅.
Remark 1: Due to the belief mean µi(t) = (hi + ∑k∈N(i) βak→i(t)) σ²i(t), if the convergence condition of the belief variance σ²i(t) is already known, the convergence of µi(t) is determined by that of the message parameters βak→i(t). As pointed out in [24], βak→j(t) is updated in accordance with a set of linear equations upon the convergence of the belief variances. Thus, after deriving the convergence condition of σ²i(t), we can apply the theory of linear equations to analyze the convergence condition of the belief mean µi(t). However, the analysis of the convergence condition of the belief mean µi(t) would be lengthy, and it will be investigated in the future.
B. Verification Using Semi-definite Programming
Next, we show that whether S is empty can be determined by solving the following SDP problem [30]:

min_{w, α}  α   (43)
s.t.  [pii + ∑k∈N(i)\j wki , pij ; pij , −wij] ≽ 0, ∀ (i, j) ∈ E;
      α + pii + ∑k∈N(i) wki ≥ 0, ∀ i ∈ V.

The SDP problem (43) can be solved efficiently by existing software packages such as CVX [31] and SeDuMi [32].
Theorem 5. S ≠ ∅ if and only if the optimal solution α∗ of (43) satisfies α∗ < 0.
Proof: First, notice that the SDP problem in (43) is equivalent to the following optimization problem:

min_{w, α}  α   (44)
s.t.  wij − gij(w) ≤ 0, ∀ (i, j) ∈ E;
      −pii − ∑k∈N(i)\j wki ≤ 0, ∀ (i, j) ∈ E;
      −pii − ∑k∈N(i) wki ≤ α, ∀ i ∈ V.
If S ≠ ∅, according to the definition of S in (42), there must exist a w such that w ≤ g(w), w ∈ W, and pii + ∑k∈N(i) wki > 0 for all i ∈ V. Obviously, these three conditions are equivalent to wij − gij(w) ≤ 0 and −pii − ∑k∈N(i)\j wki < 0 for all (i, j) ∈ E, together with −pii − ∑k∈N(i) wki < 0 for all i ∈ V. Thus, by defining α = max i∈V (−pii − ∑k∈N(i) wki), it can be seen that (w, α) satisfies the constraints in (44). Due to −pii − ∑k∈N(i) wki < 0 for all i ∈ V, we have α < 0. Since (w, α) is a feasible point of the minimization problem in (44), the optimal value of (44) cannot be greater than α, hence it can be inferred that α∗ < 0.
Next, we prove that the reverse also holds. If (w∗, α∗) is the optimal solution of (44) with α∗ < 0, then (w∗, α∗) must satisfy the constraints in (44), thus the following three conditions hold: 1) w∗ij − gij(w∗) ≤ 0; 2) −pii − ∑k∈N(i)\j w∗ki ≤ 0; 3) −pii − ∑k∈N(i) w∗ki ≤ α∗. For the second constraint, if −pii − ∑k∈N(i)\j w∗ki = 0, the function gij(w∗) = −p²ij/(pii + ∑k∈N(i)\j w∗ki) becomes undefined, so −pii − ∑k∈N(i)\j w∗ki = 0 can never happen. Hence, we always have −pii − ∑k∈N(i)\j w∗ki < 0. For the third constraint, due to α∗ < 0, it can be inferred that −pii − ∑k∈N(i) w∗ki < 0. Now, comparing w∗ij − gij(w∗) ≤ 0, −pii − ∑k∈N(i)\j w∗ki < 0 and −pii − ∑k∈N(i) w∗ki < 0 with the definition of the set S in (42), we have w∗ ∈ S, and hence S ≠ ∅.
Remark 2: Using the alternating direction method of multipliers (ADMM) [27], the SDP problem in (43) can be reformulated into N locally connected low-dimensional SDP sub-problems, and thus can be solved distributively. Not only does this avoid gathering information at a central processing unit, but the complexity is also reduced from O((∑Ni=1 |N(i)|)⁴) for directly solving the SDP to O(∑Ni=1 |N(i)|⁴) per iteration of the ADMM technique. Furthermore, since what we are really interested in is whether S is empty, ADMM can stop updating immediately once an intermediate vector is found to lie within S. Thus, the required number of iterations can be reduced significantly. Because the derivation of ADMM is well documented in [27], [33], we do not give the details here. It should be emphasized that, despite the low per-iteration complexity of ADMM, the number of required iterations is unknown, so it cannot be proved that the overall complexity is always lower than that of direct matrix inversion, O(N³).
Remark 3: As discussed after (20), we know that A = {w ≥ w0 | w0 ∈ int(S1)} if int(S1) ≠ ∅. In particular, with a proof similar to that of Theorem 5, it can be shown that if the SDP

min_{w, α}  α   (45)
s.t.  [pii + ∑k∈N(i)\j wki , pij ; pij , α − wij] ≽ 0, ∀ (i, j) ∈ E

has an optimal solution α∗ < 0, then the optimal w∗ must be a point in int(S1).
VI. RELATIONSHIP WITH THE CONDITION BASED ON COMPUTATION TREE
In this section, we will establish the relation between the proposed convergence condition and the one proposed in [24], which is the best currently available. For the convergence condition in [24], the original PDF in (1) is first transformed into the normalized form

f(x̃) ∝ exp{−½ x̃ᵀP̃x̃ + h̃ᵀx̃}, (46)

where x̃ = diag^{1/2}(P) x, P̃ = diag^{−1/2}(P) P diag^{−1/2}(P) and h̃ = diag^{−1/2}(P) h, with diag(P) denoting the diagonal matrix obtained by retaining only the diagonal elements of P. Then, Gaussian BP is carried out with P̃ and h̃ using the updating equations in (6), (7), (9) and (10). Under the normalized P̃ and h̃, we denote the corresponding message parameters as ṽa(t) and β̃a(t). With ṽa(t), the corresponding belief variance can be calculated using (11) as σ̃²i(t) = 1/(p̃ii + ∑k∈N(i) ṽak→i(t)). The variances of the original PDF f(x) can be recovered as σ̃²i(t)/pii. Moreover, under the normalized P̃, the sets W̃, S̃1, Ã and S̃ can also be defined similarly to (15), (17), (20) and (42), respectively.
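A minimal numpy sketch of this normalization (the function name is ours), with the variance-recovery relation checked against direct inversion:

```python
import numpy as np

def normalize_precision(P, h):
    """Form the normalized model of (46): P_tilde = D^{-1/2} P D^{-1/2} has unit
    diagonal, and original variances equal the normalized ones divided by p_ii."""
    d = np.sqrt(np.diag(P))               # diag^{1/2}(P) as a vector
    P_tilde = P / np.outer(d, d)
    h_tilde = h / d
    return P_tilde, h_tilde

P = np.array([[4.0, 1.0], [1.0, 2.0]])
h = np.array([1.0, -1.0])
P_t, h_t = normalize_precision(P, h)
# diag(inv(P_tilde)) / diag(P) reproduces diag(inv(P))
```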
By noticing that the message-passing process of Gaussian BP in the factor graph can be translated exactly into message passing in a computation tree, [24] proposed a convergence condition for the belief variance σ̃²γ(t) in terms of the computation tree. A computation tree T(γ, ∆) is constructed by choosing one of the variable nodes γ as the root node and N(γ) as its direct descendants, connected through the factor nodes fγ,ν(x̃γ, x̃ν) = exp{−p̃γν x̃γ x̃ν}, where ν is the index of any descendant. For each descendant ν, the neighbors of ν excluding its parent node are connected as next-level descendants, also through the corresponding factor nodes. The process is repeated until the computation tree T(γ, ∆) has ∆ + 1 layers of variable nodes (including the root node). The computation tree is completed by further connecting a factor node fi(x̃i) = exp{−½x̃²i + h̃ix̃i} to each variable node x̃i. Furthermore, a computation sub-tree Ts(γ, ν, ∆) is obtained by cutting the branch of x̃ν in the first layer from T(γ, ∆). An example of the computation tree T(1, 4) as well as the computation sub-tree Ts(1, 4, 4), corresponding to a fully connected factor graph with four variables, is illustrated in Fig. 1.

Fig. 1: Illustration of the computation tree T(1, 4) and the computation sub-tree Ts(1, 4, 4) within the dashed line.
By assigning to each variable node a new index i = 1, 2, · · · , |T(γ, ∆)|, the tree T(γ, ∆) can be viewed as a factor graph for a new PDF, where |T(γ, ∆)| denotes the number of variable nodes in T(γ, ∆). The new PDF represented by T(γ, ∆) is

fγ,∆(x̃γ,∆) ∝ exp{−½ x̃ᵀγ,∆ P̃γ,∆ x̃γ,∆ + h̃ᵀγ,∆ x̃γ,∆}, (47)

where x̃γ,∆ is a |T(γ, ∆)| × 1 vector, and P̃γ,∆ and h̃γ,∆ are the corresponding precision matrix and linear coefficient vector, respectively. According to the equivalence between the message-passing process in the original factor graph and that in the computation tree [6], [24], under the prerequisite of P̃γ,∆ ≻ 0 and ṽa(0) = 0, it is known that

σ̃²γ(∆) = [P̃⁻¹γ,∆]11. (48)
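The identity (48) can be checked numerically. The sketch below (our illustration; the helper names are not from [24]) grows the computation tree breadth-first — each copy of a variable spawns its neighbors except its parent — assembles the tree precision matrix for a small normalized loopy model, and compares its [·]11 inverse entry with the belief variance after ∆ synchronous iterations from ṽa(0) = 0:

```python
import numpy as np

def variance_messages(P, T):
    """T synchronous sweeps of the variance recursion (12), from v^a(0) = 0."""
    n = P.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(T):
        v = {(i, j): -P[i, j] ** 2 /
             (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i and k != j))
             for (i, j) in edges}
    return v

def tree_precision(P, root, depth):
    """Precision matrix of the computation tree T(root, depth) for a model with
    unit-diagonal (normalized) precision matrix P."""
    nodes = [(root, None)]            # (original index, parent position in tree)
    frontier = [0]
    for _ in range(depth):
        nxt = []
        for pos in frontier:
            i, ppos = nodes[pos]
            parent = None if ppos is None else nodes[ppos][0]
            for k in range(P.shape[0]):
                if k != i and k != parent and P[i, k] != 0:
                    nodes.append((k, pos))
                    nxt.append(len(nodes) - 1)
        frontier = nxt
    Pt = np.eye(len(nodes))           # unit diagonal from normalization
    for pos in range(1, len(nodes)):
        i, ppos = nodes[pos]
        Pt[pos, ppos] = Pt[ppos, pos] = P[i, nodes[ppos][0]]
    return Pt

# loopy, normalized (unit-diagonal) example
P = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
v = variance_messages(P, 3)
bp_var = 1.0 / (P[0, 0] + v[(1, 0)] + v[(2, 0)])
tree_var = np.linalg.inv(tree_precision(P, 0, 3))[0, 0]   # equals bp_var by (48)
```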
Similarly, the computation sub-tree Ts(γ, ν, ∆) can also be viewed as a factor graph, with P̃γ\ν,∆ being the corresponding symmetric precision matrix in the PDF. Furthermore, if (48) holds, we also have

σ̃²γ\ν(∆) = [P̃⁻¹γ\ν,∆]11, (49)

where σ̃²γ\ν(∆) ≜ 1/(p̃γγ + ∑k∈N(γ)\ν ṽak→γ(∆)). In [24, Proposition 25], it is proved that if lim ∆→∞ ρ(P̃γ,∆ − I) < 1, the variance σ̃²γ(t) converges for ṽa(0) = 0; but if lim ∆→∞ ρ(P̃γ,∆ − I) > 1, Gaussian BP becomes ill-posed. Notice that the limit lim ∆→∞ ρ(P̃γ,∆ − I) always exists and is identical for all γ ∈ V [24, Lemma 24]. The following lemma and theorem reveal the relation between the proposed convergence condition and the one based on the computation tree.

Lemma 4. If P̃γ,∆ ≻ 0 for all γ ∈ V and ∆ ≥ 0, then ṽa(t) converges to ṽa∗ ∈ S̃1 for ṽa(0) = 0.

Proof: See Appendix B.
Now, we give the following theorem.
Theorem 6. S̃ ≠ ∅ if and only if either of the following two conditions holds:

1) lim ∆→∞ ρ(P̃γ,∆ − I) < 1;
2) lim ∆→∞ ρ(P̃γ,∆ − I) = 1; p̃γγ + ∑k∈N(γ) ṽa∗k→γ ≠ 0 with ṽa(0) = 0; and P̃ν,∆ ≻ 0 for all ν ∈ V and ∆ ≥ 0.
Proof:

Sufficient Condition:

First, we prove that if lim ∆→∞ ρ(P̃γ,∆ − I) < 1, then S̃ ≠ ∅. With the monotonically increasing property of ρ(P̃γ,∆ − I) [24, Lemma 23] and the assumption that its limit is smaller than 1, we have ρ(P̃γ,∆ − I) ≤ ρu with ρu ≜ lim ∆→∞ ρ(P̃γ,∆ − I) ∈ (0, 1). Expressed in terms of eigenvalues, this reads −ρu ≤ λ(P̃γ,∆) − 1 ≤ ρu, or equivalently

1 − ρu ≤ λ(P̃γ,∆) ≤ 1 + ρu. (50)

For a symmetric P̃γ,∆, the eigenvalues of P̃⁻¹γ,∆ are {1/λ(P̃γ,∆)}, and (50) is equivalent to

1/(1 + ρu) ≤ λ(P̃⁻¹γ,∆) ≤ 1/(1 − ρu). (51)
Due to 0 < ρu < 1, it can be obtained from (51) that λ(P̃⁻¹γ,∆) > 0, that is,

P̃⁻¹γ,∆ ≻ 0, ∀ γ ∈ V and ∆ ≥ 0. (52)

Notice that a diagonal element of a symmetric matrix is always smaller than or equal to the matrix's maximum eigenvalue [34], that is, [P̃⁻¹γ,∆]ii ≤ λmax(P̃⁻¹γ,∆). Combining this with the fact that the diagonal elements of a positive definite matrix are positive, we obtain 0 < [P̃⁻¹γ,∆]ii ≤ λmax(P̃⁻¹γ,∆). Since the upper bound of (51) applies to all eigenvalues, we obtain

0 < [P̃⁻¹γ,∆]ii ≤ 1/(1 − ρu). (53)

Due to [P̃⁻¹γ,∆]11 = σ̃²γ(∆) from (48) and σ̃²γ(∆) = 1/(p̃γγ + ∑k∈N(γ) ṽak→γ(∆)), it can be inferred from (53) that p̃γγ + ∑k∈N(γ) ṽak→γ(∆) ≥ 1 − ρu for all ∆ ≥ 0. Due to P̃⁻¹γ,∆ ≻ 0 from (52), we have P̃γ,∆ ≻ 0, and according to Lemma 4, ṽa(t) converges to ṽa∗ ∈ S̃1. Thus, it can be inferred that p̃γγ + ∑k∈N(γ) ṽa∗k→γ ≥ 1 − ρu. With ρu < 1, we obtain

p̃γγ + ∑k∈N(γ) ṽa∗k→γ > 0, ∀ γ ∈ V. (54)

Combining (54) with ṽa∗ ∈ S̃1, it can be inferred from the definition that ṽa∗ ∈ S̃, thus S̃ ≠ ∅.
Next, we prove the second scenario. Due to P̃γ,∆ ≻ 0, according to Lemma 4, ṽa(t) converges to ṽa∗ ∈ S̃1, and thereby S̃1 ≠ ∅ and p̃γγ + ∑k∈N(γ) ṽa∗k→γ exists. Due to S̃1 ≠ ∅, according to Lemma 3, it is known that p̃γγ + ∑k∈N(γ) ṽa∗k→γ > 0 for all γ ∈ V or p̃γγ + ∑k∈N(γ) ṽa∗k→γ = 0 for all γ ∈ V. From the prerequisite p̃γγ + ∑k∈N(γ) ṽa∗k→γ ≠ 0, we can infer that p̃γγ + ∑k∈N(γ) ṽa∗k→γ > 0 holds for all γ ∈ V. Combining this with ṽa∗ ∈ S̃1, we can see that ṽa∗ ∈ S̃, thus S̃ ≠ ∅.
Necessary Condition:

In the following, we will first prove that the precision matrix of a computation tree can always be written in a special structure by re-indexing the nodes in the tree. Based on this special structure, the necessity is then proved by induction.

First, notice that changing the indexing scheme of the computation tree does not affect the positive definiteness of the corresponding precision matrix P̃γ,∆. So, we consider an indexing scheme such that the precision matrix P̃γ,∆+1 for ∆ ≥ 0 can be represented in the form

P̃γ,∆+1 = [ p̃γγ , aᵀk1\γ,∆ , aᵀk2\γ,∆ , · · · , aᵀk|N(γ)|\γ,∆ ;
ak1\γ,∆ , P̃k1\γ,∆ , 0 , · · · , 0 ;
ak2\γ,∆ , 0 , P̃k2\γ,∆ , · · · , 0 ;
⋮ , ⋮ , ⋮ , ⋱ , ⋮ ;
ak|N(γ)|\γ,∆ , 0 , 0 , · · · , P̃k|N(γ)|\γ,∆ ], (55)

where ki ∈ N(γ) for i = 1, 2, · · · , |N(γ)|, and aki\γ,∆ = [p̃kiγ, 0, · · · , 0]ᵀ is a vector of length |Ts(ki, γ, ∆)|. Notice that the (∆ + 1)-order computation tree T(γ, ∆ + 1) consists of the root node x̃γ and a set of ∆-order computation sub-trees Ts(ki, γ, ∆) with corresponding root nodes ki. The lower-right block-diagonal structure of (55) is easily obtained by assigning consecutive indices to the nodes inside each sub-tree Ts(ki, γ, ∆). Moreover, since there is only one connection from node γ to the root node ki of each sub-tree Ts(ki, γ, ∆), aki\γ,∆ contains only one nonzero element, p̃kiγ. Then, by assigning the smallest index to the root node in each Ts(ki, γ, ∆), the only nonzero element of aki\γ,∆ must be located at the first position. Therefore, the precision matrix P̃γ,∆+1 can be represented in the form of (55).
When ∆ = 0, obviously, P̃γ,∆ = p̃γγ > 0. Suppose P̃γ,∆ ≻ 0 for some ∆ ≥ 0; we need to prove P̃γ,∆+1 ≻ 0. From (55), it is clear that P̃γ,∆+1 ≻ 0 if and only if the following two conditions are satisfied [34, p. 472]:

Bdiag(P̃k1\γ,∆, · · · , P̃k|N(γ)|\γ,∆) ≻ 0, (56)

p̃γγ − ∑k∈N(γ) aᵀk\γ,∆ P̃⁻¹k\γ,∆ ak\γ,∆ > 0, (57)

where Bdiag(·) denotes the block diagonal matrix with its arguments located along the main diagonal. Due to P̃k,∆ ≻ 0 for all k ∈ V by assumption, its sub-matrices satisfy P̃k\γ,∆ ≻ 0 for all (k, γ) ∈ E, and thus the first condition (56) holds. On the other hand, for the second condition (57), we write

p̃γγ − ∑k∈N(γ) aᵀk\γ,∆ P̃⁻¹k\γ,∆ ak\γ,∆ (a)= p̃γγ − ∑k∈N(γ) p̃²kγ · σ̃²k\γ(∆) (b)= p̃γγ + ∑k∈N(γ) ṽak→γ(∆ + 1), (58)

where the equality (a) holds since ak\γ,∆ has only one nonzero element, p̃kγ, in the first position, and [P̃⁻¹k\γ,∆]11 = σ̃²k\γ(∆) from (49); and (b) holds due to −p̃²kγ · σ̃²k\γ(∆) = −p̃²kγ / (p̃kk + ∑k′∈N(k)\γ ṽak′→k(∆)) =
vak→γ(∆ + 1) given in (12). If S = ∅, according to Theorem 4, we have
p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} > 0.   (59)
Due to S ⊆ S1, S ≠ ∅ also implies S1 ≠ ∅. According to (63), it can be inferred that
va(t) ≥ va∗. Combining this with (59), we obtain p_{γγ} + ∑_{k∈N(γ)} v^a_{k→γ}(t) > 0. Substituting this result
into (58), it can be inferred that the second condition (57) holds as well. Thus, we have Pγ,∆ ≻ 0
for all γ ∈ V and ∆ > 0. Furthermore, from (59), it is obvious that p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} ≠ 0.
From Pγ,∆ ≻ 0, we obtain I − (I − Pγ,∆) ≻ 0, and hence 1 − λmax(I − Pγ,∆) > 0. Since Pγ,∆
represents a tree-structured factor graph, it is proved in [24, Proposition 15] that λmax(I − Pγ,∆) =
ρ(Pγ,∆ − I). Therefore, ρ(Pγ,∆ − I) < 1, and thereby

lim_{∆→∞} ρ(Pγ,∆ − I) ≤ 1.   (60)

Finally, if lim_{∆→∞} ρ(Pγ,∆ − I) < 1, then due to P_{γ,∆}^{-1} ≻ 0 from (52), we have Pγ,∆ ≻ 0. Together
with (54), it can be inferred that under the prerequisite lim_{∆→∞} ρ(Pγ,∆ − I) < 1, the conditions
Pγ,∆ ≻ 0 and p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} ≠ 0 are automatically satisfied. Therefore, if S ≠ ∅, we have
either lim_{∆→∞} ρ(Pγ,∆ − I) < 1, or lim_{∆→∞} ρ(Pγ,∆ − I) = 1 together with p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} ≠ 0 and Pγ,∆ ≻ 0.
From Theorems 4 and 6, it can be obtained that the variance σ^2_γ(t) converges when
lim_{∆→∞} ρ(Pγ,∆ − I) < 1 and diverges when lim_{∆→∞} ρ(Pγ,∆ − I) > 1, which is consistent with the
results in [24]. Moreover, it can be seen from Theorem 6 that lim_{∆→∞} ρ(Pγ,∆ − I) = 1 alone is
not sufficient to determine the convergence of the variance σ^2_γ(t). This fills in the gap of [24]
in the scenario lim_{∆→∞} ρ(Pγ,∆ − I) = 1. Albeit with conclusions similar to [24], we need to
emphasize that the criterion lim_{∆→∞} ρ(Pγ,∆ − I) < 1 proposed in [24] is not easy to check in practice
due to the infinite dimension of the computation tree, while our condition S ≠ ∅ can be verified
by solving the SDP problem given in Theorem 5. Moreover, the initialization is expanded from
the single choice va(0) = 0 in [24] to the much larger set va(0) ∈ A in this paper. This flexibility
in the choice of initialization can accelerate the convergence of the variance σ^2_i(t) if the
initialization is chosen close to the convergent point.
VII. NUMERICAL EXAMPLES
In this section, numerical experiments are presented to corroborate the theories in this paper.
The example is based on a 20 × 20 precision matrix P constructed as

p_{ij} = 1 if i = j, and p_{ij} = ζ · θ_{mod(i+j,10)+1} if i ≠ j,   (61)
where ζ is a coefficient indicating the correlation strength among variables, and θk is the k-th
element of the vector θ = [0.13, 0.10, 0.71, −0.05, 0, 0.12, 0.07, 0.11, −0.02, −0.03]^T. Varying
the correlation strength ζ generates a family of matrices, and the positive definiteness P ≻ 0
required by a valid PDF is guaranteed when ζ < 0.5978.
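The construction in (61) is straightforward to reproduce. The sketch below (function name ours; the threshold 0.5978 is quoted from the text) builds P for a given ζ and checks symmetry and positive definiteness numerically:

```python
import numpy as np

THETA = np.array([0.13, 0.10, 0.71, -0.05, 0, 0.12, 0.07, 0.11, -0.02, -0.03])

def build_precision(zeta, n=20):
    """Build the n x n precision matrix of (61):
    p_ij = 1 if i = j, and zeta * theta_{mod(i+j,10)+1} otherwise (1-based i, j)."""
    P = np.empty((n, n))
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # mod(i+j,10)+1 in 1-based indexing is THETA[(i+j) % 10] in 0-based.
            P[i - 1, j - 1] = 1.0 if i == j else zeta * THETA[(i + j) % 10]
    return P

P = build_precision(0.3)
assert np.allclose(P, P.T)                # symmetric, since i+j is symmetric in (i, j)
assert np.all(np.linalg.eigvalsh(P) > 0)  # P > 0 well below the threshold 0.5978
```

Since P(ζ) = I + ζQ for a fixed off-diagonal pattern Q, its eigenvalues are affine in ζ, so positive definiteness is lost exactly at the quoted threshold.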
Fig. 2 illustrates how the optimal solution α∗ of (43) varies with the correlation strength ζ.
It can be seen that the optimal solution α∗ exists and satisfies α∗ < 0 for all ζ ≤ 0.5859, while
the SDP problem (43) becomes infeasible when ζ > 0.5859. According to Theorem 4, this means
that if ζ ≤ 0.5859, the variance σ^2_i(t) with i ∈ V converges to the same point for all initializations
va(0) ∈ A under both synchronous and asynchronous schedulings. On the other hand, if
ζ > 0.5859, the variance σ^2_i(t) cannot converge.
To verify the convergence of the belief variances under ζ ≤ 0.5859, Fig. 3 shows how the variance
σ^2_1(t) of the first variable evolves as a function of t when ζ = 0.5858, which is slightly smaller
than 0.5859. It can be observed that the variance σ^2_1(t) converges to the same value under
both synchronous and asynchronous schedulings and under the different initializations va(0) = 0
and va(0) = w∗, where w∗ is the optimal solution of (45). For the asynchronous case, a scheduling
with a 30% chance of not updating the messages at each iteration is considered. On the other
hand, Fig. 4 verifies the divergence of the variance σ^2_1(t) when ζ = 0.5860, which is slightly larger
than 0.5859. In this figure, synchronous scheduling and the same initializations as those of Fig.
3 are used. It can be seen that σ^2_1(t) fluctuates as the iterations proceed and shows no sign of
convergence.
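The behaviour shown in Figs. 3 and 4 can be reproduced by iterating the variance messages directly. The sketch below is our own implementation of the standard synchronous variance-message update (in the spirit of (12)-(13), which are not reproduced in this excerpt), run from va(0) = 0 on the matrix of (61); a ζ well inside the convergent region is used so that convergence is fast:

```python
import numpy as np

THETA = np.array([0.13, 0.10, 0.71, -0.05, 0, 0.12, 0.07, 0.11, -0.02, -0.03])

def build_precision(zeta, n=20):
    # Precision matrix of (61): unit diagonal, zeta * theta_{mod(i+j,10)+1} off-diagonal.
    return np.array([[1.0 if i == j else zeta * THETA[(i + j) % 10]
                      for j in range(1, n + 1)] for i in range(1, n + 1)])

def bp_variances(P, iters=2000):
    """Synchronous Gaussian BP variance-message iteration from va(0) = 0.
    va[i, j] carries the message from node i to node j (0 on non-edges)."""
    n = P.shape[0]
    E = (P != 0) & ~np.eye(n, dtype=bool)       # edge indicator
    va = np.zeros((n, n))
    for _ in range(iters):
        tot = np.diag(P) + va.sum(axis=0)       # p_ii + sum_k va_{k->i}
        # message i->j: -p_ij^2 / (p_ii + sum over incoming messages except j->i)
        va = np.where(E, -P**2 / (tot[:, None] - va.T), 0.0)
    sigma2 = 1.0 / (np.diag(P) + va.sum(axis=0))  # belief variances
    return va, sigma2

P = build_precision(0.3)
va, sigma2 = bp_variances(P)
assert np.all(va <= 0) and np.all(sigma2 > 0)
```

Near ζ = 0.5859 the same iteration converges much more slowly (as in Fig. 3), and beyond it the iterates fail to settle (as in Fig. 4).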
VIII. CONCLUSIONS
In this paper, the necessary and sufficient convergence condition for the variances of Gaussian
BP was developed for synchronous and asynchronous schedulings. The initialization set of the
proposed condition is much larger than the usual choice of a single point of zero. It is proved that
[Figure: plot titled "Convergence Metric of Variance"; x-axis: correlation strength ζ, y-axis: convergence metric α; the curve passes through (ζ, α∗) = (0.5859, −0.2844), with a vertical marker at ζ = 0.5978.]

Fig. 2: The value of α∗ under different correlation strengths ζ.
[Figure: plot titled "Convergence of Variance with ζ = 0.5858"; x-axis: number of iterations t, y-axis: σ^2_1(t); curves for synchronous and asynchronous schedulings under the initializations va(0) = 0 and va(0) = w∗.]

Fig. 3: Illustration of the convergence of the variance σ^2_1(t) under different schedulings and
initializations, with w∗ being the optimal solution of (45).
the convergence condition can be verified efficiently by solving an SDP problem. Furthermore,
the relationship between the convergence condition proposed in this paper and the one based on
computation tree was established. This relationship fills in a missing piece of the result in [24]
for the case where the spectral radius of the computation tree equals one. Numerical examples
were presented to verify the proposed convergence conditions.
[Figure: plot titled "Divergence of Variance with ζ = 0.5860"; x-axis: number of iterations t, y-axis: σ^2_1(t); curves for the initializations va(0) = 0 and va(0) = w∗.]

Fig. 4: Illustration of the divergence of the variance σ^2_1(t), with w∗ being the optimal solution
of (45).
APPENDIX A
PROOF OF LEMMA 2
First, we prove that va(t) converges for any va(0) ≥ 0, given S1 ≠ ∅. For any w ∈ S1,
according to P5), we have w < 0. Thus, for any va(0) ≥ 0, the relation w ≤ va(0) always
holds. Notice that w ∈ W due to w ∈ S1 and S1 ⊆ W. Applying P2) to w ≤ va(0), we obtain
g(w) ≤ va(1). Combining this with w ≤ g(w) from P6) gives w ≤ va(1). On the other hand,
substituting va(0) ≥ 0 into (14) gives

va(1) < 0.   (62)

Due to va(0) ≥ 0, it follows that va(1) < va(0). Combining this with w ≤ va(1) gives w ≤ va(1) < va(0).
Applying g(·) to w ≤ va(1) < va(0), it can be inferred from P2) that g(w) ≤ va(2) < va(1).
Together with w ≤ g(w) as claimed by P6), we obtain w ≤ va(2) < va(1). By induction, we
can infer that

w ≤ va(t + 1) ≤ va(t).   (63)
It can be seen from (63) that va(t) is a monotonically non-increasing but lower bounded sequence,
thus it converges.
Next, we prove that va(t) converges to the same point for all va(0) ≥ 0. For any va(0) ≥ 0,
according to P2), applying g(·) on both sides of 0 ≤ va(0) gives g(0) ≤ g(va(0)) = va(1).
Combining this relation with (62) leads to g(0) ≤ va(1) ≤ 0. Applying g(·) to this inequality
t more times, it can be obtained from P2) that g^(t+1)(0) ≤ va(t + 1) ≤ g^(t)(0) for all t ≥ 0.
Denoting lim_{t→∞} g^(t)(0) = va∗, it is obvious that va(t) also converges to va∗.

Finally, since va∗ is a fixed point of g(·), we have va∗ = g(va∗). From (63), we obtain w ≤ va∗.
Due to w ∈ S1, we have w ∈ W by definition. Applying P1), it can be inferred that va∗ ∈ W.
Combining this with the fact va∗ = g(va∗), according to the definition of S1 in (17), we obtain
va∗ ∈ S1.
On the other hand, if S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed
in the factor graph cannot maintain the Gaussian form, and thus the message parameters
va(t) cannot converge in this case.
APPENDIX B
PROOF OF LEMMA 4
Due to Pγ,∆ ≻ 0, we have P_{γ,∆}^{-1} ≻ 0, and hence the diagonal elements [P_{γ,∆}^{-1}]_{ii} > 0 [34]. With
σ^2_γ(∆) = [P_{γ,∆}^{-1}]_{11} and 1/σ^2_γ(t) = p_{γγ} + ∑_{k∈N(γ)} v^a_{k→γ}(t), we have

p_{γγ} + ∑_{k∈N(γ)\ν} v^a_{k→γ}(t) > −v^a_{ν→γ}(t).   (64)
Next, we prove va(t) ∈ W for all t ≥ 0. Obviously, va(0) = 0 ∈ W. Suppose va(t) ∈ W
holds for some t ≥ 0; according to the definition of W in (15), this is equivalent to
p_{νν} + ∑_{k∈N(ν)\γ} v^a_{k→ν}(t) > 0. Substituting this inequality into (14) gives v^a_{ν→γ}(t + 1) < 0. Then,
applying v^a_{ν→γ}(t + 1) < 0 to (64) gives p_{γγ} + ∑_{k∈N(γ)\ν} v^a_{k→γ}(t + 1) > 0 for all (ν, γ) ∈ E, or
equivalently va(t + 1) ∈ W. Thus, we have va(t) ∈ W for all t ≥ 0.
Finally, substituting va(0) = 0 into (13) gives va(1) < 0, and thereby va(1) < va(0). From
va(t) ∈ W and P2), applying g(·) on both sides of va(1) < va(0) for t times gives

va(t + 1) ≤ va(t).   (65)

From va(0) = 0 and (65), we have va(t) ≤ 0. Substituting this into (64) gives v^a_{ν→γ}(t) > −p_{γγ}
for all (ν, γ) ∈ E. Together with (65), it can be seen that va(t) is a monotonically non-increasing
and lower bounded sequence. Thus, va(t) converges to a fixed point va∗ with va∗ = g(va∗).
Moreover, from va(1) < 0 and (65), we have va∗ < 0. Substituting this into the limiting form
of (64) gives p_{γγ} + ∑_{k∈N(γ)\ν} v^{a∗}_{k→γ} > 0 for all (ν, γ) ∈ E, or equivalently va∗ ∈ W. Combining
this with va∗ = g(va∗), we obtain va∗ ∈ S1 by the definition of S1.
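The monotone non-increasing behaviour (65) from the all-zero initialization can also be observed numerically. The following minimal sketch uses a small illustrative model (a 4-cycle with uniform couplings, not the paper's example) together with our own implementation of the standard variance-message update:

```python
import numpy as np

# Small illustrative model: a 4-cycle with unit diagonal and couplings 0.4.
P = np.eye(4)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    P[i, j] = P[j, i] = 0.4

E = (P != 0) & ~np.eye(4, dtype=bool)      # edge indicator
va = np.zeros((4, 4))                      # va(0) = 0
prev = va.copy()
for t in range(200):
    tot = np.diag(P) + va.sum(axis=0)      # p_ii + sum_k va_{k->i}
    va = np.where(E, -P**2 / (tot[:, None] - va.T), 0.0)
    assert np.all(va <= prev + 1e-12)      # monotone non-increasing, cf. (65)
    prev = va.copy()

# The limit is negative and lies in W, matching the lemma's conclusion.
assert np.all(va[E] < 0)
assert np.all(np.diag(P) + va.sum(axis=0) > 0)
```

By symmetry every edge message satisfies the scalar fixed-point equation x = −0.16/(1 + x), whose stable root is x = −0.2, which the iteration approaches from above.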
REFERENCES
[1] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory,
vol. 47, pp. 498-519, 2001.
[2] H. A. Loeliger, “An introduction to factor graphs,” IEEE Signal Process. Mag., vol. 21, pp. 28-41, 2004.
[3] H. A. Loeliger, J. Dauwels, H. Junli, S. Korl, P. Li, and F. R. Kschischang, “The factor graph approach to model-based
signal processing,” Proc. IEEE, vol. 95, no. 6, pp. 1295-1322, June 2007.
[4] C.M. Bishop, Pattern Recognition and Machine Learning. Springer, Aug. 2006.
[5] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[6] Y. Weiss and W. T. Freeman, “Correctness of belief propagation in Gaussian graphical models of arbitrary topology,” Neural
Comput., vol. 13, pp. 2173-2200, 2001.
[7] Y. Liu, V. Chandrasekaran, A. Anandkumar, and A. S. Willsky, “Feedback message passing for inference in Gaussian
graphical models,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4135-4150, 2012.
[8] A. Montanari, B. Prabhakar, and D. Tse, “Belief propagation based multiuser detection,” in Proc. IEEE Information Theory
Workshop, Punta del Este, Uruguay, Mar. 2006.
[9] Q. Guo and D. Huang, “EM-based joint channel estimation and detection for frequency selective channels using Gaussian
message passing,” IEEE Trans. Signal Process., vol. 59, pp. 4030-4035, 2011.
[10] Q. Guo and P. Li, “LMMSE turbo equalization based on factor graphs,” IEEE J. Sel. Areas Commun., vol. 26, pp. 311-319,
2008.
[11] Y. El-Kurdi, W. J. Gross, and D. Giannacopoulos, “Efficient implementation of Gaussian belief propagation solver for
large sparse diagonally dominant linear systems,” IEEE Trans. Magn., vol. 48, no. 2, pp. 471-474, Feb. 2012.
[12] O. Shental, P. H. Siegel, J. K. Wolf, D. Bickson, and D. Dolev, “Gaussian belief propagation solver for systems of linear
equations,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2008, pp. 1863-1867.
[13] X. Tan and J. Li, “Computationally efficient sparse Bayesian learning via belief propagation,” IEEE Trans. Signal Process.,
vol. 58, pp. 2010-2021, 2010.
[14] V. Chandrasekaran, J. K. Johnson, and A. S. Willsky, “Estimation in Gaussian graphical models using tractable subgraphs:
a walk-sum analysis,” IEEE Trans. Signal Process., vol. 56, pp. 1916-1930, 2008.
[15] N. Boon Loong, J. S. Evans, S. V. Hanly, and D. Aktas, “Distributed downlink beamforming with cooperative base stations,”
IEEE Trans. Inf. Theory, vol. 54, pp. 5491-5499, 2008.
[16] F. Lehmann, “Iterative mitigation of intercell interference in cellular networks based on Gaussian belief propagation,”
IEEE Trans. Veh. Technol., vol. 61, pp. 2544-2558, 2012.
[17] M. Leng and Y. C. Wu, “Distributed clock synchronization for wireless sensor networks using belief propagation,” IEEE
Trans. Signal Process., vol.59, no.11, pp. 5404-5414, Nov. 2011.
[18] A. Ahmad, D. Zennaro, E. Serpedin, and L. Vangelista, “A factor graph approach to clock offset estimation in wireless
sensor networks,” IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4244-4260, July 2012.
[19] M. Leng, T. Wee-Peng, and T. Q. S. Quek, “Cooperative and distributed localization for wireless sensor networks in
multipath environments,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 3125-
3128.
[20] Y. Song, B. Wang, Z. Shi, K. Pattipati, and S. Gupta, “Distributed algorithms for energy-efficient even self-deployment in
mobile sensor networks,” IEEE Trans. Mobile Comput., 2013, pp. 1-14.
[21] G. Zhang, W. Xu, and Y. Wang, “Fast distributed rate control algorithm with QoS support in Ad Hoc networks,” in Proc.
IEEE Global Telecommunications Conf. (Globecom), 2010.
[22] D. Dolev, A. Zymnis, S. P. Boyd, D. Bickson, and Y. Tock, “Distributed large scale network utility maximization,” in
Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2009, pp. 829-833.
[23] M. W. Seeger and D. P. Wipf, “Variational Bayesian inference techniques,” IEEE Signal Process. Mag., vol. 27, no. 6, pp.
81-91, Nov. 2010.
[24] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, “Walk-sums and belief propagation in Gaussian graphical models,” J.
Mach. Learn. Res., vol. 7, no.1, pp. 2031-2064, 2006.
[25] C. C. Moallemi and B. Van Roy, “Convergence of min-sum message passing for quadratic optimization,” IEEE Trans. Inf.
Theory, vol. 55, pp. 2413-2423, 2009.
[26] S. Tatikonda and M. I. Jordan, “Loopy belief propagation and Gibbs measures,” in Proc. 18th Annu. Conf. Uncertainty
Artif. Intell. (UAI-02), San Francisco, CA, Aug. 2002, pp.493-500.
[27] D.P. Bertsekas and J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ:
Prentice-Hall, 1989.
[28] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[29] C. C. Moallemi and B. Van Roy, “Convergence of Min-Sum Message Passing for Convex Optimization,” IEEE Trans. Inf.
Theory, vol. 56, no. 4, pp. 2041-2050, 2010.
[30] L. Vandenberghe and S. Boyd, “Semidefinite programming,” SIAM Rev., vol. 38, no. 1, pp. 49-95, Mar. 1996.
[31] M. Grant, S. Boyd, and Y. Ye, cvx: Matlab software for disciplined convex programming. [Online]. Available:
http://www.stanford.edu/~boyd/cvx/
[32] J. F. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optim. Methods Softw.,
vol. 11, no. 1-4, pp. 625-653, 1999.
[33] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating
direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, 2011.
[34] R. Horn and C.R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985.
Qinliang Su received the B.Eng. degree from Chongqing University in 2007 and the M.Eng. degree from
Zhejiang University in 2010 (both with honors). He is currently pursuing the Ph.D. degree with the
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong.
His current research interests include signal processing for wireless communications and networks; statistical
inference and learning; graphical models; distributed and parallel computation; and convex optimization
theories.
Yik-Chung Wu received the B.Eng. (EEE) degree in 1998 and the M.Phil. degree in 2001 from the
University of Hong Kong (HKU). He received the Croucher Foundation scholarship in 2002 to pursue
the Ph.D. degree at Texas A&M University, College Station, and graduated in 2005. From August 2005 to
August 2006, he was with Thomson Corporate Research, Princeton, NJ, as a Member of Technical
Staff. Since September 2006, he has been with HKU, currently as an Associate Professor. He was a
visiting scholar at Princeton University in the summer of 2011. His research interests are in the general area of
signal processing and communication systems, and in particular distributed signal processing and communications; optimization
theories for communication systems; estimation and detection theories in transceiver designs; and smart grid. Dr. Wu served as
an Editor for IEEE Communications Letters and is currently an Editor for IEEE Transactions on Communications and the Journal of
Communications and Networks.