1053-587X (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSP.2014.2345635, IEEE Transactions on Signal Processing
Convergence Analysis of the Variance in
Gaussian Belief Propagation
Qinliang Su and Yik-Chung Wu
Abstract
It is known that Gaussian belief propagation (BP) is a low-complexity algorithm for (approximately)
computing the marginal distributions of a high-dimensional Gaussian distribution. However, in a loopy
factor graph, it is important to determine whether Gaussian BP converges. In general, the convergence
conditions for the Gaussian BP variances and means are not necessarily the same, and this paper focuses
on the convergence condition of the Gaussian BP variances. In particular, by describing the message-passing
process of Gaussian BP as a set of updating functions, the necessary and sufficient convergence
condition of the Gaussian BP variances is derived under both synchronous and asynchronous schedulings,
with the converged variances proved to be independent of the initialization as long as it is chosen from
the proposed set. The necessary and sufficient convergence condition is further expressed in the form of a
semi-definite programming (SDP) optimization problem, and thus can be verified more efficiently than
the existing convergence condition based on the computation tree. The relationship between the proposed
convergence condition and the existing one based on the computation tree is also established analytically.
Numerical examples are presented to corroborate the established theories.
Index Terms
Gaussian belief propagation, loopy belief propagation, graphical model, factor graph, message
passing, sum-product algorithm, convergence.
Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other
purposes must be obtained from the IEEE by sending a request to [email protected].
This work was supported in part by the HKU seed Funding Programme, Project No. 201311159066.
The authors are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong,
(e-mail: [email protected], [email protected]).
July 31, 2014 DRAFT
I. INTRODUCTION
In signal processing and machine learning, many problems ultimately reduce to computing
the marginal mean and variance of an individual random variable from a high-dimensional
joint Gaussian probability density function (PDF). A direct way of marginalization
involves computing the inverse of the precision matrix of the joint Gaussian PDF. The
inverse operation is known to be computationally expensive for a high-dimensional matrix, and it
introduces heavy communication overhead when carried out in distributed scenarios.
By representing the joint PDF with a factor graph [1], Gaussian BP provides an alternative way to
(approximately) calculate the marginal mean and variance of each individual random variable
by passing messages between neighboring nodes in the factor graph [2]–[5]. It is known that if
the factor graph has a tree structure, the belief means and variances calculated with Gaussian BP
converge to the true marginal means and variances, respectively [1]. For a factor graph with
loops, if the belief means and variances in Gaussian BP converge, the true marginal means and
approximate marginal variances are obtained [6]. Recently, a novel message-passing algorithm
was proposed in [7] for inference in Gaussian graphical models by choosing a special set of
nodes to break the loops in the graph. Although this algorithm computes the true marginal means and
improves the accuracy of the variances, it still needs Gaussian BP as an underlying inference
algorithm. Thus, in this paper, we focus on the message-passing algorithm of Gaussian BP only.
With the ability to provide the true marginal means upon convergence, Gaussian BP has been
successfully applied in low-complexity detection and estimation problems arising in communi-
cation systems [8]–[10], fast solver for large sparse linear systems [11], [12], sparse Bayesian
learning [13], estimation in Gaussian graphical model [14], etc. In addition, the distributed
property inherited from message passing algorithms is particularly attractive for applications
requiring distributed implementation, such as distributed beamforming [15], inter-cell interference
mitigation [16], distributed synchronization and localization in wireless sensor networks [17]–
[19], distributed energy efficient self-deployment in mobile sensor networks [20], distributed rate
control in Ad Hoc networks [21], and distributed network utility maximization [22]. Moreover,
Gaussian BP is also exploited to provide approximate marginal variances for large-scale sparse
Bayesian learning in a computationally efficient way [23].
However, Gaussian BP only works under the prerequisite that the belief means and variances
calculated from the updating messages do converge. So far, several sufficient convergence conditions
have been proposed, which guarantee that the means and variances converge simultaneously
[6], [24], [25]. However, in general, the belief means and variances do not necessarily converge
under the same condition. It is reported in [24] that if the variances converge, convergence
of the means can always be observed when suitable damping is imposed. Thus, it is important
to ensure that the variances of Gaussian BP converge in the first place. In the pioneering work
[24], the message-passing procedure of Gaussian BP in a loopy factor graph is equivalently
translated into that in a loop-free computation tree [26]. Based on the computation tree, an
almost necessary and sufficient convergence condition for the variances is derived. It is proved that if
the spectral radius of the infinite-dimensional precision matrix of the computation tree is smaller
than one, the variances converge; and if this spectral radius is larger than one, Gaussian BP
becomes ill-posed. However, this condition is only 'almost necessary and sufficient' since it does
not cover the scenario where the spectral radius is equal to one. More crucially, this convergence
condition requires the evaluation of the spectral radius of an infinite-dimensional matrix, which
is almost impossible to calculate in practice.
In this paper, using the fact that the messages in Gaussian BP can be represented by linear
and quadratic parameters, the message-passing process of Gaussian BP is described as a set
of parameter updating functions. Based on the monotonically non-decreasing and concave
properties of the updating functions, the necessary and sufficient convergence condition of the
messages' quadratic parameters is derived first. Then, using the relation between the BP messages'
quadratic parameters and the belief variances, the necessary and sufficient convergence condition
of the belief variances is developed under both synchronous and asynchronous schedulings. The
initialization set under which the variances are guaranteed to converge is proposed as well.
The convergence condition derived in this paper is proved to be equivalent to a semi-definite
programming (SDP) problem, and thus can be easily verified. Furthermore, the relationship between
the proposed convergence condition and the existing one based on the computation tree is also
established. Our result fills in the missing part of the convergence condition in [24] for the case
in which the spectral radius of the infinite-dimensional matrix is equal to one.
The rest of this paper is organized as follows. Gaussian BP is reviewed in Section II. Section
III analyzes the updating process of Gaussian BP messages, followed by the necessary and
sufficient convergence condition of quadratic parameters in the messages in Section IV. The
July 31, 2014 DRAFT
1053-587X (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI10.1109/TSP.2014.2345635, IEEE Transactions on Signal Processing
4
derivation of the necessary and sufficient convergence condition of the belief variances is presented
in Section V. The relationship between the condition proposed in this paper and the existing one
based on the computation tree is established in Section VI. Numerical examples are presented in
Section VII, followed by conclusions in Section VIII.
The following notations are used throughout this paper. For two vectors, x₁ ≥ x₂ and x₁ > x₂
mean that the inequalities hold elementwise. Notations λ(G) and λ_max(G)
represent any eigenvalue and the maximal eigenvalue of matrix G, respectively. Notation ρ(G)
denotes the spectral radius of G. Notations [·]_i and [·]_{ij} represent the i-th element of
a vector and the (i,j)-th element of a matrix, respectively.
II. GAUSSIAN BELIEF PROPAGATION
In general, a Gaussian PDF can be written as

f(x) ∝ exp{−(1/2)x^T P x + h^T x},   (1)

where x = [x_1, x_2, ..., x_N]^T with N being the number of random variables; P ≻ 0 is the precision matrix with p_{ij} being its (i,j)-th element; and h = [h_1, h_2, ..., h_N]^T is the linear coefficient vector. In many signal processing applications, we want to find the marginalized PDF p(x_i) = ∫ f(x) dx_1 ... dx_{i−1} dx_{i+1} ... dx_N. For a Gaussian f(x), this can be done by completing the square inside the exponent of (1) as f(x) ∝ exp{−(1/2)(x − P^{−1}h)^T P (x − P^{−1}h)}, and then the marginalized PDF is obtained as

p(x_i) ∝ exp{−(x_i − µ_i)² / (2σ_i²)},   (2)

with the mean µ_i = [P^{−1}h]_i and the variance σ_i² = [P^{−1}]_{ii}. However, the required matrix inverse is computationally expensive for a high-dimensional P, and it introduces heavy communication overhead to the network when the entries p_{ij} are stored distributedly.
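To make the direct approach concrete, the following sketch computes µ_i and σ_i² exactly as in (2) by inverting the precision matrix. The 3 × 3 values of P and h are illustrative placeholders, not taken from the paper:

```python
import numpy as np

# Illustrative precision matrix P (symmetric positive definite) and
# linear coefficient vector h; any valid pair would do.
P = np.array([[2.0, 0.5, 0.3],
              [0.5, 2.0, 0.4],
              [0.3, 0.4, 2.0]])
h = np.array([1.0, 0.0, -1.0])

P_inv = np.linalg.inv(P)   # the O(N^3) step that Gaussian BP tries to avoid
mu = P_inv @ h             # marginal means:     mu_i      = [P^{-1} h]_i
var = np.diag(P_inv)       # marginal variances: sigma_i^2 = [P^{-1}]_ii
print(mu, var)
```

For large N, `np.linalg.solve(P, h)` would recover the means without forming the full inverse, but the marginal variances still require the diagonal of P^{−1}, which is the costly part that motivates Gaussian BP.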
On the other hand, the Gaussian PDF in (1) can be expanded as

f(x) ∝ ∏_{i=1}^{N} f_i(x_i) ∏_{j=1}^{N} ∏_{k=j+1}^{N} f_{jk}(x_j, x_k),

where f_i(x_i) = exp{−(p_{ii}/2)x_i² + h_i x_i} and f_{jk}(x_j, x_k) = exp{−p_{jk} x_j x_k}. Based on this expansion, a factor graph¹ can be constructed by connecting each variable x_i with its associated factors f_i(x_i) and f_{ij}(x_i, x_j). It is known that Gaussian BP can be applied on this factor graph by passing

¹ The factor graph is assumed to be connected in this paper.
messages among neighboring nodes to obtain the true marginal mean and approximate marginal
variance for each variable.
In Gaussian BP, the departing and arriving messages corresponding to variables i and j are updated as

m^d_{i→j}(x_i, t) ∝ f_i(x_i) ∏_{k∈N(i)\j} m^a_{k→i}(x_i, t),   (3)

m^a_{i→j}(x_j, t+1) ∝ ∫ f_{ij}(x_i, x_j) m^d_{i→j}(x_i, t) dx_i,   (4)

where (i,j) ∈ E, with E ≜ {(i,j) | p_{ij} ≠ 0, for i, j ∈ V} being the set of index pairs of any two connected variables and V ≜ {1, 2, ..., N} being the set of indices of all variables; t is the time index; N(i) is the set of indices of the neighboring variable nodes of node i, and N(i)\j is the set N(i) excluding node j. After obtaining the messages m^a_{k→i}(x_i, t), the belief at variable node i is equal to

b_i(x_i, t) ∝ f_i(x_i) ∏_{k∈N(i)} m^a_{k→i}(x_i, t).   (5)
It has been proved that b_i(x_i, t) always converges to p(x_i) in (2) exactly if the factor graph is
of tree structure [1], [6]. For a loopy factor graph, if b_i(x_i, t) converges, the mean of b_i(x_i, t) will
converge to the true mean µ_i, while the converged variance is an approximation of σ_i² [6].
Assume the arriving message is in the Gaussian form m^a_{i→j}(x_j, t) ∝ exp{−(v^a_{i→j}(t)/2)x_j² + β^a_{i→j}(t)x_j}, where v^a_{i→j}(t) and β^a_{i→j}(t) are the arriving precision and arriving linear coefficient, respectively. Inserting it into (3), we obtain m^d_{i→j}(x_i, t) ∝ exp{−(v^d_{i→j}(t)/2)x_i² + β^d_{i→j}(t)x_i}, where

v^d_{i→j}(t) = p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t),   (6)

β^d_{i→j}(t) = h_i + Σ_{k∈N(i)\j} β^a_{k→i}(t)   (7)

are the departing precision and linear coefficient, respectively. Furthermore, substituting the departing message m^d_{i→j}(x_i, t) into (4), we obtain

m^a_{i→j}(x_j, t+1) ∝ exp{ (p_{ij}²/(2v^d_{i→j}(t)))x_j² − (p_{ij}β^d_{i→j}(t)/v^d_{i→j}(t))x_j } × ∫ exp{ −(v^d_{i→j}(t)/2)(x_i − (β^d_{i→j}(t) − p_{ij}x_j)/v^d_{i→j}(t))² } dx_i.   (8)
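For reference, the reason the integral in (8) contributes only a constant when v^d_{i→j}(t) > 0 is the standard Gaussian integral identity (stated here for completeness; it is not numbered in the paper):

```latex
\int_{-\infty}^{\infty} \exp\left\{-\frac{v}{2}\,(x - m)^2\right\}\,\mathrm{d}x \;=\; \sqrt{\frac{2\pi}{v}}, \qquad v > 0,
```

which is finite and independent of m, so all x_j-dependence of m^a_{i→j}(x_j, t+1) resides in the factor outside the integral. For v ≤ 0 the integral diverges, which is exactly the 'not defined' case below.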
If v^d_{i→j}(t) > 0, the integration equals a constant, and thus m^a_{i→j}(x_j, t+1) ∝ exp{ (p_{ij}²/(2v^d_{i→j}(t)))x_j² − (p_{ij}β^d_{i→j}(t)/v^d_{i→j}(t))x_j }. Therefore, v^a_{i→j}(t+1) and β^a_{i→j}(t+1) are updated as

v^a_{i→j}(t+1) = −p_{ij}²/v^d_{i→j}(t), if v^d_{i→j}(t) > 0; not defined, otherwise,   (9)

β^a_{i→j}(t+1) = −p_{ij}β^d_{i→j}(t)/v^d_{i→j}(t), if v^d_{i→j}(t) > 0; not defined, otherwise,   (10)

where 'not defined' arises since, in the case v^d_{i→j}(t) ≤ 0, the integration in (8) becomes infinite and the message m^a_{i→j}(x_j, t+1) loses its Gaussian form. After obtaining v^a_{k→i}(t), the variance of the belief at each iteration is computed as

σ_i²(t) = 1 / (p_{ii} + Σ_{k∈N(i)} v^a_{k→i}(t)).   (11)

From (9) and (10), it can be seen that the messages of Gaussian BP maintain the Gaussian form only when v^d_{i→j}(t) > 0 for all t ≥ 0. Therefore, we have the following lemma.

Lemma 1. The messages of Gaussian BP are always in Gaussian form if and only if v^d_{i→j}(t) > 0 for all t ≥ 0.
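The variance recursions (6), (9) and (11) under synchronous scheduling can be sketched as follows. This is a minimal illustration, not the authors' implementation; the precision matrix is an arbitrary positive definite example and the initialization is v^a(0) = 0:

```python
import numpy as np

P = np.array([[2.0, 0.5, 0.3],
              [0.5, 2.0, 0.4],
              [0.3, 0.4, 2.0]])   # illustrative precision matrix
N = P.shape[0]
# directed edges (i, j) with p_ij != 0 and i != j
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i, j] != 0]

va = {e: 0.0 for e in edges}      # arriving precisions v^a_{i->j}(0) = 0

for t in range(100):
    # departing precisions, eq. (6): v^d_{i->j} = p_ii + sum_{k in N(i)\j} v^a_{k->i}
    vd = {(i, j): P[i, i] + sum(va[(k, i)] for k in range(N)
                                if (k, i) in va and k != j)
          for (i, j) in edges}
    # Lemma 1: messages stay in Gaussian form only while all v^d > 0
    assert all(v > 0 for v in vd.values())
    va = {e: -P[e] ** 2 / vd[e] for e in edges}        # eq. (9)

# belief variances, eq. (11)
sigma2 = [1.0 / (P[i, i] + sum(va[(k, i)] for k in range(N) if (k, i) in va))
          for i in range(N)]
print(sigma2)
```

On a tree-structured graph, sigma2 would equal diag(P^{−1}) exactly; here, with a loop, the converged values are the approximate marginal variances discussed above.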
III. ANALYSIS OF THE MESSAGE-PASSING PROCESS

In this section, we first analyze the updating process of v^a_{i→j}(t), and then derive the condition that guarantees the BP messages remain in Gaussian form.
A. Updating Function of v^a_{i→j}(t) and Its Properties

Under the Gaussian form of messages, substituting (6) into (9) gives

v^a_{i→j}(t+1) = −p_{ij}² / (p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t)).   (12)

Writing (12) in vector form, we obtain

v^a(t+1) = g(v^a(t)),   (13)

where g(·) is a vector-valued function containing the components g_{ij}(·) with (i,j) ∈ E arranged in ascending order first on j and then on i, and g_{ij}(·) is defined as

g_{ij}(w) ≜ −p_{ij}² / (p_{ii} + Σ_{k∈N(i)\j} w_{ki});   (14)
v^a(t) and w are vectors containing the elements v^a_{i→j}(t) and w_{ij}, respectively, both with (i,j) ∈ E arranged in ascending order first on j and then on i. Moreover, define the set

W ≜ {w | p_{ii} + Σ_{k∈N(i)\j} w_{ki} > 0, ∀ (i,j) ∈ E}.   (15)
Now, we have the following proposition about g(·) and W .
Proposition 1. The following claims hold:
P1) For any w1 ∈ W , if w1 ≤ w2, then w2 ∈ W;
P2) For any w1 ∈ W , if w1 ≤ w2, then g(w1) ≤ g(w2);
P3) gij(w) is a concave function with respect to w ∈ W .
Proof: Consider two vectors w and w̄, which contain elements w_{ki} and w̄_{ki} with (k,i) ∈ E arranged in ascending order first on k and then on i. For any w ∈ W, according to the definition of W in (15), we have p_{ii} + Σ_{k∈N(i)\j} w_{ki} > 0. Then, if w ≤ w̄, it can be easily seen that p_{ii} + Σ_{k∈N(i)\j} w̄_{ki} > 0 as well. Thus, we have w̄ ∈ W.

The first-order derivative of g_{ij}(w) with respect to w_{ki} for k ∈ N(i)\j is computed to be

∂g_{ij}/∂w_{ki} = p_{ij}² / (p_{ii} + Σ_{k∈N(i)\j} w_{ki})² > 0.   (16)

Thus, g_{ij}(w) is a continuous and strictly increasing function with respect to the components w_{ki} for k ∈ N(i)\j with w ∈ W. Hence, if w₁ ≤ w₂ with w₁ ∈ W, then g(w₁) ≤ g(w₂).
To prove the third property, first consider the simple function −p_{ij}²/(p_{ii} + r) on the domain r > −p_{ii}. It can be easily derived that the second-order derivative of −p_{ij}²/(p_{ii} + r) is −2p_{ij}²/(p_{ii} + r)³ < 0, where the inequality holds due to p_{ii} + r > 0. Thus, −p_{ij}²/(p_{ii} + r) is a concave function with respect to r > −p_{ii}. Since g_{ij}(w) = −p_{ij}²/(p_{ii} + Σ_{k∈N(i)\j} w_{ki}) is the composition of the concave function −p_{ij}²/(p_{ii} + r) and the affine mapping Σ_{k∈N(i)\j} w_{ki}, the composition g_{ij}(w) is also concave with respect to Σ_{k∈N(i)\j} w_{ki} > −p_{ii} [28, p. 79]. Due to W ≜ {w | p_{ii} + Σ_{k∈N(i)\j} w_{ki} > 0, ∀ (i,j) ∈ E} as defined in (15), w ∈ W implies that Σ_{k∈N(i)\j} w_{ki} > −p_{ii}. Thus, we can infer that g_{ij}(w) is a concave function with respect to w ∈ W.
Notice that convexity is also exploited in [29] to derive the convergence condition of a general
min-sum message-passing algorithm, which reduces to Gaussian BP when applied to the
particular quadratic function (1/2)x^T P x − h^T x. The convexity in [29] means that the quadratic
function can be decomposed into a summation of convex functions, which is different from the
concavity of the updating function g_{ij}(w) here.
B. Condition to Maintain the Gaussian Form of Messages

First, define the following set²

S₁ ≜ {w ∈ W | w ≤ g(w)}.   (17)

With the notations g^(t)(w) ≜ g(g^(t−1)(w)) and g^(0)(w) ≜ w, the following proposition can be established.
Proposition 2. The set S1 has the following properties:
P4) S1 is a convex set;
P5) If s ∈ S1, then s < 0;
P6) If s ∈ S1, then g(t)(s) ∈ S1 and g(t)(s) ≤ g(t+1)(s) for all t ≥ 0.
Proof: As g_{ij}(w) is a concave function for w ∈ W according to P3), g_{ij}(w) − w_{ij} is also concave for w ∈ W. The set of w ∈ W satisfying g_{ij}(w) − w_{ij} ≥ 0 is a convex set [28]. Thus, S₁ = {w ∈ W | g_{ij}(w) − w_{ij} ≥ 0 for all (i,j) ∈ E}, being the intersection of such convex sets, is also a convex set.

If s ∈ S₁, we have s ≤ g(s) and s ∈ W. According to the definition of W in (15), p_{ii} + Σ_{k∈N(i)\j} s_{ki} > 0. Putting this fact into the definition of g_{ij}(·) in (14), it is obvious that g(s) < 0. Therefore, we have the relation s ≤ g(s) < 0 for all s ∈ S₁.

Finally, if s ∈ S₁, we have s ≤ g(s) and s ∈ W. Hence, g(s) ∈ W according to P1). Applying g(·) to both sides of s ≤ g(s) and using P2), we obtain g(s) ≤ g^(2)(s). Furthermore, since g(s) ∈ W and g(s) ≤ g^(2)(s), we also have g(s) ∈ S₁. By induction, we can prove in general that g^(t)(s) ∈ S₁ and g^(t)(s) ≤ g^(t+1)(s) for all t ≥ 0.
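Propositions 1 and 2 can be probed numerically. The sketch below uses the same kind of illustrative matrix as before; the helper names `in_W` and `in_S1` are ours, not the paper's. It tests membership in S₁ of (17), i.e., w ∈ W and w ≤ g(w) elementwise, and checks that the fixed point reached from w = 0 lies in S₁:

```python
import numpy as np

P = np.array([[2.0, 0.5, 0.3],
              [0.5, 2.0, 0.4],
              [0.3, 0.4, 2.0]])   # illustrative precision matrix
N = P.shape[0]
edges = [(i, j) for i in range(N) for j in range(N) if i != j and P[i, j] != 0]

def g(w):
    # the updating function g_ij of (14), acting on dicts keyed by directed edges
    return {(i, j): -P[i, j] ** 2 / (P[i, i] + sum(w[(k, i)] for k in range(N)
                                                   if (k, i) in w and k != j))
            for (i, j) in edges}

def in_W(w):
    # membership in the set W of (15): all denominators strictly positive
    return all(P[i, i] + sum(w[(k, i)] for k in range(N)
                             if (k, i) in w and k != j) > 0
               for (i, j) in edges)

def in_S1(w, tol=1e-9):
    # membership in S1 of (17): w in W and w <= g(w), up to numerical tolerance
    gw = g(w)
    return in_W(w) and all(w[e] <= gw[e] + tol for e in edges)

w = {e: 0.0 for e in edges}
for _ in range(200):
    w = g(w)          # iterate toward the fixed point
print(in_S1(w))       # the limit of g^(t)(0) lies in S1 when the iteration converges
```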
Now, we give the following theorem.

Theorem 1. There exists at least one initialization v^a(0) such that the messages of Gaussian BP are maintained in Gaussian form for all t ≥ 0 if and only if S₁ ≠ ∅.
Proof:
Sufficient Condition:
² The simpler symbol S is reserved for later use.
If S₁ ≠ ∅, by choosing an initialization v^a(0) ∈ S₁ and applying P6), it can be inferred that v^a(t) ∈ S₁ for all t ≥ 0. Moreover, according to the definitions of W in (15) and S₁ in (17), we have S₁ ⊆ W, and thereby v^a(t) ∈ W for all t ≥ 0, or equivalently p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0 for all (i,j) ∈ E according to the definition of the set W in (15). Due to v^d_{i→j}(t) = p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) as given in (6), we obtain v^d_{i→j}(t) > 0 for all (i,j) ∈ E and t ≥ 0. According to Lemma 1, it can be inferred that the messages are maintained in Gaussian form for all t ≥ 0.
Necessary Condition:
We prove the necessary condition by contradiction. When S₁ = ∅, suppose that there exists a v^a(0) such that the messages are maintained in Gaussian form for all t ≥ 0. According to Lemma 1, we can infer that v^d_{i→j}(t) = p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0 for all (i,j) ∈ E, or equivalently v^a(t) ≜ g^(t)(v^a(0)) ∈ W for all t ≥ 0 from the definition of W in (15).

Now, choose another initialization v̄^a(0) satisfying both v^a(0) ≤ v̄^a(0) and v̄^a(0) ≥ 0. Due to v^a(0) ∈ W and v^a(0) ≤ v̄^a(0), by using P2), we have g(v^a(0)) ≤ g(v̄^a(0)), that is, v^a(1) ≤ v̄^a(1). Furthermore, substituting v̄^a(0) ≥ 0 into (14) gives v̄^a(1) ≤ 0, and thereby v̄^a(1) ≤ v̄^a(0). Combining this with v^a(1) ≤ v̄^a(1) leads to v^a(1) ≤ v̄^a(1) ≤ v̄^a(0). Due to the assumption v^a(1) ∈ W, applying P2) to v^a(1) ≤ v̄^a(1) ≤ v̄^a(0) gives v^a(2) ≤ v̄^a(2) ≤ v̄^a(1). Combining this with the fact v̄^a(1) ≤ 0 proved above, we have v^a(2) ≤ v̄^a(2) ≤ v̄^a(1) ≤ 0. By induction, it can be derived that

v^a(t+1) ≤ v̄^a(t+1) ≤ v̄^a(t) ≤ 0,  ∀ t ≥ 1.   (18)

Then, from the definition of W in (15), the assumption v^a(t) ∈ W is equivalent to p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0 for all (i,j) ∈ E, where v^a_{k→i}(t) are the elements of v^a(t) with (k,i) ∈ E arranged in the same order as v^a(t). By rearranging the terms in p_{ii} + Σ_{k∈N(i)\j} v^a_{k→i}(t) > 0, we obtain v^a_{γ→i}(t) > −p_{ii} − Σ_{k∈N(i)\{j,γ}} v^a_{k→i}(t). Applying v^a(t) ≤ 0 shown in (18) to this inequality, it can be inferred that v^a_{γ→i}(t) > −p_{ii} − Σ_{k∈N(i)\{j,γ}} v^a_{k→i}(t) ≥ −p_{ii}, that is, v^a_{γ→i}(t) > −p_{ii} for all (γ,i) ∈ E. Combining this with v̄^a(t) ≥ v^a(t) shown in (18), for all (i,j) ∈ E, we have

v̄^a_{i→j}(t) > −p_{jj}.   (19)

It can be seen from (18) and (19) that v̄^a(t) is a monotonically non-increasing and lower bounded sequence. Thus, v̄^a(t) must converge to a vector v̄^{a∗}, that is, v̄^{a∗} = g(v̄^{a∗}).

On the other hand, substituting (12) into (19) gives −p_{ij}²/(p_{ii} + Σ_{k∈N(i)\j} v̄^a_{k→i}(t)) > −p_{jj}. Due to v^a(t) ∈ W, it can be inferred from P1) and (18) that v̄^a(t) ∈ W, or equivalently p_{ii} + Σ_{k∈N(i)\j} v̄^a_{k→i}(t) > 0. Together with the fact p_{jj} > 0, we obtain p_{ii} + Σ_{k∈N(i)\j} v̄^a_{k→i}(t) > p_{ij}²/p_{jj}. Since v̄^a(t) converges to v̄^{a∗}, taking the limit on both sides of the inequality gives p_{ii} + Σ_{k∈N(i)\j} v̄^{a∗}_{k→i} ≥ p_{ij}²/p_{jj}. Hence, p_{ii} + Σ_{k∈N(i)\j} v̄^{a∗}_{k→i} > 0. From the definition of W in (15), we have v̄^{a∗} ∈ W. Combining this with v̄^{a∗} = g(v̄^{a∗}), according to the definition of S₁ in (17), it is clear that v̄^{a∗} ∈ S₁. This contradicts the prerequisite S₁ = ∅. Therefore, S₁ cannot be ∅.
IV. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF MESSAGE PARAMETERS v^a(t)

According to Theorem 1, the Gaussian form of the messages may only be maintained for some v^a(0). Thus, the choice of the initialization v^a(0) also plays an important role in the convergence of v^a(t).
A. Synchronous Scheduling

In this section, we investigate the convergence condition of v^a(t) under synchronous scheduling, which means that v^a(t) is updated as v^a(t+1) = g(v^a(t)). The necessary and sufficient convergence condition under a nonnegative initialization set is derived first. Then, the convergence condition is proved to hold under a more general initialization set.
Lemma 2. v^a(t) converges to the same point v^{a∗} ∈ S₁ for all v^a(0) ≥ 0 if and only if S₁ ≠ ∅.
Proof: See Appendix A.
Obviously, the initialization v^a(0) = 0 used by the convergence condition in [24] is contained in the proposed initialization set v^a(0) ≥ 0 in Lemma 2. In the following, we show that the initialization set can be further expanded to the set

A ≜ {w ≥ 0} ∪ {w ≥ w₀ | w₀ ∈ int(S₁)} ∪ {w ≥ w₀ | w₀ ∈ S₁ and w₀ = lim_{t→∞} g^(t)(0)},   (20)

where int(S₁) denotes the interior of S₁. Notice that the set A consists of the union of three parts.
• The first part {w ≥ 0} corresponds to the initialization set given in Lemma 2.
• For the case S₁ ≠ ∅ but int(S₁) = ∅, the second part is empty. Also, according to Lemma 2, we have lim_{t→∞} g^(t)(0) = v^{a∗} ∈ S₁. Thus, the third part of the set A reduces to {w ≥ v^{a∗}}. Due to the fact that v^{a∗} ∈ S₁ implies v^{a∗} < 0 by P5), we have {w ≥ 0} ⊊ {w ≥ v^{a∗}}.
• For the case int(S₁) ≠ ∅, according to (63), we have w₀ ≤ v^{a∗} for any w₀ ∈ int(S₁). This implies {w ≥ v^{a∗}} ⊆ {w ≥ w₀ | w₀ ∈ int(S₁)}. Obviously, in this case, we also have {w ≥ 0} ⊊ {w ≥ w₀ | w₀ ∈ int(S₁)}.
Therefore, if S₁ ≠ ∅, the set A is always larger than the set {w ≥ 0}.
Theorem 2. v^a(t) converges to the same point v^{a∗} for all v^a(0) ∈ A if and only if S₁ ≠ ∅.
Proof: The first part {w ≥ 0} of A corresponds to the case covered in Lemma 2, and thus has already been established. To prove the sufficiency, we consider the case int(S₁) ≠ ∅ and the case int(S₁) = ∅ but S₁ ≠ ∅ separately.

For the case int(S₁) ≠ ∅, it is known from (20) that A = {w ≥ w₀ | w₀ ∈ int(S₁)}. To prove the theorem, we will first prove that v^a(t) converges to the same point v^{a∗} for all v^a(0) ∈ int(S₁). Then, noticing that any v^a(0) ∈ {w ≥ w₀ | w₀ ∈ int(S₁)} can be upper bounded by a point from {w ≥ 0} and lower bounded by a point from int(S₁), and that both bounds converge to v^{a∗}, we can easily extend the convergence to the much larger set v^a(0) ∈ {w ≥ w₀ | w₀ ∈ int(S₁)}.
First, we prove that v̄^a(t) converges to v^{a∗} for all v̄^a(0) ∈ int(S₁). To do this, we establish the following three facts.

1) For any v̄^a(0) ∈ int(S₁), according to P6), we have v̄^a(t) ≤ v̄^a(t+1) and v̄^a(t) ∈ S₁. With P5), we also have v̄^a(t) < 0. It can be seen that v̄^a(t) is a monotonically non-decreasing and upper bounded sequence; thus v̄^a(t) converges to a fixed point, which is denoted as v̄^{a∗}. On the other hand, let v^a(0) ≥ 0 correspond to a point in the first part of A; due to v̄^a(0) < 0, we have v̄^a(0) < v^a(0). By using P2), applying g(·) to v̄^a(0) < v^a(0) for t times gives v̄^a(t) ≤ v^a(t). Since v̄^a(t) and v^a(t) converge to v̄^{a∗} and v^{a∗}, respectively, taking the limit on both sides of v̄^a(t) ≤ v^a(t) gives v̄^{a∗} ≤ v^{a∗}. Due to v̄^a(0) ∈ int(S₁), the definition of S₁ in (17) implies that v̄^a(0) < g(v̄^a(0)), that is, v̄^a(0) < v̄^a(1). Combining this with v̄^a(t) ≤ v̄^a(t+1) established above, we obtain v̄^a(0) < v̄^{a∗}. Together with the fact v̄^{a∗} ≤ v^{a∗}, we obtain

v̄^a(0) < v̄^{a∗} ≤ v^{a∗}.   (21)
2) For a v̄^a(0) ∈ int(S₁), construct the vector

ṽ^a(0) = λv̄^a(0) + (1−λ)v^{a∗},   (22)

where λ ∈ (0,1). Since g_{ij}(w) is a concave function over w ∈ S₁ and v̄^a(0), v^{a∗} ∈ S₁, we have g(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λg(v̄^a(0)) + (1−λ)g(v^{a∗}), or equivalently g(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λv̄^a(1) + (1−λ)v^{a∗}. According to P2), applying g(·) to this inequality gives g^(2)(λv̄^a(0) + (1−λ)v^{a∗}) ≥ g(λv̄^a(1) + (1−λ)v^{a∗}). Because of the concavity of g_{ij}(w), it is known that g(λv̄^a(1) + (1−λ)v^{a∗}) ≥ λv̄^a(2) + (1−λ)v^{a∗}. Thus, we have g^(2)(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λv̄^a(2) + (1−λ)v^{a∗}. By induction, it can be inferred that g^(t)(λv̄^a(0) + (1−λ)v^{a∗}) ≥ λv̄^a(t) + (1−λ)v^{a∗} for all t ≥ 0. With ṽ^a(0) = λv̄^a(0) + (1−λ)v^{a∗} given by (22), we obtain

ṽ^a(t) ≥ λv̄^a(t) + (1−λ)v^{a∗},   (23)

where ṽ^a(t) ≜ g^(t)(ṽ^a(0)). Since v̄^a(0), v^{a∗} ∈ S₁ and S₁ is a convex set as indicated in P4), we have ṽ^a(0) = λv̄^a(0) + (1−λ)v^{a∗} ∈ S₁. According to P6), ṽ^a(t) ≜ g^(t)(ṽ^a(0)) is a monotonically non-decreasing sequence. Furthermore, according to P5), ṽ^a(t) is also upper bounded by 0. Thus, ṽ^a(t) converges to a fixed point, which is denoted as ṽ^{a∗}. Taking limits on both sides of (23) gives

ṽ^{a∗} ≥ λv̄^{a∗} + (1−λ)v^{a∗} = v̄^{a∗} + (1−λ)(v^{a∗} − v̄^{a∗}).   (24)
3) From v̄^a(0) < v̄^{a∗} ≤ v^{a∗} given by (21), we have v̄^{a∗} − v̄^a(0) > 0 and v^{a∗} − v̄^a(0) > 0. If λ ∈ (0,1) is close enough to 1, the relation (1−λ)(v^{a∗} − v̄^a(0)) < v̄^{a∗} − v̄^a(0) always holds, that is,

∃ λ ∈ (0,1), s.t. (1−λ)(v^{a∗} − v̄^a(0)) < v̄^{a∗} − v̄^a(0).   (25)

Rearranging the terms in (25) gives ∃ λ ∈ (0,1), s.t. λv̄^a(0) + (1−λ)v^{a∗} < v̄^{a∗}. Due to λv̄^a(0) + (1−λ)v^{a∗} = ṽ^a(0) given in (22), (25) implies that there always exists a λ ∈ (0,1) such that ṽ^a(0) < v̄^{a∗}. Now, applying g(·) to ṽ^a(0) < v̄^{a∗} for t times, from P2), we obtain g^(t)(ṽ^a(0)) ≤ g^(t)(v̄^{a∗}), or equivalently ṽ^a(t) ≤ v̄^{a∗}. Taking the limit on both sides of ṽ^a(t) ≤ v̄^{a∗}, it can be inferred that

∃ λ ∈ (0,1), s.t. ṽ^{a∗} ≤ v̄^{a∗}.   (26)
Now, we make use of (21), (24) and (26) to prove that the convergence limit v̄^{a∗} is identical to the convergence limit v^{a∗}. Combining (21) and (26) gives

∃ λ ∈ (0,1), s.t. ṽ^{a∗} ≤ v̄^{a∗} ≤ v^{a∗}.   (27)

From the second inequality of (27), we have (1−λ)(v^{a∗} − v̄^{a∗}) ≥ 0. Substituting this relation into (24), we can infer that ṽ^{a∗} ≥ v̄^{a∗}. Comparing this result with the first inequality of (27), we obtain ṽ^{a∗} = v̄^{a∗}. Due to ṽ^{a∗} = v̄^{a∗}, (24) becomes (1−λ)(v^{a∗} − v̄^{a∗}) ≤ 0. For λ ∈ (0,1), this relation is equivalent to v^{a∗} ≤ v̄^{a∗}. Combining it with the second inequality in (27), we obtain

v̄^{a∗} = v^{a∗}.   (28)
Next, we will extend the initialization set to the whole set A = {w ≥ w0 | w0 ∈ int(S1)}. For any va(0) ∈ A, we can always find a val(0) ∈ int(S1) and a vau(0) ≥ 0 such that val(0) ≤ va(0) ≤ vau(0). According to P2), applying g(·) to this inequality repeatedly gives g(t)(val(0)) ≤ g(t)(va(0)) ≤ g(t)(vau(0)). Since val(t) = g(t)(val(0)) and vau(t) = g(t)(vau(0)) both converge to va∗, va(t) converges to va∗, too.
Now, consider the case int(S1) = ∅ but S1 ≠ ∅. In this case, according to the discussion after (20), A = {w ≥ va∗}. For any va(0) ∈ A, we can always find a vau(0) ∈ {w ≥ 0} such that va∗ ≤ va(0) ≤ vau(0). Applying g(·) to this inequality for t times, from P2), we obtain va∗ ≤ g(t)(va(0)) ≤ g(t)(vau(0)). Since vau(t) converges to va∗, va(t) also converges to va∗.

On the other hand, if S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the message parameters va(t) cannot converge in this case.
B. Asynchronous Scheduling
In this section, convergence conditions under asynchronous scheduling schemes will be investigated. Modified from the synchronous updating equation in (12), the arriving precision in asynchronous scheduling is updated as

vai→j(t + 1) = −p²ij / (pii + ∑k∈N(i)\j vak→i(τak→i(t))), ∀ t ∈ Ti→j, (29)

where Ti→j is the set of time instants at which vai→j(t) is updated, and 0 ≤ τak→i(t) ≤ t is the last updated time instant of vak→i(t) at time t. In this paper, we only consider the totally asynchronous scheduling defined as follows.

Definition 1. (Totally Asynchronous Scheduling) [27]: The sets Ti→j are infinite, and τai→j(t) satisfies 0 ≤ τai→j(t) ≤ t and lim t→∞ τai→j(t) = ∞ for all (i, j) ∈ E.
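To make the scheduling concrete, the following sketch (our illustration, not code from [27]; all names are hypothetical) refreshes one randomly chosen directed edge per step with (29), always reading the latest stored values — a simple special case of the totally asynchronous scheduling in Definition 1:

```python
import random
import numpy as np

def async_bp_variances(P, num_steps=6000, seed=0):
    """At each step, refresh one directed edge i->j via the update (29) using
    the most recently stored v^a_{k->i}; initialization is v^a(0) = 0."""
    rng = random.Random(seed)
    n = P.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(num_steps):
        i, j = rng.choice(edges)
        s = P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i and k != j)
        v[(i, j)] = -P[i, j] ** 2 / s                 # update (29)
    # belief variances via (11): sigma_i^2 = 1/(p_ii + sum_k v^a_{k->i})
    return [1.0 / (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i))
            for i in range(n)]
```

Consistent with Theorem 3 below, when S1 ≠ ∅ different schedulings (here, different random seeds) drive the messages to the same fixed point.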
In order to prove the convergence of va(t) under asynchronous scheduling given by Definition
1, we need to find a sequence of nonempty sets {U(m)} for m ≥ 0 such that the following two
sufficient conditions are satisfied [27]:
• Synchronous Convergence Condition: the sequence of sets satisfies 1) U(m + 1) ⊆ U(m); 2) g(w) ∈ U(m + 1) for all w ∈ U(m); 3) U(m) converges to a fixed point;
• Box Condition: U(m) can be written as the product of subsets of individual components Ui→j(m) for all (i, j) ∈ E.
While the formal proof of asynchronous convergence under these conditions is provided in [27], we give a brief intuitive explanation below. Suppose that when t ≥ tm, vai→j(t) ∈ Ui→j(m) for all (i, j) ∈ E, or, expressed using the Box Condition, va(t) ∈ U(m) for t ≥ tm. Now, consider the updating of an individual component vai→j(t). Due to lim t→∞ τak→i(t) = ∞, we can always find a t(i→j)m+1 > tm such that when t ≥ t(i→j)m+1, we have τak→i(t) ≥ tm and thereby vak→i(τak→i(t)) ∈ Uk→i(m) for all k ∈ N(i)\j. Putting vak→i(τak→i(t)) into the function gij(·) in (29) and applying condition 2) of the Synchronous Convergence Condition, we have vai→j(t + 1) ∈ Ui→j(m + 1) for t ≥ t(i→j)m+1. Thus, by choosing tm+1 = max(i,j)∈E t(i→j)m+1, we have that when t ≥ tm+1, vai→j(t) ∈ Ui→j(m + 1) for all (i, j) ∈ E. Further, due to conditions 1) and 3) of the Synchronous Convergence Condition, we know that va(t) will gradually converge to a fixed point.
Theorem 3. va(t) converges for any va(0) ∈ A and all choices of schedulings if and only if S1 ≠ ∅.
Proof: We first prove the sufficient condition. Since S1 ≠ ∅, as given in the proof of Theorem 2, for any va(0) ∈ A, there always exist a val(0) ∈ int(S1) ∪ {va∗} and a vau(0) ≥ 0 such that val(0) ≤ va(0) ≤ vau(0). Construct a sequence of sets as

U(m) = {w | val(m) ≤ w ≤ vau(m)}. (30)

Since val(0) ∈ int(S1) ∪ {va∗}, we have val(m) ≤ val(m + 1) according to P6). Further, for any vau(0) ≥ 0, putting vau(0) into (14), we have vau(1) < 0, and thereby vau(1) < vau(0). Since vau(m) converges according to Theorem 2, vau(m) ∈ W for all m ≥ 0. Thus, according to P2), by applying g(·) to vau(1) < vau(0) and by induction, we have vau(m + 1) ≤ vau(m). By exploiting val(m) ≤ val(m + 1), vau(m + 1) ≤ vau(m) and the definition of U(m) in (30), we have

U(m + 1) ⊆ U(m). (31)

Now, for any w ∈ U(m), by applying g(·) to val(m) ≤ w ≤ vau(m), we obtain val(m + 1) ≤ g(w) ≤ vau(m + 1). Thus, from the definition of U(m) in (30), we have the relation

g(w) ∈ U(m + 1), ∀ w ∈ U(m). (32)
Furthermore, according to Theorem 2, we have lim m→∞ val(m) = lim m→∞ vau(m) = va∗, hence U(m) will converge to the fixed point va∗. Thus, the set U(m) satisfies the Synchronous Convergence Condition [27]. Furthermore, according to the definition of U(m) in (30), va(m) ∈ U(m) means val,i→j(m) ≤ vai→j(m) ≤ vau,i→j(m) for all (i, j) ∈ E, that is, vai→j(m) ∈ Ui→j(m) with Ui→j(m) = {wij | val,i→j(m) ≤ wij ≤ vau,i→j(m)}. Thus, U(m) can be represented in the form of a product of subsets of individual components Ui→j(m) for all (i, j) ∈ E, and thereby the Box Condition is satisfied. Hence, the sufficient condition is proved.

On the other hand, if S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed in the factor graph lose the Gaussian form, thus the message parameters va(t) cannot converge in this case.
From Theorems 2 and 3, it can be seen that the convergence condition of va(t) under
asynchronous scheduling is the same as that under synchronous scheduling.
V. NECESSARY AND SUFFICIENT CONVERGENCE CONDITION OF BELIEF VARIANCE σ²i(t)
Although the convergence conditions for va(t) are derived in the last section, what we are really interested in is the convergence condition for the belief variance σ²i(t). According to (11), the variance σ²i(t) and va(t) are related by σ²i(t) = 1/(pii + ∑k∈N(i) vak→i(t)), or equivalently 1/σ²i(t) = pii + ∑k∈N(i) vak→i(t).
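As a numerical illustration of this relation (the code is our own sketch; function names are hypothetical), the snippet below iterates the synchronous recursion (12) from va(0) = 0 and then forms the belief variances through (11). For a tree-structured model, where BP is exact, the result coincides with the diagonal of P⁻¹:

```python
import numpy as np

def sync_bp_belief_variances(P, num_iter=50):
    """Synchronous variance-message recursion (12) from v^a(0) = 0, followed by
    the belief-variance relation (11): sigma_i^2 = 1/(p_ii + sum_k v^a_{k->i})."""
    n = P.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(num_iter):
        v = {(i, j): -P[i, j] ** 2 /
             (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i and k != j))
             for (i, j) in edges}
    return np.array([1.0 / (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i))
                     for i in range(n)])

# chain 0-1-2 (a tree): belief variances equal the diagonal of P^{-1}
P = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
print(sync_bp_belief_variances(P))   # approximately [1.2133, 1.3333, 1.12]
```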
A. Convergence Condition
In order to investigate the convergence condition of σ²i(t), we first present the following lemma.

Lemma 3. If S1 ≠ ∅ and va(0) ∈ A, then either lim t→∞ 1/σ²i(t) > 0 for all i ∈ V or lim t→∞ 1/σ²i(t) = 0 for all i ∈ V.

Proof: In the proof, we first consider the special case of a tree-structured factor graph as well as the impact of initializations. Then, for the case of a loopy factor graph, we prove that there exists at least one node γ ∈ V such that lim t→∞ 1/σ²γ(t) ≥ 0. At last, we demonstrate that either lim t→∞ 1/σ²γ(t) > 0 for all γ ∈ V or lim t→∞ 1/σ²γ(t) = 0 for all γ ∈ V holds.

First, if the factor graph has a tree structure, according to the properties of BP in a tree-structured factor graph, it is known that lim t→∞ σ²γ(t) = [P⁻¹]γγ [1]. Due to P⁻¹ ≻ 0, we have lim t→∞ σ²γ(t) = [P⁻¹]γγ > 0, and thereby lim t→∞ 1/σ²γ(t) > 0 for all γ ∈ V. In the following, we focus on factor graphs containing loops.
If S1 ≠ ∅, according to Theorem 2, va(t) converges to the same point for any initialization va(0) ∈ A. Due to 1/σ²i(t) = pii + ∑k∈N(i) vak→i(t), the limit lim t→∞ 1/σ²i(t) always exists and is identical for all va(0) ∈ A. Therefore, we only need to consider a specific initialization, e.g., va(0) = 0.

Second, we prove lim t→∞ 1/σ²γ(t) ≥ 0 for at least one node γ ∈ V. Since the factor graph contains loops, we can always find a node γ such that |N(γ)| ≥ 2, where |·| denotes the cardinality of a set. Denote nodes i, j as two neighbors of node γ, that is, i, j ∈ N(γ). Since va(0) = 0, from (63), we have va∗ ≤ va(t). Applying this relation to 1/σ²γ(t) = pγγ + ∑k∈N(γ)\i vak→γ(t) + vai→γ(t) gives

1/σ²γ(t) ≥ pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ. (33)

By further using va∗ ≤ va(t), we can easily obtain

pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ = pγγ + ∑k∈N(γ)\{i,j} vak→γ(t) + va∗i→γ + vaj→γ(t) ≥ pγγ + ∑k∈N(γ)\j va∗k→γ + vaj→γ(t). (34)
It can be further shown that

pγγ + ∑k∈N(γ)\j va∗k→γ + vaj→γ(t)
= [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) + ((pjj + ∑k∈N(j)\γ vak→j(t − 1)) / (pγγ + ∑k∈N(γ)\j va∗k→γ)) · vaj→γ(t)]
(a)= [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) − p²γj/(pγγ + ∑k∈N(γ)\j va∗k→γ)]
(b)= [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) + va∗γ→j], (35)

where the equality (a) holds because (pjj + ∑k∈N(j)\γ vak→j(t − 1)) vaj→γ(t) = −p²γj from (12), and (b) is obtained by using the limiting form of (12), va∗γ→j = −p²γj/(pγγ + ∑k∈N(γ)\j va∗k→γ). Combining (34) and (35) gives

pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ ≥ [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ vak→j(t − 1))] · [pjj + ∑k∈N(j)\γ vak→j(t − 1) + va∗γ→j]. (36)
Due to va∗ ∈ S1 from Lemma 2, we have va∗ ∈ W. From the definition of W in (15), for all (ν0, ν1) ∈ E, we have

pν0ν0 + ∑k∈N(ν0)\ν1 va∗k→ν0 > 0. (37)

By using (37) and va(t) ≥ va∗ implied by (63), we can infer pν0ν0 + ∑k∈N(ν0)\ν1 vak→ν0(t) > 0 for all (ν0, ν1) ∈ E, and thereby

(pν0ν0 + ∑k∈N(ν0)\ν1 va∗k→ν0) / (pν1ν1 + ∑k∈N(ν1)\ν0 vak→ν1(t)) > 0. (38)
Since the factor graph contains loops, there always exists a walk (j0, j1, · · · , jt−1, jt, i) with jt = γ such that (jℓ−1, jℓ) ∈ E and |N(jℓ)| ≥ 2 for ℓ = 0, 1, · · · , t. Then, using (36) repeatedly on such a walk gives

pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ ≥ [∏ℓ=0…t−1 (pjℓ+1jℓ+1 + ∑k∈N(jℓ+1)\jℓ va∗k→jℓ+1) / (pjℓjℓ + ∑k∈N(jℓ)\jℓ+1 vak→jℓ(ℓ))] · [pj0j0 + ∑k∈N(j0)\j1 vak→j0(0) + va∗j1→j0]
= (pj0j0 + va∗j1→j0) ∏ℓ=0…t−1 (pjℓ+1jℓ+1 + ∑k∈N(jℓ+1)\jℓ va∗k→jℓ+1) / (pjℓjℓ + ∑k∈N(jℓ)\jℓ+1 vak→jℓ(ℓ)), (39)

where the last equality holds due to va(0) = 0. Due to |N(j0)| ≥ 2, there exists a node ν such that j1 ∈ N(j0)\ν. By exploiting the fact that va∗ < 0 due to va∗ ∈ S1, we obtain pj0j0 + va∗j1→j0 ≥ pj0j0 + ∑k∈N(j0)\ν va∗k→j0. Combining with (37), we can infer that

pj0j0 + va∗j1→j0 > 0. (40)

Putting (38) and (40) into (39), we obtain pγγ + ∑k∈N(γ)\i vak→γ(t) + va∗i→γ > 0. Applying this result to (33) gives 1/σ²γ(t) > 0 for all t ≥ 0, and thereby lim t→∞ 1/σ²γ(t) ≥ 0.
At last, we will prove that either lim t→∞ 1/σ²γ(t) > 0 for all γ ∈ V or lim t→∞ 1/σ²γ(t) = 0 for all γ ∈ V. Notice that the relation in (35) holds for any (γ, j) ∈ E. Thus, taking the limit on both sides of (35) and recognizing that lim t→∞ 1/σ²γ(t) = pγγ + ∑k∈N(γ) va∗k→γ and lim t→∞ 1/σ²j(t) = pjj + ∑k∈N(j) va∗k→j, we obtain

lim t→∞ 1/σ²γ(t) = [(pγγ + ∑k∈N(γ)\j va∗k→γ) / (pjj + ∑k∈N(j)\γ va∗k→j)] · lim t→∞ 1/σ²j(t). (41)
Since pγγ + ∑k∈N(γ)\j va∗k→γ > 0 for all (γ, j) ∈ E as given in (37), according to (41), if lim t→∞ 1/σ²γ(t) > 0, then lim t→∞ 1/σ²j(t) > 0 for all nodes j ∈ N(γ), and vice versa. Similarly, if lim t→∞ 1/σ²γ(t) = 0, then lim t→∞ 1/σ²j(t) = 0 for all nodes j ∈ N(γ), and vice versa. Recall that we have proved that there exists a node γ with lim t→∞ 1/σ²γ(t) ≥ 0. Therefore, for a connected factor graph, either lim t→∞ 1/σ²i(t) > 0 for all i ∈ V or lim t→∞ 1/σ²i(t) = 0 for all i ∈ V.
Obviously, if lim t→∞ 1/σ²i(t) > 0, then lim t→∞ σ²i(t) exists, and hence the convergence condition of σ²i(t) is the same as that of 1/σ²i(t). However, Lemma 3 reveals that the scenario lim t→∞ 1/σ²i(t) = 0 cannot be excluded. Thus, we have the following theorem.
Theorem 4. σ²i(t), i ∈ V, converges to the same positive value for all va(0) ∈ A under both synchronous and asynchronous schedulings if and only if S ≠ ∅, where

S ≜ {w | w ∈ S1 and pii + ∑k∈N(i) wki > 0, ∀ i ∈ V}. (42)
Proof: First, we prove that if S ≠ ∅, then σ²i(t) converges for any va(0) ∈ A. From the definition of S in (42), if S ≠ ∅, then S1 ≠ ∅. According to Theorems 2 and 3, va(t) converges to va∗ for all va(0) ∈ A under both synchronous and asynchronous schedulings. Thus, 1/σ²i(t) = pii + ∑k∈N(i) vak→i(t) converges to pii + ∑k∈N(i) va∗k→i. Due to S ≠ ∅, there exists a w such that w ∈ S1 and pii + ∑k∈N(i) wki > 0 for all i ∈ V. Due to w ∈ S1, it can be inferred from (63) that w ≤ va∗. Putting this result into pii + ∑k∈N(i) wki > 0, we obtain pii + ∑k∈N(i) va∗k→i > 0, so its inverse exists. Thus, if S ≠ ∅, the variance σ²i(t) = 1/(pii + ∑k∈N(i) vak→i(t)) converges to 1/(pii + ∑k∈N(i) va∗k→i) > 0.
Next, we prove by contradiction that if σ²i(t) converges for all va(0) ∈ A, then S ≠ ∅. Suppose that S = ∅. This assumption contains two cases: 1) S1 = ∅; 2) S1 ≠ ∅ but, for every w ∈ S1, pii + ∑k∈N(i) wki ≤ 0 for some i ∈ V. First, consider the case S1 = ∅. Due to S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed in the factor graph cannot maintain the Gaussian form, hence va(t) becomes undefined. Therefore, the variance σ²i(t) cannot converge in this case, which contradicts the prerequisite that σ²i(t) converges. Second, consider the case S1 ≠ ∅. Due to S1 ≠ ∅, according to Theorems 2 and 3, va(t) converges to va∗ ∈ S1 for any va(0) ∈ A. According to Lemma 3, we further have lim t→∞ 1/σ²i(t) = pii + ∑k∈N(i) va∗k→i > 0 for all i ∈ V or lim t→∞ 1/σ²i(t) = pii + ∑k∈N(i) va∗k→i = 0 for all i ∈ V. If pii + ∑k∈N(i) va∗k→i > 0 for all i ∈ V, combining with va∗ ∈ S1, we can infer from the definition of S in (42) that va∗ ∈ S, which contradicts the assumption S = ∅. On the other hand, if pii + ∑k∈N(i) va∗k→i = 0 for all i ∈ V, obviously, the variance σ²i(t) = 1/(pii + ∑k∈N(i) vak→i(t)) cannot converge, which contradicts the prerequisite that σ²i(t) converges. In summary, we have proved that if σ²i(t) converges for all va(0) ∈ A, then S = ∅ cannot hold, or equivalently we must have S ≠ ∅.
Remark 1: Due to the belief mean µi(t) = (hi + ∑k∈N(i) βak→i(t)) σ²i(t), if the convergence condition of the belief variance σ²i(t) is already known, the convergence of µi(t) is determined by that of the message parameters βak→i(t). As pointed out in [24], βak→j(t) is updated in accordance with a set of linear equations upon the convergence of the belief variances. Thus, after deriving the convergence condition of σ²i(t), we can apply the theory of linear equations to analyze the convergence condition of the belief mean µi(t). However, the analysis of the convergence condition of the belief mean µi(t) would be lengthy, and it will be investigated in the future.
B. Verification Using Semi-definite Programming
Next, we show that whether S is empty can be determined by solving the following SDP problem [30]:

min_{w, α}  α   (43)
s.t.  [pii + ∑k∈N(i)\j wki , pij ; pij , −wij] ≽ 0, ∀ (i, j) ∈ E;
      α + pii + ∑k∈N(i) wki ≥ 0, ∀ i ∈ V.

The SDP problem (43) can be solved efficiently by existing software packages such as CVX [31] and SeDuMi [32].
Theorem 5. S ≠ ∅ if and only if the optimal solution α∗ of (43) satisfies α∗ < 0.
Proof: First, notice that the SDP problem in (43) is equivalent to the following optimization problem:

min_{w, α}  α   (44)
s.t.  wij − gij(w) ≤ 0, ∀ (i, j) ∈ E;
      −pii − ∑k∈N(i)\j wki ≤ 0, ∀ (i, j) ∈ E;
      −pii − ∑k∈N(i) wki ≤ α, ∀ i ∈ V.
If S ≠ ∅, according to the definition of S in (42), there must exist a w such that w ≤ g(w), w ∈ W, and pii + ∑k∈N(i) wki > 0 for all i ∈ V. Obviously, these three conditions are equivalent to wij − gij(w) ≤ 0 and −pii − ∑k∈N(i)\j wki < 0 for all (i, j) ∈ E, together with −pii − ∑k∈N(i) wki < 0 for all i ∈ V. Thus, by defining α = max i∈V (−pii − ∑k∈N(i) wki), it can be seen that (w, α) satisfies the constraints in (44). Due to −pii − ∑k∈N(i) wki < 0 for all i ∈ V, we have α < 0. Since (w, α) is a feasible point of the minimization problem in (44), the optimal value of (44) cannot be greater than α, hence it can be inferred that α∗ < 0.
Next, we prove that the reverse also holds. If (w∗, α∗) is the optimal solution of (44) with α∗ < 0, then (w∗, α∗) must satisfy the constraints in (44), thus the following three conditions hold: 1) w∗ij − gij(w∗) ≤ 0; 2) −pii − ∑k∈N(i)\j w∗ki ≤ 0; 3) −pii − ∑k∈N(i) w∗ki ≤ α∗. For the second constraint, if −pii − ∑k∈N(i)\j w∗ki = 0, the function gij(w∗) = −p²ij/(pii + ∑k∈N(i)\j w∗ki) becomes undefined, so −pii − ∑k∈N(i)\j w∗ki = 0 can never happen. Hence, we always have −pii − ∑k∈N(i)\j w∗ki < 0. For the third constraint, due to α∗ < 0, it can be inferred that −pii − ∑k∈N(i) w∗ki < 0. Now, comparing w∗ij − gij(w∗) ≤ 0, −pii − ∑k∈N(i)\j w∗ki < 0 and −pii − ∑k∈N(i) w∗ki < 0 with the definition of the set S in (42), we have w∗ ∈ S, and hence S ≠ ∅.
Remark 2: Using the alternating direction method of multipliers (ADMM) [27], the SDP problem in (43) can be reformulated into N locally connected low-dimensional SDP sub-problems, and thus can be solved distributively. Not only does this avoid gathering information at a central processing unit, but the complexity is also reduced from O((∑Ni=1 |N(i)|)⁴) for directly solving the SDP to O(∑Ni=1 |N(i)|⁴) per iteration of the ADMM technique. Furthermore, since what we are really interested in is whether S is empty, ADMM can stop updating immediately once an intermediate vector is found to lie within S. Thus, the required number of iterations can be reduced significantly. Because the derivation of ADMM is well documented in [27], [33], we do not give the details here. It should be emphasized that, despite the low per-iteration complexity of ADMM, the number of required iterations is unknown, so it cannot be proved that the overall complexity is always lower than that of direct matrix inversion, O(N³).
Remark 3: As discussed after (20), we know that A = {w ≥ w0 | w0 ∈ int(S1)} if int(S1) ≠ ∅. In particular, with a proof similar to that of Theorem 5, it can be shown that if the SDP

min_{w, α}  α   (45)
s.t.  [pii + ∑k∈N(i)\j wki , pij ; pij , α − wij] ≽ 0, ∀ (i, j) ∈ E

has an optimal solution α∗ < 0, then the optimal w∗ must be a point in int(S1).
VI. RELATIONSHIP WITH THE CONDITION BASED ON COMPUTATION TREE
In this section, we will establish the relation between the proposed convergence condition and the one proposed in [24], which is the best currently available. For the convergence condition in [24], the original PDF in (1) is first transformed into the normalized form

f(x̃) ∝ exp{−½ x̃ᵀP̃x̃ + h̃ᵀx̃}, (46)

where x̃ = diag^{1/2}(P) x, P̃ = diag^{−1/2}(P) P diag^{−1/2}(P) and h̃ = diag^{−1/2}(P) h, with diag(P) denoting the diagonal matrix obtained by retaining only the diagonal elements of P. Then, Gaussian BP is carried out with P̃ and h̃ using the updating equations in (6), (7), (9) and (10). Under the normalized P̃ and h̃, we denote the corresponding message parameters as ṽa(t) and β̃a(t). With ṽa(t), the corresponding belief variance can be calculated using (11) as σ̃²i(t) = 1/(p̃ii + ∑k∈N(i) ṽak→i(t)). The variances of the original PDF f(x) can be recovered as σ̃²i(t)/pii. Moreover, under the normalized P̃, the sets W̃, S̃1, Ã and S̃ can also be defined similarly to (15), (17), (20) and (42), respectively.
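A minimal numpy sketch of this normalization (the function name is ours), with the variance-recovery relation checked against direct inversion:

```python
import numpy as np

def normalize_precision(P, h):
    """Form the normalized model of (46): P_tilde = D^{-1/2} P D^{-1/2} has unit
    diagonal, and original variances equal the normalized ones divided by p_ii."""
    d = np.sqrt(np.diag(P))               # diag^{1/2}(P) as a vector
    P_tilde = P / np.outer(d, d)
    h_tilde = h / d
    return P_tilde, h_tilde

P = np.array([[4.0, 1.0], [1.0, 2.0]])
h = np.array([1.0, -1.0])
P_t, h_t = normalize_precision(P, h)
# diag(inv(P_tilde)) / diag(P) reproduces diag(inv(P))
```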
By noticing that the message-passing process of Gaussian BP in the factor graph can be translated exactly into message passing in a computation tree, [24] proposed a convergence condition for the belief variance σ̃²γ(t) in terms of the computation tree. A computation tree T(γ, ∆) is constructed by choosing one of the variable nodes γ as the root node and N(γ) as its direct descendants, connected through the factor nodes fγ,ν(x̃γ, x̃ν) = exp{−p̃γν x̃γ x̃ν}, where ν is the index of any descendant. For each descendant ν, the neighbors of ν excluding its parent node are connected as next-level descendants, also through the corresponding factor nodes. The process is repeated until the computation tree T(γ, ∆) has ∆ + 1 layers of variable nodes (including the root node). The computation tree is completed by further connecting a factor node fi(x̃i) = exp{−½x̃²i + h̃ix̃i} to each variable node x̃i. Furthermore, a computation sub-tree Ts(γ, ν, ∆) is obtained by cutting the branch of x̃ν in the first layer from T(γ, ∆). An example of the computation tree T(1, 4) as well as the computation sub-tree Ts(1, 4, 4), corresponding to a fully connected factor graph with four variables, is illustrated in Fig. 1.

Fig. 1: Illustration of the computation tree T(1, 4) and the computation sub-tree Ts(1, 4, 4) within the dashed line.
By assigning to each variable node a new index i = 1, 2, · · · , |T(γ, ∆)|, the tree T(γ, ∆) can be viewed as a factor graph for a new PDF, where |T(γ, ∆)| denotes the number of variable nodes in T(γ, ∆). The new PDF represented by T(γ, ∆) is

fγ,∆(x̃γ,∆) ∝ exp{−½ x̃ᵀγ,∆ P̃γ,∆ x̃γ,∆ + h̃ᵀγ,∆ x̃γ,∆}, (47)

where x̃γ,∆ is a |T(γ, ∆)| × 1 vector, and P̃γ,∆ and h̃γ,∆ are the corresponding precision matrix and linear coefficient vector, respectively. According to the equivalence between the message-passing process in the original factor graph and that in the computation tree [6], [24], under the prerequisite of P̃γ,∆ ≻ 0 and ṽa(0) = 0, it is known that

σ̃²γ(∆) = [P̃⁻¹γ,∆]11. (48)
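The identity (48) can be checked numerically. The sketch below (our illustration; the helper names are not from [24]) grows the computation tree breadth-first — each copy of a variable spawns its neighbors except its parent — assembles the tree precision matrix for a small normalized loopy model, and compares its [·]11 inverse entry with the belief variance after ∆ synchronous iterations from ṽa(0) = 0:

```python
import numpy as np

def variance_messages(P, T):
    """T synchronous sweeps of the variance recursion (12), from v^a(0) = 0."""
    n = P.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and P[i, j] != 0]
    v = {e: 0.0 for e in edges}
    for _ in range(T):
        v = {(i, j): -P[i, j] ** 2 /
             (P[i, i] + sum(v[(k, t)] for (k, t) in edges if t == i and k != j))
             for (i, j) in edges}
    return v

def tree_precision(P, root, depth):
    """Precision matrix of the computation tree T(root, depth) for a model with
    unit-diagonal (normalized) precision matrix P."""
    nodes = [(root, None)]            # (original index, parent position in tree)
    frontier = [0]
    for _ in range(depth):
        nxt = []
        for pos in frontier:
            i, ppos = nodes[pos]
            parent = None if ppos is None else nodes[ppos][0]
            for k in range(P.shape[0]):
                if k != i and k != parent and P[i, k] != 0:
                    nodes.append((k, pos))
                    nxt.append(len(nodes) - 1)
        frontier = nxt
    Pt = np.eye(len(nodes))           # unit diagonal from normalization
    for pos in range(1, len(nodes)):
        i, ppos = nodes[pos]
        Pt[pos, ppos] = Pt[ppos, pos] = P[i, nodes[ppos][0]]
    return Pt

# loopy, normalized (unit-diagonal) example
P = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
v = variance_messages(P, 3)
bp_var = 1.0 / (P[0, 0] + v[(1, 0)] + v[(2, 0)])
tree_var = np.linalg.inv(tree_precision(P, 0, 3))[0, 0]   # equals bp_var by (48)
```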
Similarly, the computation sub-tree Ts(γ, ν, ∆) can also be viewed as a factor graph, with P̃γ\ν,∆ being the corresponding symmetric precision matrix in the PDF. Furthermore, if (48) holds, we also have

σ̃²γ\ν(∆) = [P̃⁻¹γ\ν,∆]11, (49)

where σ̃²γ\ν(∆) ≜ 1/(p̃γγ + ∑k∈N(γ)\ν ṽak→γ(∆)). In [24, Proposition 25], it is proved that if lim ∆→∞ ρ(P̃γ,∆ − I) < 1, the variance σ̃²γ(t) converges for ṽa(0) = 0; but if lim ∆→∞ ρ(P̃γ,∆ − I) > 1, Gaussian BP becomes ill-posed. Notice that the limit lim ∆→∞ ρ(P̃γ,∆ − I) always exists and is identical for all γ ∈ V [24, Lemma 24]. The following lemma and theorem reveal the relation between the proposed convergence condition and the one based on the computation tree.

Lemma 4. If P̃γ,∆ ≻ 0 for all γ ∈ V and ∆ ≥ 0, then ṽa(t) converges to ṽa∗ ∈ S̃1 for ṽa(0) = 0.

Proof: See Appendix B.
Now, we give the following theorem.
Theorem 6. S̃ ≠ ∅ if and only if either of the following two conditions holds:

1) lim ∆→∞ ρ(P̃γ,∆ − I) < 1;
2) lim ∆→∞ ρ(P̃γ,∆ − I) = 1; p̃γγ + ∑k∈N(γ) ṽa∗k→γ ≠ 0 with ṽa(0) = 0; and P̃ν,∆ ≻ 0 for all ν ∈ V and ∆ ≥ 0.
Proof:

Sufficient Condition:

First, we prove that if lim ∆→∞ ρ(P̃γ,∆ − I) < 1, then S̃ ≠ ∅. With the monotonically increasing property of ρ(P̃γ,∆ − I) [24, Lemma 23] and the assumption that its limit is smaller than 1, we have ρ(P̃γ,∆ − I) ≤ ρu with ρu ≜ lim ∆→∞ ρ(P̃γ,∆ − I) ∈ (0, 1). Expressed in terms of eigenvalues, this reads −ρu ≤ λ(P̃γ,∆) − 1 ≤ ρu, or equivalently

1 − ρu ≤ λ(P̃γ,∆) ≤ 1 + ρu. (50)

For a symmetric P̃γ,∆, the eigenvalues of P̃⁻¹γ,∆ are {1/λ(P̃γ,∆)}, and (50) is equivalent to

1/(1 + ρu) ≤ λ(P̃⁻¹γ,∆) ≤ 1/(1 − ρu). (51)
Due to 0 < ρu < 1, it can be obtained from (51) that λ(P̃⁻¹γ,∆) > 0, that is,

P̃⁻¹γ,∆ ≻ 0, ∀ γ ∈ V and ∆ ≥ 0. (52)

Notice that a diagonal element of a symmetric matrix is always smaller than or equal to the matrix's maximum eigenvalue [34], that is, [P̃⁻¹γ,∆]ii ≤ λmax(P̃⁻¹γ,∆). Combining this with the fact that the diagonal elements of a positive definite matrix are positive, we obtain 0 < [P̃⁻¹γ,∆]ii ≤ λmax(P̃⁻¹γ,∆). Since the upper bound of (51) applies to all eigenvalues, we obtain

0 < [P̃⁻¹γ,∆]ii ≤ 1/(1 − ρu). (53)

Due to [P̃⁻¹γ,∆]11 = σ̃²γ(∆) from (48) and σ̃²γ(∆) = 1/(p̃γγ + ∑k∈N(γ) ṽak→γ(∆)), it can be inferred from (53) that p̃γγ + ∑k∈N(γ) ṽak→γ(∆) ≥ 1 − ρu for all ∆ ≥ 0. Due to P̃⁻¹γ,∆ ≻ 0 from (52), we have P̃γ,∆ ≻ 0, and according to Lemma 4, ṽa(t) converges to ṽa∗ ∈ S̃1. Thus, it can be inferred that p̃γγ + ∑k∈N(γ) ṽa∗k→γ ≥ 1 − ρu. With ρu < 1, we obtain

p̃γγ + ∑k∈N(γ) ṽa∗k→γ > 0, ∀ γ ∈ V. (54)

Combining (54) with ṽa∗ ∈ S̃1, it can be inferred from the definition that ṽa∗ ∈ S̃, thus S̃ ≠ ∅.
Next, we prove the second scenario. Due to P̃γ,∆ ≻ 0, according to Lemma 4, ṽa(t) converges to ṽa∗ ∈ S̃1, and thereby S̃1 ≠ ∅ and p̃γγ + ∑k∈N(γ) ṽa∗k→γ exists. Due to S̃1 ≠ ∅, according to Lemma 3, it is known that p̃γγ + ∑k∈N(γ) ṽa∗k→γ > 0 for all γ ∈ V or p̃γγ + ∑k∈N(γ) ṽa∗k→γ = 0 for all γ ∈ V. From the prerequisite p̃γγ + ∑k∈N(γ) ṽa∗k→γ ≠ 0, we can infer that p̃γγ + ∑k∈N(γ) ṽa∗k→γ > 0 holds for all γ ∈ V. Combining this with ṽa∗ ∈ S̃1, we can see that ṽa∗ ∈ S̃, thus S̃ ≠ ∅.
Necessary Condition:

In the following, we will first prove that the precision matrix of a computation tree can always be written in a special structure by re-indexing the nodes in the tree. Based on this special structure, the necessity is then proved by induction.

First, notice that changing the indexing scheme of the computation tree does not affect the positive definiteness of the corresponding precision matrix P̃γ,∆. So, we consider an indexing scheme such that the precision matrix P̃γ,∆+1 for ∆ ≥ 0 can be represented in the form

P̃γ,∆+1 = [ p̃γγ , aᵀk1\γ,∆ , aᵀk2\γ,∆ , · · · , aᵀk|N(γ)|\γ,∆ ;
ak1\γ,∆ , P̃k1\γ,∆ , 0 , · · · , 0 ;
ak2\γ,∆ , 0 , P̃k2\γ,∆ , · · · , 0 ;
⋮ , ⋮ , ⋮ , ⋱ , ⋮ ;
ak|N(γ)|\γ,∆ , 0 , 0 , · · · , P̃k|N(γ)|\γ,∆ ], (55)

where ki ∈ N(γ) for i = 1, 2, · · · , |N(γ)|, and aki\γ,∆ = [p̃kiγ, 0, · · · , 0]ᵀ is a vector of length |Ts(ki, γ, ∆)|. Notice that the (∆ + 1)-order computation tree T(γ, ∆ + 1) consists of the root node x̃γ and a set of ∆-order computation sub-trees Ts(ki, γ, ∆) with corresponding root nodes ki. The lower-right block-diagonal structure of (55) is easily obtained by assigning consecutive indices to the nodes inside each sub-tree Ts(ki, γ, ∆). Moreover, since there is only one connection from node γ to the root node ki of each sub-tree Ts(ki, γ, ∆), aki\γ,∆ contains only one nonzero element, p̃kiγ. Then, by assigning the smallest index to the root node in each Ts(ki, γ, ∆), the only nonzero element of aki\γ,∆ must be located at the first position. Therefore, the precision matrix P̃γ,∆+1 can be represented in the form of (55).
When ∆ = 0, obviously, P̃γ,∆ = p̃γγ > 0. Suppose P̃γ,∆ ≻ 0 for some ∆ ≥ 0; we need to prove P̃γ,∆+1 ≻ 0. From (55), it is clear that P̃γ,∆+1 ≻ 0 if and only if the following two conditions are satisfied [34, p. 472]:

Bdiag(P̃k1\γ,∆, · · · , P̃k|N(γ)|\γ,∆) ≻ 0, (56)

p̃γγ − ∑k∈N(γ) aᵀk\γ,∆ P̃⁻¹k\γ,∆ ak\γ,∆ > 0, (57)

where Bdiag(·) denotes the block diagonal matrix with its arguments located along the main diagonal. Due to P̃k,∆ ≻ 0 for all k ∈ V by assumption, its sub-matrices satisfy P̃k\γ,∆ ≻ 0 for all (k, γ) ∈ E, and thus the first condition (56) holds. On the other hand, for the second condition (57), we write

p̃γγ − ∑k∈N(γ) aᵀk\γ,∆ P̃⁻¹k\γ,∆ ak\γ,∆ (a)= p̃γγ − ∑k∈N(γ) p̃²kγ · σ̃²k\γ(∆) (b)= p̃γγ + ∑k∈N(γ) ṽak→γ(∆ + 1), (58)

where the equality (a) holds since ak\γ,∆ has only one nonzero element, p̃kγ, in the first position, and [P̃⁻¹k\γ,∆]11 = σ̃²k\γ(∆) from (49); and (b) holds due to −p̃²kγ · σ̃²k\γ(∆) = −p̃²kγ / (p̃kk + ∑k′∈N(k)\γ ṽak′→k(∆)) =
vak→γ(∆ + 1) given in (12). If S = ∅, according to Theorem 4, we have
p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} > 0.   (59)
Due to S ⊆ S1, S ≠ ∅ also implies S1 ≠ ∅. According to (63), it can be inferred that
va(t) ≥ va∗. Combining this with (59), we obtain p_{γγ} + ∑_{k∈N(γ)} v^a_{k→γ}(t) > 0. Substituting this result
into (58), it can be inferred that the second condition (57) holds as well. Thus, we have Pγ,∆ ≻ 0
for all γ ∈ V and ∆ > 0. Furthermore, from (59), it is obvious that p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} ≠ 0.
From Pγ,∆ ≻ 0, we obtain I − (I − Pγ,∆) ≻ 0, and hence 1 − λmax(I − Pγ,∆) > 0. Since Pγ,∆
represents a tree-structured factor graph, it is proved in [24, Proposition 15] that λmax(I − Pγ,∆) =
ρ(Pγ,∆ − I). Therefore, ρ(Pγ,∆ − I) < 1, and thereby

lim_{∆→∞} ρ(Pγ,∆ − I) ≤ 1.   (60)

Finally, if lim_{∆→∞} ρ(Pγ,∆ − I) < 1, then due to P_{γ,∆}^{-1} ≻ 0 from (52), we have Pγ,∆ ≻ 0. Together
with (54), it can be inferred that under the prerequisite lim_{∆→∞} ρ(Pγ,∆ − I) < 1, the conditions
Pγ,∆ ≻ 0 and p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} ≠ 0 are automatically satisfied. Therefore, if S ≠ ∅, we have
either lim_{∆→∞} ρ(Pγ,∆ − I) < 1, or lim_{∆→∞} ρ(Pγ,∆ − I) = 1 together with p_{γγ} + ∑_{k∈N(γ)} v^{a∗}_{k→γ} ≠ 0 and Pγ,∆ ≻ 0.
From Theorems 4 and 6, it can be obtained that the variance σ^2_γ(t) converges when
lim_{∆→∞} ρ(Pγ,∆ − I) < 1 and diverges when lim_{∆→∞} ρ(Pγ,∆ − I) > 1, which is consistent with the
results in [24]. Moreover, it can be seen from Theorem 6 that lim_{∆→∞} ρ(Pγ,∆ − I) = 1 alone is
not sufficient to determine the convergence of the variance σ^2_γ(t). This fills in the gap of [24]
in the scenario lim_{∆→∞} ρ(Pγ,∆ − I) = 1. Albeit with conclusions similar to [24], we need to
emphasize that the criterion lim_{∆→∞} ρ(Pγ,∆ − I) < 1 proposed in [24] is not easy to check in practice
due to the infinite dimension of the computation tree, while our condition S ≠ ∅ can be verified
by solving the SDP problem given in Theorem 5. Moreover, the initialization is expanded from
the single choice va(0) = 0 in [24] to the much larger set va(0) ∈ A in this paper. This flexibility
in the choice of initialization can accelerate the convergence of the variance σ^2_i(t) if the
initialization is chosen close to the convergent point.
VII. NUMERICAL EXAMPLES
In this section, numerical experiments are presented to corroborate the theories in this paper.
The example is based on a 20 × 20 precision matrix P constructed as

p_{ij} = 1 if i = j, and p_{ij} = ζ · θ_{mod(i+j,10)+1} if i ≠ j,   (61)
where ζ is a coefficient indicating the correlation strength among variables, and θk is the k-th
element of the vector θ = [0.13, 0.10, 0.71, −0.05, 0, 0.12, 0.07, 0.11, −0.02, −0.03]^T. Varying
the correlation strength ζ generates a family of matrices, and the positive definiteness P ≻ 0
required by a valid PDF is guaranteed when ζ < 0.5978.
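The construction in (61) is straightforward to reproduce. The sketch below (function name ours; the threshold 0.5978 is quoted from the text) builds P for a given ζ and checks symmetry and positive definiteness numerically:

```python
import numpy as np

THETA = np.array([0.13, 0.10, 0.71, -0.05, 0, 0.12, 0.07, 0.11, -0.02, -0.03])

def build_precision(zeta, n=20):
    """Build the n x n precision matrix of (61):
    p_ij = 1 if i = j, and zeta * theta_{mod(i+j,10)+1} otherwise (1-based i, j)."""
    P = np.empty((n, n))
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # mod(i+j,10)+1 in 1-based indexing is THETA[(i+j) % 10] in 0-based.
            P[i - 1, j - 1] = 1.0 if i == j else zeta * THETA[(i + j) % 10]
    return P

P = build_precision(0.3)
assert np.allclose(P, P.T)                # symmetric, since i+j is symmetric in (i, j)
assert np.all(np.linalg.eigvalsh(P) > 0)  # P > 0 well below the threshold 0.5978
```

Since P(ζ) = I + ζQ for a fixed off-diagonal pattern Q, its eigenvalues are affine in ζ, so positive definiteness is lost exactly at the quoted threshold.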
Fig. 2 illustrates how the optimal solution α∗ of (43) varies with the correlation strength ζ.
It can be seen that the optimal solution α∗ exists and satisfies α∗ < 0 for all ζ ≤ 0.5859, while
the SDP problem (43) becomes infeasible when ζ > 0.5859. According to Theorem 4, this means
that if ζ ≤ 0.5859, the variance σ^2_i(t) with i ∈ V converges to the same point for all initializations
va(0) ∈ A under both synchronous and asynchronous schedulings. On the other hand, if
ζ > 0.5859, the variance σ^2_i(t) cannot converge.
To verify the convergence of the belief variances under ζ ≤ 0.5859, Fig. 3 shows how the variance
σ^2_1(t) of the first variable evolves as a function of t when ζ = 0.5858, which is slightly smaller
than 0.5859. It can be observed that the variance σ^2_1(t) converges to the same value under
both synchronous and asynchronous schedulings and under the different initializations va(0) = 0
and va(0) = w∗, where w∗ is the optimal solution of (45). For the asynchronous case, a scheduling
with a 30% chance of not updating the messages at each iteration is considered. On the other
hand, Fig. 4 verifies the divergence of the variance σ^2_1(t) when ζ = 0.5860, which is slightly larger
than 0.5859. In this figure, synchronous scheduling and the same initializations as those of Fig.
3 are used. It can be seen that σ^2_1(t) fluctuates as the iterations proceed and shows no sign of
convergence.
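The behaviour shown in Figs. 3 and 4 can be reproduced by iterating the variance messages directly. The sketch below is our own implementation of the standard synchronous variance-message update (in the spirit of (12)-(13), which are not reproduced in this excerpt), run from va(0) = 0 on the matrix of (61); a ζ well inside the convergent region is used so that convergence is fast:

```python
import numpy as np

THETA = np.array([0.13, 0.10, 0.71, -0.05, 0, 0.12, 0.07, 0.11, -0.02, -0.03])

def build_precision(zeta, n=20):
    # Precision matrix of (61): unit diagonal, zeta * theta_{mod(i+j,10)+1} off-diagonal.
    return np.array([[1.0 if i == j else zeta * THETA[(i + j) % 10]
                      for j in range(1, n + 1)] for i in range(1, n + 1)])

def bp_variances(P, iters=2000):
    """Synchronous Gaussian BP variance-message iteration from va(0) = 0.
    va[i, j] carries the message from node i to node j (0 on non-edges)."""
    n = P.shape[0]
    E = (P != 0) & ~np.eye(n, dtype=bool)       # edge indicator
    va = np.zeros((n, n))
    for _ in range(iters):
        tot = np.diag(P) + va.sum(axis=0)       # p_ii + sum_k va_{k->i}
        # message i->j: -p_ij^2 / (p_ii + sum over incoming messages except j->i)
        va = np.where(E, -P**2 / (tot[:, None] - va.T), 0.0)
    sigma2 = 1.0 / (np.diag(P) + va.sum(axis=0))  # belief variances
    return va, sigma2

P = build_precision(0.3)
va, sigma2 = bp_variances(P)
assert np.all(va <= 0) and np.all(sigma2 > 0)
```

Near ζ = 0.5859 the same iteration converges much more slowly (as in Fig. 3), and beyond it the iterates fail to settle (as in Fig. 4).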
VIII. CONCLUSIONS
In this paper, the necessary and sufficient convergence condition for the variances of Gaussian
BP was developed for synchronous and asynchronous schedulings. The initialization set of the
proposed condition is much larger than the usual choice of a single point of zero. It is proved that
[Figure: plot titled "Convergence Metric of Variance"; x-axis: correlation strength ζ, y-axis: convergence metric α; the curve passes through (ζ, α∗) = (0.5859, −0.2844), with a vertical marker at ζ = 0.5978.]

Fig. 2: The value of α∗ under different correlation strengths ζ.
[Figure: plot titled "Convergence of Variance with ζ = 0.5858"; x-axis: number of iterations t, y-axis: σ^2_1(t); curves for synchronous and asynchronous schedulings under the initializations va(0) = 0 and va(0) = w∗.]

Fig. 3: Illustration of the convergence of the variance σ^2_1(t) under different schedulings and
initializations, with w∗ being the optimal solution of (45).
the convergence condition can be verified efficiently by solving an SDP problem. Furthermore,
the relationship between the convergence condition proposed in this paper and the one based on
computation tree was established. This relationship fills in a missing piece of the result in [24]
for the case where the spectral radius of the computation tree equals one. Numerical examples
were presented to verify the proposed convergence conditions.
[Figure: plot titled "Divergence of Variance with ζ = 0.5860"; x-axis: number of iterations t, y-axis: σ^2_1(t); curves for the initializations va(0) = 0 and va(0) = w∗.]

Fig. 4: Illustration of the divergence of the variance σ^2_1(t), with w∗ being the optimal solution
of (45).
APPENDIX A
PROOF OF LEMMA 2
First, we prove that va(t) converges for any va(0) ≥ 0, given S1 ≠ ∅. For any w ∈ S1,
according to P5), we have w < 0. Thus, for any va(0) ≥ 0, the relation w ≤ va(0) always
holds. Notice that w ∈ W due to w ∈ S1 and S1 ⊆ W. Applying P2) to w ≤ va(0), we obtain
g(w) ≤ va(1). Combining this with w ≤ g(w) from P6) gives w ≤ va(1). On the other hand,
substituting va(0) ≥ 0 into (14) gives

va(1) < 0.   (62)

Due to va(0) ≥ 0, it follows that va(1) < va(0). Combining this with w ≤ va(1) gives w ≤ va(1) < va(0).
Applying g(·) to w ≤ va(1) < va(0), it can be inferred from P2) that g(w) ≤ va(2) < va(1).
Together with w ≤ g(w) as claimed by P6), we obtain w ≤ va(2) < va(1). By induction, we
can infer that

w ≤ va(t + 1) ≤ va(t).   (63)
It can be seen from (63) that va(t) is a monotonically non-increasing but lower bounded sequence,
thus it converges.
Next, we prove that va(t) converges to the same point for all va(0) ≥ 0. For any va(0) ≥ 0,
according to P2), applying g(·) on both sides of 0 ≤ va(0) gives g(0) ≤ g(va(0)) = va(1).
Combining this relation with (62) leads to g(0) ≤ va(1) ≤ 0. Applying g(·) to this inequality
t more times, it can be obtained from P2) that g^(t+1)(0) ≤ va(t + 1) ≤ g^(t)(0) for all t ≥ 0.
Denoting lim_{t→∞} g^(t)(0) = va∗, it is obvious that va(t) also converges to va∗.

Finally, since va∗ is a fixed point of g(·), we have va∗ = g(va∗). From (63), we obtain w ≤ va∗.
Due to w ∈ S1, we have w ∈ W by definition. Applying P1), it can be inferred that va∗ ∈ W.
Combining this with the fact va∗ = g(va∗), according to the definition of S1 in (17), we obtain
va∗ ∈ S1.
On the other hand, if S1 = ∅, according to Theorem 1, the messages of Gaussian BP passed
in the factor graph cannot maintain the Gaussian form, and thus the message parameters
va(t) cannot converge in this case.
APPENDIX B
PROOF OF LEMMA 4
Due to Pγ,∆ ≻ 0, we have P_{γ,∆}^{-1} ≻ 0, and hence the diagonal elements [P_{γ,∆}^{-1}]_{ii} > 0 [34]. With
σ^2_γ(∆) = [P_{γ,∆}^{-1}]_{11} and 1/σ^2_γ(t) = p_{γγ} + ∑_{k∈N(γ)} v^a_{k→γ}(t), we have

p_{γγ} + ∑_{k∈N(γ)\ν} v^a_{k→γ}(t) > −v^a_{ν→γ}(t).   (64)
Next, we prove va(t) ∈ W for all t ≥ 0. Obviously, va(0) = 0 ∈ W. Suppose va(t) ∈ W
holds for some t ≥ 0; according to the definition of W in (15), this is equivalent to
p_{νν} + ∑_{k∈N(ν)\γ} v^a_{k→ν}(t) > 0. Substituting this inequality into (14) gives v^a_{ν→γ}(t + 1) < 0. Then,
applying v^a_{ν→γ}(t + 1) < 0 to (64) gives p_{γγ} + ∑_{k∈N(γ)\ν} v^a_{k→γ}(t + 1) > 0 for all (ν, γ) ∈ E, or
equivalently va(t + 1) ∈ W. Thus, we have va(t) ∈ W for all t ≥ 0.
Finally, substituting va(0) = 0 into (13) gives va(1) < 0, and thereby va(1) < va(0). From
va(t) ∈ W and P2), applying g(·) on both sides of va(1) < va(0) for t times gives

va(t + 1) ≤ va(t).   (65)

From va(0) = 0 and (65), we have va(t) ≤ 0. Substituting this into (64) gives v^a_{ν→γ}(t) > −p_{γγ}
for all (ν, γ) ∈ E. Together with (65), it can be seen that va(t) is a monotonically non-increasing
and lower bounded sequence. Thus, va(t) converges to a fixed point va∗ with va∗ = g(va∗).
Moreover, from va(1) < 0 and (65), we have va∗ < 0. Substituting this into the limiting form
of (64) gives p_{γγ} + ∑_{k∈N(γ)\ν} v^{a∗}_{k→γ} > 0 for all (ν, γ) ∈ E, or equivalently va∗ ∈ W. Combining
this with va∗ = g(va∗), we obtain va∗ ∈ S1 by the definition of S1.
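The monotone non-increasing behaviour (65) from the all-zero initialization can also be observed numerically. The following minimal sketch uses a small illustrative model (a 4-cycle with uniform couplings, not the paper's example) together with our own implementation of the standard variance-message update:

```python
import numpy as np

# Small illustrative model: a 4-cycle with unit diagonal and couplings 0.4.
P = np.eye(4)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    P[i, j] = P[j, i] = 0.4

E = (P != 0) & ~np.eye(4, dtype=bool)      # edge indicator
va = np.zeros((4, 4))                      # va(0) = 0
prev = va.copy()
for t in range(200):
    tot = np.diag(P) + va.sum(axis=0)      # p_ii + sum_k va_{k->i}
    va = np.where(E, -P**2 / (tot[:, None] - va.T), 0.0)
    assert np.all(va <= prev + 1e-12)      # monotone non-increasing, cf. (65)
    prev = va.copy()

# The limit is negative and lies in W, matching the lemma's conclusion.
assert np.all(va[E] < 0)
assert np.all(np.diag(P) + va.sum(axis=0) > 0)
```

By symmetry every edge message satisfies the scalar fixed-point equation x = −0.16/(1 + x), whose stable root is x = −0.2, which the iteration approaches from above.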
REFERENCES
[1] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory,
vol. 47, pp. 498-519, 2001.
[2] H. A. Loeliger, “An introduction to factor graphs,” IEEE Signal Process. Mag., vol. 21, pp. 28-41, 2004.
[3] H. A. Loeliger, J. Dauwels, H. Junli, S. Korl, P. Li, and F. R. Kschischang, “The factor graph approach to model-based
signal processing,” Proc. IEEE, vol. 95, no. 6, pp. 1295-1322, June 2007.
[4] C.M. Bishop, Pattern Recognition and Machine Learning. Springer, Aug. 2006.
[5] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[6] Y. Weiss and W. T. Freeman, “Correctness of belief propagation in Gaussian graphical models of arbitrary topology,” Neural
Comput., vol. 13, pp. 2173-2200, 2001.
[7] Y. Liu, V. Chandrasekaran, A. Anandkumar, and A. S. Willsky, “Feedback message passing for inference in Gaussian
graphical models,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4135-4150, 2012.
[8] A. Montanari, B. Prabhakar, and D. Tse, “Belief propagation based multiuser detection,” in Proc. IEEE Information Theory
Workshop, Punta del Este, Uruguay, Mar. 2006.
[9] Q. Guo and D. Huang, “EM-based joint channel estimation and detection for frequency selective channels using Gaussian
message passing,” IEEE Trans. Signal Process., vol. 59, pp. 4030-4035, 2011.
[10] Q. Guo and P. Li, “LMMSE turbo equalization based on factor graphs,” IEEE J. Sel. Areas Commun., vol. 26, pp. 311-319,
2008.
[11] Y. El-Kurdi, W. J. Gross, and D. Giannacopoulos, “Efficient implementation of Gaussian belief propagation solver for
large sparse diagonally dominant linear systems,” IEEE Trans. Magn., vol. 48, no. 2, pp. 471-474, Feb. 2012.
[12] O. Shental, P. H. Siegel, J. K. Wolf, D. Bickson, and D. Dolev, “Gaussian belief propagation solver for systems of linear
equations,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2008, pp. 1863-1867.
[13] X. Tan and J. Li, “Computationally efficient sparse Bayesian learning via belief propagation,” IEEE Trans. Signal Process.,
vol. 58, pp. 2010-2021, 2010.
[14] V. Chandrasekaran, J. K. Johnson, and A. S. Willsky, “Estimation in Gaussian graphical models using tractable subgraphs:
a walk-sum analysis,” IEEE Trans. Signal Process., vol. 56, pp. 1916-1930, 2008.
[15] N. Boon Loong, J. S. Evans, S. V. Hanly, and D. Aktas, “Distributed downlink beamforming with cooperative base stations,”
IEEE Trans. Inf. Theory, vol. 54, pp. 5491-5499, 2008.
[16] F. Lehmann, “Iterative mitigation of intercell interference in cellular networks based on Gaussian belief propagation,”
IEEE Trans. Veh. Technol., vol. 61, pp. 2544-2558, 2012.
[17] M. Leng and Y. C. Wu, “Distributed clock synchronization for wireless sensor networks using belief propagation,” IEEE
Trans. Signal Process., vol.59, no.11, pp. 5404-5414, Nov. 2011.
[18] A. Ahmad, D. Zennaro, E. Serpedin, and L. Vangelista, “A factor graph approach to clock offset estimation in wireless
sensor networks,” IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4244-4260, July 2012.
[19] M. Leng, T. Wee-Peng, and T. Q. S. Quek, “Cooperative and distributed localization for wireless sensor networks in
multipath environments,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 3125-
3128.
[20] Y. Song, B. Wang, Z. Shi, K. Pattipati, and S. Gupta, “Distributed algorithms for energy-efficient even self-deployment in
mobile sensor networks,” IEEE Trans. Mobile Comput., 2013, pp. 1-14.
[21] G. Zhang, W. Xu, and Y. Wang, “Fast distributed rate control algorithm with QoS support in Ad Hoc networks,” in Proc.
IEEE Global Telecommunications Conf. (Globecom), 2010.
[22] D. Dolev, A. Zymnis, S. P. Boyd, D. Bickson, and Y. Tock, “Distributed large scale network utility maximization,” in
Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2009, pp. 829-833.
[23] M. W. Seeger and D. P. Wipf, “Variational Bayesian inference techniques,” IEEE Signal Process. Mag., vol. 27, no. 6, pp.
81-91, Nov. 2010.
[24] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, “Walk-sums and belief propagation in Gaussian graphical models,” J.
Mach. Learn. Res., vol. 7, no.1, pp. 2031-2064, 2006.
[25] C. C. Moallemi and B. Van Roy, “Convergence of min-sum message passing for quadratic optimization,” IEEE Trans. Inf.
Theory, vol. 55, pp. 2413-2423, 2009.
[26] S. Tatikonda and M. I. Jordan, “Loopy belief propagation and Gibbs measures,” in Proc. 18th Annu. Conf. Uncertainty
Artif. Intell. (UAI-02), San Francisco, CA, Aug. 2002, pp.493-500.
[27] D.P. Bertsekas and J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ:
Prentice-Hall, 1989.
[28] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[29] C. C. Moallemi and B. Van Roy, “Convergence of Min-Sum Message Passing for Convex Optimization,” IEEE Trans. Inf.
Theory, vol. 56, no. 4, pp. 2041-2050, 2010.
[30] L. Vandenberghe and S. Boyd, “Semidefinite programming,” SIAM Rev., vol. 38, no. 1, pp. 49-95, Mar. 1996.
[31] M. Grant, S. Boyd, and Y. Ye, cvx: Matlab software for disciplined convex programming. [Online]. Available:
http://www.stanford.edu/~boyd/cvx/
[32] J. F. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optim. Methods Softw.,
vol. 11, no. 1-4, pp. 625-653, 1999.
[33] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating
direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, 2011.
[34] R. Horn and C.R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985.
Qinliang Su received the B.Eng. degree from Chongqing University in 2007 and the M.Eng. degree from
Zhejiang University in 2010 (both with honors). He is currently pursuing the Ph.D. degree with the
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong.
His current research interests include signal processing for wireless communications and networks; statistical
inference and learning; graphical models; distributed and parallel computation; and convex optimization
theories.
Yik-Chung Wu received the B.Eng. (EEE) degree in 1998 and the M.Phil. degree in 2001 from the
University of Hong Kong (HKU). He received the Croucher Foundation scholarship in 2002 to pursue
the Ph.D. degree at Texas A&M University, College Station, and graduated in 2005. From August 2005 to
August 2006, he was with Thomson Corporate Research, Princeton, NJ, as a Member of Technical
Staff. Since September 2006, he has been with HKU, currently as an Associate Professor. He was a
visiting scholar at Princeton University in the summer of 2011. His research interests are in the general area of
signal processing and communication systems, and in particular distributed signal processing and communications; optimization
theories for communication systems; estimation and detection theories in transceiver designs; and smart grid. Dr. Wu served as
an Editor for IEEE Communications Letters and is currently an Editor for IEEE Transactions on Communications and the Journal of
Communications and Networks.