

TO APPEAR IN IEEE TRANS. ON VEHICULAR TECHNOLOGY, 2013 1

Low Complexity MIMO Detection based on Belief Propagation over Pair-wise Graphs

Seokhyun Yoon, Member, IEEE and Chan-Byoung Chae, Senior Member, IEEE

Abstract: This paper considers the belief propagation algorithm over pair-wise graphical models to develop low complexity, iterative multiple-input multiple-output (MIMO) detectors. The pair-wise graphical model is a bipartite graph in which a pair of variable nodes is related by an observation node represented by the bivariate Gaussian function obtained by marginalizing the posterior joint probability density under the Gaussian input assumption. Specifically, we consider two types of pair-wise models, the fully-connected and the ring-type. The pair-wise graphs are sparse compared to the conventional graphical model in [18], insofar as the number of edges connected to an observation node (the edge degree) is only two. Consequently, the computations are much simpler than those of maximum likelihood (ML) detection, whose complexity is similar to that of belief propagation (BP) run over the fully connected bipartite graph. The link-level performance for non-Gaussian input is evaluated via simulations, and the results show the validity of the proposed algorithms. We also customize the algorithm with the Gaussian input assumption to obtain the Gaussian BP run over the two pair-wise graphical models and, for the ring-type, we prove its convergence to the linear minimum mean square error (MMSE) estimates. Since the maximum a posteriori (MAP) estimator for Gaussian input is equivalent to the linear MMSE estimator, this shows the optimality of the scheme for Gaussian input.

Index Terms: Markov random field, low complexity MIMO detection, graph-based detection, belief propagation, sum-product algorithm, forward-backward recursion.

I. INTRODUCTION

Recent works on multiple-input multiple-output (MIMO) detection have mainly focused on so-called sphere decoding [1]–[6]. Sphere decoding is a two-stage detector in which the channel matrix is first converted into an upper triangular form and, utilizing this structure, a tree search is used for joint data detection. Since the full tree search has the same complexity as maximum likelihood (ML) detection, a reduced search is applied by limiting the search space, e.g., the number of candidate symbols or the radius at each tree search stage. One advantage of sphere decoding is that, by choosing an appropriate radius or list size, it provides a tradeoff between performance and complexity.

Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2012R1A1A2038807) and partly by Dankook University Project for funding RICT 2011.

This work was partially presented at VTC 2011 Spring.

S. Yoon is with the Department of Electronics Engineering, Dankook University, Korea (e-mail: [email protected]).

C.-B. Chae is with the School of Integrated Technology, Yonsei University, Korea (e-mail: [email protected]).

The performance of sphere decoding has been shown to be quite close to that of ML with a reasonable level of complexity [6]. To produce the soft decisions required for channel decoding, however, the search space cannot be set too small.

Another type of MIMO detector, which has received little attention, is the channel truncation approach in [7]–[10]. This approach is also a two-stage detector, where the channel is first converted into a bi-diagonal or, more generally, a poly-diagonal form [9], [10] and, utilizing the effective channel structure, a trellis search, e.g., the Viterbi algorithm or the forward-backward algorithm [11], [12], is used for post-joint detection. The method is similar to the concatenated channel-shortening equalizer and maximum likelihood sequence estimator (MLSE) for the inter-symbol interference channel [13]. By employing channel shortening rather than channel inversion, the noise enhancement that severely affects the performance can be eased while the amount of interference is limited, allowing the MLSE to be implemented with less complexity.

Bit-based probabilistic data association [14] is another approach to low complexity MIMO detection, especially for higher order QAM. In [14], a matrix representation is introduced to represent the symbol mapping, so that the mapping can be treated as linear processing and combined with the MIMO channel, leaving room for complexity reduction for higher order QAM.

Another class of MIMO detection worthy of attention is graph-based detection [15]–[21]. These approaches are based on the belief propagation (BP) algorithm [22], [23], which has also been extensively studied for the decoding of channel codes such as turbo codes and low density parity check codes. In these approaches, the MIMO channel is modeled as a fully-connected bipartite graph, which consists of N observation nodes representing the received signal, M variable nodes representing the hidden data, and the edges connecting the observation nodes with the variable nodes. The resulting graph has the maximal edge degree, i.e., every observation node is connected to every variable node. When applying the BP algorithm [22] or the sum-product algorithm [23] to such graphs, the complexity is as high as that of the ML or MAP detector. This is mainly due to the metric computation and the marginalization operation required for the message update at the observation nodes.

To reduce the computational complexity, the Gaussian BP has been considered in [17] and [18], where the input data and messages are all assumed to be Gaussian so that each message and posterior probability can be represented by a mean and a variance, resulting in a very simple message update rule.


As shown in [17] and [18], however, the algorithm converges (though not always) only to the linear minimum mean squared error (LMMSE) solution, which is inferior to the ML detector for non-Gaussian input. On the other hand, [19] and [21] studied complexity reduction via model simplification. In particular, in [19], to reduce the edge degree, some edges in the fully connected bipartite graph were pruned based on the strength of the channel coefficients. By doing so, not only is the number of messages reduced, but the marginalization operation at the observation nodes can also be performed at a much lower cost. The marginalization cost decreases exponentially with the edge-degree reduction, resulting in far less complexity than ML. The problem here, however, is that the performance loss becomes severe as the edge degree is reduced.

Other interesting graph-based approaches are those in [21], [24]–[26], based on the pair-wise Markov random field (MRF) [27]. In an MRF, there is only one type of node, representing the hidden data, and the edges reflect the local dependency among the nodes. The local dependency is represented by potential functions and, specifically in a pair-wise MRF, these are functions of one or two variables. In fact, as noted in [21], [24] and [26] (also in [17] and [20]), a multivariate Gaussian function can be decomposed into a product of functions of one or two variables, resulting in a fully connected pair-wise MRF. On the other hand, in [21], noting that BP may not work well for a loopy graph, the authors proposed a tree approximation on the basis of the Kullback-Leibler distance (KLD) optimality criterion. In [24], the same authors proposed using the potential functions obtained by two-dimensional projection.

In this paper, we investigate an approach similar to the pair-wise MRF based MIMO detector, but with a different formulation: instead of using the potential functions obtained from the direct decomposition of the multivariate Gaussian function [21], [25], [26] or from the two-dimensional projection in [24], we propose using the functions obtained by marginalizing the posterior joint probability density under the Gaussian input assumption. The main advantage of this proposition over the one in [21], [25], [26] is that the proposed scheme works well for higher modulation orders, such as 16QAM, while those in [21], [25], [26] do not. In addition to the fully connected pair-wise graph, we also consider the ring-type pair-wise graph, over which the BP based detection algorithm has even less computational complexity than the BP over the fully-connected pair-wise graph. The proposed scheme can be regarded as an edge pruning technique, similar to the one in [19]. Unlike that of [19], however, the pruning is performed by a linear transformation, and the performance degradation compared to the ML/MAP detector is shown to be reasonable even with an edge degree of two.

This paper is organized as follows. In the next section, we briefly review the ML/MAP and the graph-based approaches to MIMO detection. In Section III, the proposed iterative detection algorithm is presented based on the fully-connected and ring-type pair-wise models, respectively, for non-Gaussian input. In Section IV, we customize the proposed algorithms under the Gaussian input assumption (Gaussian BP) and discuss its convergence property. The performance is extensively evaluated and compared via link-level simulations in Section V and, finally, concluding remarks are given in Section VI.

II. SYSTEM MODEL, MAP AND GRAPH-BASED DETECTION

System Model: A Gaussian MIMO system with an $N \times M$ channel matrix $\mathbf{H}$ ($N \geq M$) is modeled as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} = \sum_{k=1}^{M} \mathbf{h}_k x_k + \mathbf{n}$$

where $\mathbf{x}$ is an $M \times 1$ transmitted data symbol vector, $\mathbf{n}$ is an $N \times 1$ noise vector, $\mathbf{y}$ is an $N \times 1$ received signal vector and $\mathbf{h}_k$ is the $k$th column of $\mathbf{H}$. The noise vector $\mathbf{n}$ is assumed to be complex Gaussian with mean $\mathbf{0}$ and covariance $E[\mathbf{n}\mathbf{n}^H] = \sigma^2\mathbf{I}$, and the transmitted data symbol vector $\mathbf{x}$ is assumed to have mean $\mathbf{0}$ and covariance matrix $E[\mathbf{x}\mathbf{x}^H] = \mathbf{I}$, where $E[\cdot]$ denotes expectation. In practice, each element of $\mathbf{x}$ is usually a $2^m$-ary symbol drawn from a finite alphabet set $\Xi$ of size $2^m$, such as QPSK and 16-QAM, for which $m = 2$ and $4$, respectively.
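As a minimal sketch of the system model, the following NumPy snippet draws one realization of $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ and checks that the matrix product equals the column-wise sum. The i.i.d. complex Gaussian channel, the QPSK point ordering, and the names `qpsk_alphabet` and `simulate_mimo` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def qpsk_alphabet():
    """Unit-energy QPSK points, so that E[|x_k|^2] = 1 (point ordering is an assumption)."""
    return np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def simulate_mimo(N, M, sigma2, rng):
    """Draw H (N x M), x (M,), and y = H x + n with E[n n^H] = sigma2 * I."""
    # Assumed channel model: i.i.d. complex Gaussian entries with unit variance.
    H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    x = rng.choice(qpsk_alphabet(), size=M)
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    y = H @ x + n
    return H, x, y

rng = np.random.default_rng(0)
H, x, y = simulate_mimo(N=4, M=4, sigma2=0.1, rng=rng)
# The matrix product equals the sum of the columns h_k weighted by x_k.
column_sum = sum(H[:, k] * x[k] for k in range(H.shape[1]))
print(np.allclose(H @ x, column_sum))
```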

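MAP detection: The maximum a posteriori (MAP) detector selects the $\mathbf{x}$ that maximizes the a posteriori probability

$$p(\mathbf{x}|\mathbf{y}) = \frac{p(\mathbf{y}|\mathbf{x})\,p(\mathbf{x})}{p(\mathbf{y})} \tag{1}$$

where

$$p(\mathbf{y}|\mathbf{x}) = \mathcal{CN}\left(\mathbf{y}; \mathbf{H}\mathbf{x}, \sigma^2\mathbf{I}\right) \tag{2}$$

$$p(\mathbf{x}) = \prod_{j=1}^{M} p(x_j)$$

with $\mathcal{CN}(\mathbf{y}; \boldsymbol{\mu}, \mathbf{C})$ representing a multivariate complex Gaussian probability density function (PDF) of mean $\boldsymbol{\mu}$ and covariance $\mathbf{C}$, defined as

$$\mathcal{CN}(\mathbf{y}; \boldsymbol{\mu}, \mathbf{C}) \equiv \frac{1}{\pi^{N} \det\mathbf{C}} \exp\left(-(\mathbf{y}-\boldsymbol{\mu})^H \mathbf{C}^{-1} (\mathbf{y}-\boldsymbol{\mu})\right)$$

where the superscript $H$ denotes Hermitian transpose and $N$ is the dimension of $\mathbf{y}$. The search space of the MAP detector is the $M$-dimensional space $\Xi^M$, and the complexity is $O(2^{mM})$. When channel coding is concatenated with MIMO, the MIMO detector is required to produce soft-decision values, i.e., log-likelihood ratios (LLRs). Denote the $j$th data symbol as $x_j(b_{j,1}, b_{j,2}, \cdots, b_{j,m})$, where $b_{j,k}$ is the $k$th bit contained in $x_j$. Then, the LLR of $b_{j,k}$ can be obtained by first marginalizing $p(\mathbf{x}|\mathbf{y})$ over $\mathbf{x}\backslash x_j = (x_1, x_2, \cdots, x_{j-1}, x_{j+1}, \cdots, x_M)$ to get

$$\begin{aligned} p(x_j = x|\mathbf{y}) &= \sum_{\mathbf{x}\backslash x_j \in \Xi^{M-1}} p(x_1, x_2, \cdots, x_M|\mathbf{y}) \\ &= A \cdot \sum_{\mathbf{x}\backslash x_j \in \Xi^{M-1}} p(\mathbf{y}|x_1, x_2, \cdots, x_j = x, \cdots, x_M) \prod_{k \neq j} p(x_k) \end{aligned} \tag{3}$$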

where $A$ is the normalizing constant and the $x_j$'s are assumed to be independent of each other. In (3), $p(x_j)$ is the a priori probability of $x_j$, which is assumed to be uniformly distributed, i.e., $p(x_j) = 1/2^m$ for a modulation size of $2^m$. The LLR for each bit is then computed as

$$\mathrm{LLR}(b_{j,k}) = \log\left(\frac{p(b_{j,k}=0|\mathbf{y})}{p(b_{j,k}=1|\mathbf{y})}\right) = \log\left(\frac{\sum_{x_j:\, b_{j,k}=0} p(x_j|\mathbf{y})}{\sum_{x_j:\, b_{j,k}=1} p(x_j|\mathbf{y})}\right). \tag{4}$$

Fig. 1. Bipartite graphs for a $4 \times 4$ MIMO channel. The circles are variable nodes corresponding to a data symbol and the boxes labeled by $\mathbf{y}$ and $y_j$ are observation nodes corresponding to the received signal.
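The marginalization in (3) and the LLR in (4) can be sketched by brute force as follows. This is an illustrative sketch only: the function names, the BPSK toy alphabet, its bit labeling, and the identity channel are assumptions made here for the example. The exhaustive loop over all candidate vectors is what gives the $O(2^{mM})$ cost.

```python
import itertools
import numpy as np

def map_symbol_posteriors(y, H, sigma2, alphabet):
    """Brute-force (3) with uniform priors: post[j, k] = p(x_j = alphabet[k] | y)."""
    M, Q = H.shape[1], len(alphabet)
    combos = list(itertools.product(range(Q), repeat=M))   # all Q^M candidate vectors
    log_metric = np.array(
        [-np.linalg.norm(y - H @ alphabet[list(c)]) ** 2 / sigma2 for c in combos]
    )
    weight = np.exp(log_metric - log_metric.max())          # shift for numerical stability
    post = np.zeros((M, Q))
    for c, w in zip(combos, weight):
        for j, k in enumerate(c):
            post[j, k] += w                                  # marginalize over x \ x_j
    return post / post.sum(axis=1, keepdims=True)

def bit_llrs(post_j, bit_labels):
    """LLRs of (4) for one symbol; bit_labels[k] is the bit tuple mapped to alphabet[k]."""
    llrs = []
    for b in range(len(bit_labels[0])):
        p0 = sum(p for p, lab in zip(post_j, bit_labels) if lab[b] == 0)
        p1 = sum(p for p, lab in zip(post_j, bit_labels) if lab[b] == 1)
        llrs.append(float(np.log(p0 / p1)))
    return llrs

# Assumed toy example: BPSK over a 2 x 2 identity channel.
alphabet = np.array([1.0 + 0j, -1.0 + 0j])
bit_labels = [(0,), (1,)]
H = np.eye(2)
y = np.array([0.9, -1.1])
post = map_symbol_posteriors(y, H, sigma2=0.5, alphabet=alphabet)
print(post.argmax(axis=1), bit_llrs(post[0], bit_labels))
```

For this decoupled toy channel the LLR of the first symbol reduces to $4 y_1/\sigma^2$, which gives a quick sanity check on the implementation.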

Graph-based detection (BP over fully-connected bipartite graph): The MAP detection in (3) is useful for turbo equalization [28], on which there is a vast literature showing the validity of iterative MIMO detection and channel decoding. Although turbo equalization is not our main focus in this paper, it is worth paying attention to iterative detection, especially the one in [19], i.e., the BP over the fully-connected bipartite graph. In fact, the MAP detection in (3) can be regarded as a BP run over the singly connected factor graph shown in Fig. 1(a), where each variable node, representing a data symbol, first passes a priori information to the observation node labeled by the received vector $\mathbf{y}$. The observation node then provides each variable node with the corresponding a posteriori probability by computing the marginalization in (3). Since the graph is singly connected and all variable nodes are connected via one observation node, the BP over this graph surely converges, in one iteration, to the correct a posteriori probability. The graph-based detection in [19], on the other hand, is a BP over the fully connected bipartite graph shown in Fig. 1(b), where the marginalization is performed separately for each observation node and the results are then combined to produce the belief and the extrinsic information on each data symbol. The algorithm in [19] can be summarized as follows.

BP 1 over the fully-connected bipartite graph [19]

For a given a priori probability of $x_j$, which is assumed to be uniformly distributed, i.e., $p(x_j) = 1/2^m$ for a modulation size of $2^m$:

(1) Initialization:

$$\lambda_{j \to i}(x_j) = p(x_j) \quad \forall (i, j)$$

(2) Observation node computation:

$$\pi_{i \to j}(x_j) = A \cdot \sum_{\mathbf{x}\backslash x_j \in \Xi^{M-1}} p(y_i|x_1, x_2, \cdots, x_M) \prod_{k \neq j} \lambda_{k \to i}(x_k) \tag{5}$$

(3) Belief update:

$$b(x_j) = \prod_{k=1}^{N} \pi_{k \to j}(x_j) \tag{6}$$

(4) Variable node computation:

$$\lambda_{j \to i}(x_j) = \prod_{k \neq i} \pi_{k \to j}(x_j) = \frac{b(x_j)}{\pi_{i \to j}(x_j)} \tag{7}$$

The message updates (5)-(7) are repeated a pre-defined number of times or until the belief no longer changes.

Note that, in (5), $p(y_i|x_1, x_2, \cdots, x_j = x, \cdots, x_M)$ is given by

$$p(y_i|x_1, x_2, \cdots, x_M) = \mathcal{CN}\left(y_i; \sum_{j=1}^{M} h_{ij} x_j, \sigma^2\right)$$

and, by combining (5) and (6), we see that, at the first iteration,

$$b(x_j) \propto \prod_{k=1}^{N} p(x_j|y_k)$$

which is certainly different from $p(x_j|\mathbf{y})$ in (3). That is, in BP 1, we first marginalize $p(\mathbf{x}|y_k)$ for each received signal $y_k$ to obtain $p(x_j|y_k)$, and the belief is then obtained as their product, while, in MAP, we marginalize $p(\mathbf{x}|\mathbf{y})$ once and for all. Note also that, since the marginalization in (5) is performed over an $(M-1)$-dimensional space and must be performed for all $2^m$ states of $x_j$, the complexity of one iteration is the same as that of MAP, and the total complexity is multiplied by the number of iterations, resulting in far more computation than MAP detection. Regardless of its complexity, however, it provides a base structure for the development of low complexity detectors.

Complexity Reduction via Edge Pruning: To reduce the computational burden of the marginalization in (5) for non-Gaussian input, [19] proposed pruning the edges whose variable and observation nodes are weakly coupled, e.g., those variable-observation node pairs with small values of $|h_{jk}|$. By using only $d_f < M$ edges per observation node (i.e., pruning $M - d_f$ edges), the complexity is scaled by a factor of $1/2^{m(M-d_f)}$ relative to the ML/MAP detector or BP 1, whose complexity is $O(2^{mM})$. Here $d_f$ is the edge degree. The problem with this scheme is that $d_f$ must be large enough to ensure a reasonable performance, as shown in [19].
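To make the scaling concrete, the short sketch below evaluates the order-of-magnitude cost $2^{m \cdot \text{degree}}$ and the resulting reduction factor $2^{m(M-d_f)}$. The function names and the 16-QAM, $M = 4$, $d_f = 2$ numbers are assumptions chosen for illustration, and the counts are orders of complexity rather than exact operation counts.

```python
def marginalization_cost(m, degree):
    """Order of the marginalization cost at an observation node with `degree` edges: 2^(m * degree)."""
    return 2 ** (m * degree)

def pruning_reduction(m, M, d_f):
    """Ratio of the full cost (degree M) to the pruned cost (degree d_f), i.e. 2^(m * (M - d_f))."""
    return marginalization_cost(m, M) // marginalization_cost(m, d_f)

# 16-QAM (m = 4) with M = 4 transmit symbols, pruned to edge degree d_f = 2.
print(marginalization_cost(4, 4), marginalization_cost(4, 2), pruning_reduction(4, 4, 2))
# prints: 65536 256 256
```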

III. DETECTION ALGORITHM BASED ON PAIR-WISE GRAPHICAL MODELS

In this section, we develop low complexity iterative MIMO detection algorithms based on the pair-wise graphical models. We consider two types, namely, the fully-connected and the ring-type, and derive the corresponding BP algorithms that work for non-Gaussian input. As will be shown below, BP over the ring-type pair-wise graph is, with a slight difference, effectively equivalent to the one in [10].


A. BP based on pair-wise Markov Random Field

Our starting point is the BP algorithm based on the pair-wise Markov random field (MRF) in [18], [20] and [26]. An MRF is an undirected graph that describes local dependencies among a set of random variables. In an MRF, the joint PDF of all random variables involved can be represented by a product of the joint PDFs of the cliques.¹ In a pair-wise MRF, the joint PDF (of all variables involved) is represented by a product of joint PDFs of only two variables, each corresponding to an edge connecting two neighbors. Let $V = \{1, 2, \cdots, M\}$ be the set of nodes in the MRF corresponding to the random variables $x_1, x_2, \cdots, x_M$, respectively, and let $E$ be the set of all edges connecting these nodes. For a compact expression, we also denote the edge connecting nodes $j$ and $k$ as $e(j, k)$ and the set of neighbors of the $j$th node as $V(j)$. In pair-wise MRFs, the a posteriori joint function $p(x_1, x_2, \cdots, x_M|\mathbf{y})$ is modeled by a product of pair-wise potential functions [18], [27], e.g.,

$$\hat{p}(x_1, x_2, \cdots, x_M|\mathbf{y}) = A \cdot \prod_{i \in V} \psi_i(x_i) \prod_{(i,j):\, e(i,j) \in E} \phi_{ij}(x_i, x_j), \tag{8}$$

where $\psi_i(x_i)$ is the self-potential assigned to each node and $\phi_{ij}(x_i, x_j)$ is the edge potential assigned to each edge. Modeling based on a pair-wise MRF also facilitates the marginalization that finally yields the marginal distribution of each random variable. Denoting the (incoming) message from the $i$th to the $j$th node as $\pi_{i \to j}(x_j)$, the BP through the pair-wise MRF can be described as [18]

$$\pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} \psi_i(x_i)\,\phi_{ij}(x_i, x_j) \prod_{k \in V(i)\backslash j} \pi_{k \to i}(x_i) \tag{9}$$

where $\alpha$ is the normalizing constant and $V(i)\backslash j$ is the set of neighbors of node $i$ excluding node $j$. Note that we follow the convention in [18], [20], and [26] to describe the message passing over an MRF, where only one type of node, the variable node, exists and the messages pass between these variable nodes.

When we use a bipartite graph as shown in Fig. 1(b), we need to define two types of messages, i.e., one from variable node to observation node and the other from observation node to variable node. These are easily obtained by dividing (9) into two separate steps, i.e., $\lambda_{i \to j}(x_i) = \prod_{k \in V(i)\backslash j} \pi_{k \to i}(x_i)$ (variable-to-observation node message) and $\pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} \psi_i(x_i)\,\phi_{ij}(x_i, x_j) \cdot \lambda_{i \to j}(x_i)$ (observation-to-variable node message). Here, the incoming messages are first combined to produce the extrinsic information, $\prod_{k \in V(i)\backslash j} \pi_{k \to i}(x_i)$, which is then "translated" by the potential function $\psi_i(x_i)\,\phi_{ij}(x_i, x_j)$. The belief on the variable $x_j$ is given by

$$b(x_j) = \prod_{k \in V(j)} \pi_{k \to j}(x_j). \tag{10}$$

¹ A clique in a graph is defined as a set of nodes that are fully connected to each other.

The potential functions in (8) are given by a factorization of the joint a posteriori probability. Specifically, in [18], [26], the potential functions are obtained by decomposition of the multivariate Gaussian function, i.e.,

$$\begin{aligned} \phi_{i,j}(x_i, x_j) &= A_{ij} \exp\left(-\frac{1}{\sigma^2}\mathrm{Re}\left[x_i^* R_{ij} x_j\right]\right) \\ \psi_i(x_i) &= A_i \exp\left(-\frac{1}{\sigma^2}\mathrm{Re}\left[x_i^* y'_i - R_{ii}|x_i|^2\right]\right) \end{aligned} \tag{11}$$

where $R_{ij} = \mathbf{h}_i^H \mathbf{h}_j$, $y'_i = \mathbf{h}_i^H \mathbf{y}$, and $*$ denotes complex conjugate. In fact, such a decomposition gives a fully connected pair-wise MRF and is exact in the sense that (8) with the functions in (11) is exactly the same as the joint Gaussian PDF. It has been shown in [17] and [18] that, with (11), the BP over the fully connected pair-wise MRF results in the MMSE solution if it converges (though convergence is not always guaranteed for arbitrary channel matrices). Most importantly, however, it does not work well for non-Gaussian input, and the performance is shown to be inferior to the ML/MAP detector, especially for higher order modulation.

B. Proposed BP algorithm over pair-wise graphical models

In this paper, we propose using the following message passing rule:

$$\pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} \tilde{p}(x_j|x_i, \mathbf{y}) \prod_{k \in V(i)\backslash j} \pi_{k \to i}(x_i) \tag{12}$$

where $\tilde{p}(x_j|x_i, \mathbf{y})$ is the conditional a posteriori probability derived under the Gaussian input assumption, to be discussed shortly. Compared with (9), the potential function is replaced with $\tilde{p}(x_j|x_i, \mathbf{y})$. Note, however, that it is not a factor of the a posteriori probability in (1), unlike those in (11).

The trick here is to use $\tilde{p}(x_j|x_i, \mathbf{y})$, obtained under the Gaussian input assumption, to approximate the marginal PDF of non-Gaussian data. Note also that although the translation function $\tilde{p}(x_j|x_i, \mathbf{y})$ is obtained under the Gaussian assumption on the data symbols, the message itself, $\pi_{i \to j}(x_j)$, is not treated as Gaussian. The rationale for using $\tilde{p}(x_j|x_i, \mathbf{y})$ is to reduce the computational complexity. Let $p(x_j|x_i, \mathbf{y})$ be the true conditional a posteriori probability without the Gaussian assumption. Further assume that, after many iterations, the extrinsic information $\prod_{k \in V(i)\backslash j} \pi_{k \to i}(x_i)$ for the $i$th node (a neighbor of the $j$th node) converges to its true a posteriori marginal distribution, $p(x_i|\mathbf{y})$. Then, since $\sum_{x_i} p(x_j|x_i, \mathbf{y})\,p(x_i|\mathbf{y}) = p(x_j|\mathbf{y})$, with an appropriate normalizing constant we also have $\pi_{i \to j}(x_j) \to p(x_j|\mathbf{y})$ for the $j$th node, which means that, once converged, this translation function ensures that the final belief is given by the true marginal a posteriori distribution. This is impractical, however, since before running the algorithm we would first need to compute $p(x_j|x_i, \mathbf{y})$, which has the complexity of ML detection. Hence, at this step, we assume the $x_j$'s are all Gaussian to obtain $\tilde{p}(x_j|x_i, \mathbf{y})$, whose computation is much simpler, and use it to approximate the true posterior marginal $p(x_j|\mathbf{y})$ for non-Gaussian input.

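On the other hand, the conditional PDF $\tilde{p}(x_j|x_i, \mathbf{y})$ under the Gaussian input assumption can be easily obtained from the following simple probability relations, i.e.,

$$\tilde{p}(x_i, x_j|\mathbf{y})\,\tilde{p}(\mathbf{y}) = \tilde{p}(\mathbf{y}|x_i, x_j)\,\tilde{p}(x_i, x_j) = \tilde{p}(x_j|x_i, \mathbf{y})\,\tilde{p}(x_i, \mathbf{y})$$

resulting in

$$\tilde{p}(x_j|x_i, \mathbf{y}) = \frac{\tilde{p}(x_i, x_j|\mathbf{y})}{\tilde{p}(x_i|\mathbf{y})} = \frac{\tilde{p}(\mathbf{y}|x_i, x_j)\,\tilde{p}(x_j)}{\tilde{p}(\mathbf{y}|x_i)} \tag{13}$$

where

$$\tilde{p}(\mathbf{y}|x_i, x_j) = \mathcal{CN}\left(\mathbf{y}; \mathbf{h}_i x_i + \mathbf{h}_j x_j, \mathbf{K}_{\{j,i\}}\right) \tag{14}$$

$$\tilde{p}(\mathbf{y}|x_i) = \mathcal{CN}\left(\mathbf{y}; \mathbf{h}_i x_i, \mathbf{K}_{\{i\}}\right) \tag{15}$$

$$\tilde{p}(x_i) = \mathcal{CN}(x_i; 0, 1)$$

with

$$\mathbf{K}_{\Phi} = \sigma^2\mathbf{I} + \sum_{k \notin \Phi} \mathbf{h}_k \mathbf{h}_k^H \tag{16}$$

for $\Phi = \{i, j\}$ or $\{i\}$. In the second equality in (13), we used the independence assumption on the $x_j$'s.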

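Moreover, the Gaussian input assumption leads to a much simpler form. First, define the conditional MMSE estimator for $x_j$ given $x_i$,

$$\mathbf{c}_{j|i} = \mathbf{K}_{\{j,i\}}^{-1}\mathbf{h}_j \tag{17}$$

and $y'_{j|i} = \mathbf{c}_{j|i}^H\mathbf{y}$ such that

$$y'_{j|i} = \mathbf{c}_{j|i}^H\mathbf{y} = a_{j|i,j}\,x_j + a_{j|i,i}\,x_i + n'_{j|i} \tag{18}$$

where

$$a_{j|i,k} = \mathbf{c}_{j|i}^H\mathbf{h}_k = \mathbf{h}_j^H\mathbf{K}_{\{j,i\}}^{-1}\mathbf{h}_k \quad \text{for } k = i \text{ or } j \tag{19}$$

$$E|n'_{j|i}|^2 = \mathbf{c}_{j|i}^H\mathbf{K}_{\{j,i\}}\mathbf{c}_{j|i} = \mathbf{h}_j^H\mathbf{K}_{\{j,i\}}^{-1}\mathbf{h}_j \equiv \sigma^2_{j|i}. \tag{20}$$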

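Then, (13) can be rewritten as

$$\tilde{p}(x_j|x_i, y'_{j|i}) = \frac{\tilde{p}(y'_{j|i}|x_i, x_j)\,\tilde{p}(x_j)}{\tilde{p}(y'_{j|i}|x_i)} \tag{21}$$

with

$$\tilde{p}(y'_{j|i}|x_i, x_j) = \mathcal{CN}\left(y'_{j|i}; a_{j|i,j}\,x_j + a_{j|i,i}\,x_i, \sigma^2_{j|i}\right) \tag{22}$$

$$\tilde{p}(y'_{j|i}|x_i) = \mathcal{CN}\left(y'_{j|i}; a_{j|i,i}\,x_i, \sigma^2_{j|i} + |a_{j|i,j}|^2\right). \tag{23}$$

In (23), we used $\tilde{p}(x_j) = \mathcal{CN}(x_j; 0, 1)$. Plugging (22) and (23) into (21) and replacing $\tilde{p}(x_j)$ with $\mathcal{CN}(x_j; 0, 1)$, we obtain, following the derivation in the appendix, the simplified translation function

$$\begin{aligned} \tilde{p}(x_j|x_i, y'_{j|i}) &= \mathcal{CN}\left(x_j; \frac{a^*_{j|i,j}\left(y'_{j|i} - a_{j|i,i}\,x_i\right)}{\sigma^2_{j|i} + |a_{j|i,j}|^2}, \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2}\right) \\ &= \mathcal{CN}\left(x_j; \frac{1}{1 + \sigma^2_{j|i}}\left(y'_{j|i} - a_{j|i,i}\,x_i\right), \frac{1}{1 + \sigma^2_{j|i}}\right) \end{aligned} \tag{24}$$

where, in the last line, we used the fact that $a_{j|i,j}$ is real valued and equal to $\sigma^2_{j|i}$. Note that in (24), the mean is the conditional MMSE estimate of $x_j$ given $x_i$.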
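The quantities in (16)-(20) and the mean and variance in (24) can be checked numerically. The sketch below is illustrative only: the function names, the random test channel, and the chosen indices are assumptions made for this example. It verifies that $a_{j|i,j}$ is real and positive, and compares the mean of (24) with a direct conditional LMMSE estimate computed from $\mathbf{K}_{\{j,i\}} + \mathbf{h}_j\mathbf{h}_j^H$.

```python
import numpy as np

def translation_params(H, y, sigma2, i, j):
    """Quantities of (16)-(20) for translating a message from node i to node j."""
    N, M = H.shape
    K = sigma2 * np.eye(N, dtype=complex)             # K_{j,i} of (16)
    for k in range(M):
        if k != i and k != j:
            K += np.outer(H[:, k], H[:, k].conj())
    c = np.linalg.solve(K, H[:, j])                    # c_{j|i} of (17)
    a_jj = np.vdot(c, H[:, j])                         # a_{j|i,j} of (19), equals sigma^2_{j|i}
    a_ji = np.vdot(c, H[:, i])                         # a_{j|i,i} of (19)
    y_prime = np.vdot(c, y)                            # y'_{j|i} of (18)
    return K, a_jj, a_ji, y_prime

def translation_mean_var(H, y, sigma2, i, j, x_i):
    """Mean and variance of the Gaussian in the last line of (24)."""
    _, a_jj, a_ji, y_prime = translation_params(H, y, sigma2, i, j)
    s2 = a_jj.real
    return (y_prime - a_ji * x_i) / (1.0 + s2), 1.0 / (1.0 + s2)

# Assumed toy setup for a numerical check.
rng = np.random.default_rng(1)
N, M, sigma2 = 4, 3, 0.2
H = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
i, j, x_i = 0, 2, (1 - 1j) / np.sqrt(2)

K, a_jj, a_ji, y_prime = translation_params(H, y, sigma2, i, j)
mean, var = translation_mean_var(H, y, sigma2, i, j, x_i)
# Direct conditional LMMSE estimate of x_j given x_i, for comparison.
h_j = H[:, j]
direct = np.vdot(np.linalg.solve(K + np.outer(h_j, h_j.conj()), h_j), y - H[:, i] * x_i)
print(abs(a_jj.imag) < 1e-9, np.isclose(mean, direct))
```

The agreement between `mean` and `direct` follows from the matrix inversion lemma applied to $(\mathbf{K}_{\{j,i\}} + \mathbf{h}_j\mathbf{h}_j^H)^{-1}\mathbf{h}_j$.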

Using the equations from (17) to (24), the proposed message passing rule can be summarized as follows.

Fig. 2. The bipartite graph for (a) fully-connected pair-wise model and (b) ring-type pair-wise model, respectively, for a $4 \times N$ MIMO channel.

BP 2 over the fully-connected pair-wise graph

Given the messages in the previous iteration, $\pi_{k \to i}(x_i)$,

(1) Compute the extrinsic information for all pairs $(i, j)$ with $i \neq j$.²

$$\lambda_{i \to j}(x_i) = \prod_{k \in V(i)\backslash j} \pi_{k \to i}(x_i) \tag{25}$$

(2) Translate the message $\lambda_{i \to j}(x_i)$ to $\pi_{i \to j}(x_j)$

$$\pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} \tilde{p}(x_j|x_i, y'_{j|i}) \cdot \lambda_{i \to j}(x_i) \tag{26}$$

with $\tilde{p}(x_j|x_i, y'_{j|i})$ given by (24). The above message passing is computed for all edges in both directions and is repeated a pre-defined number of times or until the messages no longer change. The belief is finally obtained as in (10).
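A minimal sketch of BP 2 for a discrete alphabet is given below. It follows (25), (26), (24) and (10) under several assumptions that the text does not specify: messages are initialized uniformly, all messages are updated in parallel from the previous iteration, the translation function is evaluated only on the alphabet points and rescaled by a constant for numerical stability, and the names `translation_matrix` and `bp2` are hypothetical.

```python
import numpy as np

def translation_matrix(H, y, sigma2, alphabet, i, j):
    """T[a, b] proportional to p~(x_j = alphabet[b] | x_i = alphabet[a], y'_{j|i}) from (24)."""
    N, M = H.shape
    K = sigma2 * np.eye(N, dtype=complex)                  # K_{j,i} of (16)
    for k in range(M):
        if k != i and k != j:
            K += np.outer(H[:, k], H[:, k].conj())
    c = np.linalg.solve(K, H[:, j])                         # c_{j|i} of (17)
    s2 = np.vdot(c, H[:, j]).real                           # sigma^2_{j|i} = a_{j|i,j}
    a_ji = np.vdot(c, H[:, i])
    y_prime = np.vdot(c, y)
    mean = (y_prime - a_ji * alphabet) / (1.0 + s2)         # one mean per value of x_i
    var = 1.0 / (1.0 + s2)
    expo = -np.abs(alphabet[None, :] - mean[:, None]) ** 2 / var
    return np.exp(expo - expo.max())                        # constant rescaling, absorbed by alpha

def bp2(H, y, sigma2, alphabet, n_iter=5):
    """Beliefs b(x_j) after n_iter parallel iterations over the fully-connected pair-wise graph."""
    M, Q = H.shape[1], len(alphabet)
    pairs = [(i, j) for i in range(M) for j in range(M) if i != j]
    T = {(i, j): translation_matrix(H, y, sigma2, alphabet, i, j) for (i, j) in pairs}
    pi = {p: np.full(Q, 1.0 / Q) for p in pairs}            # assumed uniform initialization
    for _ in range(n_iter):
        new_pi = {}
        for (i, j) in pairs:
            lam = np.ones(Q)
            for k in range(M):
                if k != i and k != j:
                    lam = lam * pi[(k, i)]                  # extrinsic information (25)
            msg = lam @ T[(i, j)]                           # translation (26): sum over x_i
            new_pi[(i, j)] = msg / msg.sum()
        pi = new_pi
    beliefs = np.ones((M, Q))
    for (k, j) in pairs:
        beliefs[j] *= pi[(k, j)]                            # belief as the product of incoming messages
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Assumed toy check: identity channel, QPSK, small offset on the received vector.
alphabet = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
sent = [0, 3, 1]
y = alphabet[sent] + 0.05
beliefs = bp2(np.eye(3), y, sigma2=0.1, alphabet=alphabet)
print(beliefs.argmax(axis=1))
```

With an identity channel the symbols decouple, so the belief maxima should coincide with the transmitted indices; this checks the bookkeeping only and says nothing about error-rate performance on realistic channels.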

Note that BP 2 uses two types of messages and can be efficiently described as message passing over the bipartite graph in Fig. 2(a), where the observations used for the message translation from the $i$th variable node to the $j$th and its reverse are denoted by $y'_{j|i}$ and $y'_{i|j}$, respectively.³ It is also interesting to note that the algorithm is similar to those in [9] and [10], with two differences: one is in the underlying structure and the other in the message translation. To clarify the similarity and the differences, we consider the ring-type bipartite graph shown in Fig. 2(b). In this ring-type graph, each (variable) node has only two neighbors and, hence, in the computation of the extrinsic information, the incoming message from one neighbor is simply passed to the other. The detection algorithm can therefore be described more concisely as follows (even though BP 2 is applicable to any pair-wise graphical model).

² In our original paper [29], the algorithm was developed using an MRF, where the node indices $i$ and $j$ (as shown in this algorithm) represent the two variable nodes connected by an edge. Here, the node indices $i$ and $j$ are also two variable nodes, connected through an observation node in the pair-wise graphical model shown in Fig. 2(a).

³ The algorithm can also be described over the corresponding MRF, for which the reader may refer to [29].


BP 3 over the ring-type pair-wise graph (Forward-backward recursion)

Given the messages in the previous iteration, $\pi_{k\to i}(x_i)$:

(1) Variable node to observation node message

$$\lambda_{j\to (j\pm 1)_M}(x_j) = \pi_{(j\mp 1)_M \to j}(x_j) \quad \forall j \tag{27}$$

(2) Observation node to variable node message

$$\pi_{j\to (j\pm 1)_M}(x_{(j\pm 1)_M}) = \sum_{x_j \in \Xi} \tilde{p}(x_{(j\pm 1)_M} | x_j, y'_{(j\pm 1)_M | j}) \cdot \lambda_{j\to (j\pm 1)_M}(x_j) \quad \forall j \tag{28}$$

with $\tilde{p}(x_j | x_i, y'_{j|i})$ given by (24). After a pre-defined number of iterations, the belief is finally obtained by

$$b(x_j) = \pi_{(j+1)_M \to j}(x_j) \cdot \pi_{(j-1)_M \to j}(x_j). \tag{29}$$

From (27) to (29), $(\cdot)_M$ denotes the 1-based modulo-$M$ operation such that $(M + 1)_M = 1$ and $(0)_M = M$. Later on, however, we will omit this for notational simplicity.
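The forward-backward recursion (27)-(29), including the 1-based modulo indexing, can be sketched as follows. The dictionary `trans` of translation tables, the uniform initial messages, and the per-message normalization (added only for numerical stability) are assumptions made for this illustration.

```python
def ring_next(j, M):
    """1-based modulo successor (j + 1)_M, so that (M + 1)_M = 1."""
    return j % M + 1

def ring_prev(j, M):
    """1-based modulo predecessor (j - 1)_M, so that (0)_M = M."""
    return (j - 2) % M + 1

def bp3(trans, M, Q, n_iter):
    """Sketch of BP 3 on the ring; nodes are numbered 1..M.

    trans[(j, i)][xj][xi] is a hypothetical table for the translation
    function p~(x_i | x_j, y'_{i|j}) between ring neighbours j and i.
    """
    uniform = [1.0 / Q] * Q
    # fwd[j]: message arriving at node j from (j-1)_M; bwd[j]: from (j+1)_M.
    fwd = {j: uniform[:] for j in range(1, M + 1)}
    bwd = {j: uniform[:] for j in range(1, M + 1)}

    def translate(table, lam):
        msg = [sum(table[xj][xi] * lam[xj] for xj in range(Q))
               for xi in range(Q)]
        total = sum(msg)
        return [m / total for m in msg]

    for _ in range(n_iter):
        new_fwd, new_bwd = {}, {}
        for j in range(1, M + 1):
            nxt, prv = ring_next(j, M), ring_prev(j, M)
            # (27): the message from one neighbour is passed on to the other;
            # (28): it is then translated through the pair-wise observation.
            new_fwd[nxt] = translate(trans[(j, nxt)], fwd[j])
            new_bwd[prv] = translate(trans[(j, prv)], bwd[j])
        fwd, bwd = new_fwd, new_bwd
    beliefs = {}
    for j in range(1, M + 1):
        b = [f * g for f, g in zip(fwd[j], bwd[j])]      # (29)
        total = sum(b)
        beliefs[j] = [v / total for v in b]
    return beliefs
```

Only $2M$ translations are performed per iteration, one forward and one backward per node, compared with $M(M-1)$ for the fully-connected graph.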

On the other hand, this message update rule is a forward-backward algorithm similar to those in [9], i.e., the message from the $(j-1)$th node to the $j$th node corresponds to the forward message, and the one from the $(j+1)$th node to the $j$th node corresponds to the backward message. The difference is in the message translation. In (28), the message translation from the $j$th node to the $i$th and its reverse utilize different translation functions, i.e.,

$$y'_{j|i} \neq y'_{i|j} \;\Rightarrow\; p(x_j | x_i, y'_{j|i}) \neq p(x_i | x_j, y'_{i|j}).$$

This means the branch metrics used for the forward and backward recursions are separately optimized to maximize their conditional SINR, as also proposed in [10]. The translation function is also different from the branch metric in [10], i.e., the mean and variance in (24) have scaling factors of $a^*_{j|i,j}/(\sigma^2_{j|i} + |a_{j|i,j}|^2)$ and $1/(\sigma^2_{j|i} + |a_{j|i,j}|^2)$, respectively, instead of $a^*_{j|i,j}/|a_{j|i,j}|^2$ and $1/|a_{j|i,j}|^2$, though this has only a minor impact on the error rate performance. Note that, for the ring-type graph, we obtain different performance with a different antenna permutation, as also noted in [7], while in the fully-connected one we do not need antenna permutation, which is one possible advantage of the latter over the former.

Since the graphical models in Fig. 2 have short cycle(s) (especially the fully-connected pair-wise graph), it is quite questionable whether or not BP 2 and 3 will converge. It is known from the literature that the convergence of BP over a loopy graph is not guaranteed, even though it does converge in most practical cases. Since the convergence proof for non-Gaussian input is not tractable, we will tackle this question in the next section by modifying the algorithms for Gaussian input.

C. Complexity

For complexity comparisons, we need to consider both the linear preprocessing and the post iterative detection. Consider first the computational complexity of the post iterative detection only. In the MAP detector, the distance metric $|\mathbf{y} - \mathbf{H}\mathbf{x}|^2$ is computed first for all combinations of $(x_1, x_2, \cdots, x_M) \in \Xi^M$ and, then, the marginalization in (3) is performed over all combinations of $\mathbf{x}\backslash x_j \in \Xi^{M-1}$ for each of the $2^m$ alphabet symbols, resulting in a complexity of $O(M^2 \cdot 2^{mM})$. Compared with the complexity of the MAP detector, the computational burden of BP 2 for the fully-connected pair-wise graph in Fig. 2(a) with $\nu$ iterations is $O(\nu \cdot M(M-1) \cdot 2^{2m})$, since the marginalization for each of the $M$ nodes is performed separately for its $(M-1)$ neighbors and repeated $\nu$ times. Although some additional computation is required for the linear processing in (16)-(20), it is typically much smaller than $2^{m(M-1)}$, resulting in a considerable computational reduction, which comes from modeling through the pair-wise graphical model. On the other hand, the computational complexity for the ring-type pair-wise graph in Fig. 2(b) is $O(\nu \cdot M \cdot 2^{2m})$, which is even less than that of the fully-connected one.

To evaluate the approximate number of operations, we assume:

1) The marginalization in (3) for the MAP and the computations in (26) and (28) for BP 2 and 3, respectively, are performed in the log-domain, where multiplications and additions in these equations are replaced with addition and max-operation, respectively, and, in (2) and (24), we only need to compute the exponent.

2) A multiplication of a $(p \times q)$ matrix with a $(q \times r)$ matrix requires $pqr$ multiplications and additions (of complex numbers).

3) An inversion of a $(p \times p)$ square matrix requires approximately $2p^3 - 2p^2$ additions, $2p^3 - p^2$ multiplications, and $p^2$ divisions (of complex numbers).

4) A division of complex numbers requires one complex multiplication and two real divisions.

5) A complex addition requires two real additions, and a complex multiplication requires four real multiplications and two real additions.

6) Real addition and multiplication are each assumed to have a complexity of one (operation), while a real division is assumed to have a complexity of 8 (operations).
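Assumptions 2) through 6) translate directly into real-operation counts. The helper names below are hypothetical and serve only to make the bookkeeping explicit; the constants and formulas are those listed above.

```python
REAL_ADD = 1          # assumption 6)
REAL_MUL = 1          # assumption 6)
REAL_DIV = 8          # assumption 6)

COMPLEX_ADD = 2 * REAL_ADD                     # assumption 5)
COMPLEX_MUL = 4 * REAL_MUL + 2 * REAL_ADD      # assumption 5)
COMPLEX_DIV = COMPLEX_MUL + 2 * REAL_DIV       # assumption 4)

def matmul_ops(p, q, r):
    """Real operations for a (p x q) by (q x r) complex product, assumption 2)."""
    return p * q * r * (COMPLEX_MUL + COMPLEX_ADD)

def inverse_ops(p):
    """Real operations for inverting a (p x p) complex matrix, assumption 3)."""
    adds = 2 * p**3 - 2 * p**2
    muls = 2 * p**3 - p**2
    divs = p**2
    return adds * COMPLEX_ADD + muls * COMPLEX_MUL + divs * COMPLEX_DIV
```

Under these assumptions a complex multiplication costs 6 real operations, a complex division 22, and a 2×2 complex inversion 176.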

With these assumptions, we can count the number of operations required to generate the symbol likelihoods, i.e., the a posteriori likelihood in (3) for the MAP and the final beliefs in BP 2 and BP 3. We do not count the generation of the LLR for each bit from the symbol likelihood since it is the same for all detectors. The results are summarized in Table I, where we also show two examples, one with $M = 6$, $m = 2$, $\nu_1 = 4$, $\nu_2 = 6$ and the other with $M = 4$, $m = 4$, $\nu_1 = 4$, $\nu_2 = 6$, where $\nu_1$ and $\nu_2$ are the numbers of iterations for BP 2 and BP 3, respectively.

It is interesting to compare the complexity of the proposed schemes with the one in [26]. As analyzed for BP 2 and BP 3, the complexity can be considered separately for the preprocessing and the post decoding. For the latter, the complexity of the scheme in [26] should be the same as that of BP 2, though it would be more complex than that of BP 3 in our proposal. The main difference is in the preprocessing stage. The complexity of the preprocessing in [25] and [26] is much less than that of the proposed preprocessing, since it consists of only two matrix multiplications, i.e., $\mathbf{H}^H\mathbf{H}$ and $\mathbf{H}^H\mathbf{r}$, which require $M^3 + M^2$ complex multiplications and the same number of complex additions.

TABLE I. APPROXIMATE NUMBER OF OPERATIONS REQUIRED FOR DETECTION IN A SINGLE $M \times M$ MIMO CHANNEL (MODULATION SIZE OF $2^m$).

| Detector | Linear preprocessing | Post detection | $M=6$, $m=2$, $\nu_1=4$, $\nu_2=6$ | $M=4$, $m=4$, $\nu_1=4$, $\nu_2=6$ |
|---|---|---|---|---|
| MMSE | $24M^3 + 18M^2 + 2M$ | $6M \cdot 2^m$ | 5,988 | 2,216 |
| ML | 0 | $2^{mM} \cdot (8M^2 + 9M)$ | 1,454,080 | 11,337,728 |
| BP2 | $16M^4 + 60M^3 + 437M^2 - 486M$ | $[2^{2m} \cdot (2\nu_1 + 21) + 2^m \nu_1] \cdot M(M-1)$ | 61,032 | 102,648 |
| BP3 | $56M^3 + 113M^2 + 914M$ | $2^{2m} \cdot (2\nu_2 + 21) \cdot 2M$ | 27,984 | 76,632 |

Parameters $\nu_1$ and $\nu_2$ are the numbers of iterations for BP 2 and BP 3, respectively.
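As a quick sanity check of how the example columns of Table I relate to the formulas, the sketch below evaluates the MMSE and BP3 rows. It assumes that each example entry is the sum of the linear preprocessing and post detection counts; the function names are hypothetical.

```python
def mmse_ops(M, m):
    """Total operation count of the MMSE row of Table I."""
    pre = 24 * M**3 + 18 * M**2 + 2 * M
    post = 6 * M * 2**m
    return pre + post

def bp3_ops(M, m, nu2):
    """Total operation count of the BP3 row of Table I (nu2 iterations)."""
    pre = 56 * M**3 + 113 * M**2 + 914 * M
    post = 2**(2 * m) * (2 * nu2 + 21) * 2 * M
    return pre + post

print(mmse_ops(6, 2), mmse_ops(4, 4))        # 5988 2216
print(bp3_ops(6, 2, 6), bp3_ops(4, 4, 6))    # 27984 76632
```

For these two rows, the sums agree with the tabulated example values.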

IV. MESSAGE PASSING WITH GAUSSIAN INPUT

In Section III, we developed BP algorithms run over the pair-wise bipartite graphs for non-Gaussian messages. The Gaussian assumption on the $x_j$'s was employed only to obtain the translation function in (24), whereas exact marginalization was used in the message translation step. In this section, we further simplify the message passing rule by extending the Gaussian assumption to the message translation step, as was done in [17], [18], [20], to obtain the Gaussian BP over the two graphical models under consideration.

ML detection with Gaussian input: With independent and identically distributed Gaussian input, $p(\mathbf{x}) = \prod_{j=1}^{M} \mathcal{CN}(x_j; 0, 1)$, the MAP detector in (3) becomes

$$p(x_j | \mathbf{y}) = A \cdot \int \cdots \int \mathcal{CN}(\mathbf{y}; \mathbf{H}\mathbf{x}, \sigma^2) \prod_{k \neq j} \mathcal{CN}(x_k; 0, 1) \cdot d\mathbf{x}\backslash x_j = \mathcal{CN}\left(x_j; \mathbf{h}_j^H \mathbf{K}^{-1}\mathbf{y},\; 1 - \mathbf{h}_j^H \mathbf{K}^{-1}\mathbf{h}_j\right) \tag{30}$$

where $A$ is an appropriately selected normalization constant and the covariance matrix $\mathbf{K}$ is given by $\mathbf{K} = \mathbf{H}\mathbf{H}^H + \sigma^2\mathbf{I}$. Note that, in (30), the mean is the linear MMSE estimate of $x_j$ and the variance is the corresponding minimum MSE, i.e.,

$$\hat{x}_j = \mathbf{h}_j^H \mathbf{K}^{-1}\mathbf{y} \tag{31}$$

$$\mathrm{MMSE}_j = 1 - \mathbf{h}_j^H \mathbf{K}^{-1}\mathbf{h}_j. \tag{32}$$

This means that linear MMSE estimation is optimum for Gaussian input, while this does not hold for non-Gaussian input.
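Equations (31) and (32) can be evaluated numerically as in the following NumPy sketch. The function name `lmmse` and the unit-variance input assumption are illustrative; $\mathbf{K} = \mathbf{H}\mathbf{H}^H + \sigma^2\mathbf{I}$ is the covariance of $\mathbf{y}$ under that assumption.

```python
import numpy as np

def lmmse(H, y, sigma2):
    """Linear MMSE estimates (31) and per-symbol MMSE (32).

    H is the N x M channel matrix (columns h_j), y the received vector,
    sigma2 the noise variance; unit-variance inputs are assumed.
    """
    N, M = H.shape
    K = H @ H.conj().T + sigma2 * np.eye(N)          # K = H H^H + sigma^2 I
    Kinv_y = np.linalg.solve(K, y)
    Kinv_H = np.linalg.solve(K, H)
    x_hat = H.conj().T @ Kinv_y                      # (31): h_j^H K^{-1} y
    # (32): 1 - h_j^H K^{-1} h_j, computed column by column.
    mmse = 1.0 - np.real(np.sum(H.conj() * Kinv_H, axis=0))
    return x_hat, mmse
```

By the push-through identity, `x_hat` coincides with the more familiar form $(\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^H\mathbf{y}$, and each entry of `mmse` lies strictly between 0 and 1 for $\sigma^2 > 0$.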

A. Gaussian BP over the proposed pair-wise graphs

Assuming that the $x_j$'s are Gaussian and the distributions $\pi_{i\to j}(x_j)$ and $b(x_i)$ are all Gaussian PDFs, they can be characterized by their mean and variance only. This means the messages $\pi_{i\to j}(x_j)$ and the belief $b(x_i)$ in BP 2 and 3 can be replaced with an update rule for the mean and variance pair. Since the Gaussian BP corresponding to BP 1 over the fully connected pair-wise graph in (5)-(7) has already been discussed in [17], we consider here only BP 2 and 3 over the two pair-wise graphical models.

Let us denote the mean and variance pairs of the complex Gaussian PDFs $\pi_{i\to j}(x_j)$ and $b(x_i)$ as $(\mu_{\pi,i\to j}, \sigma^2_{\pi,i\to j})$ and $(\mu_i, \sigma^2_i)$. Then, BP 2 and 3 under the Gaussian input assumption can be rewritten as follows (detailed derivations are shown in the appendix):

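Gaussian BP 2G over the fully-connected pair-wise graph

Given the messages in the previous iteration (or the initial messages), $(\mu_{\pi,i\to j}, \sigma^2_{\pi,i\to j})$ $\forall (i, j): i \neq j$, they are recursively updated by

$$\sigma^2_{\pi,i\to j} = \frac{1}{1 + \sigma^2_{j|i}} + \frac{|a_{j|i,i}|^2}{(1 + \sigma^2_{j|i})^2} \cdot \left( \sum_{k \in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i} \right)^{-1} \tag{33}$$

$$\mu_{\pi,i\to j} = \frac{y'_{j|i}}{1 + \sigma^2_{j|i}} - \frac{a_{j|i,i}}{1 + \sigma^2_{j|i}} \cdot \frac{\sum_{k \in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}\, \mu_{\pi,k\to i}}{\sum_{k \in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}} \tag{34}$$

After a number of iterations of the above, the final belief on $x_i$ is obtained by

$$\sigma^{-2}_i = \sum_{k \in V(i)} \sigma^{-2}_{\pi,k\to i} \tag{35}$$

$$\mu_i = \frac{\sum_{k \in V(i)} \sigma^{-2}_{\pi,k\to i}\, \mu_{\pi,k\to i}}{\sum_{k \in V(i)} \sigma^{-2}_{\pi,k\to i}}. \tag{36}$$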
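One synchronous sweep of (33)-(34) and the belief rule (35)-(36) can be sketched as below. The nested-list layout, the function names, and the synchronous schedule are assumptions made for illustration; the quantities $y'_{j|i}$, $\sigma^2_{j|i}$, and $a_{j|i,i}$ are taken as given inputs from the preprocessing in (16)-(20).

```python
def gbp2g_step(y_p, s2, a, mu, var):
    """One synchronous update of (33)-(34); requires M >= 3.

    y_p[j][i] = y'_{j|i}, s2[j][i] = sigma^2_{j|i}, a[j][i] = a_{j|i,i}
    (hypothetical M x M nested lists indexed "j given i").
    mu[i][j] and var[i][j] hold the mean and variance of message i -> j.
    """
    M = len(mu)
    new_mu = [row[:] for row in mu]
    new_var = [row[:] for row in var]
    for i in range(M):
        for j in range(M):
            if i == j:
                continue
            others = [k for k in range(M) if k not in (i, j)]   # V(i) \ j
            prec = sum(1.0 / var[k][i] for k in others)
            mean = sum(mu[k][i] / var[k][i] for k in others) / prec
            d = 1.0 + s2[j][i]
            new_var[i][j] = 1.0 / d + abs(a[j][i]) ** 2 / d**2 / prec   # (33)
            new_mu[i][j] = y_p[j][i] / d - a[j][i] / d * mean           # (34)
    return new_mu, new_var

def gbp2g_belief(mu, var):
    """Final belief (35)-(36): list of (mean, variance) per node."""
    M = len(mu)
    out = []
    for i in range(M):
        prec = sum(1.0 / var[k][i] for k in range(M) if k != i)
        mean = sum(mu[k][i] / var[k][i] for k in range(M) if k != i) / prec
        out.append((mean, 1.0 / prec))
    return out
```

The extrinsic combination is a precision-weighted average, so each message update costs $O(M)$ scalar operations instead of a sum over the symbol alphabet.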

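Gaussian BP 3G over the ring-type pair-wise graph (Gaussian forward-backward recursion)

Given the messages in the previous iteration, $(\mu_{\pi,i\to i\pm 1}, \sigma^2_{\pi,i\to i\pm 1})$ $\forall i$, they are recursively updated by

$$\sigma^2_{\pi,i\to i\pm 1} = \frac{1}{1 + \sigma^2_{i\pm 1|i}} + \frac{|a_{i\pm 1|i,i}|^2}{(1 + \sigma^2_{i\pm 1|i})^2} \cdot \sigma^2_{\pi,i\mp 1\to i} \tag{37}$$

$$\mu_{\pi,i\to i\pm 1} = \frac{1}{1 + \sigma^2_{i\pm 1|i}}\, y'_{i\pm 1|i} - \frac{a_{i\pm 1|i,i}}{1 + \sigma^2_{i\pm 1|i}} \cdot \mu_{\pi,i\mp 1\to i} \tag{38}$$

After a number of iterations of the above, the final belief on $x_i$ is obtained by

$$\sigma^{-2}_i = \sigma^{-2}_{\pi,i+1\to i} + \sigma^{-2}_{\pi,i-1\to i} \tag{39}$$

$$\mu_i = \frac{\sigma^{-2}_{\pi,i+1\to i}\, \mu_{\pi,i+1\to i} + \sigma^{-2}_{\pi,i-1\to i}\, \mu_{\pi,i-1\to i}}{\sigma^{-2}_{\pi,i+1\to i} + \sigma^{-2}_{\pi,i-1\to i}}. \tag{40}$$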

In particular, in the Gaussian BP 3G, we observe the following:

1) The variance and mean are updated separately (except in the final belief).

2) In (37) and (38), there are two separate message flows; one is the forward flow from $i$ to $i+1$ and the other is the backward flow from $i$ to $i-1$.


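3) Eq. (38) can be rewritten as

$$\text{Forward recursion:}\quad \mu_{\pi,i\to i+1} = F_i \circ \mu_{\pi,i-1\to i} \tag{41}$$

$$\text{Backward recursion:}\quad \mu_{\pi,i\to i-1} = B_i \circ \mu_{\pi,i+1\to i} \tag{42}$$

where the operations $F_i$ and $B_i$ are first-order (affine) functions defined as

$$F_i \circ \mu \equiv u_{i+1,i} + v_{i+1,i} \cdot \mu \tag{43}$$

$$B_i \circ \mu \equiv u_{i-1,i} + v_{i-1,i} \cdot \mu \tag{44}$$

with

$$u_{j,i} = \frac{y'_{j|i}}{1 + \sigma^2_{j|i}} = \frac{\mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,i\}} \mathbf{y}}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,i\}} \mathbf{h}_j} = \mathbf{h}_j^H \mathbf{K}^{-1}_{\{i\}} \mathbf{y} \tag{45}$$

$$v_{j,i} = \frac{-a_{j|i,i}}{1 + \sigma^2_{j|i}} = \frac{-\mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,i\}} \mathbf{h}_i}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,i\}} \mathbf{h}_j} = -\mathbf{h}_j^H \mathbf{K}^{-1}_{\{i\}} \mathbf{h}_i. \tag{46}$$

Here, we used (17)-(20) and, in the last step, the matrix inversion lemma

$$(\mathbf{A} + \mathbf{B}\mathbf{B}^H)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}(\mathbf{I} + \mathbf{B}^H \mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{B}^H \mathbf{A}^{-1}.$$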

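4) Similar to the means, (37) can also be rewritten as

$$\text{Forward recursion:}\quad \sigma^2_{\pi,i\to i+1} = F'_i \circ \sigma^2_{\pi,i-1\to i} \tag{47}$$

$$\text{Backward recursion:}\quad \sigma^2_{\pi,i\to i-1} = B'_i \circ \sigma^2_{\pi,i+1\to i} \tag{48}$$

where

$$F'_i \circ \mu \equiv u'_{i+1,i} + v'_{i+1,i} \cdot \mu \tag{49}$$

$$B'_i \circ \mu \equiv u'_{i-1,i} + v'_{i-1,i} \cdot \mu \tag{50}$$

with

$$u'_{j,i} = \frac{1}{1 + \sigma^2_{j|i}} = \frac{1}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,i\}} \mathbf{h}_j} \tag{51}$$

$$v'_{j,i} = \frac{|a_{j|i,i}|^2}{(1 + \sigma^2_{j|i})^2} = \left| \frac{\mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,i\}} \mathbf{h}_i}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,i\}} \mathbf{h}_j} \right|^2 = \left| \mathbf{h}_j^H \mathbf{K}^{-1}_{\{i\}} \mathbf{h}_i \right|^2. \tag{52}$$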
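The forward mean recursion (41) with the coefficients of (45)-(46) can be sketched in NumPy as follows. The sketch assumes that $\mathbf{K}_{\{i\}}$ denotes $\mathbf{K}$ with the contribution $\mathbf{h}_i\mathbf{h}_i^H$ of antenna $i$ removed, which is what the matrix inversion lemma step implies; the function names and the sequential schedule around the ring are illustrative choices.

```python
import numpy as np

def ring_coefficients(H, y, sigma2):
    """Return u[j, i] and v[j, i] of (45)-(46) for all pairs j != i."""
    N, M = H.shape
    K = H @ H.conj().T + sigma2 * np.eye(N)
    u = np.zeros((M, M), dtype=complex)
    v = np.zeros((M, M), dtype=complex)
    for i in range(M):
        h_i = H[:, i]
        # Assumed meaning of K_{i}: K without the term h_i h_i^H.
        K_i = K - np.outer(h_i, h_i.conj())
        for j in range(M):
            if j != i:
                # w holds the row vector h_j^H K_{i}^{-1} (K_{i} is Hermitian).
                w = np.linalg.solve(K_i, H[:, j]).conj()
                u[j, i] = w @ y            # (45)
                v[j, i] = -(w @ h_i)       # (46)
    return u, v

def forward_means(u, v, n_turns):
    """Circulate the forward message n_turns times around the ring, (41)."""
    M = u.shape[0]
    m = np.zeros(M, dtype=complex)         # m[i]: incoming forward mean for x_i
    for _ in range(n_turns):
        for i in range(M):
            nxt = (i + 1) % M
            m[nxt] = u[nxt, i] + v[nxt, i] * m[i]
    return m
```

The backward recursion (42) is obtained in the same way by replacing `(i + 1) % M` with `(i - 1) % M`.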

B. Convergence of Gaussian BP

Regarding the convergence of Gaussian BP, it was previously shown in [30] that Gaussian BP for arbitrary topology converges to the correct mean (see also [31]). It was shown in [17] that the Gaussian BP over the bipartite graph in Fig. 1(b) yields the linear MMSE solution when it converges, even though its convergence is not assured. Based on these findings, we can conjecture that, for the Gaussian BP of both rules 2G and 3G, the mean converges to the linear MMSE solution, as also verified by simulations in the next section. One way to prove the convergence would be to use the idea of the "unwrapped tree" presented in [30]. In our case, however, this would be a tedious derivation. Therefore, we try an alternative approach that works for GBP 3G, but not yet for GBP 2G. Note, however, that the derivation here differs from [17], [18] in the underlying graphical model and the translation function used. The objective in this subsection is to prove the following theorem.

Theorem 1: In the Gaussian BP 3G over the ring-type pair-wise graph, the mean converges to the linear MMSE estimate (31) for non-zero noise power as the number of iterations approaches infinity.

The proof is based on the following lemmas.

Lemma 2: For an arbitrary initial value $\mu(0)$, both the forward and backward recursions for the mean in (38) converge, respectively, to a unique fixed point.

Proof: Define one iteration as one complete turn of message passing along the ring and consider, without loss of generality, the message at Node 1. Based on observations 1) through 3) in the previous subsection, we obtain the recursive relations for Node 1, i.e., using an arbitrary initial value $\mu(0)$, we have

$$\mu_{\pi,1\to 2}(n) = (F_M \circ \cdots F_3 \circ F_2 \circ F_1 \circ)\, \mu_{\pi,1\to 2}(n-1) = (F_M \circ \cdots F_3 \circ F_2 \circ F_1 \circ)^n \mu(0) \tag{53}$$

$$\mu_{\pi,1\to M}(n) = (B_2 \circ B_3 \circ \cdots B_M \circ B_1 \circ)\, \mu_{\pi,1\to M}(n-1) = (B_2 \circ B_3 \circ \cdots B_M \circ B_1 \circ)^n \mu(0) \tag{54}$$

where $n$ is the iteration number and the collective operations for one iteration of the forward/backward recursion are given, respectively, by

$$F_{1,T} \circ \mu = F_M \circ \cdots F_3 \circ F_2 \circ F_1 \circ \mu = f_{1,U} + f_{1,V}\, \mu \tag{55}$$

$$B_{1,T} \circ \mu = B_2 \circ B_3 \circ \cdots B_M \circ B_1 \circ \mu = b_{1,U} + b_{1,V}\, \mu \tag{56}$$

for some constants $f_{1,U}$, $f_{1,V}$, $b_{1,U}$, and $b_{1,V}$, which, in turn, are polynomials in the $u_{j,i}$ and $v_{j,i}$ of (43) and (44). For example, for $M = 4$ we have

$$F_4 \circ F_3 \circ F_2 \circ F_1 \circ \mu = (u_{1,4} + v_{1,4}u_{4,3} + v_{1,4}v_{4,3}u_{3,2} + v_{1,4}v_{4,3}v_{3,2}u_{2,1}) + (v_{1,4}v_{4,3}v_{3,2}v_{2,1}) \cdot \mu$$

$$B_2 \circ B_3 \circ B_4 \circ B_1 \circ \mu = (u_{1,2} + v_{1,2}u_{2,3} + v_{1,2}v_{2,3}u_{3,4} + v_{1,2}v_{2,3}v_{3,4}u_{4,1}) + (v_{1,2}v_{2,3}v_{3,4}v_{4,1}) \cdot \mu$$

for which

$$f_{1,U} = u_{1,4} + v_{1,4}u_{4,3} + v_{1,4}v_{4,3}u_{3,2} + v_{1,4}v_{4,3}v_{3,2}u_{2,1}$$

$$f_{1,V} = v_{1,4}v_{4,3}v_{3,2}v_{2,1}$$

$$b_{1,U} = u_{1,2} + v_{1,2}u_{2,3} + v_{1,2}v_{2,3}u_{3,4} + v_{1,2}v_{2,3}v_{3,4}u_{4,1}$$

$$b_{1,V} = v_{1,2}v_{2,3}v_{3,4}v_{4,1}.$$

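Here, we can show that $f_{1,V}$ and $b_{1,V}$ are given, respectively, by

$$f_{1,V} = \prod_{j=1}^{M} v_{j,j-1} \quad \text{and} \quad b_{1,V} = \prod_{j=1}^{M} v_{j,j+1}. \tag{57}$$

On the other hand, using (55) and (56), (53) and (54) become

$$\mu_{\pi,1\to 2}(n) = (F_{1,T} \circ)^n \mu(0) = f_{1,U} \cdot \sum_{k=0}^{n-1} f_{1,V}^k + f_{1,V}^n \cdot \mu(0) \tag{58}$$

$$\mu_{\pi,1\to M}(n) = (B_{1,T} \circ)^n \mu(0) = b_{1,U} \cdot \sum_{k=0}^{n-1} b_{1,V}^k + b_{1,V}^n \cdot \mu(0) \tag{59}$$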


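where, from the fact to be proved in the next lemma that $|f_{i,V}| < 1$ and $|b_{i,V}| < 1$, we have

$$f_{1,V}^n \cdot \mu(0) \to 0, \quad b_{1,V}^n \cdot \mu(0) \to 0 \quad \text{as } n \to \infty.$$

Therefore, the unique fixed point of the mean in GBP 3G is given by

$$\lim_{n\to\infty} \mu_{\pi,1\to 2}(n) = f_{1,U} \cdot \sum_{k=0}^{\infty} f_{1,V}^k = \frac{f_{1,U}}{1 - f_{1,V}} \tag{60}$$

$$\lim_{n\to\infty} \mu_{\pi,1\to M}(n) = b_{1,U} \cdot \sum_{k=0}^{\infty} b_{1,V}^k = \frac{b_{1,U}}{1 - b_{1,V}}. \tag{61}$$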
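The argument of Lemma 2 is that a full turn around the ring is a single affine map $\mu \mapsto U + V\mu$ whose repeated application converges to $U/(1-V)$ when $|V| < 1$. A minimal sketch with made-up scalar coefficients (not derived from any channel) is:

```python
def compose(maps):
    """Compose affine maps mu -> u + v*mu, applied in list order.

    Returns (U, V) such that the composition is mu -> U + V*mu.
    """
    U, V = 0.0, 1.0
    for u, v in maps:
        U, V = u + v * U, v * V
    return U, V

def fixed_point(U, V):
    """Limit of the geometric series in (60)-(61); requires |V| < 1."""
    return U / (1.0 - V)

def iterate(U, V, mu0, n):
    """Apply mu -> U + V*mu n times, as in (58)-(59)."""
    mu = mu0
    for _ in range(n):
        mu = U + V * mu
    return mu

# Two illustrative maps: F1(mu) = 1 + 0.5*mu, then F2(mu) = 2 + 0.5*mu.
U, V = compose([(1.0, 0.5), (2.0, 0.5)])
print(U, V)   # 2.5 0.25
```

Starting from any initial value, `iterate(U, V, mu0, n)` approaches `fixed_point(U, V)` geometrically at rate $|V|$, which is the product of the individual slopes as in (57).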

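Lemma 3: $|f_{i,V}| = |\prod_{j=1}^{M} v_{j,j-1}| < 1$ and $|b_{i,V}| = |\prod_{j=1}^{M} v_{j,j+1}| < 1$ for all $i$.

Proof: By plugging (46) into (57), we have, for all $i$,

$$\begin{aligned}
|f_{i,V}| &= \left| \prod_{j=1}^{M} \frac{\mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j-1\}} \mathbf{h}_{j-1}}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j-1\}} \mathbf{h}_j} \right| = \left| \prod_{j=1}^{M} \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j-1\}} \mathbf{h}_{j-1} \right| \\
&= \left| \mathbf{h}_1^H \mathbf{K}^{-1}_{\{M\}} \mathbf{h}_M \mathbf{h}_M^H \mathbf{K}^{-1}_{\{M-1\}} \cdots \mathbf{K}^{-1}_{\{2\}} \mathbf{h}_2 \mathbf{h}_2^H \mathbf{K}^{-1}_{\{1\}} \mathbf{h}_1 \right| \\
&\overset{(a)}{=} \left| \mathrm{tr}\left( \mathbf{h}_1 \mathbf{h}_1^H \mathbf{K}^{-1}_{\{M\}} \mathbf{h}_M \mathbf{h}_M^H \mathbf{K}^{-1}_{\{M-1\}} \cdots \mathbf{K}^{-1}_{\{2\}} \mathbf{h}_2 \mathbf{h}_2^H \mathbf{K}^{-1}_{\{1\}} \right) \right| \\
&\overset{(b)}{\leq} \left| \prod_{j=1}^{M} \mathrm{tr}\left( \mathbf{h}_j \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j-1\}} \right) \right| = \left| \prod_{j=1}^{M} \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j-1\}} \mathbf{h}_j \right| \\
&\overset{(c)}{=} \prod_{j=1}^{M} \frac{\mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j-1\}} \mathbf{h}_j}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j-1\}} \mathbf{h}_j} = \prod_{j=1}^{M} \frac{\sigma^2_{j|j-1}}{1 + \sigma^2_{j|j-1}} < 1
\end{aligned}$$

where (a) follows from the fact that $\mathbf{a}^H\mathbf{b} = \mathrm{tr}(\mathbf{b}\mathbf{a}^H)$ for arbitrary vectors $\mathbf{a}$ and $\mathbf{b}$, and (b) results from $\mathrm{tr}(\mathbf{A}\mathbf{B}) \leq \mathrm{tr}(\mathbf{A})\,\mathrm{tr}(\mathbf{B})$ for arbitrary non-negative definite matrices $\mathbf{A}$ and $\mathbf{B}$. Also, (c) follows from the matrix inversion lemma, i.e.,

$$\begin{aligned}
\mathbf{h}_j^H \mathbf{K}^{-1}_{\{j+1\}} &= \mathbf{h}_j^H \left( \mathbf{K}_{\{j,j+1\}} + \mathbf{h}_j \mathbf{h}_j^H \right)^{-1} \\
&= \left( 1 - \frac{\mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j+1\}} \mathbf{h}_j}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j+1\}} \mathbf{h}_j} \right) \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j+1\}} \\
&= \left( \frac{1}{1 + \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j+1\}} \mathbf{h}_j} \right) \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j,j+1\}}.
\end{aligned}$$

For the backward recursion, $|b_{i,V}| < 1$ can be proved in a similar way.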

In (58) and (59), we see that the convergence rate depends on

$$\left| \prod_{j=1}^{M} \mathbf{h}_j^H \mathbf{K}^{-1}_{\{j-1\}} \mathbf{h}_{j-1} \right| \leq \prod_{j=1}^{M} \frac{\sigma^2_{j|j-1}}{1 + \sigma^2_{j|j-1}} < 1$$

which is similar to the result in [17]. Note that $\mathbf{h}_i^H \mathbf{K}^{-1}_{\{i-1\}} \mathbf{h}_{i-1}$ reflects the channel correlation between neighboring antennas.

On the other hand, the operations $F_i$ and $B_i$ do not commute, so that $F_i \circ F_j \circ \mu$ and $F_j \circ F_i \circ \mu$ may be different, and so are $F_{j,T} \circ \mu$ and $F_{i,T} \circ \mu$ for $j \neq i$. That is, the fixed point for each node may differ from one another.

The following two lemmas show that the fixed points in (60) and (61) are both equal to the MMSE estimate in (31).

Lemma 4: In the forward recursion, $\mu_{\pi,i\to i+1}(n)$ is the linear MMSE estimate of $x_{i+1}$ provided that the previous message, $\mu_{\pi,i-1\to i}(n)$, is the linear MMSE estimate of $x_i$. Likewise, in the backward recursion, $\mu_{\pi,i\to i-1}(n)$ is the linear MMSE estimate of $x_{i-1}$ provided that $\mu_{\pi,i+1\to i}(n)$ is the linear MMSE estimate of $x_i$.

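Proof: With $\mathbf{c}_j = \mathbf{K}^{-1}\mathbf{h}_j$, the linear MMSE estimate of $x_i$ is given by $\mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y}$. Hence, the proof is to show from (41) and (42) that

$$\mathbf{h}_{i+1}^H \mathbf{K}^{-1}\mathbf{y} = F_i \circ (\mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y}) = u_{i+1,i} + v_{i+1,i} \cdot (\mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y}) \tag{62}$$

$$\mathbf{h}_{i-1}^H \mathbf{K}^{-1}\mathbf{y} = B_i \circ (\mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y}) = u_{i-1,i} + v_{i-1,i} \cdot (\mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y})$$

where $u_{j,i}$ and $v_{j,i}$ are given by (45) and (46). Plugging these into the right-hand side of (62) for the forward recursion, we finally have

$$\begin{aligned}
u_{i+1,i} + v_{i+1,i} \cdot (\mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y})
&= \mathbf{h}_{i+1}^H \mathbf{K}^{-1}_{\{i\}} \mathbf{y} - \mathbf{h}_{i+1}^H \mathbf{K}^{-1}_{\{i\}} \mathbf{h}_i \mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y} \\
&= \mathbf{h}_{i+1}^H \left( \mathbf{K}^{-1}_{\{i\}} - \mathbf{K}^{-1}_{\{i\}} \mathbf{h}_i \mathbf{h}_i^H \mathbf{K}^{-1} \right) \mathbf{y} \\
&= \mathbf{h}_{i+1}^H \left( \mathbf{K}^{-1}_{\{i\}} - \mathbf{K}^{-1}_{\{i\}} \mathbf{h}_i \frac{\mathbf{h}_i^H \mathbf{K}^{-1}_{\{i\}}}{1 + \mathbf{h}_i^H \mathbf{K}^{-1}_{\{i\}} \mathbf{h}_i} \right) \mathbf{y} \\
&= \mathbf{h}_{i+1}^H \mathbf{K}^{-1}\mathbf{y}.
\end{aligned}$$

Similarly, for the backward recursion, we obtain

$$u_{i-1,i} + v_{i-1,i} \cdot (\mathbf{h}_i^H \mathbf{K}^{-1}\mathbf{y}) = \mathbf{h}_{i-1}^H \mathbf{K}^{-1}\mathbf{y}.$$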

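Lemma 5: Both the fixed points, $\frac{f_{j,U}}{1 - f_{j,V}}$ and $\frac{b_{j,U}}{1 - b_{j,V}}$, are equal to the MMSE estimate of $x_j$, i.e., $\mathbf{h}_j^H \mathbf{K}^{-1}\mathbf{y}$.

Proof: Without loss of generality, let us consider the first data symbol, $x_1$. Starting from $\mathbf{h}_2^H \mathbf{K}^{-1}\mathbf{y} = F_1 \circ (\mathbf{h}_1^H \mathbf{K}^{-1}\mathbf{y})$, we can successively apply the operations $F_2 \circ, F_3 \circ, \ldots, F_M \circ$ to finally obtain

$$\begin{aligned}
F_M \circ F_{M-1} \circ \cdots \circ F_2 \circ F_1 \circ (\mathbf{h}_1^H \mathbf{K}^{-1}\mathbf{y})
&= F_{1,T} \circ (\mathbf{h}_1^H \mathbf{K}^{-1}\mathbf{y}) \\
&= f_{1,U} + f_{1,V} \cdot (\mathbf{h}_1^H \mathbf{K}^{-1}\mathbf{y}) \\
&= (\mathbf{h}_1^H \mathbf{K}^{-1}\mathbf{y})
\end{aligned}$$

where the first and second equalities are obtained from the definitions of $F_{1,T} \circ$ and $(f_{1,U}, f_{1,V})$, respectively, and the last equality is from Lemma 4, i.e., $\mathbf{h}_1^H \mathbf{K}^{-1}\mathbf{y} = F_M \circ (\mathbf{h}_M^H \mathbf{K}^{-1}\mathbf{y})$. From the last equality, we obtain $\frac{f_{j,U}}{1 - f_{j,V}} = \mathbf{h}_j^H \mathbf{K}^{-1}\mathbf{y}$ and, using a similar derivation, we can also prove that $\frac{b_{j,U}}{1 - b_{j,V}} = \mathbf{h}_j^H \mathbf{K}^{-1}\mathbf{y}$.

Proof of Theorem 1: The proof of Theorem 1 is now obvious from the above lemmas, i.e., from Lemma 2 the mean in the Gaussian BP over the ring-type graphical model converges to a unique fixed point, and Lemma 5 shows that the fixed point of the mean is equal to the linear MMSE estimate in (31).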

Note that Theorem 1 holds for any channel matrix if the noise variance is not zero since, for $\sigma^2 > 0$, the covariance matrices $\mathbf{K}_{\{i,j\}}$ in (16) are always invertible, so that the MMSE estimator in (17) and the translation functions in (24) exist for all pairs $(i, j)$.

Since the message-update rules for the variance in (47) and (48) have the same form as those in (41) and (42), we can also prove the convergence of the variance in GBP 3G, which can be summarized by the following theorem.

Theorem 6: In the Gaussian BP 3G over the ring-type pair-wise graph, both the forward and backward recursions for the variance converge to the MMSE in (32) as the number of iterations approaches infinity.

This can be proved similarly to Theorem 1. Since the forward and backward recursions for the variance have the same form as those for the mean, it is obvious that they converge to a unique fixed point, similar to Lemmas 2 and 3. Furthermore, one can also check that if the input variance is the MMSE of $\hat{x}_j$, then the output variance is the MMSE of $\hat{x}_{j\pm 1}$. One thing to note is that, since the forward and backward recursions on the variance both converge to the MMSE in (32), their combination in (39) is half of the MMSE.
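The variance recursion (47) and the combination rule (39) can be illustrated numerically as follows. As before, this is a sketch under the assumption that $\mathbf{K}_{\{i\}}$ and $\mathbf{K}_{\{j,i\}}$ denote $\mathbf{K}$ with the rank-one contributions of the listed antennas removed; the function names are hypothetical.

```python
import numpy as np

def variance_coefficients(H, sigma2):
    """Return u'[j, i] and v'[j, i] of (51)-(52) for all pairs j != i."""
    N, M = H.shape
    K = H @ H.conj().T + sigma2 * np.eye(N)
    up = np.zeros((M, M))
    vp = np.zeros((M, M))
    for i in range(M):
        h_i = H[:, i]
        K_i = K - np.outer(h_i, h_i.conj())              # assumed K_{i}
        for j in range(M):
            if j == i:
                continue
            h_j = H[:, j]
            K_ji = K_i - np.outer(h_j, h_j.conj())       # assumed K_{j,i}
            s = np.real(h_j.conj() @ np.linalg.solve(K_ji, h_j))
            up[j, i] = 1.0 / (1.0 + s)                   # (51)
            w = np.linalg.solve(K_i, h_j).conj()         # h_j^H K_{i}^{-1}
            vp[j, i] = abs(w @ h_i) ** 2                 # (52)
    return up, vp

def forward_variances(up, vp, n_turns):
    """Circulate the forward variance n_turns times around the ring, (47)."""
    M = up.shape[0]
    s2 = np.ones(M)
    for _ in range(n_turns):
        for i in range(M):
            nxt = (i + 1) % M
            s2[nxt] = up[nxt, i] + vp[nxt, i] * s2[i]
    return s2

def belief_variance(fwd, bwd):
    """Combine forward and backward variances as in (39)."""
    return 1.0 / (1.0 / fwd + 1.0 / bwd)
```

When both recursions have converged to the same per-symbol MMSE, `belief_variance` returns exactly half of it, which is the effect noted above.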

It is also worth comparing GBP 2G and 3G proposed in this paper with the Gaussian BP in [17], [18], and [20], all of which are based on the direct decomposition of the Gaussian PDF and, as noted in [20], are the same algorithm. The comparison can be made in two aspects, i.e., complexity and convergence. In complexity, the Gaussian BP in [17], [18], and [20] is much simpler than GBP 2G and 3G proposed here. Note that (1) the Gaussian BP in [17], [18], and [20] does not require preprocessing while GBP 2G and 3G in this paper do, and (2) the complexity of the post iteration for the former is the same as that of GBP 2G since they utilize the same graphical model, even though the post iteration of GBP 3G is a little less complex than that of GBP 2G. Based on this, the overall complexity of the proposed GBP 2G is higher than that of the schemes in [17], [18], and [20], even though this does not generally hold for GBP 3G. Now, let us consider their convergence. Basically, GBP 3G proposed in this paper and the Gaussian BP in [17], [18], and [20] result in an MMSE solution (in mean) if they converge, as proved here for GBP 3G and in [17] for Gaussian BP with the direct decomposition. This means that, once converged, they will perform the same. Unfortunately, the convergence of the Gaussian BP in [17], [18], and [20] is not assured, while GBP 3G is guaranteed to converge.

V. SIMULATION RESULTS

In this section, we present simulation results for the iterative algorithms with and without channel coding. For channel coding, we used the DVB-S2 LDPC code of rate 3/4 and length 64800 [32]. The performances of MAP and MMSE are also evaluated as references. In the transmitter, a block (48600 bits) of random information bits is generated first, then coded using the LDPC encoder, interleaved with a random interleaver, and modulated into a sequence of $2^m$-ary symbols. The symbol sequence is then divided into sub-blocks of $M$ symbols, each of which is fed to a transmit antenna, where $M$ corresponds to the number of transmit antennas. At the receiver, the sequence of received vectors is passed to the MIMO detector, which generates the estimates of symbol likelihoods and LLRs for each coded bit. The LLRs are then de-interleaved and decoded using a generic LDPC decoder. (In the transmitter and receiver, the interleaving/de-interleaving and channel coding/decoding are used only if channel coding is applied.) Note that no 'turbo principle' is applied since it is not our focus in this paper. This means that the LDPC decoding begins only after the inner iteration in the MIMO detector is finished. Regarding the MIMO channel, we generated, for each transmitted data vector, an independent and identically distributed (i.i.d.) MIMO channel matrix, of which each element is also an i.i.d. complex Gaussian random variable with mean 0 and variance 1. The resulting channel can be regarded as a fully interleaved frequency selective MIMO channel that can be seen on top of orthogonal frequency division multiplexing, especially for those channels where the transmission bandwidth is much larger than the channel coherence bandwidth.

Fig. 3. A comparison of bit error rate performance of MMSE, MAP and the proposed detectors as a function of SNR ($1/\sigma^2$); 4×4 antenna configuration, QPSK modulation, (a) No channel coding, (b) DVB-S2 LDPC code of rate 3/4 (length 64800).

Fig. 4. A comparison of bit error rate performances of MMSE, the proposed detectors and the detector in [26] with a damping factor 0.45; 4×4 antenna configuration, BPSK modulation, (a) No channel coding, (b) DVB-S2 LDPC code of rate 3/4 (length 64800).

Fig. 3 shows a comparison of bit error rate performance as a function of signal-to-noise ratio (SNR) ($1/\sigma^2$) for MAP, BP1 in [19], MMSE and the proposed BP-based detectors with the fully-connected and ring-type pair-wise models. The performance without channel coding is on the left and that with channel coding on the right. We set the number of iterations to 3 and 4 for BP2 and BP3, respectively, and 4 for the one in [19]. We use a 4×4 antenna configuration and QPSK modulation. Fig. 3 confirms that the proposed detector performs as well as the MAP, especially when using channel coding. In this case, the SNR gap between the proposed scheme and the MAP is shown to be around 0.1 and 0.3 dB, respectively.

Fig. 5. A comparison of bit error rate performance of MMSE, MAP and the proposed detectors as a function of SNR ($1/\sigma^2$); 6×6 antenna configuration, QPSK modulation, (a) No channel coding, (b) DVB-S2 LDPC code of rate 3/4 (length 64800).

Fig. 6. A comparison of bit error rate performance of MMSE, MAP, and the proposed detectors as a function of SNR ($1/\sigma^2$); 4×4 antenna configuration, 16QAM modulation, (a) No channel coding, (b) DVB-S2 LDPC code of rate 3/4 (length 64800).

It is also worth comparing the performance of BP2 and BP3 with the one in [26], where the pair-wise MRF obtained by the direct decomposition of the Gaussian PDF is used. The performance comparison is shown in Fig. 4 for BPSK modulation with and without channel coding.⁴ With channel coding, the performance of the one in [26] is almost the same as that of BP2. Without channel coding, however, it performs similarly to the proposed scheme only up to an SNR of 2 dB, after which it gets worse with higher SNR. This is consistent with the results in [26]. We also tried to obtain the results for higher order modulation. Unfortunately, however, the algorithm in [26] failed to work for higher order modulation, as also reported in [21], and this is one of the advantages of using the proposed scheme over the existing (fully-connected) MRF based MIMO detection. Fig. 5 shows the BER performance, with and without channel coding, for a 6×6 antenna configuration with QPSK modulation. With channel coding, the SNR gap between the proposed scheme and the MAP is now approximately 0.75 dB for the fully connected pair-wise graph and 1 dB for the ring-type one, respectively. Although the performance degradation compared to the MAP is larger than for a 4×4 antenna configuration, the SNR gain over the MMSE detector is around 3.5 dB.

⁴ The reason we consider here only a one-dimensional constellation like BPSK is that the algorithm in [26] is applicable only to real constellations and we wanted to use it as is, since any modification may cause unexpected results.

Fig. 7. Convergence property of the proposed algorithm with the number of antennas $M$ = 4, 6 and 8; QPSK modulation.

Fig. 8. Convergence property of the proposed algorithm with modulation size of $2^m$ for $m$ = 2, 3, 4 and 5; 4×4 antenna configuration.

In Fig. 6, BER performance with higher modulation order (16-QAM) is shown for a 4×4 antenna configuration. With channel coding, the SNR gap between the proposed method and the MAP is shown to be around 1 dB for BP2 over the fully connected pair-wise graph and 0.7 dB for BP3 over the ring-type graph, respectively. Note that, without channel coding, the performance of BP3 over the ring-type pair-wise graph is almost the same as that of BP2 over the fully-connected graph, while, with channel coding, the former is better than the latter. Here, we set the number of iterations of BP2 and BP3 to four and six, respectively. One possible reason why the fully-connected graph performs worse than the ring-type one for higher order QAM can be inferred from the convergence behavior to be discussed shortly.

In Figs. 3 to 6, the number of iterations was set based on the simulation results in Figs. 7 and 8, which we performed with different numbers of antennas and modulation sizes to give insight into how many iterations are required for satisfactory performance. To show the convergence behavior, we measured the cross entropy (also known as Kullback-Leibler distance) between the final beliefs and the transmitted data, rather than the BER. The reason we used the cross entropy is to take the impact of soft decisions into account. Let us consider the transmitted data bit, $d$. The cross entropy between the final belief and the transmitted data bit is given by

$$D(d) = \sum_{x \in \{0,1\}} q_d(x) \cdot \log\left( \frac{q_d(x)}{b_d(x)} \right)$$

where $q_d(x) = 1$ if $x = d$ and 0 otherwise, and the belief on $d$, $b_d(x)$, is obtained from its LLR, i.e.,

$$b_d(x) = \frac{\exp\left( \alpha \cdot \frac{LLR(d)}{2} \right)}{\exp\left( +\frac{LLR(d)}{2} \right) + \exp\left( -\frac{LLR(d)}{2} \right)}$$

with $\alpha = +1$ if $x = 0$ or $-1$ if $x = 1$. By convention, $0 \cdot \log(0/b) = 0$ and $1 \cdot \log(1/0) = \infty$ (replaced with a maximum value). The cross entropy provides a measure of how close the final belief obtained by the detection algorithm is to the actually transmitted bits, taking into account the impact of soft decisions as well. Figs. 7 and 8 show the convergence behavior measured in terms of cross entropy, where we chose the SNR values around the waterfall region for the DVB-S2 LDPC code such that the curves have similar values with even differences, so that readers can easily compare the convergence curves.⁵ As shown in the simulation results, the number of iterations required for convergence depends on the modulation size, but not much on the number of antennas. For BP3, we need more iterations for a higher modulation size. For BP2, the convergence behavior with different numbers of antennas looks similar to that of BP3 (specifically for QPSK), while it is quite different from that of BP3 with different modulation sizes. Specifically, the performance of BP2 over the fully connected graph does not improve with more than 3 or 4 iterations. Rather, it degrades, especially for higher order modulation. In BP3 over the ring-type graph, however, no degradation has been observed with more iterations. As mentioned previously, the condition for guaranteed convergence in a loopy graph is still an open problem, and the difference in the convergence behavior of BP2 and BP3 can only be explained by the note in [22], i.e., in a densely connected graph, the messages may circulate along the short loops, preventing eventual convergence. The fully connected pair-wise graph in Fig. 2(a) is more densely connected than the ring-type pair-wise graph in Fig. 2(b). Although the messages will propagate faster in a densely connected graph than in a sparsely connected graph, resulting in faster convergence, the message circulation may prevent eventual convergence with more iterations.
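The cross-entropy measure can be computed per bit as in the sketch below. Because $q_d(x)$ is non-zero only at $x = d$, the sum reduces to $-\log b_d(d)$; the function names and the cap value standing in for the "maximum value" are illustrative assumptions.

```python
import math

def bit_belief(llr, x):
    """Belief b_d(x) for x in {0, 1} obtained from the LLR of the bit."""
    alpha = 1.0 if x == 0 else -1.0
    return math.exp(alpha * llr / 2) / (math.exp(llr / 2) + math.exp(-llr / 2))

def cross_entropy(d, llr, cap=1e3):
    """D(d) for transmitted bit d; only the x = d term is non-zero.

    cap is a hypothetical stand-in for the maximum value that replaces
    1*log(1/0) = infinity.  Very large |llr| would overflow exp() here.
    """
    b = bit_belief(llr, d)
    if b <= 0.0:
        return cap
    return min(-math.log(b), cap)

print(cross_entropy(0, 0.0))   # log(2) for an uninformative LLR
```

A confident and correct soft decision gives a value near zero, while a confident but wrong one gives a value close to $|LLR(d)|$, which is how the measure penalizes soft-decision errors more than a hard BER count would.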

Another point to note is that, in BP3, one can accept a slight performance degradation in exchange for a large computational saving. As shown in Fig. 8 and implied by Fig. 6, at least 10 iterations are needed for eventual convergence for 16QAM. However, comparing the required SNR for, say, BER = $10^{-4}$, the difference between 6 and 12 iterations is less than 0.1 dB, while, in computational burden, 12 iterations cost twice as much as 6.

⁵ Note that the cross entropy is only a complementary measure and we used it only to determine the number of iterations required for convergence. Since there is no literature that directly relates the BER with channel coding to the cross entropy between the transmitted bits and the soft decisions, it seems risky to draw conclusions on BER performance based on the cross entropy measure.

Fig. 9. Bit error rate performance of the Gaussian BP over the fully-connected and ring-type pair-wise graph, respectively; 4×4 antenna configuration, QPSK modulation, no channel coding.

Fig. 9 shows the convergence behavior of the Gaussian BP discussed in Section IV. We plotted the bit error rate performance of the Gaussian BP over the fully-connected and ring-type pair-wise graphs, respectively, for various numbers of iterations. As can be seen in the figure, both GBP 2G and 3G converge to the performance of the linear MMSE detector, though they require many more iterations than BP2 and BP3. The only difference between GBP 2G and 3G is the rate of convergence. On the other hand, in the high SNR region, the performance appears to get worse with higher SNR. It should be noted, however, that at higher SNR the algorithm simply requires more iterations for eventual convergence.

VI. CONCLUSIONS

In this paper, low complexity, iterative MIMO detection algorithms were derived as message passing over pair-wise bipartite graphs with translation functions that are obtained by marginalizing the posterior joint probability density under the Gaussian input assumption. We investigated two models, the fully-connected and ring-type pair-wise graphs. The latter is shown to be an extension of the previous work in [9], [10]. The two pair-wise graphical models are rather sparse in the sense that the number of edges connected to an observation node, i.e., the edge degree, is only two and, thus, the message passing becomes much easier than that over the fully connected bipartite graph in [19].

We also investigated the proposed algorithm under the Gaussian input assumption. It was shown that, for the Gaussian BP over the ring-type pair-wise graph, the algorithm converges to the linear MMSE estimates. These results are in line with those in [17], [18], [30], [31]. Gaussian BP over the fully-connected pair-wise graph shows a faster convergence rate than Gaussian BP over the ring-type graph. As proved in this paper, the convergence of the Gaussian BP 3G over the ring-type graph is guaranteed. This does not, however, appear to be the case for non-Gaussian messages. The performance of BP2 for the non-Gaussian case degrades with more than four iterations. This phenomenon might stem from the short cycles in the graphical model and may be avoided by utilizing “global iteration” between MIMO detection and channel decoding. That is, by using an appropriate channel code and interleaver, message circulation along short cycles can be broken up for steady convergence and better performance. We leave this for our future work.

APPENDIX
DETAILED DERIVATIONS OF (24) AND THE GAUSSIAN BP

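To derive (24) and the Gaussian BP rules, (33)-(40), we use the following properties of the Gaussian PDF in [22]:

1) $\mathcal{CN}(x; \mu, \sigma^2) = \mathcal{CN}(\mu; x, \sigma^2) = \mathcal{CN}(x - \mu; 0, \sigma^2)$

2) $\mathcal{CN}(ax + b; \mu, \sigma^2) = \mathcal{CN}\left(x; \frac{\mu - b}{a}, \frac{\sigma^2}{|a|^2}\right)$

3) $\mathcal{CN}(x; \mu_1, \sigma_1^2) \cdot \mathcal{CN}(x; \mu_2, \sigma_2^2) = \mathcal{CN}\left(x; \frac{\sigma_1^{-2}\mu_1 + \sigma_2^{-2}\mu_2}{\sigma_1^{-2} + \sigma_2^{-2}}, \frac{1}{\sigma_1^{-2} + \sigma_2^{-2}}\right) \cdot \mathcal{CN}(\mu_1; \mu_2, \sigma_1^2 + \sigma_2^2)$

4) $\int \mathcal{CN}(x; \mu_1, \sigma_1^2) \cdot \mathcal{CN}(x; \mu_2, \sigma_2^2)\, dx = \mathcal{CN}(\mu_1; \mu_2, \sigma_1^2 + \sigma_2^2)$.

Property 2 holds up to a scaling factor that does not depend on $x$.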
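Properties 3 and 4 can be checked numerically. The following is a minimal sketch, assuming the circularly symmetric complex Gaussian density $\mathcal{CN}(x; \mu, \sigma^2) = \frac{1}{\pi\sigma^2}\exp\left(-|x - \mu|^2/\sigma^2\right)$. The helper names `cn_pdf` and `product_params`, the test values, and the integration grid are hypothetical and chosen only for illustration.

```python
import math

def cn_pdf(x, mu, var):
    """Assumed density: circularly symmetric complex Gaussian CN(x; mu, var)."""
    return math.exp(-abs(x - mu) ** 2 / var) / (math.pi * var)

def product_params(mu1, var1, mu2, var2):
    """Mean and variance of the x-dependent Gaussian factor in property 3."""
    precision = 1.0 / var1 + 1.0 / var2
    mean = (mu1 / var1 + mu2 / var2) / precision
    return mean, 1.0 / precision

# Arbitrary illustrative parameters (not taken from the paper).
mu1, var1 = 0.3 + 0.4j, 0.5
mu2, var2 = -0.2 + 0.1j, 2.0
x = 0.7 - 0.6j

# Property 3: pointwise identity for the product of two Gaussians.
mu, var = product_params(mu1, var1, mu2, var2)
lhs3 = cn_pdf(x, mu1, var1) * cn_pdf(x, mu2, var2)
rhs3 = cn_pdf(x, mu, var) * cn_pdf(mu1, mu2, var1 + var2)

# Property 4: midpoint-rule integration of the product over the complex plane.
step, half_width = 0.05, 6.0
n = int(2 * half_width / step)
lhs4 = 0.0
for r in range(n):
    for c in range(n):
        z = complex(-half_width + (r + 0.5) * step,
                    -half_width + (c + 0.5) * step)
        lhs4 += cn_pdf(z, mu1, var1) * cn_pdf(z, mu2, var2)
lhs4 *= step ** 2
rhs4 = cn_pdf(mu1, mu2, var1 + var2)

print(lhs3, rhs3)
print(lhs4, rhs4)
```

Each printed pair should agree to within floating-point and discretization error.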

Using these, (24) is obtained as follows.

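\begin{align}
\tilde{p}(x_j \mid x_i, y'_{j|i})
&= \frac{\mathcal{CN}\big(y'_{j|i};\, a_{j|i,j}x_j + a_{j|i,i}x_i,\, \sigma^2_{j|i}\big) \cdot \mathcal{CN}(x_j; 0, 1)}{\mathcal{CN}\big(y'_{j|i};\, a_{j|i,i}x_i,\, \sigma^2_{j|i} + |a_{j|i,j}|^2\big)} \nonumber \\
&= \frac{\mathcal{CN}\Big(x_j;\, \frac{1}{a_{j|i,j}}\big(y'_{j|i} - a_{j|i,i}x_i\big),\, \frac{\sigma^2_{j|i}}{|a_{j|i,j}|^2}\Big) \cdot \mathcal{CN}(x_j; 0, 1)}{\mathcal{CN}\big(y'_{j|i};\, a_{j|i,i}x_i,\, \sigma^2_{j|i} + |a_{j|i,j}|^2\big)} \nonumber \\
&= \mathcal{CN}\left(x_j;\, \frac{\frac{1}{a_{j|i,j}}\big(y'_{j|i} - a_{j|i,i}x_i\big)}{\frac{\sigma^2_{j|i}}{|a_{j|i,j}|^2} + 1},\, \frac{\sigma^2_{j|i}}{|a_{j|i,j}|^2} \cdot \frac{1}{1 + \frac{\sigma^2_{j|i}}{|a_{j|i,j}|^2}}\right) \cdot \frac{\mathcal{CN}\big(y'_{j|i} - a_{j|i,i}x_i;\, 0,\, |a_{j|i,j}|^2 + \sigma^2_{j|i}\big)}{\mathcal{CN}\big(y'_{j|i};\, a_{j|i,i}x_i,\, \sigma^2_{j|i} + |a_{j|i,j}|^2\big)} \nonumber \\
&= \mathcal{CN}\left(x_j;\, \frac{a^*_{j|i,j}\big(y'_{j|i} - a_{j|i,i}x_i\big)}{\sigma^2_{j|i} + |a_{j|i,j}|^2},\, \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2}\right). \tag{63}
\end{align}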
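The closed form in (63) can be compared with the ratio in the first line of the derivation. The sketch below assumes the same complex Gaussian density as the properties above; the function names and the numerical values (standing in for $y'_{j|i}$, $a_{j|i,j}$, $a_{j|i,i}$, $x_i$, and $\sigma^2_{j|i}$) are hypothetical.

```python
import math

def cn_pdf(x, mu, var):
    """Assumed density: circularly symmetric complex Gaussian CN(x; mu, var)."""
    return math.exp(-abs(x - mu) ** 2 / var) / (math.pi * var)

def conditional_params(y, a_jj, a_ji, x_i, noise_var):
    """Mean and variance of the conditional density of x_j in (63).

    y stands for y'_{j|i}, a_jj for a_{j|i,j}, a_ji for a_{j|i,i},
    and noise_var for sigma^2_{j|i}.
    """
    denom = noise_var + abs(a_jj) ** 2
    mean = a_jj.conjugate() * (y - a_ji * x_i) / denom
    return mean, noise_var / denom

# Arbitrary illustrative values (not taken from the paper).
y = 0.9 - 0.3j
a_jj = 0.8 + 0.5j
a_ji = -0.4 + 0.2j
x_i = (1 + 1j) / math.sqrt(2)   # one QPSK point
noise_var = 0.3
x_j = 0.25 - 0.6j               # point at which both forms are evaluated

mean, var = conditional_params(y, a_jj, a_ji, x_i, noise_var)
closed_form = cn_pdf(x_j, mean, var)

# First line of the derivation: likelihood times prior over evidence.
bayes_ratio = (cn_pdf(y, a_jj * x_j + a_ji * x_i, noise_var)
               * cn_pdf(x_j, 0, 1)
               / cn_pdf(y, a_ji * x_i, noise_var + abs(a_jj) ** 2))

print(closed_form, bayes_ratio)
```

The two printed values should coincide up to rounding, since the scaling factors introduced by property 2 cancel between the second and third lines of the derivation.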

Now, we derive the message update rule of the Gaussian BP. To this end, we divide the message update rule in BP2, (26), into two steps, i.e., the extrinsic information computation, $\lambda_{i \to j}(x_i) = \prod_{k \in V(i) \setminus j} \pi_{k \to i}(x_i)$, and the message translation step, $\pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} \tilde{p}(x_j \mid x_i, \mathbf{y})\, \lambda_{i \to j}(x_i)$. Assuming Gaussian messages, $\pi_{k \to i}(x_i) = \mathcal{CN}(x_i; \mu_{\pi,k \to i}, \sigma^2_{\pi,k \to i})$, the former is given by

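\begin{align}
\lambda_{i \to j}(x_i)
&= \prod_{k \in V(i) \setminus j} \pi_{k \to i}(x_i) \nonumber \\
&= \prod_{k \in V(i) \setminus j} \mathcal{CN}\big(x_i;\, \mu_{\pi,k \to i},\, \sigma^2_{\pi,k \to i}\big) \nonumber \\
&\propto \mathcal{CN}\left(x_i;\, \frac{\sum_{k \in V(i) \setminus j} \sigma^{-2}_{\pi,k \to i}\, \mu_{\pi,k \to i}}{\sum_{k \in V(i) \setminus j} \sigma^{-2}_{\pi,k \to i}},\, \frac{1}{\sum_{k \in V(i) \setminus j} \sigma^{-2}_{\pi,k \to i}}\right) \nonumber \\
&= \mathcal{CN}\big(x_i;\, \mu_{\lambda,i \to j},\, \sigma^2_{\lambda,i \to j}\big). \tag{64}
\end{align}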
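The extrinsic information computation in (64) amounts to a precision-weighted combination of the incoming messages, which follows from repeated use of property 3. A minimal sketch is given below; the function name `extrinsic_message`, the dictionary layout, and the example values are hypothetical, and both sums are assumed to run over the neighbours of node $i$ other than $j$.

```python
def extrinsic_message(incoming, exclude):
    """Combine Gaussian messages pi_{k->i} into lambda_{i->j} as in (64).

    incoming: dict mapping neighbour index k to (mean, variance) of pi_{k->i}
    exclude:  index j of the destination node, left out of the product
    Returns (mu_lambda, var_lambda).
    """
    precision = 0.0
    weighted_mean = 0.0
    for k, (mean, var) in incoming.items():
        if k == exclude:
            continue
        precision += 1.0 / var          # precisions add up
        weighted_mean += mean / var     # means are weighted by precision
    return weighted_mean / precision, 1.0 / precision

# Illustrative messages arriving at node i from neighbours 1, 2 and 3.
incoming = {
    1: (0.5 + 0.5j, 1.0),
    2: (-0.1j, 0.5),
    3: (1.0 + 0.0j, 2.0),
}
mu_lam, var_lam = extrinsic_message(incoming, exclude=3)
print(mu_lam, var_lam)
```

For this example, the precisions of messages 1 and 2 sum to 3, so the resulting variance is 1/3 and the mean is (0.5 + 0.3j)/3.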

For the message translation, we first rewrite (63) as

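\begin{align}
\tilde{p}(x_j \mid x_i, y'_{j|i})
&= \mathcal{CN}\left(x_j;\, \frac{a^*_{j|i,j}\big(y'_{j|i} - a_{j|i,i}x_i\big)}{\sigma^2_{j|i} + |a_{j|i,j}|^2},\, \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2}\right) \nonumber \\
&= \mathcal{CN}\left(\frac{\sigma^2_{j|i} + |a_{j|i,j}|^2}{a^*_{j|i,j}}\, x_j;\, y'_{j|i} - a_{j|i,i}x_i,\, \frac{\sigma^2_{j|i}\big(\sigma^2_{j|i} + |a_{j|i,j}|^2\big)}{|a_{j|i,j}|^2}\right) \nonumber \\
&= \mathcal{CN}\left(x_i;\, \frac{y'_{j|i}}{a_{j|i,i}} - \frac{\sigma^2_{j|i} + |a_{j|i,j}|^2}{a_{j|i,i} \cdot a^*_{j|i,j}}\, x_j,\, \frac{\sigma^2_{j|i}\big(\sigma^2_{j|i} + |a_{j|i,j}|^2\big)}{|a_{j|i,j}|^2\, |a_{j|i,i}|^2}\right). \tag{65}
\end{align}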

Then, by plugging (64) and (65) into (26) and changing the summation into an integral,⁶ we arrive at (66) and (67); comparing the mean and variance in these two equations yields the message passing rules of (33) and (34), respectively. The beliefs in (35) and (36) can be obtained similarly to the derivation in (64) to (67).
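The message translation step that results in (66) can be sketched as a function mapping the extrinsic mean and variance to the outgoing message parameters. This is only an illustration of the formula, not the authors' implementation; the function name `translate_message` and its argument names are hypothetical.

```python
def translate_message(y, a_jj, a_ji, noise_var, mu_lam, var_lam):
    """Mean and variance of pi_{i->j}(x_j) according to (66).

    y stands for y'_{j|i}, a_jj for a_{j|i,j}, a_ji for a_{j|i,i},
    noise_var for sigma^2_{j|i}; (mu_lam, var_lam) are the parameters
    of the extrinsic message lambda_{i->j} from (64).
    """
    denom = noise_var + abs(a_jj) ** 2
    mean = a_jj.conjugate() * (y - a_ji * mu_lam) / denom
    var = (noise_var / denom
           + abs(a_jj) ** 2 * abs(a_ji) ** 2 * var_lam / denom ** 2)
    return mean, var

# Hand-checkable example: unit gains, unit noise, zero-mean unit-variance input.
mu_pi, var_pi = translate_message(1 + 0j, 1 + 0j, 1 + 0j, 1.0, 0j, 1.0)
print(mu_pi, var_pi)
```

With these values the denominator is 2, so the mean is 0.5 and the variance is 1/2 + 1/4 = 0.75. Setting `var_lam` to zero and `mu_lam` to a fixed $x_i$ recovers the mean and variance of (63), as expected when the extrinsic information is certain.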

REFERENCES

[1] U. Fincke and M. Pohst, “Improved methods for calculating vectors of short length in a lattice, including a complexity analysis,” Math. Comput., vol. 44, pp. 463–471, April 1985.

[2] H. Vikalo and B. Hassibi, “Modified Fincke-Pohst algorithm for low-complexity iterative decoding over multiple antenna channel,” in Proc. IEEE Int. Symp. Info. Th., July 2002, p. 390.

[3] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on multiple-antenna channel,” IEEE Trans. Comm., vol. 51, no. 3, pp. 389–399, March 2003.

[4] D. Wubben, R. Boehnke, V. Kuehn, and K. Kammeyer, “MMSE extension of V-BLAST based on sorted QR decomposition,” in Proc. IEEE Veh. Technol. Conf., Oct. 2003.

[5] C. Studer, A. Burg, and H. Bolcskei, “Soft-output sphere decoding: Algorithms and VLSI implementation,” IEEE Jour. Select. Areas in Comm., vol. 26, no. 2, pp. 290–300, Feb. 2008.

[6] K. Higuchi, H. Kawai, H. Taoka, N. Maeda, and M. Sawahashi, “Adaptive selection of surviving symbol replica candidates for quasi-maximum likelihood detection using M-algorithm with QR-decomposition for OFDM MIMO multiplexing,” IEICE Trans. Comm., vol. E92-B, 2009.

[7] W. Jiang, X. Yu, and Y. Li, “Bi-truncation for simplified MIMO signal detection,” in Proc. IEEE Glob. Telecom. Conf., Nov. 29–Dec. 3 2004, pp. 401–405.

[8] W. Jiang, Y. Li, and X. Yu, “Truncation for low-complexity MIMO signal detection,” IEEE Trans. Info. Th., vol. 53, no. 4, pp. 1564–1571, April 2007.

[9] S. Yoon and S. Lee, “A detection algorithm for multi-input multi-output (MIMO) transmission using poly-diagonalization and trellis decoding,” IEEE Jour. Select. Areas in Comm., vol. 26, no. 8, pp. 993–1002, Aug. 2008.

[10] S. Yoon, “Asymmetrically optimized poly-diagonalization for low complexity MIMO detection,” IET Elec. Letters, vol. 46, no. 17, pp. 1226–1228, Aug. 2010.

[11] G. D. Forney, “The forward-backward algorithm,” in Proc. of Allerton Conf. on Comm. Control and Comp., Sep. 1996.

[12] J. B. Anderson and S. M. Hladik, “Tailbiting MAP decoders,” IEEE Jour. Select. Areas in Comm., vol. 16, no. 2, pp. 297–302, Feb. 1998.

[13] D. D. Falconer and F. R. Magee, “Adaptive channel memory truncation for maximum likelihood sequence estimation,” The Bell Sys. Tech. Jour., vol. 52, no. 9, pp. 1541–1562, Sep. 1973.

⁶ Since the input is now a continuous Gaussian random variable, we need to change the summation in (24) into an integral.


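\begin{align}
\pi_{i \to j}(x_j)
&= \int_{x_i} \tilde{p}(x_j \mid x_i, y'_{j|i}) \cdot \lambda_{i \to j}(x_i)\, dx_i \nonumber \\
&= \int_{x_i} \mathcal{CN}\left(x_i;\, \frac{1}{a_{j|i,i}}\left(y'_{j|i} - \frac{\sigma^2_{j|i} + |a_{j|i,j}|^2}{a^*_{j|i,j}}\, x_j\right),\, \frac{\sigma^2_{j|i}\big(\sigma^2_{j|i} + |a_{j|i,j}|^2\big)}{|a_{j|i,j}|^2\, |a_{j|i,i}|^2}\right) \cdot \mathcal{CN}\big(x_i;\, \mu_{\lambda,i \to j},\, \sigma^2_{\lambda,i \to j}\big)\, dx_i \nonumber \\
&= \mathcal{CN}\left(x_j;\, \frac{a^*_{j|i,j}}{\sigma^2_{j|i} + |a_{j|i,j}|^2}\big(y'_{j|i} - a_{j|i,i}\, \mu_{\lambda,i \to j}\big),\, \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2} + \frac{|a_{j|i,j}|^2\, |a_{j|i,i}|^2\, \sigma^2_{\lambda,i \to j}}{\big(\sigma^2_{j|i} + |a_{j|i,j}|^2\big)^2}\right) \tag{66} \\
&= \mathcal{CN}\big(x_j;\, \mu_{\pi,i \to j},\, \sigma^2_{\pi,i \to j}\big). \tag{67}
\end{align}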

[14] S. Yang, T. Lv, R. G. Maunder, and L. Hanzo, “Unified bit-based probabilistic data association aided MIMO detection for high-order QAM constellations,” IEEE Trans. on Veh. Technol., vol. 60, no. 3, pp. 981–991, Mar. 2011.

[15] Y. Kabashima, “A CDMA multiuser detection algorithm on the basis of belief propagation,” J. Phys. A., pp. 11111–11121, 2003.

[16] T. Tanaka and M. Okada, “Approximate belief propagation, density evolution, and neurodynamics for CDMA multiuser detection,” IEEE Trans. Info. Th., vol. 51, no. 2, pp. 700–706, Feb. 2005.

[17] A. Montanari, B. Prabhakar, and D. Tse, “Belief propagation based multi-user detection,” in Proc. of Allerton Conf. on Comm. Control and Comp., Sep. 2005.

[18] D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D. Dolev, “Linear detection via belief propagation,” in Proc. of Allerton Conf. on Comm. Control and Comp., Sep. 2007.

[19] J. Hu and T. M. Duman, “Graph-based detector for BLAST architecture,” in Proc. IEEE Int. Conf. on Comm., June 2007, pp. 1018–1023.

[20] D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D. Dolev, “Gaussian belief propagation based multiuser detection,” in Proc. IEEE Int. Symp. Info. Th., July 2008, pp. 1878–1882.

[21] J. Goldberger and A. Leshem, “A Gaussian tree approximation for integer least-squares,” in Neural Info. Proc. Systems (NIPS), 2009.

[22] J. Pearl, Probabilistic reasoning in intelligent systems: Networks of plausible inference, Morgan Kaufmann, 1987.

[23] F. Kschischang, B. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Info. Th., vol. 47, no. 2, pp. 498–519, Feb. 2001.

[24] J. Goldberger and A. Leshem, “MIMO decoding based on stochastic reconstruction from multiple projections,” in Proc. IEEE Int. Conf. Acoust., Speech and Sig. Proc., April 2009, pp. 2457–2460.

[25] M. Suneel, P. Som, A. Chockalingam, and B. S. Rajan, “Belief propagation based decoding of large non-orthogonal STBCs,” in Proc. IEEE Int. Symp. Info. Th., June 28–July 3 2009, pp. 2003–2007.

[26] P. Som, T. Datta, A. Chockalingam, and B. S. Rajan, “Improved large-MIMO detection based on damped belief propagation,” in Proc. of IEEE Info. Th. Workshop, Jan. 2010.

[27] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Understanding belief propagation and its generalizations,” in Proc. Int. Joint. Conf. on Artificial Intelligence, 2001.

[28] R. Koetter, A. C. Singer, and M. Tüchler, “Turbo equalization,” IEEE Sig. Proc. Mag., vol. 21, no. 1, pp. 67–80, Jan. 2004.

[29] S. Yoon, “A low complexity MIMO detection based on pairwise Markov random fields,” in Proc. IEEE Veh. Technol. Conf. 2011 Spring, May 2011.

[30] Y. Weiss and W. T. Freeman, “Correctness of belief propagation in Gaussian graphical models of arbitrary topology,” Report No. UCB/CSD-99-1046, UC Berkeley, 1999.

[31] P. Rusmevichientong and B. V. Roy, “An analysis of belief propagation on the turbo decoding graph with Gaussian densities,” IEEE Trans. Info. Th., vol. 47, no. 2, pp. 745–765, Feb. 2001.

[32] Digital Video Broadcasting (DVB), “Second generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broadband satellite applications (DVB-S2),” ETSI EN 302 307, April 2009, ver. 1.2.1.


Seokhyun Yoon (SM’95-M’03) received his B.S. and M.S. degrees in Electronics Engineering from Sung Kyun Kwan University, Suwon, Korea, in 1992 and 1996, respectively, and his Ph.D. degree in Electrical and Computer Engineering from the New Jersey Institute of Technology, Newark, in 2003.

In 1999, he was with the Dept. of Broadcasting Technology in the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea, as a Technical Staff Member. During 2003-2005, he was with the Telecomm. R&D Center, Samsung Electronics Co., Ltd., Suwon, Korea, as a Senior Technical Staff Member, where he worked on technologies for wireless/mobile air interfaces and their standardization. Currently, he is an Associate Professor in the Department of Electronics Engineering, Dankook University, Yongin-si, Kyunggi-do, Korea. His research activities are focused on the broad area of wireless communications, including MIMO, channel coding and signal processing for communications.

Dr. Yoon was awarded the Hashimoto Prize from NJIT in 2003 and the Haedong Best Paper Award from the Korean Institute of Communications and Information Sciences (KICS) in 2006.


Chan-Byoung Chae (S’06 - M’09 - SM’12) is an Assistant Professor in the School of Integrated Technology, College of Engineering, Yonsei University, Korea. He was a Member of Technical Staff (Research Scientist) at Bell Laboratories, Alcatel-Lucent, Murray Hill, NJ, USA from 2009 to 2011. Before joining Bell Laboratories, he was with the School of Engineering and Applied Sciences at Harvard University, Cambridge, MA, USA as a Post-Doctoral Research Fellow. He received the Ph.D. degree in Electrical and Computer Engineering from The University of Texas (UT), Austin, TX, USA in 2008, where he was a member of the Wireless Networking and Communications Group (WNCG).

Prior to joining UT, he was a Research Engineer at the Telecommunications R&D Center, Samsung Electronics, Suwon, Korea, from 2001 to 2005. He was a Visiting Scholar at the WING Lab, Aalborg University, Denmark in 2004 and at University of Minnesota, MN, USA in August 2007. While working at Samsung, he participated in the IEEE 802.16e standardization, where he made several contributions and filed a number of related patents from 2004 to 2005. His current research interests include capacity analysis and interference management in energy-efficient wireless mobile networks and nano (molecular) communications. He serves as an Editor for the IEEE TRANS. ON WIRELESS COMMUNICATIONS and IEEE/KICS JOUR. COMM. NETS. He is also an Area Editor for the IEEE JOUR. SELECTED AREAS IN COMMUNICATIONS (nano scale and molecular networking). He is an IEEE Senior Member.

Dr. Chae was the recipient/co-recipient of the IEEE Signal Processing Best Paper Award in 2013, the IEEE ComSoc AP Outstanding Young Researcher Award in 2012, the IEEE Dan. E. Noble Fellowship Award in 2008, the Gold Prize (1st) in the 14th/19th Humantech Paper Contest, and the KSEA-KUSCO scholarship in 2007. He also received the Korea Government Fellowship (KOSEF) during his Ph.D. studies.