
294 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 2, JANUARY 15, 2018

A GAMP-Based Low Complexity Sparse Bayesian Learning Algorithm

Maher Al-Shoukairi, Student Member, IEEE, Philip Schniter, Fellow, IEEE, and Bhaskar D. Rao, Fellow, IEEE

Abstract—In this paper, we present an algorithm for the sparse signal recovery problem that incorporates damped Gaussian generalized approximate message passing (GGAMP) into expectation-maximization-based sparse Bayesian learning (SBL). In particular, GGAMP is used to implement the E-step in SBL in place of matrix inversion, leveraging the fact that GGAMP is guaranteed to converge with appropriate damping. The resulting GGAMP-SBL algorithm is much more robust to arbitrary measurement matrix A than the standard damped GAMP algorithm while being much lower complexity than the standard SBL algorithm. We then extend the approach from the single measurement vector case to the temporally correlated multiple measurement vector case, leading to the GGAMP-TSBL algorithm. We verify the robustness and computational advantages of the proposed algorithms through numerical experiments.

Index Terms—Compressed sensing, approximate message passing (AMP), sparse Bayesian learning (SBL), expectation-maximization algorithms, Gaussian scale mixture, multiple measurement vectors (MMV).

I. INTRODUCTION

A. Sparse Signal Recovery

The problem of sparse signal recovery (SSR) and the related problem of compressed sensing have received much attention in recent years [1]–[6]. The SSR problem, in the single measurement vector (SMV) case, consists of recovering a sparse signal x ∈ R^N from M ≤ N noisy linear measurements y ∈ R^M:

y = Ax + e, (1)

where A ∈ R^{M×N} is a known measurement matrix and e ∈ R^M is additive noise modeled by e ∼ N(0, σ²I). Despite the difficulty in solving this problem [7], an important finding in recent years is that for a sufficiently sparse x and a well designed A, accurate recovery is possible by techniques such as basis pursuit and orthogonal matching pursuit [8]–[10]. The SSR problem has seen considerable advances on the algorithmic front, including iteratively reweighted algorithms [11]–[13] and Bayesian techniques [14]–[20], among others. Two Bayesian techniques related to this work are the generalized approximate message passing (GAMP) and the sparse Bayesian learning (SBL) algorithms. We briefly discuss both algorithms and some of their shortcomings that we intend to address in this work.

Manuscript received March 2, 2017; revised June 18, 2017 and August 29, 2017; accepted October 2, 2017. Date of publication October 19, 2017; date of current version December 4, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Yimin D. Zhang. The work of M. Al-Shoukairi and B. D. Rao was supported by the Ericsson Endowed Chair Funds. The work of P. Schniter was supported by the National Science Foundation Grant 1527162. (Corresponding author: Maher Al-Shoukairi.)

M. Al-Shoukairi and B. D. Rao are with the Department of Electrical and Computer Engineering, University of California at San Diego, San Diego, CA 92093 USA (e-mail: [email protected]; [email protected]).

P. Schniter is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2017.2764855

B. Generalized Approximate Message Passing Algorithm

Approximate message passing (AMP) algorithms apply quadratic and Taylor series approximations to loopy belief propagation to produce low complexity algorithms. Based on the original AMP work in [21], a generalized AMP (GAMP) algorithm was proposed in [22]. The GAMP algorithm provides an iterative Bayesian framework under which the knowledge of the matrix A and the densities p(x) and p(y|x) can be used to compute the maximum a posteriori (MAP) estimate x̂_MAP = argmax_{x∈R^N} p(x|y) when it is used in its max-sum mode, or to approximate the minimum mean-squared error (MMSE) estimate x̂_MMSE = ∫_{R^N} x p(x|y) dx = E(x|y) when it is used in its sum-product mode.

The performance of AMP/GAMP algorithms in the large system limit (M, N → ∞) under an i.i.d. zero-mean sub-Gaussian matrix A is characterized by state evolution [23], whose fixed points, when unique, coincide with the MAP or the MMSE estimate. However, when A is generic, GAMP's fixed points can be strongly suboptimal and/or the algorithm may never reach its fixed points. Previous work has shown that even mild ill-conditioning or small mean perturbations in A can cause GAMP to diverge [24]–[26]. To overcome the convergence problem in AMP algorithms, a number of AMP modifications have been proposed. A "swept" GAMP (SwAMP) algorithm was proposed in [27], which replaces parallel variable updates in the GAMP algorithm with serial ones to enhance convergence. But SwAMP is relatively slow and still diverges for certain A. An adaptive damping and mean-removal procedure for GAMP was proposed in [26], but it too is somewhat slow and still diverges for certain A. An alternating direction method of multipliers (ADMM) version of AMP was proposed in [28] with improved robustness but even slower convergence.

In the special case that the prior and likelihood are both independent Gaussian, [24] was able to provide a full characterization of GAMP's convergence. In particular, it was shown that the Gaussian GAMP (GGAMP) algorithm will converge if and only if the peak-to-average ratio of the squared singular values of A is sufficiently small. When this condition is not met, [24] proposed a damping technique that guarantees convergence of GGAMP at the expense of slowing its convergence rate. Although the case of Gaussian prior and likelihood is not enough to handle the sparse signal recovery problem directly, it is sufficient to replace the costly matrix inversion step of the standard EM-based implementation of SBL, as we describe below.

C. Sparse Bayesian Learning Algorithm

To understand the contribution of this paper, we give a very brief description of SBL [14], [15], saving a more detailed description for Section II. Essentially, SBL is based on a Gaussian scale mixture (GSM) [29]–[31] prior on x. That is, the prior is Gaussian conditioned on a variance vector γ, which is then controlled by a suitable choice of hyperprior p(γ). A large number of sparsity-promoting priors, like the Student-t and Laplacian priors, can be modeled using a GSM, making the approach widely applicable [29]–[32]. In the SBL algorithm, the expectation-maximization (EM) algorithm is used to alternate between estimating γ and estimating the signal x under fixed γ. Since the latter step uses a Gaussian likelihood and a Gaussian prior, the exact solution can be computed in closed form via matrix inversion. This matrix inversion is computationally expensive, limiting the algorithm's applicability to large scale problems.

D. Paper’s Contribution1

In this paper, we develop low-complexity algorithms for sparse Bayesian learning (SBL) [14], [15]. Since the traditional implementation of SBL uses matrix inversions at each iteration, its complexity is too high for large-scale problems. In this paper we circumvent the matrix inverse using the generalized approximate message passing (GAMP) algorithm [21], [22], [24]. Using GAMP to implement the E-step of EM-based SBL provides a significant reduction in complexity over the classical SBL algorithm. This work is a beneficiary of the algorithmic improvements and theoretical insights that have taken place in recent work in AMP [21], [22], [24], where we exploit the fact that using a Gaussian prior on p(x) can provide guarantees for the GAMP E-step to not diverge when sufficient damping is used [24], even for a non-i.i.d.-Gaussian A. In other words, the enhanced robustness of the proposed algorithm is due to the GSM prior used on x, as opposed to other sparsity promoting priors for which there are no GAMP convergence guarantees when A is non-i.i.d.-Gaussian. The resulting algorithm is the Gaussian GAMP SBL (GGAMP-SBL) algorithm, which combines the robustness of SBL with the speed of GAMP.

To further illustrate and expose the synergy between the AMP and SBL frameworks, we also propose a new approach to the multiple measurement vector (MMV) problem. The MMV problem extends the SMV problem from a single measurement and signal vector to a sequence of measurement and signal vectors. Applications of MMV include direction of arrival (DOA) estimation and EEG/MEG source localization, among others. In our treatment of the MMV problem, all signal vectors are assumed to share the same support. In practice it is often the case that the non-zero signal elements will experience temporal correlation, i.e., each non-zero row of the signal matrix can be treated as a correlated time series. If this correlation is not taken into consideration, the performance of MMV algorithms can degrade quickly [34]. Extensions of SBL to the MMV problem have been developed in [34]–[36], such as the time varying sparse Bayesian learning (TSBL) and TMSBL algorithms [34]. Although TMSBL has lower complexity than TSBL, it still requires on the order of O(NM²) operations per iteration, making it unsuitable for large-scale problems. To overcome the complexity problem, [37] and [33] proposed AMP-based Bayesian approaches to the MMV problem. However, similar to the SMV case, these algorithms are only expected to work for i.i.d. zero-mean sub-Gaussian A. We therefore extend the proposed GGAMP-SBL to the MMV case, to produce a GGAMP-TSBL algorithm that is more robust to generic A, while achieving linear complexity in all problem dimensions.

¹Part of this work was presented in the 2014 Asilomar Conference on Signals, Systems and Computers [33].

The organization of the paper is as follows. In Section II, we review SBL. In Section III, we combine the damped GGAMP algorithm with the SBL approach to solve the SMV problem and investigate its convergence behavior. In Section IV, we use a time-correlated multiple measurement vector factor graph to derive the GGAMP-TSBL algorithm. In Section V, we present numerical results to compare the performance and complexity of the proposed algorithms with the original SBL and with other AMP algorithms for the SMV case, and with TMSBL for the MMV case. The results show that the new algorithms maintain the robustness of the original SBL and TMSBL algorithms, while significantly reducing their complexity.

II. SPARSE BAYESIAN LEARNING FOR SSR

A. GSM Class of Priors

We will assume that the entries of x are independent and identically distributed, i.e., p(x) = Π_n p(x_n). The sparsity promoting prior p(x_n) will be chosen from the GSM class and so will admit the following representation:

p(x_n) = ∫ N(x_n; 0, γ_n) p(γ_n) dγ_n,   (2)

where N(x_n; 0, γ_n) denotes a Gaussian density with mean zero and variance γ_n. The mixing density, or hyperprior, p(γ_n) controls the prior on x_n. For instance, if a Laplacian prior is desired for x_n, then an exponential density is chosen for p(γ_n) [29].
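As a quick numerical illustration of the GSM representation (2), drawing γ_n from an exponential density and then x_n ~ N(0, γ_n) produces samples whose empirical distribution matches a Laplacian. The following is a minimal Monte Carlo sketch (variable names are ours, not from the paper):

```python
import numpy as np

# GSM check: exponential mixing on the variance gamma_n yields a Laplacian prior on x_n.
rng = np.random.default_rng(0)
num_samples = 200_000

gamma = rng.exponential(scale=2.0, size=num_samples)   # p(gamma_n): exponential with mean 2
x = rng.normal(loc=0.0, scale=np.sqrt(gamma))          # x_n | gamma_n ~ N(0, gamma_n)

# A unit-scale Laplacian has E|x| = 1 and excess kurtosis 3 (a Gaussian has 0).
print("empirical E|x|           :", np.mean(np.abs(x)))
print("empirical excess kurtosis:", np.mean(x**4) / np.mean(x**2)**2 - 3)
```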

In the empirical Bayesian approach, the hyperparameter vector γ is iteratively estimated, often using evidence maximization. For a given estimate γ̂, the posterior p(x|y) is approximated as p(x|y; γ̂), and the mean of this posterior is used as a point estimate for x. This mean can be computed in closed form, as detailed below, because the approximate posterior is Gaussian. It was shown in [38] that this empirical Bayesian method, also referred to as a Type II maximum likelihood method, can be used to formulate a number of algorithms for solving the SSR problem by changing the mixing density p(γ_n). There are many computational methods that can be employed for computing γ in the evidence maximization framework, e.g., [14], [15], [39]. In this work, we utilize the EM-SBL algorithm because of its synergy with the GAMP framework, as will be apparent below.


B. SBL’s EM Algorithm

In EM-SBL, the EM algorithm is used to learn the unknown signal variance vector γ [40]–[42], and possibly also the noise variance σ². We focus on learning γ, assuming the noise variance σ² is known. We later state the EM update rule for the noise variance σ² for completeness.

The goal of the EM algorithm is to maximize the posterior p(γ|y) or, equivalently,² to minimize −log p(y, γ). For the GSM prior (2) and the additive white Gaussian noise model, this results in the SBL cost function [14], [15],

χ(γ) = −log p(y, γ) = (1/2) log|Σ_y| + (1/2) yᵀ Σ_y⁻¹ y − log p(γ),   (3)

Σ_y = σ²I + AΓAᵀ,   Γ ≜ Diag(γ).
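As a point of reference for the numerical study later (e.g., the descent curves in Fig. 2), the cost (3) can be evaluated directly. The following is a minimal NumPy sketch under a non-informative p(γ), so the −log p(γ) term is dropped; function and variable names are our own:

```python
import numpy as np

def sbl_cost(y, A, gamma, sigma2):
    """SBL cost chi(gamma) of (3), dropping the -log p(gamma) term
    (non-informative hyperprior) and additive constants."""
    M = A.shape[0]
    Sigma_y = sigma2 * np.eye(M) + (A * gamma) @ A.T      # sigma^2 I + A Gamma A^T
    _, logdet = np.linalg.slogdet(Sigma_y)                # Sigma_y is positive definite
    return 0.5 * logdet + 0.5 * y @ np.linalg.solve(Sigma_y, y)
```

Tracking this quantity across EM iterations is how descent plots like Fig. 2 can be produced.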

In the EM-SBL approach, x is treated as the hidden variable and the parameter estimate is iteratively updated as follows:

γ^{i+1} = argmax_γ E_{x|y;γ^i}[log p(y, x, γ)],   (4)

where p(y, x, γ) is the joint probability of the complete data and p(x|y; γ^i) is the posterior under the old parameter estimate γ^i, which is used to evaluate the expectation. In each iteration, an expectation has to be computed (E-step) followed by a maximization step (M-step). It is easy to show that at each iteration the EM algorithm increases a lower bound on the log posterior log p(γ|y) [40], and it has been shown in [42] that the algorithm will converge to a stationary point of the posterior under a fairly general set of conditions that are applicable in many practical applications.

Next we detail the implementation of the E and M steps of the EM-SBL algorithm.

SBL’s E-step: The Gaussian assumption on the additive noisee leads to the following Gaussian likelihood function:

p(y|x;σ2) =1

(2πσ2)M2

exp(

− 12σ2 ‖y − Ax‖2

)

. (5)

Due to the GSM prior (2), the density of x conditioned on γ is Gaussian:

p(x|γ) = Π_{n=1}^N (2πγ_n)^{−1/2} exp(−x_n²/(2γ_n)).   (6)

Putting (5) and (6) together, the density needed for the E-step is Gaussian:

p(x|y, γ) = N(x; x̂, Σ_x)   (7)
x̂ = σ⁻² Σ_x Aᵀ y   (8)
Σ_x = (σ⁻² Aᵀ A + Γ⁻¹)⁻¹ = Γ − ΓAᵀ(σ²I + AΓAᵀ)⁻¹AΓ.   (9)

We refer to the mean vector as x̂ since it will be used as the SBL point estimate of x. In the sequel, we will use τ_x when referring to the vector composed of the diagonal entries of the covariance matrix Σ_x. Although both x̂ and τ_x change with the iteration i, we will sometimes omit their i dependence for brevity. Note that the mean and covariance computations in (8) and (9) are not affected by the choice of p(γ). The mean and diagonal entries of the covariance matrix are needed to carry out the M-step as shown next.

²Using Bayes' rule, p(γ|y) = p(y, γ)/p(y), where p(y) is a constant with respect to γ. Thus for MAP estimation of γ we can maximize p(y, γ), or minimize −log p(y, γ).
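Before turning to the M-step, the exact E-step (7)–(9) can be written compactly in NumPy. This is a sketch with our own names, and it is the expensive computation that the GGAMP-based E-step of Section III will replace:

```python
import numpy as np

def sbl_e_step(y, A, gamma, sigma2):
    """Exact EM-SBL E-step: posterior mean x_hat from (8) and the diagonal
    tau_x of the covariance in (9), using the M x M form from the
    matrix inversion lemma."""
    M, N = A.shape
    B = gamma[:, None] * A.T                      # Gamma A^T, shape (N, M)
    Sigma_y = sigma2 * np.eye(M) + A @ B          # sigma^2 I + A Gamma A^T
    C = np.linalg.solve(Sigma_y, B.T)             # Sigma_y^{-1} A Gamma, shape (M, N)
    x_hat = B @ np.linalg.solve(Sigma_y, y)       # equals sigma^{-2} Sigma_x A^T y
    tau_x = gamma - np.sum(B * C.T, axis=1)       # diag(Gamma - Gamma A^T Sigma_y^{-1} A Gamma)
    return x_hat, tau_x
```

Each call costs O(NM²) because of the M×M solve; the GGAMP-based E-step replaces it with O(NM) matrix-vector products per iteration.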

SBL’s M-Step: The M-step is then carried out as follows.First notice that

Ex|y;γi ,σ 2

[− log p(y,x,γ;σ2)]

=

Ex|y;γi ,σ 2

[− log p(y|x;σ2) − log p(x|γ) − log p(γ)]. (10)

Since the first term in (10) does not depend on γ, it will not berelevant for the M-step and thus can be ignored. Similarly, inthe subsequent steps we will drop constants and terms that donot depend on γ and therefore do not impact the M-step. SinceEx|y;γi ,σ 2 [x2

n ] = x2n + τxn ,

Ex|y;γi ,σ 2 [− log p(x|γ) − log p(γ)] =

N∑

n=1

((x2n + τxn2γn

)

+12

log γn − log p(γn ))

. (11)

Note that the E-step only requires x̂_n, the posterior mean from (8), and τ_{x_n}, the posterior variance from (9), which are statistics of the marginal densities p(x_n|y, γ^i). In other words, the full joint posterior p(x|y, γ^i) is not needed. This facilitates the use of message passing algorithms.

As can be seen from (7)–(9), the computation of x̂ and τ_x involves the inversion of an N×N matrix, which can be reduced to an M×M matrix inversion by the matrix inversion lemma. The complexity of computing x̂ and τ_x can be shown to be O(NM²) under the assumption that M ≤ N. This makes the EM-SBL algorithm computationally prohibitive and impractical to use with large dimensions.

From (4) and (11), the M-step for each iteration is as follows:

γ^{i+1} = argmin_γ Σ_{n=1}^N ( (x̂_n² + τ_{x_n})/(2γ_n) + (1/2) log γ_n − log p(γ_n) ).   (12a)

This reduces to N scalar optimization problems,

γ_n^{i+1} = argmin_{γ_n} [ (x̂_n² + τ_{x_n})/(2γ_n) + (1/2) log γ_n − log p(γ_n) ].   (12b)

The choice of hyperprior p(γ) plays a role in the M-step, and governs the prior for x. However, given the computational simplicity of the M-step, as evident from (12b), the hyperprior rarely impacts the overall algorithmic computational complexity, which is mainly that of computing the quantities x̂ and τ_x in the E-step.

Often a non-informative prior is used in SBL. For the purpose of obtaining the M-step update, we will also simplify and drop p(γ) and compute the maximum likelihood estimate of γ. From (12b), this reduces to γ_n^{i+1} = x̂_n² + τ_{x_n}.
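In code, this M-step is a single vectorized line (a sketch, with our own names):

```python
import numpy as np

def sbl_m_step(x_hat, tau_x):
    """M-step (12b) with a non-informative hyperprior: gamma_n = x_hat_n^2 + tau_x_n."""
    return x_hat**2 + tau_x
```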


Fig. 1. GAMP Factor Graph.

Similarly, if the noise variance σ² is unknown, it can be estimated using:

(σ²)^{i+1} = argmax_{σ²} E_{x|y,γ;(σ²)^i}[log p(y, x, γ; σ²)] = ( ‖y − Ax̂‖² + (σ²)^i Σ_{n=1}^N (1 − τ_{x_n}/γ_n) ) / M.   (13)

We note here that estimates obtained by (13) can be highly inaccurate, as mentioned in [35]. Therefore, experimenting with different values of σ² or using some other application-based heuristic will probably lead to better results.
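For completeness, a sketch of (13) as it would be implemented (our own names; as just noted, a fixed or hand-tuned σ² is often preferable in practice):

```python
import numpy as np

def sbl_noise_update(y, A, x_hat, tau_x, gamma, sigma2_old):
    """EM update (13) for the noise variance sigma^2."""
    M = A.shape[0]
    residual = np.sum((y - A @ x_hat)**2)
    return (residual + sigma2_old * np.sum(1.0 - tau_x / gamma)) / M
```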

III. DAMPED GAUSSIAN GAMP SBL

We now show how damped GGAMP can be used to simplify the E-step above, leading to the damped GGAMP-SBL algorithm. Then we examine the convergence behavior of the resulting algorithm.

A. GGAMP-SBL

Above we showed that, in the EM-SBL algorithm, the M-step is computationally simple but the E-step is computationally demanding. The GAMP algorithm can be used to efficiently approximate the quantities x̂ and τ_x needed in the E-step, while the M-step remains unchanged. GAMP is based on the factor graph in Fig. 1, where for a given prior f_n(x_n) = p(x_n) and a likelihood function g_m = p(y_m|x), GAMP uses quadratic approximations and Taylor series expansions to provide approximations of MAP or MMSE estimates of x. The reader can refer to [22] for a detailed derivation of GAMP. The E-step in Table I uses the damped GGAMP algorithm from [24] because of its ability to overcome the divergence issues of the traditional GAMP algorithm with non-i.i.d.-Gaussian A. The damped GGAMP algorithm has an important modification over the original GAMP algorithm, and also over the previously proposed AMP-SBL [33], namely the introduction of damping factors θ_s, θ_x ∈ (0, 1] to slow down updates and enhance convergence. Setting θ_s = θ_x = 1 in the damped GGAMP algorithm yields no damping and reduces it to the original GAMP algorithm. We note here that the damped GGAMP algorithm from [24] is referred to as GGAMP, and therefore we will use the terms GGAMP and damped GGAMP interchangeably in this paper. Moreover, when the components of the matrix A are not zero-mean, one can incorporate the same mean removal technique used in [26].

TABLE I: GGAMP-SBL ALGORITHM

The input and output functions g_s(p, τ_p) and g_x(r, τ_r) in Table I are defined based on whether the max-sum or the sum-product version of GAMP is being used. The intermediate variables r and p are interpreted as approximations of Gaussian-noise-corrupted versions of x and z = Ax, with respective noise levels τ_r and τ_p. In the max-sum version, the vector MAP estimation problem is reduced to a sequence of scalar MAP estimates given r and p using the input and output functions, where they are defined as:

[g_s(p, τ_p)]_m = p_m − τ_{p_m} prox_{−τ_{p_m}^{-1} ln p(y_m|z_m)}(p_m/τ_{p_m})   (14)

[g_x(r, τ_r)]_n = prox_{−τ_{r_n} ln p(x_n)}(r_n)   (15)

prox_f(r) ≜ argmin_x f(x) + (1/2)|x − r|².   (16)

Similarly, in the sum-product version of the algorithm, the vector MMSE estimation problem is reduced to a sequence of scalar MMSE estimates given r and p using the input and output functions, where they are defined as:

[g_s(p, τ_p)]_m = p_m − τ_{p_m} ( ∫ z_m p(y_m|z_m) N(z_m; p_m/τ_{p_m}, 1/τ_{p_m}) dz_m ) / ( ∫ p(y_m|z_m) N(z_m; p_m/τ_{p_m}, 1/τ_{p_m}) dz_m )   (17)

[g_x(r, τ_r)]_n = ( ∫ x_n p(x_n) N(x_n; r_n, τ_{r_n}) dx_n ) / ( ∫ p(x_n) N(x_n; r_n, τ_{r_n}) dx_n ).   (18)

Fig. 2. Cost functions of the SBL and GGAMP-SBL algorithms versus number of EM iterations. (a) Cost functions for i.i.d.-Gaussian A. (b) Cost functions for column correlated A.

For the parametrized Gaussian prior we imposed on x in (2), both the sum-product and max-sum versions of g_x(r, τ_r) yield the same updates for x̂ and τ_x [22], [24]:

g_x(r, τ_r) = (γ/(γ + τ_r)) r   (19)
g'_x(r, τ_r) = γ/(γ + τ_r).   (20)

Similarly, in the case of the likelihood p(y|x) given in (5), the max-sum and sum-product versions of g_s(p, τ_p) yield the same updates for s and τ_s [22], [24]:

g_s(p, τ_p) = (p/τ_p − y)/(σ² + 1/τ_p)   (21)
g'_s(p, τ_p) = σ⁻²/(σ⁻² + τ_p).   (22)

We note that in (19)–(22), and for all equations in Table I, all vector squares, divisions and multiplications are taken element-wise.

In Table I, K_max is the maximum allowed number of GAMP algorithm iterations, ε_gamp is the GAMP normalized tolerance parameter, I_max is the maximum allowed number of EM iterations and ε_em is the EM normalized tolerance parameter. Upon convergence of the GAMP-based E-step, estimates for the mean x̂ and covariance diagonal τ_x are obtained. These estimates can be used in the M-step of the algorithm, given by (12b). These estimates, along with the s vector estimate, are also used to initialize the E-step at the next EM iteration to accelerate the convergence of the overall algorithm.
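To make the E-step concrete, the following is a minimal sketch of a damped sum-product GGAMP iteration for the Gaussian prior N(0, Diag(γ)) and AWGN likelihood. It is written in the original GAMP parametrization of [22], so the intermediate variables differ from the scaled ones used in Table I and (21)–(22) by known rescalings; damping is applied only to s and x̂, and all names are our own:

```python
import numpy as np

def ggamp_e_step(y, A, gamma, sigma2, theta_s=0.5, theta_x=0.5,
                 k_max=200, tol=1e-6, x_init=None, s_init=None):
    """Damped sum-product GAMP for a Gaussian prior N(0, diag(gamma)) and AWGN
    likelihood; returns approximations of the posterior mean x_hat and marginal
    variances tau_x needed by the SBL M-step."""
    M, N = A.shape
    S = A**2                                       # element-wise squared matrix
    x_hat = np.zeros(N) if x_init is None else x_init.copy()
    s_hat = np.zeros(M) if s_init is None else s_init.copy()
    tau_x = gamma.copy()
    for _ in range(k_max):
        x_prev = x_hat.copy()
        # output (measurement) side
        tau_p = S @ tau_x
        p_hat = A @ x_hat - tau_p * s_hat          # Onsager-corrected estimate of z = A x
        s_new = (y - p_hat) / (sigma2 + tau_p)     # AWGN output function
        tau_s = 1.0 / (sigma2 + tau_p)
        s_hat = theta_s * s_new + (1 - theta_s) * s_hat   # damping on s
        # input (signal) side
        tau_r = 1.0 / (S.T @ tau_s)
        r_hat = x_hat + tau_r * (A.T @ s_hat)
        x_new = gamma / (gamma + tau_r) * r_hat    # Gaussian-prior denoiser, cf. (19)
        tau_x = gamma * tau_r / (gamma + tau_r)    # tau_r times g'_x, cf. (20)
        x_hat = theta_x * x_new + (1 - theta_x) * x_hat   # damping on x
        if np.linalg.norm(x_hat - x_prev) <= tol * np.linalg.norm(x_hat):
            break
    return x_hat, tau_x, s_hat
```

In GGAMP-SBL this routine would be called once per EM iteration, warm-started with x̂ and s from the previous EM iteration, and its outputs x̂ and τ_x fed to the M-step (12b).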

Defining S as the component-wise magnitude squared of A, the complexity of the GGAMP-SBL algorithm is dominated by the E-step, which in turn (from Table I) is dominated by the matrix multiplications by A, Aᵀ, S and Sᵀ at each iteration, implying that the computational cost of the algorithm is O(NM) operations per GAMP algorithm iteration, multiplied by the total number of GAMP algorithm iterations. For large M, this is much smaller than O(NM²), the complexity of a standard SBL iteration.

In addition to the complexity of each iteration, for the proposed GGAMP-SBL algorithm to achieve faster runtimes it is important that the total number of GGAMP-SBL iterations not be so large that it outweighs the reduction in complexity per iteration, especially when heavier damping is used. We point out here that while SBL provides a one-step exact solution for the E-step, GGAMP-SBL provides an approximate iterative solution. Based on that, the total number of SBL iterations is the number of EM iterations needed for convergence, while the total number of GGAMP-SBL iterations depends on the number of EM iterations it needs to converge and the number of E-step iterations within each EM iteration.

First we consider the number of EM iterations for both algorithms. As explained in Section III-B2, the E-step of the GGAMP-SBL algorithm provides a good approximation of the true posterior [43]. In addition, the number of EM iterations is not affected by damping, since damping only affects the number of iterations of GGAMP in the E-step, not its outcome upon convergence. Based on these two points, we can expect the number of EM iterations for GGAMP-SBL to be generally in the same range as for the original SBL algorithm. This is also shown in Fig. 2(a) and (b) of Section III-B2, where we can see the two cost functions being reduced to their minimum values using approximately the same number of EM iterations, even when heavier damping is used. As for the GGAMP-SBL E-step iterations, because we warm start each E-step with the x̂ and s values from the previous EM iteration, it was found through numerical experiments that the number of required E-step iterations is reduced each time, to the point where the E-step converges to the required tolerance within 2–3 iterations towards the final EM iterations. When averaging the total number of E-step iterations over the number of EM iterations, it was found that for medium to large problem sizes the average number of E-step iterations was just a fraction of the number of measurements M, even in the cases where heavier damping was used. Moreover, it was observed that the number of iterations required for the E-step to converge is independent of the problem size, which gives the algorithm a bigger advantage at larger problem sizes. Finally, based on the complexity reduction per iteration and the total number of iterations required for GGAMP-SBL, we can expect it to have lower runtimes than SBL for medium to large problems, even when heavier damping is used. This runtime advantage is confirmed through numerical experiments in Section V.

B. GGAMP-SBL Convergence

We now examine the convergence of the GGAMP-SBL algorithm. This involves two steps: the first is to show that the approximate message passing algorithm employed in the E-step converges, and the second is to show that the overall EM algorithm, which consists of several E and M-steps, converges. For the second step, in addition to convergence of the E-step (first step), the accuracy of the resulting estimates is important. Therefore, in the second step of our convergence investigation, we use results from [43], in addition to numerical results, to show that the GGAMP-SBL E and M steps are actually descending on the original SBL cost function (3) at each EM iteration.

1) Convergence of the E-step With Generic Transformations: For the first step, we use the analysis from [24], which shows that, in the case of generic A, the damped GGAMP algorithm is guaranteed to globally converge (to some values x̂ and τ_x) when sufficient damping is used. In particular, since γ^i is fixed in the E-step, the prior is Gaussian, and so based on results in [24], starting with an initial estimate τ_x ≥ γ^i, the variance updates τ_x, τ_s, τ_r and τ_p will converge to a unique fixed point. In addition, any fixed point (s, x̂) for GGAMP is globally stable if θ_s θ_x ‖Ã‖₂² < 1, where the matrix Ã is defined as given below and is based on the fixed-point values of τ_p and τ_r:

Ã := Diag^{1/2}(τ_p q_s) A Diag^{1/2}(τ_r q_x),
q_s = σ⁻²/(σ⁻² + τ_p),   q_x = γ/(γ + τ_r).

While the result above establishes that the GGAMP algorithm is guaranteed to converge when a sufficient amount of damping is used at each iteration, in practice we do not recommend building the matrix Ã at each EM iteration and calculating its spectral norm. Rather, we recommend choosing sufficiently small damping factors θ_x and θ_s and fixing them for all GGAMP-SBL iterations. For this purpose, the following result from [24] for an i.i.d.-Gaussian prior p(x) = N(x; 0, γ_x I) can provide some guidance on choosing the damping factors. For the i.i.d.-Gaussian prior case, the damped GAMP algorithm is shown to converge if

Ω(θ_s, θ_x) > ‖A‖₂²/‖A‖_F²,   (23)

where Ω(θ_s, θ_x) is defined as

Ω(θ_s, θ_x) := 2[(2 − θ_x)N + θ_x M] / (θ_x θ_s M N).   (24)

Experimentally, it was found that choosing the damping factors such that Ω(θ_s, θ_x) exceeds the threshold on the right-hand side of (23) by 10% is sufficient for the GGAMP-SBL algorithm to converge in the scenarios we considered.
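A small helper (hypothetical name) that encodes this guideline by checking (23)–(24) with a 10% margin:

```python
import numpy as np

def damping_satisfies_bound(A, theta_s, theta_x, margin=1.1):
    """Check the i.i.d.-Gaussian-prior convergence condition (23)-(24)
    with a safety margin (e.g., 10%)."""
    M, N = A.shape
    omega = 2 * ((2 - theta_x) * N + theta_x * M) / (theta_x * theta_s * M * N)
    ratio = np.linalg.norm(A, 2)**2 / np.linalg.norm(A, 'fro')**2
    return omega > margin * ratio
```

One could start from θ_s = θ_x = 1 and decrease both until the check passes, then fix them for all GGAMP-SBL iterations.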

2) GGAMP-SBL Convergence: The result above guarantees convergence of the E-step to some vectors x̂ and τ_x, but it does not provide information about the overall convergence of the EM algorithm to the desired SBL fixed points. This convergence depends on the quality of the mean x̂ and variance τ_x computed by the GGAMP algorithm. It has been shown that for an arbitrary A matrix, the fixed-point value of x̂ will equal the true mean given in (8) [43]. As for the variance updates, based on the state evolution in [23], the vector τ_x will equal the true posterior variance vector, i.e., the diagonal of (9), in the case that A is large and i.i.d. Gaussian, but otherwise will only approximate the true quantity.

The approximation of τ_x by the GGAMP algorithm in the E-step introduces an approximation in the GGAMP-SBL algorithm compared to the original EM-SBL algorithm. Fortunately, there is some flexibility in the EM algorithm in that the M-step need not be carried out to minimize the objective stated in (12a); it is sufficient to decrease the objective function, as discussed in the generalized EM algorithm literature [41], [42]. Given that the mean is estimated accurately, EM iterations may be tolerant to some error in the variance estimate. Some flexibility in this regard can also be gleaned from the results in [44], where it is shown how different iteratively reweighted algorithms correspond to different choices of the variance. However, we have not been able to prove rigorously that the GGAMP approximation will guarantee descent of the original cost function given in (3).

Nevertheless, our numerical experiments suggest that the GGAMP approximation has negligible effect on algorithm convergence and the ability to recover sparse solutions. We select two experiments to illustrate the convergence behavior and demonstrate that the approximate variance estimates are sufficient to decrease SBL's cost function (3). In both experiments x is drawn from a Bernoulli-Gaussian distribution with a non-zero probability λ set to 0.2, and we set N = 1000 and M = 500. Fig. 2 shows a comparison between the original SBL and GGAMP-SBL cost functions at each EM iteration of the algorithms. In Fig. 2(a), A is i.i.d.-Gaussian, while in Fig. 2(b) it is a column correlated transformation matrix, constructed according to the description given in Section V with correlation coefficient ρ = 0.9.

The cost functions in Fig. 2(a) and (b) show that, although we are using an approximate variance estimate to implement the M-step, the updates are decreasing SBL's cost function at each iteration. As noted previously, it is not necessary for the M-step to provide the maximum cost function reduction; it is sufficient to provide some reduction in the cost function for the EM algorithm to be effective. The cost function plots confirm this principle, since GGAMP-SBL eventually reaches the same minimal value as the original EM-SBL. While the two numerical experiments do not provide a guarantee that the overall GGAMP-SBL algorithm will converge, they suggest that the performance of the GGAMP-SBL algorithm often matches that of the original EM-SBL algorithm, which is supported by the more extensive numerical results in Section V.

IV. GGAMP-TSBL FOR THE MMV PROBLEM

In this section, we apply the damped GAMP algorithm to the MMV empirical Bayesian approach to derive a low complexity algorithm for the MMV case as well. Since the GAMP algorithm was originally derived for the SMV case using an SMV factor graph [22], extending it to the MMV case requires some more effort: we must go back to the factor graphs that are the basis of the GAMP algorithm, make some adjustments, and then utilize the GAMP algorithm.

Once again we use an empirical Bayesian approach with a GSM model, and we focus on the ML estimate of γ. We assume a common sparsity profile among all measured vectors, and also account for the temporal correlation that might exist between the non-zero signal elements. Previous Bayesian algorithms that have shown good recovery performance for the MMV problem include extensions of the SMV SBL algorithm, such as MSBL [35], TSBL and TMSBL [34]. MSBL is a straightforward extension of SMV SBL, where no temporal correlation between non-zero elements is assumed, while TSBL and TMSBL account for temporal correlation. Even though the TMSBL algorithm has lower complexity compared to the TSBL algorithm, it still has complexity of O(NM²), which can limit its utility when the problem dimensions are large. Other AMP-based Bayesian algorithms have achieved linear complexity in the problem dimensions, like AMP-MMV [37]. However, AMP-MMV's robustness to generic A matrices is expected to be outperformed by an SBL-based approach.

A. MMV Model and Factor Graph

The MMV model can be stated as:

y(t) = Ax(t) + e(t) , t = 1, 2, ..., T,


where we have T measurement vectors [y^(1), y^(2), ..., y^(T)] with y^(t) ∈ R^M. The objective is to recover X = [x^(1), x^(2), ..., x^(T)] with x^(t) ∈ R^N, where in addition to the vectors x^(t) being sparse, they share the same sparsity profile. Similar to the SMV case, A ∈ R^{M×N} is known, and [e^(1), e^(2), ..., e^(T)] is a sequence of i.i.d. noise vectors modeled as e^(t) ∼ N(0, σ²I). This model can be restated as:

ȳ = D(A) x̄ + ē,

where ȳ ≜ [y^(1)ᵀ, y^(2)ᵀ, ..., y^(T)ᵀ]ᵀ, x̄ ≜ [x^(1)ᵀ, x^(2)ᵀ, ..., x^(T)ᵀ]ᵀ, ē ≜ [e^(1)ᵀ, e^(2)ᵀ, ..., e^(T)ᵀ]ᵀ, and D(A) is a block-diagonal matrix constructed from T replicas of A.

The posterior distribution of x̄ is given by:

p(x̄ | ȳ) ∝ Π_{t=1}^T [ Π_{m=1}^M p(y_m^(t) | x^(t)) Π_{n=1}^N p(x_n^(t) | x_n^(t−1)) ],

where

p(y_m^(t) | x^(t)) = N(y_m^(t); aᵀ_{m,·} x^(t), σ²),

and aᵀ_{m,·} is the mth row of the matrix A.

Similar to the previous work in [36], [37], [45], we use an AR(1) process to model the correlation between x_n^(t) and x_n^(t−1), i.e.,

x_n^(t) = β x_n^(t−1) + √(1 − β²) v_n^(t)
p(x_n^(t) | x_n^(t−1)) = N(x_n^(t); β x_n^(t−1), (1 − β²) γ_n),   t > 1
p(x_n^(1)) = N(x_n^(1); 0, γ_n),

where β ∈ (−1, 1) is the temporal correlation coefficient and v_n^(t) ∼ N(0, γ_n). Following an empirical Bayesian approach similar to the one proposed for the SMV case, the hyperparameter vector γ is then learned from the measurements using the EM algorithm. The EM algorithm can also be used to learn the correlation coefficient β and the noise variance σ². Based on these assumptions, we use the sum-product algorithm [46] to construct the factor graph in Fig. 3 and derive the MMV algorithm GGAMP-TSBL. In the MMV factor graph, the factors are g_m^(t)(x) = p(y_m^(t) | x^(t)), f_n^(t)(x_n^(t)) = p(x_n^(t) | x_n^(t−1)) for t > 1, and f_n^(1)(x_n^(1)) = p(x_n^(1)).
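As an illustration of this signal model, the following sketch (hypothetical names) generates an MMV signal matrix with a common support and the AR(1) temporal correlation described above, as used in the experiments of Section V:

```python
import numpy as np

def generate_mmv_signal(N, T, lam, beta, rng=None):
    """Draw X = [x^(1), ..., x^(T)] with a common Bernoulli support (prob. lam)
    and AR(1) temporal correlation beta on the non-zero rows; gamma_n = 1 on the support."""
    rng = np.random.default_rng() if rng is None else rng
    support = rng.random(N) < lam                          # common sparsity profile
    X = np.zeros((N, T))
    X[support, 0] = rng.normal(size=support.sum())         # x_n^(1) ~ N(0, gamma_n)
    for t in range(1, T):
        v = rng.normal(size=support.sum())                 # v_n^(t) ~ N(0, gamma_n)
        X[support, t] = beta * X[support, t - 1] + np.sqrt(1 - beta**2) * v
    return X, support
```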

B. GGAMP-TSBL Message Phases and Scheduling (E-Step)

Due to the similarities between the factor graph for each time frame of the MMV model and the factor graph of the SMV model, we will use the algorithm in Table I as a building block and extend it to the MMV case. We divide the message updates into three steps, as shown in Fig. 4.

Fig. 3. GGAMP-TSBL factor graph.

Fig. 4. Message passing phases for GGAMP-TSBL.

For each time frame, the "within" step in Fig. 4 is very similar to the SMV GAMP iteration, with the only difference being that each x_n^(t) is connected to the factor nodes f_n^(t) and f_n^(t+1), while it is connected to one factor node in the SMV case. This difference is reflected in the calculation of the output function g_x, and therefore in finding the mean and variance estimates for x. The details of finding g_x, and therefore the update equations for τ_x^(t) and x̂^(t), are shown in Appendix A. The input function g_s is the same as (21), and the update equations for τ_s^(t) and s^(t) are the same as (A3) and (A4) from Table I, because an AWGN model is assumed for the noise. The second type of update passes messages forward in time from x_n^(t−1) to x_n^(t) through f_n^(t). The final type of update passes messages backward in time from x_n^(t+1) to x_n^(t) through f_n^(t). The details of the "forward" and "backward" message passing steps are also shown in Appendix A.

We schedule the messages by moving forward in time first, where we run the "forward" step starting at t = 1 all the way to t = T. We then perform the "within" step for all time frames; this step updates r^(t), τ_r^(t), x̂^(t) and τ_x^(t), which are needed for the "forward" and "backward" message passing steps. Finally, we pass the messages backward in time using the "backward" step, starting at t = T and ending at t = 1. Based on this message schedule, the GAMP algorithm E-step computation is summarized in Table II.

TABLE II: GGAMP-TSBL ALGORITHM

In Table II we use the unparenthesized superscript to indicate the iteration index, while the parenthesized superscript indicates the time frame index. Similar to Table I, K_max is the maximum allowed number of GAMP iterations, ε_gamp is the GAMP normalized tolerance parameter, I_max is the maximum allowed number of EM iterations and ε_em is the EM normalized tolerance parameter. In Table II all vector squares, divisions and multiplications are taken element-wise.

The proposed algorithm can be considered an extension of the previously proposed AMP-TSBL algorithm in [33]. The extension to GGAMP-TSBL includes removing the averaging of the matrix A in the derivation of the algorithm, and introducing the same damping strategy used in the SMV case to improve convergence. The complexity of the GGAMP-TSBL algorithm is also dominated by the E-step, which in turn is dominated by matrix multiplications by A, Aᵀ, S and Sᵀ, implying that the computational cost is O(MN) flops per iteration per frame. Therefore the complexity of the proposed algorithm is O(TMN) multiplied by the total number of GAMP algorithm iterations.

C. GGAMP-TSBL M-Step

Upon the convergence of the E-step, the M-step learns γ from the data by treating x̄ as a hidden variable and then maximizing E_{x̄|ȳ;γ^i,σ²,β}[log p(ȳ, x̄, γ; σ², β)]:

γ^{i+1} = argmin_γ E_{x̄|ȳ;γ^i,σ²,β}[−log p(ȳ, x̄, γ; σ², β)].

The derivation of the γ_n^{i+1} M-step update follows the same steps as in the SMV case. The details are omitted here due to space limitations, and the γ_n^{i+1} update is given by

γ_n^{i+1} = (1/T) [ |x̂_n^(1)|² + τ_{x_n}^(1) + (1/(1 − β²)) Σ_{t=2}^T ( |x̂_n^(t)|² + τ_{x_n}^(t) + β²(|x̂_n^(t−1)|² + τ_{x_n}^(t−1)) − 2β(x̂_n^(t) x̂_n^(t−1) + β τ_{x_n}^(t−1)) ) ].   (25)

We note here that the M-step γ learning rule in (25) is the same as the one derived in [36]. Both algorithms use the same AR(1) model for x^(t), but they differ in the implementation of the E-step. In the case that the correlation coefficient β or the noise variance σ² is unknown, the EM algorithm can be used to estimate their values as well.
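A vectorized sketch of (25) (our variable names), operating on the per-frame posterior means and variances returned by the E-step in Table II:

```python
import numpy as np

def tsbl_m_step(X_hat, Tau_x, beta):
    """MMV M-step (25): update gamma_n from per-frame means X_hat (N x T)
    and marginal variances Tau_x (N x T)."""
    T = X_hat.shape[1]
    first = X_hat[:, 0]**2 + Tau_x[:, 0]
    cur, prev = X_hat[:, 1:], X_hat[:, :-1]
    cur_v, prev_v = Tau_x[:, 1:], Tau_x[:, :-1]
    inner = (cur**2 + cur_v
             + beta**2 * (prev**2 + prev_v)
             - 2 * beta * (cur * prev + beta * prev_v))
    return (first + inner.sum(axis=1) / (1 - beta**2)) / T
```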

V. NUMERICAL RESULTS

In this section we present a numerical study to illustrate the performance and complexity of the proposed GGAMP-SBL and GGAMP-TSBL algorithms. The performance and complexity were studied through two metrics. The first metric studies the ability of the algorithm to recover x, for which we use the normalized mean squared error (NMSE) in the SMV case:

NMSE ≜ ‖x̂ − x‖²/‖x‖²,

and the time-averaged normalized mean squared error (TNMSE) in the MMV case:

TNMSE ≜ (1/T) Σ_{t=1}^T ‖x̂^(t) − x^(t)‖²/‖x^(t)‖².
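These metrics translate directly into code (a minimal sketch, our own names):

```python
import numpy as np

def nmse(x_hat, x):
    """Normalized mean squared error for the SMV case."""
    return np.sum((x_hat - x)**2) / np.sum(x**2)

def tnmse(X_hat, X):
    """Time-averaged NMSE for the MMV case; columns are the time frames x^(t)."""
    per_frame = np.sum((X_hat - X)**2, axis=0) / np.sum(X**2, axis=0)
    return per_frame.mean()
```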

The second metric studies the complexity of the algorithm by tracking the time the algorithm requires to compute the final estimate x̂. We measure the time in seconds. While the absolute runtimes could vary if the same experiments were run on a different machine, the runtimes of the algorithms of interest relative to each other are a good estimate of their relative computational complexity.

Several types of non-i.i.d.-Gaussian matrices were used to explore the robustness of the proposed algorithms relative to the standard SBL and TMSBL. The four different types of matrices are similar to the ones previously used in [26] and are described as follows:

- Column correlated matrices: The rows of A are independent zero-mean Gaussian Markov processes with correlation coefficient ρ = E{aᵀ_{·,n} a_{·,n+1}}/E{|a_{·,n}|²}, where a_{·,n} is the nth column of A. In the experiments the correlation coefficient ρ is used as the measure of deviation from the i.i.d.-Gaussian matrix.

- Low rank product matrices: We construct a rank deficient A by A = (1/N) HG with H ∈ R^{M×R}, G ∈ R^{R×N} and R < M. The entries of H and G are i.i.d.-Gaussian with zero mean and unit variance. The rank ratio R/N is used as the measure of deviation from the i.i.d.-Gaussian matrix.

- Ill conditioned matrices: We construct A with a condition number κ > 1 as follows: A = UΣVᵀ, where U and Vᵀ are the left and right singular vector matrices of an i.i.d.-Gaussian matrix, and Σ is a singular value matrix with Σ_{i,i}/Σ_{i+1,i+1} = κ^{1/(M−1)} for i = 1, 2, ..., M − 1. The condition number κ is used as the measure of deviation from the i.i.d.-Gaussian matrix.

- Non-zero mean matrices: The elements of A are a_{m,n} ∼ N(μ, 1/N). The mean μ is used as a measure of deviation from the zero-mean i.i.d.-Gaussian matrix. It is worth noting that in the case of non-zero mean A, convergence of GGAMP-SBL is enhanced not so much by damping as by the mean removal procedure explained in [26]. We include this procedure in the implementation of our algorithm, and we include non-zero mean matrices in the numerical results to make the study more inclusive of different types of generic A matrices.
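The four matrix families can be generated as in the sketch below, written under the descriptions above; the unit-variance AR(1) innovation scaling in column_correlated is our assumption (the text does not specify it), and all function and argument names are ours:

```python
import numpy as np

def column_correlated(M, N, rho, rng):
    # Each row is a zero-mean Gaussian AR(1) (Markov) process across columns with coefficient rho.
    A = np.empty((M, N))
    A[:, 0] = rng.normal(size=M)
    for n in range(1, N):
        A[:, n] = rho * A[:, n - 1] + np.sqrt(1 - rho**2) * rng.normal(size=M)
    return A

def low_rank_product(M, N, R, rng):
    # A = (1/N) H G with i.i.d.-Gaussian H (M x R) and G (R x N), R < M.
    return (rng.normal(size=(M, R)) @ rng.normal(size=(R, N))) / N

def ill_conditioned(M, N, kappa, rng):
    # Replace the singular values of an i.i.d.-Gaussian matrix so that
    # Sigma_ii / Sigma_{i+1,i+1} = kappa^(1/(M-1)), giving condition number kappa.
    U, _, Vt = np.linalg.svd(rng.normal(size=(M, N)), full_matrices=False)
    s = kappa ** (-np.arange(M) / (M - 1))
    return U @ np.diag(s) @ Vt

def nonzero_mean(M, N, mu, rng):
    # Entries a_{m,n} ~ N(mu, 1/N).
    return rng.normal(loc=mu, scale=1 / np.sqrt(N), size=(M, N))

# Example: A = column_correlated(500, 1000, rho=0.9, rng=np.random.default_rng(0))
```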

Although we have provided an estimation procedure for the noise variance σ², based on the EM algorithm, in (13), in all experiments we assume that the noise variance σ² is known. We also found that the SBL algorithm does not necessarily have the best performance when the exact σ² is used; in our case, it was empirically found that using an estimate σ̂² = 3σ² yields better results. Therefore σ̂² is used for SBL, TMSBL, GGAMP-SBL and GGAMP-TSBL throughout our experiments.

A. SMV GGAMP-SBL Numerical Results

In this section we compare the proposed SMV algorithm (GGAMP-SBL) against the original SBL and against two AMP algorithms that have shown improvement in robustness over the original AMP/GAMP, namely the SwAMP algorithm [27] and the MADGAMP algorithm [26]. As a performance benchmark, we use a lower bound on the achievable NMSE which is similar to the one in [26]. The bound is found using a "genie" that knows the support of the sparse vector x. Based on the known support, a matrix Ā is constructed from the columns of A corresponding to the non-zero elements of x, and an MMSE solution using Ā is computed:

x̂ = Āᵀ(ĀĀᵀ + σ²I)⁻¹y.
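A sketch of this genie bound (our names), given the true support as a boolean mask:

```python
import numpy as np

def genie_mmse(y, A, support, sigma2):
    """Support-aware MMSE lower-bound estimate: restrict A to the true support
    and apply the MMSE formula above (unit-variance Gaussian non-zero entries)."""
    A_s = A[:, support]
    G = A_s @ A_s.T + sigma2 * np.eye(A.shape[0])
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = A_s.T @ np.linalg.solve(G, y)
    return x_hat
```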

In all SMV experiments, x had exactly K non-zero elements in random locations, and the non-zero entries were drawn independently from a zero-mean unit-variance Gaussian distribution. In accordance with the model (1), an AWGN channel was used with the SNR defined by:

SNR ≜ E{‖Ax‖²}/E{‖y − Ax‖²}.

1) Robustness to Generic Matrices at High SNR: The first experiment investigates the robustness of the proposed algorithm to generic A matrices. It compares the algorithms of interest using the four types of matrices mentioned above, over a range of deviation from the i.i.d.-Gaussian case. For each matrix type, we start with an i.i.d.-Gaussian A and increase the deviation over 11 steps. We monitor how much deviation the different algorithms can tolerate before we start seeing significant performance degradation compared to the "genie" bound. The vector x was drawn from a Bernoulli-Gaussian distribution with non-zero probability λ = 0.2, with N = 1000, M = 500 and SNR = 60 dB.

Fig. 5. NMSE comparison of SMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 60 dB.

The NMSE results in Fig. 5 show that the performance of GGAMP-SBL was able to match that of the original SBL even for A matrices with the largest deviation from the i.i.d.-Gaussian case. Both algorithms nearly achieved the bound in most cases, with the exception of low rank matrices with a rank ratio less than 0.45, where both algorithms fail to achieve the bound. This supports the evidence we provided before for the convergence of the GGAMP-SBL algorithm, which predicted its ability to match the performance of the original SBL. As for the other AMP implementations, despite the improvement in robustness they provide over traditional AMP/GAMP, they cannot guarantee convergence beyond a certain point, and their robustness is surpassed by GGAMP-SBL in most cases. The only exception is when A is non-zero mean, where the GGAMP-SBL and MADGAMP algorithms share similar performance. This is due to the fact that both algorithms use mean removal to transform the problem into a zero-mean equivalent problem, which both algorithms can handle well.

The complexity of the GGAMP-SBL algorithm is studied in Fig. 6. The figure shows how GGAMP-SBL was able to reduce the complexity compared to the original SBL implementation. It also shows that even when the algorithm is slowed down by heavier damping, it still has faster runtimes than the original SBL.

Fig. 6. Runtime comparison of SMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 60 dB.

2) Robustness to Generic Matrices at Lower SNR: In this experiment we examine the performance and complexity of the proposed algorithm at a lower SNR setting than the previous experiment. We lower the SNR to 30 dB and collect the same data points as in the previous experiment. The results in Fig. 7 show that the performance of the GGAMP-SBL algorithm still generally matches that of the original SBL algorithm, with slight degradation. The MADGAMP algorithm provides slightly better performance than both SBL algorithms when the deviation from the i.i.d.-sub-Gaussian case is not too large. This can be attributed to the fact that we choose to run the MADGAMP algorithm with exact knowledge of the data model rather than learning the model parameters, while both SBL algorithms have information about the noise variance only. As the deviation in A increases, GGAMP-SBL's performance surpasses the MADGAMP and SwAMP algorithms, providing better robustness at lower SNR.

On the complexity side, we see from Fig. 8 that GGAMP-SBL continues to have reduced complexity compared to the original SBL.

Fig. 7. NMSE comparison of SMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 30 dB.

Fig. 8. Runtime comparison of SMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 30 dB.

3) Performance and Complexity Versus Problem Dimensions: To show the effect of increasing the problem dimensions on the performance and complexity of the different algorithms, we plot the NMSE and runtime against N, while we keep an M/N ratio of 0.5, a K/N ratio of 0.2 and an SNR of 60 dB. We run the experiment using column correlated matrices with ρ = 0.9.

Fig. 9. Performance and complexity comparison for SMV algorithms versus problem dimensions.

As expected from previous experiments, Fig. 9(a) shows that only the GGAMP-SBL and SBL algorithms can recover x when we use column correlated matrices with a correlation coefficient of ρ = 0.9. The comparison between the performance of SBL and GGAMP-SBL shows almost identical NMSE.

As problem dimensions grow, Fig. 9(b) shows that the difference in runtimes between the original SBL and GGAMP-SBL algorithms grows to become more significant, which suggests that GGAMP-SBL is more practical for large size problems.

4) Performance Versus Undersampling Ratio M/N: In this section we examine the ability of the proposed algorithm to recover a sparse vector from undersampled measurements at different undersampling ratios M/N. In the experiments below we fix N at 1000 and vary M. We set the Bernoulli-Gaussian non-zero probability λ so that M/K averages three measurements for each non-zero component. We plot the NMSE versus the undersampling ratio M/N for i.i.d.-Gaussian matrices A and for column correlated A with ρ = 0.9. We run the experiments at SNR = 60 dB and at SNR = 30 dB. In Fig. 10 we find that for SNR = 60 dB and i.i.d.-Gaussian A, all algorithms meet the SKS bound when the undersampling ratio is larger than or equal to 0.25, while all algorithms fail to meet the bound at any ratio smaller than that. When A is column correlated, SBL and GGAMP-SBL are able to meet the SKS bound at M/N ≥ 0.3, while MADGAMP and SwAMP do not meet the bound even at M/N = 0.5. We also note that MADGAMP's NMSE slowly improves with increased undersampling ratio, while SwAMP's NMSE does not. At SNR = 30 dB, with i.i.d.-Gaussian A, all algorithms are close to the SKS bound when the undersampling ratio is larger than 0.3. At M/N ≤ 0.3, SBL and GGAMP-SBL are slightly outperformed by MADGAMP, while SwAMP seems to have the best performance in this region. When A is column correlated, the NMSE of SBL and GGAMP-SBL outperforms the other two algorithms, and similar to the SNR = 60 dB case, MADGAMP's NMSE seems to slowly improve with increased undersampling ratio, while SwAMP's NMSE does not improve.

Fig. 10. NMSE comparison of SMV algorithms versus the undersampling ratio M/N.

B. MMV GGAMP-TSBL Numerical Results

In this section, we present a numerical study to illustrate the performance and complexity of the proposed GGAMP-TSBL algorithm. Although the AMP-MMV algorithm in [37] can be extended to incorporate damping, the current implementation of AMP-MMV does not include damping and will diverge when used with the types of generic A matrices we are considering in our experiments. Therefore, we restrict the comparison of the performance and complexity of the GGAMP-TSBL algorithm to the TMSBL algorithm. We also compare the recovery performance against a lower bound on the achievable TNMSE, obtained by extending the support-aware Kalman smoother (SKS) from [37] to include damping and hence be able to handle generic A matrices. The implementation of the smoother is straightforward, and is exactly the same as the E-step in Table II when the true values of σ², γ and β are used and when A is modified to include only the columns corresponding to the non-zero elements in x^(t). An AWGN channel was also assumed in the case of MMV.

1) Robustness to Generic Matrices at High SNR: This experiment investigates the robustness of the proposed algorithm by comparing it to TMSBL and the support-aware smoother. Once again we use the four types of matrices mentioned at the beginning of this section, over the same range of deviation from the i.i.d.-Gaussian case. For this experiment we set N = 1000, M = 500, λ = 0.2, SNR = 60 dB and the temporal correlation coefficient β to 0.9. We choose a relatively high value for β to provide a large deviation from the SMV case. This is due to the fact that the no-correlation case reduces to solving multiple SMV instances in the E-step and then applying the M-step to update the hyperparameter vector γ, which is common across time frames [35]. The TNMSE results in Fig. 11 show that the performance of GGAMP-TSBL was able to match that of TMSBL in all cases, and they both achieved the SKS bound.

Fig. 11. TNMSE comparison of MMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 60 dB.

Once again, Fig. 12 shows that the proposed GGAMP-TSBL was able to reduce the complexity compared to the TMSBL algorithm, even when damping was used. Although the complexity reduction does not seem to be significant for the selected problem size and SNR, we will see in the following experiments how this reduction becomes more significant as the problem size grows or as a lower SNR is used.

2) Robustness to Generic Matrices at Lower SNR: The performance and complexity of the proposed algorithm are examined at a lower SNR setting than in the previous experiment. We set the SNR to 30 dB and collect the same data points as in the 60 dB SNR case. Fig. 13 shows that the GGAMP-TSBL performance matches that of TMSBL and almost achieves the bound in most cases.

Similar to the previous cases, Fig. 14 shows that the complexity of GGAMP-TSBL is lower than that of TMSBL.

Fig. 12. Runtime comparison of MMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 60 dB.

Fig. 13. TNMSE comparison of MMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 30 dB.


Fig. 14. Runtime comparison of MMV algorithms under non-i.i.d.-Gaussian A matrices with SNR = 30 dB.

Fig. 15. Performance and complexity comparison for MMV algorithms versus problem dimensions.

3) Performance and Complexity Versus Problem Dimension: To validate the claim that the proposed algorithm is better suited to large-scale problems, we study the algorithms' performance and complexity against the signal dimension N. We keep an M/N ratio of 0.5, a K/N ratio of 0.2 and an SNR of 60 dB. We run the experiment using column-correlated matrices with ρ = 0.9. In addition, we set β to 0.9, corresponding to high temporal correlation. In terms of performance, Fig. 15(a) shows that the proposed GGAMP-TSBL algorithm was able to match the performance of TMSBL. However, in terms of complexity, similar to the SMV case, Fig. 15(b) shows that the runtime difference becomes more significant as the problem size grows, making GGAMP-TSBL a better choice for large-scale problems.
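A rough accounting of the dominant per-iteration costs (our reading of the algorithms' structure, not a figure reported in this section) helps explain the growing runtime gap in Fig. 15(b):

\begin{align*}
% assumed dominant per-iteration costs, per time frame:
% GGAMP-TSBL E-step: matrix-vector products with A and A^T only
% TMSBL-type E-step: an M x M inversion plus its application across the dictionary
C_{\text{GGAMP-TSBL}} &= \mathcal{O}(MN), \\
C_{\text{TMSBL}} &= \mathcal{O}(M^{3} + NM^{2}),
\end{align*}

so, under these assumed orders and with M = N/2 as in this experiment, the ratio of the two costs grows roughly linearly in N.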

VI. CONCLUSION

In this paper, we presented a GAMP-based SBL algorithm for solving the sparse signal recovery problem. SBL uses sparsity-promoting priors on x that admit a Gaussian scale mixture representation. Because of the Gaussian embedding offered by the GSM class of priors, we were able to leverage the Gaussian GAMP algorithm, along with its convergence guarantees given in [24] when sufficient damping is used, to develop a reliable and fast algorithm. We numerically showed how this damped GGAMP implementation of the SBL algorithm also reduces the cost function of the original SBL approach. The algorithm was then extended to solve the MMV SSR problem in the case of generic A matrices and temporal correlation, using a similar GAMP-based SBL approach. Numerical results show that both the proposed SMV and MMV algorithms were more robust to generic A matrices when compared to other AMP algorithms. In addition, numerical results also show the significant reduction in complexity the proposed algorithms offer over the original SBL and TMSBL algorithms, even when sufficient damping is used to slow down the updates to guarantee convergence. Therefore, the proposed algorithms address the convergence limitations of AMP algorithms as well as the complexity challenges of traditional SBL algorithms, while retaining their positive attributes, namely the robustness of SBL to generic A matrices and the low complexity of message passing algorithms.

APPENDIX A
DERIVATION OF GGAMP-TSBL UPDATES

A. The Within Step Updates

To make the factor graph for the within step in Fig. 4 exactly the same as the SMV factor graph, we combine the product of the two messages incoming from $f_n^{(t)}$ and $f_n^{(t+1)}$ to $x_n^{(t)}$ into one message as follows:

\begin{align*}
V_{f_n^{(t)} \rightarrow x_n^{(t)}} &\propto \mathcal{N}\big(x_n^{(t)}; \eta_n^{(t)}, \psi_n^{(t)}\big) \\
V_{f_n^{(t+1)} \rightarrow x_n^{(t)}} &\propto \mathcal{N}\big(x_n^{(t)}; \theta_n^{(t)}, \phi_n^{(t)}\big) \\
V_{f_n^{(t)} \rightarrow x_n^{(t)}}\, V_{f_n^{(t+1)} \rightarrow x_n^{(t)}} &\propto \mathcal{N}\big(x_n^{(t)}; \rho_n^{(t)}, \zeta_n^{(t)}\big) \\
&\propto \mathcal{N}\!\left(x_n^{(t)};\;
\frac{\dfrac{\eta_n^{(t)}}{\psi_n^{(t)}} + \dfrac{\theta_n^{(t)}}{\phi_n^{(t)}}}{\dfrac{1}{\psi_n^{(t)}} + \dfrac{1}{\phi_n^{(t)}}},\;
\frac{1}{\dfrac{1}{\psi_n^{(t)}} + \dfrac{1}{\phi_n^{(t)}}}\right). \qquad (26)
\end{align*}
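For reference, (26) follows from the standard rule for multiplying two Gaussian densities in the same variable: the precisions add, and the mean is the precision-weighted average of the means,

\begin{align*}
% standard Gaussian product rule; applied here with (a, A) = (eta_n^(t), psi_n^(t))
% and (b, B) = (theta_n^(t), phi_n^(t))
\mathcal{N}(x; a, A)\,\mathcal{N}(x; b, B)
\;\propto\;
\mathcal{N}\!\left(x;\; \frac{a/A + b/B}{1/A + 1/B},\; \frac{1}{1/A + 1/B}\right).
\end{align*}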


Combining these two messages reduces each time frame's factor graph to one equivalent to the SMV case, with the modified prior on $x_n^{(t)}$ given in (26). Applying the damped GAMP algorithm from [24] with $p(x_n^{(t)})$ given by (26):

\begin{align*}
g_x^{(t)} &= \frac{\dfrac{r^{(t)}}{\tau_r^{(t)}} + \dfrac{\rho^{(t)}}{\zeta^{(t)}}}{\dfrac{1}{\tau_r^{(t)}} + \dfrac{1}{\zeta^{(t)}}}
= \frac{\dfrac{r^{(t)}}{\tau_r^{(t)}} + \dfrac{\eta^{(t)}}{\psi^{(t)}} + \dfrac{\theta^{(t)}}{\phi^{(t)}}}{\dfrac{1}{\tau_r^{(t)}} + \dfrac{1}{\psi^{(t)}} + \dfrac{1}{\phi^{(t)}}} \\[6pt]
\tau_x^{(t)} &= \frac{1}{\dfrac{1}{\tau_r^{(t)}} + \dfrac{1}{\zeta^{(t)}}}
= \frac{1}{\dfrac{1}{\tau_r^{(t)}} + \dfrac{1}{\psi^{(t)}} + \dfrac{1}{\phi^{(t)}}}.
\end{align*}
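As a sanity check on these scalar updates, a minimal numerical sketch is given below; the variable names follow this appendix and the snippet is not taken from any released implementation.

import numpy as np

def within_step(r, tau_r, eta, psi, theta, phi):
    # Posterior mean/variance implied by the modified prior (26) combined with
    # the GAMP pseudo-measurement N(x; r, tau_r); inputs are per-component
    # scalars or NumPy arrays of equal shape.
    prec = 1.0 / tau_r + 1.0 / psi + 1.0 / phi           # precisions add
    tau_x = 1.0 / prec                                    # posterior variance
    g_x = tau_x * (r / tau_r + eta / psi + theta / phi)   # precision-weighted mean
    return g_x, tau_x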

B. Forward Message Updates

\begin{align*}
V_{f_n^{(1)} \rightarrow x_n^{(1)}} &\propto \mathcal{N}\big(x_n^{(1)}; 0, \gamma_n\big) \\[4pt]
V_{f_n^{(t)} \rightarrow x_n^{(t)}} &\propto \mathcal{N}\big(x_n^{(t)}; \eta_n^{(t)}, \psi_n^{(t)}\big) \\
&\propto \int \left( \prod_{l=1}^{M} V_{g_l^{(t-1)} \rightarrow x_n^{(t-1)}} \right) V_{f_n^{(t-1)} \rightarrow x_n^{(t-1)}}\, P\big(x_n^{(t)} \mid x_n^{(t-1)}\big)\, dx_n^{(t-1)} \\
&\propto \int \mathcal{N}\big(x_n^{(t-1)}; r_n^{(t-1)}, \tau_{r_n}^{(t-1)}\big)\, \mathcal{N}\big(x_n^{(t-1)}; \eta_n^{(t-1)}, \psi_n^{(t-1)}\big)\, \mathcal{N}\big(x_n^{(t)}; \beta x_n^{(t-1)}, (1-\beta^{2})\gamma_n\big)\, dx_n^{(t-1)}.
\end{align*}

Using the rules for Gaussian pdf multiplication and convolution, we get the $\eta_n^{(t)}$ and $\psi_n^{(t)}$ updates given in Table II (E3) and (E4).
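The convolution rule referred to here is the standard Gaussian marginalization identity: pushing a Gaussian belief through the AR(1) transition used above yields

\begin{align*}
% here N(x'; m, V) stands for the (already multiplied) product of the two
% incoming Gaussians in the integrand
\int \mathcal{N}(x'; m, V)\,
\mathcal{N}\!\big(x; \beta x', (1-\beta^{2})\gamma\big)\, dx'
= \mathcal{N}\!\big(x;\; \beta m,\; \beta^{2} V + (1-\beta^{2})\gamma\big).
\end{align*}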

C. Backward Message Updates

\begin{align*}
V_{f_n^{(t+1)} \rightarrow x_n^{(t)}} &\propto \mathcal{N}\big(x_n^{(t)}; \theta_n^{(t)}, \phi_n^{(t)}\big) \\
&\propto \int \left( \prod_{l=1}^{M} V_{g_l^{(t+1)} \rightarrow x_n^{(t+1)}} \right) V_{f_n^{(t+2)} \rightarrow x_n^{(t+1)}}\, P\big(x_n^{(t+1)} \mid x_n^{(t)}\big)\, dx_n^{(t+1)} \\
&\propto \int \mathcal{N}\big(x_n^{(t+1)}; r_n^{(t+1)}, \tau_{r_n}^{(t+1)}\big)\, \mathcal{N}\big(x_n^{(t+1)}; \theta_n^{(t+1)}, \phi_n^{(t+1)}\big)\, \mathcal{N}\big(x_n^{(t+1)}; \beta x_n^{(t)}, (1-\beta^{2})\gamma_n\big)\, dx_n^{(t+1)}.
\end{align*}

Using the rules for Gaussian pdf multiplication and convolution, we get the $\theta_n^{(t)}$ and $\phi_n^{(t)}$ updates given in Table II (E13) and (E14).

REFERENCES

[1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[2] E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.

[3] R. G. Baraniuk, “Compressive sensing,” IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–121, Jul. 2007.

[4] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. New York, NY, USA: Springer, 2010.

[5] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing (Applied and Numerical Harmonic Analysis). Cambridge, MA, USA: Birkhauser, 2013.

[6] Y. C. Eldar and G. Kutyniok, Eds., Compressed Sensing: Theory and Applications. Cambridge, U.K.: Cambridge Univ. Press, 2012.

[7] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM J. Comput., vol. 24, no. 2, pp. 227–234, 1995.

[8] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.

[9] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, 2001.

[10] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[11] M. A. Figueiredo, J. M. Bioucas-Dias, and R. D. Nowak, “Majorization-minimization algorithms for wavelet-based image restoration,” IEEE Trans. Image Process., vol. 16, no. 12, pp. 2980–2991, Dec. 2007.

[12] E. J. Candes, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted ℓ1 minimization,” J. Fourier Anal. Appl., vol. 14, no. 5–6, pp. 877–905, 2008.

[13] R. Chartrand and W. Yin, “Iteratively reweighted algorithms for compressive sensing,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2008, pp. 3869–3872.

[14] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, Sep. 2001.

[15] D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Trans. Signal Process., vol. 52, pp. 2153–2164, Aug. 2004.

[16] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013.

[17] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using Laplace priors,” IEEE Trans. Image Process., vol. 19, no. 1, pp. 53–63, Jan. 2010.

[18] L. He and L. Carin, “Exploiting structure in wavelet-based Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3488–3497, Sep. 2009.

[19] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.

[20] D. Baron, S. Sarvotham, and R. G. Baraniuk, “Bayesian compressive sensing via belief propagation,” IEEE Trans. Signal Process., vol. 58, no. 1, pp. 269–280, Jan. 2010.

[21] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms for compressed sensing: I. Motivation and construction,” in Proc. IEEE Inf. Theory Workshop Inf. Theory, Jan. 2010, pp. 1–5.

[22] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2011, pp. 2168–2172.

[23] M. Bayati and A. Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.

[24] S. Rangan, P. Schniter, and A. Fletcher, “On the convergence of approximate message passing with arbitrary matrices,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2014, pp. 236–240.

[25] F. Caltagirone, L. Zdeborova, and F. Krzakala, “On convergence of approximate message passing,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2014, pp. 1812–1816.

[26] J. Vila, P. Schniter, S. Rangan, F. Krzakala, and L. Zdeborova, “Adaptive damping and mean removal for the generalized approximate message passing algorithm,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2015, pp. 2021–2025.

[27] A. Manoel, F. Krzakala, E. Tramel, and L. Zdeborova, “Swept approximate message passing for sparse estimation,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 1123–1132.

[28] S. Rangan, A. K. Fletcher, P. Schniter, and U. S. Kamilov, “Inference for generalized linear models via alternating directions and Bethe free energy minimization,” in Proc. IEEE Int. Symp. Inf. Theory, 2015, pp. 1640–1644.

[29] D. F. Andrews and C. L. Mallows, “Scale mixtures of normal distributions,” J. Roy. Stat. Soc. Series B (Methodological), vol. 36, pp. 99–102, 1974.

[30] J. Palmer, K. Kreutz-Delgado, B. D. Rao, and D. P. Wipf, “Variational EM algorithms for non-Gaussian latent variable models,” in Proc. Adv. Neural Inf. Process. Syst., 2005, pp. 1059–1066.

[31] K. Lange and J. S. Sinsheimer, “Normal/independent distributions and their applications in robust regression,” J. Comput. Graph. Stat., vol. 2, no. 2, pp. 175–198, 1993.


[32] N. L. Pedersen, C. N. Manchon, M.-A. Badiu, D. Shutin, and B. H. Fleury, “Sparse estimation using Bayesian hierarchical prior modeling for real and complex linear models,” Signal Process., vol. 115, pp. 94–109, 2015.

[33] M. Al-Shoukairi and B. Rao, “Sparse Bayesian learning using approximate message passing,” in Proc. IEEE 48th Asilomar Conf. Signals Syst. Comput., 2014, pp. 1957–1961.

[34] Z. Zhang and B. D. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,” IEEE J. Sel. Top. Signal Process., vol. 5, no. 5, pp. 912–926, Sep. 2011.

[35] D. P. Wipf and B. D. Rao, “An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,” IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, Jul. 2007.

[36] R. Prasad, C. R. Murthy, and B. D. Rao, “Joint approximately sparse channel estimation and data detection in OFDM systems using sparse Bayesian learning,” IEEE Trans. Signal Process., vol. 62, no. 14, pp. 3591–3603, Jul. 2014.

[37] J. Ziniel and P. Schniter, “Efficient high-dimensional inference in the multiple measurement vector problem,” IEEE Trans. Signal Process., vol. 61, no. 2, pp. 340–354, Jan. 2013.

[38] R. Giri and B. D. Rao, “Type I and type II Bayesian methods for sparse signal recovery using scale mixtures,” IEEE Trans. Signal Process., vol. 64, no. 13, pp. 3418–3428, 2016.

[39] M. E. Tipping et al., “Fast marginal likelihood maximisation for sparse Bayesian models,” in Proc. 9th Int. Workshop Artif. Intell. Stat., 2003. [Online]. Available: http://dblp.org/db/conf/aistats/aistats2003.html

[40] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ, USA: Springer, 2006.

[41] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, vol. 382. New York, NY, USA: Wiley, 2007.

[42] C. J. Wu, “On the convergence properties of the EM algorithm,” Ann. Stat., vol. 11, pp. 95–103, 1983.

[43] S. Rangan, P. Schniter, E. Riegler, A. Fletcher, and V. Cevher, “Fixed points of generalized approximate message passing with arbitrary matrices,” in Proc. IEEE Int. Symp. Inf. Theory, 2013, pp. 664–668.

[44] D. Wipf and S. Nagarajan, “Iterative reweighted ℓ1 and ℓ2 methods for finding sparse solutions,” IEEE J. Sel. Top. Signal Process., vol. 4, no. 2, pp. 317–329, Apr. 2010.

[45] Z. Zhang and B. D. Rao, “Sparse signal recovery in the presence of correlated multiple measurement vectors,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2010, pp. 3986–3989.

[46] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.

Maher Al-Shoukairi (S’17) received the B.Sc. degree in electrical engineering from the University of Jordan, Amman, Jordan, in 2005, and the M.S. degree in electrical engineering from Texas A&M University, College Station, TX, USA, in 2008, after which he joined Qualcomm Inc., where he has worked to date. He is currently working toward the Ph.D. degree in electrical engineering at the University of California at San Diego, San Diego, CA, USA.

Philip Schniter (S’92–M’93–SM’05–F’14) received the B.S. and M.S. degrees in electrical engineering from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 1992 and 1993, respectively, and the Ph.D. degree in electrical engineering from Cornell University, Ithaca, NY, USA, in 2000. After receiving the Ph.D. degree, he joined the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA, where he is currently a Professor. From 1993 to 1996, he was a Systems Engineer at Tektronix Inc., Beaverton, OR, USA. During 2008–2009, he was a Visiting Professor at Eurecom, Sophia Antipolis, France, and Supelec, Gif-sur-Yvette, France. During 2016–2017, he was a Visiting Professor at Duke University, Durham, NC, USA. His research interests include signal processing, wireless communications, and machine learning.

Bhaskar D. Rao (F’00) received the B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology Kharagpur, Kharagpur, India, in 1979, and the M.S. and Ph.D. degrees from the University of Southern California, Los Angeles, CA, USA, in 1981 and 1983, respectively. Since 1983, he has been with the University of California at San Diego, San Diego, CA, USA, where he is currently a Distinguished Professor in the Department of Electrical and Computer Engineering. He is the holder of the Ericsson Endowed Chair in Wireless Access Networks and was the Director of the Center for Wireless Communications (2008–2011). His research interests include digital signal processing, estimation theory, and optimization theory, with applications to digital communications, speech signal processing, and biomedical signal processing. He was elected Fellow of the IEEE in 2000 for his contributions to the statistical analysis of subspace algorithms for harmonic retrieval. He has received several paper awards: the 2013 Best Paper Award at the Fall 2013 IEEE Vehicular Technology Conference for the paper “Multicell Random Beamforming with CDF-based Scheduling: Exact Rate and Scaling Laws,” by Yichao Huang and Bhaskar D. Rao; the 2012 Signal Processing Society Best Paper Award for the paper “An Empirical Bayesian Strategy for Solving the Simultaneous Sparse Approximation Problem,” by D. P. Wipf and B. D. Rao, published in the IEEE TRANSACTIONS ON SIGNAL PROCESSING, vol. 55, no. 7, July 2007; and the 2008 Stephen O. Rice Prize Paper Award in the field of communication systems for the paper “Network Duality for Multiuser MIMO Beamforming Networks and Applications,” by B. Song, R. L. Cruz, and B. D. Rao, which appeared in the IEEE TRANSACTIONS ON COMMUNICATIONS, vol. 55, no. 3, pp. 618–630, March 2007 (http://www.comsoc.org/awards/rice.html), among others. He also received the 2016 IEEE Signal Processing Society Technical Achievement Award.