
J Sign Process Syst · DOI 10.1007/s11265-017-1231-0

High Order Multi-User MIMO Subspace Detection

Hadi Sarieddeen¹ · Mohammad M. Mansour¹ · Louay Jalloul² · Ali Chehab¹

Received: 24 March 2016 / Revised: 3 December 2016 / Accepted: 10 February 2017 © Springer Science+Business Media New York 2017

Abstract An efficient high order multi-user multiple-input multiple-output (MU-MIMO) subspace detector is proposed. The detector employs joint modulation classification (MC) and subspace detection (SD), by which the modulation type of the interferer is estimated, while multiple decoupled streams are individually detected. The algorithmic contributions are on two levels. First, the preprocessing channel matrix decomposition overhead is reduced, using special layer ordering followed by permutation-robust QR decomposition and elementary matrix operations. Second, a hierarchical MC scheme is proposed, comprising feature-based and near-optimal likelihood-based classifiers, as well as a classifier that always assumes the interfering modulation type to be a fixed high order quadrature amplitude modulation. An efficient hardware architecture that realizes the proposed algorithms is presented. Simulations demonstrate that, depending on the channel condition, one of the proposed schemes can achieve near interference-aware performance with minimal complexity overhead.

Early results of this paper were presented at the IEEE Wireless Communication and Networking Conference (WCNC 2016), Doha, Qatar, April 2016 [41].

Hadi Sarieddeen

Mohammad M. Mansour

Louay Jalloul

Ali Chehab

1 Department of Electrical and Computer Engineering, American University of Beirut, Beirut, 1107 2020, Lebanon

2 Qualcomm Incorporated, San Diego, CA, USA

Keywords Multi-user MIMO · Modulation classification · Subspace detection · Matrix decomposition

1 Introduction

Multi-user multiple-input multiple-output (MU-MIMO) technology [1, 2] allows simultaneous transmissions to multiple users over the same time-frequency resource elements, by using multiple antennas at the transmitter and the receiver. Although exploiting the spatial dimension has been successfully used to increase the link throughput and network capacity in several wireless communications standards [3–6], the corresponding increase in computational complexity that comes with a larger number of antennas hinders the realization of MIMO systems.

At the receiver side, MIMO detection schemes range between low-complexity, low-performance linear detectors, such as zero forcing (ZF) and minimum mean square error (MMSE), and optimal, high-complexity maximum likelihood (ML) detection. Many sub-optimal detectors in between offer good complexity-performance tradeoffs, such as the sphere detector and its variants [7–11], and subspace detection (SD) schemes [12–14], which in a special case reduce to the layered orthogonal lattice detector (LORD) [15].

Moreover, for MU-MIMO, different interference mitigation proposals have led to different receiver designs. Conventional linear processing techniques only use the channel estimate of the co-scheduled user, without requiring knowledge of its modulation type. Such techniques include interference ignoring (II), interference rejection combining (IRC), and single-layer MMSE (SL-MMSE) [16], the latter two having identical coded performance [17]. However, if the detectors explicitly take into account the modulation formats of the desired and the interference signals, remarkable performance gains can be achieved. Such interference-aware (IA) detectors, for example ML and minimum distance detectors [16], are noise limited rather than interference limited, and are not prone to error floors like conventional detectors.

Since current communication standards do not provide information about the interfering modulation type in the downlink, several techniques have emerged that decide on a specific interfering modulation type. In [18, 19], the constellation of the interfering user's signal is presumed to be 16-QAM regardless of its actual size, without any attempt to estimate it. A better approach, however, is to add an interference modulation classification (MC) routine, followed by a regular IA detector [20, 21].

MC techniques can be classified into two categories: feature-based classification, which depends on statistical properties, and likelihood-based classification, which is based on likelihood functions [22]. In this study, we consider a combination of both. The two main likelihood-based MC approaches [23–25] are the average likelihood ratio test (ALRT) and the generalized likelihood ratio test (GLRT). While ALRT treats the signal and channel parameters as unknown random variables with known distributions, GLRT treats them as deterministic but unknown. The hybrid likelihood ratio test (HLRT) is a combination of the two. These approaches were extended to multi-user and MIMO scenarios [26–29].

In the feature-based approach, on the other hand, several discriminant features are selected, and the decision is made based on their observed values. In particular, higher-order cyclic cumulants (CCs) of the baseband intercepted signal are exploited as powerful features for linear digital MC [30–32]. Higher-order cumulants are mathematically convenient because the cumulant of a sum of independent processes is the sum of their individual cumulants, and the intrinsic cyclostationarity of communication signals makes the CCs robust to interference and stationary noise.

1.1 Contributions and Outline

In this paper, we propose an efficient subspace detector for MU-MIMO systems, where the modulation type of the interferer is estimated, while multiple decoupled streams are individually detected. The algorithmic contribution is twofold. First, we reduce the preprocessing channel matrix decomposition overhead of SD, using special layer ordering, followed by a permutation-robust QR decomposition (QRD) based on modified Gram-Schmidt (MGS) orthogonalization and elementary matrix operations. Then, we propose low-complexity versions of the optimal log-maximum a posteriori (Log-MAP) and Max-Log-MAP modulation classifiers that adapt to the limitations of the proposed SD scheme, and employ them in a hierarchical fashion with feature-based classifiers. We also show that the proposed algorithms can be efficiently implemented, by proposing a corresponding low-complexity architecture and studying its complexity in the context of 802.11ac [6].

The remainder of this paper is organized as follows. First, the system model is presented in Section 2, followed by the definition of the reference ML and SL-MMSE detectors in Section 3. Then, single-stream SD is presented and analysed in Section 4, and the proposed efficient regular MIMO detector is detailed in Section 5. After that, the proposed detector is extended to MU-MIMO, and several MC schemes are proposed in Sections 6 and 7; the architectures that jointly implement MC and detection are proposed in Section 8. Finally, the simulation scenario and results are presented in Section 9, before drawing concluding remarks in Section 10.

Regarding notation, bold upper case, bold lower case, and lower case letters correspond to matrices, vectors, and scalars, respectively. Absolute value (scalar norm), vector norm, and conjugate transpose are represented by $|\cdot|$, $\|\cdot\|$, and $(\cdot)^*$, respectively. $\mathbf{I}_N$ is the identity matrix of size $N$, $\mathrm{E}[\cdot]$ is the expected value, and $P(\cdot)$ is the probability density function.

2 System Model

2.1 Single-User MIMO

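In Sections 3, 4, and 5, we consider a symmetric MIMO system with $N$ transmit and $M = N$ receive antennas. The system model is represented as:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{1}$$

with $\mathbf{y} = [y_1\, y_2 \ldots y_M]^T \in \mathbb{C}^{M\times 1}$ being the received complex vector, $\mathbf{H} \in \mathbb{C}^{M\times N}$ the spatially multiplexed complex channel matrix, $\mathbf{x} = [x_1\, x_2 \ldots x_N]^T \in \mathbb{C}^{N\times 1}$ the transmitted symbol vector, and $\mathbf{n} \in \mathbb{C}^{M\times 1}$ the complex additive white Gaussian noise vector with zero mean and variance $\sigma_n^2$ (i.e., $\mathrm{E}[\mathbf{n}\mathbf{n}^*] = \sigma_n^2\mathbf{I}_N$).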

Each symbol $x_n$ belongs to a normalized complex constellation $\mathcal{X}_n$ of size $Q_n = 2^{q_n}$, thus $\mathbf{x} = [x_1\, x_2 \ldots x_N]^T \in \mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_N$ and $\mathrm{E}[x_n^* x_n] = 1$. Consequently, the signal-to-noise ratio (SNR) is defined in terms of the noise variance as:

$$\mathrm{SNR} = \frac{N}{\sigma_n^2} \tag{2}$$

The bit representation of a symbol is a coded bit-interleaved sequence $\mathbf{b}_n = (b_{n,1}, b_{n,2}, \ldots, b_{n,q_n})$.


2.2 Multi-User MIMO

In the MU-MIMO scenario (Section 6), we consider a system model where $N_{\text{user}} \le N$ antennas transmit useful data to the user of interest, while the remaining $N_{\text{inter}} = N - N_{\text{user}}$ antennas send interfering data (Fig. 1). Note that the entries of $\mathbf{H}$ are still considered independent and identically distributed complex Gaussian, with zero mean and unit variance, and no weighting is applied.

We assume that the $N_{\text{user}}$ symbols of the user of interest, which form $\mathbf{x}_{\text{user}} = [x_1 \cdots x_{N_{\text{user}}}]^T$, are drawn from the arbitrary, but known, constellation $\mathcal{M}$. We also assume, without loss of generality, that the $N_{\text{inter}}$ symbols of the interferer, which form $\mathbf{x}_{\text{inter}} = [x_{N_{\text{user}}+1} \cdots x_N]^T$, are drawn from the same unknown constellation $\mathcal{U}_j$, $j \in \{0, 1, 2, 3, 4\}$, where $\mathcal{U}_0$, $\mathcal{U}_1$, $\mathcal{U}_2$, $\mathcal{U}_3$, and $\mathcal{U}_4$ correspond to the constellations $\phi$, QPSK, 16-QAM, 64-QAM, and 256-QAM, respectively, with $\phi$ representing a constellation having one entry of zero power, corresponding to the case when there is no interference.

3 Reference Detectors

3.1 ML Detection

An ML detector exhaustively searches the finite lattice $\mathcal{X}$ to find a symbol vector that minimizes the Euclidean distance metric:

$$d_{\mathrm{ML}} = \min_{\mathbf{x}\in\mathcal{X}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 \tag{3}$$

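The log-likelihood ratio (LLR) of bit $b_{n,k}$, generated by the soft-output ML detector, is calculated as:

$$\lambda^{\mathrm{ML}}_{n,k} = \frac{1}{\sigma_n^2}\left(\min_{\mathbf{x}\in\mathcal{X}^{(0)}_{n,k}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \min_{\mathbf{x}\in\mathcal{X}^{(1)}_{n,k}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2\right) \tag{4}$$

for $n = 1, \ldots, N$ and $k = 1, \ldots, q_n$. The sets $\mathcal{X}^{(0)}_{n,k} = \{\mathbf{x} \in \mathcal{X} : b_{n,k} = 0\}$ and $\mathcal{X}^{(1)}_{n,k} = \{\mathbf{x} \in \mathcal{X} : b_{n,k} = 1\}$ correspond to subsets of symbol vectors in $\mathcal{X}$ having, in the corresponding $k$th bit of the $n$th symbol, a value of 0 and 1, respectively.

Figure 1 MU-MIMO system model. (Diagram labels: Base Station; User of Interest; Data for User of Interest; Interferer.)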
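To make Eqs. 3 and 4 concrete, the following Python sketch enumerates the full lattice of a small 2×2 system and forms the max-log LLRs directly. The Gray-mapped QPSK labelling (`QPSK`) and the helper names `dist2` and `ml_llrs` are illustrative assumptions, not taken from the paper; the loop visits all $\prod_n Q_n$ candidates, which is exactly the cost the reduced-complexity detectors below avoid.

```python
import itertools
import math

# Assumed Gray-mapped, unit-energy QPSK labelling (illustrative, not from the paper):
# bits (b1, b2) -> ((1 - 2*b1) + j(1 - 2*b2)) / sqrt(2)
QPSK = {(b1, b2): complex(1 - 2 * b1, 1 - 2 * b2) / math.sqrt(2)
        for b1 in (0, 1) for b2 in (0, 1)}


def dist2(y, H, x):
    """Squared Euclidean distance ||y - Hx||^2; H is stored as a list of rows."""
    return sum(abs(y[m] - sum(H[m][n] * x[n] for n in range(len(x)))) ** 2
               for m in range(len(y)))


def ml_llrs(y, H, sigma2):
    """Exhaustive soft-output ML (Eqs. 3-4) with every symbol drawn from QPSK.

    Returns (d_ML, llrs), where llrs[n][k] = (min over b_{n,k}=0 - min over b_{n,k}=1) / sigma2.
    """
    N = len(H[0])
    best = {}            # (n, k, bit value) -> smallest distance seen in that subset
    d_ml = math.inf
    for labels in itertools.product(list(QPSK), repeat=N):
        d = dist2(y, H, [QPSK[b] for b in labels])
        d_ml = min(d_ml, d)
        for n, bits in enumerate(labels):
            for k, bit in enumerate(bits):
                key = (n, k, bit)
                best[key] = min(best.get(key, math.inf), d)
    llrs = [[(best[(n, k, 0)] - best[(n, k, 1)]) / sigma2 for k in range(2)]
            for n in range(N)]
    return d_ml, llrs
```

With the sign convention of Eq. 4, a negative LLR favours bit 0 and a positive one favours bit 1.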

3.2 Single-Layer MMSE Detection

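The biased SL-MMSE detector solves for an equalized output $\mathbf{y}_{\mathrm{MMSE}}$:

$$\mathbf{y}_{\mathrm{MMSE}} = \left(\mathbf{H}^*\mathbf{H} + (1/\mathrm{SNR})\,\mathbf{I}_N\right)^{-1}\mathbf{H}^*\mathbf{y} \tag{5}$$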

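and the LLRs can be calculated as:

$$\lambda^{\mathrm{MMSE}}_{n,k} = \frac{1}{\sigma^2_{\mathrm{MMSE}}}\left(\min_{x_n\in\mathcal{M}^{(0)}_{n,k}} \left|y_{\mathrm{MMSE},n} - x_n\right|^2 - \min_{x_n\in\mathcal{M}^{(1)}_{n,k}} \left|y_{\mathrm{MMSE},n} - x_n\right|^2\right) \tag{6}$$

for $n = 1, \ldots, N$ and $k = 1, \ldots, q_n$, where $y_{\mathrm{MMSE},n}$ is the $n$th entry of $\mathbf{y}_{\mathrm{MMSE}}$, and the sets $\mathcal{M}^{(0)}_{n,k} = \{x_n \in \mathcal{M} : b_{n,k} = 0\}$ and $\mathcal{M}^{(1)}_{n,k} = \{x_n \in \mathcal{M} : b_{n,k} = 1\}$ correspond to subsets of symbols in $\mathcal{M}$ having, in the corresponding $k$th bit, a value of 0 and 1, respectively. $\sigma^2_{\mathrm{MMSE}} = \sigma_n^2\,\mathbf{W}(n,n)$ is a scaled variance with $\mathbf{W} = (\mathbf{H}^*\mathbf{H} + (1/\mathrm{SNR})\,\mathbf{I}_N)^{-1}$. Similarly, the ZF detector solves for an equalized output $\mathbf{y}_{\mathrm{ZF}} = (\mathbf{H}^*\mathbf{H})^{-1}\mathbf{H}^*\mathbf{y}$, and the rest of the derivation remains intact. Note that there exists an unbiased soft-output MMSE detection scheme [33] that slightly outperforms this conventional detector. However, as shown later in Section 9, the performance gap is negligible, and thus this conventional scheme serves as a good reference.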
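A minimal sketch of Eqs. 5 and 6 for a 2×2 system follows. It assumes the same illustrative Gray-mapped QPSK labelling as before and uses a closed-form 2×2 inverse for $\mathbf{W}$; the helper names (`herm`, `matmul`, `inv2`, `sl_mmse_llrs`) are hypothetical and chosen only for this example.

```python
import math

# Assumed Gray-mapped, unit-energy QPSK labelling (illustrative only).
QPSK = {(b1, b2): complex(1 - 2 * b1, 1 - 2 * b2) / math.sqrt(2)
        for b1 in (0, 1) for b2 in (0, 1)}


def herm(A):
    """Conjugate transpose (.)^* of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inv2(A):
    """Closed-form inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sl_mmse_llrs(y, H, snr, sigma2):
    """Biased SL-MMSE equalization (Eq. 5) and per-stream max-log LLRs (Eq. 6), N = 2."""
    Hh = herm(H)
    G = matmul(Hh, H)
    W = inv2([[G[i][j] + (1.0 / snr if i == j else 0.0) for j in range(2)]
              for i in range(2)])
    y_mmse = [row[0] for row in matmul(matmul(W, Hh), [[v] for v in y])]
    llrs = []
    for n in range(2):
        s2 = sigma2 * W[n][n].real          # sigma^2_MMSE = sigma_n^2 W(n, n)
        row = []
        for k in range(2):
            d0 = min(abs(y_mmse[n] - s) ** 2 for b, s in QPSK.items() if b[k] == 0)
            d1 = min(abs(y_mmse[n] - s) ** 2 for b, s in QPSK.items() if b[k] == 1)
            row.append((d0 - d1) / s2)
        llrs.append(row)
    return y_mmse, llrs
```

In a call, `sigma2` would be set to $N/\mathrm{SNR}$ following Eq. 2. Each stream is detected on its own after equalization, so the per-stream search is over $|\mathcal{M}|$ points only.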

4 Subspace Detection

4.1 Optimum 2×2 MIMO LORD

Optimal LORD detection [15] for dual-layer MIMO systems ($N = 2$) only requires $Q_1 + Q_2$ distance computations, instead of $Q_1Q_2$. It is based on triangularizing the channel matrix using QL decomposition (QLD) or QRD. The corresponding modified system model can be represented as:

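$$\mathbf{y} - \mathbf{H}\mathbf{x} \;\rightarrow\; \begin{bmatrix}\tilde{y}_1\\ \tilde{y}_2\end{bmatrix} - \begin{bmatrix}a_1 & 0\\ c_1 & b_1\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = \tilde{\mathbf{y}} - \mathbf{L}\mathbf{x} \tag{7}$$

where $\tilde{\mathbf{y}} = \mathbf{Q}^*\mathbf{y}$, $a_1, b_1 \in \mathbb{R}^+$, and $c_1 \in \mathbb{C}$. With QLD [13, 34], $\mathbf{Q}$ is unitary and $\mathbf{L}$ is lower triangular. Consequently, we have: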

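$$\begin{aligned}\min_{\mathbf{x}\in\mathcal{X}} \left\|\tilde{\mathbf{y}} - \mathbf{L}\mathbf{x}\right\|^2 &= \min_{x_1\in\mathcal{X}_1,\; x_2\in\mathcal{X}_2}\left(|\tilde{y}_1 - a_1x_1|^2 + |\tilde{y}_2 - c_1x_1 - b_1x_2|^2\right)\\ &= \min_{x_1\in\mathcal{X}_1}\left(|\tilde{y}_1 - a_1x_1|^2 + |\tilde{y}_2 - c_1x_1 - b_1\hat{x}_2|^2\right)\end{aligned} \tag{8}$$

where $\hat{x}_2$ is obtained by slicing $(\tilde{y}_2 - c_1x_1)/b_1 \in \mathbb{C}$ over the constellation $\mathcal{X}_2$ using the operator $\lfloor\alpha\rceil_{\mathcal{X}_n} \triangleq \arg\min_{x\in\mathcal{X}_n} |\alpha - x|$:

$$\hat{x}_2 = \left\lfloor (\tilde{y}_2 - c_1x_1)/b_1 \right\rceil_{\mathcal{X}_2} \in \mathcal{X}_2 \tag{9}$$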

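Slicing yields the exact inner minimum in Eq. 8 because $b_1 > 0$, so that $|\tilde{y}_2 - c_1x_1 - b_1x_2|^2 = b_1^2\,|(\tilde{y}_2 - c_1x_1)/b_1 - x_2|^2$. Note that this implementation requires only $|\mathcal{X}_1| = Q_1$ distance computations. The LLRs of the bits in the symbol $x_1$ can then be obtained:

$$\lambda^{\mathrm{ML}}_{1,k} = \frac{1}{\sigma_n^2}\left(\min_{x_1\in\mathcal{X}^{(0)}_{1,k}} d(x_1) - \min_{x_1\in\mathcal{X}^{(1)}_{1,k}} d(x_1)\right), \quad k = 1, \ldots, q_1 \tag{10}$$

with $d(x_1) = |\tilde{y}_1 - a_1x_1|^2 + |\tilde{y}_2 - c_1x_1 - b_1\hat{x}_2|^2$. To obtain the LLRs $\lambda^{\mathrm{ML}}_{2,k}$ for the bits in $x_2$, the same operation is repeated in reverse order, where the $x_2$ symbols are exhaustively searched, while the interference over layer 1 is subtracted, followed by simple slicing over $\mathcal{X}_1$.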

Note that to find the hard-decision ML solution only, a single decomposition is needed, for either layer 1 or layer 2. Moreover, for the special case of dual-layer systems, an alternative slicing-based approach exists that also requires only $Q_1 + Q_2$ distance computations, but does not require QRD or QLD [35].
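The following sketch checks the claim behind Eqs. 7–9 numerically: for a 2×2 channel, enumerating only $x_1$ and slicing $x_2$ after a QL-type triangularization reaches the same minimum distance as exhaustive ML. The factors are obtained here by Gram-Schmidt on the columns in the order $(\mathbf{h}_2, \mathbf{h}_1)$, which is an implementation choice assumed for this example; the function names are hypothetical.

```python
import itertools
import math
import random

QPSK = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]


def inner(u, v):
    """u^* v for complex vectors stored as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def slice_to(alpha, const):
    """Slicing operator of Eq. 9: the constellation point closest to alpha."""
    return min(const, key=lambda s: abs(alpha - s))


def ql_2x2(H):
    """Unitary Q = [q1 q2] with Q^* H = [[a1, 0], [c1, b1]] (Eq. 7)."""
    h1 = [H[0][0], H[1][0]]
    h2 = [H[0][1], H[1][1]]
    b1 = math.sqrt(inner(h2, h2).real)
    q2 = [v / b1 for v in h2]
    c1 = inner(q2, h1)                               # q2^* h1
    r = [v - c1 * q for v, q in zip(h1, q2)]         # part of h1 orthogonal to q2
    a1 = math.sqrt(inner(r, r).real)
    q1 = [v / a1 for v in r]                         # q1^* h2 = 0 and q1^* h1 = a1
    return q1, q2, a1, b1, c1


def lord_min_distance(y, H, const):
    """Eq. 8: enumerate x1 only (Q1 candidates) and slice x2."""
    q1, q2, a1, b1, c1 = ql_2x2(H)
    yt1, yt2 = inner(q1, y), inner(q2, y)            # y~ = Q^* y
    best = math.inf
    for x1 in const:
        x2 = slice_to((yt2 - c1 * x1) / b1, const)
        best = min(best, abs(yt1 - a1 * x1) ** 2 + abs(yt2 - c1 * x1 - b1 * x2) ** 2)
    return best


def ml_min_distance(y, H, const):
    """Eq. 3 by brute force over all Q1 * Q2 candidate pairs."""
    return min(sum(abs(y[m] - H[m][0] * x1 - H[m][1] * x2) ** 2 for m in range(2))
               for x1, x2 in itertools.product(const, repeat=2))


def max_gap(trials=50, seed=7):
    """Largest |LORD - ML| distance gap over random channels and received vectors."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        H = [[complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(2)]
             for _ in range(2)]
        y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
        gap = max(gap, abs(lord_min_distance(y, H, QPSK) - ml_min_distance(y, H, QPSK)))
    return gap
```

The gap stays at floating-point level because $\mathbf{Q}$ is unitary, so $\|\tilde{\mathbf{y}} - \mathbf{L}\mathbf{x}\|^2 = \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$ for every candidate.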

4.2 Extensions to Higher-Order Layers

The aforementioned optimal implementation for 2×2 MIMO cannot scale up to $N \ge 3$ without losing optimality. This is because $\mathbf{L}$ would include off-diagonal terms (the red-marked entries in Fig. 2a) that prevent computing the ML solution by enumerating symbols on one layer and finding the minima through slicing individually on all other layers in parallel. In fact, the ML solution requires enumerating symbols on $N - 1$ layers and slicing on the last layer, which results in $O(\prod_n Q_n)$ complexity.

However, the channel matrix can be punctured to zero out undesirable entries, as shown in Fig. 2b for a 4-layer MIMO system [14]. This configuration allows us to enumerate symbols on layer 1, while finding the minimum distances on layers 2 to 4 in parallel, through slicing only on the corresponding layers. Moreover, to compute the LLRs for bits associated with layers 2 to 4, a similar process is repeated on each layer after decomposing the channel matrix as shown in Fig. 2c–e. In this study, we adopt the complementary QRD-based decompositions, whose desirable structures are shown in Fig. 2f–j. In this case, by enumerating symbols on layer 4, the minimum on layers 3 to 1 can be found in parallel through slicing, and a similar process is repeated on other layers.

4.3 Generic WR Decomposition

The first step in subspace detection is channel matrix decomposition. While LORD only requires QRD, which results in an unpunctured upper triangular matrix $\mathbf{R}$ as shown in Fig. 2f, a more powerful WR decomposition (WRD) scheme is required to puncture the red-marked entries above the diagonal. We consider $N = M$ and aim at transforming $\mathbf{H}$ into a punctured upper triangular matrix (UTM) $\mathbf{R}_p = [u_{ij}] \in \mathbb{C}^{N\times N}$ with $u_{ii} \in \mathbb{R}^+$, as shown in Fig. 2g, through a matrix $\mathbf{W} = [\mathbf{w}_1\, \mathbf{w}_2 \ldots \mathbf{w}_N] \in \mathbb{C}^{N\times N}$, such that $\mathbf{W}^*\mathbf{H} = \mathbf{R}_p$.

We assume $\mathbf{H} = [\mathbf{h}_1\, \mathbf{h}_2 \ldots \mathbf{h}_N]$ to have full column rank. Setting $\mathbf{W}^* = (\mathbf{H}^*\mathbf{H})^{-1}\mathbf{H}^*$ to be the left Moore-Penrose pseudo-inverse of $\mathbf{H}$ results in $\mathbf{R} = \mathbf{I}_N$, and choosing $\mathbf{W}$ to be an orthonormal basis of the column space of $\mathbf{H}$ transforms $\mathbf{H}$ into an unpunctured UTM, with $\mathbf{W}$ being unitary (QRD). In general, if $\mathbf{R} = \mathbf{R}_p$ is punctured, then $\mathbf{W}$ is non-unitary, and the noise is colored. However, if we impose the condition on the column vectors of $\mathbf{W}$ to have unit length, i.e., $\mathbf{w}_n^*\mathbf{w}_n = 1$ for $n = 1, \ldots, N$, the transformed noise vector is guaranteed to maintain an unaltered variance at the layer of interest.

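Let $\mathbf{P} = \mathbf{H}(\mathbf{H}^*\mathbf{H})^{-1}\mathbf{H}^*$ be the orthogonal projection onto the column space of $\mathbf{H}$, and $\mathbf{P}^{\perp} = \mathbf{I}_N - \mathbf{H}(\mathbf{H}^*\mathbf{H})^{-1}\mathbf{H}^*$ be the orthogonal projection onto the left nullspace of $\mathbf{H}$. Let $\mathbf{H}_{\mathcal{I}}$ be the submatrix formed by the columns of $\mathbf{H}$ whose index $n \in \mathcal{I}$ (if $\mathcal{I} = \{1, 3\}$, then $\mathbf{H}_{\mathcal{I}} = [\mathbf{h}_1\, \mathbf{h}_3]$). Denote by $\mathcal{I}_n$ the column index set of the entries in the $n$th row of the decomposed channel matrix to be zeroed out, and define $\bar{\mathbf{w}}_n = \mathbf{P}^{\perp}_{\mathcal{I}_n}\mathbf{h}_n$, where:

$$\mathbf{P}^{\perp}_{\mathcal{I}_n} = \mathbf{I}_N - \mathbf{H}_{\mathcal{I}_n}\left(\mathbf{H}^*_{\mathcal{I}_n}\mathbf{H}_{\mathcal{I}_n}\right)^{-1}\mathbf{H}^*_{\mathcal{I}_n} \tag{11}$$

and $\mathbf{H}_{\mathcal{I}_n} = \{\mathbf{h}_m \mid m \in \mathcal{I}_n\}$. The normalized vector is derived as:

$$\mathbf{w}_n = \bar{\mathbf{w}}_n / \|\bar{\mathbf{w}}_n\|, \qquad \|\bar{\mathbf{w}}_n\| = \sqrt{\mathbf{h}_n^*\mathbf{P}^{\perp}_{\mathcal{I}_n}\mathbf{h}_n} \tag{12}$$

Figure 2 4×4 channel matrix structures: (a, f) LORD; (b–e, g–j) SD.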
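A sketch of Eqs. 11 and 12 follows. Rather than forming $(\mathbf{H}^*_{\mathcal{I}_n}\mathbf{H}_{\mathcal{I}_n})^{-1}$ explicitly, it builds an orthonormal basis of the columns indexed by $\mathcal{I}_n$ with Gram-Schmidt and subtracts the projection onto that basis; for a full-column-rank $\mathbf{H}_{\mathcal{I}_n}$ this is the same projector $\mathbf{P}^{\perp}_{\mathcal{I}_n}$. Indices are 0-based, and all helper names are illustrative assumptions.

```python
import math
import random


def inner(u, v):
    """u^* v for complex vectors stored as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def project_out(h, cols):
    """Return P_perp h, where P_perp projects onto the orthogonal complement of span(cols)."""
    basis = []
    for c in cols:                                   # Gram-Schmidt basis of span(H_I)
        v = list(c)
        for b in basis:
            p = inner(b, v)
            v = [vi - p * bi for vi, bi in zip(v, b)]
        nv = math.sqrt(inner(v, v).real)
        basis.append([vi / nv for vi in v])
    out = list(h)
    for b in basis:                                  # remove the component in span(H_I)
        p = inner(b, out)
        out = [oi - p * bi for oi, bi in zip(out, b)]
    return out


def wr_column(cols, n, I_n):
    """w_n = P_perp_{I_n} h_n / ||P_perp_{I_n} h_n|| (Eqs. 11-12), 0-based indices."""
    wbar = project_out(cols[n], [cols[m] for m in I_n])
    norm = math.sqrt(inner(wbar, wbar).real)         # equals sqrt(h_n^* P_perp h_n)
    return [v / norm for v in wbar], norm


def random_columns(N, seed):
    """N i.i.d. CN(0, 1) columns of length N (an illustrative channel)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)]
            for _ in range(N)]
```

For the Fig. 2g pattern with $N = 4$, rows 1, 2, and 3 use $\mathcal{I}_n = \{2,3\}$, $\{1,3\}$, and $\{1,2\}$ (1-based). The resulting diagonal entry $\mathbf{w}_n^*\mathbf{h}_n$ equals $\|\bar{\mathbf{w}}_n\|$, which is real and positive as required.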

4.4 Detection Routine

To generate soft-output LLRs for all layers, the $N$ streams are decoupled, one at a time in $N$ steps, by cyclically shifting the columns of $\mathbf{H}$ and generating the punctured UTMs, as shown in Fig. 2g–j. We call this reference algorithm cyclical subspace detection (CYSD). Each permuted $\mathbf{H}$ at step $t$ is WR-decomposed into $\mathbf{W}^{(t)}$ and $\mathbf{R}^{(t)}_p$. For simplicity, we assume $\mathcal{X}_n = \mathcal{M}$ for all $n$. We first partition $\mathbf{y}^{(t)}$, $\mathbf{R}^{(t)}_p$, and $\mathbf{x}$ as:

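$$\mathbf{y}^{(t)} = \begin{bmatrix}\mathbf{y}^{(t)}_1\\ y^{(t)}_2\end{bmatrix}, \quad \mathbf{R}^{(t)}_p = \begin{bmatrix}\mathbf{A}^{(t)} & \mathbf{b}^{(t)}\\ \mathbf{0} & c^{(t)}\end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix}\mathbf{x}_1\\ x_2\end{bmatrix} \tag{13}$$

where $\mathbf{y}^{(t)}_1 \in \mathbb{C}^{(N-1)\times 1}$, $y^{(t)}_2 \in \mathbb{C}^{1\times 1}$, $\mathbf{A}^{(t)} \in \mathbb{R}^{(N-1)\times(N-1)}$, $\mathbf{b}^{(t)} \in \mathbb{C}^{(N-1)\times 1}$, $c^{(t)} \in \mathbb{R}^{1\times 1}$, $\mathbf{x}_1 \in \mathcal{M}^{N-1}$, and $x_2 \in \mathcal{M}$. Then the vector with minimum distance for a structure $t$ is: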

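$$\begin{aligned}\mathbf{x}^{\mathrm{WR}(t)} &= \arg\min_{\mathbf{x}\in\mathcal{X}} \left\|\mathbf{y}^{(t)} - \mathbf{R}^{(t)}_p\mathbf{x}\right\|^2\\ &= \arg\min_{x_2\in\mathcal{M}}\left(\left|y^{(t)}_2 - c^{(t)}x_2\right|^2 + \left\|\mathbf{y}^{(t)}_1 - \mathbf{A}^{(t)}\hat{\mathbf{x}}_1 - \mathbf{b}^{(t)}x_2\right\|^2\right)\end{aligned} \tag{14}$$

where $\hat{\mathbf{x}}_1 = \left\lfloor (\mathbf{A}^{(t)})^{-1}(\mathbf{y}^{(t)}_1 - \mathbf{b}^{(t)}x_2) \right\rceil_{\mathcal{M}^{N-1}}$ is the sliced output. Since $\mathbf{A}^{(t)}$ is a diagonal matrix, the slicing operation is applied to the individual elements of the vector $(\mathbf{A}^{(t)})^{-1}(\mathbf{y}^{(t)}_1 - \mathbf{b}^{(t)}x_2)$ over the constellation $\mathcal{M}$.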

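In order to generate soft outputs, we compute the two minimizing vectors defined as:

$$\mathbf{u}^{\mathrm{WR}}_{n,k,t} = \arg\min_{\mathbf{x}\in\mathcal{X}^{(0)}_{n,k}} \left\|\mathbf{y}^{(t)} - \mathbf{R}^{(t)}_p\mathbf{x}\right\|^2 \tag{15}$$

$$\mathbf{v}^{\mathrm{WR}}_{n,k,t} = \arg\min_{\mathbf{x}\in\mathcal{X}^{(1)}_{n,k}} \left\|\mathbf{y}^{(t)} - \mathbf{R}^{(t)}_p\mathbf{x}\right\|^2 \tag{16}$$

which can be expanded as in Eq. 14. Then, the LLRs are calculated as:

$$\lambda^{\mathrm{SD}}_{n,k,t} = \frac{1}{\sigma_n^2}\left(\left\|\mathbf{y}^{(t)} - \mathbf{R}^{(t)}_p\mathbf{u}^{\mathrm{WR}}_{n,k,t}\right\|^2 - \left\|\mathbf{y}^{(t)} - \mathbf{R}^{(t)}_p\mathbf{v}^{\mathrm{WR}}_{n,k,t}\right\|^2\right) \tag{17}$$

for $n = 1, \ldots, N$, $k = 1, \ldots, \log_2|\mathcal{M}|$, and $t = 1, \ldots, N$. Similarly, we can define the LLRs for LORD, $\lambda^{\mathrm{LORD}}_{n,k,t}$, which are derived using the unpunctured UTM $\mathbf{R}$ but cannot make use of Eq. 14.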
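The partial-metric structure of Eqs. 13–17 can be sketched for one structure $t$ as follows. To isolate the detection step, the example draws a synthetic punctured $\mathbf{R}^{(t)}_p$ directly (the diagonal of $\mathbf{A}^{(t)}$ as `a`, the column $\mathbf{b}^{(t)}$ as `b`, the corner $c^{(t)}$ as `c`) instead of decomposing a channel, assumes Gray-mapped QPSK, and compares the root-layer LLRs with a brute-force search over the whole lattice. All names are hypothetical.

```python
import itertools
import math
import random

# Assumed Gray-mapped, unit-energy QPSK labelling (illustrative only).
QPSK = {(b1, b2): complex(1 - 2 * b1, 1 - 2 * b2) / math.sqrt(2)
        for b1 in (0, 1) for b2 in (0, 1)}


def slice_to(alpha):
    """Nearest QPSK point to alpha."""
    return min(QPSK.values(), key=lambda s: abs(alpha - s))


def root_metrics(y, a, b, c):
    """Eq. 14 metric for each candidate root symbol x2, keyed by its bit label."""
    out = {}
    for bits, x2 in QPSK.items():
        d = abs(y[-1] - c * x2) ** 2
        for yi, ai, bi in zip(y[:-1], a, b):
            x1i = slice_to((yi - bi * x2) / ai)      # element-wise slicing (A diagonal)
            d += abs(yi - ai * x1i - bi * x2) ** 2
        out[bits] = d
    return out


def sd_root_llrs(y, a, b, c, sigma2):
    """Eqs. 15-17 for the root-layer bits: |M| candidates plus slicing."""
    d = root_metrics(y, a, b, c)
    return [(min(v for bits, v in d.items() if bits[k] == 0)
             - min(v for bits, v in d.items() if bits[k] == 1)) / sigma2
            for k in range(2)]


def brute_force_root_llrs(y, a, b, c, sigma2):
    """Same LLRs by exhaustive search over all |M|^N vectors x."""
    N = len(y)
    best = {}
    for labels in itertools.product(list(QPSK), repeat=N):
        x = [QPSK[bits] for bits in labels]
        d = sum(abs(y[i] - a[i] * x[i] - b[i] * x[-1]) ** 2 for i in range(N - 1))
        d += abs(y[-1] - c * x[-1]) ** 2
        for k, bit in enumerate(labels[-1]):
            best[(k, bit)] = min(best.get((k, bit), math.inf), d)
    return [(best[(k, 0)] - best[(k, 1)]) / sigma2 for k in range(2)]


def random_structure(N, seed):
    """Synthetic y and punctured-UTM entries for an N-layer structure (illustrative)."""
    rng = random.Random(seed)

    def cplx():
        return complex(rng.gauss(0, 1), rng.gauss(0, 1))

    a = [rng.uniform(0.5, 2.0) for _ in range(N - 1)]
    b = [cplx() for _ in range(N - 1)]
    c = rng.uniform(0.5, 2.0)
    y = [cplx() for _ in range(N)]
    return y, a, b, c
```

The agreement is exact (up to rounding) because, with a diagonal $\mathbf{A}^{(t)}$, the minimization over $\mathbf{x}_1$ separates into independent per-element slicing problems once $x_2$ is fixed; the fast path costs $|\mathcal{M}|$ candidates instead of $|\mathcal{M}|^N$.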

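Noting that the non-unitary matrix $\mathbf{W}$ in WRD, unlike the unitary $\mathbf{Q}$ in QRD, modifies the distance metrics across layers, we have:

$$\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 = \left\|\mathbf{Q}^*(\mathbf{y} - \mathbf{H}\mathbf{x})\right\|^2 \neq \left\|\mathbf{W}^*(\mathbf{y} - \mathbf{H}\mathbf{x})\right\|^2 \tag{18}$$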

Therefore, with LORD, unlike SD, tighter LLRs can be computed by tracking global minimum distances rather than just minimizing over the per-stream distances. The resultant detector is called LORD-best (LORDB), and is obtained by selecting the minimum over all $t$:

$$\lambda^{\mathrm{LORDB}}_{n,k} \approx \min_{t=1,\ldots,T}\left(\lambda^{\mathrm{LORD}}_{n,k,t}\right) \tag{19}$$

However, as shown later, SD outperforms both LORD and LORDB.

4.5 Performance Analysis

In order to understand the behaviour of SD, we consider the special case of 4×4 MIMO with layer $N$ being the root layer of interest, where we have:

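$$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14}\\ 0 & r_{22} & r_{23} & r_{24}\\ 0 & 0 & r_{33} & r_{34}\\ 0 & 0 & 0 & r_{44}\end{bmatrix}, \quad \mathbf{R}_p = \begin{bmatrix} r^p_{11} & 0 & 0 & r^p_{14}\\ 0 & r^p_{22} & 0 & r^p_{24}\\ 0 & 0 & r^p_{33} & r^p_{34}\\ 0 & 0 & 0 & r^p_{44}\end{bmatrix} \tag{20}$$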

Since the column of interest in $\mathbf{W}$, in this case column $N$, remains orthogonal to all remaining columns, we have the following:

$$\mathbf{W}^*\mathbf{W} = \begin{bmatrix} 1 & e_{12} & e_{13} & 0\\ e_{21} & 1 & e_{23} & 0\\ e_{31} & e_{32} & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix} \tag{21}$$

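Thus, taking the expectation of $\mathbf{W}^*\mathbf{n}\mathbf{n}^*\mathbf{W}$ over $\mathbf{n}$, we have:

$$\mathrm{E}_{\mathbf{n}}\left[\mathbf{W}^*\mathbf{n}\mathbf{n}^*\mathbf{W}\right] = \sigma_n^2\,\mathbf{W}^*\mathbf{W} = \begin{bmatrix} \sigma_n^2 & \sigma_n^2e_{12} & \sigma_n^2e_{13} & 0\\ \sigma_n^2e_{21} & \sigma_n^2 & \sigma_n^2e_{23} & 0\\ \sigma_n^2e_{31} & \sigma_n^2e_{32} & \sigma_n^2 & 0\\ 0 & 0 & 0 & \sigma_n^2\end{bmatrix} \tag{22}$$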

Hence, although the resultant noise after puncturing is colored, the normalization in Eq. 12 preserves the noise variance at the layer of interest. However, the statistical properties of the elements of $\mathbf{R}_p$ get distorted under puncturing. The non-zero elements of $\mathbf{R}$ have been proved to be independent random variables with the following distributions [36, 37]:

– The off-diagonal elements are circular complex Gaussian with zero mean and unit variance.

– The square of the $n$th diagonal element is chi-squared with $2(N-n+1)$ degrees of freedom, and its probability density function is given by:

$$f\left(g = r_{nn}^2\right) = \frac{1}{(N-n)!}\,g^{N-n}e^{-g}, \quad g \ge 0 \tag{23}$$

Figure 3 compares the distributions of the squared magnitude of the elements of $\mathbf{R}$ and $\mathbf{R}_p$. While off-diagonal elements remain intact, the distributions of the diagonal elements at upper layers, $r^p_{11}$ and $r^p_{22}$, lose degrees of freedom (from 8 and 6, respectively, to 4 degrees of freedom), which results in performance degradation. Such degradation can be seen in hard-output subspace detection, where only one channel decomposition is required to obtain the entire hard output vector. However, in the proposed soft-output detector, for each partition, the LLRs are computed for the bits of the symbol at the root layer only. This symbol of interest is immune to puncturing at higher layers, and hence the lack of error propagation makes subspace detection superior to LORD, an advantage that is further emphasized by the coding gain.
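A quick Monte Carlo sketch can illustrate Eq. 23 and the loss of degrees of freedom shown in Fig. 3. It relies on the fact that the squared $n$th diagonal entry equals the energy of $\mathbf{h}_n$ left after projecting out the columns that row $n$ must be orthogonal to: the preceding columns for QRD, and, for the punctured structure of Fig. 2g, every column except $\mathbf{h}_n$ and $\mathbf{h}_N$. The density in Eq. 23 has mean $N-n+1$, so for $N = 4$ the unpunctured means should be close to 4, 3, 2, 1, and the punctured ones close to 2, 2, 2, 1 (4 degrees of freedom for each of the first three rows). The seed, the trial count, and the helper names are arbitrary choices for this example.

```python
import math
import random


def inner(u, v):
    """u^* v for complex vectors stored as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def residual_energy(h, cols):
    """||P_perp h||^2, with P_perp projecting onto the complement of span(cols)."""
    basis = []
    for c in cols:                                   # Gram-Schmidt basis of span(cols)
        v = list(c)
        for b in basis:
            p = inner(b, v)
            v = [vi - p * bi for vi, bi in zip(v, b)]
        nv = math.sqrt(inner(v, v).real)
        basis.append([vi / nv for vi in v])
    out = list(h)
    for b in basis:
        p = inner(b, out)
        out = [oi - p * bi for oi, bi in zip(out, b)]
    return inner(out, out).real


def mean_diagonal_energies(N=4, trials=3000, seed=0):
    """Monte Carlo means of r_nn^2 (QRD) and (r^p_nn)^2 (Fig. 2g puncturing)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)                               # CN(0, 1): variance 1/2 per real part
    acc_r, acc_rp = [0.0] * N, [0.0] * N
    for _ in range(trials):
        cols = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)]
                for _ in range(N)]
        for n in range(N):
            acc_r[n] += residual_energy(cols[n], cols[:n])
            if n < N - 1:                            # orthogonal to all but h_n and h_N
                others = [cols[m] for m in range(N - 1) if m != n]
            else:                                    # w_N = q_N: orthogonal to h_1..h_{N-1}
                others = cols[:N - 1]
            acc_rp[n] += residual_energy(cols[n], others)
    return [v / trials for v in acc_r], [v / trials for v in acc_rp]
```

Only the averages are compared here; reproducing the full curves of Fig. 3 would require histogramming the same samples.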

4.6 Complexity Analysis

In addition to the performance gain achieved by SD, an important motive behind puncturing is reducing complexity. We analyze the complexity in terms of floating-point operations (flops), counted as real multiplications (RMLs) and real additions (RADs). Real division and square-root operations are assumed equivalent to an RML. Also, a complex multiplication requires 4 RMLs and 2 RADs, while a complex addition requires 2 RADs.

As shown in [13, 14], regular QRD requires $\theta_1$ flops:

$$\theta_1 = (4N^3 - N^2 - N)\,\mathrm{RAD} + (4N^3 + 3N^2)\,\mathrm{RML} \tag{24}$$

and puncturing alone requires $\theta_2$ flops:

$$\theta_2 = \tfrac{2}{3}\left(8N^3 - 15N^2 + 4N - 12\right)\mathrm{RAD} + \left(\tfrac{16}{3}N^3 - 7N^2 + \tfrac{8}{3}N - 20\right)\mathrm{RML} \tag{25}$$

Figure 3 Statistical distributions of elements of R and Rp. (Two panels of P(g) versus g = r²: the diagonal elements r11–r44 and rp11–rp44, and the off-diagonal elements r14, r24, r34 and rp14, rp24, rp34.)

Within the detection routine, every time the product $\mathbf{R}_p\mathbf{x}$ is computed, $(N-2)(N-1)/2$ complex multiplications are saved, which amounts to $\theta_3$ flops:

$$\theta_3 = (N^2 - 3N + 2)\,\mathrm{RAD} + (2N^2 - 6N + 4)\,\mathrm{RML} \tag{26}$$

Moreover, channel matrix decompositions are only performed in the preprocessing stage of detection, and with slow fading channels, the decomposition outputs can be retained for a very large number of frames $J$. Hence, SD saves $J \times N \times |\mathcal{M}| \times \theta_3$ flops compared to LORD. Therefore, the studied WRD-based approaches are computationally efficient, especially with slow fading channels and high order modulation types.
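The flop counts of Eqs. 24–26 and the resulting saving $J \times N \times |\mathcal{M}| \times \theta_3$ can be tabulated with a few lines of Python. Exact fractions are used for Eq. 25, whose coefficients are thirds; `J` and the constellation size are free parameters of these illustrative helpers, not values taken from the paper.

```python
from fractions import Fraction


def qrd_flops(N):
    """Eq. 24: (RAD, RML) for a regular QRD of an N x N channel."""
    return 4 * N**3 - N**2 - N, 4 * N**3 + 3 * N**2


def puncturing_flops(N):
    """Eq. 25: (RAD, RML) for puncturing alone; exact fractions avoid rounding."""
    rad = Fraction(2, 3) * (8 * N**3 - 15 * N**2 + 4 * N - 12)
    rml = Fraction(16, 3) * N**3 - 7 * N**2 + Fraction(8, 3) * N - 20
    return rad, rml


def saved_per_product(N):
    """Eq. 26: (RAD, RML) saved each time R_p x replaces R x."""
    cmults = (N - 2) * (N - 1) // 2          # complex multiplications avoided
    return 2 * cmults, 4 * cmults            # each costs 2 RADs and 4 RMLs


def sd_savings(J, N, M_size):
    """Total flops SD saves over LORD: J x N x |M| x theta_3."""
    rad, rml = saved_per_product(N)
    return J * N * M_size * (rad + rml)
```

For $N = 4$ this gives $\theta_3$ = 6 RADs + 12 RMLs = 18 flops per product, so, for instance, $J = 1000$ frames with 64-QAM save 4,608,000 flops. For $N = 2$ the puncturing cost of Eq. 25 evaluates to zero, consistent with a 2×2 triangular matrix having nothing to puncture.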

Note that, for simplicity, these complexity computations were derived for symmetric MIMO systems, as assumed in the system model of this paper. However, if the system had more receive antennas ($M > N$), the "thin" form of the QR decomposition for tall matrices would be used, and the remaining steps would be modified accordingly.

5 Proposed Detector

5.1 Layer Ordering

When cyclically shifting the columns of $\mathbf{H}$, the number of WRD operations required is equal to the number of layers to be processed, which is a significant computational burden that forms a bottleneck in high order MIMO. An alternative minimal swapping operation can reduce this computational overhead (Section 8.1) by making successive decompositions share a large amount of redundant computations that can be saved. For example, in the case of 4×4 MIMO, if we want to compute the LLRs of the bits on layer 2, we can swap $\mathbf{h}_2$ with $\mathbf{h}_4$ and use the matrix decomposition of Fig. 2g. We represent this swapping operation by a permutation:

$$\pi^{(t)}(i) = \begin{cases} N & \text{if } i = t\\ t & \text{if } i = N\\ i & \text{otherwise}\end{cases} \tag{27}$$

for $t = 1, \ldots, N$ and $i = 1, \ldots, N$. The remainder of the derivation, Eqs. 13 to 17, remains intact. We call this algorithm single-permutation subspace detection (SPSD).

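Note that the performance of subspace detection is not affected by layer ordering. For example, if layer 2 is the layer of interest and column 2 is to be in the rightmost location before decomposition, the permutations $[\mathbf{h}_1\mathbf{h}_3\mathbf{h}_4\mathbf{h}_2]$, $[\mathbf{h}_1\mathbf{h}_4\mathbf{h}_3\mathbf{h}_2]$, $[\mathbf{h}_3\mathbf{h}_1\mathbf{h}_4\mathbf{h}_2]$, $[\mathbf{h}_3\mathbf{h}_4\mathbf{h}_1\mathbf{h}_2]$, $[\mathbf{h}_4\mathbf{h}_1\mathbf{h}_3\mathbf{h}_2]$, and $[\mathbf{h}_4\mathbf{h}_3\mathbf{h}_1\mathbf{h}_2]$ all result in exactly the same performance. This is because, with SD, the upper layers are independent of the layer of interest.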

Another approach, which we will later argue to be of practical interest, is what we call pairwise subspace detection (PWSD). This approach consists of lumping the channel columns in pairs (assuming $N$ even) and handling one pair of layers at a time. First, the pair of interest is swapped with the rightmost two columns. Then, the columns of the pair are swapped so that each can be at position $i = N$. For example, in the case of 4×4 MIMO, the 4 permuted channel matrices can be $\mathbf{H}_1 = [\mathbf{h}_3\mathbf{h}_4\mathbf{h}_1\mathbf{h}_2]$, $\mathbf{H}_2 = [\mathbf{h}_3\mathbf{h}_4\mathbf{h}_2\mathbf{h}_1]$, $\mathbf{H}_3 = [\mathbf{h}_1\mathbf{h}_2\mathbf{h}_3\mathbf{h}_4]$, and $\mathbf{H}_4 = [\mathbf{h}_1\mathbf{h}_2\mathbf{h}_4\mathbf{h}_3]$. After each of the $N$ permutations, the permuted channel matrix is decomposed, and the LLRs for the corresponding layer are computed (Eqs. 13 to 17).
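Both orderings can be generated mechanically. The sketch below returns 1-based column orders: `spsd_permutation` follows Eq. 27, and `pwsd_permutations` is one hypothetical way to produce the PWSD orders (pair of interest moved into the two rightmost positions, then both orientations of the pair); for $N = 4$ it reproduces $\mathbf{H}_1$ to $\mathbf{H}_4$ as listed above.

```python
def spsd_permutation(t, N):
    """Eq. 27 (1-based): column order after swapping column t with column N."""
    return [N if i == t else t if i == N else i for i in range(1, N + 1)]


def pwsd_permutations(N):
    """Hypothetical PWSD ordering helper (N even), returning 1-based column orders.

    For each pair (2p-1, 2p), the pair is moved into the two rightmost positions, and
    both orientations are emitted so that each of its layers ends up at position N.
    """
    if N % 2:
        raise ValueError("PWSD assumes an even number of layers")
    pairs = [(2 * p + 1, 2 * p + 2) for p in range(N // 2)]
    orders = []
    for p, (u, v) in enumerate(pairs):
        blocks = list(pairs)
        blocks[p], blocks[-1] = blocks[-1], blocks[p]
        head = [col for pair in blocks[:-1] for col in pair]
        orders.append(head + [u, v])         # layer v sits at position N
        orders.append(head + [v, u])         # layer u sits at position N
    return orders
```

Because $\pi^{(t)}$ is a single swap, it is its own inverse, so the same list serves both as the column order and as the map back to the original layer indices. Consecutive PWSD orders differ only in the last two columns, which is what a permutation-robust decomposition can exploit.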

5.2 Permutation-Robust Reduced-Complexity QRD

The brute-force approach for computing $\mathbf{W}$ involves extensive matrix inversion, which is computationally expensive and prone to numerical error when implemented with finite precision. However, there exists an alternative efficient scheme [14] to determine $\mathbf{W}$, which consists of QRD followed by elementary matrix operations.

The QRD decomposes $\mathbf{H}$ into a unitary matrix $\mathbf{Q} = [\mathbf{q}_1\, \mathbf{q}_2 \ldots \mathbf{q}_N]$ and an upper triangular matrix $\mathbf{R} = [r_{ij}]_{N\times N}$ with real and positive diagonal elements ($\mathbf{H} = \mathbf{Q}\mathbf{R}$). This can be computed using Givens rotations (GR), Gram-Schmidt (GS) orthogonalization, or Householder transformations (HT) [38]. While the hardware implementation of HT is very complex, GR reduces the hardware area, but at the expense of longer clock latency. The classical GS algorithm allows a memory-efficient implementation due to its inherent parallelism, resulting in better regularity in data flow and potentially better hardware efficiency; however, due to fixed-precision computation and round-off errors, it cannot guarantee the orthogonality of $\mathbf{Q}$. This limitation was overcome by the numerically superior modified Gram-Schmidt (MGS) algorithm.

The MGS-based QRD of $\mathbf{H}$ consists of two main parts. In the first part, the diagonal elements of $\mathbf{R}$ and the columns of $\mathbf{Q}$ are computed. In the second part, the non-diagonal elements of $\mathbf{R}$ are computed and the columns of $\mathbf{H}$ are updated. Considering a 4×4 complex matrix, in the first part of the first iteration, the norm of $\mathbf{h}_1$ is assigned to $r_{11}$, and $\mathbf{q}_1$ is calculated as $\mathbf{q}_1 = \mathbf{h}_1/r_{11}$. Then, in the second part, $r_{12}$, $r_{13}$, and $r_{14}$ are calculated using $\mathbf{q}_1$, $\mathbf{h}_2$, $\mathbf{h}_3$, and $\mathbf{h}_4$ as follows:

$$r_{1j} = \mathbf{q}_1^*\mathbf{h}_j, \quad 2 \le j \le 4 \tag{28}$$

and $\mathbf{H}$ is updated by setting its first column to zero and subtracting from each of the other columns its projection onto $\mathbf{q}_1$, i.e.:

$$\mathbf{h}_j = \mathbf{h}_j - \mathbf{q}_1r_{1j}, \quad 2 \le j \le 4 \tag{29}$$

This procedure is repeated with one less column at every new iteration.

Moreover, since in our proposed detection algorithms the $\mathbf{H}^{(t)}$ matrices are only one swap operation away from each other, further simplifications can be introduced. In fact, when computing the QRD of a matrix that is derived, by some column permutations, from another matrix of known decomposition, computational savings can be made, since part of the decomposition result remains unaltered under specific permutations. For example, assume, as shown in Fig. 4, that columns 3 and 4 in $\mathbf{H}$ (in blue) were permuted. The first two columns of $\mathbf{Q}$ and $\mathbf{R}$ (in red) depend only on the first two columns of $\mathbf{H}$, and hence there is no need to recompute them. This is called permutation-robust QRD (PR-QRD).
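The MGS steps of Eqs. 28–29 and the permutation-robust property can be checked together. In the sketch below (pure Python, columns stored as lists, names hypothetical), `mgs_qr` returns $\mathbf{Q}$ as a list of columns and $\mathbf{R}$ as a list of rows. Decomposing $\mathbf{H}$ and the matrix with columns 3 and 4 swapped yields bit-identical first two columns of $\mathbf{Q}$ and $\mathbf{R}$, because MGS never reads $\mathbf{h}_3$ or $\mathbf{h}_4$ while computing them.

```python
import math
import random


def inner(u, v):
    """u^* v for complex vectors stored as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def mgs_qr(cols):
    """Modified Gram-Schmidt QRD: Q as a list of columns, R as a list of rows."""
    N = len(cols)
    h = [list(c) for c in cols]                      # working copy of the columns
    Q = []
    R = [[0j] * N for _ in range(N)]
    for i in range(N):
        R[i][i] = math.sqrt(inner(h[i], h[i]).real)  # r_ii = ||h_i||, real and positive
        Q.append([v / R[i][i] for v in h[i]])
        for j in range(i + 1, N):
            R[i][j] = inner(Q[i], h[j])              # Eq. 28: r_ij = q_i^* h_j
            h[j] = [hv - qv * R[i][j] for hv, qv in zip(h[j], Q[i])]   # Eq. 29
    return Q, R


def random_columns(N, seed):
    """N i.i.d. CN(0, 1) columns of length N (an illustrative channel)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)]
            for _ in range(N)]
```

In a PR-QRD implementation, the reused part would simply be copied, and only the iterations for the permuted columns re-run; the entries of rows 1 and 2 in the swapped columns are merely exchanged.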

5.3 Matrix Puncturing

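Assume that $\mathbf{H}$ is QR-decomposed and we have $\mathbf{Q}^*\mathbf{H} = \mathbf{R}$. Obviously, $\mathbf{q}_N^*\mathbf{q}_N = 1$ and $\mathbf{q}_N^*\mathbf{h}_m = 0$ for all $m = 1, \ldots, N-1$; hence, $\mathbf{w}_N = \mathbf{q}_N$. Now consider row $1 \le n < N$ of $\mathbf{R}$, and assume the $m$th entry $r_{nm}$, $m > n$, is to be nulled. We have $\mathbf{q}_n^*\mathbf{h}_m = r_{nm} \in \mathbb{C}$ and $\mathbf{q}_m^*\mathbf{h}_m = r_{mm} \in \mathbb{R}^+$, from which it follows that $\left(\mathbf{q}_n^* - \mathbf{q}_m^*\,\frac{r_{nm}}{r_{mm}}\right)\mathbf{h}_m = 0$. Therefore, the equations:

$$\mathbf{q}_n = \mathbf{q}_n - \mathbf{q}_m r^*_{nm}/r_{mm} \tag{30}$$

$$r_{nj} = r_{nj} - r_{mj}r_{nm}/r_{mm}, \quad \text{for } j = m, \ldots, N \tag{31}$$

puncture the required entry and update $\mathbf{Q}$ accordingly. These operations are repeated for all other entries $m > n$ to be punctured in row $n$. Finally, $\mathbf{q}_n$ is normalized to have unit length, and the non-zero entries in row $n$ of $\mathbf{R}$ are updated:

$$r_{nj} = r_{nj}/\|\mathbf{q}_n\|, \quad \text{for } j = n, \ldots, N \tag{32}$$

$$\mathbf{q}_n = \mathbf{q}_n/\|\mathbf{q}_n\| \tag{33}$$

The operations in Eqs. 30–31, followed by the normalization steps in Eqs. 32–33, are repeated for all rows $n$ where puncturing is required. The resulting $\mathbf{Q}$ is $\mathbf{W}$, and $\mathbf{R}$ is the desired punctured UTM $\mathbf{R}_p$.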

Figure 4 QRD savings under column permutations.


Unlike QRD, there is no permutation-robust implementation for puncturing: the punctured elements are in the upper rows, which affects the leftmost columns of $\mathbf{Q}$. We still call the overall decomposition a permutation-robust WRD (PR-WRD).
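Eqs. 30–33 translate into a short routine on top of an MGS QRD. The sketch below (hypothetical names, 0-based indices) punctures rows $1, \ldots, N-2$ into the Fig. 2g pattern, processing the target entries of each row from left to right, and returns $\mathbf{W}$ as columns and $\mathbf{R}_p$ as rows. The check confirms $\mathbf{W}^*\mathbf{H} = \mathbf{R}_p$, the zero pattern, unit-norm columns of $\mathbf{W}$, and a real positive diagonal.

```python
import math
import random


def inner(u, v):
    """u^* v for complex vectors stored as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def mgs_qr(cols):
    """MGS QRD (Eqs. 28-29): Q as a list of columns, R as a list of rows."""
    N = len(cols)
    h = [list(c) for c in cols]
    Q, R = [], [[0j] * N for _ in range(N)]
    for i in range(N):
        R[i][i] = math.sqrt(inner(h[i], h[i]).real)
        Q.append([v / R[i][i] for v in h[i]])
        for j in range(i + 1, N):
            R[i][j] = inner(Q[i], h[j])
            h[j] = [hv - qv * R[i][j] for hv, qv in zip(h[j], Q[i])]
    return Q, R


def puncture(cols):
    """QRD followed by Eqs. 30-33, giving W (columns) and R_p (rows) with W^* H = R_p."""
    Q, R = mgs_qr(cols)
    N = len(cols)
    W = [list(q) for q in Q]
    Rp = [list(row) for row in R]
    for n in range(N - 2):                       # rows that contain entries to null
        for m in range(n + 1, N - 1):            # all columns except the diagonal and the last
            f = Rp[n][m] / Rp[m][m].real         # r_nm / r_mm, with r_mm real and positive
            W[n] = [wn - wm * f.conjugate() for wn, wm in zip(W[n], W[m])]   # Eq. 30
            for j in range(m, N):
                Rp[n][j] -= Rp[m][j] * f                                     # Eq. 31
        norm = math.sqrt(inner(W[n], W[n]).real)
        Rp[n] = [v / norm for v in Rp[n]]                                    # Eq. 32
        W[n] = [v / norm for v in W[n]]                                      # Eq. 33
    return W, Rp


def random_columns(N, seed):
    """N i.i.d. CN(0, 1) columns of length N (an illustrative channel)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)]
            for _ in range(N)]
```

Rows are processed top-down here, so each row $m$ used in Eqs. 30–31 is still the original QRD row and satisfies $\mathbf{q}_m^*\mathbf{H} = $ row $m$ of $\mathbf{R}$, which is all the update needs.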

6 Extension to Multi-User MIMO

In the remainder of this paper, we follow the system model of Section 2.2 and extend the proposed subspace detection methods to MU-MIMO systems. Note that since the interference is discrete and not Gaussian, MMSE is not the optimal detection strategy in MU-MIMO. We propose an SD scheme that treats the co-scheduled user's signal as a constrained unknown to be estimated, rather than just as additional random noise.

This MU-MIMO detector is very similar to the proposed MIMO detector, where the same operation is repeated, but on the layers of interest only. The interfering layers undergo a slicing operation, and hence better knowledge of the interferer results in better performance. Therefore, advanced signal processing can be applied at the receiver side, e.g., joint/conditional ML detection, where the constellation size of the co-scheduled user's signal needs to be estimated first, via an MC routine, before symbol detection and decoding are performed.

6.1 Likelihood-Based MC

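The optimal likelihood-based MC scheme decides on the modulation format that has the maximum likelihood among multiple hypotheses. Following the Bayesian formulation, hypothesis testing is performed on the possible modulation formats. We consider five hypotheses, $\mathbf{y} \sim P(\mathbf{y}; \mathbf{x}_{\text{user}} \in \mathcal{M}^{N_{\text{user}}}, \mathbf{x}_{\text{inter}} \in \mathcal{U}_j^{N_{\text{inter}}})$, $j \in \{0, 1, 2, 3, 4\}$, with likelihoods:

$$P(\mathbf{y}; \mathcal{U}_j) = \sum_{\mathbf{x}_{\text{user}}\in\mathcal{M}^{N_{\text{user}}},\; \mathbf{x}_{\text{inter}}\in\mathcal{U}_j^{N_{\text{inter}}}} P(\mathbf{y}\mid\mathbf{x})\,P(\mathbf{x}) \tag{34}$$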

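Under statistical independence between the components of $\mathbf{x}$, and assuming uniform priors, $P(x_1) = \cdots = P(x_{N_{\text{user}}}) = 1/|\mathcal{M}|$ and $P(x_{N_{\text{user}}+1}) = \cdots = P(x_N) = 1/|\mathcal{U}_j|$, where $|\cdot|$ denotes the cardinality of the constellation (the probabilities of the user symbols are independent of the interferer and thus can be dropped from the likelihood equation), the ML MC decision metric can be derived as:

$$\hat{j} = \arg\max_{j\in\{0,1,2,3,4\}} \sum_{\mathbf{x}_{\text{user}}\in\mathcal{M}^{N_{\text{user}}},\; \mathbf{x}_{\text{inter}}\in\mathcal{U}_j^{N_{\text{inter}}}} P(\mathbf{y}\mid\mathbf{x})\,\frac{1}{|\mathcal{U}_j|^{N_{\text{inter}}}} \tag{35}$$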

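Noting that $P(\mathbf{y}\mid\mathbf{x}) = \frac{1}{(\pi\sigma_n^2)^M}\exp\left(-\frac{1}{\sigma_n^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2\right)$, and neglecting the term $\frac{1}{(\pi\sigma_n^2)^M}$, which is assumed fixed over hypotheses, the resultant Log-MAP decision metric is:

$$\hat{j}_{\text{Log-MAP}} = \arg\max_{j\in\{0,1,2,3,4\}}\left(N_{\text{inter}}\log\frac{1}{|\mathcal{U}_j|} + \log\sum_{\mathbf{x}_{\text{user}}\in\mathcal{M}^{N_{\text{user}}},\; \mathbf{x}_{\text{inter}}\in\mathcal{U}_j^{N_{\text{inter}}}}\exp\left(-\frac{1}{\sigma_n^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2\right)\right) \tag{36}$$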

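which is the optimal ALRT solution.

Solving Eq. 36 is computationally intensive, because for each $j$ we have to calculate $|\mathcal{M}|^{N_{\text{user}}} \times |\mathcal{U}_j|^{N_{\text{inter}}}$ exponential terms. However, one of these terms is dominant and corresponds to the scaled ML distance:

$$d_{\mathrm{ML},j} = \min_{\mathbf{x}_{\text{user}}\in\mathcal{M}^{N_{\text{user}}},\; \mathbf{x}_{\text{inter}}\in\mathcal{U}_j^{N_{\text{inter}}}} \frac{1}{\sigma_n^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 \tag{37}$$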

Hence, following the Jacobian-logarithm approximation ($\log\sum_r \exp(a_r) \approx \max_r\{a_r\}$), we obtain:

$$\hat{j}_{\text{Max-Log-MAP}} = \arg\max_{j\in\{0,1,2,3,4\}}\left(N_{\text{inter}}\log\frac{1}{|\mathcal{U}_j|} - d_{\mathrm{ML},j}\right) \tag{38}$$

which is the sub-optimal Max-Log-MAP classifier [20, 21].

Therefore, the main component of the decision metric for MC is found to be an accumulation, over a set of tones, of Euclidean distance computations, which are also used by the ML detector for bit LLR soft decision generation. Combining MC and detection routines is thus computationally efficient.
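To illustrate Eqs. 36 and 38 side by side, the sketch below accumulates both metrics over a set of tones for a toy setup with two receive antennas, one desired QPSK stream, and one interfering stream ($N_{\text{user}} = N_{\text{inter}} = 1$), and then picks the hypothesis with the largest score. Only $\phi$, QPSK, and 16-QAM are tested to keep the enumeration small; `qam`, `mc_metrics`, `classify`, and `noiseless_tones` are illustrative names, and the Log-MAP sum uses the standard max-shifted log-sum-exp for numerical stability.

```python
import itertools
import math
import random


def qam(order):
    """Unit-energy square QAM of the given order; order 1 stands for phi = {0}."""
    if order == 1:
        return [0j]
    m = math.isqrt(order)
    levels = [2 * i - (m - 1) for i in range(m)]
    scale = math.sqrt(2 * (order - 1) / 3)           # average energy of the raw grid
    return [complex(a, b) / scale for a in levels for b in levels]


def mc_metrics(tones, user_const, inter_const, sigma2):
    """(Log-MAP, Max-Log-MAP) scores of one interferer hypothesis, summed over tones.

    Each tone is (y, H), where row m of H is [h_user, h_inter] for receive antenna m.
    """
    log_map = max_log = 0.0
    prior = math.log(1.0 / len(inter_const))          # N_inter * log(1/|U_j|), N_inter = 1
    for y, H in tones:
        a = [-sum(abs(y[m] - H[m][0] * xu - H[m][1] * xi) ** 2 for m in range(len(y)))
             / sigma2
             for xu, xi in itertools.product(user_const, inter_const)]
        amax = max(a)                                 # equals -d_ML,j of Eq. 37
        log_map += prior + amax + math.log(sum(math.exp(v - amax) for v in a))  # Eq. 36
        max_log += prior + amax                                                 # Eq. 38
    return log_map, max_log


def classify(tones, user_const, hypotheses, sigma2, max_log=True):
    """Return the hypothesis name with the largest accumulated score."""
    scores = {name: mc_metrics(tones, user_const, const, sigma2)[1 if max_log else 0]
              for name, const in hypotheses.items()}
    return max(scores, key=scores.get)


def noiseless_tones(count, seed):
    """Toy tones with a QPSK user and a QPSK interferer (illustrative data only)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    qpsk = qam(4)
    tones = []
    for _ in range(count):
        H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)]
             for _ in range(2)]
        xu, xi = rng.choice(qpsk), rng.choice(qpsk)
        tones.append(([H[m][0] * xu + H[m][1] * xi for m in range(2)], H))
    return tones
```

Since the log-sum-exp always includes its largest term, the Max-Log-MAP score never exceeds the Log-MAP score of the same hypothesis. The cost of both grows with $|\mathcal{M}|^{N_{\text{user}}}|\mathcal{U}_j|^{N_{\text{inter}}}$, which is why Section 7 revisits these classifiers for SD.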

6.2 MC Using Higher-Order Cumulants

For feature-based classification, feature vectors containing higher-order CCs are used. These features cannot be directly extracted from the components of $\mathbf{y}$, since these consist of linear mixtures of the components of the transmitted signal vector and additive noise. First, the channel is compensated, using ZF for example, where the received vector is multiplied by the pseudo-inverse of the channel matrix. Then, the modulation type-specific features are estimated from the noisy recovered symbol streams, i.e., the components of the vector $\mathbf{y}_{\mathrm{ZF}}$.


The general expression of a cumulant of order $u$, $v$-times conjugated, for a complex random variable $s$ is given as [39]:

$$\kappa^{u,v}_s = \sum_{\mathcal{P}_u}\left[k(p)\prod_{j=1}^{p}\mathrm{E}\left\{s^{u_j - v_j}\,s^{*v_j}\right\}\right] \tag{39}$$

where $\mathcal{P}_u$ is the set of the partitions of the elements $\{1, 2, \ldots, u\}$. A partition $\rho$ consists of $p$ sets $\nu_j$, $\rho = \{\nu_j\}_{j=1}^{p}$, where $u_j$ is the size of the set $\nu_j$, $v_j$ is the number of conjugated terms in it, and $k(p) = (-1)^{p-1}(p-1)!$.

Assume that we need to estimate the modulation type from which the $i$th symbol $y_{\mathrm{ZF},i}$ of $\mathbf{y}_{\mathrm{ZF}}$ was drawn; we replace $s$ by $y_{\mathrm{ZF},i}$ and compute Eq. 39 with the required $u$ and $v$. Equation 39 can be simplified for specific values of $u$ and $v$. For example, we have:

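$$\kappa_{2,0}^{s} = \mathbb{E}\{s^2\} \tag{40}$$

$$\kappa_{2,1}^{s} = \mathbb{E}\{|s|^2\} \tag{41}$$

$$\kappa_{4,0}^{s} = \mathbb{E}\{s^4\} - 3\,\mathbb{E}\{s^2\}^2 \tag{42}$$

$$\kappa_{4,2}^{s} = \mathbb{E}\{s^2 s^{*2}\} - \left|\mathbb{E}\{s^2\}\right|^2 - 2\,\mathbb{E}\{s s^{*}\}^2 \tag{43}$$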
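Equation 39 can be turned into a sample estimator by replacing each expectation with a sample mean. The sketch below assumes NumPy and a self-written set-partition generator; the function names are illustrative. It works as follows:

- it enumerates the partitions of $\{1, \dots, u\}$;
- it treats the first $u - v$ elements as unconjugated and the last $v$ as conjugated;
- it applies the weight $k(p)$ to each partition.

The checks use noise-free, exactly uniform constellations, for which $\kappa_{4,2}^{s}$ and $\kappa_{6,3}^{s}$ reproduce the Table 1 values. These two cumulants are unchanged by a phase rotation of the constellation, whereas $\kappa_{4,0}^{s}$ and $\kappa_{6,1}^{s}$ depend on how, for example, QPSK is rotated.

```python
import numpy as np
from math import factorial


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if len(elements) == 1:
        yield [elements]
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Insert `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new singleton block for it.
        yield [[first]] + partition


def cumulant(s, u, v):
    """Sample estimate of kappa_{u,v} from Eq. 39 (order u, v conjugations).

    Elements 0 .. u-v-1 stand for unconjugated copies of s and the last v
    for conjugated copies; each expectation becomes a sample mean.
    """
    s = np.asarray(s, dtype=complex)
    total = 0j
    for partition in set_partitions(list(range(u))):
        p = len(partition)
        term = (-1) ** (p - 1) * factorial(p - 1)  # k(p)
        for block in partition:
            u_j = len(block)
            v_j = sum(1 for idx in block if idx >= u - v)  # conjugated members
            term = term * np.mean(s ** (u_j - v_j) * np.conj(s) ** v_j)
        total += term
    return total
```

The number of partitions grows as the Bell number of $u$ (203 for $u = 6$). This is one reason closed forms such as Eqs. 40 to 43 are preferred in practice.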

Due to the symmetry of the constellations, only even-order cumulants are non-zero for linearly modulated signals, so only these are useful for MC. The theoretical values of various cumulants for QAM modulations are shown in Table 1. When only discriminating between QAMs, $\kappa_{4,1}^{s}$, $\kappa_{6,0}^{s}$, and $\kappa_{6,2}^{s}$ are also all zero and hence cannot be used. Moreover, $\kappa_{2,0}^{s}$ and $\kappa_{2,1}^{s}$ can only be used to discriminate between the presence and the absence of interference.

Finally, the decision on a specific modulation scheme is made by choosing the modulation type that minimizes the Euclidean distance between the estimated feature vector and the theoretical feature vector. When multiple branches transmit symbols from the same modulation type, selection combining can be applied to select the feature estimate from the branch with the highest SNR. Alternatively, a more sophisticated maximum ratio combining mechanism can be used.
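The feature-based pipeline described above has three steps: ZF compensation, per-stream cumulant estimation with Eqs. 42 and 43, and a nearest-theoretical-feature decision with selection combining. The sketch below implements them under several assumptions made for illustration:

- the channel is constant over the $T$ observed vectors;
- the feature vector is $(|\kappa_{4,0}^{s}|, \kappa_{4,2}^{s})$, with the magnitude of $\kappa_{4,0}^{s}$ taken so that the QPSK reference value from Table 1 does not depend on constellation rotation;
- the per-stream SNRs used for selection combining are supplied by the caller.

The function names are hypothetical.

```python
import numpy as np

# Theoretical (|kappa_40|, kappa_42) pairs taken from Table 1.
THEORY = {
    "QPSK": (1.0, -1.0),
    "16-QAM": (0.68, -0.68),
    "64-QAM": (0.619, -0.619),
    "256-QAM": (0.6047, -0.6047),
}


def zf_equalize(Y, H):
    """Apply pinv(H) to every received vector (rows of Y, shape (T, Nr))."""
    return Y @ np.linalg.pinv(H).T  # shape (T, Nt): one column per stream


def features(s):
    """Estimate (|kappa_40|, kappa_42) of one stream using Eqs. 42-43."""
    m20 = np.mean(s ** 2)
    m21 = np.mean(np.abs(s) ** 2)
    k40 = np.mean(s ** 4) - 3 * m20 ** 2
    k42 = np.mean(np.abs(s) ** 4) - np.abs(m20) ** 2 - 2 * m21 ** 2
    return np.array([np.abs(k40), np.real(k42)])


def classify_streams(Y_zf, stream_idx, snr):
    """Selection combining: use the highest-SNR stream among `stream_idx`,
    then pick the modulation whose theoretical features are nearest."""
    best = max(stream_idx, key=lambda i: snr[i])
    f = features(Y_zf[:, best])
    return min(THEORY, key=lambda name: np.linalg.norm(f - np.array(THEORY[name])))
```

Because Gaussian noise has zero cumulants above second order, ZF-enhanced noise mainly inflates $\kappa_{2,1}^{s}$. It leaves $\kappa_{4,2}^{s}$ unbiased, although estimation variance still grows.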

Table 1 Theoretical cumulants for different constellations.

Cumulant              φ (no interferer)   QPSK    16-QAM    64-QAM    256-QAM
$\kappa_{2,0}^{s}$    0                   1       1         1         1
$\kappa_{2,1}^{s}$    0                   1       1         1         1
$\kappa_{4,0}^{s}$    0                   1       −0.68     −0.619    −0.6047
$\kappa_{4,1}^{s}$    0                   0       0         0         0
$\kappa_{4,2}^{s}$    0                   −1      −0.68     −0.619    −0.6047
$\kappa_{6,0}^{s}$    0                   0       0         0         0
$\kappa_{6,1}^{s}$    0                   −4      2.08      1.7972    1.7345
$\kappa_{6,2}^{s}$    0                   0       0         0         0
$\kappa_{6,3}^{s}$    0                   4       2.08      1.7972    1.7345

7 Proposed MC Scheme

7.1 Modified Likelihood-Based MC

Since this study uses subspace detection instead of ML detection, likelihood-based MC must be modified substantially. Equation 36 does not hold with SD, since the entire lattice is not exhaustively searched. Neither does Eq. 38, since $d_{\mathrm{ML},j}$ is not guaranteed to lie within the search region. Moreover, SD only makes use of the layers of interest. To combine MC and SD efficiently, we therefore propose carrying out the summation over the desired signal constellations, while MC targets the interfering user.

We thus introduce the Quasi-Log-MAP and Quasi-Max-Log-MAP MC schemes, which closely approximate the exact schemes. With Quasi-Log-MAP, the summation in Eq. 36 runs over the $|\mathcal{M}|$ lattice points searched by SD on one of the detected streams, say $l$. The modified likelihood function can then be represented as:

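$$\hat{j}_{\text{Quasi-Log-MAP}} = \arg\max_{j \in \{0,1,2,3,4\}} \left( \log \frac{1}{|\mathcal{U}_j|} + \log \sum_{x_l \in \mathcal{M}} \exp\left( -\frac{1}{\sigma_n^2} \left\| \mathbf{y}^{(l)} - \mathbf{R}^{(l)}_{p} \mathbf{x} \right\|^2 \right) \right) \tag{44}$$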

where the Euclidean distance is expanded as in Eq. 14. With Quasi-Max-Log-MAP, the modified ML distance metric ($d'_{\mathrm{ML}}$) is taken as the minimum of the scaled distances spanned in Quasi-Log-MAP. Moreover, better (average) performance can be achieved by accumulating the minimum distances from all layers of interest. Equation 45 generalizes the proposed likelihood function, assuming that $T$ observations (tones) are accumulated under a constant interfering modulation type before a winning hypothesis is decided.

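$$\hat{j}_{\text{Quasi-Max-Log-MAP}} = \arg\max_{j \in \{0,1,2,3,4\}} \sum_{t=1}^{T} \left( \log \frac{1}{|\mathcal{U}_j|} - \sum_{l=1}^{N_{\mathrm{user}}} d'^{(l,t)}_{\mathrm{ML},j} \right) \tag{45}$$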

Distances from different layers of interest undergo different decompositions and are independent. Combining them is therefore merely equivalent to repeating the observation and averaging, rather than obtaining a more powerful observation. Hence, dropping the inner summation over the $N_{\mathrm{user}}$ layers in Eq. 45 is an efficient choice. In fact, even for MC in regular MIMO systems, where a uniform adaptive modulation type across all layers is to be estimated, [40] proposed a per-layer approach that does not accumulate distances across layers.
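Once SD has produced scaled distances for one layer of interest, Eqs. 44 and 45 reduce to simple reductions over those distances. The sketch below assumes that an SD stage (not shown) has already filled an array `dist[j, t, k]`. Each entry holds $\|\mathbf{y}^{(l)} - \mathbf{R}^{(l)}_p \mathbf{x}\|^2 / \sigma_n^2$ for hypothesis $j$, tone $t$, and the $k$-th candidate $x_l \in \mathcal{M}$.

This array layout and the function names are assumptions for illustration. Following the single-layer argument above, only one layer $l$ is used. The Quasi-Log-MAP metric is also summed over the $T$ tones, by analogy with Eq. 45.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def quasi_classify(dist, sizes):
    """dist: (J, T, |M|) scaled SD distances for one layer; sizes: |U_j| per j.

    Returns the winning j under Quasi-Log-MAP (Eq. 44, summed over tones)
    and under Quasi-Max-Log-MAP (Eq. 45 restricted to a single layer).
    """
    prior = np.log(1.0 / np.asarray(sizes, dtype=float))  # log(1/|U_j|)
    T = dist.shape[1]
    # Quasi-Log-MAP: log-sum-exp over the |M| candidates of stream l.
    log_map = T * prior + logsumexp(-dist, axis=2).sum(axis=1)
    # Quasi-Max-Log-MAP: keep only the minimum distance d'_ML per tone.
    max_log = T * prior - dist.min(axis=2).sum(axis=1)
    return int(np.argmax(log_map)), int(np.argmax(max_log))
```

The Max-Log variant needs only a running minimum and an adder per hypothesis. The Log-MAP variant also needs exponentials and a logarithm.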


After the classifier decides on $\hat{j}$, soft-output SD generates the bit LLRs as described earlier. The joint MC and detection setup works as follows:

1. After $T$ vectors are observed, the detection routine is called $T$ times for each possible hypothesis, and the outputs are stored in memory.
2. Concurrently, the likelihood of each hypothesis is computed.
3. Finally, the hypothesis with the maximum likelihood is declared the winner, and the corresponding output is retrieved.
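The buffering flow just described can be outlined as below:

- run detection for every hypothesis over the $T$ observed vectors;
- store the outputs and accumulate each hypothesis's likelihood;
- retrieve the winner's stored output.

Here `detect` is a hypothetical placeholder for the soft-output SD routine. It is assumed to return per-vector LLRs together with that hypothesis's per-vector likelihood contribution.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# Hypothetical detector signature: (y, H, hypothesis) -> (llrs, likelihood term)
Detector = Callable[[np.ndarray, np.ndarray, int], Tuple[np.ndarray, float]]


def joint_mc_and_detection(Y: Sequence[np.ndarray], Hs: Sequence[np.ndarray],
                           hypotheses: Sequence[int], detect: Detector):
    """Call `detect` T times per hypothesis, buffer its LLRs, and return the
    winning hypothesis together with the LLRs stored for it."""
    buffers = {}     # hypothesis -> list of per-vector LLR arrays
    likelihood = {}  # hypothesis -> accumulated likelihood metric
    for j in hypotheses:
        buffers[j], likelihood[j] = [], 0.0
        for y, H in zip(Y, Hs):
            llrs, metric = detect(y, H, j)
            buffers[j].append(llrs)
            likelihood[j] += metric
    winner = max(likelihood, key=likelihood.get)
    return winner, np.stack(buffers[winner])
```

The dictionaries mirror the memory cost noted for this mode. Every hypothesis's outputs are kept until the decision is made.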

7.2 Hierarchical MC

Likelihood-based MC applies SD as many times as there are hypotheses, whereas feature-based MC requires only one SD routine following the MC routine. The two can, however, be combined in a way that reduces the entailed complexity with minimal effect on performance.

The theoretical CC values of higher-order QAM constellations are very close to each other. Higher-order cumulants have more widely separated theoretical values, but their estimates have high variance. Thus, higher-order cumulants do not necessarily result in better performance. Nevertheless, second-order cumulants can at least be used to eliminate the no-interferer hypothesis before likelihood-based MC proceeds to discriminate among the remaining hypotheses.

The proposed hierarchical MC scheme is as follows. First, the ZF solution is computed, and $\kappa_{2,0}^{s}$ is calculated. If the result is closer to 0, there is no interference, and the entire likelihood-based MC routine is skipped. Otherwise, if the result is closer to 1, likelihood-based MC follows, but with one less hypothesis to check.

7.3 Assuming High-Order Interfering Modulation Types

Instead of adding an MC routine, an attractive solution is to assume that the interfering modulation type is a high-order QAM, without attempting to estimate it. A similar solution was presented in [18, 19], where the interfering modulation type was assumed to be 16-QAM and an ML detector followed.

In our case, assuming very high-order constellations is feasible, because the number of distances computed in SD is not affected by the size of the interfering constellation. The only added complexity from assuming a higher-order interfering constellation lies in the slicing operation, which is negligible. Therefore, we propose assuming the interfering modulation type to be 64-QAM, 256-QAM, or 1024-QAM, where the latter is not even one of the possible hypotheses.
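A square-QAM slicer illustrates why a larger assumed interfering constellation costs essentially nothing extra: its cost is the same for every order. Each real dimension is rounded to the nearest odd integer level and clipped to the constellation edge.

The sketch assumes unit-average-energy square QAM. The odd-integer grid has average energy $2(\text{order} - 1)/3$, which fixes the scale factor. The function name is illustrative.

```python
import numpy as np


def slice_qam(z, order):
    """Hard-slice complex samples z onto unit-energy square `order`-QAM."""
    m = int(round(np.sqrt(order)))          # levels per real dimension
    scale = np.sqrt(2.0 * (order - 1) / 3)  # RMS amplitude of the odd-integer grid

    def slice_real(r):
        # Nearest odd integer (2k + 1 for r in [2k, 2k + 2)), then clip
        # to the outermost level m - 1.
        level = 2 * np.floor(r * scale / 2) + 1
        return np.clip(level, -(m - 1), m - 1)

    z = np.asarray(z, dtype=complex)
    return (slice_real(z.real) + 1j * slice_real(z.imag)) / scale
```

The operation count per sample (one multiply, floor, and clip per dimension) is identical for 64-QAM and 1024-QAM. Only the scale constant and clipping bound change.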

8 Architectural Implementation

8.1 2-Stage SD

The reference CYSD with cyclic permutations does not allow further savings, since all column positions change from one permutation to the next. However, parallelism is inherent in it: the processing of each layer can run on a separate core. If we forgo this parallelism and use a pipelined architecture, the decomposition output of one layer can be fed to the subsequent layer, enabling computational savings.

A 2-stage architecture for PWSD is shown in Fig. 5. The odd channel permutations can execute in parallel, using the efficient implementation of Section 8, but with no redundant computations to save. The LLRs of their corresponding layers are sent to a buffer, and the WRD output is passed to the next stage to assist the WRD of the even permutations. A PR-WRD is thus applied in the second stage, making use of previous decompositions. Finally, the collected LLRs are processed as previously described.

To implement SPSD, an 8-stage architecture is required, in which the decompositions are carried out serially and each stage can make use of computations from all previous stages. Such an architecture, if used with PWSD, yields more savings than a 2-stage architecture. However, adding more stages complicates the architecture and increases its size and latency. A 2-stage architecture for PWSD with 8×8 MU-MIMO is shown in Fig. 6, where the first four layers correspond to the user of interest.

Figure 5 Architecture for a 2-stage 8 × 8 MIMO PWSD.

Table 2 summarizes the redundant QRD computations that the efficient implementations can save for an 8×8 MIMO system, depending on the permutations and their order (this setup of permutations is not unique). The complete QRD requires a total of 2240 RML and 1472 RAD, and the savings in the PR-QRD reach 1296 RML and 816 RAD. This means that the total QRD overhead decreases by around 30% with a 2-stage PWSD.

The proposed approaches can be applied to arbitrary MIMO orders. The impact is greater in higher-order systems, 32×32 MIMO for example, but smaller in lower-order systems such as 4×4 MIMO, where the rightmost two columns account for the majority of the required computations. When the PR-WRD does not include matrix puncturing, CYSD, SPSD, and PWSD reduce to cyclical LORD (CYLD), single-permutation LORD (SPLD), and pairwise LORD (PWLD), respectively. The savings are more pronounced with these LORD detectors.
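The "around 30%" figure can be reproduced under an assumption the text does not spell out. In the 2-stage 8×8 PWSD, suppose the four odd-permutation decompositions are full QRDs and each of the four even-permutation PR-QRDs saves the amount quoted above (1296 RML and 816 RAD). The fractional savings relative to eight full QRDs are then:

$$\frac{4 \times 1296}{8 \times 2240} = \frac{5184}{17920} \approx 28.9\%\ \ (\text{RML}), \qquad \frac{4 \times 816}{8 \times 1472} = \frac{3264}{11776} \approx 27.7\%\ \ (\text{RAD})$$

Both are consistent with the quoted reduction of around 30%.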

8.2 Joint MC and Detection

The optimized architecture for the proposed MU-MIMO detector with hierarchical likelihood- and feature-based MC is shown in Fig. 7. At its core is a subspace detector. In the first stage, this detector detects the first received symbol under every possible choice of the interferer's modulation type except the one corresponding to no interference, and it generates the corresponding list of Euclidean distance metrics for all $T$ vectors. These distances and symbols are stored in buffers, which increases space complexity.

The log-sum-exp values of the distance metrics are passed to an adder; with Max-Log-MAP and Quasi-Max-Log-MAP, no logarithms or exponentials are needed. The adder accumulates them over a span of $T$ tones, during which the interferer modulation is assumed to be static. The resulting accumulated distances for each interference hypothesis are saved. After a winning hypothesis is decided, the corresponding distances are forwarded for LLR processing. This block is only activated when the feature-based classifier decides that an interferer exists. Otherwise, SD is applied once, assuming the interfering modulation type is φ, and the distances are forwarded for LLR processing.

Table 2 Computational savings in proposed schemes.

SPSD                             Saved computations
1: h1 h2 h3 h4 h5 h6 h7 h8       none
2: h1 h2 h3 h4 h5 h6 h8 h7       1296 RML + 816 RAD

8-Stage PWSD                     Saved computations

2-Stage PWSD                     Saved computations

Figure 6 Architecture for a 2-stage 8 × 8 MU-MIMO PWSD ($N_{\mathrm{user}} = 4$).

Figure 7 Architecture for joint hierarchical MC and SD.

The proposed algorithms can be used in many communication standards. For example, they can be used with 802.11ac [6] (WiFi), which supports 80 MHz of bandwidth with 242 usable tones: 8 reserved for pilots and 234 data tones. The data field of a WiFi frame can span a very large number $L$ of orthogonal frequency-division multiplexing (OFDM) symbols. Since the interferer's modulation constellation remains static over $T$ tones and $L$ symbols, choosing $T = 234$ yields substantial computational savings. The detector only needs to run in the above mode for one OFDM symbol in the frame to identify the interferer's constellation. It can then switch back to normal SD mode (without MC) to generate the LLRs for the remaining OFDM symbols.

The total number of distance computations needed to generate the LLRs from the $234 \times L$ data tones is $234 \times L \times |\mathcal{M}| \times N_{\mathrm{user}}$. With likelihood-based MC, the average overhead of the MC routine is $234 \times 4 \times |\mathcal{M}| \times N_{\mathrm{user}}$, since five hypotheses are tested over one OFDM symbol instead of one. Compared to the distances computed by an SD with perfect knowledge of the interferer, this is a relative increase of at most $4/L$. The size $L$ of the data field can range from 8 to more than 1024, so the increase in distance computations ranges from 50% down to less than 0.4%.

With the proposed hierarchical MC, the overhead in distance computations is $3/L$. It only occurs 80% of the time, because an interferer is present under four of the five equiprobable hypotheses, so it ranges between 30% and 0.23%. The complexity of finding the ZF solution and computing the second-order CCs must, however, be added to this.
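The percentages above follow from the ratio of MC-mode distance computations (one OFDM symbol) to those of the whole data field. The short sketch below reproduces them. The function name is illustrative, and the 0.8 activation probability of the hierarchical scheme comes from the five equiprobable hypotheses used in the simulations.

```python
def mc_overhead(L, extra_runs, p_active=1.0, tones=234):
    """Relative increase in distance computations caused by MC.

    A perfect-knowledge SD computes tones * L * |M| * N_user distances; MC adds
    tones * extra_runs * |M| * N_user for one OFDM symbol, with probability
    p_active. |M| and N_user cancel in the ratio, so they are omitted.
    """
    return p_active * (tones * extra_runs) / (tones * L)


for L in (8, 1024):
    lik = mc_overhead(L, extra_runs=4)                 # 5 hypotheses instead of 1
    hier = mc_overhead(L, extra_runs=3, p_active=0.8)  # 4 instead of 1, 80% of frames
    print(f"L={L}: likelihood-based {100 * lik:.2f}%, hierarchical {100 * hier:.2f}%")
# L=8: likelihood-based 50.00%, hierarchical 30.00%
# L=1024: likelihood-based 0.39%, hierarchical 0.23%
```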

9 Simulation Results

9.1 Simulation Scenario

Joint MC and SD was implemented following the studied system model. The hypothesis decision is made after $T = 234$ tones are received. Turbo coding is used, with a code rate of 1/2 and 8 decoding iterations. The interferer is assumed to hop over the five hypotheses with equal probability on every new frame. Moreover, in addition to the regular channel $\mathbf{H}$, we consider a channel $\mathbf{H}_c$ that accounts for antenna correlation. The effective channel matrices are related by $\mathbf{H}_c = \mathbf{R}_r^{1/2} \mathbf{H} \mathbf{R}_t^{1/2}$, where $\mathbf{R}_t$ and $\mathbf{R}_r$ are the transmit and receive antenna correlation matrices, respectively.
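A correlated channel of the form $\mathbf{H}_c = \mathbf{R}_r^{1/2} \mathbf{H} \mathbf{R}_t^{1/2}$ can be generated as sketched below. The text gives only scalar correlation factors, so the exponential model $[\mathbf{R}]_{ik} = \rho^{|i-k|}$ used here is an assumption, as are the function names. The matrix square root is taken through an eigendecomposition, which is valid because correlation matrices are Hermitian positive semidefinite.

```python
import numpy as np


def exp_correlation(n, rho):
    """Assumed exponential correlation model: [R]_{ik} = rho ** |i - k|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def psd_sqrt(R):
    """Hermitian square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T


def correlated_channel(H, rho_t, rho_r):
    """H_c = R_r^{1/2} H R_t^{1/2} for an Nr x Nt channel matrix H."""
    nr, nt = H.shape
    return psd_sqrt(exp_correlation(nr, rho_r)) @ H @ psd_sqrt(exp_correlation(nt, rho_t))
```

Under this model, the highly correlated case would correspond to `correlated_channel(H, 0.6, 0.9)`, and `rho_t = rho_r = 0` returns `H` unchanged.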


Figure 8 BER performance of 4×4 MIMO detectors, $\mathcal{M}$ is 16-QAM, uncorrelated channels. [Plot: BER vs. SNR-dB; curves: ZF, MMSE-biased, MMSE-unbiased, LORD, LORDB, Subspace, ML.]

9.2 Result Discussion

Figure 8 shows the coded bit error rate (BER) performance of the reference detectors for 4×4 MIMO with 16-QAM. SD and LORDB achieve near-ML performance, while the gap between LORD and SD is around 1 dB, and the MMSE and ZF detectors lag behind (the difference between biased and unbiased MMSE is negligible). Figure 9 then shows the BER performance of the proposed MIMO approaches for 8×8 MIMO with 16-QAM, compared to that of CYSD/CYLD and the linear ZF detector. The ML detector is not shown here since its complexity is prohibitive. The PWSD and SPSD curves coincided with the CYSD curve, and the PWLD and SPLD curves likewise coincided with the CYLD curve. This means that the savings came at no cost in performance.

Figure 9 BER performance of 8×8 MIMO detectors, $\mathcal{M}$ is 16-QAM, uncorrelated channels. [Plot: BER vs. SNR-dB; curves: CYLD, SPLD, PWLD, CYSD, SPSD, PWSD, ZF.]

Figure 10 CCR performance of 4×4 MU-MIMO detectors, $N_{\mathrm{user}} = 2$, $\mathcal{M}$ is 64-QAM, uncorrelated channels. [Plot titled "Modulation Classification Performance": CCR vs. SNR-dB; curves: Quasi-Log-MAP, Quasi-Max-Log-MAP, Hierarchical-Log-MAP, Hierarchical-Max-Log-MAP, CC-2,0, CC-4,2, CC-6,3.]

The correct classification ratio (CCR) of the various MC schemes is shown in Fig. 10. Classifiers based on CCs did not perform well, since all hypotheses correspond to QAMs. The Quasi-Log-MAP and Quasi-Max-Log-MAP classifiers performed very similarly. Most importantly, the hierarchical versions of the classifiers performed exactly as well as the likelihood-based classifiers, which means that the reduction in complexity came at no cost in performance.

In what follows, we illustrate the frame error rate (FER) performance of five SD schemes:

- the IA SD;
- the MC-based SD with a hierarchical classifier;
- the SD schemes that assume the interfering modulation type to be 64-QAM, 256-QAM, and 1024-QAM.

For a 4×4 MU-MIMO system with $N_{\mathrm{user}} = 2$, Fig. 11 shows the corresponding FER performance when $\mathcal{M}$ is 64-QAM and the channel is uncorrelated. Both adding an MC routine and assuming the 64-QAM hypothesis achieve near-IA performance. The schemes that assume the 256-QAM and 1024-QAM hypotheses follow, respectively, less than 0.5 dB behind. Similar performance is observed for 8×8 MU-MIMO systems with $N_{\mathrm{user}} = 2$ and six interfering layers, as shown in Fig. 12.

Figure 11 FER performance of 4×4 MU-MIMO detectors, $N_{\mathrm{user}} = 2$, $\mathcal{M}$ is 64-QAM, uncorrelated channels. [Plot: coded FER vs. SNR-dB; curves: Subspace-IA, Subspace-64, Subspace-256, Subspace-1024, Subspace-MC, SL-MMSE.]

Figure 12 FER performance of 8×8 MU-MIMO detectors, $N_{\mathrm{user}} = 2$, $\mathcal{M}$ is 64-QAM, uncorrelated channels. [Plot: coded FER vs. SNR-dB; curves: Subspace-IA, Subspace-64, Subspace-256, Subspace-1024, Subspace-MC, SL-MMSE.]

When channel correlation is added, some of the SD schemes that assume the interferer's modulation type without MC exhibit an error floor. Figure 13 shows the case of medium correlation, with transmit and receive correlation factors of 0.3. The SD that assumes the interferer to be 64-QAM saturated, while the remaining schemes maintained near-IA performance. Pushing this further, Fig. 14 shows the case of a highly correlated channel, with a transmit correlation factor of 0.6 and a receive correlation factor of 0.9. Here, all interference-assuming detectors saturated, each at a different FER level. The best of them was the SD that assumes the 1024-QAM hypothesis.

Figure 13 FER performance of 4×4 MU-MIMO detectors, $N_{\mathrm{user}} = 2$, $\mathcal{M}$ is 64-QAM, correlated channels. [Plot: coded FER vs. SNR-dB; curves: Subspace-IA, Subspace-64, Subspace-256, Subspace-1024, Subspace-MC, SL-MMSE.]

Figure 14 FER performance of 4×4 MU-MIMO detectors, $N_{\mathrm{user}} = 2$, $\mathcal{M}$ is 64-QAM, highly correlated channels. [Plot: coded FER vs. SNR-dB; curves: Subspace-IA, Subspace-64, Subspace-256, Subspace-1024, Subspace-MC, SL-MMSE.]

Several observations help explain these results:

- SD is less sensitive to interference than other detection schemes. This explains why assuming an interfering modulation type without estimation works well.
- 64-QAM is close to the median of the hypotheses. Slicing over it is therefore more likely to produce an output similar to slicing over the correct hypothesis.
- At high channel correlation, the slicing operation causes larger errors. Assuming a high-order interfering modulation type reduces this error while preserving the structure of a QAM.
- Correlation shifts the curves to a higher SNR range, where the CCR of MC is near 1. Near-IA performance is therefore maintained with MC-based SD.

10 Conclusion

Several low-complexity SD schemes for MIMO and MU-MIMO systems have been proposed, alongside efficient architectural implementations. The proposed detector reduces the preprocessing channel-matrix QRD overhead by around 30%, and the MC complexity overhead has been shown to drop to only 0.23% in some WiFi settings. Assuming the interferer's modulation type without estimation is sufficient under good channel conditions, whereas MC is required at high SNR with correlated channels.


Acknowledgments This work was partially funded by the National Council for Scientific Research (CNRS) in Lebanon.

References

1. Duplicy, J., Biljana, B., Balraj, R., Ghaffar, R., Horvath, P., Kaltenberger, F., Knopp, R., Kovacs, I., Nguyen, H., & Tandur, D. (2011). MU-MIMO in LTE systems. EURASIP Journal on Wireless Communications and Networking, 2011(1), 496–763.

2. Paulraj, A., Nabar, R., & Gore, D. (2003). Introduction to space-time wireless communications. Cambridge University Press.

3. Evolved universal terrestrial radio access (E-UTRA); physical channels and modulation. 3GPP Std. TS 36.211. http://www.3gpp.org.

4. Physical channels and mapping of transport channels onto physical channels. 3GPP Std. TS 25.211. http://www.3gpp.org.

5. IEEE standard for air interface for broadband wireless access systems. IEEE Std. 802.16 (2012). http://standards.ieee.org/getieee802/802.16.html.

6. IEEE standard for local and metropolitan area networks – part 11: Wireless LAN medium access control (MAC) and physical layer (PHY). IEEE Std. 802.11 (2012). http://standards.ieee.org/getieee802/802.11.html.

7. Hassibi, B., & Vikalo, H. (2005). On the sphere-decoding algorithm I. Expected complexity. IEEE Transactions on Signal Processing, 53(8), 2806–2818.

8. Mansour, M.M., Alex, S.P., & Jalloul, L.M. (2014). Reduced complexity soft-output MIMO sphere detectors—part I: Algorithmic optimizations. IEEE Transactions on Signal Processing, 62(21), 5505–5520.

9. Mansour, M.M., Alex, S.P., & Jalloul, L.M. (2014). Reduced complexity soft-output MIMO sphere detectors—part II: Architectural optimizations. IEEE Transactions on Signal Processing, 62(21), 5521–5535.

10. Vikalo, H., & Hassibi, B. (2005). On the sphere-decoding algorithm II. Generalizations, second-order statistics, and applications to communications. IEEE Transactions on Signal Processing, 53(8), 2819–2834.

11. Viterbo, E., & Boutros, J. (1999). A universal lattice code decoder for fading channels. IEEE Transactions on Information Theory, 45(5), 1639–1642.

12. Ariyavisitakul, S.L., Zheng, J., Ojard, E., & Kim, J. (2008). Subspace beamforming for near-capacity MIMO performance. IEEE Transactions on Signal Processing, 56(11), 5729–5733.

13. Mansour, M.M. (2015). A near-ML MIMO subspace detection algorithm. IEEE Signal Processing Letters, 22(4), 408–412.

14. Mansour, M.M. (2015). A low-complexity MIMO subspace detection algorithm. EURASIP Journal on Wireless Communications and Networking, 2015(1), 1–11.

15. Siti, M., & Fitz, M.P. (2006). A novel soft-output layered orthogonal lattice detector for multiple antenna communications. In Proceedings of the IEEE Int. Conf. Commun. (ICC) (Vol. 4, pp. 1686–1691).

16. Jungwon, L., Toumpakaris, D., & Wei, Y. (2011). Interference mitigation via joint detection. IEEE Journal on Selected Areas in Communications, 29(6), 1172–1184.

17. Bai, Z., Badic, B., Iwelski, S., Scholand, T., Balraj, R., Bruck, G., & Jung, P. (2011). On the equivalence of MMSE and IRC receiver in MU-MIMO systems. IEEE Communications Letters, 15(12), 1288–1290.

18. Ghaffar, R., & Knopp, R. (2011). Interference sensitivity for multiuser MIMO in LTE. In 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) (pp. 506–510).

19. Ghaffar, R., & Knopp, R. (2011). Interference-aware receiver structure for multi-user MIMO and LTE. EURASIP Journal on Wireless Communications and Networking, 2011(1), 1–17.

20. Gomaa, A., Jalloul, L.M., Gomadam, K.S., Tujkovic, D., & Mansour, M.M. (2015). Multi-user MIMO receivers with partial state information. arXiv: http://arxiv.org/abs/1502.00212.

21. Bae, J.H., Kim, S., Lee, J., & Kang, I. (2012). Advanced downlink MU-MIMO receiver for 3GPP LTE-A. In Proceedings of the IEEE Int. Conf. Commun. (ICC) (pp. 7004–7008).

22. Dobre, O.A., Abdi, A., Bar-Ness, Y., & Su, W. (2007). Survey of automatic modulation classification techniques: classical approaches and new trends. Communications, IET, 1(2), 137–156.

23. Wei, W., & Mendel, J.M. (2000). Maximum-likelihood classification for digital amplitude-phase modulations. IEEE Transactions on Communications, 48(2), 189–193.

24. Panagiotou, P., Anastasopoulos, A., & Polydoros, A. (2000). Likelihood ratio tests for modulation classification. In Proceedings of the IEEE MILCOM (Vol. 2, pp. 670–674).

25. Hameed, F., Dobre, O.A., & Popescu, D.C. (2009). On the likelihood-based approach to modulation classification. IEEE Transactions on Wireless Communications, 8(12), 5884–5892.

26. Choqueuse, V., Azou, S., Yao, K., Collin, L., & Burel, G. (2009). Blind modulation recognition for MIMO systems. MTA Review, 19(2), 183–196.

27. Shim, B., & Kang, I. (2009). Joint modulation classification and detection using sphere decoding. IEEE Signal Processing Letters, 16(9), 778–781.

28. Ramezani-Kebrya, A., Kim, I.M., Kim, D.I., Chan, F., & Inkol, R. (2013). Likelihood-based modulation classification for multiple-antenna receiver. IEEE Transactions on Communications, 61(9), 3816–3829.

29. Sarieddeen, H., Mansour, M.M., Jalloul, L.M.A., & Chehab, A. (2015). Likelihood-based modulation classification for MU-MIMO systems. In Proceedings of the IEEE Global Conf. on Signal and Inform. Process. (GlobalSIP) (pp. 873–877).

30. Dobre, O.A., Abdi, A., Bar-Ness, Y., & Su, W. (2010). Cyclostationarity-based modulation classification of linear digital modulations in flat fading channels. Wireless Personal Communications, 54(4), 699–717.

31. Muhlhaus, M.S., Oner, M., Dobre, O.A., Jäkel, H.U., & Jondral, F.K. (2012). Automatic modulation classification for MIMO systems using fourth-order cumulants. In Proceedings of the IEEE Vehic. Technol. Conf. (VTC) (pp. 1–5).

32. Muhlhaus, M.S., Oner, M., Dobre, O.A., Jäkel, H.U., & Jondral, F.K. (2013). A novel algorithm for MIMO signal classification using higher-order cumulants. In 2013 IEEE Radio and Wireless Symposium (RWS) (pp. 7–9).

33. Studer, C., Fateh, S., & Seethaler, D. (2011). ASIC implementation of soft-input soft-output MIMO detection using MMSE parallel interference cancellation. IEEE Journal of Solid-State Circuits, 46(7), 1754–1765.

34. Mansour, M.M., & Jalloul, L.M. (2015). Optimized configurable architectures for scalable soft-input soft-output MIMO detectors with 256-QAM. IEEE Transactions on Signal Processing, 63(18), 4969–4984.

35. Gomaa, A., & Jalloul, L.M.A. (2014). Efficient soft-input soft-output detection of dual-layer MIMO systems. IEEE Wireless Communications Letters, 3(5), 541–544.

36. Edelman, A. (1988). Eigenvalues and condition numbers of random matrices. SIAM Journal on Matrix Analysis and Applications, 9(4), 543–560.

37. Choi, J. (2006). Nulling and cancellation detector for MIMO and its application to multistage receiver for coded signals: performance and optimization, (Vol. 5.


38. Golub, G.H., & Van Loan, C.F. (1996). Matrix Computations, 3rd edn. Baltimore: Johns Hopkins University.

39. William, G., & Chad, M.S. (1994). The cumulant theory of cyclostationary time-series, part I. IEEE Transactions on Signal Processing, 42(12), 3409–3429.

40. Sarieddeen, H., Mansour, M.M., & Chehab, A. (2016). Modulation classification via subspace detection in MIMO systems. IEEE Communications Letters.

41. Sarieddeen, H., Mansour, M.M., & Chehab, A. (2016). Efficient near-optimal 8x8 MIMO detector. In Proceedings of the IEEE Wireless Commun. and Netw. Conf. (WCNC) (pp. 1–6).

Hadi Sarieddeen received the B.E. (summa cum laude) degree in computer and communications engineering from Notre Dame University - Louaize (NDU), Zouk Mosbeh, Lebanon, in 2013. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the American University of Beirut (AUB), Beirut, Lebanon. He was an intern at Ericsson Lebanon Communications Sarl in the summers of 2011 and 2012, and has been a teaching assistant at AUB since 2013. His research interests are in the area of signal processing for wireless communications, with emphasis on designing algorithms and architectures for high-order multiuser MIMO detection. Mr. Sarieddeen received the General Khalil Kanaan Award at NDU in 2013 for ranking first in a graduating class of around 1200 students. He also received the National Council for Scientific Research doctoral scholarship award at AUB in 2016.

Mohammad M. Mansour (S’97-M’03-SM’08) received the B.E. (Hons.) and M.E. degrees in computer and communications engineering from the American University of Beirut (AUB), Beirut, Lebanon, in 1996 and 1998, respectively. He received the M.S. degree in mathematics and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC), Champaign, IL, USA, in 2002 and 2003, respectively.

He was a Visiting Researcher at Qualcomm, San Jose, CA, USA, in the summer of 2016, where he worked on baseband receiver architectures for the IEEE 802.11ax standard. He was a Visiting Researcher at Broadcom, Sunnyvale, CA, USA, from 2012 to 2014, where he worked on physical layer SoC architecture and algorithm development for LTE-Advanced baseband receivers. He was on research leave with Qualcomm Flarion Technologies in Bridgewater, NJ, USA, from 2006 to 2008, where he worked on modem design and implementation for 3GPP-LTE, 3GPP2-UMB, and peer-to-peer wireless networking physical layer SoC architecture and algorithm development. He was a Research Assistant at the Coordinated Science Laboratory (CSL), UIUC, from 1998 to 2003. He worked at National Semiconductor Corporation, San Francisco, CA, with the Wireless Research group in 2000. He was a Research Assistant with the Department of Electrical and Computer Engineering, AUB, in 1997, and a Teaching Assistant in 1996. He joined the Department of Electrical and Computer Engineering, AUB, as a faculty member in 2003, where he is currently a Professor.

His research interests are in the area of energy-efficient and high-performance VLSI circuits, architectures, algorithms, and systems for computing, communications, and signal processing. Prof. Mansour is a member of the Design and Implementation of Signal Processing Systems (DISPS) Technical Committee Advisory Board of the IEEE Signal Processing Society. He served as a member of the DISPS Technical Committee from 2006 to 2013. He served as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II (TCAS-II) from 2008 to 2013, for the IEEE SIGNAL PROCESSING LETTERS from 2012 to 2016, and for the IEEE TRANSACTIONS ON VLSI SYSTEMS from 2011 to 2016. He served as the Technical Co-Chair of the IEEE Workshop on Signal Processing Systems in 2011, and as a member of the Technical Program Committee of various international conferences and workshops. He received the PHI Kappa PHI Honor Society Award twice, in 2000 and 2001, and the Hewlett Foundation Fellowship Award in 2006. He has seven issued U.S. patents.

Louay M.A. Jalloul received the B.S. degree from the University of Oklahoma, Norman, OK, USA, in 1985; the M.S. degree from the Ohio State University, Columbus, OH, USA, in 1988; and the Ph.D. degree from Rutgers, The State University of New Jersey, Piscataway, NJ, USA, in 1993, all in electrical engineering. He is currently a Senior Director of Technology with Qualcomm, San Jose, CA, USA. Prior to that, he was a Technical Director with the Broadcom Corporation and a Senior Director of Technology with Beceem Communications Inc., a Silicon Valley startup providing solutions for mobile broadband wireless communication systems that was acquired by Broadcom in November 2010.

From September 2004 to September 2005, he was an Associate Professor with the Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon. In February 2001, he joined MorphICs Technology Inc., Campbell, CA (acquired by Infineon Technologies AG in April 2003), as the Director of Systems Architecture. There he led his team in developing the code-division multiple access cellular baseband modem processor based on the third-generation wideband CDMA standard. From 1993 to 2001, he was with Motorola Inc., taking on various functions in research and development. He was a Research Associate with the Electro-Science Laboratory, The Ohio State University, and with the Wireless Information Networks Laboratory (WINLAB), Rutgers University. Dr. Jalloul has 89 issued U.S. patents. He is a senior member of the IEEE and a member of the IEEE Engineering Honor Society Eta Kappa Nu.


Ali Chehab received his Bachelor degree in EE from AUB in 1987, the Master’s degree in EE from Syracuse University in 1989, and the PhD degree in ECE from the University of North Carolina at Charlotte in 2002. From 1989 to 1998, he was a lecturer in the ECE Department at AUB. He rejoined the ECE Department at AUB as an Assistant Professor in 2002 and became Full Professor in 2014. He received the AUB Teaching Excellence Award in 2007. He teaches courses in Programming, Electronics, Digital Systems Design, Computer Organization, Cryptography, and Digital Systems Testing. His research interests include Wireless Communications Security, Cloud Computing Security, Multimedia Security, Trust in Distributed Computing, Low Energy VLSI Design, and VLSI Testing. He has about 200 publications. He is a senior member of IEEE and a senior member of ACM.
