
pp. 365-378

Schur algorithms for joint-detection in TD-CDMA based mobile radio systems

Marius VOLLMER*, Martin HAARDT**, Jürgen GÖTZE***

Abstract

Third generation mobile radio systems will employ TD-CDMA in their TDD mode. In a TD-CDMA mobile radio system, joint-detection is equivalent to solving a least squares problem with a system matrix that exhibits some form of block-Toeplitz structure. This structure can be successfully exploited by using variations of the Schur algorithm for computing the QR decomposition of this system matrix. Together with a displacement representation, the Schur algorithm can be straightforwardly adapted to a wide variety of matrix structures. In this paper we show this approach for two concrete manifestations of the TD-CDMA system matrix: first, for a very compact, block-Toeplitz structure; and second, for the less favorable Toeplitz-block structure that arises when decision feedback is added to the data detection process.

Key words: Mobile radiocommunication, Code division multiple access, Time division multiple access, Signal detection, Complex signal, Least squares method, Matrix method, Toeplitz matrix, Matrix decomposition, Decision feedback, Algorithm complexity.


Contents

I. Introduction
II. System model
III. Joint data detection via block linear equalization
IV. The Schur algorithm
V. Approximations
VI. Computational complexity
VII. Conclusions
References (12 ref.)

I. INTRODUCTION

In January 1998, the European standardization body for third generation mobile radio systems, the ETSI Special Mobile Group (SMG), agreed on a radio access scheme for third generation mobile radio systems, called Universal Mobile Telecommunications System (UMTS). This agreement recommends the use of WCDMA (wideband CDMA) in the paired portion of the radio spectrum (FDD mode) and TD-CDMA (time division CDMA) in the unpaired portion (TDD mode). TD-CDMA is a TDMA based system where multiple users are separated by spreading codes within one time slot.

* Marius Vollmer works on his PhD thesis both at the University of Dortmund and at Siemens AG. Arbeitsgebiet Datentechnik, Universität Dortmund, 44221 Dortmund, Germany, e-mail: [email protected]'tmund.de
** Martin Haardt works at Siemens AG, ICN CA CTO 7.
*** Jürgen Götze is head of the Information Processing Lab at the University of Dortmund.

Ann. Télécommun., 54, n° 7-8, 1999

366 M. VOLLMER - SCHUR ALGORITHMS FOR JOINT-DETECTION IN TD-CDMA BASED MOBILE RADIO SYSTEMS

To overcome the near/far problem of traditional CDMA systems, receiver structures have been proposed for TD-CDMA that perform joint (or multiuser) detection [1]. A joint detector combines the knowledge about all users that share one time slot into one large system of equations [2], [1], [3], [4]. This knowledge consists of the channel impulse responses that have been estimated from training sequences, the spreading codes, and the received antenna samples. The resulting system of equations can be very large and thus algorithms must be developed to exploit its special structural characteristics, namely its band and block-Toeplitz structure. In [5] an approach based on the Cholesky algorithm was presented. The band structure of the system matrix leads to an approximate block-Toeplitz structure in the desired Cholesky factor. This has been exploited by computing the Cholesky factor of a smaller subproblem and using it to approximate the complete Cholesky factor from copies of that smaller factor [6].

In this paper, we show how the Toeplitz structure of the system matrix can be directly exploited by the Schur algorithm. While the resulting computational complexity is not significantly lower than for the approximating Cholesky algorithm, the Schur algorithm is more amenable to a fine-grained parallel implementation with systolic processor arrays. In addition, it provides a flexible framework for engineering joint-detection algorithms in situations where the approximated Cholesky algorithm is not efficient. We show how this flexibility can be put to good use when decision feedback is introduced into the joint detection process.

After detailing the system model of the TD-CDMA system and explaining the basic joint detection process in sections II and III, this paper shows how to derive a generalized Schur algorithm that can cope with the specific kind of Toeplitz-derived structure of the system matrix (section IV) and how to apply approximation techniques to it (section V). In section VI, we present comparisons of the computational complexity for two algorithms, and then conclude.

II. SYSTEM MODEL

Performing joint detection can be reduced to solving a system of linear equations. In the sequel, we will explain the construction of this system of equations and the underlying data model.

II.1. TD-CDMA

In the TD-CDMA system, $K$ CDMA codes are simultaneously active on the same frequency and in the same time slot. The different spreading codes allow the signal separation at the receiver. According to the required data rate, a given user might use several CDMA codes and/or time slots. The frame structure for this time-slotted CDMA concept is illustrated in Figure 1, where $B$, $T_{fr}$, $N_{fr}$, $T_{bu}$, and $K$ denote the bandwidth of a frequency slot, the duration of a TDMA frame, the number of bursts per TDMA frame, the burst duration, and the number of CDMA codes per frequency and time slot, respectively. A burst consists of a guard interval and two data blocks (of $N$ symbols each), separated by a user specific midamble which contains $L_m$ chips and is used for channel estimation [3], see Figure 1. In Figure 2, the structure of one time slot is illustrated for the $k$th midamble and the $k$th spreading code. Here, $Q$ denotes the spreading factor of the data symbols.

FIG. 1. - Frame structure of the TD-CDMA system. Here, $B$, $T_{fr}$, $N_{fr}$, $T_{bu}$, and $K$ denote the bandwidth of a frequency slot, the duration of a TDMA frame, the number of bursts per TDMA frame, the burst duration, and the number of CDMA codes per frequency and time slot, respectively.

II.2. Discrete time data model

The received measurements at the $m$th antenna during the duration of one time slot are pictured in Figure 3. The parts labeled a of the received measurement vector are only influenced by the corresponding data blocks, the c part is exclusively determined by the transmitted midambles, and the b parts of the measurement vector are influenced by the transmitted midambles and the corresponding data blocks. The channel impulse response (CIR) vectors between the $k$th mobile and the $m$th antenna, $h^{(k,m)} \in \mathbb{C}^{W}$, are estimated from the c part of this received measurement vector as, for instance, described in [7].

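Let us combine the $N$ data symbols $d_n^{(k)}$, $1 \le n \le N$, that are transmitted on the $k$th spreading code during one data block (half burst) to the vector

$$d^{(k)} = \begin{bmatrix} d_1^{(k)} \\ d_2^{(k)} \\ \vdots \\ d_N^{(k)} \end{bmatrix} \in \mathbb{C}^{N}, \quad 1 \le k \le K. \tag{1}$$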

FIG. 2. - Time slot structure of the TD-CDMA system. Here, $T_{bu}$, $T_s$, $T_c$, and $Q$ denote the burst duration, the symbol duration, the chip duration, and the spreading factor of the data symbols, respectively.

FIG. 3. - Received measurements at the $m$th antenna during the duration of one time slot. In this illustration, the additive noise and inter-cell interference are not considered. The parts labeled a of the received measurement vector are only influenced by the corresponding data block, the c part only by the transmitted midamble, and the b parts by the midamble and the corresponding data blocks.

The $k$th spreading code consists of $Q$ complex chips $c_q^{(k)}$, $1 \le q \le Q$, and is denoted as

$$c^{(k)} = \begin{bmatrix} c_1^{(k)} \\ c_2^{(k)} \\ \vdots \\ c_Q^{(k)} \end{bmatrix} \in \mathbb{C}^{Q}, \quad 1 \le k \le K.$$

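With this definition, the block-diagonal spreading matrix $C^{(k)}$ that corresponds to the $k$th code can be written as

$$C^{(k)} = I_N \otimes c^{(k)} = \begin{bmatrix} c^{(k)} & & & \\ & c^{(k)} & & \\ & & \ddots & \\ & & & c^{(k)} \end{bmatrix} \in \mathbb{C}^{NQ \times N}. \tag{2}$$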
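As a small illustration of equation (2), the following sketch builds $C^{(k)} = I_N \otimes c^{(k)}$ explicitly and applies it to a data block. The code values, the data symbols, and the helper names `spreading_matrix` and `matvec` are hypothetical and chosen only for this example.

```python
def spreading_matrix(code, n_symbols):
    """Return C = I_N (Kronecker) c as a list of rows, size NQ x N."""
    q = len(code)
    C = [[0j] * n_symbols for _ in range(n_symbols * q)]
    for n in range(n_symbols):
        for i, chip in enumerate(code):
            C[n * q + i][n] = chip  # the n-th copy of the code sits in column n
    return C


def matvec(A, v):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


code = [1, -1, 1j, -1j]        # hypothetical spreading code, Q = 4
data = [1 + 0j, -1 + 0j, 1j]   # hypothetical data symbols, N = 3

# Each data symbol is replaced by Q chips: symbol times code.
chips = matvec(spreading_matrix(code, len(data)), data)
expected = [d * c for d in data for c in code]
```

The product `chips` has $NQ$ entries, one spread copy of the code per data symbol, which is what the block-diagonal structure in (2) expresses.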

Assume that $K$ spreading codes are transmitted at the same time. After eliminating the influence of the midamble, cf. Figure 3, the received measurements at the $m$th antenna obey the following linear model:

$$x^{(m)} = \sum_{k=1}^{K} \underbrace{H^{(k,m)} C^{(k)}}_{B^{(k,m)}} \, d^{(k)} + n^{(m)}, \quad 1 \le m \le M, \tag{3}$$

with

$$x^{(m)} = \begin{bmatrix} x_1^{(m)} \\ x_2^{(m)} \\ \vdots \\ x_{NQ+W-1}^{(m)} \end{bmatrix} \tag{4}$$

and $n^{(m)}$ the corresponding vector of noise samples $n_1^{(m)}, n_2^{(m)}, \ldots, n_{NQ+W-1}^{(m)}$.

Notice that $H^{(k,m)} \in \mathbb{C}^{(NQ+W-1) \times NQ}$ is a Toeplitz matrix that contains estimates of the channel impulse response (CIR) vector $h^{(k,m)} \in \mathbb{C}^{W}$ of the $k$th user (corresponding to the $k$th spreading code) at the $m$th antenna. By comparing (3) with Figure 3, we can identify $s^{(k,\ell)} = C^{(k)} d^{(k,\ell)}$, $\ell = 1, 2$, where the parameter $\ell$ determines whether the first or the second data block of a time slot is treated. In the sequel, we will omit the parameter $\ell$ for notational convenience.

Using the definition of $B^{(k,m)}$ in (3), $x^{(m)}$ may be expressed as

$$x^{(m)} = \sum_{k=1}^{K} B^{(k,m)} d^{(k)} + n^{(m)}, \quad 1 \le m \le M, \tag{5}$$

where $B^{(k,m)} \in \mathbb{C}^{(NQ+W-1) \times N}$ is a block-Toeplitz matrix whose $n$th column contains the combined CIR vector $b^{(k,m)}$, shifted down by $(n-1)Q$ rows. The combined CIR vector can be expressed as the convolution of the channel impulse response (CIR) vector $h^{(k,m)}$ with the corresponding spreading code $c^{(k)}$, i.e.,

$$b^{(k,m)} = \begin{bmatrix} b_1^{(k,m)} \\ b_2^{(k,m)} \\ \vdots \\ b_{Q+W-1}^{(k,m)} \end{bmatrix} = h^{(k,m)} * c^{(k)} \in \mathbb{C}^{Q+W-1}, \quad 1 \le k \le K, \; 1 \le m \le M. \tag{6}$$
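The equivalence between the two forms of the model, $H^{(k,m)} C^{(k)} d^{(k)}$ in (3) and $B^{(k,m)} d^{(k)}$ in (5) with $b^{(k,m)} = h^{(k,m)} * c^{(k)}$, can be checked numerically. The following is a minimal sketch for a single user and a single antenna; the channel taps, code, symbols, and helper names are hypothetical.

```python
def convolve(h, c):
    """Full linear convolution of two sequences, as in equation (6)."""
    out = [0j] * (len(h) + len(c) - 1)
    for i, hi in enumerate(h):
        for j, cj in enumerate(c):
            out[i + j] += hi * cj
    return out


def shifted_columns(vec, n_cols, step):
    """Matrix whose j-th column is `vec` shifted down by j * step rows."""
    n_rows = (n_cols - 1) * step + len(vec)
    A = [[0j] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for i, v in enumerate(vec):
            A[j * step + i][j] = v
    return A


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


h = [1.0, 0.5j, -0.25]         # hypothetical CIR estimate, W = 3
c = [1, -1, 1j, -1j]           # hypothetical spreading code, Q = 4
d = [1 + 0j, -1 + 0j, 1j]      # hypothetical data symbols, N = 3
N, Q, W = len(d), len(c), len(h)

b = convolve(h, c)                  # combined CIR vector, length Q + W - 1
B = shifted_columns(b, N, Q)        # (NQ + W - 1) x N, columns shifted by Q
H = shifted_columns(h, N * Q, 1)    # (NQ + W - 1) x NQ Toeplitz matrix
C = shifted_columns(c, N, Q)        # NQ x N, equals I_N kron c

via_B = matvec(B, d)                # model (5)
via_HC = matvec(H, matvec(C, d))    # model (3)
max_diff = max(abs(u - v) for u, v in zip(via_B, via_HC))
```

Working with $B^{(k,m)}$ directly avoids forming the much larger matrix $H^{(k,m)}$, which has $NQ$ columns instead of $N$.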

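The combined CIR vectors $b^{(k,m)}$ of the $k$th user at all $M$ antennas can be combined into a single combined CIR vector in the space-time domain

$$b^{(k)} = \operatorname{vec}\left\{ \begin{bmatrix} b^{(k,1)T} \\ b^{(k,2)T} \\ \vdots \\ b^{(k,M)T} \end{bmatrix} \right\} = \operatorname{vec}\left\{ \begin{bmatrix} b_1^{(k,1)} & b_2^{(k,1)} & \cdots & b_{Q+W-1}^{(k,1)} \\ b_1^{(k,2)} & b_2^{(k,2)} & \cdots & b_{Q+W-1}^{(k,2)} \\ \vdots & \vdots & & \vdots \\ b_1^{(k,M)} & b_2^{(k,M)} & \cdots & b_{Q+W-1}^{(k,M)} \end{bmatrix} \right\} \in \mathbb{C}^{M(Q+W-1)}, \quad 1 \le k \le K. \tag{7}$$

Here, $\operatorname{vec}\{A\}$ denotes the reshaping of a matrix into a vector, such that the first elements of the vector are formed by the first column of the matrix, the next elements by the second column, and so on. Moreover, let us define the space-time array measurement vector (during one half burst) as

$$x = \operatorname{vec}\left\{ \begin{bmatrix} x^{(1)T} \\ x^{(2)T} \\ \vdots \\ x^{(M)T} \end{bmatrix} \right\} = \operatorname{vec}\left\{ \begin{bmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_{NQ+W-1}^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_{NQ+W-1}^{(2)} \\ \vdots & \vdots & & \vdots \\ x_1^{(M)} & x_2^{(M)} & \cdots & x_{NQ+W-1}^{(M)} \end{bmatrix} \right\} \in \mathbb{C}^{M(NQ+W-1)}. \tag{8}$$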

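Using equation (5), the vector $x$ may then be expressed as

$$x = \sum_{k=1}^{K} B^{(k)} d^{(k)} + n \tag{9}$$

$$\phantom{x} = \sum_{k=1}^{K} \begin{bmatrix} b^{(k)} & & \\ & b^{(k)} & \\ & & \ddots \end{bmatrix} d^{(k)} + n, \tag{10}$$

where the matrix $B^{(k)} \in \mathbb{C}^{M(NQ+W-1) \times N}$ is constructed from $b^{(k)}$ in the same way as $B^{(k,m)}$ is constructed from $b^{(k,m)}$, with successive columns shifted down by $MQ$ rows. In equation (9), the space-time vector

$$n = \operatorname{vec}\left\{ \begin{bmatrix} n^{(1)T} \\ n^{(2)T} \\ \vdots \\ n^{(M)T} \end{bmatrix} \right\} = \operatorname{vec}\left\{ \begin{bmatrix} n_1^{(1)} & n_2^{(1)} & \cdots & n_{NQ+W-1}^{(1)} \\ n_1^{(2)} & n_2^{(2)} & \cdots & n_{NQ+W-1}^{(2)} \\ \vdots & \vdots & & \vdots \\ n_1^{(M)} & n_2^{(M)} & \cdots & n_{NQ+W-1}^{(M)} \end{bmatrix} \right\} \in \mathbb{C}^{M(NQ+W-1)} \tag{11}$$

models inter-cell interference, i.e., dominant interferers from adjacent cells, and additive (thermal) noise.

It is desirable to simplify the explicit summation over all users in equation (9) to a single matrix vector product such that we get the final form of the system equation:

$$x = T d + n. \tag{12}$$

However, there are multiple ways to combine the matrices $B^{(k)}$ for all $K$ users into the single system matrix $T$. This arrangement dictates the sequence in which the estimated data symbols become available during the joint detection process, as we will see below. The arrangement that leads to a low computational complexity and to the approximation opportunities that have been exploited in [6] does not offer the right sequence to properly introduce decision feedback into the joint detector. We will first construct $T$ such that the resulting computational complexity is minimized. Later, after introducing decision feedback itself, we will show the construction of a slightly different system matrix $\tilde{T}$ that allows decision feedback to be more effective.

Using the definition of $d^{(k)} \in \mathbb{C}^{N}$ in (1), the transmitted data symbols of all $K$ users are combined in the following fashion:

$$d = \operatorname{vec}\left\{ \begin{bmatrix} d^{(1)T} \\ d^{(2)T} \\ \vdots \\ d^{(K)T} \end{bmatrix} \right\} = \operatorname{vec}\left\{ \begin{bmatrix} d_1^{(1)} & d_2^{(1)} & \cdots & d_N^{(1)} \\ d_1^{(2)} & d_2^{(2)} & \cdots & d_N^{(2)} \\ \vdots & \vdots & & \vdots \\ d_1^{(K)} & d_2^{(K)} & \cdots & d_N^{(K)} \end{bmatrix} \right\} \in \mathbb{C}^{NK}. \tag{13}$$

This leads to the following construction of $T$:

$$T = \begin{bmatrix} V & & \\ & V & \\ & & \ddots \end{bmatrix} \in \mathbb{C}^{M(NQ+W-1) \times NK}, \tag{14}$$

where each of the $N$ block columns contains $V$, shifted down by $MQ$ rows relative to the previous block column, and where the matrix

$$V = \begin{bmatrix} b^{(1)} & b^{(2)} & \cdots & b^{(K)} \end{bmatrix} \in \mathbb{C}^{M(Q+W-1) \times K} \tag{15}$$

contains the combined CIR vectors of all $K$ users in the space-time domain.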


III. JOINT DATA DETECTION VIA BLOCK LINEAR EQUALIZATION

Given the linear space-time data model in equation (12), we want to find a linear estimate of the $N$ data symbols transmitted by each of the $K$ users during the duration of one block (half burst), i.e., a block linear equalizer, such that

$$\hat{d} = W^H x = \begin{bmatrix} w_1 & w_2 & \cdots & w_{NK} \end{bmatrix}^H x \in \mathbb{C}^{NK}. \tag{16}$$

In the sequel, two alternative solutions are presented.

III.1. Least-squares solution

In the first case, we choose the space-time weighting matrix $W^H \in \mathbb{C}^{(NK) \times M(NQ+W-1)}$ in (16) such that the Euclidean norm of the error

$$\| x - T \hat{d} \|_2^2$$

is minimized. It is given by the standard least squares (LS) solution, where $W^H$ is equal to the Moore-Penrose pseudo inverse (generalized inverse) of $T$, i.e.,

$$\hat{d} = W^H x = T^{+} x = (T^H T)^{-1} T^H x. \tag{17}$$

Notice that the LS solution does not take inter-cell interference into account, i.e., strong interferers from adjacent cells are not canceled.

III.2. Weighted least-squares solution

Alternatively, we want to find $NK$ space-time weight vectors $w_i$ that minimize the variance of the estimated data symbols

$$w_i = \arg\min_{w_i} E\{ |\hat{d}_n^{(k)}|^2 \} \quad \text{with} \quad \hat{d}_n^{(k)} = w_i^H x, \quad 1 \le n \le N, \; 1 \le k \le K, \; i = k + (n-1)K,$$

such that

$$W^H T = \begin{bmatrix} w_1 & w_2 & \cdots & w_{NK} \end{bmatrix}^H T = I_{NK}. \tag{18}$$

In the literature, this approach is called minimum variance distortionless response (MVDR) solution, zero-forcing block linear equalizer, linear minimum variance unbiased estimate, weighted least squares estimate, or Gauss-Markov estimate [8]. The solution of this constrained optimization problem is given by

$$\hat{d} = W^H x = (T^H R_{nn}^{-1} T)^{-1} T^H R_{nn}^{-1} x, \tag{19}$$

where we assume that the space-time covariance matrix of the inter-cell interference-plus-noise $R_{nn} = E\{ n n^H \}$ is non-singular.¹ It contains contributions from the dominant interferers from adjacent cells and from the additive noise. Notice that the weighted least-squares solution (19) simplifies to the standard least-squares solution (17) if $R_{nn} = \sigma_n^2 I_{M(NQ+W-1)}$. Furthermore, the solution given in (19) minimizes the following weighted norm of the error

$$\| L_{nn} (x - T \hat{d}) \|_2^2,$$

where $L_{nn}$ is a square root of $R_{nn}^{-1}$ such that

$$L_{nn}^H L_{nn} = R_{nn}^{-1}.$$

This relation can be used to easily extend an algorithm that computes the least-squares solution to one that computes the weighted least-squares solution. Alternatively, one could use the algorithm due to Paige to solve the generalized least-squares problem directly [9].

III.3. Decision feedback

As we will see in the sequel, the last step of the proposed estimation procedure is the solution of a triangular system of linear equations, i.e.,

$$R \hat{d} = z, \tag{20}$$

where $R \in \mathbb{C}^{NK \times NK}$ is upper triangular. Equation (20) can easily be solved via back-substitution. It has been proposed to include decision feedback into this process to improve the bit error rate performance of the transmission system [1], [2]. Figure 4 shows the procedure used for solving (20) with decision feedback. The function map(x) maps x to the nearest member of the symbol alphabet, i.e., it implements the decision. As can be seen, the first decision influences all symbols contained in $\hat{d}$. Therefore, it could be advantageous to arrange $d$ in such a way that the symbols of the strongest user are found in the last elements of $d$. Let us denote this new arrangement of the data symbols of all users by $\tilde{d}$ and the corresponding system matrix by $\tilde{T}$.

Then we have

$$\tilde{d} = \begin{bmatrix} d^{(1)} \\ d^{(2)} \\ \vdots \\ d^{(K)} \end{bmatrix} \quad \text{and} \quad \tilde{T} = \begin{bmatrix} B^{(1)} & B^{(2)} & \cdots & B^{(K)} \end{bmatrix}. \tag{21}$$
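The back-substitution with decision feedback described above (the procedure of Figure 4) can be sketched as follows. This is an illustrative version only: the QPSK alphabet, the small triangular matrix, the disturbance, and the function names are hypothetical.

```python
def nearest_symbol(value, alphabet):
    """The map(x) decision: closest member of the symbol alphabet."""
    return min(alphabet, key=lambda s: abs(s - value))


def back_substitute(R, z, alphabet=None):
    """Solve R d = z for upper triangular R.

    If an alphabet is given, decided symbols are fed back into the
    remaining equations; otherwise the soft values are fed back.
    """
    n = len(z)
    d = [0j] * n   # soft estimates
    t = [0j] * n   # values fed back (decisions or soft values)
    for i in range(n - 1, -1, -1):
        acc = sum(R[i][j] * t[j] for j in range(i + 1, n))
        d[i] = (z[i] - acc) / R[i][i]
        t[i] = nearest_symbol(d[i], alphabet) if alphabet else d[i]
    return d, t


qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # hypothetical alphabet
R = [[2.0, 0.5, 0.2],
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
sent = [1 + 1j, -1 + 1j, -1 - 1j]
z = [sum(R[i][j] * sent[j] for j in range(3)) for i in range(3)]
z[2] += 0.3 - 0.2j   # hypothetical disturbance on the last equation

soft, decided = back_substitute(R, z, qpsk)   # with decision feedback
plain, _ = back_substitute(R, z)              # ordinary back-substitution
```

In this toy case the disturbance on the last equation is removed by the first decision, so it does not propagate into the earlier symbols, whereas plain back-substitution carries it along.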

1. It can be shown that the MVDR solution also minimizes the variance of the estimation error $E\{ |e_n^{(k)}|^2 \}$ subject to (18), where $e_n^{(k)}$ is defined as $e_n^{(k)} = d_n^{(k)} - \hat{d}_n^{(k)} = d_n^{(k)} - w_i^H x$, $1 \le k \le K$, $1 \le n \le N$, $i = k + (n-1)K$.


    for i from n down to 1
        d(i) ← (z(i) − Σ_{j=i+1}^{n} R(i,j) t(j)) / R(i,i)
        t(i) ← map(d(i))
    end

FIG. 4. - Performing decision feedback while back substituting. The function map(x) maps x to the nearest member of the symbol alphabet, i.e., it implements the decision.

The structure of $\tilde{T}$ and related matrices is depicted in Figure 5. For comparison, Figure 6 shows the structures of the corresponding matrices for $T$ as defined in (14). As can be seen, the matrices $T$ and $T^H T$ have a block-Sylvester structure and the strong band structure is inherited by the Cholesky factor $R$. On the other hand, $\tilde{T}$ and $\tilde{T}^H \tilde{T}$ are Sylvester-block structured and the Cholesky factor $\tilde{R}$ does not inherit their sparseness. The latter fact is the main reason why the previously used Cholesky algorithm cannot be successfully applied to the decision feedback problem.

(a) $\tilde{T}$ (b) $\tilde{T}^H \tilde{T}$ (c) $\tilde{R}$

FIG. 5. - The sparseness of the matrices for decision feedback. The black regions represent non-zero matrix elements. In the depicted example, the following parameters were chosen: N = 69, K = 8, Q = 16, W = 60.

(a) $T$ (b) $T^H T$ (c) $R$

FIG. 6. - The sparseness of the compact matrices. The black regions represent non-zero matrix elements. In the depicted example, the following parameters were chosen: N = 69, K = 8, Q = 16, W = 60.

IV. THE SCHUR ALGORITHM

The Schur algorithm is a way to efficiently compute the QR decomposition of a Toeplitz-structured matrix. Such a QR decomposition can be used to find the solution (17) of the least squares problem [9]. In the following, we will outline the general Schur algorithm for matrices that have some kind of Toeplitz derived structure. As we go along, we will use the insights to solve (17) for both the block-Toeplitz matrix $T$ in (14), as well as the Toeplitz-block matrix $\tilde{T}$. In the following we will therefore not distinguish between $T$ and $\tilde{T}$. Instead, we will use the general matrix $U$ representing both.

The QR decomposition of $U \in \mathbb{C}^{n \times m}$ consists of finding two matrices $Q$ and $R$ such that

$$U = Q R, \tag{22}$$

where $Q \in \mathbb{C}^{n \times m}$ has orthonormal columns, i.e., $Q^H Q = I$. The matrix $R \in \mathbb{C}^{m \times m}$ is upper triangular and is also known as the Cholesky factor of $U^H U$. It can be used to find the vector $a$ that minimizes

$$\| U a - b \|_2 \tag{23}$$

by solving the triangular system of equations

$$R a = Q^H b. \tag{24}$$

It should be noted that although $U$ is Toeplitz structured, the resulting matrix $R$ does not inherit this structure. Also, it is possible to compute the right hand side $Q^H b$ in parallel to computing $R$. Therefore, we will concentrate on a procedure to compute $R$ and will later show how to extend it to compute $Q^H b$ as well.
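To make the role of $R$ and $Q^H b$ in (22)-(24) concrete, here is a minimal real-valued sketch. It uses a plain modified Gram-Schmidt QR rather than the Schur algorithm discussed in this paper, and the small matrix, right hand side, and function names are hypothetical.

```python
from math import sqrt


def qr_gram_schmidt(U):
    """Thin QR of a real n x m matrix (list of rows) by modified Gram-Schmidt."""
    n, m = len(U), len(U[0])
    cols = [[U[r][c] for r in range(n)] for c in range(m)]
    Q_cols, R = [], [[0.0] * m for _ in range(m)]
    for j in range(m):
        v = cols[j][:]
        for i, q in enumerate(Q_cols):
            R[i][j] = sum(qr * vr for qr, vr in zip(q, v))
            v = [vr - R[i][j] * qr for qr, vr in zip(q, v)]
        R[j][j] = sqrt(sum(vr * vr for vr in v))
        Q_cols.append([vr / R[j][j] for vr in v])
    return Q_cols, R


def solve_upper(R, rhs):
    """Back-substitution for an upper triangular system."""
    m = len(rhs)
    a = [0.0] * m
    for i in range(m - 1, -1, -1):
        a[i] = (rhs[i] - sum(R[i][j] * a[j] for j in range(i + 1, m))) / R[i][i]
    return a


U = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.5], [0.0, 0.0]]   # hypothetical 4 x 2 matrix
b = [1.0, 2.0, 0.0, 1.0]                               # hypothetical right hand side

Q_cols, R = qr_gram_schmidt(U)
rhs = [sum(qr * br for qr, br in zip(q, b)) for q in Q_cols]   # Q^T b
a = solve_upper(R, rhs)                                         # equation (24)

# Least-squares optimality: the residual is orthogonal to the columns of U.
residual = [sum(U[r][c] * a[c] for c in range(2)) - b[r] for r in range(4)]
gradient = [sum(U[r][c] * residual[r] for r in range(4)) for c in range(2)]

# R is the Cholesky factor of the Gramian: R^T R = U^T U.
gram = [[sum(U[r][i] * U[r][j] for r in range(4)) for j in range(2)] for i in range(2)]
rtr = [[sum(R[k][i] * R[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
```

The last check is the property the Schur algorithm relies on: $R$ can be obtained from the Gramian $U^H U$ alone, without forming $Q$.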

The most general form of the Schur algorithm [10] starts from a displacement representation of the Gramian matrix $U^H U$ that is easy to obtain and then proceeds to transform this decomposition into $R$ via orthogonal and hyperbolic rotations [11], [12]. The displacement representation consists of finding a way to exploit the whole structure of a matrix. For the positive definite, Hermitian matrix $S = U^H U$, the displacement representation can be expressed as

$$S - Z S Z^T = \sum_{i=1}^{L} \alpha_i^H \alpha_i - \sum_{i=1}^{L} \beta_i^H \beta_i. \tag{25}$$

The matrix $Z$ is a shift matrix that has been constructed to reflect the Toeplitz structure of $S$. It is different depending on whether we want to invert $T$ or $\tilde{T}$.

The vectors $\alpha_i$ and $\beta_i$ are $1 \times m$ row vectors, i.e., the outer product $\alpha_i^H \alpha_i$ is an $m \times m$ matrix. These vectors are called the generators of $S$.



The advantage of the Schur algorithm for structured matrices is that it is sufficient to work with the generators $\alpha_i$ and $\beta_i$ (instead of the full matrix $S$) to compute $R$. The parameter $L$ determines how many vector pairs are needed to express $S$. It, too, depends on the precise Toeplitz structure of $S$. The number $2L$ is called the displacement rank of $S$.

IV.1. Finding the generators

The first step in finding the generators $\alpha_i$ and $\beta_i$ is to choose an appropriate shift matrix $Z$. Such a matrix should make $D = S - Z S Z^T$ as sparse as possible. In addition, the transformation $S \to D$ should only introduce zeros but should not change the remaining elements of $S$. The last condition is required to ensure that the benefits gained from the fact that $S$ is positive semi-definite are not lost. That requirement also means that $D$ does not need to be computed explicitly since the introduced zeros are predictable and the rest can be read directly from $S$.

Once a suitable $Z$ has been chosen, we need to compute one generator pair $(\alpha_i, \beta_i)$ for each row that does not contain only zeros. For row $j$, the generators are computed according to

$$\alpha_i = D(j,:) / \sqrt{D(j,j)}, \qquad \beta_i = \alpha_i \ \text{except} \ \beta_i(j) = 0. \tag{26}$$

After the generator pair has been computed for such a row, its contribution is removed from $D$ before computing the next generator pair.
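The following sketch illustrates the displacement $D = S - Z S Z^T$ and the generator construction (26) for the simplest case: a real, scalar Toeplitz Gramian with a single generator pair ($L = 1$). The banded Toeplitz matrix built from the taps `h`, and all function names, are hypothetical and serve only as an example.

```python
from math import sqrt


def convolution_matrix(h, m):
    """Banded Toeplitz matrix whose j-th column is h shifted down by j rows."""
    U = [[0.0] * m for _ in range(len(h) + m - 1)]
    for j in range(m):
        for i, hi in enumerate(h):
            U[i + j][j] = hi
    return U


def gramian(U):
    m = len(U[0])
    return [[sum(row[i] * row[j] for row in U) for j in range(m)] for i in range(m)]


def displacement(S):
    """D = S - Z S Z^T for the unit lower shift Z: (Z S Z^T)[i][j] = S[i-1][j-1]."""
    m = len(S)
    return [[S[i][j] - (S[i - 1][j - 1] if i > 0 and j > 0 else 0.0)
             for j in range(m)] for i in range(m)]


def generators_from_row(D, j):
    """Generator pair for row j of D, following equation (26)."""
    alpha = [x / sqrt(D[j][j]) for x in D[j]]
    beta = alpha[:]
    beta[j] = 0.0
    return alpha, beta


h = [1.0, 0.5, -0.25]              # hypothetical taps
m = 4
S = gramian(convolution_matrix(h, m))
D = displacement(S)
alpha, beta = generators_from_row(D, 0)

# alpha^T alpha - beta^T beta reproduces D, i.e. the displacement rank is 2.
rebuilt = [[alpha[i] * alpha[j] - beta[i] * beta[j] for j in range(m)] for i in range(m)]
```

Because $S$ is Toeplitz here, $D$ is non-zero only in its first row and column, so one generator pair describes the whole $m \times m$ matrix.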

For the two specific matrices $T$ and $\tilde{T}$, Figures 7 and 8 show which shift matrix to use and how to compute the generators, respectively. The definition of such functions uses a Matlab inspired notation. The arrow ← denotes assignment: the value on the right side of it is computed and then assigned to the entity denoted on the left hand side. Matrix and vector subscription is denoted as X(r,c), where r selects rows of X and c selects columns. Both r and c are ranges. A range can have the following forms with the indicated meaning:

a        The single index a;
a:e      All indices from a to e inclusive;
a:s:e    All indices from a to e inclusive with a step size of s;
:        All indices that are valid as determined by the context.

The operator $\otimes$ denotes the Kronecker product defined as

$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots \\ a_{21} B & a_{22} B & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}.$$

The matrix $I_K$ is the $K \times K$ identity matrix and $Z_N^{(1)}$ is the $N \times N$ unit shift matrix

$$Z_N^{(1)} = \begin{bmatrix} 0 & & & \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{bmatrix}. \tag{27}$$

    Z ← Z_N^(1) ⊗ I_K
    S ← T^H T
    D ← S − Z S Z^T
    for i = 1:K
        α_i ← D(i,:) / √D(i,i)
        β_i ← α_i
        β_i(i) ← 0
        D(:,i) ← 0
    end

FIG. 7. - Computing the generators for $T$.

    Z ← I_K ⊗ Z_N^(1)
    S ← T̃^H T̃
    D ← S − Z S Z^T
    for i = 1:K
        j ← (i − 1)N + 1
        α_i ← D(j,:) / √D(j,j)
        β_i ← α_i
        β_i(j) ← 0
        D(:,j) ← 0
    end

FIG. 8. - Computing the generators for $\tilde{T}$.

IV.2. Finding R from the generators

The displacement representation (25) can be rewritten to directly express $S$ by introducing Krylov matrices. The Krylov matrix $\mathcal{U}(\alpha, Z)$ of a row vector $\alpha$ with respect to $Z$ is (for our purposes) an $m \times m$ matrix defined as

$$\mathcal{U}(\alpha, Z) = \begin{bmatrix} \alpha \\ \alpha Z^T \\ \vdots \\ \alpha (Z^T)^{m-1} \end{bmatrix}. \tag{28}$$

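Note that $\mathcal{U}(\alpha, Z)$ is an upper triangular matrix. In the sequel we omit the $Z$ argument to the Krylov constructor for simplicity. With this definition, (25) can be rewritten as

$$S = \sum_{i=1}^{L} \mathcal{U}^H(\alpha_i)\, \mathcal{U}(\alpha_i) - \sum_{i=1}^{L} \mathcal{U}^H(\beta_i)\, \mathcal{U}(\beta_i). \tag{29}$$

The goal of the Schur algorithm is now to gradually eliminate the contributions of $\mathcal{U}(\alpha_2), \mathcal{U}(\alpha_3), \ldots, \mathcal{U}(\alpha_L)$ and $\mathcal{U}(\beta_1), \mathcal{U}(\beta_2), \ldots, \mathcal{U}(\beta_L)$ so that only the first term of the first sum remains. The Schur algorithm preserves the upper triangular structure of the matrices. This transformation can be summarized as follows:

$$\begin{aligned} \mathcal{U}(\alpha_1) &\to R, && R \ \text{upper triangular}, \\ \mathcal{U}(\alpha_i) &\to 0, && 1 < i \le L, \\ \mathcal{U}(\beta_i) &\to 0, && 1 \le i \le L. \end{aligned} \tag{30}$$

It can be seen that with this transformation, the remaining matrix $R$ must indeed be the $R$ factor of the QR decomposition of $U$:

$$S = U^H U = R^H Q^H Q R = R^H R. \tag{31}$$

To derive the actual Schur algorithm, we rewrite (29) into the following form:

$$S = X^H J X, \tag{32}$$

where

$$X = \begin{bmatrix} \mathcal{U}(\alpha_1) \\ \mathcal{U}(\alpha_2) \\ \vdots \\ \mathcal{U}(\alpha_L) \\ \mathcal{U}(\beta_1) \\ \mathcal{U}(\beta_2) \\ \vdots \\ \mathcal{U}(\beta_L) \end{bmatrix} \quad \text{and} \quad J = \begin{bmatrix} I & & & & & \\ & \ddots & & & & \\ & & I & & & \\ & & & -I & & \\ & & & & \ddots & \\ & & & & & -I \end{bmatrix} \in \mathbb{C}^{2mL \times 2mL}. \tag{33}$$

Performing the transformation (30) can now be restated as finding a matrix $\Theta \in \mathbb{C}^{2mL \times 2mL}$ such that

$$\Theta X = \begin{bmatrix} R \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad \text{and} \quad \Theta^H J \Theta = J. \tag{34}$$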

The complete transformation $\Theta$ is constructed from elementary rotations $\Theta_i$, each introducing one zero into $X$. All transformations are applied from the left to $X$ such that

$$\Theta = \cdots \Theta_3 \Theta_2 \Theta_1. \tag{35}$$

The condition $\Theta^H J \Theta = J$ is needed to ensure that the transformation preserves the validity of (32). When this condition holds, $\Theta$ is said to be J-orthogonal. The easiest way to enforce $\Theta$ to be J-orthogonal is to construct each elementary transformation so that it is J-orthogonal as well.

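The general structure of an elementary transformation that introduces one zero into $X$ is

$$\Theta_i = \Theta(K, k, l) = \begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & k_{11} & \cdots & k_{12} & & \\ & & \vdots & \ddots & \vdots & & \\ & & k_{21} & \cdots & k_{22} & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix}, \quad \text{where} \quad K = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix}, \tag{36}$$

i.e., $\Theta_i$ equals the identity matrix except for the four entries at the intersections of rows and columns $k$ and $l$.

If such a transformation $\Theta_i$ is multiplied onto $X$ from the left, it will only affect rows $k$ and $l$ of $X$. The transformation kernel $K$ must be chosen so that the desired zero turns up in row $l$ after the transformation has been applied, and so that the condition $\Theta_i^H J \Theta_i = J$ is satisfied. The first requirement can be expressed as

$$K \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \ast \\ 0 \end{bmatrix}, \tag{37}$$

where $a$ and $b$ are elements from row $k$ and row $l$ of $X$, respectively. After $\Theta_i$ has been applied to $X$, the place that contained $b$ will be zero.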

Due to the structure of $J$ in (33) and since being J-orthogonal implies being (-J)-orthogonal, the kernel $K$ must satisfy one of two orthogonality constraints, depending on the choice of $k$ and $l$. The transformations that correspond to these two constraints are called unitary and hyperbolic, respectively.

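Unitary kernels arise if both row $k$ and row $l$ of $X$ belong to the region formed by the Krylov matrices $\mathcal{U}(\alpha_i)$, or both rows belong to the region of the $\mathcal{U}(\beta_i)$:

$$K^H \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} K = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \tag{38}$$

The unitary kernel is defined as

$$K = G(a, b) = \begin{cases} \dfrac{1}{\sqrt{1 + |\rho|^2}} \begin{bmatrix} 1 & \rho^* \\ -\rho & 1 \end{bmatrix}, \quad \rho = \dfrac{b}{a}, & \text{for } a \ne 0, \\[2ex] \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, & \text{for } a = 0. \end{cases} \tag{39}$$

The hyperbolic case arises when row $k$ of $X$ is in the $\mathcal{U}(\alpha_i)$ region and row $l$ is in the $\mathcal{U}(\beta_i)$ region:

$$K^H \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} K = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \tag{40}$$

The corresponding hyperbolic kernel is defined as

$$K = H(a, b) = \frac{1}{\sqrt{1 - |\rho|^2}} \begin{bmatrix} 1 & -\rho^* \\ -\rho & 1 \end{bmatrix}, \quad \rho = \frac{b}{a}, \quad |a| > |b|. \tag{41}$$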
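The two kinds of kernels can be checked numerically. The sketch below uses the standard Givens and hyperbolic rotation forms with $\rho = b/a$; the exact sign and conjugation conventions are an assumption of this example, as are the test values and the function names. It verifies that each kernel zeroes $b$ and satisfies its orthogonality constraint.

```python
from math import sqrt


def unitary_kernel(a, b):
    """2 x 2 unitary kernel that maps (a, b) to (*, 0)."""
    if a == 0:
        return [[0, 1], [1, 0]]          # simply swap the two rows
    rho = b / a
    s = sqrt(1 + abs(rho) ** 2)
    return [[1 / s, rho.conjugate() / s], [-rho / s, 1 / s]]


def hyperbolic_kernel(a, b):
    """2 x 2 hyperbolic kernel that maps (a, b) to (*, 0); needs |a| > |b|."""
    rho = b / a
    s = sqrt(1 - abs(rho) ** 2)
    return [[1 / s, -rho.conjugate() / s], [-rho / s, 1 / s]]


def apply_kernel(K, a, b):
    return K[0][0] * a + K[0][1] * b, K[1][0] * a + K[1][1] * b


def signature_form(K, sign):
    """Return K^H diag(1, sign) K as a 2 x 2 list."""
    j_diag = [1, sign]
    return [[sum(K[k][i].conjugate() * j_diag[k] * K[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


a, b = 2 + 1j, 0.5 - 1j                  # hypothetical entries with |a| > |b|

G = unitary_kernel(a, b)
g_top, g_low = apply_kernel(G, a, b)     # g_low vanishes, |g_top|^2 = |a|^2 + |b|^2

H = hyperbolic_kernel(a, b)
h_top, h_low = apply_kernel(H, a, b)     # h_low vanishes, |h_top|^2 = |a|^2 - |b|^2

GG = signature_form(G, 1)                # should equal diag(1, 1)
HH = signature_form(H, -1)               # should equal diag(1, -1)
```

The hyperbolic rotation preserves the difference $|a|^2 - |b|^2$ instead of the sum, which is why it requires $|a| > |b|$; this holds when $S$ is positive definite.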

The next step is to find a specific sequence of elementary transformations $\Theta_i$ that actually implements the complete transformation (34). This sequence cannot be chosen arbitrarily because care must be taken not to destroy desired zeros during one of the later elementary transformations. Figure 9 shows the first steps for a system with $L = 2$. As one can see, the non-zero triangles in the Krylov matrices are made smaller by removing one diagonal after the other. One step consists of first using unitary rotations to cancel the diagonals within the group formed by the $\alpha_i$, and within the group formed by the $\beta_i$; then, hyperbolic rotations are used to cancel the diagonal formed by $\beta_i$ with $\alpha_i$. After each step, the next row of the result is available.

One can deduce from Figure 9 which transformations are redundant due to the structure of the matrices and how this structure changes from step to step. All matrices except the first one (the one that originally was $\mathcal{U}(\alpha_1)$ and that is going to be transformed to $R$) can be expressed by the Krylov construction (28) throughout the whole transformation. Therefore, a lot of transformations are redundant because their result is already known from applying the transformation to the first row of the Krylov matrices. The first matrix is gradually transformed to $R$ and thus it loses its Krylov nature. However, the transformation proceeds row-wise: each step produces one more row of $R$ because the following steps will only affect the remaining rows. The remaining rows are expressible by the Krylov construction. To go from one step to the next, we need to go from one row of the matrix to the next; according to (28), this can be done by shifting the corresponding row vector with $Z$. It is therefore sufficient to work only with the row vectors $\alpha_i$ and $\beta_i$ and a place to store the result $R$ to carry out the Schur algorithm. Figure 10 repeats Figure 9 but shows only the relevant computations.

FIG. 9. - The first two steps in the Schur algorithm. In this example, L = 2.
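The row-wise procedure can be illustrated for the simplest possible case: a real, scalar Toeplitz Gramian with a single generator pair ($L = 1$), where no unitary rotations are needed and each step is one shift followed by one hyperbolic rotation. This is a simplified sketch under these assumptions, not the block algorithm for $T$ or $\tilde{T}$; the taps `h` and all names are hypothetical.

```python
from math import sqrt


def schur_cholesky(alpha, beta):
    """Rows of upper triangular R with R^T R = S, from one real generator pair."""
    m = len(alpha)
    a, b = list(alpha), list(beta)
    R = [a[:]]                          # the first row of R is alpha itself
    for k in range(1, m):
        a = [0.0] + a[:-1]              # shift alpha one place right (alpha Z^T)
        rho = b[k] / a[k]
        s = sqrt(1.0 - rho * rho)       # |rho| < 1 if S is positive definite
        a, b = ([(x - rho * y) / s for x, y in zip(a, b)],
                [(y - rho * x) / s for x, y in zip(a, b)])
        b[k] = 0.0                      # zero by construction of the rotation
        R.append(a[:])                  # row k of R is now available
    return R


h = [1.0, 0.5, -0.25]                   # hypothetical taps of a banded Toeplitz U
m = 4
# For such a U, S = U^T U is Toeplitz and holds the autocorrelation of h.
acf = [float(sum(h[i] * h[i + lag] for i in range(len(h) - lag))) for lag in range(m)]
S = [[acf[abs(i - j)] for j in range(m)] for i in range(m)]

# Generators from the first row of D = S - Z S Z^T, which equals the first row of S.
alpha = [x / sqrt(acf[0]) for x in acf]
beta = [0.0] + alpha[1:]

R = schur_cholesky(alpha, beta)
RtR = [[sum(R[k][i] * R[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
```

Only the two row vectors and the output rows are stored, so the work per step is proportional to $m$ rather than $m^2$.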

FIG. 10. - The Schur algorithm where only the relevant computations are shown.

IV.3. The right hand side

As mentioned in the beginning, the right hand side $Q^H b$ can be computed in parallel to computing $R$. This is done by finding a vector $y$ such that

$$\Theta y = \begin{bmatrix} q \\ * \\ \vdots \\ * \end{bmatrix}, \quad \text{and} \quad q = Q^H b, \tag{42}$$

where $*$ denotes vector elements of no further interest. The vector $y$ can be deduced from the relation

$$b^H U = y^H J X, \tag{43}$$

since after the transformation $\Theta$ has been applied to both $X$ and $y$, we get

$$b^H U = \begin{bmatrix} q^H & * & \cdots \end{bmatrix} J \begin{bmatrix} R \\ 0 \end{bmatrix} = q^H R. \tag{44}$$

Substituting $U = QR$ into (44) shows that indeed $q = Q^H b$ if $y$ has been chosen according to (43).

There is a lot of freedom in choosing $y$. The following restriction on the structure of $y$ simplifies the process:

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_L \\ y_1 \\ \vdots \\ y_L \end{bmatrix}. \tag{45}$$

With this construction, (43) can be expanded into

$$b^H U = \sum_{i=1}^{L} y_i^H \left( U(\alpha_i) - U(\beta_i) \right). \tag{46}$$

The nice property of this formulation is that $U(\alpha_i) - U(\beta_i)$ has only columns with at most one non-zero entry, all these non-zero entries have the same value, and no other difference $U(\alpha_j) - U(\beta_j)$ with $j \neq i$ has non-zero entries in the same columns, when $Z$ has been chosen according to the requirements above.

Stated in another way, equation (46) can be partitioned column-wise. In addition, due to the structure of $Z$ (both variants), only the first $N$ elements of each $y_i$ really matter in the calculations. In the sequel, we will use the abbreviation $\hat{y}_i = y_i(1:N)$ to simplify the notation accordingly. The column-wise partitioning depends on the precise structure of $Z$. Therefore, we have two methods to compute the $\hat{y}_i$, see Figures 11 and 12, respectively.

    for $i = 1 : K$
        $\hat{y}_i = T^H(:,\ i : K : (N-1)K + i)\ x \,/\, \alpha_i(i)$
    end

FIG. 11. - Computing the right hand side generators for $T$.

    for $i = 1 : K$
        $j = (i-1)N + 1$
        $\hat{y}_i = \tilde{T}^H(:,\ j : j + N - 1)\ x \,/\, \alpha_i(j)$
    end

FIG. 12. - Computing the right hand side generators for $\tilde{T}$.

Once we have computed the relevant parts of $y$, we need to apply $\Theta$ to them, including the parts of $\Theta$ that are redundant. We can do this by transposing each $\hat{y}_i$ into a row vector, pasting it onto the right of the corresponding $\alpha_i$ and $\beta_i$, and then applying the elementary transformations to this longer vector. Care must be taken to properly shift $\hat{y}_1$ when $\alpha_1$ is shifted. Taken together, this leads to the Schur algorithm, which is depicted in Figure 13.
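To make the interplay of generators, rotations, shifts and the right hand side concrete, the following sketch runs the procedure on the simplest special case: a real, symmetric positive definite scalar Toeplitz matrix with a single pair of generators, so that no unitary rotations are needed. This is a simplified stand-in under these assumptions, not the block algorithm of this paper; the function name and the direct use of the vector `c_vec` (playing the role of $U^H b$) are hypothetical choices for the illustration.

```python
import math

def schur_toeplitz(t, c_vec):
    """Scalar Schur sketch for the symmetric Toeplitz matrix with first row t.

    Returns (R, q) with T = R^T R, R upper triangular, and R^T q = c_vec.
    """
    n = len(t)
    s = math.sqrt(t[0])
    alpha = [ti / s for ti in t]               # positive generator
    beta = [0.0] + [ti / s for ti in t[1:]]    # negative generator
    # U(alpha) - U(beta) = alpha[0] * I here, so y = c / alpha[0] (cf. Fig. 11).
    ya = [ci / alpha[0] for ci in c_vec]
    yb = list(ya)
    R, q = [], []
    for j in range(n):
        rho = beta[j] / alpha[j]
        if abs(rho) >= 1.0:
            raise ValueError("matrix is not positive definite")
        k = 1.0 / math.sqrt(1.0 - rho * rho)
        # Hyperbolic rotation that cancels beta[j]; the same rotation is
        # applied to the right hand side vectors ya and yb.
        alpha, beta = ([k * (a - rho * b) for a, b in zip(alpha, beta)],
                       [k * (b - rho * a) for a, b in zip(alpha, beta)])
        ya, yb = ([k * (a - rho * b) for a, b in zip(ya, yb)],
                  [k * (b - rho * a) for a, b in zip(ya, yb)])
        R.append(alpha)               # next row of R
        q.append(ya[0])               # next element of q
        alpha = [0.0] + alpha[:-1]    # shift alpha one position to the right
        ya = ya[1:] + [0.0]           # drop the consumed element of ya
    return R, q
```

For example, `schur_toeplitz([4.0, 1.0, 0.5, 0.25], [1.0, -2.0, 3.0, 0.5])` returns a triangular factor and a transformed right hand side that can be passed directly to back-substitution. The order of operations inside the loop (rotate, store a row of $R$, store an element of $q$, shift) mirrors the structure described in the text.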

    for $i$ from 1 to $K$
        $y_i^{(a)} \leftarrow y_i^{(b)} \leftarrow \hat{y}_i$
    end
    for $j$ from 1 to $KN$
        for $i$ from 2 to $K$
            $\begin{bmatrix} \alpha_1 & y_1^{(a)T} \\ \alpha_i & y_i^{(a)T} \end{bmatrix} \leftarrow G(\alpha_1(j), \alpha_i(j)) \begin{bmatrix} \alpha_1 & y_1^{(a)T} \\ \alpha_i & y_i^{(a)T} \end{bmatrix}$
            $\begin{bmatrix} \beta_1 & y_1^{(b)T} \\ \beta_i & y_i^{(b)T} \end{bmatrix} \leftarrow G(\beta_1(j), \beta_i(j)) \begin{bmatrix} \beta_1 & y_1^{(b)T} \\ \beta_i & y_i^{(b)T} \end{bmatrix}$
        end
        $\begin{bmatrix} \alpha_1 & y_1^{(a)T} \\ \beta_1 & y_1^{(b)T} \end{bmatrix} \leftarrow H(\alpha_1(j), \beta_1(j)) \begin{bmatrix} \alpha_1 & y_1^{(a)T} \\ \beta_1 & y_1^{(b)T} \end{bmatrix}$
        $R(j, :) \leftarrow \alpha_1$
        $q(j) \leftarrow y_1^{(a)}(1)$
        $\alpha_1 \leftarrow \alpha_1 Z^T$
        $y_1^{(a)} \leftarrow [\, y_1^{(a)}(2:N) \;\; 0 \,]$
    end

FIG. 13. - The Schur algorithm. The matrices $y_i^{(a)}$ and $y_i^{(b)}$ carry the right hand side through the algorithm.

V. APPROXIMATIONS

In general, the matrix $R$ does not inherit the structure of $S$. However, the sparseness of $S$ leads to an approximate Toeplitz-derived structure in $R$ [6]. It is possible to exploit this fact by computing only the relevant parts of $R$ and filling the rest with copies of that part. Figure 14 shows how this can be done on a row-by-row basis, which is well suited for the Schur algorithm. For the decision feedback system matrix $\tilde{T}$, the approximations are interleaved with the algorithm, i.e., the receiver actually computes the first $d$ rows of $R$, then copies the last of them until $N$ rows are filled, and then resumes computing the next few rows.

FIG. 14. - Row-by-row approximation.

Figure 15 shows the bit error rate performance of a TD-CDMA system that uses the Schur algorithm for decision feedback in the receiver, applying the approximations as shown in Figure 14.

FIG. 15. - Performance degradation of approximations for a varying number of computed rows, d. K = 4, Q = 16, W = 60, N = 69, M = 1.

FIG. 16. - Computational complexity for a varying number of active codes, K. Q = 16, W = 57, N = 69, d = 10, M = 1.

These approximations can be used to save a large amount of computations without degrading the bit error rate performance of the system.

VI. COMPUTATIONAL COMPLEXITY

Algorithms that cannot exploit the Toeplitz structure of $S$ have a computational complexity of $O(N^3K^3)$ (see footnote 2). The generalized Schur algorithm, on the other hand, can exploit the Toeplitz structure and achieves a complexity of $O(N^2K^3)$. The number of symbols per block $N$ is generally much larger than the number of users $K$. By using approximations one can reduce the complexity of the decomposition to $O(NK^3)$ because the number of rows that need to be computed out of every $N$ depends only on the degree of intersymbol interference. However, the back-substitution remains of order $O(N^2K^2)$ because there is no band structure it could exploit in the decision feedback case.

2. The notation $O(f(n))$ means that the real complexity is some function $g(n)$ with $|g(n)| < C\,|f(n)|$ for some constant $C$ and large enough $n$.

Figures 16, 17, and 18 show the number of multiplications necessary for performing joint detection with decision feedback for two algorithms. Note that the figures use a logarithmic scale. The first algorithm, denoted as "Cholesky" in the legend, consists of a Cholesky decomposition of $S$ while applying the approximations outlined in section V. The algorithm denoted by "Schur" is the one presented in this paper, featuring the same approximations. Each figure shows the number of multiplications required to perform joint detection of one burst. Such a burst consists of two data blocks with $N$ symbols each.

FIG. 17. - Computational complexity for a varying number of symbols per data block, N. Q = 16, W = 57, K = 8, d = 10, M = 1. [Plot: curves "Cholesky" and "Schur" over the number of data symbols N, logarithmic vertical axis.]

FIG. 18. - Computational complexity for a varying number of computed rows d out of every N. Q = 16, W = 57, K = 8, N = 69, M = 1. [Plot: curves "Cholesky" and "Schur" over the number of rows d computed out of every N, logarithmic vertical axis.]

VII. CONCLUSIONS

By using a generalized Schur-type algorithm it is possible to exploit any variant of Toeplitz structure. For a system matrix $T$ that has been arranged as in Figure 6, there is no significant difference in the computational complexity of the standard Cholesky algorithm and a Schur-type algorithm. This is due to the dominance of the band structure. However, when the decision feedback formulation is used (Fig. 5), such that $T$ loses its band structure, the Schur-type algorithms can exploit the remaining Toeplitz-block structure and, therefore, allow the computational complexity to be reduced by a significant amount.

Approximations allow another large reduction of the computational effort needed to solve the joint detection problem. This has been achieved by generalizing the known approximation schemes.

Manuscript received June 28, 1999.

REFERENCES

[1] JUNG (P.), BLANZ (J. J.). Joint detection with coherent receiver antenna diversity in CDMA mobile radio systems, IEEE Trans. on Vehicular Technology, 44, pp. 76-88, (1995).

[2] JUNG (P.), BLANZ (J. J.), BAIER (P. W.). Coherent receiver antenna diversity for CDMA mobile radio systems using joint detection, in Proc. 4th IEEE Int. Symp. on Personal, Indoor and Mobile Radio Commun. (PIMRC), Yokohama, Japan, Sept. 1993, pp. 488-492.

[3] BLANZ (J. J.), KLEIN (A.), NAßHAN (M.), STEIL (A.). Performance of a cellular hybrid C/TDMA mobile radio system applying joint detection and coherent receiver antenna diversity, IEEE J. Select. Areas Commun., 12, pp. 568-579, (May 1994).

[4] OTTOSSON (T.). Coding, modulation and multiuser decoding for DS-CDMA systems, Ph. D. thesis, Chalmers University of Technology, Göteborg, Sweden, (1997).

[5] BLANZ (J. J.), KLEIN (A.), NAßHAN (M.), STEIL (A.). Performance of a cellular hybrid C/TDMA mobile radio system applying joint detection and coherent receiver antenna diversity, IEEE Journal on Selected Areas in Communications, 12, n° 4, (May 1994).

[6] MAYER (J.), SCHLEE (J.), WEBER (T.). Realtime feasibility of joint detection CDMA, in Proc. 2nd European Personal Mobile Communications Conference, Bonn, Germany, Sept. 1997, pp. 245-252.

[7] STEINER (B.), JUNG (P.). Optimum and suboptimum channel estimation for the uplink of CDMA mobile radio systems with joint detection, European Trans. on Telecommunications and Related Techniques, vol. 5, pp. 39-50, 1994.

[8] LUENBERGER (D. G.). Optimization by vector space methods, John Wiley and Sons, New York, NY, (1969).

[9] GOLUB (G. H.), VAN LOAN (C. F.). Matrix computations, The Johns Hopkins University Press, third edition, 1996.

[10] KAILATH (T.), CHUN (J.). Generalized displacement structure for block-Toeplitz, Toeplitz-block, and Toeplitz-derived matrices, SIAM J. Matrix Anal. Appl., 15, n° 1, pp. 114-128, (January 1994).

[11] GÖTZE (J.), PARK (H.). Schur-type methods based on subspace criteria, in Proc. IEEE Int. Symp. on Circuits and Systems, Hong Kong, 1997, pp. 2661-2664.

[12] CHUN (J.), KAILATH (T.), LEV-ARI (H.). Fast parallel algorithms for QR and triangular factorization, SIAM J. Sci. Stat. Comput., 8, n° 6, (November 1987).
