Efficient VLSI architectures for baseband signal processing in wireless base-station receivers Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang This work is supported by Nokia, TI, TATP and NSF


Page 1: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang

This work is supported by Nokia, TI, TATP and NSF

Page 2: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Introduction

Real-time VLSI architecture for multiuser channel estimation

Multiuser channel estimation usually neglected

– high computational complexity - DSPs infeasible

– Single user sliding correlator structures used

Iterative fixed point algorithm developed

Area-Time tradeoffs presented

– Area-Constrained, Time-Constrained, Area-Time efficient

Page 3: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Baseband signal processing

[Figure: base-station receiver block diagram. Labels: Multiple Users, Antenna, Channel estimation (Training, Tracking), Detection, Decoding, Detected Bits]

Page 4: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Channel estimation

[Figure: User 1 and User 2 transmitting to the Base Station over a Direct Path and a Reflected Path, with Noise + MAI added]

Compensate for unknown fading amplitudes and asynchronous delays.

Page 5: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Need for multiuser channel estimation

Detector performance depends on accuracy of channel estimator

Multiuser Channel Estimation

– Jointly estimate parameters for all users

– Better performance than single user estimates

Page 6: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Computing multiuser channel estimates

Computed by sending a training sequence of known bits to the receiver.

When absent, detected bits can be used to update estimates in a decision feedback mode for tracking.

Importance of multiuser estimation usually neglected

May exceed detector complexity

Page 7: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Multiuser Channel Estimation Algorithm

$R_{br} = \sum_i b_i r_i^H$

$R_{bb} = \sum_i b_i b_i^T$

$R_{bb} A = R_{br}$

$b_i \in \{+1, -1\}^{2K}$ : Training/Tracking bits

$r_i \in \mathbb{C}^N$, 8-bit integer (complex) : Received signal

N = spreading gain (typically fixed, e.g. 32)

K = number of users (variable, <= N)

A = Maximum Likelihood channel estimate

Page 8: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Implementation complexity

Matrix inversions (size 32x32) per window

Unable to meet real-time on DSPs [Asilomar’99]

VLSI fixed-point architectures for matrix inversions

– Difficult to design, finite precision problems

Typically, simpler single-user sliding correlator structures used.

Page 9: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Outline

What is multiuser channel estimation?

Need for multiuser channel estimation

Implementation problems

Algorithm enhancements

VLSI architectures

– Area-constrained, Time-constrained, Area-Time efficient

Conclusions

Page 10: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Iterative scheme for channel estimation

Bit-streaming : suitable for tracking (window length L)

Method of gradient descent

Stable convergence behavior

Simple fixed-point VLSI architecture

$R_{bb}^{(i)} = R_{bb}^{(i-1)} - b_0 b_0^T + b_L b_L^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} - b_0 r_0^H + b_L r_L^H$

$A^{(i)} = A^{(i-1)} - \mu \, (R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)})$
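The bit-streaming scheme on this slide can be sketched in a few lines of Python. This is only an illustrative floating-point model, not the authors' fixed-point design: each new bit vector enters the window, the oldest one (index 0) leaves it, and the estimate A takes one gradient-descent step towards the solution of R_bb A = R_br. The helper names (`outer`, `add_scaled`, `matmul`, `track_step`) and the step size `mu` are assumptions made for the example.

```python
def outer(u, v):
    """Outer product u v^H (v is conjugated; a no-op for real v)."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]


def add_scaled(X, Y, s):
    """Return X + s * Y, element by element."""
    return [[x + s * y for x, y in zip(xr, yr)] for xr, yr in zip(X, Y)]


def matmul(X, Y):
    """Plain matrix product X @ Y on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def track_step(R_bb, R_br, A, b_old, r_old, b_new, r_new, mu):
    """One sliding-window update followed by one gradient-descent step.

    b_old/r_old leave the window, b_new/r_new enter it.
    """
    # Correlation updates: add the newest term, drop the oldest one.
    R_bb = add_scaled(add_scaled(R_bb, outer(b_new, b_new), 1),
                      outer(b_old, b_old), -1)
    R_br = add_scaled(add_scaled(R_br, outer(b_new, r_new), 1),
                      outer(b_old, r_old), -1)
    # Gradient step: A <- A - mu * (R_bb A - R_br).
    grad = add_scaled(matmul(R_bb, A), R_br, -1)
    A = add_scaled(A, grad, -mu)
    return R_bb, R_br, A
```

No matrix inversion appears anywhere in the loop, which is the point of the iterative scheme; convergence depends on choosing `mu` small enough relative to the window length.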

Page 11: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Simulations - Static multipath channel

[Figure: Comparison of Bit Error Rates (BER). x-axis: Signal to Noise Ratio (SNR), 4 to 12; y-axis: BER, 10^-3 to 10^-1. Curves: MF, ActMF, ML, ActML. Complexity annotations: O(K^2 N), O(K^3 + K^2 N)]

SINR = 0 dB

Paths = 3

Preamble = 150

Spreading N = 31

Users K = 15

Page 12: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Rayleigh Fading channel with tracking

[Figure: BER versus SNR. x-axis: SNR, 4 to 12; y-axis: BER, 10^-3 to 10^0. Curves: MF - Static, MF - Tracking, ML - Static, ML - Tracking]

Doppler = 10 km/h

Page 13: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Outline

What is multiuser channel estimation?

Need for multiuser channel estimation

Implementation problems

Algorithm enhancements

VLSI architectures

– Area-constrained, Time-constrained, Area-Time efficient

Conclusions

Page 14: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Area-Time Tradeoffs

Design for K = 32 users and spreading gain N = 32

Target = 128 Kbps (4000 cycles at 500 MHz).

Assume single cycle addition/multiplication

Area-Constrained Architecture : Pico-cells/fewer users

Time-Constrained Architecture : Maximum data rates

Area-Time Efficient Architecture : Real-Time

Page 15: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Task decomposition: channel estimation

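[Figure: task decomposition along a TIME axis. Inputs: Pilot Bits and Detected Bits selected through a MUX, giving b (2K,1) and r (N,8); b0 (2K,1) and r0 (N,8) leave the Tracking Window of length L. Stage "Correlation Matrices (Per Bit)": Rbb O(2K^2, 8) and Rbr O(2KN, 8). Stage "Iterate": A O(4K^2 N, 8). Output: Channel Estimate to Detector]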

Page 16: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Architecture design: auto-correlation

b = {+1,-1}

Multiplication is an XNOR operation

Matrix updated using XNOR gates

Auto-correlation matrix implemented as UP/DOWN counter(s)

$R_{bb}^{(i)} = R_{bb}^{(i-1)} - b_0 b_0^T + b_L b_L^T$
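A small Python sketch may help show why XNOR gates and up/down counters are enough here. It assumes an encoding in which the bit value +1 is stored as logic 1 and -1 as logic 0 (the slides do not state the encoding), so that the product of two bits is +1 exactly when the XNOR of their encodings is 1. The function names are made up for the example.

```python
def xnor(x, y):
    """XNOR of two single-bit values (each 0 or 1)."""
    return 1 - (x ^ y)


def update_autocorr(counters, bits_new, bits_old):
    """Slide the window by one bit vector using only XNOR and counting.

    counters : square list of ints, the running auto-correlation matrix
    bits_new : 0/1-encoded bit vector entering the window
    bits_old : 0/1-encoded bit vector leaving the window
    """
    n = len(bits_new)
    for i in range(n):
        for j in range(n):
            # Entering term: count up when the bits agree, down otherwise.
            counters[i][j] += 1 if xnor(bits_new[i], bits_new[j]) else -1
            # Leaving term: undo the count it contributed earlier.
            counters[i][j] -= 1 if xnor(bits_old[i], bits_old[j]) else -1
    return counters
```

In hardware the inner statement corresponds to the U/D control of one counter; the two nested loops are what the parallel architecture unrolls into an array of XNOR gates and counters.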

Page 17: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Architecture design: cross-correlation

b = {+1,-1}, r = 8-bit integer vector (complex)

Multiplications reduce to additions/subtractions

Matrix (complex) can be updated with 8-bit adders

Cross-correlation matrix stored as RAM.

$R_{br}^{(i)} = R_{br}^{(i-1)} - b_0 r_0^H + b_L r_L^H$
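Because every bit is +1 or -1, the product of a bit with a received sample is just that sample or its negation. The following Python sketch, with hypothetical names and Python complex numbers standing in for the 8-bit complex integers, updates the cross-correlation matrix using only additions and subtractions.

```python
def update_crosscorr(R_br, b_new, r_new, b_old, r_old):
    """Sliding-window cross-correlation update without multiplications.

    R_br         : 2D list of complex accumulators, updated in place
    b_new, b_old : bit vectors with entries +1 or -1
    r_new, r_old : received vectors with complex entries
    """
    for i, (bn, bo) in enumerate(zip(b_new, b_old)):
        for j, (rn, ro) in enumerate(zip(r_new, r_old)):
            rn_h, ro_h = rn.conjugate(), ro.conjugate()
            # Entering term b_L r_L^H: the bit only selects add or subtract.
            R_br[i][j] += rn_h if bn > 0 else -rn_h
            # Leaving term b_0 r_0^H: same selection, opposite direction.
            R_br[i][j] -= ro_h if bo > 0 else -ro_h
    return R_br
```

The bit plays the role of the Add/Sub# control line of an adder, which is why the slide can describe the update in terms of 8-bit adders rather than multipliers.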

Page 18: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Architecture design: channel estimate

A = 8-bit integer matrix (complex)

µ << 1 : Truncated multiplication [Schulte’93]

Matrix-matrix (real-complex) multiplication of integers

Forms the bottleneck (8-bit multipliers)

Concentrate on multiplication for area-time tradeoffs!

$A^{(i)} = A^{(i-1)} - \mu \, (R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)})$
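As a rough integer model of this update, the sketch below keeps the real and imaginary parts of A as separate integer matrices and applies the small step size as a right shift. It assumes µ = 2^-8, suggested by the shift stage drawn in the architecture diagrams, and it uses plain shift-based scaling rather than the truncated multiplier of [Schulte'93]. All names are illustrative.

```python
def fixed_point_step(A_re, A_im, R_bb, Rbr_re, Rbr_im, shift=8):
    """One integer gradient step: A <- A - ((R_bb A - R_br) >> shift).

    R_bb is a real integer matrix; A and R_br are complex, stored as
    separate real and imaginary integer matrices.
    """
    rows, cols = len(A_re), len(A_re[0])
    new_re = [row[:] for row in A_re]
    new_im = [row[:] for row in A_im]
    for i in range(rows):
        for j in range(cols):
            # Real-by-complex product: two independent real accumulations.
            acc_re = sum(R_bb[i][k] * A_re[k][j] for k in range(rows))
            acc_im = sum(R_bb[i][k] * A_im[k][j] for k in range(rows))
            # Python's >> is an arithmetic shift, so negative values floor.
            new_re[i][j] = A_re[i][j] - ((acc_re - Rbr_re[i][j]) >> shift)
            new_im[i][j] = A_im[i][j] - ((acc_im - Rbr_im[i][j]) >> shift)
    return new_re, new_im
```

The multiply-accumulate inside the two `sum` calls is the part the slide identifies as the bottleneck; the rest is subtraction and shifting.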

Page 19: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Area-Constrained Architecture

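[Figure: area-constrained datapath. Inputs b, b0 (1-bit) and r, r0 (8-bit) pass through MUXes. An enabled (EN) up/down (U/D) Counter holds Rbb; Add/Sub units accumulate Rbr; a single MAC with Subtract stages and a >>8 shift computes the update. A(i-1) is loaded and A(i) stored through a MUX/DEMUX pair (Load, Store), indexed by i and j. Bus widths shown: 1, 8 and 16 bits. Output: Channel Estimate]

$R_{bb}^{(i)} = R_{bb}^{(i-1)} - b_0 b_0^T + b_L b_L^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} - b_0 r_0^H + b_L r_L^H$

$A^{(i)} = A^{(i-1)} - \mu \, (R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)})$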

Page 20: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Area-constrained Architecture: Hardware Requirements

Blocks     | Quantity   | Full Adder Cells | Complex | Total
Counter    | 1*8        | 8                | -       | 8
Multiplier | 1*8        | 64               | *2      | 128
Adders     | 3*8 + 2*16 | 56               | *2      | 112

Total Area: 248 FA cells

Total Time (N=K=32): 4K^2 N = 128,000 cycles

Page 21: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Time-constrained Architecture

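[Figure: time-constrained datapath. Inputs b, b0 (2K*1) and r, r0 (N*8) pass through MUXes. b*bT and b0*b0T (K(2K-1)*1 each) update Rbb (2K^2*8); Rbr is 2KN*8. Mult, Subtract and >> stages operate on 2KN*16 and 2KN*8 buses to produce A. Output: Channel Estimate]

$R_{bb}^{(i)} = R_{bb}^{(i-1)} - b_0 b_0^T + b_L b_L^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} - b_0 r_0^H + b_L r_L^H$

$A^{(i)} = A^{(i-1)} - \mu \, (R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)})$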

Page 22: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Auto-correlation Update in Parallel

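[Figure: parallel auto-correlation update. b (2K) feeds an Array of XNORs that forms bbT (K*{2K-1}*1); for b = (a, b, c, d) the products are a·b, a·c, a·d, b·c, b·d, c·d. bbT drives an Array of Counters holding Rbb (2K^2*8). Off-diagonal counter Rbb(i,j): U/D# = bbT(i,j). Diagonal counter Rbb(i,i): U/D# = 1]

$R_{bb}^{(i)} = R_{bb}^{(i-1)} - b_0 b_0^T + b_L b_L^T$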

Page 23: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Cross-Correlation Update in Parallel

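[Figure: parallel cross-correlation update. Inputs b (2K*1) and r (N*8); output Rbr (2KN*8). Each element uses one Adder with 8-bit inputs r(j) and Rbr(i,j) and a 1-bit control Add/Sub# = b(i)]

$R_{br}^{(i)} = R_{br}^{(i-1)} - b_0 r_0^H + b_L r_L^H$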

Page 24: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Time-constrained Architecture: Hardware Requirements

Blocks     | Quantity                     | Full Adder Cells | Complex | Total
Counter    | 2K^2*8                       | 16K^2            | -       | 16K^2
Multiplier | 4K^2 N*8                     | 256K^2 N         | *2      | 512K^2 N
Adders     | 2KN*16 + 2KN*8 + 4K^2 N*16   | 48KN + 64K^2 N   | *2      | 96KN + 128K^2 N

Total Area (N=K=32): 20,000,000 FA cells

Total Time: Log2(2K) = 6 cycles

Page 25: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Area-Time efficient architecture design

Area-constrained Architecture

– Minimize area - single 8-bit multiplier

– 4K^2 N cycles (128,000 cycles ; 3.81 Kbps)

Time-constrained Architecture

– Minimize time - 4K^2 N 8-bit multipliers

– Log2(2K) cycles (6 cycles ; 83.33 Mbps)

Aim : To meet real-time with min. area overhead

Different parallelism levels for multipliers
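The data rates quoted on this slide follow from dividing the clock rate by the cycle count, assuming one bit is processed per pass through the estimator. The sketch below reproduces that arithmetic for K = N = 32 at 500 MHz; the function names are illustrative. Note that 4K^2 N evaluates to 131,072 for these values, which the slides round to 128,000.

```python
import math

CLOCK_HZ = 500e6  # clock rate used throughout the slides


def cycles_area_constrained(K, N):
    """Single multiplier: one cycle per multiplication, 4 K^2 N in total."""
    return 4 * K * K * N


def cycles_time_constrained(K):
    """Fully parallel multipliers: depth of the adder tree, log2(2K)."""
    return math.ceil(math.log2(2 * K))


def bits_per_second(cycles, clock_hz=CLOCK_HZ):
    """Data rate if one bit is produced every `cycles` clock cycles."""
    return clock_hz / cycles
```

For example, `bits_per_second(cycles_area_constrained(32, 32))` is about 3.81e3 and `bits_per_second(cycles_time_constrained(32))` is about 83.33e6, matching the two extremes listed above; the real-time target of 128 Kbps sits between them.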

Page 26: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Area-Time Efficient Architecture

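[Figure: area-time efficient datapath. Inputs b, b0 (2K*1) and r, r0 (N*8) pass through MUXes. b*bT and b0*b0T (2K*1 each) update a row of Counters holding Rbb (2K*8); an Adder (1*8) updates Rbr. Mult, Subtract and >> stages work on 2K*8, 1*16 and 1*8 buses. A(i-1) is loaded and A(i) stored through a MUX/DEMUX pair (Load, Store). Output: Channel Estimate]

$R_{bb}^{(i)} = R_{bb}^{(i-1)} - b_0 b_0^T + b_L b_L^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} - b_0 r_0^H + b_L r_L^H$

$A^{(i)} = A^{(i-1)} - \mu \, (R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)})$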

Page 27: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Area-Time Efficient Architecture: Hardware Requirements

Blocks     | Quantity             | Full Adder Cells | Complex | Total
Counter    | 2K*8                 | 16K              | -       | 16K
Multiplier | 2K*8                 | 128K             | *2      | 256K
Adders     | 2K*16 + 2*8 + 1*16   | 32K + 32         | *2      | 64K + 64

Total Area (N=K=32): 10,000 FA cells

Total Time: 2KN = 2,000 cycles
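The rounded area totals in the three hardware-requirement tables can be checked by summing the "Total" columns. The sketch below does this in Python, taking the per-block full-adder counts exactly as listed in the tables; the function names are hypothetical.

```python
def area_constrained_fa():
    """Counter + one complex 8-bit multiplier + adders, in FA cells."""
    counter = 1 * 8
    multiplier = 64 * 2                # 64 cells, doubled for complex data
    adders = (3 * 8 + 2 * 16) * 2      # 56 cells, doubled for complex data
    return counter + multiplier + adders


def time_constrained_fa(K, N):
    """Sum of the Total column of the time-constrained table."""
    return 16 * K * K + 512 * K * K * N + 96 * K * N + 128 * K * K * N


def area_time_fa(K):
    """Sum of the Total column of the area-time efficient table."""
    return 16 * K + 256 * K + 64 * K + 64
```

With K = N = 32 these evaluate to 248, 21,086,208 and 10,816 full-adder cells, which agrees with the rounded figures of 248, 20,000,000 and 10,000 given in the tables.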

Page 28: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Outline

What is multiuser channel estimation?

Need for multiuser channel estimation

Implementation problems

Algorithm enhancements

VLSI architectures

– Area-constrained, Time-constrained, Area-Time efficient

Conclusions

Page 29: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Comparisons

Implementation | Clock Rate | Full Adder Cells | Data Rates
C67 DSP        | 166 MHz    | -                | 1.02 Kbps
Area           | 500 MHz    | 248              | 3.81 Kbps
:              | :          | :                | :
Area-Time      | 500 MHz    | 10^4             | 256 Kbps
:              | :          | :                | :
Time           | 500 MHz    | 2x10^7           | 83.33 Mbps

DSPs unable to exploit bit-level parallelism

Inefficient storage of bits

Replacing multiplications by additions/subtractions

Page 30: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Scalability of Architectures with K

Disadvantages of VLSI architectures

Design for maximum number of users in the system

If there are fewer users,

– Turn off functional units to reduce power

– Reconfigure hardware for higher data rates (FPGA)

– An architecture/algorithm for a dynamically varying number of users has not been designed yet (work on it had started)

– Open question: what should be included in future work?

Page 31: Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Conclusions

Real-Time VLSI architecture for multiuser channel estimation

Iterative fixed-point algorithm developed to avoid matrix inversions

Area-Time Tradeoffs presented

– Area-Constrained, Time-Constrained, Area-Time efficient

VLSI architectures exploit bit-level computations and parallelism to meet real-time.