Efficient VLSI architectures for baseband signal processing in wireless base-station receivers
Sridhar Rajagopal, Srikrishna Bhashyam,
Joseph R. Cavallaro, and Behnaam Aazhang
This work is supported by Nokia, TI, TATP and NSF
Introduction
A real-time VLSI architecture for channel estimation
Usually neglected, but has high computational complexity
Current DSP solutions do not meet real-time requirements
Iterative fixed-point algorithm developed
Area-Time tradeoffs presented
– Area-Constrained, Time-Constrained, Area-Time efficient
Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
– Area-constrained, Time-constrained, Area-Time efficient
Conclusions
Evolution of mobile communications
First generation: Voice
Second/Current generation: Voice + Low-rate data (9.6 Kbps)
Third generation+: Voice + High-rate data (2 Mbps/384 Kbps/128 Kbps) + multimedia
Need for channel estimation
To compensate for unknown fading amplitudes and
asynchronous delays.
Detector performance depends on accuracy of channel
estimator
Multiuser Channel Estimation
– Jointly estimate parameters for all users
– Better performance than single user estimates
Computing channel estimates
Computed by sending a training sequence of known
bits to the receiver.
When no training sequence is present, detected bits can be used to update
estimates in a decision feedback mode for tracking.
The importance of channel estimation is usually neglected
Its complexity may exceed that of the detector
Baseband signal processing
[Figure: block diagram of the base-station receiver. Labels: Multiple Users, Antenna, Channel estimation (Training, Tracking), Detection, Decoding, Detected Bits.]
Multiuser Channel Estimation Algorithm
$R_{br} = \sum_i b_i r_i^H$
$R_{bb} = \sum_i b_i b_i^T$
$R_{bb} * A = R_{br}$
with $r_i \in \mathbb{C}^{N}$ and $b_i \in \mathbb{R}^{2K}$
b = {+1, -1} : Training/Tracking bits
r = 8-bit integer (complex) : Received signal
N = spreading gain (typically fixed, e.g. 32)
K = number of users (variable, <= N)
A = Maximum Likelihood channel estimate
Implementation complexity
Matrix inversions (size 32x32) per window
Unable to meet real-time requirements on DSPs [Asilomar’99]
VLSI fixed-point architectures for matrix inversions
– Difficult to design, finite precision problems
Typically, simpler single-user sliding correlator
structures are used.
Iterative scheme for channel estimation
Bit-streaming: suitable for tracking
Method of gradient descent
Stable convergence behavior
Simple fixed-point VLSI architecture
$R_{bb} = R_{bb} + b_L * b_L^T - b_0 * b_0^T$
$R_{br} = R_{br} + b_L * r_L^H - b_0 * r_0^H$
$A = A - \mu (R_{bb} * A - R_{br})$
[Figure: Comparison of Bit Error Rates (BER). BER (10^-3 to 10^-1) versus Signal to Noise Ratio (SNR, 4 to 12). Curves: MF, ActMF, ML, ActML.]
O(K^2 N)
O(K^3 + K^2 N)
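The sliding-window correlation updates and the gradient-descent step of the iterative scheme can be sketched in a few lines of plain Python. This is only an illustrative floating-point model, not the fixed-point design from the slides: the function names, the toy matrix `A_true`, the step size `mu = 0.1` and the noise-free signal model r^H = b^T A are all assumptions made for the example.

```python
def sliding_update(R, u_new, v_new, u_old, v_old):
    """Return R + u_new * v_new^H - u_old * v_old^H (outer products)."""
    return [[R[i][j]
             + u_new[i] * v_new[j].conjugate()
             - u_old[i] * v_old[j].conjugate()
             for j in range(len(v_new))]
            for i in range(len(u_new))]


def gradient_step(Rbb, Rbr, A, mu):
    """Return A - mu * (Rbb * A - Rbr)."""
    n = len(A)
    return [[A[i][j] - mu * (sum(Rbb[i][k] * A[k][j] for k in range(n)) - Rbr[i][j])
             for j in range(len(A[0]))]
            for i in range(n)]


# Hypothetical toy problem: 2K = 2 rows, N = 2 chips, noise-free samples.
A_true = [[1 + 2j, -1j], [0.5 + 0j, 2 - 1j]]
window = [[1, 1], [1, -1], [-1, 1], [1, 1]]      # bit vectors b_i in the window


def received(b):
    """Noise-free sample r with r^H = b^T * A_true (assumed signal model)."""
    return [sum(b[k] * A_true[k][j] for k in range(2)).conjugate() for j in range(2)]


zero_b, zero_r = [0, 0], [0j, 0j]
Rbb = [[0, 0], [0, 0]]
Rbr = [[0j, 0j], [0j, 0j]]
for b in window:                                  # fill the window bit by bit
    Rbb = sliding_update(Rbb, b, b, zero_b, zero_b)
    Rbr = sliding_update(Rbr, b, received(b), zero_b, zero_r)

# Slide by one bit: the newest bit vector enters, the oldest one leaves.
b_new, b_old = [-1, -1], window[0]
Rbb = sliding_update(Rbb, b_new, b_new, b_old, b_old)
Rbr = sliding_update(Rbr, b_new, received(b_new), b_old, received(b_old))

# Iterate the gradient-descent step; no matrix inversion is needed.
A = [[0j, 0j], [0j, 0j]]
for _ in range(60):
    A = gradient_step(Rbb, Rbr, A, mu=0.1)

max_err = max(abs(A[i][j] - A_true[i][j]) for i in range(2) for j in range(2))
print(max_err < 1e-9)
```

The step size has to be small relative to the largest eigenvalue of Rbb for the iteration to converge; the value used here is tuned to this toy window only.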
Simulations - Static multipath channel
SINR = 0 dB
Paths = 3
Preamble L = 150
Spreading N = 31
Users K = 15
Fading channel with tracking
[Figure: BER (10^-3 to 10^0) versus SNR (4 to 12). Curves: MF - Static, MF - Tracking, ML - Static, ML - Tracking.]
Doppler = 10 Kmph
Area-Time Tradeoffs
Design for K = 32 users and spreading gain N = 32
Target Data Rate = 128 Kbps (4000 cycles at 500 MHz).
Area-Constrained Architecture : Pico-cells or fewer users
Time-Constrained Architecture : Maximum data rates
Area-Time Efficient Architecture : Real-Time
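The 4000-cycle figure for the target data rate follows from dividing the clock rate by the bit rate. The worked line below assumes 1 Kbps = 1000 bits/s and that the budget is counted per bit; neither convention is stated on the slide.

```latex
\frac{500 \times 10^{6}\ \text{cycles/s}}{128 \times 10^{3}\ \text{bits/s}}
  \approx 3906\ \text{cycles per bit} \approx 4000\ \text{cycles}
```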
Task decomposition: channel estimation
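[Figure: task decomposition over time, per bit. Pilot bits and detected bits are selected through Pilot/Data MUXes. Inputs over a tracking window of length L: b (2K,1), r (N,8), b0 (2K,1), r0 (N,8). Correlation matrices: Rbb, O(2K^2, 8); Rbr, O(2KN, 8). Iterate: A, O(4K^2 N, 8). Output: channel estimate to detector.]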
$R_{bb} = R_{bb} + b_L * b_L^T - b_0 * b_0^T$
Architecture design: auto-correlation
b = {+1,-1}
Multiplication reduces to an XNOR operation
Matrix updated using XNOR gates
Auto-correlation matrix implemented as UP/DOWN counters
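A small sketch of why the product of two ±1 bits is an XNOR, and how that drives an up/down counter. The bit encoding (1 for +1, 0 for -1) and the function names are assumptions chosen for illustration; the slides do not specify them.

```python
def xnor(x, y):
    """XNOR of two single bits (each 0 or 1)."""
    return 1 - (x ^ y)


def to_sign(bit):
    """Assumed encoding: bit 1 stands for +1, bit 0 for -1."""
    return 2 * bit - 1


def update_counter(count, new_i, new_j, old_i, old_j):
    """One Rbb(i,j) update: count up/down for the incoming bit pair,
    then undo the contribution of the bit pair leaving the window."""
    count += 1 if xnor(new_i, new_j) else -1    # + b_L(i) * b_L(j)
    count -= 1 if xnor(old_i, old_j) else -1    # - b_0(i) * b_0(j)
    return count


# The XNOR of the encoded bits matches the product of the signed values.
for x in (0, 1):
    for y in (0, 1):
        assert to_sign(xnor(x, y)) == to_sign(x) * to_sign(y)
```

No multiplier is involved: each matrix entry needs one gate per bit pair and a counter that steps by one.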
Architecture design: cross-correlation
$R_{br} = R_{br} + b_L * r_L^H - b_0 * r_0^H$
b = {+1,-1}, r = 8-bit integer vector (complex)
Multiplications reduce to additions/subtractions
Matrix (complex) can be updated with 8-bit adders
Cross-correlation matrix stored as RAM.
Architecture design: channel estimate
$A = A - \mu (R_{bb} * A - R_{br})$
A = 8-bit integer matrix (complex)
µ << 1 : Truncated multiplication [Schulte’93]
Matrix-matrix (real-complex) multiplication of integers
Forms the bottleneck (8-bit multipliers)
Concentrate on multiplication for area-time tradeoffs!
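One way to read the µ << 1 step in integer arithmetic is as a right shift of the correction term. The sketch below assumes µ = 2^-8, which is suggested by the ">>8" block in the area-constrained diagram but is not stated on this slide, and it handles one real-valued column of A (real and imaginary parts would be processed separately). It is a simplified model, not the truncated multiplier of [Schulte’93].

```python
SHIFT = 8   # assumed: mu = 2**-8


def update_column(Rbb, a_col, rbr_col):
    """One step A <- A - ((Rbb * A - Rbr) >> SHIFT) for one integer column of A."""
    n = len(a_col)
    new_col = []
    for i in range(n):
        # Integer multiply-accumulate: row i of Rbb times the column of A.
        mac = sum(Rbb[i][k] * a_col[k] for k in range(n))
        # Python's >> on negative integers is an arithmetic (floor) shift.
        new_col.append(a_col[i] - ((mac - rbr_col[i]) >> SHIFT))
    return new_col


step1 = update_column([[4, 0], [0, 4]], [0, 0], [1024, -512])
```

The multiply-accumulate inside the loop is the part the slide identifies as the bottleneck, which is why the three architectures differ mainly in how many multipliers they instantiate.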
Area-Constrained Architecture
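[Figure: area-constrained datapath. Inputs: b, b0 (1 bit), r, r0 (8 bits). Blocks shown: MUX with enable (EN), up/down (U/D) Counter for Rbb, Add/Sub units for Rbr, Load/Store of A through MUX/DEMUX, one MAC, Subtract units, >>8 shift, output Anew. Indices i, j; bus widths 1, 8 and 16 bits.]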
Area-constrained Architecture: Hardware Requirements
Blocks       Quantity       Full Adder Cells   Complex   Total
Counter      1*8            8                  -         8
Multiplier   1*8            64                 *2        128
Adders       3*8 + 2*16     56                 *2        112
Total Area: 248 FA cells
Total Time (N=K=32): 4K^2 N = 128,000 cycles
Time-constrained Architecture
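[Figure: time-constrained datapath. Inputs b, b0 (2K*1 each) and r, r0 (N*8 each) pass through MUXes. Products b*bT and b0*b0T (K(2K-1)*1 each) update Rbb (2K^2*8); Rbr is 2KN*8. Mult, Subtract and >> blocks on 2KN*16 and 2KN*8 buses update A and produce the channel estimate.]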
Auto-correlation Update in Parallel
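[Figure: parallel auto-correlation update. The bit vector b (2K), drawn as elements a, b, c, d, feeds an array of XNORs forming the pairwise products a·b, a·c, a·d, b·c, b·d, c·d, i.e. bbT (K*{2K-1}*1). These drive an array of counters holding Rbb (2K^2*8): each Rbb(i,j) counter takes bbT(i,j) on its U/D# input, and each Rbb(i,i) counter takes a constant 1.]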
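The K*{2K-1} size of bbT matches the number of distinct off-diagonal pairs among the 2K entries of b, as the short count below shows. The diagonal needs no gate because b(i)·b(i) = 1 for any ±1 bit. The value for K = 32 is worked out here only as an illustration.

```latex
\binom{2K}{2} = \frac{2K(2K-1)}{2} = K(2K-1),
\qquad K = 32 \;\Rightarrow\; 32 \cdot 63 = 2016 \ \text{XNOR products}
```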
Cross-Correlation Update in Parallel
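[Figure: parallel cross-correlation update. The bit vector b (2K*1), drawn as elements a, b, c, d, and the received vector r (N*8) update Rbr (2KN*8). Each Rbr(i,j) entry has an 8-bit adder fed by r(j), with its Add/Sub# control driven by the 1-bit b(i).]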
Time-constrained Architecture: Hardware Requirements
Blocks       Quantity                       Full Adder Cells    Complex   Total
Counter      2K^2*8                         16K^2               -         16K^2
Multiplier   4K^2 N*8                       256K^2 N            *2        512K^2 N
Adders       2KN*16 + 2KN*8 + 4K^2 N*16     48KN + 64K^2 N      *2        96KN + 128K^2 N
Total Area (N=K=32): 20,000,000 FA cells
Total Time: Log2(2K) = 6 cycles
Area-Time Efficient Architecture
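[Figure: area-time efficient datapath. Inputs b, b0 (2K*1 each) and r, r0 (N*8 each) pass through MUXes. Products b*bT and b0*b0T (2K*1) update the Rbb Counters; an Adder (1*8) updates Rbr; A is accessed through Load/Store and MUX/DEMUX in 2K*8 slices. Mult, Subtract and >> blocks on 1*16 and 1*8 buses produce Anew.]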
Area-Time Efficient Architecture: Hardware Requirements
Blocks       Quantity                Full Adder Cells   Complex   Total
Counter      2K*8                    16K                -         16K
Multiplier   2K*8                    128K               *2        256K
Adders       2K*16 + 2*8 + 1*16      32K + 32           *2        64K + 64
Total Area (N=K=32): 10,000 FA cells
Total Time: 2KN = 2,000 cycles
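The totals in the three hardware-requirement tables can be evaluated at N = K = 32 to see how the rounded slide figures arise. The variable names below are illustrative; the formulas are the "Total" column and "Total Time" entries from the tables.

```python
import math

K = N = 32

# Area-constrained: fixed hardware, counted in full-adder (FA) cells.
area_cells = 8 + 128 + 112
area_cycles = 4 * K**2 * N

# Time-constrained: counter + multiplier + adder totals from the table.
time_cells = 16 * K**2 + 512 * K**2 * N + (96 * K * N + 128 * K**2 * N)
time_cycles = int(math.log2(2 * K))

# Area-time efficient.
at_cells = 16 * K + 256 * K + (64 * K + 64)
at_cycles = 2 * K * N

print(area_cells, area_cycles)   # 248 131072
print(time_cells, time_cycles)   # 21086208 6
print(at_cells, at_cycles)       # 10816 2048
```

The slides quote these as 128,000 cycles, 20,000,000 FA cells, 10,000 FA cells and 2,000 cycles, which are the same quantities rounded.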
Comparisons
Implementation   Clock Rate   Full Adder Cells   Data Rate
Area             500 MHz      248                3.81 Kbps
Time             500 MHz      2x10^7             83.33 Mbps
Area-Time        500 MHz      10^4               256 Kbps
C67 DSP          166 MHz      -                  1.02 Kbps
DSPs unable to exploit bit-level parallelism
Inefficient storage of bits
Replacing multiplications by additions/subtractions
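The VLSI data rates in the comparison can be approximated as clock rate divided by cycles per bit. The sketch below assumes that one bit is processed per "Total Time" cycle count of each architecture at N = K = 32 and that 1 Kbps = 1000 bits/s; both are assumptions, not statements from the slides.

```python
CLOCK_HZ = 500e6    # clock rate from the comparison table

# Cycles per bit at N = K = 32: 4K^2 N, Log2(2K), and 2KN respectively.
cycles_per_bit = {
    "Area": 4 * 32**2 * 32,
    "Time": 6,
    "Area-Time": 2 * 32 * 32,
}

rates_bps = {name: CLOCK_HZ / cycles for name, cycles in cycles_per_bit.items()}

for name, bps in rates_bps.items():
    print(f"{name}: {bps / 1e3:.2f} Kbps")
```

Under these assumptions the Area and Time rows reproduce the 3.81 Kbps and 83.33 Mbps on the slide, while the Area-Time row comes out near 244 Kbps rather than 256 Kbps; the slides do not say which rounding or unit convention gives 256.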
Conclusions
Real-Time VLSI architecture for multiuser channel
estimation
Iterative fixed-point algorithm developed to avoid matrix
inversions
Area-Time Tradeoffs presented
– Area-Constrained, Time-Constrained, Area-Time efficient
VLSI architectures exploit bit-level computations and
parallelism to meet real-time requirements.