Upload
huy
View
24
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Efficient VLSI architectures for baseband signal processing in wireless base-station receivers. Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF. Introduction. - PowerPoint PPT Presentation
Citation preview
Efficient VLSI architectures for baseband signal processing in wireless base-station
receivers
Sridhar Rajagopal, Srikrishna Bhashyam,
Joseph R. Cavallaro, and Behnaam Aazhang
This work is supported by Nokia, TI, TATP and NSF
Introduction
Real-time VLSI architecture for multiuser channel
estimation
Multiuser channel estimation usually neglected
– high computational complexity - DSPs infeasible
– Single user sliding correlator structures used
Iterative fixed point algorithm developed
Area-Time tradeoffs presented
– Area-Constrained,Time-Constrained, Area-Time efficient
Baseband signal processing
Base-Station Receiver
Channel estimation
Detection DecodingMultiple Users
Antenna
Detected Bits
TrackingTraining
Channel estimation
Direct Path
Reflected Path
Noise +MAI
User 1
User 2
Base Station
compensate for unknown fading amplitudes and
asynchronous delays.
Need for multiuser channel estimation
Detector performance depends on accuracy of channel
estimator
Multiuser Channel Estimation
– Jointly estimate parameters for all users
– Better performance than single user estimates
Computing multiuser channel estimates
Computed by sending a training sequence of known
bits to the receiver.
When absent, detected bits can be used to update
estimates in a decision feedback mode for tracking.
Importance of multiuser estimation usually neglected
May exceed detector complexity
rbR Hiibr
bbR Tiibb
RA*R bribb
Multiuser Channel Estimation Algorithm
= {+1, -1} : Training/Tracking bits
= 8-bit integer (complex) : Received signal
N = spreading gain (typically fixed ,e.g: 32)
K = number of users (variable, <=N)
= Maximum Likelihood channel estimate
Cr
RbN
i
2Ki
bi
ri
Ai
Implementation complexity
Matrix inversions (size 32x32) per window
Unable to meet real-time on DSPs [Asilomar’99]
VLSI fixed-point architectures for matrix inversions
– Difficult to design , Finite precision problems
Typically, simpler single-user sliding correlator
structures used.
Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
– Area-constrained,Time-constrained, Area-Time efficient
Conclusions
Iterative scheme for channel estimation
Bit-streaming : suitable for tracking (window length L)
Method of gradient descent
Stable convergence behavior
Simple fixed-point VLSI architecture
T00
TLL
)1i(bb
)i(bb b*bb*bRR
H00
HLL
)1i(br
)i(br r*br*bRR
)RR*A(AA )i(br
)i(bb
)1i()1i()i(
4 5 6 7 8 9 10 11 1210
-3
10-2
10-1 Comparison of Bit Error Rates (BER)
Signal to Noise Ratio (SNR)
BER
MF ActMFML ActML
O(K2N)
O(K3+K2N)
Simulations - Static multipath channel
SINR = 0 dB
Paths =3
Preamble =150
Spreading N = 31
Users K = 15
Rayleigh Fading channel with tracking
4 5 6 7 8 9 10 11 1210
-3
10-2
10-1
100
SNR
BE
R
MF - Static MF - TrackingML - Static ML - Tracking
Doppler = 10 Kmph
Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
– Area-constrained,Time-constrained, Area-Time efficient
Conclusions
Area-Time Tradeoffs
Design for 32 users (K) and spreading code (N) 32
Target = 128 Kbps (4000 cycles at 500 MHz).
Assume single cycle addition/multiplication
Area-Constrained Architecture :Pico-cells/fewer users
Time-Constrained Architecture : Maximum data rates
Area-Time Efficient Architecture : Real-Time
Task decomposition: channel estimation
IterateCorrelation Matrices (Per Bit)
Pilot Bits
Pilot
MUX
Detected Bits
Data
MUX
AO(4K2N,8)
Rbr
O(2KN,8)
Rbb
O(2K2,8)
TIME
ChannelEstimate
to Detector
b0
(2K,1)
Tracking Window
r0
(N,8)
b(2K,1)
r(N,8)
L
Architecture design: auto-correlation
b = {+1,-1}
Multiplication is a XNOR operation
Matrix updated using XNOR gates
Auto-correlation matrix implemented as an
UP/DOWN counter(s)
T00
TLL
)1i(bb
)i(bb b*bb*bRR
Architecture design: cross-correlation
b = {+1,-1}, r = 8-bit integer vector (complex)
Multiplications reduce to additions/subtractions
Matrix (complex) can be updated with 8-bit adders
Cross-correlation matrix stored as RAM.
H00
HLL
)1i(br
)i(br r*br*bRR
Architecture design: channel estimate
A = 8-bit integer matrix (complex)
µ << 1 : Truncated multiplication [Schulte’93]
Matrix-matrix (real-complex) multiplication of integers
Forms the bottleneck (8-bit multipliers)
Concentrate on multiplication for area-time tradeoffs!
)RR*A(AA )i(br
)i(bb
)1i()1i()i(
Area-Constrained Architecture
b0
bMUX
EN
Counter
Rbb A(i)
DEMUXMUX
MAC
Add/
SubAdd/Sub
Subtract
Subtract
A(i-1)
U/D
Load Store
ji
i j
j j
r0r
b
b0
16
8
8
88
8 8
1
11
1
1
1
1
1
1
88
88
Rbr
>>8
816
T00
TLL
)1i(bb
)i(bb b*bb*bRR
H00
HLL
)1i(br
)i(br r*br*bRR
)RR*A(AA )i(br
)i(bb
)1i()1i()i(
Channel Estimate
Area-constrained Architecture: Hardware Requirements
Blocks Quantity Full AdderCells
Complex Total
Counter 1*8 8 - 8
Multiplier 1*8 64 *2 128
Adders 3*8 + 2*16 56 *2 112
Total Area 248FA cells
Total Time(N=K=32)
4K2N 128,000cycles
Time-constrained Architecture
b*bT
b0*b0T
b
b0
MUX
Rbr
M
UX
r
r0
MUX
Rbb A
Mult
Subtract >>
Subtract
2K*12K*1
2K*1 K(2K-1)*1
K(2K-1)*1
2K2*8
2KN*16
2KN*162KN*8
2K*1
N*8
N*8
N*8
2KN*8
2KN*8
ChannelEstimate
T00
TLL
)1i(bb
)i(bb b*bb*bRR
H00
HLL
)1i(br
)i(br r*br*bRR
)RR*A(AA )i(br
)i(bb
)1i()1i()i(
Auto-correlation Update in Parallel
Rbb(i,j)
Counter
bbT(i,j)U/D#
Rbb(i,i)
Counter
1
U/D#
Array of XNORs
a·b a·c a·d
b·c b·d
c·d
b c da
b (2K)
bbT (K*{2K-1}*1)Rbb (2K2*8)
Array of Counters
T00
TLL
)1i(bb
)i(bb b*bb*bRR
Cross-Correlation Update in Parallel
b c da
b (2K*1)
r (N*8)
Rbr (2KN*8)
r(j)
Rbr(i,j)
Adder
b(i)Add/Sub#
8 8
1
H00
HLL
)1i(br
)i(br r*br*bRR
Time-constrained Architecture: Hardware Requirements
Blocks Quantity Full AdderCells
Complex Total
Counter 2K2*8 16K2 - 16K2
Multiplier 4K2N*8 256K2N *2 512K2N
Adders 2KN*16 +2KN*8 +4K2N*16
48KN +64K2N
*2 96KN +128K2N
Total Area(N=K=32)
20,000,000FA cells
Total Time Log2(2K) 6 cycles
Area-Time efficient architecture design
Area - constrained Architecture
– Minimize area - single 8-bit multiplier
– 4K2N cycles (128,000 cycles ; 3.81 Kbps)
Time-constrained Architecture
– Minimize time - 4K2N 8-bit multipliers
– Log2(2K) cycles (6 cycles ; 83.33 Mbps)
Aim : To meet real-time with min. area overhead
Different parallelism levels for multipliers
Area-Time Efficient Architecture
b*bT b0*b0T
b b0
MUX
M
UX
r
r0
MUX
Mult
Subtract >>
Subtract
2K*1 2K*1
2K*12K*1
2K*1 2K*8
2K*8
1*16
1*161*8
1*1
1*8
N*8
N*8
1*8
Rbr
Counters
StoreLoad
Rbb
A(i)
DEMUXMUX
A(i-1)
1*8
Adder
1*8
2K*1
2K*8
2K*8
T00
TLL
)1i(bb
)i(bb b*bb*bRR
H00
HLL
)1i(br
)i(br r*br*bRR
)RR*A(AA )i(br
)i(bb
)1i()1i()i(
Channel Estimate
Area-Time Efficient Architecture: Hardware Requirements
Blocks Quantity Full AdderCells
Complex Total
Counter 2K*8 16K - 16K
Multiplier 2K*8 128K *2 256K
Adders 2K*16 +2*8 + 1*16
32K + 32 *2 64K + 64
Total Area(N=K=32)
10,000FA cells
Total Time 2KN 2,000cycles
Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
– Area-constrained,Time-constrained, Area-Time efficient
Conclusions
Comparisons
Implementation ClockRate
Full AdderCells
Data Rates
C67 DSP 166 MHz - 1.02 KbpsArea 500 MHz 248 3.81 Kbps
: : : :Area-Time 500 MHz 104 256 Kbps
: : : :
Time 500 MHz 2x107 83.33 Mbps
DSPs unable to exploit bit-level parallelismInefficient storage of bitsReplacing multiplications by additions/subtractions
Scalability of Architectures with K
Disadvantages of VLSI architecturesDesign for maximum number of users in the systemIf there are fewer users,
– Turn off functional units to reduce power
– Reconfigure hardware for higher data rates (FPGA)
– Dr. Cavallaro, don’t know to handle this Question properly
– We never designed an architecture/algorithm for varying number of users dynamically. (Though we had started on it)
– What should be included in future work?
– Please give suggestions!!
Conclusions
Real-Time VLSI architecture for multiuser channel
estimation
Iterative fixed-point algorithm developed to avoid matrix
inversions
Area-Time Tradeoffs presented
– Area-Constrained, Time-Constrained, Area-Time efficient
VLSI architectures exploit bit-level computations and
parallelism to meet real-time.