RICE UNIVERSITY
On the architecture design of a 3G W-CDMA/W-LAN receiver
Sridhar Rajagopal and Joseph R. Cavallaro
Rice University, Center for Multimedia Communication
http://[email protected]
This work is supported by Nokia, TI, TATP and NSF
Introduction
A baseband communications processor
[Figure: wireless mobile device (laptop/PDA/cell phone) with an add-on PCMCIA network interface card supporting wireless LAN and wideband CDMA: RF unit, A/D and D/A converters, and the baseband communications processor, connected to the higher layers (MAC/network/application).]
Motivation
No architecture has yet been developed that meets the real-time requirements of 3G systems:
  2-8 Mbps range for wideband CDMA
  100 Mbps range for wireless LAN
Design factors that make the problem harder:
  low power
  flexibility
Previous Work
Designing algorithms from an implementation perspective:
  algorithms with a high degree of parallelism
  fixed-point computations
  simple operations (multiplications/additions)
Example: multiuser channel estimation & detection
Real-time implementation on DSPs/FPGAs/ASICs:
  area-time tradeoffs
Possible contributions of this work
A real-time low-power VLSI architecture design
using on-line arithmetic
A real-time programmable architecture design
using the IMAGINE media processor simulator
Integrating these two architectures into one.
Contents
Low-power VLSI architecture design using on-line arithmetic
Programmable architecture design using the IMAGINE simulator
Conclusions
On-line arithmetic
Uses a redundant number representation.
Pipelined digit-serial arithmetic with most-significant-digit-first (MSDF) computation.
Successive computations can start as soon as their first input digits are available (online delay δ = 1-4 digits, typically).
Algorithms available for various operations (+,*,/,sqrt) and for fixed-point computations.
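[Figure: MSDF digit-serial operation: inputs x (x1, x2, x3, …) and y (y1, y2, y3, …) enter one digit at a time, most significant digit first, and the output z (z1, z2, z3, …) is produced digit by digit.]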
Why is on-line arithmetic useful?
Conventional operations in 3G wireless systems use high-precision arithmetic (16-32 bits) but produce low-precision outputs.
Only the most significant digits (1-3 bits) are needed.
MSDF computation finds the needed digits and avoids computing the remaining ones.
Digit-serial computation, and hence low power.
Example: detection
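A minimal sketch of stopping early, assuming a hard-decision detector that only needs the sign of its decision statistic, and assuming that statistic arrives MSDF as a radix-r signed-digit fraction with digits in {-(r-1), …, r-1}. Since all digits after position j together contribute less than r^(-j) in magnitude, the first nonzero digit fixes the sign and the remaining digits never have to be computed. The function `msdf_sign` and the example digit strings are hypothetical.

```python
from typing import Iterable, Tuple

def msdf_sign(digits: Iterable[int], radix: int = 2) -> Tuple[int, int]:
    """Return (sign, digits_consumed) for a signed-digit fraction given MSD first.

    Assumes every digit lies in {-(radix-1), ..., radix-1}; the first nonzero
    digit then outweighs all later digits, so consumption stops there.
    """
    consumed = 0
    for d in digits:
        if abs(d) >= radix:
            raise ValueError(f"digit {d} outside the signed-digit set")
        consumed += 1
        if d != 0:
            return (1 if d > 0 else -1), consumed
    return 0, consumed  # all digits zero: the value is exactly zero


# Hypothetical radix-2 decision statistics (digits -1, 0, 1), MSD first.
strong = [1, -1, 0, 1, 1, 0, -1, 1]  # large amplitude
weak = [0, 0, 0, 0, -1, 1, 0, 1]     # small amplitude, close to zero
print(msdf_sign(strong))  # (1, 1)
print(msdf_sign(weak))    # (-1, 5)
```

Because `digits` is consumed lazily, passing a generator means the digits after the deciding one are never produced at all. The number of digits consumed grows as the amplitude approaches zero.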
Redundant number systems
Radix-r number system: each digit takes r values, 0, 1, 2, …, r-1.
Redundant number system: each digit takes q > r values; for a symmetric digit set {-a, …, a}, q = 2a + 1, with r + 1 ≤ q ≤ 2r - 1 when r is even.
Example: signed-digit decimal, where each digit carries its own sign. 10(-1)2 = 1000 + 0 - 10 + 2 = 992, so 992 and 10(-1)2 are two representations of the same value.
Redundancy enables carry-free addition, which is what makes MSDF computation possible.
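A minimal sketch of both points, assuming radix 10 with digit set {-9, …, 9} (the signed-decimal example) and integers stored MSD first. `sd_value` evaluates a digit string; `carry_free_add` is the classic Avizienis-style signed-digit addition, given as an illustration rather than as the adder used in this work. Each position passes a transfer of -1, 0, or +1 to its left neighbour, so every sum digit depends only on its own position and the one to its right, and no carry ripples across the word. Both function names are hypothetical.

```python
from typing import List

RADIX = 10
A = 9  # digit set {-9, ..., 9}

def sd_value(digits: List[int]) -> int:
    """Value of a signed-digit integer, most significant digit first."""
    value = 0
    for d in digits:
        value = value * RADIX + d
    return value

def carry_free_add(x: List[int], y: List[int]) -> List[int]:
    """Add two equal-length signed-digit integers (MSD first) without carry propagation.

    Each position sum w = x_i + y_i is split as w = RADIX * t + u with t in
    {-1, 0, 1} and |u| <= A - 1; the sum digit is u_i plus the transfer from
    position i + 1. The result has one extra leading digit.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    transfers, interims = [], []
    for xi, yi in zip(x, y):  # every position is handled independently
        w = xi + yi
        t = 1 if w >= A else (-1 if w <= -A else 0)
        transfers.append(t)
        interims.append(w - RADIX * t)
    # transfer arriving at position i comes from its less significant neighbour i + 1
    incoming = transfers[1:] + [0]
    return [transfers[0]] + [u + t for u, t in zip(interims, incoming)]


x = [1, 0, -1, 2]  # 992
y = [0, 9, 9, 2]   # 992, a different representation
s = carry_free_add(x, y)
print(s, sd_value(s))  # [0, 2, -1, 8, 4] 1984
```

Converting the result back to conventional digits is where carry propagation reappears; keeping values in redundant form between operations defers that cost.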
Adder Implementation
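[Figure: left, a conventional bit-parallel adder (d-bit carry-look-ahead adder, CLA) with inputs a, b and output c; right, an on-line digit-serial adder built from 4-to-2 and 3-to-2 carry-save adders (CSA), partial-sum (PS) and stored-carry (SC) registers, and a digit-selection block, taking input digits a_j, b_j and producing digit-serial outputs.]
Conventional bit-parallel adder: $t = t_{\mathrm{conv}} \log_2 d$
On-line digit-serial adder: $t = t_{\mathrm{OL}}$
where $t_{\mathrm{conv}}$ is the conventional adder time per bit, $t_{\mathrm{OL}}$ the on-line delay time per digit, and $d$ the bit precision.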
On-line radix-4 adder
[Figure: on-line radix-4 adder: digit-serial inputs enter carry-save adders with residual feedback, and a digit-selection block produces the digit-serial outputs.]
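The defining property of an on-line adder, that output digits leave MSD first while later input digits are still arriving, fits in a few lines. This is a hedged illustration, not the CSA/residual-feedback design in the figure: it assumes radix 4 with the maximally redundant digit set {-3, …, 3}, operands as fractions x = Σ x_j 4^(-j), and an Avizienis-style transfer rule (each position passes -1, 0, or +1 to its left neighbour), which gives an online delay of one digit. `online_add` and `fraction_value` are hypothetical names.

```python
from fractions import Fraction
from typing import Iterable, Iterator, List, Tuple

RADIX = 4
A = 3  # maximally redundant radix-4 digit set {-3, ..., 3} (assumption)

def split(w: int) -> Tuple[int, int]:
    """Split a position sum w in [-6, 6] as w = RADIX * t + u, t in {-1, 0, 1}, |u| <= 2."""
    if w >= A:
        return 1, w - RADIX
    if w <= -A:
        return -1, w + RADIX
    return 0, w

def online_add(pairs: Iterable[Tuple[int, int]]) -> Iterator[int]:
    """Consume digit pairs (x_j, y_j) MSD first; yield sum digits z_0, z_1, ..., z_n.

    Digit z_(j-1) is emitted right after pair j arrives (online delay 1).
    z_0 is the integer digit, since the sum of two fractions can exceed 1.
    """
    pending = 0  # interim digit of the previous position (position 0 has none)
    for x, y in pairs:
        if abs(x) > A or abs(y) > A:
            raise ValueError("digit outside {-3, ..., 3}")
        t, u = split(x + y)
        yield pending + t  # previous position now has its incoming transfer
        pending = u
    yield pending  # the least significant position receives no transfer

def fraction_value(digits: List[int], first_position: int = 1) -> Fraction:
    """Value of sum(d * RADIX**(-p)), with positions p counted from first_position."""
    return sum((Fraction(d, RADIX ** p) for p, d in enumerate(digits, first_position)), Fraction(0))


x = [3, -2, 1, 0]   # 41/64
y = [2, 3, -3, 1]   # 165/256
z = list(online_add(zip(x, y)))
print(z)                     # [1, 1, 1, -2, 1]
print(fraction_value(z, 0))  # 329/256
```

Because `online_add` is a generator, a downstream on-line operation can pull z_0 after only the first input pair has been read, which is the chaining behaviour the slides rely on.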
Comparison with regular adders
Addition time and area are independent of the digit precision (unlike conventional adders, whose area grows with precision).
Chaining operations saves time, since each successive operation can start as soon as the most significant digit (MSD) of its input is available.
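A rough timing model makes the chaining argument concrete. Assuming the usual on-line convention (not stated on the slide) that an operation with online delay $\delta$ emits its $j$-th output digit once the first $j+\delta$ digits of its inputs are available, at one digit per $t_{\mathrm{OL}}$, a chain of $k$ dependent on-line operations producing $n$ output digits, compared with $k$ chained $d$-bit carry-look-ahead additions, takes approximately:

```latex
\begin{aligned}
T_{\text{on-line}} &\approx \Big(\sum_{i=1}^{k} \delta_i + n\Big)\, t_{\mathrm{OL}} \\
T_{\text{conv}}    &\approx k \, t_{\mathrm{conv}} \log_2 d
\end{aligned}
```

Only the small delays $\delta_i$ accumulate along the chain, while the $n$-digit term is paid once; with $\delta$ typically between 1 and 4, each extra on-line operation adds only a few digit times.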
[Figure: relative throughput (left) and relative area (right) versus precision (0-70 bits, logarithmic vertical axis) for ripple-carry, carry-look-ahead, and radix-4 on-line adders.]
Dependency of execution time for on-line addition on SNR
[Figure: time taken for addition versus signal amplitude (-1 to 1) for on-line and conventional addition.]
Detection Example
Detector     Metric       Conventional                        On-line                          Speedup
Single user  Throughput   (log2(N)+2)*log2(d)*tconv = 21      m*tOL = 8                        2.63
Single user  Latency      (log2(N)+2)*log2(d)*tconv = 21      (log2(N)+2)*tOL + tstop = 14     1.50
Multi-user   Throughput   tCMF = 24                           m*tOL = 8                        3.00
Multi-user   Latency      (2*S-1)*tCPIC + 2*tCMF = 168        tMF + S*tPIC + m*S*tOL = 94      1.79
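The speedup figures are the ratio of conventional to on-line time. A quick check of the four reported ratios, assuming they were rounded half up to two decimals (Python's built-in rounding is half-to-even and would give 2.62 for 21/8 = 2.625). The dictionary `REPORTED` and the function `speedup` are hypothetical helpers.

```python
from decimal import Decimal, ROUND_HALF_UP

# (conventional time, on-line time), in the units reported on the slide
REPORTED = {
    ("single user", "throughput"): (21, 8),
    ("single user", "latency"): (21, 14),
    ("multi-user", "throughput"): (24, 8),
    ("multi-user", "latency"): (168, 94),
}

def speedup(conventional: int, online: int) -> Decimal:
    """Conventional time divided by on-line time, rounded half up to two decimals."""
    ratio = Decimal(conventional) / Decimal(online)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

for (detector, metric), (conv, ol) in REPORTED.items():
    print(f"{detector:12} {metric:10} {speedup(conv, ol)}")
```

`Decimal` is used so that 21/8 is represented exactly and the half-up rounding matches the tabulated 2.63.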
Low power VLSI design
Power savings come from two sources:
  eliminating unwanted computations
  digit-serial hardware
Real-time requirements are met by pipelining the computations properly and exploiting the parallelism in the algorithms.
A programmable architecture simulator
Flexibility in the algorithm requirements:
  channel-dependent computations
  changing algorithms on the fly
  seamless switching between wireless LAN and wideband CDMA
A simulator is needed to test the performance of algorithms and of extensions/modifications for critical operations.
The IMAGINE architecture and simulator
IMAGINE is a media signal processor built at Stanford.
Media and wireless workloads share many features.
A good starting point for exploration.
Local expertise - Dr. Scott Rixner ([email protected])
IMAGINE architecture
Great for media processing algorithms: a 1024-point FFT in 7.4 μs on a 500 MHz processor with 8 clusters (48 units), at 3.8 W
Great for parallel, vector, and streaming computations
Performance on, and extensions for, sequential kernels such as Viterbi traceback still need to be investigated.
Conclusions
On-line arithmetic is useful for a low-power, real-time implementation.
A programmable real-time architecture is being investigated using the IMAGINE simulator
The aim is then to integrate these two features.