Implementation Issues for Channel Estimation and Detection Algorithms for W-CDMA
Sridhar Rajagopal and Joseph Cavallaro
ECE Dept.
Contents
Introduction
W-CDMA
Channel Estimation and Detection
DSP Implementation
ASIC Implementation
Other Current Projects
Future Work
The CDMA Research Group
We cover the entire (spread) spectrum!
– Algorithms
– Implementation issues
Implementation Issues
Important because
– Real-time
– Low Power
– Mobility / Size
DSPs
– Signal Processing, Communications
ASICs / FPGAs
– Speed / Size
W-CDMA
CDMA: Code Division Multiple Access
W-CDMA: Wideband CDMA (5 MHz)
– Next Generation Communication Systems
– Integrating Multimedia Capabilities
– QoS / Multi-rate Services
– Higher Data Rates: 2048, 384, 144 kbps
Uplink - Async, Multiuser
[Figure: User 1 and User 2 transmit to the Base Station over a direct path and reflected paths; the received signal also contains noise and multiple access interference (MAI).]
Downlink - Sync, Single User
[Figure: the Base Station transmits the User 1 and User 2 signals to User 1 over a direct path and reflected paths; the received signal also contains noise and multiple access interference (MAI).]
Determining the Channel
Channel Estimation
– Need to know the channel for proper detection
• Delays and amplitudes: multiuser / multipath
– Send a sequence of known bits (pilot / preamble)
– 2 types
• Code multiplexed with data
• Time multiplexed with data
Detection
– Use knowledge of the channel to detect the data bits
W-CDMA Standards
Not fixed yet...
Uplink
– Channel Estimation: Time Multiplexed
– Multiuser Detection
Downlink
– Channel Estimation: Common Pilot
– Detection: Rake Receivers / Equalizers
Channel Estimation
Uplink
– Time Multiplexed
• Maximum Likelihood
• Subspace
Downlink
– Continuous
• LMS Based Adaptive
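As a rough illustration of the LMS-based adaptive approach named above, the sketch below tracks a hypothetical 2-tap, real-valued channel from a known pilot. The tap values, step size `mu`, pilot pattern and function name are assumptions chosen for the example, not parameters taken from these slides.

```python
def lms_channel_estimate(pilot, received, num_taps=2, mu=0.1):
    """Estimate FIR channel taps from a known pilot with the LMS update rule."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(pilot)):
        # Regressor: the most recent pilot symbols, newest first.
        x = [pilot[n - k] for k in range(num_taps)]
        error = received[n] - sum(wk * xk for wk, xk in zip(w, x))
        # LMS step: move the taps along the instantaneous error gradient.
        w = [wk + mu * error * xk for wk, xk in zip(w, x)]
    return w


# Hypothetical noiseless channel (assumed values) driven by a repeating +/-1 pilot.
true_taps = [0.8, -0.3]
pilot = [1, 1, -1] * 200
received = [
    sum(true_taps[k] * pilot[n - k] for k in range(2) if n - k >= 0)
    for n in range(len(pilot))
]
estimate = lms_channel_estimate(pilot, received)
```

Because the pilot is transmitted continuously, the update can keep running and follow slow channel changes; with noise present, a smaller `mu` trades tracking speed for a steadier estimate.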
Multiuser Detection
Optimal
– MLSE (Viterbi)
Sub-optimal
– Linear
• MAI Whitening
• Decorrelating
• MMSE
– Interference Cancellation
• Serial (SIC)
• Parallel (PIC)
– Neural Network
Base-Station Receiver
[Block diagram: Antenna, Demodulator, Demux (separating Pilot and Data), Channel Estimator (producing Estimated Amplitudes & Delays), Multiuser Detector, Decoder.]
CDMA Uplink System
[Block diagram: the data d1, d2, ..., dK of Users 1, 2, ..., K pass through Channel Encoders and Spreading and are summed with AWGN to form the received signal R(t). A bank of Matched Filters produces the outputs y1, y2, ..., yK, which feed the Demux, Channel Estimator, Multi-User Detector and Channel Decoder to yield the estimates d1', d2', ..., dK'.]
Maximum Likelihood Channel Estimation
Send a time-multiplexed Preamble (Pilot).
Channel properties extracted
Compare with known pilot and estimate.
Keep estimate for remaining data bits (static). Repeat preamble every frame, if no tracking.
The Maximum Likelihood Algorithm
Compute the correlation matrices $R_{bb}$, $R_{br}$ and $R_{rr}$.
Compute the channel estimate $Y = R_{br} R_{bb}^{-1}$.
Calculate the noise covariance matrix K.
Calculate the channel impulse response vector z.
Extract the amplitudes and delays using a least squares fit.
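The first two steps of the algorithm can be sketched as follows. This is a minimal illustration under assumptions that are not stated on the slides: a real-valued linear model r_i = U b_i for each pilot vector b_i, correlation matrices formed as averages over the L pilot vectors, and a toy 3x2 matrix `U` standing in for the unknown channel. All function names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlation(xs, ys):
    """(1/L) * sum_i x_i y_i^T for two equal-length lists of real vectors."""
    count = len(xs)
    return [[sum(x[p] * y[q] for x, y in zip(xs, ys)) / count
             for q in range(len(ys[0]))] for p in range(len(xs[0]))]


def ml_channel_estimate(received, pilots):
    """Y = R_br * inverse(R_bb), with R_br and R_bb averaged over the preamble."""
    r_br = correlation(received, pilots)
    r_bb = correlation(pilots, pilots)
    return matmul(r_br, inverse(r_bb))


# Toy noiseless example (assumed values): 2 users, received vectors of length 3.
U = [[0.9, 0.1], [0.3, -0.7], [-0.2, 0.5]]
pilots = [[1, 1], [1, -1], [1, 1], [-1, -1]]
received = [[sum(U[n][k] * b[k] for k in range(2)) for n in range(3)] for b in pilots]
Y = ml_channel_estimate(received, pilots)
```

Since the pilot bits are known in advance, the inverse of R_bb does not depend on the received signal and could be computed ahead of time.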
The ML Algorithm Complexity
Complex-Real Dot Product.
Complex-Real Matrix Product.
Complex-Real Product.
Real Square roots.
– Solving quadratic equation for least squares fit.
Critical code: Matrix-vector / Dot Product
$R_{br} = \frac{1}{L} \sum_i r_i b_i^T$
$Y = R_{br} R_{bb}^{-1}$
[Equation for the channel impulse response vector $z_k$, involving $U_k^L$, $U_k^R$ and $y$; not legible in the source.]
Assuming Unity Noise Covariance
Offline
Differencing Multistage Multiuser Detection
Based on the principle of Parallel Interference Cancellation (PIC)
Cross-correlation information used to remove interference of other users
Repeated iterations for convergence
Differencing techniques to improve performance
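The PIC principle listed above can be sketched in a few lines. The example assumes a synchronous, real-valued, noiseless model y = R A d with a unit-diagonal cross-correlation matrix R and known amplitudes; the numbers and the function name `pic_detect` are made up for illustration.

```python
def sign(v):
    """Hard decision on a soft value."""
    return 1 if v >= 0 else -1


def pic_detect(y, R, amplitudes, stages):
    """Parallel interference cancellation on matched filter outputs y.

    Each stage subtracts, for every user at once, the interference rebuilt
    from the previous stage's hard decisions of all other users.
    """
    K = len(y)
    d_hat = [sign(v) for v in y]          # stage 0: plain matched filter decisions
    for _ in range(stages):
        z = [y[k] - sum(R[k][j] * amplitudes[j] * d_hat[j]
                        for j in range(K) if j != k)
             for k in range(K)]
        d_hat = [sign(v) for v in z]
    return d_hat


# Toy near-far scenario (assumed values): user 2 is weak, user 3 is strong.
R = [[1.0, 0.3, 0.2],
     [0.3, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
amplitudes = [1.0, 0.5, 2.0]
bits = [1, -1, 1]
y = [sum(R[k][j] * amplitudes[j] * bits[j] for j in range(3)) for k in range(3)]

matched_filter_only = pic_detect(y, R, amplitudes, stages=0)
after_one_stage = pic_detect(y, R, amplitudes, stages=1)
```

In this toy case the matched filter alone gets the weak user's bit wrong, and a single cancellation stage corrects it.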
The Differencing Multistage Detector
Split the cross-correlation matrix into lower, upper and diagonal matrices:
$R = S + D + S^T$
Calculate impulse response:
$z^{(l+1)} = z^{(l)} - (S + S^T) A \hat{x}^{(l)}$
where
$\hat{x}^{(l)} = \hat{d}^{(l)} - \hat{d}^{(l-1)}$, $\hat{x}_k \in \{0, 2, -2\}$
x is called the differencing vector.
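The sketch below compares a conventional multistage update, which rebuilds the interference from all decisions at every stage, with a differencing update that only touches users whose decision changed. It assumes a synchronous, real-valued, noiseless model with a unit-diagonal R, takes the matched filter output as the first soft output and starts from all-zero previous decisions; these starting conventions and all names are assumptions made for the example.

```python
def sign(v):
    return 1 if v >= 0 else -1


def conventional_stages(y, R, A, stages):
    """Soft outputs z = y - (S + S^T) A d, recomputed in full at every stage."""
    K = len(y)
    z = list(y)
    history = [z]
    for _ in range(stages):
        d = [sign(v) for v in z]
        z = [y[k] - sum(R[k][j] * A[j] * d[j] for j in range(K) if j != k)
             for k in range(K)]
        history.append(z)
    return history


def differencing_stages(y, R, A, stages):
    """Same soft outputs, updated with the differencing vector x = d - d_prev."""
    K = len(y)
    z = list(y)
    d_prev = [0] * K                      # assumed start: no previous decisions
    history = [z]
    updates = 0                           # number of nonzero entries of x used
    for _ in range(stages):
        d = [sign(v) for v in z]
        x = [d[j] - d_prev[j] for j in range(K)]
        changed = [j for j in range(K) if x[j] != 0]
        updates += len(changed)
        # Only users whose decision changed contribute to the update.
        z = [z[k] - sum(R[k][j] * A[j] * x[j] for j in changed if j != k)
             for k in range(K)]
        history.append(z)
        d_prev = d
    return history, updates


# Toy example (assumed values).
R = [[1.0, 0.3, 0.2],
     [0.3, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
A = [1.0, 0.5, 2.0]
bits = [1, -1, 1]
y = [sum(R[k][j] * A[j] * bits[j] for j in range(3)) for k in range(3)]

conventional = conventional_stages(y, R, A, stages=3)
differencing, updates = differencing_stages(y, R, A, stages=3)
```

Both routines produce the same soft outputs at every stage. After the first stage the entries of x are 0 or ±2, and once the decisions settle x is all zeros, so later stages cost almost nothing.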
Multistage Detector Complexity
Matrix Multiplication: $B = (S + S^T) A$
– Computed only once for one frame
Dot Product: $z_k^{(l+1)} = z_k^{(l)} - \sum_j B_{kj} \hat{x}_j^{(l)}$
– Computed iteratively
Critical code: Dot Product
TI Tools Used
Evaluation Modules (EVM) for C6201 and C6701 fixed and floating point DSPs
– 64 KB each internal program & data memory
– 256 KB SBSRAM, 8 MB SDRAM (external)
C Compiler ver 3.0 from Code Generation Tools
Code Composer ver 4.02 for profiling
DSP Implementation: Channel Estimation
Floating point implementation found more feasible due to matrix inversions and square roots.
Code optimized for the DSP
Use of specialized approximate instructions
– Approximate reciprocal square roots
– Approximate reciprocals
Use of assembly code for the critical part
– TI's C67 floating point benchmarks for Matrix-Vector Multiplication & Dot Product
Data memory requirements for Channel Estimation
Approximate Instructions & Assembly
L = 150, P = 3, N = 31, SNR = 5 dB, SINR = -10 dB
TMS320C67x DSP | Cycles
Approx. FP reciprocal instruction | 1
FP reciprocal function | 28
Approx. FP reciprocal sq. root instruction | 1
FP reciprocal sq. root instruction | 34
[Figure: Use of specialized instructions and assembly code on C6701 DSP. Execution time (in milliseconds) vs. number of users (0 to 15) for C6701: Original, C6701: with Intrinsics and C6701: with Assembly. Annotations: 10% improvement, 100% improvement.]
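A common way to use a single-cycle approximate reciprocal or reciprocal square root is as a seed that a few Newton-Raphson steps then refine. The sketch below illustrates that idea only; the `coarse` helper, which keeps about 8 mantissa bits, is a stand-in for a hardware approximate instruction, and the precision and iteration count are assumptions rather than figures from these slides.

```python
import math


def coarse(value, bits=8):
    """Emulate a low-precision result by keeping `bits` mantissa bits."""
    if value == 0:
        return 0.0
    mantissa, exponent = math.frexp(value)
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def reciprocal(a, iterations=2):
    """1/a from a coarse seed, refined with x <- x * (2 - a * x)."""
    x = coarse(1.0 / a)                   # stand-in for an approximate instruction
    for _ in range(iterations):
        x = x * (2.0 - a * x)
    return x


def reciprocal_sqrt(a, iterations=2):
    """1/sqrt(a) from a coarse seed, refined with x <- x * (1.5 - 0.5 * a * x * x)."""
    x = coarse(1.0 / math.sqrt(a))        # stand-in for an approximate instruction
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * a * x * x)
    return x


seed_error = abs(coarse(1.0 / 3.0) * 3.0 - 1.0)
refined_error = abs(reciprocal(3.0) * 3.0 - 1.0)
```

Each Newton-Raphson step roughly squares the relative error, so a seed good to a few parts in a thousand reaches near single-precision accuracy in two steps using only multiplies and subtractions.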
Data Memory Requirements
Data to be placed in External memory
1306
DSP Implementation: Multistage Detection
16-bit Fixed Point C Code
Code optimized for the DSP
Use of assembly code for the critical part
– TI's C62 fixed point assembly benchmarks for Dot Product
Data memory requirements for Multistage Detection
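To give a feel for what a 16-bit fixed point dot product involves, here is a small model of one in Q15 format (values in [-1, 1) stored as signed 16-bit integers). The Q15 choice, the saturation behaviour and the function names are assumptions for illustration; they are not taken from the TI benchmark code mentioned above.

```python
Q15_ONE = 1 << 15


def to_q15(x):
    """Quantize a float in [-1, 1) to a signed 16-bit Q15 integer, saturating."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(x * Q15_ONE)))


def dot_q15(a, b):
    """Dot product of two Q15 vectors; returns a saturated Q15 result."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y                      # each product is Q30; accumulate at full width
    # Scale back from Q30 to Q15 once, after the accumulation.
    return max(-Q15_ONE, min(Q15_ONE - 1, acc >> 15))


a = [to_q15(v) for v in (0.5, -0.25, 0.125)]
b = [to_q15(v) for v in (0.5, 0.5, -0.5)]
result = dot_q15(a, b)                    # exact answer is 0.0625, i.e. 2048 in Q15
```

Keeping the accumulator wide and shifting only once at the end avoids losing precision on every multiply, which is the usual reason a dot product maps well onto fixed point hardware.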
Data Memory Requirements
Data can be placed completely in internal memory
Flops Count
[Figure: number of flops (×10^4) vs. total number of iterations (1 to 8) for the Conventional Method and the Differencing Method. Users: K = 15, SNR = 6 dB.]
2X speedup for a three-stage detector
Real-Time Requirements
Real-time capability by C6201 DSP
[Figure: max bit rate per user (kb/s) vs. number of users (8 to 14) for the Conventional Method and the Differencing Method. SNR = 10 dB, window size = 12. Marked point: 12 users, 150 kb/s.]
Trends in Recent DSPs
More internal memory and higher clock speeds
– C6203: 512 KB data, 384 KB program, 250 MHz
– Useful for uplink channel estimation algorithms
Specialized blocks in the DSP core
– Viterbi decoding in C54
Lower voltage operation
– 1.2 V in C5402, useful for saving power consumption in the mobile
ASIC Implementation
MOSIS Tiny-Chip (40-pin DIP)
– 8 synchronous users
– 12-bit fixed point implementation
– 6000 transistors
– 1.2 µm CMOS technology
– 190 kb/s for each user (@ 12.5 MHz)
– 3-stage cascade delay < 15 µs
Advantages of ASICs
Highly paralleled instructions: 4 RISC IPC (instructions per cycle)
– accumulating while shifting, loading and storing
– recoding while loading
Application specific architecture
– faster I/O
– smaller on-chip memory
– smaller ALU
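Chip (Single Stage) Architecture
[Block diagram: SHIFT, ALU, RECODER, REG and Control Logic blocks operating on the decisions $\hat{d}^{(l-1)}$ and $\hat{d}^{(l)}$, the soft outputs $z^{(l)}$ and $z^{(l+1)}$, and the matrix $(L + L^T)A$; internal and external signals are drawn differently.]
$z^{(l+1)} = z^{(l)} - (L + L^T) A \hat{x}^{(l)}$, where $\hat{x}^{(l)} = \hat{d}^{(l)} - \hat{d}^{(l-1)}$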
Chip Layout
[Figure: chip layout showing the 12-bit ALU, soft decisions, cross-correlation and recoding logic areas; scale marking 2.0 mm.]
The Actual Chip Photograph
3-stage Cascade Mode
[Block diagram: three cascaded stages, each with Sin, Hin, Fin, Load and CLK inputs and Sout, Hout and Fout outputs. Labelled signals: Matched Filter Output, Detector Output, HandShaking, Load R, Clock, Output Valid.]
System Timing
[Timing diagram: Load R, 1st Stage, 2nd Stage, 3rd Stage, Final Output.]
Interference Cancellation
10100000 00100000 00100000
Scalable ASIC Design
 | Current Tiny-Chip | Chip in the future
Features | 8 synchronous users, 12-bit fixed point | 30 asynchronous users, 16-bit fixed point
Clock rate | 12.5 MHz | 100 MHz
Internal registers | 0.3 kb | 8 kb
ALU | 12-bit partial carry look-ahead adder | Three 16-bit full carry look-ahead adders
Transistors | 6K | 100K
Output bandwidth | 1.5 Mb/s | 3.0 Mb/s
Design method | layout | VHDL synthesis
Xilinx FPGA XC4000: 500k gates, 96 MHz
DSP-ASIC Comparison
8 users | DSP (C6201) | ASIC (Tiny Chip)
Clock | 200 MHz | 12.5 MHz
Precision | 16-bit | 12-bit
Speed | 300 kb/s/user | 190 kb/s/user
Complexity | ~10M transistors (0.25 µm) | 6K transistors (1.2 µm)
Design cycle | short | long
• TI's 'C54xx: General purpose DSP core + ASIC
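One way to read the comparison is to normalise throughput by clock rate. The short calculation below does this with the 8-user figures from the table; the bits-per-cycle metric itself is an assumption introduced here for illustration, not a number reported on the slides.

```python
def bits_per_cycle(users, rate_kbps_per_user, clock_mhz):
    """Aggregate detected bits per clock cycle for a given device."""
    return users * rate_kbps_per_user * 1e3 / (clock_mhz * 1e6)


dsp = bits_per_cycle(users=8, rate_kbps_per_user=300, clock_mhz=200)     # C6201
asic = bits_per_cycle(users=8, rate_kbps_per_user=190, clock_mhz=12.5)   # Tiny Chip
ratio = asic / dsp
```

On these numbers the ASIC detects roughly ten times as many bits per clock cycle as the DSP, even though its absolute per-user rate is lower.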
Other Current Projects
Simulation Testbed
– Entire chain of algorithms
• Simulink - RTW
• Rapid Prototyping
• Matlab to DSP
Copper Contest
– Implementation of Multistage Detector using 0.15 micron Copper Technology
Wireless LAN Project
Home Area Wireless LAN
High Speed Office Wireless LAN
Outdoor CDMA Cellular Network
Future Work
Fixed Point Implementations on DSPs/ASICs
– Uplink & Downlink Algorithms
Approximations using Linear Algebra
Support Long Codes and Fading
Multistage Detector
– Execution time predictability
– Increase efficiency
GPP Comparisons: Praful, Partha, Dr. Adve
Effect of DMA and Caches