
Implementing Multiuser Channel Estimation and Detection for W-CDMA Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro and Behnaam Aazhang Rice


  • Implementing Multiuser Channel Estimation and Detection for W-CDMA

    Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro and Behnaam Aazhang

    Rice University

    {sridhar,skrishna,cavallar,aaz}@rice.edu

    This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF

  • Organization

    Joint Estimation & Detection
    An Implementation-Friendly Scheme
    Simulations
    Architectural Features
    Task Partitioning
    Area-Time Tradeoffs
    Conclusions
    Future Work

  • Base-Station with MUD

  • Joint Estimation & Detection

    Jointly estimate the channel response and detect all the users' bits.
    Shown to have better performance as well as reduced computational complexity.
    Maximum Likelihood Based Channel Estimation [C. Sengupta et al.: PIMRC1998, WCNC1999]
    Differencing Multistage Detection based on Parallel Interference Cancellation [G. Xu et al.: SPIE1999]

  • Computations Involved

    Model

    Compute Correlation Matrices

    Bits of K async. users aligned at times I and I-1
    Received bits of spreading length N for K users
    delay
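A minimal sketch of the correlation-matrix step, assuming the usual linear model r = A b + noise, where b stacks the bits of the K users at times I-1 and I (length 2K) and r is the received vector of length N. The slide's equations did not survive extraction, so the shapes (Rbb as 2K x 2K, Rbr as N x 2K) and the function name `accumulate_correlations` are assumptions for illustration, and real-valued samples are used for brevity.

```python
def accumulate_correlations(bits, received):
    """Sum b*b^T into Rbb and r*b^T into Rbr over a window of symbols.

    bits:     list of length-2K vectors with entries +1 / -1 (assumed layout)
    received: list of length-N vectors of real samples
    """
    two_k = len(bits[0])
    n = len(received[0])
    r_bb = [[0] * two_k for _ in range(two_k)]
    r_br = [[0] * two_k for _ in range(n)]
    for b, r in zip(bits, received):
        # Auto-correlation of the bit vector: O(K^2) work per symbol.
        for i in range(two_k):
            for j in range(two_k):
                r_bb[i][j] += b[i] * b[j]
        # Cross-correlation of received samples with bits: O(KN) per symbol.
        for i in range(n):
            for j in range(two_k):
                r_br[i][j] += r[i] * b[j]
    return r_bb, r_br


r_bb, r_br = accumulate_correlations(
    bits=[[1, -1], [1, 1]],
    received=[[2, 0, 1], [1, 1, 0]],
)
```

The per-symbol costs in the comments match the O(K^2) and O(KN) figures quoted for Rbb and Rbr on the task-decomposition slide.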

  • Multishot Detection

    Solve for the channel estimate, Ai

  • Differencing Multistage Detection

    Stage 0 [Matched Filter Detector]

    Stage 1 [to build differencing vector]

    Successive Stages

    S = diag(A^H A)

    y - soft decision

    d - detected bits (hard decision)
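The stage equations are missing from the extracted slide, so the following is only a sketch of one common formulation of differencing parallel interference cancellation, not necessarily the exact recursion used by the authors. It assumes a real-valued K x K correlation matrix H = A^H A and matched-filter outputs y0; stage 1 cancels the full interference, and later stages update the soft decision using only the bits that changed. The names `hard_decision` and `pic_differencing` are hypothetical.

```python
def hard_decision(y):
    """Map soft decisions y to hard bits d in {+1, -1}."""
    return [1 if v >= 0 else -1 for v in y]


def pic_differencing(h, y0, stages):
    """Multistage PIC with differencing (assumed formulation).

    h:  K x K real correlation matrix A^H A
    y0: matched-filter soft outputs (stage 0)
    """
    k = len(y0)
    # Interference matrix H - S with S = diag(H).
    off = [[h[i][j] if i != j else 0 for j in range(k)] for i in range(k)]
    d_prev = hard_decision(y0)                       # stage 0
    y = [y0[i] - sum(off[i][j] * d_prev[j] for j in range(k))
         for i in range(k)]                          # stage 1
    d = hard_decision(y)
    for _ in range(stages - 1):                      # successive stages
        x = [d[j] - d_prev[j] for j in range(k)]     # entries are 0, +2 or -2
        if not any(x):
            break                                    # bits have converged
        # Only columns whose bit flipped contribute to the update.
        y = [y[i] - sum(off[i][j] * x[j] for j in range(k) if x[j])
             for i in range(k)]
        d_prev, d = d, hard_decision(y)
    return d, y


# Two users with a near-far imbalance; the true bits are [+1, -1].
h = [[1.0, 1.5], [1.5, 9.0]]
y0 = [-0.5, -7.5]
d, y = pic_differencing(h, y0, stages=3)
```

In this example the matched filter alone decides the weak user's bit wrongly, and the first cancellation stage corrects it. Because most bits stop changing after a stage or two, the differencing vector is mostly zero, which is where the computational savings come from.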

  • Structure of A^H A

    Not difficult to compute A^H A
    Block Bi-Diagonal Matrix: Use Structure

  • Drawbacks

    Matrix Inversion / Decomposition Needed
    Result not available till end of computation
    Delay before Detection
    Difficult for Tracking
    Higher Precision Needed
    Floating Point Units
    Larger Memory Requirements
    Storage of elements to compute inverse
    Float = 32 bits / Input accuracy = 12-14 bits
    SLOW! - Difficult to meet Real-Time [S. Rajagopal et al.: TI DSPFest1999]

  • Proposed Base-Station

    No Multiuser Detection
    TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm)

  • New Scheme

    Iterative Method to find the Channel Estimates [S. Bhashyam et al.: WCNC2000 (submitted)]
    Can be easily adapted to Tracking for Fading Channels
    Fixed Point Implementation
    Estimates ready for detection Immediately
    Simpler Hardware and Software.
    Computation Savings only Per Bit

  • Iterative Scheme

    Tracking
    Slow Fading: Large Window L
    Fast Fading: Smaller Window L
    Method of Steepest Descent
    Stable convergence behavior
    fixed: Bit-by-Bit update
    Matches Closely to the Scheme with Inversions
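The update equation is not preserved on the slide. As a sketch under stated assumptions, a steepest-descent step on the least-squares cost replaces the inversion A = Rbr Rbb^-1 with the recursion A <- A - mu (A Rbb - Rbr), taking one step per received bit. The power-of-two step size mu = 2^-shift is an assumption suggested by the ">>" block in the channel-estimation block diagram; the function name and the real-valued, floating-point arithmetic are for illustration only.

```python
def steepest_descent_step(a, r_bb, r_br, shift):
    """One step A <- A - 2^-shift * (A*Rbb - Rbr) (assumed update rule).

    a:    N x 2K current channel estimate
    r_bb: 2K x 2K bit auto-correlation matrix
    r_br: N x 2K cross-correlation matrix
    """
    n, m = len(a), len(a[0])
    mu = 1.0 / (1 << shift)  # a shift would replace this multiply in fixed point
    new_a = []
    for i in range(n):
        row = []
        for j in range(m):
            grad = sum(a[i][k] * r_bb[k][j] for k in range(m)) - r_br[i][j]
            row.append(a[i][j] - mu * grad)
        new_a.append(row)
    return new_a


# Toy case with N = 1 and 2K = 2; the inversion-based answer is [[2, -1]].
r_bb = [[4, 0], [0, 4]]
r_br = [[8, -4]]
a = [[0.0, 0.0]]
for _ in range(30):
    a = steepest_descent_step(a, r_bb, r_br, shift=3)
```

The iteration converges only if mu is small relative to the largest eigenvalue of Rbb, so in practice the shift would need to grow with the window length L.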

  • Simulations - AWGN Channel

    Detection Window = 12
    SINR = 0
    Paths = 3
    Preamble = 150
    10000 bits/user
    MF - Matched Filter
    ML - Maximum Likelihood
    ACT - using inversion

  • Fading Channel with Tracking

    Doppler = 10 Hz, 1000 Bits, 15 users, 3 Paths

  • DSP Implementation

    C6201 Texas Instruments
    Fixed Point Processor
    200 MHz
    32-bit VLIW Architecture
    8 Functional Units
      2 Multipliers
      4 Adders
      2 Load/Store
    TI C Compiler

  • Simulation

    Work in Progress!
    Why better?
      Fixed Point Implementation - Faster on DSPs
      Higher Clock Speeds / Faster Multiplications
      More SIMD Parallelism due to smaller wordlength.
      Software Code Simpler to write
      Smaller Program Size
    Problems
      Input Bit Precision Analysis
      Overflows

  • Task - Partitioning the Algorithm

  • Task Decomposition [S. Das et al.: Asilomar99]

    [Figure: task graph over TIME, split into Block I, Block II, Block III, Block IV. Inputs b (Pilot / Data, via MUX) and d (Data, via MUX).]

    Task A - Channel Estimation
      Correlation Matrices (Per Bit): Rbb O(K^2), Rbr[R] O(KN), Rbr[I] O(KN)
      Iterate: A[R] O(K^2 N), A[I] O(K^2 N)

    Task B - Multistage Detection
      Matrix Products: A0^H A0 O(K^2 N), A0^H A1 O(K^2 N), A1^H A1 O(K^2 N), A^H r O(KND)
      Multistage Detection (Per Window): O(DK^2 M)

  • Channel Estimation Architecture

    Detection Architecture: One version already ready [G. Xu - Masters Thesis 1999]
    Advantages over DSP Implementation:
      Optimal Memory Utilization
      Custom Blocks for exploiting available pipelining and parallelism
      Parts could be mapped to FPGA / Reconfigurable logic
      Shows theoretical bounds for maximum achievable Data Rates
      Shows how tasks could be split among different processors

  • Block Diagram

    [Figure: block diagram. Legend: bit / 8-bit signals, REAL / IMAG parts. Each block shows the no. of operations in it.]

  • Channel Estimation Window

    [Figure: channel estimation datapath, REAL part. Each block shows the no. of operations in it.]

    Inputs: b0 (2K), r0 (N), b, r[R]
    b0 b0^T (2K^2), b b^T (2K^2)
    MUX (2K), MUX (N), MUX (2K^2)
    Inverter (2K), Inverter (2K^2)
    Rbb (2K^2)
    Rbr[R] (KN)
    Multiplier (2K^2 N)
    Atmp[R]
    >> (4K^2)
    A[R] (KN)

  • Auto-correlation Structure

    b, b0 are 1-bit
    Subtraction by using inverter
    Rbb using a Counter
    Fully Parallel: 2K^2 elements, O(1) Time
    Pipelined [with LOAD]: 2K elements, O(K) Time
    Serial [with LOAD]: 1 element, O(2K^2) Time

    [Figure: Rbb block (2K^2)]
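A sketch of how the windowed auto-correlation could be maintained with counters only: each new bit vector b adds b b^T, and the vector b0 leaving the window subtracts b0 b0^T, so every entry of Rbb moves by -2, 0 or +2 per bit. The class name `SlidingAutoCorrelation` and the use of a software queue in place of the hardware shift-register window buffer are assumptions for illustration.

```python
from collections import deque


class SlidingAutoCorrelation:
    """Windowed Rbb kept as up/down counters (illustrative software model)."""

    def __init__(self, size, window):
        self.window = window
        self.buffer = deque()  # stands in for the window shift registers
        self.r_bb = [[0] * size for _ in range(size)]

    def update(self, b):
        """Add the newest bit vector b and drop the oldest one, if any."""
        self.buffer.append(b)
        b0 = self.buffer.popleft() if len(self.buffer) > self.window else None
        for i in range(len(b)):
            for j in range(len(b)):
                step = b[i] * b[j]            # +1 or -1 for 1-bit inputs
                if b0 is not None:
                    step -= b0[i] * b0[j]     # remove the outgoing product
                self.r_bb[i][j] += step       # counter counts up or down


acc = SlidingAutoCorrelation(size=2, window=2)
for bit_vector in ([1, 1], [1, -1], [-1, 1]):
    acc.update(bit_vector)
```

After the third update the window holds only the last two vectors, so `acc.r_bb` equals the sum of their outer products.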

  • Cross-Correlation Structure

    r is 8-bit, b is 1-bit
    Rbr using 8-bit Adders, based on sign of b
    Fully Parallel: KN, O(1)
    Pipelined: N, O(K)
    Serial: 1, O(KN)
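Because b is a single bit, the product r * b never needs a multiplier: the sign of b just selects between adding and subtracting r. A minimal sketch of that idea follows, with a hypothetical function name and without modelling the 8-bit word length or overflow.

```python
def update_cross_correlation(r_br, r, b):
    """Accumulate r*b^T into r_br (N x 2K) using add/subtract only.

    r: length-N received samples (integers)
    b: length-2K bits with entries +1 / -1
    """
    for i, sample in enumerate(r):
        for j, bit in enumerate(b):
            if bit > 0:
                r_br[i][j] += sample   # bit = +1: add
            else:
                r_br[i][j] -= sample   # bit = -1: subtract
    return r_br


r_br = update_cross_correlation([[0, 0], [0, 0]], r=[3, -2], b=[1, -1])
```

The inner loops are independent across all KN entries, which is what allows the fully parallel, pipelined and serial variants listed on the slide.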

  • Iterative Update Structure

    8-bit Multipliers
    16-bit Adders for Multiplier
    8-bit Adders for A
    Parallel: KN, O(K)
    Pipelined: N, O(K^2)
    Serial: 1, O(K^2 N)

  • Elements in each block

    Example: N = 32, L = 100, K = 32
    Fully Parallel Solution: 4K Multipliers, 12K Adders: O(32) Time
    Pipelined Solution: 100 Multipliers, 300 Adders: O(1K) Time

    Block              Requires                 Area-Time Tradeoff   Fully Parallel Implementation
    bb^T, b0b0^T       1-bit AND gates          2K^2                 2K^2
    Rbb                8-bit UP/DOWN counters   2K [with LOAD]       2K^2
    Rbr[R,I]           8-bit adders             2N                   4KN
    Y[R,I]             8-bit adders             4K                   4KN
    Multiplier [R,I]   8-bit multipliers        4K                   4KN
                       16-bit adders            4K                   4K
    Window Buffer      Shift registers: 1-bit   L                    L
                       Shift registers: 8-bit   2L                   2L
    Atmp[R,I]          8-bit subtractors        2K                   4KN
    TIME                                        O(K^2)               O(K)
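As a rough cross-check of the example figures, the per-block expressions can be evaluated for K = N = 32. This sketch assumes one particular reading of the flattened table: 4KN multipliers and three 4KN banks of 8-bit adders (Rbr, Y and the Atmp subtractors) in the fully parallel design, and 4K multipliers with 2N + 4K + 2K 8-bit adders in the pipelined design. The function name is hypothetical.

```python
def resource_counts(k, n):
    """Evaluate the assumed table expressions for K users and spreading length N."""
    pipelined = {
        "multipliers_8bit": 4 * k,
        "adders_8bit": 2 * n + 4 * k + 2 * k,   # Rbr + Y + Atmp
        "adders_16bit": 4 * k,
    }
    fully_parallel = {
        "multipliers_8bit": 4 * k * n,
        "adders_8bit": 3 * (4 * k * n),         # Rbr + Y + Atmp
        "adders_16bit": 4 * k,
    }
    return pipelined, fully_parallel


pipelined, fully_parallel = resource_counts(k=32, n=32)
print(pipelined)
print(fully_parallel)
```

Under this reading the fully parallel design needs 4096 multipliers and 12288 8-bit adders, which agrees with the slide's "4K Multipliers, 12K Adders" if "K" there means 1024. The pipelined counts (128 multipliers, 256 8-bit adders) are in the same range as the slide's rounded "100 Multipliers, 300 Adders".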

  • Conclusions

    Iterative Scheme for Joint Estimation & Detection
      No loss in algorithm performance
    Suitable for Hardware Implementation
      On DSPs, FPGAs and ASICs
    Supports Tracking for Fading Channels
    Fixed Point Implementation Feasible
    ASIC architecture
      To exploit available pipelining and parallelism
    Multiuser Channel Estimation and Detection algorithms POSSIBLE to IMPLEMENT for W-CDMA.

  • Future Work

    MS
      Extend Architecture to Long Codes
      Task Partition the algorithm on the Sundance Multi-DSP/FPGA board to achieve real-time
    Post-MS
      Downlink Architectures to Min. Power Consumption / Area
      Implementing Coding/Decoding Blocks and integrate
    RENE

  • EXTRA SLIDES

  • Data Rates Achieved

    Assuming Channel Estimation Real-Time

  • Fading Channel SNR = 10 dB, Doppler = 10 Hz, 1000 Bits