Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

  • View
    18

  • Download
    0

Embed Size (px)

DESCRIPTION

Efficient VLSI architectures for baseband signal processing in wireless base-station receivers. Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF. Introduction. - PowerPoint PPT Presentation

Text of Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

  • Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

    Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam AazhangThis work is supported by Nokia, TI, TATP and NSF

  • IntroductionReal-time VLSI architecture for multiuser channel estimationMultiuser channel estimation usually neglected high computational complexity - DSPs infeasibleSingle user sliding correlator structures usedIterative fixed point algorithm developedArea-Time tradeoffs presentedArea-Constrained,Time-Constrained, Area-Time efficient

  • Baseband signal processingBase-Station ReceiverChannel estimationDetectionDecodingMultiple Users AntennaDetected BitsTrackingTraining

  • Channel estimationcompensate for unknown fading amplitudes and asynchronous delays.

  • Need for multiuser channel estimationDetector performance depends on accuracy of channel estimatorMultiuser Channel EstimationJointly estimate parameters for all usersBetter performance than single user estimates

  • Computing multiuser channel estimatesComputed by sending a training sequence of known bits to the receiver.When absent, detected bits can be used to update estimates in a decision feedback mode for tracking.Importance of multiuser estimation usually neglected May exceed detector complexity

  • Multiuser Channel Estimation Algorithm

    = {+1, -1} : Training/Tracking bits = 8-bit integer (complex) : Received signal N = spreading gain (typically fixed ,e.g: 32) K = number of users (variable,

  • Implementation complexityMatrix inversions (size 32x32) per windowUnable to meet real-time on DSPs [Asilomar99]VLSI fixed-point architectures for matrix inversionsDifficult to design , Finite precision problemsTypically, simpler single-user sliding correlator structures used.

  • OutlineWhat is multiuser channel estimation?Need for multiuser channel estimationImplementation problemsAlgorithm enhancementsVLSI architecturesArea-constrained,Time-constrained, Area-Time efficientConclusions

  • Iterative scheme for channel estimationBit-streaming : suitable for tracking (window length L)Method of gradient descentStable convergence behaviorSimple fixed-point VLSI architecture

  • Simulations - Static multipath channelSINR = 0 dBPaths =3Preamble =150Spreading N = 31Users K = 15

  • Rayleigh Fading channel with trackingDoppler = 10 Kmph

  • OutlineWhat is multiuser channel estimation?Need for multiuser channel estimationImplementation problemsAlgorithm enhancementsVLSI architecturesArea-constrained,Time-constrained, Area-Time efficient Conclusions

  • Area-Time TradeoffsDesign for 32 users (K) and spreading code (N) 32Target = 128 Kbps (4000 cycles at 500 MHz).Assume single cycle addition/multiplicationArea-Constrained Architecture :Pico-cells/fewer usersTime-Constrained Architecture : Maximum data ratesArea-Time Efficient Architecture : Real-Time

  • Task decomposition: channel estimation

  • Architecture design: auto-correlation

    b = {+1,-1}Multiplication is a XNOR operationMatrix updated using XNOR gatesAuto-correlation matrix implemented as an UP/DOWN counter(s)

  • Architecture design: cross-correlationb = {+1,-1}, r = 8-bit integer vector (complex) Multiplications reduce to additions/subtractionsMatrix (complex) can be updated with 8-bit addersCross-correlation matrix stored as RAM.

  • Architecture design: channel estimateA = 8-bit integer matrix (complex)
  • Area-Constrained Architectureb0bChannel Estimate

  • Area-constrained Architecture: Hardware Requirements

    Blocks

    Quantity

    Full Adder Cells

    Complex

    Total

    Counter

    1*8

    8

    -

    8

    Multiplier

    1*8

    64

    *2

    128

    Adders

    3*8 + 2*16

    56

    *2

    112

    Total Area

    248

    FA cells

    Total Time

    (N=K=32)

    4K2N

    128,000 cycles

  • Time-constrained Architectureb*bTb0*b0Tbb0MUXRbrMUXrr0MUXRbbAMultSubtract>>Subtract2K*12K*12K*1K(2K-1)*1K(2K-1)*12K2*82KN*162KN*162KN*82K*1N*8N*8N*82KN*82KN*8ChannelEstimate

  • Auto-correlation Update in ParallelRbb(i,j)CounterbbT(i,j)U/D#Rbb(i,i)Counter1U/D#Array of XNORsabacadbcbdcdbcdab (2K)bbT (K*{2K-1}*1)Rbb (2K2*8)Array of Counters

  • Cross-Correlation Update in Parallelr(j)

  • Time-constrained Architecture: Hardware Requirements

    Blocks

    Quantity

    Full Adder Cells

    Complex

    Total

    Counter

    2K2*8

    16K2

    -

    16K2

    Multiplier

    4K2N*8

    256K2N

    *2

    512K2N

    Adders

    2KN*16 + 2KN*8 + 4K2N*16

    48KN + 64K2N

    *2

    96KN + 128K2N

    Total Area

    (N=K=32)

    20,000,000

    FA cells

    Total Time

    Log2(2K)

    6 cycles

  • Area-Time efficient architecture designArea - constrained ArchitectureMinimize area - single 8-bit multiplier4K2N cycles (128,000 cycles ; 3.81 Kbps)Time-constrained ArchitectureMinimize time - 4K2N 8-bit multipliersLog2(2K) cycles (6 cycles ; 83.33 Mbps)Aim : To meet real-time with min. area overheadDifferent parallelism levels for multipliers

  • Area-Time Efficient ArchitectureChannel Estimate

  • Area-Time Efficient Architecture: Hardware Requirements

    Blocks

    Quantity

    Full Adder Cells

    Complex

    Total

    Counter

    2K*8

    16K

    -

    16K

    Multiplier

    2K*8

    128K

    *2

    256K

    Adders

    2K*16 + 2*8 + 1*16

    32K + 32

    *2

    64K + 64

    Total Area

    (N=K=32)

    10,000

    FA cells

    Total Time

    2KN

    2,000 cycles

  • OutlineWhat is multiuser channel estimation?Need for multiuser channel estimationImplementation problemsAlgorithm enhancementsVLSI architecturesArea-constrained,Time-constrained, Area-Time efficient Conclusions

  • ComparisonsDSPs unable to exploit bit-level parallelismInefficient storage of bitsReplacing multiplications by additions/subtractions

    Implementation

    Clock

    Rate

    Full Adder

    Cells

    Data Rates

    C67 DSP

    166 MHz

    -

    1.02 Kbps

    Area

    500 MHz

    248

    3.81 Kbps

    :

    :

    :

    :

    Area-Time

    500 MHz

    104

    256 Kbps

    :

    :

    :

    :

    Time

    500 MHz

    2x107

    83.33 Mbps

  • Scalability of Architectures with KDisadvantages of VLSI architecturesDesign for maximum number of users in the systemIf there are fewer users, Turn off functional units to reduce powerReconfigure hardware for higher data rates (FPGA)

    Dr. Cavallaro, dont know to handle this Question properlyWe never designed an architecture/algorithm for varying number of users dynamically. (Though we had started on it)What should be included in future work?Please give suggestions!!

  • ConclusionsReal-Time VLSI architecture for multiuser channel estimationIterative fixed-point algorithm developed to avoid matrix inversionsArea-Time Tradeoffs presentedArea-Constrained, Time-Constrained, Area-Time efficient VLSI architectures exploit bit-level computations and parallelism to meet real-time.

Recommended

View more >