WiMAX Basestation: Software Reuse Using a Resource Pool

Cory Modlin, Wireless Systems Architect
L. N. Reddy, Wireless Software Manager
Arnon Friedmann, SW Product Manager

2011. 8. 6.



  • Outline

    • Overview of Problem
    • Traditional Approaches
    • Resource Pool Approach
    • Realization

  • Our Goals

    • Mobile WiMAX (802.16e) PHY baseband base station demonstration
    • Single scalable architecture
      – Single to multiple sectors
      – From pico to macro base station
      – Single antenna to multiple antennae
    • Multiple processors
      – C6455 DSPs: add more DSPs as more processing is needed
      – FPGA

  • WiMAX Brief Overview (1)

    • OFDM downlink / "OFDMA" uplink
    • 5 to 20 MHz bandwidth
      – 512 to 2048 OFDM subcarriers
    • 23/6 Mbit/s (downlink/uplink) at 10 MHz
    • TDD or FDD
      – 5 ms "frames" = ~50 OFDM symbols
    • Advanced features
      – DL beamforming/MIMO
      – UL MIMO
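The "~50 OFDM symbols" figure can be checked with a short calculation. The sketch below assumes parameters that are not on the slide: a 10 MHz channel with a 1024-point FFT, a sampling factor of 28/25, and a 1/8 cyclic prefix, as commonly quoted for Mobile WiMAX profiles. The function name is illustrative only.

```python
from fractions import Fraction


def ofdm_symbols_per_frame(bandwidth_hz, fft_size,
                           frame_s=Fraction(5, 1000),
                           sampling_factor=Fraction(28, 25),
                           cp_ratio=Fraction(1, 8)):
    """Return (symbol duration in seconds, whole symbol periods per frame).

    sampling_factor and cp_ratio are assumed values, not taken from the slides.
    """
    sampling_rate = bandwidth_hz * sampling_factor       # samples per second
    useful_time = Fraction(fft_size) / sampling_rate     # FFT window duration
    symbol_time = useful_time * (1 + cp_ratio)           # add the cyclic prefix
    return symbol_time, int(frame_s // symbol_time)


symbol, count = ofdm_symbols_per_frame(10_000_000, 1024)
print(f"symbol duration: {float(symbol) * 1e6:.1f} us, "
      f"symbol periods per 5 ms frame: {count}")
```

Under these assumptions about 48 symbol periods fit in a 5 ms frame before any TDD transition gaps are subtracted, which is consistent with the "~50" on the slide.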

  • WiMAX Brief Overview (2)

    [Figure: WiMAX TDD frame (5 ms), showing DL Bursts #1 to #7 and UL Bursts #1 to #5. Figure from "Mobile WiMAX – Part I: A Technical Overview and Performance Evaluation", WiMAX Forum, April 2006.]

  • Unique Challenges for Advanced Wireless OFDM Base Stations

    • High-complexity MIMO algorithms require > 1 DSP
    • A single user can consume either a small fraction of a frame or the entire frame
      – Cannot statically divide processing by user
    • Processing load can vary substantially from frame to frame
      – MIMO receiver >> non-MIMO
      – Turbo decoder >> convolutional decoder
      – Beamformed user >> single DL antenna
    • A system designed for the worst case could be substantially overdesigned
      – Worst case for one burst might not be sustainable over an entire subframe
      – In TDD, worst-case UL and worst-case DL cannot coexist
      – MAC scheduler controls allocation of users and can keep control over resource requirements

  • Outline

    • Overview of Problem
    • Traditional Approaches
    • Resource Pool Approach
    • Realization

  • Two General Approaches for Taking Advantage of Multiple Processors

    • Compiler/programming language that abstracts the hardware from the software designer
      – Application software is not aware of the physical topology
      – Example: remote procedure call (RPC), CORBA...
    • Static allocation of resources among processors
      – Software architecture places functional blocks on specific processors
      – Placement of functional blocks is an integral part of the application

    [Figure labels: processor 1, processor 2 (shown for each approach)]

  • Disadvantages of the Compiler Approach for Our Application

    • Physical location of function calls is determined at compile time
      – Does not allow dynamic flexibility at run time
      – Run-time flexibility is desirable for efficient use of resources and for failure recovery
    • Host/client thread is blocked while a remote function is called
      – Waits for the function to return
    • A lot of data movement between processors
      – There is overhead for moving data
      – Interprocessor link can be high latency, so we do not want to require that results come back
    • Overhead for a generic approach like RPC/IDL (interface definition language)
      – Need to balance the desire for a generic interface with real-time processing requirements

    [Figure labels: processor 1, processor 2]
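The difference between a blocking remote call and a posted job can be sketched as follows. This is an illustration only: `fec_decode` is a hypothetical stand-in for work done on another processor, and a Python thread with a queue stands in for the remote DSP and the interprocessor link.

```python
import queue
import threading


def fec_decode(block):
    """Hypothetical stand-in for a job that runs on another processor."""
    return block[::-1]


def blocking_call(blocks):
    """RPC style: the caller waits for each result before continuing."""
    return [fec_decode(b) for b in blocks]


def post_jobs(blocks):
    """Posting style: the caller queues every job and never waits for a return value."""
    jobs = queue.Queue()
    done = []  # results stay on the worker side, as they would on a remote DSP

    def worker():
        while True:
            block = jobs.get()
            if block is None:          # sentinel: no more jobs
                break
            done.append(fec_decode(block))

    remote = threading.Thread(target=worker)
    remote.start()
    for b in blocks:
        jobs.put(b)                    # returns immediately
    jobs.put(None)
    remote.join()                      # only so this demo can inspect `done`
    return done


print(post_jobs([b"abc", b"xyz"]))
```

In the posting style no result has to travel back over the link, which is the property the slide asks for when the interprocessor link has high latency.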

  • Common Communications Infrastructure Architectures

    • DSL
    • CDMA

    [Figure: downlink blocks (FEC encoder/interleaver, modulation, pulse shaping) and uplink blocks (equalizer, demodulation, FEC decoder) placed on DSP0, DSP1, and DSP2]

    • Pipelined concurrency for symbol rate vs. chip rate
    • Parallel concurrency for uplink/downlink split

  • Problems With Static Allocation of Resources

    [Figure: FEC encoder/interleaver, modulation, pulse shaping, equalizer, demodulation, and FEC decoder blocks; processor label DSP0]

    • Single antenna, single sector, simple FEC code
    • All fits on a single DSP

    • Single antenna, single sector
    • Add turbo encoder and decoder
    • Need to split uplink and downlink
    • Need hardware accelerator/FPGA for turbo decoder

  • Problems With Static Allocation of Resources

    [Figure: FEC encoder/interleaver, modulation, pulse shaping, equalizer, demodulation, and FEC decoder pre/post processor blocks; processor labels DSP0 and DSP1; FEC decoder on FPGA]

    • Single antenna, single sector
    • Add turbo encoder and decoder
    • Need to split uplink and downlink
    • Need hardware accelerator/FPGA for turbo decoder

    • Add support for multiple antennas or multiple sectors
    • Add MIMO
    • Need to split uplink into parts

  • Problems With Static Allocation of Resources

    [Figure: FEC encoder/interleaver, modulation, pulse shaping, MIMO equalizer, demodulation, and FEC decoder pre/post processor blocks; processor labels DSP0, DSP1, and DSP2; FEC decoder on FPGA]

    • Add support for multiple antennas or multiple sectors
    • Add MIMO
    • Need to split uplink into parts

    • Implement design and discover that DSP2 is 101% loaded
    • Need to split processing done on DSP2 into parts

  • Problems With Static Allocation of Resources

    [Figure: FEC encoder/interleaver, modulation, pulse shaping, demodulation, FEC decoder pre/post processor, and several MIMO equalizer blocks; processor labels DSP0, DSP1, DSP2, and DSP3; FEC decoder on FPGA]

    • Implement design and discover that DSP2 is 101% loaded
    • Need to split processing done on DSP2 into parts

  • Problems With Static Allocation of Resources

    • Limited reuse
      – For a common hardware platform, division of resources among processors will be different for different standards
      – Changes in complexity (e.g. more antennas) or addition of features (e.g. beamforming) require substantial redesign
      – Even a small change to a function on a heavily loaded processor could completely change the architecture
    • Inefficient
      – Worst-case loading on each DSP might be well under 100% because of the way functions are divided
      – Each DSP must be provisioned for its worst case even if the combined worst case is never possible
    • Example in time division duplexing (TDD)
      – Worst-case downlink is when most of the frame is dedicated to downlink
      – Worst-case uplink is when most of the frame is dedicated to uplink
      – But the "worst worst case" (both at once) is never possible
      – Headroom for an unexpected worst case is limited: it cannot be distributed among the processors
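The TDD example can be made concrete with a small calculation. The load figures below are invented for illustration (they are not measurements from the slides) and are expressed as a percentage of one fully loaded DSP.

```python
import math

# Hypothetical (downlink_load, uplink_load) per TDD frame configuration,
# in percent of one DSP. These numbers are illustrative assumptions.
FRAME_LOADS = [
    (160, 40),    # frame dominated by downlink
    (100, 120),   # balanced split
    (30, 210),    # frame dominated by uplink
]


def static_provisioning(frames):
    """Downlink and uplink DSPs are sized separately, each for its own worst case."""
    worst_dl = max(dl for dl, _ in frames)
    worst_ul = max(ul for _, ul in frames)
    return worst_dl + worst_ul


def pooled_provisioning(frames):
    """A shared pool is sized for the worst total load of any single frame."""
    return max(dl + ul for dl, ul in frames)


static_load = static_provisioning(FRAME_LOADS)
pooled_load = pooled_provisioning(FRAME_LOADS)
print(f"static: {static_load}% -> {math.ceil(static_load / 100)} DSPs")
print(f"pooled: {pooled_load}% -> {math.ceil(pooled_load / 100)} DSPs")
```

With these assumed loads the static split must cover 370% (four DSPs), while a shared pool only has to cover the worst single frame at 240% (three DSPs), because the downlink and uplink worst cases never occur in the same frame.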

  • Outline

    • Overview of Problem
    • Traditional Approaches
    • Resource Pool Approach
    • Realization

  • Ideal Resource Pool

    • Pool of processors is abstracted from the application
    • Total processing power is equal to the sum of the individual processing powers
    • Resources are configurable at run time
    • Both parallel and pipelined division of resources is possible

    [Figure label: resource pool]

  • Resource Pool

    • Frame-by-frame configuration from the MAC is used as input to the resource pool controller
      – Coding, block sizes, and memory locations change from frame to frame
    • Management entity configures the resource pool controller
      – Number of processors
      – Allocation of functions to processors

    [Figure labels: MAC (host), Management Entity (host), Resource Pool Controller, Physical Layer Controller, Connectivity Layer, Resource Pool with Signal Processing blocks, DSP1, DSP2, FPGA]

  • Guiding Principles

    • There can be from 1 to n DSPs
    • MAC/management communicate with only one DSP
    • Same code image runs on all DSPs
    • Same data structures reside on all DSPs
    • Processors can talk to each other
    • Definition of jobs/functional blocks is done manually
      – No attempt to automate division of resources

    [Figure labels: MAC/management, PHY/resource pool controller]

  • Architecture Layers with Connectivity Layer

    [Figure: layer stacks for DSP 1 and DSP 2. On each DSP, WiMAX Signal Processing calls the Connectivity Layer to commit (copy) data and to post the next job. The Connectivity Layer determines the physical location of the destination, with paths labeled "if on same DSP" and "if on a remote DSP"; the remote path runs through the sRIO (PHY) driver and sRIO PHY. Other labels: Resource Pool Controller, PHY Controller.]

    • WiMAX processing is divided into jobs/functions
    • Jobs are called through the Connectivity Layer API
    • Connectivity Layer knows where each job runs
      – Allows abstraction of multiple processors from the application
    • PHY driver (RapidIO in our case) transfers the data or generates interrupts/notifications

  • PHY/Resource Pool Controller

    • Calculate a job descriptor for every job
      – For example, FEC coding parameters per codeword and the memory location of each codeword
    • Calculate a resource descriptor to designate the physical processor on which each job will run
    • Send job descriptors and DSP assignments to all DSPs
    • Job distribution is dynamic and configurable at run time
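The controller steps above can be sketched as follows. All names and fields here (`JobDescriptor`, `ResourceDescriptor`, `build_frame_plan`, the `allocation` table) are hypothetical, and the round-robin spread over the DSPs allowed for a function is an assumed policy; the slides do not say how the controller chooses among the processors configured by the management entity.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class JobDescriptor:
    job_index: int
    function: str        # functional block to run, e.g. "fec_encode"
    code_rate: str       # FEC coding parameter for this codeword
    block_bytes: int     # codeword size
    buffer_offset: int   # memory location, identical on every DSP


@dataclass(frozen=True)
class ResourceDescriptor:
    job_index: int
    dsp_id: int          # physical processor that will run the job


def build_frame_plan(codewords, allocation, function="fec_encode"):
    """Build one job and one resource descriptor per codeword for a frame.

    allocation maps a function name to the DSPs allowed to run it
    (assumed to come from the management entity).
    """
    dsps = cycle(allocation[function])
    jobs, resources, offset = [], [], 0
    for index, (code_rate, block_bytes) in enumerate(codewords):
        jobs.append(JobDescriptor(index, function, code_rate, block_bytes, offset))
        resources.append(ResourceDescriptor(index, next(dsps)))
        offset += block_bytes
    return jobs, resources


jobs, resources = build_frame_plan(
    codewords=[("1/2", 60), ("3/4", 90), ("1/2", 60)],
    allocation={"fec_encode": [1, 2]},
)
for job, res in zip(jobs, resources):
    print(job.job_index, job.code_rate, job.buffer_offset, "-> DSP", res.dsp_id)
```

Because both lists are rebuilt from the per-frame MAC configuration and the current allocation table, changing either one changes the job distribution without touching the signal processing code.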

  • Connectivity Layer API

    • Commit (copy) data (source address, destination address, data size, pointer to appropriate resource descriptor)
      – All memories involved in resource sharing reside on all processors
      – Connectivity Layer knows the starting memory location on all processors
      – Connectivity Layer knows which processor to copy to (could be the current processor) based on which application function called it
    • Notify/post (job to be posted, job index, pointer to appropriate job and resource descriptor)
      – Connectivity Layer causes an interrupt on the relevant processors (we use a RapidIO "doorbell" on a remote processor)
      – Connectivity Layer then posts the next job
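The two calls can be modeled with a toy example. This is a sketch under stated assumptions, not the presenters' API: the class, method signatures, and dictionary-based resource descriptor are hypothetical, each DSP's memory is simulated by a `bytearray` with the same layout, and a Python list stands in for the interrupt or RapidIO doorbell.

```python
class ConnectivityLayer:
    """Toy model of commit and post; every DSP memory has the same layout."""

    def __init__(self, local_dsp, memories):
        self.local_dsp = local_dsp
        self.memories = memories                      # dsp_id -> bytearray
        self.doorbells = {d: [] for d in memories}    # dsp_id -> posted (job, index)

    def commit(self, src_offset, dst_offset, size, resource):
        """Copy `size` bytes to the DSP named in the resource descriptor."""
        target = resource["dsp_id"]
        data = self.memories[self.local_dsp][src_offset:src_offset + size]
        self.memories[target][dst_offset:dst_offset + size] = data
        # On hardware the remote case would go through the sRIO driver.
        return "local" if target == self.local_dsp else "remote"

    def post(self, job, job_index, resource):
        """Notify the DSP that owns the job (stand-in for an interrupt/doorbell)."""
        self.doorbells[resource["dsp_id"]].append((job, job_index))


memories = {1: bytearray(16), 2: bytearray(16)}
memories[1][0:4] = b"\x01\x02\x03\x04"
layer = ConnectivityLayer(local_dsp=1, memories=memories)
next_job = {"dsp_id": 2}                              # resource descriptor
path = layer.commit(src_offset=0, dst_offset=8, size=4, resource=next_job)
layer.post("modulation", 0, next_job)
print(path, bytes(memories[2][8:12]), layer.doorbells[2])
```

The calling code is the same whether the next job runs locally or remotely; only the resource descriptor decides where the data and the notification go.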

  • Outline

    • Overview of Problem
    • Traditional Approaches
    • Resource Pool Approach
    • Realization

  • WiMAX Transmitter

    [Figure: transmit chain fed by the MAC over SRIO, with buffers between stages. Block labels: CRC calculation, randomizer, FEC encoder + interleaver, modulation (also drawn combined as "FEC and modulation"), permutation, IFFT (also drawn combined as "permutation + IFFT"). Two groupings are marked "resource pool (per codeword)" and "resource pool (per sector)".]

  • GUI to Configure the Resource Pool

  • Hardware Platform

    • AMC70k2000 (STx)
      – 4 C6455 DSPs
      – IDT RapidIO switch

    [Figure labels: Tundra RapidIO switch, lab development AMC carrier board (STx), General Purpose Processor with RapidIO]

  • end