WiMAX Basestation: Software Reuse Using a Resource Pool
Cory Modlin, Wireless Systems Architect [email protected]
L. N. Reddy, Wireless Software [email protected]
Arnon Friedmann, SW Product [email protected]
Outline
• Overview of Problem
• Traditional Approaches
• Resource Pool Approach
• Realization
Our Goals
• Mobile WiMAX (802.16e) PHY baseband base station demonstration
• Single scalable architecture
  – Single to multiple sectors
  – From pico to macro base station
  – Single antenna to multiple antennas
• Multiple processors
  – C6455 DSPs: add more DSPs as more processing is needed
  – FPGA
WiMAX Brief Overview (1)
• OFDM downlink/“OFDMA” uplink
• 5 to 20 MHz bandwidth
  – 512 to 2048 OFDM subcarriers
• 23/6 Mbit/s (downlink/uplink) at 10 MHz
• TDD or FDD
  – 5 ms “frames” = ~50 OFDM symbols
• Advanced features
  – DL beamforming/MIMO
  – UL MIMO
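The "~50 OFDM symbols per 5 ms frame" figure can be checked with a short calculation. The sketch below is illustrative only: the slides do not state the sampling factor (28/25), the cyclic prefix ratio (1/8), or the FFT size used at 10 MHz (1024), so these are assumed values taken from commonly quoted Mobile WiMAX parameters, and the function name is hypothetical.

```python
from fractions import Fraction


def ofdm_symbols_per_frame(bandwidth_hz, fft_size,
                           frame_s=Fraction(5, 1000),
                           sampling_factor=Fraction(28, 25),  # assumed, not from the slides
                           cp_ratio=Fraction(1, 8)):          # assumed, not from the slides
    """Return (subcarrier spacing in Hz, symbol time in s, symbols per frame)."""
    fs = bandwidth_hz * sampling_factor      # sampling frequency
    spacing = fs / fft_size                  # subcarrier spacing
    useful = 1 / spacing                     # useful symbol time
    symbol = useful * (1 + cp_ratio)         # useful time plus cyclic prefix
    return spacing, symbol, frame_s / symbol


spacing, symbol, count = ofdm_symbols_per_frame(10_000_000, 1024)
print(f"subcarrier spacing: {float(spacing):.1f} Hz")
print(f"symbol duration:    {float(symbol) * 1e6:.1f} us")
print(f"symbols per frame:  {float(count):.1f}")
```

Under these assumptions the result is a little under 49 symbol times per frame, consistent with the "~50" on the slide; in TDD some of that time is also spent in the transmit/receive transition gaps.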
WiMAX Brief Overview (2)
[Figure: WiMAX TDD frame (5 ms) showing DL bursts #1 to #7 and UL bursts #1 to #5. From “Mobile WiMAX – Part I: A Technical Overview and Performance Evaluation,” WiMAX Forum, April 2006.]
Unique Challenges for Advanced Wireless OFDM Base Stations
• High-complexity MIMO algorithms require more than one DSP
• A single user can consume either a small fraction of a frame or the entire frame
  – Cannot statically divide processing by user
• Processing load can vary substantially from frame to frame
  – MIMO receiver >> non-MIMO
  – Turbo decoder >> convolutional decoder
  – Beamformed user >> single DL antenna
• A system designed for the worst case could be substantially overdesigned
  – Worst case for one burst might not be sustainable over an entire subframe
  – In TDD, worst-case UL and worst-case DL cannot coexist
  – MAC scheduler controls allocation of users and can keep control over resource requirements
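The last point, that the MAC scheduler can keep resource requirements under control, can be pictured as a per-frame budget check. This is a minimal sketch under stated assumptions: the cost multipliers, the `Burst` fields, and the greedy admission rule are all hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass

# Hypothetical relative cost multipliers; not measured values.
MIMO_FACTOR = 4.0
TURBO_FACTOR = 3.0


@dataclass
class Burst:
    slots: int            # size of the allocation within the frame
    mimo: bool = False
    turbo: bool = False


def burst_load(burst):
    """Estimated processing cost of one burst in arbitrary units."""
    load = float(burst.slots)
    if burst.mimo:
        load *= MIMO_FACTOR
    if burst.turbo:
        load *= TURBO_FACTOR
    return load


def admit(bursts, capacity):
    """Greedy admission: accept bursts in order while the frame budget holds."""
    accepted, used = [], 0.0
    for burst in bursts:
        cost = burst_load(burst)
        if used + cost <= capacity:
            accepted.append(burst)
            used += cost
    return accepted, used


frame = [Burst(60, mimo=True, turbo=True), Burst(40, mimo=True),
         Burst(50, turbo=True), Burst(100)]
accepted, used = admit(frame, capacity=1000.0)
print(len(accepted), used)
```

The same frame size can cost very different amounts depending on which features each user needs, which is why a fixed per-user split of processing does not work well.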
Two General Approaches for Taking Advantage of Multiple Processors
• Compiler/programming language that abstracts the hardware from the software designer
  – Application software is not aware of the physical topology
  – Example: remote procedure call (RPC), CORBA...
• Static allocation of resources among processors
  – Software architecture places functional blocks on specific processors
  – Placement of functional blocks is an integral part of the application
Disadvantages of the Compiler Approach for Our Application
• Physical location of function calls is determined at compile time
  – Does not allow dynamic flexibility at run time
  – Run-time flexibility is desirable for efficient use of resources and for failure recovery
• Host/client thread is blocked while the remote function is called
  – Waits for the function to return
• A lot of data movement between processors
  – There is overhead for moving data
  – Interprocessor link can be high latency, so we do not want to require that results come back
• Overhead of a generic approach like RPC/IDL (interface definition language)
  – Need to balance the desire for a generic interface with real-time processing requirements
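The blocking behavior described above can be contrasted with a post-and-continue style in a small simulation. This is only a sketch: a thread stands in for the remote processor, an event stands in for a slow interprocessor link, and every name here is hypothetical.

```python
import queue
import threading


def worker(inbox, log, link_ready):
    """Stand-in for a remote processor that runs jobs taken from its inbox."""
    while True:
        job = inbox.get()
        if job is not None:
            link_ready.wait()                 # simulate a high-latency link
            log.append(f"remote ran {job}")
        inbox.task_done()
        if job is None:
            break


def run(jobs, blocking):
    log, inbox, link_ready = [], queue.Queue(), threading.Event()
    thread = threading.Thread(target=worker, args=(inbox, log, link_ready))
    thread.start()
    for job in jobs:
        log.append(f"caller posts {job}")
        inbox.put(job)
        if blocking:
            link_ready.set()
            inbox.join()                      # RPC style: wait for the job to return
    log.append("caller continues with local work")
    link_ready.set()
    inbox.put(None)                           # tell the worker to stop
    thread.join()
    return log


print(run(["fft", "decode"], blocking=True))
print(run(["fft", "decode"], blocking=False))
```

In the blocking run the caller cannot start its own work until each remote job has returned; in the posted run it hands off both jobs and proceeds immediately.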
Common Communications Infrastructure Architectures
[Figure: downlink blocks (FEC encoder/interleaver, modulation, pulse shaping) and uplink blocks (equalizer, demodulation, FEC decoder) placed on DSP0, DSP1, and DSP2.]
• DSL
• CDMA
• Pipelined concurrency for symbol rate vs chip rate
• Parallel concurrency for uplink/downlink split
Problems With Static Allocation of Resources
[Figure sequence: the same blocks (FEC encoder/interleaver, modulation, pulse shaping, equalizer, demodulation, FEC decoder) redrawn on a growing set of processors as requirements are added.]
• Single antenna, single sector, simple FEC code
  – All fits on a single DSP (figure: DSP0)
• Single antenna, single sector; add turbo encoder and decoder
  – Need to split uplink and downlink
  – Need hardware accelerator/FPGA for turbo decoder
  – Figure: DSP0, DSP1, FEC decoder on FPGA, FEC decoder pre/post processor
• Add support for multiple antennas or multiple sectors; add MIMO
  – Need to split uplink into parts
  – Figure: DSP0, DSP1, DSP2, FEC decoder on FPGA, MIMO equalizer
• Implement design and discover that DSP2 is 101% loaded
  – Need to split processing done on DSP2 into parts
  – Figure: DSP0, DSP1, DSP2, DSP3, FEC decoder on FPGA, MIMO equalizer shown on more than one DSP
Problems With Static Allocation of Resources
• Limited reuse
  – For a common hardware platform, the division of resources among processors will be different for different standards
  – Changes in complexity (e.g., more antennas) or addition of features (e.g., beamforming) require substantial redesign
  – Even a small change to a function on a heavily loaded processor could completely change the architecture
• Inefficient
  – Worst-case loading on each DSP might be well under 100% because of the way functions are divided
  – Each DSP must be provisioned for its own worst case even if the combined worst case is never possible
• Example in time division duplexing (TDD)
  – Worst-case downlink is when most of the frame is dedicated to downlink
  – Worst-case uplink is when most of the frame is dedicated to uplink
  – But both worst cases can never occur in the same frame
• Headroom for an unexpected worst case is limited and cannot be distributed among the processors
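The TDD example can be made concrete with a small provisioning calculation. The loads below are invented for illustration (in units of one DSP's capacity), and the pooled case assumes the load can be split freely across processors; neither the numbers nor the function names come from the presentation.

```python
import math


def static_dsps(frames):
    """Downlink and uplink each get their own DSPs, sized for their own worst frame."""
    worst_dl = max(dl for dl, ul in frames)
    worst_ul = max(ul for dl, ul in frames)
    return math.ceil(worst_dl) + math.ceil(worst_ul)


def pooled_dsps(frames):
    """One shared pool, sized for the worst combined load in any single frame."""
    worst_total = max(dl + ul for dl, ul in frames)
    return math.ceil(worst_total)


# Hypothetical (downlink, uplink) loads per frame type.
frames = [
    (1.6, 0.3),   # frame mostly dedicated to downlink
    (0.4, 2.4),   # frame mostly dedicated to uplink
    (1.0, 1.2),   # balanced frame
]
print("static:", static_dsps(frames), "DSPs")
print("pooled:", pooled_dsps(frames), "DSPs")
```

With these assumed loads, static allocation has to cover a downlink-heavy and an uplink-heavy frame at the same time, while the pool only has to cover the heaviest frame that can actually occur.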
Ideal Resource Pool
• Pool of processors is abstracted from the application
• Total processing power is equal to the sum of the individual processing powers
• Resources are configurable at run time
• Both parallel and pipelined division of resources is possible
Resource Pool
• Frame-by-frame configuration from the MAC is used as input to the resource pool controller
  – Coding, block sizes, and memory locations change from frame to frame
• Management entity configures the resource pool controller
  – Number of processors
  – Allocation of functions to processors
[Figure: block diagram with labels MAC, Management Entity, host, Physical Layer Controller, Resource Pool Controller, Connectivity Layer, Signal Processing (three blocks), DSP1, DSP2, and FPGA.]
Guiding Principles
• There can be from 1 to n DSPs
• MAC/management communicates with only one DSP
• Same code image runs on all DSPs
• Same data structures reside on all DSPs
• Processors can talk to each other
• Definition of jobs/functional blocks is done manually
  – No attempt to automate division of resources
[Figure labels: MAC/management; PHY/resource pool controller.]
Architecture Layers with Connectivity Layer
[Figure: layer stacks on DSP 1 and DSP 2. On each, WiMAX Signal Processing calls the Connectivity Layer (“commit (copy) data”, “post next job”). The Connectivity Layer determines the physical location of the destination: if on the same DSP it posts the next job directly; if on a remote DSP it goes through the sRIO (PHY) driver and sRIO PHY.]
• WiMAX processing is divided into jobs/functions
• Jobs are called through the Connectivity Layer API
• Connectivity Layer knows where each job runs
  – Allows abstraction of multiple processors from the application
• PHY driver (RapidIO in our case) transfers the data or generates interrupts/notifications
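The routing decision made by the Connectivity Layer can be sketched as a lookup followed by either a direct call or a hand-off to the link driver. This is a hypothetical model, not the presenters' implementation: `ConnectivityLayer`, `FakeLink`, and the two job names are invented, and the fake link delivers synchronously to keep the example small.

```python
class ConnectivityLayer:
    """Hypothetical router: runs a job locally or forwards it to a remote DSP."""

    def __init__(self, local_dsp, placement, jobs, link):
        self.local_dsp = local_dsp
        self.placement = placement   # job name -> DSP id
        self.jobs = jobs             # job name -> function; same table on every DSP
        self.link = link             # stand-in for the sRIO driver

    def post(self, job, data):
        target = self.placement[job]
        if target == self.local_dsp:
            self.jobs[job](self, data)           # same DSP: direct call
        else:
            self.link.send(target, job, data)    # remote DSP: go through the driver


class FakeLink:
    """Stand-in for the sRIO driver; records transfers and delivers at once."""

    def __init__(self):
        self.layers = {}
        self.transfers = []

    def send(self, target, job, data):
        self.transfers.append((target, job))
        self.layers[target].post(job, data)


def fec_encode(layer, data):
    data.append(("fec_encode", layer.local_dsp))
    layer.post("modulate", data)                 # the job itself never names a DSP


def modulate(layer, data):
    data.append(("modulate", layer.local_dsp))


def run(placement):
    link = FakeLink()
    jobs = {"fec_encode": fec_encode, "modulate": modulate}
    for dsp in (1, 2):
        link.layers[dsp] = ConnectivityLayer(dsp, placement, jobs, link)
    trace = []
    link.layers[1].post("fec_encode", trace)
    return trace, link.transfers


print(run({"fec_encode": 1, "modulate": 2}))
print(run({"fec_encode": 1, "modulate": 1}))
```

The job functions are identical in both runs; only the placement table changes, which is the abstraction the slide describes.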
Resource Pool Controller (PHY Controller)
• Calculate a job descriptor for every job
  – For example, FEC coding parameters per codeword and memory location for each codeword
• Calculate a resource descriptor to designate on which physical processor each job will run
• Send job descriptors and DSP assignments to all DSPs
• Job distribution is dynamic and configurable at run time
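The controller steps above can be sketched as a function that turns one frame's codeword list into job descriptors and a job-to-DSP assignment. The field names, the round-robin policy, and the input format are assumptions made for illustration; the slides do not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDescriptor:
    job_index: int
    code_rate: str        # FEC coding parameter for this codeword
    block_bytes: int      # codeword size
    buffer_offset: int    # memory location of the codeword


def build_descriptors(codewords, num_dsps):
    """codewords: list of (code_rate, block_bytes) for one frame.

    Returns the job descriptors and a mapping of job index to DSP id.
    """
    jobs, assignment, offset = [], {}, 0
    for index, (rate, size) in enumerate(codewords):
        jobs.append(JobDescriptor(index, rate, size, offset))
        assignment[index] = index % num_dsps   # round-robin: an assumed policy
        offset += size
    return jobs, assignment


frame = [("1/2", 60), ("3/4", 54), ("1/2", 36)]
jobs, assignment = build_descriptors(frame, num_dsps=2)
for job in jobs:
    print(job, "-> DSP", assignment[job.job_index])
```

Calling the same function with a different `num_dsps` or a different frame gives a new assignment without touching the signal processing code, which is the run-time configurability the last bullet refers to.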
Connectivity Layer API
• Commit (copy) data (source address, destination address, data size, pointer to appropriate resource descriptor)
  – All memories involved in resource sharing reside on all processors
  – Connectivity Layer knows the starting memory location on all processors
  – Connectivity Layer knows which processor to copy to (could be the current processor) based on which application function called it
• Notify/post (job to be posted, job index, pointer to appropriate job and resource descriptor)
  – Connectivity Layer causes an interrupt on the relevant processors (we use a RapidIO “doorbell” on a remote processor)
  – Connectivity Layer then posts the next job
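The two calls can be modeled with a toy memory per processor. This is a sketch under assumptions: the real API is presumably a C interface on the DSPs, and the `Processor` class, the dictionary-style resource descriptor, and the list used as a doorbell queue are all hypothetical stand-ins.

```python
class Processor:
    """Toy processor: a memory region with the same layout on every DSP."""

    def __init__(self, dsp_id, memory_size):
        self.dsp_id = dsp_id
        self.memory = bytearray(memory_size)
        self.doorbells = []          # (job, job_index) notifications received


def commit(processors, source, src_addr, dst_addr, size, resource_desc):
    """Copy `size` bytes to the processor named by the resource descriptor."""
    target = processors[resource_desc["dsp"]]   # may be the source itself
    target.memory[dst_addr:dst_addr + size] = source.memory[src_addr:src_addr + size]
    return target


def post(processors, job, job_index, resource_desc):
    """Notify the target processor that a job is ready (stand-in for a doorbell)."""
    processors[resource_desc["dsp"]].doorbells.append((job, job_index))


processors = {0: Processor(0, 64), 1: Processor(1, 64)}
processors[0].memory[0:4] = bytes([1, 2, 3, 4])

remote = {"dsp": 1}                  # hypothetical resource descriptor
commit(processors, processors[0], src_addr=0, dst_addr=16, size=4, resource_desc=remote)
post(processors, "modulate", 7, remote)
print(bytes(processors[1].memory[16:20]), processors[1].doorbells)
```

Because the memory layout is the same on every processor, the caller passes the same addresses whether the destination turns out to be local or remote.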
WiMAX Transmitter
[Figure: transmitter block diagram. Blocks shown: MAC, SRIO, CRC calculation, randomizer, FEC encoder + interleaver, modulation, permutation, and IFFT, with buffers between stages. The FEC and modulation blocks are replicated and labeled “resource pool (per codeword)”; the permutation + IFFT blocks are replicated and labeled “resource pool (per sector)”.]
GUI to Configure the Resource Pool
Hardware Platform
• AMC70k2000 (STx)
  – 4 C6455 DSPs
  – IDT RapidIO switch
• Tundra RapidIO switch
• Lab development AMC carrier board (STx)
• General purpose processor with RapidIO