Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Sora: High Performance Software Radio using General Purpose Multi-
Core Processors
Kun Tan† Jiansong Zhang† Ji Fang‡ He Liu§ Yusheng Ye§
Shen Wang§ Yongguang Zhang† Haitao Wu† Wei Wang†
Geoffrey M. Voelker◊
† Microsoft Research Asia‡ Tsinghua University, Beijing, China
§ Beijing Jiaotong University, Beijing, China ◊ UCSD, La Jolla, USA
NSDI 2009, Boston, USA 1
Software Radio
NSDI 2009, Boston, USA 2
BenefitsPromise of universal connectivity and cost savingProgrammability => faster development cycle, faster to marketOpen platform for wireless research
CDMA
WiFi
WiMAX
GPS
3G
GSM
Bluetooth
software
Bluetooth, WiFi,WiMAX, GSM,CDMA, 3G, LTE …
General RF
Frontend
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
NSDI 2009, Boston, USA 3
Antenna
Processor
SoftwareHardwareDigital
Samples
D/A
A/D
RF
Frontend
1.2Gbps for 802.11 (20MHz channel, 16b A/D, 4x)
~up to 5 Gbps for 11n (4x4MIMO) ;
Over 10Gbps for future high-speed wireless
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
• Computation-intensive signal processing
NSDI 2009, Boston, USA 4
InterleavingConvolutional
encoderQAM Mod IFFT GI Addition
Symbol Wave
ShapingScramble
To RF
Demod +
InterleavingFFT
Viterbi
decodingRemove GI
From RF
Descramble
Transmitter:
Receiver:
Bits
@48MbpsBits
@48Mbps
Samples
@512Mbps
Samples
@1.28Gbps
Samples
@640Mbps
Decimation
Samples
@384MbpsBits
@24Mbps
Samples
@1.28Gbps
Samples
@640Mbps
Samples
@512Mbps
Samples
@384MbpsBits
@48Mbps
Bits
@24Mbps
Bits
@24Mbps
From MAC
To MAC
Bits
@24Mbps
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
• Computation-intensive signal processing
NSDI 2009, Boston, USA 5
InterleavingConvolutional
encoderQAM Mod IFFT GI Addition
Symbol Wave
ShapingScramble
To RF
Demod +
InterleavingFFT
Viterbi
decodingRemove GI
From RF
Descramble
Transmitter:
Receiver:
Bits
@48MbpsBits
@48Mbps
Samples
@512Mbps
Samples
@1.28Gbps
Samples
@640Mbps
Decimation
Samples
@384MbpsBits
@24Mbps
Samples
@1.28Gbps
Samples
@640Mbps
Samples
@512Mbps
Samples
@384MbpsBits
@48Mbps
Bits
@24Mbps
Bits
@24Mbps
From MAC
To MAC
Bits
@24Mbps
Raw computation power required: 802.11b => 10Gops, 802.11a => 40Gops!(now server-class CPU runs at 3GHz clock)
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
• Computation-intensive signal processing
• Hard deadline and accurate timing control
– 802.11 MAC requires response within a few s
– Event trigger timing accuracy at s level
NSDI 2009, Boston, USA 6
Approaches
NSDI 2009, Boston, USA 7
Programmability
Low High
Low
Hig
h
Perf
orm
ance
EmbeddedDSP
Example: Rice WARP, TI SFF-SDR
Programmable hardware
(FPGA)
Example: GNU Radio/USRP(v1&2)• Interface USB/GbE: <1Gbps, >1ms• Achievable wireless xput: ~100Kbps
Low-performanceGPP-based SDR
SoraSora
Resolving the SDR platform dilemma• Commodity PC w/ C program• High performance
• sys tput:10Gbps; ~s latency • target wireless xput:10M~1Gbps
Sora Approach
• New PCIe-based Interface card => high system throughput
• New optimizations to implement PHY algorithms and streamline processing on multi-core CPU=> efficient PHY processing
• Core dedication => real-time support
NSDI 2009, Boston, USA 8
MemRF
RFRF
Sora
APP
Multi-core CPU
Sora Soft-Radio Stack
PCIe bus
Digital Samples
@Multiple Gbps
RCBA/D
D/A RFSora
APP
APP
APP
APP
APP
Sora Hardware
Sora Architecture
NSDI 2009, Boston, USA9
General radio front-end: 700M/1.8G/2.4G/5GHz
MemRF
RFRF
Sora
APP
Multi-core CPU
Sora Soft-Radio Stack
PCIe bus
Digital Samples
@Multiple Gbps
RCBA/D
D/A RFSora
APP
APP
APP
APP
APP
Sora Hardware
Radio Control Board
NSDI 2009, Boston, USA10
PCIe-based High-speed Interface card
PCIe is commodity in most modern PCs High throughput: 16Gbps at PCIe-8x Low latency: ~ 1 s Separated with other I/O devices
RCB Details
NSDI 2009, Boston, USA 11
PCIe-8x interface: up to 16Gbps throughput
Versatile RF interface: up to 8 channels (8x8 MIMO)
Versatile RF interface: up to 8 channels (8x8 MIMO)
RCB Details
NSDI 2009, Boston, USA 12
s
Buffered data path: bridging the synchronous ops at RF and asynchronous processing at CPU (12.3Gbps measured )
Low latency control path for software (0.36 s measured)
A/D
D/ARF Circuit
RF Front-endPCIE
Controller SDRAM
Controller
FIFO
FIFODMA
Controller
DDR
SDRAM
FPGA
RCB
PCIe
bus
Antenna
RF
Controller
Registers
MemRF
RFRF
Sora
APP
Multi-core CPU
Sora Soft-Radio Stack
PCIe bus
Digital Samples
@Multiple Gbps
RCBA/D
D/A RFSora
APP
APP
APP
APP
APP
Sora Hardware
Sora Software
13
High-performance SDR processing w/ key software techniques
Efficient PHY implementation using SIMD and LUTs Speed up PHY using multi-core streamline processing Core dedication for real-time support
NSDI 2009, Boston, USA
Efficient PHY Implementation
• Exploit large high-speed cache memory– Extensive use of lookup tables (LUT): trade memory
for calculation; still well fit into L2 cache
– Applicable for more than half of the common algorithms; speedup ranges from 1.5x to 22x
NSDI 2009, Boston, USA 14
Direct impl. 8 ops per bit
LUT impl. 2 Look-up op for 8 bits!(size 32KB)
Ex: Convolutional encoder+
+
Tb Tb Tb Tb Tb Tb
Output Data A
Output Data B
Efficient PHY Implementation
• Exploit data parallelism in PHY
– Utilize wide-vector SIMD extension in CPU
– Applicable to many PHY algorithms with significant speedups (1.6x ~ 50x)
NSDI 2009, Boston, USA 15
Ex. (I)FFT
Core 2Core 1
Speed up PHY using multi-core streamline processing
• Efficiently partition and schedule the PHY processing across cores– Interconnecting sub-pipeline with light-weight,
synchronized FIFOs
– Static scheduling of processing modules in PHY pipeline
NSDI 2009, Boston, USA 16
Demod +
InterleavingFFT
Viterbi
decodingRemove GI DescrambleDecimation
Synchronized FIFO
Core Dedication for Real-time Support
• Exclusively allocate enough cores for SDR processing in multi-core systems
– Guarantee the CPU, cache and memory bandwidth resources for predictable performance
– Achieve s-level timing control
– Simple abstraction, and easier to implement in standard OSes than RT-scheduler
• Implemented in WinXP without modifications to Kernel
NSDI 2009, Boston, USA 17
Implementation
• Sora software platform on Win XP
– 14K lines of C code, including PCIe driver framework, memory management, FIFO management, etc
• SoftWiFi: full implementation of IEEE 802.11a/b/g PHY and DCF MAC
– 9K lines of C code; 4 man-month for dev & test
– DSSS 1, 2, 5.5, 11Mbps for 11b; OFDM 6, 9, 12, 18, 24, 36, 48, 54Mbps for 11a/g
NSDI 2009, Boston, USA 18
0
1
2
3
4
5
6
7
8
9
10
1M 2M 5.5M 11M 6M 24M 54M
Re
qu
ire
d c
om
pu
tati
on
(Gig
a cy
cle
s p
er
seco
nd
)
0
1
2
3
4
5
6
7
8
9
10
1M 2M 5.5M 11M 6M 24M 54M
Re
qu
ire
d c
om
pu
tati
on
(Gig
a cy
cle
s p
er
seco
nd
)
Results: PHY Processing
11.6 11.7 11.7 11.818.3 60.4 132.4
NSDI 2009, Boston, USA 19
>30
x s
peedup
~10
x s
peedup
802.11b 802.11a/g 802.11b 802.11a/g
After Sora Optimization
0
1
2
3
4
5
6
7
8
9
10
1M 2M 5.5M 11M 6M 24M 54M
Re
qu
ire
d c
om
pu
tati
on
(Gig
a cy
cle
s p
er
seco
nd
)
0
1
2
3
4
5
6
7
8
9
10
1M 2M 5.5M 11M 6M 24M 54M
Re
qu
ire
d c
om
pu
tati
on
(Gig
a cy
cle
s p
er
seco
nd
)
Results: PHY Processing
11.6 11.7 11.7 11.818.3 60.4 132.4
NSDI 2009, Boston, USA 20
>30
x s
peedup
~10
x s
peedup
802.11b 802.11a/g 802.11b 802.11a/g
Sora enables software implementation of today’s high-speed wireless system in
standard PC with a few cores
After Sora Optimization
0
5
10
15
20
25
1M 2M 5.5M 11M 6M 24M 54M
Th
rou
gh
pu
t (M
bp
s)
Modulation Mode
Sora-Commercial
Commercial-Commercial
Commercial-Sora
Results: End-to-end Throughput
NSDI 2009, Boston, USA 21
Communicating with commercial 802.11a/b/g card
0
5
10
15
20
25
1M 2M 5.5M 11M 6M 24M 54M
Th
rou
gh
pu
t (M
bp
s)
Modulation Mode
Sora-Commercial
Commercial-Commercial
Commercial-Sora
Results: End-to-end Throughput
NSDI 2009, Boston, USA 22
Communicating with commercial 802.11a/b/g card
Seamlessly interoperate with commercial WiFi• Correctness of all PHY algorithms• Satisfying timing requirements of standards• Commercial equivalent performance
Extensions
NSDI 2009, Boston, USA 23
Jumbo frames in 802.11 TDMA MAC
Extensions: New Applications
NSDI 2009, Boston, USA 24
Conclusion
• Sora is a fully programmable software radio platform on commodity PC architecture– Easy C programming on multi-core CPU
– High performance: high processing speed, low latency, and performance guarantee• Confirmed by SoftWiFi, the first fully interoperable IEEE
802.11 (PHY and MAC) on general purpose processors
• Plan to release Sora SDK to research community– H/W: RCB + 2.4G RF front-end set (~$2K USD)
NSDI 2009, Boston, USA 25
• Sora demo in the demo session this evening
• You can interact with Sora with your own laptop, iPhone, or other smart phones
SSID: soranet
DEMO
NSDI 2009, Boston, USA 2611/5/2008
HD Video streaming
Q&A
NSDI 2009, Boston, USA 27