SPEX: A Programming Language for Software Defined Radio
Yuan Lin, Robert Mullenix, Mark Woh, Scott Mahlke, Trevor Mudge, Alastair Reid¹, and Krisztián Flautner¹
The University of Michigan at Ann Arbor
¹ARM, Ltd
ACAL – University of Michigan
SDR Hardware Platform
Handset SDR has steep computational requirements (>40 GOPS) with a tight power budget (<500 mW)
Heterogeneous multiprocessor solutions are common
Examples include:
  SODA
  Philips EVP
  TI OMAP
  IBM CELL* (*not a mobile solution, but very similar)
Themes:
  Targets high-throughput signal-processing domains
  VLIW
  SIMD
  Control- and data-focused cores
SODA System Architecture for 3G
Based on our prior work:
  4 PEs
    Scalar, SIMD pipelines
    32 elements wide
    Scratchpad memories
  ARM controller processor
    Handles control, DMA
Feasible approach: 3 W, 26.6 mm² at 180 nm
  ~0.5 W, 6.7 mm² projected for 90 nm
[Figure: System architecture. Four PEs, each with a local memory and an execution unit, plus a global memory and an ARM controller. PE detail: SIMD RF, SIMD MEM, scalar RF, scalar MEM, V-to-S & S-to-V, DMA, scalar ALU, SIMD ALU.]
Programming SDR Platforms
Programming DSPs is already tough
  Multiprocessor architectures make a tough problem tougher
  Want to achieve high performance with high productivity
  Software needs to advance along with hardware
C is not sufficient
  Can express protocols, but awkwardly and inefficiently
  Rediscovering parallelism is challenging
  Want to decouple algorithm design from implementation
  No well-defined concept of time
Algorithm Characteristics
Digital communication protocols are hierarchical
  Abstracted as a series of connected kernels
  Can isolate and optimize individually
Operations frequently involve matrix computations
Have real-time constraints with static control flow
[Figure: W-CDMA protocol operations]
Desired SDR Language Features
Plenty of parallelism
  Kernels are vectorizable
  Pipeline the stream
  Interleave concurrent tasks
Give the compiler control
  Express the algorithms and constraints, not run-time behavior
  Static decisions should be made at compile time
  (e.g. scheduling, PE assignment, memory management)
Support for timing models
  Absolute timing primitives prevent drifting
  Periodic and relative timing constraints
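The point about absolute timing preventing drift can be illustrated with a small tick-based simulation. This is only a sketch under assumed semantics, not SPEX itself: `relative_schedule` and `absolute_schedule` are hypothetical helper names, and `overheads` stands for the per-iteration processing time in arbitrary ticks.

```python
def relative_schedule(period, overheads):
    """Start times when each iteration waits `period` ticks after the previous one finishes."""
    starts, now = [], 0
    for overhead in overheads:
        starts.append(now)
        now += overhead + period      # processing time accumulates as drift
    return starts


def absolute_schedule(period, overheads):
    """Start times when iteration k is pinned to tick k * period."""
    starts, now = [], 0
    for k, overhead in enumerate(overheads):
        release = k * period
        if now > release:
            raise RuntimeError(f"iteration {k} missed its release time")
        starts.append(release)        # start time never depends on earlier overheads
        now = release + overhead
    return starts
```

With a period of 10 ticks, the relative version starts later on every iteration by the accumulated processing time, while the absolute version stays on multiples of 10 as long as each iteration finishes before the next release.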
SPEX – Language Extension for SDR
Two levels: data/control separation
Kernel SPEX - the data plane
  Algorithm kernel descriptions, timing unaware
  C + Matlab operators + DSP fixed-point arithmetic
System SPEX - the control plane
  Wireless protocol system descriptions
  C + inter-kernel communications + timing constraints
Kernel SPEX
Atomic building blocks
  Non-preemptible
  Ignorant of timing constraints
  Maintains local state
Features
  Templated definitions
  Member functions
  Matlab-like vector support
  SystemC-like data types

template<class T, TAPS, BSIZE>
kernel FIR {
    vector<T, TAPS> z;
    vector<T, TAPS> coeff;

    void set_coeff(vector<T, TAPS> c) { coeff = c; }

    void run(channel<T, BSIZE> inbuf, channel<T, BSIZE> outbuf) {
        int i;
        T in, out;
        for (i = 0; i < BSIZE; i++) {
            in = inbuf.pop();
            z += coeff * in;
            out = z[0];
            outbuf.push(out);
            z = (z(1:TAPS-1), 0);
        }
    }
};
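For readers without a SPEX toolchain, the FIR kernel can be approximated in plain Python. This is a sketch, not the SPEX semantics: channels are modeled as `collections.deque` objects, the class layout and the `fir_reference` helper are assumptions added for comparison, and `z(1:TAPS-1)` is read as "elements 1 through TAPS-1", i.e. a shift by one position.

```python
from collections import deque


class FIR:
    """Plain-Python analogue of the FIR kernel (transposed form)."""

    def __init__(self, taps):
        self.taps = taps
        self.z = [0] * taps          # local state kept between run() calls
        self.coeff = [0] * taps

    def set_coeff(self, c):
        if len(c) != self.taps:
            raise ValueError(f"expected {self.taps} coefficients")
        self.coeff = list(c)

    def run(self, inbuf, outbuf, bsize):
        """Consume bsize samples from inbuf and append bsize outputs to outbuf."""
        for _ in range(bsize):
            x = inbuf.popleft()                                            # in = inbuf.pop()
            self.z = [zi + ci * x for zi, ci in zip(self.z, self.coeff)]   # z += coeff * in
            outbuf.append(self.z[0])                                       # out = z[0]
            self.z = self.z[1:] + [0]                                      # z = (z(1:TAPS-1), 0)


def fir_reference(x, coeff):
    """Direct-form convolution, y[n] = sum_k coeff[k] * x[n-k], for comparison."""
    return [sum(c * x[n - k] for k, c in enumerate(coeff) if n - k >= 0)
            for n in range(len(x))]
```

Feeding the same samples through `FIR.run` and `fir_reference` should give identical outputs, which is one way to check that the shift-and-accumulate state update implements an ordinary FIR filter.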
System SPEX
Synchronous primitives
  A set of timing and concurrency primitives for expressing real-time execution
  Modeled after real-time languages
Stream primitives
  A set of streaming primitives for expressing streaming computation
  Modeled after the synchronous dataflow model and its variations
Synchronous Example – WCDMA

void wcdma() {
    clock clk;                          // real-time clock
    at (clk % wcdma_frame == 0) {       // absolute timing assertion
        ...
        adcfir(ch1);
        chan_est(ch1, ch2, num_fingers);
        parallel {                      // parallel execution of instructions within each scope
            bch(ch1, bch_done);
            if (dch_mode) {
                dch(ch1, ch2, num_fingers, dch_done);
            }
        }
        ...
        parallel {
            wait(clk % bch_deadline == 0);
            if (dch_mode)
                wait(clk % dch_deadline == 0);
        }
    }
}
Kernel SPEX Compilation Flow
Frontend removes "syntactic sugar"
  Templates instantiated
  Matlab features mapped to function calls
Virtual Kernel C
  Infinite vector length assumed
  Robust set of operators
  Can be linked with special libraries and compiled with gcc to verify functional correctness
Physical Kernel C
  Vector length bounded by actual SIMD width
  Restricted to machine operators
[Figure: Compilation path: Kernel SPEX -> SPEX frontend -> Virtual Kernel C -> V-to-P translation -> Physical Kernel C -> VLIW backend -> SODA assembly. Functional debugging path: Virtual Kernel C + libraries -> gcc -> a.out.]
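The step from Virtual Kernel C (unbounded vector length) to Physical Kernel C (bounded by the SIMD width) can be pictured as strip-mining. The sketch below is an illustration of that general technique, not the actual V-to-P translator: the function names are hypothetical, and the width of 32 is taken from the SODA architecture slide.

```python
SIMD_WIDTH = 32  # SODA's SIMD pipeline is 32 elements wide, per the architecture slide


def virtual_vadd(a, b):
    """'Virtual' vector add: operates on vectors of any length."""
    return [x + y for x, y in zip(a, b)]


def physical_vadd(a, b, width=SIMD_WIDTH):
    """Same add, split into operations that are each exactly `width` lanes wide."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    out = []
    for start in range(0, len(a), width):
        chunk_a = a[start:start + width]
        chunk_b = b[start:start + width]
        pad = width - len(chunk_a)
        # Pad the final partial chunk so every "machine op" sees a full-width vector.
        lanes = [x + y for x, y in zip(chunk_a + [0] * pad, chunk_b + [0] * pad)]
        out.extend(lanes[:len(chunk_a)])   # drop the padding lanes again
    return out
```

Both functions take Python lists. Comparing their results on a vector whose length is not a multiple of 32 mirrors, in miniature, the role of the functional debugging path as a check on the compilation path.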
System SPEX Task Compilation
Stream IR with dataflow primitives
  i.e. push(), pop(), peek()
Step 1: Dataflow rate-matching
  Insert buffers between nodes
  Add loops to match the rate
Step 2: Initial resource allocation
  Processor assignments
  Memory allocation and DMA transfer
Step 3: Control-data split
  Break the task into independent threads
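Step 1 can be sketched for a single producer/consumer pair using the usual synchronous-dataflow balance condition. This is an assumed, simplified model rather than the SPEX compiler's algorithm: `rate_match` is a hypothetical helper, and the buffer size it returns presumes that all producer firings of one iteration run before the consumer starts.

```python
from math import gcd


def rate_match(push_rate, pop_rate):
    """Balance one producer/consumer edge of a synchronous-dataflow graph.

    Returns (producer_firings, consumer_firings, buffer_size) for one
    steady-state iteration, so that tokens produced equal tokens consumed.
    """
    if push_rate <= 0 or pop_rate <= 0:
        raise ValueError("rates must be positive")
    tokens = push_rate * pop_rate // gcd(push_rate, pop_rate)  # lcm of the rates
    producer_firings = tokens // push_rate   # loop count wrapped around the producer
    consumer_firings = tokens // pop_rate    # loop count wrapped around the consumer
    return producer_firings, consumer_firings, tokens
```

For example, a node that pushes 2 items per firing feeding a node that pops 3 per firing balances when the producer runs 3 times and the consumer 2 times, with a 6-element buffer between them.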
System SPEX Real-time Optimization
Hierarchical constraint scheduling
  Each task is treated as a single node
  Guarantees all nodes are schedulable through compiler optimizations
Non-preemptive multi-processor scheduler
  Static processor assignments
  Static task execution ordering
  Dynamic execution timing
Iterative optimization if constraints not met
  Re-compile each task with system profiling
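The combination of static assignment, static ordering, and dynamic timing can be sketched as a small simulation. Everything here is illustrative: the task tuple layout, the `simulate` and `missed_deadlines` helpers, and any durations or deadlines passed in are assumptions, not values or interfaces from the SPEX compiler.

```python
def simulate(tasks, order):
    """Simulate a non-preemptive schedule with a fixed order and fixed processors.

    tasks: name -> (processor, duration, deadline, [predecessor names])
    order: static execution order; predecessors must appear before successors.
    Returns name -> (start, finish).
    """
    proc_free = {}   # processor -> time at which it becomes idle
    times = {}
    for name in order:
        proc, duration, _deadline, preds = tasks[name]
        ready = max((times[p][1] for p in preds), default=0)
        start = max(ready, proc_free.get(proc, 0))   # timing is resolved dynamically
        finish = start + duration                    # non-preemptive: runs to completion
        times[name] = (start, finish)
        proc_free[proc] = finish
    return times


def missed_deadlines(tasks, times):
    """Names of tasks whose finish time exceeds their deadline."""
    return sorted(n for n, (_, finish) in times.items() if finish > tasks[n][2])
```

A non-empty result from `missed_deadlines` corresponds to the "constraints not met" case, where the slides call for recompiling tasks using profiling information.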
Summary
Multiprocessor architectures make handset SDR feasible, but complicate software
  Need a better language to map algorithms to hardware
SPEX capitalizes on domain properties
  C and Matlab based
  Control and data separation
  Kernels exploit massive data parallelism
  Systems can pipeline kernels and interleave tasks
  Compile system and kernels independently
  Provide multiple paths to ensure robust debugging
Stream Example – DCH Channel

void DCH(channel<int16, frame> ADC_in,
         channel<int16, max_fingers> searcher_in,
         int num_fingers,
         channel<int16, frame> & to_MAC,
         signal<bool> & done)
{
    channel<int16, frame> ch1[max_rake_finger];
    channel<int16, frame> ch2;

    stream {                            // stream computation within the scope
        for (int i = 0; i < num_fingers; i++)
            rake(ADC_in, searcher_in[i], ch1[i]);
        combiner(ch1, ch2);             // channel merging done in combiner function
        viterbi(ch2, to_MAC);
    }
    done = true;
}

Channels and signals can be declared either as function arguments or as local variables.
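The channel primitives named in the slides (push, pop, peek) and the idea of merging several finger channels in a combiner can be mimicked with a bounded FIFO. This is a loose sketch: the `Channel` class is hypothetical, and summing one sample per finger is an assumed combining rule chosen only to keep the example short.

```python
from collections import deque


class Channel:
    """Bounded FIFO offering push/pop/peek."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def push(self, value):
        if len(self._items) >= self.capacity:
            raise OverflowError("channel full")
        self._items.append(value)

    def pop(self):
        if not self._items:
            raise IndexError("channel empty")
        return self._items.popleft()

    def peek(self, offset=0):
        """Read an item without consuming it."""
        return self._items[offset]

    def __len__(self):
        return len(self._items)


def combiner(fingers, out):
    """Merge per-finger channels by summing one sample from each (assumed rule)."""
    if not fingers:
        return
    while all(len(ch) > 0 for ch in fingers):
        out.push(sum(ch.pop() for ch in fingers))
```

Passing a list of channels to `combiner` plays the role of the `ch1` array in the DCH example, with a single output channel standing in for `ch2`.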