FPL 2003 - Sept. 2, 2003
Software Decelerators
Eric Keller, Gordon Brebner and Phil James-Roxby
Xilinx Research Labs
FPL 2003Software Decelerators 2
Talk Outline
• Background
• Software Decelerators
• Case Study: Finite State Machines
• Results
• Conclusions
FPL 2003Software Decelerators 3
Modern Platform FPGA
[Figure: platform FPGA components: High Performance Sync Dual-Port™ RAM; SelectIO™-Ultra Technology; Advanced FPGA Logic; Embedded DSP Functionality (18-bit × 18-bit multiply, 36-bit result); PowerPC™ processors (400+ MHz clock rate); Digital Clock Management (DCM); Digitally Controlled Impedance (VCCIO impedance control); High-speed Serial Transceivers (622 Mbps to 3.125 Gbps)]
Hardware Accelerator
• Processor-Centric
• Algorithms executed on processor
– key functions performed by hardware
• Goal: Increase overall performance
[Figure: JPEG2000 example: processor and memory alongside RCT, DWT and Tier 1 Coder blocks]
Motherboard On A Chip
• Processor running an operating system
• Common board peripherals on FPGA
– Ethernet MAC
– SVGA controller
Logic-centric viewpoint
• Consistent with an interface-centric view that is appropriate for reactive systems, which are highly relevant for future ambient intelligence/ubiquitous computing
• Processors have no special status in systems, and indeed play only a secondary role as ‘function units’
• Explicit ‘hardware-software co-design’ becomes a lesser issue; certainly there is no top-level partitioning
• Hardware accelerators of processor-centric model are inverted and replaced by ‘software decelerators’
Software Decelerators
• Algorithms are executed in logic
– Processor executes software to perform one or more services for programmable logic
[Figure: logic circuit (&, * and + operators) between inputs and outputs, with a PowerPC (PPC) attached to the logic]
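In the decelerator model the processor acts as a function unit that serves the logic. A minimal C sketch of that service role, assuming a hypothetical handshake (`request`/`done` flags) and a hypothetical register block (`svc_regs`), both simulated in ordinary memory here; on a platform FPGA they would sit behind one of the logic-processor interfaces. The service `(a * b) + c` is an arbitrary stand-in for whatever function the logic delegates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block shared by logic and processor.
   On hardware this would be memory-mapped; here it is plain memory. */
typedef struct {
    volatile uint32_t request;  /* set to 1 by the logic when operands are valid */
    volatile uint32_t a, b, c;  /* operands supplied by the logic */
    volatile uint32_t result;   /* written by the processor */
    volatile uint32_t done;     /* set to 1 by the processor when result is valid */
} svc_regs;

/* Illustrative service delegated to software: (a * b) + c. */
static uint32_t service_fn(uint32_t a, uint32_t b, uint32_t c) {
    return a * b + c;
}

/* Serve one pending request, if any. Returns 1 if a request was served. */
static int serve_once(svc_regs *r) {
    assert(r != NULL);
    if (!r->request)
        return 0;               /* the logic has not asked for anything yet */
    r->result = service_fn(r->a, r->b, r->c);
    r->request = 0;             /* consume the request */
    r->done = 1;                /* signal completion back to the logic */
    return 1;
}
```

On the processor, `serve_once` would run inside an endless loop (`for (;;) serve_once(regs);`), with `regs` pointing at the interface's address.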
Motivation
• Emergence of platform FPGAs
• To increase overall system quality
– by making use of services provided by the processor
• Ease of designing a complex function
• Offload non-time-critical logic
– to achieve a better partition (e.g. saving area)
• Offload corner cases
– e.g. in MIR, IPv4 packets are handled in logic and IPv6 packets in the processor
Goals
• Overall area consumed by a software decelerator should not be greater than that of its logic counterpart
• Interfacing logic should consume minimal resources
• Interface should shield logic from processor
– and vice versa
• Provide timing and resource usage information
• Implementation-neutral method to capture design
Example: Finite State Machines
• Implement a general class of sequential functions that are recognizable in digital designs
• Processor determines next state and state outputs to meet the schedule determined by the logic-based system
– possibility to support multiple state machines
[Figure: FSM decelerator flow: graphical and textual representations feed the FSM decelerator generator, which produces the hardware platform, the software and a timing report]
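What "processor determines next state and state outputs" amounts to can be sketched as a table-driven step function. This is a minimal C sketch assuming a Moore machine with a 1-bit input; the table layout and the names `fsm`, `fsm_step` and `step_all` are hypothetical, and the generator described in the talk emits assembly rather than this C. `step_all` shows how one processor could serve several state machines per clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STATES 4  /* hypothetical upper bound for this sketch */

/* Table-driven Moore FSM: next_state[s][in] and output[s]. */
typedef struct {
    uint8_t n_states;
    uint8_t next_state[MAX_STATES][2];  /* indexed by current state, then input bit */
    uint8_t output[MAX_STATES];         /* output associated with each state */
    uint8_t state;                      /* current state */
} fsm;

/* One clock tick: move to the next state, return the new state's output. */
static uint8_t fsm_step(fsm *m, uint8_t in) {
    assert(m->state < m->n_states && in <= 1);
    m->state = m->next_state[m->state][in];
    return m->output[m->state];
}

/* One processor serving several FSMs: step each of them once per tick. */
static void step_all(fsm *ms, size_t n, const uint8_t *inputs, uint8_t *outputs) {
    for (size_t i = 0; i < n; i++)
        outputs[i] = fsm_step(&ms[i], inputs[i]);
}
```

For example, a rising-edge detector uses states {0: low, 1: just rose, 2: held high} with `next_state = {{0, 1}, {0, 2}, {0, 2}}` and `output = {0, 1, 0}`, so the output is 1 only on the tick after the input first goes high.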
Design Entry
• Graphical front end
– e.g. StateCAD
• Textual intermediate representation
– XML to support many design entry methods
• Define interface
• Define state
<variables>
  <variable name="op" dir="in" width="4"/>
</variables>
<state name="stateADD">
  <eqns>
    <eqn lhs="out0" rhs="in1+in2"/>
  </eqns>
  <transitions>
    <tran next="state1"/>
  </transitions>
</state>
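The `width` attribute matters once a state is executed in software: a 32-bit processor register has to be truncated to the declared signal width, just as a hardware signal would be. A minimal C sketch of one possible software form of `stateADD`, assuming (hypothetically) that `out0` is declared 4 bits wide like `op`; the types `fsm_var`, `var_dir` and the function `state_add` are illustrative names, not part of the described tool.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors a <variable> element: name, direction and bit width. */
typedef enum { DIR_IN, DIR_OUT } var_dir;

typedef struct {
    const char *name;
    var_dir dir;
    unsigned width;  /* bits, 1..32 */
} fsm_var;

/* Keep only the low `width` bits, as a hardware signal of that width would. */
static uint32_t mask_to_width(uint32_t value, unsigned width) {
    assert(width >= 1 && width <= 32);
    return width == 32 ? value : value & ((UINT32_C(1) << width) - 1u);
}

/* State identifiers for this sketch. */
enum { STATE_ADD, STATE_1 };

/* Software form of "stateADD": evaluate out0 = in1 + in2 (masked to the
   width of out0), then take the unconditional transition to "state1". */
static int state_add(uint32_t in1, uint32_t in2, const fsm_var *out0, uint32_t *out0_val) {
    *out0_val = mask_to_width(in1 + in2, out0->width);
    return STATE_1;  /* <tran next="state1"/> */
}
```

With a 4-bit `out0`, inputs 9 and 8 give 17, which wraps to 1, matching what a 4-bit adder in logic would produce.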
Logic-Processor Interface
• Rest of system doesn’t see processor signals
• Choice of interface
– PowerPC’s native buses: PLB, OCM, DCR
• With only two nodes, optimizations are possible
– interface logic is always the one being addressed
– no need for an arbiter
[Figure: PowerPC and interface logic]
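The two-node optimization can be illustrated with a behavioural model of the interface's register file. Because the processor and the interface are the only devices on the bus, every access targets the interface: no arbiter is needed and only enough low address bits to pick a register have to be decoded. This is a minimal C sketch under those assumptions; the register count (`N_REGS`) and the functions `bus_write`/`bus_read` are hypothetical, and the slides do not describe the actual interface circuits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: four 32-bit registers (must be a power of two for the mask). */
#define N_REGS 4u

typedef struct {
    uint32_t regs[N_REGS];
} iface;

/* Word-aligned byte address -> register index. The upper address bits are
   ignored: with only two nodes, nothing else on the bus can be addressed. */
static unsigned reg_index(uint32_t addr) {
    assert((addr & 3u) == 0);  /* word-aligned accesses only */
    return (addr >> 2) & (N_REGS - 1u);
}

static void bus_write(iface *f, uint32_t addr, uint32_t data) {
    f->regs[reg_index(addr)] = data;
}

static uint32_t bus_read(const iface *f, uint32_t addr) {
    return f->regs[reg_index(addr)];
}
```

Partial decoding like this is what keeps the interfacing logic small: the comparator for the full address disappears, at the cost of the interface aliasing across the whole address space.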
Clocking
• Polling/Interrupt on external clock
– processing time for a state must be less than the clock period
– with polling, the processor samples the clock to detect edges
– with interrupts, each clock edge causes an interrupt
• Software Generated
– processor generates the clock pulse using a memory-mapped circuit
– allows different states to take different processing times
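With polling, the processor detects a rising edge as a 0 → 1 change between two consecutive samples of the external clock. If processing a state takes longer than one clock period, whole edges fall between samples and are missed, which is why the per-state processing time must stay below the clock period. A minimal C sketch, assuming samples arrive from an array rather than a memory-mapped input register; `edge_detector`, `rising_edge` and `count_edges` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Remembers the previous clock sample so edges can be detected. */
typedef struct {
    uint8_t prev;  /* last sampled clock level; assumed low before the first sample */
} edge_detector;

/* Returns 1 when the clock has gone from 0 to 1 since the last sample. */
static int rising_edge(edge_detector *d, uint8_t level) {
    assert(level <= 1);
    int edge = (d->prev == 0 && level == 1);
    d->prev = level;
    return edge;
}

/* Count rising edges in a sample trace; one FSM step would run per edge. */
static size_t count_edges(const uint8_t *samples, size_t n) {
    edge_detector d = { 0 };
    size_t edges = 0;
    for (size_t i = 0; i < n; i++)
        edges += (size_t)rising_edge(&d, samples[i]);
    return edges;
}
```

The trace {0, 1, 1, 0, 1, 0, 0, 1} contains three rising edges. The software-generated alternative inverts the roles: instead of sampling, the processor writes the clock pulse out through a memory-mapped circuit once it has finished a state.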
Software Design
• General case is complex, requiring timing analysis
• Assembly code generation
– each state has the same structure (clock/reset, equations, transitions)
• Execute out of cache
– predictable memory accesses
• Accurate timing generation
– count the exact number of cycles each state and transition will take
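Because every state follows the same clock/reset, equations, transitions structure, a worst-case cycle count can be assembled per state and the slowest state bounds the FSM clock. A minimal C sketch of that bookkeeping, assuming hypothetical cycle counts supplied per part; a real generator would derive them from the emitted assembly, and the names `state_cost`, `state_worst_case` and `fsm_worst_case` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Cycle budget for one state, following its fixed structure. */
typedef struct {
    unsigned clock_reset_cycles;  /* waiting for the clock edge and checking reset */
    unsigned equation_cycles;     /* evaluating the state's output equations */
    const unsigned *tran_cycles;  /* cost of each possible transition path */
    size_t n_trans;
} state_cost;

/* Worst case for one state: overhead + equations + slowest transition. */
static unsigned state_worst_case(const state_cost *s) {
    assert(s->n_trans > 0);
    unsigned worst_tran = 0;
    for (size_t i = 0; i < s->n_trans; i++)
        if (s->tran_cycles[i] > worst_tran)
            worst_tran = s->tran_cycles[i];
    return s->clock_reset_cycles + s->equation_cycles + worst_tran;
}

/* Worst case for the whole FSM: the slowest state sets the minimum clock period. */
static unsigned fsm_worst_case(const state_cost *states, size_t n) {
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned c = state_worst_case(&states[i]);
        if (c > worst)
            worst = c;
    }
    return worst;
}
```

With polling or interrupt clocking this single worst case must fit in one external clock period; with a software-generated clock each state can instead be given its own duration.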
Results: Resource Usage
             OCM                   DCR                   PLB
System       FFs  LUTs  Ratio*     FFs  LUTs  Ratio*     FFs  LUTs  Ratio*
rs232        1    4     3.6%       2    6     5.4%       4    8     7.2%
miim         20   38    62.3%      21   40    65.6%      23   42    68.9%
tx_host_io   94   75    23.4%      95   77    24.1%      97   79    24.7%
*Ratio is the area of the decelerator as a percentage of area consumed by a logic implementation
Results: Performance
System       Worst-case   Worst-case   % time    Code size   Code size
             perf.        perf.        in I/O    (bytes)     (% of cache)
             (cycles)     (MHz)
rs232        40           8.75         30.95%    1416        8.6%
miim         74           4.73         25.22%    2968        18.1%
tx_host_io   135          2.59         33.99%    1952        11.9%
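The MHz column follows from the cycle column: if every FSM clock cycle must accommodate the worst-case state, then f_FSM = f_proc / N_cycles. The processor clock is not stated on the slides; 350 MHz is inferred here from 40 cycles × 8.75 MHz = 350 MHz (74 × 4.73 and 135 × 2.59 also come to about 350). Likewise the code-size percentages are consistent with a 16 KB (16384-byte) cache, e.g. 1416 / 16384 ≈ 8.6%. A minimal C sketch of both calculations under those inferred constants; `fsm_max_mhz` and `cache_percent` are hypothetical helper names.

```c
#include <assert.h>

/* Inferred, not stated on the slides: 40 cycles at 8.75 MHz => 350 MHz. */
#define PROC_MHZ 350.0
/* Inferred from the code-size column: 1416 bytes is about 8.6% of 16 KB. */
#define CACHE_BYTES 16384.0

/* Highest FSM clock the software can sustain: one worst-case state per FSM cycle. */
static double fsm_max_mhz(unsigned worst_case_cycles) {
    assert(worst_case_cycles > 0);
    return PROC_MHZ / worst_case_cycles;
}

/* Code size as a percentage of the (assumed 16 KB) cache. */
static double cache_percent(unsigned code_bytes) {
    return 100.0 * code_bytes / CACHE_BYTES;
}
```

For rs232, `fsm_max_mhz(40)` gives 8.75 and `cache_percent(1416)` gives about 8.64, reproducing the first table row.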
Conclusions
• Software decelerators
– demonstrated through an FSM-based design methodology
– extendable to other functions
– can increase overall system quality
• Methodology applicable to a subset of designs
– achievable speeds vary with the characteristics of the FSM
• I/O takes a lot of processing time (25% to 34% in the three examples)
Future Work
• Further study of the implications of the logic-centric model
• Automatic selection and synthesis of logic-processor interfaces
• Characteristics of hard/soft processors
– e.g. I/O takes a large percentage of time
• FSM-based architectural components
• Domain-specific high-level design entry and tools