Sebastian Brandhofer, Philipp Göttlich and Adrian Lanksweirt
Supervisors: Eric Schneider and Michael Kochte
Analysis of Hardware-Accelerated Applications in Reconfigurable
Network-on-a-Chip Based Systems
Motivation
[Figure: a CPU and Reconfigurable Blocks (RCBs); labels: "Calculate Multiplication", "reconfigure", "Multiplication Unit (MU)"]
Outline
Purpose of this Project
Existing Techniques
Reconfigurable Blocks
Network on a Chip
Implementation
Tests & Results
Conclusion
Purpose of this Project
Combine the advantages of RCBs and a NoC in one system
Analyse whether a system like this can be used to accelerate applications
Identify conditions for the best acceleration of a computation
Examine the behavior of the system
Existing Techniques - Reconfigurable Blocks (RCBs)
Can be configured at runtime with different hardware components
Acceleration of the computation of certain functions
Area reduction because one RCB can substitute more than one hardware component
Configuration process requires time
Typically implemented using FPGAs
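The trade-off between acceleration and configuration time can be sketched with a simple break-even estimate. This is only an illustration: the function name and the cycle counts in the usage note are hypothetical and do not come from the project.

```python
def break_even_calls(reconfig_cycles, sw_cycles, hw_cycles):
    """Smallest number of calls n for which reconfiguring pays off.

    Reconfiguring is worthwhile once
        reconfig_cycles + n * hw_cycles < n * sw_cycles.
    All arguments are integer cycle counts (hypothetical inputs).
    Returns None if the hardware unit is not faster than software.
    """
    saving_per_call = sw_cycles - hw_cycles
    if saving_per_call <= 0:
        return None
    # Smallest n with n * saving_per_call > reconfig_cycles.
    return reconfig_cycles // saving_per_call + 1
```

For example, with a made-up reconfiguration cost of 1000 cycles, 120 cycles per call in software and 20 cycles per call in hardware, break_even_calls(1000, 120, 20) gives 11 calls.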
Existing Techniques - Network-on-a-Chip (NoC)
Connects the hardware components of a system via routers which create communication links
Connection of many different hardware components
Scalable communication in complex System-on-a-Chip systems
Wrappers enable the RCBs to communicate with the NoC
Routers forward the packets to their destination in the NoC
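How routers forward a packet hop by hop can be sketched as follows. The sketch assumes a 2D mesh with dimension-ordered (XY) routing; the slides do not state the topology or the routing algorithm used in DROPS, so both are assumptions, as are the function names.

```python
def xy_route(src, dst):
    """Return the router coordinates visited on the way from src to dst.

    Assumes a 2D mesh and XY routing: move along x first, then along y.
    src and dst are (x, y) tuples of router coordinates.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src, dst):
    """Number of router-to-router links the packet traverses."""
    return len(xy_route(src, dst)) - 1
```

Under these assumptions a packet from router (0, 0) to router (2, 1) takes 3 hops, so components that exchange many packets benefit from being placed on neighbouring routers.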
Implementation
Model in SystemC: Dependable Reconfiguration Platform and Simulator (DROPS)
NoC structure provided by Noxim [1]
Instruction Set Simulator (ISS) from SoCLib [2]
Models a Xilinx MicroBlaze CPU
RAM, interrupt controller unit (ICU) from SoCLib
Cache, Simulationhelper, TTY from SoCLib
RCBs with Wrapper
Realisable on a Xilinx FPGA
Sources:
[1] http://sourceforge.net/projects/noxim/
[2] http://www.soclib.fr/trac/dev
Implementation
A runtime system was developed:
Lets the MicroBlaze processor control the RCBs
Supervises the state of each RCBWrapper and performs reconfigurations
Enables the interaction between MicroBlaze and RCBs
Interaction is memory-mapped
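The idea of a runtime system that talks to RCB wrappers through memory-mapped registers can be sketched with a toy model. Everything here is hypothetical: the register offsets, the base address, the class names and the behaviour of the wrapper are illustrative assumptions, not the DROPS register map or its API.

```python
# Hypothetical register offsets; the real DROPS register map is not given.
REG_STATUS, REG_CONFIG, REG_ARG_A, REG_ARG_B, REG_RESULT = 0x0, 0x4, 0x8, 0xC, 0x10
IDLE, MULTIPLIER = 0, 1  # hypothetical configuration identifiers


class RCBWrapperModel:
    """Toy stand-in for an RCB wrapper seen through memory-mapped registers."""

    def __init__(self):
        self.regs = {REG_STATUS: IDLE, REG_CONFIG: IDLE,
                     REG_ARG_A: 0, REG_ARG_B: 0, REG_RESULT: 0}
        self.reconfigurations = 0

    def write(self, offset, value):
        if offset == REG_CONFIG and value != self.regs[REG_CONFIG]:
            self.reconfigurations += 1      # a real reconfiguration takes time
            self.regs[REG_STATUS] = value   # the model completes it instantly
        self.regs[offset] = value
        # Writing the second operand starts the computation in this model.
        if offset == REG_ARG_B and self.regs[REG_STATUS] == MULTIPLIER:
            self.regs[REG_RESULT] = self.regs[REG_ARG_A] * value

    def read(self, offset):
        return self.regs[offset]


class RuntimeSystem:
    """Maps each wrapper to a base address and reconfigures only on demand."""

    def __init__(self, wrappers, base=0x80000000, stride=0x100):
        self.bus = {base + i * stride: w for i, w in enumerate(wrappers)}

    def multiply(self, base_addr, a, b):
        rcb = self.bus[base_addr]
        if rcb.read(REG_STATUS) != MULTIPLIER:  # supervise the wrapper state
            rcb.write(REG_CONFIG, MULTIPLIER)   # trigger a reconfiguration
        rcb.write(REG_ARG_A, a)
        rcb.write(REG_ARG_B, b)
        return rcb.read(REG_RESULT)
```

In this model, two consecutive multiply calls on the same wrapper cause only one reconfiguration, because the runtime system checks the wrapper state before reconfiguring.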
Tests & Results
Tests were done with the Mandelbrot Set
Many multiplication operations
ISS cannot compute multiplications efficiently
Different experiments were conducted to examine the acceleration of the Mandelbrot Set computation
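Why the Mandelbrot Set is multiplication-heavy can be seen from the standard escape-time iteration for a single pixel. This is a generic textbook sketch, not the code used in the tests; the function name, the iteration limit and the use of floating point are assumptions.

```python
def mandelbrot_pixel(cx, cy, max_iter=100):
    """Escape-time iteration z -> z^2 + c for one point c = cx + i*cy.

    Returns (iterations, multiplications) so the cost per pixel is visible.
    """
    x = y = 0.0
    mults = 0
    for i in range(max_iter):
        # Three multiplications per iteration: x*x, y*y and x*y.
        xx, yy, xy = x * x, y * y, x * y
        mults += 3
        if xx + yy > 4.0:          # |z| > 2: the point escapes
            return i, mults
        # The doubling of x*y is written as an addition.
        x, y = xx - yy + cx, xy + xy + cy
    return max_iter, mults
```

A point inside the set, such as c = 0, runs for the full iteration limit and therefore needs three multiplications for every iteration.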
Tests & Results
Test system (picture on the right):
2-way associative cache, 1024 lines of 16 words each
Test:
16 x 16 pixels of the Mandelbrot Set
One accelerator in an RCB calculates one single pixel
Pure software execution on MicroBlaze: 4.65 * 10^7 cycles (181641 cycles for one pixel)
Accelerated execution with 5 RCBs: 4 * 10^5 cycles (1563 cycles for one pixel)
Only 0.9% of the cycles are needed!
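The per-pixel figures and the percentage follow directly from the total cycle counts and the 16 x 16 image size. The short calculation below reproduces them; the variable names are chosen for illustration only.

```python
PIXELS = 16 * 16                 # 256 pixels in the test image
software_cycles = 4.65e7         # pure software execution on MicroBlaze
accelerated_cycles = 4e5         # accelerated execution with 5 RCBs

per_pixel_sw = software_cycles / PIXELS        # 181640.625, shown as 181641
per_pixel_acc = accelerated_cycles / PIXELS    # 1562.5, shown as 1563
ratio = accelerated_cycles / software_cycles   # about 0.0086, i.e. about 0.9%
speedup = software_cycles / accelerated_cycles # about 116

print(f"software:    {per_pixel_sw:.0f} cycles per pixel")
print(f"accelerated: {per_pixel_acc:.1f} cycles per pixel")
print(f"share of cycles needed: {ratio * 100:.1f}%")
print(f"speedup: about {speedup:.0f}x")
```

Expressed as a speedup, needing roughly 0.9% of the cycles corresponds to a factor of about 116.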
Conclusion
A system called DROPS, which uses a NoC structure and RCBs, was modelled and implemented
It was shown that a NoC containing RCBs can be used to accelerate applications like the Mandelbrot Set
The system was examined with different numbers of RCBs and hops
The acceleration cannot be significantly increased beyond 3 RCBs
Tests with the number of hops showed that the components should be placed close to each other