Physically Aware Data Communication Optimization
for Hardware Synthesis
Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer
Dept. of Electrical and Computer Engineering
University of California, Santa Barbara
Adam Kaplan, Philip Brisk and Majid Sarrafzadeh
Computer Science Department
University of California, Los Angeles
Hardware Compilation
Flow: application specified in a high-level language → Compiler → HDL (behavioral, structural) → Synthesis and Physical Design → chip, bitstream, …
We focus our efforts on mapping an application written in a high-level language to a hardware description.
We want this mapping to have optimal characteristics (area, latency, etc.).
In this talk, we focus on the problem of minimizing data communication in the final hardware.
Obligatory Design Flow Slide
Flow: Application Specification → SUIF: Syntactic & Semantic Analysis → AST → Machine SUIF: Compiler Backend → SSA, CDFG → Behavioral Synthesis → Logical & Physical Synthesis
Basic Block Entity (entity basic_block is… architecture behavioral of basic_block…)
1. Create interface
2. Transform instruction list to dataflow graph
3. Transform dataflow graph to behavioral HDL code
4. Synthesize behavioral HDL code to RTL code
CFG Entity (entity cfg is… architecture behavioral of cfg…)
5. Create CFG interface
6. Determine structural control and data communication between basic block entities (Entity 1 to Entity 4)
7. Generate synthesizable RTL code
8. Synthesize RTL code
Characterizing Data Communication
Examples of data communication schemes
Distributed: Control Nodes 1 to 4 are wired to each other directly; data communication = wire
Centralized: Control Nodes 1 to 4 share a bus to a memory (register bank, RAM); data communication = memory access
Identifying Data Communication
Determine the relationship between the place(s) where data is defined and where data is used
[Figure: example CFG whose control nodes define and use the variables a, b, and c]
Naïve method: all use-points of a variable depend on all definitions of that variable
Not all use points “use” a variable
Need analysis to minimize the amount of data communication
Global Data Communication = 5 variables
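The gap between the naïve method and an analysis-based one can be illustrated with a small sketch. The three-node chain, the `defs`/`uses` dictionaries, and both function names below are hypothetical and not taken from the talk; the second function uses a textbook reaching-definitions dataflow pass and assumes that uses inside a node occur before that node's own definitions.

```python
from collections import defaultdict

def naive_links(defs, uses):
    """Naive method: every use of a variable depends on every definition of it."""
    links = set()
    for use_node, used in uses.items():
        for var in used:
            for def_node, defined in defs.items():
                if var in defined and def_node != use_node:
                    links.add((def_node, use_node, var))
    return links

def reaching_links(succ, defs, uses):
    """Keep only definitions that actually reach a use (iterative dataflow)."""
    pred = defaultdict(set)
    for n, successors in succ.items():
        for s in successors:
            pred[s].add(n)

    def incoming(n, out):
        # Definitions arriving at n: union over all predecessors' outputs.
        return set().union(*(out[p] for p in pred[n])) if pred[n] else set()

    out = {n: set() for n in succ}          # (def_node, var) pairs leaving each node
    changed = True
    while changed:
        changed = False
        for n in succ:
            killed = defs.get(n, ())
            new_out = {(d, v) for (d, v) in incoming(n, out) if v not in killed}
            new_out |= {(n, v) for v in killed}
            if new_out != out[n]:
                out[n], changed = new_out, True

    links = set()
    for n in succ:
        for d, v in incoming(n, out):
            if v in uses.get(n, ()) and d != n:
                links.add((d, n, v))
    return links

# Hypothetical chain 1 -> 2 -> 3: node 2 redefines a before node 3 reads it.
succ = {1: [2], 2: [3], 3: []}
defs = {1: {"a"}, 2: {"a"}}
uses = {3: {"a"}}
naive = naive_links(defs, uses)
refined = reaching_links(succ, defs, uses)
```

In this example the naïve method reports two links into node 3, while the dataflow pass keeps only the link from node 2, because node 2 overwrites the value produced by node 1.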
Use of SSA in Compilation
[Figure: the example CFG before SSA (variables a, b, c) and after SSA renaming (a1, a2, a3, b1, b2, c1), with the merge a4 ← Φ(a2, a3)]
Must determine relationship between where data is generated and where data is used
Problem formulations
[DAC02]: Minimize the total number of bits communicated between all pairs of control nodes
Today: Minimize overall wirelength
SSA (Static Single Assignment)
Changes each variable to have a unique definition point
Must add Φ-nodes to merge definitions
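As background for where Φ-nodes are conventionally placed, the sketch below computes dominance frontiers and the iterated dominance frontier of a set of definition nodes, which is the classical SSA construction rather than anything specific to this talk. The diamond-shaped CFG and all function names are illustrative assumptions, and every node is assumed to be reachable from the entry.

```python
def dominators(succ, entry):
    """Iteratively compute the dominator set of every node."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, successors in succ.items():
        for s in successors:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, pred

def dominance_frontier(succ, entry):
    """y is in DF(x) if x dominates a predecessor of y but not strictly y itself."""
    dom, pred = dominators(succ, entry)
    df = {n: set() for n in succ}
    for y in succ:
        for p in pred[y]:
            for x in dom[p]:
                if x == y or x not in dom[y]:
                    df[x].add(y)
    return df

def iterated_dominance_frontier(df, def_nodes):
    """Close the dominance frontier under repeated application."""
    idf, work = set(), list(def_nodes)
    while work:
        n = work.pop()
        for y in df[n]:
            if y not in idf:
                idf.add(y)
                work.append(y)
    return idf

# Hypothetical diamond: a is redefined in both branches and merged at the join.
succ = {"entry": ["left", "right"], "left": ["join"], "right": ["join"], "join": []}
df = dominance_frontier(succ, "entry")
phi_nodes = iterated_dominance_frontier(df, {"left", "right"})
```

Here `phi_nodes` contains only the join node, which is where a merge such as a4 ← Φ(a2, a3) would be inserted.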
Physically Aware Compiler Transforms
Consider layout information during compilation
Modify transforms to consider physical info
Ideal: full physical synthesis is extremely accurate, but far too time consuming
Approximate using floorplanning
Much faster
Gives a “good enough” high-level physical picture
Previous data communication work
No physical information
Can lead to negative results
[Figure: application, Hardware Compilation, Floorplanner, and Physical Synthesis blocks]
Let’s Get Physical!
Physically Aware Data Communication
Modify placement of Φ-functions to consider wirelength
1. Given a CFG Gcfg(Vcfg, Ecfg)
2. perform_ssa(Gcfg)
3. calculate_def_use_chains(Gcfg)
4. remove_back_edges(Gcfg)
5. topological_sort(Gcfg)
6. foreach vertex v ∈ Vcfg
7.   foreach Φ-node φ ∈ v
8.     s ← φ.sources
9.     d ← |def_use_chain(φ.dest)|
10.    IDF ← iterated_dominance_frontier(s)
11.    PossiblePlacements ← findPlacementOptions(IDF)
12.    place(φ) ← selectBest(PossiblePlacements)
13.    distribute/duplicate φ to place(φ)
Φ-Placement Algorithm
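The slides do not spell out how selectBest scores a candidate, so the following is only a sketch under stated assumptions: each candidate is a single block, block positions come from a floorplan as (x, y) centers, and the cost is the bit width times the Manhattan distance from every source block to the candidate plus from the candidate to every block that uses the Φ result. The coordinates, the 32-bit width, and all names are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) block centers."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def phi_wirelength(candidate, sources, uses, pos, bits=32):
    """Assumed cost: wires from each source to the phi, then from the phi to each use."""
    incoming = sum(manhattan(pos[s], pos[candidate]) for s in sources)
    outgoing = sum(manhattan(pos[candidate], pos[u]) for u in uses)
    return bits * (incoming + outgoing)

def select_best(candidates, sources, uses, pos, bits=32):
    """Pick the candidate block with the smallest estimated wirelength."""
    return min(candidates, key=lambda c: phi_wirelength(c, sources, uses, pos, bits))

# Hypothetical floorplan: B1 and B2 define the operands, B4 consumes the result.
pos = {"B1": (0, 0), "B2": (4, 0), "B3": (2, 3), "B4": (2, 6)}
sources, uses = ["B1", "B2"], ["B4"]
best = select_best(["B3", "B4"], sources, uses, pos)
```

With these coordinates, placing the Φ-node in B3 costs 32 × 13 = 416 wire units against 32 × 16 = 512 for B4, so `best` is `"B3"`. A different cost model, for example one based on bounding boxes, would slot into `phi_wirelength` without changing the selection step.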
1. Given a set of CFG Nodes R
2. Φ-options ← ∅
3. insert(R) into Φ-options
4. foreach instruction i ∈ R
5.   if (i is a destination of Φ-function f)
6.     return Φ-options
7. temp_Φ-options ← ∅
8. foreach non-dominated child c of R
9.   temp_Φ-options ← crossProductJoin(temp_Φ-options, findPlacementOptions(c))
10. return Φ-options ∪ temp_Φ-options
FindPlacementOptions Algorithm
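One possible reading of this recursion is sketched below. It makes several assumptions that the slide does not confirm: R is treated as a single CFG node, the `children` map is supplied by the caller and already contains only the "non-dominated" children, back edges have been removed so the recursion terminates, a placement option is the set of nodes that each receive a copy of the Φ-function, and crossProductJoin unions one option from each side. All names are hypothetical.

```python
from itertools import product

def cross_product_join(left, right):
    """Combine every option on the left with every option on the right."""
    if not left:                      # an empty accumulator acts as the identity
        return list(right)
    return [a | b for a, b in product(left, right)]

def find_placement_options(node, children, users):
    """Enumerate sets of nodes in which the phi-function could be placed."""
    options = [frozenset([node])]     # keeping the phi in this node is always possible
    if node in users:                 # the value is consumed here: cannot sink further
        return options
    pushed = []
    for child in children.get(node, ()):
        # Sinking below this node requires a copy along every child path.
        pushed = cross_product_join(
            pushed, find_placement_options(child, children, users))
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(options + pushed))

# Hypothetical region: R branches to X and Y; the phi result is used in X1 and Y.
children = {"R": ["X", "Y"], "X": ["X1"]}
users = {"X1", "Y"}
options = find_placement_options("R", children, users)
```

For this region the sketch returns three options: keep the Φ-function in R, duplicate it into X and Y, or duplicate it into X1 and Y. Each option could then be scored by a wirelength estimate before one is selected.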
Full Floorplanning Results
Simple iterative approach
1. Initial optimization minimizes data communication
2. Full SA-based floorplanning
3. Reoptimization to minimize wirelength, based on the floorplanning result
4. Full SA-based floorplanning
[Figure: Hardware Compilation, Full Floorplanner, and Physical Synthesis blocks]
[Figure: Floorplan Wirelength; wirelength on a logarithmic axis (1 to 10,000,000) per benchmark; series WL (first) and WL (second)]
Spectacularly negative results
Incremental Floorplanning
Incremental Placement [Coudert et al]: Given an optimized placement and a set of changes to the netlist (e.g., due to technology remapping), modify the placement to improve it.
Equally applicable to floorplanning
[Figure: Initial Floorplan, perturbations to floorplan modules (e.g. due to Φ-function movement), Modified Floorplan, and the resulting Incremental Floorplan; the slicing tree is annotated 2/2.3, 9/10.1, 11/12.4, 16/18, 5/5.6, 27/30.4, 32/36]
Our Incremental Floorplanner
[Figure: Initial Floorplan and Perturbations feed the Incremental Floorplanner, which produces the Modified Floorplan]
1. Calculate area & room of each node: bottom-up slicing tree traversal
2. Area redistribution: top-down traversal
Increase area if necessary (not enough space at root; aspect ratios become too distorted)
[Figure: Modified Floorplan and Incremental Floorplan for modules 1 to 4 and 6, with the slicing tree annotated 2/2.3, 9/10.1, 11/12.4, 16/18, 5/5.6, 27/30.4, 32/36]
Simple, yet effective
Other more complicated algorithms might work better
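The two traversals can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the slicing-tree annotations are read as required area / allocated area (the figure values do sum consistently, e.g. 2 + 9 = 11 and 2.3 + 10.1 = 12.4), room is the difference between the two, and spare room is redistributed in proportion to required area. Aspect-ratio checks are omitted, and the leaf names and the size of the perturbation are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    area: float = 0.0                 # area required by the modules below this node
    alloc: float = 0.0                # area currently allocated in the floorplan
    children: List["Node"] = field(default_factory=list)

    @property
    def room(self) -> float:
        return self.alloc - self.area

def accumulate(node: Node) -> Node:
    """Bottom-up pass: an internal node sums the area and allocation of its children."""
    if node.children:
        for child in node.children:
            accumulate(child)
        node.area = sum(c.area for c in node.children)
        node.alloc = sum(c.alloc for c in node.children)
    return node

def redistribute(node: Node, alloc: float = None) -> None:
    """Top-down pass: split the allocation among children in proportion to area."""
    if alloc is None:
        # Grow the root only when the required area no longer fits.
        alloc = max(node.alloc, node.area)
    node.alloc = alloc
    for child in node.children:
        redistribute(child, alloc * child.area / node.area)

# Leaf values taken from the slicing-tree annotations; leaf names are made up.
a, b = Node("a", 2, 2.3), Node("b", 9, 10.1)
c, d = Node("c", 16, 18), Node("d", 5, 5.6)
root = accumulate(Node("root", children=[
    Node("n2", children=[Node("n1", children=[a, b]), c]), d]))

a.area += 1.0                         # hypothetical perturbation: module a grows
accumulate(root)
redistribute(root)
```

After the perturbation the root still has room (33 required against 36 allocated), so the total area is unchanged and module a receives its extra space from the slack of the other modules. Had the root run out of room, `redistribute` would enlarge the root allocation to the required area instead.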
MediaBench Functions
Benchmark Blocks Links Weight Initial WL
1 adpcm coder 33 31 54 2688 35568
2 adpcm decoder 26 23 44 1952 21588
3 internal filter 10 143 60 17088 411637
4 internal expand 101 94 257 14336 317031
5 compress output 34 17 60 2368 29114
6 mpeg2dec block 62 13 66 2272 34510
7 mpeg2dec vector 16 4 26 1024 4366
8 FAST 14 4 15 704 3714
9 FR4TR 77 87 155 704 340697
10 det 12 5 13 7936 3772
Incremental Floorplanning Results
[Figure: normalized wirelength (0 to 1.2) for benchmarks 1 to 10 and their average; series: Initial, Overall Optimal, Overall Incremental, Phi Optimal, Phi Incremental]
“Optimal” approach: 12% overall wirelength reduction, 25% Phi-node wirelength reduction
Our approach: 6% overall wirelength reduction, 8% Phi-node wirelength reduction
Related Work
Hardware compilation projects using SSA: PDG+SSA form [UCSB]; CASH [CMU]; SA-C [UCR]; Sea Cucumber [BYU]
Physically aware behavioral synthesis techniques: SA for scheduling, binding and floorplanning [Prabhakaran97]; SA for binding and floorplanning [Yung-Ming94]; scheduling, allocation and binding [Dougherty00]; Fasolt: bus topology [Knapp92]; high-level synthesis [Tarafdar00]
Incremental CAD: problem overview/challenges [Coudert00]; floorplanning [Crenshaw99]