Electrical and Computer Engineering
Muhammad Noman Ashraf
Optimization of Data-Flow Computations Using Canonical TED Representation
M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
ECE 667 Synthesis and Verification of Digital Systems, Spring 2011
Slides adapted from D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations,” DATE (2009)
Overview
• Motivation
• TED Review
• Related Work
• TED Decomposition System
• TED Linearization
• Product-Term Extraction
• Sum-Term Extraction
• Reordering
• DFG Generation
• Replacing Constant Multipliers by Shifters
• Conclusion
• References
Motivation
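• $F = a \cdot (f \cdot (g + d \cdot c) + c \cdot e \cdot g)$: minimum number of operations, 5 MPY and 2 ADD; with resources of 2 MPY and 1 ADD it is scheduled in 5 control steps, latency L = 3 MPY + 2 ADD
• $F = a \cdot f \cdot g + a \cdot f \cdot d \cdot c + a \cdot c \cdot e \cdot g$: 8 MPY, 2 ADD
• $F = (a \cdot f) \cdot (g + d \cdot c) + (a \cdot c) \cdot e \cdot g$: 6 MPY, 2 ADD; with the same resources (2 MPY, 1 ADD) it is scheduled in 4 control steps, latency L = 3 MPY + 1 ADD
• The form with the fewest operations is not the form with the lowest latency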
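The operation counts and critical-path lengths above can be checked mechanically. Below is a minimal Python sketch, assuming each expression is written as a binary tree with unit-delay operators and no resource limits, so it reports the unconstrained critical path rather than the resource-constrained schedule on the slide. The helpers `mul`, `add`, `count_ops` and `depth` are hypothetical illustrations, not part of TDS.

```python
from typing import Tuple, Union

# An expression is a variable name or a binary node (op, left, right),
# where op "*" costs one MPY and op "+" costs one ADD.
Expr = Union[str, Tuple[str, "Expr", "Expr"]]

def mul(x: Expr, y: Expr) -> Expr:
    return ("*", x, y)

def add(x: Expr, y: Expr) -> Expr:
    return ("+", x, y)

def count_ops(e: Expr) -> Tuple[int, int]:
    """Return (number of MPY, number of ADD) in the tree."""
    if isinstance(e, str):
        return 0, 0
    op, left, right = e
    ml, al = count_ops(left)
    mr, ar = count_ops(right)
    return ml + mr + (1 if op == "*" else 0), al + ar + (1 if op == "+" else 0)

def depth(e: Expr) -> int:
    """Operations on the critical path (control steps with unlimited resources)."""
    if isinstance(e, str):
        return 0
    _, left, right = e
    return 1 + max(depth(left), depth(right))

# F = a(f(g + dc) + ceg): factored form
F1 = mul("a", add(mul("f", add("g", mul("d", "c"))), mul(mul("c", "e"), "g")))
# F = afg + afdc + aceg: expanded form
F2 = add(add(mul(mul("a", "f"), "g"), mul(mul(mul("a", "f"), "d"), "c")),
         mul(mul(mul("a", "c"), "e"), "g"))
# F = (af)(g + dc) + (ac)eg
F3 = add(mul(mul("a", "f"), add("g", mul("d", "c"))),
         mul(mul(mul("a", "c"), "e"), "g"))

for name, f in [("F1", F1), ("F2", F2), ("F3", F3)]:
    print(name, count_ops(f), depth(f))
# F1 (5, 2) 5
# F2 (8, 2) 5
# F3 (6, 2) 4
```

F3 spends one extra multiplication compared with F1 but shortens the critical path from 5 to 4 operations, which is the trade-off the motivating example illustrates.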
TED Review [Construction]
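$F = x(zu + qw) + pw^2 + yw$
[Figure: TED of F, built from the subexpressions zu, qw, (zu + qw), x(zu + qw), pw^2 and yw. The w node has an edge labeled ^2 (for w^2), so in this notation the TED is non-linear.]
Canonical for the given order: x, z, u, q, p, y, w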
RELATED WORK
• HDL compilers and high-level synthesis systems (Cyber, Spark, Catapult C)
  – Lack local optimality
• Kernel-based decomposition (KBD) [Hosangadi et al., “Optimizing Polynomial Expressions by Algebraic Factorization and Common Subexpression Elimination,” IEEE Transactions, 2005]
  – Lacks canonicity
• Cut-based decomposition (TED-based) [Askar et al., “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007]
  – Limitation: applicable only to TEDs with the disjoint decomposition property
Cut-Based Decomposition (Related Work)
• Top-down approach
• Apply a series of cuts (additive and multiplicative) to the edges so that the graph separates into two disjoint subgraphs
• Different sequences of cuts result in different DFGs
[Figure: DFG obtained with the cut sequence A3, A1, M1, A2]
[Figure: DFG obtained with the cut sequence A1, A3, M1, A2, shown next to the one for A3, A1, M1, A2]
TED decomposition [TDS]
• The cut-based decomposition described above works only for TEDs with the disjoint decomposition property
  – Many TEDs do not have this property
• New approach: bottom-up
  – Identify algebraic operations and extract them from the graph
  – Also works for TEDs without the disjoint decomposition property
  – TED-based factorization, CSE, and decomposition are jointly referred to as TED decomposition
• The approach systematically involves:
  – Linearization
  – Product-term extraction
  – Sum-term extraction
  – Reordering
  – DFG generation
TDS System Overview
[Figure: TDS flow compared with the standard HLS flow. TED-based transformations: TED linearization, variable ordering, TED factorization & decomposition, constant multiplication & shifter generation, common subexpression elimination (CSE). DFG-based transformations: static timing analysis, latency optimization, resource constraints, behavioral transformations. Other blocks: matrix transforms and polynomials, C/behavioral HDL, DFG extraction, original DFG, functional TED, structural DFG, structural elements, design objectives, design constraints, optimized DFG, TDS netlist, high-level synthesis (GAUT), RTL VHDL.]
TED Linearization
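• A TED naturally represents a polynomial in its factored form
• This advantage is lost for non-linear expressions, where a variable appears with a power greater than 1
• Example: in $F = a^2c + abc$, $a$ could be factored out, but the non-linear TED does not expose this
  – Split $a^2$ into $a_1 \cdot a_2$
  – $F = a_1(a_2 + b)c$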
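The effect of splitting a power into distinct copies can be sketched on a flat sum of products. This is a minimal Python illustration, not a TED implementation: it assumes each monomial is a dictionary from variable to exponent, and the helpers `linearize` and `common_factor` are hypothetical. Copies are named `a1`, `a2` as on the slide, so `b` and `c` become `b1` and `c1` here.

```python
from functools import reduce
from typing import Dict, FrozenSet, List

Monomial = Dict[str, int]  # variable -> exponent, e.g. a^2*c -> {"a": 2, "c": 1}

def linearize(poly: List[Monomial]) -> List[FrozenSet[str]]:
    """Replace each power v^k by k distinct copies v1*v2*...*vk, so every
    variable copy in the result has exponent 1."""
    return [frozenset(f"{v}{i}" for v, k in mono.items() for i in range(1, k + 1))
            for mono in poly]

def common_factor(terms: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Variable copies shared by every product term can be factored out."""
    return reduce(lambda x, y: x & y, terms)

F = [{"a": 2, "c": 1}, {"a": 1, "b": 1, "c": 1}]  # F = a^2*c + a*b*c
terms = linearize(F)
g = common_factor(terms)
rest = [sorted(t - g) for t in terms]
print(sorted(g), rest)  # ['a1', 'c1'] [['a2'], ['b1']]  i.e. F = a1*c1*(a2 + b1)
```

After the split, `a1` occurs in both terms and can be pulled out, giving the same factorization $a_1(a_2 + b)c$ as on the slide; this sketch only factors one level and does not build the graph.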
TED Linearization [back to previous example]
• Split $w^2$ into $w_1$ and $w_2$
TED Linearization [Concept]
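[Figure: a node x with edges of order 0, 1, …, n to the subgraphs F0, F1, …, Fn is replaced by a chain of linear nodes x1, x2, …, xn; each xi has a 0-edge to F(i-1) and a 1-edge to x(i+1), and the 1-edge of xn leads to Fn.]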
• Split $x^k = x_1 \cdot x_2 \cdots x_k$, where $x_i = x_j$ for all $i, j$
• Iteratively perform the splitting on high-order nodes
• The above substitution results in the Horner form, which contains the minimum number of multiplications
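The multiplication savings of the Horner form can be made concrete with a small Python sketch. It assumes the subgraphs F0, …, Fn are plain integer coefficients, which is a simplification of the TED case, and the functions `naive_eval` and `horner_eval` are hypothetical names used only for this illustration.

```python
from typing import List, Tuple

def naive_eval(coeffs: List[int], x: int) -> Tuple[int, int]:
    """F = F0 + F1*x + ... + Fn*x^n, computing every power x^k from scratch.
    Returns (value, number of multiplications)."""
    total, mults = coeffs[0], 0
    for k in range(1, len(coeffs)):
        power = x
        for _ in range(k - 1):      # x^k takes k-1 multiplications
            power *= x
            mults += 1
        total += coeffs[k] * power  # plus one for the coefficient
        mults += 1
    return total, mults

def horner_eval(coeffs: List[int], x: int) -> Tuple[int, int]:
    """F = F0 + x*(F1 + x*(F2 + ... + x*Fn)): one multiplication per level,
    which is the structure the chain x1, x2, ..., xn encodes."""
    acc, mults = coeffs[-1], 0
    for c in reversed(coeffs[:-1]):
        acc = c + x * acc
        mults += 1
    return acc, mults

coeffs = [1, 2, 3, 4]          # F = 1 + 2x + 3x^2 + 4x^3
print(naive_eval(coeffs, 2))   # (49, 6)
print(horner_eval(coeffs, 2))  # (49, 3)
```

For degree n the naive form needs n(n+1)/2 multiplications, while the Horner form needs only n.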
Product Term Extraction
• Extractable product term: a product of variables that appear in the expression only once
  – Can be extracted from the TED without duplicating any of its variables
• In the TED, it is a set of nodes connected by a series of multiplicative edges only
  – The starting and ending nodes can have incident additive edges
  – The starting and ending nodes can have more than one incoming or outgoing multiplicative edge
  – The ending node can be the terminal node 1
• [TDS] Recursively identifies such terms by traversing the graph in a bottom-up fashion
  – For each node, a depth-first approach is used to include nodes in the product term
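Product-Term Extraction [back to example]
• Start at u: u has only one multiplicative (*) parent … YES; u has only one child path … YES → CONTINUE
• z has only one * parent … YES; z has only one * child path … NO → BACKTRACK
• Extracted product terms: $P_1 = z \cdot u$, and a second term $P_2$
[Figure: TED of the example with the extracted product terms P1 and P2 marked]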
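The "variables that appear only once" criterion can be approximated on an expression tree instead of the TED graph. The Python sketch below is an assumption-laden stand-in for the TDS traversal, not its implementation: it works bottom-up over a hypothetical `(op, [children])` tree and groups, inside each product, the factors that occur once in the whole expression. The input is the linearized example $x(zu + q w_1) + p w_1 w_2 + y w_1$, with $w^2$ split into $w_1 w_2$; the helpers `count_vars` and `extract_products` are illustrative names.

```python
from collections import Counter

# Expression tree: a variable name, or (op, [children]) with op "+" or "*".
def count_vars(e, cnt):
    """Count how many times each variable occurs in the expression."""
    if isinstance(e, str):
        cnt[e] += 1
    else:
        for child in e[1]:
            count_vars(child, cnt)
    return cnt

def extract_products(e, cnt, terms):
    """Bottom-up: in each product node, group the factors that occur only once
    in the whole expression into a new product term P_i."""
    if isinstance(e, str):
        return e
    op, kids = e
    kids = [extract_products(child, cnt, terms) for child in kids]
    if op == "*":
        once = [k for k in kids if isinstance(k, str) and cnt[k] == 1]
        if len(once) >= 2:
            name = f"P{len(terms) + 1}"
            terms[name] = once
            kids = [name] + [k for k in kids if k not in once]
    return kids[0] if len(kids) == 1 else (op, kids)

# Linearized example: x(zu + q*w1) + p*w1*w2 + y*w1   (w^2 split into w1*w2)
F = ("+", [("*", ["x", ("+", [("*", ["z", "u"]), ("*", ["q", "w1"])])]),
           ("*", ["p", "w1", "w2"]),
           ("*", ["y", "w1"])])
terms = {}
G = extract_products(F, count_vars(F, Counter()), terms)
print(terms)  # {'P1': ['z', 'u'], 'P2': ['p', 'w2']}
```

The shared copy `w1` is never absorbed into a product term, because extracting it would duplicate a variable; only `z*u` and `p*w2` qualify, matching the two product terms found in the example.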
Sum Term Extraction
• Extractable sum term: a sum of variables that appear in the expression only once
  – Can be extracted from the TED without duplicating any of its variables
• “Set of nodes incident to multiplicative edges joined at a single common node, such that nodes in question are connected by a chain of additive edges only”
• [TDS] Recursively identifies such terms by traversing the graph in a bottom-up fashion
  – For each node, make a list of incident nodes and extract the nodes from the list if they are connected by additive edges only
• [TDS] Uses the associativity property of addition
Sum-Term Extraction [back to example]
[Figure: starting from the bottom of the TED, the sum term S1 is extracted; the remaining support is kept (irreducible).]
Sum-Term Extraction [cont. – back to example]
• If sum-term extraction exposes more product terms, go back to product-term extraction
• Stop when the TED is irreducible
• Then generate the DFG (explained later)
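The alternation between product-term and sum-term extraction can be sketched as a fixpoint loop. This is a minimal Python illustration under stated assumptions: it operates on a hypothetical `(op, [children])` expression tree rather than on a TED, groups sums n-ary instead of pairwise, and uses the illustrative helpers `count_vars`, `extract` and `decompose`. The input is the factored form $w(q + pw + y) + xzu$ reached in the example, with the two occurrences of $w$ linearized into `w1` (outer) and `w2` (inner).

```python
from collections import Counter

def count_vars(e, cnt):
    """Count occurrences of each variable in the expression tree."""
    if isinstance(e, str):
        cnt[e] += 1
    else:
        for child in e[1]:
            count_vars(child, cnt)
    return cnt

def extract(e, op, cnt, defs):
    """One bottom-up pass: in every node with operator `op`, group the children
    that are single-use leaves into a fresh term (P_i for '*', S_i for '+')."""
    if isinstance(e, str):
        return e
    node_op, kids = e
    kids = [extract(child, op, cnt, defs) for child in kids]
    if node_op == op:
        once = [k for k in kids if isinstance(k, str) and cnt[k] == 1]
        if len(once) >= 2:
            prefix = "P" if op == "*" else "S"
            name = f"{prefix}{sum(1 for d in defs if d[0] == prefix) + 1}"
            defs[name] = (op, once)
            cnt[name] = 1  # the new term is itself used exactly once
            kids = [name] + [k for k in kids if k not in once]
    return kids[0] if len(kids) == 1 else (node_op, kids)

def decompose(e):
    """Alternate product- and sum-term extraction until nothing changes."""
    cnt, defs = count_vars(e, Counter()), {}
    while True:
        before = e
        e = extract(e, "*", cnt, defs)  # product-term extraction
        e = extract(e, "+", cnt, defs)  # sum-term extraction
        if e == before:                 # irreducible: stop
            return e, defs

# w(q + p*w + y) + x*z*u, with w linearized into w1 (outer) and w2 (inner)
F = ("+", [("*", ["w1", ("+", ["q", ("*", ["p", "w2"]), "y"])]),
           ("*", ["x", ("*", ["z", "u"])])])
root, defs = decompose(F)
print("F =", root)
for name, (op, args) in defs.items():
    print(name, "=", f" {op} ".join(args))
```

The run ends with F = S2 = P4 + P3, where P4 = w1 * S1, S1 = q + P1 + y, P1 = p * w2, P3 = x * P2 and P2 = z * u. This mirrors the derivation given for iteration 3 up to term names and the pairwise grouping of sums; the second pass is needed because extracting S1 exposes the new product w1 * S1.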
Reordering [Back to previous example -> Iteration 2 extraction]
[Figure: after reordering, extraction yields product terms P3, P4, P5 and sum terms S2, S3; stop when the TED is irreducible.]
DFG Generation and Optimization
• Transform each irreducible TED into a simple DFG
  – Additive edge -> addition operation
  – Multiplicative edge -> multiplication operation
  – Break operations with multiple operands into chains of two-operand operations
• [TDS] Maintains a hash table of DFG nodes keyed by the corresponding function
  – Allows a node to be reused if the same function/expression is found again
  – Captures redundancy caused by a poor variable order during factorization
• The DFG is not unique
  – It can be restructured and balanced to minimize cost
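Node reuse through a hash table can be sketched as follows. This Python example is a simplification and an assumption: TDS keys nodes by the function they compute, whereas the hypothetical `DFG` class here keys them structurally, by operator plus operand names (sorted, since + and * are commutative). It also shows multi-operand operations being broken into chains of two-operand nodes.

```python
from typing import Dict, List, Tuple

class DFG:
    """Two-operand DFG with a hash table for node reuse (hash-consing)."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, ...], str] = {}       # key -> node id
        self.nodes: List[Tuple[str, str, str, str]] = []  # (id, op, a, b)

    def node(self, op: str, a: str, b: str) -> str:
        # + and * are commutative, so sort the operands to build the key.
        key = (op,) + tuple(sorted((a, b)))
        if key not in self.table:
            node_id = f"n{len(self.nodes)}"
            self.table[key] = node_id
            self.nodes.append((node_id, op, a, b))
        return self.table[key]

    def chain(self, op: str, operands: List[str]) -> str:
        # Break an n-operand operation into a chain of two-operand operations.
        acc = operands[0]
        for x in operands[1:]:
            acc = self.node(op, acc, x)
        return acc

g = DFG()
f1 = g.chain("+", [g.chain("*", ["a", "b"]), "c"])  # a*b + c
f2 = g.chain("*", ["b", "a", "d"])                  # b*a*d reuses the a*b node
for n in g.nodes:
    print(n)
# ('n0', '*', 'a', 'b')
# ('n1', '+', 'n0', 'c')
# ('n2', '*', 'n0', 'd')
```

A structural key only catches repeats with the same operand grouping; a key derived from the canonical function, as the slide describes, would also catch equivalent expressions written differently. Variable names that look like node ids (`n0`, …) would collide in this sketch.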
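Data Flow Graph
[Figure: DFG from the iteration 2 extraction, scheduled in 4 control steps. Total: 5 MPY, 3 ADD; latency L = 2 MPY + 2 ADD; required resources (reordering cost): 3 MPY, 2 ADD.]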
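Reordering [-> Iteration 3 extraction]
[Figure: iteration 3 extraction with the terms S2, P3, P4, S3; the previous DFG has L = 2 MPY + 2 ADD and requires 3 MPY, 2 ADD.]
Cost involves:
• Reordering of variables
• Extraction
• DFG generation
• Annotating latency and resource requirements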
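Generating and evaluating new Data Flow Graph [Iteration 3]
$F = S_3 = P_4 + P_3 = w \cdot S_2 + x \cdot P_1 = w \cdot (q + S_1) + x \cdot (z \cdot u) = w \cdot (q + P_2 + y) + x \cdot z \cdot u = w \cdot (q + p \cdot w + y) + x \cdot z \cdot u$
• Total: 4 MPY, 3 ADD
• Reordering cost: L = 2 MPY + 2 ADD with 2 MPY, 1 ADD required (4 control steps); with only 1 MPY, 1 ADD, L = 2 MPY + 3 ADD (5 control steps)
• Previous cost: L = 2 MPY + 2 ADD with 3 MPY, 2 ADD required
[Figure: the new DFG scheduled in 4 and in 5 control steps, with the MPY (×) and ADD (+) units used in each step]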
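The latency figures for the new DFG can be reproduced with a simple resource-constrained list scheduler. The following Python sketch is an assumption, not the TDS or GAUT scheduler: it uses unit-delay operators, prioritizes ready operations by their longest path to the output (a common list-scheduling heuristic), and encodes one possible DFG for $w(q + pw + y) + xzu$ in which $q + y$ is added first. The names `heights`, `list_schedule` and `F_DFG` are hypothetical.

```python
from typing import Dict, List, Tuple

# node -> (operator, predecessor nodes); primary inputs are not listed.
F_DFG: Dict[str, Tuple[str, List[str]]] = {
    "m1": ("*", []),            # p * w
    "a1": ("+", []),            # q + y
    "a2": ("+", ["a1", "m1"]),  # (q + y) + p*w
    "m2": ("*", ["a2"]),        # w * a2
    "m3": ("*", []),            # z * u
    "m4": ("*", ["m3"]),        # x * m3
    "a3": ("+", ["m2", "m4"]),  # F
}

def heights(dfg):
    """Longest path (in operations) from each node to a sink; used as priority."""
    succ = {n: [] for n in dfg}
    for n, (_, preds) in dfg.items():
        for p in preds:
            succ[p].append(n)
    memo = {}
    def h(n):
        if n not in memo:
            memo[n] = 1 + max((h(s) for s in succ[n]), default=0)
        return memo[n]
    return {n: h(n) for n in dfg}

def list_schedule(dfg, resources):
    """Unit-delay list scheduling; returns the operations of each control step."""
    prio = heights(dfg)
    done, steps = set(), []
    while len(done) < len(dfg):
        ready = [n for n, (_, preds) in dfg.items()
                 if n not in done and all(p in done for p in preds)]
        ready.sort(key=lambda n: -prio[n])
        step, used = [], {op: 0 for op in resources}
        for n in ready:
            op = dfg[n][0]
            if used[op] < resources[op]:
                used[op] += 1
                step.append(n)
        done.update(step)
        steps.append(step)
    return steps

print(len(list_schedule(F_DFG, {"*": 2, "+": 1})))  # 4
print(len(list_schedule(F_DFG, {"*": 1, "+": 1})))  # 5
```

With 2 MPY and 1 ADD the DFG fits in 4 control steps, and with 1 MPY and 1 ADD it needs 5, matching the two schedules annotated for iteration 3. The `resources` dictionary must list every operator used in the DFG.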
Reordering [-> Iteration 4 extraction, DFG generation]
Design Space Exploration
• Through reordering, all cases can be obtained
[Figure: alternative DFGs obtained by reordering, numbered 1 to 4]
Replacing Constant Multipliers by Shifters*
• Transform constant multiplications into shifters, while considering factorization involving the shifters
• Steps:
  – Represent each constant in CSD (canonical signed digit) format, using a shift variable $L^i$ instead of $2^i$ for shifting by $i$ bits
  – Generate the TED with shift variables, linearize it, and perform decomposition
  – Replace terms involving shift variables ($L^i$) by $i$-bit shifters
• Example: $7a + 6b = (L^3 - 1)a + (L^3 - L)b = L^3(a + b) - L \cdot b - a$, implemented as ((a + b) << 3) - (a + (b << 1))
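The CSD digits and the shifter form in the example can be checked with a short Python sketch. It computes CSD digits with the standard non-adjacent-form recurrence (an assumption about how the constants are encoded; the slides do not specify the algorithm), and the helpers `csd` and `shift_add` are hypothetical names used only here.

```python
from typing import List, Tuple

def csd(n: int) -> List[Tuple[int, int]]:
    """CSD (non-adjacent form) digits of n >= 0 as (sign, shift) pairs,
    so that n == sum(sign * 2**shift)."""
    digits, shift = [], 0
    while n:
        if n & 1:
            sign = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            digits.append((sign, shift))
            n -= sign
        n >>= 1
        shift += 1
    return digits

def shift_add(c: int, x: int) -> int:
    """Multiply x by the constant c using only shifts and additions/subtractions."""
    return sum(sign * (x << s) for sign, s in csd(c))

print(csd(7), csd(6))  # [(-1, 0), (1, 3)] [(-1, 1), (1, 3)]  i.e. L^3 - 1 and L^3 - L
for a in range(-4, 5):
    for b in range(-4, 5):
        factored = ((a + b) << 3) - (a + (b << 1))  # L^3(a+b) - L*b - a
        assert factored == shift_add(7, a) + shift_add(6, b) == 7 * a + 6 * b
```

Multiplying each constant separately costs two shifters ($a \ll 3$, $b \ll 3$) plus $b \ll 1$; factoring out the common $L^3$ merges the two 3-bit shifts into one, which is the saving the TED decomposition exposes.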
TDS – TED Decomposition System: Recap
• Read in the CDFG file (cdfg), a polynomial expression (poly), or pre-coded DSP transforms (tr)
• Translate into a functional TED (dfg2ted) and structural elements (comparators, etc.)
• Linearize its data path (linearize)
• Iterate:
  – Product-term extraction
  – Sum-term extraction
• Reorder to minimize latency (reorder)
• Result: a set of irreducible TEDs
• Produce the final DFG (ted2dfg) and annotate back the CDFG file (write)
• Target: data-flow and computation-intensive designs (DSP)
• Design space exploration
Conclusion
• Results in the paper show a 15% latency improvement and a 7% area reduction when using the DFG generated by TDS instead of the one generated by KBD
  – The improvements are much larger when compared with the original DFG
• TDS serves as a front end to GAUT
• Fundamental limitation: the decomposition depends on variable reordering, which is an expensive operation
REFERENCES
M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot, and E. Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
M. Ciesielski, S. Askar, D. Gomez-Prado, J. Guillot, and E. Boutillon, “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007, pp. 455–460.
TDS – TED-Based Dataflow Decomposition System, Univ. Massachusetts, Amherst, MA. [Online]. Available: http://www.ecs.umass.edu/ece/labs/vlsicad/tds.html
Experiment Setup*
[Figure: the TDS flow diagram (TED-based and DFG-based transformations feeding high-level synthesis with GAUT), used to compare three DFG sources: KBD, the original DFG, and TED decomposition.]
Results*
[Figure: results for TED decomposition compared with KBD]
Results: Quintic Spline*
[Figure: quintic spline results compared with KBD]
Results: Quartic Spline*
[Figure: quartic spline results compared with KBD]
Improvement over KBD and Original*
[Figure: improvement of TED decomposition over KBD and over the original DFG]