Alan Mishchenko
UC Berkeley
Implementation of Industrial FPGA Synthesis Flow Revisited

Overview
• Introduction
    • Motivation
    • Structure of FPGA synthesis flow
    • Overview of the previous system
• Lessons learned while developing new system
    • Verilog parsing
    • Design representation
    • Netlist data structure
    • Integration of application packages
    • Customization
• Experimental results
• Future work

Motivation
• ABC is a logic synthesis and verification tool developed at Berkeley (http://www.bvsrc.org/)
• ABC has been in the public domain since 2005, but it does not meet all of the industrial requirements
• A new system is needed to fill the gap
    • Magic was an industrial version of ABC developed in 2010 and used by several companies
    • A new system to enhance ABC and replace Magic is being developed at this time
• This presentation shares this experience

What Is Missing in ABC?
The baseline version of ABC is not applicable to industrial designs because it does not support
• Complex flops
• Multiple clock domains
• Special objects (adders, RAMs, DSPs, etc.)
• Standard-cell libraries

FPGA Synthesis Flow
Inputting the design
Sequential synthesis
Comb synthesis with choices
Retiming and resynthesis
Tech mapping
Outputting the design
Verification

Magic: Synthesis Flow Based on ABC
Design database
Sequential synthesis
AIG rewriting
File / Code interface
Computing choices
Tech mapping
Retiming
Structuring for delay
Post-place resynthesis
Verification
Verilog, EDIF, BLIF
Programmable APIs
A. Mishchenko, N. Een, R. K. Brayton, S. Jang, M. Ciesielski, and T. Daniel, "Magic: An industrial-strength logic optimization, technology mapping, and formal verification tool". Proc. IWLS'10.

Case Study 1: Combinational Synthesis with Structural Choices
[Figure: traditional synthesis (D1, D2, D3, D4) compared with synthesis with choices (D1, D2, D3, D4 collected into a HAIG)]
• Perform synthesis and keep track of changes
    • Iterate fast local AIG rewriting with a global view (via hash table)
    • Collect AIG snapshots and prove equivalences across them
    • Use equivalences (choices) during technology mapping
• Observations
    • Leads to improved QoR after technology mapping
    • Successfully applied to 1M gate designs
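
The idea of collecting snapshots and turning proven equivalences into choices can be sketched in a few lines. This is only an illustration under stated assumptions: expressions are plain nested tuples rather than a real AIG, exhaustive simulation stands in for the equivalence proofs, and "mapping" is reduced to picking the shallowest member of a class. All names (`choice_classes`, `D1`, `D2`, and so on) are hypothetical and are not ABC or Magic APIs.

```python
from itertools import product

INPUTS = "abcd"

def AND(x, y): return ("and", x, y)
def NOT(x): return ("not", x)
def OR(x, y): return NOT(AND(NOT(x), NOT(y)))

def subexprs(e, out):
    """Collect every sub-expression of e (the nodes of one snapshot)."""
    out.add(e)
    if isinstance(e, tuple):
        for child in e[1:]:
            subexprs(child, out)
    return out

def evaluate(e, env):
    if isinstance(e, str):
        return env[e]
    if e[0] == "not":
        return 1 - evaluate(e[1], env)
    return evaluate(e[1], env) & evaluate(e[2], env)

def truth_table(e):
    """Exhaustive simulation; a stand-in for proving equivalence."""
    return tuple(evaluate(e, dict(zip(INPUTS, bits)))
                 for bits in product((0, 1), repeat=len(INPUTS)))

def depth(e):
    """Number of AND levels; inverters are free, as in an AIG."""
    if isinstance(e, str):
        return 0
    if e[0] == "not":
        return depth(e[1])
    return 1 + max(depth(e[1]), depth(e[2]))

def choice_classes(snapshots):
    """Group nodes from all snapshots by function; keep classes with choices."""
    nodes = set()
    for s in snapshots:
        subexprs(s, nodes)
    classes = {}
    for e in nodes:
        classes.setdefault(truth_table(e), []).append(e)
    return {tt: es for tt, es in classes.items() if len(es) > 1}

a, b, c, d = INPUTS
# Two snapshots of the same function with different structure.
D1 = OR(AND(a, b), AND(d, OR(AND(a, NOT(c)), AND(b, c))))
D2 = OR(AND(AND(a, NOT(c)), OR(b, d)), AND(AND(b, c), OR(a, d)))

classes = choice_classes([D1, D2])
root_choices = classes[truth_table(D1)]
best_for_delay = min(root_choices, key=depth)  # a mapper could pick per objective
```

Here the mapper sees both structures for the root function and can choose the 3-level one when delay matters, which is the benefit the slide attributes to choices.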

Case Study 2: Sequential Verification
• Property checking
    • Takes design and property and makes a miter (AIG)
• Equivalence checking
    • Takes two designs and makes a miter (AIG)
• The goal is to transform AIG until the output can be proved const 0
• Equivalence checking in Magic is based on the model checker that won Hardware Model Checking Competition in 2008, 2010, 2011
    • http://fmv.jku.at/hwmcc11/results.html
[Figure: equivalence checking (miter of D1 and D2, output 0) and property checking (miter of D1 and property p, output 0)]
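
The miter construction can be illustrated with a small sketch. It is combinational only, designs are plain Python functions rather than AIGs, and exhaustive enumeration stands in for the model checker, so it shows the shape of the problem and not how Magic solves it. The helper names and the carry-out example are assumptions made for illustration.

```python
from itertools import product

def equivalence_miter(design1, design2):
    """Output is 1 exactly when the two designs disagree."""
    return lambda *inputs: design1(*inputs) ^ design2(*inputs)

def property_miter(design, prop):
    """Output is 1 exactly when the property fails for the design's output."""
    return lambda *inputs: 1 - prop(*inputs, design(*inputs))

def find_counterexample(miter, num_inputs):
    """Return an input pattern that sets the miter to 1, or None if const 0."""
    for bits in product((0, 1), repeat=num_inputs):
        if miter(*bits):
            return bits
    return None

# Three versions of a full-adder carry-out over inputs (ci, a, b).
def carry_ref(ci, a, b): return (ci & a) | (ci & b) | (a & b)
def carry_opt(ci, a, b): return (a & b) | (ci & (a ^ b))
def carry_bug(ci, a, b): return (a & b) | (ci & a)   # missing the ci & b term

# Property p: the carry is set only if at least two inputs are 1.
def p(ci, a, b, out): return int(not out or ci + a + b >= 2)

ok = find_counterexample(equivalence_miter(carry_ref, carry_opt), 3)
bad = find_counterexample(equivalence_miter(carry_ref, carry_bug), 3)
prop_ok = find_counterexample(property_miter(carry_ref, p), 3)
```

A `None` result corresponds to the miter output being proved const 0; any returned pattern is a counterexample.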

AIG: A Unifying Representation
• An underlying data structure for various computations
    • Representing both local and global functions
    • Used in rewriting, resubstitution, simulation, SAT sweeping, induction, etc.
• A unifying representation for the whole flow
    • Synthesis, mapping, verification pass around AIGs
    • Stored multiple structures for mapping ('AIG with choices')
• The main functional representation in ABC
    • Foundation of 'contemporary' logic synthesis
    • Source of 'signature features' (speed, scalability, etc.)

AIG: Definition and Examples
AIG is a Boolean network composed of two-input ANDs and inverters

Karnaugh map of F (rows: cd, columns: ab), the same for both AIGs:
cd\ab  00  01  11  10
00      0   0   1   0
01      0   0   1   1
11      0   1   1   0
10      0   0   1   0

F(a,b,c,d) = ab + d(ac' + bc)
6 nodes, 4 levels
F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b+d) + bc(a+d)
7 nodes, 3 levels
[Figure: the two AIGs, with input labels b c a c / a b d for the first and a c b d b c a d for the second]
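
The two structures above can be reproduced with a minimal AIG sketch. The `Aig` class, its literal encoding (2*node for a node, 2*node+1 for its complement), and the hash-table key are assumptions chosen for illustration; they are not ABC's data structures, and no constant propagation or other simplification is attempted.

```python
from itertools import product

class Aig:
    """Two-input AND nodes plus complemented edges, with structural hashing."""

    def __init__(self, inputs):
        self.names = list(inputs)
        self.fanins = [None] * (len(self.names) + 1)  # node 0 is constant 0
        self.level = [0] * (len(self.names) + 1)
        self.table = {}                               # (lit0, lit1) -> literal

    def lit(self, name):
        return 2 * (self.names.index(name) + 1)

    def AND(self, x, y):
        key = (min(x, y), max(x, y))
        if key not in self.table:                     # reuse identical nodes
            self.fanins.append(key)
            self.level.append(1 + max(self.level[x >> 1], self.level[y >> 1]))
            self.table[key] = 2 * (len(self.fanins) - 1)
        return self.table[key]

    def OR(self, x, y):
        return self.AND(x ^ 1, y ^ 1) ^ 1             # De Morgan; ^1 inverts

    def num_ands(self):
        return len(self.table)

    def evaluate(self, lit, assignment):
        val = [0] + [assignment[n] for n in self.names]
        for node in range(len(self.names) + 1, len(self.fanins)):
            x, y = self.fanins[node]                  # nodes are in topological order
            val.append((val[x >> 1] ^ (x & 1)) & (val[y >> 1] ^ (y & 1)))
        return val[lit >> 1] ^ (lit & 1)

def form1():
    g = Aig("abcd")
    a, b, c, d = map(g.lit, "abcd")
    f = g.OR(g.AND(a, b), g.AND(d, g.OR(g.AND(a, c ^ 1), g.AND(b, c))))
    return g, f

def form2():
    g = Aig("abcd")
    a, b, c, d = map(g.lit, "abcd")
    f = g.OR(g.AND(g.AND(a, c ^ 1), g.AND(b ^ 1, d ^ 1) ^ 1),
             g.AND(g.AND(b, c), g.AND(a ^ 1, d ^ 1) ^ 1))
    return g, f

def truth(g, f):
    return [g.evaluate(f, dict(zip("abcd", bits)))
            for bits in product((0, 1), repeat=4)]

g1, f1 = form1()
g2, f2 = form2()
print(g1.num_ands(), g1.level[f1 >> 1])   # node and level count of the first form
print(g2.num_ands(), g2.level[f2 >> 1])   # node and level count of the second form
print(truth(g1, f1) == truth(g2, f2))     # both compute the same F
```

The trade-off is visible directly: the second form spends one extra AND node to save one level.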

Historical Perspective
[Figure: design size (gate count: 10, 100, 10,000, 1,000,000) plotted against time (years: 1950-1970, 1980, 1990, 2000, 2010). Representations shown: truth tables, sum-of-products, conjunctive normal forms, Binary Decision Diagrams, And-Inverter Graphs. Tools shown: Espresso, MIS, SIS; SIS, VIS, MVSIS; ABC, Magic.]

Magic 2: Lessons Learned
(1) Verilog parsing
    • Limit Verilog to a structural subset
(2) Design representation
    • Represent only relevant data and hide useless details
(3) Netlist data structure
    • Use simple, compact netlist data structure
(4) Integration of application packages
    • Make packages independent of the netlist and interface them using AIGs
(5) Customization
    • Make the system user-independent

(1) Verilog Parsing
• Verilog parsing is believed to be a difficult problem, and companies (e.g. Verific) offer industry-standard solutions
• However, several simplifying assumptions can make Verilog parsing a 1-person 1-month project:
    • Consider only structural Verilog
    • Read the file into memory and parse it in memory
    • Remove preprocessor definitions, comments, line endings, etc.
    • Split into statements separated by semicolons (;)
    • Parse in two passes: first, statements for module interfaces
        • module/endmodule, input/output/inout, etc.
    • Second, parse remaining statements, including instance definitions
    • Connect all constructed objects using net/pin names
    • Check the correctness of the connectivity info

Example
module add2( A, B, S, CO );
  input [1:0] A, B;
  output [1:0] S;
  output CO;
  wire n1;
  fadd inst1 (.ci(1'b0), .a(A[0]), .b(B[0]), .s(S[0]), .co(n1) );
  fadd inst2 (.ci(n1), .a(A[1]), .b(B[1]), .s(S[1]), .co(CO) );
endmodule

module fadd( ci, a, b, s, co );
  input ci, a, b;
  output s, co;
  assign s = ci ^ a ^ b;
  assign co = (ci & a) | (ci & b) | (a & b);
endmodule
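
The parsing steps listed on the previous slide can be sketched on this example as follows. This is a toy under stated assumptions, not the Magic parser: it handles only the constructs that appear in the example, ignores bit ranges and `assign` bodies, and all function names and the dictionary layout are invented for illustration. Note that `add2` instantiates `fadd` before `fadd` is defined, which is why the interfaces are collected in a first pass.

```python
import re

SOURCE = """
// 2-bit adder built from two full adders
module add2( A, B, S, CO );
  input [1:0] A, B;
  output [1:0] S;
  output CO;
  wire n1;
  fadd inst1 (.ci(1'b0), .a(A[0]), .b(B[0]), .s(S[0]), .co(n1) );
  fadd inst2 (.ci(n1), .a(A[1]), .b(B[1]), .s(S[1]), .co(CO) );
endmodule
module fadd( ci, a, b, s, co );
  input ci, a, b;
  output s, co;
  assign s = ci ^ a ^ b;
  assign co = (ci & a) | (ci & b) | (a & b);
endmodule
"""

def statements(text):
    """Strip comments and line endings, then split on semicolons."""
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)   # block comments
    text = re.sub(r"//[^\n]*", " ", text)                # line comments
    text = re.sub(r"\bendmodule\b", "endmodule;", text)  # endmodule has no ';'
    text = " ".join(text.split())                        # drop line endings
    return [s.strip() for s in text.split(";") if s.strip()]

def parse(text):
    stmts = statements(text)
    modules, current = {}, None
    # Pass 1: module interfaces only.
    for s in stmts:
        kw = s.split()[0]
        if kw == "module":
            name = re.match(r"module\s+(\w+)", s).group(1)
            current = modules[name] = {"input": [], "output": [],
                                       "inout": [], "instances": []}
        elif kw in ("input", "output", "inout"):
            decl = re.sub(r"\[[^\]]*\]", " ", s[len(kw):])  # ignore bit ranges
            current[kw] += [n.strip() for n in decl.split(",") if n.strip()]
    # Pass 2: instances, now that every module name is known.
    for s in stmts:
        kw = s.split()[0]
        if kw == "module":
            current = modules[re.match(r"module\s+(\w+)", s).group(1)]
        elif kw in modules:
            m = re.match(r"(\w+)\s+(\w+)\s*\((.*)\)$", s, flags=re.S)
            pins = dict(re.findall(r"\.(\w+)\s*\(\s*([^()]*?)\s*\)", m.group(3)))
            current["instances"].append((m.group(1), m.group(2), pins))
    return modules

def check(modules):
    """Connectivity check: each instance pin must be a port of its module."""
    errors = []
    for name, mod in modules.items():
        for cell, inst, pins in mod["instances"]:
            ports = set(modules[cell]["input"] + modules[cell]["output"]
                        + modules[cell]["inout"])
            errors += [(name, inst, pin) for pin in pins if pin not in ports]
    return errors

mods = parse(SOURCE)
errors = check(mods)
```

After parsing, `mods["add2"]["instances"]` holds the two `fadd` instances with their pin-to-net maps, and `errors` is empty when every pin name matches a port of `fadd`.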

(2) Design Representation
• Structural information
    • Inputs, outputs, wires, internal objects, etc.
    • Hierarchy (to be flattened, to be kept, library cells, etc.)
• Functional information
    • Combinational: gates, LUTs
    • Sequential: flip-flops, clocks
• Additional structural information
    • White/black/grey boxes: RAM, DSP, regfiles, etc.
    • Multiple clock domains, clock network
    • Tri-states, in-outs, etc.

Handling Design Representation
• Design representation should be comprehensive (represent complete information) but flexible (work only on what is necessary at each time)
• Examples:
    • to flatten hierarchy, only structural info is needed
    • to perform comb synthesis, only comb logic is needed
• In both cases, it should be possible to access and modify each type of information without changing other types

(3) Netlist Data Structure
• Should be very simple and easy to construct
    • Objects use as little memory as possible
        • Currently, 4-LUT uses 28 bytes + memory for attributes
    • Object attributes are added/removed on demand
        • For example, no need for fanout information in most cases
    • Objects ordered in memory in a topological order
        • Improves runtime of iterative traversals
        • Makes the code much simpler
• Limitation
    • Each time the netlist is modified, it needs to be duplicated
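
A minimal sketch of a netlist with these properties is given below. It assumes a flat-array layout in which an object's ID is its position and every fanin ID is smaller than the object's own ID; the `Netlist` class, its field names, and the use of a 16-bit truth table per LUT are illustrative assumptions and do not reproduce the 28-byte layout mentioned above.

```python
from array import array

class Netlist:
    """Objects stored in flat arrays, in topological order."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.fanin_start = array("i", [0] * (num_inputs + 1))  # offsets into fanins
        self.fanins = array("i")
        self.truth = array("H", [0] * num_inputs)  # 16-bit truth table per object
        self.attrs = {}                            # optional attributes, on demand

    def __len__(self):
        return len(self.truth)

    def add_lut(self, fanins, truth):
        obj = len(self)
        assert all(f < obj for f in fanins), "fanins must precede the object"
        self.fanins.extend(fanins)
        self.fanin_start.append(len(self.fanins))
        self.truth.append(truth)
        return obj

    def fanins_of(self, obj):
        return self.fanins[self.fanin_start[obj]:self.fanin_start[obj + 1]]

    def simulate(self, input_values):
        """One forward loop suffices because of the topological order."""
        values = list(input_values)
        for obj in range(self.num_inputs, len(self)):
            index = 0
            for k, f in enumerate(self.fanins_of(obj)):
                index |= values[f] << k
            values.append((self.truth[obj] >> index) & 1)
        return values

    def fanout_counts(self):
        """Fanout info is computed only when somebody asks for it."""
        if "fanout" not in self.attrs:
            counts = array("i", [0] * len(self))
            for f in self.fanins:
                counts[f] += 1
            self.attrs["fanout"] = counts
        return self.attrs["fanout"]

    def duplicate(self, roots):
        """Modification by duplication: copy only logic that feeds the roots."""
        used = [False] * len(self)
        for r in roots:
            used[r] = True
        for obj in range(len(self) - 1, self.num_inputs - 1, -1):
            if used[obj]:
                for f in self.fanins_of(obj):
                    used[f] = True
        new = Netlist(self.num_inputs)
        remap = {i: i for i in range(self.num_inputs)}
        for obj in range(self.num_inputs, len(self)):
            if used[obj]:
                new_fanins = [remap[f] for f in self.fanins_of(obj)]
                remap[obj] = new.add_lut(new_fanins, self.truth[obj])
        return new, [remap[r] for r in roots]

# Full adder over inputs ci=0, a=1, b=2, plus one dangling LUT.
n = Netlist(3)
s = n.add_lut([0, 1, 2], 0x96)      # 3-input XOR
dead = n.add_lut([1, 2], 0x8)       # a & b, not used by any output
co = n.add_lut([0, 1, 2], 0xE8)     # majority
m, (s2, co2) = n.duplicate([s, co]) # copy without the dangling LUT
```

The duplication step also shows the stated limitation: removing a single dangling object means rebuilding the arrays, since IDs are positions.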

(4) Integration of Application Packages
• Application packages interact with design database
• Logic information is extracted and inserted in the form of AIGs
• Synthesis & verification are performed by ABC working on these AIGs
Design database
Sequential synthesis
AIG rewriting
File / Code interface
Computing choices
Tech mapping
Retiming
Structuring for delay
Post-place resynthesis
Verification

(5) Customization
• The system should be easily customizable
    • The source code is the same for all users
    • Configuration files differ
• Currently, the user "owns" the following:
    • The library of primitives (a Verilog file)
    • Timing info for primitives (e.g. LUT pin delays)
    • Timing models used for calculating data for boxes, complex flops, wires, etc.

Experimental Setup
• Integrated Magic into an industrial FPGA synthesis flow
• Experimented with the full flow, including P&R
    • Did not use retiming
    • Did not use post-placement re-synthesis
• Verified by running Magic and in-house simulation tools
• Experimented with 20 designs, from 175K to 648K LUT4
• Two experimental runs:
    • "Reference" stands for the typical industrial flow without Magic
    • "Magic" stands for the new flow with Magic
[Figure: flow blocks Frontend (design entry, high-level synthesis, quick mapping), Magic (seq and comb synthesis, mapping, legalization), and Backend (placement, routing, design rule checking, etc.)]

Experimental Results
Column groups: Profile (PI, PO), Reference (LUT, FF, Lev, fMAX, Time), Magic (LUT, FF, Lev, fMAX, Time)
Circuits PI PO LUT FF Lev fMAX Time LUT FF Lev fMAX Time
C1 736 369 174972 113157 12 128.53 1.05 173561 100398 10 133.87 0.70
C2 150 67 187037 112991 18 91.32 0.53 161303 93930 16 95.69 0.67
C3 4 80 199097 53954 27 68.49 0.69 137126 36190 20 75.59 0.77
C4 517 253 206725 132416 11 105.37 1.31 197029 114745 8 129.20 0.67
C5 4 280 212124 64120 26 68.82 0.65 152799 49513 19 77.70 0.74
C6 803 258 255415 166644 11 113.25 2.08 255026 148445 8 123.00 1.00
C7 24 10 296152 133704 17 89.93 0.72 246908 114002 14 120.48 0.90
C8 124 58 323818 86712 32 40.68 1.99 346516 86662 25 47.08 1.94
C9 268 132 413017 195150 18 81.50 1.40 375481 174306 15 79.81 1.61
C10 205 94 439963 134139 20 63.17 3.55 445950 133575 15 69.06 2.64
C11 148 456 455429 160450 96 27.53 2.23 398428 149126 56 33.11 1.90
C12 4 3 455630 20277 6 66.67 0.78 152414 19446 6 100.40 0.41
C13 4 240 470436 230811 28 53.59 3.30 462010 225676 18 57.34 6.18
C14 218 69 522988 311436 17 68.78 1.83 448426 257996 15 69.40 2.19
C15 377 183 575355 351911 10 136.05 2.59 575672 349715 8 136.99 2.95
C16 73 33 599413 216051 4 202.02 1.07 599413 216051 4 209.21 1.79
C17 136 66 618377 259844 56 47.66 2.75 562367 243084 34 53.53 2.61
C18 136 66 621875 249327 27 45.68 4.60 606135 247825 27 52.58 4.03
C19 146 391 630918 275871 55 46.36 2.50 572834 259336 36 50.76 2.51
C20 135 32 648849 353940 7 127.71 2.45 645501 353616 5 136.43 2.91
Geomean - - 377883 150015 18.54 74.768 1.591 329751 135972 14.40 83.572 1.541
Ratio - - 1 1 1 1 1 0.873 0.906 0.777 1.118 0.969
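
The Geomean and Ratio rows can be checked with a short calculation. The sketch below recomputes the geometric mean of the Lev columns from the table and the Magic/Reference ratio for every metric from the Geomean row; the variable names are illustrative, and the reading of Ratio as "Magic geomean divided by Reference geomean" is inferred from the numbers in the table.

```python
from math import prod

def geomean(xs):
    return prod(xs) ** (1.0 / len(xs))

# Lev columns for C1..C20, copied from the table.
lev_reference = [12, 18, 27, 11, 26, 11, 17, 32, 18, 20,
                 96, 6, 28, 17, 10, 4, 56, 27, 55, 7]
lev_magic = [10, 16, 20, 8, 19, 8, 14, 25, 15, 15,
             56, 6, 18, 15, 8, 4, 34, 27, 36, 5]

lev_ratio = geomean(lev_magic) / geomean(lev_reference)

# Geomean row: (Reference, Magic) per metric.
geomean_row = {
    "LUT": (377883, 329751),
    "FF": (150015, 135972),
    "Lev": (18.54, 14.40),
    "fMAX": (74.768, 83.572),
    "Time": (1.591, 1.541),
}
ratios = {k: round(magic / ref, 3) for k, (ref, magic) in geomean_row.items()}
print(ratios)
```

Read this way, Magic reduces LUT count by about 12.7%, FF count by about 9.4%, and logic levels by about 22.3%, while fMAX improves by about 11.8% and the Time value drops by about 3.1%.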

Cumulative Improvement (retiming excluded)

Future Work
• Improve the integration
    • Simpler interfaces, better data consistency checking, etc.
• Improve application packages
    • AIG rewriting, tech-mapping, sequential synthesis, etc.
• Integrate logic and physical synthesis
    • Synthesis/mapping/retiming before placement
    • Retiming/restructuring after placement
• Extend to work for various technologies
    • Standard cells
    • Macro cells
    • LUT structures
    • LUT/MUX structures

Abstract
This talk is inspired by the recent experiences gained while developing an industrial-strength system for FPGA synthesis and mapping. First, we review the design representation with "industrial stuff", such as black and white boxes, complex flops, multiple clock domains, tristates, inouts, etc., and how to handle them in a tool whose primary strength is applying combinational synthesis and mapping. Next, we discuss several ideas for implementing a custom Verilog parser for hierarchical designs. Finally, we propose a low-memory netlist representation used to store the data and interface various optimization engines.