Optimality Study of Logic Synthesis for LUT-Based FPGAs

Jason Cong and Kirill Minkovich, VLSI CAD Lab, Computer Science Department, University of California, Los Angeles. Supported by Altera, Xilinx, and Magma under the California MICRO program.

Page 1: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Optimality Study of Logic Synthesis for LUT-Based FPGAs

Jason Cong and Kirill Minkovich

VLSI CAD Lab

Computer Science Department

University of California, Los Angeles

Supported by Altera, Xilinx, and Magma under the California MICRO program.

Page 2: Optimality Study of Logic Synthesis for LUT-Based FPGAs

UCLA VLSICAD LAB

Outline

Motivation and background
  • Current test cases hinted that the algorithms had little room for improvement.

LEKO: Logic synthesis Examples with Known Optimals
  • Creation, optimality, and results

LEKU: Logic synthesis Examples with Known Upper bounds
  • Creation and results

Conclusion

Page 3: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Goals of Paper

The goal was to test the optimality of two design steps in logic synthesis:
  • Technology Mapping
  • Logic Optimization combined with Technology Mapping

Definitions
  • Technology Mapping
  • Logic Optimization
  • Logic Synthesis = Logic Optimization + Technology Mapping

[Figure: three example networks, each computing an output f from inputs a, b, c, d, e]

Page 4: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Motivation

Logic synthesis is NP-hard in general
  • Combining logic optimization and mapping is much harder
  • Academic tools mostly focus on mapping

Problems with current test cases
  • How far from optimal?
  • Logic optimization?
  • Decrease in FPGA synthesis papers
  • Suggests fewer improvements are possible

Why there is a need for new ones
  • Test specific properties of logic synthesis tools
  • LEKO & LEKU

Page 5: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Construction Overview (LEKO)

First create a small “core” graph, G5, with a known optimal mapping (and possibly a logic synthesis) solution.

G5 has to have the following properties:

1. 5 inputs (x1, x2, …, x5)
2. 5 outputs (y1, y2, …, y5)
3. yi = f(x1, x2, …, x5)
4. Internal nodes have exactly two inputs.
5. The optimal (in terms of area/depth) mapping of G5 into a 4-LUT mapping solution has only 4-LUTs (no 3-LUTs or 2-LUTs).

Why these properties?
  • Simplest G5 for a 4-LUT architecture
  • Can be cascaded into larger structures

[Figure: G5 block with inputs x1 … x5 and outputs y1 … y5]

Page 6: Optimality Study of Logic Synthesis for LUT-Based FPGAs

G5 – example (optimal: 7 4-LUTs)

[Figure: G5 example network with input nodes I1–I5, internal nodes N1–N24, and output nodes O1–O5; the legend distinguishes output, internal, and input nodes]

Node Values

O1 = N1 ∙ N10
O2 = N13 ∙ N14
O3 = N12 + N17
O4 = N13 ∙ N16
O5 = N9 + N11
N1 = I1 ∙ I2' + I2
N2 = N1 ∙ I3' + N1' ∙ I3
N3 = N1' ∙ N7'
N4 = N1 + N6
N5 = N3' + N4'
N6 = I2' + I5'
N7 = N1 ∙ N6
N8 = I3 ∙ I4' + I3' ∙ I4
N9 = N8 ∙ N2
N10 = N9 ∙ I5' + N9' ∙ I5
N11 = I5 ∙ N5
N12 = N18 ∙ I5
N13 = I1 + I1' ∙ I2
N14 = N15 + I2
N15 = N17 ∙ I5
N16 = N17 ∙ I5 + N17' ∙ I5'
N17 = N20 ∙ N19
N18 = N23' + N24'
N19 = N13 + N13' ∙ I3
N20 = I3 + I4
N21 = I2' + I5'
N22 = N13 ∙ N21
N23 = N13' ∙ N22'
N24 = N13 + N21
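The node equations on this slide can be evaluated directly. The following is a minimal Python sketch that transcribes them one-to-one (∙ as AND, + as OR, ' as complement). The function names `inv`, `g5`, and `truth_table` are illustrative and do not come from the slides, and the nodes are simply listed in an order where every node is defined before it is used.

```python
from itertools import product


def inv(x):
    """Complement of a single bit (0 or 1)."""
    return x ^ 1


def g5(i1, i2, i3, i4, i5):
    """Evaluate the example G5; returns (O1, O2, O3, O4, O5) as 0/1 ints."""
    # Nodes N1..N11 (these feed O1 and O5)
    n1 = (i1 & inv(i2)) | i2
    n6 = inv(i2) | inv(i5)
    n7 = n1 & n6
    n3 = inv(n1) & inv(n7)
    n4 = n1 | n6
    n5 = inv(n3) | inv(n4)
    n2 = (n1 & inv(i3)) | (inv(n1) & i3)
    n8 = (i3 & inv(i4)) | (inv(i3) & i4)
    n9 = n8 & n2
    n10 = (n9 & inv(i5)) | (inv(n9) & i5)
    n11 = i5 & n5
    # Nodes N12..N24 (these feed O2, O3 and O4)
    n13 = i1 | (inv(i1) & i2)
    n21 = inv(i2) | inv(i5)
    n22 = n13 & n21
    n23 = inv(n13) & inv(n22)
    n24 = n13 | n21
    n18 = inv(n23) | inv(n24)
    n12 = n18 & i5
    n20 = i3 | i4
    n19 = n13 | (inv(n13) & i3)
    n17 = n20 & n19
    n15 = n17 & i5
    n16 = (n17 & i5) | (inv(n17) & inv(i5))
    n14 = n15 | i2
    # Output nodes
    o1 = n1 & n10
    o2 = n13 & n14
    o3 = n12 | n17
    o4 = n13 & n16
    o5 = n9 | n11
    return (o1, o2, o3, o4, o5)


def truth_table():
    """Map every 5-bit input assignment to its 5-bit output."""
    return {bits: g5(*bits) for bits in product((0, 1), repeat=5)}
```

Calling `truth_table()` gives all 32 input/output rows, which is a convenient reference when checking whether a mapped netlist is still functionally equivalent to the original G5.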

Page 7: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Construction Overview (LEKO)

Algorithm Steps

1. Create a G5.
2. Duplicate it and connect the copies in such a way that there is a unique traversal of G5s from PO (primary output) to PI (primary input).

This creates a new graph with the following properties:
  • There exists a known optimal mapping solution.
  • This also provides a tight upper bound on the optimal logic synthesis solution.

By using different G5s we can construct different LEKO networks with a variety of properties.
  • G5 can have different mapping and logic synthesis solutions.
  • G5 can be based on realistic designs (multipliers, adders, etc.).
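The slides do not spell out the exact interconnection between G5 copies, so the following Python sketch shows only one plausible wiring, stated as an assumption: a two-layer, 25-input network in which the second-layer block i reads output i of every first-layer block. The names `cascade_two_layers` and `parity_block` are hypothetical, and `parity_block` is a toy stand-in for a real G5.

```python
from functools import reduce
from operator import xor

M = 5  # inputs and outputs per core block


def cascade_two_layers(block, inputs):
    """Wire 2*M copies of `block` into a 25-input, 25-output network."""
    assert len(inputs) == M * M
    # Layer 1: block j reads primary inputs j*M .. j*M + M - 1.
    layer1 = [block(inputs[j * M:(j + 1) * M]) for j in range(M)]
    # Assumed wiring (not taken from the slides): layer-2 block i reads
    # output i of every layer-1 block, so it sees one signal per block.
    layer2_inputs = [[layer1[j][i] for j in range(M)] for i in range(M)]
    layer2 = [block(bits) for bits in layer2_inputs]
    return [bit for outputs in layer2 for bit in outputs]


def parity_block(bits):
    """Toy stand-in for G5: every output is the XOR of all five inputs."""
    p = reduce(xor, bits)
    return [p] * M
```

With this wiring and the toy block, each of the 25 primary outputs equals the parity of all 25 primary inputs, which shows how every output can come to depend on every input once the blocks are cascaded.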

Page 8: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Construction Examples (LEKO)

[Figure: example LEKO networks built from interconnected G5 blocks]

Page 9: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Optimality

Theorem: The optimal mapping solution of an arbitrarily sized LEKO circuit without logic optimization is achieved when every G5 in the circuit is mapped optimally without overlapping any other G5.

Proof Idea: A LUT spanning two layers will not reduce the area of the solution. This can be easily shown by looking at what would happen to the G5 at layer i and at layer i+1.

The complete proof is in the paper.

[Figure: two G5 blocks, one at layer i and one at layer i+1, with a 3-LUT and a 4-LUT marked]

Page 10: Optimality Study of Logic Synthesis for LUT-Based FPGAs

LEKO Examples

LEKO – Logic synthesis Examples with Known Optimals

Naming
  • G25 has 25 inputs and 25 outputs
  • Gx has x inputs and x outputs

Tools tested
  • Altera’s Quartus 5.0, Xilinx’s ISE 7.1i, UCLA’s DAOmap, and Berkeley’s ABC
  • 4-LUT architecture
  • Area optimization only (NP-hard)

Circuit        # Nodes   Depth   # I/O   Optimal # LUTs   Optimal Depth
LEKO(G25)      305       13      50      70               4
LEKO(G125)     2,350     20      225     525              6
LEKO(G625)     15,875    27      1250    3,500            8
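The optimal LUT counts and depths in this table are consistent with a simple layered structure. The Python sketch below assumes, as an inference from the numbers rather than something the slides state, that the circuit with 5^k inputs consists of k layers of 5^(k-1) G5 copies, and that each G5 maps optimally to 7 4-LUTs with depth 2. The function name `leko_optimum` is illustrative.

```python
G5_LUTS = 7   # optimal 4-LUT count of the example G5 (from the G5 example slide)
G5_DEPTH = 2  # assumed depth of that optimal mapping, inferred from the table


def leko_optimum(k):
    """Optimal (LUT count, depth) for the LEKO circuit with 5**k inputs.

    Assumes k layers of 5**(k - 1) G5 copies, each mapped optimally
    and independently, as in the optimality theorem.
    """
    blocks = k * 5 ** (k - 1)
    return blocks * G5_LUTS, k * G5_DEPTH
```

Under these assumptions, `leko_optimum(2)`, `leko_optimum(3)`, and `leko_optimum(4)` reproduce the Optimal columns for G25, G125, and G625.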

Page 11: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Results (LEKO)

Only mapping was needed to produce optimal results.

What do these mean?
  • Scaled fairly well
  • Average gap = 15%

Why Quartus and ISE did so well
  • Performed extra non-mapping steps

Circuit               DAOmap   ABC     Quartus   ISE     Optimal
LEKO(G25)    Area     83       80      72        80      70
             Ratio    1.19     1.14    1.03      1.14    1
LEKO(G125)   Area     650      609     561       588     525
             Ratio    1.24     1.16    1.07      1.12    1
LEKO(G625)   Area     4,435    4,072   3,737     3,974   3,500
             Ratio    1.27     1.16    1.07      1.14    1
Average Ratio         1.23     1.16    1.05      1.13    1
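The Ratio rows are each tool's area divided by the optimal area, and the Average Ratio row is the mean of those ratios over the three circuits. A short Python sketch that recomputes them from the Area rows follows; the dictionary layout and the function names `ratios` and `average_ratios` are illustrative choices, not part of the original study.

```python
# Area (number of 4-LUTs) per tool, copied from the table above.
AREAS = {
    "G25": {"DAOmap": 83, "ABC": 80, "Quartus": 72, "ISE": 80, "Optimal": 70},
    "G125": {"DAOmap": 650, "ABC": 609, "Quartus": 561, "ISE": 588, "Optimal": 525},
    "G625": {"DAOmap": 4435, "ABC": 4072, "Quartus": 3737, "ISE": 3974, "Optimal": 3500},
}


def ratios(row):
    """Area of each tool divided by the known optimal area."""
    optimal = row["Optimal"]
    return {tool: area / optimal for tool, area in row.items() if tool != "Optimal"}


def average_ratios(areas):
    """Mean ratio per tool across all circuits."""
    per_circuit = [ratios(row) for row in areas.values()]
    tools = per_circuit[0].keys()
    return {t: sum(r[t] for r in per_circuit) / len(per_circuit) for t in tools}
```

Rounding the values returned by `average_ratios(AREAS)` to two decimals gives the Average Ratio row of the table.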

Page 12: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Creating LEKU

LEKU – Logic synthesis Examples with Known Upper bounds

Constructed from LEKO G25 (25 inputs and 25 outputs)
  • Collapse, then decompose the graph
  • Creates a much larger graph that is logically equivalent to the original
  • LEKU-CD – collapsed, then decomposed into AND/OR gates
  • LEKU-CB – collapsed, then balanced

LEKU-CD’
  • LEKU-CD was too large for Xilinx as a single input
  • Split LEKU-CD into 25 separate designs, one for each PO

                                             Upper Bound on Optimal
Circuit         # Nodes     Depth   # I/O    # LUTs   Depth
LEKU-CD(G25)    1,166,655   19      50       70       4
LEKU-CB(G25)    814         16      50       70       4
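To give a feel for why collapsing and then decomposing inflates the node count, here is a small Python sketch. It is not the procedure used to build LEKU-CD, which the slides do not detail. It assumes the simplest possible collapse (a canonical sum of minterms) and a naive decomposition into 2-input AND/OR gates, with inverters not counted. All function names are hypothetical.

```python
from itertools import product


def collapse_to_minterms(f, n):
    """Collapse an n-input function into a flat list of its minterms."""
    return [bits for bits in product((0, 1), repeat=n) if f(*bits)]


def two_input_gate_count(minterms, n):
    """Count 2-input AND/OR gates in a naive decomposition of the sum of minterms."""
    if not minterms:
        return 0
    and_gates = len(minterms) * (n - 1)  # one chain of n-1 ANDs per minterm
    or_gates = len(minterms) - 1         # chain of ORs joining the minterms
    return and_gates + or_gates


def parity5(a, b, c, d, e):
    """5-input parity: only four 2-input XOR gates as a multi-level network."""
    return a ^ b ^ c ^ d ^ e
```

For `parity5`, the collapsed form has 16 minterms and the naive decomposition needs 79 two-input AND/OR gates, against four XOR gates in the multi-level form. The function is unchanged, but a synthesis tool now has to rediscover the compact structure.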

Page 13: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Results on LEKU

Logic optimization and mapping were both needed.
  • Academic tools were allowed to use preprocessing tools

What does this mean?
  • There exist designs on which these tools perform very badly
  • Average gap = 171x
  • Suggests that all of these tools lack global minimization heuristics

Circuit                            DAOmap   ABC      Quartus   ISE     Upper Bound
LEKU-CD(G25)     Area              22,717   30,511   10,381    *       70
                 Ratio             325      436      148       *       1
LEKU-CD(G25)’    Area              25,247   35,271   5,005     9,717   70
                 Ratio             361      504      72        139     1
LEKU-CB(G25)     Area              322      191      239       280     70
                 Ratio             4.6      2.7      3.4       4       1
Average Ratio (last 2 designs)     183      255      38        72      1
Average Ratio (ALL)                230      314      74        *       1

* No result: LEKU-CD was too large for Xilinx as a single input.

Page 14: Optimality Study of Logic Synthesis for LUT-Based FPGAs

LEKO/LEKU vs Real Designs

Limitations
  • Whole circuit is combinational logic
  • Contain highly repeated structures in the original circuits
  • Doesn’t mean tools are 70x away from optimal on real designs

Different uses than real designs

LEKO
  • Tests the mapping phase of the algorithm
  • Tools perform well on current LEKO benchmarks
  • Will construct larger core graphs → worse results?

LEKU
  • Tests the logic optimization phase of the algorithm
  • Ability to reproduce the original structure
  • Duplication removal
  • Logic identification
  • Other global heuristics

Page 15: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Conclusions

LEKO
  • Only circuits that test optimality of technology mapping
  • Have an optimal mapping solution

LEKU
  • Test global area-minimizing heuristics
  • Have a very tight upper bound on the optimal solution

These circuits address a need for specific method testing.

Current state of technology
  • Technology Mapping: current tools do very well
  • Overall Logic Synthesis: current tools just can’t produce good solutions that require global minimization heuristics

Page 16: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Conclusions (continued)

Download every test case mentioned here
  • http://cadlab.cs.ucla.edu/
  • Click on “Optimality Study”
  • Click on “LEKO/LEKU”

Harder and larger LEKO and LEKU circuits will be posted soon!

Check out the article in EE Times
  • Just search EE Times for “kirill”
  • Thank you EE Times for your interest!
  • http://eetimes.com/showArticle.jhtml?articleID=180204087

Questions?

Page 18: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Additional Slides

Page 19: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Construction Algorithm (LEKO)

Page 20: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Variations

LEKO
  • Using larger core graphs to create more complex designs
  • Using commonly used cells as the core graphs
  • Using a collection of core graphs

LEKU
  • Using LEKO and adding in specific things to test
      • Duplicating some specific parts
      • Adding wires that will be removed when don’t-cares are computed

Page 21: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Interesting New Results

After seeing the results, we got several responses.

ABC
  • Repeating
      map 4-LUTs → don’t-care calculation
    led to a 3x improvement on the largest LEKU example

DAOmap
  • Multiple iterations of
      map 5-LUTs → simplify → map 4-LUTs
    showed similar improvements on the LEKU examples

Altera
  • For the LEKO, the following
      map 5-LUT → map 4-LUT
    was able to achieve near-optimal solutions
  • This result wouldn’t extend if we used a larger G5

Page 22: Optimality Study of Logic Synthesis for LUT-Based FPGAs

Different G5s

Assuming a K-LUT, G5 has to have the following properties:

1. It has m inputs and m outputs.
2. Every output is a function of all five inputs.
3. Each internal node of G5 has exactly two inputs.
4. There exists an optimal (in terms of area/depth) mapping of G5 into a K-LUT mapping solution, denoted M5, such that M5 only has K-LUTs.

Where m ≥ K + 1

The larger the m, the harder the G5 is to map.
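Two of these properties can be checked mechanically for a candidate core graph. The Python sketch below reads property 2 as "every output depends on every one of the m inputs", which is an assumption when m is not 5, and checks it by exhaustive simulation. It also checks the size rule m ≥ K + 1. The function names and the two toy blocks are hypothetical.

```python
from functools import reduce
from itertools import product
from operator import xor


def depends_on_all_inputs(block, m):
    """True if flipping each input changes each output for some assignment."""
    n_out = len(block(*([0] * m)))
    sensitive = [[False] * m for _ in range(n_out)]
    for bits in product((0, 1), repeat=m):
        base = block(*bits)
        for i in range(m):
            flipped = list(bits)
            flipped[i] ^= 1
            other = block(*flipped)
            for o in range(n_out):
                if base[o] != other[o]:
                    sensitive[o][i] = True
    return all(all(row) for row in sensitive)


def satisfies_size_rule(m, k):
    """Check the stated requirement m >= K + 1."""
    return m >= k + 1


def parity_block(*bits):
    """Toy block: every output is the XOR of all inputs."""
    return (reduce(xor, bits),) * len(bits)


def identity_block(*bits):
    """Toy block: output i copies input i, so it fails property 2."""
    return tuple(bits)
```

Exhaustive simulation costs 2^m evaluations per input, which is fine for the small m used in core graphs but would not scale to a full LEKO circuit.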