Iterative Layering: Optimizing Arithmetic Circuits by Structuring the Information Flow Ajay K. Verma 1 , Philip Brisk 2 , Paolo Ienne 1 International Conference on Computer-Aided Design November 5, 2009 1 Processor Architecture Laboratory School of Computer and Communication Sciences Ecole Polytechnique Fédérale de Lausanne (EPFL) 2 Department of Computer Science and Engineering Bourns College of Engineering University of California, Riverside

Page 1: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Iterative Layering: Optimizing Arithmetic Circuits by Structuring the Information Flow

Ajay K. Verma¹, Philip Brisk², Paolo Ienne¹

International Conference on Computer-Aided Design

November 5, 2009

¹ Processor Architecture Laboratory, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)

² Department of Computer Science and Engineering, Bourns College of Engineering, University of California, Riverside

Page 2: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Logic Optimization Strategies

[Figures: Ripple-Carry Adder; Carry-Lookahead Adder]

• Logic synthesis tools
  – Local optimization via Boolean minimization

• Architectural transformation
  – Not with “traditional” logic synthesis

Page 3: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Leading Zero Detector

Naïve Implementation

Optimized Implementation

16% faster, 8% smaller [Oklobdzija, TVLSI 1994]

Page 4: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Outline

• Decomposition Techniques

• Progressive Decomposition and its Shortcomings
  – [Verma et al., DAC 2007]

• Iterative Layering Algorithm

• Experimental Results

• Conclusion

Page 6: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Decomposition

Page 7: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Decomposition

• Optimize the red block locally
• Recursively decompose the remaining circuit

Page 8: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Decomposition

• Input condensation
  – At each step, fewer input bits remain
  – Imposes hierarchy on the circuit

Page 9: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Decomposition

• The result is a well-structured hierarchical circuit

Page 10: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Disjoint Decomposition

Non-disjoint Decomposition

Page 11: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Disjoint Decomposition Example: 8:4 Parallel Counter

[Figure: 8:4 parallel counter; one block is labeled “(Full Adder)” with outputs s and c]

Page 12: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

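4x4-bit Multiplier

Partial products:

y0x0  y1x0  y2x0  y3x0
y0x1  y1x1  y2x1  y3x1
y0x2  y1x2  y2x2  y3x2
y0x3  y1x3  y2x3  y3x3

[Figure: X and Y (4 bits each) enter the partial product generator (PPG), which emits 16 bits; the Σ block sums them to produce X × Y]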

Page 14: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

4x4-bit Multiplier

[Figure: X and Y (4 bits each) enter the partial product generator (PPG), which emits 16 bits; the Σ block sums them to produce X × Y]

Partial product reduction tree has a disjoint decomposition

The partial product generator requires a non-disjoint decomposition
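The two blocks of the multiplier can be sketched in software as follows. This is only an illustrative model, not the authors' circuit: the function names `partial_products` and `reduce_partial_products` are hypothetical, and the reduction is done with ordinary integer addition rather than a carry-save tree. It does show that every bit x_i feeds several partial-product bits, which is one way to read why the PPG blocks share inputs.

```python
def partial_products(x: int, y: int, n: int = 4) -> list[list[int]]:
    """Row i holds the bits y_j AND x_i for j = 0..n-1 (LSB first)."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    # Each x_i is reused in n different AND gates, so the gates share inputs.
    return [[yb[j] & xb[i] for j in range(n)] for i in range(n)]


def reduce_partial_products(pp: list[list[int]]) -> int:
    """Sum the partial products; bit (i, j) carries weight 2**(i + j)."""
    return sum(bit << (i + j)
               for i, row in enumerate(pp)
               for j, bit in enumerate(row))


def multiply(x: int, y: int, n: int = 4) -> int:
    return reduce_partial_products(partial_products(x, y, n))


if __name__ == "__main__":
    print(multiply(13, 11))
```

For n = 4 the generator produces 4 × 4 = 16 bits, matching the "16 bits" label in the figure.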

Page 15: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Compound Circuits

[Figure: two versions of a compound arithmetic circuit (g72x) with inputs M1, M2, E1, E2; block labels include sign, neg, not, and, xor, and out]

12% faster, 55% larger

Page 16: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Outline

• Decomposition Techniques

• Progressive Decomposition and its Shortcomings
  – [Verma et al., DAC 2007]

• Iterative Layering Algorithm

• Experimental Results

• Conclusion

Page 17: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Progressive Decomposition [Verma et al., DAC 2007]

• Successfully structured some arithmetic circuits
  – Ripple-carry adder → Inferred parallel prefix adder
  – 3-input ripple-carry adder → Inferred carry-save adder
  – Leading zero detector → Inferred design of [Oklobdzija 1994]
  – Various counters, majority functions, etc. → Inferred carry-free structures based on carry-save addition

Page 18: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Progressive Decomposition [Verma et al., DAC 2007]

• Disjoint decomposition
  – Forget about multipliers
  – Cannot handle compound arithmetic circuits

• Entire algorithm based on Reed-Muller form
  – Rewrite ‘your’ optimizer, e.g., if you use AIGs or BDDs
  – Exponential size for leading one detector
    • Leading zero detector remains polynomial

Page 19: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Outline

• Decomposition Techniques

• Progressive Decomposition and its Shortcomings
  – [Verma et al., DAC 2007]

• Iterative Layering Algorithm

• Experimental Results

• Conclusion

Page 20: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Iterative Layering

• Non-disjoint decomposition
  – Yields disjoint decompositions when appropriate

• Not tied to any specific circuit representation
  – Our implementation uses BDDs

• SAT-based functional dependence test [Lee et al., ICCAD 2007]
  – Requires efficient conversion to CNF
  – Functional dependence is inherent to any decomposition

Page 21: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Iterative Layering Outline

• Bricks
  – Definition and algorithmic overview
  – Evaluation metrics

• Brick Enumeration
  – Cofactor enumeration
  – Generate bricks from cofactors

• Brick Selection
  – Problem formulation related to Set Cover

Page 23: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Bricks

• A subcircuit with ≤ k inputs and one output
  – Any functional dependence may exist between a brick and the original expression
  – Kernels and co-kernels are bricks
    • The dependence is disjunctive by definition

Example:
E = ac + ad + bc + bd   (7 gates)
Brick: p = a + b        (1 gate)
E = pc + pd             (4 gates)
E = p(c + d)            (3 gates)
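The brick example can be checked exhaustively. The sketch below is an illustration only (the function names are made up here): it confirms over all 16 input assignments that rewriting E through the brick p = a + b preserves the function.

```python
from itertools import product


def e_flat(a: int, b: int, c: int, d: int) -> int:
    """E = ac + ad + bc + bd, with + as OR and juxtaposition as AND."""
    return (a & c) | (a & d) | (b & c) | (b & d)


def brick_p(a: int, b: int) -> int:
    """The brick p = a + b."""
    return a | b


def e_layered(a: int, b: int, c: int, d: int) -> int:
    """E = p(c + d), built on top of the brick."""
    return brick_p(a, b) & (c | d)


def equivalent() -> bool:
    """Compare both forms on every assignment of (a, b, c, d)."""
    return all(e_flat(*v) == e_layered(*v)
               for v in product((0, 1), repeat=4))


if __name__ == "__main__":
    print(equivalent())
```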

Page 24: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Iterative Layering Algorithm

• Enumerate all bricks having ≤ k inputs
  – k = 6 in our implementation

• Evaluate all bricks based on a merit function

• Select a subset of bricks
  – The subset must contain all of the information about the circuit
  – The subset should be optimal w.r.t. some optimization criteria

• The selected bricks form a “layer”

• Stack layers on top of one another to structure the circuit

Page 25: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Information Fitness

• Estimated gate reduction
  – Size of BDD of input expression [Macii et al., GLS-VLSI 1999]

[Figure: f re-expressed through g and the brick p]

Info. Fitness = Size(BDD_f) / (Size(BDD_g) + Size(BDD_p))
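To make the ratio concrete, here is a small sketch that counts the internal nodes of a reduced ordered BDD directly from a truth table and then evaluates the fitness formula. Several things are assumptions for illustration: the helper names, the fixed variable order (first argument at the top), and the choice to count only internal nodes as the BDD size. The authors' implementation uses a real BDD package and may measure size differently.

```python
from itertools import product


def truth_table(fn, n: int) -> tuple:
    """Truth table of fn; the first argument is the most significant variable."""
    return tuple(fn(*bits) for bits in product((0, 1), repeat=n))


def bdd_size(tt: tuple) -> int:
    """Internal-node count of the reduced ordered BDD for truth table tt."""
    nodes = set()

    def visit(sub: tuple) -> None:
        if len(sub) == 1:          # terminal 0 or 1
            return
        half = len(sub) // 2
        lo, hi = sub[:half], sub[half:]
        if lo == hi:               # top variable is irrelevant: no node here
            visit(lo)
        elif sub not in nodes:     # identical subfunctions share one node
            nodes.add(sub)
            visit(lo)
            visit(hi)

    visit(tt)
    return len(nodes)


def info_fitness(f_tt: tuple, g_tt: tuple, p_tt: tuple) -> float:
    return bdd_size(f_tt) / (bdd_size(g_tt) + bdd_size(p_tt))


# f = (a + b)(c + d), brick p = a + b, remainder g(p, c, d) = p(c + d)
f_tt = truth_table(lambda a, b, c, d: (a | b) & (c | d), 4)
p_tt = truth_table(lambda a, b: a | b, 2)
g_tt = truth_table(lambda p, c, d: p & (c | d), 3)

if __name__ == "__main__":
    print(bdd_size(f_tt), bdd_size(g_tt), bdd_size(p_tt),
          info_fitness(f_tt, g_tt, p_tt))
```

BDD size depends on the variable order, so the numbers from this sketch are only meaningful for the order chosen here.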

Page 26: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Information Coverage

E: expression to optimize
p: brick under consideration

D = on-set(E) × off-set(E)
N = {(x, y) ∈ D | p(x) ≠ p(y)}

Info. Coverage = |N| / |D|

Intuition:
• Attempt to quantify the functional dependency from p to E

Limitation:
• Requires a completely specified truth table
  – Size is exponential in the number of inputs

Our Approach:
• Randomly sample the truth table of E
• Theorem 1 in the paper includes some probabilistic justification

Page 27: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Iterative Layering: Outline

• Bricks
  – Definition and algorithmic overview
  – Evaluation metrics

• Brick Enumeration
  – Cofactor enumeration
  – Generate bricks from cofactors

• Brick Selection
  – Problem formulation related to Set Cover

Page 28: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Brute Force Cofactor Enumeration

Enumerate every combination of k input bits

E = ab ⊕ cd ⊕ (a ⊕ b)(c ⊕ d)
B = {a, b, c}, R = {d}

Enumerate the set of cofactors with respect to R

S = {E|d=0, E|d=1} = {ab ⊕ bc ⊕ ac, ab ⊕ bc ⊕ ac ⊕ a ⊕ b ⊕ c}

Problem: |S| can be as large as 2^|R|
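The brute-force enumeration can be sketched as below. The operators in the slide's example were lost in extraction; this sketch assumes they are XORs, which reproduces the two cofactors listed for R = {d}. The function name `cofactors` and the convention that bound variables come first in the argument list are hypothetical.

```python
from itertools import product


def E(a: int, b: int, c: int, d: int) -> int:
    """Assumed reading of the slide: E = ab ^ cd ^ (a ^ b)(c ^ d)."""
    return (a & b) ^ (c & d) ^ ((a ^ b) & (c ^ d))


def cofactors(fn, n_bound: int, n_rest: int) -> dict:
    """Map each distinct cofactor (a truth table over the bound variables)
    to the assignments of the remaining variables that produce it."""
    seen: dict = {}
    for r in product((0, 1), repeat=n_rest):          # 2**n_rest iterations
        tt = tuple(fn(*b, *r) for b in product((0, 1), repeat=n_bound))
        seen.setdefault(tt, []).append(r)
    return seen


if __name__ == "__main__":
    for tt, rs in cofactors(E, n_bound=3, n_rest=1).items():
        print(rs, tt)
```

The outer loop visits every assignment of R, which is the exponential cost noted in the slide.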

Page 29: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Cofactor Enumeration via Sampling and SAT-based Functional Dependence Testing

1. Generate an initial set of cofactors using random sampling

2. Test if E depends on the cofactors and any remaining variables [Lee et al., ICCAD 2007]
   • SAT = FALSE implies a full dependence
   • SAT = TRUE implies a partial dependence
     – Satisfying assignment of input variables yields one missing cofactor

3. Repeat Step 2 until SAT = FALSE
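The three steps above can be sketched as a loop. Important caveat: this illustration replaces the SAT-based test of [Lee et al., ICCAD 2007] with an exhaustive search, so it shows the control flow only and has none of the scalability of the real method. All names (`cofactor`, `find_missing`, `enumerate_cofactors`) are hypothetical, and the example function assumes the XOR reading of the slide's expression.

```python
import random
from itertools import combinations, product


def cofactor(fn, n_bound: int, r: tuple) -> tuple:
    """Truth table of fn over the bound variables with the rest fixed to r."""
    return tuple(fn(*b, *r) for b in product((0, 1), repeat=n_bound))


def find_missing(fn, n_bound: int, n_rest: int, known: set):
    """Brute-force stand-in for the SAT query. Look for an assignment r and two
    bound assignments that every known cofactor maps to the same value but that
    fn distinguishes under r. Such an r plays the role of a satisfying
    assignment; None corresponds to SAT = FALSE."""
    for r in product((0, 1), repeat=n_rest):
        tt = cofactor(fn, n_bound, r)
        for i, j in combinations(range(2 ** n_bound), 2):
            if tt[i] != tt[j] and all(k[i] == k[j] for k in known):
                return r
    return None


def enumerate_cofactors(fn, n_bound: int, n_rest: int,
                        n_samples: int = 1, seed: int = 0) -> set:
    rng = random.Random(seed)
    known = set()
    for _ in range(n_samples):                       # Step 1: random sampling
        r = tuple(rng.randint(0, 1) for _ in range(n_rest))
        known.add(cofactor(fn, n_bound, r))
    # Steps 2-3: each witness yields a cofactor not yet in `known`.
    while (r := find_missing(fn, n_bound, n_rest, known)) is not None:
        known.add(cofactor(fn, n_bound, r))
    return known


def E(a, b, c, d):
    return (a & b) ^ (c & d) ^ ((a ^ b) & (c ^ d))


if __name__ == "__main__":
    print(enumerate_cofactors(E, n_bound=3, n_rest=1))
```

The loop stops as soon as the known cofactors are sufficient for the dependence to hold, so it may return fewer cofactors than a full enumeration would.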

Page 30: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Brick Computation: Summary

For every combination of at most k input bits:

• Generate the cofactors of the remaining bits
  – Random sampling + SAT-based functional dependence testing

• Discard useless cofactors
  – Details are in the paper

• Recursively apply iterative layering with a smaller value of k to generate the bricks from the cofactors

That’s a lot of bricks!
• Which bricks do I really need?

Page 31: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Iterative Layering: Outline

• Bricks
  – Definition and algorithmic overview
  – Evaluation metrics

• Brick Enumeration
  – Cofactor enumeration
  – Generate bricks from cofactors

• Brick Selection
  – Problem formulation related to Set Cover

Page 32: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Brick Selection: Overview

Goal: Find a minimal set of bricks that covers all points in on-set(E) × off-set(E)

• Greedy heuristic based on [Johnson, JCSS 1974]
  – Select a brick that maximizes Info.Fitness × Info.Coverage
  – Update Info.Fitness and Info.Coverage for the remaining bricks
  – Stop when E is functionally dependent on the chosen bricks [Lee et al., ICCAD 2007]

• See the paper for details on the data structures used

Page 33: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Outline

• Decomposition Techniques

• Progressive Decomposition and its Shortcomings
  – [Verma et al., DAC 2007]

• Iterative Layering Algorithm

• Experimental Results

• Conclusion

Page 34: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Experimental Setup

[Figure: experimental flow with four numbered design flows]

• Design flows: Circuit written by hand; Known Arithmetic Circuits; Progressive Decomposition [Verma et al., DAC 2007]; Iterative Layering

• Synthesis: Synopsys Design Compiler
  – compile_ultra
  – minimize delay

• Technology: Artisan Standard Cells, UMC (90 nm)

Page 35: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Critical Path Delay

[Figure: bar chart of critical path delay in ns (axis from 0 to 1.4).
Series: Original; Progressive Decomposition; Iterative Layering; Library/Manual Implementation.
Arithmetic Components: 16-bit ADD, 12-bit 3ADD, 8x8-bit MUL, 9x9-bit MUL, 10x10-bit MUL, MAX84, 16-bit SHIFT.
MCNC Circuits: 9symml, sqrt8, rd84, rd73, t481.
Compound Arithmetic Circuits: adpcm encoder, G.721 (fmult), SAD.
Annotations: “Optimized for Area, Not Delay”; “Progressive Decomposition Fails”.]

Page 36: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Area

[Figure: bar chart of area in μm² (axis from 0 to 14000).
Series: Original; Progressive Decomposition; Iterative Layering; Library/Manual Implementation.
Arithmetic Components: 16-bit ADD, 12-bit 3ADD, 8x8-bit MUL, 9x9-bit MUL, 10x10-bit MUL, MAX84, 16-bit SHIFT.
MCNC Circuits: 9symml, sqrt8, rd84, rd73, t481.
Compound Arithmetic Circuits: adpcm encoder, G.721 (fmult), SAD.
Annotations: “Optimized for Area, Not Delay”; “Progressive Decomposition Fails”.]

Page 37: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

n-bit, k-input MAX Function

Pairwise Comparison of Inputs
  – ½k(k − 1) comparators
  – Delay: O(log n + log k)
  – Area: O(k²n)
  – 0.21 ns, 3479 μm²

Binary Tree of Comparators
  – k − 1 comparators
  – Delay: O(log n · log k)
  – Area: O(kn)
  – 0.46 ns, 1755 μm²

Iterative Layering
  – 0.22 ns, 1331 μm²

(Circuit structure was unknown to us)
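The two reference structures can be modeled in software to see where the comparator counts come from. This is a behavioral sketch only: the function names are hypothetical, inputs are assumed non-empty, ties go to the lower index, and nothing here models the circuit that Iterative Layering produced.

```python
def max_pairwise(values: list[int]) -> tuple[int, int]:
    """Compare every unordered pair once; the winner beats all other inputs.
    Returns (maximum, number of comparators used)."""
    k = len(values)
    wins = [[True] * k for _ in range(k)]
    comparisons = 0
    for i in range(k):
        for j in range(i + 1, k):
            i_wins = values[i] >= values[j]   # one comparator per pair
            comparisons += 1
            wins[i][j] = i_wins
            wins[j][i] = not i_wins
    winner = next(i for i in range(k) if all(wins[i]))
    return values[winner], comparisons


def max_tree(values: list[int]) -> tuple[int, int]:
    """Reduce the inputs level by level with two-input comparators.
    Returns (maximum, number of comparators used)."""
    level = list(values)
    comparisons = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] if level[i] >= level[i + 1] else level[i + 1])
            comparisons += 1
        if len(level) % 2:                    # odd element passes through
            nxt.append(level[-1])
        level = nxt
    return level[0], comparisons


if __name__ == "__main__":
    data = [22, 59, 62, 61]
    print(max_pairwise(data), max_tree(data))
```

In the pairwise version all comparisons are independent and can run in parallel, while the tree version chains log k levels of comparators, which matches the delay and area trade-off listed above.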

Page 38: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

8-bit, 4-input MAX Example

[Figure: three-stage worked example with labels “Integer”, “Domination Table”, and “Count Leading 1’s”.
Stage 1 values: (22) 0010110, (59) 0111011, (62) 0111110, (61) 0111101.
Stage 2 values: (1) 001, (4) 100, (6) 110, (5) 101.
Stage 3 values: (0) 00, (1) 01, (2) 10, (1) 01.
Annotation: “Replace any all-zero column with ones!” The final output is labeled MAX.]

Page 39: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Outline

• Decomposition Techniques

• Progressive Decomposition and its Shortcomings
  – [Verma et al., DAC 2007]

• Iterative Layering Algorithm

• Experimental Results

• Conclusion

Page 40: Iterative Layering:  Optimizing Arithmetic Circuits by Structuring the Information Flow

Conclusion

• Iterative Layering structures arithmetic circuits
  – Automatically infers well-known manual designs from the arithmetic literature
  – Fixes shortcomings of Progressive Decomposition
    • Non-disjoint decomposition
    • Usable with any circuit representation

[Figure: circuit classes covered by PD (Progressive Decomposition) and IL (Iterative Layering): ADD, 3-ADD, LZD, MUL, SHFT, MAX, Compound Arithmetic Circuits]