Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

Tiberiu Chelcea, Girish Venkataramani, Seth C. Goldstein, Department of Computer Science, Carnegie Mellon University


Page 1: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

1

Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

Tiberiu Chelcea, Girish Venkataramani, Seth C. Goldstein

Department of Computer Science, Carnegie Mellon University

Page 2: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

2

QDI: Orphans problem

• Early propagation:
  – “A” arrives early => Z transitions
  – Stale values on the other signals

• Incorrect behavior: inputs acknowledged before being received

[Figure: dual-rail circuit with signals A0/A1, B0/B1, C0/C1, D0/D1, X0/X1, Y0/Y1, Z0/Z1]

Page 3: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

3

NCL-X solution

Add completion detection

[Figure: the dual-rail circuit (signals A0/A1, B0/B1, C0/C1, D0/D1, X0/X1, Y0/Y1, Z0/Z1; gates N1, N2, N3) with labels DoneC and DoneA]

Page 4: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

4

QDI Gate Delays

QDI implementations always assume the worst: equal probability for any gate delay

Page 5: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

5

Motivation

• Quasi-Delay Insensitive (QDI) circuits:
  – One timing constraint
  – Naturally tolerate parametric variation, but…

• Have large area overheads
  – Added completion detection for correctness

[Figure: bar chart (0% to 100%) for add_bk_32, lsr16, and C880; legend: gates, cd]

Page 6: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

6

Parametric Variation and Gate Delays

Goal: pay only what is necessary

ITRS’05: 35% parametric variation by 2020

Page 7: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

7

Goal: Optimizing Sync→Async Flow

• Use timing information to reduce size of completion detection

• Use mixed gates to further reduce area
  – w/ early propagation
  – w/o early propagation

[Figure: two bar charts (0% to 100%). First chart: NCL-X and Direct; legend: gates, cd. Second chart: NCL-X, Direct, and Exact; legend: gates, strict, cd, with callouts "regular gates" and "strict gates"]

Page 8: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

8

Contributions

Three new relative-timing area optimizations:

• Direct method:
  – Timing analysis + simple CD elimination

• Greedy method: fast but not optimal
  – Uses strict gates, but may increase area

• Exact method: optimal, but slow
  – Solves an mILP problem

[Figure: bar chart (scale 0.00 to 1.00): Direct 0.83, Greedy 0.55, Exact 0.43]

Page 9: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

9

Outline

• Timing analysis & Direct Optimization

• Greedy optimization method

• Exact optimization method

• Results

• Conclusions

Page 10: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

10

Basics

• QDI circuits:
  – Unbounded but finite delays on gates and wires
  – One timing assumption: isochronic fork

• Timed circuits:
  – Delays on gates and wires: bounded time intervals
  – Given input arrival times: compute propagation intervals for each gate and wire

Page 11: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

11

Timing Computation

• Conservative assumption: any input change can trigger an output change

[Figure: circuit with signals A, B, C, D, X, Y, Z and gates N1, N2, N3, annotated with timing intervals (0,0), (0.5,0.7), (0.6,0.8), (1.0,1.2), (1.1,1.2), (1.5,1.9), (3.0,4.0), (3.5,4.1), (3.6,4.9), and (2.0,5.6); the GlobalPI label appears next to (2.0,5.6)]
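The conservative rule on this slide can be sketched as interval arithmetic. This is a minimal sketch and not the authors' tool: the function name `regular_gate_interval` and the sample arrival and delay values are assumptions chosen for illustration. Because any input change may trigger the output, the earliest output time follows the earliest input and the latest output time follows the latest input.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (earliest time, latest time)


def regular_gate_interval(inputs: List[Interval], delay: Interval) -> Interval:
    """Propagation interval of a gate when any input change may fire the output.

    `inputs` holds the arrival interval of each input; `delay` is the
    (min, max) gate delay. Both are hypothetical inputs for this sketch.
    """
    earliest = min(lo for lo, _ in inputs) + delay[0]
    latest = max(hi for _, hi in inputs) + delay[1]
    return (earliest, latest)


# Hypothetical example: one input arrives at time 0, the other in (1.0, 1.5).
print(regular_gate_interval([(0.0, 0.0), (1.0, 1.5)], (0.5, 0.75)))  # (0.5, 2.25)
```

Applying the function gate by gate in topological order yields an interval for every gate and wire, which is the information the later optimizations consume.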

Page 12: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

12

Direct Optimization Method

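• A gate needs completion detection iff it may not be stable when the outputs are produced

• Under any input change, the gate is quiescent when the output is produced: 1.9 < 2.0 (the gate's latest transition time is earlier than the earliest output time)

[Figure: the same circuit (signals A, B, C, D, X, Y, Z; gates N1, N2, N3) annotated with intervals (1.0,1.2), (1.1,1.2), (1.5,1.9), (3.0,4.0), (3.5,4.1), (3.6,4.9), and (2.0,5.6), plus a signal labeled CDone]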
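The quiescence test behind the Direct method can be sketched as a single comparison. This is a hedged illustration: the names `needs_cd`, `gates_needing_cd`, `g_early`, and `g_late` are hypothetical, and the two gate intervals are picked from the figure's annotations only as sample values, without claiming which gate they belong to.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (earliest time, latest time)


def needs_cd(gate_pi: Interval, global_pi: Interval) -> bool:
    """A gate can skip completion detection only if its latest transition
    is strictly earlier than the earliest time an output can be produced."""
    return gate_pi[1] >= global_pi[0]


def gates_needing_cd(gate_pis: Dict[str, Interval], global_pi: Interval) -> List[str]:
    """Names of the gates that still need completion detection."""
    return sorted(name for name, pi in gate_pis.items() if needs_cd(pi, global_pi))


# Sample values: 1.9 < 2.0, so g_early is quiescent in time; g_late is not.
gate_pis = {"g_early": (1.5, 1.9), "g_late": (3.6, 4.9)}
print(gates_needing_cd(gate_pis, (2.0, 5.6)))  # ['g_late']
```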

Page 13: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

13

Strict Gates

• All inputs must arrive before producing an output

• Eliminate early propagation effect

- Extremely expensive
+ Decrease length of propagation interval

[Figure: strict gate schematic; labels A, B, C, C, C]
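Why a strict gate shortens the propagation interval can be seen with a small sketch. The function names and sample numbers are assumptions, and for simplicity the same delay is used for the regular and the strict version of the gate, which a real library would likely not do. A strict gate cannot fire before its last input has arrived, so the lower bound of its interval moves up while the upper bound stays where it was.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (earliest time, latest time)


def regular_interval(inputs: List[Interval], delay: Interval) -> Interval:
    """Regular gate: the first arriving input may already fire the output."""
    return (min(lo for lo, _ in inputs) + delay[0],
            max(hi for _, hi in inputs) + delay[1])


def strict_interval(inputs: List[Interval], delay: Interval) -> Interval:
    """Strict gate: the output waits for all inputs, so the earliest firing
    time is driven by the latest of the earliest arrivals."""
    return (max(lo for lo, _ in inputs) + delay[0],
            max(hi for _, hi in inputs) + delay[1])


inputs = [(0.0, 0.0), (1.0, 1.5)]   # hypothetical arrival intervals
delay = (0.5, 0.75)                 # hypothetical gate delay
print(regular_interval(inputs, delay))  # (0.5, 2.25), width 1.75
print(strict_interval(inputs, delay))   # (1.5, 2.25), width 0.75
```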

Page 14: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

14

Timing Computation with Strict Gates

• Entire completion detection: single OR gate

[Figure: the circuit (signals A, B, C, D, X, Y, Z; gates N1, N2, N3) with strict gates, annotated with intervals (1.0,1.2), (1.1,1.2), (1.5,1.9), (3.0,4.0), (3.5,4.1), (3.6,4.9), and (5.0,6.8); (1.4,1.9) appears next to Done]

• This circuit: area not reduced

• Goal: smart insertion of strict gates

Page 15: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

15

Outline

• Timing analysis & Direct Optimization

• Greedy optimization method

• Exact optimization method

• Results

• Conclusions

Page 16: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

16

Greedy Optimization (1)

• Strict gates: area implications
  – GlobalPI may be narrower and delayed
  – Fewer gates non-quiescent
  – Smaller completion detection

• Greedy optimization framework:
  – Flip gates in the circuit from normal to strict
  – Select most promising candidate
  – Continue until no improvements possible

Page 17: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

17

Greedy Optimization (2)

Algorithm:

1. For each gate G_i in the circuit:
   a. Flip G_i from regular to strict
   b. Perform timing analysis, compute GlobalPI_i
   c. Flip G_i back to regular

2. Select the gate G_k with the narrowest GlobalPI_k

3. If GlobalPI_k is narrower than the previous best:
   a. Flip G_k to strict permanently
   b. Continue (go to 1)

   Else: finish
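The loop above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the circuit representation, the names `global_pi` and `greedy`, the separate strict-gate delays, and the reading of GlobalPI as the interval spanning all primary outputs are all hypothetical choices made here.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]
# gate name -> (input names, regular delay, strict delay), in topological order
Circuit = Dict[str, Tuple[List[str], Interval, Interval]]


def global_pi(circuit: Circuit, arrivals: Dict[str, Interval],
              outputs: List[str], strict: Set[str]) -> Interval:
    """Timing analysis: propagate intervals, return the span of all outputs."""
    pi = dict(arrivals)
    for name, (ins, d_reg, d_strict) in circuit.items():
        los = [pi[i][0] for i in ins]
        his = [pi[i][1] for i in ins]
        if name in strict:  # strict gate waits for all inputs
            pi[name] = (max(los) + d_strict[0], max(his) + d_strict[1])
        else:               # regular gate: any input may fire the output
            pi[name] = (min(los) + d_reg[0], max(his) + d_reg[1])
    return (min(pi[o][0] for o in outputs), max(pi[o][1] for o in outputs))


def greedy(circuit: Circuit, arrivals: Dict[str, Interval],
           outputs: List[str]) -> Set[str]:
    """Return the set of gates flipped to strict by the greedy loop."""
    def width(strict: Set[str]) -> float:
        lo, hi = global_pi(circuit, arrivals, outputs, strict)
        return hi - lo

    strict: Set[str] = set()
    best = width(strict)
    while True:
        candidates = [g for g in circuit if g not in strict]
        if not candidates:
            return strict
        # Step 1: trial-flip each gate; the trial set is discarded afterwards.
        trials = {g: width(strict | {g}) for g in candidates}
        k = min(trials, key=trials.get)   # step 2: narrowest GlobalPI
        if trials[k] >= best:             # step 3: no improvement, finish
            return strict
        strict.add(k)                     # flip permanently and continue
        best = trials[k]


# Hypothetical two-gate circuit: n1 = f(a, b), n2 = g(n1, c), output n2.
circuit: Circuit = {
    "n1": (["a", "b"], (1.0, 1.0), (1.5, 1.5)),
    "n2": (["n1", "c"], (1.0, 1.0), (1.5, 1.5)),
}
arrivals = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (0.0, 0.0)}
print(sorted(greedy(circuit, arrivals, ["n2"])))  # ['n1', 'n2']
```

In this toy circuit, flipping n1 alone would widen the output interval, so the loop picks n2 first and only then finds that n1 helps. Note that the loop minimizes interval width, not area, which is why the result is not guaranteed to shrink the circuit.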

Page 18: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

18

Greedy Optimization (3)

• Algorithm does not optimize for area directly

• Instead: may reduce the completion detection by narrowing the output interval

• Results are promising, but for individual benchmarks the area may increase

Page 19: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

19

Outline

• Timing analysis & Direct Method

• Greedy optimization method

• Exact optimization method

• Results

• Conclusions

Page 20: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

20

Exact Optimization Method

• Mixed Integer Linear Programming (mILP)

• Transform circuit graph into an optimization problem:

– Introduce variables for each gate, wire and primary input/output

– Matrix coefficients: from library (gate areas) and back-annotation (gate/wire delays) files

– Decision variables (GS): should a gate be strict?

Page 21: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

21

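mILP formulation

• Minimize: $TotalArea = GateArea + CDArea$

• $GateArea = \sum_i \left( GS_i \cdot SArea_i + (1 - GS_i) \cdot NArea_i \right)$

• $CDArea = SCD \cdot Or2Area + (SCD - 1) \cdot CArea$
  – $SCD$: number of gates that need completion detection

• NeedsCD: does a gate need CD?
  – $NeedsCD = 0$ if $PI_M < GlobalPI_m$ or the successor is strict; otherwise $1$ (M and m presumably denote the upper and lower bounds of an interval)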

• Rest of the model implements timing computation
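To make the objective concrete, the sketch below evaluates TotalArea for one fixed assignment of the decision variables. It does not solve the mILP; it only computes the cost a solver would minimize. All function names and numeric values are hypothetical, and treating the CD area as zero when no gate needs completion detection is an assumption added here, since the slide's formula does not cover that case.

```python
from typing import List


def gate_area(gs: List[int], s_area: List[float], n_area: List[float]) -> float:
    """Sum over gates of GS_i * SArea_i + (1 - GS_i) * NArea_i."""
    return sum(g * s + (1 - g) * n for g, s, n in zip(gs, s_area, n_area))


def cd_area(scd: int, or2_area: float, c_area: float) -> float:
    """SCD * Or2Area + (SCD - 1) * CArea, for SCD gates needing CD."""
    if scd == 0:
        return 0.0  # assumption: no completion-detection logic at all
    return scd * or2_area + (scd - 1) * c_area


def total_area(gs: List[int], s_area: List[float], n_area: List[float],
               needs_cd: List[int], or2_area: float, c_area: float) -> float:
    """TotalArea = GateArea + CDArea for one candidate solution."""
    return gate_area(gs, s_area, n_area) + cd_area(sum(needs_cd), or2_area, c_area)


# Hypothetical 3-gate example: the middle gate is strict, two gates need CD.
print(total_area(gs=[0, 1, 0],
                 s_area=[4.0, 4.0, 4.0], n_area=[2.0, 2.0, 2.0],
                 needs_cd=[1, 0, 1], or2_area=1.0, c_area=3.0))  # 13.0
```

Here the gate area is 2 + 4 + 2 = 8 and the CD area is 2·1 + 1·3 = 5, so the total is 13.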

Page 22: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

22

Improving the mILP Model

• Basic mILP model: too slow even for small circuits (hours for a dozen gates)

• Leverage problem knowledge into model improvements:
  – Branching order: gates closer to the output are more likely to become strict => inspected first
  – Single-input gates: never strict
  – Provide initial solution (result of greedy opt)

• Can solve problems with hundreds of gates in minutes

Page 23: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

23

Related Work: Optimizations

• Cortadella et al:
  – logical function decompositions
  – can achieve substantial area savings
  – can be the starting point for our methods

• Zhou et al: consider strict gates in optimization, but no timing information

• Sokolov et al: two timing optimizations
  – Alternate levels: unrealistic assumptions for gate delays
  – Longest path: applicable only for small circuits

Page 24: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

24

Experimental Setup

• Tool flow:
  – Synthesis & tech-mapping with Synopsys Design Compiler
  – Perl scripts for dual-rail implementations
  – Optimization tool reads structural Verilog and timing back-annotations
  – End result: optimized circuits (Verilog)

• Experiments:
  – Arithmetic and ISCAS’89 benchmarks
  – Pre-layout runs in 0.18 µm technology

Page 25: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

25

Area: Ratio vs. NCL-X method

[Figure: bar chart (axis 0 to 1.2) of area ratio vs. NCL-X for the direct, greedy, and ilp methods]

• Greedy: 2.83x NCL-X area for le32
• mILP does not finish in less than 1 hour: partial results
• Direct: 0.83x, Greedy: 0.55x, mILP: 0.43x

Page 26: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

26

Area breakdown

[Figure: bar chart (axis 0 to 1) for add_bk_32, Lsr16, and C880 under the Direct, Greedy, ILP, and NCLX methods; legend: regular, strict, cd]

• 8/168 strict
• 4.7% before → 40% after
• Over twice as small as NCL-X

Page 27: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

27

Parametric Variation: BK adder

[Figure: plot of area ratio vs. NCL-X (y-axis, 0 to 1) against parametric variation (x-axis, 0% to 35%) for the Direct, Greedy, and Exact methods]

Page 28: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

28

Conclusions

• Paper introduced:
  – A method to translate synchronous circuits into optimized asynchronous circuits
  – Three new relative-timing optimizations for improving area
    • Direct: extremely simple
    • Greedy: fast, good results
    • Exact: optimal, may be extremely slow
  – An analysis of the impact of parametric variation on these circuits

Page 29: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

29

Backup slides

Page 30: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

30

Outline

• Background

• Timing analysis & Direct Optimization

• Greedy optimization method

• Exact optimization method

• Results

• Conclusions

Page 31: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

31

Introduction

• Future deep sub-micron technologies:
  – Large parametric variations (ITRS’05 predicts 35% by 2020)
  – Asynchronous design a natural fit
  – Asynchronous handshaking: widespread

• Acceptance for asynchronous circuits is predicated on quality CAD tools:
  – “Pure” async: from scratch
  – Sync to async translation

Page 32: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

32

Synchronous to Asynchronous Translation

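Synchronous circuit: Z = (A·B)·(C+D)

[Figure: synchronous circuit with signals A, B, C, D, X, Y, Z and gates N1, N2, N3]

Template-based replacement of each sync gate

Dual-rail circuit

[Figure: dual-rail circuit with signals A0/A1, B0/B1, C0/C1, D0/D1, X0/X1, Y0/Y1, Z0/Z1 and gates N1, N2, N3]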
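As a hedged illustration of template-based replacement, the sketch below models one common dual-rail template for AND and OR gates and applies it to Z = (A·B)·(C+D). The slides do not spell out the templates actually used, so the encoding (rail0, rail1), the helper names, and the assumption that X = A·B and Y = C+D are all illustrative choices. The last call shows early propagation: once A is a valid 0, Z becomes a valid 0 while B, C, and D are still NULL.

```python
from typing import Tuple

Rail = Tuple[int, int]  # (rail0, rail1); exactly one rail high means valid data
NULL: Rail = (0, 0)     # spacer: no data present yet


def encode(bit: int) -> Rail:
    """Dual-rail encoding of a single bit."""
    return (1, 0) if bit == 0 else (0, 1)


def dr_and(a: Rail, b: Rail) -> Rail:
    """Dual-rail AND: output is 0 as soon as either input is 0."""
    return (a[0] | b[0], a[1] & b[1])


def dr_or(a: Rail, b: Rail) -> Rail:
    """Dual-rail OR: output is 1 as soon as either input is 1."""
    return (a[0] & b[0], a[1] | b[1])


def circuit(a: Rail, b: Rail, c: Rail, d: Rail) -> Rail:
    """Z = (A·B)·(C+D), assuming X = A·B and Y = C+D."""
    x = dr_and(a, b)
    y = dr_or(c, d)
    return dr_and(x, y)


# All inputs valid: (1·1)·(0+1) = 1.
print(circuit(encode(1), encode(1), encode(0), encode(1)))  # (0, 1)
# Only A has arrived, with value 0: Z is already a valid 0.
print(circuit(encode(0), NULL, NULL, NULL))  # (1, 0)
```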

Page 33: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

33

Related Work

• Numerous approaches for translating synchronous circuits into asynchronous

• Dealing with the orphans problem:

– Kondratiev et al: NCL-X (discussed below)

– Brej: anti-tokens

• Allows for early propagation

• Completion detection in background

• Even larger area overheads

Page 34: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

34

ILP optimization for 32-bit BK adder

[Figure: plot of % error (axis 0% to 65%) against time (s), with two series: % Crt Sol and % Best Estimation]

• CrtSol: current best integer solution
• Best Estimation: best guess of how far the optimum is; when 0, the optimum has been found

Page 35: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

35

Outline

• Timing analysis & Direct Optimization

• Greedy optimization method

• Exact optimization method

• Results

• Conclusions


Page 37: Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis

37

mILP Run Time

Bench     #Inps  #Outs  #Gates  #Vars  #Constr  Runtime
Eq32         64      1      37    731     1158  0.23s
Decode32      5     32      49   1239     2068  12.2s
C432         36      7      80   2391     4223  27m46s
Lsl16        32     16      81   1819     3534  10m24s
Lsr16        32     16      81   2315     4080  19m15s
Absval32     32     32      92   2420     4149  6m7s
C880         60     26     168   4385     7724  39m25s
C1908        33     25     190   3263     5300  20m23s
Bk32         64     32     285   4923     8293  78s
Clf32        64     32     309   5195     8737  71s