34
1 1 1 1 Bubble Razor An Architecture-Independent Approach to Timing-Error Detection and Correction Matthew Fojtik, David Fick, Yejoong Kim, Nathaniel Pinckney, David Harris, David Blaauw, Dennis Sylvester [email protected] Electrical Engineering & Computer Science Department The University of Michigan, Ann Arbor

Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

1 1

1

1

Bubble Razor An Architecture-Independent Approach to

Timing-Error Detection and Correction

Matthew Fojtik, David Fick, Yejoong Kim, Nathaniel Pinckney, David Harris, David Blaauw, Dennis Sylvester

[email protected]

Electrical Engineering & Computer Science Department

The University of Michigan, Ann Arbor

Page 2: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

2 2

2

2

Outline

Issues with Prior Razor

Bubble Razor Algorithm

Circuitry and Implementation

Area Overhead Tradeoffs

Test Chip Results

Page 3: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

3 3

3

3

Timing Margins

Voltage

Process Aging

Temperature

clock

Lost performance/energy

Margins for uncertainty:

Process Variation

Temperature Variation

Voltage Variation

Aging Effects

Associated Costs:

Lost performance

Lost energy

Tester time (tradeoff)

actual circuit delay

Data

Page 4: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

4 4

4

4

Technique Process Ambient Data

Global Local Global Local

Slow Fast Slow Fast

Table Lookup X X

Table & Sensors X X X

Canary Circuit X X

Razor Designs X X X X X X X

Eliminating Margins

Always Correct

Tables, Canaries

Detect and Correct

Razor Style Shadow

Latch

Main DFF

Error

Q D

CLK

DCLK S. Das, et. al. [VLSI 2005]

Page 5: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

5 5

5

5

Speculation Window and Hold Time

DFF B DFF A

CLK A

CLK B

Speculation Window

Speculation window linked to minimum delay constraint (hold time)

Page 6: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

6 6

6

6

Architectural Invasiveness

IF ID EX MEM WB CHK

Razor I Style – All Flops Reload Previous Values

Razor II Style – Check Stage and Architectural Replay

S. Das, et. al. [VLSI 2005]

D. Blaauw, et. al. [ISSCC 2008]

IF ID EX MEM WB

K. Bowman, et. al. [ISSCC 2008]

• Requires Designer Effort

• RTL written with Razor in mind

Page 7: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

7 7

7

7

Fundamentals of Bubble Razor

Two-Phase Latch Timing

Automatically convert Flip-Flop based design

Time Borrowing as Correction Mechanism

Does not modify design architecture

Does not require reloading / replaying instructions

Local Correction (Bubbles)

Break requirement of stalling entire chip at once

Page 8: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

8 8

8

8

Two Phase Latch Razor Timing

CLK A

CLK B

Larger Speculation Window

Minimum delay constraint the same as conventional design

LD B

LD A

Page 9: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

9 9

9

9

Time Borrowing as Error Correction

DFF DFF LD LD

TD TD TD TD

LD LD

G closed open closed open closed X closed open

D

Error

Bubble Razor – Switch to Latches, Borrow Time

• No Hold Time Issues

• Architecture Agnostic

• Push-button approach

• No metastability on datapath

Page 10: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

10 10

10

10

Stalling Locally with Bubbles

1 3 5 7 2 4 6 8

Tim

e

Eventually it all resolves

Stalling the Clock Locally

• With flops, all registers hold data

• With latches, half registers hold bubbles

• Every latch stalls exactly once

• Communication only between neighbors

Blue tells Green to stall Purple tells

Blue to stall Yellow takes off again

Red tells Purple to stall Yellow tells

Red to stall Yellow tells downstream

no new data exists Yellow stalls

Not immediately overwritten

Page 11: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

11 11

11

11

Timing of Clock Waveforms

1 2 3 4 5 6 7 8 9 10

2

3

4

1

Should Arrive

Timing violation

Give time to Recover

Prevent Double Sampling inst1

Prevent Losing inst2

Prevent Losing inst3

Page 12: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

12 12

12

12

Timing of Clock Waveforms

1 2 3 4 5 6 7 8 9 10

2

3

4

1

Should Arrive

Timing violation

Give time to Recover

Prevent Double Sampling inst1

Prevent Losing inst2

Prevent Losing inst3

Page 13: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

13 13

13

13

Timing of Clock Waveforms

1 2 3 4 5 6 7 8 9 10

7

2

3

4

5

6

8

1

9

10

Page 14: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

14 14

14

14

7

2

3

4

5

6

8

1

9

10

Timing of Clock Waveforms

1 2 3 4 5 6 7 8 9 10

Timing violation Stall Neighbors

Stall 3

Page 15: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

15 15

15

15

The Required Circuitry

2 3 1 2

CG CG CG CG

TD TD TD TD

B

B

B

B

Page 16: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

16 16

16

16

Error Detection And OR Circuitry

TD TD TD

1

Page 17: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

17 17

17

17

Clock Gate Control Logic

A cluster stalls and sends bubbles to all neighbors if

Told by a neighboring cluster

Did not stall in the previous cycle

Equivalent to sending bubbles to “other” neighbors

CG

B

Page 18: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

18 18

18

18

Clustering with hMETIS

Widely used Hypergraph

partitioning program, hMETIS

Clusters must only contain

members with the same phase

Create two graphs, and partition

independently

Connected in hMETIS graph, if

transitively connected in circuit

Edge Weight = number of latches

that form transitive connection

4 1

2

5

6

3

1

2

3

4

5

6 1

1

2

1

2

1

Page 19: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

19 19

19

19

Clustering Results

Tradeoff between

sizes of OR gates

Combining errors

Combining bubbles

100 negative clusters 70 positive clusters

Page 20: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

20 20

20

20

Two Port Memory Boundary Approach

Must fit edge triggered

memory into stalling

algorithm

Page 21: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

21 21

21

21

“Managing” the Synthesis/APR Tools

Want balanced pipelines, no time borrowing

Model razor latches as flip flops

Dynamic OR always followed by latch

Model dynamic OR as static

Model latch as flip flop (captures when latch closes)

Use regular ICG cells

Can use conventional clock tree synthesis

Final design appears to be relatively “normal”

Flip-flop based design with clock gating

Everything is timing constrained

“Razorization” process is entirely automated

Synthesis and netlist transformation scripts

Page 22: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

22 22

22

22

Retiming And Number of Latches

Retiming can increase the number of latches

Results in area overhead

Page 23: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

23 23

23

23

Area Overhead of Latch Transformation

Page 24: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

24 24

24

24

Speculation Window Size

Full Clock Phase (100%) Minus Delay of Error

Propagation Circuits

Maximum allowed by technique

Number / Location of Latches with Error Checking

Maximum slowdown that does not result in unchecked error

Speculation Window

Page 25: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

25 25

25

25

50%

Where Error Checking is Needed

If circuit delay suddenly becomes 130% of its

nominal value, all timing errors will be detected

before the circuit fails

15%

30% Speculation Window

A B C D

65%

50%

26%

50% 20%

65%

91% 156%

Delay at PoFF

Delay at Worst

Leave B

Arrive C

>50? >50? >50?

Arrive D

Page 26: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

26 26

26

26

Path Distribution for Cortex-M3

Positive

Latches Negative

Latches

Flip

Flops

All

Latches

Page 27: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

27 27

27

27

Area Increase from Error Checking

20% Area Overhead

30% Timing Speculation

Page 28: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

28 28

28

28

Implementation on ARM Cortex-M3

Page 29: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

29 29

29

29

Characterizing Throughput / Energy

Operating Point Set for Worst Case Operation

85°C

10% Supply Droop

2σ Process

5% Safety Margin

200 MHz at 1.0 V

Page 30: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

30 30

30

30

Gains from Bubble Razor

Page 31: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

31 31

31

31

Gains from Bubble Razor

Page 32: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

32 32

32

32

Bubble Razor Results

Slow Average Fast

Page 33: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

33 33

33

33

Bubble Razor Results

Worst

Case

200 MHz 8.5

FFT/ms

First

Failure

333 MHz 14.2

FFT/ms

Optimum 425 MHz 17.3

FFT/ms

Worst

Case

1.0 V 3.08

μJ/FFT

First

Failure

0.775 V 1.42

μJ/FFT

Optimum 0.725 V 1.18

μJ/FFT

Page 34: Bubble Razor - pdfs.semanticscholar.org · Razor I Style – All Flops Reload Previous Values Razor II Style – Check Stage and Architectural Replay S. Das, et. al. [VLSI 2005] D

34 34

34

34

Conclusion

First Razor style implementation on a complete,

commercial processor (ARM Cortex-M3).

Proposed two-phase latch based Razor technique

Novel local replay algorithm

Demonstrated automated nature of technique

Successfully implemented and fabricated in 45nm

60% energy efficiency or 100% throughput increase

over worst case margining