
Better Than Worst-Case Design · web.eecs.umich.edu/~taustin/papers/BTWCDesign-MSFT.pdf · 2008. 7. 28.


Better Than Worst-Case Design

Todd Austin
University of Michigan
[email protected]

Acknowledgements: Valeria Bertacco, David Blaauw, Shidhartha Das, Dan Ernst, Nam Sung Kim, Seokwoo Lee, Trevor Mudge, Chris Weaver, Kris Flautner & ARM Ltd, Robert Colwell

Challenges in the Nanometer Regime

Design complexity: billions and billions of transistors lead to untenable designs…
Device-level faults in logic and memory: cosmic rays, alpha particles, gate wear-out, silicon defects, etc.
Uncertainty in design parameters: process and temperature variation, supply noise…
Power/performance demands: bounding performance, area, and battery life

Traditional Worst-Case Design

[Figure: design-time verification and optimization, with low (L) to high (H) scales for time-to-market and performance.]

Better Than Worst-Case Design

[Figure: run-time verification by online checker hardware, combined with typical-case optimization, with low (L) to high (H) scales for time-to-market and performance.]

Presentation Agenda

BTWC Design Examples
  DIVA Checker
  Razor Logic
BTWC Design Opportunities and Challenges
  Typical-Case design Optimization (TCO)
Conclusion

Example BTWC Design: DIVA Checker [MICRO '99]

All core function is validated by checker
  Simple checker detects and corrects faulty results, restarts core
Checker relaxes burden of correctness on core processor
  Tolerates design errors, electrical faults, defects, and failures
Core has burden of accurate prediction, as checker is 15x slower
  Core does heavy lifting, removes hazards that slow checker

[Figure: the core (IF, ID, REN, REG, SCHEDULER, EX/MEM) provides performance and passes speculative instructions, in order, with PC, inst, inputs, and addr, to the online checker hardware (CHK, CT), which provides correctness.]
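The check-and-commit idea on this slide can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the two-operation toy ISA, the `Prediction` record, and the function names are hypothetical and are not taken from the DIVA design. A real checker would also verify the PC, the fetched instruction, and memory addresses, and would restart the core after a mismatch.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One instruction from the core, treated as a prediction (hypothetical format)."""
    op: str           # "add" or "sub" in this toy ISA
    src_a: int        # source register indices
    src_b: int
    dest: int         # destination register index
    core_result: int  # value the core computed, possibly wrong


def checker_execute(op, a, b):
    """Simple, trusted re-execution of a single instruction."""
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    raise ValueError(f"unsupported op: {op}")


def check_and_commit(regs, stream):
    """Commit instructions in order; return how many core results were corrected."""
    corrections = 0
    for p in stream:
        # Operands come from architected state, which only the checker updates.
        expected = checker_execute(p.op, regs[p.src_a], regs[p.src_b])
        if expected != p.core_result:
            corrections += 1  # a real design would also flush and restart the core
        regs[p.dest] = expected  # only verified values reach architected state
    return corrections


regs = [0, 5, 7, 0]
stream = [
    Prediction("add", 1, 2, 3, 12),  # correct core result
    Prediction("sub", 3, 1, 0, 99),  # faulty core result, checker computes 7
]
print(check_and_commit(regs, stream), regs)
```

Because the architected registers are written only with checker-computed values, a faulty core result costs a correction but never corrupts committed state, which is the sense in which the checker relaxes the burden of correctness on the core.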

Checker Processor Architecture

[Figure: checker pipeline (IF, ID, EX, MEM, CT) with I-cache, RF, D-cache, and WT. The core processor prediction stream supplies core PC, core inst, core regs, and core res/addr/nextPC. Each stage compares its own PC, inst, regs, and res/addr against the core's values (=inst, =regs, =res/addr); CT receives OK and the result.]

Check Mode

[Figure: the same checker pipeline in check mode, with the core processor prediction stream feeding the =inst, =regs, and =res/addr comparisons.]

Recovery Mode

[Figure: checker pipeline (IF, ID, EX, MEM, CT) with I-cache, RF, and D-cache, operating on its own PC, inst, regs, res/addr, and result; no core prediction inputs are shown.]

How Can the Simple Checker Keep Up?

Slipstream
  Slipstream reduces power requirements of trailing car
Checker processor executes inside core processor's slipstream
  Fast moving air ⇒ branch predictions and cache prefetches
  Core processor slipstream reduces complexity requirements of checker
  Checker rarely sees branch mispredictions, data hazards, or cache misses


REMORA: Physical Checker Design

Physical checker design
  Alpha integer ISA subset
  4-wide checker, 0.5k I-cache, 4k D-cache
Less than 3% slowdown for Alpha core
Only a 6% area overhead incurred
Design also includes:
  Pipelined checker design, simple core
  Clock/voltage tuning infrastructure
  Extensive BIST support

[Figure: area comparison. Alpha 21264: 205 mm² (in 0.25 µm). REMORA checker (data cache, inst cache, pipeline, BIST): 12 mm² (in 0.25 µm).]

Verifying the Checker Processor

Simple checker permits complete functional verification
  In-order blocking pipelines (trivial scheduler, no rename/reorder/commit)
  No "internal" non-architected state
Fully verified design using Sakallah's GRASP SAT-solver
  For Alpha integer ISA without exceptions
  With small register file and memory, and small data types

[Figure: the checker model, driven by unspecified core predictions (X), and the reference model (ISA sim) each produce an output; the outputs are compared (==) for identical state. The property ϕ is always true if uArch model == Ref model.]


Motivating Study: Voltage vs. Circuit Error Rate

Circuit Under Test

[Figure: 48-bit LFSRs drive three 18x18 multipliers (18-bit inputs, 36-bit outputs): Slow Pipeline A and Slow Pipeline B, clocked at clk/2, and a Fast Pipeline, clocked at clk, with a stabilize stage. Outputs are compared (!=) and mismatches are counted in 40-bit error counters.]

Error Rate Studies – Empirical Results

[Plot: 18x18-bit Multiplier Block at 90 MHz and 27 C. x-axis: Supply Voltage (V), 1.14 to 1.78; y-axis: Error rate (log scale), 0.0000001% to 100%; series: random. Marked points: zero-margin @ 1.54 V, safety-margin @ 1.63 V, environmental-margin @ 1.69 V. Annotations: 35% energy savings with 1.3% error; 22% saving; once every 20 seconds!]
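The plot annotations mix percentage error rates with a time between errors, and the two are related by simple arithmetic. The Python sketch below shows the conversion, assuming one multiply completes per clock cycle at the 90 MHz stated on the plot. That assumption, the function names, and the pairing of any rate with any particular plot point are illustrative and not taken from the slides.

```python
CLOCK_HZ = 90e6  # clock frequency from the plot title


def errors_per_second(error_rate, clock_hz):
    """Expected errors per second, assuming one operation per clock cycle."""
    return error_rate * clock_hz


def seconds_between_errors(error_rate, clock_hz):
    """Mean time between errors for a given per-operation error rate."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return 1.0 / errors_per_second(error_rate, clock_hz)


# A 1.3% error rate (0.013 as a fraction) at 90 MHz.
print(errors_per_second(0.013, CLOCK_HZ))

# Per-operation error rate that corresponds to one error every 20 seconds.
rate_for_20s = 1.0 / (20 * CLOCK_HZ)
print(rate_for_20s * 100, "percent")
```

Under this assumption, 1.3% corresponds to roughly 1.17 million errors per second, while one error every 20 seconds corresponds to a rate of about 5.6e-8 percent, which falls within the lowest decades of the plot's error-rate axis.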

Error Rate Studies – SPICE-Level Simulations

Based on SPICE-level simulations of a Kogge-Stone adder

[Plot: Kogge-Stone Adder at 870 MHz and 27 C. x-axis: Supply Voltage, 0.6 to 2; y-axis: Error rate (log scale), 0.00% to 100.00%; series: random, bzip, ammp. Annotation: 200 mV.]

Another BTWC Design: Razor Logic

Double-sampling latches detect timing errors
  Second sample is correct-by-design
Microarchitectural support restores state
  Timing errors treated like branch mispredictions
Research challenges: metastability and short-path constraints

[Figure: Razor flip-flop acting as online checker hardware. A main flip-flop clocked by clk is paired with a shadow latch clocked by the delayed clock clk_del.]
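The double-sampling idea can be sketched behaviorally in Python. This is a minimal model under stated assumptions, not the Razor circuit: the function names, the timing units, and the rule that data must arrive before the shadow latch closes are all hypothetical simplifications, and the metastability and short-path issues named on the slide are ignored.

```python
def razor_sample(old_value, new_value, arrival, period, shadow_delay):
    """Model one cycle of a double-sampling flip-flop.

    arrival: time at which the new data becomes valid, measured from cycle start.
    period: clock period; the main flip-flop samples at this time.
    shadow_delay: extra delay before the shadow latch samples.
    Returns (main, shadow, error).
    """
    if arrival > period + shadow_delay:
        # Outside the window the shadow sample is no longer correct by design.
        raise ValueError("data arrived after the shadow latch closed")
    main = new_value if arrival <= period else old_value  # late data is missed
    shadow = new_value                                    # delayed sample sees it
    return main, shadow, main != shadow


def recover(main, shadow, error):
    """On a detected timing error, restore the correct value from the shadow latch."""
    return shadow if error else main


# On-time data: both samples agree, so no error is flagged.
print(razor_sample(0, 1, arrival=0.9, period=1.0, shadow_delay=0.5))
# Late data: the main flip-flop holds the stale value and the error flag is set.
print(razor_sample(0, 1, arrival=1.2, period=1.0, shadow_delay=0.5))
```

The error flag plays the role of a misprediction signal: the pipeline discards the stale value and continues from the shadow latch's sample.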

Distributed Pipeline Recovery

[Figure: pipeline stages IF, ID, EX, MEM (read-only), and WB (reg/mem), separated by Razor FFs, with a stabilizer FF and the PC. Each stage passes an error bubble forward and raises recover; flush control distributes flushID. The timing diagram covers cycles 0 to 9 and inst1 to inst8, with inst2, inst3, and inst4 appearing a second time.]

Builds on existing branch prediction framework
Multiple cycle penalty for timing failure
Scalable design as all communication is local

Razor Prototype

[Figure: prototype chip with Icache, Dcache, RF, and the IF, ID, EX, MEM, WB pipeline stages labeled; dimensions 3.3 mm x 3.0 mm.]

Razor Opportunity: Typical-Case Energy Reduction

[Figure: control loop. Error signals from the pipeline are summed (Σ) into Esample, which is subtracted from the reference Eref to give Ediff = Eref - Esample. Ediff feeds the voltage control function, which drives the voltage regulator that sets Vdd for the pipeline; a reset signal is also shown.]

Energy reduction can be realized with a simple proportional control function
Control algorithm implemented in software
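A proportional controller of this kind can be sketched in a few lines of Python. The slide gives only Ediff = Eref - Esample; the sign convention (lower Vdd when the sampled error rate is below the reference), the gain `k_p`, the voltage limits, and the function name are assumptions chosen for illustration.

```python
def next_vdd(vdd, e_ref, e_sample, k_p, v_min, v_max):
    """One step of a proportional voltage controller.

    e_ref: target error rate; e_sample: measured error rate over the last interval.
    A positive e_diff means fewer errors than targeted, so the supply is lowered;
    a negative e_diff means too many errors, so the supply is raised.
    """
    e_diff = e_ref - e_sample
    vdd_new = vdd - k_p * e_diff
    return min(v_max, max(v_min, vdd_new))  # keep the request within regulator limits


# Hypothetical settings: 1% target error rate, unit gain, 1.2 V to 1.8 V range.
vdd = 1.6
for e_sample in [0.0, 0.0, 0.03, 0.01]:
    vdd = next_vdd(vdd, e_ref=0.01, e_sample=e_sample, k_p=1.0, v_min=1.2, v_max=1.8)
    print(round(vdd, 3))
```

With no errors observed the controller steps the voltage down; when the sampled error rate overshoots the reference it steps back up, and it holds steady when the two match.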

Voltage Controller Response

Two minute snapshot of a 15 min run

[Plot: x-axis: Time (Seconds), 20 to 140; y-axes: Percentage Error Rate, 0 to 10, and Controller Output Voltage (V), 1.48 to 1.80.]

Energy/Performance Characteristics

[Plot: energy and pipeline throughput (IPC) against decreasing supply voltage. Curves: energy of processor operations, Eproc; energy of pipeline recovery, Erecovery; total energy, Etotal = Eproc + Erecovery, with the optimal Etotal marked; energy of processor w/o Razor support. Annotations: 50%, 1%.]

Measured Results

[Plot: Point of 0.1% Error Rate vs. Point of First Failure. x-axis: Voltage at First Failure, 1.4 to 1.8; y-axis: Voltage at 0.1% Error Rate, 1.4 to 1.8; series: Chips, and Linear Fit y = 0.78685x + 0.22117.]

[Plot: Normalized Energy Savings over First Failure Point at 0.1% Error Rate. x-axis: Percentage Savings, 0 to 30; y-axis: Number of Chips, 0 to 16; series: Lot0, Lot1.]

Other Better Than Worst-Case Designs

Algorithmic-Noise Tolerance, Shanbhag et al.
  Converting circuit faults to S/N component
Approximate Circuits, Lu et al.
  Architecture-level speculation on computation
TEAtime Adaptive Clock, Uht et al.
  Adaptive clock control
On-Chip Self-Calibrating Busses, Worm et al.
  Error recovery logic for on-chip busses
Self-Tuning Circuits, Kehl et al.
  Early work on dynamic timing error avoidance
Time Based Transient Fault Detection, Anghel et al.
  Double sampling latches for speed testing

March 2004


BTWC Design Opportunities

Key observation: infrequent faults in the core design are tolerable.

Opportunities:
  Focus only on the critical components, no need to verify ad infinitum
  Optimize performance/power/implementation for the most common scenarios (typical-case optimization)

BTWC Design Opportunity: Typical-Case Optimized Adder

[Figure: Kogge-Stone adder with generate/propagate inputs G0P0 through G15P15 and Cin.]

Carry Propagations for Random Data

[Plot: probability (0 to 0.01) against Bit Position and Carry Distance (tick ranges 0 to 56 and 0 to 64); series: carry propagation, carry start.]

Carry Propagations for Typical Data

[Plot: probability (0 to 0.16) against Bit Position and Carry Distance (tick ranges 0 to 56 and 0 to 64); series: carry propagation, carry start.]
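The carry distance plotted on these slides can be measured in software. The Python sketch below computes the longest distance any carry travels in a single addition, using the usual generate (both bits set) and propagate (exactly one bit set) definitions. The function name, the 64-bit width, and the random-operand experiment are assumptions for illustration; the slides' own measurement method and data are not reproduced here.

```python
import random


def longest_carry_chain(a, b, width=64):
    """Longest distance, in bit positions, that any carry travels when adding a and b."""
    longest = 0
    run = 0  # distance travelled so far by the carry entering the current bit
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:            # generate: a fresh carry starts at this bit
            run = 1
        elif (ai ^ bi) and run:  # propagate: the incoming carry moves one bit further
            run += 1
        else:                  # kill, or nothing to propagate
            run = 0
        longest = max(longest, run)
    return longest


# 0xFF + 1: the carry generated at bit 0 ripples through every remaining bit.
print(longest_carry_chain(0xFF, 1, width=8))

# Average longest chain over random 64-bit operands (seed fixed for repeatability).
rng = random.Random(0)
samples = [longest_carry_chain(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(10000)]
print(sum(samples) / len(samples))
```

Replacing the random operands with operand pairs traced from a real program would give the corresponding measurement for typical data.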

Typical Case Optimized Adder

[Figure: adder with generate/propagate inputs G0P0 through G15P15 and Cin, combining a ripple carry circuit with a carry-lookahead circuit.]

Benefits of Typical Case Optimization

Adder Topology   Latency (in gate delays)
                 Worst-Case   Typical-Case
Kogge-Stone      8            5.08
TCO Adder        16           3.03

Typical-case performance much better than worst case
  Especially for typical-case optimized design

Presentation Agenda

BTWC Design Examples
  DIVA Checker
  Razor Logic
BTWC Design Opportunities and Challenges
  Typical-Case design Optimization (TCO)
  Circuit-level observability and system-level performance
Conclusion

BTWC Design Challenge: Observability of Circuit-Level Characteristics

[Figure: an architectural simulator (IF, ID, EX, MEM, WB), offering speed and scope, takes the App and Arch Config and produces Output and Arch Metrics. It passes inputs, voltage, and constraints to a circuit simulator, offering fidelity and observability, which uses module circuit models and tech models and returns delay, power, and switching as circuit metrics.]

Circuit-Aware Architectural Simulator efficiently melds circuit simulation with architectural simulation

Conclusion

Better than worst-case design abandons traditional worst-case design constraints
  Couples complex designs with checkers
  DIVA Checker verifies program computation
  Razor Logic verifies circuit timing
Enables CAD opportunities for typical-case optimization