Cost-Efficient Soft Error Protection for Embedded Microprocessors

Jason Blome(1), Shuguang Feng(1), Shantanu Gupta(1), Scott Mahlke(1), Daryl Bradley(2)
(1) University of Michigan, Electrical Engineering and Computer Science
(2) ARM, Ltd.


The Soft Error Problem

[Figure: a clocked flip-flop (D, Q, CLK) in which a transient fault is latched, turning a stored 0 into a 1: a soft error.]


Fault Masking

• Logical: the faulted value does not affect the logical operation of the circuit
• Latching-window: the fault pulse does not reach a state element within the latching window
• Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
• Architectural/software: the incorrect state is overwritten before it is read

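[Figures: a clock (CLK) waveform marking the setup (tsetup) and hold (thold) times that bound the latching window; and the sequence mov r5, 8 / mov r2, 4 / add r6, r2, r5 passing through the decoder and register file (registers 0 to 5), illustrating architectural masking.]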
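Three of the masking categories above can be sketched as simple predicates. This is a minimal illustration, not the authors' analysis. The gate function, the pulse timing model, and the per-instruction (destinations, sources) trace format are all assumptions. Electrical masking is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Pulse:
    start: float  # ns: time the transient reaches the flip-flop input
    width: float  # ns: duration of the transient


def logically_masked(gate, inputs, i):
    """Logical masking: flipping input i leaves the gate output unchanged."""
    flipped = list(inputs)
    flipped[i] ^= 1
    return gate(*inputs) == gate(*flipped)


def latched(pulse, clk_edge, t_setup, t_hold):
    """Latching-window check: the pulse is captured only if it overlaps
    [clk_edge - t_setup, clk_edge + t_hold]; otherwise it is masked."""
    window_start = clk_edge - t_setup
    window_end = clk_edge + t_hold
    return pulse.start < window_end and pulse.start + pulse.width > window_start


def architecturally_masked(trace, reg, fault_cycle):
    """Architectural masking: a corrupted register is harmless if it is
    overwritten before it is read. trace[c] = (dest_regs, src_regs);
    fault_cycle is the first instruction executed after the fault."""
    for dests, srcs in trace[fault_cycle:]:
        if reg in srcs:  # sources are read before the destination is written
            return False
        if reg in dests:
            return True
    return True  # never read again


# The instruction sequence from the slide: mov r5, 8; mov r2, 4; add r6, r2, r5
TRACE = [({"r5"}, set()), ({"r2"}, set()), ({"r6"}, {"r2", "r5"})]
```

Consider the slide's sequence. If a fault corrupts r2 before the sequence starts, it is masked: `mov r2, 4` overwrites r2 before the add reads it. If a fault corrupts r5 after `mov r5, 8`, it propagates into the add.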


Soft Error Rate Trends

[Figure: soft error rate trends (Shivakumar 2002).]

Soft Error Rate Contributions (Mitra 2005)
[Pie chart. Categories: static combinational logic, unprotected SRAMs, sequential elements. Slice values: 49%, 11%, 40%.]

The contribution of faults in combinational logic to the overall soft error rate is increasing.


Outline

• Soft error analysis setup
• Summary of fault analysis results
• Fault tolerance techniques
  ► Register value cache
  ► Strategic deployment of fault detectors
• Conclusion


Fault Analysis Framework

[Figure: the fault injection/error analysis framework. A testbench runs a benchmark on a reference design and a test design of the ARM926EJ-S. A fault injection scheduler drives the test design, and error checking and logging feed report generation. The core diagram shows instruction fetch, instruction decode, register bank, shift, multiply, ALU, mux array, instruction address logic, data address logic, data interface, instruction cache/MMU, data cache/MMU, bus interface, and write buffer/bus interface.]
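The framework compares a fault-injected test design against a fault-free reference design running the same benchmark. The sketch below imitates that lockstep comparison on a hypothetical two-register toy design. The names `step`, `pipe`, `acc`, and `inject_and_compare` are illustrative and are not part of the authors' tooling. The sketch injects single-bit upsets into state elements only, whereas the study also injects faults into combinational logic.

```python
from dataclasses import dataclass

ARCH_REGS = {"acc"}  # hypothetical software-visible register


def step(state, inp):
    """One clock cycle of a toy 8-bit design standing in for the RTL core."""
    return {
        "pipe": inp & 0xFF,                            # latch the new operand
        "acc": (state["acc"] + state["pipe"]) & 0xFF,  # consume last cycle's operand
    }


@dataclass
class FaultResult:
    micro_error: bool = False  # a microarchitectural register diverged
    arch_error: bool = False   # an architectural register diverged


def inject_and_compare(inputs, fault_cycle, reg, bit):
    """Run reference and test copies in lockstep on the same benchmark inputs,
    flip one bit of `reg` in the test copy at `fault_cycle`, and log divergence."""
    ref = {"pipe": 0, "acc": 0}
    test = dict(ref)
    result = FaultResult()
    for cycle, inp in enumerate(inputs):
        ref = step(ref, inp)
        test = step(test, inp)
        if cycle == fault_cycle:
            test[reg] ^= 1 << bit  # single-bit upset in a state element
        for name in ref:
            if ref[name] != test[name]:
                if name in ARCH_REGS:
                    result.arch_error = True
                else:
                    result.micro_error = True
    return result
```

Flipping `pipe` at cycle 0 first appears as a microarchitectural divergence, then corrupts `acc` one cycle later. Flipping it in the last cycle of the run leaves only a microarchitectural divergence.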


Observed Error Rates

Error site                 Faults occurring in registers   Faults occurring in combinational logic
Microarchitectural state   94%                             16%
Architectural state        7%                              4%

At the software interface (architectural state), the error rates for the two fault sites are within 3 percentage points of each other.


Impact of Fault Injection

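[Chart: number of errors (0 to 50) versus cycle (0 to 20) after fault injection. It has four series:
• combinational logic faults causing microarchitectural state errors
• combinational logic faults causing architectural state errors
• sequential state faults causing microarchitectural state errors
• sequential state faults causing architectural state errors]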


Targeting the Faults that Count

• The ARM926EJ-S register file consumes 8.7% of total core area
  ► Yet it is responsible for 57.4% of architectural errors
• Register file area is dominated by combinational logic
  ► How costly and how effective would ECC be?


The Register Value Cache

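[Figure: the register value cache (RVC) beside the register file and its decoder (registers 0 to 5 shown). Read/write addresses and data feed both structures. Three comparators (CMP) check the read result, alongside a Stall/Check CRC path.]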


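The Register Value Cache

[Figure: RVC organization for the read, write, and check operations. The cache has valid bits, an index array addressed by the read/write address, and a value array holding previous read values. A comparator (CMP) flags an error when read data disagrees with the cached value. CRCs are computed over write data and checked to flag errors.]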
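The RVC's read, write, and check operations can be approximated in software. The slides do not specify the CRC width, replacement policy, or cache size. This sketch therefore uses placeholders: CRC-32 via Python's `zlib.crc32`, LRU replacement, and eight entries. `RegisterFileWithRVC` is a hypothetical name. On a hit, the register file output is compared with the cached copy. On a miss, the read stalls and the CRC stored alongside the register is checked.

```python
import zlib
from collections import OrderedDict


def crc(value):
    """CRC-32 of a 32-bit register value (stand-in for the hardware CRC)."""
    return zlib.crc32(value.to_bytes(4, "little"))


class RegisterFileWithRVC:
    """Register file whose reads are checked by a small register value cache."""

    def __init__(self, num_regs=16, rvc_entries=8):
        self.values = [0] * num_regs
        self.crcs = [crc(0)] * num_regs  # one CRC stored per register
        self.rvc = OrderedDict()         # reg index -> value, oldest first
        self.rvc_entries = rvc_entries
        self.stalls = 0

    def write(self, reg, value):
        value &= 0xFFFFFFFF
        self.values[reg] = value
        self.crcs[reg] = crc(value)      # CRC generated from the write data
        if reg in self.rvc:
            self.rvc[reg] = value        # keep the cached copy up to date

    def read(self, reg, flip_mask=0):
        """Return (data, error); flip_mask models a transient fault on the read path."""
        data = self.values[reg] ^ flip_mask
        if reg in self.rvc:              # hit: compare against the cached value
            self.rvc.move_to_end(reg)
            return data, data != self.rvc[reg]
        self.stalls += 1                 # miss: stall and check the stored CRC
        error = crc(data) != self.crcs[reg]
        if not error:
            if len(self.rvc) >= self.rvc_entries:
                self.rvc.popitem(last=False)  # evict the least recently used entry
            self.rvc[reg] = data
        return data, error
```

To exercise both detection paths, replay the example's `mov r5, 8` and `mov r2, 4` as writes, then inject read-path bit flips. A hit catches the flip by comparison, and a miss catches it through the CRC. Only values that pass the CRC check are inserted, so a corrupted read is never cached.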


Example

[Figure: worked example. The sequence mov r5, 8 / mov r2, 4 / add r3, r2, r5 runs on the register file (values stored with CRCs) and the register cache. In the faulty version the add appears as add r3, r1, r4, and a CRC check is shown.]


RVC Fault Coverage

[Chart: RVC fault coverage, marked at 57.4%.]


RVC Overhead


What About the Rest?

• Leverage fault fanout to place detectors at likely targets
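One way to turn fault fanout into a placement decision has two steps. First, record which flip-flops each injected fault was latched into. Then pick detector sites greedily. The greedy maximum-coverage rule, the `REACH` data, and the function names below are assumptions for illustration; the slides do not state the exact selection algorithm.

```python
def fanout_counts(fault_reach):
    """Number of injected faults that were latched into each flip-flop."""
    counts = {}
    for flops in fault_reach.values():
        for ff in flops:
            counts[ff] = counts.get(ff, 0) + 1
    return counts


def place_detectors(fault_reach, budget):
    """Greedy maximum coverage: repeatedly protect the flip-flop that catches
    the most faults not already caught by a chosen detector."""
    uncovered = {f for f, flops in fault_reach.items() if flops}
    chosen = []
    while uncovered and len(chosen) < budget:
        gain = fanout_counts({f: fault_reach[f] for f in uncovered})
        best = max(sorted(gain), key=gain.get)  # ties go to the first name
        chosen.append(best)
        uncovered = {f for f in uncovered if best not in fault_reach[f]}
    return chosen


def coverage(fault_reach, detectors):
    """Fraction of non-masked faults that reach at least one detector."""
    relevant = [flops for flops in fault_reach.values() if flops]
    caught = sum(1 for flops in relevant if flops & set(detectors))
    return caught / len(relevant) if relevant else 1.0


# Hypothetical injection results: fault site -> flip-flops the fault reached
REACH = {
    "alu.n1": {"ff_a", "ff_b"},
    "alu.n2": {"ff_b"},
    "dec.n3": {"ff_c"},
    "dec.n4": {"ff_b", "ff_c"},
    "mux.n5": set(),  # masked before reaching any flip-flop
}
```

On this hypothetical data, `ff_b` alone catches three of the four unmasked faults (75% coverage). Adding `ff_c` covers the rest.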


Fault Fanout


Transient Fault Detector

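[Figure: a main flip-flop paired with a shadow latch, using a delay element and a delayed clock (DCLK). The main flip-flop output Q and the shadow latch value together produce the Error signal.]

A Self-Tuning DVS Processor Using Delay-Error Detection and Correction (S. Das, 2006)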
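The shadow-latch idea can be modeled behaviorally in three parts. The main flip-flop samples its input at the clock edge. The shadow latch samples the same input a fixed delay later. A mismatch between the two raises Error. The single-glitch signal model and the timing values are assumptions, not parameters from the slides or from the Das 2006 design.

```python
def d_input(t, steady, glitch):
    """Value on the flip-flop D input at time t: the steady value, inverted
    while the transient glitch (start, end) is present."""
    start, end = glitch
    return 1 - steady if start <= t < end else steady


def sample_with_shadow(steady, glitch, clk_edge, delay):
    """The main flip-flop samples at the clock edge; the shadow latch samples
    the same input `delay` later. Returns (q, error)."""
    q = d_input(clk_edge, steady, glitch)
    shadow = d_input(clk_edge + delay, steady, glitch)
    return q, q != shadow
```

In this model, detection depends on the glitch's width and timing relative to the delay:
• A glitch narrower than the delay is caught.
• A glitch that spans both sampling instants corrupts both copies and escapes.
• A glitch arriving just after the edge raises Error even though Q was correct.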


Glitch Detector Coverage

[Charts: coverage versus power overhead (percent) and coverage versus area overhead (percent).]


Combined Technique Coverage

[Charts: coverage versus power overhead (percent) and coverage versus area overhead (percent).]


Conclusion

• Circuit-level soft error analysis offers significant insight
• Faults in combinational logic do not require structural duplication
  ► Tradeoffs between coverage and cost are available
  ► A compromise between coverage and cost still yields significant benefits
• 85% fault coverage for only 5.5% area overhead
  ► 2-3x increase in MTTF


Questions?


RVC Hit Rates

[Chart: RVC hit rate (0.7 to 1.0) versus cache size (6 to 16) for the benchmarks cjpeg, djpeg, epic, unepic, g721decode, g721encode, pegwitdecode, pegwitencode, rawcaudio, and rawdaudio, plus their average.]
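The RVC slides show a Stall/Check CRC path for reads not held in the cache, so the hit rate drives the performance cost. A sweep like the one in this chart can be approximated by replaying a trace of register read indices through a fixed-size cache. The LRU policy, the trace format, the even sizes from 6 to 16, and the function names are assumptions for illustration.

```python
from collections import OrderedDict


def rvc_hit_rate(read_trace, entries):
    """Fraction of register reads that hit in an LRU-managed cache of
    `entries` slots (the replacement policy is an assumption)."""
    cache = OrderedDict()
    hits = 0
    for reg in read_trace:
        if reg in cache:
            hits += 1
            cache.move_to_end(reg)         # mark as most recently used
        else:
            if len(cache) >= entries:
                cache.popitem(last=False)  # evict least recently used
            cache[reg] = None
    return hits / len(read_trace) if read_trace else 0.0


def sweep(read_trace, sizes=range(6, 17, 2)):
    """Hit rate for each candidate cache size, as in the chart's x-axis."""
    return {n: rvc_hit_rate(read_trace, n) for n in sizes}
```

A real sweep would feed in per-benchmark register read traces from a simulator in place of a toy list.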