University of Michigan, Electrical Engineering and Computer Science
Cost-Efficient Soft Error Protection for Embedded Microprocessors
Jason Blome¹, Shuguang Feng¹, Shantanu Gupta¹, Scott Mahlke¹, Daryl Bradley²
¹University of Michigan
²ARM, Ltd.
The Soft Error Problem
[Figure: a flip-flop (D, CLK, Q) in which a transient fault is latched and becomes a soft error]
Fault Masking
• Logical: the faulted value does not affect the logical operation of the circuit
• Latching-Window: the fault pulse does not reach a state element within the latching window
• Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
• Architectural/Software: the incorrect state is overwritten before it is read
[Figure: logical masking at a gate; the latching window around the CLK edge (tsetup, thold); a register file example running mov r5, 8, then mov r2, 4, then add r6, r2, r5]
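The logical and latching-window conditions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' model: the AND-gate example, the function names, and the timing values used to exercise them are all assumptions.

```python
def and_gate(a: int, b: int) -> int:
    """Two-input AND gate on single-bit values."""
    return a & b


def logically_masked(a: int, b: int, faulted_input: str) -> bool:
    """True if flipping one input leaves the AND gate output unchanged."""
    good = and_gate(a, b)
    if faulted_input == "a":
        bad = and_gate(a ^ 1, b)
    else:
        bad = and_gate(a, b ^ 1)
    return good == bad


def latching_window_masked(pulse_start: float, pulse_width: float,
                           clock_edge: float, t_setup: float,
                           t_hold: float) -> bool:
    """True if the fault pulse never overlaps [edge - t_setup, edge + t_hold]."""
    window_open = clock_edge - t_setup
    window_close = clock_edge + t_hold
    pulse_end = pulse_start + pulse_width
    return pulse_end < window_open or pulse_start > window_close
```

For example, a flipped input on an AND gate whose other input is 0 is logically masked, while a pulse that overlaps the setup/hold interval of the clock edge is not latching-window masked.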
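Soft Error Rate Trends
[Figure: soft error rate trends (Shivakumar 2002)]
Soft Error Rate Contributions
[Figure: pie chart (Mitra 2005). Categories: static combinational logic, unprotected SRAMs, sequential elements. Slice values: 49%, 11%, 40% (pairing of values to categories not preserved in this transcript)]
Increasing contribution of faults in combinational logic to the overall soft error rate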
Outline
• Soft error analysis setup
• Summary of fault analysis results
• Fault tolerance techniques
  ► Register value cache
  ► Strategic deployment of fault detectors
• Conclusion
Fault Analysis Framework
[Figure: fault injection/error analysis framework. Testbench components: benchmark, reference design, test design, fault injection scheduler, error checking and logging, report generation. ARM926EJ-S blocks shown: instruction fetch, instruction decode, register bank, multiply, ALU, shift, instruction address logic, data address logic, data interface, mux array, instruction cache/MMU, data cache/MMU, bus interface, write buffer/bus interface]
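The reference-design versus test-design comparison can be illustrated with a toy software model. The sketch below is only an analogy for the framework on this slide: the three-instruction ISA, the register count, and the `run` and `campaign` helpers are hypothetical, and the real framework operates on the ARM926EJ-S design rather than on a Python interpreter.

```python
import random

# Toy program: (op, dest, src1_or_imm, src2). Hypothetical ISA for illustration.
PROGRAM = [
    ("mov", 5, 8, None),   # r5 = 8
    ("mov", 2, 4, None),   # r2 = 4
    ("add", 6, 2, 5),      # r6 = r2 + r5
]


def step(regs, instr):
    """Execute one instruction on a register list in place."""
    op, dest, a, b = instr
    if op == "mov":
        regs[dest] = a
    elif op == "add":
        regs[dest] = (regs[a] + regs[b]) & 0xFFFFFFFF


def run(program, fault=None, num_regs=8):
    """Run the program; fault is (cycle, reg, bit), flipped before that cycle."""
    regs = [0] * num_regs
    for cycle, instr in enumerate(program):
        if fault is not None and fault[0] == cycle:
            regs[fault[1]] ^= 1 << fault[2]
        step(regs, instr)
    return regs


def campaign(program, trials, seed=0, num_regs=8):
    """Inject one random register bit flip per trial and return the fraction
    of trials whose final register state differs from the fault-free run."""
    rng = random.Random(seed)
    golden = run(program, num_regs=num_regs)
    errors = 0
    for _ in range(trials):
        fault = (rng.randrange(len(program)), rng.randrange(num_regs),
                 rng.randrange(32))
        if run(program, fault, num_regs) != golden:
            errors += 1
    return errors / trials
```

A flip in r5 before `mov r5, 8` executes is overwritten and never observed, while a flip in r2 just before the `add` propagates into r6.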
Observed Error Rates
Faults Occurring in Registers
Error Site                  Error Rate
Microarchitectural State    94%
Architectural State         7%
Faults Occurring in Combinational Logic
Error Site                  Error Rate
Microarchitectural State    16%
Architectural State         4%
At the software interface, error rates within 3%
Impact of Fault Injection
[Figure: number of errors (0 to 50) versus cycle (0 to 20). Series: Comb. Logic: Microarchitectural State Errors; Comb. Logic: Architectural State Errors; Seq. State: Microarchitectural State Errors; Seq. State: Architectural State Errors]
Targeting the Faults that Count
• ARM926EJ-S register file consumes 8.7% of total core area
  ► Responsible for 57.4% of architectural errors
• Register file area dominated by combinational logic
  ► ECC cost, efficacy?
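A quick calculation shows why the register file is a worthwhile target. The sketch below assumes that the remaining architectural errors are attributable to the remaining core area, which the slide does not state; the function name is hypothetical.

```python
def error_density_ratio(area_share: float, error_share: float) -> float:
    """Errors per unit area in one block, relative to the rest of the core."""
    block_density = error_share / area_share
    rest_density = (1.0 - error_share) / (1.0 - area_share)
    return block_density / rest_density


# Figures from the slide: 8.7% of core area, 57.4% of architectural errors.
ratio = error_density_ratio(0.087, 0.574)
print(f"register file error density is {ratio:.1f}x the rest of the core")
```

Under that assumption, the register file produces roughly 14 times as many architectural errors per unit area as the rest of the core.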
The Register Value Cache
[Figure: block diagram of the register file (with decoder, entries 0 to 5) alongside the register value cache. Labels: Read/Write Addr/Data, Read Result, CMP comparators, Stall/Check CRC]
The Register Value Cache
[Figure: register value cache structure. Labels: Valid, Index Array, Value Array, Previous Read Values, CRC, CMP, Read/Write Addr, Read Data, Write Data, Error. Legend: Read Operation, Write Operation, Check Operation]
Example
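[Figure: animated example with the register file (decoder, entries 0 to 5) and the register cache. Instruction sequence shown: mov r5, 8, then mov r2, 4, then add r3, r2, r5 (one step shows add r3, r1, r4). The cache holds the values 8 and 4, each with a CRC, and a Check CRC step is shown]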
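The register value cache can be modelled in software as a small table of recent register values, each stored with a CRC, that is compared against every register file read. The sketch below is an assumption-laden illustration, not the hardware design: the class and method names are hypothetical, `zlib.crc32` stands in for whatever CRC the hardware uses, and the oldest-written-first eviction policy is a guess.

```python
import zlib
from collections import OrderedDict


def crc(value: int) -> int:
    """CRC-32 of a 32-bit register value (stand-in for the hardware CRC)."""
    return zlib.crc32(value.to_bytes(4, "little"))


class RegisterValueCache:
    """Small cache of recent register values, each stored with a CRC."""

    def __init__(self, size: int):
        self.size = size
        self.entries = OrderedDict()  # register index -> (value, crc)

    def write(self, reg: int, value: int) -> None:
        """Record a register write; evict the oldest entry when full."""
        self.entries.pop(reg, None)
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)
        self.entries[reg] = (value, crc(value))

    def check_read(self, reg: int, rf_value: int) -> str:
        """Compare a register file read against the cached copy."""
        if reg not in self.entries:
            return "miss"  # no cached copy, so the read goes unchecked
        cached, stored_crc = self.entries[reg]
        if cached == rf_value:
            return "ok"
        # Mismatch: use the CRC to decide which copy is corrupted.
        if crc(cached) == stored_crc:
            return "register_file_error"
        return "cache_error"
```

Using the values from the example slide (r5 = 8, r2 = 4), a bit flip in the register file copy of r5 is reported as a register file error, because the cached copy still matches its CRC.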
RVC Fault Coverage
[Figure: RVC fault coverage chart, with the 57.4% level marked]
RVC Overhead
What About the Rest?
• Leverage fault fanout to place detectors at likely targets
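One way to picture fanout-driven placement is as a coverage problem: given which flip-flops each fault site can reach, pick a limited number of flip-flops that together observe the most fault sites. The greedy selection below is an assumption made for illustration, not the authors' stated method, and the `fanout` map and `place_detectors` name are hypothetical.

```python
def place_detectors(fanout: dict, budget: int) -> list:
    """Greedy selection of flip-flops that catch the most uncovered fault sites.

    fanout maps each fault site to the set of flip-flops its faults reach.
    """
    uncovered = set(fanout)
    chosen = []
    while uncovered and len(chosen) < budget:
        # Count how many still-uncovered fault sites reach each flip-flop.
        counts = {}
        for site in uncovered:
            for ff in fanout[site]:
                counts[ff] = counts.get(ff, 0) + 1
        if not counts:
            break
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        uncovered = {s for s in uncovered if best not in fanout[s]}
    return chosen
```

With a budget smaller than the number of flip-flops, the selection favors the flip-flops that many fault sites fan out to, which is the coverage-versus-cost tradeoff the talk explores.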
Fault Fanout
Transient Fault Detector
[Figure: transient fault detector. A main flip-flop (D, CLK, Q) is paired with a shadow latch clocked through a delay; the detector raises an Error signal]
Source: A Self-Tuning DVS Processor Using Delay-Error Detection and Correction, S. Das, 2006
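The detection principle can be sketched as two samples of the same data input taken a short delay apart: if a transient glitch corrupts one sample but not the other, the two disagree and an error is flagged. The timing model, function names, and numeric values below are assumptions for illustration only.

```python
def signal_at(t: float, stable_value: int, glitch_start: float,
              glitch_width: float) -> int:
    """Value on the D input at time t; the glitch inverts it briefly."""
    in_glitch = glitch_start <= t < glitch_start + glitch_width
    return stable_value ^ 1 if in_glitch else stable_value


def detect(stable_value: int, glitch_start: float, glitch_width: float,
           clock_edge: float, delay: float):
    """Return (q, error): the main flip-flop output and the detector flag."""
    # Main flip-flop samples at the clock edge.
    q = signal_at(clock_edge, stable_value, glitch_start, glitch_width)
    # Shadow latch samples the same input after the delay.
    shadow = signal_at(clock_edge + delay, stable_value, glitch_start,
                       glitch_width)
    return q, q != shadow
```

In this model a glitch narrower than the delay that lands on the clock edge is caught, while a pulse wide enough to cover both sample points escapes detection.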
Glitch Detector Coverage
[Figure: two panels, Power and Area, each plotting coverage against percent overhead]
Combined Technique Coverage
[Figure: two panels, Power and Area, each plotting coverage against percent overhead]
Conclusion
• Circuit-level soft error analysis offers significant insight
• Faults in combinational logic do not require structural duplication
  ► Coverage versus cost tradeoffs available
  ► Significant benefits in compromise
• 85% fault coverage for only 5.5% area overhead
  ► 2-3x increase in MTTF
Questions?
RVC Hit Rates
[Figure: hit rate (0.7 to 1) versus cache size (6 to 16). Series: cjpeg, djpeg, epic, unepic, g721decode, g721encode, pegwitdecode, pegwitencode, rawcaudio, rawdaudio, average]