Page 1

IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults

Songjun Pan (1,2), Yu Hu (1), and Xiaowei Li (1)

(1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

(2) Graduate University of Chinese Academy of Sciences

Page 2

Outline

• Background and Related Work

• IVF Computing Methodology

• Experimental Results

• Conclusions

Page 3

Background

Intermittent faults are emerging as a major source of failures in microprocessors [DSN’02]

[Figure: failure rate vs. lifetime (bathtub curve) with infant mortality, useful life, and wear-out stages; annotations: deep submicron era, defect escape, soft errors, faster aging, intermittent faults]

Page 4

Intermittent Faults

• Description

• Occur frequently and irregularly for a period of time

• Caused by loose connections, manufacturing residues, process variation, or in-progress wear-out, combined with voltage and temperature fluctuations

• Characteristics

• Occur in bursts at the same location

• Removed by replacing the offending circuit

• Activated or deactivated by PVT (process, voltage, and temperature) variations

Page 5

Protecting the Microprocessor

• Information redundancy techniques

• Parity and error-correcting codes

– High area overhead

– High power consumption

• Hardware redundancy techniques

• Dual modular redundancy / triple modular redundancy

– 100% to 200% area overhead

• Software redundancy techniques

• Redundant multi-threading

– 10% to 30% performance overhead

• Conventional protection methods ensure high reliability but incur high overhead
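To make the detection-versus-masking trade-off concrete, here is a minimal sketch contrasting one even-parity bit per word (detects a single-bit error but cannot correct it) with a bitwise 2-of-3 majority voter as used in triple modular redundancy (masks one faulty copy at the cost of storing three). The function names are hypothetical and the sketch is illustrative only, not a model of any specific hardware.

```python
def even_parity(word: int) -> int:
    """Even-parity bit of a non-negative integer: 1 if it has an odd number of 1s."""
    return bin(word).count("1") & 1


def parity_detects(stored: int, parity_bit: int) -> bool:
    """True when the recomputed parity disagrees with the stored parity bit."""
    return even_parity(stored) != parity_bit


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies (TMR)."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    word = 0b1011_0010
    p = even_parity(word)              # extra storage: 1 bit per word
    corrupted = word ^ 0b0000_0100     # single-bit flip in one copy
    print(parity_detects(corrupted, p))             # True: detected, not corrected
    print(tmr_vote(word, corrupted, word) == word)  # True: the fault is masked
```

Parity adds one bit per word but only detects; TMR corrects by outvoting, which is why its area overhead is far larger.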

Page 6

Trading Off Reliability and Overhead

• Key observation

• Not all faults lead to externally visible program failures

• A fault in the branch predictor: does not affect correctness, only performance

• A fault in the program counter: almost always matters

• Which bits matter?

• ACE bit / un-ACE bit: Architecturally Correct Execution (ACE) bit [MICRO’03]

• ACE bit: a bit that, if changed, leads to an externally visible error; an un-ACE bit can change without affecting the program output

• Reliability evaluation

• Protect the most vulnerable structures

Page 7

Related Metrics

• Mean Time To Failure (MTTF) / Mean Time Between Repair (MTBR)

• Masking effect

• Structure utilization

• Soft error vulnerability analysis

• Architectural Vulnerability Factor (AVF) [MICRO’03]

• Program Vulnerability Factor (PVF) [HPCA’09]

• Hard fault vulnerability analysis

• Hard-Faults AVF (H-AVF) [SIGMETRICS’06]

Vulnerability to intermittent faults is rarely considered because of their diverse causes and behaviors
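For comparison with IVF, AVF [MICRO’03] is commonly computed as the fraction of a structure's bit-cycles that hold ACE state. The sketch below assumes a hypothetical input format (per-entry lists of half-open ACE cycle ranges) and simplifies by treating every bit of an entry as ACE whenever the entry is.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) range of cycles


def avf(ace_intervals: List[List[Interval]], bits_per_entry: int,
        total_cycles: int) -> float:
    """Fraction of the structure's bit-cycles that are ACE.

    ace_intervals[e] holds the non-overlapping cycle ranges in which entry e
    is ACE; as a simplification, every bit of the entry is treated as ACE then.
    """
    total_bit_cycles = len(ace_intervals) * bits_per_entry * total_cycles
    ace_bit_cycles = sum(
        bits_per_entry * (end - start)
        for entry in ace_intervals
        for start, end in entry
    )
    return ace_bit_cycles / total_bit_cycles


# Three 64-bit entries observed for 10 cycles: 4 + 5 + 0 ACE cycles.
print(avf([[(0, 4)], [(2, 7)], []], bits_per_entry=64, total_cycles=10))  # 0.3
```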

Page 8

Our Contributions

• Propose a metric, the Intermittent Vulnerability Factor (IVF), to characterize the vulnerability to intermittent faults

• IVF definition: a structure’s IVF is the probability that an intermittent fault in that structure causes an externally visible error

• Present IVF computing algorithms for the reorder buffer and the register file

• Compute IVF under different fault configurations

Page 9

Intermittent Fault Models

[Figure: diagram mapping causes and mechanisms to fault models at the logic level]

• Causes, mechanisms, and locations shown: cell, solder joint, inductive noise, electromigration, crosstalk, soft breakdown, variation of metal R&C, fluctuation of leakage current, manufacturing residues, timing violations, oxide breakdown, intermittent contacts, memory, buses, interconnection lines, power supply

• Fault models at the logic level: intermittent stuck-at, intermittent short, intermittent open, intermittent pulse, intermittent delay, intermittent indetermination

Page 10

Intermittent Stuck-at Faults

• Intermittent stuck-at faults

• Intermittently force the correct value to logic one or logic zero

• Vulnerable structures: storage structures such as memory and the register file

• Key parameters: burst length, active time, and inactive time

• The fault has an adverse effect during the active time

[Figure: intermittent fault timeline with burst length, active time, and inactive time marked along the time axis]
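A minimal sketch of how an intermittent stuck-at fault could be modeled on a storage cell, assuming the fault's active time is given as a list of half-open cycle windows: while a window is active the faulty bit reads as the stuck value, and during inactive time the stored value is returned unchanged. The function names and the window representation are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) range of cycles


def is_active(cycle: int, active_windows: List[Interval]) -> bool:
    """True if the intermittent fault is active at `cycle`."""
    return any(start <= cycle < end for start, end in active_windows)


def read_with_fault(value: int, bit: int, stuck_at: int, cycle: int,
                    active_windows: List[Interval]) -> int:
    """Value observed when reading a cell with an intermittent stuck-at fault.

    While the fault is active, bit `bit` is forced to `stuck_at` (0 or 1);
    during inactive time the stored value is returned unchanged.
    """
    if not is_active(cycle, active_windows):
        return value
    mask = 1 << bit
    return (value | mask) if stuck_at else (value & ~mask)


windows = [(10, 15), (20, 25)]  # two active periods
print([read_with_fault(0b1010, 0, 1, c, windows) for c in (12, 17, 22)])  # [11, 10, 11]
```

Note that a stuck-at fault is logically masked whenever the stored bit already equals the stuck value, which is one reason not every active period produces an error.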

Page 11

IVF Computing

• Determine whether an intermittent fault affects program execution

• Analyze ACE bits / critical time

• Set the three key parameters: burst length, active time, and inactive time

• Burst length: randomly generated from [10T, 30T]

• Duty cycle: 50%

• Start time: randomly generated

• Compute IVFs for the reorder buffer and the register file
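The parameter settings above can be sketched as a fault sampler. The slide does not define T or how active and inactive times are laid out inside a burst, so this sketch assumes T is a base time unit measured in cycles, the active time is a given parameter, and a 50% duty cycle means the inactive time equals the active time, repeated until the burst ends. `sample_burst` is a hypothetical name.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) range of cycles


def sample_burst(rng: random.Random, total_cycles: int, t_unit: int,
                 active_time: int, duty_cycle: float = 0.5) -> List[Interval]:
    """Sample one burst of an intermittent fault and return its active windows.

    Assumptions (for illustration only):
      * burst length is drawn uniformly from [10*T, 30*T], with T = t_unit cycles;
      * the start time is drawn uniformly so the burst fits in the run;
      * inside the burst the fault is active for `active_time` cycles, then
        inactive long enough to give the requested duty cycle, repeatedly.
    """
    if active_time <= 0 or not 0 < duty_cycle <= 1:
        raise ValueError("need active_time > 0 and 0 < duty_cycle <= 1")
    burst_len = rng.randint(10 * t_unit, 30 * t_unit)
    start = rng.randint(0, max(0, total_cycles - burst_len))
    inactive_time = round(active_time * (1 - duty_cycle) / duty_cycle)
    windows, t, end = [], start, start + burst_len
    while t < end:
        windows.append((t, min(t + active_time, end)))  # clip at burst end
        t += active_time + inactive_time
    return windows


print(sample_burst(random.Random(1), total_cycles=10_000, t_unit=1, active_time=2))
```

Passing a seeded `random.Random` keeps each sampled fault reproducible, which helps when comparing IVF across configurations.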


Page 12

IVF Computing – Reorder Buffer

ACE Bit Analysis

[Figure: ROB entries vs. cycles with ACE bits marked (labels X, Y, Z), and an example intermittent fault with active and inactive times; planar representation with bursts B1, B2, B3]

$$IVF_{rob} = \frac{\sum_{s=1}^{B} D_{ACE}(U_s)}{B}$$

Example: $IVF_{rob} = 2/6 = 1/3$
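The formula averages D_ACE over B sampled faults. The slide does not spell out D_ACE, so this sketch assumes D_ACE(U_s) is 1 when any active window of fault U_s overlaps an ACE interval of the faulty ROB entry and 0 otherwise; it ignores logical masking (e.g. a stuck-at value equal to the stored bit). The names `ivf_rob` and `d_ace` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) range of cycles
Fault = Tuple[int, List[Interval]]  # (ROB entry index, active windows)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open cycle ranges share at least one cycle."""
    return a[0] < b[1] and b[0] < a[1]


def d_ace(windows: List[Interval], ace: List[Interval]) -> int:
    """Assumed D_ACE: 1 if any active window hits an ACE interval, else 0."""
    return int(any(overlaps(w, a) for w in windows for a in ace))


def ivf_rob(faults: List[Fault], ace_by_entry: Dict[int, List[Interval]]) -> float:
    """IVF_rob = (sum over s of D_ACE(U_s)) / B for B sampled faults U_s."""
    if not faults:
        raise ValueError("need at least one sampled fault")
    hits = sum(d_ace(w, ace_by_entry.get(entry, [])) for entry, w in faults)
    return hits / len(faults)


ace = {0: [(0, 10)], 1: [(20, 30)]}
faults = [(0, [(5, 7)]), (0, [(12, 14)]), (1, [(18, 21)])]
print(ivf_rob(faults, ace))  # 2 of the 3 faults overlap ACE time
```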

Page 13

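IVF Computing – Register File

Critical Time Analysis

[Figure: lifetime of register version n (between versions n-1 and n+1): allocation, write W, reads R1, R2, ..., Rlast, deallocation; critical time and non-critical intervals marked; example faults F1, F2, F3]

$$IVF_{reg} = \frac{\sum_{e=1}^{E} D_{CT}(U_e)}{E}$$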
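A sketch of the register-file computation, assuming (as in common register-file vulnerability analyses, not confirmed by the slide) that a version's critical time runs from its write W through its last read Rlast, and that D_CT(U_e) is 1 when an active window of fault U_e overlaps any critical time of the faulty register. A version that is never read has no critical time. `RegVersion`, `d_ct`, and `ivf_reg` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) range of cycles


@dataclass
class RegVersion:
    """One version of a physical register (allocation to deallocation)."""
    alloc: int
    write: int
    dealloc: int
    reads: List[int] = field(default_factory=list)

    def critical_time(self) -> Optional[Interval]:
        """Cycles W..Rlast inclusive, or None if the value is never read."""
        if not self.reads:
            return None
        return (self.write, max(self.reads) + 1)


def d_ct(windows: List[Interval], versions: List[RegVersion]) -> int:
    """Assumed D_CT: 1 if an active window overlaps any version's critical time."""
    for v in versions:
        ct = v.critical_time()
        if ct is not None and any(s < ct[1] and ct[0] < e for s, e in windows):
            return 1
    return 0


def ivf_reg(faults: List[Tuple[int, List[Interval]]],
            versions_by_reg: Dict[int, List[RegVersion]]) -> float:
    """IVF_reg = (sum over e of D_CT(U_e)) / E for E sampled faults U_e."""
    if not faults:
        raise ValueError("need at least one sampled fault")
    hits = sum(d_ct(w, versions_by_reg.get(reg, [])) for reg, w in faults)
    return hits / len(faults)


versions = {0: [RegVersion(alloc=0, write=2, dealloc=20, reads=[5, 9]),
                RegVersion(alloc=20, write=25, dealloc=40)]}  # never read
faults = [(0, [(0, 2)]), (0, [(6, 8)]), (0, [(12, 15)]), (0, [(26, 30)])]
print(ivf_reg(faults, versions))  # 0.25: only the fault at cycles 6-7 hits W..Rlast
```

Faults before the write or after the last read fall in non-critical time and are counted as harmless under this assumption.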

Page 14

Experimental Setup

• Simulated processor configuration

• Execution-driven simulator: Sim-Alpha

• Reorder buffer / register file: 80 / 80 entries

• 4 integer ALUs, 2 integer multipliers, 2 floating-point ALUs

• Hybrid branch predictor: 4K global + 2-level 1K local + 4K choice

• 64KB 2-way L1 data cache, 2MB direct-mapped L2 cache

• Workload

• SPEC2000 integer benchmark suite

• Simulate 100M instructions with SimPoint

Page 15

IVF vs. AVF

[Figure: reorder buffer AVF and IVF for burst lengths BL 10, BL 20, and BL 30 across the benchmarks; y-axis from 0 to 80]

• IVF varies significantly across benchmarks

• Longer burst length, higher IVF

• IVF is much higher than AVF

Page 16

Different Fault Configurations

[Figure: reorder buffer IVF under fault configurations Config1_4, Config1_2, Config2_4, and Config2_2; y-axis from 0 to 70]

• IVF varies little across burst-length configuration files

• IVF varies significantly with active time

Page 17

IVF at Entry Level

[Figure: register file IVF per entry (x-axis: entry index, ticks 1 to 71) for active times 1, 2, and 4; y-axis from 0 to 100; entries grouped into architecture registers and renaming registers]

• IVF varies across entries

• Architecture registers are more vulnerable than renaming registers

Page 18

Implications

• Quantitatively guide reliability design at an early design stage and evaluate system reliability

• Harden only selected structures/entries to achieve high reliability while minimizing overhead

• Razor [MICRO’03]

• ParShield [DSN’07]

• Can be extended to analyze other structures (issue queue, load/store queue, and caches)

Page 19

Conclusions

• Propose a methodology to characterize the vulnerability of microprocessor structures to intermittent faults

• Compute IVF for the reorder buffer and the register file

• IVF varies significantly both across structures and across entries within a structure, motivating protection of the most vulnerable structures to improve system reliability

Page 20

• Thank you for your attention

• Questions?