
Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability

By Wei Zhang, IEEE Member

IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 12, DECEMBER 2005

Roza Ghamari, Bogazici University

Why Fault-Tolerance in Cache?

Given current trends in transistor size, voltage, and clock frequency, future microprocessors will become increasingly susceptible to transient hardware failures

Cache memories are more vulnerable

Aggressive leakage control techniques over caches also have a negative impact

Cache soft errors can easily be propagated

1/22

Outline

1. Introduction

2. Replication Cache in detail

3. Schemes under Consideration

4. Evaluation Methodology

5. Results

6. Conclusion

7. References

2/22

Introduction

Error Correcting Techniques:

Single Error Correcting-Double Error Detecting (SEC-DED)

Parity Check

N Modular Redundancy (NMR)

In-Cache Replication (I-CR)

3/22

Introduction (Cont.)

Single Error Correcting-Double Error Detecting (SEC-DED)

Fundamental limitation in error detection and correction

Not capable of correcting double-bit or larger errors

Needs a read-modify-write cycle

Impacts performance

Nontrivial energy overhead
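The SEC-DED limitation above can be illustrated with a small sketch. It uses an extended Hamming (8,4) code, chosen here only because it is the smallest SEC-DED code; the slides do not say which code a real cache would use, and the function names `encode`, `decode`, and `flip` are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # non-power-of-two positions hold data bits


def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    # The parity bit at position p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        for pos in range(1, 8):
            if pos != p and pos & p:
                code[p] ^= code[pos]
    # Position 0 holds the overall parity bit, which adds double-error detection.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity is treated as one flipped bit, located by the
        # syndrome (syndrome 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        return "double_error_detected", None
    nibble = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, nibble


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    damaged = list(code)
    for pos in positions:
        damaged[pos] ^= 1
    return damaged
```

A single flipped bit is repaired, while two flipped bits are only reported: the decoder has no way to tell which two positions were hit, which is the limitation the slide points at.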

4/22


Parity Check

Cannot detect an even number of bit errors

No error correction

N Modular Redundancy (NMR) with Majority Voting

Too expensive for microprocessors or embedded systems with stringent cost and area constraints
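The two points above can be checked with a few lines of code. This is a minimal sketch, assuming even parity over one byte and three-way redundancy (N = 3); the helper names are hypothetical and not taken from the paper.

```python
def byte_parity(value):
    """Even-parity bit for one byte: 1 if the byte holds an odd number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2


def parity_detects(original, corrupted):
    """True if a parity bit stored for `original` exposes `corrupted`."""
    return byte_parity(original) != byte_parity(corrupted)


def majority_vote(a, b, c):
    """Bitwise majority of three copies: each bit takes the value held by two or more."""
    return (a & b) | (a & c) | (b & c)
```

For example, `parity_detects(0x5A, 0x5A ^ 0x01)` is true, but `parity_detects(0x5A, 0x5A ^ 0x03)` is false because two flipped bits leave the parity unchanged. `majority_vote` repairs one bad copy out of three, at the cost of storing every value three times.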

5/22


In-Cache Replication (I-CR):

Exploit “dead” blocks in the data cache to store the replicas for “hot” blocks

A nontrivial portion of the data is unprotected, which is not acceptable for applications demanding very high reliability

No perfect dead block predictor exists, which causes performance overhead

Replicas overlap the data, which causes performance degradation

6/22


Replication Cache:

Main idea: use a small fully associative cache to store the replica for every write to the L1 data cache

Provides replicas for 100% of loads

Has no impact on performance

Much more area efficient

7/22

Replication Cache in detail

A small fully associative cache in between the CPU and the L2 cache

Store the replicas for the “dirty” data in the L1 data cache

Address mapping is straightforward

In case of replication cache capacity misses, some replicas may be written back to the L2 cache

8/22

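[Figure: memory hierarchy. The CPU connects to the L1 I-Cache, the L1 D-Cache, and the R-Cache (replication cache), which sit above the L2 Cache and Memory.]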

Replication Cache in detail (cont.)

When do we replicate?

Replicate data when it is written by the processor (replicate the “dirty” data)

Replicate the data in case of a replica miss in the replication cache

How do we protect the primary data and replicas?

Maintain a parity bit at byte granularity (no performance penalty in the common case)

9/22


How do we replace the cache block if the replication cache is full?

Discard the least recently used block

Replicate the data in L2 in case of a replica miss (for applications that require full replication of the “dirty” data)

Use the LRU (Least-Recently-Used) algorithm for replica replacement
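The replicate-on-write and LRU replacement policy described above can be sketched as follows. This is an illustrative model only: the class name `ReplicationCache`, the default of 8 entries, the `write_back_on_evict` switch, and the dictionary standing in for L2 are all assumptions, not details from the paper.

```python
from collections import OrderedDict


class ReplicationCache:
    """Small fully associative store of replicas with LRU replacement."""

    def __init__(self, entries=8, write_back_on_evict=True):
        self.entries = entries
        self.write_back_on_evict = write_back_on_evict
        self.blocks = OrderedDict()  # block address -> replica, oldest first
        self.l2 = {}                 # stand-in for the L2 cache
        self.write_backs = 0

    def on_store(self, block_addr, data):
        """Replicate every processor write to the L1 data cache."""
        if block_addr in self.blocks:
            self.blocks.move_to_end(block_addr)  # refresh LRU position
        elif len(self.blocks) >= self.entries:
            victim, victim_data = self.blocks.popitem(last=False)  # evict LRU
            if self.write_back_on_evict:
                # Optional policy: keep a copy in L2 for full replication.
                self.l2[victim] = victim_data
                self.write_backs += 1
        self.blocks[block_addr] = data

    def lookup(self, block_addr):
        """Return the replica for a block, or None on a replica miss."""
        if block_addr not in self.blocks:
            return None
        self.blocks.move_to_end(block_addr)
        return self.blocks[block_addr]
```

With `write_back_on_evict=False` the victim is simply discarded, which corresponds to the first option on the slide; with it enabled, evicted replicas land in L2, which corresponds to the second.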

10/22


How many replicas do we need?

Make multiple replicas within one replication cache (much more area efficient)

How do we detect soft errors?

Compare the data in L1 and its replica in the replication cache in parallel

Loads take two cycles and stores take one cycle

11/22


How do we recover from soft errors?

Block is not “dirty”: load the block from L2

Block is “dirty”: use the replicas in the replication cache to correct the errors

Soft errors in a replica: use majority voting if multiple copies of the same data exist
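The recovery rules above can be written as one decision function. This is a sketch under stated assumptions: every copy is a single byte stored with one even-parity bit, and voting is done bitwise over the L1 copy and two replicas. The function names and the tuple layout are hypothetical; the slides do not specify how the voting is wired.

```python
def byte_parity(value):
    """Even-parity bit for one byte."""
    return bin(value & 0xFF).count("1") % 2


def majority_vote(a, b, c):
    """Bitwise majority of three copies."""
    return (a & b) | (a & c) | (b & c)


def recover(l1, dirty, l2_value, replicas):
    """Return (value, source) for a load.

    l1 and each entry of replicas are (value, stored_parity) pairs.
    """
    value, stored = l1
    if byte_parity(value) == stored:
        return value, "l1"          # common case: no error detected
    if not dirty:
        return l2_value, "l2"       # clean block: L2 still holds a good copy
    for replica_value, replica_parity in replicas:
        if byte_parity(replica_value) == replica_parity:
            return replica_value, "replica"
    if len(replicas) >= 2:
        # Every copy failed its parity check; fall back to majority voting.
        return majority_vote(value, replicas[0][0], replicas[1][0]), "vote"
    raise RuntimeError("unrecoverable soft error")
```

Voting only helps when the damaged copies disagree in different bit positions; if two copies are hit in the same bit, the vote returns the wrong value.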

12/22

Schemes under Consideration

Base: normal L1 data cache without the replication cache; parity protection

RC-P: one replication cache; parity protection; in case of soft errors, the replication cache is accessed

RC-C: the replication cache and the L1 data cache are searched in parallel and compared with each other before the load returns

RC-2: two replicas in the replication cache for every write (majority voting)

13/22

Schemes under Consideration (Cont.)

For all RC schemes, conservatively assume two cycles for load operations and one cycle for store operations

The RC-C and RC-2 schemes use parallel comparison to detect errors, which allows multi-bit error detection

The one-cycle latency of the parallel comparison is hidden if the load proceeds speculatively

14/22

Evaluation Methodology

Evaluation Metrics

Execution Cycles: time taken to execute 200 million application instructions

Loads with Replica: the fraction of read hits that have replicas in the replication cache

Implemented by modifying SimpleScalar 3.0

Eight applications from SPEC 2000 are used for evaluation
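The two metrics defined above reduce to simple ratios. The sketch below assumes the simulator exposes raw counters; the function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration, not results from the paper.

```python
def loads_with_replica(read_hits, read_hits_with_replica):
    """Fraction of L1 read hits that also have a replica in the replication cache."""
    if read_hits == 0:
        return 0.0
    return read_hits_with_replica / read_hits


def normalized_cycles(scheme_cycles, base_cycles):
    """Execution cycles of a scheme relative to the Base configuration."""
    return scheme_cycles / base_cycles
```

For instance, `loads_with_replica(1000, 950)` gives 0.95, and `normalized_cycles(210, 200)` gives 1.05, which would read as a 5% slowdown relative to Base.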

15/22

Results

Size of the Replication Cache

16/22

[Figure: Loads with Replica (y-axis, 0 to 1.2) for bzip2 and equake at replication cache sizes 2, 4, 8, 16, and 32.]

Results (Cont.)

17/22

Verify the effectiveness of the replication cache

[Figure: Load_with_replica and hit rate for bzip2 and equake; x-axis: 8K, 16K, 32K, 64K, 128K; y-axis: 0.97 to 1.]

Results (Cont.)

Comparison between schemes

18/22

[Figure: Loads with Replica (y-axis, 0 to 1.2) under RC-P, RC-2, and ICR for bzip2, equake, gcc, gzip, mcf, mesa, vortex, and vpr.]

Results (Cont.)

Performance Comparison

19/22

[Figure: Replication Cache Write-Back Rate (y-axis, 0 to 0.06) for RC-P across the benchmarks (bzip2, equake, ..., vpr).]

Results (Cont.)

Performance Comparison

20/22

[Figure: Normalized execution cycles for RC-C (y-axis, 0.98 to 1.14) across the benchmarks (bzip2, equake, ..., vpr).]

Conclusion

A Fully Associative Replication Cache

RC-P:

▪ Applications that only need parity-based protection

RC-C, RC-2:

▪ Applications that require higher data integrity

▪ Applications that operate in highly noisy environments

21/22

References

[1] J. Ray, J.C. Hoe, and B. Falsafi, “Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery,” Proc. MICRO, Dec. 2001.

[2] W. Zhang, S. Gurumurthi, M. Kandemir, and A. Sivasubramaniam, “ICR: In-Cache Replication for Enhancing Data Cache Reliability,” Proc. Int’l Conf. Dependable Systems and Networks (DSN), 2003.

[3] V. Degalahal, N. Vijaykrishnan, and M.J. Irwin, “Analyzing Soft Errors in Leakage Optimized SRAM Design,” Proc. VLSI Design Conf., Jan. 2003.

22/22

Thanks
