Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability
By Wei Zhang, IEEE Member
IEEE Transactions on Computers, Vol. 54, No. 12, December 2005
Roza Ghamari, Bogazici University
Why Fault Tolerance in Caches?
Given current trends in transistor size, supply voltage, and clock frequency, future microprocessors will become increasingly susceptible to transient hardware failures; cache memories are especially vulnerable
Aggressive leakage control techniques applied to caches also have a negative impact on reliability
Soft errors in caches can easily propagate to other parts of the system
Outline
1. Introduction
2. Replication Cache in detail
3. Schemes under Consideration
4. Evaluation Methodology
5. Results
6. Conclusion
7. References
Introduction
Existing error protection techniques:
Single Error Correcting-Double Error Detecting (SEC-DED)
Parity Check
N Modular Redundancy (NMR)
In-Cache Replication (I-CR)
Introduction (Cont.)
Single Error Correcting-Double Error Detecting (SEC-DED)
Fundamental limits on error detection and correction:
Cannot correct double-bit (or larger) errors
Needs a read-modify-write cycle for sub-word writes, which hurts performance and adds nontrivial energy overhead
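The following is a minimal sketch of an extended Hamming (13,8) SEC-DED code protecting one byte. It shows the limit above in practice. The code layout, the position lists, and the `encode`/`decode` names are illustrative assumptions, not the cache ECC used in the paper. A single flipped bit is corrected, but two flipped bits are only detected.

```python
from typing import List, Optional, Tuple

DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # data bit positions (1-indexed)
PARITY_POS = [1, 2, 4, 8]               # Hamming parity positions (powers of two)

def encode(byte: int) -> List[int]:
    """Encode 8 data bits into a 13-bit SEC-DED codeword; index 0 holds overall parity."""
    code = [0] * 13
    for i, pos in enumerate(DATA_POS):
        code[pos] = (byte >> i) & 1
    for p in PARITY_POS:
        # Parity bit p covers every other position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, 13) if pos & p and pos != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code

def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return ('ok' | 'corrected' | 'uncorrectable', data or None)."""
    code = list(code)
    syndrome = 0
    for p in PARITY_POS:
        if sum(code[pos] for pos in range(1, 13) if pos & p) % 2:
            syndrome |= p
    odd_flips = sum(code) % 2  # 1 if an odd number of bits flipped
    if syndrome == 0 and odd_flips == 0:
        status = "ok"
    elif odd_flips == 1 and syndrome <= 12:
        code[syndrome] ^= 1  # single error; syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        return "uncorrectable", None  # e.g., a double-bit error: detected, not fixed
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POS))
    return status, data

cw = encode(0xA5)
one = list(cw)
one[6] ^= 1   # single-bit error
two = list(cw)
two[3] ^= 1
two[10] ^= 1  # double-bit error
print(decode(one))  # ('corrected', 165)
print(decode(two))  # ('uncorrectable', None)
```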
Introduction (Cont.)
Parity Check
Cannot detect errors that flip an even number of bits; provides no error correction
N Modular Redundancy (NMR) with majority voting
Too expensive for microprocessors or embedded systems with stringent cost and area constraints
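This small sketch contrasts the two techniques above. The function names are illustrative. A single even-parity bit catches one flipped bit but misses any even number of flips. A bitwise 2-of-3 majority vote (NMR with N = 3) masks a flip in any one copy, but it does so by storing the data three times.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word contains an odd number of 1 bits."""
    return bin(word).count("1") % 2

def parity_ok(word: int, stored_parity: int) -> bool:
    """Recompute parity and compare it against the stored bit."""
    return parity(word) == stored_parity

def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote (NMR with N = 3, i.e., TMR)."""
    return (a & b) | (a & c) | (b & c)

data = 0b1011_0010
p = parity(data)
print(parity_ok(data ^ 0b0000_0100, p))  # False: a single flip is detected
print(parity_ok(data ^ 0b0000_0110, p))  # True: two flips go unnoticed
print(majority3(data ^ 0b1000_0000, data, data ^ 0b0000_0001) == data)  # True
```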
Introduction (Cont.)
In-Cache Replication (I-CR):
Exploits "dead" blocks in the data cache to store replicas of "hot" blocks
A nontrivial portion of data remains unprotected, which is not acceptable for applications demanding very high reliability
No dead-block predictor is perfect, so mispredictions add performance overhead
Replicas compete with primary data for cache space, degrading performance
Introduction (Cont.)
Replication Cache:
Main idea: use a small fully associative cache to store a replica for every write to the L1 data cache
Provides replicas for up to 100% of loads, has no impact on performance, and is much more area efficient
Replication Cache in detail
A small fully associative cache placed between the CPU and the L2 cache
Stores the replicas of the "dirty" data in the L1 data cache
Address mapping is straightforward
On replication cache capacity misses, some replicas may be written back to the L2 cache
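Address mapping is simple because a fully associative cache has no index field. An address splits into a tag and a block offset, and the tag is compared against every entry. The sketch below assumes a 32-byte block, since the slides do not give the block size. `split_address` and `lookup` are hypothetical helpers. The linear scan stands in for the parallel tag comparison that hardware performs.

```python
from typing import List, Optional, Tuple

BLOCK_BYTES = 32  # assumed block size; not stated in the slides
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1  # log2(32) = 5

def split_address(addr: int) -> Tuple[int, int]:
    """Fully associative: no index field, so an address is just (tag, offset)."""
    return addr >> OFFSET_BITS, addr & (BLOCK_BYTES - 1)

def lookup(entries: List[Tuple[int, bytes]], addr: int) -> Optional[bytes]:
    """Compare the tag against every entry (a parallel CAM match in hardware)."""
    tag, _ = split_address(addr)
    for entry_tag, replica in entries:
        if entry_tag == tag:
            return replica
    return None

rc = [(0x1000 >> OFFSET_BITS, b"A" * BLOCK_BYTES), (0x2040 >> OFFSET_BITS, b"B" * BLOCK_BYTES)]
print(split_address(0x2047))   # (258, 7)
print(lookup(rc, 0x2047)[:1])  # b'B'
print(lookup(rc, 0x3000))      # None
```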
[Figure: placement of the replication cache: CPU; L1 I-Cache, L1 D-Cache, and R-Cache; L2 Cache; Memory]
Replication Cache in detail (cont.)
When do we replicate?
When data is written by the processor (i.e., replicate the "dirty" data)
On a replica miss in the replication cache
How do we protect the primary data and the replicas?
By maintaining a parity bit at byte granularity (no performance penalty in the common case)
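Byte-granularity parity costs one extra bit per byte (12.5% storage). Each check is a cheap XOR that can run on every load, and a failed check pinpoints the faulty byte. An even number of flips within the same byte still goes unnoticed. The minimal sketch below uses illustrative helper names that do not come from the paper.

```python
from typing import List

def byte_parities(block: bytes) -> List[int]:
    """One even-parity bit per byte, stored alongside the primary data or a replica."""
    return [bin(b).count("1") & 1 for b in block]

def faulty_bytes(block: bytes, stored: List[int]) -> List[int]:
    """Indices of bytes whose recomputed parity disagrees with the stored bit."""
    return [i for i, (now, then) in enumerate(zip(byte_parities(block), stored)) if now != then]

block = bytes([0x12, 0x34, 0x56, 0x78])
stored = byte_parities(block)
corrupted = bytearray(block)
corrupted[2] ^= 0x01  # single-bit upset in byte 2
print(faulty_bytes(bytes(corrupted), stored))  # [2]
```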
Replication Cache in detail (cont.)
How do we replace a block when the replication cache is full?
Discard the least recently used replica, or
Write the evicted replica back to the L2 cache, so the data stays replicated (for applications that require full replication of the "dirty" data)
Use the LRU (Least Recently Used) algorithm for replica replacement
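The sketch below models LRU replica replacement with the two eviction policies above. `ReplicationCache`, `replicate`, and the `write_back` flag are hypothetical names. A plain dict stands in for the L2 cache. An `OrderedDict` keeps entries in recency order, so the front entry is always the LRU victim.

```python
from collections import OrderedDict
from typing import Dict, Optional

class ReplicationCache:
    """Sketch of an LRU replica store; `write_back` selects the eviction policy."""

    def __init__(self, capacity: int, write_back: bool, l2: Dict[int, bytes]):
        self.capacity = capacity
        self.write_back = write_back
        self.l2 = l2                # stand-in for the L2 cache: block address -> data
        self.lines = OrderedDict()  # block address -> replica, ordered LRU to MRU

    def replicate(self, block: int, data: bytes) -> None:
        """Install or refresh a replica (on a store, or on a replica miss)."""
        if block in self.lines:
            self.lines.move_to_end(block)
        self.lines[block] = data
        if len(self.lines) > self.capacity:
            victim, victim_data = self.lines.popitem(last=False)  # least recently used
            if self.write_back:
                self.l2[victim] = victim_data  # keep a copy of the dirty data in L2
            # otherwise the replica is simply discarded

    def get(self, block: int) -> Optional[bytes]:
        if block not in self.lines:
            return None
        self.lines.move_to_end(block)  # a hit refreshes recency
        return self.lines[block]

l2 = {}
rc = ReplicationCache(capacity=2, write_back=True, l2=l2)
rc.replicate(1, b"a")
rc.replicate(2, b"b")
rc.get(1)              # block 1 becomes most recently used
rc.replicate(3, b"c")  # full: block 2 is the LRU victim
print(list(rc.lines), l2)  # [1, 3] {2: b'b'}
```

With `write_back=False`, the same sequence simply drops block 2's replica.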
Replication Cache in detail (cont.)
How many replicas do we need?
Multiple replicas can be kept within a single replication cache (much more area efficient)
How do we detect soft errors?
Compare the data in L1 with its replica in the replication cache in parallel; loads take two cycles and stores take one cycle
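Comparing the L1 copy with its replica catches errors that byte parity misses, such as two flips in the same byte. This is the multi-bit detection benefit claimed for the comparison-based schemes. The sketch assumes the replica itself is intact. Helper names are illustrative.

```python
from typing import List

def parity_bits(block: bytes) -> List[int]:
    return [bin(b).count("1") & 1 for b in block]

def detect_by_parity(block: bytes, stored: List[int]) -> List[int]:
    """Byte indices whose parity no longer matches the stored bits."""
    return [i for i, p in enumerate(parity_bits(block)) if p != stored[i]]

def detect_by_comparison(primary: bytes, replica: bytes) -> List[int]:
    """Byte indices where the L1 copy and its replica disagree (parallel in hardware)."""
    return [i for i, (a, b) in enumerate(zip(primary, replica)) if a != b]

original = bytes([0x0F, 0xF0, 0xAA, 0x55])
stored = parity_bits(original)
replica = bytes(original)
hit = bytearray(original)
hit[1] ^= 0b0000_0011  # double-bit error inside one byte
print(detect_by_parity(bytes(hit), stored))       # [] (missed)
print(detect_by_comparison(bytes(hit), replica))  # [1] (caught)
```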
Replication Cache in detail (cont.)
How do we recover from soft errors?
Block not "dirty": reload the block from the L2 cache
Block "dirty": use the replicas in the replication cache to correct the error
Soft error in a replica: use majority voting if multiple copies of the same data exist
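The three recovery cases can be sketched as one function. `recover_block` and its parameters are hypothetical names. The single-replica branch assumes the replica has already passed its own parity check. With two or more replicas (as in RC-2), the sketch takes a byte-wise majority vote across all copies.

```python
from collections import Counter
from typing import List, Optional

def recover_block(dirty: bool, primary: bytes, replicas: List[bytes],
                  l2_copy: bytes) -> Optional[bytes]:
    """Return corrected block data, or None if it cannot be recovered."""
    if not dirty:
        return l2_copy      # clean block: L2 holds an identical copy
    if not replicas:
        return None         # dirty and unreplicated: unrecoverable
    if len(replicas) == 1:
        return replicas[0]  # assumes the replica passed its own parity check
    # Two or more replicas: byte-wise majority vote over primary + replicas.
    copies = [primary] + replicas
    voted = bytearray()
    for column in zip(*copies):
        value, votes = Counter(column).most_common(1)[0]
        if votes * 2 <= len(copies):
            return None     # no strict majority for this byte
        voted.append(value)
    return bytes(voted)

good = b"\x10\x20\x30"
print(recover_block(True, b"\x10\x21\x30", [good, b"\x11\x20\x30"], b"stale") == good)  # True
```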
Schemes under Consideration
Base: normal L1 data cache without the replication cache; parity protection
RC-P: one replication cache with parity protection; when a soft error is detected, the replication cache is accessed
RC-C: the replication cache and the L1 data cache are searched in parallel, and the two copies are compared before the load returns
RC-2: two replicas in the replication cache for every write (majority voting)
Schemes under Consideration (Cont.)
For all RC schemes, we conservatively assume two cycles for loads and one cycle for stores
RC-C and RC-2 use parallel comparison to detect errors, which enables multi-bit error detection
The one-cycle latency of the parallel comparison can be hidden if the processor proceeds speculatively
Evaluation Methodology
Evaluation Metrics
Execution cycles: time taken to execute 200 million application instructions
Loads with replica: the fraction of read hits that have replicas in the replication cache
Implemented by modifying SimpleScalar 3.0
Eight applications from SPEC 2000 used for evaluation
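The "loads with replica" metric can be illustrated with a toy trace-driven calculation. This is only a sketch, not the SimpleScalar-based setup. It assumes every load is an L1 read hit. The `(op, block)` trace format and the `loads_with_replica` function are hypothetical. Replicas are installed on stores and, optionally, on replica misses, and are replaced in LRU order.

```python
from collections import OrderedDict
from typing import Iterable, Tuple

def loads_with_replica(trace: Iterable[Tuple[str, int]], rc_entries: int,
                       replicate_on_replica_miss: bool = False) -> float:
    """Fraction of loads whose block has a replica, treating every load as an L1 read hit."""
    rc = OrderedDict()  # block address -> present; ordered from LRU to MRU
    loads = covered = 0
    for op, block in trace:
        if op == "load":
            loads += 1
            if block in rc:
                covered += 1
            elif not replicate_on_replica_miss:
                continue  # no replica exists and none is created
        # Stores, replica hits, and (optionally) replica misses install or refresh a replica.
        rc[block] = True
        rc.move_to_end(block)
        while len(rc) > rc_entries:
            rc.popitem(last=False)  # evict the least recently used replica
    return covered / loads if loads else 0.0

trace = [("store", 1), ("store", 2), ("load", 1), ("store", 3), ("load", 2), ("load", 3)]
print(loads_with_replica(trace, rc_entries=2))  # 0.6666666666666666
print(loads_with_replica(trace, rc_entries=4))  # 1.0
```

Growing the replication cache from two to four entries raises the fraction in this toy trace. That is the kind of sensitivity the size results examine.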
Results
Size of the Replication Cache
[Figure: Loads with Replica for bzip2 and equake versus replication cache size (2, 4, 8, 16, 32)]
Results (Cont.)
Verifying the effectiveness of the replication cache
[Figure: loads with replica and hit rate for bzip2 and equake at cache sizes 8K, 16K, 32K, 64K, and 128K]
Results (Cont.)
Comparison between Schemes
[Figure: Loads with Replica for RC-P, RC-2, and ICR on bzip2, equake, gcc, gzip, mcf, mesa, vortex, and vpr]
Results (Cont.)
Performance Comparison
[Figure: replication cache write-back rate for RC-P on bzip2, equake, gcc, gzip, mcf, mesa, vortex, and vpr]
Results (Cont.)
Performance Comparison
[Figure: normalized execution cycles for RC-C on bzip2, equake, gcc, gzip, mcf, mesa, vortex, and vpr]
Conclusion
A fully associative replication cache
RC-P:
▪ For applications that need only parity-based protection
RC-C, RC-2:
▪ For applications that require higher data integrity
▪ For applications operating in highly noisy environments
References
[1] J. Ray, J.C. Hoe, and B. Falsafi, “Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery,” Proc. MICRO, Dec. 2001.
[2] W. Zhang, S. Gurumurthi, M. Kandemir, and A. Sivasubramaniam, “ICR: In-Cache Replication for Enhancing Data Cache Reliability,” Proc. Int’l Conf. Dependable Systems and Networks (DSN), 2003.
[3] V. Degalahal, N. Vijaykrishnan, and M.J. Irwin, “Analyzing Soft Errors in Leakage Optimized SRAM Design,” Proc. VLSI Design Conf., Jan. 2003.
Thanks