Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs
Adwait Jog†, Asit K. Mishra‡, Cong Xu†, Yuan Xie†, N. Vijaykrishnan†, Ravi Iyer‡, Chita R. Das†
†The Pennsylvania State University ‡ Intel Corporation
STT-RAM as Emerging Memory Technology
• Spin-Torque Transfer RAM (STT-RAM) combines the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory, making it attractive for on-chip cache hierarchies.
• STT-RAM caches suffer from longer write latency and higher write energy consumption than traditional SRAM caches.
SRAM vs. STT-RAM
Metric                      1 MB SRAM    4 MB STT-RAM
Area (mm²)                  2.61         3.00
Read Energy (nJ)            0.578        1.035
Write Energy (nJ)           0.578        1.066
Leakage Power (mW)          4542         2524
Read Latency (ns)           1.012        0.998
Write Latency (ns)          1.012        10.61
Read @ 2 GHz (cycles)       2            2
Write @ 2 GHz (cycles)      2            22

• ~3-4x denser (capacity benefit)
• 1.8x lower leakage energy
• Comparable read latency
• ~11x higher write latency (@ 2 GHz)
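The headline ratios on this slide can be rechecked from the table values. The sketch below is only an illustration: the dictionary layout and helper name are made up for this example, and the SRAM leakage figure is read as 4542 mW, an assumption chosen because it agrees with the "1.8x lower leakage" annotation.

```python
# Values taken from the SRAM vs. STT-RAM comparison on this slide.
# Assumption: SRAM leakage power is read as 4542 mW.
sram = {"capacity_mb": 1, "area_mm2": 2.61, "leakage_mw": 4542, "write_cycles": 2}
stt = {"capacity_mb": 4, "area_mm2": 3.00, "leakage_mw": 2524, "write_cycles": 22}


def density(cache):
    """Capacity per unit area, in MB per mm^2 (hypothetical helper)."""
    return cache["capacity_mb"] / cache["area_mm2"]


# How many times more capacity STT-RAM packs into the same area.
density_gain = density(stt) / density(sram)

# How many times lower the STT-RAM leakage power is.
leakage_ratio = sram["leakage_mw"] / stt["leakage_mw"]

# How many times longer an STT-RAM write takes, in cycles at 2 GHz.
write_slowdown = stt["write_cycles"] / sram["write_cycles"]

print(f"density gain:   {density_gain:.2f}x")
print(f"leakage ratio:  {leakage_ratio:.2f}x")
print(f"write slowdown: {write_slowdown:.1f}x")
```

Under these readings the density gain falls inside the "~3-4x" range quoted on the slide, and the write slowdown is exactly 22 / 2 = 11.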
Proposal: Reduce Retention Time
• Years of data-retention time for STT-RAM may not be required.
• Trade off retention time for lower STT-RAM write latency.
• Challenge: Architecting “Volatile STT-RAM” Caches
• Advantage: Performance and Energy Benefits!
How to Calculate Optimal Retention Time?
(1) Device Constraints: The retention time of STT-RAM can be reduced only down to a certain limit.
(2) Application Needs: Application characteristics show that a data-retention time in the range of milliseconds is sufficient to make STT-RAM caches effective for CMPs.
Both Device Constraints and Application Needs should be considered for optimal results!
How to Reduce STT-RAM Write Latency?
[Figure: Write Current (µA) versus Write Pulse Width (ns), with curves for retention times of 10 years and 1 sec and the operating point marked. Write current goes down with reduction in retention time.]

Retention Time of STT-RAM    Write Latency @ 2 GHz
10 years                     22 cycles
1 second                     12 cycles
10 milliseconds              6 cycles
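To make the trade-off concrete, the retention-time and write-latency pairs on this slide can be turned into write-speedup factors relative to the 10-year design. This is a minimal sketch; the table and function names are hypothetical, and only the three cycle counts come from the slide.

```python
# Write latency in cycles at 2 GHz for each retention time (from the slide).
WRITE_CYCLES_AT_2GHZ = {
    "10 years": 22,
    "1 second": 12,
    "10 milliseconds": 6,
}

CLOCK_GHZ = 2.0  # clock frequency stated on the slide


def write_speedup(retention, baseline="10 years"):
    """How many times faster a write is than with the baseline retention time."""
    return WRITE_CYCLES_AT_2GHZ[baseline] / WRITE_CYCLES_AT_2GHZ[retention]


def write_latency_ns(retention):
    """Write latency in nanoseconds, derived from the cycle count and the clock."""
    return WRITE_CYCLES_AT_2GHZ[retention] / CLOCK_GHZ


for retention in WRITE_CYCLES_AT_2GHZ:
    print(f"{retention}: {write_latency_ns(retention):.1f} ns, "
          f"{write_speedup(retention):.2f}x vs. 10 years")
```

Note that the nanosecond values here are cycle counts divided by the clock frequency, so they are rounded up to whole cycles and need not match raw device latencies.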
How much non-volatility can be traded off?
[Figure: Inter-Write Time (Refresh Time) Distributions of Multi-threaded and Multi-Programmed Benchmarks. Two stacked-bar panels: PARSEC (frrt., fluid., x264, AVG.) and SPEC 2006 (libq., gcc, namd, AVG.). Y-axis: Percentage of Blocks (0% to 100%). Legend: 5 ms, 10 ms, 20 ms, 30 ms, 40 ms, 40+ ms.]
Majority (> 50%) of L2 cache blocks get refreshed within 10 ms.
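The question on this slide amounts to asking what fraction of blocks is rewritten (and thereby refreshed) before a given retention time runs out. A minimal sketch of that measurement follows; the function name is hypothetical and the sample inter-write times are invented purely for illustration, not taken from the benchmarks.

```python
def fraction_rewritten_within(inter_write_ms, retention_ms):
    """Fraction of blocks whose inter-write time fits inside the retention time."""
    if not inter_write_ms:
        raise ValueError("need at least one inter-write time")
    covered = sum(1 for t in inter_write_ms if t <= retention_ms)
    return covered / len(inter_write_ms)


# Hypothetical inter-write times in milliseconds (illustrative only).
sample_ms = [2.0, 4.5, 7.0, 8.0, 9.5, 25.0, 38.0, 55.0]

for retention_ms in (5, 10, 20, 30, 40):
    share = fraction_rewritten_within(sample_ms, retention_ms)
    print(f"retention {retention_ms:>2} ms: {share:.1%} of blocks rewritten in time")
```

Blocks that are not rewritten within the retention time are the ones that would lose their data in a volatile STT-RAM cache unless something refreshes them.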
Volatile STT-RAM Based Last-Level Cache Design
How to save the remaining 50% of the blocks?
Answer: Use Selective Refresh Policy.
• Only refresh cache blocks which are in MRU slots.
[Figure: Block state of a 16-way cache set (way IDs 0 to 15), split into IMP blocks and NON-IMP blocks. Dying IMP blocks are refreshed; dying NON-IMP blocks are not refreshed.]
How to refresh?
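[Flowchart: Block state of a 16-way cache set (way IDs 0 to 15), split into IMP blocks and NON-IMP blocks. For a dying block, check "Is Buffer Full?". If NO: COPY the block to the buffer, then COPY BACK. If YES: check "Dirty?"; if YES, write back to DRAM.]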
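One possible reading of the refresh flow on this slide, combined with the selective refresh policy, is sketched below. Everything here is an assumption made for illustration: the `Block` type, the `mru_slots` and `buffer_capacity` parameters, the action names, and in particular the handling of clean blocks and of dirty NON-IMP blocks, which the slide does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    way_rank: int  # 0 = most recently used position; larger = closer to LRU
    dirty: bool    # True if the block holds data newer than DRAM


def handle_dying_block(block, buffer, buffer_capacity, mru_slots):
    """Decide what to do with a block that is about to lose its data.

    Returns one of "refresh", "write_back_to_dram", or "invalidate".
    """
    important = block.way_rank < mru_slots  # IMP blocks sit in the MRU slots
    if important and len(buffer) < buffer_capacity:
        # Buffer not full: copy the block out, to be copied back (refreshed).
        buffer.append(block)
        return "refresh"
    if block.dirty:
        # Assumption: any dirty block that is not refreshed must reach DRAM.
        return "write_back_to_dram"
    # Assumption: a clean block can be dropped, since DRAM has a valid copy.
    return "invalidate"


buffer = []
print(handle_dying_block(Block(way_rank=1, dirty=True), buffer, 1, 4))   # refresh
print(handle_dying_block(Block(way_rank=2, dirty=True), buffer, 1, 4))   # write_back_to_dram
print(handle_dying_block(Block(way_rank=12, dirty=False), buffer, 1, 4)) # invalidate
```

In this sketch the buffer is never drained; a fuller model would remove an entry once the block has been copied back into the cache.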
Results: Speedup Improvement
[Figure: Normalized speedup for PARSEC Benchmarks (dedup, freq., rtvw., swpts., x264, frrt., fcsim., vips, fluid., AVG.), and Instruction Throughput and Weighted Speedup for SPEC Benchmarks. Configurations: S-1MB, S-4MB (Ideal), M-4MB, Volatile M-4MB(1sec), Volatile M-4MB(10ms), Revived-M-4MB(10ms).]
• On average, 18% performance improvement for PARSEC multithreaded benchmarks.
• On average, 10% improvement in instruction throughput for multi-programmed workloads.
Results: Energy Improvements
[Figure: Normalized Leakage Energy and Normalized Dynamic Energy for dedup, fcsim., freq., rtvw., and AVG. Configurations: S-1MB, M-4MB, Volatile M-4MB(1sec), Volatile M-4MB(10ms), Revived M-4MB(10ms).]
• Nominal increase in dynamic energy (4%) over M-4MB because of the buffer scheme.
• 60% reduction in leakage energy over SRAM designs.
Summary
• STT-RAM is a promising technology, which has high density, low leakage and competitive read latencies compared to SRAM.
• High write latency and write energy are impeding its widespread adoption.
• Reducing Retention time can directly reduce the write-latency and write energy of STT-RAM.
• A simple buffering scheme is presented to refresh important dying blocks before they expire.