Page 1: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs

Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs

Adwait Jog†, Asit K. Mishra‡, Cong Xu†, Yuan Xie†, N. Vijaykrishnan†, Ravi Iyer‡, Chita R. Das†

†The Pennsylvania State University ‡ Intel Corporation

Page 2: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


STT-RAM as Emerging Memory Technology

• Spin-Torque Transfer RAM (STT-RAM) combines the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory, making it attractive for on-chip cache hierarchies.

• STT-RAM caches suffer from longer write latency and higher write energy consumption than traditional SRAM caches.

Page 3: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs

SRAM vs. STT-RAM

| Cache | Area (mm²) | Read Energy (nJ) | Write Energy (nJ) | Leakage Power (mW) | Read Latency (ns) | Write Latency (ns) | Read @ 2 GHz (cycles) | Write @ 2 GHz (cycles) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 MB SRAM | 2.61 | 0.578 | 0.578 | 4542 | 1.012 | 1.012 | 2 | 2 |
| 4 MB STT-RAM | 3.00 | 1.035 | 1.066 | 2524 | 0.998 | 10.61 | 2 | 22 |

• ~3-4x denser (capacity benefit)

• 1.8x lower leakage energy

• Comparable read latency

• ~11x higher write latency (@ 2 GHz)
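The annotations on this slide can be re-derived from the table values. The short sketch below does that arithmetic; it takes the SRAM leakage power as 4542 mW, the reading that is consistent with the "1.8x lower leakage" annotation, and the dictionary keys are illustrative names, not anything from the original deck.

```python
# Values copied from the SRAM vs. STT-RAM comparison table.
# Assumption: SRAM leakage power is 4542 mW (consistent with the 1.8x annotation).
sram = {"capacity_mb": 1, "area_mm2": 2.61, "leakage_mw": 4542, "write_cycles": 2}
stt = {"capacity_mb": 4, "area_mm2": 3.00, "leakage_mw": 2524, "write_cycles": 22}


def density(cache):
    """Capacity per unit area, in MB per mm^2."""
    return cache["capacity_mb"] / cache["area_mm2"]


density_gain = density(stt) / density(sram)                  # how much denser STT-RAM is
leakage_ratio = sram["leakage_mw"] / stt["leakage_mw"]       # how much lower STT-RAM leakage is
write_penalty = stt["write_cycles"] / sram["write_cycles"]   # write latency ratio at 2 GHz

print(f"density gain:  {density_gain:.2f}x")
print(f"leakage ratio: {leakage_ratio:.2f}x")
print(f"write penalty: {write_penalty:.0f}x")
```

The three ratios land in the ranges quoted on the slide: roughly 3.5x density, 1.8x leakage, and 11x write latency.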

Page 4: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


Proposal : Reduce Retention Time

• Years of data-retention time for STT-RAM may not be required.

• Trade off retention time for lower STT-RAM write latency.

• Challenge: Architecting “Volatile STT-RAM” Caches

• Advantage: Performance and Energy Benefits!

Page 5: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


How to Calculate Optimal Retention Time?

(1) Device Constraints: The retention time of STT-RAM can be reduced only up to a certain limit.

(2) Application Needs: Application characteristics show that a data-retention time in the range of milliseconds is sufficient to make STT-RAM caches effective for CMPs.

Both Device Constraints and Application Needs should be considered for Optimal Results!

Page 6: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs

How to Reduce STT-RAM Write Latency?

[Figure: write current (uA) versus write pulse width (ns), with one curve per retention time (10 years, 1 sec) and the operating point marked. Write current goes down with reduction in retention time.]

| Retention Time of STT-RAM | Write Latency @ 2 GHz |
| --- | --- |
| 10 years | 22 cycles |
| 1 second | 12 cycles |
| 10 milliseconds | 6 cycles |
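To put the retention-time table in perspective, the sketch below computes how much write latency each relaxed retention time saves relative to the 10-year design, and how far each remains from the 2-cycle SRAM write latency listed in the SRAM vs. STT-RAM comparison. The variable and function names are illustrative only.

```python
# Write latency (cycles at 2 GHz) per retention time, from the table above.
write_cycles = {"10 years": 22, "1 second": 12, "10 milliseconds": 6}
SRAM_WRITE_CYCLES = 2  # from the SRAM vs. STT-RAM comparison slide


def summarize(latencies, baseline_key="10 years"):
    """Map each retention time to (percent reduction vs. baseline, ratio vs. SRAM)."""
    baseline = latencies[baseline_key]
    return {
        retention: (100.0 * (baseline - cycles) / baseline, cycles / SRAM_WRITE_CYCLES)
        for retention, cycles in latencies.items()
    }


summary = summarize(write_cycles)
for retention, (reduction_pct, vs_sram) in summary.items():
    print(f"{retention:>16}: {reduction_pct:5.1f}% lower write latency, {vs_sram:.0f}x SRAM")
```

At 10 ms retention the write latency is about 73% lower than the 10-year design, but still 3x that of SRAM.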

Page 7: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


Inter-Write Time (Refresh Time) Distributions of Multi-threaded and Multi-Programmed Benchmarks

[Figure: two stacked-bar panels showing the percentage of blocks (0% to 100%) in each inter-write time bucket (5 ms, 10 ms, 20 ms, 30 ms, 40 ms, 40+ ms). PARSEC panel: frrt., fluid., x264, AVG. SPEC 2006 panel: libq., gcc, namd, AVG.]

Majority (> 50%) of L2 Cache Blocks get refreshed within 10ms

How much non-volatility can be traded off?
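As a rough illustration of how such a distribution could be obtained, the sketch below buckets the gaps between successive writes to the same block using the bucket edges from the figure. The trace format, the sample trace, and the choice to count write intervals (rather than blocks, as the figure's axis does) are assumptions; the slides do not describe the measurement methodology.

```python
from bisect import bisect_left

BUCKET_EDGES_MS = [5, 10, 20, 30, 40]  # upper edges; larger gaps fall in "40+ ms"
LABELS = ["5 ms", "10 ms", "20 ms", "30 ms", "40 ms", "40+ ms"]


def inter_write_histogram(trace):
    """trace: (time_ms, block_addr) write events sorted by time.

    Returns the fraction of inter-write gaps that fall in each bucket.
    """
    last_write = {}
    counts = dict.fromkeys(LABELS, 0)
    for time_ms, addr in trace:
        if addr in last_write:
            gap = time_ms - last_write[addr]
            counts[LABELS[bisect_left(BUCKET_EDGES_MS, gap)]] += 1
        last_write[addr] = time_ms
    total = sum(counts.values())
    return {label: (n / total if total else 0.0) for label, n in counts.items()}


# Hypothetical write trace for two blocks, A and B.
trace = [(0, "A"), (2, "B"), (4, "A"), (9, "B"), (12, "A"), (60, "B")]
hist = inter_write_histogram(trace)
within_10ms = hist["5 ms"] + hist["10 ms"]
print(hist, within_10ms)
```

In this made-up trace, three of the four gaps are at most 10 ms, so a 10 ms retention time would cover most rewrites without any refresh.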

Page 8: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs

Volatile STT-RAM Based Last level Cache Design


How to save the remaining 50% of the blocks?

Answer: Use a selective refresh policy. Only refresh cache blocks that are in MRU slots.

[Figure: block state of a 16-way set (way IDs 0 to 15), with ways grouped into IMP blocks and NON-IMP blocks, and dying blocks marked either "Refresh" or "Do not Refresh".]
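A minimal sketch of the selective refresh idea for one 16-way set follows. It assumes that a block counts as "dying" when its age since the last write is within a small margin of the retention time, and that the first few LRU-stack positions are the MRU slots worth refreshing. `DYING_MARGIN_MS`, `MRU_SLOTS`, and the `Block` fields are hypothetical choices for illustration; the slides do not give these values.

```python
from dataclasses import dataclass

RETENTION_MS = 10.0    # retention time considered in the slides
DYING_MARGIN_MS = 1.0  # hypothetical: "dying" means this close to expiry
MRU_SLOTS = 4          # hypothetical: number of MRU positions treated as important


@dataclass
class Block:
    way: int
    lru_position: int     # 0 = most recently used
    last_write_ms: float  # time of the last write (or refresh)


def is_dying(block, now_ms):
    """True if the block is about to lose its data but has not expired yet."""
    age = now_ms - block.last_write_ms
    return RETENTION_MS - DYING_MARGIN_MS <= age < RETENTION_MS


def blocks_to_refresh(cache_set, now_ms):
    """Return the ways holding dying blocks that sit in MRU slots."""
    return [b.way for b in cache_set
            if is_dying(b, now_ms) and b.lru_position < MRU_SLOTS]


# Hypothetical set state: ways 0, 1 and 10 are close to expiry at t = 10 ms.
last_write = {0: 0.5, 1: 0.5, 2: 5.0, 10: 0.8}
example_set = [Block(way=w, lru_position=w, last_write_ms=last_write.get(w, 8.0))
               for w in range(16)]
refreshed = blocks_to_refresh(example_set, now_ms=10.0)
print(refreshed)
```

Way 10 is also dying in this example, but it sits outside the MRU slots, so it is left to expire.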

Page 9: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


How to refresh?

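[Figure: block state of a 16-way set (way IDs 0 to 15), grouped into IMP blocks and NON-IMP blocks, with a refresh flowchart. Flowchart labels: "Is Buffer Full?" with a NO branch leading to COPY and a YES branch leading to "Dirty?"; "Dirty?" with a YES branch leading to "Write-back to DRAM"; and a COPY BACK step.]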
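One plausible reading of the flowchart labels is sketched below: a dying block is copied into a small buffer if there is room and later copied back into the cache, which rewrites it and restarts its retention period; if the buffer is full, a dirty block is written back to DRAM. The buffer capacity, the data structures, and the handling of clean blocks when the buffer is full (dropping them) are assumptions, not details stated on the slide.

```python
from collections import deque
from dataclasses import dataclass

BUFFER_CAPACITY = 2  # hypothetical buffer size


@dataclass
class Block:
    addr: int
    dirty: bool


def handle_dying_block(block, buffer, dram):
    """Apply the flowchart to one dying block and return the action taken."""
    if len(buffer) < BUFFER_CAPACITY:  # "Is Buffer Full?" -> NO
        buffer.append(block)           # COPY into the buffer
        return "copy"
    if block.dirty:                    # buffer full, "Dirty?" -> YES
        dram[block.addr] = block       # write-back to DRAM
        return "write_back"
    return "drop"                      # assumption: a clean block can simply expire


def copy_back(buffer, cache):
    """COPY BACK: rewrite buffered blocks into the cache, refreshing them."""
    while buffer:
        block = buffer.popleft()
        cache[block.addr] = block


buffer, dram, cache = deque(), {}, {}
dying = [Block(1, True), Block(2, False), Block(3, True), Block(4, False)]
actions = [handle_dying_block(b, buffer, dram) for b in dying]
copy_back(buffer, cache)
print(actions, sorted(cache), sorted(dram))
```

With a two-entry buffer, the first two blocks are refreshed through the buffer, the dirty third block is written back to DRAM, and the clean fourth block is allowed to expire.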

Page 10: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


Results: Speedup Improvement

[Figure: normalized speedup for S-1MB, S-4MB (Ideal), M-4MB, Volatile M-4MB (1 sec), Volatile M-4MB (10 ms), and Revived-M-4MB (10 ms). PARSEC Benchmarks panel: dedup, freq., rtvw., swpts., x264, frrt., fcsim., vips, fluid., AVG. SPEC Benchmarks panel: Instruction Throughput and Weighted Speedup.]

On average, 18% performance improvement for PARSEC multithreaded benchmarks.

On average, 10% improvement in instruction throughput for multi-programmed workloads.

Page 11: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


Results: Energy Improvements

[Figure: two panels, Normalized Leakage Energy and Normalized Dynamic Energy, for dedup, fcsim., freq., rtvw., and AVG. Configurations: S-1MB, M-4MB, Volatile M-4MB (1 sec), Volatile M-4MB (10 ms), Revived M-4MB (10 ms).]

Nominal increase in dynamic energy (4%) over M-4MB because of the buffer scheme.

60% reduction in leakage energy over SRAM designs.

Page 12: Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


Summary

• STT-RAM is a promising technology with high density, low leakage, and read latencies competitive with SRAM.

• High write latency and write energy are impeding its widespread adoption.

• Reducing Retention time can directly reduce the write-latency and write energy of STT-RAM.

• A Simple Buffering Scheme is presented to refresh important diminishing blocks.