Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs


Adwait Jog†, Asit K. Mishra‡, Cong Xu†, Yuan Xie†, N. Vijaykrishnan†, Ravi Iyer‡, Chita R. Das†

†The Pennsylvania State University ‡ Intel Corporation


STT-RAM as Emerging Memory Technology

• Spin-Torque Transfer RAM (STT-RAM) combines the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory, making it attractive for on-chip cache hierarchies.

• STT-RAM caches suffer from long write latency and higher write energy consumption when compared to traditional SRAM caches.

SRAM vs. STT-RAM

Parameter                  1 MB SRAM    4 MB STT-RAM
Area (mm²)                 2.61         3.00
Read energy (nJ)           0.578        1.035
Write energy (nJ)          0.57         1.066
Leakage power (mW)         4542         2524
Read latency (ns)          1.012        0.998
Write latency (ns)         1.012        10.61
Read @ 2 GHz (cycles)      2            2
Write @ 2 GHz (cycles)     2            22

STT-RAM relative to SRAM:

• ~3-4x denser (capacity benefit)
• 1.8x lower leakage energy
• Comparable read latency
• ~11x higher write latency (@ 2 GHz)


Proposal: Reduce Retention Time

• Years of data-retention time for STT-RAM may not be required.

• Trade off retention time for lower STT-RAM write latency.

• Challenge: Architecting “Volatile STT-RAM” Caches

• Advantage: Performance and Energy Benefits!


How to Calculate Optimal Retention Time?

(1) Device Constraints: The retention time of STT-RAM can be reduced only to a certain limit.

(2) Application Needs: Application characteristics show that a data-retention time in the range of milliseconds is sufficient to make STT-RAM caches effective for CMPs.

Both device constraints and application needs should be considered for optimal results!

How to Reduce STT-RAM Write Latency?

[Figure: write current (uA, 0 to 300) versus write pulse width (ns, 1 to 10) for retention times of 10 years and 1 sec, with the operating point marked.]

Write current goes down with a reduction in retention time.

Retention time of STT-RAM    Write latency @ 2 GHz
10 years                     22 cycles
1 second                     12 cycles
10 milliseconds              6 cycles
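The arithmetic behind the retention-time table can be checked with a short sketch. The cycle counts and the 2 GHz clock come from the slide; the function and variable names are illustrative assumptions, not part of the original work.

```python
# Write latency in cycles at 2 GHz for each retention time (values from the slide).
CLOCK_GHZ = 2.0

WRITE_LATENCY_CYCLES = {
    "10 years": 22,
    "1 second": 12,
    "10 milliseconds": 6,
}


def cycles_to_ns(cycles, clock_ghz=CLOCK_GHZ):
    """Convert a cycle count to nanoseconds (one cycle lasts 1 / clock_ghz ns)."""
    return cycles / clock_ghz


def latency_reduction(retention, baseline="10 years"):
    """Return how many times shorter a write is than at the baseline retention time."""
    return WRITE_LATENCY_CYCLES[baseline] / WRITE_LATENCY_CYCLES[retention]


if __name__ == "__main__":
    for retention, cycles in WRITE_LATENCY_CYCLES.items():
        print(
            f"{retention:>16}: {cycles:2d} cycles = {cycles_to_ns(cycles):4.1f} ns, "
            f"{latency_reduction(retention):.2f}x shorter than at 10 years"
        )
```

Under these numbers, moving from 10 years to 10 milliseconds of retention shortens a write from 11 ns to 3 ns.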

How much non-volatility can be traded off?

[Figure: inter-write time (refresh time) distributions of multi-threaded and multi-programmed benchmarks. Y-axis: percentage of blocks (0% to 100%). Legend bins: 5 ms, 10 ms, 20 ms, 30 ms, 40 ms, 40+ ms. PARSEC panel: frrt., fluid., x264, AVG. SPEC 2006 panel: libq., gcc, namd, AVG.]

The majority (> 50%) of L2 cache blocks get refreshed within 10 ms.
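One way to read the inter-write time distribution is as a coverage question: for a candidate retention time, what fraction of blocks is rewritten (and therefore implicitly refreshed) before its data would expire? The sketch below computes that fraction. The sample values are hypothetical, chosen only to illustrate the calculation, and are not measurements from the benchmarks.

```python
from bisect import bisect_right


def fraction_rewritten_within(inter_write_ms, retention_ms):
    """Fraction of blocks whose inter-write time is at most retention_ms."""
    if not inter_write_ms:
        raise ValueError("need at least one inter-write time")
    ordered = sorted(inter_write_ms)
    # bisect_right counts how many samples are <= retention_ms.
    return bisect_right(ordered, retention_ms) / len(ordered)


if __name__ == "__main__":
    # Hypothetical inter-write times in milliseconds for eight blocks (not measured data).
    sample = [1.0, 2.5, 4.0, 8.0, 9.5, 15.0, 35.0, 60.0]
    for retention in (5, 10, 20, 30, 40):
        share = fraction_rewritten_within(sample, retention)
        print(f"retention {retention:2d} ms: {share:.1%} of blocks rewritten in time")
```

Blocks outside that fraction are the ones that need an explicit refresh, a write-back, or can be allowed to expire.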

Volatile STT-RAM Based Last-Level Cache Design

How to save the remaining 50% of the blocks?

Answer: Use a selective refresh policy. Only refresh cache blocks that are in MRU slots.

[Figure: block state per way of a 16-way cache set (way IDs 0 to 15). Legend: IMP blocks, NON-IMP blocks, dying blocks (refresh), dying blocks (do not refresh).]
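The selective refresh policy can be sketched as a filter over one cache set: of the blocks that are about to expire, only those sitting in the most recently used slots are refreshed. This is a minimal illustration, not the authors' implementation. The number of MRU slots treated as important (`mru_slots`) and the list-based recency order are assumptions made for the example.

```python
def select_for_refresh(recency_order, expiring_ways, mru_slots):
    """Split expiring ways into (refresh, skip) lists.

    recency_order: way IDs of one set, ordered from most to least recently used.
    expiring_ways: set of way IDs whose retention time is about to run out.
    mru_slots:     how many of the most recently used positions count as important
                   (an assumed parameter; the slide does not give a value).
    """
    important = set(recency_order[:mru_slots])
    refresh = [w for w in recency_order if w in expiring_ways and w in important]
    skip = [w for w in recency_order if w in expiring_ways and w not in important]
    return refresh, skip


if __name__ == "__main__":
    # A hypothetical 16-way set, most recently used way first.
    order = [3, 7, 0, 12, 5, 9, 1, 15, 2, 4, 6, 8, 10, 11, 13, 14]
    refresh, skip = select_for_refresh(order, {7, 5, 2, 14}, mru_slots=4)
    print("refresh:", refresh)
    print("do not refresh:", skip)
```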

How to refresh?

[Figure: refresh flowchart for a 16-way cache set (block state per way ID 0 to 15, IMP and NON-IMP blocks). Decision nodes: "Is buffer full?" and "Dirty?". Actions: "Copy", "Copy back", and "Write-back to DRAM".]
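The flowchart survives here only as its node labels, so the following sketch fills in the edges under stated assumptions: an expiring IMP block is copied into a small buffer when there is room and later copied back (the rewrite restarts its retention period); when the buffer is full, or the block is NON-IMP, a dirty block is written back to DRAM and a clean block is simply allowed to expire. The class, function names, and the "drop" outcome are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    dirty: bool
    important: bool  # True if the block sits in an MRU slot (an "IMP" block)


def handle_expiring_block(block, buffer, capacity, dram):
    """Decide what happens to a block whose retention time is about to run out.

    Returns "copy", "write-back", or "drop". The branch order is an assumption
    reconstructed from the flowchart labels, not a confirmed design.
    """
    if block.important and len(buffer) < capacity:
        buffer.append(block)  # COPY: park the block in the refresh buffer
        return "copy"
    if block.dirty:
        dram.append(block.tag)  # Write-back to DRAM so the data is not lost
        return "write-back"
    return "drop"  # clean block: DRAM already holds a valid copy


def copy_back(buffer):
    """COPY BACK: rewrite buffered blocks into the cache and empty the buffer."""
    restored = list(buffer)
    buffer.clear()
    return restored


if __name__ == "__main__":
    buffer, dram = [], []
    blocks = [Block(1, True, True), Block(2, True, True), Block(3, False, False)]
    for blk in blocks:
        print(blk.tag, handle_expiring_block(blk, buffer, capacity=1, dram=dram))
    print("copied back:", [b.tag for b in copy_back(buffer)])
```

Keeping the buffer small bounds the extra dynamic energy of the copy and copy-back writes, at the cost of falling back to write-backs when many important blocks expire at once.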

Results: Speedup Improvement

[Figure: normalized speedup for PARSEC benchmarks (dedup, freq., rtvw., swpts., x264, frrt., fcsim., vips, fluid., AVG.), and instruction throughput and weighted speedup for SPEC benchmarks. Configurations: S-1MB, S-4MB (Ideal), M-4MB, Volatile M-4MB (1 sec), Volatile M-4MB (10 ms), Revived-M-4MB (10 ms).]

On average, 18% performance improvement for PARSEC multithreaded benchmarks.

On average, 10% improvement in instruction throughput for multi-programmed workloads.

Results: Energy Improvements

[Figure: normalized leakage energy and normalized dynamic energy for dedup, fcsim., freq., rtvw., and AVG. Configurations: S-1MB, M-4MB, Volatile M-4MB (1 sec), Volatile M-4MB (10 ms), Revived M-4MB (10 ms).]

Nominal increase in dynamic energy (4%) over M-4MB because of the buffer scheme.

60% reduction in leakage energy over SRAM designs.


Summary

• STT-RAM is a promising technology, which has high density, low leakage and competitive read latencies compared to SRAM.

• High write latency and write energy are impeding its widespread adoption.

• Reducing retention time can directly reduce the write latency and write energy of STT-RAM.

• A simple buffering scheme is presented to refresh important blocks that are about to expire.