24
© 2007 IBM Corporation International Symposium on Computer Architecture (ISCA-2009) Mar 25, 2022 Scalable High Performance Main Memory System Using PCM Technology Moinuddin K. Qureshi Viji Srinivasan and Jude Rivers IBM T. J. Watson Research Center, Yorktown Heights, NY

Scalable High Performance Main Memory System Using PCM Technology

Embed Size (px)

DESCRIPTION

Scalable High Performance Main Memory System Using PCM Technology. Moinuddin K. Qureshi Viji Srinivasan and Jude Rivers IBM T. J. Watson Research Center, Yorktown Heights, NY. Main Memory Capacity Wall. More cores in system  More concurrency  Larger working set - PowerPoint PPT Presentation

Citation preview

© 2007 IBM CorporationInternational Symposium on Computer Architecture (ISCA-2009) Apr 19, 2023

Scalable High Performance Main Memory System Using PCM Technology

Moinuddin K. QureshiViji Srinivasan and Jude Rivers

IBM T. J. Watson Research Center, Yorktown Heights, NY

2 © 2007 IBM Corporation

Main Memory Capacity Wall

More cores in system More concurrency Larger working set

Demand for main memory capacity continues to increase

Main Memory System consisting of DRAM are hitting:1. Cost wall: Major % of cost of large servers is main memory2. Scaling wall: DRAM scaling to small technology is challenge3. Power wall:

IBM P670 Server Processor Memory

Small (4 proc, 16GB) 384 Watts 314 Watts

Large (16 proc, 128GB) 840 Watts 1223 Watts

Source: Lefurgy et al. IEEE Computer 2003

Need a practical solution to increase main-memory capacity

3 © 2007 IBM Corporation

The Technology Hierarchy

More capacity by cheaper, denser, (slower) technology

21 23 27 211 213 215 219 223

Typical access latency in processor cycles (@ 4 GHz)

L1(SRAM) EDRAM DRAM HDD

25 29 217 221

Flash PCM

Phase Change Memory (PCM) promising candidate for large capacity main memory

High-Performance DiskMemory System

4 © 2007 IBM Corporation

Outline

Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary

5 © 2007 IBM Corporation

What is Phase Change Memory?

Phase change material (chalcogenide glass) exists in two states:1. Amorphous: high resistivity2. Crystalline: low resistivity

Materials can be switched between states reliably, quickly, large number of times

PCM stores data in terms of resistance• Low resistance (SET state) = 1• High resistance (RESET state) = 0

I

Bit Line

WordLine

N

WordLine

N N

6 © 2007 IBM Corporation

How does PCM work ?

Tmelt

Tcryst

Time [ns]

RESET

SET

Tem

per

atu

re

Switching by heating using electrical pulses

SET: sustained current to heat cell above Tcryst

RESET: cell heated above Tmelt and quenched

LargeCurrent

SETLow resistance

103-104

SmallCurrent

RESETHigh resistance

AccessDevice

MemoryElement

106-107

Photo Courtesy: Bipin Rajendran, IBM

7 © 2007 IBM Corporation

Key Characteristics of PCM

+ Scales better than DRAM, small cell size Prototypes as small as 3nm x 20 nm fabricated and tested [Raoux+ IBMJRD’08]

+ Can store multiple bits/cell More density in the same area Prototypes with 2 bits/cell in ISSCC’08. >2 bits/cell expected soon.

+ Non-Volatile Memory Technology Data retention of 10 years Power implications, system implications

Challenges: - More latency compared to DRAM. - Limited Endurance (~10 million writes per cell)- Write bandwidth constrained, so better to write less often.

8 © 2007 IBM Corporation

Outline

Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary

9 © 2007 IBM Corporation

Hybrid Memory System

Hybrid Memory System: 1. DRAM as cache to tolerate PCM Rd/Wr latency and Wr bandwidth2. PCM as main-memory to provide large capacity at good cost/power

DATA W

PCM Main Memory

DATAT

DRAM Buffer

PCM Write Queue

T=Tag-Store

Processor

FlashOr

HDD

10 © 2007 IBM Corporation

Lazy Write Architecture

Problem: Double PCM writes to dirty pages on install

DRAM Buffer PCM

Flash/Disk

WRQ

Processor

For example: Daxpy Kernel: Y[i] = Y[i] + X[i]Baseline has 2 writes for Y[i] and 1 for X[i] Lazy write has 1 write for Y[i] and 1 for X[i]

11 © 2007 IBM Corporation

0

2

4

6

8

10

12

14

16

18

20

db1 db2

Num

Writ

es P

er D

irty

Line

(Mill

ion)

Line Level Write Back

Problem: Not all lines in a dirty page are dirty

Solution: Dirty bits per line in DRAM buffer and

write-back only dirty lines from DRAM to PCM

Problem: With LLWB, not all lines in dirty pages are written uniformly

Num

Writ

es t

o E

ach

Line

(M

ln)

db1 db20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

AverageAverage

Line_id

12 © 2007 IBM Corporation

0

2

4

6

8

10

12

14

16

18

20

db1 db2

Num

Writ

es P

er D

irty

Line

(Mill

ion)

Fine Grained Wear Leveling

Solution: Fine Grained Wear Leveling (FGWL)-When a page gets allocated page is rotated by a random shift value-The rotate value remains constant while page remains in memory-On replacement of a page, a new random value is assigned for a new page -Over time, the write traffic per line becomes uniform.

FGWL makes writes across lines in a dirty page uniform

Num

Writ

es t

o E

ach

Line

(M

ln)

db1 db20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Average Average

Line_id

13 © 2007 IBM Corporation

Outline

Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary

14 © 2007 IBM Corporation

Evaluation Framework

Trace Driven Simulator:16-core system (simple core), 8GB DRAM main-memory at 320 cyclesHDD (2 ms) with Flash (32 us) with Flash hit-rate of 99%

Workloads:Database workloads & Data parallel kernels

1. Database workloads: db1 and db22. Unix utilities: qsort and binary search3. Data Mining : K-means and Gauss Seidal4. Streaming: DAXPY and Vector Dot Product

Assumption:PCM 4X denser & 4X slower than DRAM 32GB @ 1280 cycle read latency

15 © 2007 IBM Corporation

Reduction in Page Faults

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

db1 db2 qsort bsearch kmeans gauss daxpy vdotp

Pag

e F

au

lts N

orm

alize

d t

o 8

GB

Syste

m

4GB8GB16GB32GB

Benefit from capacity Need >16GB Streaming

16 © 2007 IBM Corporation

Impact on Execution Time

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

db1 db2 qsort bsearch kmeans gauss daxpy vdotp gmean

No

rmal

ized

Exe

cuti

on

Tim

e

8GB DRAM32GB PCM32GB DRAM32GB PCM + 1GB DRAM

PCM with DRAM buffer performs similar to equal capacity DRAM storage

17 © 2007 IBM Corporation

Impact of PCM Latency

Hybrid memory system is relatively insensitive to PCM Latency

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1X 2X 4X 8X 16X 2X 4X 8X 16X 1X

No

rma

lize

d E

xe

c.

Tim

e (

Av

g)

DRAM-8GB DRAM-32GBPCM-32GB HYBRID (1+32)GB

18 © 2007 IBM Corporation

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

Power Energy Energy x Delay

Va

lue

No

rma

lize

d t

o 8

GB

DR

AM 8GB DRAM

Hybrid (32GB PCM+ 1GB DRAM)

32GB DRAM

Power Evaluations

Significant Power and Energy savings with PCM based hybrid memory system

19 © 2007 IBM Corporation

Outline

Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary

20 © 2007 IBM Corporation

Impact of Write Endurance

F Frequency of System (4GHz)Y = Number of years (lifetime)

There are 225 seconds in a year

Num. cycles in Y years = Y. F.225

B Bytes/Cycle written to PCMS PCM capacity in bytesWmax Max writes per PCM cellAssuming uniform writes to PCM

Endurance (in cycles) = (S/B).Wmax

Y = (S/B). WmaxF.225

If Wmax = 10 million, PCM will last for 2.5 years

For a 4GHz System,a 32GB PCM written at

1 Byte per Cycle

Y = Wmax 4 million

21 © 2007 IBM Corporation

Lifetime Results

Configuration Avg. Bytes/Cycle Avg. Lifetime

1GB DRAM + 32GB PCM 0.807 3.0 yrs

+ Lazy Write 0.725 3.4 yrs

+ Line Level Write Back 0.316 7.6 yrs

+ Bypass Streaming Apps 0.247 9.7 yrs

Table shows average bytes per cycle written to PCM and Average lifetime of PCM assuming Wmax = 10 million

Proposed filtering techniques reduce write traffic to PCM by 3.2X, increasing its lifetime from 3 to 9.7 years

22 © 2007 IBM Corporation

Outline

Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary

23 © 2007 IBM Corporation

Summary

Need more main memory capacity: DRAM hitting power, cost, scaling wall

PCM is an emerging technology – 4x denser than DRAM but with slower access time and limited write endurance

We propose a Hybrid Memory System (DRAM+PCM) that provides significant power and performance benefits

Proposed write filtering techniques reduce writes by 3x and increase PCM lifetime from 3 years to 9 years

Not touched in this talk but important: Exploiting non-volatile memories for system enhancement & related OS issues.

24 © 2007 IBM Corporation

Thanks!