Upload
forrest-conner
View
22
Download
1
Embed Size (px)
DESCRIPTION
Scalable High Performance Main Memory System Using PCM Technology. Moinuddin K. Qureshi Viji Srinivasan and Jude Rivers IBM T. J. Watson Research Center, Yorktown Heights, NY. Main Memory Capacity Wall. More cores in system More concurrency Larger working set - PowerPoint PPT Presentation
Citation preview
© 2007 IBM CorporationInternational Symposium on Computer Architecture (ISCA-2009) Apr 19, 2023
Scalable High Performance Main Memory System Using PCM Technology
Moinuddin K. QureshiViji Srinivasan and Jude Rivers
IBM T. J. Watson Research Center, Yorktown Heights, NY
2 © 2007 IBM Corporation
Main Memory Capacity Wall
More cores in system More concurrency Larger working set
Demand for main memory capacity continues to increase
Main Memory System consisting of DRAM are hitting:1. Cost wall: Major % of cost of large servers is main memory2. Scaling wall: DRAM scaling to small technology is challenge3. Power wall:
IBM P670 Server Processor Memory
Small (4 proc, 16GB) 384 Watts 314 Watts
Large (16 proc, 128GB) 840 Watts 1223 Watts
Source: Lefurgy et al. IEEE Computer 2003
Need a practical solution to increase main-memory capacity
3 © 2007 IBM Corporation
The Technology Hierarchy
More capacity by cheaper, denser, (slower) technology
21 23 27 211 213 215 219 223
Typical access latency in processor cycles (@ 4 GHz)
L1(SRAM) EDRAM DRAM HDD
25 29 217 221
Flash PCM
Phase Change Memory (PCM) promising candidate for large capacity main memory
High-Performance DiskMemory System
4 © 2007 IBM Corporation
Outline
Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary
5 © 2007 IBM Corporation
What is Phase Change Memory?
Phase change material (chalcogenide glass) exists in two states:1. Amorphous: high resistivity2. Crystalline: low resistivity
Materials can be switched between states reliably, quickly, large number of times
PCM stores data in terms of resistance• Low resistance (SET state) = 1• High resistance (RESET state) = 0
I
Bit Line
WordLine
N
WordLine
N N
6 © 2007 IBM Corporation
How does PCM work ?
Tmelt
Tcryst
Time [ns]
RESET
SET
Tem
per
atu
re
Switching by heating using electrical pulses
SET: sustained current to heat cell above Tcryst
RESET: cell heated above Tmelt and quenched
LargeCurrent
SETLow resistance
103-104
SmallCurrent
RESETHigh resistance
AccessDevice
MemoryElement
106-107
Photo Courtesy: Bipin Rajendran, IBM
7 © 2007 IBM Corporation
Key Characteristics of PCM
+ Scales better than DRAM, small cell size Prototypes as small as 3nm x 20 nm fabricated and tested [Raoux+ IBMJRD’08]
+ Can store multiple bits/cell More density in the same area Prototypes with 2 bits/cell in ISSCC’08. >2 bits/cell expected soon.
+ Non-Volatile Memory Technology Data retention of 10 years Power implications, system implications
Challenges: - More latency compared to DRAM. - Limited Endurance (~10 million writes per cell)- Write bandwidth constrained, so better to write less often.
8 © 2007 IBM Corporation
Outline
Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary
9 © 2007 IBM Corporation
Hybrid Memory System
Hybrid Memory System: 1. DRAM as cache to tolerate PCM Rd/Wr latency and Wr bandwidth2. PCM as main-memory to provide large capacity at good cost/power
DATA W
PCM Main Memory
DATAT
DRAM Buffer
PCM Write Queue
T=Tag-Store
Processor
FlashOr
HDD
10 © 2007 IBM Corporation
Lazy Write Architecture
Problem: Double PCM writes to dirty pages on install
DRAM Buffer PCM
Flash/Disk
WRQ
Processor
For example: Daxpy Kernel: Y[i] = Y[i] + X[i]Baseline has 2 writes for Y[i] and 1 for X[i] Lazy write has 1 write for Y[i] and 1 for X[i]
11 © 2007 IBM Corporation
0
2
4
6
8
10
12
14
16
18
20
db1 db2
Num
Writ
es P
er D
irty
Line
(Mill
ion)
Line Level Write Back
Problem: Not all lines in a dirty page are dirty
Solution: Dirty bits per line in DRAM buffer and
write-back only dirty lines from DRAM to PCM
Problem: With LLWB, not all lines in dirty pages are written uniformly
Num
Writ
es t
o E
ach
Line
(M
ln)
db1 db20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
AverageAverage
Line_id
12 © 2007 IBM Corporation
0
2
4
6
8
10
12
14
16
18
20
db1 db2
Num
Writ
es P
er D
irty
Line
(Mill
ion)
Fine Grained Wear Leveling
Solution: Fine Grained Wear Leveling (FGWL)-When a page gets allocated page is rotated by a random shift value-The rotate value remains constant while page remains in memory-On replacement of a page, a new random value is assigned for a new page -Over time, the write traffic per line becomes uniform.
FGWL makes writes across lines in a dirty page uniform
Num
Writ
es t
o E
ach
Line
(M
ln)
db1 db20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Average Average
Line_id
13 © 2007 IBM Corporation
Outline
Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary
14 © 2007 IBM Corporation
Evaluation Framework
Trace Driven Simulator:16-core system (simple core), 8GB DRAM main-memory at 320 cyclesHDD (2 ms) with Flash (32 us) with Flash hit-rate of 99%
Workloads:Database workloads & Data parallel kernels
1. Database workloads: db1 and db22. Unix utilities: qsort and binary search3. Data Mining : K-means and Gauss Seidal4. Streaming: DAXPY and Vector Dot Product
Assumption:PCM 4X denser & 4X slower than DRAM 32GB @ 1280 cycle read latency
15 © 2007 IBM Corporation
Reduction in Page Faults
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
2.2
db1 db2 qsort bsearch kmeans gauss daxpy vdotp
Pag
e F
au
lts N
orm
alize
d t
o 8
GB
Syste
m
4GB8GB16GB32GB
Benefit from capacity Need >16GB Streaming
16 © 2007 IBM Corporation
Impact on Execution Time
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
db1 db2 qsort bsearch kmeans gauss daxpy vdotp gmean
No
rmal
ized
Exe
cuti
on
Tim
e
8GB DRAM32GB PCM32GB DRAM32GB PCM + 1GB DRAM
PCM with DRAM buffer performs similar to equal capacity DRAM storage
17 © 2007 IBM Corporation
Impact of PCM Latency
Hybrid memory system is relatively insensitive to PCM Latency
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1X 2X 4X 8X 16X 2X 4X 8X 16X 1X
No
rma
lize
d E
xe
c.
Tim
e (
Av
g)
DRAM-8GB DRAM-32GBPCM-32GB HYBRID (1+32)GB
18 © 2007 IBM Corporation
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
2.2
Power Energy Energy x Delay
Va
lue
No
rma
lize
d t
o 8
GB
DR
AM 8GB DRAM
Hybrid (32GB PCM+ 1GB DRAM)
32GB DRAM
Power Evaluations
Significant Power and Energy savings with PCM based hybrid memory system
19 © 2007 IBM Corporation
Outline
Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary
20 © 2007 IBM Corporation
Impact of Write Endurance
F Frequency of System (4GHz)Y = Number of years (lifetime)
There are 225 seconds in a year
Num. cycles in Y years = Y. F.225
B Bytes/Cycle written to PCMS PCM capacity in bytesWmax Max writes per PCM cellAssuming uniform writes to PCM
Endurance (in cycles) = (S/B).Wmax
Y = (S/B). WmaxF.225
If Wmax = 10 million, PCM will last for 2.5 years
For a 4GHz System,a 32GB PCM written at
1 Byte per Cycle
Y = Wmax 4 million
21 © 2007 IBM Corporation
Lifetime Results
Configuration Avg. Bytes/Cycle Avg. Lifetime
1GB DRAM + 32GB PCM 0.807 3.0 yrs
+ Lazy Write 0.725 3.4 yrs
+ Line Level Write Back 0.316 7.6 yrs
+ Bypass Streaming Apps 0.247 9.7 yrs
Table shows average bytes per cycle written to PCM and Average lifetime of PCM assuming Wmax = 10 million
Proposed filtering techniques reduce write traffic to PCM by 3.2X, increasing its lifetime from 3 to 9.7 years
22 © 2007 IBM Corporation
Outline
Introduction What is PCM ? Hybrid Memory System Evaluation Lifetime Analysis Summary
23 © 2007 IBM Corporation
Summary
Need more main memory capacity: DRAM hitting power, cost, scaling wall
PCM is an emerging technology – 4x denser than DRAM but with slower access time and limited write endurance
We propose a Hybrid Memory System (DRAM+PCM) that provides significant power and performance benefits
Proposed write filtering techniques reduce writes by 3x and increase PCM lifetime from 3 years to 9 years
Not touched in this talk but important: Exploiting non-volatile memories for system enhancement & related OS issues.