ECE 4100/6100 Advanced Computer Architecture
Lecture 11: DRAM and Storage
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
The DRAM Cell

• Why DRAMs
  – Higher density than SRAMs
• Disadvantages
  – Longer access times
  – Leaky; needs to be refreshed
  – Cannot be easily integrated with CMOS

[Figure: 1T1C DRAM cell — word line (control), bit line (information), and storage capacitor; stack capacitor vs. trench capacitor. Source: Memory Arch Course, INSA Toulouse]
One DRAM Bank

[Figure: one DRAM bank — the address feeds a row decoder (which drives a wordline) and a column decoder; bitlines run into sense amps and I/O gating, which produce the data out.]
Example: 512Mb 4-Bank DRAM (x4)

[Figure: four banks (Bank0–Bank3), each with its own row decoder and column decoder, sharing sense amps/I/O gating and the D[3:0] data out.]

• Address multiplexing: row address A[13:0] (16K rows), column address A[10:0] (2K columns), bank address BA[1:0]
• Each bank is 16384 x 2048 x 4 bits
• A x4 DRAM chip: a DRAM page = 2K x 4 bits = 1KB
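The 512Mb x4 example above can be expressed as address-field arithmetic. A minimal sketch; the field order used here (bank above row above column) is an assumption for illustration, since the slide only gives the field widths:

```python
# Split a flat address inside one x4 DRAM chip into (bank, row, column),
# matching the 512Mb example: 4 banks x 16384 rows x 2048 columns x 4 bits.

ROW_BITS, COL_BITS, BANK_BITS = 14, 11, 2   # 16K rows, 2K cols, 4 banks
COLS_PER_ROW = 1 << COL_BITS

def split_address(addr):
    """addr indexes one 4-bit column position; returns (bank, row, col)."""
    col = addr & (COLS_PER_ROW - 1)
    addr >>= COL_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    addr >>= ROW_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    return bank, row, col

# Capacity check: 4 banks * 16384 rows * 2048 cols * 4 bits = 512 Mb
total_bits = (1 << BANK_BITS) * (1 << ROW_BITS) * COLS_PER_ROW * 4
assert total_bits == 512 * 2**20

# One DRAM page (open row) = 2048 columns * 4 bits = 1 KB
page_bytes = COLS_PER_ROW * 4 // 8
assert page_bytes == 1024
```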
DRAM Cell Array

[Figure: cell array — wordlines Wordline0 through Wordline1023 crossing bitlines bitline0 through bitline15, with one cell at each intersection.]
DRAM Sensing (Open Bitline Array)

[Figure: two DRAM subarrays (WL0–WL127 and WL128–WL255) share a sense amp placed between them, with one bitline from each side.]
Basic DRAM Operations

• Write '1': the write driver pulls the bitline to Vdd; the cell, behind the access transistor, charges to Vdd - Vth
• Read '1': the bitline is precharged to Vdd/2; raising the wordline shares the cell charge onto the bitline, lifting it to Vdd/2 + Vsignal
• Charge sharing gives Vsignal = (Vdd/2) * Cm / (Cm + C_BL), where Cm is the cell capacitance and C_BL the bitline capacitance
• The sense amp amplifies Vsignal to full rail, which also refreshes the cell
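The charge-sharing readout from the Basic DRAM Operations slide can be spelled out numerically. The capacitance and supply values below are illustrative assumptions, not from the slides:

```python
# When the wordline opens, the cell capacitance Cm shares charge with the
# bitline (precharged to Vdd/2), leaving a small swing for the sense amp:
#   Vsignal = (Vdd/2) * Cm / (Cm + C_BL)

def v_signal(vdd, c_cell, c_bitline):
    return (vdd / 2) * c_cell / (c_cell + c_bitline)

# Assumed values: Vdd = 1.8 V, cell ~30 fF, bitline ~300 fF
swing = v_signal(vdd=1.8, c_cell=30e-15, c_bitline=300e-15)
# ~82 mV swing, later amplified to full rail by the sense amplifier
```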
DRAM Basics

• Address multiplexing
  – Send row address when RAS asserted
  – Send column address when CAS asserted
• DRAM reads are self-destructive
  – Rewrite after a read
• Memory array
  – All bits within an array work in unison
• Memory bank
  – Different banks can operate independently
• DRAM rank
  – Chips inside the same rank are accessed simultaneously
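The activate/read/restore behaviour described above can be captured in a toy model. This is a simplified single-bank sketch for intuition, not a real controller interface; real DRAMs restore the row during sensing rather than at precharge:

```python
# Toy model of address multiplexing and the destructive-read/rewrite cycle.

class DramBank:
    def __init__(self, rows=8, cols=4):
        self.array = [[0] * cols for _ in range(rows)]
        self.row_buffer = None      # models the sense amps (open row)
        self.open_row = None

    def activate(self, row):        # RAS: row address latched, row sensed
        self.row_buffer = self.array[row][:]
        self.array[row] = [0] * len(self.array[row])   # read is destructive
        self.open_row = row

    def read(self, col):            # CAS: column address selects from buffer
        return self.row_buffer[col]

    def precharge(self):            # rewrite the row before closing it
        self.array[self.open_row] = self.row_buffer[:]
        self.row_buffer = self.open_row = None

bank = DramBank()
bank.array[3][2] = 1
bank.activate(3)       # row address sent while RAS asserted
value = bank.read(2)   # column address sent while CAS asserted
bank.precharge()       # row restored
```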
Examples of DRAM DIMM Standards

[Figure: two DIMM configurations built from x8 chips — x64 (no ECC): eight x8 chips covering D0–D63; x72 (ECC): nine x8 chips, eight covering D0–D63 plus one for check bits CB0–CB7.]
DRAM Ranks

[Figure: memory controller driving two ranks, each made of eight x8 chips covering D0–D63; chip selects CS0 and CS1 choose Rank0 or Rank1.]
DRAM Ranks

[Figure: a single rank of eight 8b chips forms a 64b bus; a single rank of sixteen 4b chips also forms 64b; a dual-rank module has two groups of eight 8b chips, each group forming its own 64b rank.]
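The rank-width arithmetic implied by the diagrams is simply bus width divided by chip width. A one-line sketch:

```python
# A rank must fill the 64-bit data bus, so it needs 64 / chip_width chips.

def chips_per_rank(bus_bits=64, chip_bits=8):
    return bus_bits // chip_bits

assert chips_per_rank(64, 8) == 8     # eight x8 chips per rank
assert chips_per_rank(64, 4) == 16    # sixteen x4 chips per rank
```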
DRAM Organization
Source: Memory Systems Architecture Course, B. Jacobs, Maryland
Organization of DRAM Modules

[Figure: memory controller connected over a channel — an address/command bus plus a data bus — to multi-banked DRAM chips on a module.]

Source: Memory Systems Architecture Course, Bruce Jacobs, University of Maryland
DRAM Configuration Example

Source: MICRON DDR3 DRAM
DRAM Access (Non-Nibble Mode)

[Figure: memory controller drives RAS, CAS, WE, the address bus, and the data bus to the DRAM module.]

• Assert RAS while driving the row address; the row is opened
• Assert CAS while driving the column address; data appears on the data bus
• With the row still open, further column addresses can be sent under CAS, each returning its data

[Timing diagram: RAS low with Row Addr on ADDR, then CAS low with Col Addr on ADDR followed by Data on DATA, repeated (Col Addr, Data) for the same open row.]
DRAM Refresh

• Leaky storage
• Periodic refresh across DRAM rows
• Inaccessible while refreshing
• Refresh = read, then write the same data back
• Example:
  – 4K rows in a DRAM
  – 100ns read cycle
  – Decay in 64ms
  – 4096 * 100ns = 410µs to refresh all rows once
  – 410µs / 64ms = 0.64% unavailability
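The refresh-overhead arithmetic above, spelled out:

```python
# 4096 rows, 100 ns refresh (read) cycle per row, 64 ms retention.

rows = 4096
row_cycle = 100e-9        # 100 ns per row
retention = 64e-3         # rows decay in 64 ms

burst_time = rows * row_cycle        # 409.6 us to refresh every row once
overhead = burst_time / retention    # fraction of time the DRAM is busy
# overhead = 0.0064, i.e. 0.64% unavailability

# Distributed refresh spaces one row refresh every retention/rows:
interval = retention / rows          # 15.625 us between row refreshes
```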
DRAM Refresh Styles

• Bursty: refresh all rows back-to-back, taking 410µs (= 100ns * 4096) out of every 64ms window
• Distributed: refresh one row (100ns) every 15.6µs, spread evenly across the 64ms window
DRAM Refresh Policies

• RAS-Only Refresh
  – Memory controller drives the row address on the address bus and asserts RAS
  – The addressed row is refreshed; the controller increments its refresh counter
• CAS-Before-RAS (CBR) Refresh
  – Assert CAS first, then RAS, with WE# held high
  – No address involved: an internal address counter in the DRAM supplies the refresh row and increments itself
Types of DRAM

• Asynchronous DRAM
  – Normal: Responds to RAS and CAS signals (no clock)
  – Fast Page Mode (FPM): Row remains open after RAS for multiple CAS commands
  – Extended Data Out (EDO): Change output drivers to latches; data can be held on the bus for a longer time
  – Burst Extended Data Out (BEDO): Internal counter drives the address latch; able to provide data in burst mode
• Synchronous DRAM
  – SDRAM: All of the above, with a clock; adds predictability to DRAM operation
  – DDR, DDR2, DDR3: Transfer data on both edges of the clock
  – FB-DIMM: DIMMs connected using point-to-point links instead of a bus; allows more DIMMs to be incorporated in server-based systems
• RDRAM
  – Low pin count
Disk Storage
Disk Organization

[Figure: disk geometry — platters (1 to 12 per drive); tracks (5000 to 30000 per surface); sectors (100 to 500 per track, 512 bytes each); a cylinder is the set of tracks at one arm position; spindle speeds of 3600 to 15000 RPM.]
Disk Organization

[Figure: the arm positions the read/write head tens of nanometers above the magnetic surface.]
Disk Access Time

• Seek time
  – Move the arm to the desired track
  – 5ms to 12ms
• Rotation latency (or delay)
  – For example, average rotation latency for a 10,000 RPM disk is 3ms (= 0.5/(10,000/60))
• Data transfer latency (or throughput)
  – Tens to hundreds of MB per second
  – E.g., Seagate Cheetah 15K.6 sustains 164MB/sec
• Disk controller overhead
• Use a disk cache (or cache buffer) to exploit locality
  – 4 to 32MB today
  – Comes with the embedded controller in the HDD
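The components above add up to a simple access-time model. A sketch using the slide's example figures; the 0.2 ms controller overhead is an assumed value:

```python
# Disk access time = seek + average rotational latency + transfer + overhead.

def access_time_ms(seek_ms, rpm, kb, mb_per_s, ctrl_ms=0.2):
    rotation_ms = 0.5 / (rpm / 60) * 1000      # half a revolution on average
    transfer_ms = kb / 1024 / mb_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms + ctrl_ms

# 10,000 RPM drive, 5 ms seek, 4 KB transfer at 164 MB/s:
t = access_time_ms(seek_ms=5, rpm=10_000, kb=4, mb_per_s=164)
# ~8.2 ms total, dominated by seek (5 ms) + rotation (3 ms);
# the transfer itself contributes only ~0.02 ms
```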
Reliability, Availability, Dependability

• Program faults
• Static (permanent) faults
  – Design flaws
    • Pentium FDIV bug: ~$500 million
  – Manufacturing
    • Stuck-at faults
    • Process variability
• Dynamic faults
  – Soft errors
  – Noise-induced
  – Wear-out
Solution Space

• DRAM / SRAM
  – Use ECC (SECDED)
• Disks
  – Use redundancy
    • User's backup
    • Disk arrays
RAID

• Reliability and performance considerations
• Redundant Array of Inexpensive Disks
• Combines multiple small, inexpensive disk drives
• Arrays are broken into "reliability groups"
• Data are divided and replicated across multiple disk drives
• RAID-0 to RAID-5
• Hardware RAID
  – Dedicated HW controller
• Software RAID
  – Implemented in the OS
Basic Principles

• Data mirroring
• Data striping
• Error correction code
RAID-1

• Mirrored disks
• Most expensive (100% overhead)
• Every write to the data disk also writes to the check disk
• Can improve read/seek performance with a sufficient number of controllers

[Figure: Disk 0 (data disk) and Disk 1 (check disk) each hold identical copies of blocks A0–A4.]
RAID-10

• Combines data striping on top of RAID-1 mirrored pairs

[Figure: six disks in three mirrored pairs; blocks A0–A3, B0–B5, C0 are striped across the pairs, with both disks in a pair holding identical copies.]
RAID-2

• Bit-interleaved striping
• Uses a Hamming code to generate and store ECC on check disks (e.g., Hamming(7,4))
  – Space: 4 data disks need 3 check disks (75% overhead), 10 data disks need 4 check disks (40%), 25 data disks need 5 check disks (20%)
  – CPU needs more compute power to generate a Hamming code than parity
• Complex controller
• Not really used today!

[Figure: four data disks holding bit-striped blocks A0–D3, plus three check disks holding the corresponding Hamming ECC bits for each stripe.]
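The RAID-2 overhead numbers above follow from the Hamming-code condition that r check disks can protect d data disks when 2^r >= d + r + 1. A quick check:

```python
# Smallest number of Hamming check disks r for d data disks: 2^r >= d + r + 1.

def check_disks(d):
    r = 1
    while 2**r < d + r + 1:
        r += 1
    return r

for d in (4, 10, 25):
    r = check_disks(d)
    print(f"{d} data disks -> {r} check disks ({r/d:.0%} overhead)")
# 4 -> 3 (75%), 10 -> 4 (40%), 25 -> 5 (20%), matching the slide
```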
RAID-3

• Byte-level striping
• Uses XOR parity, stored on a dedicated check disk
• At least 3 disks: 2 data disks + 1 check disk

[Figure: one transfer unit is spread across all four data disks (A0–A3, B0–B3, ...), with the XOR parity for each stripe (ECCa–ECCd) on the check disk.]
RAID-4

• Block-level striping
• Keeps each individually accessed unit on one disk
  – Does not access all disks for (small) transfers
  – Improved parallelism
• Uses XOR parity, stored on a dedicated check disk
• Check info is calculated over a piece of each transfer unit
• Small read: one read on one disk
• Small write: two reads and two writes (data and check disks)
  – New parity = (old data XOR new data) XOR old parity
  – No need to read B0, C0, and D0 when read-modify-writing A0
• Writes are the bottleneck, as all writes access the check disk

[Figure: blocks A0–A3 on Data Disk 0, B0–B3 on Disk 1, C0–C3 on Disk 2, D0–D3 on Disk 3, with stripe parities ECC0–ECC3 on the check disk.]
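The small-write parity update described above can be demonstrated directly. A sketch with made-up block values:

```python
# RAID-4/5 small write: new_parity = (old_data XOR new_data) XOR old_parity.
# Only the target data disk and the check disk are touched, not the rest.

def full_parity(blocks):
    p = 0
    for b in blocks:
        p ^= b
    return p

data = [0xA0, 0xB0, 0xC0, 0xD0]      # one stripe across 4 data disks
parity = full_parity(data)

# Small write to disk 0: two reads (old data, old parity), two writes.
old, new = data[0], 0xA7
parity = (old ^ new) ^ parity        # incremental parity update
data[0] = new

assert parity == full_parity(data)   # matches recomputing from all disks
```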
RAID-5

• Block-level striping
• Distributed parity enables write parallelism and removes the bottleneck of a single parity disk
• Example: write "sector A" and write "sector B" can be performed simultaneously

[Figure: five disks; data blocks A0–A3, B0–B3, C0–C3, D0–D3, E0–E3 and parity blocks ECC0–ECC4 are rotated so that each disk holds some data and some parity.]
RAID-6

• Similar to RAID-5, with "dual distributed parity"
• ECC_p = XOR(A0, B0, C0); ECC_q = Code(A0, B0, C0, ECC_p)
• Sustains 2 drive failures with no data loss
• Minimum requirement: 4 disks
  – 2 for data striping
  – 2 for dual parity

[Figure: five disks; each stripe carries two parity blocks (ECCn_p and ECCn_q) in addition to its data blocks, with the parity positions rotated across the disks.]