Transcript
Page 1: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

15-740/18-740 Computer Architecture

Lecture 25: Main Memory

Prof. Onur Mutlu Yoongu Kim

Carnegie Mellon University

Page 2: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Today n  SRAM vs. DRAM n  Interleaving/Banking n  DRAM Microarchitecture

q  Memory controller q  Memory buses q  Banks, ranks, channels, DIMMs q  Address mapping: software vs. hardware q  DRAM refresh

n  Memory scheduling policies n  Memory power/energy management n  Multi-core issues

q  Fairness, interference q  Large DRAM capacity

2

Page 3: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Readings n  Recommended:

q  Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Memory Controllers,” IEEE Micro Top Picks 2009.

q  Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007.

q  Zhang et al., “A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality,” MICRO 2000.

q  Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. q  Rixner et al., “Memory Access Scheduling,” ISCA 2000.

3

Page 4: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Main Memory in the System

4

CORE 1

L2 CA

CH

E 0

SHA

RED

L3 CA

CH

E

DR

AM

INTER

FAC

E

CORE 0

CORE 2 CORE 3 L2 C

AC

HE 1

L2 CA

CH

E 2

L2 CA

CH

E 3

DR

AM

BA

NK

S

DRAM MEMORY CONTROLLER

Page 5: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Memory Bank Organization n  Read access sequence:

1. Decode row address & drive word-lines

2. Selected bits drive bit-lines • Entire row read

3. Amplify row data 4. Decode column

address & select subset of row

• Send to output 5. Precharge bit-lines • For next access

5

Page 6: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

SRAM (Static Random Access Memory)

6

bit-cell array

2n row x 2m-col

(n≈m to minimize overall latency)

sense amp and mux 2m diff pairs

2n n

m

1

row select

bitli

ne

_bitl

ine

n+m

Read Sequence 1. address decode 2. drive row select 3. selected bit-cells drive bitlines (entire row is read together)

4. diff. sensing and col. select (data is ready) 5. precharge all bitlines (for next read or write)

Access latency dominated by steps 2 and 3 Cycling time dominated by steps 2, 3 and 5

-  step 2 proportional to 2m

-  step 3 and 5 proportional to 2n

Page 7: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM (Dynamic Random Access Memory)

7

row enable _b

itlin

e

bit-cell array

2n row x 2m-col

(n≈m to minimize overall latency)

sense amp and mux 2m

2n n

m

1

RAS

CAS A DRAM die comprises of multiple such arrays

Bits stored as charges on node capacitance (non-restorative)

-  bit cell loses charge when read -  bit cell loses charge over time

Read Sequence 1~3 same as SRAM 4. a “flip-flopping” sense amp

amplifies and regenerates the bitline, data bit is mux’ed out

5. precharge all bitlines

Refresh: A DRAM controller must periodically read all rows within the allowed refresh time (10s of ms) such that charge is restored in cells

Page 8: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

SRAM vs. DRAM n  SRAM is preferable for register files and L1/L2 caches

q  Fast access q  No refreshes q  Simpler manufacturing (compatible with logic process) q  Lower density (6 transistors per cell) q  Higher cost

n  DRAM is preferable for stand-alone memory chips q  Much higher capacity q  Higher density q  Lower cost

8

Page 9: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Memory  subsystem  organiza1on  

•  Memory  subsystem  organiza1on  – Channel  – DIMM  – Rank  – Chip  – Bank  – Row/Column  

Page 10: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Memory  subsystem  

Memory  channel   Memory  channel  

DIMM  (Dual  in-­‐line  memory  module)  

Processor  

“Channel”  

Page 11: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Breaking  down  a  DIMM  DIMM  (Dual  in-­‐line  memory  module)  

Side  view  

Front  of  DIMM   Back  of  DIMM  

Page 12: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Breaking  down  a  DIMM  DIMM  (Dual  in-­‐line  memory  module)  

Side  view  

Front  of  DIMM   Back  of  DIMM  

Rank  0:  collec1on  of  8  chips   Rank  1  

Page 13: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Rank  

Rank  0  (Front)   Rank  1  (Back)  

Data  <0:63>  CS  <0:1>  Addr/Cmd  

<0:63>  <0:63>  

Memory  channel  

Page 14: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DIMM  &  Rank  (from  JEDEC)  

Page 15: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Breaking  down  a  Rank  

Rank  0  

<0:63>  

Chip  0  

Chip  1  

Chip  7  .  .  .  

<0:7>  

<8:15>  

<56:63>  

Data  <0:63>  

Page 16: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Breaking  down  a  Chip  Ch

ip  0  

<0:7>  

Bank  0  

<0:7>  

<0:7>  

<0:7>  

...  

<0:7>  

Page 17: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Breaking  down  a  Bank  

Bank  0  

<0:7>  

row  0  

row  16k-­‐1  

...  2kB  

1B  

1B  (column)  

1B  

Row-­‐buffer  

1B  

...  <0:7>  

Page 18: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Memory  subsystem  organiza1on  

•  Memory  subsystem  organiza1on  – Channel  – DIMM  – Rank  – Chip  – Bank  – Row/Column  

Page 19: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Example:  Transferring  a  cache  block  

0xFFFF…F  

0x00  

0x40  

...  

64B    cache  block  

Physical  memory  space  

Channel  0  

DIMM  0  

Rank  0  Mappe

d  to  

Page 20: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Example:  Transferring  a  cache  block  

0xFFFF…F  

0x00  

0x40  

...  

64B    cache  block  

Physical  memory  space  

Rank  0  Chip  0   Chip  1   Chip  7  

<0:7>  

<8:15>  

<56:63>  

Data  <0:63>  

.  .  .  

Page 21: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Example:  Transferring  a  cache  block  

0xFFFF…F  

0x00  

0x40  

...  

64B    cache  block  

Physical  memory  space  

Rank  0  Chip  0   Chip  1   Chip  7  

<0:7>  

<8:15>  

<56:63>  

Data  <0:63>  

Row  0  Col  0  

.  .  .  

Page 22: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Example:  Transferring  a  cache  block  

0xFFFF…F  

0x00  

0x40  

...  

64B    cache  block  

Physical  memory  space  

Rank  0  Chip  0   Chip  1   Chip  7  

<0:7>  

<8:15>  

<56:63>  

Data  <0:63>  

8B  

Row  0  Col  0  

.  .  .  

8B  

Page 23: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Example:  Transferring  a  cache  block  

0xFFFF…F  

0x00  

0x40  

...  

64B    cache  block  

Physical  memory  space  

Rank  0  Chip  0   Chip  1   Chip  7  

<0:7>  

<8:15>  

<56:63>  

Data  <0:63>  

8B  

Row  0  Col  1  

.  .  .  

Page 24: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Example:  Transferring  a  cache  block  

0xFFFF…F  

0x00  

0x40  

...  

64B    cache  block  

Physical  memory  space  

Rank  0  Chip  0   Chip  1   Chip  7  

<0:7>  

<8:15>  

<56:63>  

Data  <0:63>  

8B  

8B  

Row  0  Col  1  

.  .  .  

8B  

Page 25: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Example:  Transferring  a  cache  block  

0xFFFF…F  

0x00  

0x40  

...  

64B    cache  block  

Physical  memory  space  

Rank  0  Chip  0   Chip  1   Chip  7  

<0:7>  

<8:15>  

<56:63>  

Data  <0:63>  

8B  

8B  

Row  0  Col  1  

A  64B  cache  block  takes  8  I/O  cycles  to  transfer.    

During  the  process,  8  columns  are  read  sequenUally.  

.  .  .  

Page 26: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Page Mode DRAM n  A DRAM bank is a 2D array of cells: rows x columns n  A “DRAM row” is also called a “DRAM page” n  “Sense amplifiers” also called “row buffer”

n  Each address is a <row,column> pair n  Access to a “closed row”

q  Activate command opens row (placed into row buffer) q  Read/write command reads/writes column in the row buffer q  Precharge command closes the row and prepares the bank for

next access

n  Access to an “open row” q  No need for activate command

26

Page 27: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Bank Operation

27

Row Buffer

(Row 0, Column 0)

Row

dec

oder

Column mux

Row address 0

Column address 0

Data

Row 0 Empty

(Row 0, Column 1)

Column address 1

(Row 0, Column 85)

Column address 85

(Row 1, Column 0)

HIT HIT

Row address 1

Row 1

Column address 0

CONFLICT !

Columns

Row

s

Access Address:

Page 28: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Latency Components: Basic DRAM Operation

n  CPU → controller transfer time n  Controller latency

q  Queuing & scheduling delay at the controller q  Access converted to basic commands

n  Controller → DRAM transfer time n  DRAM bank latency

q  Simple CAS is row is “open” OR q  RAS + CAS if array precharged OR q  PRE + RAS + CAS (worst case)

n  DRAM → CPU transfer time (through controller)

28

Page 29: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

A DRAM Chip and DIMM n  Chip: Consists of multiple banks (2-16 in Synchronous DRAM) n  Banks share command/address/data buses n  The chip itself has a narrow interface (4-16 bits per read)

n  Multiple chips are put together to form a wide interface q  Called a module q  DIMM: Dual Inline Memory Module q  All chips in one side of a DIMM are operated the same way (rank)

n  Respond to a single command n  Share address and command buses, but provide different data

n  If we have chips with 8-bit interface, to read 8 bytes in a single access, use 8 chips in a DIMM

29

Page 30: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

128M x 8-bit DRAM Chip

30

Page 31: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

A 64-bit Wide DIMM

31

DRAMChip

DRAMChip

DRAMChip

DRAMChip

DRAMChip

DRAMChip

DRAMChip

DRAMChip

Command Data

Page 32: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

A 64-bit Wide DIMM n  Advantages:

q  Acts like a high-capacity DRAM chip with a wide interface

q  Flexibility: memory controller does not need to deal with individual chips

n  Disadvantages: q  Granularity:

Accesses cannot be smaller than the interface width

32

Page 33: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Multiple DIMMs

33

n  Advantages: q  Enables even

higher capacity

n  Disadvantages: q  Interconnect

complexity and energy consumption can be high

Page 34: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Channels

n  2 Independent Channels: 2 Memory Controllers (Above) n  2 Dependent/Lockstep Channels: 1 Memory Controller with

wide interface (Not Shown above)

34

Page 35: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Generalized Memory Structure

35

Page 36: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Multiple Banks (Interleaving) and Channels n  Multiple banks

q  Enable concurrent DRAM accesses q  Bits in address determine which bank an address resides in

n  Multiple independent channels serve the same purpose q  But they are even better because they have separate data buses q  Increased bus bandwidth

n  Enabling more concurrency requires reducing q  Bank conflicts q  Channel conflicts

n  How to select/randomize bank/channel indices in address? q  Lower order bits have more entropy q  Randomizing hash functions (XOR of different address bits)

36

Page 37: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

How Multiple Banks/Channels Help

37

Page 38: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Multiple Channels n  Advantages

q  Increased bandwidth q  Multiple concurrent accesses (if independent channels)

n  Disadvantages q  Higher cost than a single channel

n  More board wires n  More pins (if on-chip memory controller)

38

Page 39: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Address Mapping (Single Channel) n  Single-channel system with 8-byte memory bus

q  2GB memory, 8 banks, 16K rows & 2K columns per bank

n  Row interleaving q  Consecutive rows of memory in consecutive banks

n  Cache block interleaving n  Consecutive cache block addresses in consecutive banks n  64 byte cache blocks

n  Accesses to consecutive cache blocks can be serviced in parallel n  How about random accesses? Strided accesses?

39

Column (11 bits) Bank (3 bits) Row (14 bits) Byte in bus (3 bits)

Low Col. High Column Row (14 bits) Byte in bus (3 bits) Bank (3 bits) 3 bits 8 bits

Page 40: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Bank Mapping Randomization n  DRAM controller can randomize the address mapping to

banks so that bank conflicts are less likely

40

Column (11 bits) 3 bits Byte in bus (3 bits)

XOR

Bank index (3 bits)

Page 41: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Address Mapping (Multiple Channels)

n  Where are consecutive cache blocks?

41

Column (11 bits) Bank (3 bits) Row (14 bits) Byte in bus (3 bits) C

Column (11 bits) Bank (3 bits) Row (14 bits) Byte in bus (3 bits) C

Column (11 bits) Bank (3 bits) Row (14 bits) Byte in bus (3 bits) C

Column (11 bits) Bank (3 bits) Row (14 bits) Byte in bus (3 bits) C

Low Col. High Column Row (14 bits) Byte in bus (3 bits) Bank (3 bits) 3 bits 8 bits

C

Low Col. High Column Row (14 bits) Byte in bus (3 bits) Bank (3 bits) 3 bits 8 bits

C

Low Col. High Column Row (14 bits) Byte in bus (3 bits) Bank (3 bits) 3 bits 8 bits

C

Low Col. High Column Row (14 bits) Byte in bus (3 bits) Bank (3 bits) 3 bits 8 bits

C

Low Col. High Column Row (14 bits) Byte in bus (3 bits) Bank (3 bits) 3 bits 8 bits

C

Page 42: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

Interaction with VirtualàPhysical Mapping n  Operating System influences where an address maps to in

DRAM

n  Operating system can control which bank a virtual page is mapped to. It can randomize Pageà<Bank,Channel> mappings

n  Application cannot know/determine which bank it is accessing

42

Column (11 bits) Bank (3 bits) Row (14 bits) Byte in bus (3 bits)

Page offset (12 bits) Physical Frame number (19 bits)

Page offset (12 bits) Virtual Page number (52 bits) VA

PA PA

Page 43: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Refresh (I) n  DRAM capacitor charge leaks over time n  The memory controller needs to read each row periodically

to restore the charge q  Activate + precharge each row every N ms q  Typical N = 64 ms

n  Implications on performance? -- DRAM bank unavailable while refreshed -- Long pause times: If we refresh all rows in burst, every 64ms

the DRAM will be unavailable until refresh ends

n  Burst refresh: All rows refreshed immediately after one another

n  Distributed refresh: Each row refreshed at a different time, at regular intervals

43

Page 44: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Refresh (II)

n  Distributed refresh eliminates long pause times n  How else we can reduce the effect of refresh on

performance? q  Can we reduce the number of refreshes?

44

Page 45: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Controller n  Purpose and functions

q  Ensure correct operation of DRAM (refresh)

q  Service DRAM requests while obeying timing constraints of DRAM chips n  Constraints: resource conflicts (bank, bus, channel), minimum

write-to-read delays n  Translate requests to DRAM command sequences

q  Buffer and schedule requests to improve performance n  Reordering and row-buffer management

q  Manage power consumption and thermals in DRAM n  Turn on/off DRAM chips, manage power modes

45

Page 46: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Controller Issues n  Where to place?

q  In chipset + More flexibility to plug different DRAM types into the system

+ Less power density in the CPU chip

q  On CPU chip

+ Reduced latency for main memory access + Higher bandwidth between cores and controller

q  More information can be communicated (e.g. request’s importance in the processing core)

46

Page 47: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Controller (II)

47

Page 48: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

48

A Modern DRAM Controller

Page 49: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Scheduling Policies (I) n  FCFS (first come first served)

q  Oldest request first

n  FR-FCFS (first ready, first come first served) 1. Row-hit first 2. Oldest first Goal: Maximize row buffer hit rate à maximize DRAM throughput q  Actually, scheduling is done at the command level

n  Column commands (read/write) prioritized over row commands (activate/precharge)

n  Within each group, older commands prioritized over younger ones

49

Page 50: 15-740/18-740 Computer Architecture Lecture 25: Main Memoryece740/f11/lib/... · 15-740/18-740 Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Yoongu Kim Carnegie Mellon

DRAM Scheduling Policies (II) n  A scheduling policy is essentially a prioritization order

n  Prioritization can be based on q  Request age q  Row buffer hit/miss status q  Request type (prefetch, read, write) q  Requestor type (load miss or store miss) q  Request criticality

n  Oldest miss in the core? n  How many instructions in core are dependent on it?

50


Recommended