37
ECE 4750 Computer Architecture Topic 5: Memory System Basics Christopher Batten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece4750 slide revision: 2013-09-22-21-12

Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

ECE 4750 Computer ArchitectureTopic 5: Memory System Basics

Christopher BattenSchool of Electrical and Computer Engineering

Cornell University

http://www.csl.cornell.edu/courses/ece4750

slide revision: 2013-09-22-21-12

Page 2: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches Cache Performance

Processors, Memories, Networks

ECE 4750 T05: Memory System Basics 2 / 37

Page 3: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches Cache Performance

Agenda

Memory Technology

Motivation for Caches

Classifying Caches

Cache Performance

ECE 4750 T05: Memory System Basics 3 / 37

Page 4: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

Latches and Registers

ECE 4750 T05: Memory System Basics 4 / 37

Page 5: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

Naive Register File

ECE 4750 T05: Memory System Basics 5 / 37

Page 6: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

ECE 4750 T05: Memory System Basics 6 / 37

Page 7: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

Memory Arrays: Register File

ECE 4750 T05: Memory System Basics 7 / 37

Page 8: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

Memory Arrays: SRAMs

ECE 4750 T05: Memory System Basics 8 / 37

Page 9: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

Memory Arrays: DRAMs

ECE 4750 T05: Memory System Basics 9 / 37

Page 10: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

Relative Memory Sizes of SRAM vs. DRAM

[ Foss, “Implementing Application-Specific Memory”, ISSCC 1996 ]

DRAM on memory chip

On-Chip SRAM in logic chip

ECE 4750 T05: Memory System Basics 10 / 37

Page 11: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

• Memory Technology • Motivation for Caches Classifying Caches Cache Performance

Memory Technology Trade-offs

Latches/Registers

Register File

SRAM

DRAMHigh CapacityHigh LatencyLow Bandwidth

Low CapacityLow LatencyHigh Bandwidth(more and wider ports)

ECE 4750 T05: Memory System Basics 11 / 37

Page 12: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Agenda

Memory Technology

Motivation for Caches

Classifying Caches

Cache Performance

ECE 4750 T05: Memory System Basics 12 / 37

Page 13: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

CPU-Memory Bottleneck

ProcessorMain

Memory

I Performance of high-speed computers is usually limited by memorybandwidth and latency

I Latency is time for a single access. Main memory latency is usually >> than processor cycle time

I Bandwidth is the number of accesses per unit time. If m instructions are loads/stores, 1 + m memory accesses per instruction,

CPI = 1 requires at least 1 + m memory accesses per cycle

I Bandwidth-Delay product is amount of data that can be in flight at thesame time (Little’s Law)

ECE 4750 T05: Memory System Basics 13 / 37

Page 14: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Processor-DRAM Latency Gap

!Proc 60%/year

DRAM 7%/year

1

10

100

1000

1980 1985 1990 1995 2000

DRAM

CPU Processor-Memory Performance Gap: (grows 50% / year)

Perf

orm

ance

I Four-issue 2 GHz superscalar accessing 100 ns DRAM could execute800 instructions during the time for one memory access!

I Long latencies mean large bandwidth-delay products which can bedifficult to saturate, meaning bandwidth is wasted

ECE 4750 T05: Memory System Basics 14 / 37

Page 15: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Physical Size Affects Latency

Processor

SmallMemory

Processor

LargeMemory▶ Signals have further to travel

▶ Fan out to more locations

ECE 4750 T05: Memory System Basics 15 / 37

Page 16: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Memory Hierarchy

Processor

SmallFast

Memory(RF,SRAM)

BigSlow

Memory(DRAM)

I Capacity → Register << SRAM << DRAMI Latency → Register << SRAM << DRAMI Bandwidth → on-chip >> off-chip

I On a data access:. if data is in fast memory→ low-latency access to SRAM. if data is not in fast memory→ long-latency access to DRAM

I Memory hierarchies only work if the small, fast memory actually storesdata that is reused by the processor

ECE 4750 T05: Memory System Basics 16 / 37

Page 17: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Management of Memory Hierarchy

I Small/fast storage (e.g., registers). Address usually specified in instruction. Generally implemented directly as a register file

I Large/slower storage (e.g., main memory). Addresses usually computed from values in registers. Generally implemented with large off-chip DRAM

I Ideally would like intermediate level that is inbetween registers and main memory (e.g., cache). Need to predict what is kept in fast memory. Generally implemented with small on-chip/off-chip SRAM

ECE 4750 T05: Memory System Basics 17 / 37

Page 18: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Desk/Library Analogy

I How many books to check out at once?I How long to keep books on desk?I Which books to return?I How to find books on your desk?

ECE 4750 T05: Memory System Basics 18 / 37

Page 19: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Common And Predictable Memory Reference Patterns

Address

Time

Instruction fetches

Stack accesses

Data accesses

n loop iterations

subroutine call

subroutine return

argument access

vector access

scalar accesses

I Temporal Locality –if a location isreference it is likelyto be reference againin the near future

I Spatial Locality –if a location isreference it is likelythat locations near itwill be referenced inthe near future

ECE 4750 T05: Memory System Basics 19 / 37

Page 20: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Real Memory Reference PatternsM

em

ory

Ad

dre

ss

Time (one dot per access to that address at that time)

D.J. Hatfield, J. Gerald, "Program Restructuring for Virtual Menory," IBM Systems Journal, 10(3): 168-192, 1971

Spatial Locality

Temporal Locality

Spatial and TemporalLocality

ECE 4750 T05: Memory System Basics 20 / 37

Page 21: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

Caches Exploit Both Types of Locality

Processor

SmallFast

Memory(RF,SRAM)

BigSlow

Memory(DRAM)

I Exploit temporal locality by rememberthe contents of recently access locations

I Exploit spatial locality by fetchingblocks of data around recently access locations

ECE 4750 T05: Memory System Basics 21 / 37

Page 22: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology • Motivation for Caches • Classifying Caches Cache Performance

PARCv4 Multicore Systems

ECE 4750 T05: Memory System Basics 22 / 37

Page 23: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Agenda

Memory Technology

Motivation for Caches

Classifying Caches

Cache Performance

ECE 4750 T05: Memory System Basics 23 / 37

Page 24: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Inside a Cache

CACHE Processor Main Memory

Address Address

Data Data

Address Tag

Data Block

Data Byte

Data Byte

Data Byte

Line 100

304

6848

copy of main memory location 100

copy of main memory location 101

416

ECE 4750 T05: Memory System Basics 24 / 37

Page 25: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Basic Cache Algorithm for a Load

Processor issues load request to cache

Compare request address to cache tags & see if there is a match

Read block of data frommain memory

Replace victim block in cachewith new block

Return copy of data from cache

Cache Hit(found in cache)

Cache Miss(not in cache)

ECE 4750 T05: Memory System Basics 25 / 37

Page 26: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Classifying Caches

Processor

SmallFast

Memory(RF,SRAM)

BigSlow

Memory(DRAM)

I Block Placement – Where can a block be placed in the cache?

I Block Identification – How is a block found if it is in the cache?

I Block Replacement – Which block should be replaced on a miss?

I Write Strategy – What happens on a write?

ECE 4750 T05: Memory System Basics 26 / 37

Page 27: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Block Placement – Where place block in cache?

0 1 2 3 4 5 6 7 0 1 2 3 Set Number

Cache

Fully (2-way) Set Direct Associative Associative Mapped anywhere anywhere in only into set 0 block 4 (12 mod 4) (12 mod 8)

0 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 0 1 2 3 4 5 6 7 8 9

2 2 2 2 2 2 2 2 2 2 0 1 2 3 4 5 6 7 8 9

3 3 0 1

Memory

Block Number

block 12 can be placed

ECE 4750 T05: Memory System Basics 27 / 37

Page 28: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Block Identification – How find block in cache?

Tag Index Offset

Tags Data Block

Block Address

V

Set 0Set 1Set 2Set 3

I Cache uses index and offset to find potential match, then checks tagI Tag check only includes higher order bits, low order bits are implicitI In this example (Direct-mapped, 8B block, 4 line cache)

. Number of bits in offset?

. Number of bits in index?

. Number of bits in tag?

ECE 4750 T05: Memory System Basics 28 / 37

Page 29: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Block Identification – How find block in cache?

Tag Index Offset

Tags Data Block

Block Address

V

Set 0

Set 1

I Cache checks all potential blocks w/ fast parallel tag checkI In this example (2-way associative, 8B block, 4 line cache)

. Number of bits in offset?

. Number of bits in index?

. Number of bits in tag?

ECE 4750 T05: Memory System Basics 29 / 37

Page 30: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Block Replacement – Which block to replace?

I No choice in a direct mapped cache

I In an associative cache, which block from setshould be evicted when the set becomes full?

I Random

I Least Recently Used (LRU). LRU cache state must be updated on every access. True implementation only feasible for small sets (2-way). Pseudo-LRU binary tree often used for 4-8 way

I First In, First Out (FIFO) aka Round-Robin. Used in highly associative caches

I Not Least Recently Used (NLRU). FIFO with exception for most recently used block(s)

ECE 4750 T05: Memory System Basics 30 / 37

Page 31: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches • Classifying Caches • Cache Performance

Write Strategy – How are writes handled?

I Cache Hit. Write Through – write both cache and memory,

generally higher traffic but simpler to design. Write Back – write cache only, memory is written

when evicted, dirty bit per block avoids unnecessary write backs,more complicated

I Cache Miss. No Write Allocate – only write to main memory. Write Allocate – fetch block into cache, then write

I Common Combinations. Write Through & No Write Allocate. Write Back & Write Allocate

ECE 4750 T05: Memory System Basics 31 / 37

Page 32: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches • Cache Performance •

Agenda

Memory Technology

Motivation for Caches

Classifying Caches

Cache Performance

ECE 4750 T05: Memory System Basics 32 / 37

Page 33: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches • Cache Performance •

Processor Performance

Processor CacheMain

Memory

Hit

Miss

lw F D X M W

F D X M W

F D X M WMMM

F D X M WXXX

F D D M WXDD

F F F M WXDF

stalladdu

lw

addu

beq

subu

10

0 it

era

tion

s

TimeProgram

=Instructions

Program× Cycles

Instruction× Time

Cycles

What is time / program for this simple example?

ECE 4750 T05: Memory System Basics 33 / 37

Page 34: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches • Cache Performance •

Average Memory Access Time

Processor CacheMain

Memory

Hit

Miss

lw F D X M W

F D X M W

F D X M WMMM

F D X M WXXX

F D D M WXDD

F F F M WXDF

stalladdu

lw

addu

beq

subu

10

0 it

era

tion

s

Avg Mem Access Time = Hit Time + ( Miss Rate × Miss Penalty )

What is average memory access time for this simple example?

ECE 4750 T05: Memory System Basics 34 / 37

Page 35: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches • Cache Performance •

Categorizing Misses: The Three C’s

Tag Index Offset

Tags Data Block

Block Address

V

Set 0

Set 1

I Compulsory – first-reference to a block, occur even with infinite cache

I Capacity – cache is too small to hold all data needed by program,occur even under perfect replacement policy (loop over 5 cache lines)

I Conflict – misses that occur because of collisions due to less than fullassociativity (loop over 3 cache lines)

ECE 4750 T05: Memory System Basics 35 / 37

Page 36: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches Cache Performance

Summary

I Technology constraint leads to fundamental trade-off betweenmemory size (capacity) and memory latency

I Creating a memory hierarchy enables both low latency and highcapacity, IF data in higher levels of the hierarchy store data that isreused by the processor. Temporal Locality – refers likely reuse in time. Spatial Locality – refers likely reuse in space

I Caches exploit temporal and spatial locality, and involve a variety ofdifferent design decisions. Block Placement – Where can a block be placed in the cache?. Block Identification – How is a block found if it is in the cache?. Block Replacement – Which block should be replaced on a miss?. Write Strategy – What happens on a write?

I Average Mem Access Time – simple metric for evaluating trade-offs

ECE 4750 T05: Memory System Basics 36 / 37

Page 37: Topic 5: Memory System Basics ECE 4750 Computer …...Memory Technology • Motivation for Caches •Classifying CachesCache Performance CPU-Memory Bottleneck Processor Main Memory

Memory Technology Motivation for Caches Classifying Caches Cache Performance

Acknowledgements

Some of these slides contain material developed and copyrighted by:

Arvind (MIT), Krste Asanovic (MIT/UCB), Joel Emer (Intel/MIT)James Hoe (CMU), John Kubiatowicz (UCB), David Patterson (UCB)

MIT material derived from course 6.823UCB material derived from courses CS152 and CS252

ECE 4750 T05: Memory System Basics 37 / 37