Multilevel Memory
Caches
Prof. Sirer
CS 316
Cornell University
Storage Hierarchy
Technology        Capacity   Cost/GB    Latency
Tape              1 TB       $.17       100 s
Disk              300 GB     $.34       4 ms
DRAM              4 GB       $520       20 ns
SRAM (off chip)   512 KB     $123000    5 ns
SRAM (on chip)    16 KB      ???        2 ns
Capacity and latency are closely coupled; cost per GB is inversely related to both
How do we create the illusion of large and fast memory?
[Figure: storage hierarchy with levels Tape, Disk, DRAM, SRAM (off chip), SRAM (on chip)]
Memory Hierarchy
Principle: Hide latency using small, fast memories called caches
Caches exploit locality
  Temporal locality: If a memory location is referenced, it is likely to be referenced again in the near future
  Spatial locality: If a memory location is referenced, other locations near it are likely to be referenced in the near future
Cache Lookups (Read)
Look at the address issued by the processor, search the cache tags to see if that block is in the cache
  Hit: Block is in the cache, return requested data
  Miss: Block is not in the cache, read line from memory, evict an existing line from the cache, place new line in cache, return requested data
Cache Organization
Cache has to be fast and small
  Gain speed by performing lookups in parallel, which requires die real estate
  Reduce hardware required by limiting where in the cache a block might be placed
Three common designs
  Fully associative: Block can be anywhere in the cache
  Direct mapped: Block can only be in one line in the cache
  Set-associative: Block can be in a few (2 to 8) places in the cache
Tags and Offsets
Cache block size determines how an address is split into tag and offset
[Figure: 32-bit virtual address (bits 31..0) split into Tag (bits 31..5) and Offset (bits 4..0); the offset selects a byte within the block]
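The tag/offset split shown on this slide can be sketched in C. This is a minimal illustration that assumes the 5-bit offset from the figure (32-byte blocks); the names `OFFSET_BITS`, `addr_offset`, and `addr_tag` are hypothetical and do not come from the lecture.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 5u   /* bits 4..0 on the slide, i.e. 32-byte blocks */

static_assert(OFFSET_BITS < 32u, "offset must leave room for a tag");

/* Byte position inside the block: the low OFFSET_BITS bits. */
static uint32_t addr_offset(uint32_t addr)
{
    return addr & ((1u << OFFSET_BITS) - 1u);
}

/* Everything above the offset identifies the block (bits 31..5). */
static uint32_t addr_tag(uint32_t addr)
{
    return addr >> OFFSET_BITS;
}
```

For example, address 0x12345678 would split into tag 0x0091A2B3 and offset 0x18 under these assumptions.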
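Fully Associative Cache
[Figure: address split into Tag and Offset; every line holds V, Tag, Block; a tag comparator (=) per line feeds hit encode and line select; the offset drives word/byte select]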
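Direct Mapped Cache
[Figure: address split into Tag, Index, and Offset; the index selects one line holding V, Tag, Block; a single comparator (=) checks the tag]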
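A direct-mapped lookup can be modeled in a few lines of C. The sketch below is only an illustration: the 7-bit index (128 lines) is an assumption, the type and function names are hypothetical, and only tags are tracked, not data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5u                 /* 32-byte blocks, as on the Tags and Offsets slide */
#define INDEX_BITS  7u                 /* assumed: 128 lines */
#define NUM_LINES   (1u << INDEX_BITS)

static_assert(OFFSET_BITS + INDEX_BITS < 32u, "tag must be non-empty");

typedef struct {
    bool     valid;
    uint32_t tag;
} dm_line;

typedef struct {
    dm_line lines[NUM_LINES];
} dm_cache;

/* Returns true on a hit. On a miss the block is installed in the one
 * line its index selects, replacing whatever was there. */
static bool dm_access(dm_cache *c, uint32_t addr)
{
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_LINES - 1u);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    dm_line *line  = &c->lines[index];

    if (line->valid && line->tag == tag)
        return true;

    line->valid = true;
    line->tag   = tag;
    return false;
}
```

With these sizes, two addresses that differ by 4096 bytes share an index, so alternating between them misses every time.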
2-Way Set-Associative Cache
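[Figure: address split into Tag, Index, and Offset; the index selects one set of two lines, each holding V, Tag, Block, each with its own comparator (=)]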
Valid Bits
Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory
  Must be 1 for a hit
  Reset to 0 on power up
An item can be removed from the cache by setting its valid bit to 0
Eviction
Which cache line should be evicted from the cache to make room for a new line?
  Direct-mapped
    no choice, must evict line selected by index
  Associative caches
    random: select one of the lines at random
    round-robin: similar to random
    FIFO: replace oldest line
    LRU: replace line that has not been used in the longest time
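LRU is simplest to see in a 2-way set-associative cache, where one field per set is enough to record which way was used least recently. The following is a sketch under assumed sizes (64 sets, 32-byte blocks) with hypothetical names; real hardware may approximate LRU differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5u               /* 32-byte blocks */
#define SET_BITS    6u               /* assumed: 64 sets */
#define NUM_SETS    (1u << SET_BITS)

static_assert(OFFSET_BITS + SET_BITS < 32u, "tag must be non-empty");

typedef struct {
    bool     valid[2];
    uint32_t tag[2];
    unsigned lru;                    /* way that was used least recently */
} sa_set;

typedef struct {
    sa_set sets[NUM_SETS];
} sa_cache;

/* Returns true on a hit; on a miss the LRU way of the set is replaced. */
static bool sa_access(sa_cache *c, uint32_t addr)
{
    sa_set  *s   = &c->sets[(addr >> OFFSET_BITS) & (NUM_SETS - 1u)];
    uint32_t tag = addr >> (OFFSET_BITS + SET_BITS);

    for (unsigned way = 0; way < 2u; ++way) {
        if (s->valid[way] && s->tag[way] == tag) {
            s->lru = 1u - way;       /* the other way is now the older one */
            return true;
        }
    }

    unsigned victim = s->lru;
    s->valid[victim] = true;
    s->tag[victim]   = tag;
    s->lru           = 1u - victim;
    return false;
}
```

Starting from a zeroed cache, the first two misses in a set fill way 0 and then way 1, so no valid line is evicted while an empty way remains.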
Cache Writes
No-Write
  writes invalidate the cache and go to memory
Write-Through
  writes go to main memory and cache
Write-Back
  write cache, write main memory only when block is evicted
[Figure: CPU, Cache (SRAM), and Memory (DRAM) connected by addr and data lines]
Dirty Bits and Write-Back Buffers
Dirty bits indicate which lines have been written
Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory
Write-back buffer
  A queue where dirty lines are placed
  Items added to the end as dirty lines are evicted from the cache
  Items removed from the front as memory writes are completed
[Figure: cache line layout: V (valid bit), D (dirty bit), Tag, Data (Byte 0, Byte 1 … Byte N)]
Misses
Three types of misses
  Cold
    The line is being referenced for the first time
  Capacity
    The line was evicted because the cache was not large enough
  Conflict
    The line was evicted because of another access whose index conflicted
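One common way to tell the three miss types apart in a simulator is to run a fully associative LRU cache of the same capacity alongside the cache under study. A miss on a block that was never seen is cold; a miss that the fully associative cache would also take is a capacity miss; otherwise it is a conflict miss. The sketch below applies that idea to a 4-line direct-mapped cache. All sizes and names are assumptions, and it works on block numbers rather than byte addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES  4u    /* assumed cache capacity, in blocks */
#define MAX_BLOCKS 64u   /* assumed: block numbers stay below this */

typedef enum { HIT, COLD, CAPACITY, CONFLICT } outcome;

typedef struct {
    bool     seen[MAX_BLOCKS];      /* has this block ever been referenced? */
    bool     dm_valid[NUM_LINES];   /* direct-mapped cache under study */
    uint32_t dm_block[NUM_LINES];
    uint32_t fa_block[NUM_LINES];   /* fully associative LRU, most recent first */
    unsigned fa_count;
} classifier;

static outcome classify(classifier *c, uint32_t block)
{
    assert(block < MAX_BLOCKS);

    /* Reference cache: find the block, then move it to the front. */
    unsigned pos = 0;
    while (pos < c->fa_count && c->fa_block[pos] != block)
        ++pos;
    bool fa_hit = pos < c->fa_count;
    if (!fa_hit) {
        if (c->fa_count < NUM_LINES)
            c->fa_count++;
        pos = c->fa_count - 1u;     /* new slot, or the LRU entry when full */
    }
    for (; pos > 0; --pos)
        c->fa_block[pos] = c->fa_block[pos - 1u];
    c->fa_block[0] = block;

    /* Cache under study: direct mapped, index = block mod NUM_LINES. */
    unsigned index = block % NUM_LINES;
    bool dm_hit = c->dm_valid[index] && c->dm_block[index] == block;
    c->dm_valid[index] = true;
    c->dm_block[index] = block;

    bool first_reference = !c->seen[block];
    c->seen[block] = true;

    if (dm_hit)
        return HIT;
    if (first_reference)
        return COLD;
    return fa_hit ? CONFLICT : CAPACITY;
}
```

For instance, referencing blocks 0, 4, 0 in that order makes the third reference a conflict miss: blocks 0 and 4 share index 0, yet a fully associative cache of four lines would still hold block 0.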
Cache Design
Need to determine parameters
  Block size
  Number of ways
  Eviction policy
  Write policy
  Separate I-cache from D-cache
Virtual vs. Physical Caches
L1 (on-chip) caches are typically virtual
L2 (off-chip) caches are typically physical
[Figure: two arrangements of CPU, MMU, Cache (SRAM), and Memory (DRAM) on the addr/data path]
  MMU between CPU and cache: cache works on physical addresses
  Cache between CPU and MMU: cache works on virtual addresses
Cache Conscious Programming
Speed up this program
int a[NCOL][NROW];
int sum = 0;
int i, j;
for(i = 0; i < NROW; ++i)
  for(j = 0; j < NCOL; ++j)
    sum += a[j][i];
Cache Conscious Programming
Every access is a cache miss!
int a[NCOL][NROW];
int sum = 0;
int i, j;
for(i = 0; i < NROW; ++i)
  for(j = 0; j < NCOL; ++j)
    sum += a[j][i];   /* inner loop jumps NROW ints between accesses */
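[Figure: array grid numbered in access order; consecutive accesses (1, 2, 3, ...) run down a column, so each one lands in a different row]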
Cache Conscious Programming
Same program, trivial transformation, three out of four accesses hit in the cache
int a[NCOL][NROW];
int sum = 0;
int i, j;
for(j = 0; j < NCOL; ++j)
  for(i = 0; i < NROW; ++i)
    sum += a[j][i];   /* inner loop walks adjacent ints */
[Figure: array grid numbered in access order; consecutive accesses (1, 2, 3, ...) run along a row before moving to the next row]
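The effect of loop order can be checked with a small miss-counting model. The sketch below feeds both traversals of `a[NCOL][NROW]` through a hypothetical direct-mapped cache. The array size (64 by 64), 4-byte ints, 16-byte lines, and 32-line cache are assumptions chosen so that four ints share a line and the array is much larger than the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NROW        64u
#define NCOL        64u
#define INT_BYTES   4u    /* assumed sizeof(int) */
#define BLOCK_BYTES 16u   /* assumed: four ints per cache line */
#define NUM_LINES   32u   /* assumed: 512-byte direct-mapped cache */

static_assert((NROW * INT_BYTES) % BLOCK_BYTES == 0u,
              "each a[j] is assumed to start on a block boundary");

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t block[NUM_LINES];
    unsigned misses;
} tiny_cache;

/* Model one read of a[j][i], with the array starting at address 0. */
static void touch(tiny_cache *c, unsigned j, unsigned i)
{
    uint32_t block = ((j * NROW + i) * INT_BYTES) / BLOCK_BYTES;
    uint32_t index = block % NUM_LINES;
    if (!c->valid[index] || c->block[index] != block) {
        c->misses++;
        c->valid[index] = true;
        c->block[index] = block;
    }
}

/* i in the outer loop: the inner loop strides NROW ints per access. */
static unsigned misses_i_outer(void)
{
    tiny_cache c = {0};
    for (unsigned i = 0; i < NROW; ++i)
        for (unsigned j = 0; j < NCOL; ++j)
            touch(&c, j, i);
    return c.misses;
}

/* j in the outer loop: the inner loop walks adjacent ints. */
static unsigned misses_j_outer(void)
{
    tiny_cache c = {0};
    for (unsigned j = 0; j < NCOL; ++j)
        for (unsigned i = 0; i < NROW; ++i)
            touch(&c, j, i);
    return c.misses;
}
```

Under these assumptions the i-outer order misses on all 4096 accesses, while the j-outer order misses 1024 times, once per four accesses. Other cache geometries would give different counts.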