Outline • Cache writes • DRAM configurations • Performance • Associative caches • Multi-level caches


Outline

• Cache writes

• DRAM configurations

• Performance

• Associative caches

• Multi-level caches

Direct-mapped Cache (block size = 4 words, word size = 4 bytes)

• Address fields: Tag | Index | Block Offset | Byte Offset
  – 8-bit byte addresses: 2-bit tag, 2-bit index (4 blocks, 00-11), 2-bit block offset, 2-bit byte offset
• Initial contents (Valid, Tag, Data):
  – Index 00: 1, 01, M[64-79]
  – Index 01: 1, 11, M[208-223]
  – Index 10: 1, 00, M[32-47]
  – Index 11: 0, not valid
• Reference stream (Hit/Miss):
  – 0b01001000: tag 01, index 00 → H
  – 0b00010100: tag 00, index 01 → M (M[16-31] replaces M[208-223])
  – 0b00111000: tag 00, index 11 → M (M[48-63] fills the invalid block)
  – 0b00010000: tag 00, index 01 → H
• Final contents: M[64-79], M[16-31], M[32-47], M[48-63]
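A minimal Python sketch that replays this walkthrough, assuming 8-bit byte addresses and modeling only tags and valid bits (no data). The helper names `split` and `run` are illustrative, not from the slides; `None` stands for an invalid block.

```python
# Sketch of a direct-mapped cache: tags only, None = not valid.
BLOCK_BYTES = 16   # 4 words x 4 bytes
NUM_BLOCKS = 4     # indexes 0b00..0b11


def split(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, index, offset)."""
    offset = addr % BLOCK_BYTES
    block = addr // BLOCK_BYTES
    return block // NUM_BLOCKS, block % NUM_BLOCKS, offset


def run(cache: list, refs: list[int]) -> list[str]:
    """Return 'H' or 'M' for each reference, updating cache in place."""
    results = []
    for addr in refs:
        tag, index, _ = split(addr)
        if cache[index] == tag:
            results.append("H")
        else:
            cache[index] = tag  # fill on a miss, evicting whatever was there
            results.append("M")
    return results


# Initial contents: M[64-79], M[208-223], M[32-47], not valid
cache = [0b01, 0b11, 0b00, None]
refs = [0b01001000, 0b00010100, 0b00111000, 0b00010000]
print(run(cache, refs))  # ['H', 'M', 'M', 'H']
```

Because the index alone picks the block, a lookup needs only one tag comparison; that simplicity is what the later associative examples trade away.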

Cache Writes

• There are multiple copies of the data lying around
  – L1 cache, L2 cache, DRAM
• Do we write to all of them?
• Do we wait for the write to complete before the processor can proceed?

Do we write to all of them?

• Write-through: write to all levels of the hierarchy
• Write-back: write to the lower level only when the cache line is evicted from the cache
  – Creates inconsistent data: different values for the same item in the cache and DRAM
  – Inconsistent data in the highest cache level is referred to as dirty
  – If all copies match, they are clean
  – The old data is stale

Write-Through

[Figure: CPU, L1, L2 cache, and DRAM handling the store sw $3, 0($5)]

Write-Back

[Figure: CPU, L1, L2 cache, and DRAM handling the store sw $3, 0($5)]

Write-through vs Write-back

• Which performs the write faster?
  – Write-back: it only writes the L1 cache
• Which has faster evictions from a cache?
  – Write-through: no write is involved; the evicted block is simply overwritten with the new tag and data
• Which causes more bus traffic?
  – Write-through: DRAM is written on every store. Write-back writes only on eviction.
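To make the bus-traffic answer concrete, here is a hypothetical Python sketch that counts DRAM writes for a list of store addresses. It assumes a single direct-mapped cache (4 blocks of 16 bytes by default) that allocates on a store miss, ignores loads, and flushes any remaining dirty blocks at the end; `dram_writes` is an illustrative name.

```python
def dram_writes(stores, policy, num_blocks=4, block_bytes=16):
    """Count DRAM writes for a list of store addresses under one policy."""
    tags = [None] * num_blocks
    dirty = [False] * num_blocks
    writes = 0
    for addr in stores:
        block = addr // block_bytes
        index, tag = block % num_blocks, block // num_blocks
        if tags[index] != tag:                 # miss: replace the block
            if policy == "write-back" and dirty[index]:
                writes += 1                    # evicting a dirty block
            tags[index], dirty[index] = tag, False
        if policy == "write-through":
            writes += 1                        # every store goes to DRAM
        else:
            dirty[index] = True                # only the cached copy changes
    if policy == "write-back":
        writes += sum(dirty)                   # flush dirty blocks at the end
    return writes


stores = [0, 4, 8, 12] * 3  # 12 stores into one 16-byte block
print(dram_writes(stores, "write-through"), dram_writes(stores, "write-back"))  # 12 1
```

Repeated stores to one block favor write-back heavily; for a stream where every store evicts a dirty block, the two policies end up writing DRAM equally often.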

Does the processor wait for the write?

• Write buffer: an intermediate queue for pending writes
  – Any load must check the write buffer in parallel with the cache access.
  – Buffer values are more recent than cache values.
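A sketch of the write-buffer idea, assuming a simple FIFO of `(address, value)` pairs in front of the next lower level (modeled as a dict). The class and method names are hypothetical. Hardware checks the buffer in parallel with the cache; this sequential version only shows why the buffered value must win.

```python
from collections import deque


class WriteBuffer:
    """Queue of pending (address, value) writes to the next lower level."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pending = deque()  # oldest write on the left

    def store(self, addr, value, lower):
        # A full buffer forces one write to drain first; real hardware
        # would stall the processor here instead.
        if len(self.pending) == self.capacity:
            self.drain_one(lower)
        self.pending.append((addr, value))

    def drain_one(self, lower):
        addr, value = self.pending.popleft()
        lower[addr] = value

    def load(self, addr, lower):
        # Newest buffered write wins: it is more recent than the lower level.
        for a, v in reversed(self.pending):
            if a == addr:
                return v
        return lower.get(addr)


dram = {0x10: 1}
wb = WriteBuffer(capacity=2)
wb.store(0x10, 7, dram)
print(wb.load(0x10, dram), dram[0x10])  # 7 1
wb.drain_one(dram)
print(wb.load(0x10, dram), dram[0x10])  # 7 7
```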


Challenge

• DRAM is designed for density, not speed
• DRAM is slower than the bus
• We can change the width, the number of DRAMs, and the bus protocol, but the access latency stays slow.
• Widening anything increases the cost considerably.

Narrow Configuration

[Figure: CPU, cache, bus, and DRAM]

• Given:
  – 1 clock cycle to send the request
  – 15 cycles / word DRAM latency
  – 1 cycle / word bus latency
• If a cache block is 8 words, what is the miss penalty of an L2 cache miss?
• 1 cycle + 15 cycles/word * 8 words + 1 cycle/word * 8 words = 129 cycles

Wide Configuration

[Figure: CPU, cache, bus, and DRAM]

• Given:
  – 1 clock cycle to send the request
  – 15 cycles / 2 words DRAM latency
  – 1 cycle / 2 words bus latency
• If a cache block is 8 words, what is the miss penalty of an L2 cache miss?
• 1 cycle + 15 cycles/2 words * 8 words + 1 cycle/2 words * 8 words = 65 cycles

Interleaved Configuration

[Figure: CPU, cache, bus, and two DRAM banks]

• Given:
  – 1 clock cycle to send the request
  – 15 cycles / word DRAM latency
  – 1 cycle / word bus latency
• If a cache block is 8 words, what is the miss penalty of an L2 cache miss?
• 1 cycle + 15 cycles/2 words * 8 words + 1 cycle/word * 8 words = 69 cycles
  – The two banks are accessed in parallel, so each 15-cycle access delivers 2 words, but the bus still moves one word per cycle.
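The three miss penalties follow one formula: request + (DRAM accesses × 15) + (bus transfers × 1). A short Python check; the parameter names are ours, and the interleaved case assumes two banks accessed in parallel over a one-word bus, as in the slide's arithmetic.

```python
def miss_penalty(block_words, request=1, dram_cycles=15, bus_cycles=1,
                 words_per_dram_access=1, words_per_bus_transfer=1):
    """Cycles to fetch one block: request + DRAM accesses + bus transfers."""
    dram_accesses = block_words // words_per_dram_access
    bus_transfers = block_words // words_per_bus_transfer
    return request + dram_accesses * dram_cycles + bus_transfers * bus_cycles


narrow = miss_penalty(8)                                    # 1 + 8*15 + 8*1
wide = miss_penalty(8, words_per_dram_access=2, words_per_bus_transfer=2)
interleaved = miss_penalty(8, words_per_dram_access=2)      # 2 banks, 1-word bus
print(narrow, wide, interleaved)  # 129 65 69
```

Interleaving recovers most of the wide configuration's benefit on the DRAM side without widening the bus.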

Recent DRAM trends

• Fewer, bigger DRAMs
• New bus protocols (RAMBUS)
• Small DRAM caches (page mode)
• SDRAM (synchronous DRAM)
  – One request plus a length yields several consecutive responses.


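Performance

• Execution time = (CPU cycles + Memory-stall cycles) * Clock cycle time
• Memory-stall cycles:

$$\text{Memory-stall cycles} = \frac{\text{Accesses}}{\text{Program}} \times \frac{\text{Misses}}{\text{Access}} \times \frac{\text{Cycles}}{\text{Miss}} = \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty}$$

$$\text{Memory-stall cycles} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \frac{\text{Cycles}}{\text{Miss}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}$$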

Example 1

• Instruction cache miss rate: 2%
• Data cache miss rate: 3%
• Miss penalty: 50 cycles
• Loads/stores are 25% of instructions
• CPI with a perfect cache is 2.3
• How much faster is the computer with a perfect cache?

Example 1

• Misses per instruction (I = instruction count, C = clock cycle time):

$$\frac{\text{misses}}{\text{instr}} = \frac{\text{I-acc}}{\text{instr}} \times \text{I-mr} + \frac{\text{D-acc}}{\text{instr}} \times \text{D-mr} = 1 \times 0.02 + 0.25 \times 0.03 = 0.02 + 0.0075 = 0.0275$$

• Memory cycles = I * .0275 * 50 = 1.375 * I
• ExecT = (CPU CPI * I + Memory cycles) * C
  = (2.3 * I + 1.375 * I) * C = 3.675 IC
• Speedup = 3.675 IC / 2.3 IC ≈ 1.6
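Example 1 recomputed in Python as a quick arithmetic check. Instruction count and clock period cancel out of the speedup, so the sketch works per instruction; the variable names are illustrative.

```python
i_miss_rate, d_miss_rate = 0.02, 0.03
ld_st_fraction = 0.25      # data accesses per instruction
miss_penalty = 50          # cycles
perfect_cpi = 2.3

# Every instruction is fetched once (1 I-access); 25% also access data.
misses_per_instr = 1 * i_miss_rate + ld_st_fraction * d_miss_rate
real_cpi = perfect_cpi + misses_per_instr * miss_penalty
print(round(misses_per_instr, 4), round(real_cpi, 3))  # 0.0275 3.675
print(round(real_cpi / perfect_cpi, 2))                # 1.6
```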

Example 2

• Double the clock rate from Example 1. Ideally that would double performance; what is the speedup when the memory system is taken into account?
• How long is the miss penalty now? 100 cycles (memory is no faster, so the same latency now spans twice as many cycles)
• Memory cycles = I * .0275 * 100 = 2.75 * I
• ExecT = (2.3 * I + 2.75 * I) * C/2 = 5.05 I (C/2) = 2.525 IC
• Speedup = old / new = 3.675 IC / 2.525 IC ≈ 1.46
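The same check for Example 2, assuming (as the slide does) that memory latency in seconds is unchanged, so the miss penalty doubles to 100 cycles while the clock period halves. Variable names are illustrative.

```python
misses_per_instr = 1 * 0.02 + 0.25 * 0.03   # unchanged from Example 1
perfect_cpi = 2.3
clock = 1.0                                 # old clock period C (arbitrary unit)

old_time = (perfect_cpi + misses_per_instr * 50) * clock         # per instruction
new_time = (perfect_cpi + misses_per_instr * 100) * (clock / 2)  # 2x clock rate
print(round(new_time, 3), round(old_time / new_time, 2))  # 2.525 1.46
```

Doubling the clock rate yields well under a 2x speedup because the memory stalls, measured in cycles, double.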


Direct-mapped Cache (block size = 2 words, word size = 4 bytes)

• Address fields: 3-bit tag, 2-bit index (4 blocks, 00-11), 1-bit block offset, 2-bit byte offset
• Initial contents:
  – Index 00: tag 101, M[160-167]
  – Index 01: tag 010, M[72-79]
  – Index 10: tag 000, M[16-23]
  – Index 11: not valid
• Reference stream (Hit/Miss):
  – 0b00111000: tag 001, index 11 → M (M[56-63] fills the invalid block)
  – 0b00011100: tag 000, index 11 → M (M[24-31] replaces M[56-63])
  – 0b00111000: tag 001, index 11 → M (M[56-63] replaces M[24-31])
  – 0b00011000: tag 000, index 11 → M (M[24-31] replaces M[56-63])
• Every reference maps to index 11, so the two blocks keep evicting each other while the other three blocks go unused.
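Replaying the conflict example with a minimal direct-mapped model (illustrative names; tags only, `None` = not valid) shows that every reference lands on index 11 and misses.

```python
BLOCK_BYTES, NUM_BLOCKS = 8, 4   # 2-word blocks, 4 blocks


def access(cache, addr):
    """Look up one address; on a miss, replace the block at its index."""
    block = addr // BLOCK_BYTES
    index, tag = block % NUM_BLOCKS, block // NUM_BLOCKS
    if cache[index] == tag:
        return "H"
    cache[index] = tag
    return "M"


cache = [0b101, 0b010, 0b000, None]   # M[160-167], M[72-79], M[16-23], not valid
refs = [0b00111000, 0b00011100, 0b00111000, 0b00011000]
print({(a // BLOCK_BYTES) % NUM_BLOCKS for a in refs})  # {3}
results = [access(cache, a) for a in refs]
print(results)  # ['M', 'M', 'M', 'M']
```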


Problem

• Conflicting addresses cause high miss rates


Solution

• Relax the direct-mapping

• Allow each address to be mapped into 2 or 4 locations (a set)

Cache Configurations

[Figure: the same Valid/Tag/Data entries organized three ways; a block is one entry and a set is the group of blocks selected by one index]

• Direct-mapped: indexes 00, 01, 10, 11, one block each
• 2-way associative: each set (index 0 or 1) has two blocks
• Fully associative: all addresses map to the same set

2-way Set Associative Cache (block size = 2 words, word size = 4 bytes)

• Address fields: 4-bit tag, 1-bit index (2 sets of 2 blocks), 1-bit block offset, 2-bit byte offset
• Initial contents (all valid):
  – Set 0: tags 1001 and 0000
  – Set 1: tags 0010 and 0001
• Reference stream (Hit/Miss), the same stream as the direct-mapped example:
  – 0b00111000: tag 0011, set 1 → M (replaces the block with tag 0010)
  – 0b00011100: tag 0001, set 1 → H
  – 0b00111000: tag 0011, set 1 → H
  – 0b00011000: tag 0001, set 1 → H
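A minimal 2-way set-associative model replaying the same stream. Each set keeps its tags in least-to-most recently used order; LRU replacement and the initial order within each set are assumptions, since the slides show only one replacement (the block with tag 0010). Names are illustrative.

```python
BLOCK_BYTES, NUM_SETS, WAYS = 8, 2, 2  # 2-word blocks, 2 sets of 2 blocks


def access(sets, addr):
    """Look up one address in its set; replace the LRU block on a miss."""
    block = addr // BLOCK_BYTES
    index, tag = block % NUM_SETS, block // NUM_SETS
    ways = sets[index]            # tags in this set, least recently used first
    if tag in ways:               # hardware compares both tags in parallel
        ways.remove(tag)
        ways.append(tag)          # now the most recently used
        return "H"
    if len(ways) == WAYS:
        ways.pop(0)               # evict the least recently used block
    ways.append(tag)
    return "M"


# Initial tags from the slide; the LRU order within each set is assumed.
sets = [[0b1001, 0b0000], [0b0010, 0b0001]]
refs = [0b00111000, 0b00011100, 0b00111000, 0b00011000]
results = [access(sets, a) for a in refs]
print(results)  # ['M', 'H', 'H', 'H']
```

The same four references that all missed in the direct-mapped cache now cost a single miss, because both conflicting blocks fit in set 1 at once.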

Implementation

[Figure: 2-way set associative lookup. The byte address is split into Tag, Index, Block offset, and Byte offset; the index selects set 0 or 1, the tag is compared (=) against both blocks' tags to produce Hit?, and multiplexers (MUX) select the matching block's Data and the word given by the block offset.]


Performance Implications

• Increasing associativity increases hit rate

• Increasing associativity increases access time

• Increasing associativity has no effect on miss penalty

Example: 2-way associative

• 7-bit byte addresses: 3-bit tag, 1-bit index (sets 0 and 1), 1-bit block offset, 2-bit byte offset
• The cache starts empty (all valid bits 0)
• Reference stream (Hit/Miss):
  – 0b1001000: tag 100, set 1 → M
  – 0b0011100: tag 001, set 1 → M (set 1 now holds tags 100 and 001)
  – 0b1001000: tag 100, set 1 → H
  – 0b0111000: tag 011, set 1 → M; set 1 is full, so one of its blocks must be replaced

Which block to replace?

• 0b1001000: it entered the cache first
  – FIFO: First In, First Out
• 0b0011100: it has gone the longest since it was used
  – LRU: Least Recently Used
• Random
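A hypothetical sketch comparing FIFO and LRU on this stream. It assumes 2-word (8-byte) blocks and 2 sets, which matches the 3-bit tags on the slides; `simulate` and its return values are illustrative. Both policies miss on the fourth reference, but they evict different blocks.

```python
BLOCK_BYTES, NUM_SETS, WAYS = 8, 2, 2


def simulate(refs, policy):
    """Return (hit/miss list, evicted block numbers) for 'FIFO' or 'LRU'."""
    sets = [[] for _ in range(NUM_SETS)]  # each list is ordered victim-first
    results, evicted = [], []
    for addr in refs:
        block = addr // BLOCK_BYTES
        index, tag = block % NUM_SETS, block // NUM_SETS
        ways = sets[index]
        if tag in ways:
            if policy == "LRU":       # LRU reorders on a hit; FIFO does not
                ways.remove(tag)
                ways.append(tag)
            results.append("H")
            continue
        if len(ways) == WAYS:
            victim_tag = ways.pop(0)
            evicted.append(victim_tag * NUM_SETS + index)
        ways.append(tag)
        results.append("M")
    return results, evicted


refs = [0b1001000, 0b0011100, 0b1001000, 0b0111000]
for policy in ("FIFO", "LRU"):
    results, evicted = simulate(refs, policy)
    ranges = [f"M[{b * BLOCK_BYTES}-{(b + 1) * BLOCK_BYTES - 1}]" for b in evicted]
    print(policy, results, ranges)
# FIFO ['M', 'M', 'H', 'M'] ['M[72-79]']
# LRU ['M', 'M', 'H', 'M'] ['M[24-31]']
```

FIFO only needs to remember insertion order, while LRU must update its ordering on every hit, which is why LRU becomes expensive at high associativity.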

Replacement Algorithms

• LRU and FIFO are conceptually simple, but difficult to implement for high associativity
• With high associativity, LRU and FIFO must be approximated
• Random is sometimes better than approximated LRU/FIFO
• Trade-off between accuracy and implementation cost

[Figure: L1, L2 cache, and DRAM; from the L1 cache's perspective, L2 and DRAM together are "memory"]

• L1's miss penalty contains the access of L2, and possibly the access of DRAM!

Multi-level Caches

• Base CPI 1.0, 500 MHz clock
• Main memory: 100 cycles; L2: 10 cycles
• L1 miss rate per instruction: 5%
• With L2, 2% of instructions go to DRAM
• What is the speedup with the L2 cache?

There is a typo in the book for this example!

Multi-level Caches

• CPI = 1 + memory stalls / instruction
• CPI_old = 1 + 5% misses/instr * 100 cycles/miss = 1 + 5 = 6 cycles/instr
• CPI_new = 1 + L2% * L2 penalty + Mem% * Mem penalty
  = 1 + 5% * 10 + 2% * 100 = 3.5
  = 1 + (5 - 2)% * 10 + 2% * (10 + 100) = 3.5
• Speedup = 6 / 3.5 ≈ 1.7
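The multi-level calculation as a short Python check, working per instruction. Variable names are ours; the second form charges the 2% of instructions that go to DRAM for both the L2 access and the DRAM access, and gives the same CPI.

```python
base_cpi = 1.0
l1_misses = 0.05       # L1 misses per instruction
dram_accesses = 0.02   # instructions that also miss in L2 and go to DRAM
l2_cycles, dram_cycles = 10, 100

cpi_old = base_cpi + l1_misses * dram_cycles                          # no L2
cpi_new = base_cpi + l1_misses * l2_cycles + dram_accesses * dram_cycles
cpi_alt = (base_cpi + (l1_misses - dram_accesses) * l2_cycles
           + dram_accesses * (l2_cycles + dram_cycles))
print(round(cpi_old, 2), round(cpi_new, 2), round(cpi_alt, 2))  # 6.0 3.5 3.5
print(round(cpi_old / cpi_new, 2))  # 1.71
```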


• DO GROUPWORK NOW

Summary

• Direct-mapped
  – simple
  – fast access time
  – marginal hit rate
• Variable block size
  – still simple
  – fast access time
  – higher hit rate by exploiting spatial locality

Summary

• Associative caches
  – increase the access time
  – increase the hit rate
  – associativity above 8 has little to no gain
• Multi-level caches
  – increase the worst-case miss penalty (because time is spent accessing another cache)
  – reduce the average miss penalty (because so many misses are caught and handled quickly)