Review
CPSC 321
Andreas Klappenecker
Announcements
• Tuesday, November 30, midterm exam
Cache
• Placement strategies
  • direct mapped
  • fully associative
  • set-associative
• Replacement strategies
  • random
  • FIFO
  • LRU
• Mapping: address modulo the number of blocks in the cache, x -> x mod B
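The mapping rule x -> x mod B can be tried out with a short simulation. This is only a minimal sketch, assuming one block per cache entry and a trace of block addresses; the function name `direct_mapped_misses` and the trace are made up for illustration.

```python
def direct_mapped_misses(block_addresses, num_blocks):
    """Count the misses of a direct mapped cache with num_blocks entries."""
    cache = [None] * num_blocks        # cache[i] = block address stored in entry i
    misses = 0
    for x in block_addresses:
        index = x % num_blocks         # placement rule: x -> x mod B
        if cache[index] != x:          # empty entry or a different block: miss
            misses += 1
            cache[index] = x           # the new block replaces the old one
    return misses

# Blocks 1 and 9 both map to entry 1 of an 8-block cache and evict each other.
trace = [0b00001, 0b01001, 0b00001, 0b00101, 0b00101]
print(direct_mapped_misses(trace, 8))  # 4 misses; only the last access hits
```

Blocks whose addresses agree in the low bits compete for the same entry, which is the conflict behaviour the other placement strategies try to reduce.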
Direct Mapped Cache
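[Figure: direct mapped cache with 8 blocks (indices 000 to 111). The memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 map to the cache blocks 001 and 101, given by their low three bits.]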
Set Associative Caches
• Each block maps to a unique set
• The block can be placed into any element of that set
• Position is given by (Block number) modulo (# of sets in cache)
• If the sets contain n elements, then the cache is called n-way set associative
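The set selection rule can be combined with a replacement strategy in a few lines. The sketch below assumes LRU replacement within each set and a trace of block numbers; the function name `set_associative_misses` is hypothetical.

```python
from collections import OrderedDict

def set_associative_misses(block_numbers, num_sets, ways):
    """Count the misses of an n-way set associative cache with LRU replacement."""
    # One ordered dict per set; the least recently used block comes first.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for x in block_numbers:
        s = sets[x % num_sets]           # (block number) modulo (# of sets)
        if x in s:
            s.move_to_end(x)             # hit: mark as most recently used
        else:
            misses += 1
            if len(s) == ways:           # set is full: evict the LRU block
                s.popitem(last=False)
            s[x] = True
    return misses

trace = [1, 9, 1, 5, 5]
print(set_associative_misses(trace, num_sets=8, ways=1))  # direct mapped
print(set_associative_misses(trace, num_sets=4, ways=2))  # 2-way set associative
print(set_associative_misses(trace, num_sets=1, ways=8))  # fully associative
```

With `ways=1` the model degenerates to a direct mapped cache, and with `num_sets=1` to a fully associative one, so the three placement strategies are special cases of the same rule.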
• Cache with 1024 = 2^10 words
• The tag from the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss
What kind of locality are we taking advantage of?
Direct Mapped Cache
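[Figure: direct mapped cache with 1024 one-word entries (index 0 to 1023), each holding a valid bit, a 20-bit tag, and 32 bits of data. Address bits 31-12 form the tag, bits 11-2 the index, and bits 1-0 the byte offset. The stored tag is compared with the address tag to produce Hit; the selected entry supplies Data.]
The index is determined by the word address mod 1024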
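The address fields of the 1024-word cache can be extracted with shifts and masks. This is a small sketch assuming 32-bit byte addresses; `split_address` is an illustrative name.

```python
def split_address(addr):
    """Split a 32-bit byte address into (tag, index, byte offset)."""
    byte_offset = addr & 0x3           # bits 1-0
    index = (addr >> 2) & 0x3FF        # bits 11-2, i.e. word address mod 1024
    tag = addr >> 12                   # bits 31-12, the upper 20 bits
    return tag, index, byte_offset

tag, index, byte_offset = split_address(0x12345678)
print(hex(tag), index, byte_offset)
```

Two addresses hit the same cache entry exactly when their index fields agree; the tag then decides whether the entry holds the requested word.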
• Taking advantage of spatial locality:
Direct Mapped Cache
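[Figure: direct mapped cache with 4K entries, each holding a valid bit, a 16-bit tag, and 128 bits of data (four 32-bit words). Address bits 31-16 form the tag, bits 15-4 the index, bits 3-2 the block offset, and bits 1-0 the byte offset. A multiplexor controlled by the block offset selects one of the four words.]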
Address Determination
Reconstruction of the memory address: address = tag bits || set index bits || block offset || byte offset
Example:
• 32-bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped
• byte offset = 2 bits, block offset = 3 bits, set index bits = 9 bits (4096/8 = 2^9 blocks), tag bits = 18 bits (32 – 9 – 3 – 2, assuming 32-bit addresses)
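The concatenation tag || set index || block offset || byte offset can be checked by splitting an address and putting it back together. The sketch uses the field widths of the example and assumes 32-bit addresses; the names `split` and `reconstruct` are made up for illustration.

```python
# Field widths from the example: 18 + 9 + 3 + 2 = 32 address bits.
TAG_BITS, INDEX_BITS, BLOCK_BITS, BYTE_BITS = 18, 9, 3, 2

def split(addr):
    """Return (tag, set index, block offset, byte offset) of a 32-bit address."""
    byte_offset = addr & ((1 << BYTE_BITS) - 1)
    addr >>= BYTE_BITS
    block_offset = addr & ((1 << BLOCK_BITS) - 1)
    addr >>= BLOCK_BITS
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, index, block_offset, byte_offset

def reconstruct(tag, index, block_offset, byte_offset):
    """Concatenate the fields: tag || index || block offset || byte offset."""
    addr = tag
    addr = (addr << INDEX_BITS) | index
    addr = (addr << BLOCK_BITS) | block_offset
    addr = (addr << BYTE_BITS) | byte_offset
    return addr

addr = 0x12345678
print(split(addr), hex(reconstruct(*split(addr))))
```

The cache stores only the tag; the index is implied by the position of the entry, and the offsets select the word and byte inside the block.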
Example
• Suppose you want to realize a cache with a capacity of 8 KB of data (32-bit addresses). Assume that the block size is 4 words and that a word consists of 4 bytes.
• How many bits are needed to realize a direct mapped cache?
  • 8 KByte = 2K words = 512 blocks = 2^9 blocks
  • direct mapped => # index bits = log2(2^9) = 9
  • 2^9 x (128 + (32 – 9 – 2 – 2) + 1) = 2^9 x 148 bits = number of blocks x (bits per block + tag + valid bit)
• How many bits are needed to realize an 8-way set associative cache?
  • Number of tag bits increases by 3. Why?
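The bit count can be reproduced with a small calculation. The sketch below assumes 32-bit addresses and counts only data, tag, and valid bits (no bits for replacement bookkeeping); `cache_bits` and `log2_exact` are hypothetical helper names.

```python
def log2_exact(n):
    """Return log2(n) for a power of two n."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1

def cache_bits(data_bytes, words_per_block, ways=1, address_bits=32, bytes_per_word=4):
    """Return (tag bits, total bits) for a cache with the given geometry."""
    block_bytes = words_per_block * bytes_per_word
    num_blocks = data_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = log2_exact(block_bytes)             # block offset + byte offset
    index_bits = log2_exact(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = 8 * block_bytes + tag_bits + 1   # data + tag + valid bit
    return tag_bits, num_blocks * bits_per_block

for ways in (1, 8):
    tag_bits, total = cache_bits(8 * 1024, 4, ways)
    print(f"{ways}-way: tag = {tag_bits} bits, total = {total} bits")
```

With 8 ways the 512 blocks are grouped into 512/8 = 2^6 sets, so only 6 index bits are needed; the 3 bits that no longer select a set have to be stored in the tag instead.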
Typical Questions
• Show the evolution of a cache
• Determine the number of bits needed in an implementation of a cache
• Know the placement and replacement strategies
• Be able to design a cache according to specifications
• Determine the number of cache misses
• Measure cache performance
Typical Questions
• What kind of placement is typically used in virtual memory systems?
• What is a translation lookaside buffer?
• Why is a TLB used?
Pages: virtual memory blocks
• Page faults: if data is not in memory, retrieve it from disk
  • huge miss penalty, thus pages should be fairly large (e.g., 4KB)
  • reducing page faults is important (LRU is worth the price)
  • can handle the faults in software instead of hardware
  • using write-through takes too long so we use writeback
• Example: page size 2^12 = 4KB; 2^18 physical pages; main memory <= 1GB; virtual memory <= 4GB
[Figure: translation of a 32-bit virtual address (virtual page number in bits 31-12, page offset in bits 11-0) into a 30-bit physical address (physical page number in bits 29-12, page offset in bits 11-0). The page offset is copied unchanged.]
Page Faults
• Incredibly high penalty for a page fault
• Reduce number of page faults by optimizing page placement
• Use fully associative placement
  • full search of pages is impractical
  • pages are located by a full table that indexes the memory, called the page table
  • the page table resides within the memory
Page Tables
[Figure: page table indexed by virtual page number. Each entry holds a valid bit and a physical page or disk address; entries with valid = 1 point into physical memory, entries with valid = 0 point into disk storage.]
The page table maps each page to either a page in main memory or to a page stored on disk
Page Tables
[Figure: the page table register points to the page table. The 20-bit virtual page number (virtual address bits 31-12) selects an entry, which supplies a valid bit and an 18-bit physical page number. The physical page number is concatenated with the 12-bit page offset to form the physical address (bits 29-0). If the valid bit is 0, then the page is not present in memory.]
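The translation step can be written down in a few lines. This is a sketch only, assuming 4KB pages as in the example; the page table is modelled as a Python dict from virtual page number to a (valid, physical page number) pair, and `translate` and `PageFault` are illustrative names.

```python
PAGE_OFFSET_BITS = 12                      # 4KB pages

class PageFault(Exception):
    """Raised when the requested page is not present in main memory."""

def translate(vaddr, page_table):
    """Translate a virtual address into a physical address."""
    vpn = vaddr >> PAGE_OFFSET_BITS                  # virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)   # copied unchanged
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(vpn)               # page must be fetched from disk
    return (ppn << PAGE_OFFSET_BITS) | offset

# Hypothetical page table with one resident page and one page on disk.
page_table = {0x12345: (True, 0x2ABCD), 0x00001: (False, None)}
print(hex(translate(0x12345678, page_table)))
```

A real page table is an array in main memory located through the page table register; the dict stands in for that array to keep the example short.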
Making Memory Access Fast
• Page tables slow us down
• Memory access will take at least twice as long
  • access page table in memory
  • access page
• What can we do?
Memory access is local => use a cache that keeps track of recently used address translations, called a translation lookaside buffer (TLB)
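The effect of caching translations can be illustrated with a toy model. The sketch assumes a small fully associative TLB with LRU replacement (both are assumptions, not taken from the slides) and a page table given as a dict from virtual to physical page numbers; page faults are not modelled.

```python
from collections import OrderedDict

class TLB:
    """Toy translation lookaside buffer: fully associative, LRU replacement."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table      # dict: virtual page -> physical page
        self.capacity = capacity
        self.entries = OrderedDict()      # least recently used translation first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:           # translation is cached
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        self.misses += 1
        ppn = self.page_table[vpn]        # the extra access to the page table
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)   # evict the LRU translation
        self.entries[vpn] = ppn
        return ppn

tlb = TLB({0: 5, 1: 6, 2: 7}, capacity=2)
for vpn in [0, 0, 1, 0, 2, 1]:
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)
```

Only a TLB miss pays for the additional page table access, so a high hit rate brings the cost of a memory access back close to a single access.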
Making Address Translation Fast
A cache for address translations: translation lookaside buffer
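[Figure: the TLB holds a subset of the page table entries. Each TLB entry has a valid bit, a tag (the virtual page number), and a physical page address. Each page table entry has a valid bit and a physical page or disk address, pointing into physical memory or disk storage.]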
MIPS Processor and Variations
Datapath for MIPS instructions
Note the seven control signals!
Single Cycle Datapath
Pipelined Version
Obstacles to Pipelining
• Structural Hazards
  • hardware cannot support the combination of instructions in the same clock cycle
• Control Hazards
  • need to make decision based on results of one instruction while others are still executing
• Data Hazards
  • instruction depends on results of instruction still in pipeline
• Control Hazards Resolution (for branch)
  • stall pipeline
  • predict result
  • delayed branch
Stall on Branch
• Assume that all branch computations are done in stage 2
• Delay by one cycle to wait for the result
Branch Prediction
• Predict branch result
• For example, always predict that the branch is not taken (e.g., reasonable for while loops)
  • if choice is correct, then pipeline runs at full speed
  • if choice is incorrect, then pipeline stalls
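The benefit of prediction over stalling can be estimated with a simple cycle count. The sketch assumes a 5-stage pipeline, a one-cycle bubble per stalled or mispredicted branch (branch decided in stage 2), and no other hazards; the function name `pipeline_cycles` and the instruction counts are made up for illustration.

```python
def pipeline_cycles(num_instructions, num_branches, num_taken, policy, stages=5):
    """Estimate the cycle count of a program under a branch handling policy."""
    if policy == "stall":
        bubbles = num_branches            # every branch waits one cycle
    elif policy == "predict_not_taken":
        bubbles = num_taken               # only wrong predictions cost a cycle
    else:
        raise ValueError(f"unknown policy: {policy}")
    # (stages - 1) cycles fill the pipeline, then one instruction completes per cycle.
    return (stages - 1) + num_instructions + bubbles

# Hypothetical program: 100 instructions, 20 branches, 8 of them taken.
print(pipeline_cycles(100, 20, 8, "stall"))
print(pipeline_cycles(100, 20, 8, "predict_not_taken"))
```

The fewer branches are actually taken, the closer predict-not-taken comes to the ideal of one instruction per cycle.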
Branch Prediction
Delayed Branch
Data Hazards
• A data hazard results if an instruction depends on the result of a previous instruction
  • add $s0, $t0, $t1
  • sub $t2, $s0, $t3 // $s0 to be determined
• These dependencies happen often, so it is not possible to avoid them completely
• Use forwarding to get missing data from internal resources once available
Forwarding
add $s0, $t0, $t1
sub $t2, $s0, $t3
Typical Questions
• Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards.
• Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; search online for examples).
Example
add $1, $2, $3 _ _ _ _ _
add $4, $5, $6 _ _ _ _ _
add $7, $8, $9 _ _ _ _ _
add $10, $11, $12 _ _ _ _ _
add $13, $14, $1 _ _ _ _ _ (data arrives early OK)
add $15, $16, $7 _ _ _ _ _ (data arrives on time OK)
add $17, $18, $13 _ _ _ _ _ (uh, oh)
add $19, $20, $17 _ _ _ _ _ (uh, oh)
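The annotations in this example follow from the distance between an instruction and the instruction that produces its operand. The sketch below assumes a 5-stage pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second half, so a distance of 3 instructions is just enough; `parse` and `classify` are hypothetical helper names, and the check ignores any stalls inserted earlier.

```python
def parse(instr):
    """Split 'add $d, $s, $t' into (destination, [sources])."""
    _op, regs = instr.split(None, 1)
    dest, *srcs = [r.strip() for r in regs.split(",")]
    return dest, srcs

def classify(program, safe_distance=3):
    """Label each instruction by the distance to the nearest producer of its operands."""
    last_writer = {}                      # register -> index of its latest writer
    labels = []
    for i, instr in enumerate(program):
        dest, srcs = parse(instr)
        distances = [i - last_writer[r] for r in srcs if r in last_writer]
        if not distances:
            labels.append("independent")
        elif min(distances) > safe_distance:
            labels.append("data arrives early")
        elif min(distances) == safe_distance:
            labels.append("data arrives on time")
        else:
            labels.append("hazard")
        last_writer[dest] = i
    return labels

program = [
    "add $1, $2, $3", "add $4, $5, $6", "add $7, $8, $9", "add $10, $11, $12",
    "add $13, $14, $1", "add $15, $16, $7", "add $17, $18, $13", "add $19, $20, $17",
]
for instr, label in zip(program, classify(program)):
    print(f"{instr:20} {label}")
```

The last two instructions read a register written only two and one instructions earlier, which is why they are the ones that need stalls or forwarding.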
Verilog
Mixed Questions