41
L17 Pipeline Issues & Memory 1 Comp 411 CPU Pipelining Issues This pipe stuff makes my head hurt! What have you been beating your head against?

CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

  • Upload
    phamnhi

  • View
    224

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 1Comp 411

CPU Pipelining Issues

This pipe stuff makesmy head hurt!

What have you been beating your head

against?

Page 2: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 2Comp 411

Pipelining

Improve performance by increasing instruction throughput

Ideal speedup is number of stages in the pipeline. Do we

achieve this?

Instruction

fetchReg ALU

Data

accessReg

8 nsInstruction

fetchReg ALU

Data

accessReg

8 nsInstruction

fetch

8 ns

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 4 6 8 10 12 14 16 18

2 4 6 8 10 12 14

...

Program

execution

order

(in instructions)

Instruction

fetchReg ALU

Data

accessReg

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 nsInstruction

fetchReg ALU

Data

accessReg

2 nsInstruction

fetchReg ALU

Data

accessReg

2 ns 2 ns 2 ns 2 ns 2 ns

Program

execution

order

(in instructions)

Page 3: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 3Comp 411

Pipelining

What makes it easyall instructions are the same length

just a few instruction formats

memory operands appear only in loads and stores

What makes it hard?structural hazards: suppose we had only one memory

control hazards: need to worry about branch instructions

data hazards: an instruction depends on a previous instruction

Individual Instructions still take the same number of cycles

But we’ve improved the through-put by increasing the number of simultaneously executing instructions

Page 4: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 4Comp 411

Structural Hazards

Inst

Fetch

Reg

Read

ALU Data

Access

Reg Write

Inst

Fetch

Reg

Read

ALU Data

Access

Reg Write

Inst

Fetch

Reg

Read

ALU Data

Access

Reg Write

Inst

Fetch

Reg

Read

ALU Data

Access

Reg Write

Page 5: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 5Comp 411

Problem with starting next instruction before first is finisheddependencies that “go backward in time” are data hazards

Data Hazards

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Program

execution

order

(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of

register $2:

DM Reg

Reg

Reg

Reg

DM

Page 6: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 6Comp 411

Have compiler guarantee no hazards

Where do we insert the “nops” ?

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Problem: this really slows us down!

Software Solution

Page 7: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 7Comp 411

Use temporary results, don’t wait for them to be written register file forwarding to handle read/write to same register ALU forwarding

Forwarding

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Program

execution order

(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of register $2 :

DM Reg

Reg

Reg

Reg

X X X – 20 X X X X XValue of EX/MEM :

X X X X – 20 X X X XValue of MEM/WB :

DM

Page 8: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 8Comp 411

Load word can still cause a hazard:an instruction tries to read a register following a load instruction that writes to

the same register.

Thus, we need a hazard detection unit to “stall” the instruction

Can't always forward

Reg

IM

Reg

Reg

IM

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

lw $2, 20($1)

Program

execution

order

(in instructions)

and $4, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

DM Reg

Reg

Reg

DM

Page 9: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 9Comp 411

Stalling

We can stall the pipeline by keeping an instruction in the same stage

lw $2, 20($1)

Program

execution

order

(in instructions)

and $4, $2, $5

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

Reg

IM

Reg

Reg

IM DM

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

IM Reg DM RegIM

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9 CC 10

DM Reg

RegReg

Reg

bubble

Page 10: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 10Comp 411

When we decide to branch, other instructions are in the pipeline!

We are predicting “branch not taken”need to add hardware for flushing instructions if we are wrong

Branch Hazards

Reg

Reg

CC 1

Time (in clock cycles)

40 beq $1, $3, 7

Program

execution

order

(in instructions)

IM Reg

IM DM

IM DM

IM DM

DM

DM Reg

Reg Reg

Reg

Reg

RegIM

44 and $12, $2, $5

48 or $13, $6, $2

52 add $14, $2, $2

72 lw $4, 50($7)

CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

Reg

Page 11: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 11Comp 411

Improving Performance

Try to avoid stalls! E.g., reorder these instructions:

lw $t0, 0($t1)

lw $t2, 4($t1)

sw $t2, 0($t1)

sw $t0, 4($t1)

Add a “branch delay slot”the next instruction after a branch is always executed

rely on compiler to “fill” the slot with something useful

Superscalar: start more than one instruction in the same cycle

Page 12: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 12Comp 411

Dynamic Scheduling

The hardware performs the “scheduling”

hardware tries to find instructions to execute

out of order execution is possible

speculative execution and dynamic branch prediction

All modern processors are very complicated

Pentium 4: 20 stage pipeline, 6 simultaneous instructions

PowerPC and Pentium: branch history table

Compiler technology important

Page 13: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 13Comp 411

Pipeline Summary (I)

• Started with unpipelined implementation– direct execute, 1 cycle/instruction

– it had a long cycle time: mem + regs + alu + mem + wb

• We ended up with a 5-stage pipelined implementation– increase throughput (3x???)

– delayed branch decision (1 cycle)

Choose to execute instruction after branch

– delayed register writeback (3 cycles)

Add bypass paths (6 x 2 = 12) to forward correct value

– memory data available only in WB stage

Introduce NOPs at IRALU, to stall IF and RF stages until LD result was ready

Page 14: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 14Comp 411

Pipeline Summary (II)

Fallacy #1: Pipelining is easySmart people get it wrong all of the time!

Fallacy #2: Pipelining is independent of ISAMany ISA decisions impact how easy/costly it is to implement pipelining (i.e. branch semantics, addressing modes).

Fallacy #3: Increasing Pipeline stages improves performanceDiminishing returns. Increasing complexity.

Page 15: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 15Comp 411

RISC = Simplicity???

Generalization of registers and

operand coding

Complex instructions, addressing modes

Addressingfeatures, eg

index registers

RISCs

Primitive Machines with direct

implementations

VLIWs,Super-Scalars ?

Pipelines, Bypasses,Annulment, …, ...

“The P.T. Barnum World’s Tallest Dwarf Competition”

World’s Most Complex RISC?

Page 16: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 16Comp 411

Memory Hierarchy

Why are you dressed like that? Halloween was weeks ago!

It makes me look faster,don’t you think?

•Memory Flavors•Principle of Locality•Program Traces•Memory Hierarchies•Associativity

(Study Chapter 5)

Page 17: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 17Comp 411

What Do We Want in a Memory?

PC

INST

MADDR

MDATA

miniMIPS MEMORY

Capacity Latency Cost

Register 1000’s of bits 10 ps $$$$

SRAM 1’s Mbytes 0.2 ns $$$

DRAM 10’s Gbytes 10 ns $

Hard disk* 10’s Tbytes 10 ms ¢

Want?

* non-volatile

ADDR

DOUT

ADDR

DATA

R/WWr

100 Gbytes 0.2 ns cheap

Page 18: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 18Comp 411

Tricks for Increasing Throughput

Row

Ad

dre

ss D

ecod

erCol.

1Col.2

Col.3

Col.2M

Row 1

Row 2

Row 2N

Column Multiplexer/ShifterN

N

Multiplexed Address

bit lines word lines

memorycell

(one bit)

Dt1 t2 t3 t4

The first thing that should pop into your mind when asked to speed up a digital design…

PIPELINING

Synchronous DRAM(SDRAM)

($25 per Gbyte)

Clock

Dataout

Double-clockedSynchronous DRAM

(SDRAM)

Page 19: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 19Comp 411

Hard Disk Drives

Typical drive:• Average latency = 4 ms (7200 rpm)• Average seek time = 8.5 ms• Transfer rate = 140 Mbytes/s (SATA)• Capacity = 1.5 T byte• Cost = $149 (10¢ G byte)

figu

res

from

ww

w.p

ctec

hgui

de.c

om

Page 20: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 20Comp 411

Quantity vs Quality…

Your memory system can be• BIG and SLOW... or• SMALL and FAST.

10-8 10-3 1 100

.1

10

1000

100

1

10-6

DVD Burner (0.06$/G, 150ms)

DISK (0.10$/GB, 10 mS)

DRAM (25$/GB, 5 ns)

SRAM (5000$/GB, 0.2 ns)

AccessTime

.01

$/GB

We’ve explored a range of device-design trade-offs.

Is there an ARCHITECTURAL solution to this DELIMA?

1

Page 21: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 21Comp 411

Managing Memory via Programming

• In reality, systems are built with a mixture of all these various memory types

• How do we make the most effective use of each memory? • We could push all of these issues off to programmers

• Keep most frequently used variables and stack in SRAM• Keep large data structures (arrays, lists, etc) in DRAM• Keep bigger data structures on disk (databases) on DISK

• It is harder than you think… data usage evolves over a program’s execution

CPU

SRAMMAINMEM

Page 22: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 22Comp 411

Best of Both Worlds

What we REALLY want: A BIG, FAST memory!(Keep everything within instant access)

We’d like to have a memory system that• PERFORMS like 10 GBytes of SRAM; but• COSTS like 1-4 Gbytes of slow memory.

SURPRISE: We can (nearly) get our wish!

KEY: Use a hierarchy of memory technologies:

CPU

SRAMMAINMEM

Page 23: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 23Comp 411

Key IDEA

• Keep the most often-used data in a small, fast SRAM (often local to CPU chip)

• Refer to Main Memory only rarely, for remaining data.

The reason this strategy works: LOCALITY

Locality of Reference:

Reference to location X at time t implies

that reference to location X+X at time

t+t becomes more probable as X and

t approach zero.

Page 24: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 24Comp 411

Cache

cache (kash) n.

A hiding place used especially for storing provisions.

A place for concealment and safekeeping, as of valuables.

The store of goods or valuables concealed in a hiding place.

Computer Science. A fast storage buffer in the central processing unit of a computer. In this sense, also called cache memory.

v. tr. cached, cach·ing, cach·es.

To hide or store in a cache.

Page 25: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 25Comp 411

Cache Analogy

You are writing a term paper at a table in the library

As you work you realize you need a book

You stop writing, fetch the reference, continue writing

You don’t immediately return the book, maybe you’ll need it again

Soon you have a few books at your table and no longer have to fetch more books

The table is a CACHE for the rest of the library

Page 26: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 26Comp 411

Typical Memory Reference Patterns

time

address

data

stack

program

MEMORY TRACE –A temporal sequenceof memory references (addresses) from areal program.

TEMPORAL LOCALITY –If an item is referenced,it will tend to be referenced again soon

SPATIAL LOCALITY –If an item is referenced,nearby items will tendto be referenced soon.

Page 27: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 27Comp 411

Working Set

time

address

data

stack

program

t

|S|

t

S is the set of locations accessed during t.

Working set: a set S which changes slowly w.r.t. access time.

Working set size, |S|

Page 28: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 28Comp 411

Exploiting the Memory Hierarchy

Approach 1 (Cray, others): Expose Hierarchy

• Registers, Main Memory,

Disk each available asstorage alternatives;

• Tell programmers: “Use them cleverly”

Approach 2: Hide Hierarchy• Programming model: SINGLE kind of memory, single address

space.

• Machine AUTOMATICALLY assigns locations to fast or slow memory, depending on usage patterns.

CPU

SRAMMAINMEM

CPU SmallStatic

DynamicRAM

HARDDISK

“MAIN MEMORY”

Page 29: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 29Comp 411

Why We Care

CPU SmallStatic

DynamicRAM

HARDDISK

“MAIN MEMORY”

TRICK #1: How to make slow MAIN MEMORY appear faster than it is.

CPU performance is dominated by memory performance.More significant than:

ISA, circuit optimization, pipelining, super-scalar, etc

TRICK #2: How to make a small MAIN MEMORY appear bigger than it is.

“VIRTUAL MEMORY”“SWAP SPACE”

Technique: VIRTUAL MEMORY

“CACHE”

Technique: CACHEING

Page 30: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 30Comp 411

The Cache Idea:Program-Transparent Memory Hierarchy

Cache contains TEMPORARY COPIES of selectedmain memory locations... eg. Mem[100] = 37

GOALS: 1) Improve the average access time

2) Transparency (compatibility, programming ease)

1.0 (1.0-)

CPU

"CACHE"

DYNAMICRAM

"MAINMEMORY"

100 37

(1-)

HIT RATIO: Fraction of refs found in CACHE.MISS RATIO: Remaining references.

Challenge:To make thehit ratio ashigh aspossible.mcmccave t)(t)tt)((tt 11

Page 31: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 31Comp 411

How High of a Hit Ratio?

Suppose we can easily build an on-chip static memory with a 0.8 nS access time, but the fastest dynamic memories that we can buy for main memory have an average access time of 10 nS. How high of a hit rate do we need to sustain an average access time of 1 nS?

m

cave

t

tt 1 %98

10

8.011

WOW, a cache really needs to be good?

Page 32: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 32Comp 411

CacheSits between CPU and main memory

Very fast table that stores a TAG and DATA

TAG is the memory address

DATA is a copy of memory at the address given by TAG

1000 17

1040 1

1032 97

1008 11

1000 17

1004 23

1008 11

1012 5

1016 29

1020 38

1024 44

1028 99

1032 97

1036 25

1040 1

1044 4

Memory

Tag Data

Page 33: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 33Comp 411

Cache Access

On load we look in the TAG entries for the address we’re loading

Found a HIT, return the DATA

Not Found a MISS, go to memory for the data and put it and

the address (TAG) in the cache

1000 17

1004 23

1008 11

1012 5

1016 29

1020 38

1024 44

1028 99

1032 97

1036 25

1040 1

1044 4

Memory

1000 17

1040 1

1032 97

1008 11

Tag Data

Page 34: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 34Comp 411

Cache Lines

Usually get more data than requested (Why?)

a LINE is the unit of memory stored in the cache

usually much bigger than 1 word, 32 bytes per line is common

bigger LINE means fewer misses because of spatial locality

but bigger LINE means longer time on miss

1000 17 23

1040 1 4

1032 97 25

1008 11 5

Tag Data

1000 17

1004 23

1008 11

1012 5

1016 29

1020 38

1024 44

1028 99

1032 97

1036 25

1040 1

1044 4

Memory

Page 35: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 35Comp 411

Finding the TAG in the Cache

A 1MByte cache may have 32k different lines each of 32 bytes

We can’t afford to sequentially search the 32k different tags

ASSOCIATIVE memory uses hardware to compare the address to

the tags in parallel but it is expensive and 1MByte is thus unlikely

TAG Data

= ?

TAG Data

= ?

TAG Data

= ?

IncomingAddress

HIT

DataOut

Page 36: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 36Comp 411

Finding the TAG in the Cache

A 1MByte cache may have 32k different lines each of 32 bytes

We can’t afford to sequentially search the 32k different tags

ASSOCIATIVE memory uses hardware to compare the address to

the tags in parallel but it is expensive and 1MByte is thus unlikely

DIRECT MAPPED CACHE computes the cache entry from the

address

multiple addresses map to the same cache line

use TAG to determine if right

Choose some bits from the address to determine the Cache line

low 5 bits determine which byte within the line

we need 15 bits to determine which of the 32k different lines

has the data

which of the 32 – 5 = 27 remaining bits should we use?

Page 37: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 37Comp 411

Direct-Mapping Example

1024 44 99

1000 17 23

1040 1 4

1016 29 38

Tag Data

1000 17

1004 23

1008 11

1012 5

1016 29

1020 38

1024 44

1028 99

1032 97

1036 25

1040 1

1044 4

Memory

•With 8 byte lines, the bottom 3 bits determine the byte within the line

•With 4 cache lines, the next 2 bits determine which line to use

1024d = 10000000000b line = 00b = 0d

1000d = 01111101000b line = 01b = 1d

1040d = 10000010000b line = 10b = 2d

Page 38: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 38Comp 411

Direct Mapping Miss

1024 44 99

1000 17 23

1040 1 4

1016 29 38

Tag Data

1000 17

1004 23

1008 11

1012 5

1016 29

1020 38

1024 44

1028 99

1032 97

1036 25

1040 1

1044 4

Memory

•What happens when we now ask for address 1008?

1008d = 01111110000b line = 10b = 2d

but earlier we put 1040d there...

1040d = 10000010000b line = 10b = 2d

1008 11 5

Page 39: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 39Comp 411

Miss Penalty and Rate

The MISS PENALTY is the time it takes to read the memory if it isn’t

in the cache

50 to 100 cycles is common.

The MISS RATE is the fraction of accesses which MISS

The HIT RATE is the fraction of accesses which HIT

MISS RATE + HIT RATE = 1

Suppose a particular cache has a MISS PENALTY of 100 cycles

and a HIT RATE of 95%. The CPI for load on HIT is 5 but on a

MISS it is 105. What is the average CPI for load?

Average CPI = 10

Suppose MISS PENALTY = 120 cycles?

then CPI = 11 (slower memory doesn’t hurt much)

5 * 0.95 + 105 * 0.05 = 10

Page 40: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 40Comp 411

Some Associativity can help

Direct-Mapped caches are very common but can cause problems...

SET ASSOCIATIVITY can help.

Multiple Direct-mapped caches, then compare multiple TAGS2-way set associative = 2 direct mapped + 2 TAG comparisons

4-way set associative = 4 direct mapped + 4 TAG comparisons

Now array size == power of 2 doesn’t get us in trouble

Butslower

less memory in same area

maybe direct mapped wins...

Page 41: CPU Pipelining Issues - Computer Sciencegb/Comp411Fall2011/media/L17-Pipeline... · Comp 411 L17 –Pipeline Issues & Memory 1 CPU Pipelining Issues This pipe stuff makes my head

L17 – Pipeline Issues & Memory 41Comp 411

What about store?

What happens in the cache on a store?WRITE BACK CACHE put it in the cache, write on replacement

WRITE THROUGH CACHE put in cache and in memory

What happens on store and a MISS?WRITE BACK will fetch the line into cache

WRITE THROUGH might just put it in memory