54
1 CHAPTER 6

Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

Embed Size (px)

Citation preview

Page 1: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

1

CHAPTER 6

Page 2: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

2

Pipelining

• Improve performance by increasing instruction throughput

Instruction class

Instructionmemory

Registerread

ALUData

memoryRegister

writeTotal

(in ps)

Load word 200 100 200 200 100 800

Store word 200 100 200 200 700

R-format 200 100 200 100 600

Branch 200 100 200 500

Ideal speedup is number of stages in the pipeline. Do we achieve this?

Ins tru ction�fe tch R eg A LU D ata �

acc ess R eg

8 n s Ins tru ction�fe tch R eg A LU D ata �

ac cess R eg

8 n sIns tru ction�

fe tch

8 ns

T im e

lw $ 1, 10 0 ($0 )

lw $ 2, 20 0 ($0 )

lw $ 3, 30 0 ($0 )

2 4 6 8 1 0 1 2 14 16 1 8

2 4 6 8 1 0 1 2 14

...

P rog ram �e xecution �o rd er�(in in struc tio ns )

Ins truc tion �fe tch R eg ALU D a ta�

access R eg

T im e

lw $1 , 1 00 ($ 0)

lw $2 , 2 00 ($ 0)

lw $3 , 3 00 ($ 0)

2 ns Ins truc tion �fe tch R eg ALU D a ta�

access R eg

2 nsIns truc tion �

fe tc h R eg A LU D a ta�access R eg

2 ns 2 n s 2 n s 2 ns 2 n s

P rog ram �e xecut io n�o rd er�( in in struc tio n s)

Page 3: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

3

Pipelining

• What makes it easy– all instructions are the same length– just a few instruction formats– memory operands appear only in loads and stores

• What makes it hard?– structural hazards: suppose we had only one memory– control hazards: need to worry about branch instructions– data hazards: an instruction depends on a previous instruction

• We’ll build a simple pipeline and look at these issues

• We’ll talk about modern processors and what really makes it hard:– exception handling– trying to improve performance with out-of-order execution, etc.

Page 4: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

4

Hazards

A=B+EC=B+F

lw $t1, 0($t0)lw $t2, 4($t0)add $t3, $t1, $t2sw $t3, 12($t0)lw $t4, 8($t0)add $t5, $t1, $t4sw $t5, 16($t0)

lw $t1, 0($t0)lw $t2, 4($t0)lw $t4, 8($t0)add $t3, $t1, $t2sw $t3, 12($t0)add $t5, $t1, $t4sw $t5, 16($t0)

Page 5: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

5

Basic Idea

What do we need to add to actually split the datapath into stages?

Page 6: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

6

Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Pipelined datapath

Page 7: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

7

Five Stages (lw)

Memory and registersLeft half: writeRight half: read

Page 8: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

8

Five Stages (lw)

Page 9: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

9

Five Stages (lw)

Page 10: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

10

What is wrong with this datapath?

Page 11: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

11

• Can help with answering questions like:– How many cycles does it take to execute this code?– What is the ALU doing during cycle 4?– Use this representation to help understand datapaths

Graphically representing pipelines

Page 12: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

12

Pipeline operation

• In pipeline one operation begins in every cycle• Also, one operation completes in each cycle• Each instruction takes 5 clock cycles

– k cycles in general, where k is pipeline depth• When a stage is not used, no control needs to be applied• In one clock cycle, several instructions are active • Different stages are executing different instructions• How to generate control signals for them is an issue

Page 13: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

13

Pipeline control

• We have 5 stages. What needs to be controlled in each stage?– Instruction Fetch and PC Increment– Instruction Decode / Register Fetch– Execution– Memory Stage– Write Back

• How would control be handled in an automobile plant?– A fancy control center telling everyone what to do?– Should we use a finite state machine?

Page 14: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

14

PC

Instruction�memory

Address

Inst

ruct

ion

Instruction�[20– 16]

MemtoReg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction�[15– 0]

0

0Registers

Write�register

Write�data

Read�data 1

Read�data 2

Read�register 1

Read�register 2

Sign�extend

M�u�x

1Write�data

Read�data M�

u�x

1

ALU�control

RegWrite

MemRead

Instruction�[15– 11]

6

IF/ID ID/EX EX/MEM MEM/WB

MemWrite

Address

Data�memory

PCSrc

Zero

Add Add�result

Shift�left 2

ALU�result

ALUZero

Add

0

1

M�u�x

0

1

M�u�x

Pipeline control

Page 15: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

15

Execution/Address Calculation stage control

linesMemory access stage

control lines

Write-back stage control

lines

InstructionReg Dst

ALU Op1

ALU Op0

ALU Src

Branch

Mem Read

Mem Write

Reg write

Mem to Reg

R-format 1 1 0 0 0 0 0 1 0lw 0 0 0 1 0 1 0 1 1sw X 0 0 1 0 0 1 0 Xbeq X 0 1 0 1 0 0 0 X

Pipeline control

Page 16: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

16

PC

Instruction�memory

Inst

ruct

ion

Add

Instruction�[20– 16]

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction�[15– 0]

0

0

M�u�x

0

1

Add Add�result

RegistersW rite�register

W rite�data

Read�data 1

Read�data 2

Read�register 1

Read�register 2

Sign�extend

M�u�x

1

ALU�result

Zero

Write�data

Read�data

M�u�x

1

ALU�control

Shift�left 2

Reg

Writ

e

MemRead

Control

ALU

Instruction�[15– 11]

6

EX

M

W B

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

M�u�x

0

1

Mem

Writ

e

AddressData�

memory

Address

Datapath with control

Page 17: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

17

• Problem with starting next instruction before first is finished– Dependencies that “go backward in time” are data hazards

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Program�execution�order�(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of �register $2:

DM Reg

Reg

Reg

Reg

DM

Dependencies

Page 18: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

18

• Use temporary results, don’t wait for them to be written– register file forwarding to handle read/write to same register– ALU forwarding

Programexecutionorder(in instructions)

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14,$2 , $2

sw $15, 100($2)

Forwarding

Time (in clock cycles)CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

IM DMReg Reg

IM DMReg Reg

IM DMReg Reg

IM DMReg Reg

IM DMReg Reg

10 10 10 10 10/–20 –20 –20 –20 –20Value of register $2:Value of EX/MEM: X X X –20 X X X X XValue of MEM/WB: X X X X –20 X X X X

Page 19: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

19

PC Instruction�memory

Registers

M�u�x

M�u�x

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Data�memory

M�u�x

Forwarding�unit

IF/ID

Inst

ruct

ion

M�u�x

RdEX/MEM.RegisterRd

MEM/WB.RegisterRd

Rt

Rt

Rs

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

Forwardingsub $2, $1, $3and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15, 100($2)

Page 20: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

20

• Load word can still cause a hazard:– an instruction tries to read a register following a load instruction

that writes to the same register.

• Thus, we need a hazard detection unit to “stall” the load instruction

Can't always forward

Programexecutionorder(in instructions)

lw $2, 20($1)

and $4, $2, $5

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

Time (in clock cycles)CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

IM DMReg Reg

IM DMReg Reg

IM DMReg Reg

IM DMReg Reg

IM DMReg Reg

Page 21: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

21

ForwardingForward from EX/MEM registers

If (EX/MEM.RegWrite)and If (EX/MEM.Rd != 0)

and (ID/EX.Rs == EX/MEM.Rd)

Forward from MEM/WB registers

If (MEM/WB.RegWrite)and If (MEM/WB.Rd != 0)

and If (ID/EX.Rt==EX/MEM.Rd)

Page 22: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

22

lw $2, 20($1)

Program�execution�order�(in instructions)

and $4, $2, $5

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

Reg

IM

Reg

Reg

IM DM

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6Time (in clock cycles)

IM Reg DM RegIM

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9 CC 10

DM Reg

RegReg

Reg

bubble

Stalling

• Hardware detection and no-op insertion is called stalling• Stall pipeline by keeping instruction in the same stage

Page 23: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

23

Example

Page 24: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

24

Page 25: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

25

Stall logic

• Stall logic– If (ID/EX.MemRead) // Load

word instruction AND– If ((ID/EX.Rt == IF/ID.Rs) or

(ID/EX.Rt == IF/ID.Rt))

• Insert no-op (no-operation)– Deasserting all control

signals

• Stall following instruction– Not writing program counter– Not writing IF/ID registers

PCWrite

IF/ID.RsIF/ID.Rt

ID/EX.Rt

Page 26: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

26

Pipeline with hazard detection

Page 27: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

27

Summary

Page 28: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

28

Forwarding Case Summary

Page 29: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

29

Multi-cycle

Page 30: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

30

Multi-cycle

Page 31: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

31

Multi-cycle Pipeline

Page 32: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

32

PC

Instruction�memory

Inst

ruct

ion

Add

Instruction�[20– 16]

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction�[15– 0]

0

0

M�u�x

0

1

Add Add�result

RegistersW rite�register

W rite�data

Read�data 1

Read�data 2

Read�register 1

Read�register 2

Sign�extend

M�u�x

1

ALU�result

Zero

Write�data

Read�data

M�u�x

1

ALU�control

Shift�left 2

Reg

Writ

e

MemRead

Control

ALU

Instruction�[15– 11]

6

EX

M

W B

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

M�u�x

0

1

Mem

Writ

e

AddressData�

memory

Address

Branch Hazards

Page 33: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

33

• When we decide to branch, other instructions are in the pipeline!• We are predicting “branch not taken”

– need to add hardware for flushing instructions if we are wrong

Reg

Reg

CC 1

Time (in clock cycles)

40 beq $1, $3, 7

Program�execution�order�(in instructions)

IM Reg

IM DM

IM DM

IM DM

DM

DM Reg

Reg Reg

Reg

Reg

RegIM

44 and $12, $2, $5

48 or $13, $6, $2

52 add $14, $2, $2

72 lw $4, 50($7)

CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

Reg

Branch hazards

Page 34: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

34

Solution to control hazards

• Branch prediction– We are predicting “branch not taken”– Need to add hardware for flushing instructions if we are wrong

• Reduce branch penalty– By advancing the branch decision to ID stage– Compare the data read from two registers read in ID stage– Comparison for equality is a simpler design! (Why?)– Still need to flush instruction in IF stage

• Make the hazard into a feature!– Delayed branch slot - Always execute instruction following

branch

Page 35: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

35

Branch detection in ID stage

Page 36: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

36

Dynamic branch prediction

• Use lower part of instruction address

– Use one bit to say denote branch taken or not taken

– Disadvantage: poor performance in loops

• Dynamic branch prediction– Use two bits instead of one– Condition must be satisfied

twice to predict

• More sophisticated– Count the number of times

branch is taken 2-bit branch predictionState diagram

Page 37: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

37

Correlating Branches• Hypothesis: recent branches are correlated; that is, behavior of recently

executed branches affects prediction of current branch• Idea: record m most recently executed branches as taken or not taken, and

use that pattern to select the proper branch history table• In general, (m,n) predictor means record last m branches to select between

2m history tables each with n-bit counters– Old 2-bit BHT is then a (0,2) predictor

If (aa == 2)aa=0;

If (bb == 2)bb = 0;

If (aa != bb)do something;

Page 38: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

38

XX XX

Branch address

Prediction

2-bit global branch history

2-bit per branch predictors

4

Correlating Branches

(2,2) predictor– Then behavior of

recent branches selects between, say, four predictions of next branch, updating just that prediction

Branch address

2-bits per branch predictors

PredictionPrediction

2-bit global branch history

Page 39: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

39

Freq

uenc

y of

Mis

pred

ictio

ns

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

nasa

7

mat

rix30

0

tom

catv

dodu

cd

spic

e

fppp

p gcc

espr

esso

eqnt

ott li

0%1%

5%6% 6%

11%

4%

6%5%

1%

4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)

4096 Entries 2-bit BHTUnlimited Entries 2-bit BHT1024 Entries (2,2) BHT

Accuracy of Different Schemes

0%

18%

Freq

uenc

y of

Mis

pred

ictio

ns

Page 40: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

40

Branch Prediction

• Sophisticated Techniques:– A “branch target buffer” to help us look up the destination– Correlating predictors that base prediction on global behavior

and recently executed branches (e.g., prediction for a specificbranch instruction based on what happened in previous branches)

– Tournament predictors that use different types of prediction strategies and keep track of which one is performing best.

– A “branch delay slot” which the compiler tries to fill with a useful instruction (make the one cycle delay part of the ISA)

• Branch prediction is especially important because it enables other more advanced pipelining techniques to be effective!

• Modern processors predict correctly 95% of the time!

Page 41: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

41

Branch Target Buffer

• Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken)– Note: must check for branch match now, since can’t use wrong

branch address

• Return instruction addresses predicted with stack

Predicted PCBranch Prediction:Taken or not Taken

Page 42: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

42

Scheduling in delayed branching

Page 43: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

43

Other issues in pipelines

• Exceptions– Errors in ALU for arithmetic instructions– Memory non-availability

• Exceptions lead to a jump in a program• However, the current PC value must be saved so that the program

can return to it back for recoverable errors• Multiple exception can occur in a pipeline• Preciseness of exception location is important in some cases• I/O exceptions are handled in the same manner

Page 44: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

44

Exceptions

Page 45: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

45

Improving Performance

• Try and avoid stalls! E.g., reorder these instructions:

lw $t0, 0($t1)lw $t2, 4($t1)sw $t2, 0($t1)sw $t0, 4($t1)

• Dynamic Pipeline Scheduling– Hardware chooses which instructions to execute next– Will execute instructions out of order (e.g., doesn’t wait for a

dependency to be resolved, but rather keeps going!)– Speculates on branches and keeps the pipeline full

(may need to rollback if prediction incorrect)

• Trying to exploit instruction-level parallelism

Page 46: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

46

Advanced Pipelining

• Increase the depth of the pipeline• Start more than one instruction each cycle (multiple issue)• Loop unrolling to expose more ILP (better scheduling)• “Superscalar” processors

– DEC Alpha 21264: 9 stage pipeline, 6 instruction issue• All modern processors are superscalar and issue multiple

instructions usually with some limitations (e.g., different “pipes”)• VLIW: very long instruction word, static multiple issue

(relies more on compiler technology)

• This class has given you the background you need to learn more!

Page 47: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

47

Superscalar architecture --Two instructions executed in parallel

Page 48: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

48

Dynamically scheduled pipeline

Page 49: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

49

Motorola G4e

Page 50: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

50

Intel Pentium 4

Page 51: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

51

IBM PowerPC 970

Page 52: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

52

Important facts to remember

• Pipelined processors divide execution in multiple steps

• However pipeline hazards reduce performance

– Structural, data, and control hazard

• Data forwarding helps resolve data hazards

– But all hazards cannot be resolved

– Some data hazards require bubble or noop insertion

• Effects of control hazard reduced by branch prediction

– Predict always taken, delayed slots, branch prediction table

– Structural hazards are resolved by duplicating resources

• Time to execute n instructions depends on

– # of stages (k)– # of control hazard and penalty of

each step– # of data hazards and penalty for

each– Time = n + k - 1 + (load hazard

penalty) + (branch penalty)

• Load hazard penalty is 1 or 0 cycle – Depending on data use with

forwarding

• Branch penalty is 3, 2, 1, or zero cycles depending on scheme

Page 53: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

53

Design and performance issues with pipelining

• Pipelined processors are not EASY to design

• Technology affect implementation

• Instruction set design affect the performance

– i.e., beq, bne• More stages do not lead to higher

performance!

Page 54: Chapter 6 Slides - University of Arizona · 2 Pipelining • Improve performance by increasing instruction throughput Instruction class Instruction memory Register read ALU Data memory

54

Chapter 6 Summary

• Pipelining does not improve latency, but does improve throughput

Slower Faster

Instructions per clock (IPC = 1/CPI)

Multicycle(Section 5.5)

Single-cycle(Section 5.4)

Deeplypipelined

Pipelined

Multiple issuewith deep pipeline

(Section 6.10)

Multiple-issuepipelined

(Section 6.9)

1 Several

Use latency in instructions

Multicycle(Section 5.5)

Single-cycle(Section 5.4)

DeeplypipelinedPipelined

Multiple issuewith deep pipeline

(Section 6.10)

Multiple-issuepipelined

(Section 6.9)