Pipelining Prof. James L. Frankel Harvard University Version of 2:05 AM 2-Apr-2019 Copyright © 2019 James L. Frankel. All rights reserved.

Source: sites.fas.harvard.edu/~cscie287/spring2019/slides/Pipelining.pdf


Single-Cycle CPU Implementation

• The clock cycle must be long enough to accommodate the longest instruction

• There is no overlapping of the execution of instructions

• With time-consuming instructions (floating point, memory access), the penalty is high


Pipelining

• Increases throughput: decreases the average number of cycles per instruction (CPI)

• Does not decrease latency: the time to complete any single instruction is not reduced
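The difference between throughput and latency can be checked with a little arithmetic. The sketch below assumes a hypothetical five-stage machine in which every stage takes 200 ps; neither number comes from the slides, and the pipeline is taken to be ideal (no stalls).

```python
# Hypothetical numbers for illustration only; the slides give no timings.
STAGE_TIME_PS = 200      # assumed time for each stage
NUM_STAGES = 5           # IF, ID, EX, MEM, WB

def single_cycle_time(n_instructions):
    """Total time when each instruction occupies the whole datapath by itself."""
    return n_instructions * NUM_STAGES * STAGE_TIME_PS

def pipelined_time(n_instructions):
    """Total time for an ideal pipeline: fill it once, then finish one instruction per cycle."""
    return (NUM_STAGES + n_instructions - 1) * STAGE_TIME_PS

# Latency of one instruction is the same in both designs.
latency = NUM_STAGES * STAGE_TIME_PS

print(single_cycle_time(3), pipelined_time(3), latency)
```

Under these assumptions three instructions take 3000 ps without overlap and 1400 ps with it, while each individual instruction still needs 1000 ps from fetch to write back.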


Memory Organization

• We will allow the assumption that there are “two” memories – one for instructions and one for data

• This is not really the case

• We will have an instruction cache and a data cache
  • These will create the illusion that we have two separate memories


Instruction Set Design for Pipelining

• All instructions are the same length
  • x86 ISA has 1 to 15 bytes per instruction

• Small number of instruction formats
  • Operands can be fetched from registers before determining the instruction type

• Load/store architecture means that the EX (ALU) stage can compute the memory address
  • If operands could come from memory, then the memory access would need to precede the EX stage

• All memory accesses are aligned
  • Only a single memory access is required
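Fixed-length instructions with few formats mean the register-number fields sit at the same bit positions in every instruction, so they can be extracted before the instruction type is known. The following is a minimal sketch of that extraction, using the field positions labeled in the datapath figures ([25..21], [20..16], [15..11], [15..0]); the function name and dictionary keys are hypothetical.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits [31..26]
        "rs":     (word >> 21) & 0x1F,   # bits [25..21], first register to read
        "rt":     (word >> 16) & 0x1F,   # bits [20..16], second register to read
        "rd":     (word >> 11) & 0x1F,   # bits [15..11]
        "imm16":  word & 0xFFFF,         # bits [15..0]
    }

# Encoding of lw $10, 20($1): opcode 0x23, rs = 1, rt = 10, immediate = 20
LW_WORD = (0x23 << 26) | (1 << 21) | (10 << 16) | 20
fields = decode_fields(LW_WORD)
print(fields)
```

The same shifts and masks apply to every instruction word, which is why the register file read can begin in parallel with decoding.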


Data Path Split into Pipeline Stages

[Figure: the datapath divided into five pipeline stages: IF (Instruction Fetch), ID (Instruction Decode/Register File Read), EX (Execute/Address Calculation), MEM (Memory Access), and WB (Write Back). Labeled components: PC (LoadPC, ClearPC), instruction memory (Addr, InsOut), IR (LoadIR), register array (Reg1#, Reg2#, RegW#, DataIn, Reg1, Reg2, RegReadOrWrite), sign extend, shift left two bits, two adders, ALU (ALUFcn incl. Carryin), data memory (Addr, ReadData, WriteData), and muxes controlled by BrchOrJmpOrSquencl, WriteRegFromMemoryOrALU, ALUSecondOpImmOrReg2, and Regw#From20..16Or15..11.]


Right to Left Data Paths are Problematic

[Figure: datapath diagram with the IF, ID, EX, MEM, and WB stages marked.]


Pipelining Three lw Instructions

Instruction Execution Order (top to bottom; each row starts one clock cycle after the row above):

lw $1, 100($0)    IM  Reg ALU DM  Reg
lw $2, 200($0)        IM  Reg ALU DM  Reg
lw $3, 300($0)            IM  Reg ALU DM  Reg
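A multi-clock-cycle diagram like this one can be generated mechanically. The sketch below assumes an ideal pipeline in which every instruction uses the five boxes IM, Reg, ALU, DM, Reg and starts one cycle after its predecessor; the function name is hypothetical.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]

def pipeline_diagram(instructions):
    """Return one text row per instruction, each shifted right by one cycle."""
    label_width = max(len(i) for i in instructions) + 2
    rows = []
    for n, instr in enumerate(instructions):
        cells = [""] * n + STAGES            # n empty cycles before the first stage
        body = " ".join(c.ljust(3) for c in cells).rstrip()
        rows.append(instr.ljust(label_width) + body)
    return rows

rows = pipeline_diagram(["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)"])
print("\n".join(rows))
```

With three instructions the last row ends in cycle 7, that is, five cycles to fill the pipeline plus one for each additional instruction.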


Staging the Pipelined Data

• Add pipeline registers between stages

• Allows the pipeline stages to run independently within the same clock cycle
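One way to picture the pipeline registers is as a row of latches that all capture their upstream neighbour's value on the same clock edge. The sketch below is a simplified model under that assumption: it tracks only which instruction each register holds, not the data and control values, and the function names are hypothetical.

```python
REGISTERS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]

def clock_tick(latches, fetched):
    """Advance one clock: every register latches the old value of the register before it."""
    new = {"IF/ID": fetched}
    for prev, cur in zip(REGISTERS, REGISTERS[1:]):
        new[cur] = latches[prev]             # reads the value from before the edge
    return new

def run(program, cycles):
    """Clock the pipeline and record the register contents after each edge."""
    latches = {name: None for name in REGISTERS}
    history = []
    for cycle in range(cycles):
        fetched = program[cycle] if cycle < len(program) else None
        latches = clock_tick(latches, fetched)
        history.append(dict(latches))
    return history

history = run(["lw $1", "lw $2", "lw $3"], 4)
for snapshot in history:
    print(snapshot)
```

Because each stage only reads the register on its left and writes the register on its right, the stages can work on different instructions during the same cycle without interfering.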

[Figure: datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the IF, ID, EX, MEM, and WB stages.]

lw Instruction in IF Pipeline Stage

[Figure: pipelined datapath with the IF-stage hardware highlighted for lw: PC (LoadPC), an adder, a mux, and instruction memory (Addr, InsOut).]


ID (Instruction Decode) Stage

• No need to keep the IR as a register because the current instruction is stored in the IF/ID pipeline register

lw Instruction in ID Pipeline Stage

[Figure: pipelined datapath with the ID-stage hardware highlighted for lw: sign extend and the register array (Reg1#, Reg2#, RegW#, DataIn, Reg1, Reg2, RegReadOrWrite).]

lw Instruction in EX Pipeline Stage

[Figure: pipelined datapath with the EX-stage hardware highlighted for lw: a mux and the ALU.]

lw Instruction in MEM Pipeline Stage

[Figure: pipelined datapath with the MEM-stage hardware highlighted for lw: data memory (Addr, ReadData, WriteData).]

lw Instruction in WB Pipeline Stage

[Figure: pipelined datapath with the WB-stage hardware highlighted for lw: a mux and the register array (Reg1#, Reg2#, RegW#, DataIn, Reg1, Reg2, RegReadOrWrite).]


Problem with lw in WB Pipeline Stage

• Unfortunately, the incorrect register is written in the WB stage

• The Regw# is coming from the wrong pipeline stage!

• Let’s correct the problem by carrying the Regw# through the pipeline registers together with the value that will be written to that register
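The bug and its fix can be sketched in a few lines. The example assumes lw $10, 20($1) has reached WB while an unrelated instruction whose [20..16] field is 13 sits in the IF/ID register; the loaded value 99, the dictionary layout, and the function name are all hypothetical.

```python
def write_back(regs, mem_wb, if_id, carry_regw):
    """Write the loaded value and return the register number that was written."""
    # Buggy design: the number comes from whatever instruction is now in IF/ID.
    # Corrected design: the number travelled with the value into MEM/WB.
    dest = mem_wb["regw"] if carry_regw else if_id["rt"]
    regs[dest] = mem_wb["value"]
    return dest

mem_wb = {"regw": 10, "value": 99}   # lw $10, 20($1) with an assumed loaded value
if_id = {"rt": 13}                   # a later instruction currently being decoded

buggy = write_back([0] * 32, mem_wb, if_id, carry_regw=False)
fixed = write_back([0] * 32, mem_wb, if_id, carry_regw=True)
print(buggy, fixed)
```

In the buggy case the value lands in register 13; once the write register number is latched alongside the data, it lands in register 10 as intended.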

Corrected Pipeline Diagram for lw Instruction in WB Pipeline Stage

[Figure: corrected pipelined datapath for lw in the WB stage: the 5-bit write register number is passed along through the pipeline registers and supplied to RegW# of the register array together with the value being written.]


Executing an sw Instruction

sw $4, 400($0)    IM  Reg ALU DM  Reg

sw Instruction in MEM Pipeline Stage

[Figure: pipelined datapath with the MEM-stage hardware highlighted for sw: data memory (Addr, WriteData, ReadData).]

sw Instruction in WB Pipeline Stage

[Figure: pipelined datapath for sw in the WB stage.]


Sequence of Instructions

• Imagine the pipelined execution of a sequence of five instructions:

lw $10, 20($1)

sub $11, $2, $3

add $12, $3, $4

lw $13, 24($1)

add $14, $5, $6

Multi-clock-cycle Pipeline Diagram of Above Instructions with Pipeline Registers

Instruction Execution Order (top to bottom; each row starts one clock cycle after the row above):

lw $10, 20($1)    IM  Reg ALU DM  Reg
sub $11, $2, $3       IM  Reg ALU DM  Reg
add $12, $3, $4           IM  Reg ALU DM  Reg
lw $13, 24($1)                IM  Reg ALU DM  Reg
add $14, $5, $6                   IM  Reg ALU DM  Reg

Identification of Control Lines

[Figure: pipelined datapath with its control lines labeled: LoadPC, ClearPC, BrchOrJmpOrSquencl, RegReadOrWrite, Regw#From20..16Or15..11, ALUSecondOpImmOrReg2, ALUFcn incl. Carryin, Branch (with the ALU Zero output), MemRead, MemWrite, and WriteRegFromMemoryOrALU.]


Problem with Control Lines

• Each instruction at each stage of the pipeline needs to have its specific control lines asserted

• Difficult problem to manage the state of each instruction in each stage

• However, what if we stage the control lines with the pipelined data?

Pipelined Control Summary

[Figure: in the ID stage, Control decodes the Instruction held in IF/ID into three groups of control lines (EX, MEM, WB), which then travel through the pipeline registers.]

Pipeline register    Control groups held
ID/EX                EX, MEM, WB
EX/MEM               MEM, WB
MEM/WB               WB
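Staging the control lines can be sketched as follows: the control values are produced once at decode time, grouped by the stage that consumes them, and each pipeline register passes on only the groups still needed. The signal names are taken from the slides, but the values shown and the assignment of signals to groups are assumptions made for illustration, as are the function names.

```python
def decode_control(op):
    """Return control values grouped by the stage that consumes them (assumed grouping)."""
    if op == "lw":
        return {
            "EX":  {"ALUSecondOpImmOrReg2": "imm"},
            "MEM": {"MemRead": 1, "MemWrite": 0},
            "WB":  {"WriteRegFromMemoryOrALU": "memory", "RegReadOrWrite": "write"},
        }
    if op == "sw":
        return {
            "EX":  {"ALUSecondOpImmOrReg2": "imm"},
            "MEM": {"MemRead": 0, "MemWrite": 1},
            "WB":  {"WriteRegFromMemoryOrALU": None, "RegReadOrWrite": "read"},
        }
    raise ValueError(f"unsupported op: {op}")

def pass_along(ctrl, consumed):
    """Drop the group used by the current stage; the rest is latched downstream."""
    return {stage: values for stage, values in ctrl.items() if stage != consumed}

id_ex = decode_control("lw")          # ID/EX holds EX, MEM, WB
ex_mem = pass_along(id_ex, "EX")      # EX/MEM holds MEM, WB
mem_wb = pass_along(ex_mem, "MEM")    # MEM/WB holds WB
print(list(id_ex), list(ex_mem), list(mem_wb))
```

Each instruction therefore carries its own control state with it, and no central unit has to track which instruction is in which stage.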

Pipelined Control Detail

[Figure: pipelined datapath with the control lines (including MemRead, MemWrite, Branch, and the ALU Zero output) carried through the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]


Sequence of Instructions with Dependencies

• Pipelined execution of a sequence of five instructions with dependencies:

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)
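The dependencies in this sequence can be listed by comparing each instruction's source registers with the destinations of the instructions before it. This is a minimal sketch: the tuple layout and function name are hypothetical, and it does not account for a register being overwritten in between, which does not occur in this sequence.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3",  2,    (1, 3)),
    ("and $12, $2, $5", 12,   (2, 5)),
    ("or $13, $6, $2",  13,   (6, 2)),
    ("add $14, $2, $2", 14,   (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),   # sw writes no register; it reads $15 and $2
]

def raw_dependencies(program):
    """Return (producer index, consumer index, register) for each read-after-write pair."""
    deps = []
    for j, (_, _, sources) in enumerate(program):
        for i in range(j):
            dest = program[i][1]
            if dest is not None and dest != 0 and dest in sources:
                deps.append((i, j, dest))
    return deps

deps = raw_dependencies(PROGRAM)
for producer, consumer, reg in deps:
    print(f"{PROGRAM[consumer][0]!r} reads ${reg} written by {PROGRAM[producer][0]!r}")
```

All four later instructions read $2, which is written by the sub.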

Multi-clock-cycle Pipeline Diagram of Above Instructions with Dependencies

Instruction Execution Order (top to bottom; each row starts one clock cycle after the row above):

sub $2, $1, $3    IM  Reg ALU DM  Reg
and $12, $2, $5       IM  Reg ALU DM  Reg
or $13, $6, $2            IM  Reg ALU DM  Reg
add $14, $2, $2               IM  Reg ALU DM  Reg
sw $15, 100($2)                   IM  Reg ALU DM  Reg

Multi-clock-cycle Pipeline Diagram of Above Instructions with Dependencies & Forwarding

Instruction Execution Order (top to bottom; each row starts one clock cycle after the row above):

sub $2, $1, $3    IM  Reg ALU DM  Reg
and $12, $2, $5       IM  Reg ALU DM  Reg
or $13, $6, $2            IM  Reg ALU DM  Reg
add $14, $2, $2               IM  Reg ALU DM  Reg
sw $15, 100($2)                   IM  Reg ALU DM  Reg


Forwarding

• It is the responsibility of the Forwarding Unit to determine how to forward available values to where they are used
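A common textbook formulation of the forwarding decision compares the source register of the instruction in EX with the write register numbers held in the EX/MEM and MEM/WB pipeline registers, preferring the more recent result. The sketch below follows that formulation; it is not taken from the slides, and the field names (reg_write, regw) and return labels are hypothetical.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose the source of an ALU operand: 'EX/MEM', 'MEM/WB', or 'REG'."""
    # Most recent result first: the instruction one ahead, now in MEM.
    if ex_mem["reg_write"] and ex_mem["regw"] != 0 and ex_mem["regw"] == src_reg:
        return "EX/MEM"
    # Otherwise the instruction two ahead, now in WB.
    if mem_wb["reg_write"] and mem_wb["regw"] != 0 and mem_wb["regw"] == src_reg:
        return "MEM/WB"
    # No pending write to this register: use the value read from the register array.
    return "REG"

# and $12, $2, $5 is in EX while sub $2, $1, $3 is in MEM.
first = forward_select(2, {"reg_write": True, "regw": 2}, {"reg_write": False, "regw": 0})

# One cycle later: or $13, $6, $2 is in EX, and is in MEM, sub is in WB.
second = forward_select(2, {"reg_write": True, "regw": 12}, {"reg_write": True, "regw": 2})
print(first, second)
```

Register 0 is excluded because in MIPS it always reads as zero, so a pending write to it must never be forwarded.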

Sequence of Instructions with Hazards & Stalls

• Pipelined execution of a sequence of five instructions with dependencies:

lw $2, 20($1)

and $4, $2, $5

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

Multi-clock-cycle Pipeline Diagram of Above Instructions with Hazards & Stalls

Instruction Execution Order (top to bottom; each row starts one clock cycle after the row above):

lw $2, 20($1)     IM  Reg ALU DM  Reg
and $4, $2, $5        IM  Reg ALU DM  Reg
or $8, $2, $6             IM  Reg ALU DM  Reg
add $9, $4, $2                IM  Reg ALU DM  Reg
slt $1, $6, $7                    IM  Reg ALU DM  Reg

Multi-clock-cycle Pipeline Diagram of Above Instructions with Hazards & Stalls & Forwarding

Instruction Execution Order (top to bottom; each row starts one clock cycle after the row above):

lw $2, 20($1)     IM  Reg ALU DM  Reg
and $4, $2, $5        IM  Reg ALU DM  Reg
or $8, $2, $6             IM  Reg ALU DM  Reg
add $9, $4, $2                IM  Reg ALU DM  Reg
slt $1, $6, $7                    IM  Reg ALU DM  Reg


Forwarding Does Not Solve All Our Problems

• We really need the value of $2 from lw $2, 20($1) in and $4, $2, $5 before it is available

• This can’t be corrected by using forwarding

• We need to delay the execution of and $4, $2, $5 for one clock cycle until $2 is available


Hazard Detection Unit

• It is the responsibility of the Hazard Detection Unit to determine if a stall is required

• The Hazard Detection Unit inserts bubbles

• A bubble causes the CPU to perform no operation
  • We refer to this as a nop (pronounced: no op)
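The stall decision can be sketched as a load-use check: a bubble is needed when the instruction just ahead is a load and the current instruction reads the register being loaded. The code below applies that check to the sequence from the earlier slide; the dictionary layout and function names are hypothetical, and the model inserts the nop into the issue order rather than simulating the pipeline registers.

```python
def load_use_hazard(in_ex, in_id):
    """True when the instruction in EX is a load whose target is read by the one in ID."""
    return in_ex["is_load"] and in_ex["dest"] in in_id["srcs"]

def insert_bubbles(program):
    """Return the issue order with a 'nop' placed between each load-use pair."""
    issued = []
    prev = None
    for instr in program:
        if prev is not None and load_use_hazard(prev, instr):
            issued.append("nop")
        issued.append(instr["text"])
        prev = instr
    return issued

PROGRAM = [
    {"text": "lw $2, 20($1)",  "is_load": True,  "dest": 2, "srcs": (1,)},
    {"text": "and $4, $2, $5", "is_load": False, "dest": 4, "srcs": (2, 5)},
    {"text": "or $8, $2, $6",  "is_load": False, "dest": 8, "srcs": (2, 6)},
    {"text": "add $9, $4, $2", "is_load": False, "dest": 9, "srcs": (4, 2)},
    {"text": "slt $1, $6, $7", "is_load": False, "dest": 1, "srcs": (6, 7)},
]

issued = insert_bubbles(PROGRAM)
print(issued)
```

Only the and immediately after the lw needs a bubble; by the time the or reaches EX, the loaded value can be forwarded.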

Multi-clock-cycle Pipeline Diagram of Above Instructions with Hazards & Stalls Inserted

Instruction Execution Order (top to bottom; each row starts one clock cycle after the row above):

lw $2, 20($1)               IM  Reg ALU DM  Reg
and $4, $2, $5 becomes nop      IM  Reg ALU DM  Reg
and $4, $2, $5                      IM  Reg ALU DM  Reg
or $8, $2, $6                           IM  Reg ALU DM  Reg
add $9, $4, $2                              IM  Reg ALU DM  Reg