lec5-pipeline 1.ppt [Uyumluluk Modu]eem.eskisehir.edu.tr/userfiles/atdogan/files/lec7-pipeline_1.pdfPipelining Lessons Pipelining doesn’t help latency of single task, it helps throughput

04.10.2010


Pipelining

Pipelining is Natural!

• Laundry Example
• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes

[Figure: four laundry loads, labeled A, B, C, D]


Sequential Laundry

• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?

[Figure: timeline from 6 PM to midnight; loads A, B, C, D (task order) run one after another, each taking 30 + 40 + 20 minutes]

Pipelined Laundry: Start Work ASAP

• Pipelined laundry takes 3.5 hours for 4 loads

[Figure: timeline from 6 PM to midnight; loads A, B, C, D (task order) overlap, and the timeline is divided into segments of 30, 40, 40, 40, 40, and 20 minutes]
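The two laundry totals can be checked with a short sketch. It assumes that each machine handles one load at a time and that a finished load can wait between machines; the function names are illustrative and not part of the slides.

```python
# Stage times in minutes from the laundry example: wash, dry, fold.
STAGES = [30, 40, 20]

def sequential_minutes(loads, stages=STAGES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(loads, stages=STAGES):
    """A load enters a stage as soon as both the load and the stage are free.

    Assumption: a load may wait between stages without blocking the
    stage it has just left.
    """
    free_at = [0] * len(stages)  # time at which each stage becomes free
    finish = 0
    for _ in range(loads):
        ready = 0                # time this load is ready for its next stage
        for s, duration in enumerate(stages):
            start = max(ready, free_at[s])
            ready = start + duration
            free_at[s] = ready
        finish = ready
    return finish

print(sequential_minutes(4) / 60)  # hours for 4 loads, one after another
print(pipelined_minutes(4) / 60)   # hours for 4 loads, overlapped
```

With four loads the sketch gives 360 minutes sequentially and 210 minutes pipelined, matching the 6 hours and 3.5 hours on the slides. The 40-minute dryer sets the pace once the pipeline is full.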


Pipelining Lessons

• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
• Stall for dependences
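The lessons about stage balance and fill/drain time can be illustrated numerically. This is a minimal sketch assuming a clocked pipeline whose cycle equals the slowest stage delay and which never stalls; the stage delays and function names are hypothetical.

```python
def pipeline_time(n_tasks, stage_delays):
    """Total time with one task entering per cycle.

    Assumption: the clock period equals the slowest stage delay, and
    (stages - 1) extra cycles are needed to fill (or drain) the pipeline.
    """
    cycle = max(stage_delays)
    return (len(stage_delays) + n_tasks - 1) * cycle

def speedup(n_tasks, stage_delays):
    """Unpipelined time divided by pipelined time."""
    unpipelined = n_tasks * sum(stage_delays)
    return unpipelined / pipeline_time(n_tasks, stage_delays)

balanced = [10, 10, 10, 10, 10]    # hypothetical delays in ns
unbalanced = [10, 10, 20, 10, 10]  # one slow stage

print(speedup(1, balanced))        # a single task gains nothing
print(speedup(1000, balanced))     # approaches the number of stages
print(speedup(1000, unbalanced))   # the slow stage caps the speedup
```

Under these assumptions the balanced five-stage pipeline approaches a speedup of 5 for long runs, while the unbalanced one stays below 3 because every cycle must be as long as the 20 ns stage.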

The Five Stages of a RISC Instruction

• Ifetch: Instruction Fetch
    • Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Read the data from the Data Memory
• WrB: Write the data back to the register file

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load      Ifetch    Reg/Dec   Exec      Mem       WrB


Key Ideas Behind Instruction Pipelining

The load instruction has 5 stages:

• Five independent functional units to work on each stage
    • Each functional unit is used only once!
• A second load can start doing Ifetch as soon as the first load finishes its Ifetch stage
• Each load still takes five cycles to complete
    • The latency of a single load is still 5 cycles
• The throughput is much higher
    • CPI approaches 1
    • Cycle time is ~1/5th the cycle time of the single-cycle implementation
• Instructions start executing before previous instructions complete execution

CPI ↓  Cycle time ↓

Pipelining the LOAD Instruction

• The five independent pipeline stages are:
    • Read next instruction: The Ifetch stage
    • Decode instruction and fetch register values: The Reg/Dec stage
    • Execute the operation: The Exec stage
    • Access data memory: The Mem stage
    • Write data to destination register: The WrB stage
• One instruction enters the pipeline every cycle
    • One instruction comes out of the pipeline (completed) every cycle
    • The “effective” CPI for the three loads shown is 7/3 (it tends to 1 as more instructions are issued); the cycle time is ~1/5 of the single-cycle cycle time

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7
1st lw    Ifetch    Reg/Dec   Exec      Mem       WrB
2nd lw              Ifetch    Reg/Dec   Exec      Mem       WrB
3rd lw                        Ifetch    Reg/Dec   Exec      Mem       WrB
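The effective CPI figure follows from counting cycles: n instructions in a k-stage pipeline finish in k + n - 1 cycles when nothing stalls. A small sketch of that count, with an illustrative function name:

```python
from fractions import Fraction

def effective_cpi(n_instructions, n_stages=5):
    """Cycles per instruction for a stall-free pipeline.

    The first instruction needs n_stages cycles; each later one
    completes one cycle after its predecessor.
    """
    total_cycles = n_stages + n_instructions - 1
    return Fraction(total_cycles, n_instructions)

print(effective_cpi(3))            # the three loads in the diagram
print(float(effective_cpi(1000)))  # long runs approach a CPI of 1
```

For the three loads in the diagram this gives 7/3, and for 1000 instructions it gives 1.004.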


The Four Stages of R-type

• Ifetch: Instruction fetch
    • Fetch the instruction from the instruction memory
• Reg/Dec: Registers fetch and instruction decode
• Exec: ALU operates on the two register operands
• WrB: Write the ALU output back to the register file

          Cycle 1   Cycle 2   Cycle 3   Cycle 4
R-type    Ifetch    Reg/Dec   Exec      WrB

Pipelining the R-type and Load Instructions

• We have a problem called a pipeline conflict or hazard
    • 2 instructions try to write to the register file at the same time!
    • “Contention for a shared resource” (in OS terminology)
• It is no longer meaningful to talk about the execution of a single instruction in isolation
    • Execution is inherently concurrent; need to achieve serializability

[Figure: pipeline diagram over clock cycles 1 to 9 mixing 4-stage R-type instructions (Ifetch, Reg/Dec, Exec, Wr) with a 5-stage Load (Ifetch, Reg/Dec, Exec, Mem, Wr). Caption: OOPS! We have a problem!]


Important Observations

• Each functional unit can only be used once per instruction
• Each functional unit must be used at the same stage for all instructions
    • Load uses Register File’s Write Port during its 5th stage
    • R-type uses Register File’s Write Port during its 4th stage

          1         2         3         4         5
Load      Ifetch    Reg/Dec   Exec      Mem       WrB
R-type    Ifetch    Reg/Dec   Exec      WrB

⇒ How to resolve this pipeline hazard?

Solution: Delay R-type’s Write by 1 Cycle

• Delay R-type’s register write by one cycle:
    • Now R-type instructions also use Reg File’s write port at Stage 5
    • Mem stage is a NO-OP stage: nothing is being done. Effective CPI?

          1         2         3         4         5
R-type    Ifetch    Reg/Dec   Exec      Mem       WrB

[Figure: pipeline diagram in which R-type and Load instructions all pass through Ifetch, Reg/Dec, Exec, Mem, and Wr, one instruction starting per clock cycle]
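The write-port conflict and its fix can be reproduced by computing, for each instruction, the clock cycle in which it uses the register file's write port. This is a sketch under the assumption that one instruction is issued per cycle starting at cycle 1; the instruction mix and the helper name are hypothetical.

```python
# Stage in which each instruction class writes the register file,
# as stated on the slides (stage numbers start at 1).
ORIGINAL = {"load": 5, "r-type": 4}
DELAYED = {"load": 5, "r-type": 5}   # R-type write delayed by one cycle

def write_port_conflicts(program, write_stage):
    """Return {cycle: [instruction indices]} for cycles with 2+ writers.

    Assumption: instruction i (0-based) is fetched in cycle i + 1.
    """
    writers = {}
    for i, kind in enumerate(program):
        if kind in write_stage:
            cycle = (i + 1) + (write_stage[kind] - 1)
            writers.setdefault(cycle, []).append(i)
    return {c: idx for c, idx in writers.items() if len(idx) > 1}

program = ["r-type", "r-type", "load", "r-type", "r-type"]  # hypothetical mix
print(write_port_conflicts(program, ORIGINAL))  # the load collides with the next R-type
print(write_port_conflicts(program, DELAYED))   # no collisions
```

With the original stage assignment, the load (index 2) and the R-type issued right after it (index 3) both write in cycle 7. Once every instruction writes in stage 5, each cycle has at most one writer.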



The Four Stages of Store

• Ifetch: Instruction fetch
• Reg/Dec: Registers fetch and instruction decode
• Exec: Calculate the memory address
• Mem: Write the data into the data memory

Store     Ifetch    Reg/Dec   Exec      Mem       WrB (does nothing)


The Four Stages of Beq

• Ifetch: Instruction fetch
• Reg/Dec: Registers fetch and instruction decode
• Exec: ALU compares the two register operands
    • Adder calculates the branch target address
• Mem: If the registers we compared in the Exec stage are the same,
    • Write the branch target address into the PC

Beq       Ifetch    Reg/Dec   Exec      Mem       WrB (does nothing)


Basic Idea

What do we need to add to split the datapath into stages?

[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, ALU, data memory, adders, and muxes) divided into five stages. IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back]

Pipelined Datapath

[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB placed between the five stages]

LW instruction: Instruction Fetch (IF)

[Figure: pipelined datapath, stage shown: Instruction fetch]

LW instruction: Instruction Decode (ID)

[Figure: pipelined datapath, stage shown: Instruction decode]

LW instruction: Execution (EX)

[Figure: pipelined datapath, stage shown: Execution]

LW instruction: Memory (MEM)

[Figure: pipelined datapath, stage shown: Memory]

LW instruction: Writeback (WB)

[Figure: pipelined datapath, stage shown: Writeback]

SW instruction: Memory (MEM)

[Figure: pipelined datapath, stage shown: Memory]

SW instruction: Writeback (WB): do nothing

[Figure: pipelined datapath, stage shown: Writeback]

Corrected Datapath

[Figure: corrected pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

Pipelining Example

[Figure: pipelined datapath with five instructions in flight, labeled left to right across the stages: add $14, $5, $6; lw $13, 24($1); add $12, $3, $4; sub $11, $2, $3; lw $10, 20($1)]

Events on Every Pipe Stage


Pipeline control

• We have 5 stages. What needs to be controlled in each stage?
    • Instruction Fetch and PC Increment
    • Instruction Decode / Register Fetch
    • Execution
    • Memory Stage
    • Write Back

Pipeline Control

[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) annotated with the control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, and MemtoReg; the instruction fields Instruction[20–16] and Instruction[15–0] and the ALU control unit are labeled]

Pipeline Control

• Pass control signals along just like the data

              | Execution/Address Calculation   | Memory access stage        | Write-back stage
              | stage control lines             | control lines              | control lines
Instruction   | RegDst  ALUOp1  ALUOp0  ALUSrc  | Branch  MemRead  MemWrite  | RegWrite  MemtoReg
R-format      |   1       1       0       0     |   0        0        0      |    1         0
lw            |   0       0       0       1     |   0        1        0      |    1         1
sw            |   X       0       0       1     |   0        0        1      |    0         X
beq           |   X       0       1       0     |   1        0        0      |    0         X

[Figure: the Control unit produces the EX, M, and WB control fields from the instruction; ID/EX holds EX, M, and WB, EX/MEM holds M and WB, and MEM/WB holds WB]
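Passing control along with the data can be sketched as a small simulation of the pipeline registers. The control values below are copied from the table of control lines; the dictionary layout and the `step` helper are assumptions made for illustration, not part of the slides.

```python
# Control fields per instruction class, from the control-line table.
# Bit order: EX = RegDst, ALUOp1, ALUOp0, ALUSrc;
#            M  = Branch, MemRead, MemWrite;  WB = RegWrite, MemtoReg.
# 'X' marks a don't-care value.
CONTROL = {
    "R-format": {"EX": "1100", "M": "000", "WB": "10"},
    "lw":       {"EX": "0001", "M": "010", "WB": "11"},
    "sw":       {"EX": "X001", "M": "001", "WB": "0X"},
    "beq":      {"EX": "X010", "M": "100", "WB": "0X"},
}

EMPTY = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}

def step(regs, decoded):
    """Advance one clock edge (hypothetical helper).

    Each pipeline register keeps only the fields still needed downstream:
    ID/EX holds EX, M, WB; EX/MEM holds M, WB; MEM/WB holds WB.
    """
    id_ex, ex_mem = regs["ID/EX"], regs["EX/MEM"]
    return {
        "ID/EX": dict(CONTROL[decoded]) if decoded else None,
        "EX/MEM": {"M": id_ex["M"], "WB": id_ex["WB"]} if id_ex else None,
        "MEM/WB": {"WB": ex_mem["WB"]} if ex_mem else None,
    }

regs = dict(EMPTY)
for decoded in ["lw", "R-format", "R-format"]:  # classes leaving decode, in order
    regs = step(regs, decoded)
print(regs)
```

After three clock edges the lw has reached MEM/WB carrying only its WB field, while the two R-format instructions behind it still carry the fields their remaining stages need.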

Datapath with Control

[Figure: pipelined datapath with the Control unit; the EX, M, and WB control fields are carried in the ID/EX, EX/MEM, and MEM/WB pipeline registers and supply RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, RegWrite, MemtoReg, and PCSrc]


[Figures: nine successive snapshots of the datapath with control, each labeled with the instruction occupying each stage. The stage labels are collected below; "xxxx" and the truncated operands appear as in the figures.]

Snapshot   IF                 ID                 EX                 MEM                WB
1          lw $10, 9($1)
2          sub $11, $2, $3    lw $10, 9($1)
3          and $12, $4, $5    sub $11, $2, $3    lw $10, 9($1)
4          or $13, $6, $7     and $12, $4, $5    sub $11, $2, $3    lw $10, 9($1)
5          add $14, $8, $9    or $13, $6, $7     and $12, $4, $5    sub $11, ..        lw $10, 9($1)
6          xxxx               add $14, $8, $9    or $13, $6, $7     and $12…           sub $11, ..
7          xxxx               xxxx               add $14, $8, $9    or $13, ..         and $12…
8                                                xxxx               add $14, ..        or $13…
9                                                xxxx               xxxx               add $14..

Control fields shown entering ID/EX: for “lw”, WB = 11, M = 010, EX = 0001; for “sub”, WB = 10, M = 000, EX = 1100.

Graphically Representing Pipelines

Multiple-clock-cycle pipeline diagram of two instructions


Conventional Pipelined Execution Representation

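[Figure: successive instructions drawn as rows of IFetch, Dcd, Exec, Mem, WB, each row shifted one cycle to the right of the row above; Time runs horizontally and Program Flow runs vertically]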
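A text version of this conventional representation can be generated by shifting each instruction's row one cycle to the right. The sketch below uses the stage names from the figure and instructions from the earlier example; the function name and column layout are assumptions.

```python
STAGES = ["IFetch", "Dcd", "Exec", "Mem", "WB"]

def pipeline_diagram(instructions, stages=STAGES):
    """Return one text row per instruction; row i starts i cycles later.

    Assumption: one instruction is issued per cycle and nothing stalls.
    """
    cell = max(len(s) for s in stages) + 1            # column width per cycle
    label = max(len(i) for i in instructions) + 2     # width of the label column
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * i + list(stages)               # i empty cycles, then the stages
        body = "".join(c.ljust(cell) for c in cells).rstrip()
        rows.append(instr.ljust(label) + body)
    return rows

rows = pipeline_diagram(["lw $10, 9($1)", "sub $11, $2, $3", "and $12, $4, $5"])
for row in rows:
    print(row)   # time runs left to right, program flow top to bottom
```

Reading down any column shows which instruction occupies which stage in that cycle, which is the same information the single-cycle snapshots give one picture at a time.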

Single Cycle, Multiple Cycle, vs. Pipeline

[Figure: timing comparison of three implementations. Single Cycle Implementation: Load and Store each occupy one long clock cycle (Cycle 1, Cycle 2), with part of the Store cycle marked Waste. Multiple Cycle Implementation: Load runs Ifetch, Reg, Exec, Mem, Wr, then Store runs Ifetch, Reg, Exec, Mem, then R-type begins with Ifetch, over Cycles 1 to 10. Pipeline Implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped in time]

Why Pipeline?

• Suppose we execute 100 instructions
• Single Cycle Machine
    • 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Multicycle Machine
    • 10 ns/cycle x 4.1 CPI (due to inst mix) x 100 inst = 4100 ns
• Ideal pipelined machine
    • 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
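The three totals can be reproduced directly from the figures on the slide. The sketch below only restates that arithmetic; the function and parameter names are illustrative.

```python
def single_cycle_ns(n_inst, cycle_ns=45):
    """One long cycle per instruction (CPI = 1)."""
    return cycle_ns * 1 * n_inst

def multicycle_ns(n_inst, cycle_ns=10, cpi=4.1):
    """Short cycles, but several per instruction (CPI from the slide's mix)."""
    return cycle_ns * cpi * n_inst

def ideal_pipeline_ns(n_inst, cycle_ns=10, extra_cycles=4):
    """One instruction completes per cycle, plus the extra cycles from the slide."""
    return cycle_ns * (1 * n_inst + extra_cycles)

n = 100
print(single_cycle_ns(n), multicycle_ns(n), ideal_pipeline_ns(n))
print(single_cycle_ns(n) / ideal_pipeline_ns(n))  # speedup over single cycle
```

On these numbers the ideal pipeline is a little over four times faster than the single-cycle machine for 100 instructions, and the four extra cycles account for under 4% of its total time.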

Why Pipeline? Because the resources are there

[Figure: instructions Inst 0 to Inst 4 (instruction order, vertical) against time in clock cycles (horizontal); each instruction passes through Im, Reg, ALU, Dm, Reg, overlapped with its neighbors]

Single Cycle vs. Multiple Cycle vs. Pipelined

[Figure: timing of Load, Store, and R-type under the Single Cycle Implementation (one long cycle each, part of the Store cycle marked Waste), the Multiple Cycle Implementation (Ifetch, Reg, Exec, Mem, Wr steps executed one instruction after another), and the Pipelined Implementation (the same steps overlapped)]

CPU Designs: Summary

• Disadvantages of the Single Cycle Processor
    • Long cycle time
    • Cycle time wasted for the faster instructions
• Multiple Clock Cycle Processor
    • Divide the instructions into smaller steps
    • Execute each step (instead of the entire instruction) in 1 cycle
• Pipelined Processor
    • Natural enhancement of the multiple clock cycle processor
    • Each functional unit used only once per instruction
    • If an instruction is going to use a functional unit:
        • it must use it at the same stage as all other instructions
    • Pipeline Control:
        • each stage’s control signal depends ONLY on the instruction that is currently in that stage


Basic Terms

• Filling a pipeline
• Flushing or draining a pipeline
• Stage or segment delay
    • Each stage may have a different stage delay
• Beat time (= max stage delay), or clock cycle time
• Number of stages
• End-to-end latency
    • number of stages × beat time
• Stages are separated by latches (registers)
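The definitions of beat time and end-to-end latency translate into two one-line functions. The stage delays below are hypothetical values chosen to show the effect of unequal stages, and latch overhead is ignored.

```python
def beat_time(stage_delays):
    """Beat time (clock cycle time) = maximum stage delay."""
    return max(stage_delays)

def end_to_end_latency(stage_delays):
    """End-to-end latency = number of stages x beat time."""
    return len(stage_delays) * beat_time(stage_delays)

delays_ns = [8, 10, 9, 10, 7]   # hypothetical stage delays in ns

print(beat_time(delays_ns))           # set by the slowest stage
print(end_to_end_latency(delays_ns))  # latency of one task through the pipeline
print(sum(delays_ns))                 # time the same work needs without stage boundaries
```

With these delays the beat time is 10 ns and the end-to-end latency is 50 ns, compared with 44 ns of actual work, so the latency of a single task does not improve; the gain is in throughput.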
