1 השעון Hertz=1/sec מחשב פנטיום במהירות של פירושו שהוא מבצע...

Preview:

Citation preview

1

השעון

Hertz=1/sec

מחשב פנטיום במהירות של

. מחזורי שעון בשניה2 *10^8פירושו שהוא מבצע

כל מחזור שעון לוקח

200MHZ

5*10^-9=5nanosecond

?כמה לוקחת פקודה בימינו

2

Gotta Do Laundry° Ann, Brian, Cathy, Dave

each have one load of clothes to wash, dry, fold, and put away

A B C D

° Dryer takes 30 minutes

° “Folder” takes 30 minutes

° “Stasher” takes 30 minutes to put clothes into drawers

° Washer takes 30 minutes

3

Sequential Laundry

• Sequential laundry takes 8 hours for 4 loads

Task

Order

B

C

D

A

30Time

3030 3030 30 3030 3030 3030 3030 3030

6 PM 7 8 9 10 11 12 1 2 AM

4

Pipelined Laundry

• Pipelined laundry takes 3.5 hours for 4 loads!

Task

Order

B

C

D

A

12 2 AM6 PM 7 8 9 10 11 1

Time303030 3030 3030

5

General Definitions

• Latency: time to completely execute a certain task– for example, time to read a sector from disk is

disk access time or disk latency

• Throughput: amount of work that can be done over a period of time

6

Pipelining Lessons (1/2)

• Pipelining doesn’t help latency of single task, it helps throughput of entire workload

• Multiple tasks operating simultaneously using different resources

• Potential speedup = Number pipe stages

• Time to “fill” pipeline and time to “drain” it reduces speedup:2.3X v. 4X in this example

6 PM 7 8 9

Time

B

C

D

A

303030 3030 3030Task

Order

7

Pipelining Lessons (2/2)

• Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?

• Pipeline rate limited by slowest pipeline stage

• Unbalanced lengths of pipe stages also reduces speedup

6 PM 7 8 9

Time

B

C

D

A

303030 3030 3030Task

Order

8

Steps in Executing MIPS

1) IFetch: Fetch Instruction, Increment PC2) Decode Instruction, Read Registers3) Execute: Mem-ref: Calculate Address Arith-log: Perform Operation

4) Memory: Load: Read Data from Memory Store: Write Data to Memory

5) Write Back: Write Data to Register

9

Pipelined Execution Representation

• Every instruction must take same number of steps, also called pipeline “stages”, so some will go idle sometimes

IFtch Dcd Exec Mem WB

IFtch Dcd Exec Mem WB

IFtch Dcd Exec Mem WB

IFtch Dcd Exec Mem WB

IFtch Dcd Exec Mem WB

IFtch Dcd Exec Mem WB

Time

10

Review: Datapath for MIPS

Stage 1 Stage 2 Stage 3Stage 4 Stage 5

• Use datapath figure to represent pipelineIFtch Dcd Exec Mem WB

AL

U I$ Reg D$ Reg

PC

inst

ruct

ion

me

mor

y+4

rtrs

rd

regi

ste

rs

ALU

Da

tam

em

ory

imm

1. InstructionFetch

2. Decode/ Register Read

3. Execute 4. Memory5. Write

Back

11

Graphical Pipeline Representation

Instr.

Order

Load

Add

Store

Sub

Or

I$

Time (clock cycles)

I$

AL

U

Reg

Reg

I$

D$

AL

U

AL

U

Reg

D$

Reg

I$

D$

RegA

LU

Reg Reg

Reg

D$

Reg

D$

AL

U

(In Reg, right half highlight read, left half write)

Reg

I$

12

Example• Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write

• Nonpipelined Execution:–lw : IF + Read Reg + ALU + Memory + Write Reg

= 2 + 1 + 2 + 2 + 1 = 8 ns–add: IF + Read Reg + ALU + Write Reg

= 2 + 1 + 2 + 1 = 6 ns

• Pipelined Execution:–Max(IF,Read Reg,ALU,Memory,Write Reg) = 2

ns

13

PIPELINE הרעיון מאחורי

Instructionfetch

Reg ALUData

accessReg

8 nsInstruction

fetchReg ALU

Dataaccess

Reg

8 nsInstruction

fetch

8 ns

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 4 6 8 10 12 14 16 18

2 4 6 8 10 12 14

...

Programexecutionorder(in instructions)

Instructionfetch

Reg ALUData

accessReg

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 nsInstruction

fetchReg ALU

Dataaccess

Reg

2 nsInstruction

fetchReg ALU

Dataaccess

Reg

2 ns 2 ns 2 ns 2 ns 2 ns

Programexecutionorder(in instructions)

Never waste time !!!

14

חלוקה לשלבים

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Instruction

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

ReaddataAddress

Datamemory

1

ALUresult

Mux

ALUZero

IF: Instruction fetch ID: Instruction decode/register file read

EX: Execute/address calculation

MEM: Memory access WB: Write back

15

הוספת הרגיסטרים

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Datamemory

Address

16

In struc t ion

m e m o ry

Add re ss

4

32

0

A ddA d d

re s u lt

S h if t

le f t 2

Ins

tru

ct i

on

IF /I D E X / M E M M E M /W B

Mux

0

1

A d d

P C

0W ri teda ta

Mux

1

R eg iste rs

R e a dd a ta 1

R e a dd a ta 2

R e a dre g is te r 1

R e a dre g is te r 2

16S ig n

e xte nd

W ritere g is te r

W rited ata

R ea dda ta

1

A LUre s u lt

Mux

A LU

Z e ro

ID /E X

In s tr u c t io n fe tc h

lw

A

IF/ID

17

In struc t ion

m e m o ry

A dd re ss

4

32

0

A ddA dd

res u lt

S h if t

le ft 2

Inst

ruc

tio

n

IF / I D E X / M E M

Mux

0

1

A dd

P C

0W r iteda ta

Mux

1

R eg iste rs

R e a dda ta 1

R e a dda ta 2

R e a dre g is te r 1

R e a dre g is te r 2

16S i gn

e xte nd

W r itere g is te r

W rited ata

R ea dda ta

1

A LUre s u lt

Mux

A LU

Ze ro

ID /E X M E M /W B

I n s t r u c t io n d e c o d e

lw

A d d re s s

D ata

m e m o ry

ID/EX

18

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX MEM/WB

Execution

lw

Address

Datamemory

EX/MEM

19

In s tru c t ion

m e m or y

A dd res s

4

32

0

A d dA dd

r e su lt

S h if t

le ft 2

Ins

tru

ct i

on

I F /ID E X/M E M

Mux

0

1

A d d

P C

0W rit ed a ta

Mux

1

R e g is te rs

R ea dd ata 1

R ea dd ata 2

R e adre g is ter 1

R e adre g is ter 2

16S ig n

e xte nd

W ritere g is ter

W riteda ta

R e add at a

D a ta

m em o ry

1

A L Ures u lt

Mux

A L U

Z er o

ID /E X M E M /W B

M e m o r y

lw

A d dre ss

MEM/WB

20

In stru ct ion

m em ory

A dd res s

4

32

0

Ad dA dd

r esu lt

S h ift

l e ft 2

Ins

tru

ct io

n

I F /ID E X /M E M

Mux

0

1

A d d

P C

0W rit ed a ta

Mux

1

R eg iste rs

R e add ata 1

R e add ata 2

R e a dre g is ter 1

R e a dre g is ter 2

16S ig n

e xte nd

W rited a ta

R e a ddat aD a ta

m e m o ry

1

A LUre s u lt

Mux

A LU

Ze ro

ID /EX M E M /W B

W r it e b a c k

l w

W ritere g is te r

A d dre s s

21

A correction !!! תיקון

Keep the right Rd all the way!

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0

Address

Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

Datamemory

1

ALUresult

Mux

ALUZero

ID/EX

22

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Address

Datamemory

So here is the updated CPU;

23

תצוגה גרפית

IM Reg DM Reg

IM Reg DM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

lw $10, 20($1)

Programexecutionorder(in instructions)

sub $11, $2, $3

ALU

ALU

24

Control

PC

Instructionmemory

Address

Inst

ruct

ion

Instruction[20– 16]

MemtoReg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15– 0]

0

0Registers

Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1Write

data

Read

data Mux

1

ALUcontrol

RegWrite

MemRead

Instruction[15– 11]

6

IF/ID ID/EX EX/MEM MEM/WB

MemWrite

Address

Datamemory

PCSrc

Zero

AddAdd

result

Shiftleft 2

ALUresult

ALU

Zero

Add

0

1

Mux

0

1

Mux

25

קווי הבקרהExecution/Address Calculation

stage control linesMemory access stage

control lines

Write-back stage control

lines

InstructionReg Dst

ALU Op1

ALU Op0

ALU Src Branch

Mem Read

Mem Write

Reg write

Mem to Reg

R-format 1 1 0 0 0 0 0 1 0lw 0 0 0 1 0 1 0 1 1sw X 0 0 1 0 0 1 0 Xbeq X 0 1 0 1 0 0 0 X

Control

EX

M

WB

M

WB

WB

IF/ID ID/EX EX/MEM MEM/WB

Instruction

26

PC

Instructionmemory

Inst

ruct

ion

Add

Instruction[20– 16]

Me

mto

Re

g

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15– 0]

0

0

Mux

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

Writedata

Readdata

Mux

1

ALUcontrol

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

6

EX

M

WB

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

Mux

0

1

Me

mW

rite

AddressData

memory

Address

Datapath with Control

27

דוגמאA demonstration of a sequence of instructions:

Lw $10,20($1)

Sub $11,$2,$3

And $12,$4,$5

Or $13,$6,$7

Add $14,$8,$9

28

Instructionmemory

Instruction[20– 16]

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

Instruction[15– 0]

0

Mux

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

ALUcontrol

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

EX

M

WB

M

WB

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: before<1> EX: before<2> MEM: before<3> WB: before<4>

MEM/WB

IF: lw $10, 20($1)

000

00

0000

000

00

000

0

00

00

0

0

0

Mux

0

1

Add

PC

0

Datamemory

Address

Writedata

Readdata

Mux

1

WB

EX

M

Instructionmemory

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

Mux

0

1

Add Addresult

Writeregister

Writedata

Mux

1

ALUresult

Zero

ALUcontrol

Shiftleft 2

Re

gWrit

e

ALU

M

WB

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: lw $10, 20($1) EX: before<1> MEM: before<2> WB: before<3>

MEM/WB

IF: sub $11, $2, $3

010

11

0001

000

00

000

0

00

00

0

0

0

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

lwControl

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

X

10

20

X

1

Instruction[20– 16]

Instruction[15– 0] Sign

extend

Instruction[15– 11]

20

$X

$1

10

X

Me

mW

rite

MemReadM

em

Writ

e

Datamemory

Address

Address

Address

Clock 2

Clock 1

29

Instructionmemory

Address

Instruction[20– 16]

Mem

toR

eg

Branch

ALUSrc

4

Instruction[15– 0]

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

ALUresult

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

EX

M

WB

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: sub $11, $2, $3 EX: lw $10, . . . MEM: before<1> WB: before<2>

MEM/WB

IF: and $12, $4, $5

000

10

1100

010

11

000

1

00

00

0

0

0

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

Writeregister

Writedata 1

ALUresult

ALUcontrol

Shiftleft 2

Re

gWrit

e

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: and $12, $2, $3 EX: sub $11, . . . MEM: lw $10, . . . WB: before<1>

MEM/WB

IF: or $13, $6, $7

000

10

1100

000

10

101

0

11

10

0

0

0

Mux

0

1

Add

PC

0Writedata

Mux

1

andControl

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

12

X

X

5

4

Instruction[20– 16]

Instruction[15– 0]

Instruction[15– 11]

X

$5

$4

X

12

Me

mW

rite

MemReadM

em

Writ

e

sub

11

X

X

3

2

X

$3

$2

X

11

$1

20

10

Mux

0

Mux

1

ALUOp

RegDst

ALUcontrol

M

WB

$3

$2

11

Mux

Mux

ALUAddress Read

dataData

memory

10

WB

Zero

Zero

Signextend

Signextend

Datamemory

Address

Clock 3

Clock 4

ID: and $12, $4, $5

30

Instructionmemory

Address

Instruction[20– 16]

Branch

ALUSrc

4

Instruction[15– 0]

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

ALUresult

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

EX

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: or $13, $6, $7 EX: and $12, . . . MEM: sub $11, . . . WB: lw $10, . . .

MEM/WB

IF: add $14, $8, $9

000

10

1100

000

10

101

0

10

00

0

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

1

ALUresult

ALUcontrol

Shiftleft 2

Re

gWrit

e

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: add $14, $8, $9 EX: or $13, . . . MEM: and $12, . . . WB: sub $11, . . .

MEM/WB

IF: after<1>

000

10

1100

000

10

101

0

10

00

0

1

0

Mux

0

1

Add

PC

0Writedata

Mux

1

addControl

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

14

X

X

9

8

Instruction[20– 16]

Instruction[15– 0]

Instruction[15– 11]

X

$9

$8

X

14

Me

mW

rite

MemReadM

em

Writ

e

or

13

X

X

7

6

X

$7

$6

X

13

$4

Mux

0

Mux

1

ALUOp

RegDst

ALUcontrol

M

WB

$7

$6

13

Mux

Mux

ALUReaddata

12

WB

11 10

10$5

12

WB

Mem

toR

eg

1

1

11

11

Writeregister

Writedata

Zero

Zero

Datamemory

Address

Datamemory

Address

Signextend

Signextend

Clock 5

Clock 6

31

Instructionmemory

Address

Instruction[20– 16]

Branch

ALUSrc

4

Instruction[15– 0]

0

1

Add Addresult

RegistersWriteregister

Writedata

ALUresult

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

Signextend

EX

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: after<1> EX: add $14, . . . MEM: or $13, . . . WB: and $12, . . .

MEM/WB

IF: after<2>

000

00

0000

000

10

101

0

10

00

0

Mux

0

1

Add

PC

0Writedata

Readdata

Mux

1

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

1

ALUresult

Zero

ALUcontrol

Shiftleft 2

Re

gWrit

e

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .

MEM/WB

IF: after<3>

000

00

0000

000

00

000

0

10

00

0

1

0

Mux

0

1

Add

PC

0Writedata

Mux

1

Control

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Instruction[20– 16]

Instruction[15– 0] Sign

extend

Instruction[15– 11]

Me

mW

rite

MemReadM

em

Writ

e

$8

Mux

0

Mux

1

ALUOp

RegDst

ALUcontrol

M

WB

Mux

Mux

ALUReaddata

14

WB

13 12

12$9

14

WB

Mem

toR

eg

1

0

13

13

Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2 Zero

Datamemory

Address

Datamemory

Address

Clock 7

Clock 8

32

WB

EX

M

Instructionmemory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Addresult

1

ALUresult

Zero

ALUcontrol

Shiftleft 2R

egW

r ite

M

WB

Inst

r uct

ion

IF/ID EX/MEMID/EX

ID: after<3> EX: after<2> MEM: after<1> WB: add $14, . . .

MEM/ WB

IF: after<4>

000

00

0000

000

00

000

0

00

00

0

1

0

Mux

0

1

Add

PC

0Writedata

Mux

1

Control

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Instruction[20– 16]

Instruction[15– 0] Sign

extend

Instru ction[15– 11]

MemRead

Mem

Wr i

te

Mux

Mux

ALUReaddata

WB

14

14

Writeregister

Writedata

Datamemory

Address

Clock 9

33

Pipeline Hazard: Matching socks in later load

A depends on D; stall since folder tied up

Task

Order

B

C

D

A

E

F

bubble

12 2 AM6 PM 7 8 9 10 11 1

Time303030 3030 3030

34

Problems for Computers• Limits to pipelining: Hazards prevent next

instruction from executing during its designated clock cycle– Structural hazards: HW cannot support this combination

of instructions (single person to fold and put clothes away)

– Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard “bubbles” in the pipeline

– Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)

35

An example for data hazards:

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

36

An example for data hazards:

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

An example for data hazards:Register $2 is updated only at the WB phase, i.e., the 5th clock cycle (actually at the end of the 5th clock cycle). However, we try to use it at the 3rd clock cycle when we read $2 at the decode phase of the and instruction

37

Graphic representation of data hazards:

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecutionorder(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of register $2:

DM Reg

Reg

Reg

Reg

DM

38

sub $2, $1, $3

nop

nop

nop

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Solving data hazards by adding nops

39

Solving data hazards by adding nops

Programexecutionorder(in instructions)

and $12, $2, $

or $13, $6, $ 2

add $14, $2, $

sw $15, 100( $2

5

2

)g

IM Reg

IM Reg DM Reg

IM DM Reg

IM DM Re

Reg

Reg

Reg

DM

IM Reg DMReg

IM Reg DM Reg

IM Reg DMReg

IM Reg DMReg

sub $2, $1, $ 3

nop

nop

nop

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

Value of register $2:

CC 10 CC 11 CC 12

– 20 – 20 – 20

40

Graphic representation of data hazards:

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecutionorder(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of register $2:

DM Reg

Reg

Reg

Reg

DM

41

Forwarding – גניבת הערכים

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecution order(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of register $2 :

DM Reg

Reg

Reg

Reg

X X X – 20 X X X X XValue of EX/MEM :X X X X – 20 X X X XValue of MEM/WB :

DM

42

Forwarding (done at the execute phase)

PCInstruction

memory

Registers

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Datamemory

Mux

Forwardingunit

IF/ID

Inst

ruct

ion

Mux

RdEX/MEM.RegisterRd

MEM/WB.RegisterRd

Rt

Rt

Rs

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

If ID/EX.Rs=EX/MEM.Rd, i.e., the Rd of the previous instruction equals the Rs of the current instruction (which is in the “decode” phase), then we use the “ALUout” of the previous instruction

instead of the output of the GPR. If ID/EX.Rs=MEM/WB.Rd, i.e., the Rd of the previous instruction equals the Rs of the current instruction (which is in the “decode” phase), then we use the “ALUout” of the previous instruction instead of the output of the GPR. [ similarly, compare also ID/EX.Rt to MEM/WB.Rd ]

Similarly, compare also ID/EX.Rt to EX/MEM.Rd and to MEM/WB.Rd

43

Data hazard from previous instruction:

ALU Src A:

If (ID/EX.Rs = = EX/MEM.Rd) use the “ALUOut” instead of Rs

I.e., if Rs of the current executing instruction = = Rd of the previous instruction

The actual equations are:

if ((EX/MEM.RegWrite = = ‘1”)&& (EX/MEM.Rd <> 0)&& (ID/EX.Rs = = EX/MEM.Rd)) => ForwardA=“1,0”

ALU Src B:

If (ID/EX.Rt = = EX/MEM.Rd) use the “ALUOut” instead of Rt

I.e., if Rt of the current executing instruction = = Rd of the previous instruction

The actual equations are:

if ((EX/MEM.RegWrite = = ‘1”)&& (EX/MEM.Rd <> 0)&& (ID/EX.Rt = = EX/MEM.Rd)) => ForwardB=“1,0”

44

Data hazard from 2 instructions back:ALU Src A:

If (ID/EX.Rs = = MEM/WB.Rd) use the GPR “write data” instead of Rs

I.e., if Rs of the current executing instruction = = Rd of 2 instructions ago

The actual equations are:

if ((MEM/WB.RegWrite = = ‘1”)&& (MEM/WB.Rd <> 0)&& (ID/EX.Rs = = MEM/WB.Rd)) => ForwardA=“0,1”

ALU Src B:

If (ID/EX.Rt = = MEM/WB.Rd) use the GPR “write data” instead of Rt

I.e., if Rt of the current executing instruction = = Rd of 2 instructions ago

The actual equations are:

if ((MEM/WB.RegWrite = = ‘1”)&& (MEM/WB.Rd <> 0)&& (ID/EX.Rt = = MEM/WB.Rd)) => ForwardB=“0,1”

Double hazard: If there is a hazard from previous inst and the instruction before that?We should chhose the data from the previous instruction, it is up to date (“newer”)!

45

An example for forwarding דוגמא

Sub $2, $1, $3

And $4, $2, $5 needs forwarding from the previous instruction

Or $4, $4, $2 needs forwarding from two instructions back

Add $9, $4, $2 needs forwarding from 3 instructions back (thru the “transparent” GPR)

Here we discuss the $2 register only

(The first two cases are handled in the execute phase, the last one, in the decode phase).

46

An example for forwarding דוגמא

Sub $2, $1, $3

And $4, $2, $5

Or $4, $4, $2 needs forwarding from the previous instruction

Add $9, $4, $2 needs forwarding from the previous instruction

Here we discuss the $4 register and there are two case (the 2nd one in purple)

47

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

and $4, $2, $5 sub $2, $1, $3

ID/EX

before<1>

EX/MEM

before<2>

MEM/WB

or $4, $4, $2

Clock 3

2

5

10 10

$2

$5

5

2

4

$1

$3

3

1

2

Control

ALU

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

or $4, $4, $2 and $4, $2, $5

ID/EX

sub $2, . . .

EX/MEM

before<1>

MEM/WB

add $9, $4, $2

Clock 4

4

6

10 10

$4

$2

6

2

4

$2

$5

5

2

4

Control

ALU

10

2

WB

M

WB

Since Rs=2 and Rd of previous inst. was 2, we use ALUout instead of Rs

48

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

add $9, $4, $2 or $4, $4, $2

ID/EX

and $4, . . .

EX/MEM

sub $2, . . .

MEM/WB

after<1>

Clock 5

4

2

10 10

$4

$2

2

4

9

$4

$2

4

2

24

Control

ALU

10

WB

2

1

PCInstruction

memory

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

after<1>after<2> add $9, $4, $2 or $4, . . .

EX/MEM

and $4, . . .

MEM/WB

Clock 6

10

$4

$2

2

4

9

ALU

10

4

4

WB

4

1

Registers

Inst

ruct

ion

IF/ID

ID/EX

4

Control

In red we see forwarding from two instructions back (Mem->Exec.), in purple, from previous instruction (WB->Exec.), in blue, from 3 instructions back (WB->Decode).

49

The solution does not work for lw - לא תמיד הפתרון עובד(in lw we do not have the data in the pipe!, it comes from the data memory!)

Reg

IM

Reg

Reg

IM

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

lw $2, 20($1)

Programexecutionorder(in instructions)

and $4, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

DM Reg

Reg

Reg

DM

If the previous instruction was lw to a register and we try to use the register in the current instruction, we have a problem, since we cannot go back in time!

One solution is to avoid such cases by adding a nop (by the Assembler) whenever Rt of the lw is equal to Rs or Rt of the following instruction.

50

Another h/w solution is to add Bubbles,i.e., add nop by hardware

lw $2, 20($1)

Programexecutionorder(in instructions)

and $4, $2, $5

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

Reg

IM

Reg

Reg

IM DM

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6Time (in clock cycles)

IM Reg DM RegIM

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9 CC 10

DM Reg

RegReg

Reg

bubble

“nop”

We need to hold IF/ID for one ck cycle and insert a “nop: into ID/EX. This is equal to adding a nop instruction by the Assembler.

51

Hazard detection unit

PCInstruction

memory

Registers

Mux

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Datamemory

Mux

Hazarddetection

unit

Forwardingunit

0

Mux

IF/ID

Inst

ruct

ion

ID/EX.MemRead

IF/I

DW

rite

PC

Wri

te

ID/EX.RegisterRt

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

RtRs

Rd

Rt EX/MEM.RegisterRd

MEM/WB.RegisterRd

We need to hold the IF/ID and PC for one ck cycle and insert a “nop: into ID/EX. This is equal to adding a nop instruction by the Assembler.

If (ID/EX.MemRd)&& ( (ID/EX.Rt= =IF/ID.Rs) || (ID/EX.Rt= =IF/ID.Rt) ) we must “stall” the pipeline! This means that prev. inst was lw and it was to the current Rs or Rt. (of course if one of them is not used, don’t stall)

Holding means”freeze” the IF/ID and the PC for 1 clock cycle

Hold the IF/ID by not giving a IF/IDWrire signal and do not increment the PC (which already points at the nex instruction) by not giving the PCWrite signal. Inserting a nop is by clearing all control signals.

Rt from prev. inst.

Rs, Rt of current inst.

identifies lw

52

An example for lw hazard detection דוגמא

lw $2, 20($1)

And $4, $2, $5

Or $4, $4, $2

Add $9, $4, $2

53

Hazarddetection

unit

0

MuxIF

/ID

Wri

te

PC

Wri

te

ID/EX.RegisterRt

lw $2, 20($1)

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

and $4, $2, $5

ID/EX

before<1>

EX/MEM

before<2>

MEM/WB

or $4, $4, $2

Clock 3

2

5

2

500 11

$2

$5

5

2

4

$1

$X

X

1

2

Control

ALU

M

WB

Hazarddetection

unit

0

MuxIF

/ID

Wri

te

PC

Wri

te

ID/EX.RegisterRt

ID/EX.MemRead

ID/EX.MemRead

M

WB

$1

$X

X

1

2

before<3>

PCInstruction

memory

Registers

Mux

Mux

Mux

EX WB

Datamemory

Mux

Forwardingunit

Inst

ruct

ion

IF/ID

ID/EX

EX/MEM

MEM/WB

and $4, $2, $5 lw $2, 20($1) before<1> before<2>

Clock 2

1

1

X

X11

Control

ALU

M

WB

54

Hazarddetection

unit

0

MuxIF

/IDW

rite

PC

Writ

e

ID/EX.RegisterRt

$2

$5

5

2

2

2

4

WB

Hazarddetection

unit

0

MuxIF

/IDW

rite

PC

Writ

e

ID/EX.RegisterRt

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

Datamemory

Mux

Inst

ruct

ion

IF/ID

and $4, $2, $5 bubble

ID/EX

lw $2, . . .

EX/MEM

before<1>

MEM/WB

Clock 4

2

2

5

510

11

00

$2

$5

5

2

4

Control

ALU

M

WB

bubble lw $2, . . .

PCInstruction

memory

Registers

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

Forwardingunit

Inst

ruct

ion

IF/ID

and $4, $2, $5

ID/EX

EX/MEM

MEM/WB

add $9, $4, $2

Clock 5

2

210 10

11

$4

$2

2

4

4

4

2

4

$2

$5

5

2

4

Control

ALU

0

WB

ID/EX.MemRead

ID/EX.MemRead

or $4, $4, $2

or $4, $4, $2

The lw instruction is in the WB phase. $2 is “being written”. We can use $2 in the Execute phase of the and instruction, with the help of forwarding.

55

Registers

Inst

ruct

ion

ID/EX

4

Control

PCInstruction

memory

PCInstruction

memory

Hazarddetection

unit

0

Mux

IF/I

DW

rite

PC

Wri

te

IF/I

DW

rite

PC

Wri

te

ID/EX.RegisterRt

bubble

Registers

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Inst

ruct

ion

IF/ID

add $9, $4, $2

ID/EX

and $4, . . .

EX/MEM

MEM/WB

Clock 6

4

4

2

210 10

$4

$2

2

4

49

$2

2

Control

ALU

10

WB0

add $9, $4, $2 or $4, . . . and $4, . . .after<2> after<1>

after<1>

Clock 7

Mux

Mux

Mux

EX

M

WB

M

WB

Datamemory

Mux

Forwardingunit

Forwardingunit

EX/MEM

MEM/WB

10 10

$4

$4

$2

2

4

4

9

ALU

10

WB

44

4

1

Hazarddetection

unit

0

Mux

ID/EX.RegisterRt

or $4, $4, $2

ID/EX.MemRead

ID/EX.MemRead

Mux

IF/ID

56

PC

Instructionmemory

Inst

ruct

ion

Add

Instruction[20– 16]

Me

mto

Re

g

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15– 0]

0

0

Mux

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

Writedata

Readdata

Mux

1

ALUcontrol

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

6

EX

M

WB

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

Mux

0

1

Me

mW

rite

AddressData

memory

Address

Just to remind us how branch is handled we show again the Datapath with Control

57

Branch Hazards

Reg

Reg

CC 1

Time (in clock cycles)

40 beq $1, $3, 7

Programexecutionorder(in instructions)

IM Reg

IM DM

IM DM

IM DM

DM

DM Reg

Reg Reg

Reg

Reg

RegIM

44 and $12, $2, $5

48 or $13, $6, $2

52 add $14, $2, $2

72 lw $4, 50($7)

CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

Reg

Here we calc.Rs-Rt

Here we decide to branch (switching the address to the PC and issuing PCWrite Cond)

These 3 instructions should be “killed” before they do harm, I.e., change any register.

In cc5 we already use the new PC calculated by the branch. (PC=72)

58

PC

Instructionmemory

Inst

ruct

ion

Add

Instruction[20– 16]

Me

mto

Re

g

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15– 0]

0

0

Mux

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

Writedata

Readdata

Mux

1

ALUcontrol

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

6

EX

M

WB

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

Mux

0

1

Me

mW

rite

AddressData

memory

Address

The situation was better if we some how “moved” the branch address calculation one ck earlier. This is easy to dosince sign extension and shift are only wires. We just need to move the branch address ALU 1 register to the left. Rverything happens 1 ck earlier and so we’ll have to “kill” only two instructions.

Next, we’ll add a fast comparator which will compare Rs and Rt at the same ck cycle of the “decode” phase. (Instead of using the ALU to calc. Rs-Rt, we’ll built a simple and fast xor circuit). This means extra h/w but now we earned one more ck cycle. So, we have to kill only a single instruction.

Killing an instruction also called “flushing” the pipeline, is easily done by clraing the IF/ID register of the instruction following the branch (if the branch is successful)

59

Flushing

PCInstruction

memory

4

Registers

Mux

Mux

Mux

ALU

EX

M

WB

M

WB

WB

ID/EX

0

EX/MEM

MEM/WB

Datamemory

Mux

Hazarddetection

unit

Forwardingunit

IF.Flush

IF/ID

Signextend

Control

Mux

=

Shiftleft 2

Mux

60

An example for flushing דוגמא

sub $10, $4, $8

beq $1, $3, 7

and $12, $2, $5

lw $4, 50($7)

61

PCInstruction

memory

4

Registers

Signextend

Mux

Mux

Control

EX

M

WB

M

WB

WB

Mux

Hazarddetection

unit

Forwardingunit

Mux

IF.Flush

IF/ID

and $12, $2, $5 beq $1, $3, 7 sub $10, $4, $8

MEM/WB

EX/MEM

ID/EX

Clock 3

72 44

48 44

28

7

$1

$3

10

48

72

72

0

Mux

0

$4

$8

ALUData

memory

bubble (nop)lw $4, 50($7)

Clock 4

Mux

Shiftleft 2

before<1>

beq $1, $3, 7 sub $10, . . . before<1>

before<2>

=

PC Instructionmemory

4

Registers

Signextend

Mux

Mux

Control

EX

M

WB

M

WB

WB

Mux

Hazarddetection

unit

Forwardingunit

IF.Flush

IF/ID

MEM/WB

EX/MEM

ID/EX

76 72

76 72

$1

$3

10

76

ALUData

memory

Mux

Shiftleft 2

=

sub $10, $4, $8

beq $1, $3, 7

and $12, $2, $5

lw $4, 50($7)

62

Summary of hazardsData hazards:

* Forward from previous instruction

* Forward from two instructions ago

* (Forward thru “transparent”GPR = from 3 instructions ago)

* If we cannot forward, (after lw) we stall the pipe by inserting a nop and freezing IF/ID and PC for 1 ck cycle

Control hazards:

* If branch is successful we flush the instruction following the branch (which is at the IF/ID register. We

just clear the register)

Notes:

In the real MIPS CPU, no flush was employed. This give the compiler the opportunity to put useful instructions following the branch. This explains why the simulator always performs the instruction following the branch.this is called a delayed branch.

Also, in the real MIPS CPU no lw stall was used. Again this give some freedom to the compiler to

choose whether to put a nop following lw or some useful instruction. This is called a delayed load.

Recommended