46
06-1 06-1 MIPS Pipelined Implementation Outline: (In this set.) Unpipelined Implementation. (Diagram only.) Pipelined MIPS Implementations: Hardware, notation, hazards. Dependency Definitions. Data Hazards: Definitions, stalling, bypassing. Control Hazards: Squashing, one-cycle implementation. Outline: (Covered in class but not yet in set.) Operation of nonpipelined implementation, elegance and power of pipelined implementation. (See text.) Computation of CPI for program executing a loop. 06-1 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06-1

061 MIPS Pipelined Implementation 061

  • Upload
    others

  • View
    18

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 061 MIPS Pipelined Implementation 061

06­1 06­1MIPS Pipelined Implementation

Outline: (In this set.)

Unpipelined Implementation. (Diagram only.)

Pipelined MIPS Implementations: Hardware, notation, hazards.

Dependency Definitions.

Data Hazards: Definitions, stalling, bypassing.

Control Hazards: Squashing, one-cycle implementation.

Outline: (Covered in class but not yet in set.)

Operation of nonpipelined implementation, elegance and power of pipelined implementation.(See text.)

Computation of CPI for program executing a loop.

06­1 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­1

Page 2: 061 MIPS Pipelined Implementation 061

06­2 06­2

Very Simple MIPS Implementation

From EE 3755.

PCen

NPCen

IRen

op

data in

dataaddr

Mem

Portaddr

addr

addr

data

Reg File

data

data in

xma

rt 20:16

rs 25:21

rd 15:11

prepare

imm

uimm

simm

limm

bimm

imm 15:0

32d4

xrwr

xrws

xalu1

xalu2

oalu

omemenpc epceir op

5d315d0

rsv

rtvmd

alu

opcode 31:26

func 5:0

to control logic

Features

Avoid duplication of hardware: One Memory Port, One Adder (ALU).

Relatively complex control logic needed to re-use ALU, etc.

06­2 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­2

Page 3: 061 MIPS Pipelined Implementation 061

06­3 06­3Unpipelined Implementation

In this implementation hardware is duplicated.

Instruction fetchInstruction decode/

register fetch

Execute/ address

calculation

Memory access

Write back

B

PC

4

ALU

16 32

Add

Data memory

Registers

Sign extend

Instruction memory

M u x

M u x

M u x

M u x

Zero?Branch

takenCond

NPC

lmm

ALU output

IRA

LMD

FIGURE 3.1 The implementation of the DLX datapath allows every instruction to be executed in four or five clock cycles.

06­3 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­3

Page 4: 061 MIPS Pipelined Implementation 061

06­4 06­4Pipelined MIPS Implementation

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

IMM

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dstDecodedest. reg

NPC

=30 22'b0

PC

15:0

D

dstdst

E

Note: diagram omits connections for some instructions.

06­4 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­4

Page 5: 061 MIPS Pipelined Implementation 061

06­5 06­5Pipeline Details

Pipelining Idea

Split hardware into n equally sized (in time) stages . . .

. . . separate the stages using special registers called pipeline latches . . .

. . . increase the clock frequency by / n× . . .

. . . avoid problems due to overlapping of execution.

Pipeline Stages

Pipeline divided into stages.

Each stage occupied by at most one instruction.

At any time, each stage can be occupied by its own instruction.

Stages given names: IF, ID, EX, ME, WB

Sometimes ME written as MEM.

06­5 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­5

Page 6: 061 MIPS Pipelined Implementation 061

06­6 06­6

Pipeline Latches:

Registers separating pipeline stages.

Written at end of each cycle.

To emphasize role shown in diagram as bar separating stages.

Registers named using pair of stage names and register name.

For example, IF/ID.IR, ID/EX.IR, ID/EX.A (used in text, notes).

if id ir, id ex ir, id ex rs val (used in Verilog code).

06­6 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­6

Page 7: 061 MIPS Pipelined Implementation 061

06­7 06­7Pipeline Execution Diagram

Pipeline Execution Diagram:

Diagram showing the pipeline stages that instructions occupy as they execute.

Time on horizontal axis, instructions on vertical axis.

Diagram shows where instruction is at a particular time.

Cycle 0 1 2 3 4 5 6

add r1, r2, r3 IF ID EX ME WB

and r4, r5, r6 IF ID EX ME WB

lw r7, 8(r9) IF ID EX ME WB

A vertical slice (e.g., at cycle 3) shows processor activity at that time.

In such a slice a stage should appear at most once . . .

. . . if it appears more than once execution not correct . . .

. . . since a stage can only execute one instruction at a time.

06­7 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­7

Page 8: 061 MIPS Pipelined Implementation 061

06­8 06­8Instruction Decoding and Pipeline Control

Pipeline Control

Setting control inputs to devices including . . .

. . . multiplexor inputs . . .

. . . function for ALU . . .

. . . operation for memory . . .

. . . whether to clock each register . . .

. . . et cetera.

06­8 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­8

Page 9: 061 MIPS Pipelined Implementation 061

06­9 06­9

Options for controlling pipeline:

• Decode in ID

Determine settings in ID, pass settings along in pipeline latches.

• Decode in Each Stage

Pass opcode portions of instruction along.

Decoding performed as needed.

Real systems decode in ID.

Example given later in this set.

06­9 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­9

Page 10: 061 MIPS Pipelined Implementation 061

06­10 06­10Dependencies and Hazards

Remember

Operands read from registers in ID. . .

. . . and results written to registers in WB.

Consider the following incorrect execution:

! Cycle 0 1 2 3 4 5 6 7

add r1, r2, r3 IF ID EX ME WB

sub r4, r1, r5 IF ID EX ME WB

and r6, r1, r8 IF ID EX ME WB

xor r9, r4, r11 IF ID EX ME WB

Execution incorrect because . . .

. . . sub reads r1 before add writes (or even finishes computing) r1, . . .

. . . and reads r1 before add writes r1, and . . .

. . . xor reads r4 before sub writes r4.

06­10 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­10

Page 11: 061 MIPS Pipelined Implementation 061

06­11 06­11Dependencies and Hazards

Incorrect execution due to. . .

. . . dependencies in program. . .

. . . and hazards in hardware (pipeline).

Incorrect execution above is the “fault” of the hardware. . .

. . . because the ISA does not forbid dependencies.

Dependency:

A relationship between two instructions . . .

. . . indicating that their execution should be (or appear to be) in program order.

Hazard:

A potential execution problem in an implementation due to overlapping instruction execution.

There are several kinds of dependencies and hazards.

For each kind of dependence there is a corresponding kind of hazard.

06­11 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­11

Page 12: 061 MIPS Pipelined Implementation 061

06­12 06­12Dependencies

Dependency:

A relationship between two instructions . . .

. . . indicating that their execution should be, or appear to be, in program order.

If B is dependent on A then B should appear to execute after A.

Dependency Types:

• True, Data, or Flow Dependence (Three different terms used for the same concept.)

• Name Dependence

• Control Dependence

06­12 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­12

Page 13: 061 MIPS Pipelined Implementation 061

06­13 06­13Data Dependence

Data Dependence: (a.k.a., True and Flow Dependence)

A dependence between two instructions . . .

. . . indicating data needed by the second is produced by the first.

Example:

add r1, r2, r3

sub r4, r1, r5

and r6, r4, r7

The sub is dependent on add (via r1).

The and is dependent on sub (via r4).

The and is dependent add (via sub).

Execution may be incorrect if . . .

. . . a program having a data dependence . . .

. . . is run on a processor having an uncorrected RAW hazard.

06­13 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­13

Page 14: 061 MIPS Pipelined Implementation 061

06­14 06­14Name Dependencies

There are two kinds: antidependence and output dependence.

Antidependence:

A dependence between two instructions . . .

. . . indicating a value written by the second . . .

. . . that the first instruction reads.

Antidependence Example

add r1, r2, r3

sub r2, r4, r5

sub is antidependent on the add.

Execution may be incorrect if . . .

. . . a program having an antidependence . . .

. . . is run on a processor having an uncorrected WAR hazard.

06­14 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­14

Page 15: 061 MIPS Pipelined Implementation 061

06­15 06­15

Output Dependence:

A dependence between two instructions . . .

. . . indicating that both instructions write the same location . . .

. . . (register or memory address).

Output Dependence Example

add r1, r2, r3

sub r1, r4, r5

The sub is output dependent on add.

Execution may be incorrect if . . .

. . . a program having an output dependence . . .

. . . is run on a processor having an uncorrected WAW hazard.

06­15 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­15

Page 16: 061 MIPS Pipelined Implementation 061

06­16 06­16

Control Dependence:

A dependence between a branch instruction and a second instruction . . .

. . . indicating that whether the second instruction executes . . .

. . . depends on the outcome of the branch.

beq $1, $0 SKIP # Delayed branch

nop

add $2, $3, $4

SKIP:

sub $5, $6, $7

The add is control dependent on the beq.

The sub is not control dependent on the beq.

06­16 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­16

Page 17: 061 MIPS Pipelined Implementation 061

06­17 06­17Pipeline Hazards

Hazard:

A potential execution problem in an implementation due to overlapping instruction execution.

Interlock:

Hardware that avoids hazards by stalling certain instructions when necessary.

Hazard Types:

Structural Hazard:

Needed resource currently busy.

Data Hazard:

Needed value not yet available or overwritten.

Control Hazard:

Needed instruction not yet available or wrong instruction executing.

06­17 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­17

Page 18: 061 MIPS Pipelined Implementation 061

06­18 06­18Data Hazards

Identified by acronym indicating correct operation.

• RAW: Read after write, akin to data dependency.

• WAR: Write after read, akin to anti dependency.

• WAW: Write after write, akin to output dependency.

MIPS implementations above only subject to RAW hazards.

RAR not a hazard since read order irrelevant (without an intervening write).

06­18 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­18

Page 19: 061 MIPS Pipelined Implementation 061

06­19 06­19Interlocks

When threatened by a hazard:

• Stall (Pause a part of the pipeline.)

Stalling avoids overlap that would cause error.

This does slow things down.

• Add hardware to avoid the hazards.

Details of hardware depend on hazard and pipeline.

Several will be covered.

06­19 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­19

Page 20: 061 MIPS Pipelined Implementation 061

06­20 06­20Structural Hazards

Cause: two instructions simultaneously need one resource.

Solutions:

Stall.

Duplicate resource.

Pipelines in this section do not have structural hazards.

Covered in more detail with floating-point instructions.

06­20 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­20

Page 21: 061 MIPS Pipelined Implementation 061

06­21 06­21Data Hazards

Pipelined MIPS Subject to RAW Hazards.

Consider the following incorrect execution of code containing data dependencies.

! Cycle 0 1 2 3 4 5 6 7

add r1, r2, r3 IF ID EX ME WB

sub r4, r1, r5 IF ID EX ME WB

and r6, r1, r8 IF ID EX ME WB

xor r9, r4, r11 IF ID EX ME WB

Execution incorrect because . . .

. . . sub reads r1 before add writes (or even finishes computing) r1, . . .

. . . and reads r1 before add writes r1, and . . .

. . . xor reads r4 before sub writes r4.

Problem fixed by stalling the pipeline.

06­21 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­21

Page 22: 061 MIPS Pipelined Implementation 061

06­22 06­22

Stall:

To pause execution in a pipeline from IF up to a certain stage.

With stalls, code can execute correctly:

For code on previous slide, stall until data in register.

! Cycle 0 1 2 3 4 5 6 7 8 9 10

add r1, r2, r3 IF ID EX ME WB

sub r4, r1, r5 IF ID -----> EX ME WB

and r6, r1, r8 IF -----> ID EX ME WB

xor r9, r4, r11 IF ID -> EX ME WB

Arrow shows that instructions stalled.

Stall creates a bubble, stages without valid instructions, in the pipeline.

With bubbles present, CPI is greater than its ideal value of 1.

06­22 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­22

Page 23: 061 MIPS Pipelined Implementation 061

06­23 06­23

Stall Implementation

Stall implemented by asserting a hold signal . . .

. . . which inserts a nop (or equivalent) after the stalling instruction . . .

. . . and disables clocking of pipeline latches before the stalling instruction.

! Cycle 0 1 2 3 4 5 6 7 8 9 10

add r1, r2, r3 IF ID EX ME WB

sub r4, r1, r5 IF ID -----> EX ME WB

and r6, r1, r8 IF -----> ID EX ME WB

xor r9, r4, r11 IF ID -> EX ME WB

During cycle 3, a nop is in EX.

During cycle 4, a nop is in EX and ME .

The two adjacent nops are called a bubble . . .

. . . they move through the pipeline with the other instructions.

A third nop is in EX in cycle 7.

06­23 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­23

Page 24: 061 MIPS Pipelined Implementation 061

06­24 06­24Bypassing

Some stalls are avoidable.

Consider again:

! Cycle 0 1 2 3 4 5 6 7 8 9 10

add r1, r2, r3 IF ID EX ME WB

sub r4, r1, r5 IF ID EX ME WB

and r6, r1, r8 IF ID EX ME WB

xor r9, r4, r11 IF ID EX ME WB

Note that the new value of r1 needed by sub . . .

. . . has been computed at the end of cycle 2 . . .

. . . and isn’t really needed until the beginning of the next cycle, 3.

Execution was incorrect because the value had to go around the pipeline to ID.

Why not provide a shortcut?

Why not call a shortcut a bypass or forwarding path?

06­24 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­24

Page 25: 061 MIPS Pipelined Implementation 061

06­25 06­25Non-Bypassed MIPS

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

IMM

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dstDecodedest. reg

NPC

=30 22'b0

PC

15:0

D

dstdst

E

06­25 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­25

Page 26: 061 MIPS Pipelined Implementation 061

06­26 06­26Bypassed MIPS

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC

06­26 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­26

Page 27: 061 MIPS Pipelined Implementation 061

06­27 06­27MIPS Implementation With Some Forwarding Paths

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC

! Cycle 0 1 2 3 4 5 6 7 8 9 10

add r1, r2, r3 IF ID EX ME WB

sub r4, r1, r5 IF ID EX ME WB

and r6, r1, r8 IF ID EX ME WB

xor r9, r4, r11 IF ID EX ME WB

It works!

06­27 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­27

Page 28: 061 MIPS Pipelined Implementation 061

06­28 06­28MIPS Implementation With Some Forwarding Paths

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC Some stalls unavoidable.

! Cycle 0 1 2 3 4 5 6 7 8 9 10

lw r1, 0(r2) IF ID EX ME WB

add r1, r1, r4 IF ID -> EX ME WB

sw 4(r2), r1 IF -> ID -----> EX ME WB

addi r2, r2, 8 IF -----> ID EX ME WB

Stall due to lw could not be avoided (data not available in cycle 3).

Stall in cycles 5 and 6 could be avoided with a new forwarding path.

06­28 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­28

Page 29: 061 MIPS Pipelined Implementation 061

06­29 06­29Bypass Control Logic for Lower ALU Mux

Start with logic for dst, show path of Mux logic.

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

IMM

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dstDecodedest. reg

NPC

=

30 22'b0

PC

+15:0

25:0

29:26

29:0

15:0

D0 1

dstdst

mx1Design

me!

Me too!

06­29 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­29

Page 30: 061 MIPS Pipelined Implementation 061

06­30 06­30Logic to determine dst for register file.

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

IMM

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dst

NPC

30 2

2'b0

PC

+15:0

25:0

29:26

29:0

15:0

D

dstdst

mx1Design

me next!

is Type R

is Store

is Branch

is J

is JAL

is JR

is JALR

Dest is rd.

No dest (use r0).

Dest is r31.

Dest is rt.

rt 20:16

rd 15:11

5'd0

5'd31

00

11

01

10

lsb

msb

=

06­30 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­30

Page 31: 061 MIPS Pipelined Implementation 061

06­31 06­31Bypass Control Logic for Lower ALU Mux

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

imm

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dst

NPC

=

30 2

PC

+15:0

25:0

29:26

29:0

15:0

D0 1

dstdst

mx1

is Type R

lsb

msb

Decode

Dest

=' ='rt 20:16

ByME

rtv

ByWB

imm

ByME

rtv

ByWB

imm

00

01

10

11

ID.IR signals in purple.

2'b0

msb lsb

06­31 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­31

Page 32: 061 MIPS Pipelined Implementation 061

06­32 06­32Bypass Control Logic

Control logic not minimized (for clarity).

Control Logic Generating dst.

Present in previous implementations, just not shown.

Determines which register gets written based on instruction.

Instruction categories used in boxes such as = is Store (some instructions omitted):

= is Type R : All Type R instructions.

= is Store : All store instructions.

= is Branch : branches such as beq and bltz.

= is JR , = is J , etc.: Matches the exact instruction.

06­32 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­32

Page 33: 061 MIPS Pipelined Implementation 061

06­33 06­33Bypass Control Logic, Continued

Logic Generating ID/EX.MUX.

=′ box determines if two register numbers are equal.

Register number zero is not equal register zero, nor any other register.

(The bypassed zero value might not be zero.)

06­33 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­33

Page 34: 061 MIPS Pipelined Implementation 061

06­34 06­34Control Hazards

Cause: on taken CTI several wrong instructions fetched.

Consider:

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC

06­34 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­34

Page 35: 061 MIPS Pipelined Implementation 061

06­35 06­35

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC

Example of incorrect execution

!I Adr Cycle 0 1 2 3 4 5 6 7 8

0x100 bgtz r4, TARGET IF ID EX ME WB

0x104 sub r4, r2, r5 IF ID EX ME WB

0x108 sw 0(r2), r1 IF ID EX ME WB

0x10c and r6, r1, r8 IF ID EX ME WB

0x110 or r12, r13, r14

...

TARGET: ! TARGET = 0x200

0x200 xor r9, r4, r11 IF ID EX ME WB

Branch is taken yet two instructions past delay slot (sub) complete execution.

Branch target finally fetched in cycle 4.

Problem: Two instructions following delay slot.

06­35 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­35

Page 36: 061 MIPS Pipelined Implementation 061

06­36 06­36

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC

Handling Instructions Following a Taken Branch Delay Slot

Option 1: Don’t fetch them.

Possible (with pipelining) because . . .

. . . fetch starts (sw in cycle 2) . . .

. . . after branch decoded.

(Would be impossible . . .

. . . for non-delayed branch.)

!I Adr Cycle 0 1 2 3 4 5 6 7 8

0x100 bgtz r4, TARGET IF ID EX ME WB

0x104 sub r4, r2, r5 IF ID EX ME WB

0x108 sw 0(r2), r1 IF ID EX ME WB

0x10c and r6, r1, r8 IF ID EX ME WB

0x110 or r12, r13, r14

...

TARGET: ! TARGET = 0x200

0x200 xor r9, r4, r11 IF ID EX ME WB

06­36 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­36

Page 37: 061 MIPS Pipelined Implementation 061

06­37 06­37

Handling Instructions Following a Taken Branch

Option 2: Fetch them, but squash (stop) them in a later stage.

This will work if instructions squashed . . .

. . . before modifying architecturally visible storage (registers and memory).

Memory modified in ME stage and registers modified in WB stage . . .

. . . so instructions must be stopped before beginning of ME stage.

Can we do it? Depends depends where branch instruction is.

In example, need to squash sw before cycle 5.

During cycle 3 bgtz in ME . . .

. . . it has been decoded and the branch condition is available . . .

. . . so we know whether the branch is taken . . .

. . . so sw can easily be squashed before cycle 5.

Option 2 will be used.

06­37 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­37

Page 38: 061 MIPS Pipelined Implementation 061

06­38 06­38Instruction Squashing

In-Flight Instruction::

An instruction in the execution pipeline.

Later in the semester a more specific definition will be used.

Squashing:: [an instruction]

preventing an in-flight instruction . . .

. . . from writing registers, memory or any other visible storage.

Squashing also called: nulling, abandoning, and cancelling..

Like an insect, a squashed instruction is still there (in most cases) but can do no harm.

06­38 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­38

Page 39: 061 MIPS Pipelined Implementation 061

06­39 06­39Squashing Instruction in Example MIPS Implementation

Two ways to squash.

• Prevent it from writing architecturally visible storage.

Replace destination register control bits with zero. (Writing zero doesn’t change anything.)

Set memory control bits (not shown so far) for no operation.

• Change Operation to nop.

Would require changing many control bits.

Squashing shown that way here for brevity.

Illustrated by placing a nop in IR.

Why not replace squashed instructions with target instructions?

Because there is no straightforward and inexpensive way . . .

. . . to get the instructions where and when they are needed.

(Curvysideways and expensive techniques covered in Chapter 4.)

06­39 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­39

Page 40: 061 MIPS Pipelined Implementation 061

06­40 06­40

MIPS implementation used so far.

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC

06­40 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­40

Page 41: 061 MIPS Pipelined Implementation 061

06­41 06­41

Example of correct execution

!I Adr Cycle 0 1 2 3 4 5 6 7 8

0x100 bgtz r4, TARGET IF ID EX ME WB

0x104 sub r4, r2, r5 IF ID EX ME WB

0x108 sw 0(r2), r1 IF IDx

0x10c and r6, r1, r8 IFx

0x110 or r12, r13, r14

...

TARGET: ! TARGET = 0x200

0x200 xor r9, r4, r11 IF ID EX ME WB

Branch outcome known at end of cycle 2 . . .

. . . wait for cycle 3 when doomed instructions (sw and and) in flight . . .

. . . and squash them so in cycle 4 they act like nops.

Two cycles (1, 2, and 3), are lost.

Two cycles called a branch penalty.

Two cycles is alot of cycles, is there something we can do?

06­41 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­41

Page 42: 061 MIPS Pipelined Implementation 061

06­42 06­42Yes: Zero-Cycle Branch Delay Implementation

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

IMM

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dstDecodedest. reg

NPC

=

30 22'b0

PC

+15:0

25:0

29:26

29:0

15:0

D0 1

dstdst

msb lsb

msb

lsb

Compute branch target address in ID stage.

06­42 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­42

Page 43: 061 MIPS Pipelined Implementation 061

06­43 06­43Zero-Cycle Branch Delay Implementation

Compute branch target and condition in ID stage.

Workable because register values not needed to compute branch address and . . .

. . . branch condition can be computed quickly.

Now how fast will code run?

!I Adr Cycle 0 1 2 3 4 5 6 7 8

0x100 bgtz r4, TARGET IF ID EX ME WB

0x104 sub r4, r2, r5 IF ID EX ME WB

0x108 sw 0(r2), r1

0x10c and r6, r1, r8

0x110 or r12, r13, r14

...

TARGET: ! TARGET = 0x200

0x200 xor r9, r4, r11 IF ID EX ME WB

No penalty, not a cycle wasted!!

06­43 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­43

Page 44: 061 MIPS Pipelined Implementation 061

06­44 06­44Non-Bypassed MIPS

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

IMM

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dstDecodedest. reg

NPC

=30 22'b0

PC

15:0

D

dstdst

E

06­44 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­44

Page 45: 061 MIPS Pipelined Implementation 061

06­45 06­45Bypassed MIPS

format

immed

IR

Addr 25:21

20:16

IR

IF ID EX WB MEM

IR IR

rsv

rtv

IMM

NPC

ALU Addr

Data

Data

Addr D In

+4

PC

Mem

Port

Addr

Data

Out

Addr

Data

In

Mem

Port

Data

Out rtv

ALU

MD

dst dst dst Decode

dest. reg

= =0 <0

E

Z

N

NPC

06­45 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­45

Page 46: 061 MIPS Pipelined Implementation 061

06­46 06­46ID Branch MIPS

formatimmed

IR

Addr25:21

20:16

IF ID EX WBME

rsv

rtv

IMM

NPC

ALUAddr

Data

Data

Addr D In

+1

Mem

Port

Addr

Data

Out

Addr

DIn

MemPort

Outrtv

ALU

MD

dstDecodedest. reg

NPC

=

30 22'b0

PC

+15:0

25:0

29:26

29:0

15:0

D0 1

dstdst

msb lsb

msb

lsb

06­46 EE 4720 Lecture Transparency. Formatted 9:07, 29 January 2016 from lsli06. 06­46