Review: Datapath for MIPS

Preview:

DESCRIPTION

CS61C : Machine Structures Lecture 6.1.1 Pipelining II 2004-07-26 Kurt Meinz inst.eecs.berkeley.edu/~cs61c. rd. instruction memory. registers. PC. rs. Data memory. rt. IFtch. Dcd. Exec. Mem. WB. +4. imm. ALU. 2. Decode/ Register Read. 5. Write Back. 1. Instruction Fetch. - PowerPoint PPT Presentation

Citation preview

CS 61C L6.1.1 Pipelining II (1) K. Meinz, Summer 2004 © UCB

CS61C : Machine StructuresLecture 6.1.1Pipelining II

2004-07-26

Kurt Meinz

inst.eecs.berkeley.edu/~cs61c

CS 61C L6.1.1 Pipelining II (2) K. Meinz, Summer 2004 © UCB

Review: Datapath for MIPS

•Use datapath figure to represent pipelineIFtch Dcd Exec Mem WB

AL

U I$ Reg D$ Reg

PC

inst

ruct

ion

mem

ory

+4

rtrsrd

regi

ster

s

ALU

Dat

am

emor

y

imm

1. InstructionFetch

2. Decode/ Register Read

3. Execute 4. Memory5. WriteBack

CS 61C L6.1.1 Pipelining II (3) K. Meinz, Summer 2004 © UCB

Review: Problems for Computers

•Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle

• Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)

• Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard; “bubbles” in the pipeline

• Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)

CS 61C L6.1.1 Pipelining II (4) K. Meinz, Summer 2004 © UCB

Review: C.f. Branch Delay vs. Load Delay•Load Delay occurs only if necessary (dependent instructions).•Branch Delay always happens (part of the ISA).

•Why not have Branch Delay interlocked?

• Answer: Interlocks only work if you can detect hazard ahead of time. By the time we detect a branch, we already need its value … hence no interlock is possible!

CS 61C L6.1.1 Pipelining II (5) K. Meinz, Summer 2004 © UCB

FYI: Historical Trivia•First MIPS design did not interlock and stall on load-use data hazard

•Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages

• Word Play on acronym for Millions of Instructions Per Second, also called MIPS

• Load/Use Wrong Answer!

CS 61C L6.1.1 Pipelining II (6) K. Meinz, Summer 2004 © UCB

Outline•Pipeline Control

•Forwarding Control

•Hazard Control

CS 61C L6.1.1 Pipelining II (7) K. Meinz, Summer 2004 © UCB

Piped Proc So Far …

CS 61C L6.1.1 Pipelining II (8) K. Meinz, Summer 2004 © UCB

New Representation: Regs more explicit

Exec

Reg

. Fi

le

Dat

aM

em

A

B

S

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

M

S

IF/DE DE/EX EX/ME ME/WB

IF/DE.Ir = InstructionDE/EX.A = BusA out of RegEX/ME.S = AluOutEX/ME.D = Bus B pass-through for swME/WB.S = ALuOut pass-throughME/WB.M = Mem Result from lw

CS 61C L6.1.1 Pipelining II (9) K. Meinz, Summer 2004 © UCB

New Representation: Regs more explicit

Exec

Reg

. Fi

le

Dat

aM

em

A

B

S

Reg

File

PC

Nex

t PC

IR

Inst

. Mem

D

M

S

IF/DE DE/EX EX/ME ME/WB

What’s Missing???

CS 61C L6.1.1 Pipelining II (10) K. Meinz, Summer 2004 © UCB

Pipelined Processor (almost) for slides

Exec

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

S

M

Reg

File

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

Valid

IRex

Dcd

Ctrl

IRm

em

Ex

Ctrl

IRw

b

Mem

Ctrl

WB

Ctrl

D

Idea: Parallel Piped Control …

CS 61C L6.1.1 Pipelining II (11) K. Meinz, Summer 2004 © UCB

Pipelined Control IR <- Mem[PC]; PC <– PC+4;

A <- R[rs]; B<– R[rt]

S <– A + B;

R[rd] <– S;

S <– A + SX;

M <– Mem[S]

R[rd] <– M;

S <– A or ZX;

R[rt] <– S;

S <– A + SX;

Mem[S] <- B

If CondPC < PC+SX;

Exec

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

S

Reg

File

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

D

M

CS 61C L6.1.1 Pipelining II (12) K. Meinz, Summer 2004 © UCB

Data Stationary Control• The Main Control generates the control signals during Reg/Dec• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later• Control signals for Mem (MemWr Branch) are used 2 cycles later• Control signals for Wr (MemtoReg MemWr) are used 3 cycles later

IF/ID R

egister

ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

Reg/Dec Exec Mem

ExtOp

ALUOpRegDst

ALUSrc

BranchMemWr

MemtoRegRegWr

MainControl

ExtOp

ALUOpRegDst

ALUSrc

MemtoRegRegWr

MemtoRegRegWr

MemtoRegRegWr

BranchMemWr

BranchMemWr

Wr

CS 61C L6.1.1 Pipelining II (13) K. Meinz, Summer 2004 © UCB

Let’s Try it Out

10 lw r1, 36(r2)14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

CS 61C L6.1.1 Pipelining II (14) K. Meinz, Summer 2004 © UCB

Start: Fetch 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

rs rt im

10 lw r1, 36(r2)14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

IF

PC

Nex

t PC

10

=

n n n n

CS 61C L6.1.1 Pipelining II (15) K. Meinz, Summer 2004 © UCB

Fetch 14, Decode 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

SReg

File

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

2 rt im

10 lw r1, 36(r2)

14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

lw r1

, 36(

r2)

ID

IF

PC

Nex

t PC

14

=

n n n

CS 61C L6.1.1 Pipelining II (16) K. Meinz, Summer 2004 © UCB

Fetch 20, Decode 14, Exec 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r2

B

SReg

File

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

2 rt 36

10 lw r1, 36(r2)

14 addI r2, r2, 3

20 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

lw r1

addI

r2, r

2, 3

ID

IF

EX

PC

Nex

t PC

20

=

n n

CS 61C L6.1.1 Pipelining II (17) K. Meinz, Summer 2004 © UCB

Fetch 24, Decode 20, Exec 14, Mem 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r2

B

r2+3

6

Reg

File

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M

4 5 3

10 lw r1, 36(r2)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

lw r1

sub

r3, r

4, r5

addI

r2, r

2, 3

ID

IF

EX

M

PC

Nex

t PC

24

=

n

CS 61C L6.1.1 Pipelining II (18) K. Meinz, Summer 2004 © UCB

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r4

r5

r2+3

Reg

File

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

M[r2

+36]

6 7

10 lw r1, 36(r2)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 1734 add r10, r11, r12

100 and r13, r14, 15

lw r1

beq

r6, r

7 10

0

addI

r2

sub

r3

ID

IF

EX

M WB

PC

Nex

t PC

30

=

Note Delayed Branch: always execute ori after beq

CS 61C L6.1.1 Pipelining II (19) K. Meinz, Summer 2004 © UCB

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14

Exe

c

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

r6

r7

r2+3

Reg

File

IR

Inst

. Mem

D

Dec

ode

MemCtrl

WB Ctrl

r1=M

[r2+3

5]

9 xx

10 lw r1, 36(r2)

14 addI r2, r2, 3

20 sub r3, r4, r5

24 beq r6, r7, 100

30 ori r8, r9, 17

34 add r10, r11, r12

100 and r13, r14, 15

beq

addI

r2

sub

r3r4

-r5

100

ori r

8, r9

17

ID

IF

EX

M WB

PC

Nex

t PC

100

=

CS 61C L6.1.1 Pipelining II (20) K. Meinz, Summer 2004 © UCB

? ? ? ?

Exec

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

S MR

egFi

le

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

Valid

IRex

Dcd

Ctrl

IRm

em

Ex

Ctrl

IRw

b

Mem

Ctrl

WB

Ctrl

D

• Remember: means triggered on edge.• What is wrong here?

CS 61C L6.1.1 Pipelining II (21) K. Meinz, Summer 2004 © UCB

Double-Clocked Signals

Exec

Reg

. Fi

le

Mem

Acc

ess

Dat

aM

em

A

B

S MR

egFi

le

Equ

al

PC

Nex

t PC

IR

Inst

. Mem

Valid

IRex

Dcd

Ctrl

IRm

em

Ex

Ctrl

IRw

b

Mem

Ctrl

WB

Ctrl

D

• Some signals are double clocked!• In general: Inputs to edge components are their

own pipeline regs• Watch out for stalls and such!

CS 61C L6.1.1 Pipelining II (22) K. Meinz, Summer 2004 © UCB

Outline•Pipeline Control

•Forwarding Control

•Hazard Control

CS 61C L6.1.1 Pipelining II (23) K. Meinz, Summer 2004 © UCB

Fix by Forwarding result as soon as we have it to where we need it:

Review: Forwarding

sub $t4,$t0,$t3

AL

UI$ Reg D$ Reg

and $t5,$t0,$t6A

LUI$ Reg D$ Reg

or $t7,$t0,$t8 * I$

AL

UReg D$ Reg

xor $t9,$t0,$t10

AL

UI$ Reg D$ Reg

add $t0,$t1,$t2IF ID/RF EX MEM WBA

LUI$ Reg D$ Reg

* “or” hazard solved by register hardware

CS 61C L6.1.1 Pipelining II (24) K. Meinz, Summer 2004 © UCB

ForwardingIn general:•For each stage i that has reg inputs

• For each stage j after I that has reg output- If i.reg == j.reg forward j value back to i.- Some exceptions ($0, invalid)

•ALUinput (ALUResult, MemResult)•MemInput (MemResult)

In particular:

CS 61C L6.1.1 Pipelining II (25) K. Meinz, Summer 2004 © UCB

Pending Writes In Pipeline Registers

npc

I mem

Regs

B

alu

S

D mem

m

IAU

PC

Regs

A im n

n

n

op rw rs rt

CS 61C L6.1.1 Pipelining II (26) K. Meinz, Summer 2004 © UCB

Pending Writes In Pipeline Registers

• Current operand registers

• Pending writes• hazard <=

((rs == rwex) & regWex) OR

((rs == rwmem) & regWme) OR

((rs == rwwb) & regWwb) OR

((rt == rwex) & regWex) OR

((rt == rwmem) & regWme) OR

((rt == rwwb) & regWwb)

npc

I mem

Regs

B

alu

S

D mem

m

IAU

PC

Regs

A im op rwn

op rwn

op rwn

op rw rs rt

CS 61C L6.1.1 Pipelining II (27) K. Meinz, Summer 2004 © UCB

Forwarding Muxes • Detect nearest

valid write op operand register and forward into op latches, bypassing remainder of the pipe

• Increase muxes to add paths from pipeline registers

• Data Forwarding = Data Bypassing

npc

I mem

Regs

B

alu

S

D mem

m

IAU

PC

Regs

A im op rwn

op rwn

op rwn

op rw rs rtForward

mux

CS 61C L6.1.1 Pipelining II (28) K. Meinz, Summer 2004 © UCB

What about memory operations?

A B

op Rd Ra Rb

op Rd Ra Rb

Rd

to regfile

R

Rd

Tricky situation:MIPS:lw 0($t0)sw 0($t1)

RTL:R1 <- Mem[ R2 + I ];Mem[R3+34] <- R1

D

Mem

T

CS 61C L6.1.1 Pipelining II (29) K. Meinz, Summer 2004 © UCB

What about memory operations?

A B

op Rd Ra Rb

op Rd Ra Rb

Rd

to regfile

R

Rd

Tricky situation:MIPS:lw 0($t0)sw 0($t1)

RTL:R1 <- Mem[ R2 + I ];Mem[R3+34] <- R1

Solution: Handle with bypass in

memory stage!

D

Mem

T

CS 61C L6.1.1 Pipelining II (30) K. Meinz, Summer 2004 © UCB

Outline•Pipeline Control

•Forwarding Control

•Hazard Control

CS 61C L6.1.1 Pipelining II (31) K. Meinz, Summer 2004 © UCB

• Forwarding works if value is available (but not written back) before it is needed. But consider …

Data Hazard: Loads (1/4)

sub $t3,$t0,$t2A

LUI$ Reg D$ Reg

lw $t0,0($t1)IF ID/RF EX MEM WBA

LUI$ Reg D$ Reg

•Need result before it is calculated! •Must stall use (sub) 1 cycle and then forward. …

CS 61C L6.1.1 Pipelining II (32) K. Meinz, Summer 2004 © UCB

• Hardware must stall pipeline• Called “interlock”

Data Hazard: Loads (2/4)

sub $t3,$t0,$t2A

LUI$ Reg D$ Regbub

ble

and $t5,$t0,$t4

AL

UI$ Reg D$ Regbubble

or $t7,$t0,$t6 I$

AL

UReg D$bubble

lw $t0, 0($t1)IF ID/RF EX MEM WBA

LUI$ Reg D$ Reg

CS 61C L6.1.1 Pipelining II (33) K. Meinz, Summer 2004 © UCB

Data Hazard: Loads (3/4)

• Instruction slot after a load is called “load delay slot”• If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.• If the compiler puts an unrelated instruction in that slot, then no stall•Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)

CS 61C L6.1.1 Pipelining II (34) K. Meinz, Summer 2004 © UCB

Data Hazard: Loads (4/4)•Stall is equivalent to nop

sub $t3,$t0,$t2

and $t5,$t0,$t4

or $t7,$t0,$t6 I$

AL

UReg D$

lw $t0, 0($t1) AL

UI$ Reg D$ Reg

bubble

bubble

bubble

bubble

bubble

AL

UI$ Reg D$ Reg

AL

UI$ Reg D$ Reg

nop

CS 61C L6.1.1 Pipelining II (35) K. Meinz, Summer 2004 © UCB

Hazards / StallingIn general:•For each stage i that has reg inputs

• If I’s reg is being written later on in the pipe but is not ready yet

- Stages 0 to i: Stall (Turn CEs off so no change)- Stage i+1: Make a bubble (do nothing)- Stages i+2 onward: As usual

•ALUinput (MemResult)In particular:

CS 61C L6.1.1 Pipelining II (36) K. Meinz, Summer 2004 © UCB

Hazards / StallingAlternative Approach:•Detect non-forwarding hazards in decode

• Possible since our hazards are formal.- Not always the case.

• Stalling then becomes:- Issue nop to EX stage- Turn off nextPC update (refetch same inst)- Turn off InstReg update (re-decode same inst)

CS 61C L6.1.1 Pipelining II (37) K. Meinz, Summer 2004 © UCB

Stall Logic • 1. Detect non-

resolving hazards.

• 2a. Insert Bubble

• 2b. Stall nextPC, IF/DE

npc

I mem

Regs

B

alu

S

D mem

m

IAU

PC

Regs

A im op rwn

op rwn

op rwn

op rw rs rtForward

mux

CS 61C L6.1.1 Pipelining II (38) K. Meinz, Summer 2004 © UCB

Stall Logic•Stall-on-issue is used quite a bit

• More complex processors: many cases that stall on issue.

• More complex processors: cases that can’t be detected at decode

- E.g. value needed from mem is not in cache – proc must stall multiple cycles

CS 61C L6.1.1 Pipelining II (39) K. Meinz, Summer 2004 © UCB

By the way …•Notice that our forwarding and stall logic is stateless!

•Big Idea: Keep it simple!• Option 1: Store old fetched inst in reg (“stall_temp”), keep state reg that says whether to use stall_temp or value coming off inst mem.

• Option 2: Re-fetch old value by turning off PC update.

Recommended