CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II 2005-07-21 Andy Carle


Review: Datapath for MIPS

• Use datapath figure to represent pipeline stages: IFtch, Dcd, Exec, Mem, WB

[Figure: five-stage datapath. PC and +4 adder, instruction memory (I$), register read (rs, rt, rd, imm), ALU, data memory (D$), register write]

1. Instruction Fetch

2. Decode / Register Read

3. Execute

4. Memory

5. Write Back


Review: Problems for Computers

• Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle

  • Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)

  • Control hazards: pipelining of branches and other instructions that change the PC stalls the pipeline until the hazard is resolved, leaving “bubbles” in the pipeline

  • Data hazards: instruction depends on result of prior instruction still in the pipeline (missing sock)

Review: C.f. Branch Delay vs. Load Delay

• Load Delay occurs only if necessary (dependent instructions).

• Branch Delay always happens (part of the ISA).

• Why not have Branch Delay interlocked?

  • Answer: Interlocks only work if you can detect the hazard ahead of time. By the time we detect a branch, we already need its value … hence no interlock is possible!

FYI: Historical Trivia

• First MIPS design did not interlock and stall on load-use data hazard

• Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages

  • Word play on acronym for Millions of Instructions Per Second, also called MIPS

  • A load followed immediately by a use gave the wrong answer!

Outline

• Pipeline Control

• Forwarding Control

• Hazard Control


Piped Proc So Far …

New Representation: Regs more explicit

[Figure: pipelined datapath with explicit pipeline registers IF/DE, DE/EX, EX/ME, ME/WB. Blocks: Next PC, PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, D, Data Mem, M, Reg File]

IF/DE.IR = Instruction

DE/EX.A = BusA out of Reg

EX/ME.S = AluOut

EX/ME.D = BusB pass-through for sw

ME/WB.S = AluOut pass-through

ME/WB.M = Mem result from lw

New Representation: Regs more explicit

[Figure: the same datapath and pipeline registers (IF/DE, DE/EX, EX/ME, ME/WB) as on the previous slide]

What’s Missing???

Pipelined Processor (almost) for slides

[Figure: pipelined datapath (Next PC, PC, Inst. Mem, IR, Reg. File, A, B, Equal, Exec, S, D, Mem Access, Data Mem, M, Reg File) with a Valid bit and per-stage control blocks Dcd Ctrl, Ex Ctrl, Mem Ctrl, and WB Ctrl, driven by IR, IRex, IRmem, and IRwb]

Idea: Parallel Piped Control …

Pipelined Control

All instructions: IR <- Mem[PC]; PC <- PC + 4; then A <- R[rs]; B <- R[rt]

• R-type: S <- A + B; R[rd] <- S

• lw: S <- A + SX; M <- Mem[S]; R[rt] <- M

• ori: S <- A or ZX; R[rt] <- S

• sw: S <- A + SX; Mem[S] <- B

• beq: if Cond, PC <- PC + SX

[Figure: the pipelined datapath from the previous slides (Next PC, PC, Inst. Mem, IR, Reg. File, A, B, Equal, Exec, S, D, Mem Access, Data Mem, M, Reg File)]

Data Stationary Control

• The Main Control generates the control signals during Reg/Dec

  • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later

  • Control signals for Mem (MemWr, Branch) are used 2 cycles later

  • Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later

[Figure: IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. Main Control produces ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, and RegWr during Reg/Dec. ExtOp, ALUOp, RegDst, and ALUSrc are consumed in Exec; Branch and MemWr travel on to Mem; MemtoReg and RegWr travel on to Wr]
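The data-stationary idea can be sketched in Python: decode produces every control signal once, and each pipeline register simply carries the not-yet-used signals forward. This is a minimal sketch, not the lecture's hardware. The signal names come from the slide, but `CONTROL_TABLE`, its values, and the function names are illustrative assumptions.

```python
# Which stage consumes which control signals (signal names from the slide).
STAGE_SIGNALS = {
    "Exec": ["ExtOp", "ALUOp", "RegDst", "ALUSrc"],
    "Mem": ["Branch", "MemWr"],
    "Wr": ["MemtoReg", "RegWr"],
}

# Hypothetical decode table for two instructions; the values are illustrative.
CONTROL_TABLE = {
    "lw": {"ExtOp": 1, "ALUOp": "add", "RegDst": 0, "ALUSrc": 1,
           "Branch": 0, "MemWr": 0, "MemtoReg": 1, "RegWr": 1},
    "sw": {"ExtOp": 1, "ALUOp": "add", "RegDst": 0, "ALUSrc": 1,
           "Branch": 0, "MemWr": 1, "MemtoReg": 0, "RegWr": 0},
}


def main_control(op):
    """Reg/Dec stage: generate all control signals for one instruction."""
    return dict(CONTROL_TABLE[op])


def consume(stage, signals):
    """Use this stage's signals and pass the remainder to the next register."""
    used = {name: signals[name] for name in STAGE_SIGNALS[stage]}
    remaining = {k: v for k, v in signals.items() if k not in used}
    return used, remaining


def trace(op):
    """Return [(stage, cycles after decode, signals used)] for one instruction."""
    signals = main_control(op)
    result = []
    for delay, stage in enumerate(["Exec", "Mem", "Wr"], start=1):
        used, signals = consume(stage, signals)
        result.append((stage, delay, used))
    return result


for stage, delay, used in trace("lw"):
    print(f"{stage}: used {delay} cycle(s) after decode -> {used}")
```

Because the signals ride along with the instruction, no stage needs to know what the other stages are doing.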

Let’s Try it Out

10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15

Start: Fetch 10

[Figure: pipeline snapshot. IF: PC = 10. Decode, Exec, Mem, and WB hold no instruction (n n n n)]

Fetch 14, Decode 10

[Figure: pipeline snapshot. IF: PC = 14. ID: lw r1, 36(r2), with rs = 2. Exec, Mem, and WB hold no instruction (n n n)]

Fetch 20, Decode 14, Exec 10

[Figure: pipeline snapshot. IF: PC = 20. ID: addI r2, r2, 3. EX: lw r1 (A = r2, immediate 36). Mem and WB hold no instruction (n n)]

Fetch 24, Decode 20, Exec 14, Mem 10

[Figure: pipeline snapshot. IF: PC = 24. ID: sub r3, r4, r5. EX: addI r2, r2, 3 (A = r2). M: lw r1 (S = r2+36). WB holds no instruction (n)]

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10

[Figure: pipeline snapshot. IF: PC = 30. ID: beq r6, r7, 100. EX: sub r3 (A = r4, B = r5). M: addI r2 (S = r2+3). WB: lw r1 (M = M[r2+36])]

Note Delayed Branch: always execute ori after beq
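The walkthrough can be reproduced with a small occupancy table. The sketch below is only an illustration under stated assumptions: it takes the fetch order as given (the branch is treated as taken, with `ori` in the delay slot), ignores data values entirely, and the names `FETCH_ORDER` and `occupancy` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]

# Fetch order from the walkthrough: beq at 24 is followed by its delay-slot
# instruction at 30, then the branch target at 100 (branch assumed taken).
FETCH_ORDER = [
    (10, "lw r1, 36(r2)"),
    (14, "addI r2, r2, 3"),
    (20, "sub r3, r4, r5"),
    (24, "beq r6, r7, 100"),
    (30, "ori r8, r9, 17"),
    (100, "and r13, r14, 15"),
]


def occupancy(cycle):
    """Return {stage: address or None} for a 0-based cycle with no stalls."""
    row = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - depth  # instruction fetched `depth` cycles earlier
        in_range = 0 <= index < len(FETCH_ORDER)
        row[stage] = FETCH_ORDER[index][0] if in_range else None
    return row


for cycle in range(len(FETCH_ORDER)):
    row = occupancy(cycle)
    cells = [f"{s}={row[s] if row[s] is not None else '-'}" for s in STAGES]
    print(f"cycle {cycle}: " + "  ".join(cells))
```

Cycle 4 of this table matches the slide titled "Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10".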

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14

[Figure: pipeline snapshot. IF: PC = 100. ID: ori r8, r9, 17. EX: beq (A = r6, B = r7, target 100). M: sub r3 (S = r4-r5). WB: addI r2 (S = r2+3). r1 = M[r2+36] has been written]

? ? ? ?

[Figure: the pipelined processor with Valid bit and per-stage control (Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl; IRex, IRmem, IRwb), with edge-triggered components marked]

• Remember: the clock marker on a component means it is triggered on the clock edge.

• What is wrong here?

Double-Clocked Signals

[Figure: the same pipelined processor and per-stage control as on the previous slide]

• Some signals are double clocked!

• In general: Inputs to edge components are their own pipeline regs

• Watch out for stalls and such!

Administrivia

• Proj 2 – Due Sunday

• HW6 – Due Tuesday

• Midterm 2:

  • Friday, July 29: 11:00 – 2:00

  • Location TBD

  • If you are really so concerned about the drop deadline that this is a problem for you, talk to me about the possibility of taking the exam on Thursday


Megahertz Myth?

Outline

• Pipeline Control

• Forwarding Control

• Hazard Control

Review: Forwarding

Fix by forwarding the result as soon as we have it to where we need it:

Instruction          IF     ID/RF  EX     MEM    WB
add $t0,$t1,$t2      I$     Reg    ALU    D$     Reg
sub $t4,$t0,$t3             I$     Reg    ALU    D$     Reg
and $t5,$t0,$t6                    I$     Reg    ALU    D$     Reg
or $t7,$t0,$t8 *                          I$     Reg    ALU    D$     Reg
xor $t9,$t0,$t10                                 I$     Reg    ALU    D$     Reg

* “or” hazard solved by register hardware

Forwarding

In general:

• For each stage i that has reg inputs

  • For each stage j after i that has reg output

    - If i.reg == j.reg, forward j's value back to i.

    - Some exceptions ($0, invalid)

In particular:

• ALUinput: forward from (ALUResult, MemResult)

• MemInput: forward from (MemResult)
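As a rough illustration of the general rule, the following Python sketch picks the forwarding source for one operand register. The dictionary layout of a pending write (`reg`, `valid`, `value`) and the function name `forward_source` are assumptions made for this example; register number 0 stands for `$0`.

```python
def forward_source(operand_reg, later_stages):
    """Return (stage_name, value) to forward from, or None to use the reg file.

    later_stages: list of (stage_name, pending_write) ordered from the stage
    nearest to the consumer to the farthest. pending_write is a dict with
    keys 'reg', 'valid', 'value', or None when the stage writes nothing.
    """
    if operand_reg == 0:
        return None  # exception: $0 is never forwarded
    for stage_name, write in later_stages:
        if write is None or not write["valid"]:
            continue  # exception: invalid (bubble) or non-writing instruction
        if write["reg"] == operand_reg:
            return stage_name, write["value"]  # nearest match wins
    return None


# ALU input can be forwarded from ALUResult or from MemResult.
later = [
    ("ALUResult", {"reg": 8, "valid": True, "value": 111}),
    ("MemResult", {"reg": 8, "valid": True, "value": 222}),
]
print(forward_source(8, later))  # ('ALUResult', 111)
print(forward_source(9, later))  # None
```

Ordering the candidate stages nearest-first makes the most recent pending write take priority when two stages target the same register.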

Pending Writes In Pipeline Registers

[Figure: pipelined datapath (PC, IAU, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs). The decode stage holds op, rw, rs, rt; the later pipeline registers are still empty (n)]

Pending Writes In Pipeline Registers

• Current operand registers

• Pending writes

• hazard <=
    ((rs == rwex) & regWex) OR
    ((rs == rwmem) & regWme) OR
    ((rs == rwwb) & regWwb) OR
    ((rt == rwex) & regWex) OR
    ((rt == rwmem) & regWme) OR
    ((rt == rwwb) & regWwb)
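The hazard expression translates almost directly into code. A minimal Python sketch, using the slide's signal names as parameters; the function signature itself is an assumption:

```python
def hazard(rs, rt, rwex, regWex, rwmem, regWme, rwwb, regWwb):
    """True when rs or rt matches a pending write in EX, MEM, or WB."""
    pending = [(rwex, regWex), (rwmem, regWme), (rwwb, regWwb)]
    return any(
        write_enabled and operand == dest
        for operand in (rs, rt)
        for dest, write_enabled in pending
    )


# sub r3, r4, r5 in decode while an instruction that writes r4 is in EX.
print(hazard(rs=4, rt=5, rwex=4, regWex=True,
             rwmem=0, regWme=False, rwwb=0, regWwb=False))  # True
```

Like the expression on the slide, this sketch does not special-case `$0`; a real design would also exclude writes to register 0.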

[Figure: the same datapath, now with op and rw fields carried in each later pipeline register alongside op, rw, rs, rt in decode]

Forwarding Muxes

• Detect the nearest valid pending write to an operand register and forward its value into the operand latches, bypassing the remainder of the pipe

• Increase muxes to add paths from pipeline registers

• Data Forwarding = Data Bypassing

[Figure: the same datapath with a Forward mux in front of the operand latches, fed from the later pipeline registers]

What about memory operations?

[Figure: execute and memory stages. Operand latches A and B, pipeline registers carrying op, Rd, Ra, Rb, result R, store data D, Mem, and T, with Rd sent to the regfile]

Tricky situation:

MIPS:
lw 0($t0)
sw 0($t1)

RTL:
R1 <- Mem[ R2 + I ];
Mem[R3+34] <- R1

What about memory operations?

[Figure: the same execute and memory stage diagram and lw/sw example as on the previous slide]

Solution: Handle with bypass in memory stage!
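One way to picture the memory-stage bypass: when a store reaches the memory stage right behind a load that wrote the register being stored, the store data is taken from the load's result instead of the stale value read in decode. The sketch below is a hypothetical model; the field names (`rd`, `rt`, `is_load`, `loaded_value`, `stale_store_data`) are assumptions, not the lecture's signal names.

```python
def store_data_for_mem_stage(store, wb_instr):
    """Pick the data a store writes, bypassing from the instruction in WB.

    store: dict with 'rt' (register being stored) and 'stale_store_data'
           (value read from the register file back in decode).
    wb_instr: dict for the instruction one stage ahead, or None.
    """
    if (
        wb_instr is not None
        and wb_instr["is_load"]
        and wb_instr["rd"] != 0
        and wb_instr["rd"] == store["rt"]
    ):
        return wb_instr["loaded_value"]  # bypass inside the memory stage
    return store["stale_store_data"]


# R1 <- Mem[R2 + I]; Mem[R3 + 34] <- R1
load = {"is_load": True, "rd": 1, "loaded_value": 0xBEEF}
store = {"rt": 1, "stale_store_data": 0}
print(hex(store_data_for_mem_stage(store, load)))  # 0xbeef
```

Without the bypass, the store would write the old register value, because the load's result has not reached the register file yet.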

Outline

• Pipeline Control

• Forwarding Control

• Hazard Control

Data Hazard: Loads (1/4)

• Forwarding works if value is available (but not written back) before it is needed. But consider …

Instruction          IF     ID/RF  EX     MEM    WB
lw $t0,0($t1)        I$     Reg    ALU    D$     Reg
sub $t3,$t0,$t2             I$     Reg    ALU    D$     Reg

• Need result before it is calculated!

• Must stall use (sub) 1 cycle and then forward. …

Data Hazard: Loads (2/4)

• Hardware must stall pipeline

• Called “interlock”

Instruction          IF     ID/RF  EX     MEM    WB
lw $t0, 0($t1)       I$     Reg    ALU    D$     Reg
sub $t3,$t0,$t2             I$     Reg    bubble ALU    D$     Reg
and $t5,$t0,$t4                    I$     bubble Reg    ALU    D$     Reg
or $t7,$t0,$t6                            bubble I$     Reg    ALU    D$


Data Hazard: Loads (3/4)

• Instruction slot after a load is called “load delay slot”

• If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.

• If the compiler puts an unrelated instruction in that slot, then no stall

• Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)
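To make the load delay slot concrete, here is a small Python sketch that counts interlock stalls in a straight-line program. It assumes full forwarding, so only a load followed immediately by a user of its result stalls. The tuple format, the function name `count_load_use_stalls`, and the `add` instruction used to fill the slot are all hypothetical.

```python
def count_load_use_stalls(program):
    """Count one-cycle interlock stalls in a straight-line program.

    program: list of (op, dest, sources) tuples, e.g. ("lw", "$t0", ["$t1"]).
    Assumes full forwarding, so only a load followed immediately by a user of
    its result stalls, and register $0 never creates a dependence.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest != "$0" and prev_dest in curr_sources:
            stalls += 1
    return stalls


dependent = [
    ("lw", "$t0", ["$t1"]),
    ("sub", "$t3", ["$t0", "$t2"]),  # uses $t0 right after the load: stall
    ("and", "$t5", ["$t0", "$t4"]),
]
scheduled = [
    ("lw", "$t0", ["$t1"]),
    ("add", "$t6", ["$t7", "$t8"]),  # unrelated instruction fills the slot
    ("sub", "$t3", ["$t0", "$t2"]),
]
print(count_load_use_stalls(dependent))  # 1
print(count_load_use_stalls(scheduled))  # 0
```

The second program shows the compiler's option: an unrelated instruction in the delay slot removes the stall without a nop.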

Data Hazard: Loads (4/4)

• Stall is equivalent to nop

Instruction          IF     ID/RF  EX     MEM    WB
lw $t0, 0($t1)       I$     Reg    ALU    D$     Reg
nop                         bubble bubble bubble bubble bubble
sub $t3,$t0,$t2                    I$     Reg    ALU    D$     Reg
and $t5,$t0,$t4                           I$     Reg    ALU    D$     Reg
or $t7,$t0,$t6                                   I$     Reg    ALU    D$

Hazards / Stalling

In general:

• For each stage i that has reg inputs

  • If i’s reg is being written later on in the pipe but is not ready yet

    - Stages 0 to i: Stall (turn clock enables (CEs) off so no change)

    - Stage i+1: Make a bubble (do nothing)

    - Stages i+2 onward: As usual

In particular:

• ALUinput (MemResult)

Hazards / Stalling

Alternative Approach:

• Detect non-forwarding hazards in decode

  • Possible since our hazards are formal.

    - Not always the case.

  • Stalling then becomes:

    - Issue nop to EX stage

    - Turn off nextPC update (refetch same inst)

    - Turn off InstReg update (re-decode same inst)
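The three stall actions can be sketched as one clock step of a toy pipeline front end. This is an illustrative model under stated assumptions, not the lecture's implementation: the state layout (`pc`, `ir`, `ex`), the `step` function, and the use of a program index instead of a byte address are all hypothetical.

```python
NOP = "nop"


def step(state, program, must_stall):
    """Advance a toy pipeline front end by one clock.

    state: dict with 'pc' (index of the next instruction to fetch),
           'ir' (instruction in decode), and 'ex' (instruction entering EX).
    must_stall: result of decode-stage hazard detection for state['ir'].
    """
    if must_stall:
        # Issue nop to EX; nextPC and InstReg updates are turned off.
        return {"pc": state["pc"], "ir": state["ir"], "ex": NOP}
    next_ir = program[state["pc"]] if state["pc"] < len(program) else NOP
    return {"pc": state["pc"] + 1, "ir": next_ir, "ex": state["ir"]}


program = ["lw $t0,0($t1)", "sub $t3,$t0,$t2", "and $t5,$t0,$t4"]
state = {"pc": 2, "ir": program[1], "ex": program[0]}  # sub in decode, lw in EX

state = step(state, program, must_stall=True)  # load-use hazard detected
print(state)
state = step(state, program, must_stall=False)  # hazard cleared
print(state)
```

The decision depends only on what is currently in the pipeline registers, so the stall logic itself keeps no extra state.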

Stall Logic

• 1. Detect non-resolving hazards.

• 2a. Insert Bubble

• 2b. Stall nextPC, IF/DE

[Figure: the same datapath with Forward mux and pending-write fields (op, rw) in each pipeline register]

Stall Logic

• Stall-on-issue is used quite a bit

  • More complex processors: many cases that stall on issue.

  • More complex processors: cases that can’t be detected at decode

    - E.g. value needed from mem is not in cache – proc must stall multiple cycles

By the way …

• Notice that our forwarding and stall logic is stateless!

• Big Idea: Keep it simple!

  • Option 1: Store old fetched inst in reg (“stall_temp”), keep state reg that says whether to use stall_temp or value coming off inst mem.

  • Option 2: Re-fetch old value by turning off PC update.