View
213
Download
0
Tags:
Embed Size (px)
Citation preview
CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB
inst.eecs.berkeley.edu/~cs61c/su05CS61C : Machine Structures
Lecture #19: Pipelining II
2005-07-21
Andy Carle
CS 61C L19 Pipelining II (2) A Carle, Summer 2005 © UCB
Review: Datapath for MIPS
•Use datapath figure to represent pipelineIFtch Dcd Exec Mem WB
AL
U I$ Reg D$ Reg
PC
inst
ruct
ion
me
mor
y+4
rtrs
rd
regi
ste
rs
ALU
Da
tam
em
ory
imm
1. InstructionFetch
2. Decode/ Register Read
3. Execute 4. Memory5. Write
Back
CS 61C L19 Pipelining II (3) A Carle, Summer 2005 © UCB
Review: Problems for Computers
•Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
• Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
• Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard; “bubbles” in the pipeline
• Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
CS 61C L19 Pipelining II (4) A Carle, Summer 2005 © UCB
Review: C.f. Branch Delay vs. Load Delay•Load Delay occurs only if necessary (dependent instructions).
•Branch Delay always happens (part of the ISA).
•Why not have Branch Delay interlocked?
• Answer: Interlocks only work if you can detect hazard ahead of time. By the time we detect a branch, we already need its value … hence no interlock is possible!
CS 61C L19 Pipelining II (5) A Carle, Summer 2005 © UCB
FYI: Historical Trivia•First MIPS design did not interlock and stall on load-use data hazard
•Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages
• Word Play on acronym for Millions of Instructions Per Second, also called MIPS
• Load/Use Wrong Answer!
CS 61C L19 Pipelining II (6) A Carle, Summer 2005 © UCB
Outline•Pipeline Control
•Forwarding Control
•Hazard Control
CS 61C L19 Pipelining II (8) A Carle, Summer 2005 © UCB
New Representation: Regs more explicit
Exec
Reg
. F
ile
Dat
aM
em
A
B
S
Reg
File
PC
Nex
t P
C
IR
Inst
. M
em
D
M
S
IF/DE DE/EX EX/ME ME/WB
IF/DE.Ir = Instruction
DE/EX.A = BusA out of Reg
EX/ME.S = AluOut
EX/ME.D = Bus B pass-through for sw
ME/WB.S = ALuOut pass-through
ME/WB.M = Mem Result from lw
CS 61C L19 Pipelining II (9) A Carle, Summer 2005 © UCB
New Representation: Regs more explicit
Exec
Reg
. F
ile
Dat
aM
em
A
B
S
Reg
File
PC
Nex
t P
C
IR
Inst
. M
em
D
M
S
IF/DE DE/EX EX/ME ME/WB
What’s Missing???
CS 61C L19 Pipelining II (10) A Carle, Summer 2005 © UCB
Pipelined Processor (almost) for slides
Exec
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
A
B
S
M
Reg
File
Equ
al
PC
Nex
t P
C
IR
Inst
. M
em
Valid
IRex
Dcd
Ctr
l
IRm
em
Ex
Ctr
l
IRw
b
Mem
Ctr
l
WB
Ctr
l
D
Idea: Parallel Piped Control …
CS 61C L19 Pipelining II (11) A Carle, Summer 2005 © UCB
Pipelined Control IR <- Mem[PC]; PC <– PC+4;
A <- R[rs]; B<– R[rt]
S <– A + B;
R[rd] <– S;
S <– A + SX;
M <– Mem[S]
R[rd] <– M;
S <– A or ZX;
R[rt] <– S;
S <– A + SX;
Mem[S] <- B
If CondPC < PC+SX;
Exec
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
A
B
S
Reg
File
Equ
al
PC
Nex
t P
C
IR
Inst
. M
em
D
M
CS 61C L19 Pipelining II (12) A Carle, Summer 2005 © UCB
Data Stationary Control
• The Main Control generates the control signals during Reg/Dec• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
• Control signals for Mem (MemWr Branch) are used 2 cycles later
• Control signals for Wr (MemtoReg MemWr) are used 3 cycles later
IF/ID
Register
ID/E
x Register
Ex/M
em R
egister
Mem
/Wr R
egister
Reg/Dec Exec Mem
ExtOp
ALUOp
RegDst
ALUSrc
Branch
MemWr
MemtoReg
RegWr
MainControl
ExtOp
ALUOp
RegDst
ALUSrc
MemtoReg
RegWr
MemtoReg
RegWr
MemtoReg
RegWr
Branch
MemWr
Branch
MemWr
Wr
CS 61C L19 Pipelining II (13) A Carle, Summer 2005 © UCB
Let’s Try it Out
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
28 ori r8, r9, 17
32 add r10, r11, r12
100 and r13, r14, 15
CS 61C L19 Pipelining II (14) A Carle, Summer 2005 © UCB
Start: Fetch 10
Exe
c
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
A
B
SReg
File
IR
Inst
. M
em
D
Dec
ode
MemCtrl
WB Ctrl
M
rs rt im
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
IF
PC
Nex
t P
C
10
=
n n n n
CS 61C L19 Pipelining II (15) A Carle, Summer 2005 © UCB
Fetch 14, Decode 10
Exe
c
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
A
B
SReg
File
IR
Inst
. M
em
D
Dec
ode
MemCtrl
WB Ctrl
M
2 rt im
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
lw r
1, 3
6(r2
)
ID
IF
PC
Nex
t P
C
14
=
n n n
CS 61C L19 Pipelining II (16) A Carle, Summer 2005 © UCB
Fetch 20, Decode 14, Exec 10
Exe
c
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
r2
B
SReg
File
IR
Inst
. M
em
D
Dec
ode
MemCtrl
WB Ctrl
M
2 rt 36
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
lw r
1
add
I r2,
r2,
3
ID
IF
EX
PC
Nex
t P
C
20
=
n n
CS 61C L19 Pipelining II (17) A Carle, Summer 2005 © UCB
Fetch 24, Decode 20, Exec 14, Mem 10
Exe
c
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
r2
B
r2+
36
Reg
File
IR
Inst
. M
em
D
Dec
ode
MemCtrl
WB Ctrl
M
4 5 3
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
lw r
1
sub
r3,
r4,
r5
add
I r2,
r2,
3
ID
IF
EX
M
PC
Nex
t P
C
24
=
n
CS 61C L19 Pipelining II (18) A Carle, Summer 2005 © UCB
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
Exe
c
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
r4
r5
r2+
3
Reg
File
IR
Inst
. M
em
D
Dec
ode
MemCtrl
WB Ctrl
M[r
2+36
]6 7
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
lw r
1
beq
r6,
r7
100
add
I r2
sub
r3
ID
IF
EX
M WB
PC
Nex
t P
C
30
=
Note Delayed Branch: always execute ori after beq
CS 61C L19 Pipelining II (19) A Carle, Summer 2005 © UCB
Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
Exe
c
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
r6
r7
r2+
3
Reg
File
IR
Inst
. M
em
D
Dec
ode
MemCtrl
WB Ctrl
r1=
M[r
2+35
]
9 xx
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
beq
add
I r2
sub
r3
r4-r
5
100
ori
r8,
r9
17
ID
IF
EX
M WB
PC
Nex
t P
C
100
=
CS 61C L19 Pipelining II (20) A Carle, Summer 2005 © UCB
? ? ? ?
Exec
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
A
B
S MR
egF
ile
Equ
al
PC
Nex
t P
C
IR
Inst
. M
em
Valid
IRex
Dcd
Ctr
l
IRm
em
Ex
Ctr
l
IRw
b
Mem
Ctr
l
WB
Ctr
l
D
• Remember: means triggered on edge.
• What is wrong here?
CS 61C L19 Pipelining II (21) A Carle, Summer 2005 © UCB
Double-Clocked Signals
Exec
Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
A
B
S MR
egF
ile
Equ
al
PC
Nex
t P
C
IR
Inst
. M
em
Valid
IRex
Dcd
Ctr
l
IRm
em
Ex
Ctr
l
IRw
b
Mem
Ctr
l
WB
Ctr
l
D
• Some signals are double clocked!
• In general: Inputs to edge components are their own pipeline regs
• Watch out for stalls and such!
CS 61C L19 Pipelining II (22) A Carle, Summer 2005 © UCB
Administrivia•Proj 2 – Due Sunday
•HW6 – Due Tuesday
•Midterm 2:• Friday, July 29: 11:00 – 2:00
• Location TBD
• If you are really so concerned about the drop deadline that this is a problem for you, talk to me about the possibility of taking the exam on Thursday
CS 61C L19 Pipelining II (24) A Carle, Summer 2005 © UCB
Outline•Pipeline Control
•Forwarding Control
•Hazard Control
CS 61C L19 Pipelining II (25) A Carle, Summer 2005 © UCB
Fix by Forwarding result as soon as we have it to where we need it:
Review: Forwarding
sub $t4,$t0,$t3
AL
UI$ Reg D$ Reg
and $t5,$t0,$t6A
LUI$ Reg D$ Reg
or $t7,$t0,$t8 * I$
AL
UReg D$ Reg
xor $t9,$t0,$t10
AL
UI$ Reg D$ Reg
add $t0,$t1,$t2IF ID/RF EX MEM WBA
LUI$ Reg D$ Reg
* “or” hazard solved by register hardware
CS 61C L19 Pipelining II (26) A Carle, Summer 2005 © UCB
ForwardingIn general:
•For each stage i that has reg inputs• For each stage j after I that has reg output
- If i.reg == j.reg forward j value back to i.
- Some exceptions ($0, invalid)
•ALUinput (ALUResult, MemResult)
•MemInput (MemResult)
In particular:
CS 61C L19 Pipelining II (27) A Carle, Summer 2005 © UCB
Pending Writes In Pipeline Registers
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im n
n
n
op rw rs rt
CS 61C L19 Pipelining II (28) A Carle, Summer 2005 © UCB
Pending Writes In Pipeline Registers
• Current operand registers
• Pending writes
• hazard <=
((rs == rwex) & regWex) OR
((rs == rwmem) & regWme) OR
((rs == rwwb) & regWwb) OR
((rt == rwex) & regWex) OR
((rt == rwmem) & regWme) OR
((rt == rwwb) & regWwb)
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im op rwn
op rwn
op rwn
op rw rs rt
CS 61C L19 Pipelining II (29) A Carle, Summer 2005 © UCB
Forwarding Muxes • Detect nearest
valid write op operand register and forward into op latches, bypassing remainder of the pipe
• Increase muxes to add paths from pipeline registers
• Data Forwarding = Data Bypassing
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im op rwn
op rwn
op rwn
op rw rs rtForward
mux
CS 61C L19 Pipelining II (30) A Carle, Summer 2005 © UCB
What about memory operations?
A B
op Rd Ra Rb
op Rd Ra Rb
Rd
to regfile
R
Rd
Tricky situation:MIPS:lw 0($t0)sw 0($t1)
RTL:R1 <- Mem[ R2 + I ];Mem[R3+34] <- R1
D
Mem
T
CS 61C L19 Pipelining II (31) A Carle, Summer 2005 © UCB
What about memory operations?
A B
op Rd Ra Rb
op Rd Ra Rb
Rd
to regfile
R
Rd
Tricky situation:MIPS:lw 0($t0)sw 0($t1)
RTL:R1 <- Mem[ R2 + I ];Mem[R3+34] <- R1
Solution: Handle with bypass in
memory stage!
D
Mem
T
CS 61C L19 Pipelining II (32) A Carle, Summer 2005 © UCB
Outline•Pipeline Control
•Forwarding Control
•Hazard Control
CS 61C L19 Pipelining II (33) A Carle, Summer 2005 © UCB
• Forwarding works if value is available (but not written back) before it is needed. But consider …
Data Hazard: Loads (1/4)
sub $t3,$t0,$t2A
LUI$ Reg D$ Reg
lw $t0,0($t1)IF ID/RF EX MEM WBA
LUI$ Reg D$ Reg
•Need result before it is calculated! •Must stall use (sub) 1 cycle and then forward. …
CS 61C L19 Pipelining II (34) A Carle, Summer 2005 © UCB
• Hardware must stall pipeline• Called “interlock”
Data Hazard: Loads (2/4)
sub $t3,$t0,$t2A
LUI$ Reg D$ Reg
bubble
and $t5,$t0,$t4
AL
UI$ Reg D$ Regbubble
or $t7,$t0,$t6 I$
AL
UReg D$bubble
lw $t0, 0($t1)IF ID/RF EX MEM WBA
LUI$ Reg D$ Reg
CS 61C L19 Pipelining II (35) A Carle, Summer 2005 © UCB
Data Hazard: Loads (3/4)
• Instruction slot after a load is called “load delay slot”
• If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.
• If the compiler puts an unrelated instruction in that slot, then no stall
•Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)
CS 61C L19 Pipelining II (36) A Carle, Summer 2005 © UCB
Data Hazard: Loads (4/4)•Stall is equivalent to nop
sub $t3,$t0,$t2
and $t5,$t0,$t4
or $t7,$t0,$t6 I$
AL
UReg D$
lw $t0, 0($t1)
AL
UI$ Reg D$ Reg
bubble
bubble
bubble
bubble
bubble
AL
UI$ Reg D$ Reg
AL
UI$ Reg D$ Reg
nop
CS 61C L19 Pipelining II (37) A Carle, Summer 2005 © UCB
Hazards / StallingIn general:
•For each stage i that has reg inputs• If I’s reg is being written later on in the pipe but is not ready yet
- Stages 0 to i: Stall (Turn CEs off so no change)
- Stage i+1: Make a bubble (do nothing)
- Stages i+2 onward: As usual
•ALUinput (MemResult)
In particular:
CS 61C L19 Pipelining II (38) A Carle, Summer 2005 © UCB
Hazards / StallingAlternative Approach:
•Detect non-forwarding hazards in decode• Possible since our hazards are formal.
- Not always the case.
• Stalling then becomes:- Issue nop to EX stage
- Turn off nextPC update (refetch same inst)
- Turn off InstReg update (re-decode same inst)
CS 61C L19 Pipelining II (39) A Carle, Summer 2005 © UCB
Stall Logic • 1. Detect non-
resolving hazards.
• 2a. Insert Bubble
• 2b. Stall nextPC, IF/DE
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im op rwn
op rwn
op rwn
op rw rs rtForward
mux
CS 61C L19 Pipelining II (40) A Carle, Summer 2005 © UCB
Stall Logic•Stall-on-issue is used quite a bit
• More complex processors: many cases that stall on issue.
• More complex processors: cases that can’t be detected at decode
- E.g. value needed from mem is not in cache – proc must stall multiple cycles
CS 61C L19 Pipelining II (41) A Carle, Summer 2005 © UCB
By the way …•Notice that our forwarding and stall logic is stateless!
•Big Idea: Keep it simple!• Option 1: Store old fetched inst in reg (“stall_temp”), keep state reg that says whether to use stall_temp or value coming off inst mem.
• Option 2: Re-fetch old value by turning off PC update.