Upload
esme
View
46
Download
0
Embed Size (px)
DESCRIPTION
CS61C : Machine Structures Lecture 6.1.1 Pipelining II 2004-07-26 Kurt Meinz inst.eecs.berkeley.edu/~cs61c. rd. instruction memory. registers. PC. rs. Data memory. rt. IFtch. Dcd. Exec. Mem. WB. +4. imm. ALU. 2. Decode/ Register Read. 5. Write Back. 1. Instruction Fetch. - PowerPoint PPT Presentation
Citation preview
CS 61C L6.1.1 Pipelining II (1) K. Meinz, Summer 2004 © UCB
CS61C : Machine StructuresLecture 6.1.1Pipelining II
2004-07-26
Kurt Meinz
inst.eecs.berkeley.edu/~cs61c
CS 61C L6.1.1 Pipelining II (2) K. Meinz, Summer 2004 © UCB
Review: Datapath for MIPS
•Use datapath figure to represent pipelineIFtch Dcd Exec Mem WB
AL
U I$ Reg D$ Reg
PC
inst
ruct
ion
mem
ory
+4
rtrsrd
regi
ster
s
ALU
Dat
am
emor
y
imm
1. InstructionFetch
2. Decode/ Register Read
3. Execute 4. Memory5. WriteBack
CS 61C L6.1.1 Pipelining II (3) K. Meinz, Summer 2004 © UCB
Review: Problems for Computers
•Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
• Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
• Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard; “bubbles” in the pipeline
• Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
CS 61C L6.1.1 Pipelining II (4) K. Meinz, Summer 2004 © UCB
Review: C.f. Branch Delay vs. Load Delay•Load Delay occurs only if necessary (dependent instructions).•Branch Delay always happens (part of the ISA).
•Why not have Branch Delay interlocked?
• Answer: Interlocks only work if you can detect hazard ahead of time. By the time we detect a branch, we already need its value … hence no interlock is possible!
CS 61C L6.1.1 Pipelining II (5) K. Meinz, Summer 2004 © UCB
FYI: Historical Trivia•First MIPS design did not interlock and stall on load-use data hazard
•Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages
• Word Play on acronym for Millions of Instructions Per Second, also called MIPS
• Load/Use Wrong Answer!
CS 61C L6.1.1 Pipelining II (6) K. Meinz, Summer 2004 © UCB
Outline•Pipeline Control
•Forwarding Control
•Hazard Control
CS 61C L6.1.1 Pipelining II (7) K. Meinz, Summer 2004 © UCB
Piped Proc So Far …
CS 61C L6.1.1 Pipelining II (8) K. Meinz, Summer 2004 © UCB
New Representation: Regs more explicit
Exec
Reg
. Fi
le
Dat
aM
em
A
B
S
Reg
File
PC
Nex
t PC
IR
Inst
. Mem
D
M
S
IF/DE DE/EX EX/ME ME/WB
IF/DE.Ir = InstructionDE/EX.A = BusA out of RegEX/ME.S = AluOutEX/ME.D = Bus B pass-through for swME/WB.S = ALuOut pass-throughME/WB.M = Mem Result from lw
CS 61C L6.1.1 Pipelining II (9) K. Meinz, Summer 2004 © UCB
New Representation: Regs more explicit
Exec
Reg
. Fi
le
Dat
aM
em
A
B
S
Reg
File
PC
Nex
t PC
IR
Inst
. Mem
D
M
S
IF/DE DE/EX EX/ME ME/WB
What’s Missing???
CS 61C L6.1.1 Pipelining II (10) K. Meinz, Summer 2004 © UCB
Pipelined Processor (almost) for slides
Exec
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
A
B
S
M
Reg
File
Equ
al
PC
Nex
t PC
IR
Inst
. Mem
Valid
IRex
Dcd
Ctrl
IRm
em
Ex
Ctrl
IRw
b
Mem
Ctrl
WB
Ctrl
D
Idea: Parallel Piped Control …
CS 61C L6.1.1 Pipelining II (11) K. Meinz, Summer 2004 © UCB
Pipelined Control IR <- Mem[PC]; PC <– PC+4;
A <- R[rs]; B<– R[rt]
S <– A + B;
R[rd] <– S;
S <– A + SX;
M <– Mem[S]
R[rd] <– M;
S <– A or ZX;
R[rt] <– S;
S <– A + SX;
Mem[S] <- B
If CondPC < PC+SX;
Exec
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
A
B
S
Reg
File
Equ
al
PC
Nex
t PC
IR
Inst
. Mem
D
M
CS 61C L6.1.1 Pipelining II (12) K. Meinz, Summer 2004 © UCB
Data Stationary Control• The Main Control generates the control signals during Reg/Dec• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later• Control signals for Mem (MemWr Branch) are used 2 cycles later• Control signals for Wr (MemtoReg MemWr) are used 3 cycles later
IF/ID R
egister
ID/E
x Register
Ex/M
em R
egister
Mem
/Wr R
egister
Reg/Dec Exec Mem
ExtOp
ALUOpRegDst
ALUSrc
BranchMemWr
MemtoRegRegWr
MainControl
ExtOp
ALUOpRegDst
ALUSrc
MemtoRegRegWr
MemtoRegRegWr
MemtoRegRegWr
BranchMemWr
BranchMemWr
Wr
CS 61C L6.1.1 Pipelining II (13) K. Meinz, Summer 2004 © UCB
Let’s Try it Out
10 lw r1, 36(r2)14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12
100 and r13, r14, 15
CS 61C L6.1.1 Pipelining II (14) K. Meinz, Summer 2004 © UCB
Start: Fetch 10
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
A
B
SReg
File
IR
Inst
. Mem
D
Dec
ode
MemCtrl
WB Ctrl
M
rs rt im
10 lw r1, 36(r2)14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12
100 and r13, r14, 15
IF
PC
Nex
t PC
10
=
n n n n
CS 61C L6.1.1 Pipelining II (15) K. Meinz, Summer 2004 © UCB
Fetch 14, Decode 10
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
A
B
SReg
File
IR
Inst
. Mem
D
Dec
ode
MemCtrl
WB Ctrl
M
2 rt im
10 lw r1, 36(r2)
14 addI r2, r2, 320 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12
100 and r13, r14, 15
lw r1
, 36(
r2)
ID
IF
PC
Nex
t PC
14
=
n n n
CS 61C L6.1.1 Pipelining II (16) K. Meinz, Summer 2004 © UCB
Fetch 20, Decode 14, Exec 10
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
r2
B
SReg
File
IR
Inst
. Mem
D
Dec
ode
MemCtrl
WB Ctrl
M
2 rt 36
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r524 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12
100 and r13, r14, 15
lw r1
addI
r2, r
2, 3
ID
IF
EX
PC
Nex
t PC
20
=
n n
CS 61C L6.1.1 Pipelining II (17) K. Meinz, Summer 2004 © UCB
Fetch 24, Decode 20, Exec 14, Mem 10
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
r2
B
r2+3
6
Reg
File
IR
Inst
. Mem
D
Dec
ode
MemCtrl
WB Ctrl
M
4 5 3
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 10030 ori r8, r9, 1734 add r10, r11, r12
100 and r13, r14, 15
lw r1
sub
r3, r
4, r5
addI
r2, r
2, 3
ID
IF
EX
M
PC
Nex
t PC
24
=
n
CS 61C L6.1.1 Pipelining II (18) K. Meinz, Summer 2004 © UCB
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
r4
r5
r2+3
Reg
File
IR
Inst
. Mem
D
Dec
ode
MemCtrl
WB Ctrl
M[r2
+36]
6 7
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 1734 add r10, r11, r12
100 and r13, r14, 15
lw r1
beq
r6, r
7 10
0
addI
r2
sub
r3
ID
IF
EX
M WB
PC
Nex
t PC
30
=
Note Delayed Branch: always execute ori after beq
CS 61C L6.1.1 Pipelining II (19) K. Meinz, Summer 2004 © UCB
Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
r6
r7
r2+3
Reg
File
IR
Inst
. Mem
D
Dec
ode
MemCtrl
WB Ctrl
r1=M
[r2+3
5]
9 xx
10 lw r1, 36(r2)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
beq
addI
r2
sub
r3r4
-r5
100
ori r
8, r9
17
ID
IF
EX
M WB
PC
Nex
t PC
100
=
CS 61C L6.1.1 Pipelining II (20) K. Meinz, Summer 2004 © UCB
? ? ? ?
Exec
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
A
B
S MR
egFi
le
Equ
al
PC
Nex
t PC
IR
Inst
. Mem
Valid
IRex
Dcd
Ctrl
IRm
em
Ex
Ctrl
IRw
b
Mem
Ctrl
WB
Ctrl
D
• Remember: means triggered on edge.• What is wrong here?
CS 61C L6.1.1 Pipelining II (21) K. Meinz, Summer 2004 © UCB
Double-Clocked Signals
Exec
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
A
B
S MR
egFi
le
Equ
al
PC
Nex
t PC
IR
Inst
. Mem
Valid
IRex
Dcd
Ctrl
IRm
em
Ex
Ctrl
IRw
b
Mem
Ctrl
WB
Ctrl
D
• Some signals are double clocked!• In general: Inputs to edge components are their
own pipeline regs• Watch out for stalls and such!
CS 61C L6.1.1 Pipelining II (22) K. Meinz, Summer 2004 © UCB
Outline•Pipeline Control
•Forwarding Control
•Hazard Control
CS 61C L6.1.1 Pipelining II (23) K. Meinz, Summer 2004 © UCB
Fix by Forwarding result as soon as we have it to where we need it:
Review: Forwarding
sub $t4,$t0,$t3
AL
UI$ Reg D$ Reg
and $t5,$t0,$t6A
LUI$ Reg D$ Reg
or $t7,$t0,$t8 * I$
AL
UReg D$ Reg
xor $t9,$t0,$t10
AL
UI$ Reg D$ Reg
add $t0,$t1,$t2IF ID/RF EX MEM WBA
LUI$ Reg D$ Reg
* “or” hazard solved by register hardware
CS 61C L6.1.1 Pipelining II (24) K. Meinz, Summer 2004 © UCB
ForwardingIn general:•For each stage i that has reg inputs
• For each stage j after I that has reg output- If i.reg == j.reg forward j value back to i.- Some exceptions ($0, invalid)
•ALUinput (ALUResult, MemResult)•MemInput (MemResult)
In particular:
CS 61C L6.1.1 Pipelining II (25) K. Meinz, Summer 2004 © UCB
Pending Writes In Pipeline Registers
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im n
n
n
op rw rs rt
CS 61C L6.1.1 Pipelining II (26) K. Meinz, Summer 2004 © UCB
Pending Writes In Pipeline Registers
• Current operand registers
• Pending writes• hazard <=
((rs == rwex) & regWex) OR
((rs == rwmem) & regWme) OR
((rs == rwwb) & regWwb) OR
((rt == rwex) & regWex) OR
((rt == rwmem) & regWme) OR
((rt == rwwb) & regWwb)
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im op rwn
op rwn
op rwn
op rw rs rt
CS 61C L6.1.1 Pipelining II (27) K. Meinz, Summer 2004 © UCB
Forwarding Muxes • Detect nearest
valid write op operand register and forward into op latches, bypassing remainder of the pipe
• Increase muxes to add paths from pipeline registers
• Data Forwarding = Data Bypassing
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im op rwn
op rwn
op rwn
op rw rs rtForward
mux
CS 61C L6.1.1 Pipelining II (28) K. Meinz, Summer 2004 © UCB
What about memory operations?
A B
op Rd Ra Rb
op Rd Ra Rb
Rd
to regfile
R
Rd
Tricky situation:MIPS:lw 0($t0)sw 0($t1)
RTL:R1 <- Mem[ R2 + I ];Mem[R3+34] <- R1
D
Mem
T
CS 61C L6.1.1 Pipelining II (29) K. Meinz, Summer 2004 © UCB
What about memory operations?
A B
op Rd Ra Rb
op Rd Ra Rb
Rd
to regfile
R
Rd
Tricky situation:MIPS:lw 0($t0)sw 0($t1)
RTL:R1 <- Mem[ R2 + I ];Mem[R3+34] <- R1
Solution: Handle with bypass in
memory stage!
D
Mem
T
CS 61C L6.1.1 Pipelining II (30) K. Meinz, Summer 2004 © UCB
Outline•Pipeline Control
•Forwarding Control
•Hazard Control
CS 61C L6.1.1 Pipelining II (31) K. Meinz, Summer 2004 © UCB
• Forwarding works if value is available (but not written back) before it is needed. But consider …
Data Hazard: Loads (1/4)
sub $t3,$t0,$t2A
LUI$ Reg D$ Reg
lw $t0,0($t1)IF ID/RF EX MEM WBA
LUI$ Reg D$ Reg
•Need result before it is calculated! •Must stall use (sub) 1 cycle and then forward. …
CS 61C L6.1.1 Pipelining II (32) K. Meinz, Summer 2004 © UCB
• Hardware must stall pipeline• Called “interlock”
Data Hazard: Loads (2/4)
sub $t3,$t0,$t2A
LUI$ Reg D$ Regbub
ble
and $t5,$t0,$t4
AL
UI$ Reg D$ Regbubble
or $t7,$t0,$t6 I$
AL
UReg D$bubble
lw $t0, 0($t1)IF ID/RF EX MEM WBA
LUI$ Reg D$ Reg
CS 61C L6.1.1 Pipelining II (33) K. Meinz, Summer 2004 © UCB
Data Hazard: Loads (3/4)
• Instruction slot after a load is called “load delay slot”• If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.• If the compiler puts an unrelated instruction in that slot, then no stall•Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)
CS 61C L6.1.1 Pipelining II (34) K. Meinz, Summer 2004 © UCB
Data Hazard: Loads (4/4)•Stall is equivalent to nop
sub $t3,$t0,$t2
and $t5,$t0,$t4
or $t7,$t0,$t6 I$
AL
UReg D$
lw $t0, 0($t1) AL
UI$ Reg D$ Reg
bubble
bubble
bubble
bubble
bubble
AL
UI$ Reg D$ Reg
AL
UI$ Reg D$ Reg
nop
CS 61C L6.1.1 Pipelining II (35) K. Meinz, Summer 2004 © UCB
Hazards / StallingIn general:•For each stage i that has reg inputs
• If I’s reg is being written later on in the pipe but is not ready yet
- Stages 0 to i: Stall (Turn CEs off so no change)- Stage i+1: Make a bubble (do nothing)- Stages i+2 onward: As usual
•ALUinput (MemResult)In particular:
CS 61C L6.1.1 Pipelining II (36) K. Meinz, Summer 2004 © UCB
Hazards / StallingAlternative Approach:•Detect non-forwarding hazards in decode
• Possible since our hazards are formal.- Not always the case.
• Stalling then becomes:- Issue nop to EX stage- Turn off nextPC update (refetch same inst)- Turn off InstReg update (re-decode same inst)
CS 61C L6.1.1 Pipelining II (37) K. Meinz, Summer 2004 © UCB
Stall Logic • 1. Detect non-
resolving hazards.
• 2a. Insert Bubble
• 2b. Stall nextPC, IF/DE
npc
I mem
Regs
B
alu
S
D mem
m
IAU
PC
Regs
A im op rwn
op rwn
op rwn
op rw rs rtForward
mux
CS 61C L6.1.1 Pipelining II (38) K. Meinz, Summer 2004 © UCB
Stall Logic•Stall-on-issue is used quite a bit
• More complex processors: many cases that stall on issue.
• More complex processors: cases that can’t be detected at decode
- E.g. value needed from mem is not in cache – proc must stall multiple cycles
CS 61C L6.1.1 Pipelining II (39) K. Meinz, Summer 2004 © UCB
By the way …•Notice that our forwarding and stall logic is stateless!
•Big Idea: Keep it simple!• Option 1: Store old fetched inst in reg (“stall_temp”), keep state reg that says whether to use stall_temp or value coming off inst mem.
• Option 2: Re-fetch old value by turning off PC update.