Upload
roger-spencer
View
220
Download
0
Embed Size (px)
Citation preview
cs 152 L1 3 .1 DAP Fa97, U.CB
Pipelining Lessons
° Pipelining doesn’t help latency of single task, it helps throughput of entire workload
° Multiple tasks operating simultaneously using different resources
° Potential speedup = Number pipe stages
° Pipeline rate limited by slowest pipeline stage
° Unbalanced lengths of pipe stages reduces speedup
° Time to “fill” pipeline and time to “drain” it reduces speedup
° Stall for Dependences
6 PM 7 8 9
Time
B
C
D
A
303030 3030 3030Task
Order
cs 152 L1 3 .2 DAP Fa97, U.CB
The Five Stages of Load
° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Read the data from the Data Memory
° Wr: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
cs 152 L1 3 .3 DAP Fa97, U.CB
Conventional Pipelined Execution Representation
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WBProgram Flow
Time
cs 152 L1 3 .4 DAP Fa97, U.CB
Single Cycle, Multiple Cycle, vs. Pipeline
Clk
Cycle 1
Multiple Cycle Implementation:
Ifetch Reg Exec Mem Wr
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem
Load Store
Pipeline Implementation:
Ifetch Reg Exec Mem WrStore
Clk
Single Cycle Implementation:
Load Store Waste
Ifetch
R-type
Ifetch Reg Exec Mem WrR-type
Cycle 1 Cycle 2
cs 152 L1 3 .5 DAP Fa97, U.CB
Why Pipeline? Because the resources are there!
Instr.
Order
Time (clock cycles)
Inst 0
Inst 1
Inst 2
Inst 4
Inst 3
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
AL
UIm Reg Dm Reg
cs 152 L1 3 .6 DAP Fa97, U.CB
Can pipelining get us into trouble?
° Yes: Pipeline Hazards• structural hazards: attempt to use the same resource two
different ways at the same time
- E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)
• data hazards: attempt to use item before it is ready
- E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer
- instruction depends on result of prior instruction still in the pipeline
• control hazards: attempt to make a decision before condition is evaulated
- E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
- branch instructions
° Can always resolve hazards by waiting• pipeline control must detect the hazard
• take action (or delay action) to resolve hazards
cs 152 L1 3 .7 DAP Fa97, U.CB
Mem
Single Memory is a Structural Hazard
Instr.
Order
Time (clock cycles)
Load
Instr 1
Instr 2
Instr 3
Instr 4A
LUMem Reg Mem Reg
AL
UMem Reg Mem Reg
AL
UMem Reg Mem RegA
LUReg Mem Reg
AL
UMem Reg Mem Reg
Right half highlight means read, left half write
cs 152 L1 3 .8 DAP Fa97, U.CB
° Stall: wait until decision is clear• It’s possible to move up decision to 2nd stage by adding
hardware to check registers as being read
° Impact: 2 clock cycles per branch instruction => slow
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Load
AL
UMem Reg Mem Reg
AL
UMem Reg Mem RegA
LUReg Mem RegMem
cs 152 L1 3 .9 DAP Fa97, U.CB
° Predict: guess one direction then back up if wrong• Predict not taken
° Impact: 2 clock cycles if wrong (right 50% of time)
° More dynamic scheme: history of 1 branch ( 90%)
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Load
AL
UMem Reg Mem Reg
AL
UMem Reg Mem Reg
MemA
LUReg Mem Reg
cs 152 L1 3 .10 DAP Fa97, U.CB
° Redefine branch behavior (takes place after next instruction) “delayed branch”
° Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time)
° As launch more instruction per clock cycle, less useful
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Misc
AL
UMem Reg Mem Reg
AL
UMem Reg Mem Reg
MemA
LUReg Mem Reg
Load Mem
AL
UReg Mem Reg
cs 152 L1 3 .11 DAP Fa97, U.CB
Data Hazard on r1
add r1 ,r2,r3
sub r4, r1 ,r3
and r6, r1 ,r7
or r8, r1 ,r9
xor r10, r1 ,r11
cs 152 L1 3 .12 DAP Fa97, U.CB
• Dependencies backwards in time are hazards
Data Hazard on r1:
Instr.
Order
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
IF
ID/RF
EX MEM WBAL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
Im
AL
UReg Dm Reg
AL
UIm Reg Dm Reg
cs 152 L1 3 .13 DAP Fa97, U.CB
• “Forward” result from one stage to another
• “or” OK if define read/write properly
Data Hazard Solution:
Instr.
Order
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
IF
ID/RF
EX MEM WBAL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
Im
AL
UReg Dm Reg
AL
UIm Reg Dm Reg
cs 152 L1 3 .14 DAP Fa97, U.CB
• Dependencies backwards in time are hazards
• Can’t solve with forwarding: • Must delay/stall instruction dependent on loads
Forwarding (or Bypassing): What about Loads
Time (clock cycles)
lw r1,0(r2)
sub r4,r1,r3
IF
ID/RF
EX MEM WBAL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
cs 152 L1 3 .15 DAP Fa97, U.CB
Pipelining the Load Instruction
° The five independent functional units in the pipeline datapath are:
• Instruction Memory for the Ifetch stage
• Register File’s Read ports (Read Register 1 and Read Register 2) for the Reg/Dec stage
• ALU for the Exec stage
• Data Memory for the Mem stage
• Register File’s Write port (Write Register) for the Wr stage
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
Ifetch Reg/Dec Exec Mem Wr1st lw
Ifetch Reg/Dec Exec Mem Wr2nd lw
Ifetch Reg/Dec Exec Mem Wr3rd lw
cs 152 L1 3 .16 DAP Fa97, U.CB
The Four Stages of R-type
° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: • ALU operates on the two register operands
• Update PC
° Wr: Write the ALU output back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec WrR-type
cs 152 L1 3 .17 DAP Fa97, U.CB
Pipelining the R-type and Load Instruction
° We have pipeline conflict or structural hazard:• Two instructions try to write to the register file at the same time!
• Only one write port
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec WrR-type
Ops! We have a problem!
cs 152 L1 3 .18 DAP Fa97, U.CB
Important Observation
° Each functional unit can only be used once per instruction
° Each functional unit must be used at the same stage for all instructions:
• Load uses Register File’s Write Port during its 5th stage
• R-type uses Register File’s Write Port during its 4th stage
Ifetch Reg/Dec Exec Mem WrLoad
1 2 3 4 5
Ifetch Reg/Dec Exec WrR-type
1 2 3 4
° 2 ways to solve this pipeline hazard.
cs 152 L1 3 .19 DAP Fa97, U.CB
Solution 1: Insert “Bubble” into the Pipeline
° Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle
• The control logic can be complex.
• Lose instruction fetch and issue opportunity.
° No instruction is started in Cycle 6!
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec WrR-type Pipeline
Bubble
Ifetch Reg/Dec Exec Wr
cs 152 L1 3 .20 DAP Fa97, U.CB
Solution 2: Delay R-type’s Write by One Cycle
° Delay R-type’s register write by one cycle:• Now R-type instructions also use Reg File’s write port at Stage 5
• Mem stage is a NOOP stage: nothing is being done.
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Exec WrR-type Mem
Exec
Exec
Exec
Exec
1 2 3 4 5
cs 152 L1 3 .21 DAP Fa97, U.CB
The Four Stages of Store
° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Write the data into the Data Memory
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec MemStore Wr
cs 152 L1 3 .22 DAP Fa97, U.CB
The Three Stages of Beq
° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory
° Reg/Dec: • Registers Fetch and Instruction Decode
° Exec: • compares the two register operand,
• select correct branch target address
• update PC
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec MemBeq Wr
cs 152 L1 3 .23 DAP Fa97, U.CB
Pipeline Control
PC
Instructionmemory
Address
Inst
ruct
ion
Instruction[20– 16]
MemtoReg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0Registers
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1Write
data
Read
data Mux
1
ALUcontrol
RegWrite
MemRead
Instruction[15– 11]
6
IF/ID ID/EX EX/MEM MEM/WB
MemWrite
Address
Datamemory
PCSrc
Zero
AddAdd
result
Shiftleft 2
ALUresult
ALU
Zero
Add
0
1
Mux
0
1
Mux
cs 152 L1 3 .24 DAP Fa97, U.CB
° We have 5 stages. What needs to be controlled in each stage?• Instruction Fetch and PC Increment• Instruction Decode / Register Fetch• Execution• Memory Stage• Write Back
° How would control be handled in an automobile plant?• a fancy control center telling everyone what to do?• should we use a finite state machine?
Pipeline control
cs 152 L1 3 .25 DAP Fa97, U.CB
° Pass control signals along just like the data
Pipeline Control
Execution/Address Calculation stage control lines
Memory access stage control lines
Write-back stage control
lines
InstructionReg Dst
ALU Op1
ALU Op0
ALU Src Branch
Mem Read
Mem Write
Reg write
Mem to Reg
R-format 1 1 0 0 0 0 0 1 0lw 0 0 0 1 0 1 0 1 1sw X 0 0 1 0 0 1 0 Xbeq X 0 1 0 1 0 0 0 X
Control
EX
M
WB
M
WB
WB
IF/ID ID/EX EX/MEM MEM/WB
Instruction
cs 152 L1 3 .26 DAP Fa97, U.CB
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Me
mto
Re
g
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
Re
gWrit
e
MemRead
Control
ALU
Instruction[15– 11]
6
EX
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Me
mW
rite
AddressData
memory
Address