Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
04.10.2010
1
Pipelining
Pipelining is Natural!
� Laundry Example
� Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
� Washer takes 30 minutes
� Dryer takes 40 minutes
� “Folder” takes 20 minutes
A B C D
04.10.2010
2
Sequential Laundry
� Sequential laundry takes 6 hours for 4 loads
� If they learned pipelining, how long would laundry take?
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
Pipelined Laundry: Start Work ASAP
� Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
30 40 40 40 40 20
04.10.2010
3
Pipelining Lessons
� Pipelining doesn’t help latency of single task, it helps throughput of entire workload
� Pipeline rate limited by slowest pipeline stage
� Multiple tasks operating simultaneously using different resources
� Potential speedup = Number pipe stages
� Unbalanced lengths of pipe stages reduces speedup
� Time to “fill” pipeline and time to “drain” it reduces speedup
� Stall for Dependences
� Ifetch: Instruction Fetch� Fetch the instruction from the Instruction Memory
� Reg/Dec: Registers Fetch and Instruction Decode
� Exec: Calculate the memory address
� Mem: Read the data from the Data Memory
� WrB: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrBLoad
The Five Stages of a RISC Instruction
04.10.2010
4
The load instruction has 5 stages:
� Five independent functional units to work on each stage� Each functional unit is used only once!
� A second load can start doing Ifetch as soon as the first load finishes its Ifetch stage
� Each load still takes five cycles to complete� The latency of a single load is still 5 cycles
� The throughput is much higher� CPI approaches 1
� Cycle time is ~1/5th the cycle time of the single-cycle implementation
� Instructions start executing before previous instructions complete execution
Ifetch Reg/Dec Exec Mem WrBLoad
Key Ideas Behind Instruction Pipelining
CPI ↓↓↓↓Cycle time ↓↓↓↓
Pipelining the LOAD Instruction
� The five independent pipeline stages are:� Read next instruction: The Ifetch stage
� Decode instruction and fetch register values: The Reg/Dec stage
� Execute the operation: The Exec stage
� Access data memory: The Mem stage
� Write data to destination register: The WrB stage
� One instruction enters the pipeline every cycle� One instruction comes out of the pipeline (completed) every cycle
� The “effective” CPI is 7/3 (tends to 1); ~1/5 cycle time
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
Ifetch Reg/Dec Exec Mem WrB1st lw
Ifetch Reg/Dec Exec Mem WrB2nd lw
Ifetch Reg/Dec Exec Mem WrB3rd lw
04.10.2010
5
� Ifetch: Instruction fetch� Fetch the instruction from the instruction memory
� Reg/Dec: Registers fetch and instruction decode
� Exec: ALU operates on the two register operands
� WrB: Write the ALU output back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec WrBR-type
The Four Stages of R-type
� We have a problem called pipeline conflict or hazard� 2 instructions try to write to the register file at the same time!
� “Contention for a shared resource” (in OS terminology)
� It is no longer meaningful to talk about the execution of a single instruction in isolation� Execution is inherently concurrent; need to achieve serializability
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec WrR-type
OOPS! We have a problem!
Pipelining the R-type and Load Instructions
04.10.2010
6
� Each functional unit can only be used once/instruction
� Each functional unit must be used at the same stage for all instructions� Load uses Register File’s Write Port during its 5th stage
� R-type uses Register File’s Write Port during its 4th stage
Ifetch Reg/Dec Exec Mem WrBLoad
1 2 3 4 5
Ifetch Reg/Dec Exec WrBR-type
1 2 3 4
è How to resolve this pipeline hazard?
Important Observations
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Mem WrBR-type
Ifetch Reg/Dec Mem WrBR-type
Ifetch Reg/Dec Exec Mem WrBLoad
Ifetch Reg/Dec Mem WrBR-type
Ifetch Reg/Dec Mem WrBR-type
Exec
Exec
Exec
Exec
Ifetch Reg/Dec Exec WrR-type Mem
1 2 3 4 5
Solution: Delay R-type’s Write by 1 Cycle
� Delay R-type’s register write by one cycle:� Now R-type instructions also use Reg File’s write port at Stage 5
� Mem stage is a NO-OP stage: nothing is being done. Effective CPI?
04.10.2010
7
� Ifetch: Instruction fetch� Fetch the instruction from the instruction memory
� Reg/Dec: Registers fetch and instruction decode
� Exec: Calculate the memory address
� Mem: Write the data into the data memory
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec MemStore WrB
The Four Stages of Store
� Ifetch: Instruction fetch� Fetch the instruction from the instruction memory
� Reg/Dec: Registers fetch and instruction decode� Exec: ALU compares the two register operands
� Adder calculates the branch target address
� Mem: If the registers we compared in the Exec stage are the same,� Write the branch target address into the PC
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec MemBeq WrB
The Four Stages of Beq
04.10.2010
8
Basic Idea
What do we need to add to split the datapath into stages?
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Instruction
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
Address
Datamemory
1
ALUresult
Mux
ALUZero
IF: Instruction fetch ID: Instruction decode/register file read
E X: Execute/address calculation
MEM: Memory access WB: Write back
Pipelined Datapath
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Instruc
tion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
04.10.2010
9
LW instruction: Instruction Fetch (IF)
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instr
uctio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
Instruction fetch
LW instruction: Instruction Decode (ID)
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instr
uctio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
Instruction decode
04.10.2010
10
LW instruction: Execution (EX)
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instr
uctio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
Execution
LW instruction: Memory (MEM)
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instr
uctio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
Memory
04.10.2010
11
LW instruction: Writeback (WB)
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instr
uctio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
Writeback
SW instruction: Memory (MEM)
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instr
uctio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
Memory
04.10.2010
12
SW instruction: Writeback (WB): do nothing
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instr
uctio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
Writeback
Corrected Datapath
In s tr u c t io n
m e m o ry
A dd re s s
4
3 2
0
A ddA d d
re s u lt
S h if t
le ft 2
Ins
tru
cti
on
IF /I D E X / M E M M E M /W B
M
u
x
0
1
A d d
P C
0
A d dr e ss
W ri te
d at a
M
u
x
1
R e g is te rs
R e a d
d a t a 1
R e a d
d a t a 2
R e a d
re g is te r 1
R e a d
re g is te r 2
1 6S ig n
e xte n d
W rite
re g is te r
W rite
d ata
R e a d
da ta
D a ta
m e m o ry
1
A L U
re s ul tM
u
x
A L U
Z e ro
ID /E X
04.10.2010
13
Pipelining Example
Instructionmemory
Address
4
32
0
AddAdd
result
Shiftleft 2
Instructio
n
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALU
Zero
ID/EX
Datamemory
Address
add $14, $5, $6 lw $13, 24($1) add $12, $3, $4 sub $11, $2, $3 lw $10, 20($1)
Events on Every Pipe Stage
04.10.2010
14
� We have 5 stages. What needs to be controlled in
each stage?
� Instruction Fetch and PC Increment
� Instruction Decode / Register Fetch
� Execution
� Memory Stage
� Write Back
Pipeline control
Pipeline Control
PC
Instructionmemory
Address
Inst
ruct
ion
Instruction[20– 16]
MemtoReg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0Registers
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1Write
data
Read
data Mux
1
ALUcontrol
RegWrite
MemRead
Instruction[15– 11]
6
IF/ID ID/EX EX/MEM MEM/WB
MemWrite
Address
Datamemory
PCSrc
Zero
AddAdd
result
Shift
left 2
ALUresult
ALU
Zero
Add
0
1
Mux
0
1
Mux
04.10.2010
15
� Pass control signals along just like the data
Pipeline Control
Execution/Address Calculation
stage control lines
Memory access stage
control lines
Write-back
stage control
lines
Instruction
Reg
Dst
ALU
Op1
ALU
Op0
ALU
Src Branch
Mem
Read
Mem
Write
Reg
write
Mem to
Reg
R-format 1 1 0 0 0 0 0 1 0lw 0 0 0 1 0 1 0 1 1sw X 0 0 1 0 0 1 0 Xbeq X 0 1 0 1 0 0 0 X
Control
EX
M
WB
M
WB
WB
IF/ID ID/EX EX/MEM MEM/WB
Instruction
Datapath with Control
PC
Instructionmemory
I nst
ruct
i on
Add
Instruction[20– 16]
Me
mto
Re
g
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
Re
gW
rit e
MemRead
Control
ALU
Instruction[15– 11]
6
EX
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Me
mW
rite
Address
Datamemory
Address
04.10.2010
16
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
EX
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
IF: lw $10, 9($1)IF: lw $10, 9($1)
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
X
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
IF: sub $11, $2, $3IF: sub $11, $2, $3 ID: lw $10, 9($1)ID: lw $10, 9($1)
11
010
0001E
“lw”
04.10.2010
17
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
X
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
11
010
00E
ID: sub $11, $2, $3ID: sub $11, $2, $3 EX: lw $10, 9($1)EX: lw $10, 9($1)IF: and $12, $4, $5IF: and $12, $4, $5
1
0
10
000
1100
“sub”
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
X
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
10
000
10E
EX: sub $11, $2, $3EX: sub $11, $2, $3 MEM: lw $10, 9($1)MEM: lw $10, 9($1)ID: and $12, $4, $5ID: and $12, $4, $5
0
1
10
000
1100
IF: or $13, $6, $7IF: or $13, $6, $7
110
10
“and”
04.10.2010
18
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
X
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
10
000
10E
MEM: sub $11, ..MEM: sub $11, .. WB: lw $10, WB: lw $10, 9($1)9($1)
EX: and $12, $4, $5EX: and $12, $4, $5
0
1
10
000
1100
ID: or $13, $6, $7ID: or $13, $6, $7
100
00
“or”
IF: add $14, $8, $9IF: add $14, $8, $9
1
1
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
X
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
10
000
10E
WB: sub $11, ..WB: sub $11, ..MEM: and $12…MEM: and $12…
0
1
10
000
1100
EX: or $13, $6, $7EX: or $13, $6, $7
100
00
“add”
ID: add $14, $8, $9ID: add $14, $8, $9
1
0
IF: xxxxIF: xxxx
04.10.2010
19
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
M
WB
WBIF/ID
PCSrc
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
10
000
10
WB: and $12…WB: and $12…
0
1
MEM: or $13, ..MEM: or $13, ..
100
00
EX: add $14, $8, $9EX: add $14, $8, $9
1
0
IF: xxxxIF: xxxx ID: xxxxID: xxxx
X
M
WB
ID/EX
E
Datapath with ControlWB: or $13…WB: or $13…
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
AddAdd
result
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
M
WB
WBIF/ID
PCSrc
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
MEM: add $14, ..MEM: add $14, ..
1000
0
EX: xxxxEX: xxxx
1
0
IF: xxxxIF: xxxx ID: xxxxID: xxxx
X
M
WB
ID/EX
E
04.10.2010
20
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
M
WB
WBIF/ID
PCSrc
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
Address
Datamemory
Address
WB: add $14..WB: add $14..MEM: xxxxMEM: xxxxEX: xxxxEX: xxxx
1
0
IF: xxxxIF: xxxx ID: xxxxID: xxxx
X
M
WB
ID/EX
E
Graphically Representing Pipelines
Multiple-clock-cycle pipeline diagram of two instructions
04.10.2010
21
Conventional Pipelined Execution Representation
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WBProgram Flow
Time
Single Cycle, Multiple Cycle, vs. Pipeline
Clk
Cycle 1
Multiple Cycle Implementation:
Ifetch Reg Exec Mem Wr
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9Cycle 10
Load Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem
Load Store
Pipeline Implementation:
Ifetch Reg Exec Mem WrStore
Clk
Single Cycle Implementation:
Load Store Waste
Ifetch
R-type
Ifetch Reg Exec Mem WrR-type
Cycle 1 Cycle 2
04.10.2010
22
Why Pipeline?
� Suppose we execute 100 instructions
� Single Cycle Machine� 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
� Multicycle Machine� 10 ns/cycle x 4.1 CPI (due to inst mix) x 100 inst = 4100 ns
� Ideal pipelined machine� 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
Why Pipeline? Because the resources are there
Instr.
Order
Time (clock cycles)
Inst 0
Inst 1
Inst 2
Inst 4
Inst 3
ALUIm Reg Dm Reg
ALUIm Reg Dm Reg
ALUIm Reg Dm Reg
ALUIm Reg Dm Reg
ALUIm Reg Dm Reg
04.10.2010
23
Wr
Clk
Cycle 1
Multiple Cycle Implementation:
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem
Load Store
Pipelined Implementation:
Ifetch Reg Exec Mem WrStore
Clk
Single Cycle Implementation:
Load Store Waste
Ifetch
R-type
Ifetch Reg Exec Mem WrR-type
Cycle 1 Cycle 2
Ifetch Reg Exec Mem
Single Cycle vs. Multiple Cycle vs. Pipelined
CPU Designs: Summary
� Disadvantages of the Single Cycle Processor� Long cycle time
� Cycle time wasted for the faster instructions
� Multiple Clock Cycle Processor� Divide the instructions into smaller steps
� Execute each step (instead of the entire instruction) in 1 cycle
� Pipelined Processor� Natural enhancement of the multiple clock cycle processor
� Each functional unit used only once per instruction
� If an instruction is going to use a functional unit:� it must use it at the same stage as all other instructions
� Pipeline Control:� each stage’s control signal depends ONLY on the instruction that is currently in that stage
04.10.2010
24
Basic Terms
� Filling a pipeline
� Flushing or draining a pipeline
� Stage or segment delay� Each stage may have a different stage delay
� Beat time (= max stage delay), or clock cycle time
� Number of stages
� End-to-end latency� number of stages × beat time
� Stages are separated by latches (registers)