EECC550 - Shaaban, Lec # 5, Winter 2006, 1-11-2007

Major CPU Design Steps
1. Analyze instruction set operations using independent RTN: ISA => RTN => datapath requirements.
– This provides the required datapath components and how they are connected to meet ISA requirements.
2. Select required datapath components and connections, and establish the clock methodology (e.g. clock edge-triggered).
3. Assemble a datapath meeting the requirements.
4. Identify and define the function of all control points or signals needed by the datapath.
– Analyze the implementation of each instruction to determine the setting of control points that affects its operations and register transfers.
5. Design and assemble the control logic:
– Hard-wired: finite-state machine implementation.
– Microprogrammed.
(Chapter 5.5)
(Side labels on the slide: Datapath, Control.)

Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle
[Figure: single-cycle MIPS datapath. Units: PC, instruction memory (Adr, Instruction<31:0>), 32 32-bit registers (Rw, Ra, Rb; busA = R[rs], busB = R[rt], busW), extender (imm16 to 32 bits), main ALU with Zero output, ALU control (driven by the function field and the 2-bit ALUop), data memory (Adr, Data In, WrEn), two adders producing PC+4 and the branch target, and multiplexors. Control signals: RegDst, RegWr, ExtOp, ALUSrc, MemWr, MemtoReg, Branch, PCSrc. Instruction fields as labelled: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>.]
(Includes ORI, which is not in the book version.)
Jump not included.
T = I x CPI x C

Drawbacks of Single-Cycle Processor
1. Long cycle time:
– All instructions must take as much time as the slowest:
• Cycle time for load is longer than needed for all other instructions.
– Real memory is not as well-behaved as idealized memory:
• Cannot always complete data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle.
• e.g. indirect memory addressing.
3. High and duplicate hardware resource requirements:
– No hardware functional unit can be used more than once in a single cycle (e.g. ALUs).
4. Cannot pipeline (overlap) the processing of one instruction with the previous instructions.
– (Instruction pipelining, Chapter 6.)

Abstract View of Single Cycle CPU
[Figure: abstract single-cycle CPU. Datapath blocks: PC / Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt (Result Store). Main Control (input: op) and ALU control (input: fun) drive ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, Branch, Jump; Equal comes back from the datapath.]
One CPU clock cycle: duration C = 8 ns.
One instruction per cycle: CPI = 1.
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns    ALU and adders: 2 ns    Register File: 1 ns    Control Unit: < 1 ns
Delays marked along the datapath: 2 ns, 1 ns, 2 ns, 2 ns, 1 ns.

Single Cycle Instruction Timing
Arithmetic & Logical: PC | Inst Memory | Reg File | mux | ALU | mux | setup
Load: PC | Inst Memory | Reg File | mux | ALU | Data Mem | mux | setup
Store: PC | Inst Memory | Reg File | mux | ALU | Data Mem
Branch: PC | Inst Memory | Reg File | cmp | mux
Critical path: Load (determines CPU clock cycle, C).
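
The timing rows above can be turned into a small calculation. The sketch below is illustrative only: it uses the component delays quoted on these slides (memory 2 ns, ALU 2 ns, register file 1 ns), ignores mux, setup and control delays, and assumes the per-class unit lists in `PATHS`, which are one reading of the timing diagram rather than something the slides spell out.

```python
# Component delays in ns, as quoted on the slides (mux/setup/control ignored).
DELAY_NS = {"inst_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# Units on each instruction class's path (assumed reading of the timing diagram).
PATHS = {
    "arith_logic": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "alu"],  # the compare is done by the ALU
}


def path_delay_ns(units):
    """Sum the delays of the units an instruction class passes through."""
    return sum(DELAY_NS[unit] for unit in units)


latencies = {name: path_delay_ns(units) for name, units in PATHS.items()}

# The slowest class sets the single-cycle clock period.
critical_class = max(latencies, key=latencies.get)
cycle_time_ns = latencies[critical_class]
```

Under these assumptions the load path is the longest, which matches the C = 8 ns figure used throughout the lecture.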

Clock Cycle Time & Critical Path
• Critical path: the slowest path (i.e. longest delay) between any two storage devices.
• Clock cycle time is a function of the critical path, and must be greater than:
– Clock-to-Q + longest delay path through the combinational logic + Setup + Clock Skew
[Figure: clock waveform (Clk) with the critical path spanning one CPU clock cycle; C = 8 ns here.]
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns    ALU and adders: 2 ns    Register File: 1 ns    Control Unit: < 1 ns

Reducing Cycle Time: Multi-Cycle Design
• Cut the combinational dependency graph by inserting registers / latches.
• The same work is done in two or more shorter cycles, rather than one long cycle.
[Figure: one long cycle (storage element, acyclic combinational logic, storage element; e.g. CPI = 1) is split into two shorter cycles (storage element, acyclic combinational logic (A), storage element, acyclic combinational logic (B), storage element; Cycle 1 and Cycle 2; e.g. CPI = 2).]
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Basic MIPS Instruction Processing Steps
• Instruction Fetch: obtain instruction from program storage (Instruction Memory).
• Instruction Decode: determine instruction type (done by Control Unit) and obtain operands from registers.
• Execute: compute result value or status.
• Result Store: store result in register/memory if needed (usually called Write Back).
• Next Instruction: update program counter to address of next instruction.
Common steps for all instructions:
Instruction ← Mem[PC]
PC ← PC + 4

Partitioning The Single Cycle Datapath
Add registers between steps to break into cycles.
[Figure: the abstract single-cycle datapath (PC / Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access / Data Mem, Reg. File / Result Store, with control signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, Branch, Jump) cut into five cycles.]
1. Instruction Fetch Cycle (IF): 2 ns
2. Instruction Decode Cycle (ID): 1 ns
3. Execution Cycle (EX): 2 ns
4. Data Memory Access Cycle (MEM): 2 ns
5. Write Back Cycle (WB): 1 ns
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Example Multi-cycle Datapath
[Figure: multi-cycle datapath with blocks PC / Next PC, Instruction Fetch, Reg File, Ext, ALU, Mem Access (Data Mem) and the added registers IR, A, B, R, M; IR goes to the control unit. Control signals: ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, MemToReg, RegWr, Branch, Jump; Equal comes back from the datapath.]
Registers added (all clock-edge triggered; register write enable control lines not shown):
• IR: Instruction register.
• A, B: Two registers to hold operands read from the register file.
• R (or ALUOut): holds the output of the main ALU.
• M (or Memory Data Register, MDR): holds data read from data memory.
Cycle delays:
1. Instruction Fetch (IF): 2 ns
2. Instruction Decode (ID): 1 ns
3. Execution (EX): 2 ns
4. Memory (MEM): 2 ns
5. Write Back (WB): 1 ns
CPU clock cycle time: worst cycle delay = C = 2 ns (ignoring MUX, CLK-Q delays).
Thus clock rate: f = 1 / 2 ns = 500 MHz.
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns    ALU and adders: 2 ns    Register File: 1 ns    Control Unit: < 1 ns

Operations (Dependent RTN) for Each Cycle
(Instruction classes: R-Type, Logic Immediate, Load, Store, Branch)
IF (Instruction Fetch), all instructions:
  IR ← Mem[PC]
ID (Instruction Decode), all instructions:
  A ← R[rs]
  B ← R[rt]
EX (Execution):
  R-Type: R ← A funct B
  Logic Immediate: R ← A OR ZeroExt(imm16)
  Load: R ← A + SignExt(imm16)
  Store: R ← A + SignExt(imm16)
  Branch: Zero ← A - B
    If Zero = 1: PC ← PC + 4 + (SignExt(imm16) x 4)
    else (i.e. Zero = 0): PC ← PC + 4
MEM (Memory):
  Load: M ← Mem[R]
  Store: Mem[R] ← B; PC ← PC + 4
WB (Write Back):
  R-Type: R[rd] ← R; PC ← PC + 4
  Logic Immediate: R[rt] ← R; PC ← PC + 4
  Load: R[rt] ← M; PC ← PC + 4
Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.

MIPS Multi-Cycle Datapath: Five Cycles of Load
Load: Cycle 1: IF | Cycle 2: ID | Cycle 3: EX | Cycle 4: MEM | Cycle 5: WB
1- Instruction Fetch (IF): Fetch the instruction from instruction Memory.
2- Instruction Decode (ID): Operand Register Fetch and Instruction Decode.
3- Execute (EX): Calculate the effective memory address.
4- Memory (MEM): Read the data from the Data Memory.
5- Write Back (WB): Write the loaded data to the register file. Update PC.

Multi-cycle Datapath Instruction CPI
• R-Type/Immediate: Require four cycles, CPI = 4
– IF, ID, EX, WB
• Loads: Require five cycles, CPI = 5
– IF, ID, EX, MEM, WB
• Stores: Require four cycles, CPI = 4
– IF, ID, EX, MEM
• Branches/Jumps: Require three cycles, CPI = 3
– IF, ID, EX
• Average or effective program CPI: 3 ≤ CPI ≤ 5, depending on program profile (instruction mix).

Single Cycle Vs. Multi-Cycle CPU
[Figure: timing diagrams. Single cycle implementation: 8 ns clock (125 MHz); Load in Cycle 1, Store in Cycle 2 with part of the cycle wasted. Multiple cycle implementation: 2 ns clock (500 MHz); Load takes Cycles 1-5 (IF ID EX MEM WB), Store takes Cycles 6-9 (IF ID EX MEM), and an R-type instruction starts its IF in Cycle 10.]
Single-Cycle CPU: CPI = 1, C = 8 ns, f = 125 MHz.
One million instructions take I x CPI x C = 10^6 x 1 x 8x10^-9 = 8 msec.
Multi-Cycle CPU: CPI = 3 to 5, C = 2 ns, f = 500 MHz.
One million instructions take from 10^6 x 3 x 2x10^-9 = 6 msec to 10^6 x 5 x 2x10^-9 = 10 msec, depending on the instruction mix used.
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns    ALU and adders: 2 ns    Register File: 1 ns    Control Unit: < 1 ns
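
The comparison above is a direct use of T = I x CPI x C. As a minimal sketch (the function name `execution_time_ms` is made up for illustration), the three figures on this slide can be reproduced as follows:

```python
def execution_time_ms(instruction_count, cpi, cycle_ns):
    """T = I x CPI x C, with C given in nanoseconds and T returned in milliseconds."""
    return instruction_count * cpi * cycle_ns / 1_000_000


# Single-cycle CPU: CPI = 1, C = 8 ns.
single_cycle_ms = execution_time_ms(1_000_000, 1, 8)

# Multi-cycle CPU: C = 2 ns, CPI between 3 (all branches) and 5 (all loads).
multi_cycle_best_ms = execution_time_ms(1_000_000, 3, 2)
multi_cycle_worst_ms = execution_time_ms(1_000_000, 5, 2)
```

Whether the multi-cycle design wins therefore depends on where the program's average CPI falls between 3 and 5.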

Finite State Machine (FSM) Control Model
• State specifies control points (outputs) for Register Transfer.
• Control points (outputs) are assumed to depend only on the current state and not inputs (i.e. Moore finite state machine).
• Transfer (register/memory writes) and state transition occur upon exiting the state on the falling edge of the clock.
[Figure: State X (register transfer control points) sits between Last State and Next State; the state transition depends on inputs.]
Control Unit Design:
[Figure: the Control State register (current state, e.g. flip-flops) feeds the Output Logic, which produces the outputs (control points) to the datapath, and the Next State Logic, which combines the current state with the inputs (opcode, conditions) to produce the next state.]

Control Specification For Multi-cycle CPU
Finite State Machine (FSM) - State Transition Diagram
“instruction fetch” (start state): IR ← MEM[PC]
“decode / operand fetch”: A ← R[rs]; B ← R[rt]
Execute:
  R-type: R ← A fun B
  ORi: R ← A or ZX
  LW: R ← A + SX
  SW: R ← A + SX
  BEQ & ~Zero: PC ← PC + 4
  BEQ & Zero: PC ← PC + 4 + SX || 00
Memory:
  LW: M ← MEM[R]
  SW: MEM[R] ← B; PC ← PC + 4
Write-back:
  R-type: R[rd] ← R; PC ← PC + 4
  ORi: R[rt] ← R; PC ← PC + 4
  LW: R[rt] ← M; PC ← PC + 4
The last state of each instruction returns to instruction fetch.
13 states: 4 state flip-flops needed.

Traditional FSM Controller
[Figure: the State register (4 flip-flops) holds the current state. The Next State Logic takes the opcode (op), the Equal condition from the datapath, and the current state as inputs and produces nextState. The Output Logic produces the outputs (control points) that go to the datapath. Bus widths marked on the figure: 6, 4 and 11.]
State Transition Table columns: state | op | cond | next state | control points

Traditional FSM Controller
datapath + state diagram => control
• Translate RTN statements into control points.
• Assign states.
• Implement the controller.
More on FSM controller implementation in Appendix C

Mapping RTNs To Control Points Examples & State Assignments
State 0 (0000), “instruction fetch”: IR ← MEM[PC]; control points: imem_rd, IRen
State 1 (0001), “decode / operand fetch”: A ← R[rs]; B ← R[rt]; control points: Aen, Ben
State 2 (0010), BEQ & Zero: PC ← PC + 4 + SX || 00
State 3 (0011), BEQ & ~Zero: PC ← PC + 4
State 4 (0100), R-type Execute: R ← A fun B; control points: ALUfun, Sen
State 5 (0101), R-type Write-back: R[rd] ← R; PC ← PC + 4; control points: RegDst, RegWr, PCen
State 6 (0110), ORi Execute: R ← A or ZX
State 7 (0111), ORi Write-back: R[rt] ← R; PC ← PC + 4
State 8 (1000), LW Execute: R ← A + SX
State 9 (1001), LW Memory: M ← MEM[R]
State 10 (1010), LW Write-back: R[rt] ← M; PC ← PC + 4
State 11 (1011), SW Execute: R ← A + SX
State 12 (1100), SW Memory: MEM[R] ← B; PC ← PC + 4
The last state of each instruction returns to instruction fetch (state 0000).
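
Detailed Control Specification - State Transition Table
Control-point columns, in order: IR (en) | PC (en, sel) | Ops (A, B) | Exec (Ex, Sr, ALU, S) | Mem (R, W, M) | Write-Back (M-R, Wr, Dst)
Only the non-blank control-point entries of each row are listed, in column order.

Current State | Op field | Z | Next State | Control-point entries | Rows for
0000 | ?????? | ? | 0001 | 1 | IF
0001 | BEQ | 0 | 0011 | 1 1 | ID
0001 | BEQ | 1 | 0010 | 1 1 | ID
0001 | R-type | x | 0100 | 1 1 | ID
0001 | orI | x | 0110 | 1 1 | ID
0001 | LW | x | 1000 | 1 1 | ID
0001 | SW | x | 1011 | 1 1 | ID
0010 | xxxxxx | x | 0000 | 1 1 | BEQ
0011 | xxxxxx | x | 0000 | 1 0 | BEQ
0100 | xxxxxx | x | 0101 | 0 1 fun 1 | R
0101 | xxxxxx | x | 0000 | 1 0 0 1 1 | R
0110 | xxxxxx | x | 0111 | 0 0 or 1 | ORI
0111 | xxxxxx | x | 0000 | 1 0 0 1 0 | ORI
1000 | xxxxxx | x | 1001 | 1 0 add 1 | LW
1001 | xxxxxx | x | 1010 | 1 0 1 | LW
1010 | xxxxxx | x | 0000 | 1 0 1 1 0 | LW
1011 | xxxxxx | x | 1100 | 1 0 add 1 | SW
1100 | xxxxxx | x | 0000 | 1 0 0 1 | SW

(Slide annotation: "Can be combined in one state".)
More on FSM controller implementation in Appendix C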
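
The next-state half of this table is small enough to express directly. The sketch below is only an illustration of the Moore-style next-state logic: the state encodings and transitions come from the table, while the names `DISPATCH`, `UNCONDITIONAL`, `next_state` and `states_visited` are invented here.

```python
FETCH, DECODE = 0b0000, 0b0001

# Next state after decode, keyed by (opcode class, Zero); None means "don't care".
DISPATCH = {
    ("BEQ", 0): 0b0011,
    ("BEQ", 1): 0b0010,
    ("R-type", None): 0b0100,
    ("ORI", None): 0b0110,
    ("LW", None): 0b1000,
    ("SW", None): 0b1011,
}

# States whose successor does not depend on the opcode or on Zero.
UNCONDITIONAL = {
    0b0000: 0b0001,
    0b0010: FETCH, 0b0011: FETCH,                 # BEQ taken / not taken
    0b0100: 0b0101, 0b0101: FETCH,                # R-type execute, write-back
    0b0110: 0b0111, 0b0111: FETCH,                # ORI execute, write-back
    0b1000: 0b1001, 0b1001: 0b1010, 0b1010: FETCH,  # LW execute, memory, write-back
    0b1011: 0b1100, 0b1100: FETCH,                # SW execute, memory
}


def next_state(state, op, zero):
    """Next-state logic: only the decode state looks at the inputs."""
    if state == DECODE:
        key = (op, zero) if op == "BEQ" else (op, None)
        return DISPATCH[key]
    return UNCONDITIONAL[state]


def states_visited(op, zero=0):
    """List the states one instruction passes through, starting at fetch."""
    path = [FETCH]
    while True:
        following = next_state(path[-1], op, zero)
        if following == FETCH:
            return path
        path.append(following)
```

The length of each returned path is the instruction's cycle count, for example `len(states_visited("LW"))` for a load.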

Alternative Multiple Cycle Datapath (In Textbook)
• Minimizes hardware: 1 memory, 1 ALU
[Figure: textbook multi-cycle datapath. Units: PC, one ideal memory (Address, Din, Dout), Instruction Reg, Mem Data Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), registers A and B, Extend (16-bit Imm to 32 bits), << 2 shifter, one ALU with Zero output, ALU Control, and ALU Out register. Control signals: PCWr, PCWrCond, PCSrc, IorD, MemRd, MemWr, IRWr, RegDst, RegWr, MemtoReg, ALUSrcA, ALUSrcB (its mux inputs include the constant 4), ALUOp.]

Alternative Multiple Cycle Datapath (In Textbook)
(Figure 5.27 page 322)
• Shared instruction/data memory unit
• A single ALU shared among instructions
• Shared units require additional or widened multiplexors
• Temporary registers to hold data between clock cycles of the instruction:
– Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut
[Figure labels: rs, rt, rd, imm16]

Alternative Multiple Cycle Datapath With Control Lines (Fig 5.28 In Textbook)
(Figure 5.28 page 323)
(ORI not supported, Jump supported)
[Figure: textbook multi-cycle datapath with control lines. Labels: PC, PC+4, Branch Target, rs, rt, rd, imm16; three 2-bit control signals; 32-bit data paths.]

The Effect of The 1-bit Control Signals
(Figure 5.29 page 324)

Signal Name | Effect when deasserted (=0) | Effect when asserted (=1)
RegDst | The register destination number for the write register comes from the rt field (instruction bits 20:16). | The register destination number for the write register comes from the rd field (instruction bits 15:11).
RegWrite | None | The register on the write register input is written with the value on the Write data input.
ALUSrcA | The first ALU operand is the PC. | The first ALU operand is register A (i.e. R[rs]).
MemRead | None | The contents of memory specified by the address input are put on the memory data output.
MemWrite | None | The memory contents specified by the address input are replaced by the value on the Write data input.
MemtoReg | The value fed to the register write data input comes from the ALUOut register. | The value fed to the register write data input comes from the memory data register (MDR).
IorD | The PC is used to supply the address to the memory unit. | The ALUOut register is used to supply the address to the memory unit.
IRWrite | None | The output of the memory is written into the Instruction Register (IR).
PCWrite | None | The PC is written; the source is controlled by PCSource.
PCWriteCond | None | The PC is written if the Zero output of the ALU is also active.

The Effect of The 2-bit Control Signals
(Figure 5.29 page 324)

Signal Name | Value (Binary) | Effect
ALUOp | 00 | The ALU performs an add operation.
ALUOp | 01 | The ALU performs a subtract operation.
ALUOp | 10 | The funct field of the instruction determines the ALU operation (R-Type).
ALUSrcB | 00 | The second input of the ALU comes from register B.
ALUSrcB | 01 | The second input of the ALU is the constant 4.
ALUSrcB | 10 | The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR.
ALUSrcB | 11 | The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits.
PCSource | 00 | The output of the ALU (PC+4) is sent to the PC for writing.
PCSource | 01 | The content of ALUOut (the branch target address) is sent to the PC for writing.
PCSource | 10 | The jump target address (i.e. IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent to the PC for writing.
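
The two non-trivial PC sources in the table can be sketched with plain bit operations. This is an illustrative sketch only; the helper names (`sign_extend16`, `branch_target`, `jump_target`) are not from the slides, and 32-bit wraparound is modelled with an explicit mask.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc_plus_4, imm16):
    """PCSource = 01: PC + 4 + (SignExt(imm16) shifted left 2 bits)."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32


def jump_target(pc_plus_4, ir):
    """PCSource = 10: IR[25:0] shifted left 2 bits, concatenated with PC+4[31:28]."""
    return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
```

For instance, an immediate of 0xFFFF (that is, -1) makes `branch_target` return the address of the branch instruction itself.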

Operations (Dependent RTN) for Each Cycle
(Instruction classes: R-Type, Load, Store, Branch, Jump)
IF (Instruction Fetch), all instructions:
  IR ← Mem[PC]
  PC ← PC + 4
ID (Instruction Decode), all instructions:
  A ← R[rs]
  B ← R[rt]
  ALUout ← PC + (SignExt(imm16) x 4)
EX (Execution):
  R-Type: ALUout ← A funct B
  Load: ALUout ← A + SignExt(imm16)
  Store: ALUout ← A + SignExt(imm16)
  Branch: Zero ← A - B; Zero: PC ← ALUout
  Jump: PC ← Jump Address
MEM (Memory):
  Load: MDR ← Mem[ALUout]
  Store: Mem[ALUout] ← B
WB (Write Back):
  R-Type: R[rd] ← ALUout
  Load: R[rt] ← MDR
Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.

High-Level View of Finite State Machine Control
(Figure 5.31 page 332)
[Figure: instruction fetch and decode states (Figure 5.32), followed by the instruction-specific sequences: load/store (Figure 5.33), R-type (Figure 5.34), branch and jump (Figures 5.35, 5.36).]
• First steps are independent of the instruction class.
• Then a series of sequences that depend on the instruction opcode.
• Then the control returns to fetch a new instruction.
• Each box above represents one or several states.

Instruction Fetch (IF) and Decode (ID) FSM States
(Figure 5.32 page 333)
IF: IR ← Mem[PC]; PC ← PC + 4
ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
After ID, control continues to the instruction-specific states: (Figure 5.33) (Figure 5.34) (Figure 5.35) (Figure 5.36)

Instruction Fetch (IF) Cycle (State 0)
(Figure 5.28 page 323; ORI not supported, Jump supported)
IR ← Mem[PC]
PC ← PC + 4
MemRead = 1, ALUSrcA = 0, IorD = 0, IRWrite = 1, ALUSrcB = 01, ALUOp = 00 (add), PCWrite = 1, PCSource = 00

Instruction Decode (ID) Cycle (State 1)
(Figure 5.28 page 323; ORI not supported, Jump supported)
A ← R[rs]
B ← R[rt]
ALUout ← PC + (SignExt(imm16) x 4)    (calculate branch target)
ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)

Load/Store Instructions FSM States
(Figure 5.33 page 334)
(From Instruction Decode)
EX: ALUout ← A + SignExt(imm16)
MEM (load): MDR ← Mem[ALUout]
MEM (store): Mem[ALUout] ← B
WB (load): R[rt] ← MDR
To Instruction Fetch (Figure 5.32)

Load/Store Execution (EX) Cycle (State 2)
(Figure 5.28 page 323; ORI not supported, Jump supported)
ALUout ← A + SignExt(imm16)    (effective address calculation)
ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 (add)

Load Memory (MEM) Cycle (State 3)
(Figure 5.28 page 323; ORI not supported, Jump supported)
MDR ← Mem[ALUout]
MemRead = 1, IorD = 1

Load Write Back (WB) Cycle (State 4)
(Figure 5.28 page 323; ORI not supported, Jump supported)
R[rt] ← MDR
RegWrite = 1, MemtoReg = 1, RegDst = 0

Store Memory (MEM) Cycle (State 5)
(Figure 5.28 page 323; ORI not supported, Jump supported)
Mem[ALUout] ← B
MemWrite = 1, IorD = 1

R-Type Instructions FSM States
(Figure 5.34 page 335)
(From Instruction Decode)
EX: ALUout ← A funct B
WB: R[rd] ← ALUout
To State 0 (Instruction Fetch) (Figure 5.32)

R-Type Execution (EX) Cycle (State 6)
(Figure 5.28 page 323; ORI not supported, Jump supported)
ALUout ← A funct B
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 (R-Type)

R-Type Write Back (WB) Cycle (State 7)
(Figure 5.28 page 323; ORI not supported, Jump supported)
R[rd] ← ALUout
RegWrite = 1, MemtoReg = 0, RegDst = 1

Branch Instruction Single EX State
Jump Instruction Single EX State
(Figures 5.35, 5.36 page 337)
(From Instruction Decode)
Branch EX: Zero ← A - B; Zero: PC ← ALUout
Jump EX: PC ← Jump Address
To State 0 (Instruction Fetch) (Figure 5.32)

Branch Execution (EX) Cycle (State 8)
(Figure 5.28 page 323; ORI not supported, Jump supported)
Zero ← A - B
Zero: PC ← ALUout
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (subtract), PCWriteCond = 1, PCSource = 01

Jump Execution (EX) Cycle (State 9)
(Figure 5.28 page 323; ORI not supported, Jump supported)
PC ← Jump Address
PCWrite = 1, PCSource = 10

FSM State Transition Diagram (From Book)
(Figure 5.38 page 339)
Total 10 states:
IF (state 0): IR ← Mem[PC]; PC ← PC + 4
ID (state 1): A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
EX:
  Load/store (state 2): ALUout ← A + SignExt(imm16)
  R-type (state 6): ALUout ← A func B
  Branch (state 8): Zero ← A - B; Zero: PC ← ALUout
  Jump (state 9): PC ← Jump Address
MEM:
  Load (state 3): MDR ← Mem[ALUout]
  Store (state 5): Mem[ALUout] ← B
WB:
  Load (state 4): R[rt] ← MDR
  R-type (state 7): R[rd] ← ALUout
More on FSM controller implementation in Appendix C
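
Because this is a Moore machine, the control outputs can be written as a table indexed by state. The sketch below collects the signal values given on the per-state slides; treating any unlisted write enable as 0 is an assumption made here, and the names `STATE_OUTPUTS`, `SEQUENCES`, `write_enables` and `trace` are illustrative.

```python
# Control signal values per state, as listed on the per-state slides (states 0-9).
STATE_OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegWrite": 1, "MemtoReg": 0, "RegDst": 1},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# State sequence followed by each instruction class.
SEQUENCES = {
    "lw": [0, 1, 2, 3, 4],
    "sw": [0, 1, 2, 5],
    "r_type": [0, 1, 6, 7],
    "beq": [0, 1, 8],
    "j": [0, 1, 9],
}

WRITE_ENABLES = ("MemWrite", "RegWrite", "IRWrite", "PCWrite", "PCWriteCond")


def write_enables(state):
    """Write-enable signals asserted in a state; unlisted signals count as 0."""
    return [s for s in WRITE_ENABLES if STATE_OUTPUTS[state].get(s, 0) == 1]


def trace(instruction):
    """Pair each state an instruction visits with its asserted write enables."""
    return [(state, write_enables(state)) for state in SEQUENCES[instruction]]
```

Calling `trace("lw")` shows, cycle by cycle, which storage elements a load updates.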

MIPS Multi-cycle Datapath Performance Evaluation
• What is the average CPI?
– State diagram gives CPI for each instruction type.
– Workload (program) below gives frequency of each type.

Type | CPIi for type | Frequency | CPIi x freqIi
Arith/Logic | 4 | 40% | 1.6
Load | 5 | 30% | 1.5
Store | 4 | 10% | 0.4
Branch | 3 | 20% | 0.6
Average CPI: 4.1

Better than CPI = 5 if all instructions took the same number of clock cycles (5).
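
The average CPI above is a frequency-weighted sum. A minimal sketch of the calculation, using the table's numbers (the names `MIX` and `average_cpi` are illustrative):

```python
# Instruction mix from the table: type -> (CPI for the type, frequency in percent).
MIX = {
    "arith_logic": (4, 40),
    "load": (5, 30),
    "store": (4, 10),
    "branch": (3, 20),
}


def average_cpi(mix):
    """Weighted average: sum of CPI_i x frequency_i over all instruction types."""
    total_percent = sum(percent for _, percent in mix.values())
    weighted = sum(cpi * percent for cpi, percent in mix.values())
    return weighted / total_percent


avg = average_cpi(MIX)
```

Keeping the frequencies as integer percentages avoids accumulating floating-point error in the sum.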

Adding Support for swap to Multi Cycle Datapath
(For More Practice Exercise 5.42)
• You are to add support for a new instruction, swap, that exchanges the values of two registers, to the MIPS multicycle datapath of Figure 5.28 on page 323:
swap $rs, $rt
• Swap uses the R-Type format with: the value of field rs = the value of field rd.
• Add any necessary datapaths and control signals to the multicycle datapath. Find a solution that minimizes the number of clock cycles required for the new instruction without modifying the register file (i.e. no additional register write ports). Justify the need for the modifications, if any.
• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the swap instruction. For each new state added, provide the dependent RTN and active control signal values.

Adding swap Instruction Support to Multi Cycle Datapath
(For More Practice Exercise 5.42)
swap $rs, $rt:
R[rt] ← R[rs]
R[rs] ← R[rt]
Instruction format: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11]
We assume here rs = rd in the instruction encoding.
The outputs of A and B should be connected to the multiplexor controlled by MemtoReg if one of the two fields (rs and rd) contains the name of one of the registers being swapped. The other register is specified by rt. The MemtoReg control signal becomes two bits.
[Figure: datapath of Figure 5.28 with A (R[rs]) and B (R[rt]) added as inputs to the MemtoReg multiplexor. Labels: rs, rt, rd, imm16, PC+4, Branch Target; 2-bit control signals.]

Adding swap Instruction Support to Multi Cycle Datapath
(For More Practice Exercise 5.42)
[Figure: FSM of Figure 5.38 with two new states for swap after ID.]
IF: IR ← Mem[PC]; PC ← PC + 4
ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
New states for swap:
  WB1: R[rd] ← B
  WB2: R[rt] ← A
Swap takes 4 cycles.
Existing states shown on the figure:
  EX: ALUout ← A func B; ALUout ← A + SignExt(imm16); Zero ← A - B, Zero: PC ← ALUout
  WB: R[rd] ← ALUout
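
The reason the two write-back states work is that A and B latch both operands in ID, before either register is overwritten. A small sketch of that register-transfer sequence, assuming a register file modelled as a Python dict (the function name `swap_cycles` is illustrative):

```python
def swap_cycles(regs, rs, rt):
    """Apply the swap states to a register-file dict and return the cycle count.

    rd is assumed equal to rs, as in the encoding used on the slide.
    """
    rd = rs
    # Cycle 1, IF: instruction fetch (no register-file effect modelled here).
    # Cycle 2, ID: both operands are latched before any write happens.
    a, b = regs[rs], regs[rt]
    # Cycle 3, WB1: R[rd] <- B
    regs[rd] = b
    # Cycle 4, WB2: R[rt] <- A
    regs[rt] = a
    return 4
```

Only one register is written per cycle, so the single write port of the register file is enough.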

Adding Support for add3 to Multi Cycle Datapath
(For More Practice Exercise 5.45)
• You are to add support for a new instruction, add3, that adds the values of three registers, to the MIPS multicycle datapath of Figure 5.28 on page 323. For example:
add3 $s0, $s1, $s2, $s3
Register $s0 gets the sum of $s1, $s2 and $s3.
The instruction encoding uses a modified R-format, with an additional register specifier rx replacing the five low bits of the “funct” field:

Field | Bits | Width | add3 example
OP | [31-26] | 6 bits | add3
rs | [25-21] | 5 bits | $s1
rt | [20-16] | 5 bits | $s2
rd | [15-11] | 5 bits | $s0
Not used | [10-5] | 6 bits |
rx | [4-0] | 5 bits | $s3

• Add necessary datapath components, connections, and control signals to the multicycle datapath without modifying the register bank or adding additional ALUs. Find a solution that minimizes the number of clock cycles required for the new instruction. Justify the need for the modifications, if any.
• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the add3 instruction. For each new state added, provide the dependent RTN and active control signal values.

Exercise 5.45: add3 instruction support to Multi Cycle Datapath
add3 $rd, $rs, $rt, $rx:
R[rd] ← R[rs] + R[rt] + R[rx]
Modified R-Format: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | rx [4-0]
rx is a new register specifier in field [4-0] of the instruction.
No additional register read ports or ALUs allowed.
1. ALUout is added as an extra input to the first ALU operand MUX, to use the previous ALU result as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx, containing the register number of the 3rd operand (bits 4-0 of the instruction), as input for Read Register 2. This multiplexor is controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.
[Figure: datapath of Figure 5.28 with the ReadSrc multiplexor, the WriteB line, and the widened ALU operand MUX. Labels: rs, rt, rd, rx, imm16, PC+4, Branch Target; 2-bit control signals.]

Exercise 5.45: add3 instruction support to Multi Cycle Datapath
(For More Practice Exercise 5.45)
[Figure: FSM of Figure 5.38 with new states EX1 and EX2 for add3; WriteB is annotated twice on the figure.]
IF: IR ← Mem[PC]; PC ← PC + 4
ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
New states for add3:
  EX1: ALUout ← A + B; B ← R[rx]
  EX2: ALUout ← ALUout + B
  WB: R[rd] ← ALUout
Add3 takes 5 cycles.
Existing states shown on the figure:
  EX: ALUout ← A func B; ALUout ← A + SignExt(imm16); Zero ← A - B, Zero: PC ← ALUout
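
The add3 state sequence can be checked with a short register-transfer sketch. It assumes a register file modelled as a dict and 32-bit wraparound addition; the function name `add3_cycles` is illustrative. Note that in EX1 the ALU uses the old value of B, and B is reloaded with R[rx] only at the end of that cycle.

```python
MASK32 = 0xFFFFFFFF


def add3_cycles(regs, rd, rs, rt, rx):
    """Apply the add3 states to a register-file dict and return the cycle count."""
    # Cycle 1, IF: instruction fetch (no register-file effect modelled here).
    # Cycle 2, ID: A <- R[rs], B <- R[rt]
    a, b = regs[rs], regs[rt]
    # Cycle 3, EX1: ALUout <- A + B (old B), and B <- R[rx] via ReadSrc/WriteB
    alu_out = (a + b) & MASK32
    b = regs[rx]
    # Cycle 4, EX2: ALUout <- ALUout + B
    alu_out = (alu_out + b) & MASK32
    # Cycle 5, WB: R[rd] <- ALUout
    regs[rd] = alu_out
    return 5
```

Reusing the single ALU in two consecutive cycles is what costs the extra cycle compared with an ordinary R-type instruction.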