Upload
griselda-walters
View
217
Download
0
Embed Size (px)
DESCRIPTION
EI209 Chapter 4B.3Haojin Zhu, SJTU 2015 Step 5: Assemble Control logic ALUctr RegDst ALUSrc ExtOp MemtoRegMemWr Equal Instruction Imm16RdRsRt nPC_sel Adr Inst Memory DATA PATH Decoder Op Fun RegWr
Citation preview
EI209 Chapter 4B.1 Haojin Zhu, SJTU 2015
EI 209 Computer Organization
Fall 2015
Chapter 4B: The Processor, Control and Multi-cycle Datapath
[Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK]
EI209 Chapter 4B.2 Haojin Zhu, SJTU 2015
Recap: A Summary of Control Signalsinst Register Transfer
ADD R[rd] <– R[rs] + R[rt]; PC <– PC + 4
ALUsrc = RegB, ALUctr = “add”, RegDst = rd, RegWr, nPC_sel = “+4”
SUB R[rd] <– R[rs] – R[rt]; PC <– PC + 4
ALUsrc = RegB, ALUctr = “sub”, RegDst = rd, RegWr, nPC_sel = “+4”
ORi R[rt] <– R[rs] + zero_ext(Imm16); PC <– PC + 4
ALUsrc = Im, Extop = “Z”, ALUctr = “or”, RegDst = rt, RegWr, nPC_sel = “+4”
LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4
ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemtoReg, RegDst = rt, RegWr, nPC_sel = “+4”
STORE MEM[ R[rs] + sign_ext(Imm16)] <– R[rs]; PC <– PC + 4
ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemWr, nPC_sel = “+4”
BEQ if ( R[rs] == R[rt] ) then PC <– PC + sign_ext(Imm16)] || 00 else PC <– PC + 4
nPC_sel = “Br”, ALUctr = “sub”
EI209 Chapter 4B.3 Haojin Zhu, SJTU 2015
Step 5: Assemble Control logic
ALUctrRegDst ALUSrcExtOp MemtoRegMemWr Equal
Instruction<31:0>
<21:25>
<16:20>
<11:15>
<0:15>
Imm16RdRsRt
nPC_sel
Adr
InstMemory
DATA PATH
Decoder
Op
<21:25>
Fun
RegWr
EI209 Chapter 4B.4 Haojin Zhu, SJTU 2015
add sub ori lw sw beq jumpRegDstALUSrcMemtoRegRegWriteMemWriteBranchJumpExtOpALUctr<2:0>
1001000x
Add
1001000x
Subtr
01010000
Or
01110001
Add
x1x01001
Add
x0x0010x
Subtr
xxx0001x
xxx
op target address
op rs rt rd shamt func061116212631
op rs rt immediate
R-type
I-type
J-type
add, sub
ori, lw, sw, beq
jump
funcop 00 0000 00 0000 00 110110 0011 10 1011 00 0100 00 0010
10 0000 10 0010 We Don’t Care :-)
A Summary of the Control Signals
EI209 Chapter 4B.5 Haojin Zhu, SJTU 2015
R-type ori lw sw beq jumpRegDstALUSrcMemtoRegRegWriteMemWriteBranchJumpExtOpALUctr
1001000x
Add/Subtr
01010000
Or
01110001
Add
x1x01001
Add
x0x0010x
Subtr
xxx0001x
xxx
op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010
funcMain
Controlop
6
ALUControl(Local)N=?
6ALUop
ALUctr3
ALU
Two levels of decoding: Main Control and ALU Control
ALUctr is determined by ALUop and func, while other control signals are determined by op。How many bits will N need?
R、 I-ori、 I-lw/sw、 I-beq、J
The Concept of Local Decoding
3, why?
EI209 Chapter 4B.6 Haojin Zhu, SJTU 2015
R-type ori lw sw beq jump
ALUop (Symbolic) “R-type” Or Add Add Subtr xxx
ALUop<2:0> 1 xx 0 10 0 00 0 00 0x1 xxx
MainControl
op
6
ALUControl(Local)
func
N
6ALUop
ALUctr
3
000000 rs rt rd shamt func
061116212631
R-type
func<5:0> Instruction Operation
10 0000
10 0010
10 0100
10 0101
10 1010
add
subtract
and
or
set-on-less-than
ALUctr<2:0> ALU Operation
000
001
100
101
010
Add
Subtract
And
Or
Subtract
ALUctr
ALU
Encoding ALUop as follows
Could ALUop use just 2 bits?Yes! Since jump is X, we could use 2 bits: R:11, I-ori:10, I-beq:01, I-lw/sw:00, J-xx
R-Type is1xx, could identify R-instruction by using 1 bit!
The Decoding of the “func” Field
EI209 Chapter 4B.7 Haojin Zhu, SJTU 2015
R-type ori lw sw beqALUop(Symbolic) “R-type” Or Add Add Subtr
ALUop<2:0> 1 00 0 10 0 00 0 00 0 x1
ALUop funcbit2 bit1 bit0 bit<2> bit<1> bit<0>bit<3>
0 0 0 x x x x
ALUctrALUOperation
Add 0 0 0bit<2> bit<1> bit<0>
0 x 1 x x x x Subtract 0 0 10 1 0 x x x x Or 1 1 01 x x 0 0 0 0 Add 0 0 01 x x 0 0 1 0 Subtract 0 0 11 x x 0 1 0 0 And 0 1 01 x x 0 1 0 1 Or 1 1 01 x x 1 0 1 0 Subtract 0 0 1
funct<3:0> Instruction Op.00000010010001011010
addsubtractandorset-on-less-than
The Truth Table for ALUctrNon-R-type Instructionsdetermined by ALUop
R-type Instructionsdetermined by funct
EI209 Chapter 4B.8 Haojin Zhu, SJTU 2015
The Logic Equation for ALUctr<0>
ALUctr<0> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
ALUop funcbit<2> bit<1> bit<0> bit<2> bit<1> bit<0>bit<3> ALUctr<0>
0 x 1 x x x x 11 x x 0 0 1 0 11 x x 1 0 1 0 1
This makes func<3> a don’t care
Choose the rows with ALUctr[0]=1
EI209 Chapter 4B.9 Haojin Zhu, SJTU 2015
The Logic Equation for ALUctr<1>
ALUop funcbit<2> bit<1> bit<0> bit<2> bit<1> bit<0>bit<3>
0 1 0 x x x x 1ALUctr<1>
1 x x 0 1 0 0 11 x x 0 1 0 1 1
ALUctr<1> = !ALUop<2> & ALUop<1> & ! ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1>
Choose the rows with ALUctr[1]=1
EI209 Chapter 4B.10 Haojin Zhu, SJTU 2015
The Logic Equation for ALUctr<2>
ALUop funcbit<2> bit<1> bit<0> bit<2> bit<1> bit<0>bit<3> ALUctr<2>
0 1 0 x x x x 11 x x 0 1 0 1 1
ALUctr<2> = !ALUop<2> & ALUop<1> & !ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1>
& func<0>
Choose the rows with ALUctr[2]=1
EI209 Chapter 4B.11 Haojin Zhu, SJTU 2015
Summery of Control Logic of Local Control
ALUControl(Local)
func
3
6ALUop
ALUctr
3
ALUctr<0> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
ALUctr<1> = !ALUop<2> & ALUop<1> & !ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1>
ALUctr<2> = !ALUop<2> & ALUop<1> & !ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1> &
func<0>
EI209 Chapter 4B.12 Haojin Zhu, SJTU 2015
R-type ori lw sw beq jumpRegDstALUSrcMemtoRegRegWriteMemWriteBranchJumpExtOpALUop (Symbolic)
1001000x
“R-type”
01010000
Or
01110001
Add
x1x01001
Add
x0x0010x
Subtr
xxx0001x
xxx
op 00 0000 00 1101 10 0011 10 101100 010000 0010
ALUop <2> 1 0 0 0 0 xALUop <1> x 1 0 0 x xALUop <0> x 0 0 0 1 x
MainControl
op6
ALUControl(Local)
func
3
6
ALUop
ALUctr3
RegDstALUSrc
:
Input of main controlOutput of Main Control
The “Truth Table” for the Main Control
EI209 Chapter 4B.13 Haojin Zhu, SJTU 2015
R-type ori lw sw beq jumpRegWrite 1 1 1 0 0 0
op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010
RegWrite = R-type + ori + lw= !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type) + !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori) + op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw)
op<0>
op<5>. .op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
R-type ori lw sw beq jumpRegWrite
Decoder
The “Truth Table” for RegWrite
EI209 Chapter 4B.14 Haojin Zhu, SJTU 2015
op<0>
op<5>. .op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
R-type ori lw sw beq jumpRegWrite
ALUSrc
MemtoRegMemWrite
BranchJump
RegDst
ExtOp
ALUop<2>ALUop<1>ALUop<0>
Decoder
The “Truth Table” for RegWrite
EI209 Chapter 4B.15 Haojin Zhu, SJTU 2015
The Complete Single Cycle Data Path with Control
32
ALUctr
Clk
busW
RegWr
3232
busA
32busB
55 5
Rw Ra Rb32 32-bitRegisters
Rs
Rt
Rt
RdRegDst
Ext
Mux
Mux
3216imm16
ALUSrc
ExtOp
Mux
MemtoReg
Clk
Data InWrEn32 Adr
DataMemory
32
MemWrA
LU
InstructionFetch Unit
Clk
Zero
Instruction<31:0>
Jump
Branch
0
1
0
1
01<21:25>
<16:20>
<11:15>
<0:15>
Imm16RdRsRt
MainControl
op6
ALUControlfunc
6
3ALUop
ALUctr3
RegDst
ALUSrc:
Instr<5:0>
Instr<31:26>
Instr<15:0>
EI209 Chapter 4B.16 Haojin Zhu, SJTU 2015
Time Delay for LW: Critical PathClk
PC
Rs, Rt, Rd,Op, Func
Clk-to-Q
ALUctr
Instruction Memory Access Time
Old Value New Value
RegWr Old Value New Value
Delay through Control Logic
busARegister File Access Time
Old Value New Value
busB
ALU Delay
Old Value New Value
Old Value New Value
New ValueOld Value
ExtOp Old Value New Value
ALUSrc Old Value New Value
MemtoReg Old Value New Value
Address Old Value New Value
busW Old Value New
Delay through Extender & Mux
RegisterWrite Occurs
Data Memory Access Time
PC+4
PC+4 PC
EI209 Chapter 4B.17 Haojin Zhu, SJTU 2015
Performance of Single-Cycle Machine Assume that the operation times for the major functional units in this
implementation are the following Memory units: 200 picoseconds ALU and adders: 100 ps Register file (read or write): 50 ps
Assuming that the multiplexors, control unit, PC accesses, sign extension unit, and wires have no delay, which of the following implementations would be faster and by how much?
1. An implementation in which every instruction operates in 1 clock cycle of a fixed length.
2. An implementation where every instruction executes in 1 clock cycle using a variable-length clock.
To compare the performance, assume the following instruction mix: 25% loads, 10% stores, 45% ALU instructions, 15 % branches, and 5% jumps.
EI209 Chapter 4B.18 Haojin Zhu, SJTU 2015
What’s wrong with our CPI=1 processor?
PC Inst Memory mux ALU Data Mem mux
PC Reg FileInst Memory mux ALU mux
PC Inst Memory mux ALU Data Mem
PC Inst Memory cmp mux
Reg File
Reg File
Reg File
Arithmetic & Logical
Load
Store
Branch
Critical Path
setup
setup
EI209 Chapter 4B.19 Haojin Zhu, SJTU 2015
200 50 100 0 350
200 200
Instr. I Mem Reg Rd ALU Op D Mem Reg Wr TotalR-typeloadstorebeqjump
200 50 100 200 50 600
200 50 100 200 550
Instruction Critical Paths
200 50 100 0 50 400
Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires, setup and hold times) except: Instruction and Data Memory (200 ps) ALU and adders (100 ps) Register File access (reads or writes) (50 ps)
EI209 Chapter 4B.20 Haojin Zhu, SJTU 2015
Comparison of clock cycles for two implementation
2. An implementation where every instruction executes in 1 clock cycle using a variable-length clock.
1. An implementation in which every instruction operates in 1 clock cycle of a fixed length.
CPU clock cycle=600*25%+550*10%+400*45%+350*15%+200*5% =447.5 ps
CPU clock cycle=600 ps (longest instruction)
EI209 Chapter 4B.21 Haojin Zhu, SJTU 2015
Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently – the clock cycle must
be timed to accommodate the slowest instruction especially problematic for more complex instructions like
floating point multiply
May be wasteful of area since some functional units (e.g., adders) must be duplicated since they can not be shared during a clock cycle
but Is simple and easy to understand
Clk
lw sw Waste
Cycle 1 Cycle 2
EI209 Chapter 4B.22 Haojin Zhu, SJTU 2015
Multicycle Implementation Overview Each instruction step takes 1 clock cycle
Therefore, an instruction takes more than 1 clock cycle to complete
Not every instruction takes the same number of clock cycles to complete
Multicycle implementations allow faster clock rates different instructions to take a different number of clock cycles functional units to be used more than once per instruction as long as
they are used on different clock cycles, as a result- only need one memory- only need one ALU/adder
EI209 Chapter 4B.23 Haojin Zhu, SJTU 2015
The Multicycle Datapath – A High Level View
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB A
LUou
t
Registers have to be added after every major functional unit to hold the output value until it is used in a subsequent clock cycle
EI209 Chapter 4B.24 Haojin Zhu, SJTU 2015
Clocking the Multicycle Datapath
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB A
LUou
t
System Clock
MemWrite RegWrite
clock cycle
EI209 Chapter 4B.25 Haojin Zhu, SJTU 2015
Break up the instructions into steps where each step takes a clock cycle while trying to balance the amount of work to be done in each step use only one major functional unit per clock cycle
At the end of a clock cycle Store values needed in a later clock cycle by the current instruction in a
state element (internal register not visible to the programmer)IR – Instruction RegisterMDR – Memory Data RegisterA and B – Register File read data registersALUout – ALU output register
- All (except IR) hold data only between a pair of adjacent clock cycles (so they don’t need a write control signal)
Data used by subsequent instructions are stored in programmer visible state elements (i.e., Register File, PC, or Memory)
Our Multicycle Approach
EI209 Chapter 4B.26 Haojin Zhu, SJTU 2015
The Complete Multicycle Data with Control
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.27 Haojin Zhu, SJTU 2015
Reading from or writing to any of the internal registers, Register File, or the PC occurs (quickly) at the beginning (for read) or the end of a clock cycle (for write)
Reading from the Register File takes ~50% of a clock cycle since it has additional control and access overhead (but reading can be done in parallel with decode)
Had to add multiplexors in front of several of the functional unit input ports (e.g., Memory, ALU) because they are now shared by different clock cycles and/or do multiple jobs
All operations occurring in one clock cycle occur in parallel This limits us to one ALU operation, one Memory access, and
one Register File access per clock cycle
Our Multicycle Approach, con’t
EI209 Chapter 4B.28 Haojin Zhu, SJTU 2015
1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. R-type Instruction Execution, Memory Read/Write Address Computation, Branch Completion, or Jump Completion
4. Memory Read Access, Memory Write Completion or R-type Instruction Completion
5. Memory Read Completion (Write Back)
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Five Instruction Steps
EI209 Chapter 4B.29 Haojin Zhu, SJTU 2015
RTL for Instructions Common Steps:
Instr fetch IR = Memory[PC]; PC Updating PC = PC + 4;
Decode and Register reading A = Reg[IR[25-21]];B = Reg[IR[20-16]];
Instruction Dependent operation
EI209 Chapter 4B.30 Haojin Zhu, SJTU 2015
RTL for Instructions Instruction Dependent operation
Execute ALUOut = A op B;
ALUOut = A + sign-
extend (IR[15-0]);
if (A==B) PC = ALUOut;
ALUOut = PC +(sign-
extend(IR[15-0])<< 2);
PC = PC[31-28] ||(IR[25-0]
<< 2);
Memory access
Reg[IR[15-11]] = ALUOut;
MDR = Memory[ALUOu
t]; or
Memory[ALUOut] = B;
Write-back Reg[IR[20-
16]] = MDR;
EI209 Chapter 4B.31 Haojin Zhu, SJTU 2015
Group Discussion 1 Four-six members form a group and discuss the
following questions related to multi-cycle datapath? 1. What are the task for each step?(balance the load of
each step to minimize the number of clock cycles) Recommend one member to present your results?
EI209 Chapter 4B.32 Haojin Zhu, SJTU 2015
Multi-cycle datapath
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.33 Haojin Zhu, SJTU 2015
RTL SummaryStep R-type Mem Ref Branch Jump
Instr fetch
IR = Memory[PC]; PC = PC + 4;
Decode A = Reg[IR[25-21]];B = Reg[IR[20-16]];
ALUOut = PC +(sign-extend(IR[15-0])<< 2);
Execute ALUOut = A op B;
ALUOut = A + sign-extend
(IR[15-0]);
if (A==B) PC =
ALUOut;
PC = PC[31-28] ||(IR[25-0] << 2);
Memory access
Reg[IR[15-11]] = ALUOut;
MDR = Memory[ALUOut];
orMemory[ALUOut]
= B; Write-back
Reg[IR[20-16]] = MDR;
EI209 Chapter 4B.34 Haojin Zhu, SJTU 2015
Group Discussion 2 Four-six members form a group and discuss the
following questions related to multi-cycle datapath? 1. What are data paths and control signals for each step
of each instruction? Recommend one member to present your results?
EI209 Chapter 4B.35 Haojin Zhu, SJTU 2015
Multi-cycle datapath
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.36 Haojin Zhu, SJTU 2015
Use PC to get instruction from the memory and put it in the Instruction Register
Increment the PC by 4 and put the result back in the PC Can be described succinctly using the RTL "Register-
Transfer Language“
IR = Memory[PC];PC = PC + 4;
Step 1: Instruction Fetch
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
EI209 Chapter 4B.37 Haojin Zhu, SJTU 2015
Datapath Activity During Instruction Fetch
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
00
EI209 Chapter 4B.39 Haojin Zhu, SJTU 2015
Fetch Control Signals Settings
Start
Instr FetchIorD=0MemRead;IRWrite
ALUSrcA=0ALUsrcB=01
PCSource,ALUOp=00PCWrite
Unless otherwise assigned
PCWrite,IRWrite, MemWrite,RegWrite=0 others=X
EI209 Chapter 4B.40 Haojin Zhu, SJTU 2015
Don’t know what the instruction is yet, so can only Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch
The RTL:
A = Reg[IR[25-21]];B = Reg[IR[20-16]];ALUOut = PC
+(sign-extend(IR[15-0])<< 2);
Note we aren't setting any control lines based on the instruction (since we don’t know what it is (the control logic is busy "decoding" the op code bits))
Step 2: Instruction Decode and Register Fetch
EI209 Chapter 4B.42 Haojin Zhu, SJTU 2015
Datapath Activity During Instruction Decode
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
00
EI209 Chapter 4B.44 Haojin Zhu, SJTU 2015
Decode Control Signals Settings
Start
Instr Fetch DecodeALUSrcA=0ALUSrcB=11ALUOp=00
PCWriteCond=0
IorD=0MemRead;IRWrite
ALUSrcA=0ALUsrcB=01
PCSource,ALUOp=00PCWrite
Unless otherwise assigned
PCWrite,IRWrite, MemWrite,RegWrite=0 others=X
EI209 Chapter 4B.45 Haojin Zhu, SJTU 2015
ALU is performing one of four functions, based on instruction type
Memory reference (lw and sw):
ALUOut = A + sign-extend(IR[15-0]);
R-type:
ALUOut = A op B;
Branch:
if (A==B) PC = ALUOut; Jump:
PC = PC[31-28] || (IR[25-0] << 2);
Step 3 (instruction dependent)
EI209 Chapter 4B.47 Haojin Zhu, SJTU 2015
Datapath Activity During lw & sw Execute
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
00
EI209 Chapter 4B.49 Haojin Zhu, SJTU 2015
Datapath Activity During R-type Execute
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
10
EI209 Chapter 4B.51 Haojin Zhu, SJTU 2015
Datapath Activity During beq Execute
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
01
EI209 Chapter 4B.53 Haojin Zhu, SJTU 2015
Datapath Activity During j Execute
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.55 Haojin Zhu, SJTU 2015
Execute Control Signals Settings
Start
Instr Fetch Decode
(Op = R-ty
pe)
(Op =
beq)
(Op = lw or sw
) (Op = j)
ALUSrcA=1ALUSrcB=10ALUOp=00
PCWriteCond=0
ALUSrcA=1ALUSrcB=00ALUOp=10
PCWriteCond=0
ALUSrcA=1ALUSrcB=00ALUOp=01
PCSource=01PCWriteCond
PCSource=10PCWriteExecute
Unless otherwise assigned
PCWrite,IRWrite, MemWrite,RegWrite=0 others=X
ALUSrcA=0ALUSrcB=11ALUOp=00
PCWriteCond=0
IorD=0MemRead;IRWrite
ALUSrcA=0ALUsrcB=01
PCSource,ALUOp=00PCWrite
EI209 Chapter 4B.56 Haojin Zhu, SJTU 2015
Memory reference:
MDR = Memory[ALUOut]; -- lwor
Memory[ALUOut] = B; -- sw
R-type instruction completion
Reg[IR[15-11]] = ALUOut;
Remember, the register write actually takes place at the end of the cycle on the clock edge
Step 4 (also instruction dependent)
EI209 Chapter 4B.58 Haojin Zhu, SJTU 2015
Datapath Activity During lw Memory Access
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.60 Haojin Zhu, SJTU 2015
Datapath Activity During sw Memory Access
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.62 Haojin Zhu, SJTU 2015
Datapath Activity During R-type Completion
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.64 Haojin Zhu, SJTU 2015
Memory Access Control Signals Settings
Start
Instr Fetch Decode
Memory Access
Execute
(Op = R-ty
pe)
(Op =
beq)
(Op = lw or sw
) (Op = j)
(Op = lw)(Op = sw)
MemReadIorD=1
PCWriteCond=0
MemWriteIorD=1
PCWriteCond=0
RegDst=1RegWriteMemtoReg=0
PCWriteCond=0
Unless otherwise assigned
PCWrite,IRWrite, MemWrite,RegWrite=0 others=X
IorD=0MemRead;IRWrite
ALUSrcA=0ALUsrcB=01
PCSource,ALUOp=00PCWrite
ALUSrcA=0ALUSrcB=11ALUOp=00
PCWriteCond=0
ALUSrcA=1ALUSrcB=10ALUOp=00
PCWriteCond=0
ALUSrcA=1ALUSrcB=00ALUOp=10
PCWriteCond=0
ALUSrcA=1ALUSrcB=00ALUOp=01
PCSource=01PCWriteCond
PCSource=10PCWrite
EI209 Chapter 4B.65 Haojin Zhu, SJTU 2015
All we have left is the write back into the register file the data just read from memory for lw instruction
Reg[IR[20-16]]= MDR;
What about all the other instructions?
Step 5: Memory Read Completion (Write Back)
EI209 Chapter 4B.67 Haojin Zhu, SJTU 2015
Datapath Activity During lw Write Back
Address
Read Data(Instr. or Data)
Memory
PC
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
Write Data
IRM
DR
AB
ALU
out
SignExtend
Shiftleft 2 ALU
control
Shiftleft 2
ALUOpControl
IRWriteMemtoReg
MemWriteMemRead
IorDPCWrite
PCWriteCond
RegDstRegWrite
ALUSrcAALUSrcB
zero
PCSource
1
1
1
1
1
10
0
0
0
0
0
2
2
3
4
Instr[5-0]
Instr[25-0]
PC[31-28]
Instr[15-0]
Instr[31-26]
32
28
EI209 Chapter 4B.69 Haojin Zhu, SJTU 2015
Write Back Control Signals Settings
Start
Instr Fetch Decode
Write Back
Memory Access
Execute
(Op = R-ty
pe)
(Op =
beq)
(Op = lw or sw
) (Op = j)
(Op = lw)(Op = sw)
RegDst=0RegWrite
MemtoReg=1PCWriteCond=0
Unless otherwise assigned
PCWrite,IRWrite, MemWrite,RegWrite=0 others=X
IorD=0MemRead;IRWrite
ALUSrcA=0ALUsrcB=01
PCSource,ALUOp=00PCWrite
ALUSrcA=0ALUSrcB=11ALUOp=00
PCWriteCond=0
ALUSrcA=1ALUSrcB=10ALUOp=00
PCWriteCond=0
ALUSrcA=1ALUSrcB=00ALUOp=10
PCWriteCond=0
ALUSrcA=1ALUSrcB=00ALUOp=01
PCSource=01PCWriteCond
PCSource=10PCWrite
MemReadIorD=1
PCWriteCond=0
MemWriteIorD=1
PCWriteCond=0
RegDst=1RegWriteMemtoReg=0
PCWriteCond=0
EI209 Chapter 4B.70 Haojin Zhu, SJTU 2015
RTL SummaryStep R-type Mem Ref Branch Jump
Instr fetch
IR = Memory[PC]; PC = PC + 4;
Decode A = Reg[IR[25-21]];B = Reg[IR[20-16]];
ALUOut = PC +(sign-extend(IR[15-0])<< 2);
Execute ALUOut = A op B;
ALUOut = A + sign-extend
(IR[15-0]);
if (A==B) PC =
ALUOut;
PC = PC[31-28] ||(IR[25-0] << 2);
Memory access
Reg[IR[15-11]] = ALUOut;
MDR = Memory[ALUOut];
orMemory[ALUOut]
= B; Write-back
Reg[IR[20-16]] = MDR;
EI209 Chapter 4B.71 Haojin Zhu, SJTU 2015
Example
Using the following instruction mix, what is the CPI, assuming that each state in the multicycle CPU requires 1 clock cycle?
Load 25%store 10%
branches 11%
jumps 2%
ALU 52%
EI209 Chapter 4B.72 Haojin Zhu, SJTU 2015
Answer:
Loads:5Stores:4ALU: 4Branches:3Jumps:3
CPI=0.25*5+0.10*4+0.52*4+0.11*3+0.02*3=4.12