ACCESS IC LABORATORY
93-2 Digital System Design, Graduate Institute of Electronics Engineering, NTU
Design a MIPS Processor (1/2)
Lecturer: Chihhao Chao
Advisor: Prof. An-Yeu Wu
2005/4/29 Friday

Outline
- Review: Implement a single-cycle MIPS machine
  - Review of synchronous circuit design
  - Architecture of single-cycle MIPS machine
  - Build combinational parts
  - Build sequential parts
  - Combine them all
- Multi-cycle approach

Review of Synchronous Circuit Design
- A system is composed of
  - D flip-flops (D-FF)
  - Combinational circuits
- Signal transitions on the output of a D-FF always align to the clock edge.
- The combinational circuits between each pair of D-FFs compute simultaneously.
- Sequential elements (D-FF, register file, memory) and combinational elements (computation units) are usually coded separately.

FF-Based, Edge-Triggered Clocking
- Td = delay of combinational logic
- Tcycle = cycle time of clock
- Timing requirements for Td
  - Tdmax < Tcycle - Tsetup - Tcq => no setup time violation
  - Tdmin > Thold - Tcq => no hold time violation
[Figure: two flip-flops (FF) clocked by clk with period Tcycle, with combinational logic of delay Td between them; the timing diagram marks Tcq, Td, and Tsetup within one cycle.]
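
The two inequalities above can be checked mechanically for a single FF-to-FF path. The following is a minimal Python sketch; the function name `timing_ok` and all numeric values are hypothetical and chosen only for illustration.

```python
def timing_ok(t_cycle, t_cq, t_setup, t_hold, td_min, td_max):
    """Return (setup_ok, hold_ok) for one FF-to-FF path.

    All arguments are delays in the same unit (for example ns).
    """
    setup_ok = td_max < t_cycle - t_setup - t_cq   # Tdmax < Tcycle - Tsetup - Tcq
    hold_ok = td_min > t_hold - t_cq               # Tdmin > Thold - Tcq
    return setup_ok, hold_ok


# Hypothetical numbers: the same logic at two different clock periods.
slow_clock = timing_ok(t_cycle=10.0, t_cq=0.5, t_setup=0.3, t_hold=0.2,
                       td_min=0.1, td_max=8.0)
fast_clock = timing_ok(t_cycle=8.0, t_cq=0.5, t_setup=0.3, t_hold=0.2,
                       td_min=0.1, td_max=8.0)
print(slow_clock, fast_clock)
```

Shortening the clock period only affects the setup check; the hold check does not depend on Tcycle.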

Example
- These computations are simultaneous. At some time t:
  B(t) = fAB( A(t-1) )
  C(t) = fBC( B(t-1) )
  D(t) = fCD( C(t-1) )
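
One way to see what "simultaneous" means here is to model a clock edge as a function that computes every new register value from the old state only. This is a small Python sketch; the function `clock_edge` and the use of "+1" for fAB, fBC, and fCD are assumptions made for illustration.

```python
def clock_edge(state, f_ab, f_bc, f_cd, a_in):
    """One clock edge: every register loads a value computed from the OLD state."""
    a, b, c, d = state
    return (a_in, f_ab(a), f_bc(b), f_cd(c))


def inc(x):
    # Stand-in for the combinational logic between two registers.
    return x + 1


state = (0, 0, 0, 0)            # registers A, B, C, D
for a_in in [10, 20, 30]:
    state = clock_edge(state, inc, inc, inc, a_in)
print(state)
```

Because the new tuple is built before it replaces the old one, B never sees the value A receives on the same edge, which mirrors the t-1 indices in the equations.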

The Big Picture: The Performance Perspective
- Performance of a machine is determined by:
  - Instruction count
  - Clock cycle time
  - Clock cycles per instruction
- Processor design (datapath and control) will determine:
  - Clock cycle time
  - Clock cycles per instruction
- Single-cycle processor: one clock cycle per instruction
  - Advantages: simple design, low CPI
  - Disadvantages: long cycle time, which is limited by the slowest instruction

From Instruction Set to Architecture
1. Analyze instruction set => datapath requirements
   - The meaning of each instruction is given by register transfers, e.g. R[rd] <- R[rs] + R[rt]
   - Datapath must include storage elements for ISA registers
   - Datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Design datapath to meet the requirements
4. Analyze implementation of each instruction to determine the setting of control points that effects the register transfer
5. Design the control logic

Step 1. Analyze Instruction Set

MIPS Instruction Format
- All MIPS instructions are 32 bits long. The three instruction formats are:

  Format  31-26  25-21  20-16  15-11  10-6   5-0
  R-type  op     rs     rt     rd     shamt  funct
  I-type  op     rs     rt     immediate (bits 15-0)
  J-type  op     target address (bits 25-0)

  Field widths: op 6 bits; rs, rt, rd, shamt 5 bits each; funct 6 bits; immediate 16 bits; target address 26 bits.
- The different fields are:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
  - shamt: shift amount
  - funct: selects the variant of the operation in the "op" field
  - address / immediate: address offset or immediate value
  - target address: target address of the jump instruction
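
The field boundaries above translate directly into shifts and masks. Below is a minimal Python sketch; the function name `decode` and the two example words are illustrative assumptions, not part of the original slides.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every field of the three formats."""
    return {
        "op":     (word >> 26) & 0x3F,       # bits 31-26
        "rs":     (word >> 21) & 0x1F,       # bits 25-21
        "rt":     (word >> 16) & 0x1F,       # bits 20-16
        "rd":     (word >> 11) & 0x1F,       # bits 15-11 (R-type)
        "shamt":  (word >> 6) & 0x1F,        # bits 10-6  (R-type)
        "funct":  word & 0x3F,               # bits 5-0   (R-type)
        "imm16":  word & 0xFFFF,             # bits 15-0  (I-type)
        "target": word & 0x03FFFFFF,         # bits 25-0  (J-type)
    }


# R-type example: op = 0, rs = 1, rt = 2, rd = 3, funct = 10 0000 (add)
r_fields = decode(0x00221820)
# I-type example: op = 10 0011 (lw), rs = 0, rt = 4, imm16 = 16
i_fields = decode(0x8C040010)
print(r_fields["rd"], i_fields["imm16"])
```

Which of the returned fields is meaningful depends on the format selected by `op`; the decoder itself does not need to know the format.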

Several Key Instructions
- ADD and SUB (R-type format)
  - addu rd, rs, rt
  - subu rd, rs, rt
- OR Immediate (I-type format)
  - ori rt, rs, imm16
- LOAD and STORE (I-type format)
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- BRANCH (I-type format)
  - beq rs, rt, imm16

RTL Semantics of Instructions
- The semantics of an instruction are given by the data fetched from memory, MEM[PC].
- The processor executes the semantics of the instruction:
  { op | rs | rt | rd | shamt | funct } <- MEM[ PC ]
  { op | rs | rt | Imm16 } <- MEM[ PC ]

  Instruction  Register Transfers
  ADDu         R[rd] <- R[rs] + R[rt]; PC <- PC + 4
  SUBu         R[rd] <- R[rs] - R[rt]; PC <- PC + 4
  ORi          R[rt] <- R[rs] | zero_ext(imm16); PC <- PC + 4
  LOAD         R[rt] <- MEM[ R[rs] + sign_ext(imm16) ]; PC <- PC + 4
  STORE        MEM[ R[rs] + sign_ext(imm16) ] <- R[rt]; PC <- PC + 4
  BEQ          if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(imm16) || 00 ) else PC <- PC + 4
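
The register transfers in this table can be read as a tiny interpreter. The sketch below is an assumption-laden Python model, not the lecture's implementation: instructions are represented as hypothetical tuples in the operand order of the assembly syntax, memory is a dictionary, the branch offset is shifted left by two bits as on the later branch slides, and $r0 is not forced to zero.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(inst, R, mem, pc):
    """Apply one instruction's register transfers and return the next PC."""
    op, x, y, z = inst
    if op == "addu":                      # addu rd, rs, rt
        R[x] = (R[y] + R[z]) & MASK32
    elif op == "subu":                    # subu rd, rs, rt
        R[x] = (R[y] - R[z]) & MASK32
    elif op == "ori":                     # ori rt, rs, imm16 (zero-extended)
        R[x] = R[y] | (z & 0xFFFF)
    elif op == "lw":                      # lw rt, rs, imm16
        R[x] = mem.get((R[y] + sign_ext(z)) & MASK32, 0)
    elif op == "sw":                      # sw rt, rs, imm16
        mem[(R[y] + sign_ext(z)) & MASK32] = R[x]
    elif op == "beq":                     # beq rs, rt, imm16
        if R[x] == R[y]:
            return (pc + 4 + (sign_ext(z) << 2)) & MASK32
    return (pc + 4) & MASK32


R, mem, pc = [0] * 32, {}, 0
program = [("ori", 1, 0, 5), ("ori", 2, 0, 7), ("addu", 3, 1, 2),
           ("sw", 3, 0, 16), ("lw", 4, 0, 16), ("beq", 3, 4, 0xFFFF)]
for inst in program:
    pc = step(inst, R, mem, pc)
print(R[3], R[4], mem[16], pc)
```

The final `beq` has an offset of -1 words, so a taken branch lands back on the branch instruction itself.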

Step 2. Select Components and Determine Clocking Methodology

Single-Cycle MIPS Architecture (1/2)
- Combinational logic: doesn't need a clock
  - Multiplexer (MUX): 32-bit inputs A and B, a Select input, and a 32-bit output Y
  - Arithmetic-Logic Unit (ALU): 32-bit inputs A and B, a 3-bit OP input, and a 32-bit Result output

Single-Cycle MIPS Architecture (2/2)
- Sequential logic elements: register file and memory

Determine Clocking Methodology
- All storage elements are clocked by the same clock edge
- Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
- (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
[Figure: clock waveform (Clk) with Setup and Hold windows around each active edge; data is "Don't Care" outside those windows.]

Step 3. Design Datapath: A Bottom-Up Approach

Define ALU Operation
- Design a compact ALU:
  - Add (unsigned/signed)
  - Subtract (unsigned/signed)
  - Greater than
  - Equal
  - Boolean AND
  - Boolean OR
  - Boolean NOT
  - Boolean XOR
  - Shift left (zero-padding)
  - Shift right (sign-extension)
  - Shift right (zero-padding)
- We design the ALU in 3 parts
  - Add / Sub / GreaterThan
  - Boolean logic / Equal
  - Shifter

Build Combinational Parts (1/3)
- Add / Sub / GreaterThan can share an adder
- Number representations:
  ( 1)10 = (00000001)2 in 2's complement = (00000001)2 in 1's complement
  (-1)10 = (11111111)2 in 2's complement = (11111110)2 in 1's complement
- GreaterThan: Op2 > Op1 is equivalent to Op1 - Op2 < 0
[Figure: shared adder datapath with parts labeled One's complement, Subtraction, and GreaterThan.]

2's Complement: Detecting Overflow
- When adding two's complement numbers, overflow will only occur if
  - the numbers being added have the same sign
  - the sign of the result is different
- If we perform the addition

      a_{n-1} a_{n-2} ... a_1 a_0
    + b_{n-1} b_{n-2} ... b_1 b_0
    -----------------------------
    = s_{n-1} s_{n-2} ... s_1 s_0

- Overflow can be detected as
  $V = a_{n-1} \cdot b_{n-1} \cdot \overline{s_{n-1}} + \overline{a_{n-1}} \cdot \overline{b_{n-1}} \cdot s_{n-1}$
- Overflow can also be detected as
  $V = c_n \oplus c_{n-1}$
  where c_{n-1} and c_n are the carry in and carry out of the most significant bit.

Unsigned: Detecting Overflow
- For unsigned numbers, overflow occurs if there is a carry out of the most significant bit:
  $V = c_n$
- For example, in 4 bits:

      1001 = 9
    + 1000 = 8
    ----------
      0001 = 1   (carry out of the MSB, so overflow)

- With the MIPS architecture
  - Overflow exceptions occur for two's complement arithmetic: add, sub, addi
  - Overflow exceptions do not occur for unsigned arithmetic: addu, subu, addiu
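
The overflow rules on the last two slides can be cross-checked exhaustively for a small word size. This is a Python sketch under the assumption of an n-bit adder; the function name `add_flags` and its return layout are hypothetical.

```python
def add_flags(a, b, n=8):
    """Add two n-bit words; return (sum, v_sign, v_carry, carry_out).

    v_sign  : signed overflow from the operand and result sign bits
    v_carry : signed overflow as carry-in XOR carry-out of the MSB
    carry_out also serves as the unsigned overflow flag.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    s = (a + b) & mask
    an, bn, sn = a >> (n - 1), b >> (n - 1), s >> (n - 1)
    v_sign = (an & bn & (sn ^ 1)) | ((an ^ 1) & (bn ^ 1) & sn)
    low = (1 << (n - 1)) - 1
    c_in_msb = ((a & low) + (b & low)) >> (n - 1)   # carry into the MSB
    c_out = (a + b) >> n                            # carry out of the MSB
    return s, v_sign, c_in_msb ^ c_out, c_out


# Both signed-overflow formulations agree on every pair of 8-bit operands.
agree = all(add_flags(a, b)[1] == add_flags(a, b)[2]
            for a in range(256) for b in range(256))
print(agree)
print(add_flags(0b1001, 0b1000, n=4))   # the 4-bit unsigned example: 9 + 8
```

For the 4-bit example the sum field wraps to 0001 and the carry out is 1. The signed-overflow flag is also set there, because the same bit patterns read as -7 + -8 in two's complement.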

Build Combinational Parts (2/3)
- Put Branch-on-Equal in the Boolean logic module
  - Implement XOR for each bit of Op1 and Op2
  - The XOR is shared with the Boolean logic operation
  - Use an OR tree to propagate any unequal bit out
- Or, bitwise-OR all result bits of the subtraction
  - Slower, longer critical path
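
The XOR-plus-OR-tree comparator can be sketched in a few lines of Python. The function name `equal_xor_or` is hypothetical, and the level count it returns is only the depth of a balanced binary OR tree, not a timing estimate.

```python
def equal_xor_or(op1, op2, n=32):
    """Compare two n-bit words with per-bit XOR followed by a binary OR tree.

    Returns (is_equal, or_tree_levels).
    """
    diff = (op1 ^ op2) & ((1 << n) - 1)          # 1 wherever the operands differ
    bits = [(diff >> i) & 1 for i in range(n)]
    levels = 0
    while len(bits) > 1:                         # OR neighbouring pairs, level by level
        bits = [bits[i] | (bits[i + 1] if i + 1 < len(bits) else 0)
                for i in range(0, len(bits), 2)]
        levels += 1
    return bits[0] == 0, levels


print(equal_xor_or(0x1234, 0x1234))
print(equal_xor_or(0x1234, 0x1235))
```

No carry chain is involved, which is the reason the slide calls the subtract-based alternative slower.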

Build Combinational Parts (3/3)
- Two kinds of shift:
  - logical: the value shifted in is always "0"
  - arithmetic: on right shifts, sign extend
- Shift left: zero padding
  - A = 16'b0000000000000001, B[3:0] = 4'b0100
  - Result = 16'b0000000000010000
- Shift right (unsigned): zero padding
  - A = 16'b1000000000000000, B[3:0] = 4'b0100
  - Result = 16'b0000100000000000
- Shift right (signed): keep the 2's complement sign
  - A = 16'b1100000000000000, B[3:0] = 4'b0100
  - Result = 16'b1111110000000000
  - A = 16'b0100000000000000, B[3:0] = 4'b0100
  - Result = 16'b0000010000000000
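
The four 16-bit examples above can be reproduced with three small helpers. This is a Python sketch; the names `sll`, `srl`, and `sra` are borrowed from common MIPS mnemonics as an assumption and are not defined on the slide.

```python
def sll(a, sh, n=16):
    """Shift left, zero padding on the right."""
    return (a << sh) & ((1 << n) - 1)


def srl(a, sh, n=16):
    """Logical shift right, zero padding on the left."""
    return (a & ((1 << n) - 1)) >> sh


def sra(a, sh, n=16):
    """Arithmetic shift right, replicating the sign bit on the left."""
    mask = (1 << n) - 1
    a &= mask
    result = a >> sh
    if a >> (n - 1):                       # negative in two's complement
        result |= (mask << (n - sh)) & mask
    return result


print(format(sll(0b0000000000000001, 4), "016b"))
print(format(srl(0b1000000000000000, 4), "016b"))
print(format(sra(0b1100000000000000, 4), "016b"))
print(format(sra(0b0100000000000000, 4), "016b"))
```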

Shifter Structure
- What comes in the MSBs?
- How many levels for a 32-bit shifter?
- Basic building block: 2-to-1 mux with data inputs A and B, select input sel, and output D
[Figure: 8-bit right shifter built from three levels of 2-to-1 muxes controlled by S2, S1, S0, with inputs A7..A0 and outputs R7..R0.]
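
A shifter of this kind can be modeled as one row of 2-to-1 muxes per bit of the shift amount, where level k either passes the word through or shifts it by 2^k. The Python sketch below is an illustration under that assumption; `mux2`, `right_shift`, and the `fill` parameter are hypothetical names. The `fill` value is what "comes in the MSBs": 0 for a logical shift, the sign bit for an arithmetic one.

```python
def mux2(sel, a, b):
    """2-to-1 mux: output a when sel is 1, otherwise b."""
    return a if sel else b


def right_shift(value, amount, width=8, fill=0):
    """Right-shift through log2(width) mux levels; returns (result, levels)."""
    bits = [(value >> i) & 1 for i in range(width)]       # bits[0] is the LSB
    levels = (width - 1).bit_length()
    for k in range(levels):
        sel = (amount >> k) & 1                           # select bit S_k
        step = 1 << k
        bits = [mux2(sel, bits[i + step] if i + step < width else fill, bits[i])
                for i in range(width)]
    return sum(bit << i for i, bit in enumerate(bits)), levels


print(right_shift(0b10110100, 3))
print(right_shift(0x80000000, 31, width=32))
```

With this structure an 8-bit shifter needs 3 mux levels and a 32-bit shifter needs 5.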

Build ALU Module
- Now you can combine them all to build a purely combinational ALU

Build Sequential Elements
- Register File (RF)
  - 32 registers ($r0 for constant zero)
  - Dual asynchronous read ports
  - Single synchronous write port
  - Since this class is based on flip-flop design, we use DFFs to compose the RF
- Memory
  - Instruction memory: word length 32 bits
  - Data memory: word length 32 bits

Storage Element: Register File
- Length of each word: 32 bits
- 32 registers in the register file
- One 32-bit input bus: busW
- Two 32-bit output buses: busA and busB
[Figure: register file of 32 32-bit registers with inputs Clk, Write Enable, busW (32 bits), and 5-bit selects RW, RA, RB; outputs busA and busB (32 bits each).]

Register File: Write
- Write Enable = 1
- The data on busW is written into a register synchronously with the positive edge of clk.
- RW selects the register (one of 32) to be written.
- busA and busB may hold arbitrary values during a write operation.

Register File: Read
- The register file behaves as a combinational logic block when reading.
- The data in the register file is read asynchronously.
- RA selects one of 32 registers. The content of that register will be output on busA.
- At the same time, RB selects one of 32 registers. The content of that register will be output on busB.
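
The read and write behavior described on these slides can be captured in a small behavioral model. The Python class below is a sketch, not the DFF-based structure the course builds; the class and method names are hypothetical, and treating writes to register 0 as ignored is one assumed way to realize "$r0 for constant zero".

```python
class RegisterFile:
    """Behavioral model: two combinational read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32                 # 32 registers, 32 bits each

    def read(self, ra, rb):
        """Asynchronous read: busA and busB follow RA and RB immediately."""
        return self.regs[ra], self.regs[rb]

    def posedge(self, write_enable, rw, bus_w):
        """Synchronous write: takes effect only when called (the clock edge)."""
        if write_enable and rw != 0:         # assumption: $r0 stays zero
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.posedge(write_enable=1, rw=5, bus_w=0x1234)
rf.posedge(write_enable=0, rw=5, bus_w=0x9999)   # ignored, Write Enable = 0
rf.posedge(write_enable=1, rw=0, bus_w=7)        # ignored, $r0 is constant
print(rf.read(5, 0))
```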

Structure of RF

Storage Element: Idealized Memory
- Memory (idealized)
  - One input bus: Data In
  - One output bus: Data Out
- Memory word is selected by:
  - Address selects the word to put on Data Out
  - Write Enable = 1: Address selects the memory word to be written via the Data In bus
- `WLEN = 16 for Data Memory; `WLEN = 32 for Instruction Memory
[Figure: memory block with inputs Clk, Write Enable, Address (32 bits), and Data In (`WLEN bits); output Data Out (`WLEN bits).]

Memory: Write
- Write Enable = 1'b1
- Data In is written into the memory synchronously with the positive edge of the clock.
- The written location is specified by the input address.
- Data Out may hold an arbitrary value during a write operation.

Memory: Read
- Write Enable = 1'b0
- The memory behaves as a combinational logic block.
- The data in the memory is read asynchronously.
- The location to be read is specified by the input address.

Precedence of Instruction Execution
- Register Transfer Requirements → Datapath Design
  1. Instruction fetch
  2. Decode instruction and read operands
  3. Execute operation
  4. Write back the result

3-A: Overview of the Instruction Fetch Unit
- The common RTL operations
  - Fetch the instruction: mem[PC]
  - Update the program counter:
    - Sequential code: PC <- PC + 4
    - Branch and jump: PC <- "something else"
[Figure: the PC register (clocked by Clk) drives the Address input of the Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic updates the PC.]

3-B: Instruction Fetch and Execution (R-type)
- R[rd] <- R[rs] op R[rt]   Example: addu rd, rs, rt
- Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
- ALUctr and RegWr: generated by the control logic after decoding the instruction
[Figure: register file (32 32-bit registers; Clk, RegWr, 5-bit Rw/Ra/Rb driven by Rd/Rs/Rt) feeds busA and busB (32 bits each) into the ALU, which is controlled by the 3-bit ALUctr; the 32-bit Result returns on busW.]

3-C: Instruction Fetch and Execution (I-type)
- R[rt] <- R[rs] op ZeroExt[imm16]   Example: ori rt, rs, imm16
- ZeroExt fills bits 31-16 with sixteen 0 bits above the 16-bit immediate in bits 15-0.
[Figure: the R-type datapath extended with a RegDst mux that selects Rt or Rd as Rw, a ZeroExt block that widens imm16 from 16 to 32 bits, and an ALUSrc mux that selects busB or the extended immediate as the second ALU input.]

3-D: Load from Memory
- R[rt] <- Mem[ R[rs] + SignExt[imm16] ]   Example: lw rt, rs, imm16
[Figure: the I-type datapath with an Extender controlled by ExtOp, a Data Memory (Clk, Data In, WrEn driven by MemWr, Adr), and a W_Src mux that selects the ALU result or the memory output for busW.]
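
The Extender block that replaces ZeroExt in this datapath has to produce either a zero-extended or a sign-extended 32-bit value depending on ExtOp. A minimal Python sketch, assuming ExtOp is encoded as the strings "zero" and "sign" used on the control-signal slides (the function name `extender` is hypothetical):

```python
def extender(imm16, ext_op):
    """Widen a 16-bit immediate to 32 bits.

    ext_op == "zero": upper 16 bits are 0 (used by ori)
    ext_op == "sign": upper 16 bits copy bit 15 (used by lw, sw, beq)
    """
    imm16 &= 0xFFFF
    if ext_op == "sign" and imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16


print(hex(extender(0x8001, "zero")), hex(extender(0x8001, "sign")))
```

The two modes differ only when bit 15 of the immediate is 1.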

3-E: Store to Memory
- Mem[ R[rs] + SignExt[imm16] ] <- R[rt]   Example: sw rt, rs, imm16
[Figure: the load datapath reused for store: Rs and Rt select busA and busB, the Extender (ExtOp) and ALUSrc mux form the address in the ALU, and the Data Memory (Clk, Data In, WrEn driven by MemWr, Adr) is written; the RegDst and W_Src muxes are unchanged.]

3-F: Branch Instruction
- beq rs, rt, imm16
  - mem[PC]: fetch the instruction from memory
  - Equal <- (R[rs] == R[rt]): calculate the branch condition
  - Calculate the next instruction's address:
    if (Equal)
      PC <- PC + 4 + ( SignExt(imm16) x 4 )
    else
      PC <- PC + 4

Datapath for Branch Instruction
- beq rs, rt, imm16: the datapath generates the condition (equal)
[Figure: the PC (clocked by Clk) supplies the instruction address; one adder computes PC + 4 and a second adder adds the extended imm16 (PC Ext, with 00 appended); the nPC_sel mux chooses between the two. The register file outputs busA and busB feed an Equal? comparator that produces Cond.]

Putting Them All Together
[Figure: complete single-cycle datapath. The PC and its two adders (PC + 4 and branch target, selected by nPC_sel) address the instruction memory. Instruction<31:0> is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. The RegDst mux selects Rt or Rd as Rw; the register file drives busA and busB; the Extender (ExtOp) and ALUSrc mux feed the ALU (ALUctr), whose "=" output serves the branch; the Data Memory (MemWr) and the MemtoReg mux drive busW, written under RegWr.]

Step 4. Analyze Implementation & Set Up Control Points

Role of Control Unit
[Figure: the instruction memory outputs Instruction<31:0>. The Op and Fun fields go to the Control Unit; Rs, Rt, Rd, and Imm16 go to the DATA PATH. The Control Unit drives RegWr, RegDst, ALUSrc, ALUctr, ExtOp, MemWr, MemtoReg, and Branch into the datapath and receives Equal from it.]

Meaning of Control Signal
- Rs, Rt, Rd and Imm16 are hardwired into the datapath
- Branch Taken:
  - 0 => PC <- PC + 4
  - 1 => PC <- PC + 4 + ( SignExt(Imm16) || 00 )
[Figure: instruction fetch unit. The PC (Clk) addresses the Instruction Memory; one adder computes PC + 4, a second adds the extended Imm16 (PC Ext, with 00 appended), and a mux controlled by Branch Taken selects the next PC.]

The Meaning of Control Signals
- ExtOp: "zero", "sign"
- ALUsrc: 0 => regB; 1 => immediate
- ALUctr: "add", "sub", "or"
- MemWr: write memory
- MemtoReg: 0 => ALU; 1 => Mem
- RegDst: 0 => "rt"; 1 => "rd"
- RegWr: write destination register
[Figure: the single-cycle datapath annotated with the mux input numbers (0/1) for RegDst, ALUSrc, and MemtoReg, plus ExtOp, ALUctr, MemWr, RegWr, and Branch.]

Specify Control Signals
  inst   Register Transfer / Control Signals
  ADD    R[rd] <- R[rs] + R[rt]; PC <- PC + 4
         ALUsrc = RegB, ALUctr = "add", RegDst = rd, RegWr, nPC_sel = "+4"
  SUB    R[rd] <- R[rs] - R[rt]; PC <- PC + 4
         ALUsrc = RegB, ALUctr = "sub", RegDst = rd, RegWr, nPC_sel = "+4"
  ORi    R[rt] <- R[rs] | zero_ext(Imm16); PC <- PC + 4
         ALUsrc = Im, Extop = "Z", ALUctr = "or", RegDst = rt, RegWr, nPC_sel = "+4"
  LOAD   R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
         ALUsrc = Im, Extop = "Sn", ALUctr = "add", MemtoReg, RegDst = rt, RegWr, nPC_sel = "+4"
  STORE  MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
         ALUsrc = Im, Extop = "Sn", ALUctr = "add", MemWr, nPC_sel = "+4"
  BEQ    if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4
         Branch Taken = EQUAL, ALUctr = "sub"

Logic for Each Control Signal
- nPC_sel  <= if (OP == BEQ) then EQUAL else 0
- ALUsrc   <= if ((OP == "000000") || (OP == BEQ)) then "regB" else "immed"
- ALUctr   <= if (OP == "000000") then funct
              elseif (OP == ORi) then "OR"
              elseif (OP == BEQ) then "sub"
              else "add"
- ExtOp    <= if (OP == ORi) then "zero" else "sign"
- MemWr    <= (OP == Store)
- MemtoReg <= (OP == Load)
- RegWr    <= if ((OP == Store) || (OP == BEQ)) then 0 else 1
- RegDst   <= if ((OP == Load) || (OP == ORi)) then 0 else 1
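
The per-signal rules above can be collected into one decoding function. The Python sketch below is an illustration, not the lecture's Verilog: the function name `main_control` and the dictionary layout are hypothetical, the opcode constants are the op values listed in the control summary table, and BEQ is treated as selecting regB for ALUsrc, in line with that table (beq has ALUSrc = 0).

```python
R_TYPE, ORI, LW, SW, BEQ = 0b000000, 0b001101, 0b100011, 0b101011, 0b000100


def main_control(op, funct=0, equal=0):
    """Derive the single-cycle control signals from op, funct, and Equal."""
    if op == R_TYPE:
        alu_ctr = funct                      # R-type: the funct field picks the operation
    elif op == ORI:
        alu_ctr = "or"
    elif op == BEQ:
        alu_ctr = "sub"
    else:
        alu_ctr = "add"                      # lw and sw compute an address
    return {
        "nPC_sel": equal if op == BEQ else 0,
        "ALUsrc": "regB" if op in (R_TYPE, BEQ) else "immed",
        "ALUctr": alu_ctr,
        "ExtOp": "zero" if op == ORI else "sign",
        "MemWr": int(op == SW),
        "MemtoReg": int(op == LW),
        "RegWr": 0 if op in (SW, BEQ) else 1,
        "RegDst": 0 if op in (LW, ORI) else 1,   # 0 => rt, 1 => rd
    }


lw_ctl = main_control(LW)
beq_ctl = main_control(BEQ, equal=1)
print(lw_ctl)
print(beq_ctl)
```

Signals that are don't-cares for an instruction (for example RegDst for sw) simply take whatever value the rule produces; they have no effect because RegWr is 0.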

An Abstract View of Implementation
[Figure: abstract single-cycle machine. The PC and Next Address logic drive the Instruction Address of an ideal Instruction Memory; the Instruction feeds Control and the 5-bit Rs, Rt, Rd selects (Ra, Rb, Rw) of the 32 x 32-bit register file; register outputs A and B (32 bits) feed the ALU; an ideal Data Memory has Data Address, Data In, and Data Out. Control Signals flow from Control to the Datapath, and Conditions flow back. All storage elements share Clk.]

Step 5. Design Control Unit

A Summary of Control Signals
- R-type (op | rs | rt | rd | shamt | funct): add, sub
- I-type (op | rs | rt | immediate): ori, lw, sw, beq
- J-type (op | target address): jump
- See Appendix A for the op and func encodings.

               add      sub       ori      lw       sw       beq       jump
  op           00 0000  00 0000   00 1101  10 0011  10 1011  00 0100   00 0010
  func         10 0000  10 0010   don't care for the remaining instructions
  RegDst       1        1         0        0        x        x         x
  ALUSrc       0        0         1        1        1        0         x
  MemtoReg     0        0         0        1        x        x         x
  RegWrite     1        1         1        1        0        0         0
  MemWrite     0        0         0        0        1        0         0
  nPCsel       0        0         0        0        0        1         0
  Jump         0        0         0        0        0        0         1
  ExtOp        x        x         0        1        1        x         x
  ALUctr<2:0>  Add      Subtract  Or       Add      Add      Subtract  xxx

The Concept of Local Decoding

              R-type    ori      lw       sw       beq       jump
  op          00 0000   00 1101  10 0011  10 1011  00 0100   00 0010
  RegDst      1         0        0        x        x         x
  ALUSrc      0         1        1        1        0         x
  MemtoReg    0         0        1        x        x         x
  RegWrite    1         1        1        0        0         0
  MemWrite    0         0        0        1        0         0
  Branch      0         0        0        0        1         0
  Jump        0         0        0        0        0         1
  ExtOp       x         0        1        1        x         x
  ALUop<N:0>  "R-type"  Or       Add      Add      Subtract  xxx

[Figure: Main Control decodes the 6-bit op field into an N-bit ALUop; the local ALU Control combines ALUop with the 6-bit func field to produce the 3-bit ALUctr for the ALU.]

The Encoding of ALUop
- In this exercise, ALUop has to be N = 2 bits wide to represent:
  - (1) "R-type" instructions
  - "I-type" instructions that require the ALU to perform: (2) Or, (3) Add, and (4) Subtract
- To implement the full MIPS ISA, ALUop has to be 3 bits wide to represent:
  - (1) "R-type" instructions
  - "I-type" instructions that require the ALU to perform: (2) Or, (3) Add, (4) Subtract, (5) And, and (6) Set on <

                    R-type    ori   lw    sw    beq       jump
  ALUop (Symbolic)  "R-type"  Or    Add   Add   Subtract  xxx
  ALUop<2:0>        1 00      0 10  0 00  0 00  0 01      xxx

[Figure: Main Control (6-bit op in, N-bit ALUop out) feeding the local ALU Control (6-bit func in, 3-bit ALUctr out).]

The Decoding of the "func" Field

                    R-type    ori   lw    sw    beq       jump
  ALUop (Symbolic)  "R-type"  Or    Add   Add   Subtract  xxx
  ALUop<2:0>        1 00      0 10  0 00  0 00  0 01      xxx

- R-type format: op | rs | rt | rd | shamt | funct

  funct<5:0>  Instruction Operation
  10 0000     add
  10 0010     subtract
  10 0100     and
  10 0101     or
  10 1010     set-on-less-than

  ALUctr<2:0>  ALU Operation
  000          And
  001          Or
  010          Add
  110          Subtract
  111          Set-on-less-than

- Get func from the back of the book for R-type.
- Our processor only implements a subset of the operations.
[Figure: Main Control (6-bit op in, N-bit ALUop out) feeding the local ALU Control (6-bit func in, 3-bit ALUctr out).]

The Truth Table for ALUctr<2>
- This control is for more R-type instructions than our processor, but fewer than the entire MIPS ISA.

                    R-type    ori   lw    sw    beq
  ALUop (Symbolic)  "R-type"  Or    Add   Add   Subtract
  ALUop<2:0>        1 00      0 10  0 00  0 00  0 01

  funct<3:0>  Instruction Op.
  0000        add
  0010        subtract
  0100        and
  0101        or
  1010        set-on-less-than

  ALUop<2:0>  func<3:0>  ALU Operation  ALUctr<2:0>
  0 0 0       x x x x    Add            0 1 0
  0 x 1       x x x x    Subtract       1 1 0
  0 1 x       x x x x    Or             0 0 1
  1 x x       0 0 0 0    Add            0 1 0
  1 x x       0 0 1 0    Subtract       1 1 0
  1 x x       0 1 0 0    And            0 0 0
  1 x x       0 1 0 1    Or             0 0 1
  1 x x       1 0 1 0    Set on <       1 1 1

The Logic Equation for ALUctr<2>

  ALUop<2:0>  func<3:0>  ALUctr<2>
  0 x 1       x x x x    1
  1 x x       0 0 1 0    1
  1 x x       1 0 1 0    1

- ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & func<1>

The Truth Table for ALUctr<1>

  ALUop<2:0>  func<3:0>  ALU Operation  ALUctr<2:0>
  0 0 0       x x x x    Add            0 1 0
  0 x 1       x x x x    Subtract       1 1 0
  0 1 x       x x x x    Or             0 0 1
  1 x x       0 0 0 0    Add            0 1 0
  1 x x       0 0 1 0    Subtract       1 1 0
  1 x x       0 1 0 0    And            0 0 0
  1 x x       0 1 0 1    Or             0 0 1
  1 x x       1 0 1 0    Set on <       1 1 1

The Logic Equation for ALUctr<1>

  ALUop<2:0>  func<3:0>  ALUctr<1>
  0 0 0       x x x x    1
  0 x 1       x x x x    1
  1 x x       0 0 0 0    1
  1 x x       0 0 1 0    1
  1 x x       1 0 1 0    1

- ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2>

The Truth Table for ALUctr<0>

  ALUop<2:0>  func<3:0>  ALU Operation  ALUctr<2:0>
  0 0 0       x x x x    Add            0 1 0
  0 x 1       x x x x    Subtract       1 1 0
  0 1 x       x x x x    Or             0 0 1
  1 x x       0 0 0 0    Add            0 1 0
  1 x x       0 0 1 0    Subtract       1 1 0
  1 x x       0 1 0 0    And            0 0 0
  1 x x       0 1 0 1    Or             0 0 1
  1 x x       1 0 1 0    Set on <       1 1 1

The Logic Equation for ALUctr<0>

  ALUop<2:0>  func<3:0>  ALUctr<0>
  0 1 x       x x x x    1
  1 x x       0 1 0 1    1
  1 x x       1 0 1 0    1

- ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop<2> & func<2> & func<0> + ALUop<2> & func<3>

The ALU Control Block
- ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & func<1>
- ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2>
- ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop<2> & func<2> & func<0> + ALUop<2> & func<3>
[Figure: local ALU Control block with a 3-bit ALUop input, a 6-bit func input, and a 3-bit ALUctr output.]
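
The three equations can be checked against the rows of the ALUctr truth table by direct evaluation. The Python sketch below is such a check; the function name `alu_control` is hypothetical, and the ALUctr<1> term uses !func<2>, which is what the truth table rows (add 0000, subtract 0010, set-on-less-than 1010) require.

```python
def alu_control(aluop, func):
    """Evaluate the ALUctr logic equations for a 3-bit ALUop and 4-bit func<3:0>."""
    a2, a1, a0 = (aluop >> 2) & 1, (aluop >> 1) & 1, aluop & 1
    f3, f2, f1, f0 = (func >> 3) & 1, (func >> 2) & 1, (func >> 1) & 1, func & 1
    c2 = ((1 - a2) & a0) | (a2 & f1)
    c1 = ((1 - a2) & (1 - a1)) | (a2 & (1 - f2))
    c0 = ((1 - a2) & a1) | (a2 & f2 & f0) | (a2 & f3)
    return (c2 << 2) | (c1 << 1) | c0


# (ALUop, func<3:0>, expected ALUctr) taken from the truth table rows;
# don't-care bits are set to 0 here.
rows = [
    (0b000, 0b0000, 0b010),   # lw / sw     -> Add
    (0b001, 0b0000, 0b110),   # beq         -> Subtract
    (0b010, 0b0000, 0b001),   # ori         -> Or
    (0b100, 0b0000, 0b010),   # R-type add
    (0b100, 0b0010, 0b110),   # R-type subtract
    (0b100, 0b0100, 0b000),   # R-type and
    (0b100, 0b0101, 0b001),   # R-type or
    (0b100, 0b1010, 0b111),   # R-type set-on-less-than
]
all_match = all(alu_control(aluop, func) == expected for aluop, func, expected in rows)
print(all_match)
```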

The "Truth Table" for the Main Control

                    R-type    ori      lw       sw       beq       jump
  op                00 0000   00 1101  10 0011  10 1011  00 0100   00 0010
  RegDst            1         0        0        x        x         x
  ALUSrc            0         1        1        1        0         x
  MemtoReg          0         0        1        x        x         x
  RegWrite          1         1        1        0        0         0
  MemWrite          0         0        0        1        0         0
  Branch            0         0        0        0        1         0
  Jump              0         0        0        0        0         1
  ExtOp             x         0        1        1        x         x
  ALUop (Symbolic)  "R-type"  Or       Add      Add      Subtract  xxx
  ALUop<2>          1         0        0        0        0         x
  ALUop<1>          0         1        0        0        0         x
  ALUop<0>          0         0        0        0        1         x

[Figure: Main Control (6-bit op in; RegDst, ALUSrc, ... and 3-bit ALUop out) feeding the local ALU Control (6-bit func in, 3-bit ALUctr out).]

The "Truth Table" for RegWrite

            R-type   ori      lw       sw       beq      jump
  op        00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
  RegWrite  1        1        1        0        0        0

- RegWrite = R-type + ori + lw
           = !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0>   (R-type)
           + !op<5> & !op<4> &  op<3> &  op<2> & !op<1> &  op<0>   (ori)
           +  op<5> & !op<4> & !op<3> & !op<2> &  op<1> &  op<0>   (lw)
[Figure: AND plane decoding op<5>..op<0> into the R-type, ori, lw, sw, beq, and jump product terms; RegWrite is formed from the R-type, ori, and lw terms.]

PLA Implementation of the Main Control
[Figure: PLA with an AND plane decoding op<5>..op<0> into the R-type, ori, lw, sw, beq, and jump product terms, and an OR plane producing RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, and ALUop<0>.]

Putting it All Together: A Single Cycle Processor
[Figure: complete single-cycle processor. The Instruction Fetch Unit (Clk, nPC_sel, Zero) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Main Control decodes Instr<31:26> (op, 6 bits) into RegDst, ALUSrc, ... and the 3-bit ALUop; ALU Control combines ALUop with Instr<5:0> (func, 6 bits) into the 3-bit ALUctr. The datapath contains the RegDst mux, the register file (RegWr, busW, busA, busB), the Extender (ExtOp) on Instr<15:0>, the ALUSrc mux, the ALU with its Zero output, the Data Memory (MemWr), and the MemtoReg mux.]

Abstract View of Critical Path
- Critical path (load operation) =
    PC's Clk-to-Q
  + Instruction Memory's access time
  + Register File's access time
  + ALU to perform a 32-bit add
  + Data Memory access time
  + Setup time for Register File write
  + Clock skew
- The worst-case delay for load is much longer than needed for all other instructions, yet it sets the cycle time.
[Figure: abstract datapath showing the load path through the PC and Next Address logic, ideal Instruction Memory, register file (Rs, Rt, Rd, 16-bit Imm), ALU, and ideal Data Memory, all clocked by Clk.]
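
The critical-path expression is a plain sum, so the resulting minimum cycle time is easy to compute once component delays are known. The Python sketch below uses made-up delays in picoseconds; every number and key name is a hypothetical placeholder, not a figure from the lecture.

```python
# Components of the load critical path, in the order listed on the slide.
LOAD_PATH = [
    "pc_clk_to_q",
    "imem_access",
    "regfile_access",
    "alu_add32",
    "dmem_access",
    "regfile_setup",
    "clock_skew",
]


def load_cycle_time(delays_ps):
    """Minimum single-cycle clock period: the sum of the load path delays."""
    return sum(delays_ps[name] for name in LOAD_PATH)


# Hypothetical delays in ps, for illustration only.
delays_ps = {
    "pc_clk_to_q": 50,
    "imem_access": 400,
    "regfile_access": 200,
    "alu_add32": 300,
    "dmem_access": 400,
    "regfile_setup": 50,
    "clock_skew": 20,
}
cycle_ps = load_cycle_time(delays_ps)
print(cycle_ps)
```

With these placeholder numbers, an instruction that skips the data memory would need 400 ps less, yet in a single-cycle design it still pays the full period.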

Outline
- Review: Implement a single-cycle MIPS machine
- Multi-cycle approach
  - From single cycle to multi-cycle
  - Controller design for multi-cycle
  - A multi-cycle design example of MIPS machine

Flaw of Single-Cycle Processor
- Long cycle time
- All instructions take as much time as the slowest

  Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
  Load:                 PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup   (critical path)
  Store:                PC, Inst Memory, Reg File, mux, ALU, Data Mem
  Branch:               PC, Inst Memory, Reg File, cmp, mux

Comparison: Single-Cycle and Multiple Cycle
[Figure: timing comparison of the two designs for loads from memory (Mem, Reg, ALU, Mem, Reg) and R-type instructions (Mem, Reg, ALU, Reg).]

Performance Evaluation
- What is the average CPI?
  - The state diagram gives the CPI for each instruction type
  - The workload gives the frequency of each type

  Type         CPI_i for type  Frequency  CPI_i x freq_i
  Arith/Logic  4               40%        1.6
  Load         5               30%        1.5
  Store        4               10%        0.4
  Branch       3               20%        0.6
  Average CPI:                            4.1
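
The average CPI in this table is the frequency-weighted sum of the per-type CPIs. A short Python sketch that reproduces the arithmetic from the table's own numbers (the variable and function names are illustrative):

```python
# (CPI for the type, fraction of the workload), taken from the table above.
mix = {
    "Arith/Logic": (4, 0.40),
    "Load":        (5, 0.30),
    "Store":       (4, 0.10),
    "Branch":      (3, 0.20),
}


def average_cpi(instruction_mix):
    """Weighted average: sum over types of CPI_i * freq_i."""
    return sum(cpi * freq for cpi, freq in instruction_mix.values())


avg = average_cpi(mix)
print(round(avg, 2))
```

The frequencies must sum to 1 for the result to be a true average; the four values here do.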

Reducing Cycle Time
- Cut the combinational dependency graph and insert a register / latch
- Do the same work in two fast cycles, rather than one slow one
[Figure: a block of acyclic combinational logic between two storage elements is split into acyclic combinational logic (A) and (B), with an additional storage element inserted between them.]

Partition Single-Cycle Datapath
- Add registers between the smallest steps
- Allow the instruction to take multiple cycles.
[Figure: the single-cycle datapath cut into steps: PC / Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access, and Result Store. Control signals shown: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr.]

Multi-Cycle Datapath
- Additional registers are added to store values between stages.
[Figure: the partitioned datapath with added registers: IR after Instruction Fetch, A and B after Operand Fetch (Reg. File), S after the ALU (with Ext), and M after Mem Access, followed by Result Store into the RegFile. Control signals shown: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr.]

R-Type Instructions

  inst   Logical Register Transfers
  ADDU   R[rd] <- R[rs] + R[rt]; PC <- PC + 4

  inst   Physical Register Transfers
         IR <- MEM[pc]
  ADDU   A <- R[rs]; B <- R[rt]
         S <- A + B
         R[rd] <- S; PC <- PC + 4

[Figure: multi-cycle datapath with PC, Next PC, Inst. Mem, IR, Reg. File, registers A and B, Exec, S, Mem Access, M, and Reg File write-back.]

I-Type Instructions

  inst   Logical Register Transfers
  ORI    R[rt] <- R[rs] OR zx(Im16); PC <- PC + 4

  inst   Physical Register Transfers
         IR <- MEM[pc]
  ORI    A <- R[rs]; B <- R[rt]
         S <- A or ZeroExt(Im16)
         R[rt] <- S; PC <- PC + 4

[Figure: multi-cycle datapath with PC, Next PC, Inst. Mem, IR, Reg. File, registers A and B, Exec, S, Mem Access, M, and Reg File write-back.]

Load from Memory

  inst   Logical Register Transfers
  LW     R[rt] <- MEM[ R[rs] + sx(Im16) ]; PC <- PC + 4

  inst   Physical Register Transfers
         IR <- MEM[pc]
  LW     A <- R[rs]; B <- R[rt]
         S <- A + SignEx(Im16)
         M <- MEM[S]
         R[rt] <- M; PC <- PC + 4

[Figure: multi-cycle datapath with PC, Next PC, Inst. Mem, IR, Reg. File, registers A and B, Exec, S, Mem Access, M, and Reg File write-back.]

Store to Memory

  inst   Logical Register Transfers
  SW     MEM[ R[rs] + sx(Im16) ] <- R[rt]; PC <- PC + 4

  inst   Physical Register Transfers
         IR <- MEM[pc]
  SW     A <- R[rs]; B <- R[rt]
         S <- A + SignEx(Im16)
         MEM[S] <- B; PC <- PC + 4

[Figure: multi-cycle datapath with PC, Next PC, Inst. Mem, IR, Reg. File, registers A and B, Exec, S, Mem Access, M, and Reg File write-back.]

Branch Instruction

  inst     Logical Register Transfers
  BEQ      if R[rs] == R[rt]
           then PC <= PC + sx(Im16) || 00
           else PC <= PC + 4

  inst     Physical Register Transfers
           IR <- MEM[pc]
           A <- R[rs]; B <- R[rt]
           Eq = (A - B == 0)
  BEQ&Eq   PC <- PC + sx(Im16) || 00

[Figure: multi-cycle datapath with PC, Next PC, Inst. Mem, IR, Reg. File, registers A and B, Exec, S, Mem Access, M, and Reg File write-back.]

Control Scheme
- Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

  Initial Representation    Finite State Diagram   Microprogram
  Sequencing Control        Explicit Next State    Microprogram counter
                            Function               + Dispatch ROMs
  Logic Representation      Logic Equations        Truth Tables
  Implementation Technique  PLA                    ROM
                            "hardwired control"    "micro-programmed control"

FSM: Sequential Circuit Design
- Models systems whose outputs depend not only on the current inputs
  - Depend on inputs
  - Depend on current state
- Fundamental components
  - Combinational circuits
  - Memory elements
[Figure: Combinational Logic takes the Inputs and the Current State and produces the Outputs and the Next State; clocked Memory Elements hold the state.]

Finite State Machines
- Synchronous (i.e. clocked) finite state machines (FSMs) have widespread application in digital systems, e.g. as datapath controllers in computational units and processors. Synchronous FSMs are characterized by a finite number of states and by clock-driven state transitions.
- Mealy machine: the next state and the outputs depend on the present state and the inputs.
- Moore machine: the next state depends on the present state and the inputs, but the output depends only on the present state.
[Figure: Moore machine: Inputs and state feed the Next State Combinational Logic, then the clocked State Register; separate Output Combinational Logic derives the Outputs from the state alone.]
[Figure: Mealy machine: Inputs and state feed a single Next State and Output Combinational Logic block that drives both the Outputs and the clocked State Register.]

Finite-State Machine Control
- The high-level view of the finite state machine control

Instruction Fetch and Decode

Memory-Reference Instructions

R-Type Instructions

Branch

Jump

Complete State Diagram

Multi-Cycle Architecture