41
CEG3420 L06.1 Spring 2016 CENG 3420 Computer Organization and Design Lecture 06: MIPS Processor - I Bei Yu

CENG 3420 Computer Organization and Design Lecture 06

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.1 Spring 2016

CENG 3420Computer Organization and Design

Lecture 06: MIPS Processor - I

Bei Yu

Page 2: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.2 Spring 2016

q We're ready to look at an implementation of the MIPSq Simplified to contain only:

● memory-reference instructions: lw, sw● arithmetic-logical instructions: add, addu, sub, subu, and, or, xor, nor, slt, sltu

● arithmetic-logical immediate instructions: addi, addiu, andi, ori, xori, slti, sltiu

● control flow instructions: beq, jq Generic implementation:

● use the program counter (PC) to supply the instruction address and fetch the instruction from memory(and update the PC)

● decode the instruction (and read registers)● execute the instruction

The Processor: Datapath & Control

FetchPC = PC+4

DecodeExec

Page 3: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.3 Spring 2016

Abstract Implementation Viewq Two types of functional units:

● elements that operate on data values (combinational)● elements that contain state (sequential)

q Single cycle operationq Split memory (Harvard) model - one memory for

instructions and one for data

Address Instruction

InstructionMemory

Write Data

Reg Addr

Reg Addr

Reg Addr

Register

File ALUData

Memory

Address

Write Data

Read DataPC

ReadData

ReadData

Page 4: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.4 Spring 2016

Clocking Methodologiesq Clocking methodology defines when signals can

be read and when they can be writtenfalling (negative) edge

rising (positive) edgeclock cycle

clock rate = 1/(clock cycle)e.g., 10 nsec clock cycle = 100 MHz clock rate

1 nsec clock cycle = 1 GHz clock rate

q State element design choices● level sensitive latch● master-slave and edge-triggered flipflops

Page 5: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.5 Spring 2016

Review:Latches vs Flipflopsq Output is equal to the stored value inside the

element q Change of state (value) is based on the clock

● Latches: output changes whenever the inputs change and the clock is asserted (level sensitive methodology)

- Two-sided timing constraint● Flip-flop: output changes only on a clock edge (edge-

triggered methodology)- One-sided timing constraint

A clocking methodology defines when signals can be read and written – would NOT want to read a

signal at the same time it was being written

Page 6: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.6 Spring 2016

Our Implementationq An edge-triggered methodologyq Typical execution

● read contents of some state elements ● send values through some combinational logic● write results to one or more state elements

Stateelement

1

Stateelement

2

Combinationallogic

clock

one clock cycle

q Assumes state elements are written on every clock cycle; if not, need explicit write control signal● write occurs only when both the write control is asserted

and the clock edge occurs

Page 7: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.7 Spring 2016

Fetching Instructionsq Fetching instructions involves

● reading the instruction from the Instruction Memory● updating the PC value to be the address of the next

(sequential) instruction

ReadAddress Instruction

InstructionMemory

Add

PC

4

● PC is updated every clock cycle, so it does not need an explicit write control signal

● Instruction Memory is read every clock cycle, so it doesn’t need an explicit read control signal

FetchPC = PC+4

DecodeExec

clock

Page 8: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.8 Spring 2016

Decoding Instructionsq Decoding instructions involves

● sending the fetched instruction’s opcode and function field bits to the control unit

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ControlUnit

● reading two values from the Register File- Register File addresses are contained in the instruction

FetchPC = PC+4

DecodeExec

Page 9: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.9 Spring 2016

q Note that both RegFile read ports are active for allinstructions during the Decode cycle using the rs and rt instruction field addresses● Since haven’t decoded the instruction yet, don’t know what

the instruction is !● Just in case the instruction uses values from the RegFile

do “work ahead” by reading the two source operands

Which instructions do make use of the RegFile values?

q Also, all instructions (except j) use the ALU afterreading the registers

Why? memory-reference? arithmetic? control flow?

Reading Registers “Just in Case”

Page 10: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.10 Spring 2016

Executing R Format Operationsq R format operations (add, sub, slt, and, or)

● perform operation (op and funct) on values in rs and rt● store the result back into the Register File (into location rd)

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

overflowzero

ALU controlRegWrite

R-type:31 25 20 15 5 0

op rs rt rd functshamt

10

● Note that Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File

FetchPC = PC+4

DecodeExec

Page 11: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.11 Spring 2016

q Remember the R format instruction sltslt $t0, $s0, $s1 # if $s0 < $s1

# then $t0 = 1# else $t0 = 0

Consider the slt Instruction

● Where does the 1 (or 0) come from to store into $t0 in the Register File at the end of the execute cycle?

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

overflowzero

ALU controlRegWrite

Page 12: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.12 Spring 2016

Executing Load and Store Operationsq Load and store operations have to

● compute a memory address by adding the base register (in rs) to the 16-bit signed offset field in the instruction

- base register was read from the Register File during decode

- offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value

● store value, read from the Register File during decode, must be written to the Data Memory

● load value, read from the Data Memory, must be stored in the Register File

I-Type: op rs rt address offset31 25 20 15 0

Page 13: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.13 Spring 2016

Executing Load and Store Operations, con’t

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

overflowzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

SignExtend

MemWrite

MemRead16 32

Page 14: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.14 Spring 2016

Executing Branch Operationsq Branch operations have to

● compare the operands read from the Register File during decode (rs and rt values) for equality (zeroALU output)

● compute the branch target address by adding the updated PC to the sign extended16-bit signed offset field in the instruction

- “base register” is the updated PC- offset value in the low order 16 bits of the instruction

must be sign extended to create a 32-bit signed value and then shifted left 2 bits to turn it into a word address

I-Type: op rs rt address offset31 25 20 15 0

Page 15: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.15 Spring 2016

Executing Branch Operations, con’t

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

zero

ALU control

SignExtend16 32

Shiftleft 2

Add

4 Add

PC

Branchtargetaddress

(to branch control logic)

Page 16: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.16 Spring 2016

Executing Jump Operationsq Jump operations have to

● replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits

ReadAddress Instruction

InstructionMemory

Add

PC

4

Shiftleft 2

Jumpaddress

26

4

28

J-Type: op

31 25 0jump target address

Page 17: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.18 Spring 2016

Creating a Single Datapath from the Partsq Assemble the datapath elements, add control lines

as needed, and design the control pathq Fetch, decode and execute each instructions in

one clock cycle – single cycle design● no datapath resource can be used more than once per

instruction, so some must be duplicated (e.g., why we have a separate Instruction Memory and Data Memory)

● to share datapath elements between two different instruction classes will need multiplexors at the input of the shared elements with control lines to do the selection

q Cycle time is determined by length of the longest path

Page 18: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.19 Spring 2016

Fetch, R, and Memory Access Portions

ReadAddress Instruction

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovfzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

Page 19: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.20 Spring 2016

Multiplexor Insertion

MemtoReg

ReadAddress Instruction

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovfzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

ALUSrc

Page 20: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.21 Spring 2016

Clock Distribution

MemtoReg

ReadAddress Instruction

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovfzero

ALU control

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

ALUSrc

System Clock

clock cycle

Page 21: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.22 Spring 2016

Adding the Branch Portion

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovfzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

MemtoRegALUSrc

ReadAddress Instruction

InstructionMemory

Add

PC

4 Shiftleft 2

Add

PCSrc

Page 22: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.23 Spring 2016

q We wait for everything to settle down● ALU might not produce “right answer” right away● Memory and RegFile reads are combinational (as are

ALU, adders, muxes, shifter, signextender)● Use write signals along with the clock edge to determine

when to write to the sequential elements (to the PC, to the Register File and to the Data Memory)

q The clock cycle time is determined by the logic delay through the longest path

Our Simple Control Structure

We are ignoring some details like register setup and hold times

Page 23: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.24 Spring 2016

Adding the Controlq Selecting the operations to perform (ALU, Register

File and Memory read/write)q Controlling the flow of data (multiplexor inputs)q Information comes from the 32 bits of the instruction

I-Type: op rs rt address offset31 25 20 15 0

R-type:31 25 20 15 5 0

op rs rt rd functshamt

10

q Observations● op field always

in bits 31-26● addr of two

registers to be read are always specified by the rs and rt fields (bits 25-21 and 20-16)

● base register for lw and sw always in rs (bits 25-21) ● addr. of register to be written is in one of two places – in rt

(bits 20-16) for lw; in rd (bits 15-11) for R-type instructions● offset for beq, lw, and sw always in bits 15-0

Page 24: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.25 Spring 2016

(Almost) Complete Single Cycle Datapath

ReadAddress Instr[31-0]

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr ALU

ovfzero

DataMemory

Address

Write Data

Read Data

MemWrite

MemRead

Register

File

ReadData 1

ReadData 2

RegWrite

SignExtend16 32

MemtoRegALUSrc

Shiftleft 2

Add

PCSrc

10

RegDst

0

1

10

1

0

ALUcontrol

ALUOpInstr[5-0]

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15 -11]

Page 25: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.26 Spring 2016

ALU Control

ALU control input

Function

0000 and0001 or0010 xor0011 nor0110 add1110 subtract1111 set on less than

q ALU's operation based on instruction type and function code

q Notice that we are using different encodings than in the book

Page 26: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.27 Spring 2016

ALU Control, Con’tq Controlling the ALU uses of multiple decoding levels

● main control unit generates the ALUOp bits● ALU control unit generates ALUcontrol bits

Instr op funct ALUOp action ALUcontrollw xxxxxx 00 add 0110sw xxxxxx 00 add 0110beq xxxxxx 01 subtract 1110add 100000 10 add 0110subt 100010 10 subtract 1110and 100100 10 and 0000or 100101 10 or 0001xor 100110 10 xor 0010nor 100111 10 nor 0011slt 101010 10 slt 1111

Page 27: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.28 Spring 2016

ALU Control Truth TableF5 F4 F3 F2 F1 F0 ALU

Op1

ALU Op0

ALU control3

ALU control2

ALU control1

ALU control0

X X X X X X 0 0 0 1 1 0X X X X X X 0 1 1 1 1 0X X 0 0 0 0 1 0 0 1 1 0X X 0 0 1 0 1 0 1 1 1 0X X 0 1 0 0 1 0 0 0 0 0X X 0 1 0 1 1 0 0 0 0 1X X 0 1 1 0 1 0 0 0 1 0X X 0 1 1 1 1 0 0 0 1 1X X 1 0 1 0 1 0 1 1 1 1

q Four, 6-input truth tables

Our ALU m control input

Add/subt Mux control

Page 28: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.29 Spring 2016

ALU Control Logicq From the truth table can design the ALU Control logicInstr[3]Instr[2]Instr[1]Instr[0]

ALUOp1

ALUOp0

ALUcontrol3

ALUcontrol2

ALUcontrol1

ALUcontrol0

Page 29: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.30 Spring 2016

(Almost) Complete Datapath with Control Unit

ReadAddress Instr[31-0]

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovf

zero

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemRead

SignExtend16 32

MemtoReg

ALUSrc

Shiftleft 2

Add

PCSrc

RegDst

ALUcontrol

1

1

1

00

0

0

1

ALUOp

Instr[5-0]

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15 -11]

ControlUnit

Instr[31-26]

Branch

Page 30: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.31 Spring 2016

R-type Instruction Data/Control Flow

ReadAddress Instr[31-0]

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovf

zero

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemRead

SignExtend16 32

MemtoReg

ALUSrc

Shiftleft 2

Add

PCSrc

RegDst

ALUcontrol

1

1

1

00

0

0

1

ALUOp

Instr[5-0]

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15 -11]

ControlUnit

Instr[31-26]

Branch

0

0

1

Page 31: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.32 Spring 2016

Store Word Instruction Data/Control Flow

ReadAddress Instr[31-0]

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovf

zero

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemRead

SignExtend16 32

MemtoReg

ALUSrc

Shiftleft 2

Add

PCSrc

RegDst

ALUcontrol

1

1

1

00

0

0

1

ALUOp

Instr[5-0]

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15 -11]

ControlUnit

Instr[31-26]

Branch

0 1

0

Page 32: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.33 Spring 2016

Load Word Instruction Data/Control Flow

ReadAddress Instr[31-0]

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovf

zero

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemRead

SignExtend16 32

MemtoReg

ALUSrc

Shiftleft 2

Add

PCSrc

RegDst

ALUcontrol

1

1

1

00

0

0

1

ALUOp

Instr[5-0]

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15 -11]

ControlUnit

Instr[31-26]

Branch

0

1

1

Page 33: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.34 Spring 2016

Branch Instruction Data/Control Flow

ReadAddress Instr[31-0]

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovf

zero

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemRead

SignExtend16 32

MemtoReg

ALUSrc

Shiftleft 2

Add

PCSrc

RegDst

ALUcontrol

1

1

1

00

0

0

1

ALUOp

Instr[5-0]

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15 -11]

ControlUnit

Instr[31-26]

Branch

0

0

0

Page 34: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.35 Spring 2016

Main Control Unit

Instr RegDst ALUSrc MemReg RegWr MemRd MemWr Branch ALUOp

R-type000000

1 0 0 1 0 0 0 10

lw100011

0 1 1 1 1 0 0 00

sw101011

X 1 X 0 0 1 0 00

beq000100

X 0 X 0 0 0 1 01

Page 35: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.36 Spring 2016

Control Unit Logicq From the truth table can design the Main Control logicInstr[31]Instr[30]Instr[29]Instr[28]Instr[27]Instr[26]

R-type lw sw beqRegDst

ALUSrc

MemtoReg

RegWrite

MemRead

MemWrite

Branch

ALUOp1

ALUOp0

Page 36: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.37 Spring 2016

Review: Handling Jump Operationsq Jump operation have to

● replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits

ReadAddress Instruction

InstructionMemory

Add

PC

4

Shiftleft 2

Jumpaddress

26

4

28

J-Type: op jump target address

31 0

Page 37: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.38 Spring 2016

Adding the Jump Operation

ReadAddress Instr[31-0]

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

ReadData 1

ReadData 2

ALU

ovf

zero

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemRead

SignExtend16 32

MemtoReg

ALUSrc

Shiftleft 2

Add

PCSrc

RegDst

ALUcontrol

1

1

1

00

0

0

1

ALUOp

Instr[5-0]

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15 -11]

ControlUnit

Instr[31-26]

Branch

Shiftleft 2

0

1

Jump

32Instr[25-0]

26PC+4[31-28]

28

0 0

0

Page 38: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.39 Spring 2016

EX: Main Control Unit of j

Instr RegDst ALUSrc MemReg RegWr MemRd MemWr Branch ALUOp Jump

R-type000000

1 0 0 1 0 0 0 10 0

lw100011

0 1 1 1 1 0 0 00 0

sw101011

X 1 X 0 0 1 0 00 0

beq000100

X 0 X 0 0 0 1 01 0

j000010

X X X 0 0 0 X XX 1

Page 39: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.40 Spring 2016

Single Cycle Implementation Cycle Time

q Unfortunately, though simple, the single cycle approach is not used because it is very slow

q Clock cycle must have the same length for every instruction

q What is the longest path (slowest instruction)?

Page 40: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.41 Spring 2016

EX: Instruction Critical Paths

Instr. I Mem Reg Rd ALU Op D Mem Reg Wr TotalR-typeloadstorebeqjump

4 1 2 1 8

4 1 2 4 1 12

q Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires) except:

● Instruction and Data Memory (4 ns)● ALU and adders (2 ns)● Register File access (reads or writes) (1 ns)

4 1 2 4 114 1 2 74 4

Page 41: CENG 3420 Computer Organization and Design Lecture 06

CEG3420 L06.42 Spring 2016

Single Cycle Disadvantages & Advantagesq Uses the clock cycle inefficiently – the clock cycle

must be timed to accommodate the slowest instr● especially problematic for more complex instructions like

floating point multiply

q May be wasteful of area since some functional units (e.g., adders) must be duplicated since they can not be shared during a clock cycle

butq It is simple and easy to understand

Clk

lw sw Waste

Cycle 1 Cycle 2