Designing a Single Cycle Datapath or The Do-It-Yourself CPU Kit. Reading 4.4 – HW due Monday. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Designing a Single Cycle Datapath

or

The Do-It-Yourself CPU Kit

Reading 4.4 – HW due Monday

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

The Big Picture: Where are We Now?

• The Five Classic Components of a Computer: Control, Datapath, Memory, Input, Output (Control and Datapath together make up the Processor)

• Today’s Topic: Datapath Design, then Control Design

The Big Picture: The Performance Perspective

• Processor design (datapath and control) will determine:

– Clock cycle time

– Clock cycles per instruction

• Starting today:

– Single cycle processor: execute an entire instruction in one clock cycle

Advantage: one clock cycle per instruction

Disadvantage: long cycle time

• ET = Insts * CPI * Cycle Time

The Processor: Datapath & Control

• We're ready to look at an implementation of the MIPS simplified to contain only:

– memory-reference instructions: lw, sw

– arithmetic-logical instructions: add, sub, and, or, slt

– control flow instructions: beq

• Generic Implementation:

– use the program counter (PC) to supply instruction address

– get the instruction from memory

– read registers

– use the instruction to decide exactly what to do

Let’s look at some regularity in our instructions

Review: Two Types of Logic Components

• State Element: inputs A, B, and clk; output C = f(A, B, state)

• Combinational Logic: inputs A and B; output C = f(A, B)

Clocking Methodology

• All storage elements are clocked by the same clock edge

[Figure: timing diagram of Clk showing the Setup and Hold windows around each clock edge; signal values are Don’t Care outside those windows]

Consequently, our cycle time will be the sum of:

(a) the clock-to-Q time of the input registers,

(b) the longest delay path through the combinational logic block,

(c) the setup time of the output register, and

(d) the clock skew.

To avoid a hold time violation, you also have to make sure the hold-time inequality is satisfied. (Draw CT, the cycle time.)
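The cycle-time sum can be sketched numerically. The delay values below are made-up numbers for illustration only, and the hold check uses the usual textbook inequality (clock-to-Q plus shortest path delay minus skew must exceed the hold time), which is an assumption here because the slide does not reproduce the inequality it refers to.

```python
def min_cycle_time(clk_to_q, longest_path, setup, skew):
    """Cycle time as the sum of the four terms (a) through (d)."""
    return clk_to_q + longest_path + setup + skew


def hold_ok(clk_to_q, shortest_path, hold, skew):
    """Assumed textbook hold check: new data must not arrive inside the hold window."""
    return clk_to_q + shortest_path - skew > hold


# Hypothetical delays in nanoseconds, chosen only for illustration.
cycle_ns = min_cycle_time(clk_to_q=1.0, longest_path=12.0, setup=0.5, skew=0.5)
print("minimum cycle time (ns):", cycle_ns)
print("hold constraint met:", hold_ok(clk_to_q=1.0, shortest_path=2.0, hold=0.5, skew=0.5))
```

Note that the longest path sets the cycle time while the shortest path matters for the hold check.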

Which is correct about the ALU and memory in MIPS?

A. The ALU always performs an operation before accessing data memory

B. The ALU sometimes performs an operation before accessing data memory

C. Data memory is always accessed before performing an ALU operation

D. Data memory is sometimes accessed before performing an ALU operation

E. None of the above.

Isomorphic

Which is correct about the ALU and the register file in MIPS?

A. The ALU always performs an operation before accessing the register file

B. The ALU sometimes performs an operation before accessing the register file

C. The register file is always accessed before performing an ALU operation

D. The register file is sometimes accessed before performing an ALU operation

E. None of the above.

Isomorphic

So what does this tell us?

Draw the register file before ALU before memory

Register Transfer Language (RTL)

• is a mechanism for describing the movement and manipulation of data between storage elements:

R[3] <- R[5] + R[7]

PC <- PC + 4 + R[5]

R[rd] <- R[rs] + R[rt]

R[rt] <- Mem[R[rs] + immed]

We’ll be using this from time to time. It’s just a shorthand for what is going on in hardware, and we’ll use it in a second.

Review: The MIPS Instruction Formats

• All MIPS instructions are 32 bits long. The three instruction formats:

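R-type: op (6 bits, 31–26) | rs (5 bits, 25–21) | rt (5 bits, 20–16) | rd (5 bits, 15–11) | shamt (5 bits, 10–6) | funct (6 bits, 5–0)

I-type: op (6 bits, 31–26) | rs (5 bits, 25–21) | rt (5 bits, 20–16) | immediate (16 bits, 15–0)

J-type: op (6 bits, 31–26) | target address (26 bits, 25–0)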

Before we start designing our processor – we need to know how the instructions look alike.

MIPS is simple – only 3 formats and they have some common features. Let’s look more closely at the few instructions we are focusing on today.
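The shared field layout can be sketched as a small decoder. This is a minimal illustration, not part of the lecture materials; the function name `decode` and the dictionary keys are hypothetical, while the bit positions follow the three formats listed above.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into every field of the three formats."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31-26, present in all formats
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21 (R-type, I-type)
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16 (R-type, I-type)
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct":  instr & 0x3F,           # bits 5-0   (R-type)
        "imm":    instr & 0xFFFF,         # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25-0  (J-type)
    }


# add $3, $5, $7: op = 0, rs = 5, rt = 7, rd = 3, shamt = 0, funct = 100000
word = (0 << 26) | (5 << 21) | (7 << 16) | (3 << 11) | (0 << 6) | 0b100000
fields = decode(word)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```

Extracting every field unconditionally mirrors what the hardware does: the wires for rs, rt, rd, and the immediate are always there, and control decides later which ones matter.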

The MIPS Subset

• R-type

– add rd, rs, rt

– sub, and, or, slt

Format: op (6 bits, 31–26) | rs (5 bits, 25–21) | rt (5 bits, 20–16) | rd (5 bits, 15–11) | shamt (5 bits, 10–6) | funct (6 bits, 5–0)

RTL: R[rd] = R[rs] OP R[rt]; PC = PC + 4

• LOAD and STORE

– lw rt, rs, imm16

– sw rt, rs, imm16

Format: op (6 bits, 31–26) | rs (5 bits, 25–21) | rt (5 bits, 20–16) | immediate (16 bits, 15–0)

RTL: R[rt] = Mem[R[rs] + SE(imm)] (lw) OR Mem[R[rs] + SE(imm)] = R[rt] (sw); PC = PC + 4

• BRANCH

– beq rs, rt, imm16

Format: op (6 bits, 31–26) | rs (5 bits, 25–21) | rt (5 bits, 20–16) | displacement (16 bits, 15–0)

RTL: ZERO = (R[rs] – R[rt] == 0); if (ZERO) PC = PC + 4 + (SE(imm) << 2), else PC = PC + 4
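The RTL for this subset can be read as a tiny interpreter. The sketch below is only an illustration of the register transfers, not a model of the hardware; the function `step`, its argument names, and the use of a list for registers and a dict for memory are all hypothetical choices.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm):
    """SE(imm): interpret a 16-bit field as a signed two's-complement value."""
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed(x):
    """Interpret a 32-bit pattern as a signed value (needed for slt)."""
    return x - (1 << 32) if x & 0x80000000 else x


def step(pc, regs, mem, name, rs, rt, rd=0, imm=0):
    """Apply the RTL of one instruction and return the next PC."""
    r_ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "and": lambda a, b: a & b,
        "or":  lambda a, b: a | b,
        "slt": lambda a, b: int(to_signed(a) < to_signed(b)),
    }
    if name in r_ops:                       # R[rd] = R[rs] OP R[rt]
        regs[rd] = r_ops[name](regs[rs], regs[rt]) & MASK
    elif name == "lw":                      # R[rt] = Mem[R[rs] + SE(imm)]
        regs[rt] = mem.get((regs[rs] + sign_extend16(imm)) & MASK, 0)
    elif name == "sw":                      # Mem[R[rs] + SE(imm)] = R[rt]
        mem[(regs[rs] + sign_extend16(imm)) & MASK] = regs[rt]
    elif name == "beq":                     # taken branch replaces PC + 4
        if regs[rs] - regs[rt] == 0:
            return (pc + 4 + (sign_extend16(imm) << 2)) & MASK
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return (pc + 4) & MASK


regs, mem, pc = [0] * 32, {}, 0
regs[5], regs[7] = 10, 32
pc = step(pc, regs, mem, "add", rs=5, rt=7, rd=3)
pc = step(pc, regs, mem, "sw", rs=0, rt=3, imm=8)
pc = step(pc, regs, mem, "lw", rs=0, rt=9, imm=8)
print(regs[3], mem[8], regs[9], pc)
```

Every path ends by producing a next PC, which is why the datapath needs a PC + 4 adder for all instructions and a second adder only for branches.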

BEFORE GOING ON… quick reminder…

Storage Element: Register

• Register: similar to the D Flip Flop except

– N-bit input and output

– Write Enable input

• Write Enable:

– 0: Data Out will not change

– 1: Data Out will become Data In (on the clock edge)

[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]

Which of these describes our register file?

A. Two 32-bit outputs, 3 5-bit inputs, clk input, 1-bit control input

B. Two 32-bit outputs, 3 32-bit inputs, clk input, 1-bit control input

C. Two 32-bit outputs, 2 5-bit inputs, 1 32-bit input, clk input, 1-bit control input

D. Two 32-bit outputs, 2 32-bit inputs, 1 32-bit input, clk input, 1-bit control input

E. None of the above

Register File

[Figure: register file holding 32 32-bit registers. Inputs: 5-bit register numbers RR1, RR2, and WR; 32-bit Write Data; RegWrite; Clk. Outputs: 32-bit Read Data 1 and 32-bit Read Data 2.]
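The register file interface can be sketched behaviorally. This is a minimal illustration under assumptions: the class and method names are hypothetical, reads are modeled as combinational, and the write happens only when `clock_edge` is called with RegWrite asserted. It does not model register 0 being hardwired to zero.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rr1, rr2):
        """Combinational read: returns (Read Data 1, Read Data 2)."""
        return self.regs[rr1 & 0x1F], self.regs[rr2 & 0x1F]

    def clock_edge(self, wr, write_data, reg_write):
        """On the clock edge, update register WR only if RegWrite is 1."""
        if reg_write:
            self.regs[wr & 0x1F] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(wr=3, write_data=42, reg_write=1)
rf.clock_edge(wr=4, write_data=7, reg_write=0)   # ignored because RegWrite is 0
print(rf.read(3, 4))
```

The three 5-bit inputs are register numbers, not data, which is why they are only wide enough to select one of 32 registers.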

Which of these describes our memory (for now)?

A. One 32-bit output, 1 5-bit input, 1 32-bit input, clk input, 1-bit control input, 1 bit control input

B. One 32-bit output, 2 5-bit inputs, clk input, 1-bit control input, 1 bit control input

C. One 32-bit output, 2 32-bit inputs, clk input, 2 1-bit control inputs

D. One 32-bit output, 1 32-bit input, clk input, 2 1-bit control inputs

E. None of the above

Memory

[Figure: memory with Address and Write Data inputs, a Read Data output, 32-bit data lines, MemRead and MemWrite control inputs, and Clk]

Can we lay out a high-level design to do this?

Draw as much as you can implementing one instruction at a time – get the students involved

You’ll want to do something like this for your lab

Putting it All Together: A Single Cycle Datapath

• We have everything except control signals (later)

Ignoring control, which instruction does this active datapath represent?

A. R-type

B. lw

C. sw

D. beq

E. None of the above

[Figure: Active Single-Cycle Datapath]

Ignoring control, which instruction does this active datapath represent?

A. R-type

B. lw

C. sw

D. beq

E. None of the above

[Figure: Active Single-Cycle Datapath]

Ignoring control, which instruction does this active datapath represent?

A. R-type

B. lw

C. sw

D. beq

E. None of the above

[Figure: Active Single-Cycle Datapath]

Ignoring control, which instruction does this active datapath represent?

A. R-type

B. lw

C. sw

D. beq

E. None of the above

[Figure: Active Single-Cycle Datapath]

Key Points

• CPU is just a collection of state and combinational logic

• We just designed a very rich processor, at least in terms of functionality

• ET = IC * CPI * Cycle Time

– where does the single-cycle machine fit in?

Control Logic for the Single-Cycle CPU

or

Who’s in charge here?

Putting it All Together: A Single Cycle Datapath

• We have everything except control signals

We’re going to connect all these signals to a central place and control them from there, based on opcode/funct.

Okay, then, what about those Control Signals?

Point out we’ve just hooked these up.

Peer instruction question asking if decode can happen in parallel with register read.

Selection

Select the true statement for MIPS

A Registers can be read in parallel with control signal generation

B Instruction Read can be done in parallel with control signal generation

C Registers can be written in parallel with control signal generation

D The main ALU can execute in parallel with control signal generation

E None of the above

Okay, then, what about those Control Signals?

Start here

Notice control bits come from opcode and sometimes function code bits. R-type are the same except for the ALU

ALU control bits

• Recall: 5-function ALU

ALU control input   Function   Operations
000                 And        and
001                 Or         or
010                 Add        add, lw, sw
110                 Subtract   sub, beq
111                 Slt        slt

Take your time here, this isn’t obvious. These are the 3 bit input signals which cause the processor to do what you want.

Full ALU

sign bit (adder output from bit 31)

What signals accomplish and? or? add? sub? beq? slt?

Function   Binvert   CIn   Operation
And        0         0     0
Or         0         0     1
Add        0         0     2
Sub        1         1     2
Beq        1         1     2
Slt        1         1     3

Consolidate to 3 wires since Binvert and CIn are always the same
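A behavioral sketch of the consolidated 3-wire control, assuming bit 2 drives both Binvert and CIn and bits 1-0 select the Operation as in the table above. The function name `alu` is hypothetical, and slt here simply takes the sign bit of the subtraction, so overflow is ignored.

```python
MASK = 0xFFFFFFFF


def alu(ctrl, a, b):
    """32-bit ALU: ctrl bit 2 = Binvert and CIn, ctrl bits 1-0 = Operation."""
    binvert = (ctrl >> 2) & 1
    operation = ctrl & 0b11
    b_in = (~b & MASK) if binvert else (b & MASK)
    # a + b when binvert is 0; a + ~b + 1 = a - b when binvert is 1
    total = ((a & MASK) + b_in + binvert) & MASK
    if operation == 0:
        result = a & b_in
    elif operation == 1:
        result = a | b_in
    elif operation == 2:
        result = total
    else:
        result = (total >> 31) & 1      # slt: sign bit of a - b
    result &= MASK
    zero = int(result == 0)             # Zero output, used by beq
    return result, zero


print(alu(0b010, 2, 3))   # add
print(alu(0b110, 7, 7))   # subtract, as beq does
print(alu(0b111, 3, 5))   # slt
```

Subtraction needs both Binvert and CIn because a - b is computed as a + (not b) + 1, which is why the two signals can share one wire.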

ALU control bits

• Recall: 5-function ALU

• based on opcode (bits 31-26) and function code (bits 5-0) from instruction

• ALU doesn’t need to know all opcodes; we will summarize opcode with ALUOp (2 bits):

00 - lw, sw
01 - beq
10 - R-format

ALU control input   Function   Operations
000                 And        and
001                 Or         or
010                 Add        add, lw, sw
110                 Subtract   sub, beq
111                 Slt        slt

[Figure: Main Control takes the 6-bit op field and produces the 2-bit ALUop; ALU Control takes ALUop and the 6-bit func field and produces the 3-bit ALUctr]

Generating ALU control

Instruction opcode   ALUOp   Instruction operation   Function code   Desired ALU action   ALU control input
lw                   00      load word               xxxxxx          add                  010
sw                   00      store word              xxxxxx          add                  010
beq                  01      branch eq               xxxxxx          subtract             110
R-type               10      add                     100000          add                  010
R-type               10      subtract                100010          subtract             110
R-type               10      AND                     100100          and                  000
R-type               10      OR                      100101          or                   001
R-type               10      slt                     101010          slt                  111

[Figure: ALU Control Logic block]

Essentially a truth table, and we can design logic to do this.
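As a software analogy, the truth table can be written as a lookup. This is a sketch for illustration only; the function name `alu_control` is hypothetical, and the encodings are the ones listed in the table for this slide.

```python
def alu_control(alu_op, funct):
    """Map 2-bit ALUOp and 6-bit function code to the 3-bit ALU control input."""
    if alu_op == 0b00:            # lw, sw: function code is a don't care
        return 0b010              # add
    if alu_op == 0b01:            # beq: function code is a don't care
        return 0b110              # subtract
    if alu_op == 0b10:            # R-type: the function code decides
        r_type = {
            0b100000: 0b010,      # add
            0b100010: 0b110,      # subtract
            0b100100: 0b000,      # and
            0b100101: 0b001,      # or
            0b101010: 0b111,      # slt
        }
        return r_type[funct]
    raise ValueError("ALUOp 11 is not used by this instruction subset")


print(format(alu_control(0b10, 0b101010), "03b"))
print(format(alu_control(0b00, 0b000000), "03b"))
```

In hardware the same table becomes a handful of gates rather than a lookup, because so many of its inputs are don't cares.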

Generating individual ALU signals

ALUop   Function   ALUCtr signals
00      xxxx       010
01      xxxx       110
10      0000       010
10      0010       110
10      0100       000
10      0101       001
10      1010       111

ALUctr2 = ?   ALUctr1 = ?   ALUctr0 = ?

[Figure: Main Control takes the 6-bit op field and produces the 2-bit ALUop; ALU Control takes ALUop and the 6-bit func field and produces the 3-bit ALUctr]

Candidate expressions:

A: (Op1)(!Op0)(F0+F3)

B: !Op1 + !F2

C: Op0 + Op1F1

Select   ALUctr2   ALUctr1   ALUctr0
A        A         B         C
B        A         C         B
C        B         C         A
D        C         B         A
E        None of the above

ALUctr2 = Op0 + Op1F1, ALUctr1 = !Op1 + !F2, ALUctr0 = Op1 !Op0 (F0+F3)
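One way to check candidate equations against the table is to evaluate them on every row. The sketch below assumes the expressions read ALUctr2 = Op0 + Op1·F1, ALUctr1 = !Op1 + !F2, and ALUctr0 = Op1·!Op0·(F0+F3); the negations are an interpretation, since overbars are easily lost in slide text. All names in the code are hypothetical.

```python
from itertools import product

# Table rows: (ALUOp1, ALUOp0), function bits F3..F0 (None = don't care), ALUctr2..0
ROWS = [
    ((0, 0), None,         (0, 1, 0)),
    ((0, 1), None,         (1, 1, 0)),
    ((1, 0), (0, 0, 0, 0), (0, 1, 0)),
    ((1, 0), (0, 0, 1, 0), (1, 1, 0)),
    ((1, 0), (0, 1, 0, 0), (0, 0, 0)),
    ((1, 0), (0, 1, 0, 1), (0, 0, 1)),
    ((1, 0), (1, 0, 1, 0), (1, 1, 1)),
]


def alu_ctr(op1, op0, f3, f2, f1, f0):
    """Assumed equations, with !x written as (1 - x) on single bits."""
    ctr2 = op0 | (op1 & f1)
    ctr1 = (1 - op1) | (1 - f2)
    ctr0 = op1 & (1 - op0) & (f0 | f3)
    return ctr2, ctr1, ctr0


def equations_match_table():
    """Return True if the equations reproduce every row of the table."""
    for (op1, op0), funct, expected in ROWS:
        # Expand don't-care function bits into all 16 combinations.
        candidates = [funct] if funct is not None else list(product((0, 1), repeat=4))
        for f3, f2, f1, f0 in candidates:
            if alu_ctr(op1, op0, f3, f2, f1, f0) != expected:
                return False
    return True


print(equations_match_table())
```

Expanding the don't-care rows matters: an equation that happened to depend on a function bit for lw, sw, or beq would pass a spot check but fail the exhaustive one.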

add instruction control signals?

Select   RegDst   MemToReg   ALUOp
A        0        X          00
B        1        0          00
C        0        X          10
D        1        0          10
E        None of the above

ISOMORPHIC

sw instruction control signals?

Select   ALUSrc   RegDst   ALUOp
A        0        0        00
B        1        X        00
C        0        0        10
D        1        X        10
E        None of the above

ISOMORPHIC

beq Control

Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
R-format
lw
sw
beq

Ultimately we can generate the control signals for all instructions.

Branches are a bit trickier, so let’s do this together.

Control Truth Table

            R-format   lw       sw       beq
Opcode      000000     100011   101011   000100
Outputs:
RegDst      1          0        x        x
ALUSrc      0          1        1        0
MemtoReg    0          1        x        x
RegWrite    1          1        0        0
MemRead     0          1        0        0
MemWrite    0          0        1        0
Branch      0          0        0        1
ALUOp1      1          0        0        0
ALUOp0      0          0        0        1

Here’s a truth table, which means we can design the logic to implement it.
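The control truth table can be sketched as a lookup keyed by opcode. This is only a software analogy for the combinational logic; the names `CONTROL` and `main_control` are hypothetical, `None` stands for a don't care (x), and ALUOp1/ALUOp0 are combined into one 2-bit value.

```python
X = None  # don't care

CONTROL = {
    # opcode: control outputs, copied from the truth table
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),   # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),   # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}


def main_control(opcode):
    """Return the control signals for one of the four supported opcodes."""
    return CONTROL[opcode]


signals = main_control(0b100011)   # lw
print(signals["RegWrite"], signals["MemRead"], signals["ALUSrc"])
```

RegDst and MemtoReg are don't cares exactly where RegWrite is 0: if nothing is written to the register file, it does not matter which register or which data source was selected.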

Control

• Simple combinational logic (truth tables)

[Figure: ALU control block with inputs ALUOp (ALUOp1, ALUOp0) and function field F (5–0) bits F3–F0, and outputs Operation2, Operation1, Operation0]

[Figure: main control logic with inputs Op5–Op0, decoded into R-format, lw, sw, and beq, and outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]

Here’s the truth table

Which wire – if always ZERO – would break add?

[Figure: single-cycle datapath with candidate wires labeled A, B, C, and D]

ISOMORPHIC

Which wire – if always ONE – would break lw?

[Figure: single-cycle datapath with candidate wires labeled A, B, C, and D]

ISOMORPHIC

Add new instructions

• Potentially requires modifying the datapath

• Potentially requires adding more control wires – which would impact our previous truth table.

Do we need to modify our single-cycle design to do jr?

Select Best Answer

A Yes – we need both new control and datapath.

B Yes – we need just datapath.

C No – but we should for better performance.

D No – just changing control signals is fine.

E Single cycle can’t do jump register.

ISOMORPHIC

Single-Cycle CPU Summary

• Easy, particularly the control

• Which instruction takes the longest? By how much? Why is that a problem?

• ET = IC * CPI * CT

• What else can we do?

• When does a multi-cycle implementation make sense?

– e.g., 70% of instructions take 75 ns, 30% take 200 ns?

– suppose 20% overhead for extra latches

• Real machines have much more variable instruction latencies than this.

200 ns vs. (200*0.3 + 75*0.7)*1.2 = (60 + 52.5)*1.2 = 135 ns
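The comparison can be reproduced in a few lines. This sketch uses only the numbers from the slide (70% at 75 ns, 30% at 200 ns, 20% latch overhead); the helper name `avg_latency` is hypothetical.

```python
def avg_latency(mix):
    """Weighted average latency for (fraction, latency_ns) pairs."""
    return sum(fraction * latency for fraction, latency in mix)


# Single cycle: every instruction pays for the slowest one.
single_cycle_ns = 200

# Multi-cycle: each instruction pays its own latency, plus 20% latch overhead.
multi_cycle_ns = avg_latency([(0.30, 200), (0.70, 75)]) * 1.2

print(single_cycle_ns, round(multi_cycle_ns, 1))
print("speedup:", round(single_cycle_ns / multi_cycle_ns, 2))
```

With the same instruction count, the ratio of the two average instruction times is the ratio of the execution times, so ET = IC * CPI * CT favors the multi-cycle design here despite the overhead.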