CS141-L4-1 Tarun Soni, Summer’03

Single Cycle CPU

Previously: built an ALU. Today: actually build a CPU.

Questions on CS140 ? Computer Arithmetic ?

• Attend office hours with TAs or me.
• Do the exercises in the text.

CS141-L4-2 Tarun Soni, Summer’03

The Story so far:

• Instruction Set Architectures
• Performance issues
• 2s complement, Addition, Subtraction
• Multiplication, Division, Floating Point numbers

Basically ISA & ALU stuff

CS141-L4-3 Tarun Soni, Summer’03

CPU: Building blocks

• Adder

• MUX

• ALU

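[Figure: three building blocks. Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry. MUX: 32-bit inputs A and B plus Select; 32-bit output Y. ALU: 32-bit inputs A and B plus OP; 32-bit output Result.]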

CS141-L4-4 Tarun Soni, Summer’03

CPU: Building blocks

[Figure: a MUX (inputs A, B, Select; output Y) and two 32-bit adders, each with CarryIn and Carry. One adder computes Sum[31..0] from A[31..0] and B[31..0]; the other computes Sum[63..32] from A[63..32] and B[63..32].]

• Building a 64-bit adder from 2x32-bit adders
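A minimal Python sketch of this construction, assuming the Carry of the low adder drives the CarryIn of the high adder. The names `adder32` and `adder64` are illustrative and do not come from the slides.

```python
MASK32 = (1 << 32) - 1

def adder32(a, b, carry_in=0):
    """32-bit adder building block: returns (Sum, Carry)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def adder64(a, b, carry_in=0):
    """64-bit adder made of two 32-bit adders chained through the carry."""
    lo_sum, lo_carry = adder32(a & MASK32, b & MASK32, carry_in)
    hi_sum, hi_carry = adder32(a >> 32, b >> 32, lo_carry)
    return (hi_sum << 32) | lo_sum, hi_carry

# The carry out of bit 31 ripples into the upper half.
print(adder64(0xFFFFFFFF, 1))
```

The upper adder cannot start until the lower adder's Carry is known, so in this arrangement the two adder delays add up.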

CS141-L4-5 Tarun Soni, Summer’03

CPU: Building blocks

[Figure: one 32-bit adder computes Sum[31..0] from A[31..0] and B[31..0]. Two more 32-bit adders both add A[63..32] and B[63..32], one with Cin=0 and one with Cin=1. MUXes with a Select input choose between the two results to produce Sum[63..32] and the final carry out.]

• Silicon is cheap – sort-of
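The figure for this slide can be read as a carry-select arrangement: spend a third adder so that both possible upper sums are ready before the lower carry arrives. A minimal Python sketch under that reading follows; the function names are illustrative, and it assumes the low adder's Carry is what drives the MUX Select.

```python
MASK32 = (1 << 32) - 1

def adder32(a, b, carry_in):
    """32-bit adder building block: returns (Sum, Carry)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def mux(select, in0, in1):
    """2-to-1 MUX: in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0

def carry_select_adder64(a, b):
    """64-bit add using three 32-bit adders and two MUXes."""
    lo_sum, lo_carry = adder32(a & MASK32, b & MASK32, 0)
    hi0, cout0 = adder32(a >> 32, b >> 32, 0)   # upper sum if Cin = 0
    hi1, cout1 = adder32(a >> 32, b >> 32, 1)   # upper sum if Cin = 1
    hi_sum = mux(lo_carry, hi0, hi1)
    carry_out = mux(lo_carry, cout0, cout1)
    return (hi_sum << 32) | lo_sum, carry_out

print(carry_select_adder64(0xFFFFFFFF, 1))
```

In hardware the three adders work in parallel, so the delay is roughly one adder plus one MUX, at the cost of extra silicon.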

CS141-L4-6 Tarun Soni, Summer’03

CPU

Single Cycle CPU

CS141-L4-7 Tarun Soni, Summer’03

CPU

The Big Picture: Where are We Now?

• The Five Classic Components of a Computer

• Datapath Design, then Control Design

[Figure: the five components. Processor (Control and Datapath), Memory, Input, Output.]

CS141-L4-8 Tarun Soni, Summer’03

CPU: The big picture

• Instruction Fetch
• Instruction Decode
• Operand Fetch
• Execute
• Result Store
• Next Instruction

° Design hardware for each of these steps!!!

Execute an entire instruction: Fetch, Decode, Fetch, Execute, Store, Next

CS141-L4-9 Tarun Soni, Summer’03

CPU: Clocking

[Figure: clock waveform Clk. Around each clock edge there is a Setup and a Hold window; signals are Don’t Care outside those windows.]

• All storage elements are clocked by the same clock edge

CS141-L4-10 Tarun Soni, Summer’03

CPU

The Big Picture: The Performance Perspective

• Execution Time = Insts * CPI * Cycle Time
• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
• Starting today:
– Single cycle processor:
• Advantage: One clock cycle per instruction
• Disadvantage: long cycle time

Execute an entire instruction
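As a quick worked example of the execution-time formula, here is a small Python sketch. The instruction count and the 8 ns cycle time are hypothetical numbers chosen only for illustration; the slides give no figures.

```python
def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution Time = Insts * CPI * Cycle Time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical single cycle processor: CPI is 1 by construction,
# but the cycle must be long enough for the slowest instruction.
single_cycle = execution_time_ns(1_000_000, cpi=1, cycle_time_ns=8)
print(single_cycle)  # 8000000
```

With CPI fixed at 1, the only lever left in this design is the cycle time, which is why its length matters so much here.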

CS141-L4-11 Tarun Soni, Summer’03

CPU

• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq
• Generic Implementation:
– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
• All instructions use the ALU after reading the registers (memory-reference? arithmetic? control flow?)

[Figure: Inst. Count, CPI, Cycle Time]

CS141-L4-12 Tarun Soni, Summer’03

CPU

Review: The MIPS Instruction Formats

J-type: op (bits 31..26, 6 bits) | target address (25..0, 26 bits)

R-type: op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | rd (15..11, 5 bits) | shamt (10..6, 5 bits) | funct (5..0, 6 bits)

I-type: op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits)

° The different fields are:
• op: operation of the instruction
• rs, rt, rd: the source and destination register specifiers
• shamt: shift amount
• funct: selects the variant of the operation in the “op” field
• address / immediate: address offset or immediate value
• target address: target address of the jump instruction
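The field boundaries can be checked with a few shifts and masks. The sketch below is illustrative only; the `decode` helper and its dictionary keys are assumptions, not something defined in the lecture.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into every field it might hold.

    Which fields are meaningful depends on the format: R-type uses
    rd/shamt/funct, I-type uses imm16, J-type uses target.
    """
    return {
        "op":     (instr >> 26) & 0x3F,      # bits 31..26
        "rs":     (instr >> 21) & 0x1F,      # bits 25..21
        "rt":     (instr >> 16) & 0x1F,      # bits 20..16
        "rd":     (instr >> 11) & 0x1F,      # bits 15..11
        "shamt":  (instr >> 6) & 0x1F,       # bits 10..6
        "funct":  instr & 0x3F,              # bits 5..0
        "imm16":  instr & 0xFFFF,            # bits 15..0
        "target": instr & 0x3FFFFFF,         # bits 25..0
    }

# add with rd=3, rs=1, rt=2: op = 0, funct = 10 0000
fields = decode(0x00221820)
print(fields["rs"], fields["rt"], fields["rd"], bin(fields["funct"]))
```

Because rs and rt sit in the same bit positions in the R-type and I-type formats, the register file can be read before the instruction is fully decoded.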

CS141-L4-13 Tarun Soni, Summer’03

CPU

• R-type
– add rd, rs, rt
– sub, and, or, slt

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | rd (15..11, 5 bits) | shamt (10..6, 5 bits) | funct (5..0, 6 bits)

• LOAD and STORE
– lw rt, rs, imm16
– sw rt, rs, imm16

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits)

• BRANCH:
– beq rs, rt, imm16

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | displacement (15..0, 16 bits)

CS141-L4-14 Tarun Soni, Summer’03

CPU

Requirements to implement the ISA

• Memory
– instruction & data
• Registers (32 x 32)
– read RS
– read RT
– Write RT or RD
• PC
• Extender
• Add and Sub register or extended immediate
• Add 4 or extended immediate to PC

CS141-L4-15 Tarun Soni, Summer’03

CPU

• Combinational Elements
– Combinational Logic: inputs A, B; output C = f(A,B)
• Storage Elements
– State Element: inputs A, B, clk; output C = f(A,B,state) {State[n] = f(A,B,state[n-1])}
– Clocking methodology

CS141-L4-16 Tarun Soni, Summer’03

CPU: Storage unit

• The set-reset latch
– output depends on present inputs and also on past inputs

CS141-L4-17 Tarun Soni, Summer’03

CPU: D-flip flop

• Two inputs:
– the data value to be stored (D)
– the clock signal (C) indicating when to read & store D
• Two outputs:
– the value of the internal state (Q) and its complement (_Q)
• Output changes only on the clock edge

[Figure: D flip-flop symbol with inputs D and C and outputs Q and _Q; its construction from two D latches; timing diagram of D, C, and Q.]

CS141-L4-18 Tarun Soni, Summer’03

CPU: Clocking Methodology

• An edge triggered methodology
• Typical execution:
– read contents of some state elements,
– send values through some combinational logic
– write results to one or more state elements

[Figure: within one clock cycle, State element 1 feeds Combinational logic, which feeds State element 2.]

CS141-L4-19 Tarun Soni, Summer’03

CPU: Storage block

• Register
– Similar to the D Flip Flop except
• N-bit input and output
• Write Enable input
– Write Enable:
• 0: Data Out will not change
• 1: Data Out will become Data In (on the clock edge)

[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk.]

CS141-L4-20 Tarun Soni, Summer’03

CPU: Register Files

• Register File consists of (32) registers:
– Two 32-bit output buses: busA and busB
– One 32-bit input bus: busW
• Register is selected by:
– RA selects the register to put on busA
– RB selects the register to put on busB
– RW selects the register to be written via busW when Write Enable is 1
• Clock input (CLK)
– The CLK input is a factor only during a write (Write Enable = 1)
– Otherwise, this unit acts just like combinational logic.

[Figure: register file of 32 32-bit registers with 5-bit selects RW, RA, RB; 32-bit input busW; 32-bit outputs busA and busB; Clk and Write Enable inputs.]
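A behavioral Python sketch of this register file may help fix the idea that reads are combinational while writes wait for the clock. The class and method names are hypothetical, and the sketch models a clock edge as an explicit method call.

```python
class RegisterFile:
    """32 registers of 32 bits: two read ports (busA, busB), one write port (busW)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: busA and busB follow RA and RB with no clock."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On the clock edge, register RW takes busW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, write_enable=1)
rf.clock_edge(rw=7, bus_w=99, write_enable=0)   # ignored: Write Enable is 0
print(rf.read(5, 7))
```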

CS141-L4-21 Tarun Soni, Summer’03

CPU: Register Files

[Figure: read ports. Two Muxes, driven by Read register number 1 and Read register number 2, each select one of Register 0 .. Register n to produce Read data 1 and Read data 2.]

[Figure: write port. An n-to-1 decoder on the Register number, gated by Write, drives the C input of each of Register 0 .. Register n; Register data drives every D input.]

Built using D flip-flops. Still use the real clock (not shown here) to do the actual write.

CS141-L4-22 Tarun Soni, Summer’03

CPU: Memory

• Memory (idealized)
– One input bus: Data In
– One output bus: Data Out
• Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• Address valid => Data Out valid after “access time.”

[Figure: memory block with Address, 32-bit Data In, 32-bit Data Out, Write Enable, and Clk.]

CS141-L4-23 Tarun Soni, Summer’03

CPU: RTL

Register Transfer Language (RTL)

• is a mechanism for describing the movement and manipulation of data between storage elements:

R[3] <- R[5] + R[7]
PC <- PC + 4 + R[5]
R[rd] <- R[rs] + R[rt]
R[rt] <- Mem[R[rs] + immed]

CS141-L4-24 Tarun Soni, Summer’03

CPU: More building blocks

[Figure: more building blocks.
– Instruction memory: Instruction address in, Instruction out
– Program counter (PC)
– Adder: Add, Sum
– Registers: 5-bit register numbers (Read register 1, Read register 2, Write register), Write data, Read data 1, Read data 2, RegWrite
– ALU: 3-bit ALU control, Zero, ALU result
– Sign-extension unit: 16 bits in, 32 bits out
– Data memory unit: Address, Write data, Read data, MemRead, MemWrite]

CS141-L4-25 Tarun Soni, Summer’03

CPU: The big picture

• Instruction Fetch
• Instruction Decode
• Operand Fetch
• Execute
• Result Store
• Next Instruction

° Design hardware for each of these steps!!!

Execute an entire instruction: Fetch, Decode, Fetch, Execute, Store, Next

CS141-L4-26 Tarun Soni, Summer’03

CPU: Instruction Fetch

• RTL version of the instruction fetch step:
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential Code: PC <- PC + 4
• Branch and Jump: PC <- “something else”

[Figure: PC (clocked by Clk) supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic feeds the PC.]

CS141-L4-27 Tarun Soni, Summer’03

CPU: Binary arithmetic for PC

• In theory, the PC is a 32-bit byte address into the instruction memory:
– Sequential operation: PC<31:0> = PC<31:0> + 4
– Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
• The magic number “4” always comes up because:
– The 32-bit PC is a byte address
– And all our instructions are 4 bytes (32 bits) long
• In other words:
– The 2 LSBs of the 32-bit PC are always zeros
– There is no reason to have hardware to keep the 2 LSBs
• In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
– Sequential operation: PC<31:2> = PC<31:2> + 1
– Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
– In either case: Instruction Memory Address = PC<31:2> concat “00”
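The equivalence of the 32-bit and 30-bit forms is easy to check numerically. The Python sketch below is only an illustration; the helper names and the sample PC value are assumptions.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF

def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed two's complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc_32(pc, imm16, take_branch):
    """32-bit byte-address form: PC + 4, plus SignExt[Imm16] * 4 on a branch."""
    offset = sign_ext16(imm16) * 4 if take_branch else 0
    return (pc + 4 + offset) & MASK32

def next_pc_30(pc30, imm16, take_branch):
    """30-bit form on PC<31:2>: PC + 1, plus SignExt[Imm16] on a branch."""
    offset = sign_ext16(imm16) if take_branch else 0
    return (pc30 + 1 + offset) & MASK30

def inst_address(pc30):
    """Instruction Memory Address = PC<31:2> concat "00"."""
    return pc30 << 2

pc = 0x00400010            # hypothetical, word-aligned
for taken in (False, True):
    print(hex(next_pc_32(pc, 0xFFFE, taken)),
          hex(inst_address(next_pc_30(pc >> 2, 0xFFFE, taken))))
```

Both columns agree because a word-aligned PC never has anything but zeros in its two low bits.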

CS141-L4-28 Tarun Soni, Summer’03

CPU: Instruction Fetch unit

• The common RTL operations
– Fetch the Instruction: inst <- mem[PC]
– Update the program counter:
• Sequential Code: PC <- PC + 4
• Branch and Jump: PC <- “something else”

CS141-L4-29 Tarun Soni, Summer’03

CPU: Register-Register Operations (Add, Subtract etc.)

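• R[rd] <- R[rs] op R[rt] Example: addU rd, rs, rt
– Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction

[Figure: register file of 32 32-bit registers with Rw = Rd, Ra = Rs, Rb = Rt (5 bits each), Clk, and RegWr. busA and busB (32 bits) feed the ALU, controlled by ALUctr; the 32-bit Result returns on busW.]

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | rd (15..11, 5 bits) | shamt (10..6, 5 bits) | funct (5..0, 6 bits)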

° Worry about instruction decode to generate ALUctr and RegWr later.

CS141-L4-30 Tarun Soni, Summer’03

CPU: Register - Register Timing

[Figure: the register-register datapath (register file, ALU, busA, busB, busW, ALUctr, RegWr) above a timing diagram. Each signal goes from Old Value to New Value after the named delay:
– PC: Clk-to-Q
– Rs, Rt, Rd, Op, Func: Instruction Memory Access Time
– ALUctr, RegWr: Delay through Control Logic
– busA, busB: Register File Access Time
– busW: ALU Delay
The end of the cycle is marked “Register Write Occurs Here”.]

CS141-L4-31 Tarun Soni, Summer’03

CPU: Logical Immediate Op.

• R[rt] <- R[rs] op ZeroExt[imm16]

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits); bits 15..11 are where rd would be (rd?)

ZeroExt[imm16]: bits 31..16 are 16 zeros (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0); bits 15..0 are the 16-bit immediate

[Figure: register-register datapath with two added Muxes. A Mux controlled by RegDst selects Rt or Rd as Rw (Handle Rt as destination). A Mux controlled by ALUSrc selects busB or ZeroExt(imm16) as the second ALU input (Handle immediate as operand).]

CS141-L4-32 Tarun Soni, Summer’03

CPU: Load Operations

• R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits); bits 15..11 are where rd would be

[Figure: immediate datapath extended with an Extender controlled by ExtOp and a Data Memory (Adr, Data In, WrEn driven by MemWr, Clk). The ALU output is the memory address; a Mux controlled by W_Src selects the ALU result or the memory output for busW.]

Need data Memory!

Reg-Write could be from result or data memory

CS141-L4-33 Tarun Soni, Summer’03

CPU: Store Operations

• Mem[ R[rs] + SignExt[imm16] ] <- R[rt] Example: sw rt, rs, imm16

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits)

[Figure: load datapath with busB (R[rt]) also connected to the Data In port of the Data Memory; MemWr drives WrEn. RegDst, ALUSrc, ExtOp, and W_Src Muxes as before.]

Reg can write to Data Memory

CS141-L4-34 Tarun Soni, Summer’03

CPU: Branching

• beq rs, rt, imm16

– mem[PC] Fetch the instruction from memory

– Equal <- R[rs] == R[rt] Calculate the branch condition

– if (Equal == 1) Calculate the next instruction’s address
• PC <- PC + 4 + ( SignExt(imm16) x 4 )

– else
• PC <- PC + 4

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits)

CS141-L4-35 Tarun Soni, Summer’03

CPU: Datapath for Branching

• beq rs, rt, imm16 Datapath generates condition (equal)

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | immediate (15..0, 16 bits)

[Figure: the PC (with “00” appended) supplies the Inst Address. One Adder adds 4; a second Adder adds the PC Ext of imm16; a Mux controlled by nPC_sel chooses between them. The register file reads Rs and Rt onto busA and busB, and an “Equal?” comparator produces Cond.]

Calculate (PC+4) as well as (imm16+PC+4) and choose one

Calculate the “condition” part of the branch op.

CS141-L4-36 Tarun Soni, Summer’03

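CPU: The Aggregate Datapath

[Figure: the complete datapath.
– Instruction fetch: PC (Clk, “00” appended), Inst Memory (Adr), two Adders (+4 and + PC Ext of imm16), Mux controlled by nPC_sel
– Instruction<31:0> fields: Rs = <21:25>, Rt = <16:20>, Rd = <11:15>, Imm16 = <0:15>
– Register file of 32 32-bit registers (Rw, Ra, Rb, busA, busB, busW, RegWr, Clk); Mux controlled by RegDst selects Rt or Rd
– Extender (ExtOp) and Mux controlled by ALUSrc on the second ALU input
– ALU (ALUctr) and “=” comparator producing Equal
– Data Memory (Adr, Data In, WrEn driven by MemWr, Clk); Mux controlled by MemtoReg onto busW]

Still need to worry about Instruction Decode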

CS141-L4-37 Tarun Soni, Summer’03

CPU: Datapath: High-level view

• Register file and ideal memory:
– The CLK input is a factor ONLY during write operation
– During read operation, behave as combinational logic:
• Address valid => Output valid after “access time.”

Critical Path (Load Operation) =
PC’s Clk-to-Q
+ Instruction Memory’s Access Time
+ Register File’s Access Time
+ ALU to Perform a 32-bit Add
+ Data Memory Access Time
+ Setup Time for Register File Write
+ Clock Skew

[Figure: PC and Next Address logic feed the Ideal Instruction Memory (Instruction Address in, Instruction out). The Rs, Rt, Rd (5 bits each) and Imm (16 bits) fields feed the register file of 32 32-bit registers; its 32-bit outputs A and B feed the ALU, whose result is the Data Address of the Ideal Data Memory (Data In, Clk).]

CS141-L4-38 Tarun Soni, Summer’03

CPU: Control Signals

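[Figure: Inst Memory (Adr) outputs Instruction<31:0>. The Op and Fun fields go to Control; Rs = <21:25>, Rt = <16:20>, Rd = <11:15>, Imm16 = <0:15> go to the DATA PATH. Control drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; the DATA PATH returns Equal to Control.]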

CS141-L4-39 Tarun Soni, Summer’03

CPU: Control Signals: Meaning

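• Rs, Rt, Rd and Imm16 hardwired into datapath
• nPC_sel: 0 => PC <– PC + 4; 1 => PC <– PC + 4 + SignExt(Im16) || 00

[Figure: instruction fetch unit. PC (Clk, “00” appended) addresses the Inst Memory (Adr); one Adder adds 4, a second Adder adds the PC Ext of imm16, and a Mux controlled by nPC_sel selects the next PC.]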

CS141-L4-40 Tarun Soni, Summer’03

CPU: Control Signals: Meaning

• ExtOp: “zero”, “sign”
• ALUsrc: 0 => regB; 1 => immed
• ALUctr: “add”, “sub”, “or”

° MemWr: write memory
° MemtoReg: 1 => Mem
° RegDst: 0 => “rt”; 1 => “rd”
° RegWr: write dest register

[Figure: the aggregate datapath (register file, RegDst Mux, Extender, ALUSrc Mux, ALU, “=” comparator producing Equal, Data Memory, MemtoReg Mux) with each Mux input labeled 0 or 1.]

CS141-L4-41 Tarun Soni, Summer’03

CPU: Control Signals for various operations

inst Register Transfer

ADD R[rd] <– R[rs] + R[rt]; PC <– PC + 4

ALUsrc = RegB, ALUctr = “add”, RegDst = rd, RegWr, nPC_sel = “+4”

SUB R[rd] <– R[rs] – R[rt]; PC <– PC + 4

ALUsrc = RegB, ALUctr = “sub”, RegDst = rd, RegWr, nPC_sel = “+4”

ORi R[rt] <– R[rs] or zero_ext(Imm16); PC <– PC + 4

ALUsrc = Im, Extop = “Z”, ALUctr = “or”, RegDst = rt, RegWr, nPC_sel = “+4”

LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4

ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemtoReg, RegDst = rt, RegWr, nPC_sel = “+4”

STORE MEM[ R[rs] + sign_ext(Imm16)] <– R[rt]; PC <– PC + 4

ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemWr, nPC_sel = “+4”

BEQ if ( R[rs] == R[rt] ) then PC <– PC + 4 + sign_ext(Imm16) || 00 else PC <– PC + 4

nPC_sel = EQUAL, ALUctr = “sub”

CS141-L4-42 Tarun Soni, Summer’03

CPU: Control Signals: Logic Design

• nPC_sel <= if (OP == BEQ) then EQUAL else 0
• ALUsrc <= if (OP == “000000”) then “regB” else “immed”
• ALUctr <= if (OP == “000000”) then funct
elseif (OP == ORi) then “OR”
elseif (OP == BEQ) then “sub”
else “add”
• ExtOp <= _____________
• MemWr <= _____________
• MemtoReg <= _____________
• RegWr: <= _____________
• RegDst: <= _____________

CS141-L4-43 Tarun Soni, Summer’03

CPU: Control Signals: Logic Design

• nPC_sel <= if (OP == BEQ) then EQUAL else 0
• ALUsrc <= if (OP == “000000”) then “regB” else “immed”
• ALUctr <= if (OP == “000000”) then funct
elseif (OP == ORi) then “OR”
elseif (OP == BEQ) then “sub”
else “add”
• ExtOp <= if (OP == ORi) then “zero” else “sign”
• MemWr <= (OP == Store)
• MemtoReg <= (OP == Load)
• RegWr: <= if ((OP == Store) || (OP == BEQ)) then 0 else 1
• RegDst: <= if ((OP == Load) || (OP == ORi)) then 0 else 1
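These equations translate almost line for line into code. The Python sketch below transcribes them literally; the symbolic opcode constants and the `control` function are illustrative names, not part of the slides. Note that for BEQ the ALUsrc equation yields “immed”, which is harmless only if Equal comes from the separate “=” comparator rather than from the ALU.

```python
R_TYPE, ORI, LOAD, STORE, BEQ = "R-type", "ORi", "Load", "Store", "BEQ"

def alu_ctr(op, funct):
    """ALUctr: funct for R-type, otherwise fixed by the opcode."""
    if op == R_TYPE:
        return funct
    if op == ORI:
        return "or"
    if op == BEQ:
        return "sub"
    return "add"

def control(op, funct=None, equal=0):
    """Return every control signal for one instruction, as in the equations above."""
    return {
        "nPC_sel":  equal if op == BEQ else 0,
        "ALUsrc":   "regB" if op == R_TYPE else "immed",
        "ALUctr":   alu_ctr(op, funct),
        "ExtOp":    "zero" if op == ORI else "sign",
        "MemWr":    int(op == STORE),
        "MemtoReg": int(op == LOAD),
        "RegWr":    0 if op in (STORE, BEQ) else 1,
        "RegDst":   0 if op in (LOAD, ORI) else 1,
    }

print(control(LOAD))
```

Signals that are don't-cares for an instruction (for example RegDst during a store) still get a value here, because the equations always produce one.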

CS141-L4-44 Tarun Soni, Summer’03

CPU: Example: Load

[Figure: the aggregate datapath annotated for a load: Extender set to sign ext, ALU set to add, destination register rt, next PC = +4.]

R[rt] <- Mem[R[rs] + SignExt[imm16]], viz., lw rt, rs, imm16

CS141-L4-45 Tarun Soni, Summer’03

CPU: The abstract version

• Logical vs. Physical Structure

[Figure: the high-level datapath (PC, Next Address logic, Ideal Instruction Memory, register file of 32 32-bit registers with Rs, Rt, Rd, ALU, Ideal Data Memory with Data Address, Data In, Data Out) labeled Datapath, with a Control block above it. Control sends Control Signals to the Datapath; the Datapath returns Conditions.]

CS141-L4-46 Tarun Soni, Summer’03

CPU: The real thing

CS141-L4-47 Tarun Soni, Summer’03

CPU: 5 steps to design

• 5 steps to design a processor
– 1. Analyze instruction set => datapath requirements
– 2. Select set of datapath components & establish clock methodology
– 3. Assemble datapath meeting the requirements
– 4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer.
– 5. Assemble the control logic
• MIPS makes it easier
– Instructions same size
– Source registers always in same place
– Immediates same size, location
– Operations always on registers/immediates
• Single cycle datapath => CPI=1, CCT (clock cycle time) => long

CS141-L4-48 Tarun Soni, Summer’03

CPU: Control Section

• The Five Classic Components of a Computer

[Figure: the five components. Processor (Control and Datapath), Memory, Input, Output.]

CS141-L4-49 Tarun Soni, Summer’03

CPU: Add Instruction

• add rd, rs, rt

– mem[PC] Fetch the instruction from memory

– R[rd] <- R[rs] + R[rt] The actual operation

– PC <- PC + 4 Calculate the next instruction’s address

op (31..26, 6 bits) | rs (25..21, 5 bits) | rt (20..16, 5 bits) | rd (15..11, 5 bits) | shamt (10..6, 5 bits) | funct (5..0, 6 bits)

CS141-L4-50 Tarun Soni, Summer’03

CPU: The Add Instruction

Instruction Fetch Unit at the Beginning of Add

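• Fetch the instruction from Instruction memory: Instruction <- mem[PC]
– This is the same for all instructions

[Figure: instruction fetch unit. PC (Clk, “00” appended) addresses the Inst Memory (Adr), which outputs Instruction<31:0>; two Adders (+4 and + PC Ext of imm16) and a Mux controlled by nPC_sel compute the next PC.]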

CS141-L4-51 Tarun Soni, Summer’03

CPU: The Add Instruction

The Single Cycle Datapath during Add

• R[rd] <- R[rs] + R[rt]

op (31..26) | rs (25..21) | rt (20..16) | rd (15..11) | shamt (10..6) | funct (5..0)

Control settings: nPC_sel = +4, RegDst = 1, RegWr = 1, ALUSrc = 0, ALUctr = Add, MemWr = 0, MemtoReg = 0

[Figure: single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory) annotated with these settings. Instruction<31:0> fields: Rs = <21:25>, Rt = <16:20>, Rd = <11:15>, Imm16 = <0:15>.]

CS141-L4-52 Tarun Soni, Summer’03

CPU: The Add Instruction

Instruction Fetch Unit at the End of Add

• PC <- PC + 4
– This is the same for all instructions except: Branch and Jump

[Figure: instruction fetch unit. PC (Clk, “00” appended), Inst Memory (Adr) producing Instruction<31:0>, two Adders (+4 and + imm16), and a Mux controlled by nPC_sel.]

CS141-L4-53 Tarun Soni, Summer’03

CPU: The Or Immediate Instruction

• R[rt] <- R[rs] or ZeroExt[Imm16]

op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)

Control settings to fill in: nPC_sel = ___, RegDst = ___, RegWr = ___, ExtOp = ___, ALUSrc = ___, ALUctr = ___, MemWr = ___, MemtoReg = ___

[Figure: single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory) with the control settings left blank.]

CS141-L4-54 Tarun Soni, Summer’03

CPU: The Or Immediate Instruction

The Single Cycle Datapath during Or Immediate

• R[rt] <- R[rs] or ZeroExt[Imm16]

op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)

Control settings: nPC_sel = +4, RegDst = 0, RegWr = 1, ExtOp = 0, ALUSrc = 1, ALUctr = Or, MemWr = 0, MemtoReg = 0

[Figure: single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory) annotated with these settings.]

CS141-L4-55 Tarun Soni, Summer’03

CPU: The Load Instruction

The Single Cycle Datapath during Load

• R[rt] <- Data Memory {R[rs] + SignExt[imm16]}

op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)

Control settings: nPC_sel = +4, RegDst = 0, RegWr = 1, ExtOp = 1, ALUSrc = 1, ALUctr = Add, MemWr = 0, MemtoReg = 1

[Figure: single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory) annotated with these settings.]

CS141-L4-56 Tarun Soni, Summer’03

CPU: The Store Instruction

The Single Cycle Datapath during Store

• Data Memory {R[rs] + SignExt[imm16]} <- R[rt]

op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)

Control settings to fill in: nPC_sel = ___, RegDst = ___, RegWr = ___, ExtOp = ___, ALUSrc = ___, ALUctr = ___, MemWr = ___, MemtoReg = ___

[Figure: single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory) with the control settings left blank.]

CS141-L4-57 Tarun Soni, Summer’03

CPU: The Store Instruction

The Single Cycle Datapath during Store

• Data Memory {R[rs] + SignExt[imm16]} <- R[rt]

op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)

Control settings: nPC_sel = +4, RegDst = x, RegWr = 0, ExtOp = 1, ALUSrc = 1, ALUctr = Add, MemWr = 1, MemtoReg = x

[Figure: single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory) annotated with these settings.]

CS141-L4-58 Tarun Soni, Summer’03

CPU: Datapath during branch

• if (R[rs] - R[rt] == 0) then Zero <- 1 ; else Zero <- 0

op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)

Control settings: nPC_sel = “Br”, RegDst = x, RegWr = 0, ExtOp = x, ALUSrc = 0, ALUctr = Subtract, MemWr = 0, MemtoReg = x

[Figure: single cycle datapath (Instruction Fetch Unit, register file, Extender, ALU with Zero output, Data Memory) annotated with these settings.]

CS141-L4-59 Tarun Soni, Summer’03

CPU: Datapath during branch

Instruction Fetch Unit at the End of Branch

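• if (Zero == 1) then PC = PC + 4 + SignExt[imm16]*4 ; else PC = PC + 4

op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)

[Figure: instruction fetch unit. PC (Clk, “00” appended), Inst Memory (Adr) producing Instruction<31:0>, two Adders (+4 and + imm16), and a Mux controlled by nPC_sel.]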

CS141-L4-60 Tarun Soni, Summer’03

CPU: Creating control from Datapath

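[Figure: Inst Memory (Adr) outputs the instruction. The Op and Fun fields go to Control; Rs = <21:25>, Rt = <16:20>, Rd = <11:15>, Imm16 = <0:15> go to the DATA PATH. Control drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; the DATA PATH returns Equal to Control.]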

CS141-L4-61 Tarun Soni, Summer’03

CPU: Control Signals

inst Register Transfer

ADD R[rd] <– R[rs] + R[rt]; PC <– PC + 4

ALUsrc = RegB, ALUctr = “add”, RegDst = rd, RegWr, nPC_sel = “+4”

SUB R[rd] <– R[rs] – R[rt]; PC <– PC + 4

ALUsrc = RegB, ALUctr = “sub”, RegDst = rd, RegWr, nPC_sel = “+4”

ORi R[rt] <– R[rs] or zero_ext(Imm16); PC <– PC + 4

ALUsrc = Im, Extop = “Z”, ALUctr = “or”, RegDst = rt, RegWr, nPC_sel = “+4”

LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4

ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemtoReg, RegDst = rt, RegWr, nPC_sel = “+4”

STORE MEM[ R[rs] + sign_ext(Imm16)] <– R[rt]; PC <– PC + 4

ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemWr, nPC_sel = “+4”

BEQ if ( R[rs] == R[rt] ) then PC <– PC + 4 + sign_ext(Imm16) || 00 else PC <– PC + 4

nPC_sel = “Br”, ALUctr = “sub”

CS141-L4-62 Tarun Soni, Summer’03

CPU: Summary of Control Signals

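Signal | add | sub | ori | lw | sw | beq | jump
op | 00 0000 | 00 0000 | 00 1101 | 10 0011 | 10 1011 | 00 0100 | 00 0010
func | 10 0000 | 10 0010 | We Don’t Care :-) for all other instructions
RegDst | 1 | 1 | 0 | 0 | x | x | x
ALUSrc | 0 | 0 | 1 | 1 | 1 | 0 | x
MemtoReg | 0 | 0 | 0 | 1 | x | x | x
RegWrite | 1 | 1 | 1 | 1 | 0 | 0 | 0
MemWrite | 0 | 0 | 0 | 0 | 1 | 0 | 0
nPCsel | 0 | 0 | 0 | 0 | 0 | 1 | 0
Jump | 0 | 0 | 0 | 0 | 0 | 0 | 1
ExtOp | x | x | 0 | 1 | 1 | x | x
ALUctr<2:0> | Add | Subtract | Or | Add | Add | Subtract | xxx

See Appendix A for the op and func encodings.

R-type (add, sub): op (31..26) | rs (25..21) | rt (20..16) | rd (15..11) | shamt (10..6) | funct (5..0)

I-type (ori, lw, sw, beq): op | rs | rt | immediate

J-type (jump): op | target address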

CS141-L4-63 Tarun Soni, Summer’03

CPU: Summary of Control Signals

The Concept of Local Decoding

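Signal | R-type | ori | lw | sw | beq | jump
op | 00 0000 | 00 1101 | 10 0011 | 10 1011 | 00 0100 | 00 0010
RegDst | 1 | 0 | 0 | x | x | x
ALUSrc | 0 | 1 | 1 | 1 | 0 | x
MemtoReg | 0 | 0 | 1 | x | x | x
RegWrite | 1 | 1 | 1 | 0 | 0 | 0
MemWrite | 0 | 0 | 0 | 1 | 0 | 0
Branch | 0 | 0 | 0 | 0 | 1 | 0
Jump | 0 | 0 | 0 | 0 | 0 | 1
ExtOp | x | 0 | 1 | 1 | x | x
ALUop<N:0> | “R-type” | Or | Add | Add | Subtract | xxx

[Figure: Main Control takes op (6 bits) and outputs ALUop (N bits); ALU Control (Local) takes ALUop and func (6 bits) and outputs ALUctr (3 bits) to the ALU.]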

CS141-L4-64 Tarun Soni, Summer’03

CPU: Encoding of ALUop

• In this exercise, ALUop has to be 2 bits wide to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
• (2) Or, (3) Add, and (4) Subtract
• To implement the full MIPS ISA, ALUop has to be 3 bits to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
• (2) Or, (3) Add, (4) Subtract, and (5) And (Example: andi)

[Figure: Main Control takes op (6 bits) and outputs ALUop (N bits); ALU Control (Local) takes ALUop and func (6 bits) and outputs ALUctr (3 bits).]

| R-type | ori | lw | sw | beq | jump
ALUop (Symbolic) | “R-type” | Or | Add | Add | Subtract | xxx
ALUop<2:0> | 1 00 | 0 10 | 0 00 | 0 00 | 0 01 | xxx

CS141-L4-65 Tarun Soni, Summer’03

CPU: Decoding of the ‘func’ field

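| R-type | ori | lw | sw | beq | jump
ALUop (Symbolic) | “R-type” | Or | Add | Add | Subtract | xxx
ALUop<2:0> | 1 00 | 0 10 | 0 00 | 0 00 | 0 01 | xxx

[Figure: Main Control takes op (6 bits) and outputs ALUop (N bits); ALU Control (Local) takes ALUop and func (6 bits) and outputs ALUctr (3 bits) to the ALU.]

R-type: op (31..26) | rs (25..21) | rt (20..16) | rd (15..11) | shamt (10..6) | funct (5..0)

funct<5:0> | Instruction Operation
10 0000 | add
10 0010 | subtract
10 0100 | and
10 0101 | or
10 1010 | set-on-less-than

Recall:

ALUctr<2:0> | ALU Operation
000 | And
001 | Or
010 | Add
110 | Subtract
111 | Set-on-less-than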

CS141-L4-66 Tarun Soni, Summer’03

CPU: Truth table for ALUctr

| R-type | ori | lw | sw | beq
ALUop (Symbolic) | “R-type” | Or | Add | Add | Subtract
ALUop<2:0> | 1 00 | 0 10 | 0 00 | 0 00 | 0 01

ALUop bit<2> bit<1> bit<0> | func bit<3> bit<2> bit<1> bit<0> | ALU Operation | ALUctr bit<2> bit<1> bit<0>
0 0 0 | x x x x | Add | 0 1 0
0 x 1 | x x x x | Subtract | 1 1 0
0 1 x | x x x x | Or | 0 0 1
1 x x | 0 0 0 0 | Add | 0 1 0
1 x x | 0 0 1 0 | Subtract | 1 1 0
1 x x | 0 1 0 0 | And | 0 0 0
1 x x | 0 1 0 1 | Or | 0 0 1
1 x x | 1 0 1 0 | Set on < | 1 1 1

funct<3:0> | Instruction Op.
0000 | add
0010 | subtract
0100 | and
0101 | or
1010 | set-on-less-than

CS141-L4-67 Tarun Soni, Summer’03

CPU: Logic Equation ALUctr[2]

The Logic Equation for ALUctr<2>

ALUop bit<2> bit<1> bit<0> | func bit<3> bit<2> bit<1> bit<0> | ALUctr<2>
0 x 1 | x x x x | 1
1 x x | 0 0 1 0 | 1
1 x x | 1 0 1 0 | 1

• ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>

This makes func<3> a don’t care

CS141-L4-68 Tarun Soni, Summer’03

CPU: Logic Equation ALUctr[1]

The Logic Equation for ALUctr<1>

ALUop bit<2> bit<1> bit<0> | func bit<3> bit<2> bit<1> bit<0> | ALUctr<1>
0 0 0 | x x x x | 1
0 x 1 | x x x x | 1
1 x x | 0 0 0 0 | 1
1 x x | 0 0 1 0 | 1
1 x x | 1 0 1 0 | 1

• ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>

CS141-L4-69 Tarun Soni, Summer’03

CPU: Logic Equation ALUctr[0]

The Logic Equation for ALUctr<0>

ALUop bit<2> bit<1> bit<0> | func bit<3> bit<2> bit<1> bit<0> | ALUctr<0>
0 1 x | x x x x | 1
1 x x | 0 1 0 1 | 1
1 x x | 1 0 1 0 | 1

• ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0> + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>

CS141-L4-70 Tarun Soni, Summer’03

CPU: ALU Control block

The ALU Control Block

[Figure: ALU Control (Local) block with inputs ALUop (3 bits) and func (6 bits) and output ALUctr (3 bits).]

• ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>

• ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>

• ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0> + ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
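The ALU control equations can be checked against the ALUctr truth table with a short Python sketch. The function name `alu_ctr_bits` is illustrative. The first product terms of ALUctr<1> and ALUctr<0> test ALUop<1>, which is what the truth-table rows 0 0 0 (Add), 0 x 1 (Subtract), and 0 1 x (Or) require for the encodings actually used (000, 001, 010).

```python
def alu_ctr_bits(aluop, func):
    """Local ALU control: ALUop<2:0> and func<3:0> in, ALUctr<2:0> out."""
    op2, op1, op0 = (aluop >> 2) & 1, (aluop >> 1) & 1, aluop & 1
    f3, f2, f1, f0 = (func >> 3) & 1, (func >> 2) & 1, (func >> 1) & 1, func & 1

    def n(bit):
        """Logical NOT of a single bit."""
        return 1 - bit

    ctr2 = (n(op2) & op0) | (op2 & n(f2) & f1 & n(f0))
    ctr1 = (n(op2) & n(op1)) | (op2 & n(f2) & n(f0))
    ctr0 = ((n(op2) & op1)
            | (op2 & n(f3) & f2 & n(f1) & f0)
            | (op2 & f3 & n(f2) & f1 & n(f0)))
    return (ctr2 << 2) | (ctr1 << 1) | ctr0

ADD, SUB, AND, OR, SLT = 0b010, 0b110, 0b000, 0b001, 0b111

# R-type rows of the truth table: ALUop = 1 x x, func<3:0> selects the operation.
for func, expected in [(0b0000, ADD), (0b0010, SUB), (0b0100, AND),
                       (0b0101, OR), (0b1010, SLT)]:
    print(format(func, "04b"), format(alu_ctr_bits(0b100, func), "03b"),
          alu_ctr_bits(0b100, func) == expected)
```

For the I-type encodings the func bits are don't-cares, so any func value should give Add for lw/sw (000), Subtract for beq (001), and Or for ori (010).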

CS141-L4-71 Tarun Soni, Summer’03

CPU: Main Control

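Signal | R-type | ori | lw | sw | beq | jump
op | 00 0000 | 00 1101 | 10 0011 | 10 1011 | 00 0100 | 00 0010
RegDst | 1 | 0 | 0 | x | x | x
ALUSrc | 0 | 1 | 1 | 1 | 0 | x
MemtoReg | 0 | 0 | 1 | x | x | x
RegWrite | 1 | 1 | 1 | 0 | 0 | 0
MemWrite | 0 | 0 | 0 | 1 | 0 | 0
Branch | 0 | 0 | 0 | 0 | 1 | 0
Jump | 0 | 0 | 0 | 0 | 0 | 1
ExtOp | x | 0 | 1 | 1 | x | x
ALUop (Symbolic) | “R-type” | Or | Add | Add | Subtract | xxx
ALUop<2> | 1 | 0 | 0 | 0 | 0 | x
ALUop<1> | 0 | 1 | 0 | 0 | 0 | x
ALUop<0> | 0 | 0 | 0 | 0 | 1 | x

[Figure: Main Control takes op (6 bits) and outputs RegDst, ALUSrc, and the other control signals plus ALUop (3 bits); ALU Control (Local) takes ALUop and func (6 bits) and outputs ALUctr (3 bits).]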

CS141-L4-72 Tarun Soni, Summer’03

CPU: Main Control

The “Truth Table” for RegWrite

| R-type | ori | lw | sw | beq | jump
op | 00 0000 | 00 1101 | 10 0011 | 10 1011 | 00 0100 | 00 0010
RegWrite | 1 | 1 | 1 | 0 | 0 | 0

• RegWrite = R-type + ori + lw
= !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type)
+ !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori)
+ op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw)

[Figure: one AND gate per instruction (R-type, ori, lw, sw, beq, jump), each decoding op<5>..op<0>; the R-type, ori, and lw outputs combine to form RegWrite.]
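The RegWrite sum of products can be evaluated bit by bit, as in the Python sketch below. The function name `reg_write` is an assumption made for this example.

```python
def reg_write(op):
    """RegWrite = R-type + ori + lw, computed from the six opcode bits."""
    b = [(op >> i) & 1 for i in range(6)]   # b[i] is op<i>
    n = [1 - bit for bit in b]              # n[i] is !op<i>

    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]   # 00 0000
    ori    = n[5] & n[4] & b[3] & b[2] & n[1] & b[0]   # 00 1101
    lw     = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]   # 10 0011
    return r_type | ori | lw

for name, op in [("R-type", 0b000000), ("ori", 0b001101), ("lw", 0b100011),
                 ("sw", 0b101011), ("beq", 0b000100), ("jump", 0b000010)]:
    print(name, reg_write(op))
```

Each product term is one AND gate that recognizes a single opcode; the final OR collects the opcodes that write a register.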

CS141-L4-73 Tarun Soni, Summer’03

CPU: Main Control

PLA Implementation of the Main Control

[Figure: PLA. The AND plane decodes op<5>..op<0> into R-type, ori, lw, sw, beq, and jump; the OR plane produces RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, and ALUop<0>.]

CS141-L4-74 Tarun Soni, Summer’03

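CPU: Putting it All Together: A Single Cycle Processor

[Figure: the complete single cycle processor.
– Instruction Fetch Unit (Clk, nPC_sel, Zero) outputs Instruction<31:0>
– Fields: Rs = <21:25>, Rt = <16:20>, Rd = <11:15>, Imm16 = <0:15>
– Main Control takes op = Instr<31:26> (6 bits) and outputs RegDst, ALUSrc, and the other control signals plus ALUop (3 bits)
– ALU Control takes ALUop and func = Instr<5:0> (6 bits) and outputs ALUctr (3 bits)
– Datapath: register file of 32 32-bit registers (Rw, Ra, Rb, busA, busB, busW, RegWr, Clk), RegDst Mux, Extender on Instr<15:0> (ExtOp), ALUSrc Mux, ALU with Zero output, Data Memory (Adr, Data In, WrEn driven by MemWr, Clk), MemtoReg Mux]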

CS141-L4-75 Tarun Soni, Summer’03

CPU: Worst Case Timing (Load)

[Figure: timing diagram for a load. Each signal goes from Old Value to New Value after the named delay:
– PC: Clk-to-Q
– Rs, Rt, Rd, Op, Func: Instruction Memory Access Time
– ALUctr, RegWr, ExtOp, ALUSrc, MemtoReg: Delay through Control Logic
– busA: Register File Access Time
– busB: Delay through Extender & Mux
– Address: ALU Delay
– busW: Data Memory Access Time
Register Write Occurs at the end of the cycle.]

CS141-L4-76 Tarun Soni, Summer’03

CPU: Single Cycle Solution

• Long cycle time:
– Cycle time must be long enough for the load instruction:
PC’s Clock-to-Q
+ Instruction Memory Access Time
+ Register File Access Time
+ ALU Delay (address calculation)
+ Data Memory Access Time
+ Register File Setup Time
+ Clock Skew
• Cycle time for load is much longer than needed for all other instructions
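To make the sum concrete, the Python sketch below adds up the load path using made-up delays. Every number is hypothetical (the slides give none), and the register-register path is assumed to be the load path minus the data memory access.

```python
# Hypothetical component delays in picoseconds, for illustration only.
DELAYS_PS = {
    "pc_clk_to_q": 50,
    "instruction_memory_access": 2000,
    "register_file_access": 1000,
    "alu_delay": 2000,
    "data_memory_access": 2000,
    "register_file_setup": 100,
    "clock_skew": 50,
}

def path_delay_ps(components):
    """Sum the delays of the named components along one path."""
    return sum(DELAYS_PS[name] for name in components)

load_path = list(DELAYS_PS)                                    # every component
r_type_path = [c for c in DELAYS_PS if c != "data_memory_access"]

cycle_time = path_delay_ps(load_path)   # the clock must fit the slowest path
print(cycle_time, path_delay_ps(r_type_path))
```

With these numbers every register-register instruction would finish well before the clock edge, which is the waste that a single cycle design accepts.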

CS141-L4-77 Tarun Soni, Summer’03

CPU: Single Cycle Solution

° Single cycle datapath => CPI=1, CCT (clock cycle time) => long

° 5 steps to design a processor
• 1. Analyze instruction set => datapath requirements
• 2. Select set of datapath components & establish clock methodology
• 3. Assemble datapath meeting the requirements
• 4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer.
• 5. Assemble the control logic

° Control is the hard part

° MIPS makes control easier
• Instructions same size
• Source registers always in same place
• Immediates same size, location
• Operations always on registers/immediates

[Figure: the five components. Processor (Control and Datapath), Memory, Input, Output.]

CS141-L4-78 Tarun Soni, Summer’03

CPU: Interrupts

° Datapath for interrupts

° Interrupt: basically hardware line requesting an immediate jump

° PC = Int[I] if Int[I] = 1;

° May or may not save registers

° May or may not be maskable.

° Useful for multitasking control & real-time processing

° Signal Processing

° Harder to implement in case of a multi-cycle/pipelined system!
