

Gary Marsden, University of Cape Town

Slide 1

Chapter 5 - The Processor

Machine Performance factors
– Instruction Count, Clock cycle time, Clock cycles per instruction (CPI)

Both clock cycle time and CPI are determined by processor implementation

We will construct a datapath and a control unit for 2 different processor implementations for ‘core’ instructions
– Memory ref: lw/sw
– Arithmetic: add/sub/and/or/slt
– Control: beq/j
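The three performance factors named on this slide multiply together to give execution time. A minimal sketch of that relationship follows; the function name and the numbers are hypothetical, chosen only to illustrate how a design with a long cycle and CPI of 1 might compare with one that has a short cycle and a higher CPI.

```python
def cpu_time(instruction_count: int, cpi: float, clock_cycle_time_s: float) -> float:
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical figures, not taken from the slides.
single_cycle = cpu_time(1_000_000, cpi=1.0, clock_cycle_time_s=8e-9)
multi_cycle = cpu_time(1_000_000, cpi=4.0, clock_cycle_time_s=2e-9)
print(single_cycle, multi_cycle)
```

The instruction count is fixed by the program and compiler; the two implementations built in this chapter differ only in the CPI and cycle-time terms.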

Slide 2

Implementation Overview

Consider a core subset of MIPS instructions:
– Integer arith-log instructions
– Memory-reference instructions
– Branch instructions

Good news is that much is similar across different instructions

For every instruction
– Set the PC to a memory location to fetch an instruction
– Read one or two registers, using instruction fields to choose the registers

Slide 3

Differing Instructions

After previous 2 steps, instructions diverge

All instructions do use the ALU next
– Arith-log: for opcode execution
– Mem-ref: for effective address calculation
– Branches: for comparison

After using the ALU
– Arith-log: write data from ALU to register
– Mem-ref: access memory containing data to complete a store or retrieve a word being loaded
– Branch: may need to change next instruction address based on comparison

Slide 4

High-level view

Two types of functional units:
– elements that operate on data values (combinational)
– elements that contain state (sequential)

[Figure: high-level datapath with PC, instruction memory (address, instruction), registers (register numbers, data), ALU and data memory (address, data)]

Slide 5

Clocking methodology

Defines when signals can be read and when they can be written

Assume an edge-triggered clock
– Clock cycles between high and low
– Clock period: time for one full cycle

[Figure: one clock cycle; state element 1 feeds combinational logic, which feeds state element 2. A second diagram shows a state element with combinational logic]

Slide 6

MIPS subset implementation

Develop 2 implementations
– Single long clock cycle for each instruction (simple)
– Multiple clock cycles per instruction (complex)

Input / Output
– Nearly all elements have 32 bit wide inputs/outputs
– Buses: signals > 1 bit (thick lines)
– Control signals vs data signals
  • Notation: control in colour

Slide 7

Building Blocks

1. Instruction Memory: a place to store program instructions

2. Program Counter (PC): the address of an instruction

3. Adder: to increment the PC to the next instruction location

[Figure: a. Instruction memory (instruction address in, instruction out); b. Program counter (PC); c. Adder (Add, Sum)]

Slide 8

The common bit

Instruction execution
1. Fetch instruction from memory
2. Increment PC to next instruction (PC += 4)

[Figure: the PC drives the Read address of the instruction memory and one input of an adder (Add) whose other input is the constant 4]
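The two steps common to every instruction can be sketched as a single function. This is only an illustration: the instruction memory is modelled as a hypothetical Python dict keyed by byte address, and the two instruction words are hand-assembled examples that do not appear in the slides.

```python
def fetch(instruction_memory, pc):
    """Return (instruction word at pc, next sequential PC)."""
    instruction = instruction_memory[pc]  # step 1: fetch from memory
    return instruction, pc + 4            # step 2: PC += 4


# Hypothetical program: two 32 bit words at byte addresses 0 and 4.
program = {0: 0x00430820, 4: 0x8C410004}
word, next_pc = fetch(program, 0)
print(hex(word), next_pc)
```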

Slide 9

R-format

add, sub, slt, and, or
– E.g. add $1,$2,$3 ($1 = $2+$3)

Need fourth element: Register file
– Contains register state of the machine
– Register can be read or written by specifying number
  • 2 read ‘ports’ and 1 write ‘port’
  • 32 registers => 5 bit register number

Fifth element: ALU
– 3 bit operation signal
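A register file with two read ports and one write port might be modelled as below. The class and method names are hypothetical. Keeping register 0 at zero is the usual MIPS convention and is an assumption here, since the slides do not mention it.

```python
class RegisterFile:
    """32 registers of 32 bits: 2 read ports, 1 write port."""

    def __init__(self):
        self.regs = [0] * 32  # 32 registers => 5 bit register number

    def read(self, read_register_1, read_register_2):
        # Both read ports are served at once.
        return self.regs[read_register_1], self.regs[read_register_2]

    def write(self, write_register, write_data, reg_write):
        # The write only happens when the RegWrite control line is asserted.
        # Assumption: register 0 stays zero (MIPS convention, not in the slides).
        if reg_write and write_register != 0:
            self.regs[write_register] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(2, 10, reg_write=True)
rf.write(3, 32, reg_write=True)
print(rf.read(2, 3))
```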

Slide 10

R-type elements

[Figure: a. Registers, with three 5 bit register-number inputs (Read register 1, Read register 2, Write register), a Write data input, a RegWrite control and two data outputs (Read data 1, Read data 2); b. ALU, with a 3 bit ALU control input and Zero and ALU result outputs]

Slide 11

R-format execution

Only two elements required
– Read 2 registers
– Perform ALU operation on the contents of the registers
– Write the result

[Figure: R-format datapath. Instruction fields select Read register 1, Read register 2 and Write register; Read data 1 and Read data 2 feed the ALU (3 bit ALU operation, Zero output); the ALU result returns to Write data under RegWrite]

Slide 12

Load and Store Operations

lw $1, offset_value($2)
sw $1, offset_value($2)
– Address found by adding offset to contents of $2

Besides previous elements, need

Sixth element: Data Memory Unit
– State element with inputs (read address, write address, write data) and a ‘read data’ output

Seventh element: Sign Extension Unit
– Memory addresses are all 32 bit, so ‘offset’ is extended from 16 to 32 bits

Slide 13

Sign extension

Consider 16 bit version of 2
– 0000 0000 0000 0010

Sign extend by copying most significant bit into the new 32 bit word
– 0000 0000 0000 0000 0000 0000 0000 0010

Consider 16 bit version of -2 (1->0, 0->1 and add 1)
– 1111 1111 1111 1110
– -> 1111 1111 1111 1111 1111 1111 1111 1110

One of the ‘magic’ reasons for using 2’s complement
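The copying of the most significant bit can be sketched in a few lines. The function name is hypothetical; the two checks use the bit patterns shown on this slide.

```python
def sign_extend_16_to_32(value):
    """Copy bit 15 of a 16 bit pattern into bits 31..16 of a 32 bit word."""
    value &= 0xFFFF
    if value & 0x8000:             # most significant bit set: negative
        return value | 0xFFFF0000  # fill the upper half with 1s
    return value                   # otherwise the upper half stays 0


print(format(sign_extend_16_to_32(0b0000000000000010), "032b"))
print(format(sign_extend_16_to_32(0b1111111111111110), "032b"))
```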

Slide 14

Sixth and Seventh logic elements

[Figure: a. Data memory unit, with Address and Write data inputs, a Read data output and MemRead / MemWrite controls; b. Sign-extension unit, 16 bits in and 32 bits out]

Slide 15

Executing load and store

Address in memory is sign-extend(offset) + contents of $2

Store: value from $1 is put in this location
Load: value from location is put into $1

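[Figure: load/store datapath. The register file supplies Read data 1 to the ALU and Read data 2 to the data memory Write data input; the 16 bit offset is sign-extended to 32 bits as the second ALU input; the ALU result is the data memory Address; the data memory Read data returns to the register Write data. Control lines: RegWrite, ALU operation (3 bit), MemRead, MemWrite]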
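The address calculation and the two memory operations might be sketched as follows. Memory is modelled as a hypothetical dict keyed by byte address, and all function names are illustrative only.

```python
def sign_extend_16(value):
    """Interpret a 16 bit pattern as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    # Address = contents of the base register + sign-extended offset.
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF


def store_word(regs, memory, rt, rs, offset16):
    # sw: the value from register rt is put in the addressed location.
    memory[effective_address(regs[rs], offset16)] = regs[rt]


def load_word(regs, memory, rt, rs, offset16):
    # lw: the value from the addressed location is put into register rt.
    regs[rt] = memory.get(effective_address(regs[rs], offset16), 0)


regs = [0] * 32
memory = {}
regs[2] = 100                                # base register $2
regs[1] = 7                                  # value to store
store_word(regs, memory, 1, 2, 0xFFFC)       # sw $1, -4($2)
load_word(regs, memory, 3, 2, 0xFFFC)        # lw $3, -4($2)
print(memory, regs[3])
```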

Slide 16

Branch instruction

beq $1, $2, offset

Need to compare the contents of $1 and $2

If they are equal, we need to calculate a new value for the PC using the offset

The offset is relative to the branch instruction
– So we need to add it to the current PC (already incremented to PC + 4)

The offset is a word offset, not a byte offset!

Slide 17

Word offset

If the offset was a byte offset, the last two bits would always be ‘00’ as instructions take 4 bytes of memory:
– 0, 4, 8, 12, 16, 20 etc.
– 00000, 00100, 01000, 01100, 10000, 10100 etc.

This is wasteful

By using a word offset, the range is extended by a factor of four
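The factor of four can be checked with a short calculation. The sketch below assumes a signed 16 bit offset field, as used elsewhere in these slides; the function name is hypothetical.

```python
def reach_in_bytes(offset_bits, word_offset):
    """Smallest and largest byte displacement a signed offset field can express."""
    lowest = -(1 << (offset_bits - 1))
    highest = (1 << (offset_bits - 1)) - 1
    scale = 4 if word_offset else 1  # a word offset counts 4 byte instructions
    return lowest * scale, highest * scale


print(reach_in_bytes(16, word_offset=False))
print(reach_in_bytes(16, word_offset=True))
```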

Slide 18

Executing branch

If ($1 == $2) PC = (PC + 4) + (sign-extend(offset) << 2)

[Figure: branch datapath. The registers supply Read data 1 and Read data 2 to the ALU, whose Zero output goes to the branch control logic; the 16 bit offset is sign-extended to 32 bits, shifted left 2 and added to PC + 4 (from the instruction datapath) to give the branch target. Control lines: RegWrite, ALU operation (3 bit)]
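A minimal sketch of the branch decision, assuming (as the figure's "PC + 4 from instruction datapath" label suggests) that the offset is added to the already incremented PC. The function names are hypothetical.

```python
def branch_target(pc_plus_4, offset16):
    """Sign-extend the 16 bit word offset, shift left 2, add to PC + 4."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    return (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF


def next_pc(pc, rs_value, rt_value, offset16):
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0  # ALU subtracts, Zero flags equality
    return branch_target(pc_plus_4, offset16) if zero else pc_plus_4


print(hex(next_pc(0x100, 5, 5, 3)))  # registers equal: branch taken
print(hex(next_pc(0x100, 5, 6, 3)))  # registers differ: fall through
```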

Slide 19

Putting it all together - a simple implementation

We know what elements we need, but we need control (mysterious orange lines)

If creating a single datapath
– Execute everything in one cycle
– No datapath resource used more than once per instruction (duplication)

Elements common to different instructions can be shared - implies multiplexor
– Selector for multiple inputs to the same element port

[Figure: MUX with data inputs A and B, select input S and output C]
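A two-input multiplexor reduces to a single conditional. The mapping of S = 0 to input A and S = 1 to input B is an assumption; the slide does not say which way round the select works.

```python
def mux2(a, b, s):
    """Two-input multiplexor: output C is A when S is 0, B when S is 1 (assumed)."""
    return b if s else a


# Example: choosing the second ALU operand from a register value or an immediate.
print(mux2(a=25, b=4, s=0), mux2(a=25, b=4, s=1))
```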

Slide 20

Combined path

Key differences between arith-log and mem-ref: Second ALU input & Result register input

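[Figure: combined datapath for arith-log and mem-ref instructions. PC, instruction memory and the + 4 adder feed the register file; a Mux controlled by ALUSrc selects between Read data 2 and the sign-extended 16 bit field as the second ALU input; a Mux controlled by MemtoReg selects between the ALU result and the data memory Read data as the register Write data. Control lines: RegWrite, ALU operation (3 bit), MemRead, MemWrite, ALUSrc, MemtoReg]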

Slide 21

Adding branch path

Use adder to compute target address

Another Mux for PC

[Figure: datapath with the branch path added. A second adder sums PC + 4 and the sign-extended offset shifted left 2 to give the branch target; a Mux controlled by PCSrc selects between PC + 4 and the branch target as the next PC. Control lines: ALU operation (3 bit), RegWrite, MemRead, MemWrite, PCSrc, ALUSrc, MemtoReg]

Slide 22

Control - the ALU

5 of 8 options used

Need to generate 3 bit input code to ALU for each instruction type

3 types of code implies 2 bit control (ALUOp)

[Figure: ALU with inputs a and b, a 3 bit ALU operation input, and outputs Result, Zero, Overflow and CarryOut]

ALU input   Function
000         AND
001         OR
010         Add
110         Subtract
111         SLT
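The five codes in the table can be exercised with a small model of the ALU. This is a sketch only: it returns the result and the Zero output, leaves out Overflow and CarryOut, and assumes SLT compares its operands as signed 32 bit values.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32 bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(operation, a, b):
    """Return (result, zero) for the 3 bit ALU input codes in the table."""
    if operation == 0b000:
        result = a & b                              # AND
    elif operation == 0b001:
        result = a | b                              # OR
    elif operation == 0b010:
        result = (a + b) & MASK32                   # Add
    elif operation == 0b110:
        result = (a - b) & MASK32                   # Subtract
    elif operation == 0b111:
        result = int(to_signed(a) < to_signed(b))   # SLT (signed compare assumed)
    else:
        raise ValueError("unused ALU input code")
    return result, result == 0


print(alu(0b010, 2, 3), alu(0b110, 7, 7))
```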

Slide 23

ALU control for instruction types
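One way the 2 bit ALUOp and the instruction's funct field might be combined into the 3 bit ALU input is sketched below. Both the ALUOp assignment (00 for lw/sw, 01 for beq, 10 for R-type) and the funct values are assumptions taken from the standard MIPS encoding; neither is listed in these slides.

```python
# Assumed standard MIPS funct codes mapped to the 3 bit ALU input.
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Map (ALUOp, funct) to the 3 bit ALU input code."""
    if alu_op == 0b00:   # lw / sw: add to form the address (assumed encoding)
        return 0b010
    if alu_op == 0b01:   # beq: subtract to compare (assumed encoding)
        return 0b110
    if alu_op == 0b10:   # R-type: the funct field decides (assumed encoding)
        return FUNCT_TO_ALU[funct]
    raise ValueError("unused ALUOp value")


print(format(alu_control(0b10, 0b101010), "03b"))
```

For the first two ALUOp values the funct field is ignored, which is why the main control only has to produce 2 bits.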

Slide 24

Main control

ALU control relatively easy (not temporal)
– PLA / Simple custom controller

To define the rest of the control circuit
– Identify control lines and instruction components

Before we do that, we need to look at the instruction types to understand data bus requirements

Slide 25

Instruction analysis

Bits  31-26  25-21  20-16  15-11  10-6   5-0
R     op     rs     rt     rd     shamt  funct
LS    op     rs     rt     address (15-0)
B     op     rs     rt     address (15-0)

Base register: rs (LS)
Target register*: rd for R, rt for a load
offset: the address field of B

* This implies a Mux
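The bit ranges in the format diagram translate directly into shifts and masks. The sketch below is illustrative: the function name is hypothetical, and the two example words are hand-assembled using the standard MIPS opcodes (0 for R-type, 35 for lw), which the slides do not list.

```python
def decode(instruction):
    """Split a 32 bit instruction word into the fields of the formats above."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31-26
        "rs": (instruction >> 21) & 0x1F,     # bits 25-21
        "rt": (instruction >> 16) & 0x1F,     # bits 20-16
        "rd": (instruction >> 11) & 0x1F,     # bits 15-11 (R only)
        "shamt": (instruction >> 6) & 0x1F,   # bits 10-6  (R only)
        "funct": instruction & 0x3F,          # bits 5-0   (R only)
        "address": instruction & 0xFFFF,      # bits 15-0  (LS and B)
    }


r_type = decode(0x00430820)  # add $1,$2,$3 under the assumed encoding
load = decode(0x8C410004)    # lw $1, 4($2) under the assumed encoding
print(r_type["rd"], load["rt"])
```

Both examples write register $1, but the register number sits in a different field in each case, which is the reason the write-register input needs a Mux.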

Slide 26

What does that look like?

[Figure: single-cycle datapath with control lines and instruction fields. Instruction [25-21] and Instruction [20-16] select the read registers; a Mux controlled by RegDst selects Instruction [20-16] or Instruction [15-11] as the write register; Instruction [15-0] is sign-extended from 16 to 32 bits; Instruction [5-0] feeds the ALU control together with ALUOp. Control lines: RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, PCSrc]

Slide 27

What do the orange bits do?

RegDst
– Source of the destination register for the operation

RegWrite
– Enables writing a register in the register file

ALUSrc
– Source of second ALU operand, can be a register or part of the instruction

PCSrc
– Source of the PC (increment [PC + 4] or branch)

MemRead / MemWrite
– Reading / Writing from memory

MemtoReg
– Source of the data written to the register (ALU result or memory)

Slide 28

Building the control unit

All but one of the 7 lines can be set using op-code bits
– PCSrc is determined by output from the ALU as well as op-code (need an AND gate)

Besides these 7, there are 2 for the ALUOp

To set these, all we need are the 6 bits determining the op-code

Slide 29

Bunch up - inserting the control unit

[Figure: the single-cycle datapath with a Control unit added. Instruction [31-26] is the input to Control, which drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; PCSrc selects the next PC]

Slide 30

Truth table
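A control truth table for the core instructions might be written as a lookup keyed by op-code. The values below are the ones usually given for this single-cycle MIPS design and the op-codes are the standard MIPS encodings; both are assumptions, since neither appears in the text of these slides. `None` marks a don't-care entry.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_dst: Optional[int]
    alu_src: int
    mem_to_reg: Optional[int]
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int


# Assumed op-codes and control values (standard textbook design, not from the slides).
CONTROL = {
    0b000000: Control(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-format
    0b100011: Control(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    0b101011: Control(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0b000100: Control(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def pc_src(opcode, zero):
    """PCSrc = Branch AND Zero (the AND gate between control and ALU)."""
    return bool(CONTROL[opcode].branch and zero)


print(CONTROL[0b100011], pc_src(0b000100, True))
```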

Slide 31

Sample R-type execution

Instruction fetched and PC incremented

$2 and $3 are read from register file

ALU operates on the data

The result from the ALU is written to register file
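The four steps can be traced in code for add $1,$2,$3. This is a sketch under stated assumptions: the instruction word 0x00430820 is hand-assembled with the standard MIPS encoding, the memory is a hypothetical dict, and only the add operation is modelled.

```python
def execute_add(regs, instruction_memory, pc):
    """Run one R-type add through the four steps; return the new PC."""
    # 1. Instruction fetched and PC incremented.
    instruction = instruction_memory[pc]
    pc += 4
    # 2. The two source registers are read from the register file.
    rs = (instruction >> 21) & 0x1F
    rt = (instruction >> 16) & 0x1F
    rd = (instruction >> 11) & 0x1F
    a, b = regs[rs], regs[rt]
    # 3. ALU operates on the data (add only in this sketch).
    result = (a + b) & 0xFFFFFFFF
    # 4. The result from the ALU is written to the register file.
    regs[rd] = result
    return pc


regs = [0] * 32
regs[2], regs[3] = 10, 32
new_pc = execute_add(regs, {0: 0x00430820}, 0)
print(regs[1], new_pc)
```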