Chapter 5: The Processor
Husam Alzaq
Islamic University of Gaza, 2009/2010


Introduction (§4.1 Introduction)
- CPU performance factors
  - Instruction count: determined by ISA and compiler
  - CPI and cycle time: determined by CPU hardware
- We will examine two MIPS implementations
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

The CPU
- Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
- Datapath: portion of the processor which contains the hardware necessary to perform the operations required by the processor (the brawn)
- Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)

Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

Basic Instruction Cycle
[Figure: basic instruction cycle]

CPU Overview
[Figure: CPU overview]

Multiplexers
- Can't just join wires together
  - Use multiplexers

Control
[Figure: CPU overview with control]

Question
- Why do we have two separate memories, one for instructions and the other for data, in the previous figure?

Logic Design Basics (§4.2 Logic Design Conventions)
- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational element
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information

Combinational Elements
- AND-gate: Y = A & B
- Adder: Y = A + B
- Multiplexer: Y = S ? I1 : I0
- Arithmetic/Logic Unit: Y = F(A, B)
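The four combinational elements can be modelled as pure functions, since each output depends only on the current inputs. The following is a minimal sketch; the function names, the 32-bit default width, and the ALU operation names are assumptions made for illustration and do not come from the slides.

```python
def and_gate(a: int, b: int) -> int:
    """Y = A & B (bitwise, so it also works on multi-bit buses)."""
    return a & b


def adder(a: int, b: int, width: int = 32) -> int:
    """Y = A + B, truncated to `width` bits like a fixed-width adder."""
    return (a + b) & ((1 << width) - 1)


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0."""
    return i1 if s else i0


def alu(f: str, a: int, b: int, width: int = 32) -> int:
    """Y = F(A, B); the operation names here are illustrative only."""
    mask = (1 << width) - 1
    results = {"and": a & b, "or": a | b, "add": a + b, "sub": a - b}
    return results[f] & mask
```

None of these functions keeps state between calls, which is the property that separates combinational elements from the sequential elements introduced next.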

Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
[Figure: register with inputs D and Clk and output Q, with timing diagram]

Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later
[Figure: register with inputs D, Clk and Write and output Q, with timing diagram]
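A small behavioural model can make the edge-triggered update rule concrete. This is only a sketch: the class name `Register` and its `tick` method are hypothetical, and the clock is represented as a sequence of 0/1 levels passed in by the caller.

```python
class Register:
    """Edge-triggered register with a write-control input (illustrative model)."""

    def __init__(self, value: int = 0):
        self.q = value        # stored value, visible on output Q
        self._prev_clk = 0    # last clock level seen, used to detect edges

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Apply clock level `clk`, data `d` and write control; return Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d        # update only on a 0 -> 1 edge with write = 1
        self._prev_clk = clk
        return self.q
```

With `write` left at 1 the model behaves like the plain register of the previous slide. Holding the clock high while `d` changes leaves Q unchanged, because no new rising edge occurs.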

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period

Building a Datapath (§4.3 Building a Datapath)
- Datapath
  - Elements that process data and addresses in the CPU
    - Registers, ALUs, mux's, memories, …
- We will build a MIPS datapath incrementally
  - Refining the overview design

Fetch elements
[Figure: fetch elements]

Instruction Fetch
[Figure: instruction fetch datapath; the PC is a 32-bit register, incremented by 4 for the next instruction]

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: Read memory and update register
- Store: Write register value to memory

Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch
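The branch steps listed above can be sketched in a few lines. This is a hedged illustration, not the datapath itself: the helper names `sign_extend16`, `branch_target` and `next_pc` are made up for the example, and values are kept to 32 bits with an explicit mask.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, displacement16: int) -> int:
    """Target = (PC + 4) + (sign-extended displacement shifted left by 2)."""
    pc_plus_4 = (pc + 4) & MASK32                  # already computed by instruction fetch
    offset = sign_extend16(displacement16) << 2    # word displacement -> byte offset
    return (pc_plus_4 + offset) & MASK32


def next_pc(pc: int, displacement16: int, a: int, b: int) -> int:
    """beq: subtract the operands in the ALU and check the Zero output."""
    zero = ((a - b) & MASK32) == 0
    return branch_target(pc, displacement16) if zero else (pc + 4) & MASK32
```

For example, a displacement of 3 from PC = 0x1000 gives 0x1004 + 12 = 0x1010, and a displacement of 0xFFFF (that is, -1) gives 0x1000.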

Branch Instructions
[Figure: branch datapath; the shift-left-2 unit just re-routes wires, and sign extension replicates the sign-bit wire]

Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath
[Figure: R-type/load/store datapath]

Full Datapath
[Figure: full datapath]

ALU Control (§4.4 A Simple Implementation Scheme)
- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

  ALU control   Function
  0000          AND
  0001          OR
  0010          add
  0110          subtract
  0111          set-on-less-than
  1100          NOR

ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

  opcode   ALUOp   Operation          funct    ALU function       ALU control
  lw       00      load word          XXXXXX   add                0010
  sw       00      store word         XXXXXX   add                0010
  beq      01      branch equal       XXXXXX   subtract           0110
  R-type   10      add                100000   add                0010
                   subtract           100010   subtract           0110
                   AND                100100   AND                0000
                   OR                 100101   OR                 0001
                   set-on-less-than   101010   set-on-less-than   0111
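The table can be read as a small lookup function from (ALUOp, funct) to the 4-bit ALU control value. The sketch below encodes exactly the rows of the table; the function and dictionary names are hypothetical, and the behaviour for combinations not listed in the table (raising an error) is an assumption.

```python
# funct field -> ALU control, used only when ALUOp = 10 (R-type)
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Derive the 4-bit ALU control from the 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:   # lw / sw: funct is a don't-care, ALU adds
        return 0b0010
    if alu_op == 0b01:   # beq: funct is a don't-care, ALU subtracts
        return 0b0110
    if alu_op == 0b10:   # R-type: operation depends on funct
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"ALUOp {alu_op:02b} is not listed in the table")
```

Keeping the funct decoding separate from the main control keeps the main control unit small: it only has to produce the two ALUOp bits.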

The Main Control Unit
[Figure: the main control unit]

The Main Control Unit
- Control signals derived from instruction

  R-type       0          rs      rt      rd      shamt   funct
               31:26      25:21   20:16   15:11   10:6    5:0

  Load/Store   35 or 43   rs      rt      address
               31:26      25:21   20:16   15:0

  Branch       4          rs      rt      address
               31:26      25:21   20:16   15:0

- opcode (31:26)
- rs (25:21): always read
- rt (20:16): read, except for load
- rd (R-type) or rt (load): write for R-type and load
- address (15:0): sign-extend and add
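The bit ranges in the format diagram translate directly into shifts and masks. The following sketch extracts every field from a 32-bit instruction word; the function name `decode` and the dictionary layout are assumptions for illustration.

```python
def decode(instruction: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields shown above."""
    return {
        "opcode":  (instruction >> 26) & 0x3F,   # bits 31:26
        "rs":      (instruction >> 21) & 0x1F,   # bits 25:21
        "rt":      (instruction >> 16) & 0x1F,   # bits 20:16
        "rd":      (instruction >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":   (instruction >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":   instruction & 0x3F,           # bits 5:0   (R-type)
        "address": instruction & 0xFFFF,         # bits 15:0  (load/store/branch)
    }
```

All fields are extracted unconditionally, as the hardware does; which of them are meaningful depends on the opcode. For example, `decode(0x012A4020)` (an R-type add) yields opcode 0, rs 9, rt 10, rd 8 and funct 0x20.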

Datapath With Control
[Figure: datapath with control]

Controller Signal
[Figure: controller signals]

Controller Signal

  Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
  R-format      1        0        0          1          0         0          0        1        0
  lw            0        1        1          1          1         0          0        0        0
  sw            X        1        X          0          0         1          0        0        0
  beq           X        0        X          0          0         0          1        0        1
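Because the main control is purely combinational, the table above is the whole specification, and a lookup keyed by opcode reproduces it. In this sketch the opcode numbers (0, 35, 43, 4) are those given on the instruction format slide; the names `SIGNALS`, `CONTROL_ROWS` and `main_control` are hypothetical, and don't-care entries (X) are represented as `None`.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# One row per opcode, in the column order of SIGNALS; "X" marks a don't-care.
CONTROL_ROWS = {
    0:  "100100010",  # R-format
    35: "011110000",  # lw
    43: "X1X001000",  # sw
    4:  "X0X000101",  # beq
}


def main_control(opcode: int) -> dict:
    """Return the control signals for an opcode; None stands for don't-care."""
    row = CONTROL_ROWS[opcode]
    return {name: (None if bit == "X" else int(bit))
            for name, bit in zip(SIGNALS, row)}
```

For instance, `main_control(35)` asserts ALUSrc, MemtoReg, RegWrite and MemRead, which matches the load path: compute the address with the immediate, read memory, and write the result to the register file.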

R-Type Instruction
[Figure: datapath in operation for an R-type instruction]

Load Instruction
[Figure: datapath in operation for a load instruction]

Branch-on-Equal Instruction
[Figure: datapath in operation for a branch-on-equal instruction]

Mapping the Main Control Function to Gates
- How do we generate all the signals?
  - Simple combinational logic (truth tables)
  - Use a structured two-level logic array (PLA), built from an array of AND gates followed by an array of OR gates. A PLA is one of the most common ways to implement a control function.
    - See Appendix C, pages C-7 and C-8
  - We will revisit this to cover different implementation techniques (ROM, PLA, sequencer, etc.)


Implementing Jumps

  Jump   2       address
         31:26   25:0

- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
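The concatenation described above amounts to one mask, one shift and one OR. This is a minimal sketch; the function name `jump_target` is hypothetical, and `pc` stands for whichever PC value supplies the top 4 bits.

```python
def jump_target(pc: int, instruction: int) -> int:
    """New PC = { top 4 bits of PC, 26-bit jump address, 00 }."""
    top4 = pc & 0xF0000000                 # bits 31:28 of the PC
    address26 = instruction & 0x03FFFFFF   # bits 25:0 of the jump instruction
    return top4 | (address26 << 2)         # shifting left by 2 appends the two 0 bits
```

Since only 26 address bits are encoded and 2 more are implied by word alignment, a jump can reach any word in the same 256 MB region (2^28 bytes) as the PC.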

Datapath With Jumps Added
[Figure: datapath with jumps added]

Executing different types of instructions
- Which functional units are used?
- An example: executing an R-type instruction
  - Step 1: The instruction is fetched from the instruction memory and the PC is incremented
  - Step 2: Two operands are read from the register file; the main control lines are set
  - Step 3: ALU control generates the ALU code, and the ALU operates on the data read from the register file
  - Step 4: The result from the ALU is written back to the register file

Functional units used by instruction class
[Figure: functional units used by each instruction class]

Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - ALU might not produce "right answer" right away
  - We use write signals along with clock to determine when to write
- Cycle time determined by length of the longest path

Cycle time
[Figure: cycle time]

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining

Example: Performance of single cycle Machine
- Calculate cycle time assuming negligible delays except:
  - Memory (200ps)
  - ALU and adders (100ps)
  - Register file access (50ps)
- 25% of the instructions are loads, 10% stores, 45% ALU, 15% branches and 5% jumps

Example: Performance of single cycle Machine
- If you use a fixed clock cycle, determine the clock cycle
- If you use a variable clock cycle, determine the clock cycle
- Which is better?

Single Cycle Implementation - Problems
- Inefficient
  - CPI is 1
  - Clock cycle determined by the longest path
  - Waste of resources (2 ALUs, etc) = waste of area
- Performance: Calculate cycle time assuming:
  - Negligible delays except memory (200ps), ALU and adders (100ps), register file access (50ps)
  - Instruction mix: 25% loads, 10% stores, 45% ALU, 15% branches, 5% jumps
  - Compare two implementations:
    - each instruction takes 1 fixed clock cycle
    - each instruction takes 1 variable-length clock cycle
- Penalty seems small, but increases when FP taken into account
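The comparison can be worked through numerically. The delays and the instruction mix below are the ones stated on the slide; the list of functional units each instruction class passes through is an assumption (the slide showing it did not survive extraction), chosen to match the critical path given earlier for a load: instruction memory, register file, ALU, data memory, register file.

```python
DELAY_PS = {"memory": 200, "alu": 100, "regfile": 50}

# Assumed sequence of units on each class's path (not taken from the slides).
UNITS_USED = {
    "load":   ["memory", "regfile", "alu", "memory", "regfile"],
    "store":  ["memory", "regfile", "alu", "memory"],
    "alu":    ["memory", "regfile", "alu", "regfile"],
    "branch": ["memory", "regfile", "alu"],
    "jump":   ["memory"],
}

MIX = {"load": 0.25, "store": 0.10, "alu": 0.45, "branch": 0.15, "jump": 0.05}

# Time each instruction class needs, in picoseconds.
latency = {cls: sum(DELAY_PS[u] for u in units) for cls, units in UNITS_USED.items()}

# Fixed clock: every instruction pays for the slowest class.
fixed_cycle = max(latency.values())

# Variable clock: average cycle weighted by the instruction mix.
variable_cycle = sum(MIX[cls] * latency[cls] for cls in MIX)

speedup = fixed_cycle / variable_cycle
```

Under these assumptions the fixed clock is set by the load at 600 ps, the weighted average for a variable clock is about 447.5 ps, and the ratio is roughly 1.34. A variable-length clock is nonetheless impractical to build, which motivates the multicycle approach that follows.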

A Multicycle Implementation (§5.5 A Multicycle Implementation)
- An implementation in which an instruction is executed in multiple cycles
- Objective: To re-implement the MIPS instruction set using a multi-cycle implementation
- The benefits are
  - Shared hardware
  - Instructions can take a different number of cycles (reduced computing time)

A High-level view of Multicycle Datapath
- A single memory unit is used for both instructions and data.
- A single ALU is used rather than an ALU and two adders.
- One or more registers are added after every major functional unit.

Multicycle Approach
- Break up the instructions into steps, each step takes a cycle
  - balance the amount of work to be done
  - restrict each cycle to use only one major functional unit
  - Functional units: memory, register file, and ALU
- At the end of a cycle
  - Use internal registers to store results between steps

Multicycle Approach (continued)
- Replacing the three ALUs of the single-cycle datapath with a single ALU means that the single ALU must accommodate all the inputs that used to go to the three different ALUs.

Multicycle Approach (continued)
- Control signals:
  - Write signals for the programmer-visible state units (PC, memory, register file) and the IR
  - Read signal for the memory
  - ALU control: same as single cycle
  - Multiplexors: single or two control lines

Multicycle Approach (continued)
- PC write control signals:
  - PCWrite: PC + 4 and jump
  - PCWriteCond: beq
- Three possible sources for the PC:
  - PC + 4
  - ALUOut: address of the beq
  - Address for jump (j)

Breaking the Instruction Execution into Clock Cycles
1. Instruction fetch step

   IR <= Memory[PC];
   PC <= PC + 4;

   Control signals for IR <= Memory[PC]:
     MemRead
     IRWrite
     IorD = 0

   Control signals for PC <= PC + 4:
     ALUSrcA = 0
     ALUSrcB = 01
     ALUOp = 00 (for add)
     PCSource = 00
     PCWrite

- The increment of the PC and the instruction memory access can occur in parallel. How?

Breaking the Instruction Execution into Clock Cycles
2. Instruction decode and register fetch step
- Actions that are either applicable to all instructions or are not harmful

   A <= Reg[IR[25:21]];
   B <= Reg[IR[20:16]];
   ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

2. Instruction decode and register fetch step (control signals)

   A <= Reg[IR[25:21]];
   B <= Reg[IR[20:16]];

   Since A and B are overwritten on every cycle, no control signals are needed: done.

   ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

   This requires:
     ALUSrcA = 0
     ALUSrcB = 11
     ALUOp = 00 (for add)

- The branch target address will be stored in ALUOut.
- The register file access and the computation of the branch target occur in parallel.

Breaking the Instruction Execution into Clock Cycles
3. Execution, memory address computation, or branch completion

   Memory reference:
     ALUOut <= A + sign-extend(IR[15:0]);
   Arithmetic-logical instruction:
     ALUOut <= A op B;
   Branch:
     if (A == B) PC <= ALUOut;
   Jump:
     PC <= { PC[31:28], (IR[25:0], 2'b00) };

3. Execution, memory address computation, or branch completion (control signals)

   Memory reference:
     ALUOut <= A + sign-extend(IR[15:0]);
     ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
   Arithmetic-logical instruction:
     ALUOut <= A op B;
     ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
   Branch:
     if (A == B) PC <= ALUOut;
     ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (for subtraction), PCSource = 01, PCWriteCond
   Jump:
     PC <= { PC[31:28], (IR[25:0], 2'b00) };
     PCSource = 10, PCWrite

Breaking the Instruction Execution into Clock Cycles
4. Memory access or R-type instruction completion step

   Memory reference:
     MDR <= Memory[ALUOut];        MemRead, IorD = 1
     or
     Memory[ALUOut] <= B;          MemWrite, IorD = 1
   Arithmetic-logical instruction (R-type):
     Reg[IR[15:11]] <= ALUOut;     RegDst = 1, RegWrite, MemtoReg = 0

5. Memory read completion step

   Load:
     Reg[IR[20:16]] <= MDR;        MemtoReg = 1, RegWrite, RegDst = 0
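The five steps can be traced in software to see how many cycles each instruction class consumes. The sketch below follows the register-transfer statements on the slides, but everything else is an assumption made for illustration: the function name `run_one`, memory modelled as a dictionary of 32-bit words keyed by byte address, and R-type support limited to add, subtract, AND and OR. Register 0 is not treated specially.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_one(pc: int, regs: list, memory: dict):
    """Execute one instruction step by step; return (new_pc, cycles_used)."""
    # Step 1: instruction fetch
    ir = memory[pc]
    pc = (pc + 4) & MASK32
    # Step 2: decode and register fetch (performed for every instruction)
    opcode = ir >> 26
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK32  # speculative branch target
    # Step 3: execution, address computation, or branch/jump completion
    if opcode == 4:                                     # beq
        return (alu_out if a == b else pc), 3
    if opcode == 2:                                     # j
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if opcode == 0:                                     # R-type
        results = {0x20: a + b, 0x22: a - b, 0x24: a & b, 0x25: a | b}
        alu_out = results[ir & 0x3F] & MASK32
        regs[(ir >> 11) & 0x1F] = alu_out               # Step 4: R-type completion
        return pc, 4
    alu_out = (a + sign_extend16(ir)) & MASK32          # memory address
    if opcode == 43:                                    # sw
        memory[alu_out] = b                             # Step 4: memory write
        return pc, 4
    if opcode == 35:                                    # lw
        mdr = memory[alu_out]                           # Step 4: memory read
        regs[(ir >> 16) & 0x1F] = mdr                   # Step 5: read completion
        return pc, 5
    raise ValueError(f"opcode {opcode} is not modelled in this sketch")
```

The cycle counts returned (5 for loads, 4 for stores and R-type, 3 for branches and jumps) are the per-class counts used in the CPI example later in the chapter.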

Breaking the Instruction Execution into Clock Cycles
[Figure: summary of the steps taken to execute each instruction class]

Defining the Control
- Two different techniques to design the control:
  - Finite state machine
  - Microprogramming

Example: CPI in a Multicycle CPU
- Using the SPECINT2000 instruction mix, which is: 25% load, 10% store, 11% branches, 2% jumps, and 52% ALU.
- What is the CPI, assuming that each state in the multicycle CPU requires 1 clock cycle?
- Answer: The number of clock cycles for each instruction class is the following:
  - Loads: 5
  - Stores: 4
  - ALU instructions: 4
  - Branches: 3
  - Jumps: 3

Example (continued)
The CPI is given by the following:

\[
\mathrm{CPI}
= \frac{\text{CPU clock cycles}}{\text{Instruction count}}
= \frac{\sum_i \text{Instruction count}_i \times \mathrm{CPI}_i}{\text{Instruction count}}
= \sum_i \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \mathrm{CPI}_i
\]

The ratio \(\text{Instruction count}_i / \text{Instruction count}\) is simply the instruction frequency for the instruction class i. We can therefore substitute to obtain:

CPI = 0.25×5 + 0.10×4 + 0.52×4 + 0.11×3 + 0.02×3 = 4.12

This CPI is better than the worst-case CPI of 5.0 when all instructions take the same number of clock cycles.
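The weighted sum is easy to check mechanically. The frequencies and cycle counts below are the ones given in the example; the variable names are arbitrary.

```python
# Instruction frequencies (SPECINT2000 mix from the example) and cycles per class.
FREQUENCY = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}
CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

# CPI = sum over classes of (frequency of class i) x (cycles of class i)
cpi = sum(FREQUENCY[cls] * CYCLES[cls] for cls in FREQUENCY)

# Worst case: every instruction takes as long as the slowest class.
worst_case_cpi = max(CYCLES.values())
```

Here `cpi` evaluates to 4.12 (up to floating-point rounding) against a worst case of 5, so letting short instructions finish early saves a little under 18% of the cycles for this mix.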

Defining the Control (Cont.)
[Figure: the complete finite state machine control]

Defining the Control (Cont.)
- Finite state machine controllers are typically implemented using a block of combinational logic and a register to hold the current state.
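That structure, a next-state function plus a state register, can be sketched directly. The state names below are invented for the example (the state diagram itself did not survive extraction); the transitions are chosen so that each instruction class takes the number of cycles stated in the CPI example: 5 for loads, 4 for stores and ALU instructions, 3 for branches and jumps.

```python
def next_state(state: str, instr_class: str) -> str:
    """Combinational block: current state and instruction class -> next state."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return {"load": "mem_addr", "store": "mem_addr", "alu": "execute",
                "branch": "branch_complete", "jump": "jump_complete"}[instr_class]
    if state == "mem_addr":
        return "mem_read" if instr_class == "load" else "mem_write"
    if state == "mem_read":
        return "write_back"
    if state == "execute":
        return "rtype_complete"
    return "fetch"  # every completion state returns to fetch


class Controller:
    """State register wrapped around the combinational next-state logic."""

    def __init__(self):
        self.state = "fetch"

    def clock(self, instr_class: str) -> None:
        self.state = next_state(self.state, instr_class)  # update on each clock edge


def cycles_for(instr_class: str) -> int:
    """Count clock cycles until the controller is back in the fetch state."""
    controller, count = Controller(), 0
    while True:
        controller.clock(instr_class)
        count += 1
        if controller.state == "fetch":
            return count
```

In hardware the same split applies: `next_state` corresponds to the block of combinational logic (a PLA or ROM, for instance) and `Controller.state` to the register holding the current state.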

Exceptions and Interrupts (§5.6 Exceptions)
- "Unexpected" events requiring change in flow of control
  - Different ISAs use the terms differently
- Exception
  - Arises within the CPU
    - e.g., undefined opcode, overflow, syscall, …
- Interrupt
  - From an external I/O controller
- Dealing with them without sacrificing performance is hard

5.6 Exceptions: Exceptions and Interrupts

  Type of event                                   From where?   MIPS terminology
  I/O device request                              External      Interrupt
  Invoke the operating system from user program   Internal      Exception
  Arithmetic overflow                             Internal      Exception
  Using an undefined instruction                  Internal      Exception
  Hardware malfunction                            Either        Exception or interrupt

How Exceptions Are Handled
- To communicate the reason for an exception:
  1. a status register (called the Cause register)
  2. vectored interrupts

  Exception type          Exception vector address (in hex)
  Undefined instruction   C000 0000
  Arithmetic overflow     C000 0020

How Control Checks for Exceptions
- Assume two possible exceptions:
  - Undefined instruction
  - Arithmetic overflow

How Control Checks for Exceptions (continued)
[Figure: the multicycle datapath with the additions needed to implement exceptions]
[Figure: the finite state machine with the additions to handle exception detection]

Handling Exceptions
- In MIPS, exceptions managed by a System Control Coprocessor (CP0)
- Save PC of offending (or interrupted) instruction
  - In MIPS: Exception Program Counter (EPC)
- Save indication of the problem
  - In MIPS: Cause register
  - We'll assume 1-bit
    - 0 for undefined opcode, 1 for overflow
- Jump to handler at 8000 0180
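The three actions on this slide (save the PC, record the cause, jump to the handler) can be sketched as a single state update. This is an illustrative model only: the CPU state is a plain dictionary, the function name `raise_exception` is hypothetical, `cpu["PC"]` is assumed to hold the address of the offending instruction, and the handler address is taken to be 0x8000 0180.

```python
HANDLER_ADDRESS = 0x80000180   # single entry point shared by all exceptions

CAUSE_UNDEFINED_OPCODE = 0     # 1-bit Cause encoding assumed on the slide
CAUSE_OVERFLOW = 1


def raise_exception(cpu: dict, cause: int) -> None:
    """Save EPC and Cause, then redirect the PC to the common handler."""
    cpu["EPC"] = cpu["PC"]     # address of the offending instruction (assumed)
    cpu["Cause"] = cause       # lets the handler find out what happened
    cpu["PC"] = HANDLER_ADDRESS
```

With a single entry point, the handler's first job is to read the Cause register and decide what to do, which is the key difference from the vectored scheme.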

An Alternate Mechanism
- Vectored Interrupts
  - Handler address determined by the cause
- Example:
  - Undefined opcode: C000 0000
  - Overflow: C000 0020
  - …: C000 0040
- Instructions either
  - Deal with the interrupt, or
  - Jump to real handler
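With vectored interrupts the cause selects the handler address directly, so no Cause register lookup is needed on entry. A minimal sketch using the two addresses listed in the example; the dictionary keys and the function name are hypothetical.

```python
# Cause -> handler entry point, using the two vectors given in the example.
VECTOR_TABLE = {
    "undefined_opcode": 0xC0000000,
    "overflow":         0xC0000020,
}


def handler_address(cause: str) -> int:
    """Return the address control should load into the PC for this cause."""
    return VECTOR_TABLE[cause]


# Spacing between consecutive vectors, in bytes and in 4-byte instructions.
vector_spacing_bytes = VECTOR_TABLE["overflow"] - VECTOR_TABLE["undefined_opcode"]
instructions_per_vector = vector_spacing_bytes // 4
```

The vectors in the example are 0x20 bytes apart, which leaves room for 8 instructions per entry. That is why the code at a vector either handles the event in a few instructions or jumps to the real handler.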

Handler Actions
- Read cause, and transfer to relevant handler
- Determine action required
- If restartable
  - Take corrective action
  - Use EPC to return to program
- Otherwise
  - Terminate program
  - Report error using EPC, cause, …

Concluding Remarks (§4.14 Concluding Remarks)
- ISA influences design of datapath and control
- Datapath and control influence design of ISA
- Pipelining improves instruction throughput using parallelism
  - More instructions completed per second
  - Latency for each instruction not reduced
- Hazards: structural, data, control