The Processor: Multicycle Implementation


Page 1: The Processor: Multicycle Implementation

The Processor: Multicycle Implementation

Page 2: The Processor: Multicycle Implementation

Reminder: Single-Cycle Datapath

Page 3: The Processor: Multicycle Implementation

Multicycle Approach

- Break up the instructions into steps; each step takes one cycle.
    - Balance the amount of work done in each cycle.
    - Only one major functional unit is used in each cycle.
- At the end of each cycle:
    - Store values for use in later cycles.
    - Introduce additional "internal" registers to hold them.
- We will be reusing functional units:
    - The ALU is used to compute the branch address and to increment the PC.
    - The same memory is used for instructions and data.
- We will use a finite state machine for control.

Page 4: The Processor: Multicycle Implementation

Multicycle Implementation

Buffer registers are placed between functional units. Apart from IR, which is written only when IRWrite is asserted, they are updated every clock cycle:

- IR: instruction register
- MDR: memory data register
- A and B: registers that hold the register operand values read from the register file
- ALUOut: holds the output of the ALU

Page 5: The Processor: Multicycle Implementation

Datapath with Control Signals

Page 6: The Processor: Multicycle Implementation

Handling PC

- A multiplexor selects the source for the PC:
    - ALU output after PC+4
    - ALU output after branch target address calculation
    - New address for jump instructions (26 bits of IR)
- Control signals for writing the PC:
    - PCWrite: the PC is written unconditionally after a normal instruction and after a jump.
    - PCWriteCond: the PC is conditionally overwritten if a branch is taken.
- When PCWriteCond and zero are both asserted, the branch address at the output of the ALU is written into the PC.

Page 7: The Processor: Multicycle Implementation

Handling PC

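[Figure: PC write logic. The PC input is one of PC+4, the jump address, or the branch address. The PC is written when PCWrite is asserted, or when PCWriteCond and zero are both asserted. The IorD multiplexor selects the memory Address input: 0 selects the PC, 1 selects the value from ALUOut. The Memory block also has a write data input and a MemData output.]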

Page 8: The Processor: Multicycle Implementation

Complete Datapath & Control

Page 9: The Processor: Multicycle Implementation

Five Execution Steps

1. Instruction Fetch

2. Instruction Decode and Register Fetch

3. Execution, Memory Address Computation, or Branch Completion

4. Memory Access or R-type instruction completion

5. Write-back step

INSTRUCTIONS TAKE 3 - 5 CYCLES!

Page 10: The Processor: Multicycle Implementation

Step 1: Instruction Fetch

- Use the PC to get the instruction and put it in IR.
- Increment the PC by 4 and put the result back in the PC.
- This can be described succinctly using RTL:

    IR <= Memory[PC];
    PC <= PC + 4;

- The ALU is used for incrementing the PC.
- Control signals:

    MemRead = ?    IRWrite = ?     PCWrite = ?    IorD = ?
    ALUSrcA = ?    ALUSrcB = ??    ALUOp = ??     PCSource = ??
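The two RTL transfers of the fetch step can be sketched in Python. This is only an illustration, not part of the original slides: the dictionary-based `memory`, the function name `instruction_fetch`, and the sample address are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def instruction_fetch(memory, pc):
    """Step 1: return (IR, PC) after IR <= Memory[PC]; PC <= PC + 4."""
    ir = memory[pc]               # IR <= Memory[PC]
    next_pc = (pc + 4) & MASK32   # PC <= PC + 4, computed by the ALU
    return ir, next_pc

# Hypothetical one-word memory holding the encoding of lw $t2, 0($t3).
memory = {0x00400000: 0x8D6A0000}
ir, pc = instruction_fetch(memory, 0x00400000)
print(hex(ir), hex(pc))  # 0x8d6a0000 0x400004
```

Both transfers read the old value of the PC, which is why the function computes `next_pc` from its argument rather than updating anything in place.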

Page 11: The Processor: Multicycle Implementation

Step 2: Instruction Decode & Register Fetch

1. Read registers rs and rt in case we need them.
2. Compute the branch address in case the instruction is a branch.

    A <= Reg[IR[25-21]];
    B <= Reg[IR[20-16]];
    ALUOut <= PC + (sign-extend(IR[15-0]) << 2);

We aren't setting any control lines based on the instruction type yet.

    ALUSrcA = ?     (PC)
    ALUSrcB = ??    (sign-extended & shifted offset)
    ALUOp = ??      (add)
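As a rough sketch of the decode step, the field extraction, sign extension, and speculative branch-target computation could be written as follows. The helper names (`field`, `sign_extend16`, `decode`) and the register contents are hypothetical and chosen only for this example.

```python
MASK32 = 0xFFFFFFFF

def field(ir, hi, lo):
    """Extract bits IR[hi-lo] (inclusive) as an unsigned integer."""
    return (ir >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend16(value):
    """Interpret a 16-bit field as a two's-complement signed integer."""
    return value - 0x10000 if value & 0x8000 else value

def decode(ir, pc, regs):
    """Step 2: return (A, B, ALUOut) for the instruction in IR."""
    a = regs[field(ir, 25, 21)]                        # A <= Reg[IR[25-21]]
    b = regs[field(ir, 20, 16)]                        # B <= Reg[IR[20-16]]
    offset = sign_extend16(field(ir, 15, 0)) << 2      # word offset to byte offset
    alu_out = (pc + offset) & MASK32                   # ALUOut <= PC + offset
    return a, b, alu_out

# Assumed example: beq $t2, $t3 with a 16-bit offset of -1, PC already incremented.
regs = [0] * 32
regs[10], regs[11] = 7, 9          # $t2 = 7, $t3 = 9 (made-up values)
a, b, alu_out = decode(0x114BFFFF, 0x00400008, regs)
print(a, b, hex(alu_out))          # 7 9 0x400004
```

The branch target is computed for every instruction, whether or not it turns out to be a branch; the result is simply ignored when it is not needed.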

Page 12: The Processor: Multicycle Implementation

Step 3 (Instruction Dependent)

The ALU performs one of four functions, depending on the instruction type:

1. Memory reference:
    ALUOut <= A + sign-extend(IR[15-0]);
    ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??

2. R-type:
    ALUOut <= A op B;
    ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??

3. Branch:
    if (A == B) PC <= ALUOut;
    ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??, PCWriteCond = ?, PCSource = ??

4. Jump:
    PC <= PC[31:28] || (IR[25:0] << 2);
    PCWrite = ?, PCSource = ??
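The branch and jump cases of step 3 both produce a new PC value. A minimal Python sketch, with hypothetical function names and made-up operand values, might look like this:

```python
MASK32 = 0xFFFFFFFF

def branch_step(a, b, pc, alu_out):
    """Branch: if (A == B) PC <= ALUOut; otherwise the PC is unchanged."""
    zero = ((a - b) & MASK32) == 0   # the ALU subtracts and reports Zero
    return alu_out if zero else pc

def jump_target(pc, ir):
    """Jump: PC <= PC[31:28] || (IR[25:0] << 2)."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)

print(hex(branch_step(5, 5, 0x00400008, 0x00400100)))  # 0x400100 (taken)
print(hex(branch_step(5, 6, 0x00400008, 0x00400100)))  # 0x400008 (not taken)
print(hex(jump_target(0x00400004, 0x08100008)))        # 0x400020
```

Here `alu_out` stands for the branch target that was already computed during the decode step, which is why the branch needs only one more cycle to complete.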

Page 13: The Processor: Multicycle Implementation

Step 4 (R-type or Memory-Access)

1. Load and store instructions access memory:
    lw: MDR <= Memory[ALUOut];
        MemRead = ?, IorD = ?
    sw: Memory[ALUOut] <= B;
        MemWrite = ?, IorD = ?

2. R-type instructions finish:
    Reg[IR[15-11]] <= ALUOut;
    RegDst = ?, RegWrite = ?, MemtoReg = ?

Page 14: The Processor: Multicycle Implementation

Write-Back Step

Memory read completion step:

    Reg[IR[20-16]] <= MDR;
    RegDst = ?, RegWrite = ?, MemtoReg = ?

What about all the other instructions?

Page 15: The Processor: Multicycle Implementation

Summary of the Steps

Instruction fetch (all instructions):
    IR <= Memory[PC];
    PC <= PC + 4;

Instruction decode / register fetch (all instructions):
    A <= Reg[IR[25-21]];
    B <= Reg[IR[20-16]];
    ALUOut <= PC + (sign-extend(IR[15-0]) << 2);

Execution, address calculation, branch / jump completion:
    R-type:        ALUOut <= A op B;
    Memory access: ALUOut <= A + sign-extend(IR[15-0]);
    Branch:        if (A == B) PC <= ALUOut;
    Jump:          PC <= PC[31:28] || (IR[25:0] << 2);

Memory access / R-type completion:
    R-type: Reg[IR[15-11]] <= ALUOut;
    Load:   MDR <= Memory[ALUOut];
    Store:  Memory[ALUOut] <= B;

Memory read completion:
    Load:   Reg[IR[20-16]] <= MDR;

Page 16: The Processor: Multicycle Implementation

Instructions in Multicycle Implementation

Cycles per instruction class: loads 5, stores 4, ALU instructions 4, branches 3, jumps 3.

Example: instruction mix from SPECINT2000:

- 25% loads (1% load byte + 24% load word)
- 10% stores (1% store byte + 9% store word)
- 11% branches (6% beq + 5% bne)
- 2% jumps (1% jal + 1% jr)
- 52% ALU operations

CPI = 0.25 × 5 + 0.10 × 4 + 0.11 × 3 + 0.02 × 3 + 0.52 × 4 = 4.12
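The CPI figure is a weighted average of the per-class cycle counts. The short Python sketch below reproduces the arithmetic; the dictionary keys and the function name `cpi` are chosen for this example only.

```python
# Cycles per instruction class in the multicycle implementation.
CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

# SPECINT2000 instruction mix quoted on the slide, as fractions.
MIX = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}

def cpi(cycles, mix):
    """Average cycles per instruction: sum of frequency times cycle count."""
    return sum(mix[kind] * cycles[kind] for kind in mix)

print(round(cpi(CYCLES, MIX), 2))  # 4.12
```

Loads contribute 1.25 cycles and ALU operations 2.08 cycles, so together they account for most of the average.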

Page 17: The Processor: Multicycle Implementation

Example

How many cycles does it take to execute this code?

            lw   $t2, 0($t3)
            lw   $t3, 4($t3)
            beq  $t2, $t3, Label    # assume not taken
            add  $t5, $t2, $t3
            sw   $t5, 8($t3)
    Label:  ...

What exactly is happening during the 8th cycle of execution?

In what cycle does the actual addition of $t2 and $t3 take place?
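One way to check answers to these questions is to lay the instructions out cycle by cycle, using the per-instruction cycle counts from the previous slide. The sketch below is an illustration under the slide's assumption that the branch is not taken; the step names and the `timeline` helper are invented for the example.

```python
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}
STEPS = ["fetch", "decode", "execute", "memory / R-type completion", "write-back"]

# The code sequence from the slide, with the branch assumed not taken.
program = ["lw", "lw", "beq", "add", "sw"]

def timeline(program):
    """Return a list whose entry i describes what happens in clock cycle i + 1."""
    slots = []
    for index, op in enumerate(program):
        for step in range(CYCLES[op]):
            slots.append((index, op, STEPS[step]))
    return slots

slots = timeline(program)
total = len(slots)
cycle8 = slots[7]                                        # cycles are 1-based
add_execute = 1 + slots.index((3, "add", "execute"))

print(total)        # 21
print(cycle8)       # (1, 'lw', 'execute')
print(add_execute)  # 16
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is the address calculation of the second lw, and the add performs its addition in cycle 16.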

Page 18: The Processor: Multicycle Implementation

Implementing the Control

- A control unit that will:
    - assert control signals at the right time
    - determine the next step
- The value of the control signals depends on:
    - which instruction is being executed
    - which step is being performed (implies sequential logic)
- Two specification methods:
    - State diagrams
    - Microprogramming
- The implementation can be derived from this specification.

Page 19: The Processor: Multicycle Implementation

Review: Finite State Machines

Finite state machines consist of:

- a finite set of states
- a next-state function (determined by the current state and the input)
- an output function (determined by the current state and possibly the input)

We'll use a Moore machine (output based only on the current state).

Page 20: The Processor: Multicycle Implementation

Finite State Machines

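[Figure: finite state machine block diagram. The inputs and the current state, held in clocked (clk) state elements, feed a combinational network that implements the next-state function. A second combinational network implements the output function: Moore outputs depend on the current state only, Mealy outputs depend on the current state and the inputs.]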

Page 21: The Processor: Multicycle Implementation

State Diagram for Fetch & Decode

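- S0 (start, instruction fetch): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next state: S1.
- S1 (instruction decode / register fetch): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. The next state depends on the opcode:
    - op = 'lw' or 'sw': Memory Reference FSM
    - op = R-type: R-type FSM
    - op = 'beq': Branch FSM
    - op = 'j': Jump FSM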

Page 22: The Processor: Multicycle Implementation

State Diagram for Memory-Access

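- S2 (memory address calculation), entered from state S1 when Op = 'lw' or 'sw': ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00.
- S3 (memory access), entered from S2 for lw: MemRead, IorD = 1.
- S4 (memory read completion), entered from S3: RegDst = 0, RegWrite, MemtoReg = 1. Then to state S0.
- S5 (memory access), entered from S2 for sw: MemWrite, IorD = 1. Then to state S0.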

Page 23: The Processor: Multicycle Implementation

State Diagram for R-type

- S6 (execution), entered from state S1 when Op = R-type: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10.
- S7 (R-type completion), entered from S6: RegDst = 1, RegWrite, MemtoReg = 0. Then to state S0.

Page 24: The Processor: Multicycle Implementation

State Diagram for Branch

- S8 (branch completion), entered from state S1 when Op = beq: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Then to state S0.

Page 25: The Processor: Multicycle Implementation

State Diagram for Jump

- S9 (jump completion), entered from state S1 when Op = j: PCWrite, PCSource = 10. Then to state S0.

Page 26: The Processor: Multicycle Implementation

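Complete State Diagram

How many state bits are needed?

[Figure: the complete state diagram, combining the fetch/decode, memory-access, R-type, branch, and jump diagrams.]

- S0 (start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: S1.
- S1: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = 'lw' or 'sw', S6 if Op = R-type, S8 if Op = beq, S9 if Op = j.
- S2: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 for lw, S5 for sw.
- S3: MemRead, IorD = 1. Next: S4.
- S4: RegDst = 0, RegWrite, MemtoReg = 1. Next: S0.
- S5: MemWrite, IorD = 1. Next: S0.
- S6: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7.
- S7: RegDst = 1, RegWrite, MemtoReg = 0. Next: S0.
- S8: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: S0.
- S9: PCWrite, PCSource = 10. Next: S0.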
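The state diagram can be read as a next-state function over ten states. The Python sketch below encodes the transitions listed in the diagrams and traces the states each instruction class visits; the opcode strings, the function names, and the way the state-bit count is computed are assumptions made for illustration.

```python
NUM_STATES = 10  # S0 through S9

def next_state(state, op):
    """Next-state function of the control FSM (Moore machine, no exceptions)."""
    if state == 1:   # decode: dispatch on the opcode
        return {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}[op]
    if state == 2:   # address calculation: lw and sw diverge
        return {"lw": 3, "sw": 5}[op]
    # S0 -> S1, S3 -> S4, S6 -> S7; every other state returns to S0.
    return {0: 1, 3: 4, 6: 7}.get(state, 0)

def trace(op):
    """List the states visited while executing one instruction of kind op."""
    states = [0]
    while True:
        following = next_state(states[-1], op)
        if following == 0:
            return states
        states.append(following)

# Smallest number of bits that can distinguish NUM_STATES states.
STATE_BITS = (NUM_STATES - 1).bit_length()

for op in ("lw", "sw", "R-type", "beq", "j"):
    print(op, trace(op))
print(STATE_BITS)  # 4
```

The length of each trace matches the cycle counts given earlier (5 for loads, 4 for stores and R-type instructions, 3 for branches and jumps), and ten states fit in four state bits.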

Page 27: The Processor: Multicycle Implementation

Implementing the Control

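[Figure: the control unit as a finite state machine. Combinational control logic takes its inputs from the instruction and from the current state, which is held in flip-flops (FFs). Its outputs are the datapath control outputs and the next state.]

Moore or Mealy machine?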

Page 28: The Processor: Multicycle Implementation

Complete Datapath & Control

Page 29: The Processor: Multicycle Implementation

Exceptions or Interrupts

- Both are an unintentional change in the normal flow of instructions.
- An exception is an unexpected event from within the processor, e.g., arithmetic overflow.
- An interrupt may come from outside of the processor.
- The terms are often used interchangeably (IA32 uses "interrupt" for both).
- Our simplified MIPS architecture can generate two exceptions:
    - Arithmetic overflow
    - Undefined instruction

Page 30: The Processor: Multicycle Implementation

Interrupt or Exception?

Type of event                        | From where? | MIPS terminology
I/O device request                   | External    | Interrupt
Invoke the OS from the user program  | Internal    | Exception
Arithmetic overflow                  | Internal    | Exception
Undefined instruction                | Internal    | Exception
Hardware malfunctions                | Either      | Exception or interrupt

Page 31: The Processor: Multicycle Implementation

When an Exception Occurs 1/2

- Save the PC of the current instruction (i.e., PC-4) in the exception program counter (EPC).
- Transfer control to the OS at some specified address.
- The OS does what needs to be done; it either
    - stops the execution of the program and reports an error, or
    - provides a service to the user program (e.g., a predefined action for an overflow) and returns to the point of origin of the exception using the EPC.

Page 32: The Processor: Multicycle Implementation

When an Exception Occurs 2/2

- The OS must know the reason for the exception.
- The reason is communicated using one of two approaches:
    1. A cause register contains the reason for the exception (MIPS).
    2. Vectored interrupts: for each interrupt type, a separate vector is stored, and each interrupt vector points to the appropriate piece of code.

Page 33: The Processor: Multicycle Implementation

Vectored Interrupts

Exception type         | Exception vector address (in hex)
Undefined instruction  | 0xC0000000
Arithmetic overflow    | 0xC0000020

For each cause of the interrupt, control goes to a different location in memory to handle the exception.

Page 34: The Processor: Multicycle Implementation

How MIPS Handles Exceptions

- Registers:
    - EPC
    - Cause register: individual bits are used to encode the cause of the exception (e.g., 0 in the LSB for an undefined instruction, 1 for overflow).
- Control signals:
    - CauseWrite
    - EPCWrite
    - IntCause: 1-bit signal that sets the appropriate bit of the Cause register.
- Exception address: 0x8000 0180 (SPIM uses 0x8000 0080).

Page 35: The Processor: Multicycle Implementation

MIPS Memory Allocation

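[Figure: MIPS memory map, from low to high addresses: Reserved (starting at 0), Text (pc at 0040 0000), Static Data (starting at 1000 0000, with gp at 1000 8000), Dynamic Data, and the Stack (sp at 7fff fffc). The exception address 0x8000 0180 lies above the stack.]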

Page 36: The Processor: Multicycle Implementation

The Multicycle Datapath with Exception Handling

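[Figure: the multicycle datapath extended for exception handling. New labels include the exception address 0x80000180 and the ALU overflow output.]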

Page 37: The Processor: Multicycle Implementation

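What is New?

[Figure: additions to the datapath for exception handling.]

- The ALU provides an overflow output (OF) alongside Zero.
- The PCSource multiplexor now has four inputs (0 to 3): the ALU output, ALUOut, the jump address, and the exception address 0x8000 0180.
- EPC register: written with PC-4 when EPCWrite is asserted.
- Cause register: written when CauseWrite is asserted; IntCause selects between the values 0 and 1.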

Page 38: The Processor: Multicycle Implementation

Exception States

- S10: IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11.
- S11: IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11.

Both states return to state S0 to begin the next instruction.
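The effect of an exception state on the architectural registers can be sketched in a few lines of Python. This is a simplified illustration: the function name `take_exception` and the tuple it returns are assumptions, and the Cause register is modelled as holding just the IntCause value.

```python
MASK32 = 0xFFFFFFFF
EXCEPTION_ADDRESS = 0x80000180   # exception address used by MIPS

UNDEFINED_INSTRUCTION = 0        # IntCause value asserted in S10
ARITHMETIC_OVERFLOW = 1          # IntCause value asserted in S11

def take_exception(pc, int_cause):
    """Return (EPC, Cause, PC) after one of the exception states S10 / S11."""
    epc = (pc - 4) & MASK32      # ALU computes PC - 4; EPCWrite stores it
    cause = int_cause            # CauseWrite records the reason
    new_pc = EXCEPTION_ADDRESS   # PCWrite with PCSource = 11
    return epc, cause, new_pc

epc, cause, pc = take_exception(0x00400010, ARITHMETIC_OVERFLOW)
print(hex(epc), cause, hex(pc))  # 0x40000c 1 0x80000180
```

The subtraction is needed because the PC was already incremented during instruction fetch, so PC - 4 is the address of the instruction that caused the exception.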

Page 39: The Processor: Multicycle Implementation

Complete State Diagram with Exceptions

[Figure: the complete state diagram, extended with the two exception states S10 and S11.]

- S0 (start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00.
- S1: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 if Op = 'lw' or 'sw', S6 if Op = R-type, S8 if Op = beq, S9 if Op = J, S10 if Op = other.
- S2: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 for lw, S5 for sw.
- S3: MemRead, IorD = 1.
- S4: RegDst = 0, RegWrite, MemtoReg = 1.
- S5: MemWrite, IorD = 1.
- S6: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10.
- S7: RegDst = 1, RegWrite, MemtoReg = 0.
- S8: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01.
- S9: PCWrite, PCSource = 10.
- S10 (entered when Op = other): IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11.
- S11 (entered on Overflow): IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11.

Page 40: The Processor: Multicycle Implementation

Hardware/Software Interface

Hardware

ISA

Assembly

Microarchitecture

Page 41: The Processor: Multicycle Implementation

Control with Microprogramming

- Simpler instructions (microinstructions) are used to implement the (macro) instructions.
- Microinstructions are kept in an addressable ROM.
- They are not visible to programmers.
- Microinstruction format:

    ALU Control | SRC1 | SRC2 | Reg. Cont. | Memory | PC Write Cont. | Sequencing

- Control signals are directly embedded in microinstructions.

Page 42: The Processor: Multicycle Implementation

Fields of Microinstructions

Field name        | Function of field
ALU Control       | Specify the operation being performed by the ALU during this clock; the result is always written in ALUOut
SRC1              | Specify the source for the first ALU operand
SRC2              | Specify the source for the second ALU operand
Register Control  | Specify read or write for the register file, and the source of the value for a write
Memory            | Specify read or write, and the source for the memory. For a read, specify the destination register
PCWrite Control   | Specify the writing of the PC
Sequencing        | Specify how to choose the next microinstruction

Page 43: The Processor: Multicycle Implementation

Choosing Next Microinstruction

1. Seq: go to the next microinstruction in sequence (sequential behavior).

2. Fetch: branch to the microinstruction that starts execution of the next MIPS instruction. Fetch is the label of the initial microinstruction.

3. Dispatch i: the next microinstruction is chosen based on the input (dispatching, using dispatch tables). In MIPS, there are two dispatch tables: one used from S1 and the other from S2.

Page 44: The Processor: Multicycle Implementation

Microprogram for the Control Unit

Label | ALU control | SRC1 | SRC2    | Register control | Memory  | PCWrite control | Sequencing
Fetch | Add         | PC   | 4       |                  | Read PC | ALU             | Seq
      | Add         | PC   | Extshft | Read             |         |                 | Dispatch 1

Page 45: The Processor: Multicycle Implementation

Dispatch Tables

Microcode dispatch table 1

Opcode field | Opcode name | Value
000000 (0)   | R-format    | Rformat1
000010 (2)   | jmp         | JUMP1
000100 (4)   | beq         | BEQ1
100011 (35)  | lw          | Mem1
101011 (43)  | sw          | Mem1

Microcode dispatch table 2

Opcode field | Opcode name | Value
100011 (35)  | lw          | LW2
101011 (43)  | sw          | SW2
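A dispatch table is a lookup from the opcode field to a microcode label. The following Python sketch mirrors the two tables; the `dispatch` helper and the sample instruction word are assumptions introduced for the example.

```python
# Dispatch tables keyed by the 6-bit opcode field IR[31-26].
DISPATCH_1 = {
    0b000000: "Rformat1",  # R-format
    0b000010: "JUMP1",     # jmp
    0b000100: "BEQ1",      # beq
    0b100011: "Mem1",      # lw
    0b101011: "Mem1",      # sw
}
DISPATCH_2 = {
    0b100011: "LW2",       # lw
    0b101011: "SW2",       # sw
}

def dispatch(table, ir):
    """Look up the label of the next microinstruction for the instruction in IR."""
    opcode = (ir >> 26) & 0x3F
    return table[opcode]

lw_word = 0x8D6A0000  # encoding of lw $t2, 0($t3), used as a sample input
print(dispatch(DISPATCH_1, lw_word))  # Mem1
print(dispatch(DISPATCH_2, lw_word))  # LW2
```

Loads and stores share the Mem1 entry in the first table because both begin with the same address calculation, and they only diverge at the second dispatch.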

Page 46: The Processor: Multicycle Implementation

Microprogram for the Control Unit

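Label    | ALU control | SRC1 | SRC2    | Register control | Memory    | PCWrite control | Sequencing
Fetch    | Add         | PC   | 4       |                  | Read PC   | ALU             | Seq
         | Add         | PC   | Extshft | Read             |           |                 | Dispatch 1
Mem1     | Add         | A    | Extend  |                  |           |                 | Dispatch 2
LW2      |             |      |         |                  | Read ALU  |                 | Seq
         |             |      |         | Write MDR        |           |                 | Fetch
SW2      |             |      |         |                  | Write ALU |                 | Fetch
Rformat1 | Func code   | A    | B       |                  |           |                 | Seq
         |             |      |         | Write ALU        |           |                 | Fetch
BEQ1     | Sub         | A    | B       |                  |           | ALUOut-cond     | Fetch
JUMP1    |             |      |         |                  |           | Jump address    | Fetch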
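To see how the Sequencing field steers execution, the microprogram can be walked with a small interpreter. In this Python sketch only the Label and Sequencing columns are modelled; the list layout, the `run` function, and the row order (taken to be Fetch, its unlabeled successor, Mem1, LW2, its successor, SW2, Rformat1, its successor, BEQ1, JUMP1) are assumptions for illustration.

```python
# (label, sequencing) for each microinstruction, in assumed storage order.
MICROPROGRAM = [
    ("Fetch", "Seq"),
    ("", "Dispatch 1"),
    ("Mem1", "Dispatch 2"),
    ("LW2", "Seq"),
    ("", "Fetch"),
    ("SW2", "Fetch"),
    ("Rformat1", "Seq"),
    ("", "Fetch"),
    ("BEQ1", "Fetch"),
    ("JUMP1", "Fetch"),
]
LABELS = {label: addr for addr, (label, _) in enumerate(MICROPROGRAM) if label}

# Dispatch tables keyed by opcode name, as in the dispatch-table slide.
DISPATCH = {
    "Dispatch 1": {"R-format": "Rformat1", "jmp": "JUMP1", "beq": "BEQ1",
                   "lw": "Mem1", "sw": "Mem1"},
    "Dispatch 2": {"lw": "LW2", "sw": "SW2"},
}

def run(op):
    """Return the microinstruction addresses executed for one instruction."""
    addr, visited = LABELS["Fetch"], []
    while True:
        visited.append(addr)
        sequencing = MICROPROGRAM[addr][1]
        if sequencing == "Seq":
            addr += 1                                # fall through to the next row
        elif sequencing == "Fetch":
            return visited                           # next MIPS instruction starts
        else:
            addr = LABELS[DISPATCH[sequencing][op]]  # table-driven branch

for op in ("lw", "sw", "R-format", "beq", "jmp"):
    print(op, run(op))
```

With this row order the microinstruction addresses visited for each instruction coincide with the state numbers S0 to S9 of the state diagram, which reflects that the microprogram and the FSM specify the same control.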

Page 47: The Processor: Multicycle Implementation

Implementing Microprogram

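[Figure: microprogrammed control unit. The microprogram counter addresses the microcode storage, whose outputs are the datapath control outputs and the sequencing control. The sequencing control drives the address select logic, which chooses the next microprogram counter value from an adder (current value + 1) or from the dispatch tables, which take the instruction as input.]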