EENG449b/Savvides Lec 4.1 1/22/04 January 22, 2004 Prof. Andreas Savvides Spring 2004 EENG 449bG/CPSC 439bG Computer

EENG449b/SavvidesLec 4.1

1/22/04

January 22, 2004

Prof. Andreas Savvides

Spring 2004

http://www.eng.yale.edu/courses/eeng449bG

EENG 449bG/CPSC 439bG Computer Systems

Lecture 3

Pipelining Part II


Announcements

• Project groups and group meetings
• Project topics
  – A 1-page project proposal is due next Friday, Jan 30 (email it to me)

• Project proposal should include:
  – 1 paragraph project overview. This describes what your project will do.
  – 1 paragraph describing the specific tasks that you need to do
    » E.g. read papers, install tools, learn some special programming language or hardware
  – 1 paragraph on what resources you need for your project
    » E.g. are you using any special hardware?
    » Do you have access to lab/hardware/software?


Instruction Formats Review


Implementing a MIPS Pipeline

We are developing a subset of the MIPS pipeline supporting:

– Load/store word
– Branch equal zero
– Integer ALU operations

• Remember: MIPS has register-register ALU instructions (e.g. Add R1, R2, R3)

• Attention: in the homework you will have to redesign the pipeline for register-memory ALU instructions (e.g. Add R1, R2, (R3))!


MIPS Datapath Review

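Instruction fetch (IF):

IR ← Mem[PC];
NPC ← PC + 4;

Instruction decode / register fetch (ID):

A ← Regs[rs];
B ← Regs[rt];
Imm ← sign-extended immediate field of IR;

Execution / effective address (EX):

Memory ref:   ALUOutput ← A + Imm;
Reg-Reg ALU:  ALUOutput ← A func B;
Reg-Imm ALU:  ALUOutput ← A op Imm;
Branch:       ALUOutput ← NPC + (Imm << 2); Cond ← (A == 0)

Memory access / branch completion (MEM):

Memory ref:   LMD ← Mem[ALUOutput]; or Mem[ALUOutput] ← B;
Branch:       if (cond) PC ← ALUOutput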


MIPS Basic Pipeline

Data needs to be written in the registers at the end of each cycle

The value written back depends on the instruction type (load or ALU operation):

– Load: LMD
– ALU operation: ALUOut
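
The register transfers for the five stages can be sketched as a tiny unpipelined interpreter that pushes one instruction at a time through IF, ID, EX, MEM and WB. This is only an illustration under assumptions: the tuple encoding (op, rs, rt, rd, imm), the opcode names, and the helper names step and run are invented for this example, register 0 is simply never written, and the write-back step follows the usual MIPS convention (loads and immediates write rt, register-register ALU operations write rd).

```python
# Sketch only: walks each instruction through the five stages in turn,
# using the temporaries named on the slides (IR, NPC, A, B, Imm,
# ALUOutput, Cond, LMD). The instruction encoding is hypothetical.

def step(state, program):
    regs, mem = state["regs"], state["mem"]

    # IF: fetch the instruction and compute the next sequential PC.
    ir = program[state["pc"] // 4]          # IR <- Mem[PC]
    npc = state["pc"] + 4                   # NPC <- PC + 4

    # ID: decode, read both source registers, take the immediate.
    op, rs, rt, rd, imm = ir
    a, b = regs[rs], regs[rt]               # A <- Regs[rs]; B <- Regs[rt]

    # EX: one ALU operation, selected by instruction type.
    cond = False
    if op in ("lw", "sw"):
        alu_output = a + imm                # memory reference
    elif op == "add":
        alu_output = a + b                  # register-register ALU
    elif op == "addi":
        alu_output = a + imm                # register-immediate ALU
    elif op == "beqz":
        alu_output = npc + (imm << 2)       # branch target
        cond = (a == 0)
    else:
        raise ValueError(f"unsupported opcode: {op}")

    # MEM: access data memory, or redirect the PC on a taken branch.
    lmd = None
    if op == "lw":
        lmd = mem.get(alu_output, 0)        # LMD <- Mem[ALUOutput]
    elif op == "sw":
        mem[alu_output] = b                 # Mem[ALUOutput] <- B
    state["pc"] = alu_output if cond else npc

    # WB: write the register file from LMD (load) or ALUOutput (ALU op).
    if op == "lw":
        regs[rt] = lmd
    elif op == "add":
        regs[rd] = alu_output
    elif op == "addi":
        regs[rt] = alu_output
    return state


def run(program, regs=None, mem=None, max_steps=100):
    state = {
        "pc": 0,
        "regs": [0] * 32 if regs is None else regs,
        "mem": {} if mem is None else mem,
    }
    steps = 0
    while state["pc"] // 4 < len(program) and steps < max_steps:
        step(state, program)
        steps += 1
    return state
```

Calling run with a list of instruction tuples returns the final state dictionary, so the register file and memory can be inspected after the program finishes. A pipelined version would keep these temporaries in per-stage pipeline registers instead of local variables.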


Events at every pipe stage




Hazards Review

From the previous lecture we know the situations that would cause incorrect execution:

• Structural hazards
• Data hazards
• Control hazards


Three Generic Data Hazards

• Read After Write (RAW): Instr J tries to read an operand before Instr I writes it

    I: add r1,r2,r3
    J: sub r4,r1,r3

• Caused by a “data dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

• Write After Read (WAR): Instr J writes an operand before Instr I reads it

    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5

• Write After Write (WAW): Instr J writes an operand before Instr I writes it

    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

• Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.

• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Writes are always in stage 5

• We will see WAR and WAW in later, more complicated pipes
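
The three cases can be told apart mechanically by comparing the destination and source registers of an earlier instruction I and a later instruction J. The following is a minimal sketch; the (dest, sources) representation and the function name classify_hazards are assumptions made for this example. It reports potential hazards only: whether one actually occurs depends on the pipeline, as the bullets above explain for the 5-stage MIPS pipe.

```python
def classify_hazards(instr_i, instr_j):
    """Name the dependences from an earlier instruction I to a later one J.

    Each instruction is a (dest, sources) pair, e.g. ("r1", ("r2", "r3")).
    """
    dest_i, srcs_i = instr_i
    dest_j, srcs_j = instr_j
    hazards = []
    if dest_i is not None and dest_i in srcs_j:
        hazards.append("RAW")   # J reads what I writes (data dependence)
    if dest_j is not None and dest_j in srcs_i:
        hazards.append("WAR")   # J writes what I reads (anti-dependence)
    if dest_i is not None and dest_i == dest_j:
        hazards.append("WAW")   # both write the same name (output dependence)
    return hazards


# The I/J pairs from the three slides.
add_r1 = ("r1", ("r2", "r3"))                          # add r1,r2,r3
print(classify_hazards(add_r1, ("r4", ("r1", "r3"))))  # J: sub r4,r1,r3
print(classify_hazards(("r4", ("r1", "r3")), add_r1))  # I: sub r4,r1,r3
print(classify_hazards(("r1", ("r4", "r3")), add_r1))  # I: sub r1,r4,r3
```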


MIPS Basic Pipeline

Instruction issued

IF ID EX MEM WB

Data hazards can be detected here (in the ID stage)


Hardware Hazard Detection

• Figure A.20


Logic to Detect Load Interlocks

• Figure A.21
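
The figure itself is not reproduced in these notes. As a rough illustration, the usual load interlock check for a classic five-stage pipeline compares the destination of a load sitting in EX against the source registers of the instruction sitting in ID. The sketch below assumes a hypothetical dictionary representation of the ID/EX and IF/ID pipeline registers and a small opcode set; it may differ in detail from the figure.

```python
def load_interlock(id_ex, if_id):
    """Return True if the instruction in ID must stall behind a load in EX.

    id_ex and if_id are dicts with keys "op", "rs", "rt" (hypothetical layout).
    id_ex may be None when the EX stage holds a bubble.
    """
    if id_ex is None or id_ex["op"] != "lw":
        return False                    # only loads deliver their result late
    load_dest = id_ex["rt"]             # a load writes register rt
    op = if_id["op"]
    if op in ("add", "sub"):
        # Register-register ALU instructions read both rs and rt.
        return load_dest in (if_id["rs"], if_id["rt"])
    if op in ("lw", "sw", "addi", "beqz"):
        # Loads, stores, ALU immediates and branches need rs in EX or earlier.
        return load_dest == if_id["rs"]
    return False
```

The data register of a store (rt) is deliberately not checked here, on the assumption that the loaded value can still be forwarded to the store in time.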


Forwarding of Results to the ALU

Mem output

ALU output
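
The selection between the two forwarding sources named above can be sketched as a small function that decides where one ALU input should come from. The dictionary layout of the EX/MEM and MEM/WB pipeline registers and the name forward_select are assumptions for this example, not the lecture's notation.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose the source of an ALU input for register src_reg.

    ex_mem / mem_wb describe the instructions one and two ahead in the pipe:
    dicts with "dest" (register number or None) and "is_load" (bool),
    or None if that stage holds a bubble.
    """
    # Newest value first: the instruction immediately ahead.
    if ex_mem is not None and ex_mem["dest"] == src_reg:
        if ex_mem["is_load"]:
            # The loaded value does not exist yet; forwarding cannot help.
            return "stall (load interlock)"
        return "ALU output (EX/MEM)"
    # Otherwise the instruction two ahead, which has finished MEM.
    if mem_wb is not None and mem_wb["dest"] == src_reg:
        return "Mem output (MEM/WB)" if mem_wb["is_load"] else "ALU output (MEM/WB)"
    return "register file"
```

Checking EX/MEM before MEM/WB matters when both earlier instructions write the same register: the ALU must see the most recent value.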


Control Hazards Revisited

A branch causes a 3-cycle stall in the 5-stage pipeline:

Branch instruction    IF    ID    EX    MEM   WB
Branch successor+1          IF    stall stall IF    ID    EX    MEM   WB
Branch successor+2                                  IF    ID    EX    MEM   WB
Branch successor+3                                        IF    ID    EX    MEM   WB

Higher overhead than data hazards…

Can HW changes improve that? YES!
• Try to make an early decision on whether a branch is taken or not.


Improved Pipeline – Dealing with Branches

• Additional adder in the ID stage
• Write the PC faster
• Can detect the branch hazard 2 cycles earlier

Note the change of order in the text! Figure A.11 says a branch hazard would stall for 1 cycle. That is after the optimization in Figure A.24.


Reducing Branch Penalties

1. Freeze the pipeline until the outcome of a branch instruction is known

2. Treat every branch as not taken
   • You have to be careful about how to restore the state of the pipeline to the correct place

3. Treat every branch as taken
   • May make sense for machines where the branch target address is known before the branch outcome

4. Delayed branch
   • Execute some instructions until the outcome is known (branch-delay slots)


Branch-Delay Slots

On a machine that needs n cycles before a branch outcome is known:

branch instruction
sequential successor 1
sequential successor 2
……
sequential successor n

The compiler needs to decide on valid and useful successors.

Typically most processors have 1 delay slot.

Limitations of branch delay:
• Restrictions on branch delay instructions
• Ability to predict branch outcome at compile time
  – Most hardware supports nullifying branches, which gives the compiler more flexibility. It can schedule the instruction and later on cancel its effects without violating program correctness


Delayed Branch

• Where to get instructions to fill the branch delay slot?
  – Before branch instruction
  – From the target address: only valuable when branch taken
  – From fall through: only valuable when branch not taken
  – Canceling branches allow more slots to be filled

• Compiler effectiveness for single branch delay slot:
  – Fills about 60% of branch delay slots
  – About 80% of instructions executed in branch delay slots are useful in computation
  – About 50% (60% x 80%) of slots usefully filled

• Delayed branch downside: with 7-8 stage pipelines and multiple instructions issued per clock (superscalar), one delay slot no longer covers the branch delay
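
The "about 50%" figure is the product of the two compiler statistics quoted above. A short check of the arithmetic, with variable names chosen only for this example:

```python
fill_rate = 0.60      # fraction of branch delay slots the compiler fills
useful_rate = 0.80    # fraction of filled slots that do useful work

usefully_filled = fill_rate * useful_rate
print(f"{usefully_filled:.0%} of delay slots usefully filled")
```

The exact product is 48%, which the slide rounds to about 50%.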


Scheduling Branch Delay

Figure annotations:
– Independent instruction
– Cannot be used
– Preferred when branch taken with high probability


Performance of Branch Schemes

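Assuming an ideal CPI of 1:

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles from branches}}$$

$$\text{Pipeline stall cycles from branches} = \text{Branch frequency} \times \text{Branch penalty}$$

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$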
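
To get a feel for the speedup formula, it can be evaluated for a few branch penalties. The numbers below (a 5-stage pipeline, 20% branch frequency, and penalties of 3, 1 and 0 cycles) are illustrative assumptions, not measurements from the lecture; the function name is also made up for this sketch.

```python
def pipeline_speedup(depth, branch_frequency, branch_penalty):
    """Speedup over an unpipelined machine, assuming an ideal CPI of 1
    and that branches are the only source of stalls."""
    stall_cycles_from_branches = branch_frequency * branch_penalty
    return depth / (1 + stall_cycles_from_branches)


# Assumed example values: depth 5, one instruction in five is a branch.
for penalty in (3, 1, 0):
    speedup = pipeline_speedup(depth=5, branch_frequency=0.2, branch_penalty=penalty)
    print(f"branch penalty {penalty}: speedup {speedup:.2f}")
```

With these assumed values, cutting the penalty from 3 cycles to 1 raises the speedup from about 3.1 to about 4.2, which is why resolving branches earlier in the pipeline is worth extra hardware.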


Challenges in Pipeline Implementation

Exceptions: situations that can disrupt the in-order execution of instructions (interrupt, fault, exception)

• I/O device request
• Invoking an OS service from a user program
• Breakpoint
• Integer arithmetic overflow or FP arithmetic anomaly
• Page fault (not in main memory)
• Misaligned memory access
• etc.


Exceptions Requirements

• Synchronous vs. asynchronous
• User requested vs. coerced
• User maskable vs. user non-maskable
• Within vs. between instructions
• Resume vs. terminate

Major challenges:
• Exceptions happening within instructions
• Exceptions that require the instruction to be restarted, as in the case of a page fault


MIPS Exceptions

Pipeline stage   Problem exceptions
IF               Page fault on instruction fetch; misaligned memory access; memory protection violation
ID               Undefined or illegal opcode
EX               Arithmetic exception
MEM              Page fault on data fetch; misaligned memory access; memory protection violation
WB               None


What’s next?

Next lecture:
– MIPS FP pipeline & dynamically scheduled pipelines
– An embedded processor architecture: ARM

Lecture 6:
– Sensor networks and applications
– The connection between architecture and networks
