EENG449b/Savvides Lec 4.1 1/22/04 January 22, 2004 Prof. Andreas Savvides Spring 2004 http://www.eng.yale.edu/courses/ eeng449bG EENG 449bG/CPSC 439bG Computer Systems Lecture 3 Pipelining Part II


EENG449b/SavvidesLec 4.1

1/22/04

January 22, 2004

Prof. Andreas Savvides

Spring 2004

http://www.eng.yale.edu/courses/eeng449bG

EENG 449bG/CPSC 439bG Computer Systems

Lecture 3

Pipelining Part II

Announcements

• Project groups and group meetings
• Project topics
  – A 1-page project proposal is due next Friday, Jan 30 (email it to me)

• Project proposal should include:
  – 1 paragraph project overview. This describes what your project will do.
  – 1 paragraph describing the specific tasks that you need to do
    » E.g., read papers, install tools, learn some special programming language or hardware
  – 1 paragraph on what resources you need for your project
    » E.g., are you using any special hardware?
    » Do you have access to lab/hardware/software?

Instruction Formats Review

Implementing a MIPS Pipeline

We are developing a subset of the MIPS pipeline supporting

– Load and store word
– Branch equal zero
– Integer ALU operations

• Remember that MIPS has register-register ALU instructions (e.g., Add R1, R2, R3)

• Attention: In the homework you will have to redesign the pipeline for register-memory ALU instructions (e.g., Add R1, R2, (R3))!

MIPS Datapath Review

Instruction fetch (IF):

IR ← Mem[PC];

NPC ← PC + 4;

MIPS Datapath Review

Instruction decode / register fetch (ID):

A ← Regs[rs];

B ← Regs[rt];

Imm ← sign-extended immediate field of IR;

MIPS Datapath Review

Execution / effective address (EX):

Memory Ref: ALUOutput ← A + Imm;

or Reg-Reg ALU: ALUOutput ← A func B;

or Reg-Imm: ALUOutput ← A op Imm;

or Branch: ALUOutput ← NPC + (Imm << 2); Cond ← (A == 0)
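The EX-stage register transfers on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ex_stage`, the `kind` strings, and the choice to return `False` for `Cond` on non-branch instructions are all hypothetical and not part of the lecture; the branch case assumes the "branch equal zero" instruction from the subset above.

```python
import operator


def ex_stage(kind, a=0, b=0, imm=0, npc=0, func=operator.add):
    """Return (ALUOutput, Cond) for one instruction in the EX stage.

    kind is a hypothetical label for the instruction class;
    func stands in for the ALU operation selected by the opcode.
    """
    if kind == "mem_ref":
        # Effective address: ALUOutput <- A + Imm
        return a + imm, False
    if kind == "reg_reg":
        # ALUOutput <- A func B
        return func(a, b), False
    if kind == "reg_imm":
        # ALUOutput <- A op Imm
        return func(a, imm), False
    if kind == "branch":
        # Target: ALUOutput <- NPC + (Imm << 2); Cond <- (A == 0)
        return npc + (imm << 2), a == 0
    raise ValueError(f"unknown instruction kind: {kind}")
```

For example, `ex_stage("branch", a=0, imm=3, npc=20)` computes the target 20 + 12 = 32 and reports the condition as true, so the MEM stage would then load the PC from ALUOutput.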

MIPS Datapath Review

Memory access / branch completion (MEM):

Mem ref: LMD ← Mem[ALUOutput]; or Mem[ALUOutput] ← B;

Branch: if (cond) PC ← ALUOutput

MIPS Basic Pipeline

Data needs to be written in the registers at the end of each cycle

[Figure annotations: the value written back depends on the instruction type (load or ALU operation): LMD or ALUOut]

Events at every pipe stage

Events at every pipe stage

Hazards Review

From the previous lecture we know the situations that would cause incorrect execution:

• Structural Hazards
• Data Hazards
• Control Hazards

Three Generic Data Hazards

• Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it

  I: add r1,r2,r3
  J: sub r4,r1,r3

• Caused by a “Data Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

Three Generic Data Hazards

• Write After Read (WAR): InstrJ writes an operand before InstrI reads it

  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5

Three Generic Data Hazards

• Write After Write (WAW): InstrJ writes an operand before InstrI writes it

  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.

• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Writes are always in stage 5

• Will see WAR and WAW in later, more complicated pipes
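The three cases can be checked mechanically by comparing the destination and source registers of an earlier instruction I and a later instruction J. The sketch below is illustrative only: the `Instr` class and the `dependences` function are hypothetical names, and the result lists the register dependences that could become hazards. Whether a given pipeline actually exposes them is a separate question (as noted, WAR and WAW cannot occur in the basic 5-stage MIPS pipeline).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A three-operand register instruction: op dest, src1, src2."""
    op: str
    dest: str
    srcs: tuple


def dependences(i: Instr, j: Instr) -> set:
    """Classify dependences from earlier instruction i to later instruction j."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes
    if j.dest in i.srcs:
        found.add("WAR")  # j overwrites what i still has to read
    if j.dest == i.dest:
        found.add("WAW")  # both write the same register
    return found


# The I/J pairs from the three slides.
raw = dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3")))
war = dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3")))
waw = dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3")))
```

With these inputs, `raw`, `war`, and `waw` each contain exactly the one dependence named on the corresponding slide.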

MIPS Basic Pipeline

Instruction issued

IF ID EX MEM WB

Data Hazards can be detected here

Hardware Hazard Detection

• Figure A.20

Logic to Detect Load Interlocks

• Figure A.21
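The figure itself is not reproduced here. As a rough illustration of what load-interlock detection involves, the sketch below compares the destination of a load sitting in the ID/EX pipeline register against the source fields of the instruction in IF/ID. Everything in it is an assumption made for illustration: the dictionary layout, the opcode class names, and the function name are hypothetical, and the exact set of comparisons should be checked against Figure A.21 in the textbook.

```python
# Opcode classes (hypothetical names) that read a register through the rs field.
READS_RS = {"alu_rr", "alu_imm", "load", "store", "branch"}
# Opcode classes that also read a register through the rt field in this sketch.
READS_RT = {"alu_rr"}


def needs_load_interlock(id_ex: dict, if_id: dict) -> bool:
    """Return True if the instruction in ID must stall behind a load in EX."""
    if id_ex["op"] != "load":
        return False
    loaded = id_ex["rt"]  # a MIPS load writes the register named by rt
    if if_id["op"] in READS_RS and if_id["rs"] == loaded:
        return True
    if if_id["op"] in READS_RT and if_id["rt"] == loaded:
        return True
    return False
```

A load into r1 followed immediately by a register-register ALU instruction that reads r1 would return `True` here, which corresponds to inserting one stall cycle before the dependent instruction issues.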

Forwarding of Results to the ALU

Mem output

ALU output
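One way to picture the forwarding decision is as a multiplexer select for each ALU input: take the newest in-flight value for the source register if one exists, otherwise read the register file. The sketch below is a simplification under stated assumptions: the function name and the string labels are hypothetical, and a destination is passed as `None` when the instruction in that pipeline register writes no register or its value is not yet available (for example, a load still waiting on memory, which needs an interlock instead).

```python
def alu_operand_source(src, ex_mem_dest, mem_wb_dest):
    """Choose where an ALU source operand comes from.

    src          -- register the instruction in EX wants to read
    ex_mem_dest  -- destination of the instruction one ahead (or None)
    mem_wb_dest  -- destination of the instruction two ahead (or None)
    """
    if ex_mem_dest == src:
        return "EX/MEM"   # newest value wins: the previous ALU output
    if mem_wb_dest == src:
        return "MEM/WB"   # older ALU output or data returned by memory
    return "REGFILE"      # no in-flight producer; read the register file
```

Checking EX/MEM before MEM/WB matters when both in-flight instructions write the same register: the more recent producer must take priority.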

Control Hazards Revisited

A branch causes a 3-cycle stall in the 5-stage pipeline

Branch Instruction    IF    ID    EX    MEM   WB
Branch Successor+1          IF    stall stall IF    ID    EX    MEM   WB
Branch Successor+2                                  IF    ID    EX    MEM   WB
Branch Successor+3                                        IF    ID    EX    MEM   WB

Higher overhead than data hazards…

Can HW changes improve that? YES!
• Try to make an early decision whether a branch is taken or not.

Improved Pipeline – Dealing with Branches

Additional adder in ID stage

Write the PC faster

Can detect branch hazard 2 cycles earlier

Improved Pipeline – Dealing with Branches

Additional adder in ID stage

Write the PC faster

Note the change of order in the text! Figure A.11 says a branch hazard would stall for 1 cycle. This is after the optimization in Figure A.24.

Reducing Branch Penalties

1. Freeze the pipeline until the outcome of a branch instruction is known

2. Treat every branch as not taken
   • You have to be careful about how to restore the pipeline to the correct state

3. Treat every branch as taken
   • May make sense for machines where the branch target address is known before the branch outcome

4. Delayed branch
   • Execute some instructions until the outcome is known (branch-delay slots)

Branch-Delay Slots

On a machine that needs n cycles before a branch outcome is known:

  branch instruction
  sequential successor 1
  sequential successor 2
  ……
  sequential successor n

The compiler needs to decide on valid and useful successors.

Typically most processors have 1 delay slot

Limitations of branch delay:
• Restrictions on branch delay instructions
• Ability to predict branch outcome at compile time
  – Most hardware supports nullifying branches, which gives the compiler more flexibility. It can schedule the instruction and later cancel its effects without violating program correctness

Delayed Branch

• Where to get instructions to fill the branch delay slot?
  – Before branch instruction
  – From the target address: only valuable when branch taken
  – From fall through: only valuable when branch not taken
  – Canceling branches allow more slots to be filled

• Compiler effectiveness for single branch delay slot:
  – Fills about 60% of branch delay slots
  – About 80% of instructions executed in branch delay slots useful in computation
  – About 50% (60% x 80%) of slots usefully filled

• Delayed Branch downside: 7-8 stage pipelines, multiple instructions issued per clock (superscalar)
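The "about 50%" figure is just the product of the two rates above. A small sketch of the arithmetic follows; the function name is hypothetical and the rates are the approximate ones quoted on the slide.

```python
def usefully_filled_fraction(fill_rate: float, useful_rate: float) -> float:
    """Fraction of delay slots that hold an instruction doing useful work."""
    return fill_rate * useful_rate


# Slide figures: ~60% of slots filled, ~80% of those useful.
fraction = usefully_filled_fraction(0.60, 0.80)
print(f"{fraction:.0%} of delay slots usefully filled")
```

The exact product is 48%, which the slide rounds to about 50%. In other words, roughly every other delay slot still costs a cycle of wasted work.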

Scheduling Branch Delay

Independent instruction

Cannot be used

Preferred when branch taken w/ high prob

Performance of Branch Schemes

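Assuming an ideal CPI of 1:

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles from branches}}$$

$$\text{Pipeline stall cycles from branches} = \text{Branch frequency} \times \text{Branch penalty}$$

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$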
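To get a feel for the formula (pipeline speedup equals pipeline depth divided by one plus branch frequency times branch penalty), the sketch below evaluates it for the 5-stage pipeline with the 3-cycle and 1-cycle branch penalties discussed earlier. The 20% branch frequency is an assumed, illustrative value and does not come from the lecture; the function name is likewise hypothetical.

```python
def pipeline_speedup(depth: int, branch_frequency: float, branch_penalty: float) -> float:
    """Speedup over an unpipelined machine, assuming an ideal CPI of 1."""
    stall_cycles = branch_frequency * branch_penalty  # average stalls per instruction
    return depth / (1 + stall_cycles)


BRANCH_FREQUENCY = 0.20  # assumption for illustration only

for penalty in (3, 1):
    speedup = pipeline_speedup(5, BRANCH_FREQUENCY, penalty)
    print(f"penalty {penalty} cycle(s): speedup {speedup:.2f}")
```

Under that assumption, cutting the penalty from 3 cycles to 1 raises the speedup from roughly 3.1 to roughly 4.2, out of an ideal 5.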

Challenges in Pipeline Implementation

Exceptions: Situations that can disrupt the in-order execution of instructions (interrupt, fault, exception)

• I/O device request
• Invoking an OS service from a user program
• Breakpoint
• Integer arithmetic overflow or FP arithmetic anomaly
• Page fault (not in main memory)
• Misaligned memory access, etc.

Exceptions Requirements

• Synchronous vs. asynchronous
• User requested vs. coerced
• User maskable vs. user non-maskable
• Within vs. between instructions
• Resume vs. terminate

Major challenges:
• Exceptions happening within instructions
• Exceptions after which the instruction must be restarted, as in the case of a page fault

MIPS Exceptions

Pipeline Stage    Problem Exceptions

IF                Page fault on instruction fetch; misaligned memory access; memory protection violation

ID                Undefined or illegal opcode

EX                Arithmetic exception

MEM               Page fault on data fetch; misaligned memory access; memory protection violation

WB                None

What’s next?

Next lecture:
– MIPS FP Pipeline & Dynamically Scheduled Pipelines
– An embedded processor architecture: ARM

Lecture 6:
– Sensor networks and applications
– The connection between architecture and networks