86
PIPELINING B649 Parallel Architectures and Programming

PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

PIPELINING

B649Parallel Architectures and Programming

Page 2: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Announcements

• Blogs• Presentation topics and teams

★ already taken:✴ ISA (App. J)✴ Hardware and software for VLIW and EPIC (App. G)✴ Large Scale Multiprocessors and Scientific Applications (App. H)

• Heads-up★ Blog question★ Assignment 1

2

Page 3: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Why Pipelining?

• Instruction-Level Parallelism (ILP)• Reducing Cycles Per Instruction (CPI)

★ if instructions may take multiple cycles

• Decreasing the clock cycle time★ if each instruction takes one (long) cycle

• Invisible to the programmer

3

u = Time per instruction on unpipelined machinen = Number of pipelined stagestime per pipelined instruction = u / n

Page 4: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Basics of a RISC Instruction Set

• All operations on data registers• Only load and store access memory• All instructions are of one (fixed) size• MIPS64 (64-bit) instructions for example

4

Page 5: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Instruction Set Overview

• ALU instructions★ R1 ← R2 op R3

✴ R1, R2, R3: registers★ R1 ← R2 op I

✴ I: signed extended 16-bit immediate

• Load and store instructions★ LD R1:O, R2

✴ R1, R2: registers, O: 16-bit signed extended 16-bit immediate

• Branch and jump instructions★ comparison between two registers, or register and zero★ no unconditional jump

5

Page 6: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Digression: Recall Multiplexer

6

Page 7: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

• Positive Number: extend with zeroes

• Negative number: extend with ones

Digression: Sign Extension

7

x (16 bits) =

x (32 bits) =

0 x

0 000000000000000 x

-x (16 bits) =

-x (32 bits) =

1 x

1 111111111111111 x

Page 8: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

• Positive Number: extend with zeroes

• Negative number: extend with ones

Digression: Sign Extension

7

x (16 bits) =

x (32 bits) =

0 x

0 000000000000000 x

-x (16 bits) =

-x (32 bits) =

1 x

1 111111111111111 x

-x (16 bits) = (2¹⁶−x−1)-x (extended) = (2¹⁶−x−1) + 2¹⁶(2¹⁶−1) = (2³²−x−1) = -x (32 bits)

Page 9: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

8

Page 10: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle★ fetch current instruction from PC, add 4 to PC

• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

9

Page 11: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle

★ decode instruction, read registers (fixed field decoding)★ do equality test on registers for possible branch★ sign extend offset field, in case it is needed★ add offset to possible branch target address

• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

10

Page 12: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle

★ memory reference: base address+offset to compute effective address

★ register-register ALU instruction: perform the operation★ register-immediate ALU instruction: perform the operation

• MEM: Memory access cycle• WB: Write-back cycle

11

Page 13: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle

★ read or write based on effective address computed in last cycle

• WB: Write-back cycle

12

Page 14: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

★ register-register ALU instruction or Load: write result (computed or loaded from memory) into register file

13

Page 15: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

14

branch = 2 cyclesstore = 4 cyclesall else = 5 cyclesCPI = 4.54, assuming 12% branches, 10% stores

Page 16: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple Pipelined Implementation

15

Page 17: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Some Considerations

16

• Resource evaluation★ avoid resource conflicts across stages

• Separate instruction and data memories★ typically, with separate I and D caches

• Register access★ write in first half, read in second half

• PC not shown★ also need an adder to compute branch target★ branch does not change PC until ID (second) stage

✴ ignore for now!

Page 18: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Prevent Interference

17

Page 19: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Observations

18

• Each instruction takes the same number of cycles• Instruction throughput increases

★ hence programs run faster

• Imbalance among pipeline stages reduces performance

• Overheads★ pipeline delays (register setup time)★ clock skew (clock cycle ≥ clock skew + latch overhead)

• Hazards ahead!

Page 20: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Pipeline Hazards

• Structural hazards★ not all instruction combinations possible in parallel

• Data hazards★ data dependence

• Control hazards★ control dependence

19

Page 21: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Pipeline Hazards

• Structural hazards★ not all instruction combinations possible in parallel

• Data hazards★ data dependence

• Control hazards★ control dependence

19

Hazards make it necessary to stall the pipeline

Page 22: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Quantifying the Stall Cost

20

Average instruction time unpipelinedSpeedup = Average instruction time pipelined

CPI unpipelined × Clock cycle unpipelined = CPI pipelined × Clock cycle pipelined

CPI pipelined = Ideal CPI + Pipeline stall cycles per instruction = 1 + Pipeline stall cycles per instruction

Ignoring pipeline overheads, assuming balanced stages,

Clock cycle unpipelined = Clock cycle pipelined

CPI Unpipelined (≈ Pipeline depth)Speedup = 1 + Pipeline stall cycles per instruction

Page 23: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

STRUCTURAL HAZARDS

Page 24: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Structural Hazard Example: Mem. Port Conflict

22

Page 25: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

DATA HAZARDS

Page 26: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Data Hazard Types

• RAW: Read After Write★ true dependence

• WAR: Write After Read★ anti-dependence

• WAR: Write After Write★ output dependence

• RAR: Read After Read★ input dependence

24

Page 27: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Data Hazard Types

• RAW: Read After Write★ true dependence

• WAR: Write After Read★ anti-dependence

• WAR: Write After Write★ output dependence

• RAR: Read After Read★ input dependence

24

Page 28: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Data Hazard Example

25

Page 29: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Ameliorating Data Hazards

26

• Idea:★ ALU results from EX/MEM and MEM/WB registers fed

back to ALU inputs★ if previous ALU operation wrote the register needed by the

current operation, select the forwarded result

Page 30: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Forwarding

27

Page 31: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Ameliorating Data Hazards

28

• Idea:★ ALU results from EX/MEM and MEM/WB registers fed

back to ALU inputs★ if previous ALU operation wrote the register needed by the

current operation, select the forwarded result

• Observations:★ forwarding needed across multiple cycles (how many?)★ forwarding may be implemented across functional units

✴ e.g., output of one unit may be forwarded to input of another, rather than the input of just the same unit

Page 32: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Forwarding Across Multiple Units

29

Page 33: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Not All Stalls Avoided

30

Page 34: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

CONTROL HAZARDS

Page 35: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Handling Branch Hazard

32

Page 36: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Reducing Branch Penalty

33

• “Freeze” or “flush” the pipeline• Treat every branch as not-taken (“predicted-untaken”)

★ need to handle taken branches by roll-back

• Treat every branch as taken (“predicted-taken”)• Delayed branch

Page 37: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Freezing Pipeline

34

Page 38: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Delayed Branch

35

branch instructionsequential successor_1branch target if taken

Page 39: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Behavior of Delayed Branch

36

Page 40: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Schedule of Branch Delay Slot

37

Page 41: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Schedule of Branch Delay Slot

37

Page 42: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

HOW IS PIPELINING IMPLEMENTED?

Page 43: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation

• Instruction fetch cycle (IF)• Instruction decode/register fetch cycle (ID)• Execution / effective address cycle (EX)• Memory access / branch completion cycle (MEM)• Write-back cycle (WB)

39

Page 44: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: IF(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Fetch

40

IR ← Mem[PC];NPC ← PC + 4;

Page 45: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Decode

41

A ← Regs[rs];B ← Regs[rt];Imm ← sign-extended immediate field of IR;

Page 46: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Execution★ Memory reference

★ Register-Register ALU instruction

★ Register-Immediate ALU instruction

★ Branch

42

ALUOutput ← A + Imm;

ALUOutput ← A func B;

ALUOutput ← A op Imm;

ALUOutput ← NPC + (Imm << 2);Cond ← (A == 0);

Page 47: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Memory access / branch completion★ Memory reference

★ Branch

43

LMD ← Mem[ALUOutput] orMem[ALUOutput] ← B;

if (cond) PC ← ALUOutput;

Page 48: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Write-back★ Register-Register ALU instruction

★ Register-Immediate ALU instruction

★ Load instruction

44

Regs[rd] ← ALUOutput;

Regs[rt] ← ALUOutput;

Regs[rt] ← LMD;

Page 49: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

MIPS Data Path

45

Page 50: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

MIPS Data Path: Pipelined

46

Page 51: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Situations for Data Hazard

47

Page 52: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Logic to Detect Data Hazards

48

Page 53: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Forwarding

49

Page 54: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Forwarding

49

Page 55: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Forwarding Implemented

50

Page 56: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Reducing the Branch Delay

51

Page 57: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

EXCEPTIONS

Page 58: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Types of Exceptions

• I/O device request• Invoking an OS service• Tracing• Breakpoint• Integer arithmetic overflow• FP arithmetic anomaly• Page fault• Misaligned memory access• Memory protection violation• Undefined or unimplemented instruction• Hardware malfunction• Power failure

53

Page 59: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Exception Categories

• Synchronous vs Asynchronous• User requested vs coerced• User maskable vs nonmaskable• Within vs between instructions• Resume vs terminate

54

Page 60: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Exception Categorization

55

Page 61: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Handling Exceptions

56

• Force a “trap” into the pipeline on the next IF• Turn of all writes until the trap is taken off

★ place zeros into pipeline latches of instructions following the one causing exception

• Exception handler saves PC and returns to it• Delayed branches cause a problem

★ need to save delay slots plus one number of PCs

Page 62: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

(Precise) Exceptions in MIPS

57

Page 63: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

(Precise) Exceptions in MIPS

57

Page 64: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

(Precise) Exceptions in MIPS

57

Use exception status vector for each instruction

Page 65: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Complications due to Complex Instructions

58

• Instructions that change processor states before they are committed★ autoincrement★ string copy (change memory state)

• State bits★ implicitly condition codes (flags)

✴ instructions setting condition codes not allowed in delay slots

• Multicycle operations★ use of microinstructions

Page 66: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

ADDING FLOATING POINT SUPPORT

Page 67: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Functional Units

• Main integer unit★ handles loads, stores, integer ALU operations, branches

• FP and integer multiplier• FP adder

★ handles FP add, subtract, and conversion

• FP and integer divider

60

Page 68: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline

61

Page 69: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline: Expanded View

62

Page 70: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline: Expanded View

62

Page 71: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline: Expanded View

62

Page 72: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Additional Complications

63

• Structural hazard because divide unit is not pipelined

• Number of register writes in a cycle may more than one

• WAW hazards possible★ WAR hazards not possible

• Maintaining precise exceptions★ out-of-order completion

• More number of stalls due to RAW hazards, due to longer pipeline

Page 73: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Example: RAW Hazard

64

Page 74: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Example 2

65

Page 75: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Handling the Problems

66

• Register file conflict detection★ detect at ID stage★ detect at MEM or WB stage

Page 76: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Handling the Problems

67

• Register file conflict detection★ detect at ID stage, or★ detect at MEM or WB stage

• WAW hazard detection★ delay the issue of second instruction, or★ stamp out the result of the first instruction (convert it into

noop)

Page 77: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Summary of Checks

• Check for structural hazards• Check for RAW data hazard• Check for WAW data hazard

68

Page 78: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Maintaing Precise Exceptions

• Ignore the problem★ settle for imprecise exceptions

• Buffer the results★ simple buffer could be very big★ history file★ future file

• Let trap handling routines clean up• Hybrid scheme: allow issue when earlier instructions can

no longer raise exception

69

Page 79: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

DYNAMIC SCHEDULING

Page 80: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Idea Behind Dynamic Scheduling

• Split ID stage:★ Issue -- decode, check for structural hazards★ Read operands -- wait until no data hazards, then read

• Other stages remain as before

71

Page 81: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Idea Behind Dynamic Scheduling

• Split ID stage:★ Issue -- decode, check for structural hazards★ Read operands -- wait until no data hazards, then read

• Other stages remain as before

71

DIV.D F0, F2, F4ADD.D F10,F0,F8SUB.D F8,F8,F14

Must take care of WAR and WAW hazards.

Page 82: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding (CDC 6600)

72

Page 83: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding (CDC 6600)

73

• Issue★ check for free functional units★ check if another instruction has the same destination

register (WAW hazard detection)

• Read operands★ monitor availability of source operands (RAW hazard

detection)

• Execution★ functional unit notifies scoreboard of completion

• Write result★ check for WAR hazards, stall write if necessary

Page 84: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding: Effectiveness

• Relative easy to implement the logic★ only as much as a functional unit★ but four times as many buses as without scoreboard

• Reduces Clocks Per Instruction (CPI)★ tries to make use of the available Instruction Level

Parallelism (ILP)★ 1.7x speedup for Fortran, 2.5x for hand-coded assembly

74

Page 85: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding: Limitations

• Overlapping instructions must be picked from a single basic block

• Window: number of scoreboard entries• Number and types of functional units• Presence of anti- and output-dependences

75

Page 86: PIPELINING - Indiana University Bloomington · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles

B649: Parallel Architectures and Programming, Spring 2009

Pitfalls

• Unexpected execution sequences may cause unexpected hazards

• Extensive pipelining can impact other aspects of a design

• Evaluating dynamic or static scheduling on the basis of unoptimized code

76

BNEZ R1, foo DIV.D F0,F2,F4 ... ...foo: L.D F0,qrs