EECS 470 Lecture 5: Scoreboard Scheduling
Fall 2019
Prof. Ronald Dreslinski
http://www.eecs.umich.edu/courses/eecs470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar

Many thanks to Profs. Martin and Roth of University of Pennsylvania for most of these slides. Portions developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.


Readings

For Today:

• H & P Chapter C.5-C.7, 3.1-3.3, 3.10

For Monday:

• H & P Chapter 3.4-3.6
• D. Sima, “Design Space of Register Renaming Techniques”
  • Paper is linked from the Readings page


Load Delay Slot (MIPS R2000)

        t0  t1  t2  t3  t4  t5
i:      F   D   E   M   W
j:          F   D   E   M   W
k:              F   D   E   M

h: Rk ← --
   ……
i: Rk ← MEM[ - ]
j: -- ← Rk
k: -- ← Rk

Which (Rk) do we really mean?

- The effect of a “delayed” load is not visible to the instructions in its delay slots.


Control Hazards

beq 1 1 10
sub 3 4 5

        t0  t1  t2  t3  t4  t5
beq     F   D   E   M   W
sub         F   D   E   M   W

→ sub is squashed


Handling Control Hazards

Avoidance (static)
  • No branches?
  • Convert branches to predication
    – Control dependence becomes data dependence

Detect and Stall (dynamic)
  • Stop fetch until branch resolves

Speculate and squash (dynamic)
  • Keep going past branch, throw away instructions if wrong


Avoidance: if-conversion

if (a == b) {
    x++;
    y = n / d;
}

sub      t1 ← a, b
jnz      t1, PC+2
add      x ← x, #1
div      y ← n, d

sub      t1 ← a, b
add(t1)  x ← x, #1
div(t1)  y ← n, d

sub      t1 ← a, b
add      t2 ← x, #1
div      t3 ← n, d
cmov(t1) x ← t2
cmov(t1) y ← t3
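A minimal Python sketch of the cmov-style if-conversion above, assuming integer operands and d != 0. The predicate is taken to be true when t1 == 0 (that is, when a == b); the function names are hypothetical. Note that the converted form executes the divide unconditionally, so real hardware must also keep it from raising exceptions when the predicate is false.

```python
def branchy(a, b, x, y, n, d):
    # Original control flow: the body runs only when a == b
    if a == b:
        x += 1
        y = n // d
    return x, y


def if_converted(a, b, x, y, n, d):
    # Both arms are computed unconditionally into temporaries...
    t1 = a - b                  # sub  t1 <- a, b  (t1 == 0 exactly when a == b)
    t2 = x + 1                  # add  t2 <- x, #1
    t3 = n // d                 # div  t3 <- n, d  (executes even when a != b)
    # ...then conditional moves commit them only if the predicate holds.
    # Each is a select between two values, not a jump.
    x = t2 if t1 == 0 else x    # cmov x <- t2
    y = t3 if t1 == 0 else y    # cmov y <- t3
    return x, y


print(branchy(3, 3, 0, 0, 10, 2), if_converted(3, 3, 0, 0, 10, 2))  # (1, 5) (1, 5)
print(branchy(3, 4, 0, 0, 10, 2), if_converted(3, 4, 0, 0, 10, 2))  # (0, 0) (0, 0)
```

Both versions agree; in the second, what was a control dependence on a == b has become a data dependence on t1.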


Handling Control Hazards: Detect & Stall

Detection
  • In decode, check if opcode is branch or jump

Stall
  • Hold next instruction in Fetch
  • Pass noop to Decode


Problems with Detect & Stall

CPI increases on every branch

Are these stalls necessary? Not always!
  • Branch is only taken about half the time
  • Assume branch is NOT taken
    – Keep fetching, treat branch as noop
    – If wrong, make sure bad instructions don’t complete
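The CPI cost described above can be sketched with simple arithmetic. Assumptions not taken from the slides: a base CPI of 1, a hypothetical workload with 20% branches, and a 3-cycle penalty per stalled or squashed branch (for example, the three Fetch/Decode/Execute slots that a squash overwrites); "taken half the time" is the slide's rough figure. The function names are illustrative only.

```python
def cpi_detect_and_stall(base_cpi, branch_frac, penalty):
    # Every branch holds Fetch for `penalty` cycles until it resolves
    return base_cpi + branch_frac * penalty


def cpi_assume_not_taken(base_cpi, branch_frac, taken_frac, penalty):
    # Not-taken branches cost nothing; taken ones squash `penalty` wrong-path insns
    return base_cpi + branch_frac * taken_frac * penalty


# Hypothetical workload: 20% branches, half of them taken, 3-cycle penalty
print(cpi_detect_and_stall(1.0, 0.20, 3))        # ≈ 1.6
print(cpi_assume_not_taken(1.0, 0.20, 0.5, 3))   # ≈ 1.3
```

Assuming "not taken" removes the penalty for every branch that really falls through, which is why it beats stalling whenever taken_frac < 1.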


Handling Control Hazards: Speculate & Squash

Speculate
  • Assume branch is not taken

Squash
  • Overwrite opcodes in Fetch, Decode, Execute with noop
  • Pass target to Fetch


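[Figure: pipelined datapath (PC, instruction memory, register file, sign extend, control, ALU with equality test, data memory; latches IF/ID, ID/EX, EX/Mem, Mem/WB) with beq, sub, add, nand in flight; when the beq is taken, noops overwrite the three younger insns in Fetch, Decode and Execute]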


Problems with Speculate & Squash

Always assumes branch is not taken
Can we do better? Yes.
  • Predict branch direction and target!
  • Why possible? Program behavior repeats.

More on branch prediction to come...


Branch Delay Slot (MIPS, SPARC)

i: beq 1, 2, tgt
j: add 3, 4, 5      ← What can we put here?

Without a delay slot (taken branch):
          t0  t1  t2  t3  t4  t5
branch:   F   D   E   M   W
next:         F   (squash)
target:           F   D   E   M

With a delay slot:
branch:   F   D   E   M   W
delay:        F   D   E   M   W
target:           F   D   E   M   W

- Instruction in delay slot executes even on taken branch


Pipeline Hazard Checklist

Memory Data Dependences
  • Output Dependence (WAW)
  • Anti Dependence (WAR)
  • True Data Dependence (RAW)

Register Data Dependences
  • Output Dependence (WAW)
  • Anti Dependence (WAR)
  • True Data Dependence (RAW)

Control Dependences


Instruction Level Parallelism


Limitations of Scalar Pipelines

Upper Bound on Scalar Pipeline Throughput
  • Limited by IPC = 1

Inefficient Unification Into Single Pipeline
  • Long latency for each instruction

Performance Lost Due to Rigid In-order Pipeline
  • Unnecessary stalls


Architectures for Instruction-Level Parallelism


Superscalar Machine


What is the real problem?

CPI of in-order pipelines degrades very sharply if the machine parallelism is increased beyond a certain point, i.e., when N×M approaches the average distance between dependent instructions
  • Forwarding is no longer effective
  • Pipeline may never be full due to frequent dependency stalls!


ILP: Instruction-Level Parallelism

ILP is a measure of the amount of inter-dependencies between instructions: the fewer the dependences, the more instructions can execute in parallel

Average ILP = no. of instructions / no. of cycles required

code1: ILP = 1
  • i.e., must execute serially
code2: ILP = 3
  • i.e., can execute at the same time

code1: r1 ← r2 + 1
       r3 ← r1 / 17
       r4 ← r0 - r3

code2: r1 ← r2 + 1
       r3 ← r9 / 17
       r4 ← r0 - r10
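A minimal Python sketch of the average-ILP formula applied to code1 and code2. It assumes 1-cycle latency, unlimited hardware, and that only RAW (true) dependences constrain execution; the function name and the (dest, sources) encoding are hypothetical.

```python
def average_ilp(insns):
    """insns: (dest, [srcs]) in program order; returns no. insns / no. cycles."""
    value_ready = {}        # reg -> cycle in which its newest value is produced
    last = 0
    for dest, srcs in insns:
        # An insn executes one cycle after its latest producer (cycle 1 if none)
        cycle = 1 + max((value_ready.get(s, 0) for s in srcs), default=0)
        value_ready[dest] = cycle
        last = max(last, cycle)
    return len(insns) / last


code1 = [("r1", ["r2"]), ("r3", ["r1"]), ("r4", ["r0", "r3"])]
code2 = [("r1", ["r2"]), ("r3", ["r9"]), ("r4", ["r0", "r10"])]
print(average_ilp(code1))   # 1.0
print(average_ilp(code2))   # 3.0
```

code1 forms a three-long RAW chain (3 insns / 3 cycles), while code2's three insns are independent and all fit in one cycle.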


Purported Limits on ILP

Weiss and Smith [1984]        1.58
Sohi and Vajapeyam [1987]     1.81
Tjaden and Flynn [1970]       1.86
Tjaden and Flynn [1973]       1.96
Uht [1986]                    2.00
Smith et al. [1989]           2.00
Jouppi and Wall [1988]        2.40
Johnson [1991]                2.50
Acosta et al. [1986]          2.79
Wedig [1982]                  3.00
Butler et al. [1991]          5.8
Melvin and Patt [1991]        6
Wall [1991]                   7
Kuck et al. [1972]            8
Riseman and Foster [1972]     51
Nicolau and Fisher [1984]     90


The Problem With In-Order Pipelines

What’s happening in cycle 4?
  • mulf stalls due to RAW hazard
    – OK, this is a fundamental problem
  • subf stalls due to pipeline hazard
    – Why? subf can’t proceed into D because mulf is there
    – That is the only reason, and it isn’t a fundamental one

Why can’t subf go into D in cycle 4 and E+ in cycle 5?

                 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
addf f0,f1,f2    F   D   E+  E+  E+  W
mulf f2,f3,f2        F   D   d*  d*  E*  E*  E*  E*  E*  W
subf f0,f1,f4            F   p*  p*  D   E+  E+  E+  W


Necessary Conditions for Data Hazards

WAW Hazard              WAR Hazard              RAW Hazard
i: rk ← _  (Reg Write)  i: _ ← rk  (Reg Read)   i: rk ← _  (Reg Write)
j: rk ← _  (Reg Write)  j: rk ← _  (Reg Write)  j: _ ← rk  (Reg Read)

(i accesses rk in stage X; j accesses rk in stage Y)

dist(i,j) ≤ dist(X,Y) ⇒ Hazard!!
dist(i,j) > dist(X,Y) ⇒ Safe

[Figure label: Hazard Distance]
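A minimal Python sketch of the dist(i,j) ≤ dist(X,Y) rule. Assumptions: a classic F, D, E, M, W in-order pipeline with one insn entering per cycle, no stalls and no forwarding, and a same-cycle access counted as a hazard (a register file that writes before it reads within a cycle would make distance 3 safe). The stage list and function names are hypothetical.

```python
STAGES = ["F", "D", "E", "M", "W"]   # assumed 5-stage in-order pipeline


def stage_distance(x, y):
    # dist(X,Y): how many stages X lies after Y
    return STAGES.index(x) - STAGES.index(y)


def is_hazard(dist_ij, x, y):
    """Older insn i accesses rk in stage x; j, dist_ij insns later, accesses rk in y.

    i reaches x at cycle index(x); j reaches y at cycle dist_ij + index(y).
    j gets there no later than i exactly when dist_ij <= dist(X,Y).
    """
    return dist_ij <= stage_distance(x, y)


# RAW: i writes in W, j reads in D
print([is_hazard(d, "W", "D") for d in (1, 2, 3, 4)])   # [True, True, True, False]
# WAR: i reads in D, j writes in W -> dist(X,Y) < 0, never a hazard in order
print(is_hazard(1, "D", "W"))   # False
# WAW: both write in W -> dist(X,Y) = 0, never a hazard in order
print(is_hazard(1, "W", "W"))   # False
```

In a single in-order pipeline only RAW can satisfy the condition; WAR and WAW become possible once insns stop moving through the stages in order.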


Dynamic Scheduling: The Big Picture

  • Instructions fetched/decoded/renamed into Instruction Buffer
    – Also called “instruction window” or “instruction scheduler”
  • Instructions (conceptually) check ready bits every cycle
    – Execute when ready

[Figure: pipeline (I$, BP, D, insn buffer, S, regfile, D$) holding the renamed insns add p2,p3,p4 / sub p2,p4,p5 / mul p2,p5,p6 / div p4,4,p7, with a Ready Table over P2–P7 whose entries turn to “Yes” as values become available]
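A minimal Python sketch of "check ready bits every cycle, execute when ready" for the four renamed insns. Assumptions: unlimited functional units, 1-cycle latency, p2 and p3 ready at the start, and the immediate 4 in div omitted from the source list; the function name and encoding are hypothetical.

```python
def dataflow_schedule(insns, initially_ready):
    """insns: (name, dest, [srcs]) in program order; returns name -> execute cycle."""
    ready = set(initially_ready)      # ready table: physical regs holding values
    waiting = list(insns)             # instruction buffer
    issued, cycle = {}, 0
    while waiting:
        cycle += 1
        # Every waiting insn checks its source ready bits (conceptually in parallel)
        fire = [ins for ins in waiting if all(s in ready for s in ins[2])]
        if not fire:
            raise RuntimeError("no waiting insn can ever become ready")
        for name, dest, _ in fire:
            issued[name] = cycle
            ready.add(dest)           # result is visible from the next cycle
        waiting = [ins for ins in waiting if ins not in fire]
    return issued


insns = [("add", "p4", ["p2", "p3"]), ("sub", "p5", ["p2", "p4"]),
         ("mul", "p6", ["p2", "p5"]), ("div", "p7", ["p4"])]
print(dataflow_schedule(insns, ["p2", "p3"]))
# {'add': 1, 'sub': 2, 'div': 2, 'mul': 3}
```

div executes before the older mul: once renaming has removed the false dependences, only RAW edges order execution.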


Register Renaming
  • Anti (WAR) and output (WAW) dependencies are false
    – The dependence is on name/location rather than data
    – Given infinite registers, WAR/WAW can always be eliminated
    – Renaming removes these dependencies, but leaves RAW intact
  • Example
    – Names: r1, r2, r3
    – Locations: p1, p2, p3, p4, p5, p6, p7
    – Original mapping: r1→p1, r2→p2, r3→p3; p4–p7 are “free”

MapTable        FreeList       Orig. insns      Renamed insns
r1  r2  r3
p1  p2  p3      p4,p5,p6,p7    add r2,r3,r1     add p2,p3,p4
p4  p2  p3      p5,p6,p7       sub r2,r1,r3     sub p2,p4,p5
p4  p2  p5      p6,p7          mul r2,r3,r3     mul p2,p5,p6
p4  p2  p6      p7             div r1,4,r1      div p4,4,p7
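A minimal Python sketch of the map-table/free-list renaming step that reproduces the table above. Insns are written as (op, src1, src2, dest) in the slide's operand order; non-register operands such as the immediate 4 pass through unchanged. The sketch never returns old physical registers to the free list, and the function name is hypothetical.

```python
def rename(insns, map_table, free_list):
    """Rename (op, src1, src2, dest) insns; returns (renamed text, map, free list)."""
    map_table = dict(map_table)       # work on copies of the caller's state
    free_list = list(free_list)
    out = []
    for op, src1, src2, dest in insns:
        # 1. Read sources through the CURRENT mapping (before dest is remapped)
        p1 = map_table.get(src1, src1)
        p2 = map_table.get(src2, src2)
        # 2. Give the destination a fresh physical register from the free list
        pd = free_list.pop(0)
        map_table[dest] = pd
        out.append(f"{op} {p1},{p2},{pd}")
    return out, map_table, free_list


prog = [("add", "r2", "r3", "r1"), ("sub", "r2", "r1", "r3"),
        ("mul", "r2", "r3", "r3"), ("div", "r1", "4", "r1")]
renamed, mt, fl = rename(prog, {"r1": "p1", "r2": "p2", "r3": "p3"},
                         ["p4", "p5", "p6", "p7"])
print(renamed)   # ['add p2,p3,p4', 'sub p2,p4,p5', 'mul p2,p5,p6', 'div p4,4,p7']
```

Reading sources before allocating the destination matters for insns like div r1,4,r1, which must read the old r1 (p4) while writing the new one (p7).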


Dynamic Scheduling – OoO Execution
  • Dynamic scheduling
    – Totally in the hardware
    – Also called “out-of-order execution” (OoO)
  • Fetch many instructions into instruction window
    – Use branch prediction to speculate past (multiple) branches
    – Flush pipeline on branch misprediction
  • Rename to avoid false dependencies (WAW and WAR)
  • Execute instructions as soon as possible
    – Register dependencies are known
    – Handling memory dependencies is trickier (much more later)
  • Commit instructions in order
    – If anything strange happens before commit, just flush the pipeline
  • Current machines: 100+ instruction scheduling window


Motivation for Dynamic Scheduling
  • Dynamic scheduling (out-of-order execution)
    – Execute insns in non-sequential (non-VonNeumann) order…
      + Reduce RAW stalls
      + Increase pipeline and functional unit (FU) utilization
        • Original motivation was to increase FP unit utilization
      + Expose more opportunities for parallel issue (ILP)
        • Not in-order → can be in parallel
    – …but make it appear like sequential execution
  • Important
    – But difficult
  • Next few lectures


Going Forward: What’s Next
  • We’ll build this up in steps over the next few weeks
    – “Scoreboarding” - first OoO, no register renaming
    – “Tomasulo’s algorithm” - adds register renaming
    – Handling precise state and speculation
      • P6-style execution (Intel Pentium Pro)
      • R10k-style execution (MIPS R10k)
    – Handling memory dependencies
  • Let’s get started!


New Pipeline Terminology
  • In-order pipeline
    – Often written as F,D,X,W (multi-cycle X includes M)
    – Example pipeline: 1-cycle int (including mem), 3-cycle pipelined FP

[Figure: in-order pipeline (I$, BP, regfile, D$)]


New Pipeline Diagram

Insn             D     X     W
ldf X(r1),f1     c1    c2    c3
mulf f0,f1,f2    c3    c4+   c7
stf f2,Z(r1)     c7    c8    c9
addi r1,4,r1     c8    c9    c10
ldf X(r1),f1     c10   c11   c12
mulf f0,f1,f2    c12   c13+  c16
stf f2,Z(r1)     c16   c17   c18

  • Alternative pipeline diagram
    – Down: insns
    – Across: pipeline stages
    – In boxes: cycles
    – Basically: stages ↔ cycles
    – Convenient for out-of-order


The Problem With In-Order Pipelines
  • In-order pipeline
    – Structural hazard: 1 insn register (latch) per stage
      • 1 insn per stage per cycle (unless pipeline is replicated)
      • Younger insn can’t “pass” older insn without “clobbering” it
  • Out-of-order pipeline
    – Implement “passing” functionality by removing structural hazard

[Figure: in-order pipeline (I$, BP, regfile, D$)]


Instruction Buffer
  • Trick: insn buffer (many names for this buffer)
    – Basically: a bunch of flops for holding insns
  • Split D into two pieces
    – Accumulate decoded insns in buffer in-order
    – Buffer sends insns down rest of pipeline out-of-order

[Figure: pipeline (I$, BP, D1, insn buffer, D2, regfile, D$)]


Dispatch and Issue
  • Dispatch (D): first part of decode
    – Allocate slot in insn buffer
    – New kind of structural hazard (insn buffer is full)
    – In order: stall back-propagates to younger insns
  • Issue (S): second part of decode
    – Send insns from insn buffer to execution units
    + Out-of-order: wait doesn’t back-propagate to younger insns

[Figure: pipeline (I$, BP, D, insn buffer, S, regfile, D$)]


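Dispatch and Issue with Floating-Point

[Figure: the dispatch/issue pipeline (I$, BP, D, insn buffer, S) feeding both the integer regfile/D$ path and an F-regfile with FP units: divider (E/), pipelined adder (E+), and 3-stage pipelined multiplier (E* E* E*)]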


Dynamic Scheduling Algorithms
  • Look at two register scheduling algorithms
    – Register scheduler: scheduler based on register dependences
    – Scoreboard
      • No register renaming → limited scheduling flexibility
    – Tomasulo
      • Register renaming → more flexibility, better performance
  • Big simplification in this lecture: memory scheduling
    – Pretend register algorithm magically knows memory dependences
    – A little more realism in a few lectures


Scheduling Algorithm I: Scoreboard
  • Scoreboard
    – Centralized control scheme: insn status explicitly tracked
    – Insn buffer: Functional Unit Status Table (FUST)
  • First implementation: CDC 6600 [1964]
    – 16 separate non-pipelined functional units (7 int, 4 FP, 5 mem)
    – No bypassing
  • Our example: “Simple Scoreboard”
    – 5 FU: 1 ALU, 1 load, 1 store, 2 FP (3-cycle, pipelined)


Scoreboard Data Structures
  • FU Status Table
    – FU, busy, op, R, R1, R2: destination/source register names
    – T: destination register tag (FU producing the value)
    – T1, T2: source register tags (FU producing the values)
  • Register Status Table
    – T: tag (FU that will write this register)
  • Tags interpreted as ready-bits
    – Tag == 0 → Value is ready in register file
    – Tag != 0 → Value is not ready, will be supplied by T
  • Insn status table
    – S, X bits for all active insns
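A minimal Python sketch of the FU Status and Register Status tables, loaded with the walkthrough's state after cycle 2. Assumptions: `None` plays the role of tag 0, each FU's own name serves as its destination tag T, and the insn status table is left out; the class and variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUEntry:
    """One row of the FU Status table; the FU's own name serves as its tag T."""
    busy: bool = False
    op: Optional[str] = None
    R: Optional[str] = None     # destination register name
    R1: Optional[str] = None    # source register names
    R2: Optional[str] = None
    T1: Optional[str] = None    # FU producing R1 (None stands for tag == 0)
    T2: Optional[str] = None    # FU producing R2

    def operands_ready(self) -> bool:
        # Tags as ready bits: no tag means the value is already in the regfile
        return self.T1 is None and self.T2 is None


# "Simple Scoreboard": 1 ALU, 1 load, 1 store, 2 FP units
fu_status: Dict[str, FUEntry] = {fu: FUEntry() for fu in ("ALU", "LD", "ST", "FP1", "FP2")}
# Register Status: register -> FU that will write it (missing key == tag 0)
reg_status: Dict[str, str] = {}

# Snapshot of the walkthrough after cycle 2
fu_status["LD"] = FUEntry(busy=True, op="ldf", R="f1", R2="r1")
fu_status["FP1"] = FUEntry(busy=True, op="mulf", R="f2", R1="f0", R2="f1", T2="LD")
reg_status.update({"f1": "LD", "f2": "FP1"})

print(fu_status["LD"].operands_ready())    # True: ldf can issue
print(fu_status["FP1"].operands_ready())   # False: mulf waits for LD to write f1
```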


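Simple Scoreboard Data Structures
  • Insn fields and status bits
  • Tags
  • Values

[Figure: Simple Scoreboard organization. Fetched insns enter the FU Status table (per FU: op, R, R1, R2, T, T1, T2, S/X status bits, insn value); the Reg Status table maps each register R to a tag T beside the Regfile; tag comparators (CAMs) match tags across the tables]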


Scoreboard Pipeline
  • New pipeline structure: F, D, S, X, W
    – F (fetch)
      • Same as it ever was
    – D (dispatch)
      • Structural or WAW hazard ? stall : allocate scoreboard entry
    – S (issue)
      • RAW hazard ? wait : read registers, go to execute
    – X (execute)
      • Execute operation, notify scoreboard when done
    – W (writeback)
      • WAR hazard ? wait : write register, free scoreboard entry
      • W and RAW-dependent S in same cycle
      • W and structural-dependent D in same cycle


Scoreboard Dispatch (D)
  • Stall for WAW or structural (Scoreboard, FU) hazards
  • Allocate scoreboard entry
  • Copy Reg Status for input registers
  • Set Reg Status for output register

[Figure: Simple Scoreboard organization (FU Status, Reg Status, Regfile, tag CAMs)]
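A minimal Python sketch of the dispatch rules, replaying cycles 1 to 3 of the walkthrough and then the WAW stall of the second mulf. Assumptions: the caller picks the target FU (e.g. which FP unit), a free FU is `None`, and a missing Register Status key means tag 0; the function and variable names are hypothetical.

```python
def try_dispatch(fu_status, reg_status, fu, op, dest, src1, src2):
    """Scoreboard D stage for one insn headed to FU `fu`.

    fu_status: FU name -> entry dict, or None when the FU is free
    reg_status: register -> FU that will write it (missing key == tag 0)
    Returns True if dispatched, False if D must stall this cycle.
    """
    if fu_status[fu] is not None:
        return False                  # structural hazard: FU already busy
    if dest is not None and dest in reg_status:
        return False                  # WAW hazard: an in-flight insn will write dest
    fu_status[fu] = {
        "op": op, "R": dest, "R1": src1, "R2": src2,
        # Copy Reg Status for the inputs (None == value ready in regfile)
        "T1": reg_status.get(src1), "T2": reg_status.get(src2),
    }
    if dest is not None:
        reg_status[dest] = fu         # Set Reg Status for the output
    return True


fus = {fu: None for fu in ("ALU", "LD", "ST", "FP1", "FP2")}
regs = {}
print(try_dispatch(fus, regs, "LD", "ldf", "f1", None, "r1"))      # True  (cycle 1)
print(try_dispatch(fus, regs, "FP1", "mulf", "f2", "f0", "f1"))    # True  (cycle 2)
print(try_dispatch(fus, regs, "ST", "stf", None, "f2", "r1"))      # True  (cycle 3)
print(try_dispatch(fus, regs, "FP2", "mulf", "f2", "f0", "f1"))    # False (WAW on f2)
```

The last call fails even though FP2 is free: Register Status still names FP1 as the writer of f2, just as in cycle 6 of the walkthrough.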


Scoreboard Issue (S)
  • Wait for RAW register hazards
  • Read registers

[Figure: Simple Scoreboard organization (FU Status, Reg Status, Regfile, tag CAMs)]
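A minimal Python sketch of the issue check for one FU Status entry: wait while any source tag is set, otherwise read the register file. The entry layout (a dict with R1/R2 and T1/T2, `None` meaning tag 0), the register values, and the function name are assumptions for illustration.

```python
def try_issue(entry, regfile):
    """Scoreboard S stage for one FU Status entry.

    Returns the source operand values read from the regfile, or None to wait.
    """
    if entry["T1"] is not None or entry["T2"] is not None:
        return None                   # RAW hazard: a source is still being produced
    # Both sources are in the regfile: read registers, go to execute
    v1 = regfile[entry["R1"]] if entry["R1"] is not None else None
    v2 = regfile[entry["R2"]] if entry["R2"] is not None else None
    return v1, v2


regfile = {"f0": 2.0, "f1": 3.0, "r1": 100}
mulf = {"op": "mulf", "R": "f2", "R1": "f0", "R2": "f1", "T1": None, "T2": "LD"}
print(try_issue(mulf, regfile))   # None: still waiting on LD
regfile["f1"] = 5.0               # LD writes f1 back...
mulf["T2"] = None                 # ...and clears the matching input tag
print(try_issue(mulf, regfile))   # (2.0, 5.0)
```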


Issue Policy and Issue Logic
  • Issue
    – If multiple insns ready, which one to choose? Issue policy
      • Oldest first? Safe
      • Longest latency first? May yield better performance
  • Select logic: implements issue policy
    – W→1 priority encoder
    – W: window size (number of scoreboard entries)
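A behavioral Python sketch of the two issue policies over a W-entry window. In hardware this would be a W→1 priority encoder; here the priority order is modeled explicitly with per-entry age numbers or latencies, which are assumptions for illustration, as are the function names.

```python
def select_oldest(ready, age):
    """Oldest-first select: ready[i] says entry i can issue; smaller age = older.

    Returns the index of the granted entry, or None if nothing is ready.
    """
    candidates = [i for i, r in enumerate(ready) if r]
    return min(candidates, key=lambda i: age[i]) if candidates else None


def select_longest_latency(ready, latency):
    """Longest-latency-first select; ties go to the lower entry index."""
    candidates = [i for i, r in enumerate(ready) if r]
    return max(candidates, key=lambda i: latency[i]) if candidates else None


ready = [True, False, True, True]
print(select_oldest(ready, age=[5, 1, 3, 7]))                # 2
print(select_longest_latency(ready, latency=[1, 10, 3, 4]))  # 3
```

The two policies grant different entries from the same ready set; entry 1 is never chosen because it is not ready, however old or slow it is.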


Scoreboard Execute (X)
  • Execute insn

[Figure: Simple Scoreboard organization (FU Status, Reg Status, Regfile, tag CAMs)]


Scoreboard Writeback (W)
  • Wait for WAR hazard
  • Write value into regfile, clear Reg Status entry
  • Compare tag to waiting insns’ input tags, match ? clear input tag
  • Free scoreboard entry

[Figure: Simple Scoreboard organization (FU Status, Reg Status, Regfile, tag CAMs)]
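A minimal Python sketch of writeback, replaying the addi/stf WAR wait from the walkthrough. Assumptions: entries are dicts with an `issued` flag standing in for the S bit of the insn status table, and a WAR wait is triggered only by another entry that names the destination as an untagged source and has not yet issued (this sketch's reading of "untagged r1 in FU Status"). The function name and the register values are hypothetical.

```python
def try_writeback(fu, fu_status, reg_status, regfile, value):
    """Scoreboard W stage for the insn in FU `fu`; returns False if it must wait."""
    dest = fu_status[fu]["R"]
    # WAR check: does another insn still need to read the OLD value of dest?
    if dest is not None and any(
        e[src] == dest and e[tag] is None and not e["issued"]
        for name, e in fu_status.items() if e is not None and name != fu
        for src, tag in (("R1", "T1"), ("R2", "T2"))
    ):
        return False                          # WAR hazard: wait
    if dest is not None:
        regfile[dest] = value                 # write value into regfile
        if reg_status.get(dest) == fu:
            del reg_status[dest]              # clear Reg Status entry
    for e in fu_status.values():              # match tag against waiting input tags
        if e is None:
            continue
        if e["T1"] == fu:
            e["T1"] = None
        if e["T2"] == fu:
            e["T2"] = None
    fu_status[fu] = None                      # free scoreboard entry
    return True


def entry(op, R, R1, R2, T1, T2, issued):
    return {"op": op, "R": R, "R1": R1, "R2": R2, "T1": T1, "T2": T2, "issued": issued}


# Roughly the state in cycle 7 of the walkthrough
fus = {"ALU": entry("addi", "r1", "r1", None, None, None, True),
       "LD": entry("ldf", "f1", None, "r1", None, "ALU", False),
       "ST": entry("stf", None, "f2", "r1", "FP1", None, False),
       "FP1": entry("mulf", "f2", "f0", "f1", None, None, True),
       "FP2": None}
regs = {"f1": "LD", "f2": "FP1", "r1": "ALU"}
rf = {"r1": 100, "f2": 0.0}

print(try_writeback("ALU", fus, regs, rf, 104))   # False: stf has not read r1 yet
print(try_writeback("FP1", fus, regs, rf, 6.0))   # True: clears stf's T1 tag
fus["ST"]["issued"] = True                        # stf issues, reading r1
print(try_writeback("ALU", fus, regs, rf, 104))   # True
```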


Scoreboard Data Structures

Insn Status
Insn             D    S    X    W
ldf X(r1),f1
mulf f0,f1,f2
stf f2,Z(r1)
addi r1,4,r1
ldf X(r1),f1
mulf f0,f1,f2
stf f2,Z(r1)

Reg Status
Reg   T
f0
f1
f2
r1

FU Status
FU    busy  op    R    R1   R2   T1   T2
ALU   no
LD    no
ST    no
FP1   no
FP2   no


Scoreboard: Cycle 1

Insn Status
Insn             D    S    X    W
ldf X(r1),f1     c1
mulf f0,f1,f2
stf f2,Z(r1)
addi r1,4,r1
ldf X(r1),f1
mulf f0,f1,f2
stf f2,Z(r1)

Reg Status
Reg   T
f0
f1    LD
f2
r1

FU Status
FU    busy  op    R    R1   R2   T1   T2
ALU   no
LD    yes   ldf   f1   -    r1   -    -
ST    no
FP1   no
FP2   no

allocate (LD)


Scoreboard: Cycle 2

Insn Status
Insn             D    S    X    W
ldf X(r1),f1     c1   c2
mulf f0,f1,f2    c2
stf f2,Z(r1)
addi r1,4,r1
ldf X(r1),f1
mulf f0,f1,f2
stf f2,Z(r1)

Reg Status
Reg   T
f0
f1    LD
f2    FP1
r1

FU Status
FU    busy  op    R    R1   R2   T1   T2
ALU   no
LD    yes   ldf   f1   -    r1   -    -
ST    no
FP1   yes   mulf  f2   f0   f1   -    LD
FP2   no

allocate (FP1)


Scoreboard: Cycle 3

Insn Status
Insn             D    S    X    W
ldf X(r1),f1     c1   c2   c3
mulf f0,f1,f2    c2
stf f2,Z(r1)     c3
addi r1,4,r1
ldf X(r1),f1
mulf f0,f1,f2
stf f2,Z(r1)

Reg Status
Reg   T
f0
f1    LD
f2    FP1
r1

FU Status
FU    busy  op    R    R1   R2   T1   T2
ALU   no
LD    yes   ldf   f1   -    r1   -    -
ST    yes   stf   -    f2   r1   FP1  -
FP1   yes   mulf  f2   f0   f1   -    LD
FP2   no

allocate (ST)


Scoreboard: Cycle 4

Insn Status
Insn             D    S    X    W
ldf X(r1),f1     c1   c2   c3   c4
mulf f0,f1,f2    c2   c4
stf f2,Z(r1)     c3
addi r1,4,r1     c4
ldf X(r1),f1
mulf f0,f1,f2
stf f2,Z(r1)

Reg Status
Reg   T
f0
f1    LD
f2    FP1
r1    ALU

FU Status
FU    busy  op    R    R1   R2   T1   T2
ALU   yes   addi  r1   r1   -    -    -
LD    no
ST    yes   stf   -    f2   r1   FP1  -
FP1   yes   mulf  f2   f0   f1   -    LD
FP2   no

allocate (ALU); free (LD)
f1 (LD) is ready → issue mulf
f1 written → clear


Scoreboard: Cycle 5

Insn Status
Insn             D    S    X    W
ldf X(r1),f1     c1   c2   c3   c4
mulf f0,f1,f2    c2   c4   c5
stf f2,Z(r1)     c3
addi r1,4,r1     c4   c5
ldf X(r1),f1     c5
mulf f0,f1,f2
stf f2,Z(r1)

Reg Status
Reg   T
f0
f1    LD
f2    FP1
r1    ALU

FU Status
FU    busy  op    R    R1   R2   T1   T2
ALU   yes   addi  r1   r1   -    -    -
LD    yes   ldf   f1   -    r1   -    ALU
ST    yes   stf   -    f2   r1   FP1  -
FP1   yes   mulf  f2   f0   f1   -    -
FP2   no

allocate (LD)


Scoreboard: Cycle 6Insn StatusInsn D S X Wldf X(r1),f1 c1 c2 c3 c4mulf f0,f1,f2 c2 c4 c5+stf f2,Z(r1) c3addi r1,4,r1 c4 c5 c6ldf X(r1),f1 c5mulf f0,f1,f2stf f2,Z(r1)

Reg Status
Reg  T
f0   -
f1   LD
f2   FP1
r1   ALU

FU Status
FU   busy  op    R   R1  R2  T1   T2
ALU  yes   addi  r1  r1  -   -    -
LD   yes   ldf   f1  -   r1  -    ALU
ST   yes   stf   -   f2  r1  FP1  -
FP1  yes   mulf  f2  f0  f1  -    -
FP2  no

D stall (second mulf): WAW hazard with the first mulf (f2). How to tell? RegStatus[f2] is non-empty.
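The WAW test at dispatch reduces to a single RegStatus lookup on the destination. A minimal Python sketch, assuming RegStatus is modeled as a dict from register name to the FU that will write it (`waw_stall` is a hypothetical name):

```python
from typing import Dict, Optional


def waw_stall(reg_status: Dict[str, str], dest: Optional[str]) -> bool:
    """D-stage WAW check: stall if an in-flight insn is still going to write dest."""
    return dest is not None and dest in reg_status


# RegStatus as shown for cycle 6 (f0 has no pending writer)
reg_status = {"f1": "LD", "f2": "FP1", "r1": "ALU"}
second_mulf_stalls = waw_stall(reg_status, "f2")  # True: FP1 still owns f2
store_stalls = waw_stall(reg_status, None)        # False: stf writes no register

# Cycle 8: the first mulf writes back and clears RegStatus[f2]
del reg_status["f2"]
second_mulf_stalls_c8 = waw_stall(reg_status, "f2")  # False: dispatch may proceed
```

Because dispatch is in order, this one stalled mulf also holds up every insn behind it, which is the "limited scheduling scope" problem.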


Scoreboard: Cycle 7

Insn Status
Insn            D   S   X    W
ldf X(r1),f1    c1  c2  c3   c4
mulf f0,f1,f2   c2  c4  c5+
stf f2,Z(r1)    c3
addi r1,4,r1    c4  c5  c6
ldf X(r1),f1    c5
mulf f0,f1,f2
stf f2,Z(r1)

Reg Status
Reg  T
f0   -
f1   LD
f2   FP1
r1   ALU

FU Status
FU   busy  op    R   R1  R2  T1   T2
ALU  yes   addi  r1  r1  -   -    -
LD   yes   ldf   f1  -   r1  -    ALU
ST   yes   stf   -   f2  r1  FP1  -
FP1  yes   mulf  f2  f0  f1  -    -
FP2  no

W wait (addi): WAR hazard with stf (r1). How to tell? An untagged r1 in FuStatus. Requires a CAM.
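The WAR check before W can be sketched as a search of every FuStatus source field for the writer's destination with no tag attached. An untagged match means that insn will read the register file's current (old) value, so the writer must wait. This is a hypothetical Python model: `FuEntry`, its `issued` flag, and `war_wait` are illustrative names, and the loop stands in for the parallel CAM match that hardware performs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FuEntry:
    """Hypothetical FU status row with an explicit 'issued' (operands read) flag."""
    busy: bool
    issued: bool
    srcs: List[Optional[str]]  # [R1, R2]
    tags: List[Optional[str]]  # [T1, T2]


def war_wait(fu_status: Dict[str, FuEntry], writer: str, dest: str) -> bool:
    """True if `writer` must delay W: an un-issued insn names `dest` as a
    source with no tag, so it still needs the value currently in `dest`."""
    for name, e in fu_status.items():
        if name == writer or not e.busy or e.issued:
            continue
        for src, tag in zip(e.srcs, e.tags):
            # Untagged source: it will read whatever is in the register file now
            if src == dest and tag is None:
                return True
    return False


# Cycle 7: addi (ALU) has executed and wants to write r1
fu_status = {
    "ALU": FuEntry(busy=True, issued=True, srcs=["r1", None], tags=[None, None]),
    "LD": FuEntry(busy=True, issued=False, srcs=[None, "r1"], tags=[None, "ALU"]),
    "ST": FuEntry(busy=True, issued=False, srcs=["f2", "r1"], tags=["FP1", None]),
    "FP1": FuEntry(busy=True, issued=True, srcs=["f0", "f1"], tags=[None, None]),
}
addi_waits_c7 = war_wait(fu_status, "ALU", "r1")  # True: stf has an untagged r1

# Cycle 8: stf issues (reads r1), so addi may write back in cycle 9
fu_status["ST"].issued = True
addi_waits_c8 = war_wait(fu_status, "ALU", "r1")  # False
```

The second ldf's r1 is tagged ALU (it wants the new value), so it never blocks addi; only the untagged r1 in the stf entry does.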


Scoreboard: Cycle 8

Insn Status
Insn            D   S   X    W
ldf X(r1),f1    c1  c2  c3   c4
mulf f0,f1,f2   c2  c4  c5+  c8
stf f2,Z(r1)    c3  c8
addi r1,4,r1    c4  c5  c6
ldf X(r1),f1    c5
mulf f0,f1,f2   c8
stf f2,Z(r1)

Reg Status
Reg  T
f0   -
f1   LD
f2   FP2 (was FP1)
r1   ALU

FU Status
FU   busy  op    R   R1  R2  T1   T2
ALU  yes   addi  r1  r1  -   -    -
LD   yes   ldf   f1  -   r1  -    ALU
ST   yes   stf   -   f2  r1  FP1  -
FP1  no
FP2  yes   mulf  f2  f0  f1  -    LD

allocate

free

f2 (FP1) is ready → issue stf

first mulf done (FP1)

W wait (addi)


Scoreboard: Cycle 9

Insn Status
Insn            D   S   X    W
ldf X(r1),f1    c1  c2  c3   c4
mulf f0,f1,f2   c2  c4  c5+  c8
stf f2,Z(r1)    c3  c8  c9
addi r1,4,r1    c4  c5  c6   c9
ldf X(r1),f1    c5  c9
mulf f0,f1,f2   c8
stf f2,Z(r1)

Reg Status
Reg  T
f0   -
f1   LD
f2   FP2
r1   ALU

FU Status
FU   busy  op    R   R1  R2  T1   T2
ALU  no
LD   yes   ldf   f1  -   r1  -    ALU
ST   yes   stf   -   f2  r1  -    -
FP1  no
FP2  yes   mulf  f2  f0  f1  -    LD

D stall (second stf): structural hazard, FuStatus[ST] is busy

r1 written → clear

free

r1 (ALU) is ready → issue ldf
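Putting the two dispatch hazards together, the D-stage condition in this walkthrough is "required FU free and destination not owned". A minimal Python sketch, assuming FU busy bits and RegStatus are plain dicts (`can_dispatch` is a hypothetical name):

```python
from typing import Dict, Optional


def can_dispatch(fu_busy: Dict[str, bool], reg_status: Dict[str, str],
                 fu: str, dest: Optional[str]) -> bool:
    """In-order D check: no structural hazard on `fu` and no WAW hazard on `dest`."""
    structural = fu_busy[fu]                       # FuStatus[fu] already holds an insn
    waw = dest is not None and dest in reg_status  # RegStatus[dest] non-empty
    return not structural and not waw


# Cycle 9: the first stf (issued in c8) still occupies ST
fu_busy = {"ALU": False, "LD": True, "ST": True, "FP1": False, "FP2": True}
reg_status = {"f1": "LD", "f2": "FP2"}  # r1 was just written and cleared
second_stf_c9 = can_dispatch(fu_busy, reg_status, "ST", None)   # False

# Cycle 10: the first stf writes back and frees ST
fu_busy["ST"] = False
second_stf_c10 = can_dispatch(fu_busy, reg_status, "ST", None)  # True
```

With only one entry per FU, a store cannot even be dispatched while an older store occupies ST, regardless of whether its operands are ready.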


Scoreboard: Cycle 10

Insn Status
Insn            D    S   X    W
ldf X(r1),f1    c1   c2  c3   c4
mulf f0,f1,f2   c2   c4  c5+  c8
stf f2,Z(r1)    c3   c8  c9   c10
addi r1,4,r1    c4   c5  c6   c9
ldf X(r1),f1    c5   c9  c10
mulf f0,f1,f2   c8
stf f2,Z(r1)    c10

Reg Status
Reg  T
f0   -
f1   LD
f2   FP2
r1   -

FU Status
FU   busy  op    R   R1  R2  T1   T2
ALU  no
LD   yes   ldf   f1  -   r1  -    -
ST   yes   stf   -   f2  r1  FP2  -
FP1  no
FP2  yes   mulf  f2  f0  f1  -    LD

W and a structurally dependent D in the same cycle (the first stf frees ST, the second stf dispatches into it)

free, then allocate
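"Free, then allocate" implies that, within one cycle, W updates the busy bits before D reads them. The hypothetical one-cycle Python model below (`step` and its `w_before_d` switch are illustrative, not from the slides) shows that reversing the order would cost the second stf one extra cycle.

```python
from typing import Dict, List


def step(fu_busy: Dict[str, bool], writebacks: List[str], want_fu: str,
         w_before_d: bool = True) -> bool:
    """Hypothetical one-cycle model of W and D. Returns True if the insn at the
    head of D (which needs `want_fu`) dispatches this cycle."""
    if w_before_d:
        for fu in writebacks:
            fu_busy[fu] = False            # free ...
        dispatched = not fu_busy[want_fu]
    else:
        dispatched = not fu_busy[want_fu]  # D sees last cycle's busy bit
        for fu in writebacks:
            fu_busy[fu] = False
    if dispatched:
        fu_busy[want_fu] = True            # ... then allocate
    return dispatched


# Cycle 10: the first stf writes back from ST; the second stf needs ST
busy = {"ST": True}
w_first = step(busy, writebacks=["ST"], want_fu="ST")  # True, and ST is busy again
d_first = step({"ST": True}, writebacks=["ST"], want_fu="ST", w_before_d=False)  # False
```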


In-Order vs. Scoreboard

• Big speedup?
  – Only 1 cycle advantage for scoreboard

• Why? addi WAR hazard
  • Scoreboard issued addi earlier (c8 → c5)
  • But WAR hazard delayed W until c9
  • Delayed issue of second iteration

                In-Order            Scoreboard
Insn            D    X    W         D    S    X    W
ldf X(r1),f1    c1   c2   c3        c1   c2   c3   c4
mulf f0,f1,f2   c3   c4+  c7        c2   c4   c5+  c8
stf f2,Z(r1)    c7   c8   c9        c3   c8   c9   c10
addi r1,4,r1    c8   c9   c10       c4   c5   c6   c9
ldf X(r1),f1    c10  c11  c12       c5   c9   c10  c11
mulf f0,f1,f2   c12  c13+ c16       c8   c11  c12+ c15
stf f2,Z(r1)    c16  c17  c18       c10  c15  c16  c17
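The "only 1 cycle" figure can be checked directly from the W columns of the table. This sketch assumes total time is simply the cycle of the last writeback.

```python
# W cycles from the table above, in program order
in_order_w = [3, 7, 9, 10, 12, 16, 18]
scoreboard_w = [4, 8, 10, 9, 11, 15, 17]

# Scoreboard W is out of order (addi at c9 before the older stf at c10),
# so take the max rather than the last entry.
in_order_cycles = max(in_order_w)
scoreboard_cycles = max(scoreboard_w)
cycles_saved = in_order_cycles - scoreboard_cycles  # 1
speedup = in_order_cycles / scoreboard_cycles       # 18 / 17, about 1.06
print(f"saved {cycles_saved} cycle(s), speedup {speedup:.2f}x")
# prints: saved 1 cycle(s), speedup 1.06x
```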


In-Order vs. Scoreboard II: Cache Miss

• Assume
  • 5-cycle cache miss on first ldf
  • Ignore FUST structural hazards

– Little relative advantage
  • addi WAR hazard (c7 → c13) stalls second iteration

                In-Order            Scoreboard
Insn            D    X    W         D    S    X    W
ldf X(r1),f1    c1   c2+  c7        c1   c2   c3+  c8
mulf f0,f1,f2   c7   c8+  c11       c2   c8   c9+  c12
stf f2,Z(r1)    c11  c12  c13       c3   c12  c13  c14
addi r1,4,r1    c12  c13  c14       c4   c5   c6   c13
ldf X(r1),f1    c14  c15  c16       c5   c13  c14  c15
mulf f0,f1,f2   c16  c17+ c20       c6   c15  c16+ c19
stf f2,Z(r1)    c20  c21  c22       c7   c19  c20  c21
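The "c7 → c13" delay follows from a simple rule visible in both scoreboard tables: addi may write r1 no earlier than the cycle after it finishes X, and no earlier than the cycle after stf reads r1 in S. A minimal sketch of that rule (`war_delayed_w` is a hypothetical helper; the one-cycle gaps are read off the tables, not a general timing model):

```python
def war_delayed_w(x_cycle: int, reader_s_cycle: int) -> int:
    """Earliest W for a writer whose old destination value an un-issued reader
    still needs. Assumes, as both tables do, a 1-cycle gap from X to W and that
    W may occur the cycle after the reader issues (S)."""
    natural_w = x_cycle + 1
    return max(natural_w, reader_s_cycle + 1)


# addi executes in c6 in both runs; stf reads r1 in S at c8 (no miss) or c12 (miss)
w_no_miss = war_delayed_w(6, 8)   # 9, matches the earlier table
w_miss = war_delayed_w(6, 12)     # 13, the "c7 -> c13" delay on this slide

# The second ldf cannot issue until r1 is written, so iteration 2 slips too
slip = w_miss - w_no_miss         # 4 cycles

# Overall: last W is c22 in-order vs. c21 scoreboard
advantage = 22 / 21 - 1           # under 5%, versus about 6% (18 / 17) without the miss
```

In both tables the second ldf issues exactly in addi's W cycle (c9 and c13), so the WAR stall on a single register sets the start of the whole second iteration.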


Scoreboard Redux

• The good

+ Cheap hardware

• InsnStatus + FuStatus + RegStatus ~ 1 FP unit in area

+ Pretty good performance

• 1.7X for FORTRAN (scientific array) programs

• The less good

– No bypassing

• Is this a fundamental problem?

– Limited scheduling scope

• Structural/WAW hazards delay dispatch

– Slow issue of truly-dependent (RAW) insns

• WAR hazards delay writeback

• Fix with hardware register renaming