
CISC 662 Graduate Computer Architecture

Lecture 11 - Hardware Speculation

Michela Taufer http://www.cis.udel.edu/~taufer/teaching/CIS662F07

PowerPoint lecture notes from John Hennessy and David Patterson’s Computer Architecture, 4th edition

Additional teaching material from: Jelena Mirkovic (U Del), John Kubiatowicz (UC Berkeley), and Soner Oender (Michigan Technological University)


Branch Predictions


Reducing Branch Penalty

Branch penalty in dynamically scheduled processors: wasted cycles due to pipeline flushing on mis-predicted branches

Reduce branch penalty:

  Predict branch/jump instructions AND branch direction (taken or not taken)

  Predict branch/jump target address (for taken branches)

  Speculatively execute instructions along the predicted path

What to Use and What to Predict

Available info:
–  Current predicted PC
–  Past branch history (direction and target)

What to predict:
–  Conditional branch inst: branch direction and target address
–  Jump inst: target address
–  Procedure call/return: target address

May need instruction pre-decoded

[Figure: block diagram with the labels IM, PC, Predictors, pred_PC, pred info, feedback, and PC & Inst]


Mis-prediction Detection and Feedback

Detection:
•  At the end of decoding
  –  Target address is known at decoding and does not match the prediction
  –  Flush the fetch stage
•  At commit (most cases)
  –  Wrong branch direction, or target address does not match
  –  Flush the whole pipeline
  (at EXE: MIPS R10000)

Feedback:
•  Any time a mis-prediction is detected
•  At a branch’s commit (at EXE: called speculative update)

[Figure: pipeline diagram with the labels FETCH, RENAME, SCHD, REB/ROB, EXE, WB, COMMIT, and predictors]

Branch Direction Prediction

•  Predict branch direction: taken or not taken (T/NT)
•  Static prediction: compilers decide the direction
•  Dynamic prediction: hardware decides the direction using dynamic information
  1.  1-bit Branch-Prediction Buffer
  2.  2-bit Branch-Prediction Buffer
  3.  Correlating Branch Prediction Buffer
  4.  Tournament Branch Predictor
  5.  and more …

    BNE R1, R2, L1
    …                 (not taken: fall through)
L1: …                 (taken)

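Predictor for a Single Branch

General form:
1.  Access the predictor state with the PC
2.  Output the prediction (T/NT)
3.  Feed back the actual outcome (T/NT)

1-bit prediction:

[Figure: two-state diagram. State 1 predicts taken and state 0 predicts not taken; a taken (T) outcome moves to or stays in state 1, and a not-taken (NT) outcome moves to or stays in state 0]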

Branch History Table of 1-bit Predictor

The BHT is also called the Branch Prediction Buffer in the textbook.

•  Can use only one 1-bit predictor, but accuracy is low
•  BHT: use a table of simple predictors, indexed by bits from the PC
•  Similar to a direct-mapped cache
•  More entries, more cost, but fewer conflicts, higher accuracy
•  BHT can contain complex predictors

[Figure: K bits of the branch address index a table of 2^K entries, which supplies the prediction]
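The table lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `OneBitBHT`, the 4-byte instruction size (used to drop the two low PC bits), and the sample addresses are hypothetical and do not come from the slides.

```python
class OneBitBHT:
    """Table of 1-bit predictors indexed by the low k bits of the branch address."""

    def __init__(self, k):
        self.mask = (1 << k) - 1
        self.table = [0] * (1 << k)   # 0 = predict not taken, 1 = predict taken

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] == 1

    def update(self, pc, taken):
        # A 1-bit predictor simply remembers the last outcome.
        self.table[self._index(pc)] = 1 if taken else 0


bht = OneBitBHT(k=4)
bht.update(0x1000, True)
aliased = 0x1000 + (1 << 4) * 4   # a different branch that maps to the same entry
print(bht.predict(0x1000), bht.predict(aliased))
```

Like a direct-mapped cache without tags, two branches whose addresses share the same index bits share one entry, which is the conflict that a larger table reduces.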


1-bit BHT Weakness

•  Example: in a loop, a 1-bit BHT will cause 2 mis-predictions
•  Consider a loop of 9 iterations before exit:

    for (…) {
        for (i=0; i<9; i++)
            a[i] = a[i] * 2.0;
    }

  –  End-of-loop case, when it exits instead of looping as before
  –  First time through the loop on the next time through the code, when it predicts exit instead of looping
  –  Only 80% accuracy even though the branch loops 90% of the time
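The 80% figure can be checked with a short simulation. This is a sketch under an assumption about the branch stream: the loop branch is modeled as taken 9 times and then not taken once, repeated for each pass of the outer loop, and the predictor starts in the "not taken" state left behind by an earlier exit. The function name `one_bit_accuracy` is illustrative.

```python
def one_bit_accuracy(outcomes, initial=False):
    """Fraction of correct predictions made by a single 1-bit predictor."""
    state = initial            # True = predict taken, False = predict not taken
    correct = 0
    for taken in outcomes:
        if state == taken:
            correct += 1
        state = taken          # 1-bit predictor: remember only the last outcome
    return correct / len(outcomes)


# Assumed branch stream: taken 9 times, then not taken once, for 100 outer passes.
pattern = [True] * 9 + [False]
print(one_bit_accuracy(pattern * 100))
```

Each pass mispredicts twice, once on re-entering the loop and once on the exit, so 8 of every 10 predictions are correct.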

2-bit Saturating Counter

•  Solution: 2-bit scheme where the prediction changes only after two mis-predictions (Figure 3.7, p. 249)
•  Blue: stop, not taken
•  Gray: go, taken
•  Adds hysteresis to the decision-making process

[Figure: state diagram of a 2-bit saturating counter. States 11 and 10 predict taken; states 01 and 00 predict not taken. Transitions are labeled T (taken) and NT (not taken)]
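A minimal sketch of the 2-bit saturating counter follows, assuming the encoding in the figure (counter values 3 and 2, i.e. 11 and 10, predict taken; 1 and 0 predict not taken). The function name and the branch stream (taken 9 times, then not taken once, repeated) are assumptions chosen for illustration.

```python
def two_bit_accuracy(outcomes, counter=3):
    """Fraction of correct predictions made by one 2-bit saturating counter."""
    correct = 0
    for taken in outcomes:
        predict_taken = counter >= 2          # 11 and 10 predict taken
        if predict_taken == taken:
            correct += 1
        # Saturating update: count up on taken, down on not taken.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


pattern = [True] * 9 + [False]
print(two_bit_accuracy(pattern * 100))
```

The single not-taken outcome at the loop exit only moves the counter from 11 to 10, so the next entry into the loop is still predicted taken and only the exit itself is mispredicted.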

Correlating Branches

Code example showing the potential:

    if (d==0) d=1;
    if (d==1) …

Assembly code:

    BNEZ R1, L1          ; BNEZ1: taken if d != 0
    DADDIU R1,R0,#1      ; d == 0, so d = 1
L1: DADDIU R3,R1,#-1
    BNEZ R3, L2          ; BNEZ2: taken if d != 1
L2: …

Observation: if BNEZ1 is not taken, then d is set to 1, so R3 is 0 and BNEZ2 is not taken either
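The correlation between the two branches can be verified by enumerating a few values of d. The sketch below mirrors the assembly fragment, assuming R1 holds d; the function name `branch_outcomes` is illustrative.

```python
def branch_outcomes(d):
    """Return (first_taken, second_taken) for the two BNEZ branches, given d on entry."""
    first_taken = d != 0      # BNEZ R1, L1 skips "d = 1" when d != 0
    if not first_taken:
        d = 1                 # DADDIU R1, R0, #1
    second_taken = d != 1     # BNEZ R3, L2 with R3 = d - 1
    return first_taken, second_taken


for d in (0, 1, 2):
    print(d, branch_outcomes(d))
```

Whenever the first branch falls through, d is forced to 1, so the second branch must fall through as well. A predictor that sees only the second branch's own history cannot exploit this.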

Correlating Branch Predictor

Idea: taken/not taken of recently executed branches is related to the behavior of the next branch (as well as the history of that branch's behavior)

–  Then the behavior of recent branches selects between, say, 2 predictions of the next branch, updating just that prediction
–  (1,1) predictor: 1-bit global, 1-bit local

[Figure: 4 bits of the branch address select a pair of 1-bit per-branch local predictors; the 1-bit global branch history (0 = not taken) selects which of the two supplies the prediction]


Correlating Branch Predictor

General form: (m, n) predictor
–  m bits for global history, n bits for local history
–  Records correlation between m+1 branches
–  Simple implementation: global history can be stored in a shift register
–  Example: (2,2) predictor, 2-bit global, 2-bit local

[Figure: 4 bits of the branch address select a row of 2-bit per-branch local predictors; the 2-bit global branch history (01 = not taken then taken) selects which predictor in the row supplies the prediction]
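An (m, n) predictor can be sketched as a table of n-bit saturating counters selected by the branch address bits and an m-bit global history shift register. The class and helper names, the 4-byte instruction assumption, and the alternating test pattern below are illustrative assumptions, not part of the slides.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history select one of 2**m n-bit counters."""

    def __init__(self, m, n, index_bits):
        self.m = m
        self.max_count = (1 << n) - 1
        self.index_mask = (1 << index_bits) - 1
        self.history = 0   # global shift register, newest outcome in bit 0
        self.counters = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits are dropped.
        return self.counters[(pc >> 2) & self.index_mask]

    def predict(self, pc):
        # The upper half of the counter range predicts taken.
        return self._row(pc)[self.history] > self.max_count // 2

    def update(self, pc, taken):
        row = self._row(pc)
        count = row[self.history]
        row[self.history] = min(count + 1, self.max_count) if taken else max(count - 1, 0)
        # Shift the newest outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_correct(predictor, pc, outcomes):
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct


alternating = [True, False] * 10
print(count_correct(CorrelatingPredictor(1, 1, 4), 0x40, alternating))
print(count_correct(CorrelatingPredictor(0, 1, 4), 0x40, alternating))
```

With m = 0 the structure degenerates to a plain 1-bit BHT, which mispredicts every outcome of a strictly alternating branch, while the (1,1) configuration learns the alternation after one miss.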

Accuracy of Different Schemes (Figure 3.15, p. 206)

[Figure: chart of the frequency of mispredictions for three schemes: 4096-entry 2-bit BHT, unlimited-entry 2-bit BHT, and 1024-entry (2,2) BHT]

Accuracy of Return Address Predictor

Branch Target Buffer

•  Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)
  –  Note: must check for a branch match now, since we can’t use the wrong branch's address
•  Example: BTB combined with BHT

[Figure: the PC of the instruction in FETCH is compared (=?) with the Branch PC field of the BTB; each entry holds the Branch PC, the Predicted PC, and extra prediction state bits.
  Yes: the instruction is a branch; use the predicted PC as the next PC.
  No: branch not predicted; proceed normally (Next PC = PC+4).]
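The lookup in the figure can be sketched as follows. A Python dict stands in for the tag-matched table (effectively a fully associative BTB of unlimited size, which is a simplification), and the class name, the update policy of dropping entries for not-taken branches, and the sample addresses are assumptions for illustration.

```python
class BranchTargetBuffer:
    """Maps a branch PC to its predicted target; a miss means 'not a predicted branch'."""

    def __init__(self):
        self.entries = {}   # branch PC -> predicted target PC

    def next_pc(self, pc):
        # Hit only on an exact branch-PC match, so another branch's target is never used.
        target = self.entries.get(pc)
        return target if target is not None else pc + 4

    def update(self, pc, taken, target):
        # Assumed policy: keep entries only for branches that were last taken.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x100)))          # miss: fall through to PC+4
btb.update(0x100, taken=True, target=0x200)
print(hex(btb.next_pc(0x100)))          # hit: fetch continues at the predicted target
```

Because the prediction is available from the PC alone, the next fetch address is known before the instruction has even been decoded.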


Hardware Speculation

•  Exploiting more ILP requires that we overcome the limitation of control dependence:
  –  With branch prediction we allowed the processor to continue issuing instructions past a branch based on a prediction:
    •  Those fetched instructions do not modify the processor state.
    •  These instructions are squashed if the prediction is incorrect.
  –  We now allow the processor to execute these instructions before we know if it is ok to execute them:
    •  We need to correctly restore the processor state if such an instruction should not have been executed.
    •  We need to pass the results from these instructions to future instructions as if the program is just following that path.

Hardware Based Speculation

[Figure: control-flow graph with two branches. B1 tests "x < y?" and B2 tests "X < z"; each has a taken (T) edge and a not-taken (N) edge. The basic blocks contain the assignments "A=b+c, C=c-1", "C=0, A=0", "B=b+1, A=a+1", "C=a", and "D=a+b+c, …", followed by a use of d]

•  Assume the processor predicts B1 to be taken (T) and executes.
•  What will happen if the prediction was wrong?
•  What value of each variable should be used if the processor predicts B1 and B2 taken (T) and executes instructions along the way?

Hardware Based Speculation

•  In order to execute instructions speculatively, we need to provide means:
  –  To roll back the values of both registers and the memory to their correct values upon a misprediction.
  –  To communicate speculatively calculated values to the new uses of those values.
•  Both can be provided by using a simple structure called the Reorder Buffer (ROB).


Reorder Buffer

•  It is a simple circular array with a head and a tail pointer:
  –  A new instruction is allocated a position at the tail in program order.
  –  Each entry provides a location for storing the instruction’s result.
  –  New instructions look for the values starting from the tail and working back.
  –  When the instruction at the head completes and becomes non-speculative, its values are committed and the instruction is removed from the buffer.

[Figure: circular buffer with Tail and Head pointers]

Reorder Buffer

•  3 fields: instr, destination, value
•  Reorder buffer can be an operand source => more registers, like RS
•  Supplies operands between execution complete & commit
•  Use the reorder buffer number instead of the reservation station when execution completes
•  Once an instruction commits, its result is put into the register
•  As a result, it's easy to undo speculated instructions on mispredicted branches or on exceptions
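The circular-array behavior described above can be sketched as a small Python class. The names (`Entry`, `ReorderBuffer`, `allocate`, `write_result`, `lookup`, `commit`), the buffer size, and the sample values are hypothetical; the sketch covers only register results and ignores stores, branches, and exceptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    instr: str                     # the three fields named on the slide
    dest: str
    value: Optional[float] = None
    done: bool = False             # set when the result has been written


class ReorderBuffer:
    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0              # oldest instruction
        self.tail = 0              # next free slot
        self.count = 0

    def allocate(self, instr, dest):
        """Place a new instruction at the tail; return its slot, or None if full."""
        if self.count == len(self.slots):
            return None
        slot = self.tail
        self.slots[slot] = Entry(instr, dest)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return slot

    def write_result(self, slot, value):
        entry = self.slots[slot]
        entry.value = value
        entry.done = True

    def lookup(self, reg):
        """Search from the tail back toward the head for the newest producer of reg."""
        for i in range(1, self.count + 1):
            slot = (self.tail - i) % len(self.slots)
            if self.slots[slot].dest == reg:
                return slot, self.slots[slot]
        return None

    def commit(self, regs):
        """Retire the head entry if its result is present; otherwise do nothing."""
        if self.count == 0 or not self.slots[self.head].done:
            return None
        entry = self.slots[self.head]
        regs[entry.dest] = entry.value
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return entry


rob, regs = ReorderBuffer(4), {}
first = rob.allocate("L.D F6, 34(R2)", "F6")
second = rob.allocate("ADD.D F6, F8, F2", "F6")
rob.write_result(second, 2.0)
print(rob.lookup("F6")[0] == second)   # the newest writer of F6 supplies the operand
print(rob.commit(regs))                # None: the head has not finished yet
```

Searching from the tail back is what lets a later instruction pick up the most recent speculative value of a register, while commit strictly from the head keeps the register file in program order.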

Steps of Speculative Tomasulo Algorithm

1.  Issue [get instruction from FP Op Queue]
  1.  Check if the reorder buffer is full.
  2.  Check if a reservation station is available.
  3.  Access the register file and the reorder buffer for the current values of the source operands.
  4.  Send the instruction, its reorder buffer slot number and the source operands to the reservation station.

  Once issued, the instruction stays in the reservation station until it gets both operands.

2.  Execute [operate on operands (EX)]
  •  When both operands are ready and a functional unit is available, the instruction executes.
  •  This step checks RAW hazards and, as long as operands are not ready, watches the CDB for results.


Steps of Speculative Tomasulo Algorithm (continued)

3.  Write result [finish execution (WB)]
  –  Write on the Common Data Bus to all awaiting FUs and the reorder buffer.
  –  Mark the reservation station available.

4.  Commit [update register file with reorder result]
  When:
  •  the instruction reaches the head of the reorder buffer,
  •  the result is present, and
  •  no exceptions are associated with the instruction,
  the instruction becomes non-speculative:
  •  Update the register file with the result (or store to memory).
  •  Remove the instruction from the reorder buffer.

  A mispredicted branch flushes the reorder buffer.
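The commit step, including the flush on a mispredicted branch, can be sketched as a function over a queue of reorder buffer entries. The dictionary layout of each entry (`kind`, `dest`, `value`, `done`, `mispredicted`) and the return strings are assumptions made for illustration, and exception handling is left out.

```python
from collections import deque


def commit_cycle(rob, regs, memory):
    """Try to commit the entry at the head of rob (a deque, oldest first)."""
    if not rob or not rob[0]["done"]:
        return "stall"                  # head missing or its result not yet present
    entry = rob.popleft()
    if entry["kind"] == "branch":
        if entry.get("mispredicted"):
            rob.clear()                 # every younger entry was speculative: discard
            return "flush"
        return "commit"
    if entry["kind"] == "store":
        memory[entry["dest"]] = entry["value"]   # memory is updated only at commit
    else:
        regs[entry["dest"]] = entry["value"]     # register file is updated only at commit
    return "commit"


rob = deque([
    {"kind": "alu", "dest": "F0", "value": 1.0, "done": True},
    {"kind": "branch", "done": True, "mispredicted": True},
    {"kind": "alu", "dest": "F2", "value": 9.0, "done": True},   # wrong-path instruction
])
regs, memory = {}, {}
print([commit_cycle(rob, regs, memory) for _ in range(3)], regs)
```

The wrong-path instruction had already produced a value, but because architectural state changes only at commit, flushing the buffer is enough to undo it.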

MIPS FP Unit

Recall: Four Steps of Speculative Tomasulo Algorithm

1.  Issue: get instruction from FP Op Queue
  If a reservation station and a reorder buffer slot are free, issue instr & send operands & reorder buffer no. for destination (this stage is sometimes called “dispatch”)

2.  Execution: operate on operands (EX)
  When both operands are ready then execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (sometimes called “issue”)

3.  Write result: finish execution (WB)
  Write on the Common Data Bus to all awaiting FUs & the reorder buffer; mark the reservation station available.

4.  Commit: update register with reorder result
  When the instr. is at the head of the reorder buffer & the result is present, update the register with the result (or store to memory) and remove the instr from the reorder buffer. A mispredicted branch flushes the reorder buffer (sometimes called “graduation”)


Tomasulo With Reorder Buffer

[Figure: datapath with the FP Op Queue, Registers, Reservation Stations feeding the FP adders and FP multipliers, paths to and from memory, and a Reorder Buffer with entries ROB1 (oldest) through ROB7 (newest). Each reorder buffer entry holds the instruction type, the destination (Dest), the value, and a Done? flag]

Example 1

    LD    F6, 34(R2)
    LD    F2, 45(R3)
    MULTD F0, F2, F4
    SUBD  F8, F6, F2
    DIVD  F10, F0, F6
    ADDD  F6, F8, F2

The following slides track four tables, all empty before the first instruction issues:

•  Instruction status: one row per instruction (L.D F6, 34(R2); L.D F2, 45(R3); MUL.D F0, F2, F4; SUB.D F8, F2, F6; DIV.D F10, F0, F6; ADD.D F6, F8, F2) with the columns Issue, Execute, Write result, Commit
•  Reservation stations: Load1, Load2, Add1, Add2, Add3, Mult1, Mult2 with the columns Busy, Op, Vj, Vk, Qj, Qk, Dest, A
•  Register result status: Reorder # and Busy for F0, F2, F4, F6, F8, F10, F12, …
•  Reorder buffer: entries 1 to 6 with the columns Entry, Busy, Instruction, State, Destination, Value

Time = 1: First load is issued

Reservation stations (busy entries only):
Name   Op    Vj        Vk  Qj  Qk  Dest  A
Load1  Load  Regs[R2]  -   -   -   #1    34

Register result status (busy registers): F6: Reorder #1

Reorder buffer:
Entry  Busy  Instruction      State  Destination  Value
1      yes   L.D F6, 34(R2)   Issue  F6           -

Time = 2: First load executes; second load is issued

Reservation stations (busy entries only):
Name   Op    Vj        Vk  Qj  Qk  Dest  A
Load1  Load  -         -   -   -   #1    34+Regs[R2]
Load2  Load  Regs[R3]  -   -   -   #2    45

Register result status (busy registers): F2: Reorder #2, F6: Reorder #1

Reorder buffer:
Entry  Busy  Instruction      State    Destination  Value
1      yes   L.D F6, 34(R2)   Execute  F6           -
2      yes   L.D F2, 45(R3)   Issue    F2           -

Time = 3: First load executes; second load executes; Mul is issued

Reservation stations (busy entries only):
Name   Op    Vj  Vk        Qj  Qk  Dest  A
Load1  Load  -   -         -   -   #1    34+Regs[R2]
Load2  Load  -   -         -   -   #2    45+Regs[R3]
Mult1  Mult  -   Regs[F4]  #2  -   #3    -

Register result status (busy registers): F0: Reorder #3, F2: Reorder #2, F6: Reorder #1

Reorder buffer:
Entry  Busy  Instruction        State    Destination  Value
1      yes   L.D F6, 34(R2)     Execute  F6           -
2      yes   L.D F2, 45(R3)     Execute  F2           -
3      yes   MUL.D F0, F2, F4   Issue    F0           -

Time = 4: First load writes result; second load executes; Sub is issued

Reservation stations (busy entries only):
Name   Op    Vj  Vk                Qj  Qk  Dest  A
Load2  Load  -   -                 -   -   #2    45+Regs[R3]
Add1   Sub   -   Mem[34+Regs[R2]]  #2  -   #4    -
Mult1  Mult  -   Regs[F4]          #2  -   #3    -

Register result status (busy registers): F0: Reorder #3, F2: Reorder #2, F6: Reorder #1, F8: Reorder #4

Reorder buffer:
Entry  Busy  Instruction        State             Destination  Value
1      yes   L.D F6, 34(R2)     Write result      F6           Mem[34+Regs[R2]]
2      yes   L.D F2, 45(R3)     Execute           F2           -
3      yes   MUL.D F0, F2, F4   Stalled in issue  F0           -
4      yes   SUB.D F8, F2, F6   Issue             F8           -

Time = 5: First load commits; second load writes result; Div is issued

Reservation stations (busy entries only):
Name   Op    Vj                Vk                Qj  Qk  Dest  A
Add1   Sub   Mem[45+Regs[R3]]  Mem[34+Regs[R2]]  -   -   #4    -
Mult1  Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -

Register result status (busy registers): F0: Reorder #3, F2: Reorder #2, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State             Destination  Value
1      no    L.D F6, 34(R2)      Commit            F6           Mem[34+Regs[R2]]
2      yes   L.D F2, 45(R3)      Write result      F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Stalled in issue  F0           -
4      yes   SUB.D F8, F2, F6    Stalled in issue  F8           -
5      yes   DIV.D F10, F0, F6   Issue             F10          -

Time = 6: Second load commits; Mul (1/10) and Sub (1/2) execute; Add is issued

Reservation stations (busy entries only):
Name   Op    Vj                Vk                Qj  Qk  Dest  A
Add1   Sub   Mem[45+Regs[R3]]  Mem[34+Regs[R2]]  -   -   #4    -
Add2   Add   -                 Mem[45+Regs[R3]]  #4  -   #6    -
Mult1  Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -

Register result status (busy registers): F0: Reorder #3, F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State             Destination  Value
1      no    L.D F6, 34(R2)      Commit            F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit            F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Execute           F0           -
4      yes   SUB.D F8, F2, F6    Execute           F8           -
5      yes   DIV.D F10, F0, F6   Stalled in issue  F10          -
6      yes   ADD.D F6, F8, F2    Issue             F6           -

Time = 7: Mul (2/10) and Sub (2/2) execute

Reservation stations (busy entries only):
Name   Op    Vj                Vk                Qj  Qk  Dest  A
Add1   Sub   Mem[45+Regs[R3]]  Mem[34+Regs[R2]]  -   -   #4    -
Add2   Add   -                 Mem[45+Regs[R3]]  #4  -   #6    -
Mult1  Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -

Register result status (busy registers): F0: Reorder #3, F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State             Destination  Value
1      no    L.D F6, 34(R2)      Commit            F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit            F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Execute           F0           -
4      yes   SUB.D F8, F2, F6    Execute           F8           -
5      yes   DIV.D F10, F0, F6   Stalled in issue  F10          -
6      yes   ADD.D F6, F8, F2    Issue             F6           -

Time = 8: Mul executes (3/10); Sub writes result (X)

Reservation stations (busy entries only):
Name   Op    Vj                Vk                Qj  Qk  Dest  A
Add2   Add   X                 Mem[45+Regs[R3]]  -   -   #6    -
Mult1  Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -

Register result status (busy registers): F0: Reorder #3, F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State             Destination  Value
1      no    L.D F6, 34(R2)      Commit            F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit            F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Execute           F0           -
4      yes   SUB.D F8, F2, F6    Write result      F8           X
5      yes   DIV.D F10, F0, F6   Stalled in issue  F10          -
6      yes   ADD.D F6, F8, F2    Stalled in issue  F6           -

Time = 9: Mul executes (4/10); Add executes (1/2)

Reservation stations (busy entries only):
Name   Op    Vj                Vk                Qj  Qk  Dest  A
Add2   Add   X                 Mem[45+Regs[R3]]  -   -   #6    -
Mult1  Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -

Register result status (busy registers): F0: Reorder #3, F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Execute            F0           -
4      yes   SUB.D F8, F2, F6    Waiting to commit  F8           X
5      yes   DIV.D F10, F0, F6   Stalled in issue   F10          -
6      yes   ADD.D F6, F8, F2    Execute            F6           -

Time = 10: Mul executes (5/10); Add executes (2/2)

Reservation stations (busy entries only):
Name   Op    Vj                Vk                Qj  Qk  Dest  A
Add2   Add   X                 Mem[45+Regs[R3]]  -   -   #6    -
Mult1  Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -

Register result status (busy registers): F0: Reorder #3, F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Execute            F0           -
4      yes   SUB.D F8, F2, F6    Waiting to commit  F8           X
5      yes   DIV.D F10, F0, F6   Stalled in issue   F10          -
6      yes   ADD.D F6, F8, F2    Execute            F6           -

Time = 11: Mul executes (6/10); Add writes result (Y)

Reservation stations (busy entries only):
Name   Op    Vj                Vk                Qj  Qk  Dest  A
Mult1  Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -

Register result status (busy registers): F0: Reorder #3, F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Execute            F0           -
4      yes   SUB.D F8, F2, F6    Waiting to commit  F8           X
5      yes   DIV.D F10, F0, F6   Stalled in issue   F10          -
6      yes   ADD.D F6, F8, F2    Write result       F6           Y

Faster than light computation (skip a couple of cycles)

Time = 16: Mul writes result (Z)

Reservation stations (busy entries only):
Name   Op   Vj  Vk                Qj  Qk  Dest  A
Mult2  Div  Z   Mem[34+Regs[R2]]  -   -   #5    -

Register result status (busy registers): F0: Reorder #3, F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      yes   MUL.D F0, F2, F4    Write result       F0           Z
4      yes   SUB.D F8, F2, F6    Waiting to commit  F8           X
5      yes   DIV.D F10, F0, F6   Stalled in issue   F10          -
6      yes   ADD.D F6, F8, F2    Waiting to commit  F6           Y

Time = 17: Mul commits; Div is executed (1/40)

Reservation stations (busy entries only):
Name   Op   Vj  Vk                Qj  Qk  Dest  A
Mult2  Div  Z   Mem[34+Regs[R2]]  -   -   #5    -

Register result status (busy registers): F6: Reorder #6, F8: Reorder #4, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      no    MUL.D F0, F2, F4    Commit             F0           Z
4      yes   SUB.D F8, F2, F6    Waiting to commit  F8           X
5      yes   DIV.D F10, F0, F6   Execute            F10          -
6      yes   ADD.D F6, F8, F2    Waiting to commit  F6           Y

Time = 18: Sub commits; Div is executed (2/40)

Reservation stations (busy entries only):
Name   Op   Vj  Vk                Qj  Qk  Dest  A
Mult2  Div  Z   Mem[34+Regs[R2]]  -   -   #5    -

Register result status (busy registers): F6: Reorder #6, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      no    MUL.D F0, F2, F4    Commit             F0           Z
4      no    SUB.D F8, F2, F6    Commit             F8           X
5      yes   DIV.D F10, F0, F6   Execute            F10          -
6      yes   ADD.D F6, F8, F2    Waiting to commit  F6           Y

Faster than light computation (skip a couple of cycles)

Time = 57: Div writes result (W)

Reservation stations: none busy

Register result status (busy registers): F6: Reorder #6, F10: Reorder #5

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      no    MUL.D F0, F2, F4    Commit             F0           Z
4      no    SUB.D F8, F2, F6    Commit             F8           X
5      yes   DIV.D F10, F0, F6   Write result       F10          W
6      yes   ADD.D F6, F8, F2    Waiting to commit  F6           Y

Time = 58: Div commits

Reservation stations: none busy

Register result status (busy registers): F6: Reorder #6

Reorder buffer:
Entry  Busy  Instruction         State              Destination  Value
1      no    L.D F6, 34(R2)      Commit             F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit             F2           Mem[45+Regs[R3]]
3      no    MUL.D F0, F2, F4    Commit             F0           Z
4      no    SUB.D F8, F2, F6    Commit             F8           X
5      no    DIV.D F10, F0, F6   Commit             F10          W
6      yes   ADD.D F6, F8, F2    Waiting to commit  F6           Y

Time = 59: Add commits

Reservation stations: none busy

Register result status: no busy registers

Reorder buffer:
Entry  Busy  Instruction         State   Destination  Value
1      no    L.D F6, 34(R2)      Commit  F6           Mem[34+Regs[R2]]
2      no    L.D F2, 45(R3)      Commit  F2           Mem[45+Regs[R3]]
3      no    MUL.D F0, F2, F4    Commit  F0           Z
4      no    SUB.D F8, F2, F6    Commit  F8           X
5      no    DIV.D F10, F0, F6   Commit  F10          W
6      no    ADD.D F6, F8, F2    Commit  F6           Y
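The commit times in this example follow from in-order commit. The sketch below recomputes them from the write-result cycles shown in the slides (4, 5, 16, 8, 57, 11), under two assumptions that are consistent with the trace but not stated in it: an instruction commits no earlier than the cycle after it writes its result, and at most one instruction commits per cycle. The function name `commit_cycles` is illustrative.

```python
def commit_cycles(write_result_cycles):
    """In-order commit: each instruction waits for its own result and for its predecessor."""
    commits = []
    previous = 0
    for written in write_result_cycles:
        previous = max(written + 1, previous + 1)
        commits.append(previous)
    return commits


# Write-result cycles taken from the example, in program order.
write_result = {
    "L.D F6": 4, "L.D F2": 5, "MUL.D F0": 16,
    "SUB.D F8": 8, "DIV.D F10": 57, "ADD.D F6": 11,
}
print(dict(zip(write_result, commit_cycles(write_result.values()))))
```

SUB.D and ADD.D finish long before MUL.D and DIV.D, yet they commit at cycles 18 and 59 because each must wait for the older instruction ahead of it in the reorder buffer.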