CISC 662 Graduate Computer Architecture
Lecture 11: Hardware Speculation
Michela Taufer http://www.cis.udel.edu/~taufer/teaching/CIS662F07
PowerPoint lecture notes from John Hennessy and David Patterson's Computer Architecture, 4th edition
Additional teaching material from:
Jelena Mirkovic (U Del), John Kubiatowicz (UC Berkeley), and Soner Oender (Michigan Technological University)
Branch Predictions
Reducing Branch Penalty
Branch penalty in dynamically scheduled processors: cycles wasted flushing the pipeline on mispredicted branches
To reduce the branch penalty:
Predict branch/jump instructions AND the branch direction (taken or not taken)
Predict the branch/jump target address (for taken branches)
Speculatively execute instructions along the predicted path
What to Use and What to Predict
Available info:
– Current predicted PC
– Past branch history (direction and target)
What to predict:
– Conditional branch instructions: branch direction and target address
– Jump instructions: target address
– Procedure call/return: target address
May need pre-decoded instructions
[Figure: the PC accesses the instruction memory (IM) and the predictors; the predictors return pred_PC and prediction info; the PC and instruction are fed back to the predictors]
Misprediction Detection and Feedback
Detection:
• At the end of decoding
  – The target address is known at decoding and does not match the prediction: flush the fetch stage
• At commit (most cases)
  – Wrong branch direction, or the target address does not match: flush the whole pipeline
  (The MIPS R10000 detects this at EXE)
Feedback:
• Any time a misprediction is detected
• At a branch's commit (feedback at EXE is called speculative update)
[Figure: pipeline stages FETCH, RENAME, SCHD, REB/ROB, EXE, WB, COMMIT, with feedback paths to the predictors]
Branch Direction Prediction
• Predict branch direction: taken or not taken (T/NT)
• Static prediction: the compiler decides the direction
• Dynamic prediction: hardware decides the direction using dynamic information
  1. 1-bit Branch-Prediction Buffer
  2. 2-bit Branch-Prediction Buffer
  3. Correlating Branch Prediction Buffer
  4. Tournament Branch Predictor
  5. and more …
[Figure: BNE R1, R2, L1 … L1: …, with the taken path to L1 and the not-taken fall-through path]
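Predictor for a Single Branch
General form:
1. Access the predictor state (using the PC)
2. Predict output: T/NT
3. Feedback: actual outcome T/NT
1-bit prediction:
  State 1: predict taken; state 0: predict not taken
  A taken outcome (T) moves the predictor to state 1; a not-taken outcome (NT) moves it to state 0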
Branch History Table of 1-bit Predictors
(The BHT is also called the Branch Prediction Buffer in the textbook.)
• We could use a single 1-bit predictor for all branches, but accuracy is low
• BHT: a table of simple predictors, indexed by bits from the PC
• Similar to a direct-mapped cache
• More entries: more cost, but fewer conflicts and higher accuracy
• A BHT can contain more complex predictors
[Figure: k bits of the branch address index a table of 2^k predictors, which supplies the prediction]
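As a rough sketch of the indexing described above (not the textbook's design), the Python class below keeps 2^k one-bit entries and indexes them with k low-order PC bits. Skipping the two lowest PC bits assumes 4-byte MIPS instructions; the class and method names are hypothetical. Because there are no tags, two branches whose PCs share the index bits alias to the same entry, which is the conflict the slide mentions.

```python
class BranchHistoryTable:
    """Hypothetical 1-bit BHT: 2**k entries, like a direct-mapped cache without tags."""

    def __init__(self, k: int):
        self.k = k
        self.bits = [0] * (1 << k)  # 0 = predict not taken, 1 = predict taken

    def index(self, pc: int) -> int:
        # Assumption: 4-byte instructions, so PC bits 1..0 are always zero
        # and are skipped before taking the k index bits.
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict(self, pc: int) -> bool:
        return self.bits[self.index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        self.bits[self.index(pc)] = 1 if taken else 0


bht = BranchHistoryTable(k=4)                       # 16 entries
bht.update(0x400010, taken=True)
print(bht.predict(0x400010))                        # True
# 0x400050 differs from 0x400010 only above the 4 index bits, so it aliases.
print(bht.index(0x400010) == bht.index(0x400050))  # True
print(bht.predict(0x400050))                        # True (inherited from the other branch)
```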
1-bit BHT Weakness
• Example: in a loop, a 1-bit BHT will cause 2 mispredictions
• Consider a loop branch that is taken 9 times before the loop exits (and is then not taken once):
  for (…){ for (i=0; i<9; i++) a[i] = a[i] * 2.0; }
  – At the end of the loop, when it exits instead of looping as before
  – The first time through the loop on the next execution of the code, when it predicts exit instead of looping
  – Only 80% accuracy even though the branch loops 90% of the time
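A short simulation can confirm the 80% figure. This sketch assumes a single 1-bit predictor that always holds the branch's most recent outcome and starts at not taken; the function name is made up for illustration.

```python
def one_bit_mispredictions(outcomes, state=False):
    """Count misses of a 1-bit predictor; `state` is the last outcome seen."""
    misses = 0
    for taken in outcomes:
        if state != taken:       # the predictor guesses the last outcome repeats
            misses += 1
        state = taken            # 1 bit: remember only the most recent outcome
    return misses


runs = 10
outcomes = ([True] * 9 + [False]) * runs          # taken 9 times, then exit: 90% taken
misses = one_bit_mispredictions(outcomes)
accuracy = (len(outcomes) - misses) / len(outcomes)
print(misses, accuracy)                            # 20 0.8
```

Each pass costs exactly two misses: the exit itself, and the first iteration of the next pass, which is predicted as another exit.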
• Solution: a 2-bit scheme in which a prediction must miss twice before it is changed (Figure 3.7, p. 249)
• Blue: stop, not taken
• Gray: go, taken
• Adds hysteresis to the decision-making process
2-bit Saturating Counter
  States 11 and 10: predict taken
  States 01 and 00: predict not taken
  A taken branch (T) moves the counter up one state, saturating at 11; a not-taken branch (NT) moves it down one state, saturating at 00
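The saturating counter can be sketched the same way. Assuming the counter starts in the strongly not-taken state (00) and sees a loop branch that is taken 9 times and then not taken once, it mispredicts three times while it warms up and afterwards only once per pass, at the loop exit, for 90% accuracy. The class and function names are hypothetical.

```python
class TwoBitCounter:
    """2-bit saturating counter: 3 (11) and 2 (10) predict taken, 1 and 0 not taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(self.state + 1, 3)   # saturate at 11
        else:
            self.state = max(self.state - 1, 0)   # saturate at 00


def mispredictions(counter, outcomes):
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


loop = [True] * 9 + [False]           # loop branch: taken 9 times, then exit
c = TwoBitCounter(state=0)            # start in "strongly not taken"
warmup = mispredictions(c, loop)      # first pass: the counter is still training
steady = mispredictions(c, loop * 9)  # next 9 passes
print(warmup, steady, (90 - steady) / 90)   # 3 9 0.9
```

At the exit the counter only drops from 11 to 10, so it still predicts taken when the loop is re-entered; that is the hysteresis the slide describes.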
Correlating Branches
Code example showing the potential:
if (d==0) d=1;
if (d==1) …
Assembly code (d in R1):
      BNEZ   R1, L1
      DADDIU R1, R0, #1
L1:   DADDIU R3, R1, #-1
      BNEZ   R3, L2
L2:   …
Observation: if BNEZ1 is not taken, then BNEZ2 is not taken either (d was 0, so it is set to 1 and R3 = 0)
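To check the observation, the hypothetical Python function below mirrors the assembly: b1 is BNEZ R1, L1 (taken when d != 0) and b2 is BNEZ R3, L2 (taken when d, after the possible update, is not 1).

```python
def branch_outcomes(d: int):
    """Return (b1_taken, b2_taken) for the two-branch code sequence."""
    b1_taken = d != 0          # BNEZ R1, L1
    if not b1_taken:
        d = 1                  # DADDIU R1, R0, #1
    r3 = d - 1                 # L1: DADDIU R3, R1, #-1
    b2_taken = r3 != 0         # BNEZ R3, L2
    return b1_taken, b2_taken


results = {d: branch_outcomes(d) for d in (0, 1, 2, 5)}
print(results)
# {0: (False, False), 1: (True, False), 2: (True, True), 5: (True, True)}
```

Knowing b1's outcome fully determines b2 when b1 is not taken, which is the kind of information a predictor that looks only at b2's own history cannot exploit.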
Correlating Branch Predictor
Idea: the taken/not-taken behavior of recently executed branches is related to the behavior of the next branch (as well as to that branch's own history)
– The behavior of recent branches then selects between, say, 2 predictions for the next branch, and only that prediction is updated
– (1,1) predictor: 1-bit global history, 1-bit local predictors
[Figure: 4 bits of the branch address select a pair of 1-bit per-branch (local) predictors; the 1-bit global branch history (0 = not taken) chooses which of the pair gives the prediction]
General form: (m, n) predictor
– m bits of global history; n-bit local (per-branch) predictors
– Records correlation between m+1 branches
– Simple implementation: the global history can be stored in a shift register
– Example: (2,2) predictor: 2-bit global history, 2-bit local predictors
[Figure: 4 bits of the branch address select a group of 2-bit per-branch (local) predictors; the 2-bit global branch history (01 = not taken then taken) chooses which one gives the prediction]
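A minimal sketch of an (m, n) predictor, under these assumptions: the branch is selected by k low-order PC bits (skipping the two byte-offset bits of 4-byte instructions), the m-bit global history is a shift register, and every counter starts at 0. The class name and parameters are illustrative, not from the textbook. On a branch that strictly alternates T, NT, T, NT, a (0,1) predictor (a plain 1-bit BHT) misses every time, while a (1,1) predictor misses only once; with a single branch running, the global history bit is simply that branch's previous outcome.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m global-history bits pick one of 2**m n-bit counters per branch."""

    def __init__(self, m: int, n: int, k: int = 4):
        self.m, self.n, self.k = m, n, k
        self.history = 0                          # m-bit global history (shift register)
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)             # counter >= threshold -> predict taken
        self.table = [[0] * (1 << m) for _ in range(1 << k)]

    def _row(self, pc: int):
        return self.table[(pc >> 2) & ((1 << self.k) - 1)]

    def predict(self, pc: int) -> bool:
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc: int, taken: bool) -> None:
        row, h = self._row(pc), self.history
        # Update only the counter selected by the current global history.
        row[h] = min(row[h] + 1, self.max_count) if taken else max(row[h] - 1, 0)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def run(pred, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if pred.predict(pc) != taken:
            misses += 1
        pred.update(pc, taken)
    return misses


alternating = [True, False] * 50   # T, NT, T, NT, ... for one branch
print(run(CorrelatingPredictor(m=0, n=1), 0x400010, alternating))   # 100
print(run(CorrelatingPredictor(m=1, n=1), 0x400010, alternating))   # 1
```

The slide's (2,2) example corresponds to `CorrelatingPredictor(m=2, n=2)` in this sketch.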
Accuracy of Different Schemes (Figure 3.15, p. 206)
[Figure: frequency of mispredictions for a 4096-entry 2-bit BHT, an unlimited-entry 2-bit BHT, and a 1024-entry (2,2) BHT]
Accuracy of Return Address Predictor
Branch Target Buffer
• Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken)
  – Note: we must now check for a branch match, since we can't use the wrong branch's target address
• Example: BTB combined with BHT
[Figure: the PC of the instruction being fetched is compared (=?) with the Branch PC field of the BTB; each entry also holds a Predicted PC and extra prediction state bits]
Yes: the instruction is a branch; use the predicted PC as the next PC
No: the branch is not predicted; proceed normally (Next PC = PC+4)
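The lookup on the slide can be sketched as follows, assuming a BTB combined with 2-bit prediction state per entry. A Python dict keyed by the full branch PC stands in for the tag compare ("=?"); real BTBs have a limited number of entries and may store partial tags. Inserting a new entry as weakly taken is an arbitrary choice for this sketch, and the class name is hypothetical.

```python
class BranchTargetBuffer:
    """Hypothetical BTB: branch PC -> [predicted target PC, 2-bit counter]."""

    def __init__(self):
        self.entries = {}

    def next_pc(self, pc: int) -> int:
        entry = self.entries.get(pc)             # the "=?" match on the branch PC
        if entry is not None and entry[1] >= 2:
            return entry[0]                      # known branch, predicted taken
        return pc + 4                            # not predicted: proceed normally

    def update(self, pc: int, taken: bool, target: int) -> None:
        entry = self.entries.get(pc)
        if entry is None:
            if taken:
                self.entries[pc] = [target, 2]   # assumption: start weakly taken
            return
        if taken:
            entry[0] = target
            entry[1] = min(entry[1] + 1, 3)
        else:
            entry[1] = max(entry[1] - 1, 0)


btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x400100)))   # 0x400104: miss, so PC + 4
btb.update(0x400100, taken=True, target=0x400040)
print(hex(btb.next_pc(0x400100)))   # 0x400040: hit, predicted taken
print(hex(btb.next_pc(0x400104)))   # 0x400108: no matching branch PC
```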
Hardware Speculation
• Exploiting more ILP requires that we overcome the limitation of control dependence:
  – With branch prediction, we allowed the processor to continue issuing instructions past a branch based on a prediction:
    • Those fetched instructions do not modify the processor state.
    • These instructions are squashed if the prediction is incorrect.
  – We now allow the processor to execute these instructions before we know whether it is OK to execute them:
    • We need to correctly restore the processor state if such an instruction should not have been executed.
    • We need to pass the results of these instructions to later instructions as if the program were simply following that path.
Hardware-Based Speculation
• Assume the processor predicts B1 to be taken (T) and executes.
• What will happen if the prediction was wrong?
• What value of each variable should be used if the processor predicts B1 and B2 taken (T) and executes instructions along the way?
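[Figure: control-flow graph with two branches, B1 (x < y?) and B2 (X < z?), each with taken (T) and not-taken (N) paths; the paths contain statements such as A = b+c, C = c-1, C = 0, A = 0, B = b+1, A = a+1, C = a, and D = a+b+c, followed by a use of d]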
• In order to execute instructions speculatively, we need to provide the means:
  – To roll back the values of both registers and memory to their correct values upon a misprediction.
  – To communicate speculatively calculated values to new uses of those values.
• Both can be provided by a simple structure called the Reorder Buffer (ROB).
Reorder Buffer
• It is a simple circular array with a head and a tail pointer:
  – Each new instruction is allocated a position at the tail, in program order.
  – Each entry provides a location for storing the instruction's result.
  – New instructions look up their source values starting from the tail and moving back toward the head.
  – When the instruction at the head completes and becomes non-speculative, its values are committed and the instruction is removed from the buffer.
[Figure: circular buffer with tail and head pointers]
• 3 fields: instruction, destination, value
• The reorder buffer can be an operand source, so it provides extra registers, as the reservation stations do
• It supplies operands between execution completion and commit
• When execution completes, results are tagged with the reorder buffer number instead of the reservation station number
• Once an instruction commits, its result is put into the register
• As a result, it's easy to undo speculated instructions on mispredicted branches or on exceptions
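A minimal Python sketch of the circular buffer, assuming each entry records an instruction, a destination register, a value, and a ready flag (all names are hypothetical). New entries go in at the tail, operand lookups scan from the tail backward so the newest writer of a register wins, and commit retires only the head entry once its result is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    instr: str
    dest: str
    value: Optional[float] = None
    ready: bool = False          # result present (written on the CDB)


class ReorderBuffer:
    def __init__(self, size: int):
        self.slots: list = [None] * size
        self.head = 0            # oldest instruction (next to commit)
        self.tail = 0            # next free slot
        self.count = 0

    def full(self) -> bool:
        return self.count == len(self.slots)

    def allocate(self, instr: str, dest: str) -> int:
        """Place a new instruction at the tail, in program order."""
        if self.full():
            raise RuntimeError("reorder buffer full: issue must stall")
        idx = self.tail
        self.slots[idx] = Entry(instr, dest)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return idx

    def lookup(self, reg: str):
        """Scan from the tail backward: the newest writer of `reg` wins."""
        for i in range(self.count):
            idx = (self.tail - 1 - i) % len(self.slots)
            if self.slots[idx].dest == reg:
                return idx, self.slots[idx]
        return None

    def commit(self) -> Optional[Entry]:
        """Remove the head entry if its result is present."""
        if self.count == 0 or not self.slots[self.head].ready:
            return None
        entry = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return entry


rob = ReorderBuffer(4)
a = rob.allocate("L.D F6, 34(R2)", "F6")
b = rob.allocate("ADD.D F6, F8, F2", "F6")
print(rob.lookup("F6")[0] == b)   # True: the younger ADD.D is the newest writer of F6
print(rob.commit())               # None: the load has no result yet
rob.slots[a].value, rob.slots[a].ready = 1.5, True
print(rob.commit().instr)         # L.D F6, 34(R2)
```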
Steps of Speculative Tomasulo Algorithm
1. Issue [get instruction from FP Op Queue]
   1. Check if the reorder buffer is full.
   2. Check if a reservation station is available.
   3. Access the register file and the reorder buffer for the current values of the source operands.
   4. Send the instruction, its reorder buffer slot number, and the source operands to the reservation station.
   Once issued, the instruction stays in the reservation station until it has both operands.
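The four issue sub-steps can be sketched as one function. The data layout is an assumption made for illustration: the register file is a dict, the register status maps a register to the ROB entry that will write it, and the ROB is a dict from entry number to a small record; the function names are hypothetical. The usage mirrors cycle 3 of Example 1, where MUL.D F0, F2, F4 issues while F2 is still being loaded by ROB entry #2.

```python
def read_operand(reg, regs, reg_status, rob):
    """Return (V, Q): the value if available, else the ROB entry to wait for."""
    tag = reg_status.get(reg)
    if tag is None:
        return regs[reg], None              # no pending writer: use the register file
    if rob[tag]["ready"]:
        return rob[tag]["value"], None      # computed but not yet committed
    return None, tag                        # watch the CDB for this ROB entry


def issue(op, dest, src1, src2, regs, reg_status, rob, rob_size, free_stations):
    # Steps 1-2: stall if the ROB is full or no reservation station is free.
    if len(rob) >= rob_size or not free_stations:
        return None
    tag = max(rob, default=0) + 1           # hypothetical: entries numbered 1, 2, ...
    rob[tag] = {"instr": op, "dest": dest, "ready": False, "value": None}
    # Step 3: read the sources from the register file or the ROB.
    vj, qj = read_operand(src1, regs, reg_status, rob)
    vk, qk = read_operand(src2, regs, reg_status, rob)
    reg_status[dest] = tag                  # later readers of dest wait on this entry
    # Step 4: send the instruction, its ROB number and the operands to the station.
    station = free_stations.pop(0)
    return {"station": station, "op": op, "Vj": vj, "Qj": qj,
            "Vk": vk, "Qk": qk, "Dest": tag}


regs = {"F2": 0.0, "F4": 2.5}
rob = {1: {"instr": "L.D", "dest": "F6", "ready": False, "value": None},
       2: {"instr": "L.D", "dest": "F2", "ready": False, "value": None}}
reg_status = {"F6": 1, "F2": 2}
rs = issue("MUL.D", "F0", "F2", "F4", regs, reg_status, rob, 6, ["Mult1", "Mult2"])
print(rs)
# {'station': 'Mult1', 'op': 'MUL.D', 'Vj': None, 'Qj': 2, 'Vk': 2.5, 'Qk': None, 'Dest': 3}
```

The sources are read before the destination is recorded in the register status, so an instruction that reads and writes the same register still gets the older value.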
2. Execute [operate on operands (EX)]
   • When both operands are ready and a functional unit is available, the instruction executes.
   • This step checks RAW hazards: as long as operands are not ready, it watches the CDB for results.
3. Write result [finish execution (WB)]
   – Write on the Common Data Bus to all awaiting FUs and the reorder buffer.
   – Mark the reservation station available.
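A sketch of this step, assuming reservation stations are dicts with Vj/Vk/Qj/Qk fields and the ROB is a dict keyed by entry number (a hypothetical layout chosen for this sketch). The example reproduces cycle 8 of Example 1 with made-up numbers: SUB.D (station Add1, ROB #4) broadcasts its result, which ADD.D in Add2 was waiting for. The register file is not touched here; that happens only at commit.

```python
def write_result(tag, value, stations, rob, producer):
    """Broadcast (tag, value) on the CDB, then free the producer's station."""
    for rs in stations.values():
        if rs is None:                      # free station
            continue
        if rs["Qj"] == tag:                 # waiting on this result as operand j
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:                 # ... or as operand k
            rs["Vk"], rs["Qk"] = value, None
    rob[tag]["value"] = value               # the ROB entry now holds the result
    rob[tag]["ready"] = True
    stations[producer] = None               # mark the reservation station available


stations = {
    "Add1": {"op": "Sub", "Vj": 3.0, "Qj": None, "Vk": 1.0, "Qk": None, "Dest": 4},
    "Add2": {"op": "Add", "Vj": None, "Qj": 4, "Vk": 1.0, "Qk": None, "Dest": 6},
}
rob = {4: {"dest": "F8", "ready": False, "value": None},
       6: {"dest": "F6", "ready": False, "value": None}}
write_result(4, 2.0, stations, rob, producer="Add1")
print(stations["Add2"]["Vj"], stations["Add1"], rob[4]["ready"])   # 2.0 None True
```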
4. Commit [update register file with reorder result]
   • Conditions: the instruction has reached the head of the reorder buffer, its result is present, and no exception is associated with it.
   • The instruction then becomes non-speculative:
     – Update the register file with the result (or store to memory).
     – Remove the instruction from the reorder buffer.
   • A mispredicted branch flushes the reorder buffer.
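A sketch of the commit step, assuming the ROB is a `collections.deque` with the oldest entry on the left and that each entry is a dict with the hypothetical fields shown. Stores write memory and other instructions write the register file only here; a mispredicted branch or an exception at the head discards every younger entry, so no speculative result reaches architectural state.

```python
from collections import deque


def commit(rob: deque, regs: dict, memory: dict, width: int = 1):
    """Retire up to `width` ready entries from the head, in program order.

    Returns (committed instruction names, reason for stopping early or None).
    """
    committed = []
    for _ in range(width):
        if not rob or not rob[0]["ready"]:
            break                                 # head has no result yet: wait
        head = rob[0]
        if head.get("exception"):
            rob.clear()                           # nothing at or after the head commits
            return committed, "exception"
        rob.popleft()
        committed.append(head["instr"])
        if head.get("is_branch"):
            if head.get("mispredicted"):
                rob.clear()                       # squash all younger, speculative entries
                return committed, "mispredict"
        elif head.get("is_store"):
            memory[head["addr"]] = head["value"]  # stores update memory only at commit
        else:
            regs[head["dest"]] = head["value"]    # architectural register update
    return committed, None


regs, memory = {}, {}
rob = deque([
    {"instr": "L.D F6, 34(R2)", "ready": True, "dest": "F6", "value": 1.0},
    {"instr": "BNEZ R1, L1", "ready": True, "is_branch": True, "mispredicted": True},
    {"instr": "ADD.D F6, F8, F2", "ready": True, "dest": "F6", "value": 9.0},
])
print(commit(rob, regs, memory, width=3))
# (['L.D F6, 34(R2)', 'BNEZ R1, L1'], 'mispredict')
print(regs, len(rob))   # {'F6': 1.0} 0
```

The speculative ADD.D had already computed its result, but because it sat behind the mispredicted branch it never overwrites F6.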
Recall: Four Steps of Speculative Tomasulo Algorithm (MIPS FP Unit)
1. Issue: get instruction from FP Op Queue
   If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called "dispatch")
2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (sometimes called "issue")
3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.
4. Commit: update the register with the reorder result
   When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (sometimes called "graduation")
Tomasulo With Reorder Buffer
[Figure: Tomasulo organization with a reorder buffer. The FP Op Queue feeds reservation stations in front of the FP adders and FP multipliers; the registers and the reorder buffer supply operands; the reorder buffer holds entries ROB1 (oldest) through ROB7 (newest), each with Instruction type, Dest., Value, and Done? fields; loads come from memory and stores go to memory]
Example 1
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Each cycle of the walkthrough shows four tables: the instruction status (Issue, Execute, Write result, Commit); the reservation stations Load1, Load2, Add1, Add2, Add3, Mult1, and Mult2 (fields Busy, Op, Vj, Vk, Qj, Qk, Dest, A); the register result status (Reorder # and Busy for F0 … F12); and the reorder buffer (entries 1 to 6, with fields Busy, Instruction, State, Destination, Value). Vj/Vk hold source values, Qj/Qk name the ROB entries that will produce missing sources, Dest is the ROB entry that receives the result, and A holds a load's immediate and then its effective address. Initially all four tables are empty.
Time = 1: First load is issued

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | | |
L.D F2, 45(R3) | | | |
MUL.D F0, F2, F4 | | | |
SUB.D F8, F6, F2 | | | |
DIV.D F10, F0, F6 | | | |
ADD.D F6, F8, F2 | | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Load1 | Load | Regs[R2] | | | | #1 | 34

Register result status (Reorder #): F6 #1; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Issue | F6 |
Time = 2: First load executes; second load is issued

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | |
L.D F2, 45(R3) | √ | | |
MUL.D F0, F2, F4 | | | |
SUB.D F8, F6, F2 | | | |
DIV.D F10, F0, F6 | | | |
ADD.D F6, F8, F2 | | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Load1 | Load | | | | | #1 | 34+Regs[R2]
Load2 | Load | Regs[R3] | | | | #2 | 45

Register result status (Reorder #): F2 #2, F6 #1; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Execute | F6 |
2 | yes | L.D F2, 45(R3) | Issue | F2 |
Time = 3: First load executes; second load executes; Mul is issued

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | |
L.D F2, 45(R3) | √ | √ | |
MUL.D F0, F2, F4 | √ | | |
SUB.D F8, F6, F2 | | | |
DIV.D F10, F0, F6 | | | |
ADD.D F6, F8, F2 | | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Load1 | Load | | | | | #1 | 34+Regs[R2]
Load2 | Load | | | | | #2 | 45+Regs[R3]
Mult1 | Mult | | Regs[F4] | #2 | | #3 |

Register result status (Reorder #): F0 #3, F2 #2, F6 #1; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Execute | F6 |
2 | yes | L.D F2, 45(R3) | Execute | F2 |
3 | yes | MUL.D F0, F2, F4 | Issue | F0 |
Time = 4: First load writes result; second load executes; Sub is issued

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ |
L.D F2, 45(R3) | √ | √ | |
MUL.D F0, F2, F4 | √ | | |
SUB.D F8, F6, F2 | √ | | |
DIV.D F10, F0, F6 | | | |
ADD.D F6, F8, F2 | | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Load2 | Load | | | | | #2 | 45+Regs[R3]
Add1 | Sub | Mem[34+Regs[R2]] | | | #2 | #4 |
Mult1 | Mult | | Regs[F4] | #2 | | #3 |

Register result status (Reorder #): F0 #3, F2 #2, F6 #1, F8 #4; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | yes | L.D F6, 34(R2) | Write result | F6 | Mem[34+Regs[R2]]
2 | yes | L.D F2, 45(R3) | Execute | F2 |
3 | yes | MUL.D F0, F2, F4 | Stalled in issue | F0 |
4 | yes | SUB.D F8, F6, F2 | Issue | F8 |
Time = 5: First load commits; second load writes result; Div is issued

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ |
MUL.D F0, F2, F4 | √ | | |
SUB.D F8, F6, F2 | √ | | |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Add1 | Sub | Mem[34+Regs[R2]] | Mem[45+Regs[R3]] | | | #4 |
Mult1 | Mult | Mem[45+Regs[R3]] | Regs[F4] | | | #3 |
Mult2 | Div | | Mem[34+Regs[R2]] | #3 | | #5 |

Register result status (Reorder #): F0 #3, F2 #2, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | yes | L.D F2, 45(R3) | Write result | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Stalled in issue | F0 |
4 | yes | SUB.D F8, F6, F2 | Stalled in issue | F8 |
5 | yes | DIV.D F10, F0, F6 | Issue | F10 |
Time = 6: Second load commits; Mul (1/10) and Sub (1/2) execute; Add is issued

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | |
SUB.D F8, F6, F2 | √ | √ | |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | √ | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Add1 | Sub | Mem[34+Regs[R2]] | Mem[45+Regs[R3]] | | | #4 |
Add2 | Add | | Mem[45+Regs[R3]] | #4 | | #6 |
Mult1 | Mult | Mem[45+Regs[R3]] | Regs[F4] | | | #3 |
Mult2 | Div | | Mem[34+Regs[R2]] | #3 | | #5 |

Register result status (Reorder #): F0 #3, F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 |
4 | yes | SUB.D F8, F6, F2 | Execute | F8 |
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 |
6 | yes | ADD.D F6, F8, F2 | Issue | F6 |
Time = 7: Mul (2/10) and Sub (2/2) execute

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | |
SUB.D F8, F6, F2 | √ | √ | |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | √ | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Add1 | Sub | Mem[34+Regs[R2]] | Mem[45+Regs[R3]] | | | #4 |
Add2 | Add | | Mem[45+Regs[R3]] | #4 | | #6 |
Mult1 | Mult | Mem[45+Regs[R3]] | Regs[F4] | | | #3 |
Mult2 | Div | | Mem[34+Regs[R2]] | #3 | | #5 |

Register result status (Reorder #): F0 #3, F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 |
4 | yes | SUB.D F8, F6, F2 | Execute | F8 |
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 |
6 | yes | ADD.D F6, F8, F2 | Issue | F6 |
Time = 8: Mul executes (3/10); Sub writes result (X)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | |
SUB.D F8, F6, F2 | √ | √ | √ |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | √ | | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Add2 | Add | X | Mem[45+Regs[R3]] | | | #6 |
Mult1 | Mult | Mem[45+Regs[R3]] | Regs[F4] | | | #3 |
Mult2 | Div | | Mem[34+Regs[R2]] | #3 | | #5 |

Register result status (Reorder #): F0 #3, F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 |
4 | yes | SUB.D F8, F6, F2 | Write result | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 |
6 | yes | ADD.D F6, F8, F2 | Stalled in issue | F6 |
Time = 9: Mul executes (4/10); Add executes (1/2)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | |
SUB.D F8, F6, F2 | √ | √ | √ |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | √ | √ | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Add2 | Add | X | Mem[45+Regs[R3]] | | | #6 |
Mult1 | Mult | Mem[45+Regs[R3]] | Regs[F4] | | | #3 |
Mult2 | Div | | Mem[34+Regs[R2]] | #3 | | #5 |

Register result status (Reorder #): F0 #3, F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 |
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 |
6 | yes | ADD.D F6, F8, F2 | Execute | F6 |
Time = 10: Mul executes (5/10); Add executes (2/2)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | |
SUB.D F8, F6, F2 | √ | √ | √ |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | √ | √ | |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Add2 | Add | X | Mem[45+Regs[R3]] | | | #6 |
Mult1 | Mult | Mem[45+Regs[R3]] | Regs[F4] | | | #3 |
Mult2 | Div | | Mem[34+Regs[R2]] | #3 | | #5 |

Register result status (Reorder #): F0 #3, F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 |
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 |
6 | yes | ADD.D F6, F8, F2 | Execute | F6 |
Time = 11: Mul executes (6/10); Add writes result (Y)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | |
SUB.D F8, F6, F2 | √ | √ | √ |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | √ | √ | √ |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Mult1 | Mult | Mem[45+Regs[R3]] | Regs[F4] | | | #3 |
Mult2 | Div | | Mem[34+Regs[R2]] | #3 | | #5 |

Register result status (Reorder #): F0 #3, F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Execute | F0 |
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 |
6 | yes | ADD.D F6, F8, F2 | Write result | F6 | Y
Faster than light computation (skip a couple of cycles)
Time = 16: Mul writes result (Z)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | √ |
SUB.D F8, F6, F2 | √ | √ | √ |
DIV.D F10, F0, F6 | √ | | |
ADD.D F6, F8, F2 | √ | √ | √ |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Mult2 | Div | Z | Mem[34+Regs[R2]] | | | #5 |

Register result status (Reorder #): F0 #3, F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | yes | MUL.D F0, F2, F4 | Write result | F0 | Z
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Stalled in issue | F10 |
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y
Time = 17: Mul commits; Div executes (1/40)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | √ | √
SUB.D F8, F6, F2 | √ | √ | √ |
DIV.D F10, F0, F6 | √ | √ | |
ADD.D F6, F8, F2 | √ | √ | √ |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Mult2 | Div | Z | Mem[34+Regs[R2]] | | | #5 |

Register result status (Reorder #): F6 #6, F8 #4, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | no | MUL.D F0, F2, F4 | Commit | F0 | Z
4 | yes | SUB.D F8, F6, F2 | Waiting to commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Execute | F10 |
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y
Time = 18: Sub commits; Div executes (2/40)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | √ | √
SUB.D F8, F6, F2 | √ | √ | √ | √
DIV.D F10, F0, F6 | √ | √ | |
ADD.D F6, F8, F2 | √ | √ | √ |

Reservation stations (busy stations only)
Name | Op | Vj | Vk | Qj | Qk | Dest | A
Mult2 | Div | Z | Mem[34+Regs[R2]] | | | #5 |

Register result status (Reorder #): F6 #6, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | no | MUL.D F0, F2, F4 | Commit | F0 | Z
4 | no | SUB.D F8, F6, F2 | Commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Execute | F10 |
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y
Faster than light computation (skip a couple of cycles)
Time = 57: Div writes result (W)

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | √ | √
SUB.D F8, F6, F2 | √ | √ | √ | √
DIV.D F10, F0, F6 | √ | √ | √ |
ADD.D F6, F8, F2 | √ | √ | √ |

Reservation stations: none busy

Register result status (Reorder #): F6 #6, F10 #5; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | no | MUL.D F0, F2, F4 | Commit | F0 | Z
4 | no | SUB.D F8, F6, F2 | Commit | F8 | X
5 | yes | DIV.D F10, F0, F6 | Write result | F10 | W
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y
Time = 58: Div commits

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | √ | √
SUB.D F8, F6, F2 | √ | √ | √ | √
DIV.D F10, F0, F6 | √ | √ | √ | √
ADD.D F6, F8, F2 | √ | √ | √ |

Reservation stations: none busy

Register result status (Reorder #): F6 #6; all other registers not busy

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | no | MUL.D F0, F2, F4 | Commit | F0 | Z
4 | no | SUB.D F8, F6, F2 | Commit | F8 | X
5 | no | DIV.D F10, F0, F6 | Commit | F10 | W
6 | yes | ADD.D F6, F8, F2 | Waiting to commit | F6 | Y
Time = 59: Add commits

Instruction status
Instruction | Issue | Execute | Write result | Commit
L.D F6, 34(R2) | √ | √ | √ | √
L.D F2, 45(R3) | √ | √ | √ | √
MUL.D F0, F2, F4 | √ | √ | √ | √
SUB.D F8, F6, F2 | √ | √ | √ | √
DIV.D F10, F0, F6 | √ | √ | √ | √
ADD.D F6, F8, F2 | √ | √ | √ | √

Reservation stations: none busy

Register result status: no register is waiting on the reorder buffer

Reorder buffer
Entry | Busy | Instruction | State | Destination | Value
1 | no | L.D F6, 34(R2) | Commit | F6 | Mem[34+Regs[R2]]
2 | no | L.D F2, 45(R3) | Commit | F2 | Mem[45+Regs[R3]]
3 | no | MUL.D F0, F2, F4 | Commit | F0 | Z
4 | no | SUB.D F8, F6, F2 | Commit | F8 | X
5 | no | DIV.D F10, F0, F6 | Commit | F10 | W
6 | no | ADD.D F6, F8, F2 | Commit | F6 | Y
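The commit cycles in Example 1 follow from the write-result cycles under in-order commit. Assuming at most one commit per cycle and that an instruction can commit no earlier than the cycle after it writes its result, commit(i) = max(write(i) + 1, commit(i-1) + 1). The short script below applies that rule to the write-result cycles read off the snapshots (4, 5, 16, 8, 57, 11) and reproduces commits at cycles 5, 6, 17, 18, 58, and 59. SUB.D and ADD.D finish early (cycles 8 and 11) but must wait for MUL.D and DIV.D ahead of them.

```python
# Write-result cycle of each instruction, in program order (from the snapshots).
write_cycle = {
    "L.D F6, 34(R2)": 4,
    "L.D F2, 45(R3)": 5,
    "MUL.D F0, F2, F4": 16,
    "SUB.D F8, F6, F2": 8,
    "DIV.D F10, F0, F6": 57,
    "ADD.D F6, F8, F2": 11,
}

commit_cycle = {}
prev = 0
for instr, w in write_cycle.items():   # dicts preserve insertion (program) order
    # In order, one per cycle, and no earlier than the cycle after write-back.
    prev = max(w + 1, prev + 1)
    commit_cycle[instr] = prev

for instr, c in commit_cycle.items():
    print(f"{instr:20s} write {write_cycle[instr]:2d}  commit {c:2d}")
```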