CPSC614Lec 5.1
Instruction Level Parallelism and Dynamic Execution #4:
Based on lectures by
Prof. David A. Patterson
E. J. Kim
CPSC614Lec 5.4
1-bit Predictor (Initialized to NT)
d    b1 prediction  b1 action  new b1 prediction  b2 prediction  b2 action  new b2 prediction
2
0
2
0
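The rows of this table can be regenerated with a small simulation. This is a minimal Python sketch that assumes the classic textbook code behind this example, `if (d == 0) d = 1; if (d == 1) ...`, so that branch b1 is taken when d != 0 and branch b2 is taken when d != 1. That code and the function names are our assumptions; they are not shown on this slide. With d alternating 2, 0, 2, 0, every prediction made by the 1-bit predictor turns out to be wrong.

```python
# Sketch: 1-bit branch predictor on the "correlating branches" example.
# ASSUMPTION (not on the slide): the code is
#     if (d == 0) d = 1;   /* b1: taken (skips the assignment) when d != 0 */
#     if (d == 1) { ... }  /* b2: taken (skips the body) when d != 1 */

def fmt(taken: bool) -> str:
    return "T" if taken else "NT"

def simulate_1bit(d_values):
    """Return one table row per d value; both 1-bit predictors start at NT."""
    b1_bit = False  # False means "predict not taken"
    b2_bit = False
    rows = []
    for d in d_values:
        row_d = d
        b1_pred = b1_bit
        b1_taken = d != 0
        if not b1_taken:
            d = 1                # fall-through path executes d = 1
        b1_bit = b1_taken        # a 1-bit predictor just remembers the last outcome
        b2_pred = b2_bit
        b2_taken = d != 1
        b2_bit = b2_taken
        rows.append((row_d, fmt(b1_pred), fmt(b1_taken), fmt(b1_bit),
                     fmt(b2_pred), fmt(b2_taken), fmt(b2_bit)))
    return rows

def count_mispredictions(rows):
    # columns 1 vs 2: b1 prediction vs action; columns 4 vs 5: the same for b2
    return sum((r[1] != r[2]) + (r[4] != r[5]) for r in rows)

if __name__ == "__main__":
    rows = simulate_1bit([2, 0, 2, 0])
    for row in rows:
        print(*row)
    print("mispredictions:", count_mispredictions(rows))  # 8 of 8
```

Because each branch's outcome flips every time it executes, a predictor that only remembers its own last outcome is always one step behind.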
CPSC614Lec 5.5
(1,1) Predictor
• Every branch has two separate prediction bits.
  – First bit: the prediction if the last branch in the program is not taken.
  – Second bit: the prediction if the last branch in the program is taken.
• Write the pair of prediction bits together, e.g. NT/T means: predict not taken if the last branch was not taken, predict taken if the last branch was taken.
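To see why correlation helps, here is a sketch of a (1,1) predictor on the same kind of example. It assumes hypothetical code in which b1 is taken when d != 0 and b2 is taken when d != 1 (as in `if (d == 0) d = 1; if (d == 1) ...`), both branches initialized to NT/NT, and the history before the first b1 taken to be "not taken". Each branch uses the outcome of the most recently executed branch, whichever branch that was, to pick one of its two bits. Under these assumptions only the first b1 and the first b2 are mispredicted.

```python
# Sketch: (1,1) correlating predictor on the d = 2, 0, 2, 0 example.
# ASSUMPTIONS: b1 is taken when d != 0, b2 is taken when d != 1, both branches
# start at NT/NT, and the branch before the first b1 counts as "not taken".

def predict_and_update(bits, last_taken, taken):
    """bits = [prediction if last branch NT, prediction if last branch T]."""
    slot = 1 if last_taken else 0   # the last branch's outcome selects a bit
    prediction = bits[slot]
    bits[slot] = taken              # the selected bit behaves as a 1-bit predictor
    return prediction, taken        # this branch becomes the new "last branch"

def simulate_1_1(d_values):
    b1 = [False, False]             # NT/NT
    b2 = [False, False]             # NT/NT
    last_taken = False
    misses = []
    for d in d_values:
        b1_taken = d != 0
        if not b1_taken:
            d = 1
        pred, last_taken = predict_and_update(b1, last_taken, b1_taken)
        misses.append(pred != b1_taken)
        b2_taken = d != 1
        pred, last_taken = predict_and_update(b2, last_taken, b2_taken)
        misses.append(pred != b2_taken)
    return misses

if __name__ == "__main__":
    misses = simulate_1_1([2, 0, 2, 0])
    print(misses)                   # only the first b1 and the first b2 miss
    print("mispredictions:", sum(misses))
```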
CPSC614Lec 5.7
(m,n) Predictor
• Uses the behavior of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor.
• Yields higher prediction rates than the 2-bit scheme
• Requires only a trivial amount of additional hardware
• The global history of the most recent m branches is recorded in an m-bit shift register.
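A compact sketch of an (m,n) predictor follows. It assumes n-bit saturating counters that start at 0 and predict taken in the upper half of their range, and a row selected by the word-aligned branch address modulo a hypothetical `entries` count; the m-bit global history register then picks one of the 2^m counters in that row. The class and method names are illustrative only.

```python
# Sketch of an (m, n) correlating predictor.
# ASSUMPTIONS: n-bit saturating counters start at 0 (strongly not taken) and
# predict taken in the upper half of their range; the row is selected by the
# word-aligned branch address modulo `entries` (a hypothetical table size).

class CorrelatingPredictor:
    def __init__(self, m: int, n: int, entries: int):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0                       # m-bit global history shift register
        self.max_count = (1 << n) - 1
        # one row per address-selected entry, 2**m n-bit counters per row
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc: int) -> list:
        return self.table[(pc >> 2) % self.entries]

    def predict(self, pc: int) -> bool:
        return self._row(pc)[self.history] > self.max_count // 2

    def update(self, pc: int, taken: bool) -> None:
        row = self._row(pc)
        count = row[self.history]
        row[self.history] = min(count + 1, self.max_count) if taken else max(count - 1, 0)
        # shift this outcome into the global history, keeping only m bits
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

if __name__ == "__main__":
    pattern = [True, False] * 20               # a branch that alternates T, NT
    for m in (0, 1):
        p = CorrelatingPredictor(m=m, n=2, entries=1024)
        misses = 0
        for taken in pattern:
            misses += p.predict(0x40) != taken
            p.update(0x40, taken)
        print(f"({m},2) predictor: {misses} mispredictions out of {len(pattern)}")
```

With m = 0 the sketch degenerates to an ordinary n-bit predictor. On the alternating branch above, the (0,2) instance mispredicts every taken outcome, while the (1,2) instance misses only twice during warm-up because the one history bit separates the two cases.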
CPSC614Lec 5.9
(m,n) Predictor
• Total number of bits = 2^m × n × (number of prediction entries selected by the branch address)
• Examples
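For illustration, here is the size formula as code, with two sizes of our own choosing (not necessarily the examples originally on this slide): a plain 2-bit predictor, i.e. a (0,2) predictor, with 4K entries and a (2,2) predictor with 1K entries both use 8K bits. The inverse helper answers "how many entries fit in a given budget"; both function names are hypothetical.

```python
def predictor_bits(m: int, n: int, entries: int) -> int:
    """Total bits = 2^m x n x (number of prediction entries selected by the branch address)."""
    return (2 ** m) * n * entries

def entries_for_budget(m: int, n: int, total_bits: int) -> int:
    """Inverse: how many address-selected entries fit in a given bit budget."""
    return total_bits // ((2 ** m) * n)

print(predictor_bits(0, 2, 4096))          # plain 2-bit predictor: 8192 bits
print(predictor_bits(2, 2, 1024))          # (2,2) predictor: 8192 bits
print(entries_for_budget(2, 2, 8 * 1024))  # 1024 entries
```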
CPSC614Lec 5.12
Tournament Predictors
• Use multiple predictors (one based on global information, one based on local information) and combine them with a selector, so that the right predictor can be chosen for each branch.
• Alpha 21264
  – Used the most sophisticated branch predictor as of 2001.
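A simplified sketch of the tournament idea, assuming one local component (a per-branch 2-bit counter), one global component (2-bit counters indexed by a global history register), and a per-branch 2-bit selector that is trained only when the two components disagree. This is a generic scheme for illustration; it is not the Alpha 21264's actual organization, and the table sizes are arbitrary.

```python
# Simplified tournament predictor (illustrative; NOT the Alpha 21264 design).
# ASSUMPTIONS: 2-bit saturating counters everywhere, initialized to 1 (weakly
# not taken); the local component is a per-branch counter, the global component
# is indexed by an 8-bit global history register, and a per-branch 2-bit
# selector (>= 2 means "trust global") is trained only when the two disagree.

def bump(counter, up):
    """2-bit saturating increment/decrement."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)

class TournamentPredictor:
    def __init__(self, entries=1024, hist_bits=8):
        self.entries = entries
        self.hist_mask = (1 << hist_bits) - 1
        self.ghr = 0                               # global history register
        self.local = [1] * entries                 # indexed by branch address
        self.glob = [1] * (1 << hist_bits)         # indexed by global history
        self.chooser = [1] * entries               # selector, per branch

    def _index(self, pc):
        return (pc >> 2) % self.entries            # assumes word-aligned PCs

    def _components(self, pc):
        local_pred = self.local[self._index(pc)] >= 2
        global_pred = self.glob[self.ghr] >= 2
        return local_pred, global_pred

    def predict(self, pc):
        local_pred, global_pred = self._components(pc)
        return global_pred if self.chooser[self._index(pc)] >= 2 else local_pred

    def update(self, pc, taken):
        i = self._index(pc)
        local_pred, global_pred = self._components(pc)
        if local_pred != global_pred:              # reward whichever was right
            self.chooser[i] = bump(self.chooser[i], global_pred == taken)
        self.local[i] = bump(self.local[i], taken)
        self.glob[self.ghr] = bump(self.glob[self.ghr], taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hist_mask

if __name__ == "__main__":
    tp = TournamentPredictor()
    misses = []
    for taken in [True, False] * 20:               # strictly alternating branch
        misses.append(tp.predict(0x100) != taken)
        tp.update(0x100, taken)
    print("misses in last 20 outcomes:", sum(misses[20:]))
```

On a strictly alternating branch the local counter in this sketch is always one step behind and mispredicts every time, the global component learns the pattern from its history, and the selector moves over to trust the global side.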
CPSC614Lec 5.15
Need Address at Same Time as Prediction
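• Branch Target Buffer (BTB): the address of the branch is used as an index to get both the prediction and the branch target address (if taken).
[Figure: the PC of the instruction being fetched is compared (=?) against the Branch PC field of the BTB; each entry also holds the Predicted PC and extra prediction state bits.]
  – Yes: the instruction is a branch; use the predicted PC as the next PC.
  – No: the branch is not predicted; proceed normally (next PC = PC + 4).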
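A sketch of the lookup described above, assuming a direct-mapped BTB with 4-byte instructions, an index taken from the low-order bits of the word address, full-PC tags, and a simple policy that keeps only taken branches. The extra prediction state bits of a real BTB are omitted, and all names are hypothetical.

```python
# Sketch of a direct-mapped branch target buffer.
# ASSUMPTIONS: 4-byte instructions, index = low-order bits of the word address,
# full-PC tags, and only taken branches are kept (a branch that falls through
# is evicted). The "extra prediction state bits" of a real BTB are omitted.

class BranchTargetBuffer:
    def __init__(self, entries=512):
        self.entries = entries
        self.branch_pc = [None] * entries     # tag: PC of the branch
        self.predicted_pc = [0] * entries     # predicted target

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def next_fetch_pc(self, pc):
        i = self._index(pc)
        if self.branch_pc[i] == pc:           # the "=?" comparison hits
            return self.predicted_pc[i]       # use predicted PC as next PC
        return pc + 4                         # not predicted: proceed normally

    def update(self, pc, taken, target):
        """Called when the branch resolves."""
        i = self._index(pc)
        if taken:
            self.branch_pc[i], self.predicted_pc[i] = pc, target
        elif self.branch_pc[i] == pc:
            self.branch_pc[i] = None          # simple policy: evict on not-taken

if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(hex(btb.next_fetch_pc(0x400)))      # 0x404: not in the BTB yet
    btb.update(0x400, taken=True, target=0x800)
    print(hex(btb.next_fetch_pc(0x400)))      # 0x800: hit, fetch from the target
```

The point of the structure is timing: the target comes out of the same lookup that identifies the instruction as a branch, so fetch can redirect before the instruction is even decoded.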
CPSC614Lec 5.16
Multiple-Issue Processors
• Allow multiple instructions to issue in a clock cycle.
• Ideal CPI < 1
• 2 flavors
  – Superscalar
  – VLIW (Very Long Instruction Word)
CPSC614Lec 5.17
Superscalar Processors
• Issue varying numbers of instructions per clock
  – statically scheduled
    » using compiler techniques
    » in-order execution
  – dynamically scheduled
    » Tomasulo's algorithm
    » out-of-order execution
CPSC614Lec 5.18
• Superscalar MIPS: 2 instructions, 1 FP & 1 anything
  – Fetch 64 bits per clock cycle; integer instruction on the left, FP on the right
  – Can issue the 2nd instruction only if the 1st instruction issues
  – Need more ports on the FP register file to do an FP load and an FP op in a pair

Type              Pipe stages
Int. instruction  IF  ID  EX  MEM WB
FP instruction    IF  ID  EX  MEM WB
Int. instruction      IF  ID  EX  MEM WB
FP instruction        IF  ID  EX  MEM WB
Int. instruction          IF  ID  EX  MEM WB
FP instruction            IF  ID  EX  MEM WB

• Figure 3.24, p. 219
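A rough sketch of the pairing rule, assuming no data or structural hazards and a simple model in which a pair issues together only when an integer-slot instruction (left) is immediately followed by an FP instruction (right); otherwise the first instruction issues alone. The instruction names and the "int"/"fp" labels are hypothetical. Perfect pairing would give the ideal CPI of 0.5; the mixed sequence in the demo issues 5 instructions in 3 cycles, a CPI of 0.6.

```python
# Sketch of the dual-issue pairing rule for the superscalar MIPS on this slide.
# ASSUMPTIONS: no data or structural hazards; an instruction pair issues together
# only when an integer-slot instruction (left) is immediately followed by an FP
# instruction (right); otherwise the first instruction issues alone.

def dual_issue_schedule(instrs):
    """instrs: list of (name, kind) with kind "int" or "fp"; returns issue groups per cycle."""
    cycles = []
    i = 0
    while i < len(instrs):
        name, kind = instrs[i]
        if kind == "int" and i + 1 < len(instrs) and instrs[i + 1][1] == "fp":
            cycles.append([name, instrs[i + 1][0]])   # both slots used this cycle
            i += 2
        else:
            cycles.append([name])                     # only one instruction issues
            i += 1
    return cycles

if __name__ == "__main__":
    program = [("L.D", "int"),      # an FP load uses the integer slot
               ("ADD.D", "fp"), ("DADDUI", "int"), ("BNE", "int"), ("MUL.D", "fp")]
    schedule = dual_issue_schedule(program)
    for cycle, group in enumerate(schedule):
        print(cycle, group)
    print("CPI:", len(schedule) / len(program))      # 3 cycles / 5 instructions = 0.6
```

Putting the FP load in the integer slot is what lets an FP load and an FP operation issue in the same cycle, which is why the FP register file needs the extra ports mentioned above.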
CPSC614Lec 5.19
VLIW Processors
• Issue a fixed number of instructions, formatted either as one large instruction or as a fixed instruction packet, with the parallelism among instructions explicitly indicated by the instruction (EPIC: Explicitly Parallel Instruction Computers).
• Statically scheduled by the compiler.
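A toy sketch of what "statically scheduled by the compiler" means for a VLIW: independent operations are packed into fixed-width bundles, and empty slots are filled with explicit no-ops. It assumes a greedy packer, one bundle of latency per operation, any operation in any slot, and unique result names so that only true dependences matter; the function name and operations are hypothetical.

```python
# Sketch of static (compiler-side) VLIW bundle packing.
# ASSUMPTIONS: every operation takes one bundle of latency, any operation may
# occupy any slot, names are unique (so only true dependences matter), and
# empty slots are filled with explicit "nop"s, as in a fixed-width instruction word.

def pack_vliw(ops, width):
    """ops: list of (name, deps) in program order; returns a list of bundles."""
    bundle_of = {}                # op name -> index of the bundle it was placed in
    bundles = []
    for name, deps in ops:
        # an op must go strictly after every bundle holding one of its producers
        b = max((bundle_of[d] + 1 for d in deps), default=0)
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1                # bundle full: try the next one
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(name)
        bundle_of[name] = b
    return [bundle + ["nop"] * (width - len(bundle)) for bundle in bundles]

if __name__ == "__main__":
    program = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", []), ("e", ["c"])]
    for bundle in pack_vliw(program, width=2):
        print(bundle)
```

Because the compiler guarantees that the operations inside a bundle are independent, the hardware can issue each bundle as a unit without checking for hazards among its slots; the cost is the wasted "nop" slots when not enough parallelism is found.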
CPSC614Lec 5.20
Name                       Issue structure  Hazard detection  Scheduling                Distinguishing characteristic            Examples
Superscalar (static)       dynamic          hardware          static                    in-order execution                       Sun UltraSPARC II/III
Superscalar (dynamic)      dynamic          hardware          dynamic                   some out-of-order execution              IBM Power2
Superscalar (speculative)  dynamic          hardware          dynamic with speculation  out-of-order execution with speculation  Pentium III/4, MIPS R10K, Alpha 21264
VLIW/LIW                   static           software          static                    no hazards between issue packets         Trimedia, i860
EPIC                       mostly static    mostly software   mostly static             explicit dependences marked by compiler  Itanium
CPSC614Lec 5.21
Hardware-Based Speculation
• As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing burden.
=> Speculating on the outcome of branches and executing the program as if the guesses were correct.
• Hardware Speculation
CPSC614Lec 5.22
3 Key Ideas of Hardware Speculation
• Dynamic Branch Prediction
  – Choose which instructions to execute.
• Speculation
  – Allow the execution of instructions before the control dependences are resolved (with the ability to undo the effect of an incorrectly speculated sequence).
• Dynamic Scheduling
  – Deal with the scheduling of different combinations of basic blocks.
CPSC614Lec 5.23
Examples
• PowerPC 603/604/G3/G4
• MIPS R10000/12000
• Intel Pentium II/III/4
• Alpha 21264
• AMD K5/K6/Athlon
CPSC614Lec 5.24
What about Precise Interrupts?
• Tomasulo had:
In-order issue, out-of-order execution, and out-of-order completion
• Need to "fix" the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.
CPSC614Lec 5.25
Relationship between precise interrupts and speculation:
• Speculation is a form of guessing.
• Important for branch prediction:
  – Need to "take our best shot" at predicting branch direction.
• If we speculate and are wrong, we need to back up and restart execution at the point where we predicted incorrectly:
  – This is exactly the same as precise exceptions!
• Technique for both precise interrupts/exceptions and speculation: in-order completion or commit
CPSC614Lec 5.26
HW support for precise interrupts
• Need HW buffer for results of uncommitted instructions: reorder buffer
  – 3 fields: instr, destination, value
  – Use reorder buffer number instead of reservation station number when execution completes
  – Supplies operands between execution complete & commit
  – (Reorder buffer can be operand source => more registers like RS)
  – Instructions commit
  – Once an instruction commits, its result is put into the register
  – As a result, easy to undo speculated instructions on mispredicted branches or exceptions
[Figure: Tomasulo organization with a reorder buffer: FP Op Queue, reservation stations feeding two FP adders, FP registers, and the reorder buffer.]
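A minimal sketch of the reorder buffer's role, assuming the slide's three fields (instr, destination, value) plus a ready flag and a mispredicted flag, a plain dict as the register file, and results delivered by setting fields on the returned entry. Class and method names are hypothetical. Results may complete out of order, but registers change only at commit, in order, so squashing everything behind a mispredicted branch undoes the speculation.

```python
# Sketch of a reorder buffer with the slide's three fields (instr, destination,
# value) plus a ready flag. ASSUMPTIONS: results are delivered by setting fields
# on the returned entry; a branch entry can be marked mispredicted; the register
# file is a plain dict.
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class ROBEntry:
    instr: str
    dest: Optional[str]            # None for branches in this sketch
    value: Optional[int] = None
    ready: bool = False            # execution complete, result present
    mispredicted: bool = False

class ReorderBuffer:
    def __init__(self, size: int):
        self.size = size
        self.entries: deque = deque()
        self.regs: dict = {}       # architectural (committed) register state

    def issue(self, instr: str, dest: Optional[str]) -> Optional[ROBEntry]:
        if len(self.entries) >= self.size:
            return None            # no free ROB slot: stall issue
        entry = ROBEntry(instr, dest)
        self.entries.append(entry) # allocated at the tail, in program order
        return entry

    def commit(self) -> list:
        """Retire completed entries from the head, strictly in program order."""
        committed = []
        while self.entries and self.entries[0].ready:
            head = self.entries.popleft()
            committed.append(head.instr)
            if head.mispredicted:
                self.entries.clear()   # undo: drop all younger speculative work
                break
            if head.dest is not None:
                self.regs[head.dest] = head.value   # only now is the register written
        return committed

if __name__ == "__main__":
    rob = ReorderBuffer(size=4)
    load = rob.issue("L.D F0", "F0")
    branch = rob.issue("BEQZ", None)
    add = rob.issue("ADD.D F2", "F2")      # speculative, past the branch
    add.value, add.ready = 7, True         # finishes first (out-of-order completion)
    print(rob.commit())                    # []: the head (the load) is not done yet
    load.value, load.ready = 3, True
    branch.ready, branch.mispredicted = True, True
    print(rob.commit())                    # ['L.D F0', 'BEQZ']; ADD.D is squashed
    print(rob.regs)                        # {'F0': 3}: F2 was never written
```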
CPSC614Lec 5.27
Four Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue.
   If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called "dispatch").
2. Execution: operate on operands (EX).
   When both operands are ready, execute; if not ready, watch the CDB for the result; when both operands are in the reservation station, execute. This step checks for RAW hazards (sometimes called "issue").
3. Write result: finish execution (WB).
   Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.
4. Commit: update the register with the reorder buffer result.
   When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (this step is sometimes called "graduation").
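The four steps can be traced in a toy model. This Python sketch simplifies heavily under our own assumptions: only `add` and `mul` on registers, unit latency, one instruction executes and writes its result per call, ROB numbers serve as the result tags, and a single `flush()` stands in for misprediction recovery. The class and method names are hypothetical.

```python
# Toy model of the four steps of speculative Tomasulo (issue, execute,
# write result, commit). ASSUMPTIONS: only "add"/"mul", unit latency, one
# execute+write per call, ROB numbers used as tags, flush() = misprediction.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class RSEntry:
    op: str                      # "add" or "mul"
    tag: int                     # ROB number of this instruction's result
    vj: Optional[int] = None     # operand values, once known
    vk: Optional[int] = None
    qj: Optional[int] = None     # ROB numbers still being waited on
    qk: Optional[int] = None

class SpeculativeTomasulo:
    def __init__(self, num_rs: int = 2, rob_size: int = 4):
        self.num_rs, self.rob_size = num_rs, rob_size
        self.regs: Dict[str, int] = {}
        self.reg_status: Dict[str, int] = {}   # register -> ROB number of latest writer
        self.rob: List[dict] = []              # index 0 is the ROB head
        self.rs: List[RSEntry] = []
        self.next_tag = 0

    def _entry(self, tag: int) -> dict:
        return next(e for e in self.rob if e["tag"] == tag)

    def _read(self, reg: str) -> Tuple[Optional[int], Optional[int]]:
        """Return (value, None) if the operand is available, else (None, ROB number)."""
        if reg in self.reg_status:
            e = self._entry(self.reg_status[reg])
            if e["ready"]:
                return e["value"], None        # ROB supplies completed, uncommitted results
            return None, e["tag"]
        return self.regs.get(reg, 0), None

    # Step 1: issue only if a reservation station AND a ROB slot are free
    def issue(self, op: str, dest: str, src1: str, src2: str) -> bool:
        if len(self.rs) >= self.num_rs or len(self.rob) >= self.rob_size:
            return False
        tag = self.next_tag
        self.next_tag += 1
        vj, qj = self._read(src1)
        vk, qk = self._read(src2)
        self.rs.append(RSEntry(op, tag, vj, vk, qj, qk))
        self.rob.append({"tag": tag, "dest": dest, "value": None, "ready": False})
        self.reg_status[dest] = tag
        return True

    # Steps 2 and 3: execute one instruction whose operands are ready, then
    # broadcast its result on the CDB to waiting stations and to the ROB
    def execute_and_write(self) -> Optional[int]:
        for station in self.rs:
            if station.qj is None and station.qk is None:
                if station.op == "add":
                    value = station.vj + station.vk
                else:
                    value = station.vj * station.vk
                self.rs.remove(station)        # reservation station is free again
                for waiting in self.rs:
                    if waiting.qj == station.tag:
                        waiting.vj, waiting.qj = value, None
                    if waiting.qk == station.tag:
                        waiting.vk, waiting.qk = value, None
                entry = self._entry(station.tag)
                entry["value"], entry["ready"] = value, True
                return station.tag
        return None

    # Step 4: commit only the ROB head, so registers update in program order
    def commit(self) -> Optional[str]:
        if not self.rob or not self.rob[0]["ready"]:
            return None
        head = self.rob.pop(0)
        self.regs[head["dest"]] = head["value"]
        if self.reg_status.get(head["dest"]) == head["tag"]:
            del self.reg_status[head["dest"]]
        return head["dest"]

    def flush(self) -> None:
        """On a mispredicted branch, discard all speculative state."""
        self.rob.clear()
        self.rs.clear()
        self.reg_status.clear()

if __name__ == "__main__":
    t = SpeculativeTomasulo(num_rs=2, rob_size=4)
    t.regs.update({"F1": 2, "F2": 3})
    t.issue("add", "F0", "F1", "F2")         # ROB #0
    t.issue("mul", "F4", "F0", "F2")         # ROB #1, waits for #0 on the CDB
    print(t.issue("add", "F6", "F1", "F1"))  # False: no free reservation station
    print(t.execute_and_write(), t.commit(), t.regs["F0"])  # 0 F0 5
    print(t.execute_and_write(), t.commit(), t.regs["F4"])  # 1 F4 15
```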
CPSC614Lec 5.28
What are the hardware complexities with reorder buffer (ROB)?
[Figure: the same reorder buffer organization (FP Op Queue, reservation stations, FP adders, FP registers) with a comparison network on the reorder buffer.]
• How do you find the latest version of a register?
  – (As specified by the Smith paper) need an associative comparison network
  – Could use a future file, or just use the register result status buffer to track which specific reorder buffer entry has received the value
• Need as many ports on the ROB as on the register file
[Figure: reorder table with columns Dest Reg, Result, Exceptions?, Valid, Program Counter.]
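The two options for finding the latest version of a register can be contrasted in a short sketch. It assumes a ROB modeled as a list of dicts ordered from head (oldest) to tail (youngest), each with a `tag` (ROB number) and a `dest` register; the function names are hypothetical. The associative version compares every entry, which hardware does in parallel with a comparison network; the result status version keeps one table entry per register, written at issue time.

```python
# Two ways to answer "which ROB entry will produce the latest value of reg?"
# ASSUMPTION: the ROB is a list of dicts ordered head (oldest) to tail (youngest),
# each with a "tag" (ROB number) and a "dest" register; names are hypothetical.
from typing import Dict, List, Optional

def latest_writer_associative(rob: List[dict], reg: str) -> Optional[int]:
    """Software model of the associative comparison network: every entry's
    destination is compared with reg and the youngest match wins (hardware
    performs all the comparisons in parallel)."""
    for entry in reversed(rob):
        if entry["dest"] == reg:
            return entry["tag"]
    return None          # no pending writer: read the register file

def register_result_status(rob: List[dict]) -> Dict[str, int]:
    """Alternative: a per-register table written at issue time, in program
    order, so each register remembers the ROB number of its latest writer."""
    status: Dict[str, int] = {}
    for entry in rob:    # replay the issue order
        status[entry["dest"]] = entry["tag"]
    return status

if __name__ == "__main__":
    rob = [{"tag": 3, "dest": "F0"}, {"tag": 4, "dest": "F2"}, {"tag": 5, "dest": "F0"}]
    print(latest_writer_associative(rob, "F0"))     # 5
    print(register_result_status(rob).get("F0"))    # 5, found with a single lookup
    print(latest_writer_associative(rob, "F6"))     # None
```

In real hardware the status table is maintained incrementally (updated at issue, cleared when the matching entry commits, reset on a flush); this sketch rebuilds it from the ROB only to show that both methods give the same answer.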