CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim


CPSC614Lec 5.1

Instruction Level Parallelism and Dynamic Execution #4:

Based on lectures by

Prof. David A. Patterson

E. J. Kim

CPSC614Lec 5.2

Correlating Predictors

• Two-level predictors

if (d == 0)      /* branch b1 */
    d = 1;
if (d == 1)      /* branch b2 */

CPSC614Lec 5.3

[Table: possible execution sequences for initial d = 0, 1, 2. Columns: initial value of d | b1 | value of d before b2 | b2]

CPSC614Lec 5.4

1-bit Predictor (Initialized to NT)

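[Table: behavior of the 1-bit predictor as d alternates 2, 0, 2, 0. Columns: d | b1 prediction | b1 action | new b1 prediction | b2 prediction | b2 action | new b2 prediction]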
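The trace in the table above can be reproduced with a short simulation. This is a minimal sketch, not the lecture's own code: it assumes each branch skips the body of its `if`, so b1 is taken when d != 0 and b2 is taken when d != 1. The function and variable names are illustrative.

```python
def run_one_bit(d_values):
    """Simulate separate 1-bit predictors for b1 and b2, both initialized to NT."""
    pred = {"b1": False, "b2": False}      # False = predict not taken (NT)
    mispredictions = {"b1": 0, "b2": 0}
    for d in d_values:
        b1_taken = d != 0                  # assumed convention: b1 skips "d = 1"
        if not b1_taken:
            d = 1
        b2_taken = d != 1                  # assumed convention: b2 skips the second body
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            if pred[name] != taken:
                mispredictions[name] += 1
            pred[name] = taken             # a 1-bit predictor remembers only the last outcome
    return mispredictions


print(run_one_bit([2, 0, 2, 0]))           # {'b1': 4, 'b2': 4}
```

Under these assumptions every branch is mispredicted when d alternates between 2 and 0, which is the motivation for correlating predictors.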

CPSC614Lec 5.5

(1,1) Predictor

• Every branch has two separate prediction bits.
  – First bit: the prediction if the last branch in the program is not taken.
  – Second bit: the prediction if the last branch in the program is taken.

• Write the pair of prediction bits together.

CPSC614Lec 5.6

Combinations & Meaning

Prediction bits | Prediction if last branch not taken | Prediction if last branch taken
NT/NT | NT | NT
NT/T | NT | T
T/NT | T | NT
T/T | T | T
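To see how the pair of bits is used, here is a hedged sketch of a (1,1) predictor run on the same d = 2, 0, 2, 0 sequence. It assumes b1 is taken when d != 0, b2 is taken when d != 1, both bit pairs start as NT/NT, and the history starts as "last branch not taken". All names are illustrative.

```python
def run_one_one(d_values):
    """Simulate a (1,1) predictor: the last branch outcome selects one of two bits."""
    # bits[branch][h] is the prediction used when the last branch outcome was h
    # (h = 0: not taken, h = 1: taken); False means "predict not taken".
    bits = {"b1": [False, False], "b2": [False, False]}
    last_taken = 0                         # assumed initial global history
    misses = 0
    for d in d_values:
        b1_taken = d != 0
        if not b1_taken:
            d = 1
        b2_taken = d != 1
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            if bits[name][last_taken] != taken:
                misses += 1
            bits[name][last_taken] = taken  # update only the bit that was used
            last_taken = int(taken)         # this outcome is the history for the next branch
    return misses, bits


misses, bits = run_one_one([2, 0, 2, 0])
print(misses)                               # 2
```

With these assumptions only the two branches of the first iteration are mispredicted; b1 ends as T/NT and b2 as NT/T.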

CPSC614Lec 5.7

(m,n) Predictor

• Uses the behavior of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor.

• Yields higher prediction rates than the 2-bit scheme

• Requires a trivial amount of additional hardware

• The global history of the most recent m branches is recorded in an m-bit shift register.
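The structure described above can be sketched as follows. This is an illustration under stated assumptions rather than a hardware description: entries are selected with `pc % entries`, counters start at 0, and a counter in its upper half predicts taken. The class and helper names are hypothetical.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m history bits pick one of 2**m n-bit counters per entry."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0                              # m-bit global shift register
        self.max_count = (1 << n) - 1
        # counters[entry][history] is an n-bit saturating counter
        self.counters = [[0] * (1 << m) for _ in range(entries)]

    def predict(self, pc):
        counter = self.counters[pc % self.entries][self.history]
        return counter >= (1 << (self.n - 1))         # upper half means "taken"

    def update(self, pc, taken):
        row = self.counters[pc % self.entries]
        if taken:
            row[self.history] = min(row[self.history] + 1, self.max_count)
        else:
            row[self.history] = max(row[self.history] - 1, 0)
        # shift the newest outcome into the global history register
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_misses(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number of mispredictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


pattern = [True, False] * 50                          # a branch that alternates T, NT
for m in (0, 2):
    p = CorrelatingPredictor(m, 2, 16)
    count_misses(p, 0x40, pattern[:50])               # warm-up
    print(m, count_misses(p, 0x40, pattern[50:]))
```

In this toy run the (0,2) configuration, which is a plain 2-bit predictor, keeps mispredicting the alternating branch, while the (2,2) configuration learns the pattern after warm-up.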

CPSC614Lec 5.8

CPSC614Lec 5.9

(m,n) Predictor

• Total number of bits
  = 2^m × n × (number of prediction entries selected by the branch address)

• Examples
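As a worked instance of the formula, the following sketch computes the storage for a few configurations. The entry counts are illustrative choices, not values taken from the slide.

```python
def predictor_bits(m, n, entries):
    """Total storage in bits for an (m, n) predictor with the given number of entries."""
    return (2 ** m) * n * entries


print(predictor_bits(0, 2, 4096))   # plain 2-bit predictor, 4K entries: 8192
print(predictor_bits(2, 2, 1024))   # (2,2) predictor, 1K entries: 8192
print(predictor_bits(2, 2, 4096))   # (2,2) predictor, 4K entries: 32768
```

The first two lines show that a (2,2) predictor with 1K entries uses the same number of bits as a 2-bit predictor with 4K entries.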

CPSC614Lec 5.10

CPSC614Lec 5.11

Tournament Predictors

• Most popular multilevel branch predictors

CPSC614Lec 5.12

Tournament Predictors

• A tournament predictor uses multiple predictors, one based on global information and one based on local information, and combines them with a selector so that the right predictor is chosen for each branch.

• Alpha 21264
  – Uses the most sophisticated branch predictor as of 2001.
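The selector idea can be sketched with a 2-bit saturating counter that drifts toward whichever component predictor has been right more recently. This is a simplified illustration; the counter width, its initial value, and the update rule are assumptions, not details taken from the slide.

```python
class TournamentSelector:
    """Chooses between a global and a local prediction using a 2-bit counter."""

    def __init__(self):
        self.counter = 2          # 0-1: trust local, 2-3: trust global (assumed start)

    def choose(self, global_pred, local_pred):
        return global_pred if self.counter >= 2 else local_pred

    def update(self, global_pred, local_pred, taken):
        global_ok = global_pred == taken
        local_ok = local_pred == taken
        if global_ok and not local_ok:
            self.counter = min(self.counter + 1, 3)
        elif local_ok and not global_ok:
            self.counter = max(self.counter - 1, 0)
        # when both are right or both are wrong, the selector stays unchanged


sel = TournamentSelector()
print(sel.choose(True, False))    # True: starts by trusting the global predictor
sel.update(True, False, False)    # the local predictor was right, the global one wrong
print(sel.choose(True, False))    # False: now trusts the local predictor
```

In a real design there would be one such selector per table entry, indexed by the branch address.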

CPSC614Lec 5.13

CPSC614Lec 5.14

CPSC614Lec 5.15

Need Address at Same Time as Prediction

• Branch Target Buffer (BTB): the address of the branch is used as an index to get the prediction AND the branch target address (if taken)

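[Figure: the PC of the instruction in FETCH is compared (=?) against the Branch PC column of the buffer; each entry also holds a Predicted PC and extra prediction state bits.]

• Yes: instruction is a branch; use the predicted PC as the next PC

• No: branch not predicted; proceed normally (Next PC = PC+4)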
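The lookup in the figure can be sketched as below. A Python dict stands in for the tagged hardware table, only predicted-taken branches are stored, and the extra prediction state bits are omitted; these are simplifying assumptions, and the names are illustrative.

```python
class BranchTargetBuffer:
    """Maps a branch PC to its predicted target PC."""

    def __init__(self):
        self.table = {}                    # branch PC -> predicted PC

    def next_pc(self, pc):
        if pc in self.table:               # hit: treat as a predicted-taken branch
            return self.table[pc]
        return pc + 4                      # miss: proceed normally

    def update(self, pc, taken, target):
        if taken:
            self.table[pc] = target        # remember the target for next time
        else:
            self.table.pop(pc, None)       # stop predicting this branch as taken


btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x100)))             # 0x104
btb.update(0x100, True, 0x200)
print(hex(btb.next_pc(0x100)))             # 0x200
```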

CPSC614Lec 5.16

Multiple-Issue Processors

• Allow multiple instructions to issue in a clock cycle.

• Ideal CPI < 1

• 2 flavors
  – Superscalar
  – VLIW (Very Long Instruction Word)

CPSC614Lec 5.17

Superscalar Processors

• Issue varying numbers of instructions per clock
  – statically scheduled
    » using compiler techniques
    » in-order execution
  – dynamically scheduled
    » Tomasulo's algorithm
    » out-of-order execution

CPSC614Lec 5.18

• Superscalar MIPS: 2 instructions, 1 FP & 1 anything
  – Fetch 64 bits/clock cycle; Int on left, FP on right
  – Can only issue 2nd instruction if 1st instruction issues
  – More ports for FP registers to do FP load & FP op in a pair

Type               Pipe stages
Int. instruction   IF   ID   EX   MEM  WB
FP instruction     IF   ID   EX   MEM  WB
Int. instruction        IF   ID   EX   MEM  WB
FP instruction          IF   ID   EX   MEM  WB
Int. instruction             IF   ID   EX   MEM  WB
FP instruction               IF   ID   EX   MEM  WB

• Figure 3.24 P.219
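The pairing rule on this slide can be illustrated with a small sketch that groups an instruction stream into issue packets, one packet per clock. It assumes a packet is an integer-class instruction followed by an FP instruction, and it ignores data hazards and fetch alignment. The function name and the "int"/"fp" labels are hypothetical.

```python
def issue_packets(instrs):
    """Group a list of 'int' / 'fp' instructions into per-clock issue packets."""
    packets = []
    i = 0
    while i < len(instrs):
        # The 2nd (FP) slot is used only together with a 1st (integer) instruction.
        if instrs[i] == "int" and i + 1 < len(instrs) and instrs[i + 1] == "fp":
            packets.append((instrs[i], instrs[i + 1]))
            i += 2
        else:
            packets.append((instrs[i],))
            i += 1
    return packets


stream = ["int", "fp", "int", "fp", "int", "fp"]
packets = issue_packets(stream)
print(len(packets) / len(stream))          # 0.5 clocks per instruction at issue
```

A stream that does not alternate, such as int, int, fp, fp, needs three packets for four instructions under the same assumptions.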

CPSC614Lec 5.19

VLIW Processors

• Issue a fixed number of instructions, formatted either as one large instruction or as a fixed instruction packet, with the parallelism among instructions explicitly indicated by the instruction (EPIC: Explicitly Parallel Instruction Computers).

• Statically scheduled by the compiler.

CPSC614Lec 5.20

Name | Issue structure | Hazard detection | Scheduling | Distinguishing characteristic | Examples
Superscalar (static) | dynamic | hardware | static | in-order execution | Sun UltraSPARC II/III
Superscalar (dynamic) | dynamic | hardware | dynamic | some out-of-order execution | IBM Power2
Superscalar (speculative) | dynamic | hardware | dynamic with speculation | out-of-order execution with speculation | Pentium III/4, MIPS R10K, Alpha 21264
VLIW/LIW | static | software | static | no hazards between issue packets | Trimedia, i860
EPIC | mostly static | mostly software | mostly static | explicit dependences marked by compiler | Itanium

CPSC614Lec 5.21

Hardware-Based Speculation

• As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing burden.

=> Speculating on the outcome of branches and executing the program as if the guesses were correct.

• Hardware Speculation

CPSC614Lec 5.22

3 Key Ideas of Hardware Speculation

• Dynamic Branch Prediction
  – Choose which instruction to execute.

• Speculation
  – Allow the execution of instructions before the control dependences are resolved (with the ability to undo the effect of an incorrectly speculated sequence).

• Dynamic Scheduling
  – Deal with the scheduling of different combinations of basic blocks.

CPSC614Lec 5.23

Examples

• PowerPC 603/604/G3/G4
• MIPS R10000/12000
• Intel Pentium II/III/4
• Alpha 21264
• AMD K5/K6/Athlon

CPSC614Lec 5.24

What about Precise Interrupts?

• Tomasulo had: in-order issue, out-of-order execution, and out-of-order completion.

• Need to "fix" the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.

CPSC614Lec 5.25

Relationship between precise interrupts and speculation:

• Speculation is a form of guessing.

• Important for branch prediction:
  – Need to "take our best shot" at predicting branch direction.

• If we speculate and are wrong, we need to back up and restart execution from the point at which we predicted incorrectly:
  – This is exactly the same as precise exceptions!

• Technique for both precise interrupts/exceptions and speculation: in-order completion, or commit

CPSC614Lec 5.26

HW support for precise interrupts

• Need HW buffer for results of uncommitted instructions: reorder buffer

– 3 fields: instr, destination, value

– Use reorder buffer number instead of reservation station when execution completes

– Supplies operands between execution complete & commit

– (Reorder buffer can be operand source => more registers like RS)

– Instructions commit

– Once instruction commits, result is put into register

– As a result, easy to undo speculated instructions on mispredicted branches or exceptions

[Figure: FP Op Queue, Reorder Buffer, FP Regs, and two FP Adders, each with its Res Stations.]

CPSC614Lec 5.27

Four Steps of Speculative Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue.
   If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called "dispatch").

2. Execution: operate on operands (EX).
   When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (this stage is sometimes called "issue").

3. Write result: finish execution (WB).
   Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.

4. Commit: update register with reorder result.
   When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (this stage is sometimes called "graduation").
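The issue, write-result, and commit steps can be sketched around a small reorder buffer model. This is a simplified illustration, not the full algorithm: reservation stations, the Common Data Bus, and stores to memory are left out, and the class, field, and register names are assumptions made for the example.

```python
from collections import deque


class ReorderBuffer:
    """Holds results of uncommitted instructions and retires them in program order."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()             # left end = head = oldest instruction
        self.next_tag = 0

    def issue(self, instr, dest):
        if len(self.entries) >= self.size:
            return None                    # no free slot: issue must stall
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(
            {"tag": tag, "instr": instr, "dest": dest, "value": None, "ready": False}
        )
        return tag

    def write_result(self, tag, value):
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"], entry["ready"] = value, True

    def commit(self, regs):
        """Retire instructions from the head while their results are present."""
        committed = []
        while self.entries and self.entries[0]["ready"]:
            entry = self.entries.popleft()
            regs[entry["dest"]] = entry["value"]   # register is updated only at commit
            committed.append(entry["instr"])
        return committed

    def flush(self):
        self.entries.clear()               # mispredicted branch: drop uncommitted work


regs = {}
rob = ReorderBuffer(4)
t0 = rob.issue("MUL", "F0")
t1 = rob.issue("ADD", "F2")
rob.write_result(t1, 5.0)                  # ADD finishes first (out of order)
print(rob.commit(regs))                    # []: MUL at the head is not done yet
rob.write_result(t0, 6.0)
print(rob.commit(regs))                    # ['MUL', 'ADD']
```

Because registers change only at commit, flushing the buffer is enough to undo speculated instructions in this model.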

CPSC614Lec 5.28

What are the hardware complexities with reorder buffer (ROB)?

[Figure: FP Op Queue, Reorder Buffer with a comparison network, FP Regs, and two FP Adders, each with its Res Stations.]

• How do you find the latest version of a register?
  – (As specified by Smith paper) need associative comparison network
  – Could use future file or just use the register result status buffer to track which specific reorder buffer has received the value

• Need as many ports on ROB as register file
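The second option above, using a register result status table instead of an associative search, can be sketched as a lookup that maps each register to the reorder buffer entry of its latest producer. The data layout and names are assumptions made for illustration.

```python
def read_operand(reg, reg_status, rob, regs):
    """Return ('value', v) if the operand is available, else ('wait', rob_tag)."""
    tag = reg_status.get(reg)              # ROB entry of the latest writer, if any
    if tag is None:
        return ("value", regs[reg])        # no pending writer: the register file is current
    entry = rob[tag]
    if entry["ready"]:
        return ("value", entry["value"])   # the reorder buffer supplies the operand
    return ("wait", tag)                   # wait for this tag to be written


regs = {"F0": 1.0, "F2": 2.0}
rob = {7: {"dest": "F0", "value": None, "ready": False}}
reg_status = {"F0": 7}                     # F0 will be produced by ROB entry 7

print(read_operand("F2", reg_status, rob, regs))   # ('value', 2.0)
print(read_operand("F0", reg_status, rob, regs))   # ('wait', 7)
rob[7].update(value=9.0, ready=True)
print(read_operand("F0", reg_status, rob, regs))   # ('value', 9.0)
```

The table lookup replaces a comparison against every reorder buffer entry, at the cost of keeping the status table up to date on issue, commit, and flush.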

[Figure: Reorder Table with fields Dest Reg, Result, Exceptions?, Valid, and Program Counter.]