
Transcript
Page 1: Advanced Microarchitecture

Advanced Microarchitecture
Lecture 4: Branch Predictors

Page 2: Advanced Microarchitecture

Direction vs. Target
• Direction: 0 or 1
• Target: 32- or 64-bit value
• Turns out targets are generally easier to predict
  – Don't need to predict the NT target
  – The T target doesn't usually change
    • or it has a "nice" pattern, like subroutine returns

Lecture 4: Correlated Branch Predictors

Page 3: Advanced Microarchitecture

Branches Have Locality
• If a branch was previously taken, there's a good chance it'll be taken again in the future

    for (i = 0; i < 100000; i++) {
      /* do stuff */
    }

This branch will be taken 99,999 times in a row.

Page 4: Advanced Microarchitecture

Simple Predictor
• Always predict NT
  – no fetch bubbles (always just fetch the next line)
  – does horribly on the previous for-loop example
• Always predict T
  – does pretty well on the previous example
  – but what if you have other control besides loops?

    p = calloc(num, sizeof(*p));
    if (p == NULL)
      error_handler();

This branch is practically never taken.

Page 5: Advanced Microarchitecture

Last Outcome Predictor
• Do what you did last time

    0xDC08: for (i = 0; i < 100000; i++) {
    0xDC44:   if ((i % 100) == 0)
                tick();
    0xDC50:   if ((i & 1) == 1)
                odd();
            }

Page 6: Advanced Microarchitecture

Misprediction Rates?

How often is the branch outcome != the previous outcome?

    DC08: TTTTTTTTTTT…TTTTTTTTTTNTTTTTTTTT… (100,000 iterations) → 2 / 100,000 → 99.998% prediction rate
    DC44: TTTTT…TNTTTTT…TNTTTTT… → 2 / 100 → 98.0% prediction rate
    DC50: TNTNTNTNTNTNTNTNTNT… → 2 / 2 → 0.0% prediction rate
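The 2/100 figure for DC44 can be checked with a short simulation. This is a minimal sketch (function name is mine, not from the lecture) of a last-outcome predictor run on DC44's condition:

```c
/* Sketch: simulate "do what you did last time" on branch DC44,
 * taken when (i % 100) == 0, over the 100,000-iteration loop. */
long last_outcome_mispredicts(void) {
    int prev = 0;             /* predictor state: last outcome (starts NT) */
    long mispredicts = 0;
    for (long i = 0; i < 100000; i++) {
        int outcome = ((i % 100) == 0);
        if (outcome != prev)  /* prediction is simply prev */
            mispredicts++;
        prev = outcome;       /* remember for next time */
    }
    return mispredicts;
}
```

Each taken instance costs two mispredictions, the T itself plus the N that follows it, giving 2,000 over 100,000 iterations, i.e. the 2/100 rate above.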

Page 7: Advanced Microarchitecture

Saturating Two-Bit Counter

FSM for Last-Outcome Prediction: two states, 0 (predict NT) and 1 (predict T); transition to 1 on a T outcome, to 0 on an NT outcome.

FSM for 2bC (2-bit Counter): four states 0 through 3; states 0 and 1 predict NT, states 2 and 3 predict T. Each T outcome moves one state toward 3 (saturating), each NT outcome one state toward 0 (saturating).

Page 8: Advanced Microarchitecture

Example

Initial training/warm-up on the loop branch (a long all-T stream with one N per loop exit):
    1bC: 0 →(T) 1 →(T) 1 … 1 →(N) 0 →(T) 1 … : the single N causes two mispredicts per loop, the N itself plus the following T (predicted N from state 0)
    2bC: 0 →(T) 1 →(T) 2 →(T) 3 →(T) 3 … 3 →(N) 2 →(T) 3 … : the single N only drops the counter from 3 to 2, which still predicts T

Only 1 mispredict per N branches now! DC08: 99.999%, DC44: 99.0%
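The 2bC state machine is small enough to sketch directly in C (an illustrative implementation; naming is mine):

```c
/* Saturating 2-bit counter: states 0-3; 0-1 predict not-taken,
 * 2-3 predict taken; T increments toward 3, NT decrements toward 0. */
typedef struct { unsigned ctr; } two_bit_counter;

int tbc_predict(const two_bit_counter *c) {
    return c->ctr >= 2;                       /* 1 = predict taken */
}

void tbc_update(two_bit_counter *c, int taken) {
    if (taken) { if (c->ctr < 3) c->ctr++; }  /* saturate at 3 */
    else       { if (c->ctr > 0) c->ctr--; }  /* saturate at 0 */
}
```

Starting from the saturated state 3, one NT outcome only drops to state 2, which still predicts taken; that hysteresis is exactly what the example above relies on.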

Page 9: Advanced Microarchitecture

Importance of Branches
• 98% → 99%
  – Whoop-Dee-Do!
  – Actually, it's a 2% misprediction rate → 1%
  – That's a halving of the number of mispredictions
• So what?
  – If the misprediction rate is 50%, and 1 in 5 instructions is a branch, then the number of useful instructions we can fetch is:
    5 × (1 + ½ + (½)² + (½)³ + …) = 10
  – If we halve the miss rate down to 25%:
    5 × (1 + ¾ + (¾)² + (¾)³ + …) = 20
  – Halving the miss rate doubles the number of useful instructions we can try to extract ILP from
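The two sums are geometric series; writing p for the per-branch probability of a correct prediction (½ and ¾ above), with one branch per 5 instructions:

```latex
\text{useful instructions} \;=\; 5\sum_{k=0}^{\infty} p^{k} \;=\; \frac{5}{1-p},
\qquad p=\tfrac12 \Rightarrow 10, \qquad p=\tfrac34 \Rightarrow 20
```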

Page 10: Advanced Microarchitecture

Typical Organization of 2bC Predictor

[Diagram: the 32- or 64-bit PC is hashed down to log2(n) bits, which index a table of n entries/counters. The indexed counter produces the prediction; when the actual outcome arrives, the FSM update logic computes the new counter value and writes it back into the table.]

… back to predictors

Page 11: Advanced Microarchitecture

Typical Hash
• Just take the log2(n) least significant bits of the PC
• May need to ignore a few bits
  – In a 32-bit RISC ISA, all instructions are 4 bytes wide and all instruction addresses are 4-byte aligned, so the two least significant bits of the PC are always zero and are not included
    • equivalent to right-shifting the PC by two positions before hashing
  – In a variable-length CISC ISA (e.g. x86), instructions may start on arbitrary byte boundaries
    • probably don't want to shift
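A sketch of both variants of the hash; the table size (LOG2N = 10, i.e. 1024 entries) is an assumed example, not from the slides:

```c
#include <stdint.h>

/* PC hash for a 2^LOG2N-entry counter table. */
enum { LOG2N = 10 };                          /* assumed example size */

uint32_t hash_risc(uint32_t pc) {             /* fixed 4-byte instructions */
    return (pc >> 2) & ((1u << LOG2N) - 1);   /* drop the two always-zero bits */
}

uint32_t hash_cisc(uint32_t pc) {             /* variable-length, e.g. x86 */
    return pc & ((1u << LOG2N) - 1);          /* keep the low bits as-is */
}
```

With the shift, consecutive RISC instructions map to consecutive table entries instead of using only every fourth one.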

Page 12: Advanced Microarchitecture

How about the Branch at 0xDC50?
• 1bC and 2bC don't do too well (50% at best)
• But it's still obviously predictable
• Why?
  – It has a repeating pattern: (NT)*
  – How about other patterns? (TTNTN)*
• Use branch correlation
  – The outcome of a branch is often related to previous outcome(s)

Page 13: Advanced Microarchitecture

Idea: Track the History of a Branch

[Diagram: each entry, selected by PC, stores the previous outcome plus two 2-bit counters, one used when prev = 0 and one used when prev = 1. For the alternating branch, the prev = 0 counter trains up to 3 and the prev = 1 counter trains down to 0:]

    prev = 1 → counter-if-prev=1 is 0 → prediction = N
    prev = 0 → counter-if-prev=0 is 3 → prediction = T
    prev = 1 → counter-if-prev=1 is 0 → prediction = N
    prev = 0 → counter-if-prev=0 is 3 → prediction = T
    …
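One such entry can be sketched in C (naming is mine; the entry holds one previous-outcome bit plus two 2-bit counters):

```c
/* One predictor entry: 1 bit of local history selects between
 * two saturating 2-bit counters. */
typedef struct {
    unsigned prev;      /* last outcome of this branch (0 or 1) */
    unsigned ctr[2];    /* ctr[0] used when prev = 0, ctr[1] when prev = 1 */
} hist1_entry;

int h1_predict(const hist1_entry *e) {
    return e->ctr[e->prev] >= 2;              /* taken iff selected ctr >= 2 */
}

void h1_update(hist1_entry *e, int taken) {
    unsigned *c = &e->ctr[e->prev];           /* update the counter we used */
    if (taken) { if (*c < 3) (*c)++; } else { if (*c > 0) (*c)--; }
    e->prev = taken;                          /* remember the outcome */
}
```

On the alternating branch at 0xDC50, ctr[0] trains up (after an N a T follows) and ctr[1] trains down, so after warm-up every prediction is correct.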

Page 14: Advanced Microarchitecture

Deeper History Covers More Patterns

• What pattern has this branch predictor entry learned?

[Diagram: each entry now stores the last 3 outcomes and one 2-bit counter per 3-bit history value (counter if prev = 000, counter if prev = 001, …, counter if prev = 111).]

Reading off the trained counters: 001 → 1; 011 → 0; 110 → 0; 100 → 1, which generates 00110011001… = (0011)*

Page 15: Advanced Microarchitecture

Predictor Organizations

[Diagram: three ways to hash the PC into the pattern tables:
 • a different pattern table for each branch PC
 • one shared set of patterns for all branches
 • a mix of both: the hash selects a set of patterns shared by several branches]

Page 16: Advanced Microarchitecture

Example (1)
• 1024 counters (2^10)
  – 32 sets (2^5)
    • 5-bit PC hash chooses a set
  – Each set has 32 counters
    • 32 × 32 = 1024
    • History length of 5 (log2(32) = 5)
• Branch collisions
  – 1000's of branches collapsed into only 32 sets

Page 17: Advanced Microarchitecture

Example (2)
• 1024 counters (2^10)
  – 128 sets (2^7)
    • 7-bit PC hash chooses a set
  – Each set has 8 counters
    • 128 × 8 = 1024
    • History length of 3 (log2(8) = 3)
• Limited patterns/correlation
  – Can now only handle a history length of three

Page 18: Advanced Microarchitecture

Two-Level Predictor Organization
• Branch History Table (BHT)
  – 2^a entries
  – h-bit history per entry
• Pattern History Table (PHT)
  – 2^b sets
  – 2^h counters per set
  – each entry is a 2-bit counter
• Total size in bits
  – h·2^a + 2·2^(b+h)

[Diagram: an a-bit PC hash selects a BHT entry; its h history bits, together with b PC-hash bits, index the PHT.]

Page 19: Advanced Microarchitecture

Classes of Two-Level Predictors
• h = 0 or a = 0 (degenerate case)
  – Regular table of 2bC's (b = log2(#counters))
• h > 0, a > 1
  – "Local History" 2-level predictor
• h > 0, a = 1
  – "Global History" 2-level predictor

Page 20: Advanced Microarchitecture

Global vs. Local Branch History
• Local Behavior
  – What is the predicted direction of Branch A given the outcomes of previous instances of Branch A?
• Global Behavior
  – What is the predicted direction of Branch Z given the outcomes of all* previous branches A, B, …, X and Y?

* number of previous branches tracked limited by the history length

Page 21: Advanced Microarchitecture

Why Global Correlations Exist
• Example: related branch conditions

       p = findNode(foo);
    A: if ( p is parent )
         do something;

       do other stuff; /* may contain more branches */

    B: if ( p is a child )
         do something else;

Outcome of the second branch is always opposite of the first branch.

Page 22: Advanced Microarchitecture

Other Global Correlations
• Testing same/similar conditions
  – code might test for NULL before a function call, and the function might test for NULL again
  – in some cases it may be faster to recompute a condition rather than save a previous computation in memory and re-load it
  – partial correlations: one branch could test for cond1, and another branch could test for cond1 && cond2 (if cond1 is false, then the second branch can be predicted as false)
  – multiple correlations: one branch tests cond1, a second tests cond2, and a third tests cond1 && cond2 (which can always be predicted if the first two branches are known)

Page 23: Advanced Microarchitecture

A Global-History Predictor

[Diagram: a single global branch history register (BHR) records the last h outcomes of all branches. Left: the h history bits and a b-bit PC hash index the PHT as separate dimensions. Right: equivalently, the b PC-hash bits and h history bits are combined into one (b + h)-bit index.]

Page 24: Advanced Microarchitecture

Similar Tradeoff Between b and h
• For a fixed number of counters
  – Larger h → smaller b
    • Larger h → longer history
      – able to capture more patterns
      – longer warm-up/training time
    • Smaller b → more branches map to the same set of counters
      – more interference
  – Larger b → smaller h
    • just the opposite…

Page 25: Advanced Microarchitecture

Motivation for Combined Indexing
• Not all 2^h "states" are used
  – (TTNN)* only uses half of the states for a history length of 3, and only ¼ of the states for a history length of 4
  – (TN)* only uses two states no matter how long the history length is
• Not all bits of the PC are uniformly distributed
• Not all bits of the history are uniformly likely to be correlated
  – more recent history → more likely to be strongly correlated

Page 26: Advanced Microarchitecture

Combined Index Example: gshare
• S. McFarling (DEC-WRL TR, 1993)

[Diagram: a k-bit PC hash is XORed with the k-bit global history to form the index into the counter table, where k = log2(#counters).]
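A sketch of the gshare index computation; k = 8 (a 256-entry table) is an assumed example size:

```c
#include <stdint.h>

/* gshare: XOR k bits of the PC with the k-bit global history
 * register (BHR) to form the counter-table index. */
enum { GSHARE_K = 8 };                        /* assumed example size */

uint32_t gshare_index(uint32_t pc, uint32_t bhr) {
    uint32_t mask = (1u << GSHARE_K) - 1;
    return (pc ^ bhr) & mask;
}
```

XOR folds PC and history into the same k bits, so two (PC, history) pairs collide only when their differences cancel exactly, whereas concatenation (gselect) must split the k bits between short PC and history fields.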

Page 27: Advanced Microarchitecture

Gshare Example

    Branch Address | Global History | Gselect 4/4 | Gshare 8/8
    00000000       | 00000001       | 00000001    | 00000001
    00000000       | 00000000       | 00000000    | 00000000
    11111111       | 00000000       | 11110000    | 11111111
    11111111       | 10000000       | 11110000    | 01111111

Insufficient history leads to a conflict: the last two rows collide under gselect (both index 11110000) but remain distinct under gshare.

Page 28: Advanced Microarchitecture

Some Interference May Be Tolerable
• Branch A: always not-taken
• Branch B: always taken
• Branch C: TNTNTN…
• Branch D: TTNNTTNN…

[Diagram: with a 3-bit history indexing a shared table of 2-bit counters, these branches exercise disjoint history values: A only sees 000, B only 111, C alternates between 010 and 101, and D cycles through 110, 100, 001, 011. Each counter is trained by a single branch, so the sharing causes no destructive interference.]

Page 29: Advanced Microarchitecture

And Then It Might Not
• Branch X: TTTNTTTN…
• Branch Y: TNTNTN…
• Branch Z: TTTT…

[Diagram: now the history values overlap: after 111, X's next outcome is N while Z's is T; after 101, X's next outcome is T while Y's is N. The shared counters are pulled in both directions (the '?' entries) and interference hurts.]

Page 30: Advanced Microarchitecture

Interference-Reducing Predictors
• There are patterns and asymmetries in branches
• Not all patterns occur with the same frequency
• Branches have biases
• This lecture:
  – Bi-Mode (Lee et al., MICRO 97)
  – gskewed (Michaud et al., ISCA 97)
• These are global history predictors, but the ideas can be applied to other types of predictors

Page 31: Advanced Microarchitecture

Gskewed Idea
• Interference occurs because two (or more) branches hash to the same index
• A different hash function can prevent this collision
  – but may cause other collisions
• Use multiple hash functions such that a collision can only occur in a few cases
  – use a majority vote to make the final decision

Page 32: Advanced Microarchitecture

Gskewed Organization

[Diagram: the PC and global history feed three different hash functions (hash1, hash2, hash3), each indexing its own table (PHT1, PHT2, PHT3); a majority vote over the three counters produces the prediction.]

The hashes are chosen so that if hash1(x) = hash1(y), then:
    hash2(x) ≠ hash2(y)
    hash3(x) ≠ hash3(y)
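A sketch of the voting step; the three hash functions below are stand-ins, not the actual skewing functions from Michaud et al.:

```c
#include <stdint.h>

enum { SKEW_K = 8 };                         /* 256 entries per PHT (example) */
#define SKEW_MASK ((1u << SKEW_K) - 1)

/* Three different (placeholder) hashes of the same (PC, history) pair. */
static uint32_t sh1(uint32_t pc, uint32_t h) { return (pc ^ h) & SKEW_MASK; }
static uint32_t sh2(uint32_t pc, uint32_t h) { return (pc ^ (h << 1) ^ (pc >> SKEW_K)) & SKEW_MASK; }
static uint32_t sh3(uint32_t pc, uint32_t h) { return ((pc << 1) ^ h ^ (h >> SKEW_K)) & SKEW_MASK; }

/* Majority vote over the three 2-bit counters read from PHT1-PHT3. */
int gskewed_predict(const uint8_t pht1[], const uint8_t pht2[],
                    const uint8_t pht3[], uint32_t pc, uint32_t h) {
    int votes = (pht1[sh1(pc, h)] >= 2)
              + (pht2[sh2(pc, h)] >= 2)
              + (pht3[sh3(pc, h)] >= 2);
    return votes >= 2;                       /* majority wins */
}
```

If two branches collide in one table, the other two tables, indexed by different hashes, still out-vote the corrupted entry.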

Page 33: Advanced Microarchitecture

Gskewed Example

[Diagram: branches A and B collide in one of the three tables, but each still wins its majority vote from the two tables where no collision occurs.]

Page 34: Advanced Microarchitecture

Combining Predictors
• Some branches exhibit local history correlations
  – ex. loop branches
• While others exhibit global history correlations
  – "spaghetti logic", ex. if-elsif-elsif-elsif-else branches
• Using a global history predictor prevents accurate prediction of branches exhibiting local history correlations
• And vice versa

Page 35: Advanced Microarchitecture

Tournament Hybrid Predictors

[Diagram: two component predictors (Pred0, Pred1) run in parallel; a meta-predictor, a table of 2-/3-bit counters, selects the final prediction. If the meta-counter's MSB = 0, use Pred0, else use Pred1.]

Meta-update on resolve:
    Pred0 correct, Pred1 correct → ---
    Pred0 wrong,   Pred1 correct → Inc
    Pred0 correct, Pred1 wrong   → Dec
    Pred0 wrong,   Pred1 wrong   → ---
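One meta-counter and the update table above can be sketched as follows (names are mine):

```c
/* Tournament selection: a saturating 2-bit meta-counter whose MSB
 * picks which component predictor to trust. */
typedef struct { unsigned meta; } tourney;     /* states 0..3 */

int tourney_choose(const tourney *t, int pred0, int pred1) {
    return (t->meta >= 2) ? pred1 : pred0;     /* MSB = 1 -> use pred1 */
}

void tourney_update(tourney *t, int pred0, int pred1, int outcome) {
    int c0 = (pred0 == outcome), c1 = (pred1 == outcome);
    if (c1 && !c0)      { if (t->meta < 3) t->meta++; } /* only pred1 right: Inc */
    else if (c0 && !c1) { if (t->meta > 0) t->meta--; } /* only pred0 right: Dec */
    /* both right or both wrong: no change (---) */
}
```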

Page 36: Advanced Microarchitecture

Common Combinations
• Global history + local history
• "Easy" branches + global history
  – 2bC and gshare
• Short history + long history
• Many types of behavior, many combinations

Page 37: Advanced Microarchitecture

Multi-Hybrids
• Why only combine two predictors?

[Diagram: four component predictors P0-P3 combined either by a tree of two-way meta-predictors (M01 selects between P0/P1, M23 between P2/P3, and a final M between the winners) or by a single meta-predictor M that selects among all four directly.]

• Tradeoff between making good individual predictions (P's) vs. making good meta-predictions (M's)
  – for a fixed hardware budget, improving one may hurt the other

Page 38: Advanced Microarchitecture

Prediction Fusion

• Selection discards information from n−1 predictors
• Fusion attempts to synthesize all information
  – more info to work with
  – possibly more junk to sort through

[Diagram: with selection, M picks one of P0-P3's predictions; with fusion, all of P0-P3's outputs feed into M, which produces the final prediction itself.]

Page 39: Advanced Microarchitecture

Using Long Branch Histories
• Long global history provides more context for branch prediction/pattern matching
  – more potential sources of correlation
• Costs
  – For a PHT-based approach, HW cost increases exponentially: O(2^h) counters
  – Training time increases, which may decrease overall accuracy

Page 40: Advanced Microarchitecture

Predictor Training Time
• Ex: prediction equals the opposite of the 2nd most recent outcome
• Hist len = 2 → 4 states to train:
    NN → T, NT → T, TN → N, TT → N
• Hist len = 3 → 8 states to train:
    NNN → T, NNT → T, NTN → N, NTT → N, TNN → T, …

Page 41: Advanced Microarchitecture

Neural Branch Prediction
• Uses the "perceptron" from classical machine learning theory
  – simplest form of a neural net (single layer, single node)
• Inputs are past branch outcomes
• Compute a weighted sum of the inputs
  – output is a linear function of the inputs
  – sign of the output is used for the final prediction

Page 42: Advanced Microarchitecture

Perceptron Predictor

[Diagram: the history bits x1…xn (a taken outcome, 1, maps to +1; a not-taken outcome, 0, maps to −1) are each multiplied by a signed weight w1…wn; together with the "bias" weight w0 (whose input x0 is fixed), the products are summed by an adder. Sum ≥ 0 → predict taken.]

Page 43: Advanced Microarchitecture

Perceptron Predictor (2)
• The magnitude of weight wi determines how correlated branch i is with the current branch
• The sign of the weight determines positive or negative correlation
• Ex: the outcome is usually the opposite of the 5th oldest branch
  – w5 has large magnitude (L), but is negative
  – if x5 is taken, then w5·x5 = −L·1 = −L
    • tends to make the sum more negative (toward a NT prediction)
  – if x5 is not taken, then w5·x5 = −L·−1 = L

Page 44: Advanced Microarchitecture

Perceptron Predictor (3)
• When the actual branch outcome is known:
  – if xi = outcome, then increment wi (positive correlation)
  – if xi ≠ outcome, then decrement wi (negative correlation)
  – for x0, increment if the branch is taken, decrement if NT
• "Done with training"
  – if |Σ wi·xi| > θ, then don't update the weights unless mispredicted
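The prediction and training rules fit in a few lines of C; H and THETA are assumed example values, and the convention xi, outcome ∈ {−1, +1} follows the slides:

```c
/* One perceptron entry over an H-bit history. x[i] and outcome are
 * +1 (taken) or -1 (not-taken); w[0] is the bias weight (x0 = +1). */
enum { H = 8, THETA = 18 };    /* example history length and threshold */

int perceptron_sum(const int w[H + 1], const int x[H]) {
    int sum = w[0];                              /* bias term */
    for (int i = 0; i < H; i++) sum += w[i + 1] * x[i];
    return sum;                                  /* predict T iff sum >= 0 */
}

void perceptron_train(int w[H + 1], const int x[H], int outcome) {
    int sum = perceptron_sum(w, x);
    int mispred = (sum >= 0) != (outcome == 1);
    if (!mispred && (sum >= THETA || sum <= -THETA))
        return;                                  /* "done with training" */
    w[0] += outcome;                             /* +1 if taken, -1 if NT */
    for (int i = 0; i < H; i++)
        w[i + 1] += (x[i] == outcome) ? 1 : -1;  /* agree: inc, disagree: dec */
}
```

Training on a stream whose outcome is always the opposite of x[4] drives w[5] strongly negative while uncorrelated weights hover near zero, matching the w5 example above.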

Page 45: Advanced Microarchitecture

Perceptron Trains Quickly
• If no correlation exists with branch i, then wi just gets incremented and decremented back and forth: wi → 0
• If correlation exists with branch j, then wj is consistently incremented (or decremented) until it has a large influence on the overall sum

Page 46: Advanced Microarchitecture

Linearly Inseparable Functions
• A perceptron computes a linear combination of its inputs
• It can only learn linearly separable functions

[Diagram: two plots over (xi, xj) ∈ {−1, +1}². Left: outcomes N, N, N, T; f() = −3·xi − 4·xj − 5 (weights wi, wj, w0) draws a straight line separating the lone T from the N's. Right: XOR-like outcomes N, T, T, N:]
  • No values of wi, wj, w0 exist to satisfy these outputs
  • No straight line exists that separates the T's from the N's

Page 47: Advanced Microarchitecture

Overall Hardware Organization

[Diagram: a PC hash selects one set of weights from a table of n perceptrons; the BHR supplies the inputs; a row of multipliers forms the wi·xi products and an adder reduces them.]

    prediction = sign(sum)
    Size = (h+1)·k·n + h + Area(mult) + Area(adder)
    h = history length, k = counter width, n = number of perceptrons in the table

Page 48: Advanced Microarchitecture

GEHL
• GEometric History Length predictor

[Diagram: a very long global branch history is sampled at increasing lengths L1 < L2 < L3 < L4; the PC and each history segment (h1-h4) index separate tables of k-bit weights, and an adder sums the selected weights.]

    prediction = sign(sum)
    L(i) = a^(i−1) · L(1) (the history lengths form a geometric progression)

Page 49: Advanced Microarchitecture

PPM Predictors
• PPM = Partial Pattern Matching
  – Used in data compression
  – Idea: use the longest history necessary, but no longer

[Diagram: a base table of 2bCs plus several tables indexed by the PC and successively longer history segments (h1 < h2 < h3 < h4, most recent to oldest). Each longer-history table stores partial tags alongside its 2bCs; the tag compares (=) drive a chain of muxes that selects the prediction from the longest-history table whose tag matches, falling back to the base 2bC table.]

Page 50: Advanced Microarchitecture

TAGE Predictor
• Similar to PPM, but uses geometric history lengths
  – Currently the most accurate type of branch prediction algorithm
• References (www.jilp.org):
  – PPM: Michaud (CBP-1)
  – O-GEHL: Seznec (CBP-1)
  – TAGE: Seznec & Michaud (JILP)
  – L-TAGE: Seznec (CBP-2)