Advanced Microarchitecture
Lecture 4: Branch Predictors
Direction vs. Target
• Direction: 0 or 1
• Target: 32- or 64-bit value
• Turns out targets are generally easier to predict
  – Don't need to predict the NT target
  – The T target doesn't usually change
    • or has a "nice" pattern, like subroutine returns
Branches Have Locality
• If a branch was previously taken, there's a good chance it'll be taken again in the future

  for(i=0; i < 100000; i++) {
      /* do stuff */
  }

  This branch will be taken 99,999 times in a row.
Simple Predictor
• Always predict NT
  – no fetch bubbles (always just fetch the next line)
  – does horribly on the previous for-loop example
• Always predict T
  – does pretty well on the previous example
  – but what if you have other control besides loops?

  p = calloc(num, sizeof(*p));
  if (p == NULL)
      error_handler( );

  This branch is practically never taken.
Last Outcome Predictor
• Do what you did last time

  0xDC08: for(i=0; i < 100000; i++) {
  0xDC44:     if( (i % 100) == 0 )
                  tick( );
  0xDC50:     if( (i & 1) == 1 )
                  odd( );
          }

  (Each branch keeps a single state bit, T or N, recording its last outcome; the prediction is simply that bit.)
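A minimal C sketch (not from the slides) of a last-outcome predictor: one stored bit per entry, predicted as-is, indexed by a simple PC hash. The table size and hash are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 4096                    /* assumed table size (power of two) */

static bool last_outcome[ENTRIES];      /* one bit per entry: last direction seen */

static unsigned index_of(uint64_t pc) {
    return (pc >> 2) & (ENTRIES - 1);   /* drop byte-offset bits, keep low index bits */
}

/* Predict: do whatever this branch did last time. */
bool predict(uint64_t pc) {
    return last_outcome[index_of(pc)];
}

/* Update: remember the actual outcome for next time. */
void update(uint64_t pc, bool taken) {
    last_outcome[index_of(pc)] = taken;
}
```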
Misprediction Rates?
How often is the branch outcome != the previous outcome?

  DC08: TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT ...   (100,000 iterations)
        mispredicts: 2 / 100,000 → prediction rate 99.998%
        (the two misses are at the T→N and N→T transitions)
  DC44: TTTTT ... TNTTTTT ... TNTTTTT ...
        mispredicts: 2 / 100 → prediction rate 98.0%
  DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT ...
        mispredicts: 2 / 2 → prediction rate 0.0%
Saturating Two-Bit Counter

  FSM for last-outcome prediction: two states, 0 (predict NT) and 1 (predict T).
  FSM for 2bC (2-bit counter): four states 0–3; states 0 and 1 predict NT, states 2 and 3 predict T.
  A T outcome moves the counter up (saturating at 3); an NT outcome moves it down (saturating at 0).
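A C sketch of the 2bC state machine above: states 0–3 as on the slide, with the upper half predicting taken. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t ctr2_t;                 /* holds states 0..3 */

/* Predict taken when the counter is in state 2 or 3. */
bool ctr2_predict(ctr2_t c) {
    return c >= 2;
}

/* Saturating update: a T outcome moves toward 3, an NT outcome toward 0. */
ctr2_t ctr2_update(ctr2_t c, bool taken) {
    if (taken  && c < 3) c++;
    if (!taken && c > 0) c--;
    return c;
}
```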
Example

  (Figure: warm-up traces of the 1bC and 2bC counters. After the initial training, the 2bC stays in the taken states across the single not-taken outcome, so it mispredicts once instead of twice.)

  Only 1 mispredict per N branches now!
  DC08: 99.999%    DC44: 99.0%
Importance of Branches
• 98% → 99%
  – Whoop-Dee-Do!
  – Actually, it's a 2% misprediction rate → 1%
  – That's a halving of the number of mispredictions
• So what?
  – If the misprediction rate is 50%, and 1 in 5 instructions is a branch, then the number of useful instructions that we can fetch is:
      5×(1 + ½ + (½)² + (½)³ + … ) = 10
  – If we halve the miss rate down to 25%:
      5×(1 + ¾ + (¾)² + (¾)³ + … ) = 20
  – Halving the miss rate doubles the number of useful instructions that we can try to extract ILP from
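Both sums are geometric series; the closed form makes the doubling explicit:

\[
5\sum_{k=0}^{\infty}\left(\tfrac{1}{2}\right)^{k} = \frac{5}{1-\tfrac{1}{2}} = 10,
\qquad
5\sum_{k=0}^{\infty}\left(\tfrac{3}{4}\right)^{k} = \frac{5}{1-\tfrac{3}{4}} = 20
\]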
… back to predictors

Typical Organization of 2bC Predictor

  (Figure: the 32- or 64-bit PC goes through a hash producing a log2(n)-bit index into a table of n counters; the selected counter's FSM state gives the prediction. When the actual outcome arrives, FSM update logic computes the next state and writes it back into the table.)
Typical Hash
• Just take the log2(n) least-significant bits of the PC
• May need to ignore a few bits
  – In a 32-bit RISC ISA, all instructions are 4 bytes wide and all instruction addresses are 4-byte aligned → the two least-significant bits of the PC are always zero, so they are not included
    • equivalent to right-shifting the PC by two positions before hashing
  – In a variable-length CISC ISA (e.g. x86), instructions may start on arbitrary byte boundaries
    • probably don't want to shift
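Putting the last two slides together, a minimal C sketch of a table-of-2bC predictor using the "shift right by two, keep the low bits" hash; the table size is an assumed example.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTRS 4096                        /* n counters; assumed, power of two */

static uint8_t pht[CTRS];                /* each entry is a 2-bit saturating counter (0..3) */

/* Hash: drop the two always-zero offset bits (4-byte-aligned RISC ISA),
 * then keep the log2(n) least-significant bits. */
static unsigned pc_hash(uint64_t pc) {
    return (pc >> 2) & (CTRS - 1);
}

bool bp_predict(uint64_t pc) {
    return pht[pc_hash(pc)] >= 2;        /* states 2 and 3 predict taken */
}

void bp_update(uint64_t pc, bool taken) {
    uint8_t *c = &pht[pc_hash(pc)];
    if (taken  && *c < 3) (*c)++;        /* saturate at 3 */
    if (!taken && *c > 0) (*c)--;        /* saturate at 0 */
}
```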
How about the Branch at 0xDC50?
• 1bC and 2bC don't do too well (50% at best)
• But it's still obviously predictable
• Why?
  – It has a repeating pattern: (NT)*
  – How about other patterns? (TTNTN)*
• Use branch correlation
  – The outcome of a branch is often related to previous outcome(s)
Idea: Track the History of a Branch

  (Figure: each PC-indexed entry stores the branch's previous outcome plus two 2-bit counters, one used when prev = 0 and one used when prev = 1. Both counters start at 3; on the alternating branch, the prev = 1 counter trains down toward 0 while the prev = 0 counter stays at 3, so after warm-up the predictions alternate correctly: prev = 1 → predict N, prev = 0 → predict T.)
Deeper History Covers More Patterns
• What pattern has this branch predictor entry learned?

  (Figure: the entry stores the last 3 outcomes, which select one of eight 2-bit counters, one per history pattern 000 … 111.)

  001 → 1, 011 → 0, 110 → 0, 100 → 1
  00110011001… i.e. (0011)*
Predictor Organizations

  (Figure: three organizations. Left: the PC hash selects a private set of pattern counters, so there is a different pattern table for each branch PC. Middle: a single shared set of patterns for all branches. Right: a mix of both, where the PC hash selects among a few shared sets.)
Example (1)
• 1024 counters (2^10)
  – 32 sets (2^5)
    • 5-bit PC hash chooses a set
  – Each set has 32 counters
    • 32 × 32 = 1024
    • History length of 5 (log2 32 = 5)
• Branch collisions
  – 1000's of branches collapsed into only 32 sets

  (Figure: 5 bits of PC hash select the set; 5 history bits select the counter within it.)
Example (2)
• 1024 counters (2^10)
  – 128 sets (2^7)
    • 7-bit PC hash chooses a set
  – Each set has 8 counters
    • 128 × 8 = 1024
    • History length of 3 (log2 8 = 3)
• Limited patterns/correlation
  – Can now only handle a history length of three

  (Figure: 7 bits of PC hash select the set; 3 history bits select the counter within it.)
Two-Level Predictor Organization
• Branch History Table (BHT)
  – 2^a entries
  – h-bit history per entry
• Pattern History Table (PHT)
  – 2^b sets
  – 2^h counters per set
• Total size in bits
  – h·2^a + 2·2^(b+h)

  (Figure: a bits of PC hash index the BHT to read an h-bit history; b bits of PC hash select the PHT set and the h history bits select the counter within it. Each PHT entry is a 2-bit counter.)
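A C sketch of the two-level (BHT + PHT) organization above; the values of a, b, and h and the hashes are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define A 10                       /* BHT has 2^a entries */
#define B 4                        /* PHT has 2^b sets */
#define H 6                        /* h bits of local history (fits in a byte for H <= 8) */

static uint8_t bht[1u << A];               /* h-bit local history per branch */
static uint8_t pht[1u << B][1u << H];      /* 2-bit counters: 2^h per PHT set */

bool two_level_predict(uint64_t pc) {
    unsigned hist = bht[(pc >> 2) & ((1u << A) - 1)];
    unsigned set  = (pc >> 2) & ((1u << B) - 1);
    return pht[set][hist] >= 2;            /* history selects the counter within the set */
}

void two_level_update(uint64_t pc, bool taken) {
    unsigned bi   = (pc >> 2) & ((1u << A) - 1);
    unsigned set  = (pc >> 2) & ((1u << B) - 1);
    unsigned hist = bht[bi];
    uint8_t *c = &pht[set][hist];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    /* shift the new outcome into this branch's local history */
    bht[bi] = (uint8_t)(((hist << 1) | (taken ? 1u : 0u)) & ((1u << H) - 1));
}
```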
Classes of Two-Level Predictors
• h = 0 or a = 0 (degenerate case)
  – Regular table of 2bC's (b = log2 #counters)
• h > 0, a > 1
  – "Local History" 2-level predictor
• h > 0, a = 1
  – "Global History" 2-level predictor
Global vs. Local Branch History
• Local behavior
  – What is the predicted direction of branch A given the outcomes of previous instances of branch A?
• Global behavior
  – What is the predicted direction of branch Z given the outcomes of all* previous branches A, B, …, X and Y?

  * the number of previous branches tracked is limited by the history length
Why Global Correlations Exist
• Example: related branch conditions

  p = findNode(foo);
  A: if ( p is parent )
         do something;

  do other stuff; /* may contain more branches */

  B: if ( p is a child )
         do something else;

  The outcome of the second branch (B) is always the opposite of the first branch (A).
Other Global Correlations
• Testing same/similar conditions
  – code might test for NULL before a function call, and the function might test for NULL again
  – in some cases it may be faster to recompute a condition rather than save a previous computation to memory and re-load it
  – partial correlations: one branch could test for cond1, and another branch could test for cond1 && cond2 (if cond1 is false, then the second branch can be predicted as false)
  – multiple correlations: one branch tests cond1, a second tests cond2, and a third tests some combination of cond1 and cond2 (which can always be predicted if the first two branches are known)
A Global-History Predictor

  (Figure: a single global branch history register (BHR) records the outcomes of the most recent branches. Left: b bits of PC hash select a PHT set and h history bits select the counter within it. Right: the b PC-hash bits and the h history bits are concatenated into a single (b+h)-bit index.)
Similar Tradeoff Between b and h
• For a fixed number of counters
  – Larger h → smaller b
    • Larger h → longer history
      – able to capture more patterns
      – longer warm-up/training time
    • Smaller b → more branches map to the same set of counters
      – more interference
  – Larger b → smaller h
    • just the opposite…
Motivation for Combined Indexing
• Not all 2^h "states" are used
  – (TTNN)* only uses half of the states for a history length of 3, and only ¼ of the states for a history length of 4
  – (TN)* only uses two states no matter how long the history length is
• Not all bits of the PC are uniformly distributed
• Not all bits of the history are uniformly likely to be correlated
  – more recent history → more likely to be strongly correlated
Combined Index Example: gshare
• S. McFarling (DEC-WRL TR, 1993)

  (Figure: a k-bit PC hash is XORed with the k-bit global history to index a table of counters, where k = log2 #counters.)
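A minimal C sketch of gshare as described above: a k-bit PC hash XORed with the k-bit global history forms the index. The value of k and the update policy are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define K 12                              /* k = log2(#counters); assumed */

static uint8_t  pht[1u << K];             /* 2-bit saturating counters */
static uint32_t ghr;                      /* global history register; low K bits used */

static unsigned gshare_index(uint64_t pc) {
    return (unsigned)((pc >> 2) ^ ghr) & ((1u << K) - 1);   /* XOR PC hash with history */
}

bool gshare_predict(uint64_t pc) {
    return pht[gshare_index(pc)] >= 2;
}

void gshare_update(uint64_t pc, bool taken) {
    uint8_t *c = &pht[gshare_index(pc)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    ghr = (ghr << 1) | (taken ? 1u : 0u); /* shift the actual outcome into the GHR */
}
```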
Gshare example

  Branch Address | Global History | Gselect 4/4 | Gshare 8/8
  00000000       | 00000001       | 00000001    | 00000001
  00000000       | 00000000       | 00000000    | 00000000
  11111111       | 00000000       | 11110000    | 11111111
  11111111       | 10000000       | 11110000    | 01111111

  With gselect, insufficient history leads to a conflict: the last two rows get the same 11110000 index, while their gshare indices differ.
Some Interference May Be Tolerable
• Branch A: always not-taken
• Branch B: always taken
• Branch C: TNTNTN…
• Branch D: TTNNTTNN…

  (Figure: a shared PHT indexed by 3-bit history patterns 000 … 111. Branches that collide in the table but agree in direction for the colliding patterns can share counters without hurting each other's predictions.)
And Then It Might Not
• Branch X: TTTNTTTN…
• Branch Y: TNTNTN…
• Branch Z: TTTT…

  (Figure: the same shared PHT, but now colliding branches disagree in direction for some history patterns, so the shared counters get pulled both ways and those predictions become unreliable.)
Interference Reducing Predictors
• There are patterns and asymmetries in branches
• Not all patterns occur with the same frequency
• Branches have biases
• This lecture:
  – Bi-Mode (Lee et al., MICRO '97)
  – gskewed (Michaud et al., ISCA '97)
• These are global-history predictors, but the ideas can be applied to other types of predictors
Gskewed idea
• Interference occurs because two (or more) branches hash to the same index
• A different hash function can prevent this collision
  – but may cause other collisions
• Use multiple hash functions such that a collision can only occur in a few cases
  – use a majority vote to make the final decision
Gskewed organization

  (Figure: the PC and global history feed three different hash functions, hash1, hash2, and hash3, each indexing its own table (PHT1, PHT2, PHT3); a majority vote over the three counters gives the prediction.)

  The hashes are chosen so that if hash1(x) = hash1(y), then hash2(x) ≠ hash2(y) and hash3(x) ≠ hash3(y).
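A C sketch of the gskewed idea: three banks, three different hash functions, majority vote. The hash functions below are placeholders, not the published skewing functions, and updating all three banks on every branch is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

#define K    12
#define MASK ((1u << K) - 1)

static uint8_t  pht1[1u << K], pht2[1u << K], pht3[1u << K];
static uint32_t ghr;                      /* global history */

/* Three different (illustrative) hashes of {PC, global history}. */
static unsigned h1(uint64_t pc) { return (unsigned)((pc >> 2) ^ ghr) & MASK; }
static unsigned h2(uint64_t pc) { return (unsigned)((pc >> 2) ^ (ghr << 3) ^ (pc >> 5)) & MASK; }
static unsigned h3(uint64_t pc) { return (unsigned)((pc >> 4) ^ (ghr >> 1) ^ ghr) & MASK; }

static void bump(uint8_t *c, bool taken) {
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

bool gskewed_predict(uint64_t pc) {
    int votes = (pht1[h1(pc)] >= 2) + (pht2[h2(pc)] >= 2) + (pht3[h3(pc)] >= 2);
    return votes >= 2;                    /* majority of the three banks */
}

void gskewed_update(uint64_t pc, bool taken) {
    bump(&pht1[h1(pc)], taken);           /* simple policy: update all banks */
    bump(&pht2[h2(pc)], taken);
    bump(&pht3[h3(pc)], taken);
    ghr = (ghr << 1) | (taken ? 1u : 0u);
}
```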
Gskewed example

  (Figure: two branches A and B collide in one of the three tables, but the majority vote over the other two still yields the correct prediction for each.)
Combining Predictors
• Some branches exhibit local-history correlations
  – ex. loop branches
• While others exhibit global-history correlations
  – "spaghetti logic", ex. if-elsif-elsif-elsif-else branches
• Using a global-history predictor prevents accurate prediction of branches exhibiting local-history correlations
• And vice versa
Tournament Hybrid Predictors

  (Figure: two component predictors, Pred0 and Pred1, feed a meta-predictor, which is a table of 2-/3-bit counters. If the meta-counter MSB = 0, use Pred0's prediction; else use Pred1's as the final prediction.)

  Meta-counter update:
  Pred0 correct? | Pred1 correct? | Meta update
  no             | no             | ---
  no             | yes            | Inc
  yes            | no             | Dec
  yes            | yes            | ---
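A C sketch of the tournament chooser above: a table of 2-bit meta-counters whose MSB selects between the two component predictions. The table size and indexing are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define META 4096
static uint8_t meta[META];    /* 2-bit choosers: 0-1 -> use pred0, 2-3 -> use pred1 */

bool tournament_predict(uint64_t pc, bool pred0, bool pred1) {
    return (meta[(pc >> 2) & (META - 1)] >= 2) ? pred1 : pred0;
}

void tournament_update(uint64_t pc, bool pred0, bool pred1, bool taken) {
    uint8_t *m = &meta[(pc >> 2) & (META - 1)];
    bool c0 = (pred0 == taken), c1 = (pred1 == taken);
    if (c1 && !c0 && *m < 3) (*m)++;      /* only pred1 was right: move toward pred1 */
    if (c0 && !c1 && *m > 0) (*m)--;      /* only pred0 was right: move toward pred0 */
    /* both right or both wrong: no change */
}
```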
Common Combinations
• Global history + local history
• "Easy" branches + global history
  – ex. 2bC and gshare
• Short history + long history
• Many types of behavior, many combinations
Multi-Hybrids
• Why only combine two predictors?

  (Figure: left, a tree of meta-predictors, where M01 chooses between P0 and P1, M23 chooses between P2 and P3, and a final M chooses between those two; right, a single meta-predictor M selects directly among P0 … P3.)

• Tradeoff between making good individual predictions (P's) vs. making good meta-predictions (M's)
  – for a fixed hardware budget, improving one may hurt the other
Prediction Fusion
• Selection discards information from n−1 predictors
• Fusion attempts to synthesize all of the information
  – more info to work with
  – possibly more junk to sort through

  (Figure: with selection, the meta-predictor M picks one of the predictions P0 … P3; with fusion, M combines all four predictions into the final prediction.)
Using Long Branch Histories
• A long global history provides more context for branch prediction/pattern matching
  – more potential sources of correlation
• Costs
  – For a PHT-based approach, HW cost increases exponentially: O(2^h) counters
  – Training time increases, which may decrease overall accuracy
Predictor Training Time
• Ex: the prediction equals the opposite of the 2nd most recent outcome
  – Hist Len = 2 → 4 states to train:
    NN → T,  NT → T,  TN → N,  TT → N
  – Hist Len = 3 → 8 states to train:
    NNN → T,  NNT → T,  NTN → N,  NTT → N,  TNN → T,  …
Neural Branch Prediction
• Uses the "perceptron" from classical machine learning theory
  – simplest form of a neural net (single-layer, single-node)
• Inputs are past branch outcomes
• Compute a weighted sum of the inputs
  – output is a linear function of the inputs
  – the sign of the output is used for the final prediction
Perceptron Predictor

  (Figure: the branch history bits 1/0 are mapped to inputs x1 … xn ∈ {+1, −1}; each xi is multiplied by its weight wi, and an adder sums the products. If the sum ≥ 0, predict taken. x0 is hard-wired to 1, so w0 acts as a "bias" weight.)
Perceptron Predictor (2)
• The magnitude of weight wi determines how correlated branch i is to the current branch
• The sign of the weight determines positive or negative correlation
• Ex. the outcome is usually the opposite of the 5th oldest branch
  – w5 has a large magnitude (L), but is negative
  – if x5 is taken, then w5·x5 = −L·1 = −L
    • tends to make the sum more negative (toward a NT prediction)
  – if x5 is not taken, then w5·x5 = −L·(−1) = L
Perceptron Predictor (3)
• When the actual branch outcome is known:
  – if xi = outcome, then increment wi (positive correlation)
  – if xi ≠ outcome, then decrement wi (negative correlation)
  – for x0, increment if the branch was taken, decrement if NT
• "Done with training"
  – if |Σ wi·xi| > θ, then don't update the weights unless there was a misprediction
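A C sketch of the perceptron prediction and training rules from the last three slides. The table size, history length, threshold value, and 8-bit weight clamping are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_PERC 1024     /* number of perceptrons; assumed */
#define H      24       /* history length; assumed */
#define THETA  37       /* training threshold (the slide's theta); assumed value */

static int8_t weights[N_PERC][H + 1];   /* w[0] is the bias weight */
static int    hist[H];                  /* past outcomes as +1 (T) / -1 (N); 0 until warmed up */

static int8_t clamp_add(int8_t w, int d) {        /* keep weights in int8 range */
    int v = w + d;
    if (v > 127)  v = 127;
    if (v < -128) v = -128;
    return (int8_t)v;
}

static int weighted_sum(uint64_t pc) {
    int8_t *w = weights[(pc >> 2) % N_PERC];
    int sum = w[0];                               /* x0 is hard-wired to +1 */
    for (int i = 1; i <= H; i++)
        sum += w[i] * hist[i - 1];
    return sum;
}

bool perceptron_predict(uint64_t pc) {
    return weighted_sum(pc) >= 0;                 /* sign of the sum is the prediction */
}

void perceptron_update(uint64_t pc, bool taken) {
    int  sum = weighted_sum(pc);
    bool mispred = ((sum >= 0) != taken);
    int  t = taken ? 1 : -1;
    int8_t *w = weights[(pc >> 2) % N_PERC];

    /* train only if mispredicted or the sum's magnitude is below the threshold */
    if (mispred || (sum < THETA && sum > -THETA)) {
        w[0] = clamp_add(w[0], t);                /* bias: +1 if taken, -1 if not */
        for (int i = 1; i <= H; i++)              /* agree -> increment, disagree -> decrement */
            w[i] = clamp_add(w[i], (hist[i - 1] == t) ? 1 : -1);
    }
    for (int i = H - 1; i > 0; i--)               /* shift the new outcome into the history */
        hist[i] = hist[i - 1];
    hist[0] = t;
}
```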
Perceptron Trains Quickly
• If no correlation exists with branch i, then wi will just get incremented and decremented back and forth, and wi → 0
• If correlation exists with branch j, then wj will be consistently incremented (or decremented) until it has a large influence on the overall sum
Linearly Inseparable Functions
• A perceptron computes a linear combination of its inputs
• It can only learn linearly separable functions

  (Figure, left: outputs over the four corners (xi, xj) ∈ {−1, +1}², with a single T at (−1, −1) and N at the other three corners; this is separable, e.g. by f() = −3·xi − 4·xj − 5, i.e. wi = −3, wj = −4, w0 = −5.)
  (Figure, right: an XOR-like pattern, with T and N alternating across the corners.)

• For the XOR-like case:
  – No values of wi, wj, w0 exist to satisfy these outputs
  – No straight line exists that separates the T's from the N's
Overall Hardware Organization

  (Figure: a PC hash selects one set of weights from the table of weights; the BHR supplies the inputs; an array of multipliers and an adder produce the sum, and prediction = sign(sum).)

  Size = (h+1)·k·n + h + Area(mult) + Area(adder)
  where h = history length, k = counter width, n = number of perceptrons in the table
GEHL
• GEometric History Length predictor

  (Figure: a very long global branch history is hashed (h1, h2, h3, h4) at several lengths L1 < L2 < L3 < L4, together with the PC, to index tables of k-bit weights; an adder sums the selected weights and prediction = sign(sum).)

  The history lengths form a geometric progression: L(i) = a^(i−1) · L(1)
PPM Predictors
• PPM = Partial Pattern Matching
  – Used in data compression
  – Idea: use the longest history necessary, but no longer

  (Figure: a base 2bC table indexed by the PC alone, plus several partially tagged tables of 2bC counters indexed by hashes (h1, h2, h3, h4) of the PC and progressively longer slices of the history, from most recent to oldest. Tag-match comparators and a chain of muxes select the prediction from the longest-history table whose partial tag matches, falling back to the base 2bC table.)
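A rough C sketch of PPM-style lookup only (tag allocation, counter update, and history management are omitted): the tagged tables are probed from the longest history down, the first partial-tag hit supplies the prediction, and otherwise the base 2bC table does. All sizes, history lengths, and hash/tag functions below are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASE  4096
#define TSIZE 1024
#define NTAB  4
static const int hist_len[NTAB] = {4, 8, 16, 32};    /* h1 < h2 < h3 < h4; assumed lengths */

struct entry { uint16_t tag; uint8_t ctr; };          /* partial tag + 2-bit counter */

static uint8_t      base_pht[BASE];                   /* fallback 2bC table */
static struct entry tab[NTAB][TSIZE];
static uint64_t     ghr;                              /* global branch history */

/* Illustrative index and partial-tag hashes over the PC and a history prefix. */
static unsigned idx(uint64_t pc, int t) {
    uint64_t h = ghr & ((1ull << hist_len[t]) - 1);
    return (unsigned)(((pc >> 2) ^ h ^ (h >> 11)) % TSIZE);
}
static uint16_t tag_of(uint64_t pc, int t) {
    return (uint16_t)(((pc >> 2) ^ (ghr >> hist_len[t])) & 0x3FF);
}

bool ppm_predict(uint64_t pc) {
    /* probe from the longest history to the shortest; first tag hit wins */
    for (int t = NTAB - 1; t >= 0; t--) {
        struct entry *e = &tab[t][idx(pc, t)];
        if (e->tag == tag_of(pc, t))
            return e->ctr >= 2;
    }
    return base_pht[(pc >> 2) % BASE] >= 2;           /* no tag matched: use the base 2bC */
}
```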
TAGE Predictor
• Similar to PPM, but uses geometric history lengths
  – Currently the most accurate type of branch prediction algorithm
• References (www.jilp.org):
  – PPM: Michaud (CBP-1)
  – O-GEHL: Seznec (CBP-1)
  – TAGE: Seznec & Michaud (JILP)
  – L-TAGE: Seznec (CBP-2)