
improved register value pattern generation for branch prediction


Page 1: improved register value pattern generation for branch prediction

Control Dependency

Problem
– Dependence tracking in ARVI ignores control dependencies
– It cannot obtain the practical available register set
  • The same pattern is generated for different branch directions
– Such a branch CANNOT be predicted correctly

Page 2: improved register value pattern generation for branch prediction

Example of control dependency

Practical Available Register Set of branch 2
– r1, r3

Available Register Set in ARVI of branch 2
– r3

When r3 == 1
– r1 == 0 -> not taken
– r1 != 0 -> taken

Page 3: improved register value pattern generation for branch prediction

Dependence Tracking

[Figure: dependence tracking when Branch 1 is not taken and when Branch 1 is taken]

Page 4: improved register value pattern generation for branch prediction

Improved Data Dependence Tracking

Resolve control dependency
– Add control-flow information to the tracking
  • Add a logical register to the architecture
    – Called the TA register (target address)
    – It holds the target address of the last branch
  • TA is a hidden source register of instructions

Page 5: improved register value pattern generation for branch prediction

Behavior of branch instruction

Example – beq in the MIPS instruction set architecture

    if (rs == rt) then
        TA = PC + 4 + {14{imm[15]}, imm, 2'b0}
    else
        TA = PC + 4
    endif
    PC = TA
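The beq behavior above can be sketched in Python. This is only an illustration of the pseudocode: the function names `sign_extend_16` and `beq_target_address` are hypothetical, and the 32-bit address width is an assumption not stated on the slide.

```python
def sign_extend_16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_target_address(pc: int, rs: int, rt: int, imm: int) -> int:
    """Return the value a beq would write to TA (and then to PC)."""
    if rs == rt:
        # {14{imm[15]}, imm, 2'b0}: sign-extend imm, then shift left by 2.
        ta = pc + 4 + (sign_extend_16(imm) << 2)
    else:
        ta = pc + 4
    return ta & 0xFFFFFFFF  # assumed 32-bit address space


taken_ta = beq_target_address(0x1000, rs=1, rt=1, imm=3)
fallthrough_ta = beq_target_address(0x1000, rs=1, rt=2, imm=3)
```

Because TA differs between the taken and fall-through cases, any later instruction that reads TA as a hidden source carries the direction of the last branch with it.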

Page 6: improved register value pattern generation for branch prediction

Improved Dependence Tracking

[Figure: improved dependence tracking when Branch 1 is not taken and when Branch 1 is taken]

Page 7: improved register value pattern generation for branch prediction

ARS of Branch 13

Improved tracking
– Tracks control dependency well
– When execution has completed up to INST1
  • The ARS differs from the practical ARS
  • But the branch can still be predicted well
    – because TA carries control-flow information
  • When r3 == 1
    – TA == 2 -> not taken
    – TA == 4 -> taken
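A minimal sketch of why TA helps, assuming a toy predictor that simply remembers the last outcome for each value pattern. The class `PatternTable` and the function `mispredictions` are hypothetical names for illustration and do not model the real ARVI hardware.

```python
from typing import Dict, Tuple

Pattern = Tuple[int, ...]


class PatternTable:
    """Toy predictor: remembers the last outcome seen for each value pattern."""

    def __init__(self) -> None:
        self.table: Dict[Pattern, bool] = {}

    def predict(self, pattern: Pattern, default: bool = False) -> bool:
        return self.table.get(pattern, default)

    def train(self, pattern: Pattern, taken: bool) -> None:
        self.table[pattern] = taken


def mispredictions(use_ta: bool, rounds: int = 10) -> int:
    """Alternate the two paths from the slide and count wrong predictions."""
    table = PatternTable()
    wrong = 0
    # (r3, TA, outcome): r3 == 1 on both paths, only TA differs.
    cases = [(1, 2, False), (1, 4, True)]
    for i in range(rounds):
        r3, ta, taken = cases[i % 2]
        pattern = (r3, ta) if use_ta else (r3,)
        if table.predict(pattern) != taken:
            wrong += 1
        table.train(pattern, taken)
    return wrong


without_ta = mispredictions(use_ta=False)
with_ta = mispredictions(use_ta=True)
```

Without TA both paths produce the same pattern (r3 = 1), so the table keeps overwriting itself; with TA the two paths get separate entries and only the first visit to the taken path is mispredicted.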

Page 8: improved register value pattern generation for branch prediction

Common code problem

Performance loss in code that is not control dependent
– In common code

ARS of Branch 15
– When execution has completed up to INST2
– Practical ARS
  • r2 (INST1)
– Previously proposed tracking
  • r2 (INST1)
– Improved tracking
  • r2 (INST1), TA

Page 9: improved register value pattern generation for branch prediction

Distinguishing control flow in improved tracking

TA is wasted information
– This does not mean that the prediction is incorrect
– It means that the predictor needs more training

Information to train
– Previous tracking
  • r2 = 0 -> Taken
– Improved tracking
  • r2 = 0, TA = 5 -> Taken
  • r2 = 0, TA = 6 -> Taken
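The extra training cost can be illustrated by counting distinct patterns. This is a sketch only: `patterns_to_train` is a hypothetical helper, and the two (r2, TA) pairs are the values given on the slide.

```python
def patterns_to_train(paths, use_ta):
    """Return the set of distinct patterns one branch produces over all paths."""
    return {(r2, ta) if use_ta else (r2,) for r2, ta in paths}


# Two control-flow paths reach the common code with r2 == 0 (TA = 5 or TA = 6).
paths = [(0, 5), (0, 6)]
previous = patterns_to_train(paths, use_ta=False)
improved = patterns_to_train(paths, use_ta=True)
```

Both patterns in `improved` map to the same direction, so the second entry adds no information; it only has to be learned once more before the predictor is fully trained.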

Page 10: improved register value pattern generation for branch prediction

“SetTA” Instruction

Add a “SetTA” instruction
– It saves the next instruction address to TA

ARS of branch 15 is still r2 and TA
– But TA is always 6

Disadvantages
– Wasted instructions (INST6)
– Programs must be recompiled
– The start of the common code must be found at compile time in order to insert “SetTA”
  • This is hard because assembly language is not a structured programming language (it has “goto”)

Page 11: improved register value pattern generation for branch prediction

Encoding

The amount of information varies with the number of registers in the ARS
– Amount of information
  » Assume each value is 10 bits long
  • 1 register in ARS => 10 bits
  • 2 registers in ARS => 20 bits
  • 3 registers in ARS => 30 bits

A fixed-length pattern must be generated from variable-length information
– -> HASH
– Various encodings are possible

Page 12: improved register value pattern generation for branch prediction

Encoding of ARVI

XOR of all physical register values in the ARS
– Simple XOR hash built from an XOR tree
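A minimal sketch of a simple XOR hash that folds a variable number of register values into one fixed-width pattern. The 10-bit width follows the example on the encoding slide; truncating each value to that width and the name `xor_hash` are assumptions made for illustration.

```python
from functools import reduce
from typing import Iterable

PATTERN_BITS = 10  # assumed pattern width, matching the 10-bit example


def xor_hash(values: Iterable[int], bits: int = PATTERN_BITS) -> int:
    """Fold any number of register values into one fixed-width pattern."""
    mask = (1 << bits) - 1
    return reduce(lambda acc, v: acc ^ (v & mask), values, 0)


one_register = xor_hash([3])
two_registers = xor_hash([1, 2])
```

Here `one_register` and `two_registers` are equal even though the register sets differ, and swapping two values between registers also leaves the result unchanged. Conflicts of this kind are what a more careful encoding tries to reduce.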

Page 13: improved register value pattern generation for branch prediction

Reducing Hash conflict

Programs use the lower bits of registers more than the higher bits
– Most of the information is concentrated in the lower bits
– Hash conflicts occur because of the lower bits

To spread out the information distribution
– Apply a different circular shift for each logical register number
  • Because the physical register number changes at run time

Page 14: improved register value pattern generation for branch prediction

Percentage of use of each bit

There are bits that programs use most, and hash conflicts occur in those bits.

[Figure: two bar charts of per-bit usage. x-axis: 164.gzip, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 197.parser, 255.vortex, 256.bzip2, average. y-axis: 0 to 0.12]

Page 15: improved register value pattern generation for branch prediction

Degree of centralization

A high value means that only a small number of register bits are used
– Information is concentrated in a small number of bits
– It is spread out well by the circular shift

[Figure: bar chart of the degree of centralization. x-axis: 164.gzip, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 197.parser, 255.vortex, 256.bzip2, average. Series: non32, circular32]

Page 16: improved register value pattern generation for branch prediction

Proposed Encoding

XOR of the logical register values
– Each value is circularly shifted by a different amount according to its logical register number
– Serialize the physical-to-logical mapping
  • Value information is shorter than before (disadvantage)
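A minimal sketch of the proposed encoding, assuming 32-bit values and a rotation amount equal to the logical register number. The slide says only that the shift differs per logical register, so the exact amount is an assumption, and the names `rotate_left` and `rotated_xor_hash` are hypothetical.

```python
from typing import Dict


def rotate_left(value: int, amount: int, bits: int = 32) -> int:
    """Circularly shift value left by amount within a bits-wide word."""
    amount %= bits
    mask = (1 << bits) - 1
    value &= mask
    return ((value << amount) | (value >> (bits - amount))) & mask


def rotated_xor_hash(ars: Dict[int, int], bits: int = 32) -> int:
    """XOR the ARS values, each rotated by its logical register number."""
    pattern = 0
    for logical_reg, value in ars.items():
        pattern ^= rotate_left(value, logical_reg, bits)
    return pattern


a = rotated_xor_hash({1: 5, 2: 9})
b = rotated_xor_hash({1: 9, 2: 5})
```

With a plain XOR both register sets would hash to 12; after rotation `a` and `b` differ, because the low-order bits of each register land in different positions of the pattern.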

Page 17: improved register value pattern generation for branch prediction

Select Logical Register X

– Selects the value of the physical register that is mapped to logical register X

Page 18: improved register value pattern generation for branch prediction

Delay

» nPR = number of physical registers
» nLR = number of logical registers
» L = log2(nLR)

Simple XOR hash
– log2(nPR) * XOR2 + AND2

Proposed hash
– log2(nLR) * XOR2 + Select + AND2
  • Select = XOR2 + ANDL + Gate + OR(nPR)

nPR > nLR * 2
– log2(nPR) > log2(nLR) + 1
– Approximately the same, or a little slower

Page 19: improved register value pattern generation for branch prediction

HW Resource

Simple XOR hash
– nPR * N * AND2 + (nPR - 1) * N * XOR2
– (nPR - 1) * 3-bit ADD for the logical number tag

Proposed hash
– nPR * N * AND2 + nLR * Select + (nLR - 1) * N * XOR2
  • Select = nPR * (L * (XOR2 + 2 Gate) + ANDL) + N * OR(nPR)
– No logical number tag
  • The pattern already contains that information

Page 20: improved register value pattern generation for branch prediction

Suitable predictor for register-value-pattern

Characteristics of the register-value pattern
– A long pattern length is needed for reliable prediction
  • PHT is not suitable
– Tags must be saved for comparing states
  • Perceptron is not suitable [17][18]
– Not linearly separable [17][18]
  • Each bit of a value has an AND relation with the others
  • Perceptron is not suitable
– Many different patterns per branch
  • If there is a loop in which r1 changes from 0 to 999
    – There are 999 not-taken patterns and 1 taken pattern
– Long delay for pattern generation
  • Perceptron is not suitable [17][18]
  • Must be combined with a fast predictor in a hybrid [19][20]

Page 21: improved register value pattern generation for branch prediction

Proposed predictor

Modified YAGS [21]
– 1 bimodal table
  • Saves the bias of each branch
– 2 caches
  • Save only the patterns that differ from the bias
  • Taken cache
    – Saves not-taken patterns for taken-biased branches
  • Not-taken cache
    – Saves taken patterns for not-taken-biased branches
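The structure above can be sketched as follows. This is a simplified illustration, not the evaluated design: the caches are unbounded Python sets with no tags or counters, the bias is fixed by the first outcome seen, and the class name `ModifiedYagsSketch` is hypothetical.

```python
from typing import Dict, Set, Tuple


class ModifiedYagsSketch:
    """Bimodal bias table plus two exception caches keyed by (pc, pattern)."""

    def __init__(self) -> None:
        self.bias: Dict[int, bool] = {}  # bimodal: per-branch bias
        # Named as on the slide: the taken cache holds not-taken patterns
        # of taken-biased branches, and the not-taken cache the reverse.
        self.taken_cache: Set[Tuple[int, int]] = set()
        self.not_taken_cache: Set[Tuple[int, int]] = set()

    def predict(self, pc: int, pattern: int) -> bool:
        bias = self.bias.get(pc, False)  # unseen branches default to not taken
        cache = self.taken_cache if bias else self.not_taken_cache
        # A cache hit means "this pattern goes against the bias".
        return (not bias) if (pc, pattern) in cache else bias

    def update(self, pc: int, pattern: int, taken: bool) -> None:
        if pc not in self.bias:
            self.bias[pc] = taken  # simplification: first outcome sets the bias
            return
        bias = self.bias[pc]
        cache = self.taken_cache if bias else self.not_taken_cache
        if taken != bias:
            cache.add((pc, pattern))
        else:
            cache.discard((pc, pattern))


# Loop in which r1 runs from 0 to 999 and the branch is taken only at 999.
predictor = ModifiedYagsSketch()
LOOP_PC = 0x40  # hypothetical branch address
for r1 in range(1000):
    predictor.update(LOOP_PC, r1, taken=(r1 == 999))
```

For this loop only the single taken pattern is stored; the 999 not-taken patterns are covered by the bias, which is the pattern-capacity benefit of saving only the exceptions.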

Page 22: improved register value pattern generation for branch prediction

Block diagram

A fast predictor predicts the direction in an early cycle

When Modified YAGS hits and the depth tag matches the current state
– The fetch direction is updated in a late cycle

When Modified YAGS misses, the direction predicted by YAGS is the bias, and we do not know whether the pattern is untrained or trained but not saved
– The selector chooses either the biased direction or the fast predictor's direction

Page 23: improved register value pattern generation for branch prediction

Outline

Why do we need branch prediction?
Related works
Improved register-value-pattern generation
Experiment and evaluation
Contribution
Reference

Page 24: improved register value pattern generation for branch prediction

Experimental environment

SimpleScalar 3.0
– PISA instruction set architecture
– Little endian

sim-outorder
– Performance-based
– Execution-driven
– Cycle timer

Benchmarks
– 10 programs from SPEC 2k
– Instruction coverage
  • 150M ~ 250M instructions

Page 25: improved register value pattern generation for branch prediction

Processor Architecture Configuration


Page 26: improved register value pattern generation for branch prediction

Memory Architecture Configuration


Page 27: improved register value pattern generation for branch prediction

Predictor Configuration


Page 28: improved register value pattern generation for branch prediction

Outline

Why do we need branch prediction?
Related works
Improved register-value-pattern generation
Experiment and evaluation
Contribution
Reference

Page 29: improved register value pattern generation for branch prediction

Register-Value-Pattern predictor

The register-value-pattern predictor predicts the way a human does.
– If we know “this branch was taken before when a=3 and b=4”
  • We predict the branch without calculation when a=3 and b=4 occur again.
– Commonsense design
  • Why is 100% accuracy not possible?

Page 30: improved register value pattern generation for branch prediction

Factors of performance loss

1. Limitation of dependence tracking
– 1.1 Load branch
– 1.2 Control dependency
2. Hash conflict in encoding
3. Prediction delay
4. Various patterns for the same direction
– 4.1 Pattern capacity of predictor
– 4.2 Lack of training

Page 31: improved register value pattern generation for branch prediction

Contribution

We improve some factors of performance loss
– 1.2 Control dependency
– 2 Hash conflict in encoding
– 4.1 Pattern capacity of predictor

But some work still remains

Page 32: improved register value pattern generation for branch prediction

Applications of Register-Value-Pattern

The register-value pattern reaches its limits on different kinds of branches than the branch-history pattern does
– Higher performance in a hybrid predictor with the branch-history pattern

Register-value pattern with branch-register-value-based prediction
– Depth of dependence chain is 0
  • Means that the branch register is already updated
– Branch-register-value-based prediction works well in that case

Register-value pattern for value prediction
– The register-value pattern can be used as information for value prediction

Page 33: improved register value pattern generation for branch prediction

Outline

Why do we need branch prediction?
Related works
Improved register-value-pattern generation
Experiment and evaluation
Contribution
Reference

Page 34: improved register value pattern generation for branch prediction

Reference

[1] T. Yeh and Y. Patt. “Two-level Adaptive Branch Prediction.” In Proc. 24th ACM/IEEE Int. Symp. on Microarchitecture, 1991.
[2] T. Yeh and Y. Patt. “A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History.” In Proc. 20th Ann. Int. Symp. on Computer Architecture, 1993.
[3] S. Pan, K. So and J. Rahmeh. “Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation.” In Proc. 5th Annual Intl. Conf. on Architectural Support for Prog. Lang. and Operating Systems, 1992.
[4] R. Nair. “Dynamic Path-Based Branch Correlation.” In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
[5] D. Jiménez. “Fast Path-Based Neural Branch Prediction.” In Proc. 36th Ann. IEEE/ACM Int. Symp. on Microarchitecture, 2003.

Page 35: improved register value pattern generation for branch prediction

Reference

[6] F. Gabbay and A. Mendelson. “Speculative Execution Based on Value Prediction.” Technical Report, Technion, 1997.
[7] J. Gonzalez and A. Gonzalez. “Control-Flow Speculation through Value Prediction for Superscalar Processors.” In Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, 1999.
[8] T. Heil, Z. Smith and J. E. Smith. “Improving Branch Predictor by Correlating on Data Value.” In Proc. 32nd Int. Symp. on Microarchitecture, 1999.

Page 36: improved register value pattern generation for branch prediction

Reference

[9] K. Wang. “Highly Accurate Data Value Prediction using Hybrid Predictors.” In Proc. 30th Int. Symp. on Microarchitecture, 1997.
[10] M. Lipasti and J. Shen. “Exceeding the Dataflow Limit via Value Prediction.” In Proc. 29th Int. Symp. on Microarchitecture, 1996.
[11] W. Mohan and M. Franklin. “Improving Data Value Prediction Accuracy using Path Correlation.” In Proc. 6th Int. Conf. on High Performance Computing, 1999.
[12] Y. Sazeides and J. Smith. “Implementations of Context Based Value Predictors.” Technical Report #ECE-TR-97-8, University of Wisconsin-Madison, 1997.

Page 37: improved register value pattern generation for branch prediction

Reference

[13] L. N. Vintan, M. Sbera, I. Z. Mihu and A. Florea. “An alternative to branch prediction: pre-computed branches.” In ACM SIGARCH Computer Architecture News, vol. 31, 2003.
[14] L. He and Z. Liu. “A New Value Based Branch Predictor For SMT Processors.” In Proc. 16th IASTED Int. Conf. on Parallel and Distributed Computing and System, 2004.
[15] Y. Pan, X. Fan, L. He and D. Wang. “A bypass Mechanism to Enhance Branch Predictor for SMT.” In Proc. 12th Asia-Pacific Conf. on Computer Systems Architecture (ACSAC 2007), vol. 4697, 2007.

Page 38: improved register value pattern generation for branch prediction

Reference

[16] L. Chen, S. Dropsho and D. H. Albonesi. “Dynamic Data Dependence Tracking and its Application to Branch Prediction.” In Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003.
[17] D. A. Jiménez and C. Lin. “Dynamic Branch Prediction with Perceptrons.” In Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001.
[18] D. A. Jiménez and C. Lin. “Neural Methods for Dynamic Branch Prediction.” In ACM Transactions on Computer Systems, 2002.

Page 39: improved register value pattern generation for branch prediction

Reference

[19] P. Chang, E. Hao and Y. Patt. “Alternative Implementations of Hybrid Branch Predictors.” In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
[20] M. Evers, P. Chang and Y. Patt. “Using Hybrid Branch Predictors to Improve Branch Prediction Accuracy in The Presence of Context Switches.” In Proc. 23rd Ann. Int. Symp. on Computer Architecture, 1996.
[21] A. Eden and T. Mudge. “The YAGS branch prediction scheme.” In Proc. 31st Ann. ACM/IEEE Int. Symp. on Microarchitecture, 1998.
[22] P. N. Glaskowsky. “Pentium 4 (partially) previewed.” In Microprocessor Report, 2000.
