Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution

Hyesoon Kim, Onur Mutlu, Jared Stark*, Yale N. Patt

The University of Texas at Austin, Electrical and Computer Engineering
*Oregon Microarchitecture Lab, Intel Corporation


Talk Outline

• Problem
• Wish Branches
• Experimental Methodology
• Results
• Conclusion


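Predicated Execution

Predication converts a control-flow dependency into a data dependency.

Source code:

    if (cond) {
        b = 0;
    } else {
        b = 1;
    }

Control-flow graph: block A ends with the conditional branch; the not-taken path (N) goes to block B, the taken path (T) goes to block C, and both paths rejoin at block D.

Normal branch code:

    A:  p1 = (cond)
        branch p1, TARGET
    B:  mov b, 1
        jmp JOIN
    C:  TARGET: mov b, 0
    D:  JOIN: add x, b, 1

Predicated code:

    A:  p1 = (cond)
    B:  (!p1) mov b, 1
    C:  (p1) mov b, 0
    D:  add x, b, 1

Pro: eliminates hard-to-predict branches.
Cons: (1) blocks B and C are always fetched; (2) the predicated instructions must wait until p1 is resolved.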
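The predicated version can be mimicked with a small interpreter in which every instruction is fetched but only instructions whose guard is true update the registers. The mini-ISA below (the `Instr` fields and the `copy`/`li`/`addi` operations) is a hypothetical encoding of the slide's example, not IA-64 syntax; it is only meant to show that blocks B and C are both fetched on every run while only one of them takes effect.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Instr:
    guard: Optional[str]  # predicate register, "!name" for negation, None if unguarded
    dest: str
    op: str               # hypothetical ops: "copy", "li" (load immediate), "addi"
    srcs: Tuple


def guard_is_true(guard: Optional[str], regs: Dict[str, int]) -> bool:
    if guard is None:
        return True
    if guard.startswith("!"):
        return not regs[guard[1:]]
    return bool(regs[guard])


def execute(program, regs: Dict[str, int]) -> Tuple[int, int]:
    """Fetch every instruction; commit only those whose guard is true.

    Returns (fetched, committed) instruction counts.
    """
    fetched = committed = 0
    for ins in program:
        fetched += 1
        if not guard_is_true(ins.guard, regs):
            continue  # fetched, but its result is discarded (acts as a no-op)
        if ins.op == "copy":
            regs[ins.dest] = regs[ins.srcs[0]]
        elif ins.op == "li":
            regs[ins.dest] = ins.srcs[0]
        elif ins.op == "addi":
            regs[ins.dest] = regs[ins.srcs[0]] + ins.srcs[1]
        else:
            raise ValueError(f"unknown op: {ins.op}")
        committed += 1
    return fetched, committed


PREDICATED_CODE = [
    Instr(None, "p1", "copy", ("cond",)),  # A: p1 = (cond)
    Instr("!p1", "b", "li", (1,)),         # B: (!p1) mov b, 1
    Instr("p1", "b", "li", (0,)),          # C: (p1) mov b, 0
    Instr(None, "x", "addi", ("b", 1)),    # D: add x, b, 1
]

if __name__ == "__main__":
    for cond in (1, 0):
        regs = {"cond": cond, "b": 0, "x": 0}
        fetched, committed = execute(PREDICATED_CODE, regs)
        print(f"cond={cond}: x={regs['x']}, fetched={fetched}, committed={committed}")
```

For cond=1 this yields x=1 and for cond=0 x=2, matching the if/else source, while four instructions are fetched in both cases: the constant fetch of B and C is the first cost listed above.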


The Overhead of Predicated Execution

Predicated code:

    A:  p1 = (cond)
    B:  (!p1) mov b, 1
    C:  (p1) mov b, 0
    D:  add x, b, 1

Two idealized configurations isolate the overhead:
• NO-DEPENDENCY: the predicated instructions do not wait for p1; their predicates behave as if already known (for example, when cond is true: (0) mov b, 1 and (1) mov b, 0).
• NO-DEPENDENCY + NO-FETCH: in addition, instructions whose predicate is false are not fetched.

[Figure: normalized execution time of gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf and their average (AVG), relative to non-predicated code, for PREDICATED CODE, NO-DEPENDENCY, and NO-DEPENDENCY + NO-FETCH. One bar extends off the scale to 2.02. Average improvements annotated on the figure: -2%, 13%, 16%.]

If all of this overhead were ideally eliminated, predicated execution would improve average execution time by 16%.
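A back-of-the-envelope model makes the two overheads concrete. It is a simplification under stated assumptions, not the methodology behind the figure: both paths of the hammock take the same hypothetical number of cycles, normal branch code pays a fixed flush penalty on each misprediction (30 cycles, borrowed from the minimum misprediction penalty of the baseline configuration), and predicated code always executes both paths and waits a hypothetical fixed time for p1. Fetch bandwidth and overlap with surrounding work are ignored.

```python
def branch_code_cycles(path_cycles: float, mispredict_rate: float,
                       penalty: float = 30.0) -> float:
    """Expected cycles for normal branch code: one path plus the expected flush cost."""
    return path_cycles + mispredict_rate * penalty


def predicated_code_cycles(path_cycles: float, predicate_wait: float = 0.0) -> float:
    """Cycles for predicated code: both (equal-length) paths are executed,
    plus the time spent waiting for the predicate p1."""
    return 2 * path_cycles + predicate_wait


def break_even_rate(path_cycles: float, predicate_wait: float = 0.0,
                    penalty: float = 30.0) -> float:
    """Misprediction rate at which both versions take the same number of cycles.

    Solves path + rate * penalty == 2 * path + wait for rate.
    """
    return (path_cycles + predicate_wait) / penalty


if __name__ == "__main__":
    for rate in (0.02, 0.20):
        print(f"rate={rate:.2f}: branch={branch_code_cycles(2, rate):.1f}, "
              f"predicated={predicated_code_cycles(2, predicate_wait=1):.1f}")
    print(f"break-even rate: {break_even_rate(2, predicate_wait=1):.2f}")
```

With 2-cycle paths and a 1-cycle predicate wait, predication only pays off once the branch mispredicts more than 10% of the time; for an easy-to-predict branch, the extra work makes predicated code slower than branch code.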


The Problem

• Because of the predication overhead, predicated execution sometimes reduces performance.
• Branch misprediction behavior depends on run-time factors: the input set, the control-flow path, and phase behavior. The compiler therefore cannot accurately estimate how a branch will behave at run time.
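A small experiment illustrates why a compile-time decision can be wrong. The same static branch, predicted by one ordinary 2-bit saturating counter (a stand-in chosen for simplicity, not the 64KB hybrid predictor of the baseline), is easy to predict on one input and hard on another. Both outcome streams are made up for illustration.

```python
def mispredict_rate(outcomes: list[bool]) -> float:
    """Misprediction rate of a single 2-bit saturating counter on one branch.

    States 0-1 predict not-taken, 2-3 predict taken; the counter starts at 2.
    """
    counter = 2
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses / len(outcomes)


# Two hypothetical input sets for the same static branch.
input_a = [True] * 99 + [False]   # almost always taken
input_b = [True, False] * 50      # alternates taken / not-taken

if __name__ == "__main__":
    print(f"input A: {mispredict_rate(input_a):.0%} mispredicted")
    print(f"input B: {mispredict_rate(input_b):.0%} mispredicted")
```

Input A gives 1% mispredictions and input B gives 50%. A compiler that profiled only with input A would likely keep the branch as a normal branch, which is the wrong choice for input B.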


Wish Branches

• A new type of control-flow instruction, in three forms: wish jump, wish join, and wish loop.
• The compiler generates code with wish branches that can be executed either as predicated code or as non-predicated (normal branch) code.
• The hardware decides at run time whether to execute the predicated code or the normal branch code, based on the confidence of the branch prediction: a branch that is easy to predict runs as normal branch code, and a branch that is hard to predict runs as predicated code.


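Wish Jump/Join

Normal branch code:

    A:  p1 = (cond)
        branch p1, TARGET
    B:  mov b, 1
        jmp JOIN
    C:  TARGET: mov b, 0
    D:  JOIN: add x, b, 1

Predicated code:

    A:  p1 = (cond)
    B:  (!p1) mov b, 1
    C:  (p1) mov b, 0
    D:  add x, b, 1

Wish jump/join code:

    A:  p1 = (cond)
        wish.jump p1, TARGET
    B:  (!p1) mov b, 1
        wish.join !p1, JOIN
    C:  TARGET: (p1) mov b, 0
    D:  JOIN: add x, b, 1

In the wish jump/join code, the wish jump in A either goes to C (TARGET, taken) or falls through to B (not-taken); the wish join in B either jumps to D (JOIN) or falls through to C.

• High confidence: the wish jump and wish join are predicted like normal branches, and the fetched instructions are treated as having a true predicate. If the wish jump is predicted taken, the processor fetches TARGET: (1) mov b, 0; if it is predicted not-taken, it fetches (1) mov b, 1 and wish.join (1) JOIN, which jumps to JOIN. The code then runs like normal branch code.
• Low confidence: the wish jump and wish join are treated as nops, so blocks A, B, C and D are all fetched and the code runs as predicated code.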
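The front-end behavior described above can be sketched as a function that returns the fetched instruction stream. This is a hypothetical model (the names `WISH_CODE` and `fetch_stream` are illustrative) that takes the confidence decision and the predicted direction as inputs; it models only which instructions are fetched, not timing or misprediction recovery.

```python
from typing import Dict, List

# The slide's wish jump/join example, grouped by basic block.
WISH_CODE: Dict[str, List[str]] = {
    "A": ["p1 = (cond)", "wish.jump p1, TARGET"],
    "B": ["(!p1) mov b, 1", "wish.join !p1, JOIN"],
    "C": ["TARGET: (p1) mov b, 0"],
    "D": ["JOIN: add x, b, 1"],
}


def fetch_stream(high_confidence: bool, predicted_taken: bool) -> List[str]:
    """Instructions fetched for the hammock, given the confidence estimate and
    the direction predicted for the wish jump."""
    stream = list(WISH_CODE["A"])
    if not high_confidence:
        # Low confidence: wish jump and wish join act as nops, every block is
        # fetched, and the real predicates guard B and C (predicated code).
        return stream + WISH_CODE["B"] + WISH_CODE["C"] + WISH_CODE["D"]
    if predicted_taken:
        # High confidence, predicted taken: go to TARGET; predicate assumed true.
        stream.append("TARGET: (1) mov b, 0")
    else:
        # High confidence, predicted not-taken: run B with its predicate
        # assumed true, then the wish join jumps over C to JOIN.
        stream += ["(1) mov b, 1", "wish.join (1) JOIN"]
    return stream + WISH_CODE["D"]


if __name__ == "__main__":
    for conf, taken in ((False, True), (True, True), (True, False)):
        mode = "high" if conf else "low"
        print(f"{mode} confidence, predicted taken={taken}:")
        for ins in fetch_stream(conf, taken):
            print("   ", ins)
```

In low-confidence mode the predicted direction is ignored and all six instructions are fetched; in high-confidence mode only one side of the hammock is fetched, exactly as with a normal branch.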


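Wish Loop

Source code:

    do {
        a++;
        i++;
    } while (i < N);

Normal backward branch code (loop body X, exit block Y):

    X:  LOOP: add a, a, 1
              add i, i, 1
              p1 = (i < N)
              branch p1, LOOP
    Y:  EXIT:

Wish loop code (an extra block H initializes the predicate):

    H:        mov p1, 1
    X:  LOOP: (p1) add a, a, 1
              (p1) add i, i, 1
              (p1) p1 = (i < N)
              wish.loop p1, LOOP
    Y:  EXIT:

• High confidence: the loop-body predicates are treated as true, written (1), so the loop executes like normal branch code.
• Low confidence: the loop-body instructions remain predicated on p1, so an iteration fetched after the loop should have exited has no effect.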
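Why extra iterations are harmless in low-confidence mode can be checked by executing the wish loop code directly: each fetched iteration runs with the real predicate, so iterations beyond the correct exit change nothing. The function `run_wish_loop` is a hypothetical model that counts iterations rather than cycles, and it assumes a and i start at 0, which the slide does not specify.

```python
from typing import Tuple


def run_wish_loop(n: int, fetched_iterations: int) -> Tuple[int, int, bool]:
    """Run the wish loop body `fetched_iterations` times with real predicates.

    Returns (a, i, p1). If p1 is still true afterwards, the loop exited too
    early and the remaining iterations were never fetched.
    """
    a, i = 0, 0       # assumed initial values
    p1 = True         # H: mov p1, 1
    for _ in range(fetched_iterations):
        if p1:
            a += 1        # (p1) add a, a, 1
            i += 1        # (p1) add i, i, 1
            p1 = i < n    # (p1) p1 = (i < N)
        # When p1 is false, the whole iteration behaves like nops.
    return a, i, p1


if __name__ == "__main__":
    print(run_wish_loop(3, 3))  # correct number of iterations
    print(run_wish_loop(3, 5))  # two extra iterations
    print(run_wish_loop(3, 2))  # one iteration short
```

With N = 3, fetching three or five iterations both give (3, 3, False), so the two extra iterations need no flush; fetching only two gives (2, 2, True), meaning the loop was left too early and must be re-executed.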


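Mispredicted Case 1: Early-Exit (low confidence)

• Correct execution: X1 X2 X3 Y, with wish-loop outcomes T, T, N.
• Early exit: the processor fetches X1 X2 and then Y, predicting T, N. The loop was exited one iteration too early, so the pipeline is flushed and execution resumes with X3 and Y.
• Compared to normal branch code: the predicate data dependency and one extra instruction (-).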


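Mispredicted Case 2: Late-Exit (low confidence)

• Correct execution: X1 X2 X3 Y, with wish-loop outcomes T, T, N.
• Late exit: the processor fetches X1 X2 X3 X4 X5 and then Y, predicting T, T, T, T, N. Because p1 is false after X3, iterations X4 and X5 execute as nops and no flush is needed.
• Compared to normal branch code, pro: reduces the flush penalty (+++); cons: the predicate data dependency and one extra instruction (-).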


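Mispredicted Case 3: No-Exit (low confidence)

• Correct execution: X1 X2 X3 Y, with wish-loop outcomes T, T, N.
• No exit: the processor keeps predicting taken and fetches X1 X2 X3 X4 X5 X6 … without reaching Y before the wish loop is resolved, so the pipeline is flushed and execution resumes at Y.
• Compared to normal branch code: the predicate data dependency and one extra instruction (-).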
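The three cases can be summarized by comparing the predicted and the actual iteration counts. The sketch below is a simplification under one stated assumption: the hypothetical `resolve_window` parameter stands in for pipeline timing, meaning how many extra iterations the front end can fetch past the correct exit before the mispredicted wish loop is resolved. Using the slides' example of three iterations and a window of two, predicting two iterations is an early exit, five is a late exit, and six is a no-exit.

```python
from typing import Tuple


def classify_wish_loop_exit(actual_iters: int, predicted_iters: int,
                            resolve_window: int) -> Tuple[str, bool]:
    """Classify a low-confidence wish-loop prediction; return (case, needs_flush).

    resolve_window is a hypothetical stand-in for pipeline timing: the number
    of extra iterations that can be fetched before the wish loop resolves.
    """
    if predicted_iters == actual_iters:
        return "correct", False
    if predicted_iters < actual_iters:
        return "early-exit", True      # exited too soon: flush and refetch
    extra = predicted_iters - actual_iters
    if extra <= resolve_window:
        return "late-exit", False      # extra iterations run as nops
    return "no-exit", True             # exit not fetched in time: flush


if __name__ == "__main__":
    for predicted in (3, 2, 5, 6):
        print(predicted, classify_wish_loop_exit(3, predicted, resolve_window=2))
```

Only the late-exit case avoids the flush, which is where a wish loop gains over normal branch code; in the other two mispredicted cases it still pays the predicate dependency and the extra instruction.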


Advantages/Disadvantages of Wish Branches

Advantages compared to predicated execution:
• Reduce the overhead of predication.
• Increase the benefits of predicated code by allowing the compiler to generate more aggressively predicated code.
• Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops).
• Make predicated code less dependent on the machine configuration (e.g., the branch predictor).


Advantages/Disadvantages of Wish Branches

Disadvantages compared to predicated execution:
• Extra branch instructions use machine resources.
• Extra branch instructions increase contention for branch predictor table entries.
• Wish branches may constrain the compiler's scope for code optimizations.


Wish Branch Support

• ISA support: predicated execution and wish branch instructions.
• Compiler support: wish branch generation algorithms. The compiler must decide which branches are predicated, which are converted to wish branches, and which stay as normal branches.
• Hardware support: a confidence estimator, plus support in the front end and in the branch misprediction detection/recovery module.


Experimental Infrastructure

• IA-64 provides full support for predication.
• IA-64 traces are converted to micro-ops to simulate an out-of-order superscalar processor model.

Simulation flow: source code → IA-64 compiler (ORC) → IA-64 binary → trace generation module → IA-64 trace → micro-op translator → µops trace → micro-op simulator.


Simulation Methodology

Nine SPEC 2000 integer benchmarks.

Baseline processor configuration:
• Front end: large and accurate branch predictor (64KB hybrid branch predictor: gshare + local); minimum 30-cycle branch misprediction penalty; 64KB I-cache with 2-cycle latency.
• Execution core: 8-wide out-of-order processor with a 512-entry instruction window.
• Confidence estimator: 1KB tagged JRS confidence estimator with 16-bit history (Jacobsen et al., MICRO-29).
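The confidence estimator is what lets the hardware choose a mode for each dynamic wish branch. Below is a minimal sketch of a tagged JRS-style estimator: a table of resetting counters indexed by the branch address XORed with the global history, where a correct prediction increments the counter, a misprediction resets it to zero, and a counter at or above a threshold signals high confidence. The table size, counter width, threshold, and the use of the full PC as the tag are illustrative assumptions, not the 1KB configuration used in the experiments.

```python
from typing import List, Optional, Tuple


class JRSConfidenceEstimator:
    """Tagged JRS-style confidence estimator built from resetting counters.

    All sizes and the threshold are illustrative assumptions.
    """

    def __init__(self, entries: int = 1024, history_bits: int = 16,
                 counter_max: int = 15, threshold: int = 15) -> None:
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        self.counter_max = counter_max
        self.threshold = threshold
        self.tags: List[Optional[int]] = [None] * entries
        self.counters: List[int] = [0] * entries
        self.history = 0  # global outcome history, newest outcome in bit 0

    def _slot(self, pc: int) -> Tuple[int, int]:
        """Index = (PC xor global history) mod table size; tag = full PC (a simplification)."""
        return (pc ^ self.history) % self.entries, pc

    def high_confidence(self, pc: int) -> bool:
        idx, tag = self._slot(pc)
        return self.tags[idx] == tag and self.counters[idx] >= self.threshold

    def update(self, pc: int, taken: bool, predicted_correctly: bool) -> None:
        idx, tag = self._slot(pc)
        if self.tags[idx] != tag:          # tag miss: allocate a fresh entry
            self.tags[idx] = tag
            self.counters[idx] = 0
        if predicted_correctly:
            self.counters[idx] = min(self.counters[idx] + 1, self.counter_max)
        else:
            self.counters[idx] = 0         # resetting counter: any miss clears it
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


if __name__ == "__main__":
    est = JRSConfidenceEstimator()
    pc = 0x400
    for _ in range(40):
        est.update(pc, taken=True, predicted_correctly=True)
    print(est.high_confidence(pc))   # after a long run of correct predictions
    est.update(pc, taken=True, predicted_correctly=False)
    print(est.high_confidence(pc))   # after a single misprediction
```

In this sketch, a front end would call `high_confidence(pc)` when it fetches a wish branch, running the normal-branch version on True and the predicated version on False, and would call `update()` when the branch resolves.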


Performance Improvement

[Figure: normalized execution time of gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, their average (AVG), and the average without mcf (AVGnomcf), relative to non-predicated code, for SELECTIVE-PREDICATION, AGGRESSIVE-PREDICATION, wish jump/join, and wish jump/join/loop. One bar extends off the scale to 2.02.]

• SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis.
• AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated.

Average improvements of wish branches annotated on the figure:
• 12% over conditional branch prediction, 11% over selective predication, 13% over aggressive predication.
• 14% over conditional branch prediction, 13% over selective predication, 16% over aggressive predication.
• Without mcf: 16% over conditional branch prediction, 11% over selective predication, 7% over aggressive predication.
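The figure reports both AVG and AVGnomcf. A short calculation with made-up values (not the measured results) shows how one benchmark with a very large normalized execution time can pull an arithmetic mean above 1.0 even when every other benchmark improves. The slides do not state how AVG is computed, so the arithmetic mean used here is an assumption.

```python
def arithmetic_mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Hypothetical normalized execution times (1.0 = non-predicated baseline):
# eight benchmarks at 0.90 and one outlier at 2.02.
normalized = [0.90] * 8 + [2.02]

avg_all = arithmetic_mean(normalized)
avg_without_outlier = arithmetic_mean(normalized[:-1])

if __name__ == "__main__":
    print(f"AVG over all nine: {avg_all:.3f}")
    print(f"AVG without the outlier: {avg_without_outlier:.3f}")
```

Here the mean over all nine values is about 1.024, a net slowdown, while the mean of the other eight is 0.900, a 10% reduction in execution time.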


Conclusion

• New control-flow instructions: wish branches (jump/join/loop).
• Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture: the compiler analyzes the control-flow graph and generates the code, and the microarchitecture makes the run-time decision to use predication.
• Wish branches provide significant performance benefits: 16% compared to conditional branch prediction and 13% compared to selectively predicated code.
• Wish branches can make predicated execution more viable and effective in high-performance processors by enabling adaptive and aggressive predicated execution.