Treegion Instruction Scheduling in GCC

Chad Rosier
TINKER Microarchitecture and Compiler Research Group
Department of Electrical & Computer Engineering
North Carolina State University

GCC Developers’ Summit, June 30th, 2006

Slides: prod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf


Presentation Outline

• Introduction
• Treegion formation
• Code size efficiency
• Implementation
• Summary
• Future work


Region Formation Techniques

• Trace Scheduling [Fisher and Ellis]
  – Linear regions, based on profile, optimizes likely path
• Superblock Scheduling [Chang, et al.]
  – Apply tail duplication to remove side entrances
  – Robert Kidd working at Tree SSA level
• Hyperblock Scheduling [Mahlke et al.]
  – Add hardware predication
• Regions are linear in nature


Treegion Scheduling [Havanki et al.]

• Tree-shaped subgraph of CFG
  – single-entry, multiple-exit
  – acyclic, non-linear
• Multiple paths considered for speculating instructions
• Formation based on CFG
• Profile used for scheduling
• Additional notable characteristics
  – Calculating domination
  – No merge points

[Figure: a CFG partitioned into Treegion 0, Treegion 1, and Treegion 2]


Treegion Scheduling

• Treegion formation (basis of this talk)
  – Natural Treegion formation
    • Treegion formation without tail duplication
  – Region enlargement optimizations
    • Tail duplication
• Treegion-based instruction scheduling
  – Currently using existing Haifa scheduler


[Figure: example CFG with basic blocks BB0 to BB8, partitioned into Treegion 0, Treegion 1, and Treegion 2]
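Natural treegion formation (formation without tail duplication) can be sketched in a few lines. The sketch below is illustrative only, not the GCC implementation: the edge list in `EXAMPLE_CFG` is an assumption, since the figure's edges do not survive in the text, and the function name is hypothetical. It relies only on the properties stated in the talk: a treegion is single-entry and contains no merge points, so any block with more than one predecessor must start a new treegion.

```python
from collections import defaultdict

# Hypothetical edges for the BB0..BB8 example; the real figure's edges are not known.
EXAMPLE_CFG = {
    "BB0": ["BB1", "BB2"],
    "BB1": ["BB5"],
    "BB2": ["BB3", "BB4"],
    "BB3": ["BB5"],
    "BB4": ["BB5"],
    "BB5": ["BB6", "BB7"],
    "BB6": ["BB8"],
    "BB7": ["BB8"],
    "BB8": [],
}


def form_natural_treegions(succs, entry):
    """Partition a CFG into treegions without duplicating any block.

    succs maps each block to its successor list. Each returned treegion is a
    list of blocks with the root first.
    """
    preds = defaultdict(set)
    for block, targets in succs.items():
        for target in targets:
            preds[target].add(block)

    treegions = []
    assigned = set()
    pending_roots = [entry]
    while pending_roots:
        root = pending_roots.pop(0)
        if root in assigned:
            continue
        region = []
        stack = [root]
        while stack:
            block = stack.pop()
            if block in assigned:
                continue
            region.append(block)
            assigned.add(block)
            for succ in succs.get(block, []):
                if succ in assigned:
                    continue
                if len(preds[succ]) == 1:
                    stack.append(succ)          # single predecessor: grows the tree
                else:
                    pending_roots.append(succ)  # merge point: roots a later treegion
        treegions.append(region)
    return treegions
```

With the assumed edges, BB5 and BB8 are merge points, so the sketch yields three treegions: BB0 to BB4, then BB5 to BB7, then BB8 alone.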


Tail Duplication Example

[Figure: the example CFG (BB0 to BB8) before tail duplication, partitioned into Treegion 0, Treegion 1, and Treegion 2]


Tail Duplication Example

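[Figure: the example CFG after tail duplication. The copy BB5' is combined with BB1 (labeled BB1/BB5'), and copies BB6' and BB7' are added; the original BB5, BB6, BB7, and BB8 remain, with regions labeled Treegion 0, Treegion 1, and Treegion 2]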
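The duplication step in this example can be sketched as a pure function on a successor map. This is a minimal illustration under stated assumptions, not the `cfglayout.c` code: the CFG edges and the `tail_duplicate` helper are hypothetical, and copies are simply named with a trailing apostrophe as in the figure.

```python
# Hypothetical edges for the BB0..BB8 example; the real figure's edges are not known.
EXAMPLE_CFG = {
    "BB0": ["BB1", "BB2"],
    "BB1": ["BB5"],
    "BB2": ["BB3", "BB4"],
    "BB3": ["BB5"],
    "BB4": ["BB5"],
    "BB5": ["BB6", "BB7"],
    "BB6": ["BB8"],
    "BB7": ["BB8"],
    "BB8": [],
}


def tail_duplicate(succs, pred, root, region):
    """Give `pred` a private copy of the treegion `region` rooted at `root`.

    Returns a new successor map; the input map is left unchanged.
    """
    copy_of = {block: block + "'" for block in region}
    new_succs = {block: list(targets) for block, targets in succs.items()}
    for block in region:
        # Edges inside the region point at the copies; exit edges keep
        # their original targets.
        new_succs[copy_of[block]] = [copy_of.get(t, t) for t in succs.get(block, [])]
    # Redirect only the chosen predecessor to the copied root.
    new_succs[pred] = [copy_of[root] if t == root else t for t in succs[pred]]
    return new_succs


DUPLICATED = tail_duplicate(EXAMPLE_CFG, "BB1", "BB5", ["BB5", "BB6", "BB7"])
```

After the call, BB5' has BB1 as its only predecessor, so it and its copies BB6' and BB7' can join the treegion rooted at BB0. The cost is three extra blocks of static code, which is the trade-off the code size efficiency slides address.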


Region Enlargement Optimizations

• Instruction level parallelism (ILP) vs. static code size
  – Region enlarging optimizations usually enhance ILP
    • Cyclic scheduling: loop unrolling, loop peeling, etc.
    • Acyclic scheduling: tail duplication, recovery code, etc.
• I-cache and ITLB performance vs. static code size
  – Larger code usually means larger I-cache footprint
• Trade-off between the conflicting effects of code size increase
  – Especially in acyclic global scheduling


Code Size Efficiency

• Use the ratio of IPC change to code size change as an indication of code size efficiency.
  – Average code size efficiency

$$\mathit{Efficiency}_{average} = \frac{\mathit{IPC}_{candidate} - \mathit{IPC}_{natural\_treegion}}{\mathit{code\_size}_{candidate} - \mathit{code\_size}_{natural\_treegion}}$$

  – Instantaneous code size efficiency

$$\mathit{Efficiency}_{inst} = \frac{\mathit{IPC}_{after\_individual\_application} - \mathit{IPC}_{before\_individual\_application}}{\mathit{code\_size}_{after\_individual\_application} - \mathit{code\_size}_{before\_individual\_application}}$$


Instantaneous Code Size Efficiency

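[Figure: static IPC vs. code size, with points A0, A1, A2, A3, and A4 along the curve]

$$\mathit{Eff}_{average} = \frac{\mathit{IPC}_{A4} - \mathit{IPC}_{A0}}{\mathit{Code\_Size}_{A4} - \mathit{Code\_Size}_{A0}}$$

$$\mathit{Eff}_{inst}(A2) = \frac{\mathit{IPC}_{A3} - \mathit{IPC}_{A2}}{\mathit{Code\_Size}_{A3} - \mathit{Code\_Size}_{A2}}$$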
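The difference between the two measures is easiest to see with numbers. The sketch below uses made-up (code size, static IPC) values for A0 through A4; they are not measurements from the talk, and the function names are hypothetical. Average efficiency compares a point against the starting point A0, while instantaneous efficiency compares each point against the next one.

```python
# Hypothetical (relative code size, static IPC) points A0..A4; not measured data.
CURVE = [(1.0, 2.0), (1.25, 2.5), (1.5, 2.75), (2.0, 2.875), (2.5, 2.9375)]


def average_efficiency(curve, k):
    """IPC gained per unit of code size between A0 and A_k."""
    (size0, ipc0), (size_k, ipc_k) = curve[0], curve[k]
    return (ipc_k - ipc0) / (size_k - size0)


def instantaneous_efficiency(curve, k):
    """IPC gained per unit of code size by the single step from A_k to A_(k+1)."""
    (size_k, ipc_k), (size_next, ipc_next) = curve[k], curve[k + 1]
    return (ipc_next - ipc_k) / (size_next - size_k)


EFF_AVERAGE = average_efficiency(CURVE, 4)      # 0.625
EFF_INST_A2 = instantaneous_efficiency(CURVE, 2)  # 0.25
```

With these values the average efficiency at A4 still looks healthy (0.625), even though the step taken at A2 returns only 0.25 IPC per unit of code size. That gap is why the instantaneous measure is the better guide for deciding whether one more duplication is worthwhile.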


Estimate Static IPC Before Scheduling

• Use the expected execution time to calculate the static IPC
  – For a multi-path region:

$$\mathit{Exe\_Time}_{Expected} = \sum_{path_i} \left[ \mathrm{Max}\left(\mathit{data\_dependence\_bound}_{path_i},\ \mathit{resource\_bound}_{path_i}\right) \ast \mathit{Freq}_{path_i} \right]$$

• Now, IPC changes can be calculated as execution time saved by the optimization.

[Figure: tree1, tree2, and Tree1']

Example:

$$\mathit{Eff}_{inst}(A) = \frac{\mathit{Exe\_time}(tree1) + \mathit{Exe\_time}(tree2) - \mathit{Exe\_time}(Tree1')}{\mathit{Code\_size}(tree2)}$$
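A small numeric sketch of this estimate follows. The path bounds, frequencies, and code size are invented for illustration, and both function names are hypothetical; the two formulas are the ones on this slide. Each path contributes the larger of its data dependence bound and its resource bound, weighted by how often the path executes.

```python
def expected_exe_time(paths):
    """Expected execution time of a multi-path region.

    paths is an iterable of (data_dependence_bound, resource_bound, frequency)
    tuples, one per root-to-exit path.
    """
    return sum(max(dep, res) * freq for dep, res, freq in paths)


def inst_efficiency(exe_tree1, exe_tree2, exe_tree1_dup, code_size_tree2):
    """Execution time saved by merging, per unit of duplicated code."""
    return (exe_tree1 + exe_tree2 - exe_tree1_dup) / code_size_tree2


# Hypothetical two-path region: (dependence bound, resource bound, frequency).
TREE1_PATHS = [(6, 4, 0.75), (3, 5, 0.25)]
EXE_TREE1 = expected_exe_time(TREE1_PATHS)            # 6*0.75 + 5*0.25 = 5.75

# Hypothetical times for tree2 and the enlarged Tree1', and tree2's size.
EFFICIENCY = inst_efficiency(EXE_TREE1, 4.0, 8.5, 10)  # (5.75 + 4.0 - 8.5) / 10
```

The first path is limited by its dependence chain and the second by resources, so the `max` picks a different bound for each. Because both bounds can be computed from the dependence graph and the machine model, no actual scheduling is needed to rank candidates.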


Motivating Results from LEGO: 129.compress

[Figure: static IPC (axis 2 to 3) vs. relative code size (axis 0.8 to 2.6) for 129.compress, showing the best ILP for a given code size; data points are labeled 0%, 2%, 5%, 30%, and 80%]


Motivating Results from LEGO: 147.vortex

[Figure: static IPC (axis 2.5 to 3.5) vs. relative code size (axis 0.8 to 2.2) for 147.vortex, showing the best ILP for a given code size; data points are labeled 0%, 2%, 5%, 30%, and 80%]

Reason: only a very small part of the program is frequently executed.


Optimal Code Size Efficiency

• Point where the ‘diminishing returns’ start
• Range between tan(π/12) (.268) and tan(π/6) (.577)

[Figure: IPC vs. code size curve, with labels A and l]
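One way to read this slide is as a stopping rule: keep enlarging regions while each step's instantaneous efficiency stays at or above a threshold chosen from the stated range. The sketch below is an assumption about how such a rule could be applied, not the talk's implementation; the curve values and the `last_worthwhile_point` name are hypothetical, and the slope is taken in whatever units the curve uses.

```python
import math

LOW_THRESHOLD = math.tan(math.pi / 12)   # about 0.268
HIGH_THRESHOLD = math.tan(math.pi / 6)   # about 0.577

# Hypothetical (relative code size, static IPC) points; not measured data.
CURVE = [(1.0, 2.0), (1.25, 2.5), (1.5, 2.75), (2.0, 2.875), (2.5, 2.9375)]


def last_worthwhile_point(curve, threshold=LOW_THRESHOLD):
    """Index of the last point reached before the slope drops below threshold."""
    for k in range(len(curve) - 1):
        (size0, ipc0), (size1, ipc1) = curve[k], curve[k + 1]
        if (ipc1 - ipc0) / (size1 - size0) < threshold:
            return k
    return len(curve) - 1


STOP_INDEX = last_worthwhile_point(CURVE)
```

On this curve the step slopes are 2.0, 1.0, 0.25, and 0.125, so with the lower threshold the rule stops at index 2: the step after it gains less than 0.268 IPC per unit of code size.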


Single Entry Acyclic Regions (SEARs)

• Prompted by the recent region-extension code
• Form larger regions by merging treegions
  – If and only if the predecessor edges entering the child treegion are exit edges from the parent region
• Prevents side entrances (and the need for compensation code)
• The region now contains merge points, so it is no longer a Treegion
• SEAR formation happens after tail duplication, but the instantaneous code size efficiency (ICSE) heuristic is still applicable
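The merge condition can be sketched as a subset test: a child treegion may join a parent region only when every predecessor of the child's root already lies inside the parent. The code below is a minimal illustration under that reading, with hypothetical names and a hypothetical predecessor map for the BB0 to BB8 example; it assumes back edges have already been excluded from the predecessor sets.

```python
def form_sears(treegions, preds):
    """Greedily merge treegions into single-entry acyclic regions.

    treegions: list of block lists, root first, in formation order.
    preds: maps each treegion root to the set of its predecessor blocks.
    """
    remaining = [list(t) for t in treegions]
    sears = []
    while remaining:
        sear = remaining.pop(0)
        members = set(sear)
        merged = True
        while merged:
            merged = False
            for child in remaining:
                root_preds = preds[child[0]]
                # Every edge into the child must leave the current region,
                # otherwise merging would create a side entrance.
                if root_preds and root_preds <= members:
                    sear.extend(child)
                    members.update(child)
                    remaining.remove(child)
                    merged = True
                    break
        sears.append(sear)
    return sears


# Hypothetical treegions and root predecessors for the BB0..BB8 example.
TREEGIONS = [["BB0", "BB1", "BB2", "BB3", "BB4"], ["BB5", "BB6", "BB7"], ["BB8"]]
ROOT_PREDS = {"BB0": set(), "BB5": {"BB1", "BB3", "BB4"}, "BB8": {"BB6", "BB7"}}
SEARS = form_sears(TREEGIONS, ROOT_PREDS)
```

With these assumed edges all three treegions collapse into one region: BB5's predecessors are all in the first treegion, and once BB5 to BB7 are merged, BB8's predecessors are all inside as well.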


[Figure: the example CFG (BB0 to BB8) merged into a single region, SEAR 0]


Tree Traversal Scheduling

• Final step is to (Haifa) schedule instructions
• Other region formation techniques optimize likely path while off path suffers
• Prioritize likely path (based on exec frequency)
• TTS Algorithm:
  1. Sort basic blocks in depth-first traversal order
  2. List schedule beginning at root block
  3. Consider speculation of instructions dominated by current block
  4. Repeat step 3 until all blocks scheduled
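Step 1 of the algorithm can be sketched on its own. The version below visits the more frequently executed child first, which is one reading of "prioritize likely path" and should be treated as an assumption; the tree shape, the frequencies, and the function name are hypothetical. Steps 2 to 4 (list scheduling and speculation) are left to the scheduler and are not shown.

```python
def tts_block_order(children, freq, root):
    """Depth-first order of a treegion's blocks, most frequent child first.

    children maps a block to its child blocks inside the treegion;
    freq maps a block to its profiled execution frequency.
    """
    order = []
    stack = [root]
    while stack:
        block = stack.pop()
        order.append(block)
        # Push the least frequent child first so the most frequent one
        # ends up on top of the stack and is visited next.
        stack.extend(sorted(children.get(block, []), key=lambda b: freq[b]))
    return order


# Hypothetical treegion and profile counts.
TREE = {"BB0": ["BB1", "BB2"], "BB2": ["BB3", "BB4"]}
FREQ = {"BB0": 100, "BB1": 30, "BB2": 70, "BB3": 60, "BB4": 10}
ORDER = tts_block_order(TREE, FREQ, "BB0")
```

Here the likely path BB0, BB2, BB3 comes first in the order, followed by the colder blocks BB4 and BB1.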


Implementation Details

• Serves as front end for Haifa scheduler
  – Scheduling pass prior to RA
• Primary modifications in sched-rgn.c
• ICSE: data dependence code in sched-deps.c
• Tail duplication: duplication code available in cfglayout.c

[Figure: GCC compilation flow, with stages labeled Front End, Parser, Generic, Gimple, Tree Generation, Tree Optimization (Tree-SSA), RTL Gen/Opt, Schedule 1, Register Alloc, Schedule 2, EBB (IA64), Backend, and ASM]


Compile Time Considerations

• Efficient updating of region structures
• Calculating ICSE is expensive
  – Requires building DDG to find dependence bound
• Compile time considerations:
  – Limit duplication based on:
    • Number of blocks/instructions
    • Code expansion
    • ICSE threshold
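Because the ICSE calculation is the expensive part, a natural ordering is to apply the cheap limits first and compute ICSE only for candidates that survive them. The sketch below illustrates that ordering; every name and default value in it is hypothetical, not a GCC parameter.

```python
from dataclasses import dataclass


@dataclass
class DuplicationLimits:
    # All defaults are illustrative placeholders, not tuned values.
    max_blocks: int = 8
    max_insns: int = 64
    max_code_expansion: float = 1.5
    min_icse: float = 0.268


def may_duplicate(limits, n_blocks, n_insns, original_size, current_size, compute_icse):
    """Decide whether a candidate tail may be duplicated.

    compute_icse is a zero-argument callable so that the costly dependence
    graph construction only happens when the cheap checks pass.
    """
    if n_blocks > limits.max_blocks or n_insns > limits.max_insns:
        return False
    if (current_size + n_insns) / original_size > limits.max_code_expansion:
        return False
    return compute_icse() >= limits.min_icse


LIMITS = DuplicationLimits()
ACCEPTED = may_duplicate(LIMITS, 3, 12, 100, 110, lambda: 0.4)
```

Passing the ICSE computation as a callable keeps the predicate cheap for candidates that are rejected on size alone.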


Region Formation Statistics

• Significant increase in region size
• Increase in the percentage of blocks contained within multi-block regions
• Limited additional duplication using ICSE
• Without code size consideration, region size may become extremely large


Speedup


Operation Combining

• Tail duplication increases code size
• General operation combining reduces code size


Automata Based Pipeline Description

• An integral part of achieving speedup is knowing what the hardware can do
• Only needs to be described once per processor
• The compiler can have a generic coding
• Solution must not be memory intensive
• This method is used in GCC
  – But the DFA representation of GCC is very complex.


HMDES Machine Description Language

• Created by John Gyllenhaal at UIUC
• Provides a flexible way to model the pipeline for a variety of processors
• User-friendly yet powerful
• HMDES is converted to LMDES, which is then interfaced with the IMPACT compiler


GCC Machine Description Example

[Figure: state diagram of an integer and FP unit pipeline (states START and DECODE, FP and INT branches, transitions labeled Next Cycle), shown next to the GCC DFA representation of this state diagram]


MDES Machine Description Example

[Figure: state diagram of an integer and FP unit pipeline (states START and DECODE, FP and INT branches, transitions labeled Next Cycle), shown next to the MDES representation of this state diagram]


Conclusions

• Natural Treegion formation provides significantly larger regions (SEARs extend upon Treegions)

• Application of ICSE to tail duplication – (has the potential for) good tradeoff between code size, compile time, and performance

• What’s missing?


Questions & Contact Information

TINKER Microarchitecture and Compiler Research Group
North Carolina State University
http://www.tinker.ncsu.edu

Chad Rosier [email protected]
Conte [email protected]
V. Iyer [email protected]