# Topic 2b Basic Back-End Optimization

• View
29

• Category

## Documents

Tags:

• #### loop scheduling swp

Embed Size (px)

DESCRIPTION

Topic 2b Basic Back-End Optimization. Register allocation. Slides: Topic 3a Dragon book: chapter 10 S. Cooper: Chapter 13 Other papers as assigned in class or homework. Reading List. Focus of This Topic. We focus on “scalar register allocation” - PowerPoint PPT Presentation

### Text of Topic 2b Basic Back-End Optimization

• *\course\cpeg421-10F\Topic-2b.ppt*Topic 2b Basic Back-End OptimizationRegister allocation

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Reading ListSlides: Topic 3aDragon book: chapter 10 S. Cooper: Chapter 13Other papers as assigned in class or homework

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Focus of This TopicWe focus on scalar register allocationLocal register is straightforward (read Coopers Section 13.3)This global register allocation problem is essentially solved by graph coloring techniques:Chaitin et. al. 1981, 82 (IBM)Chow, Hennesy 1983 (Stanford)Briggs, Kennedy 1992 (Rice)Register allocation for array variables in loops -- subject not discussed here

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Interprocedural Analysis and OptimizationLoop Nest Optimization and ParallelizationGlobal OptimizationCode GenerationFront endGood IRHigh-Level Compiler Infrastructure Needed A Modern View

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Good IPOGood LNOGood global optimizationGood integration of IPO/LNO/OPTSmooth information passing between FE and CGComplete and flexible support of inner-loop scheduling (SWP), instruction scheduling and register allocation Inter-ProceduralOptimization (IPO)Loop NestOptimization (LNO)Global Optimization(OPT)SourceInnermostLoopschedulingGlobal instschedulingReg allocLocal instschedulingExecutableArchModelsCGMEGeneral Compiler Framework

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*A Map of Modern Compiler PlatformsGNU CompilersIMPACT CompilerCydra VLIWCompilerMultiflow VLIW CompilerUcode CompilerChow/HennessyHP ResearchCompiler SGI Pro Compiler - Designed for ILP/MP - Production quality - Open SourceTrimaranCompilerSUIF CompilerLLVM CompilerOpen64 Compiler (PathScale, ORC, Osprey)

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Osprey Compiler Performance (4/3/07)GCC4.3 at O3 With additional options recommended by GCC developersTwo programs has runtime error using additional optionsOsprey3.1 with vanilla O3The performance delta is ~10%, excluding two failing programsCurtesy: S.M. Liu (HP)

\course\cpeg421-10F\Topic-2b.ppt

Chart1

900705

1016954

1132176.gcc

10861090

936945

804815

1074907

883871

641571

1223255.vortex

951800

10831097

923954

26291996

397375

1144873

953.7900062392868.6840686344

opencc3.1 -O3

gcc4.3 -O3 -funroll-loops -fgcse-las -fgcse-sm

SPEC2000 C/C++ Benchmark ComparisonMontecito 1.6GHz 4G Memory (higher is better)

Sheet1

opencc O3gcc4.3 O3 -funroll-loops -fgcse-las -fgcse-smopencc -O3/gcc -O3C/C++

int:timeratiotimeratioopencc3.1 -O3gcc4.3 -O3 -funroll-loops -fgcse-las -fgcse-sm

164.gzip1569001997051.27659574471.2765957447164.gzip900705

175.vpr13810161479541.06498951781.0649895178175.vpr1016954

176.gcc97.21132RE176.gcc1132

181.mcf166108616510900.99633027520.9963302752181.mcf10861090

186.crafty1079361069450.99047619050.9904761905186.crafty936945

197.parser2248042218150.98650306750.9865030675197.parser804815

252.eon12110741439071.1841234841.184123484252.eon1074907

253.perlbmk2048832078711.01377726751.0137772675253.perlbmk883871

254.gap1726411935711.1225919441.122591944254.gap641571

255.vortex1551223255.vortex1223

256.bzip21589511888001.188751.18875256.bzip2951800

300.twolf277108327310970.98723792160.9872379216300.twolf10831097

9568611.0766680749177.mesa923954

fp:179.art26291996

168.wupwise2666023254931.2210953347183.equake397375

171.swim12025915225944.361952862188.ammp1144873

172.mgrid12014976372835.2897526502Geomean953.7900062392868.6840686344

173.applu17811814244952.3858585859979.1533788234868.6840686344

177.mesa1529231479540.96750524110.96750524111.0979711044

178.galgel58.249851.1271685693

179.art98.9262913019961.31713426851.3171342685

183.equake3283973473751.05866666671.0586666667

187.facerec4364364344380.99543379

188.ammp19211442528731.31042382591.3104238259

189.lucas14014332557851.825477707

191.fma3d6533227772701.1925925926

200.sixtrack1616825551983.4444444444

301.apsi24510593926631.5972850679

geomean1050537.38058869481.74747655011.0979711044

spec2006

int:

400.perlbench15416.3416166.041.0496688742

401.bzip213207.3113886.951.0517985612

403.gcc12156.6311676.90.9608695652

429.mcf20534.4421494.241.0471698113

445.gobmk13467.7912568.360.9318181818

456.hmmer10738.6912377.541.1525198939

458.sjeng16637.2816377.390.9851150203

462.libquantum68573.0266963.090.9773462783

464.h264ref154714.3152114.60.9794520548

471.omnetpp15064.1513834.520.9181415929

473.astar8688.0910606.621.2220543807

483.xalancbmk16784.1115674.40.9340909091

6.31293179586.22684801641.0138246155

fp:

410.bwavesRE30544.45

416.gamess27767.0530826.35

433.milc25413.6126073.521.0255681818

434.zeusmp18045.0432672.79

435.gromacs8688.2219643.64

437.leslie3d1045942192.23

444.namd9198.7212966.191.408723748

447.dealII16836.821115.421.2546125461

450.soplex18714.4618834.431.006772009

453.povray6837.797826.81.1455882353

454.calculix20873.9522343.69

459.GemsFDTD21994.8242152.52

465.tonto10709.223324.22

470.lbm8571619607.012.2824536377

481.wrfRECE

482.sphinx3144813.521149.221.464208243

1.1173802018

Sheet2

Sheet3

• *\course\cpeg421-10F\Topic-2b.ppt*Vision and Status of Open64 Today ?People should view it as GCC with an alternative backend with great potential to reclaim the best compiler in the worldThe technology incorporated all top compiler optimization research in 90'sIt has regain momentum in the last three years due to Pathscale, HP and AMDs (and others) investment in robustness and performanceTargeted to x86, Itanium in the public repository, ARM, MIPS, PowerPC, and several other signal processing CPU in private branches

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Register AllocationMotivationLive ranges and interference graphsProblem formulationSolution methods

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Motivation Registers much faster than memory Limited number of physical registers Keep values in registers as long as possible (minimize number of load/stores executed)

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Goals of Optimized Register Allocation1 Pay careful attention to allocating registers to variables that are more profitable to reside in registers2 Use the same register for multiple variables when legal to do so

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Brief History of Register AllocationChaitin: Coloring Heuristic. Use the simple stack heuristic forACM register allocation. Spill/no-spillSIGPLAN decisions are made during theNotices stack construction phase of the1982 algorithmBriggs: Finds out that Chaitins algorithmPLDI spills even when there are available1989 registers. Solution: the optimistic approach: may-spill during stack construction, decide at spilling time.

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Brief History of Register Allocation (Cont)Callahan: Hierarchical Coloring Graph,PLDI register preference,1991 profitability of spilling.Chow-Hennessy: Priority-based coloring.SIGPLAN Integrate spilling decisions in the1984 coloring decisions: spill a variableASPLOS for a limited life range.1990 Favor dense over sparse use regions. Consider parameter passing convention.

\course\cpeg421-10F\Topic-2b.ppt

• *\course\cpeg421-10F\Topic-2b.ppt*Assigning Registers to more Profitable Variables (example)c = S ;sum = 0 ;i = 1 ;while ( i
• *\course\cpeg421-10F\Topic-2b.ppt*[105] sum := sum + i[106]i := i + 1[107]goto L1[100] c = S[101]sum := 0[102]i := 1[103]label L1:[104]if i > 100 goto L2[108]label L2:[109]square = sum * sum[110]print c, sum, squareThe Control Flow Graph of the Examplec = S ;sum = 0 ;i = 1 ;while ( i
• *\course\cpeg421-10F\Topic-2b.ppt*Desired Register Allocation for ExampleAssume that there are only two non-reserved registers available for allocation (\$t2 and \$t3). A desired register allocation for the above example is as follows:VariableRegistercno registersum\$t2i\$t3square\$t3c = S ;sum = 0 ;i = 1 ;while ( i
• *\course\cpeg421-10F\Topic-2b.ppt*1. Pay careful attention to assigning registers to variables that are more profitableThe number of defs (writes) and uses (reads) to the variables in this sample program is as follows:

Register Allocation GoalsVariable#defs#uses c 1 1 sum101103 i101301 square 1 1 variables sum and i should get priority over variable c for register assignment.

c = S ;sum = 0 ;i = 1 ;while ( i

• *\course\cpeg421-10F\Topic-2b.ppt*2. Use the same register for multiple variables when legal to do so

Reuse same register (\$t3) for variables I and square since there is no point in the program where both variables are simultaneously live.Register Allocation GoalsVariableRegistercno registersum\$t2i\$t3square\$t3c = S ;sum = 0 ;i = 1 ;while ( i

• *\course\cpeg421-10F\Topic-2b.ppt*

Register Allocation vs. Register AssignmentRegister Allocation determining which values should be kept i

Recommended

Documents
##### Gsb728 lecture note topic 2b
Economy & Finance
Documents
##### 2b or not 2b
Presentations & Public Speaking
Documents
Education
Documents
Education
Documents
Documents
Documents
Documents