Topic 2b Basic Back-End Optimization

  • *\course\cpeg421-10F\Topic-2b.ppt* Topic 2b: Basic Back-End Optimization
    Register allocation

  • Reading List
    - Slides: Topic 3a
    - Dragon book: Chapter 10
    - S. Cooper: Chapter 13
    - Other papers as assigned in class or homework

  • Focus of This Topic
    - We focus on scalar register allocation
    - Local register allocation is straightforward (read Cooper's Section 13.3)
    - The global register allocation problem is essentially solved by graph coloring techniques:
        Chaitin et al. 1981, 82 (IBM)
        Chow, Hennessy 1983 (Stanford)
        Briggs, Kennedy 1992 (Rice)
    - Register allocation for array variables in loops: subject not discussed here

  • High-Level Compiler Infrastructure Needed: A Modern View
    - Interprocedural Analysis and Optimization
    - Loop Nest Optimization and Parallelization
    - Global Optimization
    - Code Generation
    - Front end
    - Good IR

  • General Compiler Framework
    - Good IPO
    - Good LNO
    - Good global optimization
    - Good integration of IPO/LNO/OPT
    - Smooth information passing between FE and CG
    - Complete and flexible support of inner-loop scheduling (SWP), instruction scheduling and register allocation
    Diagram labels: Source; Inter-Procedural Optimization (IPO); Loop Nest Optimization (LNO); Global Optimization (OPT); Innermost loop scheduling; Global inst scheduling; Reg alloc; Local inst scheduling; Executable; Arch Models; CG; ME

  • A Map of Modern Compiler Platforms
    - GNU Compilers
    - IMPACT Compiler
    - Cydra VLIW Compiler
    - Multiflow VLIW Compiler
    - Ucode Compiler, Chow/Hennessy
    - HP Research Compiler
    - SGI Pro Compiler: designed for ILP/MP, production quality, open source
    - Trimaran Compiler
    - SUIF Compiler
    - LLVM Compiler
    - Open64 Compiler (PathScale, ORC, Osprey)

  • Osprey Compiler Performance (4/3/07)
    - GCC4.3 at O3, with additional options recommended by GCC developers
    - Two programs have runtime errors when using the additional options
    - Osprey3.1 with vanilla O3
    - The performance delta is ~10%, excluding the two failing programs
    - Courtesy: S.M. Liu (HP)

    Chart: SPEC2000 C/C++ Benchmark Comparison, Montecito 1.6GHz, 4G Memory (higher is better)

    Benchmark      opencc3.1 -O3    gcc4.3 -O3 -funroll-loops -fgcse-las -fgcse-sm
    164.gzip            900              705
    175.vpr            1016              954
    176.gcc            1132              RE (runtime error)
    181.mcf            1086             1090
    186.crafty          936              945
    197.parser          804              815
    252.eon            1074              907
    253.perlbmk         883              871
    254.gap             641              571
    255.vortex         1223              -
    256.bzip2           951              800
    300.twolf          1083             1097
    177.mesa            923              954
    179.art            2629             1996
    183.equake          397              375
    188.ammp           1144              873
    Geomean             953.79           868.68

    The geomean row excludes 176.gcc and 255.vortex; the opencc/gcc geomean ratio is 1.098.
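
The geometric means and the ~10% delta quoted for this comparison can be re-derived from the per-benchmark scores. The following is a minimal sketch, assuming the scores are the values shown in the chart and that 176.gcc and 255.vortex are left out because they have no gcc4.3 score. The names `scores` and `geomean` are illustrative and do not come from the slides.

```python
import math

# (opencc3.1 -O3, gcc4.3 -O3 ...) scores from the chart.
# 176.gcc and 255.vortex are omitted: they have no gcc4.3 score.
scores = {
    "164.gzip": (900, 705),
    "175.vpr": (1016, 954),
    "181.mcf": (1086, 1090),
    "186.crafty": (936, 945),
    "197.parser": (804, 815),
    "252.eon": (1074, 907),
    "253.perlbmk": (883, 871),
    "254.gap": (641, 571),
    "256.bzip2": (951, 800),
    "300.twolf": (1083, 1097),
    "177.mesa": (923, 954),
    "179.art": (2629, 1996),
    "183.equake": (397, 375),
    "188.ammp": (1144, 873),
}


def geomean(values):
    """Geometric mean, computed through logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


opencc_mean = geomean(o for o, _ in scores.values())
gcc_mean = geomean(g for _, g in scores.values())
ratio = opencc_mean / gcc_mean

print(f"opencc geomean: {opencc_mean:.2f}")
print(f"gcc geomean:    {gcc_mean:.2f}")
print(f"ratio:          {ratio:.3f}")
```

A ratio slightly below 1.10 is what the slide summarizes as a performance delta of about 10%.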

  • Vision and Status of Open64 Today?
    - People should view it as GCC with an alternative backend, with great potential to reclaim the title of best compiler in the world
    - The technology incorporates all the top compiler optimization research of the 90's
    - It has regained momentum in the last three years due to investment in robustness and performance by PathScale, HP, AMD and others
    - Targeted to x86 and Itanium in the public repository, and to ARM, MIPS, PowerPC and several signal processing CPUs in private branches

  • Register Allocation
    - Motivation
    - Live ranges and interference graphs
    - Problem formulation
    - Solution methods

  • Motivation
    - Registers are much faster than memory
    - Limited number of physical registers
    - Keep values in registers as long as possible (minimize the number of loads/stores executed)

  • Goals of Optimized Register Allocation
    1. Pay careful attention to allocating registers to variables that are more profitable to keep in registers
    2. Use the same register for multiple variables when legal to do so

  • Brief History of Register Allocation
    - Chaitin (ACM SIGPLAN Notices, 1982): Coloring heuristic. Use the simple stack heuristic for register allocation. Spill/no-spill decisions are made during the stack construction phase of the algorithm.
    - Briggs (PLDI 1989): Finds that Chaitin's algorithm spills even when there are available registers. Solution: the optimistic approach: may-spill during stack construction, decide at spilling time.
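
The difference between the two stack-based heuristics can be illustrated on a small graph. The following is a minimal sketch, not either paper's actual algorithm. It assumes the interference graph is a dict of neighbour sets, picks the highest-degree node as the spill candidate (a simplification; real allocators weigh spill cost), and uses the hypothetical name `color_graph`.

```python
def color_graph(graph, k, optimistic=True):
    """Stack-based coloring of `graph` (node -> set of neighbours) with k colors.

    optimistic=False: a blocked node is spilled during stack construction.
    optimistic=True:  a blocked node is pushed anyway (may-spill) and only
                      spilled if no color is free when it is popped.
    Returns (coloring, spilled).
    """
    work = {n: set(adj) for n, adj in graph.items()}
    stack, spilled = [], []

    while work:
        easy = [n for n in sorted(work) if len(work[n]) < k]
        if easy:
            node, push = easy[0], True
        else:
            # Assumed spill-candidate choice: highest remaining degree.
            node = max(sorted(work), key=lambda n: len(work[n]))
            push = optimistic
        if push:
            stack.append(node)
        else:
            spilled.append(node)
        for m in work.pop(node):        # remove node from the working graph
            work[m].discard(node)

    coloring = {}
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in graph[node] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)
    return coloring, spilled


# A 4-cycle: every node has degree 2, yet two colors are enough.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}

pessimistic_coloring, pessimistic_spills = color_graph(cycle, 2, optimistic=False)
optimistic_coloring, optimistic_spills = color_graph(cycle, 2, optimistic=True)
print("spill during stack construction:", pessimistic_spills)
print("optimistic:", optimistic_spills, optimistic_coloring)
```

On the 4-cycle no node has degree below 2, so the first variant spills a node immediately, while the optimistic variant defers the decision and finds that two colors suffice.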

  • Brief History of Register Allocation (Cont.)
    - Callahan (PLDI 1991): Hierarchical coloring graph, register preference, profitability of spilling.
    - Chow-Hennessy (SIGPLAN 1984, ASPLOS 1990): Priority-based coloring. Integrate spilling decisions in the coloring decisions: spill a variable for a limited life range. Favor dense over sparse use regions. Consider parameter passing convention.
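
The core idea of priority-based coloring, handing out registers in order of estimated benefit and spilling whatever cannot be placed, can be sketched in a few lines. This is a heavily simplified illustration under stated assumptions: the live ranges `w`..`z`, their priorities, and the register names `r0`/`r1` are hypothetical, and the sketch omits limiting a spill to part of a life range, which the slide lists as part of the approach.

```python
def priority_color(graph, priority, registers):
    """Assign registers to live ranges in decreasing priority order.

    graph:     live range -> set of interfering live ranges
    priority:  live range -> estimated benefit of keeping it in a register
    registers: list of available register names
    Returns (assignment, spilled).
    """
    assignment, spilled = {}, []
    # Highest priority first; ties are broken by name for determinism.
    for node in sorted(graph, key=lambda n: (-priority[n], n)):
        taken = {assignment[m] for m in graph[node] if m in assignment}
        free = [r for r in registers if r not in taken]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)    # spilling is decided while coloring
    return assignment, spilled


# Hypothetical interference graph and priorities.
graph = {
    "w": {"x", "y", "z"},
    "x": {"w", "y", "z"},
    "y": {"w", "x"},
    "z": {"w", "x"},
}
priority = {"w": 2, "x": 204, "y": 402, "z": 2}

assignment, spilled = priority_color(graph, priority, ["r0", "r1"])
print(assignment, spilled)
```

With two registers, the two high-priority live ranges are placed first, the low-priority range that interferes with both is spilled, and the remaining one reuses a register.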

  • Assigning Registers to more Profitable Variables (example)
        c = S ;
        sum = 0 ;
        i = 1 ;
        while ( i <= 100 ) {
            sum = sum + i ;
            i = i + 1 ;
        }
        square = sum * sum ;
        print c, sum, square ;

  • The Control Flow Graph of the Example
        [100] c = S
        [101] sum := 0
        [102] i := 1
        [103] label L1:
        [104] if i > 100 goto L2
        [105] sum := sum + i
        [106] i := i + 1
        [107] goto L1
        [108] label L2:
        [109] square = sum * sum
        [110] print c, sum, square
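
The control flow graph for the numbered three-address code can be derived mechanically with the usual leader rule: a block starts at the first instruction, at every label, and after every jump. The following is a minimal sketch under that assumption; the helper names `basic_blocks` and `successors` are illustrative, and the string matching is only meant to handle this particular listing.

```python
tac = [
    (100, "c = S"),
    (101, "sum := 0"),
    (102, "i := 1"),
    (103, "label L1:"),
    (104, "if i > 100 goto L2"),
    (105, "sum := sum + i"),
    (106, "i := i + 1"),
    (107, "goto L1"),
    (108, "label L2:"),
    (109, "square = sum * sum"),
    (110, "print c, sum, square"),
]


def basic_blocks(code):
    """Split the instruction list into basic blocks of instruction numbers."""
    leaders = {code[0][0]}
    for idx, (num, text) in enumerate(code):
        if text.startswith("label"):
            leaders.add(num)                    # a jump target starts a block
        if "goto" in text and idx + 1 < len(code):
            leaders.add(code[idx + 1][0])       # so does the code after a jump
    blocks = []
    for num, _ in code:
        if num in leaders:
            blocks.append([])
        blocks[-1].append(num)
    return blocks


def successors(code, blocks):
    """Map each block index to the sorted list of its successor block indices."""
    text_of = dict(code)
    label_block = {}
    for b, block in enumerate(blocks):
        first = text_of[block[0]]
        if first.startswith("label"):
            label_block[first.split()[1].rstrip(":")] = b
    succ = {}
    for b, block in enumerate(blocks):
        last = text_of[block[-1]]
        targets = set()
        if "goto" in last:
            targets.add(label_block[last.split()[-1]])
        if not last.startswith("goto") and b + 1 < len(blocks):
            targets.add(b + 1)                  # fall through to the next block
        succ[b] = sorted(targets)
    return succ


blocks = basic_blocks(tac)
succ = successors(tac, blocks)
for b, block in enumerate(blocks):
    print(f"block {b}: {block} -> {succ[b]}")
```

The loop shows up as the back edge from the block ending in `goto L1` to the block that starts at `label L1:`.
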
  • Desired Register Allocation for Example
    Assume that there are only two non-reserved registers available for allocation ($t2 and $t3). A desired register allocation for the above example is as follows:

    Variable   Register
    c          no register
    sum        $t2
    i          $t3
    square     $t3

  • Register Allocation Goals
    1. Pay careful attention to assigning registers to variables that are more profitable.
    The number of defs (writes) and uses (reads) of the variables in this sample program is as follows:

    Variable   #defs   #uses
    c            1       1
    sum        101     103
    i          101     301
    square       1       1

    Variables sum and i should get priority over variable c for register assignment.
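
The def and use counts can be checked by walking through the example program and tallying every write and read as it executes. The following is a minimal sketch, assuming the loop runs while `i` is at most 100 (matching `if i > 100 goto L2`) and that the input `S` is not tracked. The function name `count_defs_uses` is illustrative.

```python
from collections import Counter


def count_defs_uses(limit=100):
    """Execute the example program, counting dynamic defs and uses per variable."""
    defs, uses = Counter(), Counter()

    defs["c"] += 1                  # c = S (the input S is not tracked)
    defs["sum"] += 1                # sum = 0
    total = 0
    defs["i"] += 1                  # i = 1
    i = 1

    while True:
        uses["i"] += 1              # loop test reads i
        if i > limit:
            break
        uses["sum"] += 1            # sum = sum + i reads sum and i
        uses["i"] += 1
        defs["sum"] += 1
        total += i
        uses["i"] += 1              # i = i + 1 reads and writes i
        defs["i"] += 1
        i += 1

    uses["sum"] += 2                # square = sum * sum
    defs["square"] += 1
    uses["c"] += 1                  # print c, sum, square
    uses["sum"] += 1
    uses["square"] += 1
    return defs, uses


defs, uses = count_defs_uses()
for var in ("c", "sum", "i", "square"):
    print(f"{var:6} defs={defs[var]:3} uses={uses[var]:3}")
```

The loop test executes 101 times and the body 100 times, which is where the 301 uses of `i` come from: 101 tests plus two reads per iteration.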

  • Register Allocation Goals
    2. Use the same register for multiple variables when legal to do so.
    Reuse the same register ($t3) for variables i and square, since there is no point in the program where both variables are simultaneously live.

    Variable   Register
    c          no register
    sum        $t2
    i          $t3
    square     $t3
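
Whether two variables are ever simultaneously live can be checked with a backward liveness analysis over the example's control flow graph. The following is a minimal sketch, assuming the four basic blocks B1 to B4 implied by the three-address code, with each instruction reduced to a (defined variable, used variables) pair and the input `S` ignored. The names `live_in_of`, `interference` and `conflicts` are illustrative.

```python
# Each instruction is (defined variable or None, set of used variables).
blocks = {
    "B1": [("c", set()), ("sum", set()), ("i", set())],             # [100]-[102]
    "B2": [(None, {"i"})],                                          # [104]
    "B3": [("sum", {"sum", "i"}), ("i", {"i"})],                    # [105]-[106]
    "B4": [("square", {"sum"}), (None, {"c", "sum", "square"})],    # [109]-[110]
}
succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}


def live_in_of(block, live_out):
    """Propagate liveness backwards through one block."""
    live = set(live_out)
    for d, u in reversed(block):
        live.discard(d)
        live |= u
    return live


def interference(blocks, succ):
    """Return the set of interference edges as frozensets of two variables."""
    live_in = {b: set() for b in blocks}
    changed = True
    while changed:                              # iterate to a fixed point
        changed = False
        for b in blocks:
            out = set().union(*(live_in[s] for s in succ[b]))
            new_in = live_in_of(blocks[b], out)
            if new_in != live_in[b]:
                live_in[b], changed = new_in, True

    edges = set()
    for b in blocks:
        live = set().union(*(live_in[s] for s in succ[b]))
        for d, u in reversed(blocks[b]):
            if d is not None:
                # A definition interferes with everything live after it.
                edges |= {frozenset((d, v)) for v in live if v != d}
            live.discard(d)
            live |= u
    return edges


def conflicts(edges, assignment):
    """List interfering pairs that were given the same register."""
    bad = []
    for edge in edges:
        a, b = sorted(edge)
        if a in assignment and b in assignment and assignment[a] == assignment[b]:
            bad.append((a, b))
    return bad


edges = interference(blocks, succ)
assignment = {"sum": "$t2", "i": "$t3", "square": "$t3"}    # c stays in memory
print(sorted(tuple(sorted(e)) for e in edges))
print("conflicts:", conflicts(edges, assignment))
```

Since `i` is dead once the loop exits and `square` is only defined afterwards, no edge joins them, so sharing `$t3` is legal.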

  • Register Allocation vs. Register Assignment
    Register Allocation: determining which values should be kept i