Compiler Optimisation 7 – Register Allocation Hugh Leather IF 1.18a [email protected] Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2019

Compiler Optimisation - 7 Register Allocation · 2019. 1. 23. · Register allocation Definitions Spilling Spilling saves a value from a register to memory That register is then


  • Compiler Optimisation 7 – Register Allocation

    Hugh Leather, IF 1.18a

    [email protected]

    Institute for Computing Systems Architecture, School of Informatics

    University of Edinburgh

    2019

  • Introduction

    This lecture:
      Local allocation - spill code
      Global allocation based on graph colouring
      Techniques to reduce spill code

  • Register allocation

    Physical machines have a limited number of registers
    Scheduling and selection typically assume infinite registers
    Register allocation and assignment: ∞ → k registers

    Requirements
      Produce correct code that uses k (or fewer) registers
      Minimise added loads and stores
      Minimise space used to hold spilled values
      Operate efficiently: O(n), O(n log2 n), maybe O(n^2), but not O(2^n)

  • Register allocation: Definitions

    Allocation versus assignment
      Allocation is deciding which values to keep in registers
      Assignment is choosing specific registers for values

    Interference
      Two values [a] cannot be mapped to the same register wherever they are both live [b]
      Such values are said to interfere

      [a] A value is stored in a variable
      [b] A value is live from its definition to its last use

    Live range
      The live range of a value is the set of statements at which it is live
      May be conservatively overestimated (e.g. just begin → end)

  • Register allocation: Definitions

    Spilling
      Spilling saves a value from a register to memory
      That register is then free; another value is often loaded into it
      Requires F registers to be reserved

    Clean and dirty values
      A previously spilled value is clean if not changed since last spill
      Otherwise it is dirty
      A clean value can be spilled without a new store instruction

    Spilling in ILOC
      F is 0 (assuming rarp already reserved)
      Dirty value:
        storeAI rx ⇒ rarp,@x
        loadAI rarp,@y ⇒ ry
      Clean value:
        loadAI rarp,@y ⇒ ry
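The clean/dirty distinction can be sketched as a small code generator. This is only an illustration: the function name `spill_code` and the string form of the ILOC instructions are assumptions made here, not part of the lecture.

```python
def spill_code(evicted, wanted, dirty):
    """Return the ILOC needed to free r<evicted> and bring <wanted> into a register.

    Hypothetical helper: values are named by a suffix such as "x", living in
    register r<name> and at offset @<name> from rarp.
    """
    code = []
    if dirty:
        # A dirty value has no up-to-date copy in memory, so it must be stored first.
        code.append(f"storeAI r{evicted} ⇒ rarp,@{evicted}")
    # A clean value needs no store: only the load of the wanted value remains.
    code.append(f"loadAI rarp,@{wanted} ⇒ r{wanted}")
    return code


for line in spill_code("x", "y", dirty=True):
    print(line)
```

Calling it with `dirty=False` yields only the `loadAI`, which is the saving that makes clean values cheaper to evict.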

  • Local register allocation

    Register allocation restricted to a single basic block

    MAXLIVE
      Let MAXLIVE be the maximum, over each instruction i in the block, of the number of values (pseudo-registers) live at i.

      If MAXLIVE ≤ k, allocation should be easy
      If MAXLIVE ≤ k, no need to reserve F registers for spilling
      If MAXLIVE > k, some values must be spilled to memory
      If MAXLIVE > k, need to reserve F registers for spilling

    Two main forms:
      Top down
      Bottom up

  • Local register allocation: MAXLIVE

    Example MAXLIVE computation
      [Figure: some simple code with virtual registers, annotated with the live registers at each instruction]
      MAXLIVE is 4
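A minimal sketch of a MAXLIVE computation follows. The block below is a hypothetical five-instruction example (the code from the slide figure is not available), the `(dest, uses)` tuple encoding is an assumption, and "live at i" is taken to mean live on entry to instruction i, with nothing live out of the block.

```python
def max_live(block):
    """block: list of (dest, uses) pairs in program order."""
    live = set()   # values live after the current instruction (empty at block end)
    best = 0
    for dest, uses in reversed(block):
        live.discard(dest)    # the value does not exist before its definition
        live.update(uses)     # operands must be live on entry to the instruction
        best = max(best, len(live))
    return best


# Hypothetical block:
#   ra = 1; rb = 2; rc = ra + rb; rd = rc * ra; re = rd + rb
block = [
    ("ra", []),
    ("rb", []),
    ("rc", ["ra", "rb"]),
    ("rd", ["rc", "ra"]),
    ("re", ["rd", "rb"]),
]
print(max_live(block))  # 3: ra, rb and rc are all live on entry to the fourth instruction
```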

  • Local register allocation: Top down

    Algorithm:
      If number of values > k
        Rank values by occurrences
        Allocate first k - F values to registers
        Spill other values

  • Local register allocation: Top down

    Example top down
      [Figure: the example code annotated with usage counts]
      Spill rc. Now only 3 values live at once
      Spill code inserted
      Register assignment straightforward
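The top-down ranking could be sketched as below. The block is hypothetical (the slide's example code is not available), the `(dest, uses)` encoding is an assumption, and ties in the usage count are broken by name purely to make the result deterministic.

```python
from collections import Counter


def top_down(block, k, f):
    """Return (values kept in registers, values spilled) for one basic block."""
    counts = Counter()
    for dest, uses in block:
        counts[dest] += 1        # a definition is one occurrence
        counts.update(uses)      # each use is one occurrence
    if len(counts) <= k:
        return sorted(counts), []          # everything fits, nothing to spill
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return ranked[:k - f], ranked[k - f:]  # F registers stay reserved for spill code


block = [
    ("ra", []),
    ("rb", []),
    ("rc", ["ra", "rb"]),
    ("rd", ["rc", "ra"]),
    ("re", ["rd", "rb"]),
]
kept, spilled = top_down(block, k=3, f=1)
print(kept, spilled)
```

Because the ranking ignores where in the block the occurrences fall, a heavily used value keeps its register for the whole block even if all its uses are clustered together; the bottom-up method addresses that.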

  • Local register allocation: Bottom up

    Algorithm:
      Start with empty register set
      Load on demand
      When no register is available, free one

    Replacement:
      Spill the value whose next use is farthest in the future
      Prefer clean value to dirty value

  • Local register allocation: Bottom up

    Example bottom up
      Spill ra. Now only 3 values live at once
      Spill code inserted
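A rough simulation of the bottom-up policy is sketched here under several assumptions: the block is hypothetical, each value is defined once, k is at least the number of operands of any instruction, and "prefer clean" is applied only as a tie-break between values with equally distant next uses. A dirty value with no further use is dropped without a store.

```python
import math


def next_use(block, start, value):
    """Index of the first instruction at or after `start` that uses `value`."""
    for i in range(start, len(block)):
        if value in block[i][1]:
            return i
    return math.inf


def bottom_up(block, k):
    """Return the spill actions as (instruction index, "store" or "load", value)."""
    in_regs, dirty, log = set(), set(), []

    def make_room(i, start, pinned):
        if len(in_regs) < k:
            return
        # Farthest next use wins; among ties a clean value is preferred.
        victim = max(in_regs - pinned,
                     key=lambda v: (next_use(block, start, v), v not in dirty, v))
        if victim in dirty and next_use(block, start, victim) != math.inf:
            log.append((i, "store", victim))
        in_regs.discard(victim)
        dirty.discard(victim)

    for i, (dest, uses) in enumerate(block):
        for u in uses:
            if u not in in_regs:              # load on demand
                make_room(i, i, set(uses))    # operands of i must stay resident
                log.append((i, "load", u))
                in_regs.add(u)
        make_room(i, i + 1, set())            # the result may reuse an operand's register
        in_regs.add(dest)
        dirty.add(dest)                       # a fresh definition has no copy in memory
    return log


block = [
    ("ra", []),
    ("rb", []),
    ("rc", ["ra", "rb"]),
    ("rd", ["rc", "ra"]),
    ("re", ["rd", "rb"]),
]
print(bottom_up(block, k=2))
```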

  • Global register allocation

    Local allocation does not capture reuse of values across multiple blocks
    Most modern, global allocators use a graph-colouring paradigm

    Build a “conflict graph” or “interference graph”
      Data flow based liveness analysis for interference

    Find a k-colouring for the graph, or change the code to a nearby problem that it can k-colour
      NP-complete under nearly all assumptions [1]

      [1] Local allocation is NP-complete with dirty vs clean

  • Global register allocation: Algorithm sketch

    From live ranges construct an interference graph
    Colour interference graph so that no two neighbouring nodes have same colour
    If graph needs more than k colours - transform code
      Coalesce merge-able copies
      Split live ranges
      Spill
    Colouring is NP-complete so we will need heuristics
    Map colours onto physical registers

  • Global register allocation: Graph colouring

    Definition
      A graph G is said to be k-colourable iff the nodes can be labelled with integers 1 ... k so that no edge in G connects two nodes with the same label

    Examples

  • Global register allocation: Interference graph

    The interference graph, G_I = (N_I, E_I)
      Nodes in G_I represent values, or live ranges
      Edges in G_I represent individual interferences
      ∀ x, y ∈ N_I, x → y ∈ E_I iff x and y interfere [2]

    A k-colouring of G_I can be mapped into an allocation to k registers

      [2] Two values interfere wherever they are both live
          Two live ranges interfere if their values interfere at any point
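One common way to build the edges is to walk the code backwards and connect each defined value to everything live after its definition. The sketch below does this for a single hypothetical basic block only; a global allocator would take the live-out sets from data-flow analysis instead. The `(dest, uses)` encoding and the `live_out` parameter are assumptions.

```python
def interference_edges(block, live_out=()):
    """Return the set of undirected interference edges for straight-line code."""
    live = set(live_out)
    edges = set()
    for dest, uses in reversed(block):
        # dest is live after this instruction together with everything in `live`.
        for other in live - {dest}:
            edges.add(frozenset((dest, other)))
        live.discard(dest)
        live.update(uses)
    return edges


block = [
    ("ra", []),
    ("rb", []),
    ("rc", ["ra", "rb"]),
    ("rd", ["rc", "ra"]),
    ("re", ["rd", "rb"]),
]
for edge in sorted(sorted(e) for e in interference_edges(block)):
    print(edge)
```

Note that ra and rd do not interfere here: ra has its last use in the instruction that defines rd, so the two can share a register.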

  • Global register allocation: Colouring the interference graph

    Degree [3] of a node (n°) is a loose upper bound on colourability
      Any node, n, such that n° < k is always trivially k-colourable

    Trivially colourable nodes cannot adversely affect the colourability of neighbours [4]
      Can remove them from graph
      Reduces degree of neighbours - may be trivially colourable

    If left with any nodes such that n° ≥ k, spill one
      Reduces degree of neighbours - may be trivially colourable

      [3] Degree is number of neighbours
      [4] Proof as exercise

  • Global register allocation: Chaitin’s algorithm

    1 While ∃ vertices with < k neighbours in G_I
        Pick any vertex n such that n° < k and put it on the stack
        Remove n and all edges incident to it from G_I

    2 If G_I is non-empty (n° ≥ k, ∀n ∈ G_I) then:
        Pick vertex n (heuristic), spill live range of n
        Remove vertex n and edges from G_I, put n on “spill list”
        Goto step 1

    3 If the spill list is not empty, insert spill code, then rebuild the interference graph and try to allocate again

    4 Otherwise, successively pop vertices off the stack and colour them in the lowest colour not used by any neighbour

  • Global register allocation: Chaitin’s algorithm

    Example: colouring with Chaitin’s algorithm
      Colour with k = 3 colours
      a° = 2 < k. Choose a
      Push a and remove from graph
      b° = 2 < k and c° = 2 < k. Choose b
      Push b and remove from graph
      c° = 2 < k, d° = 2 < k, and e° = 2 < k. Choose c
      Push c and remove from graph
      d° = 1 < k and e° = 1 < k. Choose d
      Push d and remove from graph
      e° = 0 < k. Choose e
      Push e and remove from graph
      Pop e, neighbours use no colours, choose red
      Pop d, neighbours use red, choose green
      Pop c, neighbours use red and green, choose blue
      Pop b, neighbours use red and green, choose blue
      Pop a, neighbours use blue, choose red
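The simplify-then-select structure can be sketched in a few lines. Several things here are assumptions rather than part of the lecture: the graph's edge set (the figure is not reproduced in this text, so the edges below are simply one graph whose degrees match the steps listed in the example), the spill heuristic (highest degree), the alphabetical tie-break, and the encoding of colours as 0, 1, 2 for red, green, blue. Spill-code insertion and the rebuild of step 3 are left out; the function just reports the spill list.

```python
def chaitin(graph, k):
    """Return (colouring, []) on success or (None, spill_list) if spills are needed."""
    work = {n: set(adj) for n, adj in graph.items()}   # shrinking copy of the graph
    stack, spills = [], []
    while work:
        trivial = sorted(n for n in work if len(work[n]) < k)
        if trivial:
            n = trivial[0]
            stack.append(n)
        else:
            # Every remaining node has degree >= k: spill one (assumed heuristic).
            n = min(work, key=lambda v: (-len(work[v]), v))
            spills.append(n)
        for m in work.pop(n):
            work[m].discard(n)
    if spills:
        return None, spills
    colour = {}
    while stack:
        n = stack.pop()
        used = {colour[m] for m in graph[n] if m in colour}
        colour[n] = min(c for c in range(k) if c not in used)
    return colour, []


# Assumed edge set, consistent with the degrees quoted in the example.
graph = {
    "a": {"b", "c"},
    "b": {"a", "d", "e"},
    "c": {"a", "d", "e"},
    "d": {"b", "c", "e"},
    "e": {"b", "c", "d"},
}
print(chaitin(graph, k=3))
```

With this graph the pushes happen in the order a, b, c, d, e and the pops assign e and a colour 0, d colour 1, and b and c colour 2, which mirrors the red, green, blue choices in the example.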

  • Global register allocation: Optimistic colouring

    If Chaitin’s algorithm reaches a state where every node has k or more neighbours, it chooses a node to spill.

    Example of Chaitin overzealous spilling
      k = 2
      Graph is 2-colourable
      Chaitin must immediately spill one of these nodes

    Briggs said, take that same node and push it on the stack
      When you pop it off, a colour might be available for it!

    Chaitin-Briggs algorithm uses this to colour that graph

  • Global register allocation: Chaitin-Briggs algorithm

    1 While ∃ vertices with < k neighbours in G_I
        Pick any vertex n such that n° < k and put it on the stack
        Remove n and all edges incident to it from G_I

    2 If G_I is non-empty (n° ≥ k, ∀n ∈ G_I) then:
        Pick vertex n (heuristic) (Do not spill)
        Remove vertex n from G_I, put n on stack (Not spill list)
        Goto step 1

    3 Otherwise, successively pop vertices off the stack and colour them in the lowest colour not used by any neighbour
        If some vertex cannot be coloured, then pick an uncoloured vertex to spill, spill it, and restart at step 1

    Step 3 is also different

  • Global register allocation: Chaitin-Briggs algorithm

    Example: colouring with Chaitin-Briggs algorithm
      Colour with k = 2 colours
      a° = 2 ≥ k. Don’t spill! Choose a
      Push a and remove from graph
      b° = 1 < k and c° = 1 < k. Choose b
      Push b and remove from graph
      c° = 1 < k and d° = 1 < k. Choose c
      Push c and remove from graph
      d° = 0 < k. Choose d
      Push d and remove from graph
      Pop d, neighbours use no colours, choose red
      Pop c, neighbours use red, choose green
      Pop b, neighbours use red, choose green
      Pop a, neighbours use green, choose red
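The optimistic variant differs only in what happens when no trivially colourable node is left and when a popped node finds no free colour. The sketch below makes the same kind of assumptions as any small illustration: the 4-cycle edge set is inferred from the degrees in the example rather than taken from the figure, the choice of the node to push optimistically (highest degree, then alphabetical) is an assumed heuristic, and colours 0 and 1 stand for red and green. Spilling and restarting are not implemented; uncoloured nodes are simply returned.

```python
def chaitin_briggs(graph, k):
    """Return (colouring, nodes that could not be coloured)."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        trivial = sorted(n for n in work if len(work[n]) < k)
        # No trivial node: push a candidate anyway instead of spilling it now.
        n = trivial[0] if trivial else min(work, key=lambda v: (-len(work[v]), v))
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)
    colour, uncoloured = {}, []
    while stack:
        n = stack.pop()
        used = {colour[m] for m in graph[n] if m in colour}
        free = [c for c in range(k) if c not in used]
        if free:
            colour[n] = free[0]
        else:
            uncoloured.append(n)   # only now does n become a real spill candidate
    return colour, uncoloured


# Assumed 4-cycle a-b-d-c-a: every node has degree 2, yet the graph is 2-colourable.
cycle = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
print(chaitin_briggs(cycle, k=2))
```

On a triangle with k = 2 the same function returns one uncoloured node, which is the case where a spill is genuinely unavoidable.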

  • Global register allocation: Spill candidates

    Minimise spill cost / degree
      Spill cost is the loads and stores needed, weighted by scope - i.e. avoid inner loops
      The higher the degree of a node to spill, the greater the chance that it will help colouring
      Negative spill cost: load and store to same memory location with no other uses
      Infinite cost: definition immediately followed by use. Spilling does not decrease live range
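A small sketch of the cost / degree choice follows. The weighting of each load or store by 10 to the power of its loop nesting depth is a conventional choice assumed here, not something the slides specify, and the graph and costs are invented for illustration.

```python
import math


def spill_cost(ref_depths, weight=10):
    """Sum of load/store costs, one entry per reference, given its loop depth.

    The factor of 10 per nesting level is an assumed weighting.
    """
    return sum(weight ** depth for depth in ref_depths)


def choose_spill(graph, cost):
    """Pick the node with the smallest cost / degree ratio."""
    candidates = [n for n in graph if graph[n]]   # degree 0 nodes never need spilling
    return min(candidates, key=lambda n: (cost[n] / len(graph[n]), n))


graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
cost = {
    "a": spill_cost([0, 1]),   # one reference outside loops, one inside a loop
    "b": spill_cost([0, 0]),   # two references outside any loop
    "c": math.inf,             # definition immediately followed by use
    "d": spill_cost([1]),      # a single reference inside a loop
}
print(choose_spill(graph, cost))  # b
```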

  • Global register allocation: Alternative spilling

    Splitting live ranges
    Coalesce

  • Global register allocation: Live range splitting

    A whole live range may have many interferences, but perhaps not all at the same time
    Split live range into two variables connected by copy
    Can reduce degree of interference graph
    Smart splitting allows spilling to occur in “cheap” regions

  • Global register allocation: Live range splitting

    Splitting example
      Non-contiguous live ranges - cannot be 2-coloured
      Split live ranges - can be 2-coloured

  • Global register allocation: Coalescing

    If two ranges don’t interfere and are connected by a copy, coalesce them into one; this is the opposite of splitting
      Reduces degree of nodes that interfered with both
      If x := y and x → y ∉ G_I then can combine LRx and LRy

    Eliminates the copy operation
    Reduces degree of LRs that interfere with both x and y
    If a node interfered with both before, coalescing helps
    As it reduces degree, often applied before colouring takes place

  • Global register allocation: Coalescing

    Coalescing can make the graph harder to colour
      Typically, LRxy° > max(LRx°, LRy°)
      If max(LRx°, LRy°) < k and k < LRxy° then LRxy might spill, while LRx and LRy would not spill

  • Global register allocation: Coalescing

    Observation led to conservative coalescing
      Conceptually, coalesce x and y iff x → y ∉ G_I and LRxy° < k
      We can do better

    Coalesce LRx and LRy iff LRxy has < k neighbours with degree ≥ k
      Only neighbours of “significant degree” can force LRxy to spill

    Always safe to perform that coalesce
      Cannot introduce a node of non-trivial degree
      Cannot introduce a new spill
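The conservative test can be written as a predicate over the interference graph. In this sketch "significant degree" is taken to mean degree ≥ k, the graph is a dict of neighbour sets, and the function name and example graphs are hypothetical.

```python
def can_coalesce(graph, x, y, k):
    """Conservative test: merge x and y only if the merged node has fewer than k
    neighbours of significant degree (degree >= k)."""
    if y in graph[x]:
        return False                      # interfering ranges can never be merged
    merged_neighbours = (graph[x] | graph[y]) - {x, y}

    def degree_after(n):
        d = len(graph[n])
        if x in graph[n] and y in graph[n]:
            d -= 1                        # two edges to x and y collapse into one
        return d

    significant = [n for n in merged_neighbours if degree_after(n) >= k]
    return len(significant) < k


# Hypothetical graph where the copy x := y is safe to remove with k = 2.
safe = {"x": {"p"}, "y": {"q"}, "p": {"x"}, "q": {"y"}}
print(can_coalesce(safe, "x", "y", k=2))    # True

# Here the merged node would have three neighbours of degree 2, so the test refuses.
risky = {
    "x": {"p", "q"}, "y": {"r"},
    "p": {"x", "s"}, "q": {"x", "s"}, "r": {"y", "s"},
    "s": {"p", "q", "r"},
}
print(can_coalesce(risky, "x", "y", k=2))   # False
```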

  • Global register allocation: Other approaches

    Top-down uses high level priorities to decide on colouring
    Hierarchical approaches - use control flow structure to guide allocation
    Exhaustive allocation - go through combinatorial options - very expensive but occasional improvement
    Re-materialisation - if easy to recreate a value do so rather than spill
    Passive splitting using a containment graph to make spills effective
    Linear scan - fast but weak; useful for JITs

  • Global register allocation: Ongoing work

    Eisenbeis et al. examining optimality of combined register allocation and scheduling. Difficulty with general control flow
    Partitioned register sets complicate matters. Allocation can require insertion of code which in turn affects allocation. Leupers investigated use of genetic algorithms for TM series partitioned register sets.
    New work by Fabrice Rastello and others. Chordal graphs reduce complexity
    As latency increases see work in combined code generation, instruction scheduling and register allocation

  • Summary

    Local allocation - spill code
    Global allocation based on graph colouring
    Techniques to reduce spill code
