A Global Progressive Register Allocator
David Ryan Koes and Seth Copen Goldstein
School of Computer Science, Carnegie Mellon University
{dkoes,seth}@cs.cmu.edu


Register Allocation Problem

v = 1

w = v + 3

x = w + v

u = v

t = u + x

print(x);

print(w);

print(t);

print(u);

register allocator: maps an unbounded number of program variables onto a limited number of processor registers + slow memory
registers: eax, ebx, ecx, edx, esi, edi, ebp, esp

• spill code optimization
• memory operands
• register preferences
• rematerialization
• live range splitting
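
To make the pressure on registers concrete, the following sketch runs a backward liveness pass over the nine-statement example from this slide and reports how many variables are live at once. The tuple encoding of the program and the function name `live_out_sets` are illustrative assumptions, not part of the talk.

```python
# Each instruction: (defined variable or None, list of used variables).
# This encoding of the slide's example program is an assumption for illustration.
program = [
    ("v", []),          # v = 1
    ("w", ["v"]),       # w = v + 3
    ("x", ["w", "v"]),  # x = w + v
    ("u", ["v"]),       # u = v
    ("t", ["u", "x"]),  # t = u + x
    (None, ["x"]),      # print(x)
    (None, ["w"]),      # print(w)
    (None, ["t"]),      # print(t)
    (None, ["u"]),      # print(u)
]

def live_out_sets(instrs):
    """Backward liveness over straight-line code: live-out set per instruction."""
    live = set()
    result = []
    for dest, uses in reversed(instrs):
        result.append(set(live))  # variables live just after this instruction
        live.discard(dest)        # a definition ends the live range (going backward)
        live.update(uses)         # a use makes the variable live before it
    result.reverse()
    return result

live_out = live_out_sets(program)
max_pressure = max(len(s) for s in live_out)
print(max_pressure)  # 4: w, x, u and t are all live after "t = u + x"
```

With four variables live at once, any machine with fewer than four free registers has to spill, split a live range, or use a memory operand somewhere in this fragment.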

A More Principled Register Allocator
– fully utilize machine description

• explicit and expressive model of costs of allocation for given architecture

– optimal solutions

[Figure: reg alloc and machine description]

Multi-commodity Network Flow: An Expressive Model

Given network (directed graph) with
– cost and capacity on each edge
– sources & sinks for multiple commodities

Find lowest cost flow of commodities

NP-complete for integer flows

Example: edges have unit capacity
[Figure: example network with commodities a and b; edge costs 0 and 1]
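
The problem statement above can be illustrated with a tiny brute-force solver. This is only a sketch under stated assumptions: the network, node names, and costs are hypothetical (the slide's own figure is not recoverable), each commodity sends exactly one unit along a single path, and exhaustive search is used purely because the instance is tiny.

```python
from itertools import product

# Hypothetical network: edge -> (cost, capacity). Every edge has unit capacity.
edges = {
    ("sa", "x"): (0, 1),
    ("sb", "x"): (0, 1),
    ("x", "y"): (0, 1),   # cheap shared edge, room for only one commodity
    ("x", "z"): (1, 1),   # more expensive detour
    ("z", "y"): (0, 1),
    ("y", "ta"): (0, 1),
    ("y", "tb"): (0, 1),
}
commodities = {"a": ("sa", "ta"), "b": ("sb", "tb")}  # name -> (source, sink)

def simple_paths(src, dst, path=None):
    """Yield every cycle-free path from src to dst as a list of edges."""
    path = path or [src]
    if src == dst:
        yield list(zip(path, path[1:]))
        return
    for (u, v) in edges:
        if u == src and v not in path:
            yield from simple_paths(v, dst, path + [v])

def best_integer_flow():
    """Try every combination of one path per commodity; keep the cheapest feasible one."""
    names = list(commodities)
    choices = [list(simple_paths(*commodities[n])) for n in names]
    best = None
    for combo in product(*choices):
        load = {}
        for p in combo:
            for e in p:
                load[e] = load.get(e, 0) + 1
        if any(load[e] > edges[e][1] for e in load):
            continue  # some edge carries more flow than its capacity
        cost = sum(edges[e][0] for p in combo for e in p)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, combo)))
    return best

cost, routes = best_integer_flow()
print(cost)  # 1: one commodity must take the detour through z
```

Both commodities prefer the zero-cost edge from x to y, but its unit capacity forces one of them onto the detour. Coupling of this kind between commodities is what separates the multi-commodity problem from independent shortest-path problems.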

Register Allocation as an MCNF

Variables ↔ Commodities
Variable Definition ↔ Source
Variable Last Use ↔ Sink
Nodes ↔ Allocation Classes (Reg/Mem/Const)
Register Limits ↔ Node Capacities
Spill Costs ↔ Edge Costs
Allocation ↔ Flow

[Figure: example network for variable a; node labels r0, r1, mem, 1; label 3]

Also need anti-variables to model persistent memory
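
The correspondence can be sketched for a single variable: its flow is a sequence of allocation classes, one per program point, and the cost of the flow is the sum of the edge costs between consecutive classes. Everything concrete here is an assumption for illustration: the class list, the cost values, the `full_nodes` set standing in for exhausted node capacities, and the omission of constants and anti-variables.

```python
from itertools import product

CLASSES = ["r0", "r1", "mem"]  # assumed allocation classes (nodes at each program point)
MEM_EDGE_COST = 3              # assumed cost of an edge to or from memory
REG_MOVE_COST = 1              # assumed cost of a register-to-register move

def edge_cost(src, dst):
    """Edge cost between allocation classes at consecutive program points."""
    if src == dst:
        return 0
    return MEM_EDGE_COST if "mem" in (src, dst) else REG_MOVE_COST

def cheapest_flow(num_points, full_nodes):
    """Enumerate every class sequence for one commodity and keep the cheapest.

    full_nodes: set of (point, class) nodes whose capacity is already used up.
    Returns (cost, [class at each point]).
    """
    best = None
    for path in product(CLASSES, repeat=num_points):
        if any((p, c) in full_nodes for p, c in enumerate(path)):
            continue
        cost = sum(edge_cost(s, d) for s, d in zip(path, path[1:]))
        if best is None or cost < best[0]:
            best = (cost, list(path))
    return best

# The variable must start and end in a register, and both registers are
# occupied at point 1, so the only feasible flows pass through memory.
blocked = {(0, "mem"), (1, "r0"), (1, "r1"), (2, "mem")}
print(cheapest_flow(3, blocked))  # (6, ['r0', 'mem', 'r0'])
```

The result is a store followed by a load, each priced as one memory edge. That is the sense in which spill costs become edge costs and the chosen flow becomes the allocation.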

Example

Source Code
int example(int a, int b)
{
  int d = 1;
  int c = a - b;
  return c+d;
}

Pre-alloc Assembly
MOVE 1 -> d
SUB a,b -> c
ADD c,d -> c
MOVE c -> r0

load cost

insn pref cost

mem access cost

Control Flow
MCNF can only represent straight-line code

– need to link together networks from basic blocks

[Figure: basic-block networks linked at their boundaries; boundary labels a: %eax and a: mem]

New nodes to handle block entry/exit constraints

Node types: Normal, Merge, Split
[Figure: in/out flow constraint for each node type]

A More Principled Register Allocator
– fully utilize machine description

• explicit and expressive model of costs of allocation for given architecture: Global MCNF

– optimal solutions
• NP-hard, so use progressive solution technique

[Figure: allocation quality versus compile time]

Technique: Lagrangian relaxation directed allocators

[Figure: reg alloc and machine description]

Solution Procedure
Compute Lagrangian prices using iterative subgradient optimization
– guaranteed to converge to “optimal” prices
• for linear relaxation of the problem

Prices used by allocator to find solution
– solution improves as prices converge
– two allocators
• iterative heuristic allocator
• simultaneous heuristic allocator
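
The price computation can be illustrated on a deliberately tiny instance. This is a sketch of generic projected subgradient ascent, not the talk's implementation: the two variables, their spill costs, the single register, and the 1/k step size are all assumptions chosen so the behaviour is easy to follow.

```python
SPILL_COST = {"a": 4.0, "b": 3.0}  # hypothetical cost of keeping each variable in memory
REGISTERS = 1                       # capacity of the shared register resource

def subgradient_prices(iterations=20):
    """Return (final price, best lower bound) for the relaxed capacity constraint."""
    price = 0.0                     # Lagrangian price on using the register
    best_lower_bound = float("-inf")
    for k in range(1, iterations + 1):
        # With capacity priced instead of enforced, each variable independently
        # picks its cheaper option: pay the price for the register, or spill.
        in_register = [v for v, spill in SPILL_COST.items() if price < spill]
        relaxed_cost = sum(
            price if v in in_register else SPILL_COST[v] for v in SPILL_COST
        )
        # Value of the Lagrangian dual at this price: a lower bound on the optimum.
        lower_bound = relaxed_cost - price * REGISTERS
        best_lower_bound = max(best_lower_bound, lower_bound)
        # Subgradient: how far the relaxed solution exceeds the capacity.
        violation = len(in_register) - REGISTERS
        price = max(0.0, price + (1.0 / k) * violation)  # diminishing step, kept >= 0
    return price, best_lower_bound

price, lower_bound = subgradient_prices()
```

In this instance the price climbs while both variables still claim the register, then settles between the two spill costs once only `a` is willing to pay it. The lower bound then reaches approximately 3, the cost of spilling `b`, which here is also the true optimum.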

Solution Procedure
Advantages
+ iterative nature makes the allocator progressive
+ Lagrangian relaxation theory provides means for computing a good lower bound
+ Can compute optimality bound

Disadvantages
– No guarantee of finding optimal solution
– Optimality bound poor if integrality gap large

99% of the time integrality gap = 0
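
The optimality bound follows from simple arithmetic once a lower bound is available: the optimum cannot be cheaper than the lower bound, so the relative distance of any solution from the lower bound also limits its distance from the optimum. The function name and the restriction to positive lower bounds are assumptions made for this sketch.

```python
def optimality_bound(solution_cost, lower_bound):
    """Largest possible relative distance of solution_cost from the optimal cost.

    Since lower_bound <= optimal <= solution_cost, the true relative gap
    (solution_cost - optimal) / optimal can be no larger than the value returned.
    """
    if lower_bound <= 0:
        raise ValueError("relative bound is only meaningful for a positive lower bound")
    if solution_cost < lower_bound:
        raise ValueError("a feasible solution cannot cost less than a valid lower bound")
    return (solution_cost - lower_bound) / lower_bound

print(optimality_bound(105.0, 100.0))  # 0.05, i.e. provably within 5% of optimal
print(optimality_bound(100.0, 100.0))  # 0.0, i.e. provably optimal
```

When the integrality gap is large, the lower bound sits well below the true optimum and this figure overstates the distance, which is the disadvantage noted on the slide.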

Iterative Heuristic Allocator
Allocation order: a, b, c, d

Cost:
a: 0
b: 4
c: 0
d: -2
Total: 2

Edges to/from memory cost 3
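
A minimal sketch of the one-variable-at-a-time idea follows: each variable in the allocation order takes the cheapest path through its own layered network, and registers claimed by earlier variables are unavailable to later ones. Several things are assumptions for illustration: the single-register machine, the live ranges, the rule that a variable held in memory at its definition or last use pays one memory edge, and the omission of Lagrangian prices, which in the talk's allocator would also contribute to the path costs.

```python
REGS = ["r0"]              # hypothetical machine with a single register
CLASSES = REGS + ["mem"]
MEM_EDGE_COST = 3          # edges to/from memory cost 3, as on the slide
INF = float("inf")

def edge_cost(src, dst):
    """Cost of changing allocation class between two adjacent program points."""
    if src == dst:
        return 0
    return MEM_EDGE_COST if "mem" in (src, dst) else 1  # reg-to-reg cost is assumed

def endpoint_cost(cls):
    """Assumption: definitions and last uses happen in a register, so being in
    memory at either end of the live range costs one memory edge."""
    return MEM_EDGE_COST if cls == "mem" else 0

def shortest_path(first, last, taken):
    """Cheapest class sequence for one variable live over points first..last."""
    def free(point, cls):
        return cls == "mem" or (point, cls) not in taken

    # best[c] = (cost, path) of the cheapest way to be in class c at this point
    best = {c: (endpoint_cost(c) if free(first, c) else INF, [c]) for c in CLASSES}
    for point in range(first + 1, last + 1):
        step = {}
        for dst in CLASSES:
            cost, path = min(
                ((best[src][0] + edge_cost(src, dst), best[src][1]) for src in CLASSES),
                key=lambda cp: cp[0],
            )
            step[dst] = (cost if free(point, dst) else INF, path + [dst])
        best = step
    return min(
        ((cost + endpoint_cost(c), path) for c, (cost, path) in best.items()),
        key=lambda cp: cp[0],
    )

def iterative_allocate(live_ranges, order):
    """Allocate variables one at a time; earlier variables consume register capacity."""
    taken, allocation, total = set(), {}, 0
    for var in order:
        first, last = live_ranges[var]
        cost, path = shortest_path(first, last, taken)
        for offset, cls in enumerate(path):
            if cls != "mem":
                taken.add((first + offset, cls))
        allocation[var] = path
        total += cost
    return allocation, total

live_ranges = {"a": (0, 3), "b": (1, 2)}  # hypothetical (first point, last point)
allocation, total = iterative_allocate(live_ranges, ["b", "a"])
print(allocation["a"], total)  # ['r0', 'mem', 'mem', 'r0'] 6
```

Because `b` is allocated first and holds the register at points 1 and 2, the path chosen for `a` leaves the register and returns to it. Live range splitting therefore falls out of the shortest-path search without a separate mechanism.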

Simultaneous Heuristic Allocator

[Figure: network diagram with X marks]

Current cost: -1, -3, -2

Edges to/from memory cost 3

Evaluation
Implemented in gcc 3.4.3 targeting x86

Optimize for code size
– perfect static evaluation
– important metric in its own right

MediaBench, MiBench, Spec95, Spec2000
– over 10,000 functions

Progressiveness

[Figure labels: CPLEX; default allocator: 1121; graph allocator: 1422]

Progressiveness

[Figure legend: graph allocator, default allocator, CPLEX]

Code Size

Progressive!

Optimality

Proven optimality

Proven maximum distance from optimal

Compile Time Slowdown :-(

10x slower

A More Principled Register Allocator
– fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture: Global MCNF
– optimal solutions
• approach optimality using progressive solution technique: Lagrangian directed allocators

[Figure: reg alloc and machine description]

Questions?

Accuracy of the Model
Global MCNF model correctly predicts costs of register allocation within 2% for 71% of functions compiled

[Figure: histogram. y-axis: percent of functions (0% to 60%). x-axis: percent predicted size larger than actual size, in bins running from <10% through 0% to >100%]

Code Size

Compile Time Asymptotic Complexity

one iteration: O(nv)

Code Performance

Compile Time Slowdown :-(

10x slower