School of Computer Science
A Global Progressive Register Allocator
David Ryan Koes, Seth Copen Goldstein
Carnegie Mellon University
{dkoes,seth}@cs.cmu.edu
Register Allocation Problem
…
v = 1
w = v + 3
x = w + v
u = v
t = u + x
print(x);
print(w);
print(t);
print(u);
…
register allocator
unbounded number of program variables
limited number of processor registers + slow memory
eax ebx ecx edx esi edi ebp esp
spill code optimization
memory operands
register preferences
rematerialization
live range splitting
A More Principled Register Allocator
– fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture
– optimal solutions
[Diagram: reg alloc and machine description]
Multi-commodity Network Flow: An Expressive Model
Given network (directed graph) with
– cost and capacity on each edge
– sources & sinks for multiple commodities
Find lowest cost flow of commodities
NP-complete for integer flows
Example: edges have unit capacity
[Figure: example network with commodities a and b; labels 0 and 1]
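The problem statement above can be written as an integer program. This is the standard textbook form rather than notation taken from the slides: the symbols $N$ (nodes), $E$ (edges), $K$ (commodities), $c_{ij}^{k}$ (cost), $u_{ij}$ (capacity), $b_{i}^{k}$ (supply, $+1$ at the source of commodity $k$, $-1$ at its sink, $0$ elsewhere), and $x_{ij}^{k}$ (flow) are all assumed names.

```latex
\begin{align*}
\min_{x}\quad & \sum_{k \in K} \sum_{(i,j) \in E} c_{ij}^{k}\, x_{ij}^{k} \\
\text{s.t.}\quad
 & \sum_{j : (i,j) \in E} x_{ij}^{k} - \sum_{j : (j,i) \in E} x_{ji}^{k} = b_{i}^{k}
   && \forall i \in N,\ k \in K \\
 & \sum_{k \in K} x_{ij}^{k} \le u_{ij}
   && \forall (i,j) \in E \\
 & x_{ij}^{k} \in \mathbb{Z}_{\ge 0}
   && \forall (i,j) \in E,\ k \in K
\end{align*}
```

Only the capacity constraint couples the commodities; without it, each commodity reduces to an independent shortest-path problem.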
Register Allocation as a MCNF
Variables → Commodities
Variable Definition → Source
Variable Last Use → Sink
Allocation Classes (Reg/Mem/Const) → Nodes
Register Limits → Node Capacities
Spill Costs → Edge Costs
Allocation → Flow
[Figure: flow network for variable a over nodes r0, r1, mem, and 1]
Also need anti-variables to model persistent memory
Example
Source Code
int example(int a, int b)
{
  int d = 1;
  int c = a - b;
  return c+d;
}
Pre-alloc Assembly
MOVE 1 -> d
SUB a,b -> c
ADD c,d -> c
MOVE c -> r0
[Figure labels: load cost, insn pref cost, mem access cost]
Control Flow
MCNF can only represent straight-line code
– need to link together networks from basic blocks
[Figure: variable a in %eax or in mem at basic block boundaries]
New nodes to handle block entry/exit constraints
– Normal (in_i, out_i)
– Merge (in_i, out_,i)
– Split (in, out_i,i)
A More Principled Register Allocator
– fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture: Global MCNF
– optimal solutions
• NP-hard, so use progressive solution technique
[Plot: Allocation Quality vs. Compile Time]
Technique: Lagrangian relaxation directed allocators
[Diagram: reg alloc and machine description]
Solution Procedure
Compute Lagrangian prices using iterative subgradient optimization
– guaranteed to converge to “optimal” prices
• for linear relaxation of the problem
Prices used by allocator to find solution
– solution improves as prices converge
– two allocators
• iterative heuristic allocator
• simultaneous heuristic allocator
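The price computation can be illustrated with a minimal sketch. This is a hypothetical toy model, not the authors' implementation: it assumes one register `r0` of capacity 1, unbounded memory, straight-line code, a move cost of 3 between register and memory, and a diminishing step size of 1/k. The names `cheapest_path`, `subgradient_prices`, and `mem_cost` are invented for illustration.

```python
# Hypothetical toy model, not the authors' implementation.
MOVE_COST = 3  # assumed cost of moving between a register and memory


def cheapest_path(start, mem_cost, prices):
    """Cheapest r0/mem assignment for one variable via dynamic programming.

    mem_cost[i] is the assumed cost of residing in memory at point start + i;
    residing in r0 costs only the current price of r0 at that point.
    """
    def node_cost(cls, i):
        return mem_cost[i] if cls == "mem" else prices[start + i]

    best = {c: (node_cost(c, 0), [c]) for c in ("r0", "mem")}
    for i in range(1, len(mem_cost)):
        nxt = {}
        for c in ("r0", "mem"):
            nxt[c] = min(
                ((cost + (0 if p == c else MOVE_COST) + node_cost(c, i), path + [c])
                 for p, (cost, path) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    return min(best.values(), key=lambda t: t[0])


def subgradient_prices(variables, n_points, iterations=50):
    """Return (prices, best lower bound) for a single register of capacity 1."""
    prices = [0.0] * n_points
    best_bound = float("-inf")
    for k in range(1, iterations + 1):
        usage = [0] * n_points
        priced_total = 0.0
        for start, mem_cost in variables:
            cost, path = cheapest_path(start, mem_cost, prices)
            priced_total += cost
            for i, cls in enumerate(path):
                if cls == "r0":
                    usage[start + i] += 1
        # Lagrangian bound: priced path costs minus price * capacity (capacity 1).
        best_bound = max(best_bound, priced_total - sum(prices))
        # Raise prices where r0 is over-subscribed, lower them (never below 0) elsewhere.
        prices = [max(0.0, p + (u - 1) / k) for p, u in zip(prices, usage)]
    return prices, best_bound


# Two variables, each live at points 0..2, competing for the one register.
variables = [(0, [2, 2, 2]), (0, [2, 2, 2])]
prices, bound = subgradient_prices(variables, n_points=3)
print(prices, bound)
```

In this toy instance the best integer allocation costs 6 (one variable in `r0`, the other in memory), so the reported bound can never exceed 6. The individual priced paths are generally not a feasible allocation, which is why a separate allocator is still needed to turn prices into a solution.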
Solution Procedure
Advantages
+ iterative nature → progressive
+ Lagrangian relaxation theory provides means for computing a good lower bound
+ Can compute optimality bound
Disadvantages
– No guarantee of finding optimal solution
– Optimality bound poor if integrality gap large
99% of the time integrality gap = 0
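The lower bound mentioned above follows from standard Lagrangian relaxation theory. The notation here is assumed rather than taken from the slides: $w_e \ge 0$ is the price on capacity constraint $e$, $u_e$ its capacity, $c_e^{k}$ and $x_e^{k}$ the cost and flow of commodity $k$, $z^{*}$ the optimal integer cost, $z_{LP}$ the optimum of the linear relaxation, and $\hat{z}$ the cost of any feasible allocation found by a heuristic. Capacities are written per edge, as in the MCNF definition; node capacities are handled the same way.

```latex
\begin{align*}
L(w) &= \min_{x}\ \sum_{k} \sum_{e} \bigl(c_{e}^{k} + w_{e}\bigr)\, x_{e}^{k}
        \;-\; \sum_{e} w_{e}\, u_{e}
        \quad \text{(flow conservation and integrality only)} \\
L(w) &\le \max_{w \ge 0} L(w) = z_{LP} \le z^{*} \le \hat{z} \\
\hat{z} - z^{*} &\le \hat{z} - L(w)
\end{align*}
```

The equality $\max_{w \ge 0} L(w) = z_{LP}$ holds because each priced subproblem is a shortest-path problem, whose linear relaxation already has integer optimal solutions. The bound is therefore tight exactly when the integrality gap $z^{*} - z_{LP}$ is zero.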
Iterative Heuristic Allocator
Allocation order: a, b, c, d
Cost: a = 0, b = 4, c = 0, d = -2
Total: 2
Edges to/from memory cost 3
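A minimal sketch of allocating variables one at a time, in a fixed order, is given below. It is a hypothetical toy model, not the authors' implementation: it assumes a single register `r0`, unbounded memory, straight-line code, and the move cost of 3 from the slide. Each variable takes the cheapest path through the network given the optional prices, and register nodes already claimed by earlier variables are blocked. The names `allocate_in_order`, `mem_cost`, and `r0_free` are invented for illustration.

```python
# Hypothetical toy model, not the authors' implementation.
MOVE_COST = 3  # slide: edges to/from memory cost 3
INF = float("inf")


def allocate_in_order(variables, order, n_points, prices=None):
    """Greedily allocate variables one at a time over a single register r0.

    variables maps a name to (start, mem_cost), where mem_cost[i] is the assumed
    cost of residing in memory at program point start + i. Returns (paths, total),
    with total counting memory and move costs but not the guiding prices.
    """
    prices = prices if prices is not None else [0.0] * n_points
    r0_free = [True] * n_points
    paths, total = {}, 0

    for name in order:
        start, mem_cost = variables[name]

        def node_cost(cls, i):
            if cls == "mem":
                return mem_cost[i]
            return prices[start + i] if r0_free[start + i] else INF

        # Dynamic program over program points: cheapest priced path per class.
        best = {c: (node_cost(c, 0), [c]) for c in ("r0", "mem")}
        for i in range(1, len(mem_cost)):
            best = {
                c: min(
                    ((cost + (0 if p == c else MOVE_COST) + node_cost(c, i), path + [c])
                     for p, (cost, path) in best.items()),
                    key=lambda t: t[0],
                )
                for c in ("r0", "mem")
            }
        _, path = min(best.values(), key=lambda t: t[0])

        # Commit the path: claim r0 where used and add the unpriced cost.
        for i, cls in enumerate(path):
            if cls == "r0":
                r0_free[start + i] = False
            else:
                total += mem_cost[i]
            if i > 0 and path[i - 1] != cls:
                total += MOVE_COST
        paths[name] = path
    return paths, total


# a is live at points 0..2 and cheap to keep in memory; b is live at 1..2 and expensive.
variables = {"a": (0, [1, 1, 1]), "b": (1, [5, 5])}
print(allocate_in_order(variables, ["a", "b"], n_points=3))
print(allocate_in_order(variables, ["b", "a"], n_points=3))
```

With zero prices, the order a, b gives a total of 10 in this toy instance while b, a gives 3, so the result depends on the allocation order. Prices that make contested register nodes expensive for a are one way to steer the greedy choice toward the cheaper outcome.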
Simultaneous Heuristic Allocator
[Figure: network with X marks]
Current cost: -1, -3, -2
Edges to/from memory cost 3
Evaluation
Implemented in gcc 3.4.3 targeting x86
Optimize for code size
– perfect static evaluation
– important metric in its own right
MediaBench, MiBench, Spec95, Spec2000
– over 10,000 functions
Progressiveness
[Plot labels]
CPLEX
default allocator: 1121
graph allocator: 1422
Optimality
Proven optimality
Proven maximum distance from optimal
A More Principled Register Allocator
– fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture: Global MCNF
– optimal solutions
• approach optimality using progressive solution technique: Lagrangian directed allocators
[Diagram: reg alloc and machine description]
Accuracy of the Model
Global MCNF model correctly predicts costs of register allocation within 2% for 71% of functions compiled
[Histogram: x-axis, percent predicted size larger than actual size; y-axis, percent of functions (0% to 60%)]
Compile Time Asymptotic Complexity
one iteration: O(nv)