View
228
Download
0
Category
Tags:
Preview:
Citation preview
Incrementalized Pointer and Escape Analysis
Martin RinardMIT LCS
Context
•Unsafe languages (C, C++, …)•Safe languages (Java, ML, Scheme, …)•Big difference: garbage collection
Goal: analyze program to safely allocate objects on the call stack instead of in
garbage-collected heap
Advantages• No dangling
references• No memory leaks
Disadvantages• Collection
overhead• Collection pauses
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation Sites
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation Sites
When program runs
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation Sites
When program runs
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation Sites
When program runsIt allocates objects in heap
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation SitesProgram With Allocation Sites
When program runsIt allocates objects in heap
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation SitesProgram With Allocation Sites
When program runsIt allocates objects in heap
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation SitesProgram With Allocation Sites
When program runsIt allocates objects in heap
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation Sites
When objects become unreachable
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Program With Allocation Sites
When objects become unreachableGarbage collector (eventually)
reclaims memory
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Stack AllocationCorrelate lifetimes of objectswith lifetimes of procedures
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Stack AllocationAllocate object on activation
record of procedure
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Stack AllocationWhen procedure returns
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Stack AllocationWhen procedure returns
Object automatically deallocated
Problem
• How does compiler determine if it is safe to allocate objects on stack?
• Classic problem in program analysis• Standard approach
• Analyze whole program• Use information to find “captured”
objects• Allocate captured objects on stack
Stack Allocation
Percentage of Memory Allocated on Stack
0
20
40
60
80
100
barnes water jlex db raytrace compress
Whole-Program Analysis
Normalized Execution Times
0
20
40
60
80
100
barnes water jlex db raytrace compress
Whole-Program Analysis
Reference: execution time without optimization
Analysis Times
0
25
50
75
100
125
150
barnes water jlex db raytrace compress
Ana
lysi
s T
ime
(sec
onds
)
Whole-Program Analysis
223645
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Key Observation Number One:
Most optimizations require only
the analysis of a small part of
program surrounding the object
allocation site
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
Key Observation Number Two:
Most of the optimization
benefit comes from a small
percentage of the allocation
sites
99% of objects
allocated at these two
sites
Intuition for Better Analysis
• Locate important allocation sites
• Use demand-driven approach to analyze region surrounding site
• Somehow avoid sinking analysis resources into unprofitable sites
void compute(d,e) ———— ———— ————
void multiplyAdd(a,b,c) ————————— ————————— —————————
void multiply(m) ———— ———— ————
void add(u,v) —————— ——————
void main(i,j) ——————— ——————— ———————
void evaluate(i,j) —————— —————— ——————
void abs(r) ———— ———— ————
void scale(n,m) —————— ——————
99% of objects allocated at these
two sites
Example
Employee Database Example
Traverse database, extract max salary
Name
Salary
John Doe
$45,000
Ben Bit
$30,000
Jane Roe
$55,000
Vector
max salary = $55,000
highest paid = Jane Roe
Coding Max Computation (in Java)
class EmployeeDatabase { Vector database = new Vector();Employee highestPaid;void computeMax() {
int max = 0;Enumeration enum = database.elements();while (enum.hasMoreElements()) {
Employee e = enum.nextElement();if (max < e.salary()) {
max = e.salary(); highestPaid = e;}
} }
}
Coding Max Computation (in Java)
class EmployeeDatabase { Vector database = new Vector();Employee highestPaid;void computeMax() {
int max = 0;Enumeration enum = database.elements();while (enum.hasMoreElements()) {
Employee e = enum.nextElement();if (max < e.salary()) {
max = e.salary(); highestPaid = e;}
} }
}
Would like to allocate enum object on stack, not on the heap
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Bottom Up and Compositional
Currently analyzed procedure
Currently analyzed part of
the program
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Bottom Up and Compositional
Currently analyzed procedure
Currently analyzed part of
the program
Points-to Graph in Examplevoid computeMax() {
int max = 0;Enumeration enum = database.elements();while (enum.hasMoreElements()) {
Employee e = enum.nextElement();if (max < e.salary()) { max = e.salary(); highestPaid = e; }
}
} vector elementData [ ]
this
enum
e
database highestPaid
Escape Information
• Escaped nodes• parameter nodes• returned nodes• nodes reachable from other escaped
nodes• Captured is the opposite of escaped
vector elementData [ ]
this
enum
e
database highestPaid
green = escaped
white = captured
Stack Allocation Optimization
• Examine graph from end of procedure• If a node is captured in this graph• Allocate corresponding objects on stack
(may need to inline procedures to apply optimization)
vector elementData [ ]
this
enum
e
database highestPaid
green = escaped
white = captured
Can allocate enum object on stack
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Whole Program Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Incrementalized Analysis RequirementsMust be able to
• Analyze procedure independently of callers• Whole-program analysis is
compositional• Already does this
• Skip analysis of invoked procedures
• But later incrementally integrate analysis results if desirable to do so
First Extension to Whole-Program Analysis
• Skip the analysis of invoked procedures• Parameters are marked as escaping into
skipped call site
Assume analysis skips enum.nextElement()
vector1
this
enum
e
database highestPaid
Node 1 escapes intoenum.nextElement()
First Extension Almost Works
• Can skip analysis of invoked procedures• If allocation site is captured, great!• If not, escape information tells you what
procedures you should have analyzed…
Should have analyzed enum.nextElement()
vector1
this
enum
e
database highestPaid
Node 1 escapes intoenum.nextElement()
Second Extension to Base Algorithm
• Record enough information to undo skip and incorporate analysis into existing result• Parameter mapping at call site• Ordering information for call sites
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Attempt to stack allocate Enumeration object from elements
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Analyze elements(intraprocedurally)
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Analyze elements(intraprocedurally)
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Analyze elements(intraprocedurally)
Escapes only into the caller
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Analyze computeMax(intraprocedurally)
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Analyze computeMax(intraprocedurally)
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Analyze computeMax(intraprocedurally)
Escapes to• hasMoreElements• nextElement
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Analyze hasMoreElements and nextElement(intraprocedurally)
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
Incorporate Results
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized AnalysisEnumeration object Captured in computeMax
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized AnalysisEnumeration object Captured in computeMax
Inline elements
Stack allocate enumeration object
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
We skipped the analysis of some procedures
void computeMax() ———— ———— ————
Enumeration elements() ——— ——— ———
Vector elementData() ———— ———— ————
boolean hasMoreElements() —————— —————— ——————
void printStatistics() ——————— ——————— ———————
Employee nextElement() —————— ——————
int salary() —————— ——————
Incrementalized Analysis
We skipped the analysis of some procedures
We ignored some other procedures
Result• We can incrementally analyze
• Only what is needed • For whatever allocation site we want• And even temporarily suspend
analysis part of the way through!
New Issue• We can incrementally analyze
• Only what is needed • For whatever allocation site we want• And even temporarily suspend
analysis part of the way through!
But…• Lots of analysis opportunities
• Not all opportunities are profitable• Where to invest analysis resources?• How much resources to invest?
Analysis Policy
Formulate policy as solution to an investment problem
GoalMaximize optimization payoff from
invested analysis resources
Analysis Policy Implementation
• For each allocation site, estimate marginal return on invested analysis resources
• Loop• Invest a unit of analysis resources (time)
in site that offers best return Expand analyzed region surrounding site
• When unit expires, recompute marginal returns (best site may change)
Marginal Return Estimate
N · P(d)
C · T
N = Number of objects allocated at the siteP(d) = Probability of capturing the site, knowing
we explored a region of call depth d C = Number of skipped call sites the
allocation site escapes through T = Average time needed to analyze a call site
Marginal Return Estimate
N · P(d)
C · T
As invest analysis resources• explore larger regions around
allocation sites• get more information about sites• marginal return estimates improve• analysis makes better investment
decisions!
Usage Scenarios
• Ahead of time compiler• Give algorithm an analysis budget• Algorithm spends budget• Takes whatever optimizations it
uncovered
• Dynamic compiler• Algorithm acquires analysis budget as
a percentage of run time• Periodically spends budget, delivers
additional optimizations• Longer program runs, more
optimizations
Experimental Results
Methodology
• Implemented analysis in MIT Flex System
• Obtained several benchmarks• Scientific computations: barnes,
water• Our lexical analyzer: jlex• Spec benchmarks: db, raytrace,
compress
Analysis Times
0
25
50
75
100
125
150
barnes water jlex db raytrace compress
Ana
lysi
s Ti
me
(sec
onds
)
Incrementalized Analysis Whole-Program Analysis
223 645
jdk
1.2
Stack Allocation
Percentage of Memory Allocated on Stack
0
20
40
60
80
100
barnes water jlex db raytrace compress
Incrementalized Analysis Whole-Program Analysis
Normalized Execution Times
0
20
40
60
80
100
barnes water jlex db raytrace compress
Incrementalized Analysis Whole-Program Analysis
Reference: execution time without optimization
Other Analyses and Uses• Whole-program pointer and escape analysis
(OOPSLA 1999)• Stack allocation• Synchronization elimination
• Pointer and escape analysis for multithreaded programs (PLDI 1999, PPoPP 2001)• Correct use of region-based allocation• Elimination of dynamic region checks• Synchronization elimination• Memory bank disambiguation (RAW, DeepC)• Foundation for other analyses
• Bitwidth analysis (MIT and CMU)• Symbolic accessed region analysis
Other Analyses and Uses
• Symbolic analysis of accessed regions of memory blocks (PPoPP 1999, PLDI 2000)• Automatic parallelization of sequential
divide and conquer programs• Data race freedom for parallel divide
and conquer programs• Absence of array bounds violations for
both• Bitwidth analysis
Future Directions
Information from designers and developers
• Type systems for program properties• Automatic parallelization and data race freedom
for divide and conquer programs (CC 2001)• Data race freedom for OO programs (OOPSLA
2001)
• Application-specific design properties• Shapes of recursive data structures• Object role transitions and constraints
• Distributed systems, component-based systems• Interaction patterns• Failure propagation properties• Global view of system
Recommended