Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers

Presentation by Patrick, Kaleem, and Justin



Abstract

• Emami et al. introduced a new method for dealing with the alias problem in C-like languages. The method provides context-sensitive interprocedural information based on “invocation graphs” and handles both recursive functions and function pointers.


Introduction/Motivation

Similar analyses had already been applied to block-structured languages like Fortran.

C proved to be a greater challenge because of the many pointer operations available to the programmer.

Simply applying the techniques developed for Fortran was of little practical use, because C programs use pointers so pervasively.


Difficulties With C

• The address-of operator (&) can create new points-to relationships at any point in a program

• Handling both stack-allocated and dynamically-allocated (heap) variables

• Proper analysis of recursion and function pointers


Data Structures

Two main data structures are used during Emami’s analysis:

• An invocation graph to deal with function call paths.

• A set of abstract stack locations to deal with Steensgaard/Andersen-like points-to relationships


Points-to Abstraction

• Based on relationships between stack locations instead of simple alias pairs

• Stack location x points to stack location y at program point p if, at p, x contains the address of y (x == &y)

• Computes both possible and definite points-to relationships


Points-to Abstraction

• Definite relationships provide valuable “kill” information and also enable pointer replacement

• If we know q definitely points to y, we can replace the statement x=*q with x=y

• Later in compilation, these replacements can reduce the number of loads and stores needed
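A minimal sketch of the pointer-replacement test. It assumes the points-to set is stored as an array of (source, target, D/P) triples. The names Triple and definite_target are hypothetical and do not come from the presentation. The function checks whether q has a definite target y; if it does, x=*q can be rewritten as x=y.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One points-to relationship (src, dst, rel); rel is 'D' (definite) or 'P' (possible).
 * This array-of-triples representation is an assumption for illustration. */
typedef struct {
    const char *src;
    const char *dst;
    char rel;
} Triple;

/* Returns y if (q, y, D) is in S, otherwise NULL. A definite relationship
 * means q has exactly one target, so the first definite match is the answer. */
const char *definite_target(const Triple *S, size_t n, const char *q) {
    assert(S != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (S[i].rel == 'D' && strcmp(S[i].src, q) == 0) {
            return S[i].dst;
        }
    }
    return NULL;
}
```

For S = {(q, y, D), (r, a, P), (r, b, P)}, definite_target returns "y" for q, so x=*q may become x=y. For r it returns NULL, so x=*r must stay an indirect load.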


Context-sensitive/Interprocedural

• Emami’s approach uses invocation graphs, which yield precise results and correctly handle recursion in the presence of function pointers

• This is important because function pointers previously proved difficult for interprocedural analysis

• Paths along the graph represent possible paths of execution from one function call to another


Stack-based Analysis

• Aliasing problems come in one of three forms: aliases between references to the stack, between references to dynamically allocated heap space, and between two references to the same array

• Emami holds that stack-based analysis can be safely decoupled from heap-based analysis. The given algorithm deals only with the stack analysis.


Stack vs. Heap

• C’s pointer-related complexity involves aliases in the stack as well as the heap. Dynamically allocated memory is said to be on the heap. Emami’s algorithm deals with resolving stack-based aliases

• For example, the statements

int a; int* x=&a;

create a pointer to a stack location (the local variable a)


Setting – McCAT Compiler

• SIMPLE, the intermediate representation used by the McCAT compiler, is a compact form of program statements that is easier to analyze while retaining all the functionality required by real C programs. In effect, it is a strict subset of C.

• Using SIMPLE, points-to analysis rules need only be implemented for 15 basic statements and for the simplified conditions and control statements (if, while, etc.).

• Each of the 15 statements involves at most one level of pointer indirection per reference.


Previous Methods

Earlier alias analyses approximated aliases with alias pairs: two variable references are said to be aliased if they refer to the same location. Flow- and context-insensitive analyses, such as Andersen's and Steensgaard's, compute points-to sets instead, but they likewise ignore statement order and calling context.

Although these algorithms can be used to optimize real programs, they are likely to report many false aliases, so there is still room for improvement.

Context- and flow-sensitive algorithms can eliminate more of this false information, allowing more optimizations when implemented.


Stack-based Method

• Emami’s method abstracts the set of all accessible stack locations with a finite set of named “abstract stack locations”.

• The approximation consists of a set of points-to relationships between the abstract stack locations.

• After the statement p=&y, we say abstract stack location p points to abstract stack location y.


Properties of the Abstraction

• Every real stack location involved in a pointer reference is represented by exactly one named abstract stack location.

• Each of these named abstract locations represents one or more real stack locations.


Abstract Stack Locations

Each abstract location corresponds to one of three things:

• The name of a local/global variable or parameter

• A symbolic name that corresponds to locations indirectly accessible through pointer variables

• The symbolic name heap


Definitely Points-to

Abstract stack location x definitely points to abstract stack location y, if x and y each represent exactly one real stack location, and the real stack location corresponding to x contains the address of the real stack location corresponding to y. This is denoted by the triple (x, y, D)


Possibly Points-to

Abstract stack location x possibly points to abstract stack location y if it is possible that one of the real stack locations corresponding to x contains the address of one of the real stack locations corresponding to y. This is denoted by the triple (x, y, P).
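The slides define definite and possible relationships but do not show how the two combine where control-flow paths meet. The sketch below assumes a merge rule in which a relationship stays definite only if it is definite on both paths, and every other relationship found in either set becomes possible. The PtSet type and the merge and find functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRIPLES 32

/* One points-to relationship (src, dst, rel); rel is 'D' or 'P'. */
typedef struct { const char *src; const char *dst; char rel; } Triple;
typedef struct { Triple t[MAX_TRIPLES]; size_t n; } PtSet;

/* Returns the index of the triple (src, dst, *) in s, or -1 if absent. */
static int find(const PtSet *s, const char *src, const char *dst) {
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->t[i].src, src) == 0 && strcmp(s->t[i].dst, dst) == 0)
            return (int)i;
    return -1;
}

/* Assumed join rule (e.g. after the two branches of an if):
 * definite on both paths -> 'D'; anything else present on either path -> 'P'. */
PtSet merge(const PtSet *a, const PtSet *b) {
    PtSet out = { .n = 0 };
    for (size_t i = 0; i < a->n; i++) {
        int j = find(b, a->t[i].src, a->t[i].dst);
        char rel = (j >= 0 && a->t[i].rel == 'D' && b->t[j].rel == 'D') ? 'D' : 'P';
        assert(out.n < MAX_TRIPLES);
        out.t[out.n++] = (Triple){ a->t[i].src, a->t[i].dst, rel };
    }
    for (size_t j = 0; j < b->n; j++) {
        if (find(a, b->t[j].src, b->t[j].dst) < 0) {   /* only on path b */
            assert(out.n < MAX_TRIPLES);
            out.t[out.n++] = (Triple){ b->t[j].src, b->t[j].dst, 'P' };
        }
    }
    return out;
}
```

If p definitely points to a on one branch and to b on the other, both (p, a, P) and (p, b, P) survive the join. A relationship such as (q, c, D) that holds definitely on both branches stays definite.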


L- vs R-locations

• L-locations and R-locations are abstract locations referred to by a variable reference on the left or right side of an assignment statement, respectively.

• Both are represented as pairs (x, D) or (x, P), where x is an abstract location name and D/P indicate definite and possible locations. These are described in Table 1.


L- and R-locations

• L-locations refer to the stack location of the variable itself

• The R-location set of a reference a is {(x, d) | a points to x with relationship d}

• Table 1 shows these sets for all references available in SIMPLE


SIMPLE References

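Ref       L-location set              R-location set
&a        N/A                         {(a, D)}
&a[0]     N/A                         {(a_head, D)}
a         {(a, D)}                    {(x, d) | (a, x, d) ∈ S}
*a        {(x, d) | (a, x, d) ∈ S}    {(y, d1 ∧ d2) | (a, x, d1) ∈ S ∧ (x, y, d2) ∈ S}
malloc()  N/A                         {(heap, P)}

Here S is the current points-to set, and d1 ∧ d2 is D only when both d1 and d2 are D; otherwise it is P.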
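The two pointer-reading rows of the table (a and *a) can be sketched in C as follows. The sketch assumes a triple-array representation for S; Triple, Loc, rloc_var and rloc_deref are hypothetical names. Duplicate locations are not merged, and out must have room for MAX_LOCS entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOCS 32

typedef struct { const char *src; const char *dst; char rel; } Triple; /* (x, y, d) */
typedef struct { const char *name; char rel; } Loc;                    /* (x, d) */

/* d1 AND d2: definite only if both parts are definite. */
static char rel_and(char d1, char d2) { return (d1 == 'D' && d2 == 'D') ? 'D' : 'P'; }

/* R-locations of reference a: {(x, d) | (a, x, d) in S}. Returns the count. */
size_t rloc_var(const Triple *S, size_t n, const char *a, Loc *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(S[i].src, a) == 0) {
            assert(k < MAX_LOCS);
            out[k++] = (Loc){ S[i].dst, S[i].rel };
        }
    }
    return k;
}

/* R-locations of reference *a:
 * {(y, d1 AND d2) | (a, x, d1) in S and (x, y, d2) in S}. Returns the count. */
size_t rloc_deref(const Triple *S, size_t n, const char *a, Loc *out) {
    Loc targets[MAX_LOCS];
    size_t m = rloc_var(S, n, a, targets);   /* every x that a may point to */
    size_t k = 0;
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            if (strcmp(S[j].src, targets[i].name) == 0) {
                assert(k < MAX_LOCS);
                out[k++] = (Loc){ S[j].dst, rel_and(targets[i].rel, S[j].rel) };
            }
        }
    }
    return k;
}
```

With S = {(a, x, D), (x, y, D)}, the reference *a has the single R-location (y, D). If a only possibly points to x, every result of *a is downgraded to P.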


Basic Analysis Rules

If the statement is a sequence, process the first statement, then process the second using the first statement's output.

If the statement is a basic assignment, process it with process_basic_stmt().

If the statement is a control statement, call the corresponding function to process it.


Basic Analysis Rules

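If we are assigning to a pointer variable:

kill_set = the relationships whose source is a definite L-location of lhs(S)

change_set = the definite relationships whose source is a possible L-location of lhs(S); these are kept but downgraded to possible

gen_set = all relationships between the L-locations of lhs(S) and the R-locations of rhs(S), each with relationship d1 ∧ d2

The output set is the input minus kill_set and change_set, plus the downgraded change_set, plus gen_set.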
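A minimal sketch of the pointer-assignment rule. It assumes the L-locations of lhs(S) and the R-locations of rhs(S) have already been computed with the rules in Table 1. The types and the name assign_pointer are hypothetical, not the presentation's process_basic_stmt().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRIPLES 32

typedef struct { const char *src; const char *dst; char rel; } Triple;
typedef struct { Triple t[MAX_TRIPLES]; size_t n; } PtSet;
typedef struct { const char *name; char rel; } Loc;   /* (x, d) */

static char rel_and(char d1, char d2) { return (d1 == 'D' && d2 == 'D') ? 'D' : 'P'; }

/* Adds (src, dst, rel) unless a triple with the same src and dst is present. */
static void add(PtSet *s, const char *src, const char *dst, char rel) {
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->t[i].src, src) == 0 && strcmp(s->t[i].dst, dst) == 0)
            return;
    assert(s->n < MAX_TRIPLES);
    s->t[s->n++] = (Triple){ src, dst, rel };
}

/* Relationship ('D' or 'P') with which name is an L-location of lhs, or 0. */
static char lhs_rel(const Loc *lhs, size_t nl, const char *name) {
    for (size_t i = 0; i < nl; i++)
        if (strcmp(lhs[i].name, name) == 0) return lhs[i].rel;
    return 0;
}

/* out = (in - kill_set - change_set) + downgraded change_set + gen_set */
PtSet assign_pointer(const PtSet *in, const Loc *lhs, size_t nl,
                     const Loc *rhs, size_t nr) {
    PtSet out = { .n = 0 };
    for (size_t i = 0; i < in->n; i++) {
        Triple t = in->t[i];
        char l = lhs_rel(lhs, nl, t.src);
        if (l == 'D') continue;                      /* kill_set: dropped */
        if (l == 'P' && t.rel == 'D') t.rel = 'P';   /* change_set: downgraded */
        add(&out, t.src, t.dst, t.rel);
    }
    for (size_t i = 0; i < nl; i++)                  /* gen_set */
        for (size_t j = 0; j < nr; j++)
            add(&out, lhs[i].name, rhs[j].name, rel_and(lhs[i].rel, rhs[j].rel));
    return out;
}
```

Consider *q = &z, where q possibly points to a or b. The L-locations are (a, P) and (b, P), so nothing is killed, (a, m, D) becomes (a, m, P), and (a, z, P) and (b, z, P) are generated.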



Interprocedural Analysis

• When measuring the effect of a procedure call, we estimate it within the context in which it was called

• A calling context depends on the chain of procedure calls starting at main() and ending with the procedure being processed


Invocation Graph

• All invocation paths, beginning with main(), are represented in an invocation graph.

• If we could disregard recursion, the graph could be built using a depth-first traversal of the program’s calling structure

• With recursion, we have to approximate all possible “unrollings” of the recursion.

• The node for a recursive call is marked as an approximate node, and a special approximating edge links it to the matching recursive node (the earlier invocation of the same function on the call chain)
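Below is a sketch of invocation-graph construction by depth-first traversal, under simplifying assumptions. The call structure is a fixed table with no function pointers, and each node records only its function, its parent and its kind. When a callee already appears on the current call chain, the sketch creates an approximate node and pairs it with that earlier (recursive) node instead of expanding it again. Program, InvGraph and build_invocation_graph are hypothetical names.

```c
#include <assert.h>

#define MAX_FUNCS 8
#define MAX_CALLS 4
#define MAX_NODES 64

/* Static call structure: function f calls callees[f][0 .. ncallees[f]-1]. */
typedef struct {
    const char *name[MAX_FUNCS];
    int callees[MAX_FUNCS][MAX_CALLS];
    int ncallees[MAX_FUNCS];
} Program;

typedef enum { ORDINARY, RECURSIVE, APPROXIMATE } Kind;

typedef struct {
    int func;       /* function invoked at this node */
    Kind kind;
    int parent;     /* caller's node, -1 for main */
    int rec_node;   /* APPROXIMATE only: the matching RECURSIVE node */
} Node;

typedef struct { Node node[MAX_NODES]; int n; } InvGraph;

/* chain[0..depth] holds the node ids on the path from main to node cur. */
static void expand(const Program *p, InvGraph *g, int cur, int *chain, int depth) {
    int f = g->node[cur].func;
    for (int i = 0; i < p->ncallees[f]; i++) {
        int callee = p->callees[f][i];
        int ancestor = -1;
        for (int d = 0; d <= depth; d++) {
            if (g->node[chain[d]].func == callee) { ancestor = chain[d]; break; }
        }
        assert(g->n < MAX_NODES);
        int id = g->n++;
        g->node[id] = (Node){ callee, ORDINARY, cur, -1 };
        if (ancestor >= 0) {
            /* Recursive call: do not unroll; pair this node with the ancestor. */
            g->node[ancestor].kind = RECURSIVE;
            g->node[id].kind = APPROXIMATE;
            g->node[id].rec_node = ancestor;
        } else {
            chain[depth + 1] = id;
            expand(p, g, id, chain, depth + 1);
        }
    }
}

InvGraph build_invocation_graph(const Program *p, int main_func) {
    InvGraph g = { .n = 1 };
    g.node[0] = (Node){ main_func, ORDINARY, -1, -1 };
    int chain[MAX_FUNCS + 1];   /* a function appears at most once per chain */
    chain[0] = 0;
    expand(p, &g, 0, chain, 0);
    return g;
}
```

Suppose main calls f and g, f calls itself, and g calls f. The graph then has six nodes. f appears twice, once per calling context, and each copy is paired with its own approximate node.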


Mapping Points-to Information

• When a procedure is called, the input points-to set for the called procedure must be mapped from the points-to information at the call site.

• The mapping must include information about any multi-level pointer relationships


Mapping Points-to Information

• Parameters and globals may point to locations outside the scope of the called function. Such locations are called “invisible” variables.

• Invisible variables are represented by a symbolic name in the abstraction.

• A symbolic name can represent more than one invisible variable.


Mapping Points-to Information

• A good mapping scheme must minimize the number of invisible variables mapped to a symbolic name to improve accuracy

• The output points-to set is mapped back to obtain the output points-to information at the call site.


Recursion

• Figure 4 outlines the recursion algorithm.

• All possible unrollings of recursive calls are approximated by introducing matched pairs of recursive and approximate nodes in the invocation graph.

• Each approximate node marks a place where the current stored approximation for the function should be used. Instead of evaluating the call again (and again…), the stored output set is used as an approximation.


Recursion – Figure 4

• For approximate nodes, the current input is compared to the stored input of the matching recursive node. If the current input is contained in the stored input, then we safely use the stored output as the result.

• An approximate node never evaluates the body of a function. It either uses the stored result or returns BOTTOM.


Function Pointers

With function pointers, the invocation graph cannot be constructed correctly with just a single pass through the source code.

When a function is called through a function pointer, execution may jump to any of several functions, and which ones cannot be known for sure ahead of time.


Function Pointers

• The safest approximation would be to assume that such a call can reach any function in the program. But this would make the results far too conservative.

• Emami improves this approximation by building the invocation graph while points-to analysis is performed.


Function Pointers

• When the analysis reaches a call through a function pointer, the set of currently known possible targets of the pointer is used to limit the number of false assumptions.

• This is not perfect, because the graph is most likely incomplete at the point where the function is called, but it greatly improves upon assuming that all functions are reachable.


Function Pointer Algorithm

1. Begin building invocation graph

2. Perform points-to analysis using the incomplete graph

3. Each pointed-to function is analyzed in the context of the call


Function Pointer Example
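As an example of step 3 at a single call site, the sketch below reads the targets of a function pointer fp from the points-to set that holds at the call. It creates one invocation-graph child per target, so each target can be analyzed in the context of this call. The triple representation and the names CallSite and resolve_indirect_call are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TARGETS 8

/* One points-to relationship (src, dst, rel); rel is 'D' or 'P'. */
typedef struct {
    const char *src;
    const char *dst;
    char rel;
} Triple;

/* The invocation-graph children created for one indirect call site. */
typedef struct {
    const char *callee[MAX_TARGETS];
    size_t n;
} CallSite;

/* At the call (*fp)(...), every function that fp may point to in the current
 * points-to set S becomes a separate child of the call site, so each target
 * is analyzed in the context of this particular call. Returns the count. */
size_t resolve_indirect_call(const Triple *S, size_t n, const char *fp,
                             CallSite *site) {
    assert(site != NULL);
    site->n = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(S[i].src, fp) != 0) {
            continue;
        }
        assert(site->n < MAX_TARGETS);
        site->callee[site->n++] = S[i].dst;
    }
    return site->n;
}
```

After fp = cond ? &add : &sub; the set holds (fp, add, P) and (fp, sub, P). The call (*fp)(x, y) therefore gets two children, add and sub. A function whose address never reaches fp is not considered.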


Experimental Results

• 17 C programs analyzed

• All programs were converted to SIMPLE and then processed.


Results

• For indirect references, the average number of stack locations pointed to by the dereferenced pointer is close to 1, which is the ideal case

• This means that the analysis is very precise: very few false relationships survive the analysis.


Results

• About 30% of indirect references refer to a pointer that definitely points to a single stack location. About 20% of indirect references can then be replaced by direct references in the final program, and about two-thirds of those replacements are useful.

• The remaining 10% (definite targets that cannot be replaced) is due to the scoping problems of invisible variables: such a target cannot be named directly inside the function.


Heap Pointers

• Emami’s algorithm deals only with stack-based pointers. Heap-based pointers are ignored. In the 17 test programs, about 30% of points-to relationships used have heap locations as their target.

• Stack analysis alone is insufficient. For a more complete analysis, a companion heap analysis is needed.


Heap vs Stack

Surprisingly, none of the test programs had any points-to relationships from the heap to the stack.

This strongly supports the author’s claim that stack and heap analysis should be performed separately.


Results

• On average, each call site was found to correspond to only about two nodes in the invocation graph.

• This implies that for real programs, explicitly following call chains in an invocation graph is not too costly.

• In other words, the invocation graph does not grow too large in practice.


Problems

• Even though there are on average only about two invocation-graph nodes per call site, the invocation graph, and with it the running time, can in theory grow exponentially.

• Possible sources of this cost include recursion and function pointers.


Applications

• Once the analysis has been completed, other analyses can use the existing invocation graph and abstract stack information to provide further optimizations.

• These other analyses won’t have to take invisible variables or function pointers into account, as those problems will have already been resolved.


Related Work

• Alias Analysis

– Alias analysis was the precursor to points-to analysis.


Related Work

• Landi and Ryder

– The points-to abstraction provides more alias information in a more compact form.

– The author’s method can give more accurate results for multi-level pointers.

– The points-to method also provides a safe approximation even in the presence of pointers from the heap to the stack.

– Function pointers cannot be handled by Landi and Ryder's method.


Conclusion

• The method presented provides context-sensitive interprocedural information, as well as handling general function pointers.

• This information can be used for optimizations and transformations, including pointer replacement and array dependence testing.


The End

Patrick

Kaleem

Justin