Quantifying Uncertainty in Points-To Relations

Constantino Ribeiro and Marcelo Cintra
University of Edinburgh
http://www.homepages.inf.ed.ac.uk/mc/Projects/VESPA

research.ihost.com/lcpc06/presentations/11_presentation.pdf · 2006-11-17


LCPC 2006

Contributions

• Scope
  – Measure and compare sizes of static vs. dynamic points-to sets from a context- and flow-sensitive algorithm
• Goal
  – Quantification of may-alias behavior that is intrinsic to applications
  – Classification of the reasons for differences between static prediction and run-time behavior
• Relevance
  – Important step toward future aggressive (speculative) optimizations

This work is not about a new pointer analysis algorithm

Outline

• Motivation
• Pointer Analysis
• Evaluation Methodology
• Experimental Setup and Results
• Related Work
• Conclusions

Compiler Optimizations

• To make good optimizations, a compiler must have accurate knowledge of:
  – Data flow:
      • Redundant variable elimination
      • Constant propagation
      • Register allocation
  – Control flow:
      • Dead code elimination
      • Instruction scheduling

Data Flow Analysis

• Data flow analysis: 100% precision is difficult to achieve because of:
  – Use of pointer variables
      • The same pointer may refer to different memory objects at different times
      • The same pointer may refer to many memory objects at some program point
  – Use of procedures
      • Side effects caused by call by reference and access to global data
  – Presence of control flow structures
      • Multiple def-use chains

Real Points-to Behavior

• So we want to:
  – Understand the points-to behavior in real applications
  – Discover the causes of the ambiguities from static analysis
  – Facilitate more aggressive optimizations for ambiguous points-to relations


Points-to analysis

• Data dependence analysis for pointer variables
• At each point of the program: the set of pointer variables and the locations that they point to
• Pointer variables may point to one address or to many addresses
• Pointer variables can even point to other pointers
• Many possible points-to targets restrict optimizations in conservative compilers
• Procedures and their calls increase the complexity and time of the analysis

Types of Algorithms

• Sensitivity:
  – Flow-sensitive: points-to sets are computed for each program point
  – Flow-insensitive: a single points-to solution is computed regardless of statement order
  – Context-sensitive: points-to sets within procedures are computed for each call site
  – Context-insensitive: points-to sets within procedures are merged over all call sites
  – Flow-sensitive + context-sensitive → more precise analysis
• Granularity:
  – Fine: individual fields of complex data structures
  – Coarse: whole data structures and arrays
• Naming of dynamically created memory objects:
  – Single name "heap"
  – Per memory allocation site
  – Per context

Formal Representation

• Location sets (locsets): individual named memory locations
• Points-to relations (R): tuples (p, v) where
  – p: pointer
  – v: location set
• P and V: the sets of pointers and location sets, where
  – R ⊂ P × V: points-to relation
• Every tuple (p, v) ∈ R means: pointer p may point to location set v, written p → v
• Points-to graph: G = (N, E) with N = P ∪ V nodes and E = R edges
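As a minimal sketch of this representation (not code from the authors' tool), the relation R can be held as a set of (pointer, location set) pairs, from which the graph nodes and edges follow directly. The names p, q, a and b are hypothetical, and P and V are derived here from R alone, whereas in the slides they are the full sets of pointers and location sets.

```python
# Hypothetical points-to relation: p may point to a or b, q may point to p.
R = {("p", "a"), ("p", "b"), ("q", "p")}

P = {p for p, _ in R}   # pointers that appear in R
V = {v for _, v in R}   # location sets that appear in R
N = P | V               # graph nodes: N = P ∪ V
E = R                   # graph edges: E = R


def successors(node):
    """Location sets reachable from `node` over one points-to edge."""
    return {v for p, v in E if p == node}
```

Here p is both a pointer and a location set (q points to it), which is how the graph expresses pointers to pointers.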

Formal Representation

• Analysis: compute the points-to graph from:
  – Basic dataflow equations for the pointer manipulation operations:
      p1 = &p2;   (Address-of assignment)
      p1 = p2;    (Copy assignment)
      p1 = *p2;   (Load assignment)
      *p1 = p2;   (Store assignment)
  – Result: a points-to graph holding all points-to relationships, each of which is either:
      • Definitely points-to
      • Possibly points-to
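To make the four assignment forms concrete, here is a small sketch of how each one adds edges to a points-to graph. It is a flow-insensitive, context-insensitive inclusion-based (Andersen-style) fixed point, chosen only because it is short; it is not the context- and flow-sensitive Rugina and Rinard algorithm that the slides build on. The tuple encoding of statements and the function name `analyze` are assumptions made for this illustration.

```python
from collections import defaultdict


def analyze(statements):
    """statements: list of (kind, lhs, rhs) with kind in
    {"addr", "copy", "load", "store"}. Returns pointer -> set of targets."""
    pts = defaultdict(set)
    changed = True
    while changed:                      # iterate until no new edge appears
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":          # lhs = &rhs
                new = {(lhs, rhs)}
            elif kind == "copy":        # lhs = rhs
                new = {(lhs, v) for v in pts[rhs]}
            elif kind == "load":        # lhs = *rhs
                new = {(lhs, v) for t in pts[rhs] for v in pts[t]}
            elif kind == "store":       # *lhs = rhs
                new = {(t, v) for t in pts[lhs] for v in pts[rhs]}
            else:
                raise ValueError(kind)
            for p, v in new:
                if v not in pts[p]:
                    pts[p].add(v)
                    changed = True
    return {p: s for p, s in pts.items() if s}


# Hypothetical program: p = &a; q = &b; pp = &p; *pp = q; r = *pp;
program = [
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("addr", "pp", "p"),
    ("store", "pp", "q"),
    ("load", "r", "pp"),
]
result = analyze(program)
```

Because statement order is ignored in this sketch, p ends up with both a and b as possible targets; a flow-sensitive analysis would instead keep a separate points-to set for each program point.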

Formal Representation

Where:

• Definitely points-to: R = {(p, v)}, only p = &v
• Possibly points-to: R = {(p, v), (p, z)}, either p = &v or p = &z
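One common way the "possibly" case arises is a control-flow join: each branch assigns a single target, and the analysis merges both sets after the branch. The sketch below illustrates this under stated assumptions; the function `classify` and the fragment in the comment are hypothetical and not taken from the slides.

```python
def classify(targets):
    """Label a points-to set for one pointer the way the slide does."""
    if len(targets) == 1:
        return "definitely points-to"
    if len(targets) > 1:
        return "possibly points-to"
    return "no target"


# Hypothetical fragment:  if (c) p = &v; else p = &z;
then_branch = {"v"}                     # targets of p at the end of the then-branch
else_branch = {"z"}                     # targets of p at the end of the else-branch
after_if = then_branch | else_branch    # the two sets are merged at the join
```

Inside each branch p is classified as definitely points-to; after the join it becomes possibly points-to, even if one branch is never taken at run time.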

Causes of Uncertainty in Pointer Analysis

• Control flow
• Pointer arithmetic
• Unavailable procedure code
• Recursive data structures
• Aggregate data structures
• Dynamically allocated objects


Static Source Code Analysis

• An extension of Rugina and Rinard’s context- and flow-sensitive pointer analysis algorithm with the following new features:
  – Number of accesses through a pointer de-reference
  – Number of used and modified locsets, computed just before each:
      • Indirect use of a variable: ... = *p;
      • Indirect modification of a variable: *p = ...;
      • Multi-level indirect use of a variable: ... = **p;
      • Multi-level indirect modification of a variable: **p = ...;
      • Procedure call: foo(..., *p, ...);
  – Loops: one instance of the cases above per pointer de-reference
  – Procedures: one instance of each pointer de-reference per calling context

Run-time Statistics Collection

• Our tool inserts additional profiling code that:
  – Records all the different run-time memory addresses
  – Counts the number of accesses to each different address
• Each run-time access has a unique identifier (source code line number) that matches the run-time access to the static access
• Problem: possible mismatches between static and dynamic accesses:
  – Multiple static accesses may map to the same source code line with the same run-time counter:
      • The pointer analysis algorithm separates static accesses according to their context
  – Not all static accesses may appear at run time:
      • Portion of the code not executed due to input data
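The bookkeeping described here can be sketched as a table keyed by access identifier, holding one counter per distinct address. This is an illustration only, not the instrumentation the tool emits (which is C code); the class name `AccessProfile`, the identifier format "file.c:42" and the addresses are all hypothetical.

```python
from collections import Counter, defaultdict


class AccessProfile:
    """Per-access record of the distinct addresses seen at run time."""

    def __init__(self):
        # access id -> (address -> number of accesses)
        self.hits = defaultdict(Counter)

    def record(self, access_id, address):
        self.hits[access_id][address] += 1

    def bucket(self, access_id):
        """Classify an access by its number of distinct run-time targets."""
        n = len(self.hits.get(access_id, ()))
        if n == 0:
            return "NE"                  # never executed with this input
        return f"N = {n}" if n <= 2 else "N > 2"


profile = AccessProfile()
for addr in (0x1000, 0x1000, 0x1008):    # hypothetical addresses
    profile.record("file.c:42", addr)
```

Keying by source line reproduces the first mismatch noted above: several static accesses that the analysis separates by calling context all feed the same run-time counter.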


Experimental Setup

• Applications:
  – SPEC2000 integer
      • Except gcc, gap, vortex and eon
  – MediaBench
  – SPEC2000 fp tried but found not to be interesting as a pointer analysis problem
• Standard input set used for the run-time experiments

Application Characteristics

Application  Suite     Lines of Code (KLOC)  Total Location Sets (Pointer)  Pointer Uses  Pointer Modifications
164.gzip     SPEC Int  9.1                   1,750 (246)                    113           43
175.vpr      SPEC Int  17                    3,959 (649)                    960           428
181.mcf      SPEC Int  1.9                   506 (194)                      16            13
186.crafty   SPEC Int  12                    4,920 (469)                    4,716         672
197.parser   SPEC Int  12                    3,631 (917)                    10,587        83
256.bzip2    SPEC Int  2.9                   [4887 (85)]                    [fused]       0
300.twolf    SPEC Int  17.5                  [7515,262 (950)]               [fused]       79

[fused]: in the source the pointer-uses value runs directly into the location-set total, shown in brackets, and the two cannot be separated with certainty.

Application Characteristics

Application  Suite       Lines of Code (KLOC)  Total Location Sets (Pointer)  Pointer Uses  Pointer Modifications
epic         MediaBench  7.6                   397 (105)                      37            13
unepic       MediaBench  7.6                   531 (242)                      18            6
mpeg2enc     MediaBench  8.5                   2,179 (455)                    116           276
mpeg2dec     MediaBench  4.9                   1,605 (295)                    140           85
g721-enc     MediaBench  1                     [2393 (68)]                    [fused]       0
g721-dec     MediaBench  1                     122 (36)                       4             2
gsmencode    MediaBench  5.8                   [22448 (133)]                  [fused]       0
gsmdecode    MediaBench  5.8                   1566 (599)                     168           31

[fused]: in the source the pointer-uses value runs directly into the location-set total, shown in brackets, and the two cannot be separated with certainty.

Static Analysis Tool

• Extension of the SPAN package that:
  – Records all instances of pointer de-references, with the number of possible targets and the source code line number
  – Counts uses and modifications via pointer de-references separately
  – Represents static de-references to potentially uninitialized pointers with a special location set (unk) and counts them separately
  – Represents static de-references to dynamically allocated memory with a special location set (heap.X, where X is the context id) and counts them separately

Static Analysis Results

Uses (u) and modifications (m) with N possible targets. Entries with a parenthesis read: count (including unk target, including heap target, number of source code lines).

Application       N = 1    N = 2               N = 3               N > 3
gzip         u    277      none                none                none
             m    43       none                none                none
vpr          u    2488     none                none                none
             m    428      none                none                none
mcf          u    67       none                none                6 (0, 0, 3)
             m    13       none                none                0 (0, 0, 0)
crafty       u    4970     542 (534, 67, 59)   2 (2, 2, 1)         119 (0, 26, 24)
             m    479      47 (45, 11, 9)      146 (146, 66, 13)   0 (0, 0, 0)
parser       u    25178    241 (241, 241, 35)  36 (0, 0, 11)       7841 (181, 230, 259)
             m    20       32 (32, 32, 6)      0 (0, 0, 0)         31 (9, 4, 9)
bzip2        u    119      none                none                none
             m    0        none                none                none
twolf        u    3687     6 (6, 0, 6)         none                none
             m    77       2 (2, 0, 2)         none                none

Static Analysis Results

Uses (u) and modifications (m) with N possible targets. Entries with a parenthesis read: count (including unk target, including heap target, number of source code lines).

Application       N = 1    N = 2               N = 3               N > 3
epic         u    156      none                none                none
             m    13       none                none                none
unepic       u    59       none                none                none
             m    6        none                none                none
mpeg2enc     u    395      none                none                none
             m    279      none                none                none
mpeg2dec     u    499      8 (8, 8, 2)         none                6 (6, 6, 1)
             m    75       0 (0, 0, 0)         none                10 (10, 10, 2)
g721-enc     u    22       none                none                none
             m    0        none                none                none
g721-dec     u    6        none                none                none
             m    2        none                none                none
gsmencode    u    154      none                none                none
             m    0        none                none                none
gsmdecode    u    346      none                none                9 (0, 0, 9)
             m    31       none                none                0 (0, 0, 0)

Profiling Environment

• Monitors the actual run-time behaviour of static pointer de-references with multiple possible targets
• The SPAN extension inserts profiling code wherever a static de-reference has multiple targets, recording the actual address accessed plus a counter per address
• Instrumented code is converted from SUIF format (.spd) to C code
• Compiled on an Intel x86 platform with gcc 3.4.4 at the -O2 optimization level

Run-time Uncertainty

Uses and modifications with N actual targets, counted in source code lines (NE: not executed)

              Uses with N actual targets     Modifications with N actual targets
Application   NE    N = 1  N = 2  N > 2      NE    N = 1  N = 2  N > 2
mcf           1     2      0      0          -     -      -      -
crafty        59    1      1      23         17    0      0      5
parser        193   27     0      85         6     1      0      8
twolf         1     5      0      0          2     0      0      0
mpeg2dec      2     0      0      1          1     0      0      1
gsmdecode     0     9      0      0          -     -      -      -

Example (crafty uses): the static results list 542 (534, 67, 59) uses for N = 2, 2 (2, 2, 1) for N = 3 and 119 (0, 26, 24) for N > 3, which cover 59 + 1 + 24 = 84 source code lines. At run time the same 84 lines split into 59 + 1 + 1 + 23 = 84 (NE, N = 1, N = 2, N > 2).

Causes of Uncertainty

Behaviour difference
Static                      Actual              Cause                                                Number of cases
2 or more targets           Not executed        -                                                    282
2 targets (inclusive unk)   Single target       Pointer turns out to be always initialised           6
2 targets                   3 or more targets   Pointer arithmetic to index into array-like object   22
                                                Use of arrays                                        2
                                                Use of recursive data structures                     5
3 or more targets           Single target       Use of structure fields                              2
                                                Pointer arithmetic to index into array-like object   9
                                                Control path alternative never taken                 28
No change                                       -                                                    95


Related Work

• Algorithms:
  – The basic SUIF1 package used in our study (SPAN) was introduced by R. Rugina and M. Rinard (PLDI '99)
  – E. M. Nystrom et al. proposed a fast and efficient summary-based pointer analysis algorithm (SAS '04)
  – M. Hind surveyed the main pointer analysis research and discussed unsolved questions (PASTE '01)
• Quantification of run-time behavior:
  – Few works have investigated the impact of pointer analysis on overall compiler optimization, among them B. Cheng and W. M. Hwu, M. Das et al., and R. Ghiya et al. (SIGPLAN '00 - PLDI, SAS '04, SIGPLAN '01 - PLDI)
  – An attempt to quantify the run-time behavior of points-to sets was made by M. Mock et al. (PASTE '01)
  – D. Liang et al. did similar work, but on Java programs (ISSTA '02)

Related Work

• Speculative probabilistic analysis:
  – A quantitative computation of static points-to results against run-time behavior in a probabilistic framework was proposed by Y. S. Hwang et al. (LCPC '01)
  – Support for speculative analysis of points-to was proposed by J. Lin, T. Chen et al. (PLDI '03)
  – G. Ramalingam proposed to extend static analysis with probabilistic information reflecting the actual run-time behavior (SIGPLAN '01 - PLDI)


Conclusions

• For most of the benchmarks static pointer analysis is very accurate
• For some benchmarks up to 25% of the de-references cannot be fully disambiguated statically
• 27% of these de-references access a single memory location at run time, but many do access several different memory locations
• Results suggest further compiler optimizations exploiting cases where the uncertainty does not appear at run time
  – We need to improve the handling of pointer arithmetic
  – New probabilistic approaches that capture actual control flow behavior
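The slides do not show how the 25% and 27% figures were derived. The sketch below is one reading that happens to be consistent with the tables: it assumes the rows of the Run-time Uncertainty slide read (NE, N = 1, N = 2, N > 2), that 27% is the share of executed ambiguous source lines with a single run-time target, and that 25% refers to parser's share of static de-references with more than one possible target. All variable names are hypothetical.

```python
# (NE, N = 1, N = 2, N > 2) per application, uses then modifications,
# transcribed from the Run-time Uncertainty slide; None marks "-" entries.
run_time = {
    "mcf":       ((1, 2, 0, 0),     None),
    "crafty":    ((59, 1, 1, 23),   (17, 0, 0, 5)),
    "parser":    ((193, 27, 0, 85), (6, 1, 0, 8)),
    "twolf":     ((1, 5, 0, 0),     (2, 0, 0, 0)),
    "mpeg2dec":  ((2, 0, 0, 1),     (1, 0, 0, 1)),
    "gsmdecode": ((0, 9, 0, 0),     None),
}

totals = [0, 0, 0, 0]
for uses, mods in run_time.values():
    for row in (uses, mods):
        if row is not None:
            totals = [t + x for t, x in zip(totals, row)]

not_executed, single, two, more = totals
executed = single + two + more
single_share = 100 * single / executed          # assumed basis of "27%"

# parser static counts (uses + modifications), from Static Analysis Results
parser_multi = (241 + 36 + 7841) + (32 + 0 + 31)
parser_single = 25178 + 20
parser_share = 100 * parser_multi / (parser_multi + parser_single)  # assumed basis of "up to 25%"
```

Under these assumptions the totals are 282 lines not executed and 169 executed, of which 45 have a single run-time target (about 27%), and parser has about 24.5% of its static de-references left with multiple targets.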
