PRESTO Research Group, Ohio State University

Interprocedural Dataflow Analysis in the Presence of Large Libraries

Atanas (Nasko) Rountev, Scott Kagan
Ohio State University

Thomas Marlowe
Seton Hall University

3/30/06

CC 2006, Scott Kagan, PRESTO Research Group

Uses of Interprocedural Dataflow Analysis

- Performance optimizations in compilers
- Software understanding and transformation
  - e.g. dependence analysis for program slicing, change impact analysis, refactoring, etc.
- Software testing
  - e.g. dataflow-based testing; testing of object interactions in OO software
- Software checking
  - e.g. object protocols: open(read|write)*close


Model for Interprocedural Whole-Program Analysis

- Components C1, C2, …, Cn form a complete program
- Assumption: it is possible and desirable to analyze the source code of the entire program

[Diagram: code for C1, code for C2, …, code for Cn → Engine for Whole-Program Dataflow Analysis → dataflow solution for C1 + C2 + … + Cn]


A Specific Case: Main + Lib

- Main + Lib form a complete program
- What if we are using large libraries that need to be re-analyzed from scratch?
  - e.g. the standard Java libraries contain about 10,000 classes and 80,000 methods
  - need to be re-analyzed with every new Main component

[Diagram: code for Main, code for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main + Lib]


Example: Methods in Java Programs

[Chart: number of methods (axis from 0 to 20,000), comparing user methods with library methods]


A Specific Case: Main + Lib

- Goal: the solution for Main should be as good as the solution that would have been computed by a whole-program analysis (no loss of precision)

[Diagram: code for Lib → Summary Generation Analysis → summary for Lib; code for Main + summary for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main]


Functional Approach to Whole-Program Analysis

- Sharir-Pnueli 1981
- Dataflow lattice $L$
- Edge function $f: L \to L$ for effects of a statement
- Path function: $f = f_n \circ f_{n-1} \circ \dots \circ f_2 \circ f_1$
- Phase 1: summary functions $\phi_n: L \to L$
  - solution at node $n$ as a function of the solution at the entry of $n$’s procedure
- Phase 2: solutions at start nodes of procedures
- Phase 3: solutions at the remaining nodes


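Example: Functional Approach

[Figure: interprocedural control-flow graph with procedures main (nodes 1–6, edge functions $f_0$, $f_1$), p1 (nodes 7–13, $f_2$, $f_3$), p2 (nodes 14–24, $f_4$, $f_5$, $f_6$), and p3 (nodes 25–28, $f_7$, $f_8$)]

$\phi_{28} = f_8 \circ f_7$

$\phi_{21} = f_4 \wedge f_5 \wedge (\phi_{28} \circ f_6)$

$\phi_{13} = (\phi_{21} \circ f_2) \wedge (\phi_{21} \circ f_3)$

$\phi_6 = \phi_{13} \circ f_1 \circ f_0$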
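The four summary functions of this example can be evaluated mechanically once concrete edge functions are chosen. The sketch below is only an illustration: it reads juxtaposition in the slide's equations as function composition and the separators between terms as the lattice meet, it assumes a "may" problem over sets of facts (so meet is set union), and the gen/kill edge functions `f[0]` to `f[8]` are hypothetical stand-ins for the unspecified functions in the figure.

```python
def gen_kill(gen=(), kill=()):
    """Edge function x -> (x - kill) | gen over sets of dataflow facts."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda facts: (frozenset(facts) - kill) | gen


def compose(*fs):
    """compose(fn, ..., f1) applies f1 first and fn last."""
    def composed(facts):
        for f in reversed(fs):
            facts = f(facts)
        return facts
    return composed


def meet(*fs):
    """Pointwise meet; set union is assumed here (a 'may' problem)."""
    def met(facts):
        result = frozenset()
        for f in fs:
            result |= f(facts)
        return result
    return met


# Hypothetical edge functions: f[i] generates the fact "d<i>".
f = {i: gen_kill(gen={f"d{i}"}) for i in range(9)}
f[7] = gen_kill(gen={"d7"}, kill={"d6"})  # f7 additionally kills d6

# Phase 1, bottom-up: p3 first, then p2, p1, and finally main.
phi28 = compose(f[8], f[7])
phi21 = meet(f[4], f[5], compose(phi28, f[6]))
phi13 = meet(compose(phi21, f[2]), compose(phi21, f[3]))
phi6 = compose(phi13, f[1], f[0])

# Solution at node 6 for an empty set of facts at the entry of main.
facts_at_6 = phi6(frozenset())
```

Representing each function as a closure keeps the sketch short; an analysis would normally need a representation whose compositions and meets can be stored and compared directly.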


Callbacks

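- Callbacks
  - e.g. function pointers in C
  - e.g. virtual dispatch in C++ and Java
- Can no longer determine $\phi_{21}$ and $\phi_{13}$ without code for ext

[Figure: interprocedural control-flow graph with procedures main (nodes 1–6, edge functions $f_0$, $f_1$), p1 (nodes 7–13, $f_2$, $f_3$), p2 (nodes 14–24, $f_4$, $f_5$, $f_6$), p3 (nodes 25–28, $f_7$, $f_8$), and a callback procedure ext (nodes 29–31, $f_9$)]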


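Library Summary

- Idea: run “pieces” of phase 1
- Compute functions for sets of library-local paths

[Figure: library procedures p1 (nodes 7–13, edge functions $f_2$, $f_3$), p2 (nodes 14–24, $f_4$, $f_5$, $f_6$), and p3 (nodes 25–28, $f_7$, $f_8$). Library-local paths marked in the figure: 14→16, 14→21, 17→21, 7→11, 12→13. Summary functions labelled in the figure: $\phi = \mathrm{id}$, $\phi = f_8 \circ f_7 \circ f_6$, $\phi = f_4 \wedge f_5$, $\phi = f_2 \wedge f_3$, $\phi = \mathrm{id}$]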


Library Summary Generation

- “Fixed” call in the library
  - always invokes the same library procedure, independent of code for main component
- “Fixed” procedure in the library
  - makes no calls, or makes only fixed calls to fixed procedures
  - standard functional approach can be applied
- For any other procedure, compute $\phi_{k \to n}$, where
  - $k$ is the start node, or
  - $k$ is a return from a non-fixed call, or
  - $k$ is a return from a fixed call to a non-fixed procedure
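The definition of fixed procedures is recursive, so it can be computed as a fixpoint over the library's call sites. The sketch below is one possible reading, not necessarily the authors' algorithm: it starts from the assumption that every procedure is fixed and removes any procedure that makes a non-fixed call or a fixed call to a procedure already removed. The function name `fixed_procedures` and the tuple encoding of call sites are hypothetical. The sample data follows the deck's running example, where p1 calls p2, p2 calls p3, and p2 also contains a call whose target depends on the main component.

```python
def fixed_procedures(procedures, calls):
    """Return the set of fixed library procedures.

    calls: iterable of (caller, callee, is_fixed_call); callee is None
    when the target is not known without the main component.
    """
    calls = list(calls)
    fixed = set(procedures)          # optimistic start: everything is fixed
    changed = True
    while changed:
        changed = False
        for caller, callee, is_fixed_call in calls:
            if caller not in fixed:
                continue
            # A non-fixed call, or a fixed call to a non-fixed procedure,
            # makes the caller non-fixed as well.
            if not is_fixed_call or callee not in fixed:
                fixed.discard(caller)
                changed = True
    return fixed


# Running example (call targets are an assumption read off the equations).
library = ["p1", "p2", "p3"]
call_sites = [
    ("p1", "p2", True),    # call 11-12: always invokes p2
    ("p2", None, False),   # call 16-17: target depends on the main component
    ("p2", "p3", True),    # call 23-24: always invokes p3
]
fixed = fixed_procedures(library, call_sites)
non_fixed = set(library) - fixed
```

Starting from "all fixed" means that mutually recursive procedures containing only fixed calls stay fixed; a stricter reading that starts from call-free procedures and grows the set would classify them as non-fixed.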


Example: Library Summary Generation

[Figure: library procedures p1 (nodes 7–13, edge functions $f_2$, $f_3$), p2 (nodes 14–24, $f_4$, $f_5$, $f_6$), and p3 (nodes 25–28, $f_7$, $f_8$)]

- Fixed calls: 11-12 and 23-24
- Non-fixed calls: 16-17
- Fixed procedures: p3
- Non-fixed procedures: p1 and p2
- Contexts $k$ for $\phi_{k \to n}$
  - 7 and 14: start nodes
  - 17: return from a non-fixed call
  - 12: return from a fixed call to a non-fixed procedure
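The three rules for choosing the contexts k can be applied directly once calls and procedures have been classified. The sketch below assumes that classification is already available; the `Call` record and the function `summary_contexts` are hypothetical names, and the callee of each call site is an assumption read off the example.

```python
from typing import NamedTuple, Optional


class Call(NamedTuple):
    call_node: int
    return_node: int
    callee: Optional[str]  # None when the target depends on the main component
    fixed: bool


def summary_contexts(start_nodes, calls_by_proc, fixed_procs):
    """Collect the nodes k from which summary functions are computed."""
    contexts = set()
    for proc, start in start_nodes.items():
        if proc in fixed_procs:
            continue  # fixed procedures use the standard functional approach
        contexts.add(start)  # rule 1: start node of a non-fixed procedure
        for call in calls_by_proc.get(proc, []):
            if not call.fixed:
                contexts.add(call.return_node)  # rule 2: non-fixed call
            elif call.callee not in fixed_procs:
                contexts.add(call.return_node)  # rule 3: fixed call, non-fixed callee
    return contexts


start_nodes = {"p1": 7, "p2": 14, "p3": 25}
calls_by_proc = {
    "p1": [Call(11, 12, "p2", True)],
    "p2": [Call(16, 17, None, False), Call(23, 24, "p3", True)],
}
contexts = summary_contexts(start_nodes, calls_by_proc, fixed_procs={"p3"})
```

On this input the result contains 7, 12, 14, and 17, which matches the contexts listed on the slide; the return node 24 is excluded because call 23-24 is a fixed call to the fixed procedure p3.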


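The Condensed Graph

[Figure: the original library graph (p1: nodes 7–13; p2: nodes 14–24; p3: nodes 25–28; edge functions $f_2$ to $f_8$) shown next to the condensed graph, which keeps only nodes 7, 11, 12, 13 of p1 and nodes 14, 16, 17, 21 of p2. Paths in the condensed graph: 14→16, 14→21, 17→21, 7→11, 12→13. Summary functions labelled in the figure: $\phi = \mathrm{id}$, $\phi = f_8 \circ f_7 \circ f_6$, $\phi = f_4 \wedge f_5$, $\phi = f_2 \wedge f_3$, $\phi = \mathrm{id}$]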


Analysis of a Main Component

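[Figure: main (nodes 1–6, edge functions $f_0$, $f_1$) and ext (nodes 29–31, $f_9$) combined with the condensed library graph (p1: nodes 7, 11, 12, 13; p2: nodes 14, 16, 17, 21)]

- Create a “fake” graph for the whole program
- Run a whole-program analysis engine
- Safe solutions for non-library nodes
  - precise for distributive problems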
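For a distributive problem, the solution obtained from the condensed library graph should coincide with the meet over all paths of the original program. The sketch below checks this on the running example under several assumptions that the extracted slides do not state outright: summary functions are paired with paths as 7→11: f2 ∧ f3, 12→13: id, 14→16: id, 17→21: f4 ∧ f5, 14→21: f8 ∘ f7 ∘ f6; main calls p1; the non-fixed call 16-17 invokes ext; the whole effect of ext is f9; and meet is set union. The gen/kill edge functions are hypothetical.

```python
def gen_kill(gen=(), kill=()):
    """Edge function x -> (x - kill) | gen over sets of dataflow facts."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda facts: (frozenset(facts) - kill) | gen


def compose(*fs):
    """compose(fn, ..., f1) applies f1 first and fn last."""
    def composed(facts):
        for f in reversed(fs):
            facts = f(facts)
        return facts
    return composed


def meet(*fs):
    """Pointwise meet; set union is assumed here (a 'may' problem)."""
    def met(facts):
        result = frozenset()
        for f in fs:
            result |= f(facts)
        return result
    return met


identity = gen_kill()
f = {i: gen_kill(gen={f"d{i}"}) for i in range(10)}  # hypothetical f0..f9
f[7] = gen_kill(gen={"d7"}, kill={"d6"})
f[9] = gen_kill(gen={"d9"}, kill={"d0"})

# Library summary: computed once, without any code for main or ext.
summary = {
    (7, 11): meet(f[2], f[3]),
    (12, 13): identity,
    (14, 16): identity,
    (17, 21): meet(f[4], f[5]),
    (14, 21): compose(f[8], f[7], f[6]),
}

# Analysis of the main component: plug ext into the condensed graph.
phi_ext = f[9]
phi21 = meet(
    compose(summary[(17, 21)], phi_ext, summary[(14, 16)]),
    summary[(14, 21)],
)
phi13 = compose(summary[(12, 13)], phi21, summary[(7, 11)])
phi6 = compose(phi13, f[1], f[0])

# Reference: meet over all paths of the assumed original graph,
# each path given as the sequence of edge-function indices it crosses.
paths = [
    [0, 1, 2, 9, 4], [0, 1, 2, 9, 5], [0, 1, 2, 6, 7, 8],
    [0, 1, 3, 9, 4], [0, 1, 3, 9, 5], [0, 1, 3, 6, 7, 8],
]
reference = meet(*(compose(*(f[i] for i in reversed(p))) for p in paths))

same_on_empty = phi6(frozenset()) == reference(frozenset())
```

The two results agree here because gen/kill functions distribute over union; for a non-distributive problem the condensed graph would still give a safe solution, as the slide notes, but not necessarily the same one.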


Original vs. Condensed Library CFGs: Number of Nodes

[Chart: number of CFG nodes (axis from 0 to 350,000), comparing nodes in original CFGs with nodes in condensed CFGs]


Original vs. Condensed Library CFGs: Number of Edges

[Chart: number of CFG edges (axis from 0 to 400,000), comparing edges in original CFGs with edges in condensed CFGs]


Discussion

- Flow and context insensitivity
- Cost reduction: time and memory
- Compact representation of functions
  - IFDS, IDE
- Use assumptions about the callback methods?
  - e.g. assume callback methods are “good”
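A summary is only useful if its functions have a compact representation that is closed under composition and meet. As a small illustration of that requirement (this is the classic gen/kill case, not IFDS or IDE themselves), the sketch below stores a function as a pair of sets and computes compositions and meets on the pairs directly. The class name `GenKill` and its methods are hypothetical, and meet is assumed to be set union.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenKill:
    """The function x -> (x - kill) | gen, stored as two sets."""
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()

    def __call__(self, facts):
        return (frozenset(facts) - self.kill) | self.gen

    def after(self, first):
        """Composition self ∘ first: apply `first`, then `self`."""
        return GenKill(
            gen=(first.gen - self.kill) | self.gen,
            kill=first.kill | self.kill,
        )

    def meet(self, other):
        """Pointwise meet when the lattice meet is set union."""
        return GenKill(gen=self.gen | other.gen, kill=self.kill & other.kill)


# Hypothetical edge functions for a path that crosses f6, f7, then f8.
f6 = GenKill(gen=frozenset({"d6"}))
f7 = GenKill(gen=frozenset({"d7"}), kill=frozenset({"d6"}))
f8 = GenKill(gen=frozenset({"d8"}))

# The whole path collapses into one gen/kill pair.
summary = f8.after(f7).after(f6)
```

However long the path, the stored summary stays a single pair of sets, which is what makes precomputing and saving library summaries practical for this class of problems.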