Interprocedural Path Profiling and the Interprocedural Express-Lane Transformation

By David Gordon Melski
REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
2002
Dedicated to my parents, John and Linda Melski
Abstract
The contributions of this thesis can be broadly divided into two categories: we present novel path-profiling techniques, and we present techniques for performing the express-lane transformation, a program transformation that duplicates frequently executed paths in the hope that better data-flow facts result along those paths.
In path profiling, a program is instrumented with code that counts the number of times particular finite-length path fragments of the program’s control flow graph are executed. This thesis presents a number of extensions to the intraprocedural path-profiling technique of Ball and Larus. Several of our techniques collect information about interprocedural paths (i.e., paths that cross procedure boundaries). We show that the overhead of our techniques is not prohibitive (300–700%), and that they often capture more information than the Ball-Larus technique.
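As a rough illustration of the intraprocedural technique that this thesis extends, the following Python sketch assigns an increment to each edge of an acyclic control-flow graph so that the increments along every entry-to-exit path sum to a distinct number. It is a minimal sketch of the path-numbering idea usually attributed to Ball and Larus, not the instrumentation used in the thesis; the function names, the example graph, and the restriction to a DAG without parallel edges are assumptions made for illustration.

```python
from collections import defaultdict


def ball_larus_numbering(edges, entry, exit_):
    """Return (num_paths, edge_val) for a DAG given as (src, dst) pairs.

    num_paths[v] is the number of paths from v to exit_.
    edge_val[(src, dst)] is the increment added when that edge is taken.
    Assumes the graph is acyclic and has no parallel edges.
    """
    succs = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)

    num_paths = {}
    edge_val = {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        if v == exit_:
            num_paths[v] = 1
            return 1
        total = 0
        for w in succs[v]:
            # Paths through earlier successors occupy ids [0, total).
            edge_val[(v, w)] = total
            total += visit(w)
        num_paths[v] = total
        return total

    visit(entry)
    return num_paths, edge_val


def path_id(path, edge_val):
    """Sum the edge increments along a path given as a list of vertices."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical CFG: two consecutive if-then-else diamonds.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G")]
PATHS = [["A", "B", "D", "E", "G"], ["A", "B", "D", "F", "G"],
         ["A", "C", "D", "E", "G"], ["A", "C", "D", "F", "G"]]

NUM_PATHS, EDGE_VAL = ball_larus_numbering(EDGES, "A", "G")
PATH_IDS = [path_id(p, EDGE_VAL) for p in PATHS]
```

In an instrumented program, a register would accumulate these increments as edges are traversed, and the final sum would index a counter array. The sketch only computes the numbering.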
The express-lane transformation isolates and duplicates hot paths in a program, aiming for better data-flow facts along the duplicated path. We describe several variants of the interprocedural express-lane transformation, each of which duplicates hot paths from an interprocedural path profile. We show that an interprocedural express-lane transformation helps range analysis to determine the outcome of 0–7% more branches than the intraprocedural express-lane transformation and 1.5–19% more branches than performing no transformation.
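To see why duplicating a path can sharpen data-flow facts, consider a toy interval (range) analysis. In the Python sketch below, a variable `x` is set to 1 on a hot arm and to 2 on a cold arm, and a later branch tests `x == 1`. The helper names and the example are hypothetical and are not taken from the thesis; the sketch only shows the effect of joining facts at a merge point versus keeping them separate on duplicated paths.

```python
def join(a, b):
    """Smallest interval containing both a and b (the join at a merge point)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def decide_equals(rng, k):
    """Return True or False if `x == k` is decided for all x in rng, else None."""
    lo, hi = rng
    if lo == hi == k:
        return True
    if k < lo or k > hi:
        return False
    return None


HOT_ARM = (1, 1)    # x = 1 on the frequently executed arm
COLD_ARM = (2, 2)   # x = 2 on the other arm

# Original graph: both arms merge before the test, so the ranges are joined.
MERGED = join(HOT_ARM, COLD_ARM)
UNDUPLICATED = decide_equals(MERGED, 1)

# After duplicating the code that follows each arm, the test sees one range.
ON_HOT_COPY = decide_equals(HOT_ARM, 1)
ON_COLD_COPY = decide_equals(COLD_ARM, 1)
```

With the merge in place the branch is undecided (`None`), whereas on each duplicated copy its outcome is known, so the branch could be replaced there.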
Code growth is one drawback of the express-lane transformation. When a pair of duplicate control-flow vertices have the same data-flow facts, it is desirable to eliminate one of the vertices (e.g., by coalescing the duplicate vertices). We present several effective techniques for eliminating duplicated code that has a redundant data-flow solution; this helps to control code growth.
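One simple way to find duplicates that can be coalesced is partition refinement: start by grouping vertices that copy the same original vertex and carry the same data-flow fact, then split any group whose members disagree on the groups of their successors. The Python sketch below is a naive version of that idea, written for illustration only. It is not the thesis's reduction algorithm, and the names `origin`, `facts`, `succs`, and `coalescible_blocks` are assumptions. Successor lists are assumed to appear in the same edge order for every duplicate of a vertex.

```python
def coalescible_blocks(origin, facts, succs):
    """Group duplicated vertices into blocks that could each become one vertex.

    origin[v]: the original CFG vertex that v is a copy of
    facts[v]:  a hashable data-flow fact attached to v
    succs[v]:  successors of v, in a fixed edge order (missing means none)
    """
    keys = {v: (origin[v], facts[v]) for v in origin}
    while True:
        # Give each distinct key a small integer block id.
        ids = {}
        block = {v: ids.setdefault(keys[v], len(ids)) for v in keys}
        # A vertex's refined key is its block plus its successors' blocks.
        refined = {v: (block[v], tuple(block[w] for w in succs.get(v, ())))
                   for v in keys}
        if len(set(refined.values())) == len(ids):
            break  # no block was split, so the partition is stable
        keys = refined
    groups = {}
    for v, b in block.items():
        groups.setdefault(b, []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical hot-path graph: two copies of the chain a -> b -> c.
ORIGIN = {"a1": "a", "a2": "a", "b1": "b", "b2": "b", "c1": "c", "c2": "c"}
FACTS = {"a1": "x in [0,9]", "a2": "x in [0,9]",
         "b1": "x in [1,1]", "b2": "x in [0,9]",
         "c1": "x in [0,9]", "c2": "x in [0,9]"}
SUCCS = {"a1": ["b1"], "a2": ["b2"], "b1": ["c1"], "b2": ["c2"]}

BLOCKS = coalescible_blocks(ORIGIN, FACTS, SUCCS)
```

In this example only `c1` and `c2` end up in the same block. The copies `a1` and `a2` carry identical facts but lead to copies of `b` with different facts, so merging them would force those successors together as well.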
We also present experimental results for program optimizations that are based on: (1) performing an express-lane transformation; (2) performing range analysis; and (3) replacing decided branches and constant expressions. We show that when used with the intraprocedural express-lane transformation, this strategy leads to larger performance benefits than previously reported (0.7–13.0%). Using the interprocedural express-lane transformation also leads to performance benefits, although usually not enough to offset the costs incurred by the transformation. It is likely that a better implementation would lower these costs, possibly leading to a net performance gain.
Acknowledgements
I love being in Madison. And I have thoroughly enjoyed being a student in Madison. Even so, graduate school is hard, and I could not have accomplished anything without help.
First and foremost, I must thank my advisor Tom Reps for his patience and his guidance. I have learned a lot from Tom, including not just specific knowledge in the field of computer science, but also about how to think about problems and how to write. (Tom is an excellent editor and I wish there were time to get more feedback on the thesis; as it is, there are many rough patches for which I must take full responsibility.) I have been glad of the opportunity to work with him.
I would also like to thank my other committee members; I have tried to make the thesis easy to read, but I know it is both long and sometimes dense. I am also thankful for all of the people in the programming languages group at Wisconsin, including Susan Horwitz, Ras Bodik, Jim Larus, Tom Ball, Charles Fischer, Mike Siff, Manuvir Das, Alexey Loginov, Glenn Ammons and many more. All of these people have offered useful feedback and support. I cannot stress this enough: without the support and feedback from these people, I could not have accomplished anything. There are also colleagues outside of Wisconsin to whom I am grateful for support and suggestions, including Mooley Sagiv, Reinhard Wilhelm, Barbara Ryder, and Laurie Hendren.
I owe thanks to Glenn Ammons for his implementation of a Ball-Larus path profiler and his implementation of the intraprocedural express-lane transformation. They were a good starting point for my own implementations. I would also like to thank Mike Siff, Glenn Ammons, and Alexey Loginov for reading my prelim and calming me down before the oral presentation of my prelim.
There are other crucial players in my support network. Chief among these are my parents, John and Linda Melski. They are always there for me, and they are always supportive. I think that it is impossible to overestimate the importance of their support.
I have also been blessed with many great friends during my tenure in Madison. These include Amy Millen, Berit and Mark Givens, Eric Melski (my brother), Kasey Melski (my sister), Bill Winters, Amir Roth, Chris Lukas, Alain Roy, Alexey Loginov, and Meghan Wulster. These people have lifted my spirits countless times, and they always helped to relieve the pressures of graduate school. My soccer teams, the Crystal Corner and the Madison O2, were also great for relieving stress, both on the field and off.
There are many other people who have played an important role in my life while working on my Ph.D., and I am sure that I am forgetting to mention some important people. To those people, please know that I am grateful.
Contents
Abstract
Acknowledgements
1 Introduction
1.1 Interprocedural Path Profiling
1.2 The Interprocedural, Express-Lane Transformation
1.2.1 Reducing the Hot-path Supergraph
1.2.2 Using the Express-Lane Transformation for Optimization
1.3 Organization of the Thesis
2 Related Work
2.1 Summary of the Ball-Larus Technique for Intraprocedural Path Profiling
2.2 Improving Data-flow Analysis with Path Profiles
2.2.1 Constructing the Hot-path Graph
2.2.2 Reducing the Hot-path Graph
3 The Functional Approach to Interprocedural, Context Path Profiling
3.1 Background: The Program Supergraph and Call Graph
3.2 Modifying G∗ to Eliminate Backedges and Recursive Calls
3.2.1 G∗_fin has a Finite Number of Paths
3.3 Numbering Unbalanced-Left Paths: A Motivating Example
3.3.1 What Do You Learn From a Profile of Unbalanced-Left Paths?
3.4 Numbering L-Paths in a Finite-Path Graph
3.5 Numbering Unbalanced-Left Paths in G∗_fin
3.5.1 Connection Between Numbering Unbalanced-Left Paths in G∗_fin and Numbering L-Paths in a Finite-Path Graph
3.5.2 Assigning ψ and ρ Functions
3.5.3 Computing edgeValueInContext for interprocedural edges
3.5.4 Practical Considerations When Numbering Unbalanced-Left Paths
3.5.5 Calculating the Path Number of an Unbalanced-Left Path
3.6 Runtime Environment for Collecting a Profile
3.6.1 Optimizing the Instrumentation
3.6.2 Recovering a Path From a Path Number
3.7 Handling Other Language Features
3.7.1 Signals
3.7.2 Exceptions
3.7.3 Indirect Procedure Calls
4 The Functional Approach to Interprocedural Piecewise Path Profiling
4.1 Numbering Unbalanced-Right-Left Paths in G∗_fin
4.1.1 Calculating numValidComps from ExitP
4.1.2 Practical Considerations When Numbering Unbalanced-Right-Left Paths
4.2 Calculating the Path Number of an Unbalanced-Right-Left Path
4.3 Runtime Environment for Collecting a Profile
4.4 Comparing Path-Profiling Information Content
5 Other Path-Profiling Techniques
5.1 Intraprocedural Context Path Profiling
5.2 Interprocedural Context Path Profiling with Improved Context for Recursion
5.3 Non-Functional Approaches to Interprocedural Path Profiling
5.4 Hybrid Approaches to Path Profiling
6 Path Profiling Experimental Results
7 The Interprocedural Express-lane Transformation
7.1 Entry and Exit Splitting
7.2 Defining the Interprocedural Express-Lane
7.2.1 The Minimal Predecessor Property
7.2.2 The Context Property
7.3 Performing the Interprocedural, Express-Lane Transformation
7.3.1 The Hot-Path Automata for Interprocedural, Piecewise Paths
7.3.2 The Hot-Path Automata for Interprocedural, Context Paths
7.3.3 Step Two: Hot-Path Tracing of Intraprocedural Path Pieces
7.3.4 Step Three: Connecting Intraprocedural Path Pieces
7.4 Graph Congruence of the Supergraph and the Hot-path Supergraph
8 Experimental Results for the Express-lane Transformation
9 Reducing the Hot-path (Super)graph: Partitioning Algorithms
9.1 Definition of a Hot-path Graph Reduction Algorithm
9.1.1 A Paradigm Shift?
9.2 The Ammons/Larus Approach to Reducing the Hot-path Graph
9.2.1 Step One: Identify Hot Vertices
9.2.2 Step Two: Partition Vertices into Compatible Blocks
9.2.3 Step Three: Apply the Coarsest Partitioning Algorithm
9.3 Adapting the Coarsest Partitioning Algorithm for the Hot-Path Supergraph
9.3.1 Properties of the Supergraph Partitioning Algorithm
9.3.2 Using the Supergraph Partitioning Algorithm in the Ammons-Larus Reduction Algorithm
9.3.3 Comparing and Contrasting the Partitioning Algorithms
9.3.4 The Supergraph Partitioning Algorithm
10 Reducing the Hot-path Supergraph Using Edge Redirection
10.1 Problems Created by Performing an Edge Redirection
10.2 Determining When Edge Redirection is Possible
10.3 Determining When Edge Redirection is Profitable
10.4 Proof of Correctness
10.5 Analysis of Runtime
10.6 Updating a Path Profile After Edge Redirection
10.7 Alternating Between Graph Reduction Strategies
11 Reducing the Hot-path Graph is NP-hard
12 Experimental Results for Reducing the Hot-path Supergraph and for Program Optimization
12.0.1 The Supergraph Partitioning Algorithm
12.0.2 Edge Redirection Algorithm
12.1 Using the Express-Lane Transformation for Program Optimization
13 Related Work
13.1 Related Profiling Work
13.2 Related Path Optimization Work
14 Contributions and Future Work
Bibliography
B Runtime Environment for Collecting an Interprocedural, Context Path Profile
C Proofs for Theorems in Chapter 9
D Proofs for Theorems in Chapter 10
E Determining If J′ Preserves the Valuable Data-Flow Facts of J
List of Tables
1 Example path profile for Figure 3.
2 Paths for Figure 1 translated to the hot-path graph in Figure 6.
3 Path profiling statistics when the profiled SPEC benchmark is run on its reference input.
4 Path profiling statistics when the profiling SPEC benchmark is run on its reference input.
5 Path profiling statistics when the profiling SPEC benchmark is run on its reference input.
6 Runtime of the SPEC95Int benchmarks with and without interprocedural path profiling instrumentation.
7 Interprocedural path profiling overhead.
8 Comparison of the cost of performing various express-lane transformations and the cost of performing interprocedural range analysis after an express-lane transformation has been performed.
9 Comparison of the results of range analysis after various express-lane transformations have been…