HAL Id: tel-01677897
https://tel.archives-ouvertes.fr/tel-01677897v2
Submitted on 12 Jan 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Static analysis of functional programs with an application to the frame problem in deductive verification
Oana Fabiana Andreescu

To cite this version: Oana Fabiana Andreescu. Static analysis of functional programs with an application to the frame problem in deductive verification. Other [cs.OH]. Université Rennes 1, 2017. English. NNT: 2017REN1S047. tel-01677897v2


THESIS / UNIVERSITÉ DE RENNES 1
under the seal of Université Bretagne Loire

for the degree of
DOCTOR OF THE UNIVERSITÉ DE RENNES 1

Specialization: Computer Science
Doctoral school: Matisse

presented by

Oana Fabiana Andreescu
prepared at Prove & Run and at research unit 6074 – IRISA
(Institut de Recherche en Informatique et Systèmes Aléatoires)

Static Analysis of Functional Programs with an Application to the Frame Problem in Deductive Verification

Thesis defended in Rennes on 29 May 2017
before a jury composed of:

Sandrine Blazy, Professor, President
Catherine Dubois, Professor, Reviewer
Antoine Miné, Professor, Reviewer
Sylvain Conchon, Professor, Examiner
Thomas Jensen, Professor, Thesis supervisor
Stéphane Lescuyer, Engineer, Thesis co-supervisor


UNIVERSITÉ DE RENNES 1

Abstract

Prove & Run
École doctorale Matisse

Doctor of the Université de Rennes 1

Static Analysis of Functional Programs with an Application to the Frame Problem in Deductive Verification

by Oana Fabiana Andreescu

In the field of software verification, the frame problem refers to establishing the boundaries within which program elements operate. It has notoriously tedious consequences on the specification of frame properties, which indicate the parts of the program state that an operation is allowed to modify, as well as on their verification, i.e. proving that operations modify only what is specified by their frame properties. In the context of interactive formal verification of complex systems, such as operating systems, much effort is spent addressing these consequences and proving the preservation of the systems' invariants. However, most operations have a localized effect on the system and impact only a limited number of invariants at the same time. In this thesis we address the issue of identifying those invariants that are unaffected by an operation, and we present a solution for automatically inferring their preservation. Our solution is meant to ease the proof burden for the programmer. It is based on static analysis and does not require any additional frame annotations. Our strategy consists in combining a dependency analysis and a correlation analysis. We have designed and implemented both static analyses for a strongly-typed functional language that handles structures, variants and arrays. The dependency analysis computes a conservative approximation of the input fragments on which functional properties and operations depend. The correlation analysis computes a safe approximation of the parts of an input state to a function that are copied to the output state. It summarizes not only what is modified but also how it is modified and to what extent. By employing these two static analyses and by subsequently reasoning based on their combined results, an interactive theorem prover can automate the discharging of proof obligations for unmodified parts of the state. We have applied both of our static analyses to a functional specification of a micro-kernel, and the obtained results demonstrate both their precision and their scalability.


Acknowledgements

First of all, I would like to express my gratitude to my two PhD advisors, Thomas Jensen and Stéphane Lescuyer, without whom this thesis would have been impossible. I thank them for their patience and dedication in guiding me throughout these years, and for all the rigour that they instilled into me by word and by their own example. Thomas, thank you for helping me put my work into perspective. Thank you for your encouragement when I was overwhelmed by doubts and for your optimism when I had none. Stéphane, thank you for your inspiring advice, for the rigorous proofreading, for the many interesting discussions and for your careful attention to my work. Know that this thank-you note was written using Emacs, to which I am happy to admit that you converted me.

I am indebted to Dominique Bolignano for raising the possibility of this thesis and for creating the frame that allowed me to embark on this interesting journey and to explore the seas of research among an inspiring group of professionals: the Prove & Run team.

I am grateful to, and would like to wholeheartedly thank, Catherine Dubois and Antoine Miné for agreeing to review my dissertation. I am honoured to know that my 200+ pages have been read by experts in static analysis and formal verification, and I am grateful for their valuable comments and remarks.

I would also like to thank Sandrine Blazy and Sylvain Conchon for agreeing to be members of the jury. Sylvain Conchon, I am grateful for your keen interest during my defense. Sandrine Blazy, thank you for agreeing to chair my defense and for driving it in such a positive manner.

For their understanding, their advice and their support during the transition period and the months before my defense, I would like to thank Claire Loiseaux and Carolina Lavatelli.

I thank all of my colleagues at Prove & Run for our discussions and their advice during these years. I thank Florence for her warmth, energy and optimism; Erica and Henry for being such great office colleagues; Pauline and François for being friendly, reliable colleagues in the academic trenches. I am indebted to Olivier and Benoit for reviewing my articles and providing valuable remarks. I thank Pascale for smoothing out the stormy waves of administrative work. Though our interactions were briefer, I would also like to thank the Celtique members for their openness and for the interesting seminars. A special thanks goes to Lydie Mabil for helping me deal with the administrative work during these years and, finally, for helping prepare the defense of my dissertation.

This academic journey started long ago, even before I was aware, with the help of Marius Minea and Ovidiu Badescu, who unknowingly motivated me to take this path years later. I warmly thank them, and I am grateful to both for paving the first part of my academic path.

I would also like to thank my friends, old and new, far and near. Thank you for always being there for me and providing perspective, enthusiasm and breaths of fresh air. Thank you as well for still being my friends despite the long-winded and geeky descriptions of my work, and the occasionally cancelled plans and absences while I was trying to find my way into the research world.

I lack the appropriate words to express the gratitude I feel towards my family for their never-ending love and support. I thank my mother and my sister for being such wonderful examples of women in science. I thank my father for his unwavering belief in me and for his love and respect for well-written sentences, no matter the context, which he instilled into me. I thank my brother-in-law for being the one who ignited early on the spark and interest for computers and mathematics, and my two wonderful nieces for always being my rays of light.

Last but surely not least, I have only gratitude for Georges: my companion, my pillar of strength, my compass and lighthouse during the darkest moments. To quote Carl Sagan: in the vastness of space and immensity of time, it is my absolute joy to spend a planet and an epoch with you.


Contents

I Extended Summary in French
  I.1 The Frame Problem
  I.2 Objectives
  I.3 Dependency Analysis
  I.4 Correlation Analysis
  I.5 Decision Procedure
  I.6 Conclusion

1 Introduction
  1.1 Formal Verification of Software
  1.2 The Frame Problem in a Nutshell
  1.3 Prove & Run Objectives and Products
  1.4 Context and Problem Statement
  1.5 Contributions and Structure of the Document

2 The Frame Problem in Software Verification
  2.1 Specification Languages and Verification Tools
  2.2 Manifestations of the Frame Problem
  2.3 Approaches to Specifying Frame Properties
    2.3.1 The Manual Approach
    2.3.2 The Exclusive Approach
    2.3.3 The Implicit Approach
  2.4 Topologies and Effects
    2.4.1 Explicit Footprints
    2.4.2 Implicit Footprints
    2.4.3 Predefined Footprints
  2.5 Other Approaches to Reason about Frames
  2.6 Other Relevant Work

3 The Smart Language and ProvenTools
  3.1 The Smart Modeling Language
    3.1.1 Smart Predicates and Types
    3.1.2 Exit Labels and Control Flow
    3.1.3 Polymorphism & Algebraic Data Types
    3.1.4 Specifications
    3.1.5 Illustrating Smart – An Abstract Process Manager
  3.2 ProvenTools
  3.3 Smil

4 The αSmil Language
  4.1 αSmil Syntax
  4.2 Control Flow Graph
  4.3 Well-Typed αSmil Statements
  4.4 Operational Semantics of αSmil Statements

5 Dependency Analysis for Functional Specifications
  5.1 Dependency Analysis in a Nutshell
    5.1.1 Targeted Dependency Information
    5.1.2 Outline
  5.2 Abstract Dependency Domain
    5.2.1 Join and Reduction Operator
    5.2.2 Well-Typed Dependencies
  5.3 Intraprocedural Analysis and Data-Flow Equations
    5.3.1 Intraprocedural Dependency Domains
    5.3.2 Intraprocedural Data-Flow Equations
    5.3.3 Intraprocedural Dependency Analysis Illustrated
  5.4 Interprocedural Dependencies
    5.4.1 Interprocedural Dependency Analysis Illustrated
    5.4.2 Context-Insensitivity and its Consequences
  5.5 Semantics of Dependency Values
  5.6 Related Work
  5.7 Conclusion

6 Deferred Dependencies: Injecting Context in Dependency Summaries
  6.1 Dealing with Context-Insensitivity
  6.2 Symbolic Dependency Components in a Nutshell
  6.3 Symbolic Paths
    6.3.1 Symbolic Path Type
    6.3.2 Semantics of Symbolic Paths
    6.3.3 Well-Typed Paths and Path Sets
  6.4 Abstract Dependency Domain with Deferred Accesses
  6.5 Deferred Dependencies at the Intraprocedural Level
    6.5.1 Extended Intraprocedural Dependency Analysis
    6.5.2 Intraprocedural Dependency Analysis Illustrated
  6.6 Deferred Dependencies at the Interprocedural Level
    6.6.1 Applying Context-Sensitive Information by Substitution
    6.6.2 Wrapped Calls and Results
  6.7 Related Work
  6.8 Conclusion

7 Correlation Analysis
  7.1 Introduction
    7.1.1 Targeted Correlation Information
    7.1.2 Correlation Analysis in a Nutshell
  7.2 Partial Equivalence Relations
    7.2.1 Abstract Partial Equivalence Type
    7.2.2 Well-Typed Partial Equivalences and their Semantics
  7.3 Paths and Correlations
    7.3.1 Paths and Correlation Types
    7.3.2 Alignment and Partial Order
  7.4 Intraprocedural Correlation Analysis
    7.4.1 Intraprocedural Correlation Summaries and Analysis
    7.4.2 Intraprocedural Correlation Analysis Illustrated
  7.5 Interprocedural Correlation Analysis
  7.6 Extension – Constructor Evolution
  7.7 Related Work
  7.8 Conclusion

8 Implementation, Application and Results
  8.1 Implementation of the Dependency Analysis
    8.1.1 Dependency Type and Operators
    8.1.2 Intraprocedural Dependency Analysis
  8.2 Implementation of the Correlation Analysis
    8.2.1 Partial Equivalence Relations and Operators
    8.2.2 Intraprocedural Correlations
    8.2.3 Dependency and Correlation Analysers
  8.3 Dependency and Correlation Results on ProvenCore Layers
    8.3.1 ProvenCore Description
    8.3.2 Obtained Dependency and Correlation Results
    8.3.3 Precision of our Dependency and Correlation Summaries
  8.4 Reasoning about Framing using Correlations and Dependencies
    8.4.1 A Decision Procedure
    8.4.2 Types of Targeted Queries
  8.5 Decision Procedure Experiments

9 Conclusion and Perspectives
  9.1 Contributions
  9.2 Future Work

Bibliography


List of Figures

1.1 Complex Transition Systems Frame Problem
1.2 Frame Problem and Solution Strategy

3.1 Possible Transitions between Thread States
3.2 The ProvenTools Toolchain
3.3 Smart Editor

4.1 Body of the stop_thread Predicate
4.2 Example – Control Flow Graph of Predicate thread
4.3 Well-Typed Control Flow Graph

5.1 Example Data Types – Thread and Memory Region
5.2 Input Type – Process
5.3 Predicate thread – Implementation
5.4 G_thread – Control Flow Graph of Predicate thread
5.5 Targeted Dependency Results for Predicate thread
5.6 G_start_address – Control Flow Graph of Predicate start_address
5.7 Predicate start_address – Implementation
5.8 Targeted Dependency Results for Predicate start_address
5.9 Order Relation on Pairs of Atomic Dependencies
5.10 Computation of the Intraprocedural Domain at a Node's Entry Point
5.11 Analysing Predicate thread – Initialisation
5.12 Applying the Variant Switch Equation
5.13 Analysing Predicate thread – Variant Switch
5.14 Applying the Array Access Equation
5.15 Analysing Predicate thread – Array Access
5.16 Applying the Field Access Equation
5.17 Analysing Predicate thread – Field Access
5.18 G_start_address – Dependency Information
5.19 G_start_address – Final Dependency Results

6.1 Analysing thread – Dependency Summary with Deferred Occurrences
6.2 G_start_address – Intermediate Dependency Results for start_address
6.3 Substitution of Formal Parameters by Effective Parameters
6.4 Substituting Deferred Dependencies by Actual Dependencies

7.1 Body of the stop_thread Predicate
7.2 Targeted Correlation Results for Predicate stop_thread
7.3 Intraprocedural Correlations – General Representation
7.4 Intraprocedural Domain – Examples
7.5 Entry Point – Correlation Information
7.6 Analysing Predicate stop_thread – Initialisation
7.7 Construction Evolution

8.1 ProvenCore – Abstract Layers
8.2 Distribution of the number of inferred preserved properties
8.3 Distribution of the number of inferred predicates for which a property is preserved


List of Tables

4.2 αSmil – Set of Supported Statements
4.3 Statements and their Exit Labels
4.4 Predicate Body in αSmil
4.6 Well-Typed Predicate Call
4.7 Well-Typed Statements
4.8 The Structural Operational Semantics of αSmil Generic Statements
4.9 Operational Semantics of αSmil Structure-Related Statements
4.10 Operational Semantics of αSmil Variant-Related Statements
4.11 Operational Semantics of αSmil Array-Related Statements
4.12 Semantics of a Predicate Call

5.1 ⊑ – Comparison of Two Domains
5.2 ∨ – Join Operation
5.3 ⊕ – Reduction Operator
5.4 Dependency Extractions
5.5 Well-Typed Dependencies
5.6 Statements – Representations and Data-Flow Equations
5.7 Generic Statements – Data-Flow Equations
5.8 Structure-Related Statements – Data-Flow Equations
5.9 Variant-Related Statements – Data-Flow Equations
5.10 Array-Related Statements – Data-Flow Equations

6.1 E – Path Semantics
6.2 Well-Typed Dependency Paths
6.3 Extended Leq – Comparison of Two Domains
6.4 ∨ – Extended Join
6.5 ⊕ – Extended Reduction Operator
6.6 Extended Extraction Operators
6.7 Well-Typed Dependencies – Extended
6.8 Deferred Paths – Application and Substitutions
6.9 Interprocedural Domain – Substitutions

7.1 ⊑R – Comparison of Two Domains
7.2 Partial Equivalences – ∨R – Join Operation
7.3 Partial Equivalences – ∧R – Meet Operation
7.4 Partial Equivalence Extractions
7.5 Well-Typed Partial Equivalences
7.6 Partial Equivalence Relations – Semantics
7.7 Well-Typed Access Paths
7.8 Well-Typed Correlations
7.9 Well-Typed Correlation Maps
7.11 Links between Access Paths
7.12 Statements – Representations and Data-Flow Equations
7.19 Well-Formed Intraprocedural Correlation Summaries

8.3 ProvenCore Abstract Layers – Global State Type
8.4 ProvenCore Abstract Layers – ProcessMachine Type
8.5 Abstract Layers – Evaluation Data and Dependency Analysis Timing
8.6 Abstract Layers – Detailed Dependency Analysis Timing
8.7 Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing
8.8 Abstract Layers – Detailed Deferred Dependency Analysis Timing
8.9 Abstract Layers – Evaluation Data and Correlation Analysis Timing
8.10 Abstract Layers – Detailed Correlation Analysis Timing
8.11 RSMFSP Layers – Evaluation Data and Dependency Summaries
8.12 TDS Layer – Evaluation Data and Dependency Summaries
8.13 RSMFSP Layers – Evaluation Data and Correlation Summaries
8.14 TDS Layer – Evaluation Data and Correlation Summaries


List of notations

Section; Symbol; Type; Description
Sec. 3.1.2; true; L; Special exit label
Sec. 3.1.2; false; L; Special exit label
Sec. 4.1; T0 ⊂ T; ; Set of base type identifiers
Def. 4.1.1; T; ; Universe of type identifiers
Def. 4.1.1; τ; T; Type
Def. 4.1.1; τ0; T; Primitive type
Def. 4.1.1; struct f1 τ1; T; Structure type
Def. 4.1.1; variant[C1 τ1| ]; T; Variant type
Def. 4.1.1; arrτ 〈τ〉; T; Array type
Sec. 4.1; λ; L; Exit label
Sec. 4.1; L; ; Set of exit labels
Sec. 4.1; error; L; Special exit label
Sec. 4.1; σ, σp; Σ; Signature (of predicate p)
Sec. 4.1; Σ; ; Set of predicate signatures
Sec. 4.1; o o; V; Output variable(s)
Tab. 4.2; s; ; αSmil statement
Tab. 4.2; o = e; ; αSmil assignment statement
Tab. 4.2; e1 = e2; ; αSmil equality test statement
Tab. 4.2; nop; ; αSmil no operation statement
Tab. 4.2; r = e1 en; ; αSmil create structure statement
Tab. 4.2; o1 on = r; ; αSmil destructure structure
Tab. 4.2; o = rfi; ; αSmil access field statement
Tab. 4.2; r′ = r with fi = e; ; αSmil update field statement
Tab. 4.2; r′ = 〈f1 fk〉r″; ; αSmil partial structure equality
Tab. 4.2; v = Cp[e]; ; αSmil create variant statement
Tab. 4.2; switch(v) as [o1| ]; ; αSmil destructure variant statement
Tab. 4.2; v ∈ C1 Ck; ; αSmil variant possible statement
Tab. 4.2; o = a[i]; ; αSmil array access statement
Tab. 4.2; a′ = [a with i = e]; ; αSmil array update statement
Tab. 4.2; p(e1 ) [λ1 o1 | ]; ; αSmil predicate call statement
Sec. 4.2; Gp = (N, E); ; Control flow graph of predicate p
Def. 4.3.1; Γ; V → T; Typing environment
Sec. 4.3; v; V; Variable
Sec. 4.3; V; ; Set of variables
Sec. 4.3; V+ ⊆ V; ; Writable variable identifiers
Def. 4.3.2; Σ; P → S; Maps predicate ids to signatures


Def. 4.3.3; ΣΓO ⊢ s → λ; ; Well-typed statement
Sec. 4.3; O ⊆ V+; ; Output variables of a predicate
Sec. 4.4; Dτ; ; Semantic values of type τ
Sec. 4.4; P ⊆ Dτ; ; Domain of valid array indices
Sec. 4.4; E = V → D; ; Valuation or environment type
Def. 4.4.2; E; E; Valuation or environment
Sec. 4.4; Γ(v); ; Type of v
Sec. 4.4; Γ ⊢ E; ; Well-typed environment
Def. 4.4.3; ⟨E [s]⟩; ; Configuration
Def. 4.4.4; ⟨E [s]⟩ −λ→ E′; ; Transition
Def. 4.4.5; E [x → v]; ; Extension of E with x → v
Def. 4.4.6; I = P × E → E × L; ; Set of interpretations
Def. 4.4.6; I; I; Interpretation
Sec. 5.2; D; ; Abstract dependency domain
Def. 5.2.1; δ; D; Dependency
Def. 5.2.1; ⊤; D; Everything atomic dependency
Def. 5.2.1; ; D; Nothing atomic dependency
Def. 5.2.1; ⊥; D; Impossible atomic dependency
Def. 5.2.1; f1 ↦ δ1; D; Structure dependency
Def. 5.2.1; [C1 ↦ δ1 ]; D; Variant dependency
Def. 5.2.1; 〈δ〉; D; Array dependency
Def. 5.2.1; 〈δdef i δexc〉; D; Array dependency exception for i
Def. 5.2.2; ⊑ ⊆ D × D; ; Partial order on dependencies
Tab. 5.1; ; ; Rules for ⊑
Def. 5.2.3; ∨; D × D → D; Join operator for dependencies
Tab. 5.2; ; ; ∨ cases
Def. 5.2.4; ⊕; D × D → D; Reduction operator for dependencies
Tab. 5.3; ; ; ⊕ cases
Def. 5.2.5; f; D ⇸ D; Extraction of a field's dependency
Def. 5.2.6; C; D ⇸ D; Extraction of a constructor's dep
Def. 5.2.7; 〈i〉; D ⇸ D; Extraction of an array's cell dep
Def. 5.2.8; 〈∗ i〉; D ⇸ D; Extraction of an array's dep (exc)
Def. 5.2.9; 〈∗〉; D ⇸ D; Extraction of an array's dependency
Tab. 5.4; ; ; f c 〈∗ i〉 〈i〉 and 〈∗〉 cases
Tab. 5.5; Γ ⊢ δ τ; ; Well-typed dependency
Def. 5.3.1; D = V → D; ; Intraprocedural dependency domain
Def. 5.3.1; ∆; D; Intraprocedural dependency
Sec. 5.3.1; Unreachable; D; Intra dep for unreachable nodes
Def. 5.3.2; ∆ x; D × V → D; Forget x
Def. 5.3.3; ⊑∆ ⊆ D × D; ; Intraprocedural partial order
Def. 5.3.4; ∨∆; D × D → D; Intraprocedural join operation
Def. 5.3.5; ⊕∆; D × D → D; Intraprocedural reduction operator
Sec. 5.3.2; ⟦s⟧λ(∆nj ); ; Contribution of an edge (ni, nj)


Sec. 5.3.2; ⟦s⟧λ(); ; Transfer function of the edge s λ
Sec. 5.3.2; gensλ; ; Written variables on the edge s λ
Sec. 5.3.2; ∆n; D; Dependency domain of node n
Sec. 5.3.2; I ⊆ V; ; Set of input variables
Sec. 5.4; χ; ; Formal-Effective param mapping
Sec. 5.4; J (χ); ; Substitution formal to effective
Def. 6.3.1; π; Π; Symbolic path
Def. 6.3.1; Π; ; Universe of symbolic paths
Def. 6.3.1; ε; Π; Symbolic path endpoint
Def. 6.3.1; fπ; Π; Symbolic path field
Def. 6.3.1; Cπ; Π; Symbolic path constructor
Def. 6.3.1; 〈i〉π; Π; Symbolic path array cell
Def. 6.3.1; 〈∗ i〉π; Π; Symbolic path array cells except
Def. 6.3.1; 〈∗〉π; Π; Symbolic path all array cells
Sec. 6.3.1; ; Π × Π → Π; Path extension operator
Sec. 6.3.1; P; 2^Π; Symbolic path set
Def. 6.3.2; ⊑ ⊂ 2^Π × 2^Π; ; Partial order for path sets
Def. 6.3.3; ∨; 2^Π × 2^Π → 2^Π; Join operator for path sets
Def. 6.3.4; ; 2^Π × Π → 2^Π; Extension operator for path sets
Def. 6.3.5; π; Π; Actual path
Def. 6.3.5; Π; ; Universe of actual paths
Def. 6.3.5; ε; Π; Actual path empty
Def. 6.3.5; f π; Π; Actual path field
Def. 6.3.5; Cπ; Π; Actual path constructor
Def. 6.3.5; 〈i〉π; Π; Actual path array cell
Def. 6.1; E ⊂ E × Π × Π; ; Symbolic path covers actual path
Sec. 6.3.2; E ⊂ E × 2^Π × Π; ; Set of symbolic paths covers actual
Def. 6.3.6; ⟦P⟧E ⊂ E × 2^Π → 2^Π; ; Interpretation of symbolic paths set
Def. 6.3.7; at; Π × D → D; Find subpart of value at given path
Tab. 6.2; I ⊢ π τ → τ′ ⊂ V × Π × T × T; ; Symbolic paths typing judgement
Sec. 6.3.3; I ⊢ P τ → τ′ ⊂ V × 2^Π × T × T; ; Symbolic paths sets judgement
Def. 6.4.1; δ; D; Extended dependency
Def. 6.4.1; D; ; Ext abstract dependency domain
Def. 6.4.1; Deferred(o1 ↦ P1 ); D; Deferred accesses dependency
Def. 6.4.2; A; V ⇸ Π; Access map
Tab. 6.3; ; ; Deferred rule for ⊑
Tab. 6.4; ; ; ∨ cases for deferred
Tab. 6.5; ; ; ⊕ cases for deferred
Tab. 6.6; ; ; f c 〈∗ i〉 〈i〉 〈∗〉 deferred cases
Tab.; Γ IO ⊢ δ τ; ; Well-typed dependency deferred rule
Def. 6.6.1; σ; V → D; Substitution roots vars to deps
Def. 6.6.2; φ; V ⇸ V; Substitution indices in arrays
Sec. 6.6.1; J (σ φ); ; Substitutes deferred dependencies


Sec. 6.6.1; •; ; Applies symbolic paths to dep
Sec. 6.6.1; ; ; Applies symbolic path to dep
Def. 7.2.1; R; R; Partial equivalence
Def. 7.2.1; R; ; Partial equivalence type
Def. 7.2.1; Equal; R; Partial equivalence equal
Def. 7.2.1; Any; R; Partial equivalence unrelated
Def. 7.2.1; f1 ↦ R1; R; Partial equivalence structure
Def. 7.2.1; [C1 ↦ R1 ]; R; Partial equivalence variant
Def. 7.2.1; 〈Rdef 〉; R; Partial equivalence array
Def. 7.2.1; 〈Rdef i Rexc〉; R; Partial equivalence array + exc
Def. 7.2.2; ⊑R ⊆ R × R; ; Preorder for partial equivalences
Def. 7.1; ; ; Rules for ⊑R
Def. 7.2.3; ∨R; R × R → R; Join for partial equivalences
Tab. 7.2; ; ; ∨R cases
Def. 7.2.4; ∧R; R × R → R; Meet for partial equivalences
Tab. 7.3; ; ; ∧R cases
Def. 7.2.5; extrf; R ⇸ R; Extracts field's partial eqv
Def. 7.2.6; extrC; R ⇸ R; Extracts constructor's partial eqv
Def. 7.2.7; extr 〈i〉; R ⇸ R; Extracts cell's partial eqv
Tab. 7.4; ; ; extrf extrC and extr 〈i〉 cases
Tab. 7.5; Γ ⊢ R τ; ; Partial equivalence well-typedness
Sec. 7.2.2; ⟦R⟧τ; ; Partial equivalence semantics
Def. 7.3.1; π; Π; Access path
Def. 7.3.1; Π; ; Access path type
Def. 7.3.1; ε; Π; Access path empty
Def. 7.3.1; f π; Π; Access path field
Def. 7.3.1; Cπ; Π; Access path constructor
Def. 7.3.1; 〈i〉π; Π; Access path array cell
Def. 7.3.2; κ; K; Correlation map
Def. 7.3.2; K = Π × Π → R; ; Correlation map type
Sec. 7.3.1; (π, ρ) ↦ R; Π × Π × R; Correlation
Tab. 7.7; ΓI ⊢ π τ → τ; ; Well-typed access path
Tab. 7.8; ΓI ⊢ (π, ρ) ↦ R (τl, τr); ; Well-typed correlation
Tab. 7.9; ΓI ⊢ κ (τl, τr); ; Well-typed correlation map
Def. 7.3.3; μ; M; Link
Def. 7.3.3; M; ; Link type
Def. 7.3.3; Identical; M; Link identical
Def. 7.3.3; Left π; M; Link left path has suffix π
Def. 7.3.3; Right π; M; Link right path has suffix π
Def. 7.3.3; Incompatible; M; Link incompatible paths
Def. 7.3.4; f; Π × Π → M; Matching Operator
Def. 7.3.5; R (π, ρ)(π′, ρ′); ; Aligning a correlation
Def. 7.3.6; ; ; Computation of R (π, ρ)(π′, ρ′)


Def. 7.3.7; ; Π × R ⇸ R; Projection
Def. 7.3.8; x; R × Π ⇸ R; Injection
Def. 7.3.9; κ (π′, ρ′); ; Aligns correlation maps
Def. 7.3.10; ⊑ ⊆ K × K; ; Correlation maps preorder
Def. 7.3.11; ∨; K × K → K; Join for correlation maps
Def. 7.3.12; ∧; K × K → K; Meet for correlation maps
Def. 7.4.1; K; K; Intraprocedural corr summary
Def. 7.4.1; K = V × V → K; ; Intraproc corr summary type
Sec. 7.4.1; NoCorrelation; K; Any for any pair of variables
Def. 7.4.2; ⊑K ⊆ K × K; ; ⊑ for intraproc corr summaries
Def. 7.4.3; ∨K; K × K → K; Join for intraproc corr summaries
Def. 7.4.4; Csλ(); C; Contribution of an edge
Sec. 7.4.1; csλ; K; Corr created by stmt s on label λ
Sec. 7.4.1; killλ ⊆ V; ; Variables redefined by stmt on label
Def. 7.4.5; (π•, ρ•) ↦ R; Π × Π × R; New correlation after composition
Def. 7.4.6; ; K × K → K; Composition of correlation maps
Def. 7.4.7; ; C × K → K; Contribution Csλi(Kni)
Def. 7.19; Γ IO K; ; Well-formed intraproc corr summ
Sec. 7.4.2; o; ; Final value of o
Def. 7.5.1; Kp; Λp → K; Interproc correlation domain
Def. 7.5.1; Λp ⊆ L; ; Output labels of predicate p
Sec. 7.6; Impossible; R; Partial eqv constructor impossible
Sec. 7.6; RCiCj; R; Partial eqv variant matrix


To my family and close ones


Chapter I

Extended Summary in French

I.1 The Frame Problem

In the field of formal software verification, it is imperative to identify the boundaries within which elements or functions operate. A complete specification of an operation must not only state that the output values have a certain property, but must also delimit the parts of the input state on which the operation works. These boundaries constitute the frame properties. They are usually specified manually by the programmer, and their validity must be verified: it is necessary to prove that the program's operations do not exceed the boundaries thus declared. Specifying and proving frame properties is notoriously long and tedious. The considerable effort invested in this task is a manifestation of the frame problem. Manifestations of the frame problem appear in the context of all specification languages and all formal verification methods.

I.2 Objectives

During the development of ProvenCore, a general-purpose micro-kernel that guarantees isolation, it became evident that the specification and verification of transition systems in general, and of operating systems in particular, are not immune to the frame problem. Operating systems are characterized by complex states defined by algebraic data types and associative arrays, which are fundamental building blocks for representing and manipulating complex data efficiently. Operating systems are also characterized by transitions that map such input states to new output states. However, most transitions are not concerned with the input state in its entirety: they depend on, and modify, only a subset of it. Intuitively, properties that hold for the input state remain trivially valid for the output state obtained after the transition, as long as they depend only on parts of the input state that are not modified by the transition. In practice, proving the preservation of these properties is not a straightforward task; it demands considerable manual effort and a host of tedious, repetitive proofs.


The objective of our work was to address this problem and to find an automated solution for inferring the preservation of these properties. More precisely, our goal was the automatic inference of properties that depend on a subset of the input that is disjoint from the frame of the operation, that is, from the subset of the state that is modified. To this end, we proposed a solution based on static analysis that does not require additional frame annotations. By detecting the subset of the state on which a property depends, as well as the part that is not affected by an operation, we can automatically discharge the proof obligations related to unmodified parts.

We employ two static analyses for this purpose: a dependency analysis and a correlation analysis. Both analyses handle programs that manipulate associative arrays as well as algebraic data types (structures and variants), and compute results reflecting the underlying structure of these types (fields, constructors and array cells). Automatic reasoning based on the combined result of these two static analyses makes it possible to infer the preservation of certain properties relating to the output state. Ultimately, these two analyses are intended to be used by a proof tactic that will be integrated into the interactive proof assistant included in the ProvenTools software suite developed by Prove & Run.
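The way the two results can be combined may be sketched as follows. This is a minimal illustration under simplifying assumptions, not the decision procedure of the thesis: the constants, the dictionary encoding of structures and the function `preserved` are hypothetical names chosen for this example, and only structure fields are modelled (no variants or arrays). A property is reported as preserved when every part of the state it may read is known to be copied unchanged to the output.

```python
# Hypothetical, simplified abstract values (not the domains defined in the thesis).
NOTHING = "Nothing"        # dependency: this part of the input is never read
EVERYTHING = "Everything"  # dependency: this part may be read entirely
EQUAL = "Equal"            # correlation: output part is identical to the input part
ANY = "Any"                # correlation: nothing is known about this part


def preserved(dep, corr):
    """Return True when the property provably reads only unmodified parts.

    `dep` describes what a property reads, `corr` what an operation keeps.
    Structures are encoded as dicts mapping field names to nested values.
    The answer is conservative: False means "could not be inferred".
    """
    if dep == NOTHING or corr == EQUAL:
        return True
    if isinstance(dep, dict) and isinstance(corr, dict):
        # Field-wise check; a field missing from either map is treated
        # conservatively (as Everything / Any).
        fields = set(dep) | set(corr)
        return all(
            preserved(dep.get(f, EVERYTHING), corr.get(f, ANY)) for f in fields
        )
    return False


# An invariant that only reads the `threads` field of a three-field state.
threads_invariant = {"threads": EVERYTHING, "memory": NOTHING, "current": NOTHING}
# Two operations: one rewrites `memory`, the other rewrites `threads`.
touches_memory = {"threads": EQUAL, "memory": ANY, "current": EQUAL}
touches_threads = {"threads": ANY, "memory": EQUAL, "current": EQUAL}

print(preserved(threads_invariant, touches_memory))   # True
print(preserved(threads_invariant, touches_threads))  # False
```

In the second call the invariant is not necessarily broken; the sketch simply cannot infer its preservation, so a proof would be left to the user.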

Smart, the language targeted by the ProvenTools suite, is a purely functional language that manipulates algebraic data structures and immutable associative arrays. This work was motivated by the verification of ProvenCore. ProvenCore is implemented through multiple refinements between successive models of the kernel, from the most abstract, which allows the isolation property to be defined and proven, to the most concrete, which is used for code generation. The global states of the abstract layers are complex structures containing numerous fields that are themselves composite. Commands such as fork, exec and exit can be executed. Each of these commands receives a global input state as an argument and produces the state of the system after the command has been executed. In practice, most of the commands supported by the system threaten only a limited number of invariants. Automatically proving the preservation of the unaffected invariants can considerably reduce the total number of proofs left to the programmer, who can then concentrate on the most interesting ones.

I.3 Dependency Analysis

The dependency analysis handles functions and their specifications uniformly. For each possible execution scenario, it conservatively computes an approximation of the sub-elements of the input state on which the result depends. For variants, an additional analysis is performed simultaneously in order to compute the subset of possible constructors in each execution scenario.

We defined our own abstract domain for representing dependencies, and we obtain dependency information that reflects the layered structure of the data types.
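As a rough illustration of what such dependency information looks like, the following Python sketch computes, for a tiny hypothetical expression language, an over-approximation of the input-state access paths on which a result depends. The tuple encoding of expressions, the `deps` function and the example field names are assumptions made purely for illustration; they are neither the abstract domain defined in this thesis nor the OCaml prototype.

```python
# Expressions over an input state are encoded as nested tuples (hypothetical encoding):
#   ("const", v)            a literal value
#   ("path", ("a", "b"))    the sub-element state.a.b
#   ("op", e1, e2)          any binary operation on two sub-expressions
#   ("if", c, e1, e2)       a conditional


def deps(expr):
    """Return a set of access paths that over-approximates what `expr` reads."""
    tag = expr[0]
    if tag == "const":
        return frozenset()
    if tag == "path":
        return frozenset({expr[1]})
    if tag == "op":
        return deps(expr[1]) | deps(expr[2])
    if tag == "if":
        # Conservative: both branches are included, whichever one actually runs.
        return deps(expr[1]) | deps(expr[2]) | deps(expr[3])
    raise ValueError(f"unknown expression tag: {tag!r}")


# A predicate that reads a single field and compares it with a constant.
valid_current = ("op", ("path", ("scheduler", "current")), ("const", 0))

# A predicate whose result may depend on one of two fields.
guarded = ("if", ("path", ("flags", "debug")),
           ("path", ("log", "size")),
           ("const", 0))

print(sorted(deps(valid_current)))
print(sorted(deps(guarded)))
```

The conditional case shows where the approximation is conservative: the sketch reports every path that could be read in some execution scenario, not only those read in a particular run.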

This analysis was designed to be run on the fly during interactive verification, and it operates uniformly on programs and their specifications; these two points give our approach its originality. We implemented a prototype of this dependency analysis in OCaml and applied it to a functional specification of ProvenCore. The results obtained are positive: for example, the dependency analysis runs in less than one second on a set of more than 600 predicates totalling approximately 10,000 lines of code.

In order to introduce a form of context sensitivity into the dependency analysis, we designed an extension based on symbolic paths. This extension slightly increases the execution time (by 10% to 20% on the benchmarks used). However, by using the dependency analysis with this extension, we obtained more precise results for 50% of the predicates included in these benchmarks.

I.4 Correlation Analysis

The correlation analysis detects the flow of input values into output values. For a given function, it conservatively computes an approximation of the equivalences between input sub-elements and output sub-elements. It is an interprocedural static analysis that summarizes the behaviour of a function and detects which parts of the state are modified, and to what extent. We defined a partial equivalence type that reflects the structure of algebraic data types and associative arrays. To gain precision and avoid losing information when the input and the output have different types, we introduced an intermediate level. Correlations therefore consist of access paths to sub-elements of the same type and of equivalences between these sub-elements. This intermediate level makes it possible to compute, in a flexible manner, precise equivalences between parts of the input and parts of the output.

Here too, we implemented a prototype of this correlation analysis in OCaml and applied it to a functional specification of ProvenCore. The results obtained are encouraging: for example, the correlations for a subset of 630 predicates totalling approximately 10,000 lines of code are computed in less than 0.5 seconds. Although more complex than the dependency analysis, the correlation analysis runs faster on our benchmarks because, unlike the former, it applies only to functions and not to specifications. Indeed, specifications are Boolean predicates and do not return a modified state.

I.5 Decision Procedure

We sketched a decision procedure that employs our two static analyses. It constitutes the first step of our solution for automatically inferring the preservation of frame invariants. By uncovering equivalences between inputs and outputs, and after detecting that a property depends only on unchanged parts, it is possible to infer the preservation of invariants for these unchanged parts.

The decision procedure has not yet been implemented, but preliminary experiments and a simple prototype give us an idea of how the dependency and correlation results should be unified. They also allowed us to determine the kind of queries that can be handled and the mechanism for answering them. The results obtained with our simple prototype on a functional specification of ProvenCore are described and analysed.

Unifying the results of the two analyses involves building a graph that connects the input and output variables examined by the query. The edges represent correlations between sub-elements of these variables, as detected by the second analysis. The dependencies of the property whose preservation we seek to infer indicate the sub-elements that influence the result of this property. When these sub-elements are left intact, the property is trivially preserved. The unification algorithm therefore traverses the graph, attempting to detect as many equivalences as possible between sub-elements of the input and output variables. If the sub-elements indicated by the dependency are included in the set of equivalent sub-elements, then the property is necessarily preserved, because all the values influencing its result are the same before and after the execution of the operation.
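The final inclusion test can be sketched in a few lines of Python. This is a deliberately simplified illustration, assuming that the correlation graph has already been flattened into a set of access paths known to be equal before and after the operation; the function names `is_unchanged` and `is_preserved`, and the example paths, are hypothetical and do not come from the prototype.

```python
def is_unchanged(path, unchanged):
    """A path is unchanged if it, or any prefix of it, is known to be unchanged."""
    return any(path[:i] in unchanged for i in range(1, len(path) + 1))


def is_preserved(property_deps, unchanged):
    """Sufficient (not necessary) check: every sub-element the property
    depends on must be equal before and after the operation."""
    return all(is_unchanged(p, unchanged) for p in property_deps)


# Hypothetical correlation result for an operation that only rewrites
# state.scheduler: the process table and the memory pages are left intact.
unchanged_by_op = {("processes",), ("memory", "pages")}

# Hypothetical dependency results for two invariants.
inv_processes = {("processes", "table")}
inv_scheduler = {("scheduler", "current"), ("processes", "table")}

print(is_preserved(inv_processes, unchanged_by_op))
print(is_preserved(inv_scheduler, unchanged_by_op))
```

A negative answer does not mean the invariant is broken; it only means that its preservation cannot be inferred from this information alone and must still be proven by other means.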

I.6 Conclusion

To conclude, we designed and implemented two static analyses that detect the data dependencies of a logical property as well as correlations between the inputs and outputs of operations. Our first results on a functional model of a microkernel are encouraging, both for their precision and for the speed of the analysis, which makes these analyses suitable for use within an interactive prover. Apart from minor improvements affecting the precision of our analyses, the next steps consist in combining them in order to detect the invariants that are not affected by the execution of a predicate, and then in integrating this detection as a tactic in the ProvenTools theorem prover. We believe that it is possible to benefit from frame specifications at low cost, in particular without requiring the programmer to write tedious, intuitively obvious annotations. For the formal verification of complex transition systems, it thus becomes possible to integrate into the development tools an automatic inference of the preservation of frame-related invariants via static analysis.

Chapter 1

Introduction

No human investigation can claim to be scientific if it doesn't pass the test of mathematical proof.

Leonardo da Vinci

1.1 Formal Verification of Software

Since the middle of the last century, computers and information technology have brought forth a digital revolution, fundamentally changing the way we live, work and interact with one another. Nowadays, computer programs govern our world and software permeates our lives in manifold ways, shaping our interactions with the surrounding environment. From the alarm clock that marks the start of our day and the coffee machine that motivates us to leave the house, to the smartphone we use for checking our emails or bank account and the car we are driving (or the automated driverless subway we are relying on), some type of software is discreetly acting in the background. We have grown so accustomed to it that we do not even notice it anymore, until it asserts itself by preventing us from checking our email, by displaying a blue error screen on an ATM or ticket machine, or by serving us a salty bag of crisps instead of the desperately needed bottle of water we have just paid for at a vending machine. Such reminders can lead to frustration and cause inconvenience, but essentially they cause minor problems. However, receiving such reminders as a result of malfunctions of medical equipment such as radiation therapy machines, of flight control systems, Mars orbiters, satellites or nuclear power plants can have dramatic consequences, endangering human lives, causing environmental harm or entailing significant financial losses. Therefore, the quality of the software around us not only influences the quality of our daily lives, but might also have an impact on our safety and the safety of our surrounding world.

Writing reliable, completely error-free software is a difficult task, and even a utopian one in the absence of dedicated, rigorous approaches for improving its quality. Indeed, for many software systems no guarantees or warranties are provided, and their quality is addressed only by traditional software engineering approaches such as testing or code review, which cannot guarantee the absence of bugs. While this can be acceptable for non-critical programs, mission- or safety-critical software systems, for which software quality is of the utmost importance, have to guarantee the absence of runtime errors and provide high levels of confidence regarding their functional correctness. Certain safety-critical market segments impose standards and regulatory requirements for the development of such software systems. In these domains, formal program verification is emerging as a promising approach, gaining ground and reaching a wider audience.

Formal program verification comprises a set of techniques and tools that can be used to ensure, by mathematical means, that the program under scrutiny fulfills its functional correctness requirements, i.e. that it computes the right information. To achieve this goal, a formal description, or specification, of the program's expected behaviour must be given. Once this is established, multiple mathematical tools can be employed for formally verifying that the program's implementation follows the formal specification.

Formal methods can be traced back to the early days of computer science, and their origin can be linked to the names of Floyd (Floyd 1967), Hoare (Hoare 1969) and Naur (Naur 1966) (and later to that of Dijkstra (Dijkstra 1976)) and their methods for verifying program code with respect to assertions. Despite their early foundations, formal methods seemed for decades to be confined to the research world, as a consequence of intricate notations, failure to scale to real-world programs, and limited or inadequate tool support. Since the 1960s, however, considerable progress has been made in the field of formal methods, in terms of both methodology and tools for computer-aided program verification. Still, formal program verification methods are not yet a widespread alternative, or even complement, to testing in industry. Unlike testing, which cannot show the absence of bugs, the goal of formal verification methods is to prove, by means of mathematical tools, that the program execution is correct in all specified environments, without actually executing the program itself. These are static verification techniques.

Static verification techniques include program typing, model checking, deductive verification methods and static program analysis. Besides requiring a formal specification of the program's intended behaviour and its envisioned properties at runtime, all formal methods are theoretically characterized by undecidability and complexity, which are addressed by introducing some form of approximation. For soundness considerations, these approximations are necessarily over-approximations, and all static verification techniques are necessarily conservative: they can prove the absence of some erroneous runtime behaviours, but they will inevitably trigger some false warnings, rejecting certain behaviours that are in practice correct.

Program Typing. Type systems (Cardelli and Wegner 1985) are tools for reasoning about programs. More specifically, they constitute “a syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute” (Pierce 2002). They are used for computing static approximations of the runtime behaviours of the terms in a program, and can guarantee that well-typed programs are free from certain runtime type errors, such as passing strings as arguments to a primitive arithmetic operation or using an integer as a pointer.

In practice, type systems have become the most widespread instance of formal methods, with applications to many programming languages and automatic typecheckers built into a variety of compilers. Static typecheckers entail a variety of benefits, ranging from early error detection to offering convenient abstraction and documentation mechanisms and improving the efficiency of compilers, which nowadays make use of the information provided by typecheckers during their optimization and code generation phases.

The Curry-Howard correspondence implies that types can be used for expressing arbitrarily complex mathematical specifications. Additional type annotations could, in principle, enable the full proof of complex properties, effectively transforming typecheckers into proof checkers (Pierce 2002). Approaches such as Extended Static Checking (Leino 2001; Leino and Nelson 1998; Flanagan et al. 2002) made progress towards implementing entirely automatic checks for broad classes of correctness properties.

Additionally, approaches relying on type inference have been used for alias analysis (O'Callahan and Jackson 1997) and exception analysis (Leroy and Pessaux 2000). Powerful type systems based on dependent types (Martin-Löf 1984; Nordström, Petersson and Smith 1990) are used in automated theorem proving. Various proof assistants, including Coq (Bertot and Castéran 2004; Sozeau and team 1997),¹ are based on type theory.

Model Checking. Model checking is a verification technique that exhaustively explores all possible system states in a systematic manner (Baier and Katoen 2008). More precisely, given a finite-state model of a system and a formal property, a model checking tool verifies whether the property under scrutiny holds for a state in the given model. Model checking emerged as a popular lightweight formal method as a consequence of progress made in the development of program logic and decision procedures, automatic model checking techniques, and compiler analysis (Jhala and Majumdar 2009). First, program logic and decision procedures (Nelson and Oppen 1980; Shostak 1984) provided the needed framework and algorithmic tools to reason about infinite state spaces. Automatic model checking techniques (Clarke and Emerson 1981; Vardi and Wolper 1994) for temporal logic provided algorithmic tools for state-space exploration. Abstract interpretation (Cousot and Cousot 1977) provided connections between the logical world of infinite state spaces and the algorithmic world of finite representations (Jhala and Majumdar 2009).
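The idea of exhaustive state exploration can be illustrated with a minimal explicit-state sketch in Python. It assumes a finite transition system given by a list of initial states and a successor function, and it checks a simple invariant by breadth-first search; the name `check_invariant` and the toy counter system are hypothetical, and real model checkers rely on far more elaborate state representations and on temporal-logic specifications.

```python
from collections import deque


def check_invariant(initial_states, successors, invariant):
    """Explore every reachable state breadth-first.

    Returns None if the invariant holds in all reachable states,
    otherwise the first violating state that was found.
    """
    seen = set(initial_states)
    queue = deque(initial_states)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Toy system (an assumption for illustration): a counter that wraps around modulo 4.
def counter_successors(state):
    return [(state + 1) % 4]


print(check_invariant([0], counter_successors, lambda s: s < 4))
print(check_invariant([0], counter_successors, lambda s: s != 3))
```

The `seen` set is what makes the search terminate on finite models, and it is also where the state-space explosion shows up: every reachable state has to be stored.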

Currently, model checking continues to attract considerable attention from industry. This can be partly explained by its being a rather general verification approach, suitable for applications stemming from different areas, ranging from embedded systems to hardware design. In addition, it is an automatic, lightweight technique that supports partial verification and requires a low degree of user interaction and a lower degree of expertise (Baier and Katoen 2008) compared to other verification techniques. Its main weaknesses stem, on the one hand, from the combinatorial state-space explosion (the number of states needed to model the system accurately may easily exceed the amount of available computer memory) and, on the other hand, from its being less suitable for data-intensive applications.

¹ Coq Reference Manual, Version 8.6, https://coq.inria.fr/distrib/current/files/Reference-Manual.pdf

Model checking techniques also impose the production of models, often expressed using finite-state automata, which are in turn described in a dedicated description language. Another prerequisite for model checking is a formal specification of the properties to be verified, typically provided by means of temporal logic, which is suitable for the specification of a variety of properties ranging from functional correctness and safety to liveness, fairness and real-time properties (Baier and Katoen 2008).

Deductive Verification Methods. Deductive verification methods consist in producing formal correctness proofs by first generating a set of formal mathematical proof obligations from the program and its specification, and by subsequently discharging these. Based on the manner in which proof obligations are discharged, namely automatically or interactively, the deductive verification methods can be classified into two broad categories. Both require a thorough understanding of the system to be proven, as well as a good knowledge of the employed proof tools.

The first category of deductive methods relies on standalone tools that accept as inputs programs written in a specific programming language (such as Java, C or Ada) and specified in a dedicated annotation language (such as JML or ACSL). These automatically produce a set of mathematical formulas called verification conditions, which are typically proven using automatic theorem provers (Gallier 1987) or satisfiability modulo theories (SMT) solvers such as Alt-Ergo, Z3, CVC3 and Yices. Deductive verification tools such as Why3 or Boogie have their own programming and specification language (WhyML and Boogie, respectively), which can act as intermediate verification languages and are designed as a layer on which to build program verifiers for other languages. Verifiers for C, Dafny, Chalice and Spec# have been built using Boogie. WhyML has been used for the verification of Java, C and Ada programs.

The second category of deductive methods relies on interactive theorem provers (Bertot and Castéran 2004), also called proof assistants, such as Isabelle, Coq, Agda, HOL or Mizar. Both the program and its specification are encoded in the proof assistant's own language (for instance, Gallina for Coq and Isar for Isabelle), and the proofs that a program follows its specification, i.e. that it is functionally correct, are typically conducted in an interactive manner using the underlying proof construction engine. In other words, users are required to actively participate in the verification process by providing inductive arguments and guiding the proof through proof tactics, proof hints or strategies.

Both deductive verification methods offer a high level of assurance. For automatic theorem provers, the proof chain, which consists of multiple steps at which errors could potentially creep in (the model of the input programming language, the verification condition generator, the SMT solver used), can be perceived as a weakness. For interactive theorem provers, the high level of expertise required to employ them can be perceived as discouraging by the wider audience. However, major industrial breakthroughs have recently been achieved. For instance, Hyper-V, Microsoft's hypervisor for highly secure virtualization, was verified using VCC and the Z3 prover (Leinenbach and Santen 2009). CompCert (Leroy 2009), the first formally proven C compiler, was verified using the Coq proof assistant. High security properties of the seL4 microkernel (Klein et al. 2009) have been proven using the Isabelle/HOL proof assistant.

Static Program Analysis. Static program analysis comprises multiple techniques for computing, at compile time, safe approximations of the set of values or behaviours that can occur dynamically when executing a program. Static analysis techniques initially emerged in the field of compilation, where they provided ways of generating code efficiently by avoiding redundant or superfluous computations (Nielson, Nielson and Hankin 1999).

Static analyses compute sound, conservative information. However, for decades their scalability to industrial-size programs was doubted, and their application was considered to be limited to the research world and to small programs. Recent major breakthroughs have triggered, on the one hand, the inclusion of static analysis at different levels of the software validation process (Cousot 2001) and, on the other hand, a proliferation of static code analysers for a variety of languages, targeting mainstream usage and offering a solution for detecting and eliminating common runtime errors. A recent example is Infer (Calcagno and Distefano 2011), an open-source static analysis tool for bug detection in Java, C and Objective-C code. It was developed at Facebook, where it is used as part of the development process for mobile applications. Furthermore, static analysis techniques and tools are nowadays employed in the safety-critical market segment. For instance, Astrée (Cousot et al. 2005; Blanchet et al. 2003; Cousot et al. 2007), a static analyser for embedded software written in C, has been employed for the verification of aerospace software (Delmas and Souyris 2007; Bouissou et al. 2009; Bertrane et al. 2015). In particular, it has been used for proving the absence of runtime errors in the primary flight control software of the fly-by-wire system of Airbus airplanes.

It is argued (Cousot and Cousot 2010) that model checking, deductive verification and static program analysis represent approximations of the program semantics formalized by the abstract interpretation theory (Cousot and Cousot 1977).

Broadly speaking, this thesis focuses on static program analysis techniques that are meant to be used during interactive theorem proving in order to facilitate and automate the verification of a certain class of properties in the context of a strongly typed language.

1.2 The Frame Problem in a Nutshell

The frame problem (McCarthy and Hayes 1969) was first identified and described by McCarthy and Hayes in 1969, in the context of Artificial Intelligence (AI). Its history is essentially intertwined with that of logicist AI, the branch of AI attempting to formalize reasoning within mathematical logic. The initial description of the frame problem is the following:

“In proving that one person could get into conversation with another, we were obliged to add the hypothesis that if a person has a telephone he still has it after looking up a number in the telephone book. If we had a number of actions to be performed in sequence, we would have quite a number of conditions to write down that certain actions do not change the values of certain fluents. In fact, with n actions and m fluents we might have to write down mn such conditions.”

Unsurprisingly, given its identification in the context of logicist AI, the frame problem manifests itself in the realm of formal software specification and verification as well (Borgida, Mylopoulos and Reiter 1993). In this area, it remains an open problem with notoriously tedious consequences, one that imposes a considerable amount of manual effort. For instance, when considering a simple procedure

transferAmount(ownerId, id1, id2, amount)

that records the transfer of a given sum of money, amount, from a customer's (identified by ownerId) current deposit account (identified by the account number id1) to a savings account (identified by the account number id2), a reasonable specification would be the following:

Precondition: owner(id1) = ownerId ∧ owner(id2) = ownerId ∧ availableAmount(id1) ≥ amount

Postcondition: availableAmount(id1)' = availableAmount(id1) - amount ∧ availableAmount(id2)' = availableAmount(id2) + amount

The program states prior to the procedure's execution and the ones subsequent to it are referred to by the typical unprimed/primed notation and by the availableAmount(id) and owner(id) functions. The given specification declares a precondition that has to hold prior to transferring the indicated sum of money from one account to the other: it stipulates that the customer identified by ownerId must be the owner of both accounts involved in the transaction. It also requires that the amount of money currently available in the deposit account identified by id1 is at least the amount to be transferred. The postcondition specifies the procedure's effects on the final program state and encompasses the conditions that have to hold after executing the procedure. They include a stipulation about incrementing the amount of money available in the savings account by the transferred sum, amount, as well as one about decrementing the amount of money available in the current account by the same amount.
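To make the reading of this specification concrete, the sketch below encodes the precondition and postcondition as executable Python checks over two dictionaries, one mapping account numbers to owners and one mapping them to available amounts. The dictionary representation, the function names and the sample accounts are assumptions introduced for illustration only; the primed state is modelled by the `after` mapping.

```python
def precondition(owner, available, owner_id, id1, id2, amount):
    """Both accounts belong to owner_id and id1 holds at least `amount`."""
    return (owner[id1] == owner_id
            and owner[id2] == owner_id
            and available[id1] >= amount)


def postcondition(before, after, id1, id2, amount):
    """`before` is the unprimed state, `after` the primed one."""
    return (after[id1] == before[id1] - amount
            and after[id2] == before[id2] + amount)


def transfer_amount(available, id1, id2, amount):
    """Purely functional update: the input mapping is left untouched."""
    updated = dict(available)
    updated[id1] = available[id1] - amount
    updated[id2] = available[id2] + amount
    return updated


# Hypothetical sample data.
owner = {"A1": "alice", "A2": "alice", "B1": "bob"}
available = {"A1": 100, "A2": 20, "B1": 50}

assert precondition(owner, available, "alice", "A1", "A2", 30)
after = transfer_amount(available, "A1", "A2", 30)
assert postcondition(available, after, "A1", "A2", 30)
print(after)
```

Note that nothing in `postcondition` mentions the account B1 or the `owner` mapping, so the check says nothing about whether they were left alone.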

As discussed by Borgida et al. (Borgida, Mylopoulos and Reiter 1993), the principles on which this specification relies are simple and ubiquitous. Program states are represented in terms of predicates and functions, and a procedure's effects on the program state are represented as changes to one or more of these predicates and functions. However, the above specification can be interpreted in at least two ways, and multiple implementations with different effects can comply with it. For instance, one possible implementation results in exactly two changes to the program state, as required by the postcondition and as intuitively expected. Another implementation makes these two changes but additionally changes the ownership of the two accounts involved in the transaction. The postcondition still holds after executing this second version of the procedure. However, the intuitive interpretation of the given specification, namely that nothing else but the amount of money in the two accounts changes, is inconsistent with the second implementation, which does more than is necessary or indeed desired. In order to prevent such situations, the postcondition for the transferAmount(ownerId, id1, id2, amount) procedure would also have to include conditions such as

∀ id, owner(id)' = owner(id) ∧ owner(id2)' = owner(id2) ∧

∀ id, id ≠ id1 ⇒ id ≠ id2 ⇒ amount(id)' = amount(id)

In other words, the postcondition should include not only information about what changes, but also about what does not change. While this might not seem dramatic for the trivial example illustrated above, in real-world examples it quickly escalates, leading to the necessity of specifying a plethora of conditions of the same type as the ones indicated above. These are called frame properties. Writing such conditions is necessary, but also notoriously repetitive and tedious. Kogtenkov et al. (Kogtenkov, Meyer and Velder 2015) rightfully state that

“It is hard enough to convince programmers to state what their program does; forcing them, in addition, to specify all that it does not do may be a tough sell.”

The tedious, undeserved manual effort entailed by the specification and verification of frame properties is a manifestation of the frame problem. Though certain conventions and approaches, such as the implicit frames approach for specifying frame properties, can alleviate the manual effort imposed, some manifestation of the frame problem will be visible to some extent in the context of any specification language and verification method.
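The difference between the postcondition and the frame properties of the transferAmount example can be made tangible with a small Python sketch. It assumes a state represented as a dictionary with an "owner" and an "available" mapping, and compares two hypothetical implementations: one that changes only the two balances, and one that additionally reassigns ownership. All names and sample values are illustrative assumptions.

```python
def postcondition(state, new_state, id1, id2, amount):
    """The two balances changed as specified."""
    before, after = state["available"], new_state["available"]
    return (after[id1] == before[id1] - amount
            and after[id2] == before[id2] + amount)


def frame_condition(state, new_state, id1, id2):
    """Everything the procedure is not supposed to touch is unchanged."""
    owners_kept = all(new_state["owner"][i] == state["owner"][i]
                      for i in state["owner"])
    others_kept = all(new_state["available"][i] == state["available"][i]
                      for i in state["available"] if i != id1 and i != id2)
    return owners_kept and others_kept


def transfer_minimal(state, id1, id2, amount):
    available = dict(state["available"])
    available[id1] -= amount
    available[id2] += amount
    return {"owner": dict(state["owner"]), "available": available}


def transfer_intrusive(state, id1, id2, amount):
    result = transfer_minimal(state, id1, id2, amount)
    result["owner"][id1] = "mallory"  # extra change the postcondition does not forbid
    return result


state = {"owner": {"A1": "alice", "A2": "alice", "B1": "bob"},
         "available": {"A1": 100, "A2": 20, "B1": 50}}

for transfer in (transfer_minimal, transfer_intrusive):
    new_state = transfer(state, "A1", "A2", 30)
    print(transfer.__name__,
          postcondition(state, new_state, "A1", "A2", 30),
          frame_condition(state, new_state, "A1", "A2"))
```

Both implementations satisfy the postcondition; only the explicit frame check tells them apart, and that is exactly the kind of condition that becomes tedious to write for every procedure and every part of a large state.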

1.3 Prove & Run: Objectives and Products

The proliferation of mobile devices with unprecedented processing power, storage capacity and access to information has already generated a plethora of new possibilities for billions of people. Breakthroughs in emerging technology, stemming from fields such as artificial intelligence and the Internet of Things, have increased the number of such possibilities but have also brought forth an unprecedented number of massive security risks and challenges. Prove & Run's² objective is to offer solutions for the security challenges entailed by the large-scale deployment of mobile and connected devices and of the Internet of Things.

Attempts at addressing security challenges and diminishing or eliminating potential security issues in systems linked to such devices must put their underlying operating systems and kernels at the core of their efforts to ensure the absence of errors or faulty behaviours. Any software running on the operating system depends on the operating system. Furthermore, operating systems run in privileged modes, in which protection from certain faulty behaviours is non-existent and bugs can lead to arbitrary effects. Therefore, these central software parts need to provide a high level of trust and demonstrate proven and auditable compliance with security properties.

Motivated by the desire to integrate the usage of formal methods in the industrial world, and therefore to contribute to the increase of software quality and security, the company's initial efforts concentrated on offering a reliable software solution that facilitates the formalization of software functioning and mathematically proves that this software accurately and correctly follows its specification and ensures complex security properties. This led to the development of ProvenTools, a software development toolchain designed to write and formally prove models written in Smart, Prove & Run's purely functional, unified programming and specification language. For formally proving models written in Smart, ProvenTools integrates an interactive proof assistant, which automates simple proofs and guides or assists users during more complex ones. The prover was designed to offer detailed explanations about its results, providing either the reasoning steps employed for achieved proofs or detailed information for properties that cannot be proven. Such transparency on the prover's side is imperative for products that have to be certified, as auditors need to be able to verify the claims of the prover. Furthermore, ProvenTools includes a generator for transforming programs modeled in Smart into their equivalents in other languages such as C, while leveraging the proof guarantees of the Smart model.

Following the development of ProvenTools, Prove & Run reached a new stage, concentrating on developing and providing formally proven microkernels and hypervisors. Unlike the widely used operating systems, which are enormous and typically have millions of lines of code, microkernels are compact, minimal software systems that can provide all the mechanisms that need to run in privileged mode, including low-level address space management, thread management and inter-process communication. They can be used for creating a protected, secure environment on the execution platform, on top of which sensitive, security-critical services can run. Being much smaller in size compared to traditional operating systems, they are amenable to formal verification. Hypervisors, or virtualization platforms, create and host virtual machines. They create the possibility of running multiple different operating systems whose execution is managed by the hypervisor, which has full control over all critical resources, such as the memory or the CPU. Therefore, any security issue of the hypervisor impacts every operating system it hosts. The security and reliability of the host hypervisor are thus crucial.

² Prove & Run Website: http://www.provenrun.com

By employing Smart and ProvenTools, two microkernels have been developed.³ The first, named ProvenCore, is a formally proven, general-purpose microkernel that ensures isolation, i.e. integrity and confidentiality. The second, named ProvenCore-M, targets embedded devices based on microcontrollers. ProvenVisor is a hypervisor currently in development at Prove & Run.

1.4 Context and Problem Statement

During the development of ProvenCore, it became obvious that the specification and verification of transition systems in general, and operating systems in particular, are not insulated from the frame problem. The latter are characterized by complex states, defined by algebraic data types and associative arrays, which are fundamental building blocks for representing, grouping and handling complex data efficiently. Transitions, their other characteristic component, map such a complex input state to an output state. However, most transitions are rarely concerned with the entire input state that they are manipulating for retrieving the output state. Most frequently, they depend on and modify only a limited subset of it. Intuitively, properties holding for the input state should hold for the output state following the transition as well, as long as they depend only on fragments of the state that are not modified by the transition. In practice, proving the preservation of such properties does not come for free and imposes considerable manual effort and a multitude of tedious, repetitive proofs.

[Figure 1.1 – Complex Transition Systems: Frame Problem]

³ Prove & Run Products: http://www.provenrun.com/products


This general case is illustrated in Figure 1.1, where a transition system and a state s in it are considered. For the state s, a property depending only on a limited subset, shown in the grey rectangle with vertical lines, is known to hold. A transition f leads to a new state t, obtained by modifying only a small part of the input state s, shown in the orange rectangles with inclined lines. Since the previously proven property is known to depend only on an unmodified subset of the state, we should be able to infer the preservation of the property for the state t as well. This, however, is not inferred by default.

The goal of this work is to address this issue and to find an automatic solution for inferring the preservation of such properties. More specifically, we target the automatic inference of properties that depend only on an input subset that is disjoint from an operation's frame, i.e. the state subset it modifies.

To this end, we propose a solution based on static analysis, which does not require any additional frame annotations. We argue that by detecting the subset on which a property depends and by uncovering the part that is not modified by an operation, as shown in Figure 1.2, we can automatically discharge proof obligations related to unmodified parts. We employ two different static analyses for this goal.

[Figure 1.2 – Frame Problem and Solution Strategy (panels: Dependency, Correlation, Invariant)]

The first analysis of our two-step strategy is a dependency analysis, which is meant to detect the input subset δ on which the outcome of an operation or of a logical property L relies. This was illustrated by the grey rectangle with vertical lines in Figure 1.1. The second one is a correlation analysis, meant to detect the subset ξ modified by an operation O. This was illustrated by the orange rectangles with inclined lines in Figure 1.1. By employing these two static analyses, thus detecting δ and ξ automatically, and by subsequently reasoning based on their combined results, we can infer the preservation of the property L for the post-state of O.
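The final reasoning step can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: δ and ξ are modeled as sets of access paths into the state, two paths are considered overlapping when one is a prefix of the other, and the field names are hypothetical. The abstract domains actually used by the two analyses are presented in Chapters 5 and 7.

```python
from typing import Iterable, Tuple

# An access path names a (sub)element of the state, e.g. ("threads", "current").
Path = Tuple[str, ...]


def overlaps(p: Path, q: Path) -> bool:
    """Two paths overlap when one is a prefix of the other."""
    n = min(len(p), len(q))
    return p[:n] == q[:n]


def preserved(delta: Iterable[Path], xi: Iterable[Path]) -> bool:
    """True when nothing the property depends on (delta) is modified (xi)."""
    modified = list(xi)
    return not any(overlaps(d, m) for d in delta for m in modified)


# Hypothetical analysis results for a property L and an operation O.
delta = {("memory", "pages")}                  # L reads only memory.pages
xi = {("threads", "current"), ("scheduler",)}  # O modifies these parts

print(preserved(delta, xi))
```

When the check succeeds, the proof obligation for L on the post-state of O can be discharged without user interaction; when it fails, nothing is concluded and the obligation is left to the usual proof process.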

We target the development of a proof tactic that relies on our solution based on static analysis and that is meant to be integrated into the interactive proof assistant offered by ProvenTools. Smart, the language to which the ProvenTools toolchain is associated, is a purely functional language manipulating immutable algebraic data structures and associative arrays.


The motivation and ideas behind this work were triggered by the verification of ProvenCore. Its proof is based on multiple refinements between successive models, from the most abstract, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. The global states of the abstract layers are complex structures with multiple compound fields. Commands such as fork, exec and exit can be executed. Each of these receives as input the global state before executing the command and returns the state of the system after execution. In practice, most supported commands effectively affect only a limited number of invariants. Automatically proving the preservation of unaffected invariants can diminish the total number of proof obligations.

1.5 Contributions and Structure of the Document

We propose an approach for automatically inferring the preservation of framing-related invariants, which is meant to be used in the context of an interactive theorem prover. Our approach employs two different static analyses, namely a dependency analysis and a correlation analysis. Both analyses handle associative arrays and algebraic data types, i.e. structures and variants, and compute fine-grained results mirroring the layered structures of such types.

The dependency analysis handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural analysis. For variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario.

In order to introduce a relaxed form of context-sensitivity for our dependency analysis, we have devised an extension based on symbolic paths.

The correlation analysis detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. It is an interprocedural analysis that summarises the behaviour of functions and detects what is modified and to what extent.

For both analyses, a prototype has been implemented and applied to a medium-sized functional specification of a microkernel.

The rest of this dissertation is structured into 8 chapters, the first two being introductory.

Chapter 2 discusses the manifestations and effects of the frame problem on both formal specification and formal verification, and presents some of the main approaches employed for addressing them. We also include a brief presentation of some of the leading specification languages and deductive verification tools and their mechanisms for dealing with frame properties.

In Chapter 3, we introduce the features and the syntax of Smart, the unified programming and specification language developed at Prove & Run, and give a concise overview of ProvenTools, the toolchain associated with it.


After these two preliminary chapters, in Chapter 4 we focus on the computational version of Smart's intermediate language, as it is the language that we consider throughout the rest of this dissertation. We present its syntax, underline its specificities and present its formal semantics.

Chapter 5 is dedicated to the dependency analysis, the first of the two static analyses that we have developed and designed as companion tools to be used during interactive program verification. We present our abstract dependency domain, which mirrors the layered structure of associative arrays and algebraic data types, discuss the analysis at an intra- and interprocedural level, and present the semantic interpretations of the computed dependency information.

Chapter 6 touches upon the issue of context-sensitivity and presents our extension to the dependency analysis presented in Chapter 5. This is meant to eliminate some imprecision by introducing a relaxed form of context-sensitivity.

The correlation analysis, the second component of our strategy for inferring the preservation of frame-related invariants, is presented in Chapter 7. We introduce our abstract partial equivalence type, discuss the need for an additional level of abstraction allowing us to refer not only to variables but also to substructures within them, and give an in-depth presentation of the analysis at an intraprocedural level and a description of it at the interprocedural level.

The implementations of our two analyses and the results obtained on a medium-sized functional specification of a microkernel are presented in Chapter 8. The strategy for employing the information computed by the two analyses is discussed and illustrated.

Finally, Chapter 9 concludes this dissertation with a summary of our contributions and some remarks concerning the specificities of each of our static analyses, as well as our experience with their design and implementation. In addition, we also discuss future perspectives and potential extensions to this work.

Notes about Chapter 5 and Chapter 7

• The work presented in Chapter 5 was the subject of a publication in the proceedings of the 17th International Conference on Formal Engineering Methods (ICFEM'15) (Andreescu, Jensen, and Lescuyer 2015).

• The work presented in Chapter 7 was the subject of a publication in the proceedings of the 14th International Conference on Software Engineering and Formal Methods (SEFM) (Andreescu, Jensen, and Lescuyer 2016).

• On-line dedicated web pages: The prototypes for each of the two discussed static analyses can be tested on their dedicated web pages. Various examples are provided and explained and, additionally, users can devise and test their own examples. The corresponding links are indicated in the chapters.


Chapter 2

The Frame Problem in Software Verification

All his successors gone before him have done't; and all his ancestors that come after him may.

William Shakespeare

In this chapter, in Section 2.1, we give a very brief, necessarily incomplete presentation of some of the major existing specification languages and verification tools, focusing on those which have addressed the frame problem explicitly and which are relevant for our discussion in the section following it. We then discuss the manifestations of the frame problem in formal specification and verification in Section 2.2 and present the basic approaches to specifying and verifying frame properties in Section 2.3. In Section 2.4, we explain some of the difficulties entailed by these goals when combined with other concerns, such as considerations regarding heap modifications and information hiding. Even though we are not concerned with information hiding, and heap modifications are beyond the scope of our work, there are some parallels that can be drawn and some ideas stemming from work that has been done in these areas that are relevant for our context and solution as well. In Section 2.5, we briefly present other approaches to the automatic detection of frame properties. Finally, we give a short overview of some of the approaches used for specifying and reasoning about pure methods in Section 2.6.

2.1 Specification Languages and Verification Tools

Dafny. Dafny (Leino 2010) is a programming language designed at Microsoft, with a focus on verification. It is an imperative, sequential language supporting generic classes, dynamic allocation and inductive data types. Additionally, it also offers built-in specification constructs such as pre- and postconditions, frame specifications (which we will discuss in more detail in Section 2.3), quantifiers, loop invariants and termination metrics (decreases clauses used in conjunction with loop invariants). These are reminiscent of contracts in Eiffel (Meyer 1997; Meyer 1991) or similar constructs in JML (Leavens, Baker, and Ruby 2006) and Spec# (Barnett et al. 2005b), which we will present in the following paragraphs as well. Additionally, Dafny also includes support for algebraic data types, recursive functions and types, as well as updatable ghost variables, which are not allowed to flow into non-ghost variables. Ghost variables, and specification constructs in general, are eliminated from the executable code, as they are meant to be used strictly during verification. For framing, Dafny relies on dynamic frames (Kassios 2006) using ghost variables. We will discuss this approach in Section 2.4.

Dafny has an accompanying static program verifier, run as part of the compiler, which targets the verification of functional correctness properties of programs. This is built on top of the Boogie verification engine (Barnett et al. 2005a), which in turn uses Z3 (Moura and Bjørner 2008). The Dafny compiler translates verified programs written in Dafny to executable code for the .NET platform. The tool is open source and can be tried online.¹

Smart, the modeling language developed at Prove & Run, will be presented in detail in Chapter 3. Similar to Dafny, it is a unified programming and specification language designed with the goal of facilitating verification. Unlike Dafny, Smart is a functional language relying on predicates, the equivalent of functions in other programming languages. Both Dafny and Smart are translated into intermediate languages (Boogie and Smil, respectively), which act as median layers between Dafny or Smart programs and the underlying verification tools. For Smart, the deductive verification tool is an interactive proof assistant. Executable code can be generated from both verified Dafny and verified Smart models.

Spec#. The Spec# programming system (Mike Barnett 2005; Barnett et al. 2005b; Barnett et al. 2011) includes a programming language, a compiler and a static program verifier. It stems from a research effort focusing on the development of a specification methodology for object-oriented languages and seeking suitable approaches for enforcing it, both statically and dynamically. The Spec# methodology introduced some new ideas that influenced the research community and served as a starting point for other approaches (Barnett et al. 2011). It supports sound modular verification of object invariants in the presence of multi-object invariants, subclassing and reentrancy. Spec# led to advances concerning the specification of pure methods, i.e. methods without side-effects, and it introduced an ownership model that allows expressing and using heap topologies in specifications (Barnett et al. 2011). We will discuss the latter in Section 2.4.

The language Spec# is a formal object-oriented language extending the type system of C# with non-null types and checked exceptions. It provides standard method contracts based on pre- and postconditions, as well as object invariants, as inspired by Eiffel and the Design by Contract (Meyer 1992) approach. The accompanying compiler performs various static data-flow analyses for checking that the non-null type system is enforced and that contracts are pure, i.e. have no side-effects. In addition, it also performs admissibility checks, which are important for soundness and consist in restricting what can appear in object invariants and what pure methods can read. The compiler also emits runtime checks: run-time assertions are generated for the program points at which contracts are supposed to hold, and any failure causes an exception to be thrown (Barnett et al. 2011).

¹ Dafny Web Page: https://www.microsoft.com/en-us/research/project/dafny-a-language-and-program-verifier-for-functional-correctness. Accessed: 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oE9sn0iL)

Another important contribution originating in the Spec# project is the Boogie intermediate language and verification engine. Spec# programs are translated to the Boogie language, where the heap is modeled as a two-dimensional array indexed by object references and field names. Method calls are modeled by assuming their preconditions and type information, by assigning arbitrary values to anything that they might modify, and by subsequently assuming their postconditions. Based on this, verification conditions are generated and expressed in a standard format supported by automatic theorem provers. Any error reported by the theorem prover is mapped back to Boogie and then to Spec# (Barnett et al. 2011).

Spec#² has been developed at Microsoft and is publicly available.

Boogie. The Boogie project³ comprises both an intermediate verification language and a verification tool. The Boogie language (This is Boogie 2 Boogie Reference Manual) is meant to be used as an intermediate representation for static program verifiers of various source languages, such as Dafny, Chalice and Spec#. Verifiers for C, such as VCC and HAVOC, have been built on top of Boogie as well. It supports mathematical (types, constants, functions, axioms) and imperative components (global variables, procedure declarations and implementations). The latter specify sets of execution traces, thereby describing and constraining states using the former. Parametric polymorphism, partial orders, nondeterminism, logical quantifications, total expressions and partial statements are among the language's features.

The Boogie verification tool (Barnett et al. 2005a) infers invariants of the input Boogie programs and then generates verification conditions, expressed as formulae in first-order logic and arithmetic, that are passed to an SMT solver such as Z3. The encoding for the verification formulae allows the reconstruction of error traces from failed proofs.

JML. The Java Modeling Language (JML) (Leavens, Baker, and Ruby 2006; Leavens et al. 2006) is a behavioural interface specification language (Wing 1987) targeting, as its name implies, the specification of Java classes and interfaces. Its design was guided by the syntax and semantics of Java, as some of the main targeted characteristics were understandability and a shallow learning curve for programmers already familiar with Java. The constructs it supports are inspired by the Design by Contract approach, as well as by the Larch family of specification languages (Guttag, Horning, and Wing 1985). It also includes quantifiers, constructs for specifying frame conditions, and specification-only fields and methods.

² Spec# Web Page: https://www.microsoft.com/en-us/research/project/spec. Accessed: 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAJnY8b)

³ Boogie Web Page: https://www.microsoft.com/en-us/research/project/boogie-an-intermediate-verification-language. Accessed: 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAgwOzp)

Nowadays, an ever-growing variety of tools supports JML (Burdy et al. 2005), ranging from tools for type-checking specifications (the jmlc compiler) to tools for runtime debugging, static analysis (such as ESC/Java2 (Flanagan et al. 2002; Burdy et al. 2005; Chalin et al. 2005) and Chase) and verification (such as LOOP, KeY and KRAKATOA).

ESC/Java2 performs extended static checking (Flanagan et al. 2002) for Java programs annotated with specifications written in JML. It can check assertions and detect frequent types of errors in Java, such as dereferencing null or indexing an array outside its bounds. However, the ESC/Java2 tool did not initially address aspects related to checking frame conditions, and this became a notorious source of unsoundness (Burdy et al. 2005). Various static verification tools (Berg and Jacobs 2001; Cataño and Huisman 2003; Marché, Paulin-Mohring, and Urbain 2004; Marché 2016) and dynamic approaches (Lehner and Müller 2010) addressed this issue.

2.2 Manifestations of the Frame Problem

In the realm of software verification, the frame problem refers to establishing the boundaries within which program elements operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties or frame conditions, which indicate which parts of the program state an operation is allowed to modify, and their verification, i.e. proving that operations modify only what is allowed according to the specified frame properties. Additionally, the verification of frame properties has other ramifications, such as proving the preservation of properties concerning parts of the state that are external to an operation's frame, i.e. the parts of the state modified by the operation. Though identified decades ago, in 1969, in the context of Artificial Intelligence (McCarthy and Hayes 1969), the frame problem is still a current concern in the field of formal specification and verification. Leavens et al. (Leavens, Leino, and Müller 2007) identify it as one of the difficult remaining challenges in program verification. Even more recently, Bertrand Meyer described it as a subsisting problem (Meyer 2015). He argues that it constitutes an excellent candidate for automation and describes the usual approaches to the frame problem, such as those frequently based on separation logic (Reynolds 2005) or ownership types (Clarke, Potter, and Noble 1998), as elegant but requiring undeserved manual specification effort, in addition to annotations on the implementation side. In order to make verification appealing to a wider audience in the industry, the amount of annotations required from the programmers is of the utmost importance and thus must be carefully taken into consideration when devising a solution. While it is legitimate to require the specification of properties expressing the functional behaviour expected of program elements, intermediate properties, to which frame properties belong, should as much as possible be detected automatically. They are an integral part of a complete specification and they are necessary for proving functional correctness but, in practical terms, they are repetitive and cumbersome, and their specification is an inconvenience (Meyer 2015). Borgida et al. provide a comprehensive discussion of the problem itself and the approaches to addressing it (Borgida, Mylopoulos, and Reiter 1993; Borgida, Mylopoulos, and Reiter 1995). In (Borgida, Mylopoulos, and Reiter 1995), Borgida et al. suggest grouping the permissions to modify variables around variables themselves, instead of methods. However, this type of specification has unclear semantics in terms of proof obligations (Müller 2002). A more recent discussion of framing is provided by Hatcliff et al., and it is included in a comprehensive survey of behavioural interface specification languages (Hatcliff et al. 2012). A discussion regarding the remaining challenges related to the frame problem, with a focus on modular verification and information hiding, is included in (Leavens, Leino, and Müller 2007). The authors discuss possible approaches for addressing these challenges, as well as their respective limitations. In the following section, we present the main existing approaches to specifying frame properties.

We remark that Smart does not provide any explicit specification constructs for frame conditions. It is a functional language and it does not support global variables or destructive updates. Implicitly, Smart predicates may read anything passed to them as an input, without modifying it, and write everything in their output or locally declared variables. The preservation of a frame property, i.e. a logical property depending only on parts of the input that are copied without any modification to the output, can be specified as an implication of the form

frame_property(input) ⇒ predicate(input, output) ⇒ frame_property(output)

which can be included either in the predicate's postcondition or as a separate predicate with a Boolean result, receiving the predicate's input and output elements as inputs.
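The shape of this implication can be mimicked with a short Python sketch, written here as a separate Boolean check over an input and an output. This is not Smart syntax: the State type, its fields and the two predicates are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    owner: str      # copied unchanged by the transition below
    balance: int    # the only part the transition modifies


def frame_property(s: State) -> bool:
    """A property that depends only on the 'owner' field."""
    return s.owner != ""


def predicate(inp: State, out: State) -> bool:
    """Relational view of a transition: 'out' is 'inp' with balance + 1."""
    return out.owner == inp.owner and out.balance == inp.balance + 1


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def preservation(inp: State, out: State) -> bool:
    """frame_property(inp) => predicate(inp, out) => frame_property(out)."""
    return implies(frame_property(inp),
                   implies(predicate(inp, out), frame_property(out)))


# Exhaustive check over a small, hypothetical state space.
states = [State(o, b) for o in ("", "alice") for b in (0, 1, 2)]
print(all(preservation(i, o) for i in states for o in states))
```

Testing over a finite set of states only illustrates the statement; in a proof assistant the implication has to be proven for all inputs and outputs, which is the effort the present work aims to automate.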

2.3 Approaches to Specifying Frame Properties

Various approaches for expressing frame properties have emerged. These are known as the manual, exclusive and implicit approaches (Meyer 2015). We remark that all three major approaches target only the specification of write effects of an operation. Most specification languages do not offer special constructs for the specification of read effects (some notable exceptions are JML, Dafny and WhyML, the programming and specification language provided by Why3).

2.3.1 The Manual Approach
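
One of the existing approaches to specifying frame properties does not rely on any specific technique but instead treats them like any other specification component. This consists in explicitly stating, for each operation, what is not modified, implicitly conveying that everything else may change. This type of specification can be done with logical variables or with old expressions, by explicitly stating for each unchanged variable that its value in the operation's post-state is equal to its prior value in the operation's pre-state.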

As described by McCarthy and Hayes (McCarthy and Hayes 1969), with m operations, such as transfer, and n "fluents", such as owner in our introductory example from Section 1.2, the manual convention leads to a proliferation of clauses that need to be specified. Their number can potentially be as high as m × n. This can prove to be tedious and repetitive, diverting attention and effort from what is truly interesting: what is actually modified by the operation, and how. Moreover, this approach can lead to instability in the software process (Meyer 2015).

For instance, adding new fields to a class whose existing methods are not affected by the newly added fields requires modifying the postcondition of each existing method and adding clauses of the form newField = old newField for each added field.
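The growth in the number of clauses can be made concrete with a small Python sketch that enumerates the clauses the manual convention would require. The names transfer and owner come from the text; the other operation and field names, and the table of what each operation modifies, are assumptions made for illustration.

```python
from typing import Dict, List, Set


def manual_frame_clauses(fields: List[str],
                         modified_by: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each operation, list one 'f = old f' clause per field it leaves unchanged."""
    return {
        op: [f"{f} = old {f}" for f in fields if f not in modified]
        for op, modified in modified_by.items()
    }


def total(clauses: Dict[str, List[str]]) -> int:
    return sum(len(c) for c in clauses.values())


# Hypothetical class with three fields and two operations.
fields = ["owner", "balance", "limit"]
modified_by = {"transfer": {"owner"}, "deposit": {"balance"}}

before = manual_frame_clauses(fields, modified_by)
# Adding a field that no existing operation touches still changes every postcondition.
after = manual_frame_clauses(fields + ["newField"], modified_by)

print(total(before), total(after))
print(after["transfer"])
```

Each added field contributes one extra clause per existing operation, which is the instability mentioned above.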

Both Dafny (Leino 2010) and Spec# (Leino and Müller 2008a) support clauses of the form e = old(e) in method postconditions for specifying that a method has no impact on the value of an expression e. However, these are not the primary mechanisms for specifying frames in either Dafny or Spec#, as we will discuss in Section 2.3.2.

In Smart, for predicates manipulating inputs and outputs of the same structured type, it can be specified in the postcondition that the values of certain fields are equal between the received input and the obtained output. For instance, for a predicate receiving an input structure of type stype having fields f, g, h and returning an output structure of the same type, where the values of the fields f, h are equal to their values in the input, a standard postcondition would have the following form:

stype.equals[f, h](input, output)

This can be viewed as a form of old expressions. However, the construct used in the above postcondition, which we will discuss in Chapter 3, was not introduced specifically for this purpose. This idiom is frequently employed for specifying contracts for implicit predicates, a form of foreign or native function signatures.

As we will discuss in Chapter 7, the fine-grained relations that we are detecting between parts of the input and parts of the output can be seen as clauses of the form subvalue = old(subvalue). However, in our case, these are detected automatically by means of static analysis and thus do not require any annotation or manual effort. Furthermore, by detecting them automatically, the potential of changes to the modeled entities and types leading to instability is eliminated.

Another problem with this approach becomes visible when some variables are not in scope and hence cannot be explicitly mentioned in the specification (Hatcliff et al. 2012). In order to overcome the problem in this context, complex solutions (Reynolds 1981; O'Hearn, Reynolds, and Yang 2001; Banerjee, Naumann, and Rosenberg 2008) based on Hoare logic style frame rules (Hoare 1971) have been suggested (Hatcliff et al. 2012).


2.3.2 The Exclusive Approach

The most frequent approach to framing is the exclusive approach. This consists in expressing frame properties by means of modifies clauses that list all the variables that may be modified by an operation. Implicitly, everything that is not listed in such clauses is understood as having to remain unchanged (Guttag et al. 1993a). This approach relies on the observation that the m × n matrix described by McCarthy and Hayes is usually sparse, as most operations affect only a limited number of elements (Meyer 2015).

Modifies clauses such as modifies a, b, c can be interpreted as a set of clauses of the form q = old(q) for any q other than a, b or c. Despite their widely accepted, yet mildly misleading name, modifies clauses do not require a command to modify all the listed elements. Essentially, modifies clauses put an upper bound on the set of elements that can be modified and imply that it is strictly forbidden to modify anything else. The exclusive approach to specifying frame properties owes its name to its characteristic of identifying unaffected elements by exclusion (Meyer 2015). Bertrand Meyer argues that a more appropriate name for such clauses is only clauses (Meyer 2015), since the main goal is not necessarily to enumerate variables that will change, but rather to specify that everything else, i.e. variables that are not listed, will not change.
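This reading of a modifies clause as an upper bound can be sketched in Python as a check over a pre-state and a post-state. The representation of states as dictionaries with identical keys is an assumption made for the sake of the example and does not correspond to any particular specification language.

```python
from typing import Dict, Set


def respects_modifies(pre: Dict[str, int],
                      post: Dict[str, int],
                      modifies: Set[str]) -> bool:
    """Check q = old(q) for every variable q that is NOT listed in the clause.

    Assumes 'pre' and 'post' describe the same set of variables.
    """
    return all(post[q] == pre[q] for q in pre if q not in modifies)


pre = {"a": 1, "b": 2, "c": 3, "d": 4}

# Only 'a' actually changes: allowed, since listed variables *may* change.
post_ok = {"a": 10, "b": 2, "c": 3, "d": 4}
# 'd' changes although it is not listed: forbidden.
post_bad = {"a": 1, "b": 2, "c": 3, "d": 5}

print(respects_modifies(pre, post_ok, {"a", "b", "c"}))
print(respects_modifies(pre, post_bad, {"a", "b", "c"}))
```

The first call succeeds even though b and c are left untouched, which illustrates why "only" describes such clauses more accurately than "modifies".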

This approach has its roots in the modifies construct presented by Liskov and Guttag (Liskov and Guttag 1986). Forms of modifies clauses have been used in many different specification languages, including the Larch family (Guttag, Horning, and Wing 1985; Guttag et al. 1993a), JML (Leavens et al. 2006), Spec# (Mike Barnett 2005), Dafny (Leino 2010) and Z (Abrial, Schuman, and Meyer 1980).

In JML (Leavens, Baker, and Ruby 2006), modifies clauses are called assignable clauses and are used for indicating locations that a method may assign to. These are slightly different from classical modifies clauses in other languages. For instance, a method assigning to a location a and then re-establishing its original value is required to list a in its corresponding assignable clause. A typical modifies clause, however, does not require listing a, since the method does not modify a effectively. JML also features conditional modifies clauses, allowing methods to specify that a modification may occur only in certain situations. Non-pure methods that do not explicitly specify assignable clauses are by default given an assignable everything clause. Pure methods have by default an assignable nothing clause (Chalin et al. 2005). Additionally, JML provides accessible clauses that allow specifying accessed locations (Leavens et al. 2006).

In Dafny (Leino 2010), modifies clauses are expressed by sets of objects and they must be interpreted as giving permissions to a method to modify any field of any object that is a member of the specified set. Frame conditions are thus expressed at the level of objects and not at the level of object fields. While Dafny methods are not required to specify what they read, for Dafny predicates, i.e. functions returning Booleans, reading frame conditions can also be specified (Koenig and Leino 2012). These are memory locations that predicates are allowed to read, and they can be specified as sets of objects or object fields. Dafny checks that memory locations outside the reading frame are not accessed: nested predicate calls must have reading frames that are included in the reading frames of the calling predicate. Predicate parameters are not memory locations and hence need not be declared. In addition, Dafny uses a form of dynamic frames (Kassios 2006) that we will present in Section 2.4.

In Spec# (Mike Barnett, 2005; Leino and Müller, 2008a), modifies clauses can be explicitly added for constraining the modification of objects that were allocated in the pre-state of a method, i.e., new objects allocated and modified by a method need not be included in the modifies clauses. Methods can specify that any field of an object o may be modified with a construct of the following form: o.*; it can also be specified that only some field a may be modified, with a construct of the form o.a. Unlike the clauses expressed using old in postconditions for excluding some modifications, modifies clauses must account for temporary modifications as well (similarly, thus, to the JML assignable clause interpretation). For instance, for a method decrementing some integer field f and incrementing it subsequently, the method could still specify that f = old(f) in its postconditions. However, it would also have to include f in its modifies clause.
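The difference between a net change (what a postcondition of the form f = old(f) observes) and an assignment (what an assignable-style clause must account for) can be sketched in Python. The TrackedState class and its assign method are hypothetical helpers introduced for this illustration only; they are not part of JML or Spec#.

```python
class TrackedState:
    """Mutable state that records every field assigned to (hypothetical helper)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.written = set()

    def assign(self, name, value):
        self.fields[name] = value
        self.written.add(name)  # remembered even if the value is later restored


def dec_then_inc(state):
    # Temporarily modifies f, then re-establishes its original value.
    state.assign("f", state.fields["f"] - 1)
    state.assign("f", state.fields["f"] + 1)


state = TrackedState(f=5, g=0)
old = dict(state.fields)
dec_then_inc(state)

# Net effect: nothing changed, so f == old(f) holds as a postcondition.
net_changed = {k for k in old if state.fields[k] != old[k]}
print(net_changed)    # set()

# Assignments performed: f was written, so it must appear in the clause.
print(state.written)  # {'f'}
```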

Spec# implicitly adds a modifies clause to methods, in which this.* is the only listed element. Thus, by default, methods are allowed to modify any field of the this object. To prevent this, the fields that may be modified must be explicitly included in the clause (meaning that those not included are not allowed to change). A special construct of the form this.0 must be explicitly used for specifying that a method does not modify any field of this (Leino and Müller, 2008a).

Information hiding imposes mechanisms for abstracting over program state that cannot be explicitly mentioned in the modifies clause of a public method. To this end, wildcards can be used for specifying that the private representations of objects may be modified, as well as for specifying the modification of state in subclasses (Leino and Müller, 2008a). However, wildcards do not extend to aggregate objects, and to this end Spec# introduces the notion of ownership, which we will discuss in Section 2.4.

In Boogie, frame conditions are expressed using coarse-grained modifies clauses in conjunction with postconditions. These can quantify over fields and specify locations of the heap that may be modified (This is Boogie 2; Boogie Reference Manual).

SPARK (Barnes and Limited, 1997) uses a variation of the typical exclusive approach. SPARK procedures may reference or update the state associated with their parameters, in addition to that of global variables. SPARK contracts must explicitly account for the global variables accessed (read or written) during procedure execution in a globals construct. Additionally, for each parameter or global variable, it must be indicated whether it is read only, written only, or both read and written. As SPARK is based on the Ada language, this is done by means of mode annotations such as in and out, indicating that a parameter or global variable is read only or written only, respectively. The in out annotation is used for signaling that the annotated parameter or global variable is both read and written. Together, mode annotations on parameters and globals provide a complete specification of the inputs and outputs of a procedure (Hatcliff et al., 2012). VDM (Jones, 1990) provides similar annotations.


The exclusive convention facilitates the specification of pure operations, i.e., operations having no side-effects, on which assertions in various languages, including Eiffel, JML, and Spec#, rely for supporting data abstraction. Specifying that an operation is pure simply amounts to specifying an empty modifies clause. However, specifying and verifying the effects of heap modifications on the results of pure methods has been described as one of the difficult remaining challenges related to framing (Hatcliff et al., 2012).

2.3.3 The Implicit Approach

The implicit approach eliminates the need to specify frame properties per se. One of the implicit approaches relies on limiting what a procedure can modify based on the procedure's precondition. This approach is adopted in separation logic (discussed in Section 2.4) and in the implicit dynamic frames (Smans, Jacobs, and Piessens, 2012) technique, where reading and writing to memory requires knowing that the memory contains that location. To this end, accessibility information is specified in the preconditions of methods. By analysing preconditions, an upper bound on the set of locations that are modifiable by a procedure can be detected. As will be discussed in Chapter 7, our approach to inferring fine-grained modifications can be seen as an implicit one as well. It relies on data-flow analysis and it is entirely automatic, without requiring any dedicated annotations.

Another approach to implicit framing was presented by Meyer. He proposes the inference of frame properties for a method from the method's postcondition (Meyer, 2015). This approach relies on the empirical observation that, in practice, when programmers realize that an element is modified by a method's execution, they will generally include and express information about how the element is modified. It was inspired by an informal review of publicly available JML code, which showed that, in practice, elements included in an assignable clause overlap those appearing in the method's postcondition. Meyer argues that any exception to this observation can be easily addressed by inserting a Boolean function into the postcondition which always returns true and which introduces its elements into the implicit frame (Meyer, 2015).

2.4 Topologies and Effects

Specification techniques for complex data structures and operations manipulating them must be able to describe and to address issues related to two different aspects, namely the topology or structure of the former and the effects of the latter on the data structures' state (Hatcliff et al., 2012). In the object-oriented realm, objects encapsulate state and functionality, yet their implementations are rarely limited to the fields and methods of a single object. After all, one of the principles of object-oriented programming is to favour composition over inheritance. Thus, object fields reference other objects, often of different classes, and those objects in turn reference yet other objects, and so on. In order to reason about and to prove functional correctness, specifications have to capture this "composite" shape of the implemented data structures (Leino and Müller, 2008a). They also have to describe the effects of operations on the state of the data structures, including write effects, i.e., which parts are potentially modified by an operation, and read effects, i.e., which parts are potentially accessed by an operation (Hatcliff et al., 2012).

For objects and heap data structures, the write and read effects (Greenhouse and Boyland, 1999) refer to parts of the heap, i.e., locations. Specifications for heap data structures might also require including allocation and deallocation effects, as well as locking information (Hatcliff et al., 2012). Detecting and reasoning about read and write effects is necessary and relevant in different situations. For instance, Greenhouse and Boyland (Greenhouse and Boyland, 1999) present an effects system for performing semantics-preserving program manipulation on Java source code.

Our work is done in the context of a purely functional language with immutable data structures and no destructive updates. Reasoning about the heap is beyond our scope. However, our concerns are similar: we handle "composite" data structures modeled by immutable associative arrays and algebraic data types, i.e., structures and variants, and we want to capture the behaviour of operations receiving such a composite input, manipulating it, reconstructing it, and returning its new state in a composite output. Thus, in contrast to specification and reasoning techniques for objects, which are concerned with deep-heap effects, we are concerned with deep-state effects.
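As a rough illustration of what a deep-state effect looks like without a heap, the following Python sketch models a composite value with frozen dataclasses and rebuilds it functionally. The types Point and Shape and the function move_right are hypothetical; Smart structures are only approximated here, not reproduced.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Shape:
    origin: Point
    label: str


def move_right(s: Shape, dx: int) -> Shape:
    """Return a new Shape; only origin.x differs from the input."""
    return replace(s, origin=replace(s.origin, x=s.origin.x + dx))


before = Shape(Point(0, 0), "a")
after = move_right(before, 3)

# The input is untouched, and the output agrees with it on every
# substructure except origin.x: this is the "frame" of move_right.
print(before.origin.x, after.origin.x)      # 0 3
print(after.origin.y == before.origin.y)    # True
print(after.label == before.label)          # True
```

The frame property of interest is thus a relation between an input and an output value, rather than between two states of a shared heap.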

Specification techniques for topologies and effects must address three major challenges, namely abstraction, reasoning, and framing (Hatcliff et al., 2012).

Abstraction. In the object-oriented context, heap properties must be expressed in an implementation-independent manner. Abstraction is important for information hiding and for supporting subtyping (Leino, 1998; Leavens and Müller, 2007). Aspects related to visibility and information hiding are orthogonal to our work. The language we are working with does not have subtyping. Therefore, disclosing the topology of our data structures is not problematic from this point of view.

Reasoning. The formal framework in which (heap) properties are expressed should allow efficient, ideally automatic, reasoning.

Framing. Specifications of heap operations should ease reasoning about framing and aid in proving that certain heap properties are not affected by a heap operation. Framing can be illustrated by the following rule, expressing that a state that is unmodified by C can be preserved:

\[
\frac{\{P\}\ C\ \{Q\}}{\{P \land R\}\ C\ \{Q \land R\}}
\]

if the write effect of C is disjoint from the free variables of R. In the presence of complex heap data structures, the disjointness of the effects of C and the assertion R is more difficult to express, as it needs to specify that the locations that are modified by C are disjoint from the locations read by R. Similarly, though not referring to locations, we have to be able to express that the substructures (or subelements) modified by C and those read by R are disjoint.

The sets of written or read locations are called footprints. Hatcliff et al. classify approaches to the specification of heap properties into three categories. The first category relies on explicit footprints and uses sets of objects or locations that are included in predicates and effects specifications. Dynamic frames (Kassios, 2006; Kassios, 2011) and region logic (Banerjee, Barnett, and Naumann, 2008; Banerjee, Naumann, and Rosenberg, 2013) are the main exponents of this category. The second category relies on implicit footprints, which are derived from predicates in specialized logics such as separation logic. The third category relies on predefined footprints, which are derived from predefined heap topologies (Hatcliff et al., 2012). Ownership types (Clarke, Potter, and Noble, 1998) are the main exponent of this category. All of these techniques allow specifying the topologies of common heap data structures and reasoning about the effects of operations. However, each amounts to a different balance between expressiveness and automation (Hatcliff et al., 2012).

2.4.1 Explicit Footprints

The explicit footprint approach to framing was pioneered by Kassios and the dynamic frame theory (Kassios, 2006; Kassios, 2011). This proposed adding sets of locations to the specification language and expressing footprints in terms of such sets. For preserving information hiding, these sets of locations can involve dynamic frames: specification variables that abstract over a set of locations. The initial solution based on dynamic frames was formalized in the context of an idealized logical framework using higher-order logic and inductive-based proofs, which are difficult to automate. Subsequent work on region logic (Banerjee, Naumann, and Rosenberg, 2008; Banerjee, Barnett, and Naumann, 2008; Banerjee, Naumann, and Rosenberg, 2013) and the Dafny verifier on the one hand, and VeriCool (Smans, Jacobs, and Piessens, 2008) on the other hand, developed dynamic frames in a first-order setting.

VeriCool uses pure methods for describing sets of locations. Recursively defined pure methods or logic functions can be a challenge for automatic theorem provers (Hatcliff et al., 2012; Banerjee, Barnett, and Naumann, 2008).

In region logic, for minimizing the need for inductively defined predicates in specifications, the specification attributes used in the dynamic frames approach (Kassios, 2006) are replaced with ghost state (Banerjee, Naumann, and Rosenberg, 2013), i.e., mutable auxiliary fields and variables. Programs have to be explicitly annotated with these, which might imply a cumbersome manual effort, but, unlike the dynamic frame theory in its original form, this permits automated theorem proving.

Zee et al. have used explicit footprints for verifying the functional correctness of linked data structures in Jahob (Zee, Kuncak, and Rinard, 2008). Banerjee et al. (Banerjee, Naumann, and Rosenberg, 2008; Banerjee, Barnett, and Naumann, 2008) encoded region logic in the intermediate verification language Boogie (Leino and Rümmer, 2010).


The dynamic frames approach using ghost variables is supported by the Dafny language (Leino, 2010; Koenig and Leino, 2012). As described in Section 2.3.2, Dafny supports the exclusive approach to specifying frames. Ghost variables are used in modifies clauses. The standard idiom consists in declaring a set-valued ghost field, Repr for instance, to dynamically maintain Repr (i.e., explicitly update it in the code) as the set of objects that are part of the receiver's representation, and to use Repr in modifies clauses (Leino, 2010). The following idiom is standard (Leino, 2010):

    class MyClass {
      ghost var Repr: set<object>
      method SomeMethod()
        modifies Repr
    }

This modifies clause is to be interpreted as: the method may modify any field of any object in Repr. If this is a member of the Repr set, then the modifies clause also allows the method to modify the field Repr itself (Leino, 2010).

With explicit footprints, proving frame properties consists in proving that the read effects of a predicate and the write effects of a method are disjoint.
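The disjointness argument can be sketched in Python under simplifying assumptions: the heap is a dictionary keyed by (object, field) pairs, and footprints are plain sets of such keys. All names below (heap, pred, method, footprints_disjoint) are hypothetical and serve only to illustrate the idea; real verifiers establish the disjointness statically rather than by running code.

```python
# A toy heap: locations are (object name, field name) pairs.
heap = {("n1", "val"): 1, ("n2", "val"): 2, ("n3", "val"): 9}

# Explicit read footprint of the predicate below.
pred_reads = {("n1", "val"), ("n2", "val")}


def pred(h):
    # Reads only the locations listed in pred_reads.
    return h[("n1", "val")] <= h[("n2", "val")]


# Explicit write footprint of the method below.
method_writes = {("n3", "val")}


def method(h):
    # Returns the post-state; writes only the locations in method_writes.
    new = dict(h)
    new[("n3", "val")] = 0
    return new


def footprints_disjoint(reads, writes):
    return reads.isdisjoint(writes)


# Because the footprints are disjoint, the predicate is framed:
# its value is the same before and after the method runs.
print(footprints_disjoint(pred_reads, method_writes))  # True
print(pred(heap) == pred(method(heap)))                # True
```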

Before the dynamic frame approach, data groups (Leino, 1998; Leino, Poetzsch-Heffter, and Zhou, 2002) and solutions based on the Universe type system (Müller, 2002) had been proposed for specifying footprints within single objects.

The level of expressiveness offered by techniques based on explicit footprints is very high, allowing specifications to relate different regions in arbitrary ways, ranging from disjointness or inclusion of regions to characterizing their intersection. However, this flexibility complicates reasoning. When regions are stored explicitly in ghost variables, as is done in Dafny, programs need to explicitly update these ghost variables to maintain invariants. This can prove to be a cumbersome task. When pure methods are used, as in VeriCool, it is mandatory to reason explicitly about the effects of heap modifications on their results (Hatcliff et al., 2012).

2.4.2 Implicit Footprints

The implicit footprint approaches rely on specialized logics for implicitly representing footprints. Separation logic (O'Hearn, Reynolds, and Yang, 2001; O'Hearn, Yang, and Reynolds, 2004; Reynolds, 2002; Reynolds, 2005; Reynolds, 2000) is the most prominent representative of this category.

Separation logic extends Hoare logic (Hoare, 1971) with the separating conjunction operator ∗. Each assertion in separation logic defines a portion of the heap. The assertion P ∗ Q is true if and only if P and Q hold for disjoint parts of the heap. Local reasoning is fundamental to separation logic (O'Hearn, Reynolds, and Yang, 2001): specifications need to describe all the state that the code C reads or writes. Thus, in the triple {P} C {Q}, P must be interpreted as being all the state that is needed for executing C, i.e., the footprint of C. This interpretation of Hoare triples leads to the following frame rule in separation logic:

\[
\frac{\{P\}\ C\ \{Q\}}{\{P \ast R\}\ C\ \{Q \ast R\}}
\]

which allows inferring that a local property is preserved for a wider state obtained by extending P with another disjoint state R. Some versions of separation logic impose additional conditions about local variable modifications, as the ∗ operator only separates heaps. Separation logic can be extended such that ∗ also separates variables, thus eliminating the need for additional conditions (Parkinson, Bornat, and Calcagno, 2006).
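The meaning of the separating conjunction can be made concrete with a small, brute-force Python model. It assumes heaps are finite dictionaries from locations to values and assertions are Boolean functions over heaps; the helpers sep_conj and points_to are hypothetical names chosen for this sketch, and the exhaustive enumeration of splits is only viable for tiny heaps.

```python
from itertools import combinations


def sep_conj(p, q):
    """Build the assertion p * q: the heap splits into two disjoint
    parts h1 and h2 such that p holds on h1 and q holds on h2."""
    def holds(heap):
        locs = list(heap)
        for r in range(len(locs) + 1):
            for part in combinations(locs, r):
                h1 = {l: heap[l] for l in part}
                h2 = {l: heap[l] for l in heap if l not in part}
                if p(h1) and q(h2):
                    return True
        return False
    return holds


def points_to(loc, val):
    """Assertion 'loc points to val': the heap is exactly that single cell."""
    return lambda heap: heap == {loc: val}


x_is_1 = points_to("x", 1)
y_is_2 = points_to("y", 2)

# Two distinct cells can be separated.
print(sep_conj(x_is_1, y_is_2)({"x": 1, "y": 2}))  # True

# The same cell cannot be claimed by both conjuncts.
print(sep_conj(x_is_1, x_is_1)({"x": 1}))          # False
```

The second result shows why the frame rule is sound in this model: an assertion R joined to P by ∗ necessarily describes cells outside the footprint of C.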

A separation logic for Java was introduced by Parkinson (Parkinson and Bierman, 2005). This has primitive assertions to describe the values of fields in the heap and allows describing portions of the heap containing several disjoint objects using the ∗ operator.

Separation logic does not require explicitly specifying read or write effects. They are implicit in a method's precondition. Data structures are specified using logic functions. By including such a logic function in a method's precondition, the method is allowed to read and write anything belonging to the footprint of the logic function, but cannot access anything outside this footprint.

Approaches based on separation logic are hard to implement and to integrate into verification tools. Verifiers based on separation logic have mostly relied on symbolic execution and have not yet achieved the same level of automation as verifiers based on verification condition generation (Hatcliff et al., 2012). However, a series of tools currently exist that can reason using separation logic. These include Smallfoot (Berdine, Calcagno, and O'Hearn, 2005; Berdine, Calcagno, and O'Hearn, 2012), SpaceInvader (Distefano, O'Hearn, and Yang, 2006; Calcagno et al., 2008), jStar (Distefano and Parkinson, 2008; Naudziuniene et al., 2011), VeriFast (Jacobs, Smans, and Piessens, 2010; Jacobs et al., 2011), and SLAyer (Berdine, Cook, and Ishtiaq, 2011).

The implicit dynamic frames approach (Smans, Jacobs, and Piessens, 2012) unifies the dynamic frames concept with separation logic. Framing specifications of a method are inferred using an implicit approach, as described in Section 2.3.3. They are encoded in first-order logic and can be used for automatic verification with SMT solvers. This is done in VeriCool (Smans, Jacobs, and Piessens, 2008) and Chalice (Leino, Müller, and Smans, 2009).

2.4.3 Predefined Footprints

In contrast to the implicit and explicit footprint approaches, which describe properties found in a program, the third approach focuses on reasoning efficiently about programs with restricted topologies. Ownership types (Clarke, Potter, and Noble, 1998) are representative of this approach.

Ownership types typically enforce a tree topology, whereby every object in the heap has at most one owner object and the owner relation is acyclic. Topological properties beyond this tree structure have to be expressed using object invariants and predicate logic. Read and write effects typically use ownership as an abstraction mechanism: the right to read or write an object includes the right to read or write all the objects it (transitively) owns (Hatcliff et al., 2012).

Spec# addresses framing through ownership types: without explicit specifications stating otherwise (modifies clauses of the form presented in Section 2.3.2), methods may modify only the fields of the receiver and of those objects within the subtree of which the receiver is the root. Ownership is expressed by means of attributes on field declarations (Barnett et al., 2004; Barnett et al., 2011).

Ownership has been used to verify write effects (Müller, Poetzsch-Heffter, and Leavens, 2003) and invariants (Drossopoulou et al., 2008; Leino and Müller, 2004; Müller, Poetzsch-Heffter, and Leavens, 2006). All the existing ownership-based verification techniques enforce that all modifications of an object must be initiated by the object's owner. This gives owners total control over modifications of their internal representations and allows them to maintain invariants (Hatcliff et al., 2012). Ownership-based approaches have been used for reasoning about model fields (Leino and Müller, 2006) and for enforcing object immutability (Leino, Müller, and Wallenburg, 2008).

The ownership topology can be enforced by type systems (Lu, Potter, and Xue, 2007; Müller, 2002). In JML, it is enforced through universe types (Dietl and Müller, 2005). In Spec#, it is encoded as object invariants (Barnett et al., 2004).

Reasoning about framing relies on the tree structure on the heap enforced by ownership. The ownership trees rooted in two different objects o1 and o2 are disjoint if neither o1 owns o2 nor o2 owns o1. The disjointness of ownership trees can then be used to prove that read and write effects of methods do not overlap (Hatcliff et al., 2012).
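The disjointness criterion for ownership trees can be checked on a toy model. In the Python sketch below, the owner dictionary, the object names, and the helper functions are all assumptions introduced for illustration; the sketch relies on the owner relation being acyclic, as ownership types guarantee.

```python
# Each object has at most one owner (None for roots); the relation is acyclic.
owner = {
    "list": None,
    "node1": "list",
    "node2": "node1",
    "other": None,
    "cell": "other",
}


def owns(o1, o2):
    """True if o1 transitively owns o2."""
    cur = owner[o2]
    while cur is not None:
        if cur == o1:
            return True
        cur = owner[cur]
    return False


def subtree(root):
    """All objects in the ownership tree rooted at `root`."""
    return {o for o in owner if o == root or owns(root, o)}


def trees_disjoint(o1, o2):
    """Criterion from the text, for two different objects."""
    return o1 != o2 and not owns(o1, o2) and not owns(o2, o1)


# Unrelated roots: the criterion holds and the subtrees share no object.
print(trees_disjoint("list", "other"))                 # True
print(subtree("list").isdisjoint(subtree("other")))    # True

# node2 is transitively owned by list: the trees overlap.
print(trees_disjoint("list", "node2"))                 # False
```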

2.5 Other Approaches to Reason about Frames

Rakamarić and Hu report in (Rakamaric and Hu, 2008) a method to infer frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve, 2007). DSA provides a summary of points-to relations as a graph that is used to compute a set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node that is reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e., in a purely automatic setting.

In (Taghdiri, Seater, and Jackson, 2006), Taghdiri et al. present a technique for extracting procedure summaries for object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically, and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. The extracted procedure summaries can be viewed as detailed frame conditions describing which memory locations might be changed and how.


In (Sozeau, 2009), Sozeau presents a generalized rewriting technique, implemented in the Coq proof assistant, that allows substituting a term t of an expression by another term t′ when t and t′ are related by a relation R. This generalizes equational reasoning to reasoning modulo arbitrary relations. The technique relies on dependent types and is based on a constraint generation algorithm generating type class constraints. The Coq tactic supports polymorphic relations, morphisms, and subrelations.

Bertrand Meyer proposed the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer, 1991), an object-oriented language with native support for Design by Contract features (Meyer, 1992). The first component, the frame specification inference, relies on the analysis of method postconditions, as described in Section 2.3.3, and yields, for a method p, a first set. This represents an overapproximation of the set of elements that are allowed to be modified by p according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder, 2015), which is itself based on the alias calculus (Kogtenkov, Meyer, and Velder, 2015; Meyer, 2010; Meyer, 2011). Methods are analysed and a second set is detected for p; this represents an overapproximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second.

2.6 Other Relevant Work

Pure methods, also known as queries or observers, are side-effect free methods that always evaluate to the same result value given the same input value. They are intensively used for providing specifications for methods without disclosing implementation details in languages such as JML, Spec#, and Eiffel. Leavens et al. identify the development of specification and verification techniques for determining the effects of heap modifications on the results of pure methods as one of the remaining challenging problems related to framing (Leavens, Leino, and Müller, 2007). Though our work is not concerned with heap modifications, we are interested in the dependency of Boolean Smart predicates, i.e., logical properties, on the layered ("composite") data structures they are receiving as inputs. In Chapter 5, we present a static analysis meant to capture such dependencies.

Various encodings of pure methods (Cok, 2005; Darvas and Müller, 2006) in program logic have been proposed, but they do not cover aspects related to reasoning about frame properties when the specifications make use of pure methods. Some specification techniques for frame properties (Leavens, Baker, and Ruby, 2006; Leino and Müller, 2006; Leino and Nelson, 2002; Müller, Poetzsch-Heffter, and Leavens, 2003) allow describing the fields that are potentially modified by a method execution using modifies clauses. These, however, do not specify the effects of a method execution on the results of pure methods (Leavens, Leino, and Müller, 2007).

One technique for determining the effects of heap modifications on the results of pure methods requires listing all pure methods that are potentially affected by a method in the method's modifies clause. This approach is adopted in COLD-K (Feijs and Jonkers, 1992), where the frame of a procedure specification lists the variables and the equivalent of pure methods whose value may be changed by the procedure. For dealing with modularity issues, COLD-K also makes use of read effects.

Other approaches (Leino and Müller, 2006; Müller, Poetzsch-Heffter, and Leavens, 2003) for determining effects on the results of pure methods rely on model fields. These are specification-only constructs whose value is determined by applying a mapping to the concrete state of an object. They are similar to pure methods but, unlike the latter, they do not have parameters and they are required to be confined (Leino and Müller, 2006; Müller, Poetzsch-Heffter, and Leavens, 2003).

Approaches based on model fields require that pure methods read only the state of the receiver object and its sub-objects. This information about the read effect of a pure method can be used to determine which write effects potentially have an impact on the result of a pure method. In general, it can be proven that a method m does not affect the result of a pure method p if the write effect of m and the read effect of p are disjoint (Leavens, Leino, and Müller, 2007).

There are various approaches to using read effects for reasoning about pure methods. One approach relies on complete specifications of result values included in the postconditions of pure methods. Used in conjunction with modifies clauses, these allow determining whether a method affects the result of a pure method (Leavens, Leino, and Müller, 2007). Various solutions based on explicitly specified read effects exist (Feijs and Jonkers, 1992; Greenhouse and Boyland, 1999; Jacobs and Piessens, 2006). Specification of these using data groups (Leino, 1998; Leino, Poetzsch-Heffter, and Zhou, 2002) and an effects system built on top of an ownership type system (Clarke and Drossopoulou, 2002) have been proposed. Multi-threaded programs also require such specifications (Praun and Gross, 2003).


Chapter 3

The Smart Language and ProvenTools

Languages are not strangers to one another.

Walter Benjamin

In this chapter we introduce Smart, a programming and specification language developed at Prove & Run, as well as the toolchain associated with it. While not claiming to be exhaustive, we give an overview of the language's features and syntax in Section 3.1. In Section 3.2 we present the tools manipulating Smart models. Section 3.3 briefly presents Smil, the Smart Intermediate Language. A computational version of it, αSmil, is targeted by the static analyses presented throughout the remainder of this thesis. The following chapter will focus entirely on αSmil, illustrating its usage and introducing its syntax and formal semantics.

3.1 The Smart Modeling Language

Smart is a modeling language developed at Prove & Run. It constitutes a unified programming and specification language designed to facilitate proofs. One of the common, often cited reasons why programmers reject the use of formal methods is that they are not willing to learn a separate language just for specifying their programs, in particular if that language is fundamentally different from the programming language. Smart addresses this issue by allowing one to both develop the implementation of programs and to specify their logical properties in a single language.

The Smart language is a purely functional (side-effect free), strongly-typed, polymorphic, first-order language. The basic building blocks of programs written in Smart are predicates, the equivalent of functions in other common programming languages. Besides the common primitive types that are traditionally available as built-in types, algebraic data types (structures and variants) and associative arrays are provided as well. Exit labels constitute the language's main specificity; they facilitate separating data- and control-flow in programs.

In addition, being designed in order to write code that will subsequently be proven, the language allows the definition of various types of logical specifications as well. These range from pre- and postcondition contracts, local assertions, and loop invariants to inductive predicates, lemmas, and hypotheses.

ProvenTools is a complex set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain has the structure of a set of Eclipse plug-ins of JDT (Java Development Tools) type (Eclipse Java Development Tools (JDT)). Together, these constitute a complete Integrated Development Environment (IDE), allowing one to not only write, edit, and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover, and finally to generate executable code in C.

ProvenCore¹ (Lescuyer, 2015) and ProvenCore-M² are two microkernels that have been completely modeled in Smart and developed using ProvenTools. The former is a general-purpose microkernel that ensures isolation, i.e., integrity and confidentiality. The latter targets embedded devices based on microcontrollers.

Throughout the rest of this section, we will present some of the main concepts and mechanisms of Smart, discussing predicates, control flow, algebraic data types, and specification-only constructs.

3.1.1 Smart Predicates and Types

Smart supports modular program development with a straightforward module concept. Modules constitute the compilation units of Smart programs, and any valid Smart program consists of a non-empty set of modules, which are themselves organized in packages. Modules have an identifier that is unique in each program and, in practical terms, each module corresponds to a file. Modules can import other modules, and they contain a list of type and constant declarations as well as a list of predicates.

Predicates, the equivalent of functions in other common programming languages, are the basic building blocks of programs written in Smart. Though named in reference to predicate logic, predicates in Smart receive a number of inputs and produce a number of outputs in return, in contrast to predicates in mathematics, which are commonly understood to be Boolean-valued functions of the form:

\[ P : X \rightarrow \{\mathit{true}, \mathit{false}\} \]

Smart predicates can be classified in two different categories, namely implicit and explicit predicates, based on their implementation or their lack thereof.

Implicit predicates can be seen as a form of assumption: as their name suggests, they are not implemented per se, but simply declared using the implicit program keywords. Such predicates are similar to the declarations of native methods in Java or external functions in C. Traditionally, in Java, programmers use the Java Native Interface (JNI) (Liang, 1999; Java Native Interface Documentation (JNI), 1999) when they need to implement small time-critical code portions in a lower-level language such as assembly, or when they need to access a library already written in another programming language such as C. In Smart, implicit predicates play an important role with respect to code documentation. Their implementation is not provided in the model but, as we will further explain in Section 3.1.4, they can be used to specify logical properties of the explicit implementations provided externally in a lower-level language, typically in C or assembly.

¹ http://www.provenrun.com/products/provencore
² http://www.provenrun.com/products/provencore-m

For example, an implicit predicate converting an integer given as an input into a float can be declared as follows:

    public float_of(int n, float f+)
    implicit program

The predicate's result is given a name, f, and it is introduced as one of the predicate's parameters. It is marked as being the predicate's output by the + symbol following it, and is thereby syntactically distinguished from the predicate's input parameter n, which is unadorned.

In the general case, Smart predicates can have any number of input or output parameters. However, a parameter cannot be both at the same time, and each parameter must be explicitly marked either as an input or as an output. An input parameter's value can be read and used in the predicate's implementation. An output parameter's value must be constructed by the predicate's implementation and returned as a result. Furthermore, values in Smart are immutable. As a consequence, Smart predicates are pure: it is impossible to pass a parameter "by reference" and modify a predicate's input as a side effect. Smart is thus a side-effect-free language, which provides referential transparency (Strachey, 1967). Furthermore, the language supports neither global variables nor global states, but can be characterized rather as a state-passing style language. Smart predicates are deterministic: they always return the same output any time they are called with a specific set of input values. In particular, this is a prerequisite for implicit predicates.
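The state-passing style described above can be illustrated outside Smart. The following is a minimal Python sketch, not Smart code: the names Stack, push and pop are hypothetical and chosen only for illustration. The state is an immutable tuple that every function receives as an input and returns as a fresh value, so no call can modify its arguments.

```python
from typing import Tuple

# Immutable state, threaded explicitly through every call (hypothetical example type).
Stack = Tuple[int, ...]


def push(stack: Stack, item: int) -> Stack:
    """Return a new stack; the input tuple is never modified."""
    return stack + (item,)


def pop(stack: Stack) -> Tuple[Stack, int]:
    """Return the remaining stack together with the removed top element."""
    return stack[:-1], stack[-1]


s0: Stack = ()
s1 = push(s0, 1)
s2 = push(s1, 2)
s3, top = pop(s2)
```

Because each call returns a new value, the earlier states s0, s1 and s2 remain available and unchanged, and calling push twice with the same inputs yields the same result.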

As mentioned in the introduction, Smart is also a strongly-typed language. Each input and output parameter of a predicate must have an associated type, and the usage of an object of some type where a parameter of another data type is expected is forbidden by the language. Unsafe conversions between different types are forbidden as well. Smart provides various built-in types, such as int, short, long, char, boolean, float and double, that are traditionally available in other programming languages as well. Additionally, users can declare new types with the type keyword and then define predicates manipulating these types. As in the case of predicates, implicit data types can be simply declared without being explicitly defined. For example, supposing that an implicit data type called cartesian_point and the predicates manipulating it are defined in a lower-level language, we would make them available to other Smart predicates using the following declarations:

    // Implicit data type declaration
    type cartesian_point

    // Retrieve coordinate on X-axis
    public get_X(cartesian_point p, float x+)
    implicit program

    // Retrieve coordinate on Y-axis
    public get_Y(cartesian_point p, float y+)
    implicit program

    // Construct a new point p with coordinates (x, y)
    public new_point(float x, float y, cartesian_point p+)
    implicit program

    // Pretty-print
    public print_point(cartesian_point p)
    implicit program

Some implicit predicates manipulating inputs of type cartesian_point are declared as well. The first two of them, get_X and get_Y, simply return the input point's numerical coordinates on each of the Cartesian system's axes. The next predicate, new_point, creates and returns a new point from the two given input coordinates. The last one, print_point, simply displays the input point without effectively producing an output. Alternatively, it is possible to directly declare and implement these types and predicates in Smart, as we will show in the following paragraphs. As shown in the example, similarly to Java, comments in Smart can be introduced by using // for single-line comments or /* */ for multi-line comments. Similarly to Javadoc, code documentation can be given using the /** begin-comment delimiter.

In general, implicit data types and the implicit predicates manipulating them can act as a public interface for a concrete class, showing the type and the operations allowed to manipulate values of that type, but hiding the implementation.

Explicit data types can be declared and defined using structures and variants. For example, we could explicitly define the type cart_point by means of a structure having two different fields of type float, called x and y. Each of them corresponds to the point's numerical coordinate on the X- or Y-axis, respectively.

    type cart_point =
        float x
        float y

For representing a point in a polar coordinate system, we can define a different type, polar_point, as follows:

    type polar_point =
        // Radial coordinate (distance from the pole)
        float radius
        // Polar angle
        float azimuth

Explicit predicates have explicitly defined implementations following immediately after their declaration, which strongly resembles that of an implicit predicate, but from which the keyword implicit is omitted. Their bodies are sequences of statements, which are essentially calls to other predicates. For example, to translate a point (x, y), i.e., to add a given pair of numbers (a, b) to its Cartesian coordinates and obtain the new point (x', y') = (x + a, y + b), a predicate translate_point could be defined in the following manner:

    // Convert x to float, add it to y and retrieve the sum
    public sum_of(int x, float y, float s+)
    implicit program

    public translate_point(cartesian_point p, int a,
                           int b, cartesian_point q+)
    program
        // Local variables
        float xa
        float yb

        print_point(p)            // 1

        get_X(p, xa+)             // 2
        get_Y(p, yb+)             // 3

        sum_of(a, xa, xa+)        // 4
        sum_of(b, yb, yb+)        // 5

        new_point(xa, yb, q+)     // 6

        print_point(p)            // 7

The body of the translate_point predicate consists of a sequence of several statements. The first of these simply pretty-prints the input point p. The next two statements are calls to accessors of p's coordinates on the X- and Y-axis, which are stored in the local variables xa and yb, respectively. Next, the coordinates (xa', yb') = (a + xa, b + yb) of the translated point are computed by calling the sum_of predicate, which returns the float sum of an integer and a float. The output point q is constructed by calling the constructor new_point with xa and yb as inputs. The last statement pretty-prints the input point p again.

As illustrated by our example, each call to a predicate is made by passing the parameters in the same order as in the predicate's declaration, and by explicitly marking any output with +.³ Replacing line 4 with sum_of(xa, a, xa+) would result in an

³ This is mandatory because of overloading.


error, because the first input parameter of a call to sum_of is expected to be an integer and the second a float. Similarly, omitting the + symbol at line 6 and writing new_point(xa, yb, q) would result in an error. By explicitly marking the outputs of each statement, it is straightforward to distinguish between the variables that are actually written by the statement and those that are used only as inputs. Furthermore, since predicates are not allowed to modify their inputs, the language strictly forbids using a predicate's input parameter as an output for any statement in the predicate's body. Thus, in our example predicate, we are prevented from using the input point p as the output of the new_point predicate call. However, outputs and local variables, such as xa and yb, can be written to, but reading them (i.e., using them as inputs for a predicate call) before they have been written at least once amounts to using uninitialized variables and behaves in an unspecified manner. In our example, xa and yb are used as both inputs and outputs at lines 4 and 5, respectively. This is correct, since xa and yb are local variables that have already been written to by the statements at lines 2 and 3, preceding the calls to sum_of.

We stress again the fact that destructive updates are not possible in Smart. Even if, at first glance, a statement such as the call to sum_of at line 4 might give the impression that xa is modified in place, all that the statement actually does is create a new float, whose value is obtained by adding the old value of xa to the value of a, and then set xa to reference this new float instead of the old one. A simple conversion to static single assignment form (Cytron et al., 1989) would eliminate these assignments and show the absence of any mutation whatsoever. Thus, were we to inspect the state of the input point p before and after the calls to sum_of, we would observe that it remains unchanged; this is what we do when printing p again at the end of translate_point.
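To make the static single assignment argument concrete, here is a hedged Python sketch, not Smart code, of what translate_point could look like once every write introduces a fresh name. The point is modelled as a plain (x, y) tuple and the numbered names xa0, xa1, yb0 and yb1 are illustrative assumptions.

```python
def translate_point(p, a, b):
    """SSA-style rendering: each write creates a new name, nothing is mutated."""
    xa0, yb0 = p              # stands for get_X(p, xa+) and get_Y(p, yb+)
    xa1 = float(a) + xa0      # stands for sum_of(a, xa, xa+): a new value, named xa1
    yb1 = float(b) + yb0      # stands for sum_of(b, yb, yb+): a new value, named yb1
    q = (xa1, yb1)            # stands for new_point(xa, yb, q+)
    return q


p = (1.0, 2.0)
q = translate_point(p, 3, 4)
```

After the call, p still holds its original coordinates, which is the observation made by printing p a second time in the Smart example.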

As a last remark about our example, it is worth noting that the statement new_point(xa, yb, q+), which produces the predicate's output, is not the predicate's last statement. Smart does not support any dedicated return statement. Instead, when exiting from a predicate, the outputs hold the values that they have been assigned when executing the body. This mechanism allows one to define predicates having multiple outputs. Their names are chosen by the programmer and their values can be modified multiple times during the predicate's execution; however, the values retrieved are the ones that are available at the moment the program exits the predicate.

3.1.2 Exit Labels and Control Flow

Besides input and output parameters, the declaration of a predicate can also include a set of exit labels. When called, a predicate exits with one of the specified exit labels, thus summarising and returning to its callers further information regarding its execution.

Exit labels constitute the main specificity of the Smart language. They can denote different exceptional execution scenarios and act as exit codes, similarly to exceptions and exit status return values in other programming languages.

Every predicate has a non-empty set of labels. By default, any predicate has the built-in exit label true, which denotes the successful exit status of a predicate. The predicates illustrated previously in Section 3.1.1 did not have explicitly declared exit


labels; in such a case, it is assumed that the only possible exit label for the predicate is true, and hence that the predicate will succeed in all circumstances.

Returning to our previous example, the predicate translate_point, we could have written its complete declaration by explicitly stating that true is the only possible exit label:

    public translate_point(cartesian_point p, int a,
                           int b, cartesian_point q+)
    -> [true]
    program

This declaration is strictly equivalent to the one given in Section 3.1.1.

In the general case, any number of labels can be specified after the parameters. For

example, we could declare a predicate that converts the coordinates of an input point (x, y) of type cartesian_point to polar coordinates

$$r = \sqrt{x^2 + y^2}$$

$$\varphi = \operatorname{atan2}(y, x)$$

and returns a point (r, φ) of type polar_point with these coordinates. For computing the second polar coordinate, the polar angle or azimuth, the predicate would call another predicate, atan2, which is the arctangent function with two arguments, a common variation on the arctangent function. The atan2 function avoids the problem of division by zero; however, it is undefined when both x and y, i.e., the Cartesian coordinates, are zero. For declaring it in Smart, we can add a special exit label for the case when the given input coordinates represent the origin and the result cannot be returned:

    // Computes atan(y/x)
    public atan2(float x, float y, float at+) -> [true, undef]
    implicit program

The declared label's name, undef, is a custom name; any valid identifier can be chosen and used as a label in Smart. As previously mentioned, the exit label true is predefined and has a special meaning. Another predefined label that is interpreted in a special manner by conditional statements and logical operators is the false label. Together, these two exit labels offer a convenient manner to model a Boolean result. Frequently, a Boolean output value can be replaced by declaring these two possible exit labels: true, to denote a successful execution of the predicate, and false otherwise.

Besides indicating the followed execution scenario, exit labels play an important role with respect to control flow management. Primarily, the exit label of a call to a predicate determines whether the next predicate call in sequential order should be executed or not: when the predicate exits with true, the program can proceed to the


next statement in the program. Any other exit label lbl disrupts the normal control flow and forces the current predicate to exit with label lbl.

For example, a predicate cart_to_polar can be defined with two exit labels, true and undef, as well. It takes two float numbers, x and y, computes the corresponding polar coordinates r and phi by calling the predicates compute_radius and atan2, and constructs a new point p of type polar_point using the computed values:

    public compute_radius(float x, float y, float r+)
    -> [true]
    implicit program

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true, undef]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        atan2(y, x, phi+)
        new_polar_point(r, phi, p+)

There is no guarantee that the call to atan2 will return successfully with exit label true: it might return with undef, in which case the execution of cart_to_polar will break at that point and exit with label undef. Furthermore, no output will be generated. In Smart, exit labels condition the existence of output parameters: every output is associated to an exit label lbl, and it is generated if and only if the predicate exits with that particular exit label lbl. All other outputs are discarded and can be considered as unchanged by the caller. The same output can be associated to multiple labels. By default, if no output parameters are specified for a label, no outputs are generated when the predicate exits with this label. The only exception to this rule is made in the case of the built-in true label: since true normally represents a successful execution, every output of the predicate is associated to it by default. For example, the previous declaration of cart_to_polar is strictly equivalent to

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>, undef<>]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        atan2(y, x, phi+)
        new_polar_point(r, phi, p+)
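The propagation of exit labels and the association of outputs to labels can be mimicked in a general-purpose language. The following is a minimal Python sketch, not Smart code, under the assumption that a predicate is modelled as a function returning a pair (label, output), where the output is None for labels that carry no output. The helper name atan2_pred is hypothetical.

```python
import math


def atan2_pred(y, x):
    """Model of the atan2 predicate: exits with "undef" at the origin, "true" otherwise."""
    if x == 0.0 and y == 0.0:
        return "undef", None           # no output is associated to undef
    return "true", math.atan2(y, x)


def cart_to_polar(x, y):
    """Model of cart_to_polar with exit labels true<p> and undef<>."""
    r = math.hypot(x, y)               # compute_radius only exits with true
    label, phi = atan2_pred(y, x)
    if label != "true":
        return label, None             # any other label is forwarded, p is not produced
    return "true", (r, phi)            # new_polar_point(r, phi, p+)
```

Calling cart_to_polar(0.0, 0.0) yields the label undef without any point, while any other input yields true together with the polar coordinates.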

Exit labels can thus behave similarly to exceptions in other programming languages. In order to handle specific observed execution scenarios, Smart provides label transformers, which allow catching labels before they escape the current predicate and transforming


them into another label. Complex control flow can be expressed by indicating a set of rules of the form lbl1 -> lbl2, whose role is to transform the label lbl1 into lbl2, and by associating them to statements.

For example, we could let the predicate cart_to_polar return the label origin_fail when the inner computation of the azimuth fails, instead of just forwarding the label returned by atan2:

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>, origin_fail]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        [undef -> origin_fail] atan2(y, x, phi+)
        new_polar_point(r, phi, p+)

Alternatively, we could also handle the failure of the computation by using transformers and constructing the output point differently, for example by declaring a constant representing the azimuth of the origin, often called the pole in polar coordinates, and using this for the construction of p when the call to atan2 fails:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        [done -> true]
            [true -> done, undef -> true]
            atan2(y, x, phi+)
            phi = POLEAZIMUTH
        new_polar_point(r, phi, p+)

In the following, we show how the control flows when atan2 terminates with label true. The green arrows indicate how control is passed from one statement to the other, based on their exit labels, when starting from the call to the atan2 predicate.


    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        [done -> true]
            [true -> done, undef -> true]
            atan2(y, x, phi+)
            phi = POLEAZIMUTH
        new_polar_point(r, phi, p+)

And here is how the control flows when atan2 terminates with label undef:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
        float phi
        float r

        compute_radius(x, y, r+)              // 1
        [done -> true]                        // 2
            [true -> done, undef -> true]     // 3
            atan2(y, x, phi+)                 // 4
            phi = POLEAZIMUTH                 // 5
        new_polar_point(r, phi, p+)

After computing the radius r by calling compute_radius, this new version of the predicate starts by calling the predicate atan2. If this operation succeeds, then phi is the value of the azimuth and we can use this value as the second input parameter for the point's constructor new_polar_point. This is done by transforming true to a new label, done, whose effect is to jump immediately to the outer block, in this case the top-level. The top-level block of the program catches done, transforms it back to true and continues with the statement following the block, namely new_polar_point, which will construct the output p by using r and phi, the value of the azimuth returned by atan2. When atan2 is undefined, the transformer undef -> true is used to jump to an additional statement, phi = POLEAZIMUTH, that assigns the value of POLEAZIMUTH to phi. The constructor is reached in this case as well. However, this time the value of phi written at line 5 is used as the second input parameter. We note that the statement at line 5 is a call to a built-in assignment predicate, denoted by = and using an infix notation.
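The two control-flow scenarios just described can be traced with a small Python model. This is only a sketch under stated assumptions and not Smart code: predicates are modelled as functions returning (label, output), a label transformer is modelled as a dictionary lookup, and the helper names atan2_pred, transform and inner_block, as well as the value 0.0 chosen for POLE_AZIMUTH, are hypothetical (the source declares the constant without giving it a value).

```python
import math

POLE_AZIMUTH = 0.0  # assumed value, for illustration only


def atan2_pred(y, x):
    """Model of atan2: exits with "undef" at the origin, "true" otherwise."""
    if x == 0.0 and y == 0.0:
        return "undef", None
    return "true", math.atan2(y, x)


def transform(label, rules):
    """Apply a label transformer; labels without a rule pass through unchanged."""
    return rules.get(label, label)


def inner_block(x, y):
    """Inner block: the transformed call to atan2 followed by the fallback assignment."""
    label, phi = atan2_pred(y, x)
    label = transform(label, {"true": "done", "undef": "true"})
    if label != "true":
        return label, phi              # "done" escapes the block, skipping the fallback
    phi = POLE_AZIMUTH                 # reached only when atan2 exited with undef
    return "true", phi


def cart_to_polar(x, y):
    r = math.hypot(x, y)
    label, phi = inner_block(x, y)
    label = transform(label, {"done": "true"})   # the outer block catches done
    if label != "true":
        return label, None
    return "true", (r, phi)            # the constructor is reached in both scenarios
```

In this model the predicate always exits with true: at the origin the fallback azimuth is used, and elsewhere the value returned by atan2 is kept.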

The constant POLEAZIMUTH is declared using the keyword const. In Smart, constants can be declared and used directly as inputs for predicate calls.


In the general case, arbitrarily complex control flows can be expressed by coupling label transformers, blocks and recursion.

In order to facilitate the user's task of simulating common control flow structures with labels and transformers, Smart provides various control flow statements, which are themselves based on this mechanism. These include a construct that is equivalent to the try ... catch mechanism in Java, a conditional if ... then ... else control structure, as well as the common logical operators for negation (!), conjunction (&&), disjunction (||), implication (=>) and equivalence (<=>).

Given the Cartesian coordinates (x, y), the first polar coordinate, the radius, is obtained by computing

$$\sqrt{x^2 + y^2}.$$

For explicitly defining the predicate compute_radius, we would first need to implement a predicate sqrt computing the square root of a given positive number. Such a predicate can be recursively implemented as follows, by using the if ... then ... else construct and three implicit predicates:

    // Newton-Raphson Square Roots Finding Algorithm

    // Divides a by b and retrieves result in div
    public div_double(double a, double b, double div+)
    -> [true, undef]
    implicit program

    // Check if a is close enough to b: |a - b| < b * 0.001
    public close_approximation(double a, double b)
    -> [true, false]
    implicit program

    // Compute ((b + a/b) / 2)
    public better_approximation(double a, double b, double g+)
    -> [true, undef]
    implicit program

    public sqrt(double x, double g, double sqr+)
    -> [true, undef]
    // Returns the square root of x by making recursive calls
    // with better and better guesses g until reaching a guess
    // that is close enough to the actual square root's value
    program
        double aux

        div_double(x, g, aux+)
        if close_approximation(aux, g)
        then
            sqr = g
        else
            better_approximation(x, g, aux+)
            sqrt(x, aux, sqr+)
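For readers who wish to experiment with the recursion above, the following Python sketch mirrors its structure. It is not Smart code: exit labels are omitted (a zero guess would simply raise ZeroDivisionError instead of exiting with undef), and the name sqrt_newton is a hypothetical stand-in for the sqrt predicate.

```python
def close_approximation(a, b):
    """True when a is close enough to b, i.e. |a - b| < b * 0.001."""
    return abs(a - b) < b * 0.001


def better_approximation(a, b):
    """Newton-Raphson step: the average of the guess b and a / b."""
    return (b + a / b) / 2


def sqrt_newton(x, g):
    """Refine the guess g recursively until x / g and g nearly coincide."""
    aux = x / g                         # stands for div_double(x, g, aux+)
    if close_approximation(aux, g):
        return g                        # stands for sqr = g
    return sqrt_newton(x, better_approximation(x, g))


root_two = sqrt_newton(2.0, 1.0)
root_nine = sqrt_newton(9.0, 1.0)
```

The stopping test compares the quotient x / g with the guess g: when the two nearly coincide, g is close to the square root of x.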

Besides recursion, Smart also supports loops by providing a specific construct, while, that is similar to a traditional "while" loop in other programming languages.

The body of this while block is repeatedly executed until a dedicated exit label, called exit, tries to escape, in which case the loop is aborted and the execution continues after the block. A "break" can be achieved by raising the special exit label inside the loop.

For instance, the previously recursive predicate sqrt can be implemented iteratively with a while loop as follows:

    public sqrt_iter(double x, double g, double sqr+)
    -> [true, undef]
    // Computes the square root of x iteratively
    program
        div_double(x, g, sqr+)
        while
            double aux

            [true -> exit, false -> true] close_approximation(sqr, g)
            better_approximation(x, g, aux+)
            div_double(x, aux, sqr+)

3.1.3 Polymorphism & Algebraic Data Types

Smart supports polymorphic types and predicates. For declaring polymorphic types, a number of type parameters must be introduced in the type's declaration. For example, an implicit type of polymorphic pairs can be declared as follows:

    type pair<A, B>

This type is parameterized by two types, A and B, which are the types of the first and second projection of the pair. Type variables must always start with an uppercase letter, while regular types must always start with a lowercase letter. The declaration of polymorphic predicates is straightforward. For instance, declaring an implicit constructor for the pair type declared above amounts to the following:


    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

This predicate is implicitly parameterized by two type variables, A and B. The type parameters of a predicate are implicitly determined by the type variables in its arguments. Local variables in explicit predicates can also be declared with polymorphic types. However, they can only depend on type variables introduced in the predicate's parameters. Type variables in polymorphic types can be instantiated by any type.

As mentioned in Section 3.1.1, Smart allows users to define their own concrete data types by using algebraic data types, namely structures and variants.

Structures. Structures, also called records or tuples in other programming languages, represent the Cartesian products of the different types of their elements, called fields. In Smart, these can be declared in two manners: either by using the keyword struct, followed by the name of the structured type and its list of field types and field names, or by using the keyword type, as shown below. The latter is preferred. Declaring polymorphic structures is possible by introducing type variables in the definition.

    struct pair<A, B>
        A fst
        B snd

    type pair<A, B> =
        A fst
        B snd

In order to build and manipulate structures, Smart supports built-in constructors and accessors. For instance, for the following type definition of a structure

    type t =
        t1 f1
        t2 f2
        ...
        tn fn

a constructor, a destructor, as well as individual accessors and "updaters" for any of the structure's fields are generated by Smart. Constructing an object of type t amounts to using t.new, which requires a value for each of t's fields. For example, creating a structure value s of type t with values e1, ..., en for each field amounts to calling t.new(s+, e1, ..., en). The values of these fields can all be read with a single predicate call to t.all(s, e1+, ..., en+) (which "destructs" the structure value into its field components). Individual accessors of the form t.fi(s, ei+) are provided as well for any field fi. Finally, the value of a field fi can be set to some variable vi by using t.fi(s+, vi). As all statements in Smart, this call has a functional nature and handles immutable data. Thus, setting the value of the fi field amounts to returning a new structure where all fields have the same value as s, except fi, which is set to vi.
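The functional nature of field updates can be illustrated with an immutable record in Python. This is a hedged analogy rather than Smart code: the class CartPoint is hypothetical, and dataclasses.replace and dataclasses.astuple merely play the roles of the generated updater and destructor.

```python
from dataclasses import astuple, dataclass, replace


@dataclass(frozen=True)
class CartPoint:
    """Immutable structure with two fields, analogous to cart_point."""
    x: float
    y: float


s = CartPoint(1.0, 2.0)        # constructor: one value per field
x, y = astuple(s)              # destructor: reads all fields at once
s2 = replace(s, x=5.0)         # updater: a new structure, identical to s except for x
```

The original value s is left untouched by the update; attempting to assign s.x directly would raise an error because the dataclass is frozen.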

It is possible to define a structured type with no fields at all:


struct unit

The value s of this type can be constructed by using unit.new(s+), without any input. This type can be seen as representing the absence of information.

Variants. Many programs need to deal with heterogeneous collections of values. For example, a node in a binary tree can be either a leaf or an interior node with two children; similarly, a node of an abstract syntax tree in a compiler can represent a variable, an abstraction, an application, etc. Variant types provide the mechanism that supports this kind of heterogeneous value collections (Pierce, 2002).

Variants, also called tagged unions in other programming languages, can be seen as the dual of structures. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag called the constructor.

Revisiting our previously declared types cartesian_point and polar_point, in Smart we can define a type point as being expressed either in Cartesian, in polar or in spherical coordinates, using the following variant declaration:

    type point =
        | Cartesian(cartesian_point p)
        | Polar(polar_point p)
        | Spherical(float r, float theta, float phi)

Each form that a variant can take is indicated by the symbol |, followed by the uppercase tag and the list of parameters and their types. The cases are mutually exclusive, and a value of type point can have only one form at a time. An object of type point can be built by using one of the constructors, called with the appropriate number and types of inputs. For instance, a Cartesian point pc can be obtained by calling point.Cartesian(p, pc+). Given an object pt of type point, we can also distinguish between the different cases by using a construct that is similar to the match ... with construct in OCaml:

    switch (pt)
    case Cartesian(cartesian_point p)
        get_X(p, x+)
    case Polar(polar_point p)
        get_radius(p, r+)
    case Spherical(float r, float theta, float phi)

For verifying if a given point pt is a Cartesian point, we can use:

    point.case[Cartesian](pt)


This could be obtained using the switch construct, but, for practical considerations, the case construct has been additionally provided as a built-in predicate.
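As an illustration of tagged unions outside Smart, the following Python sketch encodes the three forms of point as separate immutable classes and dispatches on the tag. It is an analogy only: the Cartesian and Polar forms are flattened into their coordinates instead of wrapping cartesian_point and polar_point values, and the functions first_coordinate and is_cartesian are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cartesian:
    x: float
    y: float


@dataclass(frozen=True)
class Polar:
    radius: float
    azimuth: float


@dataclass(frozen=True)
class Spherical:
    r: float
    theta: float
    phi: float


# A point takes exactly one of the three forms at a time.
Point = Union[Cartesian, Polar, Spherical]


def first_coordinate(pt: Point) -> float:
    """Dispatch on the tag, in the spirit of the switch construct."""
    if isinstance(pt, Cartesian):
        return pt.x
    if isinstance(pt, Polar):
        return pt.radius
    return pt.r


def is_cartesian(pt: Point) -> bool:
    """Test a single tag, in the spirit of the case construct."""
    return isinstance(pt, Cartesian)
```

The separate is_cartesian helper could be written with the general dispatch, just as the built-in case predicate could be obtained from switch; it exists for convenience.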

3.1.4 Specifications

Smart also supports various types of logical specifications, ranging from axioms and lemmas to pre- and postconditions, invariants and inductives.

In Section 3.1.1, we stated that implicit predicates are a form of assumption, and that declaring implicit Smart types and the predicates manipulating them provides a convenient manner of axiomatizing external implementations, frequently developed in a lower-level language. They can provide implementation-independent descriptions and act as abstractions that hide hardware-related details and low-level implementation decisions. Another form of assumption are hypotheses. Hypotheses are logical results that are assumed, i.e., they constitute axioms, which are supposed to be true. In Smart, hypotheses are specification-only predicates, i.e., they cannot be called in the code. They are introduced by the keyword hypothesis.

For example, we could revisit our polymorphic pair type introduced in Section 3.1.3 and provide a polymorphic axiomatization for it by using implicit predicates and hypotheses that stipulate that the operations fst and snd retrieve the first and second elements of the pair, respectively. These are declared as follows:

    type pair<A, B>

    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

    public fst(pair<A, B> p, A a+)
    implicit program

    public snd(pair<A, B> p, B b+)
    implicit program

    public hypothesis pair_fst(A a, B b)
    program
        pair<A, B> p
        A a2

        new_pair(a, b, p+)
        fst(p, a2+)
        a = a2

    public hypothesis pair_snd(A a, B b)
    program
        pair<A, B> p
        B b2

        new_pair(a, b, p+)
        snd(p, b2+)
        b = b2


Lemmas are another type of specification-only predicates, meant to facilitate proving logical properties. In contrast to hypotheses, lemmas must be proven. A lemma can be introduced with the keyword lemma, and it states that all paths that exit from its body with an undeclared exit label represent impossible execution scenarios.

In Section 3.1.1, we introduced a type cartesian_point, allowing us to express a point by its Cartesian coordinates, and we defined a predicate translate_point for translating a point by a given pair of numerical values (a, b). We revisit our example and implement a predicate that translates a pair of points by a fixed pair of numbers (a, b) that are added to the Cartesian coordinates of each point of the pair. In addition, we consider an implicit predicate euclidean_dist that computes the Euclidean distance d

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

between a pair of points $\langle (x_1, y_1), (x_2, y_2) \rangle$. These are declared as follows:

    type point_pair = pair<cartesian_point, cartesian_point>

    // For a pair of points ((x1, y1), (x2, y2)), compute
    // d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
    public euclidean_dist(point_pair p, float d+)
    -> [true]
    implicit program

    // For a pair of points ((x1, y1), (x2, y2)) and a fixed
    // numerical pair (a, b), compute ((x1', y1'), (x2', y2'))
    // as ((x1 + a, y1 + b), (x2 + a, y2 + b))
    public translate_pair(point_pair p, pair<int, int> t,
                          point_pair o+)
    -> [true]

The translation of a pair of points preserves the Euclidean distance between them: the Euclidean distance of a pair of points p will be equal to the Euclidean distance of the pair of points obtained after a translation. We can express this property by declaring it as a lemma:

    public lemma edist_preserved(pair<int, int> t,
                                 point_pair p)
    program
        point_pair translated
        float d1
        float d2

        euclidean_dist(p, d1+) =>
        translate_pair(p, t, translated+) =>
        euclidean_dist(translated, d2+) => d1 = d2
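The property stated by the lemma can be checked on concrete values with a short Python sketch. This is a test on one example, not a proof, and it is not Smart code: points are modelled as (x, y) tuples, and the Python functions euclidean_dist and translate_pair are illustrative stand-ins for the predicates of the same name.

```python
import math


def euclidean_dist(pair):
    """Distance between the two points of a pair ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = pair
    return math.hypot(x2 - x1, y2 - y1)


def translate_pair(pair, t):
    """Add the fixed offsets (a, b) to the coordinates of both points."""
    a, b = t
    return tuple((x + a, y + b) for (x, y) in pair)


p = ((0.0, 0.0), (3.0, 4.0))
translated = translate_pair(p, (2, -1))
d1 = euclidean_dist(p)
d2 = euclidean_dist(translated)
```

The equality holds because the offsets cancel in the differences: (x2 + a) - (x1 + a) = x2 - x1, and likewise for the Y coordinates.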


Specifying contracts for Smart predicates is also possible by employing pre- and postconditions. A precondition represents a logical property that must be true prior to calling a predicate, and it serves the purpose of letting the callers know when it is safe to call some predicate. Typically, it represents the caller's obligations. In Smart, a precondition can be introduced with the keyword pre and it can be attached to any implicit or explicit predicate. A precondition can refer to the predicate's inputs and it can declare its own local variables. However, it cannot make use of the predicate's outputs.

For instance, for the atan2 predicate discussed in Section 3.1.2, we could indicate that the predicate should never be called with the coordinates (0, 0) of the origin by adding the following precondition:

    public const float ZERO

    public atan2(float x, float y, float at+) -> [true]
    pre
        x != ZERO || y != ZERO
    implicit program

A postcondition represents a logical condition that must be true after executing a predicate. Its purpose is to indicate to the callers of a predicate what they are entitled to expect with respect to the outputs produced by the predicate. In Smart, postconditions are introduced with the keyword post and they can be attached to any implicit or explicit (computational) predicate, on a subset or all of the predicate's output labels. They can refer to the predicate's inputs and the outputs associated to the label considered in the postcondition. Additionally, they can declare their own local variables.

For instance, a predicate equal_points, verifying if two points are equal and having four possible exit labels, eq_points, eq_x, eq_y and false, respectively, could declare postconditions as follows:

    public equal_points(cartesian_point p, cartesian_point q)
    -> [eq_points, eq_x, eq_y, false]
    program
        float px
        float qx
        float py
        float qy

        cartesian_point.x(p, px+)
        cartesian_point.x(q, qx+)
        cartesian_point.y(p, py+)
        cartesian_point.y(q, qy+)
        if px = qx
        then
            [true -> eq_points, false -> eq_x] py = qy
        else
            [true -> eq_y] py = qy

    post eq_points
        p = q

    post eq_x
        float x1
        float x2
        cartesian_point.equals[x](p, q)

    post
        p != q

The first postcondition applies to the exit label eq_points, the second to the label eq_x, and the last one applies to labels eq_y and false.

In Smart, mathematical relations can be represented by introducing inductives or schemes. These predicates have no outputs, but they always have true and false as their exit labels. Inductive predicates are the only part of the language that cannot be transformed into executable code; however, they can be used to facilitate the proofs. Predicates introduced with the inductive keyword represent the least fixed point of their cases, introduced with the keyword case and a user-defined name. Each case can introduce existentially quantified variables. In particular, in the absence of recursion, inductive predicates represent a parallel disjunction of cases. An inductive predicate will exit with the label true if any of its declared cases holds.

For example, we could specify membership for an implicit array type using an inductive named contains, having a single case with the user-defined name ElemAt, which introduces an existentially quantified variable idx:

type array<A>

public get_size(array<A> arr, int s+)
implicit program

public get_elem(array<A> arr, int i, A ai+) -> [true, oob]
implicit program

// Membership defined with an inductive and an existential
public contains(array<A> arr, A a) -> [true, false]
inductive
// An array contains an element if there exists a valid
// index where this element is to be found
case ElemAt (int idx)
{{ A b; }}
{
  [oob: false] get_elem(arr, idx, b+) && b = a;
}
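Although inductive predicates are not executable in Smart, the existential reading of the ElemAt case can be approximated by a bounded search. The following Python sketch is an illustration under assumptions: arrays are modeled as Python sequences indexed from 0, and exit labels as strings.

```python
from typing import Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")


def get_elem(arr: Sequence[A], i: int) -> Tuple[str, Optional[A]]:
    """Model of get_elem: label 'true' with the element, or 'oob' with no output."""
    if 0 <= i < len(arr):
        return "true", arr[i]
    return "oob", None


def contains(arr: Sequence[A], a: A) -> bool:
    """Case ElemAt: there exists an index idx whose element equals a."""
    for idx in range(len(arr)):
        label, b = get_elem(arr, idx)
        # The label oob is mapped to false, so only valid indices can witness the case.
        if label == "true" and b == a:
            return True
    return False


print(contains([3, 5, 7], 5), contains([3, 5, 7], 4))
```

Restricting the search to valid indices loses nothing here, because every out-of-bounds index makes the case fail.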

Schemes, on the other hand, represent a conjunction of cases. Cases are introduced with the keyword with, followed by a user-defined name, and each of them can introduce universally quantified variables. A scheme will return the label true only if all of its declared cases hold.

Using a scheme with two cases, Size and Forall, as shown below, we can define the pointwise equality of arrays. The first case, Size, verifies that the two arrays have the same length by introducing two universally quantified variables n and m. The Forall case verifies that, for any index i, the arrays contain equal elements. Two arrays are equal pointwise if and only if they are of the same size and, at any given index i, the arrays have the same element.

public equals_pointwise(array<A> arr1, array<A> arr2)
-> [true, false]
// Extensional equality of arrays [arr1] and [arr2]
scheme
// They must be of the same size
with Size
{{ int n; int m; }}
{
  get_size(arr1, n+) => get_size(arr2, m+) => n = m;
}
// If they exist, elements at the same index must be equal
with Forall (int i)
{{ A a; A b; }}
{
  get_elem(arr1, i, a+) => get_elem(arr2, i, b+) => a = b;
}
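Dually to the inductive case, the conjunction of universally quantified cases in a scheme can be illustrated with a small Python sketch. Arrays are again assumed to be Python sequences indexed from 0; the implication chain of the Forall case only constrains indices that are valid in both arrays.

```python
from typing import Sequence, TypeVar

A = TypeVar("A")


def equals_pointwise(arr1: Sequence[A], arr2: Sequence[A]) -> bool:
    """Scheme with two cases; the result is true only if both cases hold."""
    # Case Size: whenever both sizes are defined, they must be equal.
    size_case = len(arr1) == len(arr2)
    # Case Forall: for every index valid in both arrays, the elements are equal.
    common = range(min(len(arr1), len(arr2)))
    forall_case = all(arr1[i] == arr2[i] for i in common)
    return size_case and forall_case


print(equals_pointwise([1, 2], [1, 2]), equals_pointwise([1, 2], [1, 2, 3]))
```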

Loop invariants are supported as well. These can be introduced in various ways, for instance by declaring them with the keyword invariant or by declaring them as inductives.

3.1.5 Illustrating Smart – An Abstract Process Manager

To illustrate the Smart language and its capabilities, we consider an abstract process manager and its fundamental components: process and thread. We define the data structures corresponding to threads and processes, implement the predicates corresponding to a simple thread switch, and specify some fundamental properties for processes.

[Figure: a process with a single thread (stack, register and counter) and a process with n threads (Thread 1 to Thread n, each with its own stack, counter and register); in both cases the data, files and code belong to the process.]

The implementation of threads and processes differs depending on the operating system, but frequently a thread is a component of a process that belongs to exactly one process, outside which it cannot exist. Each thread represents a separate flow of control. Multiple threads can be associated with one process; they execute concurrently and provide a mechanism to improve application performance through parallelism. In a nutshell, threads represent a software approach to improving the performance of operating systems by reducing the overhead of process switching.

A thread is a flow of execution through the process code, having its own program counter that keeps track of which instruction to execute next, as well as system registers which hold its current working variables, and a stack which contains the execution history. Every thread is uniquely identified by a thread identifier. Peer threads share some information, such as the code and data segments. When one thread alters a code memory item, all other threads see the change.

Figure 3.1 – Possible Transitions between Thread States (states shown: Ready, Running, Blocked)

We define a thread type as a structure consisting of multiple fields, such as the thread's identifier, its current state and the memory region for its stack.

type memory_region = {
  // Start address
  int start;
  // Region length
  int length;
}

type state =
| Ready
| Running
| Blocked

type thread = {
  // Identifier
  int id;
  // Current state
  state crt_state;
  // Stack
  memory_region stack;
}

The thread's stack is identified by its start address and its length. The state of a thread is defined as a variant having three alternatives: Running (the thread is currently executing), Ready (the thread is currently awaiting execution and could potentially be started) and Blocked (the thread has exhausted its allocated time or is waiting for an event to occur; it must be unblocked before being able to execute). The possible transitions between states are shown in Figure 3.1. A thread's current state determines the valid transitions.

Similarly, a process is defined as a structure consisting of an internal identifier, an identifier for the thread that is currently executing, an address space, and an array of possibly inactive threads associated with it. Whether a thread in the thread array is active or has terminated is indicated by a variant of type option. An inactive thread, indicated by None, is a thread that terminated its execution and whose slot in the array of associated threads has not been reallocated. In contrast, a blocked thread, indicated by Some, is a thread that cannot execute currently but should execute in the future, once the resources it is waiting for are freed. We consider a segmented address space, with addresses existing not in a single linear range but instead in multiple segments, corresponding to the code, the data and the stack, respectively.

type option<A> =
| None
| Some (A a)

type address_space = {
  memory_region code;
  memory_region data;
  memory_region stack;
}

type process = {
  // Array of associated threads
  array<option<thread>> threads;
  // Internal id
  int pid;
  // Currently running thread
  int crt_thread;
  // Address space
  address_space adr_space;
}

Next, we consider a simple predicate called stop_thread, having two possible execution scenarios, as indicated by its two exit labels: true and invalid. When the given input index i corresponds to an active thread, the predicate executes successfully, thus exiting with true. In this case, the state of the i-th thread associated with the input process is set to Blocked, and the new state of the process is returned in the output out. Otherwise, when the given index i corresponds to a thread that is Ready, or when there is no active thread at that index, the predicate exits with the label invalid and no output is generated.

public stop_thread(process in, int i, process out+)
-> [true, invalid]
program
{{ array<option<thread>> ta; state s; thread ti; option<thread> tio; }}
{
  // Copy in to out
  out := in;
  // Fetch in.threads and copy it to ta
  process.threads(in, ta+);
  // Get the array's i-th element
  [oob: invalid] get_elem(ta, i, tio+);
  // Check if the i-th element is active
  switch (tio)
  case Some (thread th): ti := th;
  case None: raise invalid;
  // Get the thread's current state
  thread.crt_state(ti, s+);
  // Check whether the transition is valid
  [false: invalid] state.case[Running](s);
  // Create the new state for the running thread
  state.Blocked(s+);
  // Set the newly created state
  thread.crt_state(ti+, s);
  // Reset tio to the thread with the modified state
  option.Some(tio+, ti);
  // Reset the i-th thread and return the new state ta
  [oob: invalid] set_ei(ta, i, tio, ta+);
  // Update out.threads to ta
  process.threads(out+, ta);
}

Another auxiliary predicate, called start_thread, when given a valid index of an unblocked thread, sets the state of the i-th thread to Running. It is implemented similarly, as shown below.

public start_thread(process in, int i, process out+)
-> [true, invalid]
program
{{ array<option<thread>> ta; state s; thread ti; option<thread> tio; }}
{
  // Copy in to out
  out := in;
  // Fetch in.threads and copy it to ta
  process.threads(in, ta+);
  // Get the array's i-th element
  [oob: invalid] get_ei(ta, i, tio+);
  // Check if the i-th thread is active
  switch (tio)
  case Some (thread th): ti := th;
  case None: raise invalid;
  thread.crt_state(ti, s+);
  // Check whether the transition is valid
  [false: invalid] state.case[Ready](s);
  // Create the new state for the running thread
  state.Running(s+);
  // Set the newly created state
  thread.crt_state(ti+, s);
  // Reset tio to the thread with the modified state
  option.Some(tio+, ti);
  // Set the i-th element and return the new state ta
  [oob: invalid] set_ei(ta, i, tio, ta+);
  // Update out.threads to ta
  process.threads(out+, ta);
}

These two predicates will be called by the predicate run_thread, which performs a simple thread switch. It stops the thread currently executing, indicated by crt_thread, and starts the one with the given index i. The new state of the process is returned in the output out.

public run_thread(process in, int i, process out+)
-> [true, inval]
program
{{ int crt; }}
{
  process.crt_thread(in, crt+);
  [true: true, invalid: inval] stop_thread(in, crt, out+);
  [true: true, invalid: inval] start_thread(out, i, out+);
  // Record i as the currently running thread
  process.crt_thread(out+, i);
}
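The functional behavior of the three predicates can be mirrored in a short Python sketch. It is a simplified model under assumptions: threads carry only an identifier and a state, option is modeled with None, exit labels are strings returned together with the optional output, and the helper _switch_state is hypothetical. Immutable dataclasses play the role of Smart's immutable structures, so the input process is never modified.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    READY = "Ready"
    RUNNING = "Running"
    BLOCKED = "Blocked"


@dataclass(frozen=True)
class Thread:
    id: int
    crt_state: State


@dataclass(frozen=True)
class Process:
    threads: Tuple[Optional[Thread], ...]
    crt_thread: int


def _switch_state(p: Process, i: int, expected: State, new: State):
    """Shared body of stop_thread and start_thread (hypothetical helper)."""
    if not 0 <= i < len(p.threads):
        return "invalid", None              # oob is mapped to invalid
    t = p.threads[i]
    if t is None or t.crt_state is not expected:
        return "invalid", None              # inactive thread or invalid transition
    threads = p.threads[:i] + (replace(t, crt_state=new),) + p.threads[i + 1:]
    return "true", replace(p, threads=threads)


def stop_thread(p: Process, i: int):
    return _switch_state(p, i, State.RUNNING, State.BLOCKED)


def start_thread(p: Process, i: int):
    return _switch_state(p, i, State.READY, State.RUNNING)


def run_thread(p: Process, i: int):
    """Stop the current thread, start thread i, and record i as current."""
    label, out = stop_thread(p, p.crt_thread)
    if label != "true":
        return "inval", None
    label, out = start_thread(out, i)
    if label != "true":
        return "inval", None
    return "true", replace(out, crt_thread=i)


p = Process((Thread(0, State.RUNNING), Thread(1, State.READY), None), crt_thread=0)
print(run_thread(p, 1))
```

Because every update builds a new value, the original process p remains available after the call, which is the property the C code generator later has to reason about when introducing in-place updates.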

Next, we introduce a fundamental property for any valid process state, namely the fact that the stack regions of all its associated threads are completely disjoint.

public not_disjoint(process p) -> [true, false]
inductive
case StacksJoint (int i, int j)
{{ thread ti; thread tj; memory_region sti; memory_region stj; }}
{
  i != j;
  [None: false] thread(p, i, ti+);
  [None: false] thread(p, j, tj+);
  thread.stack(ti, sti+);
  thread.stack(tj, stj+);
  overlap(sti, stj);
}
case CodeStackJoint (int i)
{{ thread ti; memory_region sti; address_space as; memory_region code; }}
{
  [None: false] thread(p, i, ti+);
  thread.stack(ti, sti+);
  process.adr_space(p, as+);
  address_space.code(as, code+);
  overlap(sti, code);
}
case DataStackJoint (int i)
{{ thread ti; memory_region sti; address_space as; memory_region data; }}
{
  [None: false] thread(p, i, ti+);
  thread.stack(ti, sti+);
  process.adr_space(p, as+);
  address_space.data(as, data+);
  overlap(sti, data);
}

public disjoint_stacks(process p) -> [true, false]
program
{
  !not_disjoint(p);
}

This property is expressed using an inductive predicate that characterizes the potential situations in which the memory isolation of the different associated threads of a process can be broken. The natural manner of expressing such a property in Smart is by using a scheme, as presented in Section 3.1.4; here we use an inductive predicate because the language we are working with, which will be presented in Chapter 4, does not support schemes. In our inductive predicate, the first case, StacksJoint, checks whether there exist two different threads having overlapping stacks. The next two cases, CodeStackJoint and DataStackJoint, check whether there exists a thread whose stack overlaps the process' code segment or data segment, respectively. This uses an auxiliary predicate verifying whether two memory regions overlap, i.e., whether there exists an address that is contained simultaneously by two different segments. This operation is symmetric; we express this property with the lemma overlap_sym.

public contains(memory_region m, int address)
-> [true, false]
implicit program

public overlap(memory_region m1, memory_region m2)
-> [true, false]
inductive
case InBoth (int address)
{
  contains(m1, address) && contains(m2, address);
}

public lemma overlap_sym(memory_region m1, memory_region m2)
-> [true, false]
program
{
  overlap(m1, m2) => overlap(m2, m1);
}
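A possible executable reading of overlap and of the symmetry lemma is sketched below in Python. Since contains is left implicit in the model, its definition here is an assumption: a region is taken to cover the half-open address range from start to start + length. The function overlap_by_witness is a hypothetical helper that follows the InBoth case literally by searching for a shared address.

```python
from typing import NamedTuple


class MemoryRegion(NamedTuple):
    start: int
    length: int


def contains(m: MemoryRegion, address: int) -> bool:
    """Assumed meaning of the implicit predicate: start <= address < start + length."""
    return m.start <= address < m.start + m.length


def overlap(m1: MemoryRegion, m2: MemoryRegion) -> bool:
    """Closed-form test: the intersection of the two ranges is non-empty."""
    low = max(m1.start, m2.start)
    high = min(m1.start + m1.length, m2.start + m2.length)
    return low < high


def overlap_by_witness(m1: MemoryRegion, m2: MemoryRegion) -> bool:
    """Case InBoth: search for an address contained in both regions."""
    return any(contains(m2, a) for a in range(m1.start, m1.start + m1.length))


a, b = MemoryRegion(0, 4), MemoryRegion(3, 2)
print(overlap(a, b), overlap(b, a), overlap_by_witness(a, b))
```

The symmetry stated by overlap_sym is visible in the closed form, since max and min are both commutative.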

3.2 ProvenTools

ProvenTools is a comprehensive set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain has the structure of a set of Eclipse plug-ins of JDT type (Java Development Tools). Together, these constitute a complete Integrated Development Environment (IDE), allowing one not only to write, edit and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover, and finally to generate executable code in C or Java.

The plug-ins are based on Xtext (Xtext Documentation), an official Eclipse plug-in dedicated to the creation of DSLs (Domain Specific Languages) in Eclipse. Xtext-based DSLs are described in an EBNF (Extended Backus-Naur Form) grammar language. Fully statically typed expressions can be embedded in the developed DSL, and Java-style scoping and linking are supported.

[Diagram: the front-end takes Smart code and specifications and produces Smil; the back-end comprises the prover, which handles proof obligations and proofs, and the code generators, which produce C code and Java code.]

Figure 3.2 – The ProvenTools Toolchain

Concretely, the toolchain includes a compiler whose front-end contains the plug-in in charge of Smart, as well as the plug-in dedicated to Smil, the Smart Intermediate Language, to which Smart programs and specifications are translated. Smil is a simpler form of the Smart language. Though roughly equivalent to Smart, Smil has a rather different form, manipulating less complex structures and having no syntactic sugar. Harder to understand for a human reader, Smil is meant to be easily manipulated by the back-end of the toolchain. The back-end currently offers a C code generator and an interactive prover. An overview of this architecture is shown in Figure 3.2.

While employing ProvenTools, the code undergoes various compilation steps and transformations. During the compilation chain, the Smart code is transformed into a Smart AST (Abstract Syntax Tree). The obtained AST is then compiled to a Smil AST. Following this, the Smil AST is transformed into Smil source code, which is then reinserted into the compilation chain by the plug-in in charge of it.

After finishing the whole compilation chain and obtaining the Smil AST and the associated Smil source code, the back-end of the compiler can be employed. The back-end comprises a source code generator and a prover. The generator transforms Smart models into their equivalents in C.

Figure 3.3 – Smart Editor

Smart Editor. The Smart editor provides facilities to edit Smart code and supports broad and complex features, such as syntax highlighting, facilities for code navigation and visualization, and editing assistants, including word completion and quick fixes. A snapshot of it is shown in Figure 3.3.

Prover. ProvenTools provides users with a dedicated view for interacting with the prover. This presents the existing proof obligations and provides facilities to solve them. Proof obligations are generated for any logical lemma, precondition, postcondition or invariant included in the Smart models. Additionally, any label that remains unhandled in the code triggers the generation of a proof obligation, thus enforcing that each possible exit label of a predicate is either explicitly handled or proven to be impossible.

An automatic prover, trying various proof search procedures, is called whenever a proof obligation is generated. It uses previously proven obligations or existing hypotheses for discharging new obligations automatically. Unproved obligations can be solved by interactively employing manual tactics, called hints, which are provided in the IDE. Hints that are considered useless with respect to the currently selected proof obligations are automatically disabled. Additionally, users can define strategies, i.e., proof patterns, and employ an interactive proof assistant that applies them automatically in the background. This will suggest a possible proof as soon as it finds one. Proofs thus found are rechecked as if they had been done manually.

ProvenTools offers facilities to inspect any manual or automatic proof step, thus making a later review of the proofs possible. The toolchain also provides a dedicated system for assisting the user in adapting former proofs to new changes due to code maintenance or evolution.

C Code Generator. The executable part of Smart models is translated to executable C code by the C code generator. To this end, the executable parts of the Smart models are identified and extracted, while the logical parts are discarded. Users can guide this process through annotations, and they can specify that particular values are purely logical. Functional implementations are transformed into imperative ones: the dedicated C code generation plug-in tries to replace functional modifications of structures in the models by in-place updates. Such transformations are correct only if the different values are handled linearly in the Smart code, i.e., if no previous value is read after applying a functional update on it. To ensure the safety of functional-to-imperative code transformations, the C generation plug-in employs various global static analyses. When safety cannot be guaranteed, the generator reports errors, or introduces copies if the users deem it acceptable.

In earlier experiments (Lescuyer 2015), the Prove & Run team was able to generate C code for a complete model of ProvenCore that did not require dynamic allocation and ran at a speed comparable to the original C code.

3.3 Smil

Smil is an intermediate language to which Smart models are compiled. Similarly to Smart, Smil is a functional language with algebraic data types (structures and variants). However, unlike Smart, Smil is not a user-oriented language, i.e., it was not designed for writing programs in it directly, but rather to provide a representation of Smart programs at a different level of abstraction. Thus, reading Smil code is a rather cumbersome task, as it is a language without syntactic sugar, meant to serve as a starting point for the main components of the ProvenTools back-end exploiting Smart models: the prover and the code generator.

To give an idea of Smil's syntax, we illustrate below the types thread and process, as well as the stop_thread predicate from our abstract process manager example given in Section 3.1.5.

public type state =
| Ready
| Running
| Blocked

public type thread = {
  id: int;
  crt_state: state;
  stack: memory_region;
}

public state_acopy_ahypothesis(state state_1) -> [true]
hypothesis
{{ state state_2; }}
{
  [<1>] state.switch(state_1)
        -> [Ready -> 5, Running -> 4, Blocked -> 3]
  [<2>] ==<state>(state_1, state_2)
        -> [true -> true, false -> error]
  [<3>] state.Blocked(state_2)
        -> [true -> 2]
  [<4>] state.Running(state_2)
        -> [true -> 2]
  [<5>] state.Ready(state_2)
        -> [true -> 2]
}

public thread_ahypothesis(thread x1) -> [true]
hypothesis
{{ thread x2; int zid; state zcrt_state; memory_region zstack; }}
{
  [<1>] thread.all(x1, zid, zcrt_state, zstack)
        -> [true -> 2]
  [<2>] thread.new(x2, zid, zcrt_state, zstack)
        -> [true -> 3]
  [<3>] ==<thread>(x1, x2)
        -> [true -> true, false -> error]
}

The type declarations in Smil strongly resemble their Smart counterparts. Predicate declarations mirror the form found in Smart as well, except that in Smil any output variable associated with the true exit label is explicitly declared as such. Preconditions and postconditions are appended to any predicate and, as shown above, a hypothesis is added for any explicitly declared type.

The real syntax differences are visible in predicate implementations: every statement is preceded by a numerical label, and every possible exit label lbl of the statement indicates another numerical label. The latter numerical label actually designates the statement that will be executed next if the current statement exits with label lbl. In particular, this mechanism replaces the try ... catch and the conditional control constructs, as well as the logical operators and any other construct based on label transformers described in Section 3.1.2. Thus, the predicate bodies are very similar in form to a control flow graph, where the statements represent the nodes of the graph and the exit labels represent transitions.

public stop_thread(process in, int i, process out+)
-> [true<out>, invalid]
pre [<0>] true() -> [true -> true]
{{ array<option<thread>> ta; state s; thread ti; option<thread> tio; thread th; }}
{
  [<1>] :=<process>(out, in)
        -> [true -> 2]
  [<2>] process.threads(in, ta)
        -> [true -> 3]
  [<3>] get<option<thread>>(ta, i, tio)
        -> [true -> 4, oob -> invalid]
  [<4>] option.switch<thread>(tio, th)
        -> [None -> 6, Some -> 7]
  [<5>] state.Blocked(s)
        -> [true -> 8]
  [<6>] true()
        -> [true -> invalid]
  [<7>] :=<thread>(ti, th)
        -> [true -> 5]
  [<8>] thread.crt_state+(ti, ti, s)
        -> [true -> 9]
  [<9>] option.Some<thread>(tio, ti)
        -> [true -> 10]
  [<10>] set<option<thread>>(ta, i, tio, ta)
        -> [true -> 11, oob -> invalid]
  [<11>] set<option<thread>>(ta, i, tio, ta)
        -> [true -> 12, oob -> invalid]
  [<12>] process.threads+(out, out, ta)
        -> [true -> true]
}
post true 0
post invalid 0

In a nutshell, Smil constitutes a representative, albeit restricted, set of constructs, and it is a language designed to be well-suited for further transformations and analyses.

The next chapter focuses entirely on αSmil, the computational version of Smil with which we are working throughout the rest of this thesis. We will illustrate its usage and describe its abstract syntax and formal semantics.


Chapter 4

The αSmil Language

One day I will find the right words, and they will be simple.

Jack Kerouac

In this chapter we define the syntax and the semantics of αSmil, the language that we consider in this thesis. This is a computational version of Smil (presented in Section 3.3), which is essentially a subset of Smart, presented in Chapter 3. However, it contains a few additional elements introduced for the purpose of this thesis.

The αSmil language is a first-order, purely functional and strongly-typed language with arrays and algebraic data types, i.e., structures and variants. It is an intermediate, analysis-oriented language.

4.1 αSmil Syntax

The αSmil language is minimal in the sense that it contains only those constructs that are needed for the purpose of this thesis. For instance, unlike Smart and Smil, the language does not contain visibility modifiers, because these modifiers play no role in the techniques presented in the sequel. During the introduction of the grammar, we will point out the most important deviations from Smart and Smil.

Programs. A program in αSmil consists of a number of type and constant declarations and definitions, followed by a collection of predicates. In contrast to Smart and Smil, type and predicate declarations have no visibility modifiers (such as public) and they are not organized into modules. The absence of visibility modifiers is a natural consequence of the disappearance of modules. We assume that there is one module in which every type, constant and predicate declaration resides, and that these are mutually visible to each other. These restrictions are made for the sake of simplicity, since the techniques proposed in this thesis are orthogonal to the concepts of visibility and modules.

Constants are declared using the keyword const, followed by the type and the constant identifier. Constant identifiers are written in upper-case letters and are preceded by a special symbol.

Types are declared using the keyword type, followed by the type identifier and optionally, in the case of polymorphic type declarations, by a number of type parameters given in upper-case letters between < and >. In the case of implicit types, this constitutes the complete type declaration. Explicit type declarations continue with the symbol = and the type's definition. Throughout the rest of this chapter and the presentation of our static analyses, we will ignore polymorphism. The abstract types of our analyses are not polymorphic, and the impact of polymorphism is visible only at the implementation level, for type substitutions that will be discussed in Chapter 8.

Types. Similarly to Smart, algebraic data types (i.e., structures and variants) and associative arrays are supported. We let $\mathcal{T}$ be the universe of type identifiers and $\mathcal{T}_0 \subset \mathcal{T}$ the set of base type identifiers. We assume a set of identifiers for structure fields and variant constructors, denoted by $\mathcal{F}$ and $\mathcal{C}$, respectively.

A structure represents the Cartesian product of the different types of its elements, called fields. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag called the constructor. Arrays group elements of data of the same type (given in angle brackets) into a single entity; elements are selected by an index whose type is included (as denoted by the superscript) in the array's definition as well.

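Definition 4.1.1. Types $\tau \in \mathcal{T}$ in αSmil.

$$
\begin{array}{rcll}
\tau \in \mathcal{T},\quad \tau & ::= & \tau_0 \in \mathcal{T}_0 & \text{base types} \\
 & \mid & \mathsf{struct}\{f_1 : \tau, \ldots, f_n : \tau\},\quad f_i \in \mathcal{F},\ 0 \le n & \text{structures} \\
 & \mid & \mathsf{variant}[C_1 : \tau \mid \ldots \mid C_m : \tau],\quad C_i \in \mathcal{C},\ 1 \le m & \text{variants} \\
 & \mid & \mathsf{arr}^{\tau}\langle\tau\rangle & \text{arrays}
\end{array}
$$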

Variants and structures can be used together to model traditional algebraic variants with zero or several parameters. For instance, a generic type option<T> is actually modeled as:

variant[Some : struct{t : T} | None : struct{}]

Concretely, structures are declared and defined by indicating a set of pairs of field identifiers and their corresponding types between { and }. Declaring structures with no fields is possible. Variants are declared and defined by indicating the list of their constructors, each starting with an upper-case letter and preceded by the symbol |. Unlike structures, variants must have at least one declared constructor. For instance, the state and thread types from our Abstract Process Manager example, given in Smart in Section 3.1.5, have the following Smil declaration:

type state =
| Ready
| Running
| Blocked

type thread = {
  id: int;
  crt_state: state;
  stack: memory_region;
}

In contrast to Smart, in structure declarations the field name precedes the field type.

Predicates. Predicates are declared using the keyword predicate, which is specific to αSmil, followed by a predicate identifier and a signature. A signature is given by a sequence of input types and a non-empty finite mapping of exit labels $\lambda \in \mathcal{L} \setminus \{\mathit{error}\}$ to sequences of typed outputs. The set of exit labels $\mathcal{L}$ contains three distinguished elements: true, false and error. The latter cannot appear in predicate signatures; it is used as a sink node in control flow graphs, which will be presented in Section 4.2. We write signatures in the following manner:

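$$
\sigma = \underbrace{(x_1 : \tau_1, \ldots, x_n : \tau_n)}_{\text{input identifiers : types}}\ \underbrace{[\lambda_1 : (\tau_{11}\ y_{11}, \ldots, \tau_{1k_1}\ y_{1k_1}) \mid \ldots \mid \overbrace{\lambda_p : (\tau_{p1}\ y_{p1}, \ldots, \tau_{pk_p}\ y_{pk_p})}^{\text{label : (output types, identifiers)}}]}_{p \text{ possible exit labels}}
$$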

We denote by $\Sigma$ the mapping between predicate identifiers and their signatures.

The predicate declaration is followed by the predicate's body. Depending on its body's nature, a predicate will be implicit, explicit or inductive. Smart implicit and explicit predicates have been presented in Section 3.1.1 of our previous chapter, while inductive predicates have been illustrated in Section 3.1.4. For implicit predicates, the body consists solely of the keyword implicit. For explicit predicates, an optional declaration unit can follow. This is a finite mapping from variables to types, and it must be given between double curly braces, i.e., {{ type_id v_identifier }}. Input and output parameters must be different from all the variables appearing in the declaration units. Declaration units are followed by a sequence of statements representing calls to predicates.

Just as presented in Section 3.1.4 for Smart, an inductive predicate is syntactically distinguished by the keyword inductive, followed by its different cases, each declared with the keyword case followed by an identifier, an optional list of existentially quantified variables and a body of statements.

A generic call to a predicate $p$ is of the form:

$$p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]$$

The predicate $p$ is called with inputs $e_1, \ldots, e_n$ and yields one of the declared exit labels $\lambda_1, \ldots, \lambda_m$, each having its own set of associated output variables $o_1, \ldots, o_m$, respectively. We denote by $o$ a sequence of 0 or more output variables.

Statements. The αSmil language supports the statements presented in Table 4.2. These represent calls to built-in predicates and can be seen as special cases of the predicate call presented above. All statements have a functional nature and handle immutable data. A statement consists of as many variables as there are input types in the signature $\sigma_p$ of the called built-in predicate $p$, and a mapping associating with each exit label of $\sigma_p$ a sequence of variables, one variable for each output type in the corresponding sequence.

$$
\begin{array}{rcll}
s & ::= & o := e & (1)\ \text{assignment} \\
 & \mid & e_1 = e_2 & (2)\ \text{equality test} \\
 & \mid & \mathsf{nop} & (3)\ \text{no operation} \\
 & \mid & r := \{e_1, \ldots, e_n\} & (4)\ \text{create structure} \\
 & \mid & \{o_1, \ldots, o_n\} := r & (5)\ \text{destructure structure} \\
 & \mid & o := r.f_i & (6)\ \text{access field} \\
 & \mid & r' := \{r\ \mathsf{with}\ f_i = e\} & (7)\ \text{update field} \\
 & \mid & r' =_{\langle f_1, \ldots, f_k \rangle} r'' & (8)\ \text{check (partial) structure equality} \\
 & \mid & v := C_p[e] & (9)\ \text{create variant} \\
 & \mid & \mathsf{switch}(v)\ \mathsf{as}\ [o_1 \mid \ldots \mid o_n] & (10)\ \text{destructure variant} \\
 & \mid & v \in \{C_1, \ldots, C_k\} & (11)\ \text{variant possible} \\
 & \mid & o := a[i] & (12)\ \text{array access} \\
 & \mid & a' := [a\ \mathsf{with}\ i = e] & (13)\ \text{array update} \\
 & \mid & p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] & (14)\ \text{predicate call}
\end{array}
$$

Table 4.2 – αSmil – Set of Supported Statements

The first three statements are generic and can be applied to any type. Statement (1) is a call to the built-in assignment predicate, denoted by :=, present in an identical form in Smart as well. Statement (2) is a call to the logical operator =, verifying whether its two input arguments are equal. Statement (3) is the αSmil equivalent of a no-operation. As a general convention for the statements' notation, we denote by e the identifiers of entry variables and by o the identifiers of output variables.

Statements (4) – (8) are structure-related. The first of them, statement (4), is the constructor of a structure r of type rtype having n fields. It corresponds to the statement rtype.new(r+, e_1, ..., e_n) in Smart. Statement (5) returns the values of all the fields of r into the output parameters o_1, ..., o_n, and it is the equivalent of rtype.all(r, o_1+, ..., o_n+) in Smart. Statement (6) is the individual accessor of a field f_i and corresponds to rtype.f_i(r, e_i+) in Smart. As previously mentioned, our language is purely functional and handles only immutable algebraic data structures and arrays. Therefore, setting the field f_i of a structure, shown in (7) and being the equivalent of rtype.f_i(r'+, e_i), returns a new structure where all fields have the same value as in r, except f_i, which is set to e_i. Statement (8) verifies whether the values of the indicated subset of fields of two structures r' and r'' are equal. It exists in Smart as well, where it has a similar syntax: rtype.equals[f,g](r', r'') for checking that the values of fields f and g of the two structures are equal, or the dual rtype.equals-[f,g](r', r'') for checking that the values of all fields except f and g are equal.
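The functional nature of statements (7) and (8) can be illustrated with immutable Python dataclasses. This is a sketch under assumptions, not αSmil itself: the helper equals_on is hypothetical, states are plain strings and the stack is reduced to an integer address.

```python
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass(frozen=True)
class Thread:
    id: int
    crt_state: str
    stack: int


def equals_on(r1: Thread, r2: Thread, fields: Iterable[str]) -> bool:
    """Statement (8): compare two structures on the listed fields only."""
    return all(getattr(r1, f) == getattr(r2, f) for f in fields)


r = Thread(id=1, crt_state="Ready", stack=0x1000)
# Statement (7): the update yields a new structure; r itself is unchanged.
r_updated = replace(r, crt_state="Running")

print(r.crt_state, r_updated.crt_state)
print(equals_on(r, r_updated, ["id", "stack"]))
```

After a field update, the old and the new structure agree on every field except the updated one, which is exactly what a partial equality on the remaining fields expresses.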

The next group of statements is variant-related. The first of them, statement (9), creates a new variant v of type vtype using the constructor C_p with e as an argument. It corresponds to vtype.Cp(v+, e) in Smart. Statement (10) is used for matching on the different constructors of the input variant v and corresponds to switch(v) case ... in Smart. The last statement of this group, statement (11), verifies whether the given variant was created with one of the constructors in C_1, ..., C_k. This could be obtained with a variant switch, but for practical considerations it has been provided as a built-in predicate. Its counterpart in Smart is vtype.case[C1, ..., Ck](v).

Statements (12) and (13) are array-related. Statement (12) returns the value of the i-th cell of the input array a. Similarly to (7), updating the i-th cell of an array, shown in (13), has a functional nature. It returns a new array where all cells have the same values as in a, except the i-th cell, which is set to e. These statements are specific to αSmil.
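A minimal Python sketch of statements (12) and (13) follows. It assumes integer indices starting at 0 and models arrays as tuples; the function names array_get and array_set are hypothetical. Each function returns an exit label together with its optional output, with false playing the role of the out-of-bounds outcome.

```python
from typing import Optional, Tuple, TypeVar

A = TypeVar("A")


def array_get(a: Tuple[A, ...], i: int) -> Tuple[str, Optional[A]]:
    """Statement (12): o := a[i]."""
    if 0 <= i < len(a):
        return "true", a[i]
    return "false", None


def array_set(a: Tuple[A, ...], i: int, e: A) -> Tuple[str, Optional[Tuple[A, ...]]]:
    """Statement (13): a' := [a with i = e]; the input array is left untouched."""
    if 0 <= i < len(a):
        return "true", a[:i] + (e,) + a[i + 1:]
    return "false", None


a = (10, 20, 30)
print(array_set(a, 1, 99), a)
print(array_get(a, 3))
```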

Statement (14) is a generic call to a predicate p and has been presented above.

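Exit Labels. All of the built-in supported statements have an associated set of exit labels $\lambda \in \mathcal{L} \setminus \{\mathit{error}\}$. These are indicated in Table 4.3. There are two distinguished exit labels, true and false, respectively. An additional built-in label, called error, is used as a sink node in control flow graphs. It cannot be used as an exit label for a predicate.

Table 4.3 – Statements and their Exit Labels

$$
\begin{array}{lll}
\textbf{Statement} & & \textbf{Exit Labels} \\
o := e & (1) & [\mathit{true} \mapsto o] \\
e_1 = e_2 & (2) & [\mathit{true} \mapsto \emptyset,\ \mathit{false} \mapsto \emptyset] \\
\mathsf{nop} & (3) & [\mathit{true} \mapsto \emptyset] \\
r := \{e_1, \ldots, e_n\} & (4) & [\mathit{true} \mapsto r] \\
\{o_1, \ldots, o_n\} := r & (5) & [\mathit{true} \mapsto o_1, \ldots, o_n] \\
o := r.f_i & (6) & [\mathit{true} \mapsto o] \\
r' := \{r\ \mathsf{with}\ f_i = e\} & (7) & [\mathit{true} \mapsto r'] \\
r' =_{\langle f_1, \ldots, f_k \rangle} r'' & (8) & [\mathit{true} \mapsto \emptyset,\ \mathit{false} \mapsto \emptyset] \\
v := C_p[e] & (9) & [\mathit{true} \mapsto v] \\
\mathsf{switch}(v)\ \mathsf{as}\ [o_1 \mid \ldots \mid o_n] & (10) & [\lambda_{C_1} \mapsto o_1, \ldots, \lambda_{C_n} \mapsto o_n] \\
v \in \{C_1, \ldots, C_k\} & (11) & [\mathit{true} \mapsto \emptyset,\ \mathit{false} \mapsto \emptyset] \\
o := a[i] & (12) & [\mathit{true} \mapsto o,\ \mathit{false} \mapsto \emptyset] \\
a' := [a\ \mathsf{with}\ i = e] & (13) & [\mathit{true} \mapsto a',\ \mathit{false} \mapsto \emptyset] \\
p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] & (14) & [\lambda_1 \mapsto o_1, \ldots, \lambda_m \mapsto o_m]
\end{array}
$$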

As shown in Table 4.3, statement (10) has an exit label $\lambda_{C_i}$ corresponding to each constructor $C_i$ of the input variant. Statements (2), (8) and (11) are bi-labeled, using true and false as logical values. None of them has any associated outputs. Statements (12) and (13) are bi-labeled as well. However, unlike the previously mentioned statements, they use the label false as an "out of bounds" exception and generate an output only for the label true. All other statements except (14) are uni-labeled: they associate all their output parameters (if any) to the label true. In contrast to Smart, in αSmil every exit label, including true, must be explicitly indicated. Furthermore, any output is explicitly associated to an exit label.

In Section 3.1.5 (on page 50) of our previous chapter, we introduced a Smart predicate called stop_thread. If the given index $i$ designates an active associated thread, this predicate sets its state to Blocked and returns the new state of the process. Otherwise, the predicate exits with label invalid. Revisiting it, we can finally indicate its body in the αSmil language.¹

Table 4.4 – Predicate Body in αSmil

    Signature
        predicate stop_thread(process p, int i)
        -> [true: process o | invalid]
    Declaration unit
        array<option_thread> ta; option_thread th;
        thread ti; state s;
    Predicate body
        ta := p.threads                   [true -> 1]                 0
        th := ta[i]                       [true -> 2, false -> 9]     1
        switch(th) as [ti | ]             [Some -> 3, None -> 9]      2
        s := Blocked                      [true -> 4]                 3
        ti := {ti with crt_state = s}     [true -> 5]                 4
        th := Some(ti)                    [true -> 6]                 5
        ta := [ta with i = th]            [true -> 7, false -> 9]     6
        o := {p with threads = ta}        [true -> 8]                 7
        [true]                                                        8
        [invalid]                                                     9

¹ The αSmil version is slightly simplified, as we are not checking if the transition to Blocked is valid.

Every statement in our stop_thread example is followed by a construct of the form exit_label -> numerical_label. This indicates the statement to be executed next, identified by numerical_label, if the current statement exits with label exit_label. For example, when the first statement, ta := p.threads, exits with label true, the predicate's execution continues with the statement following it, which has the numerical label 1. We remark that the predicate's exit labels are included in the body of an explicit predicate, as can be seen at lines 8 and 9, for true and invalid respectively. Intuitively, the predicate's body resembles a control flow graph and can be illustrated as shown in Figure 4.1. The predicate's exit labels are the control flow graph's exit nodes, as will be discussed in Section 4.2.
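The numbered-label scheme can be mimicked by a small table-driven loop. The following Python sketch is only an illustration under assumptions of ours, not the thesis's implementation: structures and partial arrays are modelled as dictionaries, variants as (constructor, argument) pairs, and the names BODY, EXITS and the helper functions are hypothetical.

```python
# Hypothetical encoding of the stop_thread body: each numbered program point
# holds a function that updates the environment and returns an exit label,
# plus a map from exit labels to the next program point.

def get_threads(env):              # 0: ta := p.threads
    env["ta"] = env["p"]["threads"]
    return "true"

def get_cell(env):                 # 1: th := ta[i]
    if env["i"] not in env["ta"]:  # arrays are partial: invalid index -> false
        return "false"
    env["th"] = env["ta"][env["i"]]
    return "true"

def switch_th(env):                # 2: switch(th) as [ti | ]
    ctor, arg = env["th"]
    if ctor == "Some":
        env["ti"] = arg
    return ctor                    # the exit label is the constructor name

def set_blocked(env):              # 3: s := Blocked
    env["s"] = "Blocked"
    return "true"

def update_thread(env):            # 4: ti := {ti with crt_state = s}
    env["ti"] = {**env["ti"], "crt_state": env["s"]}
    return "true"

def wrap_some(env):                # 5: th := Some(ti)
    env["th"] = ("Some", env["ti"])
    return "true"

def update_array(env):             # 6: ta := [ta with i = th]
    if env["i"] not in env["ta"]:
        return "false"
    env["ta"] = {**env["ta"], env["i"]: env["th"]}
    return "true"

def update_process(env):           # 7: o := {p with threads = ta}
    env["o"] = {**env["p"], "threads": env["ta"]}
    return "true"

BODY = {
    0: (get_threads, {"true": 1}),
    1: (get_cell, {"true": 2, "false": 9}),
    2: (switch_th, {"Some": 3, "None": 9}),
    3: (set_blocked, {"true": 4}),
    4: (update_thread, {"true": 5}),
    5: (wrap_some, {"true": 6}),
    6: (update_array, {"true": 7, "false": 9}),
    7: (update_process, {"true": 8}),
}
EXITS = {8: "true", 9: "invalid"}  # program points 8 and 9 are exit labels

def stop_thread(p, i):
    """Follow exit_label -> numerical_label edges until an exit is reached."""
    env, point = {"p": p, "i": i}, 0
    while point not in EXITS:
        statement, successors = BODY[point]
        point = successors[statement(env)]
    label = EXITS[point]
    return label, (env["o"] if label == "true" else None)
```

Because every update builds a new dictionary, the input process is never modified, which matches the functional nature of the structure and array updates described above.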

[Figure 4.1: control flow graph of the stop_thread body, with statement nodes 0 to 7 and exit nodes true (8) and inval (9); the edges shown with explicit labels are the false and None edges leading to inval.]

Figure 4.1 – Body of the stop_thread Predicate

We are working with αSmil, which is a computational version of Smil where all specification-only predicates have been removed. Simulating hypotheses, lemmas and contracts is straightforward and can be achieved using predicates having only the true and false labels and no associated output. Inductives are the only exception to this rule: they are supported in αSmil as well, and their declaration is similar to the one in Smart. The αSmil equivalent of the not_disjoint inductive presented in our Abstract Process Manager example (on page 46) has the following form:

    predicate not_disjoint(process p) -> [true | false]
    inductive
      case StacksJoint(int i, int j)
        thread ti; thread tj; memory_region sti; memory_region stj;

        i = j                                [true -> 1, false -> 7]
        thread(p, i)[true: ti | None]        [true -> 2, None -> 7]
        thread(p, j)[true: tj | None]        [true -> 3, None -> 7]
        sti := ti.stack                      [true -> 4]
        stj := tj.stack                      [true -> 5]
        overlap(sti, stj)[true | false]      [true -> 6, false -> 7]
        [true]
        [false]
        [error]

      case CodeStackJoint(int k)
        thread tk; memory_region stk; address_space asp; memory_region code;

        thread(p, k)[true: tk | None]        [true -> 1, None -> 6]
        stk := tk.stack                      [true -> 2]
        asp := p.adr_space                   [true -> 3]
        code := asp.code                     [true -> 4]
        overlap(stk, code)[true | false]     [true -> 5, false -> 6]
        [true]
        [false]
        [error]

      case DataStackJoint(int l)
        thread tl; memory_region stl; address_space aspace; memory_region data;

        thread(p, l)[true: tl | None]        [true -> 1, None -> 6]
        stl := tl.stack                      [true -> 2]
        aspace := p.adr_space                [true -> 3]
        data := aspace.data                  [true -> 4]
        overlap(stl, data)[true | false]     [true -> 5, false -> 6]
        [true]
        [false]
        [error]

    predicate disjoint_stacks(process p) -> [true | false]

        not_disjoint(p)[true | false]        [true -> 1, false -> 2]
        [true]
        [false]
        [error]

This inductive predicate has been introduced and explained in Section 3.1.5 of the previous chapter (on page 52), and it characterizes the potential situations in which the memory isolation of the different associated threads of a process can be broken.


4.2 Control Flow Graph

Predicate bodies in αSmil resemble a control flow graph representation having statements as nodes. The nodes represent program states and the edges are defined by statements with a particular exit label $\lambda$.

The control flow graph $G_p = (N, E)$ of a predicate $p$ has a node $n_i \in N$ for each program point. For each statement $s$ at program point $n_i$ that can execute and reach program point $n_j$ with exit label $\lambda_k$, an edge $(n_i, n_j)$ is added to $G_p$ and labeled with $s$ and $\lambda_k$. $G_p$ has a single entry node $n_{in} \in N$, corresponding to the program point associated to the first statement of $p$. The set of exit nodes $n_{out} \subset N$ consists of the nodes associated to each possible exit label $\lambda_k$ of the predicate. To these, one additional exit node, used as a sink node, is added. It corresponds to the error label.

In practice, all the outgoing edges of a node $n_i \in N$ bear the different cases of the same statement $s$ found at program point $n_i$. Thus, the edges are labeled with the same statement $s$, and there is an edge labeled $(s, \lambda_k)$ for each possible exit label $\lambda_k$ of $s$.

The subfigures in Figure 4.2 show the control flow graph of the following predicate:

    predicate thread(process p, int i) -> [true: thread ti | None | oob]

which receives a process $p$ and an index $i$ as inputs and returns the $i$-th active thread of the input process. If the $i$-th thread is inactive, it exits with the exit label None. In the case of an "out of bounds" exception, the exit label oob is returned. For better readability, Figure 4.2-b gives the control flow of the same predicate, where we have labeled the nodes with statements of the predicate and the edges with their exit labels. Throughout the rest of our αSmil predicate examples, we will favour the latter representation.

[Figure 4.2: a) $G_{thread}$, with nodes n1, n2, n3 and exit nodes true, None and oob, and edges labeled with a statement and an exit label; b) $G_{thread}$, alternative representation, with nodes labeled by the statements ts = p.threads, tio = ts[i] and switch(tio) as [ti | ], and edges labeled by exit labels.]

Figure 4.2 – Example – Control Flow Graph of Predicate thread

4.3 Well-Typed αSmil Statements

We formally define what it means for an αSmil statement to be well-typed and detail the full system of inference rules for the statements supported by αSmil in Table 4.6 and Table 4.7.

A well-typed αSmil statement is a statement that is compatible with the types specified in the signature $\sigma_p$ of the called built-in predicate $p$. This requires a typing environment $\Gamma$ mapping variables to their types.

Definition 4.3.1. Typing Environment $\Gamma$

\[ \Gamma : \mathcal{V} \to \mathcal{T} \]

Furthermore, αSmil distinguishes between variables $v \in \mathcal{V}$ which can be written to and variables which are read-only. Therefore, the definition of well-typedness for statements requires two different sets of variable identifiers, one for each kind of variable. These are:

• $\mathcal{V}^+$, with $\mathcal{V}^+ \subseteq \mathcal{V}$, which denotes the set of identifiers of writable and readable variables, and

• $\mathcal{V} \setminus \mathcal{V}^+$, which denotes the set of read-only variables.

The mapping between predicate identifiers and their signatures is denoted by $\Sigma$.

Definition 4.3.2. Mapping between Predicate Identifiers and Signatures

\[ \Sigma : \mathcal{P} \to \mathcal{S} \]

Definition 4.3.3. Well-Typed Statement. A statement $s$ exiting with label $\lambda \in \mathcal{L} \setminus \{error\}$ is well-typed in the typing environment $\Gamma$, given $\Sigma$,

\[ \Sigma, \Gamma, O \vdash s \to \lambda \]

if it is compatible with the types specified in its signature. Moreover, outputs of a well-typed statement must be in the writable variables set $O \subseteq \mathcal{V}^+$.

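The inference rule for a well-typed predicate call captures all these properties and is shown in rule [WTPCall], given in Table 4.6.

Table 4.6 – Well-Typed Predicate Call

\[
\frac{\begin{array}{c}
\Sigma(p) = (x_1 : \tau_1, \ldots, x_n : \tau_n)[\lambda_1 : (\tau_{11}\ y_{11}, \ldots, \tau_{1k_1}\ y_{1k_1}) \mid \ldots \mid \lambda_m : (\tau_{m1}\ y_{m1}, \ldots, \tau_{mk_m}\ y_{mk_m})] \\
\Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \qquad \forall i \in \{1, \ldots, m\},\ \Gamma(o_{i1}) = \tau_{i1} \ \ldots\ \Gamma(o_{ik_i}) = \tau_{ik_i} \\
o_{i1}, \ldots, o_{ik_i} \in O \qquad \forall i, \forall j, \forall k_i,\ j \neq k_i,\ o_{ij} \neq o_{ik_i} \qquad \lambda \in \{\lambda_1, \ldots, \lambda_m\}
\end{array}}
{\Sigma, \Gamma, O \vdash p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] \to \lambda}
\quad [\text{WTPCall}]
\]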


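The inference rules for the αSmil statements representing calls to built-in predicates are detailed in Table 4.7.

Table 4.7 – Well-Typed Statements

\[
\frac{\Gamma(e_1) = \Gamma(e_2) \quad \lambda \in \{true, false\}}{\Sigma, \Gamma, O \vdash e_1 = e_2 \to \lambda}\ [\text{WTEquals}]
\qquad
\frac{\Gamma(o) = \Gamma(e) \quad o \in O}{\Sigma, \Gamma, O \vdash o := e \to true}\ [\text{WTAsgn}]
\qquad
\frac{}{\Sigma, \Gamma, O \vdash nop \to true}\ [\text{WTNop}]
\]

\[
\frac{\Gamma(r) = struct\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \quad \Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \quad r \in O}{\Sigma, \Gamma, O \vdash r := \{e_1, \ldots, e_n\} \to true}\ [\text{WTRecNew}]
\]

\[
\frac{\Gamma(r) = struct\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \quad \Gamma(o_1) = \tau_1 \ \ldots\ \Gamma(o_n) = \tau_n \quad \forall i,\ o_i \in O \quad \forall i \neq j,\ o_i \neq o_j}{\Sigma, \Gamma, O \vdash \{o_1, \ldots, o_n\} := r \to true}\ [\text{WTRecAll}]
\]

\[
\frac{\Gamma(r) = struct\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \quad \Gamma(o) = \tau_i \quad o \in O}{\Sigma, \Gamma, O \vdash o := r.f_i \to true}\ [\text{WTRecGet}]
\]

\[
\frac{\Gamma(r) = \Gamma(r') = struct\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \quad \Gamma(e) = \tau_i \quad r' \in O}{\Sigma, \Gamma, O \vdash r' := \{r \text{ with } f_i = e\} \to true}\ [\text{WTRecSet}]
\]

\[
\frac{\Gamma(r') = \Gamma(r'') = struct\{g_1 : \tau_1, \ldots, g_n : \tau_n\} \quad \lambda \in \{true, false\} \quad \{f_1, \ldots, f_k\} \subseteq \{g_1, \ldots, g_n\}}{\Sigma, \Gamma, O \vdash r' =_{\langle f_1, \ldots, f_k \rangle} r'' \to \lambda}\ [\text{WTRecEq}]
\]

\[
\frac{\Gamma(v) = variant[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \quad \Gamma(e) = \tau_p \quad v \in O}{\Sigma, \Gamma, O \vdash v := C_p[e] \to true}\ [\text{WTVarCons}]
\]

\[
\frac{\Gamma(v) = variant[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \quad \Gamma(o_p) = \tau_p \quad o_p \in O}{\Sigma, \Gamma, O \vdash switch(v) \text{ as } [o_1 \mid \ldots \mid o_n] \to \lambda_{C_p}}\ [\text{WTVarSwitch}]
\]

\[
\frac{\Gamma(v) = variant[D_1 : \tau_1 \mid \ldots \mid D_m : \tau_m] \quad \{C_1, \ldots, C_k\} \subseteq \{D_1, \ldots, D_m\} \quad \lambda \in \{true, false\}}{\Sigma, \Gamma, O \vdash v \in \{C_1, \ldots, C_k\} \to \lambda}\ [\text{WTVarPos}]
\]

\[
\frac{\Gamma(a) = arr_{\tau_i}\langle\tau\rangle \quad \lambda \in \{true, false\} \quad \Gamma(i) = \tau_i \quad \Gamma(o) = \tau \quad o \in O}{\Sigma, \Gamma, O \vdash o := a[i] \to \lambda}\ [\text{WTAGet}]
\]

\[
\frac{\Gamma(a') = \Gamma(a) = arr_{\tau_i}\langle\tau\rangle \quad \lambda \in \{true, false\} \quad \Gamma(i) = \tau_i \quad \Gamma(e) = \tau \quad a' \in O}{\Sigma, \Gamma, O \vdash a' := [a \text{ with } i = e] \to \lambda}\ [\text{WTASet}]
\]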
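Three of these rules can be read as executable checks. The sketch below is an assumption-laden illustration rather than the thesis's type checker: types are modelled as plain Python values, $\Gamma$ as a dictionary, $O$ as a set of variable names, and each hypothetical function returns the set of exit labels $\lambda$ for which the judgment holds (empty if the statement is ill-typed).

```python
# Assumed type encoding for this sketch:
#   "int"                            a primitive type
#   ("struct", {"f": type, ...})     a structure type
#   ("arr", index_type, elem_type)   an array type

def wt_asgn(gamma, outputs, o, e):
    """[WTAsgn]: o := e needs equal types and a writable output."""
    return {"true"} if gamma[o] == gamma[e] and o in outputs else set()

def wt_rec_get(gamma, outputs, o, r, field):
    """[WTRecGet]: o := r.field needs r to be a struct whose field has o's type."""
    t = gamma[r]
    ok = (isinstance(t, tuple) and t[0] == "struct"
          and t[1].get(field) == gamma[o] and o in outputs)
    return {"true"} if ok else set()

def wt_arr_get(gamma, outputs, o, a, i):
    """[WTAGet]: o := a[i] is bi-labeled, so both true and false are allowed."""
    t = gamma[a]
    ok = (isinstance(t, tuple) and t[0] == "arr"
          and gamma[i] == t[1] and gamma[o] == t[2] and o in outputs)
    return {"true", "false"} if ok else set()
```

Returning a set of labels reflects that the bi-labeled array access is well-typed for both true and false, whereas assignment and field access are uni-labeled.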

The well-typedness of statements plays an important role with respect to the statements' interpretation, as we will show in the next section. It is also essential for the well-typedness and well-formedness of dependency and correlation summaries that will be presented in the following chapters.

The control flow graph $G_p = (N, E)$ of a predicate $p$ is well-typed if any edge labeled with $(s, \lambda) \in E$ is well-typed.

\[
\frac{\forall (s, \lambda) \in E,\ \Sigma, \Gamma, O \vdash s \to \lambda}{\Sigma, \Gamma, O \vdash G_p = (N, E)}\ [\text{WTCfg}]
\]

Figure 4.3 – Well-Typed Control Flow Graph

4.4 Operational Semantics of αSmil Statements

This section presents the structural operational semantics (Nielson, Nielson, and Hankin 1999; Plotkin 2004) of the αSmil language. Sometimes also called the small-step operational semantics, this allows reasoning about intermediate stages in a program's execution and emphasizes the individual steps of the computation.

Types. We take $\mathcal{T}_0$ to be the universe of primitive types, $\tau_0 \in \mathcal{T}_0$. Structures, variants and associative arrays are defined inductively. Structures are finite labeled products of types; they are a generalization of the Cartesian product. Variants are finite labeled disjoint unions of several types $\tau$. Two types are equal when they are pointwise equal.

Semantic Values. For each type $\tau$, we define the set $\mathcal{D}_\tau$ of semantic values of that type. For each primitive type $\tau_0 \in \mathcal{T}_0$, we suppose a given $\mathcal{D}_{\tau_0}$. Other semantic values are defined inductively, as shown below.


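Definition 4.4.1. Semantic Values $\mathcal{D}_\tau$

\[
\begin{aligned}
\mathcal{D}_{struct\{f_1 : \tau_1, \ldots, f_n : \tau_n\}} &= \{\, \{f_1 = v_1, \ldots, f_n = v_n\} \mid \forall i,\ v_i \in \mathcal{D}_{\tau_i} \,\} \\
\mathcal{D}_{variant[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n]} &= \biguplus_{1 \le i \le n} \{\, C_i[v] \mid v \in \mathcal{D}_{\tau_i} \,\} \quad \text{where } \biguplus \text{ is the disjoint union} \\
\mathcal{D}_{arr_{\tau_i}\langle\tau\rangle} &= \{\, (P, (v_k)_{k \in P}) \mid P \subseteq \mathcal{D}_{\tau_i},\ \forall k \in P,\ v_k \in \mathcal{D}_\tau \,\}
\end{aligned}
\]

In αSmil, arrays are partial. In a semantic value belonging to $\mathcal{D}_{arr_{\tau_i}\langle\tau\rangle}$, $P$ denotes the domain of valid indices for the array.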
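To make the inductive definition concrete, the following Python sketch checks membership of a value in $\mathcal{D}_\tau$. The encoding is an assumption made for illustration: structures are dictionaries, variants are (constructor, argument) pairs, a partial array $(P, (v_k)_{k \in P})$ is a dictionary whose key set plays the role of $P$, and primitive types are given by hypothetical membership tests.

```python
def in_domain(value, typ, primitives):
    """Return True if `value` belongs to D_typ under the assumed encoding."""
    if isinstance(typ, str):                      # primitive type: use the given D_tau0
        return primitives[typ](value)
    kind = typ[0]
    if kind == "struct":                          # {f1 = v1, ..., fn = vn}
        fields = typ[1]
        return (isinstance(value, dict) and value.keys() == fields.keys()
                and all(in_domain(value[f], t, primitives)
                        for f, t in fields.items()))
    if kind == "variant":                         # Ci[v] with v in D_taui
        ctors = typ[1]
        return (isinstance(value, tuple) and len(value) == 2
                and value[0] in ctors
                and in_domain(value[1], ctors[value[0]], primitives))
    if kind == "arr":                             # (P, (v_k)_{k in P})
        _, index_type, elem_type = typ
        return (isinstance(value, dict)
                and all(in_domain(k, index_type, primitives)
                        and in_domain(v, elem_type, primitives)
                        for k, v in value.items()))
    return False
```

A constructor without argument, such as None, is modelled here with a hypothetical unit primitive whose only value is Python's None.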

Two values of the same type are equal when they are pointwise equal.

Traditionally, in operational semantics, one is interested in how the state is modified during the execution of a statement. αSmil has no concept of state per se; what is essential is the evaluation of variables in different environments, or semantic contexts. To emphasize this idea, we define a valuation, or environment, $E \in \mathcal{E}$ as a mapping from variables to semantic values.

Definition 4.4.2. Valuation or environment $E$

\[ E : \mathcal{V} \to \mathcal{D} \]

Two valuations $E$ and $E'$ are equal if they map the same set of variables to semantic values that are pointwise equal:

\[ E = E' \iff \forall v \in \mathcal{V},\ E(v) = E'(v) \]

Given a typing environment $\Gamma$, a valuation $E$ is well-typed if the value mapped to any variable $v \in Dom(E)$ is of the appropriate type $\Gamma(v)$. We denote this by $\Gamma \vdash E$ and show it in [WTEnv].

\[
\frac{\forall v \in Dom(E),\ E(v) \in \mathcal{D}_{\Gamma(v)}}{\Gamma \vdash E}\ [\text{WTEnv}]
\]

Definition 4.4.3. A configuration $\langle E, [s] \rangle$ of the semantics is a pair consisting of a valuation and a statement.

Definition 4.4.4. The transitions of the semantics are of the form

\[ \langle E, [s] \rangle \xrightarrow{\lambda} E' \]

They express how the configuration is changed by one step of computation, occurring when executing a statement $s$ that exits with label $\lambda$. The exit label yielded by the statement's execution uniquely determines the statement that will be executed next. The change of the valuation is recorded in the resulting valuation $E'$. We write $E[x \to v]$ for the valuation that is identical to $E$ except that $x$ is mapped to the value $v$. We say that $E$ is extended with $x \to v$, and formally we define it as shown below.

Definition 4.4.5. Extend $E$ with $x \to v$

\[
(E[x \to v])(y) = \begin{cases} v & \text{if } x = y \\ E(y) & \text{otherwise} \end{cases}
\]

Extending a valuation $E$ with multiple mappings $x \to v$ consists in applying the extension in a left-associative fashion. In the following, we will omit parentheses for such extensions, thus denoting

\[ (\ldots((E[x_1 \to v_1])[x_2 \to v_2])\ldots)[x_n \to v_n] \]

as

\[ E[x_1 \to v_1][x_2 \to v_2]\ldots[x_n \to v_n] \]
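Valuation extension is an ordinary functional update. A minimal Python sketch, assuming valuations are modelled as dictionaries (the function names extend and extend_all are ours):

```python
def extend(env, x, v):
    """Return E[x -> v]: identical to env except that x is mapped to v."""
    new_env = dict(env)        # copy, so the original valuation is untouched
    new_env[x] = v
    return new_env

def extend_all(env, bindings):
    """Apply extensions left-associatively: (...(E[x1 -> v1])[x2 -> v2]...)."""
    for x, v in bindings:
        env = extend(env, x, v)
    return env
```

Because the extensions are applied from left to right, a later binding for the same variable overrides an earlier one.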

An interpretation $I \in \mathcal{I}$ for a predicate is defined as a mapping from a predicate and an initial environment to an output environment and an exit label.

Definition 4.4.6. Predicate Interpretation $I \in \mathcal{I}$

\[ I : \mathcal{P} \times \mathcal{E} \to \mathcal{E} \times \mathcal{L} \]

The initial environment is a mapping between the predicate's formal arguments and their effective values. The output environment is a mapping between the predicate's formal output arguments and their effective values after executing the predicate.

The detailed definition of the semantics of generic statements is described below, in Table 4.8. The first clause, $[nop]$, constitutes an axiom, as it has no premises. It states that the nop statement executes in one step, yielding the exit label true, without extending the valuation $E$. The semantics of equality tests is given by two inference rules, $[equal_T]$ and $[equal_F]$, one for each of the statement's possible exit labels. A call to the built-in predicate = will exit with label true if and only if the valuations of its arguments $e_1$ and $e_2$ are equal (clause $[equal_T]$). Otherwise, the statement will exit with label false (clause $[equal_F]$). In both cases, the statement leaves the valuation $E$ unchanged. The semantics of an assignment is given by the $[asgn]$ clause: the statement always yields the exit label true and extends the valuation $E$ with $o$ mapped to the value $E(e)$ of $e$.

Table 4.8 – The Structural Operational Semantics of αSmil Generic Statements

\[
[nop]\quad \frac{}{\langle E, [nop] \rangle \xrightarrow{true} E}
\qquad
[equal_T]\quad \frac{E(e_1) = E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{true} E}
\]

\[
[equal_F]\quad \frac{E(e_1) \neq E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{false} E}
\qquad
[asgn]\quad \frac{E' = E[o \to E(e)]}{\langle E, [o := e] \rangle \xrightarrow{true} E'}
\]

The semantics of structure-related statements is given in Table 4.9. The creation of a structure always yields the exit label true, as indicated by the $[recNew]$ clause, and it extends the valuation $E$ by mapping the resulting output variable $r$ to the structural value obtained by mapping every field $f_i$ to the value $E(e_i)$ of the corresponding argument $e_i$. The destructuring of a structure $r$ extends the valuation $E$ by mapping every output $o_i$ to the corresponding value $v_i$ of the $f_i$ field of $r$. The statement always exits with true. The valuation $E'$ obtained after executing an access to a given field $f_i$ of a structure $r$ is an extension of $E$ where the output $o$ is mapped to the corresponding value of $r$'s $f_i$ field in $E$. The semantics of a field update is given by the clause $[recSet]$. This statement extends the valuation $E$ by mapping the output structure $r'$ to a new value where the updated field $f_i$ is mapped to the value of $e$ in $E$ and every other field is mapped to the same value it had in $E$. Finally, the last two clauses correspond to a partial structure equality test. As shown by $[recEquals_T]$, the statement yields the exit label true if and only if the values of every field $g_i$ in the given set of fields are equal for $r'$ and $r''$ in $E$. Otherwise, the statement yields the label false. In both cases, the valuation $E$ remains unchanged.

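Table 4.9 – Operational Semantics of αSmil Structure-Related Statements

\[
[recNew]\quad \frac{E' = E[r \to \{f_1 = E(e_1), \ldots, f_i = E(e_i), \ldots, f_n = E(e_n)\}]}{\langle E, [r := \{e_1, \ldots, e_n\}] \rangle \xrightarrow{true} E'}
\]

\[
[recAll]\quad \frac{E(r) = \{f_1 = v_1, \ldots, f_n = v_n\} \quad E' = E[o_1 \to v_1][o_2 \to v_2]\ldots[o_n \to v_n] \quad \forall i, j,\ i \neq j,\ o_i \neq o_j}{\langle E, [\{o_1, \ldots, o_n\} := r] \rangle \xrightarrow{true} E'}
\]

\[
[recGet]\quad \frac{E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \quad E' = E[o \to v_i]}{\langle E, [o := r.f_i] \rangle \xrightarrow{true} E'}
\]

\[
[recSet]\quad \frac{E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \quad E' = E[r' \to \{f_1 = v_1, \ldots, f_i = E(e), \ldots, f_n = v_n\}]}{\langle E, [r' := \{r \text{ with } f_i = e\}] \rangle \xrightarrow{true} E'}
\]

\[
[recEquals_T]\quad \frac{\begin{array}{c} E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \quad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\ \{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \quad v_{g_i} = w_{g_i},\ \forall i \in \{1, \ldots, k\} \end{array}}{\langle E, [r' =_{\langle g_1, \ldots, g_k \rangle} r''] \rangle \xrightarrow{true} E}
\]

\[
[recEquals_F]\quad \frac{\begin{array}{c} E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \quad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\ \{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \quad \exists i,\ i \in \{1, \ldots, k\},\ v_{g_i} \neq w_{g_i} \end{array}}{\langle E, [r' =_{\langle g_1, \ldots, g_k \rangle} r''] \rangle \xrightarrow{false} E}
\]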
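The structure-related clauses translate almost literally into one-step functions that take a valuation and return an exit label together with the resulting valuation. The Python sketch below is an illustration under our own assumptions (valuations and structures as dictionaries, arguments given as variable names); it is not taken from the thesis.

```python
def rec_new(env, r, fields, args):
    """[recNew]: r := {e1, ..., en}; always exits with true."""
    value = {f: env[e] for f, e in zip(fields, args)}
    return "true", {**env, r: value}

def rec_get(env, o, r, field):
    """[recGet]: o := r.field; always exits with true."""
    return "true", {**env, o: env[r][field]}

def rec_set(env, r_out, r, field, e):
    """[recSet]: r' := {r with field = e}; the value of r is left untouched."""
    return "true", {**env, r_out: {**env[r], field: env[e]}}

def rec_equals(env, r1, r2, fields):
    """[recEqualsT] / [recEqualsF]: compare only the listed fields."""
    same = all(env[r1][g] == env[r2][g] for g in fields)
    return ("true" if same else "false"), env   # valuation unchanged
```

Note that rec_equals ignores every field outside the given subset, so two structures that differ elsewhere can still exit with true.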

Table 4.10 details the semantics of variant-related statements. As indicated by the $[varCons]$ clause, the construction of a variant $v$ with a constructor $C_p$ always yields the exit label true. The obtained valuation $E'$ is an extension of $E$ where the value of $v$ is obtained by applying the constructor $C_p$ to the argument's value $E(e)$. A variant switch exits with the label $\lambda_{C_i}$ if the value of $v$ in $E$ has been constructed with the $C_i$ constructor. The valuation $E'$ obtained after executing the statement is an extension of $E$ whereby the corresponding output $o_i$ is mapped to the value of the $C_i$ constructor's argument, $E(e)$. The last two clauses, $[varPossible_T]$ and $[varPossible_F]$, indicate the semantics of a variant possible check and correspond to the statement's possible exit labels. The statement will yield the label true only if the value of $v$ in $E$ has been obtained with a constructor $D$ that is a member of the given set of constructors $\{C_1, \ldots, C_k\}$. Otherwise, the false label will be returned. In both cases, the valuation remains unchanged.

Table 4.10 – Operational Semantics of αSmil Variant-Related Statements

\[
[varCons]\quad \frac{E' = E[v \to C_p[E(e)]]}{\langle E, [v := C_p[e]] \rangle \xrightarrow{true} E'}
\qquad
[varSwitch]\quad \frac{E(v) = C_i[e] \quad E' = E[o_i \to E(e)]}{\langle E, [switch(v) \text{ as } [o_1 \mid \ldots \mid o_n]] \rangle \xrightarrow{\lambda_{C_i}} E'}
\]

\[
[varPossible_T]\quad \frac{E(v) = D[e] \quad D \in \{C_1, \ldots, C_k\}}{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{true} E}
\qquad
[varPossible_F]\quad \frac{E(v) = D[e] \quad D \notin \{C_1, \ldots, C_k\}}{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{false} E}
\]
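As a hedged illustration of these clauses, the Python sketch below models a variant value as a (constructor, argument) pair and uses the constructor name itself as the exit label $\lambda_{C_i}$ of a switch. The function names and the outputs dictionary are assumptions of this sketch.

```python
def var_cons(env, v, ctor, e):
    """[varCons]: v := ctor[e]; always exits with true."""
    return "true", {**env, v: (ctor, env[e])}

def var_switch(env, v, outputs):
    """[varSwitch]: `outputs` maps each constructor to its output variable,
    or to None when that constructor binds no output."""
    ctor, arg = env[v]
    o = outputs.get(ctor)
    return ctor, (env if o is None else {**env, o: arg})

def var_possible(env, v, ctors):
    """[varPossibleT] / [varPossibleF]: test the constructor, bind nothing."""
    ctor, _ = env[v]
    return ("true" if ctor in ctors else "false"), env
```

Only the output associated with the matched constructor is bound, so a switch that exits through an argument-less constructor leaves the valuation unchanged.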


Table 4.11 describes the semantics of array-related statements. Each array-related statement has two corresponding clauses, one for each of the Boolean exit labels. Accessing an array's element yields the exit label true if the given index $i$ is a valid index. The resulting valuation $E'$ is extended by mapping the output $o$ to the value in $E$ of the array's $i$-th element. Otherwise, when the given index $i$ is invalid, as indicated by the $[arrGet_F]$ clause, the statement yields the label false and leaves the valuation unmodified. The semantics of an array update is given by the $[arrSet_T]$ and $[arrSet_F]$ clauses. If the given index $i$ is valid, the exit label true is yielded and the resulting valuation is obtained by extending $E$ with $a'$, whose $i$-th element's value is the value of $e$ in the initial valuation $E$. The values of all other elements of $a'$ are the ones found in $E$ for the elements of $a$. On the contrary, if the given index $i$ is invalid, the valuation remains unchanged and the label false is yielded.

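Table 4.11 – Operational Semantics of αSmil Array-Related Statements

\[
[arrGet_T]\quad \frac{E(a) = (P, (v_k)_{k \in P}) \quad E(i) \in P \quad E' = E[o \to v_{E(i)}]}{\langle E, [o := a[i]] \rangle \xrightarrow{true} E'}
\qquad
[arrGet_F]\quad \frac{E(a) = (P, (v_k)_{k \in P}) \quad E(i) \notin P}{\langle E, [o := a[i]] \rangle \xrightarrow{false} E}
\]

\[
[arrSet_T]\quad \frac{E(a) = (P, (v_k)_{k \in P}) \quad E(i) \in P \quad E' = E\big[a' \to (P, (w_k)_{k \in P})\big],\ w_k = \begin{cases} E(e) & \text{if } k = E(i) \\ v_k & \text{otherwise} \end{cases}}{\langle E, [a' := [a \text{ with } i = e]] \rangle \xrightarrow{true} E'}
\]

\[
[arrSet_F]\quad \frac{E(a) = (P, (v_k)_{k \in P}) \quad E(i) \notin P}{\langle E, [a' := [a \text{ with } i = e]] \rangle \xrightarrow{false} E}
\]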
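The four array clauses can be sketched in Python by modelling a partial array $(P, (v_k)_{k \in P})$ as a dictionary whose key set is the index domain $P$. This encoding and the function names are assumptions made for illustration.

```python
def arr_get(env, o, a, i):
    """o := a[i]: true with o bound if E(i) is in P, false otherwise."""
    cells = env[a]                      # the keys of this dict form P
    if env[i] not in cells:
        return "false", env             # [arrGetF]: valuation unchanged
    return "true", {**env, o: cells[env[i]]}   # [arrGetT]

def arr_set(env, a_out, a, i, e):
    """a' := [a with i = e]: a new array with the same domain P."""
    cells = env[a]
    if env[i] not in cells:
        return "false", env             # [arrSetF]: valuation unchanged
    return "true", {**env, a_out: {**cells, env[i]: env[e]}}   # [arrSetT]
```

An update never enlarges the index domain: writing at an index outside $P$ exits with false instead of creating a new cell.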

The semantics of a generic predicate call $p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]$ is captured by the $[pCall]$ inference rule shown in Table 4.12. Interpreting the predicate $p$ in the context of its arguments' values in the valuation $E$ yields a label $\lambda_i$ and a mapping between its formal output arguments and their resulting values $v_{ij}$. The resulting valuation $E'$ is obtained by extending $E$ with the output variables $o_{ij}$ mapped to the corresponding $v_{ij}$.

The interpretation of a statement is well-typed with respect to a signature if and only if every tuple in the interpretation is well-typed, i.e. if it has the expected number of inputs with the adequate types and an adequate label with well-typed outputs as well. Furthermore, it has to be total, i.e. for every well-typed tuple of inputs there exists a label and some outputs that match in the interpretation.

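Table 4.12 – Semantics of a Predicate Call

\[
\frac{\begin{array}{c}
\Sigma(p) = p(x_1 : \tau_1, \ldots, x_n : \tau_n)[\lambda_1 : (\tau_1\ y_1) \mid \ldots \mid \lambda_i : (\tau_{i1}\ y_{i1}, \ldots, \tau_{ik_i}\ y_{ik_i}) \mid \ldots \mid \lambda_m : (\tau_m\ y_m)] \\
I(p, inputs) = (outputs, \lambda_i) \qquad inputs(x_l) = E(e_l),\ \forall l \in \{1, \ldots, n\} \\
outputs(y_{i1}) = v_{i1} \ \ldots\ outputs(y_{ik_i}) = v_{ik_i} \qquad E' = E[o_{i1} \to v_{i1}]\ldots[o_{ik_i} \to v_{ik_i}]
\end{array}}
{\langle E, [p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]] \rangle \xrightarrow{\lambda_i} E'}
\quad [pCall]
\]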
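The $[pCall]$ rule can be read operationally as follows. In this Python sketch, which rests on our own assumptions, the interpretation $I$ is a dictionary of Python functions taking the input environment and returning (outputs, label), signatures list formal inputs and per-label formal outputs, and thread_interp is a hypothetical interpretation of the thread predicate.

```python
def p_call(env, interp, signatures, p, args, out_vars):
    """One step of a predicate call; out_vars maps each label to its outputs."""
    formals, formal_outputs = signatures[p]
    inputs = {x: env[e] for x, e in zip(formals, args)}   # inputs(x_l) = E(e_l)
    outputs, label = interp[p](inputs)                    # I(p, inputs)
    new_env = dict(env)
    for y, o in zip(formal_outputs[label], out_vars[label]):
        new_env[o] = outputs[y]                           # E[o_ij -> v_ij]
    return label, new_env

def thread_interp(inputs):
    """Hypothetical interpretation of thread(p, i)[true: ti | None | oob]."""
    threads, i = inputs["p"]["threads"], inputs["i"]
    if i not in threads:
        return {}, "oob"
    ctor, arg = threads[i]
    return ({"ti": arg}, "true") if ctor == "Some" else ({}, "None")

SIGNATURES = {"thread": (["p", "i"], {"true": ["ti"], "None": [], "oob": []})}
INTERP = {"thread": thread_interp}
```

Only the outputs associated with the label actually yielded are bound in the resulting valuation; for the other labels the valuation is returned unchanged.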

Definition 4.4.7. Subject Reduction Property

The interpretation of a well-typed statement, given well-typed interpretations for the external predicate calls, preserves the fact that the valuation is well-typed.

\[
\forall\, \Gamma, E, s, \lambda, \Sigma,\quad (\Gamma \vdash E) \land (\Sigma, \Gamma, O \vdash s \to \lambda) \land (\langle E, [s] \rangle \xrightarrow{\lambda} E') \implies \Gamma \vdash E'
\]

Definition 4.4.8. The Progress Property

A well-typed statement in a well-typed environment can always be interpreted to some label and outputs.

\[
\forall\, E, \Gamma, \Sigma, s,\quad (\Gamma \vdash E) \land (\Sigma, \Gamma, O \vdash s \to \lambda) \implies \exists\, \lambda', E',\ \langle E, [s] \rangle \xrightarrow{\lambda'} E'
\]

The well-typedness of an interpretation, as well as the subject reduction and progress properties, have been formally proven in Coq by Stéphane Lescuyer.


Chapter 5

Dependency Analysis for Functional Specifications

like islands in the sea, separate on the surface but connected in the deep

William James

Algebraic data types (structures and variants) and associative arrays are fundamental building blocks for representing, grouping and handling complex data efficiently. However, as argued in Chapter 1, operations manipulating them are rarely concerned with the entire compound input data structure. Frequently, they depend only on a limited subset of their input. Complete specifications, or contracts (Meyer 1997), of such operations will not only stipulate that the output possesses a certain property (Borgida, Mylopoulos, and Reiter 1993; Polikarpova et al. 2013), but will also include their frame conditions (Borgida, Mylopoulos, and Reiter 1995), i.e. the parts of the input on which they operate. Such conditions facilitate reasoning locally without overlooking the global picture: if a property $P$ is known to hold at a certain point in the program where a predicate $p$ is called, $P$ still holds after the call to $p$, provided that the (sub)structures on which $P$ depends are disjoint from the (sub)structures that might be modified according to $p$'s frame condition (Banerjee and Naumann 2014). Though intuitively easy, specifying and proving the preservation of logical properties for the unmodified part is a particular manifestation of the frame problem (McCarthy and Hayes 1969; Leavens, Leino, and Müller 2007), a notoriously cumbersome task in formal software verification, imposing unnecessary manual effort (Meyer 2015).

One of the challenges of addressing this problem, and thereby simplifying the verification of certain preserved properties, is to determine the input fragments on which these properties depend, i.e. their footprint (Distefano, O'Hearn, and Yang 2006) or, to a first approximation, their read effects (Feijs and Jonkers 1992; Greenhouse and Boyland 1999; Clarke and Drossopoulou 2002). While specifications sometimes include the write effects (Clarke and Drossopoulou 2002) of an operation through modifies clauses (Guttag et al. 1993b), read effects are usually not specified explicitly, even though this information can be useful for reasoning about an operation's results. The purpose of the dependency analysis presented in this chapter is to take a step forward in this direction and to detect such information automatically. More precisely, our analysis is a static dependency analysis for the αSmil language (presented in Chapter 4) that computes a conservative approximation of the input fragments on which the operations depend.

Dependence and liveness analyses are traditionally used in the compilation realm for code optimization (Kennedy 1978), dead code elimination (Knoop, Rüthing, and Steffen 1994; Wand and Siveroni 1999; Liu and Stoller 2003), program slicing (Weiser 1984; Tip 1995; Reps and Turnidge 1996; Castillo et al. 2008) or compile-time garbage collection (Jones and Métayer 1989; Park and Goldberg 1992; Wand and Clinger 1998). In contrast to the vast majority of static analyses, which are meant to be used strictly on code and in an essentially purely automatic setting, our analysis is thought of as a companion tool to be exploited in the middle of interactive program verification, and it is designed to be used on programs as well as on specifications.

5.1 Dependency Analysis in a Nutshell

In a nutshell, our dependency analysis targets the delimitation of the input subset on which the output depends, in the context of an operation with a compound input. We define dependency as the observed part of a structured domain and strive to obtain type-sensitive results, distinguishing between the subelements of arrays and algebraic data types and capturing the dependency specific to each. The targeted results are meant to mirror, in terms of dependency, the layered structure of compound data types. Furthermore, the dependency analysis must work with conservative approximations, and it must guarantee that what is marked as not needed is definitely not needed, i.e. it is irrelevant for the obtained output.

In the classification of Hind (Hind 2001), our dependency analysis is a flow-sensitive, field-sensitive, interprocedural analysis that handles associative arrays, structures and variant data types. Specific dependency results are computed for each of the possible execution scenarios, i.e. for each exit label. Thus, our analysis also shows a form of path-sensitivity (Hind 2001). However, we favour the term label-sensitivity to describe this characteristic, as it seems more appropriate applied to our case and the language we are working with.

Our dependency analysis targets complex transition systems in general, and operating systems and microkernels in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e. state changes that map an input state to an output state. Automatically proving the preservation of invariants concerning only subelements of the state (fields, array cells, etc.) that have not been altered by a transition in the system would considerably diminish the number of proof obligations. The first step towards achieving this goal consists in automatically detecting dependency summaries and the minimum relevant input information for producing certain outputs.

As mentioned, our analysis targets fine-grained dependency summaries for arrays, structures and variants, expressed at the level of their subelements. For variants, besides capturing the specific dependency on each constructor and its arguments, we argue that additional relevant information can be computed regarding the subset of possible constructors at a given program point. This is not dependency information per se, but it enriches the footprint of a predicate with useful information. Together with the dependency information, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different, albeit related, point of view. Therefore, we are simultaneously performing a possible-constructors analysis. This has an impact on the defined abstract dependency type, making it more complex, as we will see in the following section. The possible-constructors analysis could be performed separately, as a stand-alone analysis. By performing the two analyses simultaneously, we lose some of the precision that would be attained if the two were performed separately, but we reduce overhead and present relevant information in a unified manner.

Designing the analysis as a tool to be used in the context of interactive program verification, on both code and specifications, has led to specific traits. One of them concerns the treatment of arrays. In contrast to dependence and liveness analyses used for code optimizations (Gross and Steenkiste 1990), which require precision for every array cell, we compute dependency information referring to all cells of the array, or to all but one cell, for which an exceptional dependency is computed. In practice, a considerable number of relevant properties and operations involving arrays fall into this spectrum.

In the following subsection, in order to better illustrate the problem that our analysis addresses, we briefly present two examples of αSmil predicates manipulating structures, variants and arrays, and describe the dependency information that we are targeting.

5.1.1 Targeted Dependency Information

To present the envisioned dependency results, we consider two αSmil predicates, thread and start_address, whose control flow graphs and implementations are shown below. Both predicates manipulate inputs of type process, introduced in Section 3.1.5 (on page 49) and shown in Figure 5.2. Internally, they handle values of type thread and memory_region respectively, described in Section 3.1.5 (on page 48) as well and shown below in Figure 5.1.

    type memory_region = {
      // Start address
      start : int;
      // Region length
      length : int;
    }

    type thread = {
      // Identifier
      id : int;
      // Current state
      crt_state : state;
      // Stack
      stack : memory_region;
    }

Figure 5.1 – Example Data Types – Thread and Memory Region


type option ltAgt =| None| Some (A a)

type process = Array of associated threadsthreads array ltoption ltthread gtgt Internal idpid int Currently running threadcrt_thread int Address spaceadr_space address_space

Figure 52 ndash Input Type ndash Process

The first predicate, thread, having the control flow graph shown in Figure 5.4 and whose implementation is shown in Figure 5.3, receives a process p and an index i as inputs. It reads the i-th element in the threads array of the input process p. If this element is active, then the predicate exits with the label true and outputs the corresponding thread ti. Otherwise, it exits with the label None and no output is generated.

    predicate thread (process p, int i) -> [true: thread ti | None | oob]
    {
      array<option<thread>> th; option<thread> tio;
      th := p.threads            [true -> 1]
      tio := th[i]               [true -> 2, false -> 5]
      switch (tio) as [ | ti]    [None -> 4, Some -> 3]
      [true]
      [None]
      [oob]
    }

Figure 5.3 – Predicate thread – Implementation

Our dependency analysis should be able to distinguish between the different exit labels of the predicate. For the label true, for instance, it should detect that only the field threads is read by the predicate, while all others are irrelevant to the result. Furthermore, it should detect that, for the threads array of the input p, only the i-th element is inspected. Additionally, since we are considering the label true, the i-th element is necessarily an active thread, indicated by the constructor Some. The other constructor, None, is impossible for this execution scenario. On the contrary, for the exit label None, the constructor Some is impossible. For the exit label oob, nothing but the index i and the “support” or “length” of the associated threads array is read. The targeted dependency results for the predicate thread are depicted in Figure 5.5.

[Figure 5.4 – G_thread – Control Flow Graph of Predicate thread: the nodes th := p.threads, tio := th[i] and switch(tio) as [ | ti] lead to the exit nodes true, None and oob.]

[Figure 5.5 – Targeted Dependency Results for Predicate thread: for each of the exit labels true and None, the diagram marks the fields of process p (threads, pid, crt_thread, adr_space), the i-th cell of threads and the constructors Some(thread t) and None of option<thread> as either Read/Needed or Irrelevant/Not Needed.]

The second predicate, start_address, whose control flow graph is shown in Figure 5.6, receives a process p and an index j as inputs and finds the start address of the stack corresponding to an active thread. It makes a call to the predicate thread, thus reading the j-th element of the threads array of its input process. If this is an active element, it further accesses the field stack, from which it only reads the start address start. Otherwise, if the element is inactive, the predicate forwards the exit label None of the called predicate thread and generates no output. When given an invalid index j, the predicate exits with label oob. The predicate's implementation is shown in Figure 5.7.

The dependency information for this predicate should capture the fact that, on the true execution scenario, only the field start of the stack of the input's j-th associated thread is read. Furthermore, the only possible constructor on this execution path is the Some constructor. On the contrary, for the None execution scenario, the only possible constructor is the None constructor. The targeted dependency results for the start_address predicate are depicted in Figure 5.8. We remark that, for the oob execution scenario, only the “support” or “length” of the threads array is read.

[Figure 5.6 – G_start_address – Control Flow Graph of Predicate start_address: the nodes thread(p, j)[true: tj | None | oob], sj := tj.stack and adr := sj.start lead to the exit nodes true, None and error.]

    predicate start_address (process p, int j) -> [true: int adr | None]
    {
      thread tj; memory_region sj;
      thread(p, j)[true: tj | None | oob]    [true -> 1, None -> 4, oob -> 5]
      sj := tj.stack                          [true -> 2]
      adr := sj.start                         [true -> 3]
      [true]
      [None]
      [error]
    }

Figure 5.7 – Predicate start_address – Implementation

[Figure 5.8 – Targeted Dependency Results for Predicate start_address: for each of the exit labels true and None, the diagram marks the fields of process p (threads, pid, crt_thread, adr_space), the fields of thread tj (id, crt_state, stack), the fields of its stack (start, length) and the constructors Some(thread t) and None of option<thread> as either Read/Needed or Irrelevant/Not Needed.]

5.1.2 Outline

The rest of this chapter focuses on technical details related to the dependency analysis. In Section 5.2 we present the abstract dependency domain. This is the fundamental building block on which our analysis relies in order to determine expressive dependency summaries. It is followed in Section 5.3 by an in-depth description of our analysis at an intraprocedural level, underlining the data-flow equations in Section 5.3.2 and explaining them by illustrating the step-by-step mechanism on an example in Section 5.3.3. A summary of the dependency analysis at an interprocedural level is given in Section 5.4. We illustrate the approach and underline its shortcomings on an example in Section 5.4.1, and discuss their origin in Section 5.4.2. Two different semantic interpretations of our dependency information are discussed in Section 5.5. In Section 5.6 we review and discuss approaches targeting information that is similar to our dependency summaries. Finally, in Section 5.7 we conclude and present some other potential applications of our dependency analysis, which are not confined to the field of interactive program verification.

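5.2 Abstract Dependency Domain

The first step towards inferring expressive, type-sensitive results that capture the dependency specific to each subelement of an algebraic data type or an associative array is the definition of an abstract dependency domain $\mathcal{D}$ that mimics the structure of such data types. The dependency domain, $\delta \in \mathcal{D}$, shown below, is defined inductively from the three atomic cases, $\top$, $\square$ and $\bot$, and mirrors the structure of the concrete types.

Definition 5.2.1. Dependency Domain $\delta \in \mathcal{D}$

$$\begin{array}{rcll}
\delta & ::= & \top & \text{Everything, atomic case (i)} \\
 & \mid & \square & \text{Nothing, atomic case (ii)} \\
 & \mid & \bot & \text{Impossible, atomic case (iii)} \\
 & \mid & \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} & f_1, \dots, f_n \text{ fields (iv)} \\
 & \mid & [C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m] & C_1, \dots, C_m \text{ constructors (v)} \\
 & \mid & \langle \delta \rangle & \text{(vi)} \\
 & \mid & \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & i \text{ array index (vii)}
\end{array}$$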

As reflected by the above definition, the dependency for atomic types is expressed in terms of the domain's atomic cases: $\top$ (least precise), denoting that everything is needed, and $\square$, denoting that nothing is needed. The third atomic case, $\bot$, denoting impossible, is introduced for the possible-constructors analysis performed simultaneously and is further explained below.

The dependency of a structure (iv) describes the dependency on each of its fields. For instance, revisiting our thread example from Section 5.1.1, we could express an over-approximation of the dependency information depicted for the process p in Figure 5.5 using the following dependency:

$$\{threads \mapsto \top;\ pid \mapsto \square;\ crt\_thread \mapsto \square;\ adr\_space \mapsto \square\}$$

This captures the fact that all fields except the threads field are irrelevant, i.e. they are not read and nothing in their contents is needed. The dependency for the threads field is an over-approximation and expresses the fact that it is entirely necessary, i.e. everything in its value is needed for the result.

For arrays, we distinguish between two cases, namely arrays with a general dependency applying to all of the cells, given by (vi), and arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known, given by (vii). For instance, for the threads field of the previous example, the following dependency

$$\langle \square \triangleright i : \top \rangle$$

would be a less coarse approximation, capturing the fact that only the i-th element of the associated threads array is needed, while all others are irrelevant.

For variants (v), the dependency is expressed in terms of the dependencies of their constructors, expressed in turn in terms of their arguments' dependencies. Thus, a constructor having a dependency mapped to $\square$ is one for which nothing but the tag has been read, i.e. its arguments, if any, are irrelevant for the execution. For instance, for the i-th element of the threads array of our previous example, the following dependency

$$[Some \mapsto \top;\ None \mapsto \square]$$

would be a more precise approximation when considering the exit label true. It is still an over-approximation, as it expresses that both constructors are possible. The argument of the Some constructor is entirely read, while for None only the tag is read.

For variants, we want to take a step further and also include the information that certain constructors cannot occur for certain execution paths. Impossible, the third atomic case $\bot$, is introduced for this purpose. As mentioned previously in Section 5.1, in order to obtain this additional information we simultaneously perform a “possible-constructors” analysis, which computes, for each execution scenario, the subset of possible constructors for a given value at a given program point. All constructors that cannot occur on a given execution path are marked as being $\bot$. In contrast, constructors for which only the tag is read are marked as $\square$.

The difference between $\bot$ and $\square$ can be illustrated by considering a polymorphic option type option<A>, having two constructors, None and Some(A val), respectively, and a Boolean predicate that pattern matches on an input of this type and returns false in the case of None and true in the case of Some, unconditioned by the value val of its argument. For the true execution scenario, the dependency on the Some constructor would be $\square$. The tag is read and it is decisive for the outcome, but the value of its argument val is completely irrelevant. The dependency on the None constructor, however, would be $\bot$: the predicate can exit with label true if and only if the input matches against the Some constructor. By distinguishing between these two cases, we can not only distinguish the input's subelements that have a direct impact on an operation's output, but additionally we can also obtain a more detailed footprint that highlights the influence exerted by the input's “shape” on the operation's outcome.

For instance, for the i-th element of the threads array of our previous example, a dependency mapping the constructor None to $\bot$ would be a more precise approximation when considering the label true. Taking into account all the discussed values, we can express the dependency depicted in Figure 5.5 for the label true as follows:

$$\{threads \mapsto \langle \square \triangleright i : [Some \mapsto \top;\ None \mapsto \bot] \rangle;\ pid \mapsto \square;\ crt\_thread \mapsto \square;\ adr\_space \mapsto \square\}$$
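The dependency above can be written down concretely. The following Python sketch is only an illustration and not the thesis's implementation: the tuple encoding and the helper names (`struct`, `variant`, `arr`, `arr_exc`, `possible_constructors`) are assumptions introduced here. It encodes the seven cases of Definition 5.2.1 and shows how the difference between Nothing and Impossible determines the set of possible constructors.

```python
# Hypothetical encoding of the abstract dependency domain (not from the thesis).
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"   # atomic cases (i), (ii), (iii)

def struct(fields):                  # case (iv): field name -> dependency
    return ("struct", dict(fields))

def variant(ctors):                  # case (v): constructor name -> dependency
    return ("variant", dict(ctors))

def arr(default):                    # case (vi): one dependency for all cells
    return ("arr", default)

def arr_exc(default, index, exc):    # case (vii): all cells but `index`
    return ("arrx", default, index, exc)

def possible_constructors(dep):
    """Constructors of a variant dependency that are not marked Impossible."""
    kind, ctors = dep
    assert kind == "variant"
    return sorted(c for c, d in ctors.items() if d != BOT)

# Dependency of process p for the exit label true of predicate thread.
cell_i = variant({"Some": TOP, "None": BOT})
dep_true = struct({
    "threads": arr_exc(NOTHING, "i", cell_i),
    "pid": NOTHING, "crt_thread": NOTHING, "adr_space": NOTHING,
})

print(possible_constructors(cell_i))                                   # ['Some']
print(possible_constructors(variant({"Some": TOP, "None": NOTHING})))  # ['None', 'Some']
```

With None mapped to Nothing instead of Impossible, both constructors remain possible, which is exactly the over-approximation discussed for the variant dependency earlier in this section.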

We remark that $\top$ and $\bot$ can apply to any type. For instance, $\top$ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to $\top$, are transformed to $\top$. The $\bot$ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to $\bot$. A whole structure or array is impossible if any of its subelements is impossible.
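These collapsing rules can be read as a normalization pass over a dependency. The sketch below is one possible reading, under the assumption of a tuple encoding chosen for illustration (`("struct", {...})`, `("variant", {...})`, `("arr", d)`, `("arrx", default, index, exc)`); the function name `normalize` is hypothetical.

```python
# Hypothetical tuple encoding; atoms are plain strings.
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def normalize(d):
    """Collapse dependencies that are uniformly Everything or that are Impossible."""
    if isinstance(d, str):                       # atomic case: nothing to do
        return d
    kind = d[0]
    if kind in ("struct", "variant"):
        subs = {k: normalize(v) for k, v in d[1].items()}
        vals = list(subs.values())
        if kind == "variant" and all(v == BOT for v in vals):
            return BOT                           # no constructor is possible
        if kind == "struct" and any(v == BOT for v in vals):
            return BOT                           # one impossible field suffices
        if all(v == TOP for v in vals):
            return TOP                           # uniformly needed
        return (kind, subs)
    if kind == "arr":
        inner = normalize(d[1])
        return inner if inner in (TOP, BOT) else ("arr", inner)
    default, exc = normalize(d[1]), normalize(d[3])
    if BOT in (default, exc):
        return BOT
    if default == TOP and exc == TOP:
        return TOP
    return ("arrx", default, d[2], exc)

print(normalize(("struct", {"a": TOP, "b": ("arr", TOP)})))        # TOP
print(normalize(("variant", {"Some": BOT, "None": BOT})))          # BOT
```

A variant with one possible constructor, such as `("variant", {"Some": TOP, "None": BOT})`, is left unchanged, since it is neither uniformly needed nor entirely impossible.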

The $\bot$ atomic value is the lower bound of our domain and hence the most precise value. The final abstract dependency is a closure of all these, combined recursively. To give an intuition of the shape of our dependency lattice, we illustrate below, in Figure 5.9, the Hasse diagram of the order relation between pairs of atomic dependency values. Intuitively, if the two analyses were performed separately, the upper “diamond” shape would correspond to the dependency analysis and the lower one to the possible-constructors analysis. The element $\square$ would be the lower bound for the dependency domain and the upper bound for the possible-constructors domain. By performing them simultaneously, $\bot$ becomes the domain's lower bound.

[Figure 5.9 – Order Relation on Pairs of Atomic Dependencies: Hasse diagram over the pairs $(\top,\top)$, $(\square,\top)$, $(\top,\square)$, $(\square,\square)$, $(\square,\bot)$, $(\bot,\square)$, $(\top,\bot)$, $(\bot,\top)$ and $(\bot,\bot)$, with $(\top,\top)$ at the top and $(\bot,\bot)$ at the bottom.]

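The partial order relation is denoted by $\sqsubseteq$ and defined as shown below.

Definition 5.2.2. Partial Order $\sqsubseteq$

$$\sqsubseteq\ \subseteq\ \mathcal{D} \times \mathcal{D}$$

Table 5.1 – $\sqsubseteq$ – Comparison of Two Domains

$$\dfrac{}{\delta \sqsubseteq \top}\ \textsc{Top}
\qquad
\dfrac{}{\bot \sqsubseteq \delta}\ \textsc{Bot}$$

$$\dfrac{\delta_1 \sqsubseteq \delta'_1 \quad \dots \quad \delta_n \sqsubseteq \delta'_n}{\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \sqsubseteq \{f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n\}}\ \textsc{Str}
\qquad
\dfrac{\square \sqsubseteq \delta_1 \quad \dots \quad \square \sqsubseteq \delta_n}{\square \sqsubseteq \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\}}\ \square\textsc{Str}$$

$$\dfrac{\delta_1 \sqsubseteq \delta'_1 \quad \dots \quad \delta_n \sqsubseteq \delta'_n}{[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] \sqsubseteq [C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n]}\ \textsc{Var}
\qquad
\dfrac{\square \sqsubseteq \delta_1 \quad \dots \quad \square \sqsubseteq \delta_n}{\square \sqsubseteq [C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n]}\ \square\textsc{Var}$$

$$\dfrac{\delta_{def} \sqsubseteq \delta'_{def}}{\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \rangle}\ \textsc{ADef}
\qquad
\dfrac{\square \sqsubseteq \delta_{def}}{\square \sqsubseteq \langle \delta_{def} \rangle}\ \square\textsc{ADef}$$

$$\dfrac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{def}}{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \rangle}\ \textsc{AIA}
\qquad
\dfrac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{def} \sqsubseteq \delta'_{exc}}{\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle}\ \textsc{AAI}$$

$$\dfrac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc}}{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle}\ \textsc{AI}
\qquad
\dfrac{\square \sqsubseteq \delta_{def} \quad \square \sqsubseteq \delta_{exc}}{\square \sqsubseteq \langle \delta_{def} \triangleright i : \delta_{exc} \rangle}\ \square\textsc{AI}$$

$$\dfrac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc} \quad \delta_{def} \sqsubseteq \delta'_{exc} \quad \delta_{exc} \sqsubseteq \delta'_{def} \quad i \neq j}{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle}\ \textsc{AIJ}$$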

It is used to compare dependencies and it is detailed in Table 5.1. We write $\delta_1 \sqsubseteq \delta_2$, and we read it as “a dependency $\delta_1$ is more precise than another dependency $\delta_2$”, if it represents a smaller subset of a structural object and if it allows at most as many constructors as $\delta_2$. The greatest element is $\top$ (Top) and $\bot$ is the least (Bot). Instances of identical structure and variant types are compared pointwise (Str, Var). For arrays without known exceptional dependencies, we compare the default dependencies applying to all array cells (ADef). If exceptional dependencies are known for the same cell, these are additionally compared (AI). For arrays with known exceptional dependencies for different cells, we compare each dependency on the left-hand side with each one on the right-hand side (AIJ). The comparison of $\square$ with structures ($\square$Str), variants ($\square$Var) and arrays ($\square$ADef, $\square$AI) is a pointwise comparison between $\square$ and the dependency of each subelement.
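The rules of Table 5.1 translate almost directly into a recursive comparison function. The following Python sketch is a hedged illustration, not the thesis's code: the tuple encoding and the names `subdeps` and `leq` are assumptions, and reflexivity on syntactically equal dependencies is added explicitly since $\sqsubseteq$ is a partial order.

```python
# Hypothetical tuple encoding: ("struct", {...}), ("variant", {...}),
# ("arr", default), ("arrx", default, index, exc); atoms are strings.
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def subdeps(d):
    """Dependencies of the direct subelements of a structured dependency."""
    if d[0] in ("struct", "variant"):
        return list(d[1].values())
    if d[0] == "arr":
        return [d[1]]
    return [d[1], d[3]]

def leq(a, b):
    """True when a is at least as precise as b, following Table 5.1."""
    if a == b or b == TOP or a == BOT:       # reflexivity, Top, Bot
        return True
    if a == TOP or b in (BOT, NOTHING):      # no rule applies
        return False
    if a == NOTHING:                         # Nothing vs. structured: pointwise
        return all(leq(NOTHING, s) for s in subdeps(b))
    ka, kb = a[0], b[0]
    if ka in ("struct", "variant"):          # Str, Var
        return (ka == kb and a[1].keys() == b[1].keys()
                and all(leq(a[1][k], b[1][k]) for k in a[1]))
    if kb not in ("arr", "arrx"):
        return False
    if ka == "arr" and kb == "arr":          # ADef
        return leq(a[1], b[1])
    if ka == "arrx" and kb == "arr":         # AIA
        return leq(a[1], b[1]) and leq(a[3], b[1])
    if ka == "arr":                          # AAI
        return leq(a[1], b[1]) and leq(a[1], b[3])
    if a[2] == b[2]:                         # AI: same exceptional cell
        return leq(a[1], b[1]) and leq(a[3], b[3])
    return all(leq(x, y) for x in (a[1], a[3]) for y in (b[1], b[3]))  # AIJ

precise = ("arrx", NOTHING, "i", ("variant", {"Some": TOP, "None": BOT}))
coarser = ("arrx", NOTHING, "i", TOP)
print(leq(precise, coarser), leq(coarser, TOP), leq(coarser, precise))
# True True False
```

The example retraces the successive approximations of the threads field given in this section: each refinement is below the coarser one, and the converse does not hold.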

5.2.1 Join and Reduction Operator

The join operation is denoted by $\vee$ and it is defined as shown below.

Definition 5.2.3. Join Operation $\vee$

$$\vee : \mathcal{D} \times \mathcal{D} \rightarrow \mathcal{D}$$

It is detailed in Table 5.2. Intuitively, the join of two dependencies is the union of the dependencies represented by the two. It is a commutative operation, for which the undisplayed cases in Table 5.2 are defined by their symmetrical counterparts. The operation is total: joining incompatible domains, such as a structure and a variant, or two structures having different field identifiers, results in $\top$, the least precise value. Join is applied pointwise on each subelement; $\bot$ is its identity element and $\top$ is its absorbing element. Joining $\square$ and the dependency of a structure, variant or array is applied pointwise. The value obtained by joining $\delta$ and $\delta'$ is an upper bound of the two:

$$\delta \sqsubseteq \delta \vee \delta' \ \text{ and } \ \delta' \sqsubseteq \delta \vee \delta', \quad \forall \delta, \delta' \in \mathcal{D}$$

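Table 5.2 – $\vee$ – Join Operation

$$\begin{array}{rcl}
\top \vee \delta & = & \top \\
\bot \vee \delta & = & \delta \\
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \vee \{f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n\} & = & \{f_1 \mapsto \delta_1 \vee \delta'_1; \dots; f_n \mapsto \delta_n \vee \delta'_n\} \\
\square \vee \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} & = & \{f_1 \mapsto \square \vee \delta_1; \dots; f_n \mapsto \square \vee \delta_n\} \\
{[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n]} \vee [C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n] & = & [C_1 \mapsto \delta_1 \vee \delta'_1; \dots; C_n \mapsto \delta_n \vee \delta'_n] \\
\square \vee [C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] & = & [C_1 \mapsto \square \vee \delta_1; \dots; C_n \mapsto \square \vee \delta_n] \\
\langle \delta_{def} \rangle \vee \langle \delta'_{def} \rangle & = & \langle \delta_{def} \vee \delta'_{def} \rangle \\
\square \vee \langle \delta_{def} \rangle & = & \langle \square \vee \delta_{def} \rangle \\
\langle \delta_{def} \rangle \vee \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle & = & \langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{def} \vee \delta'_{exc} \rangle \\
\square \vee \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & = & \langle \square \vee \delta_{def} \triangleright i : \square \vee \delta_{exc} \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \vee \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle & = &
\begin{cases}
\langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{exc} \vee \delta'_{exc} \rangle & i = j \\
\langle \delta_{def} \vee \delta_{exc} \vee \delta'_{def} \vee \delta'_{exc} \rangle & i \neq j
\end{cases} \\
\square \vee \square & = & \square
\end{array}$$

Defining the join of two dependencies corresponding to arrays is subtle. As shown in Table 5.1, we are allowing comparisons between dependencies corresponding to arrays with exceptions on different variables (rule AIJ); the join operation in this case amounts to joining the four different dependencies, without keeping any of the two exceptions. We could have chosen to keep one of the known exceptional dependencies, but this would have posed two problems: on the one hand, the join operation would not be commutative; on the other hand, it is hard to predict how the exceptional dependencies would be used at the intraprocedural level and which of the two could potentially lead to a gain in precision. Thus, we adopted this design decision. A strategy possibly worth investigating in such cases would be to allow users to specify array cells of interest at specific program points. This user-supplied information could then be taken into consideration whenever joining array dependencies with two different known exceptional dependencies.

Our current join approach for arrays can lead to non-monotonic approximations in join. This becomes visible when noting that, for a monotonic join operation, the following should hold:

$$\forall \delta, \delta', \rho.\quad \delta \sqsubseteq \delta' \implies \delta \vee \rho \sqsubseteq \delta' \vee \rho \qquad \text{(i)}$$

Considering

$$\rho \equiv \langle \rho_{def} \triangleright i : \rho_i \rangle, \qquad \delta \equiv \langle \delta_{def} \triangleright j : \delta_j \rangle, \qquad \delta' \equiv \langle \delta'_{def} \triangleright i : \delta'_i \rangle, \qquad \text{where } i \neq j,$$

the hypothesis $\delta \sqsubseteq \delta'$ is translated into the following constraints:

$$\delta_{def} \sqsubseteq \delta'_{def}, \qquad \delta_{def} \sqsubseteq \delta'_i, \qquad \delta_j \sqsubseteq \delta'_{def}, \qquad \delta_j \sqsubseteq \delta'_i$$

Applying (i) for these three dependencies, we obtain

$$\langle (\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \rangle \sqsubseteq \langle \delta'_{def} \vee \rho_{def} \triangleright i : \delta'_i \vee \rho_i \rangle$$

which holds if and only if both of the following inequalities hold:

$$(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \sqsubseteq \delta'_{def} \vee \rho_{def}$$
$$(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \sqsubseteq \delta'_i \vee \rho_i$$

Considering for instance

$$\rho_i = \top, \qquad \rho_{def} \neq \top, \qquad \delta_{def} = \delta_j = \delta'_{def} = \bot,$$

a counterexample is found: the left-hand side of the first inequality is $\top$, whereas its right-hand side is $\rho_{def} \neq \top$.

As a consequence of the non-monotonic approximations made for arrays (rule AIJ), the value obtained by joining two dependencies is an upper bound, not a least upper bound. We address this issue and indicate our solution in Section 5.3 (on page 94). We remark that we keep only one exceptional cell for array dependencies, as in practice most operations manipulating arrays tend to either modify only one element or all of them. Logical properties on arrays generally have to hold for all elements. Keeping more than one exceptional dependency would be much more costly and the additional cost would not necessarily be justified in practice. However, the join operation would be more straightforward and would not impose non-monotonic approximations.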
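The counterexample to monotonicity can be replayed mechanically. The sketch below implements the join of Table 5.2 and the order of Table 5.1 over a tuple encoding that is an assumption of this illustration (as are the names `map_sub`, `join` and `leq`), and instantiates the counterexample with $\rho_{def} = \square$ and $\delta'_i = \bot$, two values the text leaves open.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def map_sub(d, f):
    """Apply f to every direct subelement of a structured dependency."""
    if d[0] in ("struct", "variant"):
        return (d[0], {k: f(s) for k, s in d[1].items()})
    if d[0] == "arr":
        return ("arr", f(d[1]))
    return ("arrx", f(d[1]), d[2], f(d[3]))

def join(a, b):
    """Join of Table 5.2; incompatible shapes yield TOP."""
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == NOTHING and b == NOTHING:
        return NOTHING
    if a == NOTHING or b == NOTHING:             # Nothing is joined pointwise
        return map_sub(b if a == NOTHING else a, lambda s: join(NOTHING, s))
    ka, kb = a[0], b[0]
    if ka in ("struct", "variant"):
        if ka != kb or a[1].keys() != b[1].keys():
            return TOP
        return (ka, {k: join(a[1][k], b[1][k]) for k in a[1]})
    if kb not in ("arr", "arrx"):
        return TOP
    if ka == "arr" and kb == "arr":
        return ("arr", join(a[1], b[1]))
    if ka != kb:                                 # exactly one side has an exception
        plain, exc = (a, b) if ka == "arr" else (b, a)
        return ("arrx", join(plain[1], exc[1]), exc[2], join(plain[1], exc[3]))
    if a[2] == b[2]:                             # same exceptional cell
        return ("arrx", join(a[1], b[1]), a[2], join(a[3], b[3]))
    return ("arr", join(join(a[1], a[3]), join(b[1], b[3])))   # different cells

def leq(a, b):
    """Order of Table 5.1 (plus reflexivity)."""
    if a == b or b == TOP or a == BOT:
        return True
    if a == TOP or b in (BOT, NOTHING):
        return False
    if a == NOTHING:
        subs = list(b[1].values()) if b[0] in ("struct", "variant") else \
               [b[1]] if b[0] == "arr" else [b[1], b[3]]
        return all(leq(NOTHING, s) for s in subs)
    ka, kb = a[0], b[0]
    if ka in ("struct", "variant"):
        return (ka == kb and a[1].keys() == b[1].keys()
                and all(leq(a[1][k], b[1][k]) for k in a[1]))
    if kb not in ("arr", "arrx"):
        return False
    if ka == "arr" and kb == "arr":
        return leq(a[1], b[1])
    if ka == "arrx" and kb == "arr":
        return leq(a[1], b[1]) and leq(a[3], b[1])
    if ka == "arr":
        return leq(a[1], b[1]) and leq(a[1], b[3])
    if a[2] == b[2]:
        return leq(a[1], b[1]) and leq(a[3], b[3])
    return all(leq(x, y) for x in (a[1], a[3]) for y in (b[1], b[3]))

rho     = ("arrx", NOTHING, "i", TOP)    # rho_def = Nothing (our choice), rho_i = Everything
delta   = ("arrx", BOT, "j", BOT)        # delta_def = delta_j = Impossible
delta_p = ("arrx", BOT, "i", BOT)        # delta'_def = Impossible; delta'_i is our choice

lhs, rhs = join(delta, rho), join(delta_p, rho)
print(leq(delta, delta_p))   # True: the hypothesis holds by rule AIJ
print(lhs, rhs)              # ('arr', 'TOP') ('arrx', 'NOTHING', 'i', 'TOP')
print(leq(lhs, rhs))         # False: monotonicity fails
```

Both operands remain below the joined value (`leq(delta, lhs)` and `leq(rho, lhs)` hold), which matches the statement that join yields an upper bound, though not necessarily the least one.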

Besides join, a reduction operator, denoted by $\oplus$, has been defined as well.

Definition 5.2.4. Reduction Operator $\oplus$

$$\oplus : \mathcal{D} \times \mathcal{D} \rightarrow \mathcal{D}$$

This is a recursive, commutative, pointwise operation. Intuitively, this operator is introduced for taking advantage of the information additionally computed by the possible-constructors analysis that we perform simultaneously. Following the same execution path, the same constructors must be possible. The reduction operator is used in order to incorporate this additional information computed for constructors. The dependency analysis can be seen as a may analysis, i.e. when combining the dependency information computed at two different points on the same execution path, the result must account for all dependencies computed at any of the two combined points. In contrast, the possible-constructors analysis can be seen as a must analysis, i.e. when combining information at two different points on the same execution path, it needs to keep facts that hold at both combined points. Thus, the reduction operator combines dependencies on the same execution path and consists in performing the intersection of constructors in the case of variants and the union of dependencies for all other types. The reduction operator's role will become more transparent after presenting the intraprocedural dependency analysis and the corresponding data-flow equations in Section 5.3. Its identity element is $\square$ and its absorbing element is $\bot$. The reduction operator between $\top$ and the dependency of a structure, variant or array is applied pointwise. Two instances of identical variant types are pointwise reduced. Similarly to join, the undisplayed cases in Table 5.3 are defined with respect to their symmetrical counterparts.

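$$\begin{array}{rcl}
\bot \oplus \delta & = & \bot \\
\square \oplus \delta & = & \delta \\
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \oplus \{f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n\} & = & \{f_1 \mapsto \delta_1 \oplus \delta'_1; \dots; f_n \mapsto \delta_n \oplus \delta'_n\} \\
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \oplus \top & = & \{f_1 \mapsto \delta_1 \oplus \top; \dots; f_n \mapsto \delta_n \oplus \top\} \\
{[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n]} \oplus [C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n] & = & [C_1 \mapsto \delta_1 \oplus \delta'_1; \dots; C_n \mapsto \delta_n \oplus \delta'_n] \\
{[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n]} \oplus \top & = & [C_1 \mapsto \delta_1 \oplus \top; \dots; C_n \mapsto \delta_n \oplus \top] \\
\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \rangle & = & \langle \delta_{def} \oplus \delta'_{def} \rangle \\
\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle & = & \langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{def} \oplus \delta'_{exc} \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle & = &
\begin{cases}
\langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{exc} \oplus \delta'_{exc} \rangle & \text{where } i = j \\
\langle (\delta_{def} \vee \delta_{exc}) \oplus (\delta'_{def} \vee \delta'_{exc}) \rangle & \text{otherwise}
\end{cases} \\
\langle \delta_{def} \rangle \oplus \top & = & \langle \delta_{def} \oplus \top \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \top & = & \langle \delta_{def} \oplus \top \triangleright i : \delta_{exc} \oplus \top \rangle \\
\top \oplus \top & = & \top
\end{array}$$

Table 5.3 – $\oplus$ – Reduction Operator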
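To make the may/must distinction concrete, the following Python sketch implements most of Table 5.3 over a tuple encoding assumed for this illustration (the names `map_sub` and `reduce_dep` are hypothetical). The case of two arrays with different exceptional cells, which additionally needs the join, is deliberately left out and raises an error.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def map_sub(d, f):
    """Apply f to every direct subelement of a structured dependency."""
    if d[0] in ("struct", "variant"):
        return (d[0], {k: f(s) for k, s in d[1].items()})
    if d[0] == "arr":
        return ("arr", f(d[1]))
    return ("arrx", f(d[1]), d[2], f(d[3]))

def reduce_dep(a, b):
    """Reduction of Table 5.3, minus arrays with different exceptional cells."""
    if BOT in (a, b):                    # Impossible is absorbing
        return BOT
    if a == NOTHING:                     # Nothing is the identity
        return b
    if b == NOTHING:
        return a
    if a == TOP and b == TOP:
        return TOP
    if TOP in (a, b):                    # Everything is reduced pointwise
        return map_sub(b if a == TOP else a, lambda s: reduce_dep(s, TOP))
    ka, kb = a[0], b[0]
    if ka in ("struct", "variant") and ka == kb and a[1].keys() == b[1].keys():
        return (ka, {k: reduce_dep(a[1][k], b[1][k]) for k in a[1]})
    if ka == "arr" and kb == "arr":
        return ("arr", reduce_dep(a[1], b[1]))
    if {ka, kb} == {"arr", "arrx"}:
        plain, exc = (a, b) if ka == "arr" else (b, a)
        return ("arrx", reduce_dep(plain[1], exc[1]), exc[2],
                reduce_dep(plain[1], exc[3]))
    if ka == "arrx" and kb == "arrx" and a[2] == b[2]:
        return ("arrx", reduce_dep(a[1], b[1]), a[2], reduce_dep(a[3], b[3]))
    raise NotImplementedError("shapes not covered by this sketch")

# Illustrative values: a dependency computed further down the path, and a
# local contribution stating that None cannot occur on this path.
successor = ("variant", {"Some": TOP, "None": NOTHING})
local     = ("variant", {"Some": NOTHING, "None": BOT})
print(reduce_dep(successor, local))
# ('variant', {'Some': 'TOP', 'None': 'BOT'})
```

The needed part of Some survives (union of dependencies), while the impossibility of None wins over the weaker fact that only its tag was read (intersection of possible constructors). A join of the same two values would instead keep None as possible.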

Finally, the extractions summarized in Table 5.4 have been defined for dependencies $\delta$ and are used to express the data-flow equations of Section 5.3.

Definition 5.2.5. Extraction of a field's dependency

$$.f : \mathcal{D} \rightharpoonup \mathcal{D}$$

Definition 5.2.6. Extraction of a constructor's dependency

$$@C : \mathcal{D} \rightharpoonup \mathcal{D}$$

Definition 5.2.7. Extraction of an array's cell dependency

$$\langle i \rangle : \mathcal{D} \rightharpoonup \mathcal{D}$$

Definition 5.2.8. Extraction of an array's dependency outside a cell $i$

$$\langle \ast \setminus i \rangle : \mathcal{D} \rightharpoonup \mathcal{D}$$

Definition 5.2.9. Extraction of an array's general dependency

$$\langle \ast \rangle : \mathcal{D} \rightharpoonup \mathcal{D}$$

They are partial functions and can only be applied on dependencies of the corresponding kind. For instance, the field extraction $.f$ only makes sense for atomic or structured values with a field named $f$, which should be the case if the dependency represents a variable of a structured type with some field $f$. For any of the atomic dependencies $\delta_a$, applying any of the defined extractions yields $\delta_a$.

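Table 5.4 – Dependency Extractions

Field extraction $\delta.f$, $f \in F$:

$$\top.f = \top \qquad \square.f = \square \qquad \bot.f = \bot \qquad \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\}.f = \delta_i \ \text{ if } f = f_i$$

Constructor extraction $\delta@C$, $C \in \mathcal{C}$:

$$\top@C = \top \qquad \square@C = \square \qquad \bot@C = \bot \qquad [C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m]@C = \delta_j \ \text{ if } C = C_j$$

Array extractions $\delta\langle \ast \setminus i \rangle$, $\delta\langle i \rangle$ and $\delta\langle \ast \rangle$:

$$\begin{array}{lll}
\top\langle \ast \setminus i \rangle = \top & \top\langle i \rangle = \top & \top\langle \ast \rangle = \top \\
\square\langle \ast \setminus i \rangle = \square & \square\langle i \rangle = \square & \square\langle \ast \rangle = \square \\
\bot\langle \ast \setminus i \rangle = \bot & \bot\langle i \rangle = \bot & \bot\langle \ast \rangle = \bot \\
\langle \delta_{def} \rangle\langle \ast \setminus i \rangle = \delta_{def} & \langle \delta_{def} \rangle\langle i \rangle = \delta_{def} & \langle \delta_{def} \rangle\langle \ast \rangle = \delta_{def}
\end{array}$$

$$\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle \ast \setminus i \rangle =
\begin{cases}
\delta_{def} & \text{when } i = k \\
\delta_{def} \vee \delta_{exc} & \text{otherwise}
\end{cases}$$

$$\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle i \rangle =
\begin{cases}
\delta_{exc} & \text{when } i = k \\
\delta_{def} \vee \delta_{exc} & \text{otherwise}
\end{cases}$$

$$\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle \ast \rangle = \delta_{def} \vee \delta_{exc}$$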
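The extractions of Table 5.4 can be sketched as small accessor functions. In the following Python illustration, the tuple encoding and all function names (`field`, `ctor`, `cell`, `outside`, `whole`, `join_atoms`) are assumptions made here; the join is passed as a parameter, and the default `join_atoms` only handles atomic dependencies, which is enough for the example.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"
ATOMS = (TOP, NOTHING, BOT)

def join_atoms(a, b):
    """Join restricted to atomic dependencies (sufficient for this example)."""
    assert a in ATOMS and b in ATOMS
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    return NOTHING

def field(d, f):                          # field extraction
    if d in ATOMS:
        return d
    assert d[0] == "struct"
    return d[1][f]

def ctor(d, c):                           # constructor extraction
    if d in ATOMS:
        return d
    assert d[0] == "variant"
    return d[1][c]

def cell(d, i, join=join_atoms):          # dependency of cell i
    if d in ATOMS:
        return d
    if d[0] == "arr":
        return d[1]
    _, default, k, exc = d                # ("arrx", default, k, exc)
    return exc if i == k else join(default, exc)

def outside(d, i, join=join_atoms):       # dependency of all cells but i
    if d in ATOMS:
        return d
    if d[0] == "arr":
        return d[1]
    _, default, k, exc = d
    return default if i == k else join(default, exc)

def whole(d, join=join_atoms):            # general dependency of the array
    if d in ATOMS:
        return d
    if d[0] == "arr":
        return d[1]
    return join(d[1], d[3])

threads = ("arrx", NOTHING, "i", TOP)     # only cell i is needed
print(cell(threads, "i"), outside(threads, "i"))   # TOP NOTHING
print(cell(threads, "k"), whole(threads))          # TOP TOP
```

Asking for a cell other than the known exceptional one has to fall back on the join of the default and exceptional dependencies, because the two index variables may or may not denote the same cell.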

5.2.2 Well-Typed Dependencies

The described syntactic dependencies are untyped. However, their interpretation is made in the context of a type $\tau$. Dependencies such as $\square$ or $\top$ do not exhibit any data type features and can apply to any type, but others will be completely constrained, and most will fall in between, uncovering a few layers of structured types before reaching one of the “generic” leaves $\top$, $\square$ or $\bot$. For example, the dependency $\{f \mapsto \delta_f\}$ only really makes sense for structured types with a single field $f$ whose type itself is compatible with $\delta_f$, and shall not be used in connection with variant or array types.

As a consequence, we conclude the presentation of our abstract dependency type by explaining what it means for a dependency to be compatible with some type $\tau$, i.e. to be well-typed of some type $\tau$. This is described as a judgement parameterized by the typing environment $\Gamma$ (Definition 4.3.1), and the different inference rules are detailed in Table 5.5.

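$$\dfrac{}{\Gamma \vdash \top : \tau}\ \textsc{WT}\top
\qquad
\dfrac{}{\Gamma \vdash \bot : \tau}\ \textsc{WT}\bot
\qquad
\dfrac{}{\Gamma \vdash \square : \tau}\ \textsc{WT}\square$$

$$\dfrac{\tau = \mathrm{struct}\{f_1 : \tau_1; \dots; f_n : \tau_n\} \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \dots \quad \Gamma \vdash \delta_n : \tau_n}{\Gamma \vdash \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} : \tau}\ \textsc{WTStruct}$$

$$\dfrac{\tau = \mathrm{variant}[C_1 : \tau_1 \mid \dots \mid C_n : \tau_n] \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \dots \quad \Gamma \vdash \delta_n : \tau_n}{\Gamma \vdash [C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] : \tau}\ \textsc{WTVar}$$

$$\dfrac{\Gamma \vdash \delta_{def} : \tau}{\Gamma \vdash \langle \delta_{def} \rangle : \mathrm{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArr}
\qquad
\dfrac{\Gamma \vdash \delta_{def} : \tau \qquad \Gamma \vdash \delta_{exc} : \tau \qquad \Gamma(i) = \tau_i}{\Gamma \vdash \langle \delta_{def} \triangleright i : \delta_{exc} \rangle : \mathrm{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArrI}$$

Table 5.5 – Well-Typed Dependencies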

The atomic dependency values are generic: they are well-typed with respect to any type (WT$\top$, WT$\square$, WT$\bot$). The dependency $\delta$ for a structure (WTStruct) is well-typed only with respect to an adequate structured type, whose field types are themselves compatible with the dependency mapped to them in $\delta$. Similarly, the dependency $\delta$ for a variant (WTVar) is well-typed only with respect to an adequate variant type. In turn, its constructors must be themselves compatible with the dependency mapped to them in $\delta$. For well-typed array dependencies (WTArr, WTArrI), the default dependency, as well as the exceptional dependency, have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of $i$, the index of the known exceptional dependency, has to be compatible with $\tau_i$, the array's index type.
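The judgement of Table 5.5 can be sketched as a recursive check. In the Python illustration below, the encodings of both types and dependencies, the base types `INT` and `UNIT`, and the name `well_typed` are assumptions of this sketch; in particular, a constructor without arguments is given the argument type `UNIT`, and index compatibility is approximated by type equality.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

# Hypothetical type encoding: ("int",), ("unit",), ("struct", {field: type}),
# ("variant", {constructor: argument type}), ("arr", index type, element type).
INT, UNIT = ("int",), ("unit",)

def well_typed(env, d, ty):
    """Check a dependency d against a type ty in the typing environment env."""
    if d in (TOP, NOTHING, BOT):                 # atomic values fit any type
        return True
    kind = d[0]
    if kind in ("struct", "variant"):            # WTStruct, WTVar
        return (ty[0] == kind and d[1].keys() == ty[1].keys()
                and all(well_typed(env, d[1][k], ty[1][k]) for k in d[1]))
    if ty[0] != "arr":
        return False
    _, index_ty, elem_ty = ty
    if kind == "arr":                            # WTArr
        return well_typed(env, d[1], elem_ty)
    return (env.get(d[2]) == index_ty            # WTArrI: index variable typed
            and well_typed(env, d[1], elem_ty)
            and well_typed(env, d[3], elem_ty))

MEMORY_REGION = ("struct", {"start": INT, "length": INT})
THREAD = ("struct", {"id": INT, "crt_state": INT, "stack": MEMORY_REGION})
OPTION_THREAD = ("variant", {"None": UNIT, "Some": THREAD})
THREADS = ("arr", INT, OPTION_THREAD)

env = {"i": INT}
dep = ("arrx", NOTHING, "i", ("variant", {"Some": TOP, "None": BOT}))
print(well_typed(env, dep, THREADS))                       # True
print(well_typed(env, ("struct", {"f": TOP}), THREADS))    # False
```

The type of `crt_state` is simplified to `INT` here, since the `state` type is not detailed in this section.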

In the following section, we discuss our intraprocedural dependency domain and the manner in which dependencies are computed and manipulated.

5.3 Intraprocedural Analysis and Data-Flow Equations

5.3.1 Intraprocedural Dependency Domains

At an intraprocedural level, dependency information has to be kept at each point of the control flow graph, for each variable of the typing environment $\Gamma$ that maps input, output and local variables to their types. We use the term domain to denote this information.

Definition 5.3.1. Intraprocedural Dependency Domain $\Delta \in \mathbb{D}$. An intraprocedural domain $\Delta \in \mathbb{D}$,

$$\Delta : V \rightarrow \mathcal{D}$$

is a mapping from variables to dependencies.

An intraprocedural domain is associated to every node of the control flow graph, representing the dependencies at the node's entry point. A special case is the mapping which binds all variables to $\bot$, which we call Unreachable:

$$\mathit{Unreachable} \equiv \{x \mapsto \bot\}$$

In particular, it is associated to nodes that cannot be reached during the analysis. Also, if any of the variables of $\Delta$ is marked as $\bot$, the entire node collapses, becoming Unreachable.

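For any node of the control flow graph associated to an intraprocedural domain $\Delta$, $\Delta(x)$ retrieves the dependency associated to the variable $x$. If a dependency for $x$ has not been computed yet, it is mapped to $\square$.

Forgetting a variable $x$ from a reachable intraprocedural domain, denoted by $\Delta \setminus x$, “erases” the variable's dependency information by mapping it to $\square$.

Definition 5.3.2. Forget $x$

$$\Delta \setminus x =
\begin{cases}
\mathit{Unreachable} & \text{when } \Delta = \mathit{Unreachable} \\
\Delta' & \text{otherwise, where } \Delta'(y) =
\begin{cases}
\Delta(y) & \text{when } y \neq x \\
\square & \text{when } y = x
\end{cases}
\end{cases}$$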

The $\sqsubseteq_\Delta$, $\vee_\Delta$ and $\oplus_\Delta$ operations are pointwise extensions of $\sqsubseteq$ (Definition 5.2.2), $\vee$ (Definition 5.2.3) and $\oplus$ (Definition 5.2.4), respectively: they apply to intraprocedural dependency domains, for each variable and its associated dependency $\delta_v$.

We define a partial order $\sqsubseteq_\Delta$ on $\mathbb{D}$.

Definition 5.3.3. Intraprocedural Partial Order $\sqsubseteq_\Delta$

$$\sqsubseteq_\Delta\ \subseteq\ \mathbb{D} \times \mathbb{D}, \qquad \Delta' \sqsubseteq_\Delta \Delta'' \iff \Delta'(x) \sqsubseteq \Delta''(x),\ \forall x \in V$$

In particular, Unreachable is the bottom of this intraprocedural lattice. It is the identity element of the intraprocedural join operation $\vee_\Delta$ and the absorbing element of the intraprocedural reduction operator $\oplus_\Delta$, defined below.

Definition 5.3.4. Intraprocedural Join Operation $\vee_\Delta$

$$\vee_\Delta : \mathbb{D} \times \mathbb{D} \rightarrow \mathbb{D}, \qquad \Delta' \vee_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \vee \Delta''(x),\ \forall x \in V$$

Definition 5.3.5. Intraprocedural Reduction Operator $\oplus_\Delta$

$$\oplus_\Delta : \mathbb{D} \times \mathbb{D} \rightarrow \mathbb{D}, \qquad \Delta' \oplus_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \oplus \Delta''(x),\ \forall x \in \Gamma$$

Finally, an intraprocedural domain $\Delta$ is well-typed with respect to a typing environment $\Gamma$ if and only if the dependency mapped to any variable $x$ is well-typed with respect to $x$'s type in the typing environment $\Gamma$ (Definition 4.3.1).
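The operations on intraprocedural domains introduced in this subsection can be sketched with ordinary dictionaries. In the following Python illustration, representing Unreachable by `None`, treating absent variables as mapped to Nothing, and the names `lookup`, `collapse`, `forget` and `join_dom` are all assumptions; the dependency-level join is restricted to atomic values to keep the example short.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"
UNREACHABLE = None        # stands for the mapping of every variable to Impossible

def join_atoms(a, b):
    """Join restricted to atomic dependencies (sufficient for this example)."""
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    return NOTHING

def lookup(dom, x):
    """Dependency of x; variables not computed yet are mapped to Nothing."""
    if dom is UNREACHABLE:
        return BOT
    return dom.get(x, NOTHING)

def collapse(dom):
    """A domain with an impossible variable collapses to Unreachable."""
    if dom is UNREACHABLE or BOT in dom.values():
        return UNREACHABLE
    return dom

def forget(dom, x):
    """Forget x (Definition 5.3.2): erase its dependency information."""
    if dom is UNREACHABLE:
        return UNREACHABLE
    return {**dom, x: NOTHING}

def join_dom(d1, d2, join=join_atoms):
    """Pointwise join; Unreachable is its identity element."""
    if d1 is UNREACHABLE:
        return d2
    if d2 is UNREACHABLE:
        return d1
    return {x: join(lookup(d1, x), lookup(d2, x)) for x in d1.keys() | d2.keys()}

print(forget({"p": TOP, "ti": TOP}, "ti"))                  # {'p': 'TOP', 'ti': 'NOTHING'}
print(join_dom({"p": TOP, "i": NOTHING}, {"i": TOP}) == {"p": TOP, "i": TOP})  # True
print(collapse({"p": BOT, "i": TOP}))                       # None
```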

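5.3.2 Intraprocedural Data-Flow Equations

Table 5.6 – Statements – Representations and Data-Flow Equations

Representation: a node $n$ holding a statement $s$, with outgoing edges labeled $s, \lambda_1$, ..., $s, \lambda_i$, ..., $s, \lambda_k$ leading to the successor nodes $n_1$, ..., $n_i$, ..., $n_k$, whose intraprocedural domains are $\Delta_{n_1}$, ..., $\Delta_{n_i}$, ..., $\Delta_{n_k}$.

Equation:

$$\Delta_n = \bigvee\nolimits_{\Delta}\ \Big\{ \llbracket s \rrbracket_{\lambda_i}(\Delta_{n_i}) \ \Big|\ n \xrightarrow{\,s,\ \lambda_i\,} n_i \Big\}$$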

Our dependency analysis is a backward data-flow analysis. For each exit label, it traverses the control flow graph starting with its corresponding exit node, and it marks all other exit points as Unreachable, since exit labels are mutually exclusive. The intraprocedural domain for the currently analysed label is initialized with its associated output variables mapped to $\top$. Thereby, the analysis starts by making a conservative approximation and by considering that all the input has been observed and the output depends on it entirely. Typically, dependence analyses are forward analyses. However, given our goal to express label-specific dependencies as input-output relations, and taking into consideration the characteristics of the αSmil language, choosing to design our analysis as a backward data-flow analysis seemed a pertinent choice. In αSmil, outputs are associated to a particular exit label and they are generated if and only if the predicate exits with that particular label. By traversing the control flow graph backwards, we can use this information and consider, starting with the initialisation phase, only the outputs that are relevant for the analysed exit label.

After the initialisation, the analysis then traverses the control flow graph and gradually refines the dependencies until a fixed point is reached. Table 5.6 summarizes the representation and general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural domains of the statement's successor nodes. The intraprocedural domain at the entry point of the node is obtained by joining the contributions of each outgoing edge, as shown in Figure 5.10.

Definition 5.3.6. The contribution of an edge $(n_i, n_j)$ labeled with $s$ and $\lambda$ is given by $\llbracket s \rrbracket_\lambda(\Delta_{n_j})$, where $\llbracket s \rrbracket_\lambda(\cdot)$ is the transfer function of the edge labeled $s, \lambda$.

Dependencies corresponding to variables that are written by a statement $s$ on an exit label $\lambda$, denoted by $gen_s^\lambda$ in Figure 5.10, are forgotten from the intraprocedural domain on which we are operating.

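[Figure 5.10 – Computation of the Intraprocedural Domain at a Node's Entry Point: a statement node with outgoing edges $\lambda_1, \dots, \lambda_n$ towards successors with domains $\Delta_{\lambda_1}, \dots, \Delta_{\lambda_n}$; each edge contributes $(\Delta_{\lambda_i} \setminus gen_s^{\lambda_i}) \oplus_\Delta \delta_s^{\lambda_i}$, where $\delta_s^{\lambda_i}$ is the contribution of $s$ on $\lambda_i$.]

$$\Delta_{in} = \llbracket s \rrbracket_{\lambda_1}(\Delta_{\lambda_1}) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket_{\lambda_n}(\Delta_{\lambda_n}), \qquad \llbracket s \rrbracket_{\lambda_i}(\Delta_i) \triangleq (\Delta_i \setminus gen_s^{\lambda_i}) \oplus_\Delta \delta_s^{\lambda_i}$$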

In Section 5.2.1 we explained that, as a consequence of the non-monotonic approximations made when joining dependencies corresponding to arrays, the result of the join operation is an upper bound, not a least upper bound. In order to deal with this issue, we adopt the generic solution consisting of systematically joining the dependency domain associated to a node before its iteration with the new dependency domain computed by the transfer function. Thus, the dependency domain of a node $n$ is:

$$\Delta_n = old(\Delta_n) \vee_\Delta \Big( \bigvee\nolimits_{\Delta}\ \big\{ \llbracket s \rrbracket_\lambda(\Delta_{n'}) \ \big|\ n \rightarrow n' \big\} \Big)$$

This is not prohibitive in terms of performance, leading to an increase of the execution time of 5 to 10%.
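The iteration scheme, including the systematic join with the previous value of a node, can be sketched generically. The Python code below is a toy illustration under strong assumptions: the function `solve` and its parameters are hypothetical, and the dependency lattice is replaced by a stand-in lattice of sets of needed variables (with set union as join), so that the example stays short. The control flow graph is the linear true path of start_address.

```python
def solve(succs, transfer, join, bottom, init):
    """Backward fixed-point iteration over a control flow graph.

    succs:    node -> list of (label, successor node)
    transfer: (node, label, successor domain) -> contribution of the edge
    init:     initial domains of the exit nodes; other nodes start at bottom
    """
    dom = {n: init.get(n, bottom) for n in succs}
    changed = True
    while changed:
        changed = False
        for n, edges in succs.items():
            new = dom[n]                         # old value of the node
            for label, target in edges:
                new = join(new, transfer(n, label, dom[target]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# Stand-in lattice: sets of needed variables. Each statement is abstracted
# as (written variable, read variables); this is a simplification, not the
# dependency domain of Section 5.2.
stmts = {0: ("tj", {"p", "j"}),      # thread(p, j)[true: tj | ...]
         1: ("sj", {"tj"}),          # sj := tj.stack
         2: ("adr", {"sj"})}         # adr := sj.start
succs = {0: [("true", 1)], 1: [("true", 2)], 2: [("true", 3)], 3: []}

def transfer(n, label, d):
    written, read = stmts[n]
    # The written variable is forgotten; what it needed is forwarded to the reads.
    return (d - {written}) | read if written in d else d

result = solve(succs, transfer, frozenset.union, frozenset(),
               {3: frozenset({"adr"})})
print(sorted(result[0]))   # ['j', 'p']
```

Because each node's new value is joined with its old one, the sequence of values at a node can only grow, which guarantees termination on a finite lattice even when the join is not a least upper bound.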

Tables 5.7, 5.8, 5.9 and 5.10 define the transfer functions for each built-in statement of our language, whereas the general case of a predicate call and its corresponding equation will be detailed in Section 5.4.

Table 5.7 presents the transfer functions for statements which are not type-specific. For equality tests (1), both of the inputs $e_1$, $e_2$ are completely read, whether the test returns true or false. The transfer functions therefore reduce the domain of the corresponding successor node with a domain consisting of $e_1$ and $e_2$, both mapped to $\top$. In the case of assignment (2), the dependency of the written output variable $o$ is forgotten from the successor's intraprocedural domain, thus being mapped to $\square$, and forwarded to the input variable $e$. The transfer function for the nop operation (3) is simply the identity.

53 Intraprocedural Analysis and Data-Flow Equations 95

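Statement | $\llbracket s \rrbracket_{\lambda_i}(\Delta)$
Equality test (1) | $\llbracket e_1 = e_2 \rrbracket_{true}(\Delta) = \Delta \oplus_{\Delta} dep$ and $\llbracket e_1 = e_2 \rrbracket_{false}(\Delta) = \Delta \oplus_{\Delta} dep$, where $dep = \{e_1 \mapsto \top;\ e_2 \mapsto \top\}$
Assignment (2) | $\llbracket o = e \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus_{\Delta} \{e \mapsto \Delta(o)\}$
No Operation (3) | $\llbracket nop \rrbracket_{true}(\Delta) = \Delta$

Table 5.7 – Generic Statements – Data-Flow Equations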
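The three generic transfer functions can be sketched over a simplified domain. The sketch assumes atomic dependencies only, with impossible absorbing for the reduction, nothing neutral, and everything dominating nothing; these atom-level rules and all function names are illustrative assumptions rather than definitions taken from the thesis.

```python
# Assumed atomic dependencies: BOT (impossible), NOTHING, TOP (everything).
RANK = {"BOT": 0, "NOTHING": 1, "TOP": 2}


def reduce_dep(d1, d2):
    """Reduction on atoms: BOT absorbs, NOTHING is neutral, TOP dominates."""
    if "BOT" in (d1, d2):
        return "BOT"
    return d1 if RANK[d1] >= RANK[d2] else d2


def reduce_domain(dom, contribution):
    """Reduce a domain with a local contribution, variable by variable."""
    out = dict(dom)
    for var, dep in contribution.items():
        out[var] = reduce_dep(out.get(var, "NOTHING"), dep)
    return out


def forget(dom, var):
    """Drop the dependency of a written variable."""
    return {v: d for v, d in dom.items() if v != var}


def equality_test(dom, e1, e2):
    """(1) Both operands are completely read, on either exit label."""
    return reduce_domain(dom, {e1: "TOP", e2: "TOP"})


def assignment(dom, o, e):
    """(2) The need on the output o is forwarded to the input e."""
    return reduce_domain(forget(dom, o), {e: dom.get(o, "NOTHING")})


def nop(dom):
    """(3) Identity."""
    return dict(dom)
```

For instance, `assignment({"o": "TOP"}, "o", "e")` moves the need on `o` to `e`, which mirrors equation (2).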

The data-flow equations given in Table 5.8 correspond to structure-related statements. For the equations (4), (5), (6) and (7), we assume that the variable $r$ is of type $struct\{f_1 : \tau_1; \ldots; f_n : \tau_n\}$ for some fields $f_i$, $1 \le i \le n$. The equation (4) refers to the creation of a structure: each input $e_i$ is read as much as the corresponding field $f_i$ of the structure is read. The destructuring of a structure is handled in (5): each field $f_i$ is needed as much as the corresponding variable $o_i$ is. When accessing the $i$-th field of a structure $r$ (6), only the field $f_i$ is read, and only as much as the access' result $o$ itself. The equation (7) treats field updates: the variable $e_i$ is read as much as the field $f_i$ is. The structure $r$ is read as much as all the fields other than $f_i$ are read in $r'$. Finally, the equations given in (8) handle partial structure equality tests, and the transfer functions are the same for the labels true or false: for both compared structures $r'$ and $r''$, all the fields in the given set $\{f_1, \ldots, f_k\}$ are completely read, and only those.

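Statement | $\llbracket s \rrbracket_{\lambda_i}(\Delta)$
Create (4) | $\llbracket r = \{e_1, \ldots, e_n\} \rrbracket_{true}(\Delta) = (\Delta \setminus r) \oplus_{\Delta} \bigoplus_{1 \le i \le n} \{e_i \mapsto \Delta(r).f_i\}$
Destructure (5) | $\llbracket \{o_1, \ldots, o_n\} = r \rrbracket_{true}(\Delta) = (\Delta \setminus \{o_i \mid o_i \in o\}) \oplus_{\Delta} \{r \mapsto \{f_1 \mapsto \Delta(o_1); \ldots; f_n \mapsto \Delta(o_n)\}\}$
Access field (6) | $\llbracket o = r.f_i \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus_{\Delta} \{r \mapsto \{f_1 \mapsto \square; \ldots; f_i \mapsto \Delta(o); \ldots; f_n \mapsto \square\}\}$
Update field (7) | $\llbracket r' = \{r \text{ with } f_i = e_i\} \rrbracket_{true}(\Delta) = (\Delta \setminus r') \oplus_{\Delta} \{e_i \mapsto \Delta(r').f_i;\ r \mapsto \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}\}$, where $\delta_j = \Delta(r').f_j$ if $j \neq i$, and $\delta_j = \square$ otherwise
Equality (8) | $\llbracket r' =_{\langle f_1, \ldots, f_k \rangle} r'' \rrbracket_{true}(\Delta) = \Delta \oplus_{\Delta} d$ and $\llbracket r' =_{\langle f_1, \ldots, f_k \rangle} r'' \rrbracket_{false}(\Delta) = \Delta \oplus_{\Delta} d$, where $d = \{r' \mapsto \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\};\ r'' \mapsto \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}\}$ and $\delta_i = \top$ if $f_i \in \{f_1, \ldots, f_k\}$, and $\delta_i = \square$ otherwise

Table 5.8 – Structure-Related Statements – Data-Flow Equations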
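As an illustration of equations (6) and (7), the sketch below models a structure dependency as a Python dictionary from field names to dependencies, with omitted fields standing for nothing. The reduction used here is a simplified assumption (impossible absorbs, everything dominates, nothing is neutral, structures are reduced field by field), and the helper names are hypothetical.

```python
def reduce_dep(d1, d2):
    """Simplified reduction on atoms and structure dependencies (dicts)."""
    if d1 == "BOT" or d2 == "BOT":
        return "BOT"
    if d1 == "TOP" or d2 == "TOP":
        return "TOP"
    if d1 == "NOTHING":
        return d2
    if d2 == "NOTHING":
        return d1
    fields = set(d1) | set(d2)
    return {f: reduce_dep(d1.get(f, "NOTHING"), d2.get(f, "NOTHING")) for f in fields}


def reduce_domain(dom, contribution):
    out = dict(dom)
    for var, dep in contribution.items():
        out[var] = reduce_dep(out.get(var, "NOTHING"), dep)
    return out


def forget(dom, var):
    return {v: d for v, d in dom.items() if v != var}


def field_of(dep, field):
    """The part of a dependency that concerns one field."""
    return dep.get(field, "NOTHING") if isinstance(dep, dict) else dep


def access_field(dom, o, r, fi):
    """(6) o = r.fi : only fi is read, as much as o is needed."""
    return reduce_domain(forget(dom, o), {r: {fi: dom.get(o, "NOTHING")}})


def update_field(dom, r_new, r, fi, e, fields):
    """(7) r_new = {r with fi = e} : e takes over the need on fi."""
    need = dom.get(r_new, "NOTHING")
    kept = {f: ("NOTHING" if f == fi else field_of(need, f)) for f in fields}
    return reduce_domain(forget(dom, r_new), {e: field_of(need, fi), r: kept})
```

With `dom = {"r2": "TOP"}`, updating field `a` of a two-field structure yields a domain in which `e` is entirely needed while `r` is only needed for its other field `b`.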

96 Chapter 5 Dependency Analysis for Functional Specifications

The data-flow equations given in Table 5.9 correspond to variant-related statements. They follow the same principles as those used for structure-related statements above. Note that the transfer functions for the switch (10) and possible constructor test (11) introduce $\bot$ dependencies for constructors which are known to be impossible on the considered edge. In particular, since $\bot$ is an absorbing element for $\oplus$, these transfer functions erase, for every constructor which is known to be locally impossible, all the dependency information possibly attached to such a constructor in the successor nodes. This is the actual raison d'être for the reduction operator, since using $\vee_{\Delta}$ to combine a successor domain and a local contribution would lose this information.

Finally, the equations for array-related statements are given in Table 5.10. We assume for both that the context is fixed and that $I$ is the distinguished set of input variables for the analysed predicate. This set is used to make sure that exceptions in array dependencies are only registered to variables in $I$, and not local or output variables. The reason for such a constraint is pragmatic: input variables are not assignable in our language and therefore they always represent the same value intraprocedurally. Otherwise, each time a variable is written by a statement, we would need to traverse all the dependencies in the domain to erase or reinterpret the occurrences where this variable appears as an exception. Only recording exceptions for input variables makes this kind of costly traversal useless, and since only exceptions about input variables make sense at the interprocedural level (see Section 5.4), we do not lose much precision by doing so.
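The difference between reducing and joining at a variant switch can be sketched as follows. The sketch assumes atomic dependencies under each constructor and expands an atomic dependency over all constructors, which blurs the distinction between nothing and a variant whose constructors are all mapped to nothing; the function names are illustrative only.

```python
# Assumed atomic dependencies: BOT (impossible), NOTHING, TOP (everything).
RANK = {"BOT": 0, "NOTHING": 1, "TOP": 2}


def atom_join(a, b):
    """Join on atoms: the larger of the two."""
    return max(a, b, key=RANK.get)


def atom_reduce(a, b):
    """Reduction on atoms: BOT is absorbing, otherwise the larger of the two."""
    if "BOT" in (a, b):
        return "BOT"
    return max(a, b, key=RANK.get)


def expand(dep, ctors):
    """View an atomic dependency as a variant dependency (a simplification)."""
    return dep if isinstance(dep, dict) else {c: dep for c in ctors}


def combine(op, d1, d2, ctors):
    """Apply op constructor by constructor."""
    e1, e2 = expand(d1, ctors), expand(d2, ctors)
    return {c: op(e1[c], e2[c]) for c in ctors}


def variant_switch(dom, v, ctors, taken, out):
    """(10) On the edge of constructor `taken`, every other constructor is BOT."""
    contribution = {c: (dom.get(out, "NOTHING") if c == taken else "BOT") for c in ctors}
    rest = {x: d for x, d in dom.items() if x != out}
    rest[v] = combine(atom_reduce, rest.get(v, "NOTHING"), contribution, ctors)
    return rest
```

Reducing `{"Some": "NOTHING", "None": "TOP"}` with the contribution `{"Some": "TOP", "None": "BOT"}` keeps `None` impossible, whereas joining the same two values maps `None` back to everything and loses the information.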

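Statement | $\llbracket s \rrbracket_{\lambda_i}(\Delta)$
Create variant (9) | $\llbracket v = C_p[e] \rrbracket_{true}(\Delta) = (\Delta \setminus v) \oplus_{\Delta} \{e \mapsto \Delta(v).C_p\}$
Variant switch (10) | $\llbracket switch(v) \text{ as } [o_1 | \ldots | o_n] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus_{\Delta} \{v \mapsto dep_i\}$, where $dep_i = [C_1 \mapsto \bot; \ldots; C_i \mapsto \Delta(o_i); \ldots; C_n \mapsto \bot]$
Possible variant (11) | $\llbracket v \in \{C_1, \ldots, C_k\} \rrbracket_{true}(\Delta) = \Delta \oplus_{\Delta} \{v \mapsto [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n]\}$, where $\delta_i = \Delta(v).C_i$ if $C_i \in \{C_1, \ldots, C_k\}$, and $\delta_i = \bot$ otherwise
 | $\llbracket v \in \{C_1, \ldots, C_k\} \rrbracket_{false}(\Delta) = \Delta \oplus_{\Delta} \{v \mapsto [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n]\}$, where $\delta_i = \Delta(v).C_i$ if $C_i \notin \{C_1, \ldots, C_k\}$, and $\delta_i = \bot$ otherwise

Table 5.9 – Variant-Related Statements – Data-Flow Equations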

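Statement | $\llbracket s \rrbracket_{\lambda_i}(\Delta)$
Array access (12) | $\llbracket o = a[i] \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus_{\Delta} \{i \mapsto \top;\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle\}$ when $i \in I$
 | $\llbracket o = a[i] \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus_{\Delta} \{i \mapsto \top;\ a \mapsto \langle \Delta(o) \vee \square \rangle\}$ when $i \notin I$
 | $\llbracket o = a[i] \rrbracket_{false}(\Delta) = \Delta \oplus_{\Delta} \{i \mapsto \top;\ a \mapsto \langle \square \rangle\}$
Array update (13) | $\llbracket a' = [a \text{ with } i = e] \rrbracket_{true}(\Delta) = (\Delta \setminus a') \oplus_{\Delta} \{i \mapsto \top;\ e \mapsto \Delta(a')\langle i \rangle;\ a \mapsto \langle \Delta(a')\langle * \setminus i \rangle \triangleright i : \square \rangle\}$ when $i \in I$
 | $\llbracket a' = [a \text{ with } i = e] \rrbracket_{true}(\Delta) = (\Delta \setminus a') \oplus_{\Delta} \{i \mapsto \top;\ e \mapsto \Delta(a')\langle * \rangle;\ a \mapsto \langle \Delta(a')\langle * \rangle \vee \square \rangle\}$ when $i \notin I$
 | $\llbracket a' = [a \text{ with } i = e] \rrbracket_{false}(\Delta) = \Delta \oplus_{\Delta} \{i \mapsto \top;\ a \mapsto \langle \emptyset \rangle\}$

Table 5.10 – Array-Related Statements – Data-Flow Equations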

The transfer functions for (12) and (13) thus take care of making adequate approximations when exceptions cannot be introduced. As for the cases when the array access exits with the false label, note that the contribution to the array $a$ is $\langle \square \rangle$, which is strictly less precise than $\square$. The operation makes implicit bounds checking, and this can thus be seen as accounting for the fact that no cell in $a$ has been read, but the "length" or "support" of $a$ has been read. Hence, it would not be correct to claim that the result of the statement does not depend on $a$ at all. Similarly, a variant dependency $[C_1 \mapsto \square; \ldots; C_n \mapsto \square]$ mapping all constructors to nothing has not read any value in any of the constructors, but may still depend on the variant's constructor itself. In contrast, we do not make this distinction for structures because we assume surjective pairing, i.e. structure values consist only of the fields themselves. Our solution can easily be adapted in order to deal with non-surjective cases.
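The case distinction made by the array access equation (12) can be sketched by computing only the local contribution of the statement. The tuple encoding `("array", default, exception)` and the function names are assumptions made for this illustration, and dependencies of cells are kept atomic.

```python
# Assumed atomic dependencies: BOT (impossible), NOTHING, TOP (everything).
RANK = {"BOT": 0, "NOTHING": 1, "TOP": 2}


def join(a, b):
    return max(a, b, key=RANK.get)


def access_true(dom, o, a, i, inputs):
    """Contribution of o = a[i] on label true.

    An array dependency is encoded as ("array", default, exception), where
    exception is either None or a pair (index variable, dependency).
    """
    need = dom.get(o, "NOTHING")
    if i in inputs:
        # i is an input: only cell i is needed, as much as o is.
        array_dep = ("array", "NOTHING", (i, need))
    else:
        # i is not an input: no exception may be recorded, so the need
        # on o is spread over every cell.
        array_dep = ("array", join(need, "NOTHING"), None)
    return {i: "TOP", a: array_dep}


def access_false(a, i):
    """Contribution on label false: no cell is read, only the support of a."""
    return {i: "TOP", a: ("array", "NOTHING", None)}
```

When the index is an input the result isolates a single cell; otherwise the whole array inherits the need on the accessed value, which is the approximation discussed above.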

5.3.3 Intraprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an intraprocedural level, we exemplify the mechanism behind it, step by step, on the predicate thread discussed in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis and compare the actual obtained results with the targeted ones, depicted in Figure 5.5.

Since a predicate can only exit with one label at a time and we are considering the true label, we can map the nodes None and oob to Unreachable, as shown in Figure 5.11. This is an advantage of backwards analyses. For true, we make a pessimistic assumption and map the output ti to $\top$, considering that control on the output is external, and hence out of our reach, and that ti will be entirely needed by a potential caller. Going further up the control flow graph, we analyse the variant switch.

[Figure 5.11 – Analysing Predicate thread – Initialisation. Control flow graph: th = p.threads, followed on true by tio = th[i] (true to the switch, false to the exit node oob), followed by switch(tio) as [ | ti] (Some to the exit node true, None to the exit node None). Annotations: exit node true: ti $\mapsto \top$; exit nodes None and oob: Unreachable.]

In order to compute the dependency for the node corresponding to the variant switch, we apply the data-flow equation given by (10) in Table 5.9. Since we are analysing the true case, we know that all other constructors (only the constructor None in this case) are locally impossible. Thus, we map it to $\bot$. We continue by forgetting the dependency information we knew about the output ti. Since its value is needed only in as much as the result of the switch on the corresponding edge is needed, we forward it to the part corresponding to the Some constructor. This is summarized below.

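[Figure 5.12 – Applying the Variant Switch Equation. The dependency of the output ti is forwarded to the constructor $C_{Some}$ of the variant tio, while the other constructors $C_1, \ldots, C_n$ are mapped to $\bot$, following $\llbracket switch(v) \text{ as } [o_1 | \ldots | o_n] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus \{v \mapsto dep_i\}$, where $dep_i = [C_1 \mapsto \bot; \ldots; C_i \mapsto \Delta(o_i); \ldots; C_n \mapsto \bot]$.]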

Taking all this into account, for the node corresponding to the variant switch we obtain the dependency shown in Figure 5.13. For the output ti we depend entirely on the Some constructor of the node's input variant tio, while the constructor None is impossible.

Making a step further up the graph, we access the cell i of the array th and apply the equation (12) given in Table 5.10. We begin by forgetting the dependency for the output tio, since this is written. Since we only access the element i, we map all other cells to Nothing, i.e. $\square$. To the dependency corresponding to the i-th cell, we forward the dependency we knew about tio, since we depend on it to the extent to which the result of the access is needed.

[Figure 5.13 – Analysing Predicate thread – Variant Switch. Same control flow graph as in Figure 5.11. Annotations: switch node: tio $\mapsto$ [Some $\mapsto \top$; None $\mapsto \bot$]; exit node true: ti $\mapsto \top$; exit nodes None and oob: Unreachable.]

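[Figure 5.14 – Applying the Array Access Equation. The dependency of tio is forwarded to the cell i of the array th (cells 1 to n), following $\llbracket o = a[i] \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus \{i \mapsto \top;\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle\}$ when $i \in I$, and $\llbracket o = a[i] \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus \{i \mapsto \top;\ a \mapsto \langle \Delta(o) \vee \square \rangle\}$ when $i \notin I$.]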

We thus obtain a dependency stating that we depend only on the i-th cell of the array th, for which only the constructor Some is possible and entirely needed. The cell's index i is entirely needed as well. The applied equation is shown in Figure 5.14 (since i is an input, we use the first case of the equation) and the obtained results are shown in Figure 5.15.

As a last step, we access the field threads of the input process p and apply the equation (6) given in Table 5.8 and illustrated in Figure 5.16. As before, we forget the information for th, the access result. We map all other fields to $\square$ and we forward the dependency of the variable th to the dependency part of the field threads.

We thus obtain the dependency result shown in Figure 5.17. This states that, for the label true, the output ti depends only on the i-th cell of the field threads of the input process p, for which it depends entirely on the Some constructor. Before returning the predicate's final results, the analysis filters out any dependency information referring to local variables and verifies that the invariant imposed on dependency information related to arrays holds. Since the results refer only to the inputs p and i, and the index of the exceptional computed dependency is an input, the invariant holds and the final result can be retrieved. The final dependency results obtained for the thread predicate on the exit label true are identical to the ones that we were targeting and that were depicted in Figure 5.5. For readability considerations, for structures such as the input process p, we omit dependencies on fields mapped to $\square$. We maintain this convention throughout the rest of this chapter, and thus any field of a structure that is omitted from a dependency summary should be interpreted as being mapped to $\square$, i.e. nothing.

[Figure 5.15 – Analysing Predicate thread – Array Access. Same control flow graph as in Figure 5.11. Annotations: array access node: th $\mapsto \langle \square \triangleright i : [Some \mapsto \top; None \mapsto \bot] \rangle$, i $\mapsto \top$; switch node: tio $\mapsto$ [Some $\mapsto \top$; None $\mapsto \bot$]; exit node true: ti $\mapsto \top$; exit nodes None and oob: Unreachable.]

[Figure 5.16 – Applying the Field Access Equation. The dependency of th is forwarded to the field threads of the structure p (fields $f_1, \ldots, f_n$), following $\llbracket o = r.f_i \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus \{r \mapsto \{f_1 \mapsto \square; \ldots; f_i \mapsto \Delta(o); \ldots; f_n \mapsto \square\}\}$.]
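The three backward steps of this walk-through can be replayed with a small sketch. It is a hedged illustration only: dependencies are encoded as tagged tuples of our own choosing, each transfer function simply assigns the new dependency because the written-to variable has no prior entry in this example, and the function names are hypothetical.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"


def forget(dom, var):
    return {k: d for k, d in dom.items() if k != var}


def switch(dom, v, ctors, taken, out):
    """Equation (10): the other constructors are impossible on this edge."""
    new = forget(dom, out)
    new[v] = ("variant", {c: (dom[out] if c == taken else BOT) for c in ctors})
    return new


def array_access(dom, o, a, i):
    """Equation (12), first case: i is an input, so an exception is recorded."""
    new = forget(dom, o)
    new[a] = ("array", NOTHING, (i, dom[o]))
    new[i] = TOP
    return new


def field_access(dom, o, r, field):
    """Equation (6): omitted fields stand for NOTHING."""
    new = forget(dom, o)
    new[r] = ("struct", {field: dom[o]})
    return new


def analyse_thread_true():
    """Backward pass over the three statements of thread, for label true."""
    dom = {"ti": TOP}                                   # initialisation
    dom = switch(dom, "tio", ["Some", "None"], "Some", "ti")
    dom = array_access(dom, "tio", "th", "i")
    dom = field_access(dom, "th", "p", "threads")
    inputs = {"p", "i"}
    return {v: d for v, d in dom.items() if v in inputs}  # keep inputs only
```

Calling `analyse_thread_true()` returns a summary in which `p` is only needed through the cell `i` of its field `threads`, restricted to the `Some` constructor, and `i` is entirely needed.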

5.4 Interprocedural Dependencies

Exit labels, presented in Section 3.1.2 and in Section 4.1 (on page 63), constitute an increased source of expressivity, as they indicate the scenario that was observed while executing a predicate. We incorporate this expressivity in our dependency results by computing specific dependencies for each possible execution scenario. Therefore, our analysis is performed label by label, and interprocedural dependency domains associate an intraprocedural domain to each exit label of the analysed predicate. The variable key-set of each associated intraprocedural domain comprises the inputs of the analysed predicate. A label that cannot be returned is mapped to an Unreachable intraprocedural domain. This is a form of path-sensitivity (Robert and Leroy 2012). However, we favor the term label-sensitivity for this characteristic, as it seems to be a more natural choice applied to our case and the language we are working on.

[Figure 5.17 – Analysing Predicate thread – Field Access. Same control flow graph as in Figure 5.11. Annotations: entry node: p $\mapsto \{threads \mapsto \langle \square \triangleright i : [Some \mapsto \top; None \mapsto \bot] \rangle\}$, i $\mapsto \top$; array access node: th $\mapsto \langle \square \triangleright i : [Some \mapsto \top; None \mapsto \bot] \rangle$, i $\mapsto \top$; switch node: tio $\mapsto$ [Some $\mapsto \top$; None $\mapsto \bot$]; exit node true: ti $\mapsto \top$; exit nodes None and oob: Unreachable.]

An interprocedural domain of a predicate $p$ is thus defined as shown below.

Definition 5.4.1 Interprocedural Dependency Domain

$D_p : \Lambda_p \rightarrow D$, where $\Lambda_p$ is the set of output labels of predicate $p$.

For each analysed label of a predicate, the analysis starts by initializing the intraprocedural domain mapped to it with the output variables associated to the exit label. To avoid making any false assumption, these are initially mapped to the most general dependency, namely $\top$. Subsequently, as described in Section 5.3.2, the dependency information is gradually refined until a fixed point is reached. The execution scenarios denoted by the exit labels of a predicate are mutually exclusive. Therefore, during the analysis of a particular exit label, all other exit labels of the predicate are mapped to Unreachable. After reaching a fixed point, the intraprocedural domain is filtered so that only input variables appear in the variable set. As explained in Section 5.3.2, the intraprocedural domains are built such that only input variables may appear as exception indices in dependencies computed for arrays. This invariant is preserved throughout the analysis.

Interprocedural dependency information is expressed in terms of the formal parameters of predicates. For analysing predicate calls, we need to substitute the formal parameters of the callee by the ones that are supplied by the caller. Therefore, a substitution must be performed on interprocedural summaries. This consists in substituting all occurrences of formal input parameters of a predicate by the corresponding effective input parameters. The substitution operation is denoted as J (χ), where χ is a substitution from formal to effective parameters.


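We proceed by detailing the equation corresponding to a call to a predicate

$$p(e_1, \ldots, e_n)[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]$$

having the following signature:

$$p(\varepsilon_1, \ldots, \varepsilon_n)[\lambda_1 : \omega_1 \mid \ldots \mid \lambda_m : \omega_m]$$

The general equation (given in Table 5.6) applies:

$$\Delta_n = \mathop{\bigvee\nolimits_{\Delta}}\limits_{n \xrightarrow{s, \lambda_i} n_i} \llbracket p(e_1, \ldots, e_n)[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] \rrbracket_{\lambda_i}(\Delta_{n_i})$$

The transfer functions for the predicate call statement are deduced from the predicate's interprocedural domain in the following fashion:

$$\llbracket p(e_1, \ldots, e_n)[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus \bigoplus_{j \in 1..n} \{e_j \mapsto dep_{ij}\} \qquad \text{(PredEq)}$$

where $dep_{ij}$ = $D_p(\lambda_i)(\varepsilon_j)$ J (ε ↦ e)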

Namely, the mappings for the outputs $o$ associated to a label $\lambda_i$ are removed, and the contribution of a call to each input $e_j$ stems from the contribution of the interprocedural domain for label $\lambda_i$ and formal input $\varepsilon_j$. In these, all the formal input parameters $\varepsilon$ in array dependency domains are substituted by the corresponding effective input parameters from $e$.
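The substitution of formal by effective parameters can be sketched on a tagged-tuple encoding of dependencies. The encoding and the names `substitute` and `call_contribution` are assumptions made for this illustration; the sketch also assumes that distinct formals are bound to distinct effective parameters, since it does not reduce contributions that would land on the same variable.

```python
def substitute(dep, chi):
    """Rename exception indices of array dependencies according to chi."""
    if isinstance(dep, str):          # atomic dependency
        return dep
    tag = dep[0]
    if tag in ("struct", "variant"):
        return (tag, {k: substitute(d, chi) for k, d in dep[1].items()})
    # ("array", default, exception) with exception None or (index, dependency)
    _, default, exception = dep
    if exception is not None:
        index, exc_dep = exception
        exception = (chi.get(index, index), substitute(exc_dep, chi))
    return ("array", substitute(default, chi), exception)


def call_contribution(summary, formals, actuals):
    """Contribution of a call for one exit label.

    `summary` maps each formal input to its dependency for that label.
    Assumes distinct formals are bound to distinct effective parameters.
    """
    chi = dict(zip(formals, actuals))
    return {chi[f]: substitute(summary[f], chi) for f in formals}
```

Applied to a summary of `thread` for the label true with formals `p`, `i` and effective parameters `p`, `j`, the exception index inside the array dependency is renamed from `i` to `j`.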

An αSmil program is analysed by computing, once and for all, an interprocedural dependency domain for every predicate. These are stored in a mapping binding predicate identifiers to their interprocedural dependency domains. Whenever a predicate call is handled intraprocedurally, the corresponding computed interprocedural dependency summary is retrieved from the mapping, propagated to the calling site and used as explained above. If an interprocedural dependency summary for a called predicate has not been computed yet, it is handled as if it were an implicit predicate. In practice, in programs generated in αSmil from Smil, predicates are sorted in topological order when possible. For implicit predicates, described in Chapters 3 and 4, a pessimistic assumption is made: it is considered that everything in their inputs has been read and is needed for any of their possible exit labels. Since their implementation is hidden, a conservative approximation must be made in their case.

Inductive predicates have been discussed in Section 3.1.4 (on page 46). They are specification-only predicates and represent a disjunction of cases. Each case can introduce existentially quantified variables. An inductive predicate exits with the true label if any of its declared cases holds. Therefore, for inductive predicates, one analysis per case is made. For the true exit label, the dependency results are obtained by joining the results of all cases. For the false label, everything is considered to be read.


5.4.1 Interprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an interprocedural level, we revisit our start_address example predicate introduced in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis and compare the actual obtained results with the targeted ones, depicted in Figure 5.8.

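[Figure 5.18 – G_start_address – Dependency Information. Control flow graph: thread(p, j)[true : tj | None | oob] (true to the next node, None to the exit node None, oob to the exit node oob), followed by sj = tj.stack, followed on true by adr = sj.start, followed on true by the exit node true; the graph also contains an exit node error. Annotations: node sj = tj.stack: tj $\mapsto \{stack \mapsto \{start \mapsto \top\}\}$; node adr = sj.start: sj $\mapsto \{start \mapsto \top\}$; exit node true: adr $\mapsto \top$.]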

We begin by initialising the output adr with $\top$ and continue by traversing the control flow graph backwards and by computing the dependency information at each node. We apply the data-flow equation (6) given in Table 5.8 and we obtain the intermediate results shown in Figure 5.18.

To compute the dependency information of the control flow graph's entry node, i.e. the one corresponding to a predicate call to thread, we use the dependency summary computed for this predicate for the exit label true and we substitute the formal parameters, i.e. p and i, appearing in it with the effective arguments of the call, i.e. p and j. We thus obtain the following dependency summary:

p $\mapsto \{threads \mapsto \langle \square \triangleright j : [Some \mapsto \top; None \mapsto \bot] \rangle\}$, j $\mapsto \top$

We apply the data-flow equation (PredEq) corresponding to a predicate call, discussed on page 102, and make use of the dependency information corresponding to the successor node on the edge marked with true:

tj $\mapsto \{stack \mapsto \{start \mapsto \top\}\}$

thus obtaining the following final dependency result:

p $\mapsto \{threads \mapsto \langle \square \triangleright j : [Some \mapsto \top; None \mapsto \bot] \rangle\}$, j $\mapsto \top$

However, the targeted results for start_address, depicted in Figure 5.8, would translate to:

p $\mapsto \{threads \mapsto \langle \square \triangleright j : [Some \mapsto \{stack \mapsto \{start \mapsto \top\}\}; None \mapsto \bot] \rangle\}$, j $\mapsto \top$

Clearly, the dependency information computed by our analysis and shown in Figure 5.19 is an over-approximation of the results that we had envisioned. The obtained dependency summary states that the entire j-th associated thread of the input process p is needed in order to obtain the output adr on the true exit label. However, in reality, only one of this thread's fields is actually needed, namely the stack field, for which only one subelement – the start field – is read. This loss of precision is a consequence of the dependency information mapped to the Some constructor at the control flow graph's entry node, corresponding to a call to the thread predicate. When executing successfully and exiting with label true, the thread predicate returns the i-th associated thread of its input process. However, the predicate thread does not need this element itself: it does not read nor use it per se, it merely retrieves it. The dependency on this returned element is relative to the amount in which the predicate's callers will use it. The start_address predicate, for instance, depends only on one of the 3 fields of the returned thread. Yet, by mapping the i-th thread to $\top$ in thread's dependency summary, we fail to mirror this distinction: $\top$ is the top element of our dependency domain, and joining it with any other dependency will lead to $\top$, thus shadowing any other information we might compute while observing its usage.

5.4.2 Context-Insensitivity and its Consequences

Precision losses in dependency summaries, such as the one detected in our previous example, are a direct consequence of considering and analysing predicates in isolation. There is a level of information that goes beyond a predicate's own control flow graph, and a more detailed picture that can emerge once non-local information connected to the predicate's use, i.e. the calling context, is included into the analysis.

Interprocedural analyses that consider the calling context when analysing the target of a function – or, in our case, a predicate – call are context-sensitive analyses (Hind 2001). As the name implies, context-sensitive analyses can jump back to the original call site, using context information for the results they compute. Context-insensitive analyses, on the other hand, dispense with such information and propagate back to all possible call sites the information that they compute once. This is a notorious source of potential precision loss in static analysis. Choosing either one of these two traits has significant consequences: on the one hand, by choosing to ignore the calling context and the additional information it supplies, one pays a high price in terms of precision, and on the other hand, by choosing to include such information, one risks sacrificing scalability.

[Figure 5.19 – G_start_address – Final Dependency Results. Same control flow graph as in Figure 5.18. Annotations: entry node thread(p, j)[true : tj | None | oob]: p $\mapsto \{threads \mapsto \langle \square \triangleright j : [Some \mapsto \top; None \mapsto \bot] \rangle\}$, j $\mapsto \top$; node sj = tj.stack: tj $\mapsto \{stack \mapsto \{start \mapsto \top\}\}$; node adr = sj.start: sj $\mapsto \{start \mapsto \top\}$; exit node true: adr $\mapsto \top$.]

Our dependency analysis, as presented so far, is context-insensitive: for each predicate, the analysis computes a dependency summary once, stores it and further propagates it to its callers whenever needed. Considering that αSmil predicates are sequences of calls to other predicates, built-in or user-defined, as discussed in Chapter 4, if we adopted a purely context-sensitive solution, we would gain in terms of precision but we would obtain results that are prohibitive in terms of performance. This is a typical trade-off of static analyses. We address this issue and describe our solution in detail in Chapter 6. Without adopting context-sensitivity to the letter, we strike a balance between the two alternatives by including lazy components in our interprocedural dependency summaries and by using them for injecting the current intraprocedural context on an as-needed basis. As will be discussed in Chapters 6 and 8, this approach leads to improved precision with only a marginal decrease in performance.

5.5 Semantics of Dependency Values

There are two different manners of interpreting dependency values $\delta$: one focusing on the possible constructors part, and the other focusing on the dependency part. In both cases, the interpretations are relative to a type $\tau$ and hold only for well-typed dependencies of the same type. The set of types that a dependency is compatible with has been discussed in Section 5.2.2 and defined in Table 5.5.

First, focusing on the possible constructors aspect, dependencies can be interpreted as a constraint on the forms that values may take. Such constraints can arise as a consequence of $\bot$, i.e. impossible, appearing in nested dependencies. These are described by a characteristic function $\mathbb{1}$:

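$$\mathcal{DD} = \{(v, \delta) \in \mathbb{D} \times \mathcal{D} \mid \delta \in \mathcal{D},\ \tau \in T,\ v \in \mathbb{D}_{\tau},\ \Gamma \vdash \delta : \tau\} \qquad \mathbb{1} : \mathcal{DD} \rightarrow \{0, 1\}$$

This is defined as follows below.

Definition 5.5.1 Characteristic function $\mathbb{1}$

$$\mathbb{1}(v, \top) = 1 \qquad \mathbb{1}(v, \square) = 1 \qquad \mathbb{1}(v, \bot) = 0$$

$$\mathbb{1}(\{f_1 = v_1; \ldots; f_n = v_n\}, \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}) = \begin{cases} 1 & \text{when } \mathbb{1}(v_i, \delta_i),\ \forall 1 \le i \le n \\ 0 & \text{otherwise} \end{cases}$$

$$\mathbb{1}(C_i[v], [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n]) = \begin{cases} 1 & \text{when } \mathbb{1}(v, \delta_i) \\ 0 & \text{otherwise} \end{cases}$$

$$\mathbb{1}((P, (v_k)_{k \in P}), \langle \delta_{def} \rangle) = \begin{cases} 1 & \text{when } \mathbb{1}(v_k, \delta_{def}),\ \forall k \in P \\ 0 & \text{otherwise} \end{cases}$$

$$\mathbb{1}((P, (v_k)_{k \in P}), \langle \delta_{def} \triangleright i : \delta_{exc} \rangle) = \begin{cases} 1 & \text{when } (\mathbb{1}(v_k, \delta_{def}),\ \forall k \in P,\ k \neq E(i)) \vee (E(i) \in P,\ \mathbb{1}(v_{E(i)}, \delta_{exc})) \\ 0 & \text{otherwise} \end{cases}$$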
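The first cases of this definition translate directly into a recursive check. The sketch below is restricted to atoms, structures, variants and arrays without an exception; the encoding of values (dictionaries for structures and arrays, pairs for variants) and the name `admits` are assumptions made for the illustration.

```python
def admits(value, dep):
    """Characteristic function, without the array-exception case.

    Values: structures are dicts field -> value, variants are pairs
    (constructor, argument), arrays are dicts index -> value.
    Dependencies: "TOP", "NOTHING", "BOT", ("struct", {...}),
    ("variant", {...}) or ("array", default).
    """
    if dep in ("TOP", "NOTHING"):
        return True
    if dep == "BOT":
        return False
    tag, body = dep
    if tag == "struct":
        # Every listed field must satisfy its own dependency.
        return all(admits(value[f], d) for f, d in body.items())
    if tag == "variant":
        # Only the dependency of the actual constructor matters.
        ctor, arg = value
        return admits(arg, body[ctor])
    if tag == "array":
        # Every cell of the support must satisfy the default dependency.
        return all(admits(cell, body) for cell in value.values())
    raise ValueError("unknown dependency: %r" % (dep,))
```

With the dependency `("variant", {"Some": "TOP", "None": "BOT"})`, a value built with `Some` is admitted while a value built with `None` is rejected, which is how an impossible constructor constrains the shape of values.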

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2, Table 5.1) defined on dependencies. If a dependency is more precise or equal to another dependency, then it should be interpreted as constraints which are at least as strong as the ones for the other dependency. Given a typing environment $\Gamma$ (Definition 4.3.1):

$$\forall \tau \in T^{*},\ \delta \sqsubseteq \delta' \implies (\mathbb{D}_{\tau} \cap \mathbb{1}(\bullet, \delta)) \subseteq (\mathbb{D}_{\tau} \cap \mathbb{1}(\bullet, \delta'))$$

where $T^{*} = \{\tau \in T \mid \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau\}$.

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the constraints semantics of dependencies is that if two dependencies $\delta$ and $\delta'$ can be interpreted as constraints for a value $v$, then their reduction can be interpreted as a constraint for $v$ as well:

$$\mathbb{1}(v, \delta) \wedge \mathbb{1}(v, \delta') \implies \mathbb{1}(v, \delta \oplus \delta')$$

The converse, which one might expect to be true as well, does not hold because of approximations made by our treatment of arrays.

Given a valuation $E$ (Definition 4.4.2), an intraprocedural dependency summary can be interpreted as a conjunction of the constraints on every variable's value, as given by its associated dependency. We use the notation $E \models \Delta$ to indicate this:

$$E \models \Delta \implies \forall v \in V,\ \mathbb{1}(E(v), \Delta(v))$$

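Under the appropriate conditions, given a semantic transition $\xrightarrow{\lambda}$ (Definition 4.4.4) from the configuration $\langle E, [s] \rangle$ (Definition 4.4.3) to the valuation $E'$, as defined in Section 4.4, if the intraprocedural summary $\Delta'$ of the statement $s$'s successor on label $\lambda$ represents the semantic interpretation of constraints given $E'$, then the contribution $\llbracket s \rrbracket_{\lambda}(\Delta')$ (Definition 5.3.6) of the edge labeled with $s$ and $\lambda$ must necessarily represent the semantic interpretation of constraints given $E$. We thus obtain the following:

$$\begin{array}{lr} \Gamma \vdash E \implies & (5.1) \\ \Sigma, \Gamma, O \vdash s \rightarrow \lambda \implies & (5.2) \\ \langle E, [s] \rangle \xrightarrow{\lambda} E' \implies & (5.3) \\ \Gamma, E' \vdash \Delta' \implies & (5.4) \\ E' \models \Delta' \implies & (5.5) \\ E \models \llbracket s \rrbracket_{\lambda}(\Delta') & (5.6) \end{array}$$

We note that, thanks to the subject reduction property (Definition 4.4.7), (5.3) implies that $\Gamma \vdash E'$.

Following from (5.6), when joining the contributions on all labels of the statement $s$, the obtained intraprocedural dependency summary represents the semantic interpretation of the disjunction of constraints given $E$:

$$\begin{array}{l} (E \models \llbracket s \rrbracket_{\lambda_1}(\Delta'_1)) \vee_{\Delta} \cdots \vee_{\Delta} (E \models \llbracket s \rrbracket_{\lambda_n}(\Delta'_n)) \implies \\ E \models (\llbracket s \rrbracket_{\lambda_1}(\Delta'_1) \vee_{\Delta} \cdots \vee_{\Delta} \llbracket s \rrbracket_{\lambda_n}(\Delta'_n)) \implies \\ E \models old(\Delta) \implies \\ E \models old(\Delta) \vee_{\Delta} (\llbracket s \rrbracket_{\lambda_1}(\Delta'_1) \vee_{\Delta} \cdots \vee_{\Delta} \llbracket s \rrbracket_{\lambda_n}(\Delta'_n)) \end{array}$$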

For a predicate $p$ exiting with label $\lambda$ and having the intraprocedural summary $\Delta_{\lambda}$, the characteristic function, given $I \subseteq E$, a valuation mapping the predicate's inputs to their values, constrains the space of inputs that can make the predicate exit with the label $\lambda$. It thus denotes the necessary conditions on inputs according to the observed execution scenario and can be used as an inversion lemma when reasoning on calls to a predicate.

The soundness of this interpretation, as well as the well-formedness of our dependencies, have been proven in Coq, and the corresponding files can be consulted online¹. The mechanized Coq proofs are entirely due to Stéphane Lescuyer. These proofs also deal with deferred dependencies that will be presented in Chapter 6, but these constitute an extension that does not modify the underlying lattice.

The second interpretation of dependency values focuses on the dependency part and is a partial equivalence relation $\approx$:

$$TD = \{(\tau, \delta) \in T \times \mathcal{D} \mid \Gamma \vdash \delta : \tau\} \qquad \approx\ : TD \rightarrow \mathbb{D} \times \mathbb{D}$$

The partial equivalence relation $\approx^{\tau}_{\delta}$ relates well-typed values of the same type $\tau$. It relates values that only differ in places that are irrelevant according to the dependency $\delta$. It is defined as shown below.

1The corresponding files are provided at the following address httpajl-demofr2015proveCoq

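Definition 5.5.2 Partial Equivalence Relation $\approx^{\tau}_{\delta}$

$$\approx^{\tau}_{\top} = \{(x, x) \mid x \in \mathbb{D}_{\tau}\} \qquad \approx^{\tau}_{\square} = \{(x, y) \mid x, y \in \mathbb{D}_{\tau}\} \qquad \approx^{\tau}_{\bot} = \{(x, y) \mid x, y \in \mathbb{D}_{\tau}\}$$

$$\approx^{struct\{f_1 : \tau_1; \ldots; f_n : \tau_n\}}_{\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}} = \{(\{f_1 = v_1; \ldots; f_n = v_n\}, \{f_1 = w_1; \ldots; f_n = w_n\}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i}\}$$

$$\approx^{variant[C_1 : \tau_1 | \ldots | C_n : \tau_n]}_{[C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n]} = \{(C_i[v_i], C_i[w_i]) \mid (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i}\}$$

$$\approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \rangle} = \{((P, (v_k)_{k \in P}), (P, (w_k)_{k \in P})) \mid \forall k,\ (v_k, w_k) \in\ \approx^{\tau}_{\delta_{def}}\}$$

$$\approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle} = \{((P, (v_k)_{k \in P}), (P, (w_k)_{k \in P})) \mid E(i) \in P \implies (v_{E(i)}, w_{E(i)}) \in\ \approx^{\tau}_{\delta_{exc}},\ \forall k \neq E(i),\ (v_k, w_k) \in\ \approx^{\tau}_{\delta_{def}}\}$$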
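This relation can be read as a recursive comparison that ignores the irrelevant parts of two values. The following sketch assumes a tagged-tuple encoding of dependencies, dictionaries for structures and arrays, pairs for variants, and a valuation `env` giving the value of exception indices; none of these names come from the thesis.

```python
def equiv(v, w, dep, env=None):
    """Are v and w indistinguishable on the parts that dep declares relevant?"""
    if dep == "TOP":
        return v == w                      # everything is relevant
    if dep in ("NOTHING", "BOT"):
        return True                        # all pairs are related
    tag = dep[0]
    if tag == "struct":
        # Fields omitted from the dependency stand for NOTHING.
        return all(equiv(v[f], w[f], d, env) for f, d in dep[1].items())
    if tag == "variant":
        # Same constructor, and arguments related by its dependency.
        return v[0] == w[0] and equiv(v[1], w[1], dep[1][v[0]], env)
    if tag == "array":
        _, default, exception = dep
        if v.keys() != w.keys():           # both arrays share the support P
            return False
        for k in v:
            cell_dep = default
            if exception is not None and k == env[exception[0]]:
                cell_dep = exception[1]    # the exceptional cell E(i)
            if not equiv(v[k], w[k], cell_dep, env):
                return False
        return True
    raise ValueError("unknown dependency: %r" % (dep,))
```

For the dependency obtained for the array `th` in the walk-through of Section 5.3.3, two arrays with the same support are related as soon as they agree on the cell designated by `i`, whatever the other cells contain.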

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2) defined on dependencies. If a dependency is more precise or equal to another dependency, then it should be interpreted as an equivalence relation relating more values:

$$\delta \sqsubseteq \delta' \implies\ \approx^{\tau}_{\delta}\ \supseteq\ \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau$$

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the equivalence relation interpretation of dependencies is that the set of values related by $\delta \oplus \delta'$ is a subset of the intersection of values related by $\delta$ and $\delta'$ respectively:

$$\approx^{\tau}_{\delta \oplus \delta'}\ \subseteq\ \approx^{\tau}_{\delta} \cap \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau$$

The interpretation of the $\vee$ operator (Definition 5.2.3, Table 5.2) with respect to the equivalence relation interpretation of dependencies is similar:

$$\approx^{\tau}_{\delta \vee \delta'}\ \subseteq\ \approx^{\tau}_{\delta} \cap \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau$$

Given two valuations $E$ and $E'$, they are equivalent modulo an intraprocedural dependency summary $\Delta$ if the values that they associate to variables are equivalent modulo the corresponding dependency associated in $\Delta$:

$$E \approx^{\Gamma}_{\Delta} E' \implies \forall v \in \Delta,\ E(v) \approx^{\Gamma(v)}_{\Delta(v)} E'(v)$$

The equivalence relation $\approx^{\Gamma}_{\Delta}$ thus relates valuations that are not distinguishable by only looking at the parts specified by the intraprocedural dependency summary $\Delta$.

This interpretation can be used to apply congruence modulo reasoning to predicate calls. If a predicate $p$ is called with two sequences of input values, $v$ and $u$ respectively, which are related by the intraprocedural dependency summary of $p$ on label $\lambda$, then the predicate will necessarily exercise the same execution scenario, exiting with label $\lambda$, and will yield identical outputs $w$.

5.6 Related Work

The frame problem and its manifestations in the software verification process – detecting program properties that remain unchanged under a certain operation – are notorious (Leavens, Leino, and Müller 2007; Leavens and Clifton 2005; O'Hearn 2005). A complete specification of a program will necessarily include frame properties (Borgida, Mylopoulos, and Reiter 1995). However, though necessary, specifying and verifying frame properties is tedious and repetitive. Two prominent solutions to the frame problem come from separation logic (Reynolds 2005; Distefano, O'Hearn, and Yang 2006; Calcagno et al. 2011) and ownership types (Clarke and Drossopoulou 2002). However, Meyer (Meyer 2015) argues that the problem itself should not impose such annotation-heavy solutions. Simpler, automatic solutions for their specification and verification would allow programmers to concentrate on the truly challenging part (Meyer 2015).

Though we share the same desideratum with separation logic (Reynolds 2002; Reynolds 2005; O'Hearn 2012; O'Hearn, Yang, and Reynolds 2004), the programming paradigm and context under which we operate lead to a considerably different solution. Separation logic is targeted at low-level imperative programming languages and its applications focus on shared mutable data structures. We, on the other hand, focus on a purely functional language and consider immutable algebraic data structures and arrays. We treat mappings between variables and values and analyse their evolution in a side-effect free environment, in the context of verification of programs where a new output is obtained by altering just a subset of the input's subelements and preserving the rest. Instead of using a collection of Hoare triples as an abstract domain, we have defined our own dependency domain. The results of our dependency analysis are close to the concept of a footprint (Distefano, O'Hearn, and Yang 2006; Hur, Dreyer, and Vafeiadis 2011; Bobot and Filliâtre 2012) in the sense that they describe an over-approximation of only those variables and subelements that are needed by a program and are expressed as an input-output relation.

The dependency results computed by our analysis are similar to the primitive read and write effects used in ownership type systems (Clarke and Drossopoulou 2002). Write effects in our case are implicit and include strictly the output variables associated to an exit label. Read effects can only refer to input variables of a predicate. Also, read effects there comprise the whole execution of a method, even if they are irrelevant for the method's results. We, however, ignore read effects on which the output does not depend, reflecting only those which contribute to the observed result. A technique for declaring and verifying read effects in an ownership type system is presented in (Clarke and Drossopoulou 2002). We use static analysis to automatically detect them. In the Spec# (Mike Barnett 2005) program verifier, the notion of confined is used for describing the reading effects of a pure method in terms of the ownership cone (Clarke, Potter, and Noble 1998) of its parameters.

In (Hughes 1987), Hughes argues that analyses of programs that manipulate data structures should ideally distinguish between the information they are computing for a data structure as a whole and the information computed for each component within it. The information that is computed by a backward analysis is dubbed generically as context. A manner of constructing richer domains is described, and it is argued that, for instance, a context for a sum type must contain (sub)contexts for any of its summands. Similarly, for product types, a context should include a (sub)context for each component, as well as a context referring to the value as a whole. We target fine-grained dependency information for structures, variants and arrays. Similarly to the described product type contexts, our dependencies for structures describe the dependency on each of the structure's fields. Variant dependencies are expressed in terms of the dependencies of their constructors, i.e. their summands. Furthermore, it is argued that any context should include a maximal element interpreted as a "no information" value, a minimal element interpreted as "contradictory requirements", and an element representing "no context" or "unused". Close to the notion of "contradictory requirements", we include an atomic value denoting impossible in our dependency domain. Program points having a "contradictory requirements" context denote points in the program that will lead to crashes if reached. Our notion of impossible refers to nodes that are unreachable or to constructors that cannot occur on a given execution path. Our maximal element, denoting everything, is a safe value close to the notion of "no information". Nothing, an element different from both everything and impossible, is similar to the notion of "unused". It denotes (sub)elements that are irrelevant, and constitutes quite definite information.

Hughes (Hughes 1987) introduces a notion of needed/unneeded parameters for programs manipulating lists. This enables detecting whether the value of a subterm is ignored. The method is formulated in terms of a fixed, finite set of projection functions. Multiple other approaches and analyses focus on the elimination of unnecessary data structures (Cousot and Cousot 1994), on filtering useless arguments and unnecessary variables in the context of logic programming (Leuschel and Sørensen 1996) and, more recently, on removing redundant arguments (Alpuente, Escobar, and Lucas 2007).

The concept of a context is further discussed by Wadler and Hughes in (Wadler and Hughes 1987). The authors describe a technique for strictness analysis for non-flat list domains that relies on contexts represented using the notion of projections from domain theory. These allow expressive list descriptions, such as contexts specifying that, while a list's elements can be ignored, its length is relevant. Their backward analysis computes the necessary information using a fixed, finite abstract domain.

Leino and Müller (Leino and Müller 2008b) present a technique for verifying that methods that query the state of identical data structures return identical or equivalent results. They stress the frequency of such assumptions in program verification, as well as the counter-intuitive amount of effort required for the specification and verification of such equivalent-results methods and their callers. One of the two interpretations of our dependency values, $\approx^{\tau}_{\delta}$, is an equivalence relation binding pairs of values that are not distinguishable by considering only the parts specified by the dependency domain. Thus, it ensures not only that identical input data structures will lead to identical results, but also that different invocations of a predicate with input data structures that are congruent with respect to this interpretation will lead to identical results. Our dependencies are similar to the influence sets presented by Leino and Müller. Influence sets are represented as sets of heap locations, and they are used to specify the parts of the program state that are allowed to impact the return values. Influence sets are user-defined, and they are required to be self-protecting. This property is enforced by requiring the set of path expressions specifying the influence set to be prefix-closed, a constraint which is then checked syntactically. In contrast, our dependencies are computed by static analysis. Influence sets may depend on the heap. Reasoning about heap locations is beyond the scope of our analysis. We treat mappings between variables and values, analyse their evolution in a side-effect free environment, and express dependencies as input-output relations. The technique presented by Leino and Müller has been applied for reasoning about pure methods (Leino, Müller, and Wallenburg 2008; Hatcliff et al. 2012; Nordio et al. 2010; Banerjee and Naumann 2014).

Identifying the input (sub)parts on which a predicate's outputs depend can also be seen as an instance of secure information flow (Sabelfeld and Myers 2003), where the predicate's outputs and the input (sub)parts appearing in the predicate's dependency summary have a low security level, i.e. are public, and everything else has a high security level, i.e. is private. The first interpretation of our dependency values mirrors the notion of non-interference as given by Volpano et al. in (Volpano, Irvine, and Smith 1996) for deterministic programs. By only observing the public parts, nothing can be concluded about the private parts. The link between permissions and ownership types has been underlined by Zhao and Boyland (Zhao and Boyland 2008).
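The non-interference reading can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the dictionary encoding of a process, the function `thread` and the helper `agree_on_summary` are hypothetical stand-ins for the αSmil accessor predicate and its dependency summary, not part of the analysis itself.

```python
# Hypothetical Python model of the accessor predicate `thread`:
# a process is a dict and its threads array is a list.
def thread(p, i):
    """Return the i-th thread associated with process p."""
    return p["threads"][i]

def agree_on_summary(p1, p2, i):
    """Equivalence induced by the summary: only cell i of p.threads is observed."""
    return p1["threads"][i] == p2["threads"][i]

# Two processes that differ everywhere except in cell 1 of their threads array.
p1 = {"pid": 1, "threads": ["t0", "t1"], "crt_thread": 0, "adr_space": "A"}
p2 = {"pid": 9, "threads": ["zz", "t1"], "crt_thread": 1, "adr_space": "B"}

# Inputs that agree on the "public" part yield the same output,
# whatever the "private" parts (pid, other cells, ...) contain.
assert agree_on_summary(p1, p2, 1)
assert thread(p1, 1) == thread(p2, 1)
```

Here the cell at index 1 plays the role of the public part; varying any other field of the process cannot be observed through the result of `thread`.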

Liu and Stoller present a backward dependence analysis for the computation of dead code (Liu and Stoller 2003). They obtain expressive descriptions of partially dead recursive data using liveness patterns. These are based on general regular tree grammars that were extended with two notions, live and dead. Users can specify liveness patterns at particular program points of interest. The analysis then uses these and computes liveness patterns at all program points, based on constraints derived from the programming language semantics and the program itself. The obtained information is meant to be used for identifying and eliminating dead code. In a separate paper (Liu 1998), Liu presents three approximation operations meant to guarantee termination in the context of fixed point computations using general grammar transformers on potentially infinite grammar domains.

Static dependence or liveness analyses are typically used for code optimization, dead code elimination (Liu and Stoller 2003), and compile-time garbage collection, but only seldom for program verification. One exception that we are aware of comes from Frama-C (Cuoq et al. 2012), where it is used in a purely automatic setting and, unlike our analysis, it does not handle unions and arrays. A plug-in based on the available value analysis (Frama-C Value Analysis User Manual) computes lists of input and output locations for each function, distinguishing between operational, functional and imperative inputs and outputs. Dependencies computed for an output o hold if and when the analysed function terminates. They are represented as sets of variables whose initial value can influence the final value of o. Input variables appearing in this set are called functional inputs. Imperative inputs are the locations that may be read during the execution of the analysed function. An over-approximation of the set of these locations is computed: locations that are read only in non-terminating branches are included in the imperative inputs set as well. Operational inputs are the memory zones that are read without having been previously written to.

5.7 Conclusion

In the context of interactive formal verification of complex systems, considerable effort is spent on proving the preservation of the system's invariants. However, most operations have a localised effect on the system, which only really impacts a few invariants at a time. Identifying those invariants that are unaffected by an operation can substantially ease the proof burden for the programmer.

In this chapter, we have presented a data-flow analysis that computes a conservative approximation of the input fragments on which the operations depend. It is a flow-sensitive, path-sensitive, interprocedural dependency analysis that handles arrays, structures and variants. For the latter, it simultaneously computes a subset of possible constructors. We have defined our own abstract dependency domain, and we obtain dependency information that mirrors the layered structure of compound data types.

The main original traits of this contribution stem from its design as an analysis meant to be used as a companion tool during interactive program verification, in a unified manner, on programs as well as on specifications.

We have implemented a prototype of the dependency analysis in OCaml, and we have applied it to a functional specification of ProvenCore (Lescuyer 2015), a general-purpose microkernel that ensures isolation. Its proof is based on multiple refinements between successive models, from the most abstract one, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. Medium-sized experiments performed on the abstract layers of ProvenCore show positive results. For instance, the dependency results of approximately 630 αSmil predicates, totalling approximately 10000 lines of code, are obtained in less than 1 second. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative dependency summaries without sacrificing scalability. The implementation and the obtained results will be presented and discussed in detail in Chapter 8. The prototype can be tested on the web page (see footnote 2) dedicated to our dependency analysis, where various examples are provided and explained. Additionally, users can devise and test their own examples.

An obvious first challenge is to address the issue of context-sensitivity. In the following chapter, we present a solution based on lazy components, which are included in our interprocedural dependency summaries. The current intraprocedural context is injected in them on an as-needed basis. As we will show in Chapter 6, these lead to improved precision with only a marginal decrease in performance.

Footnote 2: Dependency Analysis Web Page, httpajl-demofr2015

Our main goal is to combine the dependency analysis with the correlation analysis presented in Chapter 7, which is meant to detect relations between inputs and outputs. By uncovering partial equivalence relations between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, it could have applications in the testing realm: the computed dependency information could be used for designing and generating test suites that avoid redundant testing of the same execution scenario. Based on the second interpretation, $\approx^{\tau}_{\delta}$, of our dependency information given in Section 5.5, classes of inputs that will test the same execution scenario can be determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted, and their values should be varied for more comprehensive testing. Since our dependency analysis computes results for every exit label of an αSmil predicate, it could also facilitate unit testing for exceptions. Furthermore, the computed dependency information could provide assistance in specifying read effects of predicates, similar to accessible clauses (Leavens et al. 2006) in JML.

The dependency analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen, and Lescuyer 2015).


Chapter 6

Deferred Dependencies: Injecting Context in Dependency Summaries

No symbols where none intended

Samuel Beckett

6.1 Dealing with Context-Insensitivity

Traditionally, the precision of static analyses is characterized along several axes, including the scope of the analysis, i.e. intraprocedural or interprocedural analyses, and different nuances of sensitivity relative to the analysis' use of control-flow information or of information pertaining to the calling context. This classification and terminology has its origins in data-flow analyses (Hind 2001; Midtgaard 2012). Regarding scope, intraprocedural analyses are local and operate within the boundaries of procedures. In contrast, interprocedural analyses are global and operate across procedure calls (Midtgaard 2012). These are somewhat more challenging and costly to perform, and impose dealing with parameter mechanisms.

Another important distinction is made regarding the calling context. Context-sensitive analyses distinguish between different calling contexts. At the other end of the spectrum, context-insensitive analyses compute information only once and subsequently use the same information at all calling sites. Clearly, a context-sensitive analysis is more precise than a context-insensitive analysis, but it is also more costly (Nielson, Nielson, and Hankin 1999). The choice between which technique to use amounts to a careful balance between precision and efficiency (Nielson, Nielson, and Hankin 1999). The dependency analysis presented in the previous chapter is an interprocedural, flow-sensitive, context-insensitive data-flow analysis. Regarding pure context-sensitivity in a functional language such as αSmil, in which predicate calls and the manipulation of the returned outputs are omnipresent, unfolding predicates at each call site and recomputing the needed information seems to be a daunting perspective that risks becoming prohibitive in terms of execution time very quickly. On the other hand, choosing to analyse predicates in isolation and to dispense completely with information regarding the calling context leads to clear precision losses, as illustrated in Section 5.4.1 and discussed in Section 5.4.2. In order to address this aspect, we have devised a solution based on symbolic dependencies, which requires an extension of our abstract dependency domain (Definition 5.2.1), but which otherwise has a minimal impact on the dependency analysis at an intraprocedural and interprocedural level.

Outline. In this chapter, we present our solution based on symbolic dependencies. We start by illustrating the addressed problem and the desired results in Section 6.2. In Section 6.3 and Section 6.4, we present the extended abstract dependency domain. We show the insertion and use of symbolic components at the intra- and interprocedural level of our dependency analysis in Section 6.5 and Section 6.6, respectively. Finally, we discuss their impact on the precision of the computed dependency information.

6.2 Symbolic Dependency Components in a Nutshell

Symbolic dependency components allow us to compute interprocedural predicate summaries with lazy components, in which the caller's intraprocedural information and context can be injected on an as-needed basis. The interprocedural dependency information for each predicate is still computed only once and propagated back to every possible call site. However, even though the analysis does not systematically recompute the dependency for the called predicate, it shows a form of context-sensitivity (Hind 2001) and leads to increased precision by creating templates with symbolic elements for each predicate. These elements introduce degrees of freedom in our interprocedural dependencies and allow us to parameterize and vary them according to the caller's actual intraprocedural context. Thus, we exclude some sources of coarse over-approximations without sacrificing scalability.

Previously, in Section 5.4.1, we illustrated on two αSmil example predicates, thread and start_address, how failing to take into consideration the current context of a caller leads to over-approximations. We argued in Section 5.4.2 that a more precise dependency blueprint can emerge once we consider a predicate's use as well. The first example predicate given in Chapter 5, thread, is an accessor predicate: it receives a process p and an index i as inputs and returns the i-th associated thread of the process p when executing successfully, i.e. when exiting with the true label. The dependency summary computed for the predicate's successful execution scenario was the following:

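\[
\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright i : [\, \mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
i \mapsto \top
\end{array}
\]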

This dependency information is expressive: it shows that only one of the 4 fields of the input process is read by the predicate, while all others are irrelevant for its output. The read field, threads, corresponds to the array of threads associated to the input process p. Furthermore, the dependency summary shows that for this array only the i-th element is inspected. This element is entirely needed, while all others are irrelevant.


This summary provides a rather detailed and precise blueprint of the predicate's output dependencies on its inputs. Yet, it fails to make one subtle but important distinction regarding the dependency on the i-th element of the associated threads array. If we want to be more accurate while describing this predicate's dependency, we need to acknowledge that the predicate itself does not actually need or depend on the i-th associated thread of the process. Indeed, it does not read or use it per se; it merely retrieves it. Thus, the dependency on the input process' i-th associated thread is relative to the extent to which the callers of the thread predicate will use the output element in which it is retrieved. It is important to distinguish between these two rather subtle nuances. Failing to do so can shadow information that is computed while analysing callers of the thread predicate. This was exactly what happened for our second example predicate, start_address. The predicate start_address receives a process p and an index j as inputs. It makes a call to the predicate thread, thus reading the j-th associated element of the process p. If this is an active element, it further accesses the field stack, from which it only reads the start address start. The obtained dependency result

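\[
\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright j : [\, \mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
j \mapsto \top
\end{array}
\]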

was an over-approximation of the desired dependency result:

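\[
\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright j : [\, \mathit{Some} \mapsto \{ t \mapsto \{ \mathit{stack} \mapsto \{ \mathit{start} \mapsto \top \} \} \};\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
j \mapsto \top
\end{array}
\]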

Intraprocedurally, the dependency analysis was correctly detecting that only the field stack of the thread was needed, for which only the start field was read. However, when joining the dependency information computed locally for start_address with the one given by the dependency summary of the predicate thread, we obtain less precise dependency results. This scenario is not a corner case: it would typically be exhibited in the case of accessor predicates and their callers.

In order to address this source of precision loss, we can introduce symbolic or lazy components in our abstract dependency domain. As a first attempt and approximation, we could consider the set of output variables of a predicate as the lazy components. These can be seen as the points at which a caller predicate may insert its intraprocedural information in the dependency summary computed for the callee predicate.

The dependency summary for a successful execution of the thread predicate, i.e. the true exit label, would therefore not map the i-th element of the threads array to everything, i.e. $\top$, the top element of our abstract dependency domain. Instead, this would be mapped to the symbolic set of output variables in which this input subelement is retrieved, i.e. the set containing the ti output variable. We denote this set by Deferred(ti), as it represents the set of points in which a caller predicate can inject its context. Establishing the dependency on the i-th associated thread of the input process p is thus deferred, or postponed, and left to the caller predicates: it is relative to their context and the extent to which they use the output ti.


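\[
\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright i : [\, \mathit{Some} \mapsto \{ t \mapsto \mathit{Deferred}(ti) \};\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
i \mapsto \top
\end{array}
\]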

Using this dependency summary when computing the information for the predicate start_address, we would obtain the targeted dependency result:

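\[
\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright j : [\, \mathit{Some} \mapsto \{ t \mapsto \{ \mathit{stack} \mapsto \{ \mathit{start} \mapsto \mathit{Deferred}(adr) \} \} \};\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
j \mapsto \top
\end{array}
\]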

This dependency summary for start_address shows that the dependency on the j-th associated thread of the input process p depends on the extent to which the output adr, representing the start address of the thread's stack, is subsequently used. Indeed, start_address itself is an accessor predicate.
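The injection of a caller's context into a summary can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the analysis itself: dependencies are encoded as nested dictionaries, the array cell is flattened into a hypothetical key `"threads[i]"`, and the names `Deferred`, `inject` and `caller_use` are introduced here purely for illustration.

```python
from dataclasses import dataclass

TOP = "everything"  # stands for the top element of the dependency domain

@dataclass(frozen=True)
class Deferred:
    var: str  # output variable whose later use decides the dependency

def inject(dep, context):
    """Replace every Deferred(v) in dep by the caller's dependency on v."""
    if isinstance(dep, Deferred):
        # No information about how v is used: stay conservative.
        return context.get(dep.var, TOP)
    if isinstance(dep, dict):
        return {key: inject(sub, context) for key, sub in dep.items()}
    return dep

# Simplified summary of `thread` on the exit label true (cell i only).
thread_summary = {"threads[i]": {"Some": {"t": Deferred("ti")}}}

# Hypothetical caller context: start_address only reads ti.stack.start
# and hands it out through its own output adr.
caller_use = {"ti": {"stack": {"start": Deferred("adr")}}}

start_address_summary = inject(thread_summary, caller_use)
print(start_address_summary)
```

Injecting an empty context falls back to everything at the deferred position, which corresponds to the context-insensitive result of Chapter 5.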

This first approximation of lazy components as sets of output variables of a predicate is effective for accessor predicates. However, its limitations become visible when considering functional, non-destructive mutator predicates, for example. Such predicates receive a compound input, destructure it and construct a new output variable. This is created by modifying only one of the compound input's subelements and by copying all the rest without further changes. For example, the predicate set_thread shown below is the dual of our thread example predicate. It receives a process p, a thread ti and an index i as inputs, and returns a new process r as an output, obtained by setting the i-th associated thread in the threads array to ti and by copying everything else from p.

    predicate set_thread (process p, int i, thread ti)
    -> [ true: process r ]
      array<option<thread>> threads
      option<thread> tio

      r = p                               [ true -> 1 ]
      threads = r.threads                 [ true -> 2 ]
      tio = Some(ti)                      [ true -> 3 ]
      threads = [ threads with i = tio ]  [ true -> 4, false -> 6 ]
      r = { r with threads = threads }    [ true -> 5 ]
      [ true ]
      [ error ]

The dependency summary computed for this predicate on the exit label true is shown below. It indicates that the given inputs, the index i and the thread ti used for updating the i-th associated thread of the output process r, are completely needed. For the input process p, the fields pid, crt_thread and adr_space are completely needed as well. They are copied without further changes to the output r. From the array of associated threads, all elements except the i-th are needed as well. The latter is completely irrelevant, since it is replaced in the output r by the given ti. The former are simply read and copied to r.


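\[
\begin{array}{l}
p \mapsto \left\{
\begin{array}{l}
\mathit{threads} \mapsto \langle\, \top \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \top \\
\mathit{crt\_thread} \mapsto \top \\
\mathit{adr\_space} \mapsto \top
\end{array}
\right\} \\
i \mapsto \top \\
ti \mapsto \top
\end{array}
\]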

At first glance, this dependency summary seems to reflect rather accurately the predicate's inputs and input subelements on which the output process r depends. However, similarly to the accessor predicate thread, a further distinction is possible. The predicate set_thread does not itself depend on the input ti, nor on the fields of the process p. It does not use these for new computations; it simply copies them to the corresponding output subelements. Just as before, the extent to which the output's subelements are used subsequently characterizes more precisely the dependency on the inputs of set_thread. For instance, the dependency on p's current thread field should be the symbolic element corresponding to the crt_thread field of the output process. However, our first attempt at representing symbolic elements as sets of output variables seen as a whole does not allow us to convey such information. For expressing it, we first need to be able to refer to the substructure r.crt_thread and use this as a lazy component in which callers may inject their own context. Similarly, for the threads array we need to be able to refer to all other elements except the i-th one. Thus, at the symbolic dependencies level as well, we need the capability of distinguishing between the different subelements of the inputs. This would allow us to obtain the following dependency summary:

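\[
\begin{array}{l}
p \mapsto \left\{
\begin{array}{l}
\mathit{threads} \mapsto \langle\, \mathit{Deferred}(r.\mathit{threads}.\langle * \setminus i \rangle) \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \mathit{Deferred}(r.\mathit{pid}) \\
\mathit{crt\_thread} \mapsto \mathit{Deferred}(r.\mathit{crt\_thread}) \\
\mathit{adr\_space} \mapsto \mathit{Deferred}(r.\mathit{adr\_space})
\end{array}
\right\} \\
i \mapsto \top \\
ti \mapsto \mathit{Deferred}(r.\mathit{threads}.\langle i \rangle.\mathit{Some}.t)
\end{array}
\]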

One way to capture the actual effect that is due to set_thread consists in replacing all deferred dependencies with $\square$, i.e. nothing, and simplifying the summary. The dependency summary thus obtained shows the dependency on set_thread's inputs in the extreme case of calling the predicate and throwing away its result. In this case, the summary for set_thread would show that the predicate only depends on the input i and on the length, or support, of the threads array, captured by $\langle \square \rangle$. On the contrary, by replacing the deferred dependencies with $\top$, i.e. everything, we obtain exactly the results computed by the context-insensitive dependency analysis presented in Chapter 5. The information thus obtained shows the dependency on set_thread's inputs when considering the other end of the spectrum, namely calling the predicate and using its result entirely.
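The two extreme instantiations can be sketched as follows. This is a hedged Python illustration: the dictionary encoding of the summary, the keys `"default"` and `"cell i"` used for the two parts of the array dependency, the path strings, and the helper `close` are all assumptions made for this example, and the simplification step mentioned above is deliberately left out.

```python
from dataclasses import dataclass

TOP, NOTHING = "everything", "nothing"

@dataclass(frozen=True)
class Deferred:
    path: str  # textual form of a symbolic path rooted at an output variable

def close(dep, value):
    """Replace every deferred occurrence in dep by the given atomic value."""
    if isinstance(dep, Deferred):
        return value
    if isinstance(dep, dict):
        return {key: close(sub, value) for key, sub in dep.items()}
    return dep

# Summary of set_thread on the exit label true, with deferred occurrences.
set_thread_summary = {
    "p": {
        "threads": {"default": Deferred("r.threads<all but i>"), "cell i": NOTHING},
        "pid": Deferred("r.pid"),
        "crt_thread": Deferred("r.crt_thread"),
        "adr_space": Deferred("r.adr_space"),
    },
    "i": TOP,
    "ti": Deferred("r.threads<i>.Some.t"),
}

result_discarded = close(set_thread_summary, NOTHING)  # result thrown away
result_fully_used = close(set_thread_summary, TOP)     # context-insensitive view
```

In `result_discarded` only the index i remains fully needed, while `result_fully_used` coincides with the summary of Chapter 5, in which every copied subelement is mapped to everything.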


The dependency summary with deferred occurrences is indeed precise. Not only does it create a dependency template in which callers can inject their own context, but it also distills the specification of the set_thread predicate. A quick glance at it indicates that set_thread is indeed a non-destructive mutator, updating the i-th associated thread of a process to ti and preserving everything else.

In order to obtain such dependency summaries, we need to refine our first approximation of symbolic elements as sets of a predicate's output variables. Just as in our initial abstract dependency domain, we must reflect the layered structure of algebraic data types and arrays at the level of symbolic dependencies as well. To this end, we need to consider not only sets of output variables, but also symbolic paths to substructures within them.

6.3 Symbolic Paths

6.3.1 Symbolic Path Type

In order to extend our abstract dependency domain with symbolic dependencies and to obtain expressive dependency summaries such as the ones discussed in the previous section, we begin by introducing symbolic paths. These are meant to mirror the layered structure of algebraic data types and arrays at the level of symbolic dependencies.

Each deferred occurrence in a dependency summary is identified by symbolic paths. Symbolic paths are rooted at one of the program's variables and represent sequences of symbolic internal accesses inside some value's structure, i.e. they are symbolic traversals from one value to some of its subparts. Paths are chains of symbolic accesses leading to nested elements, in which different calling contexts can be subsequently injected. We define a recursive type $\pi$ of symbolic paths encompassing this.

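Definition 6.3.1 (Symbolic path type, $\pi \in \Pi$).

\[
\begin{array}{rcll}
\pi & ::= & \varepsilon & \text{endpoint (root)} \\
 & | & f\,\pi & f \in \mathcal{F} \\
 & | & C\,\pi & C \in \mathcal{C} \\
 & | & \langle i \rangle\,\pi & i \text{ index} \\
 & | & \langle * \setminus i \rangle\,\pi & i \text{ index} \\
 & | & \langle * \rangle\,\pi &
\end{array}
\]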

An endpoint, denoted by $\varepsilon$, is the special path denoting an entire element. For structures, we denote the symbolic path to some field $f$ by $f\,\pi$. Similarly, for variants, we denote the path to some chosen constructor $C$ by $C\,\pi$. For arrays, we distinguish between three cases:

• symbolic paths referring to a specific array cell, identified by the cell's index $i$ and denoted by $\langle i \rangle\,\pi$;

• symbolic paths referring to all but one specific array cell, identified by its index $i$ and denoted by $\langle * \setminus i \rangle\,\pi$;

• symbolic paths referring to all the cells of an array, denoted by $\langle * \rangle\,\pi$.
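The six cases of the symbolic path type can be mirrored by a small data type. The following Python sketch is only one possible encoding, chosen for illustration: a path is a tuple of steps, the empty tuple plays the role of the endpoint, and the class names `Field`, `Ctor`, `Cell`, `AllBut`, `AnyCell` as well as the textual rendering produced by `show` are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Field:
    name: str    # access to a structure field f

@dataclass(frozen=True)
class Ctor:
    name: str    # access through a variant constructor C

@dataclass(frozen=True)
class Cell:
    index: str   # the cell designated by index variable i

@dataclass(frozen=True)
class AllBut:
    index: str   # every cell except the one designated by i

@dataclass(frozen=True)
class AnyCell:
    pass         # every cell of the array

Step = Union[Field, Ctor, Cell, AllBut, AnyCell]
Path = Tuple[Step, ...]  # the empty tuple () is the endpoint

def show(path: Path) -> str:
    """Render a symbolic path in a dotted, human-readable form."""
    parts = []
    for step in path:
        if isinstance(step, (Field, Ctor)):
            parts.append("." + step.name)
        elif isinstance(step, Cell):
            parts.append("<" + step.index + ">")
        elif isinstance(step, AllBut):
            parts.append("<* but " + step.index + ">")
        else:
            parts.append("<*>")
    return "".join(parts) or "eps"

# The path used for the input ti of set_thread, rooted at the output r.
ti_path = (Field("threads"), Cell("i"), Ctor("Some"), Field("t"))
print("r" + show(ti_path))
```

Using immutable, hashable steps makes it straightforward to store paths in sets, which is how they are used in the remainder of this section.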

With one exception, these symbolic paths directly reflect the cases of our abstract dependency domain. For instance, the correspondence between symbolic paths and the dependencies for structures or variants is immediately apparent. In contrast, for arrays the abstract dependency domain included two cases, namely $\langle \delta \rangle$, corresponding to a dependency applying to all of the cells, and $\langle \delta_{def} \triangleright i : \delta_{exc} \rangle$, corresponding to arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known. In order to reflect the second case in the deferred occurrences, we need to be able to refer to the exceptional cell on the one hand, and to all other cells of the array on the other hand. To this end, we introduce two kinds of symbolic paths: the $\langle i \rangle\,\pi$ path for expressing deferred occurrences of exceptional cells, and the $\langle * \setminus i \rangle\,\pi$ path for expressing deferred occurrences of all the other array cells except the one identified by $i$.

The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi \cdot \pi'$. We call $\cdot$ the extension operator and, when applying it, we say that we extend $\pi$ with $\pi'$.

We further consider sets $P \subset \Pi$ of symbolic paths $\pi$ and define the partial order $\sqsubseteq$ between them.

Definition 6.3.2 (Partial order $\sqsubseteq$ for path sets).

\[
\forall P \subset \Pi,\ P' \subset \Pi, \quad P \sqsubseteq P' \iff P \subseteq P'
\]

They establish a semi-lattice based on the subset order. The bottom element of this semi-lattice is $\emptyset$, the empty set of paths:

\[
\forall P \subset \Pi, \quad \emptyset \sqsubseteq P
\]

There is no top element. Theoretically, this would correspond to the set representing all possible paths. In practice, this cannot be constructed, and we chose not to add a special case for it to our symbolic path type $\pi$.

The join operation of deferred path sets is based on set union and is denoted by $\vee$.

Definition 6.3.3 (Join operation $\vee$ for path sets).

\[
\forall P \subset \Pi,\ P' \subset \Pi, \quad P \vee P' = P \cup P'
\]

It is symmetric, and the value obtained by joining two path sets is their least upper bound.

Applying the extension operator to a set of symbolic paths $P$ amounts to a pointwise extension of each member of the path set.

Definition 6.3.4 (Extension operator for path sets).

\[
\forall P \subset \Pi, \quad P \cdot \pi' = \{\, \pi \cdot \pi' \mid \pi \in P \,\}
\]
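The three operations on path sets translate directly into operations on finite sets. In the Python sketch below, a path is assumed to be a tuple of step labels (plain strings, for brevity) and a path set a `frozenset` of such tuples; the function names `leq`, `join` and `extend` are illustrative choices, not names taken from the prototype.

```python
def leq(p, q):
    """Partial order on path sets: plain set inclusion."""
    return p <= q

def join(p, q):
    """Join of two path sets: their union, which is the least upper bound."""
    return p | q

def extend(p, suffix):
    """Pointwise extension of every path in p with the given suffix."""
    return frozenset(path + suffix for path in p)

bottom = frozenset()                        # the empty set of paths
p = frozenset({("threads",), ()})           # () encodes the endpoint
q = frozenset({("pid",)})

print(sorted(extend(p, ("<i>",))))
print(leq(bottom, p), leq(p, join(p, q)))
```

Since the order is set inclusion, the empty set is below every path set and both arguments of a join are always below its result.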


6.3.2 Semantics of Symbolic Paths

Semantically, paths of type $\pi$, defined previously, are a symbolic representation of several actual paths. In the following, we make this notion explicit, and we begin by defining simple actual paths in a value of the universe $D$ (Definition 4.4.1).

Actual paths represent a unique sequence of internal accesses inside some value's structure, leading to a single nested element. Unlike symbolic paths, which can for instance cover multiple elements of an array, an actual path designates a single subvalue of a structure, variant or array. The recursive actual path type $\hat{\pi} \in \hat{\Pi}$ is defined below.

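Definition 6.3.5 (Actual path type, $\hat{\pi} \in \hat{\Pi}$).

\[
\begin{array}{rcll}
\hat{\pi} & ::= & \varepsilon & \text{empty} \\
 & | & f\,\hat{\pi} & f \in \mathcal{F} \\
 & | & C\,\hat{\pi} & C \in \mathcal{C} \\
 & | & \langle i \rangle\,\hat{\pi} & i \text{ index}
\end{array}
\]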

A symbolic path $\pi$ covers an actual path $\hat{\pi}$ if, given a valuation $E$ (Definition 4.4.2) of the index variables for arrays, it matches $\hat{\pi}$. A set of symbolic paths covers an actual path $\hat{\pi}$ if at least one of the symbolic paths matches $\hat{\pi}$. We denote this by the $\models_E$ relation, which is parameterized by a valuation $E$. The definition of $\models_E$ is given in Table 6.1.

Table 61 ndash E ndash Path Semantics

ε E εE ε

π E π

fπ E f πEStruct

π E π

Cπ E CπEVar

π E π E(i) = j

〈i〉π E 〈j〉πECell

π E π

〈lowast〉π E 〈j〉πEAnyCell

π E π E(i) 6= j

〈lowast i〉π E 〈j〉πEOutCell

Given a valuation $E$, a set $P$ of symbolic paths covers an actual path $\bar{\pi}$ if at least one of the symbolic paths in the set covers, or matches, $\bar{\pi}$:

\[ \forall P \subset \Pi.\quad P \vDash_E \bar{\pi} \iff \exists \pi \in P.\ \pi \vDash_E \bar{\pi} \]
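As an illustration, the covering rules can be read as a simple recursive check. The sketch below is a hedged model only: the tagged-tuple encoding of steps (`"field"`, `"ctor"`, `"cell"`, `"any"`, `"out"`) and the function names are hypothetical, and the valuation is assumed to be a dictionary defined on every index variable that occurs in the symbolic path.

```python
def step_covers(sym_step, act_step, env):
    """One symbolic step against one actual step, under valuation `env`."""
    kind = sym_step[0]
    if kind in ("field", "ctor"):
        # Field and constructor steps must match exactly.
        return sym_step == act_step
    if act_step[0] != "cell":
        return False
    j = act_step[1]
    if kind == "cell":   # <i> covers <j> when E(i) = j
        return env[sym_step[1]] == j
    if kind == "any":    # <*> covers every cell
        return True
    if kind == "out":    # <* \ i> covers <j> when E(i) != j
        return env[sym_step[1]] != j
    return False

def covers(sym, act, env):
    """A symbolic path covers an actual path of the same shape, step by step."""
    return len(sym) == len(act) and all(
        step_covers(s, a, env) for s, a in zip(sym, act))

def set_covers(paths, act, env):
    """A path set covers an actual path if at least one member does."""
    return any(covers(p, act, env) for p in paths)

sym = (("field", "threads"), ("cell", "i"), ("ctor", "Some"))
act = (("field", "threads"), ("cell", 3), ("ctor", "Some"))
print(covers(sym, act, {"i": 3}))
print(covers(sym, act, {"i": 4}))
```

The same actual path is covered or not depending on the valuation, which is why the relation is indexed by it.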


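The interpretation $\llbracket P \rrbracket_E$ of a set of paths $P$ is then the set of single actual paths that are covered, given a valuation $E$.

Definition 6.3.6 Interpretation $\llbracket P \rrbracket_E$ of a set of paths $P$

\[ \forall P \subseteq \Pi.\quad \llbracket P \rrbracket_E = \{\, \bar{\pi} \mid P \vDash_E \bar{\pi} \,\} \]

The partial order $\sqsubseteq$ (Definition 6.3.2) on sets of paths is compatible with the interpretation $\llbracket P \rrbracket_E$, in the sense that when $P \sqsubseteq Q$ holds, the interpretation $\llbracket P \rrbracket_E$ of $P$ is included in $\llbracket Q \rrbracket_E$ for every valuation:

\[ \forall P, Q \subseteq \Pi,\ \forall E.\quad P \sqsubseteq Q \iff \llbracket P \rrbracket_E \subseteq \llbracket Q \rrbracket_E \]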

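Each single path can be interpreted as a way to find a subpart of a value, which we make explicit by the following function $at$. It is not defined for all cases, since not all paths can be applied to all values.

Definition 6.3.7 Function $at$

\[ at : \bar{\Pi} \times D \to D \]

\[
at(\bar{\pi}, v) =
\begin{cases}
v & \text{when } \bar{\pi} = \varepsilon \\
at(\bar{\pi}', v_i) & \text{when } \bar{\pi} = f_i\,\bar{\pi}' \text{ and } v = \{f_1 = v_1; \dots; f_i = v_i; \dots; f_n = v_n\} \\
at(\bar{\pi}', v_C) & \text{when } \bar{\pi} = C_i\,\bar{\pi}' \text{ and } v = C_i[v_C] \\
at(\bar{\pi}', v_i) & \text{when } \bar{\pi} = \langle i \rangle\,\bar{\pi}' \text{ and } v = (P, (v_k)_{k \in P}),\ i \in P
\end{cases}
\]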
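The partiality of $at$ becomes concrete in a small executable sketch. The value encoding is an assumption made for illustration only: structures are dictionaries keyed by field name, variants are `(constructor, payload)` pairs, and arrays are dictionaries keyed by integer index. Undefined cases are signalled with a `ValueError`.

```python
def at(path, v):
    """Follow an actual path inside a value; raise ValueError if undefined."""
    if not path:
        return v
    (kind, key), rest = path[0], path[1:]
    # Structures (field name keys) and arrays (integer keys) are both dicts here.
    if kind in ("field", "cell") and isinstance(v, dict) and key in v:
        return at(rest, v[key])
    # Variants are (constructor, payload) pairs.
    if kind == "ctor" and isinstance(v, tuple) and v[0] == key:
        return at(rest, v[1])
    raise ValueError(f"path step {(kind, key)!r} does not apply to {v!r}")

# Hypothetical process value with an array of optional threads.
process = {"threads": {0: ("Some", {"stack": {"start": 4096}}),
                       1: ("None", None)}}
path = (("field", "threads"), ("cell", 0), ("ctor", "Some"),
        ("field", "stack"), ("field", "start"))
print(at(path, process))
```

Applying the same path with cell index 1 raises an error, because the value stored there is built with the `None` constructor and the path expects `Some`.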

6.3.3 Well-Typed Paths and Path Sets

Symbolic paths cannot be used in every context: their interpretation must be made in the context of a type $\tau$. An endpoint, i.e. the $\varepsilon$ symbolic path, can apply to any type. In contrast, other symbolic paths that exhibit specific data features can only apply to the corresponding types. For instance, a path such as $f\,\pi$ is meaningless on values which are not records, or on record values that do not exhibit a field $f$, the field specified in the symbolic path.

A path set can be seen as a set of sequences of internal accesses inside some value's structure. In that sense, it is a set of possible traversals from one value to some of its subparts. To characterize the contexts in which a path set is well-typed, we need to consider the types of values to which it can be applied and the types of values to which it can lead. Therefore, in the following, we begin by defining a typing judgement for symbolic paths as a three-place relation $\pi : \tau \to \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and that, in that case, it will always describe subvalues of type $\tau'$. Additionally, the typing judgement is parameterized by a set of input variables $I$, which are the variables having the right to appear as identifiers for array accesses. This is detailed in Table 6.2.

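\[
\begin{array}{ll}
\dfrac{}{I \vdash \varepsilon : \tau \to \tau} & \text{WT}\varepsilon \\[3ex]
\dfrac{\tau = \mathrm{struct}\{f_1 : \tau_1; \dots; f_i : \tau_i; \dots; f_n : \tau_n\} \qquad I \vdash \pi_i : \tau_i \to \tau'}{I \vdash f_i\,\pi_i : \tau \to \tau'} & \text{WTStructPath} \\[3ex]
\dfrac{\tau = \mathrm{variant}[C_1 : \tau_1 \mid \dots \mid C_i : \tau_i \mid \dots \mid C_n : \tau_n] \qquad I \vdash \pi_C : \tau_i \to \tau'}{I \vdash C_i\,\pi_C : \tau \to \tau'} & \text{WTVarPath} \\[3ex]
\dfrac{I \vdash \pi : \tau \to \tau'}{I \vdash \langle * \rangle\,\pi : \mathrm{arr}_{\tau_i}\langle \tau \rangle \to \tau'} & \text{WTArrayPath} \\[3ex]
\dfrac{I \vdash \pi : \tau \to \tau' \qquad I(i) = \tau_i}{I \vdash \langle i \rangle\,\pi : \mathrm{arr}_{\tau_i}\langle \tau \rangle \to \tau'} & \text{WTCellPath} \\[3ex]
\dfrac{I \vdash \pi : \tau \to \tau' \qquad I(i) = \tau_i}{I \vdash \langle * \setminus i \rangle\,\pi : \mathrm{arr}_{\tau_i}\langle \tau \rangle \to \tau'} & \text{WTOutPath}
\end{array}
\]

Table 6.2 – Well-Typed Dependency Paths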

A set $P$ of symbolic paths is well-typed if every path contained in it is well-typed for the same types:

\[ \forall P \subset \Pi.\quad I \vdash P : \tau \to \tau' \iff \forall \pi \in P.\ I \vdash \pi : \tau \to \tau' \]
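The typing rules are syntax-directed, so they can be read as a function that computes the target type of a path from its source type. The following sketch assumes a hypothetical encoding: base types are strings, `("struct", {field: type})`, `("variant", {constructor: type})` and `("arr", index_type, element_type)` are the composite types, and `I` is a dictionary from index variables to their types.

```python
def path_type(I, path, tau):
    """Return tau' such that `path` maps values of type tau to subvalues
    of type tau', or None when the path is ill-typed for tau."""
    if not path:
        return tau                      # the empty path applies to any type
    step, rest = path[0], path[1:]
    kind = step[0]
    if not isinstance(tau, tuple):
        return None                     # base types have no subvalues
    if kind == "field" and tau[0] == "struct" and step[1] in tau[1]:
        return path_type(I, rest, tau[1][step[1]])
    if kind == "ctor" and tau[0] == "variant" and step[1] in tau[1]:
        return path_type(I, rest, tau[1][step[1]])
    if tau[0] == "arr":
        _, index_type, elem_type = tau
        # <*> needs no index variable; <i> and <* \ i> need I(i) = index type.
        if kind == "any" or (kind in ("cell", "out")
                             and I.get(step[1]) == index_type):
            return path_type(I, rest, elem_type)
    return None

def set_well_typed(I, paths, tau, tau_prime):
    """A path set is well-typed if every member has the same typing."""
    return all(path_type(I, p, tau) == tau_prime for p in paths)

thread = ("struct", {"stack": ("struct", {"start": "int"})})
threads = ("arr", "int", ("variant", {"Some": thread, "None": "unit"}))
p1 = (("cell", "i"), ("ctor", "Some"), ("field", "stack"), ("field", "start"))
print(path_type({"i": "int"}, p1, threads))
```

A path that mentions an index variable outside `I` is rejected, which mirrors the side condition $I(i) = \tau_i$ of the cell rules.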

The well-typedness property of sets of symbolic paths is preserved by the join operation $\vee$ (Definition 6.3.3):

\[ \forall P', P'' \subseteq \Pi,\ \forall \tau', \tau'' \in T.\quad I \vdash P' : \tau' \to \tau'' \implies I \vdash P'' : \tau' \to \tau'' \implies I \vdash P' \vee P'' : \tau' \to \tau'' \]

When extending a well-typed set of symbolic paths with a well-typed path using the extension operator (Definition 6.3.4), the resulting set of symbolic paths is well-typed as well:

\[ \forall P' \subseteq \Pi,\ \forall \tau, \tau', \tau'' \in T.\quad I \vdash P' : \tau' \to \tau'',\ \ I \vdash \pi' : \tau'' \to \tau \implies I \vdash P' \odot \pi' : \tau' \to \tau \]

6.4 Abstract Dependency Domain with Deferred Accesses

Frequently, as explained in Section 6.2, the dependency on a predicate's input variable is relative to the extent to which some of the predicate's outputs are subsequently needed. More precisely, these outputs are those into which the input variable is copied and from which it is retrieved. We strive to avoid over-approximations in such cases and to create degrees of freedom for the callers by treating such output variables as points at which callers can inject their own context externally. In other words, we want to defer the computation of the dependency on certain input variables of a predicate to the predicate's callers, since they have additional information about the actual use of the predicate's outputs.

In the previous section (Section 6.3), we introduced and defined an intermediate level consisting of symbolic paths and path sets. These reflect the layered structure of algebraic data types and arrays, and allow us to consider not only output variables as a whole, but also symbolic paths within them. Thus, we can compute more flexible and expressive dependency summaries with finer-grained elements. We can finally link these two ideas and extend our abstract dependency domain with deferred dependencies by including an additional dependency case in our domain $\delta \in \mathcal{D}$, initially defined (Definition 5.2.1) in Section 5.2.

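Definition 6.4.1 Extended Abstract Dependency Domain $\delta \in \mathcal{D}$

\[
\begin{array}{rcllr}
\delta & ::= & \top & \text{Everything – atomic case} & (i) \\
 & \mid & \square & \text{Nothing – atomic case} & (ii) \\
 & \mid & \bot & \text{Impossible – atomic case} & (iii) \\
 & \mid & \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} & f_1, \dots, f_n\ \text{fields} & (iv) \\
 & \mid & [C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m] & C_1, \dots, C_m\ \text{constructors} & (v) \\
 & \mid & \langle \delta \rangle & & (vi) \\
 & \mid & \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & i\ \text{array index} & (vii) \\
 & \mid & Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k) & \text{deferred accesses} & (viii)
\end{array}
\]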

A deferred dependency, shown in (viii), consists of a mapping which binds output variables, which we also call root variables in this case, to sets of symbolic paths.

Definition 6.4.2 Access Map

\[ \mathcal{A} : V \rightharpoonup \mathcal{P}(\Pi) \]

Only output variables can be treated as lazy dependency components. The sets of symbolic paths mapped to them allow us to distinguish between their subelements. In the following discussion, we will denote an access map $\{o_1 \mapsto P_1; \dots; o_k \mapsto P_k\}$ by $a$.


For the partial order $\sqsubseteq$ (Definition 5.2.2), defined in Chapter 5 and detailed in Table 5.1, an additional rule (Def) for comparing instances of deferred dependencies is added. This is shown in Table 6.3. The top and bottom elements of our dependency domain are, as before, $\top$ and $\bot$, respectively. Thus, any instance of a deferred dependency is more precise than $\top$ and less precise than $\bot$. Just as $\top$, $\bot$ and the special dependency case $\square$, a deferred dependency can be used in association with any type, albeit with some constraints for its elements.

\[
\dfrac{\forall\, o \mapsto P \in a.\quad a(o) \sqsubseteq a'(o)}{Deferred(a) \sqsubseteq Deferred(a')}\ \text{Def}
\]

Table 6.3 – Extended Leq – Comparison of Two Domains

However, unlike the atomic cases $\top$, $\bot$ and $\square$, deferred dependencies are not related to $\square$ or to dependencies corresponding to structures, variants or arrays. Since they act as placeholders for dependencies that are effectively computed subsequently, instances of deferred dependencies can be compared only to $\top$ and $\bot$, or to other instances of deferred dependencies. For instance, comparing a deferred dependency to $\square$ would yield

\[ Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k) \not\sqsubseteq \square \quad\text{and}\quad \square \not\sqsubseteq Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k) \]

The extended join operation $\vee$ (Definition 5.2.3), initially defined in Section 5.2.1 and detailed in Table 5.2, is shown below in Table 6.4. It still has $\bot$ as its identity element and $\top$ as its absorbing element. Joining two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The join between an instance of a deferred dependency and a dependency corresponding to a structure, a variant, an array, or to the special case $\square$, amounts to $\top$, the top element of our domain. Since we cannot make any supposition regarding deferred dependencies, we are forced to make a pessimistic assumption and to approximate to the least precise value. Join is a commutative operation, for which the undisplayed cases in Table 6.4 are defined with respect to their symmetrical counterparts.

\[
\begin{array}{rcl}
Deferred(a) \vee Deferred(a') & = & Deferred(a'') \quad\text{where} \\[1ex]
a''(o) & = &
\begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases} \\[4ex]
Deferred(a) \vee \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} & = & \top \\
Deferred(a) \vee [C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m] & = & \top \\
Deferred(a) \vee \langle \delta \rangle & = & \top \\
Deferred(a) \vee \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & = & \top \\
Deferred(a) \vee \square & = & \top
\end{array}
\]

Table 6.4 – $\vee$ – Extended Join

Similarly to join, the reduction operation $\oplus$ (Definition 5.2.4) was initially defined in Section 5.2.1 and detailed in Table 5.3. The extended form is shown in Table 6.5. It still has $\square$ as an identity element and $\bot$ as an absorbing element. When applying the reduction operation between a deferred dependency and a dependency $\delta'$ corresponding to a structure, a variant or an array, we over-approximate the deferred dependency to $\top$ and apply the reduction operation between $\delta'$ and $\top$. Applying the reduction operation between a deferred dependency and $\top$ behaves similarly: the outcome in this case is straightforward and amounts to $\top$. As was the case for join, applying the reduction operation between two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The reduction operation is commutative, and the undisplayed cases in Table 6.5 are defined with respect to their symmetrical counterparts.

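\[
\begin{array}{rcl}
Deferred(a) \oplus Deferred(a') & = & Deferred(a'') \quad\text{where} \\[1ex]
a''(o) & = &
\begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases} \\[4ex]
Deferred(a) \oplus \top & = & \top \\
Deferred(a) \oplus \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} & = & \top \oplus \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \\
Deferred(a) \oplus [C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m] & = & \top \oplus [C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m] \\
Deferred(a) \oplus \langle \delta \rangle & = & \top \oplus \langle \delta \rangle \\
Deferred(a) \oplus \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & = & \top \oplus \langle \delta_{def} \triangleright i : \delta_{exc} \rangle
\end{array}
\]

Table 6.5 – $\oplus$ – Extended Reduction Operator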
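The deferred cases of the join can be sketched as follows. This is a deliberately partial model under stated assumptions: only the top element, the bottom element and deferred dependencies are represented faithfully, every other dependency is treated as an opaque value, and the names `Deferred`, `join_access_maps` and `join` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Path = Tuple[str, ...]
TOP, BOT = "top", "bot"   # stand-ins for the top and bottom elements

@dataclass(frozen=True)
class Deferred:
    access_map: Dict[str, FrozenSet[Path]]   # root variable -> path set

def join_access_maps(a, b):
    """Pointwise join: union the path sets of shared roots, keep the rest."""
    return {o: a.get(o, frozenset()) | b.get(o, frozenset())
            for o in a.keys() | b.keys()}

def join(d1, d2):
    """Join restricted to top, bottom and deferred dependencies."""
    if d1 == BOT:
        return d2                # bottom is the identity element
    if d2 == BOT:
        return d1
    if d1 == TOP or d2 == TOP:
        return TOP               # top is absorbing
    if isinstance(d1, Deferred) and isinstance(d2, Deferred):
        return Deferred(join_access_maps(d1.access_map, d2.access_map))
    if isinstance(d1, Deferred) or isinstance(d2, Deferred):
        return TOP               # deferred vs. any other dependency
    raise NotImplementedError("non-deferred cases are not modelled here")

a = Deferred({"ti": frozenset({()})})
b = Deferred({"ti": frozenset({(".stack",)}), "adr": frozenset({()})})
print(join(a, b))
```

The reduction of two deferred dependencies uses the same `join_access_maps` helper, since it is also defined as a pointwise join of the path sets.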

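The extractions previously defined for dependencies $\delta$ (Definitions 5.2.5, 5.2.6, 5.2.7, 5.2.8 and 5.2.9) have been extended in order to handle deferred dependencies as well. Their treatment is summarized in Table 6.6. Making array-specific extractions, as well as extracting field and constructor dependencies on a deferred dependency, amounts to a pointwise extension of every path set in the access map with the corresponding symbolic path.

\[
\begin{array}{lll}
\text{Extraction} & \delta & \text{Result} \\ \hline
\text{Field} & Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k).f & Deferred(o_1 \mapsto P_1 \odot f\varepsilon; \dots; o_k \mapsto P_k \odot f\varepsilon) \\
\text{Constructor} & Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k).C & Deferred(o_1 \mapsto P_1 \odot C\varepsilon; \dots; o_k \mapsto P_k \odot C\varepsilon) \\
\text{Cell} & Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k)\langle i \rangle & Deferred(o_1 \mapsto P_1 \odot \langle i \rangle\varepsilon; \dots; o_k \mapsto P_k \odot \langle i \rangle\varepsilon) \\
\text{Array General} & Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k)\langle * \rangle & Deferred(o_1 \mapsto P_1 \odot \langle * \rangle\varepsilon; \dots; o_k \mapsto P_k \odot \langle * \rangle\varepsilon) \\
\text{Outside Cell} & Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k)\langle * \setminus i \rangle & Deferred(o_1 \mapsto P_1 \odot \langle * \setminus i \rangle\varepsilon; \dots; o_k \mapsto P_k \odot \langle * \setminus i \rangle\varepsilon)
\end{array}
\]

Table 6.6 – Extended Extraction Operators

Finally, we add the following rule to the well-typed dependency rules given in Chapter 5, Table 5.5.

\[
\dfrac{
\begin{array}{c}
\Gamma(o_1) = \tau_1 \qquad \Gamma, I \vdash P_1 : \tau_1 \to \tau \\
\vdots \\
\Gamma(o_k) = \tau_k \qquad \Gamma, I \vdash P_k : \tau_k \to \tau \\
o_1 \in O\ \ \dots\ \ o_k \in O
\end{array}
}{\Gamma, I, O \vdash Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k) : \tau}\ \text{WTDeferred}
\]

Table 6.7 – Well-Typed Dependencies – Extended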
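To make the extraction rules of Table 6.6 concrete, the following hedged sketch applies an extraction step to an access map by extending every path of every path set. The dictionary encoding of access maps and the tagged-tuple steps are assumptions of this example, not the notation of the analysis.

```python
def extract_deferred(access_map, step):
    """Extraction on a deferred dependency: extend each path set pointwise
    with the one-step symbolic path corresponding to the extraction."""
    return {root: frozenset(path + (step,) for path in paths)
            for root, paths in access_map.items()}

# Start from the initial deferred dependency on a hypothetical output `tj`.
deferred_tj = {"tj": frozenset({()})}

# Extracting field `stack`, then field `start`, records the access path.
after_stack = extract_deferred(deferred_tj, ("field", "stack"))
after_start = extract_deferred(after_stack, ("field", "start"))
print(after_start)
```

Every extraction keeps the set of root variables unchanged and only lengthens the recorded paths, so the information about which subelement of an output is actually used accumulates in the access map.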

6.5 Deferred Dependencies at the Intraprocedural Level

6.5.1 Extended Intraprocedural Dependency Analysis

At the intraprocedural and interprocedural levels of our dependency analysis, the introduction of deferred dependencies has a minimal impact in terms of required changes.

Intraprocedurally, each predicate is analysed on every possible exit label. As explained in Section 5.3.2, our dependency analysis is a backward data-flow analysis. For each possible exit label of a predicate, the control flow graph is traversed backwards, starting from the exit node that corresponds to the analysed execution scenario. Dependency information is computed at every point of the control flow graph for each of the predicate's input, output and local variables, and this information is gradually refined until a fixed point is reached.

By traversing the control flow graph backwards, we take advantage of the information regarding the outputs that are associated with the analysed exit label, and we consider only the relevant ones, starting from the initialisation phase. As explained previously in Section 5.3.2, the intraprocedural domain for the currently analysed exit label is initialised with its associated output variables mapped to $\top$, the least precise element of our abstract dependency domain. This is a conservative over-approximation: it is considered that control on the outputs is lost and that these are entirely observed externally. As illustrated in Section 6.2, this over-approximation propagates along the control flow graph and, in certain cases, has a non-negligible impact on the precision of the computed dependency summaries.

We argued that, at the intraprocedural level of the analysis, a subtle but important distinction can be made regarding the dependency on certain inputs. This consists in distinguishing between the cases in which a predicate effectively uses an input subelement to compute an output subelement, and those in which it simply forwards it to an output subelement. In the latter cases, the predicate does not use or need such an input subelement per se and, as a consequence, the dependency on it is relative to the extent to which the predicate's callers will subsequently use the output in which it is retrieved. At the intraprocedural level, in order to avoid the propagation of over-approximations, it is important to make this distinction early on, from the initialisation phase. Therefore, we introduce deferred dependencies at this level, instead of mapping the output variables to $\top$ as was previously done.

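For a predicate $p$ of the following form

\[ p(e_1, \dots, e_n)\ [\lambda_1 : o_{11}, \dots, o_{1k_1} \mid \dots \mid \lambda_i : o_{i1}, \dots, o_{ik_i} \mid \dots \mid \lambda_m : o_{m1}, \dots, o_{mk_m}] \]

analysed on the $\lambda_i$ exit label, the intraprocedural dependency domain used for initialising the node corresponding to $\lambda_i$ is the following:

\[ \{\, o_{i1} \mapsto Deferred(o_{i1} \mapsto \{\varepsilon\}),\ \dots,\ o_{ik_i} \mapsto Deferred(o_{ik_i} \mapsto \{\varepsilon\}) \,\} \]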

For each associated output $o_{ij}$, $1 \le j \le k_i$, of the analysed label $\lambda_i$, a set $P_{o_{ij}}$ of symbolic paths is constructed. Initially, this consists of a single element, namely the $\varepsilon$ path. The deferred dependency associated with each output $o_{ij}$ is an access map binding $o_{ij}$ itself to its corresponding set of symbolic paths $P_{o_{ij}}$. Since the symbolic paths $\varepsilon$ refer to the output variables in their entirety, this is still a conservative approximation but, in contrast to our previous initialisation strategy, it acknowledges the fact that dependencies on the inputs might be relative to the extent to which the outputs are subsequently used. It allows injecting context-sensitive information later on.

This new initialisation strategy is enough to incorporate the expressive power of deferred dependencies at an intraprocedural level. Whereas before we were computing label-specific dependency summaries as input-output relations, the new strategy allows us to obtain label-specific dependency templates with lazy components that can be parameterized and varied according to a caller's own intraprocedural context. These can be seen as context-insensitive dependency summaries with context-sensitive leaves.
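The difference between the two initialisation strategies is small enough to show side by side. In this sketch, which relies on a hypothetical encoding (the string `"top"` for the top element and a `("Deferred", access_map)` pair for deferred dependencies), both functions build the intraprocedural domain attached to the exit node of the analysed label.

```python
EPSILON = ()   # the empty symbolic path

def initial_domain_top(outputs):
    """Previous strategy: every output of the label is mapped to top."""
    return {o: "top" for o in outputs}

def initial_domain_deferred(outputs):
    """New strategy: every output is mapped to a deferred dependency
    whose access map binds the output itself to the path set {epsilon}."""
    return {o: ("Deferred", {o: frozenset({EPSILON})}) for o in outputs}

# Outputs associated with the `true` label of a predicate returning `ti`.
print(initial_domain_top(["ti"]))
print(initial_domain_deferred(["ti"]))
```

Both domains refer to the outputs in their entirety, but only the second one keeps track of which output the dependency is relative to, so a caller can refine it later.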

6.5.2 Intraprocedural Dependency Analysis Illustrated

In order to illustrate the use of deferred dependencies at an intraprocedural level, we revisit our thread example predicate, discussed in Section 5.3.3. As done previously, we consider the true execution scenario and apply our extended dependency analysis. We initialise the dependency corresponding to the true exit node by mapping the predicate's output ti to the deferred dependency mapping it to a set containing a single symbolic path, namely $\varepsilon$.


After the initialisation phase, the analysis continues as before, by traversing the control flow graph backwards and by applying, at each step, the corresponding data-flow equation. The deferred dependency is propagated upwards until the entry node is reached and analysed.

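[Figure: control flow graph of thread, analysed for the true exit label. Statements, from the entry node: th = p.threads; tio = th[i]; switch(tio). Dependencies computed at each program point, from the entry node down to the true exit node:]

$p \mapsto \{threads \mapsto \langle \square \triangleright i : [Some \mapsto \{t \mapsto Deferred(ti \mapsto \{\varepsilon\})\};\ None \mapsto \bot] \rangle\},\ i \mapsto \top$

$th \mapsto \langle \square \triangleright i : [Some \mapsto \{t \mapsto Deferred(ti \mapsto \{\varepsilon\})\};\ None \mapsto \bot] \rangle,\ i \mapsto \top$

$tio \mapsto [Some \mapsto \{t \mapsto Deferred(ti \mapsto \{\varepsilon\})\};\ None \mapsto \bot]$

$ti \mapsto Deferred(ti \mapsto \{\varepsilon\})$

Figure 6.1 – Analysing thread – Dependency Summary with Deferred Occurrences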

The final dependency summary for the true exit label of the predicate is obtained:

$p \mapsto \{threads \mapsto \langle \square \triangleright i : [Some \mapsto \{t \mapsto Deferred(ti)\};\ None \mapsto \bot] \rangle\},\ i \mapsto \top$

and this is similar to the targeted dependency information for thread discussed in Section 6.2 and illustrated on page 117.

6.6 Deferred Dependencies at the Interprocedural Level

At the interprocedural level, the impact of introducing deferred dependencies is visible only at the level of the substitutions that have to be performed. Previously, the only required substitution consisted in replacing all occurrences of formal input parameters of a predicate with the corresponding effective input parameters. After having introduced deferred dependencies, further substitutions are needed. These can be easily illustrated by revisiting our start_address example predicate, discussed in Section 5.4.1. As done previously, we consider the true execution scenario and apply our extended dependency analysis.

We begin by initialising the output adr with a corresponding deferred dependency, as discussed in Section 6.5.1. The analysis traverses the control flow graph backwards and computes the dependency information at each node until reaching the control flow graph's entry node, which corresponds to a call to the thread predicate. The intermediate dependency results are shown in Figure 6.2.

We obtain the dependency summary for the true exit label of the called predicate thread. In order to be able to use it, we must first substitute the formal input parameters, i.e. p and i, appearing in it with the effective arguments of the call, i.e. p and j. Additionally, in deferred dependencies, we also have to substitute the formal output parameters appearing as roots in the access maps, i.e. ti, with the corresponding effective output parameters. These substitutions are shown in Figure 6.3. Formal index variables appearing in dependencies corresponding to arrays have to be substituted with their effective counterparts as well. Similarly, any formal index variable appearing in symbolic paths that correspond to arrays must be substituted by the corresponding effective index variable.

[Figure: control flow graph of start_address, analysed for the true exit label. Statements, from the entry node: thread(p, j)[true: tj | None | oob]; sj = tj.stack; adr = sj.start. Dependencies computed at each program point, from the true exit node up to the call:]

$adr \mapsto Deferred(adr \mapsto \{\varepsilon\})$

$sj \mapsto \{start \mapsto Deferred(adr \mapsto \{\varepsilon\})\}$

$tj \mapsto \{stack \mapsto \{start \mapsto Deferred(adr \mapsto \{\varepsilon\})\}\}$

Figure 6.2 – $G_{start\_address}$ – Intermediate Dependency Results for start_address

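[Figure: in the summary of thread, the formal parameters p, i and ti are replaced by the effective parameters p, j and tj, and the index variable i is replaced by j.]

$p \mapsto \{threads \mapsto \langle \square \triangleright i : [Some \mapsto \{t \mapsto Deferred(ti)\};\ None \mapsto \bot] \rangle\},\ i \mapsto \top$

$tj \mapsto \{stack \mapsto \{start \mapsto Deferred(adr \mapsto \{\varepsilon\})\}\}$

Figure 6.3 – Substitution of Formal Parameters by Effective Parameters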

We can finally take advantage of the flexibility obtained using deferred dependencies by injecting the caller's intraprocedural dependency information into the deferred occurrences of the callee's dependency summary. This is another type of substitution, and it consists in replacing deferred occurrences of formal output parameters of a predicate by the dependency information computed in the current context for the corresponding effective output parameters. For our start_address example, this is shown in Figure 6.4 and amounts to substituting the dependency computed for tj in the deferred occurrence of ti in the dependency summary of thread.

After this substitution, we obtain the following dependency summary for the exit label true of the start_address predicate:

$p \mapsto \{threads \mapsto \langle \square \triangleright j : [Some \mapsto \{t \mapsto \{stack \mapsto \{start \mapsto Deferred(adr \mapsto \{\varepsilon\})\}\}\};\ None \mapsto \bot] \rangle\},\ j \mapsto \top$


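[Figure: the deferred occurrence Deferred(tj) in the summary of thread is replaced by the dependency computed for tj in the caller.]

$p \mapsto \{threads \mapsto \langle \square \triangleright j : [Some \mapsto \{t \mapsto Deferred(tj)\};\ None \mapsto \bot] \rangle\},\ j \mapsto \top$

$tj \mapsto \{stack \mapsto \{start \mapsto Deferred(adr \mapsto \{\varepsilon\})\}\}$

Figure 6.4 – Substituting Deferred Dependencies by Actual Dependencies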

6.6.1 Applying Context-Sensitive Information by Substitution

As shown in our previous example, deferred dependencies associate sets of symbolic paths with certain root variables. We can substitute such deferred dependencies by actual dependencies computed in the current context, by applying the symbolic paths to the actual dependency to substitute. We iterate through entire dependency summaries in order to substitute the nested deferred dependencies appearing at some leaves. This substitution can be seen as an application of contextual information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. It is denoted by a mapping $\sigma$, which associates dependencies with root variables appearing in deferred access maps.

Definition 6.6.1 Substitution $\sigma$

\[ \sigma : V \to \mathcal{D} \]

Simultaneously, while substituting root variables in deferred dependencies by their actual dependencies computed in the current intraprocedural context, we also substitute indices in information corresponding to arrays. These are substituted either by another array index, i.e. the one corresponding to an actual input parameter, or they are eliminated when corresponding to a local variable. Their elimination consists in approximating the dependencies so as to remove references to the array index. This substitution is denoted by $\phi$, and it is a mapping from variables to the new variables that replace them.

Definition 6.6.2 Substitution $\phi$

\[ \phi : V \rightharpoonup V \]

The two substitutions can be done separately. However, for performance reasons, we chose to do them simultaneously. This is also what the actual implementation of the dependency analysis does. We denote the two simultaneous substitutions by $\blacktriangleleft (\sigma, \phi)$ and detail them in Table 6.9. Performing the two operations simultaneously can be seen as a manner of reinterpreting, in another context, a dependency computed in one context.

For sets of symbolic paths (as defined in Section 6.3.1) in deferred dependencies, the operation $P \bullet (\sigma(o), \phi)$ is the application of symbolic paths to the dependency of the root variable $o$ computed in the current context. For a deferred access map, all dependencies obtained by applying the symbolic paths are joined. The application of a symbolic path $\pi$ to a dependency $\delta$ is denoted by $\pi \cdot (\delta, \phi)$, and it is shown in Table 6.8. During the application, free variables appearing in symbolic paths associated with arrays are substituted by their corresponding index variables, as given by $\phi$. If $\phi$ does not contain a mapping for a free variable, an approximation is made in order to remove it, and the dependency obtained by applying $\langle * \rangle$ is returned.

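\[
\begin{array}{rcl}
\varepsilon \cdot (\delta, \phi) & = & \delta \\
f\,\pi \cdot (\delta, \phi) & = & \pi \cdot (\delta.f, \phi) \\
C\,\pi \cdot (\delta, \phi) & = & \pi \cdot (\delta.C, \phi) \\
\langle * \rangle\,\pi \cdot (\delta, \phi) & = & \pi \cdot (\delta\langle * \rangle, \phi) \\[1ex]
\langle i \rangle\,\pi \cdot (\delta, \phi) & = &
\begin{cases}
\pi \cdot (\delta\langle \phi(i) \rangle, \phi) & i \in Dom(\phi) \\
\pi \cdot (\delta\langle * \rangle, \phi) & \text{otherwise}
\end{cases} \\[3ex]
\langle * \setminus i \rangle\,\pi \cdot (\delta, \phi) & = &
\begin{cases}
\pi \cdot (\delta\langle * \setminus \phi(i) \rangle, \phi) & i \in Dom(\phi) \\
\pi \cdot (\delta\langle * \rangle, \phi) & \text{otherwise}
\end{cases}
\end{array}
\]

Table 6.8 – Deferred Paths – Application and Substitutions

Definition 6.6.3 Application of Symbolic Paths to a Dependency

\[ P \bullet (\delta, \phi) = \bigvee_{\pi \in P} \pi \cdot (\delta, \phi) \]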

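\[
\begin{array}{rcl}
\top \blacktriangleleft (\sigma, \phi) & = & \top \\
\square \blacktriangleleft (\sigma, \phi) & = & \square \\
\bot \blacktriangleleft (\sigma, \phi) & = & \bot \\[1ex]
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \blacktriangleleft (\sigma, \phi) & = & \{f_1 \mapsto \delta_1 \blacktriangleleft (\sigma, \phi); \dots; f_n \mapsto \delta_n \blacktriangleleft (\sigma, \phi)\} \\
{[C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m]} \blacktriangleleft (\sigma, \phi) & = & [C_1 \mapsto \delta_1 \blacktriangleleft (\sigma, \phi); \dots; C_m \mapsto \delta_m \blacktriangleleft (\sigma, \phi)] \\[1ex]
Deferred(o_1 \mapsto P_1; \dots; o_k \mapsto P_k) \blacktriangleleft (\sigma, \phi) & = & \bigvee_{1 \le i \le k} P_i \bullet (\sigma(o_i), \phi) \\[1ex]
\langle \delta_{def} \rangle \blacktriangleleft (\sigma, \phi) & = & \langle \delta_{def} \blacktriangleleft (\sigma, \phi) \rangle \\[1ex]
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \blacktriangleleft (\sigma, \phi) & = &
\begin{cases}
\langle \delta_{def} \blacktriangleleft (\sigma, \phi) \triangleright \phi(i) : \delta_{exc} \blacktriangleleft (\sigma, \phi) \rangle & i \in Dom(\phi) \\
\langle \delta_{def} \blacktriangleleft (\sigma, \phi) \vee \delta_{exc} \blacktriangleleft (\sigma, \phi) \rangle & \text{otherwise}
\end{cases}
\end{array}
\]

Table 6.9 – Interprocedural Domain – Substitutions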
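The substitution of deferred occurrences can be sketched on a simplified domain. The model below makes several assumptions that are not part of the analysis itself: arrays and the index substitution are left out entirely, extraction on an atomic dependency is assumed to return that dependency unchanged, and the join falls back to the top element whenever two different non-deferred dependencies meet (a coarser but safe stand-in for the full join). All names are hypothetical.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def is_deferred(d):
    return isinstance(d, tuple) and d[0] == "deferred"

def join(d1, d2):
    """Simplified join: exact on bottom, top and deferred, coarse elsewhere."""
    if d1 == BOT:
        return d2
    if d2 == BOT:
        return d1
    if d1 == d2:
        return d1
    if is_deferred(d1) and is_deferred(d2):
        a, b = d1[1], d2[1]
        return ("deferred", {o: a.get(o, frozenset()) | b.get(o, frozenset())
                             for o in a.keys() | b.keys()})
    return TOP   # assumption: over-approximate every remaining case

def extract(dep, step):
    """Field or constructor extraction on a dependency."""
    if isinstance(dep, str):
        return dep   # assumption: atomic cases extract to themselves
    tag, body = dep
    if tag == "deferred":
        return ("deferred", {o: frozenset(p + (step,) for p in ps)
                             for o, ps in body.items()})
    if (tag, step[0]) in (("struct", "field"), ("variant", "ctor")):
        return body[step[1]]
    raise ValueError(f"step {step!r} does not apply to {dep!r}")

def apply_paths(paths, dep):
    """Apply every path of the set to `dep` and join the results."""
    result = BOT
    for path in paths:
        d = dep
        for step in path:
            d = extract(d, step)
        result = join(result, d)
    return result

def substitute(dep, sigma):
    """Replace deferred leaves by the caller's dependencies given by sigma."""
    if isinstance(dep, str):
        return dep
    tag, body = dep
    if tag == "deferred":
        result = BOT
        for root, paths in body.items():
            result = join(result, apply_paths(paths, sigma[root]))
        return result
    return (tag, {k: substitute(d, sigma) for k, d in body.items()})

# Summary of `thread` for `p`, with the array layer omitted for brevity.
summary = ("struct", {"threads": ("variant", {
    "Some": ("struct", {"t": ("deferred", {"tj": frozenset({()})})}),
    "None": BOT})})
# Dependency computed for `tj` in the caller `start_address`.
sigma = {"tj": ("struct", {"stack": ("struct", {
    "start": ("deferred", {"adr": frozenset({()})})})})}
print(substitute(summary, sigma))
```

On this example the deferred leaf for `tj` is replaced by the caller's dependency on `tj`, and the new leaf is again deferred, this time on the caller's own output `adr`, so the resulting summary remains a template for the callers of `start_address`.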


6.6.2 Wrapped Calls and Results

As a simple experiment for verifying the precision of our dependency analysis approach with deferred dependencies, we have replaced all calls to built-in predicates in our previous example predicates thread and start_address, illustrated in Section 6.5.2 and on page 131, respectively, with calls to predicates wrapping every call of this type. We compared the precision of the obtained results, as well as the execution time needed to compute the dependency summaries.

The thread_with_wrapped predicate thus has the following form:

    predicate thread_with_wrapped (process p, int i)
    -> [true: thread ti | None | oob]
    array<option_thread> th; option_thread tio;

    get_threads (p) [true: th] [true -> 1]
    get_ith (th, i) [true: tio | false] [true -> 2, false -> 5]
    switch_option (tio) [none | some: ti] [none -> 4, some -> 3]
    [true]
    [None]
    [oob]

The start_address predicate becomes:

    predicate start_address_wrapped (process p, int j)
    -> [true: int adr | None]
    thread tj; memory_region sj;

    thread (p, j) [true: tj | None | oob] [true -> 1, None -> 4, oob -> 4]
    get_stack (tj) [true: sj] [true -> 2]
    get_start (sj) [true: adr] [true -> 3]
    [true]
    [None]
    [error]

The dependency summaries obtained for each of the two predicates are identical to the ones obtained for the predicates thread and start_address in their original form. The dependency information for thread and start_address is computed in 0.33 milliseconds, while that for the versions with calls to the wrapped built-in predicates, i.e. thread_with_wrapped and start_address_wrapped, is obtained in 0.65 milliseconds. We ran the analysis 10001 times in a loop. The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.

6.7 Related Work

For the past few decades, interprocedural analyses have generated considerable interest in the static analysis community. They expand the scope of analysis beyond a procedure's limits in order to encompass the effect of callees on callers. The precision of both data-flow and control-flow analyses is traditionally characterized in terms of context-sensitivity, i.e. computing information depending on the calling context, or its dual, context-insensitivity. For control-flow analyses, the terms polyvariant and monovariant analyses are used interchangeably for the same distinction (Nielson and Nielson, 1999). In (Midtgaard, 2012), a comprehensive survey of control-flow analyses for functional programs is made. Context-sensitivity has the advantage of increased precision. However, the scalability of such analyses is frequently a major concern. The precision and performance impact of context-sensitivity is discussed by Lhoták and Hendren in (Lhoták and Hendren, 2006). In contrast, Ruf argues in (Ruf, 1995) that context-insensitivity leads to little or no precision penalty. Shapiro and Horwitz argue in (Shapiro and Horwitz, 1997) that using a more precise pointer analysis does, in general, lead to more precise results.

Sharir and Pnueli introduced in (Sharir and Pnueli, 1978) a comprehensive theory of interprocedural data-flow analyses for general frameworks, built around two approaches. The first of them, the functional approach, is based upon computing a context-sensitive summary of a function or procedure call. Procedures are viewed as collections of structured program blocks, and input-output relations are established for each such block. Subsequently, the effect of procedure calls is computed by simply using such relations. The second approach proposed by Sharir and Pnueli is the call-string approach. Broadly speaking, this is based upon avoiding infeasible paths by matching corresponding calls and returns. It can be seen as an extension to intraprocedural data-flow analyses in which only valid interprocedural paths are considered during graph traversal. This is achieved by tagging the propagated data with an encoded history of procedure calls, thus making the interprocedural flow explicit and increasing the accuracy of the propagated information. Both approaches are generic and can be used for a wide variety of analyses. Our form of interprocedural dependency analysis is closer to the functional approach. For each predicate of the analysed program, it computes a dependency summary as an input-output relation, and then uses this summary whenever the predicate is called. Symbolic elements are used to allow callers to inject their own context information.

Though desirable in terms of precision, context-sensitivity is often considered prohibitively costly in terms of performance. In practice, many analyses make a compromise and relax this requirement to a certain degree for the sake of scalability. Our approach is no exception: it constitutes an application of context-sensitive information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. Though not purely context-sensitive, we obtain a gain in precision without sacrificing scalability.

Purely context-sensitive analyses have been developed especially in the area of points-to analyses (Gharat, Khedker, and Mycroft, 2016), but also for information flow control (Hammer and Snelting, 2009) or liveness analysis used for garbage collection (Asati et al., 2014). In (Khedker, Mycroft, and Rawat, 2011), Khedker et al. present a lazy context-sensitive points-to analysis. Points-to information is computed only for the pointers that are live, and the propagation of points-to information is sparse, being restricted to live ranges of pointers. Though our approach is not directly comparable to this approach, it is interesting to make a few general remarks. In (Khedker, Mycroft, and Rawat, 2011), strong liveness is used for identifying the pointers that are directly used or which are used for defining pointers that are strongly live. On the other hand, we use strong dependency to identify and distinguish between input subelements that are directly needed for computing the output and input subelements that are simply copied into and forwarded as outputs. Thus, Khedker et al. prevent the explosion of information by clearly distinguishing between relevant and irrelevant information. We achieve scalability by refining the notion of needed or depending on. Their analysis is fully context-sensitive and is based on the call-string approach (Sharir and Pnueli, 1978); our analysis shows a relaxed form of context-sensitivity and is closer to the functional approach.

Jensen et al. present in (Jensen, Møller, and Thiemann, 2010) a technique based on lazy propagation for context-sensitive interprocedural analysis of JavaScript programs, i.e. programs with objects and first-class functions. Transfer functions may not be distributive, and hence the IFDS technique (Reps, Horwitz, and Sagiv, 1995; Padhye and Khedker, 2013) is not applicable. They propagate data-flow information "by need" in an iterative fixpoint algorithm.

The computation of relevant information is deferred in demand-driven analyses (Horwitz, Reps, and Sagiv, 1995; Heintze and Tardieu, 2001; Zheng and Rugina, 2008; Sridharan et al., 2005) as well. These compute the targeted results only at specific program points, thereby avoiding the effort of computing a global result. We compute dependency summaries with symbolic elements. These can be seen as dependency templates parameterized by a caller's context. Their instantiation is deferred and left to the callers.

6.8 Conclusion

We have presented an extension of our dependency analysis, introducing a relaxed form of context-sensitivity. Our solution is based on computing deferred dependencies, consisting of symbolic access maps into which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty exerted by our previous context-insensitive approach. The introduction of deferred dependencies required the introduction of an additional level of symbolic paths and path sets. However, this extension had a minimal impact on the dependency analysis at the intra- and interprocedural levels, imposing only the modification of the initialisation strategy and of the substitution operation. As we will discuss in Chapter 8, our extension of the dependency analysis with deferred dependencies led to an increase of 10–20% in execution time on our benchmark. However, it obtained more precise dependency information for 50% of the predicates included in the benchmark.


Chapter 7

Correlation Analysis

A thousand fibers connect us [...] and among those fibers, as sympathetic threads, our actions run as causes, and they come back to us as effects.

Herman Melville

7.1 Introduction

In the field of Artificial Intelligence, the frame problem (McCarthy and Hayes 1969) is loosely but frequently described as "knowing what stays the same as actions occur in a changing world" (Morgenstern 1995). In the realm of software verification, the frame problem refers to establishing the boundaries within which functions operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties (Borgida, Mylopoulos, and Reiter 1995) and their verification.

Another frequently used definition of the frame problem in the context of Artificial Intelligence refers to "efficiently determining what remains the same in a changing world" (Morgenstern 1995). This definition is similar to the first, yet the initial words "efficiently determining" give it a subtle but crucial nuance. In this chapter we are rather interested in the latter, and we address the issue of automatically detecting deep-state modifications in the context of αSmil, a functional language. In our "changing world", destructive updates are not allowed. The new state out of a structured value in is obtained by destructuring in and reconstructing it in out, by copying unmodified subvalues from in and replacing in out only what needs to reflect the modification. Thus, referring to old values per se, as one of the three major approaches to specifying frame properties (described in Section 2.3.1) implies, does not make sense. Instead, we have to focus on and detect the relations between the (sub)values of in and out. To this end, we present a static correlation analysis which, when given a predicate that manipulates a structured input, is meant to determine automatically the subset that remains unchanged and is further propagated into the output. Thus, the behaviour of a predicate is summarised by computing relations between parts of the input and parts of the output. The computed correlation summaries are a safe approximation of what part of an input state of a predicate is copied to the output state; they summarise not only what is modified by the predicate, but also how it is modified and to what extent.

Outline. We continue this chapter by illustrating the targeted correlation results on an αSmil example in Section 7.1.1. In Section 7.1.2 we give a brief overview of the characteristics of our correlation analysis and explain the motivation behind some of them. The rest of the chapter focuses on technical details related to the correlation analysis. In Section 7.2 we present our abstract partial equivalence type, a fundamental component of our correlation analysis. It is followed in Section 7.3 by an in-depth presentation of paths and correlations, an intermediate level of abstraction that is imperative for obtaining expressive results. In Section 7.4 we focus on the correlation analysis at an intraprocedural level and illustrate the step-by-step mechanism behind it in Section 7.4.2. A summary of the correlation analysis at an interprocedural level is given in Section 7.5. A possible extension, going beyond the detection of equivalences and handling more general relations, is briefly discussed in Section 7.6. Detecting modifications is traditionally associated with shape and side-effect analyses. In Section 7.7 we review and discuss such approaches.

7.1.1 Targeted Correlation Information

The goal of our analysis and the targeted correlation results can be illustrated on an example predicate such as stop_thread, for instance. This predicate has been introduced in Section 3.1.5 (on page 50), and its body in the αSmil language was shown in Section 4.1 on page 64. We revisit it and illustrate the predicate's body in Figure 7.1.

    predicate stop_thread(process in, int i)
        -> [true: process o | inval]
        array<option_thread> ta; option_thread th;
        thread ti; state s;
     1: ta = in.threads;
     2: th = ta[i];
     3: switch(th) as [Some: ti | None];
     4: s = Blocked;
     5: ti = {ti with current_state = s};
     6: th = Some(ti);
     7: ta = [ta with i = th];
     8: o = {in with threads = ta};
     9: true
    10: inval

    (edge labels shown in the figure: false, None, false)

Figure 7.1 – Body of the stop_thread Predicate

It has two possible execution scenarios: true, when the given index i corresponds to an active thread, and inval otherwise, i.e. when it corresponds to an inactive element or when it lies outside the array's bounds. In the latter case, the predicate exits with the inval label and generates no output. In the former case, stop_thread modifies the state of the i-th active thread by setting it to Blocked and returns the new state of the process in the output o. This is accomplished by destructuring the input process in and copying the array of associated threads into the local variable ta (line 1). The array's i-th element is copied to the local variable th (line 2) and, as it is an active element, its corresponding thread is extracted and put into ti (line 3). The new state for the thread value ti is created by setting its current_state field (line 5) to the state s constructed previously (line 4). The new state o of the process is constructed using ti for its i-th active element (lines 6 and 7) and copying everything else from the input in (line 8). It is interesting to note that for each destructuring step of in there is a corresponding construction step for o, as is visible at lines 1 and 8, 2 and 7, and 3 and 6, for instance.
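The destructure-then-reconstruct pattern described above can be sketched in Python with immutable records. This is only an illustration under stated assumptions, not αSmil semantics: the field types, the use of a tuple for the associative array, and the use of None for the inactive constructor are choices made here, and the comments map each step to the line numbers of Figure 7.1.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Thread:
    identifier: int       # field types are assumptions made for this sketch
    current_state: str
    stack: int


@dataclass(frozen=True)
class Process:
    pid: int
    current_thread: int
    address_space: int
    threads: Tuple[Optional[Thread], ...]  # None models an inactive element


def stop_thread(proc: Process, i: int):
    """Return ("true", new_process) or ("inval", None), one per exit label."""
    ta = proc.threads                           # line 1: destructure the input
    if not 0 <= i < len(ta):                    # line 2: index outside the bounds
        return ("inval", None)
    th = ta[i]
    if th is None:                              # line 3: inactive element
        return ("inval", None)
    ti = replace(th, current_state="Blocked")   # lines 4-5: rebuild the thread
    ta = ta[:i] + (ti,) + ta[i + 1:]            # lines 6-7: rebuild the array
    return ("true", replace(proc, threads=ta))  # line 8: rebuild the process
```

Because no value is updated in place, everything that is not explicitly rebuilt is shared between the input and the output, which is exactly the information the correlation analysis aims to recover statically.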

The targeted correlation results for this predicate are illustrated in Figure 7.2. Our analysis should infer that, between the input process in and the output o, the values of the fields pid, current_thread and address_space are equal. Furthermore, for the threads array of associated threads, it should detect that all elements are equal except the value of the i-th element (as illustrated by $R_{th}$), for which only one of the three fields, namely the current_state field, differs (shown by $R_i$).

[Figure 7.2 relates the fields of in and o: pid, current_thread and address_space are related by =; threads is related by $R_{th}$; within $R_{th}$, the i-th elements are related by $R_i$, which covers the fields stack, current_state and identifier.]

Figure 7.2 – Targeted Correlation Results for Predicate stop_thread

By tracking equalities between pairs of variables of the same type and by defining an abstract partial equivalence type that mirrors the layered structure of associative arrays and algebraic data types, we can detect the equality of the values for the pid, current_thread and address_space fields between the input and the output. However, if we track only equalities between variables of the same type and we ignore the flow of an input's subelement value to a variable (or conversely, the flow of a variable's value to an output's subelement), valuable information is lost. We are not only losing information between inputs and outputs of different types but, by accumulating imprecisions, we also lose information concerning inputs and outputs of the same type, such as the in and o processes of our example. For instance, the equality between the values extracted from the input in and copied into ta and th respectively, as well as the relation between the values of ta and o.threads, and th and o.threads[i], are ignored because neither ta nor th are of the same type as in and o. As a consequence, we lose the information concerning the relation between in's and o's threads values altogether. In order to compute such information, it is imperative to track (cor)relations between variables of different types as well.

7.1.2 Correlation Analysis in a Nutshell

Our correlation analysis is a conservative static analysis, inferring what is modified by an operation and to what extent. It approximates the flow of input values into output values by uncovering equalities and computing correlations as pairs between input parts and the output parts into which these are injected. What is marked as being equal is definitely equal.

[Figure 7.3: a pair of inner paths $(\pi, \rho)$ bound by a relation $R$, and a pair $(\pi', \rho')$ bound by a relation $R'$.]

Figure 7.3 – Intraprocedural Correlations – General Representation

Outputs are often complex compounds of different subparts of different input variables: a subset of the input is modified, while the rest is injected as is. We track the origin of subparts of the output and relate it to subparts of the input. As previously illustrated on our stop_thread example predicate, in order to prevent avoidable over-approximations, we need to avoid dealing with data in a monolithic manner. To this end, it is imperative to consider pairs of different types and granularities as well. As a consequence, we are forced to introduce an additional level of granularity, allowing us to refer not only to variables but also to substructures within them. At the intraprocedural level, illustrated in Figure 7.3, we define correlations as mappings between pairs of inputs and outputs, to which we associate mappings between pairs of valid inner paths and the relations binding them. Correlations for arrays and variants are exemplified in Figures 7.4-a) and 7.4-b).

[Figure 7.4: a) Arrays: $\forall i,\ a[i]\ R\ b[i]$; b) Variants.]

Figure 7.4 – Intraprocedural Domain – Examples

Similarly to our dependency analysis presented in Chapter 5, the correlation analysis is an interprocedural, flow-sensitive, field-sensitive, label-sensitive analysis that handles associative arrays, structures and variant data types. However, unlike the dependency analysis, for which we introduced a relaxed form of context-sensitivity in Chapter 6, the correlation analysis is context-insensitive. Fine-grained equivalence relations between the inputs and outputs of a predicate are computed once and subsequently propagated to its callers.

Our correlation analysis is meant to be used in an interactive verification context. Precise correlation summaries must be computed quickly in order to answer effectively, when combined with dependency summaries, queries regarding the preservation of certain invariants.

7.2 Partial Equivalence Relations

7.2.1 Abstract Partial Equivalence Type

The first step towards automatically reasoning about the propagation of input subelements into output subelements is the definition of an abstract partial equivalence type $\mathcal{R}$ that mimics the structure of algebraic data types and arrays. A partial equivalence relation $R \in \mathcal{R}$ is defined inductively from the two atomic elements, Equal and Any, and mirrors the structure of the concrete types.

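Definition 7.2.1. Partial Equivalence Type $R \in \mathcal{R}$

$$
\begin{array}{rcll}
R & ::= & \mathit{Equal} & \text{atomic case – equal} \quad (i) \\
  & \mid & \mathit{Any} & \text{atomic case – unrelated} \quad (ii) \\
  & \mid & \{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} & f_1, \dots, f_n \text{ fields} \quad (iii) \\
  & \mid & [C_1 \mapsto R_1; \dots; C_n \mapsto R_n] & C_1, \dots, C_n \text{ constructors} \quad (iv) \\
  & \mid & \langle R_{def} \rangle & \text{array} \quad (v) \\
  & \mid & \langle R_{def} \triangleright i : R_{exc} \rangle & i \text{ array index} \quad (vi)
\end{array}
$$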

Such relations represent fine-grained partial equivalences between pairs of values of the same type. Equal and Any represent equal and unrelated values, respectively. Partial equivalence relations for structures (given by (iii)) and for variants (given by (iv)) are expressed in terms of the partial equivalences of their subparts, by mapping each field or constructor to the corresponding relations. As for the dependency analysis presented in Chapter 5, for arrays we distinguish between two cases, namely arrays with a general relation applying to all of the cells (as given by (v)) or to all but one exceptional cell (as given by (vi)), for which a specific relation is known to hold.
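As an illustration, the six cases of Definition 7.2.1 can be encoded as tagged Python values. The encoding below (strings for the atomic cases, tagged tuples for the others, and the helper names struct, variant, arr and arr_exc) is an assumption made for this sketch. The relation built at the end is one plausible encoding of the results targeted for stop_thread in Figure 7.2, assuming that the Some constructor carries a field named t.

```python
EQUAL, ANY = "Equal", "Any"  # atomic cases (i) and (ii)


def struct(fields):
    """Case (iii): one relation per field."""
    return ("struct", dict(fields))


def variant(constructors):
    """Case (iv): one relation per constructor."""
    return ("variant", dict(constructors))


def arr(default):
    """Case (v): the same relation for every cell."""
    return ("arr", default)


def arr_exc(default, index, exception):
    """Case (vi): `exception` holds for the cell at `index`, `default` elsewhere."""
    return ("arr_exc", default, index, exception)


# Relation between the i-th threads: only current_state may differ.
r_i = struct({"identifier": EQUAL, "current_state": ANY, "stack": EQUAL})
# Relation between the threads arrays: all cells equal except the i-th one.
r_th = arr_exc(EQUAL, "i", variant({"None": ANY, "Some": struct({"t": r_i})}))
# Relation between the processes `in` and `o`.
r_process = struct({
    "pid": EQUAL,
    "current_thread": EQUAL,
    "address_space": EQUAL,
    "threads": r_th,
})
```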

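The preorder relation of the partial equivalence lattice is denoted by $\sqsubseteq_{\mathcal{R}}$ and defined below.

Definition 7.2.2. Preorder Relation $\sqsubseteq_{\mathcal{R}}$

$$\sqsubseteq_{\mathcal{R}} \ \subseteq\ \mathcal{R} \times \mathcal{R}$$

It is detailed in Table 7.1.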

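Table 7.1 – $\sqsubseteq_{\mathcal{R}}$ – Comparison of Two Domains

$$\frac{}{R \sqsubseteq_{\mathcal{R}} \mathit{Any}}\ \text{Top} \qquad\qquad \frac{}{\mathit{Equal} \sqsubseteq_{\mathcal{R}} R}\ \text{Bot}$$

$$\frac{R_1 \sqsubseteq_{\mathcal{R}} R'_1 \quad \dots \quad R_n \sqsubseteq_{\mathcal{R}} R'_n}{\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} \sqsubseteq_{\mathcal{R}} \{f_1 \mapsto R'_1; \dots; f_n \mapsto R'_n\}}\ \text{Str}$$

$$\frac{R_1 \sqsubseteq_{\mathcal{R}} R'_1 \quad \dots \quad R_n \sqsubseteq_{\mathcal{R}} R'_n}{[C_1 \mapsto R_1; \dots; C_n \mapsto R_n] \sqsubseteq_{\mathcal{R}} [C_1 \mapsto R'_1; \dots; C_n \mapsto R'_n]}\ \text{Var}$$

$$\frac{R \sqsubseteq_{\mathcal{R}} R'}{\langle R \rangle \sqsubseteq_{\mathcal{R}} \langle R' \rangle}\ \text{Adef} \qquad\qquad \frac{R_{def} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{exc}}{\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle}\ \text{AI}$$

$$\frac{R_{def} \sqsubseteq_{\mathcal{R}} R' \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'}{\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R' \rangle}\ \text{AIA} \qquad\qquad \frac{R \sqsubseteq_{\mathcal{R}} R'_{def} \quad R \sqsubseteq_{\mathcal{R}} R'_{exc}}{\langle R \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle}\ \text{AAI}$$

$$\frac{i \neq j \quad R_{def} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{def} \sqsubseteq_{\mathcal{R}} R'_{exc} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{exc}}{\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle}\ \text{AIJ}$$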

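The join and meet operations are denoted by $\vee_{\mathcal{R}}$ and $\wedge_{\mathcal{R}}$, respectively.

Definition 7.2.3. Join Operation $\vee_{\mathcal{R}}$

$$\vee_{\mathcal{R}} : \mathcal{R} \times \mathcal{R} \to \mathcal{R}$$

Definition 7.2.4. Meet Operation $\wedge_{\mathcal{R}}$

$$\wedge_{\mathcal{R}} : \mathcal{R} \times \mathcal{R} \to \mathcal{R}$$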

Both are commutative operations, applied pointwise on each subelement. Join, shown in Table 7.2, has Equal as its identity element and Any as its absorbing element. Meet, shown in Table 7.3, has Equal as its absorbing element and Any as its identity element. For both operations, the undisplayed cases are defined by their symmetrical counterparts.

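Table 7.2 – Partial Equivalences – $\vee_{\mathcal{R}}$ – Join Operation

$$
\begin{array}{rcl}
\mathit{Any} \vee_{\mathcal{R}} R & = & \mathit{Any} \\
\mathit{Equal} \vee_{\mathcal{R}} R & = & R \\
\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} \vee_{\mathcal{R}} \{f_1 \mapsto R'_1; \dots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \vee_{\mathcal{R}} R'_1; \dots; f_n \mapsto R_n \vee_{\mathcal{R}} R'_n\} \\
{[C_1 \mapsto R_1; \dots; C_n \mapsto R_n]} \vee_{\mathcal{R}} {[C_1 \mapsto R'_1; \dots; C_n \mapsto R'_n]} & = & [C_1 \mapsto R_1 \vee_{\mathcal{R}} R'_1; \dots; C_n \mapsto R_n \vee_{\mathcal{R}} R'_n] \\
\langle R \rangle \vee_{\mathcal{R}} \langle R' \rangle & = & \langle R \vee_{\mathcal{R}} R' \rangle \\
\langle R \rangle \vee_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \vee_{\mathcal{R}} R'_{def} \triangleright i : R \vee_{\mathcal{R}} R'_{exc} \rangle \\
\langle R_{def} \triangleright i : R_{exc} \rangle \vee_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\begin{cases}
\langle R_{def} \vee_{\mathcal{R}} R'_{def} \triangleright i : R_{exc} \vee_{\mathcal{R}} R'_{exc} \rangle & \text{if } i = j \\
\langle R_{def} \vee_{\mathcal{R}} R'_{def} \vee_{\mathcal{R}} R_{exc} \vee_{\mathcal{R}} R'_{exc} \rangle & \text{if } i \neq j
\end{cases}
\end{array}
$$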

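Table 7.3 – Partial Equivalences – $\wedge_{\mathcal{R}}$ – Meet Operation

$$
\begin{array}{rcl}
\mathit{Any} \wedge_{\mathcal{R}} R & = & R \\
\mathit{Equal} \wedge_{\mathcal{R}} R & = & \mathit{Equal} \\
\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} \wedge_{\mathcal{R}} \{f_1 \mapsto R'_1; \dots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \wedge_{\mathcal{R}} R'_1; \dots; f_n \mapsto R_n \wedge_{\mathcal{R}} R'_n\} \\
{[C_1 \mapsto R_1; \dots; C_n \mapsto R_n]} \wedge_{\mathcal{R}} {[C_1 \mapsto R'_1; \dots; C_n \mapsto R'_n]} & = & [C_1 \mapsto R_1 \wedge_{\mathcal{R}} R'_1; \dots; C_n \mapsto R_n \wedge_{\mathcal{R}} R'_n] \\
\langle R \rangle \wedge_{\mathcal{R}} \langle R' \rangle & = & \langle R \wedge_{\mathcal{R}} R' \rangle \\
\langle R \rangle \wedge_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \wedge_{\mathcal{R}} R'_{def} \triangleright i : R \wedge_{\mathcal{R}} R'_{exc} \rangle \\
\langle R_{def} \triangleright i : R_{exc} \rangle \wedge_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\begin{cases}
\langle R_{def} \wedge_{\mathcal{R}} R'_{def} \triangleright i : R_{exc} \wedge_{\mathcal{R}} R'_{exc} \rangle & \text{if } i = j \\
\langle R_{def} \wedge_{\mathcal{R}} R'_{def} \wedge_{\mathcal{R}} R_{exc} \wedge_{\mathcal{R}} R'_{exc} \rangle & \text{if } i \neq j
\end{cases}
\end{array}
$$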
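Since Tables 7.2 and 7.3 have the same shape and differ only in which atomic element is the identity and which is absorbing, both operations can be sketched with one shared helper. The code below is a minimal illustration under an assumed encoding (strings "Equal" and "Any" for the atomic cases, tagged tuples ("struct", {...}), ("variant", {...}), ("arr", R) and ("arr_exc", R_def, i, R_exc) otherwise); the names _combine, join and meet are hypothetical.

```python
EQUAL, ANY = "Equal", "Any"


def _combine(r, s, identity, absorbing):
    """Pointwise combination shared by join (Table 7.2) and meet (Table 7.3)."""
    if r == absorbing or s == absorbing:
        return absorbing
    if r == identity:
        return s
    if s == identity:
        return r

    def rec(a, b):
        return _combine(a, b, identity, absorbing)

    if r[0] == s[0] and r[0] in ("struct", "variant"):
        return (r[0], {k: rec(r[1][k], s[1][k]) for k in r[1]})
    if r[0] == "arr" and s[0] == "arr":
        return ("arr", rec(r[1], s[1]))
    if r[0] == "arr" and s[0] == "arr_exc":
        return ("arr_exc", rec(r[1], s[1]), s[2], rec(r[1], s[3]))
    if r[0] == "arr_exc" and s[0] == "arr":
        return rec(s, r)                      # symmetrical counterpart
    if r[0] == "arr_exc" and s[0] == "arr_exc":
        if r[2] == s[2]:                      # same exceptional index
            return ("arr_exc", rec(r[1], s[1]), r[2], rec(r[3], s[3]))
        # different indices: collapse into a single relation for all cells
        return ("arr", rec(rec(r[1], s[1]), rec(r[3], s[3])))
    raise TypeError("relations of incompatible shapes")


def join(r, s):
    return _combine(r, s, identity=EQUAL, absorbing=ANY)


def meet(r, s):
    return _combine(r, s, identity=ANY, absorbing=EQUAL)
```

For instance, joining two array relations whose exceptional indices have different names collapses them into a single relation for all cells, as in the last row of Table 7.2.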

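Additionally, extraction functions are defined for partial equivalence relations.

Definition 7.2.5. Extraction of a Field's Relation

$$\mathit{extr}_f : \mathcal{R} \nrightarrow \mathcal{R}$$

Definition 7.2.6. Extraction of a Constructor's Relation

$$\mathit{extr}_C : \mathcal{R} \nrightarrow \mathcal{R}$$

Definition 7.2.7. Extraction of a Cell's Relation

$$\mathit{extr}_{\langle i \rangle} : \mathcal{R} \nrightarrow \mathcal{R}$$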

These are partial functions and can only be applied on relations of the corresponding types. For example, the field extraction $\mathit{extr}_f$ only makes sense for atomic relations or for structured relations having a field named f, which should be the case if the relation connects two values of a structured type with a field f. For any of the two atomic relations, Equal or Any, applying any of these extractions yields Equal or Any, respectively. They are summarized in Table 7.4.

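Table 7.4 – Partial Equivalence Extractions

$\mathit{extr}_f(R)$, $f \in \mathcal{F}$:

$$
\begin{array}{rcl}
\mathit{extr}_f(\mathit{Any}) & = & \mathit{Any} \\
\mathit{extr}_f(\mathit{Equal}) & = & \mathit{Equal} \\
\mathit{extr}_f(\{f_1 \mapsto R_1; \dots; f_i \mapsto R_i; \dots; f_n \mapsto R_n\}) & = & R_i \quad \text{if } f = f_i
\end{array}
$$

$\mathit{extr}_C(R)$, $C \in \mathcal{C}$:

$$
\begin{array}{rcl}
\mathit{extr}_C(\mathit{Any}) & = & \mathit{Any} \\
\mathit{extr}_C(\mathit{Equal}) & = & \mathit{Equal} \\
\mathit{extr}_C([C_1 \mapsto R_1; \dots; C_i \mapsto R_i; \dots; C_n \mapsto R_n]) & = & R_i \quad \text{if } C = C_i
\end{array}
$$

$\mathit{extr}_{\langle i \rangle}(R)$:

$$
\begin{array}{rcl}
\mathit{extr}_{\langle i \rangle}(\mathit{Any}) & = & \mathit{Any} \\
\mathit{extr}_{\langle i \rangle}(\mathit{Equal}) & = & \mathit{Equal} \\
\mathit{extr}_{\langle i \rangle}(\langle R \rangle) & = & R \\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright i : R_{exc} \rangle) & = & R_{exc} \\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright j : R_{exc} \rangle) & = & R_{def} \vee_{\mathcal{R}} R_{exc} \quad \text{if } i \neq j
\end{array}
$$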
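As a worked illustration of Table 7.4, consider a relation between two arrays of optional threads in the spirit of the stop_thread example. The names $R$ and $R_{opt}$ are introduced here for the illustration only, and the Some constructor is assumed to carry a field named t:

$$
\begin{aligned}
R_i &= \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\} \\
R_{opt} &= [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto R_i\}] \\
R &= \langle \mathit{Equal} \triangleright i : R_{opt} \rangle \\[4pt]
\mathit{extr}_{\langle i \rangle}(R) &= R_{opt} \\
\mathit{extr}_{\langle j \rangle}(R) &= \mathit{Equal} \vee_{\mathcal{R}} R_{opt} = R_{opt} \qquad (i \neq j) \\
\mathit{extr}_{\mathit{Some}}(R_{opt}) &= \{t \mapsto R_i\} \\
\mathit{extr}_{t}(\{t \mapsto R_i\}) &= R_i \\
\mathit{extr}_{\mathit{current\_state}}(R_i) &= \mathit{Any}
\end{aligned}
$$

The second extraction shows the cost of an index that is not known to be the exceptional one: since Equal is the identity of the join, the result is as imprecise as the exceptional relation itself.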

7.2.2 Well-Typed Partial Equivalences and their Semantics

As discussed in the case of dependencies in Section 5.2.2, syntactic partial equivalences are untyped. However, their interpretation is made in the context of a type $\tau \in \mathcal{T}$. The atomic cases, Equal and Any, can apply to any type, since they do not exhibit any data type features. Cases other than Equal and Any only have non-empty interpretations for types $\tau$ which are compatible with their shape. For instance, the structured relation $\{f \mapsto R\}$ only really makes sense for structured types with a single field f whose type itself is compatible with $R$, and will not be used in connection with variant or array types, for example. In Table 7.5 we detail the inference rules related to the well-typedness of partial equivalences. This is described as a judgement parameterized by a typing environment $\Gamma$ (Definition 4.3.1).

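$$\frac{}{\Gamma \vdash \mathit{Equal} : \tau}\ \text{WT}\top \qquad\qquad \frac{}{\Gamma \vdash \mathit{Any} : \tau}\ \text{WT}\bot$$

$$\frac{\tau = \mathit{struct}\{f_1 : \tau_1; \dots; f_n : \tau_n\} \quad \Gamma \vdash R_1 : \tau_1 \quad \dots \quad \Gamma \vdash R_n : \tau_n}{\Gamma \vdash \{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} : \tau}\ \text{WTStruct}$$

$$\frac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \dots \mid C_n : \tau_n] \quad \Gamma \vdash R_1 : \tau_1 \quad \dots \quad \Gamma \vdash R_n : \tau_n}{\Gamma \vdash [C_1 \mapsto R_1; \dots; C_n \mapsto R_n] : \tau}\ \text{WTVar}$$

$$\frac{\Gamma \vdash R : \tau}{\Gamma \vdash \langle R \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \text{WTArr}$$

$$\frac{\Gamma \vdash R_{def} : \tau \quad \Gamma \vdash R_{exc} : \tau \quad \Gamma(i) = \tau_i}{\Gamma \vdash \langle R_{def} \triangleright i : R_{exc} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \text{WTArrI}$$

Table 7.5 – Well-Typed Partial Equivalences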

The atomic values are generic: they are well-typed with respect to any type (WT$\top$, WT$\bot$). The partial equivalences of structures (WTStruct) are well-typed only with respect to an adequate structured type whose field types are themselves compatible with the equivalences mapped to them. Similarly, the partial equivalences of variants (WTVar) are well-typed only with respect to an adequate variant type. In turn, the constructors must themselves be pointwise compatible with the equivalences mapped to them. For well-typed array equivalences (WTArr, WTArrI), the default relation as well as the exceptional relation have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of i, the index of the known exceptional equivalence relation, has to be compatible with $\tau_i$, the array's index type.

The semantics of a partial equivalence $R$ for a type $\tau$ is a partial equivalence relation over values of type $\tau$. Given a valuation $E$ from variables to semantic values (Definition 4.4.2), the interpretation $[\![R]\!]_\tau$ of a relation $R \in \mathcal{R}$ with respect to some type $\tau$ is a binary relation over $\mathcal{D}_\tau$ (Definition 4.4.1). The interpretation $[\![R]\!]_\tau$ is defined as shown in Table 7.6.

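$$
\begin{array}{rcl}
[\![\mathit{Equal}]\!]_\tau & = & \{(x, x) \mid x \in \mathcal{D}_\tau\} \\[4pt]
[\![\mathit{Any}]\!]_\tau & = & \mathcal{D}_\tau \times \mathcal{D}_\tau \\[4pt]
[\![\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\}]\!]_{\mathit{struct}\{f_1 : \tau_1; \dots; f_n : \tau_n\}} & = & \{(\{f_1 = v_1; \dots; f_n = v_n\}, \{f_1 = w_1; \dots; f_n = w_n\}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\} \\[4pt]
[\![[C_1 \mapsto R_1; \dots; C_n \mapsto R_n]]\!]_{\mathit{variant}[C_1 : \tau_1 \mid \dots \mid C_n : \tau_n]} & = & \{(C_i[v_i], C_i[w_i]) \mid (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\} \\[4pt]
[\![\langle R_{def} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = & \{((P, (v)_k), (P, (w)_k)) \mid \forall k,\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\} \\[4pt]
[\![\langle R_{def} \triangleright i : R_{exc} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = & \{((P, (v)_k), (P, (w)_k)) \mid E(i) \in P \implies (v_{E(i)}, w_{E(i)}) \in [\![R_{exc}]\!]_\tau,\ \ \forall k \neq E(i),\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\}
\end{array}
$$

Table 7.6 – Partial Equivalence Relations – Semantics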

A partial equivalence relation $R$ only relates values of the same type $\tau$, which must be compatible with $R$'s "shape". For structures, a partial equivalence relates pointwise the values of the fields of the two structure values. For variant values, a partial equivalence relation relates values built with the same constructor $C_i$, using arguments whose values are related by a relation $R_i$. For arrays, $P$ indicates the support type, which has to be identical for both values. The values of the array elements are pointwise related by the same relation $R_{def}$, with the exception of the i-th elements, which are potentially related by an exceptional relation $R_{exc}$. Since variables i are used for indicating the exceptional elements, the valuation $E$ is used for determining the value of i.
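The semantics of Table 7.6 can be read as a membership test on pairs of concrete values. The sketch below assumes a hypothetical encoding that is not taken from the text: relations are tagged tuples ("struct", {...}), ("variant", {...}), ("arr", R) and ("arr_exc", R_def, i, R_exc), with "Equal" and "Any" as strings; structure values are dicts, variant values are (constructor, argument) pairs, arrays are dicts whose key set plays the role of the support P, and env stands for the valuation E.

```python
EQUAL, ANY = "Equal", "Any"


def related(r, v, w, env):
    """Check whether the pair (v, w) belongs to the interpretation of r."""
    if r == ANY:
        return True
    if r == EQUAL:
        return v == w
    tag = r[0]
    if tag == "struct":        # fields are related pointwise
        return all(related(r[1][f], v[f], w[f], env) for f in r[1])
    if tag == "variant":       # same constructor, related arguments
        (cv, av), (cw, aw) = v, w
        return cv == cw and related(r[1][cv], av, aw, env)
    if tag == "arr":           # same support, every cell related by r[1]
        return v.keys() == w.keys() and all(
            related(r[1], v[k], w[k], env) for k in v)
    if tag == "arr_exc":       # the cell at E(i) uses the exceptional relation
        _, rdef, i, rexc = r
        e = env[i]
        return v.keys() == w.keys() and all(
            related(rexc if k == e else rdef, v[k], w[k], env) for k in v)
    raise ValueError(f"unknown relation: {r!r}")


r_i = ("struct", {"identifier": EQUAL, "current_state": ANY, "stack": EQUAL})
r_th = ("arr_exc", EQUAL, "i",
        ("variant", {"None": ANY, "Some": ("struct", {"t": r_i})}))

old = {0: ("Some", {"t": {"identifier": 7, "current_state": "Running", "stack": 3}}),
       1: ("None", None)}
new = {0: ("Some", {"t": {"identifier": 7, "current_state": "Blocked", "stack": 3}}),
       1: ("None", None)}
```

Here related(r_th, old, new, {"i": 0}) holds, whereas the same check under the valuation {"i": 1} fails, because cell 0 would then have to satisfy Equal.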

7.3 Paths and Correlations

7.3.1 Paths and Correlation Types

The partial equivalence relations discussed in Section 7.2 and defined in 7.2.1 are enough to represent fine-grained information for values of the same structured type. For the stop_thread example discussed in Section 7.1.1, these would suffice to express the equality of the pid, current_thread and address_space fields between the input process in and the output process o, by simply mapping this pair to the following partial equivalence:

$$\{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{current\_thread} \mapsto \mathit{Equal};\ \mathit{address\_space} \mapsto \mathit{Equal}\}$$

However, the partial equivalence relations cannot, for instance, be used to convey the equality at line 1 in Figure 7.1 between the value of the threads field of in and the local ta variable. By not tracking information such as this, we lose the targeted information regarding the threads field, denoted by $R_{th}$ in Figure 7.2. In order to express this information, we first need to be able to refer to the substructure in.threads and relate its value to the one of ta.

To this end, rather than handling only partial equivalences between pairs of variables of the same type and approximating the rest to Any (the element that conveys no information), we introduce an intermediate level allowing us to store relations between subparts of values. We begin by introducing access paths. Unlike the symbolic paths introduced in Chapter 6 and defined in 6.3.1, which are used for computing dependency summaries with context-sensitive elements, the paths used for the correlation analysis are actual access paths inside some value's structure. The symbolic paths used in deferred dependencies may cover multiple actual paths inside a value, whereas the access paths required for the correlation analysis represent unique chains of internal accesses leading to a single nested subvalue. Each access path is rooted at one of the program's variables. It is noteworthy that in both cases an intermediate level below variables needs to be introduced as soon as fine-grained relations between pairs of variables are considered, directly or indirectly. In the case of deferred dependencies this was not the main goal per se, but rather a mechanism for obtaining more precision in specific cases for already pertinent dependency results. In contrast, for the correlation results this is imperative for obtaining useful, expressive information in non-trivial cases. We therefore define a recursive type $\pi \in \Pi$ encompassing this.

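Definition 7.3.1. Access Path Type $\pi \in \Pi$

$$
\begin{array}{rcll}
\pi & ::= & \varepsilon & \text{empty – root} \\
    & \mid & .f\,\pi & f \in \mathcal{F} \\
    & \mid & @C\,\pi & C \in \mathcal{C} \\
    & \mid & \langle i \rangle\,\pi & i \text{ index, program variable}
\end{array}
$$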

The empty path, denoted by $\varepsilon$, is the special case denoting an access to an entire element, i.e. the root. The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi\,\pi'$. For instance, the path denoting the current_state field of the i-th active associated thread of the in process of our stop_thread predicate would be the following: in.threads⟨i⟩@Some.t.current_state.
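To make the notion of an access path concrete, the sketch below follows a path inside a value. The encoding is an assumption made for illustration: a path is a tuple of steps ("field", f), ("ctor", C) or ("cell", i), structure values are dicts, variant values are (constructor, argument) pairs, arrays are dicts from indices to cells, and env maps index variables to concrete indices. The function name follow and the sentinel MISSING are hypothetical.

```python
MISSING = object()  # sentinel: the path does not lead to a subvalue


def follow(path, value, env):
    """Return the subvalue reached by `path` in `value`, or MISSING."""
    for kind, name in path:
        if kind == "field":
            value = value[name]
        elif kind == "ctor":
            ctor, arg = value
            if ctor != name:         # value built with another constructor
                return MISSING
            value = arg
        elif kind == "cell":
            index = env[name]
            if index not in value:   # index outside the array's support
                return MISSING
            value = value[index]
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return value


proc = {
    "pid": 1,
    "threads": {
        0: ("Some", {"t": {"identifier": 7, "current_state": "Running", "stack": 3}}),
        1: ("None", None),
    },
}
# Steps corresponding to threads, cell i, constructor Some, field t, field current_state
path = (("field", "threads"), ("cell", "i"), ("ctor", "Some"),
        ("field", "t"), ("field", "current_state"))
```

The empty tuple plays the role of the empty path: follow((), proc, {}) returns proc itself.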

Meaningful information is conveyed by associating paths and partial equivalence relations. For instance, the equality between in.threads and ta at line 1 in Figure 7.1 can be expressed by associating Equal to the pair of subelements identified by the .threads path in in and by $\varepsilon$ in ta. We call such a mapping, from a pair of access paths to a partial relation, a correlation. After setting the i-th element of ta to ti, the thread with the current state set to Blocked and everything else left unmodified, we could express the relation between in and ta by two correlations, namely:

$$(.\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle$$

$$(.\mathit{threads}\langle i \rangle @\mathit{Some}.t,\ \langle i \rangle @\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}$$

To this end, we introduce correlation maps $\kappa \in \mathcal{K}$, defined below.

Definition 7.3.2. Correlation Maps $\kappa \in \mathcal{K}$

Correlation maps $\kappa \in \mathcal{K}$ are finite mappings from pairs of paths to partial equivalence relations $R \in \mathcal{R}$:

$$\kappa : \Pi \times \Pi \to \mathcal{R}$$

Generally, for two given variables e and o, a correlation $(\pi, \rho) \mapsto R$ specifies that e and o have nested subelements, respectively identified by the inner paths $\pi$ and $\rho$, whose values are related by the relation $R$.

We conclude this subsection by specifying what it means for paths, correlations and correlation maps to be well-typed.

For characterizing the contexts in which an access path $\pi$ is well-typed, we need to consider the types of values to which it can be applied and the types of (sub)values to which it can lead. Therefore, in the following we define a typing judgement for access paths as a three-place relation $\pi : \tau \to \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and, in that case, it will always describe subvalues of type $\tau'$. Additionally, the typing judgement is also parameterized by a set of input variables $I$, which are the variables having the right to appear as identifiers for array accesses. This is detailed in Table 7.7.

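$$\frac{}{\Gamma, I \vdash \varepsilon : \tau \to \tau}\ \text{WT}\varepsilon$$

$$\frac{\tau = \mathit{struct}\{f_1 : \tau_1; \dots; f_i : \tau_i; \dots; f_n : \tau_n\} \quad \Gamma, I \vdash \pi_i : \tau_i \to \tau'}{\Gamma, I \vdash .f_i\,\pi_i : \tau \to \tau'}\ \text{WTStructAPath}$$

$$\frac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \dots \mid C_i : \tau_i \mid \dots \mid C_n : \tau_n] \quad \Gamma, I \vdash \pi_i : \tau_i \to \tau'}{\Gamma, I \vdash @C_i\,\pi_i : \tau \to \tau'}\ \text{WTVarAPath}$$

$$\frac{\Gamma, I \vdash \pi_i : \tau \to \tau' \quad \Gamma(i) = \tau_i \quad i \in I}{\Gamma, I \vdash \langle i \rangle\,\pi_i : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ \text{WTCellAPath}$$

Table 7.7 – Well-Typed Access Paths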

Correlations are mappings from pairs of access paths to partial relations. Though the two access paths can be applied to values of different types, they both need to return subvalues of the same type $\tau'$. Furthermore, the partial equivalence relation associated to them has to be well-typed with respect to $\tau'$, as detailed in Table 7.5. The inference rule for well-typed correlations is shown in Table 7.8.

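$$\frac{\Gamma, I \vdash \pi : \tau_l \to \tau' \quad \Gamma, I \vdash \rho : \tau_r \to \tau' \quad \Gamma \vdash R : \tau'}{\Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}\ \text{WTCorrelation}$$

Table 7.8 – Well-Typed Correlations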

Finally, as shown in Table 7.9, a correlation map $\kappa$ is well-typed if all the correlations it contains are well-typed.

$$\frac{\forall\, (\pi, \rho) \mapsto R \in \kappa,\quad \Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}{\Gamma, I \vdash \kappa : (\tau_l, \tau_r)}\ \text{WTCorMaps}$$

Table 7.9 – Well-Typed Correlation Maps

7.3.2 Alignment and Partial Order

There is no clear choice for a canonical form for correlations. For instance, it is equivalent to write $(\varepsilon, \varepsilon) \mapsto \{f \mapsto R\}$ and $(.f, .f) \mapsto R$. Is one superior to the other? Which one should be chosen? Operations can create and manipulate correlations in different manners that are hard to predict. New correlations can also be introduced while considering def-use chains in the transfer function, presented later in Section 7.4.1. Choosing between the two forms considerably limits flexibility. Not choosing a canonical form, however, has consequences as well; notably, it renders the definition of a partial order between correlation maps difficult. In order to compare two correlation maps $\kappa_1$ and $\kappa_2$, we cannot simply verify if the path pairs are identical and compare their associated relations. A correlation of the second map could be linked in different manners to multiple mappings of the first.

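For instance, between a process p of the type used by our stop_thread example and an array ta of the same type as the field threads of the process, we might have the following correlation maps:

$$\kappa_1 :\quad (.\mathit{threads},\ \varepsilon) \mapsto \langle [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \rangle$$

$$\kappa_2 :\quad
\begin{array}{l}
(.\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle \\
(.\mathit{threads}\langle i \rangle @\mathit{Some}.t,\ \langle i \rangle @\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}$$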

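These correlation maps can be depicted as follows.

[Figure: in $\kappa_1$, the threads field of p and the root ($\varepsilon$) of ta are related by $R_1$; in $\kappa_2$, they are related by $R_2$, and their nested i-th elements are related by $R'_2$.]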

As illustrated above, in the given example map $\kappa_2$, in addition to the relation $R_2$ associated to $(.\mathit{threads}, \varepsilon)$, the relation associated to $(.\mathit{threads}\langle i \rangle @\mathit{Some}.t,\ \langle i \rangle @\mathit{Some}.t)$ and denoted by $R'_2$ expresses information about the values of the process's threads field and ta as well. These are nested in the i-th element of each, as identified by $\langle i \rangle @\mathit{Some}.t$. In order to compare these two correlation maps, we have to first determine the relationships between the pair of paths $(.\mathit{threads}, \varepsilon)$ from $\kappa_1$ and each pair of paths of $\kappa_2$. The first pair of paths in $\kappa_2$ is identical, whereas the second pair refers to elements that are further away from the root. Based on these relationships, we have to extract all the information relevant to $(.\mathit{threads}, \varepsilon)$ from $\kappa_2$ and consider it in its entirety. This amounts to:

$$(.\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \rangle$$

Having expressed the information from the $\kappa_2$ correlation map at the same level as the information of $\kappa_1$ is expressed, i.e. that of the pair of paths $(.\mathit{threads}, \varepsilon)$, we can finally compare them and conclude that the information contained by $\kappa_2$ is more precise than the relation associated to $(.\mathit{threads}, \varepsilon)$ in $\kappa_1$. The relation associated to $(.\mathit{threads}, \varepsilon)$ in $\kappa_1$ captures the equality between the values of the identifier and stack fields of all active thread elements of the two arrays identified by the paths. The relation associated to $(.\mathit{threads}, \varepsilon)$ in $\kappa_2$ expresses the equality between all thread elements of the two arrays except the i-th elements. Furthermore, if the i-th elements of the two arrays are active, it captures the equality between the values of the identifier and stack fields. Thus, by using the information contained by $\kappa_1$, we can conclude that for all active elements of the two arrays the values of 2 out of the 3 fields are equal; by using the more precise information contained by $\kappa_2$, we can conclude that all elements of the two arrays are equal except the i-th one, for which the values of the same 2 out of 3 fields as in $\kappa_1$ are equal.

In the general case, for comparing two correlation maps $\kappa_1$ and $\kappa_2$, we need to collect, for each correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained by $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$ and verify if this covers at least the same information as the relation $R$. This information could be scattered across multiple mappings of the correlation map $\kappa_1$. We call alignment the process of collecting, for any correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained in $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$. It is necessary in the absence of a canonical form, a trait of our approach that is both a weakness and a strength: it leads to complex computations but gives considerable flexibility, as will be shown in Section 7.4.

For aligning, we first determine the relationships between paths by determining the relationship between the sequences of internal accesses that they represent. These can be identical, representing the same traversal to the same subelement of a value, or they can be completely unrelated, such as .f and .g for instance, representing accesses to two different fields of a structure. They can also represent sequences of accesses of different depths, one being the prefix of the other, i.e. being closer to the root. For example, the path .f is a prefix of the path .f⟨i⟩: the first represents the access to the field f, whereas the second one represents an access to the i-th element of the array nested in the field f.

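To distinguish between these cases, we define a link type and a matching operator.

Definition 7.3.3. Link Type $\mu \in \mathcal{M}$

A link type, denoted by $\mu \in \mathcal{M}$, is defined as follows:

$$\mu ::= \mathit{Identical} \mid \mathit{Left}\ \pi \mid \mathit{Right}\ \pi \mid \mathit{Incompatible}$$

Definition 7.3.4. Matching Operator $\curlywedge$

The matching operator $\curlywedge$ retrieves the link $\mu$ between two paths:

$$
\curlywedge : \Pi \times \Pi \to \mathcal{M} \qquad
\curlywedge(\pi, \rho) =
\begin{cases}
\mathit{Identical} & \text{if } \pi = \rho \\
\mathit{Left}\ \pi' & \text{if } \pi\,\pi' = \rho \\
\mathit{Right}\ \rho' & \text{if } \rho\,\rho' = \pi \\
\mathit{Incompatible} & \text{otherwise}
\end{cases}
$$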
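The matching operator amounts to a prefix test on the two sequences of accesses. The sketch below assumes, for illustration only, that a path is encoded as a tuple of steps ("field", f), ("ctor", C) or ("cell", i), and that a link is a tuple whose first component names the case of Definition 7.3.3; the function name match_paths is hypothetical.

```python
def match_paths(pi, rho):
    """Return the link between two access paths encoded as tuples of steps."""
    if pi == rho:
        return ("Identical",)
    if rho[:len(pi)] == pi:          # pi followed by a suffix pi' gives rho
        return ("Left", rho[len(pi):])
    if pi[:len(rho)] == rho:         # rho followed by a suffix rho' gives pi
        return ("Right", pi[len(rho):])
    return ("Incompatible",)


f = (("field", "f"),)
g = (("field", "g"),)
f_i = (("field", "f"), ("cell", "i"))
```

With these paths, match_paths(f, f_i) yields a Left link carrying the remaining step ⟨i⟩, while match_paths(f, g) is Incompatible.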

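The different cases are depicted in Table 7.11.

- $\curlywedge(\pi, \rho) = \mathit{Identical}$: $\pi$ and $\rho$ denote the same sequence of accesses.
- $\curlywedge(\pi, \rho) = \mathit{Left}\ \pi'$: $\pi$ extended with $\pi'$ reaches the same subelement as $\rho$.
- $\curlywedge(\pi, \rho) = \mathit{Right}\ \rho'$: $\rho$ extended with $\rho'$ reaches the same subelement as $\pi$.
- $\curlywedge(\pi, \rho) = \mathit{Incompatible}$: $\pi$ and $\rho$ lead to unrelated subelements.

Table 7.11 – Links between Access Paths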

Definition 7.3.5 (Aligning). Aligning a correlation (π, ρ) ↦ R to another pair of paths (π′, ρ′) is an operation of type (Π × Π × R) × (Π × Π) → R. Its result is denoted by $R_{(\pi,\rho)(\pi',\rho')}$.

From R we obtain the information referring to the elements identified by (π′, ρ′) and denote it by $R_{(\pi,\rho)(\pi',\rho')}$. This is done by matching on π and π′ on the one hand, and on ρ and ρ′ on the other, and by distinguishing between the different cases. When the paths are identical, we can simply return the relation R. When the links between the paths differ, or when the paths are incompatible, we have to approximate to the least precise relation, thus returning Any. When π and ρ are more shallow paths, i.e. closer to the root, we need to make a projection (Definition 7.3.7). For example, aligning (f, ε) ↦ {a ↦ Ra; b ↦ Rb; c ↦ Rc} to (fb, b) consists in projecting b on the relation {a ↦ Ra; b ↦ Rb; c ↦ Rc} and thus obtaining Rb. More generically, this case is depicted below.

73 Paths and Correlations 153

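[Diagram: projection case. The known relation R sits at a level closer to the root than the nested element δ targeted by the alignment.]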

For aligning the known correlation to the given pair of paths, we need to extract from R the information that is relevant for the nested element δ, as depicted below.

[Diagram: the part of R that concerns the nested element δ is extracted.]

On the contrary, if π′ and ρ′ are closer to the root, we need to perform an injection (Definition 7.3.8). For example, aligning (fb, b) ↦ Rb to (f, ε) consists in creating a relation {a ↦ Any; b ↦ Rb; c ↦ Any}. More generically, this case can be depicted as follows.

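[Diagram: injection case. The known relation concerns the nested element δ, while the target paths (αβ, β) are closer to the root.]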

For aligning the known correlation to the given pair of paths, we need to express the relation $R_\delta$ at the level of the (αβ, β) paths, a level that is closer to the root. This consists in creating a new, higher-level relation where the element identified by δ is mapped to $R_\delta$ and everything else is “filled” with Any, since nothing is known about the rest of the elements. This can be depicted as follows.

[Diagram: the relation known for δ is lifted to the level of the (αβ, β) paths; all the sibling elements are mapped to Any.]

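In the general case, $R_{(\pi,\rho)(\pi',\rho')}$ is computed as defined below.

Definition 7.3.6 (Computation of $R_{(\pi,\rho)(\pi',\rho')}$).

\[ R_{(\pi,\rho)(\pi',\rho')} = \begin{cases} R & \text{when } f(\pi, \pi') = f(\rho, \rho') = \mathit{Identical} \\ \mathrm{Projection}(\sigma, R) & \text{when } f(\pi, \pi') = f(\rho, \rho') = \mathit{Left}\ \sigma \\ \mathrm{Injection}(R, \sigma) & \text{when } f(\pi, \pi') = f(\rho, \rho') = \mathit{Right}\ \sigma \\ \mathit{Any} & \text{otherwise} \end{cases} \]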

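The projection and injection operators used above are defined as follows.

Definition 7.3.7 (Projection Operator).

\[ \mathrm{Projection} : \Pi \times R \rightharpoonup R \]

\[ \mathrm{Projection}(\pi, R) = \begin{cases} R & \text{when } \pi = \varepsilon \\ \mathrm{Projection}(\pi', \mathit{extr}_f(R)) & \text{when } \pi = f\,\pi' \\ \mathrm{Projection}(\pi', \mathit{extr}_C(R)) & \text{when } \pi = C\,\pi' \\ \mathrm{Projection}(\pi', \mathit{extr}_{\langle i \rangle}(R)) & \text{when } \pi = \langle i \rangle\,\pi' \end{cases} \]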

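Definition 7.3.8 (Injection Operator).

\[ \mathrm{Injection} : R \times \Pi \rightharpoonup R \]

\[ \mathrm{Injection}(R, \pi) = \begin{cases} R & \text{when } \pi = \varepsilon \\ \{ f_1 \mapsto \mathit{Any};\ \dots;\ f_i \mapsto \mathrm{Injection}(R, \pi');\ \dots;\ f_n \mapsto \mathit{Any} \} & \text{when } \pi = f\,\pi',\ f = f_i \\ [\, C_1 \mapsto \mathit{Any};\ \dots;\ C_i \mapsto \mathrm{Injection}(R, \pi');\ \dots;\ C_n \mapsto \mathit{Any} \,] & \text{when } \pi = C\,\pi',\ C = C_i \\ \langle \mathit{Any}\ \ i\ \ \mathrm{Injection}(R, \pi') \rangle & \text{when } \pi = \langle i \rangle\,\pi' \end{cases} \]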

For applying the injection operator, we need to know the types of the elements onto which the relation is injected, i.e. in order to “fill” the unknown relations for fields or constructors with Any, we need to know which those fields or constructors are. Thus, in practice, we need to connect the types to the context.
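The two operators can be sketched as follows in Python, restricted to structures and variants. Everything about the encoding is an assumption of this sketch: atomic relations are the strings "Equal" and "Any", a structured relation is a dictionary from field or constructor names to relations, and `shape` is a hypothetical stand-in for the type information that injection requires. Array relations are left out.

```python
ATOMIC = ("Equal", "Any")


def project(path, rel):
    """Projection: follow `path` inside the relation `rel`."""
    for step in path:
        if rel in ATOMIC:
            # Assumption of this sketch: an atomic relation holds
            # uniformly for every subvalue.
            return rel
        if step not in rel:
            raise KeyError(step)       # the operator is partial
        rel = rel[step]
    return rel


def inject(rel, path, shape):
    """Injection: place `rel` at `path` inside a relation built from `shape`.

    `shape` maps each field (or constructor) name to the shape of its
    content, or to None for a leaf; siblings are filled with "Any".
    """
    if not path:
        return rel
    head, rest = path[0], path[1:]
    if shape is None or head not in shape:
        raise KeyError(head)           # the operator is partial
    return {name: inject(rel, rest, sub) if name == head else "Any"
            for name, sub in shape.items()}
```

Projecting b on {a ↦ Ra; b ↦ Rb; c ↦ Rc} returns Rb, and injecting Rb at b into a structure with fields a, b and c returns {a ↦ Any; b ↦ Rb; c ↦ Any}, as in the examples above.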

Aligning a correlation map κ ∈ K to (π′, ρ′) amounts to performing this operation for each element (π, ρ) ↦ R of κ and intersecting the results with the ∧R operator (Definition 7.2.4).

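Definition 7.3.9 (Aligning Correlation Maps). Writing align(κ, (π′, ρ′)) for the result of aligning κ to (π′, ρ′), and taking the meet with ∧R:

\[ \mathrm{align}(\kappa, (\pi', \rho')) = \bigwedge_{[(\pi,\rho) \mapsto R] \,\in\, \kappa} R_{(\pi,\rho)(\pi',\rho')} \]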

74 Intraprocedural Correlation Analysis 155

The obtained results $R_{(\pi,\rho)(\pi',\rho')}$ are intersected in order to take into account all the information scattered across the different elements of κ, and thus to obtain the most precise partial equivalence relation that is contained in κ about the elements identified by (π′, ρ′).
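Putting the pieces together, aligning a whole correlation map can be sketched as below. This is a minimal Python sketch under stated assumptions, not the implementation used in the thesis: paths are tuples of strings, relations are "Equal", "Any" or dictionaries, array relations are ignored, the meet of two structured relations is assumed to be pointwise, and `shape` is a hypothetical description of the fields at the target level.

```python
def link(p, q):
    """Matching operator: the link between the paths p and q."""
    if p == q:
        return ("Identical", ())
    if q[:len(p)] == p:
        return ("Left", q[len(p):])
    if p[:len(q)] == q:
        return ("Right", p[len(q):])
    return ("Incompatible", ())


def project(path, rel):
    for step in path:
        if rel in ("Equal", "Any"):    # assumption: atoms apply to subvalues
            return rel
        rel = rel[step]
    return rel


def inject(rel, path, shape):
    if not path:
        return rel
    return {name: inject(rel, path[1:], sub) if name == path[0] else "Any"
            for name, sub in shape.items()}


def meet(r1, r2):
    """Greatest lower bound: Any is the top element, Equal the bottom one."""
    if r1 == "Any":
        return r2
    if r2 == "Any":
        return r1
    if "Equal" in (r1, r2):
        return "Equal"
    return {k: meet(r1[k], r2[k]) for k in r1}   # assumed pointwise


def align(kappa, target, shape=None):
    """Collect everything kappa knows about the pair of paths `target`."""
    pi2, rho2 = target
    result = "Any"                     # nothing known yet
    for (pi, rho), rel in kappa.items():
        left, right = link(pi, pi2), link(rho, rho2)
        if left != right or left[0] == "Incompatible":
            aligned = "Any"
        elif left[0] == "Identical":
            aligned = rel
        elif left[0] == "Left":
            aligned = project(left[1], rel)
        else:
            aligned = inject(rel, left[1], shape)
        result = meet(result, aligned)
    return result
```

Aligning the empty map returns Any for every pair of paths, which matches its role as the least precise correlation map.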

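Finally, we can define the preorder for correlation maps.

Definition 7.3.10 (Correlation Maps Preorder ⊑). With align(κ1, (π, ρ)) denoting the result of aligning κ1 to (π, ρ) (Definition 7.3.9):

\[ \kappa_1 \sqsubseteq \kappa_2 \iff \forall [(\pi, \rho) \mapsto R] \in \kappa_2,\ \ \mathrm{align}(\kappa_1, (\pi, \rho)) \sqsubseteq_R R \]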

A correlation map κ1 is therefore more precise than another correlation map κ2 if the relation obtained by aligning κ1 to any pair of paths (π, ρ) of κ2 is more precise than R, the relation mapped to this pair in κ2. By definition, any correlation map κ ∈ K is smaller than ∅, the empty correlation map. Therefore, the empty correlation map is the top element of the correlation maps semilattice. A bottom element does not make sense in this case, as it would have to map to Equal any pair of paths denoting (sub)elements having compatible types.

The join operation between two correlation maps is denoted by ∨.

Definition 7.3.11 (Join Operation ∨ for Correlation Maps). With align(κ2, (π, ρ)) denoting the result of aligning κ2 to (π, ρ) (Definition 7.3.9):

\[ \kappa_1 \vee \kappa_2 = \kappa_3 \iff \forall [(\pi, \rho) \mapsto R] \in \kappa_1,\ \ \kappa_3(\pi, \rho) = R \vee_R \mathrm{align}(\kappa_2, (\pi, \rho)) \]

It consists in aligning the correlation map κ2 to any correlation (π, ρ) ↦ R in κ1 and joining the obtained aligned relation with R. We note that the correlation map obtained by joining κ1 and κ2 contains the same keys as κ1. We could have expressed join by aligning the first correlation map to the elements of the second map. This would lead to results that have different forms, i.e. (ε, ε) ↦ {f ↦ R} versus (f, f) ↦ R, but which are equivalent by definition.

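The meet operation between two correlation maps is denoted by ∧.

Definition 7.3.12 (Meet Operation ∧ for Correlation Maps).

\[ \kappa_1 \wedge \kappa_2 = \kappa_3 \iff \forall (\pi, \rho),\ \ \kappa_3(\pi, \rho) = \begin{cases} R \wedge_R R' & \text{when } (\pi, \rho) \mapsto R \in \kappa_1 \text{ and } (\pi, \rho) \mapsto R' \in \kappa_2 \\ R & \text{when } (\pi, \rho) \mapsto R \in \kappa_1 \\ R' & \text{when } (\pi, \rho) \mapsto R' \in \kappa_2 \end{cases} \]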
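Read operationally, the meet of two correlation maps is a key-wise merge. The Python sketch below assumes that correlation maps are dictionaries keyed by pairs of paths and, to stay short, handles only the atomic relations "Equal" and "Any"; the function names are hypothetical.

```python
def meet_rel(r1, r2):
    """Meet of two atomic relations: Equal is below Any."""
    return "Equal" if "Equal" in (r1, r2) else "Any"


def meet_maps(k1, k2):
    """Keep every key of both maps; meet the relations of shared keys."""
    k3 = dict(k1)
    for paths, rel in k2.items():
        k3[paths] = meet_rel(k3[paths], rel) if paths in k3 else rel
    return k3
```

Unlike the join, which keeps only the keys of its first operand, the meet keeps the keys of both maps.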

7.4 Intraprocedural Correlation Analysis

7.4.1 Intraprocedural Correlation Summaries and Analysis

As was the case for the dependency analysis presented in Chapter 5, we are working with a control flow graph (CFG) representation of the predicates’ bodies. Recall that nodes represent program states and that edges are defined by statements with a particular exit label λ. In our case, all the outgoing edges of a node n bear the different cases of the same statement s found at the program point n. For each statement s, there is an edge labeled (s, λk) for each of its possible exit labels λk (as discussed in Section 4.2). However, similarly to the dependency analysis, our correlation analysis does not depend on this specificity.

Intraprocedurally, correlation information has to be kept at each point of the control flow graph, for each input and output pair of the node.

Definition 7.4.1 (Intraprocedural Correlation Summaries). An intraprocedural correlation summary $K \in \mathbb{K}$ is a mapping from pairs of variables v ∈ V to correlation maps:

\[ K : V \times V \to \mathcal{K} \]

where $\mathcal{K}$ denotes the set of correlation maps.

There is one special case, called NoCorrelation, which associates Any (the least precise partial relation) to any pair of variables on any pair of valid, compatible paths. It is the top element at the intraprocedural level. Unreachable is used for nodes that cannot be reached, as its name implies, and constitutes the bottom element at the intraprocedural level.

For each node of a given control flow graph, K(e, o) retrieves the correlation map between the local variable e and the output variable o. If a mapping for e and o does not currently exist, K(e, o) retrieves the correlation (ε, ε) ↦ Equal when e = o, or the empty correlation map ∅ otherwise.
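The implicit entries of a summary can be made explicit with a small lookup function. In this Python sketch, a summary is assumed to be a dictionary keyed by pairs of variable names, the empty path ε is the empty tuple, and `lookup` is a hypothetical name.

```python
EPS = ()  # the empty path


def lookup(summary, e, o):
    """K(e, o): stored map, or the implicit default described above."""
    if (e, o) in summary:
        return summary[(e, o)]
    if e == o:
        return {(EPS, EPS): "Equal"}   # a variable is correlated to itself
    return {}                          # empty correlation map
```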

Establishing the partial order ⊑K and the join operation ∨K is straightforward: ⊑ (Definition 7.3.10) and ∨ (Definition 7.3.11) are extended pointwise to an intraprocedural summary, for each ordered input-output pair and its associated correlation map.

Definition 7.4.2 (Partial Order for Intraprocedural Correlation Summaries).

\[ \sqsubseteq_K \ \subseteq\ \mathbb{K} \times \mathbb{K}, \qquad K_1 \sqsubseteq_K K_2 \iff \forall e, o \in V,\ \ K_1(e, o) \sqsubseteq K_2(e, o) \]

Definition 7.4.3 (Join Operation for Intraprocedural Correlation Summaries).

\[ \vee_K : \mathbb{K} \times \mathbb{K} \to \mathbb{K}, \qquad K_1 \vee_K K_2 = K_3 \iff \forall (e, o),\ \ K_3(e, o) = K_1(e, o) \vee K_2(e, o) \]

Our correlation analysis is a backward data-flow analysis, computing an intraprocedural summary at each point of the control flow graph. This represents the correlations at the node’s entry point. For each exit label, it traverses the control flow graph starting with its corresponding exit node. The intraprocedural summary for the currently analysed label is initialized with pairs between the local value of each associated output variable of the label and the final value of the same output variable, mapped to (ε, ε) ↦ Equal. The analysis traverses the control flow graph and gradually refines the correlations using Kildall’s worklist algorithm (Kildall 1973) until a fixed point is reached. Table 7.12 summarizes the representation and general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural summaries of the statement’s successor nodes. The intraprocedural summary at the entry point of the node is obtained by joining the contributions of each outgoing edge.
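The iteration scheme can be sketched generically. The Python function below is a minimal worklist loop in the spirit of Kildall’s algorithm, parameterised by the abstract domain; it is not the analyser’s code. The parameter names, the dictionary encoding of the graph and the use of a `bottom` value for nodes not reached yet are assumptions of this sketch.

```python
from collections import deque


def backward_fixpoint(succ, transfer, join, bottom, init):
    """Worklist iteration propagating summaries from exits to entries.

    succ[n]  : list of (label, successor) pairs for node n
    transfer : transfer(n, label, successor_summary) -> contribution
    join     : least upper bound of two summaries
    bottom   : summary of a node that has not been reached yet
    init     : summaries fixed at the exit nodes
    """
    state = {n: bottom for n in succ}
    state.update(init)
    preds = {n: [] for n in succ}
    for n, edges in succ.items():
        for _, m in edges:
            preds[m].append(n)
    work = deque(p for n in init for p in preds[n])
    while work:
        n = work.popleft()
        new = bottom
        for label, m in succ[n]:       # join the contribution of each edge
            new = join(new, transfer(n, label, state[m]))
        if new != state[n]:            # changed: revisit the predecessors
            state[n] = new
            work.extend(preds[n])
    return state
```

Termination relies on the usual conditions, namely monotone transfer functions over a domain without infinite ascending chains.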

Definition 7.4.4. The contribution of an edge (n, ni) labeled with s and λi is given by $C_{s,\lambda_i}(K_{n_i})$ ∈ C, where $C_{s,\lambda_i}()$ is the transfer function of the edge labeled (s, λi).

We note that there are four statements supported by αSmil, i.e. the equality test, no-operation, the partial structure equality test and the possible variant test, that have no write effects. They thus have no contribution of their own and are not included in Table 7.12. Except for the no-operation statement, the correlation information at their entry point is obtained by simply joining the intraprocedural summaries of their successor nodes on the true and false exit labels. For the no-operation statement, the correlation information at the entry point is identical to the intraprocedural summary of its only successor node, the one on the true exit label.

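Table 7.12 – Statements – Representations and Data-Flow Equations

Representation: a node n holds the statement s and has one outgoing edge, labeled (s, λi), towards each successor node ni (1 ≤ i ≤ k). Kn is the summary at the entry point of n and Kni the summary at the entry point of ni.

Equation:

\[ K_n = \underset{n \xrightarrow{\,s,\lambda_i\,} n_i}{{\bigvee}_{K}}\ C_{s,\lambda_i}(K_{n_i}) \]

The transfer function $C_{s,\lambda}()$ of each statement is given by $c_{s,\lambda}$ and $kill_\lambda$:

\begin{tabular}{llll}
Statement & Syntax & $c_{s,\lambda}$ & $kill_\lambda$ \\
\hline
Assignment & $o = e$ & $(e, o) \mapsto [(\varepsilon, \varepsilon) \mapsto Equal]$ & $o$ on $true$ \\
New Struct & $r = \{e_1, \dots, e_n\}$ & $\forall i,\ 1 \le i \le n:\ (e_i, r) \mapsto [(\varepsilon, f_i) \mapsto Equal]$ & $r$ on $true$ \\
Destructure & $\{o_1, \dots, o_n\} = r$ & $\forall i,\ 1 \le i \le n:\ (r, o_i) \mapsto [(f_i, \varepsilon) \mapsto Equal]$ & $o_i$ on $true$ \\
Get Field & $o = r.f_i$ & $(r, o) \mapsto [(f_i, \varepsilon) \mapsto Equal]$ & $o$ on $true$ \\
Set Field & $r' = r \text{ with } f_i = e$ & $(r, r') \mapsto [(\varepsilon, \varepsilon) \mapsto \{f_1 \mapsto Equal;\ \dots;\ f_i \mapsto Any;\ \dots;\ f_n \mapsto Equal\}]$ & $r'$ on $true$ \\
 & & $(e, r') \mapsto [(\varepsilon, f_i) \mapsto Equal]$ & \\
Create Var & $v = C_p[e]$ & $(e, v) \mapsto [(\varepsilon, C_p e) \mapsto Equal]$ & $v$ on $true$ \\
Var Switch & $switch(v) \text{ as } [o_1 \mid \dots \mid o_n]$ & $(v, o_i) \mapsto [(C_i e, \varepsilon) \mapsto Equal]$ & $o_i$ on $\lambda_{C_i}$ \\
Array Get & $o = a[i]$ & $(a, o) \mapsto [(\langle i \rangle, \varepsilon) \mapsto Equal]$ & $o$ on $true$ \\
Array Set & $a' = [a \text{ with } i = e]$ & $(a, a') \mapsto [(\varepsilon, \varepsilon) \mapsto \langle Equal\ \ i\ \ Any \rangle]$ & $a'$ on $true$ \\
 & & $(e, a') \mapsto [(\varepsilon, \langle i \rangle) \mapsto Equal]$ & \\
\end{tabular}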
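A few rows of Table 7.12 can be written down directly as functions returning the pair (c, kill). The Python encoding below is an assumption of this sketch: variables and fields are strings, paths are tuples, and the function names are hypothetical. The list of field names passed to `c_set_field` stands in for the type information of the structure.

```python
EPS = ()  # the empty path


def c_assign(o, e):
    """Assignment o = e."""
    return {(e, o): {(EPS, EPS): "Equal"}}, {o}


def c_get_field(o, r, field):
    """Get Field o = r.field."""
    return {(r, o): {((field,), EPS): "Equal"}}, {o}


def c_set_field(r_new, r, field, e, fields):
    """Set Field r_new = r with field = e, for a structure with `fields`."""
    whole = {f: ("Any" if f == field else "Equal") for f in fields}
    c = {(r, r_new): {(EPS, EPS): whole},
         (e, r_new): {(EPS, (field,)): "Equal"}}
    return c, {r_new}
```

For a statement such as o = in with threads = ta, on a structure with fields threads, pid, crt_thread and adr_space, `c_set_field` maps (in, o) to a relation where only threads is Any, and (ta, o) to (ε, threads) ↦ Equal.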

The transfer function $C_{s,\lambda}()$ formalizes the correlations created by the statement s on the label λ between its local input variables and its local output variables, denoted by $c_{s,\lambda}$, as well as the set $kill_\lambda$ of variables whose values have been redefined by the statement s on the label λ. These are shown in Table 7.12. There is one crucial difference between transfer functions $C_{s,\lambda}()$ and intraprocedural summaries K. An intraprocedural summary K implicitly maps any pair (v, v), for v ∈ V, to (ε, ε) ↦ Equal. On the contrary, in $c_{s,\lambda}$, when the variable v is used as both input and output by the statement s, the pair (v, v) is mapped to the correlation map known between the old value of the input v and the fresh value of the output v. Otherwise, when v is an output, i.e. v ∈ $kill_\lambda$, but not an input of s, (v, v) is mapped to ∅. We remark that K represents a state, while $c_{s,\lambda}$ represents a transition.

In order to obtain the contribution $C_{s,\lambda_i}(K_{n_i})$ of an edge labeled with s and λi, we need to connect the information given by $c_{s,\lambda_i}$ to the information contained in the intraprocedural summary $K_{n_i}$. For example, at the entry of node 3 in Figure 7.1 (on page 138), when considering the scenario in which the predicate exits with true, the intraprocedural summary contains the mapping

    (th, o) ↦ [(Somet, threads〈i〉Somet) ↦ {identifier ↦ Equal; current_state ↦ Any; stack ↦ Equal}]

On the true edge, statement 2 creates the mapping

    (ta, th) ↦ [(〈i〉, ε) ↦ Equal]

Intuitively, since we are traversing the graph backwards and we are mapping ordered (local) input-output pairs, (ta, th) and (th, o) can be seen as a def-use pair: the correlation associated to (ta, th) expresses the relation between the defined value of th and the input ta used for creating it, while the correlation associated to (th, o) shows a subsequent use of that value of th for creating o. The contribution of statement 2 on the true edge should capture this flow of ta’s value to o’s value through the variable th. Thus, it should contain a mapping for the pair (ta, o). In the general case, we need to detect any variable r such that [(p, r) ↦ κ] ∈ $c_{s,\lambda_i}$ and [(r, q) ↦ κ′] ∈ $K_{n_i}$, and compute the mapping for (p, q) in $C_{s,\lambda_i}(K_{n_i})$.

In order to compute the correlation map associated to (ta, o), we take into account the fact that both the right path ε of $c_{s,\lambda}$(ta, th) and the left path Somet of $K_{n_3}$(th, o) refer to the th variable. However, they do not represent traversals of the same depth: ε refers to the entire value of th, while Somet refers to the value below the constructor Some. Between ta and o, we can conclude that the values nested under the Some constructor of the i-th elements are related:

    (ta, o) ↦ [(〈i〉Somet, threads〈i〉Somet) ↦ {identifier ↦ Equal; current_state ↦ Any; stack ↦ Equal}]

We call composition the process of obtaining the correlation map associated to (ta, o) from the correlations associated to (ta, th) and (th, o).

In the general case, the composition operation refers to the process of computing the flow of a variable p to a variable q through an intermediate variable r. Thus, when knowing that (p, r) ↦ [(π, ρ) ↦ R] and that (r, q) ↦ [(π′, ρ′) ↦ R′], we must first obtain the link (Definition 7.3.3) between the paths ρ and π′, relating subvalues of r to subvalues of p and q, respectively. This is obtained by matching with f (Definition 7.3.4). In the context of the example given above, ρ and π′ are the paths referring to subvalues of the th variable, i.e. ε and Somet, respectively. If the two paths are incompatible, i.e. they refer to different, unrelated subvalues of r, there is no flow between p and q through r. If the paths are compatible, we can compute the correlation between p and q by distinguishing between the three different possible link cases obtained with f.

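The case when the same subvalue of r, identified by ρ (and the identical π′), is related to both p and q is depicted below.

[Diagram: f(ρ, π′) = Identical. R relates the subvalue of p identified by π to the subvalue of r identified by ρ; R′ relates the subvalue of r identified by π′ to the subvalue of q identified by ρ′.]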

In this case, computing the flow from p to q through r is rather straightforward. Since the same subvalue of r is related to p’s subvalue identified by π and to q’s subvalue identified by ρ′, we can relate these two subvalues and map the pair (π, ρ′) to the relation obtained by composing R and R′. We note that, given the special form of partial relations R ∈ R, the compose operation at this level is equivalent to ∨R (Definition 7.2.3)¹. The computation of the correlation for p and q is depicted below.

[Diagram: f(ρ, π′) = Identical. The pair (π, ρ′), relating p and q, is mapped to R ∨R R′.]

The subelements of r related to p and to q, respectively, can also have different granularities, one being nested deeper in r than the other. For instance, the subvalue of r identified by the path ρ can be closer to the root than its subelement identified by π′, related to q. This case is depicted below.

¹ However, this would no longer be the case for a more complex partial relation type, including not only equivalences but also more general relations.

[Diagram: f(ρ, π′) = Left σ. The subvalue of r identified by π′ = ρ σ, related to q by R′, is nested deeper than the subvalue identified by ρ, related to p by R.]

In this case, we can only detect the flow of p to q at the level of r’s subelement that is related to both p and q, i.e. the subelement nested deeper. Thus, in order to compute the correlation between p and q, we need to project σ on R and to compose the obtained relation with R′. This is summarized by the following figure.

[Diagram: f(ρ, π′) = Left σ. The pair (π σ, ρ′), relating p and q, is mapped to the join ∨R of the projection of σ on R with R′.]

Finally, in the complementary case, the subvalue of r identified by the path ρ and correlated to p can be nested deeper than the subvalue identified by π′, which is correlated to q. This case is depicted below.

[Diagram: f(ρ, π′) = Right σ. The subvalue of r identified by ρ = π′ σ, related to p by R, is nested deeper than the subvalue identified by π′, related to q by R′.]

As in the previous case, we can only detect the flow of p to q at the level of r’s subelement that is related to both p and q, i.e. the subelement nested deeper. In this case, we need to project σ on R′ and to compose the obtained relation with R. The flow between p and q is at the level of the subvalues identified by π and ρ′ σ, respectively. This is illustrated below.

[Diagram: f(ρ, π′) = Right σ. The pair (π, ρ′ σ), relating p and q, is mapped to the join ∨R of R with the projection of σ on R′.]

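Formally, if the ρ and π′ paths are compatible, we compose the correlation elements (π, ρ) ↦ R and (π′, ρ′) ↦ R′, thereby obtaining a new correlation element $(\pi^\bullet, \rho^\bullet) \mapsto R^\bullet$, which is computed as shown below.

Definition 7.4.5 (Computing $(\pi^\bullet, \rho^\bullet) \mapsto R^\bullet$).

\[ (\pi^\bullet, \rho^\bullet) = (\pi, \rho) \bullet (\pi', \rho') \overset{def}{=} \begin{cases} (\pi, \rho') & \text{when } f(\rho, \pi') = \mathit{Identical} \\ (\pi\,\sigma, \rho') & \text{when } f(\rho, \pi') = \mathit{Left}\ \sigma \\ (\pi, \rho'\,\sigma) & \text{when } f(\rho, \pi') = \mathit{Right}\ \sigma \end{cases} \]

\[ R^\bullet = \mathrm{compose}(R, R') \overset{def}{=} \begin{cases} R \vee_R R' & \text{when } f(\rho, \pi') = \mathit{Identical} \\ \mathrm{Projection}(\sigma, R) \vee_R R' & \text{when } f(\rho, \pi') = \mathit{Left}\ \sigma \\ R \vee_R \mathrm{Projection}(\sigma, R') & \text{when } f(\rho, \pi') = \mathit{Right}\ \sigma \end{cases} \]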
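The three cases can be checked on the running example with a short Python sketch. As before, the encoding is assumed rather than taken from the thesis: paths are tuples of strings, relations are "Equal", "Any" or dictionaries, the join of structured relations is taken pointwise, and projecting an atomic relation returns it unchanged.

```python
def link(p, q):
    if p == q:
        return ("Identical", ())
    if q[:len(p)] == p:
        return ("Left", q[len(p):])
    if p[:len(q)] == q:
        return ("Right", p[len(q):])
    return ("Incompatible", ())


def project(path, rel):
    for step in path:
        if rel in ("Equal", "Any"):    # assumption: atoms apply to subvalues
            return rel
        rel = rel[step]
    return rel


def join(r1, r2):
    """Least upper bound: Equal is the identity, Any is absorbing."""
    if r1 == "Equal":
        return r2
    if r2 == "Equal":
        return r1
    if "Any" in (r1, r2):
        return "Any"
    return {k: join(r1[k], r2[k]) for k in r1}   # assumed pointwise


def compose(elem1, elem2):
    """Compose (pi, rho) -> R with (pi2, rho2) -> R2 through rho and pi2."""
    (pi, rho), r1 = elem1
    (pi2, rho2), r2 = elem2
    kind, sigma = link(rho, pi2)
    if kind == "Identical":
        return (pi, rho2), join(r1, r2)
    if kind == "Left":
        return (pi + sigma, rho2), join(project(sigma, r1), r2)
    if kind == "Right":
        return (pi, rho2 + sigma), join(r1, project(sigma, r2))
    return None                        # incompatible paths: no flow
```

Composing (〈i〉, ε) ↦ Equal with the correlation known for (th, o) falls in the Left case with σ = Somet and reproduces the correlation given earlier for (ta, o).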

We note that the use of the projection operation (Definition 7.3.7) for both compatible, non-identical link cases for r’s access paths related to p and to q, respectively, is a consequence of not choosing a canonical form for correlations. The flexibility conferred by the absence of a canonical correlation form is visible at the composition level.

The composition of correlation maps is defined below.

Definition 7.4.6 (Composition of Correlation Maps). Computing the composition of κ1 and κ2 amounts to intersecting the composition of all correlation elements from κ1 and κ2:

\[ \mathrm{compose}(\kappa_1, \kappa_2)(\pi^\bullet, \rho^\bullet) = \bigwedge_{\substack{[(\pi,\rho) \mapsto R] \in \kappa_1 \\ [(\pi',\rho') \mapsto R'] \in \kappa_2 \\ (\pi^\bullet,\rho^\bullet) = (\pi,\rho) \bullet (\pi',\rho')}} \mathrm{compose}(R, R') \]

where the meet is taken with ∧R.

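Finally, the contribution $C_{s,\lambda_i}(K_{n_i})$ is obtained as defined below.

Definition 7.4.7 (Contribution $C_{s,\lambda_i}(K_{n_i})$).

\[ C \times \mathbb{K} \to \mathbb{K}, \qquad \mathrm{compose}(c_{s,\lambda}, K) = K' \ \text{ where } \ K'(p, q) = \bigwedge_{r} \mathrm{compose}\big(c_{s,\lambda}(p, r),\ K(r, q)\big) \]

It is depicted in Figure 7.5.

Figure 7.5 – Entry Point – Correlation Information

[Diagram: for a statement s with exit labels λ1, ..., λn, the contributions obtained by composing $c_{s,\lambda_1}$ with $K_{\lambda_1}$, ..., $c_{s,\lambda_n}$ with $K_{\lambda_n}$ are joined with ∨K at the entry point.]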

We conclude this section by specifying what it means for intraprocedural correlation summaries to be well-formed, showing the corresponding inference rule in Table 7.19. Only ordered input-output pairs can appear as keys in intraprocedural mappings. Therefore, the well-formedness judgement is parameterized by the set of input variables I and by the set of output variables O. The former indicate variables that have the right to appear as left members of the variable pairs, while the latter indicate variables that have the right to appear as right members of the variable pairs. The correlation map associated to each such input-output pair must be well-typed with respect to the types of the variables, as given by the typing environment Γ (Definition 4.3.1). The typing judgement for correlation maps was shown in Table 7.9.

\[ \frac{\forall (e, o) \mapsto \kappa \in K: \quad \Gamma(e) = \tau_e \quad \Gamma(o) = \tau_o \quad e \in I \quad o \in O \quad \Gamma, I \vdash \kappa : (\tau_e, \tau_o)}{\Gamma, I, O \vdash K}\ \text{WFIntraCor} \]

Table 7.19 – Well-Formed Intraprocedural Correlation Summaries

7.4.2 Intraprocedural Correlation Analysis Illustrated

To better illustrate our correlation analysis at an intraprocedural level, and to summarize everything that has been presented so far in this chapter, we exemplify the mechanism behind it step by step on the predicate stop_thread, discussed in Section 7.1.1 on page 138. We consider the true execution scenario, apply our analysis and compare the actual obtained correlation results with the targeted ones, depicted in Figure 7.2.

Since a predicate can only exit with one label at a time and we are analysing the true label, we can map the exit node inval to the special case Unreachable. We begin by initialising the correlation summary for the exit node corresponding to the true exit label. As shown in Figure 7.6, this consists in mapping the pair referring to the local value of the o variable and the final state of o to a correlation map containing a single correlation, namely (ε, ε) ↦ Equal. This acknowledges that the value of the output o retrieved to the predicate’s callers is the most recent value computed locally. In the following, we denote the final value of o by o in order to distinguish it from the local value.

Figure 7.6 – Analysing Predicate stop_thread – Initialisation

[Control flow graph with edges labeled true, false and None, and with the following nodes:
 1: ta = in.threads
 2: th = ta[i]
 3: switch(th) as [ | ti]
 4: s = Blocked
 5: ti = ti with current_state = s
 6: th = Some(ti)
 7: ta = [ta with i = th]
 8: o = in with threads = ta
 9: exit node for the label true, annotated with (o, o) ↦ (ε, ε) ↦ Equal
 10: exit node for the label inval, annotated with Unreachable]

We advance backwards along the control flow graph, reaching node 8. We apply the equation corresponding to a field access, as given in Table 7.12, and obtain the following correlation summary:

    (in, o) ↦ [(ε, ε) ↦ {threads ↦ Any; pid ↦ Equal; crt_thread ↦ Equal; adr_space ↦ Equal}]
    (ta, o) ↦ [(ε, threads) ↦ Equal]

We compose it with the correlation summary of its successor node, i.e. the exit node corresponding to the true exit label, thus detecting the flow of in to o and of ta to o, respectively, through the local value o. This amounts to:

    (in, o) ↦ [(ε, ε) ↦ {threads ↦ Any; pid ↦ Equal; crt_thread ↦ Equal; adr_space ↦ Equal}]
    (ta, o) ↦ [(ε, threads) ↦ Equal]

Since node 8 does not have any other successor nodes, the correlation information at its entry point is identical to the one we have just computed.

We advance one step, reaching node 7, and apply the corresponding equation, thereby obtaining:

    (ta, ta) ↦ [(ε, ε) ↦ 〈Equal i Any〉]
    (th, ta) ↦ [(ε, 〈i〉) ↦ Equal]

We compose it with the correlation summary of node 8, tracking the flow of the local value of ta to o through the new state of the variable ta, after updating its i-th element. We also track the flow of th to o. The correlation map for the (in, o) pair remains unchanged. We thus obtain:

    (in, o) ↦ [(ε, ε) ↦ {threads ↦ Any; pid ↦ Equal; crt_thread ↦ Equal; adr_space ↦ Equal}]
    (ta, o) ↦ [(ε, threads) ↦ 〈Equal i Any〉]
    (th, o) ↦ [(ε, threads〈i〉) ↦ Equal]

In order to obtain the correlation information at the entry point of node 7, we need to join the computed correlation summary with the correlation summary known for the other successor of node 7, namely the exit node 10. Since the latter is Unreachable, the identity element for join at the intraprocedural level, it does not affect the correlation summary at the entry point of node 7. We proceed similarly for nodes 6, 5, 4, 3 and 2, applying the corresponding data-flow equation for each statement and composing with the intraprocedural correlation summary of the successor node. Since each of these nodes has only one possible exit label, there are no multiple contributions that need to be joined. At the entry point of node 6, for example, we obtain the following summary:

    (ta, o) ↦ [(ε, threads) ↦ 〈Equal i Any〉]
    (ti, o) ↦ [(ε, threads〈i〉Somet) ↦ Equal]
    (in, o) ↦ [(ε, ε) ↦ {threads ↦ Any; pid ↦ Equal; crt_thread ↦ Equal; adr_space ↦ Equal}]

We skip some steps and obtain the following correlation summary at the entry point of node 2:

    (in, o) ↦ [(ε, ε) ↦ {threads ↦ Any; pid ↦ Equal; crt_thread ↦ Equal; adr_space ↦ Equal}]
    (ta, o) ↦ [(ε, threads) ↦ 〈Equal i Any〉;
               (〈i〉Somet, threads〈i〉Somet) ↦ {id ↦ Equal; current_state ↦ Any; stack ↦ Equal}]

Finally, we reach node 1, where we apply the data-flow equation corresponding to a field access and compose the obtained information with the correlation summary computed at the entry of node 2. We obtain:

    (in, o) ↦ [(ε, ε) ↦ {threads ↦ Any; pid ↦ Equal; crt_thread ↦ Equal; adr_space ↦ Equal};
               (threads, threads) ↦ 〈Equal i Any〉;
               (threads〈i〉Somet, threads〈i〉Somet) ↦ {id ↦ Equal; current_state ↦ Any; stack ↦ Equal}]

Since node 1 has only one successor node, this correlation summary represents the correlation information at the entry point of node 1, i.e. there is no other correlation summary to join it with. This contains a single pair of variables, (in, o), and their associated correlation map. Since the pair is an input-output pair of the stop_thread predicate, we do not need to filter anything out. This constitutes the final correlation summary for the analysed predicate on the true exit label. These results are identical to the ones we had depicted as our targeted results in Figure 7.2.

For the inval exit label, the corresponding correlation summary is NoCorrelation. This example can be tried on the web page² dedicated to our correlation analysis. Other examples are provided and explained there as well. Additionally, users can devise and test their own examples.

² Correlation Analysis Web Page: httpwwwajl-demofr2016

7.5 Interprocedural Correlation Analysis

Our analysis is performed label by label, and interprocedural correlation domains associate an intraprocedural summary to each exit label of the analysed predicate. Therefore, interprocedural domains encapsulate an intraprocedural summary for each possible execution scenario of a predicate.

An interprocedural domain Kp of a predicate p is thus defined as shown below.

Definition 7.5.1 (Interprocedural Correlation Domain).

\[ K_p : \Lambda_p \to \mathbb{K}, \ \text{ where } \Lambda_p \text{ is the set of output labels of predicate } p \]

The intraprocedural summary associated to each label is filtered so as to contain only ordered pairs of variables where the left member is an input of the analysed predicate and the right member is an output associated to the analysed label. The correlation maps associated to such pairs are built so as to contain correlations where only input variables may appear in array cell paths. Similarly, the exception index in partial equivalence relations of arrays must be an input variable. Registering exceptions in array correlations only for input variables is not a consequence of a language restriction on array operations, but simply a consequence of the fact that, at the interprocedural level, only correlation information between inputs and outputs makes sense.

The interprocedural domain of a predicate is used for deducing the transfer functions for a predicate call statement.

In the following, we detail the equation corresponding to a call to a predicate

\[ s = p(e_1, \dots, e_n)[\lambda_1 : \mathbf{o}_1 \mid \dots \mid \lambda_m : \mathbf{o}_m] \]

having the following signature:

\[ p(\varepsilon_1, \dots, \varepsilon_n)[\lambda_1 : \boldsymbol{\omega}_1 \mid \dots \mid \lambda_m : \boldsymbol{\omega}_m] \]

The general equation form given in Table 7.12 applies:

\[ K_n = \underset{n \xrightarrow{\,s,\lambda_i\,} n_i}{{\bigvee}_{K}}\ C_{s,\lambda_i}(K_{n_i}) \]

The transfer functions for the predicate call statement are deduced from the predicate’s interprocedural domain in the following fashion:

\[ C_{s,\lambda_i}(K_{n_i}) = \mathrm{compose}(c_{s,\lambda_i}, K_{n_i}), \qquad kill_{\lambda_i} = \mathbf{o}_i, \qquad c_{s,\lambda_i}(e_j, o_i^k) = \kappa_{jki} \quad \forall j \in 1..n,\ \forall k \in 1..h \]

where

\[ \kappa_{jki} = K_p(\lambda_i)(\varepsilon_j, \omega_i^k)\ \mathrm{J}(\boldsymbol{\varepsilon} \mapsto \mathbf{e}), \qquad \mathbf{o}_i = o_i^1 \dots o_i^h \]

Namely, the contribution of a predicate call to each $(e_j, o_i^k)$ input-output pair stems from the contribution of the interprocedural domain for label λi and formal input-output pair $(\varepsilon_j, \omega_i^k)$. In these, all the formal input parameters ε in array partial equivalences and in array cell paths are substituted by the corresponding effective input parameters from e, or approximated away. The substitution operation is denoted by J(χ), where χ is a substitution from formal to effective parameters.

Our correlation analysis is context-insensitive, and αSmil programs are analysed by computing, once and for all, an interprocedural correlation summary for every predicate they contain. The correlation summaries are stored in a mapping binding predicate identifiers to their interprocedural correlation information.

7.6 Extension – Constructor Evolution

The correlation analysis, as presented so far in this chapter, tracks and detects partial equivalence relations between inputs and outputs of predicates. An interesting direction to investigate would be an extension of our analysis allowing us to detect not only equivalences, but more general relations that could capture the evolution of constructors for variants. In Figure 7.4-b) we illustrated the form of correlations computed for variants. With the extension, the correlation information obtained for variants would be richer, as illustrated in Figure 7.7.

Figure 7.7 – Construction Evolution

This extension would allow inferring the preservation of certain properties when transitioning from a “stronger” state to a “weaker” state. For instance, we consider again our process and thread data types introduced in Chapter 3, Section 3.1.5 (on page 49 and 48, respectively). Additionally, we consider a predicate kill_thread, shown below, which modifies the array of associated threads of the input p by setting the i-th element to None. If the i-th element is already inactive, no modifications are made. In this case, the predicate exits with label inactive and simply copies p to the output o.

    predicate kill_thread (process p, int i)
    -> [true process o | inactive process o | oob]
        array<option<thread>> threads
        option<thread> thi
        thread ti
        o = p                               [true -> 1]
        threads = o.threads                 [true -> 2]
        thi = threads[i]                    [true -> 3, false -> 9]
        switch (thi) as [ti | ]             [Some -> 4, None -> 8]
        thi = None                          [true -> 5]
        threads = [threads with i = thi]    [true -> 6, false -> 9]
        o = o with threads = threads        [true -> 7]
        [true]
        [inactive]
        [oob]

For variants, we are currently detecting equivalence relations between the arguments of variant values built with the same constructor. With the extension for capturing constructor evolution, we could take a step further and also detect, for a given execution scenario, the set of possible transitions between the different constructors. For instance, for the kill_thread predicate on the true exit label, we could detect that the only possible transition of the i-th element of the threads array is from Some to None. Had the element been None, the predicate would have followed the inactive execution scenario.

We further consider a predicate disjoint_stacks(process p), verifying a fundamental property of any process, namely the fact that the stacks of all associated threads of the process are disjoint. If the property holds for the input process p prior to executing kill_thread, intuitively it should continue to hold subsequently for the output process o as well. If the array’s i-th element was already inactive, i.e. None, the property disjoint_stacks obviously still holds, since the input p is simply copied to the output o. If it was active, the transition from Some to None does not impact the property, as it does not create a new memory region that could threaten the property. In this case, the transition from Some to None is a transition from a “stronger” state to a “weaker” state.

We have conducted preliminary experiments targeting the detection of such information, and these have led to promising results. Tracking general relations that capture evolution requires certain modifications that are confined to the abstract partial relation type and to the data-flow equations concerning variants.

The abstract partial relation type presented in Section 7.2 (Definition 7.2.1) would need to be extended with Impossible, an additional atomic case along with Equal and Any. It is required for signalling impossible transitions between variant constructors and leads to some overlap with the possible-constructors analysis presented in Chapter 5. The partial relations for variants would be expressed as a square matrix of constructors, where each element $a_{C_i C_j}$ of the matrix has a corresponding associated partial relation $R_{C_i C_j}$. Impossible would be associated to any element $a_{C_i C_j}$ for which the transition from Ci to Cj is impossible. For the elements $a_{C_i C_i}$ on the main diagonal, for which the transition from Ci to Ci is possible, we could compute partial equivalences between the arguments of the Ci constructor. For the elements $a_{C_i C_j}$ lying outside the main matrix diagonal, for which the transition from Ci to Cj is possible, the associated relation would be Any. Alternatively, for computing reflexive relations, we could consider that transitions on the main diagonal, i.e. from Ci to Ci, are always possible.


Impossible would become the bottom element of our partial relation type R, replacing Equal in this role. It would also become the identity element for the join operation ∨_R (Definition 7.2.3) of partial relations and the absorbing element for the meet operation ∧_R (Definition 7.2.4). Similarly to the case of the abstract dependency type, the current bottom element Equal would become the middle element of a double diamond-shaped abstract type, and it would require the addition of some extra comparison cases for ⊑_R (Definition 7.2.2) as well as some extra cases for the ∨_R (Table 7.2) and ∧_R (Table 7.3) operations. The most important modification, however, would be in the case of the compose operation. Currently, the compose operation at the level of partial equivalence relations is ∨_R. With this extension, it would amount to a matrix multiplication.
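To make the sketched extension more concrete, the following minimal OCaml example restricts itself to the three atomic cases and to variants whose constructors are numbered 0..n-1. All names (`rel`, `rank`, `compose_atom`, `compose`) are hypothetical, and the choice of `compose_atom` (an impossible step is absorbing, otherwise the current join-based composition is kept) is an assumption made for illustration, not a definition taken from the formal development.

```ocaml
(* Atomic relations after the extension; hypothetical names. *)
type rel = Impossible | Equal | Any

(* On atomic cases the order is a chain: Impossible <= Equal <= Any. *)
let rank = function Impossible -> 0 | Equal -> 1 | Any -> 2

let leq a b = rank a <= rank b
let join a b = if rank a >= rank b then a else b  (* Impossible is the identity *)
let meet a b = if rank a <= rank b then a else b  (* Impossible is absorbing *)

(* Assumption: composing with an impossible step is impossible;
   otherwise composition behaves like the current join. *)
let compose_atom a b =
  if a = Impossible || b = Impossible then Impossible else join a b

(* Matrix-style composition: entry (i, k) is the join, over every
   intermediate constructor j, of compose_atom m1.(i).(j) m2.(j).(k). *)
let compose m1 m2 =
  let n = Array.length m1 in
  Array.init n (fun i ->
    Array.init n (fun k ->
      let acc = ref Impossible in
      for j = 0 to n - 1 do
        acc := join !acc (compose_atom m1.(i).(j) m2.(j).(k))
      done;
      !acc))
```

For an option-like variant with None as constructor 0 and Some as constructor 1, a transition in the spirit of kill_thread could be encoded as the matrix `[| [| Equal; Impossible |]; [| Any; Equal |] |]`: None never becomes Some, while Some may become None.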

7.7 Related Work

A rigorous presentation of the frame problem in specification and the different existing approaches for addressing it has been given by Borgida et al. (Borgida, Mylopoulos, and Reiter 1993; Borgida, Mylopoulos, and Reiter 1995). A more recent overview of framing is included in (Hatcliff et al. 2012).

In recent years, a vast body of research has been conducted on the specification of frame properties in the context of modular programming. This ranges from complex approaches imposing the swinging pivots requirement (Leino and Nelson 2002), to approaches using data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002), adopting the Universe type system (Müller 2002; Müller, Poetzsch-Heffter, and Leavens 2003) or variations of it (Leino and Müller 2004; Leino and Müller 2006; Barnett and Naumann 2004; Barnett et al. 2004), to approaches based on the dynamic frame theory (Kassios 2006; Kassios 2011; Smans, Jacobs, and Piessens 2012), regional logic (Banerjee, Naumann, and Rosenberg 2008) or separation logic (Reynolds 2002; O'Hearn, Yang, and Reynolds 2004; Parkinson and Bierman 2005).

In (Smans, Jacobs, and Piessens 2012), Smans et al. present a technique for frame inference based on a variant of dynamic frames, inspired by separation logic and relying on accessibility information contained within pre- and postconditions. By including accessibility information in a method's precondition, an upper bound on the set of locations modifiable by the method can be detected. In our case, the upper bound on the set of elements that a predicate may modify when exiting with a particular exit label is implicitly the set of output variables generated on that exit label, joined with the set of local variables. The implicit dynamic frame approach requires the specification of accessibility information. Our correlation analysis is entirely automatic and infers fine-grained frame properties for compound data structures.

The literature on shape analysis (Calcagno et al. 2009; Sagiv, Reps, and Wilhelm 1999; Jones and Muchnick 1979; Montenegro, Peña, and Segura 2015) and side-effect analyses (Salcianu and Rinard 2005; Milanova, Rountev, and Ryder 2005) is vast. The former is aimed at deep-heap mutations, while we are focusing on deep-state modifications in the context of complex transition systems. The latter determine memory locations that may be modified by an operation. Reasoning about heap locations is beyond our scope. We treat mappings between variables and their values, analyse their evolution in a side-effect free environment, and detect not only what is modified but also how and to what extent.

In (Chang and Leino 2005), Chang and Leino present the congruence-closure abstract domain, designed for an object-oriented context and implemented in the Spec# program verifier. They infer and express relations between fields of variables, a goal similar to ours. The congruence-closure domain maintains equivalence graphs mapping field accesses to symbolic locations. On its own, this domain allows the inference and expression of relations for accessed fields. In order to take updates into account as well, it needs to use the heap succession domain as a base. Unlike us, they can express preorders between fields, depending on the base domains used. However, our domain handles both accesses and updates to structures, arrays and variants in a uniform manner, independent of additional information. We have sketched an extension for handling not only equivalences but also more general relations capturing constructor evolution. This is a direction we plan to investigate in the future.

Rakamarić and Hu report in (Rakamaric and Hu 2008) a method to infer frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve 2007). DSA provides a summary of points-to relations as a graph that is used to compute a set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e. in a purely automatic setting. In contrast, our analysis is designed for an interactive verification context. Our technique, focusing on a purely functional language, is not concerned with aliasing and does not depend on an external points-to framework.

In (Taghdiri, Seater, and Jackson 2006), Taghdiri et al. present a technique for extracting procedure summaries for object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. Their goal is broader than ours. However, unlike their summaries, our correlation results encompass only information that is visible from the outside (to the callers).

Bertrand Meyer presents the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer 1991). The first component, the frame specification inference, relies on the analysis of method postconditions. The idea stems from an informal review of JML code, which showed that in practice there is a considerable overlap between what is mentioned in an assignable clause, i.e. modifies clause, and what is included in the postcondition. It relies on the observation that, in general, when manually written specifications include clauses about what changes, they also include clauses about how it changes. By analysing the postcondition of a method p, a first set is obtained. This represents an overapproximation of the set of elements that are allowed to be modified by p according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder 2015), which is itself based on alias calculus (Kogtenkov, Meyer, and Velder 2015; Meyer 2010; Meyer 2011). Methods are analysed and a second set is detected; this represents an overapproximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second. Though our goal is closely related to the issue addressed by the double frame inference in general, and the frame calculus in particular, the approaches are not directly comparable, as they target languages with different characteristics, which in turn influence both the adopted analysis techniques and the derivative targeted issues. Both approaches are conservative and automatic, i.e. neither requires manual annotations. In contrast to the frame calculus, our correlation analysis is standalone and it is not concerned with aliasing.

7.8 Conclusion

Identifying precise information concerning the effects of program operations is possible by means of static analysis, without sacrificing scalability. In this chapter, we have presented a data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus detecting not only what is modified but also how it is modified and to what extent. The correlation analysis is a flow-sensitive, path-sensitive, interprocedural analysis that handles arrays, structures and variants. The analysis is context-insensitive, but this trait does not have a costly impact in terms of precision. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations in order to compute expressive, fine-grained equivalences between parts of the inputs and parts of the outputs in a flexible manner. Just as frame properties specified by means of old expressions tend to lead to a proliferation of conditions to be specified, our correlation summaries showing equivalences between input and output subelements can become verbose in the case of predicates handling large compound values and modifying only a limited input subset. However, these are detected automatically and their verbose form could easily be transformed using a more compact notation of the following form:

    input ( - changed subelements) = output ( - corresponding subelements)

Detecting modifications is traditionally associated with shape analyses that focus on deep-heap mutations. Side-effect analyses detect memory locations that may be modified by an operation. We, however, are interested in deep-state modifications in the context of a functional language. Other analyses inferring frame properties have been devised. These are mostly used in a purely automatic setting. We, however, developed a correlation analysis meant to be used in an interactive verification context.

Similarly to the case of the dependency analysis presented in Chapter 5, we have implemented a prototype of the correlation analysis in OCaml and we have applied it to a functional specification of ProvenCore (Lescuyer 2015). Medium-sized experiments performed on the abstract layers of ProvenCore show encouraging results. For instance, the correlation results of approximately 630 αSmil predicates, totalling approximately 10000 lines of code, are obtained in less than 0.5 seconds, i.e. faster than the dependency summaries are obtained on the same predicates. This is partly a consequence of the fact that, unlike the dependency analysis, which computes summaries for both code and specifications, the correlation analysis computes non-trivial results only for code. Specifications are predicates with Boolean exit labels, which generate no outputs. Since our correlation analysis computes fine-grained relations between parts of the inputs and parts of the outputs, it cannot detect anything non-trivial in their case. However, this would change if we were to extend our correlation analysis and track relations between parts of the inputs as well. This is a direction that we plan to investigate in the future. We will focus on the implementation and the discussion of the obtained results in Chapter 8. The prototype can be tested on the web page³ dedicated to our correlation analysis, where multiple examples are provided and explained. Additionally, users can devise and test their own examples.

The correlation analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen, and Lescuyer 2016).

³ Correlation Analysis Web Page: http://www.ajl-demo.fr/2016


Chapter 8

Implementation, Application and Results

Any fact becomes important when it's connected to another.

Umberto Eco

In this chapter, we focus mainly on the practical aspects regarding our static analyses and the approach to using their results for inferring the preservation of certain logical properties. In Section 8.1 and Section 8.2, we give a brief overview of the implementations of our dependency and correlation analyses, respectively. In Section 8.3, we succinctly present ProvenCore, one of the two microkernels developed at Prove & Run, and discuss, in terms of execution times and precision, the experiments we made on its functional specification. In Section 8.4, we describe the manner in which the summaries computed by our dependency and correlation analyses are meant to be combined and used for reasoning about the preservation of certain logical invariants. We illustrate this approach and discuss it on some examples inspired by ProvenCore.

8.1 Implementation of the Dependency Analysis

Prototypes for both of our static analyses, the dependency analysis presented in Chapter 5 and its extension with symbolic dependencies presented in Chapter 6, as well as the correlation analysis presented in Chapter 7, have been implemented in OCaml (Rémy and Vouillon 1997). While trying to remain close to the analyses as presented theoretically, their implementation mildly diverges from them at certain points due to performance and scalability considerations. One of the main differences is related to the manner in which we store dependencies and partial equivalence relations. Based on the observation that, in general, when considering complex transition systems, the states are characterized by properties depending only on a limited subset of their subelements, while most transitions modify only a limited subset of the input state's subelements, we adopt a more compact representation. This, in turn, is reflected in some of the operators as well.


8.1.1 Dependency Type and Operators

The abstract dependency type δ that mirrors the structure of associative arrays and algebraic data types was introduced in Section 5.2 on page 83. It is implemented by the recursive type dep shown below.

    (* Implementation for the dependency type
       introduced in Chapter 5.2 *)
    type dep =
      | Everything                          (* top *)
      | Impossible                          (* bottom *)
      | Nothing
      | Deferred of accesses                (* symbolic *)
      | Struct of struct_typ * dep FMap.t
      | Variant of var_typ * dep CMap.t
      | Array of dep * (var * dep) option

The maps used for expressing dependencies for structures and variants use fields and constructors as keys, respectively.

    type field
    module FMap : EMap.S with type key = field

    type cons
    module CMap : EMap.S with type key = cons

In contrast to the extended abstract dependency type δ (Definition 6.4.1), the actual dependency for structures stores, in addition to the map associating dependencies to fields, the type struct_typ of the structure as well. Similarly, the actual dependency for variants stores the variant's type var_typ in addition to the map associating dependencies to constructors.

As previously mentioned, we are targeting complex transition systems such as operating systems and microkernels. In practice, transitions frequently map a large input state to a large output state, but for computing the output state they are concerned only with a limited subset of the input state. The number of subelements of a complex input on which the outcome of a predicate depends tends to be low compared to the total number of input subelements, so we filter fields mapped to Nothing in our implemented dependency type from dependencies for structures. Similarly, from dependencies for variants we filter constructors mapped to ⊥, denoted by Impossible in our implemented dependency type.

As a consequence of this optimization, we need to know, and hence store, the types of structures and variants in order to correctly compare, join and reduce dependencies corresponding to such types. In addition, this is also useful for checking that the constructed dependencies are well-typed.

For building dependencies of the corresponding type, we have implemented smart constructors. The dependency type is private and new dependencies can be constructed only by using the provided smart constructors.

As explained in Section 5.2, ⊤ and ⊥ can apply to any type. For instance, ⊤ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to ⊤, are transformed to ⊤. The ⊥ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to ⊥. A whole structure or array is impossible if any of its subelements is impossible. These canonizations¹ are made by our smart constructors. For instance, the smart constructor for structure dependencies returns Everything if it receives as an input a map of fields in which each key is mapped to Everything. Since fields that are absent from a field map must be interpreted as being mapped to Nothing, before returning Everything the constructor also verifies that the map of fields it received as an input contains all the fields of the structure type struct_typ, given as an input as well. If the given map of fields contains an Impossible value, the smart constructor returns Impossible. Any mapping field ↦ Nothing is filtered from the given input map.
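The canonization rules for structures can be illustrated by a minimal sketch. It makes simplifying assumptions that are not part of the actual prototype: fields are plain strings, a structure type is reduced to the list of its field names, the standard `Map.Make` functor stands in for `EMap`, and `mk_struct` is a hypothetical name.

```ocaml
(* Simplified stand-ins; the real prototype uses its own EMap and types. *)
module FMap = Map.Make (String)

type struct_typ = string list   (* assumption: a type is its field names *)

type dep =
  | Everything
  | Impossible
  | Nothing
  | Struct of struct_typ * dep FMap.t

(* Hypothetical smart constructor for structure dependencies. *)
let mk_struct (ty : struct_typ) (fields : dep FMap.t) : dep =
  if FMap.exists (fun _ d -> d = Impossible) fields then
    (* one impossible subelement makes the whole structure impossible *)
    Impossible
  else if List.for_all (fun f -> FMap.find_opt f fields = Some Everything) ty
  then
    (* every field of the type is present and entirely needed *)
    Everything
  else
    (* sparse representation: absent fields are read as Nothing *)
    Struct (ty, FMap.filter (fun _ d -> d <> Nothing) fields)
```

Checking against the field list `ty`, rather than against the keys of the map, is what prevents a map that mentions only some fields from being collapsed to Everything.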

Similarly, for variant dependencies, the corresponding smart constructor receives as inputs the variant's type and a map from constructor keys to dependency values. If all constructors of the variant, as indicated by its type var_typ, are present in the input map and mapped to Everything, the smart constructor returns Everything. If all constructors are present and mapped to Impossible, the smart constructor returns Impossible. Otherwise, if the input map contains some constructors mapped to Impossible, the corresponding mappings are filtered from the map used to build the variant dependency.
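The variant case can be sketched in the same spirit. As before, this is only an illustration under stated assumptions: constructors are strings, a variant type is the list of its constructor names, and `mk_variant` is a hypothetical name.

```ocaml
module CMap = Map.Make (String)

type var_typ = string list   (* assumption: a type is its constructor names *)

type dep =
  | Everything
  | Impossible
  | Nothing
  | Variant of var_typ * dep CMap.t

(* Hypothetical smart constructor for variant dependencies. *)
let mk_variant (ty : var_typ) (cons : dep CMap.t) : dep =
  (* true when every constructor of the type is present and mapped to d *)
  let all_mapped_to d =
    List.for_all (fun c -> CMap.find_opt c cons = Some d) ty in
  if all_mapped_to Everything then Everything
  else if all_mapped_to Impossible then Impossible
  else
    (* sparse representation: absent constructors are read as Impossible *)
    Variant (ty, CMap.filter (fun _ d -> d <> Impossible) cons)
```

Note the asymmetry with structures: a single impossible field makes a structure impossible, whereas a variant is impossible only when all of its constructors are.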

For arrays, the smart constructor returns Everything if both the default dependency and the known exceptional dependency are Everything, or if the former is Everything and there is no known exceptional dependency. If either of the two dependencies is Impossible, the smart constructor returns Impossible.
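A minimal sketch of the array case follows, assuming that index variables are plain strings and using the hypothetical name `mk_array`; it only covers the two canonizations described above.

```ocaml
type var = string   (* assumption: index variables are strings *)

type dep =
  | Everything
  | Impossible
  | Nothing
  | Array of dep * (var * dep) option   (* default, optional exception *)

(* Hypothetical smart constructor for array dependencies. *)
let mk_array (default : dep) (exc : (var * dep) option) : dep =
  match default, exc with
  (* an impossible default or exceptional cell makes the array impossible *)
  | Impossible, _ | _, Some (_, Impossible) -> Impossible
  (* all cells are entirely needed *)
  | Everything, (None | Some (_, Everything)) -> Everything
  | _ -> Array (default, exc)
```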

The smart constructor for deferred dependencies receives a set of variables as an input. If the given set is empty, the constructor returns Nothing. Otherwise, it creates the access map having the variables in the given input set, i.e. the root variables for symbolic paths, as keys. As described in Section 6.5, a set containing a single path, the empty path, is initially associated to each.

¹ For making all the described canonizations, we have to make sure that whenever we replace δ by δ′, both δ ⊑ δ′ and δ′ ⊑ δ hold.

The ⊑ operator (Definition 5.2.2), as formally presented in Section 5.2 and detailed in Table 5.1 on page 86, returns false whenever comparing two incompatible dependencies. In practice, situations in which comparisons on incompatible types are made should never be reached. As a consequence, whenever we compare structure or variant dependencies, we check, as a safety measure, that the two dependencies correspond to structures or variants of the same type. Otherwise, the two dependencies are not comparable and we throw an exception that indicates that the types are incompatible. For structure dependencies, whenever a mapping for one field f can be found only in one of the two maps to be compared, we compare its mapped dependency value to Nothing, since absent fields must be interpreted as being mapped to Nothing. Similarly, for variant dependencies, whenever a mapping for a constructor C can be found only in one of the two maps to be compared, we interpret it as being mapped to Impossible.
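The treatment of absent fields during comparison can be sketched as follows for structures. The sketch assumes string fields, a structure type reduced to its list of field names, and an order in which Impossible is below Nothing, which is below structured dependencies, which are below Everything. The exception name `Incompatible_types` and the function name `leq` are hypothetical.

```ocaml
module FMap = Map.Make (String)

type struct_typ = string list

type dep =
  | Everything
  | Impossible
  | Nothing
  | Struct of struct_typ * dep FMap.t

exception Incompatible_types

(* Partial order on the simplified type; absent fields count as Nothing. *)
let rec leq (a : dep) (b : dep) : bool =
  match a, b with
  | Impossible, _ | _, Everything -> true
  | _, Impossible | Everything, _ -> false
  | Nothing, _ -> true
  | Struct (_, ma), Nothing -> FMap.for_all (fun _ d -> leq d Nothing) ma
  | Struct (ta, ma), Struct (tb, mb) ->
    (* safety check: only structures of the same type are comparable *)
    if ta <> tb then raise Incompatible_types;
    let get = function Some d -> d | None -> Nothing in
    (* merge visits every field bound in at least one of the two maps *)
    let verdicts =
      FMap.merge (fun _ da db -> Some (leq (get da) (get db))) ma mb in
    FMap.for_all (fun _ ok -> ok) verdicts
```

For variants, the same scheme would apply with Impossible as the default for absent constructors.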

The join (Definition 5.2.3) and reduction operator (Definition 5.2.4), as formally presented in Section 5.2 on pages 87 and 89 respectively, are total: they return ⊤, the element conveying no information, for incompatible dependencies. In practice, the two operators are partial: an exception is thrown whenever the two dependencies to be joined or reduced are incompatible. This applies to structure or variant dependencies that do not correspond to the same type as well. Otherwise, when joining or reducing two compatible structure or variant dependencies, we interpret missing fields or missing constructors as being mapped to Nothing or Impossible, respectively.

In Section 6.6.1, we described that there are two types of free variables that can appear in dependencies. The first type consists of index variables that can appear in array dependencies. For instance, in <Nothing ^ i: Everything>, the variable i is the index of the cell for which the exceptional dependency Everything is known. Additionally, such index variables can also appear in symbolic paths related to arrays, such as <Nothing ^ i: Deferred(a[i])> or <Deferred(a[ - i]) ^ i: Nothing>. Such indices must be input variables of the currently analysed predicate, as explained in Section 5.3.2 on page 97. The second type of free variables are the root variables that appear in deferred dependencies. For instance, in <Deferred(a[ - i]) ^ i: Nothing>, the variable a is a root variable. In the general case, the root variables are those outputs to which symbolic access paths are associated in deferred dependencies. In order to make use of the computed context-sensitive information, actual dependencies can be substituted for the root variables. This is done by applying the symbolic access paths to the dependency to substitute. By traversing entire dependencies such as

    f -> <Nothing ^ j: Everything>
    g -> b -> Deferred(o)
    h -> x -> Everything
         y -> <Deferred(a[ - j]) ^ j: Nothing>

and substituting the nested deferred dependencies such as Deferred(a[ - j]) and Deferred(o), we apply context-sensitive information. Simultaneously, during the same traversal, we also substitute the indices appearing in array dependencies, such as j in the dependency associated to the field f, for instance. These are either substituted by another index variable or they are forgotten. If the index to substitute is an input, the formal variable will be replaced by the effective one. Otherwise, an approximation is made in order to remove the local index variable. This consists in joining the default and the exceptional dependencies and using the result for building a new array dependency without an exception.

An index substitution is a mapping from variables to either a new index variable to replace it, or to Forget if all references to the index variable should be removed. The index type is shown below.

    type index =
      | NewIdx of var
      | Forget

The substitution function subst has the following type:

    type var
    module VMap : EMap.S with type key = var

    val subst : index VMap.t -> dep VMap.t -> dep -> dep

Its first argument is the index substitution; the second argument is the dependency substitution, mapping root variables to dependencies. The third argument is the dependency on which the substitutions are to be made. The function returns the dependency obtained after making both substitutions. The two substitution passes are fused for performance considerations.
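The index part of the substitution can be illustrated on a reduced dependency type. The sketch below is not the fused subst function: it ignores deferred dependencies and root variables altogether, uses strings for variables, and relies on a deliberately coarse `join` that falls back to Everything. The names `subst_index` and `join` are hypothetical.

```ocaml
module VMap = Map.Make (String)

type var = string
type index = NewIdx of var | Forget

type dep =
  | Everything
  | Impossible
  | Nothing
  | Array of dep * (var * dep) option

(* Coarse join for illustration only: Impossible is the identity,
   and any two distinct dependencies are approximated by Everything. *)
let join a b =
  match a, b with
  | Impossible, d | d, Impossible -> d
  | _ -> if a = b then a else Everything

(* Apply an index substitution throughout a dependency. No canonization
   is performed here; the real smart constructors would take care of it. *)
let rec subst_index (s : index VMap.t) (d : dep) : dep =
  match d with
  | Everything | Impossible | Nothing -> d
  | Array (def, None) -> Array (subst_index s def, None)
  | Array (def, Some (i, exc)) ->
    let def = subst_index s def and exc = subst_index s exc in
    (match VMap.find_opt i s with
     | Some (NewIdx j) -> Array (def, Some (j, exc))   (* formal to effective *)
     | Some Forget -> Array (join def exc, None)       (* drop the local index *)
     | None -> Array (def, Some (i, exc)))
```

Forgetting an index loses the distinction between the exceptional cell and the others, which is why the default and exceptional dependencies are joined.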

A separate substitution is performed for dealing with polymorphic types. Our dependency type is not polymorphic per se. However, αSmil supports polymorphic types, and thus the variables described by the computed dependencies can have a polymorphic type. Since the types of structures and variants are stored in the corresponding dependencies, we must substitute polymorphic type parameters by their effective arguments. This is done by a recursive function which traverses the dependencies and makes the type substitution at each nested level, if necessary. Besides this substitution, no other modifications were made in the implementation in order to handle polymorphism. This justifies our formal presentation of the analyses without polymorphism.

8.1.2 Intraprocedural Dependency Analysis

The intraprocedural dependency type ∆ (Definition 5.3.1), mapping variables to dependencies δ, that was introduced in Section 5.3.1, is implemented as shown below.

    type reachable = dep VMap.t

    (* Implementation of the intraprocedural dependency domain
       introduced in Chapter 5.3.1 *)
    type intra =
      | Unreachable
      | Reachable of reachable

The VMap type is a map having variables as keys.

    type var
    module VMap : EMap.S with type key = var

In order to avoid needlessly storing large maps predominantly containing variables mapped to Nothing, we do not store by default mappings for variables for which dependencies have not yet been computed. Therefore, the intraprocedural dependency of any variable v for which a mapping has not yet been stored in the map is interpreted as v ↦ Nothing. As discussed in the previous section for the partial order, join and reduction operators, when applying ⊑_∆ (Definition 5.3.3) and the join ∨_∆ (Definition 5.3.4) and reduction ⊕_∆ (Definition 5.3.5) operators at the intraprocedural level, any missing mapping from a Reachable domain has to be interpreted as a variable mapped to Nothing.
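The effect of this sparse representation on lookups and on the pointwise join can be sketched as follows. The sketch is restricted to three atomic dependencies ordered as a chain, uses strings for variables, and the names `lookup`, `join_dep` and `join` are hypothetical.

```ocaml
module VMap = Map.Make (String)

(* Atomic cases only, ordered Impossible <= Nothing <= Everything. *)
type dep = Impossible | Nothing | Everything

let rank = function Impossible -> 0 | Nothing -> 1 | Everything -> 2
let join_dep a b = if rank a >= rank b then a else b

type intra = Unreachable | Reachable of dep VMap.t

(* A variable without a stored mapping is read as Nothing. *)
let lookup v dmap =
  match VMap.find_opt v dmap with Some d -> d | None -> Nothing

(* Pointwise join; Unreachable is the identity. Results equal to
   Nothing are dropped so that the map stays sparse. *)
let join a b =
  match a, b with
  | Unreachable, d | d, Unreachable -> d
  | Reachable ma, Reachable mb ->
    let get = function Some d -> d | None -> Nothing in
    Reachable
      (VMap.merge
         (fun _ da db ->
            match join_dep (get da) (get db) with
            | Nothing -> None
            | d -> Some d)
         ma mb)
```

Using the default for the missing side matters: a variable stored as Impossible on one side and absent on the other must join to Nothing, not remain Impossible.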

With this interpretation, forgetting a variable v (Definition 5.3.2) from an intraprocedural domain, as introduced in Section 5.3.1, becomes straightforward and amounts to simply removing the mapping for v from the intraprocedural domain.

    (* Forget *)
    let forget d v =
      match d with
      | Unreachable -> d
      | Reachable dmap -> Reachable (VMap.remove v dmap)

We remark that the complex operations are performed at the dependency type level and are mostly applied pointwise at the intraprocedural level. The interprocedural dependency domains are mappings from labels to intraprocedural dependency summaries.

8.2 Implementation of the Correlation Analysis

8.2.1 Partial Equivalence Relations and Operators

The partial equivalence type R (Definition 7.2.1) that mirrors the structure of associative arrays and algebraic data types, which was introduced in Section 7.2.1 on page 141, is implemented as shown below.

    (* Implementation of the partial equivalence type
       introduced in Chapter 7.2 *)
    type pequiv =
      | Equal                                    (* bottom *)
      | Any                                      (* top *)
      | PStruct of struct_typ * pequiv FMap.t    (* structures *)
      | PVariant of var_typ * pequiv CMap.t      (* variants *)
      | PArray of pequiv * (var * pequiv) option (* arrays *)

The FMap and CMap types are the ones presented on page 174.

Similarly to structure and variant dependencies, and due to the same practical considerations, in addition to the map associating partial equivalences to fields, the type struct_typ of the structure is stored as well. Similarly, the implemented partial equivalence for variants stores the variant's type var_typ in addition to the map associating partial equivalences to constructors.

To avoid storing large maps in which the majority of the fields or constructors are mapped to Any, we filter mappings of the form field ↦ Any and cons ↦ Any.

The partial equivalence type is private and the only manner in which partial equivalence relations can be built is by using the provided smart constructors. The two atomic cases, Equal and Any, can apply to any type. The smart constructor for partial equivalences corresponding to structures filters out any field mapped to Any. It also returns Equal if all fields of the structure are mapped to Equal in the given input map. If, on the contrary, the given input map is empty or all fields are mapped to Any, the smart constructor returns Any.
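A minimal sketch of the structure case for partial equivalences is given below, under the same simplifying assumptions as before: string fields, a structure type reduced to its list of field names, the standard `Map.Make` in place of `EMap`, and the hypothetical name `mk_pstruct`.

```ocaml
module FMap = Map.Make (String)

type struct_typ = string list   (* assumption: a type is its field names *)

type pequiv =
  | Equal
  | Any
  | PStruct of struct_typ * pequiv FMap.t

(* Hypothetical smart constructor for structure partial equivalences. *)
let mk_pstruct (ty : struct_typ) (fields : pequiv FMap.t) : pequiv =
  (* sparse representation: absent fields are read as Any *)
  let kept = FMap.filter (fun _ r -> r <> Any) fields in
  if FMap.is_empty kept then Any
  else if List.for_all (fun f -> FMap.find_opt f kept = Some Equal) ty
  then Equal
  else PStruct (ty, kept)
```

The default is dual to the one used for dependencies: here the uninformative value Any is the one left implicit, since most fields of a large state are expected to carry a known equivalence only rarely.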

Similarly, for partial equivalences corresponding to variants, the corresponding smart constructor receives as inputs the variant's type and a map with constructor keys and partial equivalences. If all constructors of the variant, as indicated by its type, are present in the input map and mapped to Equal, the smart constructor returns Equal. If all constructors are present and mapped to Any, or if the given input map is empty, the smart constructor returns Any. Otherwise, if the input map contains some constructors mapped to Any, the corresponding mappings are filtered from the map used to build the variant partial equivalence.

For arrays, the smart constructor returns Equal if both the default relation and the known exceptional relation are Equal, or if the former is Equal and there is no known exceptional relation. If both the default relation and the known exceptional relation are Any, or if the former is Any and there is no known exceptional relation, the smart constructor returns Any.

In contrast to dependencies, there is only one type of free variables that can appear in partial equivalence relations, namely index variables. As was the case for array dependencies, these can appear in partial equivalence relations corresponding to arrays and they must be input variables. We traverse the partial equivalences recursively, checking for each index variable appearing in an array relation whether it is an input or a local variable. References to local variables are eliminated by approximating the partial equivalences, effectively joining the default array relations with the exceptional array relations.
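This traversal can be sketched on a reduced relation type containing only the atomic cases and arrays. The set of input variables is passed explicitly, the `join` is a coarse approximation that falls back to Any, and the names `drop_local_indices` and `join` are hypothetical.

```ocaml
module VSet = Set.Make (String)

type pequiv =
  | Equal
  | Any
  | PArray of pequiv * (string * pequiv) option

(* Coarse join on the reduced type: Equal is the identity, and any two
   distinct relations are approximated by Any. *)
let join a b =
  match a, b with
  | Equal, r | r, Equal -> r
  | _ -> if a = b then a else Any

(* Keep exceptions indexed by inputs; merge the others into the default. *)
let rec drop_local_indices (inputs : VSet.t) (r : pequiv) : pequiv =
  match r with
  | Equal | Any -> r
  | PArray (def, None) -> PArray (drop_local_indices inputs def, None)
  | PArray (def, Some (i, exc)) ->
    let def = drop_local_indices inputs def
    and exc = drop_local_indices inputs exc in
    if VSet.mem i inputs then PArray (def, Some (i, exc))
    else PArray (join def exc, None)
```

In this sketch no canonization is applied to the result; in the prototype, the array smart constructor would further simplify a relation such as an array whose default is Any and which has no exception.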

8.2.2 Intraprocedural Correlations

In Section 7.4 on page 156, we have defined intraprocedural correlation summaries (Definition 7.4.1) as mappings from pairs of variables to correlation maps. In practice, the type intra is the following:

    module PVMap =
      EMap.Make (struct type t = element * element let compare = compare end)

    module PMap =
      EMap.Make (struct type t = Path.t * Path.t let compare = compare end)

    type correlation = pequiv PMap.t
    type intra = correlation PVMap.t

    type t =
      | Related of intra
      | NoCorrelation
      | Unreachable

The implemented intraprocedural correlation summary type intra is a mapping from pairs of elements to correlation maps. The element type is shown below.

    (* The type of the elements for which correlations
       are computed and kept intraprocedurally. Ghost elements
       are used only for variants: for a variant [v], a ghost
       element that nests the type of the variant [v] is created.
       These are filtered from final results. *)
    type element =
      | Local of var
      | Output of var
      | Ghost of texpr

In practice, we need to distinguish between output variables and local variables. This is important for distinguishing between the final value of an output, i.e. the one correlated with values of the inputs, and its local intermediate values. Furthermore, we need to introduce ghost elements for variants. When constructing a variant v with a constructor C(a, b), for instance, we can keep correlations between the pairs (a, v) and (b, v). However, we fail to capture the information regarding v's construction with C. In order to maintain it, we create a ghost element g_vtyp with v's type, we add the pair (g_vtyp, v) to the intraprocedural summary and associate (ε, ε) ↦ [C ↦ Any] to it. Such pairs are deleted from the intraprocedural predicate summaries; they are only used while analysing a predicate's body.

Unlike the operations discussed in Chapter 7, the implementations of the partial order (Definition 7.4.2) and join (Definition 7.4.3) operations are parameterized by the typing environment mapping variables to types. This has to be threaded through all operations, as it is necessary for the injection operation (Definition 7.3.8). We need to know the variable type onto which the relation is injected. For instance, in order to "fill" the unknown relations for fields or constructors with Any, we must first know what those fields or constructors are.

8.2.3 Dependency and Correlation Analysers

The input program is first parsed and each predicate is analysed in turn. Implicit predicates are treated conservatively. Since their implementation is hidden, a pessimistic assumption must be made. For the dependency analysis, it is considered that everything in their inputs has been read in order to obtain the outputs, for any possible exit label. Similarly, for the correlation analysis, it is considered that there is no correlation between the input and the output variables on any possible exit label.

For inductive predicates, the dependency analysis computes a summary for each case and joins the results for obtaining the dependency summary for the true exit label. The false label is treated conservatively and everything is considered to be read. Since inductive predicates are specification-only predicates that do not generate outputs, the correlation analysis associates a NoCorrelation summary to both labels.

    (* Analyse the body [g] of an explicit predicate *)
    let analyze g =
      let todo = Queue.create () in
      List.iter (fun v -> Queue.push v todo) (G.vertices g);
      let result = init_result g in
      let rec progress r =
        try
          let v = Queue.pop todo in
          let vd = MV.find v r in
          let edges = preds g v in
          let vd' = transfer r v edges in
          if D.leq vd' vd then progress r
          else begin
            List.iter (fun edge ->
              Queue.push (source edge) todo) edges;
            progress (MV.add v (D.join vd vd') r)
          end
        with Queue.Empty -> r
      in
      progress result

The body of each explicit predicate is analysed independently for each possible exit label, using a variation of the worklist algorithm, as shown above in the analyze function. Initially, a map is created having as many elements as there are nodes in the predicate's body. All of these are initially mapped to Unreachable, the bottom element at the intraprocedural level. All the predicate's exit nodes are loaded into the working queue. Then a recursive function, progress, is executed until a fixed point is reached and there are no more nodes left to analyse in the working queue. The first node of the queue is popped and analysed. The node's summary, as stored in the map, is retrieved in vd. The analysis returns a summary vd' for the node. The two summaries vd' and vd are compared and, if the former is at least as precise as the latter (D.leq vd' vd), the recursive function progress is called directly. Otherwise, before calling progress, the predecessors of the analysed node are pushed into the working queue and, in the map of nodes, the join of vd and vd' is associated to the analysed node. Since both analyses are backwards analyses, the dependency and correlation information of a node is based on the dependency or correlation information of its successors in the control flow graph, and the former must be recomputed if the latter are modified. Finally, from the computed intraprocedural dependency summary, all mappings corresponding to local variables are filtered out. From the computed correlation summary of an exit label l, all mappings that do not correspond to an input and output variable pair are filtered out.
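The worklist iteration described above can be tried out in isolation. The following is a minimal, self-contained sketch and not the analyser's actual code: the toy control flow graph, the domain (sets of variable names ordered by inclusion) and the names succs, gen and transfer are assumptions made purely for illustration. It also uses Queue.take_opt (OCaml 4.08 or later) instead of catching Queue.Empty, which keeps progress tail-recursive.

```ocaml
(* Toy backward worklist analysis. Assumption: the domain is a set of
   variable names ordered by inclusion; the real analyser works on
   dependency or correlation summaries instead. *)
module S = Set.Make (String)
module M = Map.Make (Int)

(* Hypothetical control flow graph: node -> successors. Node 3 is the exit. *)
let succs = function
  | 0 -> [ 1; 2 ]
  | 1 -> [ 3 ]
  | 2 -> [ 3 ]
  | _ -> []

let nodes = [ 0; 1; 2; 3 ]
let preds n = List.filter (fun p -> List.mem n (succs p)) nodes

(* Hypothetical local contribution of each node: the variables it reads. *)
let gen = function
  | 0 -> S.singleton "s"
  | 1 -> S.singleton "f"
  | 2 -> S.singleton "g"
  | _ -> S.empty

(* Transfer function: a node needs what it reads plus what its
   successors need (backward analysis). *)
let transfer r n =
  List.fold_left (fun acc s -> S.union acc (M.find s r)) (gen n) (succs n)

let analyze () =
  let todo = Queue.create () in
  List.iter (fun n -> Queue.push n todo) nodes;
  (* Every node starts at the bottom element (here, the empty set). *)
  let init = List.fold_left (fun m n -> M.add n S.empty m) M.empty nodes in
  let rec progress r =
    match Queue.take_opt todo with
    | None -> r
    | Some n ->
      let old_v = M.find n r in
      let new_v = transfer r n in
      if S.subset new_v old_v then progress r
      else begin
        (* The summary grew: the predecessors must be recomputed. *)
        List.iter (fun p -> Queue.push p todo) (preds n);
        progress (M.add n (S.union old_v new_v) r)
      end
  in
  progress init
```

In this sketch, the entry node 0 ends up with the union of everything read along both branches, which is the kind of information a backward analysis propagates from the exit nodes to the entry.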

For the dependency analyser, a command-line flag can be used to disable the usage of deferred dependencies. The well-typedness check of dependency summaries can be enabled in a similar manner.

A parser for dependency information has been implemented as well. This allows us to annotate αSmil programs with the expected results and compare them to the computed ones. A similar parser for the correlation information is planned for the near future.

8.3 Dependency and Correlation Results on ProvenCore Layers

8.3.1 ProvenCore Description

ProvenCore (Lescuyer, 2015) is one of the two microkernels entirely specified and developed in Smart at Prove & Run. Unlike Minix 3.1, by which it was inspired, ProvenCore targets ARM architectures and uses a Memory Management Unit for managing virtual address spaces. It is a general-purpose microkernel supporting creation and deletion of processes, execution of programs, synchronous message-passing inter-process communication with timeouts, asynchronous notifications and process-to-process data copies.

The main property ensured by ProvenCore is the isolation property. Isolation implies two complementary properties, namely integrity and confidentiality. Integrity refers to ensuring that the resources of a process (its code, data and registers) cannot be altered or interfered with by other processes, unless explicitly authorized by the process. Confidentiality refers to ensuring that the resources of a process cannot be observed by other processes, unless explicitly authorized by the process. In other words, integrity ensures that, until a process decides to communicate with other processes, it will execute as if it were alone on the system. Confidentiality ensures that, as long as a process does not send its secrets to other processes, it can change its secrets without affecting other processes.

The isolation property has been formally proven using the interactive proof assistant of ProvenTools. The proofs also establish functional specifications verified by ProvenCore (Lescuyer, 2015).

The proof for the isolation property is based on multiple refinements between successive models, from the most abstract, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. These successive models are shown in Figure 8.1.

Using multiple abstract models, each more abstract than its predecessor, enables a degree of separation of concerns in the overall proof. The lower-level proofs include a plethora of low-level properties and invariants and are devoid of functional properties, while the higher-level models focus on functional specifications. Each layer of abstraction removes details that are no longer relevant for it and enables changing the representation of the transition system in order to internalize in the structure of its states some invariants of the preceding level.

[Figure 8.1 – ProvenCore – Abstract Layers. From most abstract to least abstract: SPM, RSM, FSP, TDS.]

The Security Policy Model (SPM) is the most abstract level and the one at which the isolation property is expressed and proven. The kernel is modeled as an abstract controller and the various processes are modeled as machines, each possessing its own independent physical resources.

The Refined Security Model (RSM) is an intermediate layer, meant to bridge the wide gap between its successor, the SPM, and its predecessor, the FSP. In the RSM, the machines share the same physical resources, which are managed by the controller.

The Functional Specifications (FSP) layer is a model roughly equivalent to its predecessor, the TDS, in functionality but, unlike the latter, it uses data structures and algorithms that facilitate reasoning and formal proof. Its main functional difference from the TDS is that it eliminates MMU address translation, using instead a linear view of the RAM, similarly to the RSM.

The Target of Evaluation Design (TDS) is the model that is used to generate the sequential Smart code of the kernel, as well as the models for hardware components that are not translated into C code but which are necessary for completing the TDS specifications.

For each refinement, a view, i.e. a function from the concrete model state to the abstract model state, is defined. Then a correspondence, or commutation, lemma is proven, establishing that transitions from c to c′ in the concrete model entail transitions from the view of c to the view of c′ in the abstract model. Since the views are not total functions, this requires showing that the views actually exist. In this manner, the higher levels are attained, reaching models that are simpler and more flexible than the TDS but that still simulate all its possible behaviours (Lescuyer, 2015).
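To give an intuition of what a partial view and a commutation lemma look like, here is a small sketch on a made-up pair of models. Nothing in it comes from ProvenCore: the record concrete, the type abstract_state, and the functions view, step_concrete and step_abstract are hypothetical, and checking a few sample states is of course no substitute for the proof carried out in the proof assistant.

```ocaml
(* Hypothetical concrete model: a counter plus a cached parity flag. *)
type concrete = { count : int; even : bool }

(* Hypothetical abstract model: just the counter. *)
type abstract_state = int

(* The view is partial: it only exists for concrete states that satisfy
   the invariant relating [even] to [count]. *)
let view (c : concrete) : abstract_state option =
  if c.even = (c.count mod 2 = 0) then Some c.count else None

(* One transition in each model. *)
let step_concrete c = { count = c.count + 1; even = not c.even }
let step_abstract (a : abstract_state) : abstract_state = a + 1

(* Commutation on a single state: if [c] has a view, then its successor
   has one too, and the abstract transition relates the two views. *)
let commutes c =
  match view c with
  | None -> true (* nothing to show outside the domain of the view *)
  | Some a -> view (step_concrete c) = Some (step_abstract a)

(* Sample check over a few states, one of which has no view. *)
let all_commute =
  List.for_all commutes
    [ { count = 0; even = true };
      { count = 7; even = false };
      { count = 3; even = true } ]
```

The third sample state violates the invariant, so its view does not exist; this mirrors the remark that the existence of views has to be shown as part of the lemma.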

This refinement chain also facilitates reusing parts of one proof effort in other proofs.


8.3.2 Obtained Dependency and Correlation Results

Our dependency and correlation analyses must be evaluated by two different criteria, namely execution time and precision. In this section we discuss the former. The latter is discussed in the following section.

Both analyses target complex transition systems in general and operating systems in particular. The ideas behind them stemmed directly from the verification effort entailed by ProvenCore. Unlike other static analyses, which are frequently employed in a fully automatic setting, our static analyses are supposed to be used as companion tools in the middle of interactive program verification. They are supposed to be applied often, as steps during interactive proofs. For instance, the dependency and correlation summaries for different predicates might be needed for verifying a single property. These, in turn, may imply a whole-model analysis. Therefore, the dependency and correlation analyses must perform quickly in order to effectively answer frequently asked "questions".

Our analyses have currently been applied to the functional specification of ProvenCore (Lescuyer, 2015). More specifically, they have been applied to the RSM, FSP and TDS layers shown in Figure 8.1. Each of these layers is characterized by a global state with numerous fields and by different transitions, i.e. supported commands or system calls such as fork, exec and exit. Each supported command receives as an input the global state before the transition and returns the state of the system after the transition.

For instance, in RSM the global states are much simpler compared to the ones in the layers below it, i.e. FSP and TDS. They are modeled by a structure with 6 fields, out of which 3 are modeled by arrays and 2 by structures. The RSM counterpart of the optional table of processes is a store of machines, which are themselves the counterpart of FSP processes. Machines are structures with 7 fields that refer to registers, information regarding inter-process communication or permissions, and code and data segments. Out of the 7 fields, 2 are modeled by variants, 2 by associative arrays and another 2 by structures.

The global state of the FSP layer is modeled by a structure type with 15 fields, including fields that concern process management (for memory allocations, information about processes), interrupt handling (registered handlers, active handlers), scheduling (priority queues, currently running process, process to run next), time management, or code data. Among these 15 fields, 9 fields are "composite", themselves being modeled by structures, variants or associative arrays. For instance, among the fields concerning process management, there is a table of optional processes. The processes themselves are modeled by a structure type having 26 fields. Out of the total of 26 fields, 11 are modeled by algebraic data structures or associative arrays, too.

The FSP global state is characterized by over 70 invariants.

In TDS, the global state is a structure having 33 fields, among which 23 are "composite" as well. The processes are structures having 29 fields, among which 14 are modeled by associative arrays or algebraic data types. The global state is characterized by approximately 140 invariants.


In Table 8.3 we give an overview of the global states for each analysed layer. The first column shows the total number of fields. The second column indicates the number of fields that are modeled by associative arrays. Between parentheses, we indicate the number of arrays having "composite" elements and the number of arrays having elements of atomic or implicit types, respectively. For example, the FSP global state has 6 fields that are modeled by associative arrays and all 6 of them have "composite" elements. In columns 3, 4 and 5, we show the number of fields that are modeled by structures, variants, and atomic or implicit types, respectively.

Table 8.3 – ProvenCore Abstract Layers – Global State Type

          Global State   Arrays              Structures   Variants   Atomic/Implicit
    RSM   6 fields       2 fields (1, 1)     2 fields     0 fields   2 fields
    FSP   15 fields      6 fields (6, 0)     0 fields     3 fields   6 fields
    TDS   33 fields      14 fields (14, 0)   3 fields     6 fields   10 fields

The global state of each layer contains an array or store of processes or machines. In Table 8.4 we give an overview of the process or machine type for each analysed layer. The table has the same structure as the one described previously for the global state types.

Table 8.4 – ProvenCore Abstract Layers – Process/Machine Type

          Process/Machine   Arrays            Structures   Variants   Atomic/Implicit
    RSM   7 fields          2 fields (1, 1)   2 fields     2 fields   1 field
    FSP   26 fields         2 fields (0, 2)   5 fields     3 fields   16 fields
    TDS   29 fields         1 field (1, 0)    8 fields     5 fields   15 fields

We have applied our dependency and correlation analyses on the RSM, FSP and TDS layers, thus conducting medium-sized experiments. An overview of the characteristics of the 3 ProvenCore layers is included in Table 8.5, Table 8.7 and Table 8.9. In each of these, the first column shows the total number of predicates of the analysed layers. In parentheses, we indicate the number of predicates that only read information and return a Boolean-like exit label, i.e. logical properties, as well as the number of implicit predicates, for which a pessimistic assumption is made. The second column shows the total number of lines of code (LoC) for each, including comments and type definitions. The next three columns indicate the number of LoC corresponding to predicates, type definitions and comments, respectively.

We have run the analyses 101 times in a loop on a Lenovo laptop with a Quad-Core Intel Core i7-5500U processor and 8 GB RAM. The system runs Xubuntu GNU/Linux 64-bit, Release 15.10, with OCaml 4.01. Before the first run of each loop, the operating system's cache was dropped using the following command:

    echo 3 > /proc/sys/vm/drop_caches

The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.

On average, our fully context-insensitive dependency analysis, as presented in Chapter 5, computed the dependency summaries for the 633 RSM/FSP predicates in 0.656 seconds. For the TDS predicates, the dependency summaries were computed in 0.699 seconds on average. These results are indicated in Table 8.5.

Table 8.5 – Abstract Layers – Evaluation Data and Dependency Analysis Timing

              Predicates      Total LoC   Code    Types   Comments   Dependency Avg
    RSM/FSP   633 (235/65)    9853        8402    596     855        0.656 s
    TDS       780 (231/155)   14000       11306   588     2106       0.699 s

In Table 8.6 we indicate the minimum and maximum execution times for the context-insensitive dependency analysis. Various percentiles are indicated as well.

Table 8.6 – Abstract Layers – Detailed Dependency Analysis Timing (in seconds)

              Min     10%ile   50%ile   90%ile   Max     Avg
    RSM/FSP   0.650   0.651    0.652    0.658    0.730   0.656
    TDS       0.690   0.691    0.693    0.718    0.798   0.699

The average execution time of our dependency analysis with the deferred accesses extension is shown in Table 8.7, in the last column, denoted by Avg. On average, our dependency analysis extended with deferred accesses, as presented in Chapter 6, computed the dependency summaries with context-sensitive leaves for the 633 RSM/FSP predicates in 0.779 seconds. For the TDS predicates, the dependency information was computed in 0.919 seconds on average.

Therefore, using our relaxed form of context-sensitivity led to an increase of 10-20% in execution time on the used benchmarks.

The detailed timing information for the dependency analysis using deferred accesses is shown in Table 8.8.

Table 8.7 – Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing

              Predicates      Total LoC   Code    Types   Comments   Deferred Avg
    RSM/FSP   633 (235/65)    9853        8402    596     855        0.779 s
    TDS       780 (231/155)   14000       11306   588     2106       0.919 s

Table 8.8 – Abstract Layers – Detailed Deferred Dependency Analysis Timing (in seconds)

              Min     10%ile   50%ile   90%ile   Max     Avg
    RSM/FSP   0.776   0.777    0.779    0.781    0.785   0.779
    TDS       0.904   0.905    0.908    0.975    0.999   0.919

The average execution time of our correlation analysis is shown in Table 8.9, in the last column, denoted by Avg. The correlation summaries for the RSM/FSP predicates are computed in 0.426 seconds on average. For the TDS predicates, the correlation summaries are computed in 0.496 seconds on average. Unlike the dependency analysis, which computes information for code as well as specifications, i.e. logical properties, in a unified manner, the correlation analysis only computes information for predicates that actually modify data structures. This partly explains the time difference between the two analyses. We also remark that the possible-constructors analysis is performed simultaneously with the dependency analysis, and this contributes to the difference between the execution times as well.

Table 8.9 – Abstract Layers – Evaluation Data and Correlation Analysis Timing

              Predicates      Total LoC   Code    Types   Comments   Correlation Avg
    RSM/FSP   633 (235/65)    9853        8402    596     855        0.426 s
    TDS       780 (231/155)   14000       11306   588     2106       0.496 s

The detailed timing information for our correlation analysis is shown in Table 8.10.

Generally, static analysis has been considered prohibitive in terms of execution time, and it has been avoided in an interactive context and used predominantly in an automatic context. Though currently applied only on medium-sized models, the execution times of both of our analyses are short enough to expect reasonable execution times for larger models as well.²

² It is worth noting that the interprocedural dependency and correlation summaries will not necessarily be computed on-the-fly during the interactive proof. Rather, they will be computed as part of the build. In contrast, the treatment of a query, once all interprocedural information has been computed, will be executed in real-time. Nevertheless, it is desirable to have fast analyses, allowing developers to iterate frequently.

Table 8.10 – Abstract Layers – Detailed Correlation Analysis Timing (in seconds)

              Min     10%ile   50%ile   90%ile   Max     Avg
    RSM/FSP   0.424   0.425    0.425    0.427    0.432   0.426
    TDS       0.492   0.493    0.494    0.498    0.540   0.496

8.3.3 Precision of our Dependency and Correlation Summaries

In this section we try to illustrate the sort of dependency and correlation summaries that are computed by our analyses. We conclude the section with a brief discussion regarding the precision of our obtained results. Assessing and discussing precision as a metric for usefulness is hard in isolation and can only be effectively done in relation to actual applications. However, we present some statistics in order to give some insight into the proportion of non-trivial information computed. For our current discussion, we focus on the results obtained on the RSM/FSP and the TDS layers.

One of the analysed predicates of the RSM/FSP layers is do_auth. This predicate is a system call clearing or granting an authorization to some process to read from or write to some memory range of the current process. It receives a global state in and an index i as inputs, and produces on the true label the new global state out, after modifying the permission for the i-th process in the process store.

The code of do_auth performs various system-wide checks before registering the permission change, and is therefore not trivial, although its effect is quite limited. Indeed, the correlation results computed by our analysis for the true label of this predicate are shown below.

    true: (in, out) ↦ [ (ε, ε) ↦ ... ↦ Equal      (14 fields)
                                 procs ↦ Any
                        (procs, procs) ↦ 〈 Equal i [ None ↦ Equal
                                                     Some ↦ v ↦ ... ↦ Equal      (25 fields)
                                                                mem_auth ↦ Any ] 〉 ]

The analysis detects that, out of the 15 fields of out, only the i-th element of the procs field is changed. Furthermore, it detects that, if this element is an active process, i.e. built with the Some constructor, only the mem_auth field is modified out of the total of 26 fields. Everything else is copied from the input state in.


Combined with dependency summaries for logical properties, this correlation summary would allow us to infer the preservation of all invariants that are not concerned with the memory permissions. All but one of the specified properties for the global state fall into this category. The exception is the relevant memory permissions property:

    predicate proc_mem_auth_ok(proc proc) -> [true | false]

which verifies a fundamental property that has to hold for all processes in the process store of proc, and states that a process has permissions covering a valid range of memory addresses and referring only to existing processes. After executing do_auth, this property is threatened and needs to be verified only for the i-th process of the store. It is preserved for all others.

The dependency results computed by our analysis for this predicate are shown below. The analysis detects that, for each of the possible execution scenarios, the outcome depends only on 2 out of the 26 fields, namely the stackframe and the memory permissions. The dependency on the stackframe is confined to only one of the 3 fields: the data and stack segment. The memory permissions are given by a variant with 3 constructors, denoting reading and writing permissions or the absence of any permission. Furthermore, besides pinning down the outcome's dependency on 2 out of the 26 fields of the proc structure, the analysis also detects that the absence of any memory permission, indicated by the constructor NONE of the mem_auth variant, is ⊥ for the false execution scenario. In other words, unused permissions cannot threaten the property proc_mem_auth_ok.

    false → proc → mem_auth → [ READ  → base → ⊤, len → ⊤
                                WRITE → base → ⊤, len → ⊤
                                NONE  → ⊥ ]
                   stackframe → ds → ⊤

    true  → proc → mem_auth → [ READ  → base → ⊤, len → ⊤
                                WRITE → base → ⊤, len → ⊤
                                NONE  → ]
                   stackframe → ds → ⊤

The relevant memory permissions property is thus only threatened by transitions that add memory permissions or change a process' virtual space layout. Only 2 transitions out of the 25 belong to this category: exec, which resets the process' segments, and do_auth, which adds permissions and was discussed above. In particular, transitions deleting memory permissions do not impact the property, since the absence of permissions, as shown by the dependency of the constructor NONE for the false label, is an impossible case when the property does not hold. This is one of the practical advantages of tracking constructor possibilities simultaneously and of extending the correlation analysis to track the evolution of constructors as well.

In the following, we briefly discuss our dependency summaries obtained on the RSM/FSP layer in terms of precision. An overview is given in Table 8.11. The first column refers to the fully context-insensitive dependency analysis, as presented in Chapter 5. The second column refers to the dependency analysis extended with deferred access maps, as presented in Chapter 6. The first line indicates the total number of predicates, both implicit and explicit. The second line indicates the total number of implicit predicates, for which we are obliged to make a pessimistic assumption and to consider everything needed, given that their implementation is hidden. The third line indicates the number of explicit predicates without inputs, for which empty summaries are retrieved. Our dependency analysis detects the input subset that is read in order to obtain the output. In the case of predicates without inputs, this subset is empty. Most explicit predicates without inputs correspond to wrapper predicates around calls to constructors that take no arguments. Since αSmil is an intermediate language, such predicates are automatically generated and do not necessarily correspond to programmer-written predicates. The next line, line 4, indicates the number of predicates for which we obtain non-trivial information. By non-trivial information we mean dependency summaries in which the dependency associated to at least one input variable is different from ⊤, i.e. Everything, the element conveying no information. With the context-insensitive dependency analysis, we obtain non-trivial results for 344 predicates. With the extended dependency analysis, we obtain non-trivial results for 403 predicates.

Table 8.11 – RSM/FSP Layers – Evaluation Data and Dependency Summaries

                                       Context-Insensitive   Deferred
    Number of Total Predicates         633                   633
    Number of Implicit Predicates      65                    65
    No Inputs                          26                    26
    Number of Non-Trivial Results      344                   403
    Number of Trivial Results          289                   230
      • Implicit                       65                    65
      • No Inputs                      26                    26
      • Other                          198                   139
    Predicates with Atomic Inputs      31                    31
    Completely Read                    71                    71
    Overapproximation                  96                    37

The following line, line 5, indicates the total number of predicates for which trivial results are obtained. These include the results for implicit predicates, as well as those for predicates without inputs. For the simple version of the dependency analysis, we obtain 198 trivial results, excluding implicit predicates and predicates without inputs. For the extended dependency analysis, we obtain trivial results for 139 predicates, excluding implicit predicates and predicates without inputs. Therefore, for the first version of the analysis, 59 trivial summaries (198 − 139) are a consequence of context-insensitivity. The next 3 lines refer to the 139 predicates for which trivial results are obtained with both versions of the dependency analysis. 31 of them correspond to predicates manipulating only inputs of atomic types, such as int. Such inputs are completely read and thus the trivial results are justified and do not correspond to an over-approximation. Another 71 correspond to predicates making complex manipulations and actually reading all of their input, such as well-formedness checks. The last 37 trivial results are a consequence of over-approximations made by our analysis. The majority of them correspond to complex predicates making multiple calls to other complex predicates and relying heavily on calls to implicit predicates, for which conservative assumptions are made. For the simple dependency analysis, another 59 trivial results (96 − 37) are a result of over-approximations related to context-insensitivity.

An overview of the dependency results for the TDS layer is given in Table 8.12. The table follows the same structure as described for Table 8.11.

Table 8.12 – TDS Layer – Evaluation Data and Dependency Summaries

                                       Context-Insensitive   Deferred
    Number of Total Predicates         780                   780
    Number of Implicit Predicates      155                   155
    No Inputs                          15                    15
    Number of Non-Trivial Results      386                   458
    Number of Trivial Results          394                   322
      • Implicit                       155                   155
      • No Inputs                      15                    15
      • Other                          224                   152
    Predicates with Atomic Inputs      49                    49
    Completely Read                    59                    59
    Overapproximation                  116                   44

We remark that, with the deferred dependencies extension, we obtain more precise dependency summaries for 273 predicates of the RSM/FSP abstract layer. These constitute approximately 50% of the predicates in the used benchmark. For the TDS layer, we obtain more precise results for 308 predicates using the deferred dependencies extension. These constitute approximately 50% of the predicates in the TDS layer for which non-trivial results can be obtained (i.e. excluding implicit predicates and those without inputs). The dependency summaries obtained with the extended analysis are considerably more detailed. For instance, just to give an intuition of the difference between the results obtained for the TDS layer, the file containing the results computed with the context-insensitive dependency analysis contains 7333 lines and its size is 2631 kB, while the file containing the results computed with the extended analysis contains 11547 lines and its size is 5239 kB.

The statistics for the correlation analysis are shown in Table 8.13. Unlike the dependency analysis, which handles both logical properties and predicates generating outputs, the correlation analysis does not handle logical properties. It tracks fine-grained partial equivalences between parts of the input and parts of the output. Therefore, the number of RSM/FSP predicates for which we can obtain non-trivial results (i.e. at least one partial equivalence between an input (sub)element and an output (sub)element on at least one exit label) is lower. Implicit predicates and specification-only predicates are mapped to NoCorrelation, the top element, conveying no information. Out of the 307 predicates left, we obtain non-trivial results for 186. The rest include predicates relying heavily on calls to implicit predicates. They also include complex system calls, such as fork or exec, and auxiliary operations which modify their input entirely.

Table 8.13 – RSM/FSP Layers – Evaluation Data and Correlation Summaries

                                                     Correlation Analysis
    Number of Total Predicates                       633
    Number of Implicit Predicates                    65
    Number of Logical Properties (No Outputs)        235
    No Inputs                                        26
    Number of Non-Trivial Results                    186
    Number of Trivial Results                        90
      • Implicit                                     65
      • No Inputs                                    26
      • No Outputs                                   235
      • Atomic/Implicit Inputs                       31

An overview of the correlation results for the TDS layer is given in Table 8.14. The table follows the same structure as described for Table 8.13.

Table 8.14 – TDS Layer – Evaluation Data and Correlation Summaries

                                                     Correlation Analysis
    Number of Total Predicates                       780
    Number of Implicit Predicates                    155
    Number of Logical Properties (No Outputs)        231
    No Inputs                                        15
    Number of Non-Trivial Results                    235
    Number of Trivial Results                        95
      • Implicit                                     155
      • No Inputs                                    15
      • No Outputs                                   231
      • Atomic/Implicit Inputs                       49

8.4 Reasoning about Framing using Correlations and Dependencies

8.4.1 A Decision Procedure

In general, reasoning about framing relies on the frame rule, which is commonly illustrated as follows:

$$\frac{\{P\}\; C\; \{Q\}}{\{P \land R\}\; C\; \{Q \land R\}}$$

The purpose of the frame rule is to enable local reasoning: a property R that holds for a state P will continue to hold after executing a command C, provided that R reads only locations that are unmodified by C. The frame rule, also called the rule of constancy (Reynolds, 1981), applies in its original form to simple languages which do not use a heap. Separation logic addresses framing for heap-supporting languages.

In our case, the αSmil language with which we are working does not support mutation. Our work is not concerned with heap modifications, but focuses on deep-state modifications. We handle predicates that receive a composite input state and construct a new composite output state, without altering the former. The new output state is constructed by copying the input state and modifying a subset of subelements.

In our context, the frame rule must be reinterpreted as follows: a property R is preserved by a predicate C, receiving an input state P and constructing an output state Q, if the states P and Q agree on the subset on which the property R depends. In other words, a property is preserved by a predicate if the latter only modifies subelements on which the property does not depend. Using the terminology of separation logic, a property R is preserved by a predicate C if the footprint of C is disjoint from the footprint of R. However, we are not concerned with locations, but with subelements of large states, modeled by algebraic data structures and arrays. Therefore, when reasoning about framing, we need to check whether the input subset modified by an operation is disjoint from the subset that properties are reading and depending on.
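The disjointness check at the heart of this reinterpretation can be illustrated with a deliberately simplified sketch. It assumes that footprints are flattened to sets of top-level field names, which is much coarser than the structured summaries computed by the analyses; do_auth, proc_mem_auth_ok and procs are taken from the discussion in Section 8.3.3, whereas the property sched_ok and the fields queues and current are hypothetical.

```ocaml
module Fields = Set.Make (String)

(* Assumption: footprints are flattened to sets of field names. The real
   summaries are structured and also cover constructors and array cells. *)
let modified_by_do_auth = Fields.of_list [ "procs" ]

(* Hypothetical scheduling property reading two made-up fields. *)
let read_by_sched_ok = Fields.of_list [ "queues"; "current" ]

(* Coarse footprint of the memory permissions property. *)
let read_by_proc_mem_auth_ok = Fields.of_list [ "procs" ]

(* A property is preserved when the footprint of the operation and the
   footprint of the property are disjoint. *)
let preserved ~modified ~read = Fields.is_empty (Fields.inter modified read)

let sched_ok_preserved =
  preserved ~modified:modified_by_do_auth ~read:read_by_sched_ok

let mem_auth_ok_preserved =
  preserved ~modified:modified_by_do_auth ~read:read_by_proc_mem_auth_ok
```

With this coarse view, the hypothetical scheduling property is framed out, whereas the memory permissions property is not. Deciding that the latter only has to be re-verified for the i-th process requires the finer-grained summaries discussed above.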

We have devised two static analyses for automatically computing the footprints of operations and properties. The dependency analysis detects the input subset on which the outcome of an operation or of a property relies. The correlation analysis detects the input subset that is modified by an operation in order to obtain the output. The results of the two analyses are meant to be used and combined by a decision procedure in order to automatically infer the preservation of frame properties.

The decision procedure has not been implemented yet but, based on preliminary experiments, we give an intuition about how the dependency and correlation summaries are meant to be unified, what type of queries could be answered, and the mechanism used for answering them.

Concretely, the decision procedure is meant to receive a sequence of atoms, one of which is a query. The query is to be answered based on the correlation summaries computed for the other atoms. Atoms are calls to built-in or user-defined predicates. Queries usually consist of a Boolean built-in statement, such as an equality check or a partial structure equality check, for instance, or of a call to a logical predicate having true and false as exit labels and generating no outputs. In a nutshell, the dependency summary computed for the query would have to be transformed and interpreted as a set of correlations that are sufficient to answer the given query affirmatively. This should then be compared to the correlations computed for the atoms. The query can be answered affirmatively if the latter is less than or equal to the former.

We sketch the envisioned mechanism behind our decision procedure on a simple example receiving 4 atoms. One of them is a query, as shown below.

    type state = { f : int; g : int; h : int }

    v1 = s.f
    t = { s with g = w }
    v2 = t.f

Q v1 = v2 - true -

In this case, it is not necessary to first obtain the dependency for the query, marked with Q, and to interpret it as a correlation. The necessary and sufficient correlation for the query to be answered affirmatively can be obtained directly:

\[ (\mathit{v1}, \mathit{v2}) \mapsto (\varepsilon, \varepsilon) \mapsto \mathit{Equal} \]

Separately, we need to extract all the correlation information regarding (v1, v2) from the given atoms. For this, we must first find the chains of correlations connecting the two through other intermediate atoms. Therefore, we begin by building an undirected graph in which every variable appearing in the atoms is added as a node. An edge is added between any two nodes representing an input and an output of the same atom.³ For our example, the graph is shown below.

[Figure: undirected graph over the variables s, t, v1, v2 and w, with edges s–v1, s–t, w–t and t–v2.]

³ In general, these graphs will not be acyclic. Further measures will have to be taken to deal correctly with all cases.


The path connecting v1 and v2 is highlighted in green. In the general case, such paths could be detected using a depth-first search algorithm. Using the detected path between v1 and v2, we build a chain of pairs of variables of the following form:

\[ (\mathit{v1}, s) \leftrightarrow (s, t) \leftrightarrow (t, \mathit{v2}) \]
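The graph construction and path detection described above can be sketched as follows. This is a minimal illustration, not the thesis prototype: the encoding of atoms as dictionaries of input and output variables and the names `ATOMS`, `build_graph`, `find_path` and `chain` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical encoding: each atom lists its input and output variables.
ATOMS = [
    {"inputs": ["s"], "outputs": ["v1"]},      # v1 = s.f
    {"inputs": ["s", "w"], "outputs": ["t"]},  # t = { s with g = w }
    {"inputs": ["t"], "outputs": ["v2"]},      # v2 = t.f
]


def build_graph(atoms):
    """Undirected graph with one node per variable and one edge per
    (input, output) pair of the same atom."""
    graph = defaultdict(set)
    for atom in atoms:
        for i in atom["inputs"]:
            for o in atom["outputs"]:
                graph[i].add(o)
                graph[o].add(i)
    return graph


def find_path(graph, start, goal, visited=None):
    """Depth-first search returning one simple path from start to goal,
    or None when the two variables are not connected."""
    visited = (visited or set()) | {start}
    if start == goal:
        return [start]
    for nxt in sorted(graph[start]):
        if nxt not in visited:
            rest = find_path(graph, nxt, goal, visited)
            if rest is not None:
                return [start] + rest
    return None


def chain(path):
    """Turn a path of variables into the chain of adjacent variable pairs."""
    return list(zip(path, path[1:]))


path = find_path(build_graph(ATOMS), "v1", "v2")
print(chain(path))
```

Tracking visited nodes per branch keeps the search correct on cyclic graphs, at the price of possibly exploring many paths; a full implementation would likely need to enumerate all connecting paths rather than the first one found.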

These are the unordered pairs for which we need to extract the correlation information contained in the correlation summaries of the atoms. The correlation summaries of our example atoms are the following:

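\[
\begin{array}{ll}
\texttt{v1 = s.f} & (s, \mathit{v1}) \mapsto (f, \varepsilon) \mapsto \mathit{Equal} \\[1ex]
\texttt{t = \{ s with g = w \}} & (s, t) \mapsto \left\{ \begin{array}{l} (f, f) \mapsto \mathit{Equal} \\ (h, h) \mapsto \mathit{Equal} \end{array} \right. \\[1ex]
 & (w, t) \mapsto (\varepsilon, g) \mapsto \mathit{Equal} \\[1ex]
\texttt{v2 = t.f} & (t, \mathit{v2}) \mapsto (f, \varepsilon) \mapsto \mathit{Equal}
\end{array}
\]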

In the correlation summaries computed by our analysis, correlation maps are associated to pairs of input and output values, i.e., the computed information is expressed between the input and the output variables of an operation. They can be seen as ordered pairs having inputs as the left members and outputs as the right members. However, the correlation information expresses a relation between two runtime values, which can be compared independently of the order in which they appear.⁴ The atoms refer to values that occur in the program at different times, and answering the query is done independently of the order of execution. Therefore, at this level, we can swap the members of the pairs to which correlation maps are associated. This allows us to obtain correlation information expressed in terms of the variable pairs in the chain extracted from the graph of atom variables. For our example, we would obtain the following:

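\[ (\mathit{v1}, s) \leftrightarrow (s, t) \leftrightarrow (t, \mathit{v2}) \]

\[
\begin{array}{l}
(\mathit{v1}, s) \mapsto (\varepsilon, f) \mapsto \mathit{Equal} \\[1ex]
(s, t) \mapsto \left\{ \begin{array}{l} (f, f) \mapsto \mathit{Equal} \\ (h, h) \mapsto \mathit{Equal} \end{array} \right. \\[1ex]
(t, \mathit{v2}) \mapsto (f, \varepsilon) \mapsto \mathit{Equal}
\end{array}
\]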

From these, we compute the Cartesian product of the correlations appearing in the correlation maps, as follows:

⁴ When the evolution of constructors is tracked as well, the relations will stop being symmetric. Thus, the matrices will have to be transposed.


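\[ \{c_1\} \times \{c_2, c_3\} \times \{c_4\} \]

where

\[
\begin{array}{l}
c_1 = (\varepsilon, f) \mapsto \mathit{Equal} \\
c_2 = (f, f) \mapsto \mathit{Equal} \\
c_3 = (h, h) \mapsto \mathit{Equal} \\
c_4 = (f, \varepsilon) \mapsto \mathit{Equal}
\end{array}
\]

For our example, the obtained set would be the following:

\[
\left\{
\begin{array}{l}
\big( (\varepsilon, f) \mapsto \mathit{Equal},\ (f, f) \mapsto \mathit{Equal},\ (f, \varepsilon) \mapsto \mathit{Equal} \big), \\
\big( (\varepsilon, f) \mapsto \mathit{Equal},\ (h, h) \mapsto \mathit{Equal},\ (f, \varepsilon) \mapsto \mathit{Equal} \big)
\end{array}
\right\}
\]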

For each member of the obtained set, we need to recursively compose the correlations in order to obtain information regarding the values involved in the query. The compose operations would be applied as follows:

\[ (((c'_1 \circ c'_2) \circ c'_3) \cdots) \]

where, for the first element of our example set, $c'_1$, $c'_2$ and $c'_3$ have the following values:

\[
\begin{array}{l}
c'_1 = (\varepsilon, f) \mapsto \mathit{Equal} \\
c'_2 = (f, f) \mapsto \mathit{Equal} \\
c'_3 = (f, \varepsilon) \mapsto \mathit{Equal}
\end{array}
\]

For our example, we cannot obtain any correlation information regarding (v1, v2) by composing the correlations of the second member of the Cartesian product. The first correlation relates the value of v1 to the value of the f field of s, while the second correlation relates the values of the field h of s and t. Thus, in this case, we cannot infer anything regarding v1 and t, nor regarding v1 and v2. However, by composing the correlations of the first member of the Cartesian product, we obtain the following:

\[ (\mathit{v1}, \mathit{v2}) \mapsto (\varepsilon, \varepsilon) \mapsto \mathit{Equal} \]
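The Cartesian product and the left-to-right composition can be illustrated with a small sketch. It assumes a deliberately simplified `compose` that only handles exact matches of the middle access path and the Equal relation; the actual compose operation of the correlation analysis is more general. The tuple encoding and all names below are hypothetical.

```python
from itertools import product

# A correlation is (left_path, right_path, relation); "" stands for the
# empty path ε. One correlation map per pair of the chain
# (v1, s) <-> (s, t) <-> (t, v2), already oriented along the chain.
CHAIN_MAPS = [
    [("", "f", "Equal")],                        # (v1, s)
    [("f", "f", "Equal"), ("h", "h", "Equal")],  # (s, t)
    [("f", "", "Equal")],                        # (t, v2)
]


def compose(c1, c2):
    """Compose two correlations sharing a middle variable.

    Simplifying assumption: the result is defined only when the middle
    paths match exactly and both relations are Equal; otherwise nothing
    can be inferred and None is returned.
    """
    if c1 is None or c2 is None:
        return None
    (left, mid1, rel1), (mid2, right, rel2) = c1, c2
    if mid1 == mid2 and rel1 == rel2 == "Equal":
        return (left, right, "Equal")
    return None


def compose_chain(member):
    """Apply compose from left to right: ((c1 . c2) . c3) ..."""
    result = member[0]
    for correlation in member[1:]:
        result = compose(result, correlation)
    return result


results = [compose_chain(member) for member in product(*CHAIN_MAPS)]
print(results)
```

Under these assumptions, the first member of the product yields the correlation on the pair of empty paths, while the second member yields no information, matching the discussion above.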

If, after composing, we had obtained multiple correlations referring to (v1, v2), these would have had to be intersected, thus allowing us to extract from the given atoms the most precise correlation information regarding (v1, v2). In the general case, the correlation information obtained after the intersection is the one that has to be compared to the correlation computed previously, i.e., the sufficient correlation for the query to be answered affirmatively. For our example, this amounts to comparing:

\[ (\mathit{v1}, \mathit{v2}) \mapsto (\varepsilon, \varepsilon) \mapsto \mathit{Equal} \;\sqsubseteq_K\; (\mathit{v1}, \mathit{v2}) \mapsto (\varepsilon, \varepsilon) \mapsto \mathit{Equal} \]

Based on this, we can conclude that the given query Q will be answered affirmatively for the atoms given in our example.


8.4.2 Types of Targeted Queries

The types of queries that are targeted by our approach can be categorized as follows:

• equality of values

• structure equality on the values of a subset of fields

• implications of the form logical_property(a) ⇒ logical_property(b), where a and b are related by the facts inferred from the other atoms of the query

• conjunctions of such queries

In the general case, we need to reinterpret a dependency summary as a correlation summary. The query's goal is to deduce the equality between pairs of variables. When two such variables are of the same type, we can create a correlation map containing a single correlation. That correlation associates to the pair of paths (ε, ε) a partial equivalence relation which mirrors the dependency. The partial equivalence relation is created as follows:

• When the dependency is Everything, the equivalence relation becomes Equal.

• When the dependency is Nothing, the equivalence relation becomes Any.

• Structure, variant and array dependencies are transformed pointwise to structure, variant and array partial relations.

• When the dependency is Impossible, the equivalence relation becomes Any, in the absence of the possible-constructors extension.
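The translation rules above can be sketched as a recursive function. The encoding is an assumption made for illustration: atomic dependencies are strings, and only structure dependencies are modeled, as dictionaries from field names to dependencies; variants and arrays would be handled pointwise in the same way.

```python
def dependency_to_relation(dep):
    """Translate a dependency into a partial equivalence relation.

    Hypothetical encoding: atomic dependencies are the strings
    "Everything", "Nothing" and "Impossible"; a structure dependency is
    a dict mapping field names to dependencies.
    """
    if dep == "Everything":
        return "Equal"
    if dep in ("Nothing", "Impossible"):
        # Impossible maps to Any in the absence of the
        # possible-constructors extension.
        return "Any"
    if isinstance(dep, dict):
        # Structure dependencies are transformed pointwise.
        return {field: dependency_to_relation(d) for field, d in dep.items()}
    raise ValueError(f"unknown dependency: {dep!r}")


example = {"irq_handlers": "Everything", "procs": {"mem_auth": "Nothing"}}
print(dependency_to_relation(example))
```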

We illustrate here some example queries revolving around our do_auth predicate, discussed in Section 8.3.3.

A naive equality query on the entire input and output of do_auth would not be satisfiable, as do_auth does modify the memory authorizations of one process. This is the first sort of supported query.

    do_auth(now, i, arg3)[true: after | oob | false]
    Q after = now
    ⇒ no

The main argument of the do_auth predicate is the global state now, an instance of the global_state structure:⁵

    type global_state = {
      procs : array<option<process>>;
      memory_regions : array<mem_region>;
      irq_handlers : array<irq_handler>;
      current_process : int;
    }

⁵ For confidentiality reasons, the actual definition of the struct has been modified and edited for length.


Since the do_auth predicate only affects the mem_auth of one process in the procs array, we can successfully deduce, for the values of now and after, the equality on the fields memory_regions and current_process. This is the second sort of supported query.

    do_auth(now, arg2, arg3)[true: after | oob | false]
    Q after = <memory_regions, current_process> now
    ⇒ yes

Finally, we can directly deduce that the all_ids_in_handlers_ok_global(state) property is not threatened by the execution of the do_auth predicate.

    do_auth(now, arg2, arg3)[true: after | oob | false]
    Q congruent all_ids_in_handlers_ok_global(now)
                all_ids_in_handlers_ok_global(after)
    ⇒ yes

This property verifies that all the identifiers used by the registered interruption handlers, stored in the field irq_handlers, are valid. The property has the following dependency summary:

\[
\begin{array}{l}
\mathit{false} \to \mathit{state} \to \mathit{irq\_handlers} \to \mathit{Everything} \\
\mathit{true} \to \mathit{state} \to \mathit{irq\_handlers} \to \mathit{Everything}
\end{array}
\]

From the correlation of the do_auth predicate, we know that the irq_handlers field is preserved, and therefore it follows that the property, which only depends on that field, is preserved. Similar properties that do not depend on the procs array, but only on parts or on the entirety of one or more of the other 14 fields, will be preserved as well.
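The comparison between the relation required by a query and the relation computed for a predicate can be sketched as an ordering check. This is a simplified model under stated assumptions: relations are either the strings "Equal" and "Any" or dictionaries from field names to relations, Equal is taken to be below Any, and the field-level summary given for do_auth is a coarse stand-in for its real correlation summary.

```python
def leq(computed, required):
    """True when `computed` is at least as precise as `required`.

    Hypothetical encoding: "Equal" is below "Any"; structure relations
    are dicts from field names to relations, and missing fields count
    as "Any".
    """
    if required == "Any" or computed == "Equal":
        return True
    if isinstance(required, dict):
        if isinstance(computed, dict):
            return all(leq(computed.get(f, "Any"), r)
                       for f, r in required.items())
        # computed is "Any": acceptable only if nothing is required.
        return all(leq(computed, r) for r in required.values())
    # required is "Equal" but computed is weaker; this sketch is
    # conservative and does not collapse an all-Equal dict into "Equal".
    return False


# Coarse stand-in for the correlation of do_auth between now and after.
do_auth_relation = {
    "procs": "Any",
    "memory_regions": "Equal",
    "irq_handlers": "Equal",
    "current_process": "Equal",
}

print(leq(do_auth_relation, "Equal"))                    # after = now
print(leq(do_auth_relation, {"irq_handlers": "Equal"}))  # handlers property
```

With this encoding, the whole-state equality query is rejected, while the queries that only involve the fields left untouched by do_auth are accepted.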

The preservation of properties that have to hold for every process in the array procs will be inferred as well, as long as they do not depend on the mem_auth field of the processes. For instance, the property procs_proc_map_ok_global verifies that each process of the array procs has valid code, data and stack segments. This property has the following dependency summary:

\[
\begin{array}{l}
\mathit{true} \to \mathit{state} \to \mathit{procs} \to \left\langle \left[ \begin{array}{l} \mathit{None} \to \mathit{Everything} \\ \mathit{Some} \to v \to \mathit{proc\_map} \to \mathit{Everything} \end{array} \right] \right\rangle \\[2ex]
\mathit{false} \to \mathit{state} \to \mathit{procs} \to \left\langle \left[ \begin{array}{l} \mathit{None} \to \mathit{Everything} \\ \mathit{Some} \to v \to \mathit{proc\_map} \to \mathit{Everything} \end{array} \right] \right\rangle
\end{array}
\]

Since, for every active process of the array, the property depends only on the proc_map field, it is unaffected by the modification of the mem_auth field. Therefore, the property is preserved for the global state after, obtained after the execution of do_auth. Similar properties that do not depend on the mem_auth field, but only depend on other parts of the data structure, will be preserved as well.

An extension of the decision procedure sketched in Section 8.4.1 could take advantage of additional information regarding array indices. For example, the query could specify that two of the involved array indices are different.


    do_auth(now, i, arg3)[true: after | oob | false]
    Assert i ≠ j
    Q congruent mem_auth_ok_global(now, j)
                mem_auth_ok_global(after, j)
    ⇒ yes

The mem_auth_ok_global(state, j) property checks the well-formedness of the memory permission on the j-th process. The above query is satisfied if the property mem_auth_ok_global holds for all processes other than the i-th. The correlation summary for do_auth states that the elements of the procs array are unmodified by the operation, except for the i-th element. Combined with the dependency summary for mem_auth_ok_global given below, this allows the query to be satisfied.

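\[
\begin{array}{l}
\mathit{true} \to \mathit{state} \to \mathit{procs} \to \left\langle \mathit{Nothing} \;\; j \;\; \left[ \begin{array}{l} \mathit{None} \to \mathit{Everything} \\ \mathit{Some} \to v \to \mathit{ProcDep1} \end{array} \right] \right\rangle \\[2ex]
\mathit{false} \to \mathit{state} \to \mathit{procs} \to \left\langle \mathit{Nothing} \;\; j \;\; \left[ \begin{array}{l} \mathit{None} \to \mathit{Everything} \\ \mathit{Some} \to v \to \mathit{ProcDep2} \end{array} \right] \right\rangle
\end{array}
\]

where ProcDep1 is

\[
\begin{array}{lll}
\mathit{mem\_auth} \to & \mathit{READ} \to & \mathit{base} \to \mathit{Everything} \\
 & & \mathit{len} \to \mathit{Everything} \\
 & \mathit{WRITE} \to & \mathit{base} \to \mathit{Everything} \\
 & & \mathit{len} \to \mathit{Everything} \\
 & \mathit{NONE} \to & \mathit{Impossible} \\
\mathit{stackframe} \to & \mathit{ds} \to & \mathit{Everything}
\end{array}
\]

and ProcDep2 is

\[
\begin{array}{lll}
\mathit{mem\_auth} \to & \mathit{READ} \to & \mathit{base} \to \mathit{Everything} \\
 & & \mathit{len} \to \mathit{Everything} \\
 & \mathit{WRITE} \to & \mathit{base} \to \mathit{Everything} \\
 & & \mathit{len} \to \mathit{Everything} \\
 & \mathit{NONE} \to & \mathit{Nothing} \\
\mathit{stackframe} \to & \mathit{ds} \to & \mathit{Everything}
\end{array}
\]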

8.5 Decision Procedure Experiments

We have applied a basic prototype of the decision procedure using the dependency and correlation summaries computed for the RSMFSP layers of ProvenCore.

Our prototype considers pairs of one logical property and one predicate. The logical property and the predicate must both operate on values of the same type. More precisely, one of the predicate's inputs, as well as one of its outputs, and one of the logical property's inputs must all be of the same type. Our prototype attempts to detect whether the logical property is preserved after the execution of the predicate. If several inputs or outputs are of the same type, all combinations are considered. Most implicit types were not considered when searching for property/predicate pairs, as they are less likely to yield successful results. For example, arguments of a primitive type like int are unlikely to be unaffected by the execution of the predicate.

This prototype automatically inspected all such property/predicate pairs found in the RSMFSP layers. A property was considered to be preserved if its dependency summary for the argument involved, when translated to a set of equalities, formed a subset of the equalities implied by the predicate's correlation summary. Both the true and the false exit labels were considered independently, and the property is considered to be preserved (subject to some conditions) when it is preserved for either or both exit labels. More precisely, given a property $\pi(\bar{\imath})[\mathit{true} \mid \mathit{false}]$ and a predicate $p(\bar{\imath}')[\ell : \bar{o}']$, we report success when the following can be satisfied:

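\begin{align*}
& \exists\, i \in \bar{\imath},\ i' \in \bar{\imath}',\ o' \in \bar{o}' \text{ such that } \Gamma(i) = \Gamma(i') = \Gamma(o') \tag{8.1} \\
\land\ & \exists\, \ell \in \{\mathit{true}, \mathit{false}\} \tag{8.2} \\
\land\ & E(j) \neq E(k) \land E'(j) \neq E'(k) \quad \forall j, k \in \bar{\imath}, \bar{\imath}', \bar{o}' \tag{8.3} \\
& \text{when } j \text{ and } k \text{ are used as array indices} \tag{8.4} \\
\land\ & \left\langle E \left[ \mathit{Prop}(\bar{\imath}[i \to i'])[\mathit{true} \mid \mathit{false}] \right] \right\rangle \xrightarrow{\ \ell\ } E \tag{8.5} \\
\land\ & \left\langle E \left[ \mathit{Pred}(\bar{\imath}')[\ell' : \bar{o}' \mid \ldots] \right] \right\rangle \xrightarrow{\ \ell'\ } E' \tag{8.6} \\
\land\ & \left\langle E' \left[ \mathit{Prop}(\bar{\imath}[i \to o'])[\mathit{true} \mid \mathit{false}] \right] \right\rangle \xrightarrow{\ \ell\ } E' \tag{8.7}
\end{align*}

where $\bar{\imath}[i \to i']$ and $\bar{\imath}[i \to o']$ denote the sequence of variables $\bar{\imath}$ in which the variable $i$ is replaced by the variable $i'$ (respectively $o'$).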

This initial prototype was run on the 398 explicit predicates and 235 properties of the RSMFSP layer of ProvenCore. Out of these, we filtered predicate/property pairs for which the property has an input i of the same type as one of the predicate's inputs i′ and one of its outputs o′. These pairs involve 161 distinct predicates and 165 distinct properties. In total, there were 8250 tuples (i, i′, o′, ℓ) which satisfied the conditions 8.1 and 8.2.
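The filtering step corresponding to conditions 8.1 and 8.2 can be sketched as an enumeration over typed arguments. The dictionaries standing in for the typing environment Γ, the argument types chosen for the example and the function name are assumptions for illustration only.

```python
from itertools import product


def candidate_tuples(prop_inputs, pred_inputs, pred_outputs):
    """Enumerate the tuples (i, i', o', label) satisfying 8.1 and 8.2.

    Each argument maps variable names to type names, a hypothetical
    stand-in for the typing environment. A tuple is kept when the
    property input, the predicate input and the predicate output all
    have the same type; both exit labels of the property are listed.
    """
    return [
        (i, i2, o2, label)
        for (i, ti), (i2, ti2), (o2, to2) in product(
            prop_inputs.items(), pred_inputs.items(), pred_outputs.items())
        if ti == ti2 == to2
        for label in ("true", "false")
    ]


# Illustrative signatures; the types of i and arg3 are assumed.
tuples = candidate_tuples(
    {"state": "global_state"},
    {"now": "global_state", "i": "int", "arg3": "int"},
    {"after": "global_state"},
)
print(tuples)
```

The prototype described above additionally skips most implicit and primitive types; that refinement is left out of this sketch.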

This experiment allowed us, as a first result, to automatically identify 102 predicates for which at least one property is preserved under the conditions 8.1–8.7 stated above. For many predicates, it was possible to show that, after the execution of said predicate, several properties are preserved (up to 33). Figure 8.2 shows an overview of how many properties were inferred to be preserved for each predicate. The blue region at the bottom indicates how many properties are inferred to be preserved for a given predicate, while the red region above shows how many properties were compatible with the predicate but were not inferred to be preserved.

[Figure 8.2: horizontal axis: Predicates; vertical axis: Number of preserved properties inferred.]

Figure 8.2 – Distribution of the number of inferred preserved properties. Predicates are sorted along that criterion.

Figure 8.3 shows an overview of how many predicates were inferred to be preserving each property. The blue region at the bottom indicates how many predicates are inferred to be preserving a given property, while the red region above shows how many predicates were compatible with the property but were not inferred to be preserving it.

It is worth noting that, in both Figures 8.2 and 8.3, the red zone contains properties (respectively predicates) which could fall into these cases:

• The property is actually threatened by the predicate (respectively, the predicate threatens the property).

• The property is not threatened (respectively, the predicate is not threatening), but proving so requires more information than is obtained by our dependency and correlation analyses. For example, a more precise dependency or correlation analysis (e.g., tracking constructor evolution, as presented in Section 7.6) could be needed. A numerical or value analysis could also help determine that the parts of the input data structure which are modified by the predicate, and on which the logical property also depends, still satisfy the property after the execution of the predicate. Alternatively, the preservation of these properties can be demonstrated using an interactive prover.

• The property is not threatened (respectively, the predicate is not threatening) and the dependency and correlation summaries contain enough information to prove the non-interference of the predicate and property, but our decision procedure prototype failed to infer it. This can be due to a timeout (this initial prototype has not been optimized at all and can take a substantial time in some cases) or to precision losses in the decision procedure prototype itself.

[Figure 8.3: horizontal axis: Properties; vertical axis: Number of predicates preserving the property inferred.]

Figure 8.3 – Distribution of the number of inferred predicates for which a property is preserved. Properties are sorted along that criterion.


Chapter 9

Conclusion and Perspectives

There is no real ending. It's just the place where you stop the story.

Frank Herbert

Despite its intuitive simplicity, the frame problem has proved to be an enduring issue with notoriously tedious implications. Its different manifestations have been studied for several decades in various contexts, ranging from Artificial Intelligence, in the context of which it was originally identified, to the field of formal specification and verification. Recently, it has received extensive attention from the object-oriented verification community, where it has been identified as a subsisting problem (Leavens, Leino, and Müller 2007) and an ideal candidate for automation (Meyer 2015). Classical approaches to addressing the frame problem typically rely on separation logic (Reynolds 2005) or ownership types (Clarke, Potter, and Noble 1998). Though the merits of such approaches are indisputable, the manual specification effort that they require is non-negligible as well. Frame properties are an integral part of a complete specification and they are mandatory for proving correctness, but ideally they should impose little additional effort. Programmers should be able to focus on the truly interesting part, namely what code does, and rely on automatic tools for the repetitive and cumbersome task of specifying and verifying frame properties.

Interactive formal verification of complex transition systems is not exempt from the manifestations of the frame problem either. Considerable effort is spent on proving the preservation of the system's invariants, even though in practice the majority of operations have a localised effect on the system and impact only a limited number of invariants at the same time. Identifying those invariants that are unaffected by an operation and automatically proving their preservation can substantially ease the proof burden for the programmer. In this thesis, we have presented an approach towards automatically inferring the preservation of framing-related invariants. It is meant to be used in the context of an interactive theorem prover and employs two different static analyses, namely a dependency analysis and a correlation analysis, whose unified results are meant to establish the disjointness between the data dependencies of a logical property and the modifications performed by an operation. The decision procedure meant to combine the results of the two analyses is still in an incipient stage. However, our preliminary experiments related to automatically answering queries regarding the preservation of certain invariants for unmodified parts are encouraging. We believe that our envisioned approach can become applicable to complex transition systems on a routine basis. Reasoning about framing can come for free, without imposing the specification of additional clauses. We also believe that automatic reasoning about framing can be achieved through static analysis. Generally, static analysis has been considered prohibitive in terms of execution time. It has been predominantly used in an automatic context and avoided in interactive contexts, where queries have to be answered fast so as not to impede the natural flow of an interactive proof. Though currently applied only on medium-sized models, given the short execution times of our dedicated static analyses, we believe that reasonable execution times for larger models can be expected as well. Therefore, we surmise that static analysis is applicable in an interactive verification context.

9.1 Contributions

The main contributions of this thesis are the designed and implemented dependency and correlation analyses, which are meant to be used in the context of an interactive theorem prover. Both analyses handle associative arrays and algebraic data types and compute fine-grained results mirroring the layered structures of such types. They target complex transition systems in general and operating systems in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e., state changes that map an input state to an output state. Both of our static analyses are concerned with deep-state manipulations, i.e., accesses and modifications, respectively.

The dependency analysis, presented in Chapter 5, automatically detects the relevant input subset needed for producing certain outputs. It handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural data-flow analysis. Furthermore, for variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario. Together with the dependency information per se, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different albeit related point of view. The first version of the dependency analysis was fully context-insensitive. In order to introduce a relaxed form of context-sensitivity, we have devised an extension based on symbolic paths. This was presented in Chapter 6.

The extension for the dependency analysis is based on computing deferred dependencies, consisting of symbolic access maps in which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are still computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty exerted by the fully context-insensitive approach without sacrificing performance. As discussed in Chapter 8, the deferred dependencies extension led to an increase of 10–20% in execution time on the used benchmarks. In terms of precision, it led to more precise dependency summaries for 50% of the predicates of the same benchmarks.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, the analysis can have applications in the testing realm, for designing and generating test suites that avoid redundant testing of the same execution scenario. Classes of inputs that will test the same execution scenario can be automatically determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted, and their values should be varied for more comprehensive testing. Furthermore, our dependency analysis could also facilitate unit testing for exceptions, as it computes specific results for every execution scenario of a predicate. Indeed, it is useful to have dedicated test cases which trigger each exception that can be thrown by a function. The set of relevant parts of the input differs for each possible exception and for the regular execution behaviour.
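The testing application can be illustrated with a small generator that varies only the relevant input fields and pins the irrelevant ones to a fixed value. The function name, the dictionary encoding of inputs and the candidate values are assumptions made for this sketch; in practice the set of relevant fields would come from a dependency summary.

```python
from itertools import product


def generate_inputs(defaults, relevant, candidates):
    """Build test inputs that vary only the relevant fields.

    defaults:   dict giving every field a fixed testing value
    relevant:   set of fields on which the outcome depends
    candidates: dict mapping each relevant field to the values to try
    """
    names = sorted(relevant)
    cases = []
    for values in product(*(candidates[name] for name in names)):
        case = dict(defaults)             # irrelevant fields keep their default
        case.update(zip(names, values))   # relevant fields are varied
        cases.append(case)
    return cases


# For v1 = s.f, only the field f of s is relevant.
cases = generate_inputs({"f": 0, "g": 0, "h": 0}, {"f"}, {"f": [-1, 0, 1]})
print(cases)
```

Three test inputs are produced instead of the 27 that would result from varying all three fields over the same candidate values.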

Our second contribution is the correlation analysis, presented in Chapter 7, which detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. The correlation analysis is an interprocedural data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus summarising the behaviour of functions and detecting not only what is modified, but also how and to what extent. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations. These allow computing expressive information regarding equivalences between subparts of the inputs and subparts of the outputs in a flexible manner.

Prototypes for both of our analyses have been implemented in OCaml. These were discussed in Chapter 8. We have applied them to a functional specification of ProvenCore (Lescuyer 2015), a general-purpose microkernel that ensures isolation. Results for medium-sized models have been obtained on average in less than 1 second with the dependency analysis and in less than 0.5 seconds with the correlation analysis. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative, precise information without sacrificing scalability.

We remark that our experience with the design and implementation of the two analyses has been rather different. The dependency analysis is much more complex semantically. This is partly a consequence of the simultaneous possible-constructors analysis, which has an impact on the abstract dependency domain. Deferred dependencies add yet another layer of complexity. However, the implementation proved to be much simpler than the implementation of the correlation analysis. The latter posed challenges due to the intermediate layer of access paths and correlations that we had to add for obtaining expressive fine-grained information. However, the correlation analysis is simpler from a semantics point of view. It is also worth remarking that, for both analyses, an intermediate level below variables needed to be introduced as soon as fine-grained relations between pairs of variables were considered, directly or indirectly. In the case of deferred dependencies, this was not the main goal, but rather a mechanism for obtaining increased precision in specific cases for already pertinent dependency information. In contrast, for the correlation analysis, the inclusion of an intermediate level was imperative for obtaining useful expressive information in non-trivial cases.

As a first step towards a solution for automatically inferring the preservation of framing-related invariants, we have sketched a decision procedure meant to employ our two static analyses. By uncovering equivalences between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

9.2 Future Work

We conclude this thesis with some perspectives for practical future work, as well as some theoretical open issues that we wish to address in the future.

Practical Future Work. From a practical point of view, our future work goals revolve around the full implementation of the decision procedure, its integration in the interactive theorem prover developed at Prove & Run, as well as its comprehensive assessment in a real-world context.

Decision Procedure Implementation. Our first and main goal for the near future focuses on the full implementation of the decision procedure combining our dependency and correlation summaries and answering queries related to the preservation of logical properties. The performance of the algorithm sketched in Section 8.4 should be assessed on real-world examples. The complexity of this algorithm depends on the number of paths relating two endpoints in the graph of query atom variables. It also depends on the number of correlations relating pairs of variables along the chains connecting endpoints. This could lead to a combinatorial explosion of the number of compose operations for large query graphs. Further optimizations should be investigated and applied in the algorithm implementing the decision procedure.

Validation. After having implemented the decision procedure, the precision of our two static analyses employed by it should be comprehensively assessed on various benchmarks.

Some of the theoretical aspects related to our static analyses have been formalized in Coq by Stéphane Lescuyer. However, the actual implementation of the algorithms is not formally connected to the mechanized proofs. Therefore, it would be desirable to extensively test the implementation of the analysis algorithms. This could be done by translating the dependencies and correlations to types in a sufficiently expressive type system, or by inserting runtime guards. These guards would check equalities for correlations and would taint supposedly irrelevant values identified by the dependency analysis, verifying that the output is not tainted. For the correlation analysis, inputs which are correlated to some output values could be given a universally quantified type, the same type appearing in the parts of the output which are supposed to be equal. This is commonly used as a design pattern in functional programming languages to express data-flow constraints via the type system. For the dependency analysis, each part of the input which is supposed to be irrelevant for a predicate's output could be assigned a distinct polymorphic type variable which does not appear in the output. This allows the body of the predicate to take notice of a value's presence without being able to manipulate its contents.
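A lightweight dynamic counterpart of the guards described above is a differential check: rerun the function while changing only the supposedly irrelevant part of the input and verify that the result is unchanged. This is a simpler stand-in for taint tracking, not the tainting mechanism itself; the names and the dictionary encoding of states below are assumptions for illustration.

```python
def check_irrelevant(fn, base_input, field, alternatives):
    """Return True if replacing `field` by any alternative value never
    changes the result of fn on an otherwise identical input.

    A False result is a counterexample to the claimed irrelevance; a
    True result only supports it for the values that were tried.
    """
    expected = fn(base_input)
    return all(fn({**base_input, field: alt}) == expected
               for alt in alternatives)


def read_f_after_update(s):
    """Mirrors the example atoms: t = { s with g = 42 }; return t.f."""
    t = {**s, "g": 42}
    return t["f"]


state = {"f": 1, "g": 2, "h": 3}
print(check_irrelevant(read_f_after_update, state, "g", [0, 5]))  # g claimed irrelevant
print(check_irrelevant(read_f_after_update, state, "f", [7]))     # f is relevant
```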

Tool Integration and Support. Another important goal for the near future is the integration of our decision procedure in the ProvenTools interactive prover. A tactic automating the inference of framing-related invariant preservation should be supported. This goal entails a sequence of other considerations that have to be addressed. Currently, the dependency and correlation analyses handle whole programs and compute summaries for every predicate of the analysed program. Though the execution times of our analyses are low, even these can prove to be cumbersome in a real-world context. Therefore, the two analyses should be adapted so as to allow incrementally analysing only parts of a program. Caching the results of the analyses across invocations of the decision procedure could prove to be efficient as well. Additionally, the mechanism of answering queries regarding invariant preservation should be transparent, allowing users to see the reasoning steps behind the decision procedure. Transparency is necessary for the ProvenTools prover, which targets products that have to be certified. This possibly also requires a more concise output notation for the dependency and correlation summaries, in order to ease the interpretation of results. Currently, they tend to be rather verbose for predicates handling composite values with a large number of subelements.

For the dependency summaries, a parser was implemented, allowing users to annotate predicates with expected dependency information. A similar parser could be written for the correlation summaries. These annotations are a useful tool for testing the analyses on benchmarks for which the correlations and dependencies are known. In addition, they would allow users to annotate programs with constraints on the expected dependencies and correlations, similarly to type annotations in the presence of type inference, and check that these expectations hold.

Finally, the decision procedure and our dependency and correlation analyses could be offered as a software library. A public API should describe and prescribe the expected behavior of our two static analyses and the decision procedure relying on them.

Theoretical Perspective. From a theoretical perspective, several interesting aspects remain open. In a nutshell, these consist in developing support for more sophisticated queries that could be answered by our decision procedure. The precision of our dependency and correlation analyses can be further increased as well.

Decision Procedure. A first interesting theoretical effort revolves around the formalization of our envisioned decision procedure, used for inferring framing-related invariants. The types of queries it can answer should be further investigated and extended. For instance, it would be desirable to assert as a hypothesis that certain predicates are known to be valid on some nodes of the graph. We further identified two extensions for our correlation analysis that could increase the number of answered queries.

Constructor Evolution. To increase the number of queries that our decision procedure can answer, one direction to investigate is the extension of our correlation analysis in order to track and compute information regarding the evolution of variant constructors. This additional information should then be leveraged in the context of our decision procedure. The formalization and implementation of this extension constitute an interesting effort. Furthermore, other types of relations between variables could be considered as well.

Correlations between Inputs. Another extension of our correlation analysis that would enrich the types of queries that can be answered by our decision procedure consists in tracking correlations between pairs of inputs, in addition to the ones computed between pairs of inputs and outputs. Besides the unified treatment of both actual code and logical properties on the correlation analysis side, this would allow answering queries that consist in a single logical property on multiple input values that are additionally related by other facts. It would also allow detecting aliasing between variables used as array indices.

Numerical Analysis for Arrays. Arrays are a source of precision loss in both of our static analyses. Hence, it would be interesting to investigate the impact of using simple numerical abstractions (congruence modulo and linear abstract domains). The numerical analysis could otherwise be offloaded to an external SMT solver, such as Z3 or Alt-Ergo for instance. Symbolic evaluation of the arithmetic computations should also be possible. This would avoid precision losses when joining two dependencies or correlations with exceptional information on distinct index variables which prove to have the same integer value in practice. Eliminating this source of imprecision would likely benefit the analysis of loops over arrays.
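To make the congruence abstraction concrete, here is a minimal sketch of the standard congruence domain, in which an abstract value (m, r) denotes the set of integers r + m·k and m = 0 encodes the constant r. The encoding and the function names are assumptions chosen for this example and are not taken from the analyses described here; the bottom element is omitted for brevity.

```python
from math import gcd
from typing import Tuple

# (m, r) denotes { r + m*k | k in Z }; m == 0 encodes the constant r.
Congruence = Tuple[int, int]


def normalise(c: Congruence) -> Congruence:
    m, r = c
    m = abs(m)
    return (m, r % m) if m else (0, r)


def join(a: Congruence, b: Congruence) -> Congruence:
    """Least upper bound: the smallest congruence class containing both."""
    (m1, r1), (m2, r2) = normalise(a), normalise(b)
    m = gcd(gcd(m1, m2), abs(r1 - r2))
    return normalise((m, r1))


def may_be_equal(a: Congruence, b: Congruence) -> bool:
    """True unless the two abstract indices can never hold the same integer."""
    (m1, r1), (m2, r2) = normalise(a), normalise(b)
    g = gcd(m1, m2)
    return r1 == r2 if g == 0 else (r1 - r2) % g == 0
```

With such a domain, two index variables abstracted as (0, 3) and (0, 3) are recognised as denoting the same cell, while an even index (2, 0) and an odd index (2, 1) are recognised as never aliasing. This is the kind of information that could help when joining exceptional information attached to distinct index variables.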

In conclusion, we have devised and implemented two static analyses detecting the data dependencies of a logical property, as well as correlations between the inputs and the outputs of operations. Our first results on a functional model of a microkernel are encouraging, both in terms of precision and speed, making these analyses suitable to use in the context of interactive provers. Aside from incremental improvements on the precision of our analyses, the next steps are to combine them in order to detect invariants which are not affected by the execution of a predicate and to integrate this as a tactic in the ProvenTools theorem prover. We believe that reasoning about framing can come for free, without imposing additional annotations. Inferring the preservation of framing-related invariants through static analysis can become applicable on a routine basis for complex transition systems.


Leino K Rustan M and Peter Muumlller (2008a) ldquoUsing the Spec Language Method-ology and Tools to Write Bug-Free Programsrdquo In Advanced Lectures on SoftwareEngineering LASER Summer School 20072008 pp 91ndash139 doi 101007978-3-642-13010-6_4 url httpdxdoiorg101007978-3-642-13010-6_4

mdash (2008b) ldquoVerification of Equivalent-Results Methodsrdquo In Programming Languagesand Systems 17th European Symposium on Programming ESOP 2008 Held as Partof the Joint European Conferences on Theory and Practice of Software ETAPS2008 Budapest Hungary March 29-April 6 2008 Proceedings pp 307ndash321 doi101007978-3-540-78739-6_24 url httpdxdoiorg101007978-3-540-78739-6_24

Leino K Rustan M Peter Muumlller and Jan Smans (2009) ldquoVerification of Concur-rent Programs with Chalicerdquo In Foundations of Security Analysis and Design VFOSAD 200720082009 Tutorial Lectures pp 195ndash222 doi 101007978- 3-642-03829-7_7 url httpdxdoiorg101007978-3-642-03829-7_7

Leino K Rustan M Peter Muumlller and Angela Wallenburg (2008) ldquoFlexible Im-mutability with Frozen Objectsrdquo In Verified Software Theories Tools Experi-ments Second International Conference VSTTE 2008 Toronto Canada October6-9 2008 Proceedings pp 192ndash208 doi 101007978-3-540-87873-5_17 urlhttpdxdoiorg101007978-3-540-87873-5_17

Leino K Rustan M and Greg Nelson (1998) ldquoAn Extended Static Checker for Modular-3rdquo In Compiler Construction 7th International Conference CCrsquo98 Held as Part ofthe European Joint Conferences on the Theory and Practice of Software ETAPSrsquo98Lisbon Portugal March 28 - April 4 1998 Proceedings pp 302ndash305 doi 101007BFb0026441 url httpdxdoiorg101007BFb0026441

mdash (2002) ldquoData Abstraction and Information Hidingrdquo In ACM Trans ProgramLang Syst 245 pp 491ndash553 doi 101145570886570888 url httpdoiacmorg101145570886570888

Leino K Rustan M Arnd Poetzsch-Heffter and Yunhong Zhou (2002) ldquoUsing DataGroups to Specify and Check Side Effectsrdquo In Proceedings of the 2002 ACM SIG-PLAN Conference on Programming Language Design and Implementation (PLDI)Berlin Germany June 17-19 2002 pp 246ndash257 doi 101145512529512559url httpdoiacmorg101145512529512559

Leino K Rustan M and Philipp Ruumlmmer (2010) ldquoA Polymorphic Intermediate Ver-ification Language Design and Logical Encodingrdquo In Tools and Algorithms forthe Construction and Analysis of Systems 16th International Conference TACAS2010 Held as Part of the Joint European Conferences on Theory and Practice ofSoftware ETAPS 2010 Paphos Cyprus March 20-28 2010 Proceedings pp 312ndash327 doi 101007978-3-642-12002-2_26 url httpdxdoiorg101007978-3-642-12002-2_26

Leroy Xavier (2009) ldquoA Formally Verified Compiler Back-endrdquo In J Autom Reason-ing 434 pp 363ndash446 doi 101007s10817-009-9155-4 url httpdxdoiorg101007s10817-009-9155-4

BIBLIOGRAPHY 223

Leroy Xavier and Franccedilois Pessaux (2000) ldquoType-Based Analysis of Uncaught Excep-tionsrdquo In ACM Trans Program Lang Syst 222 pp 340ndash377 doi 101145349214349230 url httpdoiacmorg101145349214349230

Lescuyer Steacutephane (2015) ldquoProvenCore Towards a Verified Isolation Micro-KernelrdquoIn International Workshop on MILS Architecture and Assurance for Secure Sys-tems url httpmils-workshop-2015euromilseu

Leuschel Michael and Morten Heine Soslashrensen (1996) ldquoRedundant Argument Filteringof Logic Programsrdquo In Logic Programming Synthesis and Transformation 6th In-ternational Workshop LOPSTRrsquo96 Stockholm Sweden August 28-30 1996 Pro-ceedings pp 83ndash103 doi 1010073-540-62718-9_6 url httpdxdoiorg1010073-540-62718-9_6

Lhotaacutek Ondrej and Laurie J Hendren (2006) ldquoContext-Sensitive Points-to AnalysisIs It Worth Itrdquo In Compiler Construction 15th International Conference CC2006 Held as Part of the Joint European Conferences on Theory and Practice ofSoftware ETAPS 2006 Vienna Austria March 30-31 2006 Proceedings pp 47ndash64 doi 10100711688839_5 url httpdxdoiorg10100711688839_5

Liang Sheng (1999) Java Native Interface Programmerrsquos Guide and Reference 1stBoston MA USA Addison-Wesley Longman Publishing Co Inc isbn 0201325772

Liskov Barbara and John Guttag (1986) Abstraction and Specification in ProgramDevelopment Cambridge MA USA MIT Press isbn 0-262-12112-3

Liu Yanhong A (1998) ldquoDependence Analysis for Recursive Datardquo In Proceedings ofthe 1998 International Conference on Computer Languages ICCL 1998 ChicagoIL USA May 14-16 1998 pp 206ndash215 doi 101109ICCL1998674171 urlhttpdxdoiorg101109ICCL1998674171

Liu Yanhong A and Scott D Stoller (2003) ldquoEliminating Dead Code on RecursiveDatardquo In Sci Comput Program 472-3 pp 221ndash242 doi 10 1016 S0167 -6423(02)00134-X url httpdxdoiorg101016S0167-6423(02)00134-X

Lu Yi John Potter and Jingling Xue (2007) ldquoValidity Invariants and Effectsrdquo InECOOP 2007 - Object-Oriented Programming 21st European Conference BerlinGermany July 30 - August 3 2007 Proceedings pp 202ndash226 doi 101007978-3-540-73589-2_11 url httpdxdoiorg101007978-3-540-73589-2_11

Marcheacute Claude Christine Paulin-Mohring and Xavier Urbain (2004) ldquoThe KRAKA-TOA Tool for Certification of JAVAJAVACARD Programs Annotated in JMLrdquo InJ Log Algebr Program 581-2 pp 89ndash106 doi 101016jjlap200307006url httpdxdoiorg101016jjlap200307006

Marcheacute Claude (2016) The Krakatoa Verification Tool for Java Programs KrakatoaTutorial and Reference Manual url httpkrakatoalrifrkrakatoapdf

Martin-Loumlf Per (1984) Intuitionistic Type Theory Naples BibliopolisMcCarthy John and Patrick J Hayes (1969) ldquoSome Philosophical Problems from the

Standpoint of Artificial Intelligencerdquo In Machine Intelligence Edinburgh Univer-sity Press

Meyer Bertrand (1991) Eiffel The Language Prentice-Hall isbn 0-13-247925-7mdash (1992) ldquoApplying Design by Contractrdquo In IEEE Computer 2510 pp 40ndash51

doi 1011092161279 url httpdxdoiorg1011092161279

224 BIBLIOGRAPHY

Meyer Bertrand (1997) Object-Oriented Software Construction 2nd Edition Prentice-Hall isbn 0-13-629155-4

mdash (2010) ldquoTowards a Theory and Calculus of Aliasingrdquo In Journal of Object Tech-nology 92 pp 37ndash74 doi 105381jot201092c5 url httpdxdoiorg105381jot201092c5

mdash (2011) ldquoSteps Towards a Theory and Calculus of Aliasingrdquo In Int J Softwareand Informatics 51-2 pp 77ndash115 url httpwwwijsiorgchreaderview_abstractaspxfile_no=i77

mdash (2015) ldquoFraming the Frame Problemrdquo In Dependable Software Systems Engineer-ing pp 193ndash203 doi 103233978-1-61499-495-4-193 url httpdxdoiorg103233978-1-61499-495-4-193

Midtgaard Jan (2012) ldquoControl-Flow Analysis of Functional Programsrdquo In ACMComput Surv 443 p 10 doi 10114521876712187672 url httpdoiacmorg10114521876712187672

Mike Barnett Rustan Leino Wolfram Schulte (2005) ldquoThe Spec Programming Sys-tem An Overviewrdquo In CASSIS 2004 Construction and Analysis of Safe Secureand Interoperable Smart devices Vol 3362 Springer pp 49ndash69 url httpswwwmicrosoftcomen-usresearchpublicationthe-spec-programming-system-an-overview

Milanova Ana Atanas Rountev and Barbara G Ryder (2005) ldquoParameterized ObjectSensitivity for Points-To Analysis for Javardquo In ACM Trans Softw Eng Methodol141 pp 1ndash41 doi 10114510448341044835 url httpdoiacmorg10114510448341044835

Montenegro Manuel Ricardo Pentildea and Clara Segura (2015) ldquoShape Analysis in aFunctional Language by Using Regular Languagesrdquo In Sci Comput Program 111pp 51ndash78 doi 101016jscico201412006 url httpdxdoiorg101016jscico201412006

Morgenstern Leora (1995) ldquoThe Problem with Solutions to the Frame Problemrdquo InThe Robotrsquos Dilemma Revisited The Frame Problem in Artificial Intelligence AblexAblex Publishing Co pp 99ndash133

Moura Leonardo Mendonccedila de and Nikolaj Bjoslashrner (2008) ldquoZ3 An Efficient SMTSolverrdquo In Tools and Algorithms for the Construction and Analysis of Systems14th International Conference TACAS 2008 Held as Part of the Joint EuropeanConferences on Theory and Practice of Software ETAPS 2008 Budapest HungaryMarch 29-April 6 2008 Proceedings pp 337ndash340 doi 101007978- 3- 540-78800-3_24 url httpdxdoiorg101007978-3-540-78800-3_24

Muumlller Peter (2002) Modular Specification and Verification of Object-Oriented Pro-grams Vol 2262 Lecture Notes in Computer Science Springer isbn 3-540-43167-5 doi 1010073-540-45651-1 url httpdxdoiorg1010073-540-45651-1

Muumlller Peter Arnd Poetzsch-Heffter and Gary T Leavens (2003) ldquoModular Specifi-cation of Frame Properties in JMLrdquo In Concurrency and Computation Practiceand Experience 152 pp 117ndash154 doi 101002cpe713 url httpdxdoiorg101002cpe713

BIBLIOGRAPHY 225

mdash (2006) ldquoModular Invariants for Layered Object Structuresrdquo In Sci Comput Pro-gram 623 pp 253ndash286 doi 10 1016 j scico 2006 03 001 url http dxdoiorg101016jscico200603001

Naudziuniene Daiva Matko Botincan Dino Distefano Mike Dodds Radu Grigore andMatthew J Parkinson (2011) ldquojStar-Eclipse An IDE for Automated Verificationof Java Programsrdquo In SIGSOFTFSErsquo11 19th ACM SIGSOFT Symposium on theFoundations of Software Engineering (FSE-19) and ESECrsquo11 13th European Soft-ware Engineering Conference (ESEC-13) Szeged Hungary September 5-9 2011pp 428ndash431 doi 10114520251132025182 url httpdoiacmorg10114520251132025182

Naur Peter (1966) ldquoProof of Algorithms by General Snapshotsrdquo In BIT NumericalMathematics 64 pp 310ndash316 issn 1572-9125 doi 101007BF01966091 urlhttpdxdoiorg101007BF01966091

Nelson Greg and Derek C Oppen (1980) ldquoFast Decision Procedures Based on Con-gruence Closurerdquo In J ACM 272 pp 356ndash364 doi 101145322186322198url httpdoiacmorg101145322186322198

Nielson Flemming and Hanne Riis Nielson (1999) ldquoInterprocedural Control Flow Anal-ysisrdquo In Programming Languages and Systems 8th European Symposium on Pro-gramming ESOPrsquo99 Held as Part of the European Joint Conferences on the Theoryand Practice of Software ETAPSrsquo99 Amsterdam The Netherlands 22-28 March1999 Proceedings pp 20ndash39 doi 10 1007 3 - 540 - 49099 - X _ 3 url http dxdoiorg1010073-540-49099-X_3

Nielson Flemming Hanne Riis Nielson and Chris Hankin (1999) Principles of ProgramAnalysis Springer isbn 978-3-540-65410-0

Nordio Martin Cristiano Calcagno Bertrand Meyer Peter Muumlller and Julian Tschan-nen (2010) ldquoReasoning about Function Objectsrdquo In Objects Models ComponentsPatterns 48th International Conference TOOLS 2010 Maacutelaga Spain June 28 -July 2 2010 Proceedings pp 79ndash96 doi 101007978-3-642-13953-6_5 urlhttpdxdoiorg101007978-3-642-13953-6_5

Nordstroumlm Bengt Kent Petersson and Jan M Smith (1990) Programming in Martin-Loumlfrsquos Type Theory Vol 200 Oxford University Press Oxford

OrsquoCallahan Robert and Daniel Jackson (1997) ldquoLackwit A Program UnderstandingTool Based on Type Inferencerdquo In Pulling Together Proceedings of the 19th Inter-national Conference on Software Engineering Boston Massachusetts USA May17-23 1997 Pp 338ndash348 doi 101145253228253351 url httpdoiacmorg101145253228253351

OrsquoHearn Peter W (2005) ldquoScalable Specification and Reasoning Challenges for Pro-gram Logicrdquo In Verified Software Theories Tools Experiments First IFIP TC2WG 23 Conference VSTTE 2005 Zurich Switzerland October 10-13 2005Revised Selected Papers and Discussions pp 116ndash133 doi 101007978-3-540-69149-5_14 url httpdxdoiorg101007978-3-540-69149-5_14

mdash (2012) ldquoA Primer on Separation Logic (and Automatic Program Verification andAnalysis)rdquo In Software Safety and Security - Tools for Analysis and Verification

226 BIBLIOGRAPHY

pp 286ndash318 doi 103233978-1-61499-028-4-286 url httpdxdoiorg103233978-1-61499-028-4-286

OrsquoHearn Peter W John C Reynolds and Hongseok Yang (2001) ldquoLocal Reasoningabout Programs that Alter Data Structuresrdquo In Computer Science Logic 15thInternational Workshop CSL 2001 10th Annual Conference of the EACSL ParisFrance September 10-13 2001 Proceedings pp 1ndash19 doi 1010073-540-44802-0_1 url httpdxdoiorg1010073-540-44802-0_1

OrsquoHearn Peter W Hongseok Yang and John C Reynolds (2004) ldquoSeparation andInformation Hidingrdquo In Proceedings of the 31st ACM SIGPLAN-SIGACT Sympo-sium on Principles of Programming Languages POPL 2004 Venice Italy January14-16 2004 pp 268ndash280 doi 101145964001964024 url httpdoiacmorg101145964001964024

Padhye Rohan and Uday P Khedker (2013) ldquoInterprocedural Data Flow Analysisin Soot Using Value Contextsrdquo In Proceedings of the 2nd ACM SIGPLAN In-ternational Workshop on State Of the Art in Java Program analysis SOAP 2013Seattle WA USA June 20 2013 pp 31ndash36 doi 10114524875682487569url httpdoiacmorg10114524875682487569

Park Young Gil and Benjamin Goldberg (1992) ldquoEscape Analysis on Listsrdquo In Pro-ceedings of the ACM SIGPLANrsquo92 Conference on Programming Language Designand Implementation (PLDI) San Francisco California USA June 17-19 1992pp 116ndash127 doi 101145143095143125 url httpdoiacmorg101145143095143125

Parkinson Matthew J and Gavin M Bierman (2005) ldquoSeparation Logic and Ab-stractionrdquo In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium onPrinciples of Programming Languages POPL 2005 Long Beach California USAJanuary 12-14 2005 pp 247ndash258 doi 10114510403051040326 url httpdoiacmorg10114510403051040326

Parkinson Matthew J Richard Bornat and Cristiano Calcagno (2006) ldquoVariables asResource in Hoare Logicsrdquo In 21th IEEE Symposium on Logic in Computer Science(LICS 2006) 12-15 August 2006 Seattle WA USA Proceedings pp 137ndash146 doi101109LICS200652 url httpdxdoiorg101109LICS200652

Pierce Benjamin C (2002) Types and Programming Languages MIT Press isbn 978-0-262-16209-8

Plotkin Gordon D (2004) ldquoA Structural Approach to Operational Semanticsrdquo In JLog Algebr Program 60-61 pp 17ndash139

Polikarpova Nadia Carlo A Furia Yu Pei Yi Wei and Bertrand Meyer (2013) ldquoWhatGood are Strong Specificationsrdquo In 35th International Conference on SoftwareEngineering ICSE rsquo13 San Francisco CA USA May 18-26 2013 pp 262ndash271doi 101109ICSE20136606572 url httpdxdoiorg101109ICSE20136606572

Praun Christoph von and Thomas R Gross (2003) ldquoStatic Conflict Analysis forMulti-Threaded Object-Oriented Programsrdquo In Proceedings of the ACM SIGPLAN2003 Conference on Programming Language Design and Implementation 2003 San

BIBLIOGRAPHY 227

Diego California USA June 9-11 2003 pp 115ndash128 doi 101145781131781145 url httpdoiacmorg101145781131781145

Rakamaric Zvonimir and Alan J Hu (2008) ldquoAutomatic Inference of Frame AxiomsUsing Static Analysisrdquo In 23rd IEEEACM International Conference on Auto-mated Software Engineering (ASE 2008) pp 89ndash98 doi 101109ASE200819url httpdxdoiorg101109ASE200819

Reacutemy Didier and Jerome Vouillon (1997) ldquoObjective ML A Simple Object-OrientedExtension of MLrdquo In Conference Record of POPLrsquo97 The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages Papers Presentedat the Symposium Paris France 15-17 January 1997 pp 40ndash53 doi 101145263699263707 url httpdoiacmorg101145263699263707

Reps Thomas W Susan Horwitz and Shmuel Sagiv (1995) ldquoPrecise InterproceduralDataflow Analysis via Graph Reachabilityrdquo In Conference Record of POPLrsquo9522nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-guages San Francisco California USA January 23-25 1995 pp 49ndash61 doi101145199448199462 url httpdoiacmorg101145199448199462

Reps Thomas W and Todd Turnidge (1996) ldquoProgram Specialization via ProgramSlicingrdquo In Partial Evaluation International Seminar Dagstuhl Castle GermanyFebruary 12-16 1996 Selected Papers pp 409ndash429 doi 1010073-540-61580-6_20 url httpdxdoiorg1010073-540-61580-6_20

Reynolds John C (1981) The Craft of Programming Prentice Hall International seriesin computer science Prentice Hall isbn 978-0-13-188862-3

mdash (2000) ldquoIntuitionistic Reasoning about Shared Mutable Data Structurerdquo In Mil-lennial Perspectives in Computer Science Palgrave pp 303ndash321

mdash (2002) ldquoSeparation Logic A Logic for Shared Mutable Data Structuresrdquo In 17thIEEE Symposium on Logic in Computer Science (LICS 2002) 22-25 July 2002Copenhagen Denmark Proceedings pp 55ndash74 doi 101109LICS20021029817url httpdxdoiorg101109LICS20021029817

mdash (2005) ldquoAn Overview of Separation Logicrdquo In Verified Software Theories ToolsExperiments First IFIP TC 2WG 23 Conference VSTTE 2005 Zurich Switzer-land October 10-13 2005 Revised Selected Papers and Discussions pp 460ndash469doi 101007978-3-540-69149-5_49 url httpdxdoiorg101007978-3-540-69149-5_49

Robert Valentin and Xavier Leroy (2012) ldquoA Formally-Verified Alias Analysisrdquo InCertified Programs and Proofs - Second International Conference CPP 2012 KyotoJapan December 13-15 2012 Proceedings pp 11ndash26 doi 101007978-3-642-35308-6_5 url httpdxdoiorg101007978-3-642-35308-6_5

Ruf Erik (1995) ldquoContext-Insensitive Alias Analysis Reconsideredrdquo In Proceedingsof the ACM SIGPLAN 1995 Conference on Programming Language Design andImplementation PLDI rsquo95 La Jolla California USA ACM pp 13ndash22 isbn 0-89791-697-2 doi 101145207110207112 url httpdoiacmorg101145207110207112

Sabelfeld Andrei and Andrew C Myers (2003) ldquoLanguage-Based Information-FlowSecurityrdquo In IEEE Journal on Selected Areas in Communications 211 pp 5ndash19

228 BIBLIOGRAPHY

doi 101109JSAC2002806121 url httpdxdoiorg101109JSAC2002806121

Sagiv Shmuel Thomas W Reps and Reinhard Wilhelm (1999) ldquoParametric ShapeAnalysis via 3-Valued Logicrdquo In POPL rsquo99 Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 1999 pp 105ndash118doi 101145292540292552 url httpdoiacmorg101145292540292552

Salcianu Alexandru and Martin C Rinard (2005) ldquoPurity and Side Effect Analysis forJava Programsrdquo In Verification Model Checking and Abstract Interpretation 6thInternational Conference VMCAI 2005 Proceedings pp 199ndash215 doi 101007978-3-540-30579-8_14 url httpdxdoiorg101007978-3-540-30579-8_14

Shapiro Marc and Susan Horwitz (1997) ldquoThe Effects of the Precision of Pointer Anal-ysisrdquo In Static Analysis 4th International Symposium SAS rsquo97 Paris FranceSeptember 8-10 1997 Proceedings pp 16ndash34 doi 101007BFb0032731 urlhttpdxdoiorg101007BFb0032731

Sharir M and A Pnueli (1978) Two Approaches to Interprocedural Data Flow AnalysisNew York NY New York Univ Comput Sci Dept url httpscdscernchrecord120118

Shostak Robert E (1984) ldquoDeciding Combinations of Theoriesrdquo In J ACM 311pp 1ndash12 doi 1011452422322411 url httpdoiacmorg1011452422322411

Smans Jan Bart Jacobs and Frank Piessens (2008) ldquoVeriCool An Automatic Verifierfor a Concurrent Object-Oriented Languagerdquo In Formal Methods for Open Object-Based Distributed Systems 10th IFIP WG 61 International Conference FMOODS2008 Oslo Norway June 4-6 2008 Proceedings pp 220ndash239 doi 101007978-3-540-68863-1_14 url httpdxdoiorg101007978-3-540-68863-1_14

mdash (2012) ldquoImplicit Dynamic Framesrdquo In ACM Trans Program Lang Syst 34121ndash258 doi 10114521609102160911 url httpdoiacmorg10114521609102160911

Sozeau Matthieu (2009) ldquoA New Look at Generalized Rewriting in Type TheoryrdquoIn J Formalized Reasoning 21 pp 41ndash62 doi 106092issn1972-57871574url httpdxdoiorg106092issn1972-57871574

Sozeau Matthieu and the COQ development team (1997) The Coq Proof AssistantReference Manual Version 86 Inria

Sridharan Manu Denis Gopan Lexin Shan and Rastislav Bodiacutek (2005) ldquoDemand-Driven Points-to Analysis for Javardquo In Proceedings of the 20th Annual ACM SIG-PLAN Conference on Object-oriented Programming Systems Languages and Ap-plications OOPSLA rsquo05 San Diego CA USA ACM pp 59ndash76 isbn 1-59593-031-0 doi 10114510948111094817 url httpdoiacmorg10114510948111094817

Strachey Christopher (1967) Fundamental Concepts in Programming Languages Lec-ture Notes International Summer School in Computer Programming CopenhagenReprinted in Higher-Order and Symbolic Computation 13(12) pp 1ndash49 2000

BIBLIOGRAPHY 229

Taghdiri Mana Robert Seater and Daniel Jackson (2006) ldquoLightweight Extraction ofSyntactic Specificationsrdquo In Proceedings of the 14th ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering FSE 2006 pp 276ndash286 doi10114511817751181809 url httpdoiacmorg10114511817751181809

Tip Frank (1995) ldquoA Survey of Program Slicing Techniquesrdquo In J Prog Lang 33url httpcompscinetdcskclacukJPjp030301abshtml

Vardi Moshe Y and Pierre Wolper (1994) ldquoReasoning about Infinite ComputationsrdquoIn Information and Computation 115 pp 1ndash37

Volpano Dennis M Cynthia E Irvine and Geoffrey Smith (1996) ldquoA Sound TypeSystem for Secure Flow Analysisrdquo In Journal of Computer Security 423 pp 167ndash188 doi 103233JCS-1996-42-304 url httpdxdoiorg103233JCS-1996-42-304

Wadler Philip and R J M Hughes (1987) ldquoProjections for Strictness Analysisrdquo InFunctional Programming Languages and Computer Architecture Portland OregonUSA September 14-16 1987 Proceedings pp 385ndash407 doi 1010073- 540-18317-5_21 url httpdxdoiorg1010073-540-18317-5_21

Wand Mitchell and William D Clinger (1998) ldquoSet Constraints for Destructive ArrayUpdate Optimizationrdquo In Proceedings of the 1998 International Conference onComputer Languages ICCL 1998 Chicago IL USA May 14-16 1998 pp 184ndash195 doi 101109ICCL1998674169 url httpdxdoiorg101109ICCL1998674169

Wand Mitchell and Igor Siveroni (1999) ldquoConstraint Systems for Useless VariableEliminationrdquo In POPL rsquo99 Proceedings of the 26th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages San Antonio TX USAJanuary 20-22 1999 pp 291ndash302 doi 101145292540292567 url httpdoiacmorg101145292540292567

Weiser Mark (1984) ldquoProgram Slicingrdquo In IEEE Trans Software Eng 104 pp 352ndash357 doi 101109TSE19845010248 url httpdxdoiorg101109TSE19845010248

Wing Jeannette M (1987) ldquoWriting Larch Interface Language Specificationsrdquo InACM Trans Program Lang Syst 91 pp 1ndash24 doi 101145975810500 urlhttpdoiacmorg101145975810500

Xtext Documentation httpseclipseorgXtext Accessed 2016-09-11Zee Karen Viktor Kuncak and Martin C Rinard (2008) ldquoFull Functional Verification

of Linked Data Structuresrdquo In Proceedings of the ACM SIGPLAN 2008 Conferenceon Programming Language Design and Implementation Tucson AZ USA June 7-13 2008 pp 349ndash361 doi 10114513755811375624 url httpdoiacmorg10114513755811375624

Zhao Yang and John Boyland (2008) ldquoA Fundamental Permission Interpretation forOwnership Typesrdquo In Second IEEEIFIP International Symposium on TheoreticalAspects of Software Engineering TASE 2008 June 17-19 2008 Nanjing Chinapp 65ndash72 doi 101109TASE200845 url httpdxdoiorg101109TASE200845

230 BIBLIOGRAPHY

Zheng Xin and Radu Rugina (2008) ldquoDemand-Driven Alias Analysis for Crdquo In Pro-ceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages POPL rsquo08 San Francisco California USA ACM pp 197ndash208 isbn 978-1-59593-689-9 doi 10114513284381328464 url httpdoiacmorg10114513284381328464

  • Extended Summary in French
    • The Frame Problem
    • Objectives
    • Dependency Analysis
    • Correlation Analysis
    • Decision Procedure
    • Conclusion
      • Introduction
        • Formal Verification of Software
        • The Frame Problem in a Nutshell
        • Prove & Run Objectives and Products
        • Context and Problem Statement
        • Contributions and Structure of the Document
          • The Frame Problem in Software Verification
            • Specification Languages and Verification Tools
            • Manifestations of the Frame Problem
            • Approaches to Specifying Frame Properties
              • The Manual Approach
              • The Exclusive Approach
              • The Implicit Approach
                • Topologies and Effects
                  • Explicit Footprints
                  • Implicit Footprints
                  • Predefined Footprints
                    • Other Approaches to Reason about Frames
                    • Other Relevant Work
                      • The Smart Language and ProvenTools
                        • The Smart Modeling Language
                          • Smart Predicates and Types
                          • Exit Labels and Control Flow
                          • Polymorphism & Algebraic Data Types
                          • Specifications
                          • Illustrating Smart – An Abstract Process Manager
                            • ProvenTools
                            • Smil
                              • The alpha-Smil Language
                                • alpha-Smil Syntax
                                • Control Flow Graph
                                • Well-Typed Smil Statements
                                • Operational Semantics of Smil Statements
                                  • Dependency Analysis for Functional Specifications
                                    • Dependency Analysis in a Nutshell
                                      • Targeted Dependency Information
                                      • Outline
                                        • Abstract Dependency Domain
                                          • Join and Reduction Operator
                                          • Well-Typed Dependencies
                                            • Intraprocedural Analysis and Data-Flow Equations
                                              • Intraprocedural Dependency Domains
                                              • Intraprocedural Data-Flow Equations
                                              • Intraprocedural Dependency Analysis Illustrated
                                                • Interprocedural Dependencies
                                                  • Interprocedural Dependency Analysis Illustrated
                                                  • Context-Insensitivity and its Consequences
                                                    • Semantics of Dependency Values
                                                    • Related Work
                                                    • Conclusion
                                                      • Deferred Dependencies: Injecting Context in Dependency Summaries
                                                        • Dealing with Context-Insensitivity
                                                        • Symbolic Dependency Components in a Nutshell
                                                        • Symbolic Paths
                                                          • Symbolic Path Type
                                                          • Semantics of Symbolic Paths
                                                          • Well-Typed Paths and Path Sets
                                                            • Abstract Dependency Domain with Deferred Accesses
                                                            • Deferred Dependencies at the Intraprocedural Level
                                                              • Extended Intraprocedural Dependency Analysis
                                                              • Intraprocedural Dependency Analysis Illustrated
                                                                • Deferred Dependencies at the Interprocedural Level
                                                                  • Applying Context-Sensitive Information by Substitution
                                                                  • Wrapped Calls and Results
                                                                    • Related Work
                                                                    • Conclusion
                                                                      • Correlation Analysis
                                                                        • Introduction
                                                                          • Targeted Correlation Information
                                                                          • Correlation Analysis in a Nutshell
                                                                            • Partial Equivalence Relations
                                                                              • Abstract Partial Equivalence Type
                                                                              • Well-Typed Partial Equivalences and their Semantics
                                                                                • Paths and Correlations
                                                                                  • Paths and Correlation Types
                                                                                  • Alignment and Partial Order
                                                                                    • Intraprocedural Correlation Analysis
                                                                                      • Intraprocedural Correlation Summaries and Analysis
                                                                                      • Intraprocedural Correlation Analysis Illustrated
                                                                                        • Interprocedural Correlation Analysis
                                                                                        • Extension – Constructor Evolution
                                                                                        • Related Work
                                                                                        • Conclusion
                                                                                          • Implementation, Application and Results
                                                                                            • Implementation of the Dependency Analysis
                                                                                              • Dependency Type and Operators
                                                                                              • Intraprocedural Dependency Analysis
                                                                                                • Implementation of the Correlation Analysis
                                                                                                  • Partial Equivalence Relations and Operators
                                                                                                  • Intraprocedural Correlations
                                                                                                  • Dependency and Correlation Analysers
                                                                                                    • Dependency and Correlation Results on ProvenCore Layers
                                                                                                      • ProvenCore Description
                                                                                                      • Obtained Dependency and Correlation Results
                                                                                                      • Precision of our Dependency and Correlation Summaries
                                                                                                        • Reasoning about Framing using Correlations and Dependencies
                                                                                                          • A Decision Procedure
                                                                                                          • Types of Targeted Queries
                                                                                                            • Decision Procedure Experiments
                                                                                                              • Conclusion and Perspectives
                                                                                                                • Contributions
                                                                                                                • Future Work
                                                                                                                  • Bibliography
Page 2: Static Analysis of Functional Programs with an Application

THESIS / UNIVERSITÉ DE RENNES 1
under the seal of Université Bretagne Loire

for the degree of
DOCTOR OF THE UNIVERSITÉ DE RENNES 1

Specialty: Computer Science
Doctoral school: Matisse

presented by

Oana Fabiana Andreescu
prepared at Prove & Run and at research unit 6074 – IRISA
(Institut de Recherche en Informatique et Systèmes Aléatoires)

Static Analysis of Functional Programs
with an Application to the Frame Problem in Deductive Verification

Thesis defended in Rennes on 29 May 2017

before a jury composed of:

Sandrine Blazy, Professor, President
Catherine Dubois, Professor, Reviewer
Antoine Miné, Professor, Reviewer
Sylvain Conchon, Professor, Examiner
Thomas Jensen, Professor, Thesis supervisor
Stéphane Lescuyer, Engineer, Thesis co-supervisor


UNIVERSITÉ DE RENNES 1

Abstract

Prove & Run
École doctorale Matisse

Doctor of the Université de Rennes 1

Static Analysis of Functional Programs with an Application to the Frame Problem in Deductive Verification

by Oana Fabiana Andreescu

In the field of software verification, the frame problem refers to establishing the boundaries within which program elements operate. It has notoriously tedious consequences on the specification of frame properties, which indicate the parts of the program state that an operation is allowed to modify, as well as on their verification, i.e. proving that operations modify only what is specified by their frame properties. In the context of interactive formal verification of complex systems, such as operating systems, much effort is spent addressing these consequences and proving the preservation of the systems' invariants. However, most operations have a localized effect on the system and impact only a limited number of invariants at the same time.

In this thesis we address the issue of identifying those invariants that are unaffected by an operation, and we present a solution for automatically inferring their preservation. Our solution is meant to ease the proof burden for the programmer. It is based on static analysis and does not require any additional frame annotations. Our strategy consists in combining a dependency analysis and a correlation analysis. We have designed and implemented both static analyses for a strongly-typed functional language that handles structures, variants and arrays. The dependency analysis computes a conservative approximation of the input fragments on which functional properties and operations depend. The correlation analysis computes a safe approximation of the parts of an input state to a function that are copied to the output state. It summarizes not only what is modified but also how it is modified and to what extent.

By employing these two static analyses and by subsequently reasoning based on their combined results, an interactive theorem prover can automate the discharging of proof obligations for unmodified parts of the state. We have applied both of our static analyses to a functional specification of a micro-kernel, and the obtained results demonstrate both their precision and their scalability.

Acknowledgements

First of all, I would like to express my gratitude to my two PhD advisors, Thomas Jensen and Stéphane Lescuyer, without whom this thesis would have been impossible. I thank them for their patience and dedication in guiding me throughout these years, and for all the rigour that they instilled into me by word and by their own example. Thomas, thank you for helping me put my work into perspective. Thank you for your encouragement when I was overwhelmed by doubts and for your optimism when I had none. Stéphane, thank you for your inspiring advice, for the rigorous proofreading, for the many interesting discussions and for your careful attention to my work. Know that this thank-you note was written using Emacs, to which I am happy to admit that you converted me.

I am indebted to Dominique Bolignano for raising the possibility of this thesis and for creating the frame that allowed me to embark on this interesting journey and to explore the seas of research among an inspiring group of professionals: the Prove & Run team.

I am grateful to, and would like to wholeheartedly thank, Catherine Dubois and Antoine Miné for agreeing to review my dissertation. I am honoured to know that my 200+ pages have been read by experts in static analysis and formal verification, and I am grateful for their valuable comments and remarks.

I would also like to thank Sandrine Blazy and Sylvain Conchon for agreeing to be members of the jury. Sylvain Conchon, I am grateful for your keen interest during my defense. Sandrine Blazy, thank you for agreeing to chair my defense and for driving it in such a positive manner.

For their understanding, their advice and their support during the transition period and the months before my defense, I would like to thank Claire Loiseaux and Carolina Lavatelli.

I thank all of my colleagues at Prove & Run for our discussions and their advice during these years. I thank Florence for her warmth, energy and optimism; Erica and Henry for being such great office colleagues; Pauline and François for being friendly, reliable colleagues in the academic trenches. I am indebted to Olivier and Benoit for reviewing my articles and providing valuable remarks. I thank Pascale for smoothing out the stormy waves of administrative work. Though our interactions were briefer, I would also like to thank the Celtique members for their openness and for the interesting seminars. A special thanks goes to Lydie Mabil for helping me deal with the administrative work during these years and, finally, for helping prepare the defense of my dissertation.

This academic journey started long ago, even before I was aware, with the help of Marius Minea and Ovidiu Badescu, who unknowingly motivated me to take this path years later. I warmly thank them, and I am grateful to both for paving the first part of my academic path.

I would also like to thank my friends, old and new, far and near. Thank you for always being there for me and providing perspective, enthusiasm and breaths of fresh air. Thank you as well for still being my friends despite the long-winded and geeky descriptions of my work, and the occasionally cancelled plans and absences while I was trying to find my way into the research world.

I lack the appropriate words to express the gratitude I feel towards my family for their never-ending love and support. I thank my mother and my sister for being such wonderful examples of women in science. I thank my father for his unwavering belief in me and for his love and respect for well-written sentences, no matter the context, which he instilled into me. I thank my brother-in-law for being the one who ignited, early on, the spark and interest for computers and mathematics, and my two wonderful nieces for always being my rays of light.

Last but surely not least, I have only gratitude for Georges, my companion, my pillar of strength, my compass and lighthouse during the darkest moments. To quote Carl Sagan: in the vastness of space and immensity of time, it is my absolute joy to spend a planet and an epoch with you.

Contents

I Extended Summary (Résumé étendu en français)
  I.1 The Frame Problem
  I.2 Objectives
  I.3 Dependency Analysis
  I.4 Correlation Analysis
  I.5 Decision Procedure
  I.6 Conclusion

1 Introduction
  1.1 Formal Verification of Software
  1.2 The Frame Problem in a Nutshell
  1.3 Prove & Run: Objectives and Products
  1.4 Context and Problem Statement
  1.5 Contributions and Structure of the Document

2 The Frame Problem in Software Verification
  2.1 Specification Languages and Verification Tools
  2.2 Manifestations of the Frame Problem
  2.3 Approaches to Specifying Frame Properties
    2.3.1 The Manual Approach
    2.3.2 The Exclusive Approach
    2.3.3 The Implicit Approach
  2.4 Topologies and Effects
    2.4.1 Explicit Footprints
    2.4.2 Implicit Footprints
    2.4.3 Predefined Footprints
  2.5 Other Approaches to Reason about Frames
  2.6 Other Relevant Work

3 The Smart Language and ProvenTools
  3.1 The Smart Modeling Language
    3.1.1 Smart Predicates and Types
    3.1.2 Exit Labels and Control Flow
    3.1.3 Polymorphism & Algebraic Data Types
    3.1.4 Specifications
    3.1.5 Illustrating Smart – An Abstract Process Manager
  3.2 ProvenTools
  3.3 Smil

4 The αSmil Language
  4.1 αSmil Syntax
  4.2 Control Flow Graph
  4.3 Well-Typed αSmil Statements
  4.4 Operational Semantics of αSmil Statements

5 Dependency Analysis for Functional Specifications
  5.1 Dependency Analysis in a Nutshell
    5.1.1 Targeted Dependency Information
    5.1.2 Outline
  5.2 Abstract Dependency Domain
    5.2.1 Join and Reduction Operator
    5.2.2 Well-Typed Dependencies
  5.3 Intraprocedural Analysis and Data-Flow Equations
    5.3.1 Intraprocedural Dependency Domains
    5.3.2 Intraprocedural Data-Flow Equations
    5.3.3 Intraprocedural Dependency Analysis Illustrated
  5.4 Interprocedural Dependencies
    5.4.1 Interprocedural Dependency Analysis Illustrated
    5.4.2 Context-Insensitivity and its Consequences
  5.5 Semantics of Dependency Values
  5.6 Related Work
  5.7 Conclusion

6 Deferred Dependencies: Injecting Context in Dependency Summaries
  6.1 Dealing with Context-Insensitivity
  6.2 Symbolic Dependency Components in a Nutshell
  6.3 Symbolic Paths
    6.3.1 Symbolic Path Type
    6.3.2 Semantics of Symbolic Paths
    6.3.3 Well-Typed Paths and Path Sets
  6.4 Abstract Dependency Domain with Deferred Accesses
  6.5 Deferred Dependencies at the Intraprocedural Level
    6.5.1 Extended Intraprocedural Dependency Analysis
    6.5.2 Intraprocedural Dependency Analysis Illustrated
  6.6 Deferred Dependencies at the Interprocedural Level
    6.6.1 Applying Context-Sensitive Information by Substitution
    6.6.2 Wrapped Calls and Results
  6.7 Related Work
  6.8 Conclusion

7 Correlation Analysis
  7.1 Introduction
    7.1.1 Targeted Correlation Information
    7.1.2 Correlation Analysis in a Nutshell
  7.2 Partial Equivalence Relations
    7.2.1 Abstract Partial Equivalence Type
    7.2.2 Well-Typed Partial Equivalences and their Semantics
  7.3 Paths and Correlations
    7.3.1 Paths and Correlation Types
    7.3.2 Alignment and Partial Order
  7.4 Intraprocedural Correlation Analysis
    7.4.1 Intraprocedural Correlation Summaries and Analysis
    7.4.2 Intraprocedural Correlation Analysis Illustrated
  7.5 Interprocedural Correlation Analysis
  7.6 Extension – Constructor Evolution
  7.7 Related Work
  7.8 Conclusion

8 Implementation, Application and Results
  8.1 Implementation of the Dependency Analysis
    8.1.1 Dependency Type and Operators
    8.1.2 Intraprocedural Dependency Analysis
  8.2 Implementation of the Correlation Analysis
    8.2.1 Partial Equivalence Relations and Operators
    8.2.2 Intraprocedural Correlations
    8.2.3 Dependency and Correlation Analysers
  8.3 Dependency and Correlation Results on ProvenCore Layers
    8.3.1 ProvenCore Description
    8.3.2 Obtained Dependency and Correlation Results
    8.3.3 Precision of our Dependency and Correlation Summaries
  8.4 Reasoning about Framing using Correlations and Dependencies
    8.4.1 A Decision Procedure
    8.4.2 Types of Targeted Queries
  8.5 Decision Procedure Experiments

9 Conclusion and Perspectives
  9.1 Contributions
  9.2 Future Work

Bibliography

List of Figures

1.1 Complex Transition Systems Frame Problem
1.2 Frame Problem and Solution Strategy

3.1 Possible Transitions between Thread States
3.2 The ProvenTools Toolchain
3.3 Smart Editor

4.1 Body of the stop_thread Predicate
4.2 Example – Control Flow Graph of Predicate thread
4.3 Well-Typed Control Flow Graph

5.1 Example Data Types – Thread and Memory Region
5.2 Input Type – Process
5.3 Predicate thread – Implementation
5.4 G_thread – Control Flow Graph of Predicate thread
5.5 Targeted Dependency Results for Predicate thread
5.6 G_start_address – Control Flow Graph of Predicate start_address
5.7 Predicate start_address – Implementation
5.8 Targeted Dependency Results for Predicate start_address
5.9 Order Relation on Pairs of Atomic Dependencies
5.10 Computation of the Intraprocedural Domain at a Node's Entry Point
5.11 Analysing Predicate thread – Initialisation
5.12 Applying the Variant Switch Equation
5.13 Analysing Predicate thread – Variant Switch
5.14 Applying the Array Access Equation
5.15 Analysing Predicate thread – Array Access
5.16 Applying the Field Access Equation
5.17 Analysing Predicate thread – Field Access
5.18 G_start_address – Dependency Information
5.19 G_start_address – Final Dependency Results

6.1 Analysing thread – Dependency Summary with Deferred Occurrences
6.2 G_start_address – Intermediate Dependency Results for start_address
6.3 Substitution of Formal Parameters by Effective Parameters
6.4 Substituting Deferred Dependencies by Actual Dependencies

7.1 Body of the stop_thread Predicate
7.2 Targeted Correlation Results for Predicate stop_thread
7.3 Intraprocedural Correlations – General Representation
7.4 Intraprocedural Domain – Examples
7.5 Entry Point – Correlation Information
7.6 Analysing Predicate stop_thread – Initialisation
7.7 Construction Evolution

8.1 ProvenCore – Abstract Layers
8.2 Distribution of the number of inferred preserved properties
8.3 Distribution of the number of inferred predicates for which a property is preserved

List of Tables

4.2 αSmil – Set of Supported Statements
4.3 Statements and their Exit Labels
4.4 Predicate Body in αSmil
4.6 Well-Typed Predicate Call
4.7 Well-Typed Statements
4.8 The Structural Operational Semantics of αSmil Generic Statements
4.9 Operational Semantics of αSmil Structure-Related Statements
4.10 Operational Semantics of αSmil Variant-Related Statements
4.11 Operational Semantics of αSmil Array-Related Statements
4.12 Semantics of a Predicate Call

5.1 ⊑ – Comparison of Two Domains
5.2 ∨ – Join Operation
5.3 ⊕ – Reduction Operator
5.4 Dependency Extractions
5.5 Well-Typed Dependencies
5.6 Statements – Representations and Data-Flow Equations
5.7 Generic Statements – Data-Flow Equations
5.8 Structure-Related Statements – Data-Flow Equations
5.9 Variant-Related Statements – Data-Flow Equations
5.10 Array-Related Statements – Data-Flow Equations

6.1 E – Path Semantics
6.2 Well-Typed Dependency Paths
6.3 Extended Leq – Comparison of Two Domains
6.4 ∨ – Extended Join
6.5 ⊕ – Extended Reduction Operator
6.6 Extended Extraction Operators
6.7 Well-Typed Dependencies – Extended
6.8 Deferred Paths – Application and Substitutions
6.9 Interprocedural Domain – Substitutions

7.1 ⊑R – Comparison of Two Domains
7.2 Partial Equivalences – ∨R – Join Operation
7.3 Partial Equivalences – ∧R – Meet Operation
7.4 Partial Equivalence Extractions
7.5 Well-Typed Partial Equivalences
7.6 Partial Equivalence Relations – Semantics
7.7 Well-Typed Access Paths
7.8 Well-Typed Correlations
7.9 Well-Typed Correlation Maps
7.11 Links between Access Paths
7.12 Statements – Representations and Data-Flow Equations
7.19 Well-Formed Intraprocedural Correlation Summaries

8.3 ProvenCore Abstract Layers – Global State Type
8.4 ProvenCore Abstract Layers – ProcessMachine Type
8.5 Abstract Layers – Evaluation Data and Dependency Analysis Timing
8.6 Abstract Layers – Detailed Dependency Analysis Timing
8.7 Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing
8.8 Abstract Layers – Detailed Deferred Dependency Analysis Timing
8.9 Abstract Layers – Evaluation Data and Correlation Analysis Timing
8.10 Abstract Layers – Detailed Correlation Analysis Timing
8.11 RSMFSP Layers – Evaluation Data and Dependency Summaries
8.12 TDS Layer – Evaluation Data and Dependency Summaries
8.13 RSMFSP Layers – Evaluation Data and Correlation Summaries
8.14 TDS Layer – Evaluation Data and Correlation Summaries

List of notations

Section | Symbol | Type | Description
Sec. 3.1.2 | true | L | Special exit label
Sec. 3.1.2 | false | L | Special exit label
Sec. 4.1 | T0 ⊂ T | | Set of base type identifiers
Def. 4.1.1 | T | | Universe of type identifiers
Def. 4.1.1 | τ | T | Type
Def. 4.1.1 | τ0 | T | Primitive type
Def. 4.1.1 | struct{f1 : τ1, …} | T | Structure type
Def. 4.1.1 | variant[C1 : τ1 | …] | T | Variant type
Def. 4.1.1 | arrτ ⟨τ⟩ | T | Array type
Sec. 4.1 | λ | L | Exit label
Sec. 4.1 | L | | Set of exit labels
Sec. 4.1 | error | L | Special exit label
Sec. 4.1 | σ, σp | Σ | Signature (of predicate p)
Sec. 4.1 | Σ | | Set of predicate signatures
Sec. 4.1 | o, o | V | Output variable(s)
Tab. 4.2 | s | | αSmil statement
Tab. 4.2 | o = e | | αSmil assignment statement
Tab. 4.2 | e1 = e2 | | αSmil equality test statement
Tab. 4.2 | nop | | αSmil no operation statement
Tab. 4.2 | r = e1 … en | | αSmil create structure statement
Tab. 4.2 | o1 … on = r | | αSmil destructure structure
Tab. 4.2 | o = r fi | | αSmil access field statement
Tab. 4.2 | r′ = r with fi = e | | αSmil update field statement
Tab. 4.2 | r′ = ⟨f1 … fk⟩ r′′ | | αSmil partial structure equality
Tab. 4.2 | v = Cp[e] | | αSmil create variant statement
Tab. 4.2 | switch(v) as [o1 | …] | | αSmil destructure variant statement
Tab. 4.2 | v ∈ C1 … Ck | | αSmil variant possible statement
Tab. 4.2 | o = a[i] | | αSmil array access statement
Tab. 4.2 | a′ = [a with i = e] | | αSmil array update statement
Tab. 4.2 | p(e1 …) [λ1 o1 | …] | | αSmil predicate call statement
Sec. 4.2 | Gp = (N, E) | | Control flow graph of predicate p
Def. 4.3.1 | Γ | V → T | Typing environment
Sec. 4.3 | v | V | Variable
Sec. 4.3 | V | | Set of variables
Sec. 4.3 | V+ ⊆ V | | Writable variable identifiers
Def. 4.3.2 | Σ | P → S | Maps predicate ids to signatures
Def. 4.3.3 | Σ, Γ, O ⊢ s → λ | | Well-typed statement
Sec. 4.3 | O ⊆ V+ | | Output variables of a predicate
Sec. 4.4 | Dτ | | Semantic values of type τ
Sec. 4.4 | P ⊆ Dτ | | Domain of valid array indices
Sec. 4.4 | E = V → D | | Valuation or environment type
Def. 4.4.2 | E | E | Valuation or environment
Sec. 4.4 | Γ(v) | | Type of v
Sec. 4.4 | Γ ⊢ E | | Well-typed environment
Def. 4.4.3 | ⟨E [s]⟩ | | Configuration
Def. 4.4.4 | ⟨E [s]⟩ −λ→ E′ | | Transition
Def. 4.4.5 | E[x → v] | | Extension of E with x → v
Def. 4.4.6 | I = P × E → E × L | | Set of interpretations
Def. 4.4.6 | I | I | Interpretation
Sec. 5.2 | D | | Abstract dependency domain
Def. 5.2.1 | δ | D | Dependency
Def. 5.2.1 | ⊤ | D | Everything, atomic dependency
Def. 5.2.1 | | D | Nothing, atomic dependency
Def. 5.2.1 | ⊥ | D | Impossible, atomic dependency
Def. 5.2.1 | {f1 ↦ δ1 …} | D | Structure dependency
Def. 5.2.1 | [C1 ↦ δ1 …] | D | Variant dependency
Def. 5.2.1 | ⟨δ⟩ | D | Array dependency
Def. 5.2.1 | ⟨δdef i δexc⟩ | D | Array dependency, exception for i
Def. 5.2.2 | ⊑ | ⊆ D × D | Partial order on dependencies
Tab. 5.1 | | | Rules for ⊑
Def. 5.2.3 | ∨ | D × D → D | Join operator for dependencies
Tab. 5.2 | | | ∨ cases
Def. 5.2.4 | ⊕ | D × D → D | Reduction operator for dependencies
Tab. 5.3 | | | ⊕ cases
Def. 5.2.5 | f | D ↛ D | Extraction of a field's dependency
Def. 5.2.6 | C | D ↛ D | Extraction of a constructor's dep.
Def. 5.2.7 | ⟨i⟩ | D ↛ D | Extraction of an array's cell dep.
Def. 5.2.8 | ⟨∗ i⟩ | D ↛ D | Extraction of an array's dep. (exc.)
Def. 5.2.9 | ⟨∗⟩ | D ↛ D | Extraction of an array's dependency
Tab. 5.4 | | | f, c, ⟨∗ i⟩, ⟨i⟩ and ⟨∗⟩ cases
Tab. 5.5 | Γ ⊢ δ : τ | | Well-typed dependency
Def. 5.3.1 | D = V → D | | Intraprocedural dependency domain
Def. 5.3.1 | ∆ | D | Intraprocedural dependency
Sec. 5.3.1 | Unreachable | D | Intra. dep. for unreachable nodes
Def. 5.3.2 | ∆ x | D × V → D | Forget x
Def. 5.3.3 | ⊑∆ | ⊆ D × D | Intraprocedural partial order
Def. 5.3.4 | ∨∆ | D × D → D | Intraprocedural join operation
Def. 5.3.5 | ⊕∆ | D × D → D | Intraprocedural reduction operator
Sec. 5.3.2 | ⟦s⟧λ(∆nj) | | Contribution of an edge (ni, nj)
Sec. 5.3.2 | ⟦s⟧λ() | | Transfer function of the edge s, λ
Sec. 5.3.2 | gensλ | | Written variables on the edge s, λ
Sec. 5.3.2 | ∆n | D | Dependency domain of node n
Sec. 5.3.2 | I ⊆ V | | Set of input variables
Sec. 5.4 | χ | | Formal-effective param. mapping
Sec. 5.4 | J (χ) | | Substitution, formal to effective
Def. 6.3.1 | π | Π | Symbolic path
Def. 6.3.1 | Π | | Universe of symbolic paths
Def. 6.3.1 | ε | Π | Symbolic path, endpoint
Def. 6.3.1 | f π | Π | Symbolic path, field
Def. 6.3.1 | C π | Π | Symbolic path, constructor
Def. 6.3.1 | ⟨i⟩ π | Π | Symbolic path, array cell
Def. 6.3.1 | ⟨∗ i⟩ π | Π | Symbolic path, array cells except
Def. 6.3.1 | ⟨∗⟩ π | Π | Symbolic path, all array cells
Sec. 6.3.1 | | Π × Π → Π | Path extension operator
Sec. 6.3.1 | P | 2^Π | Symbolic path set
Def. 6.3.2 | ⊑ | ⊂ 2^Π × 2^Π | Partial order for path sets
Def. 6.3.3 | ∨ | 2^Π × 2^Π → 2^Π | Join operator for path sets
Def. 6.3.4 | | 2^Π × Π → 2^Π | Extension operator for path sets
Def. 6.3.5 | π | Π | Actual path
Def. 6.3.5 | Π | | Universe of actual paths
Def. 6.3.5 | ε | Π | Actual path, empty
Def. 6.3.5 | f π | Π | Actual path, field
Def. 6.3.5 | C π | Π | Actual path, constructor
Def. 6.3.5 | ⟨i⟩ π | Π | Actual path, array cell
Def. 6.1 | E | ⊂ E × Π × Π | Symbolic path covers actual path
Sec. 6.3.2 | E | ⊂ E × 2^Π × Π | Set of symbolic paths covers actual
Def. 6.3.6 | ⟦P⟧E | ⊂ E × 2^Π → 2^Π | Interpretation of symbolic paths set
Def. 6.3.7 | at | Π × D → D | Find subpart of value at given path
Tab. 6.2 | I ⊢ π : τ → τ′ | ⊂ V × Π × T × T | Symbolic paths typing judgement
Sec. 6.3.3 | I ⊢ P : τ → τ′ | ⊂ V × 2^Π × T × T | Symbolic paths sets judgement
Def. 6.4.1 | δ | D | Extended dependency
Def. 6.4.1 | D | | Ext. abstract dependency domain
Def. 6.4.1 | Deferred(o1 ↦ P1 …) | D | Deferred accesses dependency
Def. 6.4.2 | A | V ↛ Π | Access map
Tab. 6.3 | | | Deferred rule for ⊑
Tab. 6.4 | | | ∨ cases for deferred
Tab. 6.5 | | | ⊕ cases for deferred
Tab. 6.6 | | | f, c, ⟨∗ i⟩, ⟨i⟩, ⟨∗⟩ deferred cases
Tab. | Γ, I, O ⊢ δ : τ | | Well-typed dependency, deferred rule
Def. 6.6.1 | σ | V → D | Substitution, root vars to deps
Def. 6.6.2 | φ | V ↛ V | Substitution, indices in arrays
Sec. 6.6.1 | J (σ φ) | | Substitutes deferred dependencies
Sec. 6.6.1 | • | | Applies symbolic paths to dep.
Sec. 6.6.1 | | | Applies symbolic path to dep.
Def. 7.2.1 | R | R | Partial equivalence
Def. 7.2.1 | R | | Partial equivalence type
Def. 7.2.1 | Equal | R | Partial equivalence, equal
Def. 7.2.1 | Any | R | Partial equivalence, unrelated
Def. 7.2.1 | {f1 ↦ R1 …} | R | Partial equivalence, structure
Def. 7.2.1 | [C1 ↦ R1 …] | R | Partial equivalence, variant
Def. 7.2.1 | ⟨Rdef⟩ | R | Partial equivalence, array
Def. 7.2.1 | ⟨Rdef i Rexc⟩ | R | Partial equivalence, array + exc.
Def. 7.2.2 | ⊑R | ⊆ R × R | Preorder for partial equivalences
Def. 7.1 | | | Rules for ⊑R
Def. 7.2.3 | ∨R | R × R → R | Join for partial equivalences
Tab. 7.2 | | | ∨R cases
Def. 7.2.4 | ∧R | R × R → R | Meet for partial equivalences
Tab. 7.3 | | | ∧R cases
Def. 7.2.5 | extr f | R ↛ R | Extracts field's partial eqv.
Def. 7.2.6 | extr C | R ↛ R | Extracts constructor's partial eqv.
Def. 7.2.7 | extr ⟨i⟩ | R ↛ R | Extracts cell's partial eqv.
Tab. 7.4 | | | extr f, extr C and extr ⟨i⟩ cases
Tab. 7.5 | Γ ⊢ R : τ | | Partial equivalence well-typedness
Sec. 7.2.2 | ⟦R⟧τ | | Partial equivalence semantics
Def. 7.3.1 | π | Π | Access path
Def. 7.3.1 | Π | | Access path type
Def. 7.3.1 | ε | Π | Access path, empty
Def. 7.3.1 | f π | Π | Access path, field
Def. 7.3.1 | C π | Π | Access path, constructor
Def. 7.3.1 | ⟨i⟩ π | Π | Access path, array cell
Def. 7.3.2 | κ | K | Correlation map
Def. 7.3.2 | K = Π × Π → R | | Correlation map type
Sec. 7.3.1 | (π, ρ) ↦ R | Π × Π × R | Correlation
Tab. 7.7 | Γ, I ⊢ π : τ → τ | | Well-typed access path
Tab. 7.8 | Γ, I ⊢ (π, ρ) ↦ R : (τl, τr) | | Well-typed correlation
Tab. 7.9 | Γ, I ⊢ κ : (τl, τr) | | Well-typed correlation map
Def. 7.3.3 | µ | M | Link
Def. 7.3.3 | M | | Link type
Def. 7.3.3 | Identical | M | Link, identical
Def. 7.3.3 | Left π | M | Link, left path has suffix π
Def. 7.3.3 | Right π | M | Link, right path has suffix π
Def. 7.3.3 | Incompatible | M | Link, incompatible paths
Def. 7.3.4 | f | Π × Π → M | Matching operator
Def. 7.3.5 | R (π, ρ)(π′, ρ′) | | Aligning a correlation
Def. 7.3.6 | | | Computation of R (π, ρ)(π′, ρ′)
Def. 7.3.7 | | Π × R ↛ R | Projection
Def. 7.3.8 | x | R × Π ↛ R | Injection
Def. 7.3.9 | κ (π′, ρ′) | | Aligns correlation maps
Def. 7.3.10 | ⊑ | ⊆ K × K | Correlation maps preorder
Def. 7.3.11 | ∨ | K × K → K | Join for correlation maps
Def. 7.3.12 | ∧ | K × K → K | Meet for correlation maps
Def. 7.4.1 | K | K | Intraprocedural corr. summary
Def. 7.4.1 | K = V × V → K | | Intraproc. corr. summary type
Sec. 7.4.1 | NoCorrelation | K | Any for any pair of variables
Def. 7.4.2 | ⊑K | ⊆ K × K | ⊑ for intraproc. corr. summaries
Def. 7.4.3 | ∨K | K × K → K | Join for intraproc. corr. summaries
Def. 7.4.4 | Csλ() | C | Contribution of an edge
Sec. 7.4.1 | csλ | K | Corr. created by stmt s on label λ
Sec. 7.4.1 | killλ | ⊆ V | Variables redefined by stmt on label
Def. 7.4.5 | (π•, ρ•) ↦ R | Π × Π × R | New correlation after composition
Def. 7.4.6 | | K × K → K | Composition of correlation maps
Def. 7.4.7 | | C × K → K | Contribution Csλi(Kni)
Def. 7.19 | Γ I O K | | Well-formed intraproc. corr. summ.
Sec. 7.4.2 | o | | Final value of o
Def. 7.5.1 | Kp | Λp → K | Interproc. correlation domain
Def. 7.5.1 | Λp ⊆ L | | Output labels of predicate p
Sec. 7.6 | Impossible | R | Partial eqv., constructor impossible
Sec. 7.6 | RCiCj | R | Partial eqv., variant matrix

To my family and close ones

Chapter I

Extended Summary (Résumé étendu en français)

I.1 The Frame Problem

In the field of formal software verification, it is imperative to identify the boundaries within which elements or functions operate. A complete specification of an operation must not only state that the output values satisfy a certain property; it must also delimit the parts of the input state on which the operation works. These boundaries constitute the frame properties. They are usually specified manually by the programmer, and their validity must be verified: it is necessary to prove that the program's operations do not overstep the boundaries thus declared. Specifying and proving frame properties is a task notoriously known to be long and tedious. The considerable effort invested in this task is a manifestation of the frame problem. Manifestations of the frame problem appear in the context of all specification languages and all formal verification methods.

I.2 Objectives

During the development of ProvenCore, a general-purpose micro-kernel that guarantees isolation, it became evident that the specification and verification of transition systems in general, and of operating systems in particular, are not immune to the frame problem. Operating systems are characterized by complex states defined by algebraic data types and associative arrays, which are fundamental building blocks for representing and manipulating complex data efficiently. Operating systems are also characterized by transitions that map such input states to new output states. However, most transitions are not concerned with the input state in its entirety; they depend on, and modify, only a subset of it. Intuitively, properties that hold for the input state remain trivially valid for the output state obtained after the transition, as long as they depend only on parts of the input state that are not modified by the transition. In practice, proving the preservation of these properties is not a trivial task: it imposes substantial manual effort and a host of tedious, repetitive proofs.

The goal of our work was to address this problem and to find an automated solution for inferring the preservation of these properties. More precisely, our goal was the automatic inference of the preservation of properties that depend on a subset of the input that is disjoint from the frame of the operation, that is, from the subset of the state that is modified. To this end, we proposed a solution based on static analysis that requires no additional frame annotations. By detecting the subset of the state on which a property depends, as well as the part that is not affected by an operation, we can automatically discharge the proof obligations related to unmodified parts.
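The disjointness argument above can be illustrated with a minimal sketch. It is not the decision procedure of the thesis: it assumes, for illustration only, that a property's dependencies and an operation's unmodified parts are both given as plain sets of field paths, and the names `Path`, `is_covered` and `is_preserved` as well as the example state fields are hypothetical.

```python
from typing import FrozenSet, Tuple

# A path into the state, e.g. ("threads", "current") for state.threads.current.
Path = Tuple[str, ...]


def is_covered(path: Path, unchanged: FrozenSet[Path]) -> bool:
    """True if `path` itself, or one of its prefixes, is reported as unchanged."""
    return any(path[:n] in unchanged for n in range(len(path) + 1))


def is_preserved(depends_on: FrozenSet[Path], unchanged: FrozenSet[Path]) -> bool:
    """A property is trivially preserved when every part of the input state
    it reads is copied unchanged to the output state."""
    return all(is_covered(p, unchanged) for p in depends_on)


# Hypothetical summary of an operation that only touches `state.memory`.
unchanged_by_op = frozenset({("threads",), ("scheduler", "queue")})

# Hypothetical dependency sets of two invariants.
invariant_a = frozenset({("threads", "current")})
invariant_b = frozenset({("memory", "regions"), ("threads",)})

print(is_preserved(invariant_a, unchanged_by_op))  # reads only unchanged parts
print(is_preserved(invariant_b, unchanged_by_op))  # also reads state.memory
```

In this simplified setting the first invariant can be discharged automatically, while the second still requires a proof because it reads a part of the state that the operation may modify.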

Nous employons deux analyses statiques dans ce but une analyse de deacutependance etune analyse de correacutelation Les deux analyses gegraverent des programmes manipulant des ta-bleaux associatifs ainsi que des types de donneacutees algeacutebriques (structures et variants) etcalculent des reacutesultats refleacutetant la structure sous-jacente de ces types (champs construc-teurs et cellules de tableau) Un raisonnement automatique baseacute sur le reacutesultat combineacutede ces deux analyses statiques permet drsquoinfeacuterer la preacuteservation de certaines proprieacuteteacuteesrelatives agrave lrsquoeacutetat de sortie Agrave terme ces deux analyses ont pour vocation agrave ecirctre em-ployeacutees par une tactique de preuve qui sera inteacutegreacutee agrave lrsquoassistant de preuve interactiveinclus dans la suite logicielle ProvenTools deacuteveloppeacutee par Prove amp Run

Smart, the language targeted by the ProvenTools software suite, is a purely functional language that manipulates algebraic data structures and immutable associative arrays. This work was motivated by the verification of ProvenCore. ProvenCore is implemented through multiple refinements between successive models of the kernel, from the most abstract, which allows the isolation property to be defined and proven, to the most concrete, which is used for code generation. The global states of the abstract layers are complex structures containing numerous fields, which are themselves composite. Commands such as fork, exec and exit can be executed. Each of these commands receives a global input state as an argument and produces the state of the system after the execution of the command. In practice, most of the commands supported by the system threaten only a limited number of invariants. Automatically proving the preservation of the unaffected invariants can considerably reduce the total number of proofs left to the programmer, who can then concentrate on the most interesting proofs.

I.3 Dependency Analysis

The dependency analysis handles functions and their specifications uniformly. For each possible execution scenario, it conservatively computes an approximation of the sub-elements of the input state on which the result depends. For variants, an additional analysis is performed simultaneously in order to compute the subset of possible constructors in each execution scenario.

We defined our own abstract domain for representing dependencies, and we obtain dependency information that reflects the layered structure of the data types.


This analysis was designed to be executed on the fly during interactive verification, and it operates uniformly on programs and their specifications; these two points give our approach its originality. We implemented a prototype of this dependency analysis in OCaml and applied it to a functional specification of ProvenCore. The results obtained are positive: for example, the dependency analysis runs in less than one second on a set of more than 600 predicates totalling approximately 10,000 lines of code.

In order to introduce a form of context sensitivity into the dependency analysis, we designed an extension based on symbolic paths. This extension slightly increases the execution time (by 10% to 20% on the benchmarks used). However, by using the dependency analysis with this extension, we obtained more precise results for 50% of the predicates included in these benchmarks.

I.4 Correlation Analysis

The correlation analysis detects the flow of input values into output values. It conservatively computes an approximation of the equivalences between the input sub-elements and the output sub-elements of a given function. It is an interprocedural static analysis that summarizes the behaviour of a function and detects which parts of the state are modified and to what extent. We defined a partial equivalence type that reflects the structure of algebraic data types and associative arrays. To gain precision and to avoid losing information when the input and the output have different types, we introduced an intermediate level. Correlations therefore consist of access paths to sub-elements of the same type and of equivalences between these sub-elements. This intermediate level makes it possible to compute, in a flexible manner, precise equivalences between parts of the input and parts of the output.

Here too, we implemented a prototype of this correlation analysis in OCaml and applied it to a functional specification of ProvenCore. The results obtained are encouraging: for example, the correlations for a subset of 630 predicates totalling approximately 10,000 lines of code are computed in less than 0.5 seconds. Although more complex than the dependency analysis, the correlation analysis runs faster on our benchmarks because, unlike the former, it applies only to functions and not to specifications. Indeed, specifications are Boolean predicates and do not return a modified state.

I.5 Decision Procedure

We sketched a decision procedure that employs our two static analyses. It constitutes the first step of our solution for automatically inferring the preservation of frame invariants. By uncovering equivalences between inputs and outputs, and after having detected that a property depends only on unchanged parts, it is possible to infer the preservation of invariants for these unchanged parts.

The decision procedure has not yet been implemented, but preliminary experiments and a simple prototype give us an idea of how the dependency and correlation results must be unified. Moreover, this allowed us to determine the kind of queries that can be handled and the mechanism for answering them. The results obtained with our simple prototype on a functional specification of ProvenCore are described and analysed.

The unification of the results of the two analyses involves creating a graph that connects the input and output variables examined by the query. The edges represent correlations between sub-elements of these variables, as detected by the second analysis. The dependencies of the property whose preservation we seek to infer indicate the sub-elements that influence the result of this property. When these sub-elements are left intact, the property is trivially preserved. The unification algorithm therefore traverses the graph, attempting to detect as many equivalences as possible between sub-elements of the input and output variables. If the sub-elements indicated by the dependency are included in the set of equivalent sub-elements, then the property is necessarily preserved, because all the values that influence its result are the same before and after the execution of the operation.
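As a rough illustration of the final inclusion check described above, the following Python sketch flattens the graph-based unification into a plain set comparison. The access paths, the field names (procs, files, clock) and the function is_preserved are hypothetical and are not taken from the ProvenTools prototype; the input sets merely stand in for results that the dependency and correlation analyses would supply.

```python
from typing import FrozenSet

# Hypothetical representation: a sub-element of the state is named by a
# flat access path such as "procs" or "clock".
Path = str


def is_preserved(dependencies: FrozenSet[Path],
                 unchanged: FrozenSet[Path]) -> bool:
    """Sufficient condition for preservation: every sub-element the
    property depends on is known to be equal before and after the
    operation. A False answer means "unknown", not "violated"."""
    return dependencies <= unchanged


# Assumed analysis results for an illustrative state with three fields.
deps_of_invariant = frozenset({"procs"})           # dependency analysis
unchanged_by_tick = frozenset({"procs", "files"})  # correlation analysis
unchanged_by_fork = frozenset({"files", "clock"})  # correlation analysis

print(is_preserved(deps_of_invariant, unchanged_by_tick))  # True
print(is_preserved(deps_of_invariant, unchanged_by_fork))  # False
```

In this sketch the invariant is inferred to survive the hypothetical tick operation, whereas for fork the check is inconclusive and a manual proof would still be needed.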

I.6 Conclusion

To conclude, we designed and implemented two static analyses that detect the data dependencies of a logical property, as well as correlations between the inputs and outputs of operations. Our first results on a functional model of a microkernel are encouraging, both for their precision and for the speed of the analysis, which makes these analyses suitable for use within an interactive prover. Apart from minor improvements to the precision of our analysis, the next steps consist in combining the analyses in order to detect the invariants that are not affected by the execution of a predicate, and then integrating this detection as a tactic in the ProvenTools theorem prover. We believe that it is possible to benefit from frame specifications at a lower cost, in particular without requiring the programmer to write tedious annotations that are intuitively obvious. When formally verifying complex transition systems, it then becomes possible to integrate into the development tools an automatic inference of the preservation of frame-related invariants via static analysis.


Chapter 1

Introduction

No human investigation can claim to be scientific if it doesn't pass the test of mathematical proof.

Leonardo da Vinci

1.1 Formal Verification of Software

Since the middle of the last century, computers and information technology have brought forth a digital revolution, fundamentally changing the way we live, work and interact with one another. Nowadays, computer programs govern our world, and software permeates our lives in manifold ways, shaping our interactions with the surrounding environment. From the alarm clock that marks the start of our day and the coffee machine that motivates us to leave the house, to the smartphone we use for checking our emails or bank account and the car we are driving (or the automated driverless subway we are relying on), some type of software is discreetly acting in the background. We have grown so accustomed to it that we do not even notice it anymore, until it asserts itself by preventing us from checking our email, by displaying a blue error screen on an ATM or ticket machine, or by serving us a salty bag of crisps instead of the desperately needed bottle of water we have just paid for at a vending machine. Such reminders can lead to frustration and cause inconvenience, but essentially they cause minor problems. However, receiving such reminders as a result of malfunctions of medical equipment such as radiation therapy machines, of flight control systems, Mars orbiters, satellites or nuclear power plants can have dramatic consequences, endangering human lives, causing environmental harm or entailing significant financial losses. Therefore, the quality of the software around us not only influences the quality of our daily lives, but might also have an impact on our safety and the safety of our surrounding world.

Writing reliable, completely error-free software is a difficult task, and even a utopian one in the absence of dedicated, rigorous approaches for improving its quality. Indeed, for many software systems no guarantees or warranties are provided, and their quality is addressed only by traditional software engineering approaches such as testing or code review, which cannot guarantee the absence of bugs. While this can be acceptable for non-critical programs, mission- or safety-critical software systems, for which software quality is of the utmost importance, have to guarantee the absence of runtime errors and provide high levels of confidence regarding their functional correctness. Certain safety-critical market segments impose standards and regulatory requirements for the development of such software systems. In these domains, formal program verification is emerging as a promising approach, gaining a wider audience and more and more ground.

Formal program verification comprises a set of techniques and tools that can be used to ensure, by mathematical means, that the program under scrutiny fulfills its functional correctness requirements, i.e. that it computes the right information. To achieve this goal, a formal description or specification of the program's expected behaviour must be given. Once this is established, multiple mathematical tools can be employed for formally verifying that the program's implementation follows the formal specification.

Formal methods can be traced back to the early days of computer science, and their origin can be linked to the names of Floyd (Floyd 1967), Hoare (Hoare 1969) and Naur (Naur 1966) (and later to that of Dijkstra (Dijkstra 1976)) and their methods for verifying program code with respect to assertions. Despite their early foundations, formal methods seemed for decades to be confined to the research world as a consequence of intricate notations, failure to scale to real-world programs, and limited or inadequate tool support. Since the 1960s, however, considerable progress has been made in the field of formal methods in terms of both methodology and tools for computer-aided program verification. Still, formal program verification methods are not yet a widespread alternative, or even complement, to testing in industry. Unlike testing, which cannot show the absence of bugs, the goal of formal verification methods is to prove, by means of mathematical tools, that the program execution is correct in all specified environments, without actually executing the program itself. These are static verification techniques.

Static verification techniques include program typing, model checking, deductive verification methods and static program analysis. Besides requiring a formal specification of the program's intended behaviour and its envisioned properties at runtime, all formal methods are theoretically characterized by undecidability and complexity, which are addressed by introducing some form of approximation. For soundness considerations, these approximations are necessarily over-approximations, and all static verification techniques are necessarily conservative: they can prove the absence of some erroneous runtime behaviours, but they will inevitably trigger some false warnings, rejecting certain behaviours that are in practice correct.

Program Typing. Type systems (Cardelli and Wegner 1985) are tools for reasoning about programs. More specifically, they constitute “a syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute” (Pierce 2002). They are used for computing static approximations of the runtime behaviours of the terms in a program, and can guarantee that well-typed programs are free from certain runtime type errors, such as passing strings as arguments to a primitive arithmetic operation or using an integer as a pointer.


In practice, type systems have become the most widespread instance of formal methods, with applications to many programming languages and automatic typecheckers built into a variety of compilers. Static typecheckers entail a variety of benefits, ranging from early error detection to offering convenient abstraction and documentation mechanisms and improving the efficiency of compilers, which nowadays make use of the information provided by typecheckers during their optimization and code generation phases.

The Curry-Howard correspondence implies that types can be used for expressing arbitrarily complex mathematical specifications. Additional type annotations could in principle enable the full proof of complex properties, effectively transforming type checkers into proof checkers (Pierce 2002). Approaches such as Extended Static Checking (Leino 2001; Leino and Nelson 1998; Flanagan et al. 2002) made progress towards implementing entirely automatic checks for broad classes of correctness properties.

Additionally, approaches relying on type inference have been used for alias analysis (O'Callahan and Jackson 1997) and exception analysis (Leroy and Pessaux 2000). Powerful type systems based on dependent types (Martin-Löf 1984; Nordström, Petersson and Smith 1990) are used in automated theorem proving. Various proof assistants, including Coq (Bertot and Castéran 2004; Sozeau and team 1997)¹, are based on type theory.

Model Checking. Model checking is a verification technique that exhaustively explores all possible system states in a systematic manner (Baier and Katoen 2008). More precisely, given a finite-state model of a system and a formal property, a model checking tool verifies whether the property under scrutiny holds for a state in the given model. Model checking emerged as a popular lightweight formal method as a consequence of progress made in the development of program logic and decision procedures, automatic model checking techniques, and compiler analysis (Jhala and Majumdar 2009). First, program logic and decision procedures (Nelson and Oppen 1980; Shostak 1984) provided the needed framework and algorithmic tools to reason about infinite state spaces. Automatic model checking techniques (Clarke and Emerson 1981; Vardi and Wolper 1994) for temporal logic provided algorithmic tools for state-space exploration. Abstract interpretation (Cousot and Cousot 1977) provided connections between the logical world of infinite state spaces and the algorithmic world of finite representations (Jhala and Majumdar 2009).

Currently, model checking continues to attract considerable attention from industry. This can be partly explained by its being a rather general verification approach that is suitable for applications stemming from different areas, ranging from embedded systems to hardware design. In addition, it is an automatic, lightweight technique that supports partial verification and requires a low degree of user interaction and a lower degree of expertise (Baier and Katoen 2008) compared to other verification techniques.

¹ Coq Reference Manual, Version 8.6: https://coq.inria.fr/distrib/current/files/Reference-Manual.pdf


Its main weaknesses stem, on the one hand, from its suffering from combinatorial state-space explosion (the number of states needed to model the system accurately may easily exceed the amount of available computer memory) and, on the other hand, from its being less suitable for data-intensive applications.

Model checking techniques also require the production of models, often expressed using finite-state automata, which are in turn described in a dedicated description language. Another prerequisite for model checking is a formal specification of the properties to be verified, typically provided by means of temporal logic, which is suitable for the specification of a variety of properties ranging from functional correctness and safety to liveness, fairness and real-time properties (Baier and Katoen 2008).
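To make the idea of exhaustive state exploration concrete, here is a minimal Python sketch of an explicit-state invariant check by breadth-first search. It is a toy illustration under simplifying assumptions (a finite model given as a successor function, and a property given as an invariant on states), not the algorithm of any particular model checker. The names find_violation and step are invented for the example.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional


def find_violation(initial: Hashable,
                   successors: Callable[[Hashable], Iterable[Hashable]],
                   invariant: Callable[[Hashable], bool]) -> Optional[Hashable]:
    """Visit every state reachable from `initial` and return the first
    state found that breaks `invariant`, or None if all states satisfy it."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:  # each state is explored only once
                seen.add(nxt)
                queue.append(nxt)
    return None


# Toy model: a counter modulo 6 that advances by 2 at each transition,
# so the reachable states from 0 are 0, 2 and 4.
def step(n: int) -> list:
    return [(n + 2) % 6]


print(find_violation(0, step, lambda n: n % 2 == 0))  # None
print(find_violation(0, step, lambda n: n != 4))      # 4
```

The `seen` set is what makes the search terminate on a finite model, and it is also where the state-space explosion mentioned above shows up: its size grows with the number of reachable states.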

Deductive Verification Methods. Deductive verification methods consist in producing formal correctness proofs by first generating a set of formal mathematical proof obligations from the program and its specification, and by subsequently discharging these. Based on the manner in which proof obligations are discharged, namely automatically or interactively, deductive verification methods can be classified into two broad categories. Both require a thorough understanding of the system to be proven, as well as a good knowledge of the employed proof tools.

The first category of deductive methods relies on standalone tools that accept as inputs programs written in a specific programming language (such as Java, C or Ada) and specified in a dedicated annotation language (such as JML or ACSL). These tools automatically produce a set of mathematical formulas, called verification conditions, which are typically proven using automatic theorem provers (Gallier 1987) or satisfiability modulo theories (SMT) solvers such as Alt-Ergo, Z3, CVC3 or Yices. Deductive verification tools such as Why3 or Boogie have their own programming and specification languages (WhyML and Boogie, respectively), which can act as intermediate verification languages and are designed as a layer on which to build program verifiers for other languages. Verifiers for C, Dafny, Chalice and Spec# have been built using Boogie. WhyML has been used for the verification of Java, C and Ada programs.

The second category of deductive methods relies on interactive theorem provers (Bertot and Castéran 2004), also called proof assistants, such as Isabelle, Coq, Agda, HOL or Mizar. Both the program and its specification are encoded in the proof assistant's own language (for example, Gallina for Coq and Isar for Isabelle), and the proofs that a program follows its specification, i.e. that it is functionally correct, are typically conducted in an interactive manner using the underlying proof construction engine. In other words, users are required to actively participate in the verification process by providing inductive arguments and guiding the proof through proof tactics, proof hints or strategies.

Both deductive verification methods offer a high level of assurance. For automatic theorem provers, the proof chain, which consists of multiple steps (the model of the input programming language, the verification condition generator, the SMT solver used) at which errors could potentially creep in, can be perceived as a weakness. For interactive theorem provers, the high level of expertise required to employ them can be perceived as discouraging by the wider audience. However, major industrial breakthroughs have recently been achieved. For instance, Hyper-V, Microsoft's hypervisor for highly secure virtualization, was verified using VCC and the Z3 prover (Leinenbach and Santen 2009). CompCert (Leroy 2009), the first formally proven C compiler, was verified using the Coq proof assistant. High security properties of the seL4 microkernel (Klein et al. 2009) have been proven using the Isabelle/HOL proof assistant.

Static Program Analysis. Static program analysis comprises multiple techniques for computing, at compile time, safe approximations of the set of values or behaviours that can occur dynamically when executing a program. Static analysis techniques initially emerged in the field of compilation, where they provided ways to generate code efficiently by avoiding redundant or superfluous computations (Nielson, Nielson and Hankin 1999).

Static analyses compute sound, conservative information. However, for decades their scalability to industrial-size programs was doubted, and their application was considered to be limited to the research world and to small programs. Major breakthroughs have recently been achieved, however, and they triggered, on the one hand, the inclusion of static analysis at different levels of the software validation process (Cousot 2001) and, on the other hand, a proliferation of static code analysers for a variety of languages, targeting mainstream usage and offering a solution for detecting and eliminating common runtime errors. A recent example is Infer (Calcagno and Distefano 2011), an open-source static analysis tool for bug detection in Java, C and Objective-C code. It was developed at Facebook, where it is used as part of the development process for mobile applications. Furthermore, static analysis techniques and tools are nowadays employed in the safety-critical market segment. For instance, Astrée (Cousot et al. 2005; Blanchet et al. 2003; Cousot et al. 2007), a static analyser for embedded software written in C, has been employed for the verification of aerospace software (Delmas and Souyris 2007; Bouissou et al. 2009; Bertrane et al. 2015). In particular, it has been used for proving the absence of runtime errors in the primary flight control software of the fly-by-wire system of Airbus airplanes.

It is argued (Cousot and Cousot 2010) that model checking, deductive verification and static program analysis represent approximations of the program semantics formalized by the abstract interpretation theory (Cousot and Cousot 1977).

Broadly speaking, this thesis focuses on static program analysis techniques that are meant to be used during interactive theorem proving in order to facilitate and automate the verification of a certain class of properties in the context of a strongly typed language.

1.2 The Frame Problem in a Nutshell

The frame problem (McCarthy and Hayes 1969) was initially identified and described by McCarthy and Hayes in 1969 in the context of Artificial Intelligence (AI). Its history is essentially intertwined with that of logicist AI, the branch of AI attempting to formalize reasoning within mathematical logic. The initial description of the frame problem is the following:

“In proving that one person could get into conversation with another, we were obliged to add the hypothesis that if a person has a telephone he still has it after looking up a number in the telephone book. If we had a number of actions to be performed in sequence, we would have quite a number of conditions to write down that certain actions do not change the values of certain fluents. In fact with n actions and m fluents we might have to write down mn such conditions.”

Unsurprisingly, given its identification in the context of logicist AI, the frame problem manifests itself in the realm of formal software specification and verification as well (Borgida, Mylopoulos and Reiter 1993). In this area, it remains a current problem, having notoriously tedious consequences and imposing a considerable amount of manual effort. For instance, when considering a simple procedure

transferAmount(ownerId, id1, id2, amount)

that records the transfer of a given sum of money, amount, from a customer's (identified by ownerId) current deposit account (identified by the account number id1) to a savings account (identified by the account number id2), a reasonable specification would be the following:

Precondition: owner(id1) = ownerId ∧ owner(id2) = ownerId ∧ availableAmount(id1) ≥ amount

Postcondition: availableAmount(id1)' = availableAmount(id1) - amount ∧ availableAmount(id2)' = availableAmount(id2) + amount

The program states prior to the procedure's execution and the ones subsequent to it are referred to by the typical unprimed/primed notation and by the availableAmount(id) and owner(id) functions. The given specification declares a precondition that has to hold prior to transferring the indicated sum of money from one account to the other: it stipulates that the customer identified by ownerId must be the owner of both accounts involved in the transaction. It also requires that the amount of money currently available in the deposit account identified by id1 is at least the amount to be transferred. The postcondition specifies the procedure's effects on the final program state and encompasses the conditions that have to hold after executing the procedure. They include a stipulation about incrementing the amount of money available in the savings account by the transferred sum amount, as well as one about decrementing the amount of money available in the current account by the same amount.

As discussed by Borgida et al. (Borgida, Mylopoulos and Reiter 1993), the principles on which this specification relies are simple and ubiquitous. Program states are represented in terms of predicates and functions, and a procedure's effects on the program state are represented as changes to one or more of these predicates and functions. However, the above specification can be interpreted in at least two ways, and multiple implementations with different effects can comply with it. For instance, one possible implementation results in exactly two changes to the program state, as required by the postcondition and as intuitively expected. Another implementation makes these two changes but additionally changes the ownership of the two accounts involved in the transaction. The postcondition still holds after executing the second version of the procedure. However, the intuitive interpretation of the given specification, namely that nothing but the amount of money in the two accounts changes, is inconsistent with the second implementation, which does more than is necessary or indeed desired. In order to prevent such situations, the postcondition for the transferAmount(ownerId, id1, id2, amount) procedure would also have to include conditions such as:

∀ id, owner(id)' = owner(id) ∧ owner(id2)' = owner(id2) ∧

∀ id, id ≠ id1 ⇒ id ≠ id2 ⇒ availableAmount(id)' = availableAmount(id)

In other words, the postcondition should include not only information about what changes but also about what does not change. While this might not seem dramatic for the trivial example illustrated above, in real-world examples it quickly escalates, leading to the necessity of specifying a plethora of conditions of the same type as the ones indicated above. These are called frame properties. Writing such conditions is necessary but also notoriously repetitive and tedious. Kogtenkov et al. (Kogtenkov, Meyer and Velder 2015) rightfully state that

“It is hard enough to convince programmers to state what their program does; forcing them, in addition, to specify all that it does not do may be a tough sell.”

The tedious, undeserved manual effort entailed by the specification and verification of frame properties is a manifestation of the frame problem. Though certain conventions and approaches, such as the implicit frames approach for specifying frame properties, can alleviate the manual effort imposed, some manifestation of the frame problem will be visible to some extent in the context of any specification language and verification method.
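The transferAmount discussion in this section can be replayed on a small executable model. The Python sketch below is only an illustration: the Bank record, the two implementations and the sample data are assumptions made for the example, and the checks are run on one concrete state rather than proven for all states. Both implementations satisfy the postcondition, but only the first one also satisfies the frame conditions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Bank:
    """Hypothetical program state: two finite maps keyed by account id."""
    owner: Dict[int, str]
    available: Dict[int, int]


def transfer_minimal(s: Bank, owner_id: str, id1: int, id2: int,
                     amount: int) -> Bank:
    # The precondition is assumed to hold; only the two balances change.
    available = dict(s.available)
    available[id1] -= amount
    available[id2] += amount
    return Bank(owner=dict(s.owner), available=available)


def transfer_intrusive(s: Bank, owner_id: str, id1: int, id2: int,
                       amount: int) -> Bank:
    # Same balance updates, but the ownership of both accounts changes too.
    t = transfer_minimal(s, owner_id, id1, id2, amount)
    owner = dict(t.owner)
    owner[id1] = owner[id2] = "someone-else"
    return Bank(owner=owner, available=t.available)


def postcondition(pre: Bank, post: Bank, id1: int, id2: int,
                  amount: int) -> bool:
    return (post.available[id1] == pre.available[id1] - amount
            and post.available[id2] == pre.available[id2] + amount)


def frame_condition(pre: Bank, post: Bank, id1: int, id2: int) -> bool:
    owners_kept = all(post.owner[i] == pre.owner[i] for i in pre.owner)
    others_kept = all(post.available[i] == pre.available[i]
                      for i in pre.available if i not in (id1, id2))
    return owners_kept and others_kept


before = Bank(owner={1: "alice", 2: "alice", 3: "bob"},
              available={1: 100, 2: 50, 3: 70})
for impl in (transfer_minimal, transfer_intrusive):
    after = impl(before, "alice", 1, 2, 30)
    print(impl.__name__,
          postcondition(before, after, 1, 2, 30),
          frame_condition(before, after, 1, 2))
# transfer_minimal True True
# transfer_intrusive True False
```

Note that frame_condition has to enumerate everything that must stay the same; with more fields in the state, it grows accordingly, which is the specification burden described above.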

1.3 Prove & Run: Objectives and Products

The proliferation of mobile devices with unprecedented processing power, storage capacity and access to information has already generated a plethora of new possibilities for billions of people. Breakthroughs in emerging technology stemming from fields such as artificial intelligence and the Internet of Things have increased the number of such possibilities, but have also brought forth an unprecedented number of massive security risks and challenges. Prove & Run's² objective is to offer solutions for the security challenges entailed by the large-scale deployment of mobile and connected devices and of the Internet of Things.

Attempts at addressing security challenges and diminishing or eliminating potential security issues in systems linked to such devices must put their underlying operating systems and kernels at the core of their efforts to ensure the absence of errors or faulty behaviours. Any software running on the operating system depends on the operating system. Furthermore, operating systems run in privileged modes, in which protection from certain faulty behaviours is non-existent and bugs can lead to arbitrary effects. Therefore, these central software parts need to provide a high level of trust and demonstrate proven and auditable compliance with security properties.

Motivated by the desire to integrate the usage of formal methods in the industrial world, and therefore to contribute to the increase of software quality and security, the company's initial efforts concentrated on offering a reliable software solution that facilitates the formalization of software functioning and mathematically proves that this software accurately and correctly follows its specification and ensures complex security properties. This led to the development of ProvenTools, a software development toolchain designed to write and formally prove models written in Smart, Prove & Run's purely functional, unified programming and specification language. For formally proving models written in Smart, ProvenTools integrates an interactive proof assistant, which automates simple proofs and guides or assists users during more complex ones. The prover was designed to offer detailed explanations about its results, providing either the reasoning steps employed for achieved proofs or detailed information for properties that cannot be proven. Such transparency on the prover's side is imperative for products that have to be certified, as auditors need to be able to verify the claims of the prover. Furthermore, ProvenTools includes a generator for transforming programs modeled in Smart into their equivalents in other languages such as C, while leveraging the proof guarantees of the Smart model.

Following the development of ProvenTools, Prove & Run reached a new stage, concentrating on developing and providing formally proven microkernels and hypervisors. Unlike the widely used operating systems, which are enormous and typically have millions of lines of code, microkernels are compact, minimal software systems that can provide all the mechanisms that need to run in privileged mode, including low-level address space management, thread management and inter-process communication. They can be used for creating a protected, secure environment on the execution platform, on top of which sensitive, security-critical services can run. Being much smaller in size compared to traditional operating systems, they are amenable to formal verification. Hypervisors, or virtualization platforms, create and host virtual machines. They create the possibility of running multiple different operating systems whose execution is managed by the hypervisor, which has full control over all critical resources, such as the memory or the CPU. Therefore, any security issue of the hypervisor impacts every operating system it hosts. The security and reliability of the host hypervisor are thus crucial.

² Prove & Run Website: http://www.provenrun.com

By employing Smart and ProvenTools, two microkernels have been developed³. The first, named ProvenCore, is a formally proven general-purpose microkernel that ensures isolation, i.e. integrity and confidentiality. The second, named ProvenCore-M, targets embedded devices based on microcontrollers. ProvenVisor is a hypervisor currently in development at Prove & Run.

1.4 Context and Problem Statement

During the development of ProvenCore, it became obvious that the specification and verification of transition systems in general, and operating systems in particular, are not insulated from the frame problem. The latter are characterized by complex states, defined by algebraic data types and associative arrays, which are fundamental building blocks for representing, grouping and handling complex data efficiently. Transitions, their other characteristic component, map such a complex input state to an output state. However, transitions are rarely concerned with the entire input state that they are manipulating for retrieving the output state. Most frequently, they depend on and modify only a limited subset of it. Intuitively, properties holding for the input state should hold for the output state following the transition as well, as long as they depend only on fragments of the state that are not modified by the transition. In practice, proving the preservation of such properties does not come for free and imposes considerable manual effort and a multitude of tedious, repetitive proofs.

Figure 1.1 – Complex Transition Systems: Frame Problem

³ Prove & Run Products: http://www.provenrun.com/products


This general case is illustrated in Figure 1.1, where a transition system and a state s in it are considered. For the state s, a property depending only on a limited subset, shown in the grey rectangle with vertical lines, is known to hold. A transition f leads to a new state t, obtained by modifying only a small part of the input state s, shown in the orange rectangles with inclined lines. Since the previously proven property is known to depend only on an unmodified subset of the state, we should be able to infer the preservation of the property for the state t as well. This, however, is not inferred by default.

The goal of this work is to address this issue and to find an automatic solution for inferring the preservation of such properties. More specifically, we target the automatic inference of the preservation of properties that depend only on an input subset that is disjoint from an operation's frame, i.e. the state subset it modifies.

To this end, we propose a solution based on static analysis, which does not require any additional frame annotations. We argue that by detecting the subset on which a property depends and by uncovering the part that is not modified by an operation, as shown in Figure 1.2, we can automatically discharge proof obligations related to unmodified parts. We employ two different static analyses for this goal.

Figure 1.2 – Frame Problem and Solution Strategy

The first analysis of our two-step strategy is a dependency analysis, which is meant to detect the input subset δ on which the outcome of an operation or of a logical property L relies. This was illustrated by the grey rectangle with vertical lines in Figure 1.1. The second one is a correlation analysis, meant to detect the subset ξ modified by an operation O. This was illustrated by the orange rectangles with inclined lines in Figure 1.1. By employing these two static analyses, thus detecting δ and ξ automatically, and by subsequently reasoning based on their combined results, we can infer the preservation of the property L for the post-state of O.
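As a rough illustration of the final reasoning step only, the sketch below assumes that δ and ξ have already been computed and are represented as sets of dotted access paths into the state. This representation, as well as the names Path, overlaps and preserved, are hypothetical and chosen for the example; the abstract domains actually used by the two analyses are richer.

```python
from typing import FrozenSet

# Hypothetical representation: a dotted access path into the state,
# e.g. "procs.current" denotes field `current` of field `procs`.
Path = str


def overlaps(p: Path, q: Path) -> bool:
    """Two paths overlap when one is a prefix of the other,
    i.e. they denote the same or a nested substructure."""
    a, b = p.split("."), q.split(".")
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def preserved(delta: FrozenSet[Path], xi: FrozenSet[Path]) -> bool:
    """Property L is preserved by operation O if nothing L depends on
    (delta) overlaps with anything O may modify (xi)."""
    return not any(overlaps(d, m) for d in delta for m in xi)


# L reads only the memory pages, O modifies only the current process.
independent = preserved(frozenset({"memory.pages"}), frozenset({"procs.current"}))
# L reads the whole `procs` field, O modifies a part of it.
dependent = preserved(frozenset({"procs"}), frozenset({"procs.current"}))
```

When the check fails, nothing can be concluded: the property may still be preserved, but it has to be proven by other means.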

We target the development of a proof tactic that relies on our solution based on static analysis and that is meant to be integrated into the interactive proof assistant offered by ProvenTools. Smart, the language to which the ProvenTools toolchain is associated, is a purely functional language manipulating immutable algebraic data structures and associative arrays.


The motivation and ideas behind this work were triggered by the verification of ProvenCore. Its proof is based on multiple refinements between successive models, from the most abstract, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. The global states of the abstract layers are complex structures with multiple compound fields. Commands such as fork, exec and exit can be executed. Each of these receives as input the global state before executing the command and returns the state of the system after execution. In practice, most supported commands affect only a limited number of invariants. Automatically proving the preservation of unaffected invariants can diminish the total number of proof obligations.

1.5 Contributions and Structure of the Document

We propose an approach for automatically inferring the preservation of framing-related invariants, which is meant to be used in the context of an interactive theorem prover. Our approach employs two different static analyses, namely a dependency analysis and a correlation analysis. Both analyses handle associative arrays and algebraic data types, i.e. structures and variants, and compute fine-grained results mirroring the layered structures of such types.

The dependency analysis handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural analysis. For variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario.

In order to introduce a relaxed form of context-sensitivity for our dependency analysis, we have devised an extension based on symbolic paths.

The correlation analysis detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. It is an interprocedural analysis that summarises the behaviour of functions and detects what is modified and to what extent.

For both analyses, a prototype has been implemented and applied to a medium-sized functional specification of a microkernel.

The rest of this dissertation is structured into 8 chapters, the first two being introductory.

Chapter 2 discusses the manifestations and effects of the frame problem on both formal specification and formal verification, and presents some of the main approaches employed for addressing them. We also include a brief presentation of some of the leading specification languages and deductive verification tools and their mechanisms for dealing with frame properties.

In Chapter 3 we introduce the features and the syntax of Smart, the unified programming and specification language developed at Prove & Run, and give a concise overview of ProvenTools, the toolchain associated with it.


After these two preliminary chapters, in Chapter 4 we focus on the computational version of Smart's intermediate language, as it is the language that we consider throughout the rest of this dissertation. We present its syntax, underline its specificities and present its formal semantics.

Chapter 5 is dedicated to the dependency analysis, the first of the two static analyses that we have developed and designed as companion tools to be used during interactive program verification. We present our abstract dependency domain, which mirrors the layered structure of associative arrays and algebraic data types, discuss the analysis at an intra- and interprocedural level, and present the semantic interpretations of the computed dependency information.

Chapter 6 touches upon the issue of context-sensitivity and presents our extension to the dependency analysis presented in Chapter 5. This is meant to eliminate some imprecision by introducing a relaxed form of context-sensitivity.

The correlation analysis, the second component of our strategy for inferring the preservation of frame-related invariants, is presented in Chapter 7. We introduce our abstract partial equivalence type, discuss the need for an additional level of abstraction allowing us to refer not only to variables but also to substructures within them, and give an in-depth presentation of the analysis at an intraprocedural level and a description of it at the interprocedural level.

The implementations of our two analyses and the results obtained on a medium-sized functional specification of a microkernel are presented in Chapter 8. The strategy for employing the information computed by the two analyses is discussed and illustrated.

Finally, Chapter 9 concludes this dissertation with a summary of our contributions and some remarks concerning the specificities of each of our static analyses, as well as our experience with their design and implementation. In addition, we also discuss future perspectives and potential extensions to this work.

Notes about Chapter 5 and Chapter 7

• The work presented in Chapter 5 was the subject of a publication in the proceedings of the 17th International Conference on Formal Engineering Methods (ICFEM'15) (Andreescu, Jensen, and Lescuyer, 2015).

• The work presented in Chapter 7 was the subject of a publication in the proceedings of the 14th International Conference on Software Engineering and Formal Methods (SEFM) (Andreescu, Jensen, and Lescuyer, 2016).

• On-line dedicated web pages. The prototypes for each of the two discussed static analyses can be tested on their dedicated web pages. Various examples are provided and explained and, additionally, users can devise and test their own examples. The corresponding links are indicated in the chapters.


Chapter 2

The Frame Problem in Software Verification

All his successors gone before him have done't, and all his ancestors that come after him may.

William Shakespeare

In this chapter, in Section 2.1, we give a very brief, necessarily incomplete presentation of some of the major existing specification languages and verification tools, focusing on those which have addressed the frame problem explicitly and which are relevant for our discussion in the section following it. We then discuss the manifestations of the frame problem in formal specification and verification in Section 2.2, and present the basic approaches to specifying and verifying frame properties in Section 2.3. In Section 2.4 we explain some of the difficulties entailed by these goals when combined with other concerns, such as considerations regarding heap modifications and information hiding. Even though we are not concerned with information hiding, and heap modifications are beyond the scope of our work, there are some parallels that can be drawn and some ideas stemming from work that has been done in these areas that are relevant for our context and solution as well. In Section 2.5 we briefly present other approaches to the automatic detection of frame properties. Finally, we give a short overview of some of the approaches used for specifying and reasoning about pure methods in Section 2.6.

2.1 Specification Languages and Verification Tools

Dafny. Dafny (Leino, 2010) is a programming language designed at Microsoft with a focus on verification. It is an imperative, sequential language supporting generic classes, dynamic allocation and inductive data types. Additionally, it also offers built-in specification constructs, such as pre- and postconditions, frame specifications (which we will discuss in more detail in Section 2.3), quantifiers, loop invariants and termination metrics (decreases clauses used in conjunction with loop invariants). These are reminiscent of contracts in Eiffel (Meyer, 1997; Meyer, 1991) or similar constructs in JML (Leavens, Baker, and Ruby, 2006) and Spec# (Barnett et al., 2005b), which we will present in the following paragraphs as well. Additionally, Dafny also includes support for algebraic data types, recursive functions and types, as well as updatable ghost variables, which are not allowed to flow into non-ghost variables. Ghost variables, and specification constructs in general, are eliminated from the executable code, as they are meant to be used strictly during verification. For framing, Dafny relies on dynamic frames (Kassios, 2006) using ghost variables. We will discuss this approach in Section 2.4.

Dafny has an accompanying static program verifier, run as part of the compiler, which targets the verification of functional correctness properties of programs. This is built on top of the Boogie verification engine (Barnett et al., 2005a), which in turn uses Z3 (Moura and Bjørner, 2008). The Dafny compiler translates verified programs written in Dafny to executable code for the .NET Platform. The tool is open source and can be tried online¹.

Smart, the modeling language developed at Prove & Run, will be presented in detail in Chapter 3. Similar to Dafny, it is a unified programming and specification language designed with the goal of facilitating verification. Unlike Dafny, Smart is a functional language relying on predicates, the equivalent of functions in other programming languages. Both Dafny and Smart are translated into intermediate languages (Boogie and Smil, respectively), which act as median layers between Dafny or Smart programs and the underlying verification tools. For Smart, the deductive verification tool is an interactive proof assistant. Executable code can be generated from both verified Dafny and verified Smart models.

Spec#. The Spec# programming system (Mike Barnett, 2005; Barnett et al., 2005b; Barnett et al., 2011) includes a programming language, a compiler and a static program verifier. It stems from a research effort focusing on the development of a specification methodology for object-oriented languages and seeking suitable approaches for enforcing it, both statically and dynamically. The Spec# methodology introduced some new ideas that influenced the research community and served as a starting point for other approaches (Barnett et al., 2011). It supports sound modular verification of object invariants in the presence of multi-object invariants, subclassing and reentrancy. Spec# led to advances concerning the specification of pure methods, i.e. methods without side-effects, and it introduced an ownership model that allows expressing and using heap topologies in specifications (Barnett et al., 2011). We will discuss the latter in Section 2.4.

The language Spec# is a formal object-oriented language extending the type system of C# with non-null types and checked exceptions. It provides standard method contracts based on pre- and postconditions, as well as object invariants, as inspired by Eiffel and the Design by Contract (Meyer, 1992) approach. The accompanying compiler performs various static data-flow analyses for checking that the non-null type system is enforced and that contracts are pure, i.e. have no side-effects. In addition, it also performs admissibility checks, which are important for soundness and consist in restricting what can appear in object invariants and what pure methods can read. The compiler also emits runtime checks: run-time assertions are generated for the program points at which contracts are supposed to hold, and any failure causes an exception to be thrown (Barnett et al., 2011).

¹ Dafny Web Page: https://www.microsoft.com/en-us/research/project/dafny-a-language-and-program-verifier-for-functional-correctness Accessed 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oE9sn0iL)

Another important contribution having its origins in the Spec# project is the Boogie intermediate language and verification engine. Spec# programs are translated to the Boogie language, where the heap is modeled as a two-dimensional array indexed by object references and field names. Method calls are modeled by assuming their preconditions and type information, by assigning arbitrary values to anything that they might modify, and by subsequently assuming their postconditions. Based on this, verification conditions are generated and expressed in a standard format supported by automatic theorem provers. Any error reported by the theorem prover is mapped back to Boogie and then to Spec# (Barnett et al., 2011).
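The call encoding described above can be mimicked, very loosely, by an enumerative sketch in Python. It is not Boogie and does not reflect how verification conditions are actually generated: the heap is a dictionary keyed by (object, field) pairs, "arbitrary values" are drawn from a small finite domain supplied by the caller, and "assuming" a condition simply discards the outcomes that violate it. The function name model_call and all parameters are hypothetical.

```python
from itertools import product


def model_call(heap, pre, modifies, post, domain):
    """Enumerate the post-heaps compatible with a call, following the
    three steps described in the text."""
    # 1. Assume the precondition: if it fails, no execution is considered.
    if not pre(heap):
        return []
    locations = sorted(modifies)
    outcomes = []
    # 2. Assign arbitrary values (here: any value of `domain`) to every
    #    location the callee might modify; other locations are untouched.
    for values in product(domain, repeat=len(locations)):
        candidate = dict(heap)
        candidate.update(zip(locations, values))
        # 3. Assume the postcondition, relating the old and the new heap.
        if post(heap, candidate):
            outcomes.append(candidate)
    return outcomes


heap = {("acc", "balance"): 5, ("acc", "owner"): 1}
outcomes = model_call(
    heap,
    pre=lambda h: h[("acc", "balance")] >= 0,
    modifies={("acc", "balance")},
    post=lambda old, new: new[("acc", "balance")] == old[("acc", "balance")] + 1,
    domain=range(10),
)
```

Every location absent from the modifies set keeps its value in all outcomes, which is exactly the framing information a caller gets for free from this encoding.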

Spec#² has been developed at Microsoft and is publicly available.

Boogie. The Boogie project³ comprises both an intermediate verification language and a verification tool. The Boogie language (This is Boogie 2; Boogie Reference Manual) is meant to be used as an intermediate representation for static program verifiers of various source languages, such as Dafny, Chalice and Spec#. Verifiers for C, such as VCC and HAVOC, have been built on top of Boogie as well. It supports mathematical (types, constants, functions, axioms) and imperative components (global variables, procedure declarations and implementations). The latter specify sets of execution traces, thereby describing and constraining states using the former. Parametric polymorphism, partial orders, nondeterminism, logical quantifications, total expressions and partial statements are among the language's features.

The Boogie verification tool (Barnett et al., 2005a) infers invariants of the input Boogie programs and then generates verification conditions, expressed as formulae in first-order logic and arithmetic, that are passed to an SMT solver such as Z3. The encoding for the verification formulae allows the reconstruction of error traces from failed proofs.

JML. The Java Modeling Language (JML) (Leavens, Baker, and Ruby, 2006; Leavens et al., 2006) is a behavioural interface specification language (Wing, 1987) targeting, as its name implies, the specification of Java classes and interfaces. Its design was guided by the syntax and semantics of Java, as some of the main targeted characteristics were understandability and a shallow learning curve for programmers already familiar with Java. The constructs it supports are inspired by the Design by Contract approach, as well as by the Larch family of specification languages (Guttag, Horning, and Wing, 1985). It also includes quantifiers, constructs for specifying frame conditions, and specification-only fields and methods.

² Spec# Web Page: https://www.microsoft.com/en-us/research/project/spec Accessed 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAJnY8b)

³ Boogie Web Page: https://www.microsoft.com/en-us/research/project/boogie-an-intermediate-verification-language Accessed 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAgwOzp)

Nowadays, an ever-growing variety of tools supports JML (Burdy et al., 2005), ranging from tools for type-checking specifications (the jmlc compiler) to tools for runtime debugging, static analysis (such as ESC/Java2 (Flanagan et al., 2002; Burdy et al., 2005; Chalin et al., 2005) and Chase) and verification (such as LOOP, KeY and KRAKATOA).

ESC/Java2 performs extended static checking (Flanagan et al., 2002) for Java programs annotated with specifications written in JML. It can check assertions and detect frequent types of errors in Java, such as dereferencing null or indexing an array outside its bounds. However, the ESC/Java2 tool did not initially address aspects related to checking frame conditions, and this became a notorious source of unsoundness (Burdy et al., 2005). Various static verification tools (Berg and Jacobs, 2001; Cataño and Huisman, 2003; Marché, Paulin-Mohring, and Urbain, 2004; Marché, 2016) and dynamic approaches (Lehner and Müller, 2010) addressed this issue.

2.2 Manifestations of the Frame Problem

In the realm of software verification, the frame problem refers to establishing the boundaries within which program elements operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties or frame conditions, which indicate which parts of the program state an operation is allowed to modify, and their verification, i.e. proving that operations modify only what is allowed according to the specified frame properties. Additionally, the verification of frame properties has other ramifications, such as proving the preservation of properties concerning parts of the state that are external to an operation's frame, i.e. the parts of the state modified by the operation. Though identified decades ago, in 1969, in the context of Artificial Intelligence (McCarthy and Hayes, 1969), the frame problem is still a current concern in the field of formal specification and verification. Leavens et al. (Leavens, Leino, and Müller, 2007) identify it as one of the difficult remaining challenges in program verification. Even more recently, Bertrand Meyer described it as a subsisting problem (Meyer, 2015). He argues that it constitutes an excellent candidate for automation and describes the usual approaches to the frame problem, such as those frequently based on separation logic (Reynolds, 2005) or ownership types (Clarke, Potter, and Noble, 1998), as elegant but requiring undeserved manual specification effort, in addition to annotations on the implementation side. In order to make verification appealing to a wider audience in the industry, the amount of annotations required from the programmers is of the utmost importance and thus must be carefully taken into consideration when devising a solution. While it is legitimate to require the specification of properties expressing the functional behaviour expected of program elements, intermediate properties, to which frame properties belong, should as much as possible be detected automatically. They are an integral part of a complete specification and they are necessary for proving functional correctness, but in practical terms they are repetitive and cumbersome, and their specification is an inconvenience (Meyer, 2015).

Borgida et al. provide a comprehensive discussion of the problem itself and the approaches to addressing it (Borgida, Mylopoulos, and Reiter, 1993; Borgida, Mylopoulos, and Reiter, 1995). In (Borgida, Mylopoulos, and Reiter, 1995), Borgida et al. suggest grouping the permissions to modify variables around variables themselves instead of methods. However, this type of specification has unclear semantics in terms of proof obligations (Müller, 2002). A more recent discussion of framing is provided by Hatcliff et al., and it is included in a comprehensive survey of behavioural interface specification languages (Hatcliff et al., 2012). A discussion regarding the remaining challenges related to the frame problem, with a focus on modular verification and information hiding, is included in (Leavens, Leino, and Müller, 2007). The authors discuss possible approaches for addressing these challenges, as well as their respective limitations. In the following section we present the main existing approaches to specifying frame properties.

We remark that Smart does not provide any explicit specification constructs for frame conditions. It is a functional language and it does not support global variables or destructive updates. Implicitly, Smart predicates may read anything passed to them as an input, without modifying it, and write everything in their output or locally declared variables. The preservation of a frame property, i.e. a logical property depending only on parts of the input that are copied without any modification to the output, can be specified as an implication of the form

frame_property(input) ⇒ predicate(input, output) ⇒ frame_property(output)

which can be included either in the predicate's postcondition or as a separate predicate with a Boolean result, receiving the predicate's input and output elements as inputs.
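To make the shape of this implication concrete, here is a small Python sketch, given purely as an illustration and not as Smart code. States are dictionaries, the predicate is modelled as a Boolean relation between an input and an output state, and the names implies, frame_preserved, owner_is_alice and increments_counter are hypothetical. The check evaluates the implication on one concrete pair of states; it is a test, not a proof.

```python
def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def frame_preserved(frame_property, predicate, inp, out) -> bool:
    """frame_property(inp) => predicate(inp, out) => frame_property(out)"""
    return implies(frame_property(inp),
                   implies(predicate(inp, out), frame_property(out)))


# Hypothetical frame property: depends only on the `owner` field.
def owner_is_alice(state) -> bool:
    return state["owner"] == "alice"


# Hypothetical predicate, seen as a relation: the output is the input
# with `counter` incremented and every other field copied unchanged.
def increments_counter(inp, out) -> bool:
    return out == {**inp, "counter": inp["counter"] + 1}


before = {"owner": "alice", "counter": 0}
after = {"owner": "alice", "counter": 1}
hijacked = {"owner": "bob", "counter": 1}

holds = frame_preserved(owner_is_alice, increments_counter, before, after)
# Vacuously true: (before, hijacked) is not in the relation.
vacuous = frame_preserved(owner_is_alice, increments_counter, before, hijacked)
# A relation that says nothing about `owner` does not preserve the property.
broken = frame_preserved(owner_is_alice, lambda i, o: True, before, hijacked)
```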

2.3 Approaches to Specifying Frame Properties

Various approaches for expressing frame properties have emerged. These are known as the manual, exclusive and implicit approaches (Meyer, 2015). We remark that all three major approaches target only the specification of write effects of an operation. Most specification languages do not offer special constructs for the specification of read effects (some notable exceptions are JML, Dafny and WhyML, the programming and specification language provided by Why3).

2.3.1 The Manual Approach

One of the existing approaches to specifying frame properties does not rely on any specific technique, but instead treats them like any other specification component. This consists in explicitly stating, for each operation, what is not modified, implicitly conveying that everything else may change. This type of specification can be done with logical variables or with old expressions, by explicitly stating for each unchanged variable that its value in the operation's post-state is equal to its prior value in the operation's pre-state.

As described by McCarthy and Hayes (McCarthy and Hayes, 1969), with m operations, such as transfer, and n "fluents", such as owner in our introductory example from Section 1.2, the manual convention leads to a proliferation of clauses that need to be specified. Their number can potentially be as high as m × n. This can prove to be tedious and repetitive, diverting attention and effort from what is truly interesting: what is actually modified by the operation, and how. Moreover, this approach can lead to instability in the software process (Meyer, 2015).

For instance, adding new fields to a class whose existing methods are not affected by the newly added fields requires modifying the postcondition for each existing method and adding clauses of the form newField = old newField for each added field.
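The growth in the number of clauses can be made tangible with a short Python sketch. The operation and field names below are invented for the example (only transfer and owner echo names used in the text), and the function manual_frame_clauses is hypothetical: it merely lists, for each operation, the clauses a specifier would have to write by hand.

```python
def manual_frame_clauses(fields, writes):
    """For each operation, list one `f = old f` clause per field that the
    operation leaves unchanged. `writes` maps an operation name to the
    set of fields it modifies."""
    return {
        op: [f"{f} = old {f}" for f in fields if f not in modified]
        for op, modified in writes.items()
    }


def total(clauses):
    return sum(len(cs) for cs in clauses.values())


fields = ["owner", "balance", "limit"]                     # n = 3 fluents
writes = {"transfer": {"balance"}, "rename": {"owner"}}   # m = 2 operations

clauses = manual_frame_clauses(fields, writes)
count_before = total(clauses)                              # bounded by m * n = 6

# Adding one field that no existing operation touches still forces an
# extra clause into the postcondition of every operation.
extended = manual_frame_clauses(fields + ["newField"], writes)
count_after = total(extended)
```

In this example the count goes from 4 to 6: one additional clause per existing operation, although none of them was changed.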

Both Dafny (Leino, 2010) and Spec# (Leino and Müller, 2008a) support clauses of the form e = old(e) in method postconditions for specifying that a method has no impact on the value of an expression e. However, these are not the primary mechanisms for specifying frames in either Dafny or Spec#, as we will discuss in Section 2.3.2.

In Smart, for predicates manipulating inputs and outputs of the same structured type, it can be specified in the postcondition that the values of certain fields are equal between the received input and the obtained output. For instance, for a predicate receiving an input structure of type stype having fields f, g, h, and returning an output structure of the same type where the values of the fields f, h are equal to their values in the input, a standard postcondition would have the following form:

stypeequals[fh](input output)

This can be viewed as a form of old expressions. However, the construct used in the above postcondition, which we will discuss in Chapter 3, was not introduced specifically for this purpose. This idiom is frequently employed for specifying contracts for implicit predicates, a form of foreign or native function signatures.
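For readers unfamiliar with Smart, the intent of such a field-restricted equality can be approximated in Python as follows. This is only an analogy and does not reproduce Smart's construct or syntax: the class SType, the helper equals_on and the function update_g are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SType:
    """Immutable structure with three fields, mirroring f, g, h in the text."""
    f: int
    g: int
    h: int


def equals_on(fields, a, b) -> bool:
    """Equality restricted to the listed fields."""
    return all(getattr(a, name) == getattr(b, name) for name in fields)


def update_g(s: SType, value: int) -> SType:
    """Returns a copy of `s` in which only field `g` is replaced."""
    return replace(s, g=value)


inp = SType(f=1, g=2, h=3)
out = update_g(inp, 9)

# Plays the role of the postcondition: f and h are equal in input and output.
post_holds = equals_on(("f", "h"), inp, out)
```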

As we will discuss in Chapter 7, the fine-grained relations that we are detecting between parts of the input and parts of the output can be seen as clauses of the form subvalue = old(subvalue). However, in our case these are detected automatically by means of static analysis and thus do not require any annotation or manual effort. Furthermore, by detecting them automatically, the potential of changes to the modeled entities and types leading to instability is eliminated.

Another problem with this approach becomes visible when some variables are not in scope and hence cannot be explicitly mentioned in the specification (Hatcliff et al., 2012). In order to overcome the problem in this context, complex solutions (Reynolds, 1981; O'Hearn, Reynolds, and Yang, 2001; Banerjee, Naumann, and Rosenberg, 2008) based on Hoare logic style frame rules (Hoare, 1971) have been suggested (Hatcliff et al., 2012).


2.3.2 The Exclusive Approach

The most frequent approach to framing is the exclusive approach. This consists in expressing frame properties by means of modifies clauses that list all the variables that may be modified by an operation. Implicitly, everything that is not listed in such clauses is understood as having to remain unchanged (Guttag et al., 1993a). This approach relies on the observation that the m × n matrix described by McCarthy and Hayes is usually sparse, as most operations affect only a limited number of elements (Meyer, 2015).

Modifies clauses such as modifies a, b, c can be interpreted as a set of clauses of the form q = old(q) for any q other than a, b or c. Despite its widely accepted, yet mildly misleading name, a modifies clause does not require a command to modify all the listed elements. Essentially, modifies clauses put an upper bound on the set of elements that can be modified and imply that it is strictly forbidden to modify anything else. The exclusive approach to specifying frame properties owes its name to its characteristic of identifying unaffected elements by exclusion (Meyer, 2015). Bertrand Meyer argues that a more appropriate name for such clauses is only clauses (Meyer, 2015), since the main goal is not necessarily to enumerate variables that will change, but rather to specify that everything else, i.e. variables that are not listed, will not change.
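The reading of a modifies clause as an upper bound can be illustrated with a few lines of Python. The sketch assumes that states are flat dictionaries from variable names to values, which is a deliberate simplification; the function name respects_modifies is hypothetical.

```python
def respects_modifies(pre, post, modifies) -> bool:
    """A pre/post pair respects `modifies a, b, c` when q = old(q) holds
    for every variable q that is not listed."""
    return all(post[q] == pre[q] for q in pre if q not in modifies)


pre = {"a": 1, "b": 2, "c": 3, "d": 4}

# Modifying only a subset of the listed variables is allowed ...
subset_ok = respects_modifies(pre, {**pre, "a": 10}, {"a", "b", "c"})
# ... and so is modifying nothing at all: the clause is an upper bound.
nothing_ok = respects_modifies(pre, dict(pre), {"a", "b", "c"})
# Touching an unlisted variable violates the clause.
unlisted_bad = respects_modifies(pre, {**pre, "d": 0}, {"a", "b", "c"})
```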

This approach has its roots in the modifies construct presented by Liskov and Guttag (Liskov and Guttag, 1986). Forms of modifies clauses have been used in many different specification languages, including the Larch family (Guttag, Horning, and Wing, 1985; Guttag et al., 1993a), JML (Leavens et al., 2006), Spec# (Mike Barnett, 2005), Dafny (Leino, 2010) and Z (Abrial, Schuman, and Meyer, 1980).

In JML (Leavens, Baker, and Ruby, 2006), modifies clauses are called assignable clauses and are used for indicating locations that a method may assign to. These are slightly different from classical modifies clauses in other languages. For instance, a method assigning to a location a and then re-establishing its original value is required to list a in its corresponding assignable clause. A typical modifies clause, however, does not require listing a, since the method does not modify a effectively. JML also features conditional modifies clauses, allowing methods to specify that a modification may occur only in certain situations. Non-pure methods that do not explicitly specify assignable clauses are by default given an assignable \everything clause. Pure methods have by default an assignable \nothing clause (Chalin et al., 2005). Additionally, JML provides accessible clauses that allow specifying accessed locations (Leavens et al., 2006).
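The difference between the two interpretations can be observed on a toy example. The Python sketch below is not JML: it uses a hypothetical TrackedState class that records every assignment, so that the set of assigned locations (the assignable-style view) can be compared with the set of locations whose final value differs from the initial one (the classical modifies-style view).

```python
class TrackedState:
    """Mutable state that remembers which locations were assigned to."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self.assigned = set()

    def get(self, name):
        return self._fields[name]

    def set(self, name, value):
        self.assigned.add(name)
        self._fields[name] = value

    def snapshot(self):
        return dict(self._fields)


def bump_and_restore(state):
    """Assigns to `a` temporarily, then re-establishes its original value."""
    original = state.get("a")
    state.set("a", original + 1)
    state.set("a", original)


def net_changes(pre, post):
    """Locations whose value differs between the pre- and post-state."""
    return {name for name in pre if pre[name] != post[name]}


state = TrackedState(a=1, b=2)
pre = state.snapshot()
bump_and_restore(state)
post = state.snapshot()

assigned_locations = state.assigned          # assignable-style view
changed_locations = net_changes(pre, post)   # classical modifies-style view
```

Here `a` must appear in an assignable-style clause although no net change is observable between the pre-state and the post-state.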

In Dafny (Leino, 2010), modifies clauses are expressed by sets of objects, and they must be interpreted as giving permissions to a method to modify any field of any object that is a member of the specified set. Frame conditions are thus expressed at the level of objects and not at the level of object fields. While Dafny methods are not required to specify what they read, for Dafny predicates, i.e. functions returning Booleans, reading frame conditions can also be specified (Koenig and Leino, 2012). These are memory locations that predicates are allowed to read, and they can be specified as sets of objects or object fields. Dafny checks that memory locations outside the reading frame are not accessed: nested predicate calls must have reading frames that are included in the reading frames of the calling predicate. Predicate parameters are not memory locations and hence must not be declared. In addition, Dafny uses a form of dynamic frames (Kassios, 2006) that we will present in Section 2.4.

In Spec# (Mike Barnett 2005; Leino and Müller 2008a), modifies clauses can be explicitly added for constraining the modification of objects that were allocated in the pre-state of a method, i.e., new objects allocated and modified by a method need not be included in the modifies clauses. Methods can specify that any field of an object o may be modified, with a construct of the following form: o.*; it can also be specified that only some field a may be modified, with a construct of the form o.a. Unlike the clauses expressed using old in postconditions for excluding some modifications, modifies clauses must account for temporary modifications as well (similarly, thus, to the JML assignable clause interpretation). For instance, for a method decrementing some integer field f and incrementing it subsequently, the method could still specify that f = old(f) in its postconditions. However, it would also have to include f in its modifies clause.

Spec# implicitly adds a modifies clause to methods, in which this.* is the only listed element. Thus, by default, methods are allowed to modify any field of the this object. To prevent this, the fields that may be modified must be explicitly included in the clause (meaning that those not included are not allowed to change). A special construct of the form this.0 must be explicitly used for specifying that a method does not modify any field of this (Leino and Müller 2008a).

Information hiding imposes mechanisms for abstracting over program state that cannot be explicitly mentioned in the modifies clause of a public method. To this end, wildcards can be used for specifying that the private representations of objects may be modified, as well as for specifying the modification of state in subclasses (Leino and Müller 2008a). However, wildcards do not extend to aggregate objects, and to this end Spec# introduces the notion of ownership that we will discuss in Section 2.4.

In Boogie, frame conditions are expressed using coarse-grained modifies clauses in conjunction with postconditions. These can quantify over fields and specify locations of the heap that may be modified (This is Boogie 2 Boogie Reference Manual).

SPARK (Barnes and Limited 1997) uses a variation of the typical exclusive approach. SPARK procedures may reference or update the state associated with their parameters, in addition to that of global variables. SPARK contracts must explicitly account for the global variables accessed (read or written) during procedure execution in a globals construct. Additionally, for each parameter or global variable, it must be indicated whether it is read only, written only, or both read and written. As SPARK is based on the Ada language, this is done by means of mode annotations such as in and out, indicating that a parameter or global variable is read only or written only, respectively. The in out annotation is used for signaling that the annotated parameter or global variable is both read and written. Together, mode annotations on parameters and globals provide a complete specification of the inputs and outputs of a procedure (Hatcliff et al. 2012). VDM (Jones 1990) provides similar annotations.


The exclusive convention facilitates the specification of pure operations, i.e., operations having no side-effects, on which assertions in various languages, including Eiffel, JML and Spec#, rely for supporting data abstraction. Specifying that an operation is pure simply amounts to specifying an empty modifies clause. However, specifying and verifying the effects of heap modifications on the results of pure methods has been described as one of the difficult remaining challenges related to framing (Hatcliff et al. 2012).

2.3.3 The Implicit Approach

The implicit approach eliminates the need to specify frame properties per se. One of the implicit approaches relies on limiting what a procedure can modify based on the procedure's precondition. This approach is adopted in separation logic (discussed in Section 2.4) and in the implicit dynamic frames (Smans, Jacobs, and Piessens 2012) technique, where reading and writing to memory requires knowing that the memory contains that location. To this end, accessibility information is specified in the preconditions of methods. By analysing preconditions, an upper bound on the set of locations that are modifiable by a procedure can be detected. As will be discussed in Chapter 7, our approach to inferring fine-grained modifications can be seen as an implicit one as well. It relies on data-flow analysis and it is entirely automatic, without requiring any dedicated annotations.

Another approach to implicit framing was presented by Meyer. He proposes the inference of frame properties for a method from the method's postcondition (Meyer 2015). This approach relies on the empirical observation that, in practice, when programmers realize that an element is modified by a method's execution, they will generally include and express information about how the element is modified. It was inspired by an informal review of publicly available JML code, which showed that, in practice, elements included in an assignable clause overlap those appearing in the method's postcondition. Meyer argues that any exception to this observation can be easily addressed by inserting a Boolean function into the postcondition, which always returns true and which introduces its elements into the implicit frame (Meyer 2015).

2.4 Topologies and Effects

Specification techniques for complex data structures and operations manipulating them must be able to describe and to address issues related to two different aspects, namely the topology or structure of the former and the effects of the latter on the data structures' state (Hatcliff et al. 2012). In the object-oriented realm, objects encapsulate state and functionality, yet their implementations are rarely limited to the fields and methods of a single object. After all, one of the principles of object-oriented programming is to favour composition over inheritance. Thus, object fields reference other objects, often of different classes, and those objects in turn reference yet other objects, and so on. In order to reason about and to prove functional correctness, specifications have to capture this "composite" shape of the implemented data structures (Leino and Müller 2008a). They also have to describe the effects of operations on the state of the data structures, including write effects, i.e., which parts are potentially modified by an operation, and read effects, i.e., which parts are potentially accessed by an operation (Hatcliff et al. 2012).

For objects and heap data structures, the write and read effects (Greenhouse and Boyland 1999) refer to parts of the heap, i.e., locations. Specifications for heap data structures might also require including allocation and deallocation effects, as well as locking information (Hatcliff et al. 2012). Detecting and reasoning about read and write effects is necessary and relevant in different situations. For instance, Greenhouse and Boyland (Greenhouse and Boyland 1999) present an effects system for performing semantics-preserving program manipulation on Java source code.

Our work is done in the context of a purely functional language with immutable data structures and no destructive updates. Reasoning about the heap is beyond our scope. However, our concerns are similar: we handle "composite" data structures modeled by immutable associative arrays and algebraic data types, i.e., structures and variants, and we want to capture the behaviour of operations receiving such a composite input, manipulating it, reconstructing it and returning its new state into a composite output. Thus, in contrast to specification and reasoning techniques for objects, which are concerned with deep-heap effects, we are concerned with deep-state effects.
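To make the notion of a deep-state effect concrete, the following Python sketch compares a composite input with the reconstructed output and reports the paths of the subelements that differ. It is only a dynamic illustration on one execution, assuming nested dictionaries as a stand-in for structures and associative arrays; it is not the static analysis developed in this thesis, and the names `changed_paths` and `tick` are hypothetical.

```python
def changed_paths(old, new, prefix=()):
    """Return the paths (tuples of keys) at which two nested values differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths = set()
        for key in old.keys() | new.keys():
            if key not in old or key not in new:
                paths.add(prefix + (key,))
            else:
                paths |= changed_paths(old[key], new[key], prefix + (key,))
        return paths
    return set() if old == new else {prefix}


def tick(state):
    """Rebuild the composite state with an updated clock; the input is untouched."""
    return {**state, "clock": state["clock"] + 1}


state = {"proc": {"pid": 1, "regs": {"r0": 0, "r1": 9}}, "clock": 3}
after = tick(state)
delta = changed_paths(state, after)
```

In this run, only the path to the clock field is reported, so any property that reads only the proc substructure is preserved.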

Specification techniques for topologies and effects must address three major challenges, namely abstraction, reasoning and framing (Hatcliff et al. 2012).

Abstraction. In the object-oriented context, heap properties must be expressed in an implementation-independent manner. Abstraction is important for information hiding and for supporting subtyping (Leino 1998; Leavens and Müller 2007). Aspects related to visibility and information hiding are orthogonal to our work. The language we are working with does not have subtyping. Therefore, disclosing the topology of our data structures is not problematic from this point of view.

Reasoning. The formal framework in which (heap) properties are expressed should allow efficient, ideally automatic, reasoning.

Framing. Specifications of heap operations should ease reasoning about framing and aid in proving that certain heap properties are not affected by a heap operation. Framing can be illustrated by the following rule, expressing that a state that is unmodified by C can be preserved:

$$\frac{\{P\}\; C\; \{Q\}}{\{P \land R\}\; C\; \{Q \land R\}}$$

if the write effect of C is disjoint from the free variables of R. In the presence of complex heap data structures, the disjointness of the effects of C and the assertion R is more difficult to express, as it needs to specify that the locations that are modified by C are disjoint from the locations read by R. Similarly, though not referring to locations, we have to be able to express that the substructures (or subelements) modified by C and those read by R are disjoint.
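The side condition of the rule can be sketched in Python as a plain set-disjointness check. The encoding below is an assumption made for illustration: the command C is modeled by the function `run_c` with a hand-declared write effect, and the assertion R by a predicate with a hand-declared set of free variables.

```python
def frame_applies(write_effect, free_vars_of_r):
    """Side condition of the frame rule: C writes nothing that R reads."""
    return write_effect.isdisjoint(free_vars_of_r)


def run_c(state):
    """The command C: x := x + 1, on an immutable-style state."""
    return {**state, "x": state["x"] + 1}


writes_c = {"x"}          # declared write effect of C


def r(state):
    """The assertion R: y > 0."""
    return state["y"] > 0


fv_r = {"y"}              # free variables of R

pre = {"x": 0, "y": 2}
post = run_c(pre)
```

Since the write effect and the free variables are disjoint, R holds after C whenever it held before; if C also wrote y, the check would fail and nothing could be concluded about R.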

The sets of written or read locations are called footprints. Hatcliff et al. classify approaches to the specification of heap properties into three categories. The first category relies on explicit footprints and uses sets of objects or locations that are included in predicates and effects specifications. Dynamic frames (Kassios 2006; Kassios 2011) and region logic (Banerjee, Barnett, and Naumann 2008; Banerjee, Naumann, and Rosenberg 2013) are the main exponents of this category. The second category relies on implicit footprints, which are derived from predicates in specialized logics such as separation logic. The third category relies on predefined footprints, which are derived from predefined heap topologies (Hatcliff et al. 2012). Ownership types (Clarke, Potter, and Noble 1998) are the main exponent of this category. All of these techniques allow specifying the topologies of common heap data structures and reasoning about the effects of operations. However, each amounts to a different balance between expressiveness and automation (Hatcliff et al. 2012).

2.4.1 Explicit Footprints

The explicit footprint approach to framing was pioneered by Kassios and the dynamic frame theory (Kassios 2006; Kassios 2011). This proposed adding sets of locations to the specification language and expressing footprints in terms of such sets. For preserving information hiding, these sets of locations can involve dynamic frames: specification variables that abstract over a set of locations. The initial solution based on dynamic frames was formalized in the context of an idealized logical framework, using higher-order logic and inductive-based proofs, which are difficult to automate. Subsequent work on region logic (Banerjee, Naumann, and Rosenberg 2008; Banerjee, Barnett, and Naumann 2008; Banerjee, Naumann, and Rosenberg 2013) and the Dafny verifier on one hand, and VeriCool (Smans, Jacobs, and Piessens 2008) on the other hand, developed dynamic frames in a first-order setting.

VeriCool uses pure methods for describing sets of locations. Recursively defined pure methods or logic functions can be a challenge for automatic theorem provers (Hatcliff et al. 2012; Banerjee, Barnett, and Naumann 2008).

In region logic, for minimizing the need for inductively defined predicates in specifications, the specification attributes used in the dynamic frames approach (Kassios 2006) are replaced with ghost state (Banerjee, Naumann, and Rosenberg 2013), i.e., mutable auxiliary fields and variables. Programs have to be explicitly annotated with these, which might imply a cumbersome manual effort, but unlike the dynamic frame theory in its original form, this permits automated theorem proving.

Zee et al. have used explicit footprints for verifying the functional correctness of linked data structures in Jahob (Zee, Kuncak, and Rinard 2008). Banerjee et al. (Banerjee, Naumann, and Rosenberg 2008; Banerjee, Barnett, and Naumann 2008) encoded region logic in the intermediate verification language Boogie (Leino and Rümmer 2010).


The dynamic frames approach using ghost variables is supported by the Dafny language (Leino 2010; Koenig and Leino 2012). As described in Section 2.3.2, Dafny supports the exclusive approach to specifying frames. Ghost variables are used in modifies clauses. The standard idiom consists in declaring a set-valued ghost field, Repr for instance, dynamically maintaining Repr (i.e., explicitly updating it in the code) as the set of objects that are part of the receiver's representation, and using Repr in modifies clauses (Leino 2010). The following idiom is standard (Leino 2010):

    class MyClass {
      ghost var Repr: set<object>
      method SomeMethod()
        modifies Repr
    }

This modifies clause is to be interpreted as follows: the method may modify any field of any object in Repr. If this is a member of the Repr set, then the modifies clause also allows the method to modify the field Repr itself (Leino 2010).

With explicit footprints, proving frame properties consists in proving that the read effects of a predicate and the write effects of a method are disjoint.

Before the dynamic frame approach, data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002) and solutions based on the Universe type system (Müller 2002) had been proposed for specifying footprints within single objects.

The level of expressiveness offered by techniques based on explicit footprints is very high, allowing specifications to relate different regions in arbitrary ways, ranging from disjointness or inclusion of regions to characterizing their intersection. However, this flexibility complicates reasoning. When regions are stored explicitly in ghost variables, as is done in Dafny, programs need to explicitly update these ghost variables to maintain invariants. This can prove to be a cumbersome task. When pure methods are used, as in VeriCool, it is mandatory to reason explicitly about the effects of heap modifications on their results (Hatcliff et al. 2012).

2.4.2 Implicit Footprints

The implicit footprint approaches rely on specialized logics for implicitly representing footprints. Separation logic (O'Hearn, Reynolds, and Yang 2001; O'Hearn, Yang, and Reynolds 2004; Reynolds 2002; Reynolds 2005; Reynolds 2000) is the most prominent representative of this category.

Separation logic extends Hoare logic (Hoare 1971) with the separating conjunction operator ∗. Each assertion in separation logic defines a portion of the heap. The assertion P ∗ Q is true if and only if P and Q hold for disjoint parts of the heap. Local reasoning is fundamental to separation logic (O'Hearn, Reynolds, and Yang 2001): specifications need to describe all the state that the code C reads or writes. Thus, in the triple {P} C {Q}, P must be interpreted as being all the state that is needed for executing C, i.e., the footprint of C. This interpretation of Hoare triples leads to the following frame rule in separation logic:


$$\frac{\{P\}\; C\; \{Q\}}{\{P \ast R\}\; C\; \{Q \ast R\}}$$

which allows inferring that a local property is preserved for a wider state, obtained by extending P with another disjoint state R. Some versions of separation logic impose additional conditions about local variable modifications, as the ∗ operator only separates heaps. Separation logic can be extended such that ∗ also separates variables, thus eliminating the need for additional conditions (Parkinson, Bornat, and Calcagno 2006).
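The meaning of the separating conjunction can be sketched with a small Python model in which a heap is a finite dictionary from addresses to values and an assertion is a Boolean function on heaps. This is an illustrative, brute-force encoding under those assumptions (the names `sep_conj` and `points_to` are hypothetical), not how separation-logic verifiers are implemented.

```python
from itertools import combinations


def sep_conj(p, q):
    """Build P * Q: true on a heap iff it splits into two disjoint parts,
    one satisfying p and the other satisfying q."""
    def holds(heap):
        addrs = list(heap)
        for k in range(len(addrs) + 1):
            for left in combinations(addrs, k):
                h1 = {a: heap[a] for a in left}
                h2 = {a: heap[a] for a in heap if a not in left}
                if p(h1) and q(h2):
                    return True
        return False
    return holds


def points_to(addr, value):
    """The assertion addr |-> value: the heap is exactly this one cell."""
    return lambda heap: heap == {addr: value}


two_cells = sep_conj(points_to(1, "a"), points_to(2, "b"))
same_cell_twice = sep_conj(points_to(1, "a"), points_to(1, "a"))
```

For example, `two_cells` holds on the heap `{1: "a", 2: "b"}`, whereas `same_cell_twice` holds on no heap, because a single cell cannot be placed in both disjoint parts.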

A separation logic for Java was introduced by Parkinson (Parkinson and Bierman 2005). This has primitive assertions to describe the values of fields in the heap and allows describing portions of the heap containing several disjoint objects using the ∗ operator.

Separation logic does not require explicitly specifying read or write effects. They are implicit in a method's precondition. Data structures are specified using logic functions. By including such a logic function in a method's precondition, the method is allowed to read and write anything belonging to the footprint of the logic function, but cannot access anything outside this footprint.

Approaches based on separation logic are hard to implement and to integrate into verification tools. Verifiers based on separation logic have mostly relied on symbolic execution and have not yet achieved the same level of automation as verifiers based on verification condition generation (Hatcliff et al. 2012). However, a series of tools currently exist that can reason using separation logic. These include Smallfoot (Berdine, Calcagno, and O'Hearn 2005; Berdine, Calcagno, and O'Hearn 2012), SpaceInvader (Distefano, O'Hearn, and Yang 2006; Calcagno et al. 2008), jStar (Distefano and Parkinson 2008; Naudziuniene et al. 2011), VeriFast (Jacobs, Smans, and Piessens 2010; Jacobs et al. 2011) and SLAyer (Berdine, Cook, and Ishtiaq 2011).

The implicit dynamic frames approach (Smans, Jacobs, and Piessens 2012) unifies the dynamic frames concept with separation logic. Framing specifications of a method are inferred using an implicit approach, as described in Section 2.3.3. They are encoded in first-order logic and can be used for automatic verification with SMT solvers. This is done in VeriCool (Smans, Jacobs, and Piessens 2008) and Chalice (Leino, Müller, and Smans 2009).

2.4.3 Predefined Footprints

In contrast to the implicit and explicit footprint approaches, which describe properties found in a program, the third approach focuses on reasoning efficiently about programs with restricted topologies. Ownership types (Clarke, Potter, and Noble 1998) are representative of this approach.

Ownership types typically enforce a tree topology, whereby every object in the heap has at most one owner object and the owner relation is acyclic. Topological properties beyond this tree structure have to be expressed using object invariants and predicate logic. Read and write effects typically use ownership as an abstraction mechanism: the right to read or write an object includes the right to read or write all the objects it (transitively) owns (Hatcliff et al. 2012).

Spec# addresses framing through ownership types: without explicit specifications stating otherwise (modifies clauses of the form presented in Section 2.3.2), methods may modify only the fields of the receiver and of those objects within the subtree of which the receiver is the root. Ownership is expressed by means of attributes on field declarations (Barnett et al. 2004; Barnett et al. 2011).

Ownership has been used to verify write effects (Müller, Poetzsch-Heffter, and Leavens 2003) and invariants (Drossopoulou et al. 2008; Leino and Müller 2004; Müller, Poetzsch-Heffter, and Leavens 2006). All the existing ownership-based verification techniques enforce that all modifications of an object must be initiated by the object's owner. This gives owners total control over modifications of their internal representations and allows them to maintain invariants (Hatcliff et al. 2012). Ownership-based approaches have been used for reasoning about model fields (Leino and Müller 2006) and for enforcing object immutability (Leino, Müller, and Wallenburg 2008).

The ownership topology can be enforced by type systems (Lu, Potter, and Xue 2007; Müller 2002). In JML, it is enforced through universe types (Dietl and Müller 2005). In Spec#, it is encoded as object invariants (Barnett et al. 2004).

Reasoning about framing relies on the tree structure on the heap enforced by ownership. The ownership trees rooted in two different objects o1 and o2 are disjoint if neither o1 owns o2 nor o2 owns o1. The disjointness of ownership trees can then be used to prove that read and write effects of methods do not overlap (Hatcliff et al. 2012).
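The disjointness criterion for ownership trees can be sketched in Python, assuming the owner relation is given as a dictionary from each object to its unique owner and is acyclic, as ownership types require. The object names and the helper functions `owns` and `trees_disjoint` are hypothetical.

```python
def owns(owner_of, o1, o2):
    """True if o1 transitively owns o2, following owner links upward from o2."""
    current = owner_of.get(o2)
    while current is not None:
        if current == o1:
            return True
        current = owner_of.get(current)
    return False


def trees_disjoint(owner_of, o1, o2):
    """Ownership trees rooted in two different objects are disjoint
    if neither object (transitively) owns the other."""
    return o1 != o2 and not owns(owner_of, o1, o2) and not owns(owner_of, o2, o1)


# Each object has at most one owner; roots have no entry.
owner_of = {"node1": "list1", "node2": "list1", "node3": "list2"}
```

With this relation, the trees rooted in list1 and list2 are disjoint, so a write effect confined to the first cannot overlap a read effect confined to the second; list1 and node1, by contrast, are not disjoint.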

2.5 Other Approaches to Reason about Frames

Rakamarić and Hu report in (Rakamaric and Hu 2008) a method to infer frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve 2007). DSA provides a summary of points-to relations as a graph that is used to compute a set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node that is reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e., in a purely automatic setting.

In (Taghdiri, Seater, and Jackson 2006), Taghdiri et al. present a technique for extracting procedure summaries for object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. The extracted procedure summaries can be viewed as detailed frame conditions describing which memory locations might be changed and how.


In (Sozeau 2009), Sozeau presents a generalized rewriting technique implemented in the Coq proof assistant that allows substituting a term t of an expression by another term t′ when t and t′ are related by a relation R. This generalizes equational reasoning to reasoning modulo arbitrary relations. The technique relies on dependent types and is based on a constraint generation algorithm generating type class constraints. The Coq tactic supports polymorphic relations, morphisms and subrelations.

Bertrand Meyer proposed the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer 1991), an object-oriented language with native support for Design by Contract features (Meyer 1992). The first component, the frame specification inference, relies on the analysis of method postconditions, as described in Section 2.3.3, and on obtaining, for a method p, a first set of elements. This represents an overapproximation of the set of elements that are allowed to be modified by p according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder 2015), which is itself based on alias calculus (Kogtenkov, Meyer, and Velder 2015; Meyer 2010; Meyer 2011). Methods are analysed and a second set is detected for p; this represents an overapproximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second.

2.6 Other Relevant Work

Pure methods, also known as queries or observers, are side-effect free methods that always evaluate to the same result value given the same input value. They are intensively used for providing specifications for methods without disclosing implementation details in languages such as JML, Spec# and Eiffel. Leavens et al. identify the development of specification and verification techniques for determining the effects of heap modifications on the results of pure methods as one of the remaining challenging problems related to framing (Leavens, Leino, and Müller 2007). Though our work is not concerned with heap modifications, we are interested in the dependency of Boolean Smart predicates, i.e., logical properties, on the layered ("composite") data structures they are receiving as inputs. In Chapter 5, we present a static analysis meant to capture such dependencies.

Various encodings of pure methods (Cok 2005; Darvas and Müller 2006) in program logic have been proposed, but they do not cover aspects related to reasoning about frame properties when the specifications make use of pure methods. Some specification techniques for frame properties (Leavens, Baker, and Ruby 2006; Leino and Müller 2006; Leino and Nelson 2002; Müller, Poetzsch-Heffter, and Leavens 2003) allow describing the fields that are potentially modified by a method execution using modifies clauses. These, however, do not specify the effects of a method execution on the results of pure methods (Leavens, Leino, and Müller 2007).

One technique for determining the effects of heap modifications on the results of pure methods requires listing all pure methods that are potentially affected by a method in the method's modifies clause. This approach is adopted in COLD-K (Feijs and Jonkers 1992), where the frame of a procedure specification lists the variables and the equivalent of pure methods whose value may be changed by the procedure. For dealing with modularity issues, COLD-K also makes use of read effects.

Other approaches (Leino and Müller 2006; Müller, Poetzsch-Heffter, and Leavens 2003) for determining effects on the results of pure methods rely on model fields. These are specification-only constructs whose value is determined by applying a mapping to the concrete state of an object. They are similar to pure methods, but unlike the latter, they do not have parameters and they are required to be confined (Leino and Müller 2006; Müller, Poetzsch-Heffter, and Leavens 2003).

Approaches based on model fields require that pure methods read only the state of the receiver object and its sub-objects. This information about the read effect of a pure method can be used to determine which write effects potentially have an impact on the result of a pure method. In general, it can be proven that a method m does not affect the result of a pure method p if the write effect of m and the read effect of p are disjoint (Leavens, Leino, and Müller 2007).

There are various approaches to using read effects for reasoning about pure methods. One approach relies on complete specifications of result values included in the postconditions of pure methods. Used in conjunction with modifies clauses, these allow determining whether a method affects the result of a pure method (Leavens, Leino, and Müller 2007). Various solutions based on explicitly specified read effects exist (Feijs and Jonkers 1992; Greenhouse and Boyland 1999; Jacobs and Piessens 2006). Specification of these using data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002) and an effects system built on top of an ownership type system (Clarke and Drossopoulou 2002) have been proposed. Multi-threaded programs also require such specifications (Praun and Gross 2003).


Chapter 3

The Smart Language and ProvenTools

Languages are not strangers to one another.

Walter Benjamin

In this chapter, we introduce Smart, a programming and specification language developed at Prove & Run, as well as the toolchain associated with it. While not claiming to be exhaustive, we give an overview of the language's features and syntax in Section 3.1. In Section 3.2, we present the tools manipulating Smart models. Section 3.3 briefly presents Smil, the Smart Intermediate Language. A computational version of it, αSmil, is targeted by the static analyses presented throughout the remainder of this thesis. The following chapter will focus entirely on αSmil, illustrating its usage and introducing its syntax and formal semantics.

3.1 The Smart Modeling Language

Smart is a modeling language developed at Prove & Run. It constitutes a unified programming and specification language designed to facilitate proofs. One of the common, often cited reasons why programmers reject the use of formal methods is that they are not willing to learn a separate language just for specifying their programs, in particular if that language is fundamentally different from the programming language. Smart addresses this issue by allowing one both to develop the implementation of programs and to specify their logical properties in a single language.

The Smart language is a purely functional (side-effect free), strongly-typed, polymorphic, first-order language. The basic building blocks of programs written in Smart are predicates, the equivalent of functions in other common programming languages. Besides the common primitive types that are traditionally available as built-in types, algebraic data types (structures and variants) and associative arrays are provided as well. Exit labels constitute the language's main specificity: they facilitate separating data- and control-flow in programs.

In addition, being designed in order to write code that will subsequently be proven, the language allows the definition of various types of logical specifications as well. These range from pre- and postcondition contracts, local assertions and loop invariants to inductive predicates, lemmas and hypotheses.

ProvenTools is a complex set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain has the structure of a set of Eclipse plug-ins of the JDT (Java Development Tools) type (Eclipse Java Development Tools (JDT)). Together, these constitute a complete Integrated Development Environment (IDE), allowing one not only to write, edit and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover and, finally, to generate executable code in C.

ProvenCore¹ (Lescuyer 2015) and ProvenCore-M² are two microkernels that have been completely modeled in Smart and developed using ProvenTools. The former is a general-purpose microkernel that ensures isolation, i.e., integrity and confidentiality. The latter targets embedded devices based on microcontrollers.

Throughout the rest of this section, we will present some of the main concepts and mechanisms of Smart, discussing predicates, control flow, algebraic data types and specification-only constructs.

3.1.1 Smart Predicates and Types

Smart supports modular program development with a straightforward module concept. Modules constitute the compilation units of Smart programs, and any valid Smart program consists of a non-empty set of modules, which are themselves organized in packages. Modules have an identifier that is unique in each program, and in practical terms, each module corresponds to a file. Modules can import other modules, and they contain a list of type and constant declarations, as well as a list of predicates.

Predicates, the equivalent of functions in other common programming languages, are the basic building blocks of programs written in Smart. Though named in reference to predicate logic, predicates in Smart receive a number of inputs and produce a number of outputs in return, in contrast to predicates in mathematics, which are commonly understood to be Boolean-valued functions of the form:

$$P : X \rightarrow \{\mathit{true}, \mathit{false}\}$$

Smart predicates can be classified into two different categories, namely implicit and explicit predicates, based on their implementation or their lack thereof.

Implicit predicates can be seen as a form of assumption; as their name suggests, they are not implemented per se, but simply declared using the implicit program keywords. Such predicates are similar to the declarations of native methods in Java or external functions in C. Traditionally, in Java, programmers use the Java Native Interface (JNI) (Liang 1999; Java Native Interface Documentation (JNI) 1999) when they need to implement small, time-critical code portions in a lower-level language

¹ http://www.provenrun.com/products/provencore
² http://www.provenrun.com/products/provencore-m


such as assembly, or when they need to access a library already written in another programming language such as C. In Smart, implicit predicates play an important role with respect to code documentation. Their implementation is not provided in the model, but as we will further explain in Section 3.1.4, they can be used to specify logical properties of the explicit implementations provided externally in a lower-level language, typically in C or assembly.

For example, an implicit predicate converting an integer given as input into a float can be declared as follows:

    public float_of(int n, float f+)
    implicit program

The predicate's result is given a name, f, and is introduced as one of the predicate's parameters. It is marked as the predicate's output by the + symbol following it and is thereby syntactically distinguished from the predicate's input parameter n, which is unadorned.

In the general case, Smart predicates can have any number of input or output parameters. However, a parameter cannot be both at the same time, and each parameter must be explicitly marked either as an input or as an output. An input parameter's value can be read and used in the predicate's implementation. An output parameter's value must be constructed by the predicate's implementation and returned as a result. Furthermore, values in Smart are immutable. As a consequence, Smart predicates are pure: it is impossible to pass a parameter "by reference" and modify a predicate's input as a side effect. Smart is thus a side-effect-free language that provides referential transparency (Strachey, 1967). The language supports neither global variables nor global state, and is best characterized as a state-passing style language. Smart predicates are also deterministic: they always return the same output whenever they are called with the same input values. In particular, this is a prerequisite for implicit predicates.

As mentioned in the introduction, Smart is also a strongly typed language. Each input and output parameter of a predicate must have an associated type, and using an object of some type where a parameter of another type is expected is forbidden by the language. Unsafe conversions between different types are forbidden as well. Smart provides various built-in types, such as int, short, long, char, boolean, float and double, that are traditionally available in other programming languages. Additionally, users can declare new types with the type keyword and then define predicates manipulating these types. As in the case of predicates, implicit data types can simply be declared without being explicitly defined. For example, supposing that an implicit data type called cartesian_point and the predicates manipulating it are defined in a lower-level language, we would make them available to other Smart predicates using the following declarations:

    // Implicit data type declaration
    type cartesian_point

    // Retrieve coordinate on X-axis
    public get_X(cartesian_point p, float x+)
    implicit program

    // Retrieve coordinate on Y-axis
    public get_Y(cartesian_point p, float y+)
    implicit program

    // Construct a new point p with coordinates (x, y)
    public new_point(float x, float y, cartesian_point p+)
    implicit program

    // Pretty-print
    public print_point(cartesian_point p)
    implicit program

Some implicit predicates manipulating inputs of type cartesian_point are declared as well. The first two, get_X and get_Y, simply return the input point's numerical coordinates on each axis of the Cartesian system. The next predicate, new_point, creates and returns a new point from the two given input coordinates. The last one, print_point, simply displays the input point without producing an output. Alternatively, it is possible to declare and implement these types and predicates directly in Smart, as we will show in the following paragraphs. As shown in the example, and similarly to Java, comments in Smart can be introduced with // for single-line comments or /* */ for multi-line comments. Similarly to Javadoc, code documentation can be given using the /** begin-comment delimiter.

In general, implicit data types and the implicit predicates manipulating them can act as a public interface for a concrete class, showing the type and the operations allowed to manipulate values of that type, but hiding the implementation.

Explicit data types can be declared and defined using structures and variants. For example, we could explicitly define the type cart_point by means of a structure having two fields of type float, called x and y, which correspond to the point's numerical coordinates on the X- and Y-axis, respectively:

    type cart_point =
        float x
        float y

For representing a point in a polar coordinate system, we can define a different type, polar_point, as follows:

    type polar_point =
        // Radial coordinate (distance from the pole)
        float radius
        // Polar angle
        float azimuth

Explicit predicates have explicitly defined implementations that follow immediately after their declaration, which strongly resembles that of an implicit predicate but from which the keyword implicit is omitted. Their bodies are sequences of statements, which are essentially calls to other predicates. For example, to translate a point (x, y), i.e. to add a given pair of numbers (a, b) to its Cartesian coordinates and obtain the new point (x', y') = (x + a, y + b), a predicate translate_point could be defined in the following manner:

    // Convert x to float, add it to y and retrieve the sum
    public sum_of(int x, float y, float s+)
    implicit program

    public translate_point(cartesian_point p, int a, int b, cartesian_point q+)
    program
        float xa    // Local variables
        float yb

        print_point(p)          // 1
        get_X(p, xa+)           // 2
        get_Y(p, yb+)           // 3
        sum_of(a, xa, xa+)      // 4
        sum_of(b, yb, yb+)      // 5
        new_point(xa, yb, q+)   // 6
        print_point(p)          // 7

The body of the translate_point predicate consists of a sequence of statements. The first of these simply pretty-prints the input point p. The next two statements are calls to accessors of p's coordinates on the X- and Y-axis, which are stored in the local variables xa and yb, respectively. Next, the coordinates (xa', yb') = (a + xa, b + yb) of the translated point are computed by calling the sum_of predicate, which returns the float sum of an integer and a float. The output point q is constructed by calling the constructor new_point with xa and yb as inputs. The last statement pretty-prints the input point p again.

As illustrated by our example, each call to a predicate is made by passing the parameters in the same order as in the predicate's declaration and by explicitly marking any output with + [3]. Replacing line 4 with sum_of(xa, a, xa+) would result in an error, because the first input parameter of a call to sum_of is expected to be an integer and the second a float. Similarly, omitting the + symbol at line 6 and writing new_point(xa, yb, q) would result in an error. By explicitly marking the outputs of each statement, it is straightforward to distinguish between the variables that are actually written by the statement and those that are used only as inputs. Furthermore, since predicates are not allowed to modify their inputs, the language strictly forbids using a predicate's input parameter as an output of any statement in the predicate's body. Thus, in our example predicate, we are prevented from using the input point p as the output of the new_point predicate call. Outputs and local variables such as xa and yb, however, can be written to, but reading them (i.e. using them as inputs for a predicate call) before they have been written at least once amounts to using uninitialized variables and behaves in an unspecified manner. In our example, xa and yb are used as both inputs and outputs at lines 4 and 5, respectively. This is correct, since xa and yb are local variables that have already been written to by the statements at lines 2 and 3, which precede the calls to sum_of.

[3] This is mandatory because of overloading.

We stress again that destructive updates are not possible in Smart. Even if, at first glance, a statement such as the call to sum_of at line 4 might give the impression that xa is modified in place, all that the statement actually does is create a new float, whose value is obtained by adding the old value of xa to the value of a, and then set xa to reference this new float instead of the old one. A simple conversion to static single assignment form (Cytron et al., 1989) would eliminate these reassignments and show the absence of any mutation whatsoever. Thus, were we to inspect the state of the input point p before and after the calls to sum_of, we would observe that it remains unchanged; this is what we do when printing p again at the end of translate_point.
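The static single assignment reading of translate_point can be sketched in Python rather than Smart. This is only an illustration under stated assumptions: CartesianPoint, the numbered names xa0/xa1 and the body of sum_of are hypothetical stand-ins for the implicit type and predicates of the example, and print_point is omitted.

```python
from typing import NamedTuple


class CartesianPoint(NamedTuple):
    """Hypothetical immutable stand-in for the implicit type cartesian_point."""
    x: float
    y: float


def sum_of(n: int, f: float) -> float:
    """Stand-in for the implicit predicate sum_of (int plus float)."""
    return float(n) + f


def translate_point(p: CartesianPoint, a: int, b: int) -> CartesianPoint:
    # Each write introduces a fresh name, as an SSA conversion would.
    xa0 = p.x                 # get_X(p, xa+)
    yb0 = p.y                 # get_Y(p, yb+)
    xa1 = sum_of(a, xa0)      # sum_of(a, xa, xa+): a new float, xa0 is untouched
    yb1 = sum_of(b, yb0)      # sum_of(b, yb, yb+)
    return CartesianPoint(xa1, yb1)   # new_point(xa, yb, q+)


p = CartesianPoint(1.0, 2.0)
q = translate_point(p, 3, 4)
```

Once every write has its own name, no variable is ever assigned twice, and p is observably the same value before and after the call.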

As a last remark about our example, it is worth mentioning that the statement new_point(xa, yb, q+), which produces the predicate's output, is not the predicate's last statement. Smart does not support any dedicated return statement. Instead, when exiting from a predicate, the outputs hold the values that they have been assigned when executing the body. This mechanism allows one to define predicates having multiple outputs. Their names are chosen by the programmer, and their values can be modified multiple times during the predicate's execution; however, the values retrieved are the ones that are available at the moment the program exits the predicate.

3.1.2 Exit Labels and Control Flow

Besides input and output parameters, the declaration of a predicate can also include a set of exit labels. When called, a predicate exits with one of the specified exit labels, thus summarising and returning to its callers further information regarding its execution.

Exit labels constitute the main specificity of the Smart language. They can denote different exceptional execution scenarios and act as exit codes, similarly to exceptions and exit status return values in other programming languages.

Every predicate has a non-empty set of labels. By default, any predicate has the built-in exit label true, which denotes the successful exit status of a predicate. The predicates illustrated previously in Section 3.1.1 did not have explicitly declared exit labels; in such a case, it is assumed that the only possible exit label for the predicate is true and hence that the predicate will succeed in all circumstances.

Returning to our previous example, the predicate translate_point, we could have written its complete declaration by explicitly stating that true is the only possible exit label:

    public translate_point(cartesian_point p, int a, int b, cartesian_point q+)
    -> [true]
    program

This declaration is strictly equivalent to the one given in Section 3.1.1.

In the general case, any number of labels can be specified after the parameters. For example, we could declare a predicate that converts the coordinates of an input point (x, y) of type cartesian_point to polar coordinates

\[ r = \sqrt{x^2 + y^2} \]

\[ \varphi = \operatorname{atan2}(y, x) \]

and returns a point (r, φ) of type polar_point with these coordinates. For computing the second polar coordinate, the polar angle or azimuth, the predicate would call another predicate, atan2, which is the arctangent function with two arguments, a common variation on the arctangent function. The atan2 function avoids the problem of division by zero; however, it is undefined when both x and y, i.e. the Cartesian coordinates, are zero. For declaring it in Smart, we can add a special exit label for the case when the given input coordinates represent the origin and the result cannot be returned:

    // Computes atan(y/x)
    public atan2(float x, float y, float at+) -> [true, undef]
    implicit program

The declared label's name, undef, is a custom name; any valid identifier can be chosen and used as a label in Smart. As previously mentioned, the exit label true is predefined and has a special meaning. Another predefined label, which is interpreted in a special manner by conditional statements and logical operators, is the false label. Together, these two exit labels offer a convenient manner of modelling a Boolean result. Frequently, a Boolean output value can be replaced by declaring these two possible exit labels, with true denoting a successful execution of the predicate and false the opposite outcome.

Besides indicating the execution scenario that was followed, exit labels play an important role with respect to control flow management. Primarily, the exit label of a call to a predicate determines whether the next predicate call in sequential order should be executed or not: when the predicate exits with true, the program can proceed to the next statement in the program. Any other exit label lbl disrupts the normal control flow and forces the current predicate to exit with label lbl.

For example, a predicate cart_to_polar can be defined with two exit labels, true and undef, as well. It takes two float numbers x and y, computes the corresponding polar coordinates r and phi by calling the predicates compute_radius and atan2, and constructs a new point p of type polar_point using the computed values:

    public compute_radius(float x, float y, float r+)
    -> [true]
    implicit program

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true, undef]
    program
        float phi
        float r
        compute_radius(x, y, r+)
        atan2(y, x, phi+)
        new_polar_point(r, phi, p+)

There is no guarantee that the call to atan2 will return successfully with exit label true: it might return with undef, in which case the execution of cart_to_polar will break at that point and exit with label undef. Furthermore, no output will be generated. In Smart, exit labels condition the existence of output parameters: every output is associated with an exit label lbl, and it is generated if and only if the predicate exits with that particular exit label lbl. All other outputs are discarded and can be considered as unchanged by the caller. The same output can be associated with multiple labels. By default, if no output parameters are specified for a label, no outputs are generated when the predicate exits with this label. The only exception to this rule is made for the built-in true label: since true normally represents a successful execution, every output of the predicate is associated with it by default. For example, the previous declaration of cart_to_polar is strictly equivalent to:

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>, undef<>]
    program
        float phi
        float r
        compute_radius(x, y, r+)
        atan2(y, x, phi+)
        new_polar_point(r, phi, p+)
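The coupling between exit labels and outputs can be approximated in Python, assuming each predicate is modelled as a function returning a pair (label, output). The names PolarPoint and atan2_pred are illustrative only, and compute_radius is replaced by math.hypot; none of this is Smart syntax.

```python
import math
from typing import NamedTuple, Optional, Tuple


class PolarPoint(NamedTuple):
    """Hypothetical stand-in for the type polar_point."""
    radius: float
    azimuth: float


def atan2_pred(y: float, x: float) -> Tuple[str, Optional[float]]:
    """Stand-in for atan2: exits with 'undef', and no output, at the origin."""
    if x == 0.0 and y == 0.0:
        return "undef", None
    return "true", math.atan2(y, x)


def cart_to_polar(x: float, y: float) -> Tuple[str, Optional[PolarPoint]]:
    r = math.hypot(x, y)              # compute_radius(x, y, r+), always 'true'
    label, phi = atan2_pred(y, x)     # atan2(y, x, phi+)
    if label != "true":
        # Any label other than 'true' aborts the sequence and is forwarded;
        # the output p is associated with 'true' only, so none is produced.
        return label, None
    return "true", PolarPoint(r, phi)  # new_polar_point(r, phi, p+)
```

Calling cart_to_polar(0.0, 0.0) yields the label undef with no point, which mirrors the declaration [true<p>, undef<>].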

Exit labels can thus behave similarly to exceptions in other programming languages. In order to handle specific execution scenarios, Smart provides label transformers, which allow catching labels before they escape the current predicate and transforming them into other labels. Complex control flow can be expressed by giving a set of rules, each of which transforms a label lbl1 into a label lbl2, and by associating these rules with statements.

For example, we could let the predicate cart_to_polar return the label origin_fail when the inner computation of the azimuth fails, instead of just forwarding the label returned by atan2:

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>, origin_fail]
    program
        float phi
        float r
        compute_radius(x, y, r+)
        [undef origin_fail]
            atan2(y, x, phi+)
        new_polar_point(r, phi, p+)

Alternatively, we could also handle the failure of the computation by using transformers and constructing the output point differently, for example by declaring a constant representing the azimuth of the origin (often called the pole in polar coordinates) and using it for the construction of p when the call to atan2 fails:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
        float phi
        float r
        compute_radius(x, y, r+)
        [done true]
            [true done, undef true]
            atan2(y, x, phi+)
            phi = POLEAZIMUTH
        new_polar_point(r, phi, p+)

When atan2 terminates with label true, control flows through this definition as follows, starting from the call to the atan2 predicate: the call exits with true; the transformer attached to it turns true into done; done leaves the enclosing block immediately, so the assignment phi = POLEAZIMUTH is skipped; the transformer of the enclosing block turns done back into true; execution then continues with new_polar_point(r, phi, p+).

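And here is how control flows when atan2 terminates with label undef:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
        float phi
        float r
        compute_radius(x, y, r+)        // 1
        [done true]                     // 2
            [true done, undef true]     // 3
            atan2(y, x, phi+)           // 4
            phi = POLEAZIMUTH           // 5
        new_polar_point(r, phi, p+)

The call at line 4 exits with undef; the transformer at line 3 turns undef into true; execution therefore continues with the assignment at line 5; the block then exits with true, and new_polar_point(r, phi, p+) is reached.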

After computing the radius r by calling compute_radius, this new version of the predicate starts by calling the predicate atan2. If this operation succeeds, then phi is the value of the azimuth and we can use this value as the second input parameter for the point's constructor new_polar_point. This is done by transforming true into a new label, done, whose effect is to jump immediately to the outer block, in this case the top-level one. The top-level block of the program catches done, transforms it back into true and continues with the statement following the block, namely new_polar_point, which will construct the output p by using r and phi, the value of the azimuth returned by atan2. When atan2 is undefined, the transformer from undef to true is used to jump to an additional statement, phi = POLEAZIMUTH, which assigns the value of POLEAZIMUTH to phi. The constructor is reached in this case as well. However, this time the value of phi written at line 5 is used as the second input parameter. We note that the statement at line 5 is a call to a built-in assignment predicate, denoted by = and using an infix notation.
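The two paths just described can be traced with a small Python model of label transformers, assuming a transformer is a mapping from labels to labels and that labels without a rule pass through unchanged. The value of POLE_AZIMUTH, the helper transform and the stand-in atan2_pred are assumptions made for the sketch.

```python
import math

POLE_AZIMUTH = 0.0  # assumption: the text does not give the constant's value


def atan2_pred(y, x):
    """Hypothetical stand-in for atan2: exits with 'undef' at the origin."""
    if x == 0.0 and y == 0.0:
        return "undef", None
    return "true", math.atan2(y, x)


def transform(label, rules):
    """Apply a label transformer; labels without a rule pass through."""
    return rules.get(label, label)


def cart_to_polar(x, y):
    r = math.hypot(x, y)                       # compute_radius(x, y, r+)
    # Inner statement: the call to atan2 with its transformer.
    label, phi = atan2_pred(y, x)
    label = transform(label, {"true": "done", "undef": "true"})
    if label == "true":
        # Reached only when atan2 exited with undef.
        phi = POLE_AZIMUTH                     # phi = POLEAZIMUTH
    # Enclosing block: done is caught and turned back into true.
    label = transform(label, {"done": "true"})
    if label != "true":
        return label, None
    return "true", (r, phi)                    # new_polar_point(r, phi, p+)
```

In both scenarios the function ends with the label true, which matches the declaration of cart_to_polar with true as its only exit label.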

The constant POLEAZIMUTH is declared using the keyword const. In Smart, constants can be declared and used directly as inputs for predicate calls.


In the general case, arbitrarily complex control flows can be expressed by combining label transformers, blocks and recursion.

In order to facilitate the user's task of simulating common control flow structures with labels and transformers, Smart provides various control flow statements, which are themselves based on this mechanism. These include a construct that is equivalent to the try ... catch mechanism in Java, a conditional if ... then ... else control structure, as well as the common logical operators for negation (!), conjunction (&&), disjunction (||), implication (=>) and equivalence (<=>).

Given the Cartesian coordinates (x, y), the first polar coordinate, the radius, is obtained by computing

\[ \sqrt{x^2 + y^2} \]

For explicitly defining the predicate compute_radius, we would first need to implement a predicate sqrt computing the square root of a given positive number. Such a predicate can be implemented recursively as follows, by using the if ... then ... else construct and three implicit predicates:

    // Newton-Raphson Square Roots Finding Algorithm

    // Divides a by b and retrieves result in div
    public div_double(double a, double b, double div+)
    -> [true, undef]
    implicit program

    // Check if a is close enough to b: |a - b| < b * 0.001
    public close_approximation(double a, double b)
    -> [true, false]
    implicit program

    // Compute ((b + a/b) / 2)
    public better_approximation(double a, double b, double g+)
    -> [true, undef]
    implicit program

    public sqrt(double x, double g, double sqr+)
    -> [true, undef]
    // Returns the square root of x by making recursive calls
    // with better and better guesses g until reaching a guess
    // that is close enough to the actual square root's value
    program
        double aux
        div_double(x, g, aux+)
        if close_approximation(aux, g)
        then
            sqr = g
        else
            better_approximation(x, g, aux+)
            sqrt(x, aux, sqr+)
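For readers who want to run the recursion, the following Python sketch mirrors the structure of the sqrt predicate, assuming that predicates return a (label, output) pair and that the guess g is positive. The bodies of the three helpers are guesses derived from their comments, since the predicates are implicit in the Smart model.

```python
def div_double(a, b):
    """Stand-in for div_double: a / b, 'undef' when b is zero."""
    if b == 0.0:
        return "undef", None
    return "true", a / b


def close_approximation(a, b):
    """Stand-in for close_approximation: |a - b| < b * 0.001."""
    return "true" if abs(a - b) < b * 0.001 else "false"


def better_approximation(a, b):
    """Stand-in for better_approximation: (b + a/b) / 2."""
    if b == 0.0:
        return "undef", None
    return "true", (b + a / b) / 2.0


def sqrt_pred(x, g):
    label, aux = div_double(x, g)
    if label != "true":
        return label, None                 # undef is forwarded to the caller
    if close_approximation(aux, g) == "true":
        return "true", g                   # sqr = g
    label, aux = better_approximation(x, g)
    if label != "true":
        return label, None
    return sqrt_pred(x, aux)               # recursive call with a better guess
```

A call such as sqrt_pred(2.0, 1.0) refines the guess a few times before the closeness test succeeds, while a zero guess exits with undef on the first division.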

Besides recursion, Smart also supports loops by providing a specific construct that is similar to a traditional "while" loop in other programming languages: while.

The body of this while block is repeatedly executed until a dedicated exit label, called exit, tries to escape, in which case the loop is aborted and the execution continues after the block. A "break" can thus be achieved by raising this special exit label inside the loop.

For instance, the previously recursive predicate sqrt can be implemented iteratively with a while loop as follows:

    public sqrt_iter(double x, double g, double sqr+)
    -> [true, undef]
    // Computes the square root of x iteratively
    program
        div_double(x, g, sqr+)
        while
            double aux
            [true exit, false true]
                close_approximation(sqr, g)
            better_approximation(x, g, aux+)
            div_double(x, aux, sqr+)
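The while/exit pattern can be imitated in Python with an unbounded loop in which break plays the role of the exit label. The sketch below is not a line-by-line transcription of sqrt_iter: it keeps the current guess in a local variable named guess, which is an assumption made so that the loop is self-contained, and it inlines the arithmetic of the implicit predicates.

```python
def sqrt_iter(x, g):
    guess = g
    while True:
        if guess == 0.0:
            return "undef", None             # the division would exit with undef
        quotient = x / guess
        # Closeness test: 'true' is turned into exit, which leaves the loop.
        if abs(quotient - guess) < guess * 0.001:
            break
        # 'false' is turned into 'true', so the body continues and the
        # guess is refined before the next iteration.
        guess = (guess + quotient) / 2.0
    return "true", guess
```

The break statement corresponds to the exit label escaping the block; every other path falls through to the next statement and eventually to the next iteration.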

3.1.3 Polymorphism & Algebraic Data Types

Smart supports polymorphic types and predicates. For declaring polymorphic types, a number of type parameters must be introduced in the type's declaration. For example, an implicit type of polymorphic pairs can be declared as follows:

    type pair<A, B>

This type is parameterized by two types A and B, which are the types of the first and second projection of the pair. Type variables must always start with an uppercase letter, while regular types must always start with a lowercase letter. The declaration of polymorphic predicates is straightforward. For instance, declaring an implicit constructor for the pair type declared above amounts to the following:

    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

This predicate is implicitly parameterized by two type variables A and B. The type parameters of a predicate are implicitly determined by the type variables in its arguments. Local variables in explicit predicates can also be declared with polymorphic types. However, they can only depend on type variables introduced in the predicate's parameters. Type variables in polymorphic types can be instantiated by any type.

As mentioned in Section 3.1.1, Smart allows users to define their own concrete data types by using algebraic data types, namely structures and variants.

Structures. Structures, also called records or tuples in other programming languages, represent the Cartesian products of the different types of their elements, called fields. In Smart, these can be declared in two manners: either by using the keyword struct followed by the name of the structured type and its list of field types and field names, or by using the keyword type, as shown below. The latter is preferred. Declaring polymorphic structures is possible by introducing type variables in the definition.

    struct pair<A, B>
        A fst
        B snd

    type pair<A, B> =
        A fst
        B snd

In order to build and manipulate structures, Smart supports built-in constructors and accessors. For instance, for the following type definition of a structure

    type t =
        t1 f1
        t2 f2
        ...
        tn fn

a constructor, a destructor, as well as individual accessors and "updaters" for each of the structure's fields are generated by Smart. Constructing an object of type t amounts to using t.new, which requires a value for each of t's fields. For example, creating a structure value s of type t with values e1, ..., en for each field amounts to calling t.new(s+, e1, ..., en). The values of these fields can all be read with a single predicate call to t.all(s, e1+, ..., en+), which "destructs" the structure value into its field components. Individual accessors of the form t.fi(s, ei+) are provided as well for any field fi. Finally, the value of a field fi can be set to some variable vi by using t.fi(s+, vi). Like all statements in Smart, this call has a functional nature and handles immutable data. Thus, setting the value of the fi field amounts to returning a new structure in which all fields have the same value as in s, except fi, which is set to vi.
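The functional nature of the generated constructor, destructor, accessors and updaters can be illustrated in Python with a frozen dataclass. This is an analogy only: CartPoint is a hypothetical two-field structure, and dataclasses.replace plays the role of an updater that returns a new value.

```python
from dataclasses import astuple, dataclass, replace


@dataclass(frozen=True)
class CartPoint:
    """Hypothetical two-field structure, analogous to type cart_point."""
    x: float
    y: float


s = CartPoint(1.0, 2.0)      # constructor: one value per field
x, y = astuple(s)            # destructor: all fields read at once
sx = s.x                     # individual accessor for field x
s2 = replace(s, x=5.0)       # updater: a new structure, s itself is unchanged
```

After the update, s still holds its original fields and s2 differs from it only in x, which is the behaviour described for the generated updaters.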

It is possible to define a structured type with no fields at all:

    struct unit

The value s of this type can be constructed by using unit.new(s+), without any input. This type can be seen as representing the absence of information.

Variants. Many programs need to deal with heterogeneous collections of values. For example, a node in a binary tree can be either a leaf or an interior node with two children; similarly, a node of an abstract syntax tree in a compiler can represent a variable, an abstraction, an application, etc. Variant types provide the mechanism that supports this kind of heterogeneous value collection (Pierce, 2002).

Variants, also called tagged unions in other programming languages, can be seen as the dual of structures. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag called the constructor.

Revisiting our previously declared types cartesian_point and polar_point, in Smart we can define a type point as being expressed either in Cartesian, in polar or in spherical coordinates, using the following variant declaration:

    type point =
    | Cartesian(cartesian_point p)
    | Polar(polar_point p)
    | Spherical(float r, float theta, float phi)

Each form that a variant can take is indicated by the symbol | followed by the uppercase tag and the list of parameters and their types. The cases are mutually exclusive, and a value of type point can have only one form at a time. An object of type point can be built by using one of the constructors, called with the appropriate number and types of inputs. For instance, a Cartesian point pc can be obtained by calling point.Cartesian(p, pc+). Given an object pt of type point, we can also distinguish between the different cases by using a construct that is similar to the match ... with construct in OCaml:

    switch (pt)
    case Cartesian(cartesian_point p)
        get_X(p, x+)
    case Polar(polar_point p)
        get_radius(p, r+)
    case Spherical(float r, float theta, float phi)

For verifying whether a given point pt is a Cartesian point, we can use:

    point.case[Cartesian](pt)


The same result could be obtained using the switch construct but, for practical reasons, the case construct is additionally provided as a built-in predicate.
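A rough Python analogue of the point variant uses one frozen dataclass per constructor and a union type, with isinstance tests standing in for switch and case. The constructors are flattened here (Cartesian carries x and y directly rather than a cartesian_point), which is a simplification chosen for the sketch, as are the function names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cartesian:
    x: float
    y: float


@dataclass(frozen=True)
class Polar:
    radius: float
    azimuth: float


@dataclass(frozen=True)
class Spherical:
    r: float
    theta: float
    phi: float


Point = Union[Cartesian, Polar, Spherical]


def first_coordinate(pt: Point) -> float:
    """Case analysis on the constructor, in the spirit of switch (pt)."""
    if isinstance(pt, Cartesian):
        return pt.x
    if isinstance(pt, Polar):
        return pt.radius
    if isinstance(pt, Spherical):
        return pt.r
    raise TypeError("not a point")


def is_cartesian(pt: Point) -> bool:
    """Single-constructor test, in the spirit of the built-in case predicate."""
    return isinstance(pt, Cartesian)
```

The dedicated is_cartesian test is expressible through the general case analysis, which is the same relationship the text describes between case and switch.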

3.1.4 Specifications

Smart also supports various types of logical specifications, ranging from axioms and lemmas to pre- and postconditions, invariants and inductives.

In Section 3.1.1, we stated that implicit predicates are a form of assumption and that declaring implicit Smart types and the predicates manipulating them provides a convenient manner of axiomatizing external implementations, frequently developed in a lower-level language. They can provide implementation-independent descriptions and act as abstractions that hide hardware-related details and low-level implementation decisions. Another form of assumption is the hypothesis. Hypotheses are logical results that are assumed, i.e. they constitute axioms, which are supposed to be true. In Smart, hypotheses are specification-only predicates, i.e. they cannot be called in the code. They are introduced by the keyword hypothesis.

For example, we could revisit our polymorphic pair type introduced in Section 3.1.3 and provide a polymorphic axiomatization for it by using implicit predicates and hypotheses stipulating that the operations fst and snd retrieve the first and second elements of the pair, respectively. These are declared as follows:

    type pair<A, B>

    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

    public fst(pair<A, B> p, A a+)
    implicit program

    public snd(pair<A, B> p, B b+)
    implicit program

    public hypothesis pair_fst(A a, B b)
    program
        pair<A, B> p
        A a2
        new_pair(a, b, p+)
        fst(p, a2+)
        a = a2

    public hypothesis pair_snd(A a, B b)
    program
        pair<A, B> p
        B b2
        new_pair(a, b, p+)
        snd(p, b2+)
        b = b2
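In Smart these hypotheses are assumed rather than executed. Still, one way to build intuition is to check them against one concrete model of pairs. The Python sketch below assumes pairs are modelled as tuples; the sample values are arbitrary.

```python
def new_pair(a, b):
    """One possible model of the implicit constructor: a Python tuple."""
    return (a, b)


def fst(p):
    return p[0]


def snd(p):
    return p[1]


def pair_fst(a, b):
    """The property stated by hypothesis pair_fst, evaluated on this model."""
    return fst(new_pair(a, b)) == a


def pair_snd(a, b):
    """The property stated by hypothesis pair_snd, evaluated on this model."""
    return snd(new_pair(a, b)) == b


samples = [(1, "x"), (2.5, None), ("k", (1, 2))]
all_hold = all(pair_fst(a, b) and pair_snd(a, b) for a, b in samples)
```

Such a check only shows that the tuple model is consistent with the axioms on a few samples; it says nothing about the external implementation that the hypotheses are meant to describe.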


Lemmas are another type of specification-only predicate, meant to facilitate proving logical properties. In contrast to hypotheses, lemmas must be proven. A lemma can be introduced with the keyword lemma, and it states that all paths that exit from its body with an undeclared exit label represent impossible execution scenarios.

In Section 3.1.1, we introduced a type cartesian_point for expressing a point by its Cartesian coordinates, and we defined a predicate translate_point for translating a point by a given pair of numerical values (a, b). We revisit our example and implement a predicate that translates a pair of points by a fixed pair of numbers (a, b), which are added to the Cartesian coordinates of each point of the pair. In addition, we consider an implicit predicate euclidean_dist that computes the Euclidean distance d,

\[ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]

between a pair of points ⟨(x1, y1), (x2, y2)⟩. These are declared as follows:

    type point_pair = pair<cartesian_point, cartesian_point>

    // For a pair of points ((x1, y1), (x2, y2)), compute
    // d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
    public euclidean_dist(point_pair p, float d+)
    -> [true]
    implicit program

    // For a pair of points ((x1, y1), (x2, y2)) and a fixed
    // numerical pair (a, b), compute ((x1', y1'), (x2', y2'))
    // as ((x1 + a, y1 + b), (x2 + a, y2 + b))
    public translate_pair(point_pair p, pair<int, int> t, point_pair o+)
    -> [true]

The translation of a pair of points preserves the Euclidean distance between them: the Euclidean distance of a pair of points p is equal to the Euclidean distance of the pair of points obtained after a translation. We can express this property by declaring it as a lemma:

    public lemma edist_preserved(pair<int, int> t, point_pair p)
    program
        point_pair translated
        float d1
        float d2
        euclidean_dist(p, d1+) =>
        translate_pair(p, t, translated+) =>
        euclidean_dist(translated, d2+) => d1 = d2
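The property stated by the lemma can be spot-checked numerically, which is of course weaker than the proof that Smart requires. In the Python sketch below, points are modelled as tuples and the comparison uses math.isclose, an assumption made because floating-point addition can introduce rounding that the idealized statement ignores.

```python
import math


def euclidean_dist(pair):
    (x1, y1), (x2, y2) = pair
    return math.hypot(x2 - x1, y2 - y1)


def translate_pair(pair, t):
    (x1, y1), (x2, y2) = pair
    a, b = t
    return ((x1 + a, y1 + b), (x2 + a, y2 + b))


def edist_preserved(pair, t):
    """Numerical counterpart of the lemma for one pair and one translation."""
    d1 = euclidean_dist(pair)
    d2 = euclidean_dist(translate_pair(pair, t))
    return math.isclose(d1, d2)
```

The check succeeds because the offsets a and b cancel out in the differences x2 - x1 and y2 - y1, which is exactly the argument a proof of the lemma would formalize.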


Specifying contracts for Smart predicates is also possible by employing pre- and postconditions. A precondition represents a logical property that must be true prior to calling a predicate, and it serves the purpose of letting the callers know when it is safe to call some predicate. Typically, it represents the caller's obligations. In Smart, a precondition can be introduced with the keyword pre, and it can be attached to any implicit or explicit predicate. A precondition can refer to the predicate's inputs, and it can declare its own local variables. However, it cannot make use of the predicate's outputs.

For instance, for the atan2 predicate discussed in Section 3.1.2, we could indicate that the predicate should never be called with the coordinates (0, 0) of the origin by adding the following precondition:

    public const float ZERO

    public atan2(float x, float y, float at+) -> [true]
    pre
        x != ZERO || y != ZERO
    implicit program
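As a loose analogy, the caller's obligation can be written in Python as a separate Boolean function that is checked before the call. Smart preconditions are logical specifications attached to the declaration rather than runtime checks, so the ValueError below is purely an artefact of the sketch, as are the function names.

```python
import math

ZERO = 0.0


def atan2_precondition(x, y):
    """The caller's obligation: the point (x, y) is not the origin."""
    return x != ZERO or y != ZERO


def atan2_with_pre(x, y):
    """Stand-in for atan2 with a single exit label: it always succeeds
    provided the precondition holds."""
    if not atan2_precondition(x, y):
        raise ValueError("precondition violated: (x, y) is the origin")
    return math.atan2(y, x)
```

With the origin excluded up front, the undef label of the earlier declaration is no longer needed, which is why the contract version declares true as its only exit label.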

A postcondition represents a logical condition that must be true after executing a predicate. Its purpose is to indicate to the callers of a predicate what they are entitled to expect with respect to the outputs produced by the predicate. In Smart, postconditions are introduced with the keyword post, and they can be attached to any implicit or explicit (computational) predicate, on a subset or all of the predicate's output labels. They can refer to the predicate's inputs and to the outputs associated with the label considered in the postcondition. Additionally, they can declare their own local variables.

For instance, a predicate equal_points, verifying whether two points are equal and having four possible exit labels, eq_points, eq_x, eq_y and false, could declare postconditions as follows:

    public equal_points(cartesian_point p, cartesian_point q)
    -> [eq_points, eq_x, eq_y, false]
    program
        float px
        float qx
        float py
        float qy
        cartesian_point.x(p, px+)
        cartesian_point.x(q, qx+)
        cartesian_point.y(p, py+)
        cartesian_point.y(q, qy+)
        if px = qx
        then
            [true eq_points, false eq_x] py = qy
        else
            [true eq_y] py = qy

    post eq_points
        p = q

    post eq_x
        float x1
        float x2
        cartesian_point.equals[x](p, q)

    post
        p != q

The first postcondition applies to the exit label eq_points, the second to the label eq_x, and the last one applies to the remaining labels, eq_y and false.
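The association between exit labels and postconditions can be made concrete with a Python sketch in which equal_points returns its exit label and a second function states what each label guarantees. The tuple model of points and the helper names are assumptions made for the illustration.

```python
from typing import NamedTuple


class CartesianPoint(NamedTuple):
    x: float
    y: float


def equal_points(p, q):
    """Returns the exit label only; the predicate has no outputs."""
    if p.x == q.x:
        return "eq_points" if p.y == q.y else "eq_x"
    return "eq_y" if p.y == q.y else "false"


def postcondition_holds(label, p, q):
    """What a caller may rely on after an exit with the given label."""
    if label == "eq_points":
        return p == q
    if label == "eq_x":
        return p.x == q.x
    return p != q          # labels eq_y and false


cases = [
    (CartesianPoint(1.0, 2.0), CartesianPoint(1.0, 2.0)),
    (CartesianPoint(1.0, 2.0), CartesianPoint(1.0, 3.0)),
    (CartesianPoint(1.0, 2.0), CartesianPoint(0.0, 2.0)),
    (CartesianPoint(1.0, 2.0), CartesianPoint(0.0, 0.0)),
]
labels = [equal_points(p, q) for p, q in cases]
```

Each of the four sample pairs reaches a different label, and the condition attached to that label holds for it.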

In Smart, mathematical relations can be represented by introducing inductives or schemes. These predicates have no outputs, but they always have true and false as their exit labels. Inductive predicates are the only part of the language that cannot be transformed into executable code; however, they can be used to facilitate proofs. Predicates introduced with the inductive keyword represent the least fixed point of their cases, each of which is introduced with the keyword case and a user-defined name. Each case can introduce existentially quantified variables. In particular, in the absence of recursion, inductive predicates represent a parallel disjunction of cases. An inductive predicate will exit with the label true if any of its declared cases holds.

For example, we could specify membership for an implicit array type using an inductive named contains, having a single case with the user-defined name ElemAt, which introduces an existentially quantified variable idx:

type array<A>

public get_size(array<A> arr, int s+)
implicit program

public get_elem(array<A> arr, int i, A ai+)
-> [true, oob]
implicit program

// Membership defined with an inductive and an existential
public contains(array<A> arr, A a) -> [true, false]
inductive
// An array contains an element if there exists a valid
// index where this element is to be found
case ElemAt (int idx) {{A b}}
  [oob: false] get_elem(arr, idx, b+) && b = a

Schemes, on the other hand, represent a conjunction of cases; cases are introduced with the keyword with followed by a user-defined name, and each of them can introduce universally quantified variables. A scheme will return the label true only if all of its declared cases hold.

Using a scheme with two cases, Size and Forall, as shown below, we can define the pointwise equality of arrays. The first case, Size, verifies if the two arrays have the same length by introducing two universally quantified variables n and m. The Forall case verifies that for any index i the arrays contain equal elements. Two arrays are equal pointwise if and only if they are of the same size and, at any given index i, the arrays have the same element.

public equals_pointwise(array<A> arr1, array<A> arr2)
-> [true, false]
// Extensional equality of arrays [arr1] and [arr2]
scheme
// They must be of the same size
with Size {{int n; int m}}
  get_size(arr1, n+) => get_size(arr2, m+) => n = m
// If they exist, elements at the same index must be equal
with Forall (int i) {{A a; A b}}
  get_elem(arr1, i, a+) => get_elem(arr2, i, b+) => a = b
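For finite arrays, the reading of an inductive as a disjunction over existential witnesses and of a scheme as a conjunction over universally quantified cases can be sketched in Python. This is only an executable analogy (Smart inductives and schemes are logical and are not compiled to code); the function names mirror the Smart predicates, and Python sequences stand in for the implicit array type.

```python
from typing import Sequence, TypeVar

A = TypeVar("A")


def contains(arr: Sequence[A], a: A) -> bool:
    # Inductive, case ElemAt: there EXISTS a valid index idx with arr[idx] = a.
    return any(arr[idx] == a for idx in range(len(arr)))


def equals_pointwise(arr1: Sequence[A], arr2: Sequence[A]) -> bool:
    # Scheme: ALL cases must hold.
    # Case Size: both sizes are equal.
    size_case = len(arr1) == len(arr2)
    # Case Forall: at every index valid in both arrays, the elements are equal.
    common = min(len(arr1), len(arr2))
    forall_case = all(arr1[i] == arr2[i] for i in range(common))
    return size_case and forall_case
```

The implications in the Forall case (an out-of-bounds get_elem makes the case hold vacuously) correspond to restricting the loop to indices valid in both arrays.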

Loop invariants are supported as well. These can be introduced in various ways, for instance by declaring them with the keyword invariant or by declaring them as inductives.

3.1.5 Illustrating Smart – An Abstract Process Manager

To illustrate the Smart language and its capabilities, we consider an abstract process manager and its fundamental components: process and thread. We define the data structures corresponding to threads and processes, implement the predicates corresponding to a simple thread switch, and specify some fundamental properties for processes.

[Figure: a process with a single thread and a process with n threads. Each thread has its own stack, registers and counter; code, data and files appear once per process.]

The implementation of threads and processes differs depending on the operating system, but frequently a thread is a component of a process: it belongs to exactly one process, outside which it cannot exist. Each thread represents a separate flow of control. Multiple threads can be associated with one process; they execute concurrently and provide a mechanism to improve application performance through parallelism. In a nutshell, threads represent a software approach to improving the performance of operating systems by reducing the overhead of process switching.

A thread is a flow of execution through the process code, having its own program counter that keeps track of which instruction to execute next, as well as system registers, which hold its current working variables, and a stack, which contains the execution history. Every thread is uniquely identified by a thread identifier. Peer threads share some information, such as the code and data segments. When one thread alters a code memory item, all other threads see the change.

Figure 3.1 – Possible Transitions between Thread States (states shown: Ready, Running, Blocked)

We define a thread type as a structure consisting of multiple fields, such as the thread's identifier, its current state and the memory region for its stack.

type memory_region = {
  // Start address
  int start;
  // Region length
  int length;
}

type state =
| Ready
| Running
| Blocked

type thread = {
  // Identifier
  int id;
  // Current state
  state crt_state;
  // Stack
  memory_region stack;
}

The thread's stack is identified by its start address and its length. The state of a thread is defined as a variant having three alternatives: Running (the thread is currently executing), Ready (the thread is currently awaiting execution and could potentially be started) and Blocked (the thread has exhausted its allocated time or is waiting for an event to occur; it must be unblocked before being able to execute). The possible transitions between states are shown in Figure 3.1. A thread's current state determines the valid transitions.

Similarly, a process is defined as a structure consisting of an internal identifier, an identifier for the thread that is currently executing, an address space and an array of possibly inactive threads associated with it. Whether a thread in the thread array is active or has terminated is indicated by a variant of type option. An inactive thread, indicated by None, is a thread that terminated its execution and whose slot in the array of associated threads has not been reallocated. In contrast, a blocked thread, indicated by Some, is a thread that cannot execute currently but should execute in the future, once the resources it is waiting for are freed. We consider a segmented address space, with addresses existing not in a single linear range but instead in multiple segments, corresponding to the code, the data and the stack, respectively.

type option<A> =
| None
| Some (A a)

type address_space = {
  memory_region code;
  memory_region data;
  memory_region stack;
}

type process = {
  // Array of associated threads
  array<option<thread>> threads;
  // Internal id
  int pid;
  // Currently running thread
  int crt_thread;
  // Address space
  address_space adr_space;
}

Next, we consider a simple predicate called stop_thread, having two possible execution scenarios, as indicated by its two exit labels, true and invalid. When the given input index i corresponds to an active thread, the predicate executes successfully, thus exiting with true. In this case, the state of the i-th thread associated with the input process is set to Blocked, and the new state of the process is returned in the output out. Otherwise, when the given index i corresponds to a thread that is Ready, or when there is no active thread at that index, the predicate exits with the label invalid and no output is generated.

public stop_thread(process in, int i, process out+)
-> [true, invalid]
program {{array<option<thread>> ta; state s; thread ti;
          option<thread> tio}}
{
  // Copy in to out
  out = in;
  // Fetch in.threads and copy it to ta
  process.threads(in, ta+);
  // Get the array's i-th element
  [oob: invalid] get_elem(ta, i, tio+);
  // Check if the i-th element is active
  switch (tio)
  case Some (thread th): ti = th;
  case None: raise invalid;
  // Get the thread's current state
  thread.crt_state(ti, s+);
  // Check whether the transition is valid
  [false: invalid] state.case[Running](s);
  // Create the new state for the running thread
  state.Blocked(s+);
  // Set the newly created state
  thread.crt_state(ti+, s);
  // Reset tio to the thread with the modified state
  option.Some(tio+, ti);
  // Reset the i-th thread and return the new state ta
  [oob: invalid] set_ei(ta, i, tio, ta+);
  // Update out.threads to ta
  process.threads(out+, ta);
}
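The functional character of stop_thread (the input process is never modified; a new process value is returned together with an exit label) can be sketched in Python. This is a simplified, hypothetical rendering: stacks and the address space are omitted, immutable dataclasses and tuples stand in for Smart structures and arrays, and the exit label is returned as a string.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    READY = "Ready"
    RUNNING = "Running"
    BLOCKED = "Blocked"


@dataclass(frozen=True)
class Thread:
    id: int
    crt_state: State


@dataclass(frozen=True)
class Process:
    threads: Tuple[Optional[Thread], ...]  # None marks an inactive slot
    pid: int
    crt_thread: int


def stop_thread(proc: Process, i: int) -> Tuple[str, Optional[Process]]:
    """Return ("true", new_process) or ("invalid", None)."""
    if not 0 <= i < len(proc.threads):       # oob is mapped to invalid
        return "invalid", None
    th = proc.threads[i]
    if th is None:                           # case None: raise invalid
        return "invalid", None
    if th.crt_state is not State.RUNNING:    # false is mapped to invalid
        return "invalid", None
    stopped = replace(th, crt_state=State.BLOCKED)
    threads = proc.threads[:i] + (stopped,) + proc.threads[i + 1:]
    return "true", replace(proc, threads=threads)
```

Every label transformer of the Smart body becomes an early return here, which makes explicit that no output is produced on the invalid label.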

Another auxiliary predicate, called start_thread, when given a valid index of an unblocked thread, sets the state of the i-th thread to Running. It is implemented similarly, as shown below.

public start_thread(process in, int i, process out+)
-> [true, invalid]
program {{array<option<thread>> ta; state s; thread ti;
          option<thread> tio}}
{
  // Copy in to out
  out = in;
  // Fetch in.threads and copy it to ta
  process.threads(in, ta+);
  // Get the array's i-th element
  [oob: invalid] get_ei(ta, i, tio+);
  // Check if the i-th thread is active
  switch (tio)
  case Some (thread th): ti = th;
  case None: raise invalid;
  thread.crt_state(ti, s+);
  // Check whether the transition is valid
  [false: invalid] state.case[Ready](s);
  // Create the new state for the running thread
  state.Running(s+);
  // Set the newly created state
  thread.crt_state(ti+, s);
  // Reset tio to the thread with the modified state
  option.Some(tio+, ti);
  // Set the i-th element and return the new state ta
  [oob: invalid] set_ei(ta, i, tio, ta+);
  // Update out.threads to ta
  process.threads(out+, ta);
}

These two predicates will be called by the predicate run_thread, which performs a simple thread switch. It stops the thread currently executing, indicated by crt_thread, and starts the one with the given index i. The new state of the process is returned in the output out.

public run_thread(process in, int i, process out+)
-> [true, inval]
program {{int crt}}
{
  process.crt_thread(in, crt+);
  [true: true, invalid: inval] stop_thread(in, crt, out+);
  [true: true, invalid: inval] start_thread(out, i, out+);
  process.crt_thread(out+, i);
}

Next, we introduce a fundamental property for any valid process state, namely the fact that the stack regions of all its associated threads are completely disjoint.

public not_disjoint(process p) -> [true, false]
inductive
case StacksJoint (int i, int j)
  {{thread ti; thread tj; memory_region sti; memory_region stj}}
  i != j;
  [None: false] thread(p, i, ti+);
  [None: false] thread(p, j, tj+);
  thread.stack(ti, sti+);
  thread.stack(tj, stj+);
  overlap(sti, stj)

case CodeStackJoint (int i)
  {{thread ti; memory_region sti; address_space as; memory_region code}}
  [None: false] thread(p, i, ti+);
  thread.stack(ti, sti+);
  process.adr_space(p, as+);
  address_space.code(as, code+);
  overlap(sti, code)

case DataStackJoint (int i)
  {{thread ti; memory_region sti; address_space as; memory_region data}}
  [None: false] thread(p, i, ti+);
  thread.stack(ti, sti+);
  process.adr_space(p, as+);
  address_space.data(as, data+);
  overlap(sti, data)

public disjoint_stacks(process p) -> [true, false]
program
{
  !not_disjoint(p)
}

This property is expressed using an inductive predicate that characterizes the potential situations in which the memory isolation of the different associated threads of a process can be broken. The natural manner of expressing such a property in Smart is by using a scheme, as presented in Section 3.1.4; here we use an inductive predicate because the language we are working with, which will be presented in Chapter 4, does not support schemes. In our inductive predicate, the first case, StacksJoint, checks whether there exist two different threads having overlapping stacks. The next two cases, CodeStackJoint and DataStackJoint, check whether there exists a thread whose stack overlaps the process's code segment or data segment, respectively. This uses an auxiliary predicate verifying if two memory regions overlap, i.e. if there exists an address that is contained simultaneously by two different segments. This operation is symmetric; we express this property with the lemma overlap_sym.

public contains(memory_region m, int address)
-> [true, false]
implicit program

public overlap(memory_region m1, memory_region m2)
-> [true, false]
inductive
case InBoth (int address)
  contains(m1, address) && contains(m2, address)

public lemma overlap_sym(memory_region m1, memory_region m2)
-> [true, false]
program
{
  overlap(m1, m2) => overlap(m2, m1)
}
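As an executable illustration of the InBoth case and of the symmetry lemma, the sketch below models memory regions in Python. Since contains is an implicit predicate in the Smart model, its meaning here (a half-open interval starting at start and spanning length addresses) is an assumption, and overlap_fast is an additional, hypothetical closed-form check that is not part of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRegion:
    start: int
    length: int


def contains(m: MemoryRegion, address: int) -> bool:
    # Assumed meaning of the implicit predicate: [start, start + length).
    return m.start <= address < m.start + m.length


def overlap(m1: MemoryRegion, m2: MemoryRegion) -> bool:
    # Case InBoth: there exists an address contained in both regions.
    # Only addresses of m1 can be witnesses, so the search is finite.
    return any(contains(m2, a) for a in range(m1.start, m1.start + m1.length))


def overlap_fast(m1: MemoryRegion, m2: MemoryRegion) -> bool:
    # Equivalent interval test without enumerating addresses.
    lo = max(m1.start, m2.start)
    hi = min(m1.start + m1.length, m2.start + m2.length)
    return lo < hi
```

Checking overlap(m1, m2) == overlap(m2, m1) over a handful of regions exercises the statement of overlap_sym, although it is of course a test and not a proof.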

3.2 ProvenTools

ProvenTools is a comprehensive set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain has the structure of a set of Eclipse plug-ins of JDT type (Java Development Tools). Together these constitute a complete Integrated Development Environment (IDE), allowing one not only to write, edit and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover and, finally, to generate executable code in C or Java.

The plug-ins are based on Xtext (Xtext Documentation), an official Eclipse plug-in dedicated to the creation of DSLs (Domain Specific Languages) in Eclipse. Xtext-based DSLs are described in an EBNF (Extended Backus-Naur Form) grammar language. Fully statically typed expressions can be embedded in the developed DSL, and Java-style scoping and linking are supported.

Figure 3.2 – The ProvenTools Toolchain (front-end: Smart code and specifications, Smil; back-end: prover, producing proof obligations and proofs, and code generators, producing C code and Java code)

Concretely, the toolchain includes a compiler whose front-end contains the plug-in in charge of Smart, as well as the plug-in dedicated to Smil, the Smart Intermediate Language, to which Smart programs and specifications are translated. Smil is a simpler form of the Smart language. Though roughly equivalent to Smart, Smil has a rather different form, manipulating less complex structures and having no syntactic sugar. Harder for a human reader to understand, Smil is meant to be easily manipulated by the back-end of the toolchain. The back-end currently offers a C code generator and an interactive prover. An overview of this architecture is shown in Figure 3.2.

While employing ProvenTools, the code undergoes various compilation steps and transformations. During the compilation chain, the Smart code is transformed to a Smart AST (Abstract Syntax Tree). The obtained AST is then compiled to a Smil AST. Next, the Smil AST is transformed to Smil source code and then reinserted in the compilation chain by the plug-in in charge of it.

After completing the entire compilation chain and obtaining the Smil AST and the associated Smil source code, the back-end of the compiler can be employed. The back-end comprises a source code generator and a prover. The generator transforms Smart models into their equivalents in C.


Figure 3.3 – Smart Editor

Smart Editor. The Smart editor provides facilities to edit Smart code and supports broad and complex features such as syntax highlighting, facilities for code navigation and visualization, and editing assistants, including word completion and quick fixes. A snapshot of it is shown in Figure 3.3.

Prover. ProvenTools provides users with a dedicated view for interacting with the prover. This presents the existing proof obligations and provides facilities to solve them. Proof obligations are generated for any logical lemma, precondition, postcondition or invariant included in the Smart models. Additionally, any label that remains unhandled in the code triggers the generation of a proof obligation, thus enforcing that each possible exit label of a predicate is either explicitly handled or proven to be impossible.

An automatic prover, trying various proof search procedures, is called whenever a proof obligation is generated. It uses previously proven obligations or existing hypotheses for discharging new obligations automatically. Unproved obligations can be solved by interactively employing manual tactics, called hints, which are provided in the IDE. Hints that are considered useless with respect to the currently selected proof obligations are automatically disabled. Additionally, users can define strategies, i.e. proof patterns, and employ an interactive proof assistant that applies them automatically in the background. This will suggest a possible proof as soon as it finds one. Proofs thus found are rechecked as if they had been done manually.


ProvenTools offers facilities to inspect any manual or automatic proof step, thus making a later review of the proofs possible. The toolchain also provides a dedicated system for assisting the user in adapting former proofs to changes due to code maintenance or evolution.

C Code Generator. The executable part of Smart models is translated to executable C code by the C code generator. To this end, the executable parts of the Smart models are identified and extracted, while the logical parts are discarded. Users can guide this process through annotations, and they can specify that particular values are purely logical. Functional implementations are transformed to imperative ones: the dedicated C code generation plug-in tries to replace functional modifications of structures in the models by in-place updates. Such transformations are correct only if the different values are handled linearly in the Smart code, i.e. if no previous value is read after applying a functional update on it. For ensuring the safety of functional-to-imperative code transformations, the C generation plug-in employs various global static analyses. When safety cannot be guaranteed, the generator reports errors or introduces copies, if the users deemed it acceptable.

In earlier experiments (Lescuyer 2015), the Prove & Run team was able to generate C code for a complete model of ProvenCore that did not require dynamic allocation and ran at a speed comparable to the original C code.

3.3 Smil

Smil is an intermediate language to which Smart models are compiled. Similarly to Smart, Smil is a functional language with algebraic data types (structures and variants). However, unlike Smart, Smil is not a user-oriented language, i.e. it was not designed for writing programs directly, but rather to provide a representation of Smart programs at a different level of abstraction. Thus, reading Smil code is a rather cumbersome task, as it is a language without syntactic sugar, meant to serve as a starting point for the main components of the ProvenTools back-end exploiting Smart models: the prover and the code generator.

To give an idea of Smil's syntax, we illustrate below the types thread and process, as well as the stop_thread predicate from our abstract process manager example given in Section 3.1.5.

public type state =
| Ready
| Running
| Blocked

public type thread = {
  id : int;
  crt_state : state;
  stack : memory_region;
}

public state_acopy_ahypothesis(state state_1) -> [true]
hypothesis {{state state_2}}
[<1>] state.switch(state_1) -> [Ready -> 5, Running -> 4, Blocked -> 3]
[<2>] ==<state>(state_1, state_2) -> [true -> true, false -> error]
[<3>] state.Blocked(state_2) -> [true -> 2]
[<4>] state.Running(state_2) -> [true -> 2]
[<5>] state.Ready(state_2) -> [true -> 2]

public thread_ahypothesis(thread x1) -> [true]
hypothesis {{thread x2; int zid; state zcrt_state; memory_region zstack}}
[<1>] thread.all(x1, zid, zcrt_state, zstack) -> [true -> 2]
[<2>] thread.new(x2, zid, zcrt_state, zstack) -> [true -> 3]
[<3>] ==<thread>(x1, x2) -> [true -> true, false -> error]

The type declarations in Smil strongly resemble their Smart counterparts. Predicate declarations also mirror the form found in Smart, except that in Smil any output variable associated with the true exit label is explicitly declared as such. Preconditions and postconditions are appended to any predicate and, as shown above, a hypothesis is added for any explicitly declared type.

The real syntax differences are visible in predicate implementations: every statement is preceded by a numerical label, and every possible exit label lbl of the statement indicates another numerical label. The latter numerical label designates the statement that will be executed next if the current statement exits with label lbl. In particular, this mechanism replaces the try/catch and the conditional control constructs, as well as the logical operators and any other construct based on label transformers, described in Section 3.1.2. Thus, the predicate bodies are very similar in form to a control flow graph, where the statements represent the nodes of the graph and the exit labels represent transitions.

public stop_thread(process in, int i, process out+)
-> [true <out>, invalid]
pre [<0>] true() -> [true -> true]
{{array<option<thread>> ta; state s; thread ti;
  option<thread> tio; thread th}}
[<1>] =<process>(out, in) -> [true -> 2]
[<2>] process.threads(in, ta) -> [true -> 3]
[<3>] get<option<thread>>(ta, i, tio) -> [true -> 4, oob -> invalid]
[<4>] option.switch<thread>(tio, th) -> [None -> 6, Some -> 7]
[<5>] state.Blocked(s) -> [true -> 8]
[<6>] true() -> [true -> invalid]
[<7>] =<thread>(ti, th) -> [true -> 5]
[<8>] thread.crt_state+(ti, ti, s) -> [true -> 9]
[<9>] option.Some<thread>(tio, ti) -> [true -> 10]
[<10>] set<option<thread>>(ta, i, tio, ta) -> [true -> 11, oob -> invalid]
[<11>] set<option<thread>>(ta, i, tio, ta) -> [true -> 12, oob -> invalid]
[<12>] process.threads+(out, out, ta) -> [true -> true]
post true 0
post invalid 0
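The way numerical labels turn a predicate body into a control flow graph can be made concrete with a tiny interpreter. The Python sketch below is an assumption-laden illustration, not ProvenTools code: a body is a dictionary from node numbers to a statement and a successor map, statements are Python functions that read and extend an environment and return an exit label, and execution stops when a successor is a predicate exit label (a string) rather than a node number. The two-node graph mirrors only statements 3 and 4 of the listing above.

```python
from typing import Callable, Dict, Tuple, Union

Env = Dict[str, object]
Successors = Dict[str, Union[int, str]]
Cfg = Dict[int, Tuple[Callable[[Env], str], Successors]]


def run(cfg: Cfg, env: Env, entry: int = 1) -> Tuple[str, Env]:
    """Follow exit labels from node to node until a predicate exit label is reached."""
    node: Union[int, str] = entry
    while not isinstance(node, str):
        statement, successors = cfg[node]
        label = statement(env)       # the statement exits with some label
        node = successors[label]     # the label selects the next node
    return node, env


def get(env: Env) -> str:
    # Array access: exits with "oob" when the index is out of bounds.
    ta, i = env["ta"], env["i"]
    if 0 <= i < len(ta):
        env["tio"] = ta[i]
        return "true"
    return "oob"


def option_switch(env: Env) -> str:
    # Variant switch on an option value, modelled here with None.
    if env["tio"] is None:
        return "None"
    env["th"] = env["tio"]
    return "Some"


CFG: Cfg = {
    1: (get, {"true": 2, "oob": "invalid"}),
    2: (option_switch, {"None": "invalid", "Some": "true"}),
}
```

Calling run(CFG, {"ta": ["t0", None], "i": 0}) walks node 1, then node 2, and stops at the exit label "true".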

In a nutshell, Smil constitutes a representative, albeit restricted, set of constructs, and it is a language designed to be well-suited for further transformations and analyses.

The next chapter focuses entirely on αSmil, the computational version of Smil with which we are working throughout the rest of this thesis. We will illustrate its usage and describe its abstract syntax and formal semantics.


Chapter 4

The αSmil Language

One day I will find the right words, and they will be simple.

Jack Kerouac

In this chapter, we define the syntax and the semantics of αSmil, the language that we consider in this thesis. This is a computational version of Smil (presented in Section 3.3), which is essentially a subset of Smart, presented in the previous chapter (Chapter 3). However, it contains a few additional elements, introduced for the purpose of this thesis.

The αSmil language is a first-order, purely functional and strongly-typed language with arrays and algebraic data types, i.e. structures and variants. It is an intermediate, analysis-oriented language.

4.1 αSmil Syntax

The αSmil language is minimal in the sense that it contains only those constructs that are needed for the purpose of this thesis. For instance, unlike Smart and Smil, the language does not contain visibility modifiers, because these modifiers play no role in the techniques presented in the sequel. During the introduction of the grammar, we will point out the most important deviations from Smart and Smil.

Programs. A program in αSmil consists of a number of type and constant declarations and definitions, followed by a collection of predicates. In contrast to Smart and Smil, type and predicate declarations have no visibility modifiers (such as public) and they are not organized into modules. The absence of visibility modifiers is a natural consequence of the disappearance of modules. We assume that there is one module in which every type, constant and predicate declaration resides, and these are mutually visible to each other. These restrictions are made for the sake of simplicity, since the techniques proposed in this thesis are orthogonal to the concepts of visibility and modules.

Constants are declared using the keyword const, followed by the type and the constant identifier. Constant identifiers are written in upper-case letters and are preceded by a special symbol.


Types are declared using the keyword type, followed by the type identifier and optionally, in the case of polymorphic type declarations, by a number of type parameters given in upper-case letters between < and >. In the case of implicit types, this constitutes the complete type declaration. Explicit type declarations continue with the symbol = and the type's definition. Throughout the rest of this chapter and the presentation of our static analyses, we will ignore polymorphism. The abstract types of our analyses are not polymorphic, and the impact of polymorphism is visible only at the implementation level, for type substitutions, which will be discussed in Chapter 8.

Types. Similarly to Smart, algebraic data types, i.e. structures and variants, and associative arrays are supported. We let T be the universe of type identifiers and T0 ⊂ T the set of base type identifiers. We assume a set of identifiers for structure fields and variant constructors, denoted by F and C, respectively.

A structure represents the Cartesian product of the different types of its elements, called fields. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag called the constructor. Arrays group elements of data of the same type (given in angle brackets) into a single entity; elements are selected by an index whose type is included (as denoted by the superscript) in the array's definition as well.

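Definition 4.1.1 (Types τ ∈ T in αSmil).

\[
\begin{array}{rcll}
\tau \in T, \quad \tau & ::= & \tau_0 \in T_0 & \text{base types} \\
 & \mid & \mathtt{struct}\{f_1 : \tau, \ldots, f_n : \tau\}, \quad f_i \in F,\ 0 \le n & \text{structures} \\
 & \mid & \mathtt{variant}[C_1 : \tau \mid \ldots \mid C_m : \tau], \quad C_i \in C,\ 1 \le m & \text{variants} \\
 & \mid & \mathtt{arr}^{\tau}\langle \tau \rangle & \text{arrays}
\end{array}
\]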

Variants and structures can be used together to model traditional algebraic variants with zero or several parameters. For instance, a generic type option<T> is actually modeled as

variant[Some : struct{t : T} | None : struct{}]
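The same encoding, a variant whose constructors each carry a structure (possibly with no fields), can be sketched in Python. The class names Some and Nothing and the helper get_or are hypothetical and only illustrate the shape of the type.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Some(Generic[T]):
    # Constructor Some carries a structure with a single field t of type T.
    t: T


@dataclass(frozen=True)
class Nothing:
    # Constructor None carries a structure with no fields.
    pass


# The variant is the disjoint union of its constructors.
Option = Union[Some[T], Nothing]


def get_or(opt: "Option[T]", default: T) -> T:
    """Destructure the variant: one branch per constructor."""
    if isinstance(opt, Some):
        return opt.t
    return default
```

Modelling constructor arguments as structures keeps the grammar small: a constructor with several parameters simply carries a structure with several fields.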

Concretely, structures are declared and defined by indicating a set of pairs of field identifiers and their corresponding types between { and }. Declaring structures with no fields is possible. Variants are declared and defined by indicating the list of their constructors, each starting with an upper-case letter, preceded by the symbol |. Unlike structures, variants must have at least one declared constructor. For instance, the state and thread types from our Abstract Process Manager example, given in Smart in Section 3.1.5 on page 48, have the following Smil declaration:

type state =
| Ready
| Running
| Blocked

type thread = {
  id : int;
  crt_state : state;
  stack : memory_region;
}


In contrast to Smart, in structure declarations the field name precedes the field type.

Predicates. Predicates are declared using the keyword predicate, which is specific to αSmil, followed by a predicate identifier and a signature. A signature is given by a sequence of input types and a non-empty finite mapping of exit labels λ ∈ L \ {error} to sequences of outputs. The set of exit labels L contains three distinguished elements: true, false and error. The latter cannot appear in predicate signatures; it is used as a sink node in control flow graphs, which will be presented in Section 4.2. We write signatures in the following manner:

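\[
\sigma \;=\; \underbrace{(x_1 : \tau_1, \ldots, x_n : \tau_n)}_{\text{input identifiers: types}}\;
\underbrace{[\lambda_1 : (\tau_{11}\, y_{11}, \ldots, \tau_{1k_1}\, y_{1k_1}) \mid \ldots \mid \overbrace{\lambda_p : (\tau_{p1}\, y_{p1}, \ldots, \tau_{pk_p}\, y_{pk_p})}^{\text{label: (output types, identifiers)}}]}_{p \text{ possible exit labels}}
\]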

We denote by Σ the mapping between predicate identifiers and their signatures.

The predicate declaration is followed by the predicate's body. Depending on its body's nature, a predicate will be implicit, explicit or inductive. Smart implicit and explicit predicates have been presented in Section 3.1.1 of our previous chapter, while inductive predicates have been illustrated in Section 3.1.4 on page 46. For implicit predicates, the body consists solely of the keyword implicit. For explicit predicates, an optional declaration unit can follow. This is a finite mapping from variables to types, and it must be given between double curly braces, i.e. {{type_id v_identifier}}. Input and output parameters must be different from all the variables appearing in the declaration units. Declaration units are followed by a sequence of statements representing calls to predicates.

Just as presented in Section 3.1.4 for Smart, an inductive predicate is syntactically distinguished by the keyword inductive, followed by its different cases, declared with the keyword case followed by an identifier, an optional list of existentially quantified variables and a body of statements.

A generic call to a predicate p is of the form

p(e1, ..., en) [λ1 : ō1 | ... | λm : ōm]

The predicate p is called with inputs e1, ..., en and yields one of the declared exit labels λ1, ..., λm, each having its own set of associated output variables ō1, ..., ōm, respectively. We denote by ō a sequence of 0 or more output variables.

Statements. The αSmil language supports the statements presented in Table 4.2. These represent calls to built-in predicates and can be seen as special cases of the predicate call presented above. All statements have a functional nature and handle immutable data. A statement consists of as many variables as there are input types in the signature σp of the called built-in predicate p, and a mapping associating to each exit label of σp a sequence of variables, one variable for each output type in the corresponding sequence.

s ::= o = e                                       (1)   assignment
    | e1 = e2                                     (2)   equality test
    | nop                                         (3)   no operation
    | r = {e1, ..., en}                           (4)   create structure
    | {o1, ..., on} = r                           (5)   destructure structure
    | o = r.fi                                    (6)   access field
    | r' = {r with fi = e}                        (7)   update field
    | r' =⟨f1, ..., fk⟩ r''                       (8)   check (partial) structure equality
    | v = Cp[e]                                   (9)   create variant
    | switch(v) as [o1 | ... | on]                (10)  destructure variant
    | v ∈ {C1, ..., Ck}                           (11)  variant possible
    | o = a[i]                                    (12)  array access
    | a' = [a with i = e]                         (13)  array update
    | p(e1, ..., en) [λ1 : o1 | ... | λm : om]    (14)  predicate call

Table 4.2 – αSmil – Set of Supported Statements

The first three statements are generic and can be applied to any type. Statement (1) is a call to the built-in assignment predicate, denoted by =, present in an identical form in Smart as well. Statement (2) is a call to the logical operator =, verifying whether its two input arguments are equal. Statement (3) is the αSmil equivalent of a no-operation. As a general convention for the statements' notation, we denote by e the identifiers of entry variables and by o the identifiers of output variables.

Statements (4) – (8) are structure-related. The first of them, statement (4), is the constructor of a structure r of type rtype having n fields. It corresponds to the statement rtype.new(r+, e_1, ..., e_n) in Smart. Statement (5) returns the values of all the fields of r into the output parameters o1, ..., on, and it is the equivalent of rtype.all(r, o_1+, ..., o_n+) in Smart. Statement (6) is the individual accessor of a field fi and corresponds to rtype.f_i(r, e_i+) in Smart. As previously mentioned, our language is purely functional and handles only immutable algebraic data structures and arrays. Therefore, setting the field fi of a structure, shown in (7) and being the equivalent of rtype.f_i(r'+, e_i), returns a new structure where all fields have the same value as in r, except fi, which is set to ei. Statement (8) verifies if the values of the indicated subset of fields of two structures r' and r'' are equal. It exists in Smart as well, where it has a similar syntax: rtype.equals[f, g](r', r'') for checking that the values of fields f and g of the two structures are equal, or the dual rtype.equals-[f, g](r', r'') for checking that the values of all fields except f and g are equal.

The next group of statements is variant-related. The first of them, statement (9), creates a new variant v of type vtype using the constructor Cp with e as an argument. It corresponds to vtype.Cp(v+, e) in Smart. Statement (10) is used for matching on the different constructors of the input variant v and corresponds to switch(v) case ... in Smart. The last statement of this group, statement (11), verifies if the given variant was created with one of the constructors in C1, ..., Ck. This could be obtained with a variant switch, but for practical considerations it has been provided as a built-in predicate. Its counterpart in Smart is vtype.case[C1, ..., Ck](v).

Statements (12) and (13) are array-related: (12) returns the value of the i-th cell of the input array a. Similarly to (7), updating the i-th cell of an array, shown in (13), has a functional nature. It returns a new array where all cells have the same values as in a, except the i-th cell, which is set to e. These statements are specific to αSmil.

Statement (14) is a generic call to a predicate p and has been presented on page 61.
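The functional nature of the update statements (7) and (13), and the use of an exit label to signal an out-of-bounds index in (12) and (13), can be sketched in Python as follows. The encoding is an assumption made for illustration: tuples play the role of immutable arrays, a frozen dataclass plays the role of a structure, and each array operation returns a pair made of an exit label and an optional output.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, TypeVar

E = TypeVar("E")


@dataclass(frozen=True)
class Thread:
    id: int
    crt_state: str


def update_state(r: Thread, e: str) -> Thread:
    # Statement (7): a new structure equal to r except for one field.
    return replace(r, crt_state=e)


def array_get(a: Tuple[E, ...], i: int) -> Tuple[str, Optional[E]]:
    # Statement (12): output only on label true; false signals out of bounds.
    if 0 <= i < len(a):
        return "true", a[i]
    return "false", None


def array_set(a: Tuple[E, ...], i: int, e: E) -> Tuple[str, Optional[Tuple[E, ...]]]:
    # Statement (13): a new array equal to a except at index i.
    if 0 <= i < len(a):
        return "true", a[:i] + (e,) + a[i + 1:]
    return "false", None
```

Neither update touches its input: after array_set(a, 0, x) the tuple a still holds its previous contents, which is the property the C code generator's linearity analysis relies on when it turns such updates into in-place writes.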

Exit Labels. All of the built-in supported statements have an associated set of exit labels λ ∈ L \ {error}. These are indicated in Table 4.3. There are two distinguished exit labels, true and false, respectively. An additional built-in label, called error, is used as a sink node in control flow graphs. It cannot be used as an exit label for a predicate.

Table 4.3 – Statements and their Exit Labels

Statement                                          Exit Labels
o = e                                       (1)    [true ↦ o]
e1 = e2                                     (2)    [true ↦ ∅ | false ↦ ∅]
nop                                         (3)    [true ↦ ∅]
r = {e1, ..., en}                           (4)    [true ↦ r]
{o1, ..., on} = r                           (5)    [true ↦ o1, ..., on]
o = r.fi                                    (6)    [true ↦ o]
r' = {r with fi = e}                        (7)    [true ↦ r']
r' =⟨f1, ..., fk⟩ r''                       (8)    [true ↦ ∅ | false ↦ ∅]
v = Cp[e]                                   (9)    [true ↦ v]
switch(v) as [o1 | ... | on]                (10)   [λC1 ↦ o1 | ... | λCn ↦ on]
v ∈ {C1, ..., Ck}                           (11)   [true ↦ ∅ | false ↦ ∅]
o = a[i]                                    (12)   [true ↦ o | false ↦ ∅]
a' = [a with i = e]                         (13)   [true ↦ a' | false ↦ ∅]
p(e1, ..., en) [λ1 : o1 | ... | λm : om]    (14)   [λ1 ↦ o1 | ... | λm ↦ om]

As shown in Table 4.3, statement (10) has an exit label $\lambda_{C_i}$ corresponding to each constructor $C_i$ of the input variant. Statements (2), (8) and (11) are bi-labeled, using true and false as logical values. None of them has any associated outputs. Statements (12) and (13) are bi-labeled as well. However, unlike the previously mentioned statements, they use the label false as an "out of bounds" exception and generate an output only for the label true. All other statements except (14) are uni-labeled: they associate all their output parameters (if any) to the label true. In contrast to Smart, in αSmil every exit label, including true, must be explicitly indicated. Furthermore, any output is explicitly associated to an exit label.

In Section 3.1.5 (on page 50) of our previous chapter, we introduced a Smart predicate called stop_thread. If the given index i designates an active associated thread, this predicate sets its state to Blocked and returns the new state of the process. Otherwise, the predicate exits with label invalid. Revisiting it, we can finally indicate its body in the αSmil language.¹

Table 4.4 – Predicate Body in αSmil

    Signature
        predicate stop_thread(process p, int i)
        -> [true: process o | invalid]

    Declaration unit
        array<option_thread> ta
        option_thread th
        thread ti
        state s

    Predicate body
        0   ta = p.threads                   [true -> 1]
        1   th = ta[i]                       [true -> 2, false -> 9]
        2   switch(th) as [ti | ]            [Some -> 3, None -> 9]
        3   s = Blocked                      [true -> 4]
        4   ti = {ti with crt_state = s}     [true -> 5]
        5   th = Some(ti)                    [true -> 6]
        6   ta = [ta with i = th]            [true -> 7, false -> 9]
        7   o = {p with threads = ta}        [true -> 8]
        8   [true]
        9   [invalid]

¹ The αSmil version is slightly simplified, as we are not checking whether the transition to Blocked is valid.

Every statement in our stop_thread example is followed by a construct of the form exit_label -> numerical_label. This indicates the statement to be executed next, as identified by the numerical_label, if the current statement exits with label exit_label. For example, when the first statement, ta = p.threads, exits with label true, the predicate's execution continues with the statement having the numerical label 1. We remark that the predicate's exit labels are included in the body of an explicit predicate, as can be seen at lines 8 and 9 for true and invalid, respectively. Intuitively, the predicate's body resembles a control flow graph and can be illustrated as shown in Figure 4.1. The predicate's exit labels are the control flow graph's exit nodes, as will be discussed in Section 4.2.
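The label-driven reading of the body described above can be sketched in Python. This is only an illustration, not the thesis' implementation: the dictionary encoding of processes and threads, the use of Python's None for the None constructor (with Some(ti) represented by the thread itself), and all helper names are assumptions made for the example.

```python
def stop_thread(p, i):
    """Execute the body point by point; return (exit_label, outputs)."""
    env = {"p": p, "i": i}

    def read_threads():                 # 0: ta = p.threads
        env["ta"] = env["p"]["threads"]
        return "true"

    def read_cell():                    # 1: th = ta[i]
        if not 0 <= env["i"] < len(env["ta"]):
            return "false"              # "out of bounds"
        env["th"] = env["ta"][env["i"]]
        return "true"

    def match_option():                 # 2: switch(th) as [ti | ]
        if env["th"] is None:
            return "None"
        env["ti"] = env["th"]
        return "Some"

    def make_state():                   # 3: s = Blocked
        env["s"] = "Blocked"
        return "true"

    def set_state():                    # 4: ti = {ti with crt_state = s}
        env["ti"] = {**env["ti"], "crt_state": env["s"]}
        return "true"

    def wrap_some():                    # 5: th = Some(ti)
        env["th"] = env["ti"]
        return "true"

    def write_cell():                   # 6: ta = [ta with i = th]
        if not 0 <= env["i"] < len(env["ta"]):
            return "false"
        k = env["i"]
        env["ta"] = env["ta"][:k] + [env["th"]] + env["ta"][k + 1:]
        return "true"

    def write_threads():                # 7: o = {p with threads = ta}
        env["o"] = {**env["p"], "threads": env["ta"]}
        return "true"

    # Each program point: (statement, {exit label -> next program point}).
    body = {
        0: (read_threads, {"true": 1}),
        1: (read_cell, {"true": 2, "false": 9}),
        2: (match_option, {"Some": 3, "None": 9}),
        3: (make_state, {"true": 4}),
        4: (set_state, {"true": 5}),
        5: (wrap_some, {"true": 6}),
        6: (write_cell, {"true": 7, "false": 9}),
        7: (write_threads, {"true": 8}),
    }
    exits = {8: "true", 9: "invalid"}   # the predicate's own exit labels

    point = 0
    while point not in exits:
        statement, successors = body[point]
        point = successors[statement()]
    label = exits[point]
    return label, ({"o": env["o"]} if label == "true" else {})
```

Because every update builds a new dictionary or list, the input process is left untouched, which mirrors the purely functional nature of the statements.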

[Figure 4.1 – Body of the stop_thread Predicate: control flow graph whose nodes 0 to 7 carry the statements of the predicate body, with the edges labeled false and None leading to the exit node 9 (invalid) and the remaining path leading to the exit node 8 (true).]

We are working with αSmil, which is a computational version of Smil where all specification-only predicates have been removed. Simulating hypotheses, lemmas and contracts is straightforward and can be achieved using predicates having only the true and false labels and no associated output. Inductives are the only exception to this rule: they are supported in αSmil as well, and their declaration is similar to the one in Smart. The αSmil equivalent of the not_disjoint inductive presented in our Abstract Process Manager example (on page 46) has the following form:

    predicate not_disjoint(process p)
    -> [true | false]
    inductive

    case StacksJoint(int i, int j)
        thread ti
        thread tj
        memory_region sti
        memory_region stj

        i = j                                [true -> 1, false -> 7]
        thread(p, i)[true: ti | None]        [true -> 2, None -> 7]
        thread(p, j)[true: tj | None]        [true -> 3, None -> 7]
        sti = ti.stack                       [true -> 4]
        stj = tj.stack                       [true -> 5]
        overlap(sti, stj)[true | false]      [true -> 6, false -> 7]
        [true]
        [false]
        [error]

    case CodeStackJoint(int k)
        thread tk
        memory_region stk
        address_space asp
        memory_region code

        thread(p, k)[true: tk | None]        [true -> 1, None -> 6]
        stk = tk.stack                       [true -> 2]
        asp = p.adr_space                    [true -> 3]
        code = asp.code                      [true -> 4]
        overlap(stk, code)[true | false]     [true -> 5, false -> 6]
        [true]
        [false]
        [error]

    case DataStackJoint(int l)
        thread tl
        memory_region stl
        address_space aspace
        memory_region data

        thread(p, l)[true: tl | None]        [true -> 1, None -> 6]
        stl = tl.stack                       [true -> 2]
        aspace = p.adr_space                 [true -> 3]
        data = aspace.data                   [true -> 4]
        overlap(stl, data)[true | false]     [true -> 5, false -> 6]
        [true]
        [false]
        [error]

    predicate disjoint_stacks(process p)
    -> [true | false]

        not_disjoint(p)[true | false]        [true -> 1, false -> 2]
        [true]
        [false]
        [error]

This inductive predicate has been introduced and explained in Section 3.1.5 of the previous chapter (on page 52), and it characterizes the potential situations in which the memory isolation of the different associated threads of a process can be broken.

4.2 Control Flow Graph

Predicate bodies in αSmil resemble a control flow graph representation having statements as nodes. The nodes represent program states and the edges are defined by statements with a particular exit label $\lambda$.

The control flow graph $G_p = (N, E)$ of a predicate $p$ has a node $n_i \in N$ for each program point. For each statement $s$ at program point $n_i$ that can execute and reach program point $n_j$ with exit label $\lambda_k$, an edge $(n_i, n_j)$ is added to $G_p$ and labeled with $s$ and $\lambda_k$. $G_p$ has a single entry node $n_{in} \in N$, corresponding to the program point associated to the first statement of $p$. The set of exit nodes $n_{out} \subset N$ consists of the nodes associated to each possible exit label $\lambda_k$ of the predicate. To these, one additional exit node, which is used as a sink node, is added. This corresponds to the error label.

In practice, all the outgoing edges of a node $n_i \in N$ bear the different cases of the same statement $s$ found at program point $n_i$. Thus, the edges are labeled with the same statement $s$, and there is an edge labeled $(s, \lambda_k)$ for each possible exit label $\lambda_k$ of $s$.
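As a rough illustration of this construction, the following Python sketch derives nodes and labeled edges from a label-indexed predicate body. The body encoding, the statement strings and the choice of the smallest program point as entry node are assumptions made for the example; the thread predicate used as input is the one discussed next in the text.

```python
def build_cfg(body, exits):
    """Return (nodes, entry, exit_nodes, edges) for a label-indexed body.

    body  : {program_point: (statement_text, {exit_label: next_point})}
    exits : {program_point: predicate_exit_label}
    """
    exit_nodes = set(exits) | {"error"}      # error is the additional sink node
    nodes = set(body) | exit_nodes
    # One edge (n_i, n_j) labeled (s, lambda_k) per exit label of the statement at n_i.
    edges = [(src, dst, stmt, label)
             for src, (stmt, successors) in body.items()
             for label, dst in successors.items()]
    entry = min(body)                        # assumption: first statement has the smallest number
    return nodes, entry, exit_nodes, edges


# Hypothetical encoding of the body of the predicate `thread`.
thread_body = {
    0: ("ts = p.threads", {"true": 1}),
    1: ("tio = ts[i]", {"true": 2, "false": 5}),
    2: ("switch(tio) as [ti | ]", {"Some": 3, "None": 4}),
}
thread_exits = {3: "true", 4: "None", 5: "oob"}

nodes, entry, exit_nodes, edges = build_cfg(thread_body, thread_exits)
```

All edges leaving a given node carry the same statement and differ only in their exit label, as stated above.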

The subfigures in Figure 4.2 show the control flow graph of the following predicate:

    predicate thread(process p, int i)
    -> [true: thread ti | None | oob]

which receives a process p and an index i as inputs and returns the i-th active thread of the input process. If the i-th thread is inactive, it exits with the exit label None. In the case of an "out of bounds" exception, the exit label oob is returned. For better readability, Figure 4.2-b gives the control flow of the same predicate, where we have labeled the nodes with statements of the predicate and the edges with their exit labels. Throughout the rest of our αSmil predicate examples, we will favour the latter representation.

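[Figure 4.2 – Example – Control Flow Graph of Predicate thread. Panel a) $G_{thread}$: nodes n1, n2, n3 and exit nodes true, None and oob, with edges labeled by a statement and an exit label (ts = p.threads, true; tio = ts[i], true; tio = ts[i], false; switch(tio) as [ti | ], Some; switch(tio) as [ti | ], None). Panel b) $G_{thread}$, alternative representation: nodes labeled with the statements ts = p.threads, tio = ts[i] and switch(tio) as [ti | ], edges labeled with the exit labels true, false, Some and None, and exit nodes true, None and oob.]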

4.3 Well-Typed αSmil Statements

We formally define what it means for an αSmil statement to be well-typed and detail the full system of inference rules for the statements supported by αSmil in Table 4.6 and Table 4.7.

A well-typed αSmil statement is a statement that is compatible with the types specified in the signature $\sigma_p$ of the called built-in predicate $p$. This requires a typing environment $\Gamma$ mapping variables to their types.

Definition 4.3.1 (Typing Environment $\Gamma$).

$$\Gamma : \mathcal{V} \to \mathcal{T}$$

Furthermore, αSmil distinguishes between variables $v \in \mathcal{V}$ which can be written to and variables which are read-only. Therefore, the definition of well-typedness for statements requires two different sets of variable identifiers, one for each kind of variable. These are:

• $\mathcal{V}^+$, with $\mathcal{V}^+ \subseteq \mathcal{V}$, which denotes the set of identifiers of writable and readable variables, and

• $\mathcal{V} \setminus \mathcal{V}^+$, which denotes the set of read-only variables.

The mapping between predicate identifiers and their signatures is denoted by $\Sigma$.

Definition 4.3.2 (Mapping between Predicate Identifiers and Signatures).

$$\Sigma : \mathcal{P} \to \mathcal{S}$$

Definition 4.3.3 (Well-Typed Statement). A statement $s$ exiting with label $\lambda \in \mathcal{L} \setminus \{\text{error}\}$ is well-typed in the typing environment $\Gamma$, given $\Sigma$, written

$$\Sigma, \Gamma, O \vdash s \to \lambda$$

if it is compatible with the types specified in its signature. Moreover, the outputs of a well-typed statement must be in the set of writable variables, $O \subseteq \mathcal{V}^+$.

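The inference rule for a well-typed predicate call captures all these properties and is shown in rule [WTPCall], given in Table 4.6.

Table 4.6 – Well-Typed Predicate Call

$$\frac{\begin{array}{c}
\Sigma(p) = (x_1 : \tau_1, \ldots, x_n : \tau_n)\,[\lambda_1 : (\tau_{11}\ y_{11}, \ldots, \tau_{1k_1}\ y_{1k_1}) \mid \ldots \mid \lambda_m : (\tau_{m1}\ y_{m1}, \ldots, \tau_{mk_m}\ y_{mk_m})] \\
\Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \qquad \forall i \in 1..m,\ \Gamma(o_{i1}) = \tau_{i1} \ \ldots\ \Gamma(o_{ik_i}) = \tau_{ik_i} \\
o_{i1}, \ldots, o_{ik_i} \in O \qquad \forall i,\ \forall j,\ \forall k_i,\ j \neq k_i \Rightarrow o_{ij} \neq o_{ik_i} \qquad \lambda \in \{\lambda_1, \ldots, \lambda_m\}
\end{array}}{\Sigma, \Gamma, O \vdash p(e_1, \ldots, e_n)\,[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] \to \lambda}\ \text{[WTPCall]}$$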

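The inference rules for the αSmil statements representing calls to built-in predicates are detailed in Table 4.7.

Table 4.7 – Well-Typed Statements

$$\frac{\Gamma(e_1) = \Gamma(e_2) \qquad \lambda \in \{\text{true}, \text{false}\}}{\Sigma, \Gamma, O \vdash e_1 = e_2 \to \lambda}\ \text{[WTEquals]}$$

$$\frac{\Gamma(o) = \Gamma(e) \qquad o \in O}{\Sigma, \Gamma, O \vdash o = e \to \text{true}}\ \text{[WTAsgn]}$$

$$\frac{}{\Sigma, \Gamma, O \vdash \text{nop} \to \text{true}}\ \text{[WTNop]}$$

$$\frac{\Gamma(r) = \text{struct}\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \qquad \Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \qquad r \in O}{\Sigma, \Gamma, O \vdash r = \{e_1, \ldots, e_n\} \to \text{true}}\ \text{[WTRecNew]}$$

$$\frac{\Gamma(r) = \text{struct}\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \qquad \Gamma(o_1) = \tau_1 \ \ldots\ \Gamma(o_n) = \tau_n \qquad \forall i,\ o_i \in O \qquad \forall i \neq j,\ o_i \neq o_j}{\Sigma, \Gamma, O \vdash \{o_1, \ldots, o_n\} = r \to \text{true}}\ \text{[WTRecAll]}$$

$$\frac{\Gamma(r) = \text{struct}\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \qquad \Gamma(o) = \tau_i \qquad o \in O}{\Sigma, \Gamma, O \vdash o = r.f_i \to \text{true}}\ \text{[WTRecGet]}$$

$$\frac{\Gamma(r) = \Gamma(r') = \text{struct}\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \qquad \Gamma(e) = \tau_i \qquad r' \in O}{\Sigma, \Gamma, O \vdash r' = \{r \text{ with } f_i = e\} \to \text{true}}\ \text{[WTRecSet]}$$

$$\frac{\Gamma(r') = \Gamma(r'') = \text{struct}\{g_1 : \tau_1, \ldots, g_n : \tau_n\} \qquad \lambda \in \{\text{true}, \text{false}\} \qquad \{f_1, \ldots, f_k\} \subseteq \{g_1, \ldots, g_n\}}{\Sigma, \Gamma, O \vdash r' =_{\langle f_1, \ldots, f_k \rangle} r'' \to \lambda}\ \text{[WTRecEq]}$$

$$\frac{\Gamma(v) = \text{variant}[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \qquad \Gamma(e) = \tau_p \qquad v \in O}{\Sigma, \Gamma, O \vdash v = C_p[e] \to \text{true}}\ \text{[WTVarCons]}$$

$$\frac{\Gamma(v) = \text{variant}[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \qquad \Gamma(o_p) = \tau_p \qquad o_p \in O}{\Sigma, \Gamma, O \vdash \text{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n] \to \lambda_{C_p}}\ \text{[WTVarSwitch]}$$

$$\frac{\Gamma(v) = \text{variant}[D_1 : \tau_1 \mid \ldots \mid D_m : \tau_m] \qquad \{C_1, \ldots, C_k\} \subseteq \{D_1, \ldots, D_m\} \qquad \lambda \in \{\text{true}, \text{false}\}}{\Sigma, \Gamma, O \vdash v \in \{C_1, \ldots, C_k\} \to \lambda}\ \text{[WTVarPos]}$$

$$\frac{\Gamma(a) = \text{arr}_{\tau_i}\langle\tau\rangle \qquad \lambda \in \{\text{true}, \text{false}\} \qquad \Gamma(i) = \tau_i \qquad \Gamma(o) = \tau \qquad o \in O}{\Sigma, \Gamma, O \vdash o = a[i] \to \lambda}\ \text{[WTAGet]}$$

$$\frac{\Gamma(a') = \Gamma(a) = \text{arr}_{\tau_i}\langle\tau\rangle \qquad \lambda \in \{\text{true}, \text{false}\} \qquad \Gamma(i) = \tau_i \qquad \Gamma(e) = \tau \qquad a' \in O}{\Sigma, \Gamma, O \vdash a' = [a \text{ with } i = e] \to \lambda}\ \text{[WTASet]}$$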

The well-typedness of statements plays an important role with respect to the statements' interpretation, as we will show in the next section. It is also essential for the well-typedness and well-formedness of dependency and correlation summaries that will be presented in the following chapters.
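To make the shape of these rules concrete, here is a minimal Python sketch of three of them ([WTEquals], [WTAsgn] and [WTRecGet]). The encoding is an assumption made for illustration: the typing environment is a dictionary from variable names to types, primitive types are strings, structure types are built by the hypothetical helper struct, and expressions are restricted to variable names.

```python
def struct(**fields):
    """Hypothetical encoding of struct{f1: t1, ..., fn: tn}."""
    return ("struct", tuple(fields.items()))


def wt_equals(gamma, e1, e2, label):
    """[WTEquals]: both sides have the same type; the label is true or false."""
    return gamma[e1] == gamma[e2] and label in ("true", "false")


def wt_asgn(gamma, writable, o, e):
    """[WTAsgn]: o = e is well-typed (for label true) if types agree and o is writable."""
    return gamma[o] == gamma[e] and o in writable


def wt_rec_get(gamma, writable, o, r, field):
    """[WTRecGet]: o = r.field needs r to be a structure whose field has o's type."""
    r_type = gamma[r]
    if r_type[0] != "struct":
        return False
    return dict(r_type[1]).get(field) == gamma[o] and o in writable
```

For instance, with a thread variable ti of type struct(id="int", stack="region") and a writable variable st of type "region", the check wt_rec_get(gamma, {"st"}, "st", "ti", "stack") succeeds, whereas it fails if st is not in the writable set.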

The control flow graph $G_p = (N, E)$ of a predicate $p$ is well-typed if any edge labeled with $(s, \lambda) \in E$ is well-typed.

$$\frac{\forall (s, \lambda) \in E,\ \ \Sigma, \Gamma, O \vdash s \to \lambda}{\Sigma, \Gamma, O \vdash G_p = (N, E)}\ \text{[WTCfg]}$$

Figure 4.3 – Well-Typed Control Flow Graph

4.4 Operational Semantics of αSmil Statements

This section presents the structural operational semantics (Nielson, Nielson, and Hankin 1999; Plotkin 2004) of the αSmil language. Sometimes also called the small-step operational semantics, this allows reasoning about intermediate stages in a program's execution and emphasizes the individual steps of the computation.

Types. We take $\mathcal{T}_0$ to be the universe of primitive types $\tau_0 \in \mathcal{T}_0$. Structures, variants and associative arrays are defined inductively. Structures are finite labeled products of types. They are a generalization of the Cartesian product. Variants are finite labeled disjoint unions of several types $\tau$. Two types are equal when they are pointwise equal.

Semantic Values. For each type $\tau$, we define the set $D_\tau$ of semantic values of that type. For each primitive type $\tau_0 \in \mathcal{T}_0$, we suppose a given $D_{\tau_0}$. Other semantic values are defined inductively, as shown below.

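Definition 4.4.1 (Semantic Values $D_\tau$).

$$D_{\text{struct}\{f_1 : \tau_1, \ldots, f_n : \tau_n\}} = \{\, \{f_1 = v_1, \ldots, f_n = v_n\} \mid \forall i,\ v_i \in D_{\tau_i} \,\}$$

$$D_{\text{variant}[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n]} = \biguplus_{1 \le i \le n} \{\, C_i[v] \mid v \in D_{\tau_i} \,\} \quad \text{where } \biguplus \text{ is the disjoint union}$$

$$D_{\text{arr}_{\tau_i}\langle\tau\rangle} = \{\, (P, (v_k)_{k \in P}) \mid P \subseteq D_{\tau_i},\ \forall k \in P,\ v_k \in D_\tau \,\}$$

In αSmil, arrays are partial. In a semantic value belonging to $D_{\text{arr}_{\tau_i}\langle\tau\rangle}$, $P$ denotes the domain of valid indices for the array.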
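The inductive definition of semantic values can be mirrored by a small membership test. The sketch below is an assumption-laden illustration: types are encoded as tagged tuples, structures as dictionaries, variant values as (constructor, argument) pairs, partial arrays as dictionaries whose key set plays the role of $P$, and the primitive types int and unit are stand-ins for $\mathcal{T}_0$.

```python
def in_domain(value, ty):
    """Return True if `value` belongs to the semantic domain of type `ty`."""
    kind = ty[0]
    if kind == "int":
        return isinstance(value, int)
    if kind == "unit":
        return value == ()
    if kind == "struct":                       # ("struct", {field: type})
        fields = ty[1]
        return (isinstance(value, dict) and value.keys() == fields.keys()
                and all(in_domain(value[f], t) for f, t in fields.items()))
    if kind == "variant":                      # ("variant", {constructor: argument_type})
        ctors = ty[1]
        return (isinstance(value, tuple) and len(value) == 2
                and value[0] in ctors and in_domain(value[1], ctors[value[0]]))
    if kind == "arr":                          # ("arr", index_type, cell_type)
        _, index_ty, cell_ty = ty
        # The dictionary's keys form the domain P of valid indices.
        return (isinstance(value, dict)
                and all(in_domain(k, index_ty) and in_domain(v, cell_ty)
                        for k, v in value.items()))
    raise ValueError(f"unknown type {ty!r}")


INT, UNIT = ("int",), ("unit",)
THREAD = ("struct", {"id": INT, "crt_state": INT})
OPTION_THREAD = ("variant", {"None": UNIT, "Some": THREAD})
THREADS = ("arr", INT, OPTION_THREAD)
```

A value such as {0: ("None", ()), 3: ("Some", {"id": 1, "crt_state": 0})} is then a partial array with valid indices 0 and 3.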

Two values of the same type are equal when they are pointwise equal.

Traditionally, in operational semantics, one is interested in how the state is modified during the execution of a statement. αSmil has no concept of state per se: what is essential is the evaluation of variables in different environments or semantic contexts. To emphasize this idea, we define a valuation or environment $E \in \mathcal{E}$ as a mapping from variables to semantic values.

Definition 4.4.2 (Valuation or environment $E$).

$$E : \mathcal{V} \to \mathcal{D}$$

Two valuations $E$ and $E'$ are equal if they map the same set of variables to semantic values that are pointwise equal.

$$E = E' \iff \forall v \in \mathcal{V},\ E(v) = E'(v)$$

Given a typing environment $\Gamma$, a valuation $E$ is well-typed if the value mapped to any variable $v \in \text{Dom}(E)$ is of the appropriate type $\Gamma(v)$. We denote this by $\Gamma \vdash E$ and show it in [WTEnv].

$$\frac{\forall v \in \text{Dom}(E),\ E(v) \in D_{\Gamma(v)}}{\Gamma \vdash E}\ \text{[WTEnv]}$$

Definition 4.4.3. A configuration $\langle E, [s] \rangle$ of the semantics is a pair consisting of a valuation and a statement.

Definition 4.4.4. The transitions of the semantics are of the form

$$\langle E, [s] \rangle \xrightarrow{\lambda} E'$$

They express how the configuration is changed by one step of computation, occurring when executing a statement $s$ that exits with label $\lambda$. The exit label yielded by the statement's execution uniquely determines the statement that will be executed next. The change of the valuation is recorded in the resulting valuation $E'$. We write $E[x \to v]$ for the valuation that is identical to $E$, except that $x$ is mapped to the value $v$. We say that $E$ is extended with $x \to v$, and formally we define it as shown below.

Definition 4.4.5 (Extend $E$ with $x \to v$).

$$(E[x \to v])(y) = \begin{cases} v & \text{if } x = y \\ E(y) & \text{otherwise} \end{cases}$$

Extending a valuation $E$ with multiple mappings $x \to v$ consists in applying the extension in a left-associative fashion. In the following, we will omit parentheses for such extensions, thus denoting

$$(\ldots((E[x_1 \to v_1])[x_2 \to v_2])\ldots)[x_n \to v_n]$$

as

$$E[x_1 \to v_1][x_2 \to v_2] \ldots [x_n \to v_n]$$
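A minimal Python sketch of valuation extension, assuming valuations are encoded as dictionaries from variable names to values (the function names extend and extend_all are illustrative only):

```python
from functools import reduce


def extend(E, x, v):
    """E[x -> v]: a new valuation identical to E except that x maps to v."""
    updated = dict(E)          # copy, so that E itself is left unchanged
    updated[x] = v
    return updated


def extend_all(E, bindings):
    """E[x1 -> v1][x2 -> v2]...[xn -> vn], applied left to right."""
    return reduce(lambda env, binding: extend(env, *binding), bindings, E)
```

Because the extensions are applied in a left-associative fashion, a later binding for the same variable overrides an earlier one: extend_all({"x": 0}, [("x", 1), ("x", 2)]) maps x to 2.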

An interpretation $I \in \mathcal{I}$ for a predicate is defined as a mapping from a predicate and an initial environment to an output environment and an exit label.

Definition 4.4.6 (Predicate Interpretation $I \in \mathcal{I}$).

$$I : \mathcal{P} \times \mathcal{E} \to \mathcal{E} \times \mathcal{L}$$

The initial environment is a mapping between the predicate's formal arguments and their effective values. The output environment is a mapping between the predicate's formal output arguments and their effective values after executing the predicate.

The detailed definition of the semantics of generic statements is described below, in Table 4.8. The first clause, [nop], constitutes an axiom, as it has no premises. It states that the nop statement executes in one step, yielding the exit label true without extending the valuation $E$. The semantics of equality tests is given by two inference rules, [equalT] and [equalF], one for each of the statement's possible exit labels. A call to the built-in predicate = will exit with label true if and only if the valuations of its arguments $e_1$ and $e_2$ are equal (clause [equalT]). Otherwise, the statement will exit with label false (clause [equalF]). In both cases, the statement leaves the valuation $E$ unchanged. The semantics of an assignment is given by the [asgn] clause: the statement always yields the exit label true and extends the valuation $E$ with $o$ mapped to the value $E(e)$ of $e$.

Table 4.8 – The Structural Operational Semantics of αSmil Generic Statements

$$\frac{}{\langle E, [\text{nop}] \rangle \xrightarrow{\text{true}} E}\ \text{[nop]}$$

$$\frac{E(e_1) = E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{\text{true}} E}\ \text{[equalT]}$$

$$\frac{E(e_1) \neq E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{\text{false}} E}\ \text{[equalF]}$$

$$\frac{E' = E[o \to E(e)]}{\langle E, [o = e] \rangle \xrightarrow{\text{true}} E'}\ \text{[asgn]}$$
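The four clauses of Table 4.8 can be read as a one-step function from a valuation and a statement to an exit label and a new valuation. The sketch below assumes a tuple encoding of statements and restricts expressions to variable names; both choices are illustrative and not taken from the thesis.

```python
def step_generic(E, stmt):
    """One transition <E, [s]> --label--> E' for nop, equality test and assignment."""
    kind = stmt[0]
    if kind == "nop":                        # [nop]: valuation unchanged
        return "true", E
    if kind == "equals":                     # [equalT] / [equalF]: valuation unchanged
        _, e1, e2 = stmt
        return ("true" if E[e1] == E[e2] else "false"), E
    if kind == "asgn":                       # [asgn]: E' = E[o -> E(e)]
        _, o, e = stmt
        return "true", {**E, o: E[e]}
    raise ValueError(f"not a generic statement: {stmt!r}")
```

For example, step_generic({"a": 1, "b": 2}, ("asgn", "c", "b")) yields the label true and a valuation in which c is mapped to 2, while the original valuation is not modified.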

The semantics of structure-related statements is given in Table 4.9. The creation of a structure always yields the exit label true, as indicated by the [recNew] clause, and it extends the valuation $E$ by mapping the resulting output variable $r$ to the structural value obtained by mapping every field $f_i$ to the value $E(e_i)$ of the corresponding argument $e_i$. The destructuring of a structure $r$ extends the valuation $E$ by mapping every output $o_i$ to the corresponding value $v_i$ of the $f_i$ field of $r$. The statement always exits with true. The valuation $E'$ obtained after executing an access to a given field $f_i$ of a structure $r$ is an extension of $E$ where the output $o$ is mapped to the corresponding value of $r$'s $f_i$ field in $E$. The semantics of a field update is given by the clause [recSet]. This statement extends the valuation $E$ by mapping the output structure $r'$ to a new value, where the updated field $f_i$ is mapped to the value of $e$ in $E$ and every other field is mapped to the same value it had in $E$. Finally, the last two clauses correspond to a partial structure equality test. As shown by [recEqualsT], the statement yields the exit label true if and only if the values of every field $g_i$ in the given set of fields are equal for $r'$ and $r''$ in $E$. Otherwise, the statement yields the label false. In both cases, the valuation $E$ remains unchanged.

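Table 4.9 – Operational Semantics of αSmil Structure-Related Statements

$$\frac{E' = E[r \to \{f_1 = E(e_1), \ldots, f_i = E(e_i), \ldots, f_n = E(e_n)\}]}{\langle E, [r = \{e_1, \ldots, e_n\}] \rangle \xrightarrow{\text{true}} E'}\ \text{[recNew]}$$

$$\frac{E(r) = \{f_1 = v_1, \ldots, f_n = v_n\} \qquad E' = E[o_1 \to v_1][o_2 \to v_2] \ldots [o_n \to v_n] \qquad \forall i, j,\ i \neq j \Rightarrow o_i \neq o_j}{\langle E, [\{o_1, \ldots, o_n\} = r] \rangle \xrightarrow{\text{true}} E'}\ \text{[recAll]}$$

$$\frac{E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \qquad E' = E[o \to v_i]}{\langle E, [o = r.f_i] \rangle \xrightarrow{\text{true}} E'}\ \text{[recGet]}$$

$$\frac{E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \qquad E' = E[r' \to \{f_1 = v_1, \ldots, f_i = E(e), \ldots, f_n = v_n\}]}{\langle E, [r' = \{r \text{ with } f_i = e\}] \rangle \xrightarrow{\text{true}} E'}\ \text{[recSet]}$$

$$\frac{\begin{array}{c} E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \qquad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\ \{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \qquad v_{g_i} = w_{g_i}\ \ \forall i \in 1..k \end{array}}{\langle E, [r' =_{\langle g_1, \ldots, g_k \rangle} r''] \rangle \xrightarrow{\text{true}} E}\ \text{[recEqualsT]}$$

$$\frac{\begin{array}{c} E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \qquad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\ \{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \qquad \exists i \in 1..k,\ v_{g_i} \neq w_{g_i} \end{array}}{\langle E, [r' =_{\langle g_1, \ldots, g_k \rangle} r''] \rangle \xrightarrow{\text{false}} E}\ \text{[recEqualsF]}$$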
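As an illustration of the structure-related clauses, the following Python sketch implements field access, field update and partial equality over dictionary-encoded structures. The encoding and the function names are assumptions made for the example; expressions are again restricted to variable names.

```python
def rec_get(E, o, r, field):
    """[recGet]: o = r.field; always exits with true."""
    return "true", {**E, o: E[r][field]}


def rec_set(E, r_out, r, field, e):
    """[recSet]: r_out = {r with field = e}; every other field keeps its value."""
    return "true", {**E, r_out: {**E[r], field: E[e]}}


def rec_equals(E, r1, r2, fields):
    """[recEqualsT] / [recEqualsF]: compare only the listed fields; E is unchanged."""
    same = all(E[r1][g] == E[r2][g] for g in fields)
    return ("true" if same else "false"), E
```

A field update followed by a partial equality test on the untouched fields exits with true, which is the kind of fact the frame conditions discussed in the next chapter aim to establish automatically.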

Table 4.10 details the semantics of variant-related statements. As indicated by the [varCons] clause, the construction of a variant $v$ with a constructor $C_p$ always yields the exit label true. The obtained valuation $E'$ is an extension of $E$ where the value of $v$ is obtained by applying the constructor $C_p$ to the argument's value $E(e)$. A variant switch exits with the label $\lambda_{C_i}$ if the value of $v$ in $E$ has been constructed with the $C_i$ constructor. The valuation $E'$ obtained after executing the statement is an extension of $E$ whereby the corresponding output $o_i$ is mapped to the value of the $C_i$ constructor's argument $E(e)$. The last two clauses, [varPossibleT] and [varPossibleF], indicate the semantics of a variant possible check and correspond to the statement's possible exit labels. The statement will yield the label true only if the value of $v$ in $E$ has been obtained with a constructor $D$ that is a member of the given set of constructors $\{C_1, \ldots, C_k\}$. Otherwise, the false label will be returned. In both cases, the valuation remains unchanged.

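Table 4.10 – Operational Semantics of αSmil Variant-Related Statements

$$\frac{E' = E[v \to C_p[E(e)]]}{\langle E, [v = C_p[e]] \rangle \xrightarrow{\text{true}} E'}\ \text{[varCons]}$$

$$\frac{E(v) = C_i[e] \qquad E' = E[o_i \to E(e)]}{\langle E, [\text{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n]] \rangle \xrightarrow{\lambda_{C_i}} E'}\ \text{[varSwitch]}$$

$$\frac{E(v) = D[e] \qquad D \in \{C_1, \ldots, C_k\}}{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{\text{true}} E}\ \text{[varPossibleT]}$$

$$\frac{E(v) = D[e] \qquad D \notin \{C_1, \ldots, C_k\}}{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{\text{false}} E}\ \text{[varPossibleF]}$$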
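The variant-related clauses can be sketched in the same style. Here a variant value is assumed to be a (constructor, argument) pair, the exit label of a switch is represented by the constructor's name, and the outputs parameter maps each constructor to the variable that receives its argument (or to None when the argument is ignored); these conventions are illustrative only.

```python
def var_cons(E, v, ctor, e):
    """[varCons]: v = ctor[e]; always exits with true."""
    return "true", {**E, v: (ctor, E[e])}


def var_switch(E, v, outputs):
    """[varSwitch]: exit with the label of v's constructor and bind its argument."""
    ctor, arg = E[v]
    out = outputs.get(ctor)
    return ctor, (E if out is None else {**E, out: arg})


def var_possible(E, v, ctors):
    """[varPossibleT] / [varPossibleF]: is v built with one of `ctors`? E is unchanged."""
    ctor, _ = E[v]
    return ("true" if ctor in ctors else "false"), E
```

For a value built with Some, var_switch(E, "th", {"Some": "ti", "None": None}) exits with the label Some and binds ti to the constructor's argument.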

Table 4.11 describes the semantics of array-related statements. Each array-related statement has two corresponding clauses, one for each of the Boolean exit labels. Accessing an array's element yields the exit label true if the given index $i$ is a valid index. The resulting valuation $E'$ is extended by mapping the output $o$ to the value in $E$ of the array's $i$-th element. Otherwise, when the given index $i$ is invalid, as indicated by the [arrGetF] clause, the statement yields the label false and leaves the valuation unmodified. The semantics of an array update is given by the [arrSetT] and [arrSetF] clauses. If the given index $i$ is valid, the exit label true is yielded and the resulting valuation is obtained by extending $E$ with $a'$, whose $i$-th element's value is the value of $e$ in the initial valuation $E$. The values of all other elements of $a'$ are the ones found in $E$ for the elements of $a$. On the contrary, if the given index $i$ is invalid, the valuation remains unchanged and the label false is yielded.

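Table 4.11 – Operational Semantics of αSmil Array-Related Statements

$$\frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \in P \qquad E' = E[o \to v_{E(i)}]}{\langle E, [o = a[i]] \rangle \xrightarrow{\text{true}} E'}\ \text{[arrGetT]}$$

$$\frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \notin P}{\langle E, [o = a[i]] \rangle \xrightarrow{\text{false}} E}\ \text{[arrGetF]}$$

$$\frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \in P \qquad E' = E[a' \to (P, (w_k)_{k \in P})] \qquad w_k = \begin{cases} E(e) & \text{if } k = E(i) \\ v_k & \text{otherwise} \end{cases}}{\langle E, [a' = [a \text{ with } i = e]] \rangle \xrightarrow{\text{true}} E'}\ \text{[arrSetT]}$$

$$\frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \notin P}{\langle E, [a' = [a \text{ with } i = e]] \rangle \xrightarrow{\text{false}} E}\ \text{[arrSetF]}$$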
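The array clauses can be sketched by modelling a partial array $(P, (v_k)_{k \in P})$ as a Python dictionary whose key set is $P$. This encoding and the function names are assumptions made for the example.

```python
def arr_get(E, o, a, i):
    """[arrGetT] / [arrGetF]: o = a[i]; false signals an invalid index."""
    cells = E[a]
    if E[i] not in cells:                    # E(i) is not in P
        return "false", E
    return "true", {**E, o: cells[E[i]]}


def arr_set(E, a_out, a, i, e):
    """[arrSetT] / [arrSetF]: a_out = [a with i = e]; the domain P is preserved."""
    cells = E[a]
    if E[i] not in cells:                    # an update never creates a new index
        return "false", E
    return "true", {**E, a_out: {**cells, E[i]: E[e]}}
```

Note that a successful update only changes the cell at index E(i); all other cells of the new array are those of a, and the input array itself is not modified.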

The semantics of a generic predicate call $p(e_1, \ldots, e_n)\,[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]$ is captured by the [pCall] inference rule shown in Table 4.12. Interpreting the predicate $p$ in the context of its arguments' values in the valuation $E$ yields a label $\lambda_i$ and a mapping between its formal output arguments and their resulting values $v_{ij}$. The resulting valuation $E'$ is obtained by extending $E$ with the output variables $o_{ij}$ mapped to the corresponding $v_{ij}$.

The interpretation of a statement is well-typed with respect to a signature if and only if every tuple in the interpretation is well-typed, i.e. if it has the expected number of inputs with the adequate types and an adequate label with well-typed outputs as well. Furthermore, it has to be total, i.e. for every well-typed tuple of inputs there exist a label and some outputs that match in the interpretation.

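Table 4.12 – Semantics of a Predicate Call

$$\frac{\begin{array}{c}
\Sigma(p) = p(x_1 : \tau_1, \ldots, x_n : \tau_n)\,[\lambda_1 : (\tau_1\ y_1) \mid \ldots \mid \lambda_i : (\tau_{i1}\ y_{i1}, \ldots, \tau_{ik_i}\ y_{ik_i}) \mid \ldots \mid \lambda_m : (\tau_m\ y_m)] \\
I(p, \text{inputs}) = (\text{outputs}, \lambda_i) \qquad \text{inputs}(x_l) = E(e_l)\ \ \forall l \in 1..n \\
\text{outputs}(y_{i1}) = v_{i1} \ \ldots\ \text{outputs}(y_{ik_i}) = v_{ik_i} \qquad E' = E[o_{i1} \to v_{i1}] \ldots [o_{ik_i} \to v_{ik_i}]
\end{array}}{\langle E, [p(e_1, \ldots, e_n)\,[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]] \rangle \xrightarrow{\lambda_i} E'}\ \text{[pCall]}$$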
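The [pCall] rule can be sketched as follows. This is only an illustration under stated assumptions: the interpretation of the callee is passed as a Python callable from an input environment to a pair (outputs, label), and the parameter names formals, args and outputs_by_label are hypothetical. The sample interpretation interp_thread mimics the predicate thread on dictionary-encoded partial arrays.

```python
def p_call(E, interp, formals, args, outputs_by_label):
    """[pCall] sketch.

    interp           : callable(inputs) -> (outputs, label), standing in for I(p, .)
    formals          : formal input names x_1..x_n of the callee
    args             : caller variables e_1..e_n passed as arguments
    outputs_by_label : {label: {formal_output_name: caller_output_variable}}
    """
    inputs = {x: E[e] for x, e in zip(formals, args)}    # inputs(x_l) = E(e_l)
    outputs, label = interp(inputs)
    extended = dict(E)
    for formal_out, caller_var in outputs_by_label[label].items():
        extended[caller_var] = outputs[formal_out]       # E[o_ij -> v_ij]
    return label, extended


def interp_thread(inputs):
    """Hypothetical interpretation of `thread(p, i) -> [true: ti | None | oob]`."""
    cell = inputs["p"]["threads"].get(inputs["i"])
    if cell is None:
        return {}, "oob"
    ctor, arg = cell
    return ({"ti": arg}, "true") if ctor == "Some" else ({}, "None")
```

Only the outputs associated to the label actually yielded are written into the caller's valuation; for the other labels the valuation is left as it was.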

Definition 4.4.7 (Subject Reduction Property). The interpretation of a well-typed statement, given well-typed interpretations for the external predicate calls, preserves the fact that the valuation is well-typed.

$$\forall\, \Gamma, E, s, \lambda, \Sigma,\quad (\Gamma \vdash E) \land (\Sigma, \Gamma, O \vdash s \to \lambda) \land (\langle E, [s] \rangle \xrightarrow{\lambda} E') \implies \Gamma \vdash E'$$

Definition 4.4.8 (The Progress Property). A well-typed statement in a well-typed environment can always be interpreted to some label and outputs.

$$\forall\, E, \Gamma, \Sigma, s,\quad (\Gamma \vdash E) \land (\Sigma, \Gamma, O \vdash s \to \lambda) \implies \exists\, \lambda', E',\ \langle E, [s] \rangle \xrightarrow{\lambda'} E'$$

The well-typedness of an interpretation, as well as the subject reduction and progress properties, have been formally proven in Coq by Stéphane Lescuyer.

Chapter 5

Dependency Analysis for Functional Specifications

... like islands in the sea, separate on the surface but connected in the deep.

William James

Algebraic data types (structures and variants) and associative arrays are fundamental building blocks for representing, grouping and handling complex data efficiently. However, as argued in Chapter 1, operations manipulating them are rarely concerned with the entire compound input data structure. Frequently, they depend only on a limited subset of their input. Complete specifications or contracts (Meyer 1997) of such operations will not only stipulate that the output possesses a certain property (Borgida, Mylopoulos, and Reiter 1993; Polikarpova et al. 2013), but will also include their frame conditions (Borgida, Mylopoulos, and Reiter 1995), i.e. the parts of the input on which they operate. Such conditions facilitate reasoning locally without overlooking the global picture: if a property P is known to hold at a certain point in the program where a predicate p is called, P still holds after the call to p, provided that the (sub)structures on which P depends are disjoint from the (sub)structures that might be modified according to p's frame condition (Banerjee and Naumann 2014). Though intuitively easy, specifying and proving the preservation of logical properties for the unmodified part is a particular manifestation of the frame problem (McCarthy and Hayes 1969; Leavens, Leino, and Müller 2007), a notoriously cumbersome task in formal software verification that imposes unnecessary manual effort (Meyer 2015).

One of the challenges of addressing this problem, and thereby simplifying the verification of certain preserved properties, is to determine the input fragments on which these properties depend, i.e. their footprint (Distefano, O'Hearn, and Yang 2006) or, to a first approximation, their read effects (Feijs and Jonkers 1992; Greenhouse and Boyland 1999; Clarke and Drossopoulou 2002). While specifications sometimes include the write effects (Clarke and Drossopoulou 2002) of an operation through modifies clauses (Guttag et al. 1993b), read effects are usually not specified explicitly, even though this information can be useful for reasoning about an operation's results. The purpose of the dependency analysis presented in this chapter is to take a step forward in this direction and to detect such information automatically. More precisely, our analysis is a static dependency analysis for the αSmil language (presented in Chapter 4) that computes a conservative approximation of the input fragments on which the operations depend.

Dependence and liveness analyses are traditionally used in the compilation realm for code optimization (Kennedy 1978), dead code elimination (Knoop, Rüthing, and Steffen 1994; Wand and Siveroni 1999; Liu and Stoller 2003), program slicing (Weiser 1984; Tip 1995; Reps and Turnidge 1996; Castillo et al. 2008) or compile-time garbage collection (Jones and Métayer 1989; Park and Goldberg 1992; Wand and Clinger 1998). In contrast to the vast majority of static analyses, which are meant to be used strictly on code and in an essentially purely automatic setting, our analysis is thought of as a companion tool to be exploited in the middle of interactive program verification, and it is designed to be used on programs as well as on specifications.

5.1 Dependency Analysis in a Nutshell

In a nutshell, our dependency analysis targets the delimitation of the input subset on which the output depends, in the context of an operation with a compound input. We define dependency as the observed part of a structured domain and strive to obtain type-sensitive results, distinguishing between the subelements of arrays and algebraic data types and capturing the dependency specific to each. The targeted results are meant to mirror, in terms of dependency, the layered structure of compound data types. Furthermore, the dependency analysis must work with conservative approximations, and it must guarantee that what is marked as not needed is definitely not needed, i.e. it is irrelevant for the obtained output.

In the classification of Hind (Hind 2001), our dependency analysis is a flow-sensitive, field-sensitive, interprocedural analysis that handles associative arrays, structures and variant data types. Specific dependency results are computed for each of the possible execution scenarios, i.e. for each exit label. Thus, our analysis also shows a form of path-sensitivity (Hind 2001). However, we favour the term label-sensitivity to describe this characteristic, as it seems more appropriate for our case and the language we are working with.

Our dependency analysis targets complex transition systems in general, and operating systems and microkernels in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e. state changes that map an input state to an output state. Automatically proving the preservation of invariants concerning only subelements of the state (fields, array cells, etc.) that have not been altered by a transition in the system would considerably diminish the number of proof obligations. The first step towards achieving this goal consists in automatically detecting dependency summaries and the minimum relevant input information for producing certain outputs.

As mentioned, our analysis targets fine-grained dependency summaries for arrays, structures and variants, expressed at the level of their subelements. For variants, besides capturing the specific dependency on each constructor and its arguments, we argue that additional relevant information can be computed regarding the subset of possible constructors at a given program point. This is not dependency information per se, but it enriches the footprint of a predicate with useful information. Together with the dependency information, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different, albeit related, point of view. Therefore, we are simultaneously performing a possible-constructors analysis. This has an impact on the defined abstract dependency type, making it more complex, as we will see in the following section. The possible-constructors analysis could be performed separately, as a stand-alone analysis. By performing the two analyses simultaneously, we lose some of the precision that would be attained if the two were performed separately, but we reduce overhead and present relevant information in a unified manner.

Designing the analysis as a tool to be used in the context of interactive program verification, on both code and specifications, has led to specific traits. One of them concerns the treatment of arrays. In contrast to dependence and liveness analyses used for code optimizations (Gross and Steenkiste 1990), which require precision for every array cell, we compute dependency information referring to all cells of the array, or to all but one cell for which an exceptional dependency is computed. In practice, a considerable number of relevant properties and operations involving arrays fall into this spectrum.

In the following subsection, in order to better illustrate the problem that our analysis addresses, we briefly present two examples of αSmil predicates manipulating structures, variants and arrays, and describe the dependency information that we are targeting.

5.1.1 Targeted Dependency Information

To present the envisioned dependency results, we consider two αSmil predicates, thread and start_address, whose control flow graphs and implementations are shown below. Both predicates manipulate inputs of type process, introduced in Section 3.1.5 (on page 49) and shown in Figure 5.2. Internally, they handle values of type thread and memory_region respectively, described in Section 3.1.5 (on page 48) as well and shown below in Figure 5.1.

    type memory_region = {
        start : int                            (start address)
        length : int                           (region length)
    }

    type thread = {
        id : int                               (identifier)
        crt_state : state                      (current state)
        stack : memory_region                  (stack)
    }

Figure 5.1 – Example Data Types – Thread and Memory Region

80 Chapter 5 Dependency Analysis for Functional Specifications

type option<A> =
  | None
  | Some (A a)

type process = {
  // Array of associated threads
  threads : array<option<thread>>;
  // Internal id
  pid : int;
  // Currently running thread
  crt_thread : int;
  // Address space
  adr_space : address_space;
}

Figure 5.2 – Input Type – Process

The first predicate, thread, having the control flow graph shown in Figure 5.4 and whose implementation is shown in Figure 5.3, receives a process p and an index i as inputs. It reads the i-th element in the threads array of the input process p. If this element is active, then the predicate exits with the label true and outputs the corresponding thread ti. Otherwise, it exits with the label None and no output is generated.

predicate thread (process p, int i)
-> [true: thread ti | None | oob]

array<option<thread>> th; option<thread> tio;
th := p.threads              [true -> 1]
tio := th[i]                 [true -> 2, false -> 5]
switch (tio) as [ | ti]      [None -> 4, Some -> 3]
[true]
[None]
[oob]

Figure 5.3 – Predicate thread – Implementation

Our dependency analysis should be able to distinguish between the different exit labels of the predicate. For the label true, for instance, it should detect that only the field threads is read by the predicate, while all others are irrelevant to the result. Furthermore, it should detect that, for the threads array of the input p, only the i-th element is inspected. Additionally, since we are considering the label true, the i-th element is necessarily an active thread, indicated by the constructor Some. The other constructor, None, is impossible for this execution scenario. On the contrary, for the exit label None, the constructor Some is impossible. For the exit label oob, nothing but the index i and the “support” or “length” of the associated threads array is read. The targeted dependency results for the predicate thread are depicted in Figure 5.5.

The second predicate, start_address, whose control flow graph is shown in Figure 5.6, receives a process p and an index j as inputs and finds the start address of

51 Dependency Analysis in a Nutshell 81

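[Figure: control flow graph of predicate thread. Statement nodes: th := p.threads; tio := th[i]; switch(tio) as [ | ti]. Exit nodes: true, None, oob. Edge labels: true; true, false; Some, None.]

Figure 5.4 – G_thread – Control Flow Graph of Predicate thread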

[Figure: two panels, "Exit label true" and "Exit label None". Each panel shows the fields of process p (threads, pid, crt_thread, adr_space), the i-th cell of threads and the constructors Some(thread t) and None of option<thread>. Legend: Read / Needed; Irrelevant / Not Needed.]

Figure 5.5 – Targeted Dependency Results for Predicate thread

the stack corresponding to an active thread. It makes a call to the predicate thread, thus reading the j-th element of the threads array of its input process. If this is an active element, it further accesses the field stack, from which it only reads the start address start. Otherwise, if the element is inactive, the predicate forwards the exit label None of the called predicate thread and generates no output. When given an invalid index j, the predicate exits with label oob. The predicate's implementation is shown in Figure 5.7.

The dependency information for this predicate should capture the fact that, on the true execution scenario, only the field start of the input's j-th associated thread is read. Furthermore, the only possible constructor on this execution path is the Some constructor. On the contrary, for the None execution scenario, the only possible constructor is the None constructor. The targeted dependency results for the start_address predicate are depicted in Figure 5.8. We remark that, for the oob execution scenario, only the “support” or “length” of the threads array is read.


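[Figure: control flow graph of predicate start_address. Statement nodes: thread(p, j)[true: tj | None | oob]; sj := tj.stack; adr := sj.start. Exit nodes: true, None, error. Edge labels: true, None, oob.]

Figure 5.6 – G_start_address – Control Flow Graph of Predicate start_address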

predicate start_address (process p, int j)
-> [true: int adr | None]

thread tj; memory_region sj;
thread (p, j)[true: tj | None | oob]   [true -> 1, None -> 4, oob -> 5]
sj := tj.stack                         [true -> 2]
adr := sj.start                        [true -> 3]
[true]
[None]
[error]

Figure 5.7 – Predicate start_address – Implementation

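[Figure: two panels, "Exit label true" and "Exit label None". The first panel shows the fields of process p (threads, pid, crt_thread, adr_space), the fields of thread tj (id, crt_state, stack) and the fields of its stack (start, length). The second panel shows the fields of process p and the constructors Some(thread t) and None of option<thread>. Legend: Read / Needed; Irrelevant / Not Needed.]

Figure 5.8 – Targeted Dependency Results for Predicate start_address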

52 Abstract Dependency Domain 83

5.1.2 Outline

The rest of this chapter focuses on the technical details of the dependency analysis. In Section 5.2, we present the abstract dependency domain. This is the fundamental building block on which our analysis relies in order to determine expressive dependency summaries. It is followed, in Section 5.3, by an in-depth description of our analysis at an intraprocedural level, underlining the data-flow equations in Section 5.3.2 and explaining them by illustrating the step-by-step mechanism on an example in Section 5.3.3. A summary of the dependency analysis at an interprocedural level is given in Section 5.4. We illustrate the approach and underline its shortcomings on an example in Section 5.4.1, and discuss their origin in Section 5.4.2. Two different semantic interpretations of our dependency information are discussed in Section 5.5. In Section 5.6, we review and discuss approaches targeting information that is similar to our dependency summaries. Finally, in Section 5.7, we conclude and present some other potential applications of our dependency analysis, which are not confined to the field of interactive program verification.

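5.2 Abstract Dependency Domain

The first step towards inferring expressive, type-sensitive results that capture the dependency specific to each subelement of an algebraic data type or an associative array is the definition of an abstract dependency domain $\mathcal{D}$ that mimics the structure of such data types. The dependency domain $\delta \in \mathcal{D}$ shown below is defined inductively from the three atomic cases ($\top$, $\square$ and $\bot$) and mirrors the structure of the concrete types.

Definition 5.2.1. Dependency Domain $\delta \in \mathcal{D}$

\[
\begin{array}{rcll}
\delta & ::= & \top & \text{Everything – atomic case (i)} \\
 & \mid & \square & \text{Nothing – atomic case (ii)} \\
 & \mid & \bot & \text{Impossible – atomic case (iii)} \\
 & \mid & \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} & f_1, \dots, f_n \text{ fields (iv)} \\
 & \mid & [C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m] & C_1, \dots, C_m \text{ constructors (v)} \\
 & \mid & \langle \delta \rangle & \text{(vi)} \\
 & \mid & \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & i \text{ array index (vii)}
\end{array}
\]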

As reflected by the above definition, the dependency for atomic types is expressed in terms of the domain's atomic cases: $\top$ (least precise), denoting that everything is needed, and $\square$, denoting that nothing is needed. The third atomic case, $\bot$, denoting impossible, is introduced for the possible-constructors analysis performed simultaneously and is further explained below.

The dependency of a structure (iv) describes the dependency on each of its fields. For instance, revisiting our thread example from Section 5.1.1, we could express an over-approximation of the dependency information depicted for the process p in Figure 5.5


using the following dependency:

\[ \{\mathit{threads} \mapsto \top;\ \mathit{pid} \mapsto \square;\ \mathit{crt\_thread} \mapsto \square;\ \mathit{adr\_space} \mapsto \square\} \]

This captures the fact that all fields except the threads field are irrelevant, i.e. they are not read and nothing in their contents is needed. The dependency for the threads field is an over-approximation and expresses the fact that it is entirely necessary, i.e. everything in its value is needed for the result.

For arrays, we distinguish between two cases, namely arrays with a general dependency applying to all of the cells, given by (vi), and arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known, given by (vii). For instance, for the threads field of the previous example, the following dependency

\[ \langle \square \triangleright i : \top \rangle \]

would be a less coarse approximation, capturing the fact that only the i-th element of the associated threads array is needed, while all others are irrelevant.

For variants (v), the dependency is expressed in terms of the dependencies of their constructors, expressed in turn in terms of their arguments' dependencies. Thus, a constructor having a dependency mapped to $\square$ is one for which nothing but the tag has been read, i.e. its arguments, if any, are irrelevant for the execution. For instance, for the i-th element of the threads array of our previous example, the following dependency

\[ [\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \square] \]

would be a more precise approximation when considering the exit label true. It is still an over-approximation, as it expresses that both constructors are possible. The argument of the Some constructor is entirely read, while for None only the tag is read.

For variants, we want to take a step further and also include the information that certain constructors cannot occur for certain execution paths. Impossible, the third atomic case ($\bot$), is introduced for this purpose. As mentioned previously in Section 5.1, in order to obtain this additional information, we simultaneously perform a “possible-constructors” analysis, which computes, for each execution scenario, the subset of possible constructors for a given value at a given program point. All constructors that cannot occur on a given execution path are marked as being $\bot$. In contrast, constructors for which only the tag is read are marked as $\square$. The difference between $\bot$ and $\square$ can be illustrated by considering a polymorphic option type option<A>, having two constructors, None and Some(A val), respectively, and a Boolean predicate that pattern matches on an input of this type and returns false in the case of None and true in the case of Some, unconditioned by the value val of its argument. For the true execution scenario, the dependency on the Some constructor would be $\square$. The tag is read and it is decisive for the outcome, but the value of its argument val is completely irrelevant. The dependency on the None constructor, however, would be $\bot$: the predicate can exit with label true if and only if the input matches against the Some constructor. By distinguishing between these two cases, we can not only distinguish the


input's subelements that have a direct impact on an operation's output, but additionally we can also obtain a more detailed footprint that highlights the influence exerted by the input's “shape” on the operation's outcome.

For instance, for the i-th element of the threads array of our previous example, a dependency mapping the constructor None to $\bot$ would be a more precise approximation when considering the label true. Taking into account all the discussed values, we can express the dependency depicted in Figure 5.5 for the label true as follows:

\[ \{\mathit{threads} \mapsto \langle \square \triangleright i : [\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot] \rangle;\ \mathit{pid} \mapsto \square;\ \mathit{crt\_thread} \mapsto \square;\ \mathit{adr\_space} \mapsto \square\} \]
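To make the shape of such dependency values concrete, the following minimal Python sketch encodes the seven cases of Definition 5.2.1 and builds the dependency given above for the label true. The encoding (strings for atoms, tagged tuples for compound cases) and the helper names `struct`, `variant`, `arr` and `show` are illustrative assumptions of ours; they are not part of αSmil or of the analysis implementation.

```python
# Hypothetical encoding (an assumption of this sketch, not the thesis's):
# atoms are strings, compound dependencies are tagged tuples.
TOP, NOTHING, BOT = "top", "nothing", "bot"      # atomic cases (i)-(iii)

def struct(fields):
    """Case (iv): one dependency per field."""
    return ("struct", dict(fields))

def variant(ctors):
    """Case (v): one dependency per constructor."""
    return ("variant", dict(ctors))

def arr(default, index=None, exc=None):
    """Cases (vi) and (vii); index is None when no exceptional cell is known."""
    return ("arr", default, index, exc)

# Coarse over-approximation: the whole threads field is needed.
coarse = struct({"threads": TOP, "pid": NOTHING,
                 "crt_thread": NOTHING, "adr_space": NOTHING})

# Dependency of process p for the exit label true (Figure 5.5).
thread_true = struct({
    "threads": arr(NOTHING, "i", variant({"Some": TOP, "None": BOT})),
    "pid": NOTHING,
    "crt_thread": NOTHING,
    "adr_space": NOTHING,
})

def show(d):
    """Render a dependency in a notation close to Definition 5.2.1."""
    if isinstance(d, str):
        return {"top": "⊤", "nothing": "□", "bot": "⊥"}[d]
    if d[0] == "struct":
        return "{" + "; ".join(f"{f} ↦ {show(x)}" for f, x in d[1].items()) + "}"
    if d[0] == "variant":
        return "[" + "; ".join(f"{c} ↦ {show(x)}" for c, x in d[1].items()) + "]"
    _, default, index, exc = d
    if index is None:
        return f"⟨{show(default)}⟩"
    return f"⟨{show(default)} ▹ {index} : {show(exc)}⟩"
```

Calling `show(thread_true)` renders the value in the notation used in this section, which is convenient when inspecting results by hand.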

We remark that $\top$, $\square$ and $\bot$ can apply to any type. For instance, $\top$ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to $\top$, are transformed to $\top$. The $\bot$ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to $\bot$. A whole structure or array is impossible if any of its subelements is impossible.

The $\bot$ atomic value is the lower bound of our domain and hence the most precise value. The final abstract dependency is a closure of all these, combined recursively. To give an intuition of the shape of our dependency lattice, we illustrate below, in Figure 5.9, the Hasse diagram of the order relation between pairs of atomic dependency values. Intuitively, if the two analyses were performed separately, the upper “diamond” shape would correspond to the dependency analysis and the lower one to the possible-constructors analysis. The element $\square$ would be the lower bound for the dependency domain and the upper bound for the possible-constructors domain. By performing them simultaneously, $\bot$ becomes the domain's lower bound.

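[Figure: Hasse diagram. Its nodes, in the order in which they appear in the figure, are: $(\top, \top)$; $(\top, \square)$ and $(\square, \top)$; $(\square, \square)$; $(\square, \bot)$ and $(\bot, \square)$; $(\bot, \bot)$; $(\top, \bot)$ and $(\bot, \top)$.]

Figure 5.9 – Order Relation on Pairs of Atomic Dependencies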

The partial order relation is denoted by $\sqsubseteq$ and defined as shown below.

Definition 5.2.2. Partial Order $\sqsubseteq$

\[ \sqsubseteq \;\subseteq\; \mathcal{D} \times \mathcal{D} \]


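Table 5.1 – $\sqsubseteq$ – Comparison of Two Domains

\[
\frac{}{\delta \sqsubseteq \top}\ \textsc{Top}
\qquad
\frac{}{\bot \sqsubseteq \delta}\ \textsc{Bot}
\]

\[
\frac{\delta_1 \sqsubseteq \delta'_1 \quad \dots \quad \delta_n \sqsubseteq \delta'_n}
     {\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \sqsubseteq \{f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n\}}\ \textsc{Str}
\qquad
\frac{\square \sqsubseteq \delta_1 \quad \dots \quad \square \sqsubseteq \delta_n}
     {\square \sqsubseteq \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\}}\ \square\textsc{Str}
\]

\[
\frac{\delta_1 \sqsubseteq \delta'_1 \quad \dots \quad \delta_n \sqsubseteq \delta'_n}
     {[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] \sqsubseteq [C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n]}\ \textsc{Var}
\qquad
\frac{\square \sqsubseteq \delta_1 \quad \dots \quad \square \sqsubseteq \delta_n}
     {\square \sqsubseteq [C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n]}\ \square\textsc{Var}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def}}
     {\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \rangle}\ \textsc{ADef}
\qquad
\frac{\square \sqsubseteq \delta_{def}}
     {\square \sqsubseteq \langle \delta_{def} \rangle}\ \square\textsc{ADef}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{def}}
     {\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \rangle}\ \textsc{AIA}
\qquad
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{def} \sqsubseteq \delta'_{exc}}
     {\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle}\ \textsc{AAI}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc}}
     {\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle}\ \textsc{AI}
\qquad
\frac{\square \sqsubseteq \delta_{def} \quad \square \sqsubseteq \delta_{exc}}
     {\square \sqsubseteq \langle \delta_{def} \triangleright i : \delta_{exc} \rangle}\ \square\textsc{AI}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc} \quad \delta_{def} \sqsubseteq \delta'_{exc} \quad \delta_{exc} \sqsubseteq \delta'_{def} \quad i \neq j}
     {\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle}\ \textsc{AIJ}
\]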

It is used to compare dependencies and is detailed in Table 5.1. We write $\delta_1 \sqsubseteq \delta_2$, read as “a dependency $\delta_1$ is more precise than another dependency $\delta_2$”, if $\delta_1$ represents a smaller subset of a structural object and allows at most as many constructors as $\delta_2$. The greatest element is $\top$ (Top) and $\bot$ is the least (Bot). Instances of identical structure and variant types are compared pointwise (Str, Var). For arrays without known exceptional dependencies, we compare the default dependencies applying to all array cells (ADef). If exceptional dependencies are known for the same cell, these are additionally compared (AI). For arrays with known exceptional dependencies for different cells, we compare each dependency on the left-hand side with each one on the right-hand side (AIJ). The comparison of $\square$ with structures ($\square$Str), variants ($\square$Var) and arrays ($\square$ADef, $\square$AI) is a pointwise comparison between $\square$ and the dependency of each subelement.
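The rules of Table 5.1 can be read as a recursive decision procedure. The Python sketch below is one possible transcription, under our own assumptions: atoms are strings, compound dependencies are tagged tuples `("struct", {...})`, `("variant", {...})` and `("arr", default, index, exc)` with `index` set to `None` when no exceptional cell is known, array indices are compared syntactically by name, and no normalisation (such as collapsing a structure uniformly mapped to $\top$) is performed.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def leq(a, b):
    """Return True when a ⊑ b according to the rules of Table 5.1."""
    if b == TOP or a == BOT:                        # Top, Bot
        return True
    if a == NOTHING:
        if b == NOTHING:
            return True                             # reflexivity (assumed)
        if b == BOT:
            return False
        if b[0] in ("struct", "variant"):           # □Str, □Var
            return all(leq(NOTHING, d) for d in b[1].values())
        _, bd, bi, be = b                           # □ADef, □AI
        return leq(NOTHING, bd) and (bi is None or leq(NOTHING, be))
    if isinstance(a, str) or isinstance(b, str):
        return False                                # no rule applies
    if a[0] != b[0]:
        return False                                # different kinds of values
    if a[0] in ("struct", "variant"):               # Str, Var
        return a[1].keys() == b[1].keys() and all(
            leq(a[1][k], b[1][k]) for k in a[1])
    _, ad, ai, ae = a
    _, bd, bi, be = b
    if ai is None and bi is None:                   # ADef
        return leq(ad, bd)
    if bi is None:                                  # AIA
        return leq(ad, bd) and leq(ae, bd)
    if ai is None:                                  # AAI
        return leq(ad, bd) and leq(ad, be)
    if ai == bi:                                    # AI
        return leq(ad, bd) and leq(ae, be)
    return all(leq(x, y) for x in (ad, ae) for y in (bd, be))   # AIJ

# The three approximations of the threads field discussed in Section 5.2.
only_some = ("variant", {"Some": TOP, "None": BOT})
some_none = ("variant", {"Some": TOP, "None": NOTHING})
precise = ("arr", NOTHING, "i", only_some)
coarser = ("arr", NOTHING, "i", some_none)
coarsest = ("arr", NOTHING, "i", TOP)
```

With these values, `leq(precise, coarser)` and `leq(coarser, coarsest)` both hold, while `leq(coarsest, precise)` does not, which matches the informal ranking of the three approximations given earlier.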

5.2.1 Join and Reduction Operator

The join operation is denoted by $\vee$ and is defined as shown below.

Definition 5.2.3. Join Operation $\vee$

\[ \vee : \mathcal{D} \times \mathcal{D} \rightarrow \mathcal{D} \]


It is detailed in Table 5.2. Intuitively, the join of two dependencies is the union of the dependencies represented by the two. It is a commutative operation, for which the undisplayed cases in Table 5.2 are defined by their symmetrical counterparts. The operation is total: joining incompatible domains, such as a structure and a variant or two structures having different field identifiers, results in $\top$, the least precise value. Join is applied pointwise on each subelement; $\bot$ is its identity element and $\top$ is its absorbing element. Joining $\square$ and the dependency of a structure, variant or array is applied pointwise. The value obtained by joining $\delta$ and $\delta'$ is an upper bound of the two:

\[ \delta \sqsubseteq \delta \vee \delta' \ \text{ and } \ \delta' \sqsubseteq \delta \vee \delta', \qquad \forall \delta, \delta' \in \mathcal{D} \]

Defining the join of two dependencies corresponding to arrays is subtle. As shown in Table 5.1, we are allowing comparisons between dependencies corresponding to arrays with exceptions on different variables (rule AIJ); the join operation in this case amounts to joining the four different dependencies, without keeping any of the two exceptions. We could have chosen to keep one of the known exceptional dependencies, but this would have posed two problems: on one hand, the join operation would not be commutative and, on the other hand, it is hard to predict how the exceptional dependencies would be used at the intraprocedural level and which of the two could potentially lead to a gain in precision. Thus, we adopted this design decision. A strategy possibly worth investigating in such cases would be to allow users to specify array cells of interest at specific program points. This user-supplied information could then be taken into consideration whenever joining array dependencies with two different known exceptional dependencies. Our current join approach for arrays can lead to non-monotonic approximations in join. This becomes visible when noting that for a

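Table 5.2 – $\vee$ – Join Operation ($\delta' \vee \delta''$)

\[
\begin{array}{rcl}
\top \vee \delta & = & \top \\
\bot \vee \delta & = & \delta \\[4pt]
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \vee \{f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n\} & = & \{f_1 \mapsto \delta_1 \vee \delta'_1; \dots; f_n \mapsto \delta_n \vee \delta'_n\} \\
\square \vee \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} & = & \{f_1 \mapsto \square \vee \delta_1; \dots; f_n \mapsto \square \vee \delta_n\} \\[4pt]
[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] \vee [C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n] & = & [C_1 \mapsto \delta_1 \vee \delta'_1; \dots; C_n \mapsto \delta_n \vee \delta'_n] \\
\square \vee [C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] & = & [C_1 \mapsto \square \vee \delta_1; \dots; C_n \mapsto \square \vee \delta_n] \\[4pt]
\langle \delta_{def} \rangle \vee \langle \delta'_{def} \rangle & = & \langle \delta_{def} \vee \delta'_{def} \rangle \\
\square \vee \langle \delta_{def} \rangle & = & \langle \square \vee \delta_{def} \rangle \\[4pt]
\langle \delta_{def} \rangle \vee \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle & = & \langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{def} \vee \delta'_{exc} \rangle \\
\square \vee \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & = & \langle \square \vee \delta_{def} \triangleright i : \square \vee \delta_{exc} \rangle \\[4pt]
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \vee \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle & = &
\begin{cases}
\langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{exc} \vee \delta'_{exc} \rangle & \text{if } i = j \\
\langle \delta_{def} \vee \delta_{exc} \vee \delta'_{def} \vee \delta'_{exc} \rangle & \text{if } i \neq j
\end{cases} \\[4pt]
\square \vee \square & = & \square
\end{array}
\]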


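monotonic join operation the following should hold:

\[ \forall \delta, \delta', \rho.\quad \delta \sqsubseteq \delta' \implies \delta \vee \rho \sqsubseteq \delta' \vee \rho \qquad \text{(i)} \]

Considering

\[
\rho \equiv \langle \rho_{def} \triangleright i : \rho_i \rangle, \qquad
\delta \equiv \langle \delta_{def} \triangleright j : \delta_j \rangle, \qquad
\delta' \equiv \langle \delta'_{def} \triangleright i : \delta'_i \rangle, \qquad \text{where } i \neq j,
\]

the hypothesis $\delta \sqsubseteq \delta'$ is translated into the following constraints:

\[ \delta_{def} \sqsubseteq \delta'_{def}, \qquad \delta_{def} \sqsubseteq \delta'_i, \qquad \delta_j \sqsubseteq \delta'_{def}, \qquad \delta_j \sqsubseteq \delta'_i \]

Applying (i) for these three dependencies, we obtain

\[ \langle (\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \rangle \sqsubseteq \langle \delta'_{def} \vee \rho_{def} \triangleright i : \delta'_i \vee \rho_i \rangle \]

which holds if and only if both of the following inequalities hold:

\[
\begin{array}{rcl}
(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) & \sqsubseteq & \delta'_{def} \vee \rho_{def} \\
(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) & \sqsubseteq & \delta'_i \vee \rho_i
\end{array}
\]

Considering, for instance,

\[ \rho_i = \top, \qquad \rho_{def} \neq \top, \qquad \delta_{def} = \delta_j = \delta'_{def} = \bot \]

a counterexample is found: the left-hand side of the first inequality is $\top$, while its right-hand side is $\rho_{def} \neq \top$.

As a consequence of the non-monotonic approximations made for arrays (rule AIJ), the value obtained by joining two dependencies is an upper bound, not a least upper bound. We address this issue and indicate our solution in Section 5.3 (on page 94). We remark that we keep only one exceptional cell for array dependencies, as in practice most operations manipulating arrays tend to either modify only one element or all of them. Logical properties on arrays generally have to hold for all elements. Keeping more than one exceptional dependency would be much more costly and the additional cost would not necessarily be justified in practice. However, the join operation would be more straightforward and would not impose non-monotonic approximations.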
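The counterexample above can be replayed mechanically. The following Python sketch implements the join of Table 5.2 and the order of Table 5.1, restricted to atoms and arrays, under an encoding that is our own assumption: atoms are strings and arrays are tuples `("arr", default, index, exc)`, with `index` set to `None` when there is no exceptional cell and indices compared by name. The concrete choices $\rho_{def} = \square$ and $\delta'_i = \square$ are ours; the text only requires $\rho_{def} \neq \top$.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def arr(default, index=None, exc=None):
    return ("arr", default, index, exc)

def join(a, b):
    """Join of Table 5.2, restricted to atoms and arrays."""
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == NOTHING and b == NOTHING:
        return NOTHING
    # Joining □ with an array is pointwise: treat □ as the array ⟨□⟩.
    _, ad, ai, ae = arr(NOTHING) if a == NOTHING else a
    _, bd, bi, be = arr(NOTHING) if b == NOTHING else b
    if ai is None and bi is None:
        return arr(join(ad, bd))
    if ai is None:
        return arr(join(ad, bd), bi, join(ad, be))
    if bi is None:
        return arr(join(ad, bd), ai, join(ae, bd))
    if ai == bi:
        return arr(join(ad, bd), ai, join(ae, be))
    # Different exceptional cells: neither exception is kept.
    return arr(join(join(ad, ae), join(bd, be)))

def leq(a, b):
    """Order of Table 5.1, restricted to atoms and arrays."""
    if b == TOP or a == BOT or a == b:
        return True
    if isinstance(b, str) or a == TOP:
        return False
    _, bd, bi, be = b
    if a == NOTHING:                                  # □ADef, □AI
        return leq(a, bd) and (bi is None or leq(a, be))
    _, ad, ai, ae = a
    if ai is None and bi is None:                     # ADef
        return leq(ad, bd)
    if bi is None:                                    # AIA
        return leq(ad, bd) and leq(ae, bd)
    if ai is None:                                    # AAI
        return leq(ad, bd) and leq(ad, be)
    if ai == bi:                                      # AI
        return leq(ad, bd) and leq(ae, be)
    return all(leq(x, y) for x in (ad, ae) for y in (bd, be))   # AIJ

rho = arr(NOTHING, "i", TOP)         # rho_i = ⊤, rho_def = □ (our choice)
delta = arr(BOT, "j", BOT)           # delta_def = delta_j = ⊥
delta_p = arr(BOT, "i", NOTHING)     # delta'_def = ⊥, delta'_i = □ (our choice)

hypothesis = leq(delta, delta_p)                             # δ ⊑ δ'
conclusion = leq(join(delta, rho), join(delta_p, rho))       # δ ∨ ρ ⊑ δ' ∨ ρ ?
```

Here `hypothesis` is true while `conclusion` is false: joining `delta` with `rho` drops both exceptional cells and yields the array uniformly mapped to ⊤, whereas joining `delta_p` with `rho` keeps the exception on `i`.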

Besides join, a reduction operator, denoted by $\oplus$, has been defined as well.

Definition 5.2.4. Reduction Operator $\oplus$

\[ \oplus : \mathcal{D} \times \mathcal{D} \rightarrow \mathcal{D} \]

This is a recursive, commutative, pointwise operation. Intuitively, this operator is introduced for taking advantage of the information additionally computed by the possible-constructors analysis that we perform simultaneously. Following the same execution path, the same constructors must be possible. The reduction operator is used in order to incorporate this additional information computed for constructors. The dependency


analysis can be seen as a may analysis, i.e. when combining the dependency information computed at two different points on the same execution path, the result must account for all dependencies computed at any of the two combined points. In contrast, the possible-constructors analysis can be seen as a must analysis, i.e. when combining information at two different points on the same execution path, it needs to keep facts that hold at both combined points. Thus, the reduction operator combines dependencies on the same execution path and consists in performing the intersection of constructors in the case of variants and the union of dependencies for all other types. The reduction operator's role will become more transparent after presenting the intraprocedural dependency analysis and the corresponding data-flow equations in Section 5.3. Its identity element is $\square$ and its absorbing element is $\bot$. The reduction operator between $\top$ and the dependency of a structure, variant or array is applied pointwise. Two instances of identical variant types are pointwise reduced. Similarly to join, the undisplayed cases in Table 5.3 are defined with respect to their symmetrical counterparts.

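\[
\begin{array}{rcl}
\bot \oplus \delta & = & \bot \\
\square \oplus \delta & = & \delta \\[4pt]
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \oplus \{f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n\} & = & \{f_1 \mapsto \delta_1 \oplus \delta'_1; \dots; f_n \mapsto \delta_n \oplus \delta'_n\} \\
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} \oplus \top & = & \{f_1 \mapsto \delta_1 \oplus \top; \dots; f_n \mapsto \delta_n \oplus \top\} \\[4pt]
[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] \oplus [C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n] & = & [C_1 \mapsto \delta_1 \oplus \delta'_1; \dots; C_n \mapsto \delta_n \oplus \delta'_n] \\
[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] \oplus \top & = & [C_1 \mapsto \delta_1 \oplus \top; \dots; C_n \mapsto \delta_n \oplus \top] \\[4pt]
\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \rangle & = & \langle \delta_{def} \oplus \delta'_{def} \rangle \\
\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle & = & \langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{def} \oplus \delta'_{exc} \rangle \\[4pt]
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle & = &
\begin{cases}
\langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{exc} \oplus \delta'_{exc} \rangle & \text{where } i = j \\
\langle (\delta_{def} \vee \delta_{exc}) \oplus (\delta'_{def} \vee \delta'_{exc}) \rangle & \text{otherwise}
\end{cases} \\[4pt]
\langle \delta_{def} \rangle \oplus \top & = & \langle \delta_{def} \oplus \top \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \top & = & \langle \delta_{def} \oplus \top \triangleright i : \delta_{exc} \oplus \top \rangle \\
\top \oplus \top & = & \top
\end{array}
\]

Table 5.3 – $\oplus$ – Reduction Operator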
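The contrast between the reduction operator and join is easiest to see on a variant. The Python sketch below transcribes the rows of Tables 5.2 and 5.3 for atoms, structures and variants only (arrays are left out to keep it short). The tuple encoding and the decision to raise an error on combinations that Table 5.3 does not list are assumptions of this sketch.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def oplus(a, b):
    """Reduction of Table 5.3, restricted to atoms, structures and variants."""
    if BOT in (a, b):
        return BOT                          # ⊥ is absorbing
    if a == NOTHING:
        return b                            # □ is the identity
    if b == NOTHING:
        return a
    if a == TOP and b == TOP:
        return TOP
    if a == TOP:
        a, b = b, a                         # ⊕ is commutative
    kind, sub = a
    if b == TOP:                            # applied pointwise against ⊤
        return (kind, {k: oplus(d, TOP) for k, d in sub.items()})
    if b[0] != kind or b[1].keys() != sub.keys():
        raise ValueError("combination not covered by Table 5.3")
    return (kind, {k: oplus(d, b[1][k]) for k, d in sub.items()})

def join(a, b):
    """Join of Table 5.2 on the same fragment, for comparison."""
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == NOTHING and b == NOTHING:
        return NOTHING
    if a == NOTHING:
        a, b = b, a                         # join is commutative
    kind, sub = a
    if b == NOTHING:
        return (kind, {k: join(d, NOTHING) for k, d in sub.items()})
    if b[0] != kind or b[1].keys() != sub.keys():
        return TOP                          # incompatible shapes
    return (kind, {k: join(d, b[1][k]) for k, d in sub.items()})

# Dependency known further down the path, and a local fact stating that
# None is impossible while only the tag of Some has been inspected.
successor = ("variant", {"Some": TOP, "None": NOTHING})
local = ("variant", {"Some": NOTHING, "None": BOT})

reduced = oplus(successor, local)   # keeps None ↦ ⊥
joined = join(successor, local)     # forgets it: None ↦ □
```

The reduced value still records that None cannot occur, whereas the joined value maps None back to □, which illustrates why dependencies along one path are combined with ⊕ rather than with ∨.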

Finally, the extractions summarized in Table 5.4 have been defined for dependencies $\delta$ and are used to express the data-flow equations of Section 5.3.

Definition 5.2.5. Extraction of a field's dependency

\[ .f : \mathcal{D} \nrightarrow \mathcal{D} \]

Definition 5.2.6. Extraction of a constructor's dependency

\[ @C : \mathcal{D} \nrightarrow \mathcal{D} \]

Definition 5.2.7. Extraction of an array's cell dependency

\[ \langle i \rangle : \mathcal{D} \nrightarrow \mathcal{D} \]


Definition 5.2.8. Extraction of an array's dependency outside a cell i

\[ \langle \ast \setminus i \rangle : \mathcal{D} \nrightarrow \mathcal{D} \]

Definition 5.2.9. Extraction of an array's general dependency

\[ \langle \ast \rangle : \mathcal{D} \nrightarrow \mathcal{D} \]

They are partial functions and can only be applied on dependencies of the corresponding kind. For instance, the field extraction $.f$ only makes sense for atomic or structured values with a field named f, which should be the case if the dependency represents a variable of a structured type with some field f. For any of the atomic dependencies $\delta_a$, applying any of the defined extractions yields $\delta_a$.

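Table 5.4 – Dependency Extractions

\[
\begin{array}{ll}
\delta.f,\ f \in \mathcal{F} &
\top.f = \top \qquad \square.f = \square \qquad \bot.f = \bot \qquad
\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\}.f = \delta_i \ \text{ if } f = f_i \\[6pt]
\delta@C,\ C \in \mathcal{C} &
\top@C = \top \qquad \square@C = \square \qquad \bot@C = \bot \qquad
[C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m]@C = \delta_j \ \text{ if } C = C_j
\end{array}
\]

\[
\begin{array}{lll}
\delta\langle \ast \setminus i \rangle & \delta\langle i \rangle & \delta\langle \ast \rangle \\[4pt]
\top\langle \ast \setminus i \rangle = \top & \top\langle i \rangle = \top & \top\langle \ast \rangle = \top \\
\square\langle \ast \setminus i \rangle = \square & \square\langle i \rangle = \square & \square\langle \ast \rangle = \square \\
\bot\langle \ast \setminus i \rangle = \bot & \bot\langle i \rangle = \bot & \bot\langle \ast \rangle = \bot \\
\langle \delta_{def} \rangle\langle \ast \setminus i \rangle = \delta_{def} & \langle \delta_{def} \rangle\langle i \rangle = \delta_{def} & \langle \delta_{def} \rangle\langle \ast \rangle = \delta_{def}
\end{array}
\]

\[
\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle \ast \setminus i \rangle =
\begin{cases}
\delta_{def} & \text{when } i = k \\
\delta_{def} \vee \delta_{exc} & \text{otherwise}
\end{cases}
\]

\[
\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle i \rangle =
\begin{cases}
\delta_{exc} & \text{when } i = k \\
\delta_{def} \vee \delta_{exc} & \text{otherwise}
\end{cases}
\]

\[
\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle \ast \rangle = \delta_{def} \vee \delta_{exc}
\]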
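As an illustration of Table 5.4, the sketch below implements the five extractions in Python. It rests on assumptions of ours: the tuple encoding used for dependencies, a join limited to atoms, structures and variants (so array cells holding nested arrays are not supported), syntactic comparison of index names, and a `ValueError` to signal that a partial function is applied outside its domain. The function names `field`, `ctor`, `cell`, `outside` and `general` are hypothetical.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def join(a, b):
    """Join of Table 5.2 on atoms, structures and variants only."""
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == NOTHING and b == NOTHING:
        return NOTHING
    if a == NOTHING:
        a, b = b, a
    kind, sub = a
    if b == NOTHING:
        return (kind, {k: join(d, NOTHING) for k, d in sub.items()})
    if b[0] != kind or b[1].keys() != sub.keys():
        return TOP
    return (kind, {k: join(d, b[1][k]) for k, d in sub.items()})

def _sub(d, kind, key):
    if d[0] != kind or key not in d[1]:
        raise ValueError("extraction undefined on this dependency")
    return d[1][key]

def _array(d):
    if d[0] != "arr":
        raise ValueError("array extraction applied to a non-array dependency")
    return d[1:]

def field(d, f):
    """δ.f"""
    return d if isinstance(d, str) else _sub(d, "struct", f)

def ctor(d, c):
    """δ@C"""
    return d if isinstance(d, str) else _sub(d, "variant", c)

def cell(d, i):
    """δ⟨i⟩"""
    if isinstance(d, str):
        return d
    default, k, exc = _array(d)
    if k is None:
        return default
    return exc if i == k else join(default, exc)

def outside(d, i):
    """δ⟨∗ \\ i⟩"""
    if isinstance(d, str):
        return d
    default, k, exc = _array(d)
    if k is None or i == k:
        return default
    return join(default, exc)

def general(d):
    """δ⟨∗⟩"""
    if isinstance(d, str):
        return d
    default, k, exc = _array(d)
    return default if k is None else join(default, exc)

threads = ("arr", NOTHING, "i", ("variant", {"Some": TOP, "None": BOT}))
```

For instance, `cell(threads, "i")` returns the exceptional dependency unchanged, whereas `cell(threads, "j")` has to join it with the default dependency because nothing is known about how `j` relates to `i`.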

5.2.2 Well-Typed Dependencies

The described syntactic dependencies are untyped. However, their interpretation is made in the context of a type $\tau$. Dependencies such as $\square$ or $\top$ do not exhibit any data type features and can apply to any type, but others will be completely constrained, and most will fall in between, uncovering a few layers of structured types before reaching one of the “generic” leaves $\top$, $\square$ or $\bot$. For example, the dependency $\{f \mapsto \delta_f\}$ only really makes sense for structured types with a single field f, whose type itself is compatible with $\delta_f$, and shall not be used in connection with variant or array types.

As a consequence, we conclude the presentation of our abstract dependency type by explaining what it means for a dependency to be compatible with some type $\tau$, i.e.

53 Intraprocedural Analysis and Data-Flow Equations 91

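to be well-typed of some type $\tau$. This is described as a judgement parameterized by the typing environment $\Gamma$ (Definition 4.3.1), and the different inference rules are detailed in Table 5.5.

\[
\frac{}{\Gamma \vdash \top : \tau}\ \textsc{WT}\top
\qquad
\frac{}{\Gamma \vdash \bot : \tau}\ \textsc{WT}\bot
\qquad
\frac{}{\Gamma \vdash \square : \tau}\ \textsc{WT}\square
\]

\[
\frac{\tau = \mathit{struct}\{f_1 : \tau_1; \dots; f_n : \tau_n\} \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \dots \quad \Gamma \vdash \delta_n : \tau_n}
     {\Gamma \vdash \{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\} : \tau}\ \textsc{WTStruct}
\]

\[
\frac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \dots \mid C_n : \tau_n] \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \dots \quad \Gamma \vdash \delta_n : \tau_n}
     {\Gamma \vdash [C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n] : \tau}\ \textsc{WTVar}
\]

\[
\frac{\Gamma \vdash \delta_{def} : \tau}
     {\Gamma \vdash \langle \delta_{def} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArr}
\qquad
\frac{\Gamma \vdash \delta_{def} : \tau \qquad \Gamma \vdash \delta_{exc} : \tau \qquad \Gamma(i) = \tau_i}
     {\Gamma \vdash \langle \delta_{def} \triangleright i : \delta_{exc} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArrI}
\]

Table 5.5 – Well-Typed Dependencies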

The atomic dependency values are generic: they are well-typed with respect to any type (WT$\top$, WT$\square$, WT$\bot$). The dependency $\delta$ for a structure (WTStruct) is well-typed only with respect to an adequate structured type, whose field types are themselves compatible with the dependency mapped to them in $\delta$. Similarly, the dependency $\delta$ for a variant (WTVar) is well-typed only with respect to an adequate variant type. In turn, its constructors must be themselves compatible with the dependency mapped to them in $\delta$. For well-typed array dependencies (WTArr, WTArrI), the default dependency, as well as the exceptional dependency, have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of i, the index of the known exceptional dependency, has to be compatible with $\tau_i$, the array's index type.

In the following section, we discuss our intraprocedural dependency domain and the manner in which dependencies are computed and manipulated.

5.3 Intraprocedural Analysis and Data-Flow Equations

5.3.1 Intraprocedural Dependency Domains

At an intraprocedural level, dependency information has to be kept at each point of the control flow graph, for each variable of the typing environment $\Gamma$ that maps input,


output and local variables to their types. We use the term domain to denote this information.

Definition 5.3.1. Intraprocedural Dependency Domain $\Delta \in \mathbb{D}$. An intraprocedural domain $\Delta \in \mathbb{D}$,

\[ \Delta : \mathcal{V} \rightarrow \mathcal{D} \]

is a mapping from variables to dependencies.

An intraprocedural domain is associated to every node of the control flow graph, representing the dependencies at the node's entry point. A special case is the mapping which binds all variables to $\bot$, which we call Unreachable:

\[ \mathit{Unreachable} \equiv \{x \mapsto \bot\} \]

In particular, it is associated to nodes that cannot be reached during the analysis. Also, if any of the variables of $\Delta$ is marked as $\bot$, the entire node collapses, becoming Unreachable.

For any node of the control flow graph associated to an intraprocedural domain $\Delta$, $\Delta(x)$ retrieves the dependency associated to the variable x. If a dependency for x has not been computed yet, it is mapped to $\square$.

Forgetting a variable x from a reachable intraprocedural domain, denoted by $\Delta \setminus x$, “erases” the variable's dependency information by mapping it to $\square$.

Definition 5.3.2. Forget x

\[
\Delta \setminus x =
\begin{cases}
\mathit{Unreachable} & \text{when } \Delta = \mathit{Unreachable} \\[4pt]
\Delta' = \left\{ y \mapsto
  \begin{cases}
  \Delta(y) & \text{when } y \neq x \\
  \square & \text{when } y = x
  \end{cases}
\right\} & \text{otherwise}
\end{cases}
\]

The $\sqsubseteq_\Delta$, $\vee_\Delta$ and $\oplus_\Delta$ operations are pointwise extensions of $\sqsubseteq$ (defined in 5.2.2), $\vee$ (defined in 5.2.3) and $\oplus$ (defined in 5.2.4), respectively; they apply to intraprocedural dependency domains, for each variable and its associated dependency $\delta_v$.

We define a partial order $\sqsubseteq_\Delta$ on $\mathbb{D}$.

Definition 5.3.3. Intraprocedural Partial Order $\sqsubseteq_\Delta$

\[ \sqsubseteq_\Delta \;\subseteq\; \mathbb{D} \times \mathbb{D} \qquad\qquad \Delta' \sqsubseteq_\Delta \Delta'' \ \text{ iff } \ \Delta'(x) \sqsubseteq \Delta''(x), \ \forall x \in \mathcal{V} \]

In particular, Unreachable is the bottom of this intraprocedural lattice. It is the identity element of the intraprocedural join operation $\vee_\Delta$ and the absorbing element of the intraprocedural reduction operator $\oplus_\Delta$ defined below.

Definition 5.3.4. Intraprocedural Join Operation $\vee_\Delta$

\[ \vee_\Delta : \mathbb{D} \times \mathbb{D} \rightarrow \mathbb{D} \]

\[ \Delta' \vee_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \vee \Delta''(x), \ \forall x \in \mathcal{V} \]


Definition 5.3.5. Intraprocedural Reduction Operator $\oplus_\Delta$

\[ \oplus_\Delta : \mathbb{D} \times \mathbb{D} \rightarrow \mathbb{D} \]

\[ \Delta' \oplus_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \oplus \Delta''(x), \ \forall x \in \Gamma \]

Finally, an intraprocedural domain $\Delta$ is well-typed with respect to a typing environment $\Gamma$ if and only if the dependency mapped to any variable x is well-typed with respect to x's type in the typing environment $\Gamma$ (Definition 4.3.1).

5.3.2 Intraprocedural Data-Flow Equations

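Table 5.6 – Statements – Representations and Data-Flow Equations

Representation: a node $n$ with successor nodes $n_1, \dots, n_i, \dots, n_k$, which carry the domains $\Delta_{n_1}, \dots, \Delta_{n_i}, \dots, \Delta_{n_k}$ and are reached through edges labeled $s, \lambda_1$; $\dots$; $s, \lambda_i$; $\dots$; $s, \lambda_k$.

Equation:

\[ \Delta_n = \bigvee\nolimits_{\Delta} \Big\{\, \llbracket s \rrbracket_{\lambda_i}(\Delta_{n_i}) \ \Big|\ n \xrightarrow{\,s,\ \lambda_i\,} n_i \,\Big\} \]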

Our dependency analysis is a backward data-flow analysis. For each exit label, it traverses the control flow graph starting with its corresponding exit node, and it marks all other exit points as Unreachable, since exit labels are mutually exclusive. The intraprocedural domain for the currently analysed label is initialized with its associated output variables mapped to $\top$. Thereby, the analysis starts by making a conservative approximation and by considering that all the input has been observed and the output depends on it entirely. Typically, dependence analyses are forward analyses. However, given our goal to express label-specific dependencies as input-output relations, and taking into consideration the characteristics of the αSmil language, choosing to design our analysis as a backward data-flow analysis seemed a pertinent choice. In αSmil, outputs are associated to a particular exit label and they are generated if and only if the predicate exits with that particular label. By traversing the control flow graph backwards, we can use this information and consider, starting with the initialisation phase, only the outputs that are relevant for the analysed exit label.

After the initialisation, the analysis traverses the control flow graph and gradually refines the dependencies until a fixed point is reached. Table 5.6 summarizes the representation and general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural domains of the statement's successor nodes. The intraprocedural domain at the entry point of the node is obtained by joining the contributions of each outgoing edge, as shown in Figure 5.10.

Definition 5.3.6. The contribution of an edge $(n_i, n_j)$ labeled with $s$ and $\lambda$ is given by $\llbracket s \rrbracket_\lambda(\Delta_{n_j})$, where $\llbracket s \rrbracket_\lambda(\cdot)$ is the transfer function of the edge labeled $s, \lambda$.


Dependencies corresponding to variables that are written by a statement $s$ on an exit label $\lambda$, denoted by $\mathit{gen}_{s,\lambda}$ in Figure 5.10, are forgotten from the intraprocedural domain on which we are operating.

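[Figure: a statement node with outgoing edges $\lambda_1, \dots, \lambda_n$ towards successor domains $\Delta_{\lambda_1}, \dots, \Delta_{\lambda_n}$. Each edge $\lambda_i$ carries the contribution $(\Delta_{\lambda_i} \setminus \mathit{gen}_{s,\lambda_i}) \oplus_\Delta \delta_{s,\lambda_i}$.]

\[ \Delta_{in} = \llbracket s \rrbracket_{\lambda_1}(\Delta_{\lambda_1}) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket_{\lambda_n}(\Delta_{\lambda_n}) \]

\[ \llbracket s \rrbracket_{\lambda_i}(\Delta_i) = (\Delta_i \setminus \mathit{gen}_{s,\lambda_i}) \oplus_\Delta \delta_{s,\lambda_i}, \qquad \delta_{s,\lambda_i} \text{: contribution of } s \text{ on } \lambda_i \]

Figure 5.10 – Computation of the Intraprocedural Domain at a Node's Entry Point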

In Section 5.2.1, we explained that, as a consequence of the non-monotonic approximations made when joining dependencies corresponding to arrays, the result of the join operation is an upper bound, not a least upper bound. In order to deal with this issue, we adopt the generic solution consisting of systematically joining the dependency domain associated to a node before its iteration with the new dependency domain computed by the transfer function. Thus, the dependency domain of a node $n$ is

\[ \Delta_n = \mathit{old}(\Delta_n) \vee_\Delta \Big( \bigvee\nolimits_{\Delta} \Big\{\, \llbracket s \rrbracket_{\lambda}(\Delta_{n'}) \ \Big|\ n \xrightarrow{\,s,\ \lambda\,} n' \,\Big\} \Big) \]

This is not prohibitive in terms of performance, leading to an increase of the execution time of 5 to 10%.
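The iteration scheme just described can be sketched as a small fixed-point loop. The Python code below is a simplified illustration under assumptions of ours, not the analysis implementation: dependencies are restricted to the three atoms, a domain is a dictionary from variable names to atoms (absent variables read as □), `None` stands for Unreachable, and the only statement supported is the assignment `o := e`. The names `solve`, `assign` and `join_dom` are hypothetical.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"
RANK = {BOT: 0, NOTHING: 1, TOP: 2}          # on atoms: ⊥ ⊑ □ ⊑ ⊤

def join_dom(d1, d2):
    """Pointwise join of two domains; None (Unreachable) is its identity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    return {x: max(d1.get(x, NOTHING), d2.get(x, NOTHING), key=RANK.get)
            for x in d1.keys() | d2.keys()}

def assign(o, e):
    """Transfer function of o := e on atomic dependencies."""
    def transfer(dom):
        if dom is None:
            return None
        out = {x: d for x, d in dom.items() if x != o}      # forget o
        dep_o, dep_e = dom.get(o, NOTHING), out.get(e, NOTHING)
        if BOT in (dep_o, dep_e):
            return None                                     # ⊥ collapses the node
        out[e] = max(dep_o, dep_e, key=RANK.get)            # ⊕ on non-⊥ atoms
        return out
    return transfer

def solve(edges, domains):
    """edges[n] lists (transfer, successor) pairs. domains maps every node to
    its current domain and is refined in place until nothing changes."""
    changed = True
    while changed:
        changed = False
        for n, outgoing in edges.items():
            new = domains[n]                                # old(Δn)
            for transfer, succ in outgoing:
                new = join_dom(new, transfer(domains[succ]))
            if new != domains[n]:
                domains[n] = new
                changed = True
    return domains

# Node 0: y := x, node 1: z := y, node 2: exit node whose output z is needed.
edges = {0: [(assign("y", "x"), 1)], 1: [(assign("z", "y"), 2)], 2: []}
result = solve(edges, {0: None, 1: None, 2: {"z": TOP}})
```

After the loop stabilises, the entry domain `result[0]` maps `x` to ⊤: the output `z` depends entirely on the input `x`. Because each node's previous domain is always joined in, the sequence of domains at a node can only grow, which is what makes the scheme terminate even when the underlying join is not a least upper bound.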

Tables 5.7, 5.8, 5.9 and 5.10 define the transfer functions for each built-in statement of our language, whereas the general case of a predicate call and its corresponding equation will be detailed in Section 5.4.

Table 5.7 presents the transfer functions for statements which are not type-specific. For equality tests (1), both of the inputs e1 and e2 are completely read, whether the test returns true or false. The transfer functions therefore reduce the domain of the corresponding successor node with a domain consisting of e1 and e2, both mapped to $\top$. In the case of assignment (2), the dependency of the written output variable o is forgotten from the successor's intraprocedural domain, thus being mapped to $\square$, and forwarded to the input variable e. The transfer function for the nop operation (3) is simply the identity.


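Statement and transfer function $\llbracket s \rrbracket^{\lambda_i}(\Delta)$:

Equality test (1):
\[
\llbracket e_1 = e_2 \rrbracket^{true}(\Delta) = \Delta \oplus_\Delta dep
\qquad
\llbracket e_1 = e_2 \rrbracket^{false}(\Delta) = \Delta \oplus_\Delta dep
\qquad \text{where } dep = \{ e_1 \mapsto \top,\ e_2 \mapsto \top \}
\]

Assignment (2):
\[
\llbracket o := e \rrbracket^{true}(\Delta) = (\Delta \setminus o) \oplus_\Delta \{ e \mapsto \Delta(o) \}
\]

No operation (3):
\[
\llbracket nop \rrbracket^{true}(\Delta) = \Delta
\]

Table 5.7 – Generic Statements – Data-Flow Equations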
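The three generic equations can be exercised on a deliberately small model. The sketch below is an assumption-laden illustration rather than the analysis itself: it only handles the three atomic dependencies (everything, nothing, impossible), it treats a variable absent from a domain as mapped to nothing, and its `oplus` is a guess at how the reduction operator behaves on atoms (impossible is absorbing, everything dominates nothing). All function names are hypothetical.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"


def oplus(a, b):
    """Reduction on atomic dependencies (assumed behaviour)."""
    if BOT in (a, b):
        return BOT          # impossible is absorbing
    if TOP in (a, b):
        return TOP          # everything dominates nothing
    return NOTHING


def reduce_dom(d1, d2):
    """Pointwise reduction of two domains; missing variables count as NOTHING."""
    return {v: oplus(d1.get(v, NOTHING), d2.get(v, NOTHING))
            for v in d1.keys() | d2.keys()}


def forget(dom, var):
    """Delta \\ var: drop what was known about a written variable."""
    return {v: d for v, d in dom.items() if v != var}


def equality_test(dom, e1, e2):
    """Equation (1): both operands are completely read, on either label."""
    return reduce_dom(dom, {e1: TOP, e2: TOP})


def assign(dom, o, e):
    """Equation (2): forget o and forward its dependency to e."""
    return reduce_dom(forget(dom, o), {e: dom.get(o, NOTHING)})


def nop(dom):
    """Equation (3): identity."""
    return dict(dom)
```

For instance, `assign({"o": TOP}, "o", "e")` forgets `o` and yields a domain in which only `e` is mapped to everything.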

The data-flow equations given in Table 5.8 correspond to structure-related statements. For equations (4), (5), (6) and (7), we assume that the variable $r$ is of type $struct\{f_1 : \tau, \dots, f_n : \tau\}$ for some fields $f_i$, $1 \le i \le n$. Equation (4) refers to the creation of a structure: each input $e_i$ is read as much as the corresponding field $f_i$ of the structure is read. The destructuring of a structure is handled in (5): each field $f_i$ is needed as much as the corresponding variable $o_i$ is. When accessing the $i$-th field of a structure $r$ (6), only the field $f_i$ is read, and only as much as the result $o$ of the access itself. Equation (7) treats field updates: the variable $e_i$ is read as much as the field $f_i$ is, and the structure $r$ is read as much as all the fields other than $f_i$ are read in $r'$. Finally, the equations given in (8) handle partial structure equality tests. The transfer functions are the same for the labels true and false: for both compared structures $r'$ and $r''$, all the fields in the given set $\{f_1, \dots, f_k\}$ are completely read, and only those.

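Statement and transfer function $\llbracket s \rrbracket^{\lambda_i}(\Delta)$:

Create (4):
\[
\llbracket r := \{e_1, \dots, e_n\} \rrbracket^{true}(\Delta) = (\Delta \setminus r) \oplus_\Delta \bigoplus_{1 \le i \le n} \{ e_i \mapsto \Delta(r).f_i \}
\]

Destructure (5):
\[
\llbracket \{o_1, \dots, o_n\} := r \rrbracket^{true}(\Delta) = (\Delta \setminus \{ o_i \mid o_i \in o \}) \oplus_\Delta \{ r \mapsto \{ f_1 \mapsto \Delta(o_1), \dots, f_n \mapsto \Delta(o_n) \} \}
\]

Access field (6):
\[
\llbracket o := r.f_i \rrbracket^{true}(\Delta) = (\Delta \setminus o) \oplus_\Delta \{ r \mapsto \{ f_1 \mapsto \square, \dots, f_i \mapsto \Delta(o), \dots, f_n \mapsto \square \} \}
\]

Update field (7):
\[
\llbracket r' := \{ r \text{ with } f_i = e \} \rrbracket^{true}(\Delta) = (\Delta \setminus r') \oplus_\Delta \{ e_i \mapsto \Delta(r').f_i,\ r \mapsto \{ f_1 \mapsto \delta_1, \dots, f_n \mapsto \delta_n \} \}
\qquad \text{where } \delta_j =
\begin{cases}
\Delta(r').f_j & \text{if } j \neq i \\
\square & \text{otherwise}
\end{cases}
\]

Equality (8):
\[
\llbracket r' =_{\langle f_1, \dots, f_k \rangle} r'' \rrbracket^{true}(\Delta) = \Delta \oplus_\Delta d
\qquad
\llbracket r' =_{\langle f_1, \dots, f_k \rangle} r'' \rrbracket^{false}(\Delta) = \Delta \oplus_\Delta d
\]
\[
\text{where } d = \{ r' \mapsto \{ f_1 \mapsto \delta_1, \dots, f_n \mapsto \delta_n \},\ r'' \mapsto \{ f_1 \mapsto \delta_1, \dots, f_n \mapsto \delta_n \} \}
\quad \text{and } \delta_i =
\begin{cases}
\top & \text{if } f_i \in \{f_1, \dots, f_k\} \\
\square & \text{otherwise}
\end{cases}
\]

Table 5.8 – Structure-Related Statements – Data-Flow Equations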

96 Chapter 5 Dependency Analysis for Functional Specifications

The data-flow equations given in Table 5.9 correspond to variant-related statements. They follow the same principles as those used for structure-related statements above. Note that the transfer functions for the switch (10) and the possible constructor test (11) introduce $\bot$ dependencies for constructors which are known to be impossible on the considered edge. In particular, since $\bot$ is an absorbing element for $\oplus$, these transfer functions erase, for every constructor which is known to be locally impossible, all the dependency information possibly attached to such a constructor in the successor nodes. This is the actual raison d'être for the reduction operator, since using $\vee_\Delta$ to combine a successor domain and a local contribution would lose this information.
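The difference between the two operators on an impossible constructor can be made concrete with a small sketch. It is restricted to atomic dependencies and assumes the ordering impossible below nothing below everything, with both operators applied constructor by constructor; the names `join`, `oplus` and `pointwise` are illustrative only and do not reproduce Definitions 5.2.3 and 5.2.4.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"
RANK = {BOT: 0, NOTHING: 1, TOP: 2}  # assumed order on atomic dependencies


def join(a, b):
    """Join on atoms: the less precise of the two."""
    return a if RANK[a] >= RANK[b] else b


def oplus(a, b):
    """Reduction on atoms: like the join, except that BOT is absorbing."""
    return BOT if BOT in (a, b) else join(a, b)


def pointwise(op, d1, d2):
    """Apply op constructor by constructor to two variant dependencies."""
    return {ctor: op(d1[ctor], d2[ctor]) for ctor in d1}


# The successor carries information about None; the edge knows None is impossible.
successor = {"Some": NOTHING, "None": TOP}
local = {"Some": TOP, "None": BOT}

reduced = pointwise(oplus, successor, local)  # None stays impossible
joined = pointwise(join, successor, local)    # impossibility of None is lost
```

With the reduction, the constructor `None` remains impossible; with the join, it would be reported as entirely needed.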

Finally, the equations for array-related statements are given in Table 5.10. We assume for both that the context is fixed and that $I$ is the distinguished set of input variables for the analysed predicate. This set is used to make sure that exceptions in array dependencies are only registered for variables in $I$, and not for local or output variables. The reason for such a constraint is pragmatic: input variables are not assignable in our language, and therefore they always represent the same value intraprocedurally. Otherwise, each time a variable is written by a statement, we would need to traverse all the dependencies in the domain to erase or reinterpret the occurrences where this variable appears as an exception. Only recording exceptions for input variables makes this kind of costly traversal unnecessary, and since only exceptions about input variables make sense at the interprocedural level (see Section 5.4), we do not lose much precision by doing so.

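Statement and transfer function $\llbracket s \rrbracket^{\lambda_i}(\Delta)$:

Create variant (9):
\[
\llbracket v := C_p[e] \rrbracket^{true}(\Delta) = (\Delta \setminus v) \oplus_\Delta \{ e \mapsto \Delta(v).C_p \}
\]

Variant switch (10):
\[
\llbracket switch(v) \text{ as } [o_1 \mid \dots \mid o_n] \rrbracket^{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus_\Delta \{ v \mapsto dep_i \}
\qquad \text{where } dep_i = [C_1 \mapsto \bot, \dots, C_i \mapsto \Delta(o_i), \dots, C_n \mapsto \bot]
\]

Possible variant (11):
\[
\llbracket v \in \{C_1, \dots, C_k\} \rrbracket^{true}(\Delta) = \Delta \oplus_\Delta \{ v \mapsto [C_1 \mapsto \delta_1, \dots, C_n \mapsto \delta_n] \}
\qquad \text{where } \delta_i =
\begin{cases}
\Delta(v).C_i & \text{if } C_i \in \{C_1, \dots, C_k\} \\
\bot & \text{otherwise}
\end{cases}
\]
\[
\llbracket v \in \{C_1, \dots, C_k\} \rrbracket^{false}(\Delta) = \Delta \oplus_\Delta \{ v \mapsto [C_1 \mapsto \delta_1, \dots, C_n \mapsto \delta_n] \}
\qquad \text{where } \delta_i =
\begin{cases}
\Delta(v).C_i & \text{if } C_i \notin \{C_1, \dots, C_k\} \\
\bot & \text{otherwise}
\end{cases}
\]

Table 5.9 – Variant-Related Statements – Data-Flow Equations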


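Statement and transfer function $\llbracket s \rrbracket^{\lambda_i}(\Delta)$:

Array access (12):
\[
\llbracket o := a[i] \rrbracket^{true}(\Delta) =
\begin{cases}
(\Delta \setminus o) \oplus_\Delta \{ i \mapsto \top,\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle \} & \text{when } i \in I \\
(\Delta \setminus o) \oplus_\Delta \{ i \mapsto \top,\ a \mapsto \langle \Delta(o) \vee \square \rangle \} & \text{when } i \notin I
\end{cases}
\]
\[
\llbracket o := a[i] \rrbracket^{false}(\Delta) = \Delta \oplus_\Delta \{ i \mapsto \top,\ a \mapsto \langle \square \rangle \}
\]

Array update (13):
\[
\llbracket a' := [a \text{ with } i = e] \rrbracket^{true}(\Delta) =
\begin{cases}
(\Delta \setminus a') \oplus_\Delta \{ i \mapsto \top,\ e \mapsto \Delta(a')\langle i \rangle,\ a \mapsto \langle \Delta(a')\langle * \setminus i \rangle \triangleright i : \square \rangle \} & \text{when } i \in I \\
(\Delta \setminus a') \oplus_\Delta \{ i \mapsto \top,\ e \mapsto \Delta(a')\langle * \rangle,\ a \mapsto \langle \Delta(a')\langle * \rangle \vee \square \rangle \} & \text{when } i \notin I
\end{cases}
\]
\[
\llbracket a' := [a \text{ with } i = e] \rrbracket^{false}(\Delta) = \Delta \oplus_\Delta \{ i \mapsto \top,\ a \mapsto \langle \emptyset \rangle \}
\]

Table 5.10 – Array-Related Statements – Data-Flow Equations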

The transfer functions for (12) and (13) thus take care of making adequate approximations when exceptions cannot be introduced. As for the cases when the array access exits with the false label, note that the contribution to the array $a$ is $\langle \square \rangle$, which is strictly less precise than $\square$. The operation performs implicit bounds checking, and this can thus be seen as accounting for the fact that no cell in $a$ has been read, but the "length" or "support" of $a$ has been read. Hence, it would not be correct to claim that the result of the statement does not depend on $a$ at all. Similarly, a variant dependency $[C_1 \mapsto \square, \dots, C_n \mapsto \square]$ mapping all constructors to nothing has not read any value in any of the constructors, but may still depend on the variant's constructor itself. In contrast, we do not make this distinction for structures because we assume surjective pairing, i.e., structure values consist only of the fields themselves. Our solution can easily be adapted to deal with non-surjective cases.

5.3.3 Intraprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an intraprocedural level, we exemplify the mechanism behind it step by step on the predicate thread, discussed in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis, and compare the actual obtained results with the targeted ones depicted in Figure 5.5.

Since a predicate can only exit with one label at a time and we are considering the true label, we can map the nodes None and oob to Unreachable, as shown in Figure 5.11. This is an advantage of backwards analyses. For true, we make a pessimistic assumption and map the output ti to $\top$, considering that control on the output is external and


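[Figure 5.11: control flow graph of the predicate thread. The node th := p.threads leads on true to tio := th[i], which leads on true to switch(tio) as [ | ti] and on false to the exit oob. The switch leads on Some to the exit true and on None to the exit None. The exits None and oob are marked Unreachable; the exit true is annotated with $ti \mapsto \top$.]

Figure 5.11 – Analysing Predicate thread – Initialisation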

hence out of our reach, and that ti will be entirely needed by a potential caller. Going further up the control flow graph, we analyse the variant switch.

In order to compute the dependency for the node corresponding to the variant switch, we apply the data-flow equation given by (10) in Table 5.9. Since we are analysing the true case, we know that all other constructors (only the constructor None in this case) are locally impossible. Thus, we map it to $\bot$. We continue by forgetting the dependency information we knew about the output ti. Since its value is needed only in as much as the result of the switch on the corresponding edge is needed, we forward it to the part corresponding to the Some constructor. This is summarized below.

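[Figure 5.12: the constructors $C_1, \dots, C_{Some}, \dots, C_n$ of tio. The dependency of ti is forwarded to $C_{Some}$; every other constructor is mapped to $\bot$. The figure instantiates equation (10) of Table 5.9:]

\[
\llbracket switch(v) \text{ as } [o_1 \mid \dots \mid o_n] \rrbracket^{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus \{ v \mapsto dep_i \}
\qquad \text{where } dep_i = [C_1 \mapsto \bot, \dots, C_i \mapsto \Delta(o_i), \dots, C_n \mapsto \bot]
\]

Figure 5.12 – Applying the Variant Switch Equation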

Taking all this into account, for the node corresponding to the variant switch we obtain the dependency shown in Figure 5.13. For the output ti, we depend entirely on the Some constructor of the node's input variant tio, while the constructor None is impossible.

Making a step further up the graph, we access the cell i of the array th and apply the equation (12) given in Table 5.10. We begin by forgetting the dependency for the output tio, since this is written. Since we only access the element i, we map all other cells to Nothing, i.e., $\square$. To the dependency corresponding to the i-th cell we forward


[Figure 5.13: the control flow graph of Figure 5.11, in which the node switch(tio) as [ | ti] is now annotated with $tio \mapsto [Some \mapsto \top, None \mapsto \bot]$. The exit true remains annotated with $ti \mapsto \top$; the exits None and oob remain Unreachable.]

Figure 5.13 – Analysing Predicate thread – Variant Switch

the dependency we knew about tio, since we depend on it to the extent to which the result of the access is needed.

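[Figure 5.14: the cells $1, \dots, i, \dots, n$ of the array th. The dependency of tio is forwarded to the cell $i$; all other cells are mapped to $\square$. The figure instantiates equation (12) of Table 5.10:]

\[
\llbracket o := a[i] \rrbracket^{true}(\Delta) =
\begin{cases}
(\Delta \setminus o) \oplus \{ i \mapsto \top,\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle \} & \text{when } i \in I \\
(\Delta \setminus o) \oplus \{ i \mapsto \top,\ a \mapsto \langle \Delta(o) \vee \square \rangle \} & \text{when } i \notin I
\end{cases}
\]

Figure 5.14 – Applying the Array Access Equation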

We thus obtain a dependency stating that we depend only on the i-th cell of the array th, for which only the constructor Some is possible and entirely needed. The cell's index i is entirely needed as well. The applied equation is shown in Figure 5.14 (since i is an input, we use the first case of the equation) and the obtained results are shown in Figure 5.15.

As a last step, we access the field threads of the input process p and apply the equation (6) given in Table 5.8 and illustrated in Figure 5.16. As before, we forget the information for th, the result of the access. We map all other fields to $\square$ and we forward the dependency of the variable th to the dependency part of the field threads.

We thus obtain the dependency result shown in Figure 5.17. This states that, for the label true, the output ti depends only on the i-th cell of the field threads of the input process p, for which it depends entirely on the Some constructor. Before returning the predicate's final results, the analysis filters out any dependency information referring to local variables and verifies that the invariant imposed on dependency information


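[Figure 5.15: the control flow graph of Figure 5.13, in which the node tio := th[i] is now annotated with $th \mapsto \langle \square \triangleright i : [Some \mapsto \top, None \mapsto \bot] \rangle$ and $i \mapsto \top$. The annotations $tio \mapsto [Some \mapsto \top, None \mapsto \bot]$ and $ti \mapsto \top$ are unchanged.]

Figure 5.15 – Analysing Predicate thread – Array Access

[Figure 5.16: the fields $f_1, f_2, \dots, f_{threads}, \dots, f_{n-1}, f_n$ of the structure p. The dependency of th is forwarded to the field threads; all other fields are mapped to $\square$. The figure instantiates equation (6) of Table 5.8:]

\[
\llbracket o := r.f_i \rrbracket^{true}(\Delta) = (\Delta \setminus o) \oplus \{ r \mapsto \{ f_1 \mapsto \square, \dots, f_i \mapsto \Delta(o), \dots, f_n \mapsto \square \} \}
\]

Figure 5.16 – Applying the Field Access Equation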

related to arrays holds. Since the results refer only to the inputs p and i, and the index of the exceptional computed dependency is an input, the invariant holds and the final result can be retrieved. The final dependency results obtained for the thread predicate on the exit label true are identical to the ones that we were targeting and that were depicted in Figure 5.5. For readability, for structures such as the input process p, we omit dependencies on fields mapped to $\square$. We maintain this convention throughout the rest of this chapter, and thus any field of a structure that is omitted from a dependency summary should be interpreted as being mapped to $\square$, i.e., nothing.
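The three backward steps of this walkthrough can be replayed mechanically. The following Python sketch does so under explicit assumptions: dependencies are encoded as nested tuples of my own choosing, fields omitted from a structure stand for nothing (following the convention just stated), and the reduction operator is replaced by `add_disjoint`, which only handles the case where the two domains mention different variables. That case is sufficient here but is not a substitute for Definition 5.2.4. All function names are hypothetical.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"


def struct(fields):                 # omitted fields are read as NOTHING
    return ("struct", dict(fields))


def variant(ctors):
    return ("variant", dict(ctors))


def array(default, index=None, exc=None):
    return ("array", default, index, exc)


def forget(dom, var):
    return {v: d for v, d in dom.items() if v != var}


def add_disjoint(dom, contribution):
    # Stand-in for the reduction operator, valid only for disjoint key sets.
    assert not dom.keys() & contribution.keys(), "full oplus needed"
    return {**dom, **contribution}


def switch_edge(dom, v, o, ctor, impossible):            # equation (10)
    dep = variant({ctor: dom[o], **{c: BOT for c in impossible}})
    return add_disjoint(forget(dom, o), {v: dep})


def array_access_true(dom, o, a, i):                     # equation (12), i in I
    return add_disjoint(forget(dom, o),
                        {i: TOP, a: array(NOTHING, i, dom[o])})


def field_access_true(dom, o, r, field):                 # equation (6)
    return add_disjoint(forget(dom, o), {r: struct({field: dom[o]})})


d0 = {"ti": TOP}                                         # initialisation
d1 = switch_edge(d0, "tio", "ti", "Some", ["None"])      # variant switch
d2 = array_access_true(d1, "tio", "th", "i")             # tio := th[i]
d3 = field_access_true(d2, "th", "p", "threads")         # th := p.threads
```

The final domain `d3` mentions only the inputs `p` and `i`, and its shape matches the summary reported for the label true.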

5.4 Interprocedural Dependencies

Exit labels, presented in Section 3.1.2 and in Section 4.1 (on page 63), constitute an increased source of expressivity, as they indicate the scenario that was observed while executing a predicate. We incorporate this expressivity in our dependency results by computing specific dependencies for each possible execution scenario. Therefore, our analysis is performed label by label, and interprocedural dependency domains associate an intraprocedural domain to each exit label of the analysed predicate. The variable


[Figure 5.17: the control flow graph of Figure 5.15, in which the entry node th := p.threads is now annotated with $p \mapsto \{ threads \mapsto \langle \square \triangleright i : [Some \mapsto \top, None \mapsto \bot] \rangle \}$ and $i \mapsto \top$. The annotations on the other nodes are unchanged.]

Figure 5.17 – Analysing Predicate thread – Field Access

key-set of each associated intraprocedural domain comprises the inputs of the analysed predicate. A label that cannot be returned is mapped to an Unreachable intraprocedural domain. This is a form of path-sensitivity (Robert and Leroy 2012). However, we favor the term label-sensitivity for this characteristic, as it seems to be a more natural choice for our case and for the language we are working on.

An interprocedural domain of a predicate $p$ is thus defined as shown below.

Definition 5.4.1 Interprocedural Dependency Domain

\[
\mathcal{D}_p : \Lambda_p \rightarrow \mathcal{D} \qquad \text{where } \Lambda_p \text{ is the set of output labels of predicate } p
\]

For each analysed label of a predicate, the analysis starts by initializing the intraprocedural domain mapped to it with the output variables associated with the exit label. To avoid making any false assumption, these are initially mapped to the most general dependency, namely $\top$. Subsequently, as described in Section 5.3.2, the dependency information is gradually refined until a fixed point is reached. The execution scenarios denoted by the exit labels of a predicate are mutually exclusive. Therefore, during the analysis of a particular exit label, all other exit labels of the predicate are mapped to Unreachable. After reaching a fixed point, the intraprocedural domain is filtered so that only input variables appear in the variable set. As explained in Section 5.3.2, the intraprocedural domains are built such that only input variables may appear as exception indices in dependencies computed for arrays. This invariant is preserved throughout the analysis.

Interprocedural dependency information is expressed in terms of the formal parameters of predicates. For analysing predicate calls, we need to substitute the formal parameters of the callee by the ones that are supplied by the caller. Therefore, a substitution must be performed on interprocedural summaries. This consists in substituting all occurrences of formal input parameters of a predicate by the corresponding effective input parameters. The substitution operation is denoted as $\blacktriangleleft (\chi)$, where $\chi$ is a substitution from formal to effective parameters.


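We proceed by detailing the equation corresponding to a call to a predicate

\[
p(e_1, \dots, e_n)[\lambda_1 : o_1 \mid \dots \mid \lambda_m : o_m]
\]

having the following signature:

\[
p(\varepsilon_1, \dots, \varepsilon_n)[\lambda_1 : \omega_1 \mid \dots \mid \lambda_m : \omega_m]
\]

The general equation (given in Table 5.6) applies:

\[
\Delta_n = \mathop{\bigvee\nolimits_\Delta}_{n \xrightarrow{s,\lambda_i} n_i} \llbracket p(e_1, \dots, e_n)[\lambda_1 : o_1 \mid \dots \mid \lambda_m : o_m] \rrbracket^{\lambda_i}(\Delta_{n_i})
\]

The transfer functions for the predicate call statement are deduced from the predicate's interprocedural domain in the following fashion:

\[
\llbracket p(e_1, \dots, e_n)[\lambda_1 : o_1 \mid \dots \mid \lambda_m : o_m] \rrbracket^{\lambda_i}(\Delta) = (\Delta \setminus o_i) \bigoplus_{j \in 1..n} \{ e_j \mapsto dep_{ij} \}
\qquad \text{(PredEq)}
\]
\[
\text{where } dep_{ij} = \mathcal{D}_p(\lambda_i)(\varepsilon_j) \blacktriangleleft (\varepsilon \mapsto e)
\]

Namely, the mappings for the outputs $o$ associated with a label $\lambda_i$ are removed, and the contribution of a call to each input $e_j$ stems from the contribution of the interprocedural domain for label $\lambda_i$ and formal input $\varepsilon_j$. In these, all the formal input parameters $\varepsilon$ in array dependency domains are substituted by the corresponding effective input parameters from $e$.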
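As a rough illustration of (PredEq), the Python sketch below applies a stored summary at a call site. The encoding of dependencies as nested tuples, the function names, and the restriction to calls whose effective parameters are pairwise distinct and absent from the remaining domain are all assumptions made to keep the example short; in the general case the contributions have to be combined with the reduction operator. The summary used is the one obtained for thread on the label true, applied to the call thread(p, j).

```python
def substitute(dep, chi):
    """Rename formal parameters occurring as array exception indices."""
    if isinstance(dep, str):                       # atomic dependency
        return dep
    kind = dep[0]
    if kind in ("struct", "variant"):
        return (kind, {k: substitute(d, chi) for k, d in dep[1].items()})
    if kind == "array":
        _, default, index, exc = dep
        new_index = chi.get(index, index) if index is not None else None
        new_exc = substitute(exc, chi) if exc is not None else None
        return ("array", substitute(default, chi), new_index, new_exc)
    raise ValueError(f"unknown dependency: {dep!r}")


def call_transfer(dom, outputs, summary, label, formals, actuals):
    """(Delta \\ outputs) combined with {e_j -> D_p(label)(eps_j) [eps -> e]}."""
    if len(set(actuals)) != len(actuals):
        raise NotImplementedError("repeated actuals need the full oplus")
    chi = dict(zip(formals, actuals))
    per_label = summary[label]
    contribution = {chi[f]: substitute(per_label[f], chi)
                    for f in formals if f in per_label}
    kept = {v: d for v, d in dom.items() if v not in outputs}
    assert not kept.keys() & contribution.keys(), "full oplus needed"
    return {**kept, **contribution}


thread_summary = {"true": {
    "p": ("struct", {"threads": ("array", "nothing", "i",
                                 ("variant", {"Some": "top", "None": "bot"}))}),
    "i": "top",
}}
# Domain of the successor node on the edge labelled true.
successor = {"tj": ("struct", {"stack": ("struct", {"start": "top"})})}

entry = call_transfer(successor, ["tj"], thread_summary, "true",
                      formals=["p", "i"], actuals=["p", "j"])
```

Note that what the successor knows about the output `tj` is simply dropped: the contribution of the call comes from the stored summary alone.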

An αSmil program is analysed by computing, once and for all, an interprocedural dependency domain for every predicate. These are stored in a mapping binding predicate identifiers to their interprocedural dependency domains. Whenever a predicate call is handled intraprocedurally, the corresponding computed interprocedural dependency summary is retrieved from the mapping, propagated to the calling site, and used as explained above. If an interprocedural dependency summary for a called predicate has not been computed yet, it is handled as if it were an implicit predicate. In practice, in programs generated in αSmil from Smil, predicates are sorted in topological order when possible. For implicit predicates, described in Chapters 3 and 4, a pessimistic assumption is made: it is considered that everything in their inputs has been read and is needed for any of their possible exit labels. Since their implementation is hidden, a conservative approximation must be made in their case.

Inductive predicates have been discussed in Section 3.1.4 (on page 46). They are specification-only predicates and represent a disjunction of cases. Each case can introduce existentially quantified variables. An inductive predicate exits with the true label if any of its declared cases holds. Therefore, for inductive predicates, one analysis per case is made. For the true exit label, the dependency results are obtained by joining the results of all cases. For the false label, everything is considered to be read.


5.4.1 Interprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an interprocedural level, we revisit our start_address example predicate, introduced in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis, and compare the actual obtained results with the targeted ones depicted in Figure 5.8.

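[Figure 5.18: control flow graph of start_address, with the nodes thread(p, j)[true : tj | None | oob], sj := tj.stack and adr := sj.start, and edges labelled true, None, oob and error. The node adr := sj.start is annotated with $sj \mapsto \{ start \mapsto \top \}$, the node sj := tj.stack with $tj \mapsto \{ stack \mapsto \{ start \mapsto \top \} \}$, and the exit true with $adr \mapsto \top$.]

Figure 5.18 – $G_{start\_address}$ – Dependency Information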

We begin by initialising the output adr with $\top$ and continue by traversing the control flow graph backwards and by computing the dependency information at each node. We apply the data-flow equation (6) given in Table 5.8 and we obtain the intermediate results shown in Figure 5.18.

To compute the dependency information of the control flow graph's entry node, i.e., the one corresponding to a predicate call to thread, we use the dependency summary computed for this predicate for the exit label true, and we substitute the formal parameters, i.e., p and i, appearing in it with the effective arguments of the call, i.e., p and j. We thus obtain the following dependency summary:

\[
\{ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top, None \mapsto \bot] \rangle \},\ j \mapsto \top \}
\]

We apply the data-flow equation (PredEq) corresponding to a predicate call, discussed on page 102, and make use of the dependency information corresponding to the successor node on the edge marked with true,

\[
\{ tj \mapsto \{ stack \mapsto \{ start \mapsto \top \} \} \}
\]

thus obtaining the following final dependency result:

\[
\{ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top, None \mapsto \bot] \rangle \},\ j \mapsto \top \}
\]

However, the targeted results for start_address depicted in Figure 5.8 would translate to:


\[
\{ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto t \mapsto \{ stack \mapsto \{ start \mapsto \top \} \}, None \mapsto \bot] \rangle \},\ j \mapsto \top \}
\]

Clearly, the dependency information computed by our analysis and shown in Figure 5.19 is an over-approximation of the results that we had envisioned. The obtained dependency summary states that the entire j-th associated thread of the input process p is needed in order to obtain the output adr on the true exit label. However, in reality only one of this thread's fields is actually needed, namely the stack field, for which only one subelement – the start field – is read. This loss of precision is a consequence of the dependency information mapped to the Some constructor at the control flow graph's entry node, corresponding to a call to the thread predicate. When executing successfully and exiting with label true, the thread predicate returns the i-th associated thread of its input process. However, the predicate thread does not need this element itself: it neither reads nor uses it per se, it merely retrieves it. The dependency on this returned element is relative to the extent to which the predicate's callers will use it. The start_address predicate, for instance, depends only on one of the 3 fields of the returned thread. Yet, by mapping the i-th thread to $\top$ in thread's dependency summary, we fail to mirror this distinction. $\top$ is the top element of our dependency domain, and joining it with any other dependency will lead to $\top$, thus shadowing any other information we might compute while observing its usage.

5.4.2 Context-Insensitivity and its Consequences

Precision losses in dependency summaries, such as the one detected in our previous example, are a direct consequence of considering and analysing predicates in isolation. There is a level of information that goes beyond a predicate's own control flow graph, and a more detailed picture can emerge once non-local information connected to the predicate's use, i.e., the calling context, is included into the analysis.

Interprocedural analyses that consider the calling context when analysing the target of a function call – or, in our case, a predicate call – are context-sensitive analyses (Hind 2001). As the name implies, context-sensitive analyses can jump back to the original call site, using context information for the results they compute. Context-insensitive analyses, on the other hand, dispense with such information and propagate back to all

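[Figure 5.19: the control flow graph of Figure 5.18, in which the entry node thread(p, j)[true : tj | None | oob] is now annotated with $p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top, None \mapsto \bot] \rangle \}$ and $j \mapsto \top$. The annotations $tj \mapsto \{ stack \mapsto \{ start \mapsto \top \} \}$, $sj \mapsto \{ start \mapsto \top \}$ and $adr \mapsto \top$ are unchanged.]

Figure 5.19 – $G_{start\_address}$ – Final Dependency Results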


possible call sites the information that they compute once. This is a notorious source of potential precision loss in static analysis. Choosing either one of these two traits has significant consequences: on the one hand, by choosing to ignore the calling context and the additional information it supplies, one pays a high price in terms of precision; on the other hand, by choosing to include such information, one risks sacrificing scalability.

Our dependency analysis, as presented so far, is context-insensitive: for each predicate, the analysis computes a dependency summary once, stores it, and further propagates it to its callers whenever needed. Considering that αSmil predicates are sequences of calls to other predicates, built-in or user-defined, as discussed in Chapter 4, if we adopted a purely context-sensitive solution we would gain in terms of precision, but we would obtain results that are prohibitive in terms of performance. This is a typical trade-off of static analyses. We address this issue and describe our solution in detail in Chapter 6. Without adopting context-sensitivity to the letter, we strike a balance between the two alternatives by including lazy components in our interprocedural dependency summaries and by using them for injecting the current intraprocedural context on an as-needed basis. As will be discussed in Chapters 6 and 8, this approach leads to improved precision with only a marginal decrease in performance.

5.5 Semantics of Dependency Values

There are two different manners of interpreting dependency values $\delta$: one focusing on the possible constructors part, and the other focusing on the dependency part. In both cases, the interpretations are relative to a type $\tau$ and hold only for well-typed dependencies of the same type. The set of types that a dependency is compatible with has been discussed in Section 5.2.2 and defined in Table 5.5.

First, focusing on the possible constructors aspect, dependencies can be interpreted as a constraint on the forms that values may take. Such constraints can arise as a consequence of $\bot$, i.e., impossible, appearing in nested dependencies. These are described by a characteristic function $\mathbb{1}$:

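\[
\mathbb{DD} = \{ (v, \delta) \in \mathbb{D} \times \mathcal{D} \mid \delta \in \mathcal{D},\ \tau \in T,\ v \in \mathbb{D}_\tau,\ \Gamma \vdash \delta : \tau \}
\qquad
\mathbb{1} : \mathbb{DD} \rightarrow \{0, 1\}
\]

This is defined as follows.

Definition 5.5.1 Characteristic function $\mathbb{1}$

\[
\mathbb{1}(v, \top) = 1 \qquad \mathbb{1}(v, \square) = 1 \qquad \mathbb{1}(v, \bot) = 0
\]
\[
\mathbb{1}(\{ f_1 = v_1, \dots, f_n = v_n \}, \{ f_1 \mapsto \delta_1, \dots, f_n \mapsto \delta_n \}) =
\begin{cases}
1 & \text{when } \mathbb{1}(v_i, \delta_i),\ \forall\, 1 \le i \le n \\
0 & \text{otherwise}
\end{cases}
\]
\[
\mathbb{1}(C_i[v], [C_1 \mapsto \delta_1, \dots, C_n \mapsto \delta_n]) =
\begin{cases}
1 & \text{when } \mathbb{1}(v, \delta_i) \\
0 & \text{otherwise}
\end{cases}
\]
\[
\mathbb{1}((P, (v_k)_{k \in P}), \langle \delta_{def} \rangle) =
\begin{cases}
1 & \text{when } \mathbb{1}(v_k, \delta_{def}),\ \forall k \in P \\
0 & \text{otherwise}
\end{cases}
\]
\[
\mathbb{1}((P, (v_k)_{k \in P}), \langle \delta_{def} \triangleright i : \delta_{exc} \rangle) =
\begin{cases}
1 & \text{when } (\mathbb{1}(v_k, \delta_{def}),\ \forall k \in P,\ k \neq E(i)) \vee (E(i) \in P,\ \mathbb{1}(v_{E(i)}, \delta_{exc})) \\
0 & \text{otherwise}
\end{cases}
\]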
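The constraint reading of a dependency can be prototyped directly from the first four clauses of Definition 5.5.1. The sketch below is a simplified illustration, not the mechanized definition: values and dependencies use an encoding of my own (dictionaries for structures, `(constructor, payload)` pairs for variants, lists for arrays), fields omitted from a structure dependency are treated as nothing, and array dependencies with an exception are deliberately left out. The function name `satisfies` is hypothetical.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"


def satisfies(value, dep):
    """Return True when value meets the constraint expressed by dep."""
    if dep in (TOP, NOTHING):
        return True                       # no constraint on the value
    if dep == BOT:
        return False                      # impossible: no value qualifies
    kind, body = dep
    if kind == "struct":                  # value: dict field -> value
        return all(satisfies(value[f], d) for f, d in body.items())
    if kind == "variant":                 # value: (constructor, payload)
        ctor, payload = value
        return satisfies(payload, body[ctor])
    if kind == "array":                   # uniform dependency on every cell
        return all(satisfies(cell, body) for cell in value)
    raise ValueError(f"unknown dependency: {dep!r}")


only_some = ("variant", {"Some": TOP, "None": BOT})
dep = ("struct", {"threads": ("array", only_some)})

ok = satisfies({"threads": [("Some", 1), ("Some", 2)]}, dep)
ko = satisfies({"threads": [("Some", 1), ("None", None)]}, dep)
```

Here `ok` holds because every cell is built with `Some`, whereas `ko` does not, since one cell uses the constructor that the dependency declares impossible.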

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2, Table 5.1) defined on dependencies. If a dependency is more precise than or equal to another dependency, then it should be interpreted as constraints which are at least as strong as the ones for the other dependency. Given a typing environment $\Gamma$ (Definition 4.3.1):

\[
\forall \tau \in T^*,\ \delta \sqsubseteq \delta' \implies (\mathbb{D}_\tau \cap \mathbb{1}(\bullet, \delta)) \subseteq (\mathbb{D}_\tau \cap \mathbb{1}(\bullet, \delta'))
\]

where

\[
T^* = \{ \tau \in T \mid \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \}
\]

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the constraints semantics of dependencies is that, if two dependencies $\delta$ and $\delta'$ can be interpreted as constraints for a value $v$, then their reduction can be interpreted as a constraint for $v$ as well:

\[
\mathbb{1}(v, \delta) \wedge \mathbb{1}(v, \delta') \implies \mathbb{1}(v, \delta \oplus \delta')
\]

The converse, which one might expect to be true as well, does not hold because of approximations made by our treatment of arrays.

Given a valuation $E$ (Definition 4.4.2), an intraprocedural dependency summary can be interpreted as a conjunction of the constraints on every variable's value, as given by its associated dependency. We use the notation $E \models \Delta$ to indicate this:

\[
E \models \Delta \implies \forall v \in V,\ \mathbb{1}(E(v), \Delta(v))
\]

Under the appropriate conditions, given a semantic transition $\xrightarrow{\lambda}$ (Definition 4.4.4) from the configuration $\langle E, [s] \rangle$ (Definition 4.4.3) to the valuation $E'$, as defined in Section 4.4, if the intraprocedural summary $\Delta'$ of the successor of the statement $s$ on label $\lambda$ represents the semantic interpretation of constraints given $E'$, then the contribution $\llbracket s \rrbracket^{\lambda}(\Delta')$ (Definition 5.3.6) of the edge labeled with $s$ and $\lambda$ must necessarily represent the semantic interpretation of constraints given $E$. We thus obtain the following:

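\[
\begin{array}{lr}
\Gamma \vdash E \implies & (5.1) \\
\Sigma, \Gamma, O \vdash s \rightarrow \lambda \implies & (5.2) \\
\langle E, [s] \rangle \xrightarrow{\lambda} E' \implies & (5.3) \\
\Gamma, E' \vdash \Delta' \implies & (5.4) \\
E' \models \Delta' \implies & (5.5) \\
E \models \llbracket s \rrbracket^{\lambda}(\Delta') & (5.6)
\end{array}
\]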

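We note that, thanks to the subject reduction property (Definition 4.4.7), (5.3) implies that $\Gamma \vdash E'$.

Following from (5.6), when joining the contributions on all labels of the statement $s$, the obtained intraprocedural dependency summary represents the semantic interpretation of the disjunction of constraints given $E$:

\[
\begin{array}{l}
(E \models \llbracket s \rrbracket^{\lambda_1}(\Delta'_1)) \vee_\Delta \dots \vee_\Delta (E \models \llbracket s \rrbracket^{\lambda_n}(\Delta'_n)) \implies \\
E \models (\llbracket s \rrbracket^{\lambda_1}(\Delta'_1) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket^{\lambda_n}(\Delta'_n)) \implies \\
E \models old(\Delta) \implies \\
E \models old(\Delta) \vee_\Delta (\llbracket s \rrbracket^{\lambda_1}(\Delta'_1) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket^{\lambda_n}(\Delta'_n))
\end{array}
\]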

For a predicate $p$ exiting with label $\lambda$ and having the intraprocedural summary $\Delta_\lambda$, the characteristic function, given $I \subseteq E$, a valuation mapping the predicate's inputs to their values, constrains the space of inputs that can make the predicate exit with the label $\lambda$. It thus denotes the necessary conditions on inputs according to the observed execution scenario, and can be used as an inversion lemma when reasoning on calls to a predicate.

The soundness of this interpretation, as well as the well-formedness of our dependencies, have been proven in Coq, and the corresponding files can be consulted online (see footnote 1). The mechanized Coq proofs are entirely due to Stéphane Lescuyer. These proofs also deal with deferred dependencies, which will be presented in Chapter 6, but these constitute an extension that does not modify the underlying lattice.

The second interpretation of dependency values focuses on the dependency part and is a partial equivalence relation $\approx$:

\[
TD = \{ (\tau, \delta) \in T \times \mathcal{D} \mid \Gamma \vdash \delta : \tau \}
\qquad
\approx\ : TD \rightarrow \mathbb{D} \times \mathbb{D}
\]

The partial equivalence relation $\approx^{\tau}_{\delta}$ relates well-typed values of the same type $\tau$. It relates values that only differ in places that are irrelevant according to the dependency $\delta$. It is defined as shown below.

1The corresponding files are provided at the following address httpajl-demofr2015proveCoq

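Definition 5.5.2 Partial Equivalence Relation $\approx^{\tau}_{\delta}$

\[
\approx^{\tau}_{\top} = \{ (x, x) \mid x \in \mathbb{D}_\tau \}
\qquad
\approx^{\tau}_{\square} = \{ (x, y) \mid x, y \in \mathbb{D}_\tau \}
\qquad
\approx^{\tau}_{\bot} = \{ (x, y) \mid x, y \in \mathbb{D}_\tau \}
\]
\[
\approx^{struct\{f_1 : \tau_1, \dots, f_n : \tau_n\}}_{\{f_1 \mapsto \delta_1, \dots, f_n \mapsto \delta_n\}} = \{ (\{ f_1 = v_1, \dots, f_n = v_n \}, \{ f_1 = w_1, \dots, f_n = w_n \}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i} \}
\]
\[
\approx^{variant[C_1 : \tau_1 \mid \dots \mid C_n : \tau_n]}_{[C_1 \mapsto \delta_1, \dots, C_n \mapsto \delta_n]} = \{ (C_i[v_i], C_i[w_i]) \mid (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i} \}
\]
\[
\approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \rangle} = \{ ((P, (v_k)_{k \in P}), (P, (w_k)_{k \in P})) \mid \forall k,\ (v_k, w_k) \in\ \approx^{\tau}_{\delta_{def}} \}
\]
\[
\approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle} = \{ ((P, (v_k)_{k \in P}), (P, (w_k)_{k \in P})) \mid E(i) \in P \implies (v_{E(i)}, w_{E(i)}) \in\ \approx^{\tau}_{\delta_{exc}},\ \forall k \neq E(i),\ (v_k, w_k) \in\ \approx^{\tau}_{\delta_{def}} \}
\]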
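Definition 5.5.2 translates almost clause by clause into a recursive comparison. The Python sketch below is only a reading aid under explicit assumptions: the encoding is mine (structures as dictionaries, variants as `(constructor, payload)` pairs, arrays as dictionaries from indices to cells so that the support $P$ is the key set), fields omitted from a structure dependency count as nothing, and `env` plays the role of the valuation $E$ used to resolve the exception index. The name `equivalent` is hypothetical.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"


def equivalent(v, w, dep, env):
    """Decide whether v and w are related by the relation indexed by dep."""
    if dep == TOP:
        return v == w                     # everything is relevant
    if dep in (NOTHING, BOT):
        return True                       # nothing is relevant: all pairs
    kind = dep[0]
    if kind == "struct":
        return all(equivalent(v[f], w[f], d, env) for f, d in dep[1].items())
    if kind == "variant":                 # same constructor, related payloads
        (cv, pv), (cw, pw) = v, w
        return cv == cw and equivalent(pv, pw, dep[1][cv], env)
    if kind == "array":                   # ("array", default, index, exc)
        _, default, index, exc = dep
        if v.keys() != w.keys():          # both arrays share the support P
            return False
        special = env[index] if index is not None else None
        return all(
            equivalent(v[k], w[k],
                       exc if index is not None and k == special else default,
                       env)
            for k in v
        )
    raise ValueError(f"unknown dependency: {dep!r}")


summary = ("struct", {"threads": ("array", NOTHING, "i",
                                  ("variant", {"Some": TOP, "None": BOT}))})
env = {"i": 1}
p1 = {"pid": 1, "threads": {0: ("Some", "a"), 1: ("Some", "t")}}
p2 = {"pid": 2, "threads": {0: ("None", None), 1: ("Some", "t")}}
p3 = {"pid": 1, "threads": {0: ("Some", "a"), 1: ("Some", "u")}}

same = equivalent(p1, p2, summary, env)
different = equivalent(p1, p3, summary, env)
```

The processes `p1` and `p2` differ only in places the summary declares irrelevant, so they are related; `p1` and `p3` differ in the cell of index `i`, which is entirely needed, so they are not.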

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2) defined on dependencies. If a dependency is more precise than or equal to another dependency, then it should be interpreted as an equivalence relation relating more values:

\[
\delta \sqsubseteq \delta' \implies\ \approx^{\tau}_{\delta}\ \supseteq\ \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau
\]

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the equivalence relation interpretation of dependencies is that the set of values related by $\delta \oplus \delta'$ is a subset of the intersection of values related by $\delta$ and $\delta'$, respectively:

\[
\approx^{\tau}_{\delta \oplus \delta'}\ \subseteq\ \approx^{\tau}_{\delta} \cap \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau
\]

The interpretation of the $\vee$ operator (Definition 5.2.3, Table 5.2) with respect to the equivalence relation interpretation of dependencies is similar:

\[
\approx^{\tau}_{\delta \vee \delta'}\ \subseteq\ \approx^{\tau}_{\delta} \cap \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau
\]

Given two valuations $E$ and $E'$, they are equivalent modulo an intraprocedural dependency summary $\Delta$ if the values that they associate to variables are equivalent modulo the corresponding dependency associated in $\Delta$:

\[
E \approx^{\Gamma}_{\Delta} E' \implies \forall v \in \Delta,\ E(v) \approx^{\Gamma(v)}_{\Delta(v)} E'(v)
\]

The equivalence relation $\approx^{\Gamma}_{\Delta}$ thus relates valuations that are not distinguishable by only looking at the parts specified by the intraprocedural dependency summary $\Delta$.

This interpretation can be used to apply congruence modulo reasoning to predicate calls. When a predicate $p$ is called with two sequences of input values, $v$ and $u$ respectively, which are related by the intraprocedural dependency summary of $p$ on label $\lambda$, the predicate will necessarily exercise the same execution scenario, exiting with label $\lambda$, and will yield identical outputs $w$.

5.6 Related Work

The frame problem and its manifestations in the software verification process – detecting program properties that remain unchanged under a certain operation – are notorious (Leavens, Leino and Müller 2007; Leavens and Clifton 2005; O'Hearn 2005). A complete specification of a program will necessarily include frame properties (Borgida, Mylopoulos and Reiter 1995). However, though necessary, specifying and verifying frame properties is tedious and repetitive. Two prominent solutions to the frame problem come from separation logic (Reynolds 2005; Distefano, O'Hearn and Yang 2006; Calcagno et al. 2011) and ownership types (Clarke and Drossopoulou 2002). However, Meyer (Meyer 2015) argues that the problem itself should not impose such annotation-heavy solutions. Simpler, automatic solutions for their specification and verification would allow programmers to concentrate on the truly challenging part (Meyer 2015).

Though we share the same desideratum with separation logic (Reynolds 2002; Reynolds 2005; O'Hearn 2012; O'Hearn, Yang and Reynolds 2004), the programming paradigm and context under which we operate lead to a considerably different solution. Separation logic is targeted at low-level imperative programming languages, and its applications focus on shared mutable data structures. We, on the other hand, focus on a purely functional language and consider immutable algebraic data structures and arrays. We treat mappings between variables and values and analyse their evolution in a side-effect free environment, in the context of verification of programs where a new output is obtained by altering just a subset of the input's subelements and preserving the rest. Instead of using a collection of Hoare triples as an abstract domain, we have defined our own dependency domain. The results of our dependency analysis are close to the concept of a footprint (Distefano, O'Hearn and Yang 2006; Hur, Dreyer and Vafeiadis 2011; Bobot and Filliâtre 2012), in the sense that they describe an over-approximation of only those variables and subelements that are needed by a program, and are expressed as an input-output relation.

The dependency results computed by our analysis are similar to the primitive read and write effects used in ownership type systems (Clarke and Drossopoulou 2002). Write effects in our case are implicit and include strictly the output variables associated to an exit label. Read effects can only refer to input variables of a predicate. Also, in ownership type systems read effects comprise the whole execution of a method, even if they are irrelevant for the method's results. We, however, ignore read effects on which the output does not depend, reflecting only those which contribute to the observed result. A technique for declaring and verifying read effects in an ownership type system is presented in (Clarke and Drossopoulou 2002). We use static analysis to automatically detect them. In the Spec# (Mike Barnett 2005) program verifier, the notion of confined is used for describing the reading effects of a pure method in terms of the ownership cone (Clarke, Potter, and Noble 1998) of its parameters.

In (Hughes 1987), Hughes argues that analyses of programs that manipulate data structures should ideally distinguish between the information they are computing for a data structure as a whole and the information computed for each component within it. The information that is computed by a backward analysis is dubbed generically as context. A manner of constructing richer domains is described, and it is argued that, for instance, a context for a sum type must contain (sub)contexts for any of its summands. Similarly, for product types a context should include a (sub)context for each component, as well as a context referring to the value as a whole. We target fine-grained dependency information for structures, variants and arrays. Similarly to the described product type contexts, our dependencies for structures describe the dependency on each of the structure's fields. Variant dependencies are expressed in terms of the dependencies of their constructors, i.e. their summands. Furthermore, it is argued that any context should include a maximal element, interpreted as a "no information" value, a minimal element, interpreted as "contradictory requirements", and an element representing "no context" or "unused". Close to the notion of "contradictory requirements", we include an atomic value denoting impossible in our dependency domain. Program points having a "contradictory requirements" context denote points in the program that will lead to crashes if reached. Our notion of impossible refers to nodes that are unreachable or constructors that cannot occur on a given execution path. Our maximal element, denoting everything, is a safe value, close to the notion of "no information". Nothing, an element different from both everything and impossible, is similar to the notion of "unused". It denotes (sub)elements that are irrelevant and constitutes quite definite information.

Hughes (Hughes 1987) introduces a notion of needed/unneeded parameters for programs manipulating lists. This enables detecting whether the value of a subterm is ignored. The method is formulated in terms of a fixed, finite set of projection functions. Multiple other approaches and analyses focus on the elimination of unnecessary data structures (Cousot and Cousot 1994), the filtering of useless arguments and unnecessary variables in the context of logic programming (Leuschel and Sørensen 1996) and, more recently, the removal of redundant arguments (Alpuente, Escobar, and Lucas 2007).

The concept of a context is further discussed by Wadler and Hughes in (Wadler and Hughes 1987). The authors describe a technique for strictness analysis for non-flat list domains that relies on contexts represented using the notion of projections from domain theory. These allow expressive list descriptions, such as contexts specifying that, while a list's elements can be ignored, its length is relevant. Their backward analysis computes necessary information using a fixed, finite abstract domain.

Leino and Müller (Leino and Müller 2008b) present a technique for verifying that methods that query the state of identical data structures return identical or equivalent results. They stress the frequency of such assumptions in program verification, as well as the counter-intuitive amount of effort required for the specification and verification of such equivalent-results methods and their callers. One of the two interpretations of our dependency values (≈τδ) is an equivalence relation binding pairs of values that are not distinguishable by considering only the parts specified by the dependency domain. Thus, it ensures not only that identical input data structures will lead to identical results, but also that different invocations of a predicate with input data structures that are congruent with respect to this interpretation will lead to identical results. Our dependencies are similar to the influence sets presented by Leino and Müller. Influence sets are represented as sets of heap locations, and they are used to specify the parts of the program state that are allowed to impact the return values. Influence sets are user-defined, and they are required to be self-protecting. This property is enforced by requiring the set of path expressions specifying the influence set to be prefix-closed, a constraint which is then checked syntactically. In contrast, our dependencies are computed by static analysis. Influence sets may depend on the heap. Reasoning about heap locations is beyond the scope of our analysis. We treat mappings between variables and values, analyse their evolution in a side-effect free environment and express dependencies as input-output relations. The technique presented by Leino and Müller has been applied for reasoning about pure methods (Leino, Müller, and Wallenburg 2008; Hatcliff et al. 2012; Nordio et al. 2010; Banerjee and Naumann 2014).

Identifying the input (sub)parts on which a predicate's outputs depend can also be seen as an instance of secure information flow (Sabelfeld and Myers 2003), where the predicate's outputs and the input (sub)parts appearing in the predicate's dependency summary have a low-security level, i.e. are public, and everything else has a high-security level, i.e. is private. The first interpretation of our dependency values mirrors the notion of non-interference as given by Volpano et al. in (Volpano, Irvine, and Smith 1996) for deterministic programs: by only observing the public parts, nothing can be concluded about the private parts. The link between permissions and ownership types has been underlined by Zhao and Boyland (Zhao and Boyland 2008).

Liu and Stoller present a backward dependence analysis for the computation of dead code (Liu and Stoller 2003). They obtain expressive descriptions of partially dead recursive data using liveness patterns. These are based on general regular tree grammars that were extended with two notions: live and dead. Users can specify liveness patterns at particular program points of interest. The analysis then uses these and computes liveness patterns at all program points, based on constraints derived from the programming language semantics and the program itself. The obtained information is meant to be used for identifying and eliminating dead code. In a separate paper (Liu 1998), Liu presents three approximation operations meant to guarantee termination in the context of fixed point computations using general grammar transformers on potentially infinite grammar domains.

Static dependence or liveness analyses are typically used for code optimization, dead code elimination (Liu and Stoller 2003) and compile-time garbage collection, but only seldom for program verification. One exception that we are aware of comes from Frama-C (Cuoq et al. 2012), where it is used in a purely automatic setting and, unlike our analysis, it does not handle unions and arrays. A plug-in based on the available value analysis (Frama-C Value Analysis User Manual) computes lists of input and output locations for each function, distinguishing between operational, functional and imperative inputs and outputs. Dependencies computed for an output o hold if and when the analysed function terminates. They are represented as sets of variables whose initial value can influence the final value of o. Input variables appearing in this set are called functional inputs. Imperative inputs are the locations that may be read during the execution of the analysed function. An over-approximation of the set of these locations is computed: locations that are read only in non-terminating branches are included in the imperative inputs set as well. Operational inputs are the memory zones that are read without having been previously written to.

5.7 Conclusion

In the context of interactive formal verification of complex systems, considerable effort is spent on proving the preservation of the system's invariants. However, most operations have a localised effect on the system, which only really impacts a few invariants at a time. Identifying those invariants that are unaffected by an operation can substantially ease the proof burden for the programmer.

In this chapter we have presented a data-flow analysis that computes a conservative approximation of the input fragments on which the operations depend. It is a flow-sensitive, path-sensitive, interprocedural dependency analysis that handles arrays, structures and variants. For the latter, it simultaneously computes a subset of possible constructors. We have defined our own abstract dependency domain, and we obtain dependency information that mirrors the layered structure of compound data types.

The main original traits of this contribution stem from its design as an analysis meant to be used as a companion tool during interactive program verification, in a unified manner, on programs as well as on specifications.

We have implemented a prototype of the dependency analysis in OCaml, and we have applied it to a functional specification of ProvenCore (Lescuyer 2015), a general-purpose microkernel that ensures isolation. Its proof is based on multiple refinements between successive models, from the most abstract one, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. Medium-sized experiments performed on the abstract layers of ProvenCore show positive results. For instance, the dependency results of approximately 630 αSmil predicates, totalling approximately 10000 lines of code, are obtained in less than 1 second. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative dependency summaries without sacrificing scalability. The implementation and the obtained results will be presented and discussed in detail in Chapter 8. The prototype can be tested on the web page² dedicated to our dependency analysis, where various examples are provided and explained. Additionally, users can devise and test their own examples.

An obvious first challenge is to address the issue of context-sensitivity. In the following chapter we present a solution based on lazy components which are included in our interprocedural dependency summaries. The current intraprocedural context is injected in them on an as-needed basis. As we will show in Chapter 6, these lead to improved precision with only a marginal decrease in performance.

² Dependency Analysis Web Page: http://ajl-demo.fr/2015

Our main goal is to combine the dependency analysis with the correlation analysis presented in Chapter 7, which is meant to detect relations between inputs and outputs. By uncovering partial equivalence relations between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, it could have applications in the testing realm: the computed dependency information could be used for designing and generating test suites that avoid redundant testing of the same execution scenario. Based on the second interpretation (≈τδ) of our dependency information, given in Section 5.5, classes of inputs that will test the same execution scenario can be determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted, and their values should be varied for more comprehensive testing. Since our dependency analysis computes results for every exit label of an αSmil predicate, it could also facilitate unit testing for exceptions. Furthermore, the computed dependency information could provide assistance in specifying read effects of predicates, similar to accessible clauses (Leavens et al. 2006) in JML.

The dependency analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen, and Lescuyer 2015).

Chapter 6

Deferred Dependencies: Injecting Context in Dependency Summaries

No symbols where none intended

Samuel Beckett

6.1 Dealing with Context-Insensitivity

Traditionally, the precision of static analyses is characterized along several axes, including the scope of the analysis, i.e. intraprocedural or interprocedural analyses, and different nuances of sensitivity relative to the analysis' use of control-flow information or of information pertaining to the calling context. This classification and terminology has its origins in data-flow analyses (Hind 2001; Midtgaard 2012). Regarding scope, intraprocedural analyses are local and operate within the boundaries of procedures. In contrast, interprocedural analyses are global and operate across procedure calls (Midtgaard 2012). These are somewhat more challenging and costly to perform, and require dealing with parameter-passing mechanisms.

Another important distinction is made regarding the calling context. Context-sensitive analyses distinguish between different calling contexts. At the other end of the spectrum, context-insensitive analyses compute information only once and subsequently use the same information at all calling sites. Clearly, a context-sensitive analysis is more precise than a context-insensitive analysis, but it is also more costly (Nielson, Nielson, and Hankin 1999). The choice between which technique to use amounts to a careful balance between precision and efficiency (Nielson, Nielson, and Hankin 1999). The dependency analysis presented in the previous chapter is an interprocedural, flow-sensitive, context-insensitive data-flow analysis. Regarding pure context-sensitivity, in a functional language such as αSmil, in which predicate calls and the manipulation of the returned outputs are omnipresent, unfolding predicates at each call site and recomputing the needed information seems to be a daunting prospect that risks becoming prohibitive in terms of execution time very quickly. On the other hand, choosing to analyse predicates in isolation and to dispense completely with information regarding the calling context leads to clear precision losses, as illustrated in Section 5.4.1 and discussed in Section 5.4.2. In order to address this aspect, we have devised a solution based on symbolic dependencies that requires an extension of our abstract dependency domain (Definition 5.2.1), but which otherwise has a minimal impact on the dependency analysis at an intraprocedural and interprocedural level.

Outline. In this chapter we present our solution based on symbolic dependencies. We start by illustrating the addressed problem and the desired results in Section 6.2. In Section 6.3 and Section 6.4 we present the extended abstract dependency domain. We show the insertion and use of symbolic components at the intra- and interprocedural level of our dependency analysis in Section 6.5 and Section 6.6, respectively. Finally, we discuss their impact on the precision of the computed dependency information.

6.2 Symbolic Dependency Components in a Nutshell

Symbolic dependency components allow us to compute interprocedural predicate summaries with lazy components, in which the caller's intraprocedural information and context can be injected on an as-needed basis. The interprocedural dependency information for each predicate is still computed only once and propagated back to every possible call site. However, even though the analysis does not systematically recompute the dependency for the called predicate, it shows a form of context-sensitivity (Hind 2001) and leads to increased precision by creating templates with symbolic elements for each predicate. These elements introduce degrees of freedom in our interprocedural dependencies and allow us to parameterize and vary them according to the caller's actual intraprocedural context. Thus, we exclude some sources of coarse over-approximations without sacrificing scalability.

Previously, in Section 5.4.1, we illustrated on two αSmil example predicates, thread and start_address, how failing to take into consideration the current context of a caller leads to over-approximations. We argued in Section 5.4.2 that a more precise dependency blueprint can emerge once we consider a predicate's use as well. The first example predicate given in Chapter 5, thread, is an accessor predicate: it receives a process p and an index i as inputs and returns the i-th associated thread of the process p when executing successfully, i.e. when exiting with the true label. The computed predicate's dependency summary for the successful execution scenario was the following:

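$$\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright i : [\, \mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
i \mapsto \top
\end{array}$$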

This dependency information is expressive: it shows that only one of the 4 fields of the input process is read by the predicate, while all others are irrelevant for its output. The read field threads corresponds to the array of threads associated to the input process p. Furthermore, the dependency summary shows that for this array only the i-th element is inspected. This element is entirely needed, while all others are irrelevant. This summary provides a rather detailed and precise blueprint of the predicate's output dependencies on its inputs. Yet it fails to make one subtle but important distinction regarding the dependency on the i-th element of the associated threads array. If we want to be more accurate while describing this predicate's dependency, we need to acknowledge that the predicate itself does not actually need or depend on the i-th associated thread of the process. Indeed, it does not read or use it per se; it merely retrieves it. Thus, the dependency on the input process' i-th associated thread is relative to the extent to which the callers of the thread predicate will use the output element in which it is retrieved. It is important to distinguish between these two rather subtle nuances. Failing to do so can shadow information that is computed while analysing callers of the thread predicate. This was exactly what happened for our second example predicate, start_address. The predicate start_address receives a process p and an index j as inputs. It makes a call to the predicate thread, thus reading the j-th associated element of the process p. If this is an active element, it further accesses the field stack, from which it only reads the start address start. The obtained dependency result

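$$\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright j : [\, \mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
j \mapsto \top
\end{array}$$

was an over-approximation of the desired dependency result:

$$\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright j : [\, \mathit{Some} \mapsto \{ t \mapsto \{ \mathit{stack} \mapsto \{ \mathit{start} \mapsto \top \} \} \};\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
j \mapsto \top
\end{array}$$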

Intraprocedurally, the dependency analysis was correctly detecting that only the field stack of the thread was needed, for which only the start field was read. However, when joining the dependency information computed locally for start_address with the one given by the dependency summary of the predicate thread, we obtain less precise dependency results. This scenario is not a corner case: it would typically be exhibited in the case of accessor predicates and their callers.

In order to address this source of precision loss, we can introduce symbolic or lazy components in our abstract dependency domain. As a first attempt and approximation, we could consider the set of output variables of a predicate as the lazy components. These can be seen as the points at which a caller predicate may insert its intraprocedural information in the dependency summary computed for the callee predicate.

The dependency summary for a successful execution of the thread predicate, i.e. the true exit label, would therefore not map the i-th element of the threads array to everything, i.e. $\top$, the top element of our abstract dependency domain. Instead, this would be mapped to the symbolic set of output variables in which this input subelement is retrieved, i.e. the set containing the ti output variable. We denote this set by Deferred(ti), as it represents the set of points in which a caller predicate can inject its context. Establishing the dependency on the i-th associated thread of the input process p is thus deferred or postponed, and left to the caller predicates: it is relative to their context and the extent to which they use the output ti.

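$$\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright i : [\, \mathit{Some} \mapsto \{ t \mapsto \mathit{Deferred}(ti) \};\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
i \mapsto \top
\end{array}$$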

Using this dependency summary when computing the information for the predicate start_address, we would obtain the targeted dependency result:

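$$\begin{array}{l}
p \mapsto \{\, \mathit{threads} \mapsto \langle\, \square \triangleright j : [\, \mathit{Some} \mapsto \{ t \mapsto \{ \mathit{stack} \mapsto \{ \mathit{start} \mapsto \mathit{Deferred}(adr) \} \} \};\ \mathit{None} \mapsto \bot \,] \,\rangle \,\} \\
j \mapsto \top
\end{array}$$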

This dependency summary for start_address shows that the dependency on the j-th associated thread of the input process p depends on the extent to which the output adr, representing the start address of the thread's stack, is subsequently used. Indeed, start_address itself is an accessor predicate.
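The injection step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the analysis implemented in the prototype: dependencies are encoded as nested dictionaries, the names `inject`, `thread_cell` and `use_of_ti` are hypothetical, fields that are not listed are taken to be irrelevant, and an output for which the caller supplies no context falls back to everything, which is the safe choice.

```python
TOP, NOTHING, IMPOSSIBLE = "everything", "nothing", "impossible"

def inject(dep, context):
    """Replace each Deferred(x) leaf by the caller's dependency on output x.

    `context` maps callee output names to the dependency computed in the
    caller; unknown outputs default to TOP (a safe over-approximation).
    """
    if isinstance(dep, tuple) and dep[0] == "deferred":
        return context.get(dep[1], TOP)
    if isinstance(dep, dict):  # structure or variant: recurse on every entry
        return {key: inject(sub, context) for key, sub in dep.items()}
    return dep  # atomic value: TOP, NOTHING or IMPOSSIBLE

# Dependency of `thread` on the inspected array cell, with a deferred leaf.
thread_cell = {"Some": {"t": ("deferred", "ti")}, "None": IMPOSSIBLE}

# What start_address does with the retrieved thread: it only reads
# stack.start, and hands it over to its own output adr.
use_of_ti = {"stack": {"start": ("deferred", "adr")}}

start_address_cell = inject(thread_cell, {"ti": use_of_ti})
```

With these inputs, `start_address_cell` keeps the `stack` and `start` layers and ends in a leaf deferred to `adr`, while injecting an empty context gives back the context-insensitive result for the cell.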

This first approximation of lazy components as sets of output variables of a predicate is effective for accessor predicates. However, its limitations become visible when considering functional, non-destructive mutator predicates, for example. Such predicates receive a compound input, destructure it and construct a new output variable. This is created by modifying only one of the compound input's subelements and by copying all the rest without further changes. For example, the predicate set_thread shown below is the dual of our thread example predicate. It receives a process p, a thread ti and an index i as inputs, and returns a new process r as an output, obtained by setting the i-th associated thread in the threads array to ti and by copying everything else from p.

    predicate set_thread (process p, int i, thread ti)
    -> [true: process r]
      array<option<thread>> threads
      option<thread> tio

      r = p                                [true -> 1]
      threads = r.threads                  [true -> 2]
      tio = Some(ti)                       [true -> 3]
      threads = [threads with i = tio]     [true -> 4, false -> 6]
      r = {r with threads = threads}       [true -> 5]
      [true]
      [error]
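For readers less familiar with αSmil, the same non-destructive update can be sketched in Python. This is a hedged analogue, not a translation produced by any tool: the field types of `Process`, the content of `Thread` and the use of an exception for the error exit label are assumptions made for the illustration, and `None` stands for the None constructor of the option type.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Thread:
    stack_start: int  # hypothetical payload, only here to have a value

@dataclass(frozen=True)
class Process:
    threads: Tuple[Optional[Thread], ...]  # None plays the role of `None`
    pid: int
    crt_thread: int
    adr_space: int

def set_thread(p: Process, i: int, ti: Thread) -> Process:
    """Return a copy of p whose i-th thread is ti; p itself is unchanged."""
    if not 0 <= i < len(p.threads):
        # Corresponds to leaving the predicate through the error label.
        raise IndexError("index outside the threads array")
    threads = p.threads[:i] + (ti,) + p.threads[i + 1:]
    return replace(p, threads=threads)
```

Every field other than `threads`, and every cell other than the i-th, is copied unchanged from `p` to the result, which is the behaviour the dependency summary is expected to capture.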

The dependency summary computed for this predicate on the exit label true is shown below. It indicates that the given inputs, the index i and the thread ti used for updating the i-th associated thread of the output process r, are completely needed. For the input process p, the fields pid, crt_thread and adr_space are completely needed as well. They are copied without further changes to the output r. From the array of associated threads, all elements except the i-th are needed as well. The latter is completely irrelevant, since it is replaced in the output r by the given ti. The former are simply read and copied to r.

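$$\begin{array}{l}
p \mapsto \left\{ \begin{array}{l}
\mathit{threads} \mapsto \langle\, \top \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \top \\
\mathit{crt\_thread} \mapsto \top \\
\mathit{adr\_space} \mapsto \top
\end{array} \right\} \\
i \mapsto \top \\
ti \mapsto \top
\end{array}$$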

At first glance, this dependency summary seems to reflect rather accurately the predicate's inputs and input subelements on which the output process r depends. However, similarly to the accessor predicate thread, a further distinction is possible. The predicate set_thread does not itself depend on the input ti, nor on the fields of the process p. It does not use these for new computations; it simply copies them to the corresponding output subelements. Just as before, the extent to which the output's subelements are used subsequently characterizes more precisely the dependency on the inputs of set_thread. For instance, the dependency on p's current thread field should be the symbolic element corresponding to the crt_thread field of the output process. However, our first attempt at representing symbolic elements as sets of output variables seen as a whole does not allow us to convey such information. For expressing it, we first need to be able to refer to the substructure r.crt_thread and use this as a lazy component in which callers may inject their own context. Similarly, for the threads array we need to be able to refer to all other elements except the i-th one. Thus, at the symbolic dependencies level as well, we need the capability of distinguishing between the different subelements of the outputs. This would allow us to obtain the following dependency summary:

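$$\begin{array}{l}
p \mapsto \left\{ \begin{array}{l}
\mathit{threads} \mapsto \langle\, \mathit{Deferred}(r.\mathit{threads}\langle \ast \setminus i \rangle) \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \mathit{Deferred}(r.\mathit{pid}) \\
\mathit{crt\_thread} \mapsto \mathit{Deferred}(r.\mathit{crt\_thread}) \\
\mathit{adr\_space} \mapsto \mathit{Deferred}(r.\mathit{adr\_space})
\end{array} \right\} \\
i \mapsto \top \\
ti \mapsto \mathit{Deferred}(r.\mathit{threads}\langle i \rangle @\mathit{Some}.t)
\end{array}$$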

One way to capture the actual effect that is due to set_thread consists in replacing all deferred dependencies with $\square$, i.e. nothing, and simplifying the summary. The dependency summary thus obtained shows the dependency on set_thread's inputs in the extreme case of calling the predicate and throwing away its result. In this case, the summary for set_thread would show that the predicate only depends on the input i and on the length or support of the threads array, captured by $\langle \square \rangle$. On the contrary, by replacing the deferred dependencies with $\top$, i.e. everything, we obtain exactly the results computed by the context-insensitive dependency analysis presented in Chapter 5. The information thus obtained shows the dependency on set_thread's inputs when considering the other end of the spectrum, namely calling the predicate and using its result entirely.
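The two extreme instantiations discussed above can be reproduced with a small Python sketch. The encoding is an assumption made for the illustration only: structures are dictionaries, an array dependency with one exceptional cell is a tuple `("array", default, index, exceptional)`, deferred leaves carry a descriptive string instead of a real symbolic path, and the simplification step mentioned in the text is not modelled.

```python
TOP, NOTHING = "everything", "nothing"

def instantiate(dep, value):
    """Replace every deferred leaf of `dep` by the atomic dependency `value`."""
    if isinstance(dep, tuple) and dep[0] == "deferred":
        return value
    if isinstance(dep, dict):  # structure: field name -> dependency
        return {name: instantiate(sub, value) for name, sub in dep.items()}
    if isinstance(dep, tuple) and dep[0] == "array":
        _, default, index, exceptional = dep
        return ("array", instantiate(default, value), index,
                instantiate(exceptional, value))
    return dep  # atomic dependency, left untouched

# Hypothetical encoding of the deferred summary of set_thread.
set_thread_summary = {
    "p": {
        "threads": ("array", ("deferred", "r.threads, all cells but i"),
                    "i", NOTHING),
        "pid": ("deferred", "r.pid"),
        "crt_thread": ("deferred", "r.crt_thread"),
        "adr_space": ("deferred", "r.adr_space"),
    },
    "i": TOP,
    "ti": ("deferred", "r.threads, cell i, Some, t"),
}

result_fully_used = instantiate(set_thread_summary, TOP)
result_discarded = instantiate(set_thread_summary, NOTHING)
```

`result_fully_used` coincides with the context-insensitive summary of Chapter 5, whereas in `result_discarded` only the index `i` remains fully needed.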

The dependency summary with deferred occurrences is indeed precise. Not only does it create a dependency template in which callers can inject their own context, but it also distills the specification of the predicate set_thread. A quick glance at it indicates that set_thread is indeed a non-destructive mutator, updating the i-th associated thread of a process to ti and preserving everything else.

In order to obtain such dependency summaries, we need to refine our first approximation of symbolic elements as sets of a predicate's output variables. Just as in our initial abstract dependency domain, we must reflect the layered structure of algebraic data types and arrays at the level of symbolic dependencies as well. To this end, we need to consider not only sets of output variables, but also symbolic paths to substructures within them.

6.3 Symbolic Paths

6.3.1 Symbolic Path Type

In order to extend our abstract dependency domain with symbolic dependencies and to obtain expressive dependency summaries such as the ones discussed in the previous section, we begin by introducing symbolic paths. These are meant to mirror the layered structure of algebraic data types and arrays at the level of symbolic dependencies.

Each deferred occurrence in a dependency summary is identified by symbolic paths. Symbolic paths are rooted at one of the program's variables and represent sequences of symbolic internal accesses inside some value's structure, i.e. they are symbolic traversals from one value to some of its subparts. Paths are chains of symbolic accesses leading to nested elements in which different calling contexts can be subsequently injected. We define a recursive type $\pi$ of symbolic paths encompassing this.

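Definition 6.3.1. Symbolic path type $\pi \in \Pi$

$$\begin{array}{rcll}
\pi \in \Pi \qquad \pi & ::= & \varepsilon & \text{endpoint (root)} \\
 & \mid & .f\,\pi & f \in \mathcal{F} \\
 & \mid & @C\,\pi & C \in \mathcal{C} \\
 & \mid & \langle i \rangle\,\pi & i \text{ index} \\
 & \mid & \langle \ast \setminus i \rangle\,\pi & i \text{ index} \\
 & \mid & \langle \ast \rangle\,\pi &
\end{array}$$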

An endpoint, denoted by $\varepsilon$, is the special path denoting an entire element. For structures, we denote the symbolic path to some field f by $.f\,\pi$. Similarly, for variants we denote the path to some chosen constructor C by $@C\,\pi$. For arrays, we distinguish between three cases:

• symbolic paths referring to a specific array cell, identified by the cell's index i and denoted by $\langle i \rangle\,\pi$;

• symbolic paths referring to all but one specific array cell, identified by its index i and denoted by $\langle \ast \setminus i \rangle\,\pi$;

• symbolic paths referring to all the cells of an array, denoted by $\langle \ast \rangle\,\pi$.

With one exception, these symbolic paths directly reflect the cases of our abstract dependency domain. For instance, the correspondence between symbolic paths for structures or variants is immediately apparent. In contrast, for arrays the abstract dependency domain included two cases, namely $\langle \delta \rangle$, corresponding to a dependency applying to all of the cells, and $\langle \delta_{def} \triangleright i : \delta_{exc} \rangle$, corresponding to arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known. In order to reflect the second case in the deferred occurrences, we need to be able to refer to the exceptional cell on one hand, and to all other cells of the array on the other hand. To this end, we need to introduce two symbolic path types: the symbolic path $\langle i \rangle\,\pi$ for expressing deferred occurrences of exceptional cells, and the symbolic path $\langle \ast \setminus i \rangle\,\pi$ for expressing deferred occurrences of all the other array cells except the one identified by i.

The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi \cdot \pi'$. We call $\cdot$ the extension operator, and when applying it we say that we extend $\pi$ with $\pi'$.
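The symbolic path type and its extension operator can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: the class names (`Field`, `Ctor`, `Cell`, `AllBut`, `AllCells`) and the function `extend` are hypothetical, index variables are represented by their names, and a path is flattened into a tuple of steps, with the empty tuple standing for the endpoint.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Field:      # access to the field `name` of a structure
    name: str

@dataclass(frozen=True)
class Ctor:       # access under the constructor `name` of a variant
    name: str

@dataclass(frozen=True)
class Cell:       # the array cell designated by the index variable `index`
    index: str

@dataclass(frozen=True)
class AllBut:     # every array cell except the one designated by `index`
    index: str

@dataclass(frozen=True)
class AllCells:   # every cell of the array
    pass

Step = Union[Field, Ctor, Cell, AllBut, AllCells]
Path = Tuple[Step, ...]

EPSILON: Path = ()  # the endpoint: designates the entire element

def extend(path: Path, suffix: Path) -> Path:
    """Extension operator: append the non-empty path `suffix` to `path`."""
    if not suffix:
        raise ValueError("the appended path must be non-empty")
    return path + suffix

# Path from a process to the thread stored in its i-th cell.
to_thread = extend((Field("threads"),), (Cell("i"), Ctor("Some"), Field("t")))
```

Representing a path as a flat tuple rather than a nested term keeps extension a plain concatenation; the recursive reading of the definition is recovered by splitting a tuple into its first step and the remainder.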

We further consider sets $P \subset \Pi$ of symbolic paths $\pi$ and define the partial order $\sqsubseteq$ between them.

Definition 6.3.2. Partial order $\sqsubseteq$ for path sets

$$\forall P \subset \Pi,\ P' \subset \Pi, \quad P \sqsubseteq P' \iff P \subseteq P'$$

Path sets form a semi-lattice based on the subset order. The bottom element of this semi-lattice is $\emptyset$, the empty set of paths:

$$\forall P \subset \Pi, \quad \emptyset \sqsubseteq P$$

There is no top element. Theoretically, this would correspond to the set representing all possible paths. In practice, this cannot be constructed, and we chose not to add a special case for it to our symbolic path type $\pi$.

The join operation of deferred path sets is based on set union and is denoted by $\vee$.

Definition 6.3.3. Join operation $\vee$ for path sets

$$\forall P \subset \Pi,\ P' \subset \Pi, \quad P \vee P' = P \cup P'$$

It is symmetric, and the value obtained by joining two path sets is the least upper bound.

Applying the extension operator on a set of symbolic paths P amounts to a pointwise extension of each member of the path set.

Definition 6.3.4. Extension operator for path sets

$$\forall P \subset \Pi, \quad P \cdot \pi' = \{\, \pi \cdot \pi' \mid \pi \in P \,\}$$
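The three operations on path sets translate directly into operations on Python frozensets. The sketch below is an illustration only: paths are encoded as tuples of steps such as `("field", "threads")` or `("all_but", "i")`, which is an assumption of this example, and the function names are hypothetical.

```python
def leq(p, q):
    """Partial order on path sets: plain set inclusion."""
    return p <= q

def join(p, q):
    """Join of two path sets: their union, hence their least upper bound."""
    return p | q

def extend_set(p, suffix):
    """Pointwise extension of every path of the set by `suffix`."""
    return frozenset(path + suffix for path in p)

BOTTOM = frozenset()  # the empty set of paths

pid = frozenset({(("field", "pid"),)})
others = frozenset({(("field", "threads"), ("all_but", "i"))})
both = join(pid, others)
extended = extend_set(others, (("ctor", "Some"),))
```

Because the order is set inclusion, `BOTTOM` is below every path set and a join never loses paths; there is deliberately no value playing the role of a top element.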

6.3.2 Semantics of Symbolic Paths

Semantically, paths of type $\pi$ defined previously are a symbolic representation of several actual paths. In the following we make this notion explicit, and we begin by defining simple actual paths in a value of the universe D (Definition 4.4.1).

Actual paths represent a unique sequence of internal accesses inside some value's structure, leading to a single nested element. Unlike symbolic paths, which can for instance cover multiple elements of an array, an actual path designates a single subvalue of a structure, variant or array. The recursive actual path type $\hat{\pi} \in \hat{\Pi}$ is defined below.

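Definition 6.3.5. Actual path type $\hat{\pi} \in \hat{\Pi}$

$$\begin{array}{rcll}
\hat{\pi} & ::= & \varepsilon & \text{empty} \\
 & \mid & .f\,\hat{\pi} & f \in \mathcal{F} \\
 & \mid & @C\,\hat{\pi} & C \in \mathcal{C} \\
 & \mid & \langle i \rangle\,\hat{\pi} & i \text{ index}
\end{array}$$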

A symbolic path $\pi$ covers an actual path $\hat{\pi}$ if, when given a valuation E (Definition 4.4.2) of the index variables for arrays, it matches $\hat{\pi}$. A set of symbolic paths covers an actual path $\hat{\pi}$ if at least one of the symbolic paths matches $\hat{\pi}$. We denote this by the relation $\succeq_E$, which is parameterized by a valuation E. The definition of $\succeq_E$ is given in Table 6.1.

Table 61 ndash E ndash Path Semantics

ε E εE ε

π E π

fπ E f πEStruct

π E π

Cπ E CπEVar

π E π E(i) = j

〈i〉π E 〈j〉πECell

π E π

〈lowast〉π E 〈j〉πEAnyCell

π E π E(i) 6= j

〈lowast i〉π E 〈j〉πEOutCell

Given a valuation E a set P of symbolic paths covers an actual path π if at leastone of the symbolic paths in the set covers or matches π

forallP sub Π P E π lArrrArr existπ isin P π E π

63 Symbolic Paths 123

The interpretation $[\![P]\!]_E$ of a set of paths $P$ is then the set of single actual paths $\hat\pi$ that are covered by $P$, given a valuation $E$.

Definition 6.3.6. Interpretation $[\![P]\!]_E$ of a set of paths $P$

$$\forall P \subseteq \Pi,\quad [\![P]\!]_E = \{\, \hat\pi \mid P \models_E \hat\pi \,\}$$

The partial order $\sqsubseteq$ (Definition 6.3.2) on sets of paths is compatible with the interpretation $[\![P]\!]_E$, in the sense that when $P \sqsubseteq Q$ holds, the interpretation $[\![P]\!]_E$ of $P$ is included in $[\![Q]\!]_E$ for every valuation:

$$\forall P, Q \subseteq \Pi,\ \forall E,\quad P \sqsubseteq Q \iff [\![P]\!]_E \subseteq [\![Q]\!]_E$$
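The covering relation of Table 6.1 can be read as a recursive matching procedure. The following Python sketch is one possible reading under assumed encodings: symbolic steps are `("field", f)`, `("ctor", C)`, `("cell", i)`, `("any",)` and `("out", i)`; actual steps are `("field", f)`, `("ctor", C)` and `("index", j)`; the valuation is a dictionary assumed to be defined on every index variable that occurs. The function names are hypothetical.

```python
def covers(sym, act, env):
    """Does the symbolic path `sym` cover the actual path `act` under `env`?"""
    if not sym and not act:
        return True                      # epsilon covers epsilon
    if not sym or not act:
        return False                     # paths of different lengths never match
    s, a = sym[0], act[0]
    kind = s[0]
    if kind in ("field", "ctor"):
        ok = (s == a)                    # same field or same constructor
    elif kind == "cell":
        ok = a[0] == "index" and env[s[1]] == a[1]
    elif kind == "any":
        ok = a[0] == "index"             # any cell of the array
    elif kind == "out":
        ok = a[0] == "index" and env[s[1]] != a[1]
    else:
        raise ValueError(f"unknown symbolic step: {s!r}")
    return ok and covers(sym[1:], act[1:], env)


def covers_set(paths, act, env):
    """A path set covers `act` if at least one of its members does."""
    return any(covers(p, act, env) for p in paths)
```

Since the interpretation of a path set is in general infinite, the sketch only decides membership of a given actual path instead of enumerating the set.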

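Each single actual path $\hat\pi$ can be interpreted as a way to find a subpart of a value, which we make explicit with the following function $\mathit{at}$. It is not defined for all cases, since not all paths can be applied to all values.

Definition 6.3.7. Function $\mathit{at}$

$$\mathit{at} : \hat\Pi \times D \rightarrow D$$

$$\mathit{at}(\hat\pi, v) =
\begin{cases}
v & \text{when } \hat\pi = \varepsilon \\
\mathit{at}(\hat\pi', v_i) & \text{when } \hat\pi = f_i\,\hat\pi' \text{ and } v = \{f_1 = v_1; \ldots; f_i = v_i; \ldots; f_n = v_n\} \\
\mathit{at}(\hat\pi', v_C) & \text{when } \hat\pi = C_i\,\hat\pi' \text{ and } v = C_i[v_C] \\
\mathit{at}(\hat\pi', v_i) & \text{when } \hat\pi = \langle i \rangle\,\hat\pi' \text{ and } v = (P, (v_k)_{k \in P}),\ i \in P
\end{cases}$$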
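As a rough illustration of the partial function at, the sketch below follows an actual path inside a value. The value encoding is an assumption chosen for the example: structures are dictionaries, variant values are `(constructor, payload)` pairs, and arrays are Python lists indexed from 0. Undefined cases raise `LookupError`.

```python
def at(path, value):
    """Follow an actual path inside a value; raise LookupError where `at` is undefined."""
    if not path:
        return value                                     # epsilon: the value itself
    (kind, key), rest = path[0], path[1:]
    if kind == "field" and isinstance(value, dict) and key in value:
        return at(rest, value[key])                      # structure field
    if (kind == "ctor" and isinstance(value, tuple)
            and len(value) == 2 and value[0] == key):
        return at(rest, value[1])                        # payload of the matching constructor
    if (kind == "index" and isinstance(value, list)
            and isinstance(key, int) and 0 <= key < len(value)):
        return at(rest, value[key])                      # array cell
    raise LookupError(f"path step {path[0]!r} does not apply to {value!r}")
```

With a process value holding an array of optional threads, the path going through `threads`, cell 1, constructor `Some`, then `stack` and `start` returns the start address stored in that thread, while applying the `Some` step to a `None` cell is undefined.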

6.3.3 Well-Typed Paths and Path Sets

Symbolic paths cannot be used in every context: their interpretation must be made in the context of a type $\tau$. An endpoint, i.e. the $\varepsilon$ symbolic path, can apply to any type. In contrast, other symbolic paths that exhibit specific data features can only apply to the corresponding types. For instance, a path such as $f\,\pi$ is meaningless on values which are not records, or on record values that do not have the field $f$ specified in the symbolic path.

A path set can be seen as a set of sequences of internal accesses inside some value's structure. In that sense, it is a set of possible traversals from one value to some of its subparts. To characterize the contexts in which a path set is well-typed, we need to consider the types of values to which it can be applied and the types of values to which it can lead. Therefore, in the following, we begin by defining a typing judgement for symbolic paths as a three-place relation $\pi : \tau \rightarrow \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and that, in that case, it will always describe subvalues of type $\tau'$. Additionally, the typing judgement is parameterized by a set of input variables $I$, which are the variables allowed to appear as identifiers for array accesses. This is detailed in Table 6.2.

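$$\frac{}{I \vdash \varepsilon : \tau \rightarrow \tau}\ \text{WT}\varepsilon$$

$$\frac{\tau = \mathit{struct}\{f_1 : \tau_1; \ldots; f_i : \tau_i; \ldots; f_n : \tau_n\} \quad I \vdash \pi_i : \tau_i \rightarrow \tau'}{I \vdash f_i\,\pi_i : \tau \rightarrow \tau'}\ \text{WTStructPath}$$

$$\frac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_i : \tau_i \mid \ldots \mid C_n : \tau_n] \quad I \vdash \pi_C : \tau_i \rightarrow \tau'}{I \vdash C_i\,\pi_C : \tau \rightarrow \tau'}\ \text{WTVarPath}$$

$$\frac{I \vdash \pi : \tau \rightarrow \tau'}{I \vdash \langle * \rangle\pi : \mathit{arr}_{\tau_i}\langle\tau\rangle \rightarrow \tau'}\ \text{WTArrayPath}$$

$$\frac{I \vdash \pi : \tau \rightarrow \tau' \quad I(i) = \tau_i}{I \vdash \langle i \rangle\pi : \mathit{arr}_{\tau_i}\langle\tau\rangle \rightarrow \tau'}\ \text{WTCellPath}$$

$$\frac{I \vdash \pi : \tau \rightarrow \tau' \quad I(i) = \tau_i}{I \vdash \langle * \setminus i \rangle\pi : \mathit{arr}_{\tau_i}\langle\tau\rangle \rightarrow \tau'}\ \text{WTOutPath}$$

Table 6.2 – Well-Typed Dependency Paths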

A set $P$ of symbolic paths is well-typed if every path it contains is well-typed for the same types:

$$\forall P \subset \Pi,\quad I \vdash P : \tau \rightarrow \tau' \iff \forall \pi \in P,\ I \vdash \pi : \tau \rightarrow \tau'$$

The well-typedness of sets of symbolic paths is preserved by the join operation $\vee$ (Definition 6.3.3):

$$\forall P', P'' \subset \Pi,\ \forall \tau', \tau'' \in T,\quad I \vdash P' : \tau' \rightarrow \tau'' \;\Rightarrow\; I \vdash P'' : \tau' \rightarrow \tau'' \;\Rightarrow\; I \vdash P' \vee P'' : \tau' \rightarrow \tau''$$

When a well-typed set of symbolic paths is extended with a well-typed path using the extension operator $\odot$ (Definition 6.3.4), the resulting set of symbolic paths is well-typed as well:

$$\forall P' \subset \Pi,\ \forall \tau, \tau', \tau'' \in T,\quad I \vdash P' : \tau' \rightarrow \tau'' \;\wedge\; I \vdash \pi' : \tau'' \rightarrow \tau \;\Rightarrow\; I \vdash P' \odot \pi' : \tau' \rightarrow \tau$$
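The typing rules for symbolic paths can be read as a small type-directed traversal. The sketch below is an illustration under assumed encodings rather than the analysis implementation: base types are strings, and compound types are the tuples `("struct", fields)`, `("variant", constructors)` and `("arr", index_type, element_type)`. The argument `inputs` plays the role of $I$ and maps input variables to their types. All names are hypothetical.

```python
def path_type(inputs, path, tau):
    """Return the type reached by `path` from `tau`, or None if the path is ill-typed."""
    for step in path:
        kind = step[0]
        compound = isinstance(tau, tuple)
        if kind == "field" and compound and tau[0] == "struct" and step[1] in tau[1]:
            tau = tau[1][step[1]]                 # type of the selected field
        elif kind == "ctor" and compound and tau[0] == "variant" and step[1] in tau[1]:
            tau = tau[1][step[1]]                 # type carried by the constructor
        elif kind in ("any", "cell", "out") and compound and tau[0] == "arr":
            # Named indices must be input variables of the array's index type.
            if kind != "any" and inputs.get(step[1]) != tau[1]:
                return None
            tau = tau[2]                          # element type of the array
        else:
            return None
    return tau


def set_well_typed(inputs, paths, tau, tau_out):
    """A path set is well-typed if every member has the same source and target types."""
    return all(path_type(inputs, p, tau) == tau_out for p in paths)
```

Under this reading, preservation by join is immediate: if every path of two sets has the same source and target types, so does every path of their union.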

6.4 Abstract Dependency Domain with Deferred Accesses

Frequently, as explained in Section 6.2, the dependency on a predicate's input variable is relative to the extent to which some of the predicate's outputs are subsequently needed. More precisely, these outputs are those into which the input variable is copied and from which it is retrieved. We strive to avoid over-approximations in such cases and to create degrees of freedom for the callers by treating such output variables as points in which callers can inject their own context externally. In other words, we want to defer the computation of the dependency on certain input variables of a predicate to the predicate's callers, since they have additional information about the actual use of the predicate's outputs.

In the previous section (Section 6.3), we introduced and defined an intermediate level consisting of symbolic paths and path sets. These reflect the layered structure of algebraic data types and arrays, and allow us to consider not only output variables as a whole, but also symbolic paths within them. Thus, we can compute more flexible and expressive dependency summaries with finer-grained elements. We can finally link these two ideas and extend our abstract dependency domain with deferred dependencies, by including an additional dependency case in our domain $\delta \in D$, initially defined (Definition 5.2.1) in Section 5.2.

Definition 6.4.1. Extended Abstract Dependency Domain $\delta \in D$

$$\begin{array}{rclll}
\delta & ::= & \top & \text{Everything – atomic case} & \text{(i)} \\
 & \mid & \square & \text{Nothing – atomic case} & \text{(ii)} \\
 & \mid & \bot & \text{Impossible – atomic case} & \text{(iii)} \\
 & \mid & \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} & f_1, \ldots, f_n \text{ fields} & \text{(iv)} \\
 & \mid & [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] & C_1, \ldots, C_m \text{ constructors} & \text{(v)} \\
 & \mid & \langle \delta \rangle & & \text{(vi)} \\
 & \mid & \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & i \text{ array index} & \text{(vii)} \\
 & \mid & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k) & \text{deferred accesses} & \text{(viii)}
\end{array}$$

A deferred dependency, shown in (viii), consists of a mapping which binds output variables, which we also call root variables in this case, to sets of symbolic paths.

Definition 6.4.2. Access Map

$$A : V \rightharpoonup \Pi$$

Only output variables can be treated as lazy dependency components. The sets of symbolic paths mapped to them allow us to distinguish between their subelements. In the following discussion, we denote an access map $o_1 \mapsto P_1, \ldots, o_k \mapsto P_k$ by $a$.


For the partial order $\sqsubseteq$ (Definition 5.2.2), defined in Chapter 5 and detailed in Table 5.1, an additional rule (Def) for comparing instances of deferred dependencies is added. This is shown in Table 6.3. The top and bottom elements of our dependency domain are, as before, $\top$ and $\bot$ respectively. Thus, any instance of a deferred dependency is more precise than $\top$ and less precise than $\bot$. Just as $\top$, $\bot$ and the special dependency case $\square$, a deferred dependency can be used in association with any type, albeit with some constraints on its elements.

$$\frac{\forall\, o \mapsto P \in a,\quad a(o) \sqsubseteq a'(o)}{\mathit{Deferred}(a) \sqsubseteq \mathit{Deferred}(a')}\ \text{Def}$$

Table 6.3 – Extended Leq – Comparison of Two Domains

However, unlike the atomic cases $\top$, $\bot$ and $\square$, deferred dependencies are not related to $\square$ or to dependencies corresponding to structures, variants or arrays. Since they act as placeholders for dependencies that are effectively computed subsequently, instances of deferred dependencies can be compared only to $\top$ and $\bot$, or to other instances of deferred dependencies. For instance, comparing a deferred dependency to $\square$ would yield:

$$\mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k) \not\sqsubseteq \square \quad\wedge\quad \square \not\sqsubseteq \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)$$

The extended join operation $\vee$ (Definition 5.2.3), initially defined in Section 5.2.1 and detailed in Table 5.2, is shown below in Table 6.4. It still has $\bot$ as its identity element and $\top$ as its absorbing element. Joining two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The join between an instance of a deferred dependency and a dependency corresponding to a structure, a variant, an array or to the special case $\square$ amounts to $\top$, the top element of our domain. Since we cannot make any supposition regarding deferred dependencies, we are forced to make a pessimistic assumption and to approximate to the least precise value. Join is a commutative operation, for which the undisplayed cases in Table 6.4 are defined with respect to their symmetrical counterparts.

$$\mathit{Deferred}(a) \vee \mathit{Deferred}(a') = \mathit{Deferred}(a'') \quad\text{where}\quad
a''(o) =
\begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases}$$

$$\begin{array}{rcl}
\mathit{Deferred}(a) \vee \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} & = & \top \\
\mathit{Deferred}(a) \vee [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] & = & \top \\
\mathit{Deferred}(a) \vee \langle \delta \rangle & = & \top \\
\mathit{Deferred}(a) \vee \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & = & \top \\
\mathit{Deferred}(a) \vee \square & = & \top
\end{array}$$

Table 6.4 – $\vee$ – Extended Join

Similarly to join, the reduction operation $\oplus$ (Definition 5.2.4) was initially defined in Section 5.2.1 and detailed in Table 5.3. The extended form is shown in Table 6.5. It still has $\square$ as an identity element and $\bot$ as an absorbing element. When applying the reduction operation between a deferred dependency and a dependency $\delta'$ corresponding to a structure, a variant or an array, we over-approximate the deferred dependency to $\top$ and apply the reduction operation between $\delta'$ and $\top$. Applying the reduction operation between a deferred dependency and $\top$ behaves similarly: the outcome in this case is straightforward and amounts to $\top$. As was the case for join, applying the reduction operation between two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The reduction operation is commutative, and the undisplayed cases in Table 6.5 are defined with respect to their symmetrical counterparts.

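$$\mathit{Deferred}(a) \oplus \mathit{Deferred}(a') = \mathit{Deferred}(a'') \quad\text{where}\quad
a''(o) =
\begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases}$$

$$\begin{array}{rcl}
\mathit{Deferred}(a) \oplus \top & = & \top \\
\mathit{Deferred}(a) \oplus \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} & = & \top \oplus \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \\
\mathit{Deferred}(a) \oplus [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] & = & \top \oplus [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] \\
\mathit{Deferred}(a) \oplus \langle \delta \rangle & = & \top \oplus \langle \delta \rangle \\
\mathit{Deferred}(a) \oplus \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & = & \top \oplus \langle \delta_{def} \triangleright i : \delta_{exc} \rangle
\end{array}$$

Table 6.5 – $\oplus$ – Extended Reduction Operator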
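On two deferred dependencies, both the extended join and the extended reduction come down to the same pointwise union of access maps. The Python sketch below illustrates this, together with the comparison rule for deferred dependencies. Access maps are assumed to be dictionaries from root variable names to `frozenset`s of paths. Treating a root that is absent from the right-hand map as bound to the empty path set is an assumption of this sketch, and the function names are hypothetical.

```python
def leq_access(a, b):
    """Rule Def: every path set of `a` is included in the one `b` gives to the same root."""
    return all(paths <= b.get(root, frozenset()) for root, paths in a.items())


def join_access(a, b):
    """Pointwise join of two access maps; roots bound in only one map are kept as they are."""
    return {
        root: a.get(root, frozenset()) | b.get(root, frozenset())
        for root in a.keys() | b.keys()
    }
```

The result of `join_access` is an upper bound of both arguments with respect to `leq_access`, which is the behaviour expected from a join.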

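The extractions previously defined for dependencies $\delta$ (Definitions 5.2.5, 5.2.6, 5.2.7, 5.2.8 and 5.2.9) have been extended in order to handle deferred dependencies as well. Their treatment is summarized in Table 6.6. Making array-specific extractions, as well as extracting field and constructor dependencies, on a deferred dependency amounts to a pointwise extension of every path set in the access map with the corresponding symbolic path.

$$\begin{array}{lll}
\text{Extraction} & \delta & \text{Result} \\
\hline
\text{Field} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,f & \mathit{Deferred}(o_1 \mapsto P_1 \odot f\varepsilon, \ldots, o_k \mapsto P_k \odot f\varepsilon) \\
\text{Constructor} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,C & \mathit{Deferred}(o_1 \mapsto P_1 \odot C\varepsilon, \ldots, o_k \mapsto P_k \odot C\varepsilon) \\
\text{Cell} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,\langle i \rangle & \mathit{Deferred}(o_1 \mapsto P_1 \odot \langle i \rangle\varepsilon, \ldots, o_k \mapsto P_k \odot \langle i \rangle\varepsilon) \\
\text{Array General} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,\langle * \rangle & \mathit{Deferred}(o_1 \mapsto P_1 \odot \langle * \rangle\varepsilon, \ldots, o_k \mapsto P_k \odot \langle * \rangle\varepsilon) \\
\text{Outside Cell} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,\langle * \setminus i \rangle & \mathit{Deferred}(o_1 \mapsto P_1 \odot \langle * \setminus i \rangle\varepsilon, \ldots, o_k \mapsto P_k \odot \langle * \setminus i \rangle\varepsilon)
\end{array}$$

Table 6.6 – Extended Extraction Operators

Finally, we add the following rule to the well-typed dependency rules given in Chapter 5, Table 5.5.

$$\frac{\begin{array}{c}
\Gamma(o_1) = \tau_1 \quad \Gamma, I \vdash P_1 : \tau_1 \rightarrow \tau \\
\ldots \\
\Gamma(o_k) = \tau_k \quad \Gamma, I \vdash P_k : \tau_k \rightarrow \tau \\
o_1 \in O \ \ldots\ o_k \in O
\end{array}}{\Gamma, I, O \vdash \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k) : \tau}\ \text{WTDeferred}$$

Table 6.7 – Well-Typed Dependencies – Extended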
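The extended extraction operators all follow one pattern: append a single access step to every path of every root. A minimal Python sketch, assuming access maps are dictionaries from root names to `frozenset`s of step tuples and using the hypothetical name `extract_deferred`:

```python
def extract_deferred(access, step):
    """Extraction on a deferred dependency: extend every path of every root by `step`."""
    return {
        root: frozenset(path + (step,) for path in paths)
        for root, paths in access.items()
    }
```

Starting from the initial access map that binds `ti` to the empty path, extracting the field `stack` and then the field `start` produces the single path `stack`, `start` under the same root, which records where in the output the caller will later have to look.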

6.5 Deferred Dependencies at the Intraprocedural Level

6.5.1 Extended Intraprocedural Dependency Analysis

At the intraprocedural and interprocedural levels of our dependency analysis, the introduction of deferred dependencies requires only minimal changes.

Intraprocedurally, each predicate is analysed on every possible exit label. As explained in Section 5.3.2, our dependency analysis is a backward data-flow analysis. For each possible exit label of a predicate, the control flow graph is traversed backwards, starting from the exit node that corresponds to the analysed execution scenario. Dependency information is computed at every point of the control flow graph for each of the predicate's input, output and local variables, and this information is gradually refined until a fixed point is reached.

By traversing the control flow graph backwards, we take advantage of the information regarding the outputs that are associated to the analysed exit label, and we consider only the relevant ones, starting from the initialisation phase. As explained previously in Section 5.3.2, the intraprocedural domain for the currently analysed exit label is initialised with its associated output variables mapped to $\top$, the least precise element of our abstract dependency domain. This is a conservative over-approximation: it is considered that control on the outputs is lost and that these are entirely observed externally. As illustrated in Section 6.2, this over-approximation propagates along the control flow graph and, in certain cases, has a non-negligible impact on the precision of the computed dependency summaries.

We argued that, at the intraprocedural level of the analysis, a subtle but important distinction can be made regarding the dependency on certain inputs. This consists in distinguishing between the cases in which a predicate effectively uses an input subelement to compute an output subelement and those in which it simply forwards it to an output subelement. In the latter cases, the predicate does not use or need such an input subelement per se and, as a consequence, the dependency on it is relative to the extent to which the predicate's callers will subsequently use the output in which it is retrieved. At the intraprocedural level, in order to avoid the propagation of over-approximations, it is important to make this distinction early on, from the initialisation phase. Therefore, we introduce deferred dependencies at this level, instead of mapping the output variables to $\top$ as was previously done.

For a predicate $p$ of the following form

$$p(e_1, \ldots, e_n)\,[\lambda_1 : o_{11}, \ldots, o_{1k_1} \mid \ldots \mid \lambda_i : o_{i1}, \ldots, o_{ik_i} \mid \ldots \mid \lambda_m : o_{m1}, \ldots, o_{mk_m}]$$

analysed on the $\lambda_i$ exit label, the intraprocedural dependency domain used for initialising the node corresponding to $\lambda_i$ is the following:

$$\{\, o_{i1} \mapsto \mathit{Deferred}(o_{i1} \mapsto \{\varepsilon\}),\ \ldots,\ o_{ik_i} \mapsto \mathit{Deferred}(o_{ik_i} \mapsto \{\varepsilon\}) \,\}$$

For each associated output $o_{ij}$, $1 \le j \le k_i$, of the analysed label $\lambda_i$, a set $P_{o_{ij}}$ of symbolic paths is constructed. Initially, this consists of a single element, namely the $\varepsilon$ path. The deferred dependency associated to each output $o_{ij}$ is an access map binding $o_{ij}$ itself to its corresponding set of symbolic paths $P_{o_{ij}}$. Since the symbolic paths $\varepsilon$ refer to the output variables in their entirety, this is still a conservative approximation. In contrast to our previous initialisation strategy, however, it acknowledges the fact that dependencies on the inputs might be relative to the extent to which the outputs are subsequently used. It allows context-sensitive information to be injected later on.

This new initialisation strategy is enough to incorporate the expressive power of deferred dependencies at the intraprocedural level. Whereas before we were computing label-specific dependency summaries as input-output relations, the new strategy allows us to obtain label-specific dependency templates with lazy components that can be parameterized and varied according to a caller's own intraprocedural context. These can be seen as context-insensitive dependency summaries with context-sensitive leaves.
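The initialisation step can be pictured with a few lines of Python. The representation is an assumption made for the example: a deferred dependency is a `("deferred", access_map)` pair and the empty tuple stands for the empty path. The name `initial_domain` is hypothetical.

```python
EPSILON = ()  # the empty symbolic path


def initial_domain(outputs):
    """Map every output of the analysed exit label to a deferred dependency on itself."""
    return {o: ("deferred", {o: frozenset({EPSILON})}) for o in outputs}
```

For the `true` exit label of `thread`, whose only output is `ti`, this yields a domain in which `ti` is bound to a deferred dependency rooted at `ti` with the single empty path, instead of being bound to the top element.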

6.5.2 Intraprocedural Dependency Analysis Illustrated

In order to illustrate the use of deferred dependencies at the intraprocedural level, we revisit our thread example predicate, discussed in Section 5.3.3. As done previously, we consider the true execution scenario and apply our extended dependency analysis. We initialise the dependency corresponding to the true exit node by mapping the predicate's output ti to the deferred dependency mapping it to a set containing a single symbolic path, namely $\varepsilon$.

After the initialisation phase, the analysis continues as before, by traversing the control flow graph backwards and by applying at each step the corresponding data-flow equation. The deferred dependency is propagated upwards until the entry node is reached and analysed.

[Figure: control flow graph of thread, analysed on the true exit label. Nodes, from entry to exit: th = p.threads; tio = th[i]; switch(tio) as [ | ti]. Exit nodes: true, None and oob, the last two being marked Unreachable. Dependencies computed before each node, from the entry node down to the true exit node:]

$p \mapsto \{\mathit{threads} \mapsto \langle \square \triangleright i : [\mathit{Some} \mapsto \{t \mapsto \mathit{Deferred}(ti \mapsto \{\varepsilon\})\};\ \mathit{None} \mapsto \bot] \rangle\},\ i \mapsto \top$

$th \mapsto \langle \square \triangleright i : [\mathit{Some} \mapsto \{t \mapsto \mathit{Deferred}(ti \mapsto \{\varepsilon\})\};\ \mathit{None} \mapsto \bot] \rangle,\ i \mapsto \top$

$tio \mapsto [\mathit{Some} \mapsto \{t \mapsto \mathit{Deferred}(ti \mapsto \{\varepsilon\})\};\ \mathit{None} \mapsto \bot]$

$ti \mapsto \mathit{Deferred}(ti \mapsto \{\varepsilon\})$

Figure 6.1 – Analysing thread – Dependency Summary with Deferred Occurrences

The final dependency summary obtained for the true exit label of the predicate is

$p \mapsto \{\mathit{threads} \mapsto \langle \square \triangleright i : [\mathit{Some} \mapsto \{t \mapsto \mathit{Deferred}(ti)\};\ \mathit{None} \mapsto \bot] \rangle\},\ i \mapsto \top$

and this is similar to the targeted dependency information for thread discussed in Section 6.2 and illustrated on page 117.

6.6 Deferred Dependencies at the Interprocedural Level

At the interprocedural level, the impact of introducing deferred dependencies is visible only in the substitutions that have to be performed. Previously, the only required substitution consisted in replacing all occurrences of formal input parameters of a predicate with the corresponding effective input parameters. After having introduced deferred dependencies, further substitutions are needed. These can be easily illustrated by revisiting our start_address example predicate, discussed in Section 5.4.1. As done previously, we consider the true execution scenario and apply our extended dependency analysis.

We begin by initialising the output adr with a corresponding deferred dependency, as discussed in Section 6.5.1. The analysis traverses the control flow graph backwards and computes the dependency information at each node, until reaching the control flow graph's entry node, which corresponds to a call to the thread predicate. The intermediate dependency results are shown in Figure 6.2.

We obtain the dependency summary for the true exit label of the called predicate thread. In order to be able to use it, we must first substitute the formal input parameters appearing in it, i.e. p and i, with the effective arguments of the call, i.e. p and j. Additionally, in deferred dependencies, we also have to substitute the formal output parameters appearing as roots in the access maps, i.e. ti, with the corresponding effective output parameters. These substitutions are shown in Figure 6.3. Formal index variables appearing in dependencies corresponding to arrays have to be substituted with their effective counterparts as well. Similarly, any formal index variable appearing in symbolic paths that correspond to arrays must be substituted by the corresponding effective index variable.

[Figure: control flow graph of start_address, analysed on the true exit label. Nodes, from entry to exit: thread(p, j)[true: tj | None | oob]; sj = tj.stack; adr = sj.start. Exit nodes: true and None, the None and oob exits of the call leading to None. Dependencies computed, from the true exit node back up to the call node:]

$adr \mapsto \mathit{Deferred}(adr \mapsto \{\varepsilon\})$

$sj \mapsto \{\mathit{start} \mapsto \mathit{Deferred}(adr \mapsto \{\varepsilon\})\}$

$tj \mapsto \{\mathit{stack} \mapsto \{\mathit{start} \mapsto \mathit{Deferred}(adr \mapsto \{\varepsilon\})\}\}$

Figure 6.2 – $G_{start\_address}$ – Intermediate Dependency Results for start_address

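[Figure: the dependency summary of thread and the dependency computed for tj in the caller, with arrows indicating that the formal parameters p, i and ti are replaced by the effective parameters p, j and tj respectively.]

$p \mapsto \{\mathit{threads} \mapsto \langle \square \triangleright i : [\mathit{Some} \mapsto \{t \mapsto \mathit{Deferred}(ti)\};\ \mathit{None} \mapsto \bot] \rangle\},\ i \mapsto \top$

$tj \mapsto \{\mathit{stack} \mapsto \{\mathit{start} \mapsto \mathit{Deferred}(adr \mapsto \{\varepsilon\})\}\}$

Figure 6.3 – Substitution of Formal Parameters by Effective Parameters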

We can finally take advantage of the flexibility obtained using deferred dependencies by injecting the caller's intraprocedural dependency information into the deferred occurrences of the callee's dependency summary. This is another type of substitution, and consists in replacing deferred occurrences of formal output parameters of a predicate by the dependency information computed in the current context for the corresponding effective output parameters. For our start_address example, this is shown in Figure 6.4 and amounts to substituting the dependency computed for tj in the deferred occurrence of ti in the dependency summary of thread.

After this substitution, we obtain the following dependency summary for the exit label true of the start_address predicate:

$p \mapsto \{\mathit{threads} \mapsto \langle \square \triangleright j : [\mathit{Some} \mapsto \{t \mapsto \{\mathit{stack} \mapsto \{\mathit{start} \mapsto \mathit{Deferred}(adr \mapsto \{\varepsilon\})\}\}\};\ \mathit{None} \mapsto \bot] \rangle\},\ j \mapsto \top$

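[Figure: the summary of thread after parameter substitution, with the deferred occurrence of tj replaced by the dependency computed for tj in the caller.]

$p \mapsto \{\mathit{threads} \mapsto \langle \square \triangleright j : [\mathit{Some} \mapsto \{t \mapsto \mathit{Deferred}(tj)\};\ \mathit{None} \mapsto \bot] \rangle\},\ j \mapsto \top$

$tj \mapsto \{\mathit{stack} \mapsto \{\mathit{start} \mapsto \mathit{Deferred}(adr \mapsto \{\varepsilon\})\}\}$

Figure 6.4 – Substituting Deferred Dependencies by Actual Dependencies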

6.6.1 Applying Context-Sensitive Information by Substitution

As shown in our previous example, deferred dependencies associate sets of symbolic paths to certain root variables. We can substitute such deferred dependencies by actual dependencies computed in the current context, by applying the symbolic paths to the actual dependency to substitute. We iterate through entire dependency summaries in order to substitute the nested deferred dependencies appearing at some leaves. This substitution can be seen as an application of contextual information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. It is denoted by a mapping $\sigma$, which associates dependencies to root variables appearing in deferred access maps.

Definition 6.6.1. Substitution $\sigma$

$$\sigma : V \rightarrow D$$

Simultaneously, while substituting root variables in deferred dependencies by their actual dependencies computed in the current intraprocedural context, we also substitute indices in information corresponding to arrays. These are either substituted by another array index, i.e. the one corresponding to an actual input parameter, or eliminated when they correspond to a local variable. Their elimination consists in approximating the dependencies so as to remove references to the array index. This substitution is denoted by $\varphi$, and it is a mapping from variables to the new variables that replace them.

Definition 6.6.2. Substitution $\varphi$

$$\varphi : V \rightharpoonup V$$

The two substitutions can be done separately. However, for performance reasons, we chose to do them simultaneously, which is also what the actual implementation of the dependency analysis does. We denote the two simultaneous substitutions by $\blacktriangleleft (\sigma, \varphi)$ and detail them in Table 6.9. Performing the two operations simultaneously can be seen as a manner of reinterpreting, in one context, a dependency computed in another context.

For sets of symbolic paths (as defined in Section 6.3.1) in deferred dependencies, the operation $P \bullet (\sigma(o), \varphi)$ is the application of symbolic paths to the dependency of the root variable $o$ computed in the current context. For a deferred access map, all dependencies obtained by applying the symbolic paths are joined. The application of a symbolic path $\pi$ to a dependency $\delta$ is denoted by $\pi \diamond (\delta, \varphi)$, and it is shown in Table 6.8. During the application, free variables appearing in symbolic paths associated to arrays are substituted by their corresponding index variables, as given by $\varphi$. If $\varphi$ does not contain a mapping for a free variable, an approximation is made in order to remove it, and the dependency obtained by applying $\langle * \rangle$ is returned.

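$$\begin{array}{rcl}
\varepsilon \diamond (\delta, \varphi) & = & \delta \\
f\,\pi \diamond (\delta, \varphi) & = & \pi \diamond (\delta f, \varphi) \\
C\,\pi \diamond (\delta, \varphi) & = & \pi \diamond (\delta C, \varphi) \\
\langle * \rangle\pi \diamond (\delta, \varphi) & = & \pi \diamond (\delta\langle * \rangle, \varphi) \\
\langle i \rangle\pi \diamond (\delta, \varphi) & = &
\begin{cases}
\pi \diamond (\delta\langle \varphi(i) \rangle, \varphi) & i \in \mathit{Dom}(\varphi) \\
\pi \diamond (\delta\langle * \rangle, \varphi) & \text{otherwise}
\end{cases} \\
\langle * \setminus i \rangle\pi \diamond (\delta, \varphi) & = &
\begin{cases}
\pi \diamond (\delta\langle * \setminus \varphi(i) \rangle, \varphi) & i \in \mathit{Dom}(\varphi) \\
\pi \diamond (\delta\langle * \rangle, \varphi) & \text{otherwise}
\end{cases}
\end{array}$$

Table 6.8 – Deferred Paths – Application and Substitutions

Definition 6.6.3. Application of Symbolic Paths to a Dependency

$$P \bullet (\delta, \varphi) = \bigvee_{\pi \in P} \pi \diamond (\delta, \varphi)$$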

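$$\begin{array}{rcl}
\top \blacktriangleleft (\sigma, \varphi) & = & \top \\
\square \blacktriangleleft (\sigma, \varphi) & = & \square \\
\bot \blacktriangleleft (\sigma, \varphi) & = & \bot \\
\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \blacktriangleleft (\sigma, \varphi) & = & \{f_1 \mapsto \delta_1 \blacktriangleleft (\sigma, \varphi); \ldots; f_n \mapsto \delta_n \blacktriangleleft (\sigma, \varphi)\} \\
{[C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m]} \blacktriangleleft (\sigma, \varphi) & = & [C_1 \mapsto \delta_1 \blacktriangleleft (\sigma, \varphi); \ldots; C_m \mapsto \delta_m \blacktriangleleft (\sigma, \varphi)] \\
\mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k) \blacktriangleleft (\sigma, \varphi) & = & \bigvee_{1 \le i \le k} P_i \bullet (\sigma(o_i), \varphi) \\
\langle \delta_{def} \rangle \blacktriangleleft (\sigma, \varphi) & = & \langle \delta_{def} \blacktriangleleft (\sigma, \varphi) \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \blacktriangleleft (\sigma, \varphi) & = &
\begin{cases}
\langle \delta_{def} \blacktriangleleft (\sigma, \varphi) \triangleright \varphi(i) : \delta_{exc} \blacktriangleleft (\sigma, \varphi) \rangle & i \in \mathit{Dom}(\varphi) \\
\langle \delta_{def} \blacktriangleleft (\sigma, \varphi) \vee \delta_{exc} \blacktriangleleft (\sigma, \varphi) \rangle & \text{otherwise}
\end{cases}
\end{array}$$

Table 6.9 – Interprocedural Domain – Substitutions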
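The substitution of deferred occurrences can be sketched in Python as follows. This is a deliberately reduced illustration, not the implementation of the analysis. It assumes that dependencies are encoded as the strings `TOP`, `NOTHING`, `BOT` or as tagged tuples, it omits arrays with an exceptional cell, it assumes that extraction from an atomic dependency returns that same atomic dependency, and its `join` falls back to `TOP` (a safe upper bound) in every case it does not handle exactly. All names are hypothetical.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"


def join(d1, d2):
    """Simplified join: exact where handled below, TOP (a safe upper bound) elsewhere."""
    if d1 == d2 or d2 == BOT:
        return d1
    if d1 == BOT:
        return d2
    if isinstance(d1, tuple) and isinstance(d2, tuple) and d1[0] == d2[0]:
        kind = d1[0]
        if kind in ("struct", "variant") and d1[1].keys() == d2[1].keys():
            return (kind, {k: join(d1[1][k], d2[1][k]) for k in d1[1]})
        if kind == "array":
            return ("array", join(d1[1], d2[1]))
        if kind == "deferred":
            a, b = d1[1], d2[1]
            return ("deferred", {o: a.get(o, frozenset()) | b.get(o, frozenset())
                                 for o in a.keys() | b.keys()})
    return TOP


def extract(dep, step):
    """Extraction of one access step from a dependency."""
    if dep in (TOP, NOTHING, BOT):
        return dep                       # assumption: atomic cases are preserved
    kind = dep[0]
    if kind == "deferred":               # pointwise extension of the access map
        return ("deferred", {o: frozenset(p + (step,) for p in ps)
                             for o, ps in dep[1].items()})
    if kind == "struct" and step[0] == "field":
        return dep[1][step[1]]
    if kind == "variant" and step[0] == "ctor":
        return dep[1][step[1]]
    if kind == "array" and step[0] in ("any", "cell", "out"):
        return dep[1]
    raise TypeError(f"step {step!r} does not apply to {dep!r}")


def apply_path(path, dep, phi):
    """Apply a symbolic path to a dependency, renaming or dropping index variables."""
    for step in path:
        if step[0] in ("cell", "out"):
            # Unmapped index variables are approximated by the general array step.
            step = (step[0], phi[step[1]]) if step[1] in phi else ("any",)
        dep = extract(dep, step)
    return dep


def subst(dep, sigma, phi):
    """Replace deferred leaves by the caller's dependencies for the root variables."""
    if dep in (TOP, NOTHING, BOT):
        return dep
    kind = dep[0]
    if kind in ("struct", "variant"):
        return (kind, {k: subst(d, sigma, phi) for k, d in dep[1].items()})
    if kind == "array":
        return ("array", subst(dep[1], sigma, phi))
    if kind == "deferred":
        result = BOT                     # identity element of join
        for root, paths in dep[1].items():
            for path in paths:
                result = join(result, apply_path(path, sigma[root], phi))
        return result
    raise TypeError(f"unknown dependency: {dep!r}")
```

Applied to a simplified version of the thread summary, in which the `Some` case holds a deferred occurrence of `tj`, and to a caller context that binds `tj` to the dependency computed in start_address, `subst` replaces the deferred leaf by that dependency and leaves the rest of the summary unchanged.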


6.6.2 Wrapped Calls and Results

As a simple experiment for verifying the precision of our dependency analysis approach with deferred dependencies, we replaced all calls to built-in predicates in our previous example predicates thread and start_address (illustrated in Section 6.5.2 and on page 131, respectively) with calls to predicates wrapping every call of this type. We compared the precision of the obtained results, as well as the execution time needed to compute the dependency summaries.

The thread_with_wrapped predicate thus has the following form:

    predicate thread_with_wrapped (process p, int i)
      -> [true thread ti | None | oob]
      array < option_thread > th
      option_thread tio

      get_threads (p) [true th]             [true -> 1]
      get_ith (th, i) [true tio | false]    [true -> 2, false -> 5]
      switch_option (tio) [none | some ti]  [none -> 4, some -> 3]
      [true]
      [None]
      [oob]

The start_address predicate becomes:

    predicate start_address_wrapped (process p, int j)
      -> [true int adr | None]
      thread tj
      memory_region sj

      thread (p, j) [true tj | None | oob]  [true -> 1, None -> 4, oob -> 4]
      get_stack (tj) [true sj]              [true -> 2]
      get_start (sj) [true adr]             [true -> 3]
      [true]
      [None]
      [error]

The dependency summaries obtained for each of the two predicates are identical to the ones obtained for the predicates thread and start_address in their original form. The dependency information for thread and start_address is computed in 0.33 milliseconds, while that for the versions with calls to the wrapped built-in predicates, i.e. thread_with_wrapped and start_address_wrapped, is obtained in 0.65 milliseconds. We ran the analysis 10001 times in a loop. The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.

6.7 Related Work

For the past few decades, interprocedural analyses have generated considerable interest in the static analysis community. They expand the scope of analysis beyond a procedure's limits in order to encompass the effect of callees on callers. The precision of both data-flow and control-flow analyses is traditionally characterized in terms of context-sensitivity, i.e. computing information depending on the calling context, or its dual, context-insensitivity. For control-flow analyses, the terms polyvariant and monovariant analyses are used interchangeably for the same distinction (Nielson and Nielson, 1999). In (Midtgaard, 2012), a comprehensive survey of control-flow analyses for functional programs is made. Context-sensitivity has the advantage of increased precision. However, the scalability of such analyses is frequently a major concern. The precision and performance impact of context-sensitivity is discussed by Lhoták and Hendren in (Lhoták and Hendren, 2006). In contrast, Ruf argues in (Ruf, 1995) that context-insensitivity leads to little or no precision penalty. Shapiro and Horwitz argue in (Shapiro and Horwitz, 1997) that using a more precise pointer analysis does, in general, lead to more precise results.

Sharir and Pnueli introduced in (Sharir and Pnueli, 1978) a comprehensive theory of interprocedural data-flow analyses for general frameworks, based on two approaches. The first of them, the functional approach, is based upon computing a context-sensitive summary of a function or procedure call. Procedures are viewed as collections of structured program blocks, and input-output relations are established for each such block. Subsequently, the effect of procedure calls is computed by simply using such relations. The second approach proposed by Sharir and Pnueli is the call-string approach. Broadly speaking, this is based upon avoiding infeasible paths by matching corresponding calls and returns. It can be seen as an extension to intraprocedural data-flow analyses, in which only valid interprocedural paths are considered during graph traversal. This is achieved by tagging the propagated data with an encoded history of procedure calls, thus making the interprocedural flow explicit and increasing the accuracy of the propagated information. Both approaches are generic and can be used for a wide variety of analyses. Our form of interprocedural dependency analysis is closer to the functional approach. For each predicate of the analysed program, it computes a dependency summary as an input-output relation, and then uses this summary whenever the predicate is called. Symbolic elements are used to allow callers to inject their own context information.

Though desirable in terms of precision, context-sensitivity is often considered prohibitively costly in terms of performance. In practice, many analyses make a compromise and relax this requirement to a certain degree for the sake of scalability. Our approach is no exception: it constitutes an application of context-sensitive information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. Though not purely context-sensitive, our approach obtains a gain in precision without sacrificing scalability.

Purely context-sensitive analyses have been developed especially in the area of points-to analyses (Gharat, Khedker, and Mycroft, 2016), but also for information flow control (Hammer and Snelting, 2009) or liveness analysis used for garbage collection (Asati et al., 2014). In (Khedker, Mycroft, and Rawat, 2011), Khedker et al. present a lazy context-sensitive points-to analysis. Points-to information is computed only for the pointers that are live, and the propagation of points-to information is sparse, being restricted to live ranges of pointers. Though our approach is not directly comparable to this approach, it is interesting to make a few general remarks. In (Khedker, Mycroft, and Rawat, 2011), strong liveness is used for identifying the pointers that are directly used or which are used for defining pointers that are strongly live. On the other hand, we use strong dependency to identify and distinguish between input subelements that are directly needed for computing the output and input subelements that are simply copied into and forwarded as outputs. Thus, Khedker et al. prevent the explosion of information by clearly distinguishing between relevant and irrelevant information. We achieve scalability by refining the notion of "needed" or "depending on". Their analysis is fully context-sensitive and is based on the call-string approach (Sharir and Pnueli, 1978); our analysis shows a relaxed form of context-sensitivity and is closer to the functional approach.

Jensen et al. present in (Jensen, Møller, and Thiemann, 2010) a technique based on lazy propagation for context-sensitive interprocedural analysis of JavaScript programs, i.e. programs with objects and first-class functions. Transfer functions may not be distributive, and hence the IFDS technique (Reps, Horwitz, and Sagiv, 1995; Padhye and Khedker, 2013) is not applicable. They propagate data-flow information "by need" in an iterative fixpoint algorithm.

The computation of relevant information is deferred in demand-driven analyses (Horwitz, Reps, and Sagiv, 1995; Heintze and Tardieu, 2001; Zheng and Rugina, 2008; Sridharan et al., 2005) as well. These compute the targeted results only at specific program points, thereby avoiding the effort of computing a global result. We compute dependency summaries with symbolic elements. These can be seen as dependency templates parameterized by a caller's context. Their instantiation is deferred and left to the callers.

6.8 Conclusion

We have presented an extension of our dependency analysis, introducing a relaxed form of context-sensitivity. Our solution is based on computing deferred dependencies, consisting of symbolic access maps into which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty exerted by our previous context-insensitive approach. The introduction of deferred dependencies required an additional level of symbolic paths and path sets. However, this extension had a minimal impact on the dependency analysis at the intra- and interprocedural levels, imposing only the modification of the initialisation strategy and of the substitution operation. As we will discuss in Chapter 8, our extension of the dependency analysis with deferred dependencies led to an increase of 10–20% in execution time on our benchmark. However, it obtained more precise dependency information for 50% of the predicates included in the benchmark.


Chapter 7

Correlation Analysis

A thousand fibers connect us [...] and among those fibers, as sympathetic threads, our actions run as causes, and they come back to us as effects.

Hermann Melville

7.1 Introduction

In the field of Artificial Intelligence, the frame problem (McCarthy and Hayes 1969) is loosely but frequently described as "knowing what stays the same as actions occur in a changing world" (Morgenstern 1995). In the realm of software verification, the frame problem refers to establishing the boundaries within which functions operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties (Borgida, Mylopoulos, and Reiter 1995) and their verification.

Another frequently used definition of the frame problem in the context of Artificial Intelligence refers to "efficiently determining what remains the same in a changing world" (Morgenstern 1995). This definition is similar to the first, yet the initial words "efficiently determining" give it a subtle but crucial nuance. In this chapter we are interested in the latter, and we address the issue of automatically detecting deep-state modifications in the context of αSmil, a functional language. In our "changing world", destructive updates are not allowed. The new state out of a structured value in is obtained by destructuring in and reconstructing it in out, by copying unmodified subvalues from in and replacing in out only what needs to reflect the modification. Thus, referring to old values per se, as one of the three major approaches to specifying frame properties (described in Section 2.3.1) implies, does not make sense. Instead, we have to focus on and detect the relations between the (sub)values of in and out. To this end, we present a static correlation analysis which, when given a predicate that manipulates a structured input, is meant to determine automatically the subset that remains unchanged and is further propagated into the output. Thus, the behaviour of a predicate is summarised by computing relations between parts of the input and parts of the output. The computed correlation summaries are a safe approximation of what part of an input state of a predicate is copied to the output state; they summarise not only what is modified by the predicate, but also how it is modified and to what extent.

Outline. We continue this chapter by illustrating the targeted correlation results on an αSmil example in Section 7.1.1. In Section 7.1.2 we give a brief overview of the characteristics of our correlation analysis and explain the motivation behind some of them. The rest of the chapter focuses on technical details related to the correlation analysis. In Section 7.2 we present our abstract partial equivalence type, a fundamental component of our correlation analysis. It is followed in Section 7.3 by an in-depth presentation of paths and correlations, an intermediate level of abstraction that is imperative for obtaining expressive results. In Section 7.4 we focus on the correlation analysis at an intraprocedural level and illustrate the step-by-step mechanism behind it in Section 7.4.2. A summary of the correlation analysis at an interprocedural level is given in Section 7.5. A possible extension, going beyond the detection of equivalences and handling more general relations, is briefly discussed in Section 7.6. Detecting modifications is traditionally associated with shape and side-effect analyses. In Section 7.7 we review and discuss such approaches.

7.1.1 Targeted Correlation Information

The goal of our analysis and the targeted correlation results can be illustrated on an example predicate such as stop_thread. This predicate was introduced in Section 3.1.5 (on page 50), and its body in the αSmil language was shown in Section 4.1 on page 64. We revisit it and show the predicate's body in Figure 7.1.

predicate stop_thread(process in, int i)
-> [true: process o | inval]
    array<option_thread> ta
    option_thread th
    thread ti
    state s
    1   ta = in.threads
    2   th = ta[i]
    3   switch(th) as [Some: ti | None]
    4   s = Blocked
    5   ti = {ti with current_state = s}
    6   th = Some(ti)
    7   ta = [ta with i = th]
    8   o = {in with threads = ta}
    9   true
    10  inval

(Edge labels shown in the original figure: false, None, false.)

Figure 7.1 – Body of the stop_thread Predicate

It has two possible execution scenarios: true, when the given index i corresponds to an active thread, and inval otherwise, i.e. when it corresponds to an inactive element or when it lies outside the array's bounds. In the latter case, the predicate exits with the inval label and generates no output. In the former case, stop_thread modifies the state of the i-th active thread by setting it to Blocked, and returns the new state of the process in the output o. This is accomplished by destructuring the input process in and copying the array of associated threads into the local variable ta (line 1). The array's i-th element is copied to the local variable th (line 2) and, as it is an active element, its corresponding thread is extracted and put into ti (line 3). The new state for the thread value ti is created by setting its current_state field (line 5) to the state s constructed previously (line 4). The new state o of the process is constructed using ti for its i-th active element (lines 6 and 7) and copying everything else from the input in (line 8). It is interesting to note that for each destructuring step of in there is a corresponding construction step for o, as is visible at lines 1 and 8, 2 and 7, and 3 and 6, for instance.
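The destructure-and-reconstruct pattern described above can be mimicked outside αSmil. The following Python sketch is only an illustration under stated assumptions: the class names `Thread` and `Process`, the use of a tuple for the array of threads, and the use of `None` for an inactive element are choices made here, not part of the original predicate.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Thread:                      # hypothetical model of the thread structure
    identifier: int
    current_state: str
    stack: tuple


@dataclass(frozen=True)
class Process:                     # hypothetical model of the process structure
    pid: int
    current_thread: int
    address_space: str
    threads: Tuple[Optional[Thread], ...]   # None stands for an inactive element


def stop_thread(inp: Process, i: int):
    """Return ("true", new_process) or ("inval", None); inp is never mutated."""
    ta = inp.threads                              # line 1: destructure the input
    if not 0 <= i < len(ta):                      # index outside the bounds
        return ("inval", None)
    th = ta[i]                                    # line 2
    if th is None:                                # line 3, None case
        return ("inval", None)
    ti = replace(th, current_state="Blocked")     # lines 4-5: rebuild the thread
    ta = ta[:i] + (ti,) + ta[i + 1:]              # lines 6-7: rebuild the array
    return ("true", replace(inp, threads=ta))     # lines 8-9: rebuild the process
```

Every field that is not explicitly replaced is copied as is, which is exactly the information the correlation analysis aims to recover statically.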

The targeted correlation results for this predicate are illustrated in Figure 7.2. Our analysis should infer that, between the input process in and the output o, the values of the fields pid, current_thread and address_space are equal. Furthermore, for the threads array of associated threads, it should detect that all elements are equal except the value of the i-th element (as illustrated by $R_{th}$), for which only one of the three fields, namely the current_state field, differs (shown by $R_i$).

[Figure: the fields address_space, current_thread and pid of in and o are linked by =; the threads fields are linked by $R_{th}$; within $R_{th}$, the i-th elements are linked by $R_i$; $R_i$ relates the fields stack, current_state and identifier of the two threads.]

Figure 7.2 – Targeted Correlation Results for Predicate stop_thread

By tracking equalities between pairs of variables of the same type, and by defining an abstract partial equivalence type that mirrors the layered structure of associative arrays and algebraic data types, we can detect the equality of the values for the pid, current_thread and address_space fields between the input and the output. However, if we track only equalities between variables of the same type and we ignore the flow of an input's subelement value to a variable (or conversely, the flow of a variable's value to an output's subelement), valuable information is lost. We are not only losing information between inputs and outputs of different types but, by accumulating imprecisions, we also lose information concerning inputs and outputs of the same type, such as the in and o processes of our example. For instance, the equality between the values extracted from the input in and copied into ta and th respectively, as well as the relation between the values of ta and o.threads, and th and o.threads[i], are ignored because neither ta nor th is of the same type as in and o. As a consequence, we lose the information concerning the relation between in's and o's threads values altogether. In order to compute such information, it is imperative to track (cor)relations between variables of different types as well.

7.1.2 Correlation Analysis in a Nutshell

Our correlation analysis is a conservative static analysis, inferring what is modified by an operation and to what extent. It approximates the flow of input values into output values by uncovering equalities and computing correlations as pairs between input parts and the output parts into which these are injected. What is marked as being equal is definitely equal.

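[Figure: an input with inner paths $\pi$ and $\pi'$ and an output with inner paths $\rho$ and $\rho'$; the relation $R$ binds $\pi$ to $\rho$ and the relation $R'$ binds $\pi'$ to $\rho'$.]

Figure 7.3 – Intraprocedural Correlations – General Representation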

Outputs are often complex compounds of different subparts of different input variables: a subset of the input is modified while the rest is injected as is. We track the origin of subparts of the output and relate it to subparts of the input. As previously illustrated on our stop_thread example predicate, in order to prevent avoidable over-approximations we need to avoid dealing with data in a monolithic manner. To this end, it is imperative to consider pairs of different types and granularities as well. As a consequence, we are forced to introduce an additional level of granularity, allowing us to refer not only to variables but also to substructures within them. At the intraprocedural level, illustrated in Figure 7.3, we define correlations as mappings between pairs of inputs and outputs, to which we associate mappings between pairs of valid inner paths and the relations binding them. Correlations for arrays and variants are exemplified in Figures 7.4-a) and 7.4-b).

[Figure: a) Arrays: $\forall i,\ a[i]\ R\ b[i]$; b) Variants]

Figure 7.4 – Intraprocedural Domain – Examples

Similarly to our dependency analysis presented in Chapter 5, the correlation analysis is an interprocedural, flow-sensitive, field-sensitive, label-sensitive analysis that handles associative arrays, structures and variant data types. However, unlike the dependency analysis, for which we introduced a relaxed form of context-sensitivity in Chapter 6, the correlation analysis is context-insensitive. Fine-grained equivalence relations between the inputs and outputs of a predicate are computed once and subsequently propagated to its callers.

Our correlation analysis is meant to be used in an interactive verification context. Precise correlation summaries must be computed quickly in order to answer effectively, when combined with dependency summaries, queries regarding the preservation of certain invariants.

7.2 Partial Equivalence Relations

7.2.1 Abstract Partial Equivalence Type

The first step towards automatically reasoning about the propagation of input subelements into output subelements is the definition of an abstract partial equivalence type $\mathcal{R}$ that mimics the structure of algebraic data types and arrays. A partial equivalence relation $R \in \mathcal{R}$ is defined inductively from the two atomic elements, Equal and Any, and mirrors the structure of the concrete types.

Definition 7.2.1 Partial Equivalence Type $R \in \mathcal{R}$

\[
\begin{array}{rcll}
R & ::= & \mathit{Equal} & \text{atomic case: equal} \quad (i)\\
  & |   & \mathit{Any} & \text{atomic case: unrelated} \quad (ii)\\
  & |   & \{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} & f_1, \dots, f_n \text{ fields} \quad (iii)\\
  & |   & [C_1 \mapsto R_1; \dots; C_n \mapsto R_n] & C_1, \dots, C_n \text{ constructors} \quad (iv)\\
  & |   & \langle R_{def} \rangle & \text{array} \quad (v)\\
  & |   & \langle R_{def} \triangleright i : R_{exc} \rangle & i \text{ array index} \quad (vi)
\end{array}
\]

Such relations represent fine-grained partial equivalences between pairs of values of the same type. Equal and Any represent equal and unrelated values, respectively. Partial equivalence relations for structures (given by (iii)) and for variants (given by (iv)) are expressed in terms of the partial equivalences of their subparts, by mapping each field or constructor to the corresponding relations. As for the dependency analysis presented in Chapter 5, for arrays we distinguish between two cases, namely arrays with a general relation applying to all of the cells (as given by (v)), or to all but one exceptional cell (as given by (vi)) for which a specific relation is known to hold.

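The preorder relation of the partial equivalence lattice is denoted by $\sqsubseteq_{\mathcal{R}}$ and defined below.

Definition 7.2.2 Preorder Relation $\sqsubseteq_{\mathcal{R}}$

\[
\sqsubseteq_{\mathcal{R}} \;\subseteq\; \mathcal{R} \times \mathcal{R}
\]

It is detailed in Table 7.1.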

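Table 7.1 – $\sqsubseteq_{\mathcal{R}}$ – Comparison of Two Domains

\[
\dfrac{}{R \sqsubseteq_{\mathcal{R}} \mathit{Any}}\ \textsc{Top}
\qquad
\dfrac{}{\mathit{Equal} \sqsubseteq_{\mathcal{R}} R}\ \textsc{Bot}
\]

\[
\dfrac{R_1 \sqsubseteq_{\mathcal{R}} R'_1 \quad \dots \quad R_n \sqsubseteq_{\mathcal{R}} R'_n}
      {\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} \sqsubseteq_{\mathcal{R}} \{f_1 \mapsto R'_1; \dots; f_n \mapsto R'_n\}}\ \textsc{Str}
\]

\[
\dfrac{R_1 \sqsubseteq_{\mathcal{R}} R'_1 \quad \dots \quad R_n \sqsubseteq_{\mathcal{R}} R'_n}
      {[C_1 \mapsto R_1; \dots; C_n \mapsto R_n] \sqsubseteq_{\mathcal{R}} [C_1 \mapsto R'_1; \dots; C_n \mapsto R'_n]}\ \textsc{Var}
\]

\[
\dfrac{R \sqsubseteq_{\mathcal{R}} R'}
      {\langle R \rangle \sqsubseteq_{\mathcal{R}} \langle R' \rangle}\ \textsc{Adef}
\qquad
\dfrac{R_{def} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{exc}}
      {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle}\ \textsc{AI}
\]

\[
\dfrac{R_{def} \sqsubseteq_{\mathcal{R}} R' \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'}
      {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R' \rangle}\ \textsc{AIA}
\qquad
\dfrac{R \sqsubseteq_{\mathcal{R}} R'_{def} \quad R \sqsubseteq_{\mathcal{R}} R'_{exc}}
      {\langle R \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle}\ \textsc{AAI}
\]

\[
\dfrac{i \neq j \quad R_{def} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{def} \sqsubseteq_{\mathcal{R}} R'_{exc} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{exc}}
      {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle}\ \textsc{AIJ}
\]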
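The rules of Table 7.1 translate quite directly into a recursive comparison. The sketch below is an illustration only; the tuple encoding of relations (`("struct", {...})`, `("variant", {...})`, `("arr", R)`, `("arr_i", R_def, i, R_exc)`) and the function name `leq` are assumptions made for this example. It follows the rules syntactically, so it does not identify, for instance, a structure whose fields are all Any with Any itself.

```python
EQUAL, ANY = "Equal", "Any"

# Assumed encoding of relations:
#   "Equal" | "Any"
#   ("struct", {field: R}) | ("variant", {constructor: R})
#   ("arr", R_def) | ("arr_i", R_def, index_variable, R_exc)


def leq(r1, r2):
    """True when r1 is below r2 according to the rules of Table 7.1."""
    if r2 == ANY or r1 == EQUAL:                       # Top, Bot
        return True
    if r1 == ANY or r2 == EQUAL:                       # no other rule applies
        return False
    k1, k2 = r1[0], r2[0]
    if k1 == k2 and k1 in ("struct", "variant"):       # Str, Var
        return (r1[1].keys() == r2[1].keys()
                and all(leq(r1[1][k], r2[1][k]) for k in r1[1]))
    if k1 == "arr" and k2 == "arr":                    # Adef
        return leq(r1[1], r2[1])
    if k1 == "arr_i" and k2 == "arr":                  # AIA
        return leq(r1[1], r2[1]) and leq(r1[3], r2[1])
    if k1 == "arr" and k2 == "arr_i":                  # AAI
        return leq(r1[1], r2[1]) and leq(r1[1], r2[3])
    if k1 == "arr_i" and k2 == "arr_i":
        if r1[2] == r2[2]:                             # AI
            return leq(r1[1], r2[1]) and leq(r1[3], r2[3])
        return all(leq(a, b)                           # AIJ
                   for a in (r1[1], r1[3]) for b in (r2[1], r2[3]))
    return False
```

With this encoding, a relation that is Equal everywhere except at index i compares below a uniform array relation whenever both its default and exceptional parts do.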

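The join and meet operations are denoted by $\vee_{\mathcal{R}}$ and $\wedge_{\mathcal{R}}$, respectively.

Definition 7.2.3 Join Operation $\vee_{\mathcal{R}}$

\[
\vee_{\mathcal{R}} : \mathcal{R} \times \mathcal{R} \rightarrow \mathcal{R}
\]

Definition 7.2.4 Meet Operation $\wedge_{\mathcal{R}}$

\[
\wedge_{\mathcal{R}} : \mathcal{R} \times \mathcal{R} \rightarrow \mathcal{R}
\]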


Both are commutative operations applied pointwise on each subelement. Join, shown in Table 7.2, has Equal as its identity element and Any as its absorbing element. Meet, shown in Table 7.3, has Equal as its absorbing element and Any as its identity element. For both operations, the undisplayed cases are defined by their symmetrical counterparts.

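Table 7.2 – Partial Equivalences – $\vee_{\mathcal{R}}$ – Join Operation

\[
\begin{array}{rcl}
\mathit{Any} \vee_{\mathcal{R}} R & = & \mathit{Any}\\
\mathit{Equal} \vee_{\mathcal{R}} R & = & R\\
\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} \vee_{\mathcal{R}} \{f_1 \mapsto R'_1; \dots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \vee_{\mathcal{R}} R'_1; \dots; f_n \mapsto R_n \vee_{\mathcal{R}} R'_n\}\\
{[C_1 \mapsto R_1; \dots; C_n \mapsto R_n]} \vee_{\mathcal{R}} [C_1 \mapsto R'_1; \dots; C_n \mapsto R'_n] & = & [C_1 \mapsto R_1 \vee_{\mathcal{R}} R'_1; \dots; C_n \mapsto R_n \vee_{\mathcal{R}} R'_n]\\
\langle R \rangle \vee_{\mathcal{R}} \langle R' \rangle & = & \langle R \vee_{\mathcal{R}} R' \rangle\\
\langle R \rangle \vee_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \vee_{\mathcal{R}} R'_{def} \triangleright i : R \vee_{\mathcal{R}} R'_{exc} \rangle\\
\langle R_{def} \triangleright i : R_{exc} \rangle \vee_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\begin{cases}
\langle R_{def} \vee_{\mathcal{R}} R'_{def} \triangleright i : R_{exc} \vee_{\mathcal{R}} R'_{exc} \rangle & \text{if } i = j\\
\langle R_{def} \vee_{\mathcal{R}} R'_{def} \vee_{\mathcal{R}} R_{exc} \vee_{\mathcal{R}} R'_{exc} \rangle & \text{if } i \neq j
\end{cases}
\end{array}
\]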

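Table 7.3 – Partial Equivalences – $\wedge_{\mathcal{R}}$ – Meet Operation

\[
\begin{array}{rcl}
\mathit{Any} \wedge_{\mathcal{R}} R & = & R\\
\mathit{Equal} \wedge_{\mathcal{R}} R & = & \mathit{Equal}\\
\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} \wedge_{\mathcal{R}} \{f_1 \mapsto R'_1; \dots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \wedge_{\mathcal{R}} R'_1; \dots; f_n \mapsto R_n \wedge_{\mathcal{R}} R'_n\}\\
{[C_1 \mapsto R_1; \dots; C_n \mapsto R_n]} \wedge_{\mathcal{R}} [C_1 \mapsto R'_1; \dots; C_n \mapsto R'_n] & = & [C_1 \mapsto R_1 \wedge_{\mathcal{R}} R'_1; \dots; C_n \mapsto R_n \wedge_{\mathcal{R}} R'_n]\\
\langle R \rangle \wedge_{\mathcal{R}} \langle R' \rangle & = & \langle R \wedge_{\mathcal{R}} R' \rangle\\
\langle R \rangle \wedge_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \wedge_{\mathcal{R}} R'_{def} \triangleright i : R \wedge_{\mathcal{R}} R'_{exc} \rangle\\
\langle R_{def} \triangleright i : R_{exc} \rangle \wedge_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\begin{cases}
\langle R_{def} \wedge_{\mathcal{R}} R'_{def} \triangleright i : R_{exc} \wedge_{\mathcal{R}} R'_{exc} \rangle & \text{if } i = j\\
\langle R_{def} \wedge_{\mathcal{R}} R'_{def} \wedge_{\mathcal{R}} R_{exc} \wedge_{\mathcal{R}} R'_{exc} \rangle & \text{if } i \neq j
\end{cases}
\end{array}
\]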
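As an illustration of how the pointwise definition of Table 7.2 can be read operationally, the following Python sketch implements the join; the meet of Table 7.3 would be its dual, with the roles of Equal and Any exchanged. The tuple encoding of relations and the name `join` are assumptions of this example, not the representation used by the analysis itself.

```python
EQUAL, ANY = "Equal", "Any"

# Assumed encoding: "Equal" | "Any" | ("struct", {f: R}) | ("variant", {C: R})
#                   | ("arr", R_def) | ("arr_i", R_def, index_variable, R_exc)


def join(a, b):
    """Least precise combination of two relations, following Table 7.2."""
    if a == ANY or b == ANY:                  # Any is absorbing
        return ANY
    if a == EQUAL:                            # Equal is the identity
        return b
    if b == EQUAL:
        return a
    ka, kb = a[0], b[0]
    if ka == kb and ka in ("struct", "variant"):
        return (ka, {k: join(a[1][k], b[1][k]) for k in a[1]})
    if ka == "arr" and kb == "arr":
        return ("arr", join(a[1], b[1]))
    if ka == "arr" and kb == "arr_i":
        return ("arr_i", join(a[1], b[1]), b[2], join(a[1], b[3]))
    if ka == "arr_i" and kb == "arr":         # symmetrical counterpart
        return join(b, a)
    if ka == "arr_i" and kb == "arr_i":
        if a[2] == b[2]:                      # same exceptional index
            return ("arr_i", join(a[1], b[1]), a[2], join(a[3], b[3]))
        # different indices: the exception is merged into the default
        return ("arr", join(join(a[1], b[1]), join(a[3], b[3])))
    raise TypeError("relations of incompatible shapes")
```

Note that the sketch does not normalise its results: an array relation whose default is Any is returned as such rather than being collapsed to Any.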

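Additionally, extraction functions are defined for partial equivalence relations.

Definition 7.2.5 Extraction of a Field's Relation

\[
\mathit{extr}_f : \mathcal{R} \nrightarrow \mathcal{R}
\]

Definition 7.2.6 Extraction of a Constructor's Relation

\[
\mathit{extr}_C : \mathcal{R} \nrightarrow \mathcal{R}
\]

Definition 7.2.7 Extraction of a Cell's Relation

\[
\mathit{extr}_{\langle i \rangle} : \mathcal{R} \nrightarrow \mathcal{R}
\]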


These are partial functions and can only be applied to relations of the corresponding types. For example, the field extraction $\mathit{extr}_f$ only makes sense for atomic relations or for structured relations having a field named f, which should be the case if the relation connects two values of a structured type with a field f. For either of the two atomic relations, Equal or Any, applying any of these extractions yields Equal or Any, respectively. They are summarized in Table 7.4.

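Table 7.4 – Partial Equivalence Extractions

\[
\begin{array}{rcl}
\multicolumn{3}{l}{\mathit{extr}_f(R),\ f \in F}\\
\mathit{extr}_f(\mathit{Any}) & = & \mathit{Any}\\
\mathit{extr}_f(\mathit{Equal}) & = & \mathit{Equal}\\
\mathit{extr}_f(\{f_1 \mapsto R_1; \dots; f_i \mapsto R_i; \dots; f_n \mapsto R_n\}) & = & R_i \quad \text{if } f = f_i\\[6pt]
\multicolumn{3}{l}{\mathit{extr}_C(R),\ C \in \mathcal{C}}\\
\mathit{extr}_C(\mathit{Any}) & = & \mathit{Any}\\
\mathit{extr}_C(\mathit{Equal}) & = & \mathit{Equal}\\
\mathit{extr}_C([C_1 \mapsto R_1; \dots; C_j \mapsto R_j; \dots; C_n \mapsto R_n]) & = & R_j \quad \text{if } C = C_j\\[6pt]
\multicolumn{3}{l}{\mathit{extr}_{\langle i \rangle}(R)}\\
\mathit{extr}_{\langle i \rangle}(\mathit{Any}) & = & \mathit{Any}\\
\mathit{extr}_{\langle i \rangle}(\mathit{Equal}) & = & \mathit{Equal}\\
\mathit{extr}_{\langle i \rangle}(\langle R \rangle) & = & R\\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright i : R_{exc} \rangle) & = & R_{exc}\\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright j : R_{exc} \rangle),\ i \neq j & = & R_{def} \vee_{\mathcal{R}} R_{exc}
\end{array}
\]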
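The extractions of Table 7.4 can be sketched as follows. The encoding of relations as tuples and the function names are assumptions made for this illustration. To keep the example short, the cell extraction receives the join operation as a parameter instead of redefining it.

```python
EQUAL, ANY = "Equal", "Any"

# Assumed encoding: "Equal" | "Any" | ("struct", {f: R}) | ("variant", {C: R})
#                   | ("arr", R_def) | ("arr_i", R_def, index_variable, R_exc)


def _extr(r, shape, name):
    """Shared helper: atomic relations are returned unchanged."""
    if r in (EQUAL, ANY):
        return r
    if r[0] != shape or name not in r[1]:
        raise ValueError(f"extraction of {name!r} is undefined on this relation")
    return r[1][name]


def extr_field(r, f):
    return _extr(r, "struct", f)


def extr_ctor(r, c):
    return _extr(r, "variant", c)


def extr_cell(r, i, join):
    """Relation of cell i; `join` is the join operation on relations."""
    if r in (EQUAL, ANY):
        return r
    if r[0] == "arr":
        return r[1]
    if r[0] == "arr_i":
        _, r_def, j, r_exc = r
        # The index variables may or may not denote the same cell when i != j,
        # hence the join of both possibilities.
        return r_exc if i == j else join(r_def, r_exc)
    raise ValueError("cell extraction is undefined on this relation")
```

The partiality of the three functions is modelled here by raising `ValueError` on relations of the wrong shape.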

7.2.2 Well-Typed Partial Equivalences and their Semantics

As discussed in the case of dependencies in Section 5.2.2, syntactic partial equivalences are untyped. However, their interpretation is made in the context of a type $\tau \in T$. The atomic cases, Equal and Any, can apply to any type since they do not exhibit any data type features. Cases other than Equal and Any only have non-empty interpretations for types $\tau$ which are compatible with their shape. For instance, the structured relation $\{f \mapsto R\}$ only really makes sense for structured types with a single field f whose type itself is compatible with R, and will not be used in connection with variant or array types, for example. In Table 7.5 we detail the inference rules related to the well-typedness of partial equivalences. This is described as a judgement parameterized by a typing environment $\Gamma$ (Definition 4.3.1).

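\[
\dfrac{}{\Gamma \vdash \mathit{Equal} : \tau}\ \textsc{WT}\top
\qquad
\dfrac{}{\Gamma \vdash \mathit{Any} : \tau}\ \textsc{WT}\bot
\]

\[
\dfrac{\tau = \mathit{struct}\{f_1 : \tau_1; \dots; f_n : \tau_n\} \quad \Gamma \vdash R_1 : \tau_1 \quad \dots \quad \Gamma \vdash R_n : \tau_n}
      {\Gamma \vdash \{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\} : \tau}\ \textsc{WTStruct}
\]

\[
\dfrac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \dots \mid C_n : \tau_n] \quad \Gamma \vdash R_1 : \tau_1 \quad \dots \quad \Gamma \vdash R_n : \tau_n}
      {\Gamma \vdash [C_1 \mapsto R_1; \dots; C_n \mapsto R_n] : \tau}\ \textsc{WTVar}
\]

\[
\dfrac{\Gamma \vdash R : \tau}
      {\Gamma \vdash \langle R \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArr}
\qquad
\dfrac{\Gamma \vdash R_{def} : \tau \quad \Gamma \vdash R_{exc} : \tau \quad \Gamma(i) = \tau_i}
      {\Gamma \vdash \langle R_{def} \triangleright i : R_{exc} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArrI}
\]

Table 7.5 – Well-Typed Partial Equivalences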

The atomic values are generic: they are well-typed with respect to any type (WT⊤, WT⊥). The partial equivalences of structures (WTStruct) are well-typed only with respect to an adequate structured type whose field types are themselves compatible with the equivalences mapped to them. Similarly, the partial equivalences of variants (WTVar) are well-typed only with respect to an adequate variant type. In turn, the constructors must themselves be pointwise compatible with the equivalences mapped to them. For well-typed array equivalences (WTArr, WTArrI), the default relation as well as the exceptional relation have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of i, the index of the known exceptional equivalence relation, has to be compatible with $\tau_i$, the array's index type.

The semantics of a partial equivalence R for a type $\tau$ is a partial equivalence relation over values of type $\tau$. Given a valuation E from variables to semantic values (Definition 4.4.2), the interpretation $[\![R]\!]_\tau$ of a relation $R \in \mathcal{R}$ with respect to some type $\tau$ is a binary relation over $D_\tau$ (Definition 4.4.1). The interpretation $[\![R]\!]_\tau$ is defined as shown in Table 7.6.

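\[
\begin{array}{rcl}
[\![\mathit{Equal}]\!]_\tau & = & \{(x, x) \mid x \in D_\tau\}\\[4pt]
[\![\mathit{Any}]\!]_\tau & = & D_\tau \times D_\tau\\[4pt]
[\![\{f_1 \mapsto R_1; \dots; f_n \mapsto R_n\}]\!]_{\mathit{struct}\{f_1 : \tau_1; \dots; f_n : \tau_n\}} & = &
\{(\{f_1 = v_1; \dots; f_n = v_n\}, \{f_1 = w_1; \dots; f_n = w_n\}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\}\\[4pt]
[\![[C_1 \mapsto R_1; \dots; C_n \mapsto R_n]]\!]_{\mathit{variant}[C_1 : \tau_1 \mid \dots \mid C_n : \tau_n]} & = &
\{(C_i[v_i], C_i[w_i]) \mid (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\}\\[4pt]
[\![\langle R_{def} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = &
\{((P, (v)_k), (P, (w)_k)) \mid \forall k,\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\}\\[4pt]
[\![\langle R_{def} \triangleright i : R_{exc} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = &
\{((P, (v)_k), (P, (w)_k)) \mid E(i) \in P \Rightarrow (v_{E(i)}, w_{E(i)}) \in [\![R_{exc}]\!]_\tau\ \wedge\ \forall k \neq E(i),\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\}
\end{array}
\]

Table 7.6 – Partial Equivalence Relations – Semantics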

A partial equivalence relation R only relates values of the same type $\tau$, which must be compatible with R's "shape". For structures, a partial equivalence relates pointwise the values of the fields of the two structure values. For variant values, a partial equivalence relation relates values built with the same constructor $C_i$, using arguments whose values are related by a relation $R_i$. For arrays, P indicates the support type, which has to be identical for both values. The values of the array elements are pointwise related by the same relation $R_{def}$, with the exception of the i-th elements, which are potentially related by an exceptional relation $R_{exc}$. Since variables i are used for indicating the exceptional elements, the valuation E is used for determining the value of i.
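The semantics of Table 7.6 can be turned into a membership test on concrete values. The sketch below is a hedged illustration: it assumes that structures are Python dictionaries from field names to values, that variant values are pairs `(constructor, argument)`, that arrays are dictionaries whose key set plays the role of the support P, and that the valuation E is a dictionary `env`. None of these encodings come from the original text.

```python
EQUAL, ANY = "Equal", "Any"

# Assumed encoding of relations: "Equal" | "Any" | ("struct", {f: R})
#   | ("variant", {C: R}) | ("arr", R_def) | ("arr_i", R_def, index_variable, R_exc)


def related(r, v, w, env):
    """True when the pair (v, w) belongs to the interpretation of r."""
    if r == ANY:
        return True
    if r == EQUAL:
        return v == w
    kind = r[0]
    if kind == "struct":                       # pointwise on the fields
        return all(related(r[1][f], v[f], w[f], env) for f in r[1])
    if kind == "variant":                      # same constructor, related arguments
        (cv, av), (cw, aw) = v, w
        return cv == cw and related(r[1][cv], av, aw, env)
    if v.keys() != w.keys():                   # arrays must share their support P
        return False
    if kind == "arr":
        return all(related(r[1], v[k], w[k], env) for k in v)
    _, r_def, i, r_exc = r                     # array with an exceptional cell
    e = env[i]
    return all(related(r_exc if k == e else r_def, v[k], w[k], env) for k in v)
```

For the stop_thread example, the array of threads before and after the call would be related by a relation that is Equal everywhere except at the cell designated by i, provided `env` maps i to the index that was actually updated.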

7.3 Paths and Correlations

7.3.1 Paths and Correlation Types

The partial equivalence relations discussed in Section 7.2 and defined in 7.2.1 are enough to represent fine-grained information for values of the same structured type. For the stop_thread example discussed in Section 7.1.1, these would suffice to express the equality of the pid, current_thread and address_space fields between the input process in and the output process o, by simply mapping this pair to the following partial equivalence:

\[
\{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{current\_thread} \mapsto \mathit{Equal};\ \mathit{address\_space} \mapsto \mathit{Equal}\}
\]

However, the partial equivalence relations cannot, for instance, be used to convey the equality at line 1 in Figure 7.1 between the value of the threads field of in and the local ta variable. By not tracking information such as this, we lose the targeted information regarding the threads field, denoted by $R_{th}$ in Figure 7.2. In order to express this information, we first need to be able to refer to the substructure in.threads and relate its value to the one of ta.

To this end, rather than handling only partial equivalences between pairs of variables of the same type and approximating the rest to Any (the element that conveys no information), we introduce an intermediate level allowing us to store relations between subparts of values. We begin by introducing access paths. Unlike the symbolic paths introduced in Chapter 6 and defined in 6.3.1, which are used for computing dependency summaries with context-sensitive elements, the paths used for the correlation analysis are actual access paths inside some value's structure. The symbolic paths used in deferred dependencies may cover multiple actual paths inside a value, whereas the access paths required for the correlation analysis represent unique chains of internal accesses leading to a single nested subvalue. Each access path is rooted at one of the program's variables. It is worth remarking that in both cases an intermediate level below variables needs to be introduced as soon as fine-grained relations between pairs of variables are considered, directly or indirectly. In the case of deferred dependencies, this was not the main goal per se, but rather a mechanism for obtaining more precision in specific cases for already pertinent dependency results. In contrast, for the correlation results this is imperative for obtaining useful, expressive information in non-trivial cases. We therefore define a recursive type $\pi \in \Pi$ encompassing this.

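Definition 7.3.1 Access Path Type $\pi \in \Pi$

\[
\begin{array}{rcll}
\pi & ::= & \varepsilon & \text{empty: root}\\
    & |   & f.\pi & f \in F\\
    & |   & C.\pi & C \in \mathcal{C}\\
    & |   & \langle i \rangle.\pi & i \text{ index, program variable}
\end{array}
\]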

The empty path, denoted by $\varepsilon$, is the special case denoting an access to an entire element, i.e. the root. The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi.\pi'$. For instance, the path denoting the current_state field of the i-th active associated thread of the in process of our stop_thread predicate would be the following: in.threads.〈i〉.Some.t.current_state.
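To make the notion of access path concrete, the following sketch follows a path inside a Python value. It is an illustration under assumptions chosen here: a path is a tuple of steps `("f", field)`, `("C", constructor)` or `("i", index_variable)`, structures and arrays are dictionaries, variant values are pairs `(constructor, argument)`, and `env` gives the value of index variables.

```python
def resolve(value, path, env):
    """Return the subvalue reached by following `path` from `value`.

    Raises LookupError when the path does not exist in this particular value,
    e.g. when a constructor step does not match the actual constructor.
    """
    for kind, name in path:
        if kind == "f":                        # field access
            value = value[name]
        elif kind == "C":                      # constructor access
            ctor, arg = value
            if ctor != name:
                raise LookupError(f"value is built with {ctor!r}, not {name!r}")
            value = arg
        else:                                  # cell access through an index variable
            value = value[env[name]]
    return value                               # the empty path returns the root
```

With this encoding, the path of the example above, taken relative to the variable in, would be written `(("f", "threads"), ("i", "i"), ("C", "Some"), ("f", "t"), ("f", "current_state"))`.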

Meaningful information is conveyed by associating paths and partial equivalence relations. For instance, the equality between in.threads and ta at line 1 in Figure 7.1 can be expressed by associating Equal to the pair of subelements identified by the threads path in in and by $\varepsilon$ in ta. We call such a mapping from a pair of access paths to a partial relation a correlation. After setting the i-th element of ta to ti, the thread with the current state set to Blocked and everything else left unmodified, we could express the relation between in and ta by two correlations, namely:

\[
\begin{array}{l}
(\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle\\[4pt]
(\mathit{threads}.\langle i \rangle.\mathit{Some}.t,\ \langle i \rangle.\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\]

To this end, we introduce correlation maps $\kappa \in \mathcal{K}$, defined below.

Definition 7.3.2 Correlation Maps $\kappa \in \mathcal{K}$

Correlation maps $\kappa \in \mathcal{K}$ are finite mappings from pairs of paths to partial equivalence relations $R \in \mathcal{R}$:

\[
\kappa : \Pi \times \Pi \rightarrow \mathcal{R}
\]


Generally, for two given variables e and o, a correlation $(\pi, \rho) \mapsto R$ specifies that e and o have nested subelements, respectively identified by the inner paths $\pi$ and $\rho$, whose values are related by the relation R.

We conclude this subsection by specifying what it means for paths, correlations and correlation maps to be well-typed.

For characterizing the contexts in which an access path $\pi$ is well-typed, we need to consider the types of values to which it can be applied and the types of (sub)values to which it can lead. Therefore, in the following we define a typing judgement for access paths as a three-place relation $\pi : \tau \rightarrow \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and, in that case, it will always describe subvalues of type $\tau'$. Additionally, the typing judgement is also parameterized by a set of input variables I, which are the variables having the right to appear as identifiers for array accesses. This is detailed in Table 7.7.

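\[
\dfrac{}{\Gamma, I \vdash \varepsilon : \tau \rightarrow \tau}\ \textsc{WT}\varepsilon
\]

\[
\dfrac{\tau = \mathit{struct}\{f_1 : \tau_1; \dots; f_i : \tau_i; \dots; f_n : \tau_n\} \quad \Gamma, I \vdash \pi_i : \tau_i \rightarrow \tau'}
      {\Gamma, I \vdash f_i.\pi_i : \tau \rightarrow \tau'}\ \textsc{WTStructAPath}
\]

\[
\dfrac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \dots \mid C_i : \tau_i \mid \dots \mid C_n : \tau_n] \quad \Gamma, I \vdash \pi_i : \tau_i \rightarrow \tau'}
      {\Gamma, I \vdash C_i.\pi_i : \tau \rightarrow \tau'}\ \textsc{WTVarAPath}
\]

\[
\dfrac{\Gamma, I \vdash \pi_i : \tau \rightarrow \tau' \quad \Gamma(i) = \tau_i \quad i \in I}
      {\Gamma, I \vdash \langle i \rangle.\pi_i : \mathit{arr}_{\tau_i}\langle \tau \rangle \rightarrow \tau'}\ \textsc{WTCellAPath}
\]

Table 7.7 – Well-Typed Access Paths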
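Read from left to right, the rules of Table 7.7 describe a simple traversal of the type along the path. The sketch below illustrates this reading; the encoding of types as `("struct", {f: τ})`, `("variant", {C: τ})` and `("arr", τ_i, τ)`, with base types as strings, is an assumption of this example, as are the function names. The second function only checks that two paths reach the same type, which is one of the premises required for a correlation to be well-typed; it does not check the relation itself.

```python
def path_type(path, tau, gamma, inputs):
    """Return tau' such that path : tau -> tau', or raise TypeError.

    path   : tuple of steps ("f", field) | ("C", constructor) | ("i", variable)
    gamma  : typing environment, mapping variables to types
    inputs : set of variables allowed to appear as array indices
    """
    for kind, name in path:
        shape = tau[0] if isinstance(tau, tuple) else None
        if kind == "f" and shape == "struct" and name in tau[1]:
            tau = tau[1][name]                              # WTStructAPath
        elif kind == "C" and shape == "variant" and name in tau[1]:
            tau = tau[1][name]                              # WTVarAPath
        elif (kind == "i" and shape == "arr" and name in inputs
              and gamma.get(name) == tau[1]):
            tau = tau[2]                                    # WTCellAPath
        else:
            raise TypeError(f"step {name!r} does not apply to type {tau!r}")
    return tau                                              # WTε for the empty path


def same_target_type(pi, tau_l, rho, tau_r, gamma, inputs):
    """True when both paths are well-typed and lead to the same type."""
    return (path_type(pi, tau_l, gamma, inputs)
            == path_type(rho, tau_r, gamma, inputs))
```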

Correlations are mappings from pairs of access paths to partial relations. Though the two access paths can be applied to values of different types, they both need to return subvalues of the same type $\tau'$. Furthermore, the partial equivalence relation associated with them has to be well-typed with respect to $\tau'$, as detailed in Table 7.5. The inference rule for well-typed correlations is shown in Table 7.8.

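\[
\dfrac{\Gamma, I \vdash \pi : \tau_l \rightarrow \tau' \quad \Gamma, I \vdash \rho : \tau_r \rightarrow \tau' \quad \Gamma \vdash R : \tau'}
      {\Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}\ \textsc{WTCorrelation}
\]

Table 7.8 – Well-Typed Correlations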


Finally, as shown in Table 7.9, a correlation map $\kappa$ is well-typed if all the correlations it contains are well-typed.

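\[
\dfrac{\forall (\pi, \rho) \mapsto R \in \kappa,\quad \Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}
      {\Gamma, I \vdash \kappa : (\tau_l, \tau_r)}\ \textsc{WTCorMaps}
\]

Table 7.9 – Well-Typed Correlation Maps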

7.3.2 Alignment and Partial Order

There is no clear choice for a canonical form for correlations. For instance, it is equivalent to write $(\varepsilon, \varepsilon) \mapsto \{f \mapsto R\}$ and $(f, f) \mapsto R$. Is one superior to the other? Which one should be chosen? Operations can create and manipulate correlations in different manners that are hard to predict. New correlations can also be introduced while considering def-use chains in the transfer function, presented later in Section 7.4.1. Choosing one of the two forms considerably limits flexibility. Not choosing a canonical form, however, has consequences as well; notably, it renders the definition of a partial order between correlation maps difficult. In order to compare two correlation maps $\kappa_1$ and $\kappa_2$, we cannot simply verify if the path pairs are identical and compare their associated relations. A correlation of the second map could be linked in different manners to multiple mappings of the first.

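For instance, between a process p of the type used by our stop_thread example and an array ta of the same type as the field threads of the process, we might have the following correlation maps:

\[
\kappa_1 = \left\{
(\mathit{threads},\ \varepsilon) \mapsto \left\langle [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \right\rangle
\right\}
\]

\[
\kappa_2 = \left\{
\begin{array}{l}
(\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle\\[4pt]
(\mathit{threads}.\langle i \rangle.\mathit{Some}.t,\ \langle i \rangle.\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\right\}
\]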

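These correlation maps can be depicted as follows.

[Figure: for $\kappa_1$, the threads field of p is linked to the root $\varepsilon$ of ta by $R_1$; for $\kappa_2$, the threads field of p is linked to the root $\varepsilon$ of ta by $R_2$, and a nested pair of subelements is additionally linked by $R'_2$.]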

As illustrated above, in the given example map $\kappa_2$, in addition to the relation $R_2$ associated to (threads, $\varepsilon$), the relation associated to (threads.〈i〉.Some.t, 〈i〉.Some.t) and denoted by $R'_2$ expresses information about the values of the process's threads field and ta as well. These are nested in the i-th element of each, as identified by 〈i〉.Some.t. In order to compare these two correlation maps, we have to first determine the relationships between the pair of paths (threads, $\varepsilon$) from $\kappa_1$ and each pair of paths of $\kappa_2$. The first pair of paths in $\kappa_2$ is identical, whereas the second pair refers to elements that are further away from the root. Based on these relationships, we have to extract all the information relevant to (threads, $\varepsilon$) from $\kappa_2$ and consider it in its entirety. This amounts to:

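\[
(\mathit{threads},\ \varepsilon) \mapsto \left\langle \mathit{Equal} \triangleright i : [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \right\rangle
\]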

Having expressed the information from the $\kappa_2$ correlation map at the same level as the information of $\kappa_1$, i.e. that of the pair of paths (threads, $\varepsilon$), we can finally compare them and conclude that the information contained by $\kappa_2$ is more precise than the relation associated to (threads, $\varepsilon$) in $\kappa_1$. The relation associated to (threads, $\varepsilon$) in $\kappa_1$ captures the equality between the values of the identifier and stack fields of all active thread elements of the two arrays identified by the paths. The relation associated to (threads, $\varepsilon$) in $\kappa_2$ expresses the equality between all thread elements of the two arrays except the i-th elements. Furthermore, if the i-th elements of the two arrays are active, it captures the equality between the values of the identifier and stack fields. Thus, by using the information contained by $\kappa_1$, we can conclude that for all active elements of the two arrays the values of 2 out of the 3 fields are equal; by using the more precise information contained by $\kappa_2$, we can conclude that all elements of the two arrays are equal except the i-th one, for which the values of the same 2 out of 3 fields as in $\kappa_1$ are equal.

In the general case, for comparing two correlation maps $\kappa_1$ and $\kappa_2$, we need to collect, for each correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained by $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$, and verify if this covers at least the same information as the relation R. This information could be scattered across multiple mappings of the correlation map $\kappa_1$. We call alignment the process of collecting, for any correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained in $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$. It is necessary in the absence of a canonical form, a trait of our approach that is both a weakness and a strength: it leads to complex computations but gives considerable flexibility, as will be shown in Section 7.4.

For aligning, we first determine the relationships between paths by determining the relationship between the sequences of internal accesses that they represent. These can be identical, representing the same traversal to the same subelement of a value, or they can be completely unrelated, such as f and g for instance, representing accesses to two different fields of a structure. They can also represent sequences of accesses of different depths, one being the prefix of the other, i.e. being closer to the root. For example, the path f is a prefix of the path f.〈i〉: the first represents the access to the field f, whereas the second one represents an access to the i-th element of the array nested in the field f.

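To distinguish between these cases, we define a link type and a matching operator.

Definition 7.3.3 Link Type $\mu \in M$

A link type, denoted by $\mu \in M$, is defined as follows:

\[
\begin{array}{rcl}
\mu & ::= & \mathit{Identical}\\
    & |   & \mathit{Left}\ \pi\\
    & |   & \mathit{Right}\ \pi\\
    & |   & \mathit{Incompatible}
\end{array}
\]

Definition 7.3.4 Matching Operator $\curlywedge$

The matching operator $\curlywedge$ retrieves the link $\mu$ between two paths:

\[
\curlywedge : \Pi \times \Pi \rightarrow M
\qquad
\curlywedge(\pi, \rho) =
\begin{cases}
\mathit{Identical} & \pi = \rho\\
\mathit{Left}\ \pi' & \pi.\pi' = \rho\\
\mathit{Right}\ \rho' & \rho.\rho' = \pi\\
\mathit{Incompatible} & \text{otherwise}
\end{cases}
\]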

The different cases are depicted in Table 7.11.

152 Chapter 7 Correlation Analysis

f(π ρ) = Identicalπ ρ

f(π ρ) = Left πprime

π

πprime

ρ ρ

f(π ρ) = Right ρprimeπ

ρ

ρprime

π

f(π ρ) = Incompatibleπ ρ

Table 711 ndash Links between Access Paths
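The matching operator of Definition 7.3.4 can be sketched in a few lines. This is only an illustration under assumptions that are not part of the original text: a path is encoded as a Python tuple of access steps (the empty tuple plays the role of $\varepsilon$), steps are plain strings such as `"f"` or `"<i>"`, and a link is a small tagged tuple. The function name `match` is likewise a choice made for this sketch.

```python
from typing import Tuple

Path = Tuple[str, ...]  # assumed encoding: () stands for the empty path


def match(pi: Path, rho: Path):
    """Return the link between two paths, in the spirit of Definition 7.3.4."""
    if pi == rho:
        return ("Identical",)
    if rho[:len(pi)] == pi:
        # pi is a strict prefix of rho: pi . pi' = rho, the link carries pi'
        return ("Left", rho[len(pi):])
    if pi[:len(rho)] == rho:
        # rho is a strict prefix of pi: rho . rho' = pi, the link carries rho'
        return ("Right", pi[len(rho):])
    return ("Incompatible",)


# The path f is a prefix of f<i>; f and g are unrelated.
print(match(("f",), ("f", "<i>")))
print(match(("f",), ("g",)))
```

The suffix carried by `Left` and `Right` is exactly what the projection and injection operators consume when a correlation is aligned.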

Definition 7.3.5. Aligning. Aligning a correlation $(\pi, \rho) \mapsto R$ to another pair of paths $(\pi', \rho')$ is an operation of type $(\Pi \times \Pi \times \mathcal{R}) \times (\Pi \times \Pi) \rightarrow \mathcal{R}$, whose result is denoted by $R^{(\pi,\rho)}_{(\pi',\rho')}$:

$$\mathsf{align}\big([(\pi, \rho) \mapsto R],\ (\pi', \rho')\big) = R^{(\pi,\rho)}_{(\pi',\rho')}$$

From $R$ we obtain the information referring to the elements identified by $(\pi', \rho')$ and denote it by $R^{(\pi,\rho)}_{(\pi',\rho')}$. This is done by matching on $\pi$ and $\pi'$ on the one hand, and on $\rho$ and $\rho'$ on the other, and by distinguishing between the different cases. When the paths are identical, we can simply return the relation $R$. When the links between the paths differ, or when the paths are incompatible, we have to approximate to the least precise relation, thus returning $\mathsf{Any}$. When $\pi$ and $\rho$ are more shallow paths, i.e. closer to the root, we need to make a projection, written $\mathsf{proj}$ here. For example, aligning $(f, \varepsilon) \mapsto \{a \mapsto R_a;\ b \mapsto R_b;\ c \mapsto R_c\}$ to $(f\,b,\ b)$ consists in projecting $b$ on the relation $\{a \mapsto R_a;\ b \mapsto R_b;\ c \mapsto R_c\}$ and thus obtaining $R_b$. More generically, this case is depicted below.

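[Diagram: two values related by $R$ at the level of the paths $\pi$ and $\rho$; the pair of paths to align to designates the element $\delta$ nested deeper on both sides.]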

For aligning the known correlation to the given pair of paths, we need to extract from $R$ the information that is relevant for the nested element $\delta$, as depicted below.

[Diagram: the relation $R$ is projected onto the nested element $\delta$ on both sides.]

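On the contrary, if $\pi'$ and $\rho'$ are closer to the root, we need to perform an injection, written $\mathsf{inj}$ here. For example, aligning $(f\,b,\ b) \mapsto R_b$ to $(f, \varepsilon)$ consists in creating a relation $\{a \mapsto \mathsf{Any};\ b \mapsto R_b;\ c \mapsto \mathsf{Any}\}$. More generically, this case can be depicted as follows.

[Diagram: a relation known for the nested element $\delta$, to be aligned to the pair of shallower paths $(\alpha\beta,\ \beta)$.]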

For aligning the known correlation to the given pair of paths, we need to express the relation $R_\delta$ at the level of the $(\alpha\beta,\ \beta)$ paths, a level that is closer to the root. This consists in creating a new, higher-level relation where the element identified by $\delta$ is mapped to $R_\delta$ and everything else is "filled" with $\mathsf{Any}$, since nothing is known about the rest of the elements. This can be depicted as follows.

[Diagram: the higher-level relation at $(\alpha\beta,\ \beta)$, in which $\delta$ is mapped to $R_\delta$ and all sibling elements are mapped to $\mathsf{Any}$.]

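In the general case, $R^{(\pi,\rho)}_{(\pi',\rho')}$ is computed as defined below.

Definition 7.3.6. Computation of $R^{(\pi,\rho)}_{(\pi',\rho')}$

$$R^{(\pi,\rho)}_{(\pi',\rho')} = \begin{cases} R & \text{when } \mathsf{match}(\pi, \pi') = \mathsf{match}(\rho, \rho') = \mathsf{Identical} \\ \mathsf{proj}(\sigma, R) & \text{when } \mathsf{match}(\pi, \pi') = \mathsf{match}(\rho, \rho') = \mathsf{Left}\ \sigma \\ \mathsf{inj}(R, \sigma) & \text{when } \mathsf{match}(\pi, \pi') = \mathsf{match}(\rho, \rho') = \mathsf{Right}\ \sigma \\ \mathsf{Any} & \text{otherwise} \end{cases}$$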

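The projection and injection operators used above are defined as follows.

Definition 7.3.7. Projection Operator

$$\mathsf{proj} : \Pi \times \mathcal{R} \rightharpoonup \mathcal{R}$$

$$\mathsf{proj}(\pi, R) = \begin{cases} R & \text{when } \pi = \varepsilon \\ \mathsf{proj}(\pi', \mathit{extr}_f(R)) & \text{when } \pi = f\,\pi' \\ \mathsf{proj}(\pi', \mathit{extr}_C(R)) & \text{when } \pi = C\,\pi' \\ \mathsf{proj}(\pi', \mathit{extr}_{\langle i\rangle}(R)) & \text{when } \pi = \langle i\rangle\,\pi' \end{cases}$$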

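Definition 7.3.8. Injection Operator

$$\mathsf{inj} : \mathcal{R} \times \Pi \rightharpoonup \mathcal{R}$$

$$\mathsf{inj}(R, \pi) = \begin{cases} R & \text{when } \pi = \varepsilon \\ \{f_1 \mapsto \mathsf{Any};\ \ldots;\ f_i \mapsto \mathsf{inj}(R, \pi');\ \ldots;\ f_n \mapsto \mathsf{Any}\} & \text{when } \pi = f\,\pi',\ f = f_i \\ [C_1 \mapsto \mathsf{Any};\ \ldots;\ C_i \mapsto \mathsf{inj}(R, \pi');\ \ldots;\ C_n \mapsto \mathsf{Any}] & \text{when } \pi = C\,\pi',\ C = C_i \\ \langle \mathsf{Any}\ \ i\ \ \mathsf{inj}(R, \pi') \rangle & \text{when } \pi = \langle i\rangle\,\pi' \end{cases}$$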

For applying the injection operator, we need to know the types of the elements onto which the relation is injected, i.e. in order to "fill" the unknown relations for fields or constructors with $\mathsf{Any}$, we need to know which those fields or constructors are. Thus, in practice, we need to connect the types to the context.
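The two operators can be illustrated on structures only. The sketch below rests on several assumptions that do not come from the original text: relations are either the atoms `"Equal"` and `"Any"` or a Python dict from field names to relations; extracting a field from an atomic relation is assumed to return that same atom; and the type information required by injection is passed as a hypothetical `shape` argument, a nested dict listing the fields at each level (`None` for a leaf). Variants and arrays are left out.

```python
ANY, EQUAL = "Any", "Equal"


def proj(path, rel):
    """Project `path` on `rel` by descending one field per path step."""
    for step in path:
        if isinstance(rel, dict):
            rel = rel[step]  # field extraction; raises KeyError if undefined
        # Assumption: on an atomic relation the extraction returns the atom.
    return rel


def inj(rel, path, shape):
    """Lift `rel` to a level closer to the root, filling siblings with Any."""
    if not path:
        return rel
    head, rest = path[0], path[1:]
    if head not in shape:
        raise KeyError(head)  # injection is partial: the field must exist
    return {f: (inj(rel, rest, shape[f]) if f == head else ANY) for f in shape}


abc = {"a": None, "b": None, "c": None}  # hypothetical structure with 3 fields
lifted = inj(EQUAL, ("b",), abc)
print(lifted)
print(proj(("b",), lifted))
```

Projecting along the same path after an injection returns the original relation, which is the round trip that the two examples in the text (obtaining $R_b$, then rebuilding the three-field relation around it) describe.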

Aligning a correlation map $\kappa \in \mathcal{K}$ to $(\pi', \rho')$ amounts to performing this operation for each element $(\pi, \rho) \mapsto R$ of $\kappa$ and intersecting the results with the $\wedge_R$ operator (Definition 7.2.4).

Definition 7.3.9. Aligning Correlation Maps

$$\mathsf{align}\big(\kappa,\ (\pi', \rho')\big) = \bigwedge_{R}\ \Big\{ R^{(\pi,\rho)}_{(\pi',\rho')} \ \Big|\ [(\pi, \rho) \mapsto R] \in \kappa \Big\}$$

The obtained results $R^{(\pi,\rho)}_{(\pi',\rho')}$ are intersected in order to take into account all the information scattered across the different elements of $\kappa$, and thus to obtain the most precise partial equivalence relation that is contained in $\kappa$ about the elements identified by $(\pi', \rho')$.
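Definitions 7.3.6 and 7.3.9 can be combined into one small, self-contained sketch, again restricted to structures. Everything about the encoding is an assumption made for illustration: paths are tuples of strings, relations are `"Equal"`, `"Any"` or dicts of fields, `shape` is a hypothetical description of the fields at the target level, and the `meet` function is a simplified stand-in for $\wedge_R$ (Definition 7.2.4) in which `Any` is neutral and `Equal` is the most precise relation.

```python
ANY, EQUAL = "Any", "Equal"


def match(pi, rho):
    if pi == rho:
        return ("Identical", ())
    if rho[:len(pi)] == pi:
        return ("Left", rho[len(pi):])
    if pi[:len(rho)] == rho:
        return ("Right", pi[len(rho):])
    return ("Incompatible", ())


def proj(path, rel):
    for step in path:
        if isinstance(rel, dict):
            rel = rel[step]
    return rel


def inj(rel, path, shape):
    if not path:
        return rel
    return {f: (inj(rel, path[1:], shape[f]) if f == path[0] else ANY)
            for f in shape}


def meet(r1, r2):
    """Simplified stand-in for the meet of partial relations."""
    if r1 == ANY:
        return r2
    if r2 == ANY:
        return r1
    if r1 == EQUAL or r2 == EQUAL:
        return EQUAL
    return {f: meet(r1[f], r2[f]) for f in r1}


def align_rel(key, rel, target, shape):
    """Align one correlation key -> rel to the pair of paths `target`."""
    (pi, rho), (pi2, rho2) = key, target
    left, right = match(pi, pi2), match(rho, rho2)
    if left != right:
        return ANY                    # the links differ: approximate
    kind, sigma = left
    if kind == "Identical":
        return rel
    if kind == "Left":                # the known paths are shallower
        return proj(sigma, rel)
    if kind == "Right":               # the target paths are shallower
        return inj(rel, sigma, shape)
    return ANY                        # incompatible paths


def align_map(kappa, target, shape=None):
    """Meet of the aligned relations of all correlations in `kappa`."""
    result = ANY
    for key, rel in kappa.items():
        result = meet(result, align_rel(key, rel, target, shape))
    return result


kappa = {
    (("f",), ()): {"a": EQUAL, "b": ANY, "c": EQUAL},
    (("f", "b"), ("b",)): EQUAL,
}
print(align_map(kappa, (("f", "b"), ("b",))))
```

In this example the first correlation alone only yields `Any` for the pair `(f b, b)`; the second one, stored under a deeper key, supplies `Equal`. Taking the meet of both aligned relations is what recovers the information scattered across the map. An empty map aligns to `Any`, which is consistent with the empty correlation map being the least precise one.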

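Finally, we can define the preorder for correlation maps.

Definition 7.3.10. Correlation Maps Preorder $\sqsubseteq$

$$\kappa_1 \sqsubseteq \kappa_2 \iff \forall [(\pi, \rho) \mapsto R] \in \kappa_2,\ \ \mathsf{align}\big(\kappa_1, (\pi, \rho)\big) \sqsubseteq_R R$$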

A correlation map $\kappa_1$ is therefore more precise than another correlation map $\kappa_2$ if the relation obtained by aligning $\kappa_1$ to any pair of paths $(\pi, \rho)$ of $\kappa_2$ is more precise than $R$, the relation mapped to this pair in $\kappa_2$. By definition, any correlation map $\kappa \in \mathcal{K}$ is smaller than $\emptyset$, the empty correlation map. Therefore, the empty correlation map is the top element of the correlation maps semilattice. A bottom element does not make sense in this case, as it would have to map to $\mathsf{Equal}$ any pair of paths denoting (sub)elements having compatible types.

The join operation defined between two correlation maps is denoted by $\vee$.

Definition 7.3.11. Join Operation $\vee$ for Correlation Maps

$$\kappa_1 \vee \kappa_2 = \kappa_3 \iff \forall [(\pi, \rho) \mapsto R] \in \kappa_1,\ \ \kappa_3(\pi, \rho) = R \vee_R \mathsf{align}\big(\kappa_2, (\pi, \rho)\big)$$

It consists in aligning the correlation map $\kappa_2$ to any correlation $(\pi, \rho) \mapsto R$ in $\kappa_1$ and joining the obtained aligned relation with $R$. We note that the correlation map obtained by joining $\kappa_1$ and $\kappa_2$ will contain the same keys as $\kappa_1$. We could have expressed join by aligning the first correlation map to the elements of the second map. This would lead to results that have different forms, e.g. $(\varepsilon, \varepsilon) \mapsto \{f \mapsto R\}$ versus $(f, f) \mapsto R$, but which are equivalent by definition.

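The meet operation between two correlation maps is denoted by $\wedge$.

Definition 7.3.12. Meet Operation $\wedge$ for Correlation Maps

$$\kappa_1 \wedge \kappa_2 = \kappa_3 \iff \forall (\pi, \rho),\ \ \kappa_3(\pi, \rho) = \begin{cases} R \wedge_R R' & \text{when } (\pi, \rho) \mapsto R \in \kappa_1 \text{ and } (\pi, \rho) \mapsto R' \in \kappa_2 \\ R & \text{when } (\pi, \rho) \mapsto R \in \kappa_1 \\ R' & \text{when } (\pi, \rho) \mapsto R' \in \kappa_2 \end{cases}$$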

7.4 Intraprocedural Correlation Analysis

7.4.1 Intraprocedural Correlation Summaries and Analysis

As was the case for the dependency analysis presented in Chapter 5, we are working with a control flow graph (CFG) representation of the predicates' bodies. We recall that nodes represent program states and that edges are defined by statements with a particular exit label $\lambda$. In our case, all the outgoing edges of a node $n$ bear the different cases of the same statement $s$ found at the program point $n$. For each statement $s$, there is an edge labeled $(s, \lambda_k)$ for each of its possible exit labels $\lambda_k$ (as discussed in Section 4.2). However, similarly to the dependency analysis, our correlation analysis does not depend on this specificity.

Intraprocedurally, correlation information has to be kept at each point of the control flow graph, for each input and output pair of the node.

Definition 7.4.1. Intraprocedural Correlation Summaries. An intraprocedural correlation summary $K \in \mathbb{K}$ is a mapping from pairs of variables $v \in V$ to correlation maps:

$$K : V \times V \rightarrow \mathcal{K}$$

There is one special case, called NoCorrelation, which associates $\mathsf{Any}$, the least precise partial relation, to any pair of variables on any pair of valid compatible paths. It is the top element at the intraprocedural level. Unreachable is used for nodes that cannot be reached, as its name implies, and constitutes the bottom element at the intraprocedural level.

For each node of a given control flow graph, $K(e, o)$ retrieves the correlation map between the local variable $e$ and the output variable $o$. If a mapping for $e$ and $o$ does not currently exist, $K(e, o)$ retrieves the correlation $(\varepsilon, \varepsilon) \mapsto \mathsf{Equal}$ when $e = o$, or the empty correlation map $\emptyset$ otherwise.

Establishing the partial order $\sqsubseteq_K$ and the join operation $\vee_K$ is straightforward: $\sqsubseteq$ (Definition 7.3.10) and $\vee$ (Definition 7.3.11) are extended pointwise to an intraprocedural summary, for each ordered input-output pair and its associated correlation map.

Definition 7.4.2. Partial Order for Intraprocedural Correlation Summaries

$$\sqsubseteq_K\ \subseteq\ \mathbb{K} \times \mathbb{K} \qquad K_1 \sqsubseteq_K K_2 \iff \forall e, o \in V,\ \ K_1(e, o) \sqsubseteq K_2(e, o)$$

Definition 7.4.3. Join Operation for Intraprocedural Correlation Summaries

$$\vee_K : \mathbb{K} \times \mathbb{K} \rightarrow \mathbb{K} \qquad K_1 \vee_K K_2 = K_3 \iff \forall (e, o),\ \ K_3(e, o) = K_1(e, o) \vee K_2(e, o)$$

Our correlation analysis is a backward data-flow analysis, computing an intraprocedural summary at each point of the control flow graph. This represents the correlations at the node's entry point. For each exit label, it traverses the control flow graph starting with its corresponding exit node. The intraprocedural summary for the currently analysed label is initialized with pairs between the local value of each associated output variable of the label and the final value of the same output variable, mapped to $(\varepsilon, \varepsilon) \mapsto \mathsf{Equal}$. The analysis traverses the control flow graph and gradually refines the correlations using Kildall's worklist algorithm (Kildall 1973) until a fixed point is reached. Table 7.12 summarizes the representation and general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural summaries of the statement's successor nodes. The intraprocedural summary at the entry point of the node is obtained by joining the contributions of each outgoing edge.
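The overall shape of such a backward worklist iteration can be sketched generically. The code below is not the correlation analysis itself: to stay short it uses a much simpler abstract domain (sets of variable names joined by union), and all names (`backward_fixpoint`, `succs`, `transfer`, `defs`, `uses`) are hypothetical. It only shows how the summary at the entry of a node is recomputed by joining the contributions of its outgoing edges, and how predecessors are rescheduled until nothing changes.

```python
from collections import deque


def backward_fixpoint(succs, transfer, join, bottom, exit_values):
    """Generic backward worklist iteration over a control flow graph.

    succs:       node -> list of (label, successor) outgoing edges
    transfer:    (node, label, successor_value) -> contribution of that edge
    join:        least upper bound of two abstract values
    bottom:      identity element of join
    exit_values: initial abstract values of the exit nodes
    """
    preds = {}
    for node, edges in succs.items():
        for _, succ in edges:
            preds.setdefault(succ, set()).add(node)

    state = {node: bottom for node in succs}
    state.update(exit_values)
    work = deque(node for node in succs if node not in exit_values)

    while work:
        node = work.popleft()
        new = bottom
        for label, succ in succs[node]:
            new = join(new, transfer(node, label, state[succ]))
        if new != state[node]:
            state[node] = new
            work.extend(preds.get(node, ()))  # predecessors must be revisited
    return state


# Toy instance: which variables does the final value of `o` flow from?
# Node 1: a = inp    Node 2: o = a    Node 3: exit on label true
succs = {1: [("true", 2)], 2: [("true", 3)], 3: []}
defs = {1: {"a"}, 2: {"o"}}
uses = {1: {"inp"}, 2: {"a"}}


def transfer(node, label, value):
    if value & defs[node]:
        return (value - defs[node]) | uses[node]
    return value


result = backward_fixpoint(succs, transfer, lambda x, y: x | y,
                           frozenset(), {3: frozenset({"o"})})
print(result)
```

Termination of this scheme relies on the transfer functions being monotone and on the abstract domain having no infinite ascending chains; the toy domain satisfies both.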

Definition 7.4.4. The contribution of an edge $(n, n_i)$ labeled with $s$ and $\lambda_i$ is given by $C_{s,\lambda_i}(K_{n_i}) \in \mathbb{C}$, where $C_{s,\lambda_i}()$ is the transfer function of the edge labeled $(s, \lambda_i)$.

We note that there are four statements supported by αSmil, i.e. the equality test, no-operation, the partial structure equality test and the possible variant test, that have no write effects and thus have no contribution of their own; they are not included in Table 7.12. Excepting the no-operation statement, the correlation information at their entry point is obtained by simply joining the intraprocedural summaries of their successor nodes on the true and false exit labels. For the no-operation statement, the correlation information at the entry point is identical to the intraprocedural summary of its only successor node, the one on the true exit label.

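Table 7.12 – Statements – Representations and Data-Flow Equations

Representation: a node $n$ holding the statement $s$, with one outgoing edge labeled $(s, \lambda_i)$ towards each successor node $n_i$, for $1 \le i \le k$. $K_n$ is the summary at the entry point of $n$ and $K_{n_i}$ the summary of the successor $n_i$.

Equation:

$$K_n = \bigvee_K\ \Big\{ C_{s,\lambda_i}(K_{n_i}) \ \Big|\ n \xrightarrow{\ s,\,\lambda_i\ } n_i \Big\}$$

| Statement | Form | $c_{s,\lambda}$ | $kill_\lambda$ |
|---|---|---|---|
| Assignment | $o = e$ | $(e, o) \mapsto [(\varepsilon, \varepsilon) \mapsto \mathsf{Equal}]$ | $\{o\}$ on true |
| New Struct | $r = \{e_1, \ldots, e_n\}$ | $\forall i,\ 1 \le i \le n$: $(e_i, r) \mapsto [(\varepsilon, f_i) \mapsto \mathsf{Equal}]$ | $\{r\}$ on true |
| Destructure | $\{o_1, \ldots, o_n\} = r$ | $\forall i,\ 1 \le i \le n$: $(r, o_i) \mapsto [(f_i, \varepsilon) \mapsto \mathsf{Equal}]$ | $\{o_i\}$ on true |
| Get Field | $o = r.f_i$ | $(r, o) \mapsto [(f_i, \varepsilon) \mapsto \mathsf{Equal}]$ | $\{o\}$ on true |
| Set Field | $r' = \{r \text{ with } f_i = e\}$ | $(r, r') \mapsto [(\varepsilon, \varepsilon) \mapsto \{f_1 \mapsto \mathsf{Equal};\ \ldots;\ f_i \mapsto \mathsf{Any};\ \ldots;\ f_n \mapsto \mathsf{Equal}\}]$ and $(e, r') \mapsto [(\varepsilon, f_i) \mapsto \mathsf{Equal}]$ | $\{r'\}$ on true |
| Create Var | $v = C_p[e]$ | $(e, v) \mapsto [(\varepsilon, C_p e) \mapsto \mathsf{Equal}]$ | $\{v\}$ on true |
| Var Switch | $\text{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n]$ | $(v, o_i) \mapsto [(C_i e, \varepsilon) \mapsto \mathsf{Equal}]$ | $\{o_i\}$ on $\lambda_{C_i}$ |
| Array Get | $o = a[i]$ | $(a, o) \mapsto [(\langle i\rangle, \varepsilon) \mapsto \mathsf{Equal}]$ | $\{o\}$ on true |
| Array Set | $a' = [a \text{ with } i = e]$ | $(a, a') \mapsto [(\varepsilon, \varepsilon) \mapsto \langle \mathsf{Equal}\ \ i\ \ \mathsf{Any} \rangle]$ and $(e, a') \mapsto [(\varepsilon, \langle i\rangle) \mapsto \mathsf{Equal}]$ | $\{a'\}$ on true |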

The transfer function $C_{s,\lambda}()$ formalizes the correlations created by the statement $s$ on the label $\lambda$ between its local input variables and its local output variables, denoted by $c_{s,\lambda}$, as well as the set $kill_\lambda$ of variables whose values have been redefined by the statement $s$ on the label $\lambda$. These are shown in Table 7.12. There is one crucial difference between transfer functions $C_{s,\lambda}()$ and intraprocedural summaries $K$. An intraprocedural summary $K$ implicitly maps any pair $(v, v)$, for $v \in V$, to $(\varepsilon, \varepsilon) \mapsto \mathsf{Equal}$. On the contrary, in $c_{s,\lambda}$, when the variable $v$ is used as both input and output by the statement $s$, the pair $(v, v)$ is mapped to the correlation map known between the old value of the input $v$ and the fresh value of the output $v$. Otherwise, when $v$ is an output, i.e. $v \in kill_\lambda$, but not an input of $s$, $(v, v)$ is mapped to $\emptyset$. We remark that $K$ represents a state, while $c_{s,\lambda}$ represents a transition.

In order to obtain the contribution $C_{s,\lambda_i}(K_{n_i})$ of an edge labeled with $s$ and $\lambda_i$, we need to connect the information given by $c_{s,\lambda_i}$ to the information contained in the intraprocedural summary $K_{n_i}$. For example, at the entry of node 3 in Figure 7.1 (on page 138), when considering the scenario in which the predicate exits with true, the intraprocedural summary contains the mapping

$$(th, o) \mapsto \big[(\mathit{Somet},\ \mathit{threads}\langle i\rangle\mathit{Somet}) \mapsto \{\mathit{identifier} \mapsto \mathsf{Equal};\ \mathit{current\_state} \mapsto \mathsf{Any};\ \mathit{stack} \mapsto \mathsf{Equal}\}\big]$$

On the true edge, statement 2 creates the mapping

$$(ta, th) \mapsto [(\langle i\rangle, \varepsilon) \mapsto \mathsf{Equal}]$$

Intuitively, since we are traversing the graph backwards and we are mapping ordered (local) input-output pairs, $(ta, th)$ and $(th, o)$ can be seen as a def-use pair: the correlation associated to $(ta, th)$ expresses the relation between the defined value of $th$ and the input $ta$ used for creating it, while the correlation associated to $(th, o)$ shows a subsequent use of that value of $th$ for creating $o$. The contribution of statement 2 on the true edge should capture this flow of the value of $ta$ to the value of $o$ through the variable $th$. Thus, it should contain a mapping for the pair $(ta, o)$. In the general case, we need to detect any variable $r$ such that $[(p, r) \mapsto \kappa] \in c_{s,\lambda_i}$ and $[(r, q) \mapsto \kappa'] \in K_{n_i}$, and compute the mapping for $(p, q)$ in $C_{s,\lambda_i}(K_{n_i})$.

In order to compute the correlation map associated to $(ta, o)$, we take into account the fact that both the right path $\varepsilon$ of $c_{s,\lambda}(ta, th)$ and the left path $\mathit{Somet}$ of $K_{n_3}(th, o)$ refer to the $th$ variable. However, they do not represent traversals of the same depth: $\varepsilon$ refers to the entire value of $th$, while $\mathit{Somet}$ refers to the value below the constructor Some. Between $ta$ and $o$, we can conclude that the values nested under the Some constructor of the $i$-th elements are related:

$$(ta, o) \mapsto \big[(\langle i\rangle\mathit{Somet},\ \mathit{threads}\langle i\rangle\mathit{Somet}) \mapsto \{\mathit{identifier} \mapsto \mathsf{Equal};\ \mathit{current\_state} \mapsto \mathsf{Any};\ \mathit{stack} \mapsto \mathsf{Equal}\}\big]$$

We call composition the process of obtaining the correlation map associated to $(ta, o)$ from the correlations associated to $(ta, th)$ and $(th, o)$.

In the general case, the composition operation is denoted by $\circ$ and refers to the process of computing the flow of a variable $p$ to a variable $q$ through an intermediate variable $r$. Thus, when knowing that $(p, r) \mapsto [(\pi, \rho) \mapsto R]$ and that $(r, q) \mapsto [(\pi', \rho') \mapsto R']$, we must first obtain the link (Definition 7.3.3) between the paths $\rho$ and $\pi'$, relating subvalues of $r$ to subvalues of $p$ and $q$ respectively. This is obtained with the matching operator (Definition 7.3.4). In the context of the example given above, $\rho$ and $\pi'$ are the paths referring to subvalues of the $th$ variable, i.e. $\varepsilon$ and $\mathit{Somet}$ respectively. If the two paths are incompatible, i.e. they refer to different, unrelated subvalues of $r$, there is no flow between $p$ and $q$ through $r$. If the paths are compatible, we can compute the correlation between $p$ and $q$ by distinguishing between the three different possible link cases returned by the matching operator.

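The case when the same subvalue of $r$, identified by $\rho$ (and the identical $\pi'$), is related to both $p$ and $q$ is depicted below.

[Diagram: link Identical between $\rho$ and $\pi'$. The subvalue $\pi$ of $p$ is related by $R$ to the subvalue $\rho = \pi'$ of $r$, which is related by $R'$ to the subvalue $\rho'$ of $q$.]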

In this case, computing the flow from $p$ to $q$ through $r$ is rather straightforward. Since the same subvalue of $r$ is related to the subvalue of $p$ identified by $\pi$ and to the subvalue of $q$ identified by $\rho'$, we can relate these two subvalues and map the pair $(\pi, \rho')$ to the relation obtained by composing $R$ and $R'$. We note that, given the special form of partial relations $R \in \mathcal{R}$, the compose operation at this level is equivalent to $\vee_R$¹ (Definition 7.2.3). The computation of the correlation for $p$ and $q$ is depicted below.

[Diagram: link Identical between $\rho$ and $\pi'$. The pair $(\pi, \rho')$ relating $p$ and $q$ is mapped to $R \vee_R R'$.]

¹ However, this would not be the case anymore for a more complex partial relation type, including not only equivalences but also more general relations.

The subelements of $r$ related to $p$ and to $q$ respectively can also have different granularities, one being nested deeper in $r$ than the other. For instance, the subvalue of $r$ identified by the path $\rho$ can be closer to the root than its subelement identified by $\pi'$, related to $q$. This case is depicted below.

[Diagram: link $\mathsf{Left}\ \sigma$ between $\rho$ and $\pi'$. $R$ relates the subvalue $\pi$ of $p$ to the subvalue $\rho$ of $r$; $R'$ relates the deeper subvalue $\pi' = \rho\,\sigma$ of $r$ to the subvalue $\rho'$ of $q$.]

In this case, we can only detect the flow of $p$ to $q$ at the level of the subelement of $r$ that is related to both $p$ and $q$, i.e. the subelement nested deeper. Thus, in order to compute the correlation between $p$ and $q$, we need to project $\sigma$ on $R$ and to compose the obtained relation with $R'$. This is summarized by the following figure.

[Diagram: link $\mathsf{Left}\ \sigma$ between $\rho$ and $\pi'$. The pair $(\pi\,\sigma, \rho')$ relating $p$ and $q$ is mapped to $\mathsf{proj}(\sigma, R) \vee_R R'$.]

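Finally, in the complementary case, the subvalue of $r$ identified by the path $\rho$ and correlated to $p$ can be nested deeper than the subvalue identified by $\pi'$, which is correlated to $q$. This case is depicted below.

[Diagram: link $\mathsf{Right}\ \sigma$ between $\rho$ and $\pi'$. $R$ relates the subvalue $\pi$ of $p$ to the deeper subvalue $\rho = \pi'\,\sigma$ of $r$; $R'$ relates the subvalue $\pi'$ of $r$ to the subvalue $\rho'$ of $q$.]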

As in the previous case, we can only detect the flow of $p$ to $q$ at the level of the subelement of $r$ that is related to both $p$ and $q$, i.e. the subelement nested deeper. In this case, we need to project $\sigma$ on $R'$ and to compose the obtained relation with $R$. The flow between $p$ and $q$ is at the level of the subvalues identified by $\pi$ and $\rho'\,\sigma$ respectively. This is illustrated below.

[Diagram: link $\mathsf{Right}\ \sigma$ between $\rho$ and $\pi'$. The pair $(\pi, \rho'\,\sigma)$ relating $p$ and $q$ is mapped to $R \vee_R \mathsf{proj}(\sigma, R')$.]

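Formally, if the $\rho$ and $\pi'$ paths are compatible, we compose the correlation elements $(\pi, \rho) \mapsto R$ and $(\pi', \rho') \mapsto R'$, thereby obtaining a new correlation element $(\pi_\bullet, \rho_\bullet) \mapsto R_\bullet$, which is computed as shown below.

Definition 7.4.5. Computing $(\pi_\bullet, \rho_\bullet) \mapsto R_\bullet$

$$(\pi_\bullet, \rho_\bullet) = (\pi, \rho) \bullet (\pi', \rho') \overset{def}{=} \begin{cases} (\pi,\ \rho') & \text{when } \mathsf{match}(\rho, \pi') = \mathsf{Identical} \\ (\pi\,\sigma,\ \rho') & \text{when } \mathsf{match}(\rho, \pi') = \mathsf{Left}\ \sigma \\ (\pi,\ \rho'\,\sigma) & \text{when } \mathsf{match}(\rho, \pi') = \mathsf{Right}\ \sigma \end{cases}$$

$$R_\bullet = R \circ R' \overset{def}{=} \begin{cases} R \vee_R R' & \text{when } \mathsf{match}(\rho, \pi') = \mathsf{Identical} \\ \mathsf{proj}(\sigma, R) \vee_R R' & \text{when } \mathsf{match}(\rho, \pi') = \mathsf{Left}\ \sigma \\ R \vee_R \mathsf{proj}(\sigma, R') & \text{when } \mathsf{match}(\rho, \pi') = \mathsf{Right}\ \sigma \end{cases}$$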
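The composition of two correlation elements can be sketched as follows, replaying the $(ta, th)$ and $(th, o)$ example from the text. The encoding is an assumption made for illustration only: paths are tuples of strings, the path written Somet in the text is encoded as the single step `"Some"`, relations are `"Equal"`, `"Any"` or dicts of fields, and `join` is a simplified stand-in for $\vee_R$ (Definition 7.2.3) in which `Equal` is neutral and `Any` is absorbing.

```python
ANY, EQUAL = "Any", "Equal"


def match(pi, rho):
    if pi == rho:
        return ("Identical", ())
    if rho[:len(pi)] == pi:
        return ("Left", rho[len(pi):])
    if pi[:len(rho)] == rho:
        return ("Right", pi[len(rho):])
    return ("Incompatible", ())


def proj(path, rel):
    for step in path:
        if isinstance(rel, dict):
            rel = rel[step]
    return rel


def join(r1, r2):
    """Simplified stand-in for the join of partial relations."""
    if r1 == ANY or r2 == ANY:
        return ANY
    if r1 == EQUAL:
        return r2
    if r2 == EQUAL:
        return r1
    return {f: join(r1[f], r2[f]) for f in r1}


def compose(elem1, elem2):
    """Compose (pi, rho) -> R with (pi', rho') -> R'; None if no flow."""
    ((pi, rho), r1), ((pi2, rho2), r2) = elem1, elem2
    kind, sigma = match(rho, pi2)
    if kind == "Identical":
        return (pi, rho2), join(r1, r2)
    if kind == "Left":    # rho is shallower than pi': project sigma on R
        return (pi + sigma, rho2), join(proj(sigma, r1), r2)
    if kind == "Right":   # pi' is shallower than rho: project sigma on R'
        return (pi, rho2 + sigma), join(r1, proj(sigma, r2))
    return None           # incompatible paths: no flow through r


thread_rel = {"identifier": EQUAL, "current_state": ANY, "stack": EQUAL}
ta_th = ((("<i>",), ()), EQUAL)
th_o = ((("Some",), ("threads", "<i>", "Some")), thread_rel)
print(compose(ta_th, th_o))
```

Under this encoding the result relates the path `<i> Some` of `ta` to the path `threads <i> Some` of `o` with the thread relation unchanged, which corresponds to the mapping for $(ta, o)$ given in the text.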

We note that the use of the projection operation (Definition 7.3.7) for both compatible, non-identical link cases for the access paths of $r$ related to $p$ and to $q$ respectively is a consequence of not choosing a canonical form for correlations. The flexibility conferred by the absence of a canonical correlation form is visible at the composition level.

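The composition of correlation maps is denoted by $\circ$ and defined below.

Definition 7.4.6. Composition of Correlation Maps. Computing $\kappa_1 \circ \kappa_2$ amounts to intersecting the composition of all correlation elements from $\kappa_1$ and $\kappa_2$:

$$(\kappa_1 \circ \kappa_2)(\pi_\bullet, \rho_\bullet) = \bigwedge_{R}\ \Big\{ R \circ R' \ \Big|\ [(\pi, \rho) \mapsto R] \in \kappa_1,\ [(\pi', \rho') \mapsto R'] \in \kappa_2,\ (\pi_\bullet, \rho_\bullet) = (\pi, \rho) \bullet (\pi', \rho') \Big\}$$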

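Finally, the contribution $C_{s,\lambda_i}(K_{n_i})$ is obtained as defined below.

Definition 7.4.7. Contribution $C_{s,\lambda_i}(K_{n_i})$

$$\circ : \mathbb{C} \times \mathbb{K} \rightarrow \mathbb{K} \qquad c_{s,\lambda} \circ K = K' \ \text{ where } \ K'(p, q) = \bigwedge_{r} \big( c_{s,\lambda}(p, r) \circ K(r, q) \big)$$

It is depicted in Figure 7.5.

[Figure 7.5: a statement $s$ with transfer functions $(c_{s,\lambda_1}, \Delta_{\lambda_1}), \ldots, (c_{s,\lambda_n}, \Delta_{\lambda_n})$ on its outgoing edges; the summary at its entry point is the join $\vee_K$ of the compositions $c_{s,\lambda_1} \circ K_{\lambda_1}, \ldots, c_{s,\lambda_n} \circ K_{\lambda_n}$.]

Figure 7.5 – Entry Point – Correlation Information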

We conclude this section by specifying what it means for intraprocedural correlation summaries to be well-formed, showing the corresponding inference rule in Table 7.19. Only ordered input-output pairs can appear as keys in intraprocedural mappings. Therefore, the well-formedness judgement is parameterized by the set of input variables $I$ and by the set of output variables $O$. The former indicate variables that have the right to appear as left members of the variable pairs, while the latter indicate variables that have the right to appear as right members of the variable pairs. The correlation map associated to each such input-output pair must be well-typed with respect to the types of the variables, as given by the typing environment $\Gamma$ (Definition 4.3.1). The typing judgement for correlation maps was shown in Table 7.9.

$$\frac{\forall (e, o) \mapsto \kappa \in K \qquad \Gamma(e) = \tau_e \qquad \Gamma(o) = \tau_o \qquad e \in I \qquad o \in O \qquad \Gamma, I \vdash \kappa : (\tau_e, \tau_o)}{\Gamma, I, O \vdash K}\ \textsc{WFIntraCor}$$

Table 7.19 – Well-Formed Intraprocedural Correlation Summaries

7.4.2 Intraprocedural Correlation Analysis Illustrated

To better illustrate our correlation analysis at an intraprocedural level, and to summarize everything that has been presented so far in this chapter, we exemplify the mechanism behind it step by step on the predicate stop_thread, discussed in Section 7.1.1 on page 138. We consider the true execution scenario, apply our analysis, and compare the actual obtained correlation results with the targeted ones depicted in Figure 7.2.

Since a predicate can only exit with one label at a time and we are analysing the true label, we can map the exit node inval to the special case Unreachable. We begin by initialising the correlation summary for the exit node corresponding to the true exit label. As shown in Figure 7.6, this consists in mapping the pair referring to the local value of the $o$ variable and the final state of $o$ to a correlation map containing a single correlation, namely $(\varepsilon, \varepsilon) \mapsto \mathsf{Equal}$. This acknowledges that the value of the output $o$ returned to the predicate's callers is the most recent value computed locally. In the following, we denote the final value of $o$ by $\overline{o}$ in order to distinguish it from the local value.

[Figure 7.6: control flow graph of stop_thread with the following nodes:
  1  ta = in.threads
  2  th = ta[i]
  3  switch(th) as [ | ti]
  4  s = Blocked
  5  ti = {ti with current_state = s}
  6  th = Some(ti)
  7  ta = [ta with i = th]
  8  o = {in with threads = ta}
  9  exit node true, annotated with $(o, \overline{o}) \mapsto [(\varepsilon, \varepsilon) \mapsto \mathsf{Equal}]$
  10 exit node inval, annotated with Unreachable
Edges are labeled true, false and None.]

Figure 7.6 – Analysing Predicate stop_thread – Initialisation

We advance backwards along the control flow graph, reaching node 8. We apply the equation corresponding to a field update, as given in Table 7.12, and obtain the following correlation summary:

$$(in, o) \mapsto \big[(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathsf{Any};\ \mathit{pid} \mapsto \mathsf{Equal};\ \mathit{crt\_thread} \mapsto \mathsf{Equal};\ \mathit{adr\_space} \mapsto \mathsf{Equal}\}\big]$$

$$(ta, o) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \mathsf{Equal}]$$

We compose it with the correlation summary of its successor node, i.e. the exit node corresponding to the true exit label, thus detecting the flow of $in$ to $\overline{o}$ and of $ta$ to $\overline{o}$ respectively, through the local value $o$. This amounts to

$$(in, \overline{o}) \mapsto \big[(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathsf{Any};\ \mathit{pid} \mapsto \mathsf{Equal};\ \mathit{crt\_thread} \mapsto \mathsf{Equal};\ \mathit{adr\_space} \mapsto \mathsf{Equal}\}\big]$$

$$(ta, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \mathsf{Equal}]$$

Since node 8 does not have any other successor nodes, the correlation information at its entry point is identical to the one we have just computed.

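We advance one step, reaching node 7, and apply the corresponding equation, thereby obtaining

$$(ta, ta) \mapsto [(\varepsilon, \varepsilon) \mapsto \langle \mathsf{Equal}\ \ i\ \ \mathsf{Any} \rangle]$$

$$(th, ta) \mapsto [(\varepsilon, \langle i\rangle) \mapsto \mathsf{Equal}]$$

We compose it with the correlation summary of node 8, tracking the flow of the local value of $ta$ to $\overline{o}$ through the new state of the variable $ta$, after updating its $i$-th element. We also track the flow of $th$ to $\overline{o}$. The correlation map for the $(in, \overline{o})$ pair remains unchanged. We thus obtain

$$(in, \overline{o}) \mapsto \big[(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathsf{Any};\ \mathit{pid} \mapsto \mathsf{Equal};\ \mathit{crt\_thread} \mapsto \mathsf{Equal};\ \mathit{adr\_space} \mapsto \mathsf{Equal}\}\big]$$

$$(ta, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \langle \mathsf{Equal}\ \ i\ \ \mathsf{Any} \rangle]$$

$$(th, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}\langle i\rangle) \mapsto \mathsf{Equal}]$$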

In order to obtain the correlation information at the entry point of node 7, we need to join the computed correlation summary with the correlation summary known for the other successor of node 7, namely the exit node 10. Since the latter is Unreachable, the identity element for join at the intraprocedural level, it does not affect the correlation summary at the entry point of node 7. We proceed similarly for nodes 6, 5, 4, 3 and 2, applying the corresponding data-flow equation for each statement and composing with the intraprocedural correlation summary of the successor node. Since each of these nodes has only one possible exit label, there are no multiple contributions that need to be joined. At the entry point of node 6, for example, we obtain the following summary:

$$(ta, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \langle \mathsf{Equal}\ \ i\ \ \mathsf{Any} \rangle]$$

$$(ti, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}\langle i\rangle\mathit{Somet}) \mapsto \mathsf{Equal}]$$

$$(in, \overline{o}) \mapsto \big[(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathsf{Any};\ \mathit{pid} \mapsto \mathsf{Equal};\ \mathit{crt\_thread} \mapsto \mathsf{Equal};\ \mathit{adr\_space} \mapsto \mathsf{Equal}\}\big]$$

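We skip some steps and obtain the following correlation summary at the entry point of node 2:

$$(in, \overline{o}) \mapsto \big[(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathsf{Any};\ \mathit{pid} \mapsto \mathsf{Equal};\ \mathit{crt\_thread} \mapsto \mathsf{Equal};\ \mathit{adr\_space} \mapsto \mathsf{Equal}\}\big]$$

$$(ta, \overline{o}) \mapsto \left[\begin{array}{l} (\varepsilon, \mathit{threads}) \mapsto \langle \mathsf{Equal}\ \ i\ \ \mathsf{Any} \rangle \\ (\langle i\rangle\mathit{Somet},\ \mathit{threads}\langle i\rangle\mathit{Somet}) \mapsto \{\mathit{id} \mapsto \mathsf{Equal};\ \mathit{current\_state} \mapsto \mathsf{Any};\ \mathit{stack} \mapsto \mathsf{Equal}\} \end{array}\right]$$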

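Finally, we reach node 1, where we apply the data-flow equation corresponding to a field access and compose the obtained information with the correlation summary computed at the entry of node 2. We obtain

$$(in, \overline{o}) \mapsto \left[\begin{array}{l} (\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathsf{Any};\ \mathit{pid} \mapsto \mathsf{Equal};\ \mathit{crt\_thread} \mapsto \mathsf{Equal};\ \mathit{adr\_space} \mapsto \mathsf{Equal}\} \\ (\mathit{threads}, \mathit{threads}) \mapsto \langle \mathsf{Equal}\ \ i\ \ \mathsf{Any} \rangle \\ (\mathit{threads}\langle i\rangle\mathit{Somet},\ \mathit{threads}\langle i\rangle\mathit{Somet}) \mapsto \{\mathit{id} \mapsto \mathsf{Equal};\ \mathit{current\_state} \mapsto \mathsf{Any};\ \mathit{stack} \mapsto \mathsf{Equal}\} \end{array}\right]$$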

Since node 1 has only one successor node, this correlation summary represents the correlation information at the entry point of node 1, i.e. there is no other correlation summary to join it with. It contains a single pair of variables, $(in, \overline{o})$, and their associated correlation map. Since the pair is an input-output pair of the stop_thread predicate, we do not need to filter anything out. This constitutes the final correlation summary for the analysed predicate on the true exit label. These results are identical to the ones we had depicted as our targeted results in Figure 7.2.

For the inval exit label, the corresponding correlation summary is NoCorrelation. This example can be tried on the web page² dedicated to our correlation analysis. Other examples are provided and explained there as well. Additionally, users can devise and test their own examples.

² Correlation Analysis Web Page: http://www.ajl-demo.fr/2016

7.5 Interprocedural Correlation Analysis

Our analysis is performed label by label, and interprocedural correlation domains associate an intraprocedural summary to each exit label of the analysed predicate. Therefore, interprocedural domains encapsulate an intraprocedural summary for each possible execution scenario of a predicate.

An interprocedural domain $K_p$ of a predicate $p$ is thus defined as shown below.

Definition 7.5.1. Interprocedural Correlation Domain

$$K_p : \Lambda_p \rightarrow \mathbb{K}, \ \text{ where } \Lambda_p \text{ is the set of output labels of predicate } p$$

The intraprocedural summary associated to each label is filtered so as to contain only ordered pairs of variables where the left member is an input of the analysed predicate and the right member is an output associated to the analysed label. The correlation maps associated to such pairs are built so as to contain correlations where only input variables may appear in array cell paths. Similarly, the exception index in partial equivalence relations of arrays must be an input variable. Registering exceptions in array correlations only for input variables is not a consequence of a language restriction on array operations, but simply a consequence of the fact that, at the interprocedural level, only correlation information between inputs and outputs makes sense.

The interprocedural domain of a predicate is used for deducing the transfer functions for a predicate call statement.

In the following, we detail the equation corresponding to a call to a predicate

$$s = p(e_1, \ldots, e_n)\,[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]$$

having the following signature:

$$p(\varepsilon_1, \ldots, \varepsilon_n)\,[\lambda_1 : \omega_1 \mid \ldots \mid \lambda_m : \omega_m]$$

The general equation form given in Table 7.12 applies:

$$K_n = \bigvee_K\ \Big\{ C_{s,\lambda_i}(K_{n_i}) \ \Big|\ n \xrightarrow{\ s,\,\lambda_i\ } n_i \Big\}$$

The transfer functions for the predicate call statement are deduced from the predicate's interprocedural domain in the following fashion:

$$C_{s,\lambda_i}(K_{n_i}) = c_{s,\lambda_i} \circ K_{n_i} \qquad kill_{\lambda_i} = o_i$$

$$c_{s,\lambda_i}(e_j, o_i^k) = \kappa_{jki} \qquad \forall j \in 1..n,\ \forall k \in 1..h$$

where

$$\kappa_{jki} = K_p(\lambda_i)(\varepsilon_j, \omega_i^k)\ J(\varepsilon \mapsto e) \qquad s = p(e_1, \ldots, e_n)\,[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] \qquad o_i = o_i^1 \ldots o_i^h$$

Namely, the contribution of a predicate call to each $(e_j, o_i^k)$ input-output pair stems from the contribution of the interprocedural domain for label $\lambda_i$ and formal input-output pair $(\varepsilon_j, \omega_i^k)$. In these, all the formal input parameters $\varepsilon$ in array partial equivalences and in array cell paths are substituted by the corresponding effective input parameters from $e$, or approximated away. The substitution operation is denoted by $J(\chi)$, where $\chi$ is a substitution from formal to effective parameters.

Our correlation analysis is context-insensitive, and αSmil programs are analysed by computing, once and for all, an interprocedural correlation summary for every predicate they contain. The correlation summaries are stored in a mapping binding predicate identifiers to their interprocedural correlation information.

7.6 Extension – Constructor Evolution

The correlation analysis, as presented so far in this chapter, tracks and detects partial equivalence relations between inputs and outputs of predicates. An interesting direction to investigate would be an extension of our analysis allowing us to detect not only equivalences, but more general relations that could capture the evolution of constructors for variants. In Figure 7.4-b) we illustrated the form of correlations computed for variants. With the extension, the correlation information obtained for variants would be richer, as illustrated in Figure 7.7.

Figure 7.7 – Construction Evolution

This extension would allow inferring the preservation of certain properties when transitioning from a "stronger" state to a "weaker" state. For instance, we consider again our process and thread data types, introduced in Chapter 3, Section 3.1.5 (on pages 49 and 48 respectively). Additionally, we consider a predicate kill_thread, shown below, which modifies the array of associated threads of the input p by setting the i-th element to None. If the i-th element is already inactive, no modifications are made. In this case, the predicate exits with label inactive and simply copies p to the output o.

    predicate kill_thread (process p, int i)
    -> [true: process o | inactive: process o | oob]
    {
      array<option<thread>> threads;
      option<thread> thi;
      thread ti;

      o = p;                                    [true -> 1]
      threads = o.threads;                      [true -> 2]
      thi = threads[i];                         [true -> 3, false -> 9]
      switch (thi) as [ti | ];                  [Some -> 4, None -> 8]
      thi = None;                               [true -> 5]
      threads = [threads with i = thi];         [true -> 6, false -> 9]
      o = {o with threads = threads};           [true -> 7]
      [true]
      [inactive]
      [oob]
    }

For variants, we are currently detecting equivalence relations between the arguments of variant values built with the same constructor. With the extension for capturing constructor evolution, we could take a step further and also detect, for a given execution scenario, the set of possible transitions between the different constructors. For instance, for the kill_thread predicate on the true exit label, we could detect that the only possible transition of the i-th element of the threads array is from Some to None. Had the element been None, the predicate would have followed the inactive execution scenario.

We further consider a predicate disjoint_stacks(process p), verifying a fundamental property of any process, namely the fact that the stacks of all associated threads of the process are disjoint. If the property holds for the input process p prior to executing kill_thread, intuitively it should continue to hold subsequently for the output process o as well. If the array's i-th element was already inactive, i.e. None, the property disjoint_stacks obviously still holds, since the input p is simply copied to the output o. If it was active, the transition from Some to None does not impact the property, as it does not create a new memory region that could threaten the property. In this case, the transition from Some to None is a transition from a "stronger" state to a "weaker" state.

We have conducted preliminary experiments targeting the detection of such information, and these have led to promising results. Tracking general relations that capture evolution requires certain modifications, which are confined to the abstract partial relation type and to the data-flow equations concerning variants.

The abstract partial relation type presented in Section 72 (Definition 721) wouldneed to be extended with Impossible an additional atomic case along with Equal andAny It is required for signalling impossible transitions between variant constructors andleads to some overlap with the possible-constructors analysis presented in Chapter 5The partial relations for variants would be expressed as a square matrix of constructorswhere each element aCiCj of the matrix has a corresponding associated partial relationRCiCj Impossible would be associated to any element aCiCj for which the transitionfrom Ci to Cj is impossible For the elements aCiCi on the main diagonal for which thetransition from Ci to Ci is possible we could compute partial equivalences between thearguments of the Ci constructor For the elements aCiCj lying outside the main matrixdiagonal for which the transition from Ci to Cj is possible the associated relationwould be Any Alternatively for computing reflexive relations we could consider thattransitions on the main diagonal ie from Ci to Ci are always possible


Impossible would become the bottom element of our partial relation type R, replacing Equal in this role. It would also become the identity element for the join operation ∨_R (Definition 7.2.3) of partial relations and the absorbing element for the meet operation ∧_R (Definition 7.2.4). Similarly to the case of the abstract dependency type, the current bottom element Equal would become the middle element of a double diamond-shaped abstract type. This would require the addition of some extra comparison cases for ⊑_R (Definition 7.2.2), as well as some extra cases for the ∨_R (Table 7.2) and ∧_R (Table 7.3) operations. The most important modification, however, would concern the compose operation. Currently, the compose operation at the level of partial equivalence relations is ∨_R. With this extension, it would amount to a matrix multiplication.
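The following OCaml fragment is a minimal sketch of how this extension might look when restricted to atomic relations. It is not taken from the prototype: the names rel, join, compose, compose_matrix and kill are hypothetical, and the choice of using the join as the "sum" and an Impossible-absorbing compose as the "product" of the matrix multiplication is an assumption made for illustration.

```ocaml
(* Hypothetical atomic relations; Impossible is the proposed new bottom. *)
type rel = Impossible | Equal | Any

(* Join: Impossible is the identity element, Any is absorbing. *)
let join a b =
  match a, b with
  | Impossible, r | r, Impossible -> r
  | Equal, Equal -> Equal
  | _ -> Any

(* Composing two steps: an impossible step makes the whole path impossible;
   otherwise we fall back on the join (assumed, as for the current compose). *)
let compose a b =
  match a, b with
  | Impossible, _ | _, Impossible -> Impossible
  | _ -> join a b

(* Matrix-style composition over constructor transitions:
   join acts as the sum and compose as the product. *)
let compose_matrix m1 m2 =
  let n = Array.length m1 in
  Array.init n (fun i ->
      Array.init n (fun k ->
          let acc = ref Impossible in
          for j = 0 to n - 1 do
            acc := join !acc (compose m1.(i).(j) m2.(j).(k))
          done;
          !acc))

(* Rows and columns are indexed by [None; Some], as in kill_thread:
   None -> None keeps everything equal, Some -> None is possible,
   and nothing ever becomes Some. *)
let kill = [| [| Equal; Impossible |]; [| Any; Impossible |] |]
```

Under these assumptions, composing the kill matrix with itself yields the same matrix, which matches the intuition that killing a thread twice has the same effect on constructors as killing it once.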

7.7 Related Work

A rigorous presentation of the frame problem in specification, and of the different existing approaches for addressing it, has been given by Borgida et al. (Borgida, Mylopoulos, and Reiter 1993; Borgida, Mylopoulos, and Reiter 1995). A more recent overview of framing is included in (Hatcliff et al. 2012).

In recent years, a vast body of research has been conducted on the specification of frame properties in the context of modular programming. This ranges from complex approaches imposing the swinging pivots requirement (Leino and Nelson 2002), to approaches using data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002), adopting the Universe type system (Müller 2002; Müller, Poetzsch-Heffter, and Leavens 2003) or variations of it (Leino and Müller 2004; Leino and Müller 2006; Barnett and Naumann 2004; Barnett et al. 2004), to approaches based on the dynamic frame theory (Kassios 2006; Kassios 2011; Smans, Jacobs, and Piessens 2012), regional logic (Banerjee, Naumann, and Rosenberg 2008) or separation logic (Reynolds 2002; O'Hearn, Yang, and Reynolds 2004; Parkinson and Bierman 2005).

In (Smans, Jacobs, and Piessens 2012), Smans et al. present a technique for frame inference based on a variant of dynamic frames, inspired by separation logic and relying on accessibility information contained within pre- and postconditions. By including accessibility information in a method's precondition, an upper bound on the set of locations modifiable by the method can be detected. In our case, the upper bound on the set of elements that a predicate may modify when exiting with a particular exit label is implicitly the set of output variables generated on that exit label, joined with the set of local variables. The implicit dynamic frame approach requires the specification of accessibility information. Our correlation analysis is entirely automatic and infers fine-grained frame properties for compound data structures.

The literature on shape analysis (Calcagno et al. 2009; Sagiv, Reps, and Wilhelm 1999; Jones and Muchnick 1979; Montenegro, Peña, and Segura 2015) and side-effect analyses (Salcianu and Rinard 2005; Milanova, Rountev, and Ryder 2005) is vast. The former is aimed at deep-heap mutations, while we are focusing on deep-state modifications in the context of complex transition systems. The latter determine memory locations that may be modified by an operation. Reasoning about heap locations is beyond our scope. We treat mappings between variables and their values, analyse their evolution in a side-effect free environment, and detect not only what is modified but also how and to what extent.

In (Chang and Leino 2005), Chang and Leino present the congruence-closure abstract domain, designed for an object-oriented context and implemented in the Spec# program verifier. They infer and express relations between fields of variables, a goal similar to ours. The congruence-closure domain maintains equivalence graphs mapping field accesses to symbolic locations. On its own, this domain allows the inference and expression of relations for accessed fields. In order to take updates into account as well, it needs to use the heap succession domain as a base. Unlike us, they can express preorders between fields, depending on the base domains used. However, our domain handles both accesses and updates to structures, arrays and variants in a uniform manner, independently of additional information. We have sketched an extension for handling not only equivalences but also more general relations capturing constructor evolution. This is a direction we plan to investigate in the future.

Rakamarić and Hu report in (Rakamaric and Hu 2008) a method to infer frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve 2007). DSA provides a summary of points-to relations as a graph that is used to compute a set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e. in a purely automatic setting. In contrast, our analysis is designed for an interactive verification context. Our technique, focusing on a purely functional language, is not concerned with aliasing and does not depend on an external points-to framework.

In (Taghdiri, Seater, and Jackson 2006), Taghdiri et al. present a technique for extracting procedure summaries for object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. Their goal is broader than ours. However, unlike their summaries, our correlation results encompass only information that is visible from the outside (to the callers).

Bertrand Meyer presents the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer 1991). The first component, the frame specification inference, relies on the analysis of method postconditions. The idea stems from an informal review of JML code, which showed that in practice there is a considerable overlap between what is mentioned in an assignable clause, i.e. modifies clause, and what is included in the postcondition. It relies on the observation that, in general, when manually written specifications include clauses about what changes, they also include clauses about how it changes. By analysing the postcondition of a method p, a first set is obtained. This represents an over-approximation of the set of elements that are allowed to be modified by p according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder 2015), which is itself based on alias calculus (Kogtenkov, Meyer, and Velder 2015; Meyer 2010; Meyer 2011). Methods are analysed and a second set is detected for p; this represents an over-approximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second. Though our goal is closely related to the issue addressed by the double frame inference in general, and by the frame calculus in particular, the approaches are not directly comparable, as they target languages with different characteristics, which in turn influence both the adopted analysis techniques and the derivative targeted issues. Both approaches are conservative and automatic, i.e. neither requires manual annotations. In contrast to the frame calculus, our correlation analysis is standalone and it is not concerned with aliasing.

7.8 Conclusion

Identifying precise information concerning the effects of program operations is possible by means of static analysis, without sacrificing scalability. In this chapter we have presented a data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus detecting not only what is modified but also how it is modified and to what extent. The correlation analysis is a flow-sensitive, path-sensitive, interprocedural analysis that handles arrays, structures and variants. The analysis is context-insensitive, but this trait does not have a costly impact in terms of precision. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations, in order to compute expressive, fine-grained equivalences between parts of the inputs and parts of the outputs in a flexible manner. Just as frame properties specified by means of old expressions tend to lead to a proliferation of conditions to be specified, our correlation summaries showing equivalences between input and output subelements can become verbose in the case of predicates handling large compound values and modifying only a limited input subset. However, these are detected automatically, and their verbose form could easily be transformed using a more compact notation of the following form:

input ( - changed subelements) = output ( - corresponding subelements)

Detecting modifications is traditionally associated with shape analyses that focus on deep-heap mutations. Side-effect analyses detect memory locations that may be modified by an operation. We, however, are interested in deep-state modifications in the context of a functional language. Other analyses inferring frame properties have been devised. These are mostly used in a purely automatic setting. We, however, developed a correlation analysis meant to be used in an interactive verification context.

Similarly to the case of the dependency analysis presented in Chapter 5, we have implemented a prototype of the correlation analysis in OCaml and we have applied it to a functional specification of ProvenCore (Lescuyer 2015). Medium-sized experiments performed on the abstract layers of ProvenCore show encouraging results. For instance, the correlation results of approximately 630 αSmil predicates, totalling approximately 10,000 lines of code, are obtained in less than 0.5 seconds, i.e. faster than the dependency summaries are obtained on the same predicates. This is partly a consequence of the fact that, unlike the dependency analysis, which computes summaries for both code and specifications, the correlation analysis computes non-trivial results only for code. Specifications are predicates with Boolean exit labels, which generate no outputs. Since our correlation analysis computes fine-grained relations between parts of the inputs and parts of the outputs, it cannot detect anything non-trivial in their case. However, this would change if we were to extend our correlation analysis and track relations between parts of the inputs as well. This is a direction that we plan to investigate in the future. We will focus on the implementation and the discussion of the obtained results in Chapter 8. The prototype can be tested on the web page (see footnote 3) dedicated to our correlation analysis, where multiple examples are provided and explained. Additionally, users can devise and test their own examples.

The correlation analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen, and Lescuyer 2016).

3 Correlation Analysis Web Page: httpwwwajl-demofr2016


Chapter 8

Implementation, Application and Results

Any fact becomes important when it's connected to another.

Umberto Eco

In this chapter we focus mainly on the practical aspects regarding our static analyses and the approach to using their results for inferring the preservation of certain logical properties. In Section 8.1 and Section 8.2 we give a brief overview of the implementations of our dependency and correlation analyses, respectively. In Section 8.3 we succinctly present ProvenCore, one of the two microkernels developed at Prove & Run, and discuss, in terms of execution times and precision, the experiments we made on its functional specification. In Section 8.4 we describe the manner in which the summaries computed by our dependency and correlation analyses are meant to be combined and used for reasoning about the preservation of certain logical invariants. We illustrate this approach and discuss it on some examples inspired by ProvenCore.

8.1 Implementation of the Dependency Analysis

Prototypes for our static analyses, namely the dependency analysis presented in Chapter 5, its extension with symbolic dependencies presented in Chapter 6, and the correlation analysis presented in Chapter 7, have been implemented in OCaml (Rémy and Vouillon 1997). While trying to retain close proximity to the analyses as presented theoretically, their implementation mildly diverges from them at certain points, due to performance and scalability considerations. One of the main differences is related to the manner in which we store dependencies and partial equivalence relations. When considering complex transition systems, the states are in general characterized by properties depending only on a limited subset of their subelements, while most transitions modify only a limited subset of the input state's subelements. Based on this observation, we adopt a more compact representation. This, in turn, is reflected in some of the operators as well.


8.1.1 Dependency Type and Operators

The abstract dependency type δ, which mirrors the structure of associative arrays and algebraic data types, was introduced in Section 5.2. It is implemented by the recursive type dep shown below.

(* Implementation for the dependency type
   introduced in Section 5.2 *)
type dep =
  | Everything                          (* top *)
  | Impossible                          (* bottom *)
  | Nothing
  | Deferred of accesses                (* symbolic *)
  | Struct of struct_typ * dep FMap.t
  | Variant of var_typ * dep CMap.t
  | Array of dep * (var * dep) option

The maps used for expressing dependencies for structures and variants use fields and constructors, respectively, as keys.

type field
module FMap : EMap.S with type key = field

type cons
module CMap : EMap.S with type key = cons

In contrast to the extended abstract dependency type δ (Definition 6.4.1), the actual dependency for structures stores, in addition to the map associating dependencies to fields, the type struct_typ of the structure as well. Similarly, the actual dependency for variants stores the variant's type var_typ, in addition to the map associating dependencies to constructors.

As previously mentioned, we are targeting complex transition systems such as operating systems and microkernels. In practice, transitions frequently map a large input state to a large output state, but for computing the output state they are concerned only with a limited subset of the input state. The number of subelements of a complex input on which the outcome of a predicate depends tends to be low compared to the total number of input subelements. We therefore filter out of dependencies for structures the fields mapped to the "nothing" dependency, denoted by Nothing in our implemented dependency type. Similarly, we filter out of dependencies for variants the constructors mapped to ⊥, denoted by Impossible in our implemented dependency type.

As a consequence of this optimization, we need to know, and hence store, the types of structures and variants in order to correctly compare, join and reduce dependencies corresponding to such types. In addition, this is also useful for checking that the constructed dependencies are well-typed.


For building dependencies of the corresponding type we have implemented smart constructors. The dependency type is private, and new dependencies can be constructed only by using the provided smart constructors.

As explained in Section 5.2, ⊤ and ⊥ can apply to any type. For instance, ⊤ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to ⊤, are transformed to ⊤. The ⊥ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to ⊥. A whole structure or array is impossible if any of its subelements is impossible. These canonizations (see footnote 1) are made by our smart constructors. For instance, the smart constructor for structure dependencies returns Everything if it receives as an input a map of fields in which each key is mapped to Everything. Since fields that are absent from a field map must be interpreted as being mapped to Nothing, before returning Everything the constructor also verifies that the map of fields it received as an input contains all the fields of the structure type struct_typ, which is given as an input as well. If the given map of fields contains an Impossible value, the smart constructor returns Impossible. Any mapping field ↦ Nothing is filtered from the given input map.
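The canonization rules for structures can be sketched as follows. This is a simplified illustration and not the prototype's code: fields are assumed to be plain strings, struct_typ is assumed to be the list of field names, the dependency type is cut down to four cases, and the name mk_struct is hypothetical.

```ocaml
module FMap = Map.Make (String)

(* Assumption: a structure type is just the list of its field names. *)
type struct_typ = string list

type dep =
  | Everything
  | Impossible
  | Nothing
  | Struct of struct_typ * dep FMap.t

(* Smart constructor: canonicalises before building a Struct value. *)
let mk_struct (ty : struct_typ) (m : dep FMap.t) : dep =
  if FMap.exists (fun _ d -> d = Impossible) m then Impossible
  else
    (* Fields mapped to Nothing are left implicit. *)
    let m = FMap.filter (fun _ d -> d <> Nothing) m in
    (* Everything is returned only if every declared field is present. *)
    let all_everything =
      List.for_all (fun f -> FMap.find_opt f m = Some Everything) ty
    in
    if all_everything then Everything else Struct (ty, m)
```

Iterating over the declared fields of ty, rather than over the bindings of the map, is what ensures that an absent field (implicitly Nothing) prevents the collapse to Everything.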

Similarly, for variant dependencies, the corresponding smart constructor receives as inputs the variant's type and a map from constructor keys to dependency values. If all constructors of the variant, as indicated by its type var_typ, are present in the input map and mapped to Everything, the smart constructor returns Everything. If all constructors are present and mapped to Impossible, the smart constructor returns Impossible. Otherwise, if the input map contains some constructors mapped to Impossible, the corresponding mappings are filtered from the map used to build the variant dependency.

For arrays, the smart constructor returns Everything if both the default dependency and the known exceptional dependency are Everything, or if the former is Everything and there is no known exceptional dependency. If either of the two dependencies is Impossible, the smart constructor returns Impossible.
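The array case reduces to a short pattern match. The sketch below assumes a cut-down dependency type and string-named index variables; mk_array is a hypothetical name, not the prototype's.

```ocaml
type var = string

type dep =
  | Everything
  | Impossible
  | Nothing
  | Array of dep * (var * dep) option  (* default, optional exceptional cell *)

(* Smart constructor for array dependencies. *)
let mk_array (default : dep) (exc : (var * dep) option) : dep =
  match default, exc with
  (* Any impossible component makes the whole array impossible. *)
  | Impossible, _ | _, Some (_, Impossible) -> Impossible
  (* Uniformly needed arrays collapse to Everything. *)
  | Everything, (None | Some (_, Everything)) -> Everything
  | _ -> Array (default, exc)
```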

The smart constructor for deferred dependencies receives a set of variables as an input. If the given set is empty, the constructor returns Nothing. Otherwise, it creates the access map having the variables in the given input set, i.e. the root variables for symbolic paths, as keys. As described in Section 6.5, a set containing a single path, the empty path, is initially associated to each.

1 For making all the described canonizations, we have to make sure that whenever we replace δ by δ′, both δ ⊑ δ′ and δ′ ⊑ δ hold.

The ⊑ operator (Definition 5.2.2), as formally presented in Section 5.2 and detailed in Table 5.1, returns false whenever comparing two incompatible dependencies. In practice, situations in which comparisons on incompatible types are made should never be reached. As a consequence, whenever we compare structure or variant dependencies, we check, as a safety measure, that the two dependencies correspond to structures or variants of the same type. Otherwise, the two dependencies are not comparable, and we throw an exception that indicates that the types are incompatible. For structure dependencies, whenever a mapping for one field f can be found only in one of the two maps to be compared, we compare its mapped dependency value to Nothing, since absent fields must be interpreted as being mapped to Nothing. Similarly, for variant dependencies, whenever a mapping for a constructor C can be found only in one of the two maps to be compared, we interpret it as being mapped to Impossible.

The join (Definition 5.2.3) and reduction (Definition 5.2.4) operators, as formally presented in Section 5.2, are total: they return ⊤, the element conveying no information, for incompatible dependencies. In practice, the two operators are partial: an exception is thrown whenever the two dependencies to be joined or reduced are incompatible. This applies as well to structure or variant dependencies that do not correspond to the same type. Otherwise, when joining or reducing two compatible structure or variant dependencies, we interpret missing fields or missing constructors as being mapped to Nothing or Impossible, respectively.
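A minimal sketch of the comparison and join on sparse structure dependencies is given below. It deliberately uses a two-point lattice (Nothing below Everything) instead of the full dependency type, identifies a structure type by its name, and uses hypothetical names (sdep, leq_struct, join_struct, Incompatible_types); it only illustrates the treatment of absent fields and of incompatible types.

```ocaml
module FMap = Map.Make (String)

exception Incompatible_types

(* Simplified lattice for the sketch: Nothing is below Everything. *)
type dep = Nothing | Everything

let leq a b = a = Nothing || b = Everything
let join a b = if a = Everything || b = Everything then Everything else Nothing

(* A structure dependency: the name of its type and a sparse field map. *)
type sdep = { typ : string; fields : dep FMap.t }

(* Absent fields read as Nothing. *)
let get f m = Option.value (FMap.find_opt f m) ~default:Nothing

(* Safety check: both dependencies must describe the same structure type. *)
let check s1 s2 = if s1.typ <> s2.typ then raise Incompatible_types

let leq_struct s1 s2 =
  check s1 s2;
  (* Fields absent from s1 are Nothing, the least element of this sketch,
     so only the fields stored in s1 need to be inspected. *)
  FMap.for_all (fun f d1 -> leq d1 (get f s2.fields)) s1.fields

let join_struct s1 s2 =
  check s1 s2;
  (* A field present on one side only is joined with Nothing, i.e. kept. *)
  { typ = s1.typ;
    fields = FMap.union (fun _ d1 d2 -> Some (join d1 d2)) s1.fields s2.fields }
```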

In Section 6.6.1 we described that there are two types of free variables that can appear in dependencies. The first type consists of index variables that can appear in array dependencies. For instance, in <Nothing ^ i Everything> the variable i is the index of the cell for which the exceptional dependency Everything is known. Additionally, such index variables can also appear in symbolic paths related to arrays, such as <Nothing ^ i Deferred(a[i])> or <Deferred(a[ - i]) ^ i Nothing>. Such indices must be input variables of the currently analysed predicate, as explained in Section 5.3.2. The second type of free variables are the root variables that appear in deferred dependencies. For instance, in <Deferred(a[ - i]) ^ i Nothing> the variable a is a root variable. In the general case, the root variables are those outputs to which symbolic access paths are associated in deferred dependencies. In order to make use of the computed context-sensitive information, actual dependencies can be substituted for the root variables. This is done by applying the symbolic access paths to the dependency to substitute. By traversing entire dependencies such as

f -> <Nothing ^ j Everything>
g -> b -> Deferred(o)
h -> x -> Everything
     y -> <Deferred(a[ - j]) ^ j Nothing>

and substituting the nested deferred dependencies, such as Deferred(a[ - j]) and Deferred(o), we apply context-sensitive information. Simultaneously, during the same traversal, we also substitute the indices appearing in array dependencies, such as j in the dependency associated to the field f, for instance. These are either substituted by another index variable or they are forgotten. If the index to substitute is an input, the formal variable will be replaced by the effective one. Otherwise, an approximation is made in order to remove the local index variable. This consists in joining the default and the exceptional dependencies and using the result for building a new array dependency without an exception.

An index substitution is a mapping from variables to either a new index variable to replace it, or to Forget if all references to the index variable should be removed. The index type is shown below.


type index =
  | NewIdx of var
  | Forget

The substitution function subst has the following type:

type var
module VMap : EMap.S with type key = var

val subst : index VMap.t -> dep VMap.t -> dep -> dep

Its first argument is the index substitution; the second argument is the dependency substitution, mapping root variables to dependencies. The third argument is the dependency on which the substitutions are to be made. The function returns the dependency obtained after making both substitutions. The two substitution passes are fused for performance considerations.
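The index part of the substitution can be illustrated on a single array dependency. The sketch below is not the subst function itself: it ignores the dependency substitution, restricts the cell dependencies to two atoms, and uses the hypothetical names array_dep and subst_index.

```ocaml
module VMap = Map.Make (String)

type var = string
type index = NewIdx of var | Forget

(* Simplified cell dependencies for the sketch. *)
type atom = Nothing | Everything

let join a b = if a = Everything || b = Everything then Everything else Nothing

(* An array dependency: a default and an optional exceptional cell. *)
type array_dep = { default : atom; exc : (var * atom) option }

let subst_index (s : index VMap.t) (d : array_dep) : array_dep =
  match d.exc with
  | None -> d
  | Some (i, e) ->
    (match VMap.find_opt i s with
     | None -> d                                   (* index not substituted *)
     | Some (NewIdx j) -> { d with exc = Some (j, e) }
     (* Forgetting a local index: approximate by joining default and
        exceptional dependencies, and drop the exception. *)
     | Some Forget -> { default = join d.default e; exc = None })
```

For example, forgetting j in an array dependency whose default is Nothing and whose exceptional cell j is Everything yields an array that is needed entirely, which is a sound over-approximation.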

A separate substitution is performed for dealing with polymorphic types. Our dependency type is not polymorphic per se. However, αSmil supports polymorphic types, and thus the variables described by the computed dependencies can have a polymorphic type. Since the types of structures and variants are stored in the corresponding dependencies, we must substitute polymorphic type parameters by their effective arguments. This is done by a recursive function which traverses the dependencies and makes the type substitution at each nested level, if necessary. Besides this substitution, no other modifications were made in the implementation in order to handle polymorphism. This justifies our formal presentation of the analyses without polymorphism.

8.1.2 Intraprocedural Dependency Analysis

The intraprocedural dependency type ∆ (Definition 5.3.1), mapping variables to dependencies δ, was introduced in Section 5.3.1. It is implemented as shown below.

type reachable = dep VMap.t

(* Implementation of the intraprocedural dependency domain
   introduced in Section 5.3.1 *)
type intra =
  | Unreachable
  | Reachable of reachable

The VMap type is a map having variables as keys.

type var
module VMap : EMap.S with type key = var


In order to avoid needlessly storing large maps predominantly containing variables mapped to Nothing, we do not store by default mappings for variables for which dependencies have not yet been computed. Therefore, the intraprocedural dependency of any variable v for which a mapping has not yet been stored in the map is interpreted as v ↦ Nothing. As discussed in the previous section for the partial order, join and reduction operators, when applying ⊑_∆ (Definition 5.3.3) and the join ∨_∆ (Definition 5.3.4) and reduction ⊕_∆ (Definition 5.3.5) operators at the intraprocedural level, any missing mapping from a Reachable domain has to be interpreted as a variable mapped to Nothing.
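The interpretation of missing mappings can be sketched as follows, again over a two-point dependency lattice and with string-named variables. The names find and join are hypothetical and the code is an illustration, not the prototype's implementation.

```ocaml
module VMap = Map.Make (String)

(* Simplified dependencies for the sketch. *)
type dep = Nothing | Everything

let join_dep a b = if a = Everything || b = Everything then Everything else Nothing

type intra = Unreachable | Reachable of dep VMap.t

(* Lookup: a variable without a stored mapping depends on Nothing. *)
let find v = function
  | Unreachable -> None
  | Reachable m -> Some (Option.value (VMap.find_opt v m) ~default:Nothing)

(* Pointwise join; Unreachable is the identity. A variable stored on one
   side only is joined with the implicit Nothing, so its value is kept. *)
let join d1 d2 =
  match d1, d2 with
  | Unreachable, d | d, Unreachable -> d
  | Reachable m1, Reachable m2 ->
    Reachable (VMap.union (fun _ a b -> Some (join_dep a b)) m1 m2)
```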

With this interpretation, forgetting a variable v (Definition 5.3.2) from an intraprocedural domain, as introduced in Section 5.3.1, becomes straightforward and amounts to simply removing the mapping for v from the intraprocedural domain.

(* Forget *)
let forget d v =
  match d with
  | Unreachable -> d
  | Reachable dmap -> Reachable (VMap.remove v dmap)

We remark that the complex operations are performed at the dependency type level and are mostly applied pointwise at the intraprocedural level. The interprocedural dependency domains are mappings from labels to intraprocedural dependency summaries.

8.2 Implementation of the Correlation Analysis

8.2.1 Partial Equivalence Relations and Operators

The partial equivalence type R (Definition 7.2.1), which mirrors the structure of associative arrays and algebraic data types and was introduced in Section 7.2.1, is implemented as shown below.

(* Implementation of the partial equivalence type
   introduced in Section 7.2 *)
type pequiv =
  | Equal                                      (* bottom *)
  | Any                                        (* top *)
  | PStruct of struct_typ * pequiv FMap.t      (* structures *)
  | PVariant of var_typ * pequiv CMap.t        (* variants *)
  | PArray of pequiv * (var * pequiv) option   (* arrays *)

The FMap and CMap types are the ones presented in Section 8.1.1.

Similarly to structure and variant dependencies, and due to the same practical considerations, in addition to the map associating partial equivalences to fields, the type struct_typ of the structure is stored as well. Similarly, the implemented partial equivalence for variants stores the variant's type var_typ, in addition to the map associating partial equivalences to constructors.

In order to avoid storing large maps in which the majority of the fields or constructors are mapped to Any, we filter mappings of the form field ↦ Any and cons ↦ Any.

The partial equivalence type is private, and the only manner in which partial equivalence relations can be built is by using the provided smart constructors. The two atomic cases, Equal and Any, can apply to any type. The smart constructor for partial equivalences corresponding to structures filters out any field mapped to Any. It also returns Equal if all fields of the structure are mapped to Equal in the given input map. If, on the contrary, the given input map is empty or all fields are mapped to Any, the smart constructor returns Any.

Similarly, for partial equivalences corresponding to variants, the corresponding smart constructor receives as inputs the variant's type and a map with constructor keys and partial equivalences. If all constructors of the variant, as indicated by its type, are present in the input map and mapped to Equal, the smart constructor returns Equal. If all constructors are present and mapped to Any, or if the given input map is empty, the smart constructor returns Any. Otherwise, if the input map contains some constructors mapped to Any, the corresponding mappings are filtered from the map used to build the variant partial equivalence.

For arrays, the smart constructor returns Equal if both the default relation and the known exceptional relation are Equal, or if the former is Equal and there is no known exceptional relation. If both the default relation and the known exceptional relation are Any, or if the former is Any and there is no known exceptional relation, the smart constructor returns Any.
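The smart constructors for structures and arrays described above can be sketched as follows. As before, this is an illustration under simplifying assumptions: fields and variables are strings, struct_typ is the list of field names, variants are omitted, and the names mk_pstruct and mk_parray are hypothetical.

```ocaml
module FMap = Map.Make (String)

type var = string

(* Assumption: a structure type is just the list of its field names. *)
type struct_typ = string list

type pequiv =
  | Equal
  | Any
  | PStruct of struct_typ * pequiv FMap.t
  | PArray of pequiv * (var * pequiv) option

(* Structures: fields mapped to Any are left implicit. *)
let mk_pstruct (ty : struct_typ) (m : pequiv FMap.t) : pequiv =
  let m = FMap.filter (fun _ r -> r <> Any) m in
  if FMap.is_empty m then Any
  else if List.for_all (fun f -> FMap.find_opt f m = Some Equal) ty then Equal
  else PStruct (ty, m)

(* Arrays: uniform relations collapse to the corresponding atom. *)
let mk_parray (default : pequiv) (exc : (var * pequiv) option) : pequiv =
  match default, exc with
  | Equal, (None | Some (_, Equal)) -> Equal
  | Any, (None | Some (_, Any)) -> Any
  | _ -> PArray (default, exc)
```

A typical result for a predicate that updates a single field is a PStruct whose map mentions only the fields known to be preserved; the modified field is simply absent, i.e. implicitly Any.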

In contrast to dependencies, there is only one type of free variables that can appear in partial equivalence relations, namely index variables. As was the case for array dependencies, these can appear in partial equivalence relations corresponding to arrays, and they must be input variables. We traverse the partial equivalences recursively, checking for each index variable appearing in an array relation whether it is an input or a local variable. References to local variables are eliminated by approximating the partial equivalences, effectively joining the default array relations with the exceptional array relations.

8.2.2 Intraprocedural Correlations

In Section 7.4 we have defined intraprocedural correlation summaries (Definition 7.4.1) as mappings from pairs of variables to correlation maps. In practice, the type intra is the following:

module PVMap = EMap.Make (struct
  type t = element * element
  let compare = compare
end)

module PMap = EMap.Make (struct
  type t = Path.t * Path.t
  let compare = compare
end)

type correlation = pequiv PMap.t
type intra = correlation PVMap.t

type t =
  | Related of intra
  | NoCorrelation
  | Unreachable

The implemented intraprocedural correlation summary type intra is a mapping from pairs of elements to correlation maps. The element type is shown below.

(* The type of the elements for which correlations
   are computed and kept intraprocedurally.
   Ghost elements are used only for variants: for a
   variant [v], a ghost element that nests the type
   of the variant [v] is created. These are filtered
   from final results. *)
type element =
  | Local of var
  | Output of var
  | Ghost of texpr

In practice, we need to distinguish between output variables and local variables. This is important for distinguishing between the final value of an output, i.e. the one correlated with values of the inputs, and its local intermediate values. Furthermore, we need to introduce ghost elements for variants. When constructing a variant v with a constructor C(a, b), for instance, we can keep correlations between the pairs (a, v) and (b, v). However, we fail to capture the information regarding v's construction with C. In order to maintain it, we create a ghost element g_vtyp with v's type, we add the pair (g_vtyp, v) to the intraprocedural summary, and we associate (ε, ε) ↦ [C ↦ Any] to it. Such pairs are deleted from the intraprocedural predicate summaries; they are only used while analysing a predicate's body.

Unlike the operations discussed in Chapter 7, the implementations of the partial order (Definition 7.4.2) and join (Definition 7.4.3) operations are parameterized by the typing environment mapping variables to types. This has to be threaded through all operations, as it is necessary for the injection operation (Definition 7.3.8). We need to know the variable type onto which the relation is injected. For instance, in order to "fill" the unknown relations for fields or constructors with Any, we must first know what those fields or constructors are.

8.2.3 Dependency and Correlation Analysers

The input program is first parsed, and each predicate is analysed in turn. Implicit predicates are treated conservatively. Since their implementation is hidden, a pessimistic assumption must be made. For the dependency analysis, it is considered that everything in their inputs has been read in order to obtain the outputs, for any possible exit label. Similarly, for the correlation analysis, it is considered that there is no correlation between the input and the output variables on any possible exit label.

For inductive predicates, the dependency analysis computes a summary for each case and joins the results for obtaining the dependency summary for the true exit label. The false label is treated conservatively, and everything is considered to be read. Since inductive predicates are specification-only predicates that do not generate outputs, the correlation analysis associates a NoCorrelation summary to both labels.

    (* Analyse the body [g] of an explicit predicate *)
    let analyze g =
      let todo = Queue.create () in
      List.iter (fun v -> Queue.push v todo) (G.vertices g);
      let result = init_result g in
      let rec progress r =
        try
          let v = Queue.pop todo in
          let vd = MV.find v r in
          let edges = preds g v in
          let vd' = transfer r v edges in
          if D.leq vd' vd then progress r
          else begin
            List.iter (fun edge ->
              Queue.push (source edge) todo) edges;
            progress (MV.add v (D.join vd vd') r)
          end
        with Queue.Empty -> r
      in
      progress result

The body of each explicit predicate is analysed independently for each possible exit label, using a variation of the worklist algorithm, as shown above in the analyze function. Initially, a map is created having as many elements as there are nodes in the predicate's body. All of these are initially mapped to Unreachable, the bottom element at the intraprocedural level. All the predicate's exit nodes are loaded into the working queue. Then a recursive function, progress, is executed until a fixed point is reached and there are no more nodes left to analyse in the working queue. The first node of the queue is popped and analysed. The node's summary, as stored in the map, is retrieved in vd. The analysis returns a summary vd' for the node. The two summaries vd' and vd are compared: if the former is at least as precise as the latter (D.leq vd' vd), the recursive function progress is called directly. Otherwise, before calling progress, the predecessors of the analysed node are pushed into the working queue and, in the map of nodes, the join of vd and vd' is associated to the analysed node. Since both analyses are backwards analyses, the dependency or correlation information of a node is based on that of its successors in the control flow graph, and the former must be recomputed if the latter is modified. Finally, all mappings corresponding to local variables are filtered from the computed intraprocedural dependency summary. From the computed correlation summary of an exit label l, all mappings that do not correspond to an input and output variable pair are filtered.
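The analyze function relies on modules (G, MV, D) whose definitions are not shown here. As a rough, self-contained illustration of the same worklist scheme, the sketch below runs a toy backward analysis on a four-node control flow graph. The graph, the set-based abstract domain and the names cfg, reads, succs, preds and transfer are assumptions made for the example only; they are not the analyser's actual data structures.

```ocaml
(* Toy backward analysis: the abstract value of a node is the set of
   variables that may be read from that node onwards.
   Everything here is a hypothetical stand-in for the real domains. *)
module S = Set.Make (String)
module M = Map.Make (struct type t = int let compare = compare end)

(* node -> (variables read at the node, successor nodes) *)
let cfg =
  [ (0, ([ "x" ], [ 1; 2 ])); (1, ([ "y" ], [ 3 ]));
    (2, ([], [ 3 ]));         (3, ([ "z" ], [])) ]

let reads n = fst (List.assoc n cfg)
let succs n = snd (List.assoc n cfg)
let preds n =
  List.filter (fun (_, (_, ss)) -> List.mem n ss) cfg |> List.map fst

(* Transfer function: what the node reads, joined with its successors. *)
let transfer r n =
  List.fold_left
    (fun acc s -> S.union acc (M.find s r))
    (S.of_list (reads n)) (succs n)

let analyze () =
  let todo = Queue.create () in
  List.iter (fun (n, _) -> Queue.push n todo) cfg;
  (* Every node starts at the bottom element (here: the empty set). *)
  let init = List.fold_left (fun m (n, _) -> M.add n S.empty m) M.empty cfg in
  let rec progress r =
    match Queue.take_opt todo with
    | None -> r                                   (* fixed point reached *)
    | Some n ->
      let old_v = M.find n r in
      let new_v = transfer r n in
      if S.subset new_v old_v then progress r     (* nothing new learnt *)
      else begin
        (* The node changed: its predecessors must be recomputed. *)
        List.iter (fun p -> Queue.push p todo) (preds n);
        progress (M.add n (S.union old_v new_v) r)
      end
  in
  progress init
```

In this sketch the subset test plays the role of D.leq and set union the role of D.join; termination follows because the sets can only grow and the set of variables is finite.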

For the dependency analyser, a command-line flag can be used to disable the usage of deferred dependencies. The well-typedness check of dependency summaries can be enabled in the same way.

A parser for dependency information has been implemented as well. This allows us to annotate αSmil programs with the expected results and to compare them to the computed ones. A similar parser for the correlation information is planned for the near future.

8.3 Dependency and Correlation Results on ProvenCore Layers

8.3.1 ProvenCore Description

ProvenCore (Lescuyer, 2015) is one of the two microkernels entirely specified and developed in Smart at Prove & Run. Unlike Minix 3.1, by which it was inspired, ProvenCore targets ARM architectures and uses a Memory Management Unit for managing virtual address spaces. It is a general-purpose microkernel supporting creation and deletion of processes, execution of programs, synchronous message-passing inter-process communication with timeouts, asynchronous notifications and process-to-process data copies.

The main property ensured by ProvenCore is the isolation property. Isolation implies two complementary properties, namely integrity and confidentiality. Integrity refers to ensuring that the resources of a process (its code, data and registers) cannot be altered or interfered with by other processes, unless explicitly authorized by the process. Confidentiality refers to ensuring that the resources of a process cannot be observed by other processes, unless explicitly authorized by the process. In other words, integrity ensures that until a process decides to communicate with other processes, it will execute as if it were alone on the system. Confidentiality ensures that as long as a process does not send its secrets to other processes, it can change its secrets without affecting other processes.

The isolation property has been formally proven using the interactive proof assistant of ProvenTools. The proofs also establish functional specifications verified by ProvenCore (Lescuyer, 2015).

The proof for the isolation property is based on multiple refinements between successive models, from the most abstract, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. These successive models are shown in Figure 8.1.

Using multiple abstract models, each more abstract than its predecessor, enables a degree of separation of concerns in the overall proof. The lower-level proofs include a plethora of low-level properties and invariants and are devoid of functional properties, while the higher-level models focus on functional specifications. Each layer of abstraction removes details that are no longer relevant for it and enables changing the representation of the transition system in order to internalize, in the structure of its states, some invariants of the preceding level.

[Figure 8.1 – ProvenCore – Abstract Layers: SPM, RSM, FSP and TDS, ordered from the most abstract (SPM) to the least abstract (TDS).]

The Security Policy Model (SPM) is the most abstract level and the one at which the isolation property is expressed and proven. The kernel is modeled as an abstract controller and the various processes are modeled as machines, each possessing its own independent physical resources.

The Refined Security Model (RSM) is an intermediate layer meant to bridge the wide gap between its successor, the SPM, and its predecessor, the FSP. In the RSM, the machines share the same physical resources, which are managed by the controller.

The Functional Specifications (FSP) layer is a model roughly equivalent to its predecessor – the TDS – in functionality but, unlike the latter, it uses data structures and algorithms that facilitate reasoning and formal proof. Its main functional difference with the TDS is that it eliminates MMU address translation, using instead a linear view of the RAM, similarly to the RSM.

The Target of Evaluation Design (TDS) is the model that is used to generate the sequential Smart code of the kernel, as well as the models for hardware components that are not translated into C code but which are necessary for completing the TDS specifications.

For each refinement, a view, i.e. a function from the concrete model state to the abstract model state, is defined. Then a correspondence (or commutation) lemma is proven, establishing that transitions from c to c′ in the concrete model entail transitions from the view of c to the view of c′ in the abstract model. Since the views are not total functions, this requires showing that the views actually exist. In this manner the higher levels are attained, reaching models that are simpler and more flexible than the TDS but that still simulate all its possible behaviours (Lescuyer, 2015).
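To make the shape of such a commutation lemma concrete, here is a deliberately tiny sketch. The two-digit counter, its invariant and the names view, step_concrete, step_abstract and commutes are invented for illustration and have nothing to do with the actual ProvenCore models; the point is only that the view is partial and that the lemma relates one concrete step to one abstract step.

```ocaml
(* Hypothetical concrete model: a counter split into two digits.
   Invariant assumed by the view: 0 <= lo < 10. *)
type concrete = { hi : int; lo : int }

(* Partial view: defined only on concrete states satisfying the invariant.
   The abstract state is a plain integer. *)
let view c =
  if 0 <= c.lo && c.lo < 10 then Some ((10 * c.hi) + c.lo) else None

(* One transition in each model. *)
let step_concrete c =
  if c.lo = 9 then { hi = c.hi + 1; lo = 0 } else { c with lo = c.lo + 1 }

let step_abstract n = n + 1

(* Commutation for one state: if [c] has a view, its successor has one
   too, and the abstract step links the two views. *)
let commutes c =
  match view c with
  | None -> true
  | Some a -> view (step_concrete c) = Some (step_abstract a)
```

A proof assistant would establish commutes for all states; the function above can only test it on sample states.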

This refinement chain also facilitates reusing parts of one proof effort in other proofs.


8.3.2 Obtained Dependency and Correlation Results

Our dependency and correlation analyses must be evaluated by two different criteria, namely execution time and precision. This section discusses the former; the latter is discussed in the following section.

Both analyses target complex transition systems in general and operating systems in particular. The ideas behind them stemmed directly from the verification effort entailed by ProvenCore. Unlike other static analyses, which are frequently employed in a fully automatic setting, our static analyses are supposed to be used as companion tools in the middle of interactive program verification. They are supposed to be applied often, as steps during interactive proofs. For instance, the dependency and correlation summaries for different predicates might be needed for verifying a single property. These, in turn, may imply a whole-model analysis. Therefore, the dependency and correlation analyses must perform quickly in order to effectively answer "questions" that are asked frequently.

Our analyses have currently been applied to the functional specification of ProvenCore (Lescuyer, 2015). More specifically, they have been applied to the RSM, FSP and TDS layers shown in Figure 8.1. Each of these layers is characterized by a global state with numerous fields and different transitions, i.e. supported commands or system calls such as fork, exec and exit. Each supported command receives as an input the global state before the transition and returns the state of the system after the transition.

For instance, in RSM the global states are much simpler than the ones in the layers below it, i.e. FSP and TDS. They are modeled by a structure with 6 fields, out of which 3 are modeled by arrays and 2 by structures. The RSM counterpart of the optional table of processes is a store of machines, which are themselves the counterpart of FSP processes. Machines are structures with 7 fields that refer to registers, information regarding inter-process communication or permissions, and code and data segments. Out of the 7 fields, 2 are modeled by variants, 2 by associative arrays and another 2 by structures.

The global state of the FSP layer is modeled by a structure type with 15 fields, including fields that concern process management (memory allocations, information about processes), interrupt handling (registered handlers, active handlers), scheduling (priority queues, currently running process, process to run next), time management, or code data. Among these 15 fields, 9 are "composite", themselves being modeled by structures, variants or associative arrays. For instance, among the fields concerning process management there is a table of optional processes. The processes themselves are modeled by a structure type having 26 fields. Out of the total of 26 fields, 11 are modeled by algebraic data structures or associative arrays, too.

The FSP global state is characterized by over 70 invariants.

In TDS, the global state is a structure having 33 fields, among which 23 are "composite" as well. The processes are structures having 29 fields, among which 14 are modeled by associative arrays or algebraic data types. The global state is characterized by approximately 140 invariants.


In Table 8.3 we give an overview of the global states for each analysed layer. The first column shows the total number of fields. The second column indicates the number of fields that are modeled by associative arrays. Between parentheses, we indicate the number of arrays having "composite" elements and the number having elements of atomic or implicit types, respectively. For example, the FSP global state has 6 fields that are modeled by associative arrays, and all 6 of them have "composite" elements. In columns 3, 4 and 5 we show the number of fields that are modeled by structures, variants, and atomic or implicit types, respectively.

Table 8.3 – ProvenCore Abstract Layers – Global State Type

       Global State   Arrays             Structures   Variants   Atomic/Implicit
RSM    6 fields       2 fields (1/1)     2 fields     0 fields   2 fields
FSP    15 fields      6 fields (6/0)     0 fields     3 fields   6 fields
TDS    33 fields      14 fields (14/0)   3 fields     6 fields   10 fields

The global state of each layer contains an array or store of processes or machines. In Table 8.4 we give an overview of the process or machine type for each analysed layer. The table has the same structure as the one described previously for the global state types.

Table 8.4 – ProvenCore Abstract Layers – Process/Machine Type

       Process/Machine   Arrays           Structures   Variants   Atomic/Implicit
RSM    7 fields          2 fields (1/1)   2 fields     2 fields   1 field
FSP    26 fields         2 fields (0/2)   5 fields     3 fields   16 fields
TDS    29 fields         1 field (1/0)    8 fields     5 fields   15 fields

We have applied our dependency and correlation analyses to the RSM, FSP and TDS layers, thus conducting medium-sized experiments. An overview of the characteristics of the 3 ProvenCore layers is included in Table 8.5, Table 8.7 and Table 8.9. In each of these, the first column shows the total number of predicates of the analysed layers. In parentheses, we indicate the number of predicates that only read information and return a Boolean-like exit label, i.e. logical properties, as well as the number of implicit predicates, for which a pessimistic assumption is made. The second column shows the total number of lines of code (LoC) for each, including comments and type definitions. The next three columns indicate the number of LoC corresponding to predicates, type definitions and comments, respectively.

We have run the analyses 101 times in a loop on a Lenovo laptop with a Quad-Core Intel Core i7-5500U processor and 8 GB of RAM. The system runs Xubuntu GNU/Linux 64-bit, release 15.10, with OCaml 4.01. Before the first run of each loop, the operating system's cache was dropped using the following command:


    echo 3 > /proc/sys/vm/drop_caches

The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.
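A measurement loop of this kind can be sketched as follows. This is not the harness that produced the figures reported below: the function names are hypothetical, Sys.time measures processor time rather than wall-clock time, and the nearest-rank percentile definition is an assumption, since the exact definition used for the tables is not stated.

```ocaml
(* Run [f] [n] times and return the sorted list of elapsed CPU times.
   Only the call to [f] is timed; loading and printing stay outside. *)
let time_runs n f =
  let one () =
    let t0 = Sys.time () in
    ignore (f ());
    Sys.time () -. t0
  in
  List.sort compare (List.init n (fun _ -> one ()))

(* Nearest-rank percentile of a sorted, non-empty list (0 < p <= 100). *)
let percentile sorted p =
  let n = List.length sorted in
  let rank = int_of_float (ceil (p /. 100. *. float_of_int n)) in
  List.nth sorted (max 0 (rank - 1))

let average xs =
  List.fold_left ( +. ) 0. xs /. float_of_int (List.length xs)
```

With such helpers, a call like time_runs 101 run_analysis (where run_analysis is whatever closure wraps the analysis proper) yields the raw samples from which the minimum, maximum, percentiles and average can be read off.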

On average, our fully context-insensitive dependency analysis, as presented in Chapter 5, computed the dependency summaries for the 633 RSM/FSP predicates in 0.656 seconds. For the TDS predicates, the dependency summaries were computed in 0.699 seconds on average. These results are indicated in Table 8.5.

Table 8.5 – Abstract Layers – Evaluation Data and Dependency Analysis Timing

           Predicates      Total LoC   Code    Types   Comments   Dependency Avg
RSM/FSP    633 (235/65)    9853        8402    596     855        0.656 s
TDS        780 (231/155)   14000       11306   588     2106       0.699 s

In Table 8.6 we indicate the minimum and maximum execution times for the context-insensitive dependency analysis. Various percentiles are indicated as well.

Table 8.6 – Abstract Layers – Detailed Dependency Analysis Timing (in seconds)

           Min     10%ile   50%ile   90%ile   Max     Avg
RSM/FSP    0.650   0.651    0.652    0.658    0.730   0.656
TDS        0.690   0.691    0.693    0.718    0.798   0.699

The average execution time of our dependency analysis with the deferred accesses extension is shown in Table 8.7, in the last column, denoted by Avg. On average, our dependency analysis extended with deferred accesses, as presented in Chapter 6, computed the dependency summaries with context-sensitive leaves for the 633 RSM/FSP predicates in 0.779 seconds. For the TDS predicates, the dependency information was computed in 0.919 seconds on average.

Therefore, using our relaxed form of context-sensitivity led to an increase of 10-20% in execution time on the used benchmarks.

The detailed timing information for the dependency analysis using deferred accesses is shown in Table 8.8.

Table 8.7 – Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing

           Predicates      Total LoC   Code    Types   Comments   Deferred Avg
RSM/FSP    633 (235/65)    9853        8402    596     855        0.779 s
TDS        780 (231/155)   14000       11306   588     2106       0.919 s

Table 8.8 – Abstract Layers – Detailed Deferred Dependency Analysis Timing (in seconds)

           Min     10%ile   50%ile   90%ile   Max     Avg
RSM/FSP    0.776   0.777    0.779    0.781    0.785   0.779
TDS        0.904   0.905    0.908    0.975    0.999   0.919

The average execution time of our correlation analysis is shown in Table 8.9, in the last column, denoted by Avg. The correlation summaries for the RSM/FSP predicates are computed in 0.426 seconds on average. For the TDS predicates, the correlation summaries are computed in 0.496 seconds on average. Unlike the dependency analysis, which computes information for code as well as specifications (i.e. logical properties) in a unified manner, the correlation analysis only computes information for predicates that actually modify data structures. This partly explains the time difference between the two analyses. We also remark that the possible-constructors analysis is performed simultaneously with the dependency analysis, and this contributes to the difference between the execution times as well.

Table 8.9 – Abstract Layers – Evaluation Data and Correlation Analysis Timing

           Predicates      Total LoC   Code    Types   Comments   Correlation Avg
RSM/FSP    633 (235/65)    9853        8402    596     855        0.426 s
TDS        780 (231/155)   14000       11306   588     2106       0.496 s

The detailed timing information for our correlation analysis is shown in Table 8.10.

Generally, static analysis has been considered prohibitive in terms of execution time; it has been avoided in an interactive context and used predominantly in an automatic context. Though currently applied only to medium-sized models, the execution times of both of our analyses are short enough to expect reasonable execution times for larger models as well.²

² It is worth remarking that the interprocedural dependency and correlation summaries will not necessarily be computed on the fly during the interactive proof. Rather, they will be computed as part of the build. In contrast, the treatment of a query, once all interprocedural information has been computed, will be executed in real time. Nevertheless, it is desirable to have fast analyses, allowing developers to iterate frequently.

Table 8.10 – Abstract Layers – Detailed Correlation Analysis Timing (in seconds)

           Min     10%ile   50%ile   90%ile   Max     Avg
RSM/FSP    0.424   0.425    0.425    0.427    0.432   0.426
TDS        0.492   0.493    0.494    0.498    0.540   0.496

8.3.3 Precision of our Dependency and Correlation Summaries

In this section we illustrate the sort of dependency and correlation summaries that are computed by our analyses. We conclude the section with a brief discussion regarding the precision of the obtained results. Assessing and discussing precision as a metric for usefulness is hard in isolation and can only be done effectively in relation to actual applications. However, we present some statistics in order to give some insight into the proportion of non-trivial information computed. For our current discussion, we focus on the results obtained on the RSM/FSP and the TDS layers.

One of the analysed predicates of the RSM/FSP layers is do_auth. This predicate is a system call clearing or granting an authorization to some process to read from or write to some memory range of the current process. It receives a global state in and an index i as inputs and produces, on the true label, the new global state out after modifying the permission for the i-th process in the process store.

The code of do_auth performs various system-wide checks before registering the permission change and is therefore not trivial, although its effect is quite limited. Indeed, the correlation results computed by our analysis for the true label of this predicate are shown below.

    true : (in, out) ↦ [ (ε, ε) ↦ { … ↦ Equal   (14 fields)
                                    procs ↦ Any }
                         (procs, procs) ↦ 〈 Equal  i [ None ↦ Equal
                                                       Some ↦ v ↦ { … ↦ Equal   (25 fields)
                                                                    mem_auth ↦ Any } ] 〉 ]

The analysis detects that, out of the 15 fields of out, only the i-th element of the procs field is changed. Furthermore, it detects that if this element is an active process, i.e. built with the Some constructor, only the mem_auth field is modified out of the total of 26 fields. Everything else is copied from the input state in.
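The kind of update summarised above can be pictured with a much-reduced OCaml analogue. The record types, the auth variant and the function set_auth below are hypothetical: the real FSP state has 15 fields, its processes have 26, and do_auth performs checks that are omitted here. The sketch only shows why everything except one field of one array element remains equal to the input.

```ocaml
(* Hypothetical, heavily reduced versions of the state and process types. *)
type auth = No_auth | Read of int * int | Write of int * int
type proc = { pid : int; mem_auth : auth }
type state = { clock : int; procs : proc option array }

(* Build a new state from [st] in which only the [mem_auth] field of the
   i-th process, if that slot holds a process, is replaced by [a].
   The input state is left untouched. *)
let set_auth st i a =
  let procs = Array.copy st.procs in
  (match procs.(i) with
   | Some p -> procs.(i) <- Some { p with mem_auth = a }
   | None -> ());
  { st with procs }
```

In the vocabulary of the summary, clock and pid play the role of the fields mapped to Equal, an empty slot (None) stays Equal, and mem_auth is the one field about which nothing is known (Any).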


Combined with dependency summaries for logical properties, this correlation summary would allow us to infer the preservation of all invariants that are not concerned with the memory permissions. All but one of the specified properties for the global state fall into this category. The remaining one is the relevant memory permissions property

    predicate proc_mem_auth_ok(proc proc) -> [true | false]

which verifies a fundamental property that has to hold for all processes in the process store of proc, and states that a process has permissions covering a valid range of memory addresses and referring only to existing processes. After executing do_auth, this property is threatened and needs to be verified only for the i-th process of the store. It is preserved for all others.

The dependency results computed by our analysis for this predicate are shown below. The analysis detects that, for each of the possible execution scenarios, the outcome depends only on 2 out of the 26 fields, namely the stackframe and the memory permissions. The dependency on the stackframe is confined to only one of its 3 fields, the data and stack segment. The memory permissions are given by a variant with 3 constructors denoting reading and writing permissions or the absence of any permission. Furthermore, besides pinning down the outcome's dependency on 2 out of the 26 fields of the proc structure, the analysis also detects that the absence of any memory permission, indicated by the constructor NONE of the mem_auth variant, is ⊥ for the false execution scenario. In other words, unused permissions cannot threaten the property proc_mem_auth_ok.

    false → proc → { mem_auth → [ READ → { base → ⊤, len → ⊤ }
                                  WRITE → { base → ⊤, len → ⊤ }
                                  NONE → ⊥ ]
                     stackframe → { ds → ⊤ } }

    true → proc → { mem_auth → [ READ → { base → ⊤, len → ⊤ }
                                 WRITE → { base → ⊤, len → ⊤ }
                                 NONE → ]
                    stackframe → { ds → ⊤ } }
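A predicate with this dependency profile could look like the following OCaml sketch. All types, the helper inside and the concrete check are assumptions made for illustration (the actual property also talks about existing processes, which is left out); only the field names mem_auth, stackframe, ds, base and len and the constructor names come from the summary above.

```ocaml
(* Hypothetical reduced types; the real proc structure has 26 fields. *)
type range = { base : int; len : int }
type auth = NONE | READ of range | WRITE of range
type frame = { text : range; ds : range; extra : range }
type proc = { pid : int; priority : int; mem_auth : auth; stackframe : frame }

(* [inside r seg]: the range [r] lies entirely within the segment [seg]. *)
let inside r seg =
  r.base >= seg.base && r.base + r.len <= seg.base + seg.len

(* Reads only [mem_auth] and [stackframe.ds]; a process without any
   permission can never make the property fail. *)
let proc_mem_auth_ok p =
  match p.mem_auth with
  | NONE -> true
  | READ r | WRITE r -> inside r p.stackframe.ds
```

The NONE branch returns true unconditionally, which is the OCaml counterpart of NONE being mapped to ⊥ on the false label: that constructor cannot occur when the property fails.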

The relevant memory permissions property is thus only threatened by transitions that add memory permissions or change a process' virtual space layout. Only 2 transitions out of the 25 belong to this category: exec, which resets the process' segments, and do_auth, which adds permissions and was discussed above. In particular, transitions deleting memory permissions do not impact the property since the absence of permissions, as shown by the dependency of the constructor NONE for the false label, is an impossible case when the property does not hold. This is one of the practical advantages of tracking constructor possibilities simultaneously and of extending the correlation analysis to track the evolution of constructors as well.

In the following, we briefly discuss the dependency summaries obtained on the RSM/FSP layer in terms of precision. An overview is given in Table 8.11. The first column refers to the fully context-insensitive dependency analysis, as presented in Chapter 5. The second column refers to the dependency analysis extended with deferred access maps, as presented in Chapter 6. The first line indicates the total number of predicates, both implicit and explicit. The second line indicates the total number of implicit predicates, for which we are obliged to make a pessimistic assumption and to consider everything needed, given that their implementation is hidden. The third line indicates the number of explicit predicates without inputs, for which empty summaries are retrieved. Our dependency analysis detects the input subset that is read in order to obtain the output. In the case of predicates without inputs, this subset is empty. Most explicit predicates without inputs correspond to wrapper predicates around calls to constructors that take no arguments. Since αSmil is an intermediate language, such predicates are automatically generated and do not necessarily correspond to programmer-written predicates. The next line, line 4, indicates the number of predicates for which we obtain non-trivial information. By non-trivial information we mean dependency summaries in which the dependency associated to at least one input variable is different from ⊤, i.e. Everything, the element conveying no information. With the context-insensitive dependency analysis we obtain non-trivial results for 344 predicates. With the extended dependency analysis we obtain non-trivial results for 403 predicates.

Table 8.11 – RSM/FSP Layers – Evaluation Data and Dependency Summaries

                                   Context-Insensitive   Deferred
Number of Total Predicates         633                   633
Number of Implicit Predicates      65                    65
No Inputs                          26                    26
Number of Non-Trivial Results      344                   403
Number of Trivial Results          289                   230
  • Implicit                       65                    65
  • No Inputs                      26                    26
  • Other                          198                   139
Predicates with Atomic Inputs      31                    31
Completely Read                    71                    71
Overapproximation                  96                    37

The following line (line 5) indicates the total number of predicates for which trivial results are obtained. These include the results for implicit predicates as well as those for predicates without inputs. For the simple version of the dependency analysis, we obtain 198 trivial results, excluding implicit predicates and predicates without inputs. For the extended dependency analysis, we obtain trivial results for 139 predicates, excluding implicit predicates and predicates without inputs. Therefore, for the first version of the analysis, 59 trivial summaries (198 − 139) are a consequence of context-insensitivity. The next 3 lines refer to the 139 predicates for which trivial results are obtained with both versions of the dependency analysis. Of these, 31 correspond to predicates manipulating only inputs of atomic types such as int. Such inputs are completely read, and thus the trivial results are justified and do not correspond to an over-approximation. Another 71 correspond to predicates making complex manipulations and actually reading all of their input, such as well-formedness checks. The last 37 trivial results are a consequence of over-approximations made by our analysis. The majority of them correspond to complex predicates making multiple calls to other complex predicates and relying heavily on calls to implicit predicates, for which conservative assumptions are made. For the simple dependency analysis, the other 59 trivial results (96 − 37) are a result of over-approximations related to context-insensitivity.

An overview of the dependency results for the TDS layer is given in Table 8.12. The table follows the same structure as described for Table 8.11.

Table 8.12 – TDS Layer – Evaluation Data and Dependency Summaries

                                   Context-Insensitive   Deferred
Number of Total Predicates         780                   780
Number of Implicit Predicates      155                   155
No Inputs                          15                    15
Number of Non-Trivial Results      386                   458
Number of Trivial Results          394                   322
  • Implicit                       155                   155
  • No Inputs                      15                    15
  • Other                          224                   152
Predicates with Atomic Inputs      49                    49
Completely Read                    59                    59
Overapproximation                  116                   44

We remark that, with the deferred dependencies extension, we obtain more precise dependency summaries for 273 predicates of the RSM/FSP abstract layer. These constitute approximately 50% of the predicates in the used benchmark. For the TDS layer, we obtain more precise results for 308 predicates using the deferred dependencies extension. These constitute approximately 50% of the predicates in the TDS layer for which non-trivial results can be obtained (i.e. excluding implicit predicates and those without inputs). The dependency summaries obtained with the extended analysis are considerably more detailed. For instance, just to give an intuition of the difference between the results obtained for the TDS layer, the file containing the results computed with the context-insensitive dependency analysis contains 7333 lines and its size is 2631 kB, while the file containing the results computed with the extended analysis contains 11547 lines and its size is 5239 kB.
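As a quick check of the two proportions quoted above, the ratios can be worked out from the totals in Tables 8.11 and 8.12. The denominator used for the RSM/FSP layers is an assumption: the text states the exclusion of implicit predicates and predicates without inputs only for the TDS layer, and the same exclusion is applied here to both.

```latex
\[
\frac{273}{633 - 65 - 26} = \frac{273}{542} \approx 50.4\%,
\qquad
\frac{308}{780 - 155 - 15} = \frac{308}{610} \approx 50.5\%.
\]
```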

The statistics for the correlation analysis are shown in Table 8.13. Unlike the dependency analysis, which handles both logical properties and predicates generating outputs, the correlation analysis does not handle logical properties. It tracks fine-grained partial equivalences between parts of the input and parts of the output. Therefore, the number of RSM/FSP predicates for which we can obtain non-trivial results (i.e. at least one partial equivalence between an input (sub)element and an output (sub)element on at least one exit label) is lower. Implicit predicates and specification-only predicates are mapped to NoCorrelation, the top element conveying no information. Out of the 307 predicates left, we obtain non-trivial results for 186. The rest include predicates relying heavily on calls to implicit predicates. They also include complex system calls, such as fork or exec, and auxiliary operations which modify their input entirely.

Table 8.13 – RSM/FSP Layers – Evaluation Data and Correlation Summaries

                                               Correlation Analysis
Number of Total Predicates                     633
Number of Implicit Predicates                  65
Number of Logical Properties (No Outputs)      235
No Inputs                                      26
Number of Non-Trivial Results                  186
Number of Trivial Results                      90
  • Implicit                                   65
  • No Inputs                                  26
  • No Outputs                                 235
  • Atomic/Implicit Inputs                     31

An overview of the correlation results for the TDS layer is given in Table 8.14. The table follows the same structure as described for Table 8.13.

8.4 Reasoning about Framing using Correlations and Dependencies

8.4.1 A Decision Procedure

In general, reasoning about framing relies on the frame rule, which is commonly illustrated as follows:

\[
\frac{\{P\}\; C\; \{Q\}}{\{P \land R\}\; C\; \{Q \land R\}}
\]


Table 8.14 – TDS Layer – Evaluation Data and Correlation Summaries

                                               Correlation Analysis
Number of Total Predicates                     780
Number of Implicit Predicates                  155
Number of Logical Properties (No Outputs)      231
No Inputs                                      15
Number of Non-Trivial Results                  235
Number of Trivial Results                      95
  • Implicit                                   155
  • No Inputs                                  15
  • No Outputs                                 231
  • Atomic/Implicit Inputs                     49

The purpose of the frame rule is to enable local reasoning: a property R that holds for a state P will continue to hold after executing a command C, provided that R reads only locations that are unmodified by C. The frame rule, also called the rule of constancy (Reynolds, 1981), applies in its original form to simple languages which do not use a heap. Separation logic addresses framing for heap-supporting languages.

In our case, the αSmil language with which we are working does not support mutation. Our work is not concerned with heap modifications, but focuses on deep-state modifications. We handle predicates that receive a composite input state and construct a new composite output state without altering the former. The new output state is constructed by copying the input state and modifying a subset of its subelements.

In our context, the frame rule must be reinterpreted as follows: a property R is preserved by a predicate C receiving an input state P and constructing an output state Q if the states P and Q agree on the subset on which the property R depends. In other words, a property is preserved by a predicate if the latter only modifies subelements on which the property does not depend. Using the terminology of separation logic, a property R is preserved by a predicate C if the footprint of C is disjoint from the footprint of R. However, we are not concerned with locations, but with subelements of large states modeled by algebraic data structures and arrays. Therefore, when reasoning about framing, we need to check whether the input subset modified by an operation is disjoint from the subset that properties read and depend on.

We have devised two static analyses for automatically computing the footprints of operations and properties. The dependency analysis detects the input subset on which the outcome of an operation or of a property relies. The correlation analysis detects the input subset that is modified by an operation in order to obtain the output. The results of the two analyses are meant to be used and combined by a decision procedure in order to automatically infer the preservation of frame properties.
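The disjointness check at the heart of this reinterpreted frame rule can be sketched in a few lines. The representation below is a strong simplification and an assumption of this example: footprints are plain lists of field paths, whereas the summaries computed by the two analyses also describe array cells and constructors. The names path, overlaps and preserved are hypothetical.

```ocaml
(* A path names a subelement of the state, e.g. ["procs"; "mem_auth"]. *)
type path = string list

(* Two subelements share some part of the state exactly when one path
   is a prefix of the other. *)
let rec overlaps (p : path) (q : path) =
  match p, q with
  | [], _ | _, [] -> true
  | x :: p', y :: q' -> x = y && overlaps p' q'

(* A property is preserved when no path it reads overlaps a path that
   the operation may have modified. *)
let preserved ~writes ~reads =
  not (List.exists (fun w -> List.exists (overlaps w) reads) writes)
```

For instance, an operation whose only modified path is ["procs"; "mem_auth"] leaves a property reading ["sched"; "queue"] untouched, but not one reading ["procs"], since the latter covers the modified subelement.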

The decision procedure has not been implemented yet but, based on preliminary experiments, we give an intuition about how the dependency and correlation summaries are meant to be unified, what type of queries could be answered, and the mechanism used for answering them.

Concretely the decision procedure is meant to receive a sequence of atoms one ofwhich is a query The query is to be answered based on the correlation summariescomputed for the other atoms Atoms are calls to built-in or user-defined predicatesQueries usually consist of a Boolean built-in statement such as an equality check ora partial structure equality check for instance or a call to a logical predicate havingtrue and false as exit labels and generating no outputs In a nutshell the dependencysummary computed for the query would have to be transformed and interpreted as aset of correlations that are sufficient to answer affirmatively the given query Thisshould then be compared to the correlations computed for the atoms The query canbe answered affirmatively if the latter is less than or equal to the former

We sketch the envisioned mechanism behind our decision procedure on a simple example receiving 4 atoms. One of them is a query, as shown below.

    type state = { f : int; g : int; h : int }

    v1 = s.f
    t = { s with g = w }
    v2 = t.f
    Q: v1 = v2 - true -

In this case, it is not necessary to first obtain the dependency for the query (marked with Q) and to interpret it as a correlation. The necessary and sufficient correlation for the query to be answered affirmatively can be obtained directly:

    (v1, v2) ↦ (ε, ε) ↦ Equal

Separately, we need to extract all the correlation information regarding (v1, v2) from the given atoms. For this, we must first find the chains of correlations connecting the two through other intermediate atoms. Therefore, we begin by building an undirected graph in which every variable appearing in the atoms is added as a node. An edge is added between any nodes representing the input and the output of the same atom.³ For our example, the graph is shown below.

[Figure: undirected graph over the nodes s, t, v1, v2 and w, with edges s–v1, s–t, w–t and t–v2]

³ In general, these graphs will not be acyclic. Further measures will have to be taken for correctly dealing with all cases.


The path connecting v1 and v2 is highlighted in green. In the general case, such paths could be detected using a depth-first search algorithm. Using the detected path between v1 and v2, we build a chain of pairs of variables of the following form:

    (v1, s) <-> (s, t) <-> (t, v2)
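The graph construction and path detection described above can be sketched as follows. This is only an illustration under assumptions: atoms are encoded as hypothetical (inputs, outputs) pairs of variable names, and a plain recursive depth-first search returns a single simple path, whereas the envisioned procedure would need all chains and a proper treatment of cycles.

```python
def build_graph(atoms):
    """Build an undirected graph from atoms given as (inputs, outputs) lists."""
    graph = {}
    for inputs, outputs in atoms:
        for v in inputs + outputs:
            graph.setdefault(v, set())
        # One edge between every input and every output of the same atom.
        for i in inputs:
            for o in outputs:
                graph[i].add(o)
                graph[o].add(i)
    return graph

def find_path(graph, start, goal, seen=None):
    """Depth-first search returning one simple path from start to goal, or None."""
    seen = (seen or set()) | {start}
    if start == goal:
        return [start]
    for nxt in sorted(graph[start]):
        if nxt not in seen:
            rest = find_path(graph, nxt, goal, seen)
            if rest is not None:
                return [start] + rest
    return None

# The three non-query atoms of the example: v1 = s.f, t = {s with g = w}, v2 = t.f
atoms = [(["s"], ["v1"]), (["s", "w"], ["t"]), (["t"], ["v2"])]
graph = build_graph(atoms)
path = find_path(graph, "v1", "v2")
chain = list(zip(path, path[1:]))
print(chain)  # [('v1', 's'), ('s', 't'), ('t', 'v2')]
```

The set of visited nodes is copied on each recursive call, so the search backtracks correctly even when the graph contains cycles.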

These are the unordered pairs for which we need to extract the correlation information contained in the correlation summaries of the atoms. The correlation summaries of our example atoms are the following:

    v1 = s.f                (s, v1) ↦ (f, ε) ↦ Equal

    t = { s with g = w }    (s, t)  ↦ { (f, f) ↦ Equal
                                        (h, h) ↦ Equal }
                            (w, t)  ↦ (ε, g) ↦ Equal

    v2 = t.f                (t, v2) ↦ (f, ε) ↦ Equal

In the correlation summaries computed by our analysis, correlation maps are associated to pairs of input and output values, i.e. the computed information is expressed between the input and the output variables of an operation. They can be seen as ordered pairs having inputs as the left members and outputs as the right members. However, the correlation information expresses a relation between two runtime values, which can be compared independently of the order in which they appear.⁴ The atoms refer to values that occur in the program at different times, and answering the query is done independently of the order of execution. Therefore, at this level, we can swap the members of the pairs to which correlation maps are associated. This allows us to obtain correlation information expressed in terms of the variable pairs in the chain extracted from the graph of atom variables. For instance, for our example we would obtain the following:

    (v1, s) <-> (s, t) <-> (t, v2)

    (v1, s) ↦ (ε, f) ↦ Equal

    (s, t)  ↦ { (f, f) ↦ Equal
                (h, h) ↦ Equal }

    (t, v2) ↦ (f, ε) ↦ Equal

From these, we compute the Cartesian product of the correlations appearing in the correlation maps, as follows:

⁴ Once the evolution of constructors is tracked as well, the relations will stop being symmetric. Thus, the matrices will have to be transposed.


    {c1} × {c2, c3} × {c4}

where
    c1 = (ε, f) ↦ Equal
    c2 = (f, f) ↦ Equal
    c3 = (h, h) ↦ Equal
    c4 = (f, ε) ↦ Equal

For our example, the obtained set would be the following:

    { ((ε, f) ↦ Equal, (f, f) ↦ Equal, (f, ε) ↦ Equal),
      ((ε, f) ↦ Equal, (h, h) ↦ Equal, (f, ε) ↦ Equal) }

For each member of the obtained set, we need to recursively compose the correlations in order to obtain information regarding the values involved in the query. The compose operations would be applied as follows:

    (((c′1 ∘ c′2) ∘ c′3) ⋯ )

where, for the first element of our example set, c′1, c′2 and c′3 have the following values:

    c′1 = (ε, f) ↦ Equal
    c′2 = (f, f) ↦ Equal
    c′3 = (f, ε) ↦ Equal

For our example, we cannot obtain any correlation information regarding (v1, v2) by composing the correlations of the second member of the Cartesian product. The first correlation relates the value of v1 to the value of the f field of s, while the second correlation relates the values of the field h of s and t. Thus, in this case we cannot infer anything regarding v1 and t, nor regarding v1 and v2. However, by composing the correlations of the first member of the Cartesian product, we obtain the following:

    (v1, v2) ↦ (ε, ε) ↦ Equal

If, after composing, we had obtained multiple correlations referring to (v1, v2), these would have had to be intersected, thus allowing us to extract from the given atoms the most precise correlation information regarding (v1, v2). In the general case, the correlation information obtained after the intersection is the one that has to be compared to the correlation computed previously, i.e. the sufficient correlation for the query to be answered affirmatively. For our example, this amounts to comparing

    (v1, v2) ↦ (ε, ε) ↦ Equal   ⊑_K   (v1, v2) ↦ (ε, ε) ↦ Equal

Based on this, we can conclude that the given query Q will be answered affirmatively for the atoms given in our example.
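The swap, Cartesian product and composition steps of the example can be replayed with a small sketch. It makes several simplifying assumptions that are not part of the envisioned procedure: every correlation carries the relation Equal, a correlation is encoded as a hypothetical pair (left path, right path), and two correlations compose only when the right path of the first is identical to the left path of the second.

```python
from functools import reduce
from itertools import product

EPS = ()  # the empty access path ε

def swap(correlations):
    """Swap the members of each correlation, as allowed for symmetric relations."""
    return [(q, p) for (p, q) in correlations]

def compose(c1, c2):
    """Compose two Equal correlations; None when they do not connect."""
    if c1 is None or c2 is None:
        return None
    (p1, q1), (p2, q2) = c1, c2
    return (p1, q2) if q1 == p2 else None

# Correlation summaries of the example atoms, keyed by (input, output).
summaries = {
    ("s", "v1"): [(("f",), EPS)],
    ("s", "t"): [(("f",), ("f",)), (("h",), ("h",))],
    ("t", "v2"): [(("f",), EPS)],
}

def lookup(a, b):
    """Correlations for the pair (a, b), swapping the stored summary if needed."""
    if (a, b) in summaries:
        return summaries[(a, b)]
    return swap(summaries[(b, a)])

chain = [("v1", "s"), ("s", "t"), ("t", "v2")]
maps = [lookup(a, b) for a, b in chain]

# One composition per member of the Cartesian product; failed ones yield None.
derived = {reduce(compose, combo) for combo in product(*maps)}
derived.discard(None)
print(derived)  # {((), ())}
```

The only surviving result relates the pair of paths (ε, ε), which matches the correlation sufficient for the query; the member of the product that goes through the field h is discarded because its first two correlations do not connect.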


8.4.2 Types of Targeted Queries

The types of queries that are targeted by our approach can be categorized as follows:

• equality of values;

• structure equality on the values of a subset of fields;

• implications of the form logical_property(a) ⇒ logical_property(b), where a and b are related by the facts inferred from the other atoms of the query;

• conjunctions of such queries.

In the general case, we need to reinterpret a dependency summary as a correlation summary. The query's goal is to deduce the equality between pairs of variables. When two such variables are of the same type, we can create a correlation map containing a single correlation. That correlation associates to the pair of paths (ε, ε) a partial equivalence relation which mirrors the dependency. The partial equivalence relation is created as follows:

• When the dependency is Everything, the equivalence relation becomes Equal.

• When the dependency is Nothing, the equivalence relation becomes Any.

• Structure, variant and array dependencies are transformed pointwise to structure, variant and array partial relations.

• When the dependency is Impossible, the equivalence relation becomes Any, in the absence of the possible-constructors extension.
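The translation rules listed above can be sketched as a small recursive function. The encoding is an assumption made for illustration: atomic dependencies are strings, and a structure dependency is a dictionary from field names to dependencies; variant and array dependencies would be handled pointwise in the same way but are omitted here.

```python
def to_relation(dependency):
    """Translate a dependency into a partial equivalence relation."""
    if dependency == "Everything":
        return "Equal"
    if dependency in ("Nothing", "Impossible"):
        # Impossible maps to Any in the absence of the possible-constructors extension.
        return "Any"
    # Structure dependency (hypothetical dict encoding): translate pointwise.
    return {field: to_relation(d) for field, d in dependency.items()}

# Assumed dependency of a property reading only the irq_handlers field.
dependency = {
    "procs": "Nothing",
    "memory_regions": "Nothing",
    "irq_handlers": "Everything",
    "current_process": "Nothing",
}
print(to_relation(dependency))
```

For this assumed input, the resulting relation requires equality on `irq_handlers` only and leaves every other field unconstrained.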

We illustrate here some example queries revolving around our do_auth predicate, discussed in Section 8.3.3.

A naive equality query on the entire input and output of do_auth would not be satisfiable, as do_auth does modify the memory authorizations of one process. This is the first sort of supported query.

    do_auth(now, i, arg3)[true: after | oob | false]
    Q: after = now
    ⇒ no

The main argument of the do_auth predicate is the global state now, an instance of the global_state structure.⁵

    type global_state = {
      procs : array<option<process>>;
      memory_regions : array<mem_region>;
      irq_handlers : array<irq_handler>;
      current_process : int
    }

⁵ Due to confidentiality reasons, the actual definition of the struct has been modified and edited for length.


Since the do_auth predicate only affects the mem_auth of one process in the procs array, we can successfully deduce, for the values of now and after, the equality on the fields memory_regions and current_process. This is the second sort of supported query.

    do_auth(now, arg2, arg3)[true: after | oob | false]
    Q: after =<memory_regions, current_process> now
    ⇒ yes

Finally, we can directly deduce that the all_ids_in_handlers_ok_global(state) property is not threatened by the execution of the do_auth predicate.

    do_auth(now, arg2, arg3)[true: after | oob | false]
    Q: congruent all_ids_in_handlers_ok_global(now)
                 all_ids_in_handlers_ok_global(after)
    ⇒ yes

This property verifies that all the identifiers used by the registered interruption handlers, stored in the field irq_handlers, are valid. The property has the following dependency summary:

    false → state → irq_handlers → Everything
    true  → state → irq_handlers → Everything

From the correlation of the do_auth predicate, we know that the irq_handlers field is preserved, and therefore it follows that the property, which only depends on that field, is preserved. Similar properties that do not depend on the procs array, but only on parts or on the entirety of one or more of the other 14 fields, will be preserved as well.
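The comparison behind the three do_auth queries can be sketched under strong simplifying assumptions. Relations are encoded as the strings "Equal" and "Any" or as dictionaries listing every field of a structure, and the correlation assumed for do_auth is deliberately coarse: it treats the whole procs field as possibly modified, whereas the actual summary is finer. The function name `entails` is hypothetical.

```python
def entails(provided, required):
    """True if the relation known to hold implies the relation needed by the query."""
    if required == "Any" or provided == "Equal":
        return True
    if isinstance(provided, dict) and isinstance(required, dict):
        return all(entails(provided[f], r) for f, r in required.items())
    if isinstance(provided, dict):
        # required is Equal: every field must be known to be equal.
        return all(entails(p, "Equal") for p in provided.values())
    # provided is Any: only a structure that requires nothing is entailed.
    return isinstance(required, dict) and all(
        entails("Any", r) for r in required.values()
    )

# Coarse, assumed correlation between now and after for do_auth.
do_auth_correlation = {
    "procs": "Any",
    "memory_regions": "Equal",
    "irq_handlers": "Equal",
    "current_process": "Equal",
}

# Query 1: after = now
print(entails(do_auth_correlation, "Equal"))  # False

# Query 2: equality restricted to memory_regions and current_process
partial_equality = {
    "procs": "Any",
    "memory_regions": "Equal",
    "irq_handlers": "Any",
    "current_process": "Equal",
}
print(entails(do_auth_correlation, partial_equality))  # True

# Query 3: a property depending only on irq_handlers
handlers_only = {
    "procs": "Any",
    "memory_regions": "Any",
    "irq_handlers": "Equal",
    "current_process": "Any",
}
print(entails(do_auth_correlation, handlers_only))  # True
```

Under these assumptions the three answers agree with the no, yes and yes verdicts of the example queries.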

The preservation of properties that have to hold for every process in the array procs will be inferred as well, as long as they do not depend on the mem_auth field of the processes. For instance, the property procs_proc_map_ok_global verifies that each process of the array procs has valid code, data and stack segments. This property has the following dependency summary:

    true  → state → procs → ⟨ [ None → Everything
                                Some → v → proc_map → Everything ] ⟩

    false → state → procs → ⟨ [ None → Everything
                                Some → v → proc_map → Everything ] ⟩

Since, for every active process of the array, the property depends only on the proc_map field, it is unaffected by the modification of the mem_auth field. Therefore, the property is preserved for the global state after, obtained by executing do_auth. Similar properties that do not depend on the mem_auth field, but only depend on other parts of the data structure, will be preserved as well.

An extension of the decision procedure sketched in Section 8.4.1 could take advantage of additional information regarding array indices. For example, the query could specify that two of the involved array indices are different.


    do_auth(now, i, arg3)[true: after | oob | false]
    Assert i ≠ j
    Q: congruent mem_auth_ok_global(now, j)
                 mem_auth_ok_global(after, j)
    ⇒ yes

The mem_auth_ok_global(state, j) property checks the well-formedness of the memory permission on the j-th process. The above query is satisfied if the property mem_auth_ok_global holds for all processes other than the i-th. The correlation summary for do_auth states that the elements of the procs array are unmodified by the operation, except for the i-th element. Combined with the dependency summary for mem_auth_ok_global given below, this allows the query to be satisfied.

    true  → state → procs → ⟨ Nothing  j  [ None → Everything
                                            Some → v → ProcDep1 ] ⟩

    false → state → procs → ⟨ Nothing  j  [ None → Everything
                                            Some → v → ProcDep2 ] ⟩

where ProcDep1 is

    mem_auth   → READ  → base → Everything
                         len  → Everything
                 WRITE → base → Everything
                         len  → Everything
                 NONE  → Impossible
    stackframe → ds → Everything

and ProcDep2 is

    mem_auth   → READ  → base → Everything
                         len  → Everything
                 WRITE → base → Everything
                         len  → Everything
                 NONE  → Nothing
    stackframe → ds → Everything

8.5 Decision Procedure Experiments

We have applied a basic prototype of the decision procedure using the dependency and correlation summaries computed for the RSM/FSP layers of ProvenCore.

Our prototype considers pairs of one logical property and one predicate. The logical property and the predicate must both operate on values of the same type. More precisely, one of the predicate's inputs, as well as one of its outputs, and one of the logical property's inputs must all be of the same type. Our prototype attempts to detect whether the logical property is preserved after the execution of the predicate. If several inputs or outputs are of the same type, all combinations are considered. Most implicit types were not considered when searching for property/predicate pairs, as they are less likely to yield successful results. For example, arguments of a primitive type like int are unlikely to be unaffected by the execution of the predicate.

This prototype automatically inspected all such property/predicate pairs found in the RSM/FSP layers. A property was considered to be preserved if its dependency summary for the argument involved, when translated to a set of equalities, formed a subset of the equalities implied by the predicate's correlation summary. Both the true and the false exit labels were considered independently, and the property is considered to be preserved (subject to some conditions) when it is preserved for either or both exit labels. More precisely, given a property $\pi(\overline{\imath})[\mathit{true} \mid \mathit{false}]$ and a predicate $p(\overline{\imath}')[\ell : \overline{o}']$, we report success when the following can be satisfied:

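\begin{align}
& \exists\, i \in \overline{\imath},\ i' \in \overline{\imath}',\ o' \in \overline{o}' \text{ such that } \Gamma(i) = \Gamma(i') = \Gamma(o') \tag{8.1} \\
\land\ & \exists\, \ell \in \{\mathit{true}, \mathit{false}\} \tag{8.2} \\
\land\ & E(j) \neq E(k) \ \land\ E'(j) \neq E'(k) \quad \forall j, k \in \overline{\imath}, \overline{\imath}', \overline{o}' \tag{8.3} \\
& \text{when } j \text{ and } k \text{ are used as array indices} \tag{8.4} \\
\land\ & \big\langle E \,\big[\, \mathit{Prop}(\overline{\imath}[i \to i'])[\mathit{true} \mid \mathit{false}] \,\big] \big\rangle \xrightarrow{\ \ell\ } E \tag{8.5} \\
\land\ & \big\langle E \,\big[\, \mathit{Pred}(\overline{\imath}')[\ell' : \overline{o}' \mid\ ] \,\big] \big\rangle \xrightarrow{\ \ell'\ } E' \tag{8.6} \\
\land\ & \big\langle E' \,\big[\, \mathit{Prop}(\overline{\imath}[i \to o'])[\mathit{true} \mid \mathit{false}] \,\big] \big\rangle \xrightarrow{\ \ell\ } E' \tag{8.7}
\end{align}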

where $\overline{\imath}[i \to i']$ and $\overline{\imath}[i \to o']$ denote the sequence of variables $\overline{\imath}$ in which the variable $i$ is replaced by the variable $i'$ (respectively $o'$).

This initial prototype was run on the 398 explicit predicates and 235 properties of the RSM/FSP layer of ProvenCore. Out of these, we filtered predicate/property pairs for which the property has an input i of the same type as one of the predicate's inputs i′ and one of its outputs o′. These pairs involve 161 distinct predicates and 165 distinct properties. In total, there were 8250 tuples (i, i′, o′, ℓ) which satisfied the conditions 8.1 and 8.2.
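The enumeration of candidate tuples satisfying conditions 8.1 and 8.2 can be illustrated with a short sketch. The signature encoding (dictionaries from variable names to type names) and the type names given to the arguments of do_auth are assumptions made for the example, not the prototype's actual data structures.

```python
from itertools import product

def candidate_tuples(prop_inputs, pred_inputs, pred_outputs):
    """Enumerate (i, i', o', label) where the three variables share one type."""
    return [
        (i, i_pred, o_pred, label)
        for (i, t_i), (i_pred, t_ip), (o_pred, t_op) in product(
            prop_inputs.items(), pred_inputs.items(), pred_outputs.items()
        )
        if t_i == t_ip == t_op            # condition 8.1
        for label in ("true", "false")    # condition 8.2
    ]

# Hypothetical signatures for a property and for do_auth.
property_inputs = {"state": "global_state"}
predicate_inputs = {"now": "global_state", "i": "int", "arg3": "auth"}
predicate_outputs = {"after": "global_state"}

for candidate in candidate_tuples(property_inputs, predicate_inputs, predicate_outputs):
    print(candidate)
# ('state', 'now', 'after', 'true')
# ('state', 'now', 'after', 'false')
```

Each candidate would then be checked against the remaining conditions using the dependency and correlation summaries.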

This experiment allowed us, as a first result, to automatically identify 102 predicates for which at least one property is preserved under the conditions 8.1–8.7 stated above. For many predicates, it was possible to show that, after the execution of said predicate, several properties are preserved (up to 33). Figure 8.2 shows an overview of how many properties were inferred to be preserved for each predicate. The blue region at the bottom indicates how many properties are inferred to be preserved for a given predicate, while the red region above shows how many properties were compatible with the predicate but were not inferred to be preserved.

[Plot: x-axis "Predicates", y-axis "Number of preserved properties inferred"]

Figure 8.2 – Distribution of the number of inferred preserved properties. Predicates are sorted along that criterion.

Figure 8.3 shows an overview of how many predicates were inferred to be preserving each property. The blue region at the bottom indicates how many predicates are inferred to be preserving a given property, while the red region above shows how many predicates were compatible with the property but were not inferred to be preserving it.

It is worth noting that in both Figures 8.2 and 8.3, the red zone contains properties (respectively predicates) which could fall into these cases:

• The property is actually threatened by the predicate (respectively, the predicate threatens the property).

• The property is not threatened (respectively, the predicate is not threatening), but proving so requires more information than is obtained by our dependency and correlation analyses. For example, a more precise dependency or correlation analysis (e.g. tracking constructor evolution, as presented in Section 7.6) could be needed. A numerical or value analysis could also help determine that the parts of the input data structure which are modified by the predicate, and on which the logical property also depends, still satisfy the property after the execution of the predicate. Alternatively, the preservation of these properties can be demonstrated using an interactive prover.

• The property is not threatened (respectively, the predicate is not threatening) and the dependency and correlation summaries contain enough information to prove the non-interference of the predicate and property, but our decision procedure prototype failed to infer it. This can be due to a timeout (this initial prototype has not been optimized at all and can take a substantial time in some cases) or to precision losses in the decision procedure prototype itself.

[Plot: x-axis "Properties", y-axis "Number of predicates preserving the property inferred"]

Figure 8.3 – Distribution of the number of inferred predicates for which a property is preserved. Properties are sorted along that criterion.


Chapter 9

Conclusion and Perspectives

There is no real ending. It's just the place where you stop the story.

Frank Herbert

Despite its intuitive simplicity, the frame problem has proved to be an enduring issue with notoriously tedious implications. Its different manifestations have been studied for several decades in various contexts, ranging from Artificial Intelligence, in the context of which it was originally identified, to the field of formal specification and verification. Recently, it has received extensive attention from the object-oriented verification community, where it has been identified as a subsisting problem (Leavens, Leino, and Müller 2007) and an ideal candidate for automation (Meyer 2015). Classical approaches to addressing the frame problem typically rely on separation logic (Reynolds 2005) or ownership types (Clarke, Potter, and Noble 1998). Though the merits of such approaches are indisputable, the manual specification effort that they require is non-negligible as well. Frame properties are an integral part of a complete specification and they are mandatory for proving correctness, but ideally they should impose little additional effort. Programmers should be able to focus on the truly interesting part, namely what code does, and rely on automatic tools for the repetitive and cumbersome task of specifying and verifying frame properties.

Interactive formal verification of complex transition systems is not exempt from the manifestations of the frame problem either. Considerable effort is spent on proving the preservation of the system's invariants, even though in practice the majority of operations have a localised effect on the system and impact only a limited number of invariants at the same time. Identifying those invariants that are unaffected by an operation and automatically proving their preservation can substantially ease the proof burden for the programmer. In this thesis, we have presented an approach towards automatically inferring the preservation of framing-related invariants. It is meant to be used in the context of an interactive theorem prover and employs two different static analyses, namely a dependency analysis and a correlation analysis, whose unified results are meant to establish the disjointness between the data dependencies of a logical property and the modifications performed by an operation. The decision procedure meant to combine the results of the two analyses is still in an incipient stage. However, our preliminary experiments related to automatically answering queries regarding the preservation of certain invariants for unmodified parts are encouraging. We believe that our envisioned approach can become applicable to complex transition systems on a routine basis. Reasoning about framing can come for free, without imposing the specification of additional clauses. We also believe that automatic reasoning about framing can be achieved through static analysis. Generally, static analysis has been considered prohibitive in terms of execution time. It has been predominantly used in an automatic context and avoided in interactive contexts, where queries have to be answered fast so as not to impede the natural flow of an interactive proof. Though currently applied only on medium-sized models, given the short execution times of our dedicated static analyses, we believe that reasonable execution times for larger models can be expected as well. Therefore, we surmise that static analysis is applicable in an interactive verification context.

9.1 Contributions

The main contributions of this thesis are the designed and implemented dependency and correlation analyses, which are meant to be used in the context of an interactive theorem prover. Both analyses handle associative arrays and algebraic data types and compute fine-grained results mirroring the layered structures of such types. They target complex transition systems in general and operating systems in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e. state changes that map an input state to an output state. Both of our static analyses are concerned with deep-state manipulations, i.e. accesses and modifications, respectively.

The dependency analysis, presented in Chapter 5, automatically detects the relevant input subset needed for producing certain outputs. It handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural data-flow analysis. Furthermore, for variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario. Together with the dependency information per se, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different albeit related point of view. The first version of the dependency analysis was fully context-insensitive. In order to introduce a relaxed form of context-sensitivity, we have devised an extension based on symbolic paths. This was presented in Chapter 6.

The extension for the dependency analysis is based on computing deferred dependencies, consisting of symbolic access maps in which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are still computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty exerted by the fully context-insensitive approach without sacrificing performance. As discussed in Chapter 8, the deferred dependencies extension led to an increase of 10–20% in execution time on the used benchmarks. In terms of precision, it led to more precise dependency summaries for 50% of the predicates of the same benchmarks.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, the analysis can have applications in the testing realm, for designing and generating test suites that avoid redundant testing of the same execution scenario. Classes of inputs that will test the same execution scenario can be automatically determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted, and their values should be varied for more comprehensive testing. Furthermore, our dependency analysis could also facilitate unit testing for exceptions, as it computes specific results for every execution scenario of a predicate. Indeed, it is useful to have dedicated test cases which trigger each exception that can be thrown by a function. The set of relevant parts of the input differs for each possible exception and for the regular execution behaviour.

Our second contribution is the correlation analysis, presented in Chapter 7, which detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. The correlation analysis is an interprocedural data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus summarising the behaviour of functions and detecting not only what is modified, but also how and to what extent. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations. These allow computing expressive information regarding equivalences between subparts of the inputs and subparts of the outputs in a flexible manner.

Prototypes for both of our analyses have been implemented in OCaml. These were discussed in Chapter 8. We have applied them to a functional specification of ProvenCore (Lescuyer 2015), a general-purpose microkernel that ensures isolation. Results for medium-sized models have been obtained on average in less than 1 second with the dependency analysis and in less than 0.5 seconds with the correlation analysis. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative, precise information without sacrificing scalability.

We remark that our experience with the design and implementation of the two analyses has been rather different. The dependency analysis is much more complex semantically. This is partly a consequence of the simultaneous possible-constructors analysis, which has an impact on the abstract dependency domain. Deferred dependencies add yet another layer of complexity. However, the implementation proved to be much simpler than the implementation of the correlation analysis. The latter posed challenges due to the intermediate layer of access paths and correlations that we had to add for obtaining expressive, fine-grained information. However, the correlation analysis is simpler from a semantics point of view. It is also worth remarking that, for both analyses, an intermediate level below variables needed to be introduced as soon as fine-grained relations between pairs of variables were considered, directly or indirectly. In the case of deferred dependencies, this was not the main goal, but rather a mechanism for obtaining increased precision in specific cases for already pertinent dependency information. In contrast, for the correlation analysis, the inclusion of an intermediate level was imperative for obtaining useful, expressive information in non-trivial cases.

As a first step towards a solution for automatically inferring the preservation of framing-related invariants, we have sketched a decision procedure meant to employ our two static analyses. By uncovering equivalences between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

9.2 Future Work

We conclude this thesis with some perspectives for practical future work, as well as some theoretical open issues that we wish to address in the future.

Practical Future Work. From a practical point of view, our future work goals revolve around the full implementation of the decision procedure, its integration in the interactive theorem prover developed at Prove & Run, as well as its comprehensive assessment in a real-world context.

Decision Procedure Implementation. Our first and main goal for the near future focuses on the full implementation of the decision procedure combining our dependency and correlation summaries and answering queries related to the preservation of logical properties. The performance of the algorithm sketched in Section 8.4 should be assessed on real-world examples. The complexity of this algorithm depends on the number of paths relating two endpoints in the graph of query atom variables. It also depends on the number of correlations relating pairs of variables along the chains connecting endpoints. This could lead to a combinatorial explosion of the number of compose operations for large query graphs. Further optimizations should be investigated and applied in the algorithm implementing the decision procedure.

Validation. After having implemented the decision procedure, the precision of our two static analyses employed by it should be comprehensively assessed on various benchmarks.

Some of the theoretical aspects related to our static analyses have been formalized in Coq by Stéphane Lescuyer. However, the actual implementation of the algorithms is not formally connected to the mechanized proofs. Therefore, it would be desirable to extensively test the implementation of the analysis algorithms. This could be done by translating the dependencies and correlations to types in a sufficiently expressive type system, or by inserting runtime guards. These guards would check equalities for correlations and would taint supposedly irrelevant values identified by the dependency analysis, verifying that the output is not tainted. For the correlation analysis, inputs which are correlated to some output values could be given a universally quantified type, the same type appearing in the parts of the output which are supposed to be equal. This is commonly used as a design pattern in functional programming languages to express data-flow constraints via the type system. For the dependency analysis, each part of the input which is supposed to be irrelevant for a predicate's output could be assigned a distinct polymorphic type variable which does not appear in the output. This allows the body of the predicate to take notice of a value's presence without being able to manipulate its contents.
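A lightweight stand-in for the runtime guards mentioned above is differential testing: the supposedly irrelevant parts of the input are overwritten with probe values and the output is checked to be unchanged. The sketch below is an illustration under assumptions only; it is not the tainting mechanism itself, the state is a plain dictionary, and `handlers_ok` is a hypothetical property invented for the example.

```python
import copy

def check_irrelevant(func, state, probes):
    """Check that func's result is unchanged when supposedly irrelevant fields vary.

    probes maps each field claimed to be irrelevant to alternative values for it.
    """
    baseline = func(state)
    for field, values in probes.items():
        for value in values:
            mutated = copy.deepcopy(state)
            mutated[field] = value
            if func(mutated) != baseline:
                return False  # the dependency claim is refuted by this probe
    return True

def handlers_ok(state):
    """Hypothetical property that only reads the irq_handlers field."""
    return all(h >= 0 for h in state["irq_handlers"])

state = {"procs": [None, None], "irq_handlers": [1, 2], "current_process": 0}

# Claim reported by a dependency summary: procs and current_process are irrelevant.
print(check_irrelevant(handlers_ok, state,
                       {"procs": [[], [None]], "current_process": [1, 7]}))  # True

# A wrong claim is detected as soon as one probe changes the outcome.
print(check_irrelevant(handlers_ok, state, {"irq_handlers": [[-1]]}))  # False
```

Such a check can only refute a dependency claim, never prove it, which is why it complements rather than replaces the mechanized proofs.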

Tool Integration and Support. Another important goal for the near future is the integration of our decision procedure in the ProvenTools interactive prover. A tactic allowing to automate the inference of framing-related invariant preservation should be supported. This goal entails a sequence of other considerations that have to be addressed. Currently, the dependency and correlation analyses handle whole programs and compute summaries for every predicate of the analysed program. Though the execution times of our analyses are low, even these can prove to be cumbersome in a real-world context. Therefore, the two analyses should be adapted so as to allow incrementally analysing only parts of a program. Caching the results of the analyses across invocations of the decision procedure could prove to be efficient as well. Additionally, the mechanism of answering queries regarding invariant preservation should be transparent, allowing users to see the reasoning steps behind the decision procedure. Transparency is necessary for the ProvenTools prover, which targets products that have to be certified. This possibly also requires a more concise output notation for the dependency and correlation summaries, in order to ease the interpretation of results. Currently, they tend to be rather verbose for predicates handling composite values with a large number of subelements.

For the dependency summaries, a parser was implemented, allowing users to annotate predicates with expected dependency information. A similar parser could be written for the correlation summaries. These annotations are a useful tool for testing the analyses on benchmarks for which the correlations and dependencies are known. In addition, they would allow users to annotate programs with constraints on the expected dependencies and correlations, similarly to type annotations in the presence of type inference, and check that these expectations hold.

Finally, the decision procedure and our dependency and correlation analyses could be offered as a software library. A public API should describe and prescribe the expected behavior of our two static analyses and of the decision procedure relying on them.

Theoretical Perspective. From a theoretical perspective, several interesting aspects remain open. In a nutshell, these consist in developing support for more sophisticated queries that could be answered by our decision procedure. The precision of our dependency and correlation analyses can be further increased as well.


Decision Procedure. A first interesting theoretical effort revolves around the formalization of our envisioned decision procedure, used for inferring framing-related invariants. The types of queries it can answer should be further investigated and extended. For instance, it would be desirable to assert as a hypothesis that certain predicates are known to be valid on some nodes of the graph. We further identified two extensions for our correlation analysis that could increase the number of answered queries.
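The core question such a decision procedure answers can be sketched for a single operation. The sketch assumes, as a strong simplification, that the dependency analysis yields the set of field paths an invariant reads and that the correlation analysis yields the set of field paths an operation leaves equal between input and output. The names `is_prefix` and `preserved` are illustrative only and do not come from our implementation.

```python
from typing import Iterable, Tuple

Path = Tuple[str, ...]  # e.g. ("memory", "pages") for the field memory.pages


def is_prefix(prefix: Path, path: Path) -> bool:
    """True if `path` designates `prefix` itself or one of its subelements."""
    return path[:len(prefix)] == prefix


def preserved(depends_on: Iterable[Path], unchanged: Iterable[Path]) -> bool:
    """Conservative frame check for one invariant and one operation.

    Returns True when every part the invariant depends on lies under a part
    that the operation leaves unchanged, in which case the invariant holding
    on the input implies that it holds on the output. Returning False only
    means "cannot conclude", not that the invariant is broken.
    """
    unchanged = list(unchanged)
    return all(any(is_prefix(u, d) for u in unchanged) for d in depends_on)
```

An invariant that depends on nothing is trivially preserved. An invariant that depends on a whole structure is not shown preserved by equalities on only some of its fields, which keeps the answer on the safe side.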

Constructor Evolution. For increasing the number of queries that our decision procedure can answer, one direction to investigate is the extension of our correlation analysis in order to track and compute information regarding the evolution of variant constructors. This additional information should be leveraged in the context of our decision procedure. The formalization and implementation of this extension constitute an interesting effort. Furthermore, other types of relations between variables could be considered as well.

Correlations between Inputs. Another extension of our correlation analysis that would enrich the types of queries that can be answered by our decision procedure consists in tracking correlations between pairs of inputs, in addition to the ones computed between pairs of inputs and outputs. Besides the unified treatment of both actual code and logical properties on the correlation analysis side, this would allow answering queries that consist in a single logical property on multiple input values that are additionally related by other facts. It would also allow detecting aliasing between variables used as array indices.

Numerical Analysis for Arrays. Arrays are a source of precision loss in both of our static analyses. Hence, it would be interesting to investigate the impact of using simple numerical abstractions (congruence modulo and linear abstract domains). The numerical analysis could otherwise be offloaded to an external SMT solver, such as Z3 or Alt-Ergo, for instance. Symbolic evaluation of the arithmetic computations should also be possible. This would avoid precision losses when joining two dependencies or correlations with exceptional information on distinct index variables which prove to have the same integer value in practice. Eliminating this source of imprecision would likely benefit the analysis of loops over arrays.
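To make the idea of a simple numerical abstraction concrete, the following Python sketch implements the standard congruence abstract domain, in which a pair (m, r) stands for the set of integers r + k·m. It is not part of our implementation, and how such values would be attached to index variables in the dependency and correlation domains is left open. The function names are illustrative.

```python
from math import gcd
from typing import Tuple

# (m, r) stands for {r + k*m | k integer}; m == 0 denotes the constant r.
Congruence = Tuple[int, int]


def normalise(m: int, r: int) -> Congruence:
    """Canonical form: non-negative modulus, remainder reduced modulo m."""
    m = abs(m)
    return (m, r % m) if m else (0, r)


def join(a: Congruence, b: Congruence) -> Congruence:
    """Least congruence class containing both a and b."""
    (m1, r1), (m2, r2) = a, b
    m = gcd(gcd(m1, m2), abs(r1 - r2))
    return normalise(m, r1)


def may_be_equal(a: Congruence, b: Congruence) -> bool:
    """False only when the two index abstractions can never coincide."""
    (m1, r1), (m2, r2) = a, b
    g = gcd(m1, m2)
    return r1 == r2 if g == 0 else (r1 - r2) % g == 0
```

If two distinct index variables are both abstracted by the constant 4, their join is again the constant 4, so exceptional information about that single cell need not be discarded. Conversely, an even index and an odd index are reported as never equal, which rules out aliasing between the corresponding cells.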

In conclusion, we have devised and implemented two static analyses detecting the data dependencies of a logical property, as well as correlations between the inputs and the outputs of operations. Our first results on a functional model of a microkernel are encouraging, both in terms of precision and speed, making these analyses suitable for use in the context of interactive provers. Aside from incremental improvements on the precision of our analyses, the next steps are to combine them in order to detect invariants which are not affected by the execution of a predicate and to integrate this as a tactic in the ProvenTools theorem prover. We believe that reasoning about framing can come for free, without imposing additional annotations. Inferring the preservation of framing-related invariants through static analysis can become applicable on a routine basis for complex transition systems.


Leino K Rustan M and Peter Muumlller (2008a) ldquoUsing the Spec Language Method-ology and Tools to Write Bug-Free Programsrdquo In Advanced Lectures on SoftwareEngineering LASER Summer School 20072008 pp 91ndash139 doi 101007978-3-642-13010-6_4 url httpdxdoiorg101007978-3-642-13010-6_4

mdash (2008b) ldquoVerification of Equivalent-Results Methodsrdquo In Programming Languagesand Systems 17th European Symposium on Programming ESOP 2008 Held as Partof the Joint European Conferences on Theory and Practice of Software ETAPS2008 Budapest Hungary March 29-April 6 2008 Proceedings pp 307ndash321 doi101007978-3-540-78739-6_24 url httpdxdoiorg101007978-3-540-78739-6_24

Leino K Rustan M Peter Muumlller and Jan Smans (2009) ldquoVerification of Concur-rent Programs with Chalicerdquo In Foundations of Security Analysis and Design VFOSAD 200720082009 Tutorial Lectures pp 195ndash222 doi 101007978- 3-642-03829-7_7 url httpdxdoiorg101007978-3-642-03829-7_7

Leino K Rustan M Peter Muumlller and Angela Wallenburg (2008) ldquoFlexible Im-mutability with Frozen Objectsrdquo In Verified Software Theories Tools Experi-ments Second International Conference VSTTE 2008 Toronto Canada October6-9 2008 Proceedings pp 192ndash208 doi 101007978-3-540-87873-5_17 urlhttpdxdoiorg101007978-3-540-87873-5_17

Leino K Rustan M and Greg Nelson (1998) ldquoAn Extended Static Checker for Modular-3rdquo In Compiler Construction 7th International Conference CCrsquo98 Held as Part ofthe European Joint Conferences on the Theory and Practice of Software ETAPSrsquo98Lisbon Portugal March 28 - April 4 1998 Proceedings pp 302ndash305 doi 101007BFb0026441 url httpdxdoiorg101007BFb0026441

mdash (2002) ldquoData Abstraction and Information Hidingrdquo In ACM Trans ProgramLang Syst 245 pp 491ndash553 doi 101145570886570888 url httpdoiacmorg101145570886570888

Leino K Rustan M Arnd Poetzsch-Heffter and Yunhong Zhou (2002) ldquoUsing DataGroups to Specify and Check Side Effectsrdquo In Proceedings of the 2002 ACM SIG-PLAN Conference on Programming Language Design and Implementation (PLDI)Berlin Germany June 17-19 2002 pp 246ndash257 doi 101145512529512559url httpdoiacmorg101145512529512559

Leino K Rustan M and Philipp Ruumlmmer (2010) ldquoA Polymorphic Intermediate Ver-ification Language Design and Logical Encodingrdquo In Tools and Algorithms forthe Construction and Analysis of Systems 16th International Conference TACAS2010 Held as Part of the Joint European Conferences on Theory and Practice ofSoftware ETAPS 2010 Paphos Cyprus March 20-28 2010 Proceedings pp 312ndash327 doi 101007978-3-642-12002-2_26 url httpdxdoiorg101007978-3-642-12002-2_26

Leroy Xavier (2009) ldquoA Formally Verified Compiler Back-endrdquo In J Autom Reason-ing 434 pp 363ndash446 doi 101007s10817-009-9155-4 url httpdxdoiorg101007s10817-009-9155-4

BIBLIOGRAPHY 223

Leroy Xavier and Franccedilois Pessaux (2000) ldquoType-Based Analysis of Uncaught Excep-tionsrdquo In ACM Trans Program Lang Syst 222 pp 340ndash377 doi 101145349214349230 url httpdoiacmorg101145349214349230

Lescuyer Steacutephane (2015) ldquoProvenCore Towards a Verified Isolation Micro-KernelrdquoIn International Workshop on MILS Architecture and Assurance for Secure Sys-tems url httpmils-workshop-2015euromilseu

Leuschel Michael and Morten Heine Soslashrensen (1996) ldquoRedundant Argument Filteringof Logic Programsrdquo In Logic Programming Synthesis and Transformation 6th In-ternational Workshop LOPSTRrsquo96 Stockholm Sweden August 28-30 1996 Pro-ceedings pp 83ndash103 doi 1010073-540-62718-9_6 url httpdxdoiorg1010073-540-62718-9_6

Lhotaacutek Ondrej and Laurie J Hendren (2006) ldquoContext-Sensitive Points-to AnalysisIs It Worth Itrdquo In Compiler Construction 15th International Conference CC2006 Held as Part of the Joint European Conferences on Theory and Practice ofSoftware ETAPS 2006 Vienna Austria March 30-31 2006 Proceedings pp 47ndash64 doi 10100711688839_5 url httpdxdoiorg10100711688839_5

Liang Sheng (1999) Java Native Interface Programmerrsquos Guide and Reference 1stBoston MA USA Addison-Wesley Longman Publishing Co Inc isbn 0201325772

Liskov Barbara and John Guttag (1986) Abstraction and Specification in ProgramDevelopment Cambridge MA USA MIT Press isbn 0-262-12112-3

Liu Yanhong A (1998) ldquoDependence Analysis for Recursive Datardquo In Proceedings ofthe 1998 International Conference on Computer Languages ICCL 1998 ChicagoIL USA May 14-16 1998 pp 206ndash215 doi 101109ICCL1998674171 urlhttpdxdoiorg101109ICCL1998674171

Liu Yanhong A and Scott D Stoller (2003) ldquoEliminating Dead Code on RecursiveDatardquo In Sci Comput Program 472-3 pp 221ndash242 doi 10 1016 S0167 -6423(02)00134-X url httpdxdoiorg101016S0167-6423(02)00134-X

Lu Yi John Potter and Jingling Xue (2007) ldquoValidity Invariants and Effectsrdquo InECOOP 2007 - Object-Oriented Programming 21st European Conference BerlinGermany July 30 - August 3 2007 Proceedings pp 202ndash226 doi 101007978-3-540-73589-2_11 url httpdxdoiorg101007978-3-540-73589-2_11

Marcheacute Claude Christine Paulin-Mohring and Xavier Urbain (2004) ldquoThe KRAKA-TOA Tool for Certification of JAVAJAVACARD Programs Annotated in JMLrdquo InJ Log Algebr Program 581-2 pp 89ndash106 doi 101016jjlap200307006url httpdxdoiorg101016jjlap200307006

Marcheacute Claude (2016) The Krakatoa Verification Tool for Java Programs KrakatoaTutorial and Reference Manual url httpkrakatoalrifrkrakatoapdf

Martin-Loumlf Per (1984) Intuitionistic Type Theory Naples BibliopolisMcCarthy John and Patrick J Hayes (1969) ldquoSome Philosophical Problems from the

Standpoint of Artificial Intelligencerdquo In Machine Intelligence Edinburgh Univer-sity Press

Meyer Bertrand (1991) Eiffel The Language Prentice-Hall isbn 0-13-247925-7mdash (1992) ldquoApplying Design by Contractrdquo In IEEE Computer 2510 pp 40ndash51

doi 1011092161279 url httpdxdoiorg1011092161279

224 BIBLIOGRAPHY

Meyer Bertrand (1997) Object-Oriented Software Construction 2nd Edition Prentice-Hall isbn 0-13-629155-4

mdash (2010) ldquoTowards a Theory and Calculus of Aliasingrdquo In Journal of Object Tech-nology 92 pp 37ndash74 doi 105381jot201092c5 url httpdxdoiorg105381jot201092c5

mdash (2011) ldquoSteps Towards a Theory and Calculus of Aliasingrdquo In Int J Softwareand Informatics 51-2 pp 77ndash115 url httpwwwijsiorgchreaderview_abstractaspxfile_no=i77

mdash (2015) ldquoFraming the Frame Problemrdquo In Dependable Software Systems Engineer-ing pp 193ndash203 doi 103233978-1-61499-495-4-193 url httpdxdoiorg103233978-1-61499-495-4-193

Midtgaard Jan (2012) ldquoControl-Flow Analysis of Functional Programsrdquo In ACMComput Surv 443 p 10 doi 10114521876712187672 url httpdoiacmorg10114521876712187672

Mike Barnett Rustan Leino Wolfram Schulte (2005) ldquoThe Spec Programming Sys-tem An Overviewrdquo In CASSIS 2004 Construction and Analysis of Safe Secureand Interoperable Smart devices Vol 3362 Springer pp 49ndash69 url httpswwwmicrosoftcomen-usresearchpublicationthe-spec-programming-system-an-overview

Milanova Ana Atanas Rountev and Barbara G Ryder (2005) ldquoParameterized ObjectSensitivity for Points-To Analysis for Javardquo In ACM Trans Softw Eng Methodol141 pp 1ndash41 doi 10114510448341044835 url httpdoiacmorg10114510448341044835

Montenegro Manuel Ricardo Pentildea and Clara Segura (2015) ldquoShape Analysis in aFunctional Language by Using Regular Languagesrdquo In Sci Comput Program 111pp 51ndash78 doi 101016jscico201412006 url httpdxdoiorg101016jscico201412006

Morgenstern Leora (1995) ldquoThe Problem with Solutions to the Frame Problemrdquo InThe Robotrsquos Dilemma Revisited The Frame Problem in Artificial Intelligence AblexAblex Publishing Co pp 99ndash133

Moura Leonardo Mendonccedila de and Nikolaj Bjoslashrner (2008) ldquoZ3 An Efficient SMTSolverrdquo In Tools and Algorithms for the Construction and Analysis of Systems14th International Conference TACAS 2008 Held as Part of the Joint EuropeanConferences on Theory and Practice of Software ETAPS 2008 Budapest HungaryMarch 29-April 6 2008 Proceedings pp 337ndash340 doi 101007978- 3- 540-78800-3_24 url httpdxdoiorg101007978-3-540-78800-3_24

Muumlller Peter (2002) Modular Specification and Verification of Object-Oriented Pro-grams Vol 2262 Lecture Notes in Computer Science Springer isbn 3-540-43167-5 doi 1010073-540-45651-1 url httpdxdoiorg1010073-540-45651-1

Muumlller Peter Arnd Poetzsch-Heffter and Gary T Leavens (2003) ldquoModular Specifi-cation of Frame Properties in JMLrdquo In Concurrency and Computation Practiceand Experience 152 pp 117ndash154 doi 101002cpe713 url httpdxdoiorg101002cpe713

BIBLIOGRAPHY 225

mdash (2006) ldquoModular Invariants for Layered Object Structuresrdquo In Sci Comput Pro-gram 623 pp 253ndash286 doi 10 1016 j scico 2006 03 001 url http dxdoiorg101016jscico200603001

Naudziuniene Daiva Matko Botincan Dino Distefano Mike Dodds Radu Grigore andMatthew J Parkinson (2011) ldquojStar-Eclipse An IDE for Automated Verificationof Java Programsrdquo In SIGSOFTFSErsquo11 19th ACM SIGSOFT Symposium on theFoundations of Software Engineering (FSE-19) and ESECrsquo11 13th European Soft-ware Engineering Conference (ESEC-13) Szeged Hungary September 5-9 2011pp 428ndash431 doi 10114520251132025182 url httpdoiacmorg10114520251132025182

Naur Peter (1966) ldquoProof of Algorithms by General Snapshotsrdquo In BIT NumericalMathematics 64 pp 310ndash316 issn 1572-9125 doi 101007BF01966091 urlhttpdxdoiorg101007BF01966091

Nelson Greg and Derek C Oppen (1980) ldquoFast Decision Procedures Based on Con-gruence Closurerdquo In J ACM 272 pp 356ndash364 doi 101145322186322198url httpdoiacmorg101145322186322198

Nielson Flemming and Hanne Riis Nielson (1999) ldquoInterprocedural Control Flow Anal-ysisrdquo In Programming Languages and Systems 8th European Symposium on Pro-gramming ESOPrsquo99 Held as Part of the European Joint Conferences on the Theoryand Practice of Software ETAPSrsquo99 Amsterdam The Netherlands 22-28 March1999 Proceedings pp 20ndash39 doi 10 1007 3 - 540 - 49099 - X _ 3 url http dxdoiorg1010073-540-49099-X_3

Nielson Flemming Hanne Riis Nielson and Chris Hankin (1999) Principles of ProgramAnalysis Springer isbn 978-3-540-65410-0

Nordio Martin Cristiano Calcagno Bertrand Meyer Peter Muumlller and Julian Tschan-nen (2010) ldquoReasoning about Function Objectsrdquo In Objects Models ComponentsPatterns 48th International Conference TOOLS 2010 Maacutelaga Spain June 28 -July 2 2010 Proceedings pp 79ndash96 doi 101007978-3-642-13953-6_5 urlhttpdxdoiorg101007978-3-642-13953-6_5

Nordstroumlm Bengt Kent Petersson and Jan M Smith (1990) Programming in Martin-Loumlfrsquos Type Theory Vol 200 Oxford University Press Oxford

OrsquoCallahan Robert and Daniel Jackson (1997) ldquoLackwit A Program UnderstandingTool Based on Type Inferencerdquo In Pulling Together Proceedings of the 19th Inter-national Conference on Software Engineering Boston Massachusetts USA May17-23 1997 Pp 338ndash348 doi 101145253228253351 url httpdoiacmorg101145253228253351

OrsquoHearn Peter W (2005) ldquoScalable Specification and Reasoning Challenges for Pro-gram Logicrdquo In Verified Software Theories Tools Experiments First IFIP TC2WG 23 Conference VSTTE 2005 Zurich Switzerland October 10-13 2005Revised Selected Papers and Discussions pp 116ndash133 doi 101007978-3-540-69149-5_14 url httpdxdoiorg101007978-3-540-69149-5_14

mdash (2012) ldquoA Primer on Separation Logic (and Automatic Program Verification andAnalysis)rdquo In Software Safety and Security - Tools for Analysis and Verification

226 BIBLIOGRAPHY

pp 286ndash318 doi 103233978-1-61499-028-4-286 url httpdxdoiorg103233978-1-61499-028-4-286

OrsquoHearn Peter W John C Reynolds and Hongseok Yang (2001) ldquoLocal Reasoningabout Programs that Alter Data Structuresrdquo In Computer Science Logic 15thInternational Workshop CSL 2001 10th Annual Conference of the EACSL ParisFrance September 10-13 2001 Proceedings pp 1ndash19 doi 1010073-540-44802-0_1 url httpdxdoiorg1010073-540-44802-0_1

OrsquoHearn Peter W Hongseok Yang and John C Reynolds (2004) ldquoSeparation andInformation Hidingrdquo In Proceedings of the 31st ACM SIGPLAN-SIGACT Sympo-sium on Principles of Programming Languages POPL 2004 Venice Italy January14-16 2004 pp 268ndash280 doi 101145964001964024 url httpdoiacmorg101145964001964024

Padhye Rohan and Uday P Khedker (2013) ldquoInterprocedural Data Flow Analysisin Soot Using Value Contextsrdquo In Proceedings of the 2nd ACM SIGPLAN In-ternational Workshop on State Of the Art in Java Program analysis SOAP 2013Seattle WA USA June 20 2013 pp 31ndash36 doi 10114524875682487569url httpdoiacmorg10114524875682487569

Park Young Gil and Benjamin Goldberg (1992) ldquoEscape Analysis on Listsrdquo In Pro-ceedings of the ACM SIGPLANrsquo92 Conference on Programming Language Designand Implementation (PLDI) San Francisco California USA June 17-19 1992pp 116ndash127 doi 101145143095143125 url httpdoiacmorg101145143095143125

Parkinson Matthew J and Gavin M Bierman (2005) ldquoSeparation Logic and Ab-stractionrdquo In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium onPrinciples of Programming Languages POPL 2005 Long Beach California USAJanuary 12-14 2005 pp 247ndash258 doi 10114510403051040326 url httpdoiacmorg10114510403051040326

Parkinson Matthew J Richard Bornat and Cristiano Calcagno (2006) ldquoVariables asResource in Hoare Logicsrdquo In 21th IEEE Symposium on Logic in Computer Science(LICS 2006) 12-15 August 2006 Seattle WA USA Proceedings pp 137ndash146 doi101109LICS200652 url httpdxdoiorg101109LICS200652

Pierce Benjamin C (2002) Types and Programming Languages MIT Press isbn 978-0-262-16209-8

Plotkin Gordon D (2004) ldquoA Structural Approach to Operational Semanticsrdquo In JLog Algebr Program 60-61 pp 17ndash139

Polikarpova Nadia Carlo A Furia Yu Pei Yi Wei and Bertrand Meyer (2013) ldquoWhatGood are Strong Specificationsrdquo In 35th International Conference on SoftwareEngineering ICSE rsquo13 San Francisco CA USA May 18-26 2013 pp 262ndash271doi 101109ICSE20136606572 url httpdxdoiorg101109ICSE20136606572

Praun Christoph von and Thomas R Gross (2003) ldquoStatic Conflict Analysis forMulti-Threaded Object-Oriented Programsrdquo In Proceedings of the ACM SIGPLAN2003 Conference on Programming Language Design and Implementation 2003 San

BIBLIOGRAPHY 227

Diego California USA June 9-11 2003 pp 115ndash128 doi 101145781131781145 url httpdoiacmorg101145781131781145

Rakamaric Zvonimir and Alan J Hu (2008) ldquoAutomatic Inference of Frame AxiomsUsing Static Analysisrdquo In 23rd IEEEACM International Conference on Auto-mated Software Engineering (ASE 2008) pp 89ndash98 doi 101109ASE200819url httpdxdoiorg101109ASE200819

Reacutemy Didier and Jerome Vouillon (1997) ldquoObjective ML A Simple Object-OrientedExtension of MLrdquo In Conference Record of POPLrsquo97 The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages Papers Presentedat the Symposium Paris France 15-17 January 1997 pp 40ndash53 doi 101145263699263707 url httpdoiacmorg101145263699263707

Reps Thomas W Susan Horwitz and Shmuel Sagiv (1995) ldquoPrecise InterproceduralDataflow Analysis via Graph Reachabilityrdquo In Conference Record of POPLrsquo9522nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-guages San Francisco California USA January 23-25 1995 pp 49ndash61 doi101145199448199462 url httpdoiacmorg101145199448199462

Reps Thomas W and Todd Turnidge (1996) ldquoProgram Specialization via ProgramSlicingrdquo In Partial Evaluation International Seminar Dagstuhl Castle GermanyFebruary 12-16 1996 Selected Papers pp 409ndash429 doi 1010073-540-61580-6_20 url httpdxdoiorg1010073-540-61580-6_20

Reynolds John C (1981) The Craft of Programming Prentice Hall International seriesin computer science Prentice Hall isbn 978-0-13-188862-3

mdash (2000) ldquoIntuitionistic Reasoning about Shared Mutable Data Structurerdquo In Mil-lennial Perspectives in Computer Science Palgrave pp 303ndash321

mdash (2002) ldquoSeparation Logic A Logic for Shared Mutable Data Structuresrdquo In 17thIEEE Symposium on Logic in Computer Science (LICS 2002) 22-25 July 2002Copenhagen Denmark Proceedings pp 55ndash74 doi 101109LICS20021029817url httpdxdoiorg101109LICS20021029817

mdash (2005) ldquoAn Overview of Separation Logicrdquo In Verified Software Theories ToolsExperiments First IFIP TC 2WG 23 Conference VSTTE 2005 Zurich Switzer-land October 10-13 2005 Revised Selected Papers and Discussions pp 460ndash469doi 101007978-3-540-69149-5_49 url httpdxdoiorg101007978-3-540-69149-5_49

Robert Valentin and Xavier Leroy (2012) ldquoA Formally-Verified Alias Analysisrdquo InCertified Programs and Proofs - Second International Conference CPP 2012 KyotoJapan December 13-15 2012 Proceedings pp 11ndash26 doi 101007978-3-642-35308-6_5 url httpdxdoiorg101007978-3-642-35308-6_5

Ruf Erik (1995) ldquoContext-Insensitive Alias Analysis Reconsideredrdquo In Proceedingsof the ACM SIGPLAN 1995 Conference on Programming Language Design andImplementation PLDI rsquo95 La Jolla California USA ACM pp 13ndash22 isbn 0-89791-697-2 doi 101145207110207112 url httpdoiacmorg101145207110207112

Sabelfeld Andrei and Andrew C Myers (2003) ldquoLanguage-Based Information-FlowSecurityrdquo In IEEE Journal on Selected Areas in Communications 211 pp 5ndash19

228 BIBLIOGRAPHY

doi 101109JSAC2002806121 url httpdxdoiorg101109JSAC2002806121

Sagiv Shmuel Thomas W Reps and Reinhard Wilhelm (1999) ldquoParametric ShapeAnalysis via 3-Valued Logicrdquo In POPL rsquo99 Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 1999 pp 105ndash118doi 101145292540292552 url httpdoiacmorg101145292540292552

Salcianu Alexandru and Martin C Rinard (2005) ldquoPurity and Side Effect Analysis forJava Programsrdquo In Verification Model Checking and Abstract Interpretation 6thInternational Conference VMCAI 2005 Proceedings pp 199ndash215 doi 101007978-3-540-30579-8_14 url httpdxdoiorg101007978-3-540-30579-8_14

Shapiro Marc and Susan Horwitz (1997) ldquoThe Effects of the Precision of Pointer Anal-ysisrdquo In Static Analysis 4th International Symposium SAS rsquo97 Paris FranceSeptember 8-10 1997 Proceedings pp 16ndash34 doi 101007BFb0032731 urlhttpdxdoiorg101007BFb0032731

Sharir M and A Pnueli (1978) Two Approaches to Interprocedural Data Flow AnalysisNew York NY New York Univ Comput Sci Dept url httpscdscernchrecord120118

Shostak Robert E (1984) ldquoDeciding Combinations of Theoriesrdquo In J ACM 311pp 1ndash12 doi 1011452422322411 url httpdoiacmorg1011452422322411

Smans Jan Bart Jacobs and Frank Piessens (2008) ldquoVeriCool An Automatic Verifierfor a Concurrent Object-Oriented Languagerdquo In Formal Methods for Open Object-Based Distributed Systems 10th IFIP WG 61 International Conference FMOODS2008 Oslo Norway June 4-6 2008 Proceedings pp 220ndash239 doi 101007978-3-540-68863-1_14 url httpdxdoiorg101007978-3-540-68863-1_14

mdash (2012) ldquoImplicit Dynamic Framesrdquo In ACM Trans Program Lang Syst 34121ndash258 doi 10114521609102160911 url httpdoiacmorg10114521609102160911

Sozeau Matthieu (2009) ldquoA New Look at Generalized Rewriting in Type TheoryrdquoIn J Formalized Reasoning 21 pp 41ndash62 doi 106092issn1972-57871574url httpdxdoiorg106092issn1972-57871574

Sozeau Matthieu and the COQ development team (1997) The Coq Proof AssistantReference Manual Version 86 Inria

Sridharan Manu Denis Gopan Lexin Shan and Rastislav Bodiacutek (2005) ldquoDemand-Driven Points-to Analysis for Javardquo In Proceedings of the 20th Annual ACM SIG-PLAN Conference on Object-oriented Programming Systems Languages and Ap-plications OOPSLA rsquo05 San Diego CA USA ACM pp 59ndash76 isbn 1-59593-031-0 doi 10114510948111094817 url httpdoiacmorg10114510948111094817

Strachey Christopher (1967) Fundamental Concepts in Programming Languages Lec-ture Notes International Summer School in Computer Programming CopenhagenReprinted in Higher-Order and Symbolic Computation 13(12) pp 1ndash49 2000

BIBLIOGRAPHY 229

Taghdiri Mana Robert Seater and Daniel Jackson (2006) ldquoLightweight Extraction ofSyntactic Specificationsrdquo In Proceedings of the 14th ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering FSE 2006 pp 276ndash286 doi10114511817751181809 url httpdoiacmorg10114511817751181809

Tip Frank (1995) ldquoA Survey of Program Slicing Techniquesrdquo In J Prog Lang 33url httpcompscinetdcskclacukJPjp030301abshtml

Vardi Moshe Y and Pierre Wolper (1994) ldquoReasoning about Infinite ComputationsrdquoIn Information and Computation 115 pp 1ndash37

Volpano Dennis M Cynthia E Irvine and Geoffrey Smith (1996) ldquoA Sound TypeSystem for Secure Flow Analysisrdquo In Journal of Computer Security 423 pp 167ndash188 doi 103233JCS-1996-42-304 url httpdxdoiorg103233JCS-1996-42-304

Wadler Philip and R J M Hughes (1987) ldquoProjections for Strictness Analysisrdquo InFunctional Programming Languages and Computer Architecture Portland OregonUSA September 14-16 1987 Proceedings pp 385ndash407 doi 1010073- 540-18317-5_21 url httpdxdoiorg1010073-540-18317-5_21

Wand Mitchell and William D Clinger (1998) ldquoSet Constraints for Destructive ArrayUpdate Optimizationrdquo In Proceedings of the 1998 International Conference onComputer Languages ICCL 1998 Chicago IL USA May 14-16 1998 pp 184ndash195 doi 101109ICCL1998674169 url httpdxdoiorg101109ICCL1998674169

Wand Mitchell and Igor Siveroni (1999) ldquoConstraint Systems for Useless VariableEliminationrdquo In POPL rsquo99 Proceedings of the 26th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages San Antonio TX USAJanuary 20-22 1999 pp 291ndash302 doi 101145292540292567 url httpdoiacmorg101145292540292567

Weiser Mark (1984) ldquoProgram Slicingrdquo In IEEE Trans Software Eng 104 pp 352ndash357 doi 101109TSE19845010248 url httpdxdoiorg101109TSE19845010248

Wing Jeannette M (1987) ldquoWriting Larch Interface Language Specificationsrdquo InACM Trans Program Lang Syst 91 pp 1ndash24 doi 101145975810500 urlhttpdoiacmorg101145975810500

Xtext Documentation httpseclipseorgXtext Accessed 2016-09-11Zee Karen Viktor Kuncak and Martin C Rinard (2008) ldquoFull Functional Verification

of Linked Data Structuresrdquo In Proceedings of the ACM SIGPLAN 2008 Conferenceon Programming Language Design and Implementation Tucson AZ USA June 7-13 2008 pp 349ndash361 doi 10114513755811375624 url httpdoiacmorg10114513755811375624

Zhao Yang and John Boyland (2008) ldquoA Fundamental Permission Interpretation forOwnership Typesrdquo In Second IEEEIFIP International Symposium on TheoreticalAspects of Software Engineering TASE 2008 June 17-19 2008 Nanjing Chinapp 65ndash72 doi 101109TASE200845 url httpdxdoiorg101109TASE200845

230 BIBLIOGRAPHY

Zheng Xin and Radu Rugina (2008) ldquoDemand-Driven Alias Analysis for Crdquo In Pro-ceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages POPL rsquo08 San Francisco California USA ACM pp 197ndash208 isbn 978-1-59593-689-9 doi 10114513284381328464 url httpdoiacmorg10114513284381328464

  • Résumé étendu en Français
    • Le Problème du Frame
    • Objectifs
    • Analyse de dépendance
    • Analyse de corrélation
    • Procédure de décision
    • Conclusion
  • Introduction
    • Formal Verification of Software
    • The Frame Problem in a Nutshell
    • Prove & Run Objectives and Products
    • Context and Problem Statement
    • Contributions and Structure of the Document
  • The Frame Problem in Software Verification
    • Specification Languages and Verification Tools
    • Manifestations of the Frame Problem
    • Approaches to Specifying Frame Properties
      • The Manual Approach
      • The Exclusive Approach
      • The Implicit Approach
    • Topologies and Effects
      • Explicit Footprints
      • Implicit Footprints
      • Predefined Footprints
    • Other Approaches to Reason about Frames
    • Other Relevant Work
  • The Smart Language and ProvenTools
    • The Smart Modeling Language
      • Smart Predicates and Types
      • Exit Labels and Control Flow
      • Polymorphism & Algebraic Data Types
      • Specifications
      • Illustrating Smart – An Abstract Process Manager
    • ProvenTools
    • Smil
  • The alpha-Smil Language
    • alpha-Smil Syntax
    • Control Flow Graph
    • Well-Typed Smil Statements
    • Operational Semantics of Smil Statements
  • Dependency Analysis for Functional Specifications
    • Dependency Analysis in a Nutshell
      • Targeted Dependency Information
      • Outline
    • Abstract Dependency Domain
      • Join and Reduction Operator
      • Well-Typed Dependencies
    • Intraprocedural Analysis and Data-Flow Equations
      • Intraprocedural Dependency Domains
      • Intraprocedural Data-Flow Equations
      • Intraprocedural Dependency Analysis Illustrated
    • Interprocedural Dependencies
      • Interprocedural Dependency Analysis Illustrated
      • Context-Insensitivity and its Consequences
    • Semantics of Dependency Values
    • Related Work
    • Conclusion
  • Deferred Dependencies: Injecting Context in Dependency Summaries
    • Dealing with Context-Insensitivity
    • Symbolic Dependency Components in a Nutshell
    • Symbolic Paths
      • Symbolic Path Type
      • Semantics of Symbolic Paths
      • Well-Typed Paths and Path Sets
    • Abstract Dependency Domain with Deferred Accesses
    • Deferred Dependencies at the Intraprocedural Level
      • Extended Intraprocedural Dependency Analysis
      • Intraprocedural Dependency Analysis Illustrated
    • Deferred Dependencies at the Interprocedural Level
      • Applying Context-Sensitive Information by Substitution
      • Wrapped Calls and Results
    • Related Work
    • Conclusion
  • Correlation Analysis
    • Introduction
      • Targeted Correlation Information
      • Correlation Analysis in a Nutshell
    • Partial Equivalence Relations
      • Abstract Partial Equivalence Type
      • Well-Typed Partial Equivalences and their Semantics
    • Paths and Correlations
      • Paths and Correlation Types
      • Alignment and Partial Order
    • Intraprocedural Correlation Analysis
      • Intraprocedural Correlation Summaries and Analysis
      • Intraprocedural Correlation Analysis Illustrated
    • Interprocedural Correlation Analysis
    • Extension – Constructor Evolution
    • Related Work
    • Conclusion
  • Implementation, Application and Results
    • Implementation of the Dependency Analysis
      • Dependency Type and Operators
      • Intraprocedural Dependency Analysis
    • Implementation of the Correlation Analysis
      • Partial Equivalence Relations and Operators
      • Intraprocedural Correlations
      • Dependency and Correlation Analysers
    • Dependency and Correlation Results on ProvenCore Layers
      • ProvenCore Description
      • Obtained Dependency and Correlation Results
      • Precision of our Dependency and Correlation Summaries
    • Reasoning about Framing using Correlations and Dependencies
      • A Decision Procedure
      • Types of Targeted Queries
    • Decision Procedure Experiments
                                                                                                              • Conclusion and Perspectives
                                                                                                                • Contributions
                                                                                                                • Future Work
                                                                                                                  • Bibliography

UNIVERSITÉ DE RENNES 1

Abstract

Prove & Run
École doctorale Matisse

DOCTEUR DE L'UNIVERSITÉ DE RENNES 1

Static Analysis of Functional Programs with an Application to the Frame Problem in Deductive Verification

by Oana Fabiana Andreescu

In the field of software verification, the frame problem refers to establishing the boundaries within which program elements operate. It has notoriously tedious consequences on the specification of frame properties, which indicate the parts of the program state that an operation is allowed to modify, as well as on their verification, i.e. proving that operations modify only what is specified by their frame properties. In the context of interactive formal verification of complex systems, such as operating systems, much effort is spent addressing these consequences and proving the preservation of the systems' invariants. However, most operations have a localized effect on the system and impact only a limited number of invariants at the same time. In this thesis we address the issue of identifying those invariants that are unaffected by an operation and we present a solution for automatically inferring their preservation. Our solution is meant to ease the proof burden for the programmer. It is based on static analysis and does not require any additional frame annotations. Our strategy consists in combining a dependency analysis and a correlation analysis. We have designed and implemented both static analyses for a strongly-typed functional language that handles structures, variants and arrays. The dependency analysis computes a conservative approximation of the input fragments on which functional properties and operations depend. The correlation analysis computes a safe approximation of the parts of an input state to a function that are copied to the output state. It summarizes not only what is modified but also how it is modified and to what extent. By employing these two static analyses and by subsequently reasoning based on their combined results, an interactive theorem prover can automate the discharging of proof obligations for unmodified parts of the state. We have applied both of our static analyses to a functional specification of a micro-kernel and the obtained results demonstrate both their precision and their scalability.


Acknowledgements

First of all, I would like to express my gratitude to my two PhD advisors, Thomas Jensen and Stéphane Lescuyer, without whom this thesis would have been impossible. I thank them for their patience and dedication in guiding me throughout these years and for all the rigour that they instilled into me by word and by their own example. Thomas, thank you for helping me put my work into perspective. Thank you for your encouragement when I was overwhelmed by doubts and for your optimism when I had none. Stéphane, thank you for your inspiring advice, for the rigorous proofreading, for the many interesting discussions and for your careful attention to my work. Know that this thank-you note was written using Emacs, to which I am happy to admit that you converted me.

I am indebted to Dominique Bolignano for raising the possibility of this thesis and for creating the frame that allowed me to embark on this interesting journey and to explore the seas of research among an inspiring group of professionals, the Prove & Run team.

I am grateful to, and would like to wholeheartedly thank, Catherine Dubois and Antoine Miné for accepting to review my dissertation. I am honoured to know that my 200+ pages have been read by experts of static analysis and formal verification, and I am grateful for their valuable comments and remarks.

I would also like to thank Sandrine Blazy and Sylvain Conchon for accepting to be members of the jury. Sylvain Conchon, I am grateful for your keen interest during my defense. Sandrine Blazy, thank you for accepting to chair my defense and for driving it in such a positive manner.

For their understanding, their advice and their support during the transition period and the months before my defense, I would like to thank Claire Loiseaux and Carolina Lavatelli.

I thank all of my colleagues at Prove & Run for our discussions and their advice during these years. I thank Florence for her warmth, energy and optimism; Erica and Henry for being such great office colleagues; Pauline and François for being friendly, reliable colleagues in the academic trenches. I am indebted to Olivier and Benoit for reviewing my articles and providing valuable remarks. I thank Pascale for smoothing out the stormy waves of administrative work. Though our interactions were briefer, I would like to also thank the Celtique members for their openness and for the interesting seminars. A special thanks goes to Lydie Mabil for helping me deal with the administrative work during these years and, finally, for helping prepare the defense of my dissertation.

This academic journey started long ago, even before I was aware, with the help of Marius Minea and Ovidiu Badescu, who unknowingly motivated me to take this path years later. I warmly thank them, and I am grateful to both for paving the first part of my academic path.

I would also like to thank my friends, old and new, far and near. Thank you for always being there for me and providing perspective, enthusiasm and breaths of fresh air. Thank you as well for still being my friends despite the long-winded and geeky descriptions of my work and the occasionally cancelled plans and absences while I was trying to find my way into the research world.

I lack the appropriate words to express the gratitude I feel towards my family for their never-ending love and support. I thank my mother and my sister for being such wonderful examples of women in science. I thank my father for his unwavering belief in me and for his love and respect for well-written sentences, no matter the context, which he instilled into me. I thank my brother-in-law for being the one who ignited early on the spark and interest for computers and mathematics, and my two wonderful nieces for always being my rays of light.

Last but surely not least, I have only gratitude for Georges, my companion, my pillar of strength, my compass and lighthouse during the darkest moments. To quote Carl Sagan: in the vastness of space and immensity of time, it is my absolute joy to spend a planet and an epoch with you.


Contents

I Extended Summary in French
    I.1 The Frame Problem
    I.2 Objectives
    I.3 Dependency Analysis
    I.4 Correlation Analysis
    I.5 Decision Procedure
    I.6 Conclusion

1 Introduction
    1.1 Formal Verification of Software
    1.2 The Frame Problem in a Nutshell
    1.3 Prove & Run Objectives and Products
    1.4 Context and Problem Statement
    1.5 Contributions and Structure of the Document

2 The Frame Problem in Software Verification
    2.1 Specification Languages and Verification Tools
    2.2 Manifestations of the Frame Problem
    2.3 Approaches to Specifying Frame Properties
        2.3.1 The Manual Approach
        2.3.2 The Exclusive Approach
        2.3.3 The Implicit Approach
    2.4 Topologies and Effects
        2.4.1 Explicit Footprints
        2.4.2 Implicit Footprints
        2.4.3 Predefined Footprints
    2.5 Other Approaches to Reason about Frames
    2.6 Other Relevant Work

3 The Smart Language and ProvenTools
    3.1 The Smart Modeling Language
        3.1.1 Smart Predicates and Types
        3.1.2 Exit Labels and Control Flow
        3.1.3 Polymorphism & Algebraic Data Types
        3.1.4 Specifications
        3.1.5 Illustrating Smart – An Abstract Process Manager
    3.2 ProvenTools
    3.3 Smil

4 The αSmil Language
    4.1 αSmil Syntax
    4.2 Control Flow Graph
    4.3 Well-Typed αSmil Statements
    4.4 Operational Semantics of αSmil Statements

5 Dependency Analysis for Functional Specifications
    5.1 Dependency Analysis in a Nutshell
        5.1.1 Targeted Dependency Information
        5.1.2 Outline
    5.2 Abstract Dependency Domain
        5.2.1 Join and Reduction Operator
        5.2.2 Well-Typed Dependencies
    5.3 Intraprocedural Analysis and Data-Flow Equations
        5.3.1 Intraprocedural Dependency Domains
        5.3.2 Intraprocedural Data-Flow Equations
        5.3.3 Intraprocedural Dependency Analysis Illustrated
    5.4 Interprocedural Dependencies
        5.4.1 Interprocedural Dependency Analysis Illustrated
        5.4.2 Context-Insensitivity and its Consequences
    5.5 Semantics of Dependency Values
    5.6 Related Work
    5.7 Conclusion

6 Deferred Dependencies: Injecting Context in Dependency Summaries
    6.1 Dealing with Context-Insensitivity
    6.2 Symbolic Dependency Components in a Nutshell
    6.3 Symbolic Paths
        6.3.1 Symbolic Path Type
        6.3.2 Semantics of Symbolic Paths
        6.3.3 Well-Typed Paths and Path Sets
    6.4 Abstract Dependency Domain with Deferred Accesses
    6.5 Deferred Dependencies at the Intraprocedural Level
        6.5.1 Extended Intraprocedural Dependency Analysis
        6.5.2 Intraprocedural Dependency Analysis Illustrated
    6.6 Deferred Dependencies at the Interprocedural Level
        6.6.1 Applying Context-Sensitive Information by Substitution
        6.6.2 Wrapped Calls and Results
    6.7 Related Work
    6.8 Conclusion

7 Correlation Analysis
    7.1 Introduction
        7.1.1 Targeted Correlation Information
        7.1.2 Correlation Analysis in a Nutshell
    7.2 Partial Equivalence Relations
        7.2.1 Abstract Partial Equivalence Type
        7.2.2 Well-Typed Partial Equivalences and their Semantics
    7.3 Paths and Correlations
        7.3.1 Paths and Correlation Types
        7.3.2 Alignment and Partial Order
    7.4 Intraprocedural Correlation Analysis
        7.4.1 Intraprocedural Correlation Summaries and Analysis
        7.4.2 Intraprocedural Correlation Analysis Illustrated
    7.5 Interprocedural Correlation Analysis
    7.6 Extension – Constructor Evolution
    7.7 Related Work
    7.8 Conclusion

8 Implementation, Application and Results
    8.1 Implementation of the Dependency Analysis
        8.1.1 Dependency Type and Operators
        8.1.2 Intraprocedural Dependency Analysis
    8.2 Implementation of the Correlation Analysis
        8.2.1 Partial Equivalence Relations and Operators
        8.2.2 Intraprocedural Correlations
        8.2.3 Dependency and Correlation Analysers
    8.3 Dependency and Correlation Results on ProvenCore Layers
        8.3.1 ProvenCore Description
        8.3.2 Obtained Dependency and Correlation Results
        8.3.3 Precision of our Dependency and Correlation Summaries
    8.4 Reasoning about Framing using Correlations and Dependencies
        8.4.1 A Decision Procedure
        8.4.2 Types of Targeted Queries
    8.5 Decision Procedure Experiments

9 Conclusion and Perspectives
    9.1 Contributions
    9.2 Future Work

Bibliography

List of Figures

1.1 Complex Transition Systems Frame Problem
1.2 Frame Problem and Solution Strategy

3.1 Possible Transitions between Thread States
3.2 The ProvenTools Toolchain
3.3 Smart Editor

4.1 Body of the stop_thread Predicate
4.2 Example – Control Flow Graph of Predicate thread
4.3 Well-Typed Control Flow Graph

5.1 Example Data Types – Thread and Memory Region
5.2 Input Type – Process
5.3 Predicate thread – Implementation
5.4 G_thread – Control Flow Graph of Predicate thread
5.5 Targeted Dependency Results for Predicate thread
5.6 G_start_address – Control Flow Graph of Predicate start_address
5.7 Predicate start_address – Implementation
5.8 Targeted Dependency Results for Predicate start_address
5.9 Order Relation on Pairs of Atomic Dependencies
5.10 Computation of the Intraprocedural Domain at a Node's Entry Point
5.11 Analysing Predicate thread – Initialisation
5.12 Applying the Variant Switch Equation
5.13 Analysing Predicate thread – Variant Switch
5.14 Applying the Array Access Equation
5.15 Analysing Predicate thread – Array Access
5.16 Applying the Field Access Equation
5.17 Analysing Predicate thread – Field Access
5.18 G_start_address – Dependency Information
5.19 G_start_address – Final Dependency Results

6.1 Analysing thread – Dependency Summary with Deferred Occurrences
6.2 G_start_address – Intermediate Dependency Results for start_address
6.3 Substitution of Formal Parameters by Effective Parameters
6.4 Substituting Deferred Dependencies by Actual Dependencies

7.1 Body of the stop_thread Predicate
7.2 Targeted Correlation Results for Predicate stop_thread
7.3 Intraprocedural Correlations – General Representation
7.4 Intraprocedural Domain – Examples
7.5 Entry Point – Correlation Information
7.6 Analysing Predicate stop_thread – Initialisation
7.7 Construction Evolution

8.1 ProvenCore – Abstract Layers
8.2 Distribution of the number of inferred preserved properties
8.3 Distribution of the number of inferred predicates for which a property is preserved

List of Tables

4.2 αSmil – Set of Supported Statements
4.3 Statements and their Exit Labels
4.4 Predicate Body in αSmil
4.6 Well-Typed Predicate Call
4.7 Well-Typed Statements
4.8 The Structural Operational Semantics of αSmil Generic Statements
4.9 Operational Semantics of αSmil Structure-Related Statements
4.10 Operational Semantics of αSmil Variant-Related Statements
4.11 Operational Semantics of αSmil Array-Related Statements
4.12 Semantics of a Predicate Call

5.1 ⊑ – Comparison of Two Domains
5.2 ∨ – Join Operation
5.3 ⊕ – Reduction Operator
5.4 Dependency Extractions
5.5 Well-Typed Dependencies
5.6 Statements – Representations and Data-Flow Equations
5.7 Generic Statements – Data-Flow Equations
5.8 Structure-Related Statements – Data-Flow Equations
5.9 Variant-Related Statements – Data-Flow Equations
5.10 Array-Related Statements – Data-Flow Equations

6.1 E – Path Semantics
6.2 Well-Typed Dependency Paths
6.3 Extended Leq – Comparison of Two Domains
6.4 ∨ – Extended Join
6.5 ⊕ – Extended Reduction Operator
6.6 Extended Extraction Operators
6.7 Well-Typed Dependencies – Extended
6.8 Deferred Paths – Application and Substitutions
6.9 Interprocedural Domain – Substitutions

7.1 ⊑R – Comparison of Two Domains
7.2 Partial Equivalences – ∨R – Join Operation
7.3 Partial Equivalences – ∧R – Meet Operation
7.4 Partial Equivalence Extractions
7.5 Well-Typed Partial Equivalences
7.6 Partial Equivalence Relations – Semantics
7.7 Well-Typed Access Paths
7.8 Well-Typed Correlations
7.9 Well-Typed Correlation Maps
7.11 Links between Access Paths
7.12 Statements – Representations and Data-Flow Equations
7.19 Well-Formed Intraprocedural Correlation Summaries

8.3 ProvenCore Abstract Layers – Global State Type
8.4 ProvenCore Abstract Layers – ProcessMachine Type
8.5 Abstract Layers – Evaluation Data and Dependency Analysis Timing
8.6 Abstract Layers – Detailed Dependency Analysis Timing
8.7 Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing
8.8 Abstract Layers – Detailed Deferred Dependency Analysis Timing
8.9 Abstract Layers – Evaluation Data and Correlation Analysis Timing
8.10 Abstract Layers – Detailed Correlation Analysis Timing
8.11 RSMFSP Layers – Evaluation Data and Dependency Summaries
8.12 TDS Layer – Evaluation Data and Dependency Summaries
8.13 RSMFSP Layers – Evaluation Data and Correlation Summaries
8.14 TDS Layer – Evaluation Data and Correlation Summaries

List of notations

Section | Symbol | Type | Description
Sec. 3.1.2 | true | L | Special exit label
Sec. 3.1.2 | false | L | Special exit label
Sec. 4.1 | T0 ⊂ T | | Set of base type identifiers
Def. 4.1.1 | T | | Universe of type identifiers
Def. 4.1.1 | τ | T | Type
Def. 4.1.1 | τ0 | T | Primitive type
Def. 4.1.1 | structf1 τ1 | T | Structure type
Def. 4.1.1 | variant[C1 τ1| ] | T | Variant type
Def. 4.1.1 | arrτ 〈τ〉 | T | Array type
Sec. 4.1 | λ | L | Exit label
Sec. 4.1 | L | | Set of exit labels
Sec. 4.1 | error | L | Special exit label
Sec. 4.1 | σ σp | Σ | Signature (of predicate p)
Sec. 4.1 | Σ | | Set of predicate signatures
Sec. 4.1 | o o | V | Output variable(s)
Tab. 4.2 | s | | αSmil statement
Tab. 4.2 | o = e | | αSmil assignment statement
Tab. 4.2 | e1 = e2 | | αSmil equality test statement
Tab. 4.2 | nop | | αSmil no operation statement
Tab. 4.2 | r = e1 en | | αSmil create structure statement
Tab. 4.2 | o1 on = r | | αSmil destructure structure
Tab. 4.2 | o = rfi | | αSmil access field statement
Tab. 4.2 | r′ = r with fi = e | | αSmil update field statement
Tab. 4.2 | r′ = 〈f1 fk〉r′′ | | αSmil partial structure equality
Tab. 4.2 | v = Cp[e] | | αSmil create variant statement
Tab. 4.2 | switch(v) as [o1| ] | | αSmil destructure variant statement
Tab. 4.2 | v ∈ C1 Ck | | αSmil variant possible statement
Tab. 4.2 | o = a[i] | | αSmil array access statement
Tab. 4.2 | a′ = [a with i = e] | | αSmil array update statement
Tab. 4.2 | p(e1 ) [λ1 o1 | ] | | αSmil predicate call statement
Sec. 4.2 | Gp = (N E) | | Control flow graph of predicate p
Def. 4.3.1 | Γ | V → T | Typing environment
Sec. 4.3 | v | V | Variable
Sec. 4.3 | V | | Set of variables
Sec. 4.3 | V+ ⊆ V | | Writable variable identifiers
Def. 4.3.2 | Σ | P → S | Maps predicate ids to signatures
Def. 4.3.3 | ΣΓO ⊢ s → λ | | Well-typed statement
Sec. 4.3 | O ⊆ V+ | | Output variables of a predicate
Sec. 4.4 | Dτ | | Semantic values of type τ
Sec. 4.4 | P ⊆ Dτ | | Domain of valid array indices
Sec. 4.4 | E = V → D | | Valuation or environment type
Def. 4.4.2 | E | E | Valuation or environment
Sec. 4.4 | Γ(v) | | Type of v
Sec. 4.4 | Γ ⊢ E | | Well-typed environment
Def. 4.4.3 | ⟨E [s]⟩ | | Configuration
Def. 4.4.4 | ⟨E [s]⟩ −λ→ E′ | | Transition
Def. 4.4.5 | E [x → v] | | Extension of E with x → v
Def. 4.4.6 | I = P×E→E×L | | Set of interpretations
Def. 4.4.6 | I | I | Interpretation
Sec. 5.2 | D | | Abstract dependency domain
Def. 5.2.1 | δ | D | Dependency
Def. 5.2.1 | ⊤ | D | Everything atomic dependency
Def. 5.2.1 | | D | Nothing atomic dependency
Def. 5.2.1 | ⊥ | D | Impossible atomic dependency
Def. 5.2.1 | f1 ↦ δ1 | D | Structure dependency
Def. 5.2.1 | [C1 ↦ δ1 ] | D | Variant dependency
Def. 5.2.1 | 〈δ〉 | D | Array dependency
Def. 5.2.1 | 〈δdef i δexc〉 | D | Array dependency exception for i
Def. 5.2.2 | ⊑ ⊆ D×D | | Partial order on dependencies
Tab. 5.1 | | | Rules for ⊑
Def. 5.2.3 | ∨ | D×D → D | Join operator for dependencies
Tab. 5.2 | | | ∨ cases
Def. 5.2.4 | ⊕ | D×D → D | Reduction operator for dependencies
Tab. 5.3 | | | ⊕ cases
Def. 5.2.5 | f | D ↛ D | Extraction of a field's dependency
Def. 5.2.6 | C | D ↛ D | Extraction of a constructor's dep
Def. 5.2.7 | 〈i〉 | D ↛ D | Extraction of an array's cell dep
Def. 5.2.8 | 〈∗ i〉 | D ↛ D | Extraction of an array's dep (exc)
Def. 5.2.9 | 〈∗〉 | D ↛ D | Extraction of an array's dependency
Tab. 5.4 | | | f c 〈∗ i〉 〈i〉 and 〈∗〉 cases
Tab. 5.5 | Γ ⊢ δ τ | | Well-typed dependency
Def. 5.3.1 | D = V→D | | Intraprocedural dependency domain
Def. 5.3.1 | ∆ | D | Intraprocedural dependency
Sec. 5.3.1 | Unreachable | D | Intra dep for unreachable nodes
Def. 5.3.2 | ∆ x | D×V → D | Forget x
Def. 5.3.3 | ⊑∆ ⊆ D×D | | Intraprocedural partial order
Def. 5.3.4 | ∨∆ | D×D → D | Intraprocedural join operation
Def. 5.3.5 | ⊕∆ | D×D → D | Intraprocedural reduction operator
Sec. 5.3.2 | ⟦s⟧λ(∆nj) | | Contribution of an edge (ni nj)
Sec. 5.3.2 | ⟦s⟧λ() | | Transfer function of the edge s λ
Sec. 5.3.2 | gensλ | | Written variables on the edge s λ
Sec. 5.3.2 | ∆n | D | Dependency domain of node n
Sec. 5.3.2 | I ⊆ V | | Set of input variables
Sec. 5.4 | χ | | Formal-Effective param mapping
Sec. 5.4 | J (χ) | | Substitution formal to effective
Def. 6.3.1 | π | Π | Symbolic path
Def. 6.3.1 | Π | | Universe of symbolic paths
Def. 6.3.1 | ε | Π | Symbolic path endpoint
Def. 6.3.1 | fπ | Π | Symbolic path field
Def. 6.3.1 | Cπ | Π | Symbolic path constructor
Def. 6.3.1 | 〈i〉π | Π | Symbolic path array cell
Def. 6.3.1 | 〈∗ i〉π | Π | Symbolic path array cells except
Def. 6.3.1 | 〈∗〉π | Π | Symbolic path all array cells
Sec. 6.3.1 | | Π×Π→Π | Path extension operator
Sec. 6.3.1 | P | 2^Π | Symbolic path set
Def. 6.3.2 | ⊑ ⊂ 2^Π×2^Π | | Partial order for path sets
Def. 6.3.3 | ∨ | 2^Π×2^Π→2^Π | Join operator for path sets
Def. 6.3.4 | | 2^Π×Π→2^Π | Extension operator for path sets
Def. 6.3.5 | π | Π | Actual path
Def. 6.3.5 | Π | | Universe of actual paths
Def. 6.3.5 | ε | Π | Actual path empty
Def. 6.3.5 | f π | Π | Actual path field
Def. 6.3.5 | Cπ | Π | Actual path constructor
Def. 6.3.5 | 〈i〉π | Π | Actual path array cell
Def. 6.1 | E ⊂ E×Π×Π | | Symbolic path covers actual path
Sec. 6.3.2 | E ⊂ E×2^Π×Π | | Set of symbolic paths covers actual
Def. 6.3.6 | ⟦P⟧E ⊂ E×2^Π→2^Π | | Interpretation of symbolic paths set
Def. 6.3.7 | at | Π×D→D | Find subpart of value at given path
Tab. 6.2 | I ⊢ π τ→τ′ ⊂ V×Π×T×T | | Symbolic paths typing judgement
Sec. 6.3.3 | I ⊢ P τ→τ′ ⊂ V×2^Π×T×T | | Symbolic paths sets judgement
Def. 6.4.1 | δ | D | Extended dependency
Def. 6.4.1 | D | | Ext abstract dependency domain
Def. 6.4.1 | Deferred(o1 ↦ P1 ) | D | Deferred accesses dependency
Def. 6.4.2 | A | V ↛ Π | Access map
Tab. 6.3 | | | Deferred rule for ⊑
Tab. 6.4 | | | ∨ cases for deferred
Tab. 6.5 | | | ⊕ cases for deferred
Tab. 6.6 | | | f c 〈∗ i〉 〈i〉 〈∗〉 deferred cases
Tab. | Γ IO ⊢ δ τ | | Well-typed dependency deferred rule
Def. 6.6.1 | σ | V → D | Substitution roots vars to deps
Def. 6.6.2 | φ | V ↛ V | Substitution indices in arrays
Sec. 6.6.1 | J (σ φ) | | Substitutes deferred dependencies
Sec. 6.6.1 | • | | Applies symbolic paths to dep
Sec. 6.6.1 | | | Applies symbolic path to dep
Def. 7.2.1 | R | R | Partial equivalence
Def. 7.2.1 | R | | Partial equivalence type
Def. 7.2.1 | Equal | R | Partial equivalence equal
Def. 7.2.1 | Any | R | Partial equivalence unrelated
Def. 7.2.1 | f1 ↦ R1 | R | Partial equivalence structure
Def. 7.2.1 | [C1 ↦ R1 ] | R | Partial equivalence variant
Def. 7.2.1 | 〈Rdef 〉 | R | Partial equivalence array
Def. 7.2.1 | 〈Rdef i Rexc〉 | R | Partial equivalence array + exc
Def. 7.2.2 | ⊑R ⊆ R×R | | Preorder for partial equivalences
Def. 7.1 | | | Rules for ⊑R
Def. 7.2.3 | ∨R | R×R→R | Join for partial equivalences
Tab. 7.2 | | | ∨R cases
Def. 7.2.4 | ∧R | R×R→R | Meet for partial equivalences
Tab. 7.3 | | | ∧R cases
Def. 7.2.5 | extrf | R ↛ R | Extracts field's partial eqv
Def. 7.2.6 | extrC | R ↛ R | Extracts constructor's partial eqv
Def. 7.2.7 | extr 〈i〉 | R ↛ R | Extracts cell's partial eqv
Tab. 7.4 | | | extrf extrC and extr 〈i〉 cases
Tab. 7.5 | Γ ⊢ R τ | | Partial equivalence well-typedness
Sec. 7.2.2 | ⟦R⟧τ | | Partial equivalence semantics
Def. 7.3.1 | π | Π | Access path
Def. 7.3.1 | Π | | Access path type
Def. 7.3.1 | ε | Π | Access path empty
Def. 7.3.1 | f π | Π | Access path field
Def. 7.3.1 | Cπ | Π | Access path constructor
Def. 7.3.1 | 〈i〉π | Π | Access path array cell
Def. 7.3.2 | κ | K | Correlation map
Def. 7.3.2 | K = Π×Π→R | | Correlation map type
Sec. 7.3.1 | (π ρ) ↦ R | Π×Π×R | Correlation
Tab. 7.7 | ΓI ⊢ π τ→ τ | | Well-typed access path
Tab. 7.8 | ΓI ⊢ (πρ) ↦ R (τlτr) | | Well-typed correlation
Tab. 7.9 | ΓI ⊢ κ (τlτr) | | Well-typed correlation map
Def. 7.3.3 | µ | M | Link
Def. 7.3.3 | M | | Link type
Def. 7.3.3 | Identical | M | Link identical
Def. 7.3.3 | Left π | M | Link left path has suffix π
Def. 7.3.3 | Right π | M | Link right path has suffix π
Def. 7.3.3 | Incompatible | M | Link incompatible paths
Def. 7.3.4 | f | Π×Π→M | Matching Operator
Def. 7.3.5 | R (πρ)(π′ρ′) | | Aligning a correlation
Def. 7.3.6 | | | Computation of R(πρ)(π′ρ′)
Def. 7.3.7 | | Π×R ↛ R | Projection
Def. 7.3.8 | x | R×Π ↛ R | Injection
Def. 7.3.9 | κ (π′ ρ′) | | Aligns correlation maps
Def. 7.3.10 | ⊑ ⊆ K×K | | Correlation maps preorder
Def. 7.3.11 | ∨ | K×K→K | Join for correlation maps
Def. 7.3.12 | ∧ | K×K→K | Meet for correlation maps
Def. 7.4.1 | K | K | Intraprocedural corr summary
Def. 7.4.1 | K = V×V→K | | Intraproc corr summary type
Sec. 7.4.1 | NoCorrelation | K | Any for any pair of variables
Def. 7.4.2 | ⊑K ⊆ K×K | | ⊑ for intraproc corr summaries
Def. 7.4.3 | ∨K | K×K→K | Join for intraproc corr summaries
Def. 7.4.4 | Csλ() | C | Contribution of an edge
Sec. 7.4.1 | csλ | K | Corr created by stmt s on label λ
Sec. 7.4.1 | killλ ⊆ V | | Variables redefined by stmt on label
Def. 7.4.5 | (π• ρ•) ↦ R | Π×Π×R | New correlation after composition
Def. 7.4.6 | | K×K→K | Composition of correlation maps
Def. 7.4.7 | | C×K→K | Contribution Csλi(Kni)
Def. 7.19 | Γ IO K | | Well-formed intraproc corr summ
Sec. 7.4.2 | o | | Final value of o
Def. 7.5.1 | Kp | Λp→K | Interproc correlation domain
Def. 7.5.1 | Λp ⊆ L | | Output labels of predicate p
Sec. 7.6 | Impossible | R | Partial eqv constructor impossible
Sec. 7.6 | RCiCj | R | Partial eqv variant matrix


To my family and close ones


Chapter I

Extended Summary in French

I.1 The Frame Problem

In the field of formal software verification, it is imperative to identify the boundaries within which elements or functions operate. A complete specification of an operation must not only state that the output values have a certain property; it must also delimit the parts of the input state on which the operation works. These boundaries constitute the frame properties. They are usually specified manually by the programmer, and their validity must be verified: it is necessary to prove that the program's operations do not overstep the boundaries thus declared. Specifying and proving frame properties is a task notorious for being long and tedious. The considerable effort invested in this task is a manifestation of the frame problem. Manifestations of the frame problem appear in the context of all specification languages and all formal verification methods.

I.2 Objectives

During the development of ProvenCore, a general-purpose micro-kernel that guarantees isolation, it became evident that the specification and verification of transition systems in general, and of operating systems in particular, are not immune to the frame problem. Operating systems are characterized by complex states defined by algebraic data types and associative arrays, which are fundamental building blocks for representing and manipulating complex data efficiently. Operating systems are also characterized by transitions that map such input states to new output states. However, most transitions are not concerned with the input state in its entirety; they depend on and modify only a subset of it. Intuitively, properties that hold for the input state remain trivially valid for the output state obtained after the transition, as long as they depend only on parts of the input state that are not modified by the transition. In practice, proving the preservation of these properties is not a straightforward task; it demands considerable manual effort and a host of tedious, repetitive proofs.

The objective of our work was to address this problem and to find an automated solution for inferring the preservation of these properties. More precisely, our goal was the automatic inference of the properties that depend on a subset of the input that is disjoint from the frame of the operation, that is, from the subset of the state that is modified. To this end, we proposed a solution based on static analysis that requires no additional frame annotations. By detecting the subset of the state on which a property depends, as well as the part that is not affected by an operation, we can automatically discharge the proof obligations related to unmodified parts.
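The disjointness criterion stated above can be illustrated with a deliberately simplified OCaml sketch. It models the state as a flat set of field names, which is an assumption made purely for illustration: the analyses in this thesis work on nested structures, variants and arrays. The module name `Fields`, the function `preserved` and all field names are hypothetical.

```ocaml
(* Hypothetical, flattened model: a state is a set of named fields. *)
module Fields = Set.Make (String)

(* A property is trivially preserved when the fields it reads (deps) are
   disjoint from the fields an operation may modify (frame). *)
let preserved ~deps ~frame = Fields.is_empty (Fields.inter deps frame)

let () =
  (* Invented field names, used only to exercise the check. *)
  let invariant_deps = Fields.of_list [ "memory"; "permissions" ] in
  let frame_a = Fields.of_list [ "processes"; "scheduler" ] in
  let frame_b = Fields.of_list [ "processes"; "memory" ] in
  Printf.printf "%b %b\n"
    (preserved ~deps:invariant_deps ~frame:frame_a)
    (preserved ~deps:invariant_deps ~frame:frame_b)
```

In this toy model the first operation leaves the invariant's inputs untouched, so no proof is needed for it, while the second one writes to `memory` and the corresponding proof obligation stays with the programmer.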

We employ two static analyses for this purpose: a dependency analysis and a correlation analysis. Both analyses handle programs that manipulate associative arrays as well as algebraic data types (structures and variants), and they compute results that reflect the underlying structure of these types (fields, constructors and array cells). Automatic reasoning based on the combined result of these two static analyses makes it possible to infer the preservation of certain properties of the output state. Ultimately, these two analyses are meant to be used by a proof tactic that will be integrated into the interactive proof assistant included in the ProvenTools software suite developed by Prove & Run.

Smart, the language targeted by the ProvenTools software suite, is a purely functional language that manipulates algebraic data structures and immutable associative arrays. This work was motivated by the verification of ProvenCore. ProvenCore is implemented through multiple refinements between successive models of the kernel, from the most abstract, which allows the isolation property to be defined and proved, to the most concrete, which is used for code generation. The global states of the abstract layers are complex structures containing numerous fields that are themselves composite. Commands such as fork, exec and exit can be executed. Each of these commands receives a global input state as an argument and produces the state of the system after the command has executed. In practice, most of the commands supported by the system threaten only a limited number of invariants. Automatically proving the preservation of the invariants that are immune can considerably reduce the total number of proofs left to the programmer and allows the programmer to concentrate on the most interesting proofs.

I.3 Dependency Analysis

The dependency analysis handles functions and their specifications uniformly. For each possible execution scenario, it conservatively computes an approximation of the sub-elements of the input state on which the result depends. For variants, an additional analysis is performed simultaneously in order to compute the subset of possible constructors in each execution scenario.

We defined our own abstract domain for representing dependencies, and we obtain dependency information that reflects the layered structure of the data types.
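As a rough illustration of what structure-aware dependency information can look like, the following Python sketch computes the set of input access paths read by an expression of a tiny, hypothetical expression language. The classes `Const`, `Var`, `Field`, and `BinOp` and the function `deps` are assumptions made for this example only; the abstract domain of the thesis is richer (variants, arrays, execution scenarios) and is not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# An access path such as ("state", "quota", "used") names a sub-element.
Path = Tuple[str, ...]

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Field:
    record: "Expr"
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Var, Field, BinOp]

def deps(e: Expr) -> FrozenSet[Path]:
    """Over-approximate the input sub-elements that the value of e depends on."""
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({(e.name,)})
    if isinstance(e, Field):
        # Selecting a field narrows every dependency of the record by one step.
        return frozenset(p + (e.name,) for p in deps(e.record))
    if isinstance(e, BinOp):
        return deps(e.left) | deps(e.right)
    raise TypeError(f"unsupported expression: {e!r}")

# Hypothetical property: state.quota.used <= state.quota.limit
state = Var("state")
used = Field(Field(state, "quota"), "used")
limit = Field(Field(state, "quota"), "limit")
within_quota = BinOp("<=", used, limit)
print(sorted(deps(within_quota)))
```

The result mentions only the two fields under `quota`, rather than the whole of `state`, which is the kind of field-level precision the paragraph above refers to.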


This analysis was designed to be executed on the fly during interactive verification, and it operates uniformly on programs and their specifications; these two points give our approach its originality. We implemented a prototype of this dependency analysis in OCaml and applied it to a functional specification of ProvenCore. The results obtained are positive: for example, the dependency analysis runs in less than one second on a set of more than 600 predicates totalling approximately 10,000 lines of code.

In order to introduce a form of context sensitivity into the dependency analysis, we designed an extension based on symbolic paths. This extension slightly lengthens the execution time (by 10% to 20% on the benchmarks used). However, by using the dependency analysis with this extension, we obtained more precise results for 50% of the predicates included in these benchmarks.

I.4 Correlation Analysis

The correlation analysis detects the flow of input values into output values. For a given function, it conservatively computes an approximation of the equivalences between input sub-elements and output sub-elements. It is an inter-procedural static analysis that summarizes the behaviour of a function and detects which parts of the state are modified, and to what extent. We defined a partial equivalence type that reflects the structure of algebraic data types and associative arrays. To gain precision and avoid losing information when the input and the output have different types, we introduced an intermediate level. Correlations thus consist of access paths to sub-elements of the same type, together with equivalences between these sub-elements. This intermediate level makes it possible to compute, in a flexible manner, precise equivalences between parts of the input and parts of the output.
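The idea of a partial equivalence between input and output sub-elements can be sketched in Python as follows. This is a minimal sketch under strong assumptions: a function is summarized only by the set of access paths it overwrites (the hypothetical `updated` argument), input and output have the same type, and the three labels `"equal"`, `"partial"`, and `"unknown"` are names invented for this example. The correlation domain of the thesis is more expressive.

```python
from typing import Iterable, Tuple

Path = Tuple[str, ...]

def is_prefix(p: Path, q: Path) -> bool:
    """True when path p is a (possibly equal) prefix of path q."""
    return q[:len(p)] == p

def correlation(updated: Iterable[Path], path: Path) -> str:
    """Classify how an output sub-element relates to the same input sub-element."""
    updated = list(updated)
    if any(is_prefix(u, path) for u in updated):
        # The sub-element lies at or below an overwritten location.
        return "unknown"
    if any(is_prefix(path, u) for u in updated):
        # Only some strictly nested part of the sub-element was overwritten.
        return "partial"
    # No overwritten location overlaps with this sub-element.
    return "equal"

# Hypothetical summary of a fork-like command: it overwrites two locations.
fork_updates = [("procs", "table"), ("sched", "current")]
for p in [("mem",), ("sched",), ("sched", "current", "pid")]:
    print(p, correlation(fork_updates, p))
```

The `"partial"` case is what a structure-aware equivalence adds over a plain "modified or not" flag: the sub-element as a whole has changed, but some of its fields may still be equal.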

Here too, we implemented a prototype of this correlation analysis in OCaml and applied it to a functional specification of ProvenCore. The results obtained are encouraging: for example, the correlations for a subset of 630 predicates totalling approximately 10,000 lines of code are computed in less than 0.5 seconds. Although more complex than the dependency analysis, the correlation analysis runs faster on our benchmarks because, unlike the former, it applies only to functions and not to specifications. Indeed, specifications are Boolean predicates and do not return a modified state.

I.5 Decision Procedure

We have sketched a decision procedure that employs our two static analyses. It constitutes the first step of our solution for automatically inferring the preservation of frame invariants. By uncovering equivalences between inputs and outputs, and after detecting that a property depends only on unchanged parts, it is possible to infer the preservation of invariants for these unchanged parts.

The decision procedure has not yet been implemented, but preliminary experiments and a simple prototype give us an idea of how the dependency and correlation results should be unified. They also allowed us to determine the kind of queries that can be handled and the mechanism for answering them. The results obtained with our simple prototype on a functional specification of ProvenCore are described and analysed.

Unifying the results of the two analyses involves creating a graph connecting the input and output variables examined by the query. The edges represent correlations between sub-elements of these variables, as detected by the second analysis. The dependencies of the property whose preservation we seek to infer indicate the sub-elements that influence the result of this property. When these sub-elements are left intact, the property is trivially preserved. The unification algorithm therefore traverses the graph, attempting to detect as many equivalences as possible between sub-elements of the input and output variables. If the sub-elements indicated by the dependency are included in the set of equivalent sub-elements, then the property is necessarily preserved, because all the values influencing its result are the same before and after the execution of the operation.
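The inclusion check described above can be sketched in Python as follows. This is a simplified sketch, not the procedure of the thesis: the graph is assumed to be a dictionary `edges` mapping a pair of variable names to the set of access paths known to be equal between them, the query follows a given `chain` of variables, and all names (`covered`, `preserved`, the example paths) are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Path = Tuple[str, ...]
Edge = Tuple[str, str]

def covered(path: Path, equal_paths: Iterable[Path]) -> bool:
    """True when path lies at or below some sub-element known to be unchanged."""
    return any(path[:len(eq)] == eq for eq in equal_paths)

def preserved(dependencies: Iterable[Path],
              edges: Dict[Edge, Set[Path]],
              chain: List[str]) -> bool:
    """Check that every dependency is unchanged across every step of the chain.

    A result of False only means that preservation could not be inferred;
    it does not mean that the property is violated.
    """
    steps = list(zip(chain, chain[1:]))
    return all(covered(dep, edges.get(step, ()))
               for dep in dependencies
               for step in steps)

# Hypothetical correlation results for two successive commands s -> t -> u.
edges = {
    ("s", "t"): {("mem",), ("sched", "queue")},
    ("t", "u"): {("mem",)},
}
memory_invariant = {("mem", "pages")}    # dependencies of one property
queue_invariant = {("sched", "queue")}   # dependencies of another property

print(preserved(memory_invariant, edges, ["s", "t", "u"]))
print(preserved(queue_invariant, edges, ["s", "t", "u"]))
```

A missing edge is treated as "no known equivalence", so the check errs on the side of answering False, in line with the conservative nature of both analyses.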

I.6 Conclusion

To conclude, we have designed and implemented two static analyses that detect the data dependencies of a logical property, as well as correlations between the inputs and outputs of operations. Our first results on a functional model of a microkernel are encouraging, both for their precision and for the speed of the analysis, which makes these analyses suitable for use within an interactive prover. Apart from minor improvements affecting the precision of our analysis, the next steps consist in combining the two analyses in order to detect the invariants that are not affected by the execution of a predicate, and then integrating this detection as a tactic in the ProvenTools theorem prover. We believe that it is possible to take advantage of frame specifications at low cost, in particular without forcing the programmer to write tedious, intuitively obvious annotations. During the formal verification of complex transition systems, it then becomes possible to integrate into the development tools an automatic inference, via static analysis, of the preservation of frame-related invariants.


Chapter 1

Introduction

No human investigation can claim to be scientific if it doesn’t pass the test of mathematical proof.

Leonardo da Vinci

1.1 Formal Verification of Software

Since the middle of the last century, computers and information technology have brought forth a digital revolution, fundamentally changing the way we live, work, and interact with one another. Nowadays, computer programs govern our world, and software permeates our lives in manifold ways, shaping our interactions with the surrounding environment. From the alarm clock that marks the start of our day and the coffee machine that motivates us to leave the house, to the smartphone we use for checking our emails or bank account and the car we are driving (or the automated driverless subway we are relying on), some type of software is discreetly acting in the background. We have grown so accustomed to it that we do not even notice it anymore, until it asserts itself by preventing us from checking our email, by displaying a blue error screen on an ATM or ticket machine, or by serving us a salty bag of crisps instead of the desperately needed bottle of water we have just paid for at a vending machine. Such reminders can lead to frustration and cause inconvenience, but essentially they cause minor problems. However, receiving such reminders as a result of malfunctions of medical equipment such as radiation therapy machines, of flight control systems, Mars orbiters, satellites, or nuclear power plants can have dramatic consequences: endangering human lives, causing environmental harm, or entailing significant financial losses. Therefore, the quality of the software around us not only influences the quality of our daily lives, but might also have an impact on our safety and the safety of our surrounding world.

Writing reliable, completely error-free software is a difficult task, and even a utopian one in the absence of dedicated, rigorous approaches for improving its quality. Indeed, for many software systems no guarantees or warranties are provided, and their quality is addressed only by traditional software engineering approaches such as testing or code review, which cannot guarantee the absence of bugs. While this can be acceptable for non-critical programs, mission- or safety-critical software systems, for which software quality is of the utmost importance, have to guarantee the absence of runtime errors and provide high levels of confidence regarding their functional correctness. Certain safety-critical market segments impose standards and regulatory requirements for the development of such software systems. In these domains, formal program verification is emerging as a promising approach, gaining a wider audience and more and more ground.

Formal program verification comprises a set of techniques and tools that can be used to ensure, by mathematical means, that the program under scrutiny fulfills its functional correctness requirements, i.e. that it computes the right information. To achieve this goal, a formal description or specification of the program’s expected behaviour must be given. Once this is established, multiple mathematical tools can be employed for formally verifying that the program’s implementation follows the formal specification.

Formal methods can be traced back to the early days of computer science, and their origin can be linked to the names of Floyd (Floyd 1967), Hoare (Hoare 1969), and Naur (Naur 1966) (and later to that of Dijkstra (Dijkstra 1976)) and their methods for verifying program code with respect to assertions. Despite their early foundations, formal methods seemed for decades to be confined to the research world, as a consequence of intricate notations, failure to scale to real-world programs, and limited or inadequate tool support. Since the 1960s, however, considerable progress has been made in the field of formal methods, in terms of both methodology and tools for computer-aided program verification. Still, formal program verification methods are not yet a widespread alternative, or even complement, to testing in the industry. Unlike testing, which cannot show the absence of bugs, formal verification methods aim to prove, by means of mathematical tools, that the program execution is correct in all specified environments, without actually executing the program itself. These are static verification techniques.

Static verification techniques include program typing, model checking, deductive verification methods, and static program analysis. Besides requiring a formal specification of the program’s intended behaviour and its envisioned properties at runtime, all formal methods are theoretically characterized by undecidability and complexity, which are addressed by introducing some form of approximation. For soundness considerations, these approximations are necessarily over-approximations, and all static verification techniques are necessarily conservative: they can prove the absence of some erroneous runtime behaviours, but they will inevitably trigger some false warnings, rejecting certain behaviours that are in practice correct.

Program Typing. Type systems (Cardelli and Wegner 1985) are tools for reasoning about programs. More specifically, they constitute “a syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute” (Pierce 2002). They are used for computing static approximations of the runtime behaviours of the terms in a program, and can guarantee that well-typed programs are free from certain runtime type errors, such as passing strings as arguments to a primitive arithmetic operation or using an integer as a pointer.


In practice, type systems have become the most widespread instance of formal methods, with applications to many programming languages and automatic typecheckers built into a variety of compilers. Static typecheckers entail a variety of benefits, ranging from early error detection to offering convenient abstraction and documentation mechanisms and improving the efficiency of compilers, which nowadays make use of the information provided by typecheckers during their optimization and code generation phases.

The Curry-Howard correspondence implies that types can be used for expressing arbitrarily complex mathematical specifications. Additional type annotations could in principle enable the full proof of complex properties, effectively transforming type checkers into proof checkers (Pierce 2002). Approaches such as Extended Static Checking (Leino 2001; Leino and Nelson 1998; Flanagan et al. 2002) made progress towards implementing entirely automatic checks for broad classes of correctness properties.

Additionally, approaches relying on type inference have been used for alias analysis (O’Callahan and Jackson 1997) and exception analysis (Leroy and Pessaux 2000). Powerful type systems based on dependent types (Martin-Löf 1984; Nordström, Petersson, and Smith 1990) are used in automated theorem proving. Various proof assistants, including Coq (Bertot and Castéran 2004; Sozeau and team 1997)¹, are based on type theory.

Model Checking. Model checking is a verification technique that exhaustively explores all possible system states in a systematic manner (Baier and Katoen 2008). More precisely, given a finite-state model of a system and a formal property, a model checking tool verifies whether the property under scrutiny holds for a state in the given model. Model checking emerged as a popular lightweight formal method as a consequence of progress made in the development of program logic and decision procedures, automatic model checking techniques, and compiler analysis (Jhala and Majumdar 2009). First, program logic and decision procedures (Nelson and Oppen 1980; Shostak 1984) provided the needed framework and algorithmic tools to reason about infinite state spaces. Automatic model checking techniques (Clarke and Emerson 1981; Vardi and Wolper 1994) for temporal logic provided algorithmic tools for state-space exploration. Abstract interpretation (Cousot and Cousot 1977) provided connections between the logical world of infinite state spaces and the algorithmic world of finite representations (Jhala and Majumdar 2009).
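The exhaustive exploration mentioned above can be illustrated with a minimal explicit-state sketch in Python. It checks a safety property (an invariant) in every reachable state of a small finite model and returns a counterexample trace when the property fails. The toy lock-based mutual exclusion model and all names (`check_invariant`, `successors`, `successors_unlocked`) are assumptions made for this example; industrial model checkers rely on temporal logics and far more sophisticated state-space representations.

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Breadth-first search over all reachable states.

    Returns None when the invariant holds everywhere, otherwise a list of
    states leading from the initial state to a violating state.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Toy model: two processes, each "idle" or "crit", sharing one lock.
def successors(state):
    pcs, lock = state
    for i in (0, 1):
        if pcs[i] == "idle" and not lock:
            yield (pcs[:i] + ("crit",) + pcs[i + 1:], True)
        elif pcs[i] == "crit":
            yield (pcs[:i] + ("idle",) + pcs[i + 1:], False)

# Faulty variant: processes enter the critical section without the lock.
def successors_unlocked(state):
    pcs, _ = state
    for i in (0, 1):
        if pcs[i] == "idle":
            yield (pcs[:i] + ("crit",) + pcs[i + 1:], True)
        else:
            yield (pcs[:i] + ("idle",) + pcs[i + 1:], False)

def mutual_exclusion(state):
    pcs, _ = state
    return pcs != ("crit", "crit")

initial = (("idle", "idle"), False)
print(check_invariant(initial, successors, mutual_exclusion))
print(check_invariant(initial, successors_unlocked, mutual_exclusion))
```

The `parent` dictionary doubles as the set of visited states, which is exactly the structure whose size grows with the number of reachable states.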

Currently, model checking continues to attract considerable attention from the industry. This can be partly explained by it being a rather general verification approach that is suitable for applications stemming from different areas, ranging from embedded systems to hardware design. In addition, it is an automatic, lightweight technique supporting partial verification, and it requires a low degree of user interaction and a lower degree of expertise (Baier and Katoen 2008) compared to other verification techniques.

¹ Coq Reference Manual, Version 8.6: https://coq.inria.fr/distrib/current/files/Reference-Manual.pdf


Its main weaknesses stem, on the one hand, from the combinatorial state-space explosion (the number of states needed to model the system accurately may easily exceed the amount of available computer memory) and, on the other hand, from it being less suitable for data-intensive applications.

Model checking techniques also impose the production of models, often expressed using finite-state automata, which are in turn described in a dedicated description language. Another prerequisite for model checking is a formal specification of the properties to be verified, typically provided by means of temporal logic, which is suitable for the specification of a variety of properties ranging from functional correctness and safety to liveness, fairness, and real-time properties (Baier and Katoen 2008).

Deductive Verification Methods. Deductive verification methods consist in producing formal correctness proofs by first generating a set of formal mathematical proof obligations from the program and its specification, and by subsequently discharging these. Based on the manner in which proof obligations are discharged, namely automatically or interactively, deductive verification methods can be classified into two broad categories. Both require a thorough understanding of the system to be proven, as well as a good knowledge of the employed proof tools.

The first category of deductive methods relies on standalone tools that accept as inputs programs written in a specific programming language (such as Java, C, or Ada) and specified in a dedicated annotation language (such as JML or ACSL). These automatically produce a set of mathematical formulas, called verification conditions, which are typically proven using automatic theorem provers (Gallier 1987) or satisfiability modulo theories (SMT) solvers such as Alt-Ergo, Z3, CVC3, or Yices. Deductive verification tools such as Why3 or Boogie have their own programming and specification language (WhyML and Boogie, respectively), which can act as intermediate verification languages and are designed as a layer on which to build program verifiers for other languages. Verifiers for C, Dafny, Chalice, and Spec# have been built using Boogie. WhyML has been used for the verification of Java, C, and Ada programs.

The second category of deductive methods relies on interactive theorem provers (Bertot and Castéran 2004), also called proof assistants, such as Isabelle, Coq, Agda, HOL, or Mizar. Both the program and its specification are encoded in the proof assistant’s own language (for instance, Gallina for Coq and Isar for Isabelle), and the proofs that a program follows its specification, i.e. that it is functionally correct, are typically conducted in an interactive manner using the underlying proof construction engine. In other words, users are required to actively participate in the verification process by providing inductive arguments and guiding the proof through proof tactics, proof hints, or strategies.

Both deductive verification methods offer a high level of assurance. For automatic theorem provers, the proof chain, consisting of multiple steps (the model of the input programming language, the generator of verification conditions, the SMT solver used) at which errors could potentially infiltrate, can be perceived as a weakness. For interactive theorem provers, the high level of expertise required to employ them can be perceived as discouraging by the wider audience. However, major industrial breakthroughs have recently been achieved. For instance, Hyper-V, Microsoft’s hypervisor for highly secure virtualization, was verified using VCC and the Z3 prover (Leinenbach and Santen 2009). CompCert (Leroy 2009), the first formally proven C compiler, was verified using the Coq proof assistant. High security properties of the seL4 microkernel (Klein et al. 2009) have been proven using the Isabelle/HOL proof assistant.

Static Program Analysis. Static program analysis comprises multiple techniques for computing, at compile time, safe approximations of the set of values or behaviours that can occur dynamically when executing a program. Static analysis techniques initially emerged in the field of compilation, where they provided ways to generate code efficiently by avoiding redundant or superfluous computations (Nielson, Nielson, and Hankin 1999).

Static analyses compute sound, conservative information. However, for decades their scalability to industrial-size programs has been doubted, and their application has been considered as being limited to the research world and to small programs. Recent major breakthroughs have been achieved, however, and they triggered, on the one hand, the inclusion of static analysis at different levels of the software validation process (Cousot 2001) and, on the other hand, a proliferation of static code analysers for a variety of languages, targeting mainstream usage and offering a solution for detecting and eliminating common runtime errors. A recent example is Infer (Calcagno and Distefano 2011), an open-source static analysis tool for bug detection in Java, C, and Objective-C code. It was developed at Facebook, where it is used as part of the development process for mobile applications. Furthermore, static analysis techniques and tools are nowadays employed in the safety-critical market segment. For instance, Astrée (Cousot et al. 2005; Blanchet et al. 2003; Cousot et al. 2007), a static analyser for embedded software written in C, has been employed for the verification of aerospace software (Delmas and Souyris 2007; Bouissou et al. 2009; Bertrane et al. 2015). In particular, it has been used for proving the absence of runtime errors in the primary flight control software of the fly-by-wire system of Airbus airplanes.

It is argued (Cousot and Cousot 2010) that model checking, deductive verification, and static program analysis represent approximations of the program semantics formalized by the abstract interpretation theory (Cousot and Cousot 1977).

Broadly speaking, this thesis focuses on static program analysis techniques that are meant to be used during interactive theorem proving, in order to facilitate and automate the verification of a certain class of properties in the context of a strongly typed language.

1.2 The Frame Problem in a Nutshell

The frame problem (McCarthy and Hayes 1969) was initially identified and described by McCarthy and Hayes in 1969, in the context of Artificial Intelligence (AI). Its history is essentially intertwined with that of logicist AI, the branch of AI attempting to formalize reasoning within mathematical logic. The initial description of the frame problem is the following:

“In proving that one person could get into conversation with another, we were obliged to add the hypothesis that if a person has a telephone he still has it after looking up a number in the telephone book. If we had a number of actions to be performed in sequence, we would have quite a number of conditions to write down that certain actions do not change the values of certain fluents. In fact with n actions and m fluents we might have to write down mn such conditions.”

Unsurprisingly, given its identification in the context of logicist AI, the frame problem manifests itself in the realm of formal software specification and verification as well (Borgida, Mylopoulos, and Reiter 1993). In this area it remains a current problem, with notoriously tedious consequences, imposing a considerable amount of manual effort. For instance, when considering a simple procedure

transferAmount(ownerId, id1, id2, amount)

that records the transfer of a given sum of money, amount, from a customer’s (identified by ownerId) current deposit account (identified by the account number id1) to a savings account (identified by the account number id2), a reasonable specification would be the following:

Precondition: owner(id1) = ownerId ∧ owner(id2) = ownerId ∧ availableAmount(id1) ≥ amount

Postcondition: availableAmount(id1)′ = availableAmount(id1) − amount ∧ availableAmount(id2)′ = availableAmount(id2) + amount

The program states prior to the procedure’s execution and the ones subsequent to it are referred to by the typical unprimed/primed notation and by the availableAmount(id) and owner(id) functions. The given specification declares a precondition that has to hold prior to transferring the indicated sum of money from one account to the other; it stipulates that the customer identified by ownerId must be the owner of both accounts involved in the transaction. It also requires that the amount of money currently available in the deposit account identified by id1 be at least the amount to be transferred. The postcondition specifies the procedure’s effects on the final program state and encompasses the conditions that have to hold after executing the procedure. They include a stipulation about incrementing the amount of money available in the savings account by the transferred sum, amount, as well as one about decrementing the amount of money available in the current account by the same amount.

As discussed by Borgida et al. (Borgida, Mylopoulos, and Reiter 1993), the principles on which this specification relies are simple and ubiquitous. Program states are represented in terms of predicates and functions, and a procedure’s effects on the program state are represented as changes to one or more of these predicates and functions. However, the above specification can be interpreted in at least two ways, and multiple implementations with different effects can comply with it. For instance, one implementation results in exactly two changes to the program state, as required by the postcondition and as intuitively expected. Another implementation makes these two changes, but additionally changes the ownership of the two accounts involved in the transaction. The postcondition still holds after executing the second version of the procedure. However, the intuitive interpretation of the given specification, namely that nothing but the amount of money in the two accounts changes, is inconsistent with the second implementation, which does more than is necessary, or indeed desired. In order to prevent such situations, the postcondition for the transferAmount(ownerId, id1, id2, amount) procedure would also have to include conditions such as

∀ id. owner(id)′ = owner(id) ∧ owner(id2)′ = owner(id2) ∧
∀ id. id ≠ id1 ⇒ id ≠ id2 ⇒ availableAmount(id)′ = availableAmount(id)
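The two implementations discussed above can be sketched in Python to make the gap between the postcondition and the frame conditions concrete. The dictionary-based state, the function names, and the sample accounts are all assumptions made for this illustration, and id1 and id2 are assumed to be distinct.

```python
from copy import deepcopy

def transfer_minimal(state, owner_id, id1, id2, amount):
    """Changes only the two balances involved in the transfer."""
    new = deepcopy(state)
    new["available"][id1] -= amount
    new["available"][id2] += amount
    return new

def transfer_overreaching(state, owner_id, id1, id2, amount):
    """Also changes ownership, which the postcondition alone does not forbid."""
    new = transfer_minimal(state, owner_id, id1, id2, amount)
    new["owner"][id1] = "someone-else"
    new["owner"][id2] = "someone-else"
    return new

def precondition(state, owner_id, id1, id2, amount):
    return (state["owner"][id1] == owner_id
            and state["owner"][id2] == owner_id
            and state["available"][id1] >= amount)

def postcondition(old, new, id1, id2, amount):
    return (new["available"][id1] == old["available"][id1] - amount
            and new["available"][id2] == old["available"][id2] + amount)

def frame_condition(old, new, id1, id2):
    """Ownership is unchanged, and so is every balance other than id1 and id2."""
    owners_kept = all(new["owner"][i] == old["owner"][i] for i in old["owner"])
    others_kept = all(new["available"][i] == old["available"][i]
                      for i in old["available"] if i != id1 and i != id2)
    return owners_kept and others_kept

before = {
    "owner": {"A1": "alice", "A2": "alice", "B1": "bob"},
    "available": {"A1": 100, "A2": 50, "B1": 70},
}
assert precondition(before, "alice", "A1", "A2", 30)

for transfer in (transfer_minimal, transfer_overreaching):
    after = transfer(before, "alice", "A1", "A2", 30)
    print(transfer.__name__,
          postcondition(before, after, "A1", "A2", 30),
          frame_condition(before, after, "A1", "A2"))
```

Both versions satisfy `postcondition`; only `frame_condition` distinguishes them, which is why it has to be stated explicitly.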

In other words, the postcondition should include not only information about what changes, but also about what does not change. While this might not seem dramatic for the trivial example illustrated above, in real-world examples this quickly escalates, leading to the necessity of specifying a plethora of conditions of the same type as the ones indicated above. These are called frame properties. Writing such conditions is necessary, but also notoriously repetitive and tedious. Kogtenkov et al. (Kogtenkov, Meyer, and Velder 2015) rightfully state that

“It is hard enough to convince programmers to state what their program does; forcing them, in addition, to specify all that it does not do may be a tough sell.”

The tedious, undeserved manual effort entailed by the specification and verification of frame properties is a manifestation of the frame problem. Though certain conventions and approaches, such as the implicit frames approach for specifying frame properties, can alleviate the manual effort imposed, some manifestation of the frame problem will be visible to some extent in the context of any specification language and verification method.

1.3 Prove & Run: Objectives and Products

The proliferation of mobile devices with unprecedented processing power, storage capacity, and access to information has already generated a plethora of new possibilities for billions of people. Breakthroughs in emerging technology, stemming from fields such as artificial intelligence and the Internet of Things, have increased the number of such possibilities, but have also brought forth an unprecedented number of massive security risks and challenges. Prove & Run’s² objective is to offer solutions for the security challenges entailed by the large-scale deployment of mobile and connected devices and of the Internet of Things.

Attempts at addressing security challenges and diminishing or eliminating potential security issues in systems linked to such devices must put their underlying operating systems and kernels at the core of their efforts to ensure the absence of errors or faulty behaviours. Any software running on the operating system depends on the operating system. Furthermore, operating systems run in privileged modes, in which protection from certain faulty behaviours is non-existent and bugs can lead to arbitrary effects. Therefore, these central software parts need to provide a high level of trust and demonstrate proven and auditable compliance with security properties.

Motivated by the desire to integrate the usage of formal methods in the industry world, and therefore to contribute to the increase of software quality and security, the company’s initial efforts concentrated on offering a reliable software solution that facilitates the formalization of software functioning and mathematically proves that this software accurately and correctly follows its specification and ensures complex security properties. This led to the development of ProvenTools, a software development toolchain designed to write and formally prove models written in Smart, Prove & Run’s purely functional, unified programming and specification language. For formally proving models written in Smart, ProvenTools integrates an interactive proof assistant, which automates simple proofs and guides or assists users during more complex ones. The prover was designed to offer detailed explanations about its results, providing either the reasoning steps employed for achieved proofs or detailed information for properties that cannot be proven. Such transparency on the prover’s side is imperative for products that have to be certified, as auditors need to be able to verify the claims of the prover. Furthermore, ProvenTools includes a generator for transforming programs modeled in Smart into their equivalents in other languages, such as C, while leveraging the proof guarantees of the Smart model.

Following the development of ProvenTools, Prove & Run reached a new stage, concentrating on developing and providing formally proven microkernels and hypervisors. Unlike the widely used operating systems, which are enormous and typically have millions of lines of code, microkernels are compact, minimal software systems that can provide all the mechanisms that need to run in privileged mode, including low-level address space management, thread management, and inter-process communication. They can be used for creating a protected, secure environment on the execution platform, on top of which sensitive, security-critical services can run. Being much smaller in size compared to traditional operating systems, they are amenable to formal verification. Hypervisors, or virtualization platforms, create and host virtual machines. They create the possibility of running multiple different operating systems, whose execution is managed by the hypervisor, which has full control over all critical resources such as the memory or the CPU. Therefore, any security issue of the hypervisor impacts every operating system it hosts. The security and reliability of the host hypervisor is thus crucial.

² Prove & Run Website: http://www.provenrun.com

By employing Smart and ProvenTools, two microkernels have been developed³. The first, named ProvenCore, is a formally proven, general-purpose microkernel that ensures isolation, i.e. integrity and confidentiality. The second, named ProvenCore-M, targets embedded devices based on microcontrollers. ProvenVisor is a hypervisor currently in development at Prove & Run.

1.4 Context and Problem Statement

During the development of ProvenCore, it became obvious that the specification and verification of transition systems in general, and operating systems in particular, are not insulated from the frame problem. The latter are characterized by complex states, defined by algebraic data types and associative arrays, which are fundamental building blocks for representing, grouping and handling complex data efficiently. Transitions, their other characteristic component, map such a complex input state to an output state. However, most transitions are not concerned with the entire input state that they manipulate in order to produce the output state. Most frequently, they depend on and modify only a limited subset of it. Intuitively, properties holding for the input state should hold for the output state following the transition as well, as long as they depend only on fragments of the state that are not modified by the transition. In practice, proving the preservation of such properties does not come for free and imposes considerable manual effort and a multitude of tedious, repetitive proofs.

Figure 1.1 – Complex Transition Systems: Frame Problem (diagram: states s and t, transition f, and an observation on each state)

³ Prove & Run products: http://www.provenrun.com/products


This general case is illustrated in Figure 1.1, where a transition system and a state s in it are considered. For the state s, a property depending only on a limited subset, shown in the grey rectangle with vertical lines, is known to hold. A transition f leads to a new state t, obtained by modifying only a small part of the input state s, shown in the orange rectangles with inclined lines. Since the previously proven property is known to depend only on an unmodified subset of the state, we should be able to infer the preservation of the property for the state t as well. This, however, is not inferred by default.

The goal of this work is to address this issue and to find an automatic solution for inferring the preservation of such properties. More specifically, we target the automatic inference of the preservation of properties that depend only on an input subset that is disjoint from an operation's frame, i.e. the state subset it modifies.

To this end, we propose a solution based on static analysis, which does not require any additional frame annotations. We argue that by detecting the subset on which a property depends, and by uncovering the part that is not modified by an operation, as shown in Figure 1.2, we can automatically discharge proof obligations related to unmodified parts. We employ two different static analyses for this goal.

Figure 1.2 – Frame Problem and Solution Strategy (diagram labels: Dependency, Correlation, Invariant)

The first analysis of our two-step strategy is a dependency analysis, which is meant to detect the input subset δ on which the outcome of an operation or of a logical property L relies. This was illustrated by the grey rectangle with vertical lines in Figure 1.1. The second one is a correlation analysis, meant to detect the subset ξ modified by an operation O. This was illustrated by the orange rectangles with inclined lines in Figure 1.1. By employing these two static analyses, thus detecting δ and ξ automatically, and by subsequently reasoning based on their combined results, we can infer the preservation of the property L for the post-state of O.
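The combined reasoning step can be sketched as follows. This is only an illustration in Python, not the Smart-based implementation: the sets standing for δ and ξ are written by hand here, whereas the two analyses compute them statically, and the field names (`owner`, `balance`, `log`) as well as the helper `preserved_by_framing` are hypothetical.

```python
# Hypothetical sketch: delta and xi are given by hand, not computed by an analysis.

def preserved_by_framing(delta: frozenset, xi: frozenset) -> bool:
    """Return True when a property depending only on `delta` is preserved
    by any operation that modifies at most `xi` (the two sets are disjoint)."""
    return delta.isdisjoint(xi)


# Fields on which the property L depends (assumed dependency result).
delta_L = frozenset({"owner"})
# Fields possibly modified by the operation O (assumed correlation result).
xi_O = frozenset({"balance", "log"})


def prop_L(state: dict) -> bool:
    # The property only reads the 'owner' field.
    return state["owner"] != ""


def op_O(state: dict) -> dict:
    # Purely functional update: the input state is left untouched.
    return {**state, "balance": state["balance"] + 1, "log": state["log"] + ["inc"]}


s = {"owner": "alice", "balance": 10, "log": []}
t = op_O(s)
```

Since `delta_L` and `xi_O` are disjoint, `prop_L(t)` follows from `prop_L(s)` without inspecting the body of `op_O`; when the sets overlap, the check simply gives no answer and the proof obligation remains.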

We target the development of a proof tactic that relies on our solution based on static analysis and that is meant to be integrated into the interactive proof assistant offered by ProvenTools. Smart, the language to which the ProvenTools toolchain is associated, is a purely functional language manipulating immutable algebraic data structures and associative arrays.


The motivation and ideas behind this work were triggered by the verification of ProvenCore. Its proof is based on multiple refinements between successive models, from the most abstract, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. The global states of the abstract layers are complex structures with multiple compound fields. Commands such as fork, exec and exit can be executed. Each of these receives as input the global state before executing the command and returns the state of the system after execution. In practice, most supported commands effectively affect only a limited number of invariants. Automatically proving the preservation of unaffected invariants can diminish the total number of proof obligations.

1.5 Contributions and Structure of the Document

We propose an approach for automatically inferring the preservation of framing-related invariants, which is meant to be used in the context of an interactive theorem prover. Our approach employs two different static analyses, namely a dependency analysis and a correlation analysis. Both analyses handle associative arrays and algebraic data types, i.e. structures and variants, and compute fine-grained results mirroring the layered structures of such types.

The dependency analysis handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural analysis. For variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario.

In order to introduce a relaxed form of context-sensitivity for our dependency analysis, we have devised an extension based on symbolic paths.

The correlation analysis detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. It is an interprocedural analysis that summarises the behaviour of functions and detects what is modified and to what extent.

For both analyses, a prototype has been implemented and applied to a medium-sized functional specification of a microkernel.

The rest of this dissertation is structured into 8 chapters, the first two being introductory.

Chapter 2 discusses the manifestations and effects of the frame problem on both formal specification and formal verification, and presents some of the main approaches employed for addressing them. We also include a brief presentation of some of the leading specification languages and deductive verification tools and their mechanisms for dealing with frame properties.

In Chapter 3, we introduce the features and the syntax of Smart, the unified programming and specification language developed at Prove & Run, and give a concise overview of ProvenTools, the toolchain associated with it.


After these two preliminary chapters, in Chapter 4 we focus on the computational version of Smart's intermediate language, as it is the language that we consider throughout the rest of this dissertation. We present its syntax, underline its specificities and present its formal semantics.

Chapter 5 is dedicated to the dependency analysis, the first of the two static analyses that we have developed and designed as companion tools to be used during interactive program verification. We present our abstract dependency domain, which mirrors the layered structure of associative arrays and algebraic data types, discuss the analysis at an intra- and interprocedural level, and present the semantic interpretations of the computed dependency information.

Chapter 6 touches upon the issue of context-sensitivity and presents our extension to the dependency analysis presented in Chapter 5. This is meant to eliminate some imprecision by introducing a relaxed form of context-sensitivity.

The correlation analysis, the second component of our strategy for inferring the preservation of frame-related invariants, is presented in Chapter 7. We introduce our abstract partial equivalence type, discuss the need for an additional level of abstraction allowing us to refer not only to variables but also to substructures within them, and give an in-depth presentation of the analysis at an intraprocedural level and a description of it at the interprocedural level.

The implementations of our two analyses and the results obtained on a medium-sized functional specification of a microkernel are presented in Chapter 8. The strategy for employing the information computed by the two analyses is discussed and illustrated.

Finally, Chapter 9 concludes this dissertation with a summary of our contributions and some remarks concerning the specificities of each of our static analyses, as well as our experience with their design and implementation. In addition, we also discuss future perspectives and potential extensions to this work.

Notes about Chapter 5 and Chapter 7

• The work presented in Chapter 5 was the subject of a publication in the proceedings of the 17th International Conference on Formal Engineering Methods (ICFEM'15) (Andreescu, Jensen, and Lescuyer 2015).

• The work presented in Chapter 7 was the subject of a publication in the proceedings of the 14th International Conference on Software Engineering and Formal Methods (SEFM) (Andreescu, Jensen, and Lescuyer 2016).

• On-line dedicated web pages: the prototypes for each of the two discussed static analyses can be tested on their dedicated web pages. Various examples are provided and explained and, additionally, users can devise and test their own examples. The corresponding links are indicated in the chapters.


Chapter 2

The Frame Problem in Software Verification

All his successors gone before him have done't, and all his ancestors that come after him may.

William Shakespeare

In this chapter, in Section 2.1, we give a very brief, necessarily incomplete presentation of some of the major existing specification languages and verification tools, focusing on those which have addressed the frame problem explicitly and which are relevant for our discussion in the section following it. We then discuss the manifestations of the frame problem in formal specification and verification in Section 2.2 and present the basic approaches to specifying and verifying frame properties in Section 2.3. In Section 2.4, we explain some of the difficulties entailed by these goals when combined with other concerns, such as considerations regarding heap modifications and information hiding. Even though we are not concerned with information hiding, and heap modifications are beyond the scope of our work, there are some parallels that can be drawn, and some ideas stemming from work that has been done in these areas are relevant for our context and solution as well. In Section 2.5, we briefly present other approaches to the automatic detection of frame properties. Finally, we give a short overview of some of the approaches used for specifying and reasoning about pure methods in Section 2.6.

2.1 Specification Languages and Verification Tools

Dafny. Dafny (Leino 2010) is a programming language designed at Microsoft with a focus on verification. It is an imperative, sequential language supporting generic classes, dynamic allocation and inductive data types. Additionally, it also offers built-in specification constructs such as pre- and postconditions, frame specifications (which we will discuss in more detail in Section 2.3), quantifiers, loop invariants and termination metrics (decreases clauses, used in conjunction with loop invariants). These are reminiscent of contracts in Eiffel (Meyer 1997; Meyer 1991) or similar constructs in JML (Leavens, Baker, and Ruby 2006) and Spec# (Barnett et al. 2005b), which we will present in the following paragraphs as well. Additionally, Dafny also includes support for algebraic data types, recursive functions and types, as well as updatable ghost variables, which are not allowed to flow into non-ghost variables. Ghost variables, and specification constructs in general, are eliminated from the executable code, as they are meant to be used strictly during verification. For framing, Dafny relies on dynamic frames (Kassios 2006) using ghost variables. We will discuss this approach in Section 2.4.

Dafny has an accompanying static program verifier, run as part of the compiler, which targets the verification of functional correctness properties of programs. This is built on top of the Boogie verification engine (Barnett et al. 2005a), which in turn uses Z3 (Moura and Bjørner 2008). The Dafny compiler translates verified programs written in Dafny to executable code for the .NET platform. The tool is open source and can be tried online¹.

Smart, the modeling language developed at Prove & Run, will be presented in detail in Chapter 3. Similar to Dafny, it is a unified programming and specification language designed with the goal of facilitating verification. Unlike Dafny, Smart is a functional language relying on predicates, the equivalent of functions in other programming languages. Both Dafny and Smart are translated into intermediate languages (Boogie and Smil, respectively), which act as intermediate layers between Dafny or Smart programs and the underlying verification tools. For Smart, the deductive verification tool is an interactive proof assistant. Executable code can be generated from both verified Dafny and verified Smart models.

Spec#. The Spec# programming system (Mike Barnett 2005; Barnett et al. 2005b; Barnett et al. 2011) includes a programming language, a compiler and a static program verifier. It stems from a research effort focusing on the development of a specification methodology for object-oriented languages and seeking suitable approaches for enforcing it, both statically and dynamically. The Spec# methodology introduced some new ideas that influenced the research community and served as a starting point for other approaches (Barnett et al. 2011). It supports sound modular verification of object invariants in the presence of multi-object invariants, subclassing and reentrancy. Spec# led to advances concerning the specification of pure methods, i.e. methods without side-effects, and it introduced an ownership model that allows expressing and using heap topologies in specifications (Barnett et al. 2011). We will discuss the latter in Section 2.4.

The language Spec# is a formal object-oriented language extending the type system of C# with non-null types and checked exceptions. It provides standard method contracts based on pre- and postconditions, as well as object invariants, as inspired by Eiffel and the Design by Contract (Meyer 1992) approach. The accompanying compiler performs various static data-flow analyses for checking that the non-null type system is enforced and that contracts are pure, i.e. have no side-effects. In addition, it also performs admissibility checks, which are important for soundness and consist in restricting what can appear in object invariants and what pure methods can read. The compiler also emits runtime checks: run-time assertions are generated for the program points at which contracts are supposed to hold, and any failure causes an exception to be thrown (Barnett et al. 2011).

¹ Dafny web page: https://www.microsoft.com/en-us/research/project/dafny-a-language-and-program-verifier-for-functional-correctness Accessed 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oE9sn0iL)

Another important contribution originating in the Spec# project is the Boogie intermediate language and its verification engine. Spec# programs are translated to the Boogie language, where the heap is modeled as a two-dimensional array indexed by object references and field names. Method calls are modeled by assuming their preconditions and type information, by assigning arbitrary values to anything that they might modify, and by subsequently assuming their postconditions. Based on this, verification conditions are generated and expressed in a standard format supported by automatic theorem provers. Any error reported by the theorem prover is mapped back to Boogie and then to Spec# (Barnett et al. 2011).

Spec#² has been developed at Microsoft and is publicly available.

Boogie. The Boogie project³ comprises both an intermediate verification language and a verification tool. The Boogie language (This is Boogie 2 Boogie Reference Manual) is meant to be used as an intermediate representation for static program verifiers of various source languages, such as Dafny, Chalice and Spec#. Verifiers for C, such as VCC and HAVOC, have been built on top of Boogie as well. It supports mathematical components (types, constants, functions, axioms) and imperative components (global variables, procedure declarations and implementations). The latter specify sets of execution traces, thereby describing and constraining states using the former. Parametric polymorphism, partial orders, nondeterminism, logical quantifications, total expressions and partial statements are among the language's features.

The Boogie verification tool (Barnett et al. 2005a) infers invariants of the input Boogie programs and then generates verification conditions, expressed as formulae in first-order logic and arithmetic, that are passed to an SMT solver such as Z3. The encoding for the verification formulae allows the reconstruction of error traces from failed proofs.

JML. The Java Modeling Language (JML) (Leavens, Baker, and Ruby 2006; Leavens et al. 2006) is a behavioural interface specification language (Wing 1987) targeting, as its name implies, the specification of Java classes and interfaces. Its design was guided by the syntax and semantics of Java, as some of the main targeted characteristics were understandability and a shallow learning curve for programmers already familiar with Java. The constructs it supports are inspired by the Design by Contract approach, as well as by the Larch family of specification languages (Guttag, Horning, and Wing 1985). It also includes quantifiers, constructs for specifying frame conditions, and specification-only fields and methods.

² Spec# web page: https://www.microsoft.com/en-us/research/project/spec Accessed 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAJnY8b)

³ Boogie web page: https://www.microsoft.com/en-us/research/project/boogie-an-intermediate-verification-language Accessed 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAgwOzp)

Nowadays, an ever-growing variety of tools supports JML (Burdy et al. 2005), ranging from tools for type-checking specifications (the jmlc compiler) to tools for runtime debugging, static analysis (such as ESC/Java2 (Flanagan et al. 2002; Burdy et al. 2005; Chalin et al. 2005) and Chase) and verification (such as LOOP, KeY and KRAKATOA).

ESC/Java2 performs extended static checking (Flanagan et al. 2002) for Java programs annotated with specifications written in JML. It can check assertions and detect frequent types of errors in Java, such as dereferencing null or indexing an array outside its bounds. However, the ESC/Java2 tool did not initially address aspects related to checking frame conditions, and this became a notorious source of unsoundness (Burdy et al. 2005). Various static verification tools (Berg and Jacobs 2001; Cataño and Huisman 2003; Marché, Paulin-Mohring, and Urbain 2004; Marché 2016) and dynamic approaches (Lehner and Müller 2010) addressed this issue.

2.2 Manifestations of the Frame Problem

In the realm of software verification, the frame problem refers to establishing the boundaries within which program elements operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties or frame conditions, which indicate which parts of the program state an operation is allowed to modify, and their verification, i.e. proving that operations modify only what is allowed according to the specified frame properties. Additionally, the verification of frame properties has other ramifications, such as proving the preservation of properties concerning parts of the state that are external to an operation's frame, i.e. the parts of the state modified by the operation. Though identified decades ago, in 1969, in the context of Artificial Intelligence (McCarthy and Hayes 1969), the frame problem is still a current concern in the field of formal specification and verification. Leavens et al. (Leavens, Leino, and Müller 2007) identify it as one of the difficult remaining challenges in program verification. Even more recently, Bertrand Meyer described it as a subsisting problem (Meyer 2015). He argues that it constitutes an excellent candidate for automation and describes the usual approaches to the frame problem, such as those frequently based on separation logic (Reynolds 2005) or ownership types (Clarke, Potter, and Noble 1998), as elegant but requiring undeserved manual specification effort in addition to annotations on the implementation side. In order to make verification appealing to a wider audience in the industry, the amount of annotations required from the programmers is of the utmost importance and thus must be carefully taken into consideration when devising a solution. While it is legitimate to require the specification of properties expressing the functional behaviour expected of program elements, intermediate properties, to which frame properties belong, should as much as possible be detected automatically. They are an integral part of a complete specification and they are necessary for proving functional correctness, but in practical terms they are repetitive and cumbersome, and their specification is an inconvenience (Meyer 2015). Borgida et al. provide a comprehensive discussion of the problem itself and the approaches to addressing it (Borgida, Mylopoulos, and Reiter 1993; Borgida, Mylopoulos, and Reiter 1995). In (Borgida, Mylopoulos, and Reiter 1995), Borgida et al. suggest grouping the permissions to modify variables around variables themselves instead of methods. However, this type of specification has unclear semantics in terms of proof obligations (Müller 2002). A more recent discussion of framing is provided by Hatcliff et al. and is included in a comprehensive survey of behavioural interface specification languages (Hatcliff et al. 2012). A discussion regarding the remaining challenges related to the frame problem, with a focus on modular verification and information hiding, is included in (Leavens, Leino, and Müller 2007). The authors discuss possible approaches for addressing these challenges, as well as their respective limitations. In the following section, we present the main existing approaches to specifying frame properties.

We remark that Smart does not provide any explicit specification constructs for frame conditions. It is a functional language and it does not support global variables or destructive updates. Implicitly, Smart predicates may read anything passed to them as an input, without modifying it, and write everything in their output or locally declared variables. The preservation of a frame property, i.e. a logical property depending only on parts of the input that are copied without any modification to the output, can be specified as an implication of the form:

frame_property(input) ⇒ predicate(input, output) ⇒ frame_property(output)

which can be included either in the predicate's postcondition or as a separate predicate with a Boolean result, receiving the predicate's input and output elements as inputs.
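To make the shape of this implication concrete, the following Python sketch checks it exhaustively on a small, hand-picked set of states. It is only an illustration under stated assumptions: the record fields (`owner`, `counter`), the concrete `frame_property` and `predicate`, and the helper `implication_holds` are hypothetical, and testing on a finite sample is of course not a proof, whereas in Smart the implication would be a proof obligation.

```python
from itertools import product


def frame_property(state: dict) -> bool:
    # Hypothetical property: it depends only on the 'owner' field.
    return state["owner"] in {"alice", "bob"}


def predicate(inp: dict, out: dict) -> bool:
    # Relational view of a transition: 'counter' is incremented,
    # 'owner' is copied unchanged from the input to the output.
    return out["counter"] == inp["counter"] + 1 and out["owner"] == inp["owner"]


def implication_holds(inp: dict, out: dict) -> bool:
    # frame_property(inp) => (predicate(inp, out) => frame_property(out))
    return (not frame_property(inp)) or (not predicate(inp, out)) or frame_property(out)


owners = ["alice", "bob", "mallory"]
counters = [0, 1, 2]
states = [{"owner": o, "counter": c} for o, c in product(owners, counters)]

# Check the implication for every (input, output) pair of the sample.
all_ok = all(implication_holds(i, o) for i in states for o in states)
```

The implication holds on every pair because `predicate` forces the `owner` field of the output to equal that of the input, which is the only field `frame_property` reads.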

2.3 Approaches to Specifying Frame Properties

Various approaches for expressing frame properties have emerged. These are known as the manual, exclusive and implicit approaches (Meyer 2015). We remark that all three major approaches target only the specification of write effects of an operation. Most specification languages do not offer special constructs for the specification of read effects (some notable exceptions are JML, Dafny and WhyML, the programming and specification language provided by Why3).

2.3.1 The Manual Approach

One of the existing approaches to specifying frame properties does not rely on any specific technique, but instead treats them like any other specification component. This consists in explicitly stating, for each operation, what is not modified, implicitly conveying that everything else may change. This type of specification can be done with logical variables or with old expressions, by explicitly stating for each unchanged variable that its value in the operation's post-state is equal to its prior value in the operation's pre-state.

As described by McCarthy and Hayes (McCarthy and Hayes 1969), with m operations, such as transfer, and n "fluents", such as owner in our introductory example from Section 1.2, the manual convention leads to a proliferation of clauses that need to be specified. Their number can potentially be as high as m × n. This can prove to be tedious and repetitive, and it diverts attention and effort from what is truly interesting: what is actually modified by the operation, and how. Moreover, this approach can lead to instability in the software process (Meyer 2015).

For instance, adding new fields to a class whose existing methods are not affected by the newly added fields requires modifying the postcondition of each existing method and adding clauses of the form newField = old newField for each added field.
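The growth in the number of clauses can be illustrated with a short Python sketch that generates the frame clauses the manual approach would require. The names `transfer` and `owner` come from the example above; `rename`, `balance`, `limit` and the helper `manual_frame_clauses` are hypothetical and serve only the illustration.

```python
def manual_frame_clauses(operations: dict, fluents: list) -> dict:
    """For each operation, emit one `x = old(x)` clause per fluent it leaves unchanged.

    `operations` maps an operation name to the set of fluents it modifies.
    """
    return {
        op: [f"{x} = old({x})" for x in fluents if x not in modified]
        for op, modified in operations.items()
    }


fluents = ["owner", "balance", "limit"]
operations = {"transfer": {"balance"}, "rename": {"owner"}}

clauses = manual_frame_clauses(operations, fluents)
total = sum(len(c) for c in clauses.values())

# Adding a field that no existing operation touches adds one clause per operation.
clauses_after = manual_frame_clauses(operations, fluents + ["newField"])
total_after = sum(len(c) for c in clauses_after.values())
```

With two operations and three fluents, four clauses are needed; adding a single untouched field raises the total to six, since the postcondition of every existing operation has to be revisited.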

Both Dafny (Leino 2010) and Spec# (Leino and Müller 2008a) support clauses of the form e = old(e) in method postconditions for specifying that a method has no impact on the value of an expression e. However, these are not the primary mechanisms for specifying frames in either Dafny or Spec#, as we will discuss in Section 2.3.2.

In Smart, for predicates manipulating inputs and outputs of the same structured type, it can be specified in the postcondition that the values of certain fields are equal between the received input and the obtained output. For instance, for a predicate receiving an input structure of type stype having fields f, g, h, and returning an output structure of the same type where the values of the fields f, h are equal to their values in the input, a standard postcondition would have the following form:

stypeequals[fh](input output)

This can be viewed as a form of old expressions. However, the construct used in the above postcondition, which we will discuss in Chapter 3, was not introduced specifically for this purpose. This idiom is frequently employed for specifying contracts for implicit predicates, a form of foreign or native function signatures.

As we will discuss in Chapter 7, the fine-grained relations that we are detecting between parts of the input and parts of the output can be seen as clauses of the form subvalue = old(subvalue). However, in our case these are detected automatically by means of static analysis, and thus do not require any annotation or manual effort. Furthermore, since they are detected automatically, changes to the modeled entities and types no longer risk introducing instability.

Another problem with this approach becomes visible when some variables are not in scope and hence cannot be explicitly mentioned in the specification (Hatcliff et al. 2012). In order to overcome the problem in this context, complex solutions (Reynolds 1981; O'Hearn, Reynolds, and Yang 2001; Banerjee, Naumann, and Rosenberg 2008) based on Hoare logic style frame rules (Hoare 1971) have been suggested (Hatcliff et al. 2012).


2.3.2 The Exclusive Approach

The most frequent approach to framing is the exclusive approach. This consists in expressing frame properties by means of modifies clauses that list all the variables that may be modified by an operation. Implicitly, everything that is not listed in such clauses is understood as having to remain unchanged (Guttag et al. 1993a). This approach relies on the observation that the m × n matrix described by McCarthy and Hayes is usually sparse, as most operations affect only a limited number of elements (Meyer 2015).

Modifies clauses such as modifies a, b, c can be interpreted as a set of clauses of the form q = old(q) for any q other than a, b or c. Despite their widely accepted, yet mildly misleading name, modifies clauses do not require a command to modify all the listed elements. Essentially, modifies clauses put an upper bound on the set of elements that can be modified and imply that it is strictly forbidden to modify anything else. The exclusive approach to specifying frame properties owes its name to its characteristic of identifying unaffected elements by exclusion (Meyer 2015). Bertrand Meyer argues that a more appropriate name for such clauses is only clauses (Meyer 2015), since the main goal is not necessarily to enumerate variables that will change, but rather to specify that everything else, i.e. variables that are not listed, will not change.
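The upper-bound reading of a modifies clause can be sketched in Python as a subset check between the fields that actually changed and the fields that are listed. This is a dynamic check on concrete pre- and post-states, given purely as an illustration; the helpers `changed_fields` and `respects_modifies` and the sample states are hypothetical.

```python
def changed_fields(pre: dict, post: dict) -> set:
    """Fields whose value differs between the pre-state and the post-state."""
    return {k for k in pre.keys() | post.keys() if pre.get(k) != post.get(k)}


def respects_modifies(pre: dict, post: dict, modifies: set) -> bool:
    """A modifies clause is an upper bound: every changed field must be listed,
    but listed fields are not required to change."""
    return changed_fields(pre, post) <= set(modifies)


pre = {"a": 1, "b": 2, "c": 3, "q": 4}
post_ok = {"a": 10, "b": 2, "c": 3, "q": 4}   # only a changes: allowed
post_bad = {"a": 1, "b": 2, "c": 3, "q": 5}   # q changes but is not listed

frame = {"a", "b", "c"}                        # modifies a, b, c
```

Here `post_ok` respects the clause even though `b` and `c` are left untouched, and an operation that changes nothing at all respects it as well, whereas `post_bad` violates it because `q = old(q)` no longer holds.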

This approach has its roots in the modifies construct presented by Liskov and Guttag (Liskov and Guttag 1986). Forms of modifies clauses have been used in many different specification languages, including the Larch family (Guttag, Horning, and Wing 1985; Guttag et al. 1993a), JML (Leavens et al. 2006), Spec# (Mike Barnett 2005), Dafny (Leino 2010) and Z (Abrial, Schuman, and Meyer 1980).

In JML (Leavens, Baker, and Ruby 2006), modifies clauses are called assignable clauses and are used for indicating locations that a method may assign to. These are slightly different from classical modifies clauses in other languages. For instance, a method assigning to a location a and then re-establishing its original value is required to list a in its corresponding assignable clause. A typical modifies clause, however, does not require listing a, since the method does not effectively modify a. JML also features conditional modifies clauses, allowing methods to specify that a modification may occur only in certain situations. Non-pure methods that do not explicitly specify assignable clauses are by default given an assignable \everything clause. Pure methods have by default an assignable \nothing clause (Chalin et al. 2005). Additionally, JML provides accessible clauses that allow specifying accessed locations (Leavens et al. 2006).

In Dafny (Leino 2010), modifies clauses are expressed by sets of objects, and they must be interpreted as giving permissions to a method to modify any field of any object that is a member of the specified set. Frame conditions are thus expressed at the level of objects and not at the level of object fields. While Dafny methods are not required to specify what they read, for Dafny predicates, i.e. functions returning Booleans, reading frame conditions can also be specified (Koenig and Leino 2012). These are memory locations that predicates are allowed to read, and they can be specified as sets of objects or object fields. Dafny checks that memory locations outside the reading frame are not accessed; nested predicate calls must have reading frames that are included in the reading frames of the calling predicate. Predicate parameters are not memory locations and hence need not be declared. In addition, Dafny uses a form of dynamic frames (Kassios 2006) that we will present in Section 2.4.

In Spec# (Mike Barnett 2005; Leino and Müller 2008a), modifies clauses can be explicitly added for constraining the modification of objects that were allocated in the pre-state of a method, i.e. new objects allocated and modified by a method need not be included in the modifies clauses. Methods can specify that any field of an object o may be modified with a construct of the following form: o.*; it can also be specified that only some field a may be modified, with a construct of the form o.a. Unlike the clauses expressed using old in postconditions for excluding some modifications, modifies clauses must account for temporary modifications as well (similarly, thus, to the JML assignable clause interpretation). For instance, for a method decrementing some integer field f and incrementing it subsequently, the method could still specify that f = old(f) in its postconditions. However, it would also have to include f in its modifies clause.
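The difference between what a postcondition with old observes and what such a modifies (or JML assignable) clause must account for can be sketched in Python by logging every assignment. This is a hypothetical illustration: `TracedRecord`, its `write` method and `dec_then_inc` are invented for the example and do not correspond to any Spec# or JML API.

```python
class TracedRecord:
    """A mutable record that logs every field assignment (a stand-in for a heap object)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.assigned = set()  # names of all fields written at least once

    def write(self, name, value):
        self.assigned.add(name)
        self.fields[name] = value


def dec_then_inc(obj: TracedRecord) -> None:
    # Temporarily decrements f, then restores its original value.
    obj.write("f", obj.fields["f"] - 1)
    obj.write("f", obj.fields["f"] + 1)


obj = TracedRecord(f=5, g=0)
pre = dict(obj.fields)
dec_then_inc(obj)
post = dict(obj.fields)

# Postcondition view: compares only pre-state and post-state, so f = old(f) holds.
net_changes = {k for k in pre if pre[k] != post[k]}
# Assignment view: f was written, so it must be listed in the modifies clause.
assigned = obj.assigned
```

The set `net_changes` is empty while `assigned` contains `f`, which mirrors the situation described above: the postcondition f = old(f) is valid, yet f still has to appear in the clause.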

Spec# implicitly adds a modifies clause to methods, in which this.* is the only listed element. Thus, by default, methods are allowed to modify any field of the this object. To prevent this, the fields that may be modified must be explicitly included in the clause (meaning that those not included are not allowed to change). A special construct of the form this.0 must be explicitly used for specifying that a method does not modify any field of this (Leino and Müller 2008a).

Information hiding requires mechanisms for abstracting over program state that cannot be explicitly mentioned in the modifies clause of a public method. To this end, wildcards can be used for specifying that the private representations of objects may be modified, as well as for specifying the modification of state in subclasses (Leino and Müller 2008a). However, wildcards do not extend to aggregate objects, and to this end Spec# introduces the notion of ownership, which we will discuss in Section 2.4.

In Boogie, frame conditions are expressed using coarse-grained modifies clauses in conjunction with postconditions. These can quantify over fields and specify locations of the heap that may be modified (This is Boogie 2; Boogie Reference Manual).

SPARK (Barnes and Limited 1997) uses a variation of the typical exclusive approach. SPARK procedures may reference or update the state associated with their parameters, in addition to that of global variables. SPARK contracts must explicitly account for the global variables accessed (read or written) during procedure execution in a globals construct. Additionally, for each parameter or global variable, it must be indicated whether it is read only, written only, or both read and written. As SPARK is based on the Ada language, this is done by means of mode annotations such as in and out, indicating that a parameter or global variable is read only or written only, respectively. The in out annotation is used for signaling that the annotated parameter or global variable is both read and written. Together, mode annotations on parameters and globals provide a complete specification of the inputs and outputs of a procedure (Hatcliff et al. 2012). VDM (Jones 1990) provides similar annotations.

The exclusive convention facilitates the specification of pure operations, i.e., operations having no side effects, on which assertions in various languages, including Eiffel, JML, and Spec#, rely for supporting data abstraction. Specifying that an operation is pure simply amounts to specifying an empty modifies clause. However, specifying and verifying the effects of heap modifications on the results of pure methods has been described as one of the difficult remaining challenges related to framing (Hatcliff et al. 2012).

2.3.3 The Implicit Approach

The implicit approach eliminates the need to specify frame properties per se. One of the implicit approaches relies on limiting what a procedure can modify based on the procedure's precondition. This approach is adopted in separation logic (discussed in Section 2.4) and in the implicit dynamic frames technique (Smans, Jacobs, and Piessens 2012), where reading from or writing to a memory location requires knowing that the memory contains that location. To this end, accessibility information is specified in the preconditions of methods. By analysing preconditions, an upper bound on the set of locations that are modifiable by a procedure can be determined. As will be discussed in Chapter 7, our approach to inferring fine-grained modifications can be seen as an implicit one as well. It relies on data-flow analysis and is entirely automatic, without requiring any dedicated annotations.

Another approach to implicit framing was presented by Meyer, who proposes inferring the frame properties of a method from the method's postcondition (Meyer 2015). This approach relies on the empirical observation that, in practice, when programmers realize that an element is modified by a method's execution, they will generally include and express information about how the element is modified. It was inspired by an informal review of publicly available JML code, which showed that, in practice, the elements included in an assignable clause overlap those appearing in the method's postcondition. Meyer argues that any exception to this observation can be easily addressed by inserting into the postcondition a Boolean function that always returns true and that introduces its elements into the implicit frame (Meyer 2015).

2.4 Topologies and Effects

Specification techniques for complex data structures and the operations manipulating them must be able to describe and address issues related to two different aspects, namely the topology or structure of the former and the effects of the latter on the data structures' state (Hatcliff et al. 2012). In the object-oriented realm, objects encapsulate state and functionality, yet their implementations are rarely limited to the fields and methods of a single object. After all, one of the principles of object-oriented programming is to favour composition over inheritance. Thus, object fields reference other objects, often of different classes, and those objects in turn reference yet other objects, and so on. In order to reason about and to prove functional correctness, specifications have to capture this "composite" shape of the implemented data structures (Leino and Müller 2008a). They also have to describe the effects of operations on the state of the data structures, including write effects, i.e., which parts are potentially modified by an operation, and read effects, i.e., which parts are potentially accessed by an operation (Hatcliff et al. 2012).

For objects and heap data structures, the write and read effects (Greenhouse and Boyland 1999) refer to parts of the heap, i.e., locations. Specifications for heap data structures might also require including allocation and deallocation effects, as well as locking information (Hatcliff et al. 2012). Detecting and reasoning about read and write effects is necessary and relevant in different situations. For instance, Greenhouse and Boyland (1999) present an effects system for performing semantics-preserving program manipulation on Java source code.

Our work is done in the context of a purely functional language with immutable data structures and no destructive updates. Reasoning about the heap is beyond our scope. However, our concerns are similar: we handle "composite" data structures modeled by immutable associative arrays and algebraic data types, i.e., structures and variants, and we want to capture the behaviour of operations that receive such a composite input, manipulate it, reconstruct it, and return its new state as a composite output. Thus, in contrast to specification and reasoning techniques for objects, which are concerned with deep-heap effects, we are concerned with deep-state effects.

Specification techniques for topologies and effects must address three major challenges, namely abstraction, reasoning, and framing (Hatcliff et al. 2012).

Abstraction. In the object-oriented context, heap properties must be expressed in an implementation-independent manner. Abstraction is important for information hiding and for supporting subtyping (Leino 1998; Leavens and Müller 2007). Aspects related to visibility and information hiding are orthogonal to our work. The language we are working with does not have subtyping. Therefore, disclosing the topology of our data structures is not problematic from this point of view.

Reasoning. The formal framework in which (heap) properties are expressed should allow efficient, ideally automatic, reasoning.

Framing. Specifications of heap operations should ease reasoning about framing and aid in proving that certain heap properties are not affected by a heap operation. Framing can be illustrated by the following rule, expressing that a state that is unmodified by C can be preserved:

$$\frac{\{P\}\; C\; \{Q\}}{\{P \land R\}\; C\; \{Q \land R\}}$$

if the write effect of C is disjoint from the free variables of R. In the presence of complex heap data structures, the disjointness of the effects of C and the assertion R is more difficult to express, as it needs to specify that the locations modified by C are disjoint from the locations read by R. Similarly, though not referring to locations, we have to be able to express that the substructures (or subelements) modified by C and those read by R are disjoint.
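The disjointness side condition on substructures can be sketched as follows. The representation is an assumption made for illustration only: a substructure is named by an access path (a tuple of field names), and two paths overlap when one is a prefix of the other. The function names `overlaps` and `frame_holds` are hypothetical and do not come from the thesis.

```python
def overlaps(path_a, path_b):
    """Two access paths overlap when one is a prefix of the other."""
    shortest = min(len(path_a), len(path_b))
    return path_a[:shortest] == path_b[:shortest]


def frame_holds(write_effect, read_footprint):
    """R is preserved by C if no path written by C overlaps a path read by R."""
    return not any(overlaps(w, r) for w in write_effect for r in read_footprint)


# C rebuilds state.process.registers; R only reads state.memory.
writes_of_c = {("process", "registers")}
reads_of_r = {("memory",)}
print(frame_holds(writes_of_c, reads_of_r))    # True

# R now reads a subelement of what C rebuilds, so framing cannot be concluded.
reads_of_r2 = {("process", "registers", "pc")}
print(frame_holds(writes_of_c, reads_of_r2))   # False
```

The check is deliberately conservative: a write to a structure is treated as a potential write to each of its subelements.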

The sets of written or read locations are called footprints. Hatcliff et al. classify approaches to the specification of heap properties into three categories. The first category relies on explicit footprints and uses sets of objects or locations that are included in predicates and effects specifications. Dynamic frames (Kassios 2006; Kassios 2011) and region logic (Banerjee, Barnett, and Naumann 2008; Banerjee, Naumann, and Rosenberg 2013) are the main exponents of this category. The second category relies on implicit footprints, which are derived from predicates in specialized logics such as separation logic. The third category relies on predefined footprints, which are derived from predefined heap topologies (Hatcliff et al. 2012). Ownership types (Clarke, Potter, and Noble 1998) are the main exponent of this category. All of these techniques allow specifying the topologies of common heap data structures and reasoning about the effects of operations. However, each strikes a different balance between expressiveness and automation (Hatcliff et al. 2012).

2.4.1 Explicit Footprints

The explicit footprint approach to framing was pioneered by Kassios with the theory of dynamic frames (Kassios 2006; Kassios 2011). It proposed adding sets of locations to the specification language and expressing footprints in terms of such sets. For preserving information hiding, these sets of locations can involve dynamic frames, specification variables that abstract over a set of locations. The initial solution based on dynamic frames was formalized in the context of an idealized logical framework using higher-order logic and inductive proofs, which are difficult to automate. Subsequent work on region logic (Banerjee, Naumann, and Rosenberg 2008; Banerjee, Barnett, and Naumann 2008; Banerjee, Naumann, and Rosenberg 2013) and the Dafny verifier on the one hand, and VeriCool (Smans, Jacobs, and Piessens 2008) on the other hand, developed dynamic frames in a first-order setting.

VeriCool uses pure methods for describing sets of locations. Recursively defined pure methods or logic functions can be a challenge for automatic theorem provers (Hatcliff et al. 2012; Banerjee, Barnett, and Naumann 2008).

In region logic, for minimizing the need for inductively defined predicates in specifications, the specification attributes used in the dynamic frames approach (Kassios 2006) are replaced with ghost state (Banerjee, Naumann, and Rosenberg 2013), i.e., mutable auxiliary fields and variables. Programs have to be explicitly annotated with these, which might imply a cumbersome manual effort, but, unlike the dynamic frame theory in its original form, this permits automated theorem proving.

Zee et al. have used explicit footprints for verifying the functional correctness of linked data structures in Jahob (Zee, Kuncak, and Rinard 2008). Banerjee et al. (Banerjee, Naumann, and Rosenberg 2008; Banerjee, Barnett, and Naumann 2008) encoded region logic in the intermediate verification language Boogie (Leino and Rümmer 2010).

The dynamic frames approach using ghost variables is supported by the Dafny language (Leino 2010; Koenig and Leino 2012). As described in Section 2.3.2, Dafny supports the exclusive approach to specifying frames. Ghost variables are used in modifies clauses. The standard idiom consists in declaring a set-valued ghost field, Repr for instance, dynamically maintaining Repr (i.e., explicitly updating it in the code) as the set of objects that are part of the receiver's representation, and using Repr in modifies clauses. This idiom takes the following form (Leino 2010):

    class MyClass {
      ghost var Repr: set<object>;
      method SomeMethod()
        modifies Repr;
    }

This modifies clause is to be interpreted as follows: the method may modify any field of any object in Repr. If this is a member of the Repr set, then the modifies clause also allows the method to modify the field Repr itself (Leino 2010).

With explicit footprints, proving frame properties consists in proving that the read effects of a predicate and the write effects of a method are disjoint.
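The Repr idiom and the disjointness check can be mimicked in Python, purely as an illustration. This is not Dafny and performs no verification: the `Node` class, its `repr` attribute, and the `total` query are hypothetical names, and the footprint is an ordinary set maintained by hand, in the spirit of a ghost field.

```python
class Node:
    """Singly linked list node carrying a ghost-like representation set."""

    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt
        # Explicitly maintained footprint: this node plus everything it reaches.
        self.repr = {self} | (nxt.repr if nxt is not None else set())

    def set_all(self, value):
        """Writes only to objects in self.repr (the 'modifies Repr' discipline)."""
        node = self
        while node is not None:
            node.value = value
            node = node.next


def total(node):
    """Read-only query whose read effect is contained in node.repr."""
    result = 0
    while node is not None:
        result += node.value
        node = node.next
    return result


a = Node(1, Node(2))
b = Node(10, Node(20))
before = total(b)
# Write effect of a.set_all versus read effect of total(b).
if a.repr.isdisjoint(b.repr):
    a.set_all(0)
    assert total(b) == before   # the frame property holds
```

Because the footprint is stored explicitly, any operation that changes the shape of the list would also have to update `repr`, which is the maintenance burden noted for ghost variables.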

Before the dynamic frames approach, data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002) and solutions based on the Universe type system (Müller 2002) had been proposed for specifying footprints within single objects.

The level of expressiveness offered by techniques based on explicit footprints is very high, allowing specifications to relate different regions in arbitrary ways, ranging from disjointness or inclusion of regions to characterizing their intersection. However, this flexibility complicates reasoning. When regions are stored explicitly in ghost variables, as is done in Dafny, programs need to explicitly update these ghost variables to maintain invariants. This can prove to be a cumbersome task. When pure methods are used, as in VeriCool, it is mandatory to reason explicitly about the effects of heap modifications on their results (Hatcliff et al. 2012).

2.4.2 Implicit Footprints

The implicit footprint approaches rely on specialized logics for implicitly representing footprints. Separation logic (O'Hearn, Reynolds, and Yang 2001; O'Hearn, Yang, and Reynolds 2004; Reynolds 2002; Reynolds 2005; Reynolds 2000) is the most prominent representative of this category.

Separation logic extends Hoare logic (Hoare 1971) with the separating conjunction operator ∗. Each assertion in separation logic defines a portion of the heap. The assertion P ∗ Q is true if and only if P and Q hold for disjoint parts of the heap. Local reasoning is fundamental to separation logic (O'Hearn, Reynolds, and Yang 2001): specifications need to describe all the state that the code C reads or writes. Thus, in the triple {P} C {Q}, P must be interpreted as being all the state that is needed for executing C, i.e., the footprint of C. This interpretation of Hoare triples leads to the following frame rule in separation logic:

$$\frac{\{P\}\; C\; \{Q\}}{\{P \ast R\}\; C\; \{Q \ast R\}}$$

which allows inferring that a local property is preserved for a wider state obtained by extending P with another, disjoint state R. Some versions of separation logic impose additional conditions about local variable modifications, as the ∗ operator only separates heaps. Separation logic can be extended such that ∗ also separates variables, thus eliminating the need for additional conditions (Parkinson, Bornat, and Calcagno 2006).
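The meaning of the separating conjunction can be made concrete with a small executable model. The sketch below is an assumption-laden toy, not an implementation of any verifier: heaps are Python dictionaries from addresses to values, assertions are Boolean functions on heaps, and `points_to`, `sep`, and `store` are hypothetical helper names. The split is found by brute force, so it is only suitable for tiny heaps.

```python
from itertools import combinations


def points_to(addr, value):
    """Assertion 'addr |-> value': holds exactly on the one-cell heap {addr: value}."""
    return lambda heap: heap == {addr: value}


def sep(p, q):
    """Separating conjunction p * q: some split of the heap into two
    disjoint parts satisfies p on one part and q on the other."""
    def holds(heap):
        addrs = list(heap)
        for k in range(len(addrs) + 1):
            for left in combinations(addrs, k):
                h1 = {a: heap[a] for a in left}
                h2 = {a: heap[a] for a in heap if a not in left}
                if p(h1) and q(h2):
                    return True
        return False
    return holds


def store(heap, addr, value):
    """Command C: returns a copy of the heap with a single cell updated."""
    updated = dict(heap)
    updated[addr] = value
    return updated


heap = {1: 10, 2: 20}
print(sep(points_to(1, 10), points_to(2, 20))(heap))       # True
print(sep(points_to(1, 10), points_to(1, 10))({1: 10}))    # False: no disjoint split

# Frame rule instance: C only touches cell 1, so R = (2 |-> 20) is preserved.
after = store(heap, 1, 11)
print(sep(points_to(1, 11), points_to(2, 20))(after))      # True
```

The second query fails because a single cell cannot be split into two disjoint parts that each contain it, which is exactly what distinguishes ∗ from ordinary conjunction.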

A separation logic for Java was introduced by Parkinson (Parkinson and Bierman 2005). It has primitive assertions for describing the values of fields in the heap and allows describing portions of the heap containing several disjoint objects using the ∗ operator.

Separation logic does not require explicitly specifying read or write effects. They are implicit in a method's precondition. Data structures are specified using logic functions. By including such a logic function in a method's precondition, the method is allowed to read and write anything belonging to the footprint of the logic function, but cannot access anything outside this footprint.

Approaches based on separation logic are hard to implement and to integrate into verification tools. Verifiers based on separation logic have mostly relied on symbolic execution and have not yet achieved the same level of automation as verifiers based on verification condition generation (Hatcliff et al. 2012). However, a series of tools that can reason using separation logic currently exist. These include Smallfoot (Berdine, Calcagno, and O'Hearn 2005; Berdine, Calcagno, and O'Hearn 2012), SpaceInvader (Distefano, O'Hearn, and Yang 2006; Calcagno et al. 2008), jStar (Distefano and Parkinson 2008; Naudziuniene et al. 2011), VeriFast (Jacobs, Smans, and Piessens 2010; Jacobs et al. 2011), and SLAyer (Berdine, Cook, and Ishtiaq 2011).

The implicit dynamic frames approach (Smans, Jacobs, and Piessens 2012) unifies the dynamic frames concept with separation logic. Framing specifications of a method are inferred using an implicit approach, as described in Section 2.3.3. They are encoded in first-order logic and can be used for automatic verification with SMT solvers. This is done in VeriCool (Smans, Jacobs, and Piessens 2008) and Chalice (Leino, Müller, and Smans 2009).

2.4.3 Predefined Footprints

In contrast to the implicit and explicit footprint approaches, which describe properties found in a program, the third approach focuses on reasoning efficiently about programs with restricted topologies. Ownership types (Clarke, Potter, and Noble 1998) are representative of this approach.

Ownership types typically enforce a tree topology, whereby every object in the heap has at most one owner object and the owner relation is acyclic. Topological properties beyond this tree structure have to be expressed using object invariants and predicate logic. Read and write effects typically use ownership as an abstraction mechanism: the right to read or write an object includes the right to read or write all the objects it (transitively) owns (Hatcliff et al. 2012).

Spec# addresses framing through ownership types: without explicit specifications stating otherwise (modifies clauses of the form presented in Section 2.3.2), methods may modify only the fields of the receiver and of those objects within the subtree of which the receiver is the root. Ownership is expressed by means of attributes on field declarations (Barnett et al. 2004; Barnett et al. 2011).

Ownership has been used to verify write effects (Müller, Poetzsch-Heffter, and Leavens 2003) and invariants (Drossopoulou et al. 2008; Leino and Müller 2004; Müller, Poetzsch-Heffter, and Leavens 2006). All the existing ownership-based verification techniques enforce that all modifications of an object must be initiated by the object's owner. This gives owners total control over modifications of their internal representations and allows them to maintain invariants (Hatcliff et al. 2012). Ownership-based approaches have been used for reasoning about model fields (Leino and Müller 2006) and for enforcing object immutability (Leino, Müller, and Wallenburg 2008).

The ownership topology can be enforced by type systems (Lu, Potter, and Xue 2007; Müller 2002). In JML, it is enforced through universe types (Dietl and Müller 2005). In Spec#, it is encoded as object invariants (Barnett et al. 2004).

Reasoning about framing relies on the tree structure of the heap enforced by ownership. The ownership trees rooted in two different objects o1 and o2 are disjoint if neither o1 owns o2 nor o2 owns o1. The disjointness of ownership trees can then be used to prove that the read and write effects of methods do not overlap (Hatcliff et al. 2012).
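The disjointness criterion for ownership trees is simple enough to sketch directly. The encoding is an assumption made for illustration: ownership is a dictionary mapping each object name to its unique owner, the owner relation is assumed acyclic (as ownership types enforce), and the function names `owns` and `trees_disjoint` are hypothetical.

```python
def owns(owner_of, a, b):
    """True if a transitively owns b, following the owner map upward from b.
    Terminates because the owner relation is assumed to be acyclic."""
    current = owner_of.get(b)
    while current is not None:
        if current == a:
            return True
        current = owner_of.get(current)
    return False


def trees_disjoint(owner_of, o1, o2):
    """Ownership trees rooted in two different objects are disjoint
    if neither object (transitively) owns the other."""
    return o1 != o2 and not owns(owner_of, o1, o2) and not owns(owner_of, o2, o1)


# Each object has at most one owner; roots have no entry.
owner_of = {"node1": "list1", "node2": "node1", "nodeA": "list2"}

print(trees_disjoint(owner_of, "list1", "list2"))   # True
print(trees_disjoint(owner_of, "list1", "node2"))   # False: list1 owns node2 transitively
```

With this check, a method whose write effect is confined to the tree of list1 cannot affect a query that reads only the tree of list2.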

2.5 Other Approaches to Reason about Frames

Rakamarić and Hu report in (Rakamaric and Hu 2008) a method for inferring frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve 2007). DSA provides a summary of points-to relations as a graph that is used to compute the set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node that is reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e., in a purely automatic setting.
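The idea of collecting the locations modified by a procedure or its callees can be illustrated with a plain call-graph traversal. This sketch is not the DSA algorithm and does not build a points-to graph; it only shows, under simplifying assumptions, how direct write sets propagate to callers. The dictionaries `direct_writes` and `callees` and the function `modified_locations` are hypothetical.

```python
def modified_locations(proc, direct_writes, callees):
    """Union of the direct write sets of proc and of every procedure
    reachable from it in the call graph (recursion is handled by 'seen')."""
    seen, stack, result = set(), [proc], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result |= direct_writes.get(current, set())
        stack.extend(callees.get(current, ()))
    return result


direct_writes = {
    "main": {"g"},
    "push": {"stack.top"},
    "log": {"log.count"},
}
callees = {"main": ["push", "log"], "push": ["log"]}

print(sorted(modified_locations("push", direct_writes, callees)))
# ['log.count', 'stack.top']
```

Every location outside the computed set can be stated as unchanged in a frame axiom for the procedure.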

In (Taghdiri, Seater, and Jackson 2006), Taghdiri et al. present a technique for extracting summaries of object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically, and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. The extracted procedure summaries can be viewed as detailed frame conditions describing which memory locations might be changed and how.

In (Sozeau 2009), Sozeau presents a generalized rewriting technique implemented in the Coq proof assistant that allows substituting a term t of an expression by another term t′ when t and t′ are related by a relation R. This generalizes equational reasoning to reasoning modulo arbitrary relations. The technique relies on dependent types and is based on a constraint generation algorithm generating type class constraints. The Coq tactic supports polymorphic relations, morphisms, and subrelations.

Bertrand Meyer proposed the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer 1991), an object-oriented language with native support for Design by Contract features (Meyer 1992). The first component, the frame specification inference, relies on the analysis of method postconditions, as described in Section 2.3.3, and yields, for a method p, a set representing an overapproximation of the set of elements that p is allowed to modify according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder 2015), which is itself based on the alias calculus (Kogtenkov, Meyer, and Velder 2015; Meyer 2010; Meyer 2011). Methods are analysed in order to compute a second set, representing an overapproximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second.

2.6 Other Relevant Work

Pure methods, also known as queries or observers, are side-effect free methods that always evaluate to the same result value given the same input value. They are used intensively for providing specifications for methods without disclosing implementation details in languages such as JML, Spec#, and Eiffel. Leavens et al. identify the development of specification and verification techniques for determining the effects of heap modifications on the results of pure methods as one of the remaining challenging problems related to framing (Leavens, Leino, and Müller 2007). Though our work is not concerned with heap modifications, we are interested in the dependency of Boolean Smart predicates, i.e., logical properties, on the layered ("composite") data structures they receive as inputs. In Chapter 5, we present a static analysis meant to capture such dependencies.

Various encodings of pure methods (Cok 2005; Darvas and Müller 2006) in program logic have been proposed, but they do not cover aspects related to reasoning about frame properties when the specifications make use of pure methods. Some specification techniques for frame properties (Leavens, Baker, and Ruby 2006; Leino and Müller 2006; Leino and Nelson 2002; Müller, Poetzsch-Heffter, and Leavens 2003) allow describing the fields that are potentially modified by a method execution using modifies clauses. These, however, do not specify the effects of a method execution on the results of pure methods (Leavens, Leino, and Müller 2007).

One technique for determining the effects of heap modifications on the results of pure methods requires listing, in a method's modifies clause, all pure methods that are potentially affected by that method. This approach is adopted in COLD-K (Feijs and Jonkers 1992), where the frame of a procedure specification lists the variables and the equivalent of pure methods whose value may be changed by the procedure. For dealing with modularity issues, COLD-K also makes use of read effects.

Other approaches (Leino and Müller 2006; Müller, Poetzsch-Heffter, and Leavens 2003) for determining effects on the results of pure methods rely on model fields. These are specification-only constructs whose value is determined by applying a mapping to the concrete state of an object. They are similar to pure methods but, unlike the latter, they do not have parameters and they are required to be confined (Leino and Müller 2006; Müller, Poetzsch-Heffter, and Leavens 2003).

Approaches based on model fields require that pure methods read only the state of the receiver object and its sub-objects. This information about the read effect of a pure method can be used to determine which write effects potentially have an impact on the result of a pure method. In general, it can be proven that a method m does not affect the result of a pure method p if the write effect of m and the read effect of p are disjoint (Leavens, Leino, and Müller 2007).

There are various approaches to using read effects for reasoning about pure methods. One approach relies on complete specifications of result values, included in the postconditions of pure methods. Used in conjunction with modifies clauses, these allow determining whether a method affects the result of a pure method (Leavens, Leino, and Müller 2007). Various solutions based on explicitly specified read effects exist (Feijs and Jonkers 1992; Greenhouse and Boyland 1999; Jacobs and Piessens 2006). Specification of these using data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002) and an effects system built on top of an ownership type system (Clarke and Drossopoulou 2002) have been proposed. Multi-threaded programs also require such specifications (Praun and Gross 2003).

Chapter 3

The Smart Language and ProvenTools

Languages are not strangers to one another.

Walter Benjamin

In this chapter, we introduce Smart, a programming and specification language developed at Prove & Run, as well as the toolchain associated with it. While not claiming to be exhaustive, we give an overview of the language's features and syntax in Section 3.1. In Section 3.2, we present the tools manipulating Smart models. Section 3.3 briefly presents Smil, the Smart Intermediate Language. A computational version of it, αSmil, is targeted by the static analyses presented throughout the remainder of this thesis. The following chapter will focus entirely on αSmil, illustrating its usage and introducing its syntax and formal semantics.

3.1 The Smart Modeling Language

Smart is a modeling language developed at Prove & Run. It constitutes a unified programming and specification language designed to facilitate proofs. One of the common, often cited reasons why programmers reject the use of formal methods is that they are not willing to learn a separate language just for specifying their programs, in particular if that language is fundamentally different from the programming language. Smart addresses this issue by allowing one both to develop the implementation of programs and to specify their logical properties in a single language.

The Smart language is a purely functional (side-effect free), strongly-typed, polymorphic, first-order language. The basic building blocks of programs written in Smart are predicates, the equivalent of functions in other common programming languages. Besides the common primitive types that are traditionally available as built-in types, algebraic data types (structures and variants) and associative arrays are provided as well. Exit labels constitute the language's main specificity; they facilitate separating data flow and control flow in programs.

In addition, being designed for writing code that will subsequently be proven, the language allows the definition of various types of logical specifications as well. These range from pre- and postcondition contracts, local assertions, and loop invariants to inductive predicates, lemmas, and hypotheses.

ProvenTools is a complex set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain has the structure of a set of Eclipse plug-ins of JDT type, i.e., Java Development Tools (Eclipse Java Development Tools (JDT)). Together, these constitute a complete Integrated Development Environment (IDE), allowing one not only to write, edit, and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover, and finally to generate executable code in C.

ProvenCore¹ (Lescuyer 2015) and ProvenCore-M² are two microkernels that have been completely modeled in Smart and developed using ProvenTools. The former is a general-purpose microkernel that ensures isolation, i.e., integrity and confidentiality. The latter targets embedded devices based on microcontrollers.

Throughout the rest of this section, we will present some of the main concepts and mechanisms of Smart, discussing predicates, control flow, algebraic data types, and specification-only constructs.

3.1.1 Smart Predicates and Types

Smart supports modular program development with a straightforward module concept. Modules constitute the compilation units of Smart programs, and any valid Smart program consists of a non-empty set of modules, which are themselves organized in packages. Modules have an identifier that is unique in each program and, in practical terms, each module corresponds to a file. Modules can import other modules, and they contain a list of type and constant declarations as well as a list of predicates.

Predicates, the equivalent of functions in other common programming languages, are the basic building blocks of programs written in Smart. Though named in reference to predicate logic, predicates in Smart receive a number of inputs and produce a number of outputs in return, in contrast to predicates in mathematics, which are commonly understood to be Boolean-valued functions of the form

$$P : X \rightarrow \{\mathit{true}, \mathit{false}\}$$

Smart predicates can be classified into two categories, namely implicit and explicit predicates, based on their implementation or lack thereof.

Implicit predicates can be seen as a form of assumption: as their name suggests, they are not implemented per se, but simply declared using the implicit program keywords. Such predicates are similar to the declarations of native methods in Java or external functions in C. Traditionally, in Java, programmers use the Java Native Interface (JNI) (Liang 1999; Java Native Interface Documentation (JNI) 1999) when they need to implement small, time-critical code portions in a lower-level language such as assembly, or when they need to access a library already written in another programming language such as C. In Smart, implicit predicates play an important role with respect to code documentation. Their implementation is not provided in the model but, as we will further explain in Section 3.1.4, they can be used to specify logical properties of the explicit implementations provided externally in a lower-level language, typically in C or assembly.

¹ http://www.provenrun.com/products/provencore
² http://www.provenrun.com/products/provencore-m

For example, an implicit predicate converting an integer given as an input into a float can be declared as follows:

    public float_of(int n, float f+)
    implicit program

The predicate's result is given a name, f, and it is introduced as one of the predicate's parameters. It is marked as being the predicate's output by the + symbol following it and is thereby syntactically distinguished from the predicate's input parameter n, which is unadorned.

In the general case, Smart predicates can have any number of input or output parameters. However, a parameter cannot be both at the same time, and each parameter must be explicitly marked either as an input or as an output. An input parameter's value can be read and used in the predicate's implementation. An output parameter's value must be constructed by the predicate's implementation and returned as a result. Furthermore, values in Smart are immutable. As a consequence, Smart predicates are pure: it is impossible to pass a parameter "by reference" and modify a predicate's input as a side effect. Smart is thus a side-effect free language, which provides referential transparency (Strachey 1967). Furthermore, the language supports neither global variables nor global states, but can rather be characterized as a state-passing style language. Smart predicates are deterministic: they always return the same output any time they are called with a specific set of input values. In particular, this is a prerequisite for implicit predicates.
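The combination of immutable values, explicit outputs, and state-passing style can be approximated in Python as a rough analogy. The sketch is not Smart code and makes no claim about Smart's syntax or semantics: the `Counter` type and the `tick` function are hypothetical, a frozen dataclass stands in for an immutable structure, and a returned tuple stands in for multiple output parameters.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Counter:
    """Immutable state: fields cannot be reassigned after construction."""
    value: int


def tick(counter):
    """Takes the old state as input and returns two outputs:
    the new state and the previous value. The input is left untouched."""
    return replace(counter, value=counter.value + 1), counter.value


c0 = Counter(value=0)
c1, previous = tick(c0)
print(c0, c1, previous)   # Counter(value=0) Counter(value=1) 0

try:
    c0.value = 42          # in-place modification is rejected
except FrozenInstanceError:
    print("immutable")
```

Since `tick` has no access to anything but its argument, calling it twice on the same input yields equal results, which mirrors the determinism required of Smart predicates.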

As mentioned in the introduction, Smart is also a strongly-typed language. Each input and output parameter of a predicate must have an associated type, and the usage of an object of some type where a parameter of another data type is expected is forbidden by the language. Unsafe conversions between different types are forbidden as well. Smart provides various built-in types, such as int, short, long, char, boolean, float and double, that are traditionally available in other programming languages as well. Additionally, users can declare new types with the type keyword and then define predicates manipulating these types. As in the case of predicates, implicit data types can be simply declared without being explicitly defined. For example, supposing that an implicit data type called cartesian_point and the predicates manipulating it are defined in a lower-level language, we would make them available to other Smart predicates using the following declarations:

    // Implicit data type declaration
    type cartesian_point

32 Chapter 3 The Smart Language and ProvenTools

    // Retrieve coordinate on X-axis
    public get_X(cartesian_point p, float x+)
    implicit program

    // Retrieve coordinate on Y-axis
    public get_Y(cartesian_point p, float y+)
    implicit program

    // Construct a new point p with coordinates (x, y)
    public new_point(float x, float y, cartesian_point p+)
    implicit program

    // Pretty-print
    public print_point(cartesian_point p)
    implicit program

Some implicit predicates manipulating inputs of type cartesian_point are declared as well: the first two of them, get_X and get_Y, simply return the input point's numerical coordinates on each of the Cartesian system's axes. The next predicate, new_point, creates and returns a new point from the two given input coordinates. Alternatively, it is possible to directly declare and implement these types and predicates in Smart, as we will show in the following paragraphs. The last one, print_point, simply displays the input point without effectively producing an output. As shown in the example, similarly to Java, comments in Smart can be introduced by using // for single-line comments or /* */ for multi-line comments. Similarly to Javadoc, code documentation can be given using the /** begin-comment delimiter.

In general, implicit data types and the implicit predicates manipulating them can act as a public interface for a concrete class, showing the type and the operations allowed to manipulate values of that type, but hiding the implementation.

Explicit data types can be declared and defined using structures and variants. For example, we could explicitly define the type cart_point by means of a structure having two different fields of type float, called x and y. Each of them corresponds to the point's numerical coordinates on the X- or Y-axis, respectively.

    type cart_point = {
      float x;
      float y;
    }

For representing a point in a polar coordinate system, we can define a different type, polar_point, as follows:

    type polar_point = {
      // Radial coordinate (distance from the pole)
      float radius;
      // Polar angle
      float azimuth;
    }

Explicit predicates have explicitly defined implementations following immediately after their declaration, which strongly resembles that of an implicit predicate but from which the keyword implicit is omitted. Their bodies are sequences of several statements, which are essentially calls to other predicates. For example, to translate a point (x, y), i.e. to add a given pair of numbers (a, b) to its Cartesian coordinates and obtain the new point (x', y') = (x + a, y + b), a predicate translate_point could be defined in the following manner:

    // Convert x to float, add it to y and retrieve the sum
    public sum_of(int x, float y, float s+)
    implicit program

    public translate_point(cartesian_point p, int a, int b, cartesian_point q+)
    program
    {
      float xa; float yb;        // Local variables
      print_point(p);            // 1
      get_X(p, xa+);             // 2
      get_Y(p, yb+);             // 3
      sum_of(a, xa, xa+);        // 4
      sum_of(b, yb, yb+);        // 5
      new_point(xa, yb, q+);     // 6
      print_point(p);            // 7
    }

The body of the translate_point predicate consists of a sequence of several statements: the first of these simply pretty-prints the input point p. The next two statements are calls to accessors of p's coordinates on the X- and Y-axis, which are stored in the local variables xa and yb, respectively. Next, the coordinates (xa', yb') = (a + xa, b + yb) for the translated point are computed by calling the sum_of predicate, which returns the float sum of an integer and a float. The output point q is constructed by calling the constructor new_point with xa and yb as inputs. The last statement pretty-prints the input point p again.

As illustrated by our example, each call to a predicate is made by passing the parameters in the same order as in the predicate's declaration and by explicitly marking any output with + (footnote 3). Replacing line 4 with sum_of(xa, a, xa+) would result in an

3 This is mandatory because of overloading.


error, because the first input parameter of a call to sum_of is expected to be an integer and the second a float. Similarly, omitting the + symbol at line 6 and writing new_point(xa, yb, q) would result in an error. By explicitly marking the outputs of each statement, it is straightforward to distinguish between the variables that are actually written by the statement and those that are used only as inputs. Furthermore, since predicates are not allowed to modify their inputs, the language strictly forbids using a predicate's input parameter as an output for any statement in the predicate's body. Thus, in our example predicate, we are prevented from using the input point p as the output of the new_point predicate call. However, outputs and local variables, such as xa and yb, can be written to, but reading them (i.e. using them as inputs for a predicate call) before they have been written at least once amounts to using uninitialized variables and behaves in an unspecified manner. In our example, xa and yb are used as both inputs and outputs at lines 4 and 5, respectively. This is correct, since xa and yb are local variables that have already been written to by the statements at lines 2 and 3, preceding the calls to sum_of.

We stress again the fact that destructive updates are not possible in Smart: even if, at first glance, a statement such as the call to sum_of at line 4 might give the impression that xa is modified in place, all that the statement actually does is to create a new float whose value is obtained by adding the old value of xa to the value of a, and then to set xa to reference this new float instead of the old one. A simple conversion to static single assignment form (Cytron et al., 1989) would eliminate these assignments and show the absence of any mutation whatsoever. Thus, were we to inspect the state of the input point p before and after the calls to sum_of, we would observe that it remains unchanged; this is what we do when printing p again at the end of translate_point.
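The absence of mutation can be illustrated outside Smart. The following Python sketch mirrors translate_point twice: once with rebinding of the local names, as in the listing, and once in static single assignment form, where every name is bound exactly once. The tuple representation of points and the name translate_point_ssa are assumptions made for the illustration; they are not part of Smart.

```python
from typing import Tuple

Point = Tuple[float, float]  # assumed stand-in for cartesian_point


def sum_of(x: int, y: float) -> float:
    """Convert x to float, add it to y and return the sum."""
    return float(x) + y


def translate_point(p: Point, a: int, b: int) -> Point:
    xa, yb = p            # lines 2-3: read the coordinates of p
    xa = sum_of(a, xa)    # line 4: rebinds the name xa to a new float
    yb = sum_of(b, yb)    # line 5: rebinds the name yb to a new float
    return (xa, yb)       # line 6: build the output point q


def translate_point_ssa(p: Point, a: int, b: int) -> Point:
    xa_1, yb_1 = p        # every name is bound exactly once
    xa_2 = sum_of(a, xa_1)
    yb_2 = sum_of(b, yb_1)
    return (xa_2, yb_2)


p = (1.5, 2.0)
q = translate_point(p, 2, 3)
```

Both versions compute the same point, and p is the same value before and after the call, which is the observation made by printing p twice in the Smart listing.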

As a last remark about our example, it is worth mentioning that the statement new_point(xa, yb, q+), which produces the predicate's output, is not the predicate's last statement. Smart does not support any dedicated return statement. Instead, when exiting from a predicate, the outputs hold the values that they have been assigned when executing the body. This mechanism allows one to define predicates having multiple outputs. Their names are chosen by the programmer, and their values can be modified multiple times during the predicate's execution; however, the values retrieved are the ones that are available at the moment the program exits the predicate.

3.1.2 Exit Labels and Control Flow

Besides input and output parameters, the declaration of a predicate can also include a set of exit labels. When called, a predicate exits with one of the specified exit labels, thus summarising and returning to its callers further information regarding its execution.

Exit labels constitute the main specificity of the Smart language. They can denote different exceptional execution scenarios and act as exit codes, similarly to exceptions and exit status return values in other programming languages.

Every predicate has a non-empty set of labels: by default, any predicate has the built-in exit label true, which denotes the successful exit status of a predicate. The predicates illustrated previously in Section 3.1.1 did not have explicitly declared exit labels; in such a case, it is assumed that the only possible exit label for the predicate is true and hence that the predicate will succeed in all circumstances.

Returning to our previous example, the predicate translate_point, we could have written its complete declaration by explicitly stating that true is the only possible exit label:

    public translate_point(cartesian_point p, int a, int b, cartesian_point q+)
    -> [true]
    program

This declaration is strictly equivalent to the one given in Section 3.1.1.

In the general case, any number of labels can be specified after the parameters. For example, we could declare a predicate that converts the coordinates of an input point (x, y) of type cartesian_point to polar coordinates

    $r = \sqrt{x^2 + y^2}$

    $\varphi = \operatorname{atan2}(y, x)$

and returns a point (r, φ) of type polar_point with these coordinates. For computing the second polar coordinate, the polar angle or azimuth, the predicate would call another predicate, atan2, which is the arctangent function with two arguments, a common variation on the arctangent function. The atan2 function avoids the problem of division by zero; however, it is undefined when both x and y, i.e. the Cartesian coordinates, are zero. For declaring it in Smart, we can add a special exit label for the case when the given input coordinates represent the origin and the result cannot be returned:

    // Computes atan(y/x)
    public atan2(float x, float y, float at+) -> [true, undef]
    implicit program

The declared label's name, undef, is a custom name, and any valid identifier can be chosen and used as a label in Smart. As previously mentioned, the exit label true is predefined and has a special meaning. Another predefined label that is interpreted in a special manner by conditional statements and logical operators is the false label. Together, these two exit labels offer a convenient manner to model a Boolean result. Frequently, a Boolean output value can be replaced by declaring these two possible exit labels: true, to denote a successful execution of the predicate, and false otherwise.

Besides indicating the followed execution scenario, exit labels play an important role with respect to control flow management. Primarily, the exit label of a call to a predicate determines whether the next predicate call in sequential order should be executed or not: when the predicate exits with true, the program can proceed to the next statement in the program. Any other exit label lbl disrupts the normal control flow and forces the current predicate to exit with label lbl.

For example, a predicate cart_to_polar can be defined with two exit labels, true and undef, as well. It takes two float numbers x and y, computes the corresponding polar coordinates r and phi by calling the predicates compute_radius and atan2, and constructs a new point p of type polar_point using the computed values.

    public compute_radius(float x, float y, float r+)
    -> [true]
    implicit program

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true, undef]
    program
    {
      float phi; float r;
      compute_radius(x, y, r+);
      atan2(y, x, phi+);
      new_polar_point(r, phi, p+);
    }

There is no guarantee that the call to atan2 will return successfully with exit label true: it might return with undef, in which case the execution of cart_to_polar will break at that point and exit with label undef. Furthermore, no output will be generated. In Smart, exit labels condition the existence of output parameters: every output is associated to an exit label lbl, and it is generated if and only if the predicate exits with that particular exit label lbl. All other outputs are discarded and can be considered as unchanged by the caller. The same output can be associated to multiple labels. By default, if no output parameters are specified for a label, it means that no outputs are generated when the predicate exits with this label. The only exception to this rule is made in the case of the built-in true label: since true normally represents a successful execution, every output of the predicate is associated to it by default. For example, the previous declaration of cart_to_polar is strictly equivalent to:

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>, undef<>]
    program
    {
      float phi; float r;
      compute_radius(x, y, r+);
      atan2(y, x, phi+);
      new_polar_point(r, phi, p+);
    }
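The forwarding of exit labels and the association of outputs to labels can be approximated in Python by returning a pair (label, output). This is only a sketch under assumptions: the name atan2_labeled, the string encoding of labels, the use of None for an absent output and the tuple encoding of polar_point are all chosen for the illustration.

```python
import math
from typing import Optional, Tuple


def atan2_labeled(y: float, x: float) -> Tuple[str, Optional[float]]:
    """Exit with 'undef' and no output at the origin, else 'true' and the angle."""
    if x == 0.0 and y == 0.0:
        return ("undef", None)
    return ("true", math.atan2(y, x))


def cart_to_polar(x: float, y: float) -> Tuple[str, Optional[Tuple[float, float]]]:
    r = math.hypot(x, y)              # compute_radius: its only label is 'true'
    label, phi = atan2_labeled(y, x)
    if label != "true":               # any other label is forwarded to the caller
        return (label, None)          # no output is associated to 'undef'
    return ("true", (r, phi))         # new_polar_point builds the output p
```

Calling cart_to_polar(0.0, 0.0) yields the label undef with no point, whereas any other input yields true together with the polar point.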

Exit labels can thus behave similarly to exceptions in other programming languages. In order to handle specific observed execution scenarios, Smart provides label transformers, which allow catching labels before they escape the current predicate and transforming them into another label. Complex control flow can be expressed by indicating a set of rules of the form lbl1 ↦ lbl2, whose role is to transform the label lbl1 into lbl2, and by associating them to statements.

For example, we could let the predicate cart_to_polar return the label origin_fail when the inner computation of the azimuth fails, instead of just forwarding the label returned by atan2:

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>, origin_fail]
    program
    {
      float phi; float r;
      compute_radius(x, y, r+);
      [undef ↦ origin_fail]
      atan2(y, x, phi+);
      new_polar_point(r, phi, p+);
    }

Alternatively, we could also handle the failure of the computation by using transformers and constructing the output point differently, for example by declaring a constant representing the azimuth of the origin (often called the pole in polar coordinates) and using this for the construction of p when the call to atan2 fails:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
    {
      float phi; float r;
      compute_radius(x, y, r+);
      [done ↦ true]
      {
        [true ↦ done, undef ↦ true]
        atan2(y, x, phi+);
        phi = POLEAZIMUTH;
      }
      new_polar_point(r, phi, p+);
    }

In the following, we show how the control flows when atan2 terminates with label true. The green arrows indicate how control is passed from one statement to the other, based on their exit labels, when starting from the call to the atan2 predicate.

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
    {
      float phi; float r;
      compute_radius(x, y, r+);
      [done ↦ true]
      {
        [true ↦ done, undef ↦ true]
        atan2(y, x, phi+);
        phi = POLEAZIMUTH;
      }
      new_polar_point(r, phi, p+);
    }

And here is how the control flows when atan2 terminates with label undef:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true<p>]
    program
    {
      float phi; float r;
      compute_radius(x, y, r+);             // 1
      [done ↦ true]                         // 2
      {
        [true ↦ done, undef ↦ true]         // 3
        atan2(y, x, phi+);                  // 4
        phi = POLEAZIMUTH;                  // 5
      }
      new_polar_point(r, phi, p+);
    }

After computing the radius r by calling compute_radius, this new version of the predicate starts by calling the predicate atan2. If this operation succeeds, then phi is the value of the azimuth and we can use this value as the second input parameter for the point's constructor new_polar_point. This is done by transforming true to a new label done, whose effect is to jump immediately to the outer block, in this case the top-level. The top-level block of the program catches done, transforms it back to true and continues with the statement following the block, namely new_polar_point, which will construct the output p by using r and phi, the value of the azimuth returned by atan2. When atan2 is undefined, the transformer undef ↦ true is used to jump to an additional statement, phi = POLEAZIMUTH, that assigns the value of POLEAZIMUTH to phi. The constructor is reached in this case as well. However, this time the value of phi written at line 5 is used as the second input parameter. We note that the statement at line 5 is a call to a built-in assignment predicate denoted by = and using an infix notation.
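The two control paths described above can be traced with a small Python sketch in which each transformer is modelled as a dictionary from labels to labels. The value 0.0 chosen for POLEAZIMUTH, the helper atan2_labeled and the tuple encoding of the output point are assumptions; the source only declares the constant.

```python
import math

POLEAZIMUTH = 0.0  # assumed value; the source declares the constant without a value


def atan2_labeled(y, x):
    """Exit with 'undef' at the origin, else with 'true' and the angle."""
    if x == 0.0 and y == 0.0:
        return ("undef", None)
    return ("true", math.atan2(y, x))


def cart_to_polar(x, y):
    r = math.hypot(x, y)                                # compute_radius
    # Inner block: atan2 under the transformer mapping true to done, undef to true.
    label, phi = atan2_labeled(y, x)
    label = {"true": "done", "undef": "true"}[label]
    if label == "true":                                 # reached only after undef
        phi = POLEAZIMUTH                               # fallback assignment
    # Outer transformer mapping done to true: both paths continue with true.
    label = {"done": "true"}.get(label, label)
    return (label, (r, phi))                            # new_polar_point
```

With this encoding the predicate always exits with true: at the origin the fallback azimuth is used, and elsewhere the azimuth returned by the arctangent is used.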

The constant POLEAZIMUTH is declared using the keyword const. In Smart, constants can be declared and used directly as inputs for predicate calls.


In the general case, arbitrarily complex control flows can be expressed by coupling label transformers, blocks and recursion.

In order to facilitate the user's task of simulating common control flow structures with labels and transformers, Smart provides various control flow statements, which are themselves based on this mechanism. These include a construct that is equivalent to the try ... catch mechanism in Java, a conditional if ... then ... else control structure, as well as the common logical operators for negation (!), conjunction (&&), disjunction (||), implication (=>) and equivalence (<=>).

Given the Cartesian coordinates (x, y), the first polar coordinate, the radius, is obtained by computing

    $\sqrt{x^2 + y^2}$

For explicitly defining the predicate compute_radius, we would first need to implement a predicate sqrt computing the square root of a given positive number. Such a predicate can be recursively implemented as follows, by using the if ... then ... else construct and three implicit predicates:

    // Newton-Raphson Square Roots Finding Algorithm

    // Divides a by b and retrieves the result in div
    public div_double(double a, double b, double div+)
    -> [true, undef]
    implicit program

    // Check if a is close enough to b: |a - b| < b * 0.001
    public close_approximation(double a, double b)
    -> [true, false]
    implicit program

    // Compute ((b + a/b) / 2)
    public better_approximation(double a, double b, double g+)
    -> [true, undef]
    implicit program

    public sqrt(double x, double g, double sqr+)
    -> [true, undef]
    // Returns the square root of x by making recursive calls
    // with better and better guesses g until reaching a guess
    // that is close enough to the actual square root's value
    program
    {
      double aux;
      div_double(x, g, aux+);
      if close_approximation(aux, g)
      then
      {
        sqr = g;
      }
      else
      {
        better_approximation(x, g, aux+);
        sqrt(x, aux, sqr+);
      }
    }
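As a cross-check of the recursion, the same scheme can be sketched in Python, with exit labels returned as strings next to the output. The bodies given here to the three implicit predicates are assumptions derived from their comments (in particular the 0.001 relative tolerance and the division by zero exiting with undef).

```python
def div_double(a, b):
    """Divide a by b; exit with 'undef' when b is zero."""
    if b == 0.0:
        return ("undef", None)
    return ("true", a / b)


def close_approximation(a, b):
    """Exit with 'true' when |a - b| < b * 0.001, else with 'false'."""
    return "true" if abs(a - b) < b * 0.001 else "false"


def better_approximation(a, b):
    """Compute (b + a/b) / 2, forwarding 'undef' from the division."""
    label, quotient = div_double(a, b)
    if label != "true":
        return (label, None)
    return ("true", (b + quotient) / 2)


def sqrt(x, g):
    label, aux = div_double(x, g)
    if label != "true":
        return (label, None)          # undef escapes the predicate
    if close_approximation(aux, g) == "true":
        return ("true", g)            # the guess is close enough
    label, aux = better_approximation(x, g)
    if label != "true":
        return (label, None)
    return sqrt(x, aux)               # recurse with the improved guess
```

For example, sqrt(2.0, 1.0) exits with true after a few recursive calls, while an initial guess of 0.0 makes the first division exit with undef.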

Besides recursion, Smart also supports loops by providing a specific construct that is similar to a traditional "while" loop in other programming languages: while.

The body of this while block is repeatedly executed until a dedicated exit label, called exit, tries to escape, in which case the loop is aborted and the execution continues after the block. A "break" can be achieved by raising the special exit label inside the loop.

For instance, the previously recursive predicate sqrt can be implemented iteratively with a while loop as follows:

    public sqrt_iter(double x, double g, double sqr+)
    -> [true, undef]
    // Computes the square root of x iteratively
    program
    {
      div_double(x, g, sqr+);
      while
      {
        double aux;
        [true ↦ exit, false ↦ true]
        close_approximation(sqr, g);
        better_approximation(x, g, aux+);
        div_double(x, aux, sqr+);
      }
    }
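The role of the exit label can be mimicked in Python with break. The sketch below is not a transliteration of the listing: it keeps the current guess in a local variable named guess, inlines the helper predicates, and returns the label together with the result. All of these choices are assumptions made for the illustration.

```python
def sqrt_iter(x, g):
    guess = g                        # local copy, since inputs cannot be written
    while True:
        if guess == 0.0:
            return ("undef", None)   # a label other than exit escapes the predicate
        quotient = x / guess
        # Transformer on the test: true becomes exit, false becomes true.
        if abs(quotient - guess) < guess * 0.001:
            break                    # 'exit' aborts the loop
        guess = (guess + quotient) / 2
    return ("true", guess)           # execution continues after the block
```

Only the exit label is absorbed by the loop; the undef label produced by a division by zero leaves both the loop and the predicate.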

3.1.3 Polymorphism & Algebraic Data Types

Smart supports polymorphic types and predicates. For declaring polymorphic types, a number of type parameters must be introduced in the type's declaration. For example, an implicit type of polymorphic pairs can be declared as follows:

    type pair<A, B>

This type is parameterized by two types A and B, which are the types of the first and second projection of the pair. Type variables must always start with an uppercase letter, while regular types must always start with a lowercase letter. The declaration of polymorphic predicates is straightforward. For instance, declaring an implicit constructor for the pair type declared above amounts to the following:


    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

This predicate is implicitly parameterized by two type variables A and B. The type parameters of a predicate are implicitly determined by the type variables in its arguments. Local variables in explicit predicates can also be declared with polymorphic types. However, they can only depend on type variables introduced in the predicate's parameters. Type variables in polymorphic types can be instantiated by any type.

As mentioned in Section 3.1.1, Smart allows users to define their own concrete data types by using algebraic data types, namely structures and variants.

Structures. Structures, also called records or tuples in other programming languages, represent the Cartesian products of the different types of their elements, called fields. In Smart, these can be declared in two manners: either by using the keyword struct followed by the name of the structured type and its list of field types and field names, or by using the keyword type, as shown below. The latter is preferred. Declaring polymorphic structures is possible by introducing type variables in the definition.

    struct pair<A, B> { A fst; B snd; }

    type pair<A, B> = { A fst; B snd; }

In order to build and manipulate structures, Smart supports built-in constructors and accessors. For instance, for the following type definition of a structure

    type t = {
      t1 f1;
      t2 f2;
      ...
      tn fn;
    }

a constructor, a destructor, as well as individual accessors and "updaters" for any of the structure's fields are generated by Smart. Constructing an object of type t amounts to using t.new, which requires a value for each of t's fields. For example, creating a structure value s of type t with values e1, ..., en for each field amounts to calling t.new(s+, e1, ..., en). The values of these fields can all be read with a single predicate call to t.all(s, e1+, ..., en+) (which "destructs" the structure value into its field components). Individual accessors of the form t.fi(s, ei+) are provided as well for any field fi. Finally, the value of a field fi can be set to some variable vi by using t.fi(s+, vi). Like all statements in Smart, this call has a functional nature and handles immutable data. Thus, setting the value of the fi field amounts to returning a new structure where all fields have the same value as s, except fi, which is set to vi.
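The functional nature of these generated operations resembles immutable records in other languages. As a rough analogy only, the following Python sketch uses a frozen dataclass; the class name CartPoint and its fields are assumptions, and dataclasses.replace plays the role of the "updater".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CartPoint:
    x: float
    y: float


s = CartPoint(1.0, 2.0)    # constructor: one value per field
x, y = s.x, s.y            # reading all fields, as a destructor would
s2 = replace(s, x=5.0)     # updater: a new structure, s itself is untouched
```

After the update, s still holds its original fields and s2 differs from it only in the field x; attempting to assign to s.x directly raises an error because the record is frozen.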

It is possible to define a structured type with no fields at all:


struct unit

The value s of this type can be constructed by using unit.new(s+), without any input. This type can be seen as representing the absence of information.

Variants. Many programs need to deal with heterogeneous collections of values. For example, a node in a binary tree can be either a leaf or an interior node with two children; similarly, a node of an abstract syntax tree in a compiler can represent a variable, an abstraction, an application, etc. Variant types provide the mechanism that supports this kind of heterogeneous value collections (Pierce, 2002).

Variants, also called tagged unions in other programming languages, can be seen as the dual of structures. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag, called the constructor.

Revisiting our previously declared types cartesian_point and polar_point, in Smart we can define a type point as being expressed either in Cartesian, in polar or in spherical coordinates, using the following variant declaration:

    type point =
    | Cartesian(cartesian_point p)
    | Polar(polar_point p)
    | Spherical(float r, float theta, float phi)

Each form that a variant can take is indicated by the symbol | followed by the uppercase tag and the list of parameters and their types. The cases are mutually exclusive, and a value of type point can have only one form at a time. An object of type point can be built by using one of the constructors, called with the appropriate number and types of inputs. For instance, a Cartesian point pc can be obtained by calling point.Cartesian(p, pc+). Given an object pt of type point, we can also distinguish between the different cases by using a construct that is similar to the match ... with construct in OCaml:

    switch (pt)
    case Cartesian(cartesian_point p) get_X(p, x+);
    case Polar(polar_point p) get_radius(p, r+);
    case Spherical(float r, float theta, float phi)

For verifying if a given point pt is a Cartesian point, we can use:

    point.case[Cartesian](pt)


This could be obtained using the switch construct but, for practical considerations, the case construct has been additionally provided as a built-in predicate.
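A tagged union and its case analysis can be approximated in Python with one frozen dataclass per tag. In this sketch the Cartesian and Polar tags carry their coordinates directly instead of wrapping a separate point type, and the functions first_coordinate and is_cartesian are hypothetical names standing in for a switch and for the built-in case test.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cartesian:
    x: float
    y: float


@dataclass(frozen=True)
class Polar:
    radius: float
    azimuth: float


@dataclass(frozen=True)
class Spherical:
    r: float
    theta: float
    phi: float


Point = Union[Cartesian, Polar, Spherical]  # a value has exactly one form


def first_coordinate(pt: Point) -> float:
    """Case analysis on the tag, in the spirit of a switch over the variant."""
    if isinstance(pt, Cartesian):
        return pt.x
    if isinstance(pt, Polar):
        return pt.radius
    return pt.r


def is_cartesian(pt: Point) -> bool:
    """Tag test, analogous to checking a single case of the variant."""
    return isinstance(pt, Cartesian)
```

The tag test could be written with the general case analysis, but a dedicated check is shorter when only one form is of interest.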

3.1.4 Specifications

Smart also supports various types of logical specifications, ranging from axioms and lemmas to pre- and postconditions, invariants and inductives.

In Section 3.1.1 we stated that implicit predicates are a form of assumption and that declaring implicit Smart types and the predicates manipulating them provides a convenient manner of axiomatizing external implementations, frequently developed in a lower-level language. They can provide implementation-independent descriptions and act as abstractions that hide hardware-related details and low-level implementation decisions. Another form of assumption is the hypothesis. Hypotheses are logical results that are assumed, i.e. they constitute axioms, which are supposed to be true. In Smart, hypotheses are specification-only predicates, i.e. they cannot be called in the code. They are introduced by the keyword hypothesis.

For example, we could revisit our polymorphic pair type introduced in Section 3.1.3 and provide a polymorphic axiomatization for it by using implicit predicates and hypotheses that stipulate that the operations fst and snd retrieve the first and second elements of the pair, respectively. These are declared as follows:

    type pair<A, B>

    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

    public fst(pair<A, B> p, A a+)
    implicit program

    public snd(pair<A, B> p, B b+)
    implicit program

    public hypothesis pair_fst(A a, B b)
    program
    {
      pair<A, B> p; A a2;
      new_pair(a, b, p+);
      fst(p, a2+);
      a = a2;
    }

    public hypothesis pair_snd(A a, B b)
    program
    {
      pair<A, B> p; B b2;
      new_pair(a, b, p+);
      snd(p, b2+);
      b = b2;
    }


Lemmas are another type of specification-only predicates, meant to facilitate proving logical properties. In contrast to hypotheses, lemmas must be proven. A lemma can be introduced with the keyword lemma, and it states that all paths that exit from its body with an undeclared exit label represent impossible execution scenarios.

In Section 3.1.1 we introduced a type cartesian_point, allowing us to express a point by its Cartesian coordinates, and we defined a predicate translate_point for translating a point by a given pair of numerical values (a, b). We revisit our example and implement a predicate that translates a pair of points by a fixed pair of numbers (a, b) that are added to the Cartesian coordinates of each point of the pair. In addition, we consider an implicit predicate euclidean_dist that computes the Euclidean distance d

    $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

between a pair of points ⟨(x1, y1), (x2, y2)⟩. These are declared as follows:

    type point_pair = pair<cartesian_point, cartesian_point>

    // For a pair of points ((x1, y1), (x2, y2)), compute
    // d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
    public euclidean_dist(point_pair p, float d+)
    -> [true]
    implicit program

    // For a pair of points ((x1, y1), (x2, y2)) and a fixed
    // numerical pair (a, b), compute ((x1', y1'), (x2', y2'))
    // as ((x1 + a, y1 + b), (x2 + a, y2 + b))
    public translate_pair(point_pair p, pair<int, int> t, point_pair o+)
    -> [true]

The translation of a pair of points preserves the Euclidean distance between them: the Euclidean distance of a pair of points p will be equal to the Euclidean distance of the pair of points obtained after a translation. We can express this property by declaring it as a lemma:

    public lemma edist_preserved(pair<float, float> t, point_pair p)
    program
    {
      point_pair translated; float d1; float d2;
      euclidean_dist(p, d1+) =>
      translate_pair(p, t, translated+) =>
      euclidean_dist(translated, d2+) => d1 = d2;
    }
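A lemma must be proven for all inputs, which a program cannot do by running. The following Python sketch merely evaluates the property on given inputs. The tuple encodings and the tolerance parameter tol are assumptions: with floating-point arithmetic the two distances may differ by rounding, so the check uses math.isclose rather than strict equality.

```python
import math


def euclidean_dist(pair):
    (x1, y1), (x2, y2) = pair
    return math.hypot(x2 - x1, y2 - y1)


def translate_pair(pair, t):
    a, b = t
    (x1, y1), (x2, y2) = pair
    return ((x1 + a, y1 + b), (x2 + a, y2 + b))


def edist_preserved(t, pair, tol=1e-9):
    """Check d1 = d2 (up to tol) for one translation t and one pair of points."""
    d1 = euclidean_dist(pair)
    d2 = euclidean_dist(translate_pair(pair, t))
    return math.isclose(d1, d2, rel_tol=tol, abs_tol=tol)
```

Such a check can reveal a wrong statement on a counterexample, but it is no substitute for the proof obligation attached to the lemma.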


Specifying contracts for Smart predicates is also possible by employing pre- and postconditions. A precondition represents a logical property that must be true prior to calling a predicate, and it serves the purpose of letting the callers know when it is safe to call some predicate. Typically, it represents the caller's obligations. In Smart, a precondition can be introduced with the keyword pre, and it can be attached to any implicit or explicit predicate. A precondition can refer to the predicate's inputs, and it can declare its own local variables. However, it cannot make use of the predicate's outputs.

For instance, for the atan2 predicate discussed in Section 3.1.2, we could indicate that the predicate should never be called with the coordinates (0, 0) of the origin by adding the following precondition:

    public const float ZERO

    public atan2(float x, float y, float at+) -> [true]
    pre
    {
      x != ZERO || y != ZERO
    }
    implicit program

A postcondition represents a logical condition that must be true after executing a predicate. Its purpose is to indicate to the callers of a predicate what they are entitled to expect with respect to the outputs produced by the predicate. In Smart, postconditions are introduced with the keyword post, and they can be attached to any implicit or explicit (computational) predicate, on a subset or all of the predicate's output labels. They can refer to the predicate's inputs and the outputs associated to the label considered in the postcondition. Additionally, they can declare their own local variables.

For instance, a predicate equal_points verifying if two points are equal, and having four possible exit labels, eq_points, eq_x, eq_y and false, could declare postconditions as follows:

    public equal_points(cartesian_point p, cartesian_point q)
    -> [eq_points, eq_x, eq_y, false]
    program
    {
      float px; float qx; float py; float qy;
      cartesian_point.x(p, px+);
      cartesian_point.x(q, qx+);
      cartesian_point.y(p, py+);
      cartesian_point.y(q, qy+);
      if px = qx
      then
        [true ↦ eq_points, false ↦ eq_x] py = qy;
      else
        [true ↦ eq_y] py = qy;
    }
    post eq_points { p = q; }
    post eq_x { float x1; float x2; cartesian_point.equals[x](p, q); }
    post ... { p != q; }

The first postcondition applies to the exit label eq_points, the second to the label eq_x, and the last one, indicated by ..., applies to labels eq_y and false.

In Smart, mathematical relations can be represented by introducing inductives or schemes. These predicates have no outputs, but they always have true and false as their exit labels. Inductive predicates are the only part of the language that cannot be transformed into executable code; however, they can be used to facilitate the proofs. Predicates introduced with the inductive keyword represent the least fixed point of their cases, introduced with the keyword case and a user-defined name. Each case can introduce existentially quantified variables. In particular, in the absence of recursion, inductive predicates represent a parallel disjunction of cases. An inductive predicate will exit with the label true if any of its declared cases holds.

For example, we could specify membership for an implicit array type using an inductive named contains, having a single case with the user-defined name ElemAt, which introduces an existentially quantified variable idx:

    type array<A>

    public get_size(array<A> arr, int s+)
    implicit program

    public get_elem(array<A> arr, int i, A ai+)
    -> [true, oob]
    implicit program

    // Membership defined with an inductive and an existential
    public contains(array<A> arr, A a) -> [true, false]
    inductive
    // An array contains an element if there exists a valid
    // index where this element is to be found
    case ElemAt(int idx)
    {
      A b;
      [oob ↦ false] get_elem(arr, idx, b+) && b = a
    }
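Although inductives are specification-only, the meaning of this one can be illustrated on finite Python lists by searching for a witness index. The list encoding of arrays, the string labels and the index range that is explored are assumptions of this sketch.

```python
def get_elem(arr, i):
    """Exit with 'oob' when i is not a valid index, else 'true' and the element."""
    if 0 <= i < len(arr):
        return ("true", arr[i])
    return ("oob", None)


def contains(arr, a):
    # Search for a witness idx; two out-of-bounds indices are probed on purpose.
    for idx in range(-1, len(arr) + 1):
        label, b = get_elem(arr, idx)
        if label == "oob":
            continue          # oob is turned into false: this witness fails
        if b == a:
            return "true"     # the case ElemAt holds for this idx
    return "false"            # no witness exists
```

The existential quantification over idx becomes a search, and an out-of-bounds access simply rules out a candidate index instead of aborting the whole check.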

Schemes, on the other hand, represent conjunctions of cases: cases are introduced with the keyword with followed by a user-defined name, and each of them can introduce universally quantified variables. A scheme will return the label true only if all of its declared cases hold.

Using a scheme with two cases, Size and Forall, as shown below, we can define the pointwise equality of arrays. The first case, Size, verifies if the two arrays have the same length by introducing two universally quantified variables n and m. The Forall case verifies that, for any index i, the arrays contain equal elements. Two arrays are equal pointwise if and only if they are of the same size and, at any given index i, the arrays have the same element.

    public equals_pointwise(array<A> arr1, array<A> arr2) -> [true, false]
    // Extensional equality of arrays [arr1] and [arr2]
    scheme
    // They must be of the same size
    with Size
    {{ int n; int m }}
    { get_size(arr1, n+) => get_size(arr2, m+) => n = m; }
    // If they exist, elements at the same index must be equal
    with Forall(int i)
    {{ A a; A b }}
    { get_elem(arr1, i, a+) => get_elem(arr2, i, b+) => a = b; }
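The conjunction of the two cases can be sketched in Python as follows. This is an illustrative model only: arrays are assumed to be Python sequences, labels are strings, and the implications of the Forall case are read as holding vacuously whenever an index is out of bounds in either array.

```python
from typing import Sequence


def equals_pointwise(arr1: Sequence, arr2: Sequence) -> str:
    """Exits with "true" only if both the Size and the Forall case hold."""
    # Case Size: the two sizes must be equal.
    size_case = len(arr1) == len(arr2)
    # Case Forall: wherever both arrays have an element, the elements agree.
    shared_indices = range(min(len(arr1), len(arr2)))
    forall_case = all(arr1[i] == arr2[i] for i in shared_indices)
    return "true" if size_case and forall_case else "false"
```

Where the inductive corresponds to `any` over its cases and witnesses, the scheme corresponds to `all`, which is the duality described in the text.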

Loop invariants are supported as well. These can be introduced in various ways, for instance by declaring them with the keyword invariant or by declaring them as inductives.

3.1.5 Illustrating Smart – An Abstract Process Manager

To illustrate the Smart language and its capabilities, we consider an abstract process manager and its fundamental components: process and thread. We define the data structures corresponding to threads and processes, implement the predicates corresponding to a simple thread switch, and specify some fundamental properties for processes.

[Figure: a process with a single thread (Thread: Stack, Register, Counter; Data, Files and Code at the process level) and a process with n threads (Thread 1 to Thread n, each with its own Stack, Counter and Register; Data, Files and Code at the process level).]

The implementation of threads and processes differs depending on the operating system, but frequently a thread is a component of a process that belongs to exactly one process, outside which it cannot exist. Each thread represents a separate flow of control. Multiple threads can be associated to one process; they execute concurrently and provide a mechanism to improve application performance through parallelism. In a nutshell, threads represent a software approach to improving the performance of operating systems by reducing the overhead of process switching.

A thread is a flow of execution through the process code, having its own program counter that keeps track of which instruction to execute next, as well as system registers, which hold its current working variables, and a stack, which contains the execution history. Every thread is uniquely identified by a thread identifier. Peer threads share some information, such as the code and data segments. When one thread alters a code memory item, all other threads see the change.

Figure 3.1 – Possible Transitions between Thread States (states: Ready, Running, Blocked)

We define a thread type as a structure consisting of multiple fields, such as the thread's identifier, its current state and the memory region for its stack.

    type memory_region = {
      // Start address
      int start;
      // Region length
      int length;
    }

    type state =
    | Ready
    | Running
    | Blocked

    type thread = {
      // Identifier
      int id;
      // Current state
      state crt_state;
      // Stack
      memory_region stack;
    }

The thread's stack is identified by its start address and its length. The state of a thread is defined as a variant having three alternatives: Running (the thread is currently executing), Ready (the thread is currently awaiting execution and could potentially be started) and Blocked (the thread has exhausted its allocated time or is waiting for an event to occur; it must be unblocked before being able to execute). The possible transitions between states are shown in Figure 3.1. A thread's current state determines the valid transitions.

Similarly, a process is defined as a structure consisting of an internal identifier, an identifier for the thread that is currently executing, an address space and an array of possibly inactive threads associated with it. Whether a thread in the thread array is active or has terminated is indicated by a variant of type option. An inactive thread, indicated by None, is a thread that terminated its execution and whose slot in the array of associated threads has not been reallocated. In contrast, a blocked thread, indicated by Some, is a thread that cannot execute currently but should execute in the future, once the resources it is waiting for are freed. We consider a segmented address space, with addresses existing not in a single linear range but instead in multiple segments, corresponding to the code, the data and the stack, respectively.

    type option<A> =
    | None
    | Some(A a)

    type address_space = {
      memory_region code;
      memory_region data;
      memory_region stack;
    }

    type process = {
      // Array of associated threads
      array<option<thread>> threads;
      // Internal id
      int pid;
      // Currently running thread
      int crt_thread;
      // Address space
      address_space adr_space;
    }

Next, we consider a simple predicate called stop_thread, having two possible execution scenarios, as indicated by its two exit labels, true and invalid. When the given input index i corresponds to an active thread, the predicate executes successfully, thus exiting with true. In this case, the state of the i-th thread associated to the input process is set to Blocked and the new state of the process is returned in the output out. Otherwise, when the given index i corresponds to a thread that is Ready, or when there is no active thread at that index, the predicate exits with the label invalid and no output is generated.

    public stop_thread(process in, int i, process out+) -> [true, invalid]
    program
    {{ array<option<thread>> ta; state s; thread ti; option<thread> tio }}
    {
      // Copy in to out
      out := in;
      // Fetch in.threads and copy it to ta
      process.threads(in, ta+);
      // Get the array's i-th element
      [oob: invalid] get_elem(ta, i, tio+);
      // Check if the i-th element is active
      switch (tio) {
        case Some(thread th): ti := th;
        case None: raise invalid;
      }
      // Get the thread's current state
      thread.crt_state(ti, s+);
      // Check whether the transition is valid
      [false: invalid] state.case[Running](s);
      // Create the new state for the running thread
      state.Blocked(s+);
      // Set the newly created state
      thread.crt_state(ti+, s);
      // Reset tio to the thread with the modified state
      option.Some(tio+, ti);
      // Reset the i-th thread and return the new state ta
      [oob: invalid] set_ei(ta, i, tio, ta+);
      // Update out.threads to ta
      process.threads(out+, ta);
    }

Another auxiliary predicate, called start_thread, when given a valid index of an unblocked thread, sets the state of the i-th thread to Running. It is implemented similarly, as shown below.

    public start_thread(process in, int i, process out+) -> [true, invalid]
    program
    {{ array<option<thread>> ta; state s; thread ti; option<thread> tio }}
    {
      // Copy in to out
      out := in;
      // Fetch in.threads and copy it to ta
      process.threads(in, ta+);
      // Get the array's i-th element
      [oob: invalid] get_ei(ta, i, tio+);
      // Check if the i-th thread is active
      switch (tio) {
        case Some(thread th): ti := th;
        case None: raise invalid;
      }
      thread.crt_state(ti, s+);
      // Check whether the transition is valid
      [false: invalid] state.case[Ready](s);
      // Create the new state for the running thread
      state.Running(s+);
      // Set the newly created state
      thread.crt_state(ti+, s);
      // Reset tio to the thread with the modified state
      option.Some(tio+, ti);
      // Set the i-th element and return the new state ta
      [oob: invalid] set_ei(ta, i, tio, ta+);
      // Update out.threads to ta
      process.threads(out+, ta);
    }

These two predicates will be called by the predicate run_thread, which performs a simple thread switch. It stops the thread currently executing, indicated by crt_thread, and starts the one with the given index i. The new state of the process is returned in the output out.

    public run_thread(process in, int i, process out+) -> [true, inval]
    program
    {{ int crt }}
    {
      process.crt_thread(in, crt+);
      [true: true, invalid: inval] stop_thread(in, crt, out+);
      [true: true, invalid: inval] start_thread(out, i, out+);
      process.crt_thread(out+, nid);
    }
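The following Python sketch models the three predicates functionally, to show how exit labels and immutable updates combine. It is a simplified illustration under several assumptions: threads carry only an identifier and a state (the stack is omitted), exit labels are returned as strings together with the optional output, the helper `_set_state` is hypothetical, and the final statement of run_thread is assumed to record i as the new current thread (the listing names this value nid).

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    READY = "Ready"
    RUNNING = "Running"
    BLOCKED = "Blocked"


@dataclass(frozen=True)
class Thread:
    id: int
    crt_state: State


@dataclass(frozen=True)
class Process:
    threads: Tuple[Optional[Thread], ...]  # None models the None constructor
    crt_thread: int


def _set_state(p: Process, i: int, expected: State, new: State):
    """Hypothetical shared body: returns (exit label, new process or None)."""
    if not 0 <= i < len(p.threads):      # oob is mapped to invalid
        return "invalid", None
    t = p.threads[i]
    if t is None:                        # no active thread at index i
        return "invalid", None
    if t.crt_state is not expected:      # the transition is not valid
        return "invalid", None
    updated = p.threads[:i] + (replace(t, crt_state=new),) + p.threads[i + 1:]
    return "true", replace(p, threads=updated)


def stop_thread(p: Process, i: int):
    return _set_state(p, i, State.RUNNING, State.BLOCKED)


def start_thread(p: Process, i: int):
    return _set_state(p, i, State.READY, State.RUNNING)


def run_thread(p: Process, i: int):
    label, out = stop_thread(p, p.crt_thread)
    if label != "true":                  # [invalid: inval]
        return "inval", None
    label, out = start_thread(out, i)
    if label != "true":                  # [invalid: inval]
        return "inval", None
    return "true", replace(out, crt_thread=i)
```

Because every value is frozen, the input process is never modified; each step builds a new process value, which matches the functional reading of the Smart predicates.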

Next, we introduce a fundamental property for any valid process state, namely the fact that the stack regions of all its associated threads are completely disjoint.

    public not_disjoint(process p) -> [true, false]
    inductive
    case StacksJoint(int i, int j)
    {{ thread ti; thread tj; memory_region sti; memory_region stj }}
    {
      i != j;
      [None: false] thread(p, i, ti+);
      [None: false] thread(p, j, tj+);
      thread.stack(ti, sti+);
      thread.stack(tj, stj+);
      overlap(sti, stj);
    }
    case CodeStackJoint(int i)
    {{ thread ti; memory_region sti; address_space as; memory_region code }}
    {
      [None: false] thread(p, i, ti+);
      thread.stack(ti, sti+);
      process.adr_space(p, as+);
      address_space.code(as, code+);
      overlap(sti, code);
    }
    case DataStackJoint(int i)
    {{ thread ti; memory_region sti; address_space as; memory_region data }}
    {
      [None: false] thread(p, i, ti+);
      thread.stack(ti, sti+);
      process.adr_space(p, as+);
      address_space.data(as, data+);
      overlap(sti, data);
    }

    public disjoint_stacks(process p) -> [true, false]
    program
    { !not_disjoint(p); }

This property is expressed using an inductive predicate that characterizes the potential situations in which the memory isolation of the different associated threads of a process can be broken. The natural manner of expressing such a property in Smart is by using a scheme, as presented in Section 3.1.4; here we use an inductive predicate because the language we are working with, which will be presented in Chapter 4, does not support schemes. In our inductive predicate, the first case, StacksJoint, checks whether there exist two different threads having overlapping stacks. The next two cases, CodeStackJoint and DataStackJoint, check whether there exists a thread whose stack overlaps the process' code segment or data segment, respectively. This uses an auxiliary predicate verifying if two memory regions overlap, i.e. if there exists an address that is contained simultaneously by two different segments. This operation is symmetric; we express this property with the lemma overlap_sym.

    public contains(memory_region m, int address) -> [true, false]
    implicit program

    public overlap(memory_region m1, memory_region m2) -> [true, false]
    inductive
    case InBoth(int address)
    { contains(m1, address) && contains(m2, address); }

    public lemma overlap_sym(memory_region m1, memory_region m2) -> [true, false]
    program
    { overlap(m1, m2) => overlap(m2, m1); }
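As an illustration of the InBoth case and of the symmetry lemma, here is a small Python sketch. Since contains is implicit in the model, its definition here is an assumption: a region is taken to cover the half-open address range from start to start + length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRegion:
    start: int
    length: int


def contains(m: MemoryRegion, address: int) -> bool:
    # Assumed semantics: the region covers [start, start + length).
    return m.start <= address < m.start + m.length


def overlap(m1: MemoryRegion, m2: MemoryRegion) -> bool:
    # InBoth: some address lies in both regions. Any witness must lie in
    # m1, so scanning the addresses of m1 is enough.
    return any(contains(m2, a) for a in range(m1.start, m1.start + m1.length))
```

The lemma overlap_sym then corresponds to checking that `overlap(m1, m2)` implies `overlap(m2, m1)`; exhaustively testing a few small regions gives some confidence in the property, but it is of course not a proof.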

3.2 ProvenTools

ProvenTools is a comprehensive set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain has the structure of a set of Eclipse plug-ins of JDT type (Java Development Tools). Together, these constitute a complete Integrated Development Environment (IDE), allowing one not only to write, edit and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover and, finally, to generate executable code in C or Java.

The plug-ins are based on Xtext (Xtext Documentation), an official Eclipse plug-in dedicated to the creation of DSLs (Domain Specific Languages) in Eclipse. Xtext-based DSLs are described in an EBNF (Extended Backus-Naur Form) grammar language. Fully statically typed expressions can be embedded in the developed DSL, and Java-style scoping and linking are supported.

Figure 3.2 – The ProvenTools Toolchain (front-end: Smart code and specifications, Smil; back-end: prover, with proof obligations and proofs, and code generators, with C code and Java code)

Concretely, the toolchain includes a compiler whose front-end contains the plug-in in charge of Smart, as well as the plug-in dedicated to Smil, the Smart Intermediate Language, to which Smart programs and specifications are translated. Smil is a simpler form of the Smart language. Though roughly equivalent to Smart, Smil has a rather different form, manipulating less complex structures and having no syntactic sugar. Harder to understand for a human reader, Smil is meant to be easily manipulated by the back-end of the toolchain. The back-end currently offers a C code generator and an interactive prover. An overview of this architecture is shown in Figure 3.2.

While employing ProvenTools, the code undergoes various compilation steps and transformations. During the compilation chain, the Smart code is transformed to a Smart AST (Abstract Syntax Tree). The obtained AST is then compiled to a Smil AST. Following this, the Smil AST is transformed to Smil source code and then reinserted into the compilation chain by the plug-in in charge of it.

After completing the whole compilation chain and obtaining the Smil AST and the associated Smil source code, the back-end of the compiler can be employed. The back-end comprises a source code generator and a prover. The generator transforms Smart models into their equivalents in C.


Figure 3.3 – Smart Editor

Smart Editor. The Smart editor provides facilities to edit Smart code and supports a broad range of features, such as syntax highlighting, facilities for code navigation and visualization, and editing assistants, including word completion and quick fixes. A snapshot of it is shown in Figure 3.3.

Prover. ProvenTools provides users with a dedicated view for interacting with the prover. This presents the existing proof obligations and provides facilities to solve them. Proof obligations are generated for any logical lemma, precondition, postcondition or invariant included in the Smart models. Additionally, any label that remains unhandled in the code triggers the generation of a proof obligation, thus enforcing that each possible exit label of a predicate is either explicitly handled or proven to be impossible.

An automatic prover, trying various proof search procedures, is called whenever a proof obligation is generated. It uses previously proven obligations or existing hypotheses for discharging new obligations automatically. Unproved obligations can be solved by interactively employing manual tactics, called hints, which are provided in the IDE. Hints that are considered useless with respect to the currently selected proof obligations are automatically disabled. Additionally, users can define strategies, i.e. proof patterns, and employ an interactive proof assistant that applies them automatically in the background. This will suggest a possible proof as soon as it finds one. Proofs thus found are rechecked as if they had been done manually.


ProvenTools offers facilities to inspect any manual or automatic proof step, thus making a later review of the proofs possible. The toolchain also provides a dedicated system for assisting the user in adapting former proofs to new changes due to code maintenance or evolution.

C Code Generator. The executable part of Smart models is translated to executable C code by the C code generator. To this end, the executable parts of the Smart models are identified and extracted, while the logical parts are discarded. Users can guide this process through annotations, and they can specify that particular values are purely logical. Functional implementations are transformed to imperative ones: the dedicated C code generation plug-in tries to replace functional modifications of structures in the models by in-place updates. Such transformations are correct only if the different values are handled linearly in the Smart code, i.e. if no previous value is read after applying a functional update on it. To ensure the safety of functional-to-imperative code transformations, the C generation plug-in employs various global static analyses. When safety cannot be guaranteed, the generator reports errors or introduces copies, if the users deem it acceptable.

In earlier experiments (Lescuyer 2015), the Prove & Run team was able to generate C code for a complete model of ProvenCore that did not require dynamic allocation and ran at a speed comparable to the original C code.

3.3 Smil

Smil is an intermediate language to which Smart models are compiled. Similarly to Smart, Smil is a functional language with algebraic data types (structures and variants). However, unlike Smart, Smil is not a user-oriented language, i.e. it was not designed to write programs in it directly, but rather to provide a representation of Smart programs at a different level of abstraction. Thus, reading Smil code is a rather cumbersome task, as it is a language without syntactic sugar, meant to serve as a starting point for the main components of the ProvenTools back-end exploiting Smart models: the prover and the code generator.

To give an idea of Smil's syntax, we illustrate below the types thread and process, as well as the stop_thread predicate from our abstract process manager example given in Section 3.1.5.

    public type state =
    | Ready
    | Running
    | Blocked

    public type thread = { id: int; crt_state: state; stack: memory_region }

    public state_acopy_ahypothesis(state state_1) -> [true]
    hypothesis
    {{ state state_2 }}
    [<1>] state.switch(state_1)
          -> [Ready -> 5, Running -> 4, Blocked -> 3]
    [<2>] ==<state>(state_1, state_2)
          -> [true -> true, false -> error]
    [<3>] state.Blocked(state_2)
          -> [true -> 2]
    [<4>] state.Running(state_2)
          -> [true -> 2]
    [<5>] state.Ready(state_2)
          -> [true -> 2]

    public thread_ahypothesis(thread x1) -> [true]
    hypothesis
    {{ thread x2; int zid; state zcrt_state; memory_region zstack }}
    [<1>] thread.all(x1, zid, zcrt_state, zstack)
          -> [true -> 2]
    [<2>] thread.new(x2, zid, zcrt_state, zstack)
          -> [true -> 3]
    [<3>] ==<thread>(x1, x2)
          -> [true -> true, false -> error]

The type declarations in Smil strongly resemble their Smart counterparts. Predicate declarations mirror the form found in Smart as well, except that in Smil any output variable associated to the true exit label is explicitly declared as such. Preconditions and postconditions are appended to any predicate and, as shown above, a hypothesis is added for any explicitly declared type.

The real syntax differences are visible in predicate implementations: every statement is preceded by a numerical label, and every possible exit label lbl of the statement indicates another numerical label. The latter numerical label actually designates the statement that will be executed next if the current statement exits with label lbl. In particular, this mechanism replaces the try catch and the conditional control constructs, as well as the logical operators and any other construct based on label transformers, described in Section 3.1.2. Thus, the predicate bodies are very similar in form to a control flow graph, where the statements represent the nodes of the graph and the exit labels represent transitions.

    public stop_thread(process in, int i, process out+) -> [true <out>, invalid]
    pre [<0>] true() -> [true -> true]
    {{ array<option<thread>> ta; state s; thread ti; option<thread> tio; thread th }}
    [<1>]  =<process>(out, in)
           -> [true -> 2]
    [<2>]  process.threads(in, ta)
           -> [true -> 3]
    [<3>]  get<option<thread>>(ta, i, tio)
           -> [true -> 4, oob -> invalid]
    [<4>]  option.switch<thread>(tio, th)
           -> [None -> 6, Some -> 7]
    [<5>]  state.Blocked(s)
           -> [true -> 8]
    [<6>]  true()
           -> [true -> invalid]
    [<7>]  =<thread>(ti, th)
           -> [true -> 5]
    [<8>]  thread.crt_state+(ti, ti, s)
           -> [true -> 9]
    [<9>]  option.Some<thread>(tio, ti)
           -> [true -> 10]
    [<10>] set<option<thread>>(ta, i, tio, ta)
           -> [true -> 11, oob -> invalid]
    [<11>] set<option<thread>>(ta, i, tio, ta)
           -> [true -> 12, oob -> invalid]
    [<12>] process.threads+(out, out, ta)
           -> [true -> true]
    post true 0
    post invalid 0

In a nutshell, Smil constitutes a representative, albeit restricted, set of constructs, and it is a language designed to be well-suited for further transformations and analyses.

The next chapter focuses entirely on αSmil, the computational version of Smil with which we are working throughout the rest of this thesis. We will illustrate its usage and describe its abstract syntax and formal semantics.


Chapter 4

The αSmil Language

One day I will find the right words, and they will be simple.

Jack Kerouac

In this chapter, we define the syntax and the semantics of αSmil, the language that we consider in this thesis. This is a computational version of Smil (presented in Section 3.3), which is essentially a subset of Smart, presented in the previous chapter (Chapter 3). However, it contains a few additional elements introduced for the purpose of this thesis.

The αSmil language is a first-order, purely functional and strongly-typed language with arrays and algebraic data types, i.e. structures and variants. It is an intermediate, analysis-oriented language.

4.1 αSmil Syntax

The αSmil language is minimal in the sense that it contains only those constructs that are needed for the purpose of this thesis. For instance, unlike Smart and Smil, the language does not contain visibility modifiers, because these modifiers play no role in the techniques presented in the sequel. During the introduction of the grammar, we will point out the most important deviations from Smart and Smil.

Programs. A program in αSmil consists of a number of type and constant declarations and definitions, followed by a collection of predicates. In contrast to Smart and Smil, type and predicate declarations have no visibility modifiers (such as public) and they are not organized into modules. The absence of visibility modifiers is a natural consequence of the disappearance of modules. We assume that there is one module in which every type, constant and predicate declaration resides, and these are mutually visible to each other. These restrictions are made for the sake of simplicity, since the techniques proposed in this thesis are orthogonal to the concepts of visibility and modules.

Constants are declared using the keyword const followed by the type and the constant identifier. Constant identifiers are written in upper-case letters and are preceded by a special symbol.


Types are declared using the keyword type followed by the type identifier and optionally, in the case of polymorphic type declarations, by a number of type parameters given in upper-case letters between < >. In the case of implicit types, this constitutes the complete type declaration. Explicit type declarations continue with the symbol = and the type's definition. Throughout the rest of this chapter and the presentation of our static analyses, we will ignore polymorphism. The abstract types of our analyses are not polymorphic, and the impact of polymorphism is visible only at the implementation level, for type substitutions that will be discussed in Chapter 8.

Types. Similarly to Smart, algebraic data types, i.e. structures and variants, and associative arrays are supported. We let $\mathcal{T}$ be the universe of type identifiers and $\mathcal{T}_0 \subset \mathcal{T}$ the set of base type identifiers. We assume a set of identifiers for structure fields and variant constructors, denoted by $\mathcal{F}$ and $\mathcal{C}$, respectively.

A structure represents the Cartesian product of the different types of its elements, called fields. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag called the constructor. Arrays group elements of data of the same type (given in angle brackets) into a single entity; elements are selected by an index whose type is included (as denoted by the superscript) in the array's definition as well.

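Definition 4.1.1. Types $\tau \in \mathcal{T}$ in αSmil

\[
\begin{array}{rcll}
\tau \in \mathcal{T} \quad \tau & ::= & \tau_0 \in \mathcal{T}_0 & \text{base types} \\
 & \mid & \mathit{struct}\{f_1 : \tau;\ \ldots;\ f_n : \tau\} \quad f_i \in \mathcal{F},\ 0 \le n & \text{structures} \\
 & \mid & \mathit{variant}[C_1 : \tau \mid \ldots \mid C_n : \tau] \quad C_i \in \mathcal{C},\ 1 \le n & \text{variants} \\
 & \mid & \mathit{arr}^{\tau}\langle\tau\rangle & \text{arrays}
\end{array}
\]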

Variants and structures can be used together to model traditional algebraic variants with zero or several parameters. For instance, a generic type option<T> is actually modeled as:

    variant[Some: struct{t: T} | None: struct{}]
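The type grammar and the encoding of option can be mirrored by a small Python data model. This is only a sketch for illustration: the class names Base, Struct, Variant and Arr and the helper option are hypothetical, and the representation (tuples of name/type pairs) is an assumption rather than the thesis' implementation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Base:
    name: str  # a base type identifier


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "Type"], ...]  # zero or more (field, type) pairs


@dataclass(frozen=True)
class Variant:
    constructors: Tuple[Tuple[str, "Type"], ...]  # (constructor, type) pairs

    def __post_init__(self):
        # Unlike structures, variants need at least one constructor.
        if not self.constructors:
            raise ValueError("a variant needs at least one constructor")


@dataclass(frozen=True)
class Arr:
    index: "Type"  # type of the index (the superscript in the definition)
    elem: "Type"   # type of the elements (given in angle brackets)


Type = Union[Base, Struct, Variant, Arr]


def option(t: Type) -> Variant:
    """option<T> as variant[Some: struct{t: T} | None: struct{}]."""
    return Variant((("Some", Struct((("t", t),))), ("None", Struct(()))))
```

A constructor without parameters, such as None, is thus represented by a constructor carrying the empty structure.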

Concretely, structures are declared and defined by indicating a set of pairs of field identifiers and their corresponding types between braces. Declaring structures with no fields is possible. Variants are declared and defined by indicating the list of their constructors, each starting with an upper-case letter, preceded by the symbol |. Unlike structures, variants must have at least one declared constructor. For instance, the state and thread types from our Abstract Process Manager example, given in Smart in Section 3.1.5 on page 48, have the following Smil declaration:

    type state =
    | Ready
    | Running
    | Blocked

    type thread = { id: int; crt_state: state; stack: memory_region }


In contrast to Smart, in structure declarations the field name precedes the field type.

Predicates. Predicates are declared using the keyword predicate, which is specific to αSmil, followed by a predicate identifier and a signature. A signature is given by a sequence of input types and a non-empty finite mapping of exit labels $\lambda \in \mathcal{L} \setminus \{\mathit{error}\}$ to sequences of output types. The set of exit labels $\mathcal{L}$ contains three distinguished elements: true, false and error. The latter cannot appear in predicate signatures; it is used as a sink node in control flow graphs, which will be presented in Section 4.2. We write signatures in the following manner:

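\[
\sigma = \underbrace{(x_1 : \tau_1, \ldots, x_n : \tau_n)}_{\text{input identifiers: types}}\
\underbrace{[\lambda_1 : (\tau_{11}\ y_{11}, \ldots, \tau_{1k_1}\ y_{1k_1}) \mid \ldots \mid \overbrace{\lambda_p : (\tau_{p1}\ y_{p1}, \ldots, \tau_{pk_p}\ y_{pk_p})}^{\text{label: (output types identifiers)}}]}_{p \text{ possible exit labels}}
\]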

We denote by $\Sigma$ the mapping between predicate identifiers and their signatures.

The predicate declaration is followed by the predicate's body. Depending on its body's nature, a predicate will be implicit, explicit or inductive. Smart implicit and explicit predicates have been presented in Section 3.1.1 of our previous chapter, while inductive predicates have been illustrated in Section 3.1.4 on page 46. For implicit predicates, the body consists solely of the keyword implicit. For explicit predicates, an optional declaration unit can follow. This is a finite mapping from variables to types and it must be given between double curly braces, i.e. {{typeid videntifier}}. Input and output parameters must be different from all the variables appearing in the declaration units. Declaration units are followed by a sequence of statements representing calls to predicates.

Just as presented in Section 3.1.4 for Smart, an inductive predicate is syntactically distinguished by the keyword inductive, followed by its different cases, declared with the keyword case followed by an identifier, an optional list of existentially quantified variables and a body of statements.

A generic call to a predicate p is of the form

\[
p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]
\]

The predicate p is called with inputs $e_1, \ldots, e_n$ and yields one of the declared exit labels $\lambda_1, \ldots, \lambda_m$, each having its own set of associated output variables $o_1, \ldots, o_m$, respectively. We denote by $o$ a sequence of 0 or more output variables.

Statements. The αSmil language supports the statements presented in Table 4.2. These represent calls to built-in predicates and can be seen as special cases of the predicate call presented above. All statements have a functional nature and handle immutable data. A statement consists of as many variables as there are input types in the signature $\sigma_p$ of the called built-in predicate p, and a mapping associating to each exit label of $\sigma_p$ a sequence of variables, one variable for each output type in the corresponding sequence.

\[
\begin{array}{rcll}
s & ::= & o := e & (1)\ \text{assignment} \\
 & \mid & e_1 = e_2 & (2)\ \text{equality test} \\
 & \mid & \mathit{nop} & (3)\ \text{no operation} \\
 & \mid & r := \{e_1, \ldots, e_n\} & (4)\ \text{create structure} \\
 & \mid & \{o_1, \ldots, o_n\} := r & (5)\ \text{destructure structure} \\
 & \mid & o := r.f_i & (6)\ \text{access field} \\
 & \mid & r' := \{r \text{ with } f_i = e\} & (7)\ \text{update field} \\
 & \mid & r' =_{\langle f_1, \ldots, f_k \rangle} r'' & (8)\ \text{check (partial) structure equality} \\
 & \mid & v := C_p[e] & (9)\ \text{create variant} \\
 & \mid & \mathit{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n] & (10)\ \text{destructure variant} \\
 & \mid & v \in \{C_1, \ldots, C_k\} & (11)\ \text{variant possible} \\
 & \mid & o := a[i] & (12)\ \text{array access} \\
 & \mid & a' := [a \text{ with } i = e] & (13)\ \text{array update} \\
 & \mid & p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] & (14)\ \text{predicate call}
\end{array}
\]

Table 4.2 – αSmil – Set of Supported Statements

The first three statements are generic and can be applied to any type. Statement (1) is a call to the built-in assignment predicate, denoted by :=, present in an identical form in Smart as well. Statement (2) is a call to the logical operator =, verifying whether its two input arguments are equal. Statement (3) is the αSmil equivalent of a no-operation. As a general convention for the statements' notation, we denote by e the identifiers of entry variables and by o the identifiers of output variables.

Statements (4) – (8) are structure-related. The first of them, statement (4), is the constructor of a structure r of type rtype having n fields. It corresponds to the statement rtype.new(r+, e_1, ..., e_n) in Smart. Statement (5) returns the values of all the fields of r into the output parameters $o_1, \ldots, o_n$ and it is the equivalent of rtype.all(r, o_1+, ..., o_n+) in Smart. Statement (6) is the individual accessor of a field $f_i$ and corresponds to rtype.f_i(r, e_i+) in Smart. As previously mentioned, our language is purely functional and handles only immutable algebraic data structures and arrays. Therefore, setting the field $f_i$ of a structure, shown in (7) and being the equivalent of rtype.f_i(r'+, e_i), returns a new structure where all fields have the same value as in r, except $f_i$, which is set to $e_i$. Statement (8) verifies if the values of the indicated subset of fields of two structures r' and r'' are equal. It exists in Smart as well, where it has a similar syntax: rtype.equals[f,g](r', r'') for checking that the values of fields f and g of the two structures are equal, or the dual rtype.equals-[f,g](r', r'') for checking that the values of all fields except f and g are equal.
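The functional field update (7) and the partial equality check (8), together with its dual, can be illustrated with frozen Python dataclasses. This is a sketch under assumptions: the Thread class is a simplified stand-in (the stack is reduced to an integer), and the helper names equals_on and equals_except are hypothetical.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Thread:
    id: int
    crt_state: str
    stack: int


def equals_on(r1, r2, names) -> bool:
    """Statement (8): the listed fields have equal values in r1 and r2."""
    return all(getattr(r1, f) == getattr(r2, f) for f in names)


def equals_except(r1, r2, names) -> bool:
    """Dual form: all fields except the listed ones have equal values."""
    others = [f.name for f in fields(r1) if f.name not in names]
    return equals_on(r1, r2, others)


# Statement (7): the update returns a new structure and leaves t untouched.
t = Thread(id=1, crt_state="Ready", stack=0)
t2 = replace(t, crt_state="Running")
```

After the update, `t` and `t2` agree on every field except crt_state, which is exactly the kind of fact that the partial equality check expresses.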

The next group of statements is variant-related. The first of them, statement (9), creates a new variant v of type vtype using the constructor $C_p$ with e as an argument. It corresponds to vtype.Cp(v+, e) in Smart. Statement (10) is used for matching on the different constructors of the input variant v and corresponds to switch(v) case in Smart. The last statement of this group, statement (11), verifies if the given variant was created with one of the constructors in $C_1, \ldots, C_k$. This could be obtained with a variant switch, but for practical considerations it has been provided as a built-in predicate. Its counterpart in Smart is vtype.case[C1, ..., Ck](v).

Statements (12) and (13) are array-related. Statement (12) returns the value of the i-th cell of the input array a. Similarly to (7), updating the i-th cell of an array, shown in (13), has a functional nature. It returns a new array where all cells have the same values as in a, except the i-th cell, which is set to e. These statements are specific to αSmil.
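A minimal Python sketch of the two array statements follows. It assumes arrays indexed by integers and modeled as tuples, with the exit label returned as a string next to the optional output; the function names array_get and array_set are hypothetical.

```python
def array_get(a: tuple, i: int):
    """Statement (12): o := a[i]; "false" plays the role of out of bounds."""
    if 0 <= i < len(a):
        return "true", a[i]
    return "false", None


def array_set(a: tuple, i: int, e):
    """Statement (13): a' := [a with i = e]; builds a new array."""
    if 0 <= i < len(a):
        return "true", a[:i] + (e,) + a[i + 1:]
    return "false", None
```

The input tuple is never modified: a successful update returns a fresh array, and a failed one produces no output at all.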

Statement (14) is a generic call to a predicate p and has been presented on page 61.

Exit Labels. All of the built-in supported statements have an associated set of exit labels $\lambda \in \mathcal{L} \setminus \{\mathit{error}\}$. These are indicated in Table 4.3. There are two distinguished exit labels, true and false, respectively. An additional built-in label called error is used as a sink node in control flow graphs. It cannot be used as an exit label for a predicate.

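Table 4.3 – Statements and their Exit Labels

\[
\begin{array}{ll}
\textbf{Statement} & \textbf{Exit labels} \\
o := e \quad (1) & [\mathit{true} \mapsto o] \\
e_1 = e_2 \quad (2) & [\mathit{true} \mapsto \emptyset,\ \mathit{false} \mapsto \emptyset] \\
\mathit{nop} \quad (3) & [\mathit{true} \mapsto \emptyset] \\
r := \{e_1, \ldots, e_n\} \quad (4) & [\mathit{true} \mapsto r] \\
\{o_1, \ldots, o_n\} := r \quad (5) & [\mathit{true} \mapsto o_1, \ldots, o_n] \\
o := r.f_i \quad (6) & [\mathit{true} \mapsto o] \\
r' := \{r \text{ with } f_i = e\} \quad (7) & [\mathit{true} \mapsto r'] \\
r' =_{\langle f_1, \ldots, f_k \rangle} r'' \quad (8) & [\mathit{true} \mapsto \emptyset,\ \mathit{false} \mapsto \emptyset] \\
v := C_p[e] \quad (9) & [\mathit{true} \mapsto v] \\
\mathit{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n] \quad (10) & [\lambda_{C_1} \mapsto o_1,\ \ldots,\ \lambda_{C_n} \mapsto o_n] \\
v \in \{C_1, \ldots, C_k\} \quad (11) & [\mathit{true} \mapsto \emptyset,\ \mathit{false} \mapsto \emptyset] \\
o := a[i] \quad (12) & [\mathit{true} \mapsto o,\ \mathit{false} \mapsto \emptyset] \\
a' := [a \text{ with } i = e] \quad (13) & [\mathit{true} \mapsto a',\ \mathit{false} \mapsto \emptyset] \\
p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] \quad (14) & [\lambda_1 \mapsto o_1,\ \ldots,\ \lambda_m \mapsto o_m]
\end{array}
\]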

As shown in Table 4.3, statement (10) has an exit label $\lambda_{C_i}$ corresponding to each constructor $C_i$ of the input variant. Statements (2), (8) and (11) are bi-labeled, using true and false as logical values. None of them has any associated outputs. Statements (12) and (13) are bi-labeled as well. However, unlike the previously mentioned statements, they use the label false as an "out of bounds" exception and generate an output only for the label true. All other statements, except (14), are uni-labeled: they associate all their output parameters (if any) to the label true. In contrast to Smart, in αSmil every exit label, including true, must be explicitly indicated. Furthermore, any output is explicitly associated to an exit label.

In Section 3.1.5 (on page 50) of our previous chapter, we introduced a Smart predicate called stop_thread. If the given index i designates an active associated thread, this predicate sets its state to Blocked and returns the new state of the process. Otherwise, the predicate exits with label invalid. Revisiting it, we can finally indicate its body in the αSmil language¹.

Table 4.4 – Predicate Body in αSmil

    Signature:
      predicate stop_thread(process p, int i)
        -> [true: process o | invalid]

    Declaration unit:
      array<option_thread> ta, option_thread th,
      thread ti, state s

    Predicate body:
      ta := p.threads                  [true -> 1]                0
      th := ta[i]                      [true -> 2, false -> 9]    1
      switch(th) as [ti | ]            [Some -> 3, None -> 9]     2
      s := Blocked                     [true -> 4]                3
      ti := {ti with crt_state = s}    [true -> 5]                4
      th := Some(ti)                   [true -> 6]                5
      ta := [ta with i = th]           [true -> 7, false -> 9]    6
      o := {p with threads = ta}       [true -> 8]                7
      [true]                                                      8
      [invalid]                                                   9

¹ The αSmil version is slightly simplified, as we are not checking if the transition to Blocked is valid.

Every statement in our stop_thread example is followed by a construct of the form exit_label -> numerical_label. This indicates the statement to be executed next, as identified by the numerical_label, if the current statement exits with label exit_label. For example, when the first statement ta := p.threads exits with label true, the predicate's execution continues with the statement following it, having the numerical label 1. We remark that the predicate's exit labels are included in the body of an explicit predicate, as can be seen at lines 8 and 9, respectively, in the case of true and invalid. Intuitively, the predicate's body resembles a control flow graph and can be illustrated as shown in Figure 4.1. The predicate's exit labels are the control flow graph's exit nodes, as will be discussed in Section 4.2.
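The way exit labels select the next numerical label can be sketched in Python. This is only an illustrative encoding, not the thesis's implementation: the function `run_stop_thread`, the dictionary representation of structures and partial arrays, and the tagged tuples used for variants are all assumptions made for this example.

```python
def run_stop_thread(p, i):
    """Walk the body of stop_thread node by node (hypothetical encoding).

    p is a dict with a 'threads' entry; the threads array is a dict from
    valid indices to ('Some', thread) or ('None', None).
    """
    env = {"p": p, "i": i}

    def get_threads(e):
        e["ta"] = e["p"]["threads"]
        return "true"

    def arr_get(e):
        if e["i"] not in e["ta"]:
            return "false"              # "out of bounds"
        e["th"] = e["ta"][e["i"]]
        return "true"

    def switch(e):
        if e["th"][0] == "Some":
            e["ti"] = e["th"][1]
            return "Some"
        return "None"

    def set_blocked(e):
        e["s"] = "Blocked"
        return "true"

    def update_state(e):
        e["ti"] = {**e["ti"], "crt_state": e["s"]}   # functional update
        return "true"

    def wrap_some(e):
        e["th"] = ("Some", e["ti"])
        return "true"

    def arr_set(e):
        if e["i"] not in e["ta"]:
            return "false"
        e["ta"] = {**e["ta"], e["i"]: e["th"]}       # new array, old one untouched
        return "true"

    def update_threads(e):
        e["o"] = {**e["p"], "threads": e["ta"]}
        return "true"

    # node -> (statement, {exit_label: numerical_label})
    body = {
        0: (get_threads, {"true": 1}),
        1: (arr_get, {"true": 2, "false": 9}),
        2: (switch, {"Some": 3, "None": 9}),
        3: (set_blocked, {"true": 4}),
        4: (update_state, {"true": 5}),
        5: (wrap_some, {"true": 6}),
        6: (arr_set, {"true": 7, "false": 9}),
        7: (update_threads, {"true": 8}),
    }
    exits = {8: "true", 9: "invalid"}    # the predicate's exit nodes

    node = 0
    while node not in exits:
        statement, successors = body[node]
        node = successors[statement(env)]
    label = exits[node]
    return label, (env["o"] if label == "true" else None)
```

Because every update builds a new value, the input process is left untouched, which mirrors the functional nature of statements (7) and (13).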

Figure 4.1 – Body of the stop_thread Predicate (control flow graph whose nodes 0 to 7 carry the statements of the predicate body and whose nodes 8 and 9 are the exit nodes true and inval; the edges labeled false and None lead to the inval node)

We are working with αSmil, which is a computational version of Smil where all specification-only predicates have been removed. Simulating hypotheses, lemmas and contracts is straightforward and can be achieved using predicates having only the true and false labels and no associated output. Inductives are the only exception to this rule: they are supported in αSmil as well, and their declaration is similar to the one in Smart. The αSmil equivalent of the not_disjoint inductive presented in our Abstract Process Manager example (on page 46) has the following form:

    predicate not_disjoint(process p) -> [true | false]
    inductive

      case StacksJoint(int i, int j)
        thread ti, thread tj, memory_region sti, memory_region stj

        i = j                              [true -> 1, false -> 7]
        thread(p, i)[true: ti | None]      [true -> 2, None -> 7]
        thread(p, j)[true: tj | None]      [true -> 3, None -> 7]
        sti := ti.stack                    [true -> 4]
        stj := tj.stack                    [true -> 5]
        overlap(sti, stj)[true | false]    [true -> 6, false -> 7]
        [true]
        [false]
        [error]

      case CodeStackJoint(int k)
        thread tk, memory_region stk, address_space asp, memory_region code

        thread(p, k)[true: tk | None]      [true -> 1, None -> 6]
        stk := tk.stack                    [true -> 2]
        asp := p.adr_space                 [true -> 3]
        code := asp.code                   [true -> 4]
        overlap(stk, code)[true | false]   [true -> 5, false -> 6]
        [true]
        [false]
        [error]

      case DataStackJoint(int l)
        thread tl, memory_region stl, address_space aspace, memory_region data

        thread(p, l)[true: tl | None]      [true -> 1, None -> 6]
        stl := tl.stack                    [true -> 2]
        aspace := p.adr_space              [true -> 3]
        data := aspace.data                [true -> 4]
        overlap(stl, data)[true | false]   [true -> 5, false -> 6]
        [true]
        [false]
        [error]

    predicate disjoint_stacks(process p) -> [true | false]

      not_disjoint(p)[true | false]        [true -> 1, false -> 2]
      [true]
      [false]
      [error]

This inductive predicate has been introduced and explained in Section 3.1.5 of the previous chapter (on page 52), and it characterizes the potential situations in which the memory isolation of the different associated threads of a process can be broken.

4.2 Control Flow Graph

Predicate bodies in αSmil resemble a control flow graph representation having statements as nodes. The nodes represent program states and the edges are defined by statements with a particular exit label λ.

The control flow graph $G_p = (N, E)$ of a predicate p has a node $n_i \in N$ for each program point. For each statement s at program point $n_i$ that can execute and reach program point $n_j$ with exit label $\lambda_k$, an edge $(n_i, n_j)$ is added to $G_p$ and labeled with s and $\lambda_k$. $G_p$ has a single entry node $n_{in} \in N$, corresponding to the program point associated to the first statement of p. The set of exit nodes $n_{out} \subset N$ consists of the nodes associated to each possible exit label $\lambda_k$ of the predicate. To these, one additional exit node, which is used as a sink node, is added. This corresponds to the error label.

In practice, all the outgoing edges of a node $n_i \in N$ bear the different cases of the same statement s found at program point $n_i$. Thus, the edges are labeled with the same statement s, and there is an edge labeled $(s, \lambda_k)$ for each possible exit label $\lambda_k$ of s.
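As a minimal sketch of this construction, the following Python fragment builds the node and edge sets for the three statements of the predicate thread. The function name `build_cfg` and the input encoding (a dictionary from program points to a statement and its successors) are assumptions made for illustration only.

```python
def build_cfg(body, predicate_exit_labels):
    """Build (nodes, edges) from a predicate body.

    body maps a program point k to (statement, {exit_label: target}), where
    target is either another program point (int) or the name of one of the
    predicate's exit labels (str).
    """
    # One node per program point, one exit node per exit label, plus the sink.
    nodes = [f"n{k}" for k in body] + list(predicate_exit_labels) + ["error"]
    edges = []
    for k, (statement, successors) in body.items():
        for label, target in successors.items():
            dst = f"n{target}" if isinstance(target, int) else target
            # Every outgoing edge of n_k carries the same statement,
            # paired with one of its exit labels.
            edges.append((f"n{k}", dst, statement, label))
    return nodes, edges


thread_body = {
    1: ("ts := p.threads", {"true": 2}),
    2: ("tio := ts[i]", {"true": 3, "false": "oob"}),
    3: ("switch(tio) as [ti| ]", {"Some": "true", "None": "None"}),
}
nodes, edges = build_cfg(thread_body, ["true", "None", "oob"])
```

The resulting graph has one edge per (statement, exit label) pair, so a bi-labeled statement such as the array access contributes two outgoing edges.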

The subfigures in Figure 4.2 show the control flow graph of the following predicate:

    predicate thread(process p, int i) -> [true: thread ti | None | oob]

which receives a process p and an index i as inputs and returns the i-th active thread of the input process. If the i-th thread is inactive, it exits with the exit label None. In the case of an "out of bounds" exception, the exit label oob is returned. For better readability, Figure 4.2-b gives the control flow of the same predicate, where we have labeled the nodes with statements of the predicate and the edges with their exit labels. Throughout the rest of our αSmil predicate examples, we will favour the latter representation.

Figure 4.2 – Example – Control Flow Graph of Predicate thread. a) $G_{thread}$: nodes n1, n2, n3 and exit nodes true, None and oob; the edges are labeled (ts := p.threads, true), (tio := ts[i], true), (tio := ts[i], false), (switch(tio) as [ti| ], Some) and (switch(tio) as [ti| ], None). b) $G_{thread}$, alternative representation: the nodes are labeled with the statements ts := p.threads, tio := ts[i] and switch(tio) as [ti| ], and the edges with the exit labels true, false, Some and None.

4.3 Well-Typed αSmil Statements

We formally define what it means for an αSmil statement to be well-typed and detail the full system of inference rules for the statements supported by αSmil in Table 4.6 and Table 4.7.

A well-typed αSmil statement is a statement that is compatible with the types specified in the signature $\sigma_p$ of the called built-in predicate p. This requires a typing environment Γ mapping variables to their types.

Definition 4.3.1. Typing Environment Γ

\[ \Gamma : \mathcal{V} \rightarrow \mathcal{T} \]

Furthermore, αSmil distinguishes between variables $v \in \mathcal{V}$ which can be written to and variables which are read-only. Therefore, the definition of well-typedness for statements requires two different sets of variable identifiers, one for each kind of variable. These are:

• $\mathcal{V}^+$, with $\mathcal{V}^+ \subseteq \mathcal{V}$, which denotes the set of identifiers of writable and readable variables, and

• $\mathcal{V} \setminus \mathcal{V}^+$, which denotes the set of read-only variables.

The mapping between predicate identifiers and their signatures is denoted by Σ.

Definition 4.3.2. Mapping between Predicate Identifiers and Signatures

\[ \Sigma : \mathcal{P} \rightarrow \mathcal{S} \]

Definition 4.3.3. Well-Typed Statement. A statement s exiting with label $\lambda \in \mathcal{L} \setminus \{error\}$ is well-typed in the typing environment Γ, given Σ,

\[ \Sigma, \Gamma, O \vdash s \rightarrow \lambda \]

if it is compatible with the types specified in its signature. Moreover, the outputs of a well-typed statement must be in the set of writable variables $O \subseteq \mathcal{V}^+$.

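The inference rule for a well-typed predicate call captures all these properties and is shown in rule [WTPCall], given in Table 4.6.

Table 4.6 – Well-Typed Predicate Call

\[
\frac{\begin{array}{c}
\Sigma(p) = (x_1 : \tau_1, \ldots, x_n : \tau_n)[\lambda_1 : (\tau_{11}\ y_{11}, \ldots, \tau_{1k_1}\ y_{1k_1}) \mid \ldots \mid \lambda_m : (\tau_{m1}\ y_{m1}, \ldots, \tau_{mk_m}\ y_{mk_m})] \\
\Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \qquad \forall i \in \{1, \ldots, m\},\ \Gamma(o_{i1}) = \tau_{i1} \ \ldots\ \Gamma(o_{ik_i}) = \tau_{ik_i} \\
o_{i1}, \ldots, o_{ik_i} \in O \qquad \forall i,\ \forall j, k_i,\ j \neq k_i \Rightarrow o_{ij} \neq o_{ik_i} \qquad \lambda \in \{\lambda_1, \ldots, \lambda_m\}
\end{array}}
{\Sigma, \Gamma, O \vdash p(e_1, \ldots, e_n)\ [\lambda_1 : \overline{o_1} \mid \ldots \mid \lambda_m : \overline{o_m}] \rightarrow \lambda}
\quad [\text{WTPCall}]
\]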

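The inference rules for the αSmil statements representing calls to built-in predicates are detailed in Table 4.7.

Table 4.7 – Well-Typed Statements

\[
\frac{\Gamma(e_1) = \Gamma(e_2) \qquad \lambda \in \{true, false\}}
{\Sigma, \Gamma, O \vdash e_1 = e_2 \rightarrow \lambda}
\quad [\text{WTEquals}]
\]

\[
\frac{\Gamma(o) = \Gamma(e) \qquad o \in O}
{\Sigma, \Gamma, O \vdash o := e \rightarrow true}
\quad [\text{WTAsgn}]
\]

\[
\frac{}{\Sigma, \Gamma, O \vdash nop \rightarrow true}
\quad [\text{WTNop}]
\]

\[
\frac{\Gamma(r) = struct\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \qquad \Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \qquad r \in O}
{\Sigma, \Gamma, O \vdash r := \{e_1, \ldots, e_n\} \rightarrow true}
\quad [\text{WTRecNew}]
\]

\[
\frac{\begin{array}{c}
\Gamma(r) = struct\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \\
\Gamma(o_1) = \tau_1 \ \ldots\ \Gamma(o_n) = \tau_n \qquad \forall i,\ o_i \in O \qquad \forall i \neq j,\ o_i \neq o_j
\end{array}}
{\Sigma, \Gamma, O \vdash \{o_1, \ldots, o_n\} := r \rightarrow true}
\quad [\text{WTRecAll}]
\]

\[
\frac{\Gamma(r) = struct\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \qquad \Gamma(o) = \tau_i \qquad o \in O}
{\Sigma, \Gamma, O \vdash o := r.f_i \rightarrow true}
\quad [\text{WTRecGet}]
\]

\[
\frac{\Gamma(r) = \Gamma(r') = struct\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \qquad \Gamma(e) = \tau_i \qquad r' \in O}
{\Sigma, \Gamma, O \vdash r' := \{r \text{ with } f_i = e\} \rightarrow true}
\quad [\text{WTRecSet}]
\]

\[
\frac{\Gamma(r') = \Gamma(r'') = struct\{g_1 : \tau_1, \ldots, g_n : \tau_n\} \qquad \lambda \in \{true, false\} \qquad \{f_1, \ldots, f_k\} \subseteq \{g_1, \ldots, g_n\}}
{\Sigma, \Gamma, O \vdash r' =_{\langle f_1, \ldots, f_k \rangle} r'' \rightarrow \lambda}
\quad [\text{WTRecEq}]
\]

\[
\frac{\Gamma(v) = variant[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \qquad \Gamma(e) = \tau_p \qquad v \in O}
{\Sigma, \Gamma, O \vdash v := C_p[e] \rightarrow true}
\quad [\text{WTVarCons}]
\]

\[
\frac{\Gamma(v) = variant[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \qquad \Gamma(o_p) = \tau_p \qquad o_p \in O}
{\Sigma, \Gamma, O \vdash switch(v) \text{ as } [o_1 \mid \ldots \mid o_n] \rightarrow \lambda_{C_p}}
\quad [\text{WTVarSwitch}]
\]

\[
\frac{\Gamma(v) = variant[D_1 : \tau_1 \mid \ldots \mid D_m : \tau_m] \qquad \{C_1, \ldots, C_k\} \subseteq \{D_1, \ldots, D_m\} \qquad \lambda \in \{true, false\}}
{\Sigma, \Gamma, O \vdash v \in \{C_1, \ldots, C_k\} \rightarrow \lambda}
\quad [\text{WTVarPos}]
\]

\[
\frac{\Gamma(a) = arr_{\tau_i}\langle\tau\rangle \qquad \lambda \in \{true, false\} \qquad \Gamma(i) = \tau_i \qquad \Gamma(o) = \tau \qquad o \in O}
{\Sigma, \Gamma, O \vdash o := a[i] \rightarrow \lambda}
\quad [\text{WTAGet}]
\]

\[
\frac{\Gamma(a') = \Gamma(a) = arr_{\tau_i}\langle\tau\rangle \qquad \lambda \in \{true, false\} \qquad \Gamma(i) = \tau_i \qquad \Gamma(e) = \tau \qquad a' \in O}
{\Sigma, \Gamma, O \vdash a' := [a \text{ with } i = e] \rightarrow \lambda}
\quad [\text{WTASet}]
\]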
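To make the shape of these rules concrete, here is a small Python sketch of three of them: [WTAsgn], [WTRecGet] and [WTAGet]. The encoding of types as tagged tuples and the helper names (`struct`, `arr`, `wt_asgn`, `wt_rec_get`, `wt_arr_get`) are assumptions made for this example; the typing environment Γ is a dictionary and the writable set O is a Python set.

```python
def struct(**fields):
    """Hypothetical encoding of struct{f1: t1, ..., fn: tn}."""
    return ("struct", fields)


def arr(index_type, elem_type):
    """Hypothetical encoding of an array type with index and element types."""
    return ("arr", index_type, elem_type)


def wt_asgn(gamma, writable, o, e, label):
    # [WTAsgn]: same type on both sides, writable output, only label true.
    return label == "true" and gamma[o] == gamma[e] and o in writable


def wt_rec_get(gamma, writable, o, r, field, label):
    # [WTRecGet]: r is a structure with the field, o has the field's type.
    t = gamma[r]
    return (label == "true"
            and isinstance(t, tuple) and t[0] == "struct"
            and field in t[1]
            and gamma[o] == t[1][field]
            and o in writable)


def wt_arr_get(gamma, writable, o, a, i, label):
    # [WTAGet]: both labels are allowed, since the access may be out of bounds.
    t = gamma[a]
    return (label in ("true", "false")
            and isinstance(t, tuple) and t[0] == "arr"
            and gamma[i] == t[1]
            and gamma[o] == t[2]
            and o in writable)


# Typing environment for the first two statements of the predicate thread.
threads_t = arr("int", "option<thread>")
process_t = struct(threads=threads_t, pid="int")
gamma = {"p": process_t, "i": "int", "ts": threads_t, "tio": "option<thread>"}
writable = {"ts", "tio"}
```

With this environment, `ts := p.threads` and `tio := ts[i]` both check, while writing to a variable outside the writable set or with a mismatched type is rejected.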

The well-typedness of statements plays an important role with respect to the statements' interpretation, as we will show in the next section. It is also essential for the well-typedness and well-formedness of dependency and correlation summaries that will be presented in the following chapters.

The control flow graph $G_p = (N, E)$ of a predicate p is well-typed if any edge labeled with $(s, \lambda) \in E$ is well-typed.

\[
\frac{\forall (s, \lambda) \in E,\ \Sigma, \Gamma, O \vdash s \rightarrow \lambda}
{\Sigma, \Gamma, O \vdash G_p = (N, E)}
\quad [\text{WTCfg}]
\]

Figure 4.3 – Well-Typed Control Flow Graph

4.4 Operational Semantics of αSmil Statements

This section presents the structural operational semantics (Nielson, Nielson, and Hankin, 1999; Plotkin, 2004) of the αSmil language. Sometimes also called the small step operational semantics, this allows reasoning about intermediate stages in a program's execution and emphasizes the individual steps of the computation.

Types. We take $\mathcal{T}_0$ to be the universe of primitive types $\tau_0 \in \mathcal{T}_0$. Structures, variants and associative arrays are defined inductively. Structures are finite labeled products of types. They are a generalization of the Cartesian product. Variants are finite labeled disjoint unions of several types τ. Two types are equal when they are pointwise equal.

Semantic Values. For each type τ, we define the set $\mathcal{D}_\tau$ of semantic values of that type. For each primitive type $\tau_0 \in \mathcal{T}_0$, we suppose a given $\mathcal{D}_{\tau_0}$. Other semantic values are defined inductively, as shown below.

Definition 4.4.1. Semantic Values $\mathcal{D}_\tau$

\[
\mathcal{D}_{struct\{f_1 : \tau_1, \ldots, f_n : \tau_n\}} = \{\ \{f_1 = v_1, \ldots, f_n = v_n\} \mid \forall i,\ v_i \in \mathcal{D}_{\tau_i}\ \}
\]

\[
\mathcal{D}_{variant[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n]} = \biguplus_{1 \le i \le n} \{\ C_i[v] \mid v \in \mathcal{D}_{\tau_i}\ \} \quad \text{where } \uplus \text{ is the disjoint union}
\]

\[
\mathcal{D}_{arr_{\tau_i}\langle\tau\rangle} = \{\ (P, (v_k)_{k \in P}) \mid P \subseteq \mathcal{D}_{\tau_i},\ \forall k \in P,\ v_k \in \mathcal{D}_\tau\ \}
\]

In αSmil, arrays are partial. In a semantic value belonging to $\mathcal{D}_{arr_{\tau_i}\langle\tau\rangle}$, P denotes the domain of valid indices for the array.

Two values of the same type are equal when they are pointwise equal.

Traditionally, in operational semantics one is interested in how the state is modified during the execution of a statement. αSmil has no concept of state per se; what is essential is the evaluation of variables in different environments or semantic contexts. To emphasize this idea, we define a valuation or environment $E \in \mathcal{E}$ as a mapping from variables to semantic values.

Definition 4.4.2. Valuation or environment E

\[ E : \mathcal{V} \rightarrow \mathcal{D} \]

Two valuations E and E' are equal if they map the same set of variables to semantic values that are pointwise equal.

\[ E = E' \iff \forall v \in \mathcal{V},\ E(v) = E'(v) \]

Given a typing environment Γ, a valuation E is well-typed if the value mapped to any variable $v \in Dom(E)$ is of the appropriate type Γ(v). We denote this by $\Gamma \vdash E$ and show it in [WTEnv].

\[
\frac{\forall v \in Dom(E),\ E(v) \in \mathcal{D}_{\Gamma(v)}}
{\Gamma \vdash E}
\quad [\text{WTEnv}]
\]

Definition 4.4.3. A configuration $\langle E, [s] \rangle$ of the semantics is a pair consisting of a valuation and a statement.

Definition 4.4.4. The transitions of the semantics are of the form

\[ \langle E, [s] \rangle \xrightarrow{\lambda} E' \]

They express how the configuration is changed by one step of computation, occurring when executing a statement s that exits with label λ. The exit label yielded by the statement's execution uniquely determines the statement that will be executed next. The change of the valuation is recorded in the resulting valuation E'. We write $E[x \rightarrow v]$ for the valuation that is identical to E, except that x is mapped to the value v. We say that E is extended with $x \rightarrow v$, and formally we define it as shown below.

Definition 4.4.5. Extend E with $x \rightarrow v$

\[
(E[x \rightarrow v])(y) =
\begin{cases}
v & \text{if } x = y \\
E(y) & \text{otherwise}
\end{cases}
\]

Extending a valuation E with multiple mappings $x \rightarrow v$ consists in applying the extension in a left-associative fashion. In the following, we will omit parentheses for such extensions, thus denoting

\[ (\ldots((E[x_1 \rightarrow v_1])[x_2 \rightarrow v_2])\ldots)[x_n \rightarrow v_n] \]

as

\[ E[x_1 \rightarrow v_1][x_2 \rightarrow v_2]\ldots[x_n \rightarrow v_n] \]
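The extension operation can be sketched in Python as follows, assuming a valuation is represented by a dictionary from variable names to values. The names `extend` and `extend_all` are hypothetical; the left-associative chaining is expressed with `functools.reduce`, which folds from the left.

```python
from functools import reduce


def extend(env, x, v):
    """E[x -> v]: identical to env except that x is mapped to v."""
    new_env = dict(env)      # copy, so the original valuation is not mutated
    new_env[x] = v
    return new_env


def extend_all(env, bindings):
    """E[x1 -> v1][x2 -> v2]...[xn -> vn], applied left-associatively."""
    return reduce(lambda acc, xv: extend(acc, xv[0], xv[1]), bindings, env)
```

Since the extensions are applied from left to right, a later binding for the same variable overrides an earlier one.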

An interpretation $I \in \mathcal{I}$ for a predicate is defined as a mapping from a predicate and an initial environment to an output environment and an exit label.

Definition 4.4.6. Predicate Interpretation $I \in \mathcal{I}$

\[ I : \mathcal{P} \times \mathcal{E} \rightarrow \mathcal{E} \times \mathcal{L} \]

The initial environment is a mapping between the predicate's formal arguments and their effective values. The output environment is a mapping between the predicate's formal output arguments and their effective values after executing the predicate.

The detailed definition of the semantics of generic statements is described below in Table 4.8. The first clause, [nop], constitutes an axiom, as it has no premises. It states that the nop statement executes in one step, yielding the exit label true, without extending the valuation E. The semantics of equality tests is given by two inference rules, [equalT] and [equalF], one for each of the statement's possible exit labels. A call to the built-in predicate = will exit with label true if and only if the valuations of its arguments e1 and e2 are equal (clause [equalT]). Otherwise, the statement will exit with label false (clause [equalF]). In both cases, the statement leaves the valuation E unchanged. The semantics of an assignment is given by the [asgn] clause: the statement always yields the exit label true and extends the valuation E with o mapped to the value E(e) of e.

Table 4.8 – The Structural Operational Semantics of αSmil Generic Statements

\[
[\text{nop}] \quad \frac{}{\langle E, [nop] \rangle \xrightarrow{true} E}
\]

\[
[\text{equalT}] \quad \frac{E(e_1) = E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{true} E}
\]

\[
[\text{equalF}] \quad \frac{E(e_1) \neq E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{false} E}
\]

\[
[\text{asgn}] \quad \frac{E' = E[o \rightarrow E(e)]}{\langle E, [o := e] \rangle \xrightarrow{true} E'}
\]
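The four clauses of Table 4.8 can be read as a one-step interpreter. The sketch below is an assumed Python encoding, not the thesis's formalization: statements are tagged tuples, arguments are restricted to variable names, and the function `step` returns the exit label together with the resulting valuation.

```python
def step(env, stmt):
    """One transition <E, [s]> --label--> E' for the generic statements."""
    kind = stmt[0]
    if kind == "nop":
        # [nop]: always true, valuation unchanged.
        return "true", env
    if kind == "eq":
        # [equalT] / [equalF]: compare the values of e1 and e2 in E.
        _, e1, e2 = stmt
        return ("true" if env[e1] == env[e2] else "false"), env
    if kind == "asgn":
        # [asgn]: E' = E[o -> E(e)], built as a fresh dictionary.
        _, o, e = stmt
        return "true", {**env, o: env[e]}
    raise ValueError(f"unsupported statement: {stmt!r}")
```

Only the assignment produces a new valuation; the other two statements return the valuation they were given.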

The semantics of structure-related statements is given in Table 4.9. The creation of a structure always yields the exit label true, as indicated by the [recNew] clause, and it extends the valuation E by mapping the resulting output variable r to the structural value obtained by mapping every field $f_i$ to the value $E(e_i)$ of the corresponding argument $e_i$. The destructuring of a structure r extends the valuation E by mapping every output $o_i$ to the corresponding value $v_i$ of the $f_i$ field of r. The statement always exits with true. The valuation E' obtained after executing an access to a given field $f_i$ of a structure r is an extension of E where the output o is mapped to the corresponding value of r's $f_i$ field in E. The semantics of a field update is given by the clause [recSet]. This statement extends the valuation E by mapping the output structure r' to a new value, where the updated field $f_i$ is mapped to the value of e in E and every other field is mapped to the same value it had in E. Finally, the last two clauses correspond to a partial structure equality test. As shown by [recEqualsT], the statement yields the exit label true if and only if the values of every field $g_i$ in the given set of fields are equal for r' and r'' in E. Otherwise, the statement yields the label false. In both cases, the valuation E remains unchanged.

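Table 4.9 – Operational Semantics of αSmil Structure-Related Statements

\[
[\text{recNew}] \quad \frac{E' = E[r \rightarrow \{f_1 = E(e_1), \ldots, f_i = E(e_i), \ldots, f_n = E(e_n)\}]}
{\langle E, [r := \{e_1, \ldots, e_n\}] \rangle \xrightarrow{true} E'}
\]

\[
[\text{recAll}] \quad \frac{\begin{array}{c}
E(r) = \{f_1 = v_1, \ldots, f_n = v_n\} \\
E' = E[o_1 \rightarrow v_1][o_2 \rightarrow v_2]\ldots[o_n \rightarrow v_n] \qquad \forall i, j,\ i \neq j \Rightarrow o_i \neq o_j
\end{array}}
{\langle E, [\{o_1, \ldots, o_n\} := r] \rangle \xrightarrow{true} E'}
\]

\[
[\text{recGet}] \quad \frac{E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \qquad E' = E[o \rightarrow v_i]}
{\langle E, [o := r.f_i] \rangle \xrightarrow{true} E'}
\]

\[
[\text{recSet}] \quad \frac{\begin{array}{c}
E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \\
E' = E[r' \rightarrow \{f_1 = v_1, \ldots, f_i = E(e), \ldots, f_n = v_n\}]
\end{array}}
{\langle E, [r' := \{r \text{ with } f_i = e\}] \rangle \xrightarrow{true} E'}
\]

\[
[\text{recEqualsT}] \quad \frac{\begin{array}{c}
E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \qquad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\
\{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \qquad v_{g_i} = w_{g_i}\ \ \forall i \in \{1, \ldots, k\}
\end{array}}
{\langle E, [r' =_{\langle g_1, \ldots, g_k \rangle} r''] \rangle \xrightarrow{true} E}
\]

\[
[\text{recEqualsF}] \quad \frac{\begin{array}{c}
E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \qquad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\
\{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \qquad \exists i \in \{1, \ldots, k\},\ v_{g_i} \neq w_{g_i}
\end{array}}
{\langle E, [r' =_{\langle g_1, \ldots, g_k \rangle} r''] \rangle \xrightarrow{false} E}
\]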
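A possible reading of the field access, field update and partial equality clauses in Python is given below. Structures are modeled as dictionaries from field names to values; the function names `rec_get`, `rec_set` and `rec_eq` are assumptions chosen for this sketch.

```python
def rec_get(env, o, r, field):
    # [recGet]: E' = E[o -> v_i], where v_i is the value of field f_i of r.
    return "true", {**env, o: env[r][field]}


def rec_set(env, r_out, r, field, e):
    # [recSet]: r_out gets a copy of r in which only `field` is replaced
    # by the value of e in E; r itself is left unchanged.
    return "true", {**env, r_out: {**env[r], field: env[e]}}


def rec_eq(env, r1, r2, fields):
    # [recEqualsT] / [recEqualsF]: compare only the listed fields.
    same = all(env[r1][g] == env[r2][g] for g in fields)
    return ("true" if same else "false"), env
```

For instance, after updating the crt_state field of a thread, the old and the new structures are still equal on the field id, but not on the pair of fields id and crt_state.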

Table 4.10 details the semantics of variant-related statements. As indicated by the [varCons] clause, the construction of a variant v with a constructor $C_p$ always yields the exit label true. The obtained valuation E' is an extension of E where the value of v is obtained by applying the constructor $C_p$ to the argument's value E(e). A variant switch exits with the label $\lambda_{C_i}$ if the value of v in E has been constructed with the $C_i$ constructor. The valuation E' obtained after executing the statement is an extension of E whereby the corresponding output $o_i$ is mapped to the value of the $C_i$ constructor's argument E(e). The last two clauses, [varPossibleT] and [varPossibleF], indicate the semantics of a variant possible check and correspond to the statement's possible exit labels. The statement will yield the label true only if the value of v in E has been obtained with a constructor D that is a member of the given set of constructors $\{C_1, \ldots, C_k\}$. Otherwise, the false label will be returned. In both cases, the valuation remains unchanged.

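Table 4.10 – Operational Semantics of αSmil Variant-Related Statements

\[
[\text{varCons}] \quad \frac{E' = E[v \rightarrow C_p[E(e)]]}
{\langle E, [v := C_p[e]] \rangle \xrightarrow{true} E'}
\]

\[
[\text{varSwitch}] \quad \frac{E(v) = C_i[e] \qquad E' = E[o_i \rightarrow E(e)]}
{\langle E, [switch(v) \text{ as } [o_1 \mid \ldots \mid o_n]] \rangle \xrightarrow{\lambda_{C_i}} E'}
\]

\[
[\text{varPossibleT}] \quad \frac{E(v) = D[e] \qquad D \in \{C_1, \ldots, C_k\}}
{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{true} E}
\]

\[
[\text{varPossibleF}] \quad \frac{E(v) = D[e] \qquad D \notin \{C_1, \ldots, C_k\}}
{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{false} E}
\]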
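The variant clauses can be sketched in Python by representing a variant value as a pair (constructor name, argument). This encoding, the function names, and the use of `None` both for the missing argument of an argument-less constructor and for an empty output slot in the switch are assumptions made for this illustration.

```python
def var_cons(env, v, ctor, e):
    # [varCons]: E' = E[v -> C_p[E(e)]].
    return "true", {**env, v: (ctor, env[e])}


def var_switch(env, v, ctors, outputs):
    # [varSwitch]: exit with the label of the constructor that built v and
    # bind the matching output, if any, to the constructor's argument.
    ctor, arg = env[v]
    out = outputs[ctors.index(ctor)]
    new_env = env if out is None else {**env, out: arg}
    return ctor, new_env


def var_possible(env, v, ctors):
    # [varPossibleT] / [varPossibleF]: membership test on the constructor.
    return ("true" if env[v][0] in ctors else "false"), env
```

Here the exit label of the switch is simply the constructor's name, standing in for the label associated with that constructor.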

Table 4.11 describes the semantics of array-related statements. Each array-related statement has two corresponding clauses, one for each of the Boolean exit labels. Accessing an array's element yields the exit label true if the given index i is a valid index. The resulting valuation E' is extended by mapping the output o to the value in E of the array's i-th element. Otherwise, when the given index i is invalid, as indicated by the [arrGetF] clause, the statement yields the label false and leaves the valuation unmodified. The semantics of an array update is given by the [arrSetT] and [arrSetF] clauses. If the given index i is valid, the exit label true is yielded and the resulting valuation is obtained by extending E with a', whose i-th element's value is the value of e in the initial valuation E. The values of all other elements of a' are the ones found in E for the elements of a. On the contrary, if the given index i is invalid, the valuation remains unchanged and the label false is yielded.

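Table 4.11 – Operational Semantics of αSmil Array-Related Statements

\[
[\text{arrGetT}] \quad \frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \in P \qquad E' = E[o \rightarrow v_{E(i)}]}
{\langle E, [o := a[i]] \rangle \xrightarrow{true} E'}
\]

\[
[\text{arrGetF}] \quad \frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \notin P}
{\langle E, [o := a[i]] \rangle \xrightarrow{false} E}
\]

\[
[\text{arrSetT}] \quad \frac{\begin{array}{c}
E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \in P \\
E' = E[a' \rightarrow (P, (w_k)_{k \in P})] \qquad
w_k = \begin{cases} E(e) & \text{if } k = E(i) \\ v_k & \text{otherwise} \end{cases}
\end{array}}
{\langle E, [a' := [a \text{ with } i = e]] \rangle \xrightarrow{true} E'}
\]

\[
[\text{arrSetF}] \quad \frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \notin P}
{\langle E, [a' := [a \text{ with } i = e]] \rangle \xrightarrow{false} E}
\]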
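The partial arrays of αSmil map naturally onto Python dictionaries, whose key set plays the role of the domain P of valid indices. The following sketch of the four array clauses rests on that assumed representation; `arr_get` and `arr_set` are hypothetical names.

```python
def arr_get(env, o, a, i):
    cells, idx = env[a], env[i]
    if idx not in cells:
        # [arrGetF]: invalid index, valuation unchanged.
        return "false", env
    # [arrGetT]: E' = E[o -> v_{E(i)}].
    return "true", {**env, o: cells[idx]}


def arr_set(env, a_out, a, i, e):
    cells, idx = env[a], env[i]
    if idx not in cells:
        # [arrSetF]: an update cannot create a new index.
        return "false", env
    # [arrSetT]: same domain P, only the cell at E(i) is replaced.
    return "true", {**env, a_out: {**cells, idx: env[e]}}
```

Note that the update keeps the domain of valid indices intact: an index outside P leads to the label false rather than to a larger array.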

The semantics of a generic predicate call $p(e_1, \ldots, e_n)\ [\lambda_1 : \overline{o_1} \mid \ldots \mid \lambda_m : \overline{o_m}]$ is captured by the [pCall] inference rule shown in Table 4.12. Interpreting the predicate p in the context of its arguments' values in the valuation E yields a label $\lambda_i$ and a mapping between its formal output arguments and their resulting values $v_{ij}$. The resulting valuation E' is obtained by extending E with the output variables $o_{ij}$ mapped to the corresponding $v_{ij}$.

The interpretation of a statement is well-typed with respect to a signature if and only if every tuple in the interpretation is well-typed, i.e. if it has the expected number of inputs with the adequate types and an adequate label with well-typed outputs as well. Furthermore, it has to be total, i.e. for every well-typed tuple of inputs there exists a label and some outputs that match in the interpretation.

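Table 4.12 – Semantics of a Predicate Call

\[
[\text{pCall}] \quad \frac{\begin{array}{c}
\Sigma(p) = p(x_1 : \tau_1, \ldots, x_n : \tau_n)[\lambda_1 : (\overline{\tau_1\ y_1}) \mid \ldots \mid \lambda_i : (\tau_{i1}\ y_{i1}, \ldots, \tau_{ik_i}\ y_{ik_i}) \mid \ldots \mid \lambda_m : (\overline{\tau_m\ y_m})] \\
I(p, inputs) = (outputs, \lambda_i) \qquad inputs(x_l) = E(e_l)\ \ \forall l \in \{1, \ldots, n\} \\
outputs(y_{i1}) = v_{i1} \ \ldots\ outputs(y_{ik_i}) = v_{ik_i} \\
E' = E[o_{i1} \rightarrow v_{i1}]\ldots[o_{ik_i} \rightarrow v_{ik_i}]
\end{array}}
{\langle E, [p(e_1, \ldots, e_n)\ [\lambda_1 : \overline{o_1} \mid \ldots \mid \lambda_m : \overline{o_m}]] \rangle \xrightarrow{\lambda_i} E'}
\]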
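The [pCall] rule can be illustrated with a short Python sketch. Everything here is an assumed encoding: an interpretation is a Python function from an input environment to a pair (outputs, label), and `call` binds the formal outputs of the yielded label to the caller's output variables. The interpretation `thread_interp` is a hand-written stand-in for the predicate thread.

```python
def call(env, interp, formals, actuals, outputs_by_label):
    """<E, [p(e1..en)[...]]> --label--> E' for a predicate call.

    outputs_by_label maps each exit label to a list of
    (formal_output_name, caller_variable) pairs.
    """
    # inputs(x_l) = E(e_l) for every formal input x_l.
    inputs = {x: env[e] for x, e in zip(formals, actuals)}
    outputs, label = interp(inputs)
    new_env = dict(env)
    for formal_out, caller_var in outputs_by_label[label]:
        new_env[caller_var] = outputs[formal_out]
    return label, new_env


def thread_interp(inputs):
    """Hypothetical interpretation of thread(p, i)."""
    threads, i = inputs["p"]["threads"], inputs["i"]
    if i not in threads:
        return {}, "oob"
    tag, arg = threads[i]
    if tag == "Some":
        return {"ti": arg}, "true"
    return {}, "None"
```

A call such as thread(q, j)[true: tj | None | oob] then extends the caller's valuation with tj only when the label true is yielded.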

Definition 4.4.7. Subject Reduction Property

The interpretation of a well-typed statement, given well-typed interpretations for the external predicate calls, preserves the fact that the valuation is well-typed.

\[
\forall\, \Gamma, E, s, \lambda, \Sigma,\quad (\Gamma \vdash E) \wedge (\Sigma, \Gamma, O \vdash s \rightarrow \lambda) \wedge (\langle E, [s] \rangle \xrightarrow{\lambda} E') \implies \Gamma \vdash E'
\]

Definition 4.4.8. The Progress Property

A well-typed statement in a well-typed environment can always be interpreted to some label and outputs.

\[
\forall\, E, \Gamma, \Sigma, s,\quad (\Gamma \vdash E) \wedge (\Sigma, \Gamma, O \vdash s \rightarrow \lambda) \implies \exists\, \lambda', E',\ \langle E, [s] \rangle \xrightarrow{\lambda'} E'
\]

The well-typedness of an interpretation, as well as the subject reduction and progress properties, have been formally proven in Coq by Stéphane Lescuyer.

Chapter 5

Dependency Analysis for Functional Specifications

    "...like islands in the sea, separate on the surface but connected in the deep."
    William James

Algebraic data types (structures and variants) and associative arrays are fundamental building blocks for representing, grouping and handling complex data efficiently. However, as argued in Chapter 1, operations manipulating them are rarely concerned with the entire compound input data structure. Frequently, they depend only on a limited subset of their input. Complete specifications or contracts (Meyer, 1997) of such operations will not only stipulate that the output possesses a certain property (Borgida, Mylopoulos, and Reiter, 1993; Polikarpova et al., 2013), but will also include their frame conditions (Borgida, Mylopoulos, and Reiter, 1995), i.e. the parts of the input on which they operate. Such conditions facilitate reasoning locally without overlooking the global picture: if a property P is known to hold at a certain point in the program where a predicate p is called, P still holds after the call to p, provided that the (sub)structures on which P depends are disjoint from the (sub)structures that might be modified according to p's frame condition (Banerjee and Naumann, 2014). Though intuitively easy, specifying and proving the preservation of logical properties for the unmodified part is a particular manifestation of the frame problem (McCarthy and Hayes, 1969; Leavens, Leino, and Müller, 2007), a notoriously cumbersome task in formal software verification, imposing unnecessary manual effort (Meyer, 2015).

One of the challenges of addressing this problem, and thereby simplifying the verification of certain preserved properties, is to determine the input fragments on which these properties depend, i.e. their footprint (Distefano, O'Hearn, and Yang, 2006) or, to a first approximation, their read effects (Feijs and Jonkers, 1992; Greenhouse and Boyland, 1999; Clarke and Drossopoulou, 2002). While specifications sometimes include the write effects (Clarke and Drossopoulou, 2002) of an operation through modifies clauses (Guttag et al., 1993b), read effects are usually not specified explicitly, even though this information can be useful for reasoning about an operation's results. The purpose of the dependency analysis presented in this chapter is to take a step forward in this direction and to detect such information automatically. More precisely, our analysis is a static dependency analysis for the αSmil language (presented in Chapter 4) that computes a conservative approximation of the input fragments on which the operations depend.

Dependence and liveness analyses are traditionally used in the compilation realm for code optimization (Kennedy, 1978), dead code elimination (Knoop, Rüthing, and Steffen, 1994; Wand and Siveroni, 1999; Liu and Stoller, 2003), program slicing (Weiser, 1984; Tip, 1995; Reps and Turnidge, 1996; Castillo et al., 2008) or compile-time garbage collection (Jones and Métayer, 1989; Park and Goldberg, 1992; Wand and Clinger, 1998). In contrast to the vast majority of static analyses that are meant to be used strictly on code and in an essentially purely automatic setting, our analysis is thought of as a companion tool to be exploited in the middle of interactive program verification, and it is designed to be used on programs as well as on specifications.

5.1 Dependency Analysis in a Nutshell

In a nutshell, our dependency analysis targets the delimitation of the input subset on which the output depends, in the context of an operation with a compound input. We define dependency as the observed part of a structured domain and strive to obtain type-sensitive results, distinguishing between the subelements of arrays and algebraic data types and capturing the dependency specific to each. The targeted results are meant to mirror, in terms of dependency, the layered structure of compound data types. Furthermore, the dependency analysis must work with conservative approximations, and it must guarantee that what is marked as not needed is definitely not needed, i.e. it is irrelevant for the obtained output.

In the classification of Hind (Hind, 2001), our dependency analysis is a flow-sensitive, field-sensitive, interprocedural analysis that handles associative arrays, structures and variant data types. Specific dependency results are computed for each of the possible execution scenarios, i.e. for each exit label. Thus, our analysis also shows a form of path-sensitivity (Hind, 2001). However, we favour the term label-sensitivity to describe this characteristic, as it seems more appropriate applied to our case and the language we are working with.

Our dependency analysis targets complex transition systems in general, and operating systems and microkernels in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e. state changes that map an input state to an output state. Automatically proving the preservation of invariants concerning only subelements of the state (i.e. fields, array cells, etc.) that have not been altered by a transition in the system would considerably diminish the number of proof obligations. The first step towards achieving this goal consists in automatically detecting dependency summaries and the minimum relevant input information for producing certain outputs.

As mentioned, our analysis targets fine-grained dependency summaries for arrays, structures and variants, expressed at the level of their subelements. For variants, besides capturing the specific dependency on each constructor and its arguments, we argue that additional relevant information can be computed regarding the subset of possible constructors at a given program point. This is not dependency information per se, but it enriches the footprint of a predicate with useful information. Together with the dependency information, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different, albeit related, point of view. Therefore, we are simultaneously performing a possible-constructors analysis. This has an impact on the defined abstract dependency type, making it more complex, as we will see in the following section. The possible-constructors analysis could be performed separately, as a stand-alone analysis. By performing the two analyses simultaneously, we lose some of the precision that would be attained if the two were performed separately, but we reduce overhead and present relevant information in a unified manner.

Designing the analysis as a tool to be used in the context of interactive program verification, on both code and specifications, has led to specific traits. One of them concerns the treatment of arrays. In contrast to dependence and liveness analyses used for code optimizations (Gross and Steenkiste, 1990), which require precision for every array cell, we compute dependency information referring to all cells of the array or to all but one cell, for which an exceptional dependency is computed. In practice, a considerable number of relevant properties and operations involving arrays fall into this spectrum.

In the following subsection, in order to better illustrate the problem that our analysis addresses, we briefly present two examples of αSmil predicates manipulating structures, variants and arrays, and describe the dependency information that we are targeting.

5.1.1 Targeted Dependency Information

To present the envisioned dependency results, we consider two αSmil predicates, thread and start_address, whose control flow graphs and implementations are shown below. Both predicates manipulate inputs of type process, introduced in Section 3.1.5 (on page 49) and shown in Figure 5.2. Internally, they handle values of type thread and memory_region, respectively, described in Section 3.1.5 (on page 48) as well and shown below in Figure 5.1.

    type memory_region = {
      // Start address
      start : int;
      // Region length
      length : int;
    }

    type thread = {
      // Identifier
      id : int;
      // Current state
      crt_state : state;
      // Stack
      stack : memory_region;
    }

Figure 5.1 – Example Data Types – Thread and Memory Region

    type option<A> =
      | None
      | Some (A a)

    type process = {
      // Array of associated threads
      threads : array<option<thread>>;
      // Internal id
      pid : int;
      // Currently running thread
      crt_thread : int;
      // Address space
      adr_space : address_space;
    }

Figure 5.2 – Input Type – Process

The first predicate, thread, having the control flow graph shown in Figure 5.4 and whose implementation is shown in Figure 5.3, receives a process p and an index i as inputs. It reads the i-th element in the threads array of the input process p. If this element is active, then the predicate exits with the label true and outputs the corresponding thread ti. Otherwise, it exits with the label None and no output is generated.

    predicate thread(process p, int i) -> [true: thread ti | None | oob]

      array<option<thread>> th, option<thread> tio

      th := p.threads              [true -> 1]
      tio := th[i]                 [true -> 2, false -> 5]
      switch(tio) as [ | ti]       [None -> 4, Some -> 3]
      [true]
      [None]
      [oob]

Figure 5.3 – Predicate thread – Implementation

Our dependency analysis should be able to distinguish between the different exitlabels of the predicate For the label true for instance it should detect that onlythe field threads is read by the predicate while all others are irrelevant to the resultFurthermore it should detect that for the threads array of the input p only the i-thelement is inspected Additionally since we are considering the label true the i-thelement is necessarily an active thread indicated by the constructor Some The otherconstructor None is impossible for this execution scenario On the contrary for theexit label None the constructor Some is impossible For the exit label oob nothing butthe index i and the ldquosupportrdquo or ldquolengthrdquo of the associated threads array is read Thetargeted dependency results for the predicate thread are depicted in Figure 55

The second predicate, start_address, whose control flow graph is shown in Figure 5.6, receives a process p and an index j as inputs and finds the start address of the stack corresponding to an active thread. It calls the predicate thread, thus reading the j-th element of the threads array of its input process. If this is an active element, it further accesses the field stack, from which it only reads the start address start. Otherwise, if the element is inactive, the predicate forwards the exit label None of the called predicate thread and generates no output. When given an invalid index j, the predicate exits with label oob. The predicate's implementation is shown in Figure 5.7.

[Control flow graph: th = p.threads (edge true), then tio = th[i] (edge true, or edge false leading to the exit oob), then switch(tio) as [ | ti] (edge Some leading to the exit true, edge None leading to the exit None).]

Figure 5.4 – G_thread – Control Flow Graph of Predicate thread

[Diagram: for each of the exit labels true and None, the input process p with its fields threads, pid, crt_thread and adr_space, the i-th cell of threads, and the constructors Some(thread t) and None of option<thread>. Legend: Read/Needed; Irrelevant/Not Needed.]

Figure 5.5 – Targeted Dependency Results for Predicate thread

The dependency information for this predicate should capture the fact that, on the true execution scenario, only the field start of the stack of the input's j-th associated thread is read. Furthermore, the only possible constructor on this execution path is the Some constructor. Conversely, for the None execution scenario, the only possible constructor is the None constructor. The targeted dependency results for the start_address predicate are depicted in Figure 5.8. We remark that, for the oob execution scenario, only the "support" or "length" of the threads array is read.


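[Control flow graph: thread(p, j) [true: tj | None | oob] (edge true leading to sj = tj.stack, edge None leading to the exit None, edge oob leading to the exit error), then sj = tj.stack (edge true), then adr = sj.start (edge true leading to the exit true).]

Figure 5.6 – G_start_address – Control Flow Graph of Predicate start_address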

    predicate start_address (process p, int j) -> [true: int adr | None]
    {
        thread tj;
        memory_region sj;

        thread (p, j) [true: tj | None | oob]    [true -> 1, None -> 4, oob -> 5]
        sj = tj.stack                            [true -> 2]
        adr = sj.start                           [true -> 3]
        [true]
        [None]
        [error]
    }

Figure 5.7 – Predicate start_address – Implementation

[Diagram: for each of the exit labels true and None, the input process p with its fields threads, pid, crt_thread and adr_space. For the label true, the diagram also shows the thread tj with its fields id, crt_state and stack, and the stack with start and length. The constructors Some(thread t) and None of option<thread> are shown as well. Legend: Read/Needed; Irrelevant/Not Needed.]

Figure 5.8 – Targeted Dependency Results for Predicate start_address


5.1.2 Outline

The rest of this chapter focuses on the technical details of the dependency analysis. In Section 5.2 we present the abstract dependency domain. This is the fundamental building block on which our analysis relies in order to determine expressive dependency summaries. It is followed in Section 5.3 by an in-depth description of our analysis at an intraprocedural level, underlining the data-flow equations in Section 5.3.2 and explaining them by illustrating the step-by-step mechanism on an example in Section 5.3.3. A summary of the dependency analysis at an interprocedural level is given in Section 5.4. We illustrate the approach and underline its shortcomings on an example in Section 5.4.1, and discuss their origin in Section 5.4.2. Two different semantic interpretations of our dependency information are discussed in Section 5.5. In Section 5.6 we review and discuss approaches targeting information that is similar to our dependency summaries. Finally, in Section 5.7 we conclude and present some other potential applications of our dependency analysis, which are not confined to the field of interactive program verification.

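5.2 Abstract Dependency Domain

The first step towards inferring expressive, type-sensitive results that capture the dependency specific to each subelement of an algebraic data type or an associative array is the definition of an abstract dependency domain $\mathcal{D}$ that mimics the structure of such data types. The dependency domain $\delta \in \mathcal{D}$ shown below is defined inductively from the three atomic cases ($\top$, $\square$ and $\bot$) and mirrors the structure of the concrete types.

Definition 5.2.1 (Dependency Domain $\delta \in \mathcal{D}$).

\[
\begin{array}{rcll}
\delta & ::= & \top & \text{Everything – atomic case (i)} \\
 & \mid & \square & \text{Nothing – atomic case (ii)} \\
 & \mid & \bot & \text{Impossible – atomic case (iii)} \\
 & \mid & \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} & f_1, \dots, f_n \text{ fields (iv)} \\
 & \mid & [ C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m ] & C_1, \dots, C_m \text{ constructors (v)} \\
 & \mid & \langle \delta \rangle & \text{(vi)} \\
 & \mid & \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & i \text{ array index (vii)}
\end{array}
\]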

As reflected by the above definition, the dependency for atomic types is expressed in terms of the domain's atomic cases: $\top$ (least precise), denoting that everything is needed, and $\square$, denoting that nothing is needed. The third atomic case, $\bot$, denoting impossible, is introduced for the possible-constructors analysis, which is performed simultaneously, and is further explained below.

The dependency of a structure (iv) describes the dependency on each of its fields. For instance, revisiting our thread example from Section 5.1.1, we could express an over-approximation of the dependency information depicted for the process p in Figure 5.5 using the following dependency:

\[
\{ \mathit{threads} \mapsto \top;\ \mathit{pid} \mapsto \square;\ \mathit{crt\_thread} \mapsto \square;\ \mathit{adr\_space} \mapsto \square \}
\]

This captures the fact that all fields except the threads field are irrelevant, i.e. they are not read and nothing in their contents is needed. The dependency for the threads field is an over-approximation and expresses the fact that it is entirely necessary, i.e. everything in its value is needed for the result.

For arrays, we distinguish between two cases: arrays with a general dependency applying to all of the cells, given by (vi), and arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known, given by (vii). For instance, for the threads field of the previous example, the following dependency

\[
\langle \square \triangleright i : \top \rangle
\]

would be a less coarse approximation, capturing the fact that only the i-th element of the associated threads array is needed, while all others are irrelevant.

For variants (v), the dependency is expressed in terms of the dependencies of their constructors, expressed in turn in terms of their arguments' dependencies. Thus, a constructor having a dependency mapped to $\square$ is one for which nothing but the tag has been read, i.e. its arguments, if any, are irrelevant for the execution. For instance, for the i-th element of the threads array of our previous example, the following dependency

\[
[ \mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \square ]
\]

would be a more precise approximation when considering the exit label true. It is still an over-approximation, as it expresses that both constructors are possible. The argument of the Some constructor is entirely read, while for None only the tag is read.

For variants, we want to take a step further and also include the information that certain constructors cannot occur on certain execution paths. Impossible, the third atomic case $\bot$, is introduced for this purpose. As mentioned previously in Section 5.1, in order to obtain this additional information we simultaneously perform a "possible-constructors" analysis, which computes, for each execution scenario, the subset of possible constructors for a given value at a given program point. All constructors that cannot occur on a given execution path are marked as $\bot$. In contrast, constructors for which only the tag is read are marked as $\square$. The difference between $\bot$ and $\square$ can be illustrated by considering a polymorphic option type option<A> having two constructors, None and Some(A val), and a Boolean predicate that pattern matches on an input of this type and returns false in the case of None and true in the case of Some, regardless of the value val of its argument. For the true execution scenario, the dependency on the Some constructor would be $\square$: the tag is read and it is decisive for the outcome, but the value of its argument val is completely irrelevant. The dependency on the None constructor, however, would be $\bot$: the predicate can exit with label true if and only if the input matches against the Some constructor. By distinguishing between these two cases, we can not only identify the input's subelements that have a direct impact on an operation's output, but also obtain a more detailed footprint that highlights the influence exerted by the input's "shape" on the operation's outcome.

For instance, for the i-th element of the threads array of our previous example, a dependency mapping the constructor None to $\bot$ would be a more precise approximation when considering the label true. Taking into account all the discussed values, we can express the dependency depicted in Figure 5.5 for the label true as follows:

\[
\{ \mathit{threads} \mapsto \langle \square \triangleright i : [ \mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot ] \rangle;\ \mathit{pid} \mapsto \square;\ \mathit{crt\_thread} \mapsto \square;\ \mathit{adr\_space} \mapsto \square \}
\]

We remark that $\top$, $\square$ and $\bot$ can apply to any type. For instance, $\top$ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to $\top$, are transformed to $\top$. The $\bot$ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to $\bot$. A whole structure or array is impossible if any of its subelements is impossible.

The $\bot$ atomic value is the lower bound of our domain and hence the most precise value. The final abstract dependency is a closure of all these cases, combined recursively. To give an intuition of the shape of our dependency lattice, we illustrate in Figure 5.9 the Hasse diagram of the order relation between pairs of atomic dependency values. Intuitively, if the two analyses were performed separately, the upper "diamond" shape would correspond to the dependency analysis and the lower one to the possible-constructors analysis. The element $\square$ would be the lower bound for the dependency domain and the upper bound for the possible-constructors domain. By performing them simultaneously, $\bot$ becomes the domain's lower bound.

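[Hasse diagram over the pairs of atomic dependencies $(\top,\top)$, $(\top,\square)$, $(\square,\top)$, $(\square,\square)$, $(\square,\bot)$, $(\bot,\square)$, $(\bot,\bot)$, $(\top,\bot)$ and $(\bot,\top)$.]

Figure 5.9 – Order Relation on Pairs of Atomic Dependencies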

The partial order relation is denoted by $\sqsubseteq$ and defined as shown below.

Definition 5.2.2 (Partial Order $\sqsubseteq$).

\[
\sqsubseteq\ \subseteq\ \mathcal{D} \times \mathcal{D}
\]


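Table 5.1 – $\sqsubseteq$ – Comparison of Two Domains

\[
\frac{}{\delta \sqsubseteq \top}\ \textsc{Top}
\qquad
\frac{}{\bot \sqsubseteq \delta}\ \textsc{Bot}
\]

\[
\frac{\delta_1 \sqsubseteq \delta'_1 \quad \dots \quad \delta_n \sqsubseteq \delta'_n}
     {\{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} \sqsubseteq \{ f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n \}}\ \textsc{Str}
\qquad
\frac{\square \sqsubseteq \delta_1 \quad \dots \quad \square \sqsubseteq \delta_n}
     {\square \sqsubseteq \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \}}\ \square\textsc{Str}
\]

\[
\frac{\delta_1 \sqsubseteq \delta'_1 \quad \dots \quad \delta_n \sqsubseteq \delta'_n}
     {[ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ] \sqsubseteq [ C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n ]}\ \textsc{Var}
\qquad
\frac{\square \sqsubseteq \delta_1 \quad \dots \quad \square \sqsubseteq \delta_n}
     {\square \sqsubseteq [ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ]}\ \square\textsc{Var}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def}}
     {\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \rangle}\ \textsc{ADef}
\qquad
\frac{\square \sqsubseteq \delta_{def}}
     {\square \sqsubseteq \langle \delta_{def} \rangle}\ \square\textsc{ADef}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{def}}
     {\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \rangle}\ \textsc{AIA}
\qquad
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{def} \sqsubseteq \delta'_{exc}}
     {\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle}\ \textsc{AAI}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc}}
     {\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle}\ \textsc{AI}
\qquad
\frac{\square \sqsubseteq \delta_{def} \quad \square \sqsubseteq \delta_{exc}}
     {\square \sqsubseteq \langle \delta_{def} \triangleright i : \delta_{exc} \rangle}\ \square\textsc{AI}
\]

\[
\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc} \quad \delta_{def} \sqsubseteq \delta'_{exc} \quad \delta_{exc} \sqsubseteq \delta'_{def} \quad i \neq j}
     {\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle}\ \textsc{AIJ}
\]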

It is used to compare dependencies and it is detailed in Table 5.1. We write $\delta_1 \sqsubseteq \delta_2$ and read it as "a dependency $\delta_1$ is more precise than another dependency $\delta_2$" if it represents a smaller subset of a structural object and if it allows at most as many constructors as $\delta_2$. The greatest element is $\top$ (Top) and $\bot$ is the least (Bot). Instances of identical structure and variant types are compared pointwise (Str, Var). For arrays without known exceptional dependencies, we compare the default dependencies applying to all array cells (ADef). If exceptional dependencies are known for the same cell, these are additionally compared (AI). For arrays with known exceptional dependencies for different cells, we compare each dependency on the left-hand side with each one on the right-hand side (AIJ). The comparison of $\square$ with structures ($\square$Str), variants ($\square$Var) and arrays ($\square$ADef, $\square$AI) is a pointwise comparison between $\square$ and the dependency of each subelement.
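The rules of Table 5.1 translate almost directly into a recursive comparison function. The following Python sketch assumes a hypothetical encoding (atoms as strings; compound dependencies as ("struct", fields), ("variant", constructors) or ("array", default, index, exception) tuples, with index set to None when no exceptional cell is known). It treats pairs of distinct atoms not covered by Top or Bot as unrelated, which is an interpretation of the table rather than a statement from the thesis.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def subdeps(dep):
    """Direct subelements of a compound dependency."""
    if dep[0] == "array":
        return [dep[1]] if dep[2] is None else [dep[1], dep[3]]
    return list(dep[1].values())

def leq(a, b):
    """Return True when a is at least as precise as b."""
    if b == TOP or a == BOT:                       # rules Top and Bot
        return True
    if isinstance(a, str) and isinstance(b, str):
        return a == b                              # remaining atom pairs: reflexivity only
    if a == NOTHING:                               # NOTHING against a compound value
        return all(leq(NOTHING, d) for d in subdeps(b))
    if isinstance(a, str) or isinstance(b, str) or a[0] != b[0]:
        return False                               # no rule relates these shapes
    if a[0] != "array":                            # rules Str and Var
        return a[1].keys() == b[1].keys() and all(
            leq(a[1][k], b[1][k]) for k in a[1])
    (_, a_def, i, a_exc), (_, b_def, j, b_exc) = a, b
    pairs = [(a_def, b_def)]                       # rule ADef
    if i is not None and j is None:
        pairs.append((a_exc, b_def))               # rule AIA
    elif i is None and j is not None:
        pairs.append((a_def, b_exc))               # rule AAI
    elif i is not None and i == j:
        pairs.append((a_exc, b_exc))               # rule AI
    elif i is not None:                            # rule AIJ (i differs from j)
        pairs += [(a_exc, b_exc), (a_def, b_exc), (a_exc, b_def)]
    return all(leq(x, y) for x, y in pairs)

precise = ("variant", {"Some": TOP, "None": BOT})
coarse = ("variant", {"Some": TOP, "None": NOTHING})
```

Here leq(precise, coarse) holds while leq(coarse, precise) does not, which matches the intuition that marking None as impossible is strictly more precise than recording that only its tag is read.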

5.2.1 Join and Reduction Operator

The join operation is denoted by $\vee$ and it is defined as shown below.

Definition 5.2.3 (Join Operation $\vee$).

\[
\vee : \mathcal{D} \times \mathcal{D} \rightarrow \mathcal{D}
\]


It is detailed in Table 5.2. Intuitively, the join of two dependencies is the union of the dependencies represented by the two. It is a commutative operation, for which the undisplayed cases in Table 5.2 are defined by their symmetrical counterparts. The operation is total: joining incompatible domains, such as a structure and a variant or two structures having different field identifiers, results in $\top$, the least precise value. Join is applied pointwise on each subelement; $\bot$ is its identity element and $\top$ is its absorbing element. Joining $\square$ and the dependency of a structure, variant or array is applied pointwise. The value obtained by joining $\delta$ and $\delta'$ is an upper bound of the two:

\[
\delta \sqsubseteq \delta \vee \delta' \quad \text{and} \quad \delta' \sqsubseteq \delta \vee \delta', \qquad \forall \delta, \delta' \in \mathcal{D}
\]

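Table 5.2 – $\vee$ – Join Operation

\[
\begin{array}{rcl}
\top \vee \delta & = & \top \\
\bot \vee \delta & = & \delta \\
\{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} \vee \{ f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n \} & = & \{ f_1 \mapsto \delta_1 \vee \delta'_1; \dots; f_n \mapsto \delta_n \vee \delta'_n \} \\
\square \vee \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} & = & \{ f_1 \mapsto \square \vee \delta_1; \dots; f_n \mapsto \square \vee \delta_n \} \\
{[ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ]} \vee {[ C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n ]} & = & [ C_1 \mapsto \delta_1 \vee \delta'_1; \dots; C_n \mapsto \delta_n \vee \delta'_n ] \\
\square \vee {[ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ]} & = & [ C_1 \mapsto \square \vee \delta_1; \dots; C_n \mapsto \square \vee \delta_n ] \\
\langle \delta_{def} \rangle \vee \langle \delta'_{def} \rangle & = & \langle \delta_{def} \vee \delta'_{def} \rangle \\
\square \vee \langle \delta_{def} \rangle & = & \langle \square \vee \delta_{def} \rangle \\
\langle \delta_{def} \rangle \vee \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle & = & \langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{def} \vee \delta'_{exc} \rangle \\
\square \vee \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & = & \langle \square \vee \delta_{def} \triangleright i : \square \vee \delta_{exc} \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \vee \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle & = &
  \begin{cases}
    \langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{exc} \vee \delta'_{exc} \rangle & \text{if } i = j \\
    \langle \delta_{def} \vee \delta_{exc} \vee \delta'_{def} \vee \delta'_{exc} \rangle & \text{if } i \neq j
  \end{cases} \\
\square \vee \square & = & \square
\end{array}
\]

Defining the join of two dependencies corresponding to arrays is subtle. As shown in Table 5.1, we allow comparisons between dependencies corresponding to arrays with exceptions on different variables (rule AIJ); the join operation in this case amounts to joining the four different dependencies without keeping either of the two exceptions. We could have chosen to keep one of the known exceptional dependencies, but this would have posed two problems: on the one hand, the join operation would not be commutative and, on the other hand, it is hard to predict how the exceptional dependencies would be used at the intraprocedural level and which of the two could potentially lead to a gain in precision. Thus, we adopted this design decision. A strategy possibly worth investigating in such cases would be to allow users to specify array cells of interest at specific program points. This user-supplied information could then be taken into consideration whenever joining array dependencies with two different known exceptional dependencies. Our current join approach for arrays can lead to non-monotonic approximations in join. This becomes visible when noting that, for a monotonic join operation, the following should hold: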

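\[
\forall \delta, \delta', \rho.\quad \delta \sqsubseteq \delta' \implies \delta \vee \rho \sqsubseteq \delta' \vee \rho \qquad \text{(i)}
\]

Considering

\[
\rho \equiv \langle \rho_{def} \triangleright i : \rho_i \rangle, \qquad
\delta \equiv \langle \delta_{def} \triangleright j : \delta_j \rangle, \qquad
\delta' \equiv \langle \delta'_{def} \triangleright i : \delta'_i \rangle, \qquad \text{where } i \neq j,
\]

the hypothesis $\delta \sqsubseteq \delta'$ is translated into the following constraints:

\[
\delta_{def} \sqsubseteq \delta'_{def}, \qquad \delta_{def} \sqsubseteq \delta'_i, \qquad \delta_j \sqsubseteq \delta'_{def}, \qquad \delta_j \sqsubseteq \delta'_i
\]

Applying (i) for these three dependencies, we obtain

\[
\langle (\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \rangle \sqsubseteq \langle \delta'_{def} \vee \rho_{def} \triangleright i : \delta'_i \vee \rho_i \rangle
\]

which holds if and only if both of the following inequalities hold:

\[
\begin{array}{l}
(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \sqsubseteq \delta'_{def} \vee \rho_{def} \\
(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \sqsubseteq \delta'_i \vee \rho_i
\end{array}
\]

Considering, for instance,

\[
\rho_i = \top, \qquad \rho_{def} \neq \top, \qquad \delta_{def} = \delta_j = \delta'_{def} = \bot,
\]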

a counterexample is found.

As a consequence of the non-monotonic approximations made for arrays (rule AIJ), the value obtained by joining two dependencies is an upper bound, not a least upper bound. We address this issue and indicate our solution in Section 5.3. We remark that we keep only one exceptional cell for array dependencies, as in practice most operations manipulating arrays tend to modify either only one element or all of them. Logical properties on arrays generally have to hold for all elements. Keeping more than one exceptional dependency would be much more costly, and the additional cost would not necessarily be justified in practice. However, the join operation would be more straightforward and would not impose non-monotonic approximations.
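The counterexample can be replayed mechanically. The Python sketch below restricts join and the order to atoms and array dependencies, encoded (hypothetically) as (default, index, exception) triples with index None when no exceptional cell is known. The text leaves $\delta'_i$ and the exact value of $\rho_{def}$ open; choosing BOT and NOTHING for them is an assumption made only to obtain concrete values.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def join(a, b):
    """Join restricted to atoms and array triples (default, index, exception)."""
    if a == TOP or b == TOP:
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    if isinstance(a, str) and isinstance(b, str):
        return NOTHING                          # NOTHING joined with NOTHING
    if isinstance(a, str):                      # NOTHING joined with an array: swap
        a, b = b, a
    a_def, i, a_exc = a
    if isinstance(b, str):                      # array joined with NOTHING, pointwise
        return (join(a_def, b), i, None if i is None else join(a_exc, b))
    b_def, j, b_exc = b
    if i is None and j is None:
        return (join(a_def, b_def), None, None)
    if i is None:
        return (join(a_def, b_def), j, join(a_def, b_exc))
    if j is None:
        return (join(a_def, b_def), i, join(a_exc, b_def))
    if i == j:
        return (join(a_def, b_def), i, join(a_exc, b_exc))
    return (join(join(a_def, a_exc), join(b_def, b_exc)), None, None)

def leq(a, b):
    """Order of Table 5.1 restricted to atoms and array triples."""
    if b == TOP or a == BOT:
        return True
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(b, str):
        return False
    b_def, j, b_exc = b
    b_cells = [b_def] if j is None else [b_def, b_exc]
    if isinstance(a, str):                      # a is NOTHING or TOP
        return a == NOTHING and all(leq(a, c) for c in b_cells)
    a_def, i, a_exc = a
    if i is not None and i == j:                # rule AI
        return leq(a_def, b_def) and leq(a_exc, b_exc)
    a_cells = [a_def] if i is None else [a_def, a_exc]
    return all(leq(x, y) for x in a_cells for y in b_cells)  # ADef, AIA, AAI, AIJ

rho = (NOTHING, "i", TOP)       # rho_def is NOTHING (any value other than TOP works)
delta = (BOT, "j", BOT)
delta_prime = (BOT, "i", BOT)   # delta'_i = BOT is an assumption; the text leaves it open

lhs = join(delta, rho)          # exceptions on different cells are merged away
rhs = join(delta_prime, rho)    # the exception on cell i is kept
```

In this instance delta is below delta_prime, yet lhs collapses to an array that is entirely needed while rhs still records that only cell i is needed, so lhs is not below rhs.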

Besides join, a reduction operator, denoted by $\oplus$, has been defined as well.

Definition 5.2.4 (Reduction Operator $\oplus$).

\[
\oplus : \mathcal{D} \times \mathcal{D} \rightarrow \mathcal{D}
\]

This is a recursive, commutative, pointwise operation. Intuitively, this operator is introduced to take advantage of the information additionally computed by the possible-constructors analysis that we perform simultaneously. Following the same execution path, the same constructors must be possible. The reduction operator is used in order to incorporate this additional information computed for constructors. The dependency analysis can be seen as a may analysis, i.e. when combining the dependency information computed at two different points on the same execution path, the result must account for all dependencies computed at either of the two combined points. In contrast, the possible-constructors analysis can be seen as a must analysis, i.e. when combining information at two different points on the same execution path, it needs to keep the facts that hold at both combined points. Thus, the reduction operator combines dependencies on the same execution path and consists in performing the intersection of constructors in the case of variants and the union of dependencies for all other types. The reduction operator's role will become clearer after presenting the intraprocedural dependency analysis and the corresponding data-flow equations in Section 5.3. Its identity element is $\square$ and its absorbing element is $\bot$. The reduction operator between $\top$ and the dependency of a structure, variant or array is applied pointwise. Two instances of identical variant types are pointwise reduced. Similarly to join, the undisplayed cases in Table 5.3 are defined with respect to their symmetrical counterparts.

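\[
\begin{array}{rcl}
\bot \oplus \delta & = & \bot \\
\square \oplus \delta & = & \delta \\
\{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} \oplus \{ f_1 \mapsto \delta'_1; \dots; f_n \mapsto \delta'_n \} & = & \{ f_1 \mapsto \delta_1 \oplus \delta'_1; \dots; f_n \mapsto \delta_n \oplus \delta'_n \} \\
\{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} \oplus \top & = & \{ f_1 \mapsto \delta_1 \oplus \top; \dots; f_n \mapsto \delta_n \oplus \top \} \\
{[ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ]} \oplus {[ C_1 \mapsto \delta'_1; \dots; C_n \mapsto \delta'_n ]} & = & [ C_1 \mapsto \delta_1 \oplus \delta'_1; \dots; C_n \mapsto \delta_n \oplus \delta'_n ] \\
{[ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ]} \oplus \top & = & [ C_1 \mapsto \delta_1 \oplus \top; \dots; C_n \mapsto \delta_n \oplus \top ] \\
\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \rangle & = & \langle \delta_{def} \oplus \delta'_{def} \rangle \\
\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle & = & \langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{def} \oplus \delta'_{exc} \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle & = &
  \begin{cases}
    \langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{exc} \oplus \delta'_{exc} \rangle & \text{where } i = j \\
    \langle (\delta_{def} \vee \delta_{exc}) \oplus (\delta'_{def} \vee \delta'_{exc}) \rangle & \text{otherwise}
  \end{cases} \\
\langle \delta_{def} \rangle \oplus \top & = & \langle \delta_{def} \oplus \top \rangle \\
\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \top & = & \langle \delta_{def} \oplus \top \triangleright i : \delta_{exc} \oplus \top \rangle \\
\top \oplus \top & = & \top
\end{array}
\]

Table 5.3 – $\oplus$ – Reduction Operator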
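The difference between joining and reducing is easiest to see on a variant. The Python sketch below covers only atoms and variant dependencies; encoding variants as dictionaries from constructor names to dependencies and the names join and reduce_dep are illustrative assumptions, and structures and arrays are left out.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def join(a, b):
    """Join on atoms and variants (dicts from constructor names to dependencies)."""
    if a == TOP or b == TOP:
        return TOP                                # TOP is absorbing
    if a == BOT:
        return b                                  # BOT is the identity
    if b == BOT:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        return {c: join(a[c], b[c]) for c in a}   # pointwise on constructors
    if isinstance(a, dict):                       # b is NOTHING
        return {c: join(d, b) for c, d in a.items()}
    if isinstance(b, dict):                       # a is NOTHING
        return {c: join(a, d) for c, d in b.items()}
    return NOTHING

def reduce_dep(a, b):
    """Reduction on atoms and variants."""
    if a == BOT or b == BOT:
        return BOT                                # BOT is absorbing
    if a == NOTHING:
        return b                                  # NOTHING is the identity
    if b == NOTHING:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        return {c: reduce_dep(a[c], b[c]) for c in a}
    if isinstance(a, dict):                       # b is TOP
        return {c: reduce_dep(d, b) for c, d in a.items()}
    if isinstance(b, dict):                       # a is TOP
        return {c: reduce_dep(a, d) for c, d in b.items()}
    return TOP                                    # TOP reduced with TOP

# Two facts about the same value, collected on the same execution path.
seen_at_switch = {"Some": TOP, "None": BOT}       # None is impossible on this path
seen_later = {"Some": NOTHING, "None": TOP}
```

Reducing the two facts keeps None impossible, giving {"Some": TOP, "None": BOT}, whereas joining them would yield {"Some": TOP, "None": TOP} and silently discard the impossibility.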

Finally, the extractions summarized in Table 5.4 have been defined for dependencies $\delta$ and are used to express the data-flow equations of Section 5.3.

Definition 5.2.5 (Extraction of a field's dependency).

\[
.f : \mathcal{D} \rightharpoonup \mathcal{D}
\]

Definition 5.2.6 (Extraction of a constructor's dependency).

\[
@C : \mathcal{D} \rightharpoonup \mathcal{D}
\]

Definition 5.2.7 (Extraction of an array's cell dependency).

\[
\langle i \rangle : \mathcal{D} \rightharpoonup \mathcal{D}
\]

Definition 5.2.8 (Extraction of an array's dependency outside a cell i).

\[
\langle \ast \setminus i \rangle : \mathcal{D} \rightharpoonup \mathcal{D}
\]

Definition 5.2.9 (Extraction of an array's general dependency).

\[
\langle \ast \rangle : \mathcal{D} \rightharpoonup \mathcal{D}
\]

They are partial functions and can only be applied to dependencies of the corresponding kind. For instance, the field extraction $.f$ only makes sense for atomic values or for structured values with a field named f, which should be the case if the dependency represents a variable of a structured type with some field f. For any of the atomic dependencies $\delta_a$, applying any of the defined extractions yields $\delta_a$.

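Table 5.4 – Dependency Extractions

Field extraction $\delta.f$, with $f \in F$:

\[
\begin{array}{rcl}
\top.f & = & \top \\
\square.f & = & \square \\
\bot.f & = & \bot \\
\{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \}.f & = & \delta_i \quad \text{if } f = f_i
\end{array}
\]

Constructor extraction $\delta@C$, with $C \in \mathcal{C}$:

\[
\begin{array}{rcl}
\top@C & = & \top \\
\square@C & = & \square \\
\bot@C & = & \bot \\
{[ C_1 \mapsto \delta_1; \dots; C_m \mapsto \delta_m ]}@C & = & \delta_j \quad \text{if } C = C_j
\end{array}
\]

Array extractions $\delta\langle \ast \setminus i \rangle$, $\delta\langle i \rangle$ and $\delta\langle \ast \rangle$:

\[
\begin{array}{lll}
\top\langle \ast \setminus i \rangle = \top & \top\langle i \rangle = \top & \top\langle \ast \rangle = \top \\
\square\langle \ast \setminus i \rangle = \square & \square\langle i \rangle = \square & \square\langle \ast \rangle = \square \\
\bot\langle \ast \setminus i \rangle = \bot & \bot\langle i \rangle = \bot & \bot\langle \ast \rangle = \bot \\
\langle \delta_{def} \rangle\langle \ast \setminus i \rangle = \delta_{def} & \langle \delta_{def} \rangle\langle i \rangle = \delta_{def} & \langle \delta_{def} \rangle\langle \ast \rangle = \delta_{def}
\end{array}
\]

\[
\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle \ast \setminus i \rangle =
  \begin{cases} \delta_{def} & \text{when } i = k \\ \delta_{def} \vee \delta_{exc} & \text{otherwise} \end{cases}
\]

\[
\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle i \rangle =
  \begin{cases} \delta_{exc} & \text{when } i = k \\ \delta_{def} \vee \delta_{exc} & \text{otherwise} \end{cases}
\]

\[
\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle \ast \rangle = \delta_{def} \vee \delta_{exc}
\]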
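As an illustration of Table 5.4, the extractions can be written as small Python functions. The encoding (atoms as strings, ("struct", fields), ("variant", constructors) and ("array", default, index, exception) tuples) and the function names are hypothetical. To keep the sketch short, the join used when the requested cell differs from the exceptional one is only implemented for atomic cell dependencies.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"
ATOMS = (TOP, NOTHING, BOT)

def join_atoms(a, b):
    """Join of two atomic dependencies (compound cells are outside this sketch)."""
    assert a in ATOMS and b in ATOMS
    if TOP in (a, b):
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    return NOTHING

def field(dep, f):                 # field extraction
    return dep if dep in ATOMS else dep[1][f]

def ctor(dep, c):                  # constructor extraction
    return dep if dep in ATOMS else dep[1][c]

def cell(dep, i):                  # dependency of cell i
    if dep in ATOMS:
        return dep
    _, default, k, exc = dep
    if k is None:
        return default
    return exc if i == k else join_atoms(default, exc)

def outside(dep, i):               # dependency of all cells but i
    if dep in ATOMS:
        return dep
    _, default, k, exc = dep
    if k is None or i == k:
        return default
    return join_atoms(default, exc)

def general(dep):                  # dependency valid for every cell
    if dep in ATOMS:
        return dep
    _, default, k, exc = dep
    return default if k is None else join_atoms(default, exc)

option_dep = ("variant", {"Some": TOP, "None": BOT})
threads = ("array", NOTHING, "i", option_dep)
p = ("struct", {"threads": threads, "pid": NOTHING})
```

For the process dependency p above, cell(field(p, "threads"), "i") returns the variant dependency of the inspected element, while outside(threads, "i") is NOTHING because no other cell is read.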

5.2.2 Well-Typed Dependencies

The described syntactic dependencies are untyped. However, their interpretation is made in the context of a type $\tau$. Dependencies such as $\square$ or $\top$ do not exhibit any data type features and can apply to any type, but others will be completely constrained, and most will fall in between, uncovering a few layers of structured types before reaching one of the "generic" leaves $\top$, $\square$ or $\bot$. For example, the dependency $\{ f \mapsto \delta_f \}$ only really makes sense for structured types with a single field f whose type itself is compatible with $\delta_f$, and shall not be used in connection with variant or array types.

As a consequence, we conclude the presentation of our abstract dependency type by explaining what it means for a dependency to be compatible with some type $\tau$, i.e. to be well-typed of some type $\tau$. This is described as a judgement parameterized by the typing environment $\Gamma$ (Definition 4.3.1), and the different inference rules are detailed in Table 5.5.

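\[
\frac{}{\Gamma \vdash \top : \tau}\ \textsc{WT}\top
\qquad
\frac{}{\Gamma \vdash \bot : \tau}\ \textsc{WT}\bot
\qquad
\frac{}{\Gamma \vdash \square : \tau}\ \textsc{WT}\square
\]

\[
\frac{\tau = \mathit{struct}\{ f_1 : \tau_1; \dots; f_n : \tau_n \} \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \dots \quad \Gamma \vdash \delta_n : \tau_n}
     {\Gamma \vdash \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} : \tau}\ \textsc{WTStruct}
\]

\[
\frac{\tau = \mathit{variant}[ C_1 : \tau_1 \mid \dots \mid C_n : \tau_n ] \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \dots \quad \Gamma \vdash \delta_n : \tau_n}
     {\Gamma \vdash [ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ] : \tau}\ \textsc{WTVar}
\]

\[
\frac{\Gamma \vdash \delta_{def} : \tau}
     {\Gamma \vdash \langle \delta_{def} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArr}
\qquad
\frac{\Gamma \vdash \delta_{def} : \tau \qquad \Gamma \vdash \delta_{exc} : \tau \qquad \Gamma(i) = \tau_i}
     {\Gamma \vdash \langle \delta_{def} \triangleright i : \delta_{exc} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ \textsc{WTArrI}
\]

Table 5.5 – Well-Typed Dependencies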

The atomic dependency values are generic: they are well-typed with respect to any type (WT$\top$, WT$\square$, WT$\bot$). The dependency $\delta$ for a structure (WTStruct) is well-typed only with respect to an adequate structured type, whose field types are themselves compatible with the dependency mapped to them in $\delta$. Similarly, the dependency $\delta$ for a variant (WTVar) is well-typed only with respect to an adequate variant type. In turn, its constructors must themselves be compatible with the dependency mapped to them in $\delta$. For well-typed array dependencies (WTArr, WTArrI), the default dependency as well as the exceptional dependency have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of i, the index of the known exceptional dependency, has to be compatible with $\tau_i$, the array's index type.
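A well-typedness check following these rules can be sketched as below. The type encoding (atomic types as strings, ("struct", fields), ("variant", constructors) and ("array", index type, element type) tuples), the placeholder types "unit" and "state", and the function name well_typed are assumptions made for the example; the typing environment is modelled as a plain dictionary.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def well_typed(gamma, dep, ty):
    """Check a dependency against a type in the typing environment gamma."""
    if dep in (TOP, NOTHING, BOT):                 # atomic dependencies fit any type
        return True
    if isinstance(ty, str) or dep[0] != ty[0]:
        return False                               # shape mismatch
    if dep[0] in ("struct", "variant"):            # WTStruct and WTVar
        return dep[1].keys() == ty[1].keys() and all(
            well_typed(gamma, dep[1][k], ty[1][k]) for k in ty[1])
    _, default, index, exc = dep
    _, index_ty, elem_ty = ty
    if not well_typed(gamma, default, elem_ty):    # WTArr
        return False
    if index is None:
        return True
    # WTArrI: the index variable must have the array's index type.
    return gamma.get(index) == index_ty and well_typed(gamma, exc, elem_ty)

# Hypothetical type encodings for the running example.
thread_ty = ("struct", {"id": "int", "crt_state": "state", "stack": "memory_region"})
option_thread_ty = ("variant", {"Some": thread_ty, "None": "unit"})
threads_ty = ("array", "int", option_thread_ty)

dep = ("array", NOTHING, "i", ("variant", {"Some": TOP, "None": BOT}))
```

The dependency dep is accepted for threads_ty only when the environment types the index variable i as int.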

In the following section, we discuss our intraprocedural dependency domain and the manner in which dependencies are computed and manipulated.

5.3 Intraprocedural Analysis and Data-Flow Equations

5.3.1 Intraprocedural Dependency Domains

At an intraprocedural level, dependency information has to be kept at each point of the control flow graph, for each variable of the typing environment $\Gamma$ that maps input, output and local variables to their types. We use the term domain to denote this information.

Definition 5.3.1 (Intraprocedural Dependency Domain $\Delta \in \mathbb{D}$). An intraprocedural domain $\Delta \in \mathbb{D}$,

\[
\Delta : V \rightarrow \mathcal{D}
\]

is a mapping from variables to dependencies.

An intraprocedural domain is associated to every node of the control flow graph, representing the dependencies at the node's entry point. A special case is the mapping which binds all variables to $\bot$, which we call Unreachable:

\[
\mathit{Unreachable} \equiv \{ x \mapsto \bot \}
\]

In particular, it is associated to nodes that cannot be reached during the analysis. Also, if any of the variables of $\Delta$ is marked as $\bot$, the entire node collapses, becoming Unreachable.

For any node of the control flow graph associated to an intraprocedural domain $\Delta$, $\Delta(x)$ retrieves the dependency associated to the variable x. If a dependency for x has not been computed yet, it is mapped to $\square$.

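Forgetting a variable x from a reachable intraprocedural domain, denoted by $\Delta \setminus x$, "erases" the variable's dependency information by mapping it to $\square$.

Definition 5.3.2 (Forget x).

\[
\Delta \setminus x =
\begin{cases}
  \mathit{Unreachable} & \text{when } \Delta = \mathit{Unreachable} \\
  \Delta' = \left\{ y \mapsto \begin{cases} \Delta(y) & \text{when } y \neq x \\ \square & \text{when } y = x \end{cases} \right\} & \text{otherwise}
\end{cases}
\]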
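A minimal Python sketch of these conventions, assuming that a reachable domain is a dictionary from variable names to dependencies and that Unreachable is represented by a sentinel value (both are illustrative choices, as are the function names):

```python
NOTHING, BOT = "NOTHING", "BOT"
UNREACHABLE = "UNREACHABLE"      # sentinel for the domain mapping every variable to BOT

def lookup(domain, x):
    """Dependency of x; variables without an entry are mapped to NOTHING."""
    if domain == UNREACHABLE:
        return BOT
    return domain.get(x, NOTHING)

def forget(domain, x):
    """Erase the dependency information of x by mapping it to NOTHING."""
    if domain == UNREACHABLE:
        return UNREACHABLE
    return {y: (NOTHING if y == x else d) for y, d in domain.items()}

def collapse(domain):
    """A domain with an impossible variable becomes Unreachable as a whole."""
    if domain != UNREACHABLE and BOT in domain.values():
        return UNREACHABLE
    return domain
```

For example, forget({"o": "TOP", "e": NOTHING}, "o") maps o back to NOTHING and leaves e untouched, while forgetting a variable from UNREACHABLE has no effect.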

The $\sqsubseteq_\Delta$, $\vee_\Delta$ and $\oplus_\Delta$ operations are pointwise extensions of $\sqsubseteq$ (defined in 5.2.2), $\vee$ (defined in 5.2.3) and $\oplus$ (defined in 5.2.4), respectively; they apply to intraprocedural dependency domains, for each variable and its associated dependency $\delta_v$.

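We define a partial order $\sqsubseteq_\Delta$ on $\mathbb{D}$.

Definition 5.3.3 (Intraprocedural Partial Order $\sqsubseteq_\Delta$).

\[
\sqsubseteq_\Delta\ \subseteq\ \mathbb{D} \times \mathbb{D}, \qquad \Delta' \sqsubseteq_\Delta \Delta'' \iff \Delta'(x) \sqsubseteq \Delta''(x), \ \forall x \in V
\]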

In particular, Unreachable is the bottom of this intraprocedural lattice. It is the identity element of the intraprocedural join operation $\vee_\Delta$ and the absorbing element of the intraprocedural reduction operator $\oplus_\Delta$, defined below.

Definition 5.3.4 (Intraprocedural Join Operation $\vee_\Delta$).

\[
\vee_\Delta : \mathbb{D} \times \mathbb{D} \rightarrow \mathbb{D}
\]

\[
\Delta' \vee_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \vee \Delta''(x), \ \forall x \in V
\]


Definition 5.3.5 (Intraprocedural Reduction Operator $\oplus_\Delta$).

\[
\oplus_\Delta : \mathbb{D} \times \mathbb{D} \rightarrow \mathbb{D}
\]

\[
\Delta' \oplus_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \oplus \Delta''(x), \ \forall x \in \Gamma
\]

Finally, an intraprocedural domain $\Delta$ is well-typed with respect to a typing environment $\Gamma$ if and only if the dependency mapped to any variable x is well-typed with respect to x's type in the typing environment $\Gamma$ (Definition 4.3.1).

5.3.2 Intraprocedural Data-Flow Equations

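Table 5.6 – Statements – Representations and Data-Flow Equations

[Representation: a node n holding the statement s, with one outgoing edge labeled $(s, \lambda_i)$ to each successor node $n_i$, which carries the domain $\Delta_{n_i}$, for $1 \le i \le k$.]

Equation:

\[
\Delta_n = \bigvee\nolimits_{\Delta}\; \llbracket s \rrbracket_{\lambda_i}(\Delta_{n_i}) \qquad \text{over all edges } n \xrightarrow{\ s,\ \lambda_i\ } n_i
\]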

Our dependency analysis is a backward data-flow analysis. For each exit label, it traverses the control flow graph starting with its corresponding exit node, and it marks all other exit points as Unreachable, since exit labels are mutually exclusive. The intraprocedural domain for the currently analysed label is initialized with its associated output variables mapped to $\top$. Thereby, the analysis starts by making a conservative approximation, considering that all the input has been observed and that the output depends on it entirely. Typically, dependence analyses are forward analyses. However, given our goal to express label-specific dependencies as input-output relations, and taking into consideration the characteristics of the αSmil language, designing our analysis as a backward data-flow analysis seemed a pertinent choice. In αSmil, outputs are associated to a particular exit label and they are generated if and only if the predicate exits with that particular label. By traversing the control flow graph backwards, we can use this information and consider, starting with the initialisation phase, only the outputs that are relevant for the analysed exit label.

After the initialisation, the analysis traverses the control flow graph and gradually refines the dependencies until a fixed point is reached. Table 5.6 summarizes the representation and general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural domains of the statement's successor nodes. The intraprocedural domain at the entry point of the node is obtained by joining the contributions of each outgoing edge, as shown in Figure 5.10.

Definition 5.3.6. The contribution of an edge $(n_i, n_j)$ labeled with s and $\lambda$ is given by $\llbracket s \rrbracket_\lambda(\Delta_{n_j})$, where $\llbracket s \rrbracket_\lambda(\cdot)$ is the transfer function of the edge labeled $s, \lambda$.


Dependencies corresponding to variables that are written by a statement s on an exit label $\lambda$, denoted by $gen_s^\lambda$ in Figure 5.10, are forgotten from the intraprocedural domain on which we are operating.

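[Diagram: a statement node s with outgoing edges $\lambda_1, \dots, \lambda_n$. Each edge $\lambda_i$ leads to a successor with domain $\Delta_{\lambda_i}$ and contributes $(\Delta_{\lambda_i} \setminus gen_s^{\lambda_i}) \oplus_\Delta \delta_s^{\lambda_i}$, where $\delta_s^{\lambda_i}$ is the contribution of s on $\lambda_i$.]

\[
\Delta_{in} = \llbracket s \rrbracket_{\lambda_1}(\Delta_{\lambda_1}) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket_{\lambda_n}(\Delta_{\lambda_n}),
\qquad
\llbracket s \rrbracket_{\lambda_i}(\Delta_i) = (\Delta_i \setminus gen_s^{\lambda_i}) \oplus_\Delta \delta_s^{\lambda_i}
\]

Figure 5.10 – Computation of the Intraprocedural Domain at a Node's Entry Point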

In Section 5.2.1 we explained that, as a consequence of the non-monotonic approximations made when joining dependencies corresponding to arrays, the result of the join operation is an upper bound, not a least upper bound. In order to deal with this issue, we adopt the generic solution consisting of systematically joining the dependency domain associated to a node before its iteration with the new dependency domain computed by the transfer function. Thus, the dependency domain of a node n is:

\[
\Delta_n = old(\Delta_n) \vee_\Delta \Big( \bigvee\nolimits_{\Delta}\; \llbracket s \rrbracket_{\lambda}(\Delta_{n'}) \Big) \qquad \text{over all edges } n \rightarrow n'
\]

This is not prohibitive in terms of performance, leading to an increase of the execution time of 5 to 10%.

Tables 5.7, 5.8, 5.9 and 5.10 define the transfer functions for each built-in statement of our language, whereas the general case of a predicate call and its corresponding equation will be detailed in Section 5.4.

Table 5.7 presents the transfer functions for statements which are not type-specific. For equality tests (1), both of the inputs $e_1$, $e_2$ are completely read, whether the test returns true or false. The transfer functions therefore reduce the domain of the corresponding successor node with a domain consisting of $e_1$ and $e_2$, both mapped to $\top$. In the case of assignment (2), the dependency of the written output variable o is forgotten from the successor's intraprocedural domain, thus being mapped to $\square$, and forwarded to the input variable e. The transfer function for the nop operation (3) is simply the identity.


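Statement and transfer function $\llbracket s \rrbracket_{\lambda_i}(\Delta)$:

Equality test (1):

\[
\begin{array}{l}
\llbracket e_1 = e_2 \rrbracket_{true}(\Delta) = \Delta \oplus_\Delta dep \\
\llbracket e_1 = e_2 \rrbracket_{false}(\Delta) = \Delta \oplus_\Delta dep
\end{array}
\qquad \text{where } dep = \{ e_1 \mapsto \top;\ e_2 \mapsto \top \}
\]

Assignment (2):

\[
\llbracket o = e \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus_\Delta \{ e \mapsto \Delta(o) \}
\]

No Operation (3):

\[
\llbracket nop \rrbracket_{true}(\Delta) = \Delta
\]

Table 5.7 – Generic Statements – Data-Flow Equations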
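The equations of Table 5.7 can be exercised on a toy example. The following Python sketch only handles reachable domains with atomic dependencies, represented as dictionaries; the function names and the two-assignment program used below are hypothetical and serve only to show the backward propagation.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def reduce_atoms(a, b):
    """Reduction restricted to atomic dependencies."""
    if BOT in (a, b):
        return BOT                 # BOT is absorbing
    if a == NOTHING:
        return b                   # NOTHING is the identity
    if b == NOTHING:
        return a
    return TOP

def reduce_domains(d1, d2):
    """Pointwise reduction; missing variables count as NOTHING."""
    return {x: reduce_atoms(d1.get(x, NOTHING), d2.get(x, NOTHING))
            for x in set(d1) | set(d2)}

def forget(domain, x):
    return {y: (NOTHING if y == x else d) for y, d in domain.items()}

def transfer_equality(domain, e1, e2):   # equation (1), same for true and false
    return reduce_domains(domain, {e1: TOP, e2: TOP})

def transfer_assign(domain, o, e):       # equation (2)
    return reduce_domains(forget(domain, o), {e: domain.get(o, NOTHING)})

def transfer_nop(domain):                # equation (3)
    return domain

# Backward pass over the hypothetical sequence "b = a; c = b" with output c.
at_exit = {"c": TOP}
before_second = transfer_assign(at_exit, "c", "b")
before_first = transfer_assign(before_second, "b", "a")
```

Starting from the output c mapped to TOP at the exit, the dependency is forwarded first to b and then to a, so that at the entry point only the input a is needed.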

The data-flow equations given in Table 5.8 correspond to structure-related statements. For the equations (4), (5), (6) and (7), we assume that the variable r is of type $\mathit{struct}\{ f_1 : \tau_1; \dots; f_n : \tau_n \}$ for some fields $f_i$, $1 \le i \le n$. The equation (4) refers to the creation of a structure: each input $e_i$ is read as much as the corresponding field $f_i$ of the structure is read. The destructuring of a structure is handled in (5): each field $f_i$ is needed as much as the corresponding variable $o_i$ is. When accessing the i-th field of a structure r (6), only the field $f_i$ is read, and only as much as the access' result o itself. The equation (7) treats field updates: the variable $e_i$ is read as much as the field $f_i$ is. The structure r is read as much as all the fields other than $f_i$ are read in $r'$. Finally, the equations given in (8) handle partial structure equality tests, and the transfer functions are the same for the labels true and false: for both compared structures $r'$ and $r''$, all the fields in the given set $f_1, \dots, f_k$ are completely read, and only those.

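Statement and transfer function $\llbracket s \rrbracket_{\lambda_i}(\Delta)$:

Create (4):

\[
\llbracket r = \{ e_1, \dots, e_n \} \rrbracket_{true}(\Delta) = (\Delta \setminus r) \oplus_\Delta \bigoplus_{1 \le i \le n} \{ e_i \mapsto \Delta(r).f_i \}
\]

Destructure (5):

\[
\llbracket \{ o_1, \dots, o_n \} = r \rrbracket_{true}(\Delta) = (\Delta \setminus \{ o_i \mid o_i \in o \}) \oplus_\Delta \{ r \mapsto \{ f_1 \mapsto \Delta(o_1); \dots; f_n \mapsto \Delta(o_n) \} \}
\]

Access field (6):

\[
\llbracket o = r.f_i \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus_\Delta \{ r \mapsto \{ f_1 \mapsto \square; \dots; f_i \mapsto \Delta(o); \dots; f_n \mapsto \square \} \}
\]

Update field (7):

\[
\llbracket r' = \{ r \text{ with } f_i = e_i \} \rrbracket_{true}(\Delta) = (\Delta \setminus r') \oplus_\Delta
\left\{ \begin{array}{l} e_i \mapsto \Delta(r').f_i \\ r \mapsto \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} \end{array} \right\}
\qquad \text{where } \delta_j = \begin{cases} \Delta(r').f_j & \text{if } j \neq i \\ \square & \text{otherwise} \end{cases}
\]

Equality (8):

\[
\begin{array}{l}
\llbracket r' =_{\langle f_1, \dots, f_k \rangle} r'' \rrbracket_{true}(\Delta) = \Delta \oplus_\Delta d \\
\llbracket r' =_{\langle f_1, \dots, f_k \rangle} r'' \rrbracket_{false}(\Delta) = \Delta \oplus_\Delta d
\end{array}
\]

\[
\text{where } d = \left\{ \begin{array}{l} r' \mapsto \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} \\ r'' \mapsto \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \} \end{array} \right\}
\quad \text{and} \quad
\delta_i = \begin{cases} \top & \text{if } f_i \in \{ f_1, \dots, f_k \} \\ \square & \text{otherwise} \end{cases}
\]

Table 5.8 – Structure-Related Statements – Data-Flow Equations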


The data-flow equations given in Table 5.9 correspond to variant-related statements. They follow the same principles as those used for structure-related statements above. Note that the transfer functions for the switch (10) and the possible constructor test (11) introduce $\bot$ dependencies for constructors which are known to be impossible on the considered edge. In particular, since $\bot$ is an absorbing element for $\oplus$, these transfer functions erase, for every constructor which is known to be locally impossible, all the dependency information possibly attached to such a constructor in the successor nodes. This is the actual raison d'être of the reduction operator, since using $\vee_\Delta$ to combine a successor domain and a local contribution would lose this information.

Finally, the equations for array-related statements are given in Table 5.10. We assume for both that the context is fixed and that $I$ is the distinguished set of input variables of the analysed predicate. This set is used to make sure that exceptions in array dependencies are only registered for variables in $I$, and not for local or output variables. The reason for such a constraint is pragmatic: input variables are not assignable in our language, and therefore they always represent the same value intraprocedurally. Otherwise, each time a variable is written by a statement, we would need to traverse all the dependencies in the domain to erase or reinterpret the occurrences where this variable appears as an exception. Only recording exceptions for input variables makes this kind of costly traversal unnecessary, and since only exceptions about input variables make sense at the interprocedural level (see Section 5.4), we do not lose much precision by doing so.

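\[
\begin{array}{lll}
\text{Statement} & & \llbracket s \rrbracket_{\lambda_i}(\Delta) \\[2pt]
\text{Create variant} & (9) & \llbracket v := C_p[e] \rrbracket_{true}(\Delta) = (\Delta \setminus v) \oplus_\Delta \{ e \mapsto \Delta(v).C_p \} \\
\text{Variant switch} & (10) & \llbracket switch(v) \text{ as } [o_1 | \dots | o_n] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus_\Delta \{ v \mapsto dep_i \} \\
 & & \text{where } dep_i = [ C_1 \mapsto \bot; \dots; C_i \mapsto \Delta(o_i); \dots; C_n \mapsto \bot ] \\
\text{Possible variant} & (11) & \llbracket v \in \{ C_1, \dots, C_k \} \rrbracket_{true}(\Delta) = \Delta \oplus_\Delta \{ v \mapsto [ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ] \} \\
 & & \text{where } \delta_i = \begin{cases} \Delta(v).C_i & \text{if } C_i \in \{ C_1, \dots, C_k \} \\ \bot & \text{otherwise} \end{cases} \\
 & & \llbracket v \in \{ C_1, \dots, C_k \} \rrbracket_{false}(\Delta) = \Delta \oplus_\Delta \{ v \mapsto [ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ] \} \\
 & & \text{where } \delta_i = \begin{cases} \Delta(v).C_i & \text{if } C_i \notin \{ C_1, \dots, C_k \} \\ \bot & \text{otherwise} \end{cases}
\end{array}
\]

Table 5.9 – Variant-Related Statements – Data-Flow Equations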


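\[
\begin{array}{lll}
\text{Statement} & & \llbracket s \rrbracket_{\lambda_i}(\Delta) \\[2pt]
\text{Array access} & (12) & \llbracket o := a[i] \rrbracket_{true}(\Delta) =
  \begin{cases}
  (\Delta \setminus o) \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle \} & \text{when } i \in I \\
  (\Delta \setminus o) \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \Delta(o) \vee \square \rangle \} & \text{when } i \notin I
  \end{cases} \\
 & & \llbracket o := a[i] \rrbracket_{false}(\Delta) = \Delta \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \square \rangle \} \\[4pt]
\text{Array update} & (13) & \llbracket a' := [a \text{ with } i = e] \rrbracket_{true}(\Delta) =
  \begin{cases}
  (\Delta \setminus a') \oplus_\Delta \{ i \mapsto \top;\ e \mapsto \Delta(a')\langle i \rangle;\ a \mapsto \langle \Delta(a')\langle * \setminus i \rangle \triangleright i : \square \rangle \} & \text{when } i \in I \\
  (\Delta \setminus a') \oplus_\Delta \{ i \mapsto \top;\ e \mapsto \Delta(a')\langle * \rangle;\ a \mapsto \langle \Delta(a')\langle * \rangle \vee \square \rangle \} & \text{when } i \notin I
  \end{cases} \\
 & & \llbracket a' := [a \text{ with } i = e] \rrbracket_{false}(\Delta) = \Delta \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \emptyset \rangle \}
\end{array}
\]

Table 5.10 – Array-Related Statements – Data-Flow Equations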

The transfer functions for (12) and (13) thus take care of making adequate approximations when exceptions cannot be introduced. As for the cases when the array access exits with the false label, note that the contribution to the array $a$ is $\langle \square \rangle$, which is strictly less precise than $\square$. The operation performs implicit bounds checking, and this can thus be seen as accounting for the fact that no cell in $a$ has been read, but the "length" or "support" of $a$ has been read. Hence, it would not be correct to claim that the result of the statement does not depend on $a$ at all. Similarly, a variant dependency $[C_1 \mapsto \square; \dots; C_n \mapsto \square]$, mapping all constructors to nothing, has not read any value in any of the constructors but may still depend on the variant's constructor itself. In contrast, we do not make this distinction for structures, because we assume surjective pairing, i.e. structure values consist only of the fields themselves. Our solution can easily be adapted in order to deal with non-surjective cases.

5.3.3 Intraprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an intraprocedural level, we exemplify the mechanism behind it, step by step, on the predicate thread discussed in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis and compare the actual obtained results with the targeted ones, depicted in Figure 5.5.

Since a predicate can only exit with one label at a time and we are considering the true label, we can map the nodes None and oob to Unreachable, as shown in Figure 5.11. This is an advantage of backwards analyses. For true, we make a pessimistic assumption and map the output ti to $\top$, considering that control on the output is external, hence out of our reach, and that ti will be entirely needed by a potential caller. Going further up the control flow graph, we analyse the variant switch.

[Control flow graph of thread: th := p.threads; tio := th[i] (true: continue, false: oob); switch(tio) as [ | ti] (Some: true, None: None). Annotations: the exit nodes None and oob are Unreachable; the exit node true carries ti $\mapsto \top$.]

Figure 5.11 – Analysing Predicate thread – Initialisation

In order to compute the dependency for the node corresponding to the variant switch, we apply the data-flow equation given by (10) in Table 5.9. Since we are analysing the true case, we know that all other constructors (only the constructor None in this case) are locally impossible. Thus, we map them to $\bot$. We continue by forgetting the dependency information we knew about the output ti. Since its value is needed only in as much as the result of the switch on the corresponding edge is needed, we forward it to the part corresponding to the Some constructor. This is summarized below.

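[Diagram: the dependency of the output ti is forwarded to the Some constructor of tio, while all other constructors $C_1, \dots, C_n$ are mapped to $\bot$, following $\llbracket switch(v) \text{ as } [o_1 | \dots | o_n] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus \{ v \mapsto dep_i \}$ where $dep_i = [ C_1 \mapsto \bot; \dots; C_i \mapsto \Delta(o_i); \dots; C_n \mapsto \bot ]$.]

Figure 5.12 – Applying the Variant Switch Equation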

Taking all this into account, for the node corresponding to the variant switch we obtain the dependency shown in Figure 5.13. For the output ti, we depend entirely on the Some constructor of the node's input variant tio, while the constructor None is impossible.

Making a step further up the graph, we access the cell i of the array th and apply the equation (12) given in Table 5.10. We begin by forgetting the dependency for the output tio, since this is written. Since we only access the element i, we map all other cells to Nothing, i.e. $\square$. To the dependency corresponding to the i-th cell, we forward the dependency we knew about tio, since we depend on it to the extent to which the result of the access is needed.

[Control flow graph of thread, annotated with: tio $\mapsto$ [Some $\mapsto \top$; None $\mapsto \bot$] at the switch node; ti $\mapsto \top$ at the exit node true; None and oob Unreachable.]

Figure 5.13 – Analysing Predicate thread – Variant Switch

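[Diagram: the dependency of tio is forwarded to the i-th cell of th, while the cells $1, \dots, n$ other than $i$ are mapped to $\square$, following the two cases of equation (12): $(\Delta \setminus o) \oplus \{ i \mapsto \top;\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle \}$ when $i \in I$, and $(\Delta \setminus o) \oplus \{ i \mapsto \top;\ a \mapsto \langle \Delta(o) \vee \square \rangle \}$ when $i \notin I$.]

Figure 5.14 – Applying the Array Access Equation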

We thus obtain a dependency stating that we depend only on the i-th cell of the array th, for which only the constructor Some is possible and entirely needed. The cell's index i is entirely needed as well. The applied equation is shown in Figure 5.14 (since i is an input, we use the first case of the equation) and the obtained results are shown in Figure 5.15.

As a last step, we access the field threads of the input process p and apply the equation (6) given in Table 5.8 and illustrated in Figure 5.16. As before, we forget the information for th, the result of the access. We map all other fields to $\square$ and we forward the dependency of the variable th to the dependency part of the field threads.

We thus obtain the dependency result shown in Figure 5.17. This states that, for the label true, the output ti depends only on the i-th cell of the field threads of the input process p, for which it depends entirely on the Some constructor. Before returning the predicate's final results, the analysis filters out any dependency information referring to local variables and verifies that the invariant imposed on dependency information related to arrays holds. Since the results refer only to the inputs p and i, and the index of the exceptional computed dependency is an input, the invariant holds and the final result can be retrieved. The final dependency results obtained for the thread predicate on the exit label true are identical to the ones that we were targeting and that were depicted in Figure 5.5. For readability, for structures such as the input process p, we omit dependencies on fields mapped to $\square$. We maintain this convention throughout the rest of this chapter, and thus any field of a structure that is omitted from a dependency summary should be interpreted as being mapped to $\square$, i.e. nothing.

[Control flow graph of thread, annotated with: th $\mapsto \langle \square \triangleright i : [Some \mapsto \top; None \mapsto \bot] \rangle$ and i $\mapsto \top$ at the array access node; tio $\mapsto$ [Some $\mapsto \top$; None $\mapsto \bot$] at the switch node; ti $\mapsto \top$ at the exit node true.]

Figure 5.15 – Analysing Predicate thread – Array Access

[Diagram: the dependency of th is forwarded to the field threads of p, while the other fields $f_1, \dots, f_n$ are mapped to $\square$, following $\llbracket o := r.f_i \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus \{ r \mapsto \{ f_1 \mapsto \square; \dots; f_i \mapsto \Delta(o); \dots; f_n \mapsto \square \} \}$.]

Figure 5.16 – Applying the Field Access Equation
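The three backward steps of this walk-through can be replayed with a small Python sketch. It is a toy model under stated assumptions rather than the actual analysis: the encoding of dependencies (strings for the atoms, tagged tuples for variants and arrays with one exceptional cell, dictionaries for structures with omitted fields read as Nothing) and all function names are hypothetical, and since every variable is written once in this example, the reduction operator is not modelled.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def forget(delta, var):
    """Remove the mapping of a variable that the statement writes."""
    return {v: d for v, d in delta.items() if v != var}

def bind(delta, var, dep):
    """Add a contribution for a variable not yet present in the domain."""
    if var in delta:
        raise NotImplementedError("reduction of two dependencies is not modelled")
    out = dict(delta)
    out[var] = dep
    return out

def switch(delta, v, out_var, taken, constructors):
    """Variant switch: the taken constructor inherits the output's dependency,
    every other constructor is impossible on this edge."""
    needed = delta.get(out_var, NOTHING)
    dep = ("variant", {c: (needed if c == taken else BOT) for c in constructors})
    return bind(forget(delta, out_var), v, dep)

def array_access(delta, o, a, i):
    """Array access when the index i is an input: one exceptional cell."""
    cell = delta.get(o, NOTHING)
    rest = bind(forget(delta, o), i, TOP)
    return bind(rest, a, ("array_exc", NOTHING, i, cell))

def field_access(delta, o, r, field):
    """Field access; fields omitted from the dict are read as Nothing."""
    return bind(forget(delta, o), r, {field: delta.get(o, NOTHING)})

# Backward analysis of thread on the label true.
d0 = {"ti": TOP}                                       # output entirely needed
d1 = switch(d0, "tio", "ti", "Some", ["Some", "None"])
d2 = array_access(d1, "tio", "th", "i")
d3 = field_access(d2, "th", "p", "threads")
summary = {v: d for v, d in d3.items() if v in {"p", "i"}}  # keep inputs only
```

Under this encoding, `summary` maps `i` to Everything and `p` to a structure whose `threads` field depends only on its `i`-th cell, with `Some` entirely needed and `None` impossible.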

[Control flow graph of thread, annotated with: p $\mapsto \{ threads \mapsto \langle \square \triangleright i : [Some \mapsto \top; None \mapsto \bot] \rangle \}$ and i $\mapsto \top$ at the entry node; th $\mapsto \langle \square \triangleright i : [Some \mapsto \top; None \mapsto \bot] \rangle$ and i $\mapsto \top$ at the array access node; tio $\mapsto$ [Some $\mapsto \top$; None $\mapsto \bot$] at the switch node; ti $\mapsto \top$ at the exit node true.]

Figure 5.17 – Analysing Predicate thread – Field Access

5.4 Interprocedural Dependencies

Exit labels, presented in Section 3.1.2 and in Section 4.1 (on page 63), constitute an increased source of expressivity, as they indicate the scenario that was observed while executing a predicate. We incorporate this expressivity in our dependency results by computing specific dependencies for each possible execution scenario. Therefore, our analysis is performed label by label, and interprocedural dependency domains associate an intraprocedural domain to each exit label of the analysed predicate. The variable key-set of each associated intraprocedural domain comprises the inputs of the analysed predicate. A label that cannot be returned is mapped to an Unreachable intraprocedural domain. This is a form of path-sensitivity (Robert and Leroy, 2012). However, we favor the term label-sensitivity for this characteristic, as it seems to be a more natural choice for our case and the language we are working on.

An interprocedural domain of a predicate $p$ is thus defined as shown below.

Definition 5.4.1. Interprocedural Dependency Domain

\[ D_p : \Lambda_p \rightarrow D \quad \text{where } \Lambda_p \text{ is the set of output labels of predicate } p \]

For each analysed label of a predicate, the analysis starts by initializing the intraprocedural domain mapped to it with the output variables associated to the exit label. To avoid making any false assumption, these are initially mapped to the most general dependency, namely $\top$. Subsequently, as described in Section 5.3.2, the dependency information is gradually refined until a fixed point is reached. The execution scenarios denoted by the exit labels of a predicate are mutually exclusive. Therefore, during the analysis of a particular exit label, all other exit labels of the predicate are mapped to Unreachable. After reaching a fixed point, the intraprocedural domain is filtered so that only input variables appear in the variable set. As explained in Section 5.3.2, the intraprocedural domains are built such that only input variables may appear as exception indices in dependencies computed for arrays. This invariant is preserved throughout the analysis.

Interprocedural dependency information is expressed in terms of the formal parameters of predicates. For analysing predicate calls, we need to substitute the formal parameters of the callee by the ones that are supplied by the caller. Therefore, a substitution must be performed on interprocedural summaries. This consists in substituting all occurrences of formal input parameters of a predicate by the corresponding effective input parameters. The substitution operation is denoted as J (χ), where χ is a substitution from formal to effective parameters.


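We proceed by detailing the equation corresponding to a call to a predicate

\[ p(e_1, \dots, e_n)[\lambda_1 : \bar{o}_1 \mid \dots \mid \lambda_m : \bar{o}_m] \]

having the following signature:

\[ p(\varepsilon_1, \dots, \varepsilon_n)[\lambda_1 : \bar{\omega}_1 \mid \dots \mid \lambda_m : \bar{\omega}_m] \]

The general equation (given in Table 5.6) applies:

\[ \Delta_n = \bigvee_{n \xrightarrow{s, \lambda_i} n_i} \llbracket p(e_1, \dots, e_n)[\lambda_1 : \bar{o}_1 \mid \dots \mid \lambda_m : \bar{o}_m] \rrbracket_{\lambda_i}(\Delta_{n_i}) \]

where the join is the join $\vee_\Delta$ on intraprocedural domains. The transfer functions for the predicate call statement are deduced from the predicate's interprocedural domain in the following fashion:

\[ \llbracket p(e_1, \dots, e_n)[\lambda_1 : \bar{o}_1 \mid \dots \mid \lambda_m : \bar{o}_m] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus \bar{o}_i) \oplus_\Delta \bigoplus_{j \in 1..n} \{ e_j \mapsto dep_{ij} \} \qquad \text{(PredEq)} \]

\[ \text{where } dep_{ij} = D_p(\lambda_i)(\varepsilon_j)\ J\,(\bar{\varepsilon} \mapsto \bar{e}) \]

Namely, the mappings for the outputs $\bar{o}$ associated to a label $\lambda_i$ are removed, and the contribution of a call to each input $e_j$ stems from the contribution of the interprocedural domain for label $\lambda_i$ and formal input $\varepsilon_j$. In these, all the formal input parameters $\bar{\varepsilon}$ in array dependency domains are substituted by the corresponding effective input parameters from $\bar{e}$.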
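The substitution of formal by effective parameters in a callee's summary can be sketched as follows in Python. The encoding and the names `substitute` and `call_contribution` are hypothetical, introduced only for illustration; the sketch also assumes that the effective arguments are pairwise distinct variables, so that no two contributions need to be combined for the same variable.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def substitute(dep, chi):
    """Rename formal parameters used as exception indices in array dependencies."""
    if isinstance(dep, dict):                        # structure dependency
        return {f: substitute(d, chi) for f, d in dep.items()}
    if isinstance(dep, tuple) and dep[0] == "variant":
        return ("variant", {c: substitute(d, chi) for c, d in dep[1].items()})
    if isinstance(dep, tuple) and dep[0] == "array_exc":
        _, default, index, exc = dep
        return ("array_exc", substitute(default, chi),
                chi.get(index, index), substitute(exc, chi))
    return dep                                       # atomic dependency

def call_contribution(summary, label, formals, actuals):
    """Contribution of a call on one label: one dependency per effective input."""
    chi = dict(zip(formals, actuals))
    return {chi[f]: substitute(summary[label][f], chi) for f in formals}

# Summary of thread(p, i) on label true, as obtained in Section 5.3.3.
thread_summary = {
    "true": {
        "p": {"threads": ("array_exc", NOTHING, "i",
                          ("variant", {"Some": TOP, "None": BOT}))},
        "i": TOP,
    }
}

# A call thread(p, j): the formal index i is replaced by the effective index j.
contribution = call_contribution(thread_summary, "true", ["p", "i"], ["p", "j"])
```

The resulting contribution mentions only the caller's variables `p` and `j`, which is what allows it to be combined with the caller's own intraprocedural domain.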

An αSmil program is analysed by computing, once and for all, an interprocedural dependency domain for every predicate. These are stored in a mapping binding predicate identifiers to their interprocedural dependency domains. Whenever a predicate call is handled intraprocedurally, the corresponding computed interprocedural dependency summary is retrieved from the mapping, propagated to the calling site and used as explained above. If an interprocedural dependency summary for a called predicate has not been computed yet, it is handled as if it were an implicit predicate. In practice, in programs generated in αSmil from Smil, predicates are sorted in topological order when possible. For implicit predicates, described in Chapters 3 and 4, a pessimistic assumption is made: it is considered that everything in their inputs has been read and is needed for any of their possible exit labels. Since their implementation is hidden, a conservative approximation must be made in their case.

Inductive predicates have been discussed in Section 3.1.4 (on page 46). They are specification-only predicates and represent a disjunction of cases. Each case can introduce existentially quantified variables. An inductive predicate exits with the true label if any of its declared cases holds. Therefore, for inductive predicates, one analysis per case is made. For the true exit label, the dependency results are obtained by joining the results of all cases. For the false label, everything is considered to be read.


5.4.1 Interprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an interprocedural level, we revisit our start_address example predicate, introduced in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis and compare the actual obtained results with the targeted ones, depicted in Figure 5.8.

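[Control flow graph of start_address: thread(p, j)[true: tj | None | oob]; sj := tj.stack; adr := sj.start; exit nodes true, None, oob and error. Annotations: tj $\mapsto \{ stack \mapsto \{ start \mapsto \top \} \}$ before sj := tj.stack; sj $\mapsto \{ start \mapsto \top \}$ before adr := sj.start; adr $\mapsto \top$ at the exit node true.]

Figure 5.18 – G_start_address – Dependency Information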

We begin by initialising the output adr with $\top$ and continue by traversing the control flow graph backwards and by computing the dependency information at each node. We apply the data-flow equation (6) given in Table 5.8 and we obtain the intermediate results shown in Figure 5.18.

To compute the dependency information of the control flow graph's entry node, i.e. the one corresponding to a predicate call to thread, we use the dependency summary computed for this predicate for the exit label true, and we substitute the formal parameters, i.e. p and i, appearing in it with the effective arguments of the call, i.e. p and j. We thus obtain the following dependency summary:

\[ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top; None \mapsto \bot] \rangle \} \qquad j \mapsto \top \]

We apply the data-flow equation (PredEq) corresponding to a predicate call, discussed on page 102, and make use of the dependency information corresponding to the successor node on the edge marked with true:

\[ tj \mapsto \{ stack \mapsto \{ start \mapsto \top \} \} \]

thus obtaining the following final dependency result:

\[ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top; None \mapsto \bot] \rangle \} \qquad j \mapsto \top \]

However, the targeted results for start_address, depicted in Figure 5.8, would translate to:


\[ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \{ stack \mapsto \{ start \mapsto \top \} \}; None \mapsto \bot] \rangle \} \qquad j \mapsto \top \]

Clearly, the dependency information computed by our analysis and shown in Figure 5.19 is an over-approximation of the results that we had envisioned. The obtained dependency summary states that the entire j-th associated thread of the input process p is needed in order to obtain the output adr on the true exit label. However, in reality, only one of this thread's fields is actually needed, namely the stack field, for which only one subelement, the start field, is read. This loss of precision is a consequence of the dependency information mapped to the Some constructor at the control flow graph's entry node, corresponding to a call to the thread predicate. When executing successfully and exiting with label true, the thread predicate returns the i-th associated thread of its input process. However, the predicate thread does not need this element itself: it neither reads nor uses it per se, it merely retrieves it. The dependency on this returned element is relative to the extent to which the predicate's callers will use it. The start_address predicate, for instance, depends only on one of the 3 fields of the returned thread. Yet, by mapping the i-th thread to $\top$ in the dependency summary of thread, we fail to mirror this distinction: $\top$ is the top element of our dependency domain, and joining it with any other dependency will lead to $\top$, thus shadowing any other information we might compute while observing its usage.
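The shadowing effect can be made explicit with a one-line worked equation. It only instantiates the fact, stated above, that the top element absorbs any other dependency under the join; the structure dependency on the right is the one computed for tj in the caller.

```latex
\[
\top \;\vee\; \{\, stack \mapsto \{\, start \mapsto \top \,\} \,\} \;=\; \top
\]
```

Whatever the caller learns about its actual use of the returned thread is therefore lost as soon as the callee's summary already maps the Some constructor to the top element.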

5.4.2 Context-Insensitivity and its Consequences

Precision losses in dependency summaries, such as the one detected in our previous example, are a direct consequence of considering and analysing predicates in isolation. There is a level of information that goes beyond a predicate's own control flow graph, and a more detailed picture can emerge once non-local information connected to the predicate's use, i.e. the calling context, is included into the analysis.

Interprocedural analyses that consider the calling context when analysing the target of a function call, or in our case a predicate call, are context-sensitive analyses (Hind, 2001). As the name implies, context-sensitive analyses can jump back to the original call site, using context information for the results they compute. Context-insensitive analyses, on the other hand, dispense with such information and propagate back to all possible call sites the information that they compute once. This is a notorious source of potential precision loss in static analysis. Choosing either one of these two traits has significant consequences: on the one hand, by choosing to ignore the calling context and the additional information it supplies, one pays a high price in terms of precision, and on the other hand, by choosing to include such information, one risks sacrificing scalability.

[Control flow graph of start_address: thread(p, j)[true: tj | None | oob]; sj := tj.stack; adr := sj.start; exit nodes true, None, oob and error. Annotations: p $\mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top; None \mapsto \bot] \rangle \}$ and j $\mapsto \top$ at the entry node; tj $\mapsto \{ stack \mapsto \{ start \mapsto \top \} \}$; sj $\mapsto \{ start \mapsto \top \}$; adr $\mapsto \top$.]

Figure 5.19 – G_start_address – Final Dependency Results

Our dependency analysis, as presented so far, is context-insensitive: for each predicate, the analysis computes a dependency summary once, stores it and further propagates it to its callers whenever needed. Considering that αSmil predicates are sequences of calls to other predicates, built-in or user-defined, as discussed in Chapter 4, if we adopted a purely context-sensitive solution, we would gain in terms of precision, but the cost in terms of performance would be prohibitive. This is a typical trade-off of static analyses. We address this issue and describe our solution in detail in Chapter 6. Without adopting context-sensitivity to the letter, we strike a balance between the two alternatives by including lazy components in our interprocedural dependency summaries and by using them for injecting the current intraprocedural context on an as-needed basis. As will be discussed in Chapters 6 and 8, this approach leads to improved precision with only a marginal decrease in performance.

5.5 Semantics of Dependency Values

There are two different manners of interpreting dependency values $\delta$: one focusing on the possible constructors part, and the other focusing on the dependency part. In both cases, the interpretations are relative to a type $\tau$ and hold only for well-typed dependencies of the same type. The set of types that a dependency is compatible with has been discussed in Section 5.2.2 and defined in Table 5.5.

First, focusing on the possible constructors aspect, dependencies can be interpreted as a constraint on the forms that values may take. Such constraints can arise as a consequence of $\bot$, i.e. impossible, appearing in nested dependencies. These are described by a characteristic function $\mathbb{1}$:

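\[ DD = \{ (v, \delta) \in D \times \mathcal{D} \mid \delta \in \mathcal{D},\ \tau \in T,\ v \in D_\tau,\ \Gamma \vdash \delta : \tau \} \qquad \mathbb{1} : DD \rightarrow \{0, 1\} \]

This is defined as follows.

Definition 5.5.1. Characteristic function $\mathbb{1}$

\[
\begin{array}{l}
\mathbb{1}(v, \top) = 1 \qquad \mathbb{1}(v, \square) = 1 \qquad \mathbb{1}(v, \bot) = 0 \\[4pt]
\mathbb{1}(\{ f_1 = v_1; \dots; f_n = v_n \}, \{ f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n \}) =
  \begin{cases} 1 & \text{when } \mathbb{1}(v_i, \delta_i),\ \forall\, 1 \le i \le n \\ 0 & \text{otherwise} \end{cases} \\[8pt]
\mathbb{1}(C_i[v], [ C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n ]) =
  \begin{cases} 1 & \text{when } \mathbb{1}(v, \delta_i) \\ 0 & \text{otherwise} \end{cases} \\[8pt]
\mathbb{1}((P, (v_k)_{k \in P}), \langle \delta_{def} \rangle) =
  \begin{cases} 1 & \text{when } \mathbb{1}(v_k, \delta_{def}),\ \forall k \in P \\ 0 & \text{otherwise} \end{cases} \\[8pt]
\mathbb{1}((P, (v_k)_{k \in P}), \langle \delta_{def} \triangleright i : \delta_{exc} \rangle) =
  \begin{cases} 1 & \text{when } (\mathbb{1}(v_k, \delta_{def}),\ \forall k \in P,\ k \neq E(i)) \vee (E(i) \in P,\ \mathbb{1}(v_{E(i)}, \delta_{exc})) \\ 0 & \text{otherwise} \end{cases}
\end{array}
\]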
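The first cases of the characteristic function can be transcribed as a recursive check. The Python sketch below is illustrative only: the encoding of values (dictionaries for structures, `(constructor, payload)` pairs for variants, dictionaries from indices to cells for arrays) and of dependencies is an assumption made here, the function name `satisfies` is hypothetical, and the case of arrays with an exceptional index is deliberately left out.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def satisfies(value, dep):
    """Return True when the value has a form allowed by the dependency."""
    if dep == TOP or dep == NOTHING:
        return True                  # no constraint on the form of the value
    if dep == BOT:
        return False                 # impossible: no value is allowed
    if isinstance(dep, dict):        # structure: every listed field must comply
        return all(satisfies(value[f], d) for f, d in dep.items())
    kind = dep[0]
    if kind == "variant":            # only the value's own constructor matters
        constructor, payload = value
        return satisfies(payload, dep[1][constructor])
    if kind == "array":              # uniform array: every cell must comply
        return all(satisfies(cell, dep[1]) for cell in value.values())
    raise ValueError("unsupported dependency: %r" % (dep,))

some_only = ("variant", {"Some": TOP, "None": BOT})
```

With `some_only`, a value built with `Some` is accepted and a value built with `None` is rejected, which is the kind of constraint on inputs that a summary for a given exit label expresses.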

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2, Table 5.1) defined on dependencies. If a dependency is more precise than or equal to another dependency, then it should be interpreted as constraints which are at least as strong as the ones for the other dependency. Given a typing environment $\Gamma$ (Definition 4.3.1):

\[ \forall \tau \in T^*,\quad \delta \sqsubseteq \delta' \implies (D_\tau \cap \mathbb{1}(\bullet, \delta)) \subseteq (D_\tau \cap \mathbb{1}(\bullet, \delta')) \]

where $T^* = \{ \tau \in T \mid \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \}$.

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the constraints semantics of dependencies is that if two dependencies $\delta$ and $\delta'$ can be interpreted as constraints for a value $v$, then their reduction can be interpreted as a constraint for $v$ as well:

\[ \mathbb{1}(v, \delta) \wedge \mathbb{1}(v, \delta') \implies \mathbb{1}(v, \delta \oplus \delta') \]

The converse, which one might expect to be true as well, does not hold because of approximations made by our treatment of arrays.

Given a valuation $E$ (Definition 4.4.2), an intraprocedural dependency summary can be interpreted as a conjunction of the constraints on every variable's value, as given by its associated dependency. We use the notation $E \models \Delta$ to indicate this:

\[ E \models \Delta \implies \forall v \in V,\ \mathbb{1}(E(v), \Delta(v)) \]

Under the appropriate conditions, given a semantic transition $\xrightarrow{\lambda}$ (Definition 4.4.4) from the configuration $\langle E, [s] \rangle$ (Definition 4.4.3) to the valuation $E'$, as defined in Section 4.4, if the intraprocedural summary $\Delta'$ of the successor of statement $s$ on label $\lambda$ represents the semantic interpretation of constraints given $E'$, then the contribution $\llbracket s \rrbracket_\lambda(\Delta')$ (Definition 5.3.6) of the edge labeled with $s$ and $\lambda$ must necessarily represent the semantic interpretation of constraints given $E$. We thus obtain the following:

\[
\begin{array}{lr}
\Gamma \vdash E \implies & (5.1) \\
\Sigma, \Gamma, O \vdash s \rightarrow \lambda \implies & (5.2) \\
\langle E, [s] \rangle \xrightarrow{\lambda} E' \implies & (5.3) \\
\Gamma, E' \vdash \Delta' \implies & (5.4) \\
E' \models \Delta' \implies & (5.5) \\
E \models \llbracket s \rrbracket_\lambda(\Delta') & (5.6)
\end{array}
\]

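We note that, thanks to the subject reduction property (Definition 4.4.7), (5.3) implies that $\Gamma \vdash E'$.

Following from (5.6), when joining the contributions on all labels of the statement $s$, the obtained intraprocedural dependency summary represents the semantic interpretation of the disjunction of constraints given $E$:

\[
\begin{array}{l}
(E \models \llbracket s \rrbracket_{\lambda_1}(\Delta'_1)) \vee_\Delta \dots \vee_\Delta (E \models \llbracket s \rrbracket_{\lambda_n}(\Delta'_n)) \implies \\
E \models (\llbracket s \rrbracket_{\lambda_1}(\Delta'_1) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket_{\lambda_n}(\Delta'_n)) \implies \\
E \models old(\Delta) \implies \\
E \models old(\Delta) \vee_\Delta (\llbracket s \rrbracket_{\lambda_1}(\Delta'_1) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket_{\lambda_n}(\Delta'_n))
\end{array}
\]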

For a predicate $p$ exiting with label $\lambda$ and having the intraprocedural summary $\Delta_\lambda$, the characteristic function, given $I \subseteq E$, a valuation mapping the predicate's inputs to their values, constrains the space of inputs that can make the predicate exit with the label $\lambda$. It thus denotes the necessary conditions on inputs according to the observed execution scenario and can be used as an inversion lemma when reasoning on calls to a predicate.

The soundness of this interpretation, as well as the well-formedness of our dependencies, have been proven in Coq, and the corresponding files can be consulted online (see footnote 1). The mechanized Coq proofs are entirely due to Stéphane Lescuyer. These proofs also deal with deferred dependencies, which will be presented in Chapter 6, but these constitute an extension that does not modify the underlying lattice.

The second interpretation of dependency values focuses on the dependency part and is a partial equivalence relation $\approx$:

\[ TD = \{ (\tau, \delta) \in T \times \mathcal{D} \mid \Gamma \vdash \delta : \tau \} \qquad \approx\ : TD \rightarrow D \times D \]

The partial equivalence relation $\approx^{\tau}_{\delta}$ relates well-typed values of the same type $\tau$. It relates values that only differ in places that are irrelevant according to the dependency $\delta$. It is defined as shown below.

Footnote 1: The corresponding files are provided at the following address: httpajl-demofr2015proveCoq


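Definition 5.5.2. Partial Equivalence Relation $\approx^{\tau}_{\delta}$

\[
\begin{array}{l}
\approx^{\tau}_{\top} = \{ (x, x) \mid x \in D_\tau \} \\[2pt]
\approx^{\tau}_{\square} = \{ (x, y) \mid x, y \in D_\tau \} \\[2pt]
\approx^{\tau}_{\bot} = \{ (x, y) \mid x, y \in D_\tau \} \\[6pt]
\approx^{struct\{f_1 : \tau_1; \dots; f_n : \tau_n\}}_{\{f_1 \mapsto \delta_1; \dots; f_n \mapsto \delta_n\}} = \{ (\{ f_1 = v_1; \dots; f_n = v_n \}, \{ f_1 = w_1; \dots; f_n = w_n \}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i} \} \\[6pt]
\approx^{variant[C_1 : \tau_1 | \dots | C_n : \tau_n]}_{[C_1 \mapsto \delta_1; \dots; C_n \mapsto \delta_n]} = \{ (C_i[v_i], C_i[w_i]) \mid (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i} \} \\[6pt]
\approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \rangle} = \{ ((P, (v_k)_{k \in P}), (P, (w_k)_{k \in P})) \mid \forall k,\ (v_k, w_k) \in\ \approx^{\tau}_{\delta_{def}} \} \\[6pt]
\approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle} = \{ ((P, (v_k)_{k \in P}), (P, (w_k)_{k \in P})) \mid E(i) \in P \implies (v_{E(i)}, w_{E(i)}) \in\ \approx^{\tau}_{\delta_{exc}},\ \forall k \neq E(i),\ (v_k, w_k) \in\ \approx^{\tau}_{\delta_{def}} \}
\end{array}
\]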
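The relation can be read as a recursive comparison that ignores the parts of two values on which nothing depends. The Python sketch below follows the cases of the definition under an assumed encoding (dictionaries for structures and arrays, `(constructor, payload)` pairs for variants, tagged tuples for variant and array dependencies); the function name `equivalent` and the parameter `env`, which plays the role of the valuation used to resolve the exceptional index, are hypothetical. Structure fields omitted from a dependency are treated as mapped to Nothing.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"

def equivalent(x, y, dep, env=None):
    """Return True when x and y agree on every part that dep marks as needed."""
    if dep == TOP:
        return x == y                     # everything is needed
    if dep == NOTHING or dep == BOT:
        return True                       # all values of the type are related
    if isinstance(dep, dict):             # structure, field by field
        return all(equivalent(x[f], y[f], d, env) for f, d in dep.items())
    kind = dep[0]
    if kind == "variant":                 # same constructor, related payloads
        return x[0] == y[0] and equivalent(x[1], y[1], dep[1][x[0]], env)
    if x.keys() != y.keys():              # arrays must share the support P
        return False
    if kind == "array":
        return all(equivalent(x[k], y[k], dep[1], env) for k in x)
    if kind == "array_exc":
        _, default, index, exc = dep
        special = env[index]              # value of the exceptional index
        return all(equivalent(x[k], y[k], exc if k == special else default, env)
                   for k in x)
    raise ValueError("unsupported dependency: %r" % (dep,))

threads_dep = ("array_exc", NOTHING, "i", ("variant", {"Some": TOP, "None": BOT}))
a = {0: ("Some", 1), 1: ("Some", 2)}
b = {0: ("Some", 1), 1: ("None", None)}
```

Here `a` and `b` are related when `i` is 0, since they differ only in a cell that is not needed, and unrelated when `i` is 1.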

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2) defined on dependencies. If a dependency is more precise than or equal to another dependency, then it should be interpreted as an equivalence relation relating more values:

\[ \delta \sqsubseteq \delta' \implies\ \approx^{\tau}_{\delta}\ \supseteq\ \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \]

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the equivalence relation interpretation of dependencies is that the set of values related by $\delta \oplus \delta'$ is a subset of the intersection of the values related by $\delta$ and $\delta'$, respectively:

\[ \approx^{\tau}_{\delta \oplus \delta'}\ \subseteq\ \approx^{\tau}_{\delta} \cap \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \]

The interpretation of the $\vee$ operator (Definition 5.2.3, Table 5.2) with respect to the equivalence relation interpretation of dependencies is similar:

\[ \approx^{\tau}_{\delta \vee \delta'}\ \subseteq\ \approx^{\tau}_{\delta} \cap \approx^{\tau}_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \]

Two valuations $E$ and $E'$ are equivalent modulo an intraprocedural dependency summary $\Delta$ if the values that they associate to variables are equivalent modulo the corresponding dependency associated in $\Delta$:

\[ E \approx^{\Gamma}_{\Delta} E' \implies \forall v \in \Delta,\ E(v) \approx^{\Gamma(v)}_{\Delta(v)} E'(v) \]

The equivalence relation $\approx^{\Gamma}_{\Delta}$ thus relates valuations that are not distinguishable by only looking at the parts specified by the intraprocedural dependency summary $\Delta$.

This interpretation can be used to apply congruence modulo reasoning to predicate calls. If a predicate $p$ is called with two sequences of input values $\bar{v}$ and $\bar{u}$, respectively, which are related by the intraprocedural dependency summary of $p$ on label $\lambda$, then the predicate will necessarily exercise the same execution scenario, exiting with label $\lambda$, and will yield identical outputs $\bar{w}$.

5.6 Related Work

The frame problem and its manifestations in the software verification process, i.e. detecting program properties that remain unchanged under a certain operation, are notorious (Leavens, Leino, and Müller, 2007; Leavens and Clifton, 2005; O'Hearn, 2005). A complete specification of a program will necessarily include frame properties (Borgida, Mylopoulos, and Reiter, 1995). However, though necessary, specifying and verifying frame properties is tedious and repetitive. Two prominent solutions to the frame problem come from separation logic (Reynolds, 2005; Distefano, O'Hearn, and Yang, 2006; Calcagno et al., 2011) and ownership types (Clarke and Drossopoulou, 2002). However, Meyer (Meyer, 2015) argues that the problem itself should not impose such annotation-heavy solutions. Simpler, automatic solutions for their specification and verification would allow programmers to concentrate on the truly challenging part (Meyer, 2015).

Though we share the same desideratum with separation logic (Reynolds, 2002; Reynolds, 2005; O'Hearn, 2012; O'Hearn, Yang, and Reynolds, 2004), the programming paradigm and context under which we operate lead to a considerably different solution. Separation logic is targeted at low-level imperative programming languages, and its applications focus on shared mutable data structures. We, on the other hand, focus on a purely functional language and consider immutable algebraic data structures and arrays. We treat mappings between variables and values and analyse their evolution in a side-effect free environment, in the context of verification of programs where a new output is obtained by altering just a subset of the input's subelements and preserving the rest. Instead of using a collection of Hoare triples as an abstract domain, we have defined our own dependency domain. The results of our dependency analysis are close to the concept of a footprint (Distefano, O'Hearn, and Yang, 2006; Hur, Dreyer, and Vafeiadis, 2011; Bobot and Filliâtre, 2012), in the sense that they describe an over-approximation of only those variables and subelements that are needed by a program and are expressed as an input-output relation.

The dependency results computed by our analysis are similar to the primitive read and write effects used in ownership type systems (Clarke and Drossopoulou, 2002). Write effects, in our case, are implicit and include strictly the output variables associated to an exit label. Read effects can only refer to input variables of a predicate. Also, in ownership type systems, read effects comprise the whole execution of a method, even if they are irrelevant for the method's results. We, however, ignore read effects on which the output does not depend, reflecting only those which contribute to the observed result. A technique for declaring and verifying read effects in an ownership type system is presented in (Clarke and Drossopoulou, 2002). We use static analysis to automatically detect them. In the Spec# (Mike Barnett, 2005) program verifier, the notion of confined is used for describing the reading effects of a pure method in terms of the ownership cone (Clarke, Potter, and Noble, 1998) of its parameters.

In (Hughes, 1987), Hughes argues that analyses of programs that manipulate data structures should ideally distinguish between the information they are computing for a data structure as a whole and the information computed for each component within it. The information that is computed by a backward analysis is dubbed generically as context. A manner of constructing richer domains is described, and it is argued that, for instance, a context for a sum type must contain (sub)contexts for any of its summands. Similarly, for product types, a context should include a (sub)context for each component, as well as a context referring to the value as a whole. We target fine-grained dependency information for structures, variants and arrays. Similarly to the described product type contexts, our dependencies for structures describe the dependency on each of the structure's fields. Variant dependencies are expressed in terms of the dependencies of their constructors, i.e. their summands. Furthermore, it is argued that any context should include a maximal element, interpreted as a "no information" value, a minimal element, interpreted as "contradictory requirements", and an element representing "no context" or "unused". Close to the notion of "contradictory requirements", we include an atomic value denoting impossible in our dependency domain. Program points having a "contradictory requirements" context denote points in the program that will lead to crashes if reached. Our notion of impossible refers to nodes that are unreachable or constructors that cannot occur on a given execution path. Our maximal element, denoting everything, is a safe value, close to the notion of "no information". Nothing, an element different from both everything and impossible, is similar to the notion of "unused". It denotes (sub)elements that are irrelevant and constitutes quite definite information.

Hughes (Hughes 1987) introduces a notion of needed/unneeded parameters for programs manipulating lists. This enables detecting whether the value of a subterm is ignored. The method is formulated in terms of a fixed, finite set of projection functions. Multiple other approaches and analyses focus on the elimination of unnecessary data structures (Cousot and Cousot 1994), the filtering of useless arguments and unnecessary variables in the context of logic programming (Leuschel and Sørensen 1996) and, more recently, the removal of redundant arguments (Alpuente, Escobar and Lucas 2007).

The concept of a context is further discussed by Wadler and Hughes in (Wadler and Hughes 1987). The authors describe a technique for strictness analysis for non-flat list domains that relies on contexts represented using the notion of projections from domain theory. These allow expressive list descriptions, such as contexts specifying that, while a list's elements can be ignored, its length is relevant. Their backward analysis computes the necessary information using a fixed, finite abstract domain.

Leino and Müller (Leino and Müller 2008b) present a technique for verifying that methods that query the state of identical data structures return identical or equivalent results. They stress the frequency of such assumptions in program verification, as well as the counter-intuitive amount of effort required for the specification and verification of such equivalent-results methods and their callers. One of the two interpretations of our dependency values, $\approx^{\tau}_{\delta}$, is an equivalence relation binding pairs of values that are not distinguishable by considering only the parts specified by the dependency domain. Thus, it ensures not only that identical input data structures will lead to identical results, but also that different invocations of a predicate with input data structures that are congruent with respect to this interpretation will lead to identical results. Our dependencies are similar to the influence sets presented by Leino and Müller. Influence sets are represented as sets of heap locations and they are used to specify the parts of the program state that are allowed to impact the return values. Influence sets are user-defined and they are required to be self-protecting. This property is enforced by requiring the set of path expressions specifying the influence set to be prefix-closed, a constraint which is then checked syntactically. In contrast, our dependencies are computed by static analysis. Influence sets may depend on the heap. Reasoning about heap locations is beyond the scope of our analysis. We treat mappings between variables and values, analyse their evolution in a side-effect free environment and express dependencies as input-output relations. The technique presented by Leino and Müller has been applied for reasoning about pure methods (Leino, Müller and Wallenburg 2008; Hatcliff et al. 2012; Nordio et al. 2010; Banerjee and Naumann 2014).

Identifying the input (sub)parts on which a predicate's outputs depend can also be seen as an instance of secure information flow (Sabelfeld and Myers 2003), where the predicate's outputs and the input (sub)parts appearing in the predicate's dependency summary have a low-security level, i.e. are public, and everything else has a high-security level, i.e. is private. The first interpretation of our dependency values mirrors the notion of non-interference as given by Volpano et al. in (Volpano, Irvine and Smith 1996) for deterministic programs. By only observing the public parts, nothing can be concluded about the private parts. The link between permissions and ownership types has been underlined by Zhao and Boyland (Zhao and Boyland 2008).

Liu and Stoller present a backward dependence analysis for the computation of dead code (Liu and Stoller 2003). They obtain expressive descriptions of partially dead recursive data using liveness patterns. These are based on general regular tree grammars that were extended with two notions: live and dead. Users can specify liveness patterns at particular program points of interest. The analysis then uses these and computes liveness patterns at all program points, based on constraints derived from the programming language semantics and from the program itself. The obtained information is meant to be used for identifying and eliminating dead code. In a separate paper (Liu 1998), Liu presents three approximation operations meant to guarantee termination in the context of fixed point computations using general grammar transformers on potentially infinite grammar domains.

Static dependence or liveness analyses are typically used for code optimization, dead code elimination (Liu and Stoller 2003) and compile-time garbage collection, but only seldom for program verification. One exception that we are aware of comes from Frama-C (Cuoq et al. 2012), where it is used in a purely automatic setting and, unlike our analysis, it does not handle unions and arrays. A plug-in based on the available value analysis (Frama-C Value Analysis User Manual) computes lists of input and output locations for each function, distinguishing between operational, functional and imperative inputs and outputs. Dependencies computed for an output o hold if and when the analysed function terminates. They are represented as sets of variables whose initial value can influence the final value of o. Input variables appearing in this set are called functional inputs. Imperative inputs are the locations that may be read during the execution of the analysed function. An over-approximation of the set of these locations is computed: locations that are read only in non-terminating branches are included in the imperative inputs set as well. Operational inputs are the memory zones that are read without having been previously written to.

5.7 Conclusion

In the context of interactive formal verification of complex systems, considerable effort is spent on proving the preservation of the system's invariants. However, most operations have a localised effect on the system, which only really impacts few invariants at the same time. Identifying those invariants that are unaffected by an operation can substantially ease the proof burden for the programmer.

In this chapter we have presented a data-flow analysis that computes a conservative approximation of the input fragments on which the operations depend. It is a flow-sensitive, path-sensitive, interprocedural dependency analysis that handles arrays, structures and variants. For the latter, it simultaneously computes a subset of possible constructors. We have defined our own abstract dependency domain and we obtain dependency information that mirrors the layered structure of compound data types.

The main original traits of this contribution stem from its design as an analysis meant to be used as a companion tool during interactive program verification, in a unified manner, on programs as well as on specifications.

We have implemented a prototype of the dependency analysis in OCaml and we have applied it to a functional specification of ProvenCore (Lescuyer 2015), a general-purpose microkernel that ensures isolation. Its proof is based on multiple refinements between successive models, from the most abstract one, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. Medium-sized experiments performed on the abstract layers of ProvenCore show positive results. For instance, the dependency results of approximately 630 αSmil predicates, totalling approximately 10000 lines of code, are obtained in less than 1 second. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative dependency summaries without sacrificing scalability. The implementation and the obtained results will be presented and discussed in detail in Chapter 8. The prototype can be tested on the web page² dedicated to our dependency analysis, where various examples are provided and explained. Additionally, users can devise and test their own examples.

An obvious first challenge is to address the issue of context-sensitivity. In the following chapter we present a solution based on lazy components, which are included in our interprocedural dependency summaries. The current intraprocedural context is injected in them on an as-needed basis. As we will show in Chapter 6, these lead to improved precision with only a marginal decrease in performance.

² Dependency Analysis Web Page: http://ajl-demo.fr/2015

Our main goal is to combine the dependency analysis with the correlation analysis presented in Chapter 7, which is meant to detect relations between inputs and outputs. By uncovering partial equivalence relations between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, it could have applications in the testing realm: the computed dependency information could be used for designing and generating test suites that avoid redundant testing of the same execution scenario. Based on the second interpretation, $\approx^{\tau}_{\delta}$, of our dependency information, given in Section 5.5, classes of inputs that will test the same execution scenario can be determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted and their values should be varied for more comprehensive testing. Since our dependency analysis computes results for every exit label of an αSmil predicate, it could also facilitate unit testing for exceptions. Furthermore, the computed dependency information could provide assistance in specifying read effects of predicates, similar to accessible clauses (Leavens et al. 2006) in JML.

The dependency analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen and Lescuyer 2015).


Chapter 6

Deferred Dependencies: Injecting Context in Dependency Summaries

No symbols where none intended

Samuel Beckett

6.1 Dealing with Context-Insensitivity

Traditionally, the precision of static analyses is characterized along several axes, including the scope of the analysis, i.e. intraprocedural or interprocedural analyses, and different nuances of sensitivity relative to the analysis' use of control-flow information or of information pertaining to the calling context. This classification and terminology has its origins in data-flow analyses (Hind 2001; Midtgaard 2012). Regarding scope, intraprocedural analyses are local and operate within the boundaries of procedures. In contrast, interprocedural analyses are global and operate across procedure calls (Midtgaard 2012). These are somewhat more challenging and costly to perform and require dealing with parameter mechanisms.

Another important distinction is made regarding the calling context. Context-sensitive analyses distinguish between different calling contexts. At the other end of the spectrum, context-insensitive analyses compute information only once and subsequently use the same information at all calling sites. Clearly, a context-sensitive analysis is more precise than a context-insensitive analysis, but it is also more costly (Nielson, Nielson and Hankin 1999). The choice between which technique to use amounts to a careful balance between precision and efficiency (Nielson, Nielson and Hankin 1999). The dependency analysis presented in the previous chapter is an interprocedural, flow-sensitive, context-insensitive data-flow analysis. Regarding pure context-sensitivity, in a functional language such as αSmil, in which predicate calls and the manipulation of the returned outputs are omnipresent, unfolding predicates at each call site and recomputing the needed information seems to be a daunting perspective that risks becoming prohibitive in terms of execution time very quickly. On the other hand, choosing to analyse predicates in isolation and to dispense completely with information regarding the calling context leads to clear precision losses, as illustrated in Section 5.4.1 and discussed in Section 5.4.2. In order to address this aspect, we have devised a solution based on symbolic dependencies that requires an extension of our abstract dependency domain (Definition 5.2.1), but which otherwise has a minimal impact on the dependency analysis at an intraprocedural and interprocedural level.

Outline. In this chapter we present our solution based on symbolic dependencies. We start by illustrating the addressed problem and the desired results in Section 6.2. In Section 6.3 and Section 6.4 we present the extended abstract dependency domain. We show the insertion and use of symbolic components at the intra- and interprocedural level of our dependency analysis in Section 6.5 and Section 6.6, respectively. Finally, we discuss their impact on the precision of the computed dependency information.

6.2 Symbolic Dependency Components in a Nutshell

Symbolic dependency components allow us to compute interprocedural predicate summaries with lazy components, in which the caller's intraprocedural information and context can be injected on an as-needed basis. The interprocedural dependency information for each predicate is still computed only once and propagated back to every possible call site. However, even though the analysis does not systematically recompute the dependency for the called predicate, it shows a form of context-sensitivity (Hind 2001) and leads to increased precision by creating templates with symbolic elements for each predicate. These elements introduce degrees of freedom in our interprocedural dependencies and allow us to parameterize and vary them according to the caller's actual intraprocedural context. Thus, we exclude some sources of coarse over-approximations without sacrificing scalability.

Previously, in Section 5.4.1, we illustrated on two αSmil example predicates, thread and start_address, how failing to take into consideration the current context of a caller leads to over-approximations. We argued in Section 5.4.2 that a more precise dependency blueprint can emerge once we consider a predicate's use as well. The first example predicate given in Chapter 5, thread, is an accessor predicate: it receives a process p and an index i as inputs and returns the i-th associated thread of the process p when executing successfully, i.e. when exiting with the true label. The predicate's dependency summary computed for the successful execution scenario was the following:

\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright i : [\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot] \,\rangle \\
i \mapsto \top
\end{array}
\]

This dependency information is expressive: it shows that only one of the four fields of the input process is read by the predicate, while all others are irrelevant for its output. The read field, threads, corresponds to the array of threads associated to the input process p. Furthermore, the dependency summary shows that, for this array, only the i-th element is inspected. This element is entirely needed, while all others are irrelevant. This summary provides a rather detailed and precise blueprint of the predicate's output dependencies on its inputs. Yet, it fails to make one subtle but important distinction regarding the dependency on the i-th element of the associated threads array. If we want to be more accurate while describing this predicate's dependency, we need to acknowledge that the predicate itself does not actually need or depend on the i-th associated thread of the process. Indeed, it does not read or use it per se: it merely retrieves it. Thus, the dependency on the input process' i-th associated thread is relative to the extent to which the callers of the thread predicate will use the output element in which it is retrieved. It is important to distinguish between these two rather subtle nuances. Failing to do so can shadow information that is computed while analysing callers of the thread predicate. This was exactly what happened for our second example predicate, start_address. The predicate start_address receives a process p and an index j as inputs. It makes a call to the predicate thread, thus reading the j-th associated element of the process p. If this is an active element, it further accesses the field stack, from which it only reads the start address start. The obtained dependency result

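\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright j : [\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot] \,\rangle \\
j \mapsto \top
\end{array}
\]

was an over-approximation of the desired dependency result:

\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright j : [\mathit{Some} \mapsto t \mapsto \mathit{stack} \mapsto \mathit{start} \mapsto \top;\ \mathit{None} \mapsto \bot] \,\rangle \\
j \mapsto \top
\end{array}
\]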

Intraprocedurally, the dependency analysis correctly detected that only the field stack of the thread was needed, for which only the start field was read. However, when joining the dependency information computed locally for start_address with the one given by the dependency summary of the predicate thread, we obtain less precise dependency results. This scenario is not a corner case: it would typically be exhibited in the case of accessor predicates and their callers.

In order to address this source of precision loss, we can introduce symbolic or lazy components in our abstract dependency domain. As a first attempt and approximation, we could consider the set of output variables of a predicate as the lazy components. These can be seen as the points at which a caller predicate may insert its intraprocedural information in the dependency summary computed for the callee predicate.

The dependency summary for a successful execution of the thread predicate, i.e. the true exit label, would therefore not map the i-th element of the threads array to everything, i.e. $\top$, the top element of our abstract dependency domain. Instead, this would be mapped to the symbolic set of output variables in which this input subelement is retrieved, i.e. the set containing the ti output variable. We denote this set by Deferred(ti), as it represents the set of points in which a caller predicate can inject its context. Establishing the dependency on the i-th associated thread of the input process p is thus deferred, or postponed, and left to the caller predicates: it is relative to their context and to the extent to which they use the output ti.


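\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright j : [\mathit{Some} \mapsto t \mapsto \mathit{Deferred}(ti);\ \mathit{None} \mapsto \bot] \,\rangle \\
j \mapsto \top
\end{array}
\]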

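Using this dependency summary when computing the information for the predicate start_address, we would obtain the targeted dependency result:

\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright j : [\mathit{Some} \mapsto t \mapsto \mathit{stack} \mapsto \mathit{start} \mapsto \mathit{Deferred}(adr);\ \mathit{None} \mapsto \bot] \,\rangle \\
j \mapsto \top
\end{array}
\]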

This dependency summary for start_address shows that the dependency on the j-th associated thread of the input process p depends on the extent to which the output adr, representing the start address of the thread's stack, is subsequently used. Indeed, start_address itself is an accessor predicate.
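The injection of a caller's context into a callee summary can be illustrated with a minimal Python sketch. It is not the OCaml prototype: the encoding of dependencies as strings, dictionaries and tuples, and the names `inject`, `thread_summary` and `caller_use`, are assumptions made for this illustration. The sketch also leaves out the renaming of the index i into j at the call site and the intermediate `t` level shown in the summaries above.

```python
# Hypothetical encoding (not the thesis' OCaml prototype):
#   "top" / "nothing" / "impossible"   atomic dependencies
#   {"label": dep, ...}                dependency of a structure or a variant
#   ("array", default, index, exc)     all cells `default`, except cell `index`
#   ("deferred", output_name)          lazy component tied to an output
TOP, NOTHING, IMPOSSIBLE = "top", "nothing", "impossible"


def inject(dep, context):
    """Replace each deferred component by the caller's use of that output."""
    if isinstance(dep, dict):
        return {label: inject(sub, context) for label, sub in dep.items()}
    if isinstance(dep, tuple) and dep[0] == "deferred":
        # Unknown use of the output: fall back to the safe value "everything".
        return context.get(dep[1], TOP)
    if isinstance(dep, tuple) and dep[0] == "array":
        _, default, index, exc = dep
        return ("array", inject(default, context), index, inject(exc, context))
    return dep  # atomic dependency


# Summary of `thread` on its `true` exit label, with a deferred component.
thread_summary = {
    "p": {"threads": ("array", NOTHING, "i",
                      {"Some": ("deferred", "ti"), "None": IMPOSSIBLE})},
    "i": TOP,
}

# A caller such as start_address only reads ti.stack.start.
caller_use = {"ti": {"stack": {"start": TOP}}}

refined = inject(thread_summary, caller_use)  # context-aware result
coarse = inject(thread_summary, {})           # no context: everything
```

With the caller's use supplied, the `Some` case is narrowed to the `stack.start` field. With an empty context the sketch falls back to everything, which corresponds to the context-insensitive result of Chapter 5.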

This first approximation of lazy components as sets of output variables of a predicate is effective for accessor predicates. However, its limitations become visible when considering functional, non-destructive mutator predicates, for example. Such predicates receive a compound input, destructure it and construct a new output variable. This is created by modifying only one of the compound input's subelements and by copying all the rest without further changes. For example, the predicate set_thread shown below is the dual of our thread example predicate. It receives a process p, a thread ti and an index i as inputs and returns a new process r as an output, obtained by setting the i-th associated thread in the threads array to ti and by copying everything else from p.

predicate set_thread (process p, int i, thread ti)
-> [true: process r]
    array<option<thread>> threads
    option<thread> tio

    r = p                               [true -> 1]
    threads = r.threads                 [true -> 2]
    tio = Some(ti)                      [true -> 3]
    threads = [threads with i = tio]    [true -> 4, false -> 6]
    r = r with threads = threads        [true -> 5]
    [true]
    [error]

The dependency summary computed for this predicate on the exit label true is shown below. It indicates that the given inputs, the index i and the thread ti used for updating the i-th associated thread of the output process r, are completely needed. For the input process p, the fields pid, crt_thread and adr_space are completely needed as well. They are copied without further changes to the output r. From the array of associated threads, all elements except the i-th are needed as well. The latter is completely irrelevant, since it is replaced in the output r by the given ti. The former are simply read and copied to r.


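\[
\begin{array}{l}
p \mapsto \left\{ \begin{array}{l}
\mathit{threads} \mapsto \langle\, \top \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \top \\
\mathit{crt\_thread} \mapsto \top \\
\mathit{adr\_space} \mapsto \top
\end{array} \right. \\
i \mapsto \top \\
ti \mapsto \top
\end{array}
\]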

At first glance, this dependency summary seems to reflect rather accurately the predicate's inputs and input subelements on which the output process r depends. However, similarly to the accessor predicate thread, a further distinction is possible. The predicate set_thread does not itself depend on the input ti, nor on the fields of the process p. It does not use these for new computations: it simply copies them to the corresponding output subelements. Just as before, the extent to which the output's subelements are subsequently used characterizes more precisely the dependency on the inputs of set_thread. For instance, the dependency on p's current thread field should be the symbolic element corresponding to the crt_thread field of the output process. However, our first attempt at representing symbolic elements as sets of output variables seen as a whole does not allow us to convey such information. For expressing it, we first need to be able to refer to the substructure r.crt_thread and use this as a lazy component in which callers may inject their own context. Similarly, for the threads array, we need to be able to refer to all other elements except the i-th one. Thus, at the symbolic dependencies level as well, we need the capability of distinguishing between the different subelements of the inputs. This would allow us to obtain the following dependency summary:

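\[
\begin{array}{l}
p \mapsto \left\{ \begin{array}{l}
\mathit{threads} \mapsto \langle\, \mathit{Deferred}(r.\mathit{threads}\langle * \setminus i \rangle) \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \mathit{Deferred}(r.\mathit{pid}) \\
\mathit{crt\_thread} \mapsto \mathit{Deferred}(r.\mathit{crt\_thread}) \\
\mathit{adr\_space} \mapsto \mathit{Deferred}(r.\mathit{adr\_space})
\end{array} \right. \\
i \mapsto \top \\
ti \mapsto \mathit{Deferred}(r.\mathit{threads}\langle i \rangle.\mathit{Some}.t)
\end{array}
\]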

One way to capture the actual effect that is due to set_thread consists in replacing all deferred dependencies with $\square$, i.e. nothing, and simplifying the summary. The dependency summary thus obtained shows the dependency on set_thread's inputs in the extreme case of calling the predicate and throwing away its result. In this case, the summary for set_thread would show that the predicate only depends on the input i and on the length, or support, of the threads array, captured by $\langle \square \rangle$. On the contrary, by replacing the deferred dependencies with $\top$, i.e. everything, we obtain exactly the results computed by the context-insensitive dependency analysis presented in Chapter 5. The information thus obtained shows the dependency on set_thread's inputs when considering the other end of the spectrum, namely calling the predicate and using its result entirely.
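The two extreme instantiations can be sketched in Python as follows. This is only an illustration under assumptions: the tuple and dictionary encoding, the function name `instantiate`, the opaque string labels of the deferred components and the single simplification rule (an array whose default and exceptional dependencies coincide collapses to a uniform array dependency) are not taken from the prototype.

```python
# Hypothetical encoding: "top" / "nothing" are atomic dependencies,
# dicts describe structures, ("array", default, index, exc) an array with one
# exceptional cell, ("all", dep) a uniform array, ("deferred", label) a lazy
# component.
TOP, NOTHING = "top", "nothing"


def instantiate(dep, value):
    """Replace every deferred component by `value` and simplify arrays."""
    if isinstance(dep, dict):
        return {field: instantiate(sub, value) for field, sub in dep.items()}
    if isinstance(dep, tuple):
        tag = dep[0]
        if tag == "deferred":
            return value
        if tag == "array":
            default = instantiate(dep[1], value)
            exc = instantiate(dep[3], value)
            # Same dependency for every cell: no exceptional cell is needed.
            if default == exc:
                return ("all", default)
            return ("array", default, dep[2], exc)
        if tag == "all":
            return ("all", instantiate(dep[1], value))
    return dep  # atomic dependency


set_thread_summary = {
    "p": {
        "threads": ("array", ("deferred", "r.threads<*\\i>"), "i", NOTHING),
        "pid": ("deferred", "r.pid"),
        "crt_thread": ("deferred", "r.crt_thread"),
        "adr_space": ("deferred", "r.adr_space"),
    },
    "i": TOP,
    "ti": ("deferred", "r.threads<i>.Some.t"),
}

result_unused = instantiate(set_thread_summary, NOTHING)
result_fully_used = instantiate(set_thread_summary, TOP)
```

In `result_unused` only the index i and the support of the threads array remain relevant, whereas `result_fully_used` coincides with the context-insensitive summary given earlier for set_thread.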


The dependency summary with deferred occurrences is indeed precise. Not only does it create a dependency template in which callers can inject their own context, but it also distills the specification of the set_thread predicate. A quick glance at it indicates that set_thread is indeed a non-destructive mutator, updating the i-th associated thread of a process to ti and preserving everything else.

In order to obtain such dependency summaries, we need to refine our first approximation of symbolic elements as sets of a predicate's output variables. Just as needed in our initial abstract dependency domain, we must reflect the layered structure of algebraic data types and arrays at the level of symbolic dependencies as well. To this end, we need to consider not only sets of output variables, but also symbolic paths to substructures within them.

6.3 Symbolic Paths

6.3.1 Symbolic Path Type

In order to extend our abstract dependency domain with symbolic dependencies and to obtain expressive dependency summaries such as the ones discussed in the previous section, we begin by introducing symbolic paths. These are meant to mirror the layered structure of algebraic data types and arrays at the level of symbolic dependencies.

Each deferred occurrence in a dependency summary is identified by symbolic paths. Symbolic paths are rooted at one of the program's variables and represent sequences of symbolic internal accesses inside some value's structure, i.e. they are symbolic traversals from one value to some of its subparts. Paths are chains of symbolic accesses leading to nested elements in which different calling contexts can be subsequently injected. We define a recursive type $\pi$ of symbolic paths encompassing this.

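Definition 6.3.1. Symbolic path type $\pi \in \Pi$

\[
\begin{array}{rcll}
\pi \in \Pi \qquad \pi & ::= & \varepsilon & \text{endpoint (root)} \\
 & \mid & f.\pi & f \in \mathcal{F} \\
 & \mid & C.\pi & C \in \mathcal{C} \\
 & \mid & \langle i \rangle \pi & i \text{ index} \\
 & \mid & \langle * \setminus i \rangle \pi & i \text{ index} \\
 & \mid & \langle * \rangle \pi &
\end{array}
\]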

An endpoint, denoted by $\varepsilon$, is the special path denoting an entire element. For structures, we denote the symbolic path to some field f by $f.\pi$. Similarly, for variants, we denote the path to some chosen constructor C by $C.\pi$. For arrays, we distinguish between three cases:

• symbolic paths referring to a specific array cell, identified by the cell's index i and denoted by $\langle i \rangle \pi$;

• symbolic paths referring to all but one specific array cell, identified by its index i and denoted by $\langle * \setminus i \rangle \pi$;

• symbolic paths referring to all the cells of an array, denoted by $\langle * \rangle \pi$.
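As an illustration of Definition 6.3.1, symbolic paths can be encoded in Python. The sketch below makes an assumption that departs from the recursive presentation: a path is flattened into a tuple of steps, with the empty tuple standing for the endpoint. The helper names (`field`, `ctor`, `cell`, `all_but`, `all_cells`, `show`) are hypothetical.

```python
from typing import Tuple

# Hypothetical flat encoding of Definition 6.3.1: a symbolic path is a tuple
# of steps, and the empty tuple plays the role of the endpoint epsilon.
Step = Tuple[str, ...]
Path = Tuple[Step, ...]


def field(f: str) -> Step:
    return ("field", f)      # access to field f of a structure


def ctor(c: str) -> Step:
    return ("ctor", c)       # access under constructor C of a variant


def cell(i: str) -> Step:
    return ("cell", i)       # the array cell identified by index variable i


def all_but(i: str) -> Step:
    return ("all_but", i)    # every array cell except the one at index i


def all_cells() -> Step:
    return ("all",)          # every cell of the array


def show(path: Path) -> str:
    """Render a path in a notation close to the one used in the text."""
    parts = []
    for step in path:
        kind = step[0]
        if kind in ("field", "ctor"):
            parts.append("." + step[1])
        elif kind == "cell":
            parts.append("<" + step[1] + ">")
        elif kind == "all_but":
            parts.append("<*\\" + step[1] + ">")
        elif kind == "all":
            parts.append("<*>")
        else:
            raise ValueError(f"unknown step: {step!r}")
    return "".join(parts) or "eps"


# Paths inside the output process r of set_thread.
other_threads = (field("threads"), all_but("i"))
updated_thread = (field("threads"), cell("i"), ctor("Some"))
```

Index steps carry the name of an index variable rather than a number, which is what makes these paths symbolic.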

With one exception, these symbolic paths directly reflect the cases of our abstract dependency domain. For instance, the correspondence between symbolic paths and the dependencies for structures or variants is immediately apparent. In contrast, for arrays, the abstract dependency domain included two cases, namely $\langle \delta \rangle$, corresponding to a dependency applying to all of the cells, and $\langle \delta_{def} \triangleright i : \delta_{exc} \rangle$, corresponding to arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known. In order to reflect the second case in the deferred occurrences, we need to be able to refer to the exceptional cell on the one hand, and to all other cells of the array on the other hand. To this end, we need to introduce two symbolic path types: the symbolic $\langle i \rangle \pi$ path for expressing deferred occurrences of exceptional cells, and the $\langle * \setminus i \rangle \pi$ symbolic path for expressing deferred occurrences of all the other array cells except the one identified by i.

The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi \cdot \pi'$. We call $\cdot$ the extension operator and, when applying it, we say that we extend $\pi$ with $\pi'$.

We further consider sets $P \subset \Pi$ of symbolic paths $\pi$ and define the partial order $\sqsubseteq$ between them.

Definition 6.3.2. Partial Order $\sqsubseteq$ for Path Sets

\[
\forall P \subset \Pi,\ \forall P' \subset \Pi,\quad P \sqsubseteq P' \iff P \subseteq P'
\]

Path sets establish a semi-lattice based on the subset order. The bottom element of this semi-lattice is $\emptyset$, the empty set of paths:

\[
\forall P \subset \Pi,\quad \emptyset \sqsubseteq P
\]

There is no top element. Theoretically, this would correspond to the set representing all possible paths. In practice, this cannot be constructed and we chose not to add a special case for it to our symbolic path type $\pi$.

The join operation of deferred path sets is based on set union and is denoted by $\vee$.

Definition 6.3.3. Join Operation $\vee$ for Path Sets

\[
\forall P \subset \Pi,\ \forall P' \subset \Pi,\quad P \vee P' = P \cup P'
\]

It is symmetric and the value obtained by joining two path sets is the least upper bound. Applying the extension operator $\cdot$ on a set of symbolic paths P amounts to a pointwise extension of each member of the path set.

Definition 6.3.4. Extension Operator $\cdot$ for Path Sets

\[
\forall P \subset \Pi,\quad P \cdot \pi' = \{\, \pi \cdot \pi' \mid \pi \in P \,\}
\]
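Definitions 6.3.2 to 6.3.4 translate almost directly into operations on finite sets. The following Python sketch assumes a hypothetical encoding of paths as tuples of steps (the empty tuple being the endpoint) and of path sets as frozensets. The function names are illustrative only.

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: a path is a tuple of steps, a path set a frozenset.
Step = Tuple[str, ...]
Path = Tuple[Step, ...]
PathSet = FrozenSet[Path]


def leq(p: PathSet, q: PathSet) -> bool:
    """Partial order on path sets: plain set inclusion."""
    return p <= q


def join(p: PathSet, q: PathSet) -> PathSet:
    """Join of two path sets: set union, their least upper bound."""
    return p | q


def extend(p: PathSet, suffix: Path) -> PathSet:
    """Pointwise extension of every path of the set with a non-empty path."""
    if not suffix:
        raise ValueError("the appended path must be non-empty")
    return frozenset(path + suffix for path in p)


threads = (("field", "threads"),)
pid = (("field", "pid"),)
P = frozenset({threads})
Q = frozenset({pid})

joined = join(P, Q)
extended = extend(joined, (("all_but", "i"),))
```

Since the order is set inclusion, `join(P, Q)` is an upper bound of both arguments and the empty frozenset is below every path set.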


6.3.2 Semantics of Symbolic Paths

Semantically, paths of type $\pi$ defined previously are a symbolic representation of several actual paths. In the following, we make this notion explicit and we begin by defining simple actual paths in a value of the universe D (Definition 4.4.1).

Actual paths represent a unique sequence of internal accesses inside some value's structure, leading to a single nested element. Unlike symbolic paths, which can for instance cover multiple elements of an array, an actual path designates a single subvalue of a structure, variant or array. The recursive actual path type $\bar{\pi} \in \bar{\Pi}$ is defined below.

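Definition 6.3.5. Actual Path Type $\bar{\pi} \in \bar{\Pi}$

\[
\begin{array}{rcll}
\bar{\pi} & ::= & \varepsilon & \text{empty} \\
 & \mid & f.\bar{\pi} & f \in \mathcal{F} \\
 & \mid & C.\bar{\pi} & C \in \mathcal{C} \\
 & \mid & \langle i \rangle \bar{\pi} & i \text{ index}
\end{array}
\]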

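A symbolic path $\pi$ covers an actual path $\bar{\pi}$ if, when given a valuation E (Definition 4.4.2) of the index variables for arrays, it matches $\bar{\pi}$. A set of symbolic paths covers an actual path $\bar{\pi}$ if at least one of the symbolic paths matches $\bar{\pi}$. We denote this by the relation $\models_E$, which is parameterized by a valuation E. The definition of $\models_E$ is given in Table 6.1.

Table 6.1 – $\models_E$ – Path Semantics

\[
\frac{}{\varepsilon \models_E \varepsilon}\ \text{E}\varepsilon
\qquad
\frac{\pi \models_E \bar{\pi}}{f.\pi \models_E f.\bar{\pi}}\ \text{EStruct}
\qquad
\frac{\pi \models_E \bar{\pi}}{C.\pi \models_E C.\bar{\pi}}\ \text{EVar}
\]

\[
\frac{\pi \models_E \bar{\pi} \qquad E(i) = j}{\langle i \rangle \pi \models_E \langle j \rangle \bar{\pi}}\ \text{ECell}
\qquad
\frac{\pi \models_E \bar{\pi}}{\langle * \rangle \pi \models_E \langle j \rangle \bar{\pi}}\ \text{EAnyCell}
\qquad
\frac{\pi \models_E \bar{\pi} \qquad E(i) \neq j}{\langle * \setminus i \rangle \pi \models_E \langle j \rangle \bar{\pi}}\ \text{EOutCell}
\]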

Given a valuation E, a set P of symbolic paths covers an actual path $\bar{\pi}$ if at least one of the symbolic paths in the set covers, or matches, $\bar{\pi}$:

\[
\forall P \subset \Pi,\quad P \models_E \bar{\pi} \iff \exists \pi \in P,\ \pi \models_E \bar{\pi}
\]


The interpretation $[\![P]\!]_E$ of a set of paths P is then the set of single actual paths that are covered, given a valuation E.

Definition 6.3.6. Interpretation $[\![P]\!]_E$ of a set of paths P

\[
\forall P \subseteq \Pi,\quad [\![P]\!]_E = \{\, \bar{\pi} \mid P \models_E \bar{\pi} \,\}
\]
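The covering relation of Table 6.1 can be sketched in Python under a hypothetical encoding: symbolic and actual paths are tuples of steps, symbolic index steps carry a variable name, actual cell steps carry a concrete index, and the valuation is a dictionary. Since the interpretation of a path set is in general not finite, the `interpret` function of this sketch only filters a finite list of candidate actual paths supplied by the caller.

```python
def covers(sym, actual, env):
    """Does symbolic path `sym` cover actual path `actual` under `env`?"""
    if len(sym) != len(actual):
        return False
    for s, a in zip(sym, actual):
        kind = s[0]
        if kind in ("field", "ctor"):       # same field or constructor
            ok = s == a
        elif kind == "cell":                # the cell at index env[i]
            ok = a[0] == "cell" and env[s[1]] == a[1]
        elif kind == "all_but":             # any cell except env[i]
            ok = a[0] == "cell" and env[s[1]] != a[1]
        elif kind == "all":                 # any cell
            ok = a[0] == "cell"
        else:
            raise ValueError(f"unknown symbolic step: {s!r}")
        if not ok:
            return False
    return True


def set_covers(paths, actual, env):
    """A path set covers an actual path if one of its members does."""
    return any(covers(p, actual, env) for p in paths)


def interpret(paths, candidates, env):
    """Restriction of the interpretation to a finite set of candidates."""
    return {a for a in candidates if set_covers(paths, a, env)}


env = {"i": 2}
others = {(("field", "threads"), ("all_but", "i"))}
exceptional = {(("field", "threads"), ("cell", "i"))}
cells = [(("field", "threads"), ("cell", k)) for k in range(4)]

covered_by_others = interpret(others, cells, env)
covered_by_exceptional = interpret(exceptional, cells, env)
```

With the valuation mapping i to 2, the two path sets split the four candidate cells into the exceptional cell and all the remaining ones.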

The partial order $\sqsubseteq$ (Definition 6.3.2) on sets of paths is compatible with the interpretation $[\![P]\!]_E$, in the sense that when $P \sqsubseteq Q$ holds, the interpretation $[\![P]\!]_E$ of P is included in $[\![Q]\!]_E$ for every valuation:

\[
\forall P, Q \subseteq \Pi,\ \forall E,\quad P \sqsubseteq Q \implies [\![P]\!]_E \subseteq [\![Q]\!]_E
\]

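Each single path can be interpreted as a way to find a subpart of a value, which we make explicit by the following function at. It is not defined for all cases, since not all paths can be applied to all values.

Definition 6.3.7. Function at

\[
\mathit{at} : \bar{\Pi} \times D \to D
\]

\[
\mathit{at}(\bar{\pi}, v) =
\begin{cases}
v & \text{when } \bar{\pi} = \varepsilon \\
\mathit{at}(\bar{\pi}', v_i) & \text{when } \bar{\pi} = f_i.\bar{\pi}' \text{ and } v = \{ f_1 = v_1; \ldots; f_i = v_i; \ldots; f_n = v_n \} \\
\mathit{at}(\bar{\pi}', v_C) & \text{when } \bar{\pi} = C_i.\bar{\pi}' \text{ and } v = C_i[v_C] \\
\mathit{at}(\bar{\pi}', v_i) & \text{when } \bar{\pi} = \langle i \rangle \bar{\pi}' \text{ and } v = (P, (v_k)_{k \in P}),\ i \in P
\end{cases}
\]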
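A partial function of this kind can be sketched in Python. The value encoding is an assumption of the sketch: structures are dictionaries, variant values are pairs made of a constructor name and its argument, and arrays are lists whose support is the range of valid positions. An undefined case is signalled by raising `LookupError`.

```python
def at(path, value):
    """Follow an actual path inside a value; raise LookupError if undefined."""
    if not path:                       # empty path: the value itself
        return value
    (kind, key), rest = path[0], path[1:]
    if kind == "field" and isinstance(value, dict) and key in value:
        return at(rest, value[key])
    if (kind == "ctor" and isinstance(value, tuple) and len(value) == 2
            and value[0] == key):
        return at(rest, value[1])
    if kind == "cell" and isinstance(value, list) and 0 <= key < len(value):
        return at(rest, value[key])
    raise LookupError(f"path step {(kind, key)!r} does not apply to this value")


# A hypothetical process value with two thread slots.
process = {
    "pid": 7,
    "threads": [("None", None), ("Some", {"stack": {"start": 4096}})],
}

start_path = (("field", "threads"), ("cell", 1), ("ctor", "Some"),
              ("field", "stack"), ("field", "start"))
start = at(start_path, process)
```

Applying a path through the `Some` constructor to the first thread slot, which holds `None`, is one of the cases for which the function is not defined.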

6.3.3 Well-Typed Paths and Path Sets

Symbolic paths cannot be used in every context: their interpretation must be made in the context of a type $\tau$. An endpoint, i.e. the $\varepsilon$ symbolic path, can apply to any type. In contrast, other symbolic paths, which exhibit specific data features, can only apply to the corresponding types. For instance, a path such as $f.\pi$ is meaningless on values which are not records, or on record values that do not exhibit a field f, the field specified in the symbolic path.

A path set can be seen as a set of sequences of internal accesses inside some value's structure. In that sense, it is a set of possible traversals from one value to some of its subparts. To characterize the contexts in which a path set is well-typed, we need to consider the types of values to which it can be applied and the types of values to which it can lead. Therefore, in the following, we begin by defining a typing judgement for symbolic paths as a three-place relation $\pi : \tau \to \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and that, in that case, it will always describe subvalues of type $\tau'$. Additionally, the typing judgement is also parameterized by a set of input variables I, which are the variables having the right to appear as identifiers for array accesses. This is detailed in Table 6.2.

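$$\dfrac{}{I \vdash \varepsilon : \tau \to \tau}\ \text{WT}\varepsilon$$

$$\dfrac{\tau = \mathit{struct}\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \qquad I \vdash \pi_i : \tau_i \to \tau'}{I \vdash f_i\,\pi_i : \tau \to \tau'}\ \text{WTStructPath}$$

$$\dfrac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_i : \tau_i \mid \ldots \mid C_n : \tau_n] \qquad I \vdash \pi_C : \tau_i \to \tau'}{I \vdash C_i\,\pi_C : \tau \to \tau'}\ \text{WTVarPath}$$

$$\dfrac{I \vdash \pi : \tau \to \tau'}{I \vdash \langle * \rangle\,\pi : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ \text{WTArrayPath}$$

$$\dfrac{I \vdash \pi : \tau \to \tau' \qquad I(i) = \tau_i}{I \vdash \langle i \rangle\,\pi : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ \text{WTCellPath}$$

$$\dfrac{I \vdash \pi : \tau \to \tau' \qquad I(i) = \tau_i}{I \vdash \langle *\ i \rangle\,\pi : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ \text{WTOutPath}$$

Table 6.2 – Well-Typed Dependency Paths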

A set $P$ of symbolic paths is well-typed if every path contained in it is well-typed for the same types:

$$\forall P \subseteq \Pi,\quad I \vdash P : \tau \to \tau' \iff \forall \pi \in P,\ I \vdash \pi : \tau \to \tau'$$
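The typing rules of Table 6.2 and the lifting to path sets can be read as a small checker. The Python sketch below is only an illustration: the tuple encoding of types and path steps, and the names `path_type` and `well_typed_set`, are assumptions of this example rather than part of the formal development.

```python
# Hypothetical encoding of types:
#   ("base", name), ("struct", {field: type}), ("variant", {ctor: type}),
#   ("arr", index_type, element_type)
# Path steps: ("field", f), ("ctor", C), ("all",), ("cell", i), ("out", i)

def path_type(inputs, path, ty):
    """Return the type τ' such that I ⊢ path : ty → τ', or None if ill-typed."""
    if not path:
        return ty  # WTε: the empty path applies to any type
    step, rest = path[0], path[1:]
    kind = step[0]
    if kind == "field" and ty[0] == "struct" and step[1] in ty[1]:
        return path_type(inputs, rest, ty[1][step[1]])  # WTStructPath
    if kind == "ctor" and ty[0] == "variant" and step[1] in ty[1]:
        return path_type(inputs, rest, ty[1][step[1]])  # WTVarPath
    if kind in ("all", "cell", "out") and ty[0] == "arr":
        index_ty, elem_ty = ty[1], ty[2]
        # WTCellPath / WTOutPath: the index must be an input variable of index type
        if kind != "all" and inputs.get(step[1]) != index_ty:
            return None
        return path_type(inputs, rest, elem_ty)  # WTArrayPath for "all"
    return None


def well_typed_set(inputs, paths, ty, target):
    """A path set is well-typed if every path has the same source and target types."""
    return all(path_type(inputs, p, ty) == target for p in paths)


INT, UNIT = ("base", "int"), ("base", "unit")
THREAD = ("struct", {"stack": ("struct", {"start": INT})})
OPTION_THREAD = ("variant", {"Some": THREAD, "None": UNIT})
PROCESS = ("struct", {"threads": ("arr", INT, OPTION_THREAD)})
```

With `inputs = {"i": INT}`, the path going through `threads`, cell `i`, `Some`, `stack` and `start` is accepted with target type `INT`; without `i` among the inputs it is rejected.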

The well-typedness property of sets of symbolic paths is preserved by the join operation $\vee$ (Definition 6.3.3):

$$\forall P', P'' \subseteq \Pi,\ \forall \tau', \tau'' \in T,\quad I \vdash P' : \tau' \to \tau'' \;\Rightarrow\; I \vdash P'' : \tau' \to \tau'' \;\Rightarrow\; I \vdash P' \vee P'' : \tau' \to \tau''$$

When extending a well-typed set of symbolic paths with a well-typed path using the extension operator (Definition 6.3.4), the resulting set of symbolic paths is well-typed as well:

$$\forall P' \subseteq \Pi,\ \forall \tau, \tau', \tau'' \in T,\quad I \vdash P' : \tau' \to \tau'' \;\wedge\; I \vdash \pi' : \tau'' \to \tau \;\Rightarrow\; I \vdash P'\,\pi' : \tau' \to \tau$$

6.4 Abstract Dependency Domain with Deferred Accesses

Frequently, as explained in Section 6.2, the dependency on a predicate's input variable is relative to the extent to which some of the predicate's outputs are subsequently needed. More precisely, these outputs are those into which the input variable is copied and retrieved. We strive to avoid over-approximations in such cases and to create degrees of freedom for the callers by treating such output variables as points at which callers can inject their own context externally. In other words, we want to defer the computation of the dependency on certain input variables of a predicate to the predicate's callers, since they have additional information about the actual use of the predicate's outputs.

In the previous section (Section 6.3), we introduced and defined an intermediate level consisting of symbolic paths and path sets. These reflect the layered structure of algebraic data types and arrays and allow us to consider not only output variables as a whole, but also symbolic paths within them. Thus, we can compute more flexible and expressive dependency summaries with finer-grained elements. We can finally link these two ideas and extend our abstract dependency domain with deferred dependencies by including an additional dependency case in our domain $\delta \in D$, initially defined (Definition 5.2.1) in Section 5.2.

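Definition 6.4.1 (Extended Abstract Dependency Domain $\delta \in D$).

$$\begin{array}{rcll}
\delta & ::= & \top & \text{Everything – atomic case (i)} \\
 & \mid & \mathit{Nothing} & \text{Nothing – atomic case (ii)} \\
 & \mid & \bot & \text{Impossible – atomic case (iii)} \\
 & \mid & \{f_1 \mapsto \delta_1, \ldots, f_n \mapsto \delta_n\} & f_1, \ldots, f_n \text{ fields (iv)} \\
 & \mid & [C_1 \mapsto \delta_1, \ldots, C_m \mapsto \delta_m] & C_1, \ldots, C_m \text{ constructors (v)} \\
 & \mid & \langle \delta \rangle & \text{(vi)} \\
 & \mid & \langle \delta_{def}\ i\ \delta_{exc} \rangle & i \text{ array index (vii)} \\
 & \mid & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k) & \text{deferred accesses (viii)}
\end{array}$$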

A deferred dependency, shown in (viii), consists of a mapping which binds output variables, which we also call root variables in this case, to sets of symbolic paths.

Definition 6.4.2 (Access Map).

$$A : V \rightharpoonup \Pi$$

Only output variables can be treated as lazy dependency components. The sets of symbolic paths mapped to them allow us to distinguish between their subelements. In the following discussion, we will denote an access map $o_1 \mapsto P_1, \ldots, o_k \mapsto P_k$ by $a$.


For the partial order $\sqsubseteq$ (Definition 5.2.2) defined in Chapter 5 and detailed in Table 5.1, an additional rule (Def) for comparing instances of deferred dependencies is added. This is shown in Table 6.3. The top and bottom elements of our dependency domain are, as before, $\top$ and $\bot$ respectively. Thus, any instance of a deferred dependency is more precise than $\top$ and less precise than $\bot$. Just as $\top$, $\bot$ and the special dependency case Nothing, a deferred dependency can be used in association with any type, albeit with some constraints on its elements.

$$\dfrac{\forall (o \mapsto P) \in a,\quad a(o) \sqsubseteq a'(o)}{\mathit{Deferred}(a) \sqsubseteq \mathit{Deferred}(a')}\ \text{Def}$$

Table 6.3 – Extended Leq – Comparison of Two Domains

However, unlike the atomic cases $\top$, $\bot$ and Nothing, deferred dependencies are not related to Nothing or to dependencies corresponding to structures, variants or arrays. Since they act as placeholders for dependencies that are effectively computed subsequently, instances of deferred dependencies can be compared only to $\top$ and $\bot$, or to other instances of deferred dependencies. For instance, comparing a deferred dependency to Nothing would yield

$$\mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k) \not\sqsubseteq \mathit{Nothing} \quad\text{and}\quad \mathit{Nothing} \not\sqsubseteq \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)$$

The extended join operation $\vee$ (Definition 5.2.3), initially defined in Section 5.2.1 and detailed in Table 5.2, is shown in Table 6.4. It still has $\bot$ as its identity element and $\top$ as its absorbing element. Joining two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The join between an instance of a deferred dependency and a dependency corresponding to a structure, a variant, an array or to the special case Nothing amounts to $\top$, the top element of our domain. Since we cannot make any supposition regarding deferred dependencies, we are forced to make a pessimistic assumption and to approximate to the least precise value. Join is a commutative operation, for which the undisplayed cases in Table 6.4 are defined with respect to their symmetrical counterparts.

$$\begin{array}{lcl}
\mathit{Deferred}(a) \vee \mathit{Deferred}(a') & = & \mathit{Deferred}(a'') \quad \text{where} \\[4pt]
\multicolumn{3}{l}{\quad a''(o) = \begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases}} \\[4pt]
\mathit{Deferred}(a) \vee \{f_1 \mapsto \delta_1, \ldots, f_n \mapsto \delta_n\} & = & \top \\
\mathit{Deferred}(a) \vee [C_1 \mapsto \delta_1, \ldots, C_m \mapsto \delta_m] & = & \top \\
\mathit{Deferred}(a) \vee \langle \delta \rangle & = & \top \\
\mathit{Deferred}(a) \vee \langle \delta_{def}\ i\ \delta_{exc} \rangle & = & \top \\
\mathit{Deferred}(a) \vee \mathit{Nothing} & = & \top
\end{array}$$

Table 6.4 – $\vee$ – Extended Join

Similarly to join, the reduction operation $\oplus$ (Definition 5.2.4) was initially defined in Section 5.2.1 and detailed in Table 5.3. The extended form is shown in Table 6.5. It still has Nothing as an identity element and $\bot$ as an absorbing element. When applying the reduction operation between a deferred dependency and a dependency $\delta'$ corresponding to a structure, a variant or an array, we over-approximate the deferred dependency to $\top$ and apply the reduction operation between $\delta'$ and $\top$. Applying the reduction operation between a deferred dependency and $\top$ behaves similarly: the outcome in this case is straightforward and amounts to $\top$. As was the case for join, applying the reduction operation between two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The reduction operation is commutative, and the undisplayed cases in Table 6.5 are defined with respect to their symmetrical counterparts.

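$$\begin{array}{lcl}
\mathit{Deferred}(a) \oplus \mathit{Deferred}(a') & = & \mathit{Deferred}(a'') \quad \text{where} \\[4pt]
\multicolumn{3}{l}{\quad a''(o) = \begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases}} \\[4pt]
\mathit{Deferred}(a) \oplus \top & = & \top \\
\mathit{Deferred}(a) \oplus \{f_1 \mapsto \delta_1, \ldots, f_n \mapsto \delta_n\} & = & \top \oplus \{f_1 \mapsto \delta_1, \ldots, f_n \mapsto \delta_n\} \\
\mathit{Deferred}(a) \oplus [C_1 \mapsto \delta_1, \ldots, C_m \mapsto \delta_m] & = & \top \oplus [C_1 \mapsto \delta_1, \ldots, C_m \mapsto \delta_m] \\
\mathit{Deferred}(a) \oplus \langle \delta \rangle & = & \top \oplus \langle \delta \rangle \\
\mathit{Deferred}(a) \oplus \langle \delta_{def}\ i\ \delta_{exc} \rangle & = & \top \oplus \langle \delta_{def}\ i\ \delta_{exc} \rangle
\end{array}$$

Table 6.5 – $\oplus$ – Extended Reduction Operator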
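The deferred cases of the join and reduction operators can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the join of two path sets (Definition 6.3.3) is approximated here by plain set union, symbolic paths are tuples of step names, and the cases that involve only structures, variants or arrays (Tables 5.2 and 5.3) are deliberately left out. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

TOP, NOTHING, BOT = "Top", "Nothing", "Bot"  # atomic cases (i), (ii), (iii)
Path = Tuple[str, ...]                        # the empty tuple stands for ε
AccessMap = Dict[str, FrozenSet[Path]]        # root variable -> set of paths


@dataclass
class Deferred:
    access: AccessMap


def join_access_maps(a: AccessMap, b: AccessMap) -> AccessMap:
    """Pointwise join; a root bound in only one map keeps its path set."""
    out = dict(a)
    for root, paths in b.items():
        out[root] = out[root] | paths if root in out else paths  # union: assumption
    return out


def join(d1, d2):
    if d1 == BOT:                       # ⊥ is the identity of join
        return d2
    if d2 == BOT:
        return d1
    if d1 == TOP or d2 == TOP:          # ⊤ is absorbing
        return TOP
    if isinstance(d1, Deferred) and isinstance(d2, Deferred):
        return Deferred(join_access_maps(d1.access, d2.access))
    if isinstance(d1, Deferred) or isinstance(d2, Deferred):
        return TOP                      # deferred vs structure/variant/array/Nothing
    raise NotImplementedError("non-deferred cases are outside this sketch")


def reduce(d1, d2):
    if d1 == BOT or d2 == BOT:          # ⊥ is absorbing for reduction
        return BOT
    if d1 == NOTHING:                   # Nothing is the identity of reduction
        return d2
    if d2 == NOTHING:
        return d1
    if isinstance(d1, Deferred) and isinstance(d2, Deferred):
        return Deferred(join_access_maps(d1.access, d2.access))
    if (isinstance(d1, Deferred) and d2 == TOP) or (d1 == TOP and isinstance(d2, Deferred)):
        return TOP
    raise NotImplementedError("needs the reduction rules for non-deferred cases")
```

Note the asymmetry that the tables make explicit: joining a deferred dependency with Nothing yields the top element, whereas reducing it with Nothing leaves it unchanged.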

Finally, the extractions previously defined for dependencies $\delta$ (Definitions 5.2.5, 5.2.6, 5.2.7, 5.2.8 and 5.2.9) have been extended in order to handle deferred dependencies as well. Their treatment is summarized in Table 6.6. Making array-specific extractions, as well as extracting field and constructor dependencies on a deferred dependency, amounts to a pointwise extension of every path set in the access map with the corresponding symbolic path.

Finally, we add the rule shown in Table 6.7 to the well-typed dependency rules given in Chapter 5, Table 5.5.


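$$\begin{array}{lll}
\textbf{Extraction} & \delta & \textbf{Result} \\
\text{Field} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,f & \mathit{Deferred}(o_1 \mapsto P_1\,f\varepsilon, \ldots, o_k \mapsto P_k\,f\varepsilon) \\
\text{Constructor} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,C & \mathit{Deferred}(o_1 \mapsto P_1\,C\varepsilon, \ldots, o_k \mapsto P_k\,C\varepsilon) \\
\text{Cell} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,\langle i \rangle & \mathit{Deferred}(o_1 \mapsto P_1\,\langle i \rangle\varepsilon, \ldots, o_k \mapsto P_k\,\langle i \rangle\varepsilon) \\
\text{Array General} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,\langle * \rangle & \mathit{Deferred}(o_1 \mapsto P_1\,\langle * \rangle\varepsilon, \ldots, o_k \mapsto P_k\,\langle * \rangle\varepsilon) \\
\text{Outside Cell} & \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\,\langle *\ i \rangle & \mathit{Deferred}(o_1 \mapsto P_1\,\langle *\ i \rangle\varepsilon, \ldots, o_k \mapsto P_k\,\langle *\ i \rangle\varepsilon)
\end{array}$$

Table 6.6 – Extended Extraction Operators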
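The pointwise extension performed by the extended extraction operators can be sketched as follows. This assumes, for illustration only, that the extension operator of Definition 6.3.4 appends a one-step path to every path of a set; the step encoding and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Step = Tuple[str, ...]        # e.g. ("field", "stack"), ("cell", "i"), ("all",)
Path = Tuple[Step, ...]       # the empty tuple stands for ε
AccessMap = Dict[str, FrozenSet[Path]]


def extend(paths: FrozenSet[Path], step: Step) -> FrozenSet[Path]:
    """Extend every path of the set with one more access step."""
    return frozenset(p + (step,) for p in paths)


def extract_deferred(access: AccessMap, step: Step) -> AccessMap:
    """Extraction on Deferred(a): extend the path set of every root variable."""
    return {root: extend(paths, step) for root, paths in access.items()}


initial = {"ti": frozenset({()})}                       # Deferred(ti ↦ ε)
stack = extract_deferred(initial, ("field", "stack"))   # field extraction
start = extract_deferred(stack, ("field", "start"))     # nested field extraction
```

After the two extractions, the root `ti` is bound to the single path going through `stack` and then `start`; the access map keeps the same root variables throughout.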

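$$\dfrac{\Gamma(o_1) = \tau_1 \quad \Gamma, I \vdash P_1 : \tau_1 \to \tau \quad \cdots \quad \Gamma(o_k) = \tau_k \quad \Gamma, I \vdash P_k : \tau_k \to \tau \qquad o_1 \in O, \ldots, o_k \in O}{\Gamma, I, O \vdash \mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k) : \tau}\ \text{WTDeferred}$$

Table 6.7 – Well-Typed Dependencies – Extended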

6.5 Deferred Dependencies at the Intraprocedural Level

6.5.1 Extended Intraprocedural Dependency Analysis

At the intraprocedural and interprocedural levels of our dependency analysis, the introduction of deferred dependencies requires only minimal changes.

Intraprocedurally, each predicate is analysed on every possible exit label. As explained in Section 5.3.2, our dependency analysis is a backward data-flow analysis. For each possible exit label of a predicate, the control flow graph is traversed backwards, starting from the exit node that corresponds to the analysed execution scenario. Dependency information is computed at every point of the control flow graph for each of the predicate's input, output and local variables, and this information is gradually refined until a fixed point is reached.

By traversing the control flow graph backwards, we take advantage of the information regarding the outputs that are associated with the analysed exit label, and we consider only the relevant ones, starting from the initialisation phase. As explained previously in Section 5.3.2, the intraprocedural domain for the currently analysed exit label is initialised with its associated output variables mapped to $\top$, the least precise element of our abstract dependency domain. This is a conservative over-approximation: it is considered that control over the outputs is lost and that these are entirely observed externally. As illustrated in Section 6.2, this over-approximation propagates along the control flow graph and, in certain cases, has a non-negligible impact on the precision of the computed dependency summaries.

We argued that, at the intraprocedural level of the analysis, a subtle but important distinction can be made regarding the dependency on certain inputs. This consists in distinguishing between the cases in which a predicate effectively uses an input subelement to compute an output subelement and those in which it simply forwards it to an output subelement. In the latter cases, the predicate does not use or need such an input subelement per se and, as a consequence, the dependency on it is relative to the extent to which the predicate's callers will subsequently use the output in which it is retrieved. At the intraprocedural level, in order to avoid the propagation of over-approximations, it is important to make this distinction early on, from the initialisation phase. Therefore, we introduce deferred dependencies at this level, instead of mapping the output variables to $\top$ as was previously done.

For a predicate $p$ of the following form

$$p(e_1, \ldots, e_n)\ [\lambda_1 : o_{11}, \ldots, o_{1k_1} \mid \ldots \mid \lambda_i : o_{i1}, \ldots, o_{ik_i} \mid \ldots \mid \lambda_m : o_{m1}, \ldots, o_{mk_m}]$$

analysed on the $\lambda_i$ exit label, the intraprocedural dependency domain used for initialising the node corresponding to $\lambda_i$ is the following:

$$\{o_{i1} \mapsto \mathit{Deferred}(o_{i1} \mapsto \varepsilon), \ldots, o_{ik_i} \mapsto \mathit{Deferred}(o_{ik_i} \mapsto \varepsilon)\}$$

For each associated output $o_{ij}$, $1 \le j \le k_i$, of the analysed label $\lambda_i$, a set $P_{o_{ij}}$ of symbolic paths is constructed. Initially, this consists of a single element, namely the $\varepsilon$ path. The deferred dependency associated with each output $o_{ij}$ is an access map binding $o_{ij}$ itself to its corresponding set of symbolic paths $P_{o_{ij}}$. Since the symbolic paths $\varepsilon$ refer to the output variables in their entirety, this is still a conservative approximation but, in contrast to our previous initialisation strategy, it acknowledges the fact that dependencies on the inputs might be relative to the extent to which the outputs are subsequently used. It allows injecting context-sensitive information later on.

This new initialisation strategy is enough to incorporate the expressive power of deferred dependencies at an intraprocedural level. Whereas before we were computing label-specific dependency summaries as input-output relations, the new strategy allows us to obtain label-specific dependency templates with lazy components that can be parameterized and varied according to a caller's own intraprocedural context. These can be seen as context-insensitive dependency summaries with context-sensitive leaves.
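The difference between the two initialisation strategies can be made concrete with a short Python sketch. The encoding is hypothetical: a deferred dependency is a pair tagged `"deferred"` carrying an access map, and the empty tuple stands for the ε path.

```python
EPSILON = ()  # the empty symbolic path ε


def initial_domain_top(outputs):
    """Previous strategy: every output of the analysed exit label is mapped to ⊤."""
    return {o: "Top" for o in outputs}


def initial_domain_deferred(outputs):
    """New strategy: every output o is mapped to Deferred(o ↦ {ε})."""
    return {o: ("deferred", {o: frozenset({EPSILON})}) for o in outputs}


# Exit label `true` of the thread predicate has a single output, ti.
before = initial_domain_top(["ti"])
after = initial_domain_deferred(["ti"])
```

Both domains describe the outputs in their entirety, but only the second records which root variable a caller may later refine.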

6.5.2 Intraprocedural Dependency Analysis Illustrated

In order to illustrate the use of deferred dependencies at an intraprocedural level, we revisit our thread example predicate discussed in Section 5.3.3. As done previously, we consider the true execution scenario and apply our extended dependency analysis. We initialize the dependency corresponding to the true exit node by mapping the predicate's output ti to the deferred dependency mapping it to a set containing a single symbolic path, namely $\varepsilon$.


After the initialisation phase, the analysis continues as before, by traversing the control flow graph backwards and by applying at each step the corresponding data-flow equation. The deferred dependency is propagated upwards until the entry node is reached and analysed.

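[Figure 6.1: control flow graph of thread, with nodes th = p.threads, tio = th[i] and switch(tio) as [ | ti], and exit nodes true, None and oob. In the analysed true scenario, the None and oob exit nodes are marked Unreachable.]

Dependencies computed at each point, listed from the entry node down to the true exit node:

$p \mapsto threads \mapsto \langle\ \ i\ [Some \mapsto t \mapsto \mathit{Deferred}(ti \mapsto \varepsilon),\ None \mapsto \bot] \rangle,\quad i \mapsto \top$

$th \mapsto \langle\ \ i\ [Some \mapsto t \mapsto \mathit{Deferred}(ti \mapsto \varepsilon),\ None \mapsto \bot] \rangle,\quad i \mapsto \top$

$tio \mapsto [Some \mapsto t \mapsto \mathit{Deferred}(ti \mapsto \varepsilon),\ None \mapsto \bot]$

$ti \mapsto \mathit{Deferred}(ti \mapsto \varepsilon)$

Figure 6.1 – Analysing thread – Dependency Summary with Deferred Occurrences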

The final dependency summary obtained for the true exit label of the predicate is

$p \mapsto threads \mapsto \langle\ \ i\ [Some \mapsto t \mapsto \mathit{Deferred}(ti),\ None \mapsto \bot] \rangle,\quad i \mapsto \top$

and this is similar to the targeted dependency information for thread discussed in Section 6.2 and illustrated on page 117.

6.6 Deferred Dependencies at the Interprocedural Level

At the interprocedural level, the impact of introducing deferred dependencies is visible only at the level of the substitutions that have to be performed. Previously, the only required substitution consisted in replacing all occurrences of formal input parameters of a predicate with the corresponding effective input parameters. After having introduced deferred dependencies, further substitutions are needed. These can be easily illustrated by revisiting our start_address example predicate discussed in Section 5.4.1. As done previously, we consider the true execution scenario and apply our extended dependency analysis.

We begin by initialising the output adr with a corresponding deferred dependency, as discussed in Section 6.5.1. The analysis traverses the control flow graph backwards and computes the dependency information at each node until reaching the control flow graph's entry node, which corresponds to a call to the thread predicate. The intermediate dependency results are shown in Figure 6.2.

[Figure 6.2: control flow graph of start_address, with nodes thread(p, j)[true: tj | None | oob], sj = tj.stack and adr = sj.start, and exit nodes true and None.]

Dependencies computed at each point, listed from the true exit node up to the call to thread:

$adr \mapsto \mathit{Deferred}(adr \mapsto \varepsilon)$

$sj \mapsto start \mapsto \mathit{Deferred}(adr \mapsto \varepsilon)$

$tj \mapsto stack \mapsto start \mapsto \mathit{Deferred}(adr \mapsto \varepsilon)$

Figure 6.2 – $G_{start\_address}$ – Intermediate Dependency Results for start_address

We obtain the dependency summary for the true exit label of the called predicate thread. In order to be able to use it, we must first substitute the formal input parameters, i.e. p and i, appearing in it with the effective arguments of the call, i.e. p and j. Additionally, in deferred dependencies, we also have to substitute the formal output parameters appearing as roots in the access maps, i.e. ti, with the corresponding effective output parameters. These substitutions are shown in Figure 6.3. Formal index variables appearing in dependencies corresponding to arrays have to be substituted with their effective counterparts as well. Similarly, any formal index variable appearing in symbolic paths that correspond to arrays must be substituted by the corresponding effective index variable.

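[Figure 6.3: the dependency summary of thread and the dependency computed for tj in the caller, annotated with the replacement labels p, j, tj and j.]

$p \mapsto threads \mapsto \langle\ \ i\ [Some \mapsto t \mapsto \mathit{Deferred}(ti),\ None \mapsto \bot] \rangle,\quad i \mapsto \top$

$tj \mapsto stack \mapsto start \mapsto \mathit{Deferred}(adr \mapsto \varepsilon)$

Figure 6.3 – Substitution of Formal Parameters by Effective Parameters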

We can finally take advantage of the flexibility obtained using deferred dependencies by injecting the caller's intraprocedural dependency information into the deferred occurrences of the callee's dependency summary. This is another type of substitution, and it consists in replacing deferred occurrences of formal output parameters of a predicate by the dependency information computed in the current context for the corresponding effective output parameters. For our start_address example, this is shown in Figure 6.4 and amounts to substituting the dependency computed for tj in the deferred occurrence of ti in the dependency summary of thread.

After this substitution, we obtain the following dependency summary for the exit label true of the start_address predicate:

$p \mapsto threads \mapsto \langle\ \ j\ [Some \mapsto t \mapsto stack \mapsto start \mapsto \mathit{Deferred}(adr \mapsto \varepsilon),\ None \mapsto \bot] \rangle,\quad j \mapsto \top$


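[Figure 6.4: the summary of thread after parameter substitution, with the dependency computed for tj replacing the deferred occurrence of tj.]

$p \mapsto threads \mapsto \langle\ \ j\ [Some \mapsto t \mapsto \mathit{Deferred}(tj),\ None \mapsto \bot] \rangle,\quad j \mapsto \top$

$tj \mapsto stack \mapsto start \mapsto \mathit{Deferred}(adr \mapsto \varepsilon)$

Figure 6.4 – Substituting Deferred Dependencies by Actual Dependencies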

6.6.1 Applying Context-Sensitive Information by Substitution

As shown in our previous example, deferred dependencies associate sets of symbolic paths with certain root variables. We can substitute such deferred dependencies by actual dependencies computed in the current context, by applying the symbolic paths to the actual dependency to substitute. We iterate through entire dependency summaries in order to substitute the nested deferred dependencies appearing at some leaves. This substitution can be seen as an application of contextual information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. It is denoted by a mapping $\sigma$, which associates dependencies with root variables appearing in deferred access maps.

Definition 6.6.1 (Substitution $\sigma$).

$$\sigma : V \to D$$

Simultaneously, while substituting root variables in deferred dependencies by their actual dependencies computed in the current intraprocedural context, we also substitute indices in information corresponding to arrays. These are either substituted by another array index, i.e. the one corresponding to an actual input parameter, or eliminated when they correspond to a local variable. Their elimination consists in approximating the dependencies so as to remove references to the array index. This substitution is denoted by $\phi$, and it is a mapping from variables to the new variables that replace them.

Definition 6.6.2 (Substitution $\phi$).

$$\phi : V \rightharpoonup V$$

The two substitutions can be done separately. However, for performance reasons, we chose to do them simultaneously. This is also what the actual implementation of the dependency analysis does. We denote the two simultaneous substitutions by $J\,(\sigma, \phi)$ and detail them in Table 6.9. Performing the two operations simultaneously can be seen as a manner of reinterpreting a dependency computed in one context in another context.

For sets of symbolic paths (as defined in Section 6.3.1) in deferred dependencies, the operation $P \bullet (\sigma(o), \phi)$ is the application of symbolic paths to the dependency of the root variable $o$ computed in the current context. For a deferred access map, all dependencies obtained by applying the symbolic paths are joined. The application of a symbolic path $\pi$ to a dependency $\delta$ is denoted by $\pi\,(\delta, \phi)$, and it is shown in Table 6.8. During the application, free variables appearing in symbolic paths associated with arrays are substituted by their corresponding index variables, as given by $\phi$. If $\phi$ does not contain a mapping for a free variable, an approximation is made in order to remove it, and the dependency obtained by applying $\langle * \rangle$ is returned.

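$$\begin{array}{lcl}
\varepsilon\,(\delta, \phi) & = & \delta \\
f\pi\,(\delta, \phi) & = & \pi\,(\delta\,f, \phi) \\
C\pi\,(\delta, \phi) & = & \pi\,(\delta\,C, \phi) \\
\langle * \rangle\pi\,(\delta, \phi) & = & \pi\,(\delta\,\langle * \rangle, \phi) \\[4pt]
\langle i \rangle\pi\,(\delta, \phi) & = & \begin{cases}
\pi\,(\delta\,\langle \phi(i) \rangle, \phi) & i \in \mathit{Dom}(\phi) \\
\pi\,(\delta\,\langle * \rangle, \phi) & \text{otherwise}
\end{cases} \\[4pt]
\langle *\ i \rangle\pi\,(\delta, \phi) & = & \begin{cases}
\pi\,(\delta\,\langle *\ \phi(i) \rangle, \phi) & i \in \mathit{Dom}(\phi) \\
\pi\,(\delta\,\langle * \rangle, \phi) & \text{otherwise}
\end{cases}
\end{array}$$

Table 6.8 – Deferred Paths – Application and Substitutions

Definition 6.6.3 (Application of Symbolic Paths to a Dependency).

$$P \bullet (\delta, \phi) = \bigvee_{\pi \in P} \pi\,(\delta, \phi)$$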

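$$\begin{array}{lcl}
\top\ J\,(\sigma, \phi) & = & \top \\
\mathit{Nothing}\ J\,(\sigma, \phi) & = & \mathit{Nothing} \\
\bot\ J\,(\sigma, \phi) & = & \bot \\
\{f_1 \mapsto \delta_1, \ldots, f_n \mapsto \delta_n\}\ J\,(\sigma, \phi) & = & \{f_1 \mapsto \delta_1\ J\,(\sigma, \phi), \ldots, f_n \mapsto \delta_n\ J\,(\sigma, \phi)\} \\
{[C_1 \mapsto \delta_1, \ldots, C_m \mapsto \delta_m]}\ J\,(\sigma, \phi) & = & [C_1 \mapsto \delta_1\ J\,(\sigma, \phi), \ldots, C_m \mapsto \delta_m\ J\,(\sigma, \phi)] \\
\mathit{Deferred}(o_1 \mapsto P_1, \ldots, o_k \mapsto P_k)\ J\,(\sigma, \phi) & = & \bigvee_{1 \le i \le k} P_i \bullet (\sigma(o_i), \phi) \\
\langle \delta_{def} \rangle\ J\,(\sigma, \phi) & = & \langle \delta_{def}\ J\,(\sigma, \phi) \rangle \\[4pt]
\langle \delta_{def}\ i\ \delta_{exc} \rangle\ J\,(\sigma, \phi) & = & \begin{cases}
\langle \delta_{def}\ J\,(\sigma, \phi)\ \ \phi(i)\ \ \delta_{exc}\ J\,(\sigma, \phi) \rangle & i \in \mathit{Dom}(\phi) \\
\langle \delta_{def}\ J\,(\sigma, \phi) \vee \delta_{exc}\ J\,(\sigma, \phi) \rangle & \text{otherwise}
\end{cases}
\end{array}$$

Table 6.9 – Interprocedural Domain – Substitutions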
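The substitution of Table 6.9, restricted to structures, variants, array cells and deferred leaves, can be sketched in Python as below. This is an illustration under explicit assumptions, not the analysis implementation: dependencies are encoded as tagged tuples, extraction on the atomic cases is assumed to return the atomic case itself, only field and constructor steps are supported inside symbolic paths, and `join` is a coarse stand-in that falls back to the top element where the full join of Table 5.2 would be needed. The example data is loosely modelled on thread and start_address; the `Nothing` default component of the array cell is an assumption.

```python
TOP, NOTHING, BOT = "Top", "Nothing", "Bot"
TAG_OF_STEP = {"field": "struct", "ctor": "variant"}

# Hypothetical encoding of dependencies:
#   atomic cases are strings; ("struct", {f: dep}); ("variant", {C: dep});
#   ("cell", default_dep, index_var, exception_dep); ("deferred", {root: paths})


def extract(dep, step):
    """Field or constructor extraction (atomic cases: assumed unchanged)."""
    if isinstance(dep, str):
        return dep
    tag, body = dep[0], dep[1]
    if tag == "deferred":  # Table 6.6: extend every path set
        return ("deferred", {o: frozenset(p + (step,) for p in ps)
                             for o, ps in body.items()})
    if tag == TAG_OF_STEP[step[0]] and step[1] in body:
        return body[step[1]]
    raise ValueError(f"cannot extract {step!r} from {dep!r}")


def apply_path(path, dep):
    """Application of a path: extract the first step, then apply the rest."""
    for step in path:
        dep = extract(dep, step)
    return dep


def join(d1, d2):
    """Coarse stand-in for the join: exact on ⊥ and equal arguments, else ⊤."""
    if d1 == d2 or d2 == BOT:
        return d1
    if d1 == BOT:
        return d2
    return TOP


def substitute(dep, sigma, phi):
    if isinstance(dep, str):  # ⊤, Nothing and ⊥ are left unchanged
        return dep
    tag = dep[0]
    if tag in ("struct", "variant"):
        return (tag, {k: substitute(d, sigma, phi) for k, d in dep[1].items()})
    if tag == "deferred":  # join of all path applications over all roots
        result = BOT
        for root, paths in dep[1].items():
            for path in paths:
                result = join(result, apply_path(path, sigma[root]))
        return result
    if tag == "cell":
        _, default, index, exc = dep
        if index not in phi:
            raise NotImplementedError("index elimination needs the general join")
        return ("cell", substitute(default, sigma, phi), phi[index],
                substitute(exc, sigma, phi))
    raise ValueError(f"unknown dependency: {dep!r}")


adr_dep = ("deferred", {"adr": frozenset({()})})
tj_dep = ("struct", {"stack": ("struct", {"start": adr_dep})})
thread_summary = ("struct", {"threads": ("cell", NOTHING, "i", ("variant", {
    "Some": ("deferred", {"tj": frozenset({()})}), "None": BOT}))})
result = substitute(thread_summary, {"tj": tj_dep}, {"i": "j"})
```

In `result`, the deferred leaf under `Some` has been replaced by the caller's dependency for `tj`, and the index variable `i` has been renamed to `j`, which mirrors the shape of the summary obtained for start_address.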


6.6.2 Wrapped Calls and Results

As a simple experiment for verifying the precision of our dependency analysis approach with deferred dependencies, we have replaced all calls to built-in predicates in our previous example predicates thread and start_address (illustrated in Section 6.5.2 and on page 131, respectively) with calls to predicates wrapping every call of this type. We compared the precision of the obtained results, as well as the execution time needed to compute the dependency summaries.

The thread_with_wrapped predicate thus has the following form:

predicate thread_with_wrapped (process p int i)
-> [true thread ti | None | oob]
array<option_thread> th option_thread tio

get_threads(p)[true th] [true -> 1]
get_ith(th i)[true tio | false] [true -> 2 false -> 5]
switch_option(tio)[none | some ti] [none -> 4 some -> 3]
[true]
[None]
[oob]

The start_address predicate becomes:

predicate start_address_wrapped (process p int j)
-> [true int adr | None]
thread tj memory_region sj

thread(p j)[true tj | None | oob] [true -> 1 None -> 4 oob -> 4]
get_stack(tj) [true sj] [true -> 2]
get_start(sj) [true adr] [true -> 3]
[true]
[None]
[error]

The dependency summaries obtained for each of the two predicates are identical to the ones obtained for the predicates thread and start_address in their original form. The dependency information for thread and start_address is computed in 0.33 milliseconds, while that for the versions with calls to the wrapped built-in predicates, i.e. thread_with_wrapped and start_address_wrapped, is obtained in 0.65 milliseconds. We ran the analysis 10001 times in a loop. The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.

6.7 Related Work

For the past few decades, interprocedural analyses have generated considerable interest in the static analysis community. They expand the scope of analysis beyond a procedure's limits in order to encompass the effect of callees on callers. The precision of both data-flow and control-flow analyses is traditionally characterized in terms of context-sensitivity, i.e. computing information depending on the calling context, or its dual, context-insensitivity. For control-flow analyses, the terms polyvariant and monovariant analyses are used interchangeably for the same distinction (Nielson and Nielson, 1999). In (Midtgaard, 2012), a comprehensive survey of control-flow analyses for functional programs is made. Context-sensitivity has the advantage of increased precision. However, the scalability of such analyses is frequently a major concern. The precision and performance impact of context-sensitivity is discussed by Lhoták and Hendren in (Lhoták and Hendren, 2006). In contrast, Ruf argues in (Ruf, 1995) that context-insensitivity leads to little or no precision penalty. Shapiro and Horwitz argue in (Shapiro and Horwitz, 1997) that using a more precise pointer analysis does, in general, lead to more precise results.

Sharir and Pnueli introduced in (Sharir and Pnueli, 1978) a comprehensive theory of interprocedural data-flow analyses for general frameworks, based on two approaches. The first of them, the functional approach, is based upon computing a context-sensitive summary of a function or procedure call. Procedures are viewed as collections of structured program blocks, and input-output relations are established for each such block. Subsequently, the effect of procedure calls is computed by simply using such relations. The second approach proposed by Sharir and Pnueli is the call-string approach. Broadly speaking, this is based upon avoiding infeasible paths by matching corresponding calls and returns. It can be seen as an extension to intraprocedural data-flow analyses, in which only valid interprocedural paths are considered during graph traversal. This is achieved by tagging the propagated data with an encoded history of procedure calls, thus making the interprocedural flow explicit and increasing the accuracy of the propagated information. Both approaches are generic and can be used for a wide variety of analyses. Our form of interprocedural dependency analysis is closer to the functional approach. For each predicate of the analysed program, it computes a dependency summary as an input-output relation and then uses this summary whenever the predicate is called. Symbolic elements are used to allow callers to inject their own context information.

Though desirable in terms of precision, context-sensitivity is often considered prohibitively costly in terms of performance. In practice, many analyses make a compromise and relax this requirement to a certain degree for the sake of scalability. Our approach is no exception: it constitutes an application of context-sensitive information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. Though not purely context-sensitive, we obtain a gain in precision without sacrificing scalability.

Purely context-sensitive analyses have been developed especially in the area of points-to analyses (Gharat, Khedker, and Mycroft, 2016), but also for information flow control (Hammer and Snelting, 2009) or liveness analysis used for garbage collection (Asati et al., 2014). In (Khedker, Mycroft, and Rawat, 2011), Khedker et al. present a lazy context-sensitive points-to analysis. Points-to information is computed only for the pointers that are live, and the propagation of points-to information is sparse, being restricted to live ranges of pointers. Though our approach is not directly comparable to this approach, it is interesting to make a few general remarks. In (Khedker, Mycroft, and Rawat, 2011), strong liveness is used for identifying the pointers that are directly used or which are used for defining pointers that are strongly live. On the other hand, we use strong dependency to identify and distinguish between input subelements that are directly needed for computing the output and input subelements that are simply copied into and forwarded as outputs. Thus, Khedker et al. prevent the explosion of information by clearly distinguishing between relevant and irrelevant information. We achieve scalability by refining the notion of needed or depending on. Their analysis is fully context-sensitive and is based on the call-string approach (Sharir and Pnueli, 1978); our analysis shows a relaxed form of context-sensitivity and is closer to the functional approach.

Jensen et al. present in (Jensen, Møller, and Thiemann, 2010) a technique based on lazy propagation for context-sensitive interprocedural analysis of JavaScript programs, i.e. programs with objects and first-class functions. Transfer functions may not be distributive, and hence the IFDS technique (Reps, Horwitz, and Sagiv, 1995; Padhye and Khedker, 2013) is not applicable. They propagate data-flow information "by need" in an iterative fixpoint algorithm.

The computation of relevant information is deferred in demand-driven analyses (Horwitz, Reps, and Sagiv, 1995; Heintze and Tardieu, 2001; Zheng and Rugina, 2008; Sridharan et al., 2005) as well. These compute the targeted results only at specific program points, thereby avoiding the effort of computing a global result. We compute dependency summaries with symbolic elements. These can be seen as dependency templates parameterized by a caller's context. Their instantiation is deferred and left to the callers.

6.8 Conclusion

We have presented an extension of our dependency analysis introducing a relaxed form of context-sensitivity. Our solution is based on computing deferred dependencies, consisting of symbolic access maps into which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty incurred by our previous context-insensitive approach. The introduction of deferred dependencies required the introduction of an additional level of symbolic paths and path sets. However, this extension had a minimal impact on the dependency analysis at the intra- and interprocedural levels, imposing only the modification of the initialisation strategy and of the substitution operation. As we will discuss in Chapter 8, our extension of the dependency analysis with deferred dependencies led to an increase of 10–20% in execution time on our benchmark. However, it obtained more precise dependency information for 50% of the predicates included in the benchmark.


Chapter 7

Correlation Analysis

A thousand fibers connect us [...] and among those fibers, as sympathetic threads, our actions run as causes, and they come back to us as effects.

Hermann Melville

7.1 Introduction

In the field of Artificial Intelligence, the frame problem (McCarthy and Hayes, 1969) is loosely but frequently described as "knowing what stays the same as actions occur in a changing world" (Morgenstern, 1995). In the realm of software verification, the frame problem refers to establishing the boundaries within which functions operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties (Borgida, Mylopoulos, and Reiter, 1995) and their verification.

Another frequently used definition of the frame problem in the context of ArtificialIntelligence refers to ldquoefficiently determining what remains the same in a changingworldrdquo (Morgenstern 1995) This definition is similar to the first yet the initial wordsldquoefficiently determiningrdquo confer it a subtle but crucial nuance In this chapter we arerather interested in the latter and we address the issue of automatically detecting deep-state modifications in the context of αSmil a functional language In our ldquochangingworldrdquo destructive updates are not allowed The new state out of a structured valuein is obtained by destructuring in and reconstructing it in out by copying unmodifiedsubvalues from in and replacing in out only what needs to reflect the modificationThus referring to old values per se as one of the three major approaches to specifyingframe properties (described in Section 231) implies does not make sense Instead wehave to focus on and to detect the relations between the (sub)values in and out Tothis end we present a static correlation analysis which when given a predicate thatmanipulates a structured input is meant to determine automatically the subset thatremains unchanged and is further propagated into the output Thus the behaviour ofa predicate is summarised by computing relations between parts of the input and partsof the output The computed correlation summaries are a safe approximation of what

138 Chapter 7 Correlation Analysis

part of an input state of a predicate is copied to the output state: they summarise not only what is modified by the predicate, but also how it is modified and to what extent.

Outline. We continue this chapter by illustrating the targeted correlation results on an αSmil example in Section 7.1.1. In Section 7.1.2 we give a brief overview of the characteristics of our correlation analysis and explain the motivation behind some of them. The rest of the chapter focuses on technical details of the correlation analysis. In Section 7.2 we present our abstract partial equivalence type, a fundamental component of our correlation analysis. It is followed in Section 7.3 by an in-depth presentation of paths and correlations, an intermediate level of abstraction that is imperative for obtaining expressive results. In Section 7.4 we focus on the correlation analysis at an intraprocedural level and illustrate the step-by-step mechanism behind it in Section 7.4.2. A summary of the correlation analysis at an interprocedural level is given in Section 7.5. A possible extension, going beyond the detection of equivalences and handling more general relations, is briefly discussed in Section 7.6. Detecting modifications is traditionally associated with shape and side-effect analyses; in Section 7.7 we review and discuss such approaches.

7.1.1 Targeted Correlation Information

The goal of our analysis and the targeted correlation results can be illustrated on an example predicate such as stop_thread. This predicate was introduced in Section 3.1.5 (on page 50), and its body in the αSmil language was shown in Section 4.1 on page 64. We revisit it and illustrate the predicate's body in Figure 7.1.

    predicate stop_thread(process in, int i)
    -> [true: process o | inval]
    {
      array<option_thread> ta; option_thread th;
      thread ti; state s;
      1:  ta = in.threads
      2:  th = ta[i]
      3:  switch(th) as [Some: ti | None]
      4:  s = Blocked
      5:  ti = {ti with current_state = s}
      6:  th = Some(ti)
      7:  ta = [ta with i = th]
      8:  o = {in with threads = ta}
      9:  true
      10: inval
    }

Figure 7.1 – Body of the stop_thread Predicate (in the original control-flow rendering, the edges labelled false and None lead to the inval exit)

It has two possible execution scenarios: true, when the given index i corresponds to an active thread, and inval otherwise, i.e. when it corresponds to an inactive element or when it lies outside the array's bounds. In the latter case, the predicate exits with the inval label and generates no output. In the former case, stop_thread modifies the state of the i-th active thread by setting it to Blocked and returns the new state of the process in the output o. This is accomplished by destructuring the input process in and copying the array of associated threads into the local variable ta (line 1). The array's i-th element is copied to the local variable th (line 2) and, as it is an active element, its corresponding thread is extracted and put into ti (line 3). The new state for the thread value ti is created by setting its current_state field (line 5) to the state s constructed previously (line 4). The new state o of the process is constructed using ti for its i-th active element (lines 6 and 7) and copying everything else from the input in (line 8). It is interesting to note that for each destructuring step of in there is a corresponding construction step for o, as is visible at lines 1 and 8, 2 and 7, and 3 and 6, for instance.
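The destructure-and-rebuild pattern described above can be mimicked outside αSmil. The following is a minimal Python sketch, not the thesis's code: the record types `Thread` and `Process`, their field types, and the use of a tuple with `None` for inactive cells (instead of an associative array of `option_thread`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Thread:                       # hypothetical field types
    identifier: int
    current_state: str
    stack: Tuple[int, ...]

@dataclass(frozen=True)
class Process:                      # hypothetical field types
    pid: int
    current_thread: int
    address_space: str
    threads: Tuple[Optional[Thread], ...]   # None models an inactive cell

def stop_thread(proc: Process, i: int):
    """Return ("true", new_process) or ("inval", None); never mutates proc."""
    ta = proc.threads                            # line 1: destructure in
    if not 0 <= i < len(ta):                     # line 2 fails: out of bounds
        return "inval", None
    ti = ta[i]                                   # lines 2-3: i-th cell, Some/None
    if ti is None:
        return "inval", None
    ti = replace(ti, current_state="Blocked")    # lines 4-5: rebuild the thread
    ta = ta[:i] + (ti,) + ta[i + 1:]             # lines 6-7: rebuild the array
    return "true", replace(proc, threads=ta)     # line 8: rebuild the process
```

Every value that is not on the path to `current_state` is copied unchanged into the result, which is exactly the information the correlation analysis aims to recover statically.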

The targeted correlation results for this predicate are illustrated in Figure 7.2. Our analysis should infer that, between the input process in and the output o, the values of the fields pid, current_thread and address_space are equal. Furthermore, for the threads array of associated threads, it should detect that all elements are equal except the value of the i-th element (as illustrated by R_th), for which only one of the three fields, namely the current_state field, differs (shown by R_i).

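[Figure: the input in and the output o, each with the fields address_space, current_thread, pid and threads. The first three fields are related by =, the threads fields by R_th. R_th relates the i-th cells by R_i, and R_i relates the fields stack, current_state and identifier of the two threads.]

Figure 7.2 – Targeted Correlation Results for Predicate stop_thread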

By tracking equalities between pairs of variables of the same type, and by defining an abstract partial equivalence type that mirrors the layered structure of associative arrays and algebraic data types, we can detect the equality of the values of the pid, current_thread and address_space fields between the input and the output. However, if we track only equalities between variables of the same type and ignore the flow of an input's subelement value to a variable (or, conversely, the flow of a variable's value to an output's subelement), valuable information is lost. We not only lose information between inputs and outputs of different types but, by accumulating imprecisions, we also lose information concerning inputs and outputs of the same type, such as the in and o processes of our example. For instance, the equality between the values extracted from the input in and copied into ta and th respectively, as well as the relation between the values of ta and o.threads, and of th and o.threads[i], are ignored, because neither ta nor th is of the same type as in and o. As a consequence, we lose the information concerning the relation between the threads values of in and o altogether. In order to compute such information, it is imperative to track (cor)relations between variables of different types as well.

7.1.2 Correlation Analysis in a Nutshell

Our correlation analysis is a conservative static analysis inferring what is modified by an operation and to what extent. It approximates the flow of input values into output values by uncovering equalities and computing correlations, as pairs between input parts and the output parts into which these are injected. What is marked as being equal is definitely equal.

[Figure: an input and an output variable; the inner paths π and ρ are related by R, and the inner paths π′ and ρ′ by R′.]

Figure 7.3 – Intraprocedural Correlations – General Representation

Outputs are often complex compounds of different subparts of different input variables: a subset of the input is modified, while the rest is injected as is. We track the origin of subparts of the output and relate it to subparts of the input. As previously illustrated on our stop_thread example predicate, in order to prevent avoidable over-approximations we need to avoid dealing with data in a monolithic manner. To this end, it is imperative to consider pairs of different types and granularities as well. As a consequence, we are forced to introduce an additional level of granularity, allowing us to refer not only to variables but also to substructures within them. At the intraprocedural level, illustrated in Figure 7.3, we define correlations as mappings between pairs of inputs and outputs, to which we associate mappings between pairs of valid inner paths and the relations binding them. Correlations for arrays and variants are exemplified in Figures 7.4-a) and 7.4-b).

[Figure: a) Arrays: the cells at index i of two arrays a and b are related by R, i.e. ∀i, a[i] R b[i]; b) Variants.]

Figure 7.4 – Intraprocedural Domain – Examples

Similarly to our dependency analysis presented in Chapter 5, the correlation analysis is an interprocedural, flow-sensitive, field-sensitive, label-sensitive analysis that handles associative arrays, structures and variant data types. However, unlike the dependency analysis, for which we introduced a relaxed form of context-sensitivity in Chapter 6, the correlation analysis is context-insensitive. Fine-grained equivalence relations between the inputs and outputs of a predicate are computed once and subsequently propagated to its callers.

Our correlation analysis is meant to be used in an interactive verification context. Precise correlation summaries must be computed quickly in order to answer effectively, when combined with dependency summaries, queries regarding the preservation of certain invariants.

7.2 Partial Equivalence Relations

7.2.1 Abstract Partial Equivalence Type

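The first step towards automatically reasoning about the propagation of input subelements into output subelements is the definition of an abstract partial equivalence type $\mathcal{R}$ that mimics the structure of algebraic data types and arrays. A partial equivalence relation $R \in \mathcal{R}$ is defined inductively from the two atomic elements Equal and Any, and mirrors the structure of the concrete types.

Definition 7.2.1 (Partial Equivalence Type $R \in \mathcal{R}$)

\[
\begin{array}{rcll}
R & ::= & \mathit{Equal} & \text{atomic case: equal} \quad (i)\\
  & \mid & \mathit{Any} & \text{atomic case: unrelated} \quad (ii)\\
  & \mid & \{ f_1 \mapsto R_1; \ldots; f_n \mapsto R_n \} & f_1, \ldots, f_n \text{ fields} \quad (iii)\\
  & \mid & [ C_1 \mapsto R_1; \ldots; C_n \mapsto R_n ] & C_1, \ldots, C_n \text{ constructors} \quad (iv)\\
  & \mid & \langle R_{def} \rangle & \text{array} \quad (v)\\
  & \mid & \langle R_{def} \triangleright i : R_{exc} \rangle & i \text{ array index} \quad (vi)
\end{array}
\]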

Such relations represent fine-grained partial equivalences between pairs of values of the same type. Equal and Any represent equal and unrelated values, respectively. Partial equivalence relations for structures (given by (iii)) and for variants (given by (iv)) are expressed in terms of the partial equivalences of their subparts, by mapping each field or constructor to the corresponding relation. As for the dependency analysis presented in Chapter 5, for arrays we distinguish between two cases, namely arrays with a general relation applying to all of the cells (as given by (v)) or to all but one exceptional cell (as given by (vi)), for which a specific relation is known to hold.
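One possible encoding of Definition 7.2.1 as plain data is sketched below in Python. The class names (`Struct`, `Variant`, `Arr`, `ArrExc`) and the use of two string constants for the atomic cases are illustrative assumptions, not the thesis's implementation. The final value spells out the relation targeted for stop_thread in Section 7.1.1, with the None case mapped to Any as an assumption.

```python
from dataclasses import dataclass

EQUAL, ANY = "Equal", "Any"          # atomic cases (i) and (ii)

@dataclass(frozen=True)
class Struct:                        # (iii) {f1 -> R1; ...; fn -> Rn}
    fields: dict

@dataclass(frozen=True)
class Variant:                       # (iv) [C1 -> R1; ...; Cn -> Rn]
    ctors: dict

@dataclass(frozen=True)
class Arr:                           # (v) <Rdef>: one relation for all cells
    default: object

@dataclass(frozen=True)
class ArrExc:                        # (vi) <Rdef |> i : Rexc>
    default: object
    index: str                       # name of the index variable i
    exc: object

# Relation between the i-th threads: only current_state may differ (R_i).
thread_rel = Struct({"identifier": EQUAL, "current_state": ANY, "stack": EQUAL})
# Relation between the i-th cells of the two threads arrays.
cell_rel = Variant({"None": ANY, "Some": Struct({"t": thread_rel})})
# Relation between the processes in and o (R_th is the threads entry).
process_rel = Struct({
    "pid": EQUAL,
    "current_thread": EQUAL,
    "address_space": EQUAL,
    "threads": ArrExc(EQUAL, "i", cell_rel),
})
```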

The preorder relation of the partial equivalence lattice is denoted by $\sqsubseteq_R$ and defined below.

Definition 7.2.2 (Preorder Relation $\sqsubseteq_R$)

\[ \sqsubseteq_R \;\subseteq\; \mathcal{R} \times \mathcal{R} \]

It is detailed in Table 7.1.

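Table 7.1 – $\sqsubseteq_R$ – Comparison of Two Domains

\[
\dfrac{}{R \sqsubseteq_R \mathit{Any}}\;\textsc{Top}
\qquad
\dfrac{}{\mathit{Equal} \sqsubseteq_R R}\;\textsc{Bot}
\]
\[
\dfrac{R_1 \sqsubseteq_R R'_1 \quad \ldots \quad R_n \sqsubseteq_R R'_n}
      {\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} \sqsubseteq_R \{f_1 \mapsto R'_1; \ldots; f_n \mapsto R'_n\}}\;\textsc{Str}
\]
\[
\dfrac{R_1 \sqsubseteq_R R'_1 \quad \ldots \quad R_n \sqsubseteq_R R'_n}
      {[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n] \sqsubseteq_R [C_1 \mapsto R'_1; \ldots; C_n \mapsto R'_n]}\;\textsc{Var}
\]
\[
\dfrac{R \sqsubseteq_R R'}{\langle R \rangle \sqsubseteq_R \langle R' \rangle}\;\textsc{Adef}
\qquad
\dfrac{R_{def} \sqsubseteq_R R'_{def} \quad R_{exc} \sqsubseteq_R R'_{exc}}
      {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_R \langle R'_{def} \triangleright i : R'_{exc} \rangle}\;\textsc{AI}
\]
\[
\dfrac{R_{def} \sqsubseteq_R R' \quad R_{exc} \sqsubseteq_R R'}
      {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_R \langle R' \rangle}\;\textsc{AIA}
\qquad
\dfrac{R \sqsubseteq_R R'_{def} \quad R \sqsubseteq_R R'_{exc}}
      {\langle R \rangle \sqsubseteq_R \langle R'_{def} \triangleright i : R'_{exc} \rangle}\;\textsc{AAI}
\]
\[
\dfrac{i \neq j \quad R_{def} \sqsubseteq_R R'_{def} \quad R_{def} \sqsubseteq_R R'_{exc} \quad R_{exc} \sqsubseteq_R R'_{def} \quad R_{exc} \sqsubseteq_R R'_{exc}}
      {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_R \langle R'_{def} \triangleright j : R'_{exc} \rangle}\;\textsc{AIJ}
\]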
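The rules of Table 7.1 translate almost line by line into a recursive check. The sketch below is a hedged Python rendering: the encoding of relations (string constants and small dataclasses) and the function name `leq` are assumptions introduced for illustration, and the function follows the rules literally, returning False whenever no rule applies.

```python
from dataclasses import dataclass

EQUAL, ANY = "Equal", "Any"

@dataclass(frozen=True)
class Struct:
    fields: dict

@dataclass(frozen=True)
class Variant:
    ctors: dict

@dataclass(frozen=True)
class Arr:
    default: object

@dataclass(frozen=True)
class ArrExc:
    default: object
    index: str
    exc: object

def leq(a, b):
    """True when a is at least as precise as b, following Table 7.1."""
    if b == ANY or a == EQUAL:                                   # Top, Bot
        return True
    if isinstance(a, Struct) and isinstance(b, Struct):          # Str
        return a.fields.keys() == b.fields.keys() and all(
            leq(a.fields[f], b.fields[f]) for f in a.fields)
    if isinstance(a, Variant) and isinstance(b, Variant):        # Var
        return a.ctors.keys() == b.ctors.keys() and all(
            leq(a.ctors[c], b.ctors[c]) for c in a.ctors)
    if isinstance(a, Arr) and isinstance(b, Arr):                # Adef
        return leq(a.default, b.default)
    if isinstance(a, ArrExc) and isinstance(b, Arr):             # AIA
        return leq(a.default, b.default) and leq(a.exc, b.default)
    if isinstance(a, Arr) and isinstance(b, ArrExc):             # AAI
        return leq(a.default, b.default) and leq(a.default, b.exc)
    if isinstance(a, ArrExc) and isinstance(b, ArrExc):
        if a.index == b.index:                                   # AI
            return leq(a.default, b.default) and leq(a.exc, b.exc)
        return all(leq(x, y)                                     # AIJ
                   for x in (a.default, a.exc)
                   for y in (b.default, b.exc))
    return False                                                 # no rule applies
```

For example, `leq(ArrExc(EQUAL, "i", ANY), Arr(ANY))` holds by rule AIA, whereas the converse does not hold, since rule AAI would require Any to be below Equal.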

The join and meet operations are denoted by $\vee_R$ and $\wedge_R$, respectively.

Definition 7.2.3 (Join Operation $\vee_R$)

\[ \vee_R : \mathcal{R} \times \mathcal{R} \to \mathcal{R} \]

Definition 7.2.4 (Meet Operation $\wedge_R$)

\[ \wedge_R : \mathcal{R} \times \mathcal{R} \to \mathcal{R} \]

Both are commutative operations applied pointwise on each subelement. Join, shown in Table 7.2, has Equal as its identity element and Any as its absorbing element. Meet, shown in Table 7.3, has Equal as its absorbing element and Any as its identity element. For both operations, the undisplayed cases are defined by their symmetrical counterparts.

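Table 7.2 – Partial Equivalences – $\vee_R$ – Join Operation

\[
\begin{array}{rcl}
\mathit{Any} \vee_R R & = & \mathit{Any}\\
\mathit{Equal} \vee_R R & = & R\\
\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} \vee_R \{f_1 \mapsto R'_1; \ldots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \vee_R R'_1; \ldots; f_n \mapsto R_n \vee_R R'_n\}\\
{[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n]} \vee_R {[C_1 \mapsto R'_1; \ldots; C_n \mapsto R'_n]} & = & [C_1 \mapsto R_1 \vee_R R'_1; \ldots; C_n \mapsto R_n \vee_R R'_n]\\
\langle R \rangle \vee_R \langle R' \rangle & = & \langle R \vee_R R' \rangle\\
\langle R \rangle \vee_R \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \vee_R R'_{def} \triangleright i : R \vee_R R'_{exc} \rangle\\
\langle R_{def} \triangleright i : R_{exc} \rangle \vee_R \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\begin{cases}
\langle R_{def} \vee_R R'_{def} \triangleright i : R_{exc} \vee_R R'_{exc} \rangle & \text{if } i = j\\
\langle R_{def} \vee_R R'_{def} \vee_R R_{exc} \vee_R R'_{exc} \rangle & \text{if } i \neq j
\end{cases}
\end{array}
\]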
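As an illustration of Table 7.2, the join can be written as a recursive function. This Python sketch uses an assumed encoding of relations (string constants and small dataclasses); the name `join` and the choice to raise `TypeError` on relations of incompatible shapes, which cannot occur for well-typed relations of the same type, are illustrative assumptions. The meet of Table 7.3 would be obtained by the dual treatment of Equal and Any.

```python
from dataclasses import dataclass

EQUAL, ANY = "Equal", "Any"

@dataclass(frozen=True)
class Struct:
    fields: dict

@dataclass(frozen=True)
class Variant:
    ctors: dict

@dataclass(frozen=True)
class Arr:
    default: object

@dataclass(frozen=True)
class ArrExc:
    default: object
    index: str
    exc: object

def join(a, b):
    """Least precise combination of a and b, following Table 7.2."""
    if a == ANY or b == ANY:          # Any is absorbing
        return ANY
    if a == EQUAL:                    # Equal is the identity
        return b
    if b == EQUAL:
        return a
    if isinstance(a, Struct) and isinstance(b, Struct):
        return Struct({f: join(r, b.fields[f]) for f, r in a.fields.items()})
    if isinstance(a, Variant) and isinstance(b, Variant):
        return Variant({c: join(r, b.ctors[c]) for c, r in a.ctors.items()})
    if isinstance(a, Arr) and isinstance(b, Arr):
        return Arr(join(a.default, b.default))
    if isinstance(a, Arr) and isinstance(b, ArrExc):
        return ArrExc(join(a.default, b.default), b.index,
                      join(a.default, b.exc))
    if isinstance(a, ArrExc) and isinstance(b, Arr):
        return join(b, a)             # symmetrical counterpart
    if isinstance(a, ArrExc) and isinstance(b, ArrExc):
        if a.index == b.index:        # same exceptional cell
            return ArrExc(join(a.default, b.default), a.index,
                          join(a.exc, b.exc))
        # different exceptional cells: collapse to a single default relation
        return Arr(join(join(a.default, b.default), join(a.exc, b.exc)))
    raise TypeError("relations of incompatible shapes")
```

Note that when the exceptional indices differ, the exception is lost: joining `ArrExc(EQUAL, "i", ANY)` with `ArrExc(EQUAL, "j", ANY)` yields `Arr(ANY)`.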

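Table 7.3 – Partial Equivalences – $\wedge_R$ – Meet Operation

\[
\begin{array}{rcl}
\mathit{Any} \wedge_R R & = & R\\
\mathit{Equal} \wedge_R R & = & \mathit{Equal}\\
\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} \wedge_R \{f_1 \mapsto R'_1; \ldots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \wedge_R R'_1; \ldots; f_n \mapsto R_n \wedge_R R'_n\}\\
{[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n]} \wedge_R {[C_1 \mapsto R'_1; \ldots; C_n \mapsto R'_n]} & = & [C_1 \mapsto R_1 \wedge_R R'_1; \ldots; C_n \mapsto R_n \wedge_R R'_n]\\
\langle R \rangle \wedge_R \langle R' \rangle & = & \langle R \wedge_R R' \rangle\\
\langle R \rangle \wedge_R \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \wedge_R R'_{def} \triangleright i : R \wedge_R R'_{exc} \rangle\\
\langle R_{def} \triangleright i : R_{exc} \rangle \wedge_R \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\begin{cases}
\langle R_{def} \wedge_R R'_{def} \triangleright i : R_{exc} \wedge_R R'_{exc} \rangle & \text{if } i = j\\
\langle R_{def} \wedge_R R'_{def} \wedge_R R_{exc} \wedge_R R'_{exc} \rangle & \text{if } i \neq j
\end{cases}
\end{array}
\]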

Additionally, extraction functions are defined for partial equivalence relations.

Definition 7.2.5 (Extraction of a Field's Relation)

\[ \mathit{extr}_f : \mathcal{R} \nrightarrow \mathcal{R} \]

Definition 7.2.6 (Extraction of a Constructor's Relation)

\[ \mathit{extr}_C : \mathcal{R} \nrightarrow \mathcal{R} \]

Definition 7.2.7 (Extraction of a Cell's Relation)

\[ \mathit{extr}_{\langle i \rangle} : \mathcal{R} \nrightarrow \mathcal{R} \]

These are partial functions and can only be applied on relations of the corresponding types. For example, the field extraction $\mathit{extr}_f$ only makes sense for atomic relations or for structured relations having a field named f, which should be the case if the relation connects two values of a structured type with a field f. For either of the two atomic relations, Equal or Any, applying any of these extractions yields Equal or Any, respectively. They are summarized in Table 7.4.

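Table 7.4 – Partial Equivalence Extractions

\[
\begin{array}{ll}
\mathit{extr}_f(R),\ f \in F: &
\begin{array}{rcl}
\mathit{extr}_f(\mathit{Any}) & = & \mathit{Any}\\
\mathit{extr}_f(\mathit{Equal}) & = & \mathit{Equal}\\
\mathit{extr}_f(\{f_1 \mapsto R_1; \ldots; f_i \mapsto R_i; \ldots; f_n \mapsto R_n\}) & = & R_i \quad \text{if } f = f_i
\end{array}\\[2ex]
\mathit{extr}_C(R),\ C \in \mathcal{C}: &
\begin{array}{rcl}
\mathit{extr}_C(\mathit{Any}) & = & \mathit{Any}\\
\mathit{extr}_C(\mathit{Equal}) & = & \mathit{Equal}\\
\mathit{extr}_C([C_1 \mapsto R_1; \ldots; C_i \mapsto R_i; \ldots; C_n \mapsto R_n]) & = & R_i \quad \text{if } C = C_i
\end{array}\\[2ex]
\mathit{extr}_{\langle i \rangle}(R): &
\begin{array}{rcl}
\mathit{extr}_{\langle i \rangle}(\mathit{Any}) & = & \mathit{Any}\\
\mathit{extr}_{\langle i \rangle}(\mathit{Equal}) & = & \mathit{Equal}\\
\mathit{extr}_{\langle i \rangle}(\langle R \rangle) & = & R\\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright i : R_{exc} \rangle) & = & R_{exc}\\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright j : R_{exc} \rangle) & = & R_{def} \vee_R R_{exc} \quad \text{if } i \neq j
\end{array}
\end{array}
\]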

7.2.2 Well-Typed Partial Equivalences and their Semantics

As discussed in the case of dependencies in Section 5.2.2, syntactic partial equivalences are untyped. However, their interpretation is made in the context of a type $\tau \in T$. The atomic cases, Equal and Any, can apply to any type, since they do not exhibit any data type features. Cases other than Equal and Any only have non-empty interpretations for types $\tau$ which are compatible with their shape. For instance, the structured relation $\{f \mapsto R\}$ only really makes sense for structured types with a single field f whose type is itself compatible with R, and will not be used in connection with variant or array types, for example. In Table 7.5 we detail the inference rules related to the well-typedness of partial equivalences. This is described as a judgement parameterized by a typing environment $\Gamma$ (Definition 4.3.1).

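\[
\dfrac{}{\Gamma \vdash \mathit{Equal} : \tau}\;\textsc{WT}\top
\qquad
\dfrac{}{\Gamma \vdash \mathit{Any} : \tau}\;\textsc{WT}\bot
\]
\[
\dfrac{\tau = \mathit{struct}\{f_1 : \tau_1; \ldots; f_n : \tau_n\} \quad \Gamma \vdash R_1 : \tau_1 \quad \ldots \quad \Gamma \vdash R_n : \tau_n}
      {\Gamma \vdash \{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} : \tau}\;\textsc{WTStruct}
\]
\[
\dfrac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n] \quad \Gamma \vdash R_1 : \tau_1 \quad \ldots \quad \Gamma \vdash R_n : \tau_n}
      {\Gamma \vdash [C_1 \mapsto R_1; \ldots; C_n \mapsto R_n] : \tau}\;\textsc{WTVar}
\]
\[
\dfrac{\Gamma \vdash R : \tau}
      {\Gamma \vdash \langle R \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\;\textsc{WTArr}
\qquad
\dfrac{\Gamma \vdash R_{def} : \tau \quad \Gamma \vdash R_{exc} : \tau \quad \Gamma(i) = \tau_i}
      {\Gamma \vdash \langle R_{def} \triangleright i : R_{exc} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\;\textsc{WTArrI}
\]

Table 7.5 – Well-Typed Partial Equivalences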

The atomic values are generic: they are well-typed with respect to any type (WT⊤, WT⊥). The partial equivalences of structures (WTStruct) are well-typed only with respect to an adequate structured type whose field types are themselves compatible with the equivalences mapped to them. Similarly, the partial equivalences of variants (WTVar) are well-typed only with respect to an adequate variant type. In turn, the constructors must themselves be pointwise compatible with the equivalences mapped to them. For well-typed array equivalences (WTArr, WTArrI), the default relation as well as the exceptional relation have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of i, the index of the known exceptional equivalence relation, has to be compatible with $\tau_i$, the array's index type.

The semantics of a partial equivalence R for a type $\tau$ is a partial equivalence relation over values of type $\tau$. Given a valuation E from variables to semantic values (Definition 4.4.2), the interpretation $[\![R]\!]_\tau$ of a relation $R \in \mathcal{R}$ with respect to some type $\tau$ is a binary relation over $D_\tau$ (Definition 4.4.1). The interpretation $[\![R]\!]_\tau$ is defined as shown in Table 7.6.

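\[
\begin{array}{rcl}
[\![\mathit{Equal}]\!]_\tau & = & \{(x, x) \mid x \in D_\tau\}\\
[\![\mathit{Any}]\!]_\tau & = & D_\tau \times D_\tau\\
[\![\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\}]\!]_{\mathit{struct}\{f_1 : \tau_1; \ldots; f_n : \tau_n\}} & = &
\{(\{f_1 = v_1; \ldots; f_n = v_n\}, \{f_1 = w_1; \ldots; f_n = w_n\}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\}\\
[\![[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n]]\!]_{\mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n]} & = &
\{(C_i[v_i], C_i[w_i]) \mid (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\}\\
[\![\langle R_{def} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = &
\{((P, (v)_k), (P, (w)_k)) \mid \forall k,\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\}\\
[\![\langle R_{def} \triangleright i : R_{exc} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = &
\{((P, (v)_k), (P, (w)_k)) \mid E(i) \in P \implies (v_{E(i)}, w_{E(i)}) \in [\![R_{exc}]\!]_\tau \ \wedge\ \forall k \neq E(i),\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\}
\end{array}
\]

Table 7.6 – Partial Equivalence Relations – Semantics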

A partial equivalence relation R only relates values of the same type $\tau$, which must be compatible with R's "shape". For structures, a partial equivalence relates pointwise the values of the fields of the two structure values. For variant values, a partial equivalence relation relates values built with the same constructor $C_i$, using arguments whose values are related by a relation $R_i$. For arrays, P indicates the support, which has to be identical for both values. The values of the array elements are pointwise related by the same relation $R_{def}$, with the exception of the i-th elements, which are potentially related by an exceptional relation $R_{exc}$. Since variables i are used for indicating the exceptional elements, the valuation E is used for determining the value of i.

7.3 Paths and Correlations

7.3.1 Paths and Correlation Types

The partial equivalence relations discussed in Section 7.2 and defined in Definition 7.2.1 are enough to represent fine-grained information for values of the same structured type. For the stop_thread example discussed in Section 7.1.1, these would suffice to express the equality of the pid, current_thread and address_space fields between the input process in and the output process o, by simply mapping this pair to the following partial equivalence:

\[ \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{current\_thread} \mapsto \mathit{Equal};\ \mathit{address\_space} \mapsto \mathit{Equal}\} \]

However, the partial equivalence relations cannot, for instance, be used to convey the equality at line 1 in Figure 7.1 between the value of the threads field of in and the local ta variable. By not tracking information such as this, we lose the targeted information regarding the threads field, denoted by R_th in Figure 7.2. In order to express this information, we first need to be able to refer to the substructure in.threads and relate its value to that of ta.

To this end, rather than handling only partial equivalences between pairs of variables of the same type and approximating the rest to Any (the element that conveys no information), we introduce an intermediate level allowing us to store relations between subparts of values. We begin by introducing access paths. Unlike the symbolic paths introduced in Chapter 6 and defined in Definition 6.3.1, which are used for computing dependency summaries with context-sensitive elements, the paths used for the correlation analysis are actual access paths inside some value's structure. The symbolic paths used in deferred dependencies may cover multiple actual paths inside a value, whereas the access paths required for the correlation analysis represent unique chains of internal accesses leading to a single nested subvalue. Each access path is rooted at one of the program's variables. It is worth remarking that in both cases an intermediate level below variables needs to be introduced as soon as fine-grained relations between pairs of variables are considered, directly or indirectly. In the case of deferred dependencies, this was not the main goal per se but rather a mechanism for obtaining more precision in specific cases for already pertinent dependency results. In contrast, for the correlation results this is imperative for obtaining useful, expressive information in non-trivial cases. We therefore define a recursive type $\pi \in \Pi$ encompassing this.

Definition 7.3.1 (Access Path Type $\pi \in \Pi$)

\[
\begin{array}{rcll}
\pi & ::= & \varepsilon & \text{empty: root}\\
    & \mid & f.\pi & f \in F\\
    & \mid & C.\pi & C \in \mathcal{C}\\
    & \mid & \langle i \rangle.\pi & i \text{ index, program variable}
\end{array}
\]

The empty path, denoted by $\varepsilon$, is the special case denoting an access to an entire element, i.e. the root. The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi.\pi'$. For instance, the path denoting the current_state field of the i-th active associated thread of the in process of our stop_thread predicate would be the following: in.threads.〈i〉.Some.t.current_state.

Meaningful information is conveyed by associating paths and partial equivalence relations. For instance, the equality between in.threads and ta at line 1 in Figure 7.1 can be expressed by associating Equal to the pair of subelements identified by the threads path in in and by $\varepsilon$ in ta. We call such a mapping from a pair of access paths to a partial relation a correlation. After setting the i-th element of ta to ti, the thread with the current state set to Blocked and everything else left unmodified, we could express the relation between in and ta by two correlations, namely:

\[
\begin{array}{l}
(\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle\\
(\mathit{threads}.\langle i \rangle.\mathit{Some}.t,\ \langle i \rangle.\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\]

To this end, we introduce correlation maps $\kappa \in K$, defined below.

Definition 7.3.2 (Correlation Maps $\kappa \in K$). Correlation maps $\kappa \in K$ are finite mappings from pairs of paths to partial equivalence relations $R \in \mathcal{R}$:

\[ \kappa : \Pi \times \Pi \to \mathcal{R} \]


Generally, for two given variables e and o, a correlation $(\pi, \rho) \mapsto R$ specifies that e and o have nested subelements, respectively identified by the inner paths $\pi$ and $\rho$, whose values are related by the relation R.

We conclude this subsection by specifying what it means for paths, correlations and correlation maps to be well-typed.

To characterize the contexts in which an access path $\pi$ is well-typed, we need to consider the types of values to which it can be applied and the types of (sub)values to which it can lead. Therefore, in the following we define a typing judgement for access paths as a three-place relation $\pi : \tau \to \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and, in that case, it will always describe subvalues of type $\tau'$. Additionally, the typing judgement is also parameterized by a set of input variables I, which are the variables having the right to appear as identifiers for array accesses. This is detailed in Table 7.7.

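\[
\dfrac{}{\Gamma, I \vdash \varepsilon : \tau \to \tau}\;\textsc{WT}\varepsilon
\]
\[
\dfrac{\tau = \mathit{struct}\{f_1 : \tau_1; \ldots; f_i : \tau_i; \ldots; f_n : \tau_n\} \quad \Gamma, I \vdash \pi_i : \tau_i \to \tau'}
      {\Gamma, I \vdash f_i.\pi_i : \tau \to \tau'}\;\textsc{WTStructAPath}
\]
\[
\dfrac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_i : \tau_i \mid \ldots \mid C_n : \tau_n] \quad \Gamma, I \vdash \pi_i : \tau_i \to \tau'}
      {\Gamma, I \vdash C_i.\pi_i : \tau \to \tau'}\;\textsc{WTVarAPath}
\]
\[
\dfrac{\Gamma, I \vdash \pi_i : \tau \to \tau' \quad \Gamma(i) = \tau_i \quad i \in I}
      {\Gamma, I \vdash \langle i \rangle.\pi_i : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\;\textsc{WTCellAPath}
\]

Table 7.7 – Well-Typed Access Paths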

Correlations are mappings from pairs of access paths to partial relations. Though the two access paths can be applied to values of different types, they both need to return subvalues of the same type $\tau'$. Furthermore, the partial equivalence relation associated to them has to be well-typed with respect to $\tau'$, as detailed in Table 7.5. The inference rule for well-typed correlations is shown in Table 7.8.

\[
\dfrac{\Gamma, I \vdash \pi : \tau_l \to \tau' \quad \Gamma, I \vdash \rho : \tau_r \to \tau' \quad \Gamma \vdash R : \tau'}
      {\Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}\;\textsc{WTCorrelation}
\]

Table 7.8 – Well-Typed Correlations

Finally, as shown in Table 7.9, a correlation map $\kappa$ is well-typed if all the correlations it contains are well-typed.

\[
\dfrac{\forall\, (\pi, \rho) \mapsto R \in \kappa,\quad \Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}
      {\Gamma, I \vdash \kappa : (\tau_l, \tau_r)}\;\textsc{WTCorMaps}
\]

Table 7.9 – Well-Typed Correlation Maps

7.3.2 Alignment and Partial Order

There is no clear choice for a canonical form for correlations. For instance, it is equivalent to write $(\varepsilon, \varepsilon) \mapsto \{f \mapsto R\}$ and $(f, f) \mapsto R$. Is one superior to the other? Which one should be chosen? Operations can create and manipulate correlations in different manners that are hard to predict. New correlations can also be introduced while considering def-use chains in the transfer function, presented later in Section 7.4.1. Choosing between the two forms considerably limits flexibility. Not choosing a canonical form, however, has consequences as well; notably, it renders the definition of a partial order between correlation maps difficult. In order to compare two correlation maps $\kappa_1$ and $\kappa_2$, we cannot simply verify whether the path pairs are identical and compare their associated relations. A correlation of the second map could be linked in different manners to multiple mappings of the first.

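For instance, between a process p of the type used by our stop_thread example and an array ta of the same type as the field threads of the process, we might have the following correlation maps:

\[
\kappa_1:\quad (\mathit{threads},\ \varepsilon) \mapsto \big\langle\, [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \,\big\rangle
\]
\[
\kappa_2:\quad
\begin{array}{l}
(\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle\\
(\mathit{threads}.\langle i \rangle.\mathit{Some}.t,\ \langle i \rangle.\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\]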

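These correlation maps can be depicted as follows.

[Figure: in $\kappa_1$, the threads field of p and ta (path $\varepsilon$) are related by $R_1$; in $\kappa_2$, they are related by $R_2$, and their nested i-th elements are additionally related by $R'_2$.]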

As illustrated above, in the given example map $\kappa_2$, in addition to the relation $R_2$ associated to (threads, ε), the relation associated to (threads.〈i〉.Some.t, 〈i〉.Some.t) and denoted by $R'_2$ expresses information about the values of the process's threads field and ta as well. These are nested in the i-th element of each, as identified by 〈i〉.Some.t. In order to compare these two correlation maps, we first have to determine the relationships between the pair of paths (threads, ε) from $\kappa_1$ and each pair of paths of $\kappa_2$. The first pair of paths in $\kappa_2$ is identical, whereas the second pair refers to elements that are further away from the root. Based on these relationships, we have to extract all the information relevant to (threads, ε) from $\kappa_2$ and consider it in its entirety. This amounts to:

\[
(\mathit{threads},\ \varepsilon) \mapsto \big\langle\, \mathit{Equal} \triangleright i : [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \,\big\rangle
\]

Having expressed the information from the $\kappa_2$ correlation map at the same level as the information of $\kappa_1$, i.e. that of the pair of paths (threads, ε), we can finally compare them and conclude that the information contained by $\kappa_2$ is more precise than the relation associated to (threads, ε) in $\kappa_1$. The relation associated to (threads, ε) in $\kappa_1$ captures the equality between the values of the identifier and stack fields of all active thread elements of the two arrays identified by the paths. The relation associated to (threads, ε) in $\kappa_2$ expresses the equality between all thread elements of the two arrays except the i-th elements. Furthermore, if the i-th elements of the two arrays are active, it captures the equality between the values of the identifier and stack fields. Thus, by using the information contained by $\kappa_1$, we can conclude that for all active elements of the two arrays the values of 2 out of the 3 fields are equal; by using the more precise information contained by $\kappa_2$, we can conclude that all elements of the two arrays are equal, except the i-th one, for which the values of the same 2 out of 3 fields as in $\kappa_1$ are equal.

In the general case, for comparing two correlation maps $\kappa_1$ and $\kappa_2$, we need to collect, for each correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained by $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$, and verify whether this covers at least the same information as the relation R. This information could be scattered across multiple mappings of the correlation map $\kappa_1$. We call alignment the process of collecting, for any correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained in $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$. It is necessary in the absence of a canonical form, a trait of our approach that is both a weakness and a strength: it leads to complex computations but gives considerable flexibility, as will be shown in Section 7.4.

For aligning, we first determine the relationships between paths by determining the relationship between the sequences of internal accesses that they represent. These can be identical, representing the same traversal to the same subelement of a value, or they can be completely unrelated, such as f and g for instance, representing accesses to two different fields of a structure. They can also represent sequences of accesses of different depths, one being the prefix of the other, i.e. being closer to the root. For example, the path f is a prefix of the path f.〈i〉: the first represents the access to the field f, whereas the second one represents an access to the i-th element of the array nested in the field f.

To distinguish between these cases, we define a link type and a matching operator.

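Definition 7.3.3 (Link Type $\mu \in \mathcal{M}$). A link type, denoted by $\mu \in \mathcal{M}$, is defined as follows:

\[
\mu \;::=\; \mathit{Identical} \;\mid\; \mathit{Left}\ \pi \;\mid\; \mathit{Right}\ \pi \;\mid\; \mathit{Incompatible}
\]

Definition 7.3.4 (Matching Operator $\curlywedge$). The matching operator $\curlywedge$ retrieves the link $\mu$ between two paths:

\[
\curlywedge : \Pi \times \Pi \to \mathcal{M}
\qquad
\curlywedge(\pi, \rho) =
\begin{cases}
\mathit{Identical} & \pi = \rho\\
\mathit{Left}\ \pi' & \pi.\pi' = \rho\\
\mathit{Right}\ \rho' & \rho.\rho' = \pi\\
\mathit{Incompatible} & \text{otherwise}
\end{cases}
\]

The different cases are depicted in Table 7.11.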
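The four cases of the matching operator amount to a prefix test on sequences of accesses. The following Python sketch is an illustration under assumptions: paths are encoded as tuples of step labels (for instance `("threads", "<i>", "Some", "t")`), and the names `match`, `Identical`, `Left`, `Right` and `Incompatible` mirror Definitions 7.3.3 and 7.3.4 but are not taken from an existing implementation.

```python
from dataclasses import dataclass
from typing import Tuple

Path = Tuple[str, ...]               # () encodes the empty path (the root)

@dataclass(frozen=True)
class Identical:
    pass

@dataclass(frozen=True)
class Left:                          # the first path is a strict prefix
    rest: Path                       # remainder pi' such that pi.pi' = rho

@dataclass(frozen=True)
class Right:                         # the second path is a strict prefix
    rest: Path                       # remainder rho' such that rho.rho' = pi

@dataclass(frozen=True)
class Incompatible:
    pass

def match(pi: Path, rho: Path):
    """Return the link between two access paths (Definition 7.3.4)."""
    if pi == rho:
        return Identical()
    if rho[:len(pi)] == pi:          # pi is closer to the root
        return Left(rho[len(pi):])
    if pi[:len(rho)] == rho:         # rho is closer to the root
        return Right(pi[len(rho):])
    return Incompatible()
```

With this encoding, `match(("f",), ("f", "<i>"))` yields `Left(("<i>",))`, reflecting that f is a prefix of f.〈i〉, while `match(("f",), ("g",))` yields `Incompatible()`.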

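\[
\begin{array}{ll}
\curlywedge(\pi, \rho) = \mathit{Identical} & \pi \text{ and } \rho \text{ denote the same chain of accesses}\\
\curlywedge(\pi, \rho) = \mathit{Left}\ \pi' & \pi \text{ is a prefix of } \rho \text{, extended by } \pi'\\
\curlywedge(\pi, \rho) = \mathit{Right}\ \rho' & \rho \text{ is a prefix of } \pi \text{, extended by } \rho'\\
\curlywedge(\pi, \rho) = \mathit{Incompatible} & \pi \text{ and } \rho \text{ diverge}
\end{array}
\]

Table 7.11 – Links between Access Paths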

Definition 7.3.5 (Aligning). Aligning a correlation $(\pi, \rho) \mapsto R$ to another pair of paths $(\pi', \rho')$ is an operation of type

\[ (\Pi \times \Pi \times \mathcal{R}) \times (\Pi \times \Pi) \to \mathcal{R} \]

whose result, for $[(\pi, \rho) \mapsto R]$ and $(\pi', \rho')$, is written $R_{(\pi,\rho)(\pi',\rho')}$.

From R we obtain the information referring to the elements identified by $(\pi', \rho')$ and denote it by $R_{(\pi,\rho)(\pi',\rho')}$. This is done by matching on $\pi$ and $\pi'$ on the one hand, and on $\rho$ and $\rho'$ on the other, and by distinguishing between the different cases. When the paths are identical, we can simply return the relation R. When the links between the paths differ, or when the paths are incompatible, we have to approximate to the least precise relation, thus returning Any. When $\pi$ and $\rho$ are shallower paths, i.e. closer to the root, we need to make a projection. For example, aligning $(f, \varepsilon) \mapsto \{a \mapsto R_a; b \mapsto R_b; c \mapsto R_c\}$ to $(f.b, b)$ consists in projecting b on the relation $\{a \mapsto R_a; b \mapsto R_b; c \mapsto R_c\}$ and thus obtaining $R_b$. More generically, this case is depicted below.

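[Figure: the known correlation R relates two subvalues that are closer to the root than the requested pair of paths, which designates the nested element δ inside each of them.]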

For aligning the known correlation to the given pair of paths, we need to extract from R the information that is relevant for the nested element δ, as depicted below.

[Figure: the same two subvalues, with R restricted to the nested element δ on both sides.]

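On the contrary, if $\pi'$ and $\rho'$ are closer to the root, we need to perform an injection, denoted by $\curvearrowleft$. For example, aligning $(f.b, b) \mapsto R_b$ to $(f, \varepsilon)$ consists in creating a relation $\{a \mapsto \mathit{Any}; b \mapsto R_b; c \mapsto \mathit{Any}\}$. More generically, this case can be depicted as follows.

[Figure: the known correlation relates the nested elements δ; the requested pair of paths (αβ, β) designates enclosing subvalues that are closer to the root.]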

For aligning the known correlation to the given pair of paths, we need to express the relation $R_\delta$ at the level of the (αβ, β) paths, a level that is closer to the root. This consists in creating a new, higher-level relation where the element identified by δ is mapped to $R_\delta$ and everything else is "filled" with Any, since nothing is known about the rest of the elements. This can be depicted as follows.

[Figure: the enclosing subvalues at (αβ, β), where the element δ is related by $R_\delta$ and all other elements are related by Any.]

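In the general case, $R_{(\pi,\rho)(\pi',\rho')}$ is computed as defined below.

Definition 7.3.6 (Computation of $R_{(\pi,\rho)(\pi',\rho')}$)

\[
R_{(\pi,\rho)(\pi',\rho')} =
\begin{cases}
R & \text{when } \curlywedge(\pi, \pi') = \curlywedge(\rho, \rho') = \mathit{Identical}\\
\mathrm{Projection}(\sigma, R) & \text{when } \curlywedge(\pi, \pi') = \curlywedge(\rho, \rho') = \mathit{Left}\ \sigma\\
\curvearrowleft(R, \sigma) & \text{when } \curlywedge(\pi, \pi') = \curlywedge(\rho, \rho') = \mathit{Right}\ \sigma\\
\mathit{Any} & \text{otherwise}
\end{cases}
\]

The projection and injection operators used above are defined as follows.

Definition 7.3.7 (Projection Operator)

\[
\mathrm{Projection} : \Pi \times \mathcal{R} \nrightarrow \mathcal{R}
\qquad
\mathrm{Projection}(\pi, R) =
\begin{cases}
R & \text{when } \pi = \varepsilon\\
\mathrm{Projection}(\pi', \mathit{extr}_f(R)) & \text{when } \pi = f.\pi'\\
\mathrm{Projection}(\pi', \mathit{extr}_C(R)) & \text{when } \pi = C.\pi'\\
\mathrm{Projection}(\pi', \mathit{extr}_{\langle i \rangle}(R)) & \text{when } \pi = \langle i \rangle.\pi'
\end{cases}
\]

Definition 7.3.8 (Injection Operator $\curvearrowleft$)

\[
\curvearrowleft : \mathcal{R} \times \Pi \nrightarrow \mathcal{R}
\qquad
\curvearrowleft(R, \pi) =
\begin{cases}
R & \text{when } \pi = \varepsilon\\
\{f_1 \mapsto \mathit{Any}; \ldots; f_i \mapsto \curvearrowleft(R, \pi'); \ldots; f_n \mapsto \mathit{Any}\} & \text{when } \pi = f.\pi',\ f = f_i\\
{[C_1 \mapsto \mathit{Any}; \ldots; C_i \mapsto \curvearrowleft(R, \pi'); \ldots; C_n \mapsto \mathit{Any}]} & \text{when } \pi = C.\pi',\ C = C_i\\
\langle \mathit{Any} \triangleright i : \curvearrowleft(R, \pi') \rangle & \text{when } \pi = \langle i \rangle.\pi'
\end{cases}
\]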

For applying the injection operator, we need to know the types of the elements onto which the relation is injected, i.e. in order to "fill" the unknown relations for fields or constructors with Any, we need to know which those fields or constructors are. Thus, in practice, we need to connect the types to the context.

Aligning a correlation map $\kappa \in K$ to $(\pi', \rho')$ amounts to performing this operation for each element $(\pi, \rho) \mapsto R$ of $\kappa$ and intersecting the results with the $\wedge_R$ operator (Definition 7.2.4).
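Projection and injection can be sketched as two small recursive functions. The Python code below is an illustration under several assumptions: paths are tuples of `(kind, name)` steps with kind `"f"` (field), `"C"` (constructor) or `"i"` (array cell); the type information needed by injection is passed as a hypothetical nested description (`("struct", {...})`, `("variant", {...})`, `("arr", elem)` or `None` for a leaf); and, to keep the sketch short, extracting a cell whose index differs from the known exceptional index returns Any, which is coarser than (but a safe over-approximation of) the join prescribed by Table 7.4.

```python
from dataclasses import dataclass

EQUAL, ANY = "Equal", "Any"

@dataclass(frozen=True)
class Struct:
    fields: dict

@dataclass(frozen=True)
class Variant:
    ctors: dict

@dataclass(frozen=True)
class Arr:
    default: object

@dataclass(frozen=True)
class ArrExc:
    default: object
    index: str
    exc: object

def extract(kind, name, rel):
    """One extraction step (Table 7.4), simplified for differing indices."""
    if rel in (EQUAL, ANY):          # atomic relations are preserved
        return rel
    if kind == "f":
        return rel.fields[name]
    if kind == "C":
        return rel.ctors[name]
    if isinstance(rel, Arr):
        return rel.default
    if rel.index == name:
        return rel.exc
    return ANY                       # assumption: coarser than Rdef join Rexc

def project(path, rel):
    """Follow path inside rel and return the nested relation."""
    if not path:
        return rel
    (kind, name), rest = path[0], path[1:]
    return project(rest, extract(kind, name, rel))

def inject(rel, path, ty):
    """Wrap rel so that it sits at path, filling all siblings with Any."""
    if not path:
        return rel
    (kind, name), rest = path[0], path[1:]
    if kind == "f":
        _, fields = ty               # ty = ("struct", {field: type})
        return Struct({f: inject(rel, rest, t) if f == name else ANY
                       for f, t in fields.items()})
    if kind == "C":
        _, ctors = ty                # ty = ("variant", {constructor: type})
        return Variant({c: inject(rel, rest, t) if c == name else ANY
                        for c, t in ctors.items()})
    _, elem = ty                     # ty = ("arr", element type)
    return ArrExc(ANY, name, inject(rel, rest, elem))
```

On the example from the text, projecting the path b on a relation with fields a, b and c returns the relation of b, and injecting a relation at b into a structure with those three fields produces a structure whose a and c entries are Any.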

Definition 739 Aligning Correlation Maps

κ (πprime ρprime) =and

R(πρ)7rarrRisinκ

R(πρ)(πprimeρprime)

74 Intraprocedural Correlation Analysis 155

The obtained results R(πρ)(πprimeρprime) are intersected in order to take into account all the in-

formation scattered across the different elements of κ and thus to obtain the mostprecise partial equivalence relation that is contained in κ about the elements identifiedby (πprime ρprime)

Finally, we can define the preorder for correlation maps.

Definition 7.3.10. Correlation Maps Preorder $\sqsubseteq$

\[
\kappa_1 \sqsubseteq \kappa_2 \iff \forall [(\pi, \rho) \mapsto R] \in \kappa_2.\ \ \mathit{align}(\kappa_1, (\pi, \rho)) \sqsubseteq_R R
\]

A correlation map $\kappa_1$ is therefore more precise than another correlation map $\kappa_2$ if the relation obtained by aligning $\kappa_1$ to any pair of paths $(\pi, \rho)$ of $\kappa_2$ is more precise than $R$, the relation mapped to this pair in $\kappa_2$. By definition, any correlation map $\kappa \in \mathcal{K}$ is smaller than $\emptyset$, the empty correlation map. Therefore, the empty correlation map is the top element for the correlation maps semilattice. A bottom element in this case does not make sense, as it would have to map to $\mathit{Equal}$ any pair of paths denoting (sub)elements having compatible types.

The join operation defined between two correlation maps is denoted by $\vee$.

Definition 7.3.11. Join Operation $\vee$ for Correlation Maps

\[
\kappa_1 \vee \kappa_2 = \kappa_3 \iff \forall [(\pi, \rho) \mapsto R] \in \kappa_1.\ \ \kappa_3(\pi, \rho) = R \vee_R \mathit{align}(\kappa_2, (\pi, \rho))
\]

It consists in aligning the correlation map $\kappa_2$ to any correlation $(\pi, \rho) \mapsto R$ in $\kappa_1$ and joining the obtained aligned relation with $R$. We note that the correlation map obtained by joining $\kappa_1$ and $\kappa_2$ will contain the same keys as $\kappa_1$. We could have expressed join by aligning the first correlation map to the elements of the second map. This would lead to results that have different forms, i.e., $(\varepsilon, \varepsilon) \mapsto \{f \mapsto R\}$ versus $(f, f) \mapsto R$, but which are equivalent by definition.

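The meet operation between two correlation maps is denoted by $\wedge$.

Definition 7.3.12. Meet Operation $\wedge$ for Correlation Maps

\[
\kappa_1 \wedge \kappa_2 = \kappa_3 \iff \forall (\pi, \rho).\ \ \kappa_3(\pi, \rho) =
\begin{cases}
R \wedge_R R' & \text{when } (\pi, \rho) \mapsto R \in \kappa_1 \text{ and } (\pi, \rho) \mapsto R' \in \kappa_2\\
R & \text{when } (\pi, \rho) \mapsto R \in \kappa_1\\
R' & \text{when } (\pi, \rho) \mapsto R' \in \kappa_2
\end{cases}
\]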

7.4 Intraprocedural Correlation Analysis

7.4.1 Intraprocedural Correlation Summaries and Analysis

As was the case for the dependency analysis presented in Chapter 5, we work with a control flow graph (CFG) representation of the predicates' bodies. We recall that nodes represent program states and edges are defined by statements with a particular exit label $\lambda$. In our case, all the outgoing edges of a node $n$ bear the different cases of the same statement $s$ found at the program point $n$. For each statement $s$, there is an edge labeled $(s, \lambda_k)$ for each of its possible exit labels $\lambda_k$ (as discussed in Section 4.2). However, similarly to the dependency analysis, our correlation analysis does not depend on this specificity.

Intraprocedurally, correlation information has to be kept at each point of the control flow graph, for each input and output pair of the node.

Definition 7.4.1. Intraprocedural Correlation Summaries
An intraprocedural correlation summary is a mapping from pairs of variables $v \in V$ to correlation maps:

\[
K \in \mathbb{K}, \qquad K : V \times V \rightarrow \mathcal{K}
\]

There is one special case, called NoCorrelation, which associates $\mathit{Any}$, the least precise partial relation, to any pair of variables on any pair of valid compatible paths. It is the top element at the intraprocedural level. Unreachable is used for nodes that cannot be reached, as its name implies, and constitutes the bottom element at the intraprocedural level.

For each node of a given control flow graph, $K(e, o)$ retrieves the correlation map between the local variable $e$ and the output variable $o$. If a mapping for $e$ and $o$ does not currently exist, $K(e, o)$ retrieves the correlation $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$ when $e = o$, or the empty correlation map $\emptyset$ otherwise.
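The lookup convention just described can be sketched as follows. The dictionary encoding of summaries and the name `lookup` are assumptions made for this illustration; the empty tuple `()` stands for the empty path.

```python
EQUAL = "Equal"


def lookup(summary, e, o):
    """K(e, o): stored correlation map, or the implicit default described in the text."""
    if (e, o) in summary:
        return summary[(e, o)]
    if e == o:
        # a variable is implicitly equal to itself: (ε, ε) -> Equal
        return {((), ()): EQUAL}
    # nothing is known: the empty correlation map
    return {}


summary = {("ta", "o"): {((), ("threads",)): EQUAL}}
print(lookup(summary, "ta", "o"))  # stored entry
print(lookup(summary, "o", "o"))   # implicit identity
print(lookup(summary, "i", "o"))   # no information
```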

Establishing the partial order $\sqsubseteq_K$ and the join operation $\vee_K$ is straightforward: $\sqsubseteq$ (Definition 7.3.10) and $\vee$ (Definition 7.3.11) are extended pointwise to an intraprocedural summary, for each ordered input-output pair and its associated correlation map.

Definition 7.4.2. Partial Order for Intraprocedural Correlation Summaries

\[
\sqsubseteq_K\ \subseteq\ \mathbb{K} \times \mathbb{K}, \qquad K_1 \sqsubseteq_K K_2 \iff \forall e, o \in V.\ \ K_1(e, o) \sqsubseteq K_2(e, o)
\]

Definition 7.4.3. Join Operation for Intraprocedural Correlation Summaries

\[
\vee_K : \mathbb{K} \times \mathbb{K} \rightarrow \mathbb{K}, \qquad K_1 \vee_K K_2 = K_3 \iff \forall (e, o).\ \ K_3(e, o) = K_1(e, o) \vee K_2(e, o)
\]

Our correlation analysis is a backward data-flow analysis, computing an intraprocedural summary at each point of the control flow graph. This represents the correlations at the node's entry point. For each exit label, it traverses the control flow graph starting with its corresponding exit node. The intraprocedural summary for the currently analysed label is initialized with pairs between the local value of each associated output variable of the label and the final value of the same output variable, mapped to $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$. The analysis traverses the control flow graph and gradually refines the correlations using Kildall's worklist algorithm (Kildall 1973) until a fixed point is reached. Table 7.12 summarizes the representation and general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural summaries of the statement's successor nodes. The intraprocedural summary at the entry point of the node is obtained by joining the contributions of each outgoing edge.
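The overall shape of such a backward worklist computation can be sketched generically. The sketch below is not the correlation analysis itself: to stay short it uses a toy abstract domain (the set of variables whose value may reach the output, joined by set union) in place of correlation summaries, and the function and variable names (`backward_fixpoint`, `succ`, `defs`) are hypothetical.

```python
from collections import deque


def backward_fixpoint(succ, transfer, join, bottom, exits):
    """Backward worklist iteration.

    succ:   node -> list of (label, successor) edges (exit nodes have none)
    exits:  exit node -> initial summary
    The summary at a node's entry is the join of the contributions of its edges.
    """
    state = {n: bottom for n in succ}
    state.update(exits)
    preds = {n: [] for n in succ}
    for n, edges in succ.items():
        for _, m in edges:
            preds[m].append(n)
    work = deque(n for n in succ if n not in exits)
    while work:
        n = work.popleft()
        new = bottom
        for label, m in succ[n]:
            new = join(new, transfer(n, label, state[m]))
        if new != state[n]:
            state[n] = new
            work.extend(preds[n])  # predecessors must be recomputed
    return state


# Toy program: node 1: t = in ; node 2: o = t ; node 3: true exit ; node 4: other exit.
succ = {1: [("true", 2)], 2: [("true", 3), ("false", 4)], 3: [], 4: []}
defs = {1: ("t", {"in"}), 2: ("o", {"t"})}


def transfer(node, label, after):
    dst, srcs = defs[node]
    # if the defined variable flows to the output, its sources do as well
    return (after - {dst}) | srcs if dst in after else after


result = backward_fixpoint(
    succ, transfer, lambda a, b: a | b, frozenset(),
    exits={3: frozenset({"o"}), 4: frozenset()},  # node 4 plays the role of Unreachable
)
print(result[1], result[2])
```

The exit node of the analysed label is seeded with the output variable, the other exit node carries the bottom element, and the iteration stops once no entry summary changes.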

Definition 7.4.4. The contribution of an edge $(n, n_i)$ labeled with $s$ and $\lambda_i$ is given by $C_{s,\lambda_i}(K_{n_i}) \in C$, where $C_{s,\lambda_i}()$ is the transfer function of the edge labeled $(s, \lambda_i)$.

We note that there are four statements supported by αSmil, i.e., the equality test, the no-operation, the partial structure equality test and the possible variant test, that have no write effects and thus have no contribution of their own; they are not included in Table 7.12. Excepting the no-operation statement, the correlation information at their entry point is obtained by simply joining the intraprocedural summaries of their successor nodes on the true and false exit labels. For the no-operation statement, the correlation information at the entry point is identical to the intraprocedural summary of its only successor node, the one on the true exit label.

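Table 7.12 – Statements – Representations and Data-Flow Equations

Representation: a node $n$ holding the statement $s$ has one outgoing edge labeled $(s, \lambda_i)$ towards each of its successors $n_1, \ldots, n_i, \ldots, n_k$. $K_n$ is the summary at the entry point of $n$ and $K_{n_i}$ is the summary of the successor $n_i$.

Equation:

\[
K_n = \underset{n \xrightarrow{\ s,\lambda_i\ } n_i}{{\bigvee}_K}\ C_{s,\lambda_i}(K_{n_i})
\]

Components of $C_{s,\lambda}()$ for each statement:

Statement | Form | $c_{s,\lambda}$ | $kill_\lambda$ | $\lambda$
----------|------|-----------------|----------------|----------
Assignment | $o = e$ | $(e, o) \mapsto [(\varepsilon, \varepsilon) \mapsto \mathit{Equal}]$ | $o$ | true
New Struct | $r = \{e_1, \ldots, e_n\}$ | $\forall i,\ 1 \le i \le n$: $(e_i, r) \mapsto [(\varepsilon, f_i) \mapsto \mathit{Equal}]$ | $r$ | true
Destructure | $\{o_1, \ldots, o_n\} = r$ | $\forall i,\ 1 \le i \le n$: $(r, o_i) \mapsto [(f_i, \varepsilon) \mapsto \mathit{Equal}]$ | $o_i$ | true
Get Field | $o = r.f_i$ | $(r, o) \mapsto [(f_i, \varepsilon) \mapsto \mathit{Equal}]$ | $o$ | true
Set Field | $r' = \{r \text{ with } f_i = e\}$ | $(r, r') \mapsto [(\varepsilon, \varepsilon) \mapsto \{f_1 \mapsto \mathit{Equal}, \ldots, f_i \mapsto \mathit{Any}, \ldots, f_n \mapsto \mathit{Equal}\}]$ and $(e, r') \mapsto [(\varepsilon, f_i) \mapsto \mathit{Equal}]$ | $r'$ | true
Create Var | $v = C_p[e]$ | $(e, v) \mapsto [(\varepsilon, C_p e) \mapsto \mathit{Equal}]$ | $v$ | true
Var Switch | $\mathit{switch}(v)\ \mathit{as}\ [o_1 \mid \ldots \mid o_n]$ | $(v, o_i) \mapsto [(C_i e, \varepsilon) \mapsto \mathit{Equal}]$ | $o_i$ | $\lambda_{C_i}$
Array Get | $o = a[i]$ | $(a, o) \mapsto [(\langle i\rangle, \varepsilon) \mapsto \mathit{Equal}]$ | $o$ | true
Array Set | $a' = [a \text{ with } i = e]$ | $(a, a') \mapsto [(\varepsilon, \varepsilon) \mapsto \langle \mathit{Equal}\ \ i\ \ \mathit{Any}\rangle]$ and $(e, a') \mapsto [(\varepsilon, \langle i\rangle) \mapsto \mathit{Equal}]$ | $a'$ | true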
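Two rows of Table 7.12 can be turned into small generator functions to make the shape of $c_{s,\lambda}$ concrete. This is a sketch under assumptions: summaries are encoded as nested dictionaries, paths as tuples of field names, and the function names `get_field` and `set_field` are hypothetical. The field list passed to `set_field` stands for the type information of the updated structure.

```python
EQUAL, ANY = "Equal", "Any"


def get_field(r, o, field):
    """Get Field, o = r.field: the field of r is equal to the whole of o."""
    return {(r, o): {((field,), ()): EQUAL}}


def set_field(r, r_new, e, field, fields):
    """Set Field, r_new = r with field = e.

    All fields of r are preserved in r_new except the updated one,
    and e is equal to the updated field of r_new.
    """
    preserved = {f: (ANY if f == field else EQUAL) for f in fields}
    return {
        (r, r_new): {((), ()): preserved},
        (e, r_new): {((), (field,)): EQUAL},
    }


# Node 8 of the stop_thread example: o = in with threads = ta
process_fields = ["threads", "pid", "crt_thread", "adr_space"]
c8 = set_field("in", "o", "ta", "threads", process_fields)
print(c8)
```

For node 8 of the stop_thread predicate this yields the two mappings used in Section 7.4.2: every field of `in` except `threads` is equal in `o`, and `ta` is equal to the `threads` field of `o`.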

The transfer function $C_{s,\lambda}()$ formalizes the correlations created by the statement $s$ on the label $\lambda$ between its local input variables and its local output variables, denoted by $c_{s,\lambda}$, as well as the set $kill_\lambda$ of variables whose values have been redefined by the statement $s$ on the label $\lambda$. These are shown in Table 7.12. There is one crucial difference between transfer functions $C_{s,\lambda}()$ and intraprocedural summaries $K$. An intraprocedural summary $K$ implicitly maps any pair $(v, v)$, for $v \in V$, to $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$. On the contrary, in $c_{s,\lambda}$, when the variable $v$ is used as both input and output by the statement $s$, the pair $(v, v)$ is mapped to the correlation map known between the old value of the input $v$ and the fresh value of the output $v$. Otherwise, when $v$ is an output, i.e., $v \in kill_\lambda$, but not an input of $s$, $(v, v)$ is mapped to $\emptyset$. We remark that $K$ represents a state, while $c_{s,\lambda}$ represents a transition.

In order to obtain the contribution $C_{s,\lambda_i}(K_{n_i})$ of an edge labeled with $s$ and $\lambda_i$, we need to connect the information given by $c_{s,\lambda_i}$ to the information contained in the intraprocedural summary $K_{n_i}$. For example, at the entry of node 3 in Figure 7.1 (on page 138), when considering the scenario in which the predicate exits with true, the intraprocedural summary contains the mapping

\[
(th, o) \mapsto \big[ (\mathit{Somet},\ \mathit{threads}\langle i\rangle \mathit{Somet}) \mapsto \{ \mathit{identifier} \mapsto \mathit{Equal},\ \mathit{current\_state} \mapsto \mathit{Any},\ \mathit{stack} \mapsto \mathit{Equal} \} \big]
\]

On the true edge, statement 2 creates the mapping

\[
(ta, th) \mapsto [(\langle i\rangle, \varepsilon) \mapsto \mathit{Equal}]
\]

Intuitively, since we are traversing the graph backwards and we are mapping ordered (local) input-output pairs, $(ta, th)$ and $(th, o)$ can be seen as a def-use pair: the correlation associated to $(ta, th)$ expresses the relation between the defined value of $th$ and the input $ta$ used for creating it, while the correlation associated to $(th, o)$ shows a subsequent use of that value of $th$ for creating $o$. The contribution of statement 2 on the true edge should capture this flow of the value of $ta$ to the value of $o$ through the variable $th$. Thus, it should contain a mapping for the pair $(ta, o)$. In the general case, we need to detect any variable $r$ such that $[(p, r) \mapsto \kappa] \in c_{s,\lambda_i}$ and $[(r, q) \mapsto \kappa'] \in K_{n_i}$, and compute the mapping for $(p, q)$ in $C_{s,\lambda_i}(K_{n_i})$.

In order to compute the correlation map associated to $(ta, o)$, we take into account the fact that both the right path $\varepsilon$ of $c_{s,\lambda}(ta, th)$ and the left path $\mathit{Somet}$ of $K_{n_3}(th, o)$ refer to the $th$ variable. However, they do not represent traversals of the same depth: $\varepsilon$ refers to the entire value of $th$, while $\mathit{Somet}$ refers to the value below the constructor Some. Between $ta$ and $o$, we can conclude that the values nested under the Some constructor of the $i$-th elements are related:

\[
(ta, o) \mapsto \big[ (\langle i\rangle \mathit{Somet},\ \mathit{threads}\langle i\rangle \mathit{Somet}) \mapsto \{ \mathit{identifier} \mapsto \mathit{Equal},\ \mathit{current\_state} \mapsto \mathit{Any},\ \mathit{stack} \mapsto \mathit{Equal} \} \big]
\]

We call composition the process of obtaining the correlation map associated to $(ta, o)$ from the correlations associated to $(ta, th)$ and $(th, o)$.

In the general case, the composition operation refers to the process of computing the flow of a variable $p$ to a variable $q$ through an intermediate variable $r$. Thus, when knowing that $(p, r) \mapsto [(\pi, \rho) \mapsto R]$ and that $(r, q) \mapsto [(\pi', \rho') \mapsto R']$, we must first obtain the link (Definition 7.3.3) between the paths $\rho$ and $\pi'$, relating subvalues of $r$ to subvalues of $p$ and $q$, respectively. This is obtained with the matching operator $\mathit{match}$ (Definition 7.3.4). In the context of the example given above, $\rho$ and $\pi'$ are the paths referring to subvalues of the $th$ variable, i.e., $\varepsilon$ and $\mathit{Somet}$, respectively. If the two paths are incompatible, i.e., they refer to different, unrelated subvalues of $r$, there is no flow between $p$ and $q$ through $r$. If the paths are compatible, we can compute the correlation between $p$ and $q$ by distinguishing between the three different possible link cases obtained with $\mathit{match}$.

The case when the same subvalue of $r$, identified by $\rho$ (and the identical $\pi'$), is related to both $p$ and $q$ is depicted below.

[Figure: case $\mathit{match}(\rho, \pi') = \mathit{Identical}$. $R$ relates the subvalue $\pi$ of $p$ to the subvalue $\rho$ of $r$; $R'$ relates the subvalue $\pi'$ of $r$ to the subvalue $\rho'$ of $q$.]

In this case, computing the flow from $p$ to $q$ through $r$ is rather straightforward. Since the same subvalue of $r$ is related to the subvalue of $p$ identified by $\pi$ and to the subvalue of $q$ identified by $\rho'$, we can relate these two subvalues and map the pair $(\pi, \rho')$ to the relation obtained by composing $R$ and $R'$. We note that, given the special form of partial relations $R \in \mathcal{R}$, the compose operation at this level is equivalent to $\vee_R$ (Definition 7.2.3) [1]. The computation of the correlation for $p$ and $q$ is depicted below.

[Figure: case $\mathit{match}(\rho, \pi') = \mathit{Identical}$; the resulting correlation between $p$ and $q$ is $(\pi, \rho') \mapsto R \vee_R R'$.]

The subelements of $r$ related to $p$ and to $q$, respectively, can also have different granularities, one being nested deeper in $r$ than the other. For instance, the subvalue of $r$ identified by the path $\rho$ can be closer to the root than its subelement identified by $\pi'$, related to $q$. This case is depicted below.

[1] However, this would not be the case anymore for a more complex partial relation type, including not only equivalences but also more general relations.

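[Figure: case $\mathit{match}(\rho, \pi') = \mathit{Left}\ \sigma$. $R$ relates the subvalue $\pi$ of $p$ to the subvalue $\rho$ of $r$; $R'$ relates the deeper subvalue $\pi' = \rho\,\sigma$ of $r$ to the subvalue $\rho'$ of $q$.]

In this case, we can only detect the flow of $p$ to $q$ at the level of the subelement of $r$ that is related to both $p$ and $q$, i.e., the subelement nested deeper. Thus, in order to compute the correlation between $p$ and $q$, we need to project $\sigma$ on $R$ and to compose the obtained relation with $R'$. This is summarized by the following figure.

[Figure: case $\mathit{match}(\rho, \pi') = \mathit{Left}\ \sigma$; the resulting correlation between $p$ and $q$ is $(\pi\,\sigma, \rho') \mapsto \mathit{Projection}(\sigma, R) \vee_R R'$.]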

Finally, in the complementary case, the subvalue of $r$ identified by the path $\rho$ and correlated to $p$ can be nested deeper than the subvalue identified by $\pi'$, which is correlated to $q$. This case is depicted below.

[Figure: case $\mathit{match}(\rho, \pi') = \mathit{Right}\ \sigma$. $R$ relates the subvalue $\pi$ of $p$ to the deeper subvalue $\rho = \pi'\,\sigma$ of $r$; $R'$ relates the subvalue $\pi'$ of $r$ to the subvalue $\rho'$ of $q$.]

As in the previous case, we can only detect the flow of $p$ to $q$ at the level of the subelement of $r$ that is related to both $p$ and $q$, i.e., the subelement nested deeper. In this case, we need to project $\sigma$ on $R'$ and to compose the obtained relation with $R$. The flow between $p$ and $q$ is at the level of the subvalues identified by $\pi$ and $\rho'\,\sigma$, respectively. This is illustrated below.

[Figure: case $\mathit{match}(\rho, \pi') = \mathit{Right}\ \sigma$; the resulting correlation between $p$ and $q$ is $(\pi, \rho'\,\sigma) \mapsto R \vee_R \mathit{Projection}(\sigma, R')$.]

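Formally, if the $\rho$ and $\pi'$ paths are compatible, we compose the correlation elements $(\pi, \rho) \mapsto R$ and $(\pi', \rho') \mapsto R'$, thereby obtaining a new correlation element $(\pi^\bullet, \rho^\bullet) \mapsto R^\bullet$, which is computed as shown below.

Definition 7.4.5. Computing $(\pi^\bullet, \rho^\bullet) \mapsto R^\bullet$

\[
(\pi^\bullet, \rho^\bullet) = (\pi, \rho) \bullet (\pi', \rho') \overset{\mathrm{def}}{=}
\begin{cases}
(\pi,\ \rho') & \text{when } \mathit{match}(\rho, \pi') = \mathit{Identical}\\
(\pi\,\sigma,\ \rho') & \text{when } \mathit{match}(\rho, \pi') = \mathit{Left}\ \sigma\\
(\pi,\ \rho'\,\sigma) & \text{when } \mathit{match}(\rho, \pi') = \mathit{Right}\ \sigma
\end{cases}
\]

\[
R^\bullet = \mathit{compose}(R, R') \overset{\mathrm{def}}{=}
\begin{cases}
R \vee_R R' & \text{when } \mathit{match}(\rho, \pi') = \mathit{Identical}\\
\mathit{Projection}(\sigma, R) \vee_R R' & \text{when } \mathit{match}(\rho, \pi') = \mathit{Left}\ \sigma\\
R \vee_R \mathit{Projection}(\sigma, R') & \text{when } \mathit{match}(\rho, \pi') = \mathit{Right}\ \sigma
\end{cases}
\]

We note that the use of the projection operation (Definition 7.3.7) for both compatible, non-identical link cases for the access paths of $r$ related to $p$ and to $q$, respectively, is a consequence of not choosing a canonical form for correlations. The flexibility conferred by the absence of a canonical correlation form is visible at the composition level.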
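Definition 7.4.5 can be replayed on the $(ta, th)$ and $(th, o)$ example given earlier in this section. The sketch below is illustrative only and rests on assumptions: paths are tuples of step names (with `"<i>"` standing for the array step and `"Somet"` for the constructor step as written in the text), atomic relations are strings, structure relations are dictionaries, and `join_rel` covers only the cases of $\vee_R$ needed here (Equal as the most precise element, Any as the least precise one, pointwise join on structures with the same fields). All function names are hypothetical.

```python
EQUAL, ANY = "Equal", "Any"


def match(first, second):
    """Assumed reading of the matching operator on paths."""
    if first == second:
        return ("Identical", ())
    if second[:len(first)] == first:
        return ("Left", second[len(first):])
    if first[:len(second)] == second:
        return ("Right", first[len(second):])
    return ("Incompatible", ())


def projection(path, rel):
    for step in path:
        rel = rel if rel in (EQUAL, ANY) else rel[step]
    return rel


def join_rel(r1, r2):
    """Partial sketch of the join on relations (assumption, see lead-in)."""
    if r1 == EQUAL:
        return r2
    if r2 == EQUAL:
        return r1
    if ANY in (r1, r2):
        return ANY
    return {k: join_rel(r1[k], r2[k]) for k in r1}


def compose(elem1, elem2):
    """Compose (pi, rho) -> R with (pi2, rho2) -> R2 following Definition 7.4.5."""
    (pi, rho), r = elem1
    (pi2, rho2), r2 = elem2
    kind, sigma = match(rho, pi2)
    if kind == "Identical":
        return (pi, rho2), join_rel(r, r2)
    if kind == "Left":
        return (pi + sigma, rho2), join_rel(projection(sigma, r), r2)
    if kind == "Right":
        return (pi, rho2 + sigma), join_rel(r, projection(sigma, r2))
    return None  # incompatible paths: no flow through the intermediate variable


thread_rel = {"identifier": EQUAL, "current_state": ANY, "stack": EQUAL}
ta_th = ((("<i>",), ()), EQUAL)                                   # (ta, th)
th_o = ((("Somet",), ("threads", "<i>", "Somet")), thread_rel)    # (th, o)
print(compose(ta_th, th_o))
```

The link between $\varepsilon$ and the constructor step is `Left`, so the left path of the result is extended by that step and the relation of $(th, o)$ is carried over unchanged, as in the $(ta, o)$ mapping shown in the text.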

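The composition of correlation maps is defined below.

Definition 7.4.6. Composition of Correlation Maps
Computing $\mathit{compose}(\kappa_1, \kappa_2)$ amounts to intersecting the composition of all correlation elements from $\kappa_1$ and $\kappa_2$:

\[
\mathit{compose}(\kappa_1, \kappa_2)(\pi^\bullet, \rho^\bullet) =
\underset{\substack{(\pi,\rho) \mapsto R\ \in\ \kappa_1\\ (\pi',\rho') \mapsto R'\ \in\ \kappa_2\\ (\pi^\bullet,\rho^\bullet) = (\pi,\rho) \bullet (\pi',\rho')}}{{\bigwedge}_R}\ \mathit{compose}(R, R')
\]

Finally, the contribution $C_{s,\lambda_i}(K_{n_i})$ is obtained as defined below.

Definition 7.4.7. Contribution $C_{s,\lambda_i}(K_{n_i})$

\[
\mathit{compose} : C \times \mathbb{K} \rightarrow \mathbb{K}, \qquad \mathit{compose}(c_{s,\lambda}, K) = K' \ \text{ where } \ K'(p, q) = \bigwedge_{r}\ \mathit{compose}\big(c_{s,\lambda}(p, r),\ K(r, q)\big)
\]

It is depicted in Figure 7.5.

[Figure: for a statement $s$ with exit labels $\lambda_1, \ldots, \lambda_n$, each edge contributes the composition of $c_{s,\lambda_k}$ with the successor summary $K_{\lambda_k}$, and the contributions are joined with $\vee_K$ at the entry point.]

Figure 7.5 – Entry Point – Correlation Information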

We conclude this section by specifying what it means for intraprocedural correlation summaries to be well-formed, showing the corresponding inference rule in Table 7.19. Only ordered input-output pairs can appear as keys in intraprocedural mappings. Therefore, the well-formedness judgement is parameterized by the set of input variables $I$ and by the set of output variables $O$. The former indicates the variables that have the right to appear as left members of the variable pairs, while the latter indicates the variables that have the right to appear as right members of the variable pairs. The correlation map associated to each such input-output pair must be well-typed with respect to the types of the variables, as given by the typing environment $\Gamma$ (Definition 4.3.1). The typing judgement for correlation maps was shown in Table 7.9.

\[
\frac{\forall [(e, o) \mapsto \kappa] \in K.\quad \Gamma(e) = \tau_e \quad \Gamma(o) = \tau_o \quad e \in I \quad o \in O \quad \Gamma, I \vdash \kappa : (\tau_e, \tau_o)}
{\Gamma, I, O \vdash K}\ \ \text{WFIntraCor}
\]

Table 7.19 – Well-Formed Intraprocedural Correlation Summaries

7.4.2 Intraprocedural Correlation Analysis Illustrated

To better illustrate our correlation analysis at an intraprocedural level, and to summarize everything that has been presented so far in this chapter, we exemplify the mechanism behind it step by step on the predicate stop_thread, discussed in Section 7.1.1 on page 138. We consider the true execution scenario, apply our analysis and compare the actual obtained correlation results with the targeted ones, depicted in Figure 7.2.

Since a predicate can only exit with one label at a time and we are analysing the true label, we can map the exit node inval to the special case Unreachable. We begin by initialising the correlation summary for the exit node corresponding to the true exit label. As shown in Figure 7.6, this consists in mapping the pair referring to the local value of the $o$ variable and the final state of $o$ to a correlation map containing a single correlation, namely $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$. This acknowledges that the value of the output $o$ returned to the predicate's callers is the most recent value computed locally. In the following, we denote the final value of $o$ by $\bar{o}$ in order to distinguish it from the local value.

[Figure: control flow graph of stop_thread, with edges labeled true, false and None, and the following nodes.]

1: ta = in.threads
2: th = ta[i]
3: switch(th) as [ | ti]
4: s = Blocked
5: ti = ti with current_state = s
6: th = Some(ti)
7: ta = [ta with i=th]
8: o = in with threads=ta
9: true (exit node), initialised with $(o, \bar{o}) \mapsto [(\varepsilon, \varepsilon) \mapsto \mathit{Equal}]$
10: inval (exit node), initialised with Unreachable

Figure 7.6 – Analysing Predicate stop_thread – Initialisation

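We advance backwards along the control flow graph, reaching node 8. We apply the equation corresponding to a field update, as given in Table 7.12, and obtain the following correlation summary:

\[
\begin{aligned}
(\mathit{in}, o) &\mapsto \big[ (\varepsilon, \varepsilon) \mapsto \{ \mathit{threads} \mapsto \mathit{Any},\ \mathit{pid} \mapsto \mathit{Equal},\ \mathit{crt\_thread} \mapsto \mathit{Equal},\ \mathit{adr\_space} \mapsto \mathit{Equal} \} \big]\\
(ta, o) &\mapsto [ (\varepsilon, \mathit{threads}) \mapsto \mathit{Equal} ]
\end{aligned}
\]

We compose it with the correlation summary of its successor node, i.e., the exit node corresponding to the true exit label, thus detecting the flow of $\mathit{in}$ to $\bar{o}$ and of $ta$ to $\bar{o}$, respectively, through the local value $o$. This amounts to:

\[
\begin{aligned}
(\mathit{in}, \bar{o}) &\mapsto \big[ (\varepsilon, \varepsilon) \mapsto \{ \mathit{threads} \mapsto \mathit{Any},\ \mathit{pid} \mapsto \mathit{Equal},\ \mathit{crt\_thread} \mapsto \mathit{Equal},\ \mathit{adr\_space} \mapsto \mathit{Equal} \} \big]\\
(ta, \bar{o}) &\mapsto [ (\varepsilon, \mathit{threads}) \mapsto \mathit{Equal} ]
\end{aligned}
\]

Since node 8 does not have any other successor nodes, the correlation information at its entry point is identical to the one we have just computed.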

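We advance one step, reaching node 7, and apply the corresponding equation, thereby obtaining:

\[
\begin{aligned}
(ta, ta) &\mapsto [ (\varepsilon, \varepsilon) \mapsto \langle \mathit{Equal}\ \ i\ \ \mathit{Any} \rangle ]\\
(th, ta) &\mapsto [ (\varepsilon, \langle i\rangle) \mapsto \mathit{Equal} ]
\end{aligned}
\]

We compose it with the correlation summary of node 8, tracking the flow of the local value of $ta$ to $\bar{o}$ through the new state of the variable $ta$, after updating its $i$-th element. We also track the flow of $th$ to $\bar{o}$. The correlation map for the $(\mathit{in}, \bar{o})$ pair remains unchanged. We thus obtain:

\[
\begin{aligned}
(\mathit{in}, \bar{o}) &\mapsto \big[ (\varepsilon, \varepsilon) \mapsto \{ \mathit{threads} \mapsto \mathit{Any},\ \mathit{pid} \mapsto \mathit{Equal},\ \mathit{crt\_thread} \mapsto \mathit{Equal},\ \mathit{adr\_space} \mapsto \mathit{Equal} \} \big]\\
(ta, \bar{o}) &\mapsto [ (\varepsilon, \mathit{threads}) \mapsto \langle \mathit{Equal}\ \ i\ \ \mathit{Any} \rangle ]\\
(th, \bar{o}) &\mapsto [ (\varepsilon, \mathit{threads}\langle i\rangle) \mapsto \mathit{Equal} ]
\end{aligned}
\]

In order to obtain the correlation information at the entry point of node 7, we need to join the computed correlation summary with the correlation summary known for the other successor of node 7, namely the exit node 10. Since the latter is Unreachable, the identity element for join at the intraprocedural level, it does not affect the correlation summary at the entry point of node 7. We proceed similarly for nodes 6, 5, 4, 3 and 2, applying the corresponding data-flow equation for each statement and composing with the intraprocedural correlation summary of the successor node. Since each of these nodes has only one possible exit label, there are no multiple contributions that need to be joined. At the entry point of node 6, for example, we obtain the following summary:

\[
\begin{aligned}
(ta, \bar{o}) &\mapsto [ (\varepsilon, \mathit{threads}) \mapsto \langle \mathit{Equal}\ \ i\ \ \mathit{Any} \rangle ]\\
(ti, \bar{o}) &\mapsto [ (\varepsilon, \mathit{threads}\langle i\rangle \mathit{Somet}) \mapsto \mathit{Equal} ]\\
(\mathit{in}, \bar{o}) &\mapsto \big[ (\varepsilon, \varepsilon) \mapsto \{ \mathit{threads} \mapsto \mathit{Any},\ \mathit{pid} \mapsto \mathit{Equal},\ \mathit{crt\_thread} \mapsto \mathit{Equal},\ \mathit{adr\_space} \mapsto \mathit{Equal} \} \big]
\end{aligned}
\]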

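We skip some steps and obtain the following correlation summary at the entry point of node 2:

\[
\begin{aligned}
(\mathit{in}, \bar{o}) &\mapsto \big[ (\varepsilon, \varepsilon) \mapsto \{ \mathit{threads} \mapsto \mathit{Any},\ \mathit{pid} \mapsto \mathit{Equal},\ \mathit{crt\_thread} \mapsto \mathit{Equal},\ \mathit{adr\_space} \mapsto \mathit{Equal} \} \big]\\
(ta, \bar{o}) &\mapsto \big[ (\varepsilon, \mathit{threads}) \mapsto \langle \mathit{Equal}\ \ i\ \ \mathit{Any} \rangle,\\
&\qquad (\langle i\rangle \mathit{Somet},\ \mathit{threads}\langle i\rangle \mathit{Somet}) \mapsto \{ \mathit{id} \mapsto \mathit{Equal},\ \mathit{current\_state} \mapsto \mathit{Any},\ \mathit{stack} \mapsto \mathit{Equal} \} \big]
\end{aligned}
\]

Finally, we reach node 1, where we apply the data-flow equation corresponding to a field access and compose the obtained information with the correlation summary computed at the entry of node 2. We obtain:

\[
\begin{aligned}
(\mathit{in}, \bar{o}) \mapsto \big[\ & (\varepsilon, \varepsilon) \mapsto \{ \mathit{threads} \mapsto \mathit{Any},\ \mathit{pid} \mapsto \mathit{Equal},\ \mathit{crt\_thread} \mapsto \mathit{Equal},\ \mathit{adr\_space} \mapsto \mathit{Equal} \},\\
& (\mathit{threads}, \mathit{threads}) \mapsto \langle \mathit{Equal}\ \ i\ \ \mathit{Any} \rangle,\\
& (\mathit{threads}\langle i\rangle \mathit{Somet},\ \mathit{threads}\langle i\rangle \mathit{Somet}) \mapsto \{ \mathit{id} \mapsto \mathit{Equal},\ \mathit{current\_state} \mapsto \mathit{Any},\ \mathit{stack} \mapsto \mathit{Equal} \}\ \big]
\end{aligned}
\]

Since node 1 has only one successor node, this correlation summary represents the correlation information at the entry point of node 1, i.e., there is no other correlation summary to join it with. This contains a single pair of variables, $(\mathit{in}, \bar{o})$, and their associated correlation map. Since the pair is an input-output pair of the stop_thread predicate, we do not need to filter anything out. This constitutes the final correlation summary for the analysed predicate on the true exit label. These results are identical to the ones we had depicted as our targeted results in Figure 7.2.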

For the inval exit label, the corresponding correlation summary is NoCorrelation. This example can be tried on the web page [2] dedicated to our correlation analysis. Other examples are provided and explained there as well. Additionally, users can devise and test their own examples.

[2] Correlation Analysis Web Page: http://www.ajl-demo.fr/2016

7.5 Interprocedural Correlation Analysis

Our analysis is performed label by label, and interprocedural correlation domains associate an intraprocedural summary to each exit label of the analysed predicate. Therefore, interprocedural domains encapsulate an intraprocedural summary for each possible execution scenario of a predicate.

An interprocedural domain $K_p$ of a predicate $p$ is thus defined as shown below.

Definition 7.5.1. Interprocedural Correlation Domain

\[
K_p : \Lambda_p \rightarrow \mathbb{K}, \quad \text{where } \Lambda_p \text{ is the set of output labels of predicate } p
\]

The intraprocedural summary associated to each label is filtered so as to contain only ordered pairs of variables where the left member is an input of the analysed predicate and the right member is an output associated to the analysed label. The correlation maps associated to such pairs are built so as to contain correlations where only input variables may appear in array cell paths. Similarly, the exception index in partial equivalence relations of arrays must be an input variable. Registering exceptions in array correlations only for input variables is not a consequence of a language restriction on array operations, but simply a consequence of the fact that, at the interprocedural level, only correlation information between inputs and outputs makes sense.

The interprocedural domain of a predicate is used for deducing the transfer functions for a predicate call statement.

In the following, we detail the equation corresponding to a call to a predicate

\[
\underbrace{p(e_1, \ldots, e_n)[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]}_{s}
\]

having the following signature:

\[
p(\varepsilon_1, \ldots, \varepsilon_n)[\lambda_1 : \omega_1 \mid \ldots \mid \lambda_m : \omega_m]
\]

The general equation form given in Table 7.12 applies:

\[
K_n = \underset{n \xrightarrow{\ s,\lambda_i\ } n_i}{{\bigvee}_K}\ C_{s,\lambda_i}(K_{n_i})
\]

The transfer functions for the predicate call statement are deduced from the predicate's interprocedural domain in the following fashion:

\[
\begin{aligned}
C_{s,\lambda_i}(K_{n_i}) &= \mathit{compose}(c_{s,\lambda_i}, K_{n_i}), \qquad kill_{\lambda_i} = o_i\\
c_{s,\lambda_i}(e_j, o_i^k) &= \kappa_{jki} \qquad \forall j \in 1..n,\ \forall k \in 1..h
\end{aligned}
\]

where

\[
\begin{aligned}
\kappa_{jki} &= K_p(\lambda_i)(\varepsilon_j, \omega_i^k)\ J(\varepsilon \mapsto e)\\
s &= p(e_1, \ldots, e_n)[\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]\\
o_i &= o_i^1, \ldots, o_i^h
\end{aligned}
\]

Namely, the contribution of a predicate call to each $(e_j, o_i^k)$ input-output pair stems from the contribution of the interprocedural domain for label $\lambda_i$ and the formal input-output pair $(\varepsilon_j, \omega_i^k)$. In these, all the formal input parameters $\varepsilon$ in array partial equivalences and in array cell paths are substituted by the corresponding effective input parameters from $e$, or approximated away. The substitution operation is denoted by $J(\chi)$, where $\chi$ is a substitution from formal to effective parameters.
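The substitution of formal by effective parameters in array cell paths can be sketched as follows. The encoding is hypothetical: an array step is a pair `("index", variable)`, other steps are plain strings, and `chi` is a dictionary from formal to effective parameter names. The choice made here for "approximated away" is an assumption: a correlation whose index has no effective counterpart is dropped, which is sound only because a missing entry carries no information. Exception indices inside array relations are not handled.

```python
def rename(path, chi):
    """Replace formal index variables in a path; KeyError if one has no counterpart."""
    return tuple(
        ("index", chi[step[1]]) if isinstance(step, tuple) and step[0] == "index" else step
        for step in path
    )


def substitute(kappa, chi):
    """Apply the formal-to-effective substitution chi to every path of a correlation map."""
    result = {}
    for (pi, rho), rel in kappa.items():
        try:
            key = (rename(pi, chi), rename(rho, chi))
        except KeyError:
            continue  # approximated away: dropping an entry only loses precision
        result[key] = rel
    return result


# Callee summary expressed over its formal index k; the call site passes i for k.
callee = {
    ((("index", "k"),), ("threads", ("index", "k"))): "Equal",
    ((), ()): "Any",
}
print(substitute(callee, {"k": "i"}))
print(substitute(callee, {}))
```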

Our correlation analysis is context-insensitive, and αSmil programs are analysed by computing, once and for all, an interprocedural correlation summary for every predicate they contain. The correlation summaries are stored in a mapping binding predicate identifiers to their interprocedural correlation information.

7.6 Extension – Constructor Evolution

The correlation analysis, as presented so far in this chapter, tracks and detects partial equivalence relations between inputs and outputs of predicates. An interesting direction to investigate would be an extension of our analysis allowing us to detect not only equivalences, but more general relations that could capture the evolution of constructors for variants. In Figure 7.4-b) we illustrated the form of correlations computed for variants. With the extension, the correlation information obtained for variants would be richer, as illustrated in Figure 7.7.

Figure 7.7 – Construction Evolution

This extension would allow inferring the preservation of certain properties when transitioning from a "stronger" state to a "weaker" state. For instance, we consider again our process and thread data types introduced in Chapter 3, Section 3.1.5 (on page 49 and 48, respectively). Additionally, we consider a predicate kill_thread, shown below, which modifies the array of associated threads of the input p by setting the i-th element to None. If the i-th element is already inactive, no modifications are made. In this case, the predicate exits with label inactive and simply copies p to the output o.

    predicate kill_thread(process p, int i)
    -> [true: process o | inactive: process o | oob]
      array<option<thread>> threads
      option<thread> thi
      thread ti
      o = p                               [true -> 1]
      threads = o.threads                 [true -> 2]
      thi = threads[i]                    [true -> 3, false -> 9]
      switch(thi) as [ti | ]              [Some -> 4, None -> 8]
      thi = None                          [true -> 5]
      threads = [threads with i = thi]    [true -> 6, false -> 9]
      o = o with threads = threads        [true -> 7]
      [true]
      [inactive]
      [oob]

For variants, we are currently detecting equivalence relations between the arguments of variant values built with the same constructor. With the extension for capturing constructor evolution, we could take a step further and also detect, for a given execution scenario, the set of possible transitions between the different constructors. For instance, for the kill_thread predicate, on the true exit label, we could detect that the only possible transition of the i-th element of the threads array is from Some to None. Had the element been None, the predicate would have followed the inactive execution scenario.

We further consider a predicate disjoint_stacks(process p), verifying a fundamental property of any process, namely the fact that the stacks of all associated threads of the process are disjoint. If the property holds for the input process p prior to executing kill_thread, intuitively, it should continue to hold subsequently for the output process o as well. If the array's i-th element was already inactive, i.e., None, the property disjoint_stacks obviously still holds, since the input p is simply copied to the output o. If it was active, the transition from Some to None does not impact the property, as it does not create a new memory region that could threaten the property. In this case, the transition from Some to None is a transition from a "stronger" state to a "weaker" state.

We have conducted preliminary experiments targeting the detection of such information, and these have led to promising results. Tracking general relations that capture evolution requires certain modifications that are confined to the abstract partial relation type and to the data-flow equations concerning variants.

The abstract partial relation type presented in Section 7.2 (Definition 7.2.1) would need to be extended with Impossible, an additional atomic case along with Equal and Any. It is required for signalling impossible transitions between variant constructors and leads to some overlap with the possible-constructors analysis presented in Chapter 5. The partial relations for variants would be expressed as a square matrix of constructors, where each element $a_{C_i C_j}$ of the matrix has a corresponding associated partial relation $R_{C_i C_j}$. Impossible would be associated to any element $a_{C_i C_j}$ for which the transition from $C_i$ to $C_j$ is impossible. For the elements $a_{C_i C_i}$ on the main diagonal for which the transition from $C_i$ to $C_i$ is possible, we could compute partial equivalences between the arguments of the $C_i$ constructor. For the elements $a_{C_i C_j}$ lying outside the main matrix diagonal for which the transition from $C_i$ to $C_j$ is possible, the associated relation would be Any. Alternatively, for computing reflexive relations, we could consider that transitions on the main diagonal, i.e., from $C_i$ to $C_i$, are always possible.

Impossible would become the bottom element of our partial relation type $\mathcal{R}$, replacing Equal in this role. It would also become the identity element for the join operation $\vee_R$ (Definition 7.2.3) of partial relations and the absorbing element for the meet operation $\wedge_R$ (Definition 7.2.4). Similarly to the case of the abstract dependency type, the current bottom element Equal would become the middle element of a double diamond-shaped abstract type, and it would require the addition of some extra comparison cases for $\sqsubseteq_R$ (Definition 7.2.2), as well as some extra cases for the $\vee_R$ (Table 7.2) and $\wedge_R$ (Table 7.3) operations. The most important modification, however, would be in the case of the compose operation. Currently, the compose operation at the level of partial equivalence relations is $\vee_R$. With this extension, it would amount to a matrix multiplication.

77 Related WorkA rigorous presentation of the frame problem in specification and the different existingapproaches for addressing it has been given by Borgida et al (Borgida Mylopoulosand Reiter 1993 Borgida Mylopoulos and Reiter 1995) A more recent overview offraming is included in (Hatcliff et al 2012)

In recent years a vast body of research has been conducted on the specificationof frame properties in the context of modular programming This ranges from com-plex approaches imposing the swinging pivots requirement (Leino and Nelson 2002) toapproaches using data groups (Leino 1998 Leino Poetzsch-Heffter and Zhou 2002)adopting the Universe type system (Muumlller 2002 Muumlller Poetzsch-Heffter and Leav-ens 2003) or variations of it (Leino and Muumlller 2004 Leino and Muumlller 2006 Barnettand Naumann 2004 Barnett et al 2004) to approaches based on the dynamic frametheory (Kassios 2006 Kassios 2011 Smans Jacobs and Piessens 2012) regionallogic (Banerjee Naumann and Rosenberg 2008) or separation logic (Reynolds 2002OrsquoHearn Yang and Reynolds 2004 Parkinson and Bierman 2005)

In (Smans Jacobs and Piessens 2012) Smans et al present a technique for frameinference based on a variant of dynamic frames inspired by separation logic and relyingon accessibility information contained within pre- and postconditions By includingaccessibility information in a methodrsquos precondition an upper bound on the set oflocations modifiable by the method can be detected In our case the upper bound onthe set of elements that a predicate may modify when exiting with a particular exit labelis implicitly the set of output variables generated on that exit label joined with theset of local variables The implicit dynamic frame approach requires the specificationof accessibility information Our correlation analysis is entirely automatic and infersfine-grained frame properties for compound data structures

The literature on shape analysis (Calcagno et al. 2009; Sagiv, Reps, and Wilhelm 1999; Jones and Muchnick 1979; Montenegro, Peña, and Segura 2015) and side-effect analyses (Salcianu and Rinard 2005; Milanova, Rountev, and Ryder 2005) is vast. The former is aimed at deep-heap mutations, while we focus on deep-state modifications in the context of complex transition systems. The latter determine memory locations that may be modified by an operation. Reasoning about heap locations is beyond our scope. We treat mappings between variables and their values, analyse their evolution in a side-effect-free environment, and detect not only what is modified, but also how and to what extent.

In (Chang and Leino 2005), Chang and Leino present the congruence-closure abstract domain, designed for an object-oriented context and implemented in the Spec# program verifier. They infer and express relations between fields of variables, a goal similar to ours. The congruence-closure domain maintains equivalence graphs, mapping field accesses to symbolic locations. On its own, this domain allows the inference and expression of relations for accessed fields. In order to take updates into account as well, it needs to use the heap succession domain as a base. Unlike us, they can express preorders between fields, depending on the base domains used. However, our domain handles both accesses and updates to structures, arrays and variants in a uniform manner, independent of additional information. We have sketched an extension for handling not only equivalences, but also more general relations capturing constructor evolution. This is a direction we plan to investigate in the future.

Rakamarić and Hu report in (Rakamarić and Hu 2008) a method to infer frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve 2007). DSA provides a summary of points-to relations as a graph that is used to compute a set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e., in a purely automatic setting. In contrast, our analysis is designed for an interactive verification context. Our technique, focusing on a purely functional language, is not concerned with aliasing and does not depend on an external points-to framework.

In (Taghdiri, Seater, and Jackson 2006), Taghdiri et al. present a technique for extracting procedure summaries for object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. Their goal is broader than ours. However, unlike their summaries, our correlation results encompass only information that is visible from the outside (to the callers).

Bertrand Meyer presents the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer 1991). The first component, the frame specification inference, relies on the analysis of method postconditions. The idea stems from an informal review of JML code, which showed that in practice there is a considerable overlap between what is mentioned in an assignable clause, i.e., modifies clause, and what is included in the postcondition. It relies on the observation that, in general, when manually written specifications include clauses about what changes, they also include clauses about how it changes. By analysing the postcondition of a method p, a first set is obtained. This represents an overapproximation of the set of elements that are allowed to be modified by p according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder 2015), which is itself based on alias calculus (Kogtenkov, Meyer, and Velder 2015; Meyer 2010; Meyer 2011). Methods are analysed and, for each method p, a second set is detected; this represents an overapproximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second. Though our goal is closely related to the issue addressed by the double frame inference in general, and the frame calculus in particular, the approaches are not directly comparable, as they target languages with different characteristics, which in turn influence both the adopted analysis techniques and the derivative targeted issues. Both approaches are conservative and automatic, i.e., neither requires manual annotations. In contrast to the frame calculus, our correlation analysis is standalone and is not concerned with aliasing.

7.8 Conclusion

Identifying precise information concerning the effects of program operations is possible by means of static analysis, without sacrificing scalability. In this chapter, we have presented a data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus detecting not only what is modified, but also how it is modified and to what extent. The correlation analysis is a flow-sensitive, path-sensitive, interprocedural analysis that handles arrays, structures and variants. The analysis is context-insensitive, but this trait does not have a costly impact in terms of precision. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations in order to compute expressive, fine-grained equivalences between parts of the inputs and parts of the outputs in a flexible manner. Just as frame properties specified by means of old expressions tend to lead to a proliferation of conditions to be specified, our correlation summaries, showing equivalences between input and output subelements, can become verbose in the case of predicates handling large compound values and modifying only a limited input subset. However, these are detected automatically, and their verbose form could easily be transformed using a more compact notation of the following form:

    input ( - changed subelements) = output ( - corresponding subelements)

Detecting modifications is traditionally associated with shape analyses that focus on deep-heap mutations. Side-effect analyses detect memory locations that may be modified by an operation. We, however, are interested in deep-state modifications in the context of a functional language. Other analyses inferring frame properties have been devised. These are mostly used in a purely automatic setting. We, however, developed a correlation analysis meant to be used in an interactive verification context.

As for the dependency analysis presented in Chapter 5, we have implemented a prototype of the correlation analysis in OCaml and we have applied it to a functional specification of ProvenCore (Lescuyer 2015). Medium-sized experiments performed on the abstract layers of ProvenCore show encouraging results. For instance, the correlation results of approximately 630 αSmil predicates, totalling approximately 10000 lines of code, are obtained in less than 0.5 seconds, i.e., faster than the dependency summaries are obtained on the same predicates. This is partly a consequence of the fact that, unlike the dependency analysis, which computes summaries for both code and specifications, the correlation analysis computes non-trivial results only for code. Specifications are predicates with Boolean exit labels, which generate no outputs. Since our correlation analysis computes fine-grained relations between parts of the inputs and parts of the outputs, it cannot detect anything non-trivial in their case. However, this would change if we were to extend our correlation analysis and track relations between parts of the inputs as well. This is a direction that we plan to investigate in the future. We will focus on the implementation and the discussion of the obtained results in Chapter 8. The prototype can be tested on the web page³ dedicated to our correlation analysis, where multiple examples are provided and explained. Additionally, users can devise and test their own examples.

The correlation analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen, and Lescuyer 2016).

³ Correlation Analysis Web Page: http://www.ajl-demo.fr/2016


Chapter 8

Implementation, Application and Results

Any fact becomes important when it's connected to another.

Umberto Eco

In this chapter, we focus mainly on the practical aspects of our static analyses and on the approach to using their results for inferring the preservation of certain logical properties. In Section 8.1 and Section 8.2, we give a brief overview of the implementations of our dependency and correlation analyses, respectively. In Section 8.3, we succinctly present ProvenCore, one of the two microkernels developed at Prove & Run, and discuss, in terms of execution times and precision, the experiments we made on its functional specification. In Section 8.4, we describe the manner in which the summaries computed by our dependency and correlation analyses are meant to be combined and used for reasoning about the preservation of certain logical invariants. We illustrate this approach and discuss it on some examples inspired by ProvenCore.

8.1 Implementation of the Dependency Analysis

Prototypes for both of our static analyses, the dependency analysis presented in Chapter 5 and its extension with symbolic dependencies presented in Chapter 6, as well as the correlation analysis presented in Chapter 7, have been implemented in OCaml (Rémy and Vouillon 1997). While staying close to the analyses as presented theoretically, their implementation diverges mildly from them at certain points, due to performance and scalability considerations. One of the main differences is related to the manner in which we store dependencies and partial equivalence relations. Based on the observation that, in general, when considering complex transition systems, the states are characterized by properties depending only on a limited subset of their subelements, while most transitions modify only a limited subset of the input state's subelements, we adopt a more compact representation. This, in turn, is reflected in some of the operators as well.


8.1.1 Dependency Type and Operators

The abstract dependency type δ, which mirrors the structure of associative arrays and algebraic data types, was introduced in Chapter 5.2 on page 83. It is implemented by the recursive type dep shown below.

    (* Implementation for the dependency type
       introduced in Chapter 5.2 *)

    type dep =
      | Everything (* top *)
      | Impossible (* bottom *)
      | Nothing
      | Deferred of accesses (* symbolic *)
      | Struct of struct_typ * dep FMap.t
      | Variant of var_typ * dep CMap.t
      | Array of dep * (var * dep) option

The maps used for expressing dependencies for structures and variants use fields and constructors as keys, respectively.

    type field
    module FMap : EMap.S with type key = field

    type cons
    module CMap : EMap.S with type key = cons

In contrast to the extended abstract dependency type δ (Definition 6.4.1), the actual dependency for structures stores, in addition to the map associating dependencies to fields, the type struct_typ of the structure as well. Similarly, the actual dependency for variants stores the variant's type var_typ, in addition to the map associating dependencies to constructors.

As previously mentioned, we are targeting complex transition systems such as operating systems and microkernels. In practice, transitions frequently map a large input state to a large output state, but for computing the output state they are concerned only with a limited subset of the input state. The number of subelements of a complex input on which the outcome of a predicate depends tends to be low compared to the total number of input subelements. We therefore filter, from dependencies for structures, the fields mapped to the dependency denoted by Nothing in our implemented dependency type. Similarly, from dependencies for variants, we filter the constructors mapped to ⊥, denoted by Impossible in our implemented dependency type.

As a consequence of this optimization, we need to know, and hence store, the types of structures and variants in order to correctly compare, join and reduce dependencies corresponding to such types. In addition, this is also useful for checking that the constructed dependencies are well-typed.


For building dependencies of the corresponding type, we have implemented smart constructors. The dependency type is private, and new dependencies can be constructed only by using the provided smart constructors.

As explained in Section 5.2, ⊤ and ⊥ can apply to any type. For instance, ⊤ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to ⊤, are transformed to ⊤. The ⊥ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to ⊥. A whole structure or array is impossible if any of its subelements is impossible. These canonizations¹ are made by our smart constructors. For instance, the smart constructor for structure dependencies returns Everything if it receives as an input a map of fields in which each key is mapped to Everything. Since fields that are absent from a field map must be interpreted as being mapped to Nothing, before returning Everything the constructor also verifies that the map of fields it received contains all the fields of the structure type struct_typ, which is given as an input as well. If the given map of fields contains an Impossible value, the smart constructor returns Impossible. Any mapping field ↦ Nothing is filtered from the given input map.
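The canonization rules for structure dependencies can be illustrated with a minimal sketch. It is not the prototype's code: fields are modelled as strings, a structure type as the list of its field names, and `of_list` and `mk_struct` are hypothetical names introduced only for this illustration.

```ocaml
(* Simplified, hypothetical model: fields are strings and a structure
   type is the list of its field names. *)
module FMap = Map.Make (String)

type struct_typ = string list

type dep =
  | Everything (* top *)
  | Impossible (* bottom *)
  | Nothing
  | Struct of struct_typ * dep FMap.t

(* Helper for building field maps from association lists. *)
let of_list l =
  List.fold_left (fun acc (f, d) -> FMap.add f d acc) FMap.empty l

(* Sketch of a smart constructor for structure dependencies. *)
let mk_struct (ty : struct_typ) (fields : dep FMap.t) : dep =
  (* A structure is impossible as soon as one subelement is impossible. *)
  if FMap.exists (fun _ d -> d = Impossible) fields then Impossible
  else
    (* Absent fields are read as Nothing, so such mappings are dropped. *)
    let kept = FMap.filter (fun _ d -> d <> Nothing) fields in
    (* Everything is returned only if every field of the type is present
       and mapped to Everything. *)
    if ty <> []
       && List.for_all (fun f -> FMap.find_opt f kept = Some Everything) ty
    then Everything
    else Struct (ty, kept)
```

Because the check for Everything iterates over the fields listed in the type rather than over the keys of the map, a map that omits a field can never collapse to Everything, which is the reason the type has to be stored alongside the map.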

Similarly, for variant dependencies, the corresponding smart constructor receives as inputs the variant's type and a map from constructor keys to dependency values. If all constructors of the variant, as indicated by its type var_typ, are present in the input map and mapped to Everything, the smart constructor returns Everything. If all constructors are present and mapped to Impossible, the smart constructor returns Impossible. Otherwise, if the input map contains some constructors mapped to Impossible, the corresponding mappings are filtered from the map used to build the variant dependency.

For arrays, the smart constructor returns Everything if both the default dependency and the known exceptional dependency are Everything, or if the former is Everything and there is no known exceptional dependency. If either of the two dependencies is Impossible, the smart constructor returns Impossible.

The smart constructor for deferred dependencies receives a set of variables as an input. If the given set is empty, the constructor returns Nothing. Otherwise, it creates the access map having as keys the variables in the given input set, i.e., the root variables for symbolic paths. As described in Section 6.5, a set containing a single path, the empty path, is initially associated with each of them.
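A minimal sketch of this constructor is given below. The representation of `accesses` as a map from root variables to sets of paths, the encoding of a path as a list of strings, and the name `mk_deferred` are assumptions made for illustration; only the two constructors relevant here are kept.

```ocaml
module VSet = Set.Make (String)
module VMap = Map.Make (String)

(* Hypothetical path encoding: a list of field names; [] is the empty path. *)
type path = string list

module PSet = Set.Make (struct
  type t = path
  let compare = compare
end)

(* Assumed shape of the access map: root variable -> set of symbolic paths. *)
type accesses = PSet.t VMap.t

type dep =
  | Nothing
  | Deferred of accesses (* other constructors omitted in this sketch *)

(* Sketch of the smart constructor for deferred dependencies. *)
let mk_deferred (roots : VSet.t) : dep =
  if VSet.is_empty roots then Nothing
  else
    (* Each root variable starts with the singleton set {empty path}. *)
    Deferred
      (VSet.fold
         (fun v acc -> VMap.add v (PSet.singleton []) acc)
         roots VMap.empty)
```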

The ⊑ operator (Definition 5.2.2), as formally presented in Section 5.2 and detailed in Table 5.1 on page 86, returns false whenever comparing two incompatible dependencies. In practice, situations in which comparisons on incompatible types are made should never be reached. As a consequence, whenever we compare structure or variant dependencies, we check, as a safety measure, that the two dependencies correspond to structures or variants of the same type. Otherwise, the two dependencies are not comparable and we throw an exception that indicates that the types are incompatible. For structure dependencies, whenever a mapping for one field f can be found only in one of the two maps to be compared, we compare its mapped dependency value to Nothing, since absent fields must be interpreted as being mapped to Nothing. Similarly, for variant dependencies, whenever a mapping for a constructor C can be found only in one of the two maps to be compared, we interpret it as being mapped to Impossible.

¹ For making all the described canonizations, we have to make sure that whenever we replace δ by δ′, both δ ⊑ δ′ and δ′ ⊑ δ hold.

The join (Definition 5.2.3) and reduction (Definition 5.2.4) operators, as formally presented in Section 5.2 on pages 87 and 89 respectively, are total: they return ⊤, the element conveying no information, for incompatible dependencies. In practice, the two operators are partial: an exception is thrown whenever the two dependencies to be joined or reduced are incompatible. This applies as well to structure or variant dependencies that do not correspond to the same type. Otherwise, when joining or reducing two compatible structure or variant dependencies, we interpret missing fields or missing constructors as being mapped to Nothing or Impossible, respectively.
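The partial behaviour of the implemented join can be sketched on structure dependencies as follows. This is an illustration under stated assumptions, not the prototype's operator: structure types are modelled as strings, the exception name `Incompatible` is hypothetical, the join of atoms assumes the ordering Impossible ⊑ Nothing ⊑ Everything with Nothing below structured dependencies, and canonization of the result is omitted.

```ocaml
module FMap = Map.Make (String)

(* Hypothetical model: a structure type is identified by its name. *)
type struct_typ = string

type dep =
  | Everything (* top *)
  | Impossible (* bottom *)
  | Nothing
  | Struct of struct_typ * dep FMap.t

(* Hypothetical exception raised on incompatible dependencies. *)
exception Incompatible

(* Absent fields are interpreted as being mapped to Nothing. *)
let get = function Some d -> d | None -> Nothing

let rec join d1 d2 =
  match d1, d2 with
  | Everything, _ | _, Everything -> Everything
  | Impossible, d | d, Impossible -> d
  | Nothing, d | d, Nothing -> d (* assumption: Nothing is below Struct *)
  | Struct (t1, m1), Struct (t2, m2) ->
    (* Partial operator: different structure types cannot be joined. *)
    if t1 <> t2 then raise Incompatible
    else
      let merged =
        FMap.merge
          (fun _ a b ->
            match join (get a) (get b) with
            | Nothing -> None (* keep the compact representation *)
            | d -> Some d)
          m1 m2
      in
      Struct (t1, merged)
```

In the formal presentation the last case would instead return ⊤ for mismatched types; raising an exception makes such a situation, which should never occur on well-typed programs, visible immediately.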

In Section 6.6.1, we described that there are two types of free variables that can appear in dependencies. The first type consists of index variables that can appear in array dependencies. For instance, in <Nothing ^ i Everything>, the variable i is the index of the cell for which the exceptional dependency Everything is known. Additionally, such index variables can also appear in symbolic paths related to arrays, such as <Nothing ^ i Deferred(a[i])> or <Deferred(a[ - i]) ^ i Nothing>. Such indices must be input variables of the currently analysed predicate, as explained in Section 5.3.2 on page 97. The second type of free variables are the root variables that appear in deferred dependencies. For instance, in <Deferred(a[ - i]) ^ i Nothing>, the variable a is a root variable. In the general case, the root variables are those outputs to which symbolic access paths are associated in deferred dependencies. In order to make use of the computed context-sensitive information, actual dependencies can be substituted for the root variables. This is done by applying the symbolic access paths to the dependency to substitute. By traversing entire dependencies such as

    f -> <Nothing ^ j Everything>
    g -> b -> Deferred(o)
    h -> x -> Everything
         y -> <Deferred(a[ - j]) ^ j Nothing>

and substituting the nested deferred dependencies, such as Deferred(a[ - j]) and Deferred(o), we apply context-sensitive information. Simultaneously, during the same traversal, we also substitute the indices appearing in array dependencies, such as j in the dependency associated to the field f. These are either substituted by another index variable or forgotten. If the index to substitute is an input, the formal variable is replaced by the effective one. Otherwise, an approximation is made in order to remove the local index variable. This consists in joining the default and the exceptional dependencies and using the result for building a new array dependency without an exception.

An index substitution is a mapping from each index variable either to a new index variable that replaces it, or to Forget if all references to the index variable should be removed. The index type is shown below.


    type index = | NewIdx of var | Forget

The substitution function subst has the following type:

    type var
    module VMap : EMap.S with type key = var

    val subst : index VMap.t -> dep VMap.t -> dep -> dep

Its first argument is the index substitution; the second argument is the dependency substitution, mapping root variables to dependencies. The third argument is the dependency on which the substitutions are to be made. The function returns the dependency obtained after making both substitutions. The two substitution passes are fused for performance reasons.
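The index half of this substitution can be sketched on a reduced dependency type containing only atoms and arrays. The function `subst_index` below is a hypothetical simplification: it ignores the dependency substitution for root variables, uses strings for variables, relies on an assumed join on atoms, and does not canonize its result.

```ocaml
module VMap = Map.Make (String)

type dep =
  | Everything (* top *)
  | Impossible (* bottom *)
  | Nothing
  | Array of dep * (string * dep) option (* default, exceptional cell *)

type index = NewIdx of string | Forget

(* Assumed join; array/array joins with exceptions are coarsely
   over-approximated by Everything in this sketch. *)
let rec join d1 d2 =
  match d1, d2 with
  | Everything, _ | _, Everything -> Everything
  | Impossible, d | d, Impossible -> d
  | Nothing, d | d, Nothing -> d
  | Array (a, None), Array (b, None) -> Array (join a b, None)
  | Array _, Array _ -> Everything

(* Sketch of the index substitution pass. *)
let rec subst_index (s : index VMap.t) (d : dep) : dep =
  match d with
  | Everything | Impossible | Nothing -> d
  | Array (def, None) -> Array (subst_index s def, None)
  | Array (def, Some (i, exc)) ->
    let def' = subst_index s def and exc' = subst_index s exc in
    (match VMap.find_opt i s with
     | Some (NewIdx j) -> Array (def', Some (j, exc')) (* formal -> effective *)
     | Some Forget -> Array (join def' exc', None) (* drop local index *)
     | None -> Array (def', Some (i, exc')))
```

Forgetting an index loses the distinction between the exceptional cell and the rest of the array, but the result remains a sound over-approximation since it is the join of the two.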

A separate substitution is performed for dealing with polymorphic types. Our dependency type is not polymorphic per se. However, αSmil supports polymorphic types, and thus the variables described by the computed dependencies can have a polymorphic type. Since the types of structures and variants are stored in the corresponding dependencies, we must substitute polymorphic type parameters by their effective arguments. This is done by a recursive function which traverses the dependencies and makes the type substitution at each nested level, if necessary. Besides this substitution, no other modifications were made in the implementation in order to handle polymorphism. This justifies our formal presentation of the analyses without polymorphism.

8.1.2 Intraprocedural Dependency Analysis

The intraprocedural dependency type Δ (Definition 5.3.1), mapping variables to dependencies δ, which was introduced in Chapter 5.3.1, is implemented as shown below.

    type reachable = dep VMap.t

    (* Implementation of the intraprocedural dependency domain
       introduced in Chapter 5.3.1 *)
    type intra =
      | Unreachable
      | Reachable of reachable

The VMap type is a map having variables as keys.

    type var
    module VMap : EMap.S with type key = var


In order to avoid needlessly storing large maps predominantly containing variables mapped to Nothing, we do not store by default mappings for variables for which dependencies have not yet been computed. Therefore, the intraprocedural dependency of any variable v for which a mapping has not yet been stored in the map is interpreted as v ↦ Nothing. As discussed in the previous section for the partial order, join and reduction operators, when applying ⊑_Δ (Definition 5.3.3) and the join ∨_Δ (Definition 5.3.4) and reduction ⊕_Δ (Definition 5.3.5) operators at the intraprocedural level, any missing mapping from a Reachable domain has to be interpreted as a variable mapped to Nothing.

With this interpretation, forgetting a variable v from an intraprocedural domain (Definition 5.3.2, introduced in Chapter 5.3.1) becomes straightforward: it amounts to simply removing the mapping for v from the intraprocedural domain.

    (* Forget *)
    let forget d v =
      match d with
      | Unreachable -> d
      | Reachable dmap -> Reachable (VMap.remove v dmap)
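The same reading of missing mappings can be sketched for lookup and for the intraprocedural partial order. The code below is an illustration only: dependencies are reduced to three atoms, their ordering Impossible ⊑ Nothing ⊑ Everything is an assumption of this sketch, and `lookup`, `leq_dep` and `leq` are hypothetical names.

```ocaml
module VMap = Map.Make (String)

(* Atoms only, for brevity. *)
type dep = Impossible | Nothing | Everything

type intra =
  | Unreachable (* bottom at the intraprocedural level *)
  | Reachable of dep VMap.t

(* A variable without a stored mapping is read as mapped to Nothing. *)
let lookup v dmap =
  match VMap.find_opt v dmap with Some d -> d | None -> Nothing

(* Assumed order on atoms: Impossible <= Nothing <= Everything. *)
let leq_dep d1 d2 =
  match d1, d2 with
  | Impossible, _ | _, Everything -> true
  | Nothing, Nothing -> true
  | _ -> false

(* Pointwise comparison over the variables stored in either map. *)
let leq i1 i2 =
  match i1, i2 with
  | Unreachable, _ -> true
  | Reachable _, Unreachable -> false
  | Reachable m1, Reachable m2 ->
    VMap.for_all (fun v d -> leq_dep d (lookup v m2)) m1
    && VMap.for_all (fun v d -> leq_dep (lookup v m1) d) m2
```

Both maps have to be traversed: a variable stored only on the right-hand side is compared against Nothing on the left, which matters when it is mapped to Impossible.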

We remark that the complex operations are performed at the dependency type level and are mostly applied pointwise at the intraprocedural level. The interprocedural dependency domains are mappings from labels to intraprocedural dependency summaries.

8.2 Implementation of the Correlation Analysis

8.2.1 Partial Equivalence Relations and Operators

The partial equivalence type R (Definition 7.2.1), which mirrors the structure of associative arrays and algebraic data types and was introduced in Chapter 7.2.1 on page 141, is implemented as shown below.

    (* Implementation of the partial equivalence type
       introduced in Chapter 7.2 *)

    type pequiv =
      | Equal (* bottom *)
      | Any (* top *)
      | PStruct of struct_typ * pequiv FMap.t (* structures *)
      | PVariant of var_typ * pequiv CMap.t (* variants *)
      | PArray of pequiv * (var * pequiv) option (* arrays *)

The FMap and CMap types are the ones presented on page 174.

As for structure and variant dependencies, and due to the same practical considerations, in addition to the map associating partial equivalences to fields, the type struct_typ of the structure is stored as well. Similarly, the implemented partial equivalence for variants stores the variant's type var_typ, in addition to the map associating partial equivalences to constructors.

To avoid storing large maps in which the majority of the fields or constructors are mapped to Any, we filter mappings of the form field ↦ Any and cons ↦ Any.

The partial equivalence type is private, and the only manner in which partial equivalence relations can be built is by using the provided smart constructors. The two atomic cases, Equal and Any, can apply to any type. The smart constructor for partial equivalences corresponding to structures filters out any field mapped to Any. It also returns Equal if all fields of the structure are mapped to Equal in the given input map. If, on the contrary, the given input map is empty or all fields are mapped to Any, the smart constructor returns Any.

Similarly, for partial equivalences corresponding to variants, the corresponding smart constructor receives as inputs the variant's type and a map from constructor keys to partial equivalences. If all constructors of the variant, as indicated by its type, are present in the input map and mapped to Equal, the smart constructor returns Equal. If all constructors are present and mapped to Any, or if the given input map is empty, the smart constructor returns Any. Otherwise, if the input map contains some constructors mapped to Any, the corresponding mappings are filtered from the map used to build the variant partial equivalence.

For arrays, the smart constructor returns Equal if both the default relation and the known exceptional relation are Equal, or if the former is Equal and there is no known exceptional relation. If both the default relation and the known exceptional relation are Any, or if the former is Any and there is no known exceptional relation, the smart constructor returns Any.

In contrast to dependencies, there is only one type of free variables that can appear in partial equivalence relations, namely index variables. As was the case for array dependencies, these can appear in partial equivalence relations corresponding to arrays, and they must be input variables. We traverse the partial equivalences recursively, checking for each index variable appearing in an array relation whether it is an input or a local variable. References to local variables are eliminated by approximating the partial equivalences, effectively joining the default array relations with the exceptional array relations.
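The array smart constructor and the elimination of local index variables can be sketched together on a reduced partial equivalence type. The names `mk_parray`, `join` and `drop_local_indices`, the use of strings for variables, and the `is_input` predicate are assumptions of this illustration; joins of two array relations carrying exceptions are coarsely approximated by Any.

```ocaml
(* Reduced type: atoms and arrays only. *)
type pequiv =
  | Equal (* bottom *)
  | Any (* top *)
  | PArray of pequiv * (string * pequiv) option (* default, exceptional cell *)

(* Sketch of the smart constructor for array relations. *)
let mk_parray def exc =
  match def, exc with
  | Equal, (None | Some (_, Equal)) -> Equal
  | Any, (None | Some (_, Any)) -> Any
  | _ -> PArray (def, exc)

(* Assumed join: Equal is the identity and Any is absorbing. *)
let rec join r1 r2 =
  match r1, r2 with
  | Any, _ | _, Any -> Any
  | Equal, r | r, Equal -> r
  | PArray (d1, None), PArray (d2, None) -> mk_parray (join d1 d2) None
  | PArray _, PArray _ -> Any (* coarse over-approximation *)

(* Remove references to index variables that are not inputs. *)
let rec drop_local_indices (is_input : string -> bool) (r : pequiv) : pequiv =
  match r with
  | Equal | Any -> r
  | PArray (def, None) -> mk_parray (drop_local_indices is_input def) None
  | PArray (def, Some (i, exc)) ->
    let def' = drop_local_indices is_input def
    and exc' = drop_local_indices is_input exc in
    if is_input i then mk_parray def' (Some (i, exc'))
    else mk_parray (join def' exc') None
```

For example, an array that is Equal everywhere except at a local index whose cell is Any collapses to Any once the index is dropped, whereas the same relation over an input index is kept as is.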

8.2.2 Intraprocedural Correlations

In Chapter 7.4 on page 156, we have defined intraprocedural correlation summaries (Definition 7.4.1) as mappings from pairs of variables to correlation maps. In practice, the type intra is the following:

    module PVMap = EMap.Make (struct
      type t = element * element
      let compare = compare
    end)

    module PMap = EMap.Make (struct
      type t = Path.t * Path.t
      let compare = compare
    end)

    type correlation = pequiv PMap.t
    type intra = correlation PVMap.t

    type t =
      | Related of intra
      | NoCorrelation
      | Unreachable

The implemented intraprocedural correlation summary type intra is a mapping from pairs of elements to correlation maps. The element type is shown below.

    (* The type of the elements for which correlations
       are computed and kept intraprocedurally.
       Ghost elements are used only for variants: for a
       variant [v], a ghost element that nests the type
       of the variant [v] is created. These are filtered
       from final results. *)

    type element =
      | Local of var
      | Output of var
      | Ghost of texpr

In practice, we need to distinguish between output variables and local variables. This is important for distinguishing between the final value of an output, i.e., the one correlated with values of the inputs, and its local intermediate values. Furthermore, we need to introduce ghost elements for variants. When constructing a variant v with a constructor C(a, b), for instance, we can keep correlations between the pairs (a, v) and (b, v). However, we fail to capture the information regarding v's construction with C. In order to maintain it, we create a ghost element g_vtyp with v's type, we add the pair (g_vtyp, v) to the intraprocedural summary, and we associate (ε, ε) ↦ [C ↦ Any] to it. Such pairs are deleted from the intraprocedural predicate summaries; they are only used while analysing a predicate's body.

Unlike the operations discussed in Chapter 7, the implementations of the partial order (Definition 7.4.2) and join (Definition 7.4.3) operations are parameterized by the typing environment mapping variables to types. This has to be threaded through all operations, as it is necessary for the injection operation (Definition 7.3.8). We need to know the variable type onto which the relation is injected. For instance, in order to “fill” the unknown relations for fields or constructors with Any, we must first know what those fields or constructors are.

8.2.3 Dependency and Correlation Analysers

The input program is first parsed and each predicate is analysed in turn. Implicit predicates are treated conservatively. Since their implementation is hidden, a pessimistic assumption must be made. For the dependency analysis, it is considered that everything in their inputs has been read in order to obtain the outputs, for any possible exit label. Similarly, for the correlation analysis, it is considered that there is no correlation between the input and the output variables on any possible exit label.

For inductive predicates, the dependency analysis computes a summary for each case and joins the results for obtaining the dependency summary for the true exit label. The false label is treated conservatively and everything is considered to be read. Since inductive predicates are specification-only predicates that do not generate outputs, the correlation analysis associates a NoCorrelation summary to both labels.

    (* Analyse the body [g] of an explicit predicate *)
    let analyze g =
      let todo = Queue.create () in
      List.iter (fun v -> Queue.push v todo) (G.vertices g);
      let result = init_result g in
      let rec progress r =
        try
          let v = Queue.pop todo in
          let vd = MV.find v r in
          let edges = preds g v in
          let vd' = transfer r v edges in
          if D.leq vd' vd then progress r
          else begin
            List.iter (fun edge ->
              Queue.push (source edge) todo) edges;
            progress (MV.add v (D.join vd vd') r)
          end
        with Queue.Empty -> r
      in
      progress result

The body of each explicit predicate is analysed independently for each possible exit label, using a variation of the worklist algorithm, as shown above in the analyze function. Initially, a map is created, having as many elements as there are nodes in the predicate's body. All of these are initially mapped to Unreachable, the bottom element at the intraprocedural level. All the predicate's exit nodes are loaded into the working queue. Then, a recursive function progress is executed until a fixed point is reached and there are no more nodes left to analyse in the working queue. The first node of the queue is popped and analysed. The node's summary, as stored in the map, is retrieved in vd. The analysis returns a summary vd' for the node. The two summaries vd' and vd are compared, and if the former is more precise than the latter, then the recursive function progress is called. Otherwise, before calling progress, the predecessors of the analysed node are pushed into the working queue, and in the map of nodes the join of vd and vd' is associated to the analysed node. Since both analyses are backwards analyses, the dependency and correlation information of a node is based on the dependency or correlation information of its successors in the control flow graph, and the former must be recomputed if the latter are modified. Finally, from the computed intraprocedural dependency summary, all mappings corresponding to local variables are filtered. From the computed correlation summary of an exit label l, all mappings that do not correspond to an input and output variable pair are filtered.
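The backward worklist scheme described above can be exercised on a toy domain. The sketch below is self-contained and deliberately not the prototype's analyser: the control flow graph encoding, the `node` record, and the domain (the set of variables read from a node onwards) are assumptions chosen so that the fixed point is easy to follow.

```ocaml
module IMap = Map.Make (Int)
module SSet = Set.Make (String)

(* Toy domain: variables read from a node up to an exit. *)
type info = Unreachable | Reachable of SSet.t

let leq a b =
  match a, b with
  | Unreachable, _ -> true
  | Reachable _, Unreachable -> false
  | Reachable x, Reachable y -> SSet.subset x y

let join a b =
  match a, b with
  | Unreachable, i | i, Unreachable -> i
  | Reachable x, Reachable y -> Reachable (SSet.union x y)

(* Hypothetical CFG node: variables read, successors, exit flag. *)
type node = { reads : string list; succs : int list; is_exit : bool }

let preds g v =
  IMap.fold (fun u n acc -> if List.mem v n.succs then u :: acc else acc) g []

(* Backward transfer: join the successors' information, add local reads. *)
let transfer g r v =
  let n = IMap.find v g in
  let init = if n.is_exit then Reachable SSet.empty else Unreachable in
  match List.fold_left (fun acc s -> join acc (IMap.find s r)) init n.succs with
  | Unreachable -> Unreachable
  | Reachable s -> Reachable (SSet.union s (SSet.of_list n.reads))

let analyze g =
  let todo = Queue.create () in
  IMap.iter (fun v _ -> Queue.push v todo) g;
  let rec progress r =
    if Queue.is_empty todo then r
    else
      let v = Queue.pop todo in
      let vd = IMap.find v r in
      let vd' = transfer g r v in
      if leq vd' vd then progress r
      else begin
        (* The node changed: its predecessors must be recomputed. *)
        List.iter (fun p -> Queue.push p todo) (preds g v);
        progress (IMap.add v (join vd vd') r)
      end
  in
  progress (IMap.map (fun _ -> Unreachable) g)

(* Example: 0 -> 1, 1 -> 1 (loop), 1 -> 2, where 2 is the exit node. *)
let g =
  IMap.empty
  |> IMap.add 0 { reads = ["a"]; succs = [1]; is_exit = false }
  |> IMap.add 1 { reads = ["b"]; succs = [1; 2]; is_exit = false }
  |> IMap.add 2 { reads = ["c"]; succs = []; is_exit = true }

let result = analyze g

let reads_from v =
  match IMap.find v result with
  | Reachable s -> SSet.elements s
  | Unreachable -> []
```

Starting every node at Unreachable means that information only appears once it has been propagated back from an exit node, which mirrors the role of the bottom element in the description above.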

For the dependency analyser, a command-line flag can be used to disable the usage of deferred dependencies. The well-typedness check of dependency summaries can be enabled in the same way.

A parser for dependency information has been implemented as well. This allows us to annotate αSmil programs with the expected results and compare them to the computed ones. A similar parser for the correlation information is planned for the near future.

8.3 Dependency and Correlation Results on ProvenCore Layers

8.3.1 ProvenCore Description

ProvenCore (Lescuyer 2015) is one of the two microkernels entirely specified and developed in Smart at Prove & Run. Unlike Minix 3.1, by which it was inspired, ProvenCore targets ARM architectures and uses a Memory Management Unit for managing virtual address spaces. It is a general-purpose microkernel supporting creation and deletion of processes, execution of programs, synchronous message-passing inter-process communication with timeouts, asynchronous notifications, and process-to-process data copies.

The main property ensured by ProvenCore is the isolation property Isolation impliestwo complementary properties namely integrity and confidentiality Integrity refersto ensuring that the resources of a process (its code data and registers) cannot bealtered or interfered with by other processes unless explicitly authorized by the processConfidentiality refers to ensuring that the resources of a process cannot be observed byother processes unless explicitly authorized by the process In other words integrityensures that until a process decides to communicate with other processes it will executeas if it were alone on the system Confidentiality ensures that as long as a process doesnot send its secrets to other processes it can change its secrets without affecting otherprocesses

The isolation property has been formally proven using the interactive proof as-sistant of ProvenTools The proofs also establish functional specifications verified byProvenCore (Lescuyer 2015)

The proof for the isolation property is based on multiple refinements between suc-cessive models from the most abstract on which the isolation property is defined andproven to the most concrete ie the actual model used for code generation Thesesuccessive models are shown in Figure 81

Using multiple abstract models, each more abstract than its predecessor, enables a degree of separation of concerns in the overall proof. The lower-level proofs include a plethora of low-level properties and invariants and are devoid of functional properties, while the higher-level models focus on functional specifications. Each layer of abstraction removes details that are not relevant for it anymore and enables changing the representation of the transition system in order to internalize, in the structure of its states, some invariants of the preceding level.

Figure 8.1 – ProvenCore – Abstract Layers (from most abstract to least abstract: SPM, RSM, FSP, TDS)

The Security Policy Model (SPM) is the most abstract level and the one at which the isolation property is expressed and proven. The kernel is modeled as an abstract controller and the various processes are modeled as machines, each possessing its own independent physical resources.

The Refined Security Model (RSM) is an intermediate layer meant to bridge the wide gap between its successor, the SPM, and its predecessor, the FSP. In the RSM, the machines share the same physical resources, which are managed by the controller.

The Functional Specifications (FSP) layer is a model roughly equivalent to its predecessor, the TDS, in functionality but, unlike the latter, it uses data structures and algorithms that facilitate reasoning and formal proof. Its main functional difference from the TDS is that it eliminates MMU address translation, using instead a linear view of the RAM, similarly to the RSM.

The Target of Evaluation Design (TDS) is the model that is used to generate the sequential Smart code of the kernel, as well as the models for hardware components that are not translated into C code, but which are necessary for completing the TDS specifications.

For each refinement, a view, i.e. a function from the concrete model state to the abstract model state, is defined. Then, a correspondence or commutation lemma is proven, establishing that transitions from c to c′ in the concrete model entail transitions from the view of c to the view of c′ in the abstract model. Since the views are not total functions, this requires showing that the views actually exist. In this manner, the higher levels are attained, reaching models that are simpler and more flexible than the TDS, but that still simulate all its possible behaviours (Lescuyer, 2015).
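To make the shape of the commutation lemma concrete, the following sketch checks it by enumeration on a tiny, made-up pair of transition systems. In ProvenCore the lemma is proven deductively for all states; the states, the `view` function and the transition sets below are purely hypothetical.

```python
def view(concrete):
    """Partial view: maps a concrete state to an abstract one, or None if undefined."""
    counter, _scratch = concrete
    return counter if counter >= 0 else None

def commutation_failures(concrete_steps, abstract_steps):
    """Concrete transitions (c, c2) whose image (view(c), view(c2)) is not an abstract transition."""
    failures = []
    for c, c2 in concrete_steps:
        a, a2 = view(c), view(c2)
        # The view must exist on both states and the abstract model must allow the step.
        if a is None or a2 is None or (a, a2) not in abstract_steps:
            failures.append((c, c2))
    return failures

# Hypothetical abstract model: a counter that may only increase by one.
ABSTRACT_STEPS = {(0, 1), (1, 2)}
# Hypothetical concrete model: the counter plus a scratch field hidden by the view.
CONCRETE_STEPS = [((0, "x"), (1, "y")), ((1, "y"), (2, "z"))]
BROKEN_STEPS = CONCRETE_STEPS + [((2, "z"), (0, "x"))]
```

Here `commutation_failures(CONCRETE_STEPS, ABSTRACT_STEPS)` is empty, whereas the extra transition in `BROKEN_STEPS` is reported because the abstract model has no step from 2 to 0.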

This refinement chain also facilitates reusing parts of one proof effort in other proofs.

8.3.2 Obtained Dependency and Correlation Results

Our dependency and correlation analyses must be evaluated by two different criteria, namely execution time and precision. In this section, we discuss the former. The latter will be discussed in the following section.

Both analyses target complex transition systems in general and operating systems in particular. The ideas behind them stemmed directly from the verification effort entailed by ProvenCore. Unlike other static analyses, which are frequently employed in a fully automatic setting, our static analyses are supposed to be used as companion tools in the middle of interactive program verification. They are supposed to be applied often, as steps during interactive proofs. For instance, the dependency and correlation summaries for different predicates might be needed for verifying a single property. These, in turn, may imply a whole-model analysis. Therefore, the dependency and correlation analyses must perform quickly in order to answer effectively “questions” asked frequently.

Our analyses have currently been applied to the functional specification of ProvenCore (Lescuyer, 2015). More specifically, they have been applied to the RSM, FSP and TDS layers shown in Figure 8.1. Each of these layers is characterized by a global state with numerous fields and different transitions, i.e. supported commands or system calls, such as fork, exec, exit. Each supported command receives as an input the global state before the transition and returns the state of the system after the transition.

For instance, in RSM, the global states are much simpler compared to the ones in the layers below it, i.e. FSP and TDS. They are modeled by a structure with 6 fields, out of which 3 are modeled by arrays and 2 by structures. The RSM counterpart of the optional table of processes is a store of machines, which are themselves the counterpart of FSP processes. Machines are structures with 7 fields that refer to registers, information regarding inter-process communication or permissions, and code and data segments. Out of the 7 fields, 2 are modeled by variants, 2 by associative arrays and another 2 by structures.

The global state of the FSP layer is modeled by a structure type with 15 fields, including fields that concern process management (memory allocations, information about processes), interrupt handling (registered handlers, active handlers), scheduling (priority queues, currently running process, process to run next), time management or code/data. Among these 15 fields, 9 fields are “composite”, themselves being modeled by structures, variants or associative arrays. For instance, among the fields concerning process management there is a table of optional processes. The processes themselves are modeled by a structure type having 26 fields. Out of the total of 26 fields, 11 are modeled by algebraic data structures or associative arrays, too.

The FSP global state is characterized by over 70 invariants.

In TDS, the global state is a structure having 33 fields, among which 23 are “composite” as well. The processes are structures having 29 fields, among which 14 are modeled by associative arrays or algebraic data types. The global state is characterized by approximately 140 invariants.

In Table 8.3 we give an overview of the global states for each analysed layer. The first column shows the total number of fields. The second column indicates the number of fields that are modeled by associative arrays. Between parentheses we indicate the number of arrays having “composite” elements and elements of atomic or implicit types, respectively. For example, the FSP global state has 6 fields that are modeled by associative arrays and all 6 of them have “composite” elements. In columns 3, 4 and 5 we show the number of fields that are modeled by structures, variants and atomic or implicit types, respectively.

Table 8.3 – ProvenCore Abstract Layers – Global State Type

        Global State   Arrays              Structures   Variants   Atomic/Implicit
RSM     6 fields       2 fields (1, 1)     2 fields     0 fields   2 fields
FSP     15 fields      6 fields (6, 0)     0 fields     3 fields   6 fields
TDS     33 fields      14 fields (14, 0)   3 fields     6 fields   10 fields

The global state of each layer contains an array or store of processes or machines. In Table 8.4 we give an overview of the process or machine type for each analysed layer. The table has the same structure as the one described previously for the global state types.

Table 8.4 – ProvenCore Abstract Layers – Process/Machine Type

        Process/Machine   Arrays            Structures   Variants   Atomic/Implicit
RSM     7 fields          2 fields (1, 1)   2 fields     2 fields   1 field
FSP     26 fields         2 fields (0, 2)   5 fields     3 fields   16 fields
TDS     29 fields         1 field (1, 0)    8 fields     5 fields   15 fields

We have applied our dependency and correlation analyses on the RSM, FSP and TDS layers, thus conducting medium-sized experiments. An overview of the characteristics of the 3 ProvenCore layers is included in Table 8.5, Table 8.7 and Table 8.9. In each of these, the first column shows the total number of predicates of the analysed layers. In parentheses we indicate the number of predicates that only read information and return a Boolean-like exit label, i.e. logical properties, as well as the number of implicit predicates, for which a pessimistic assumption is made. The second column shows the total number of lines of code (LoC) for each layer, including comments and type definitions. The next three columns indicate the number of LoC corresponding to predicates, type definitions and comments, respectively.

We have run the analyses 101 times in a loop on a Lenovo laptop with a Quad-Core Intel Core I7-5500U processor and 8 GB RAM. The system runs Xubuntu GNU/Linux 64-bit, Release 15.10, with OCaml 4.01. Before the first run of each loop, the operating system's cache was dropped using the following command:

    echo 3 > /proc/sys/vm/drop_caches

The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.
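The measurement protocol described above (a fixed number of runs, timing only the analysis call, then reporting minimum, percentiles, maximum and average) can be sketched as follows. This is an illustrative harness, not the one used for the thesis: `time_runs` and `summarize` are hypothetical names, and the percentile method of `statistics.quantiles` is an assumption, since the text does not state how percentiles were computed.

```python
import statistics
import time

def time_runs(analysis, model, runs=101):
    """Time only the analysis call; loading the model and printing results stay outside."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        analysis(model)
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Minimum, 10th/50th/90th percentiles, maximum and average of the samples."""
    deciles = statistics.quantiles(samples, n=10)   # 9 cut points: 10%, 20%, ..., 90%
    return {
        "min": min(samples),
        "p10": deciles[0],
        "p50": deciles[4],
        "p90": deciles[8],
        "max": max(samples),
        "avg": statistics.fmean(samples),
    }

# Stand-in workload: any callable taking an already loaded model can be timed.
samples = time_runs(lambda model: sum(model), list(range(1000)), runs=5)
report = summarize(list(range(1, 102)))
```

Using a monotonic clock such as `time.perf_counter` avoids artefacts from wall-clock adjustments during the loop.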

On average, our fully context-insensitive dependency analysis, as presented in Chapter 5, computed the dependency summaries for 633 RSM/FSP predicates in 0.656 seconds. For the TDS predicates, the dependency summaries were computed in 0.699 seconds on average. These results are indicated in Table 8.5.

Table 8.5 – Abstract Layers – Evaluation Data and Dependency Analysis Timing

           Predicates       Total LoC   Code    Types   Comments   Dependency Avg.
RSM/FSP    633 (235, 65)    9853        8402    596     855        0.656 s
TDS        780 (231, 155)   14000       11306   588     2106       0.699 s

In Table 8.6 we indicate the minimum and maximum execution times for the context-insensitive dependency analysis. Various percentiles are indicated as well.

Table 8.6 – Abstract Layers – Detailed Dependency Analysis Timing (in seconds)

           Min     10%ile   50%ile   90%ile   Max     Avg
RSM/FSP    0.650   0.651    0.652    0.658    0.730   0.656
TDS        0.690   0.691    0.693    0.718    0.798   0.699

The average execution time of our dependency analysis with the deferred accesses extension is shown in Table 8.7, in the last column, denoted by Avg. On average, our dependency analysis extended with deferred accesses, as presented in Chapter 6, computed the dependency summaries with context-sensitive leaves for the 633 RSM/FSP predicates in 0.779 seconds. For the TDS predicates, the dependency information was computed in 0.919 seconds on average.

Table 8.7 – Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing

           Predicates       Total LoC   Code    Types   Comments   Deferred Avg.
RSM/FSP    633 (235, 65)    9853        8402    596     855        0.779 s
TDS        780 (231, 155)   14000       11306   588     2106       0.919 s

Therefore, using our relaxed form of context-sensitivity led to an increase of 10-20% in execution time on the used benchmarks.

The detailed timing information for the dependency analysis using deferred accesses is shown in Table 8.8.

Table 8.8 – Abstract Layers – Detailed Deferred Dependency Analysis Timing (in seconds)

           Min     10%ile   50%ile   90%ile   Max     Avg
RSM/FSP    0.776   0.777    0.779    0.781    0.785   0.779
TDS        0.904   0.905    0.908    0.975    0.999   0.919

The average execution time of our correlation analysis is shown in Table 8.9, in the last column, denoted by Avg. The correlation summaries for the RSM/FSP predicates are computed in 0.426 seconds on average. For the TDS predicates, the correlation summaries are computed in 0.496 seconds on average. Unlike the dependency analysis, which computes information for code as well as specifications, i.e. logical properties, in a unified manner, the correlation analysis only computes information for predicates that actually modify data structures. This partly explains the time difference between the two analyses. We also remark that the possible-constructors analysis is performed simultaneously with the dependency analysis, and this contributes to the difference between the execution times as well.

Table 8.9 – Abstract Layers – Evaluation Data and Correlation Analysis Timing

           Predicates       Total LoC   Code    Types   Comments   Correlation Avg.
RSM/FSP    633 (235, 65)    9853        8402    596     855        0.426 s
TDS        780 (231, 155)   14000       11306   588     2106       0.496 s

The detailed timing information for our correlation analysis is shown in Table 8.10.

Table 8.10 – Abstract Layers – Detailed Correlation Analysis Timing (in seconds)

           Min     10%ile   50%ile   90%ile   Max     Avg
RSM/FSP    0.424   0.425    0.425    0.427    0.432   0.426
TDS        0.492   0.493    0.494    0.498    0.540   0.496

Generally, static analysis has been considered prohibitive in terms of execution time, and it has been avoided in an interactive context and used predominantly in an automatic context. Though currently applied only on medium-sized models, the execution times of both of our analyses are short enough to expect reasonable execution times for larger models as well.²

² It is noteworthy that the interprocedural dependency and correlation summaries will not necessarily be computed on-the-fly during the interactive proof. Rather, they will be computed as part of the build. In contrast, the treatment of a query, once all interprocedural information has been computed, will be executed in real time. Nevertheless, it is desirable to have fast analyses, allowing developers to iterate frequently.

8.3.3 Precision of our Dependency and Correlation Summaries

In this section, we try to illustrate the sort of dependency and correlation summaries that are computed by our analyses. We conclude the section with a brief discussion regarding the precision of our obtained results. Assessing and discussing precision as a metric for usefulness is hard in isolation and can only be effectively done in relation to actual applications. However, we present some statistics in order to give some insight about the proportion of the non-trivial information computed. For our current discussion, we focus on the results obtained on the RSM/FSP and the TDS layers.

One of the analysed predicates of the RSM/FSP layers is do_auth. This predicate is a system call clearing or granting an authorization to some process to read from or write to some memory range of the current process. It receives a global state in and an index i as inputs and produces, on the true label, the new global state out, after modifying the permission for the i-th process in the process store.

The code of do_auth performs various system-wide checks before registering the permission change and is therefore not trivial, although its effect is quite limited. Indeed, the correlation results computed by our analysis for the true label of this predicate are shown below.

    true: (in, out) ↦ [ (ε, ε) ↦ (14 fields) ↦ Equal
                                 procs ↦ Any
                        (procs, procs) ↦ 〈 Equal i [ None ↦ Equal
                                                     Some ↦ v ↦ (25 fields) ↦ Equal
                                                                mem_auth ↦ Any ] 〉 ]

The analysis detects that, out of the 15 fields of out, only the i-th element of the procs field is changed. Furthermore, it detects that if this element is an active process, i.e. built with the Some constructor, only the mem_auth field is modified, out of the total of 26 fields. Everything else is copied from the input state in.

Combined with dependency summaries for logical properties, this correlation summary would allow us to infer the preservation of all invariants that are not concerned with the memory permissions. All but one of the specified properties for the global state fall into this category. The exception is the relevant memory permissions property:

    predicate proc_mem_auth_ok(proc proc) -> [true | false]

which verifies a fundamental property that has to hold for all processes in the process store of proc and states that a process has permissions covering a valid range of memory addresses and referring only to existing processes. After executing do_auth, this property is threatened and needs to be verified only for the i-th process of the store. It is preserved for all others.

The dependency results computed by our analysis for this predicate are shown below. The analysis detects that, for each of the possible execution scenarios, the outcome depends only on 2 out of the 26 fields, namely the stackframe and the memory permissions. The dependency on the stackframe is confined to only one of the 3 fields: the data and stack segment. The memory permissions are given by a variant with 3 constructors, denoting reading and writing permissions, or the absence of any permission. Furthermore, besides pinning down the outcome's dependency on 2 out of the 26 fields of the proc structure, the analysis also detects that the absence of any memory permission, indicated by the constructor NONE of the mem_auth variant, is ⊥ for the false execution scenario. In other words, unused permissions cannot threaten the property proc_mem_auth_ok.

    false → proc → mem_auth → [ READ → base → ⊤, len → ⊤
                                WRITE → base → ⊤, len → ⊤
                                NONE → ⊥ ]
                   stackframe → ds → ⊤

    true → proc → mem_auth → [ READ → base → ⊤, len → ⊤
                               WRITE → base → ⊤, len → ⊤
                               NONE → ]
                  stackframe → ds → ⊤

The relevant memory permissions property is thus only threatened by transitions that add memory permissions or change a process' virtual space layout. Only 2 transitions out of the 25 belong to this category: exec, which resets the process' segments, and do_auth, which adds permissions and was discussed above. In particular, transitions deleting memory permissions do not impact the property, since the absence of permissions, as shown by the dependency of the constructor NONE for the false label, is an impossible case when the property does not hold. This is one of the practical advantages of tracking constructor possibilities simultaneously, and of extending the correlation analysis to track the evolution of constructors as well.

In the following, we briefly discuss our dependency summaries obtained on the RSM/FSP layer in terms of precision. An overview is given in Table 8.11. The first column refers to the fully context-insensitive dependency analysis, as presented in Chapter 5. The second column refers to the dependency analysis extended with deferred access maps, as presented in Chapter 6. The first line indicates the total number of predicates, both implicit and explicit. The second line indicates the total number of implicit predicates, for which we are obliged to make a pessimistic assumption and to consider everything needed, given that their implementation is hidden. The third line indicates the number of explicit predicates without inputs, for which empty summaries are retrieved. Our dependency analysis detects the input subset that is read in order to obtain the output. In the case of predicates without inputs, this subset is empty. Most explicit predicates without inputs correspond to wrapper predicates around calls to constructors that take no arguments. Since αSmil is an intermediate language, such predicates are automatically generated and do not necessarily correspond to programmer-written predicates. The next line, line 4, indicates the number of predicates for which we obtain non-trivial information. By non-trivial information we mean dependency summaries in which the dependency associated to at least one input variable is different from ⊤, i.e. Everything, the element conveying no information. With the context-insensitive dependency analysis, we obtain non-trivial results for 344 predicates. With the extended dependency analysis, we obtain non-trivial results for 403 predicates.

Table 8.11 – RSM/FSP Layers – Evaluation Data and Dependency Summaries

                                   Context-Insensitive   Deferred
Number of Total Predicates         633                   633
Number of Implicit Predicates      65                    65
No Inputs                          26                    26
Number of Non-Trivial Results      344                   403
Number of Trivial Results          289                   230
  • Implicit                       65                    65
  • No Inputs                      26                    26
  • Other                          198                   139
Predicates with Atomic Inputs      31                    31
Completely Read                    71                    71
Overapproximation                  96                    37

The following line, line 5, indicates the total number of predicates for which trivial results are obtained. These include the results for implicit predicates, as well as those for predicates without inputs. For the simple version of the dependency analysis, we obtain 198 trivial results, excluding implicit predicates and predicates without inputs. For the extended dependency analysis, we obtain trivial results for 139 predicates, excluding implicit predicates and predicates without inputs. Therefore, for the first version of the analysis, 59 trivial summaries (198 − 139) are a consequence of context-insensitivity. The next 3 lines refer to the 139 predicates for which trivial results are obtained with both versions of the dependency analysis. 31 of them correspond to predicates manipulating only inputs of atomic types, such as int. Such inputs are completely read and thus the trivial results are justified and do not correspond to an over-approximation. Another 71 correspond to predicates making complex manipulations and actually reading all of their input, such as well-formedness checks. The last 37 trivial results are a consequence of over-approximations made by our analysis. The majority of them correspond to complex predicates making multiple calls to other complex predicates and relying heavily on calls to implicit predicates, for which conservative assumptions are made. For the simple dependency analysis, another 59 trivial results (96 − 37) are a result of over-approximations related to context-insensitivity.

An overview of the dependency results for the TDS layer is given in Table 8.12. The table follows the same structure as described for Table 8.11.

Table 8.12 – TDS Layer – Evaluation Data and Dependency Summaries

                                   Context-Insensitive   Deferred
Number of Total Predicates         780                   780
Number of Implicit Predicates      155                   155
No Inputs                          15                    15
Number of Non-Trivial Results      386                   458
Number of Trivial Results          394                   322
  • Implicit                       155                   155
  • No Inputs                      15                    15
  • Other                          224                   152
Predicates with Atomic Inputs      49                    49
Completely Read                    59                    59
Overapproximation                  116                   44

We remark that, with the deferred dependencies extension, we obtain more precise dependency summaries for 273 predicates of the RSM/FSP abstract layer. These constitute approximately 50% of the predicates in the used benchmark. For the TDS layer, we obtain more precise results for 308 predicates using the deferred dependencies extension. These constitute approximately 50% of the predicates in the TDS layer for which non-trivial results can be obtained (i.e. excluding implicit predicates and those without inputs). The dependency summaries obtained with the extended analysis are considerably more detailed. For instance, just to give an intuition of the difference between the results obtained for the TDS layer, the file containing the results computed with the context-insensitive dependency analysis contains 7333 lines and its size is 2631 kB, while the file containing the results computed with the extended analysis contains 11547 lines and its size is 5239 kB.

The statistics for the correlation analysis are shown in Table 8.13. Unlike the dependency analysis, which handles both logical properties and predicates generating outputs, the correlation analysis does not handle logical properties. It tracks fine-grained partial equivalences between parts of the input and parts of the output. Therefore, the number of RSM/FSP predicates for which we can obtain non-trivial results (i.e. at least one partial equivalence between an input (sub)element and an output (sub)element on at least one exit label) is lower. Implicit predicates and specification-only predicates are mapped to NoCorrelation, the top element conveying no information. Out of the 307 predicates left, we obtain non-trivial results for 186 of them. The rest include predicates relying heavily on calls to implicit predicates. They also include complex system calls, such as fork or exec, and auxiliary operations which modify their input entirely.

Table 8.13 – RSM/FSP Layers – Evaluation Data and Correlation Summaries

                                               Correlation Analysis
Number of Total Predicates                     633
Number of Implicit Predicates                  65
Number of Logical Properties (No Outputs)      235
No Inputs                                      26
Number of Non-Trivial Results                  186
Number of Trivial Results                      90
  • Implicit                                   65
  • No Inputs                                  26
  • No Outputs                                 235
  • Atomic/Implicit Inputs                     31

An overview of the correlation results for the TDS layer is given in Table 8.14. The table follows the same structure as described for Table 8.13.

Table 8.14 – TDS Layer – Evaluation Data and Correlation Summaries

                                               Correlation Analysis
Number of Total Predicates                     780
Number of Implicit Predicates                  155
Number of Logical Properties (No Outputs)      231
No Inputs                                      15
Number of Non-Trivial Results                  235
Number of Trivial Results                      95
  • Implicit                                   155
  • No Inputs                                  15
  • No Outputs                                 231
  • Atomic/Implicit Inputs                     49

8.4 Reasoning about Framing using Correlations and Dependencies

8.4.1 A Decision Procedure

In general, reasoning about framing relies on the frame rule, which is commonly illustrated as follows:

\[ \frac{\{P\}\; C\; \{Q\}}{\{P \wedge R\}\; C\; \{Q \wedge R\}} \]

The purpose of the frame rule is to enable local reasoning: a property R that holds for a state P will continue to hold after executing a command C, provided that R reads only locations that are unmodified by C. The frame rule, also called the rule of constancy (Reynolds, 1981), applies in its original form to simple languages which do not use a heap. Separation logic addresses framing for heap-supporting languages.

In our case, the αSmil language with which we are working does not support mutation. Our work is not concerned with heap modifications but focuses on deep-state modifications. We handle predicates that receive a composite input state and construct a new composite output state, without altering the former. The new output state is constructed by copying the input state and modifying a subset of subelements.

In our context, the frame rule must be reinterpreted as follows: a property R is preserved by a predicate C receiving an input state P and constructing an output state Q if the states P and Q agree on the subset on which the property R depends. In other words, a property is preserved by a predicate if the latter only modifies subelements on which the property does not depend. Using the terminology of separation logic, a property R is preserved by a predicate C if the footprint of C is disjoint from the footprint of R. However, we are not concerned with locations, but with subelements of large states modeled by algebraic data structures and arrays. Therefore, when reasoning about framing, we need to check if the input subset modified by an operation is disjoint from the subset that properties are reading and depending on.
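The disjointness check in this reinterpreted frame rule can be sketched on a simplified representation in which footprints are sets of access paths (tuples of field names). This is only an illustration of the idea: the actual dependency and correlation domains are richer (arrays, variants, constructors), and the field names `sched` and `queue` below are hypothetical.

```python
def overlaps(p, q):
    """Two access paths overlap if one is a prefix of the other."""
    n = min(len(p), len(q))
    return p[:n] == q[:n]

def is_preserved(read_paths, modified_paths):
    """A property is preserved if nothing it reads may have been modified."""
    return not any(overlaps(r, m) for r in read_paths for m in modified_paths)

# Footprint of an operation that only touches the mem_auth field of processes.
modified = {("procs", "mem_auth")}

# Footprints of three properties (hypothetical examples).
reads_scheduler = {("sched", "queue")}
reads_all_procs = {("procs",)}
reads_auth_base = {("procs", "mem_auth", "base")}
```

With these footprints, only the property reading the scheduler queue is framed out automatically; the other two read something at, above or below the modified path and still have to be verified.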

We have devised two static analyses for automatically computing the footprints of operations and properties. The dependency analysis detects the input subset on which the outcome of an operation or of a property relies. The correlation analysis detects the input subset that is modified by an operation in order to obtain the output. The results of the two analyses are meant to be used and combined by a decision procedure in order to automatically infer the preservation of frame properties.

The decision procedure has not been implemented yet but, based on preliminary experiments, we give an intuition about how the dependency and correlation summaries are meant to be unified, what type of queries could be answered and the mechanism used for answering them.

Concretely, the decision procedure is meant to receive a sequence of atoms, one of which is a query. The query is to be answered based on the correlation summaries computed for the other atoms. Atoms are calls to built-in or user-defined predicates. Queries usually consist of a Boolean built-in statement, such as an equality check or a partial structure equality check, for instance, or a call to a logical predicate having true and false as exit labels and generating no outputs. In a nutshell, the dependency summary computed for the query would have to be transformed and interpreted as a set of correlations that are sufficient to answer affirmatively the given query. This should then be compared to the correlations computed for the atoms. The query can be answered affirmatively if the latter is less than or equal to the former.

We sketch the envisioned mechanism behind our decision procedure on a simple example receiving 4 atoms. One of them is a query, as shown below.

    type state = { f : int; g : int; h : int }

    v1 = s.f
    t = { s with g = w }
    v2 = t.f
    Q: v1 = v2 - true -

In this case, it is not necessary to first obtain the dependency for the query, marked with Q, and to interpret it as a correlation. The necessary and sufficient correlation for the query to be answered affirmatively can be obtained directly:

    (v1, v2) ↦ (ε, ε) ↦ Equal

Separately, we need to extract all the correlation information regarding (v1, v2) from the given atoms. For this, we must first find the chains of correlations connecting the two through other intermediate atoms. Therefore, we begin by building an undirected graph in which every variable appearing in the atoms is added as a node. An edge is added between any nodes representing the input and the output of the same atom.³ For our example, the graph is shown below.

    [Graph of atom variables: nodes s, t, v1, v2, w; edges s – v1, s – t, w – t, t – v2]

³ In general, these graphs will not be acyclic. Further measures will have to be taken for correctly dealing with all cases.

The path connecting v1 and v2 is highlighted in green. In the general case, such paths could be detected using a depth-first search algorithm. Using the detected path between v1 and v2, we build a chain of pairs of variables of the following form:

    (v1, s) <-> (s, t) <-> (t, v2)
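The graph construction and path search just described can be sketched as follows, on the atoms of the running example. The encoding of atoms as `(inputs, output)` pairs and the function names are assumptions made for this illustration; the decision procedure itself has not been implemented in the thesis.

```python
def build_graph(atoms):
    """Undirected graph with one node per variable and one edge per (input, output) of an atom."""
    graph = {}
    for inputs, output in atoms:
        graph.setdefault(output, set())
        for variable in inputs:
            graph.setdefault(variable, set()).add(output)
            graph[output].add(variable)
    return graph

def find_path(graph, src, dst, seen=None):
    """Depth-first search for a simple path from src to dst; returns None if there is none."""
    seen = (seen or set()) | {src}
    if src == dst:
        return [src]
    for nxt in sorted(graph[src]):
        if nxt not in seen:
            rest = find_path(graph, nxt, dst, seen)
            if rest:
                return [src] + rest
    return None

# Atoms of the example: v1 = s.f, t = {s with g = w}, v2 = t.f
atoms = [(["s"], "v1"), (["s", "w"], "t"), (["t"], "v2")]
graph = build_graph(atoms)
path = find_path(graph, "v1", "v2")
chain = list(zip(path, path[1:]))   # consecutive variable pairs along the path
```

On this example the search returns the path v1, s, t, v2, and `chain` holds the three pairs of the chain shown above.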

These are the unordered pairs for which we need to extract the correlation information contained in the correlation summaries of the atoms. The correlation summaries of our example atoms are the following:

    v1 = s.f               (s, v1) ↦ (f, ε) ↦ Equal

    t = { s with g = w }   (s, t) ↦ (f, f) ↦ Equal
                                    (h, h) ↦ Equal
                           (w, t) ↦ (ε, g) ↦ Equal

    v2 = t.f               (t, v2) ↦ (f, ε) ↦ Equal

In the correlation summaries computed by our analysis, correlation maps are associated to pairs of input and output values, i.e. the computed information is expressed between the input and the output variables of an operation. They can be seen as ordered pairs having inputs as the left members and outputs as the right members. However, the correlation information expresses a relation between two runtime values, which can be compared independently of the order in which they appear.⁴ The atoms refer to values that occur in the program at different times, and answering the query is done independently of the order of execution. Therefore, at this level, we can swap the members of the pairs to which correlation maps are associated. This allows us to obtain correlation information expressed in terms of the variable pairs in the chain extracted from the graph of atom variables. For instance, for our example, we would obtain the following:

(v1, s) <-> (s, t) <-> (t, v2)

(v1, s) ↦ {(ε, f) ↦ Equal}

(s, t) ↦ {(f, f) ↦ Equal, (h, h) ↦ Equal}

(t, v2) ↦ {(f, ε) ↦ Equal}
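The swap can be sketched in a few lines of Python. The dictionary encoding of correlation summaries, the use of the empty string for the path ε and the helper name `oriented` are assumptions made for this illustration. The swap is only valid because, at this stage, the relations are symmetric.

```python
# Hypothetical encoding of the example correlation summaries:
# (input, output) -> {(input_path, output_path): relation}; "" stands for ε.
SUMMARIES = {
    ("s", "v1"): {("f", ""): "Equal"},
    ("s", "t"): {("f", "f"): "Equal", ("h", "h"): "Equal"},
    ("w", "t"): {("", "g"): "Equal"},
    ("t", "v2"): {("f", ""): "Equal"},
}

def oriented(summaries, pair):
    """Return the correlation map for `pair`, swapping members if needed."""
    if pair in summaries:
        return dict(summaries[pair])
    first, second = pair
    if (second, first) in summaries:
        # Swap both the variables and the access paths of each correlation.
        swapped = summaries[(second, first)]
        return {(q, p): rel for (p, q), rel in swapped.items()}
    return {}

chain = [("v1", "s"), ("s", "t"), ("t", "v2")]
maps = [oriented(SUMMARIES, pair) for pair in chain]
```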

From these correlation maps, we compute the Cartesian product of the correlations appearing in them, as follows:

⁴ When the evolution of constructors is tracked as well, the relations will stop being symmetric. Thus, the matrices will have to be transposed.


{c1} × {c2, c3} × {c4}

where
c1 = (ε, f) ↦ Equal
c2 = (f, f) ↦ Equal
c3 = (h, h) ↦ Equal
c4 = (f, ε) ↦ Equal

For our example, the obtained set would be the following:

{ ((ε, f) ↦ Equal, (f, f) ↦ Equal, (f, ε) ↦ Equal),
  ((ε, f) ↦ Equal, (h, h) ↦ Equal, (f, ε) ↦ Equal) }
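For illustration, this product can be computed with `itertools.product` from the Python standard library. The list `maps` below is a hypothetical encoding of the three oriented correlation maps of the example, with the empty string standing for ε.

```python
from itertools import product

# Oriented correlation maps along the chain (v1, s) <-> (s, t) <-> (t, v2).
maps = [
    {("", "f"): "Equal"},                        # (v1, s)
    {("f", "f"): "Equal", ("h", "h"): "Equal"},  # (s, t)
    {("f", ""): "Equal"},                        # (t, v2)
]

# One correlation is picked from each map, in chain order.
candidates = list(product(*(m.items() for m in maps)))
```

The number of candidates is the product of the sizes of the maps, which is the source of the combinatorial growth for long chains.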

For each member of the obtained set, we need to recursively compose the correlations in order to obtain information regarding the values involved in the query. The compose operations would be applied as follows:

(((c′1 ∘ c′2) ∘ c′3) ···)

where, for the first element of our example set, c′1, c′2 and c′3 have the following values:

c′1 = (ε, f) ↦ Equal
c′2 = (f, f) ↦ Equal
c′3 = (f, ε) ↦ Equal

For our example, we cannot obtain any correlation information regarding (v1, v2) by composing the correlations of the second member of the Cartesian product. The first correlation relates the value of v1 to the value of the f field of s, while the second correlation relates the values of the field h of s and t. Thus, in this case we cannot infer anything regarding v1 and t, nor regarding v1 and v2. However, by composing the correlations of the first member of the Cartesian product, we obtain the following:

(v1, v2) ↦ {(ε, ε) ↦ Equal}
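The two outcomes can be reproduced with a deliberately simplified composition rule, assumed here for illustration only: two Equal correlations compose when the right-hand path of the first coincides with the left-hand path of the second, and no information is obtained otherwise. The actual compose operation handles richer relations and path prefixes; the function name `compose` and the tuple encoding are hypothetical.

```python
from functools import reduce

EPS = ""  # the empty access path ε

def compose(left, right):
    """Compose two correlations; None means no information is obtained."""
    if left is None or right is None:
        return None
    (p1, p2), rel1 = left
    (q2, q3), rel2 = right
    # Simplifying assumption: paths must match exactly, relations must be Equal.
    if p2 != q2 or rel1 != "Equal" or rel2 != "Equal":
        return None
    return ((p1, q3), "Equal")

first = [((EPS, "f"), "Equal"), (("f", "f"), "Equal"), (("f", EPS), "Equal")]
second = [((EPS, "f"), "Equal"), (("h", "h"), "Equal"), (("f", EPS), "Equal")]

# Left-nested application: ((c1 . c2) . c3)
result_first = reduce(compose, first)
result_second = reduce(compose, second)
```

Under this rule the first member yields the correlation (ε, ε) ↦ Equal for (v1, v2), while the second member yields nothing because the paths f and h do not connect.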

If, after composing, we had obtained multiple correlations referring to (v1, v2), these would have had to be intersected, thus allowing us to extract from the given atoms the most precise correlation information regarding (v1, v2). In the general case, the correlation information obtained after the intersection is the one that has to be compared to the correlation computed previously, i.e. the sufficient correlation for the query to be answered affirmatively. For our example, this thus amounts to comparing:

(v1, v2) ↦ {(ε, ε) ↦ Equal}
⊑K
(v1, v2) ↦ {(ε, ε) ↦ Equal}

Based on this, we can conclude that the given query Q will be answered affirmatively for the atoms given in our example.


8.4.2 Types of Targeted Queries

The types of queries that are targeted by our approach can be categorized as follows:

• equality of values;

• structure equality on the values of a subset of fields;

• implications of the form logical_property(a) ⇒ logical_property(b), where a and b are related by the facts inferred from the other atoms of the query;

• conjunctions of such queries.

In the general case, we need to reinterpret a dependency summary as a correlation summary. The query's goal is to deduce the equality between pairs of variables. When two such variables are of the same type, we can create a correlation map containing a single correlation. That correlation associates to the pair of paths (ε, ε) a partial equivalence relation which mirrors the dependency. The partial equivalence relation is created as follows:

• When the dependency is Everything, the equivalence relation becomes Equal.

• When the dependency is Nothing, the equivalence relation becomes Any.

• Structure, variant and array dependencies are transformed pointwise to structure, variant and array partial relations.

• When the dependency is Impossible, the equivalence relation becomes Any in the absence of the possible-constructors extension.
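The translation rules above can be sketched as a small recursive function. The encoding is an assumption made for this illustration: atomic dependencies are strings, and structure, variant and array dependencies are all represented as dictionaries keyed by field, constructor or index name. The function name `to_relation` is hypothetical.

```python
def to_relation(dep):
    """Translate a dependency into a partial equivalence relation."""
    if dep == "Everything":
        return "Equal"
    if dep in ("Nothing", "Impossible"):
        # Impossible maps to Any without the possible-constructors extension.
        return "Any"
    if isinstance(dep, dict):
        # Compound dependencies are translated pointwise.
        return {key: to_relation(sub) for key, sub in dep.items()}
    raise ValueError(f"unknown dependency: {dep!r}")

example = to_relation({
    "irq_handlers": "Everything",
    "current_process": "Nothing",
    "mem_auth": {"READ": "Everything", "NONE": "Impossible"},
})
```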

We illustrate here some example queries revolving around our do_auth predicate, discussed in Section 8.3.3.

A naive equality query on the entire input and output of do_auth would not be satisfiable, as do_auth does modify the memory authorizations of one process. This is the first sort of supported query:

do_auth(now, i, arg3)[true: after | oob | false]
Q: after = now
⇒ no

The main argument of the do_auth predicate is the global state now, an instance of the global_state structure:⁵

type global_state =
  procs : array<option<process>>
  memory_regions : array<mem_region>
  irq_handlers : array<irq_handler>
  current_process : int

⁵ Due to confidentiality reasons, the actual definition of the struct has been modified and edited for length.


Since the do_auth predicate only affects the mem_auth of one process in the procs array, we can successfully deduce, for the values of now and after, the equality on the fields memory_regions and current_process. This is the second sort of supported query:

do_auth(now, arg2, arg3)[true: after | oob | false]
Q: after =<memory_regions, current_process> now
⇒ yes

Finally, we can directly deduce that the all_ids_in_handlers_ok_global(state) property is not threatened by the execution of the do_auth predicate:

do_auth(now, arg2, arg3)[true: after | oob | false]
Q: congruent all_ids_in_handlers_ok_global(now)
             all_ids_in_handlers_ok_global(after)
⇒ yes

This property verifies that all the identifiers used by the registered interruption handlers stored in the field irq_handlers are valid. The property has the following dependency summary:

false → state → irq_handlers → Everything
true → state → irq_handlers → Everything

From the correlation of the do_auth predicate, we know that the irq_handlers field is preserved, and therefore it follows that the property, which only depends on that field, is preserved. Similar properties that do not depend on the procs array, but only on parts or on the entirety of one or more of the other 14 fields, will be preserved as well.
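This reasoning amounts to a coverage check between access paths, which the following sketch illustrates. Paths are encoded as tuples of field names, and the set `DO_AUTH_PRESERVED` is a hypothetical, coarse reading of the do_auth correlation summary restricted to the fields named in this section; neither reflects the actual data structures of our prototype.

```python
def is_covered(dep_path, preserved_paths):
    """A dependency is covered when some preserved path is a prefix of it."""
    return any(dep_path[:len(p)] == p for p in preserved_paths)

def property_preserved(dep_paths, preserved_paths):
    """The property is preserved when every dependency is covered."""
    return all(is_covered(d, preserved_paths) for d in dep_paths)

# Hypothetical: fields of global_state left untouched by do_auth.
DO_AUTH_PRESERVED = {
    ("memory_regions",),
    ("irq_handlers",),
    ("current_process",),
}

# all_ids_in_handlers_ok_global depends only on irq_handlers.
handlers_ok = property_preserved({("irq_handlers",)}, DO_AUTH_PRESERVED)
# A property depending on the whole procs array cannot be inferred this way.
procs_ok = property_preserved({("procs",)}, DO_AUTH_PRESERVED)
```

The check is conservative: a negative answer only means that preservation could not be inferred from the summaries, not that the property is violated.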

The preservation of properties that have to hold for every process in the array procs will be inferred as well, as long as they do not depend on the mem_auth field of the processes. For instance, the property procs_proc_map_ok_global verifies that each process of the array procs has valid code, data and stack segments. This property has the following dependency summary:

true → state → procs → ⟨[ None → Everything
                          Some → v → proc_map → Everything ]⟩

false → state → procs → ⟨[ None → Everything
                           Some → v → proc_map → Everything ]⟩

Since, for every active process of the array, the property depends only on the proc_map field, it is unaffected by the modification of the mem_auth field. Therefore, the property is preserved for the global state after, obtained after the execution of do_auth. Similar properties that do not depend on the mem_auth field, but only depend on other parts of the data structure, will be preserved as well.

An extension of the decision procedure sketched in Section 8.4.1 could take advantage of additional information regarding array indices. For example, the query could specify that two of the involved array indices are different:

do_auth(now, i, arg3)[true: after | oob | false]
Assert: i ≠ j
Q: congruent mem_auth_ok_global(now, j)
             mem_auth_ok_global(after, j)
⇒ yes

The mem_auth_ok_global(state, j) property checks the well-formedness of the memory permission on the j-th process. The above query is satisfied if the property mem_auth_ok_global holds for all processes other than the i-th. The correlation summary for do_auth states that the elements of the procs array are unmodified by the operation, except for the i-th element. Combined with the dependency summary for mem_auth_ok_global given below, this allows the query to be satisfied:

true → state → procs → ⟨Nothing ▷ j : [ None → Everything
                                        Some → v → ProcDep1 ]⟩

false → state → procs → ⟨Nothing ▷ j : [ None → Everything
                                         Some → v → ProcDep2 ]⟩

where ProcDep1 is

mem_auth → READ → base → Everything
                  len → Everything
           WRITE → base → Everything
                   len → Everything
           NONE → Impossible
stackframe → ds → Everything

and ProcDep2 is

mem_auth → READ → base → Everything
                  len → Everything
           WRITE → base → Everything
                   len → Everything
           NONE → Nothing
stackframe → ds → Everything

8.5 Decision Procedure Experiments

We have applied a basic prototype of the decision procedure, using the dependency and correlation summaries computed for the RSMFSP layers of ProvenCore.

Our prototype considers pairs of one logical property and one predicate. The logical property and the predicate must both operate on values of the same type. More precisely, one of the predicate's inputs, as well as one of its outputs, and one of the logical property's inputs must all be of the same type. Our prototype attempts to detect whether the logical property is preserved after the execution of the predicate. If several inputs or outputs are of the same type, all combinations are considered. Most implicit types were not considered when searching for property/predicate pairs, as they are less likely to yield successful results. For example, arguments of a primitive type like int are unlikely to be unaffected by the execution of the predicate.

This prototype automatically inspected all such property/predicate pairs found in the RSMFSP layers. A property was considered to be preserved if its dependency summary for the argument involved, when translated to a set of equalities, formed a subset of the equalities implied by the predicate's correlation summary. Both the true and the false exit labels were considered independently, and the property is considered to be preserved (subject to some conditions) when it is preserved for either or both exit labels. More precisely, given a property $\pi(\bar{\imath})[\mathit{true}\,|\,\mathit{false}]$ and a predicate $p(\bar{\imath}')[\ell : \bar{o}']$, we report success when the following can be satisfied:

\begin{align}
& \exists\, i \in \bar{\imath},\ i' \in \bar{\imath}',\ o' \in \bar{o}' \ \text{such that}\ \Gamma(i) = \Gamma(i') = \Gamma(o') \tag{8.1}\\
\land\ & \exists\, \ell \in \{\mathit{true}, \mathit{false}\} \tag{8.2}\\
\land\ & E(j) \neq E(k) \land E'(j) \neq E'(k) \quad \forall j, k \in \bar{\imath}, \bar{\imath}', \bar{o}' \tag{8.3}\\
& \text{when } j \text{ and } k \text{ are used as array indices} \tag{8.4}\\
\land\ & \big\langle E \big[ \mathit{Prop}(\bar{\imath}[i \to i'])[\mathit{true}\,|\,\mathit{false}] \big] \big\rangle \xrightarrow{\ \ell\ } E \tag{8.5}\\
\land\ & \big\langle E \big[ \mathit{Pred}(\bar{\imath}')[\ell' : \bar{o}' \mid \ldots] \big] \big\rangle \xrightarrow{\ \ell'\ } E' \tag{8.6}\\
\land\ & \big\langle E' \big[ \mathit{Prop}(\bar{\imath}[i \to o'])[\mathit{true}\,|\,\mathit{false}] \big] \big\rangle \xrightarrow{\ \ell\ } E' \tag{8.7}
\end{align}

where $\bar{\imath}[i \to i']$ and $\bar{\imath}[i \to o']$ denote the sequence of variables $\bar{\imath}$ in which the variable $i$ is replaced by the variable $i'$ (respectively $o'$).
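The enumeration of candidate tuples satisfying conditions 8.1 and 8.2 can be sketched as follows. The typing environment is modelled as a plain dictionary `gamma`, and the variable and type names of the usage example (in particular `auth`) are hypothetical. The function name `candidate_tuples` is also an assumption of this sketch.

```python
from itertools import product

def candidate_tuples(prop_inputs, pred_inputs, pred_outputs, gamma):
    """Enumerate (i, i', o', label) with matching types (conditions 8.1, 8.2)."""
    labels = ("true", "false")
    return [
        (i, i_pred, o_pred, label)
        for i, i_pred, o_pred, label in product(
            prop_inputs, pred_inputs, pred_outputs, labels
        )
        if gamma[i] == gamma[i_pred] == gamma[o_pred]
    ]

# Hypothetical typing for a property on `state` and do_auth(now, i, arg3).
gamma = {
    "state": "global_state",
    "now": "global_state",
    "after": "global_state",
    "i": "int",
    "arg3": "auth",
}
tuples = candidate_tuples(["state"], ["now", "i", "arg3"], ["after"], gamma)
```

Each candidate tuple would then be submitted to the remaining conditions 8.3 to 8.7, which require the dependency and correlation summaries.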

This initial prototype was run on the 398 explicit predicates and 235 properties of the RSMFSP layer of ProvenCore. Out of these, we filtered predicate/property pairs for which the property has an input i of the same type as one of the predicate's inputs i′ and one of its outputs o′. These pairs involve 161 distinct predicates and 165 distinct properties. In total, there were 8250 tuples (i, i′, o′, ℓ) which satisfied the conditions 8.1 and 8.2.

This experiment allowed us, as a first result, to automatically identify 102 predicates for which at least one property is preserved under the conditions 8.1 – 8.7 stated above. For many predicates, it was possible to show that, after the execution of said predicate, several properties are preserved (up to 33). Figure 8.2 shows an overview of how many properties were inferred to be preserved for each predicate. The blue region at the bottom indicates how many properties are inferred to be preserved for a given predicate, while the red region above shows how many properties were compatible with the predicate but were not inferred to be preserved.

[Figure 8.2 – Distribution of the number of inferred preserved properties. Predicates are sorted along that criterion. Horizontal axis: Predicates; vertical axis: Number of preserved properties inferred.]

Figure 8.3 shows an overview of how many predicates were inferred to be preserving each property. The blue region at the bottom indicates how many predicates are inferred to be preserving a given property, while the red region above shows how many predicates were compatible with the property but were not inferred to be preserving it.

It is worth noting that, in both Figures 8.2 and 8.3, the red zone contains properties (respectively predicates) which could fall into these cases:

• The property is actually threatened by the predicate (respectively, the predicate threatens the property).

• The property is not threatened (respectively, the predicate is not threatening), but proving so requires more information than is obtained by our dependency and correlation analysis. For example, a more precise dependency or correlation analysis (e.g. tracking constructor evolution, as presented in Section 7.6) could be needed. A numerical or value analysis could also help determine that the parts of the input data structure which are modified by the predicate, and on which the logical property also depends, still satisfy the property after the execution of the predicate. Alternatively, the preservation of these properties can be demonstrated using an interactive prover.

• The property is not threatened (respectively, the predicate is not threatening) and the dependency and correlation summaries contain enough information to prove the non-interference of the predicate and property, but our decision procedure prototype failed to infer it. This can be due to a timeout (this initial prototype has not been optimized at all and can take a substantial time in some cases) or to precision losses in the decision procedure prototype itself.

[Figure 8.3 – Distribution of the number of inferred predicates for which a property is preserved. Properties are sorted along that criterion. Horizontal axis: Properties; vertical axis: Number of predicates preserving the property inferred.]


Chapter 9

Conclusion and Perspectives

There is no real ending. It's just the place where you stop the story.

Frank Herbert

Despite its intuitive simplicity, the frame problem has proved to be an enduring issue with notoriously tedious implications. Its different manifestations have been studied for several decades in various contexts, ranging from Artificial Intelligence, in the context of which it was originally identified, to the field of formal specification and verification. Recently, it has received extensive attention from the object-oriented verification community, where it has been identified as a subsisting problem (Leavens, Leino, and Müller, 2007) and an ideal candidate for automation (Meyer, 2015). Classical approaches to addressing the frame problem typically rely on separation logic (Reynolds, 2005) or ownership types (Clarke, Potter, and Noble, 1998). Though the merits of such approaches are indisputable, the manual specification effort that they require is non-negligible as well. Frame properties are an integral part of a complete specification and they are mandatory for proving correctness, but ideally they should impose little additional effort. Programmers should be able to focus on the truly interesting part, namely what code does, and rely on automatic tools for the repetitive and cumbersome task of specifying and verifying frame properties.

Interactive formal verification of complex transition systems is not exempt from the manifestations of the frame problem either. Considerable effort is spent on proving the preservation of the system's invariants, even though in practice the majority of operations have a localised effect on the system and impact only a limited number of invariants at the same time. Identifying those invariants that are unaffected by an operation and automatically proving their preservation can substantially ease the proof burden for the programmer. In this thesis, we have presented an approach towards automatically inferring the preservation of framing-related invariants. It is meant to be used in the context of an interactive theorem prover and employs two different static analyses, namely a dependency analysis and a correlation analysis, whose unified results are meant to establish the disjointness between the data dependencies of a logical property and the modifications performed by an operation. The decision procedure meant to combine the results of the two analyses is still in an incipient stage. However, our preliminary experiments related to automatically answering queries regarding the preservation of certain invariants for unmodified parts are encouraging. We believe that our envisioned approach can become applicable to complex transition systems on a routine basis. Reasoning about framing can come for free, without imposing the specification of additional clauses. We also believe that automatic reasoning about framing can be achieved through static analysis. Generally, static analysis has been considered prohibitive in terms of execution time. It has been predominantly used in an automatic context and avoided in interactive contexts, where queries have to be answered fast so as not to impede the natural flow of an interactive proof. Though currently applied only on medium-sized models, given the short execution times of our dedicated static analyses, we believe that reasonable execution times for larger models can be expected as well. Therefore, we surmise that static analysis is applicable in an interactive verification context.

9.1 Contributions

The main contributions of this thesis are the designed and implemented dependency and correlation analyses, which are meant to be used in the context of an interactive theorem prover. Both analyses handle associative arrays and algebraic data types and compute fine-grained results mirroring the layered structures of such types. They target complex transition systems in general and operating systems in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e. state changes, that map an input state to an output state. Both of our static analyses are concerned with deep-state manipulations, i.e. accesses and modifications, respectively.

The dependency analysis, presented in Chapter 5, automatically detects the relevant input subset needed for producing certain outputs. It handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural data-flow analysis. Furthermore, for variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario. Together with the dependency information per se, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different, albeit related, point of view. The first version of the dependency analysis was fully context-insensitive. In order to introduce a relaxed form of context-sensitivity, we have devised an extension based on symbolic paths. This was presented in Chapter 6.

The extension for the dependency analysis is based on computing deferred dependencies, consisting of symbolic access maps in which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are still computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty exerted by the fully context-insensitive approach without sacrificing performance. As discussed in Chapter 8, the deferred dependencies extension led to an increase of 10–20% in execution time on the used benchmarks. In terms of precision, it led to more precise dependency summaries for 50% of the predicates of the same benchmarks.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, the analysis can have applications in the testing realm, for designing and generating test suites that avoid redundant testing of the same execution scenario. Classes of inputs that will test the same execution scenario can be automatically determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted, and their values should be varied for more comprehensive testing. Furthermore, our dependency analysis could also facilitate unit testing for exceptions, as it computes specific results for every execution scenario of a predicate. Indeed, it is useful to have dedicated test cases which trigger each exception that can be thrown by a function. The set of relevant parts of the input differs for each possible exception and for the regular execution behaviour.

Our second contribution is the correlation analysis, presented in Chapter 7, which detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. The correlation analysis is an interprocedural data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus summarising the behaviour of functions and detecting not only what is modified, but also how and to what extent. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations. These allow computing expressive information regarding equivalences between subparts of the inputs and subparts of the outputs in a flexible manner.

Prototypes for both of our analyses have been implemented in OCaml. These were discussed in Chapter 8. We have applied them to a functional specification of ProvenCore (Lescuyer, 2015), a general-purpose microkernel that ensures isolation. Results for medium-sized models have been obtained on average in less than 1 second with the dependency analysis and in less than 0.5 seconds on average with the correlation analysis. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative, precise information without sacrificing scalability.

We remark that our experience with the design and implementation of the two analyses has been rather different. The dependency analysis is much more complex semantically. This is partly a consequence of the simultaneous possible-constructors analysis, which has an impact on the abstract dependency domain. Deferred dependencies add yet another layer of complexity. However, the implementation proved to be much simpler than the implementation of the correlation analysis. The latter posed challenges due to the intermediate layer of access paths and correlations that we had to add for obtaining expressive, fine-grained information. However, the correlation analysis is simpler from a semantics point of view. It is also noteworthy that, for both analyses, an intermediate level below variables needed to be introduced as soon as fine-grained relations between pairs of variables were considered, directly or indirectly. In the case of deferred dependencies, this was not the main goal, but rather a mechanism for obtaining increased precision in specific cases for already pertinent dependency information. In contrast, for the correlation analysis, the inclusion of an intermediate level was imperative for obtaining useful, expressive information in non-trivial cases.

As a first step towards a solution for automatically inferring the preservation of framing-related invariants, we have sketched a decision procedure meant to employ our two static analyses. By uncovering equivalences between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

9.2 Future Work

We conclude this thesis with some perspectives for practical future work, as well as some theoretical open issues that we wish to address in the future.

Practical Future Work. From a practical point of view, our future work goals revolve around the full implementation of the decision procedure, its integration in the interactive theorem prover developed at Prove & Run, as well as its comprehensive assessment in a real-world context.

Decision Procedure Implementation. Our first and main goal for the near future focuses on the full implementation of the decision procedure, combining our dependency and correlation summaries and answering queries related to the preservation of logical properties. The performance of the algorithm sketched in Section 8.4 should be assessed on real-world examples. The complexity of this algorithm depends on the number of paths relating two endpoints in the graph of query atoms' variables. It also depends on the number of correlations relating pairs of variables along the chains connecting endpoints. This could lead to a combinatorial explosion of the number of compose operations for large query graphs. Further optimizations should be investigated and applied in the algorithm implementing the decision procedure.

Validation. After having implemented the decision procedure, the precision of our two static analyses employed by it should be comprehensively assessed on various benchmarks.

Some of the theoretical aspects related to our static analyses have been formalized in Coq by Stéphane Lescuyer. However, the actual implementation of the algorithms is not formally connected to the mechanized proofs. Therefore, it would be desirable to extensively test the implementation of the analysis algorithms. This could be done by translating the dependencies and correlations to types in a sufficiently expressive type system, or by inserting runtime guards. These guards would check equalities for correlations and would taint supposedly irrelevant values identified by the dependency analysis, verifying that the output is not tainted. For the correlation analysis, inputs which are correlated to some output values could be given a universally quantified type, the same type appearing in the parts of the output which are supposed to be equal. This is commonly used as a design pattern in functional programming languages to express data-flow constraints via the type system. For the dependency analysis, each part of the input which is supposed to be irrelevant for a predicate's output could be assigned a distinct polymorphic type variable which does not appear in the output. This allows the body of the predicate to take notice of a value's presence without being able to manipulate its contents.
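A much weaker, testing-based approximation of such guards can be sketched as follows: arguments reported as irrelevant are replaced by substitute values, and the output is checked to remain unchanged. This is only an illustrative sketch and not the taint-tracking mechanism itself; the helper `check_irrelevant` and the toy function `get_f` are hypothetical.

```python
def check_irrelevant(func, args, irrelevant, substitutes):
    """Check that replacing supposedly irrelevant arguments keeps the output.

    `irrelevant` is a set of argument positions; each value of `substitutes`
    is tried in all of these positions at once.
    """
    baseline = func(*args)
    for sub in substitutes:
        mutated = [sub if k in irrelevant else a for k, a in enumerate(args)]
        if func(*mutated) != baseline:
            return False
    return True

def get_f(record, unused):
    """Toy operation: the output depends on record["f"] only."""
    return record["f"]

# The second argument is claimed to be irrelevant: the check succeeds.
unused_is_irrelevant = check_irrelevant(get_f, [{"f": 1}, 0], {1}, [42, "x", None])
# Wrongly claiming that the first argument is irrelevant is detected.
record_is_irrelevant = check_irrelevant(get_f, [{"f": 1}, 0], {0}, [{"f": 2}])
```

Such a check can only refute a dependency summary on the inputs that are tried; it cannot establish that the summary is sound.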

Tool Integration and Support. Another important goal for the near future is the integration of our decision procedure in the ProvenTools interactive prover. A tactic allowing to automate the inference of framing-related invariant preservation should be supported. This goal entails a sequence of other considerations that have to be addressed. Currently, the dependency and correlation analyses handle whole programs and compute summaries for every predicate of the analysed program. Though the execution times of our analyses are low, even these can prove to be cumbersome in a real-world context. Therefore, the two analyses should be adapted so as to allow incrementally analysing only parts of a program. Caching the results of the analyses across invocations of the decision procedure could prove to be efficient as well. Additionally, the mechanism of answering queries regarding invariant preservation should be transparent, allowing users to see the reasoning steps behind the decision procedure. Transparency is necessary for the ProvenTools prover, which targets products that have to be certified. This possibly also requires a more concise output notation for the dependency and correlation summaries, in order to ease the interpretation of results. Currently, they tend to be rather verbose for predicates handling composite values with a large number of subelements.

For the dependency summaries, a parser was implemented, allowing users to annotate predicates with expected dependency information. A similar parser could be written for the correlation summaries. These annotations are a useful tool for testing the analyses on benchmarks for which the correlations and dependencies are known. In addition, they would allow users to annotate programs with constraints on the expected dependencies and correlations, similarly to type annotations in the presence of type inference, and check that these expectations hold.

Finally, the decision procedure and our dependency and correlation analyses could be offered as a software library. A public API should describe and prescribe the expected behavior of our two static analyses and the decision procedure relying on them.

Theoretical Perspective. From a theoretical perspective, several interesting aspects remain open. In a nutshell, these consist in developing support for more sophisticated queries that could be answered by our decision procedure. The precision of our dependency and correlation analyses can be further increased as well.


Decision Procedure. A first interesting theoretical effort revolves around the formalization of our envisioned decision procedure, used for inferring framing-related invariants. The types of queries it can answer should be further investigated and extended. For instance, it would be desirable to assert as a hypothesis that certain predicates are known to be valid on some nodes of the graph. We further identified two extensions for our correlation analysis that could increase the number of answered queries.

Constructor Evolution. For increasing the number of queries that our decision procedure can answer, one direction to investigate is the extension of our correlation analysis in order to track and compute information regarding the evolution of variant constructors. This additional information should be leveraged in the context of our decision procedure. The formalization and implementation of this extension constitute an interesting effort. Furthermore, other types of relations between variables could be considered as well.

Correlations between Inputs. Another extension of our correlation analysis that would enrich the types of queries that can be answered by our decision procedure consists in tracking correlations between pairs of inputs, in addition to the ones computed between pairs of inputs and outputs. Besides the unified treatment of both actual code and logical properties on the correlation analysis side, this would allow answering queries that consist in a single logical property on multiple input values that are additionally related by other facts. It would also allow detecting aliasing between variables used as array indices.

Numerical Analysis for Arrays. Arrays are a source of precision loss in both of our static analyses. Hence, it would be interesting to investigate the impact of using simple numerical abstractions (congruence modulo and linear abstract domains). The numerical analysis could otherwise be offloaded to an external SMT solver, such as Z3 or Alt-Ergo, for instance. Symbolic evaluation of the arithmetic computations should also be possible. This would avoid precision losses when joining two dependencies or correlations with exceptional information on distinct index variables which prove to have the same integer value in practice. Eliminating this source of imprecision would likely benefit the analysis of loops over arrays.

In conclusion, we have devised and implemented two static analyses detecting the data dependencies of a logical property, as well as correlations between the inputs and the outputs of operations. Our first results on a functional model of a microkernel are encouraging, both in terms of precision and speed, making these analyses suitable to use in the context of interactive provers. Aside from incremental improvements on the precision of our analyses, the next steps are to combine them in order to detect invariants which are not affected by the execution of a predicate and to integrate this as a tactic in the ProvenTools theorem prover. We believe that reasoning about framing can come for free, without imposing additional annotations. Inferring the preservation of framing-related invariants through static analysis can become applicable on a routine basis for complex transition systems.



BIBLIOGRAPHY 223

Leroy Xavier and Franccedilois Pessaux (2000) ldquoType-Based Analysis of Uncaught Excep-tionsrdquo In ACM Trans Program Lang Syst 222 pp 340ndash377 doi 101145349214349230 url httpdoiacmorg101145349214349230

Lescuyer Steacutephane (2015) ldquoProvenCore Towards a Verified Isolation Micro-KernelrdquoIn International Workshop on MILS Architecture and Assurance for Secure Sys-tems url httpmils-workshop-2015euromilseu

Leuschel Michael and Morten Heine Soslashrensen (1996) ldquoRedundant Argument Filteringof Logic Programsrdquo In Logic Programming Synthesis and Transformation 6th In-ternational Workshop LOPSTRrsquo96 Stockholm Sweden August 28-30 1996 Pro-ceedings pp 83ndash103 doi 1010073-540-62718-9_6 url httpdxdoiorg1010073-540-62718-9_6

Lhotaacutek Ondrej and Laurie J Hendren (2006) ldquoContext-Sensitive Points-to AnalysisIs It Worth Itrdquo In Compiler Construction 15th International Conference CC2006 Held as Part of the Joint European Conferences on Theory and Practice ofSoftware ETAPS 2006 Vienna Austria March 30-31 2006 Proceedings pp 47ndash64 doi 10100711688839_5 url httpdxdoiorg10100711688839_5

Liang Sheng (1999) Java Native Interface Programmerrsquos Guide and Reference 1stBoston MA USA Addison-Wesley Longman Publishing Co Inc isbn 0201325772

Liskov Barbara and John Guttag (1986) Abstraction and Specification in ProgramDevelopment Cambridge MA USA MIT Press isbn 0-262-12112-3

Liu Yanhong A (1998) ldquoDependence Analysis for Recursive Datardquo In Proceedings ofthe 1998 International Conference on Computer Languages ICCL 1998 ChicagoIL USA May 14-16 1998 pp 206ndash215 doi 101109ICCL1998674171 urlhttpdxdoiorg101109ICCL1998674171

Liu Yanhong A and Scott D Stoller (2003) ldquoEliminating Dead Code on RecursiveDatardquo In Sci Comput Program 472-3 pp 221ndash242 doi 10 1016 S0167 -6423(02)00134-X url httpdxdoiorg101016S0167-6423(02)00134-X

Lu Yi John Potter and Jingling Xue (2007) ldquoValidity Invariants and Effectsrdquo InECOOP 2007 - Object-Oriented Programming 21st European Conference BerlinGermany July 30 - August 3 2007 Proceedings pp 202ndash226 doi 101007978-3-540-73589-2_11 url httpdxdoiorg101007978-3-540-73589-2_11

Marcheacute Claude Christine Paulin-Mohring and Xavier Urbain (2004) ldquoThe KRAKA-TOA Tool for Certification of JAVAJAVACARD Programs Annotated in JMLrdquo InJ Log Algebr Program 581-2 pp 89ndash106 doi 101016jjlap200307006url httpdxdoiorg101016jjlap200307006

Marcheacute Claude (2016) The Krakatoa Verification Tool for Java Programs KrakatoaTutorial and Reference Manual url httpkrakatoalrifrkrakatoapdf

Martin-Loumlf Per (1984) Intuitionistic Type Theory Naples BibliopolisMcCarthy John and Patrick J Hayes (1969) ldquoSome Philosophical Problems from the

Standpoint of Artificial Intelligencerdquo In Machine Intelligence Edinburgh Univer-sity Press

Meyer Bertrand (1991) Eiffel The Language Prentice-Hall isbn 0-13-247925-7mdash (1992) ldquoApplying Design by Contractrdquo In IEEE Computer 2510 pp 40ndash51

doi 1011092161279 url httpdxdoiorg1011092161279

224 BIBLIOGRAPHY

Meyer Bertrand (1997) Object-Oriented Software Construction 2nd Edition Prentice-Hall isbn 0-13-629155-4

mdash (2010) ldquoTowards a Theory and Calculus of Aliasingrdquo In Journal of Object Tech-nology 92 pp 37ndash74 doi 105381jot201092c5 url httpdxdoiorg105381jot201092c5

mdash (2011) ldquoSteps Towards a Theory and Calculus of Aliasingrdquo In Int J Softwareand Informatics 51-2 pp 77ndash115 url httpwwwijsiorgchreaderview_abstractaspxfile_no=i77

mdash (2015) ldquoFraming the Frame Problemrdquo In Dependable Software Systems Engineer-ing pp 193ndash203 doi 103233978-1-61499-495-4-193 url httpdxdoiorg103233978-1-61499-495-4-193

Midtgaard Jan (2012) ldquoControl-Flow Analysis of Functional Programsrdquo In ACMComput Surv 443 p 10 doi 10114521876712187672 url httpdoiacmorg10114521876712187672

Mike Barnett Rustan Leino Wolfram Schulte (2005) ldquoThe Spec Programming Sys-tem An Overviewrdquo In CASSIS 2004 Construction and Analysis of Safe Secureand Interoperable Smart devices Vol 3362 Springer pp 49ndash69 url httpswwwmicrosoftcomen-usresearchpublicationthe-spec-programming-system-an-overview

Milanova Ana Atanas Rountev and Barbara G Ryder (2005) ldquoParameterized ObjectSensitivity for Points-To Analysis for Javardquo In ACM Trans Softw Eng Methodol141 pp 1ndash41 doi 10114510448341044835 url httpdoiacmorg10114510448341044835

Montenegro Manuel Ricardo Pentildea and Clara Segura (2015) ldquoShape Analysis in aFunctional Language by Using Regular Languagesrdquo In Sci Comput Program 111pp 51ndash78 doi 101016jscico201412006 url httpdxdoiorg101016jscico201412006

Morgenstern Leora (1995) ldquoThe Problem with Solutions to the Frame Problemrdquo InThe Robotrsquos Dilemma Revisited The Frame Problem in Artificial Intelligence AblexAblex Publishing Co pp 99ndash133

Moura Leonardo Mendonccedila de and Nikolaj Bjoslashrner (2008) ldquoZ3 An Efficient SMTSolverrdquo In Tools and Algorithms for the Construction and Analysis of Systems14th International Conference TACAS 2008 Held as Part of the Joint EuropeanConferences on Theory and Practice of Software ETAPS 2008 Budapest HungaryMarch 29-April 6 2008 Proceedings pp 337ndash340 doi 101007978- 3- 540-78800-3_24 url httpdxdoiorg101007978-3-540-78800-3_24

Muumlller Peter (2002) Modular Specification and Verification of Object-Oriented Pro-grams Vol 2262 Lecture Notes in Computer Science Springer isbn 3-540-43167-5 doi 1010073-540-45651-1 url httpdxdoiorg1010073-540-45651-1

Muumlller Peter Arnd Poetzsch-Heffter and Gary T Leavens (2003) ldquoModular Specifi-cation of Frame Properties in JMLrdquo In Concurrency and Computation Practiceand Experience 152 pp 117ndash154 doi 101002cpe713 url httpdxdoiorg101002cpe713

BIBLIOGRAPHY 225

mdash (2006) ldquoModular Invariants for Layered Object Structuresrdquo In Sci Comput Pro-gram 623 pp 253ndash286 doi 10 1016 j scico 2006 03 001 url http dxdoiorg101016jscico200603001

Naudziuniene Daiva Matko Botincan Dino Distefano Mike Dodds Radu Grigore andMatthew J Parkinson (2011) ldquojStar-Eclipse An IDE for Automated Verificationof Java Programsrdquo In SIGSOFTFSErsquo11 19th ACM SIGSOFT Symposium on theFoundations of Software Engineering (FSE-19) and ESECrsquo11 13th European Soft-ware Engineering Conference (ESEC-13) Szeged Hungary September 5-9 2011pp 428ndash431 doi 10114520251132025182 url httpdoiacmorg10114520251132025182

Naur Peter (1966) ldquoProof of Algorithms by General Snapshotsrdquo In BIT NumericalMathematics 64 pp 310ndash316 issn 1572-9125 doi 101007BF01966091 urlhttpdxdoiorg101007BF01966091

Nelson Greg and Derek C Oppen (1980) ldquoFast Decision Procedures Based on Con-gruence Closurerdquo In J ACM 272 pp 356ndash364 doi 101145322186322198url httpdoiacmorg101145322186322198

Nielson Flemming and Hanne Riis Nielson (1999) ldquoInterprocedural Control Flow Anal-ysisrdquo In Programming Languages and Systems 8th European Symposium on Pro-gramming ESOPrsquo99 Held as Part of the European Joint Conferences on the Theoryand Practice of Software ETAPSrsquo99 Amsterdam The Netherlands 22-28 March1999 Proceedings pp 20ndash39 doi 10 1007 3 - 540 - 49099 - X _ 3 url http dxdoiorg1010073-540-49099-X_3

Nielson Flemming Hanne Riis Nielson and Chris Hankin (1999) Principles of ProgramAnalysis Springer isbn 978-3-540-65410-0

Nordio Martin Cristiano Calcagno Bertrand Meyer Peter Muumlller and Julian Tschan-nen (2010) ldquoReasoning about Function Objectsrdquo In Objects Models ComponentsPatterns 48th International Conference TOOLS 2010 Maacutelaga Spain June 28 -July 2 2010 Proceedings pp 79ndash96 doi 101007978-3-642-13953-6_5 urlhttpdxdoiorg101007978-3-642-13953-6_5

Nordstroumlm Bengt Kent Petersson and Jan M Smith (1990) Programming in Martin-Loumlfrsquos Type Theory Vol 200 Oxford University Press Oxford

OrsquoCallahan Robert and Daniel Jackson (1997) ldquoLackwit A Program UnderstandingTool Based on Type Inferencerdquo In Pulling Together Proceedings of the 19th Inter-national Conference on Software Engineering Boston Massachusetts USA May17-23 1997 Pp 338ndash348 doi 101145253228253351 url httpdoiacmorg101145253228253351

OrsquoHearn Peter W (2005) ldquoScalable Specification and Reasoning Challenges for Pro-gram Logicrdquo In Verified Software Theories Tools Experiments First IFIP TC2WG 23 Conference VSTTE 2005 Zurich Switzerland October 10-13 2005Revised Selected Papers and Discussions pp 116ndash133 doi 101007978-3-540-69149-5_14 url httpdxdoiorg101007978-3-540-69149-5_14

mdash (2012) ldquoA Primer on Separation Logic (and Automatic Program Verification andAnalysis)rdquo In Software Safety and Security - Tools for Analysis and Verification

226 BIBLIOGRAPHY

pp 286ndash318 doi 103233978-1-61499-028-4-286 url httpdxdoiorg103233978-1-61499-028-4-286

OrsquoHearn Peter W John C Reynolds and Hongseok Yang (2001) ldquoLocal Reasoningabout Programs that Alter Data Structuresrdquo In Computer Science Logic 15thInternational Workshop CSL 2001 10th Annual Conference of the EACSL ParisFrance September 10-13 2001 Proceedings pp 1ndash19 doi 1010073-540-44802-0_1 url httpdxdoiorg1010073-540-44802-0_1

OrsquoHearn Peter W Hongseok Yang and John C Reynolds (2004) ldquoSeparation andInformation Hidingrdquo In Proceedings of the 31st ACM SIGPLAN-SIGACT Sympo-sium on Principles of Programming Languages POPL 2004 Venice Italy January14-16 2004 pp 268ndash280 doi 101145964001964024 url httpdoiacmorg101145964001964024

Padhye Rohan and Uday P Khedker (2013) ldquoInterprocedural Data Flow Analysisin Soot Using Value Contextsrdquo In Proceedings of the 2nd ACM SIGPLAN In-ternational Workshop on State Of the Art in Java Program analysis SOAP 2013Seattle WA USA June 20 2013 pp 31ndash36 doi 10114524875682487569url httpdoiacmorg10114524875682487569

Park Young Gil and Benjamin Goldberg (1992) ldquoEscape Analysis on Listsrdquo In Pro-ceedings of the ACM SIGPLANrsquo92 Conference on Programming Language Designand Implementation (PLDI) San Francisco California USA June 17-19 1992pp 116ndash127 doi 101145143095143125 url httpdoiacmorg101145143095143125

Parkinson Matthew J and Gavin M Bierman (2005) ldquoSeparation Logic and Ab-stractionrdquo In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium onPrinciples of Programming Languages POPL 2005 Long Beach California USAJanuary 12-14 2005 pp 247ndash258 doi 10114510403051040326 url httpdoiacmorg10114510403051040326

Parkinson Matthew J Richard Bornat and Cristiano Calcagno (2006) ldquoVariables asResource in Hoare Logicsrdquo In 21th IEEE Symposium on Logic in Computer Science(LICS 2006) 12-15 August 2006 Seattle WA USA Proceedings pp 137ndash146 doi101109LICS200652 url httpdxdoiorg101109LICS200652

Pierce Benjamin C (2002) Types and Programming Languages MIT Press isbn 978-0-262-16209-8

Plotkin Gordon D (2004) ldquoA Structural Approach to Operational Semanticsrdquo In JLog Algebr Program 60-61 pp 17ndash139

Polikarpova Nadia Carlo A Furia Yu Pei Yi Wei and Bertrand Meyer (2013) ldquoWhatGood are Strong Specificationsrdquo In 35th International Conference on SoftwareEngineering ICSE rsquo13 San Francisco CA USA May 18-26 2013 pp 262ndash271doi 101109ICSE20136606572 url httpdxdoiorg101109ICSE20136606572

Praun Christoph von and Thomas R Gross (2003) ldquoStatic Conflict Analysis forMulti-Threaded Object-Oriented Programsrdquo In Proceedings of the ACM SIGPLAN2003 Conference on Programming Language Design and Implementation 2003 San

BIBLIOGRAPHY 227

Diego California USA June 9-11 2003 pp 115ndash128 doi 101145781131781145 url httpdoiacmorg101145781131781145

Rakamaric Zvonimir and Alan J Hu (2008) ldquoAutomatic Inference of Frame AxiomsUsing Static Analysisrdquo In 23rd IEEEACM International Conference on Auto-mated Software Engineering (ASE 2008) pp 89ndash98 doi 101109ASE200819url httpdxdoiorg101109ASE200819

Reacutemy Didier and Jerome Vouillon (1997) ldquoObjective ML A Simple Object-OrientedExtension of MLrdquo In Conference Record of POPLrsquo97 The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages Papers Presentedat the Symposium Paris France 15-17 January 1997 pp 40ndash53 doi 101145263699263707 url httpdoiacmorg101145263699263707

Reps Thomas W Susan Horwitz and Shmuel Sagiv (1995) ldquoPrecise InterproceduralDataflow Analysis via Graph Reachabilityrdquo In Conference Record of POPLrsquo9522nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lan-guages San Francisco California USA January 23-25 1995 pp 49ndash61 doi101145199448199462 url httpdoiacmorg101145199448199462

Reps Thomas W and Todd Turnidge (1996) ldquoProgram Specialization via ProgramSlicingrdquo In Partial Evaluation International Seminar Dagstuhl Castle GermanyFebruary 12-16 1996 Selected Papers pp 409ndash429 doi 1010073-540-61580-6_20 url httpdxdoiorg1010073-540-61580-6_20

Reynolds John C (1981) The Craft of Programming Prentice Hall International seriesin computer science Prentice Hall isbn 978-0-13-188862-3

mdash (2000) ldquoIntuitionistic Reasoning about Shared Mutable Data Structurerdquo In Mil-lennial Perspectives in Computer Science Palgrave pp 303ndash321

mdash (2002) ldquoSeparation Logic A Logic for Shared Mutable Data Structuresrdquo In 17thIEEE Symposium on Logic in Computer Science (LICS 2002) 22-25 July 2002Copenhagen Denmark Proceedings pp 55ndash74 doi 101109LICS20021029817url httpdxdoiorg101109LICS20021029817

mdash (2005) ldquoAn Overview of Separation Logicrdquo In Verified Software Theories ToolsExperiments First IFIP TC 2WG 23 Conference VSTTE 2005 Zurich Switzer-land October 10-13 2005 Revised Selected Papers and Discussions pp 460ndash469doi 101007978-3-540-69149-5_49 url httpdxdoiorg101007978-3-540-69149-5_49

Robert Valentin and Xavier Leroy (2012) ldquoA Formally-Verified Alias Analysisrdquo InCertified Programs and Proofs - Second International Conference CPP 2012 KyotoJapan December 13-15 2012 Proceedings pp 11ndash26 doi 101007978-3-642-35308-6_5 url httpdxdoiorg101007978-3-642-35308-6_5

Ruf Erik (1995) ldquoContext-Insensitive Alias Analysis Reconsideredrdquo In Proceedingsof the ACM SIGPLAN 1995 Conference on Programming Language Design andImplementation PLDI rsquo95 La Jolla California USA ACM pp 13ndash22 isbn 0-89791-697-2 doi 101145207110207112 url httpdoiacmorg101145207110207112

Sabelfeld Andrei and Andrew C Myers (2003) ldquoLanguage-Based Information-FlowSecurityrdquo In IEEE Journal on Selected Areas in Communications 211 pp 5ndash19

228 BIBLIOGRAPHY

doi 101109JSAC2002806121 url httpdxdoiorg101109JSAC2002806121

Sagiv Shmuel Thomas W Reps and Reinhard Wilhelm (1999) ldquoParametric ShapeAnalysis via 3-Valued Logicrdquo In POPL rsquo99 Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 1999 pp 105ndash118doi 101145292540292552 url httpdoiacmorg101145292540292552

Salcianu Alexandru and Martin C Rinard (2005) ldquoPurity and Side Effect Analysis forJava Programsrdquo In Verification Model Checking and Abstract Interpretation 6thInternational Conference VMCAI 2005 Proceedings pp 199ndash215 doi 101007978-3-540-30579-8_14 url httpdxdoiorg101007978-3-540-30579-8_14

Shapiro Marc and Susan Horwitz (1997) ldquoThe Effects of the Precision of Pointer Anal-ysisrdquo In Static Analysis 4th International Symposium SAS rsquo97 Paris FranceSeptember 8-10 1997 Proceedings pp 16ndash34 doi 101007BFb0032731 urlhttpdxdoiorg101007BFb0032731

Sharir M and A Pnueli (1978) Two Approaches to Interprocedural Data Flow AnalysisNew York NY New York Univ Comput Sci Dept url httpscdscernchrecord120118

Shostak Robert E (1984) ldquoDeciding Combinations of Theoriesrdquo In J ACM 311pp 1ndash12 doi 1011452422322411 url httpdoiacmorg1011452422322411

Smans Jan Bart Jacobs and Frank Piessens (2008) ldquoVeriCool An Automatic Verifierfor a Concurrent Object-Oriented Languagerdquo In Formal Methods for Open Object-Based Distributed Systems 10th IFIP WG 61 International Conference FMOODS2008 Oslo Norway June 4-6 2008 Proceedings pp 220ndash239 doi 101007978-3-540-68863-1_14 url httpdxdoiorg101007978-3-540-68863-1_14

mdash (2012) ldquoImplicit Dynamic Framesrdquo In ACM Trans Program Lang Syst 34121ndash258 doi 10114521609102160911 url httpdoiacmorg10114521609102160911

Sozeau Matthieu (2009) ldquoA New Look at Generalized Rewriting in Type TheoryrdquoIn J Formalized Reasoning 21 pp 41ndash62 doi 106092issn1972-57871574url httpdxdoiorg106092issn1972-57871574

Sozeau Matthieu and the COQ development team (1997) The Coq Proof AssistantReference Manual Version 86 Inria

Sridharan Manu Denis Gopan Lexin Shan and Rastislav Bodiacutek (2005) ldquoDemand-Driven Points-to Analysis for Javardquo In Proceedings of the 20th Annual ACM SIG-PLAN Conference on Object-oriented Programming Systems Languages and Ap-plications OOPSLA rsquo05 San Diego CA USA ACM pp 59ndash76 isbn 1-59593-031-0 doi 10114510948111094817 url httpdoiacmorg10114510948111094817

Strachey Christopher (1967) Fundamental Concepts in Programming Languages Lec-ture Notes International Summer School in Computer Programming CopenhagenReprinted in Higher-Order and Symbolic Computation 13(12) pp 1ndash49 2000

BIBLIOGRAPHY 229

Taghdiri Mana Robert Seater and Daniel Jackson (2006) ldquoLightweight Extraction ofSyntactic Specificationsrdquo In Proceedings of the 14th ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering FSE 2006 pp 276ndash286 doi10114511817751181809 url httpdoiacmorg10114511817751181809

Tip Frank (1995) ldquoA Survey of Program Slicing Techniquesrdquo In J Prog Lang 33url httpcompscinetdcskclacukJPjp030301abshtml

Vardi Moshe Y and Pierre Wolper (1994) ldquoReasoning about Infinite ComputationsrdquoIn Information and Computation 115 pp 1ndash37

Volpano Dennis M Cynthia E Irvine and Geoffrey Smith (1996) ldquoA Sound TypeSystem for Secure Flow Analysisrdquo In Journal of Computer Security 423 pp 167ndash188 doi 103233JCS-1996-42-304 url httpdxdoiorg103233JCS-1996-42-304

Wadler Philip and R J M Hughes (1987) ldquoProjections for Strictness Analysisrdquo InFunctional Programming Languages and Computer Architecture Portland OregonUSA September 14-16 1987 Proceedings pp 385ndash407 doi 1010073- 540-18317-5_21 url httpdxdoiorg1010073-540-18317-5_21

Wand Mitchell and William D Clinger (1998) ldquoSet Constraints for Destructive ArrayUpdate Optimizationrdquo In Proceedings of the 1998 International Conference onComputer Languages ICCL 1998 Chicago IL USA May 14-16 1998 pp 184ndash195 doi 101109ICCL1998674169 url httpdxdoiorg101109ICCL1998674169

Wand Mitchell and Igor Siveroni (1999) ldquoConstraint Systems for Useless VariableEliminationrdquo In POPL rsquo99 Proceedings of the 26th ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages San Antonio TX USAJanuary 20-22 1999 pp 291ndash302 doi 101145292540292567 url httpdoiacmorg101145292540292567

Weiser Mark (1984) ldquoProgram Slicingrdquo In IEEE Trans Software Eng 104 pp 352ndash357 doi 101109TSE19845010248 url httpdxdoiorg101109TSE19845010248

Wing Jeannette M (1987) ldquoWriting Larch Interface Language Specificationsrdquo InACM Trans Program Lang Syst 91 pp 1ndash24 doi 101145975810500 urlhttpdoiacmorg101145975810500

Xtext Documentation httpseclipseorgXtext Accessed 2016-09-11Zee Karen Viktor Kuncak and Martin C Rinard (2008) ldquoFull Functional Verification

of Linked Data Structuresrdquo In Proceedings of the ACM SIGPLAN 2008 Conferenceon Programming Language Design and Implementation Tucson AZ USA June 7-13 2008 pp 349ndash361 doi 10114513755811375624 url httpdoiacmorg10114513755811375624

Zhao Yang and John Boyland (2008) ldquoA Fundamental Permission Interpretation forOwnership Typesrdquo In Second IEEEIFIP International Symposium on TheoreticalAspects of Software Engineering TASE 2008 June 17-19 2008 Nanjing Chinapp 65ndash72 doi 101109TASE200845 url httpdxdoiorg101109TASE200845

230 BIBLIOGRAPHY

Zheng Xin and Radu Rugina (2008) ldquoDemand-Driven Alias Analysis for Crdquo In Pro-ceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages POPL rsquo08 San Francisco California USA ACM pp 197ndash208 isbn 978-1-59593-689-9 doi 10114513284381328464 url httpdoiacmorg10114513284381328464

  • Extended Summary in French
    • The Frame Problem
    • Objectives
    • Dependency Analysis
    • Correlation Analysis
    • Decision Procedure
    • Conclusion
  • Introduction
    • Formal Verification of Software
    • The Frame Problem in a Nutshell
    • Prove & Run Objectives and Products
    • Context and Problem Statement
    • Contributions and Structure of the Document
  • The Frame Problem in Software Verification
    • Specification Languages and Verification Tools
    • Manifestations of the Frame Problem
    • Approaches to Specifying Frame Properties
      • The Manual Approach
      • The Exclusive Approach
      • The Implicit Approach
    • Topologies and Effects
      • Explicit Footprints
      • Implicit Footprints
      • Predefined Footprints
    • Other Approaches to Reason about Frames
    • Other Relevant Work
  • The Smart Language and ProvenTools
    • The Smart Modeling Language
      • Smart Predicates and Types
      • Exit Labels and Control Flow
      • Polymorphism & Algebraic Data Types
      • Specifications
      • Illustrating Smart – An Abstract Process Manager
    • ProvenTools
    • Smil
  • The α-Smil Language
    • α-Smil Syntax
    • Control Flow Graph
    • Well-Typed Smil Statements
    • Operational Semantics of Smil Statements
  • Dependency Analysis for Functional Specifications
    • Dependency Analysis in a Nutshell
      • Targeted Dependency Information
      • Outline
    • Abstract Dependency Domain
      • Join and Reduction Operator
      • Well-Typed Dependencies
    • Intraprocedural Analysis and Data-Flow Equations
      • Intraprocedural Dependency Domains
      • Intraprocedural Data-Flow Equations
      • Intraprocedural Dependency Analysis Illustrated
    • Interprocedural Dependencies
      • Interprocedural Dependency Analysis Illustrated
      • Context-Insensitivity and its Consequences
    • Semantics of Dependency Values
    • Related Work
    • Conclusion
  • Deferred Dependencies: Injecting Context in Dependency Summaries
    • Dealing with Context-Insensitivity
    • Symbolic Dependency Components in a Nutshell
    • Symbolic Paths
      • Symbolic Path Type
      • Semantics of Symbolic Paths
      • Well-Typed Paths and Path Sets
    • Abstract Dependency Domain with Deferred Accesses
    • Deferred Dependencies at the Intraprocedural Level
      • Extended Intraprocedural Dependency Analysis
      • Intraprocedural Dependency Analysis Illustrated
    • Deferred Dependencies at the Interprocedural Level
      • Applying Context-Sensitive Information by Substitution
      • Wrapped Calls and Results
    • Related Work
    • Conclusion
  • Correlation Analysis
    • Introduction
      • Targeted Correlation Information
      • Correlation Analysis in a Nutshell
    • Partial Equivalence Relations
      • Abstract Partial Equivalence Type
      • Well-Typed Partial Equivalences and their Semantics
    • Paths and Correlations
      • Paths and Correlation Types
      • Alignment and Partial Order
    • Intraprocedural Correlation Analysis
      • Intraprocedural Correlation Summaries and Analysis
      • Intraprocedural Correlation Analysis Illustrated
    • Interprocedural Correlation Analysis
    • Extension – Constructor Evolution
    • Related Work
    • Conclusion
  • Implementation, Application and Results
    • Implementation of the Dependency Analysis
      • Dependency Type and Operators
      • Intraprocedural Dependency Analysis
    • Implementation of the Correlation Analysis
      • Partial Equivalence Relations and Operators
      • Intraprocedural Correlations
      • Dependency and Correlation Analysers
    • Dependency and Correlation Results on ProvenCore Layers
      • ProvenCore Description
      • Obtained Dependency and Correlation Results
      • Precision of our Dependency and Correlation Summaries
    • Reasoning about Framing using Correlations and Dependencies
      • A Decision Procedure
      • Types of Targeted Queries
    • Decision Procedure Experiments
  • Conclusion and Perspectives
    • Contributions
    • Future Work
  • Bibliography
UNIVERSITÉ DE RENNES 1

Abstract

Prove & Run

École doctorale Matisse

DOCTEUR DE L'UNIVERSITÉ DE RENNES 1

Static Analysis of Functional Programs with an Application to the Frame Problem in Deductive Verification

by Oana Fabiana Andreescu

In the field of software verification, the frame problem refers to establishing the boundaries within which program elements operate. It has notoriously tedious consequences on the specification of frame properties, which indicate the parts of the program state that an operation is allowed to modify, as well as on their verification, i.e. proving that operations modify only what is specified by their frame properties. In the context of interactive formal verification of complex systems, such as operating systems, much effort is spent addressing these consequences and proving the preservation of the systems' invariants. However, most operations have a localized effect on the system and impact only a limited number of invariants at the same time. In this thesis we address the issue of identifying those invariants that are unaffected by an operation and we present a solution for automatically inferring their preservation. Our solution is meant to ease the proof burden for the programmer. It is based on static analysis and does not require any additional frame annotations. Our strategy consists in combining a dependency analysis and a correlation analysis. We have designed and implemented both static analyses for a strongly-typed functional language that handles structures, variants and arrays. The dependency analysis computes a conservative approximation of the input fragments on which functional properties and operations depend. The correlation analysis computes a safe approximation of the parts of an input state to a function that are copied to the output state. It summarizes not only what is modified but also how it is modified and to what extent. By employing these two static analyses and by subsequently reasoning based on their combined results, an interactive theorem prover can automate the discharging of proof obligations for unmodified parts of the state. We have applied both of our static analyses to a functional specification of a micro-kernel, and the obtained results demonstrate both their precision and their scalability.


Acknowledgements

First of all, I would like to express my gratitude to my two PhD advisors, Thomas Jensen and Stéphane Lescuyer, without whom this thesis would have been impossible. I thank them for their patience and dedication in guiding me throughout these years, and for all the rigour that they instilled into me by word and by their own example. Thomas, thank you for helping me put my work into perspective. Thank you for your encouragement when I was overwhelmed by doubts and for your optimism when I had none. Stéphane, thank you for your inspiring advice, for the rigorous proofreading, for the many interesting discussions and for your careful attention to my work. Know that this thank-you note was written using Emacs, to which I am happy to admit that you converted me.

I am indebted to Dominique Bolignano for raising the possibility of this thesis and for creating the frame that allowed me to embark on this interesting journey and to explore the seas of research among an inspiring group of professionals, the Prove & Run team.

I am grateful to, and would like to wholeheartedly thank, Catherine Dubois and Antoine Miné for agreeing to review my dissertation. I am honoured to know that my 200+ pages have been read by experts of static analysis and formal verification, and I am grateful for their valuable comments and remarks.

I would also like to thank Sandrine Blazy and Sylvain Conchon for agreeing to be members of the jury. Sylvain Conchon, I am grateful for your keen interest during my defense. Sandrine Blazy, thank you for agreeing to chair my defense and for driving it in such a positive manner.

For their understanding, their advice and their support during the transition period and the months before my defense, I would like to thank Claire Loiseaux and Carolina Lavatelli.

I thank all of my colleagues at Prove & Run for our discussions and their advice during these years. I thank Florence for her warmth, energy and optimism; Erica and Henry for being such great office colleagues; Pauline and François for being friendly, reliable colleagues in the academic trenches. I am indebted to Olivier and Benoit for reviewing my articles and providing valuable remarks. I thank Pascale for smoothing out the stormy waves of administrative work. Though our interactions were briefer, I would also like to thank the Celtique members for their openness and for the interesting seminars. A special thanks goes to Lydie Mabil for helping me deal with the administrative work during these years and, finally, for helping prepare the defense of my dissertation.

This academic journey started long ago, even before I was aware, with the help of Marius Minea and Ovidiu Badescu, who unknowingly motivated me to take this path years later. I warmly thank them and I am grateful to both for paving the first part of my academic path.

I would also like to thank my friends, old and new, far and near. Thank you for always being there for me and providing perspective, enthusiasm and breaths of fresh air. Thank you as well for still being my friends despite the long-winded and geeky descriptions of my work and the occasionally cancelled plans and absences while I was trying to find my way into the research world.

I lack the appropriate words to express the gratitude I feel towards my family for their never-ending love and support. I thank my mother and my sister for being such wonderful examples of women in science. I thank my father for his unwavering belief in me and for his love and respect for well-written sentences, no matter the context, which he instilled into me. I thank my brother-in-law for being the one who ignited early on the spark and interest for computers and mathematics, and my two wonderful nieces for always being my rays of light.

Last but surely not least, I have only gratitude for Georges, my companion, my pillar of strength, my compass and lighthouse during the darkest moments. To quote Carl Sagan, in the vastness of space and immensity of time, it is my absolute joy to spend a planet and an epoch with you.


Contents

I Extended Summary (Résumé étendu en Français)
    I.1 The Frame Problem
    I.2 Objectives
    I.3 Dependency Analysis
    I.4 Correlation Analysis
    I.5 Decision Procedure
    I.6 Conclusion

1 Introduction
    1.1 Formal Verification of Software
    1.2 The Frame Problem in a Nutshell
    1.3 Prove & Run Objectives and Products
    1.4 Context and Problem Statement
    1.5 Contributions and Structure of the Document

2 The Frame Problem in Software Verification
    2.1 Specification Languages and Verification Tools
    2.2 Manifestations of the Frame Problem
    2.3 Approaches to Specifying Frame Properties
        2.3.1 The Manual Approach
        2.3.2 The Exclusive Approach
        2.3.3 The Implicit Approach
    2.4 Topologies and Effects
        2.4.1 Explicit Footprints
        2.4.2 Implicit Footprints
        2.4.3 Predefined Footprints
    2.5 Other Approaches to Reason about Frames
    2.6 Other Relevant Work

3 The Smart Language and ProvenTools
    3.1 The Smart Modeling Language
        3.1.1 Smart Predicates and Types
        3.1.2 Exit Labels and Control Flow
        3.1.3 Polymorphism & Algebraic Data Types
        3.1.4 Specifications
        3.1.5 Illustrating Smart – An Abstract Process Manager
    3.2 ProvenTools
    3.3 Smil

4 The αSmil Language
    4.1 αSmil Syntax
    4.2 Control Flow Graph
    4.3 Well-Typed αSmil Statements
    4.4 Operational Semantics of αSmil Statements

5 Dependency Analysis for Functional Specifications
    5.1 Dependency Analysis in a Nutshell
        5.1.1 Targeted Dependency Information
        5.1.2 Outline
    5.2 Abstract Dependency Domain
        5.2.1 Join and Reduction Operator
        5.2.2 Well-Typed Dependencies
    5.3 Intraprocedural Analysis and Data-Flow Equations
        5.3.1 Intraprocedural Dependency Domains
        5.3.2 Intraprocedural Data-Flow Equations
        5.3.3 Intraprocedural Dependency Analysis Illustrated
    5.4 Interprocedural Dependencies
        5.4.1 Interprocedural Dependency Analysis Illustrated
        5.4.2 Context-Insensitivity and its Consequences
    5.5 Semantics of Dependency Values
    5.6 Related Work
    5.7 Conclusion

6 Deferred Dependencies: Injecting Context in Dependency Summaries
    6.1 Dealing with Context-Insensitivity
    6.2 Symbolic Dependency Components in a Nutshell
    6.3 Symbolic Paths
        6.3.1 Symbolic Path Type
        6.3.2 Semantics of Symbolic Paths
        6.3.3 Well-Typed Paths and Path Sets
    6.4 Abstract Dependency Domain with Deferred Accesses
    6.5 Deferred Dependencies at the Intraprocedural Level
        6.5.1 Extended Intraprocedural Dependency Analysis
        6.5.2 Intraprocedural Dependency Analysis Illustrated
    6.6 Deferred Dependencies at the Interprocedural Level
        6.6.1 Applying Context-Sensitive Information by Substitution
        6.6.2 Wrapped Calls and Results
    6.7 Related Work
    6.8 Conclusion

7 Correlation Analysis
    7.1 Introduction
        7.1.1 Targeted Correlation Information
        7.1.2 Correlation Analysis in a Nutshell
    7.2 Partial Equivalence Relations
        7.2.1 Abstract Partial Equivalence Type
        7.2.2 Well-Typed Partial Equivalences and their Semantics
    7.3 Paths and Correlations
        7.3.1 Paths and Correlation Types
        7.3.2 Alignment and Partial Order
    7.4 Intraprocedural Correlation Analysis
        7.4.1 Intraprocedural Correlation Summaries and Analysis
        7.4.2 Intraprocedural Correlation Analysis Illustrated
    7.5 Interprocedural Correlation Analysis
    7.6 Extension – Constructor Evolution
    7.7 Related Work
    7.8 Conclusion

8 Implementation, Application and Results
    8.1 Implementation of the Dependency Analysis
        8.1.1 Dependency Type and Operators
        8.1.2 Intraprocedural Dependency Analysis
    8.2 Implementation of the Correlation Analysis
        8.2.1 Partial Equivalence Relations and Operators
        8.2.2 Intraprocedural Correlations
        8.2.3 Dependency and Correlation Analysers
    8.3 Dependency and Correlation Results on ProvenCore Layers
        8.3.1 ProvenCore Description
        8.3.2 Obtained Dependency and Correlation Results
        8.3.3 Precision of our Dependency and Correlation Summaries
    8.4 Reasoning about Framing using Correlations and Dependencies
        8.4.1 A Decision Procedure
        8.4.2 Types of Targeted Queries
    8.5 Decision Procedure Experiments

9 Conclusion and Perspectives
    9.1 Contributions
    9.2 Future Work

Bibliography


List of Figures

1.1 Complex Transition Systems: Frame Problem
1.2 Frame Problem and Solution Strategy

3.1 Possible Transitions between Thread States
3.2 The ProvenTools Toolchain
3.3 Smart Editor

4.1 Body of the stop_thread Predicate
4.2 Example – Control Flow Graph of Predicate thread
4.3 Well-Typed Control Flow Graph

5.1 Example Data Types – Thread and Memory Region
5.2 Input Type – Process
5.3 Predicate thread – Implementation
5.4 G_thread – Control Flow Graph of Predicate thread
5.5 Targeted Dependency Results for Predicate thread
5.6 G_start_address – Control Flow Graph of Predicate start_address
5.7 Predicate start_address – Implementation
5.8 Targeted Dependency Results for Predicate start_address
5.9 Order Relation on Pairs of Atomic Dependencies
5.10 Computation of the Intraprocedural Domain at a Node's Entry Point
5.11 Analysing Predicate thread – Initialisation
5.12 Applying the Variant Switch Equation
5.13 Analysing Predicate thread – Variant Switch
5.14 Applying the Array Access Equation
5.15 Analysing Predicate thread – Array Access
5.16 Applying the Field Access Equation
5.17 Analysing Predicate thread – Field Access
5.18 G_start_address – Dependency Information
5.19 G_start_address – Final Dependency Results

6.1 Analysing thread – Dependency Summary with Deferred Occurrences
6.2 G_start_address – Intermediate Dependency Results for start_address
6.3 Substitution of Formal Parameters by Effective Parameters
6.4 Substituting Deferred Dependencies by Actual Dependencies

7.1 Body of the stop_thread Predicate
7.2 Targeted Correlation Results for Predicate stop_thread
7.3 Intraprocedural Correlations – General Representation
7.4 Intraprocedural Domain – Examples
7.5 Entry Point – Correlation Information
7.6 Analysing Predicate stop_thread – Initialisation
7.7 Construction Evolution

8.1 ProvenCore – Abstract Layers
8.2 Distribution of the number of inferred preserved properties
8.3 Distribution of the number of inferred predicates for which a property is preserved


List of Tables

4.2 αSmil – Set of Supported Statements
4.3 Statements and their Exit Labels
4.4 Predicate Body in αSmil
4.6 Well-Typed Predicate Call
4.7 Well-Typed Statements
4.8 The Structural Operational Semantics of αSmil Generic Statements
4.9 Operational Semantics of αSmil Structure-Related Statements
4.10 Operational Semantics of αSmil Variant-Related Statements
4.11 Operational Semantics of αSmil Array-Related Statements
4.12 Semantics of a Predicate Call

5.1 ⊑ – Comparison of Two Domains
5.2 ∨ – Join Operation
5.3 ⊕ – Reduction Operator
5.4 Dependency Extractions
5.5 Well-Typed Dependencies
5.6 Statements – Representations and Data-Flow Equations
5.7 Generic Statements – Data-Flow Equations
5.8 Structure-Related Statements – Data-Flow Equations
5.9 Variant-Related Statements – Data-Flow Equations
5.10 Array-Related Statements – Data-Flow Equations

6.1 E – Path Semantics
6.2 Well-Typed Dependency Paths
6.3 Extended Leq – Comparison of Two Domains
6.4 ∨ – Extended Join
6.5 ⊕ – Extended Reduction Operator
6.6 Extended Extraction Operators
6.7 Well-Typed Dependencies – Extended
6.8 Deferred Paths – Application and Substitutions
6.9 Interprocedural Domain – Substitutions

7.1 ⊑R – Comparison of Two Domains
7.2 Partial Equivalences – ∨R – Join Operation
7.3 Partial Equivalences – ∧R – Meet Operation
7.4 Partial Equivalence Extractions
7.5 Well-Typed Partial Equivalences
7.6 Partial Equivalence Relations – Semantics
7.7 Well-Typed Access Paths
7.8 Well-Typed Correlations
7.9 Well-Typed Correlation Maps
7.11 Links between Access Paths
7.12 Statements – Representations and Data-Flow Equations
7.19 Well-Formed Intraprocedural Correlation Summaries

8.3 ProvenCore Abstract Layers – Global State Type
8.4 ProvenCore Abstract Layers – ProcessMachine Type
8.5 Abstract Layers – Evaluation Data and Dependency Analysis Timing
8.6 Abstract Layers – Detailed Dependency Analysis Timing
8.7 Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing
8.8 Abstract Layers – Detailed Deferred Dependency Analysis Timing
8.9 Abstract Layers – Evaluation Data and Correlation Analysis Timing
8.10 Abstract Layers – Detailed Correlation Analysis Timing
8.11 RSMFSP Layers – Evaluation Data and Dependency Summaries
8.12 TDS Layer – Evaluation Data and Dependency Summaries
8.13 RSMFSP Layers – Evaluation Data and Correlation Summaries
8.14 TDS Layer – Evaluation Data and Correlation Summaries


List of notations

Section | Symbol | Type | Description
Sec 3.1.2 | true | L | Special exit label
Sec 3.1.2 | false | L | Special exit label
Sec 4.1 | T0 | ⊂ T | Set of base type identifiers
Def 4.1.1 | T | | Universe of type identifiers
Def 4.1.1 | τ | T | Type
Def 4.1.1 | τ0 | T | Primitive type
Def 4.1.1 | struct{f1 : τ1 ...} | T | Structure type
Def 4.1.1 | variant[C1 : τ1 | ...] | T | Variant type
Def 4.1.1 | arrτ ⟨τ⟩ | T | Array type
Sec 4.1 | λ | L | Exit label
Sec 4.1 | L | | Set of exit labels
Sec 4.1 | error | L | Special exit label
Sec 4.1 | σ, σp | Σ | Signature (of predicate p)
Sec 4.1 | Σ | | Set of predicate signatures
Sec 4.1 | o, o | V | Output variable(s)
Tab 4.2 | s | | αSmil statement
Tab 4.2 | o := e | | αSmil assignment statement
Tab 4.2 | e1 = e2 | | αSmil equality test statement
Tab 4.2 | nop | | αSmil no operation statement
Tab 4.2 | r := {e1, ..., en} | | αSmil create structure statement
Tab 4.2 | {o1, ..., on} := r | | αSmil destructure structure
Tab 4.2 | o := r.fi | | αSmil access field statement
Tab 4.2 | r′ := {r with fi = e} | | αSmil update field statement
Tab 4.2 | r′ = ⟨f1, ..., fk⟩ r″ | | αSmil partial structure equality
Tab 4.2 | v := Cp[e] | | αSmil create variant statement
Tab 4.2 | switch(v) as [o1 | ...] | | αSmil destructure variant statement
Tab 4.2 | v ∈ {C1, ..., Ck} | | αSmil variant possible statement
Tab 4.2 | o := a[i] | | αSmil array access statement
Tab 4.2 | a′ := [a with i = e] | | αSmil array update statement
Tab 4.2 | p(e1, ...) [λ1 : o1 | ...] | | αSmil predicate call statement
Sec 4.2 | Gp = (N, E) | | Control flow graph of predicate p
Def 4.3.1 | Γ | V → T | Typing environment
Sec 4.3 | v | V | Variable
Sec 4.3 | V | | Set of variables
Sec 4.3 | V+ | ⊆ V | Writable variable identifiers
Def 4.3.2 | Σ | P → S | Maps predicate ids to signatures
Def 4.3.3 | Σ, Γ, O ⊢ s → λ | | Well-typed statement
Sec 4.3 | O | ⊆ V+ | Output variables of a predicate
Sec 4.4 | Dτ | | Semantic values of type τ
Sec 4.4 | P | ⊆ Dτ | Domain of valid array indices
Sec 4.4 | E | = V → D | Valuation or environment type
Def 4.4.2 | E | E | Valuation or environment
Sec 4.4 | Γ(v) | | Type of v
Sec 4.4 | Γ ⊢ E | | Well-typed environment
Def 4.4.3 | ⟨E [s]⟩ | | Configuration
Def 4.4.4 | ⟨E [s]⟩ −λ→ E′ | | Transition
Def 4.4.5 | E[x → v] | | Extension of E with x → v
Def 4.4.6 | I | = P × E → E × L | Set of interpretations
Def 4.4.6 | I | I | Interpretation
Sec 5.2 | D | | Abstract dependency domain
Def 5.2.1 | δ | D | Dependency
Def 5.2.1 | ⊤ | D | Everything atomic dependency
Def 5.2.1 | | D | Nothing atomic dependency
Def 5.2.1 | ⊥ | D | Impossible atomic dependency
Def 5.2.1 | {f1 ↦ δ1, ...} | D | Structure dependency
Def 5.2.1 | [C1 ↦ δ1, ...] | D | Variant dependency
Def 5.2.1 | ⟨δ⟩ | D | Array dependency
Def 5.2.1 | ⟨δdef i δexc⟩ | D | Array dependency, exception for i
Def 5.2.2 | ⊑ | ⊆ D × D | Partial order on dependencies
Tab 5.1 | | | Rules for ⊑
Def 5.2.3 | ∨ | D × D → D | Join operator for dependencies
Tab 5.2 | | | ∨ cases
Def 5.2.4 | ⊕ | D × D → D | Reduction operator for dependencies
Tab 5.3 | | | ⊕ cases
Def 5.2.5 | f | D ⇀ D | Extraction of a field's dependency
Def 5.2.6 | C | D ⇀ D | Extraction of a constructor's dep
Def 5.2.7 | ⟨i⟩ | D ⇀ D | Extraction of an array's cell dep
Def 5.2.8 | ⟨∗ i⟩ | D ⇀ D | Extraction of an array's dep (exc)
Def 5.2.9 | ⟨∗⟩ | D ⇀ D | Extraction of an array's dependency
Tab 5.4 | | | f, C, ⟨∗ i⟩, ⟨i⟩ and ⟨∗⟩ cases
Tab 5.5 | Γ ⊢ δ : τ | | Well-typed dependency
Def 5.3.1 | D | = V → D | Intraprocedural dependency domain
Def 5.3.1 | ∆ | D | Intraprocedural dependency
Sec 5.3.1 | Unreachable | D | Intra dep for unreachable nodes
Def 5.3.2 | ∆ x | D × V → D | Forget x
Def 5.3.3 | ⊑∆ | ⊆ D × D | Intraprocedural partial order
Def 5.3.4 | ∨∆ | D × D → D | Intraprocedural join operation
Def 5.3.5 | ⊕∆ | D × D → D | Intraprocedural reduction operator
Sec 5.3.2 | ⟦s⟧λ(∆nj) | | Contribution of an edge (ni, nj)
Sec 5.3.2 | ⟦s⟧λ() | | Transfer function of the edge s, λ
Sec 5.3.2 | gen s,λ | | Written variables on the edge s, λ
Sec 5.3.2 | ∆n | D | Dependency domain of node n
Sec 5.3.2 | I | ⊆ V | Set of input variables
Sec 5.4 | χ | | Formal-Effective param mapping
Sec 5.4 | ⟦ ⟧(χ) | | Substitution formal to effective
Def 6.3.1 | π | Π | Symbolic path
Def 6.3.1 | Π | | Universe of symbolic paths
Def 6.3.1 | ε | Π | Symbolic path endpoint
Def 6.3.1 | f π | Π | Symbolic path field
Def 6.3.1 | C π | Π | Symbolic path constructor
Def 6.3.1 | ⟨i⟩π | Π | Symbolic path array cell
Def 6.3.1 | ⟨∗ i⟩π | Π | Symbolic path array cells except
Def 6.3.1 | ⟨∗⟩π | Π | Symbolic path all array cells
Sec 6.3.1 | | Π × Π → Π | Path extension operator
Sec 6.3.1 | P | 2^Π | Symbolic path set
Def 6.3.2 | ⊑ | ⊂ 2^Π × 2^Π | Partial order for path sets
Def 6.3.3 | ∨ | 2^Π × 2^Π → 2^Π | Join operator for path sets
Def 6.3.4 | | 2^Π × Π → 2^Π | Extension operator for path sets
Def 6.3.5 | π | Π | Actual path
Def 6.3.5 | Π | | Universe of actual paths
Def 6.3.5 | ε | Π | Actual path empty
Def 6.3.5 | f π | Π | Actual path field
Def 6.3.5 | C π | Π | Actual path constructor
Def 6.3.5 | ⟨i⟩π | Π | Actual path array cell
Def 6.1 | E | ⊂ E × Π × Π | Symbolic path covers actual path
Sec 6.3.2 | E | ⊂ E × 2^Π × Π | Set of symbolic paths covers actual
Def 6.3.6 | ⟦P⟧E | ⊂ E × 2^Π → 2^Π | Interpretation of symbolic paths set
Def 6.3.7 | at | Π × D → D | Find subpart of value at given path
Tab 6.2 | I ⊢ π : τ → τ′ | ⊂ V × Π × T × T | Symbolic paths typing judgement
Sec 6.3.3 | I ⊢ P : τ → τ′ | ⊂ V × 2^Π × T × T | Symbolic paths sets judgement
Def 6.4.1 | δ | D | Extended dependency
Def 6.4.1 | D | | Ext abstract dependency domain
Def 6.4.1 | Deferred(o1 ↦ P1, ...) | D | Deferred accesses dependency
Def 6.4.2 | A | V ⇀ Π | Access map
Tab 6.3 | | | Deferred rule for ⊑
Tab 6.4 | | | ∨ cases for deferred
Tab 6.5 | | | ⊕ cases for deferred
Tab 6.6 | | | f, C, ⟨∗ i⟩, ⟨i⟩, ⟨∗⟩ deferred cases
Tab | Γ, I, O ⊢ δ : τ | | Well-typed dependency, deferred rule
Def 6.6.1 | σ | V → D | Substitution roots vars to deps
Def 6.6.2 | φ | V ⇀ V | Substitution indices in arrays
Sec 6.6.1 | ⟦ ⟧(σ, φ) | | Substitutes deferred dependencies
Sec 6.6.1 | • | | Applies symbolic paths to dep
Sec 6.6.1 | | | Applies symbolic path to dep
Def 7.2.1 | R | R | Partial equivalence
Def 7.2.1 | R | | Partial equivalence type
Def 7.2.1 | Equal | R | Partial equivalence equal
Def 7.2.1 | Any | R | Partial equivalence unrelated
Def 7.2.1 | {f1 ↦ R1, ...} | R | Partial equivalence structure
Def 7.2.1 | [C1 ↦ R1, ...] | R | Partial equivalence variant
Def 7.2.1 | ⟨Rdef⟩ | R | Partial equivalence array
Def 7.2.1 | ⟨Rdef i Rexc⟩ | R | Partial equivalence array + exc
Def 7.2.2 | ⊑R | ⊆ R × R | Preorder for partial equivalences
Def 7.1 | | | Rules for ⊑R
Def 7.2.3 | ∨R | R × R → R | Join for partial equivalences
Tab 7.2 | | | ∨R cases
Def 7.2.4 | ∧R | R × R → R | Meet for partial equivalences
Tab 7.3 | | | ∧R cases
Def 7.2.5 | extr f | R ⇀ R | Extracts field's partial eqv
Def 7.2.6 | extr C | R ⇀ R | Extracts constructor's partial eqv
Def 7.2.7 | extr ⟨i⟩ | R ⇀ R | Extracts cell's partial eqv
Tab 7.4 | | | extr f, extr C and extr ⟨i⟩ cases
Tab 7.5 | Γ ⊢ R : τ | | Partial equivalence well-typedness
Sec 7.2.2 | ⟦R⟧τ | | Partial equivalence semantics
Def 7.3.1 | π | Π | Access path
Def 7.3.1 | Π | | Access path type
Def 7.3.1 | ε | Π | Access path empty
Def 7.3.1 | f π | Π | Access path field
Def 7.3.1 | C π | Π | Access path constructor
Def 7.3.1 | ⟨i⟩π | Π | Access path array cell
Def 7.3.2 | κ | K | Correlation map
Def 7.3.2 | K | = Π × Π → R | Correlation map type
Sec 7.3.1 | (π, ρ) ↦ R | Π × Π × R | Correlation
Tab 7.7 | Γ, I ⊢ π : τ → τ | | Well-typed access path
Tab 7.8 | Γ, I ⊢ (π, ρ) ↦ R : (τl, τr) | | Well-typed correlation
Tab 7.9 | Γ, I ⊢ κ : (τl, τr) | | Well-typed correlation map
Def 7.3.3 | µ | M | Link
Def 7.3.3 | M | | Link type
Def 7.3.3 | Identical | M | Link identical
Def 7.3.3 | Left π | M | Link left path has suffix π
Def 7.3.3 | Right π | M | Link right path has suffix π
Def 7.3.3 | Incompatible | M | Link incompatible paths
Def 7.3.4 | f | Π × Π → M | Matching operator
Def 7.3.5 | R (π, ρ)(π′, ρ′) | | Aligning a correlation
Def 7.3.6 | | | Computation of R (π, ρ)(π′, ρ′)
Def 7.3.7 | | Π × R ⇀ R | Projection
Def 7.3.8 | x | R × Π ⇀ R | Injection
Def 7.3.9 | κ (π′, ρ′) | | Aligns correlation maps
Def 7.3.10 | ⊑ | ⊆ K × K | Correlation maps preorder
Def 7.3.11 | ∨ | K × K → K | Join for correlation maps
Def 7.3.12 | ∧ | K × K → K | Meet for correlation maps
Def 7.4.1 | K | K | Intraprocedural corr summary
Def 7.4.1 | K | = V × V → K | Intraproc corr summary type
Sec 7.4.1 | NoCorrelation | K | Any for any pair of variables
Def 7.4.2 | ⊑K | ⊆ K × K | ⊑ for intraproc corr summaries
Def 7.4.3 | ∨K | K × K → K | Join for intraproc corr summaries
Def 7.4.4 | C s,λ() | C | Contribution of an edge
Sec 7.4.1 | c s,λ | K | Corr created by stmt s on label λ
Sec 7.4.1 | kill λ | ⊆ V | Variables redefined by stmt on label
Def 7.4.5 | (π•, ρ•) ↦ R | Π × Π × R | New correlation after composition
Def 7.4.6 | | K × K → K | Composition of correlation maps
Def 7.4.7 | | C × K → K | Contribution C s,λi(Kni)
Def 7.19 | Γ I O K | | Well-formed intraproc corr summ
Sec 7.4.2 | o | | Final value of o
Def 7.5.1 | Kp | Λp → K | Interproc correlation domain
Def 7.5.1 | Λp | ⊆ L | Output labels of predicate p
Sec 7.6 | Impossible | R | Partial eqv constructor impossible
Sec 7.6 | R CiCj | R | Partial eqv variant matrix


To my family and close ones


Chapter I

Extended Summary (Résumé étendu en Français)

I.1 The Frame Problem

In the field of formal software verification, it is imperative to identify the boundaries within which program elements or functions operate. A complete specification of an operation must not only state that the output values satisfy a certain property; it must also delimit the parts of the input state on which the operation acts. These boundaries constitute the frame properties. They are usually specified manually by the programmer, and their validity must be verified: it is necessary to prove that the operations of the program do not overstep the boundaries thus declared. Specifying and proving frame properties is a task notoriously known to be long and tedious. The considerable effort invested in this task is a manifestation of the frame problem. Manifestations of the frame problem appear in the context of all specification languages and all formal verification methods.

I.2 Objectives

During the development of ProvenCore, a general-purpose micro-kernel that guarantees isolation, it became evident that the specification and verification of transition systems in general, and of operating systems in particular, are not immune to the frame problem. Operating systems are characterized by complex states defined by algebraic data types and associative arrays, which are fundamental building blocks for representing and manipulating complex data efficiently. Operating systems are also characterized by transitions that map such input states to new output states. However, most transitions are not concerned with the input state in its entirety: they depend on and modify only a subset of it. Intuitively, properties that hold for the input state remain trivially valid for the output state obtained after the transition, as long as they depend only on parts of the input state that are not modified by the transition. In practice, proving the preservation of these properties is not a straightforward task; it imposes a substantial manual effort and a host of tedious, repetitive proofs.


The objective of our work was to address this problem and to find an automated solution for inferring the preservation of these properties. More precisely, our goal was to automatically infer the preservation of properties that depend on a subset of the input that is disjoint from the frame of the operation, that is, from the subset of the state that is modified. To this end, we proposed a solution based on static analysis that requires no additional frame annotations. By detecting the subset of the state on which a property depends, as well as the part that is not affected by an operation, we can automatically discharge the proof obligations related to unmodified parts.

We employ two static analyses for this purpose: a dependency analysis and a correlation analysis. Both analyses handle programs that manipulate associative arrays as well as algebraic data types (structures and variants), and they compute results that reflect the underlying structure of these types (fields, constructors and array cells). Automatic reasoning based on the combined result of these two static analyses makes it possible to infer the preservation of certain properties of the output state. Ultimately, these two analyses are meant to be used by a proof tactic that will be integrated into the interactive proof assistant included in the ProvenTools software suite developed by Prove & Run.
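The combined reasoning can be illustrated with a minimal sketch. It is not the decision procedure of the thesis: it assumes a flat state whose parts are identified by field names, a dependency summary given as the set of fields a property reads, and a correlation summary given as the set of fields that a command leaves equal to their input value. The names `is_preserved`, `depends_on` and `unchanged`, as well as the example fields, are hypothetical.

```python
def is_preserved(depends_on: set[str], unchanged: set[str]) -> bool:
    """Return True when every field read by the property is left unchanged."""
    return depends_on <= unchanged


# Hypothetical global state with three fields.
all_fields = {"threads", "memory", "scheduler"}

# Assumed correlation summary: the command rewrites only "threads".
unchanged_by_command = all_fields - {"threads"}

# Assumed dependency summaries of two invariants.
memory_invariant_deps = {"memory"}
thread_invariant_deps = {"threads", "scheduler"}

print(is_preserved(memory_invariant_deps, unchanged_by_command))  # True
print(is_preserved(thread_invariant_deps, unchanged_by_command))  # False
```

Under these assumptions, the first invariant can be discharged automatically, while the second still requires a proof because it reads a field that the command may modify.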

Smart, the language targeted by the ProvenTools software suite, is a purely functional language that manipulates algebraic data structures and immutable associative arrays. This work was motivated by the verification of ProvenCore. ProvenCore is implemented through multiple refinements between successive models of the kernel, from the most abstract, which allows the isolation property to be defined and proved, to the most concrete, which is used for code generation. The global states of the abstract layers are complex structures containing numerous fields, which are themselves composite. Commands such as fork, exec and exit can be executed. Each of these commands receives an input global state as an argument and produces the state of the system after the command has executed. In practice, most of the commands supported by the system threaten only a limited number of invariants. Automatically proving the preservation of the invariants that are not threatened can considerably reduce the total number of proofs left to the programmer, who can then concentrate on the most interesting ones.

I.3 Dependency Analysis

The dependency analysis handles functions and their specifications uniformly. For each possible execution scenario, it conservatively computes an approximation of the sub-elements of the input state on which the result depends. For variants, an additional analysis is performed simultaneously in order to compute the subset of constructors that are possible in each execution scenario.

We defined our own abstract domain for representing dependencies, and we obtain dependency information that reflects the layered structure of the data types.
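To make the idea of a structured dependency domain concrete, here is a small sketch. It is a deliberate simplification and not the domain defined in the thesis: it keeps only two atomic values (named `Everything` and `Nothing` after the entries of the list of notations) and structure dependencies, and it leaves out variants, arrays and the impossible value. The helper `struct`, the function `join` and the example field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Everything:
    """Atomic dependency: the whole value may be read."""


@dataclass(frozen=True)
class Nothing:
    """Atomic dependency: no part of the value is read."""


@dataclass(frozen=True)
class Struct:
    """Structure dependency: one dependency per field."""
    fields: tuple  # sorted tuple of (field name, dependency) pairs


def struct(**deps):
    """Build a Struct from keyword arguments, e.g. struct(a=Nothing())."""
    return Struct(tuple(sorted(deps.items())))


def join(a, b):
    """Combine the dependencies of two execution scenarios, field by field."""
    if isinstance(a, Everything) or isinstance(b, Everything):
        return Everything()
    if isinstance(a, Nothing):
        return b
    if isinstance(b, Nothing):
        return a
    # Both are structures over the same fields (assumed well-typed).
    other = dict(b.fields)
    return Struct(tuple((name, join(dep, other[name])) for name, dep in a.fields))


# One scenario reads all threads; the other reads only memory.base.
d1 = struct(threads=Everything(), memory=Nothing())
d2 = struct(threads=Nothing(),
            memory=struct(base=Everything(), size=Nothing()))
expected = struct(threads=Everything(),
                  memory=struct(base=Everything(), size=Nothing()))
print(join(d1, d2) == expected)  # True
```

The joined result still records that `memory.size` is never read, which is the kind of field-level precision a flat "reads memory" summary would lose.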


This analysis was designed to be run on the fly during interactive verification, and it operates uniformly on programs and their specifications; these two points give our approach its originality. We implemented a prototype of this dependency analysis in OCaml and applied it to a functional specification of ProvenCore. The results obtained are positive: for example, the dependency analysis runs in less than one second on a set of more than 600 predicates totalling approximately 10,000 lines of code.

In order to introduce a form of context-sensitivity into the dependency analysis, we designed an extension based on symbolic paths. This extension slightly lengthens the execution time (by 10 to 20% on the benchmarks used). However, by using the dependency analysis with this extension, we obtained more precise results for 50% of the predicates included in these benchmarks.

I.4 Correlation Analysis

The correlation analysis detects the flow of input values into output values. For a given function, it conservatively computes an approximation of the equivalences between input sub-elements and output sub-elements. It is an interprocedural static analysis that summarizes the behaviour of a function and detects which parts of the state are modified, and to what extent. We defined a partial equivalence type that reflects the structure of algebraic data types and associative arrays. To gain precision and to avoid losing information when the input and the output have different types, we introduced an intermediate level. Correlations therefore consist of access paths to sub-elements of the same type, together with equivalences between these sub-elements. This intermediate level makes it possible to compute, in a flexible manner, precise equivalences between parts of the input and parts of the output.

Here too, we implemented a prototype of the correlation analysis in OCaml and applied it to a functional specification of ProvenCore. The results are encouraging: for example, the correlations for a subset of 630 predicates totalling approximately 10,000 lines of code are computed in less than 0.5 seconds. Although it is more complex than the dependency analysis, the correlation analysis runs faster on our benchmarks because, unlike the former, it applies only to functions and not to specifications. Indeed, specifications are Boolean predicates and do not return a modified state.

I.5 Decision Procedure

We sketched a decision procedure that uses our two static analyses. It constitutes the first step of our solution for automatically inferring the preservation of frame invariants. By uncovering equivalences between inputs and outputs, and after detecting that a property depends only on unchanged parts, it is possible to infer the preservation of invariants for these unchanged parts.

The decision procedure has not yet been implemented, but preliminary experiments and a simple prototype give us an idea of how the dependency and correlation results should be unified. They also allowed us to determine the kind of queries that can be handled and the mechanism for answering them. The results obtained with our simple prototype on a functional specification of ProvenCore are described and analysed.

Unifying the results of the two analyses involves building a graph that links the input and output variables examined by the query. The edges represent correlations between sub-elements of these variables, as detected by the second analysis. The dependencies of the property whose preservation we are trying to infer indicate the sub-elements that influence the result of that property. When these sub-elements are left intact, the property is trivially preserved. The unification algorithm therefore traverses the graph, trying to detect as many equivalences as possible between sub-elements of the input and output variables. If the sub-elements indicated by the dependency are included in the set of equivalent sub-elements, then the property is necessarily preserved, because all the values that influence its result are the same before and after the execution of the operation.

I.6 Conclusion

To conclude, we designed and implemented two static analyses that detect the data dependencies of a logical property as well as correlations between the inputs and outputs of operations. Our first results on a functional model of a microkernel are encouraging, both for their precision and for the speed of the analysis, which makes these analyses suitable for use within an interactive prover. Apart from minor improvements to the precision of our analyses, the next steps consist in combining them in order to detect the invariants that are not affected by the execution of a predicate, and then integrating this detection as a tactic in the ProvenTools theorem prover. We believe that it is possible to take advantage of frame specifications at low cost, in particular without forcing the programmer to write tedious annotations that are intuitively obvious. When formally verifying complex transition systems, it then becomes possible to integrate into the development tools an automatic inference of the preservation of frame-related invariants via static analysis.


Chapter 1

Introduction

No human investigation can claim to be scientific if it doesn't pass the test of mathematical proof.

Leonardo da Vinci

1.1 Formal Verification of Software

Since the middle of the last century, computers and information technology have brought forth a digital revolution, fundamentally changing the way we live, work and interact with one another. Nowadays, computer programs govern our world, and software permeates our lives in manifold ways, shaping our interactions with the surrounding environment. From the alarm clock that marks the start of our day and the coffee machine that motivates us to leave the house, to the smartphone we use for checking our emails or bank account and the car we are driving (or the automated driverless subway we are relying on), some type of software is discreetly acting in the background. We have grown so accustomed to it that we do not even notice it anymore, until it asserts itself by preventing us from checking our email, by displaying a blue error screen on an ATM or ticket machine, or by serving us a salty bag of crisps instead of the desperately needed bottle of water we have just paid for at a vending machine. Such reminders can lead to frustration and cause inconvenience, but essentially they cause minor problems. However, receiving such reminders as a result of malfunctions of medical equipment such as radiation therapy machines, of flight control systems, Mars orbiters, satellites or nuclear power plants can have dramatic consequences, endangering human lives, causing environmental harm or entailing significant financial losses. Therefore, the quality of the software around us not only influences the quality of our daily lives, but may also have an impact on our safety and the safety of our surrounding world.

Writing reliable, completely error-free software is a difficult task, and even a utopian one in the absence of dedicated, rigorous approaches for improving its quality. Indeed, for many software systems no guarantees or warranties are provided, and their quality is addressed only by traditional software engineering approaches such as testing or code review, which cannot guarantee the absence of bugs. While this can be acceptable for non-critical programs, mission- or safety-critical software systems, for which software quality is of the utmost importance, have to guarantee the absence of runtime errors and provide high levels of confidence regarding their functional correctness. Certain safety-critical market segments impose standards and regulatory requirements for the development of such software systems. In these domains, formal program verification is emerging as a promising approach, gaining a wider audience and more and more ground.

Formal program verification comprises a set of techniques and tools that can be used to ensure, by mathematical means, that the program under scrutiny fulfills its functional correctness requirements, i.e. that it computes the right information. To achieve this goal, a formal description, or specification, of the program's expected behaviour must be given. Once this is established, multiple mathematical tools can be employed for formally verifying that the program's implementation follows the formal specification.

Formal methods can be traced back to the early days of computer science, and their origin can be linked to the names of Floyd (Floyd 1967), Hoare (Hoare 1969) and Naur (Naur 1966) (and later to that of Dijkstra (Dijkstra 1976)) and their methods for verifying program code with respect to assertions. Despite their early foundations, formal methods seemed for decades to be confined to the research world, as a consequence of intricate notations, failure to scale to real-world programs and limited or inadequate tool support. Since the 1960s, however, considerable progress has been made in the field of formal methods, in terms of both methodology and tools for computer-aided program verification. Still, formal program verification methods are not yet a widespread alternative, or even complement, to testing in the industry. Unlike testing, which cannot show the absence of bugs, formal verification methods aim to prove by mathematical means that the program execution is correct in all specified environments, without actually executing the program itself. These are static verification techniques.

Static verification techniques include program typing, model checking, deductive verification methods and static program analysis. Besides requiring a formal specification of the program's intended behaviour and its envisioned properties at runtime, all formal methods are theoretically characterized by undecidability and complexity, which are addressed by introducing some form of approximation. For soundness considerations, these approximations are necessarily over-approximations, and all static verification techniques are necessarily conservative: they can prove the absence of some erroneous runtime behaviours, but they will inevitably trigger some false warnings, rejecting certain behaviours that are in practice correct.

Program Typing. Type systems (Cardelli and Wegner 1985) are tools for reasoning about programs. More specifically, they constitute “a syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute” (Pierce 2002). They are used for computing static approximations of the runtime behaviours of the terms in a program and can guarantee that well-typed programs are free from certain runtime type errors, such as passing strings as arguments to a primitive arithmetic operation or using an integer as a pointer.


In practice, type systems have become the most widespread instance of formal methods, with applications to many programming languages and automatic typecheckers built into a variety of compilers. Static typecheckers entail a variety of benefits, ranging from early error detection to offering convenient abstraction and documentation mechanisms and improving the efficiency of compilers, which nowadays make use of the information provided by typecheckers during their optimization and code generation phases.

The Curry-Howard correspondence implies that types can be used for expressing arbitrarily complex mathematical specifications. Additional type annotations could, in principle, enable the full proof of complex properties, effectively transforming type checkers into proof checkers (Pierce 2002). Approaches such as Extended Static Checking (Leino 2001; Leino and Nelson 1998; Flanagan et al. 2002) made progress towards implementing entirely automatic checks for broad classes of correctness properties.
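As a small illustration of the correspondence, the following Lean sketch reads a proposition as a type and a proof as a term of that type. The theorem names `apply_implication` and `swap_conj` are chosen here for illustration only and do not come from the thesis or from any library.

```lean
-- A proof of q from p and p → q is just function application.
theorem apply_implication (p q : Prop) (hp : p) (hpq : p → q) : q :=
  hpq hp

-- A proof of a conjunction is a pair; swapping the pair proves the swapped conjunction.
theorem swap_conj (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩
```

Checking that these definitions are well typed is the same act as checking that the proofs are valid, which is the sense in which a type checker can serve as a proof checker.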

Additionally, approaches relying on type inference have been used for alias analysis (O'Callahan and Jackson 1997) and exception analysis (Leroy and Pessaux 2000). Powerful type systems based on dependent types (Martin-Löf 1984; Nordström, Petersson, and Smith 1990) are used in automated theorem proving. Various proof assistants, including Coq (Bertot and Castéran 2004; Sozeau and team 1997)¹, are based on type theory.

Model Checking. Model checking is a verification technique that exhaustively explores all possible system states in a systematic manner (Baier and Katoen 2008). More precisely, given a finite-state model of a system and a formal property, a model checking tool verifies whether the property under scrutiny holds for a state in the given model. Model checking emerged as a popular lightweight formal method as a consequence of progress made in the development of program logic and decision procedures, automatic model checking techniques and compiler analysis (Jhala and Majumdar 2009). First, program logic and decision procedures (Nelson and Oppen 1980; Shostak 1984) provided the needed framework and algorithmic tools to reason about infinite state spaces. Automatic model checking techniques (Clarke and Emerson 1981; Vardi and Wolper 1994) for temporal logic provided algorithmic tools for state-space exploration. Abstract interpretation (Cousot and Cousot 1977) provided connections between the logical world of infinite state spaces and the algorithmic world of finite representations (Jhala and Majumdar 2009).
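The exhaustive exploration described above can be sketched for the simplest case, a safety invariant over an explicitly enumerated finite state space. This is only an illustrative sketch, not a temporal-logic model checker; the names `check_invariant` and `step` and the toy counter model are assumptions made for the example.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Explore every reachable state breadth-first.

    Returns (True, None) when the invariant holds in all reachable states,
    otherwise (False, trace), where trace is a shortest path from the
    initial state to a violating state.
    """
    parent = {initial: None}          # also serves as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:  # walk back to the initial state
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None


def step(s):
    """Hypothetical toy model: a counter that advances by 2 modulo 6."""
    return [(s + 2) % 6]


print(check_invariant(0, step, lambda s: s != 3))  # state 3 is unreachable
print(check_invariant(0, step, lambda s: s != 4))  # state 4 is reachable
```

The `parent` dictionary grows with the number of reachable states, which is where the state-space explosion mentioned in the text shows up in practice.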

Currently, model checking continues to attract considerable attention from the industry. This can be partly explained by it being a rather general verification approach that is suitable for applications stemming from different areas, ranging from embedded systems to hardware design. In addition, it is an automatic, lightweight technique that supports partial verification and requires a low degree of user interaction and a lower degree of expertise (Baier and Katoen 2008) compared to other verification techniques.

¹ Coq Reference Manual, Version 8.6: https://coq.inria.fr/distrib/current/files/Reference-Manual.pdf


Its main weaknesses stem, on the one hand, from combinatorial state-space explosion (the number of states needed to model the system accurately may easily exceed the amount of available computer memory) and, on the other hand, from its being less suitable for data-intensive applications.

Model checking techniques also require the production of models, often expressed using finite-state automata, which are in turn described in a dedicated description language. Another prerequisite for model checking is a formal specification of the properties to be verified, typically provided by means of temporal logic, which is suitable for the specification of a variety of properties ranging from functional correctness and safety to liveness, fairness and real-time properties (Baier and Katoen 2008).

Deductive Verification Methods. Deductive verification methods consist in producing formal correctness proofs by first generating a set of formal mathematical proof obligations from the program and its specification, and by subsequently discharging these. Based on the manner in which proof obligations are discharged, namely automatically or interactively, deductive verification methods can be classified into two broad categories. Both require a thorough understanding of the system to be proven, as well as a good knowledge of the employed proof tools.

The first category of deductive methods relies on standalone tools that accept as inputs programs written in a specific programming language (such as Java, C or Ada) and specified in a dedicated annotation language (such as JML or ACSL). These tools automatically produce a set of mathematical formulas called verification conditions, which are typically proven using automatic theorem provers (Gallier 1987) or satisfiability modulo theories (SMT) solvers such as Alt-Ergo, Z3, CVC3 or Yices. Deductive verification tools such as Why3 or Boogie have their own programming and specification languages (WhyML and Boogie, respectively), which can act as intermediate verification languages and are designed as a layer on which to build program verifiers for other languages. Verifiers for C, Dafny, Chalice and Spec# have been built using Boogie. WhyML has been used for the verification of Java, C and Ada programs.

The second category of deductive methods relies on interactive theorem provers (Bertot and Castéran 2004), also called proof assistants, such as Isabelle, Coq, Agda, HOL or Mizar. Both the program and its specification are encoded in the proof assistant's own language (for instance, Gallina for Coq and Isar for Isabelle), and the proofs that a program follows its specification, i.e. that it is functionally correct, are typically conducted in an interactive manner using the underlying proof construction engine. In other words, users are required to actively participate in the verification process by providing inductive arguments and guiding the proof through proof tactics, proof hints or strategies.

Both deductive verification methods offer a high level of assurance. For automatic theorem provers, the proof chain, which consists of multiple steps (the model of the input programming language, the verification condition generator, the SMT solver used) at which errors could potentially creep in, can be perceived as a weakness. For interactive theorem provers, the high level of expertise required to employ them can be perceived as discouraging by a wider audience. However, major industrial breakthroughs have recently been achieved. For instance, Hyper-V, Microsoft's hypervisor for highly secure virtualization, was verified using VCC and the Z3 prover (Leinenbach and Santen 2009). CompCert (Leroy 2009), the first formally proven C compiler, was verified using the Coq proof assistant. High security properties of the seL4 microkernel (Klein et al. 2009) have been proven using the Isabelle/HOL proof assistant.

Static Program Analysis. Static program analysis comprises multiple techniques for computing, at compile time, safe approximations of the set of values or behaviours that can occur dynamically when executing a program. Static analysis techniques initially emerged in the field of compilation, where they provided ways to generate code efficiently by avoiding redundant or superfluous computations (Nielson, Nielson, and Hankin 1999).

Static analyses compute sound, conservative information. However, for decades their scalability to industrial-size programs has been doubted, and their application has been considered as being limited to the research world and to small programs. Recent major breakthroughs have been achieved, however, and they triggered, on the one hand, the inclusion of static analysis at different levels of the software validation process (Cousot 2001) and, on the other hand, a proliferation of static code analysers for a variety of languages, targeting mainstream usage and offering a solution for detecting and eliminating common runtime errors. A recent example is Infer (Calcagno and Distefano 2011), an open-source static analysis tool for bug detection in Java, C and Objective-C code. It was developed at Facebook, where it is used as part of the development process for mobile applications. Furthermore, static analysis techniques and tools are nowadays employed in the safety-critical market segment. For instance, Astrée (Cousot et al. 2005; Blanchet et al. 2003; Cousot et al. 2007), a static analyser for embedded software written in C, has been employed for the verification of aerospace software (Delmas and Souyris 2007; Bouissou et al. 2009; Bertrane et al. 2015). In particular, it has been used for proving the absence of runtime errors in the primary flight control software of the fly-by-wire system of Airbus airplanes.

It is argued (Cousot and Cousot 2010) that model checking, deductive verification and static program analysis represent approximations of the program semantics, formalized by the abstract interpretation theory (Cousot and Cousot 1977).

Broadly speaking, this thesis focuses on static program analysis techniques that are meant to be used during interactive theorem proving in order to facilitate and automate the verification of a certain class of properties in the context of a strongly typed language.

1.2 The Frame Problem in a Nutshell

The frame problem (McCarthy and Hayes 1969) was initially identified and described by McCarthy and Hayes in 1969 in the context of Artificial Intelligence (AI). Its history is essentially intertwined with that of logicist AI, the branch of AI attempting to formalize reasoning within mathematical logic. The initial description of the frame problem is the following:

“In proving that one person could get into conversation with another, we were obliged to add the hypothesis that if a person has a telephone he still has it after looking up a number in the telephone book. If we had a number of actions to be performed in sequence, we would have quite a number of conditions to write down that certain actions do not change the values of certain fluents. In fact with n actions and m fluents we might have to write down mn such conditions.”

Unsurprisingly, given its identification in the context of logicist AI, the frame problem manifests itself in the realm of formal software specification and verification as well (Borgida, Mylopoulos, and Reiter 1993). In this area it remains a current problem, with notoriously tedious consequences and a considerable amount of manual effort. For instance, when considering a simple procedure

transferAmount(ownerId, id1, id2, amount)

that records the transfer of a given sum of money, amount, from a customer's (identified by ownerId) current deposit account (identified by the account number id1) to a savings account (identified by the account number id2), a reasonable specification would be the following:

Precondition: owner(id1) = ownerId ∧ owner(id2) = ownerId ∧ availableAmount(id1) ≥ amount

Postcondition: availableAmount(id1)′ = availableAmount(id1) − amount ∧ availableAmount(id2)′ = availableAmount(id2) + amount

The program states prior to the procedure's execution and the ones subsequent to it are referred to using the usual unprimed/primed notation, applied to the availableAmount(id) and owner(id) functions. The given specification declares a precondition that has to hold prior to transferring the indicated sum of money from one account to the other: it stipulates that the customer identified by ownerId must be the owner of both accounts involved in the transaction. It also requires that the amount of money currently available in the deposit account identified by id1 is at least the amount to be transferred. The postcondition specifies the procedure's effects on the final program state and encompasses the conditions that have to hold after executing the procedure. They include a stipulation about incrementing the amount of money available in the savings account by the transferred sum amount, as well as one about decrementing the amount of money available in the current account by the same amount.
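The specification above can be made concrete with a small executable sketch, in which the unprimed state is passed as `s` and the primed state as `s2`. The `Bank` record, the dictionary encoding of owner and availableAmount, and the function names are assumptions made for illustration; the sketch also assumes that id1 and id2 denote distinct accounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bank:
    owner: dict      # account id -> owner id
    available: dict  # account id -> available amount


def precondition(s, owner_id, id1, id2, amount):
    """Both accounts belong to owner_id and id1 holds at least `amount`."""
    return (s.owner[id1] == owner_id
            and s.owner[id2] == owner_id
            and s.available[id1] >= amount)


def postcondition(s, s2, id1, id2, amount):
    """s is the state before the call (unprimed), s2 the state after (primed)."""
    return (s2.available[id1] == s.available[id1] - amount
            and s2.available[id2] == s.available[id2] + amount)


def transfer_amount(s, owner_id, id1, id2, amount):
    """One possible implementation; it builds a new state instead of mutating s."""
    available = dict(s.available)
    available[id1] -= amount
    available[id2] += amount
    return Bank(owner=s.owner, available=available)


before = Bank(owner={"a": "alice", "b": "alice"}, available={"a": 100, "b": 5})
assert precondition(before, "alice", "a", "b", 30)
after = transfer_amount(before, "alice", "a", "b", 30)
assert postcondition(before, after, "a", "b", 30)
```

Note that `postcondition` inspects only the two balances, so it says nothing about the `owner` map or about any other account.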

As discussed by Borgida et al. (Borgida, Mylopoulos, and Reiter 1993), the principles on which this specification relies are simple and ubiquitous. Program states are represented in terms of predicates and functions, and a procedure's effects on the program state are represented as changes to one or more of these predicates and functions. However, the above specification can be interpreted in at least two ways, and multiple implementations with different effects can comply with it. For instance, one possible implementation results in exactly two changes to the program state, as required by the postcondition and as intuitively expected. Another implementation makes these two changes, but additionally changes the ownership of the two accounts involved in the transaction. The postcondition still holds after executing the second version of the procedure. However, the intuitive interpretation of the given specification, namely that nothing but the amount of money in the two accounts changes, is inconsistent with the second implementation, which does more than is necessary, and indeed more than is desired. In order to prevent such situations, the postcondition for the transferAmount(ownerId, id1, id2, amount) procedure would also have to include conditions such as

∀ id, owner(id)′ = owner(id) ∧ owner(id2)′ = owner(id2) ∧

∀ id, id ≠ id1 ⇒ id ≠ id2 ⇒ availableAmount(id)′ = availableAmount(id)

In other words, the postcondition should include not only information about what changes, but also about what does not change. While this might not seem dramatic for the trivial example illustrated above, in real-world examples it quickly escalates, leading to the necessity of specifying a plethora of conditions of the same type as the ones indicated above. These are called frame properties. Writing such conditions is necessary, but also notoriously repetitive and tedious. Kogtenkov et al. (Kogtenkov, Meyer, and Velder 2015) rightly state that
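The gap between the postcondition and the frame properties can be illustrated with a minimal sketch. The dictionary-based state encoding and the names `transfer`, `transfer_rogue` and `frame_condition` are hypothetical, introduced only for this example; the rogue variant plays the role of the second implementation discussed above.

```python
def postcondition(pre, post, id1, id2, amount):
    """The two balance changes required by the specification."""
    return (post["available"][id1] == pre["available"][id1] - amount
            and post["available"][id2] == pre["available"][id2] + amount)


def frame_condition(pre, post, id1, id2):
    """Ownership is unchanged, and so is every balance other than id1 and id2."""
    owners_kept = post["owner"] == pre["owner"]
    others_kept = all(post["available"][i] == v
                      for i, v in pre["available"].items()
                      if i != id1 and i != id2)
    return owners_kept and others_kept


def transfer(pre, id1, id2, amount):
    """Intended implementation: exactly two balances change."""
    available = dict(pre["available"])
    available[id1] -= amount
    available[id2] += amount
    return {"owner": pre["owner"], "available": available}


def transfer_rogue(pre, id1, id2, amount, new_owner):
    """Same balance changes, but it also reassigns both accounts."""
    post = transfer(pre, id1, id2, amount)
    owner = dict(pre["owner"])
    owner[id1] = owner[id2] = new_owner
    return {"owner": owner, "available": post["available"]}
```

Both functions satisfy `postcondition`, but only `transfer` satisfies `frame_condition`, so the unwanted behaviour is ruled out only once the frame properties are stated explicitly.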

“It is hard enough to convince programmers to state what their program does; forcing them, in addition, to specify all that it does not do may be a tough sell.”

The tedious, undeserved manual effort entailed by the specification and verification of frame properties is a manifestation of the frame problem. Though certain conventions and approaches, such as the implicit frames approach for specifying frame properties, can alleviate the manual effort imposed, some manifestation of the frame problem will be visible to some extent in the context of any specification language and verification method.

1.3 Prove & Run: Objectives and Products

The proliferation of mobile devices with unprecedented processing power, storage capacity and access to information has already generated a plethora of new possibilities for billions of people. Breakthroughs in emerging technology, stemming from fields such as artificial intelligence and the Internet of Things, have increased the number of such possibilities, but have also brought forth an unprecedented number of massive security risks and challenges. Prove & Run's² objective is to offer solutions for the security challenges entailed by the large-scale deployment of mobile and connected devices and of the Internet of Things.

Attempts at addressing security challenges and diminishing or eliminating potential security issues in systems linked to such devices must put the underlying operating systems and kernels at the core of their efforts to ensure the absence of errors or faulty behaviours. Any software running on an operating system depends on that operating system. Furthermore, operating systems run in privileged modes, in which protection from certain faulty behaviours is non-existent and bugs can lead to arbitrary effects. Therefore, these central software parts need to provide a high level of trust and demonstrate proven and auditable compliance with security properties.

Motivated by the desire to integrate the usage of formal methods into the industrial world, and therefore to contribute to the increase of software quality and security, the company's initial efforts concentrated on offering a reliable software solution that facilitates the formalization of software functioning and mathematically proves that this software accurately and correctly follows its specification and ensures complex security properties. This led to the development of ProvenTools, a software development toolchain designed to write and formally prove models written in Smart, Prove & Run's purely functional, unified programming and specification language. For formally proving models written in Smart, ProvenTools integrates an interactive proof assistant, which automates simple proofs and guides or assists users during more complex ones. The prover was designed to offer detailed explanations about its results, providing either the reasoning steps employed for achieved proofs or detailed information for properties that cannot be proven. Such transparency on the prover's side is imperative for products that have to be certified, as auditors need to be able to verify the claims of the prover. Furthermore, ProvenTools includes a generator for transforming programs modeled in Smart into their equivalents in other languages such as C, while leveraging the proof guarantees of the Smart model.

Following the development of ProvenTools, Prove & Run reached a new stage, concentrating on developing and providing formally proven microkernels and hypervisors. Unlike the widely used operating systems, which are enormous and typically have millions of lines of code, microkernels are compact, minimal software systems that provide all the mechanisms that need to run in privileged mode, including low-level address space management, thread management and inter-process communication. They can be used for creating a protected, secure environment on the execution platform, on top of which sensitive, security-critical services can run. Being much smaller than traditional operating systems, they are amenable to formal verification. Hypervisors, or virtualization platforms, create and host virtual machines. They make it possible to run multiple different operating systems, whose execution is managed by the hypervisor, which has full control over all critical resources such as the memory or the CPU. Therefore, any security issue of the hypervisor impacts every operating system it hosts. The security and reliability of the host hypervisor is thus crucial.

² Prove & Run website: http://www.provenrun.com

By employing Smart and ProvenTools, two microkernels have been developed³. The first, named ProvenCore, is a formally proven general-purpose microkernel that ensures isolation, i.e. integrity and confidentiality. The second, named ProvenCore-M, targets embedded devices based on microcontrollers. ProvenVisor is a hypervisor currently in development at Prove & Run.

1.4 Context and Problem Statement

During the development of ProvenCore, it became obvious that the specification and verification of transition systems in general, and of operating systems in particular, are not immune to the frame problem. Such systems are characterized by complex states defined by algebraic data types and associative arrays, which are fundamental building blocks for representing, grouping and handling complex data efficiently. Transitions, their other characteristic component, map such a complex input state to an output state. However, most transitions are rarely concerned with the entire input state that they manipulate to produce the output state. Most frequently they depend on and modify only a limited subset of it. Intuitively, properties holding for the input state should hold for the output state following the transition as well, as long as they depend only on fragments of the state that are not modified by the transition. In practice, proving the preservation of such properties does not come for free and imposes considerable manual effort and a multitude of tedious, repetitive proofs.

Figure 1.1 – Complex Transition Systems: Frame Problem (diagram labels: states s and t, transition f, Observation)

³ Prove & Run products: http://www.provenrun.com/products


This general case is illustrated in Figure 1.1, where a transition system and a state s in it are considered. For the state s, a property depending only on a limited subset, shown in the grey rectangle with vertical lines, is known to hold. A transition f leads to a new state t, obtained by modifying only a small part of the input state s, shown in the orange rectangles with inclined lines. Since the previously proven property is known to depend only on an unmodified subset of the state, we should be able to infer the preservation of the property for the state t as well. This, however, is not inferred by default.

The goal of this work is to address this issue and to find an automatic solution for inferring the preservation of such properties. More specifically, we target the automatic inference of the preservation of properties that depend only on an input subset that is disjoint from an operation's frame, i.e., the state subset it modifies.

To this end, we propose a solution based on static analysis, which does not require any additional frame annotations. We argue that by detecting the subset on which a property depends, and by uncovering the part that is not modified by an operation, as shown in Figure 1.2, we can automatically discharge proof obligations related to unmodified parts. We employ two different static analyses for this goal.

Figure 1.2 – Frame Problem and Solution Strategy (panel labels: Dependency, Correlation, Invariant)

The first analysis of our two-step strategy is a dependency analysis, which is meant to detect the input subset δ on which the outcome of an operation or of a logical property L relies. This was illustrated by the grey rectangle with vertical lines in Figure 1.1. The second one is a correlation analysis, meant to detect the subset ξ modified by an operation O. This was illustrated by the orange rectangles with inclined lines in Figure 1.1. By employing these two static analyses, thus detecting δ and ξ automatically, and by subsequently reasoning based on their combined results, we can infer the preservation of the property L for the post-state of O.
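As an informal illustration of this reasoning step, the following Python sketch treats δ and ξ as sets of field paths and reports preservation when no path in δ overlaps a path in ξ. The path encoding and the names `overlaps` and `preserved_by_framing` are assumptions made for this example; they are not the abstract domains developed in Chapters 5 and 7.

```python
from typing import Iterable, Tuple

# Hypothetical encoding: a path names a (sub)element of the state,
# e.g. ("procs", "owner") for the field `owner` inside `procs`.
Path = Tuple[str, ...]

def overlaps(p: Path, q: Path) -> bool:
    """Two paths overlap when one is a prefix of the other."""
    n = min(len(p), len(q))
    return p[:n] == q[:n]

def preserved_by_framing(delta: Iterable[Path], xi: Iterable[Path]) -> bool:
    """True when nothing the property depends on (delta) may be modified (xi)."""
    xi = list(xi)
    return not any(overlaps(d, m) for d in delta for m in xi)

# The property reads only procs.owner; the operation writes procs.memory and clock.
delta = [("procs", "owner")]
xi = [("procs", "memory"), ("clock",)]
fine_grained = preserved_by_framing(delta, xi)

# A property reading all of `procs` overlaps the write to procs.memory.
coarse_grained = preserved_by_framing([("procs",)], xi)
```

The check is deliberately conservative: when it fails, the property may still be preserved, but this cannot be concluded from δ and ξ alone.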

We target the development of a proof tactic that relies on our solution based on static analysis and that is meant to be integrated into the interactive proof assistant offered by ProvenTools. Smart, the language to which the ProvenTools toolchain is associated, is a purely functional language manipulating immutable algebraic data structures and associative arrays.


The motivation and ideas behind this work were triggered by the verification of ProvenCore. Its proof is based on multiple refinements between successive models, from the most abstract, on which the isolation property is defined and proven, to the most concrete, i.e., the actual model used for code generation. The global states of the abstract layers are complex structures with multiple compound fields. Commands such as fork, exec, and exit can be executed. Each of these receives as input the global state before executing the command and returns the state of the system after execution. In practice, most supported commands effectively affect only a limited number of invariants. Automatically proving the preservation of unaffected invariants can diminish the total number of proof obligations.

1.5 Contributions and Structure of the Document

We propose an approach for automatically inferring the preservation of framing-related invariants, which is meant to be used in the context of an interactive theorem prover. Our approach employs two different static analyses, namely a dependency analysis and a correlation analysis. Both analyses handle associative arrays and algebraic data types, i.e., structures and variants, and compute fine-grained results mirroring the layered structures of such types.

The dependency analysis handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural analysis. For variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario.

In order to introduce a relaxed form of context-sensitivity for our dependency analysis, we have devised an extension based on symbolic paths.

The correlation analysis detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. It is an interprocedural analysis that summarises the behaviour of functions and detects what is modified and to what extent.

For both analyses, a prototype has been implemented and applied to a medium-sized functional specification of a microkernel.

The rest of this dissertation is structured into 8 chapters, the first two being introductory.

Chapter 2 discusses the manifestations and effects of the frame problem on both formal specification and formal verification, and presents some of the main approaches employed for addressing them. We also include a brief presentation of some of the leading specification languages and deductive verification tools and their mechanisms for dealing with frame properties.

In Chapter 3 we introduce the features and the syntax of Smart, the unified programming and specification language developed at Prove & Run, and give a concise overview of ProvenTools, the toolchain associated with it.


After these two preliminary chapters, in Chapter 4 we focus on the computational version of Smart's intermediate language, as it is the language that we consider throughout the rest of this dissertation. We present its syntax, underline its specificities and present its formal semantics.

Chapter 5 is dedicated to the dependency analysis, the first of the two static analyses that we have developed and designed as companion tools to be used during interactive program verification. We present our abstract dependency domain that mirrors the layered structure of associative arrays and algebraic data types, discuss the analysis at an intra- and interprocedural level, and present the semantic interpretations of the computed dependency information.

Chapter 6 touches upon the issue of context-sensitivity and presents our extension to the dependency analysis presented in Chapter 5. This is meant to eliminate some imprecision by introducing a relaxed form of context-sensitivity.

The correlation analysis, the second component of our strategy for inferring the preservation of frame-related invariants, is presented in Chapter 7. We introduce our abstract partial equivalence type, discuss the need for an additional level of abstraction allowing us to refer not only to variables but also to substructures within them, and give an in-depth presentation of the analysis at an intraprocedural level and a description of it at the interprocedural level.

The implementations of our two analyses and the results obtained on a medium-sized functional specification of a microkernel are presented in Chapter 8. The strategy for employing the information computed by the two analyses is discussed and illustrated.

Finally, Chapter 9 concludes this dissertation with a summary of our contributions and some remarks concerning the specificities of each of our static analyses, as well as our experience with their design and implementation. In addition, we also discuss future perspectives and potential extensions to this work.

Notes about Chapter 5 and Chapter 7

• The work presented in Chapter 5 was the subject of a publication in the proceedings of the 17th International Conference on Formal Engineering Methods (ICFEM'15) (Andreescu, Jensen, and Lescuyer 2015).

• The work presented in Chapter 7 was the subject of a publication in the proceedings of the 14th International Conference on Software Engineering and Formal Methods (SEFM) (Andreescu, Jensen, and Lescuyer 2016).

• On-line dedicated web pages. The prototypes for each of the two discussed static analyses can be tested on their dedicated web pages. Various examples are provided and explained and, additionally, users can devise and test their own examples. The corresponding links are indicated in the chapters.


Chapter 2

The Frame Problem in Software Verification

All his successors gone before him have done't, and all his ancestors that come after him may.

William Shakespeare

In this chapter, in Section 2.1, we give a very brief, necessarily incomplete presentation of some of the major existing specification languages and verification tools, focusing on those which have addressed the frame problem explicitly and which are relevant for our discussion in the section following it. We then discuss the manifestations of the frame problem in formal specification and verification in Section 2.2 and present the basic approaches to specifying and verifying frame properties in Section 2.3. In Section 2.4 we explain some of the difficulties entailed by these goals when combined with other concerns, such as considerations regarding heap modifications and information hiding. Even though we are not concerned with information hiding, and heap modifications are beyond the scope of our work, there are some parallels that can be drawn and some ideas stemming from work that has been done in these areas that are relevant for our context and solution as well. In Section 2.5 we briefly present other approaches to the automatic detection of frame properties. Finally, we give a short overview of some of the approaches used for specifying and reasoning about pure methods in Section 2.6.

2.1 Specification Languages and Verification Tools

Dafny. Dafny (Leino 2010) is a programming language designed at Microsoft with a focus on verification. It is an imperative, sequential language supporting generic classes, dynamic allocation and inductive data types. It also offers built-in specification constructs such as pre- and postconditions, frame specifications (which we will discuss in more detail in Section 2.3), quantifiers, loop invariants and termination metrics (decreases clauses, used in conjunction with loop invariants). These are reminiscent of contracts in Eiffel (Meyer 1997; Meyer 1991) or similar constructs in JML (Leavens, Baker, and Ruby 2006) and Spec# (Barnett et al. 2005b), which we will present in the following paragraphs as well. Additionally, Dafny includes support for algebraic data types, recursive functions and types, as well as updatable ghost variables, which are not allowed to flow into non-ghost variables. Ghost variables, and specification constructs in general, are eliminated from the executable code, as they are meant to be used strictly during verification. For framing, Dafny relies on dynamic frames (Kassios 2006) using ghost variables. We will discuss this approach in Section 2.4.

Dafny has an accompanying static program verifier, run as part of the compiler, which targets the verification of functional correctness properties of programs. This is built on top of the Boogie verification engine (Barnett et al. 2005a), which in turn uses Z3 (Moura and Bjørner 2008). The Dafny compiler translates verified programs written in Dafny to executable code for the .NET Platform. The tool is open source and can be tried online.¹

Smart, the modeling language developed at Prove & Run, will be presented in detail in Chapter 3. Similar to Dafny, it is a unified programming and specification language, designed with the goal of facilitating verification. Unlike Dafny, Smart is a functional language relying on predicates, the equivalent of functions in other programming languages. Both Dafny and Smart are translated into intermediate languages (Boogie and Smil, respectively), which act as intermediate layers between Dafny or Smart programs and the underlying verification tools. For Smart, the deductive verification tool is an interactive proof assistant. Executable code can be generated from both verified Dafny and verified Smart models.

Spec#. The Spec# programming system (Mike Barnett 2005; Barnett et al. 2005b; Barnett et al. 2011) includes a programming language, a compiler and a static program verifier. It stems from a research effort focusing on the development of a specification methodology for object-oriented languages and seeking suitable approaches for enforcing it, both statically and dynamically. The Spec# methodology introduced some new ideas that influenced the research community and served as a starting point for other approaches (Barnett et al. 2011). It supports sound modular verification of object invariants in the presence of multi-object invariants, subclassing and reentrancy. Spec# led to advances concerning the specification of pure methods, i.e., methods without side-effects, and it introduced an ownership model that allows expressing and using heap topologies in specifications (Barnett et al. 2011). We will discuss the latter in Section 2.4.

The language Spec# is a formal object-oriented language extending the type system of C# with non-null types and checked exceptions. It provides standard method contracts based on pre- and postconditions as well as object invariants, as inspired by Eiffel and the Design by Contract (Meyer 1992) approach. The accompanying compiler performs various static data-flow analyses for checking that the non-null type system is enforced and that contracts are pure, i.e., have no side-effects. In addition, it performs admissibility checks, which are important for soundness and consist in restricting what can appear in object invariants and what pure methods can read. The compiler also emits runtime checks: run-time assertions are generated for the program points at which contracts are supposed to hold, and any failure causes an exception to be thrown (Barnett et al. 2011).

¹ Dafny Web Page: https://www.microsoft.com/en-us/research/project/dafny-a-language-and-program-verifier-for-functional-correctness/ Accessed: 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oE9sn0iL)

Another important contribution having its origins in the Spec# project is the Boogie intermediate language and verification engine. Spec# programs are translated to the Boogie language, where the heap is modeled as a two-dimensional array indexed by object references and field names. Method calls are modeled by assuming their preconditions and type information, by assigning arbitrary values to anything that they might modify, and by subsequently assuming their postconditions. Based on this, verification conditions are generated and expressed in a standard format supported by automatic theorem provers. Any error reported by the theorem prover is mapped back to Boogie and then to Spec# (Barnett et al. 2011).
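The heap model and the treatment of method calls described above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the dictionary keyed by (object reference, field name), the function `havoc` and the sample account data are hypothetical and do not reproduce the actual Boogie encoding.

```python
import itertools

_fresh = itertools.count()  # source of distinct placeholder values

def havoc(heap, may_modify):
    """Copy the heap, replacing each location in may_modify by a fresh unknown."""
    new_heap = dict(heap)
    for loc in may_modify:
        new_heap[loc] = ("unknown", next(_fresh))
    return new_heap

# Heap keyed by (object reference, field name), mirroring the two-dimensional array.
heap = {
    ("acc1", "balance"): 100,
    ("acc1", "owner"): "ann",
    ("acc2", "balance"): 5,
}

# A call that may modify acc1.balance: only that location becomes unknown.
after_call = havoc(heap, {("acc1", "balance")})

# Locations outside the modified set are known to keep their old values.
unchanged = {loc for loc in heap if after_call[loc] == heap[loc]}
```

Everything the caller knows about a havocked location after the call must then come from the callee's postcondition, which is why the set of possibly modified locations matters for framing.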

Spec# has been developed at Microsoft and is publicly available.²

Boogie. The Boogie project³ comprises both an intermediate verification language and a verification tool. The Boogie language (This is Boogie 2; Boogie Reference Manual) is meant to be used as an intermediate representation for static program verifiers of various source languages, such as Dafny, Chalice and Spec#. Verifiers for C, such as VCC and HAVOC, have been built on top of Boogie as well. It supports mathematical (types, constants, functions, axioms) and imperative components (global variables, procedure declarations and implementations). The latter specify sets of execution traces, thereby describing and constraining states using the former. Parametric polymorphism, partial orders, nondeterminism, logical quantifications, total expressions and partial statements are among the language's features.

The Boogie verification tool (Barnett et al. 2005a) infers invariants of the input Boogie programs and then generates verification conditions, expressed as formulae in first-order logic and arithmetic, that are passed to an SMT solver such as Z3. The encoding for the verification formulae allows the reconstruction of error traces from failed proofs.

JML. The Java Modeling Language (JML) (Leavens, Baker, and Ruby 2006; Leavens et al. 2006) is a behavioural interface specification language (Wing 1987) targeting, as its name implies, the specification of Java classes and interfaces. Its design was guided by the syntax and semantics of Java, as some of the main targeted characteristics were understandability and a shallow learning curve for programmers already familiar with Java. The constructs it supports are inspired by the Design by Contract approach as well as by the Larch family of specification languages (Guttag, Horning, and Wing 1985). It also includes quantifiers, constructs for specifying frame conditions, and specification-only fields and methods.

² Spec# Web Page: https://www.microsoft.com/en-us/research/project/spec/ Accessed: 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAJnY8b)

³ Boogie Web Page: https://www.microsoft.com/en-us/research/project/boogie-an-intermediate-verification-language/ Accessed: 2017-02-12. (Archived by WebCite® at http://www.webcitation.org/6oEAgwOzp)

Nowadays, an ever-growing variety of tools supports JML (Burdy et al. 2005), ranging from tools for type-checking specifications (the jmlc compiler) to tools for runtime debugging, static analysis (such as ESC/Java2 (Flanagan et al. 2002; Burdy et al. 2005; Chalin et al. 2005) and Chase) and verification (such as LOOP, KeY and KRAKATOA).

ESC/Java2 performs extended static checking (Flanagan et al. 2002) for Java programs annotated with specifications written in JML. It can check assertions and detect frequent types of errors in Java, such as dereferencing null or indexing an array outside its bounds. However, the ESC/Java2 tool did not initially address aspects related to checking frame conditions, and this became a notorious source of unsoundness (Burdy et al. 2005). Various static verification tools (Berg and Jacobs 2001; Cataño and Huisman 2003; Marché, Paulin-Mohring, and Urbain 2004; Marché 2016) and dynamic approaches (Lehner and Müller 2010) addressed this issue.

2.2 Manifestations of the Frame Problem

In the realm of software verification, the frame problem refers to establishing the boundaries within which program elements operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties or frame conditions, which indicate which parts of the program state an operation is allowed to modify, and their verification, i.e., proving that operations modify only what is allowed according to the specified frame properties. Additionally, the verification of frame properties has other ramifications, such as proving the preservation of properties concerning parts of the state that are external to an operation's frame, i.e., the parts of the state modified by the operation.

Though identified decades ago, in 1969, in the context of Artificial Intelligence (McCarthy and Hayes 1969), the frame problem is still a current concern in the field of formal specification and verification. Leavens et al. (Leavens, Leino, and Müller 2007) identify it as one of the difficult remaining challenges in program verification. Even more recently, Bertrand Meyer described it as a subsisting problem (Meyer 2015). He argues that it constitutes an excellent candidate for automation and describes the usual approaches to the frame problem, such as those frequently based on separation logic (Reynolds 2005) or ownership types (Clarke, Potter, and Noble 1998), as elegant but requiring undeserved manual specification effort, in addition to annotations on the implementation side. In order to make verification appealing to a wider audience in the industry, the amount of annotation required from the programmers is of the utmost importance and thus must be carefully taken into consideration when devising a solution. While it is legitimate to require the specification of properties expressing the functional behaviour expected of program elements, intermediate properties, to which frame properties belong, should, as much as possible, be detected automatically. They are an integral part of a complete specification and they are necessary for proving functional correctness, but in practical terms they are repetitive and cumbersome, and their specification is an inconvenience (Meyer 2015).

Borgida et al. provide a comprehensive discussion of the problem itself and the approaches to addressing it (Borgida, Mylopoulos, and Reiter 1993; Borgida, Mylopoulos, and Reiter 1995). In (Borgida, Mylopoulos, and Reiter 1995), Borgida et al. suggest grouping the permissions to modify variables around variables themselves instead of methods. However, this type of specification has unclear semantics in terms of proof obligations (Müller 2002). A more recent discussion of framing is provided by Hatcliff et al., and it is included in a comprehensive survey of behavioural interface specification languages (Hatcliff et al. 2012). A discussion regarding the remaining challenges related to the frame problem, with a focus on modular verification and information hiding, is included in (Leavens, Leino, and Müller 2007). The authors discuss possible approaches for addressing these challenges as well as their respective limitations. In the following section, we present the main existing approaches to specifying frame properties.

We remark that Smart does not provide any explicit specification constructs for frame conditions. It is a functional language and it does not support global variables or destructive updates. Implicitly, Smart predicates may read anything passed to them as an input, without modifying it, and write everything in their output or locally declared variables. The preservation of a frame property, i.e., a logical property depending only on parts of the input that are copied without any modification to the output, can be specified as an implication of the form

frame_property(input) ⟹ predicate(input, output) ⟹ frame_property(output)

which can be included either in the predicate's postcondition or as a separate predicate with a Boolean result, receiving the predicate's input and output elements as inputs.
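To make the shape of this implication concrete, here is a small Python analogue. It is a sketch, not Smart code: the `State` type, the `deposit` operation and the `frame_property` check are hypothetical, and the relational `predicate(input, output)` of the formula is rendered as a function that computes the output from the input.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    """Hypothetical immutable state with two fields."""
    owner: str
    balance: int

def frame_property(s: State) -> bool:
    # Depends only on the `owner` field.
    return s.owner != ""

def deposit(s: State, amount: int) -> State:
    # Copies `owner` unchanged and rewrites only `balance`.
    return replace(s, balance=s.balance + amount)

def preservation_holds(s: State, amount: int) -> bool:
    out = deposit(s, amount)
    # frame_property(input) implies frame_property(output)
    return (not frame_property(s)) or frame_property(out)

ok_for_valid_input = preservation_holds(State("ann", 10), 5)
ok_vacuously = preservation_holds(State("", 0), 1)
```

Testing the implication on sample states is of course weaker than proving it; the point of the analyses presented later is that such an implication follows for all states once the property's dependencies and the operation's modifications are known to be disjoint.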

2.3 Approaches to Specifying Frame Properties

Various approaches for expressing frame properties have emerged. These are known as the manual, exclusive and implicit approaches (Meyer 2015). We remark that all three major approaches target only the specification of write effects of an operation. Most specification languages do not offer special constructs for the specification of read effects (some notable exceptions are JML, Dafny and WhyML, the programming and specification language provided by Why3).

2.3.1 The Manual Approach

One of the existing approaches to specifying frame properties does not rely on any specific technique, but instead treats them like any other specification component. This consists in explicitly stating, for each operation, what is not modified, implicitly conveying that everything else may change. This type of specification can be done with logical variables or with old expressions, by explicitly stating for each unchanged variable that its value in the operation's post-state is equal to its prior value in the operation's pre-state.

As described by McCarthy and Hayes (McCarthy and Hayes 1969), with m operations, such as transfer, and n “fluents”, such as owner in our introductory example from Section 1.2, the manual convention leads to a proliferation of clauses that need to be specified. Their number can potentially be as high as m × n. This can prove to be tedious and repetitive, and it diverts attention and effort from what is truly interesting: what is actually modified by the operation, and how. Moreover, this approach can lead to instability in the software process (Meyer 2015).

For instance, adding new fields to a class whose existing methods are not affected by the newly added fields requires modifying the postcondition of each existing method and adding clauses of the form newField = old newField for each added field.
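The growth in the number of clauses, and the instability caused by adding a field, can be illustrated with a short Python sketch. The operation names, fluent names and the helper `manual_frame_clauses` are hypothetical and serve only to count the clauses that the manual convention would require.

```python
def manual_frame_clauses(fluents, modified_by):
    """For each operation, list one `x = old(x)` clause per fluent it leaves untouched."""
    return {
        op: [f"{f} = old({f})" for f in fluents if f not in modified]
        for op, modified in modified_by.items()
    }

fluents = ["owner", "balance", "limit"]
modified_by = {"transfer": {"balance"}, "rename": {"owner"}}

clauses = manual_frame_clauses(fluents, modified_by)
total = sum(len(c) for c in clauses.values())

# Adding one field that no existing operation touches adds one clause per operation.
clauses_after = manual_frame_clauses(fluents + ["currency"], modified_by)
total_after = sum(len(c) for c in clauses_after.values())
```

With m = 2 operations and n = 3 fluents, 4 of the possible 6 clauses are needed; adding a fourth fluent raises the count to 6, and every existing postcondition has to be edited.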

Both Dafny (Leino 2010) and Spec# (Leino and Müller 2008a) support clauses of the form e = old(e) in method postconditions for specifying that a method has no impact on the value of an expression e. However, these are not the primary mechanisms for specifying frames in either Dafny or Spec#, as we will discuss in Section 2.3.2.

In Smart, for predicates manipulating inputs and outputs of the same structured type, it can be specified in the postcondition that the values of certain fields are equal between the received input and the obtained output. For instance, for a predicate receiving an input structure of type stype having fields f, g, h, and returning an output structure of the same type where the values of the fields f, h are equal to their values in the input, a standard postcondition would have the following form:

stypeequals[f, h](input, output)

This can be viewed as a form of old expressions. However, the construct used in the above postcondition, which we will discuss in Chapter 3, was not introduced specifically for this purpose. This idiom is frequently employed for specifying contracts for implicit predicates, a form of foreign or native function signatures.

As we will discuss in Chapter 7, the fine-grained relations that we are detecting between parts of the input and parts of the output can be seen as clauses of the form subvalue = old(subvalue). However, in our case, these are detected automatically by means of static analysis and thus do not require any annotation or manual effort. Furthermore, by detecting them automatically, the potential of changes to the modeled entities and types leading to instability is eliminated.

Another problem with this approach becomes visible when some variables are not in scope and hence cannot be explicitly mentioned in the specification (Hatcliff et al. 2012). In order to overcome the problem in this context, complex solutions (Reynolds 1981; O'Hearn, Reynolds, and Yang 2001; Banerjee, Naumann, and Rosenberg 2008) based on Hoare-logic-style frame rules (Hoare 1971) have been suggested (Hatcliff et al. 2012).


2.3.2 The Exclusive Approach

The most frequent approach to framing is the exclusive approach. This consists in expressing frame properties by means of modifies clauses that list all the variables that may be modified by an operation. Implicitly, everything that is not listed in such clauses is understood as having to remain unchanged (Guttag et al. 1993a). This approach relies on the observation that the m × n matrix described by McCarthy and Hayes is usually sparse, as most operations affect only a limited number of elements (Meyer 2015).

Modifies clauses such as modifies a, b, c can be interpreted as a set of clauses of the form q = old(q) for any q other than a, b or c. Despite their widely accepted, yet mildly misleading name, modifies clauses do not require a command to modify all the listed elements. Essentially, modifies clauses put an upper bound on the set of elements that can be modified and imply that it is strictly forbidden to modify anything else. The exclusive approach to specifying frame properties owes its name to its characteristic of identifying unaffected elements by exclusion (Meyer 2015). Bertrand Meyer argues that a more appropriate name for such clauses is only clauses (Meyer 2015), since the main goal is not necessarily to enumerate variables that will change, but rather to specify that everything else, i.e., variables that are not listed, will not change.
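Read this way, a modifies clause is a check on pairs of pre- and post-states. The following Python sketch expresses that reading; the dictionary-based states and the function name `respects_modifies` are assumptions made for illustration and do not correspond to any particular verifier.

```python
def respects_modifies(pre, post, modifies):
    """True when q = old(q) holds for every q that is not listed in `modifies`."""
    return all(post[q] == pre[q] for q in pre if q not in modifies)

pre = {"a": 1, "b": 2, "c": 3, "d": 4}
allowed = {"a", "b", "c"}

# Only `a` changes: allowed, since the clause is an upper bound, not an obligation.
only_a_changed = respects_modifies(pre, {**pre, "a": 10}, allowed)

# Nothing changes at all: also allowed.
nothing_changed = respects_modifies(pre, dict(pre), allowed)

# `d` changes although it is not listed: the frame condition is violated.
d_changed = respects_modifies(pre, {**pre, "d": 0}, allowed)
```

The second case shows why only clauses would be the more accurate name: listing an element never forces it to change.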

This approach has its roots in the modifies construct presented by Liskov and Guttag (Liskov and Guttag 1986). Forms of modifies clauses have been used in many different specification languages, including the Larch family (Guttag, Horning, and Wing 1985; Guttag et al. 1993a), JML (Leavens et al. 2006), Spec# (Mike Barnett 2005), Dafny (Leino 2010) and Z (Abrial, Schuman, and Meyer 1980).

In JML (Leavens, Baker, and Ruby 2006), modifies clauses are called assignable clauses and are used for indicating locations that a method may assign to. These are slightly different from classical modifies clauses in other languages. For instance, a method assigning to a location a and then re-establishing its original value is required to list a in its corresponding assignable clause. A typical modifies clause, however, does not require listing a, since the method does not modify a effectively. JML also features conditional modifies clauses, allowing methods to specify that a modification may occur only in certain situations. Non-pure methods that do not explicitly specify assignable clauses are by default given an assignable \everything clause. Pure methods have by default an assignable \nothing clause (Chalin et al. 2005). Additionally, JML provides accessible clauses that allow specifying accessed locations (Leavens et al. 2006).
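The difference between the locations a method assigns to and the locations it effectively modifies can be observed with a small Python sketch. The class `TrackedState` and the function `bump_and_restore` are hypothetical; they only mimic the distinction and are unrelated to how JML tools check assignable clauses.

```python
class TrackedState(dict):
    """Dictionary that records every key assigned to after construction."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.assigned = set()

    def __setitem__(self, key, value):
        self.assigned.add(key)
        super().__setitem__(key, value)

def bump_and_restore(state):
    state["a"] = state["a"] + 1   # temporary modification of a
    state["a"] = state["a"] - 1   # original value re-established
    state["b"] = 0                # effective modification of b

pre = {"a": 1, "b": 2, "c": 3}
state = TrackedState(pre)
bump_and_restore(state)

assigned = state.assigned                          # what an assignable clause must list
changed = {k for k in pre if state[k] != pre[k]}   # what a classical modifies clause must list
```

Here `assigned` contains both a and b, whereas `changed` contains only b, which matches the example of the method that assigns to a and then restores it.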

In Dafny (Leino 2010), modifies clauses are expressed by sets of objects, and they must be interpreted as giving permissions to a method to modify any field of any object that is a member of the specified set. Frame conditions are thus expressed at the level of objects and not at the level of object fields. While Dafny methods are not required to specify what they read, for Dafny predicates, i.e., functions returning Booleans, reading frame conditions can also be specified (Koenig and Leino 2012). These are memory locations that predicates are allowed to read, and they can be specified as sets of objects or object fields. Dafny checks that memory locations outside the reading frame are not accessed: nested predicate calls must have reading frames that are included in the reading frames of the calling predicate. Predicate parameters are not memory locations and hence need not be declared. In addition, Dafny uses a form of dynamic frames (Kassios 2006) that we will present in Section 2.4.
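The inclusion requirement on reading frames of nested calls amounts to a subset check. The Python sketch below illustrates it with hypothetical predicate names and frames; it is not Dafny's checker, which works on memory locations in the heap rather than on plain sets of names.

```python
def reading_frames_ok(reads, calls):
    """Check that each callee's reading frame is included in its caller's."""
    return all(
        reads[callee] <= reads[caller]
        for caller, callees in calls.items()
        for callee in callees
    )

# valid_bank calls valid_account and may read everything valid_account reads.
reads = {"valid_bank": {"bank", "acc"}, "valid_account": {"acc"}}
calls = {"valid_bank": ["valid_account"]}
accepted = reading_frames_ok(reads, calls)

# Here the callee q reads y, which the caller p is not allowed to read.
rejected = reading_frames_ok({"p": {"x"}, "q": {"x", "y"}}, {"p": ["q"]})
```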

In Spec# (Mike Barnett 2005; Leino and Müller 2008a), modifies clauses can be explicitly added for constraining the modification of objects that were allocated in the pre-state of a method, i.e., new objects allocated and modified by a method need not be included in the modifies clauses. Methods can specify that any field of an object o may be modified with a construct of the form o.*; it can also be specified that only some field a may be modified, with a construct of the form o.a. Unlike the clauses expressed using old in postconditions for excluding some modifications, modifies clauses must account for temporary modifications as well (similarly, thus, to the JML assignable clause interpretation). For instance, for a method decrementing some integer field f and incrementing it subsequently, the method could still specify that f = old(f) in its postconditions. However, it would also have to include f in its modifies clause.

Spec# implicitly adds a modifies clause to methods, in which this.* is the only listed element. Thus, by default, methods are allowed to modify any field of the this object. To prevent this, the fields that may be modified must be explicitly included in the clause (meaning that those not included are not allowed to change). A special construct of the form this.0 must be explicitly used for specifying that a method does not modify any field of this (Leino and Müller 2008a).

Information hiding requires mechanisms for abstracting over program state that cannot be explicitly mentioned in the modifies clause of a public method. To this end, wildcards can be used for specifying that the private representations of objects may be modified, as well as for specifying the modification of state in subclasses (Leino and Müller 2008a). However, wildcards do not extend to aggregate objects, and to this end Spec# introduces the notion of ownership, which we will discuss in Section 2.4.

In Boogie, frame conditions are expressed using coarse-grained modifies clauses in conjunction with postconditions. These can quantify over fields and specify locations of the heap that may be modified (This is Boogie 2; Boogie Reference Manual).

SPARK (Barnes and Limited 1997) uses a variation of the typical exclusive approach. SPARK procedures may reference or update the state associated with their parameters, in addition to that of global variables. SPARK contracts must explicitly account for the global variables accessed (read or written) during procedure execution in a globals construct. Additionally, for each parameter or global variable, it must be indicated if it is read only, written only, or both read and written. As SPARK is based on the Ada language, this is done by means of mode annotations such as in and out, indicating that a parameter or global variable is read only or written only, respectively. The in out annotation is used for signaling that the annotated parameter or global variable is both read and written. Together, mode annotations on parameters and globals provide a complete specification of the inputs and outputs of a procedure (Hatcliff et al. 2012). VDM (Jones 1990) provides similar annotations.


The exclusive convention facilitates the specification of pure operations, i.e., operations having no side-effects, on which assertions in various languages, including Eiffel, JML, and Spec#, rely for supporting data abstraction. Specifying that an operation is pure simply amounts to specifying an empty modifies clause. However, specifying and verifying the effects of heap modifications on the results of pure methods has been described as one of the difficult remaining challenges related to framing (Hatcliff et al. 2012).

2.3.3 The Implicit Approach

The implicit approach eliminates the need to specify frame properties per se. One of the implicit approaches relies on limiting what a procedure can modify, based on the procedure's precondition. This approach is adopted in separation logic (discussed in Section 2.4) and in the implicit dynamic frames (Smans, Jacobs, and Piessens 2012) technique, where reading and writing to memory requires knowing that the memory contains that location. To this end, accessibility information is specified in the preconditions of methods. By analysing preconditions, an upper bound on the set of locations that are modifiable by a procedure can be detected. As will be discussed in Chapter 7, our approach to inferring fine-grained modifications can be seen as an implicit one as well. It relies on data-flow analysis and it is entirely automatic, without requiring any dedicated annotations.

Another approach to implicit framing was presented by Meyer. He proposes the inference of frame properties for a method from the method's postcondition (Meyer 2015). This approach relies on the empirical observation that, in practice, when programmers realize that an element is modified by a method's execution, they will generally include and express information about how the element is modified. It was inspired by an informal review of publicly available JML code, which showed that, in practice, elements included in an assignable clause overlap those appearing in the method's postcondition. Meyer argues that any exception to this observation can be easily addressed by inserting a Boolean function into the postcondition, which always returns true and which introduces its elements into the implicit frame (Meyer 2015).

2.4 Topologies and Effects

Specification techniques for complex data structures and operations manipulating them must be able to describe and to address issues related to two different aspects, namely the topology or structure of the former and the effects of the latter on the data structures' state (Hatcliff et al. 2012). In the object-oriented realm, objects encapsulate state and functionality, yet their implementations are rarely limited to the fields and methods of a single object. After all, one of the principles of object-oriented programming is to favour composition over inheritance. Thus, object fields reference other objects, often of different classes, and those objects in turn reference yet other objects, and so on. In order to reason about and to prove functional correctness, specifications have to capture this "composite" shape of the implemented data structures (Leino and Müller 2008a). They also have to describe the effects of operations on the state of the data structures, including write effects, i.e., which parts are potentially modified by an operation, and read effects, i.e., which parts are potentially accessed by an operation (Hatcliff et al. 2012).

For objects and heap data structures, the write and read effects (Greenhouse and Boyland 1999) refer to parts of the heap, i.e., locations. Specifications for heap data structures might also require including allocation and deallocation effects, as well as locking information (Hatcliff et al. 2012). Detecting and reasoning about read and write effects is necessary and relevant in different situations. For instance, Greenhouse and Boyland (Greenhouse and Boyland 1999) present an effects system for performing semantics-preserving program manipulation on Java source code.

Our work is done in the context of a purely functional language with immutable data structures and no destructive updates. Reasoning about the heap is beyond our scope. However, our concerns are similar: we handle "composite" data structures, modeled by immutable associative arrays and algebraic data types, i.e., structures and variants, and we want to capture the behaviour of operations receiving such a composite input, manipulating it, reconstructing it, and returning its new state into a composite output. Thus, in contrast to specification and reasoning techniques for objects, which are concerned with deep-heap effects, we are concerned with deep-state effects.
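The notion of a deep-state effect can be illustrated outside Smart. The following Python sketch is only an analogy: it assumes that frozen dataclasses stand in for immutable structures, and the names Point, Segment, move_start and changed_paths are hypothetical and do not come from the thesis. An operation rebuilds a composite value, and a simple comparison reports which nested fields differ between input and output.

```python
from dataclasses import dataclass, fields, is_dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Segment:
    start: Point
    end: Point


def move_start(seg: Segment, dx: float) -> Segment:
    """Return a new segment; the input value is never mutated."""
    return replace(seg, start=replace(seg.start, x=seg.start.x + dx))


def changed_paths(old, new, prefix=""):
    """List the nested field paths whose values differ between two structures."""
    if is_dataclass(old) and is_dataclass(new) and type(old) is type(new):
        paths = []
        for f in fields(old):
            sub = prefix + "." + f.name if prefix else f.name
            paths += changed_paths(getattr(old, f.name), getattr(new, f.name), sub)
        return paths
    return [] if old == new else [prefix]


before = Segment(Point(0.0, 0.0), Point(1.0, 1.0))
after = move_start(before, 2.0)
print(changed_paths(before, after))  # ['start.x']
```

Note that this comparison is dynamic and only observes one execution, whereas the analyses discussed in this thesis are static; the sketch merely shows what kind of information a fine-grained frame conveys.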

Specification techniques for topologies and effects must address three major challenges, namely abstraction, reasoning, and framing (Hatcliff et al. 2012).

Abstraction. In the object-oriented context, heap properties must be expressed in an implementation-independent manner. Abstraction is important for information hiding and for supporting subtyping (Leino 1998; Leavens and Müller 2007). Aspects related to visibility and information hiding are orthogonal to our work. The language we are working with does not have subtyping. Therefore, disclosing the topology of our data structures is not problematic from this point of view.

Reasoning. The formal framework in which (heap) properties are expressed should allow efficient, ideally automatic, reasoning.

Framing. Specifications of heap operations should ease reasoning about framing and aid in proving that certain heap properties are not affected by a heap operation. Framing can be illustrated by the following rule, expressing that a state that is unmodified by C can be preserved:

$$\frac{\{P\}\; C\; \{Q\}}{\{P \land R\}\; C\; \{Q \land R\}}$$

if the write effect of C is disjoint from the free variables of R. In the presence of complex heap data structures, the disjointness of the effects of C and the assertion R is more difficult to express, as it needs to specify that the locations that are modified by C are disjoint from the locations read by R. Similarly, though not referring to locations, we have to be able to express that the substructures (or subelements) modified by C and those read by R are disjoint.
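As a small worked instance of this rule (the command and assertions are chosen here purely for illustration), take C to be the assignment x := x + 1 and R to be the assertion y = 5. The write effect of C is {x} and the free variables of R are {y}, so the side condition holds and R can be carried across C:

```latex
\[
\frac{\{x = 0\}\;\; x := x + 1\;\; \{x = 1\}}
     {\{x = 0 \land y = 5\}\;\; x := x + 1\;\; \{x = 1 \land y = 5\}}
\]
```

By contrast, an assertion such as R = (x = 0) mentions x, which C writes; the side condition fails and the rule does not apply, which is as expected since x = 0 no longer holds after C.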

The sets of written or read locations are called footprints. Hatcliff et al. classify approaches to the specification of heap properties into three categories. The first category relies on explicit footprints and uses sets of objects or locations that are included in predicates and effects specifications. Dynamic frames (Kassios 2006; Kassios 2011) and region logic (Banerjee, Barnett, and Naumann 2008; Banerjee, Naumann, and Rosenberg 2013) are the main exponents of this category. The second category relies on implicit footprints, which are derived from predicates in specialized logics such as separation logic. The third approach relies on predefined footprints, which are derived from predefined heap topologies (Hatcliff et al. 2012). Ownership types (Clarke, Potter, and Noble 1998) are the main exponent of this category. All of these techniques allow specifying the topologies of common heap data structures and reasoning about the effects of operations. However, each amounts to a different balance between expressiveness and automation (Hatcliff et al. 2012).

2.4.1 Explicit Footprints

The explicit footprint approach to framing was pioneered by Kassios and the dynamic frame theory (Kassios 2006; Kassios 2011). This proposed adding sets of locations to the specification language and expressing footprints in terms of such sets. For preserving information hiding, these sets of locations can involve dynamic frames: specification variables that abstract over a set of locations. The initial solution based on dynamic frames was formalized in the context of an idealized logical framework, using higher-order logic and inductive-based proofs, which are difficult to automate. Subsequent work on region logic (Banerjee, Naumann, and Rosenberg 2008; Banerjee, Barnett, and Naumann 2008; Banerjee, Naumann, and Rosenberg 2013) and the Dafny verifier on one hand, and VeriCool (Smans, Jacobs, and Piessens 2008) on the other hand, developed dynamic frames in a first-order setting.

VeriCool uses pure methods for describing sets of locations. Recursively defined pure methods or logic functions can be a challenge for automatic theorem provers (Hatcliff et al. 2012; Banerjee, Barnett, and Naumann 2008).

In region logic, for minimizing the need for inductively defined predicates in specifications, the specification attributes used in the dynamic frames approach (Kassios 2006) are replaced with ghost state (Banerjee, Naumann, and Rosenberg 2013), i.e., mutable auxiliary fields and variables. Programs have to be explicitly annotated with these, which might imply a cumbersome manual effort, but unlike the dynamic frame theory in its original form, this permits automated theorem proving.

Zee et al. have used explicit footprints for verifying the functional correctness of linked data structures in Jahob (Zee, Kuncak, and Rinard 2008). Banerjee et al. (Banerjee, Naumann, and Rosenberg 2008; Banerjee, Barnett, and Naumann 2008) encoded region logic in the intermediate verification language Boogie (Leino and Rümmer 2010).


The dynamic frames approach using ghost variables is supported by the Dafny language (Leino 2010; Koenig and Leino 2012). As described in Section 2.3.2, Dafny supports the exclusive approach to specifying frames. Ghost variables are used in modifies clauses. The standard idiom consists in declaring a set-valued ghost field, Repr for instance, dynamically maintaining Repr (i.e., explicitly updating it in the code) as the set of objects that are part of the receiver's representation, and using Repr in modifies clauses (Leino 2010):

    class MyClass {
      ghost var Repr: set<object>;
      method SomeMethod()
        modifies Repr;
    }

This modifies clause is to be interpreted as: the method may modify any field of any object in Repr. If this is a member of the Repr set, then the modifies clause also allows the method to modify the field Repr itself (Leino 2010).

With explicit footprints, proving frame properties consists in proving that the read effects of a predicate and the write effects of a method are disjoint.
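The reading of a Repr-based modifies clause and the disjointness check can be mimicked with plain sets. The Python sketch below is an illustration under simplifying assumptions, not an account of how Dafny works internally: the heap is modelled as a dictionary from (object, field) pairs to values, and all function names are hypothetical.

```python
def written_locations(pre, post):
    """Locations (object, field) whose value differs between two heaps."""
    return {loc for loc in pre.keys() | post.keys()
            if pre.get(loc) != post.get(loc)}


def respects_modifies(pre, post, repr_objects):
    """True if every written location belongs to an object in Repr."""
    return all(obj in repr_objects
               for (obj, _field) in written_locations(pre, post))


def frame_preserved(read_footprint, pre, post):
    """True if no location read by a predicate was written."""
    return read_footprint.isdisjoint(written_locations(pre, post))


pre = {("list", "head"): "n1", ("n1", "val"): 7, ("other", "count"): 0}
post = {**pre, ("n1", "val"): 8}          # a method run that writes n1.val only
repr_objects = {"list", "n1"}             # plays the role of Repr

print(respects_modifies(pre, post, repr_objects))        # True
print(frame_preserved({("other", "count")}, pre, post))  # True
```

A verifier establishes these facts statically for all executions; the sketch only checks one pair of pre- and post-states.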

Before the dynamic frame approach, data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002) and solutions based on the Universe type system (Müller 2002) had been proposed for specifying footprints within single objects.

The level of expressiveness offered by techniques based on explicit footprints is very high, allowing specifications to relate different regions in arbitrary ways, ranging from disjointness or inclusion of regions to characterizing their intersection. However, this flexibility complicates reasoning. When regions are stored explicitly in ghost variables, as is done in Dafny, programs need to explicitly update these ghost variables to maintain invariants. This can prove to be a cumbersome task. When pure methods are used, as in VeriCool, it is mandatory to reason explicitly about the effects of heap modifications on their results (Hatcliff et al. 2012).

2.4.2 Implicit Footprints

The implicit footprint approaches rely on specialized logics for implicitly representing footprints. Separation logic (O'Hearn, Reynolds, and Yang 2001; O'Hearn, Yang, and Reynolds 2004; Reynolds 2002; Reynolds 2005; Reynolds 2000) is the most prominent representative of this category.

Separation logic extends Hoare logic (Hoare 1971) with the separating conjunction operator ∗. Each assertion in separation logic defines a portion of the heap. The assertion P ∗ Q is true if and only if P and Q hold for disjoint parts of the heap. Local reasoning is fundamental to separation logic (O'Hearn, Reynolds, and Yang 2001): specifications need to describe all the state that the code C reads or writes. Thus, in the triple {P} C {Q}, P must be interpreted as being all the state that is needed for executing C, i.e., the footprint of C. This interpretation of Hoare triples leads to the following frame rule in separation logic:


$$\frac{\{P\}\; C\; \{Q\}}{\{P \ast R\}\; C\; \{Q \ast R\}}$$

which allows inferring that a local property is preserved for a wider state, obtained by extending P with another disjoint state R. Some versions of separation logic impose additional conditions about local variable modifications, as the ∗ operator only separates heaps. Separation logic can be extended such that ∗ also separates variables, thus eliminating the need for additional conditions (Parkinson, Bornat, and Calcagno 2006).
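The meaning of the separating conjunction can be made concrete with a brute-force model. The following Python sketch assumes a heap is a finite dictionary from locations to values and an assertion is a Boolean function on heaps; the helper names (splits, sep_conj, points_to, store_10) are hypothetical and the enumeration is exponential, so it is meant for intuition only.

```python
from itertools import combinations


def splits(heap):
    """Yield every way to cut a heap into two disjoint sub-heaps."""
    locs = list(heap)
    for k in range(len(locs) + 1):
        for left in combinations(locs, k):
            h1 = {l: heap[l] for l in left}
            h2 = {l: heap[l] for l in locs if l not in left}
            yield h1, h2


def sep_conj(p, q):
    """P * Q: the heap splits into disjoint parts satisfying P and Q."""
    return lambda heap: any(p(h1) and q(h2) for h1, h2 in splits(heap))


def points_to(loc, val):
    """The heap consists of exactly one cell, loc, holding val."""
    return lambda heap: heap == {loc: val}


def store_10(heap, val):
    """A command whose footprint is the single cell at location 10."""
    new_heap = dict(heap)
    new_heap[10] = val
    return new_heap


heap = {10: 1, 20: 2}
print(sep_conj(points_to(10, 1), points_to(20, 2))(heap))     # True
print(sep_conj(points_to(10, 1), points_to(10, 1))({10: 1}))  # False

# Frame rule instance: the cell at 20 (R) is untouched by store_10.
print(sep_conj(points_to(10, 5), points_to(20, 2))(store_10(heap, 5)))  # True
```

The second check fails because one cell cannot be placed in both halves of a split, which is exactly the disjointness that ∗ enforces.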

A separation logic for Java was introduced by Parkinson (Parkinson and Bierman 2005). This has primitive assertions to describe the values of fields in the heap and allows describing portions of the heap containing several disjoint objects using the ∗ operator.

Separation logic does not require explicitly specifying read or write effects. They are implicit in a method's precondition. Data structures are specified using logic functions. By including such a logic function in a method's precondition, the method is allowed to read and write anything belonging to the footprint of the logic function, but cannot access anything outside this footprint.

Approaches based on separation logic are hard to implement and to integrate into verification tools. Verifiers based on separation logic have mostly relied on symbolic execution and have not yet achieved the same level of automation as verifiers based on verification condition generation (Hatcliff et al. 2012). However, a series of tools currently exist that can reason using separation logic. These include Smallfoot (Berdine, Calcagno, and O'Hearn 2005; Berdine, Calcagno, and O'Hearn 2012), SpaceInvader (Distefano, O'Hearn, and Yang 2006; Calcagno et al. 2008), jStar (Distefano and Parkinson 2008; Naudziuniene et al. 2011), VeriFast (Jacobs, Smans, and Piessens 2010; Jacobs et al. 2011), and SLAyer (Berdine, Cook, and Ishtiaq 2011).

The implicit dynamic frames approach (Smans, Jacobs, and Piessens 2012) unifies the dynamic frames concept with separation logic. Framing specifications of a method are inferred using an implicit approach, as described in Section 2.3.3. They are encoded in first-order logic and can be used for automatic verification with SMT solvers. This is done in VeriCool (Smans, Jacobs, and Piessens 2008) and Chalice (Leino, Müller, and Smans 2009).

2.4.3 Predefined Footprints

In contrast to the implicit and explicit footprint approaches, which describe properties found in a program, the third approach focuses on reasoning efficiently about programs with restricted topologies. Ownership types (Clarke, Potter, and Noble 1998) are representative of this approach.

Ownership types typically enforce a tree topology, whereby every object in the heap has at most one owner object and the owner relation is acyclic. Topological properties beyond this tree structure have to be expressed using object invariants and predicate logic. Read and write effects typically use ownership as an abstraction mechanism: the right to read or write an object includes the right to read or write all the objects it (transitively) owns (Hatcliff et al. 2012).

Spec# addresses framing through ownership types: without explicit specifications stating otherwise (modifies clauses of the form presented in Section 2.3.2), methods may modify only the fields of the receiver and of those objects within the subtree of which the receiver is the root. Ownership is expressed by means of attributes on field declarations (Barnett et al. 2004; Barnett et al. 2011).

Ownership has been used to verify write effects (Müller, Poetzsch-Heffter, and Leavens 2003) and invariants (Drossopoulou et al. 2008; Leino and Müller 2004; Müller, Poetzsch-Heffter, and Leavens 2006). All the existing ownership-based verification techniques enforce that all modifications of an object must be initiated by the object's owner. This gives owners total control over modifications of their internal representations and allows them to maintain invariants (Hatcliff et al. 2012). Ownership-based approaches have been used for reasoning about model fields (Leino and Müller 2006) and for enforcing object immutability (Leino, Müller, and Wallenburg 2008).

The ownership topology can be enforced by type systems (Lu, Potter, and Xue 2007; Müller 2002). In JML, it is enforced through universe types (Dietl and Müller 2005). In Spec#, it is encoded as object invariants (Barnett et al. 2004).

Reasoning about framing relies on the tree structure on the heap enforced by ownership. The ownership trees rooted in two different objects o1 and o2 are disjoint if neither o1 owns o2 nor o2 owns o1. The disjointness of ownership trees can then be used to prove that read and write effects of methods do not overlap (Hatcliff et al. 2012).
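The disjointness criterion for ownership trees can be checked on a toy model. The sketch below assumes the ownership relation is given as a dictionary mapping each owned object to its single owner and that the relation is acyclic, as required by the tree topology; the object names and helper functions are hypothetical.

```python
def owned_tree(root, owner):
    """Set of objects transitively owned by root, root included."""
    tree = {root}
    changed = True
    while changed:
        changed = False
        for obj, own in owner.items():
            if own in tree and obj not in tree:
                tree.add(obj)
                changed = True
    return tree


def owns(o1, o2, owner):
    """True if o1 is a (transitive) owner of o2; assumes an acyclic relation."""
    current = owner.get(o2)
    while current is not None:
        if current == o1:
            return True
        current = owner.get(current)
    return False


# listA owns node1, which owns node2; listB owns node3.
owner = {"node1": "listA", "node2": "node1", "node3": "listB"}

neither_owns = (not owns("listA", "listB", owner)
                and not owns("listB", "listA", owner))
trees_disjoint = owned_tree("listA", owner).isdisjoint(owned_tree("listB", owner))
print(neither_owns, trees_disjoint)  # True True
```

Conversely, owns("listA", "node1", owner) holds, and the tree rooted in node1 is contained in the tree rooted in listA, so effects on the two may overlap.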

2.5 Other Approaches to Reason about Frames

Rakamarić and Hu report in (Rakamaric and Hu 2008) a method to infer frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve 2007). DSA provides a summary of points-to relations as a graph that is used to compute a set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node that is reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e., in a purely automatic setting.

In (Taghdiri, Seater, and Jackson 2006), Taghdiri et al. present a technique for extracting procedure summaries for object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. The extracted procedure summaries can be viewed as detailed frame conditions, describing which memory locations might be changed and how.


In (Sozeau 2009), Sozeau presents a generalized rewriting technique implemented in the Coq proof assistant that allows substituting a term t of an expression by another term t′ when t and t′ are related by a relation R. This generalizes equational reasoning to reasoning modulo arbitrary relations. The technique relies on dependent types and is based on a constraint generation algorithm generating type class constraints. The Coq tactic supports polymorphic relations, morphisms, and subrelations.

Bertrand Meyer proposed the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer 1991), an object-oriented language with native support of Design by Contract features (Meyer 1992). The first component, the frame specification inference, relies on the analysis of the postcondition of a method p, as described in Section 2.3.3, and obtains a first set: an overapproximation of the set of elements that are allowed to be modified by p according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder 2015), which is itself based on alias calculus (Kogtenkov, Meyer, and Velder 2015; Meyer 2010; Meyer 2011). Methods are analysed and a second set is detected: an overapproximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second.

2.6 Other Relevant Work

Pure methods, also known as queries or observers, are side-effect free methods that always evaluate to the same result value given the same input value. They are intensively used for providing specifications for methods without disclosing implementation details, in languages such as JML, Spec#, and Eiffel. Leavens et al. identify the development of specification and verification techniques for determining the effects of heap modifications on the results of pure methods as one of the remaining challenging problems related to framing (Leavens, Leino, and Müller 2007). Though our work is not concerned with heap modifications, we are interested in the dependency of Boolean Smart predicates, i.e., logical properties, on the layered ("composite") data structures they are receiving as inputs. In Chapter 5 we present a static analysis meant to capture such dependencies.

Various encodings of pure methods (Cok 2005; Darvas and Müller 2006) in program logic have been proposed, but they do not cover aspects related to reasoning about frame properties when the specifications make use of pure methods. Some specification techniques for frame properties (Leavens, Baker, and Ruby 2006; Leino and Müller 2006; Leino and Nelson 2002; Müller, Poetzsch-Heffter, and Leavens 2003) allow describing the fields that are potentially modified by a method execution using modifies clauses. These, however, do not specify the effects of a method execution on the results of pure methods (Leavens, Leino, and Müller 2007).

One technique for determining the effects of heap modifications on the results of pure methods requires listing all pure methods that are potentially affected by a method in the method's modifies clause. This approach is adopted in COLD-K (Feijs and Jonkers 1992), where the frame of a procedure specification lists the variables and the equivalent of pure methods whose value may be changed by the procedure. For dealing with modularity issues, COLD-K also makes use of read effects.

Other approaches (Leino and Müller 2006; Müller, Poetzsch-Heffter, and Leavens 2003) for determining effects on the results of pure methods rely on model fields. These are specification-only constructs whose value is determined by applying a mapping to the concrete state of an object. They are similar to pure methods, but unlike the latter, they do not have parameters and they are required to be confined (Leino and Müller 2006; Müller, Poetzsch-Heffter, and Leavens 2003).

Approaches based on model fields require that pure methods read only the state of the receiver object and its sub-objects. This information about the read effect of a pure method can be used to determine which write effects potentially have an impact on the result of a pure method. In general, it can be proven that a method m does not affect the result of a pure method p if the write effect of m and the read effect of p are disjoint (Leavens, Leino, and Müller 2007).
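This disjointness argument can be tried out on a toy example. The Python sketch below assumes that the read and write effects are declared by hand as sets of field names, and that the object state is an immutable-style dictionary; the functions area, rename and resize, as well as the effect sets, are hypothetical.

```python
def area(state):
    """Pure method p: reads only 'width' and 'height'."""
    return state["width"] * state["height"]


AREA_READS = {"width", "height"}


def rename(state, new_name):
    """Method m1: writes only 'name'; returns the new state."""
    return {**state, "name": new_name}


RENAME_WRITES = {"name"}


def resize(state, new_width):
    """Method m2: writes only 'width'; returns the new state."""
    return {**state, "width": new_width}


RESIZE_WRITES = {"width"}


def may_affect(write_effect, read_effect):
    """A method may affect a pure method only if the effects overlap."""
    return not write_effect.isdisjoint(read_effect)


rect = {"name": "r", "width": 2, "height": 3}
print(may_affect(RENAME_WRITES, AREA_READS))  # False: area is unchanged
print(may_affect(RESIZE_WRITES, AREA_READS))  # True: area may change
print(area(rect) == area(rename(rect, "s")))  # True
```

Overlapping effects only mean that the result may change; resizing to the current width, for instance, would leave the area intact.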

There are various approaches to using read effects for reasoning about pure methods. One approach relies on complete specifications of result values, included in the postconditions of pure methods. Used in conjunction with modifies clauses, these allow determining whether a method affects the result of a pure method (Leavens, Leino, and Müller 2007). Various solutions based on explicitly specified read effects exist (Feijs and Jonkers 1992; Greenhouse and Boyland 1999; Jacobs and Piessens 2006). Specification of these using data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002) and an effects system built on top of an ownership type system (Clarke and Drossopoulou 2002) have been proposed. Multi-threaded programs also require such specifications (Praun and Gross 2003).


Chapter 3

The Smart Language and ProvenTools

Languages are not strangers to one another.

Walter Benjamin

In this chapter we introduce Smart, a programming and specification language developed at Prove & Run, as well as the toolchain associated with it. While not claiming to be exhaustive, we give an overview of the language's features and syntax in Section 3.1. In Section 3.2 we present the tools manipulating Smart models. Section 3.3 briefly presents Smil, the Smart Intermediate Language. A computational version of it, αSmil, is targeted by the static analyses presented throughout the remainder of this thesis. The following chapter will focus entirely on αSmil, illustrating its usage and introducing its syntax and formal semantics.

3.1 The Smart Modeling Language

Smart is a modeling language developed at Prove & Run. It constitutes a unified programming and specification language, designed to facilitate proofs. One of the common, often cited reasons why programmers reject the use of formal methods is that they are not willing to learn a separate language just for specifying their programs, in particular if that language is fundamentally different from the programming language. Smart addresses this issue by allowing one to both develop the implementation of programs and to specify their logical properties in a single language.

The Smart language is a purely functional (side-effect free), strongly-typed, polymorphic, first-order language. The basic building blocks of programs written in Smart are predicates, the equivalent of functions in other common programming languages. Besides the common primitive types that are traditionally available as built-in types, algebraic data types (structures and variants) and associative arrays are provided as well. Exit labels constitute the language's main specificity: they facilitate separating data- and control-flow in programs.

In addition, being designed in order to write code that will subsequently be proven, the language allows the definition of various types of logical specifications as well. These range from pre- and postcondition contracts, local assertions, and loop invariants to inductive predicates, lemmas, and hypotheses.

ProvenTools is a complex set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain has the structure of a set of Eclipse plug-ins of JDT (Java Development Tools) type (Eclipse Java Development Tools (JDT)). Together, these constitute a complete Integrated Development Environment (IDE), allowing one to not only write, edit, and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover, and finally to generate executable code in C.

ProvenCore [1] (Lescuyer 2015) and ProvenCore-M [2] are two microkernels that have been completely modeled in Smart and developed using ProvenTools. The former is a general-purpose microkernel that ensures isolation, i.e., integrity and confidentiality. The latter targets embedded devices based on microcontrollers.

Throughout the rest of this section we will present some of the main concepts and mechanisms of Smart, discussing predicates, control flow, algebraic data types, and specification-only constructs.

3.1.1 Smart Predicates and Types

Smart supports modular program development with a straightforward module concept. Modules constitute the compilation units of Smart programs, and any valid Smart program consists of a non-empty set of modules, which are themselves organized in packages. Modules have an identifier that is unique in each program and, in practical terms, each module corresponds to a file. Modules can import other modules, and they contain a list of type and constant declarations as well as a list of predicates.

Predicates, the equivalent of functions in other common programming languages, are the basic building blocks of programs written in Smart. Though named in reference to predicate logic, predicates in Smart receive a number of inputs and produce a number of outputs in return, in contrast to predicates in mathematics, which are commonly understood to be Boolean-valued functions of the form

$$P : X \rightarrow \{\mathit{true}, \mathit{false}\}$$

Smart predicates can be classified in two different categories, namely implicit and explicit predicates, based on their implementation or their lack thereof.

Implicit predicates can be seen as a form of assumption: as their name suggests, they are not implemented per se, but simply declared using the implicit program keywords. Such predicates are similar to the declarations of native methods in Java or external functions in C. Traditionally, in Java, programmers use the Java Native Interface (JNI) (Liang 1999; Java Native Interface Documentation (JNI) 1999) when they need to implement small, time-critical code portions in a lower-level language such as assembly, or when they need to access a library already written in another programming language such as C. In Smart, implicit predicates play an important role with respect to code documentation. Their implementation is not provided in the model, but as we will further explain in Section 3.1.4, they can be used to specify logical properties of the explicit implementations provided externally in a lower-level language, typically in C or assembly.

[1] http://www.provenrun.com/products/provencore
[2] http://www.provenrun.com/products/provencore-m

For example, an implicit predicate converting an integer given as an input into a float can be declared as follows:

    public float_of(int n, float f+)
    implicit program

The predicate's result is given a name, f, and it is introduced as one of the predicate's parameters. It is marked as being the predicate's output by the + symbol following it, and is thereby syntactically distinguished from the predicate's input parameter n, which is unadorned.

In the general case, Smart predicates can have any number of input or output parameters. However, a parameter cannot be both at the same time, and each of these must be explicitly marked either as an input or as an output. An input parameter's value can be read and used in the predicate's implementation. An output parameter's value must be constructed by the predicate's implementation and returned as a result. Furthermore, values in Smart are immutable. As a consequence, Smart predicates are pure: it is impossible to pass a parameter "by reference" and modify a predicate's input as a side-effect. Smart is thus a side-effect free language, which provides referential transparency (Strachey 1967). Furthermore, the language supports neither global variables nor global states, but can be characterized rather as a state-passing style language. Smart predicates are deterministic: they always return the same output any time they are called with a specific set of input values. In particular, this is a prerequisite for implicit predicates.
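For readers more familiar with mainstream languages, the state-passing style can be approximated in Python. This is a loose analogy only: the Counter type and the increment function are hypothetical, float_of here is an ordinary Python function rather than a Smart implicit predicate, and Python itself does not enforce purity.

```python
from typing import NamedTuple


class Counter(NamedTuple):
    """An immutable state value; 'updating' it yields a new value."""
    value: int


def increment(counter: Counter) -> Counter:
    """State in, state out: the input is left untouched."""
    return counter._replace(value=counter.value + 1)


def float_of(n: int) -> float:
    """Python stand-in for the input n and the output f+ of float_of."""
    return float(n)


c0 = Counter(0)
c1 = increment(c0)
print(c0, c1)                    # Counter(value=0) Counter(value=1)
print(float_of(3) == float_of(3))  # True: same input, same output
```

The old state c0 remains available after the call, which is what makes reasoning about the relation between a predicate's inputs and outputs possible.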

As mentioned in the introduction, Smart is also a strongly-typed language. Each input and output parameter of a predicate must have an associated type, and the usage of an object of some type where a parameter of another data type is expected is forbidden by the language. Unsafe conversions between different types are forbidden as well. Smart provides various built-in types, such as int, short, long, char, boolean, float, and double, that are traditionally available in other programming languages as well. Additionally, users can declare new types with the type keyword and then define predicates manipulating these types. As in the case of predicates, implicit data types can be simply declared without being explicitly defined. For example, supposing that an implicit data type called cartesian_point and the predicates manipulating it are defined in a lower-level language, we would make them available to other Smart predicates using the following declarations:

    // Implicit data type declaration
    type cartesian_point

    /* Retrieve coordinate on X-axis */
    public get_X(cartesian_point p, float x+)
    implicit program

    // Retrieve coordinate on Y-axis
    public get_Y(cartesian_point p, float y+)
    implicit program

    // Construct a new point p with coordinates (x, y)
    public new_point(float x, float y, cartesian_point p+)
    implicit program

    // Pretty-print
    public print_point(cartesian_point p)
    implicit program

Some implicit predicates manipulating inputs of type cartesian_point are declared as well: the first two of them, get_X and get_Y, simply return the input point's numerical coordinates on each of the Cartesian system's axes. The next predicate, new_point, creates and returns a new point from the two given input coordinates. The last one, print_point, simply displays the input point without effectively producing an output. Alternatively, it is possible to directly declare and implement these types and predicates in Smart, as we will show in the following paragraphs. As shown in the example, similarly to Java, comments in Smart can be introduced by using // for single-line comments or /* */ for multi-line comments. Similarly to Javadoc, code documentation can be given using the /** begin-comment delimiter.

In general, implicit data types and the implicit predicates manipulating them can act as a public interface for a concrete class, showing the type and the operations allowed to manipulate values of that type but hiding the implementation.

Explicit data types can be declared and defined using structures and variants. For example, we could explicitly define the type cart_point by means of a structure having two fields of type float, called x and y, which correspond to the point's numerical coordinates on the X- and Y-axis, respectively.

    type cart_point =
        float x
        float y

For representing a point in a polar coordinate system, we can define a different type, polar_point, as follows:

    type polar_point =
        // Radial coordinate (distance from the pole)
        float radius

31 The Smart Modeling Language 33

        // Polar angle
        float azimuth

Explicit predicates have explicitly defined implementations following immediately after their declaration, which strongly resembles that of an implicit predicate but from which the keyword implicit is omitted. Their bodies are sequences of statements, which are essentially calls to other predicates. For example, to translate a point (x, y), i.e., to add a given pair of numbers (a, b) to its Cartesian coordinates and obtain the new point (x', y') = (x + a, y + b), a predicate translate_point could be defined in the following manner:

    // Convert x to float, add it to y and retrieve the sum
    public sum_of(int x, float y, float s+)
    implicit program

    public translate_point(cartesian_point p, int a, int b, cartesian_point q+)
    program
        // Local variables
        float xa
        float yb

        print_point(p)           // 1
        get_X(p, xa+)            // 2
        get_Y(p, yb+)            // 3
        sum_of(a, xa, xa+)       // 4
        sum_of(b, yb, yb+)       // 5
        new_point(xa, yb, q+)    // 6
        print_point(p)           // 7

The body of the translate_point predicate consists of a sequence of statements. The first of these simply pretty-prints the input point p. The next two statements are calls to accessors of p's coordinates on the X- and Y-axis, which are stored in the local variables xa and yb, respectively. Next, the coordinates (xa', yb') = (a + xa, b + yb) of the translated point are computed by calling the sum_of predicate, which returns the float sum of an integer and a float. The output point q is constructed by calling the constructor new_point with xa and yb as inputs. The last statement pretty-prints the input point p again.

As illustrated by our example, each call to a predicate is made by passing the parameters in the same order as in the predicate's declaration and by explicitly marking any output with + (this marking is mandatory because of overloading). Replacing line 4 with sum_of(xa, a, xa+) would result in an error, because the first input parameter of a call to sum_of is expected to be an integer and the second a float. Similarly, omitting the + symbol at line 6 and writing new_point(xa, yb, q) would result in an error. By explicitly marking the outputs of each statement, it is straightforward to distinguish between the variables that are actually written by the statement and those that are used only as inputs. Furthermore, since predicates are not allowed to modify their inputs, the language strictly forbids using a predicate's input parameter as an output for any statement in the predicate's body. Thus, in our example predicate, we are prevented from using the input point p as the output of the new_point predicate call. Outputs and local variables such as xa and yb, however, can be written to, but reading them (i.e., using them as inputs for a predicate call) before they have been written at least once amounts to using uninitialized variables and behaves in an unspecified manner. In our example, xa and yb are used as both inputs and outputs at lines 4 and 5, respectively. This is correct, since xa and yb are local variables that have already been written to by the statements at lines 2 and 3, which precede the calls to sum_of.

We stress again the fact that destructive updates are not possible in Smart. Even if, at first glance, a statement such as the call to sum_of at line 4 might give the impression that xa is modified in place, all that the statement actually does is create a new float whose value is obtained by adding the old value of xa to the value of a, and then set xa to reference this new float instead of the old one. A simple conversion to static single assignment form (Cytron et al., 1989) would eliminate these assignments and show the absence of any mutation whatsoever. Thus, were we to inspect the state of the input point p before and after the calls to sum_of, we would observe that it remains unchanged; this is what we do when printing p again at the end of translate_point.
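To make the absence of mutation concrete, the following is a minimal Python sketch of translate_point after a conversion to single-assignment form. Python is not part of Smart; the class CartesianPoint and the numbered names xa0, xa1, yb0, and yb1 are our own illustrative assumptions, standing in for the successive values bound to xa and yb.

```python
from typing import NamedTuple


class CartesianPoint(NamedTuple):
    """Immutable stand-in (assumption) for the implicit cartesian_point type."""
    x: float
    y: float


def sum_of(x: int, y: float) -> float:
    """Counterpart of sum_of: convert x to float and add it to y."""
    return float(x) + y


def translate_point(p: CartesianPoint, a: int, b: int) -> CartesianPoint:
    # Every write to xa or yb in the Smart body gets a fresh name here,
    # which is what a static single assignment conversion would produce.
    xa0 = p.x                  # get_X(p, xa+)
    yb0 = p.y                  # get_Y(p, yb+)
    xa1 = sum_of(a, xa0)       # sum_of(a, xa, xa+)
    yb1 = sum_of(b, yb0)       # sum_of(b, yb, yb+)
    return CartesianPoint(xa1, yb1)   # new_point(xa, yb, q+)


p = CartesianPoint(1.0, 2.0)
q = translate_point(p, 3, 4)
```

In this sketch no name is ever assigned twice, and p is the same value before and after the call, which mirrors the observation made by printing p twice in the Smart predicate.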

As a last remark about our example, it is worth noting that the statement new_point(xa, yb, q+), which produces the predicate's output, is not the predicate's last statement. Smart does not support any dedicated return statement. Instead, when exiting from a predicate, the outputs hold the values that they have been assigned when executing the body. This mechanism allows one to define predicates having multiple outputs. Their names are chosen by the programmer and their values can be modified multiple times during the predicate's execution; however, the values retrieved are the ones that are available at the moment the program exits the predicate.

3.1.2 Exit Labels and Control Flow

Besides input and output parameters, the declaration of a predicate can also include a set of exit labels. When called, a predicate exits with one of the specified exit labels, thus summarising and returning to its callers further information regarding its execution.

Exit labels constitute the main specificity of the Smart language. They can denote different exceptional execution scenarios and act as exit codes, similarly to exceptions and exit status return values in other programming languages.

Every predicate has a non-empty set of labels: by default, any predicate has the built-in exit label true, which denotes the successful exit status of a predicate. The predicates illustrated previously in Section 3.1.1 did not have explicitly declared exit labels; in such a case, it is assumed that the only possible exit label for the predicate is true and hence that the predicate will succeed in all circumstances.

Returning to our previous example, the predicate translate_point, we could have written its complete declaration by explicitly stating that true is the only possible exit label:

    public translate_point(cartesian_point p, int a, int b, cartesian_point q+)
    -> [true]
    program

This declaration is strictly equivalent to the one given in Section 3.1.1.

In the general case, any number of labels can be specified after the parameters. For example, we could declare a predicate that converts the coordinates of an input point (x, y) of type cartesian_point to polar coordinates

$$r = \sqrt{x^2 + y^2}$$

$$\varphi = \operatorname{atan2}(y, x)$$

and returns a point (r, φ) of type polar_point with these coordinates. For computing the second polar coordinate, the polar angle or azimuth, the predicate would call another predicate, atan2, which is the arctangent function with two arguments, a common variation on the arctangent function. The atan2 function avoids the problem of division by zero; however, it is undefined when both x and y, i.e., the Cartesian coordinates, are zero. For declaring it in Smart, we can add a special exit label for the case when the given input coordinates represent the origin and the result cannot be returned:

    // Computes atan(y/x)
    public atan2(float x, float y, float at+) -> [true, undef]
    implicit program

The declared label's name, undef, is a custom name; any valid identifier can be chosen and used as a label in Smart. As previously mentioned, the exit label true is predefined and has a special meaning. Another predefined label that is interpreted in a special manner by conditional statements and logical operators is the false label. Together, these two exit labels offer a convenient manner to model a Boolean result. Frequently, a Boolean output value can be replaced by declaring these two possible exit labels, with true denoting a successful execution of the predicate and false the opposite outcome.

Besides indicating the followed execution scenario, exit labels play an important role with respect to control flow management. Primarily, the exit label of a call to a predicate determines whether the next predicate call in sequential order should be executed or not: when the predicate exits with true, the program can proceed to the next statement in the program. Any other exit label lbl disrupts the normal control flow and forces the current predicate to exit with label lbl.

For example, a predicate cart_to_polar can be defined with two exit labels, true and undef, as well. It takes two float numbers x and y, computes the corresponding polar coordinates r and phi by calling the predicates compute_radius and atan2, and constructs a new point p of type polar_point using the computed values.

    public compute_radius(float x, float y, float r+)
    -> [true]
    implicit program

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true, undef]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        atan2(y, x, phi+)
        new_polar_point(r, phi, p+)

There is no guarantee that the call to atan2 will return successfully with exit label true: it might return with undef, in which case the execution of cart_to_polar will break at that point and exit with label undef. Furthermore, no output will be generated. In Smart, exit labels condition the existence of output parameters: every output is associated to an exit label lbl, and it is generated if and only if the predicate exits with that particular exit label lbl. All other outputs are discarded and can be considered as unchanged by the caller. The same output can be associated to multiple labels. By default, if no output parameters are specified for a label, no outputs are generated when the predicate exits with this label. The only exception to this rule is made in the case of the built-in true label: since true normally represents a successful execution, every output of the predicate is associated to it by default. For example, the previous declaration of cart_to_polar is strictly equivalent to:

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true <p>, undef <>]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        atan2(y, x, phi+)
        new_polar_point(r, phi, p+)
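The propagation of exit labels and the association between labels and outputs can be approximated in Python as follows. This is only a sketch under our own modelling assumptions: a predicate is represented as a function returning a pair (label, output), the output is None whenever the exit label carries no output, and the names atan2_label and Label are hypothetical.

```python
import math
from typing import Optional, Tuple

Label = str


def compute_radius(x: float, y: float) -> Tuple[Label, Optional[float]]:
    """Always exits with true and produces the radius."""
    return "true", math.hypot(x, y)


def atan2_label(y: float, x: float) -> Tuple[Label, Optional[float]]:
    """Exits with undef at the origin; no output is attached to undef."""
    if x == 0.0 and y == 0.0:
        return "undef", None
    return "true", math.atan2(y, x)


def cart_to_polar(x: float, y: float) -> Tuple[Label, Optional[Tuple[float, float]]]:
    label, r = compute_radius(x, y)
    if label != "true":          # any label other than true is propagated
        return label, None
    label, phi = atan2_label(y, x)
    if label != "true":          # exit with undef: the output p is not generated
        return label, None
    return "true", (r, phi)      # p is associated with the label true only
```

Calling cart_to_polar(0.0, 0.0) in this model yields the label undef without any output, while every other input yields true together with the polar coordinates.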

Exit labels can thus behave similarly to exceptions in other programming languages. In order to handle specific observed execution scenarios, Smart provides label transformers, which allow catching labels before they escape the current predicate and transforming them into another label. Complex control flow can be expressed by indicating a set of rules of the form lbl1 → lbl2, whose role is to transform the label lbl1 into lbl2, and by associating them to statements.

For example, we could let the predicate cart_to_polar return the label origin_fail when the inner computation of the azimuth fails, instead of just forwarding the label returned by atan2:

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true <p>, origin_fail]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        [undef → origin_fail]
        atan2(y, x, phi+)
        new_polar_point(r, phi, p+)

Alternatively, we could also handle the failure of the computation by using transformers and constructing the output point differently, for example by declaring a constant representing the azimuth of the origin (often called the pole in polar coordinates) and using it for the construction of p when the call to atan2 fails:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true <p>]
    program
        float phi
        float r

        compute_radius(x, y, r+)
        [done → true]
            [true → done, undef → true]
            atan2(y, x, phi+)
            phi = POLEAZIMUTH
        new_polar_point(r, phi, p+)

In the following, we show how the control flows when atan2 terminates with label true. The green arrows indicate how control is passed from one statement to the other, based on their exit labels, when starting from the call to the atan2 predicate.

[Figure: the cart_to_polar listing annotated with green arrows tracing the control flow when atan2 exits with true.]

And here is how the control flows when atan2 terminates with label undef:

    public const float POLEAZIMUTH

    public cart_to_polar(float x, float y, polar_point p+)
    -> [true <p>]
    program
        float phi
        float r

        compute_radius(x, y, r+)             // 1
        [done → true]                        // 2
            [true → done, undef → true]      // 3
            atan2(y, x, phi+)                // 4
            phi = POLEAZIMUTH                // 5
        new_polar_point(r, phi, p+)

After computing the radius r by calling compute_radius, this new version of the predicate continues by calling the predicate atan2. If this operation succeeds, then phi is the value of the azimuth and we can use this value as the second input parameter for the point's constructor new_polar_point. This is done by transforming true to a new label done, whose effect is to jump immediately to the outer block, in this case the top level. The top-level block of the program catches done, transforms it back to true, and continues with the statement following the block, namely new_polar_point, which will construct the output p by using r and phi, the value of the azimuth returned by atan2. When atan2 is undefined, the transformer undef → true is used to jump to an additional statement, phi = POLEAZIMUTH, that assigns the value of POLEAZIMUTH to phi. The constructor is reached in this case as well; however, this time the value of phi written at line 5 is used as the second input parameter. We note that the statement at line 5 is a call to a built-in assignment predicate, denoted by = and using an infix notation.
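The two scenarios described above can be replayed with a small Python model of label transformers. This is a sketch under our own assumptions: a transformer is modelled as a dictionary of rewrite rules, the helper transform and the function atan2_label are hypothetical, and the value chosen for POLE_AZIMUTH is arbitrary, since the constant is only declared, not defined, in the Smart code.

```python
import math

POLE_AZIMUTH = 0.0  # assumed value; the Smart constant is left unspecified


def transform(label, rules):
    """Apply a label transformer: rewrite the label if a rule matches it."""
    return rules.get(label, label)


def atan2_label(y, x):
    """Exits with undef at the origin, with true otherwise."""
    if x == 0.0 and y == 0.0:
        return "undef", None
    return "true", math.atan2(y, x)


def cart_to_polar(x, y):
    r = math.hypot(x, y)                                    # line 1
    # Entering the block guarded by [done -> true]          # line 2
    label, phi = atan2_label(y, x)                          # line 4
    label = transform(label, {"true": "done",               # line 3
                              "undef": "true"})
    if label == "true":
        # atan2 exited with undef, rewritten to true: the next
        # statement of the block is executed.
        phi = POLE_AZIMUTH                                  # line 5
    # Leaving the block: done is caught and turned back into true.
    label = transform(label, {"done": "true"})
    assert label == "true"
    return r, phi                                           # new_polar_point
```

When atan2_label succeeds, the label done skips line 5 and the azimuth computed at line 4 is used; at the origin, line 5 runs and the assumed pole azimuth is used instead.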

The constant POLEAZIMUTH is declared using the keyword const. In Smart, constants can be declared and used directly as inputs for predicate calls.


In the general case, arbitrarily complex control flows can be expressed by coupling label transformers, blocks, and recursion.

In order to facilitate the user's task of simulating common control flow structures with labels and transformers, Smart provides various control flow statements, which are themselves based on this mechanism. These include a construct that is equivalent to the try ... catch mechanism in Java, a conditional if ... then ... else control structure, as well as the common logical operators for negation (!), conjunction (&&), disjunction (||), implication (=>), and equivalence (<=>).

Given the Cartesian coordinates (x, y), the first polar coordinate, the radius, is obtained by computing

$$\sqrt{x^2 + y^2}$$

For explicitly defining the predicate compute_radius, we would first need to implement a predicate sqrt computing the square root of a given positive number. Such a predicate can be recursively implemented as follows, by using the if ... then ... else construct and three implicit predicates:

    // Newton-Raphson square root finding algorithm

    // Divides a by b and retrieves the result in div
    public div_double(double a, double b, double div+)
    -> [true, undef]
    implicit program

    // Check if a is close enough to b: |a - b| < b * 0.001
    public close_approximation(double a, double b)
    -> [true, false]
    implicit program

    // Compute ((b + a/b) / 2)
    public better_approximation(double a, double b, double g+)
    -> [true, undef]
    implicit program

    public sqrt(double x, double g, double sqr+)
    -> [true, undef]
    /* Returns the square root of x by making recursive calls
       with better and better guesses g, until reaching a guess
       that is close enough to the actual square root's value */
    program
        double aux

        div_double(x, g, aux+)
        if close_approximation(aux, g)
        then
            sqr = g
        else
            better_approximation(x, g, aux+)
            sqrt(x, aux, sqr+)
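For readers more familiar with mainstream languages, the recursive predicate can be transcribed into Python roughly as follows. This is a sketch, not a faithful semantics: the exit label undef of div_double and better_approximation is not modelled (a zero guess simply raises ZeroDivisionError), the tolerance 0.001 follows the comment of close_approximation, and the name sqrt_newton is our own.

```python
def close_approximation(a: float, b: float) -> bool:
    """True when a is close enough to b: |a - b| < b * 0.001."""
    return abs(a - b) < b * 0.001


def better_approximation(a: float, b: float) -> float:
    """Newton-Raphson step: (b + a/b) / 2."""
    return (b + a / b) / 2


def sqrt_newton(x: float, g: float) -> float:
    """Approximate the square root of x, starting from the guess g."""
    aux = x / g                            # div_double(x, g, aux+)
    if close_approximation(aux, g):        # if ... then
        return g                           #     sqr = g
    # else: refine the guess and recurse
    return sqrt_newton(x, better_approximation(x, g))


root = sqrt_newton(2.0, 1.0)
```

Starting from the guess 1.0, the successive guesses for x = 2 are 1.0, 1.5, 1.41666..., and then a value within the tolerance, at which point the recursion stops.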

Besides recursion, Smart also supports loops by providing a specific construct, while, that is similar to a traditional "while" loop in other programming languages.

The body of this while block is repeatedly executed until a dedicated exit label called exit tries to escape, in which case the loop is aborted and the execution continues after the block. A "break" can be achieved by raising the special exit label inside the loop.

For instance, the previously recursive predicate sqrt can be implemented iteratively with a while loop as follows:

    public sqrt_iter(double x, double g, double sqr+)
    -> [true, undef]
    // Computes the square root of x iteratively
    program
        div_double(x, g, sqr+)
        while
            double aux

            [true → exit, false → true]
            close_approximation(sqr, g)

            better_approximation(x, g, aux+)
            div_double(x, aux, sqr+)

3.1.3 Polymorphism & Algebraic Data Types

Smart supports polymorphic types and predicates. For declaring polymorphic types, a number of type parameters must be introduced in the type's declaration. For example, an implicit type of polymorphic pairs can be declared as follows:

    type pair<A, B>

This type is parameterized by two types A and B, which are the types of the first and second projection of the pair. Type variables must always start with an uppercase letter, while regular types must always start with a lowercase letter. The declaration of polymorphic predicates is straightforward. For instance, declaring an implicit constructor for the pair type declared above amounts to the following:


    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

This predicate is implicitly parameterized by two type variables A and B. The type parameters of a predicate are implicitly determined by the type variables in its arguments. Local variables in explicit predicates can also be declared with polymorphic types. However, they can only depend on type variables introduced in the predicate's parameters. Type variables in polymorphic types can be instantiated by any type.

As mentioned in Section 3.1.1, Smart allows users to define their own concrete data types by using algebraic data types, namely structures and variants.

Structures. Structures, also called records or tuples in other programming languages, represent the Cartesian products of the different types of their elements, called fields. In Smart, these can be declared in two manners: either by using the keyword struct followed by the name of the structured type and its list of field types and field names, or by using the keyword type, as shown below. The latter is preferred. Declaring polymorphic structures is possible by introducing type variables in the definition.

    struct pair<A, B>
        A fst
        B snd

    type pair<A, B> =
        A fst
        B snd

In order to build and manipulate structures, Smart supports built-in constructors and accessors. For instance, for the following type definition of a structure

    type t =
        t1 f1
        t2 f2
        ...
        tn fn

a constructor, a destructor, as well as individual accessors and "updaters" for any of the structure's fields are generated by Smart. Constructing an object of type t amounts to using t.new, which requires a value for each of t's fields. For example, creating a structure value s of type t with values e1, ..., en for each field amounts to calling t.new(s+, e1, ..., en). The values of these fields can all be read with a single predicate call to t.all(s, e1+, ..., en+), which "destructs" the structure value into its field components. Individual accessors of the form t.fi(s, ei+) are provided as well for any field fi. Finally, the value of a field fi can be set to some variable vi by using t.fi(s+, vi). As with all statements in Smart, this call has a functional nature and handles immutable data. Thus, setting the value of the fi field amounts to returning a new structure where all fields have the same value as in s, except fi, which is set to vi.
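The functional nature of field updates has a close analogue in Python's frozen dataclasses, which we use below as an illustration. The class CartPoint mirrors the cart_point structure of Section 3.1.1; the correspondence with Smart's generated predicates, indicated in the comments, is our own informal reading rather than a definition.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CartPoint:
    """Immutable record with two float fields, like cart_point."""
    x: float
    y: float


s = CartPoint(1.0, 2.0)       # constructor: one value per field
x, y = s.x, s.y               # reading all fields at once ("destructing")
x_only = s.x                  # individual accessor for the field x
s2 = replace(s, y=5.0)        # "updater": builds a new record, s is untouched
```

After the update, s still holds (1.0, 2.0) and s2 holds (1.0, 5.0): every field of s2 equals the corresponding field of s except y.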

It is possible to define a structured type with no fields at all:


struct unit

The value s of this type can be constructed by using unit.new(s+), without any input. This type can be seen as representing the absence of information.

Variants. Many programs need to deal with heterogeneous collections of values. For example, a node in a binary tree can be either a leaf or an interior node with two children; similarly, a node of an abstract syntax tree in a compiler can represent a variable, an abstraction, an application, etc. Variant types provide the mechanism that supports this kind of heterogeneous value collections (Pierce, 2002).

Variants, also called tagged unions in other programming languages, can be seen as the dual of structures. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag called the constructor.

Revisiting our previously declared types cartesian_point and polar_point, in Smart we can define a type point as being expressed either in Cartesian, in polar, or in spherical coordinates, using the following variant declaration:

    type point =
        | Cartesian(cartesian_point p)
        | Polar(polar_point p)
        | Spherical(float r, float theta, float phi)

Each form that a variant can take is indicated by the symbol | followed by the uppercase tag and the list of parameters and their types. The cases are mutually exclusive and a value of type point can have only one form at a time. An object of type point can be built by using one of the constructors, called with the appropriate number and types of inputs. For instance, a Cartesian point pc can be obtained by calling point.Cartesian(p, pc+). Given an object pt of type point, we can also distinguish between the different cases by using a construct that is similar to the match ... with construct in OCaml:

    switch (pt)
        case Cartesian(cartesian_point p)
            get_X(p, x+)
        case Polar(polar_point p)
            get_radius(p, r+)
        case Spherical(float r, float theta, float phi)

For verifying whether a given point pt is a Cartesian point, we can use:

    point.case[Cartesian](pt)


This could be obtained using the switch construct, but for practical considerations the case construct has been additionally provided as a built-in predicate.
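A tagged union and the case analysis over it can be sketched in Python with one class per tag. This is an illustration under simplifying assumptions: the Cartesian and Polar alternatives carry their coordinates directly instead of wrapping values of type cartesian_point and polar_point, and the helpers first_coordinate and is_cartesian are hypothetical names playing the roles of the switch construct and of the case predicate.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cartesian:
    x: float
    y: float


@dataclass(frozen=True)
class Polar:
    radius: float
    azimuth: float


@dataclass(frozen=True)
class Spherical:
    r: float
    theta: float
    phi: float


# A point has exactly one of the three forms at a time.
Point = Union[Cartesian, Polar, Spherical]


def first_coordinate(pt: Point) -> float:
    """Case analysis on the tag, in the spirit of switch (pt)."""
    if isinstance(pt, Cartesian):
        return pt.x              # get_X
    if isinstance(pt, Polar):
        return pt.radius         # get_radius
    return pt.r                  # Spherical


def is_cartesian(pt: Point) -> bool:
    """Tag test, in the spirit of the built-in case predicate."""
    return isinstance(pt, Cartesian)
```

As in the Smart declaration, the alternatives are mutually exclusive: a value built with one tag never matches another branch of the case analysis.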

3.1.4 Specifications

Smart also supports various types of logical specifications, ranging from axioms and lemmas to pre- and postconditions, invariants, and inductives.

In Section 3.1.1 we stated that implicit predicates are a form of assumption and that declaring implicit Smart types and the predicates manipulating them provides a convenient manner of axiomatizing external implementations, frequently developed in a lower-level language. They can provide implementation-independent descriptions and act as abstractions that hide hardware-related details and low-level implementation decisions. Another form of assumptions are hypotheses. Hypotheses are logical results that are assumed, i.e., they constitute axioms, which are supposed to be true. In Smart, hypotheses are specification-only predicates, i.e., they cannot be called in the code. They are introduced by the keyword hypothesis.

For example, we could revisit our polymorphic pair type introduced in Section 3.1.3 and provide a polymorphic axiomatization for it by using implicit predicates and hypotheses stipulating that the operations fst and snd retrieve the first and second elements of the pair, respectively. These are declared as follows:

    type pair<A, B>

    public new_pair(A a, B b, pair<A, B> p+)
    implicit program

    public fst(pair<A, B> p, A a+)
    implicit program

    public snd(pair<A, B> p, B b+)
    implicit program

    public hypothesis pair_fst(A a, B b)
    program
        pair<A, B> p
        A a2

        new_pair(a, b, p+)
        fst(p, a2+)
        a = a2

    public hypothesis pair_snd(A a, B b)
    program
        pair<A, B> p
        B b2

        new_pair(a, b, p+)
        snd(p, b2+)
        b = b2


Lemmas are another type of specification-only predicates, meant to facilitate proving logical properties. In contrast to hypotheses, lemmas must be proven. A lemma can be introduced with the keyword lemma, and it states that all paths that exit from its body with an undeclared exit label represent impossible execution scenarios.

In Section 3.1.1 we introduced a type cartesian_point allowing to express a point by its Cartesian coordinates, and we defined a predicate translate_point for translating a point by a given pair of numerical values (a, b). We revisit our example and implement a predicate that translates a pair of points by a fixed pair of numbers (a, b) that are added to the Cartesian coordinates of each point of the pair. In addition, we consider an implicit predicate euclidean_dist that computes the Euclidean distance d

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

between a pair of points ⟨(x1, y1), (x2, y2)⟩. These are declared as follows:

    type point_pair = pair<cartesian_point, cartesian_point>

    /* For a pair of points ((x1, y1), (x2, y2)), compute
       d = sqrt((x2 - x1)^2 + (y2 - y1)^2) */
    public euclidean_dist(point_pair p, float d+)
    -> [true]
    implicit program

    /* For a pair of points ((x1, y1), (x2, y2)) and a fixed
       numerical pair (a, b), compute ((x1', y1'), (x2', y2'))
       as ((x1 + a, y1 + b), (x2 + a, y2 + b)) */
    public translate_pair(point_pair p, pair<int, int> t, point_pair o+)
    -> [true]

The translation of a pair of points preserves the Euclidean distance between them: the Euclidean distance of a pair of points p will be equal to the Euclidean distance of the pair of points obtained after a translation. We can express this property by declaring it as a lemma:

    public lemma edist_preserved(pair<int, int> t, point_pair p)
    program
        point_pair translated
        float d1
        float d2

        euclidean_dist(p, d1+) =>
        translate_pair(p, t, translated+) =>
        euclidean_dist(translated, d2+) => d1 = d2
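The property stated by this lemma can be spot-checked numerically with a few lines of Python. The sketch below is a test, not a proof: it evaluates the property on given inputs, represents points as plain tuples (our own assumption), and compares the two distances with a floating-point tolerance instead of exact equality.

```python
import math


def euclidean_dist(pair):
    """d = sqrt((x2 - x1)^2 + (y2 - y1)^2) for ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = pair
    return math.hypot(x2 - x1, y2 - y1)


def translate_pair(pair, t):
    """Add (a, b) to the coordinates of both points of the pair."""
    a, b = t
    (x1, y1), (x2, y2) = pair
    return ((x1 + a, y1 + b), (x2 + a, y2 + b))


def edist_preserved(t, pair):
    """Check d1 = d2 for one translation t and one pair of points."""
    d1 = euclidean_dist(pair)
    translated = translate_pair(pair, t)
    d2 = euclidean_dist(translated)
    return math.isclose(d1, d2)


holds = edist_preserved((3, -2), ((0.0, 0.0), (3.0, 4.0)))
```

A deductive proof of the lemma establishes the equality for all inputs at once, whereas this check only covers the inputs it is run on.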


Specifying contracts for Smart predicates is also possible by employing pre- and postconditions. A precondition represents a logical property that must be true prior to calling a predicate, and it serves the purpose of letting the callers know when it is safe to call some predicate. Typically, it represents the caller's obligations. In Smart, a precondition can be introduced with the keyword pre, and it can be attached to any implicit or explicit predicate. A precondition can refer to the predicate's inputs and it can declare its own local variables. However, it cannot make use of the predicate's outputs.

For instance, for the atan2 predicate discussed in Section 3.1.2, we could indicate that the predicate should never be called with the coordinates (0, 0) of the origin by adding the following precondition:

    public const float ZERO

    public atan2(float x, float y, float at+) -> [true]
    pre
        x != ZERO || y != ZERO
    implicit program

A postcondition represents a logical condition that must be true after executing a predicate. Its purpose is to indicate to the callers of a predicate what they are entitled to expect with respect to the outputs produced by the predicate. In Smart, postconditions are introduced with the keyword post, and they can be attached to any implicit or explicit (computational) predicate, on a subset or all of the predicate's output labels. They can refer to the predicate's inputs and the outputs associated to the label considered in the postcondition. Additionally, they can declare their own local variables.

For instance, a predicate equal_points, verifying if two points are equal and having four possible exit labels (eq_points, eq_x, eq_y, and false), could declare postconditions as follows:

    public equal_points(cartesian_point p, cartesian_point q)
    -> [eq_points, eq_x, eq_y, false]
    program
        float px
        float qx
        float py
        float qy

        cartesian_point.x(p, px+)
        cartesian_point.x(q, qx+)
        cartesian_point.y(p, py+)
        cartesian_point.y(q, qy+)
        if px = qx
        then
            [true → eq_points, false → eq_x] py = qy
        else
            [true → eq_y] py = qy

    post eq_points
        p = q

    post eq_x
        float x1
        float x2
        cartesian_point.equals[x](p, q)

    post
        p != q

The first postcondition applies to the exit label eq_points, the second to the label eq_x, and the last one applies to the remaining labels, eq_y and false.
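The relationship between exit labels and postconditions can be mimicked in Python with runtime assertions. The sketch below is only an approximation under our own assumptions: points are plain tuples, the exit label is returned as a string, and the postconditions are checked dynamically, whereas in Smart they are specifications to be proven statically.

```python
def equal_points(p, q):
    """Compare two points and return one of four exit labels."""
    (px, py), (qx, qy) = p, q
    if px == qx:
        # [true -> eq_points, false -> eq_x] py = qy
        label = "eq_points" if py == qy else "eq_x"
    else:
        # [true -> eq_y] py = qy; false is propagated unchanged
        label = "eq_y" if py == qy else "false"

    # Postconditions, one per group of labels.
    if label == "eq_points":
        assert p == q
    elif label == "eq_x":
        assert px == qx           # the x fields are equal
    else:                         # eq_y and false
        assert p != q
    return label
```

Each branch of the final conditional corresponds to one post clause, and the assertion inside it must hold whenever the predicate exits with the matching label.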

In Smart, mathematical relations can be represented by introducing inductives or schemes. These predicates have no outputs, but they always have true and false as their exit labels. Inductive predicates are the only part of the language that cannot be transformed into executable code; however, they can be used to facilitate the proofs. Predicates introduced with the inductive keyword represent the least fixed point of their cases, introduced with the keyword case and a user-defined name. Each case can introduce existentially quantified variables. In particular, in the absence of recursion, inductive predicates represent a parallel disjunction of cases. An inductive predicate will exit with the label true if any of its declared cases holds.

For example, we could specify membership for an implicit array type using an inductive named contains, having a single case with the user-defined name ElemAt, which introduces an existentially quantified variable idx:

    type array<A>

    public get_size(array<A> arr, int s+)
    implicit program

    public get_elem(array<A> arr, int i, A ai+)
    -> [true, oob]
    implicit program

    // Membership defined with an inductive and an existential
    public contains(array<A> arr, A a) -> [true, false]
    inductive
        /* An array contains an element if there exists a valid
           index where this element is to be found */
        case ElemAt(int idx)
            A b
            [oob → false] get_elem(arr, idx, b+) && b = a

Schemes, on the other hand, represent a conjunction of cases: cases are introduced with the keyword with followed by a user-defined name, and each of them can introduce universally quantified variables. A scheme will return the label true only if all of its declared cases hold.

Using a scheme with two cases, Size and Forall, as shown below, we can define the pointwise equality of arrays. The first case, Size, verifies if the two arrays have the same length by introducing two universally quantified variables n and m. The Forall case verifies that, for any index i, the arrays contain equal elements. Two arrays are equal pointwise if and only if they are of the same size and, at any given index i, the arrays have the same element.

    public equals_pointwise(array<A> arr1, array<A> arr2)
    -> [true, false]
    // Extensional equality of arrays [arr1] and [arr2]
    scheme
        // They must be of the same size
        with Size
            int n
            int m
            get_size(arr1, n+) => get_size(arr2, m+) => n = m

        // If they exist, elements at the same index must be equal
        with Forall(int i)
            A a
            A b
            get_elem(arr1, i, a+) => get_elem(arr2, i, b+) => a = b
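Although inductives and schemes are not executable in Smart, their logical reading over finite arrays can be illustrated in Python, where the existential of contains becomes any and the universal of the Forall case becomes all. This sketch rests on our own assumptions: arrays are Python lists, get_elem returns a pair (label, value), and the quantifiers range over the valid indices only, which is sound here because out-of-bounds indices can neither witness the existential nor falsify the implication.

```python
def get_elem(arr, i):
    """Return (label, value); the label is oob outside the array bounds."""
    if 0 <= i < len(arr):
        return "true", arr[i]
    return "oob", None


def contains(arr, a):
    """Inductive reading: there exists an index idx holding a."""
    return any(get_elem(arr, idx) == ("true", a) for idx in range(len(arr)))


def equals_pointwise(arr1, arr2):
    """Scheme reading: the Size case and the Forall case must both hold."""
    size_case = len(arr1) == len(arr2)
    # Indices missing from either array satisfy the implication vacuously,
    # so only indices valid in both arrays need to be inspected.
    common = range(min(len(arr1), len(arr2)))
    forall_case = all(get_elem(arr1, i) == get_elem(arr2, i) for i in common)
    return size_case and forall_case
```

The disjunctive nature of inductives and the conjunctive nature of schemes show up directly in the use of any for the former and of and together with all for the latter.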

Loop invariants are supported as well. These can be introduced in various ways, for instance by declaring them with the keyword invariant or by declaring them as inductives.

3.1.5 Illustrating Smart – An Abstract Process Manager

To illustrate the Smart language and its capabilities, we consider an abstract process manager and its fundamental components: process and thread. We define the data structures corresponding to threads and processes, implement the predicates corresponding to a simple thread switch, and specify some fundamental properties for processes.

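[Figure: two diagrams. "Process with a single thread": one thread with its own stack, register, and counter, together with the data, files, and code of the process. "Process with n threads": threads Thread 1 to Thread n, each with its own stack, counter, and register, sharing the data, files, and code of the process.]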

The implementation of threads and processes differs depending on the operating system, but frequently a thread is a component of a process that belongs to exactly one process, outside which it cannot exist. Each thread represents a separate flow of control. Multiple threads can be associated to one process; they execute concurrently and provide a mechanism to improve application performance through parallelism. In a nutshell, threads represent a software approach to improving the performance of operating systems by reducing the overhead of process switching.

A thread is a flow of execution through the process code, having its own program counter that keeps track of which instruction to execute next, as well as system registers, which hold its current working variables, and a stack, which contains the execution history. Every thread is uniquely identified by a thread identifier. Peer threads share some information, such as the code and data segments. When one thread alters a code memory item, all other threads see the change.

Figure 3.1 – Possible Transitions between Thread States (state diagram over Ready, Running and Blocked)

We define a thread type as a structure consisting of multiple fields, such as the thread's identifier, its current state and the memory region for its stack.

type memory_region = {
  // Start address
  int start;
  // Region length
  int length;
}

type state =
| Ready
| Running
| Blocked

type thread = {
  // Identifier
  int id;
  // Current state
  state crt_state;
  // Stack
  memory_region stack;
}

The thread's stack is identified by its start address and its length. The state of a thread is defined as a variant having three alternatives: Running (the thread is currently executing), Ready (the thread is currently awaiting execution and could potentially be started) and Blocked (the thread has exhausted its allocated time or is waiting for an event to occur; it must be unblocked before being able to execute). The possible transitions between states are shown in Figure 3.1. A thread's current state determines the valid transitions.

Similarly, a process is defined as a structure consisting of an internal identifier, an identifier for the thread that is currently executing, an address space and an array of possibly inactive threads associated with it. Whether a thread in the thread array is active or has terminated is indicated by a variant of type option. An inactive thread, indicated by None, is a thread that terminated its execution and whose slot in the array of associated threads has not been reallocated. In contrast, a blocked thread, indicated by Some, is a thread that cannot execute currently but should execute in the future, once the resources it is waiting for are freed. We consider a segmented address space, with addresses existing not in a single linear range but in multiple segments, corresponding to the code, the data and the stack, respectively.

type option<A> =
| None
| Some(A a)

type address_space = {
  memory_region code;
  memory_region data;
  memory_region stack;
}

type process = {
  // Array of associated threads
  array<option<thread>> threads;
  // Internal id
  int pid;
  // Currently running thread
  int crt_thread;
  // Address space
  address_space adr_space;
}

Next, we consider a simple predicate called stop_thread, having two possible execution scenarios, as indicated by its two exit labels, true and invalid. When the given input index i corresponds to an active thread that is currently Running, the predicate executes successfully, thus exiting with true. In this case, the state of the i-th thread associated to the input process is set to Blocked and the new state of the process is returned in the output out. Otherwise, when the given index i corresponds to a thread that is not Running (for instance one that is Ready), or when there is no active thread at that index, the predicate exits with the label invalid and no output is generated.

public stop_thread(process in, int i, process out+)
-> [true, invalid]
program {{ array<option<thread>> ta; state s; thread ti;
           option<thread> tio; }}
{
  // Copy in to out
  out = in;
  // Fetch in.threads and copy it to ta
  process.threads(in, ta+);
  // Get the array's i-th element
  [oob: invalid] get_elem(ta, i, tio+);
  // Check if the i-th element is active
  switch (tio)
  case Some(thread th): ti = th;
  case None: raise invalid;
  // Get the thread's current state
  thread.crt_state(ti, s+);
  // Check whether the transition is valid
  [false: invalid] state.case[Running](s);
  // Create the new state for the running thread
  state.Blocked(s+);
  // Set the newly created state
  thread.crt_state(ti+, s);
  // Reset tio to the thread with the modified state
  option.Some(tio+, ti);
  // Reset the i-th thread and return the new state ta
  [oob: invalid] set_ei(ta, i, tio, ta+);
  // Update out.threads to ta
  process.threads(out+, ta);
}
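The behaviour of stop_thread can be mimicked outside Smart. The following Python sketch is only an illustration under stated assumptions: the class names, the tuple-based thread array and the encoding of exit labels as strings are ours, and the stack and address-space fields are omitted for brevity. It shows the two aspects that matter here: every failure path leads to the label invalid without an output, and the success path builds a new process value instead of modifying its input.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    READY = "Ready"
    RUNNING = "Running"
    BLOCKED = "Blocked"


@dataclass(frozen=True)
class Thread:
    id: int
    crt_state: State


@dataclass(frozen=True)
class Process:
    # None plays the role of the `None` constructor of option<thread>.
    threads: Tuple[Optional[Thread], ...]
    crt_thread: int


def stop_thread(proc: Process, i: int):
    """Return ("true", new_process) or ("invalid", None)."""
    # Out-of-bounds access is mapped to the label invalid.
    if not 0 <= i < len(proc.threads):
        return ("invalid", None)
    tio = proc.threads[i]
    # No active thread at index i.
    if tio is None:
        return ("invalid", None)
    # Only a Running thread may be stopped.
    if tio.crt_state is not State.RUNNING:
        return ("invalid", None)
    # Functional update: build a new thread and a new array.
    ti = replace(tio, crt_state=State.BLOCKED)
    ta = proc.threads[:i] + (ti,) + proc.threads[i + 1:]
    return ("true", replace(proc, threads=ta))


p = Process(threads=(Thread(0, State.RUNNING), None, Thread(2, State.READY)),
            crt_thread=0)
label, q = stop_thread(p, 0)
```

After the call, q holds the updated process while p is unchanged, which is the functional reading of the output parameter out+.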

Another auxiliary predicate, called start_thread, sets the state of the i-th thread to Running when given a valid index of an unblocked (Ready) thread. It is implemented similarly, as shown below.

public start_thread(process in, int i, process out+)
-> [true, invalid]
program {{ array<option<thread>> ta; state s; thread ti;
           option<thread> tio; }}
{
  // Copy in to out
  out = in;
  // Fetch in.threads and copy it to ta
  process.threads(in, ta+);
  // Get the array's i-th element
  [oob: invalid] get_ei(ta, i, tio+);
  // Check if the i-th thread is active
  switch (tio)
  case Some(thread th): ti = th;
  case None: raise invalid;
  thread.crt_state(ti, s+);
  // Check whether the transition is valid
  [false: invalid] state.case[Ready](s);
  // Create the new state for the running thread
  state.Running(s+);
  // Set the newly created state
  thread.crt_state(ti+, s);
  // Reset tio to the thread with the modified state
  option.Some(tio+, ti);
  // Set the i-th element and return the new state ta
  [oob: invalid] set_ei(ta, i, tio, ta+);
  // Update out.threads to ta
  process.threads(out+, ta);
}

These two predicates are called by the predicate run_thread, which performs a simple thread switch. It stops the thread currently executing, indicated by crt_thread, and starts the one with the given index i. The new state of the process is returned in the output out.

public run_thread(process in, int i, process out+)
-> [true, inval]
program {{ int crt; }}
{
  process.crt_thread(in, crt+);
  [true: true, invalid: inval] stop_thread(in, crt, out+);
  [true: true, invalid: inval] start_thread(out, i, out+);
  // The thread with index i becomes the current thread
  process.crt_thread(out+, i);
}
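The label transformers in run_thread rename the invalid label of the two callees to the caller's own label inval. A minimal Python sketch of this composition follows; it assumes a deliberately flat encoding of a process as a pair (thread states, current thread index), with None for a terminated thread, and the helper `_switch` is hypothetical rather than part of the Smart model.

```python
from typing import Optional, Tuple

# A process is modelled as (states, crt_thread); a slot holds a state name,
# or None for a terminated thread. This flat encoding is an assumption.
Process = Tuple[Tuple[Optional[str], ...], int]


def _switch(proc, i, expected, new):
    """Set thread i to `new` if it is currently in state `expected`."""
    states, crt = proc
    if not 0 <= i < len(states) or states[i] != expected:
        return ("invalid", None)
    return ("true", (states[:i] + (new,) + states[i + 1:], crt))


def stop_thread(proc, i):
    return _switch(proc, i, "Running", "Blocked")


def start_thread(proc, i):
    return _switch(proc, i, "Ready", "Running")


def run_thread(proc, i):
    """Exit with ("true", out) or ("inval", None)."""
    # Counterpart of the label transformer [true: true, invalid: inval].
    relabel = {"true": "true", "invalid": "inval"}
    _, crt = proc
    label, out = stop_thread(proc, crt)
    if label != "true":
        return (relabel[label], None)
    label, out = start_thread(out, i)
    if label != "true":
        return (relabel[label], None)
    states, _ = out
    # The started thread becomes the current one.
    return ("true", (states, i))


before = (("Running", "Ready", None), 0)
label, after = run_thread(before, 1)
```

Each callee label is either handled or renamed explicitly, which mirrors the requirement that no exit label of a called predicate remains unhandled.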

Next, we introduce a fundamental property for any valid process state, namely the fact that the stack regions of all its associated threads are completely disjoint.

public not_disjoint(process p) -> [true, false]
inductive
case StacksJoint(int i, int j)
{{ thread ti; thread tj; memory_region sti;
   memory_region stj; }}
{
  i != j;
  [None: false] thread(p, i, ti+);
  [None: false] thread(p, j, tj+);
  thread.stack(ti, sti+);
  thread.stack(tj, stj+);
  overlap(sti, stj);
}

case CodeStackJoint(int i)
{{ thread ti; memory_region sti; address_space as;
   memory_region code; }}
{
  [None: false] thread(p, i, ti+);
  thread.stack(ti, sti+);
  process.adr_space(p, as+);
  address_space.code(as, code+);
  overlap(sti, code);
}

case DataStackJoint(int i)
{{ thread ti; memory_region sti; address_space as;
   memory_region data; }}
{
  [None: false] thread(p, i, ti+);
  thread.stack(ti, sti+);
  process.adr_space(p, as+);
  address_space.data(as, data+);
  overlap(sti, data);
}

public disjoint_stacks(process p) -> [true, false]
program
{
  not_disjoint(p);
}

This property is expressed using an inductive predicate that characterizes the potential situations in which the memory isolation of the different associated threads of a process can be broken. The natural manner of expressing such a property in Smart is by using a scheme, as presented in Section 3.1.4; here we use an inductive predicate because the language we are working with, which will be presented in Chapter 4, does not support schemes. In our inductive predicate, the first case, StacksJoint, checks whether there exist two different threads having overlapping stacks. The next two cases, CodeStackJoint and DataStackJoint, check whether there exists a thread whose stack overlaps the process's code segment or data segment, respectively. This uses an auxiliary predicate verifying whether two memory regions overlap, i.e. whether there exists an address that is contained simultaneously in two different segments. This operation is symmetric; we express this property with the lemma overlap_sym.

public contains(memory_region m, int address)
-> [true, false]
implicit program

public overlap(memory_region m1, memory_region m2)
-> [true, false]
inductive
case InBoth(int address)
{
  contains(m1, address) && contains(m2, address);
}

public lemma overlap_sym(memory_region m1, memory_region m2)
-> [true, false]
program
{
  overlap(m1, m2) => overlap(m2, m1);
}
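To make the overlap property concrete, here is a small Python sketch. Since contains is an implicit predicate in the model, its meaning below is an assumption: a region is read as the half-open address range from start up to, but not including, start + length. Under that reading, the existential witness of the InBoth case can be computed directly, and the symmetry stated by overlap_sym can be observed on sample regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRegion:
    start: int
    length: int


def contains(m: MemoryRegion, address: int) -> bool:
    # Assumed meaning of the implicit predicate: half-open address range.
    return m.start <= address < m.start + m.length


def overlap(m1: MemoryRegion, m2: MemoryRegion) -> bool:
    # If any address lies in both ranges, the larger of the two start
    # addresses does, so it is the only witness that needs checking.
    witness = max(m1.start, m2.start)
    return contains(m1, witness) and contains(m2, witness)


code = MemoryRegion(start=0, length=100)
stack = MemoryRegion(start=100, length=50)
data = MemoryRegion(start=120, length=10)
```

With these sample values, the stack is adjacent to the code segment without overlapping it, whereas it does overlap the data segment, which is the situation that the DataStackJoint case is meant to detect.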

3.2 ProvenTools

ProvenTools is a comprehensive set of development tools for the Smart language. It has been developed at Prove & Run with the goal of facilitating the achievement of high-level certifications. The toolchain is structured as a set of Eclipse plug-ins of JDT (Java Development Tools) type. Together, these constitute a complete Integrated Development Environment (IDE), allowing one not only to write, edit and document Smart models, but also to browse proof obligations, to prove them by employing a built-in prover and, finally, to generate executable code in C or Java.

The plug-ins are based on Xtext (Xtext Documentation), an official Eclipse plug-in dedicated to the creation of DSLs (Domain Specific Languages) in Eclipse. Xtext-based DSLs are described in an EBNF (Extended Backus-Naur Form) grammar language. Fully statically typed expressions can be embedded in the developed DSL, and Java-style scoping and linking are supported.

Figure 3.2 – The ProvenTools Toolchain (diagram: the front-end takes Smart code and specifications and produces Smil; the back-end comprises the prover, associated with proof obligations and proofs, and the code generators, producing C code and Java code)

Concretely, the toolchain includes a compiler whose front-end contains the plug-in in charge of Smart, as well as the plug-in dedicated to Smil, the Smart Intermediate Language, to which Smart programs and specifications are translated. Smil is a simpler form of the Smart language. Though roughly equivalent to Smart, Smil has a rather different form, manipulating less complex structures and having no syntactic sugar. Harder for a human reader to understand, Smil is meant to be easily manipulated by the back-end of the toolchain. The back-end currently offers a C code generator and an interactive prover. An overview of this architecture is shown in Figure 3.2.

While employing ProvenTools, the code undergoes various compilation steps and transformations. During the compilation chain, the Smart code is transformed into a Smart AST (Abstract Syntax Tree). The obtained AST is then compiled to a Smil AST. Next, the Smil AST is transformed into Smil source code, which is then reinserted in the compilation chain by the plug-in in charge of it.

After the whole compilation chain has completed and the Smil AST and the associated Smil source code have been obtained, the back-end of the compiler can be employed. The back-end comprises a source code generator and a prover. The generator transforms Smart models into their equivalents in C.

Figure 3.3 – Smart Editor

Smart Editor. The Smart editor provides facilities to edit Smart code and supports a broad range of features, such as syntax highlighting, facilities for code navigation and visualization, and editing assistants, including word completion and quick fixes. A screenshot of it is shown in Figure 3.3.

Prover. ProvenTools provides users with a dedicated view for interacting with the prover. This view presents the existing proof obligations and provides facilities to solve them. Proof obligations are generated for any logical lemma, precondition, postcondition or invariant included in the Smart models. Additionally, any label that remains unhandled in the code triggers the generation of a proof obligation, thus enforcing that each possible exit label of a predicate is either explicitly handled or proven to be impossible.

An automatic prover, trying various proof search procedures, is called whenever a proof obligation is generated. It uses previously proven obligations or existing hypotheses for discharging new obligations automatically. Unproved obligations can be solved by interactively employing manual tactics, called hints, which are provided in the IDE. Hints that are considered useless with respect to the currently selected proof obligations are automatically disabled. Additionally, users can define strategies, i.e. proof patterns, and employ an interactive proof assistant that applies them automatically in the background. This assistant suggests a possible proof as soon as it finds one. Proofs found in this way are rechecked as if they had been done manually.


ProvenTools offers facilities to inspect any manual or automatic proof step, thus making a later review of the proofs possible. The toolchain also provides a dedicated system for assisting the user in adapting former proofs to changes due to code maintenance or evolution.

C Code Generator. The executable part of Smart models is translated to executable C code by the C code generator. To this end, the executable parts of the Smart models are identified and extracted, while the logical parts are discarded. Users can guide this process through annotations, and they can specify that particular values are purely logical. Functional implementations are transformed into imperative ones: the dedicated C code generation plug-in tries to replace functional modifications of structures in the models by in-place updates. Such transformations are correct only if the different values are handled linearly in the Smart code, i.e. if no previous value is read after applying a functional update on it. To ensure the safety of functional-to-imperative code transformations, the C generation plug-in employs various global static analyses. When safety cannot be guaranteed, the generator reports errors, or introduces copies if the users deemed this acceptable.
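The linearity condition can be illustrated on straight-line code. The Python sketch below is a toy check written for this purpose only: the statement encoding (`Stmt`) and the function `first_nonlinear_use` are hypothetical, aliasing and control flow are ignored, and nothing here reflects the global static analyses actually used by the generator. It reports the first statement that reads a value already consumed by an earlier functional update, which is exactly the situation in which an in-place update would be unsafe.

```python
from typing import List, NamedTuple, Optional, Set


class Stmt(NamedTuple):
    target: str              # variable defined by the statement
    reads: Set[str]          # variables read by the statement
    updates: Optional[str]   # variable functionally updated, if any


def first_nonlinear_use(stmts: List[Stmt]) -> Optional[int]:
    """Index of the first statement reading a value that was consumed by an
    earlier functional update, or None if every update is used linearly."""
    consumed: Set[str] = set()
    for idx, stmt in enumerate(stmts):
        if stmt.reads & consumed:
            return idx
        if stmt.updates is not None:
            consumed.add(stmt.updates)
        # Rebinding a name makes it refer to a fresh value again.
        consumed.discard(stmt.target)
    return None


# b = [a with i = e]; c = [b with j = e]   (a and b are never read again)
linear = [Stmt("b", {"a", "i", "e"}, "a"),
          Stmt("c", {"b", "j", "e"}, "b")]

# b = [a with i = e]; x = a[i]             (the old array a is read again)
nonlinear = [Stmt("b", {"a", "i", "e"}, "a"),
             Stmt("x", {"a", "i"}, None)]
```

In the second sequence, replacing the update of a by an in-place write would change the value later read through a, so a copy (or an error) would be needed.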

In earlier experiments (Lescuyer, 2015), the Prove & Run team was able to generate C code for a complete model of ProvenCore that did not require dynamic allocation and ran at a speed comparable to the original C code.

3.3 Smil

Smil is an intermediate language to which Smart models are compiled. Similarly to Smart, Smil is a functional language with algebraic data types (structures and variants). However, unlike Smart, Smil is not a user-oriented language: it was not designed for writing programs directly, but rather to provide a representation of Smart programs at a different level of abstraction. Reading Smil code is thus a rather cumbersome task, as it is a language without syntactic sugar, meant to serve as a starting point for the main components of the ProvenTools back-end that exploit Smart models: the prover and the code generator.

To give an idea of Smil's syntax, we illustrate below the types state and thread with their associated hypotheses, as well as the stop_thread predicate from our abstract process manager example given in Section 3.1.5.

public type state =
| Ready
| Running
| Blocked

public type thread = {
  id : int;
  crt_state : state;
  stack : memory_region;
}

public state_acopy_ahypothesis(state state_1) -> [true]
hypothesis {{ state state_2; }}
{
  [<1>] state.switch(state_1)
        -> [Ready -> 5, Running -> 4, Blocked -> 3];
  [<2>] ==<state>(state_1, state_2)
        -> [true -> true, false -> error];
  [<3>] state.Blocked(state_2)
        -> [true -> 2];
  [<4>] state.Running(state_2)
        -> [true -> 2];
  [<5>] state.Ready(state_2)
        -> [true -> 2];
}

public thread_ahypothesis(thread x1) -> [true]
hypothesis {{ thread x2; int zid; state zcrt_state;
              memory_region zstack; }}
{
  [<1>] thread.all(x1, zid, zcrt_state, zstack)
        -> [true -> 2];
  [<2>] thread.new(x2, zid, zcrt_state, zstack)
        -> [true -> 3];
  [<3>] ==<thread>(x1, x2)
        -> [true -> true, false -> error];
}

The type declarations in Smil strongly resemble their Smart counterparts. Predicate declarations also mirror the form found in Smart, except that in Smil any output variable associated to the true exit label is explicitly declared as such. Preconditions and postconditions are appended to any predicate and, as shown above, a hypothesis is added for any explicitly declared type.

The real syntax differences are visible in predicate implementations: every statement is preceded by a numerical label, and every possible exit label lbl of the statement indicates another numerical label. The latter numerical label designates the statement that will be executed next if the current statement exits with label lbl. In particular, this mechanism replaces the try ... catch and the conditional control constructs, as well as the logical operators and any other construct based on label transformers described in Section 3.1.2. Thus, the predicate bodies are very similar in form to a control flow graph, where the statements represent the nodes of the graph and the exit labels represent transitions.
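The label-driven execution scheme can be sketched with a few lines of Python. The interpreter below is an assumption-laden illustration, not the Smil semantics: statements are modelled as Python functions that mutate an environment and return one exit label, and the graph mirrors the shape of the state_acopy_ahypothesis listing shown earlier (switch on the input state, rebuild it with the matching constructor, then compare).

```python
def run(graph, entry, env):
    """Follow numeric labels until a non-numeric exit label is reached."""
    node = entry
    while isinstance(node, int):
        statement, successors = graph[node]
        exit_label = statement(env)      # a statement yields one exit label
        node = successors[exit_label]    # which selects what runs next
    return node


def switch_state(env):
    # Exits with the name of the constructor of state_1.
    return env["state_1"]


def make(constructor):
    def statement(env):
        env["state_2"] = constructor
        return "true"
    return statement


def equal(env):
    return "true" if env["state_1"] == env["state_2"] else "false"


GRAPH = {
    1: (switch_state, {"Ready": 5, "Running": 4, "Blocked": 3}),
    2: (equal, {"true": "true", "false": "error"}),
    3: (make("Blocked"), {"true": 2}),
    4: (make("Running"), {"true": 2}),
    5: (make("Ready"), {"true": 2}),
}

results = [run(GRAPH, 1, {"state_1": s})
           for s in ("Ready", "Running", "Blocked")]
```

No structured control construct is needed: the mapping from exit labels to numerical labels alone determines the order of execution.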

public stop_thread(process in, int i, process out+)
-> [true<out>, invalid]
pre [<0>] true() -> [true -> true]
{{ array<option<thread>> ta; state s; thread ti;
   option<thread> tio; thread th; }}
{
  [<1>] =<process>(out, in)
        -> [true -> 2];
  [<2>] process.threads(in, ta)
        -> [true -> 3];
  [<3>] get<option<thread>>(ta, i, tio)
        -> [true -> 4, oob -> invalid];
  [<4>] option.switch<thread>(tio, th)
        -> [None -> 6, Some -> 7];
  [<5>] state.Blocked(s)
        -> [true -> 8];
  [<6>] true()
        -> [true -> invalid];
  [<7>] =<thread>(ti, th)
        -> [true -> 5];
  [<8>] thread.crt_state+(ti, ti, s)
        -> [true -> 9];
  [<9>] option.Some<thread>(tio, ti)
        -> [true -> 10];
  [<10>] set<option<thread>>(ta, i, tio, ta)
        -> [true -> 11, oob -> invalid];
  [<11>] set<option<thread>>(ta, i, tio, ta)
        -> [true -> 12, oob -> invalid];
  [<12>] process.threads+(out, out, ta)
        -> [true -> true];
}
post true 0
post invalid 0

In a nutshell, Smil constitutes a representative, albeit restricted, set of constructs, and it is a language designed to be well-suited for further transformations and analyses.

The next chapter focuses entirely on αSmil, the computational version of Smil with which we work throughout the rest of this thesis. We will illustrate its usage and describe its abstract syntax and formal semantics.


Chapter 4

The αSmil Language

One day I will find the right words, and they will be simple.

Jack Kerouac

In this chapter, we define the syntax and the semantics of αSmil, the language that we consider in this thesis. This is a computational version of Smil (presented in Section 3.3), which is essentially a subset of Smart, presented in Chapter 3. However, it contains a few additional elements, introduced for the purpose of this thesis.

The αSmil language is a first-order, purely functional and strongly-typed language with arrays and algebraic data types, i.e. structures and variants. It is an intermediate, analysis-oriented language.

4.1 αSmil Syntax

The αSmil language is minimal, in the sense that it contains only those constructs that are needed for the purpose of this thesis. For instance, unlike Smart and Smil, the language does not contain visibility modifiers, because these modifiers play no role in the techniques presented in the sequel. During the introduction of the grammar, we will point out the most important deviations from Smart and Smil.

Programs. A program in αSmil consists of a number of type and constant declarations and definitions, followed by a collection of predicates. In contrast to Smart and Smil, type and predicate declarations have no visibility modifiers (such as public) and they are not organized into modules. The absence of visibility modifiers is a natural consequence of the disappearance of modules. We assume that there is one module in which every type, constant and predicate declaration resides, and that these are mutually visible to each other. These restrictions are made for the sake of simplicity, since the techniques proposed in this thesis are orthogonal to the concepts of visibility and modules.

Constants are declared using the keyword const, followed by the type and the constant identifier. Constant identifiers are written in upper-case letters and are preceded by a special symbol.


Types are declared using the keyword type, followed by the type identifier and optionally, in the case of polymorphic type declarations, by a number of type parameters given in upper-case letters between <>. In the case of implicit types, this constitutes the complete type declaration. Explicit type declarations continue with the symbol = and the type's definition. Throughout the rest of this chapter and the presentation of our static analyses, we will ignore polymorphism. The abstract types of our analyses are not polymorphic, and the impact of polymorphism is visible only at the implementation level, for type substitutions, which will be discussed in Chapter 8.

Types. Similarly to Smart, algebraic data types (i.e. structures and variants) and associative arrays are supported. We let T be the universe of type identifiers and T0 ⊂ T the set of base type identifiers. We assume a set of identifiers for structure fields and a set of identifiers for variant constructors, denoted by F and C, respectively.

A structure represents the Cartesian product of the different types of its elements, called fields. A variant is the disjoint union of different types. It represents data that may take on multiple forms, where each form is marked by a specific tag called the constructor. Arrays group elements of data of the same type (given in angle brackets) into a single entity; elements are selected by an index whose type is also included in the array's definition (as denoted by the superscript).

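Definition 4.1.1 (Types τ ∈ T in αSmil).

\[
\begin{array}{rcll}
\tau \in \mathcal{T},\quad \tau & ::= & \tau_0 \in \mathcal{T}_0 & \text{base types} \\
 & \mid & \mathsf{struct}\{f_1 : \tau, \ldots, f_n : \tau\},\quad f_i \in \mathcal{F},\ 0 \le n & \text{structures} \\
 & \mid & \mathsf{variant}[C_1 : \tau \mid \ldots \mid C_m : \tau],\quad C_i \in \mathcal{C},\ 1 \le m & \text{variants} \\
 & \mid & \mathsf{arr}^{\tau}\langle \tau \rangle & \text{arrays}
\end{array}
\]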

Variants and structures can be used together to model traditional algebraic variants with zero or several parameters. For instance, a generic type option<T> is actually modeled as:

variant[Some : struct{t : T} | None : struct{}]

Concretely, structures are declared and defined by indicating a set of pairs of field identifiers and their corresponding types between {}. Declaring structures with no fields is possible. Variants are declared and defined by indicating the list of their constructors, each starting with an upper-case letter and preceded by the symbol |. Unlike structures, variants must have at least one declared constructor. For instance, the state and thread types from our Abstract Process Manager example, given in Smart in Section 3.1.5 on page 48, have the following αSmil declaration:

type state =
| Ready
| Running
| Blocked

type thread = {
  id : int;
  crt_state : state;
  stack : memory_region;
}


In contrast to Smart, in structure declarations the field name precedes the field type.

Predicates. Predicates are declared using the keyword predicate, which is specific to αSmil, followed by a predicate identifier and a signature. A signature is given by a sequence of input types and a non-empty finite mapping of exit labels λ ∈ L \ {error} to sequences of outputs. The set of exit labels L contains three distinguished elements: true, false and error. The latter cannot appear in predicate signatures; it is used as a sink node in control flow graphs, which will be presented in Section 4.2. We write signatures in the following manner:

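\[
\sigma =
\underbrace{(x_1 : \tau_1, \ldots, x_n : \tau_n)}_{\text{input identifiers : types}}\;
\underbrace{\big[\lambda_1 : (\tau_{11}\, y_{11}, \ldots, \tau_{1k_1}\, y_{1k_1}) \mid \ldots \mid
\overbrace{\lambda_p : (\tau_{p1}\, y_{p1}, \ldots, \tau_{pk_p}\, y_{pk_p})}^{\text{label : (output types, identifiers)}}\big]}_{p \text{ possible exit labels}}
\]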

We denote by Σ the mapping between predicate identifiers and their signatures.

The predicate declaration is followed by the predicate's body. Depending on its body's nature, a predicate will be implicit, explicit or inductive. Smart implicit and explicit predicates have been presented in Section 3.1.1 of our previous chapter, while inductive predicates have been illustrated in Section 3.1.4 on page 46. For implicit predicates, the body consists solely of the keyword implicit. For explicit predicates, an optional declaration unit can follow. This is a finite mapping from variables to types, and it must be given between double curly braces, i.e. {{typeid videntifier}}. Input and output parameters must be different from all the variables appearing in the declaration units. Declaration units are followed by a sequence of statements representing calls to predicates.

Just as presented in Section 3.1.4 for Smart, an inductive predicate is syntactically distinguished by the keyword inductive, followed by its different cases. Each case is declared with the keyword case, followed by an identifier, an optional list of existentially quantified variables and a body of statements.

A generic call to a predicate p is of the form:

\[
p(e_1, \ldots, e_n)\ [\lambda_1 : \bar{o}_1 \mid \ldots \mid \lambda_m : \bar{o}_m]
\]

The predicate p is called with inputs e_1, ..., e_n and yields one of the declared exit labels λ_1, ..., λ_m, each having its own set of associated output variables ō_1, ..., ō_m, respectively. We denote by ō a sequence of 0 or more output variables.

Statements. The αSmil language supports the statements presented in Table 4.2. These represent calls to built-in predicates and can be seen as special cases of the predicate call presented above. All statements have a functional nature and handle immutable data. A statement consists of as many variables as there are input types in the signature σ_p of the called built-in predicate p, and a mapping associating to each exit label of σ_p a sequence of variables, one variable for each output type in the corresponding sequence.

\[
\begin{array}{rcll}
s & ::= & o = e & (1)\ \text{assignment} \\
 & \mid & e_1 = e_2 & (2)\ \text{equality test} \\
 & \mid & \mathtt{nop} & (3)\ \text{no operation} \\
 & \mid & r = \{e_1, \ldots, e_n\} & (4)\ \text{create structure} \\
 & \mid & \{o_1, \ldots, o_n\} = r & (5)\ \text{destructure structure} \\
 & \mid & o = r.f_i & (6)\ \text{access field} \\
 & \mid & r' = \{r \ \mathtt{with}\ f_i = e\} & (7)\ \text{update field} \\
 & \mid & r' = \langle f_1, \ldots, f_k \rangle\, r'' & (8)\ \text{check (partial) structure equality} \\
 & \mid & v = C_p[e] & (9)\ \text{create variant} \\
 & \mid & \mathtt{switch}(v)\ \mathtt{as}\ [\bar{o}_1 \mid \ldots \mid \bar{o}_n] & (10)\ \text{destructure variant} \\
 & \mid & v \in \{C_1, \ldots, C_k\} & (11)\ \text{variant possible} \\
 & \mid & o = a[i] & (12)\ \text{array access} \\
 & \mid & a' = [a \ \mathtt{with}\ i = e] & (13)\ \text{array update} \\
 & \mid & p(e_1, \ldots, e_n)\ [\lambda_1 : \bar{o}_1 \mid \ldots \mid \lambda_m : \bar{o}_m] & (14)\ \text{predicate call}
\end{array}
\]

Table 4.2 – αSmil – Set of Supported Statements

The first three statements are generic and can be applied to any type. Statement (1) is a call to the built-in assignment predicate, denoted by =, which is present in an identical form in Smart as well. Statement (2) is a call to the logical operator =, verifying whether its two input arguments are equal. Statement (3) is the αSmil equivalent of a no-operation. As a general convention for the statements' notation, we denote by e the identifiers of entry variables and by o the identifiers of output variables.

Statements (4) to (8) are structure-related. The first of them, statement (4), is the constructor of a structure r of type rtype having n fields. It corresponds to the statement rtype.new(r+, e_1, ..., e_n) in Smart. Statement (5) returns the values of all the fields of r into the output parameters o_1, ..., o_n, and it is the equivalent of rtype.all(r, o_1+, ..., o_n+) in Smart. Statement (6) is the individual accessor of a field f_i and corresponds to rtype.f_i(r, e_i+) in Smart. As previously mentioned, our language is purely functional and handles only immutable algebraic data structures and arrays. Therefore, setting the field f_i of a structure, shown in (7) and being the equivalent of rtype.f_i(r'+, e_i), returns a new structure where all fields have the same value as in r, except f_i, which is set to e. Statement (8) verifies whether the values of the indicated subset of fields of two structures r' and r'' are equal. It exists in Smart as well, where it has a similar syntax: rtype.equals[f,g](r', r'') for checking that the values of fields f and g of the two structures are equal, or the dual rtype.equals-[f,g](r', r'') for checking that the values of all fields except f and g are equal.

The next group of statements is variant-related. The first of them, statement (9), creates a new variant v of type vtype using the constructor C_p with e as an argument. It corresponds to vtype.Cp(v+, e) in Smart. Statement (10) is used for matching on the different constructors of the input variant v and corresponds to switch(v) case ... in Smart. The last statement of this group, statement (11), verifies whether the given variant was created with one of the constructors in C_1, ..., C_k. This could be obtained with a variant switch, but for practical considerations it has been provided as a built-in predicate. Its counterpart in Smart is vtype.case[C1, ..., Ck](v).

Statements (12) and (13) are array-related. Statement (12) returns the value of the i-th cell of the input array a. Similarly to (7), updating the i-th cell of an array, shown in (13), has a functional nature. It returns a new array where all cells have the same values as in a, except the i-th cell, which is set to e. These statements are specific to αSmil.
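The functional reading of statements (12) and (13) can be sketched in Python. The encoding is an assumption made for illustration: arrays are modelled as immutable tuples indexed by integers (αSmil arrays are associative, with an index type of their own), and each function returns a pair consisting of the exit label and the output, with no output for the label false.

```python
from typing import Optional, Tuple, TypeVar

T = TypeVar("T")


def array_get(a: Tuple[T, ...], i: int) -> Tuple[str, Optional[T]]:
    """Statement (12): o = a[i]."""
    if 0 <= i < len(a):
        return ("true", a[i])
    # Out of bounds: the label false carries no output.
    return ("false", None)


def array_set(a: Tuple[T, ...], i: int, e: T):
    """Statement (13): a' = [a with i = e]."""
    if 0 <= i < len(a):
        # A new array is built; the input array is left untouched.
        return ("true", a[:i] + (e,) + a[i + 1:])
    return ("false", None)


a = (10, 20, 30)
label, a2 = array_set(a, 1, 99)
```

After the update, a2 differs from a only in cell 1, and a itself still holds its original values.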

Statement (14) is a generic call to a predicate p and has been presented on page 61.

Exit Labels. All of the built-in supported statements have an associated set of exit labels λ ∈ L \ {error}. These are indicated in Table 4.3. There are two distinguished exit labels: true and false. An additional built-in label, called error, is used as a sink node in control flow graphs. It cannot be used as an exit label for a predicate.

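Table 4.3 – Statements and their Exit Labels

\[
\begin{array}{lcl}
\textbf{Statement} & & \textbf{Exit labels} \\
o = e & (1) & [\mathit{true} \mapsto o] \\
e_1 = e_2 & (2) & [\mathit{true} \mapsto \varnothing,\ \mathit{false} \mapsto \varnothing] \\
\mathtt{nop} & (3) & [\mathit{true} \mapsto \varnothing] \\
r = \{e_1, \ldots, e_n\} & (4) & [\mathit{true} \mapsto r] \\
\{o_1, \ldots, o_n\} = r & (5) & [\mathit{true} \mapsto o_1, \ldots, o_n] \\
o = r.f_i & (6) & [\mathit{true} \mapsto o] \\
r' = \{r \ \mathtt{with}\ f_i = e\} & (7) & [\mathit{true} \mapsto r'] \\
r' = \langle f_1, \ldots, f_k \rangle\, r'' & (8) & [\mathit{true} \mapsto \varnothing,\ \mathit{false} \mapsto \varnothing] \\
v = C_p[e] & (9) & [\mathit{true} \mapsto v] \\
\mathtt{switch}(v)\ \mathtt{as}\ [\bar{o}_1 \mid \ldots \mid \bar{o}_n] & (10) & [\lambda_{C_1} \mapsto \bar{o}_1,\ \ldots,\ \lambda_{C_n} \mapsto \bar{o}_n] \\
v \in \{C_1, \ldots, C_k\} & (11) & [\mathit{true} \mapsto \varnothing,\ \mathit{false} \mapsto \varnothing] \\
o = a[i] & (12) & [\mathit{true} \mapsto o,\ \mathit{false} \mapsto \varnothing] \\
a' = [a \ \mathtt{with}\ i = e] & (13) & [\mathit{true} \mapsto a',\ \mathit{false} \mapsto \varnothing] \\
p(e_1, \ldots, e_n)\ [\lambda_1 : \bar{o}_1 \mid \ldots \mid \lambda_m : \bar{o}_m] & (14) & [\lambda_1 \mapsto \bar{o}_1,\ \ldots,\ \lambda_m \mapsto \bar{o}_m]
\end{array}
\]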

As shown in Table 4.3, statement (10) has an exit label λ_Ci corresponding to each constructor C_i of the input variant. Statements (2), (8) and (11) are bi-labeled, using true and false as logical values. None of them has any associated outputs. Statements (12) and (13) are bi-labeled as well. However, unlike the previously mentioned statements, they use the label false as an "out of bounds" exception and generate an output only for the label true. All other statements except (14) are uni-labeled: they associate all their output parameters (if any) to the label true. In contrast to Smart, in αSmil every exit label, including true, must be explicitly indicated. Furthermore, any output is explicitly associated to an exit label.

In Section 3.1.5 (on page 50) of our previous chapter, we introduced a Smart predicate called stop_thread. If the given index i designates an active associated thread, this predicate sets its state to Blocked and returns the new state of the process. Otherwise, the predicate exits with label invalid. Revisiting it, we can finally indicate its body in the αSmil language.¹

Table 4.4 – Predicate Body in αSmil

// Signature
predicate stop_thread(process p, int i)
-> [true: process o | invalid]
// Declaration unit
{{ array<option_thread> ta; option_thread th;
   thread ti; state s; }}
// Predicate body
{
  ta = p.threads                  [true -> 1]               // 0
  th = ta[i]                      [true -> 2, false -> 9]   // 1
  switch(th) as [ti | ]           [Some -> 3, None -> 9]    // 2
  s = Blocked                     [true -> 4]               // 3
  ti = {ti with crt_state = s}    [true -> 5]               // 4
  th = Some(ti)                   [true -> 6]               // 5
  ta = [ta with i = th]           [true -> 7, false -> 9]   // 6
  o = {p with threads = ta}       [true -> 8]               // 7
  [true]                                                    // 8
  [invalid]                                                 // 9
}

¹ The αSmil version is slightly simplified, as we are not checking if the transition to Blocked is valid.

Every statement in our stop_thread example is followed by a construct of the form exit_label -> numerical_label. This indicates the statement to be executed next, as identified by the numerical_label, if the current statement exits with label exit_label. For example, when the first statement, ta = p.threads, exits with label true, the predicate's execution continues with the statement following it, having the numerical label 1. We remark that the predicate's exit labels are included in the body of an explicit predicate, as can be seen at lines 8 and 9 for true and invalid, respectively. Intuitively, the predicate's body resembles a control flow graph and can be illustrated as shown in Figure 4.1. The predicate's exit labels are the control flow graph's exit nodes, as will be discussed in Section 4.2.

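Figure 4.1 – Body of the stop_thread Predicate (control flow graph: nodes 0 to 7 carry the statements of the predicate body and are chained through their successful exit labels; the false edges leaving the array access at node 1 and the array update at node 6, and the None edge leaving the switch at node 2, lead to exit node 9; node 7 leads to exit node 8, labeled true)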

We are working with αSmil, which is a computational version of Smil where all specification-only predicates have been removed. Simulating hypotheses, lemmas and contracts is straightforward and can be achieved using predicates having only the true and false labels and no associated output. Inductives are the only exception to this rule: they are supported in αSmil as well, and their declaration is similar to the one in Smart. The αSmil equivalent of the not_disjoint inductive presented in our Abstract Process Manager example (on page 46) has the following form:

    predicate not_disjoint(process p) -> [true | false]
    inductive

      case StacksJoint(int i, int j)
        thread ti; thread tj; memory_region sti; memory_region stj;
        i = j                                [true -> 1, false -> 7]
        thread(p, i)[true: ti | None]        [true -> 2, None -> 7]
        thread(p, j)[true: tj | None]        [true -> 3, None -> 7]
        sti := ti.stack                      [true -> 4]
        stj := tj.stack                      [true -> 5]
        overlap(sti, stj)[true | false]      [true -> 6, false -> 7]
        [true]
        [false]
        [error]

      case CodeStackJoint(int k)
        thread tk; memory_region stk; address_space asp; memory_region code;
        thread(p, k)[true: tk | None]        [true -> 1, None -> 6]
        stk := tk.stack                      [true -> 2]
        asp := p.adr_space                   [true -> 3]
        code := asp.code                     [true -> 4]
        overlap(stk, code)[true | false]     [true -> 5, false -> 6]
        [true]
        [false]
        [error]

      case DataStackJoint(int l)
        thread tl; memory_region stl; address_space aspace; memory_region data;
        thread(p, l)[true: tl | None]        [true -> 1, None -> 6]
        stl := tl.stack                      [true -> 2]
        aspace := p.adr_space                [true -> 3]
        data := aspace.data                  [true -> 4]
        overlap(stl, data)[true | false]     [true -> 5, false -> 6]
        [true]
        [false]
        [error]

    predicate disjoint_stacks(process p) -> [true | false]
      not_disjoint(p)[true | false]          [true -> 1, false -> 2]
      [true]
      [false]
      [error]

This inductive predicate has been introduced and explained in Section 3.1.5 of the previous chapter (on page 52). It characterizes the potential situations in which the memory isolation of the different threads associated with a process can be broken.


4.2 Control Flow Graph

Predicate bodies in αSmil resemble a control flow graph representation having statements as nodes. The nodes represent program states and the edges are defined by statements with a particular exit label $\lambda$.

The control flow graph $G_p = (N, E)$ of a predicate $p$ has a node $n_i \in N$ for each program point. For each statement $s$ at program point $n_i$ that can execute and reach program point $n_j$ with exit label $\lambda_k$, an edge $(n_i, n_j)$ is added to $G_p$ and labeled with $s$ and $\lambda_k$. $G_p$ has a single entry node $n_{in} \in N$, corresponding to the program point associated to the first statement of $p$. The set of exit nodes $n_{out} \subset N$ consists of the nodes associated to each possible exit label $\lambda_k$ of the predicate. To these, one additional exit node, which is used as a sink node, is added. This corresponds to the error label.

In practice, all the outgoing edges of a node $n_i \in N$ bear the different cases of the same statement $s$ found at program point $n_i$. Thus, the edges are labeled with the same statement $s$, and there is an edge labeled $(s, \lambda_k)$ for each possible exit label $\lambda_k$ of $s$.
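As a rough illustration of this construction, the following Python sketch builds the node and edge sets from a predicate body. It is not the thesis' implementation: the function build_cfg, the list-based body encoding and the use of label names as exit nodes are assumptions made for the example, and the statement strings are for display only.

```python
def build_cfg(body, exits):
    """Build (nodes, edges, entry, exit_nodes) for a predicate body.

    body  : list of (statement, {exit label: target program point})
    exits : {program point: exit label of the predicate}
    """
    # One exit node per predicate exit label, plus the error sink node.
    exit_nodes = set(exits.values()) | {"error"}
    nodes = set(range(len(body))) | exit_nodes
    edges = []
    for n_i, (stmt, successors) in enumerate(body):
        for label, target in successors.items():
            # Program points that hold an exit label become exit nodes.
            n_j = exits.get(target, target)
            edges.append((n_i, n_j, stmt, label))
    entry = 0  # program point of the first statement
    return nodes, edges, entry, exit_nodes


# Body of a three-statement predicate that reads the i-th thread of a process.
THREAD = [
    ("ts := p.threads",         {"true": 1}),
    ("tio := ts[i]",            {"true": 2, "false": 5}),
    ("switch (tio) as [ti | ]", {"Some": 3, "None": 4}),
]
THREAD_EXITS = {3: "true", 4: "None", 5: "oob"}

nodes, edges, entry, exit_nodes = build_cfg(THREAD, THREAD_EXITS)
```

Every outgoing edge of a node carries the same statement and differs only in its exit label, which matches the remark above.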

The subfigures in Figure 4.2 show the control flow graph of the following predicate:

    predicate thread(process p, int i) -> [true: thread ti | None | oob]

which receives a process p and an index i as inputs and returns the i-th active thread of the input process. If the i-th thread is inactive, it exits with the exit label None. In the case of an "out of bounds" exception, the exit label oob is returned. For better readability, Figure 4.2-b gives the control flow of the same predicate, where we have labeled the nodes with statements of the predicate and the edges with their exit labels. Throughout the rest of our αSmil predicate examples, we will favour the latter representation.

[Figure 4.2: two drawings of the control flow graph of thread. (a) G_thread: nodes n1, n2, n3 and exit nodes true, None and oob; each edge is labeled with a statement and one of its exit labels (ts := p.threads with true; tio := ts[i] with true or false; switch (tio) as [ti | ] with Some or None). (b) G_thread, alternative representation: nodes are labeled with the statements and edges with the exit labels.]

Figure 4.2 – Example – Control Flow Graph of Predicate thread

4.3 Well-Typed αSmil Statements

We formally define what it means for an αSmil statement to be well-typed and detail the full system of inference rules for the statements supported by αSmil in Table 4.6 and Table 4.7.

A well-typed αSmil statement is a statement that is compatible with the types specified in the signature $\sigma_p$ of the called built-in predicate $p$. This requires a typing environment $\Gamma$ mapping variables to their types.

Definition 4.3.1. Typing Environment $\Gamma$

\[ \Gamma : V \rightarrow T \]

Furthermore, αSmil distinguishes between variables $v \in V$ which can be written to and variables which are read-only. Therefore, the definition of well-typedness for statements requires two different sets of variable identifiers, one for each kind of variable. These are:

• $V^+$, with $V^+ \subseteq V$, which denotes the set of identifiers of writable and readable variables, and

• $V \setminus V^+$, which denotes the set of read-only variables.

The mapping between predicate identifiers and their signatures is denoted by $\Sigma$.

Definition 4.3.2. Mapping between Predicate Identifiers and Signatures

\[ \Sigma : P \rightarrow S \]

Definition 4.3.3. Well-Typed Statement. A statement $s$ exiting with label $\lambda \in L \setminus \{\mathit{error}\}$ is well-typed in the typing environment $\Gamma$, given $\Sigma$:

\[ \Sigma, \Gamma, O \vdash s \rightarrow \lambda \]

if it is compatible with the types specified in its signature. Moreover, outputs of a well-typed statement must be in the writable variables set $O \subseteq V^+$.

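The inference rule for a well-typed predicate call captures all these properties and is shown in rule [WTPCall], given in Table 4.6.

Table 4.6 – Well-Typed Predicate Call

\[
\frac{
\begin{array}{c}
\Sigma(p) = (x_1 : \tau_1, \ldots, x_n : \tau_n)[\lambda_1 : (\tau_{11}\ y_{11}, \ldots, \tau_{1k_1}\ y_{1k_1}) \mid \ldots \mid \lambda_m : (\tau_{m1}\ y_{m1}, \ldots, \tau_{mk_m}\ y_{mk_m})] \\
\Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \qquad \forall i \in 1..m,\ \Gamma(o_{i1}) = \tau_{i1} \ \ldots\ \Gamma(o_{ik_i}) = \tau_{ik_i} \\
o_{i1}, \ldots, o_{ik_i} \in O \qquad \forall i,\ \forall j,\ \forall k_i,\ j \neq k_i,\ o_{ij} \neq o_{ik_i} \qquad \lambda \in \{\lambda_1, \ldots, \lambda_m\}
\end{array}
}{
\Sigma, \Gamma, O \vdash p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m] \rightarrow \lambda
}
\ [\text{WTPCall}]
\]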

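The inference rules for the αSmil statements representing calls to built-in predicates are detailed in Table 4.7.

Table 4.7 – Well-Typed Statements

\[ \frac{\Gamma(e_1) = \Gamma(e_2) \qquad \lambda \in \{\mathit{true}, \mathit{false}\}}{\Sigma, \Gamma, O \vdash e_1 = e_2 \rightarrow \lambda}\ [\text{WTEquals}] \]

\[ \frac{\Gamma(o) = \Gamma(e) \qquad o \in O}{\Sigma, \Gamma, O \vdash o := e \rightarrow \mathit{true}}\ [\text{WTAsgn}] \]

\[ \frac{}{\Sigma, \Gamma, O \vdash \mathit{nop} \rightarrow \mathit{true}}\ [\text{WTNop}] \]

\[ \frac{\Gamma(r) = \mathit{struct}\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \qquad \Gamma(e_1) = \tau_1 \ \ldots\ \Gamma(e_n) = \tau_n \qquad r \in O}{\Sigma, \Gamma, O \vdash r := \{e_1, \ldots, e_n\} \rightarrow \mathit{true}}\ [\text{WTRecNew}] \]

\[ \frac{\Gamma(r) = \mathit{struct}\{f_1 : \tau_1, \ldots, f_n : \tau_n\} \qquad \Gamma(o_1) = \tau_1 \ \ldots\ \Gamma(o_n) = \tau_n \qquad \forall i,\ o_i \in O \qquad \forall i \neq j,\ o_i \neq o_j}{\Sigma, \Gamma, O \vdash \{o_1, \ldots, o_n\} := r \rightarrow \mathit{true}}\ [\text{WTRecAll}] \]

\[ \frac{\Gamma(r) = \mathit{struct}\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \qquad \Gamma(o) = \tau_i \qquad o \in O}{\Sigma, \Gamma, O \vdash o := r.f_i \rightarrow \mathit{true}}\ [\text{WTRecGet}] \]

\[ \frac{\Gamma(r) = \Gamma(r') = \mathit{struct}\{f_1 : \tau_1, \ldots, f_i : \tau_i, \ldots, f_n : \tau_n\} \qquad \Gamma(e) = \tau_i \qquad r' \in O}{\Sigma, \Gamma, O \vdash r' := \{r \text{ with } f_i = e\} \rightarrow \mathit{true}}\ [\text{WTRecSet}] \]

\[ \frac{\Gamma(r') = \Gamma(r'') = \mathit{struct}\{g_1 : \tau_1, \ldots, g_n : \tau_n\} \qquad \lambda \in \{\mathit{true}, \mathit{false}\} \qquad \{f_1, \ldots, f_k\} \subseteq \{g_1, \ldots, g_n\}}{\Sigma, \Gamma, O \vdash r' = \langle f_1, \ldots, f_k \rangle\, r'' \rightarrow \lambda}\ [\text{WTRecEq}] \]

\[ \frac{\Gamma(v) = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \qquad \Gamma(e) = \tau_p \qquad v \in O}{\Sigma, \Gamma, O \vdash v := C_p[e] \rightarrow \mathit{true}}\ [\text{WTVarCons}] \]

\[ \frac{\Gamma(v) = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_p : \tau_p \mid \ldots \mid C_n : \tau_n] \qquad \Gamma(o_p) = \tau_p \qquad o_p \in O}{\Sigma, \Gamma, O \vdash \mathit{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n] \rightarrow \lambda_{C_p}}\ [\text{WTVarSwitch}] \]

\[ \frac{\Gamma(v) = \mathit{variant}[D_1 : \tau_1 \mid \ldots \mid D_m : \tau_m] \qquad \{C_1, \ldots, C_k\} \subseteq \{D_1, \ldots, D_m\} \qquad \lambda \in \{\mathit{true}, \mathit{false}\}}{\Sigma, \Gamma, O \vdash v \in \{C_1, \ldots, C_k\} \rightarrow \lambda}\ [\text{WTVarPos}] \]

\[ \frac{\Gamma(a) = \mathit{arr}_{\tau_i}\langle\tau\rangle \qquad \lambda \in \{\mathit{true}, \mathit{false}\} \qquad \Gamma(i) = \tau_i \qquad \Gamma(o) = \tau \qquad o \in O}{\Sigma, \Gamma, O \vdash o := a[i] \rightarrow \lambda}\ [\text{WTAGet}] \]

\[ \frac{\Gamma(a') = \Gamma(a) = \mathit{arr}_{\tau_i}\langle\tau\rangle \qquad \lambda \in \{\mathit{true}, \mathit{false}\} \qquad \Gamma(i) = \tau_i \qquad \Gamma(e) = \tau \qquad a' \in O}{\Sigma, \Gamma, O \vdash a' := [a \text{ with } i = e] \rightarrow \lambda}\ [\text{WTASet}] \]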
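To make the shape of these rules more concrete, the sketch below checks three of them ([WTAsgn], [WTRecGet] and [WTAGet]) in Python. It is a minimal illustration under our own assumptions: the classes Struct and Arr, the encoding of primitive types as strings, and the convention of returning the set of admissible exit labels (empty when the statement is ill-typed) are hypothetical choices, not the formalization used in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Struct:
    fields: tuple  # ((field name, field type), ...)


@dataclass(frozen=True)
class Arr:
    index: object  # type of the indices
    elem: object   # type of the elements


def wt_asgn(gamma, O, o, e):
    """[WTAsgn]: both sides have the same type and o is writable."""
    return {"true"} if gamma[o] == gamma[e] and o in O else set()


def wt_rec_get(gamma, O, o, r, f):
    """[WTRecGet]: r is a structure whose field f has the type of o."""
    t = gamma[r]
    ok = isinstance(t, Struct) and dict(t.fields).get(f) == gamma[o] and o in O
    return {"true"} if ok else set()


def wt_a_get(gamma, O, o, a, i):
    """[WTAGet]: a is an array; index and element types must match."""
    t = gamma[a]
    ok = (isinstance(t, Arr) and gamma[i] == t.index
          and gamma[o] == t.elem and o in O)
    return {"true", "false"} if ok else set()


# Typing environment for the first two statements of a thread-like predicate.
threads_t = Arr("int", "option<thread>")
gamma = {
    "p": Struct((("threads", threads_t), ("pid", "int"))),
    "ts": threads_t,
    "i": "int",
    "tio": "option<thread>",
}
O = {"ts", "tio"}  # writable variables; p and i are read-only inputs
```

Note that the array access admits both Boolean labels, whereas the field access can only exit with true.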

The well-typedness of statements plays an important role with respect to the statements' interpretation, as we will show in the next section. It is also essential for the well-typedness and well-formedness of dependency and correlation summaries that will be presented in the following chapters.

The control flow graph $G_p = (N, E)$ of a predicate $p$ is well-typed if any edge labeled with $(s, \lambda) \in E$ is well-typed.

\[ \frac{\forall (s, \lambda) \in E,\ \ \Sigma, \Gamma, O \vdash s \rightarrow \lambda}{\Sigma, \Gamma, O \vdash G_p = (N, E)}\ [\text{WTCfg}] \]

Figure 4.3 – Well-Typed Control Flow Graph

4.4 Operational Semantics of αSmil Statements

This section presents the structural operational semantics (Nielson, Nielson, and Hankin 1999; Plotkin 2004) of the αSmil language. Sometimes also called the small-step operational semantics, this allows reasoning about intermediate stages in a program's execution and emphasizes the individual steps of the computation.

Types. We take $T_0$ to be the universe of primitive types $\tau_0 \in T_0$. Structures, variants and associative arrays are defined inductively. Structures are finite labeled products of types. They are a generalization of the Cartesian product. Variants are finite labeled disjoint unions of several types $\tau$. Two types are equal when they are pointwise equal.

Semantic Values. For each type $\tau$, we define the set $D_\tau$ of semantic values of that type. For each primitive type $\tau_0 \in T_0$, we suppose a given $D_{\tau_0}$. Other semantic values are defined inductively, as shown below.

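Definition 4.4.1. Semantic Values $D_\tau$

\[ D_{\mathit{struct}\{f_1 : \tau_1, \ldots, f_n : \tau_n\}} = \{\, \{f_1 = v_1, \ldots, f_n = v_n\} \mid \forall i,\ v_i \in D_{\tau_i} \,\} \]

\[ D_{\mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n]} = \biguplus_{1 \le i \le n} \{\, C_i[v] \mid v \in D_{\tau_i} \,\} \quad \text{where } \biguplus \text{ is the disjoint union} \]

\[ D_{\mathit{arr}_{\tau_i}\langle\tau\rangle} = \{\, (P, (v_k)_{k \in P}) \mid P \subseteq D_{\tau_i},\ \forall k \in P,\ v_k \in D_\tau \,\} \]

In αSmil, arrays are partial. In a semantic value belonging to $D_{\mathit{arr}_{\tau_i}\langle\tau\rangle}$, $P$ denotes the domain of valid indices for the array.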
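One possible concrete reading of these definitions, given purely as an illustration, represents structures as Python dictionaries, variant values as (constructor, argument) pairs and partial arrays as dictionaries whose key set plays the role of $P$. The type encoding as nested tuples, the "unit" type used for constructors without an argument and the function in_domain are assumptions of this sketch.

```python
def in_domain(v, t):
    """Check whether the value v belongs to D_t for an encoded type t."""
    kind = t[0]
    if kind == "int":
        return isinstance(v, int)
    if kind == "unit":  # argument of a constructor that carries no data
        return v == ()
    if kind == "struct":  # ("struct", {field name: field type})
        fields = t[1]
        return (isinstance(v, dict) and v.keys() == fields.keys()
                and all(in_domain(v[f], ft) for f, ft in fields.items()))
    if kind == "variant":  # ("variant", {constructor: argument type})
        ctors = t[1]
        return (isinstance(v, tuple) and len(v) == 2 and v[0] in ctors
                and in_domain(v[1], ctors[v[0]]))
    if kind == "arr":  # ("arr", index type, element type)
        _, index_t, elem_t = t
        # The keys of the dictionary form the domain P of valid indices.
        return (isinstance(v, dict)
                and all(in_domain(k, index_t) and in_domain(x, elem_t)
                        for k, x in v.items()))
    raise ValueError(f"unknown type {t!r}")


THREAD = ("struct", {"id": ("int",)})
OPTION_THREAD = ("variant", {"None": ("unit",), "Some": THREAD})
THREADS = ("arr", ("int",), OPTION_THREAD)

# A partial array whose domain of valid indices is P = {0, 3}.
threads = {0: ("Some", {"id": 7}), 3: ("None", ())}
```

Here in_domain(threads, THREADS) holds even though indices 1 and 2 are absent, which is what partiality means in this encoding.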

Two values of the same type are equal when they are pointwise equal.

Traditionally, in operational semantics, one is interested in how the state is modified during the execution of a statement. αSmil has no concept of state per se; what is essential is the evaluation of variables in different environments or semantic contexts. To emphasize this idea, we define a valuation or environment $E \in \mathcal{E}$ as a mapping from variables to semantic values.

Definition 4.4.2. Valuation or environment $E$

\[ E : V \rightarrow D \]

Two valuations $E$ and $E'$ are equal if they map the same set of variables to semantic values that are pointwise equal:

\[ E = E' \iff \forall v \in V,\ E(v) = E'(v) \]

Given a typing environment $\Gamma$, a valuation $E$ is well-typed if the value mapped to any variable $v \in \mathit{Dom}(E)$ is of the appropriate type $\Gamma(v)$. We denote this by $\Gamma \vdash E$ and show it in [WTEnv].

\[ \frac{\forall v \in E,\ E(v) \in D_{\Gamma(v)}}{\Gamma \vdash E}\ [\text{WTEnv}] \]

Definition 4.4.3. A configuration $\langle E, [s] \rangle$ of the semantics is a pair consisting of a valuation and a statement.

Definition 4.4.4. The transitions of the semantics are of the form

\[ \langle E, [s] \rangle \xrightarrow{\lambda} E' \]

They express how the configuration is changed by one step of computation, occurring when executing a statement $s$ that exits with label $\lambda$. The exit label yielded by the statement's execution uniquely determines the statement that will be executed next. The change of the valuation is recorded in the resulting valuation $E'$. We write $E[x \rightarrow v]$ for the valuation that is identical to $E$ except that $x$ is mapped to the value $v$. We say that $E$ is extended with $x \rightarrow v$, and formally we define it as shown below.

Definition 4.4.5. Extend $E$ with $x \rightarrow v$

\[ (E[x \rightarrow v])(y) = \begin{cases} v & \text{if } x = y \\ E(y) & \text{otherwise} \end{cases} \]

Extending a valuation $E$ with multiple mappings $x \rightarrow v$ consists in applying the extension in a left-associative fashion. In the following, we will omit parentheses for such extensions, thus denoting

\[ (\ldots((E[x_1 \rightarrow v_1])[x_2 \rightarrow v_2])\ldots)[x_n \rightarrow v_n] \]

as

\[ E[x_1 \rightarrow v_1][x_2 \rightarrow v_2] \ldots [x_n \rightarrow v_n] \]
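A small Python sketch may help to fix the intuition behind valuation extension. It assumes, for illustration only, that a valuation is a dictionary from variable names to values; the names extend and extend_all are hypothetical.

```python
def extend(E, x, v):
    """Return E[x -> v]: identical to E except that x is mapped to v.

    The original valuation E is left untouched.
    """
    extended = dict(E)
    extended[x] = v
    return extended


def extend_all(E, *bindings):
    """Apply several extensions from left to right (left-associative)."""
    for x, v in bindings:
        E = extend(E, x, v)
    return E


E = {"x": 1}
E1 = extend(E, "y", 2)
E2 = extend_all(E, ("x", 5), ("x", 6))
```

Because the extensions are applied from left to right, a later binding of the same variable overrides an earlier one, as in E2 above.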

An interpretation $I \in \mathcal{I}$ for a predicate is defined as a mapping from a predicate and an initial environment to an output environment and an exit label.

Definition 4.4.6. Predicate Interpretation $I \in \mathcal{I}$

\[ I : P \times \mathcal{E} \rightarrow \mathcal{E} \times L \]

The initial environment is a mapping between the predicate's formal arguments and their effective values. The output environment is a mapping between the predicate's formal output arguments and their effective values after executing the predicate.

The detailed definition of the semantics of generic statements is given below in Table 4.8. The first clause, [nop], constitutes an axiom, as it has no premises. It states that the nop statement executes in one step, yielding the exit label true without extending the valuation $E$. The semantics of equality tests is given by two inference rules, [equalT] and [equalF], one for each of the statement's possible exit labels. A call to the built-in predicate = will exit with label true if and only if the valuations of its arguments $e_1$ and $e_2$ are equal (clause [equalT]). Otherwise, the statement will exit with label false (clause [equalF]). In both cases, the statement leaves the valuation $E$ unchanged. The semantics of an assignment is given by the [asgn] clause: the statement always yields the exit label true and extends the valuation $E$ with $o$ mapped to the value $E(e)$ of $e$.

Table 4.8 – The Structural Operational Semantics of αSmil Generic Statements

\[ [\text{nop}]\ \ \frac{}{\langle E, [\mathit{nop}] \rangle \xrightarrow{\mathit{true}} E} \]

\[ [\text{equalT}]\ \ \frac{E(e_1) = E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{\mathit{true}} E} \]

\[ [\text{equalF}]\ \ \frac{E(e_1) \neq E(e_2)}{\langle E, [e_1 = e_2] \rangle \xrightarrow{\mathit{false}} E} \]

\[ [\text{asgn}]\ \ \frac{E' = E[o \rightarrow E(e)]}{\langle E, [o := e] \rangle \xrightarrow{\mathit{true}} E'} \]
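These four clauses can be mirrored by a few lines of Python. The sketch assumes that valuations are dictionaries, that the operands are variable names looked up in the valuation, and that every step function returns the pair (exit label, resulting valuation); the function names are hypothetical.

```python
def step_nop(E):
    """[nop]: always exits with true and leaves the valuation unchanged."""
    return "true", E


def step_equal(E, e1, e2):
    """[equalT] / [equalF]: compare the values of e1 and e2 in E."""
    return ("true" if E[e1] == E[e2] else "false"), E


def step_asgn(E, o, e):
    """[asgn]: always exits with true and extends E with o -> E(e)."""
    return "true", {**E, o: E[e]}


E = {"a": 1, "b": 1, "c": 2}
```

Only the assignment produces a new valuation; the two other statements return the valuation they received.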

The semantics of structure-related statements is given in Table 4.9. The creation of a structure always yields the exit label true, as indicated by the [recNew] clause, and it extends the valuation $E$ by mapping the resulting output variable $r$ to the structural value obtained by mapping every field $f_i$ to the value $E(e_i)$ of the corresponding argument $e_i$. The destructuring of a structure $r$ extends the valuation $E$ by mapping every output $o_i$ to the value $v_i$ of the corresponding field $f_i$ of $r$. The statement always exits with true. The valuation $E'$ obtained after executing an access to a given field $f_i$ of a structure $r$ is an extension of $E$ where the output $o$ is mapped to the value of the field $f_i$ of $r$ in $E$. The semantics of a field update is given by the clause [recSet]. This statement extends the valuation $E$ by mapping the output structure $r'$ to a new value, where the updated field $f_i$ is mapped to the value of $e$ in $E$ and every other field is mapped to the same value it had in $E$. Finally, the last two clauses correspond to a partial structure equality test. As shown by [recEqualsT], the statement yields the exit label true if and only if the values of every field $g_i$ in the given set of fields are equal for $r'$ and $r''$ in $E$. Otherwise, the statement yields the label false. In both cases, the valuation $E$ remains unchanged.

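Table 4.9 – Operational Semantics of αSmil Structure-Related Statements

\[ [\text{recNew}]\ \ \frac{E' = E[r \rightarrow \{f_1 = E(e_1), \ldots, f_i = E(e_i), \ldots, f_n = E(e_n)\}]}{\langle E, [r := \{e_1, \ldots, e_n\}] \rangle \xrightarrow{\mathit{true}} E'} \]

\[ [\text{recAll}]\ \ \frac{E(r) = \{f_1 = v_1, \ldots, f_n = v_n\} \qquad E' = E[o_1 \rightarrow v_1][o_2 \rightarrow v_2] \ldots [o_n \rightarrow v_n] \qquad \forall i, j,\ i \neq j,\ o_i \neq o_j}{\langle E, [\{o_1, \ldots, o_n\} := r] \rangle \xrightarrow{\mathit{true}} E'} \]

\[ [\text{recGet}]\ \ \frac{E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \qquad E' = E[o \rightarrow v_i]}{\langle E, [o := r.f_i] \rangle \xrightarrow{\mathit{true}} E'} \]

\[ [\text{recSet}]\ \ \frac{E(r) = \{f_1 = v_1, \ldots, f_i = v_i, \ldots, f_n = v_n\} \qquad E' = E[r' \rightarrow \{f_1 = v_1, \ldots, f_i = E(e), \ldots, f_n = v_n\}]}{\langle E, [r' := \{r \text{ with } f_i = e\}] \rangle \xrightarrow{\mathit{true}} E'} \]

\[ [\text{recEqualsT}]\ \ \frac{\begin{array}{c} E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \qquad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\ \{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \qquad v_{g_i} = w_{g_i}\ \ \forall i \in 1..k \end{array}}{\langle E, [r' = \langle g_1, \ldots, g_k \rangle\, r''] \rangle \xrightarrow{\mathit{true}} E} \]

\[ [\text{recEqualsF}]\ \ \frac{\begin{array}{c} E(r') = \{f_1 = v_{f_1}, \ldots, f_n = v_{f_n}\} \qquad E(r'') = \{f_1 = w_{f_1}, \ldots, f_n = w_{f_n}\} \\ \{g_1, \ldots, g_k\} \subseteq \{f_1, \ldots, f_n\} \qquad \exists i,\ i \in 1..k,\ v_{g_i} \neq w_{g_i} \end{array}}{\langle E, [r' = \langle g_1, \ldots, g_k \rangle\, r''] \rangle \xrightarrow{\mathit{false}} E} \]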
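The structure-related clauses can be illustrated in the same style. In this Python sketch, a structure value is a dictionary from field names to values, operands are variable names, and each function returns (exit label, resulting valuation); these encodings and all function names are assumptions made for the example.

```python
def rec_new(E, r, fields, es):
    """[recNew]: build a structure from the values of the arguments es."""
    return "true", {**E, r: {f: E[e] for f, e in zip(fields, es)}}


def rec_all(E, outs, r):
    """[recAll]: destructure r into distinct outputs, in field order."""
    return "true", {**E, **dict(zip(outs, E[r].values()))}


def rec_get(E, o, r, f):
    """[recGet]: read the field f of r into o."""
    return "true", {**E, o: E[r][f]}


def rec_set(E, r_out, r, f, e):
    """[recSet]: r_out is a copy of r in which f holds the value of e."""
    return "true", {**E, r_out: {**E[r], f: E[e]}}


def rec_equals(E, r1, fields, r2):
    """[recEqualsT] / [recEqualsF]: compare r1 and r2 on the given fields only."""
    same = all(E[r1][g] == E[r2][g] for g in fields)
    return ("true" if same else "false"), E


E0 = {"a": 1, "b": 2}
_, E1 = rec_new(E0, "r", ["x", "y"], ["a", "b"])   # r = {x = 1, y = 2}
_, E2 = rec_set(E1, "r2", "r", "y", "a")           # r2 = {x = 1, y = 1}
```

In E2, the partial equality test on field x alone succeeds, whereas the test on both x and y fails; neither changes the valuation.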

Table 4.10 details the semantics of variant-related statements. As indicated by the [varCons] clause, the construction of a variant $v$ with a constructor $C_p$ always yields the exit label true. The obtained valuation $E'$ is an extension of $E$ where the value of $v$ is obtained by applying the constructor $C_p$ to the argument's value $E(e)$. A variant switch exits with the label $\lambda_{C_i}$ if the value of $v$ in $E$ has been constructed with the $C_i$ constructor. The valuation $E'$ obtained after executing the statement is an extension of $E$ whereby the corresponding output $o_i$ is mapped to the value of the $C_i$ constructor's argument $E(e)$. The last two clauses, [varPossibleT] and [varPossibleF], indicate the semantics of a variant possible check and correspond to the statement's possible exit labels. The statement will yield the label true only if the value of $v$ in $E$ has been obtained with a constructor $D$ that is a member of the given set of constructors $\{C_1, \ldots, C_k\}$. Otherwise, the false label will be returned. In both cases, the valuation remains unchanged.

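Table 4.10 – Operational Semantics of αSmil Variant-Related Statements

\[ [\text{varCons}]\ \ \frac{E' = E[v \rightarrow C_p[E(e)]]}{\langle E, [v := C_p[e]] \rangle \xrightarrow{\mathit{true}} E'} \]

\[ [\text{varSwitch}]\ \ \frac{E(v) = C_i[e] \qquad E' = E[o_i \rightarrow E(e)]}{\langle E, [\mathit{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n]] \rangle \xrightarrow{\lambda_{C_i}} E'} \]

\[ [\text{varPossibleT}]\ \ \frac{E(v) = D[e] \qquad D \in \{C_1, \ldots, C_k\}}{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{\mathit{true}} E} \]

\[ [\text{varPossibleF}]\ \ \frac{E(v) = D[e] \qquad D \notin \{C_1, \ldots, C_k\}}{\langle E, [v \in \{C_1, \ldots, C_k\}] \rangle \xrightarrow{\mathit{false}} E} \]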
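As an informal companion to these clauses, the Python sketch below encodes a variant value as a (constructor, argument) pair. The encoding, the use of None to mark a constructor without an output in the switch, and the function names are assumptions made for illustration.

```python
def var_cons(E, v, C, e):
    """[varCons]: v becomes the constructor C applied to the value of e."""
    return "true", {**E, v: (C, E[e])}


def var_switch(E, v, ctors, outs):
    """[varSwitch]: exit with the label of the constructor found in v.

    ctors lists the constructors in declaration order; outs[i] is the
    output variable for ctors[i], or None when that case has no output.
    """
    C, argument = E[v]
    o = outs[ctors.index(C)]
    return C, (E if o is None else {**E, o: argument})


def var_possible(E, v, Cs):
    """[varPossibleT] / [varPossibleF]: is the constructor of v in Cs?"""
    return ("true" if E[v][0] in Cs else "false"), E


E0 = {"t": {"id": 7}}
_, E1 = var_cons(E0, "th", "Some", "t")
E_none = {"th": ("None", ())}
```

A switch over E1 binds the output of the Some case, whereas a switch over E_none exits with None and extends nothing.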

Table 4.11 describes the semantics of array-related statements. Each array-related statement has two corresponding clauses, one for each of the Boolean exit labels. Accessing an array's element yields the exit label true if the given index $i$ is a valid index. The resulting valuation $E'$ is extended by mapping the output $o$ to the value in $E$ of the array's $i$-th element. Otherwise, when the given index $i$ is invalid, as indicated by the [arrGetF] clause, the statement yields the label false and leaves the valuation unmodified. The semantics of an array update is given by the [arrSetT] and [arrSetF] clauses. If the given index $i$ is valid, the exit label true is yielded and the resulting valuation is obtained by extending $E$ with $a'$, whose $i$-th element's value is the value of $e$ in the initial valuation $E$. The values of all other elements of $a'$ are the ones found in $E$ for the elements of $a$. On the contrary, if the given index $i$ is invalid, the valuation remains unchanged and the label false is yielded.

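Table 4.11 – Operational Semantics of αSmil Array-Related Statements

\[ [\text{arrGetT}]\ \ \frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \in P \qquad E' = E[o \rightarrow v_{E(i)}]}{\langle E, [o := a[i]] \rangle \xrightarrow{\mathit{true}} E'} \]

\[ [\text{arrGetF}]\ \ \frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \notin P}{\langle E, [o := a[i]] \rangle \xrightarrow{\mathit{false}} E} \]

\[ [\text{arrSetT}]\ \ \frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \in P \qquad E' = E[a' \rightarrow (P, (w_k)_{k \in P})] \qquad w_k = \begin{cases} E(e) & \text{if } k = E(i) \\ v_k & \text{otherwise} \end{cases}}{\langle E, [a' := [a \text{ with } i = e]] \rangle \xrightarrow{\mathit{true}} E'} \]

\[ [\text{arrSetF}]\ \ \frac{E(a) = (P, (v_k)_{k \in P}) \qquad E(i) \notin P}{\langle E, [a' := [a \text{ with } i = e]] \rangle \xrightarrow{\mathit{false}} E} \]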
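The array clauses admit an equally short illustration. In the Python sketch below, a partial array is a dictionary whose key set plays the role of the index domain $P$; this encoding and the function names are assumptions, and operands are again variable names looked up in the valuation.

```python
def arr_get(E, o, a, i):
    """[arrGetT] / [arrGetF]: read the cell at index E(i) if it is valid."""
    array, k = E[a], E[i]
    if k in array:  # E(i) belongs to the domain P
        return "true", {**E, o: array[k]}
    return "false", E


def arr_set(E, a_out, a, i, e):
    """[arrSetT] / [arrSetF]: update the cell at index E(i) if it is valid."""
    array, k = E[a], E[i]
    if k in array:
        # Same domain P; only the cell at E(i) changes.
        return "true", {**E, a_out: {**array, k: E[e]}}
    return "false", E


E0 = {"a": {0: "x", 2: "y"}, "i": 2, "j": 1, "e": "z"}
```

With this valuation, index i is valid while index j falls in a hole of the partial array, so accesses through j exit with false and leave the valuation untouched.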

The semantics of a generic predicate call $p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]$ is captured by the [pCall] inference rule shown in Table 4.12. Interpreting the predicate $p$ in the context of its arguments' values in the valuation $E$ yields a label $\lambda_i$ and a mapping between its formal output arguments and their resulting values $v_{ij}$. The resulting valuation $E'$ is obtained by extending $E$ with the output variables $o_{ij}$ mapped to the corresponding $v_{ij}$.

The interpretation of a statement is well-typed with respect to a signature if and only if every tuple in the interpretation is well-typed, i.e. if it has the expected number of inputs with the adequate types and an adequate label with well-typed outputs as well. Furthermore, it has to be total, i.e. for every well-typed tuple of inputs there exist a label and some outputs that match in the interpretation.

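Table 4.12 – Semantics of a Predicate Call

\[
\frac{
\begin{array}{c}
\Sigma(p) = p(x_1 : \tau_1, \ldots, x_n : \tau_n)[\lambda_1 : (\tau_1\ y_1) \mid \ldots \mid \lambda_i : (\tau_{i1}\ y_{i1}, \ldots, \tau_{ik_i}\ y_{ik_i}) \mid \ldots \mid \lambda_m : (\tau_m\ y_m)] \\
I(p, \mathit{inputs}) = (\mathit{outputs}, \lambda_i) \qquad \mathit{inputs}(x_l) = E(e_l)\ \ \forall l \in 1..n \\
\mathit{outputs}(y_{i1}) = v_{i1} \ \ldots\ \mathit{outputs}(y_{ik_i}) = v_{ik_i} \qquad E' = E[o_{i1} \rightarrow v_{i1}] \ldots [o_{ik_i} \rightarrow v_{ik_i}]
\end{array}
}{
\langle E, [p(e_1, \ldots, e_n)\ [\lambda_1 : o_1 \mid \ldots \mid \lambda_m : o_m]] \rangle \xrightarrow{\lambda_i} E'
}
\ [\text{pCall}]
\]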
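The [pCall] rule can be read operationally as in the following Python sketch. Everything here is an assumption made for illustration: the function p_call, the way an interpretation is passed as a Python callable returning (outputs, label), and the hand-written interpretation of a thread-like predicate over dictionary-encoded values.

```python
def p_call(E, interp, p, formals, actuals, outs_by_label):
    """Execute a predicate call in the valuation E.

    interp(p, inputs) -> (outputs, label)
    outs_by_label[label] = [(formal output, caller variable), ...]
    """
    # Bind each formal input to the value of the matching actual argument.
    inputs = {x: E[e] for x, e in zip(formals, actuals)}
    outputs, label = interp(p, inputs)
    # Extend E with the caller's output variables for the yielded label only.
    extended = dict(E)
    for y, o in outs_by_label[label]:
        extended[o] = outputs[y]
    return label, extended


def interp(p, inputs):
    """Hypothetical interpretation of a predicate reading the i-th thread."""
    assert p == "thread"
    threads, i = inputs["p"]["threads"], inputs["i"]
    if i not in threads:
        return {}, "oob"
    constructor, argument = threads[i]
    return ({"ti": argument}, "true") if constructor == "Some" else ({}, "None")


E = {"proc": {"threads": {0: ("Some", {"id": 7}), 1: ("None", ())}}, "j": 0}
OUTS = {"true": [("ti", "tj")], "None": [], "oob": []}
label, E1 = p_call(E, interp, "thread", ["p", "i"], ["proc", "j"], OUTS)
```

Only the outputs associated with the yielded label are written, so the None and oob cases return the caller's valuation with no new binding.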

Definition 4.4.7. Subject Reduction Property. The interpretation of a well-typed statement, given well-typed interpretations for the external predicate calls, preserves the fact that the valuation is well-typed.

\[ \forall\, \Gamma, E, s, \lambda, \Sigma,\ \ (\Gamma \vdash E) \wedge (\Sigma, \Gamma, O \vdash s \rightarrow \lambda) \wedge (\langle E, [s] \rangle \xrightarrow{\lambda} E') \implies \Gamma \vdash E' \]

Definition 4.4.8. The Progress Property. A well-typed statement in a well-typed environment can always be interpreted to some label and outputs.

\[ \forall\, E, \Gamma, \Sigma, s,\ \ (\Gamma \vdash E) \wedge (\Sigma, \Gamma, O \vdash s \rightarrow \lambda) \implies \exists\, \lambda', E',\ \langle E, [s] \rangle \xrightarrow{\lambda'} E' \]

The well-typedness of an interpretation, as well as the subject reduction and progress properties, have been formally proven in Coq by Stéphane Lescuyer.


Chapter 5

Dependency Analysis for Functional Specifications

like islands in the sea, separate on the surface but connected in the deep

William James

Algebraic data types (structures and variants) and associative arrays are fundamental building blocks for representing, grouping and handling complex data efficiently. However, as argued in Chapter 1, operations manipulating them are rarely concerned with the entire compound input data structure. Frequently, they depend only on a limited subset of their input. Complete specifications or contracts (Meyer 1997) of such operations will not only stipulate that the output possesses a certain property (Borgida, Mylopoulos, and Reiter 1993; Polikarpova et al. 2013), but will also include their frame conditions (Borgida, Mylopoulos, and Reiter 1995), i.e. the parts of the input on which they operate. Such conditions facilitate reasoning locally without overlooking the global picture: if a property P is known to hold at a certain point in the program where a predicate p is called, P still holds after the call to p, provided that the (sub)structures on which P depends are disjoint from the (sub)structures that might be modified according to p's frame condition (Banerjee and Naumann 2014). Though intuitively easy, specifying and proving the preservation of logical properties for the unmodified part is a particular manifestation of the frame problem (McCarthy and Hayes 1969; Leavens, Leino, and Müller 2007), a notoriously cumbersome task in formal software verification that imposes unnecessary manual effort (Meyer 2015).
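The frame reasoning sketched in this paragraph boils down to a disjointness check between what a property reads and what a predicate may modify. The Python fragment below is a deliberately naive illustration of that check: access paths are tuples of field names, the symbolic index "i" and all function names are hypothetical, and the example footprints are made up for the purpose of the sketch.

```python
def overlaps(path_a, path_b):
    """Two access paths overlap when one is a prefix of the other."""
    n = min(len(path_a), len(path_b))
    return path_a[:n] == path_b[:n]


def preserved(reads_of_property, writes_of_predicate):
    """A property is preserved if nothing it reads may have been written."""
    return not any(overlaps(r, w)
                   for r in reads_of_property
                   for w in writes_of_predicate)


# Made-up frame condition: a predicate that updates the state of thread i.
writes = [("threads", "i", "crt_state")]

# Made-up footprints of two properties over a process.
reads_address_space = [("adr_space",)]
reads_all_threads = [("threads",)]
```

Under these assumptions, a property that only inspects the address space is preserved by the call, while one that inspects the whole threads array has to be proven again.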

One of the challenges of addressing this problem, and thereby simplifying the verification of certain preserved properties, is to determine the input fragments on which these properties depend, i.e. their footprint (Distefano, O'Hearn, and Yang 2006) or, to a first approximation, their read effects (Feijs and Jonkers 1992; Greenhouse and Boyland 1999; Clarke and Drossopoulou 2002). While specifications sometimes include the write effects (Clarke and Drossopoulou 2002) of an operation through modifies clauses (Guttag et al. 1993b), read effects are usually not specified explicitly, even though this information can be useful for reasoning about an operation's results. The purpose of the dependency analysis presented in this chapter is to take a step forward in this direction and to detect such information automatically. More precisely, our analysis is a static dependency analysis for the αSmil language (presented in Chapter 4) that computes a conservative approximation of the input fragments on which the operations depend.

Dependence and liveness analyses are traditionally used in the compilation realm for code optimization (Kennedy 1978), dead code elimination (Knoop, Rüthing, and Steffen 1994; Wand and Siveroni 1999; Liu and Stoller 2003), program slicing (Weiser 1984; Tip 1995; Reps and Turnidge 1996; Castillo et al. 2008) or compile-time garbage collection (Jones and Métayer 1989; Park and Goldberg 1992; Wand and Clinger 1998). In contrast to the vast majority of static analyses, which are meant to be used strictly on code and in an essentially purely automatic setting, our analysis is thought of as a companion tool to be exploited in the middle of interactive program verification, and it is designed to be used on programs as well as on specifications.

5.1 Dependency Analysis in a Nutshell

In a nutshell, our dependency analysis targets the delimitation of the input subset on which the output depends, in the context of an operation with a compound input. We define dependency as the observed part of a structured domain and strive to obtain type-sensitive results, distinguishing between the subelements of arrays and algebraic data types and capturing the dependency specific to each. The targeted results are meant to mirror, in terms of dependency, the layered structure of compound data types. Furthermore, the dependency analysis must work with conservative approximations, and it must guarantee that what is marked as not needed is definitely not needed, i.e. it is irrelevant for the obtained output.

In the classification of Hind (Hind 2001), our dependency analysis is a flow-sensitive, field-sensitive, interprocedural analysis that handles associative arrays, structures and variant data types. Specific dependency results are computed for each of the possible execution scenarios, i.e. for each exit label. Thus, our analysis also shows a form of path-sensitivity (Hind 2001). However, we favour the term label-sensitivity to describe this characteristic, as it seems more appropriate for our case and the language we are working with.

Our dependency analysis targets complex transition systems in general, and operating systems and microkernels in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e. state changes that map an input state to an output state. Automatically proving the preservation of invariants concerning only subelements of the state (fields, array cells, etc.) that have not been altered by a transition in the system would considerably diminish the number of proof obligations. The first step towards achieving this goal consists in automatically detecting dependency summaries and the minimum relevant input information for producing certain outputs.

As mentioned, our analysis targets fine-grained dependency summaries for arrays, structures and variants, expressed at the level of their subelements. For variants, besides capturing the specific dependency on each constructor and its arguments, we argue that additional relevant information can be computed regarding the subset of possible constructors at a given program point. This is not dependency information per se, but it enriches the footprint of a predicate with useful information. Together with the dependency information, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different, albeit related, point of view. Therefore, we are simultaneously performing a possible-constructors analysis. This has an impact on the defined abstract dependency type, making it more complex, as we will see in the following section. The possible-constructors analysis could be performed separately, as a stand-alone analysis. By performing the two analyses simultaneously, we lose some of the precision that would be attained if the two were performed separately, but we reduce overhead and present relevant information in a unified manner.

Designing the analysis as a tool to be used in the context of interactive program verification, on both code and specifications, has led to specific traits. One of them concerns the treatment of arrays. In contrast to dependence and liveness analyses used for code optimizations (Gross and Steenkiste 1990), which require precision for every array cell, we compute dependency information referring to all cells of the array, or to all but one cell, for which an exceptional dependency is computed. In practice, a considerable number of relevant properties and operations involving arrays fall into this spectrum.

In the following subsection, in order to better illustrate the problem that our analysis addresses, we briefly present two examples of αSmil predicates manipulating structures, variants and arrays, and describe the dependency information that we are targeting.

5.1.1 Targeted Dependency Information

To present the envisioned dependency results, we consider two αSmil predicates, thread and start_address, whose control flow graphs and implementations are shown below. Both predicates manipulate inputs of type process, introduced in Section 3.1.5 (on page 49) and shown in Figure 5.2. Internally, they handle values of type thread and memory_region, respectively, described in Section 3.1.5 (on page 48) as well and shown below in Figure 5.1.

    type memory_region = {
      // Start address
      start : int;
      // Region length
      length : int;
    }

    type thread = {
      // Identifier
      id : int;
      // Current state
      crt_state : state;
      // Stack
      stack : memory_region;
    }

Figure 5.1 – Example Data Types – Thread and Memory Region

    type option<A> =
      | None
      | Some (A a)

    type process = {
      // Array of associated threads
      threads : array<option<thread>>;
      // Internal id
      pid : int;
      // Currently running thread
      crt_thread : int;
      // Address space
      adr_space : address_space;
    }

Figure 5.2 – Input Type – Process

The first predicate, thread, whose control flow graph is shown in Figure 5.4 and whose implementation is shown in Figure 5.3, receives a process p and an index i as inputs. It reads the i-th element in the threads array of the input process p. If this element is active, then the predicate exits with the label true and outputs the corresponding thread ti. Otherwise, it exits with the label None and no output is generated.

    predicate thread(process p, int i) -> [true: thread ti | None | oob]
      array<option<thread>> th; option<thread> tio;
      th := p.threads               [true -> 1]
      tio := th[i]                  [true -> 2, false -> 5]
      switch (tio) as [ | ti]       [None -> 4, Some -> 3]
      [true]
      [None]
      [oob]

Figure 5.3 – Predicate thread – Implementation

Our dependency analysis should be able to distinguish between the different exit labels of the predicate. For the label true, for instance, it should detect that only the field threads is read by the predicate, while all others are irrelevant to the result. Furthermore, it should detect that, for the threads array of the input p, only the i-th element is inspected. Additionally, since we are considering the label true, the i-th element is necessarily an active thread, indicated by the constructor Some. The other constructor, None, is impossible for this execution scenario. On the contrary, for the exit label None, the constructor Some is impossible. For the exit label oob, nothing but the index i and the "support" or "length" of the associated threads array is read. The targeted dependency results for the predicate thread are depicted in Figure 5.5.
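The expectations listed above can be observed on a single concrete run. The Python sketch below logs the access paths read while executing a hand-written version of the thread predicate; it is only a dynamic trace on one input, not the static analysis developed in this chapter, and the encoding of values, the "<domain>" marker for the array's support and the function name are all assumptions.

```python
def thread(p, i, reads):
    """Run a thread-like lookup on concrete data, logging what is read."""
    threads = p["threads"]
    # The bounds check inspects only the support (domain) of the array.
    reads.add(("threads", "<domain>"))
    if i not in threads:
        return "oob", None
    # Only the i-th cell of the array is inspected.
    reads.add(("threads", i))
    constructor, argument = threads[i]
    return ("true", argument) if constructor == "Some" else ("None", None)


p = {
    "threads": {0: ("Some", {"id": 7}), 1: ("None", ())},
    "pid": 3,
    "crt_thread": 0,
    "adr_space": {},
}

reads_true, reads_none, reads_oob = set(), set(), set()
result_true = thread(p, 0, reads_true)
result_none = thread(p, 1, reads_none)
result_oob = thread(p, 9, reads_oob)
```

On every path, the fields pid, crt_thread and adr_space never appear in the log, and on the oob path only the support of the threads array does.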

[Figure 5.4: control flow graph of thread, with nodes th := p.threads, tio := th[i] and switch(tio) as [ | ti], and exit nodes true, None and oob.]

Figure 5.4 – G_thread – Control Flow Graph of Predicate thread

[Figure 5.5: for each of the exit labels true and None, a diagram of process p (fields threads, pid, crt_thread and adr_space, with the i-th cell of threads singled out) and of option<thread> (constructors Some(thread t) and None); each part is marked either Read/Needed or Irrelevant/Not Needed.]

Figure 5.5 – Targeted Dependency Results for Predicate thread

The second predicate, start_address, whose control flow graph is shown in Figure 5.6, receives a process p and an index j as inputs and finds the start address of the stack corresponding to an active thread. It makes a call to the predicate thread, thus reading the j-th element of the threads array of its input process. If this is an active element, it further accesses the field stack, from which it only reads the start address start. Otherwise, if the element is inactive, the predicate forwards the exit label None of the called predicate thread and generates no output. When given an invalid index j, the predicate exits with label oob. The predicate's implementation is shown in Figure 5.7.

The dependency information for this predicate should capture the fact that, on the true execution scenario, only the field start of the input's j-th associated thread is read. Furthermore, the only possible constructor on this execution path is the Some constructor. On the contrary, for the None execution scenario, the only possible constructor is the None constructor. The targeted dependency results for the start_address predicate are depicted in Figure 5.8. We remark that, for the oob execution scenario, only the "support" or "length" of the threads array is read.

[Figure 5.6: control flow graph of start_address, with nodes thread(p, j)[true: tj | None | oob], sj := tj.stack and adr := sj.start, and exit nodes true, None and error; the oob edge of the call leads to error.]

Figure 5.6 – G_start_address – Control Flow Graph of Predicate start_address

    predicate start_address(process p, int j) -> [true: int adr | None]
      thread tj; memory_region sj;
      thread(p, j)[true: tj | None | oob]    [true -> 1, None -> 4, oob -> 5]
      sj := tj.stack                         [true -> 2]
      adr := sj.start                        [true -> 3]
      [true]
      [None]
      [error]

Figure 5.7 – Predicate start_address – Implementation

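[Figure 5.8 – Targeted Dependency Results for Predicate start_address. For the exit label true, the diagram covers the fields threads, pid, crt_thread and adr_space of process p, the constructors Some(thread t) and None of option<thread>, the fields id, crt_state and stack of thread tj, and the fields start and length of the stack. For the exit label None, it covers the fields of process p and the constructors of option<thread>. Each element is marked as either Read/Needed or Irrelevant/Not Needed.]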


5.1.2 Outline

The rest of this chapter focuses on the technical details of the dependency analysis. In Section 5.2, we present the abstract dependency domain. This is the fundamental building block on which our analysis relies in order to determine expressive dependency summaries. It is followed, in Section 5.3, by an in-depth description of our analysis at an intraprocedural level, presenting the data-flow equations in Section 5.3.2 and illustrating their step-by-step application on an example in Section 5.3.3. A summary of the dependency analysis at an interprocedural level is given in Section 5.4. We illustrate the approach and underline its shortcomings on an example in Section 5.4.1, and discuss their origin in Section 5.4.2. Two different semantic interpretations of our dependency information are discussed in Section 5.5. In Section 5.6, we review and discuss approaches targeting information that is similar to our dependency summaries. Finally, in Section 5.7, we conclude and present some other potential applications of our dependency analysis, which are not confined to the field of interactive program verification.

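5.2 Abstract Dependency Domain

The first step towards inferring expressive, type-sensitive results that capture the dependency specific to each subelement of an algebraic data type or an associative array is the definition of an abstract dependency domain $\mathcal{D}$ that mimics the structure of such data types. The dependency domain, $\delta \in \mathcal{D}$, shown below, is defined inductively from the three atomic cases ($\top$, $\square$ and $\bot$) and mirrors the structure of the concrete types.

Definition 5.2.1 (Dependency Domain, $\delta \in \mathcal{D}$).

$$
\begin{array}{rcll}
\delta & ::= & \top & \text{Everything: atomic case (i)} \\
 & \mid & \square & \text{Nothing: atomic case (ii)} \\
 & \mid & \bot & \text{Impossible: atomic case (iii)} \\
 & \mid & \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} & f_1, \ldots, f_n \text{ fields (iv)} \\
 & \mid & [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] & C_1, \ldots, C_m \text{ constructors (v)} \\
 & \mid & \langle \delta \rangle & \text{(vi)} \\
 & \mid & \langle \delta_{def} \triangleright i : \delta_{exc} \rangle & i \text{ array index (vii)}
\end{array}
$$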

As reflected by the above definition, the dependency for atomic types is expressed in terms of the domain's atomic cases: $\top$ (least precise), denoting that everything is needed, and $\square$, denoting that nothing is needed. The third atomic case, $\bot$, denoting impossible, is introduced for the possible-constructors analysis, which is performed simultaneously, and is further explained below.

The dependency of a structure (iv) describes the dependency on each of its fields. For instance, revisiting our thread example from Section 5.1.1, we could express an over-approximation of the dependency information depicted for the process p in Figure 5.5 using the following dependency:

$$\{\mathit{threads} \mapsto \top;\ \mathit{pid} \mapsto \square;\ \mathit{crt\_thread} \mapsto \square;\ \mathit{adr\_space} \mapsto \square\}$$

This captures the fact that all fields except the threads field are irrelevant, i.e., they are not read and nothing in their contents is needed. The dependency for the threads field is an over-approximation and expresses the fact that it is entirely necessary, i.e., everything in its value is needed for the result.

For arrays, we distinguish between two cases: arrays with a general dependency applying to all of the cells, given by (vi), and arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known, given by (vii). For instance, for the threads field of the previous example, the following dependency:

$$\langle \square \triangleright i : \top \rangle$$

would be a less coarse approximation, capturing the fact that only the i-th element of the associated threads array is needed, while all others are irrelevant.

For variants (v), the dependency is expressed in terms of the dependencies of their constructors, expressed in turn in terms of their arguments' dependencies. Thus, a constructor having a dependency mapped to $\square$ is one for which nothing but the tag has been read, i.e., its arguments, if any, are irrelevant for the execution. For instance, for the i-th element of the threads array of our previous example, the following dependency:

$$[\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \square]$$

would be a more precise approximation when considering the exit label true. It is still an over-approximation, as it expresses that both constructors are possible. The argument of the Some constructor is entirely read, while for None only the tag is read.

For variants, we want to take a step further and also include the information that certain constructors cannot occur on certain execution paths. Impossible, the third atomic case ($\bot$), is introduced for this purpose. As mentioned previously in Section 5.1, in order to obtain this additional information we simultaneously perform a "possible-constructors" analysis, which computes, for each execution scenario, the subset of possible constructors for a given value at a given program point. All constructors that cannot occur on a given execution path are marked as $\bot$. In contrast, constructors for which only the tag is read are marked as $\square$. The difference between $\bot$ and $\square$ can be illustrated by considering a polymorphic option type option<A>, having two constructors, None and Some(A val), and a Boolean predicate that pattern matches on an input of this type and returns false in the case of None and true in the case of Some, regardless of the value val of its argument. For the true execution scenario, the dependency on the Some constructor would be $\square$: the tag is read and it is decisive for the outcome, but the value of its argument val is completely irrelevant. The dependency on the None constructor, however, would be $\bot$: the predicate can exit with label true if and only if the input matches against the Some constructor. By distinguishing between these two cases, we can not only identify the input's subelements that have a direct impact on an operation's output, but also obtain a more detailed footprint that highlights the influence exerted by the input's "shape" on the operation's outcome.

For instance, for the i-th element of the threads array of our previous example, a dependency mapping the constructor None to $\bot$ would be a more precise approximation when considering the label true. Taking into account all the discussed values, we can express the dependency depicted in Figure 5.5 for the label true as follows:

$$\{\mathit{threads} \mapsto \langle \square \triangleright i : [\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot] \rangle;\ \mathit{pid} \mapsto \square;\ \mathit{crt\_thread} \mapsto \square;\ \mathit{adr\_space} \mapsto \square\}$$

We remark that $\top$, $\square$ and $\bot$ can apply to any type. For instance, $\top$ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to $\top$, are transformed to $\top$. The $\bot$ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to $\bot$. A whole structure or array is impossible if any of its subelements is impossible.
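To make the shape of the domain more concrete, the following Python sketch encodes dependencies as strings (the atomic cases Everything, Nothing and Impossible) and tagged tuples (structures, variants and arrays), and applies the collapsing rules just described. The encoding and the helper names (`normalize`, `struct`, `variant`, `array`) are assumptions made for illustration only; they are not taken from the analysis implementation.

```python
# Atomic dependencies; the string encoding is a hypothetical choice.
TOP, NOTHING, BOT = "Everything", "Nothing", "Impossible"


def subdeps(d):
    """Return the immediate sub-dependencies of a non-atomic dependency."""
    kind = d[0]
    if kind in ("struct", "variant"):
        return list(d[1].values())
    if kind == "arr":                 # ("arr", default)
        return [d[1]]
    return [d[1], d[3]]               # ("arri", default, index, exceptional)


def normalize(d):
    """Collapse a dependency to an atomic value where the text allows it."""
    if isinstance(d, str):
        return d
    subs = subdeps(d)
    if subs and all(s == TOP for s in subs):
        return TOP                    # every subelement is entirely needed
    if d[0] == "variant":
        if subs and all(s == BOT for s in subs):
            return BOT                # no constructor is possible
    elif any(s == BOT for s in subs):
        return BOT                    # structure/array with an impossible part
    return d


def struct(fields):
    return normalize(("struct", dict(fields)))


def variant(ctors):
    return normalize(("variant", dict(ctors)))


def array(default, index=None, exc=None):
    if index is None:
        return normalize(("arr", default))
    return normalize(("arri", default, index, exc))


# Dependency of process p for predicate thread on exit label true.
thread_true = struct({
    "threads": array(NOTHING, "i", variant({"Some": TOP, "None": BOT})),
    "pid": NOTHING,
    "crt_thread": NOTHING,
    "adr_space": NOTHING,
})
```

In this encoding, `struct({"a": TOP, "b": TOP})` collapses to Everything and `variant({"Some": BOT, "None": BOT})` collapses to Impossible, while `thread_true` keeps its structure because its subelements differ.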

The $\bot$ atomic value is the lower bound of our domain and hence the most precise value. The final abstract dependency is a closure of all these, combined recursively. To give an intuition of the shape of our dependency lattice, Figure 5.9 illustrates the Hasse diagram of the order relation between pairs of atomic dependency values. Intuitively, if the two analyses were performed separately, the upper "diamond" shape would correspond to the dependency analysis and the lower one to the possible-constructors analysis. The element $\square$ would be the lower bound for the dependency domain and the upper bound for the possible-constructors domain. By performing them simultaneously, $\bot$ becomes the domain's lower bound.

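[Figure 5.9 – Order Relation on Pairs of Atomic Dependencies. Hasse diagram over the pairs $(\top, \top)$, $(\top, \square)$, $(\square, \top)$, $(\square, \square)$, $(\square, \bot)$, $(\bot, \square)$, $(\bot, \bot)$, $(\top, \bot)$ and $(\bot, \top)$.]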

The partial order relation is denoted by $\sqsubseteq$ and defined as shown below.

Definition 5.2.2 (Partial Order $\sqsubseteq$).

$$\sqsubseteq\ \subseteq\ \mathcal{D} \times \mathcal{D}$$


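Table 5.1 – $\sqsubseteq$ – Comparison of Two Domains

$$\delta \sqsubseteq \top \quad (\text{Top}) \qquad\qquad \bot \sqsubseteq \delta \quad (\text{Bot})$$

$$\frac{\delta_1 \sqsubseteq \delta'_1 \quad \cdots \quad \delta_n \sqsubseteq \delta'_n}{\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \sqsubseteq \{f_1 \mapsto \delta'_1; \ldots; f_n \mapsto \delta'_n\}} \ (\text{Str}) \qquad \frac{\square \sqsubseteq \delta_1 \quad \cdots \quad \square \sqsubseteq \delta_n}{\square \sqsubseteq \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}} \ (\square\text{Str})$$

$$\frac{\delta_1 \sqsubseteq \delta'_1 \quad \cdots \quad \delta_n \sqsubseteq \delta'_n}{[C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n] \sqsubseteq [C_1 \mapsto \delta'_1; \ldots; C_n \mapsto \delta'_n]} \ (\text{Var}) \qquad \frac{\square \sqsubseteq \delta_1 \quad \cdots \quad \square \sqsubseteq \delta_n}{\square \sqsubseteq [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n]} \ (\square\text{Var})$$

$$\frac{\delta_{def} \sqsubseteq \delta'_{def}}{\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \rangle} \ (\text{ADef}) \qquad \frac{\square \sqsubseteq \delta_{def}}{\square \sqsubseteq \langle \delta_{def} \rangle} \ (\square\text{ADef})$$

$$\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{def}}{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \rangle} \ (\text{AIA}) \qquad \frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{def} \sqsubseteq \delta'_{exc}}{\langle \delta_{def} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle} \ (\text{AAI})$$

$$\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc}}{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle} \ (\text{AI}) \qquad \frac{\square \sqsubseteq \delta_{def} \quad \square \sqsubseteq \delta_{exc}}{\square \sqsubseteq \langle \delta_{def} \triangleright i : \delta_{exc} \rangle} \ (\square\text{AI})$$

$$\frac{\delta_{def} \sqsubseteq \delta'_{def} \quad \delta_{exc} \sqsubseteq \delta'_{exc} \quad \delta_{def} \sqsubseteq \delta'_{exc} \quad \delta_{exc} \sqsubseteq \delta'_{def} \quad i \neq j}{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \sqsubseteq \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle} \ (\text{AIJ})$$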

It is used to compare dependencies and is detailed in Table 5.1. We write $\delta_1 \sqsubseteq \delta_2$, read as "the dependency $\delta_1$ is more precise than the dependency $\delta_2$", if $\delta_1$ represents a smaller subset of a structural object and allows at most as many constructors as $\delta_2$. The greatest element is $\top$ (Top) and the least is $\bot$ (Bot). Instances of identical structure and variant types are compared pointwise (Str, Var). For arrays without known exceptional dependencies, we compare the default dependencies applying to all array cells (ADef). If exceptional dependencies are known for the same cell, these are additionally compared (AI). When only one side has an exceptional dependency, the default dependencies are compared, and the exceptional dependency is compared with the default dependency of the other side (AIA, AAI). For arrays with known exceptional dependencies for different cells, we compare each dependency on the left-hand side with each one on the right-hand side (AIJ). The comparison of $\square$ with structures ($\square$Str), variants ($\square$Var) and arrays ($\square$ADef, $\square$AI) is a pointwise comparison between $\square$ and the dependency of each subelement.
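The rules of Table 5.1 can be read as a recursive comparison function. The sketch below is one possible transcription in Python, assuming a hypothetical encoding of dependencies as strings (atomic cases) and tagged tuples: `("struct", {...})`, `("variant", {...})`, `("arr", default)` and `("arri", default, index, exceptional)`. It also assumes that dependencies are already normalized and that Nothing compares below itself.

```python
TOP, NOTHING, BOT = "Everything", "Nothing", "Impossible"
STRUCTURED = ("struct", "variant")


def subdeps(d):
    """Immediate sub-dependencies of a non-atomic dependency."""
    if d[0] in STRUCTURED:
        return list(d[1].values())
    return [d[1]] if d[0] == "arr" else [d[1], d[3]]


def leq(a, b):
    """Return True when a is at least as precise as b."""
    if b == TOP or a == BOT:                       # rules Top and Bot
        return True
    if a == TOP or b == BOT:
        return False
    if a == NOTHING:                               # Nothing against each subelement
        return b == NOTHING or all(leq(NOTHING, s) for s in subdeps(b))
    if b == NOTHING:
        return False                               # no rule covers this case
    if a[0] in STRUCTURED or b[0] in STRUCTURED:   # rules Str and Var
        return (a[0] == b[0] and a[1].keys() == b[1].keys()
                and all(leq(a[1][k], b[1][k]) for k in a[1]))
    if a[0] == "arr" and b[0] == "arr":            # rule ADef
        return leq(a[1], b[1])
    if b[0] == "arr":                              # rule AIA
        return leq(a[1], b[1]) and leq(a[3], b[1])
    if a[0] == "arr":                              # rule AAI
        return leq(a[1], b[1]) and leq(a[1], b[3])
    if a[2] == b[2]:                               # rule AI (same cell)
        return leq(a[1], b[1]) and leq(a[3], b[3])
    # rule AIJ (different cells): compare every left part with every right part
    return all(leq(x, y) for x in (a[1], a[3]) for y in (b[1], b[3]))
```

For example, a variant dependency mapping None to Impossible is more precise than one mapping None to Nothing, but not the other way around, and Nothing is strictly below an array dependency whose default is Nothing.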

5.2.1 Join and Reduction Operator

The join operation is denoted by $\vee$ and is defined as shown below.

Definition 5.2.3 (Join Operation $\vee$).

$$\vee : \mathcal{D} \times \mathcal{D} \to \mathcal{D}$$


It is detailed in Table 5.2. Intuitively, the join of two dependencies is the union of the dependencies represented by the two. It is a commutative operation, for which the cases not displayed in Table 5.2 are defined by their symmetrical counterparts. The operation is total: joining incompatible dependencies, such as a structure and a variant, or two structures having different field identifiers, results in $\top$, the least precise value. Join is applied pointwise on each subelement; $\bot$ is its identity element and $\top$ is its absorbing element. Joining $\square$ and the dependency of a structure, variant or array is applied pointwise. The value obtained by joining $\delta$ and $\delta'$ is an upper bound of the two:

$$\delta \sqsubseteq \delta \vee \delta' \quad \text{and} \quad \delta' \sqsubseteq \delta \vee \delta' \qquad \forall \delta, \delta' \in \mathcal{D}$$

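Table 5.2 – $\vee$ – Join Operation

$\top \vee \delta = \top$
$\bot \vee \delta = \delta$
$\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \vee \{f_1 \mapsto \delta'_1; \ldots; f_n \mapsto \delta'_n\} = \{f_1 \mapsto \delta_1 \vee \delta'_1; \ldots; f_n \mapsto \delta_n \vee \delta'_n\}$
$\square \vee \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} = \{f_1 \mapsto \square \vee \delta_1; \ldots; f_n \mapsto \square \vee \delta_n\}$
$[C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n] \vee [C_1 \mapsto \delta'_1; \ldots; C_n \mapsto \delta'_n] = [C_1 \mapsto \delta_1 \vee \delta'_1; \ldots; C_n \mapsto \delta_n \vee \delta'_n]$
$\square \vee [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n] = [C_1 \mapsto \square \vee \delta_1; \ldots; C_n \mapsto \square \vee \delta_n]$
$\langle \delta_{def} \rangle \vee \langle \delta'_{def} \rangle = \langle \delta_{def} \vee \delta'_{def} \rangle$
$\square \vee \langle \delta_{def} \rangle = \langle \square \vee \delta_{def} \rangle$
$\langle \delta_{def} \rangle \vee \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle = \langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{def} \vee \delta'_{exc} \rangle$
$\square \vee \langle \delta_{def} \triangleright i : \delta_{exc} \rangle = \langle \square \vee \delta_{def} \triangleright i : \square \vee \delta_{exc} \rangle$
$\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \vee \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle = \langle \delta_{def} \vee \delta'_{def} \triangleright i : \delta_{exc} \vee \delta'_{exc} \rangle$ if $i = j$
$\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \vee \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle = \langle \delta_{def} \vee \delta_{exc} \vee \delta'_{def} \vee \delta'_{exc} \rangle$ if $i \neq j$
$\square \vee \square = \square$

Defining the join of two dependencies corresponding to arrays is subtle. As shown in Table 5.1, we allow comparisons between dependencies corresponding to arrays with exceptions on different variables (rule AIJ); the join operation in this case amounts to joining the four different dependencies without keeping either of the two exceptions. We could have chosen to keep one of the known exceptional dependencies, but this would have posed two problems: on the one hand, the join operation would not be commutative; on the other hand, it is hard to predict how the exceptional dependencies would be used at the intraprocedural level and which of the two could potentially lead to a gain in precision. We therefore adopted this design decision. A strategy possibly worth investigating in such cases would be to allow users to specify array cells of interest at specific program points. This user-supplied information could then be taken into consideration whenever joining array dependencies with two different known exceptional dependencies. Our current join approach for arrays can lead to non-monotonic approximations. This becomes visible when noting that, for a monotonic join operation, the following should hold:

$$\forall \delta, \delta', \rho. \quad \delta \sqsubseteq \delta' \implies \delta \vee \rho \sqsubseteq \delta' \vee \rho \qquad \text{(i)}$$

Considering

$$\rho \equiv \langle \rho_{def} \triangleright i : \rho_i \rangle, \qquad \delta \equiv \langle \delta_{def} \triangleright j : \delta_j \rangle, \qquad \delta' \equiv \langle \delta'_{def} \triangleright i : \delta'_i \rangle, \qquad \text{where } i \neq j,$$

the hypothesis $\delta \sqsubseteq \delta'$ translates into the following constraints:

$$\delta_{def} \sqsubseteq \delta'_{def}, \qquad \delta_{def} \sqsubseteq \delta'_i, \qquad \delta_j \sqsubseteq \delta'_{def}, \qquad \delta_j \sqsubseteq \delta'_i$$

Applying (i) to these three dependencies, we obtain:

$$\langle (\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \rangle \sqsubseteq \langle \delta'_{def} \vee \rho_{def} \triangleright i : \delta'_i \vee \rho_i \rangle$$

which holds if and only if both of the following inequalities hold:

$$(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \sqsubseteq \delta'_{def} \vee \rho_{def}$$
$$(\delta_{def} \vee \delta_j) \vee (\rho_{def} \vee \rho_i) \sqsubseteq \delta'_i \vee \rho_i$$

Considering, for instance,

$$\rho_i = \top, \qquad \rho_{def} \neq \top, \qquad \delta_{def} = \delta_j = \delta'_{def} = \bot$$

a counterexample is found: the left-hand side of the first inequality is $\top$, whereas its right-hand side is $\rho_{def} \neq \top$.

As a consequence of the non-monotonic approximations made for arrays (rule AIJ), the value obtained by joining two dependencies is an upper bound, not a least upper bound. We address this issue and indicate our solution in Section 5.3. We remark that we keep only one exceptional cell for array dependencies, as in practice most operations manipulating arrays tend to modify either only one element or all of them. Logical properties on arrays generally have to hold for all elements. Keeping more than one exceptional dependency would be much more costly, and the additional cost would not necessarily be justified in practice. However, the join operation would be more straightforward and would not impose non-monotonic approximations.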
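The loss of monotonicity can be reproduced with a small Python sketch of the join operation. The tuple encoding of dependencies and the helper `collapse_top` are assumptions made for illustration. The instance below is a variant of the counterexample above that uses Nothing in place of Impossible, which is also an assumption of this sketch, chosen so that no array contains an impossible cell.

```python
TOP, NOTHING, BOT = "Everything", "Nothing", "Impossible"
STRUCTURED = ("struct", "variant")


def collapse_top(d):
    """Turn a dependency whose parts are all Everything into Everything."""
    if isinstance(d, str):
        return d
    if d[0] in STRUCTURED:
        subs = list(d[1].values())
    else:
        subs = [d[1]] if d[0] == "arr" else [d[1], d[3]]
    return TOP if subs and all(s == TOP for s in subs) else d


def join(a, b):
    """Upper bound of two dependencies, following the cases of Table 5.2."""
    if a == TOP or b == TOP:
        return TOP
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == NOTHING and b == NOTHING:
        return NOTHING
    if a == NOTHING or b == NOTHING:               # Nothing is joined pointwise
        d = b if a == NOTHING else a
        if d[0] in STRUCTURED:
            out = (d[0], {k: join(NOTHING, v) for k, v in d[1].items()})
        elif d[0] == "arr":
            out = ("arr", join(NOTHING, d[1]))
        else:
            out = ("arri", join(NOTHING, d[1]), d[2], join(NOTHING, d[3]))
        return collapse_top(out)
    if a[0] in STRUCTURED or b[0] in STRUCTURED:
        if a[0] != b[0] or a[1].keys() != b[1].keys():
            return TOP                             # incompatible shapes
        return collapse_top((a[0], {k: join(a[1][k], b[1][k]) for k in a[1]}))
    if a[0] == "arr" and b[0] == "arr":
        out = ("arr", join(a[1], b[1]))
    elif a[0] == "arr":
        out = ("arri", join(a[1], b[1]), b[2], join(a[1], b[3]))
    elif b[0] == "arr":
        out = ("arri", join(a[1], b[1]), a[2], join(a[3], b[1]))
    elif a[2] == b[2]:
        out = ("arri", join(a[1], b[1]), a[2], join(a[3], b[3]))
    else:                                          # different exceptional cells
        out = ("arr", join(join(a[1], a[3]), join(b[1], b[3])))
    return collapse_top(out)


delta = ("arri", NOTHING, "j", NOTHING)        # exception on cell j
delta_prime = ("arri", NOTHING, "i", NOTHING)  # exception on cell i
rho = ("arri", NOTHING, "i", TOP)

lhs = join(delta, rho)          # both exceptions are merged into the default
rhs = join(delta_prime, rho)    # the exception on cell i is kept
```

Here `delta` is below `delta_prime` by rule AIJ, since all four comparisons relate Nothing to Nothing. Nevertheless `lhs` is Everything, while `rhs` still records that only cell i is entirely needed. Since only Everything is above Everything, `lhs` is not below `rhs`.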

Besides join, a reduction operator, denoted by $\oplus$, has been defined as well.

Definition 5.2.4 (Reduction Operator $\oplus$).

$$\oplus : \mathcal{D} \times \mathcal{D} \to \mathcal{D}$$

This is a recursive, commutative, pointwise operation. Intuitively, this operator is introduced to take advantage of the information additionally computed by the possible-constructors analysis that we perform simultaneously. Following the same execution path, the same constructors must be possible. The reduction operator is used in order to incorporate this additional information computed for constructors. The dependency analysis can be seen as a may analysis, i.e., when combining the dependency information computed at two different points on the same execution path, the result must account for all dependencies computed at either of the two combined points. In contrast, the possible-constructors analysis can be seen as a must analysis, i.e., when combining information at two different points on the same execution path, it needs to keep facts that hold at both combined points. Thus, the reduction operator combines dependencies on the same execution path, and consists in performing the intersection of constructors in the case of variants and the union of dependencies for all other types. The reduction operator's role will become clearer after presenting the intraprocedural dependency analysis and the corresponding data-flow equations in Section 5.3. Its identity element is $\square$ and its absorbing element is $\bot$. The reduction operator between $\top$ and the dependency of a structure, variant or array is applied pointwise. Two instances of identical variant types are pointwise reduced. Similarly to join, the cases not displayed in Table 5.3 are defined with respect to their symmetrical counterparts.

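$\bot \oplus \delta = \bot$
$\square \oplus \delta = \delta$
$\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \oplus \{f_1 \mapsto \delta'_1; \ldots; f_n \mapsto \delta'_n\} = \{f_1 \mapsto \delta_1 \oplus \delta'_1; \ldots; f_n \mapsto \delta_n \oplus \delta'_n\}$
$\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \oplus \top = \{f_1 \mapsto \delta_1 \oplus \top; \ldots; f_n \mapsto \delta_n \oplus \top\}$
$[C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n] \oplus [C_1 \mapsto \delta'_1; \ldots; C_n \mapsto \delta'_n] = [C_1 \mapsto \delta_1 \oplus \delta'_1; \ldots; C_n \mapsto \delta_n \oplus \delta'_n]$
$[C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n] \oplus \top = [C_1 \mapsto \delta_1 \oplus \top; \ldots; C_n \mapsto \delta_n \oplus \top]$
$\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \rangle = \langle \delta_{def} \oplus \delta'_{def} \rangle$
$\langle \delta_{def} \rangle \oplus \langle \delta'_{def} \triangleright i : \delta'_{exc} \rangle = \langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{def} \oplus \delta'_{exc} \rangle$
$\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle = \langle \delta_{def} \oplus \delta'_{def} \triangleright i : \delta_{exc} \oplus \delta'_{exc} \rangle$ where $i = j$
$\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \langle \delta'_{def} \triangleright j : \delta'_{exc} \rangle = \langle (\delta_{def} \vee \delta_{exc}) \oplus (\delta'_{def} \vee \delta'_{exc}) \rangle$ otherwise
$\langle \delta_{def} \rangle \oplus \top = \langle \delta_{def} \oplus \top \rangle$
$\langle \delta_{def} \triangleright i : \delta_{exc} \rangle \oplus \top = \langle \delta_{def} \oplus \top \triangleright i : \delta_{exc} \oplus \top \rangle$
$\top \oplus \top = \top$

Table 5.3 – $\oplus$ – Reduction Operator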
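The difference between reduction and join is easiest to see on variants. The Python sketch below covers only atomic, structure and variant dependencies (arrays are left out), using a hypothetical tuple encoding. The function names and the choice to raise an error on incompatible shapes are assumptions of this sketch, since that case is not displayed in the table.

```python
TOP, NOTHING, BOT = "Everything", "Nothing", "Impossible"


def collapse(d):
    """Collapse to an atomic value where the domain description allows it."""
    if isinstance(d, str):
        return d
    subs = list(d[1].values())
    if subs and all(s == TOP for s in subs):
        return TOP
    if d[0] == "variant" and subs and all(s == BOT for s in subs):
        return BOT                    # no constructor left: impossible value
    if d[0] == "struct" and BOT in subs:
        return BOT                    # an impossible field: impossible structure
    return d


def reduction(a, b):
    """Reduction of two dependencies (atoms, structures and variants only)."""
    if a == BOT or b == BOT:
        return BOT                    # Impossible is absorbing
    if a == NOTHING:
        return b                      # Nothing is the identity
    if b == NOTHING:
        return a
    if a == TOP and b == TOP:
        return TOP
    if a == TOP or b == TOP:          # Everything is reduced pointwise
        d = b if a == TOP else a
        return collapse((d[0], {k: reduction(v, TOP) for k, v in d[1].items()}))
    if a[0] != b[0] or a[1].keys() != b[1].keys():
        raise ValueError("incompatible dependencies")   # not covered here
    return collapse((a[0], {k: reduction(a[1][k], b[1][k]) for k in a[1]}))


# Information from a successor node: Some is entirely read, None only by tag.
successor = ("variant", {"Some": TOP, "None": NOTHING})
# Local contribution of a switch edge on which None cannot occur.
local = ("variant", {"Some": NOTHING, "None": BOT})

combined = reduction(successor, local)
```

In `combined`, the dependency on Some is the union of both sides (Everything), while None becomes Impossible because it is ruled out on one side. A join of the same two values would keep None as Nothing and thus lose the information that the constructor cannot occur.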

Finally, the extractions summarized in Table 5.4 have been defined for dependencies $\delta$ and are used to express the data-flow equations of Section 5.3.

Definition 5.2.5 (Extraction of a field's dependency). $.f : \mathcal{D} \rightharpoonup \mathcal{D}$

Definition 5.2.6 (Extraction of a constructor's dependency). $@C : \mathcal{D} \rightharpoonup \mathcal{D}$

Definition 5.2.7 (Extraction of an array's cell dependency). $\langle i \rangle : \mathcal{D} \rightharpoonup \mathcal{D}$

Definition 5.2.8 (Extraction of an array's dependency outside a cell i). $\langle * \setminus i \rangle : \mathcal{D} \rightharpoonup \mathcal{D}$

Definition 5.2.9 (Extraction of an array's general dependency). $\langle * \rangle : \mathcal{D} \rightharpoonup \mathcal{D}$

They are partial functions and can only be applied to dependencies of the corresponding kind. For instance, the field extraction $.f$ only makes sense for atomic values or for structured values with a field named f, which should be the case if the dependency represents a variable of a structured type with some field f. Applying any of the defined extractions to an atomic dependency $\delta_a$ yields $\delta_a$.

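Table 5.4 – Dependency Extractions

Field extraction $\delta.f$, with $f \in F$:

$\top.f = \top$
$\square.f = \square$
$\bot.f = \bot$
$\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}.f = \delta_i$ if $f = f_i$

Constructor extraction $\delta@C$, with $C \in \mathcal{C}$:

$\top@C = \top$
$\square@C = \square$
$\bot@C = \bot$
$[C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m]@C = \delta_j$ if $C = C_j$

Array extractions $\delta\langle * \setminus i \rangle$, $\delta\langle i \rangle$ and $\delta\langle * \rangle$:

$\top\langle * \setminus i \rangle = \top$, $\top\langle i \rangle = \top$, $\top\langle * \rangle = \top$
$\square\langle * \setminus i \rangle = \square$, $\square\langle i \rangle = \square$, $\square\langle * \rangle = \square$
$\bot\langle * \setminus i \rangle = \bot$, $\bot\langle i \rangle = \bot$, $\bot\langle * \rangle = \bot$
$\langle \delta_{def} \rangle\langle * \setminus i \rangle = \delta_{def}$, $\langle \delta_{def} \rangle\langle i \rangle = \delta_{def}$, $\langle \delta_{def} \rangle\langle * \rangle = \delta_{def}$
$\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle * \setminus i \rangle = \delta_{def}$ when $i = k$, and $\delta_{def} \vee \delta_{exc}$ otherwise
$\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle i \rangle = \delta_{exc}$ when $i = k$, and $\delta_{def} \vee \delta_{exc}$ otherwise
$\langle \delta_{def} \triangleright k : \delta_{exc} \rangle\langle * \rangle = \delta_{def} \vee \delta_{exc}$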
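As a quick illustration of these extractions, the following Python sketch implements them over a hypothetical tuple encoding of dependencies. To stay short, it assumes that the default and exceptional dependencies of an array are atomic, so that their join reduces to taking the larger element of the chain Impossible, Nothing, Everything. Undefined applications of the partial functions are modelled by raising `ValueError`, which is a choice of this sketch.

```python
TOP, NOTHING, BOT = "Everything", "Nothing", "Impossible"
RANK = {BOT: 0, NOTHING: 1, TOP: 2}   # atomic values form a chain


def join_atomic(a, b):
    """Join restricted to atomic dependencies (sufficient for this sketch)."""
    return a if RANK[a] >= RANK[b] else b


def field(d, f):
    """Extraction of the dependency of field f."""
    if isinstance(d, str):
        return d                       # atomic values are returned unchanged
    if d[0] != "struct" or f not in d[1]:
        raise ValueError("field extraction is undefined for this dependency")
    return d[1][f]


def constructor(d, c):
    """Extraction of the dependency of constructor c."""
    if isinstance(d, str):
        return d
    if d[0] != "variant" or c not in d[1]:
        raise ValueError("constructor extraction is undefined for this dependency")
    return d[1][c]


def _array_parts(d):
    if d[0] not in ("arr", "arri"):
        raise ValueError("array extraction is undefined for this dependency")
    return d


def cell(d, i):
    """Dependency of cell i."""
    if isinstance(d, str):
        return d
    d = _array_parts(d)
    if d[0] == "arr":
        return d[1]
    return d[3] if d[2] == i else join_atomic(d[1], d[3])


def outside(d, i):
    """Dependency of all cells other than i."""
    if isinstance(d, str):
        return d
    d = _array_parts(d)
    if d[0] == "arr":
        return d[1]
    return d[1] if d[2] == i else join_atomic(d[1], d[3])


def general(d):
    """Dependency valid for any cell of the array."""
    if isinstance(d, str):
        return d
    d = _array_parts(d)
    return d[1] if d[0] == "arr" else join_atomic(d[1], d[3])


threads = ("arri", NOTHING, "i", TOP)   # only cell i is entirely needed
```

With `threads` as above, `cell(threads, "i")` is Everything and `outside(threads, "i")` is Nothing, whereas asking about a different index, as in `cell(threads, "j")`, has to fall back on the join of both dependencies. Index equality is purely syntactic here.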

5.2.2 Well-Typed Dependencies

The described syntactic dependencies are untyped. However, their interpretation is made in the context of a type $\tau$. Dependencies such as $\square$ or $\top$ do not exhibit any data type features and can apply to any type, but others will be completely constrained, and most will fall in between, uncovering a few layers of structured types before reaching one of the "generic" leaves $\top$, $\square$ or $\bot$. For example, the dependency $\{f \mapsto \delta_f\}$ only really makes sense for structured types with a single field f whose type is itself compatible with $\delta_f$, and shall not be used in connection with variant or array types.

As a consequence, we conclude the presentation of our abstract dependency type by explaining what it means for a dependency to be compatible with some type $\tau$, i.e., to be well-typed of some type $\tau$. This is described as a judgement parameterized by the typing environment $\Gamma$ (Definition 4.3.1), and the different inference rules are detailed in Table 5.5.

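$$\Gamma \vdash \top : \tau \quad (\text{WT}\top) \qquad \Gamma \vdash \bot : \tau \quad (\text{WT}\bot) \qquad \Gamma \vdash \square : \tau \quad (\text{WT}\square)$$

$$\frac{\tau = struct\{f_1 : \tau_1; \ldots; f_n : \tau_n\} \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \cdots \quad \Gamma \vdash \delta_n : \tau_n}{\Gamma \vdash \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} : \tau} \ (\text{WTStruct})$$

$$\frac{\tau = variant[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n] \qquad \Gamma \vdash \delta_1 : \tau_1 \quad \cdots \quad \Gamma \vdash \delta_n : \tau_n}{\Gamma \vdash [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n] : \tau} \ (\text{WTVar})$$

$$\frac{\Gamma \vdash \delta_{def} : \tau}{\Gamma \vdash \langle \delta_{def} \rangle : arr_{\tau_i}\langle \tau \rangle} \ (\text{WTArr}) \qquad \frac{\Gamma \vdash \delta_{def} : \tau \qquad \Gamma \vdash \delta_{exc} : \tau \qquad \Gamma(i) = \tau_i}{\Gamma \vdash \langle \delta_{def} \triangleright i : \delta_{exc} \rangle : arr_{\tau_i}\langle \tau \rangle} \ (\text{WTArrI})$$

Table 5.5 – Well-Typed Dependencies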

The atomic dependency values are generic: they are well-typed with respect to any type (WT$\top$, WT$\square$, WT$\bot$). The dependency $\delta$ for a structure (WTStruct) is well-typed only with respect to an adequate structured type, whose field types are themselves compatible with the dependencies mapped to them in $\delta$. Similarly, the dependency $\delta$ for a variant (WTVar) is well-typed only with respect to an adequate variant type. In turn, its constructors must themselves be compatible with the dependencies mapped to them in $\delta$. For well-typed array dependencies (WTArr, WTArrI), the default dependency, as well as the exceptional dependency, have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of i, the index of the known exceptional dependency, has to be compatible with $\tau_i$, the array's index type.

In the following section, we discuss our intraprocedural dependency domain and the manner in which dependencies are computed and manipulated.

5.3 Intraprocedural Analysis and Data-Flow Equations

5.3.1 Intraprocedural Dependency Domains

At an intraprocedural level, dependency information has to be kept at each point of the control flow graph, for each variable of the typing environment $\Gamma$, which maps input, output and local variables to their types. We use the term domain to denote this information.

Definition 5.3.1 (Intraprocedural Dependency Domain, $\Delta \in \mathbb{D}$). An intraprocedural domain $\Delta \in \mathbb{D}$,

$$\Delta : V \to \mathcal{D}$$

is a mapping from variables to dependencies.

An intraprocedural domain is associated with every node of the control flow graph, representing the dependencies at the node's entry point. A special case is the mapping which binds all variables to $\bot$, which we call Unreachable:

$$\mathit{Unreachable} \equiv \{x \mapsto \bot\}$$

In particular, it is associated with nodes that cannot be reached during the analysis. Also, if any of the variables of $\Delta$ is marked as $\bot$, the entire node collapses, becoming Unreachable.

For any node of the control flow graph associated with an intraprocedural domain $\Delta$, $\Delta(x)$ retrieves the dependency associated with the variable x. If a dependency for x has not been computed yet, it is mapped to $\square$.

Forgetting a variable x from a reachable intraprocedural domain, denoted by $\Delta \setminus x$, "erases" the variable's dependency information by mapping it to $\square$.

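Definition 5.3.2 (Forget x).

$$\Delta \setminus x = \begin{cases} \mathit{Unreachable} & \text{when } \Delta = \mathit{Unreachable} \\ \Delta' = \left\{ y \mapsto \begin{cases} \Delta(y) & \text{when } y \neq x \\ \square & \text{when } y = x \end{cases} \right\} & \text{otherwise} \end{cases}$$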

The $\sqsubseteq_\Delta$, $\vee_\Delta$ and $\oplus_\Delta$ operations are pointwise extensions of $\sqsubseteq$ (Definition 5.2.2), $\vee$ (Definition 5.2.3) and $\oplus$ (Definition 5.2.4), respectively; they apply to intraprocedural dependency domains, for each variable and its associated dependency $\delta_v$.

We define a partial order $\sqsubseteq_\Delta$ on $\mathbb{D}$.

Definition 5.3.3 (Intraprocedural Partial Order $\sqsubseteq_\Delta$).

$$\sqsubseteq_\Delta\ \subseteq\ \mathbb{D} \times \mathbb{D} \qquad \Delta' \sqsubseteq_\Delta \Delta'' \iff \Delta'(x) \sqsubseteq \Delta''(x), \ \forall x \in V$$

In particular, Unreachable is the bottom of this intraprocedural lattice. It is the identity element of the intraprocedural join operation $\vee_\Delta$ and the absorbing element of the intraprocedural reduction operator $\oplus_\Delta$, defined below.

Definition 5.3.4 (Intraprocedural Join Operation $\vee_\Delta$).

$$\vee_\Delta : \mathbb{D} \times \mathbb{D} \to \mathbb{D} \qquad \Delta' \vee_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \vee \Delta''(x), \ \forall x \in V$$


Definition 5.3.5 (Intraprocedural Reduction Operator $\oplus_\Delta$).

$$\oplus_\Delta : \mathbb{D} \times \mathbb{D} \to \mathbb{D} \qquad \Delta' \oplus_\Delta \Delta'' = \Delta \iff \Delta(x) = \Delta'(x) \oplus \Delta''(x), \ \forall x \in \Gamma$$

Finally, an intraprocedural domain $\Delta$ is well-typed with respect to a typing environment $\Gamma$ if and only if the dependency mapped to any variable x is well-typed with respect to x's type in the typing environment $\Gamma$ (Definition 4.3.1).
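A minimal Python sketch of an intraprocedural domain is given below, assuming a dictionary from variable names to dependencies and a sentinel value for Unreachable. The names `make_domain`, `lookup` and `forget` are hypothetical. Variables that are absent from the dictionary are read as Nothing, so forgetting a variable amounts to removing its entry.

```python
TOP, NOTHING, BOT = "Everything", "Nothing", "Impossible"
UNREACHABLE = "Unreachable"   # stands for the domain mapping every variable to Impossible


def make_domain(mapping):
    """Build a domain; it collapses to Unreachable if any variable is Impossible."""
    if any(d == BOT for d in mapping.values()):
        return UNREACHABLE
    return dict(mapping)


def lookup(domain, x):
    """Dependency of variable x in the given domain."""
    if domain == UNREACHABLE:
        return BOT
    return domain.get(x, NOTHING)     # not computed yet: Nothing


def forget(domain, x):
    """Erase the dependency information of x (it becomes Nothing)."""
    if domain == UNREACHABLE:
        return UNREACHABLE
    return {y: d for y, d in domain.items() if y != x}


exit_true = make_domain({"o": TOP})   # output o entirely needed at the exit node
```

For instance, `lookup(forget(exit_true, "o"), "o")` yields Nothing, and a domain built with any variable mapped to Impossible is Unreachable as a whole.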

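5.3.2 Intraprocedural Data-Flow Equations

Table 5.6 – Statements – Representations and Data-Flow Equations

Representation: a node n holding a statement s, with outgoing edges labeled $(s, \lambda_1), \ldots, (s, \lambda_i), \ldots, (s, \lambda_k)$ leading to the successor nodes $n_1, \ldots, n_i, \ldots, n_k$, which carry the domains $\Delta_{n_1}, \ldots, \Delta_{n_i}, \ldots, \Delta_{n_k}$.

Equation:

$$\Delta_n = \bigvee^{\Delta}_{n \xrightarrow{s, \lambda_i} n_i} [\![ s ]\!]_{\lambda_i}(\Delta_{n_i})$$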

Our dependency analysis is a backward data-flow analysis. For each exit label, it traverses the control flow graph starting with the corresponding exit node, and it marks all other exit points as Unreachable, since exit labels are mutually exclusive. The intraprocedural domain for the currently analysed label is initialized with its associated output variables mapped to $\top$. Thereby, the analysis starts by making a conservative approximation, considering that all the input has been observed and that the output depends on it entirely. Typically, dependence analyses are forward analyses. However, given our goal to express label-specific dependencies as input-output relations, and taking into consideration the characteristics of the αSmil language, designing our analysis as a backward data-flow analysis seemed a pertinent choice. In αSmil, outputs are associated with a particular exit label and they are generated if and only if the predicate exits with that particular label. By traversing the control flow graph backwards, we can use this information and consider, starting with the initialisation phase, only the outputs that are relevant for the analysed exit label.

After the initialisation, the analysis traverses the control flow graph and gradually refines the dependencies until a fixed point is reached. Table 5.6 summarizes the representation and general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural domains of the statement's successor nodes. The intraprocedural domain at the entry point of the node is obtained by joining the contributions of each outgoing edge, as shown in Figure 5.10.

Definition 5.3.6. The contribution of an edge $(n_i, n_j)$ labeled with s and $\lambda$ is given by $[\![ s ]\!]_{\lambda}(\Delta_{n_j})$, where $[\![ s ]\!]_{\lambda}(\cdot)$ is the transfer function of the edge labeled s, $\lambda$.

Dependencies corresponding to variables that are written by a statement s on an exit label $\lambda$, denoted by $gen_s^{\lambda}$ in Figure 5.10, are forgotten from the intraprocedural domain on which we are operating.

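[Figure 5.10 – Computation of the Intraprocedural Domain at a Node's Entry Point. For a statement s with successor domains $\Delta_{\lambda_1}, \ldots, \Delta_{\lambda_n}$, where $\delta_s^{\lambda_i}$ is the contribution of s on $\lambda_i$, each outgoing edge contributes $(\Delta_{\lambda_i} \setminus gen_s^{\lambda_i}) \oplus_\Delta \delta_s^{\lambda_i}$:

$$\Delta_{in} = [\![ s ]\!]_{\lambda_1}(\Delta_{\lambda_1}) \vee_\Delta \cdots \vee_\Delta [\![ s ]\!]_{\lambda_n}(\Delta_{\lambda_n}) \qquad [\![ s ]\!]_{\lambda_i}(\Delta_i) = (\Delta_i \setminus gen_s^{\lambda_i}) \oplus_\Delta \delta_s^{\lambda_i}$$]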

In Section 5.2.1, we explained that, as a consequence of the non-monotonic approximations made when joining dependencies corresponding to arrays, the result of the join operation is an upper bound, not a least upper bound. In order to deal with this issue, we adopt the generic solution consisting of systematically joining the dependency domain associated with a node before its iteration with the new dependency domain computed by the transfer function. Thus, the dependency domain of a node n is:

$$\Delta_n = old(\Delta_n) \vee_\Delta \Big( \bigvee^{\Delta}_{n \to n'} [\![ s ]\!]_{\lambda}(\Delta_{n'}) \Big)$$

This is not prohibitive in terms of performance, leading to an increase of the execution time of 5 to 10%.

Tables 5.7, 5.8, 5.9 and 5.10 define the transfer functions for each built-in statement of our language, whereas the general case of a predicate call and its corresponding equation will be detailed in Section 5.4.

Table 5.7 presents the transfer functions for statements which are not type-specific. For equality tests (1), both of the inputs, $e_1$ and $e_2$, are completely read, whether the test returns true or false. The transfer functions therefore reduce the domain of the corresponding successor node with a domain consisting of $e_1$ and $e_2$, both mapped to $\top$. In the case of assignment (2), the dependency of the written output variable o is forgotten from the successor's intraprocedural domain, thus being mapped to $\square$, and forwarded to the input variable e. The transfer function for the nop operation (3) is simply the identity.


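Statement and transfer function $[\![ s ]\!]_{\lambda_i}(\Delta)$:

Equality test (1):
$[\![ e_1 = e_2 ]\!]_{true}(\Delta) = \Delta \oplus_\Delta dep$
$[\![ e_1 = e_2 ]\!]_{false}(\Delta) = \Delta \oplus_\Delta dep$
where $dep = \{e_1 \mapsto \top;\ e_2 \mapsto \top\}$

Assignment (2):
$[\![ o := e ]\!]_{true}(\Delta) = (\Delta \setminus o) \oplus_\Delta \{e \mapsto \Delta(o)\}$

No Operation (3):
$[\![ nop ]\!]_{true}(\Delta) = \Delta$

Table 5.7 – Generic Statements – Data-Flow Equations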
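The three generic transfer functions can be sketched in Python as follows. This is a simplified illustration that assumes all dependencies are atomic, represents a domain as a dictionary in which absent variables stand for Nothing, and does not model the collapse to Unreachable. All function names are hypothetical.

```python
TOP, NOTHING, BOT = "Everything", "Nothing", "Impossible"
RANK = {BOT: 0, NOTHING: 1, TOP: 2}


def reduce_atomic(a, b):
    """Reduction on atomic values: Impossible absorbs, Nothing is neutral."""
    if a == BOT or b == BOT:
        return BOT
    return a if RANK[a] >= RANK[b] else b


def reduce_domains(d1, d2):
    """Pointwise reduction; absent variables are read as Nothing."""
    return {x: reduce_atomic(d1.get(x, NOTHING), d2.get(x, NOTHING))
            for x in set(d1) | set(d2)}


def forget(domain, x):
    """Erase the dependency of x (it becomes Nothing, i.e. absent)."""
    return {y: d for y, d in domain.items() if y != x}


def equality_test(domain, e1, e2):
    """Transfer function (1); identical for the labels true and false."""
    return reduce_domains(domain, {e1: TOP, e2: TOP})


def assignment(domain, o, e):
    """Transfer function (2) for the statement o := e."""
    return reduce_domains(forget(domain, o), {e: domain.get(o, NOTHING)})


def nop(domain):
    """Transfer function (3): the identity."""
    return domain


# Backward step over "o := e" when o is entirely needed afterwards.
before = assignment({"o": TOP}, "o", "e")
```

After the backward step, `before` no longer mentions o and maps e to Everything: the dependency of the written variable has been forwarded to the variable that was read.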

The data-flow equations given in Table 5.8 correspond to structure-related statements. For the equations (4), (5), (6) and (7), we assume that the variable r is of type $struct\{f_1 : \tau_1; \ldots; f_n : \tau_n\}$ for some fields $f_i$, $1 \leq i \leq n$. Equation (4) refers to the creation of a structure: each input $e_i$ is read as much as the corresponding field $f_i$ of the structure is read. The destructuring of a structure is handled in (5): each field $f_i$ is needed as much as the corresponding variable $o_i$ is. When accessing the i-th field of a structure r (6), only the field $f_i$ is read, and only as much as the result o of the access itself. Equation (7) treats field updates: the variable e is read as much as the field $f_i$ is, and the structure r is read as much as all the fields other than $f_i$ are read in r′. Finally, the equations given in (8) handle partial structure equality tests; the transfer functions are the same for the labels true and false: for both compared structures r′ and r″, all the fields in the given set $f_1, \ldots, f_k$ are completely read, and only those.

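Statement and transfer function $[\![ s ]\!]_{\lambda_i}(\Delta)$:

Create (4):
$[\![ r := \{e_1, \ldots, e_n\} ]\!]_{true}(\Delta) = (\Delta \setminus r) \oplus_\Delta \bigoplus_{1 \leq i \leq n} \{e_i \mapsto \Delta(r).f_i\}$

Destructure (5):
$[\![ \{o_1, \ldots, o_n\} := r ]\!]_{true}(\Delta) = (\Delta \setminus \{o_i \mid o_i \in o\}) \oplus_\Delta \{r \mapsto \{f_1 \mapsto \Delta(o_1); \ldots; f_n \mapsto \Delta(o_n)\}\}$

Access field (6):
$[\![ o := r.f_i ]\!]_{true}(\Delta) = (\Delta \setminus o) \oplus_\Delta \{r \mapsto \{f_1 \mapsto \square; \ldots; f_i \mapsto \Delta(o); \ldots; f_n \mapsto \square\}\}$

Update field (7):
$[\![ r' := \{r \text{ with } f_i = e\} ]\!]_{true}(\Delta) = (\Delta \setminus r') \oplus_\Delta \{e \mapsto \Delta(r').f_i;\ r \mapsto \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}\}$
where $\delta_j = \Delta(r').f_j$ if $j \neq i$, and $\delta_j = \square$ otherwise

Equality (8):
$[\![ r' =_{\langle f_1, \ldots, f_k \rangle} r'' ]\!]_{true}(\Delta) = \Delta \oplus_\Delta d$
$[\![ r' =_{\langle f_1, \ldots, f_k \rangle} r'' ]\!]_{false}(\Delta) = \Delta \oplus_\Delta d$
where $d = \{r' \mapsto \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\};\ r'' \mapsto \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\}\}$
and $\delta_i = \top$ if $f_i \in \{f_1, \ldots, f_k\}$, and $\delta_i = \square$ otherwise

Table 5.8 – Structure-Related Statements – Data-Flow Equations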


The data-flow equations given in Table 5.9 correspond to variant-related statements. They follow the same principles as those used for structure-related statements above. Note that the transfer functions for the switch (10) and the possible-constructor test (11) introduce $\bot$ dependencies for constructors which are known to be impossible on the considered edge. In particular, since $\bot$ is an absorbing element for $\oplus$, these transfer functions erase, for every constructor which is known to be locally impossible, all the dependency information possibly attached to such a constructor in the successor nodes. This is the actual raison d'être of the reduction operator, since using $\vee_\Delta$ to combine a successor domain and a local contribution would lose this information.

Finally, the equations for array-related statements are given in Table 5.10. We assume for both that the context is fixed and that I is the distinguished set of input variables for the analysed predicate. This set is used to make sure that exceptions in array dependencies are only registered for variables in I, and not for local or output variables. The reason for such a constraint is pragmatic: input variables are not assignable in our language, and therefore they always represent the same value intraprocedurally. Otherwise, each time a variable is written by a statement, we would need to traverse all the dependencies in the domain to erase or reinterpret the occurrences where this variable appears as an exception. Only recording exceptions for input variables makes this kind of costly traversal unnecessary, and since only exceptions about input variables make sense at the interprocedural level (see Section 5.4), we do not lose much precision by doing so.

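Statement and transfer function $[\![ s ]\!]_{\lambda_i}(\Delta)$:

Create variant (9):
$[\![ v := C_p[e] ]\!]_{true}(\Delta) = (\Delta \setminus v) \oplus_\Delta \{e \mapsto \Delta(v)@C_p\}$

Variant switch (10):
$[\![ switch(v) \text{ as } [o_1 \mid \ldots \mid o_n] ]\!]_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus_\Delta \{v \mapsto dep_i\}$
where $dep_i = [C_1 \mapsto \bot; \ldots; C_i \mapsto \Delta(o_i); \ldots; C_n \mapsto \bot]$

Possible variant (11):
$[\![ v \in \{C_1, \ldots, C_k\} ]\!]_{true}(\Delta) = \Delta \oplus_\Delta \{v \mapsto [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n]\}$
where $\delta_i = \Delta(v)@C_i$ if $C_i \in \{C_1, \ldots, C_k\}$, and $\delta_i = \bot$ otherwise
$[\![ v \in \{C_1, \ldots, C_k\} ]\!]_{false}(\Delta) = \Delta \oplus_\Delta \{v \mapsto [C_1 \mapsto \delta_1; \ldots; C_n \mapsto \delta_n]\}$
where $\delta_i = \Delta(v)@C_i$ if $C_i \notin \{C_1, \ldots, C_k\}$, and $\delta_i = \bot$ otherwise

Table 5.9 – Variant-Related Statements – Data-Flow Equations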


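Statement and transfer function $\llbracket s \rrbracket_{\lambda_i}(\Delta)$:

Array access (12):

\[ \llbracket o = a[i] \rrbracket_{true}(\Delta) = \begin{cases} (\Delta \setminus o) \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle \} & \text{when } i \in I \\ (\Delta \setminus o) \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \Delta(o) \vee \square \rangle \} & \text{when } i \notin I \end{cases} \]

\[ \llbracket o = a[i] \rrbracket_{false}(\Delta) = \Delta \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \square \rangle \} \]

Array update (13):

\[ \llbracket a' = [a\ \mathtt{with}\ i = e] \rrbracket_{true}(\Delta) = \begin{cases} (\Delta \setminus a') \oplus_\Delta \{ i \mapsto \top;\ e \mapsto \Delta(a')\langle i \rangle;\ a \mapsto \langle \Delta(a')\langle * \setminus i \rangle \triangleright i : \square \rangle \} & \text{when } i \in I \\ (\Delta \setminus a') \oplus_\Delta \{ i \mapsto \top;\ e \mapsto \Delta(a')\langle * \rangle;\ a \mapsto \langle \Delta(a')\langle * \rangle \vee \square \rangle \} & \text{when } i \notin I \end{cases} \]

\[ \llbracket a' = [a\ \mathtt{with}\ i = e] \rrbracket_{false}(\Delta) = \Delta \oplus_\Delta \{ i \mapsto \top;\ a \mapsto \langle \square \rangle \} \]

Table 5.10 – Array-Related Statements – Data-Flow Equations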

The transfer functions for (12) and (13) thus take care of making adequate approximations when exceptions cannot be introduced. As for the cases when the array access exits with the false label, note that the contribution to the array a is $\langle \square \rangle$, which is strictly less precise than $\square$. The operation performs implicit bounds checking, and this can thus be seen as accounting for the fact that no cell in a has been read, but the "length" or "support" of a has been read. Hence it would not be correct to claim that the result of the statement does not depend on a at all. Similarly, a variant dependency $[C_1 \mapsto \square;\ \dots;\ C_n \mapsto \square]$, mapping all constructors to nothing, has not read any value in any of the constructors but may still depend on the variant's constructor itself. In contrast, we do not make this distinction for structures, because we assume surjective pairing, i.e. structure values consist only of the fields themselves. Our solution can easily be adapted in order to deal with non-surjective cases.

5.3.3 Intraprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an intraprocedural level, we walk through the mechanism behind it, step by step, on the predicate thread discussed in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis, and compare the results actually obtained with the targeted ones depicted in Figure 5.5.

Since a predicate can only exit with one label at a time and we are considering the true label, we can map the nodes None and oob to Unreachable, as shown in Figure 5.11. This is an advantage of backwards analyses. For true, we make a pessimistic assumption and map the output ti to $\top$, considering that control on the output is external, and hence out of our reach, and that ti will be entirely needed by a potential caller. Going further up the control flow graph, we analyse the variant switch.

[Figure 5.11 – Analysing Predicate thread – Initialisation. Control flow graph with the nodes th = p.threads, tio = th[i] and switch(tio) as [ | ti], and the exit nodes true, None and oob. The exits None and oob are annotated Unreachable; the exit true is annotated $ti \mapsto \top$.]

In order to compute the dependency for the node corresponding to the variant switch, we apply the data-flow equation given by (10) in Table 5.9. Since we are analysing the true case, we know that all other constructors (only the constructor None in this case) are locally impossible. Thus, we map it to $\bot$. We continue by forgetting the dependency information we knew about the output ti. Since its value is needed only inasmuch as the result of the switch on the corresponding edge is needed, we forward it to the part corresponding to the Some constructor. This is summarized below.

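\[ \llbracket \mathtt{switch}(v)\ \mathtt{as}\ [o_1 | \dots | o_n] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \oplus \{ v \mapsto dep_i \} \quad \text{where } dep_i = [C_1 \mapsto \bot;\ \dots;\ C_i \mapsto \Delta(o_i);\ \dots;\ C_n \mapsto \bot] \]

[Figure 5.12 – Applying the Variant Switch Equation. The dependency of the output ti is forwarded to the Some constructor of tio; every other constructor $C_1, \dots, C_n$ is mapped to $\bot$.]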

Taking all this into account, for the node corresponding to the variant switch we obtain the dependency shown in Figure 5.13. For the output ti, we depend entirely on the Some constructor of the node's input variant tio, while the constructor None is impossible.

Making a step further up the graph, we access the cell i of the array th and apply the equation (12) given in Table 5.10. We begin by forgetting the dependency for the output tio, since this is written. Since we only access the element i, we map all other cells to Nothing, i.e. $\square$. To the dependency corresponding to the i-th cell we forward the dependency we knew about tio, since we depend on it to the extent to which the result of the access is needed.

[Figure 5.13 – Analysing Predicate thread – Variant Switch. Same control flow graph as in Figure 5.11; the switch node is annotated $tio \mapsto [Some \mapsto \top;\ None \mapsto \bot]$ and the exit true is annotated $ti \mapsto \top$.]

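\[ \llbracket o = a[i] \rrbracket_{true}(\Delta) = \begin{cases} (\Delta \setminus o) \oplus \{ i \mapsto \top;\ a \mapsto \langle \square \triangleright i : \Delta(o) \rangle \} & \text{when } i \in I \\ (\Delta \setminus o) \oplus \{ i \mapsto \top;\ a \mapsto \langle \Delta(o) \vee \square \rangle \} & \text{when } i \notin I \end{cases} \]

[Figure 5.14 – Applying the Array Access Equation. The dependency of tio is forwarded to the i-th cell of th; the cells $1, \dots, n$ other than i are mapped to $\square$.]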

We thus obtain a dependency stating that we depend only on the i-th cell of the array th, for which only the constructor Some is possible and entirely needed. The cell's index i is entirely needed as well. The applied equation is shown in Figure 5.14 (since i is an input, we use the first case of the equation) and the obtained results are shown in Figure 5.15.
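The array access step can be sketched in the same illustrative style. Everything about the encoding is an assumption made for this example: an array dependency is a tuple `("arr", default)` or `("arr", default, index, exc)`, the helper `add` only covers the combinations needed here rather than the full reduction operator, and the join with nothing is assumed to leave any dependency other than impossible unchanged.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def arr(default, index=None, exc=None):
    """Hypothetical encoding of <default> and <default |> index : exc>."""
    return ("arr", default) if index is None else ("arr", default, index, exc)

def add(delta, var, dep):
    """Combine a local contribution with the domain (atomic cases only)."""
    old = delta.get(var, NOTHING)
    if old == BOT or dep == BOT:
        delta[var] = BOT
    elif old == NOTHING:
        delta[var] = dep
    elif dep == NOTHING:
        delta[var] = old
    elif old == TOP or dep == TOP:
        delta[var] = TOP
    else:
        raise NotImplementedError("structured reduction is out of scope here")

def array_access_true(delta, o, a, i, inputs):
    """Contribution of `o = a[i]` on the `true` edge."""
    needed = delta.get(o, NOTHING)                        # Delta(o)
    result = {x: d for x, d in delta.items() if x != o}   # Delta \ o
    add(result, i, TOP)                                   # the index is fully needed
    if i in inputs:
        # an exception may be recorded: only cell i carries Delta(o)
        add(result, a, arr(NOTHING, i, needed))
    else:
        # no exception allowed: every cell gets Delta(o) joined with nothing
        add(result, a, arr(NOTHING if needed == BOT else needed))
    return result

successor = {"tio": {"Some": TOP, "None": BOT}}
print(array_access_true(successor, "tio", "th", "i", inputs={"p", "i"}))
```

With `i` among the inputs, the sketch yields the exception form for `th`; calling it with an index that is not an input falls back to the uniform form, which is the approximation discussed for Table 5.10.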

As a last step, we access the field threads of the input process p and apply the equation (6) given in Table 5.8 and illustrated in Figure 5.16. As before, we forget the information for th, the access result. We map all other fields to $\square$ and we forward the dependency of the variable th to the dependency part of the field threads.

We thus obtain the dependency result shown in Figure 5.17. This states that, for the label true, the output ti depends only on the i-th cell of the field threads of the input process p, for which it depends entirely on the Some constructor. Before returning the predicate's final results, the analysis filters out any dependency information referring to local variables and verifies that the invariant imposed on dependency information related to arrays holds. Since the results refer only to the inputs p and i, and the index of the exceptional computed dependency is an input, the invariant holds and the final result can be retrieved. The final dependency results obtained for the thread predicate on the exit label true are identical to the ones that we were targeting and that were depicted in Figure 5.5.

For readability, for structures such as the input process p we omit dependencies on fields mapped to $\square$. We maintain this convention throughout the rest of this chapter, and thus any field of a structure that is omitted from a dependency summary should be interpreted as being mapped to $\square$, i.e. nothing.

[Figure 5.15 – Analysing Predicate thread – Array Access. Same control flow graph as in Figure 5.11; the node tio = th[i] is annotated $th \mapsto \langle \square \triangleright i : [Some \mapsto \top;\ None \mapsto \bot] \rangle$ and $i \mapsto \top$, the switch node $tio \mapsto [Some \mapsto \top;\ None \mapsto \bot]$, and the exit true $ti \mapsto \top$.]

\[ \llbracket o = r.f_i \rrbracket_{true}(\Delta) = (\Delta \setminus o) \oplus \{ r \mapsto \{ f_1 \mapsto \square;\ \dots;\ f_i \mapsto \Delta(o);\ \dots;\ f_n \mapsto \square \} \} \]

[Figure 5.16 – Applying the Field Access Equation. The dependency of th is forwarded to the field threads of p; the fields $f_1, \dots, f_n$ other than threads are mapped to $\square$.]

5.4 Interprocedural Dependencies

Exit labels, presented in Section 3.1.2 and in Section 4.1 (on page 63), constitute an increased source of expressivity, as they indicate the scenario that was observed while executing a predicate. We incorporate this expressivity in our dependency results by computing specific dependencies for each possible execution scenario. Therefore, our analysis is performed label by label, and interprocedural dependency domains associate an intraprocedural domain to each exit label of the analysed predicate. The variable key-set of each associated intraprocedural domain comprises the inputs of the analysed predicate. A label that cannot be returned is mapped to an Unreachable intraprocedural domain. This is a form of path-sensitivity (Robert and Leroy 2012). However, we favor the term label-sensitivity for this characteristic, as it seems to be a more natural choice applied to our case and the language we are working on.

[Figure 5.17 – Analysing Predicate thread – Field Access. Same control flow graph as in Figure 5.11; the node th = p.threads is annotated $p \mapsto \{ threads \mapsto \langle \square \triangleright i : [Some \mapsto \top;\ None \mapsto \bot] \rangle \}$ and $i \mapsto \top$, the node tio = th[i] $th \mapsto \langle \square \triangleright i : [Some \mapsto \top;\ None \mapsto \bot] \rangle$ and $i \mapsto \top$, the switch node $tio \mapsto [Some \mapsto \top;\ None \mapsto \bot]$, and the exit true $ti \mapsto \top$.]

An interprocedural domain of a predicate p is thus defined as shown below.

Definition 5.4.1 (Interprocedural Dependency Domain)

\[ D_p : \Lambda_p \rightarrow D, \quad \text{where } \Lambda_p \text{ is the set of output labels of predicate } p \]

For each analysed label of a predicate, the analysis starts by initializing the intraprocedural domain mapped to it with the output variables associated with the exit label. To avoid making any false assumption, these are initially mapped to the most general dependency, namely $\top$. Subsequently, as described in Section 5.3.2, the dependency information is gradually refined until a fixed point is reached. The execution scenarios denoted by the exit labels of a predicate are mutually exclusive. Therefore, during the analysis of a particular exit label, all other exit labels of the predicate are mapped to Unreachable. After reaching a fixed point, the intraprocedural domain is filtered so that only input variables appear in the variable set. As explained in Section 5.3.2, the intraprocedural domains are built such that only input variables may appear as exception indices in dependencies computed for arrays. This invariant is preserved throughout the analysis.
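The final filtering step and the check of the array invariant can be sketched as follows. This is a minimal illustration under an assumed encoding (dictionaries for structure and variant dependencies, tuples tagged `"arr"` for arrays, strings for atoms); the function names `exception_indices` and `finalise` are hypothetical.

```python
def exception_indices(dep):
    """Yield every exception index occurring inside a dependency."""
    if isinstance(dep, dict):                        # structure or variant
        for sub in dep.values():
            yield from exception_indices(sub)
    elif isinstance(dep, tuple) and dep[0] == "arr":
        yield from exception_indices(dep[1])         # default part
        if len(dep) == 4:                            # ("arr", default, index, exc)
            yield dep[2]
            yield from exception_indices(dep[3])

def finalise(domain, inputs):
    """Keep only input variables and check that every exception index is an input."""
    summary = {v: d for v, d in domain.items() if v in inputs}
    for dep in summary.values():
        for index in exception_indices(dep):
            if index not in inputs:
                raise ValueError(f"exception index {index!r} is not an input")
    return summary

some_only = {"Some": "TOP", "None": "BOT"}
domain = {
    "p": {"threads": ("arr", "NOTHING", "i", some_only)},
    "i": "TOP",
    "th": ("arr", "NOTHING", "i", some_only),        # a local variable, filtered out
}
print(finalise(domain, inputs={"p", "i"}))
```

In this sketch a violation of the invariant is reported as an error; since the text states that the analysis preserves the invariant by construction, such a check acts only as a safeguard.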

Interprocedural dependency information is expressed in terms of the formal parameters of predicates. For analysing predicate calls, we need to substitute the formal parameters of the callee by the ones that are supplied by the caller. Therefore, a substitution must be performed on interprocedural summaries. This consists in substituting all occurrences of formal input parameters of a predicate by the corresponding effective input parameters. The substitution operation is denoted as $\blacktriangleleft(\chi)$, where $\chi$ is a substitution from formal to effective parameters.


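We proceed by detailing the equation corresponding to a call to a predicate

\[ p(e_1, \dots, e_n)[\lambda_1 : o_1 | \dots | \lambda_m : o_m] \]

having the following signature:

\[ p(\varepsilon_1, \dots, \varepsilon_n)[\lambda_1 : \omega_1 | \dots | \lambda_m : \omega_m] \]

The general equation (given in Table 5.6) applies:

\[ \Delta_n = {\bigvee}_{\Delta,\ n \xrightarrow{s,\, \lambda_i} n_i} \llbracket p(e_1, \dots, e_n)[\lambda_1 : o_1 | \dots | \lambda_m : o_m] \rrbracket_{\lambda_i}(\Delta_{n_i}) \]

The transfer functions for the predicate call statement are deduced from the predicate's interprocedural domain in the following fashion:

\[ \llbracket p(e_1, \dots, e_n)[\lambda_1 : o_1 | \dots | \lambda_m : o_m] \rrbracket_{\lambda_i}(\Delta) = (\Delta \setminus o_i) \bigoplus_{j \in 1..n} \{ e_j \mapsto dep_{ij} \} \]
\[ \text{where } dep_{ij} = D_p(\lambda_i)(\varepsilon_j) \blacktriangleleft (\varepsilon \mapsto e) \qquad \text{(PredEq)} \]

Namely, the mappings for the outputs o associated to a label $\lambda_i$ are removed, and the contribution of a call to each input $e_j$ stems from the contribution of the interprocedural domain for label $\lambda_i$ and formal input $\varepsilon_j$. In these, all the formal input parameters $\varepsilon$ in array dependency domains are substituted by the corresponding effective input parameters from e.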
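The call equation can be approximated by the following sketch, which renames formal parameters used as exception indices and then registers the callee's summary on the effective arguments. The encoding (dictionaries, `"arr"` tuples, string atoms) and the names `substitute` and `call_transfer` are assumptions made for illustration; the sketch also assumes that effective arguments are plain variables and refuses cases that would require the full reduction operator.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def substitute(dep, chi):
    """Rename formal parameters that occur as exception indices."""
    if isinstance(dep, dict):                        # structure or variant
        return {k: substitute(d, chi) for k, d in dep.items()}
    if isinstance(dep, tuple) and dep[0] == "arr":
        if len(dep) == 2:
            return ("arr", substitute(dep[1], chi))
        _, default, index, exc = dep
        return ("arr", substitute(default, chi),
                chi.get(index, index), substitute(exc, chi))
    return dep                                       # atomic dependency

def call_transfer(delta, summary, formals, actuals, outputs):
    """Contribution of a call on one exit label, given the callee's summary
    for that label (a mapping from formal inputs to dependencies)."""
    chi = dict(zip(formals, actuals))
    result = {x: d for x, d in delta.items() if x not in outputs}
    for formal, actual in zip(formals, actuals):
        if actual in result:
            raise NotImplementedError("needs the full reduction operator")
        result[actual] = substitute(summary.get(formal, NOTHING), chi)
    return result

# Summary of `thread` on label true, as obtained in Section 5.3.3.
thread_true = {"p": {"threads": ("arr", NOTHING, "i", {"Some": TOP, "None": BOT})},
               "i": TOP}
successor = {"tj": {"stack": {"start": TOP}}}
print(call_transfer(successor, thread_true, ["p", "i"], ["p", "j"], ["tj"]))
```

Note that the dependency recorded for the output `tj` in the successor domain is simply dropped; this mirrors the removal of the outputs in (PredEq).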

An αSmil program is analysed by computing, once and for all, an interprocedural dependency domain for every predicate. These are stored in a mapping binding predicate identifiers to their interprocedural dependency domains. Whenever a predicate call is handled intraprocedurally, the corresponding computed interprocedural dependency summary is retrieved from the mapping, propagated to the calling site, and used as explained above. If an interprocedural dependency summary for a called predicate has not been computed yet, it is handled as if it were an implicit predicate. In practice, in programs generated in αSmil from Smil, predicates are sorted in topological order when possible. For implicit predicates, described in Chapters 3 and 4, a pessimistic assumption is made: it is considered that everything in their inputs has been read and is needed for any of their possible exit labels. Since their implementation is hidden, a conservative approximation must be made in their case.

Inductive predicates have been discussed in Section 3.1.4 (on page 46). They are specification-only predicates and represent a disjunction of cases. Each case can introduce existentially quantified variables. An inductive predicate exits with the true label if any of its declared cases holds. Therefore, for inductive predicates, one analysis per case is made. For the true exit label, the dependency results are obtained by joining the results of all cases. For the false label, everything is considered to be read.


5.4.1 Interprocedural Dependency Analysis Illustrated

To better illustrate our analysis at an interprocedural level, we revisit our start_address example predicate, introduced in Section 5.1.1. We consider the true execution scenario, apply our dependency analysis, and compare the results actually obtained with the targeted ones depicted in Figure 5.8.

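[Figure 5.18 – $G_{start\_address}$ – Dependency Information. Control flow graph with the nodes thread(p, j)[true : tj | None | oob], sj = tj.stack and adr = sj.start, and the exit nodes true, error, None and oob. The exit true is annotated $adr \mapsto \top$, the node adr = sj.start $sj \mapsto \{ start \mapsto \top \}$, and the node sj = tj.stack $tj \mapsto \{ stack \mapsto \{ start \mapsto \top \} \}$.]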

We begin by initialising the output adr with $\top$ and continue by traversing the control flow graph backwards and by computing the dependency information at each node. We apply the data-flow equation (6) given in Table 5.8 and we obtain the intermediate results shown in Figure 5.18.

To compute the dependency information of the control flow graph's entry node, i.e. the one corresponding to a predicate call to thread, we use the dependency summary computed for this predicate for the exit label true, and we substitute the formal parameters, i.e. p and i, appearing in it with the effective arguments of the call, i.e. p and j. We thus obtain the following dependency summary:

\[ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top;\ None \mapsto \bot] \rangle \} \qquad j \mapsto \top \]

We apply the data-flow equation (PredEq) corresponding to a predicate call, discussed on page 102, and make use of the dependency information corresponding to the successor node on the edge marked with true,

\[ tj \mapsto \{ stack \mapsto \{ start \mapsto \top \} \} \]

thus obtaining the following final dependency result:

\[ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top;\ None \mapsto \bot] \rangle \} \qquad j \mapsto \top \]

However, the targeted results for start_address, depicted in Figure 5.8, would translate to:

\[ p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto t \mapsto \{ stack \mapsto \{ start \mapsto \top \} \};\ None \mapsto \bot] \rangle \} \qquad j \mapsto \top \]

Clearly, the dependency information computed by our analysis and shown in Figure 5.19 is an over-approximation of the results that we had envisioned. The obtained dependency summary states that the entire j-th associated thread of the input process p is needed in order to obtain the output adr on the true exit label. However, in reality only one of this thread's fields is actually needed, namely the stack field, for which only one subelement – the start field – is read. This loss of precision is a consequence of the dependency information mapped to the Some constructor at the control flow graph's entry node, corresponding to a call to the thread predicate. When executing successfully and exiting with label true, the thread predicate returns the i-th associated thread of its input process. However, the predicate thread does not need this element itself: it neither reads nor uses it per se, it merely retrieves it. The dependency on this returned element is relative to the extent to which the predicate's callers will use it. The start_address predicate, for instance, depends only on one of the 3 fields of the returned thread. Yet, by mapping the i-th thread to $\top$ in thread's dependency summary, we fail to mirror this distinction: $\top$ is the top element of our dependency domain, and joining it with any other dependency will lead to $\top$, thus shadowing any other information we might compute while observing its usage.

5.4.2 Context-Insensitivity and its Consequences

Precision losses in dependency summaries, such as the one detected in our previous example, are a direct consequence of considering and analysing predicates in isolation. There is a level of information that goes beyond a predicate's own control flow graph, and a more detailed picture that can emerge once non-local information connected to the predicate's use, i.e. the calling context, is included into the analysis.

Interprocedural analyses that consider the calling context when analysing the target of a function – or, in our case, a predicate – call are context-sensitive analyses (Hind 2001). As the name implies, context-sensitive analyses can jump back to the original call site, using context information for the results they compute. Context-insensitive analyses, on the other hand, dispense with such information and propagate back to all possible call sites the information that they compute once. This is a notorious source of potential precision loss in static analysis. Choosing either one of these two traits has significant consequences: on the one hand, by choosing to ignore the calling context and the additional information it supplies, one pays a high price in terms of precision, and on the other hand, by choosing to include such information, one risks sacrificing scalability.

[Figure 5.19 – $G_{start\_address}$ – Final Dependency Results. Same control flow graph and annotations as in Figure 5.18; in addition, the entry node thread(p, j)[true : tj | None | oob] is annotated $p \mapsto \{ threads \mapsto \langle \square \triangleright j : [Some \mapsto \top;\ None \mapsto \bot] \rangle \}$ and $j \mapsto \top$.]

Our dependency analysis, as presented so far, is context-insensitive: for each predicate, the analysis computes a dependency summary once, stores it, and further propagates it to its callers whenever needed. Considering that αSmil predicates are sequences of calls to other predicates, built-in or user-defined, as discussed in Chapter 4, if we adopted a purely context-sensitive solution we would gain in terms of precision, but we would obtain results that are prohibitive in terms of performance. This is a typical trade-off of static analyses. We address this issue and describe our solution in detail in Chapter 6. Without adopting context-sensitivity to the letter, we strike a balance between the two alternatives by including lazy components in our interprocedural dependency summaries and by using them for injecting the current intraprocedural context on an as-needed basis. As will be discussed in Chapters 6 and 8, this approach leads to improved precision with only a marginal decrease in performance.

5.5 Semantics of Dependency Values

There are two different manners of interpreting dependency values $\delta$: one focusing on the possible constructors part, and the other focusing on the dependency part. In both cases, the interpretations are relative to a type $\tau$ and hold only for well-typed dependencies of the same type. The set of types that a dependency is compatible with has been discussed in Section 5.2.2 and defined in Table 5.5.

First, focusing on the possible constructors aspect, dependencies can be interpreted as a constraint on the forms that values may take. Such constraints can arise as a consequence of $\bot$, i.e. impossible, appearing in nested dependencies. These are described by a characteristic function $\mathbb{1}$:

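\[ \mathbb{D}\mathcal{D} = \{ (v, \delta) \in \mathbb{D} \times \mathcal{D} \mid \delta \in \mathcal{D},\ \tau \in T,\ v \in \mathbb{D}_\tau,\ \Gamma \vdash \delta : \tau \} \qquad \mathbb{1} : \mathbb{D}\mathcal{D} \rightarrow \{0, 1\} \]

This is defined as follows.

Definition 5.5.1 (Characteristic function $\mathbb{1}$)

\[ \mathbb{1}(v, \top) = 1 \qquad \mathbb{1}(v, \square) = 1 \qquad \mathbb{1}(v, \bot) = 0 \]

\[ \mathbb{1}(\{ f_1 = v_1;\ \dots;\ f_n = v_n \},\ \{ f_1 \mapsto \delta_1;\ \dots;\ f_n \mapsto \delta_n \}) = \begin{cases} 1 & \text{when } \mathbb{1}(v_i, \delta_i),\ \forall\, 1 \le i \le n \\ 0 & \text{otherwise} \end{cases} \]

\[ \mathbb{1}(C_i[v],\ [C_1 \mapsto \delta_1;\ \dots;\ C_n \mapsto \delta_n]) = \begin{cases} 1 & \text{when } \mathbb{1}(v, \delta_i) \\ 0 & \text{otherwise} \end{cases} \]

\[ \mathbb{1}((P, (v_k)_{k \in P}),\ \langle \delta_{def} \rangle) = \begin{cases} 1 & \text{when } \mathbb{1}(v_k, \delta_{def}),\ \forall k \in P \\ 0 & \text{otherwise} \end{cases} \]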

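\[ \mathbb{1}((P, (v_k)_{k \in P}),\ \langle \delta_{def} \triangleright i : \delta_{exc} \rangle) = \begin{cases} 1 & \text{when } (\mathbb{1}(v_k, \delta_{def}),\ \forall k \in P,\ k \neq E(i)) \vee (E(i) \in P,\ \mathbb{1}(v_{E(i)}, \delta_{exc})) \\ 0 & \text{otherwise} \end{cases} \]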
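As an illustration, the cases of the characteristic function that do not involve an exception can be transcribed into a short Python sketch. The encoding is hypothetical: structure values are dictionaries, variant values are pairs `(constructor, argument)`, array values are dictionaries from indices to cells, and dependencies are tagged tuples. The exception case is deliberately left out of the sketch, since it additionally requires a valuation to interpret the index.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def satisfies(value, dep):
    """Return True when `value` meets the constraints expressed by `dep`."""
    if dep == BOT:                       # impossible: no value qualifies
        return False
    if dep in (TOP, NOTHING):            # no constraint on the form of the value
        return True
    kind, body = dep[0], dep[1]
    if kind == "struct":                 # every listed field must qualify
        return all(satisfies(value[f], d) for f, d in body.items())
    if kind == "variant":                # only the actual constructor matters
        constructor, argument = value
        return satisfies(argument, body[constructor])
    if kind == "arr" and len(dep) == 2:  # uniform array dependency <default>
        return all(satisfies(cell, body) for cell in value.values())
    raise NotImplementedError("exception case is not covered by this sketch")

some_only = ("variant", {"Some": TOP, "None": BOT})
print(satisfies(("Some", 42), some_only))    # True
print(satisfies(("None", None), some_only))  # False
```

The two calls show how an impossible constructor nested in a dependency restricts the admissible values: a `None` value is rejected because its constructor is mapped to impossible.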

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2, Table 5.1) defined on dependencies. If a dependency is more precise than or equal to another dependency, then it should be interpreted as constraints which are at least as strong as the ones for the other dependency. Given a typing environment $\Gamma$ (Definition 4.3.1):

\[ \forall \tau \in T^*,\ \delta \sqsubseteq \delta' \implies (\mathbb{D}_\tau \cap \mathbb{1}(\bullet, \delta)) \subseteq (\mathbb{D}_\tau \cap \mathbb{1}(\bullet, \delta')) \]
\[ \text{where } T^* = \{ \tau \in T \mid \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \} \]

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the constraints semantics of dependencies is that, if two dependencies $\delta$ and $\delta'$ can be interpreted as constraints for a value v, then their reduction can be interpreted as a constraint for v as well:

\[ \mathbb{1}(v, \delta) \wedge \mathbb{1}(v, \delta') \implies \mathbb{1}(v, \delta \oplus \delta') \]

The converse, which one might expect to be true as well, does not hold because of approximations made by our treatment of arrays.

Given a valuation E (Definition 4.4.2), an intraprocedural dependency summary can be interpreted as a conjunction of the constraints on every variable's value, as given by its associated dependency. We use the notation $E \vDash \Delta$ to indicate this:

\[ E \vDash \Delta \implies \forall v \in V,\ \mathbb{1}(E(v), \Delta(v)) \]

Under the appropriate conditions, given a semantic transition $\xrightarrow{\lambda}$ (Definition 4.4.4) from the configuration $\langle E, [s] \rangle$ (Definition 4.4.3) to the valuation $E'$, as defined in Section 4.4, if the intraprocedural summary $\Delta'$ of the successor of statement s on label $\lambda$ represents the semantic interpretation of constraints given $E'$, then the contribution $\llbracket s \rrbracket_\lambda(\Delta')$ (Definition 5.3.6) of the edge labeled with s and $\lambda$ must necessarily represent the semantic interpretation of constraints given E. We thus obtain the following:

\[ \begin{array}{lr}
\Gamma \vdash E \implies & (5.1) \\
\Sigma, \Gamma, O \vdash s \rightarrow \lambda \implies & (5.2) \\
\langle E, [s] \rangle \xrightarrow{\lambda} E' \implies & (5.3) \\
\Gamma, E' \vdash \Delta' \implies & (5.4) \\
E' \vDash \Delta' \implies & (5.5) \\
E \vDash \llbracket s \rrbracket_\lambda(\Delta') & (5.6)
\end{array} \]

We note that, thanks to the subject reduction property (Definition 4.4.7), (5.3) implies that $\Gamma \vdash E'$.

Following from (5.6), when joining the contributions on all labels of the statement s, the obtained intraprocedural dependency summary represents the semantic interpretation of the disjunction of constraints given E:

\[ \begin{array}{l}
(E \vDash \llbracket s \rrbracket_{\lambda_1}(\Delta'_1)) \vee_\Delta \dots \vee_\Delta (E \vDash \llbracket s \rrbracket_{\lambda_n}(\Delta'_n)) \implies \\
E \vDash (\llbracket s \rrbracket_{\lambda_1}(\Delta'_1) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket_{\lambda_n}(\Delta'_n)) \implies \\
E \vDash old(\Delta) \implies \\
E \vDash old(\Delta) \vee_\Delta (\llbracket s \rrbracket_{\lambda_1}(\Delta'_1) \vee_\Delta \dots \vee_\Delta \llbracket s \rrbracket_{\lambda_n}(\Delta'_n))
\end{array} \]

For a predicate p exiting with label $\lambda$ and having the intraprocedural summary $\Delta_\lambda$, the characteristic function, given $I \subseteq E$, a valuation mapping the predicate's inputs to their values, constrains the space of inputs that can make the predicate exit with the label $\lambda$. It thus denotes the necessary conditions on inputs according to the observed execution scenario, and can be used as an inversion lemma when reasoning on calls to a predicate.

The soundness of this interpretation, as well as the well-formedness of our dependencies, have been proven in Coq, and the corresponding files can be consulted online [1]. The mechanized Coq proofs are entirely due to Stéphane Lescuyer. These proofs also deal with deferred dependencies, which will be presented in Chapter 6, but these constitute an extension that does not modify the underlying lattice.

The second interpretation of dependency values focuses on the dependency part and is a partial equivalence relation $\approx$:

\[ T\mathcal{D} = \{ (\tau, \delta) \in T \times \mathcal{D} \mid \Gamma \vdash \delta : \tau \} \qquad \approx\ : T\mathcal{D} \rightarrow \mathbb{D} \times \mathbb{D} \]

The partial equivalence relation $\approx^\tau_\delta$ relates well-typed values of the same type $\tau$. It relates values that only differ in places that are irrelevant according to the dependency $\delta$. It is defined as shown below.

[1] The corresponding files are provided at the following address: httpajl-demofr2015proveCoq

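Definition 5.5.2 (Partial Equivalence Relation $\approx^\tau_\delta$)

\[ \approx^\tau_\top\ = \{ (x, x) \mid x \in \mathbb{D}_\tau \} \qquad \approx^\tau_\square\ = \{ (x, y) \mid x, y \in \mathbb{D}_\tau \} \qquad \approx^\tau_\bot\ = \{ (x, y) \mid x, y \in \mathbb{D}_\tau \} \]

\[ \approx^{struct\{f_1 : \tau_1;\ \dots;\ f_n : \tau_n\}}_{\{f_1 \mapsto \delta_1;\ \dots;\ f_n \mapsto \delta_n\}}\ = \{ (\{ f_1 = v_1;\ \dots;\ f_n = v_n \},\ \{ f_1 = w_1;\ \dots;\ f_n = w_n \}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i} \} \]

\[ \approx^{variant[C_1 : \tau_1 | \dots | C_n : \tau_n]}_{[C_1 \mapsto \delta_1;\ \dots;\ C_n \mapsto \delta_n]}\ = \{ (C_i[v_i],\ C_i[w_i]) \mid (v_i, w_i) \in\ \approx^{\tau_i}_{\delta_i} \} \]

\[ \approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \rangle}\ = \{ ((P, (v_k)_{k \in P}),\ (P, (w_k)_{k \in P})) \mid \forall k,\ (v_k, w_k) \in\ \approx^\tau_{\delta_{def}} \} \]

\[ \approx^{arr_{\tau_i}\langle \tau \rangle}_{\langle \delta_{def} \triangleright i : \delta_{exc} \rangle}\ = \{ ((P, (v_k)_{k \in P}),\ (P, (w_k)_{k \in P})) \mid E(i) \in P \implies (v_{E(i)}, w_{E(i)}) \in\ \approx^\tau_{\delta_{exc}},\ \forall k \neq E(i),\ (v_k, w_k) \in\ \approx^\tau_{\delta_{def}} \} \]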
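The partial equivalence relation can likewise be sketched as a recursive check. As before, the encoding is an assumption made for illustration (dictionaries for structure and array values, pairs for variant values, tagged tuples for dependencies), and `env` plays the role of the valuation E used to interpret an exception index. Fields omitted from a structure dependency are treated as unconstrained, following the convention that omitted fields are mapped to nothing.

```python
TOP, NOTHING, BOT = "TOP", "NOTHING", "BOT"

def related(x, y, dep, env):
    """Return True when x and y only differ in places irrelevant to `dep`."""
    if dep == TOP:
        return x == y                    # everything is needed: identity
    if dep in (NOTHING, BOT):
        return True                      # any two values are related
    kind = dep[0]
    if kind == "struct":
        return all(related(x[f], y[f], d, env) for f, d in dep[1].items())
    if kind == "variant":
        (cx, ax), (cy, ay) = x, y
        return cx == cy and related(ax, ay, dep[1][cx], env)
    if kind == "arr":
        if x.keys() != y.keys():         # both arrays share the same support P
            return False
        has_exception = len(dep) == 4
        special = env[dep[2]] if has_exception else None
        for k in x:
            d = dep[3] if (has_exception and k == special) else dep[1]
            if not related(x[k], y[k], d, env):
                return False
        return True
    raise ValueError(f"unknown dependency {dep!r}")

# Dependency of `thread` on its input p for label true, with i bound to 1.
dep_p = ("struct", {"threads": ("arr", NOTHING, "i",
                                ("variant", {"Some": TOP, "None": BOT}))})
p1 = {"threads": {0: ("Some", "a"), 1: ("Some", "b")}, "pid": 7}
p2 = {"threads": {0: ("None", None), 1: ("Some", "b")}, "pid": 9}
print(related(p1, p2, dep_p, {"i": 1}))  # True
```

The two processes differ in the cell at index 0 and in the field `pid`, neither of which is covered by the dependency, so they are related; changing the contents of cell 1 in one of them would break the relation.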

This interpretation is compatible with the partial order $\sqsubseteq$ (Definition 5.2.2) defined on dependencies. If a dependency is more precise than or equal to another dependency, then it should be interpreted as an equivalence relation relating more values:

\[ \delta \sqsubseteq \delta' \implies\ \approx^\tau_\delta\ \supseteq\ \approx^\tau_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \]

The interpretation of the reduction operator $\oplus$ (Definition 5.2.4) with respect to the equivalence relation interpretation of dependencies is that the set of values related by $\delta \oplus \delta'$ is a subset of the intersection of values related by $\delta$ and $\delta'$, respectively:

\[ \approx^\tau_{\delta \oplus \delta'}\ \subseteq\ \approx^\tau_\delta \cap \approx^\tau_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \]

The interpretation of the $\vee$ operator (Definition 5.2.3, Table 5.2) with respect to the equivalence relation interpretation of dependencies is similar:

\[ \approx^\tau_{\delta \vee \delta'}\ \subseteq\ \approx^\tau_\delta \cap \approx^\tau_{\delta'} \qquad \forall \tau,\ \Gamma \vdash \delta : \tau \wedge \Gamma \vdash \delta' : \tau \]

Two valuations E and $E'$ are equivalent modulo an intraprocedural dependency summary $\Delta$ if the values that they associate to variables are equivalent modulo the corresponding dependency associated in $\Delta$:

\[ E \approx^\Gamma_\Delta E' \implies \forall v \in \Delta,\ E(v) \approx^{\Gamma(v)}_{\Delta(v)} E'(v) \]

The equivalence relation $\approx^\Gamma_\Delta$ thus relates valuations that are not distinguishable by only looking at the parts specified by the intraprocedural dependency summary $\Delta$.

This interpretation can be used to apply congruence modulo reasoning to predicate calls. If a predicate p is called with two sequences of input values v and u, respectively, which are related by the intraprocedural dependency summary of p on label $\lambda$, then the predicate will necessarily exercise the same execution scenario, exiting with label $\lambda$, and will yield identical outputs w.

5.6 Related Work

The frame problem and its manifestations in the software verification process – detecting program properties that remain unchanged under a certain operation – are notorious (Leavens, Leino, and Müller 2007; Leavens and Clifton 2005; O'Hearn 2005). A complete specification of a program will necessarily include frame properties (Borgida, Mylopoulos, and Reiter 1995). However, though necessary, specifying and verifying frame properties is tedious and repetitive. Two prominent solutions to the frame problem come from separation logic (Reynolds 2005; Distefano, O'Hearn, and Yang 2006; Calcagno et al. 2011) and ownership types (Clarke and Drossopoulou 2002). However, Meyer (Meyer 2015) argues that the problem itself should not impose such annotation-heavy solutions. Simpler, automatic solutions for their specification and verification would allow programmers to concentrate on the truly challenging part (Meyer 2015).

Though we share the same desideratum with separation logic (Reynolds 2002; Reynolds 2005; O'Hearn 2012; O'Hearn, Yang, and Reynolds 2004), the programming paradigm and context under which we operate lead to a considerably different solution. Separation logic is targeted at low-level imperative programming languages, and its applications focus on shared mutable data structures. We, on the other hand, focus on a purely functional language and consider immutable algebraic data structures and arrays. We treat mappings between variables and values and analyse their evolution in a side-effect free environment, in the context of verification of programs where a new output is obtained by altering just a subset of the input's subelements and preserving the rest. Instead of using a collection of Hoare triples as an abstract domain, we have defined our own dependency domain. The results of our dependency analysis are close to the concept of a footprint (Distefano, O'Hearn, and Yang 2006; Hur, Dreyer, and Vafeiadis 2011; Bobot and Filliâtre 2012), in the sense that they describe an over-approximation of only those variables and subelements that are needed by a program, and are expressed as an input-output relation.

The dependency results computed by our analysis are similar to primitive read and write effects used in ownership type systems (Clarke and Drossopoulou 2002). Write effects in our case are implicit and include strictly the output variables associated to an exit label. Read effects can only refer to input variables of a predicate. Also, in ownership type systems read effects comprise the whole execution of a method, even if they are irrelevant for the method's results. We, however, ignore read effects on which the output does not depend, reflecting only those which contribute to the observed result. A technique for declaring and verifying read effects in an ownership type system is presented in (Clarke and Drossopoulou 2002). We use static analysis to automatically detect them. In the Spec# (Mike Barnett 2005) program verifier, the notion of confined is used for describing the reading effects of a pure method in terms of the ownership cone (Clarke, Potter, and Noble 1998) of its parameters.

In (Hughes 1987), Hughes argues that analyses of programs that manipulate data structures should ideally distinguish between the information they are computing for a data structure as a whole and the information computed for each component within it. The information that is computed by a backward analysis is dubbed generically as context. A manner of constructing richer domains is described, and it is argued that, for instance, a context for a sum type must contain (sub)contexts for any of its summands. Similarly, for product types, a context should include a (sub)context for each component as well as a context referring to the value as a whole. We target fine-grained dependency information for structures, variants and arrays. Similarly to the described product type contexts, our dependencies for structures describe the dependency on each of the structure's fields. Variant dependencies are expressed in terms of the dependencies of their constructors, i.e. their summands. Furthermore, it is argued that any context should include a maximal element, interpreted as a "no information" value, a minimal element, interpreted as "contradictory requirements", and an element representing "no context" or "unused". Close to the notion of "contradictory requirements", we include an atomic value denoting impossible in our dependency domain. Program points having a "contradictory requirements" context denote points in the program that will lead to crashes if reached. Our notion of impossible refers to nodes that are unreachable or constructors that cannot occur on a given execution path. Our maximal element, denoting everything, is a safe value, close to the notion of "no information". Nothing, an element different from both everything and impossible, is similar to the notion of "unused". It denotes (sub)elements that are irrelevant and constitutes quite definite information.

Hughes (Hughes 1987) introduces a notion of needed/unneeded parameters for programs manipulating lists. This enables detecting whether the value of a subterm is ignored. The method is formulated in terms of a fixed, finite set of projection functions. Multiple other approaches and analyses focus on the elimination of unnecessary data structures (Cousot and Cousot 1994), filtering of useless arguments and unnecessary variables in the context of logic programming (Leuschel and Sørensen 1996) and, more recently, removing redundant arguments (Alpuente, Escobar, and Lucas 2007).

The concept of a context is further discussed by Wadler and Hughes in (Wadler and Hughes 1987). The authors describe a technique for strictness analysis for non-flat list domains that relies on contexts represented using the notion of projections from domain theory. These allow expressive list descriptions, such as contexts specifying that while a list's elements can be ignored, its length is relevant. Their backward analysis computes necessary information using a fixed, finite abstract domain.

Leino and Müller (Leino and Müller 2008b) present a technique for verifying that methods that query the state of identical data structures return identical or equivalent results. They stress the frequency of such assumptions in program verification, as well as the counter-intuitive amount of effort required for the specification and verification of such equivalent-results methods and their callers. One of the two interpretations of our dependency values, ≈τδ, is an equivalence relation binding pairs of values that are not distinguishable by considering only the parts specified by the dependency domain. Thus, it ensures not only that identical input data structures will lead to identical results, but also that different invocations of a predicate with input data structures that are congruent with respect to this interpretation will lead to identical results. Our dependencies are similar to the influence sets presented by Leino and Müller. Influence sets are represented as sets of heap locations, and they are used to specify the parts of the program state that are allowed to impact the return values. Influence sets are user-defined, and they are required to be self-protecting. This property is enforced by requiring the set of path expressions specifying the influence set to be prefix-closed, a constraint which is then checked syntactically. In contrast, our dependencies are computed by static analysis. Influence sets may depend on the heap. Reasoning about heap locations is beyond the scope of our analysis. We treat mappings between variables and values, analyse their evolution in a side-effect free environment, and express dependencies as input-output relations. The technique presented by Leino and Müller has been applied for reasoning about pure methods (Leino, Müller, and Wallenburg 2008; Hatcliff et al. 2012; Nordio et al. 2010; Banerjee and Naumann 2014).

Identifying the input (sub)parts on which a predicate's outputs depend can also be seen as an instance of secure information flow (Sabelfeld and Myers 2003), where the predicate's outputs and the input (sub)parts appearing in the predicate's dependency summary have a low security level, i.e. are public, and everything else has a high security level, i.e. is private. The first interpretation of our dependency values mirrors the notion of non-interference as given by Volpano et al. in (Volpano, Irvine, and Smith 1996) for deterministic programs. By only observing the public parts, nothing can be concluded about the private parts. The link between permissions and ownership types has been underlined by Zhao and Boyland (Zhao and Boyland 2008).

Liu and Stoller present a backward dependence analysis for the computation of dead code (Liu and Stoller 2003). They obtain expressive descriptions of partially dead recursive data using liveness patterns. These are based on general regular tree grammars that were extended with two notions: live and dead. Users can specify liveness patterns at particular program points of interest. The analysis then uses these and computes liveness patterns at all program points, based on constraints derived from the programming language semantics and the program itself. The obtained information is meant to be used for identifying and eliminating dead code. In a separate paper (Liu 1998), Liu presents three approximation operations meant to guarantee termination in the context of fixed point computations using general grammar transformers on potentially infinite grammar domains.

Static dependence or liveness analyses are typically used for code optimization, dead code elimination (Liu and Stoller 2003), and compile-time garbage collection, but only seldom for program verification. One exception that we are aware of comes from Frama-C (Cuoq et al. 2012), where it is used in a purely automatic setting and, unlike our analysis, it does not handle unions and arrays. A plug-in based on the available value analysis (Frama-C Value Analysis User Manual) computes lists of input and output locations for each function, distinguishing between operational, functional and imperative inputs and outputs. Dependencies computed for an output o hold if and when the analysed function terminates. They are represented as sets of variables whose initial value can influence the final value of o. Input variables appearing in this set are called functional inputs. Imperative inputs are the locations that may be read during the execution of the analysed function. An over-approximation of the set of these locations is computed: locations that are read only in non-terminating branches are included in the imperative inputs set as well. Operational inputs are the memory zones that are read without having been previously written to.

5.7 Conclusion

In the context of interactive formal verification of complex systems, considerable effort is spent on proving the preservation of the system's invariants. However, most operations have a localised effect on the system, which only really impacts few invariants at the same time. Identifying those invariants that are unaffected by an operation can substantially ease the proof burden for the programmer.

In this chapter, we have presented a data-flow analysis that computes a conservative approximation of the input fragments on which the operations depend. It is a flow-sensitive, path-sensitive, interprocedural dependency analysis that handles arrays, structures and variants. For the latter, it simultaneously computes a subset of possible constructors. We have defined our own abstract dependency domain, and we obtain dependency information that mirrors the layered structure of compound data types.

The main original traits of this contribution stem from its design as an analysis meant to be used as a companion tool during interactive program verification, in a unified manner, on programs as well as on specifications.

We have implemented a prototype of the dependency analysis in OCaml, and we have applied it to a functional specification of ProvenCore (Lescuyer 2015), a general-purpose microkernel that ensures isolation. Its proof is based on multiple refinements between successive models, from the most abstract one, on which the isolation property is defined and proven, to the most concrete, i.e. the actual model used for code generation. Medium-sized experiments performed on the abstract layers of ProvenCore show positive results. For instance, the dependency results of approximately 630 αSmil predicates, totalling approximately 10000 lines of code, are obtained in less than 1 second. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative dependency summaries without sacrificing scalability. The implementation and the obtained results will be presented and discussed in detail in Chapter 8. The prototype can be tested on the web page² dedicated to our dependency analysis, where various examples are provided and explained. Additionally, users can devise and test their own examples.

An obvious first challenge is to address the issue of context-sensitivity. In the following chapter, we present a solution based on lazy components, which are included in our interprocedural dependency summaries. The current intraprocedural context is injected in them on an as-needed basis. As we will show in Chapter 6, these lead to improved precision with only a marginal decrease in performance.

² Dependency Analysis Web Page: http://ajl-demo.fr/2015

Our main goal is to combine the dependency analysis with the correlation analysis presented in Chapter 7, which is meant to detect relations between inputs and outputs. By uncovering partial equivalence relations between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, it could have applications in the testing realm: the computed dependency information could be used for designing and generating test suites that avoid redundant testing of the same execution scenario. Based on the second interpretation, ≈τδ, of our dependency information given in Section 5.5, classes of inputs that will test the same execution scenario can be determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted, and their values should be varied for more comprehensive testing. Since our dependency analysis computes results for every exit label of an αSmil predicate, it could also facilitate unit testing for exceptions. Furthermore, the computed dependency information could provide assistance in specifying read effects of predicates, similar to accessible clauses (Leavens et al. 2006) in JML.

The dependency analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen, and Lescuyer 2015).


Chapter 6

Deferred Dependencies: Injecting Context in Dependency Summaries

No symbols where none intended

Samuel Beckett

6.1 Dealing with Context-Insensitivity

Traditionally, the precision of static analyses is characterized along several axes, including the scope of the analysis, i.e. intraprocedural or interprocedural analyses, and different nuances of sensitivity relative to the analysis' use of control-flow information or of information pertaining to the calling context. This classification and terminology has its origins in data-flow analyses (Hind 2001; Midtgaard 2012). Regarding scope, intraprocedural analyses are local and operate within the boundaries of procedures. In contrast, interprocedural analyses are global and operate across procedure calls (Midtgaard 2012). These are somewhat more challenging and costly to perform and require dealing with parameter mechanisms.

Another important distinction is made regarding the calling context. Context-sensitive analyses distinguish between different calling contexts. At the other end of the spectrum, context-insensitive analyses compute information only once and subsequently use the same information at all calling sites. Clearly, a context-sensitive analysis is more precise than a context-insensitive analysis, but it is also more costly (Nielson, Nielson, and Hankin 1999). The choice between which technique to use amounts to a careful balance between precision and efficiency (Nielson, Nielson, and Hankin 1999). The dependency analysis presented in the previous chapter is an interprocedural, flow-sensitive, context-insensitive data-flow analysis. Regarding pure context-sensitivity, in a functional language such as αSmil, in which predicate calls and the manipulation of the returned outputs are omnipresent, unfolding predicates at each call site and recomputing the needed information seems to be a daunting prospect that risks becoming prohibitive in terms of execution time very quickly. On the other hand, choosing to analyse predicates in isolation and to dispense completely with information regarding the calling context leads to clear precision losses, as illustrated in Section 5.4.1 and discussed in Section 5.4.2. In order to address this aspect, we have devised a solution based on symbolic dependencies that requires an extension of our abstract dependency domain (Definition 5.2.1), but which otherwise has a minimal impact on the dependency analysis at an intraprocedural and interprocedural level.

Outline. In this chapter, we present our solution based on symbolic dependencies. We start by illustrating the addressed problem and the desired results in Section 6.2. In Section 6.3 and Section 6.4, we present the extended abstract dependency domain. We show the insertion and use of symbolic components at the intra- and interprocedural level of our dependency analysis in Section 6.5 and Section 6.6, respectively. Finally, we discuss their impact on the precision of the computed dependency information.

6.2 Symbolic Dependency Components in a Nutshell

Symbolic dependency components allow us to compute interprocedural predicate summaries with lazy components, in which the caller's intraprocedural information and context can be injected on an as-needed basis. The interprocedural dependency information for each predicate is still computed only once and propagated back to every possible call site. However, even though the analysis does not systematically recompute the dependency for the called predicate, it shows a form of context-sensitivity (Hind 2001) and leads to increased precision by creating templates with symbolic elements for each predicate. These elements introduce degrees of freedom in our interprocedural dependencies and allow us to parameterize and vary them according to the caller's actual intraprocedural context. Thus, we exclude some sources of coarse over-approximations without sacrificing scalability.

Previously, in Section 5.4.1, we illustrated on two αSmil example predicates, thread and start_address, how failing to take into consideration the current context of a caller leads to over-approximations. We argued in Section 5.4.2 that a more precise dependency blueprint can emerge once we consider a predicate's use as well. The first example predicate given in Chapter 5, thread, is an accessor predicate: it receives a process p and an index i as inputs and returns the i-th associated thread of the process p when executing successfully, i.e. when exiting with the true label. The predicate's dependency summary computed for the successful execution scenario was the following:

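\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright i : [\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot] \,\rangle \\
i \mapsto \top
\end{array}
\]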

This dependency information is expressive: it shows that only one of the 4 fields of the input process is read by the predicate, while all others are irrelevant for its output. The read field, threads, corresponds to the array of threads associated to the input process p. Furthermore, the dependency summary shows that, for this array, only the i-th element is inspected. This element is entirely needed, while all others are irrelevant.

This summary provides a rather detailed and precise blueprint of the predicate's output dependencies on its inputs. Yet, it fails to make one subtle but important distinction regarding the dependency on the i-th element of the associated threads array. If we want to be more accurate while describing this predicate's dependency, we need to acknowledge that the predicate itself does not actually need or depend on the i-th associated thread of the process. Indeed, it does not read or use it per se; it merely retrieves it. Thus, the dependency on the input process' i-th associated thread is relative to the extent to which the callers of the thread predicate will use the output element in which it is retrieved. It is important to distinguish between these two rather subtle nuances. Failing to do so can shadow information that is computed while analysing callers of the thread predicate. This was exactly what happened for our second example predicate, start_address. The predicate start_address receives a process p and an index j as inputs. It makes a call to the predicate thread, thus reading the j-th associated element of the process p. If this is an active element, it further accesses the field stack, from which it only reads the start address, start. The obtained dependency result

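\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright j : [\mathit{Some} \mapsto \top;\ \mathit{None} \mapsto \bot] \,\rangle \\
j \mapsto \top
\end{array}
\]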

was an over-approximation of the desired dependency result:

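\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright j : [\mathit{Some} \mapsto t \mapsto \mathit{stack} \mapsto \mathit{start} \mapsto \top;\ \mathit{None} \mapsto \bot] \,\rangle \\
j \mapsto \top
\end{array}
\]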

Intraprocedurally, the dependency analysis was correctly detecting that only the field stack of the thread was needed, for which only the start field was read. However, when joining the dependency information computed locally for start_address with the one given by the dependency summary of the predicate thread, we obtain less precise dependency results. This scenario is not a corner case; it would typically be exhibited in the case of accessor predicates and their callers.

In order to address this source of precision loss, we can introduce symbolic or lazy components in our abstract dependency domain. As a first attempt and approximation, we could consider the set of output variables of a predicate as the lazy components. These can be seen as the points at which a caller predicate may insert its intraprocedural information in the dependency summary computed for the callee predicate.

The dependency summary for a successful execution of the thread predicate, i.e. the true exit label, would therefore not map the i-th element of the threads array to everything, i.e. $\top$, the top element of our abstract dependency domain. Instead, this would be mapped to the symbolic set of output variables in which this input subelement is retrieved, i.e. the set containing the ti output variable. We denote this set by Deferred(ti), as it represents the set of points in which a caller predicate can inject its context. Establishing the dependency on the i-th associated thread of the input process p is thus deferred, or postponed, and left to the caller predicates: it is relative to their context and the extent to which they use the output ti.


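\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright i : [\mathit{Some} \mapsto t \mapsto \mathit{Deferred}(ti);\ \mathit{None} \mapsto \bot] \,\rangle \\
i \mapsto \top
\end{array}
\]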

Using this dependency summary when computing the information for the predicate start_address, we would obtain the targeted dependency result:

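\[
\begin{array}{l}
p \mapsto \mathit{threads} \mapsto \langle\, \square \triangleright j : [\mathit{Some} \mapsto t \mapsto \mathit{stack} \mapsto \mathit{start} \mapsto \mathit{Deferred}(adr);\ \mathit{None} \mapsto \bot] \,\rangle \\
j \mapsto \top
\end{array}
\]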

This dependency summary for start_address shows that the dependency on the j-th associated thread of the input process p depends on the extent to which the output adr, representing the start address of the thread's stack, is subsequently used. Indeed, start_address itself is an accessor predicate.
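The injection of a caller's context into a callee's summary can be sketched in Python. This is only an illustration under simplifying assumptions: the encoding of dependencies (strings for atomic values, dicts for structures, tagged tuples for variants, arrays and deferred occurrences) and the function name inject are hypothetical and are not the prototype's data types; the named argument t of Some is omitted, and the renaming of the callee's index i to the caller's index j is done by hand.

```python
# Hypothetical encoding of dependencies (not the prototype's actual types):
#   "top" / "nothing" / "impossible"   atomic dependencies
#   {"f": dep, ...}                    structure: one dependency per listed field
#   ("variant", {"C": dep, ...})       variant: one dependency per constructor
#   ("array", default, index, exc)     every cell `default`, except cell `index`
#   ("deferred", "x")                  deferred to the use of output variable x

def inject(dep, context):
    """Replace each deferred occurrence by the caller's dependency on that output."""
    if isinstance(dep, dict):
        return {name: inject(d, context) for name, d in dep.items()}
    if isinstance(dep, tuple) and dep[0] == "deferred":
        # Outputs the caller says nothing about stay deferred.
        return context.get(dep[1], dep)
    if isinstance(dep, tuple) and dep[0] == "variant":
        return ("variant", {c: inject(d, context) for c, d in dep[1].items()})
    if isinstance(dep, tuple) and dep[0] == "array":
        _, default, index, exc = dep
        return ("array", inject(default, context), index, inject(exc, context))
    return dep  # atomic dependency


# Summary of `thread` on the true label, with the index already renamed to j.
thread_summary = {
    "p": {"threads": ("array", "nothing", "j",
                      ("variant", {"Some": ("deferred", "ti"),
                                   "None": "impossible"}))},
    "j": "top",
}

# Inside start_address, the retrieved thread is only used through stack.start,
# whose value is itself handed back in the output adr.
caller_context = {"ti": {"stack": {"start": ("deferred", "adr")}}}

start_address_summary = inject(thread_summary, caller_context)
```

In this sketch, the Some case of the j-th cell ends up deferred to adr through stack and start, which mirrors the shape of the targeted result for start_address.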

This first approximation of lazy components as sets of output variables of a predicate is effective for accessor predicates. However, its limitations become visible when considering functional, non-destructive mutator predicates, for example. Such predicates receive a compound input, destructure it and construct a new output variable. This is created by modifying only one of the compound input's subelements and by copying all the rest without further changes. For example, the predicate set_thread shown below is the dual of our thread example predicate. It receives a process p, a thread ti and an index i as inputs and returns a new process r as an output, obtained by setting the i-th associated thread in the threads array to ti and by copying everything else from p.

    predicate set_thread (process p, int i, thread ti)
    -> [true process r]
      array<option<thread>> threads
      option<thread> tio

      r = p                             [true -> 1]
      threads = r.threads               [true -> 2]
      tio = Some(ti)                    [true -> 3]
      threads = [threads with i = tio]  [true -> 4, false -> 6]
      r = r with threads = threads      [true -> 5]
      [true]
      [error]
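For readers unfamiliar with αSmil, the behaviour of set_thread can be approximated in Python. This is a rough sketch, not a translation produced by any tool: the Process class, the use of None for the None constructor and of a plain string for a thread, the bounds check standing in for the failing array update, and the encoding of exit labels as strings are all assumptions made for illustration; only the field names come from the text.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Process:
    # Field names follow the text; their Python types are assumptions.
    threads: Tuple[Optional[str], ...]
    pid: int
    crt_thread: int
    adr_space: str


def set_thread(p: Process, i: int, ti: str):
    """Return ("true", r), where r is p with its i-th thread set to ti,
    or ("error", None) when i is outside the threads array."""
    if not 0 <= i < len(p.threads):
        return ("error", None)                     # the array update fails
    threads = p.threads[:i] + (ti,) + p.threads[i + 1:]
    return ("true", replace(p, threads=threads))   # p itself is left unchanged


p = Process(threads=(None, "t1"), pid=7, crt_thread=1, adr_space="as0")
label, r = set_thread(p, 0, "t0")
```

The update is non-destructive: r shares every field of p except the i-th cell of threads, which is the behaviour the dependency summary discussed next has to capture.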

The dependency summary computed for this predicate on the exit label true is shown below. It indicates that the given inputs, the index i and the thread ti used for updating the i-th associated thread of the output process r, are completely needed. For the input process p, the fields pid, crt_thread and adr_space are completely needed as well. They are copied without further changes to the output r. From the array of associated threads, all elements except the i-th are needed as well. The latter is completely irrelevant, since it is replaced in the output r by the given ti. The former are simply read and copied to r.


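\[
\begin{array}{l}
p \mapsto \left\{ \begin{array}{l}
\mathit{threads} \mapsto \langle\, \top \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \top \\
\mathit{crt\_thread} \mapsto \top \\
\mathit{adr\_space} \mapsto \top
\end{array} \right\} \\
i \mapsto \top \\
ti \mapsto \top
\end{array}
\]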

At first glance, this dependency summary seems to reflect rather accurately the predicate's inputs and input subelements on which the output process r depends. However, similarly to the accessor predicate thread, a further distinction is possible. The predicate set_thread does not itself depend on the input ti, nor on the fields of the process p. It does not use these for new computations; it simply copies them to the corresponding output subelements. Just as before, the extent to which the output's subelements are subsequently used characterizes more precisely the dependency on the inputs of set_thread. For instance, the dependency on p's current thread field should be the symbolic element corresponding to the crt_thread field of the output process. However, our first attempt at representing symbolic elements as sets of output variables seen as a whole does not allow us to convey such information. For expressing it, we first need to be able to refer to the substructure $r\,\mathit{crt\_thread}$ and use this as a lazy component in which callers may inject their own context. Similarly, for the threads array, we need to be able to refer to all other elements except the i-th one. Thus, at the symbolic dependencies level as well, we need the capability of distinguishing between the different subelements of the inputs. This would allow us to obtain the following dependency summary:

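\[
\begin{array}{l}
p \mapsto \left\{ \begin{array}{l}
\mathit{threads} \mapsto \langle\, \mathit{Deferred}(r\,\mathit{threads}\,\langle * \setminus i \rangle) \triangleright i : \square \,\rangle \\
\mathit{pid} \mapsto \mathit{Deferred}(r\,\mathit{pid}) \\
\mathit{crt\_thread} \mapsto \mathit{Deferred}(r\,\mathit{crt\_thread}) \\
\mathit{adr\_space} \mapsto \mathit{Deferred}(r\,\mathit{adr\_space})
\end{array} \right\} \\
i \mapsto \top \\
ti \mapsto \mathit{Deferred}(r\,\mathit{threads}\,\langle i \rangle\,\mathit{Some}\,t)
\end{array}
\]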

One way to capture the actual effect that is due to set_thread consists in replacing all deferred dependencies with $\square$, i.e. nothing, and simplifying the summary. The dependency summary thus obtained shows the dependency on set_thread's inputs in the extreme case of calling the predicate and throwing away its result. In this case, the summary for set_thread would show that the predicate only depends on the input i and on the length, or support, of the threads array, captured by $\langle \square \rangle$. On the contrary, by replacing the deferred dependencies with $\top$, i.e. everything, we obtain exactly the results computed by the context-insensitive dependency analysis presented in Chapter 5. The information thus obtained shows the dependency on set_thread's inputs when considering the other end of the spectrum, namely calling the predicate and using its result entirely.
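The two extreme instantiations can be illustrated with a small Python sketch. The encoding below (strings for atomic dependencies, dicts for structures, tagged tuples for arrays and deferred occurrences, and deferred paths kept as opaque strings) and the name instantiate are assumptions made for this example; the simplification step mentioned in the text is not implemented.

```python
def instantiate(dep, fill):
    """Replace every deferred occurrence in `dep` by the atomic dependency `fill`."""
    if isinstance(dep, dict):                      # structure or variable map
        return {name: instantiate(d, fill) for name, d in dep.items()}
    if isinstance(dep, tuple) and dep[0] == "deferred":
        return fill
    if isinstance(dep, tuple) and dep[0] == "array":
        # ("array", default, index, exc): all cells `default`, except cell `index`
        _, default, index, exc = dep
        return ("array", instantiate(default, fill), index, instantiate(exc, fill))
    return dep                                     # "top" or "nothing"


# Summary of set_thread on the true label; deferred paths are opaque strings here.
set_thread_summary = {
    "p": {
        "threads": ("array", ("deferred", "r threads <* \\ i>"), "i", "nothing"),
        "pid": ("deferred", "r pid"),
        "crt_thread": ("deferred", "r crt_thread"),
        "adr_space": ("deferred", "r adr_space"),
    },
    "i": "top",
    "ti": ("deferred", "r threads <i> Some t"),
}

result_discarded = instantiate(set_thread_summary, "nothing")
result_fully_used = instantiate(set_thread_summary, "top")
```

With "top" as the filler, the threads entry becomes an array dependency in which every cell but the i-th is entirely needed, as in the context-insensitive summary. With "nothing", only i keeps a non-trivial dependency; a simplification step would still be needed to collapse the array entry into a single all-cells dependency.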


The dependency summary with deferred occurrences is indeed precise. Not only does it create a dependency template in which callers can inject their own context, but it also distills the specification of the predicate set_thread. A quick glance at it indicates that it is indeed a non-destructive mutator, updating the i-th associated thread of a process to ti and preserving everything else.

In order to obtain such dependency summaries, we need to refine our first approximation of symbolic elements as sets of a predicate's output variables. Just as needed in our initial abstract dependency domain, we must reflect the layered structure of algebraic data types and arrays at the level of symbolic dependencies as well. To this end, we need to consider not only sets of output variables, but also symbolic paths to substructures within them.

6.3 Symbolic Paths

6.3.1 Symbolic Path Type

In order to extend our abstract dependency domain with symbolic dependencies and to obtain expressive dependency summaries such as the ones discussed in the previous section, we begin by introducing symbolic paths. These are meant to mirror the layered structure of algebraic data types and arrays at the level of symbolic dependencies.

Each deferred occurrence in a dependency summary is identified by symbolic paths. Symbolic paths are rooted at one of the program's variables and represent sequences of symbolic internal accesses inside some value's structure, i.e. they are symbolic traversals from one value to some of its subparts. Paths are chains of symbolic accesses leading to nested elements in which different calling contexts can be subsequently injected. We define a recursive type π of symbolic paths encompassing this.

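Definition 6.3.1. Symbolic path type $\pi \in \Pi$

\[
\begin{array}{rcll}
\pi \in \Pi \qquad \pi & ::= & \varepsilon & \text{endpoint (root)} \\
 & | & f\,\pi & f \in \mathcal{F} \\
 & | & C\,\pi & C \in \mathcal{C} \\
 & | & \langle i \rangle\,\pi & i \text{ index} \\
 & | & \langle * \setminus i \rangle\,\pi & i \text{ index} \\
 & | & \langle * \rangle\,\pi &
\end{array}
\]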

An endpoint, denoted by $\varepsilon$, is the special path denoting an entire element. For structures, we denote the symbolic path to some field $f$ by $f\,\pi$. Similarly, for variants, we denote the path to some chosen constructor $C$ by $C\,\pi$. For arrays, we distinguish between three cases:

• symbolic paths referring to a specific array cell, identified by the cell's index $i$ and denoted by $\langle i \rangle\,\pi$;

• symbolic paths referring to all but one specific array cell, identified by its index $i$ and denoted by $\langle * \setminus i \rangle\,\pi$;

• symbolic paths referring to all the cells of an array, denoted by $\langle * \rangle\,\pi$.

With one exception, these symbolic paths directly reflect the cases of our abstract dependency domain. For instance, the correspondence between symbolic paths for structures or variants is immediately apparent. In contrast, for arrays, the abstract dependency domain included two cases, namely $\langle \delta \rangle$, corresponding to a dependency applying to all of the cells, and $\langle \delta_{def} \triangleright i : \delta_{exc} \rangle$, corresponding to arrays with a general dependency applying to all but one exceptional cell, for which a specific dependency is known. In order to reflect the second case in the deferred occurrences, we need to be able to refer to the exceptional cell on the one hand, and to all other cells of the array on the other hand. To this end, we need to introduce two symbolic path types: the symbolic $\langle i \rangle\,\pi$ path for expressing deferred occurrences of exceptional cells, and the $\langle * \setminus i \rangle\,\pi$ symbolic path for expressing deferred occurrences of all the other array cells except the one identified by $i$.

The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi \cdot \pi'$. We call $\cdot$ the extension operator and, when applying it, we say that we extend $\pi$ with $\pi'$.

We further consider sets $P \subset \Pi$ of symbolic paths $\pi$ and define the partial order $\sqsubseteq$ between them.

Definition 6.3.2. Partial Order $\sqsubseteq$ for Path Sets

\[
\forall P \subset \Pi,\ P' \subset \Pi, \quad P \sqsubseteq P' \iff P \subseteq P'
\]

They establish a semi-lattice based on the subset order. The bottom element of this semi-lattice is $\emptyset$, the empty set of paths:

\[
\forall P \subset \Pi, \quad \emptyset \sqsubseteq P
\]

There is no top element. Theoretically, this would correspond to the set representing all possible paths. In practice, this cannot be constructed, and we chose not to add a special case for it to our symbolic path type π.

The join operation of deferred path sets is based on set union and is denoted by $\vee$.

Definition 6.3.3. Join Operation $\vee$ for Path Sets

\[
\forall P \subset \Pi,\ P' \subset \Pi, \quad P \vee P' = P \cup P'
\]

It is symmetric, and the value obtained by joining two path sets is the least upper bound.

Applying the extension operator $\cdot$ on a set of symbolic paths $P$ amounts to a pointwise extension of each member of the path set.

Definition 6.3.4. Extension Operator $\cdot$ for Path Sets

\[
\forall P \subset \Pi, \quad P \cdot \pi' = \{ \pi \cdot \pi' \mid \pi \in P \}
\]
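The symbolic path type and the three operations on path sets can be sketched in Python as follows. The class and function names (Field, Ctor, Cell, AllBut, AllCells, leq, join, extend) are hypothetical and chosen for this example only; paths are modelled as tuples of steps, with the empty tuple playing the role of the endpoint.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Field:       # access to a structure field f
    name: str


@dataclass(frozen=True)
class Ctor:        # access under a variant constructor C
    name: str


@dataclass(frozen=True)
class Cell:        # the cell at index variable i
    index: str


@dataclass(frozen=True)
class AllBut:      # every cell except the one at index variable i
    index: str


@dataclass(frozen=True)
class AllCells:    # every cell of the array
    pass


Step = Union[Field, Ctor, Cell, AllBut, AllCells]
Path = Tuple[Step, ...]          # () is the endpoint
PathSet = FrozenSet[Path]


def leq(p: PathSet, q: PathSet) -> bool:
    """Partial order on path sets: plain subset inclusion."""
    return p <= q


def join(p: PathSet, q: PathSet) -> PathSet:
    """Join of path sets: set union, hence the least upper bound for leq."""
    return p | q


def extend(p: PathSet, suffix: Path) -> PathSet:
    """Pointwise extension: append `suffix` to every path of the set."""
    return frozenset(path + suffix for path in p)


threads_rest = frozenset({(Field("threads"), AllBut("i"))})
pid = frozenset({(Field("pid"),)})
both = join(threads_rest, pid)
```

Using frozen dataclasses and frozensets keeps paths hashable, so that the subset order and the union-based join come directly from Python's set operations. There is deliberately no value standing for the set of all paths, in line with the absence of a top element.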


6.3.2 Semantics of Symbolic Paths

Semantically, paths of type π, defined previously, are a symbolic representation of several actual paths. In the following, we make this notion explicit, and we begin by defining simple actual paths in a value of the universe D (Definition 4.4.1).

Actual paths represent a unique sequence of internal accesses inside some value's structure, leading to a single nested element. Unlike symbolic paths, which can, for instance, cover multiple elements of an array, an actual path designates a single subvalue of a structure, variant or array. The recursive actual path type $\hat{\pi} \in \hat{\Pi}$ is defined below.

Definition 6.3.5. Actual Path Type $\hat{\pi} \in \hat{\Pi}$

\[
\begin{array}{rcll}
\hat{\pi} & ::= & \varepsilon & \text{empty} \\
 & | & f\,\hat{\pi} & f \in \mathcal{F} \\
 & | & C\,\hat{\pi} & C \in \mathcal{C} \\
 & | & \langle i \rangle\,\hat{\pi} & i \text{ index}
\end{array}
\]

A symbolic path $\pi$ covers an actual path $\hat{\pi}$ if, when given a valuation $E$ (Definition 4.4.2) of the index variables for arrays, it matches $\hat{\pi}$. A set of symbolic paths covers an actual path $\hat{\pi}$ if at least one of the symbolic paths matches $\hat{\pi}$. We denote this by the $\models_E$ relation, which is parameterized by a valuation $E$. The definition of $\models_E$ is given in Table 6.1.

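Table 6.1: $\models_E$, Path Semantics

\[
\frac{}{\varepsilon \models_E \varepsilon}\ (\text{E}\varepsilon)
\qquad
\frac{\pi \models_E \hat{\pi}}{f\,\pi \models_E f\,\hat{\pi}}\ (\text{EStruct})
\qquad
\frac{\pi \models_E \hat{\pi}}{C\,\pi \models_E C\,\hat{\pi}}\ (\text{EVar})
\]

\[
\frac{\pi \models_E \hat{\pi} \quad E(i) = j}{\langle i \rangle\,\pi \models_E \langle j \rangle\,\hat{\pi}}\ (\text{ECell})
\qquad
\frac{\pi \models_E \hat{\pi}}{\langle * \rangle\,\pi \models_E \langle j \rangle\,\hat{\pi}}\ (\text{EAnyCell})
\qquad
\frac{\pi \models_E \hat{\pi} \quad E(i) \neq j}{\langle * \setminus i \rangle\,\pi \models_E \langle j \rangle\,\hat{\pi}}\ (\text{EOutCell})
\]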

Given a valuation $E$, a set $P$ of symbolic paths covers an actual path $\hat{\pi}$ if at least one of the symbolic paths in the set covers, or matches, $\hat{\pi}$:

\[
\forall P \subset \Pi, \quad P \models_E \hat{\pi} \iff \exists \pi \in P,\ \pi \models_E \hat{\pi}
\]


The interpretation $[\![P]\!]_E$ of a set of paths $P$ is then the set of single actual paths that are covered, given a valuation $E$.

Definition 6.3.6. Interpretation $[\![P]\!]_E$ of a set of paths $P$

\[
\forall P \subseteq \Pi, \quad [\![P]\!]_E = \{ \hat{\pi} \mid P \models_E \hat{\pi} \}
\]
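The covering relation and the interpretation of a path set can be sketched in Python as follows. The tuple encoding of steps and the function names covers, covers_set and interpretation are assumptions made for this illustration. Since the set of actual paths is unbounded in general, the interpretation is only computed relative to a finite collection of candidate actual paths.

```python
def covers(sym, act, valuation):
    """Does the symbolic path `sym` cover the actual path `act` under `valuation`?

    Paths are tuples of steps. Symbolic steps: ("field", f), ("ctor", C),
    ("cell", i), ("all_but", i), ("all",), with i an index variable name.
    Actual steps: ("field", f), ("ctor", C), ("cell", j), with j an integer.
    """
    if not sym or not act:
        return not sym and not act            # the endpoint only covers the endpoint
    s, a = sym[0], act[0]
    if s[0] in ("field", "ctor"):
        head_ok = s == a                      # same field or same constructor
    elif a[0] != "cell":
        head_ok = False                       # array step against a non-array step
    elif s[0] == "cell":
        head_ok = valuation[s[1]] == a[1]     # E(i) = j
    elif s[0] == "all_but":
        head_ok = valuation[s[1]] != a[1]     # E(i) != j
    else:
        head_ok = True                        # "all" matches any cell
    return head_ok and covers(sym[1:], act[1:], valuation)


def covers_set(paths, act, valuation):
    """A set covers an actual path if at least one of its members does."""
    return any(covers(p, act, valuation) for p in paths)


def interpretation(paths, valuation, candidates):
    """Actual paths among `candidates` that are covered by the path set."""
    return {act for act in candidates if covers_set(paths, act, valuation)}


E = {"i": 1}
cells = [(("field", "threads"), ("cell", j)) for j in range(3)]
rest = {(("field", "threads"), ("all_but", "i"))}
one = {(("field", "threads"), ("cell", "i"))}
```

With the valuation mapping i to 1, the set rest covers cells 0 and 2, the set one covers cell 1 only, and their union covers all three candidate cells.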

The partial order $\sqsubseteq$ (Definition 6.3.2) on sets of paths is compatible with the interpretation $[\![P]\!]_E$, in the sense that when $P \sqsubseteq Q$ holds, the interpretation $[\![P]\!]_E$ of $P$ is included in $[\![Q]\!]_E$ for every valuation:

\[
\forall P, Q \subseteq \Pi,\ \forall E, \quad P \sqsubseteq Q \Rightarrow [\![P]\!]_E \subseteq [\![Q]\!]_E
\]

Each single path can be interpreted as a way to find a subpart of a value, which we make explicit by the following function at. It is not defined for all cases, since not all paths can be applied to all values.

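Definition 6.3.7. Function at

\[
\mathit{at} : \hat{\Pi} \times D \to D
\]

\[
\mathit{at}(\hat{\pi}, v) =
\begin{cases}
v & \text{when } \hat{\pi} = \varepsilon \\
\mathit{at}(\hat{\pi}', v_i) & \text{when } \hat{\pi} = f_i\,\hat{\pi}' \text{ and } v = \{ f_1 = v_1; \ldots; f_i = v_i; \ldots; f_n = v_n \} \\
\mathit{at}(\hat{\pi}', v_C) & \text{when } \hat{\pi} = C_i\,\hat{\pi}' \text{ and } v = C_i[v_C] \\
\mathit{at}(\hat{\pi}', v_i) & \text{when } \hat{\pi} = \langle i \rangle\,\hat{\pi}' \text{ and } v = (P, (v_k)_{k \in P}),\ i \in P
\end{cases}
\]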
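The partial function at can be sketched in Python as follows. The value encoding (tagged tuples for structures, variants and arrays, with arrays represented as dicts from indices to cells) is an assumption made for this example, as is the convention of returning None where at is undefined; base values are therefore assumed never to be None.

```python
def at(path, value):
    """Follow an actual path inside a value; return None where `at` is undefined.

    Assumed encoding of values:
      ("struct", {"f": v, ...})   structure
      ("variant", "C", v)         constructor C applied to v
      ("array", {k: v, ...})      array whose support is the set of keys
      anything else               base value
    Actual path steps: ("field", f), ("ctor", C), ("cell", j).
    """
    if not path:
        return value                                   # endpoint: the value itself
    (kind, key), rest = path[0], path[1:]
    tag = value[0] if isinstance(value, tuple) and value else None
    if kind == "field" and tag == "struct" and key in value[1]:
        return at(rest, value[1][key])
    if kind == "ctor" and tag == "variant" and value[1] == key:
        return at(rest, value[2])
    if kind == "cell" and tag == "array" and key in value[1]:
        return at(rest, value[1][key])
    return None                                        # path does not apply to value


process = ("struct", {
    "pid": 7,
    "threads": ("array", {
        0: ("variant", "None", "unit"),
        1: ("variant", "Some", ("struct", {"stack": 4096})),
    }),
})

stack_of_thread_1 = at(
    (("field", "threads"), ("cell", 1), ("ctor", "Some"), ("field", "stack")),
    process,
)
```

The undefined cases of the sketch correspond to paths that cannot be applied to the given value, for instance a Some step on a cell that holds None, or a cell index outside the array's support.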

6.3.3 Well-Typed Paths and Path Sets

Symbolic paths cannot be used in every context: their interpretation must be made in the context of a type $\tau$. An endpoint, i.e. the $\varepsilon$ symbolic path, can apply to any type. In contrast, other symbolic paths that exhibit specific data features can only apply to the corresponding types. For instance, a path such as $f\,\pi$ is meaningless on values which are not records, or on record values that do not exhibit a field $f$, the field specified in the symbolic path.

A path set can be seen as a set of sequences of internal accesses inside some value's structure. In that sense, it is a set of possible traversals from one value to some of its subparts. To characterize the contexts in which a path set is well-typed, we need to consider the types of values to which it can be applied and the types of values to which it can lead. Therefore, in the following, we begin by defining a typing judgement for symbolic paths as a three-place relation $\pi : \tau \to \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and, in that case, it will always describe subvalues of type $\tau'$. Additionally, the typing judgement is also parameterized by a set of input variables $I$, which are the variables having the right to appear as identifiers for array accesses. This is detailed in Table 6.2.

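\[
\frac{}{I \vdash \varepsilon : \tau \to \tau}\ (\text{WT}\varepsilon)
\]

\[
\frac{\tau = \mathit{struct}\{ f_1 : \tau_1; \ldots; f_i : \tau_i; \ldots; f_n : \tau_n \} \quad I \vdash \pi_i : \tau_i \to \tau'}{I \vdash f_i\,\pi_i : \tau \to \tau'}\ (\text{WTStructPath})
\]

\[
\frac{\tau = \mathit{variant}[ C_1 : \tau_1 \mid \ldots \mid C_i : \tau_i \mid \ldots \mid C_n : \tau_n ] \quad I \vdash \pi_C : \tau_i \to \tau'}{I \vdash C_i\,\pi_C : \tau \to \tau'}\ (\text{WTVarPath})
\]

\[
\frac{I \vdash \pi : \tau \to \tau'}{I \vdash \langle * \rangle\,\pi : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ (\text{WTArrayPath})
\]

\[
\frac{I \vdash \pi : \tau \to \tau' \quad I(i) = \tau_i}{I \vdash \langle i \rangle\,\pi : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ (\text{WTCellPath})
\]

\[
\frac{I \vdash \pi : \tau \to \tau' \quad I(i) = \tau_i}{I \vdash \langle * \setminus i \rangle\,\pi : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ (\text{WTOutPath})
\]

Table 6.2: Well-Typed Dependency Paths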

A set P of symbolic paths is well-typed if every path it contains is well-typed for the same types:

$$
\forall P \subset \Pi,\quad I \vdash P : \tau \to \tau' \iff \forall \pi \in P,\ I \vdash \pi : \tau \to \tau'
$$

The well-typedness property of sets of symbolic paths is preserved by the join operation ∨ (Definition 6.3.3):

$$
\forall P', P'' \in \Pi,\ \forall \tau', \tau'' \in T,\quad I \vdash P' : \tau' \to \tau'' \implies I \vdash P'' : \tau' \to \tau'' \implies I \vdash P' \vee P'' : \tau' \to \tau''
$$

When extending a well-typed set of symbolic paths with a well-typed path using the extension operator (Definition 6.3.4), the resulting set of symbolic paths is well-typed as well:

$$
\forall P' \in \Pi,\ \forall \tau, \tau', \tau'' \in T,\quad I \vdash P' : \tau' \to \tau'',\ \ I \vdash \pi' : \tau'' \to \tau \implies I \vdash P'\ \pi' : \tau' \to \tau
$$
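The typing rules of Table 6.2 can be read as a recursive check on the structure of a path. The following Python sketch illustrates this reading. The tuple encodings of types and paths, the function names and the example types are assumptions made for illustration only; they are not taken from the implementation described in this thesis.

```python
# Assumed encodings (illustrative only):
#   types: a base type is a string; ("struct", {field: type});
#          ("variant", {constructor: type}); ("arr", index_type, element_type)
#   paths: tuples of steps, with () standing for the endpoint ε;
#          steps are ("field", f), ("ctor", C), ("all",), ("cell", i), ("out", i)

def path_type(inputs, path, tau):
    """Return tau' such that I |- path : tau -> tau', or None if ill-typed."""
    if not path:                                   # WTε: ε applies to any type
        return tau
    step, rest = path[0], path[1:]
    composite = isinstance(tau, tuple)
    if step[0] == "field":                         # WTStructPath
        if composite and tau[0] == "struct" and step[1] in tau[1]:
            return path_type(inputs, rest, tau[1][step[1]])
    elif step[0] == "ctor":                        # WTVarPath
        if composite and tau[0] == "variant" and step[1] in tau[1]:
            return path_type(inputs, rest, tau[1][step[1]])
    elif composite and tau[0] == "arr":
        if step[0] == "all":                       # WTArrayPath
            return path_type(inputs, rest, tau[2])
        # WTCellPath / WTOutPath: the index must be an input variable
        # whose type is the index type of the array, i.e. I(i) = tau_i
        if step[0] in ("cell", "out") and inputs.get(step[1]) == tau[1]:
            return path_type(inputs, rest, tau[2])
    return None

def well_typed_set(inputs, paths, tau, tau_prime):
    """A path set is well-typed if every path has the typing tau -> tau_prime."""
    return all(path_type(inputs, p, tau) == tau_prime for p in paths)

# Hypothetical types, loosely modelled on the thread example
region = ("struct", {"start": "int"})
thread = ("struct", {"stack": region})
option_thread = ("variant", {"Some": thread})
process = ("struct", {"threads": ("arr", "int", option_thread)})

to_start = (("field", "threads"), ("cell", "i"), ("ctor", "Some"),
            ("field", "stack"), ("field", "start"))
print(path_type({"i": "int"}, to_start, process))   # int
print(path_type({}, to_start, process))             # None: i is not an input
```

The second call fails because the index i does not belong to the set of input variables, which is the side condition I(i) = τi of the cell rule.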

6.4 Abstract Dependency Domain with Deferred Accesses

Frequently, as explained in Section 6.2, the dependency on a predicate's input variable is relative to the extent to which some of the predicate's outputs are subsequently needed. More precisely, these outputs are those into which the input variable is copied and from which it is later retrieved. We strive to avoid over-approximations in such cases and to create degrees of freedom for the callers by treating such output variables as points at which callers can inject their own context externally. In other words, we want to defer the computation of the dependency on certain input variables of a predicate to the predicate's callers, since they have additional information about the actual use of the predicate's outputs.

In the previous section (Section 6.3) we introduced and defined an intermediate level consisting of symbolic paths and path sets. These reflect the layered structure of algebraic data types and arrays, and allow us to consider not only output variables as a whole, but also symbolic paths within them. Thus, we can compute more flexible and expressive dependency summaries with finer-grained elements. We can finally link these two ideas and extend our abstract dependency domain with deferred dependencies by including an additional dependency case in our domain δ ∈ D, initially defined in Section 5.2 (Definition 5.2.1).

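Definition 6.4.1. Extended Abstract Dependency Domain δ ∈ D

$$
\begin{array}{rlll}
\delta ::= & \top & \text{Everything – atomic case} & \text{(i)} \\
\mid & \square & \text{Nothing – atomic case} & \text{(ii)} \\
\mid & \bot & \text{Impossible – atomic case} & \text{(iii)} \\
\mid & \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} & f_1, \ldots, f_n \text{ fields} & \text{(iv)} \\
\mid & [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] & C_1, \ldots, C_m \text{ constructors} & \text{(v)} \\
\mid & \langle \delta \rangle & & \text{(vi)} \\
\mid & \langle \delta_{def}\ i\ \delta_{exc} \rangle & i \text{ array index} & \text{(vii)} \\
\mid & \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k) & \text{deferred accesses} & \text{(viii)}
\end{array}
$$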

A deferred dependency, shown in (viii), consists of a mapping which binds output variables, which we also call root variables in this case, to sets of symbolic paths.

Definition 6.4.2. Access Map

$$
A : V \nrightarrow \Pi
$$

Only output variables can be treated as lazy dependency components. The sets of symbolic paths mapped to them allow us to distinguish between their subelements. In the following discussion, we will denote an access map o1 ↦ P1; …; ok ↦ Pk by a.


For the partial order ⊑ (Definition 5.2.2), defined in Chapter 5 and detailed in Table 5.1, an additional rule (Def) for comparing instances of deferred dependencies is added. This is shown in Table 6.3. The top and bottom elements of our dependency domain are, as before, ⊤ and ⊥ respectively. Thus, any instance of a deferred dependency is more precise than ⊤ and less precise than ⊥. Just like ⊤, ⊥ and the special dependency case □ (Nothing), a deferred dependency can be used in association with any type, albeit with some constraints on its elements.

$$
\frac{\forall\, o \mapsto P \in a,\quad a(o) \sqsubseteq a'(o)}{\mathit{Deferred}(a) \sqsubseteq \mathit{Deferred}(a')}\ (\text{Def})
$$

Table 6.3 – Extended Leq – Comparison of Two Domains

However, unlike the atomic cases ⊤, ⊥ and □, deferred dependencies are not related to □ or to dependencies corresponding to structures, variants or arrays. Since they act as placeholders for dependencies that are effectively computed subsequently, instances of deferred dependencies can be compared only to ⊤ and ⊥, or to other instances of deferred dependencies. For instance, comparing a deferred dependency to □ would yield:

$$
\mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k) \not\sqsubseteq \square
\quad\text{and}\quad
\square \not\sqsubseteq \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k)
$$

The extended join operation ∨ (Definition 5.2.3), initially defined in Section 5.2.1 and detailed in Table 5.2, is shown in Table 6.4. It still has ⊥ as its identity element and ⊤ as its absorbing element. Joining two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The join between an instance of a deferred dependency and a dependency corresponding to a structure, a variant, an array, or to the special case □, amounts to ⊤, the top element of our domain. Since we cannot make any supposition regarding deferred dependencies, we are forced to make a pessimistic assumption and to approximate to the least precise value. Join is a commutative operation, and the cases not displayed in Table 6.4 are defined by symmetry with the displayed ones.

$$
\begin{array}{l}
\mathit{Deferred}(a) \vee \mathit{Deferred}(a') = \mathit{Deferred}(a'') \quad \text{where} \\[4pt]
\quad a''(o) = \begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases} \\[4pt]
\mathit{Deferred}(a) \vee \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} = \top \\
\mathit{Deferred}(a) \vee [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] = \top \\
\mathit{Deferred}(a) \vee \langle \delta \rangle = \top \\
\mathit{Deferred}(a) \vee \langle \delta_{def}\ i\ \delta_{exc} \rangle = \top \\
\mathit{Deferred}(a) \vee \square = \top
\end{array}
$$

Table 6.4 – ∨ – Extended Join

Similarly to join, the reduction operation ⊕ (Definition 5.2.4) was initially defined in Section 5.2.1 and detailed in Table 5.3. The extended form is shown in Table 6.5. It still has □ as an identity element and ⊥ as an absorbing element. When applying the reduction operation between a deferred dependency and a dependency δ′ corresponding to a structure, a variant or an array, we over-approximate the deferred dependency to ⊤ and apply the reduction operation between δ′ and ⊤. Applying the reduction operation between a deferred dependency and ⊤ behaves similarly: the outcome in this case is straightforward and amounts to ⊤. As was the case for join, applying the reduction operation between two instances of deferred dependencies amounts to a pointwise join of the path sets mapped to each output variable in the access maps. The reduction operation is commutative, and the cases not displayed in Table 6.5 are defined by symmetry with the displayed ones.

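$$
\begin{array}{l}
\mathit{Deferred}(a) \oplus \mathit{Deferred}(a') = \mathit{Deferred}(a'') \quad \text{where} \\[4pt]
\quad a''(o) = \begin{cases}
a(o) \vee a'(o) & \text{when } o \mapsto P_o \in a,\ o \mapsto P'_o \in a' \\
P_o & \text{when } o \mapsto P_o \in a \\
P'_o & \text{when } o \mapsto P'_o \in a'
\end{cases} \\[4pt]
\mathit{Deferred}(a) \oplus \top = \top \\
\mathit{Deferred}(a) \oplus \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} = \top \oplus \{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \\
\mathit{Deferred}(a) \oplus [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] = \top \oplus [C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m] \\
\mathit{Deferred}(a) \oplus \langle \delta \rangle = \top \oplus \langle \delta \rangle \\
\mathit{Deferred}(a) \oplus \langle \delta_{def}\ i\ \delta_{exc} \rangle = \top \oplus \langle \delta_{def}\ i\ \delta_{exc} \rangle
\end{array}
$$

Table 6.5 – ⊕ – Extended Reduction Operator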
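The deferred-specific rows of the extended join and reduction tables share the same pointwise treatment of access maps. The sketch below illustrates the join rows only. It assumes that the join of two path sets is plain set union (the actual operation is the one of Definition 6.3.3), and it relies on hypothetical encodings: strings for the atomic cases and a small `Deferred` class wrapping a dictionary from root variables to sets of paths.

```python
from dataclasses import dataclass

TOP, NOTHING, BOT = "top", "nothing", "bot"   # atomic cases (assumed encoding)

@dataclass(frozen=True)
class Deferred:
    access: dict   # access map: root variable -> frozenset of paths

def join_access_maps(a, b):
    """Pointwise join: union on shared roots, plain copy on the others."""
    return {o: a.get(o, frozenset()) | b.get(o, frozenset())
            for o in a.keys() | b.keys()}

def join(d1, d2):
    """Join restricted to the cases involving atoms and deferred dependencies."""
    if d1 == BOT:                      # bottom is the identity element
        return d2
    if d2 == BOT:
        return d1
    if d1 == TOP or d2 == TOP:         # top is the absorbing element
        return TOP
    if isinstance(d1, Deferred) and isinstance(d2, Deferred):
        return Deferred(join_access_maps(d1.access, d2.access))
    if isinstance(d1, Deferred) or isinstance(d2, Deferred):
        return TOP                     # deferred vs. structure, variant, array or Nothing
    raise NotImplementedError("remaining cases follow the join of Chapter 5")

eps = ()                               # the endpoint path, encoded as an empty tuple
d_a = Deferred({"o1": frozenset({eps})})
d_b = Deferred({"o1": frozenset({(("field", "f"),)}), "o2": frozenset({eps})})

print(join(d_a, d_b))          # pointwise join of the two access maps
print(join(d_a, NOTHING))      # top
print(join(BOT, d_a) == d_a)   # True
```

The reduction of two deferred dependencies would reuse `join_access_maps` unchanged, since both operations are described as a pointwise join of the path sets.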

Finally, the extractions previously defined for dependencies δ (Definitions 5.2.5, 5.2.6, 5.2.7, 5.2.8 and 5.2.9) have been extended in order to handle deferred dependencies as well. Their treatment is summarized in Table 6.6. Making array-specific extractions, as well as extracting field and constructor dependencies, on a deferred dependency amounts to a pointwise extension of every path set in the access map with the corresponding symbolic path.

We also add the rule shown in Table 6.7 to the well-typed dependency rules given in Chapter 5 (Table 5.5).


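$$
\begin{array}{lll}
\textbf{Extraction} & \delta & \textbf{Result} \\
\text{Field} & \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k)\,f & \mathit{Deferred}(o_1 \mapsto P_1\ f\varepsilon; \ldots; o_k \mapsto P_k\ f\varepsilon) \\
\text{Constructor} & \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k)\,C & \mathit{Deferred}(o_1 \mapsto P_1\ C\varepsilon; \ldots; o_k \mapsto P_k\ C\varepsilon) \\
\text{Cell} & \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k)\langle i \rangle & \mathit{Deferred}(o_1 \mapsto P_1\ \langle i \rangle\varepsilon; \ldots; o_k \mapsto P_k\ \langle i \rangle\varepsilon) \\
\text{Array General} & \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k)\langle * \rangle & \mathit{Deferred}(o_1 \mapsto P_1\ \langle * \rangle\varepsilon; \ldots; o_k \mapsto P_k\ \langle * \rangle\varepsilon) \\
\text{Outside Cell} & \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k)\langle *\ i \rangle & \mathit{Deferred}(o_1 \mapsto P_1\ \langle *\ i \rangle\varepsilon; \ldots; o_k \mapsto P_k\ \langle *\ i \rangle\varepsilon)
\end{array}
$$

Table 6.6 – Extended Extraction Operators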
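Each row of the extraction table extends every path set of the access map with a one-step path. A minimal sketch follows, assuming that extension appends the new step at the end of each path; this reading is suggested by the typing rule for the extension operator (P′ : τ′ → τ″ and π′ : τ″ → τ give τ′ → τ), but the encoding and the function names are hypothetical.

```python
def extend(paths, suffix):
    """Extend a path set: append `suffix` to every path in the set."""
    return frozenset(p + suffix for p in paths)

def extract(access, step):
    """Apply one extraction (field, constructor or array access) to an access map."""
    return {root: extend(paths, (step,)) for root, paths in access.items()}

access = {"o": frozenset({()})}               # Deferred(o -> {ε}), ε encoded as ()
after_f = extract(access, ("field", "f"))      # extraction of field f
after_fg = extract(after_f, ("field", "g"))    # followed by extraction of field g
print(after_fg)   # {'o': frozenset({(('field', 'f'), ('field', 'g'))})}
```

Successive extractions therefore accumulate into a single path that records the accesses in the order in which they were made.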

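$$
\frac{\begin{array}{c}
\Gamma(o_1) = \tau_1 \qquad \Gamma, I \vdash P_1 : \tau_1 \to \tau \\
\ldots \\
\Gamma(o_k) = \tau_k \qquad \Gamma, I \vdash P_k : \tau_k \to \tau \\
o_1 \in O \ \ldots\ o_k \in O
\end{array}}{\Gamma, I, O \vdash \mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k) : \tau}\ (\text{WTDeferred})
$$

Table 6.7 – Well-Typed Dependencies – Extended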

6.5 Deferred Dependencies at the Intraprocedural Level

6.5.1 Extended Intraprocedural Dependency Analysis

At the intraprocedural and interprocedural levels of our dependency analysis, the introduction of deferred dependencies has a minimal impact in terms of required changes.

Intraprocedurally, each predicate is analysed on every possible exit label. As explained in Section 5.3.2, our dependency analysis is a backward data-flow analysis. For each possible exit label of a predicate, the control flow graph is traversed backwards, starting from the exit node that corresponds to the analysed execution scenario. Dependency information is computed at every point of the control flow graph for each of the predicate's input, output and local variables, and this information is gradually refined until a fixed point is reached.

By traversing the control flow graph backwards, we take advantage of the information regarding the outputs that are associated to the analysed exit label, and we consider only the relevant ones, starting from the initialisation phase. As explained previously in Section 5.3.2, the intraprocedural domain for the currently analysed exit label is initialised with its associated output variables mapped to ⊤, the least precise element of our abstract dependency domain. This is a conservative over-approximation: it is considered that control on the outputs is lost and that these are entirely observed externally. As illustrated in Section 6.2, this over-approximation propagates along the control flow graph and, in certain cases, has a non-negligible impact on the precision of the computed dependency summaries.

We argued that, at the intraprocedural level of the analysis, a subtle but important distinction can be made regarding the dependency on certain inputs. This consists in distinguishing between the cases in which a predicate effectively uses an input subelement to compute an output subelement and those in which it simply forwards it to an output subelement. In the latter cases, the predicate does not use or need such an input subelement per se and, as a consequence, the dependency on it is relative to the extent to which the predicate's callers will subsequently use the output in which it is retrieved. At the intraprocedural level, in order to avoid the propagation of over-approximations, it is important to make this distinction early on, from the initialisation phase. Therefore, we introduce deferred dependencies at this level instead of mapping the output variables to ⊤, as was previously done.

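For a predicate p of the following form:

$$
p(e_1, \ldots, e_n)\ [\lambda_1 : o_{11}, \ldots, o_{1k_1} \mid \ldots \mid \lambda_i : o_{i1}, \ldots, o_{ik_i} \mid \ldots \mid \lambda_m : o_{m1}, \ldots, o_{mk_m}]
$$

analysed on the λi exit label, the intraprocedural dependency domain used for initialising the node corresponding to λi is the following:

$$
\{o_{i1} \mapsto \mathit{Deferred}(o_{i1} \mapsto \{\varepsilon\});\ \ldots;\ o_{ik_i} \mapsto \mathit{Deferred}(o_{ik_i} \mapsto \{\varepsilon\})\}
$$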

For each associated output oij, 1 ≤ j ≤ ki, of the analysed label λi, a set Poij of symbolic paths is constructed. Initially, this consists of a single element, namely the ε path. The deferred dependency associated to each output oij is an access map binding oij itself to its corresponding set of symbolic paths Poij. Since the symbolic paths ε refer to the output variables in their entirety, this is still a conservative approximation but, in contrast to our previous initialisation strategy, it acknowledges the fact that dependencies on the inputs might be relative to the extent to which the outputs are subsequently used. It allows injecting context-sensitive information later on.

This new initialisation strategy is enough to incorporate the expressive power of deferred dependencies at an intraprocedural level. Whereas before we were computing label-specific dependency summaries as input-output relations, the new strategy allows us to obtain label-specific dependency templates with lazy components that can be parameterized and varied according to a caller's own intraprocedural context. These can be seen as context-insensitive dependency summaries with context-sensitive leaves.
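The initialisation step described above is small enough to be sketched directly. In the following Python fragment, the tuple encoding of deferred dependencies and the function name are assumptions made for illustration; only the exit labels and outputs of the thread predicate come from the text.

```python
EPSILON = ()   # the endpoint path ε (assumed encoding: a path is a tuple of steps)

def initial_exit_domain(outputs):
    """Map each output of the analysed exit label to Deferred(o -> {ε})."""
    return {o: ("deferred", {o: frozenset({EPSILON})}) for o in outputs}

# Exit labels of the thread predicate and their associated outputs
thread_exits = {"true": ["ti"], "None": [], "oob": []}

for label, outputs in thread_exits.items():
    print(label, initial_exit_domain(outputs))
```

Under the previous strategy, the same function would simply have mapped every output to ⊤; the deferred form keeps the name of the output as a root so that a caller can later replace it.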

6.5.2 Intraprocedural Dependency Analysis Illustrated

In order to illustrate the use of deferred dependencies at an intraprocedural level, we revisit our thread example predicate, discussed in Section 5.3.3. As done previously, we consider the true execution scenario and apply our extended dependency analysis. We initialize the dependency corresponding to the true exit node by mapping the predicate's output ti to the deferred dependency mapping it to a set containing a single symbolic path, namely ε.


After the initialisation phase, the analysis continues as before, by traversing the control flow graph backwards and by applying at each step the corresponding data-flow equation. The deferred dependency is propagated upwards until the entry node is reached and analysed.

[Figure 6.1: control flow graph of thread, analysed for the true exit label, with nodes th = p.threads, tio = th[i] and switch(tio), and exit nodes true, None and oob. Dependencies computed along the graph, from the entry node to the true exit node:]

    p ↦ {threads ↦ ⟨□ i [Some ↦ {t ↦ Deferred(ti ↦ {ε})}; None ↦ ⊥]⟩}, i ↦ ⊤
    th ↦ ⟨□ i [Some ↦ {t ↦ Deferred(ti ↦ {ε})}; None ↦ ⊥]⟩, i ↦ ⊤
    tio ↦ [Some ↦ {t ↦ Deferred(ti ↦ {ε})}; None ↦ ⊥]
    ti ↦ Deferred(ti ↦ {ε})

Figure 6.1 – Analysing thread – Dependency Summary with Deferred Occurrences

The final dependency summary obtained for the true exit label of the predicate is:

$$
p \mapsto \{\mathit{threads} \mapsto \langle \square\ i\ [\mathit{Some} \mapsto \{t \mapsto \mathit{Deferred}(ti)\};\ \mathit{None} \mapsto \bot] \rangle\},\quad i \mapsto \top
$$

This is similar to the targeted dependency information for thread discussed in Section 6.2 and illustrated on page 117.

6.6 Deferred Dependencies at the Interprocedural Level

At the interprocedural level, the impact of introducing deferred dependencies is visible only at the level of the substitutions that have to be performed. Previously, the only required substitution consisted in replacing all occurrences of formal input parameters of a predicate with the corresponding effective input parameters. After having introduced deferred dependencies, further substitutions are needed. These can be easily illustrated by revisiting our start_address example predicate, discussed in Section 5.4.1. As done previously, we consider the true execution scenario and apply our extended dependency analysis.

We begin by initialising the output adr with a corresponding deferred dependency, as discussed in Section 6.5.1. The analysis traverses the control flow graph backwards and computes the dependency information at each node, until reaching the control flow graph's entry node, which corresponds to a call to the thread predicate. The intermediate dependency results are shown in Figure 6.2.

[Figure 6.2: control flow graph of start_address, analysed for the true exit label, with nodes thread(p, j)[true: tj | None | oob], sj = tj.stack and adr = sj.start, and exit nodes true and None. Intermediate dependencies, from the true exit node back to the entry node:]

    adr ↦ Deferred(adr ↦ {ε})
    sj ↦ {start ↦ Deferred(adr ↦ {ε})}
    tj ↦ {stack ↦ {start ↦ Deferred(adr ↦ {ε})}}

Figure 6.2 – G_start_address – Intermediate Dependency Results for start_address

We obtain the dependency summary for the true exit label of the called predicate thread. In order to be able to use it, we must first substitute the formal input parameters appearing in it, i.e. p and i, with the effective arguments of the call, i.e. p and j. Additionally, in deferred dependencies we also have to substitute the formal output parameters appearing as roots in the access maps, i.e. ti, with the corresponding effective output parameters. These substitutions are shown in Figure 6.3. Formal index variables appearing in dependencies corresponding to arrays have to be substituted with their effective counterparts as well. Similarly, any formal index variable appearing in symbolic paths that correspond to arrays must be substituted by the corresponding effective index variable.

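[Figure 6.3: in the dependency summary of thread, p ↦ {threads ↦ ⟨□ i [Some ↦ {t ↦ Deferred(ti)}; None ↦ ⊥]⟩}, i ↦ ⊤, the formal parameters p, i and ti are replaced by the effective parameters p, j and tj. The dependency computed in the caller is tj ↦ {stack ↦ {start ↦ Deferred(adr ↦ {ε})}}.]

Figure 6.3 – Substitution of Formal Parameters by Effective Parameters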

We can finally take advantage of the flexibility obtained using deferred dependencies by injecting the caller's intraprocedural dependency information into the deferred occurrences of the callee's dependency summary. This is another type of substitution, and consists in replacing deferred occurrences of formal output parameters of a predicate by the dependency information computed in the current context for the corresponding effective output parameters. For our start_address example, this is shown in Figure 6.4 and amounts to substituting the dependency computed for tj in the deferred occurrence of ti in the dependency summary of thread.

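After this substitution, we obtain the following dependency summary for the exit label true of the start_address predicate:

$$
p \mapsto \{\mathit{threads} \mapsto \langle \square\ j\ [\mathit{Some} \mapsto \{t \mapsto \{\mathit{stack} \mapsto \{\mathit{start} \mapsto \mathit{Deferred}(adr \mapsto \{\varepsilon\})\}\}\};\ \mathit{None} \mapsto \bot] \rangle\},\quad j \mapsto \top
$$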


[Figure 6.4: the deferred occurrence Deferred(tj) in p ↦ {threads ↦ ⟨□ j [Some ↦ {t ↦ Deferred(tj)}; None ↦ ⊥]⟩}, j ↦ ⊤ is replaced by the dependency computed for tj, namely {stack ↦ {start ↦ Deferred(adr ↦ {ε})}}.]

Figure 6.4 – Substituting Deferred Dependencies by Actual Dependencies

6.6.1 Applying Context-Sensitive Information by Substitution

As shown in our previous example, deferred dependencies associate sets of symbolic paths to certain root variables. We can substitute such deferred dependencies by actual dependencies computed in the current context, by applying the symbolic paths to the actual dependency that replaces them. We iterate through entire dependency summaries in order to substitute the nested deferred dependencies appearing at some leaves. This substitution can be seen as an application of contextual information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. It is denoted by a mapping σ, which associates dependencies to root variables appearing in deferred access maps.

Definition 6.6.1. Substitution σ

$$
\sigma : V \to D
$$

Simultaneously, while substituting root variables in deferred dependencies by their actual dependencies computed in the current intraprocedural context, we also substitute indices in information corresponding to arrays. These are substituted either by another array index, i.e. the one corresponding to an actual input parameter, or they are eliminated when corresponding to a local variable. Their elimination consists in approximating the dependencies so as to remove references to the array index. This substitution is denoted by φ, and it is a mapping from variables to the new variables that replace them.

Definition 6.6.2. Substitution φ

$$
\varphi : V \nrightarrow V
$$

The two substitutions can be done separately. However, for performance reasons, we chose to do them simultaneously. This is also what the actual implementation of the dependency analysis does. We denote the two simultaneous substitutions by ◀ (σ, φ) and detail them in Table 6.9. Performing the two operations simultaneously can be seen as a manner of reinterpreting a dependency computed in one context in another context.

For sets of symbolic paths (as defined in Section 6.3.1) in deferred dependencies, the operation P • (σ(o), φ) is the application of symbolic paths to the dependency of the root variable o computed in the current context. For a deferred access map, all dependencies obtained by applying the symbolic paths are joined. The application of a symbolic path π to a dependency δ is denoted by π (δ, φ) and is shown in Table 6.8. During the application, free variables appearing in symbolic paths associated to arrays are substituted by their corresponding index variables, as given by φ. If φ does not contain a mapping for a free variable, an approximation is made in order to remove it, and the dependency obtained by applying 〈∗〉 is returned.

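$$
\begin{array}{rcl}
\varepsilon\ (\delta, \varphi) & = & \delta \\
f\,\pi\ (\delta, \varphi) & = & \pi\ (\delta\,f, \varphi) \\
C\,\pi\ (\delta, \varphi) & = & \pi\ (\delta\,C, \varphi) \\
\langle * \rangle\,\pi\ (\delta, \varphi) & = & \pi\ (\delta\langle * \rangle, \varphi) \\
\langle i \rangle\,\pi\ (\delta, \varphi) & = & \begin{cases}
\pi\ (\delta\langle \varphi(i) \rangle, \varphi) & i \in \mathit{Dom}(\varphi) \\
\pi\ (\delta\langle * \rangle, \varphi) & \text{otherwise}
\end{cases} \\
\langle *\ i \rangle\,\pi\ (\delta, \varphi) & = & \begin{cases}
\pi\ (\delta\langle *\ \varphi(i) \rangle, \varphi) & i \in \mathit{Dom}(\varphi) \\
\pi\ (\delta\langle * \rangle, \varphi) & \text{otherwise}
\end{cases}
\end{array}
$$

Table 6.8 – Deferred Paths – Application and Substitutions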

Definition 6.6.3. Application of Symbolic Paths to a Dependency

$$
P \bullet (\delta, \varphi) = \bigvee_{\pi \in P} \pi\ (\delta, \varphi)
$$

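$$
\begin{array}{rcl}
\top \blacktriangleleft (\sigma, \varphi) & = & \top \\
\square \blacktriangleleft (\sigma, \varphi) & = & \square \\
\bot \blacktriangleleft (\sigma, \varphi) & = & \bot \\
\{f_1 \mapsto \delta_1; \ldots; f_n \mapsto \delta_n\} \blacktriangleleft (\sigma, \varphi) & = & \{f_1 \mapsto \delta_1 \blacktriangleleft (\sigma, \varphi); \ldots; f_n \mapsto \delta_n \blacktriangleleft (\sigma, \varphi)\} \\
{[C_1 \mapsto \delta_1; \ldots; C_m \mapsto \delta_m]} \blacktriangleleft (\sigma, \varphi) & = & [C_1 \mapsto \delta_1 \blacktriangleleft (\sigma, \varphi); \ldots; C_m \mapsto \delta_m \blacktriangleleft (\sigma, \varphi)] \\
\mathit{Deferred}(o_1 \mapsto P_1; \ldots; o_k \mapsto P_k) \blacktriangleleft (\sigma, \varphi) & = & \displaystyle\bigvee_{1 \le i \le k} P_i \bullet (\sigma(o_i), \varphi) \\
\langle \delta_{def} \rangle \blacktriangleleft (\sigma, \varphi) & = & \langle \delta_{def} \blacktriangleleft (\sigma, \varphi) \rangle \\
\langle \delta_{def}\ i\ \delta_{exc} \rangle \blacktriangleleft (\sigma, \varphi) & = & \begin{cases}
\langle \delta_{def} \blacktriangleleft (\sigma, \varphi)\ \ \varphi(i)\ \ \delta_{exc} \blacktriangleleft (\sigma, \varphi) \rangle & i \in \mathit{Dom}(\varphi) \\
\langle \delta_{def} \blacktriangleleft (\sigma, \varphi) \vee \delta_{exc} \blacktriangleleft (\sigma, \varphi) \rangle & \text{otherwise}
\end{cases}
\end{array}
$$

Table 6.9 – Interprocedural Domain – Substitutions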
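The substitution can be replayed on the start_address example with a short Python sketch. Several simplifications are assumed and are not part of the thesis: dependencies are encoded as tagged tuples, only field and constructor steps are supported in paths, atomic dependencies are assumed to extract to themselves, the join is replaced by a coarse stand-in that returns ⊤ whenever its arguments differ, the default dependency of the array is taken to be Nothing, and the renaming of the root ti to tj is folded into σ.

```python
TOP, NOTHING, BOT = "top", "nothing", "bot"     # atomic cases

# Assumed encodings (illustrative only):
#   ("struct", {f: dep})   ("variant", {C: dep})
#   ("deferred", {root: frozenset of paths})
#   ("arr", dep)                      uniform array dependency
#   ("arr_exc", d_def, i, d_exc)      default dependency with an exceptional cell i
# Paths are tuples of ("field", f) / ("ctor", C) steps; () is ε.

def coarse_join(d1, d2):
    """Stand-in for the join: exact on bottom and on equal arguments, top otherwise."""
    if d1 == BOT:
        return d2
    if d2 == BOT:
        return d1
    return d1 if d1 == d2 else TOP

def extract(dep, step):
    """One field or constructor extraction on a dependency."""
    if dep in (TOP, NOTHING, BOT):              # assumption: atoms extract to themselves
        return dep
    if dep[0] == "deferred":                    # extend every path of the access map
        return ("deferred", {o: frozenset(p + (step,) for p in ps)
                             for o, ps in dep[1].items()})
    expected = {"field": "struct", "ctor": "variant"}[step[0]]
    if dep[0] != expected or step[1] not in dep[1]:
        raise ValueError("path step does not match the dependency")
    return dep[1][step[1]]

def apply_path(path, dep):
    """Apply a path step by step: the first step is extracted first."""
    for step in path:
        dep = extract(dep, step)
    return dep

def apply_paths(paths, dep):
    """Join of the dependencies obtained by applying every path of the set."""
    result = BOT
    for p in paths:
        result = coarse_join(result, apply_path(p, dep))
    return result

def subst(dep, sigma, phi):
    """Simultaneous substitution of deferred roots (sigma) and array indices (phi)."""
    if dep in (TOP, NOTHING, BOT):
        return dep
    tag = dep[0]
    if tag in ("struct", "variant"):
        return (tag, {k: subst(d, sigma, phi) for k, d in dep[1].items()})
    if tag == "deferred":
        result = BOT
        for root, paths in dep[1].items():
            result = coarse_join(result, apply_paths(paths, sigma[root]))
        return result
    if tag == "arr":
        return ("arr", subst(dep[1], sigma, phi))
    _, d_def, i, d_exc = dep                    # tag == "arr_exc"
    d_def, d_exc = subst(d_def, sigma, phi), subst(d_exc, sigma, phi)
    if i in phi:
        return ("arr_exc", d_def, phi[i], d_exc)
    return ("arr", coarse_join(d_def, d_exc))   # index eliminated

eps = frozenset({()})
# Summary of thread for the exit label true, with root ti and index i
thread_p = ("struct", {"threads": ("arr_exc", NOTHING, "i", ("variant", {
    "Some": ("struct", {"t": ("deferred", {"ti": eps})}), "None": BOT}))})
# Dependency computed in start_address for the effective output tj
dep_tj = ("struct", {"stack": ("struct", {"start": ("deferred", {"adr": eps})})})

result = subst(thread_p, sigma={"ti": dep_tj}, phi={"i": "j"})
print(result)
```

The printed value has the same shape as the summary obtained above for the exit label true of start_address: the index i has become j, and the deferred occurrence has been replaced by the dependency on stack and start, whose leaf is still deferred on adr.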


6.6.2 Wrapped Calls and Results

As a simple experiment for verifying the precision of our dependency analysis approach with deferred dependencies, we have replaced all calls to built-in predicates in our previous example predicates thread and start_address (illustrated in Section 6.5.2 and on page 131, respectively) with calls to predicates wrapping every call of this type. We compared the precision of the obtained results, as well as the execution time needed to compute the dependency summaries.

The thread_with_wrapped predicate thus has the following form:

    predicate thread_with_wrapped(process p, int i)
    -> [true: thread ti | None | oob]
      array<option_thread> th
      option_thread tio

      get_threads(p) [true: th]               [true -> 1]
      get_ith(th, i) [true: tio | false]      [true -> 2, false -> 5]
      switch_option(tio) [none | some: ti]    [none -> 4, some -> 3]
      [true]
      [None]
      [oob]

The start_address predicate becomes:

    predicate start_address_wrapped(process p, int j)
    -> [true: int adr | None]
      thread tj
      memory_region sj

      thread(p, j) [true: tj | None | oob]    [true -> 1, None -> 4, oob -> 4]
      get_stack(tj) [true: sj]                [true -> 2]
      get_start(sj) [true: adr]               [true -> 3]
      [true]
      [None]
      [error]

The dependency summaries obtained for each of the two predicates are identical to the ones obtained for the predicates thread and start_address in their original form. The dependency information for thread and start_address is computed in 0.33 milliseconds, while that for the versions with calls to the wrapped built-in predicates, i.e. thread_with_wrapped and start_address_wrapped, is obtained in 0.65 milliseconds. We ran the analysis 10001 times in a loop. The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.

6.7 Related Work

For the past few decades, interprocedural analyses have generated considerable interest in the static analysis community. They expand the scope of analysis beyond a procedure's limits in order to encompass the effect of callees on callers. The precision of both data-flow and control-flow analyses is traditionally characterized in terms of context-sensitivity, i.e. computing information depending on the calling context, or its dual, context-insensitivity. For control-flow analyses, the terms polyvariant and monovariant analyses are used interchangeably for the same distinction (Nielson and Nielson 1999). In (Midtgaard 2012), a comprehensive survey of control-flow analyses for functional programs is made. Context-sensitivity has the advantage of increased precision. However, the scalability of such analyses is frequently a major concern. The precision and performance impact of context-sensitivity is discussed by Lhoták and Hendren in (Lhoták and Hendren 2006). In contrast, Ruf argues in (Ruf 1995) that context-insensitivity leads to little or no precision penalty. Shapiro and Horwitz argue in (Shapiro and Horwitz 1997) that using a more precise pointer analysis does in general lead to more precise results.

Sharir and Pnueli introduced in (Sharir and Pnueli 1978) a comprehensive theory of interprocedural data-flow analyses for general frameworks, built around two approaches. The first of them, the functional approach, is based upon computing a context-sensitive summary of a function or procedure call. Procedures are viewed as collections of structured program blocks, and input-output relations are established for each such block. Subsequently, the effect of procedure calls is computed by simply using such relations. The second approach proposed by Sharir and Pnueli is the call-string approach. Broadly speaking, this is based upon avoiding infeasible paths by matching corresponding calls and returns. It can be seen as an extension to intraprocedural data-flow analyses in which only valid interprocedural paths are considered during graph traversal. This is achieved by tagging the propagated data with an encoded history of procedure calls, thus making the interprocedural flow explicit and increasing the accuracy of the propagated information. Both approaches are generic and can be used for a wide variety of analyses. Our form of interprocedural dependency analysis is closer to the functional approach. For each predicate of the analysed program, it computes a dependency summary as an input-output relation and then uses this summary whenever the predicate is called. Symbolic elements are used to allow callers to inject their own context information.

Though desirable in terms of precision, context-sensitivity is often considered prohibitively costly in terms of performance. In practice, many analyses make a compromise and relax this requirement to a certain degree for the sake of scalability. Our approach is no exception: it constitutes an application of context-sensitive information to summaries with deferred dependencies, which are essentially context-insensitive abstractions with context-sensitive leaves. Though not purely context-sensitive, we obtain a gain in precision without sacrificing scalability.

Purely context-sensitive analyses have been developed especially in the area of points-to analyses (Gharat, Khedker and Mycroft 2016), but also for information flow control (Hammer and Snelting 2009) or liveness analysis used for garbage collection (Asati et al. 2014). In (Khedker, Mycroft and Rawat 2011), Khedker et al. present a lazy context-sensitive points-to analysis. Points-to information is computed only for the pointers that are live, and the propagation of points-to information is sparse, being restricted to live ranges of pointers. Though our approach is not directly comparable to this approach, it is interesting to make a few general remarks. In (Khedker, Mycroft and Rawat 2011), strong liveness is used for identifying the pointers that are directly used or which are used for defining pointers that are strongly live. On the other hand, we use strong dependency to identify and distinguish between input subelements that are directly needed for computing the output and input subelements that are simply copied into and forwarded as outputs. Thus, Khedker et al. prevent the explosion of information by clearly distinguishing between relevant and irrelevant information. We achieve scalability by refining the notion of needed or depending on. Their analysis is fully context-sensitive and is based on the call-string approach (Sharir and Pnueli 1978); our analysis shows a relaxed form of context-sensitivity and is closer to the functional approach.

Jensen et al. present in (Jensen, Møller and Thiemann 2010) a technique based on lazy propagation for context-sensitive interprocedural analysis of JavaScript programs, i.e. programs with objects and first-class functions. Transfer functions may not be distributive and hence the IFDS technique (Reps, Horwitz and Sagiv 1995; Padhye and Khedker 2013) is not applicable. They propagate data-flow information “by need” in an iterative fixpoint algorithm.

The computation of relevant information is deferred in demand-driven analyses (Horwitz, Reps and Sagiv 1995; Heintze and Tardieu 2001; Zheng and Rugina 2008; Sridharan et al. 2005) as well. These compute the targeted results only at specific program points, thereby avoiding the effort of computing a global result. We compute dependency summaries with symbolic elements. These can be seen as dependency templates parameterized by a caller's context. Their instantiation is deferred and left to the callers.

6.8 Conclusion

We have presented an extension of our dependency analysis, introducing a relaxed form of context-sensitivity. Our solution is based on computing deferred dependencies, consisting of symbolic access maps into which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty incurred by our previous context-insensitive approach. The introduction of deferred dependencies required the introduction of an additional level of symbolic paths and path sets. However, this extension had a minimal impact on the dependency analysis at the intra- and interprocedural levels, imposing only the modification of the initialisation strategy and of the substitution operation. As we will discuss in Chapter 8, our extension of the dependency analysis with deferred dependencies led to an increase of 10–20% in execution time on our benchmark. However, it obtained more precise dependency information for 50% of the predicates included in the benchmark.


Chapter 7

Correlation Analysis

A thousand fibers connect us [...] and among those fibers, as sympathetic threads, our actions run as causes, and they come back to us as effects.

Hermann Melville

7.1 Introduction

In the field of Artificial Intelligence, the frame problem (McCarthy and Hayes 1969) is loosely but frequently described as “knowing what stays the same as actions occur in a changing world” (Morgenstern 1995). In the realm of software verification, the frame problem refers to establishing the boundaries within which functions operate, and it has notoriously tedious implications and consequences along two different axes: the specification of frame properties (Borgida, Mylopoulos and Reiter 1995) and their verification.

Another frequently used definition of the frame problem in the context of ArtificialIntelligence refers to ldquoefficiently determining what remains the same in a changingworldrdquo (Morgenstern 1995) This definition is similar to the first yet the initial wordsldquoefficiently determiningrdquo confer it a subtle but crucial nuance In this chapter we arerather interested in the latter and we address the issue of automatically detecting deep-state modifications in the context of αSmil a functional language In our ldquochangingworldrdquo destructive updates are not allowed The new state out of a structured valuein is obtained by destructuring in and reconstructing it in out by copying unmodifiedsubvalues from in and replacing in out only what needs to reflect the modificationThus referring to old values per se as one of the three major approaches to specifyingframe properties (described in Section 231) implies does not make sense Instead wehave to focus on and to detect the relations between the (sub)values in and out Tothis end we present a static correlation analysis which when given a predicate thatmanipulates a structured input is meant to determine automatically the subset thatremains unchanged and is further propagated into the output Thus the behaviour ofa predicate is summarised by computing relations between parts of the input and partsof the output The computed correlation summaries are a safe approximation of what

138 Chapter 7 Correlation Analysis

part of an input state of a predicate is copied to the output state: they summarise not only what is modified by the predicate, but also how it is modified and to what extent.

Outline. We continue this chapter by illustrating the targeted correlation results on an αSmil example in Section 7.1.1. In Section 7.1.2 we give a brief overview of the characteristics of our correlation analysis and explain the motivation behind some of them. The rest of the chapter focuses on technical details related to the correlation analysis. In Section 7.2 we present our abstract partial equivalence type, a fundamental component of our correlation analysis. It is followed in Section 7.3 by an in-depth presentation of paths and correlations, an intermediate level of abstraction that is imperative for obtaining expressive results. In Section 7.4 we focus on the correlation analysis at an intraprocedural level and illustrate the step-by-step mechanism behind it in Section 7.4.2. A summary of the correlation analysis at an interprocedural level is given in Section 7.5. A possible extension, going beyond the detection of equivalences and handling more general relations, is briefly discussed in Section 7.6. Detecting modifications is traditionally associated with shape and side-effect analyses. In Section 7.7 we review and discuss such approaches.

7.1.1 Targeted Correlation Information

The goal of our analysis and the targeted correlation results can be illustrated on an example predicate such as stop_thread. This predicate was introduced in Section 3.1.5 (on page 50) and its body in the αSmil language was shown in Section 4.1 on page 64. We revisit it and illustrate the predicate's body in Figure 7.1.

    predicate stop_thread(process in, int i)
    -> [true: process o | inval]
    {
      array<option_thread> ta; option_thread th;
      thread ti; state s;
       1: ta = in.threads;
       2: th = ta[i];
       3: switch(th) as [Some: ti | None];
       4: s = Blocked;
       5: ti = {ti with current_state=s};
       6: th = Some(ti);
       7: ta = [ta with i=th];
       8: o = {in with threads=ta};
       9: true;
      10: inval
    }

[Figure: in the original control-flow layout, the edges leading to the inval exit carry the labels false, None and false.]

Figure 7.1 – Body of the stop_thread Predicate

It has two possible execution scenarios: true, when the given index i corresponds to an active thread, and inval otherwise, i.e. when it corresponds to an inactive element or when it lies outside the array's bounds. In the latter case, the predicate exits with


the inval label and generates no output. In the former case, stop_thread modifies the state of the i-th active thread by setting it to Blocked, and returns the new state of the process in the output o. This is accomplished by destructuring the input process in and copying the array of associated threads into the local variable ta (line 1). The array's i-th element is copied to the local variable th (line 2) and, as it is an active element, its corresponding thread is extracted and put into ti (line 3). The new state for the thread value ti is created by setting its current_state field (line 5) to the state s constructed previously (line 4). The new state o of the process is constructed using ti for its i-th active element (lines 6 and 7) and copying everything else from the input in (line 8). It is interesting to note that for each destructuring step of in there is a corresponding construction step for o, as is visible at lines 1 and 8, 2 and 7, and 3 and 6, for instance.

The targeted correlation results for this predicate are illustrated in Figure 7.2. Our analysis should infer that, between the input process in and the output o, the values of the fields pid, current_thread and address_space are equal. Furthermore, for the threads array of associated threads, it should detect that all elements are equal except the value of the i-th element (as illustrated by $R_{th}$), for which only one of the three fields, namely the current_state field, differs (shown by $R_i$).

[Figure: the process values in and o, each with the fields pid, current_thread, address_space and threads. The first three fields are related by =, the two threads arrays by $R_{th}$, and their i-th elements (fields identifier, current_state, stack) by $R_i$.]

Figure 7.2 – Targeted Correlation Results for Predicate stop_thread

By tracking equalities between pairs of variables of the same type and by defining


an abstract partial equivalence type that mirrors the layered structure of associative arrays and algebraic data types, we can detect the equality of the values for the pid, current_thread and address_space fields between the input and the output. However, if we track only equalities between variables of the same type and we ignore the flow of an input's subelement value to a variable (or, conversely, the flow of a variable's value to an output's subelement), valuable information is lost. We are not only losing information between inputs and outputs of different types but, by accumulating imprecisions, we also lose information concerning inputs and outputs of the same type, such as the in and o processes of our example. For instance, the equality between the values extracted from the input in and copied into ta and th respectively, as well as the relation between the values of ta and o.threads, and of th and o.threads[i], are ignored because neither ta nor th is of the same type as in and o. As a consequence, we lose the information concerning the relation between in's and o's threads values altogether. In order to compute such information, it is imperative to track (cor)relations between variables of different types as well.

7.1.2 Correlation Analysis in a Nutshell

Our correlation analysis is a conservative static analysis inferring what is modified by an operation and to what extent. It approximates the flow of input values into output values by uncovering equalities and computing correlations as pairs between input parts and the output parts into which these are injected. What is marked as being equal is definitely equal.

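[Figure: an input and an output value with inner paths $\pi$, $\pi'$ on one side and $\rho$, $\rho'$ on the other; the subelements at $(\pi, \rho)$ are related by $R$ and those at $(\pi', \rho')$ by $R'$.]

Figure 7.3 – Intraprocedural Correlations – General Representation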

Outputs are often complex compounds of different subparts of different input variables: a subset of the input is modified, while the rest is injected as is. We track the origin of subparts of the output and relate it to subparts of the input. As previously illustrated on our stop_thread example predicate, in order to prevent avoidable over-approximations, we need to avoid dealing with data in a monolithic manner. To this end, it is imperative to consider pairs of different types and granularities as well. As a consequence, we are forced to introduce an additional level of granularity, allowing us to refer not only to variables but also to substructures within them. At the intraprocedural level, illustrated in Figure 7.3, we define correlations as mappings between pairs of inputs and outputs, to which we associate mappings between pairs of valid inner paths


and the relations binding them. Correlations for arrays and variants are exemplified in Figures 7.4-a) and 7.4-b).

[Figure: a) Arrays: $\forall i,\ a[i]\ R\ b[i]$;  b) Variants.]

Figure 7.4 – Intraprocedural Domain – Examples

Similarly to our dependency analysis presented in Chapter 5, the correlation analysis is an interprocedural, flow-sensitive, field-sensitive, label-sensitive analysis that handles associative arrays, structures and variant data types. However, unlike the dependency analysis, for which we introduced a relaxed form of context-sensitivity in Chapter 6, the correlation analysis is context-insensitive. Fine-grained equivalence relations between the inputs and outputs of a predicate are computed once and subsequently propagated to its callers.

Our correlation analysis is meant to be used in an interactive verification context. Precise correlation summaries must be computed quickly in order to answer effectively, when combined with dependency summaries, queries regarding the preservation of certain invariants.

7.2 Partial Equivalence Relations

7.2.1 Abstract Partial Equivalence Type

The first step towards automatically reasoning about the propagation of input subelements into output subelements is the definition of an abstract partial equivalence type $\mathcal{R}$ that mimics the structure of algebraic data types and arrays. A partial equivalence relation $R \in \mathcal{R}$ is defined inductively from the two atomic elements Equal and Any, and mirrors the structure of the concrete types.

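Definition 7.2.1 (Partial Equivalence Type $R \in \mathcal{R}$).

\[
\begin{array}{rcll}
R & ::= & \mathit{Equal} & \text{atomic case – equal} \quad (i)\\
  & \mid & \mathit{Any} & \text{atomic case – unrelated} \quad (ii)\\
  & \mid & \{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} & f_1, \ldots, f_n \text{ fields} \quad (iii)\\
  & \mid & [C_1 \mapsto R_1; \ldots; C_n \mapsto R_n] & C_1, \ldots, C_n \text{ constructors} \quad (iv)\\
  & \mid & \langle R_{def} \rangle & \text{array} \quad (v)\\
  & \mid & \langle R_{def} \triangleright i : R_{exc} \rangle & i \text{ array index} \quad (vi)
\end{array}
\]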

Such relations represent fine-grained partial equivalences between pairs of values of the same type. Equal and Any represent equal and unrelated values, respectively. Partial equivalence relations for structures (given by (iii)) and for variants (given by (iv)) are expressed in terms of the partial equivalences of their subparts, by mapping each field


or constructor to the corresponding relations. As for the dependency analysis presented in Chapter 5, for arrays we distinguish between two cases, namely arrays with a general relation applying to all of the cells (as given by (v)), or to all but one exceptional cell (as given by (vi)), for which a specific relation is known to hold.
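The six cases of Definition 7.2.1 can be mirrored by a small data encoding. The following is only a sketch under assumptions: the tuple encoding, the helper names (`struct`, `variant`, `arr`, `arr_exc`, `show`) and the ASCII rendering are hypothetical and are not part of αSmil or of the analysis implementation. The example builds the relation targeted for stop_thread in Figure 7.2; mapping the `None` constructor to Any is an assumption made here.

```python
# Hypothetical tuple encoding of the partial equivalence type (Definition 7.2.1).
EQUAL, ANY = "Equal", "Any"          # atomic cases (i) and (ii)

def struct(fields):                  # case (iii): one relation per field
    return ("struct", dict(fields))

def variant(ctors):                  # case (iv): one relation per constructor
    return ("variant", dict(ctors))

def arr(r_def):                      # case (v): one relation for all cells
    return ("arr", r_def)

def arr_exc(r_def, i, r_exc):        # case (vi): the cell at index variable i is the exception
    return ("arri", r_def, i, r_exc)

def show(r):
    """Render a relation in an ASCII notation close to Definition 7.2.1."""
    if r in (EQUAL, ANY):
        return r
    if r[0] in ("struct", "variant"):
        body = "; ".join(f"{k} -> {show(v)}" for k, v in r[1].items())
        return "{" + body + "}" if r[0] == "struct" else "[" + body + "]"
    if r[0] == "arr":
        return "<" + show(r[1]) + ">"
    return "<" + show(r[1]) + " |> " + r[2] + ": " + show(r[3]) + ">"

# Targeted relation between `in` and `o` for stop_thread (Figure 7.2).
r_i = struct({"identifier": EQUAL, "current_state": ANY, "stack": EQUAL})
r_th = arr_exc(EQUAL, "i", variant({"None": ANY, "Some": struct({"t": r_i})}))
r_process = struct({"pid": EQUAL, "current_thread": EQUAL,
                    "address_space": EQUAL, "threads": r_th})
print(show(r_process))
```

Plain tuples keep structural equality for free, which is convenient when comparing the results of lattice operations.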

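The preorder relation of the partial equivalence lattice is denoted by $\sqsubseteq_{\mathcal{R}}$ and defined below.

Definition 7.2.2 (Preorder Relation $\sqsubseteq_{\mathcal{R}}$).

\[ \sqsubseteq_{\mathcal{R}} \;\subseteq\; \mathcal{R} \times \mathcal{R} \]

It is detailed in Table 7.1.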

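Table 7.1 – $\sqsubseteq_{\mathcal{R}}$ – Comparison of Two Domains

\[
\begin{array}{c}
\frac{}{R \sqsubseteq_{\mathcal{R}} \mathit{Any}}\ (\mathrm{Top})
\qquad
\frac{}{\mathit{Equal} \sqsubseteq_{\mathcal{R}} R}\ (\mathrm{Bot})
\\[2ex]
\frac{R_1 \sqsubseteq_{\mathcal{R}} R'_1 \quad \ldots \quad R_n \sqsubseteq_{\mathcal{R}} R'_n}
     {\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} \sqsubseteq_{\mathcal{R}} \{f_1 \mapsto R'_1; \ldots; f_n \mapsto R'_n\}}\ (\mathrm{Str})
\\[2ex]
\frac{R_1 \sqsubseteq_{\mathcal{R}} R'_1 \quad \ldots \quad R_n \sqsubseteq_{\mathcal{R}} R'_n}
     {[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n] \sqsubseteq_{\mathcal{R}} [C_1 \mapsto R'_1; \ldots; C_n \mapsto R'_n]}\ (\mathrm{Var})
\\[2ex]
\frac{R \sqsubseteq_{\mathcal{R}} R'}
     {\langle R \rangle \sqsubseteq_{\mathcal{R}} \langle R' \rangle}\ (\mathrm{Adef})
\qquad
\frac{R_{def} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{exc}}
     {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle}\ (\mathrm{AI})
\\[2ex]
\frac{R_{def} \sqsubseteq_{\mathcal{R}} R' \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'}
     {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R' \rangle}\ (\mathrm{AIA})
\qquad
\frac{R \sqsubseteq_{\mathcal{R}} R'_{def} \quad R \sqsubseteq_{\mathcal{R}} R'_{exc}}
     {\langle R \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle}\ (\mathrm{AAI})
\\[2ex]
\frac{i \neq j \quad R_{def} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{def} \sqsubseteq_{\mathcal{R}} R'_{exc} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{def} \quad R_{exc} \sqsubseteq_{\mathcal{R}} R'_{exc}}
     {\langle R_{def} \triangleright i : R_{exc} \rangle \sqsubseteq_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle}\ (\mathrm{AIJ})
\end{array}
\]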
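The rules of Table 7.1 translate almost line by line into a recursive check. The sketch below assumes a hypothetical tuple encoding of relations (`"Equal"`, `"Any"`, `("struct", {...})`, `("variant", {...})`, `("arr", R)`, `("arri", R_def, i, R_exc)`) and treats two index variables as different exactly when their names differ; neither choice comes from the original implementation.

```python
EQUAL, ANY = "Equal", "Any"

def leq(r, s):
    """True when r is at least as precise as s according to Table 7.1."""
    if s == ANY or r == EQUAL:                        # Top, Bot
        return True
    if r == ANY or s == EQUAL:                        # no other rule has Any below or Equal above
        return False
    if r[0] == s[0] == "struct" or r[0] == s[0] == "variant":   # Str, Var
        return (r[1].keys() == s[1].keys()
                and all(leq(r[1][k], s[1][k]) for k in r[1]))
    if r[0] == "arr" and s[0] == "arr":               # Adef
        return leq(r[1], s[1])
    if r[0] == "arri" and s[0] == "arr":              # AIA
        return leq(r[1], s[1]) and leq(r[3], s[1])
    if r[0] == "arr" and s[0] == "arri":              # AAI
        return leq(r[1], s[1]) and leq(r[1], s[3])
    if r[0] == "arri" and s[0] == "arri":
        if r[2] == s[2]:                              # AI
            return leq(r[1], s[1]) and leq(r[3], s[3])
        return all(leq(a, b)                          # AIJ
                   for a in (r[1], r[3]) for b in (s[1], s[3]))
    return False                                      # shapes do not match

# "All cells equal except cell i" is more precise than
# "all cells related by the thread relation", but not the other way round.
r_i = ("struct", {"identifier": EQUAL, "current_state": ANY, "stack": EQUAL})
elem = ("variant", {"None": ANY, "Some": ("struct", {"t": r_i})})
print(leq(("arri", EQUAL, "i", elem), ("arr", elem)))
print(leq(("arr", elem), ("arri", EQUAL, "i", elem)))
```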

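The join and meet operations are denoted by $\vee_{\mathcal{R}}$ and $\wedge_{\mathcal{R}}$, respectively.

Definition 7.2.3 (Join Operation $\vee_{\mathcal{R}}$).

\[ \vee_{\mathcal{R}} : \mathcal{R} \times \mathcal{R} \to \mathcal{R} \]

Definition 7.2.4 (Meet Operation $\wedge_{\mathcal{R}}$).

\[ \wedge_{\mathcal{R}} : \mathcal{R} \times \mathcal{R} \to \mathcal{R} \]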


Both are commutative operations, applied pointwise on each subelement. Join, shown in Table 7.2, has Equal as its identity element and Any as its absorbing element. Meet, shown in Table 7.3, has Equal as its absorbing element and Any as its identity element. For both operations, the undisplayed cases are defined by their symmetrical counterparts.

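Table 7.2 – Partial Equivalences – $\vee_{\mathcal{R}}$ – Join Operation

\[
\begin{array}{rcl}
\mathit{Any} \vee_{\mathcal{R}} R & = & \mathit{Any}\\
\mathit{Equal} \vee_{\mathcal{R}} R & = & R\\
\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} \vee_{\mathcal{R}} \{f_1 \mapsto R'_1; \ldots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \vee_{\mathcal{R}} R'_1; \ldots; f_n \mapsto R_n \vee_{\mathcal{R}} R'_n\}\\
{[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n]} \vee_{\mathcal{R}} [C_1 \mapsto R'_1; \ldots; C_n \mapsto R'_n] & = & [C_1 \mapsto R_1 \vee_{\mathcal{R}} R'_1; \ldots; C_n \mapsto R_n \vee_{\mathcal{R}} R'_n]\\
\langle R \rangle \vee_{\mathcal{R}} \langle R' \rangle & = & \langle R \vee_{\mathcal{R}} R' \rangle\\
\langle R \rangle \vee_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \vee_{\mathcal{R}} R'_{def} \triangleright i : R \vee_{\mathcal{R}} R'_{exc} \rangle\\
\langle R_{def} \triangleright i : R_{exc} \rangle \vee_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\left\{\begin{array}{ll}
\langle R_{def} \vee_{\mathcal{R}} R'_{def} \triangleright i : R_{exc} \vee_{\mathcal{R}} R'_{exc} \rangle & \text{if } i = j\\
\langle R_{def} \vee_{\mathcal{R}} R'_{def} \vee_{\mathcal{R}} R_{exc} \vee_{\mathcal{R}} R'_{exc} \rangle & \text{if } i \neq j
\end{array}\right.
\end{array}
\]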

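Table 7.3 – Partial Equivalences – $\wedge_{\mathcal{R}}$ – Meet Operation

\[
\begin{array}{rcl}
\mathit{Any} \wedge_{\mathcal{R}} R & = & R\\
\mathit{Equal} \wedge_{\mathcal{R}} R & = & \mathit{Equal}\\
\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} \wedge_{\mathcal{R}} \{f_1 \mapsto R'_1; \ldots; f_n \mapsto R'_n\} & = & \{f_1 \mapsto R_1 \wedge_{\mathcal{R}} R'_1; \ldots; f_n \mapsto R_n \wedge_{\mathcal{R}} R'_n\}\\
{[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n]} \wedge_{\mathcal{R}} [C_1 \mapsto R'_1; \ldots; C_n \mapsto R'_n] & = & [C_1 \mapsto R_1 \wedge_{\mathcal{R}} R'_1; \ldots; C_n \mapsto R_n \wedge_{\mathcal{R}} R'_n]\\
\langle R \rangle \wedge_{\mathcal{R}} \langle R' \rangle & = & \langle R \wedge_{\mathcal{R}} R' \rangle\\
\langle R \rangle \wedge_{\mathcal{R}} \langle R'_{def} \triangleright i : R'_{exc} \rangle & = & \langle R \wedge_{\mathcal{R}} R'_{def} \triangleright i : R \wedge_{\mathcal{R}} R'_{exc} \rangle\\
\langle R_{def} \triangleright i : R_{exc} \rangle \wedge_{\mathcal{R}} \langle R'_{def} \triangleright j : R'_{exc} \rangle & = &
\left\{\begin{array}{ll}
\langle R_{def} \wedge_{\mathcal{R}} R'_{def} \triangleright i : R_{exc} \wedge_{\mathcal{R}} R'_{exc} \rangle & \text{if } i = j\\
\langle R_{def} \wedge_{\mathcal{R}} R'_{def} \wedge_{\mathcal{R}} R_{exc} \wedge_{\mathcal{R}} R'_{exc} \rangle & \text{if } i \neq j
\end{array}\right.
\end{array}
\]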
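Since Tables 7.2 and 7.3 differ only in which atomic element is absorbing and which is the identity, both operations can be sketched with one helper. The code below is an illustration under assumptions: relations use a hypothetical tuple encoding (`"Equal"`, `"Any"`, `("struct", {...})`, `("variant", {...})`, `("arr", R)`, `("arri", R_def, i, R_exc)`), both arguments are assumed to have compatible shapes, and `combine` is a name introduced here.

```python
EQUAL, ANY = "Equal", "Any"

def combine(r, s, absorbing, identity):
    """Shared skeleton of Tables 7.2 and 7.3."""
    if absorbing in (r, s):
        return absorbing
    if r == identity:
        return s
    if s == identity:
        return r
    rec = lambda a, b: combine(a, b, absorbing, identity)
    if r[0] == "arr" and s[0] == "arri":              # undisplayed symmetric case
        r, s = s, r
    if r[0] in ("struct", "variant") and s[0] == r[0]:
        return (r[0], {k: rec(r[1][k], s[1][k]) for k in r[1]})
    if r[0] == "arr" and s[0] == "arr":
        return ("arr", rec(r[1], s[1]))
    if r[0] == "arri" and s[0] == "arr":
        return ("arri", rec(r[1], s[1]), r[2], rec(r[3], s[1]))
    if r[0] == "arri" and s[0] == "arri":
        if r[2] == s[2]:                              # same exceptional index
            return ("arri", rec(r[1], s[1]), r[2], rec(r[3], s[3]))
        return ("arr", rec(rec(r[1], s[1]), rec(r[3], s[3])))
    raise ValueError("relations of incompatible shapes")

def join(r, s):
    return combine(r, s, absorbing=ANY, identity=EQUAL)

def meet(r, s):
    return combine(r, s, absorbing=EQUAL, identity=ANY)

a = ("arri", EQUAL, "i", ANY)     # all cells equal, except possibly cell i
b = ("arri", EQUAL, "j", ANY)     # all cells equal, except possibly cell j
print(join(a, b))                 # exceptions at different indices are merged
print(meet(a, b))
```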

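Additionally, extraction functions are defined for partial equivalence relations.

Definition 7.2.5 (Extraction of a Field's Relation).

\[ \mathit{extr}_f : \mathcal{R} \rightharpoonup \mathcal{R} \]

Definition 7.2.6 (Extraction of a Constructor's Relation).

\[ \mathit{extr}_C : \mathcal{R} \rightharpoonup \mathcal{R} \]

Definition 7.2.7 (Extraction of a Cell's Relation).

\[ \mathit{extr}_{\langle i \rangle} : \mathcal{R} \rightharpoonup \mathcal{R} \]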


These are partial functions and can only be applied to relations of the corresponding shapes. For example, the field extraction $\mathit{extr}_f$ only makes sense for atomic relations or for structured relations having a field named f, which should be the case if the relation connects two values of a structured type with a field f. For either of the two atomic relations, Equal or Any, applying any of these extractions yields Equal or Any, respectively. The extractions are summarized in Table 7.4.

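Table 7.4 – Partial Equivalence Extractions

\[
\begin{array}{rcll}
\multicolumn{4}{l}{\mathit{extr}_f(R),\ f \in F}\\
\mathit{extr}_f(\mathit{Any}) & = & \mathit{Any} & \\
\mathit{extr}_f(\mathit{Equal}) & = & \mathit{Equal} & \\
\mathit{extr}_f(\{f_1 \mapsto R_1; \ldots; f_i \mapsto R_i; \ldots; f_n \mapsto R_n\}) & = & R_i & \text{if } f = f_i\\[1ex]
\multicolumn{4}{l}{\mathit{extr}_C(R),\ C \in \mathcal{C}}\\
\mathit{extr}_C(\mathit{Any}) & = & \mathit{Any} & \\
\mathit{extr}_C(\mathit{Equal}) & = & \mathit{Equal} & \\
\mathit{extr}_C([C_1 \mapsto R_1; \ldots; C_i \mapsto R_i; \ldots; C_n \mapsto R_n]) & = & R_i & \text{if } C = C_i\\[1ex]
\multicolumn{4}{l}{\mathit{extr}_{\langle i \rangle}(R)}\\
\mathit{extr}_{\langle i \rangle}(\mathit{Any}) & = & \mathit{Any} & \\
\mathit{extr}_{\langle i \rangle}(\mathit{Equal}) & = & \mathit{Equal} & \\
\mathit{extr}_{\langle i \rangle}(\langle R \rangle) & = & R & \\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright i : R_{exc} \rangle) & = & R_{exc} & \\
\mathit{extr}_{\langle i \rangle}(\langle R_{def} \triangleright j : R_{exc} \rangle) & = & R_{def} \vee_{\mathcal{R}} R_{exc} & \text{if } i \neq j
\end{array}
\]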
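The extractions of Table 7.4 can be sketched as three small partial functions. This is an illustration under assumptions: relations use a hypothetical tuple encoding (`"Equal"`, `"Any"`, `("struct", {...})`, `("variant", {...})`, `("arr", R)`, `("arri", R_def, i, R_exc)`), partiality shows up as a Python `KeyError` or `IndexError`, and `join` re-implements Table 7.2 for relations of compatible shapes.

```python
EQUAL, ANY = "Equal", "Any"

def join(r, s):
    """Join of Table 7.2: Any is absorbing, Equal is the identity."""
    if ANY in (r, s):
        return ANY
    if r == EQUAL:
        return s
    if s == EQUAL:
        return r
    if r[0] == "arr" and s[0] == "arri":              # symmetric case
        r, s = s, r
    if r[0] in ("struct", "variant"):
        return (r[0], {k: join(r[1][k], s[1][k]) for k in r[1]})
    if r[0] == "arr":
        return ("arr", join(r[1], s[1]))
    if s[0] == "arr":
        return ("arri", join(r[1], s[1]), r[2], join(r[3], s[1]))
    if r[2] == s[2]:
        return ("arri", join(r[1], s[1]), r[2], join(r[3], s[3]))
    return ("arr", join(join(r[1], s[1]), join(r[3], s[3])))

def extr_f(r, f):
    """Relation of field f; atomic relations are propagated unchanged."""
    return r if r in (EQUAL, ANY) else r[1][f]

def extr_c(r, c):
    """Relation of constructor c."""
    return r if r in (EQUAL, ANY) else r[1][c]

def extr_cell(r, i):
    """Relation of the cell at index variable i."""
    if r in (EQUAL, ANY):
        return r
    if r[0] == "arr":
        return r[1]
    # Known exception at the same index, otherwise either relation may apply.
    return r[3] if r[2] == i else join(r[1], r[3])

r_th = ("arri", EQUAL, "i", ANY)
print(extr_cell(r_th, "i"), extr_cell(r_th, "j"))
```

When the index differs from the recorded exceptional one, the two indices may still denote the same cell at run time, which is why the result is the join of both relations.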

7.2.2 Well-Typed Partial Equivalences and their Semantics

As discussed in the case of dependencies in Section 5.2.2, syntactic partial equivalences are untyped. However, their interpretation is made in the context of a type $\tau \in T$. The atomic cases, Equal and Any, can apply to any type, since they do not exhibit any data type features. Cases other than Equal and Any only have non-empty interpretations for types $\tau$ which are compatible with their shape. For instance, the structured relation $\{f \mapsto R\}$ only really makes sense for structured types with a single field f whose type itself is compatible with R, and will not be used in connection with variant or array types, for example. In Table 7.5 we detail the inference rules related to the well-typedness of partial equivalences. This is described as a judgement parameterized by a typing environment $\Gamma$ (Definition 4.3.1).

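\[
\begin{array}{c}
\frac{}{\Gamma \vdash \mathit{Equal} : \tau}\ (\mathrm{WT}\top)
\qquad
\frac{}{\Gamma \vdash \mathit{Any} : \tau}\ (\mathrm{WT}\bot)
\\[2ex]
\frac{\tau = \mathit{struct}\{f_1 : \tau_1; \ldots; f_n : \tau_n\} \quad \Gamma \vdash R_1 : \tau_1 \quad \ldots \quad \Gamma \vdash R_n : \tau_n}
     {\Gamma \vdash \{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\} : \tau}\ (\mathrm{WTStruct})
\\[2ex]
\frac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n] \quad \Gamma \vdash R_1 : \tau_1 \quad \ldots \quad \Gamma \vdash R_n : \tau_n}
     {\Gamma \vdash [C_1 \mapsto R_1; \ldots; C_n \mapsto R_n] : \tau}\ (\mathrm{WTVar})
\\[2ex]
\frac{\Gamma \vdash R : \tau}
     {\Gamma \vdash \langle R \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ (\mathrm{WTArr})
\qquad
\frac{\Gamma \vdash R_{def} : \tau \quad \Gamma \vdash R_{exc} : \tau \quad \Gamma(i) = \tau_i}
     {\Gamma \vdash \langle R_{def} \triangleright i : R_{exc} \rangle : \mathit{arr}_{\tau_i}\langle \tau \rangle}\ (\mathrm{WTArrI})
\end{array}
\]

Table 7.5 – Well-Typed Partial Equivalences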

The atomic values are generic: they are well-typed with respect to any type ($\mathrm{WT}\top$, $\mathrm{WT}\bot$). The partial equivalences of structures (WTStruct) are well-typed only with respect to an adequate structured type, whose field types are themselves compatible with the equivalences mapped to them. Similarly, the partial equivalences of variants (WTVar) are well-typed only with respect to an adequate variant type. In turn, the constructors must themselves be pointwise compatible with the equivalences mapped to them. For well-typed array equivalences (WTArr, WTArrI), the default relation as well as the exceptional relation have to be compatible with the type $\tau$ of the array's elements. Furthermore, the type of i, the index of the known exceptional equivalence relation, has to be compatible with $\tau_i$, the array's index type.
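The rules of Table 7.5 amount to a structural check of a relation against a type. The sketch below assumes hypothetical encodings that are not taken from the original implementation: relations are tuples (`"Equal"`, `"Any"`, `("struct", {...})`, `("variant", {...})`, `("arr", R)`, `("arri", R_def, i, R_exc)`), types are base-type names or tuples (`("struct", {field: type})`, `("variant", {constructor: type})`, `("arr", index_type, element_type)`), and the typing environment is a dictionary from variable names to types.

```python
EQUAL, ANY = "Equal", "Any"

def well_typed(r, ty, gamma):
    """Check a relation r against a type ty in environment gamma (Table 7.5)."""
    if r in (EQUAL, ANY):                             # atomic relations fit any type
        return True
    tag = r[0]
    if tag in ("struct", "variant"):                  # WTStruct, WTVar
        return (ty[0] == tag
                and r[1].keys() == ty[1].keys()
                and all(well_typed(r[1][k], ty[1][k], gamma) for k in r[1]))
    if ty[0] != "arr":                                # array relations need an array type
        return False
    _, index_ty, elem_ty = ty
    if tag == "arr":                                  # WTArr
        return well_typed(r[1], elem_ty, gamma)
    return (well_typed(r[1], elem_ty, gamma)          # WTArrI
            and well_typed(r[3], elem_ty, gamma)
            and gamma.get(r[2]) == index_ty)

thread_ty = ("struct", {"identifier": "int", "current_state": "state", "stack": "stack"})
option_ty = ("variant", {"None": "unit", "Some": ("struct", {"t": thread_ty})})
threads_ty = ("arr", "int", option_ty)

r_i = ("struct", {"identifier": EQUAL, "current_state": ANY, "stack": EQUAL})
r_th = ("arri", EQUAL, "i", ("variant", {"None": ANY, "Some": ("struct", {"t": r_i})}))
print(well_typed(r_th, threads_ty, {"i": "int"}))
```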

The semantics of a partial equivalence R for a type $\tau$ is a partial equivalence relation over values of type $\tau$. Given a valuation E from variables to semantic values (Definition 4.4.2), the interpretation $[\![R]\!]_\tau$ of a relation $R \in \mathcal{R}$ with respect to some type $\tau$ is a binary relation over $D_\tau$ (Definition 4.4.1). The interpretation $[\![R]\!]_\tau$ is defined as shown in Table 7.6.

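\[
\begin{array}{rcl}
[\![\mathit{Equal}]\!]_\tau & = & \{(x, x) \mid x \in D_\tau\}\\[1ex]
[\![\mathit{Any}]\!]_\tau & = & D_\tau \times D_\tau\\[1ex]
[\![\{f_1 \mapsto R_1; \ldots; f_n \mapsto R_n\}]\!]_{\mathit{struct}\{f_1 : \tau_1; \ldots; f_n : \tau_n\}} & = &
\{(\{f_1 = v_1; \ldots; f_n = v_n\}, \{f_1 = w_1; \ldots; f_n = w_n\}) \mid \forall i,\ 1 \le i \le n,\ (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\}\\[1ex]
[\![[C_1 \mapsto R_1; \ldots; C_n \mapsto R_n]]\!]_{\mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_n : \tau_n]} & = &
\{(C_i[v_i], C_i[w_i]) \mid (v_i, w_i) \in [\![R_i]\!]_{\tau_i}\}\\[1ex]
[\![\langle R_{def} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = &
\{((P, (v)_k), (P, (w)_k)) \mid \forall k,\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\}\\[1ex]
[\![\langle R_{def} \triangleright i : R_{exc} \rangle]\!]_{\mathit{arr}_{\tau_i}\langle \tau \rangle} & = &
\{((P, (v)_k), (P, (w)_k)) \mid E(i) \in P \implies (v_{E(i)}, w_{E(i)}) \in [\![R_{exc}]\!]_\tau,\ \forall k \neq E(i),\ (v_k, w_k) \in [\![R_{def}]\!]_\tau\}
\end{array}
\]

Table 7.6 – Partial Equivalence Relations – Semantics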

A partial equivalence relation R only relates values of the same type $\tau$, which must be compatible with R's “shape”. For structures, a partial equivalence relates pointwise the values of the fields of the two structure values. For variant values, a partial equivalence relation relates values built with the same constructor $C_i$, using arguments whose values are related by a relation $R_i$. For arrays, P indicates the support type, which has to be identical for both values. The values of the array elements are pointwise related by the same relation $R_{def}$, with the exception of the i-th elements, which are potentially related by an exceptional relation $R_{exc}$. Since variables i are used for indicating the exceptional elements, the valuation E is used for determining the value of i.
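The interpretation of Table 7.6 can be read as a membership test: given a relation and two concrete values, decide whether the pair belongs to the interpretation. The sketch below assumes hypothetical encodings that are not part of the formal development: relations are tuples as in `("arri", R_def, i, R_exc)`, structures are Python dictionaries, variant values are `(constructor, argument)` pairs, arrays are dictionaries whose key set plays the role of the support P, and the valuation E is a dictionary `env`.

```python
EQUAL, ANY = "Equal", "Any"

def related(r, v, w, env):
    """True when (v, w) belongs to the interpretation of r under valuation env."""
    if r == ANY:
        return True
    if r == EQUAL:
        return v == w
    tag = r[0]
    if tag == "struct":                               # fields are related pointwise
        return all(related(r[1][f], v[f], w[f], env) for f in r[1])
    if tag == "variant":                              # same constructor, related arguments
        (cv, av), (cw, aw) = v, w
        return cv == cw and related(r[1][cv], av, aw, env)
    if v.keys() != w.keys():                          # arrays must share the support P
        return False
    if tag == "arr":
        return all(related(r[1], v[k], w[k], env) for k in v)
    exc = env[r[2]]                                   # E(i), the exceptional index
    return all(related(r[3] if k == exc else r[1], v[k], w[k], env) for k in v)

# A process before and after stopping thread 0 (only current_state changes).
t0 = {"identifier": 7, "current_state": "Running", "stack": "s0"}
inp = {"pid": 1, "current_thread": 0, "address_space": "as0",
       "threads": {0: ("Some", {"t": t0}), 1: ("None", None)}}
out = {**inp, "threads": {**inp["threads"],
                          0: ("Some", {"t": {**t0, "current_state": "Blocked"}})}}

r_i = ("struct", {"identifier": EQUAL, "current_state": ANY, "stack": EQUAL})
r_th = ("arri", EQUAL, "i", ("variant", {"None": ANY, "Some": ("struct", {"t": r_i})}))
r_proc = ("struct", {"pid": EQUAL, "current_thread": EQUAL,
                     "address_space": EQUAL, "threads": r_th})
print(related(r_proc, inp, out, {"i": 0}))
```

With the valuation `{"i": 1}` the same check fails, because cell 0 would then have to be related by Equal.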

7.3 Paths and Correlations

7.3.1 Paths and Correlation Types

The partial equivalence relations discussed in Section 7.2 and defined in Definition 7.2.1 are enough to represent fine-grained information for values of the same structured type. For the stop_thread example discussed in Section 7.1.1, these would suffice to express the equality of the pid, current_thread and address_space fields between the input process in and the output process o, by simply mapping this pair to the following partial equivalence:

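\[ \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{current\_thread} \mapsto \mathit{Equal};\ \mathit{address\_space} \mapsto \mathit{Equal}\} \]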

However, the partial equivalence relations cannot, for instance, be used to convey the equality at line 1 in Figure 7.1 between the value of the threads field of in and the local ta variable. By not tracking information such as this, we lose the targeted information regarding the threads field, denoted by $R_{th}$ in Figure 7.2. In order to express this information, we first need to be able to refer to the substructure in.threads and relate its value to that of ta.

To this end, rather than handling only partial equivalences between pairs of variables of the same type and approximating the rest to Any, the element that conveys no information, we introduce an intermediate level allowing us to store relations between subparts of values. We begin by introducing access paths. Unlike the symbolic paths introduced in Chapter 6 and defined in Definition 6.3.1, which are used for computing dependency summaries with context-sensitive elements, the paths used for the correlation analysis


are actual access paths inside some value's structure. The symbolic paths used in deferred dependencies may cover multiple actual paths inside a value, whereas the access paths required for the correlation analysis represent unique chains of internal accesses leading to a single nested subvalue. Each access path is rooted at one of the program's variables. It is worth remarking that, in both cases, an intermediate level below variables needs to be introduced as soon as fine-grained relations between pairs of variables are considered, directly or indirectly. In the case of deferred dependencies, this was not the main goal per se, but rather a mechanism for obtaining more precision in specific cases for already pertinent dependency results. In contrast, for the correlation results this is imperative for obtaining useful, expressive information in non-trivial cases. We therefore define a recursive type $\pi \in \Pi$ encompassing this.

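Definition 7.3.1 (Access Path Type $\pi \in \Pi$).

\[
\begin{array}{rcll}
\pi & ::= & \varepsilon & \text{empty – root}\\
    & \mid & .f\,\pi & f \in F\\
    & \mid & @C\,\pi & C \in \mathcal{C}\\
    & \mid & \langle i \rangle\,\pi & i \text{ index, program variable}
\end{array}
\]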

The empty path, denoted by $\varepsilon$, is the special case denoting an access to an entire element, i.e. the root. The action of appending a non-empty path $\pi'$ to another path $\pi$ is denoted by $\pi\,\pi'$. For instance, the path denoting the current_state field of the i-th active associated thread of the in process of our stop_thread predicate would be the following: $\mathit{in}.\mathit{threads}\langle i \rangle @\mathit{Some}.t.\mathit{current\_state}$.

Meaningful information is conveyed by associating paths and partial equivalence relations. For instance, the equality between in.threads and ta at line 1 in Figure 7.1 can be expressed by associating Equal to the pair of subelements identified by the $.\mathit{threads}$ path in in and by $\varepsilon$ in ta. We call such a mapping, from a pair of access paths to a partial relation, a correlation. After setting the i-th element of ta to ti, the thread with the current state set to Blocked and everything else left unmodified, we could express the relation between in and ta by two correlations, namely:

\[
\begin{array}{l}
(.\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle\\[1ex]
(.\mathit{threads}\langle i \rangle @\mathit{Some}.t,\ \langle i \rangle @\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\]

To this end, we introduce correlation maps $\kappa \in K$, defined below.

Definition 7.3.2 (Correlation Maps $\kappa \in K$). Correlation maps $\kappa \in K$ are finite mappings from pairs of paths to partial equivalence relations $R \in \mathcal{R}$:

\[ \kappa : \Pi \times \Pi \to \mathcal{R} \]


Generally, for two given variables e and o, a correlation $(\pi, \rho) \mapsto R$ specifies that e and o have nested subelements, respectively identified by the inner paths $\pi$ and $\rho$, whose values are related by the relation R.

We conclude this subsection by specifying what it means for paths, correlations and correlation maps to be well-typed.

For characterizing the contexts in which an access path $\pi$ is well-typed, we need to consider the types of values to which it can be applied and the types of (sub)values to which it can lead. Therefore, in the following, we define a typing judgement for access paths as a three-place relation $\pi : \tau \to \tau'$, whose meaning is that $\pi$ can be applied to any value of type $\tau$ and, in that case, will always describe subvalues of type $\tau'$. Additionally, the typing judgement is also parameterized by a set of input variables I, which are the variables having the right to appear as identifiers for array accesses. This is detailed in Table 7.7.

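\[
\begin{array}{c}
\frac{}{\Gamma, I \vdash \varepsilon : \tau \to \tau}\ (\mathrm{WT}\varepsilon)
\\[2ex]
\frac{\tau = \mathit{struct}\{f_1 : \tau_1; \ldots; f_i : \tau_i; \ldots; f_n : \tau_n\} \quad \Gamma, I \vdash \pi_i : \tau_i \to \tau'}
     {\Gamma, I \vdash .f_i\,\pi_i : \tau \to \tau'}\ (\mathrm{WTStructAPath})
\\[2ex]
\frac{\tau = \mathit{variant}[C_1 : \tau_1 \mid \ldots \mid C_i : \tau_i \mid \ldots \mid C_n : \tau_n] \quad \Gamma, I \vdash \pi_i : \tau_i \to \tau'}
     {\Gamma, I \vdash @C_i\,\pi_i : \tau \to \tau'}\ (\mathrm{WTVarAPath})
\\[2ex]
\frac{\Gamma, I \vdash \pi_i : \tau \to \tau' \quad \Gamma(i) = \tau_i \quad i \in I}
     {\Gamma, I \vdash \langle i \rangle\,\pi_i : \mathit{arr}_{\tau_i}\langle \tau \rangle \to \tau'}\ (\mathrm{WTCellAPath})
\end{array}
\]

Table 7.7 – Well-Typed Access Paths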

Correlations are mappings from pairs of access paths to partial relations. Though the two access paths can be applied to values of different types, they both need to return subvalues of the same type $\tau'$. Furthermore, the partial equivalence relation associated to them has to be well-typed with respect to $\tau'$, as detailed in Table 7.5. The inference rule for well-typed correlations is shown in Table 7.8.

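\[
\frac{\Gamma, I \vdash \pi : \tau_l \to \tau' \quad \Gamma, I \vdash \rho : \tau_r \to \tau' \quad \Gamma \vdash R : \tau'}
     {\Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}\ (\mathrm{WTCorrelation})
\]

Table 7.8 – Well-Typed Correlations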
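The path-typing judgement of Table 7.7 can be sketched as a function that walks a path through a type and returns the type it leads to. All encodings here are assumptions made for illustration: a path is a tuple of steps `("f", field)`, `("C", constructor)` or `("i", index_variable)`, types are base-type names or tuples (`("struct", {...})`, `("variant", {...})`, `("arr", index_type, element_type)`), and the names `path_type` and `common_target` are hypothetical. Only the two path premises of the correlation rule are covered; checking the relation itself against the reached type is left out.

```python
def path_type(path, ty, gamma, inputs):
    """Type reached by applying path to a value of type ty, or None if ill-typed."""
    for kind, name in path:
        if kind == "f" and ty[0] == "struct" and name in ty[1]:
            ty = ty[1][name]                          # WTStructAPath
        elif kind == "C" and ty[0] == "variant" and name in ty[1]:
            ty = ty[1][name]                          # WTVarAPath
        elif (kind == "i" and ty[0] == "arr"
              and name in inputs and gamma.get(name) == ty[1]):
            ty = ty[2]                                # WTCellAPath
        else:
            return None
    return ty                                         # empty path: the type itself

def common_target(pi, rho, ty_l, ty_r, gamma, inputs):
    """Both paths of a correlation must lead to the same type."""
    t1 = path_type(pi, ty_l, gamma, inputs)
    t2 = path_type(rho, ty_r, gamma, inputs)
    return t1 if t1 is not None and t1 == t2 else None

thread_ty = ("struct", {"identifier": "int", "current_state": "state", "stack": "stack"})
option_ty = ("variant", {"None": "unit", "Some": ("struct", {"t": thread_ty})})
threads_ty = ("arr", "int", option_ty)
process_ty = ("struct", {"pid": "int", "current_thread": "int",
                         "address_space": "aspace", "threads": threads_ty})

rho = (("i", "i"), ("C", "Some"), ("f", "t"))          # path inside ta
pi = (("f", "threads"),) + rho                         # path inside in
print(common_target(pi, rho, process_ty, threads_ty, {"i": "int"}, {"i"}) == thread_ty)
```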


Finally, as shown in Table 7.9, a correlation map $\kappa$ is well-typed if all the correlations it contains are well-typed.

\[
\frac{\forall\, (\pi, \rho) \mapsto R \in \kappa,\quad \Gamma, I \vdash (\pi, \rho) \mapsto R : (\tau_l, \tau_r)}
     {\Gamma, I \vdash \kappa : (\tau_l, \tau_r)}\ (\mathrm{WTCorMaps})
\]

Table 7.9 – Well-Typed Correlation Maps

7.3.2 Alignment and Partial Order

There is no clear choice for a canonical form for correlations. For instance, it is equivalent to write $(\varepsilon, \varepsilon) \mapsto \{f \mapsto R\}$ and $(.f, .f) \mapsto R$. Is one superior to the other? Which one should be chosen? Operations can create and manipulate correlations in different manners that are hard to predict. New correlations can also be introduced while considering def-use chains in the transfer function presented later, in Section 7.4.1. Choosing between the two forms considerably limits flexibility. Not choosing a canonical form, however, has consequences as well: notably, it renders the definition of a partial order between correlation maps difficult. In order to compare two correlation maps $\kappa_1$ and $\kappa_2$, we cannot simply verify whether the path pairs are identical and compare their associated relations. A correlation of the second map could be linked in different manners to multiple mappings of the first.

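For instance, between a process p of the type used by our stop_thread example and an array ta of the same type as the field threads of the process, we might have the following correlation maps:

\[
\kappa_1:\quad (.\mathit{threads},\ \varepsilon) \mapsto \langle [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \rangle
\]

\[
\kappa_2:\quad
\begin{array}{l}
(.\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : \mathit{Any} \rangle\\[1ex]
(.\mathit{threads}\langle i \rangle @\mathit{Some}.t,\ \langle i \rangle @\mathit{Some}.t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\]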

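These correlation maps can be depicted as follows:

[Figure: for $\kappa_1$, the threads field of p is linked to the root $\varepsilon$ of ta by $R_1$; for $\kappa_2$, the same pair is linked by $R_2$, and the nested i-th elements are additionally linked by $R'_2$.]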

As illustrated above, in the given example map $\kappa_2$, in addition to the relation $R_2$ associated to $(.\mathit{threads}, \varepsilon)$, the relation associated to $(.\mathit{threads}\langle i \rangle @\mathit{Some}.t,\ \langle i \rangle @\mathit{Some}.t)$ and denoted by $R'_2$ expresses information about the values of the process' threads field and ta as well. These are nested in the i-th element of each, as identified by $\langle i \rangle @\mathit{Some}.t$. In order to compare these two correlation maps, we first have to determine the relationships between the pair of paths $(.\mathit{threads}, \varepsilon)$ from $\kappa_1$ and each pair of paths of $\kappa_2$. The first pair of paths in $\kappa_2$ is identical, whereas the second pair refers to elements that are further away from the root. Based on these relationships, we have to extract all the information relevant to $(.\mathit{threads}, \varepsilon)$ from $\kappa_2$ and consider it in its entirety. This amounts to:

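\[
(.\mathit{threads},\ \varepsilon) \mapsto \langle \mathit{Equal} \triangleright i : [\mathit{None} \mapsto \mathit{Any};\ \mathit{Some} \mapsto \{t \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}\}] \rangle
\]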

Having expressed the information from the $\kappa_2$ correlation map at the same level as the information of $\kappa_1$, i.e. that of the pair of paths $(.\mathit{threads}, \varepsilon)$, we can finally compare them and conclude that the information contained by $\kappa_2$ is more precise than the relation associated to $(.\mathit{threads}, \varepsilon)$ in $\kappa_1$. The relation associated to $(.\mathit{threads}, \varepsilon)$ in $\kappa_1$ captures the equality between the values of the identifier and stack fields of all active thread elements of the two arrays identified by the paths. The relation associated to $(.\mathit{threads}, \varepsilon)$ in $\kappa_2$ expresses the equality between all thread elements of the two arrays, except the i-th elements. Furthermore, if the i-th elements of the two arrays are active, it captures the equality between the values of the identifier and stack fields. Thus, by using the information contained by $\kappa_1$, we can conclude that for


all active elements of the two arrays, the values of 2 out of the 3 fields are equal; by using the more precise information contained by $\kappa_2$, we can conclude that all elements of the two arrays are equal, except the i-th one, for which the values of the same 2 out of 3 fields as in $\kappa_1$ are equal.

In the general case, for comparing two correlation maps $\kappa_1$ and $\kappa_2$, we need to collect, for each correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained by $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$, and verify whether this covers at least the same information as the relation R. This information could be scattered across multiple mappings of the correlation map $\kappa_1$. We call alignment the process of collecting, for any correlation $(\pi, \rho) \mapsto R$ in $\kappa_2$, all the information contained in $\kappa_1$ that refers to the elements identified by $(\pi, \rho)$. It is necessary in the absence of a canonical form, a trait of our approach that is both a weakness and a strength: it leads to complex computations, but gives considerable flexibility, as will be shown in Section 7.4.

For aligning, we first determine the relationships between paths by determining the relationship between the sequences of internal accesses that they represent. These can be identical, representing the same traversal to the same subelement of a value, or they can be completely unrelated, such as $.f$ and $.g$ for instance, representing accesses to two different fields of a structure. They can also represent sequences of accesses of different depths, one being the prefix of the other, i.e. being closer to the root. For example, the path $.f$ is a prefix of the path $.f\langle i \rangle$: the first represents the access to the field f, whereas the second one represents an access to the i-th element of the array nested in the field f.

To distinguish between these cases, we define a link type and a matching operator.

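Definition 7.3.3 (Link Type $\mu \in M$). A link type, denoted by $\mu \in M$, is defined as follows:

\[ \mu ::= \mathit{Identical} \mid \mathit{Left}\ \pi \mid \mathit{Right}\ \pi \mid \mathit{Incompatible} \]

Definition 7.3.4 (Matching Operator). The matching operator, written $\mathrm{match}$, retrieves the link $\mu$ between two paths:

\[
\mathrm{match} : \Pi \times \Pi \to M
\qquad
\mathrm{match}(\pi, \rho) =
\left\{\begin{array}{ll}
\mathit{Identical} & \pi = \rho\\
\mathit{Left}\ \pi' & \pi\,\pi' = \rho\\
\mathit{Right}\ \rho' & \rho\,\rho' = \pi\\
\mathit{Incompatible} & \text{otherwise}
\end{array}\right.
\]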

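The different cases are depicted in Table 7.11.

[Table: one diagram per case. $\mathrm{match}(\pi, \rho) = \mathit{Identical}$: $\pi$ and $\rho$ coincide. $\mathrm{match}(\pi, \rho) = \mathit{Left}\ \pi'$: $\rho$ extends $\pi$ by $\pi'$. $\mathrm{match}(\pi, \rho) = \mathit{Right}\ \rho'$: $\pi$ extends $\rho$ by $\rho'$. $\mathrm{match}(\pi, \rho) = \mathit{Incompatible}$: $\pi$ and $\rho$ diverge.]

Table 7.11 – Links between Access Paths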
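The matching operator of Definition 7.3.4 is essentially a prefix test. In the sketch below, a path is encoded as a tuple of steps, `("f", field)`, `("C", constructor)` or `("i", index_variable)`, and the empty tuple plays the role of the empty path; this encoding and the function name `match` are assumptions made for illustration.

```python
def match(pi, rho):
    """Link between two access paths, following Definition 7.3.4."""
    if pi == rho:
        return ("Identical",)
    if rho[:len(pi)] == pi:            # pi followed by a suffix gives rho
        return ("Left", rho[len(pi):])
    if pi[:len(rho)] == rho:           # rho followed by a suffix gives pi
        return ("Right", pi[len(rho):])
    return ("Incompatible",)

f = (("f", "f"),)
f_i = (("f", "f"), ("i", "i"))
g = (("f", "g"),)
print(match(f, f))      # same traversal
print(match(f, f_i))    # f is closer to the root than the cell access below it
print(match(f_i, f))
print(match(f, g))      # two different fields
```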

Definition 7.3.5 (Aligning). Aligning a correlation $(\pi, \rho) \mapsto R$ to another pair of paths $(\pi', \rho')$ is written $\mathrm{align}$:

\[
\mathrm{align} : (\Pi \times \Pi \times \mathcal{R}) \times (\Pi \times \Pi) \to \mathcal{R}
\qquad
\mathrm{align}([(\pi, \rho) \mapsto R],\ (\pi', \rho')) = R^{(\pi,\rho)}_{(\pi',\rho')}
\]

From R we obtain the information referring to the elements identified by $(\pi', \rho')$ and denote it by $R^{(\pi,\rho)}_{(\pi',\rho')}$. This is done by matching on $\pi$ and $\pi'$ on the one hand, and on $\rho$ and $\rho'$ on the other, and by distinguishing between the different cases. When the paths are identical, we can simply return the relation R. When the links between the paths differ, or when the paths are incompatible, we have to approximate to the least precise relation, thus returning Any. When $\pi$ and $\rho$ are shallower paths, i.e. closer to the root, we need to make a projection, written $\mathrm{proj}$. For example, aligning $(.f, \varepsilon) \mapsto \{a \mapsto R_a; b \mapsto R_b; c \mapsto R_c\}$ to $(.f.b,\ .b)$ consists in projecting $.b$ on the relation $\{a \mapsto R_a; b \mapsto R_b; c \mapsto R_c\}$ and thus obtaining $R_b$. More generically, this case is depicted below.

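[Figure: projection case. The known relation R is attached to a pair of elements closer to the root than the target elements, which are nested below them at $\delta$.]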

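For aligning the known correlation to the given pair of paths, we need to extract from R the information that is relevant for the nested element $\delta$, as depicted below.

[Figure: projection case, result. The part of R that concerns the nested element $\delta$ is extracted and attached to the target pair.]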

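On the contrary, if $\pi'$ and $\rho'$ are closer to the root, we need to perform an injection, written $\mathrm{inj}$. For example, aligning $(.f.b,\ .b) \mapsto R_b$ to $(.f, \varepsilon)$ consists in creating a relation $\{a \mapsto \mathit{Any}; b \mapsto R_b; c \mapsto \mathit{Any}\}$. More generically, this case can be depicted as follows.

[Figure: injection case. The known relation is attached to elements nested at $\delta$, below the target pair of paths $(\alpha\beta, \beta)$.]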

For aligning the known correlation to the given pair of paths, we need to express the relation $R_\delta$ at the level of the $(\alpha\beta, \beta)$ paths, a level that is closer to the root. This consists in creating a new, higher-level relation where the element identified by $\delta$ is mapped to $R_\delta$ and everything else is “filled” with Any, since nothing is known about the rest of the elements. This can be depicted as follows.

[Figure: injection case, result. At the level of $(\alpha\beta, \beta)$, the element $\delta$ is mapped to $R_\delta$ and its siblings are mapped to Any.]

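In the general case, $R^{(\pi,\rho)}_{(\pi',\rho')}$ is computed as defined below.

Definition 7.3.6 (Computation of $R^{(\pi,\rho)}_{(\pi',\rho')}$).

\[
R^{(\pi,\rho)}_{(\pi',\rho')} =
\left\{\begin{array}{ll}
R & \text{when } \mathrm{match}(\pi, \pi') = \mathrm{match}(\rho, \rho') = \mathit{Identical}\\
\mathrm{proj}(\sigma, R) & \text{when } \mathrm{match}(\pi, \pi') = \mathrm{match}(\rho, \rho') = \mathit{Left}\ \sigma\\
\mathrm{inj}(R, \sigma) & \text{when } \mathrm{match}(\pi, \pi') = \mathrm{match}(\rho, \rho') = \mathit{Right}\ \sigma\\
\mathit{Any} & \text{otherwise}
\end{array}\right.
\]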

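The projection and injection operators used above are defined as follows.

Definition 7.3.7 (Projection Operator).

\[
\mathrm{proj} : \Pi \times \mathcal{R} \rightharpoonup \mathcal{R}
\qquad
\mathrm{proj}(\pi, R) =
\left\{\begin{array}{ll}
R & \text{when } \pi = \varepsilon\\
\mathrm{proj}(\pi', \mathit{extr}_f(R)) & \text{when } \pi = .f\,\pi'\\
\mathrm{proj}(\pi', \mathit{extr}_C(R)) & \text{when } \pi = @C\,\pi'\\
\mathrm{proj}(\pi', \mathit{extr}_{\langle i \rangle}(R)) & \text{when } \pi = \langle i \rangle\,\pi'
\end{array}\right.
\]

Definition 7.3.8 (Injection Operator).

\[
\mathrm{inj} : \mathcal{R} \times \Pi \rightharpoonup \mathcal{R}
\qquad
\mathrm{inj}(R, \pi) =
\left\{\begin{array}{ll}
R & \text{when } \pi = \varepsilon\\
\{f_1 \mapsto \mathit{Any}; \ldots; f_i \mapsto \mathrm{inj}(R, \pi'); \ldots; f_n \mapsto \mathit{Any}\} & \text{when } \pi = .f\,\pi',\ f = f_i\\
{[C_1 \mapsto \mathit{Any}; \ldots; C_i \mapsto \mathrm{inj}(R, \pi'); \ldots; C_n \mapsto \mathit{Any}]} & \text{when } \pi = @C\,\pi',\ C = C_i\\
\langle \mathit{Any} \triangleright i : \mathrm{inj}(R, \pi') \rangle & \text{when } \pi = \langle i \rangle\,\pi'
\end{array}\right.
\]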

For applying the injection operator, we need to know the types of the elements onto which the relation is injected, i.e. in order to “fill” the unknown relations for fields or constructors with Any, we need to know which those fields or constructors are. Thus, in practice, we need to connect the types to the context.
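Projection, injection and the alignment of a single correlation (Definitions 7.3.6 to 7.3.8) can be sketched together. Everything below rests on assumptions made for illustration: relations are tuples (`"Equal"`, `"Any"`, `("struct", {...})`, `("variant", {...})`, `("arr", R)`, `("arri", R_def, i, R_exc)`), paths are tuples of steps `("f", name)`, `("C", name)` or `("i", variable)`, types are base-type names or tuples with arrays as `("arr", index_type, element_type)`, and the type needed by injection is passed explicitly as `ty`, the type of the element identified by the target pair of paths.

```python
EQUAL, ANY = "Equal", "Any"

def join(r, s):
    """Join of Table 7.2, for relations of compatible shapes."""
    if ANY in (r, s):
        return ANY
    if r == EQUAL:
        return s
    if s == EQUAL:
        return r
    if r[0] == "arr" and s[0] == "arri":
        r, s = s, r
    if r[0] in ("struct", "variant"):
        return (r[0], {k: join(r[1][k], s[1][k]) for k in r[1]})
    if r[0] == "arr":
        return ("arr", join(r[1], s[1]))
    if s[0] == "arr":
        return ("arri", join(r[1], s[1]), r[2], join(r[3], s[1]))
    if r[2] == s[2]:
        return ("arri", join(r[1], s[1]), r[2], join(r[3], s[3]))
    return ("arr", join(join(r[1], s[1]), join(r[3], s[3])))

def extr(r, step):
    """Extractions of Table 7.4, selected by the kind of path step."""
    kind, name = step
    if r in (EQUAL, ANY):
        return r
    if kind in ("f", "C"):
        return r[1][name]
    if r[0] == "arr":
        return r[1]
    return r[3] if r[2] == name else join(r[1], r[3])

def proj(path, r):
    """Projection: follow the path inside the relation."""
    for step in path:
        r = extr(r, step)
    return r

def inj(r, path, ty):
    """Injection: wrap r so that it sits at `path` inside a relation for type ty."""
    if not path:
        return r
    (kind, name), rest = path[0], path[1:]
    if kind == "i":
        return ("arri", ANY, name, inj(r, rest, ty[2]))
    tag = "struct" if kind == "f" else "variant"
    return (tag, {k: inj(r, rest, t) if k == name else ANY
                  for k, t in ty[1].items()})

def match(pi, rho):
    if pi == rho:
        return ("Identical",)
    if rho[:len(pi)] == pi:
        return ("Left", rho[len(pi):])
    if pi[:len(rho)] == rho:
        return ("Right", pi[len(rho):])
    return ("Incompatible",)

def align(pi, rho, r, pi2, rho2, ty):
    """Align the correlation (pi, rho) -> r to the pair (pi2, rho2)."""
    link = match(pi, pi2)
    if link != match(rho, rho2):       # different links: nothing can be said
        return ANY
    if link[0] == "Identical":
        return r
    if link[0] == "Left":
        return proj(link[1], r)
    if link[0] == "Right":
        return inj(r, link[1], ty)
    return ANY                         # both pairs incompatible

# The two examples of the text, with placeholder relations for Ra, Rb, Rc.
ra, rb, rc = EQUAL, ("arr", EQUAL), ANY
f, b = ("f", "f"), ("f", "b")
abc_ty = ("struct", {"a": "int", "b": ("arr", "int", "int"), "c": "int"})
print(align((f,), (), ("struct", {"a": ra, "b": rb, "c": rc}), (f, b), (b,), None))
print(align((f, b), (b,), rb, (f,), (), abc_ty))
```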

Aligning a correlation map $\kappa \in K$ to $(\pi', \rho')$ amounts to performing this operation for each element $(\pi, \rho) \mapsto R$ of $\kappa$ and intersecting the results with the $\wedge_{\mathcal{R}}$ operator (Definition 7.2.4).

Definition 7.3.9 (Aligning Correlation Maps).

\[
\mathrm{align}(\kappa,\ (\pi', \rho')) = \bigwedge_{((\pi,\rho) \mapsto R)\, \in\, \kappa} R^{(\pi,\rho)}_{(\pi',\rho')}
\]

74 Intraprocedural Correlation Analysis 155

The obtained results R(πρ)(πprimeρprime) are intersected in order to take into account all the in-

formation scattered across the different elements of κ and thus to obtain the mostprecise partial equivalence relation that is contained in κ about the elements identifiedby (πprime ρprime)
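The alignment of a whole correlation map can be sketched in Python as below. This is a minimal model under stated assumptions, not the thesis's implementation: it handles structures only, encodes relations as `"Equal"`, `"Any"` or dictionaries, encodes paths as tuples of field names, and takes the type `ty` of the target element as an explicit argument. The helper names (`match`, `project`, `inject`, `meet`, `align_one`, `align`) are hypothetical.

```python
from functools import reduce

EQUAL, ANY = "Equal", "Any"


def match(p, q):
    """Link between two paths: Identical, Left s (q = p.s) or Right s (p = q.s)."""
    if p == q:
        return ("Identical", ())
    if q[:len(p)] == p:
        return ("Left", q[len(p):])
    if p[:len(q)] == q:
        return ("Right", p[len(q):])
    return None  # incompatible paths


def project(path, rel):
    for field in path:
        if isinstance(rel, dict):
            rel = rel[field]
    return rel


def inject(rel, path, ty):
    if not path:
        return rel
    return {f: inject(rel, path[1:], ty[f]) if f == path[0] else ANY for f in ty}


def meet(r1, r2):
    """Greatest lower bound: Equal is the most precise relation, Any the least."""
    if r1 == ANY:
        return r2
    if r2 == ANY:
        return r1
    if r1 == EQUAL or r2 == EQUAL:
        return EQUAL
    return {f: meet(r1[f], r2[f]) for f in r1}


def align_one(paths, rel, target, ty):
    """Align one correlation (paths -> rel) to the target pair of paths."""
    link_left, link_right = match(paths[0], target[0]), match(paths[1], target[1])
    if link_left is None or link_left != link_right:
        return ANY
    kind, sigma = link_left
    if kind == "Identical":
        return rel
    if kind == "Left":               # target is nested deeper: project
        return project(sigma, rel)
    return inject(rel, sigma, ty)    # target is closer to the root: inject


def align(kappa, target, ty):
    """Meet of all element-wise alignments; Any when kappa is empty."""
    return reduce(meet, (align_one(k, r, target, ty) for k, r in kappa.items()), ANY)


kappa = {(("f", "b"), ("b",)): EQUAL, (("f", "a"), ("a",)): EQUAL}
fields = {"a": None, "b": None, "c": None}
print(align(kappa, (("f",), ()), fields))  # {'a': 'Equal', 'b': 'Equal', 'c': 'Any'}
```

The example shows why the results are intersected: each element of the map contributes knowledge about one field, and only the meet combines both.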

Finally, we can define the preorder for correlation maps.

Definition 7.3.10. Correlation Maps Preorder $\sqsubseteq$

$$
\kappa_1 \sqsubseteq \kappa_2 \iff \forall [(\pi,\rho) \mapsto R] \in \kappa_2,\ \mathsf{align}(\kappa_1, (\pi,\rho)) \sqsubseteq_R R
$$

A correlation map $\kappa_1$ is therefore more precise than another correlation map $\kappa_2$ if the relation obtained by aligning $\kappa_1$ to any pair of paths $(\pi, \rho)$ of $\kappa_2$ is more precise than $R$, the relation mapped to this pair in $\kappa_2$. By definition, any correlation map $\kappa \in \mathcal{K}$ is smaller than $\emptyset$, the empty correlation map. Therefore, the empty correlation map is the top element of the correlation maps semilattice. A bottom element does not make sense in this case, as it would have to map to $\mathit{Equal}$ any pair of paths denoting (sub)elements having compatible types.

The join operation between two correlation maps is denoted by $\vee$.

Definition 7.3.11. Join Operation $\vee$ for Correlation Maps

$$
\kappa_1 \vee \kappa_2 = \kappa_3 \iff \forall [(\pi,\rho) \mapsto R] \in \kappa_1,\ \kappa_3(\pi,\rho) = R \vee_R \mathsf{align}(\kappa_2, (\pi,\rho))
$$

It consists in aligning the correlation map $\kappa_2$ to each correlation $(\pi, \rho) \mapsto R$ in $\kappa_1$ and joining the obtained aligned relation with $R$. We note that the correlation map obtained by joining $\kappa_1$ and $\kappa_2$ contains the same keys as $\kappa_1$. We could have expressed join by aligning the first correlation map to the elements of the second map. This would lead to results that have different forms, e.g., $(\varepsilon, \varepsilon) \mapsto \{f \mapsto R\}$ versus $(f, f) \mapsto R$, but which are equivalent by definition.

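The meet operation between two correlation maps is denoted by $\wedge$.

Definition 7.3.12. Meet Operation $\wedge$ for Correlation Maps

$$
\kappa_1 \wedge \kappa_2 = \kappa_3 \iff \forall (\pi,\rho),\ \kappa_3(\pi,\rho) =
\begin{cases}
R \wedge_R R' & \text{when } (\pi,\rho) \mapsto R \in \kappa_1 \text{ and } (\pi,\rho) \mapsto R' \in \kappa_2 \\
R & \text{when } (\pi,\rho) \mapsto R \in \kappa_1 \\
R' & \text{when } (\pi,\rho) \mapsto R' \in \kappa_2
\end{cases}
$$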

7.4 Intraprocedural Correlation Analysis

7.4.1 Intraprocedural Correlation Summaries and Analysis

As was the case for the dependency analysis presented in Chapter 5, we are working with a control flow graph (CFG) representation of the predicates' bodies. Recall that nodes represent program states and edges are defined by statements with a particular exit label $\lambda$. In our case, all the outgoing edges of a node $n$ bear the different cases of the same statement $s$ found at the program point $n$. For each statement $s$, there is an edge labeled $s, \lambda_k$ for each of its possible exit labels $\lambda_k$ (as discussed in Section 4.2). However, similarly to the dependency analysis, our correlation analysis does not depend on this specificity.

Intraprocedurally, correlation information has to be kept at each point of the control flow graph, for each input and output pair of the node.

Definition 7.4.1. Intraprocedural Correlation Summaries
An intraprocedural correlation summary is a mapping from pairs of variables $v \in V$ to correlation maps:

$$K \in \mathbb{K}, \qquad K : V \times V \to \mathcal{K}$$

There is one special case, called NoCorrelation, which associates $\mathit{Any}$, the least precise partial relation, to any pair of variables on any pair of valid, compatible paths. It is the top element at the intraprocedural level. Unreachable is used for nodes that cannot be reached, as its name implies, and constitutes the bottom element at the intraprocedural level.

For each node of a given control flow graph, $K(e, o)$ retrieves the correlation map between the local variable $e$ and the output variable $o$. If a mapping for $e$ and $o$ does not currently exist, $K(e, o)$ retrieves the correlation $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$ when $e = o$, or the empty correlation map $\emptyset$ otherwise.

Establishing the partial order $\sqsubseteq_K$ and the join operation $\vee_K$ is straightforward: $\sqsubseteq$ (Definition 7.3.10) and $\vee$ (Definition 7.3.11) are extended pointwise to an intraprocedural summary, for each ordered input-output pair and its associated correlation map.

Definition 7.4.2. Partial Order for Intraprocedural Correlation Summaries

$$\sqsubseteq_K\ \subseteq \mathbb{K} \times \mathbb{K}, \qquad K_1 \sqsubseteq_K K_2 \iff \forall e, o \in V,\ K_1(e, o) \sqsubseteq K_2(e, o)$$

Definition 7.4.3. Join Operation for Intraprocedural Correlation Summaries

$$\vee_K : \mathbb{K} \times \mathbb{K} \to \mathbb{K}, \qquad K_1 \vee_K K_2 = K_3 \iff \forall (e, o),\ K_3(e, o) = K_1(e, o) \vee K_2(e, o)$$

Our correlation analysis is a backward data-flow analysis that computes an intraprocedural summary at each point of the control flow graph. This summary represents the correlations at the node's entry point. For each exit label, the analysis traverses the control flow graph starting with the corresponding exit node. The intraprocedural summary for the currently analysed label is initialized with pairs between the local value of each output variable associated with the label and the final value of the same output variable, mapped to $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$. The analysis traverses the control flow graph and gradually refines the correlations using Kildall's worklist algorithm (Kildall 1973) until a fixed point is reached. Table 7.12 summarizes the representation and the general equation of the statements. For each statement, the presented data-flow equation operates on the intraprocedural summaries of the statement's successor nodes. The intraprocedural summary at the entry point of the node is obtained by joining the contributions of each outgoing edge.
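The overall iteration scheme can be sketched as a generic backward worklist solver. The Python code below is an illustrative skeleton only: the function `backward_fixpoint` and its parameters are hypothetical names, and the toy lattice used in the example (sets of strings joined by union) merely stands in for intraprocedural correlation summaries and their join.

```python
from collections import deque


def backward_fixpoint(edges, exit_values, transfer, join, bottom):
    """Worklist iteration for a backward analysis.

    edges:       {node: [(label, successor), ...]}
    exit_values: fixed values of the exit nodes
    transfer:    (node, label, successor_value) -> contribution of that edge
    join, bottom: join operation of the lattice and its identity element
    """
    nodes = set(edges) | {s for outs in edges.values() for _, s in outs}
    value = {n: exit_values.get(n, bottom) for n in nodes}
    preds = {n: set() for n in nodes}
    for n, outs in edges.items():
        for _, s in outs:
            preds[s].add(n)

    work = deque(n for n in nodes if n not in exit_values)
    while work:
        n = work.popleft()
        if n in exit_values:
            continue  # exit nodes keep their initial value
        new = bottom
        for label, succ in edges.get(n, []):
            new = join(new, transfer(n, label, value[succ]))
        if new != value[n]:
            value[n] = new
            work.extend(preds[n])  # predecessors must be recomputed
    return value


# Toy instance: node 2 has a true edge to exit 3 and a false edge to exit 4.
edges = {1: [("true", 2)], 2: [("true", 3), ("false", 4)]}
exits = {3: frozenset({"exit"}), 4: frozenset()}   # node 4 plays "Unreachable"
result = backward_fixpoint(
    edges,
    exits,
    transfer=lambda n, label, v: (v | {f"{n}:{label}"}) if v else v,
    join=lambda a, b: a | b,
    bottom=frozenset(),
)
print(sorted(result[1]))  # ['1:true', '2:true', 'exit']
```

In the toy instance the empty set behaves like Unreachable: it is propagated unchanged by the transfer function and is the identity of the join, so the unreachable exit does not affect the result.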

Definition 7.4.4. The contribution of an edge $(n, n_i)$ labeled with $s$ and $\lambda_i$ is given by $C_{s,\lambda_i}(K_{n_i})$, where $C_{s,\lambda_i}(\cdot) \in \mathbb{C}$ is the transfer function of the edge labeled $s, \lambda_i$.

We note that four statements supported by αSmil have no write effects, and thus no contribution of their own, and are therefore not included in Table 7.12: the equality test, the no-operation, the partial structure equality test and the possible variant test. Except for the no-operation statement, the correlation information at their entry point is obtained by simply joining the intraprocedural summaries of their successor nodes on the true and false exit labels. For the no-operation statement, the correlation information at the entry point is identical to the intraprocedural summary of its only successor node, the one on the true exit label.

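Table 7.12 – Statements – Representations and Data-Flow Equations

Representation: a node $n$ with summary $K_n$ has outgoing edges labeled $s, \lambda_1$, ..., $s, \lambda_i$, ..., $s, \lambda_k$ leading to the successor nodes $n_1$, ..., $n_i$, ..., $n_k$ with summaries $K_{n_1}$, ..., $K_{n_i}$, ..., $K_{n_k}$.

Equation:

$$K_n = \bigvee\nolimits_{K} \left\{\, C_{s,\lambda_i}(K_{n_i}) \;\middle|\; n \xrightarrow{\,s,\lambda_i\,} n_i \,\right\}$$

Transfer functions $C_{s,\lambda}(\cdot)$:

| Statement | $c_{s,\lambda}$ | $\mathit{kill}_\lambda$ |
|---|---|---|
| Assignment: $o = e$ | $(e, o) \mapsto [(\varepsilon, \varepsilon) \mapsto \mathit{Equal}]$ | $\{o\}$ on true |
| New Struct: $r = \{e_1, \ldots, e_n\}$ | $\forall i,\ 1 \le i \le n$: $(e_i, r) \mapsto [(\varepsilon, f_i) \mapsto \mathit{Equal}]$ | $\{r\}$ on true |
| Destructure: $\{o_1, \ldots, o_n\} = r$ | $\forall i,\ 1 \le i \le n$: $(r, o_i) \mapsto [(f_i, \varepsilon) \mapsto \mathit{Equal}]$ | $\{o_i\}$ on true |
| Get Field: $o = r.f_i$ | $(r, o) \mapsto [(f_i, \varepsilon) \mapsto \mathit{Equal}]$ | $\{o\}$ on true |
| Set Field: $r' = \{r \text{ with } f_i = e\}$ | $(r, r') \mapsto [(\varepsilon, \varepsilon) \mapsto \{f_1 \mapsto \mathit{Equal};\ \ldots;\ f_i \mapsto \mathit{Any};\ \ldots;\ f_n \mapsto \mathit{Equal}\}]$ and $(e, r') \mapsto [(\varepsilon, f_i) \mapsto \mathit{Equal}]$ | $\{r'\}$ on true |
| Create Var: $v = C_p[e]$ | $(e, v) \mapsto [(\varepsilon, C_p\,e) \mapsto \mathit{Equal}]$ | $\{v\}$ on true |
| Var Switch: $\mathit{switch}(v) \text{ as } [o_1 \mid \ldots \mid o_n]$ | $(v, o_i) \mapsto [(C_i\,e, \varepsilon) \mapsto \mathit{Equal}]$ | $\{o_i\}$ on $\lambda_{C_i}$ |
| Array Get: $o = a[i]$ | $(a, o) \mapsto [(\langle i \rangle, \varepsilon) \mapsto \mathit{Equal}]$ | $\{o\}$ on true |
| Array Set: $a' = [a \text{ with } i = e]$ | $(a, a') \mapsto [(\varepsilon, \varepsilon) \mapsto \langle \mathit{Equal}\;\; i\;\; \mathit{Any} \rangle]$ and $(e, a') \mapsto [(\varepsilon, \langle i \rangle) \mapsto \mathit{Equal}]$ | $\{a'\}$ on true |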
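To make one row of the table concrete, the Set Field entry can be written as a small Python function. The encoding is an assumption made for this sketch (variable names as strings, paths as tuples, relations as strings or dictionaries), and `set_field` is a hypothetical helper, not a function of the thesis's tool.

```python
EQUAL, ANY = "Equal", "Any"
EPS = ()  # the empty path


def set_field(r, r_new, field, e, fields):
    """Correlations and kill set, on the true label, of  r_new = {r with field = e}.

    `fields` lists all the field names of the structure type of r.
    """
    correlations = {
        # Every field is preserved except the updated one.
        (r, r_new): {(EPS, EPS): {f: ANY if f == field else EQUAL for f in fields}},
        # The updated field of the result equals the whole value of e.
        (e, r_new): {(EPS, (field,)): EQUAL},
    }
    return correlations, {r_new}


c, kill = set_field("in", "o", "threads", "ta",
                    ["threads", "pid", "crt_thread", "adr_space"])
print(c[("in", "o")][(EPS, EPS)])
# {'threads': 'Any', 'pid': 'Equal', 'crt_thread': 'Equal', 'adr_space': 'Equal'}
print(c[("ta", "o")])   # {((), ('threads',)): 'Equal'}
```

The example uses the field names of the process structure that appears in the walk-through of stop_thread later in this chapter.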

The transfer function $C_{s,\lambda}(\cdot)$ formalizes the correlations created by the statement $s$ on the label $\lambda$ between its local input variables and its local output variables, denoted by $c_{s,\lambda}$, as well as the set $\mathit{kill}_\lambda$ of variables whose values have been redefined by the statement $s$ on the label $\lambda$. These are shown in Table 7.12. There is one crucial difference between transfer functions $C_{s,\lambda}(\cdot)$ and intraprocedural summaries $K$. An intraprocedural summary $K$ implicitly maps any pair $(v, v)$, for $v \in V$, to $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$. On the contrary, in $c_{s,\lambda}$, when the variable $v$ is used as both input and output by the statement $s$, the pair $(v, v)$ is mapped to the correlation map known between the old value of the input $v$ and the fresh value of the output $v$. Otherwise, when $v$ is an output, i.e., $v \in \mathit{kill}_\lambda$, but not an input of $s$, $(v, v)$ is mapped to $\emptyset$. We remark that $K$ represents a state, while $c_{s,\lambda}$ represents a transition.

In order to obtain the contribution $C_{s,\lambda_i}(K_{n_i})$ of an edge labeled with $s$ and $\lambda_i$, we need to connect the information given by $c_{s,\lambda_i}$ to the information contained in the intraprocedural summary $K_{n_i}$. For example, at the entry of node 3 in Figure 7.1 (on page 138), when considering the scenario in which the predicate exits with true, the intraprocedural summary contains the mapping

$$(th, o) \mapsto \left[ (\mathit{Some}\,t,\ \mathit{threads}\langle i \rangle \mathit{Some}\,t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\} \right]$$

On the true edge, statement 2 creates the mapping

$$(ta, th) \mapsto [(\langle i \rangle, \varepsilon) \mapsto \mathit{Equal}]$$

Intuitively, since we are traversing the graph backwards and we are mapping ordered (local) input-output pairs, $(ta, th)$ and $(th, o)$ can be seen as a def-use pair: the correlation associated to $(ta, th)$ expresses the relation between the defined value of $th$ and the input $ta$ used for creating it, while the correlation associated to $(th, o)$ shows a subsequent use of that value of $th$ for creating $o$. The contribution of statement 2 on the true edge should capture this flow of the value of $ta$ to the value of $o$ through the variable $th$. Thus, it should contain a mapping for the pair $(ta, o)$. In the general case, we need to detect any variable $r$ such that $[(p, r) \mapsto \kappa] \in c_{s,\lambda_i}$ and $[(r, q) \mapsto \kappa'] \in K_{n_i}$, and compute the mapping for $(p, q)$ in $C_{s,\lambda_i}(K_{n_i})$.

In order to compute the correlation map associated to $(ta, o)$, we take into account the fact that both the right path $\varepsilon$ of $c_{s,\lambda}(ta, th)$ and the left path $\mathit{Some}\,t$ of $K_{n_3}(th, o)$ refer to the $th$ variable. However, they do not represent traversals of the same depth: $\varepsilon$ refers to the entire value of $th$, while $\mathit{Some}\,t$ refers to the value below the constructor $\mathit{Some}$. Between $ta$ and $o$, we can conclude that the values nested under the $\mathit{Some}$ constructor of the $i$-th elements are related:

$$(ta, o) \mapsto \left[ (\langle i \rangle \mathit{Some}\,t,\ \mathit{threads}\langle i \rangle \mathit{Some}\,t) \mapsto \{\mathit{identifier} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\} \right]$$

We call composition the process of obtaining the correlation map associated to $(ta, o)$ from the correlations associated to $(ta, th)$ and $(th, o)$.

In the general case, the composition operation is denoted by $\circ$ and refers to the process of computing the flow of a variable $p$ to a variable $q$ through an intermediate variable $r$. Thus, when knowing that $(p, r) \mapsto [(\pi, \rho) \mapsto R]$ and that $(r, q) \mapsto [(\pi', \rho') \mapsto R']$, we must first obtain the link (Definition 7.3.3) between the paths $\rho$ and $\pi'$, which relate subvalues of $r$ to subvalues of $p$ and $q$, respectively. This link is obtained by matching with $\mathsf{match}$ (Definition 7.3.4). In the context of the example given above, $\rho$ and $\pi'$ are the paths referring to subvalues of the $th$ variable, i.e., $\varepsilon$ and $\mathit{Some}\,t$, respectively. If the two paths are incompatible, i.e., they refer to different, unrelated subvalues of $r$, there is no flow between $p$ and $q$ through $r$. If the paths are compatible, we can compute the correlation between $p$ and $q$ by distinguishing between the three different possible link cases obtained with $\mathsf{match}$.

The case when the same subvalue of $r$, identified by $\rho$ (and the identical $\pi'$), is related to both $p$ and $q$ is depicted below.

[Figure: case $\mathsf{match}(\rho, \pi') = \mathit{Identical}$. The subvalue of $p$ at $\pi$ is related by $R$ to the subvalue of $r$ at $\rho$; the subvalue of $r$ at $\pi'$ is related by $R'$ to the subvalue of $q$ at $\rho'$.]

In this case, computing the flow from $p$ to $q$ through $r$ is rather straightforward. Since the same subvalue of $r$ is related to the subvalue of $p$ identified by $\pi$ and to the subvalue of $q$ identified by $\rho'$, we can relate these two subvalues and map the pair $(\pi, \rho')$ to the relation obtained by composing $R$ and $R'$. We note that, given the special form of partial relations $R \in \mathcal{R}$, the compose operation at this level is equivalent to $\vee_R$ (Definition 7.2.3).¹ The computation of the correlation for $p$ and $q$ is depicted below.

[Figure: case $\mathsf{match}(\rho, \pi') = \mathit{Identical}$. The subvalues of $p$ at $\pi$ and of $q$ at $\rho'$ are related by $R \vee_R R'$.]

The subelements of $r$ related to $p$ and to $q$, respectively, can also have different granularities, one being nested deeper in $r$ than the other. For instance, the subvalue of $r$ identified by the path $\rho$ can be closer to the root than its subelement identified by $\pi'$, which is related to $q$. This case is depicted below.

¹ However, this would no longer be the case for a more complex partial relation type including not only equivalences but also more general relations.

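[Figure: case $\mathsf{match}(\rho, \pi') = \mathit{Left}\ \sigma$. The subvalue of $p$ at $\pi$ is related by $R$ to the subvalue of $r$ at $\rho$; the subelement $\sigma$ of that subvalue of $r$, i.e., the one at $\pi'$, is related by $R'$ to the subvalue of $q$ at $\rho'$.]

In this case, we can only detect the flow of $p$ to $q$ at the level of the subelement of $r$ that is related to both $p$ and $q$, i.e., the subelement nested deeper. Thus, in order to compute the correlation between $p$ and $q$, we need to project $\sigma$ on $R$ and to compose the obtained relation with $R'$. This is summarized by the following figure.

[Figure: case $\mathsf{match}(\rho, \pi') = \mathit{Left}\ \sigma$. The subvalues of $p$ at $\pi\sigma$ and of $q$ at $\rho'$ are related by $\mathsf{proj}(\sigma, R) \vee_R R'$.]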

Finally, in the complementary case, the subvalue of $r$ identified by the path $\rho$ and correlated to $p$ can be nested deeper than the subvalue identified by $\pi'$, which is correlated to $q$. This case is depicted below.

[Figure: case $\mathsf{match}(\rho, \pi') = \mathit{Right}\ \sigma$. The subvalue of $r$ at $\pi'$ is related by $R'$ to the subvalue of $q$ at $\rho'$; its subelement $\sigma$, i.e., the one at $\rho$, is related by $R$ to the subvalue of $p$ at $\pi$.]

As in the previous case, we can only detect the flow of $p$ to $q$ at the level of the subelement of $r$ that is related to both $p$ and $q$, i.e., the subelement nested deeper. In this case, we need to project $\sigma$ on $R'$ and to compose the obtained relation with $R$. The flow between $p$ and $q$ is at the level of the subvalues identified by $\pi$ and $\rho'\sigma$, respectively. This is illustrated below.

[Figure: case $\mathsf{match}(\rho, \pi') = \mathit{Right}\ \sigma$. The subvalues of $p$ at $\pi$ and of $q$ at $\rho'\sigma$ are related by $R \vee_R \mathsf{proj}(\sigma, R')$.]

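Formally, if the $\rho$ and $\pi'$ paths are compatible, we compose the correlation elements $(\pi, \rho) \mapsto R$ and $(\pi', \rho') \mapsto R'$, thereby obtaining a new correlation element $(\pi^\bullet, \rho^\bullet) \mapsto R^\bullet$, which is computed as shown below.

Definition 7.4.5. Computing $(\pi^\bullet, \rho^\bullet) \mapsto R^\bullet$

$$
(\pi^\bullet, \rho^\bullet) = (\pi, \rho) \bullet (\pi', \rho') \overset{\text{def}}{=}
\begin{cases}
(\pi,\ \rho') & \text{when } \mathsf{match}(\rho, \pi') = \mathit{Identical} \\
(\pi\sigma,\ \rho') & \text{when } \mathsf{match}(\rho, \pi') = \mathit{Left}\ \sigma \\
(\pi,\ \rho'\sigma) & \text{when } \mathsf{match}(\rho, \pi') = \mathit{Right}\ \sigma
\end{cases}
$$

$$
R^\bullet = R \circ R' \overset{\text{def}}{=}
\begin{cases}
R \vee_R R' & \text{when } \mathsf{match}(\rho, \pi') = \mathit{Identical} \\
\mathsf{proj}(\sigma, R) \vee_R R' & \text{when } \mathsf{match}(\rho, \pi') = \mathit{Left}\ \sigma \\
R \vee_R \mathsf{proj}(\sigma, R') & \text{when } \mathsf{match}(\rho, \pi') = \mathit{Right}\ \sigma
\end{cases}
$$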

We note that the use of the projection operation (Definition 7.3.7) for both compatible, non-identical link cases for the access paths of $r$ related to $p$ and to $q$, respectively, is a consequence of not choosing a canonical form for correlations. The flexibility conferred by the absence of a canonical correlation form is visible at the composition level.
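The composition of two correlation elements can be sketched in Python as follows. The encoding is an assumption made for this example: paths are tuples of opaque strings (`"<i>"` for an array cell, `"Some t"` for a constructor argument), relations are `"Equal"`, `"Any"` or dictionaries over structure fields, and `compose` is a hypothetical helper name. Projection is only modelled for structures and atomic relations.

```python
EQUAL, ANY = "Equal", "Any"


def match(p, q):
    """Link between two paths: Identical, Left s (q = p.s) or Right s (p = q.s)."""
    if p == q:
        return ("Identical", ())
    if q[:len(p)] == p:
        return ("Left", q[len(p):])
    if p[:len(q)] == q:
        return ("Right", p[len(q):])
    return None


def project(path, rel):
    for step in path:
        if isinstance(rel, dict):
            rel = rel[step]
    return rel


def join(r1, r2):
    """Least upper bound: Any absorbs, Equal is the identity."""
    if r1 == ANY or r2 == ANY:
        return ANY
    if r1 == EQUAL:
        return r2
    if r2 == EQUAL:
        return r1
    return {f: join(r1[f], r2[f]) for f in r1}


def compose(elem1, elem2):
    """Compose (pi, rho) -> R with (pi', rho') -> R'; None if paths are incompatible."""
    (pi, rho), r1 = elem1
    (pi2, rho2), r2 = elem2
    link = match(rho, pi2)
    if link is None:
        return None
    kind, sigma = link
    if kind == "Identical":
        return (pi, rho2), join(r1, r2)
    if kind == "Left":
        return (pi + sigma, rho2), join(project(sigma, r1), r2)
    return (pi, rho2 + sigma), join(r1, project(sigma, r2))


# (ta, th) -> [(<i>, eps) -> Equal] composed with the summary entry for (th, o):
thread_rel = {"identifier": EQUAL, "current_state": ANY, "stack": EQUAL}
cs = ((("<i>",), ()), EQUAL)
k3 = ((("Some t",), ("threads", "<i>", "Some t")), thread_rel)
paths, rel = compose(cs, k3)
print(paths)  # (('<i>', 'Some t'), ('threads', '<i>', 'Some t'))
print(rel)    # {'identifier': 'Equal', 'current_state': 'Any', 'stack': 'Equal'}
```

This reproduces the $(ta, o)$ mapping of the running example: the link between $\varepsilon$ and the path under the Some constructor is a Left case, so the constructor path is appended to the left path of the result.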

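The composition of correlation maps is denoted by $\circ$ and defined below.

Definition 7.4.6. Composition of Correlation Maps
Computing $\kappa_1 \circ \kappa_2$ amounts to intersecting the compositions of all correlation elements from $\kappa_1$ and $\kappa_2$:

$$
(\kappa_1 \circ \kappa_2)(\pi^\bullet, \rho^\bullet) = \bigwedge\nolimits_{R} \left\{\, R \circ R' \;\middle|\; (\pi,\rho) \mapsto R \in \kappa_1,\ (\pi',\rho') \mapsto R' \in \kappa_2,\ (\pi^\bullet,\rho^\bullet) = (\pi,\rho) \bullet (\pi',\rho') \,\right\}
$$

Finally, the contribution $C_{s,\lambda_i}(K_{n_i})$ is obtained as defined below.

Definition 7.4.7. Contribution $C_{s,\lambda_i}(K_{n_i})$

$$
\circ : \mathbb{C} \times \mathbb{K} \to \mathbb{K}, \qquad c_{s,\lambda} \circ K = K' \ \text{ where } \ K'(p, q) = \bigwedge_{r} \left( c_{s,\lambda}(p, r) \circ K(r, q) \right)
$$

It is depicted in Figure 7.5.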

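[Figure: a statement $s$ with transfer functions $(c_{s,\lambda_1}, \Delta_{\lambda_1})$, ..., $(c_{s,\lambda_n}, \Delta_{\lambda_n})$ and successor summaries $K_{\lambda_1}$, ..., $K_{\lambda_n}$; the compositions $c_{s,\lambda_1} \circ K_{\lambda_1}$, ..., $c_{s,\lambda_n} \circ K_{\lambda_n}$ are combined with $\vee_K$.]

Figure 7.5 – Entry Point – Correlation Information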

We conclude this section by specifying what it means for intraprocedural correlation summaries to be well-formed, showing the corresponding inference rule in Table 7.19. Only ordered input-output pairs can appear as keys in intraprocedural mappings. Therefore, the well-formedness judgement is parameterized by the set of input variables $I$ and by the set of output variables $O$. The former are the variables that may appear as left members of the variable pairs, while the latter are the variables that may appear as right members of the variable pairs. The correlation map associated to each such input-output pair must be well-typed with respect to the types of the variables, as given by the typing environment $\Gamma$ (Definition 4.3.1). The typing judgement for correlation maps was shown in Table 7.9.

$$
\frac{\forall (e, o) \mapsto \kappa \in K: \quad \Gamma(e) = \tau_e \quad \Gamma(o) = \tau_o \quad e \in I \quad o \in O \quad \Gamma, I \vdash \kappa : (\tau_e, \tau_o)}{\Gamma, I, O \vdash K}\ \textsc{WFIntraCor}
$$

Table 7.19 – Well-Formed Intraprocedural Correlation Summaries

7.4.2 Intraprocedural Correlation Analysis Illustrated

To better illustrate our correlation analysis at an intraprocedural level, and to summarize everything that has been presented so far in this chapter, we exemplify the mechanism behind it step by step on the predicate stop_thread, discussed in Section 7.1.1 on page 138. We consider the true execution scenario, apply our analysis, and compare the actual obtained correlation results with the targeted ones, depicted in Figure 7.2.

Since a predicate can only exit with one label at a time and we are analysing the true label, we can map the exit node inval to the special case Unreachable. We begin by initialising the correlation summary for the exit node corresponding to the true exit label. As shown in Figure 7.6, this consists in mapping the pair referring to the local value of the $o$ variable and the final state of $o$ to a correlation map containing a single correlation, namely $(\varepsilon, \varepsilon) \mapsto \mathit{Equal}$. This acknowledges that the value of the output $o$ returned to the predicate's callers is the most recent value computed locally. In the following, we denote the final value of $o$ by $\overline{o}$, in order to distinguish it from the local value.

[Figure: control flow graph of stop_thread. Nodes: 1: ta = in.threads; 2: th = ta[i]; 3: switch(th) as [ | ti]; 4: s = Blocked; 5: ti = ti with current_state = s; 6: th = Some(ti); 7: ta = [ta with i = th]; 8: o = in with threads = ta; 9: exit node true; 10: exit node inval. Edges are labeled true, false or None. Node 10 is annotated with Unreachable and node 9 with $(o, \overline{o}) \mapsto [(\varepsilon, \varepsilon) \mapsto \mathit{Equal}]$.]

Figure 7.6 – Analysing Predicate stop_thread – Initialisation

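We advance backwards along the control flow graph, reaching node 8. We apply the equation corresponding to a field update (Set Field), as given in Table 7.12, and obtain the following correlation summary:

$$(in, o) \mapsto [(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{crt\_thread} \mapsto \mathit{Equal};\ \mathit{adr\_space} \mapsto \mathit{Equal}\}]$$

$$(ta, o) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \mathit{Equal}]$$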

We compose it with the correlation summary of its successor node, i.e., the exit node corresponding to the true exit label, thus detecting the flow of $in$ to $\overline{o}$ and of $ta$ to $\overline{o}$, respectively, through the local value $o$. This amounts to:

$$(in, \overline{o}) \mapsto [(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{crt\_thread} \mapsto \mathit{Equal};\ \mathit{adr\_space} \mapsto \mathit{Equal}\}]$$

$$(ta, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \mathit{Equal}]$$

Since node 8 does not have any other successor nodes, the correlation information at its entry point is identical to the one we have just computed.

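We advance one step, reaching node 7, and apply the corresponding equation, thereby obtaining:

$$(ta, ta) \mapsto [(\varepsilon, \varepsilon) \mapsto \langle \mathit{Equal}\;\; i\;\; \mathit{Any} \rangle]$$

$$(th, ta) \mapsto [(\varepsilon, \langle i \rangle) \mapsto \mathit{Equal}]$$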

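We compose it with the correlation summary of node 8, tracking the flow of the local value of $ta$ to $\overline{o}$ through the new state of the variable $ta$ after updating its $i$-th element. We also track the flow of $th$ to $\overline{o}$. The correlation map for the $(in, \overline{o})$ pair remains unchanged. We thus obtain:

$$(in, \overline{o}) \mapsto [(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{crt\_thread} \mapsto \mathit{Equal};\ \mathit{adr\_space} \mapsto \mathit{Equal}\}]$$

$$(ta, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \langle \mathit{Equal}\;\; i\;\; \mathit{Any} \rangle]$$

$$(th, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}\langle i \rangle) \mapsto \mathit{Equal}]$$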

In order to obtain the correlation information at the entry point of node 7, we need to join the computed correlation summary with the correlation summary known for the other successor of node 7, namely the exit node 10. Since the latter is Unreachable, the identity element for join at the intraprocedural level, it does not affect the correlation summary at the entry point of node 7. We proceed similarly for nodes 6, 5, 4, 3 and 2, applying the corresponding data-flow equation for each statement and composing with the intraprocedural correlation summary of the successor node. Since each of these nodes has only one possible exit label, there are no multiple contributions that need to be joined. At the entry point of node 6, for example, we obtain the following summary:

$$(ta, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}) \mapsto \langle \mathit{Equal}\;\; i\;\; \mathit{Any} \rangle]$$

$$(ti, \overline{o}) \mapsto [(\varepsilon, \mathit{threads}\langle i \rangle \mathit{Some}\,t) \mapsto \mathit{Equal}]$$

$$(in, \overline{o}) \mapsto [(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{crt\_thread} \mapsto \mathit{Equal};\ \mathit{adr\_space} \mapsto \mathit{Equal}\}]$$

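We skip some steps and obtain the following correlation summary at the entry point of node 2:

$$(in, \overline{o}) \mapsto [(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{crt\_thread} \mapsto \mathit{Equal};\ \mathit{adr\_space} \mapsto \mathit{Equal}\}]$$

$$
(ta, \overline{o}) \mapsto \left[
\begin{array}{l}
(\varepsilon, \mathit{threads}) \mapsto \langle \mathit{Equal}\;\; i\;\; \mathit{Any} \rangle \\
(\langle i \rangle \mathit{Some}\,t,\ \mathit{threads}\langle i \rangle \mathit{Some}\,t) \mapsto \{\mathit{id} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\right]
$$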

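Finally, we reach node 1, where we apply the data-flow equation corresponding to a field access and compose the obtained information with the correlation summary computed at the entry of node 2. We obtain:

$$
(in, \overline{o}) \mapsto \left[
\begin{array}{l}
(\varepsilon, \varepsilon) \mapsto \{\mathit{threads} \mapsto \mathit{Any};\ \mathit{pid} \mapsto \mathit{Equal};\ \mathit{crt\_thread} \mapsto \mathit{Equal};\ \mathit{adr\_space} \mapsto \mathit{Equal}\} \\
(\mathit{threads}, \mathit{threads}) \mapsto \langle \mathit{Equal}\;\; i\;\; \mathit{Any} \rangle \\
(\mathit{threads}\langle i \rangle \mathit{Some}\,t,\ \mathit{threads}\langle i \rangle \mathit{Some}\,t) \mapsto \{\mathit{id} \mapsto \mathit{Equal};\ \mathit{current\_state} \mapsto \mathit{Any};\ \mathit{stack} \mapsto \mathit{Equal}\}
\end{array}
\right]
$$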

Since node 1 has only one successor node, this correlation summary represents the correlation information at the entry point of node 1, i.e., there is no other correlation summary to join it with. It contains a single pair of variables, $(in, \overline{o})$, and their associated correlation map. Since the pair is an input-output pair of the stop_thread predicate, we do not need to filter anything out. This constitutes the final correlation summary for the analysed predicate on the true exit label. These results are identical to the ones depicted as our targeted results in Figure 7.2.

For the inval exit label, the corresponding correlation summary is NoCorrelation. This example can be tried on the web page² dedicated to our correlation analysis. Other examples are provided and explained there as well. Additionally, users can devise and test their own examples.

² Correlation Analysis Web Page: httpwwwajl-demofr2016
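The meaning of the final summary for the true label can be checked on concrete data with a small executable model. The Python function below is a hand-written approximation of stop_thread based on the statements of its control flow graph. The dictionary encoding of processes and threads, the field name `id` and the sample values are assumptions made for this sketch.

```python
def stop_thread(proc, i):
    """Approximate model of stop_thread: returns (label, output process)."""
    ta = proc["threads"]                          # node 1
    if not 0 <= i < len(ta):                      # node 2, index out of bounds
        return "inval", None
    th = ta[i]
    if th is None:                                # node 3, None case
        return "inval", None
    ti = {**th, "current_state": "Blocked"}       # nodes 4 and 5
    ta = ta[:i] + [ti] + ta[i + 1:]               # nodes 6 and 7
    return "true", {**proc, "threads": ta}        # node 8


inp = {
    "pid": 7,
    "crt_thread": 0,
    "adr_space": "0x1000",
    "threads": [
        {"id": 0, "current_state": "Running", "stack": "s0"},
        None,
        {"id": 2, "current_state": "Ready", "stack": "s2"},
    ],
}
label, out = stop_thread(inp, 2)

# (eps, eps): every field except threads is Equal.
print(all(inp[f] == out[f] for f in ("pid", "crt_thread", "adr_space")))        # True
# (threads, threads): all cells are Equal, except possibly cell i.
print(all(inp["threads"][k] == out["threads"][k] for k in (0, 1)))              # True
# Under Some at cell i: id and stack are Equal, current_state is Any.
print(all(inp["threads"][2][f] == out["threads"][2][f] for f in ("id", "stack")))  # True
```

Such a test only samples one input, whereas the correlation summary holds for every execution that exits with true.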

7.5 Interprocedural Correlation Analysis

Our analysis is performed label by label, and interprocedural correlation domains associate an intraprocedural summary to each exit label of the analysed predicate. Therefore, interprocedural domains encapsulate an intraprocedural summary for each possible execution scenario of a predicate.

An interprocedural domain $K_p$ of a predicate $p$ is thus defined as shown below.

Definition 7.5.1. Interprocedural Correlation Domain

$$K_p : \Lambda_p \to \mathbb{K}, \quad \text{where } \Lambda_p \text{ is the set of output labels of predicate } p$$

The intraprocedural summary associated to each label is filtered so as to contain only ordered pairs of variables where the left member is an input of the analysed predicate and the right member is an output associated to the analysed label. The correlation maps associated to such pairs are built so as to contain correlations where only input variables may appear in array cell paths. Similarly, the exception index in partial equivalence relations of arrays must be an input variable. Registering exceptions in array correlations only for input variables is not a consequence of a language restriction on array operations, but simply a consequence of the fact that, at the interprocedural level, only correlation information between inputs and outputs makes sense.

The interprocedural domain of a predicate is used for deducing the transfer functions of a predicate call statement.

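In the following, we detail the equation corresponding to a call to a predicate, i.e., the statement

$$s = p(e_1, \ldots, e_n)\,[\lambda_1 : \overline{o}_1 \mid \ldots \mid \lambda_m : \overline{o}_m]$$

where the predicate has the following signature:

$$p(\varepsilon_1, \ldots, \varepsilon_n)\,[\lambda_1 : \overline{\omega}_1 \mid \ldots \mid \lambda_m : \overline{\omega}_m]$$

The general equation form given in Table 7.12 applies:

$$K_n = \bigvee\nolimits_{K} \left\{\, C_{s,\lambda_i}(K_{n_i}) \;\middle|\; n \xrightarrow{\,s,\lambda_i\,} n_i \,\right\}$$

The transfer functions for the predicate call statement are deduced from the predicate's interprocedural domain in the following fashion:

$$
\begin{aligned}
C_{s,\lambda_i}(K_{n_i}) &= c_{s,\lambda_i} \circ K_{n_i} \\
\mathit{kill}_{\lambda_i} &= \overline{o}_i \\
c_{s,\lambda_i}(e_j, o_i^k) &= \kappa_{jki} \qquad \forall j \in 1..n,\ \forall k \in 1..h
\end{aligned}
$$

where

$$
\begin{aligned}
\kappa_{jki} &= K_p(\lambda_i)(\varepsilon_j, \omega_i^k)\,\llbracket \varepsilon \mapsto e \rrbracket \\
s &= p(e_1, \ldots, e_n)\,[\lambda_1 : \overline{o}_1 \mid \ldots \mid \lambda_m : \overline{o}_m] \\
\overline{o}_i &= o_i^1, \ldots, o_i^h
\end{aligned}
$$

Namely, the contribution of a predicate call to each $(e_j, o_i^k)$ input-output pair stems from the contribution of the interprocedural domain for label $\lambda_i$ and formal input-output pair $(\varepsilon_j, \omega_i^k)$. In these, all the formal input parameters $\varepsilon$ in array partial equivalences and in array cell paths are substituted by the corresponding effective input parameters from $e$, or approximated away. The substitution operation is denoted by $\llbracket \chi \rrbracket$, where $\chi$ is a substitution from formal to effective parameters.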

Our correlation analysis is context-insensitive, and αSmil programs are analysed by computing, once and for all, an interprocedural correlation summary for every predicate they contain. The correlation summaries are stored in a mapping that binds predicate identifiers to their interprocedural correlation information.

7.6 Extension – Constructor Evolution

The correlation analysis, as presented so far in this chapter, tracks and detects partial equivalence relations between inputs and outputs of predicates. An interesting direction to investigate would be an extension of our analysis allowing us to detect not only equivalences, but also more general relations that could capture the evolution of constructors for variants. In Figure 7.4-b) we illustrated the form of correlations computed for variants. With the extension, the correlation information obtained for variants would be richer, as illustrated in Figure 7.7.

Figure 7.7 – Constructor Evolution

This extension would allow inferring the preservation of certain properties when transitioning from a "stronger" state to a "weaker" state. For instance, we consider again our process and thread data types introduced in Chapter 3, Section 3.1.5 (on pages 49 and 48, respectively). Additionally, we consider a predicate kill_thread, shown below, which modifies the array of associated threads of the input p by setting the i-th element to None. If the i-th element is already inactive, no modifications are made. In this case, the predicate exits with label inactive and simply copies p to the output o.

    predicate kill_thread (process p, int i)
    -> [ true: process o | inactive: process o | oob ]
        array<option<thread>> threads; option<thread> thi; thread ti;
        o = p                                   [true -> 1]
        threads = o.threads                     [true -> 2]
        thi = threads[i]                        [true -> 3, false -> 9]
        switch (thi) as [ti | ]                 [Some -> 4, None -> 8]
        thi = None                              [true -> 5]
        threads = [threads with i = thi]        [true -> 6, false -> 9]
        o = o with threads = threads            [true -> 7]
        [true]
        [inactive]
        [oob]

For variants, we are currently detecting equivalence relations between the arguments of variant values built with the same constructor. With the extension for capturing constructor evolution, we could take a step further and also detect, for a given execution scenario, the set of possible transitions between the different constructors. For instance, for the kill_thread predicate on the true exit label, we could detect that the only possible transition of the i-th element of the threads array is from Some to None. Had the element been None, the predicate would have followed the inactive execution scenario.

We further consider a predicate disjoint_stacks(process p), verifying a fundamental property of any process, namely the fact that the stacks of all associated threads of the process are disjoint. If the property holds for the input process p prior to executing kill_thread, intuitively it should continue to hold subsequently for the output process o as well. If the i-th element of the array was already inactive, i.e., None, the property disjoint_stacks obviously still holds, since the input p is simply copied to the output o. If it was active, the transition from Some to None does not impact the property, as it does not create a new memory region that could threaten the property. In this case, the transition from Some to None is a transition from a "stronger" state to a "weaker" state.

We have conducted preliminary experiments targeting the detection of such information, and these have led to promising results. Tracking general relations that capture evolution requires certain modifications, which are confined to the abstract partial relation type and to the data-flow equations concerning variants.

The abstract partial relation type presented in Section 7.2 (Definition 7.2.1) would need to be extended with Impossible, an additional atomic case along with Equal and Any. It is required for signalling impossible transitions between variant constructors, and leads to some overlap with the possible-constructors analysis presented in Chapter 5. The partial relations for variants would be expressed as a square matrix of constructors, where each element $a_{C_i C_j}$ of the matrix has a corresponding associated partial relation $R_{C_i C_j}$. Impossible would be associated to any element $a_{C_i C_j}$ for which the transition from $C_i$ to $C_j$ is impossible. For the elements $a_{C_i C_i}$ on the main diagonal for which the transition from $C_i$ to $C_i$ is possible, we could compute partial equivalences between the arguments of the $C_i$ constructor. For the elements $a_{C_i C_j}$ lying outside the main diagonal for which the transition from $C_i$ to $C_j$ is possible, the associated relation would be Any. Alternatively, for computing reflexive relations, we could consider that transitions on the main diagonal, i.e., from $C_i$ to $C_i$, are always possible.

Impossible would become the bottom element of our partial relation type $\mathcal{R}$, replacing Equal in this role. It would also become the identity element for the join operation $\vee_R$ (Definition 7.2.3) of partial relations and the absorbing element for the meet operation $\wedge_R$ (Definition 7.2.4). Similarly to the case of the abstract dependency type, the current bottom element Equal would become the middle element of a double diamond-shaped abstract type, and it would require the addition of some extra comparison cases for $\sqsubseteq_R$ (Definition 7.2.2), as well as some extra cases for the $\vee_R$ (Table 7.2) and $\wedge_R$ (Table 7.3) operations. The most important modification, however, would be in the case of the compose operation. Currently, the compose operation at the level of partial equivalence relations is $\vee_R$. With this extension, it would amount to a matrix multiplication.
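One possible reading of this matrix-based composition is sketched below in Python. It is an interpretation made for illustration, not a definition taken from the thesis: the matrices only hold the three atomic cases, "addition" is taken to be the join (with Impossible as its identity), and "multiplication" is taken to be a composition of atoms in which Impossible is absorbing and the remaining cases are joined. All names are hypothetical.

```python
IMPOSSIBLE, EQUAL, ANY = "Impossible", "Equal", "Any"
_RANK = {IMPOSSIBLE: 0, EQUAL: 1, ANY: 2}  # assumed chain: Impossible < Equal < Any


def join(a, b):
    """Join of atomic relations; Impossible is the identity element."""
    return a if _RANK[a] >= _RANK[b] else b


def compose_atoms(a, b):
    """A two-step transition is impossible as soon as one of its steps is."""
    if IMPOSSIBLE in (a, b):
        return IMPOSSIBLE
    return join(a, b)


def compose(m1, m2):
    """Matrix product where + is join and * is compose_atoms."""
    ctors = list(m1)
    result = {}
    for ci in ctors:
        result[ci] = {}
        for ck in ctors:
            acc = IMPOSSIBLE
            for cj in ctors:
                acc = join(acc, compose_atoms(m1[ci][cj], m2[cj][ck]))
            result[ci][ck] = acc
    return result


# kill_thread on the true label: the only possible transition is Some -> None.
kill = {"Some": {"Some": IMPOSSIBLE, "None": ANY},
        "None": {"Some": IMPOSSIBLE, "None": IMPOSSIBLE}}
# A plain copy keeps every constructor unchanged.
copy = {"Some": {"Some": EQUAL, "None": IMPOSSIBLE},
        "None": {"Some": IMPOSSIBLE, "None": EQUAL}}

print(compose(copy, kill) == kill)         # True
print(compose(kill, kill)["Some"]["None"]) # Impossible
```

Under these assumptions, composing the true scenario of kill_thread with itself yields a matrix filled with Impossible: after a first successful call, the i-th element is None, so a second call cannot exit with true.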

7.7 Related Work

A rigorous presentation of the frame problem in specification, and of the different existing approaches for addressing it, has been given by Borgida et al. (Borgida, Mylopoulos, and Reiter 1993; Borgida, Mylopoulos, and Reiter 1995). A more recent overview of framing is included in (Hatcliff et al. 2012).

In recent years, a vast body of research has been conducted on the specification of frame properties in the context of modular programming. This ranges from complex approaches imposing the swinging pivots requirement (Leino and Nelson 2002), to approaches using data groups (Leino 1998; Leino, Poetzsch-Heffter, and Zhou 2002), adopting the Universe type system (Müller 2002; Müller, Poetzsch-Heffter, and Leavens 2003) or variations of it (Leino and Müller 2004; Leino and Müller 2006; Barnett and Naumann 2004; Barnett et al. 2004), to approaches based on the dynamic frame theory (Kassios 2006; Kassios 2011; Smans, Jacobs, and Piessens 2012), regional logic (Banerjee, Naumann, and Rosenberg 2008) or separation logic (Reynolds 2002; O'Hearn, Yang, and Reynolds 2004; Parkinson and Bierman 2005).

In (Smans, Jacobs, and Piessens 2012), Smans et al. present a technique for frame inference based on a variant of dynamic frames, inspired by separation logic and relying on accessibility information contained within pre- and postconditions. By including accessibility information in a method's precondition, an upper bound on the set of locations modifiable by the method can be detected. In our case, the upper bound on the set of elements that a predicate may modify when exiting with a particular exit label is implicitly the set of output variables generated on that exit label, joined with the set of local variables. The implicit dynamic frame approach requires the specification of accessibility information. Our correlation analysis is entirely automatic and infers fine-grained frame properties for compound data structures.

The literature on shape analysis (Calcagno et al. 2009; Sagiv, Reps, and Wilhelm 1999; Jones and Muchnick 1979; Montenegro, Peña, and Segura 2015) and side effects analyses (Salcianu and Rinard 2005; Milanova, Rountev, and Ryder 2005) is vast. The former is aimed at deep-heap mutations, while we are focusing on deep-state modifications in the context of complex transition systems. The latter determine memory locations that may be modified by an operation. Reasoning about heap locations is beyond our scope. We treat mappings between variables and their values, analyse their evolution in a side-effect free environment, and detect not only what is modified, but also how and to what extent.

In (Chang and Leino 2005), Chang and Leino present the congruence-closure abstract domain, designed for an object-oriented context and implemented in the Spec# program verifier. They infer and express relations between fields of variables, a goal similar to ours. The congruence-closure domain maintains equivalence graphs mapping field accesses to symbolic locations. On its own, this domain allows the inference and expression of relations for accessed fields. In order to take updates into account as well, it needs to use the heap succession domain as a base. Unlike us, they can express preorders between fields, depending on the base domains used. However, our domain handles both accesses and updates to structures, arrays and variants in a uniform manner, independent of additional information. We have sketched an extension for handling not only equivalences, but also more general relations capturing constructor evolution. This is a direction we plan to investigate in the future.

Rakamarić and Hu report in (Rakamaric and Hu 2008) a method to infer frame axioms of procedures and loops based on static analysis. As a starting point, they use the DSA shape analysis presented by Lattner et al. (Lattner, Lenharth, and Adve 2007). DSA provides a summary of points-to relations as a graph that is used to compute a set of memory locations that are modified by a procedure or its callees. By a pass through the graph, for each node reachable from the globals or procedure parameters, they generate expressions representing a path to that node. The generated frame axioms are used internally by an extended static checker of C programs, i.e., in a purely automatic setting. In contrast, our analysis is designed for an interactive verification context. Our technique, focusing on a purely functional language, is not concerned with aliasing and does not depend on an external points-to framework.

In (Taghdiri, Seater, and Jackson 2006), Taghdiri et al. present a technique for extracting procedure summaries for object-oriented procedures, used to prove verification conditions. Procedures are executed symbolically, and the environment of the post-state is computed so as to express every variable and field in terms of the values of the variables and fields of the pre-state. Their goal is broader than ours. However, unlike their summaries, our correlation results encompass only information that is visible from the outside (to the callers).

Bertrand Meyer presents the double frame inference strategy, an approach that targets the automation of both frame specification and frame verification in the context of Eiffel (Meyer 1991). The first component, the frame specification inference, relies on the analysis of method postconditions. The idea stems from an informal review of JML code, which showed that in practice there is a considerable overlap between what is mentioned in an assignable clause, i.e., a modifies clause, and what is included in the postcondition. It relies on the observation that, in general, when manually written specifications include clauses about what changes, they also include clauses about how it changes. By analysing the postcondition of a method p, a first set is obtained. This represents an overapproximation of the set of elements that are allowed to be modified by p according to its specification. The second component of the strategy, the frame implementation inference, relies on the frame calculus (Kogtenkov, Meyer, and Velder 2015), which is itself based on alias calculus (Kogtenkov, Meyer, and Velder 2015; Meyer 2010; Meyer 2011). Methods are analysed and a second set is detected; this represents an overapproximation of the set of expressions whose values may change as a result of executing p. Frame verification amounts to verifying that the first set includes the second. Though our goal is closely related to the issue addressed by the double frame inference in general, and the frame calculus in particular, the approaches are not directly comparable, as they target languages with different characteristics, which in turn influence both the adopted analysis techniques and the derivative targeted issues. Both approaches are conservative and automatic, i.e., neither requires manual annotations. In contrast to the frame calculus, our correlation analysis is standalone and it is not concerned with aliasing.

7.8 Conclusion

Identifying precise information concerning the effects of program operations is possible by means of static analysis, without sacrificing scalability. In this chapter, we have presented a data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus detecting not only what is modified, but also how it is modified and to what extent. The correlation analysis is a flow-sensitive, path-sensitive, interprocedural analysis that handles arrays, structures and variants. The analysis is context-insensitive, but this trait does not have a costly impact in terms of precision. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations, in order to compute expressive, fine-grained equivalences between parts of the inputs and parts of the outputs in a flexible manner. Just as frame properties specified by means of old expressions tend to lead to a proliferation of conditions to be specified, our correlation summaries showing equivalences between input and output subelements can become verbose in the case of predicates handling large compound values and modifying only a limited input subset. However, these are detected automatically, and their verbose form could easily be transformed using a more compact notation of the following form:

    input ( - changed subelements) = output ( - corresponding subelements)

Detecting modifications is traditionally associated with shape analyses that focus on deep-heap mutations. Side-effect analyses detect memory locations that may be modified by an operation. We, however, are interested in deep-state modifications in the context of a functional language. Other analyses inferring frame properties have been devised. These are mostly used in a purely automatic setting. We, however, developed a correlation analysis meant to be used in an interactive verification context.

Similarly to the case of the dependency analysis presented in Chapter 5, we have implemented a prototype of the correlation analysis in OCaml and we have applied it to a functional specification of ProvenCore (Lescuyer 2015). Medium-sized experiments performed on the abstract layers of ProvenCore show encouraging results. For instance, the correlation results of approximately 630 αSmil predicates, totalling approximately 10,000 lines of code, are obtained in less than 0.5 seconds, i.e., faster than the dependency summaries are obtained on the same predicates. This is partly a consequence of the fact that, unlike the dependency analysis, which computes summaries for both code and specifications, the correlation analysis computes non-trivial results only for code. Specifications are predicates with Boolean exit labels, which generate no outputs. Since our correlation analysis computes fine-grained relations between parts of the inputs and parts of the outputs, it cannot detect anything non-trivial in their case. However, this would change if we were to extend our correlation analysis and track relations between parts of the inputs as well. This is a direction that we plan to investigate in the future. We will focus on the implementation and the discussion of the obtained results in Chapter 8. The prototype can be tested on the web page³ dedicated to our correlation analysis, where multiple examples are provided and explained. Additionally, users can devise and test their own examples.

The correlation analysis presented in this chapter has been the subject of a previous publication (Andreescu, Jensen, and Lescuyer 2016).

³ Correlation Analysis Web Page: http://www.ajl-demo.fr/2016


Chapter 8

Implementation, Application and Results

Any fact becomes important when it's connected to another.

Umberto Eco

In this chapter, we focus mainly on the practical aspects regarding our static analyses and the approach to using their results for inferring the preservation of certain logical properties. In Section 8.1 and Section 8.2, we give a brief overview of the implementations of our dependency and correlation analyses, respectively. In Section 8.3, we succinctly present ProvenCore, one of the two microkernels developed at Prove & Run, and discuss, in terms of execution times and precision, the experiments we made on its functional specification. In Section 8.4, we describe the manner in which the summaries computed by our dependency and correlation analyses are meant to be combined and used for reasoning about the preservation of certain logical invariants. We illustrate this approach and discuss it on some examples inspired by ProvenCore.

8.1 Implementation of the Dependency Analysis

Prototypes for both of our static analyses, the dependency analysis presented in Chapter 5 and its extension with symbolic dependencies presented in Chapter 6, as well as the correlation analysis presented in Chapter 7, have been implemented in OCaml (Rémy and Vouillon 1997). While trying to retain close proximity to the analyses as presented theoretically, their implementation mildly diverges from them at certain points, due to performance and scalability considerations. One of the main differences is related to the manner in which we store dependencies and partial equivalence relations. Based on the observation that, in general, when considering complex transition systems, the states are characterized by properties depending only on a limited subset of their subelements, while most transitions modify only a limited subset of the input state's subelements, we adopt a more compact representation. This, in turn, is reflected in some of the operators as well.


8.1.1 Dependency Type and Operators

The abstract dependency type δ, which mirrors the structure of associative arrays and algebraic data types, was introduced in Chapter 5.2 on page 83. It is implemented by the recursive type dep shown below.

    (* Implementation for the dependency type
       introduced in Chapter 5.2 *)

    type dep =
      | Everything                          (* top *)
      | Impossible                          (* bottom *)
      | Nothing
      | Deferred of accesses                (* symbolic *)
      | Struct of struct_typ * dep FMap.t
      | Variant of var_typ * dep CMap.t
      | Array of dep * (var * dep) option

The maps used for expressing dependencies for structures and variants use fields and constructors, respectively, as keys.

    type field
    module FMap : EMap.S with type key = field

    type cons
    module CMap : EMap.S with type key = cons

In contrast to the extended abstract dependency type δ (Definition 6.4.1), the actual dependency for structures stores, in addition to the map associating dependencies to fields, the type struct_typ of the structure as well. Similarly, the actual dependency for variants stores the variant's type var_typ in addition to the map associating dependencies to constructors.

As previously mentioned, we are targeting complex transition systems, such as operating systems and microkernels. In practice, transitions frequently map a large input state to a large output state, but for computing the output state they are concerned only with a limited subset of the input state. The number of subelements of a complex input on which the outcome of a predicate depends tends to be low compared to the total number of input subelements, so from dependencies for structures we filter the fields mapped to the dependency denoted by Nothing in our implemented dependency type. Similarly, from dependencies for variants, we filter the constructors mapped to ⊥, denoted by Impossible in our implemented dependency type.

As a consequence of this optimization, we need to know, and hence store, the types of structures and variants in order to correctly compare, join and reduce dependencies corresponding to such types. In addition, this is also useful for checking that the constructed dependencies are well-typed.


For building dependencies of the corresponding type, we have implemented smart constructors. The dependency type is private, and new dependencies can be constructed only by using the provided smart constructors.

As explained in Section 5.2, ⊤ and ⊥ can apply to any type. For instance, ⊤ can be seen as a placeholder for data that is needed in its entirety. Structure, array or variant dependencies whose subelements are all entirely needed, and thus uniformly mapped to ⊤, are transformed to ⊤. The ⊥ dependency is a placeholder for data that cannot occur on a certain execution scenario. A whole variant value is impossible if all its constructors are mapped to ⊥. A whole structure or array is impossible if any of its subelements is impossible. These canonizations¹ are made by our smart constructors. For instance, the smart constructor for structure dependencies returns Everything if it receives as an input a map of fields in which each key is mapped to Everything. Since fields that are absent from a field map must be interpreted as being mapped to Nothing, before returning Everything the constructor also verifies that the map of fields it received as an input contains all the fields of the structure type struct_typ, given as an input as well. If the given map of fields contains an Impossible value, the smart constructor returns Impossible. Any mapping field ↦ Nothing is filtered from the given input map.
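The canonization rules for structures described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the prototype's code: struct_typ is reduced to a list of field names, fields are plain strings, the standard Map functor stands in for EMap, and the name mk_struct is hypothetical.

```ocaml
module FMap = Map.Make (String)

(* Assumption: [struct_typ] is reduced to the list of field names, and only
   the atomic cases and structures of the dependency type are modelled. *)
type struct_typ = string list

type dep =
  | Everything
  | Impossible
  | Nothing
  | Struct of struct_typ * dep FMap.t

let is_everything = function Everything -> true | _ -> false
let is_impossible = function Impossible -> true | _ -> false
let is_nothing = function Nothing -> true | _ -> false

(* Hypothetical smart constructor for structure dependencies. *)
let mk_struct (typ : struct_typ) (fields : dep FMap.t) : dep =
  (* A structure is impossible as soon as one of its fields is. *)
  if FMap.exists (fun _ d -> is_impossible d) fields then Impossible
  else if
    (* Everything only if every field of the type is present and mapped to
       Everything; an absent field counts as Nothing. *)
    List.for_all
      (fun f ->
        match FMap.find_opt f fields with
        | Some d -> is_everything d
        | None -> false)
      typ
  then Everything
  else
    (* Keep the map sparse: mappings to Nothing are dropped. *)
    Struct (typ, FMap.filter (fun _ d -> not (is_nothing d)) fields)
```

Making the type private in the module signature, as the text describes, is what would force every client through such a constructor, so that filtered maps never contain a Nothing entry.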

Similarly, for variant dependencies, the corresponding smart constructor receives as inputs the variant's type and a map from constructor keys to dependency values. If all constructors of the variant, as indicated by its type var_typ, are present in the input map and mapped to Everything, the smart constructor returns Everything. If all constructors are present and mapped to Impossible, the smart constructor returns Impossible. Otherwise, if the input map contains some constructors mapped to Impossible, the corresponding mappings are filtered from the map used to build the variant dependency.
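A corresponding sketch for variants is given below, under the same simplifying assumptions: var_typ is reduced to a list of constructor names, the standard Map functor replaces EMap, and mk_variant is a hypothetical name. Only the three cases stated above are modelled.

```ocaml
module CMap = Map.Make (String)

(* Assumption: [var_typ] is reduced to the list of constructor names. *)
type var_typ = string list

type dep =
  | Everything
  | Impossible
  | Nothing
  | Variant of var_typ * dep CMap.t

(* Hypothetical smart constructor for variant dependencies. *)
let mk_variant (typ : var_typ) (cons : dep CMap.t) : dep =
  (* True if every constructor of the type is present and satisfies [p]. *)
  let all_mapped_to (p : dep -> bool) =
    List.for_all
      (fun c ->
        match CMap.find_opt c cons with Some d -> p d | None -> false)
      typ
  in
  if all_mapped_to (function Everything -> true | _ -> false) then Everything
  else if all_mapped_to (function Impossible -> true | _ -> false) then
    Impossible
  else
    (* Keep the map sparse: mappings to Impossible are dropped. *)
    Variant
      ( typ,
        CMap.filter
          (fun _ d -> match d with Impossible -> false | _ -> true)
          cons )
```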

For arrays, the smart constructor returns Everything if both the default dependency and the known exceptional dependency are Everything, or if the former is Everything and there is no known exceptional dependency. If either of the two dependencies is Impossible, the smart constructor returns Impossible.
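The two array rules translate almost directly into a pattern match. The sketch below assumes that variables are strings and models only the atomic cases and arrays; mk_array is a hypothetical name.

```ocaml
(* Assumption: variables are plain strings. *)
type var = string

type dep =
  | Everything
  | Impossible
  | Nothing
  | Array of dep * (var * dep) option (* default, known exception *)

(* Hypothetical smart constructor for array dependencies. *)
let mk_array (default : dep) (exc : (var * dep) option) : dep =
  match default, exc with
  (* Impossible as soon as one of the two dependencies is. *)
  | Impossible, _ | _, Some (_, Impossible) -> Impossible
  (* Everything if nothing weaker than Everything is known. *)
  | Everything, None | Everything, Some (_, Everything) -> Everything
  | _ -> Array (default, exc)
```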

The smart constructor for deferred dependencies receives a set of variables as an input. If the given set is empty, the constructor returns Nothing. Otherwise, it creates the access map having the variables in the given input set, i.e., the root variables for symbolic paths, as keys. As described in Section 6.5, a set containing a single path, the empty path, is initially associated to each of them.

¹ For making all the described canonizations, we have to make sure that whenever we replace δ by δ′, both δ ⊑ δ′ and δ′ ⊑ δ hold.

The ⊑ operator (Definition 5.2.2), as formally presented in Section 5.2 and detailed in Table 5.1 on page 86, returns false whenever comparing two incompatible dependencies. In practice, situations in which comparisons on incompatible types are made should never be reached. As a consequence, whenever we compare structure or variant dependencies, we check, as a safety measure, that the two dependencies correspond to structures or variants of the same type. Otherwise, the two dependencies are not comparable and we throw an exception that indicates that the types are incompatible. For structure dependencies, whenever a mapping for one field f can be found only in one of the two maps to be compared, we compare its mapped dependency value to Nothing, since absent fields must be interpreted as being mapped to Nothing. Similarly, for variant dependencies, whenever a mapping for a constructor C can be found only in one of the two maps to be compared, we interpret C as being mapped to Impossible in the other map.
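The comparison of sparse structure dependencies can be sketched as below. Several points are assumptions made for the illustration only: the structure type is reduced to its name, the atoms are ordered with Impossible below Nothing below Everything, Nothing is taken to be below any structure dependency, and the names leq and Incompatible_types are hypothetical.

```ocaml
module FMap = Map.Make (String)

type dep =
  | Everything
  | Impossible
  | Nothing
  | Struct of string * dep FMap.t (* structure type name, sparse field map *)

(* Hypothetical exception raised on a comparison across different types. *)
exception Incompatible_types of string * string

let rec leq (a : dep) (b : dep) : bool =
  match a, b with
  | Impossible, _ | _, Everything -> true
  | _, Impossible | Everything, _ -> false
  | Nothing, _ -> true
  | Struct (_, m), Nothing -> FMap.for_all (fun _ d -> leq d Nothing) m
  | Struct (t1, m1), Struct (t2, m2) ->
      (* Safety check: both dependencies must describe the same type. *)
      if t1 <> t2 then raise (Incompatible_types (t1, t2));
      (* A field missing from one map is read as mapped to Nothing. *)
      let get f m = Option.value (FMap.find_opt f m) ~default:Nothing in
      let keys = FMap.union (fun _ d _ -> Some d) m1 m2 in
      FMap.for_all (fun f _ -> leq (get f m1) (get f m2)) keys
```

Taking the union of the key sets before comparing is what lets a field stored in only one of the two maps be compared against Nothing, as the text requires.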

The join (Definition 5.2.3) and reduction (Definition 5.2.4) operators, as formally presented in Section 5.2 on pages 87 and 89, respectively, are total: they return ⊤, the element conveying no information, for incompatible dependencies. In practice, the two operators are partial: an exception is thrown whenever the two dependencies to be joined or reduced are incompatible. This applies as well to structure or variant dependencies that do not correspond to the same type. Otherwise, when joining or reducing two compatible structure or variant dependencies, we interpret missing fields or missing constructors as being mapped to Nothing or Impossible, respectively.

In Section 6.6.1, we described that there are two types of free variables that can appear in dependencies. The first type consists of index variables that can appear in array dependencies. For instance, in <Nothing ^ i Everything>, the variable i is the index of the cell for which the exceptional dependency Everything is known. Additionally, such index variables can also appear in symbolic paths related to arrays, such as <Nothing ^ i Deferred(a[i])> or <Deferred(a[ - i]) ^ i Nothing>. Such indices must be input variables of the currently analysed predicate, as explained in Section 5.3.2 on page 97. The second type of free variables are the root variables that appear in deferred dependencies. For instance, in <Deferred(a[ - i]) ^ i Nothing>, the variable a is a root variable. In the general case, the root variables are those outputs to which symbolic access paths are associated in deferred dependencies. In order to make use of the computed context-sensitive information, actual dependencies can be substituted for the root variables. This is done by applying the symbolic access paths to the dependency to substitute. By traversing entire dependencies such as

    f -> <Nothing ^ j Everything>
    g -> b -> Deferred(o)
    h -> x -> Everything
         y -> <Deferred(a[ - j]) ^ j Nothing>

and substituting the nested deferred dependencies, such as Deferred(a[ - j]) and Deferred(o), we apply context-sensitive information. Simultaneously, during the same traversal, we also substitute the indices appearing in array dependencies, such as j in the dependency associated to the field f, for instance. These are either substituted by another index variable or they are forgotten. If the index to substitute is an input, the formal variable will be replaced by the effective one. Otherwise, an approximation is made in order to remove the local index variable. This consists in joining the default and the exceptional dependencies and using the result for building a new array dependency without an exception.

An index substitution is a mapping from variables to either a new index variable to replace it, or to Forget if all references to the index variable should be removed. The index type is shown below.


    type index =
      | NewIdx of var
      | Forget

The substitution function subst has the following type:

    type var
    module VMap : EMap.S with type key = var

    val subst : index VMap.t -> dep VMap.t -> dep -> dep

Its first argument is the index substitution; the second argument is the dependency substitution, mapping root variables to dependencies. The third argument is the dependency on which the substitutions are to be made. The function returns the dependency obtained after making both substitutions. The two substitution passes are fused for performance considerations.
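To make the index part of this substitution concrete, the sketch below handles only array dependencies over atoms. It is not the prototype's subst: the dependency substitution for root variables is omitted, the index substitution is an association list instead of a VMap, the join is deliberately coarse (two array dependencies are joined to Everything), and the atoms are assumed to be ordered with Impossible below Nothing below Everything. The names mk_array, join and subst_index are hypothetical.

```ocaml
type var = string

type dep =
  | Everything
  | Impossible
  | Nothing
  | Array of dep * (var * dep) option (* default, known exception *)

type index = NewIdx of var | Forget

let mk_array default exc =
  match default, exc with
  | Impossible, _ | _, Some (_, Impossible) -> Impossible
  | Everything, None | Everything, Some (_, Everything) -> Everything
  | _ -> Array (default, exc)

(* Coarse join used only for this illustration. *)
let join a b =
  match a, b with
  | Impossible, d | d, Impossible -> d
  | Nothing, d | d, Nothing -> d
  | _ -> Everything

let rec subst_index (s : (var * index) list) (d : dep) : dep =
  match d with
  | Everything | Impossible | Nothing -> d
  | Array (def, None) -> mk_array (subst_index s def) None
  | Array (def, Some (i, exc)) -> (
      let def = subst_index s def and exc = subst_index s exc in
      match List.assoc_opt i s with
      (* Input index: the formal variable is replaced by the effective one. *)
      | Some (NewIdx j) -> mk_array def (Some (j, exc))
      (* Local index: approximate by joining default and exception. *)
      | Some Forget -> mk_array (join def exc) None
      | None -> mk_array def (Some (i, exc)))
```

For example, forgetting i in <Nothing ^ i Everything> joins Nothing with Everything, and the resulting array dependency without an exception is canonized to Everything.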

A separate substitution is performed for dealing with polymorphic types. Our dependency type is not polymorphic per se. However, αSmil supports polymorphic types, and thus the variables described by the computed dependencies can have a polymorphic type. Since the types of structures and variants are stored in the corresponding dependencies, we must substitute polymorphic type parameters by their effective arguments. This is done by a recursive function which traverses the dependencies and makes the type substitution at each nested level, if necessary. Besides this substitution, no other modifications were made in the implementation in order to handle polymorphism. This justifies our formal presentation of the analyses without polymorphism.

8.1.2 Intraprocedural Dependency Analysis

The intraprocedural dependency type ∆ (Definition 5.3.1), mapping variables to dependencies δ, which was introduced in Chapter 5.3.1, is implemented as shown below.

    type reachable = dep VMap.t

    (* Implementation of the intraprocedural dependency domain
       introduced in Chapter 5.3.1 *)

    type intra =
      | Unreachable
      | Reachable of reachable

The VMap type is a map having variables as keys.

    type var
    module VMap : EMap.S with type key = var


In order to avoid needlessly storing large maps predominantly containing variables mapped to Nothing, we do not store by default mappings for variables for which dependencies have not yet been computed. Therefore, the intraprocedural dependency of any variable v for which a mapping has not yet been stored in the map is interpreted as v ↦ Nothing. As discussed in the previous section for the partial order, join and reduction operators, when applying ⊑∆ (Definition 5.3.3) and the join ∨∆ (Definition 5.3.4) and reduction ⊕∆ (Definition 5.3.5) operators at the intraprocedural level, any missing mapping from a Reachable domain has to be interpreted as a variable mapped to Nothing.
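A pointwise join over such sparse maps can be sketched as follows. The sketch assumes that dependencies are restricted to the three atoms, ordered with Impossible below Nothing below Everything, that variables are strings, and that the standard Map functor stands in for EMap; join_dep and join_intra are hypothetical names.

```ocaml
module VMap = Map.Make (String)

(* Assumption: only atomic dependencies are modelled. *)
type dep = Impossible | Nothing | Everything

type intra = Unreachable | Reachable of dep VMap.t

let join_dep a b =
  match a, b with
  | Everything, _ | _, Everything -> Everything
  | Nothing, _ | _, Nothing -> Nothing
  | Impossible, Impossible -> Impossible

(* Pointwise join in which a missing mapping is read as Nothing. *)
let join_intra (a : intra) (b : intra) : intra =
  match a, b with
  | Unreachable, d | d, Unreachable -> d
  | Reachable m1, Reachable m2 ->
      let merged =
        VMap.merge
          (fun _ d1 d2 ->
            let get = Option.value ~default:Nothing in
            match join_dep (get d1) (get d2) with
            | Nothing -> None (* keep the result sparse *)
            | d -> Some d)
          m1 m2
      in
      Reachable merged
```

Since Unreachable is the bottom element at the intraprocedural level, it acts as the identity of the join here.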

With this interpretation, forgetting a variable v (Definition 5.3.2) from an intraprocedural domain, as introduced in Chapter 5.3.1, becomes straightforward: it amounts to simply removing the mapping for v from the intraprocedural domain.

    (* Forget *)
    let forget d v =
      match d with
      | Unreachable -> d
      | Reachable dmap -> Reachable (VMap.remove v dmap)

We remark that the complex operations are performed at the dependency type level and are mostly applied pointwise at the intraprocedural level. The interprocedural dependency domains are mappings from labels to intraprocedural dependency summaries.

8.2 Implementation of the Correlation Analysis

8.2.1 Partial Equivalence Relations and Operators

The partial equivalence type R (Definition 7.2.1), which mirrors the structure of associative arrays and algebraic data types and was introduced in Chapter 7.2.1 on page 141, is implemented as shown below.

    (* Implementation of the partial equivalence type
       introduced in Chapter 7.2 *)

    type pequiv =
      | Equal                                     (* bottom *)
      | Any                                       (* top *)
      | PStruct of struct_typ * pequiv FMap.t     (* structures *)
      | PVariant of var_typ * pequiv CMap.t       (* variants *)
      | PArray of pequiv * (var * pequiv) option  (* arrays *)

The FMap and CMap types are the ones presented on page 174.

Similarly to structure and variant dependencies, and due to the same practical considerations, in addition to the map associating partial equivalences to fields, the type struct_typ of the structure is stored as well. Similarly, the implemented partial equivalence for variants stores the variant's type var_typ in addition to the map associating partial equivalences to constructors.

To avoid storing large maps in which the majority of the fields or constructors are mapped to Any, we filter mappings of the form field ↦ Any and cons ↦ Any.

The partial equivalence type is private, and the only manner in which partial equivalence relations can be built is by using the provided smart constructors. The two atomic cases, Equal and Any, can apply to any type. The smart constructor for partial equivalences corresponding to structures filters out any field mapped to Any. It also returns Equal if all fields of the structure are mapped to Equal in the given input map. If, on the contrary, the given input map is empty or all fields are mapped to Any, the smart constructor returns Any.
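The structure case can be sketched in the same style as for dependencies. The sketch assumes that the structure type is reduced to a list of field names, that the standard Map functor replaces EMap, and that only the atomic cases and structures are modelled; mk_pstruct is a hypothetical name.

```ocaml
module FMap = Map.Make (String)

(* Assumption: the structure type is reduced to its list of field names. *)
type pequiv =
  | Equal
  | Any
  | PStruct of string list * pequiv FMap.t

(* Hypothetical smart constructor for structure partial equivalences. *)
let mk_pstruct (typ : string list) (fields : pequiv FMap.t) : pequiv =
  (* Mappings to Any are dropped: an absent field is read as Any. *)
  let kept =
    FMap.filter (fun _ r -> match r with Any -> false | _ -> true) fields
  in
  if FMap.is_empty kept then Any
  else if
    List.for_all
      (fun f ->
        match FMap.find_opt f kept with Some Equal -> true | _ -> false)
      typ
  then Equal
  else PStruct (typ, kept)
```

Note the duality with dependencies: there the filtered default is Nothing, whereas here it is Any, the top element, since an unknown relation on a field conveys no information.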

Similarly, for partial equivalences corresponding to variants, the corresponding smart constructor receives as inputs the variant's type and a map with constructor keys and partial equivalences. If all constructors of the variant, as indicated by its type, are present in the input map and mapped to Equal, the smart constructor returns Equal. If all constructors are present and mapped to Any, or if the given input map is empty, the smart constructor returns Any. Otherwise, if the input map contains some constructors mapped to Any, the corresponding mappings are filtered from the map used to build the variant partial equivalence.

For arrays, the smart constructor returns Equal if both the default relation and the known exceptional relation are Equal, or if the former is Equal and there is no known exceptional relation. If both the default relation and the known exceptional relation are Any, or if the former is Any and there is no known exceptional relation, the smart constructor returns Any.

In contrast to dependencies, there is only one type of free variables that can appear in partial equivalence relations, namely index variables. As was the case for array dependencies, these can appear in partial equivalence relations corresponding to arrays, and they must be input variables. We traverse the partial equivalences recursively, checking for each index variable appearing in an array relation whether it is an input or a local variable. References to local variables are eliminated by approximating the partial equivalences, effectively joining the default array relations with the exceptional array relations.
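The elimination of local index variables can be sketched as below, restricted to the atomic cases and arrays. The join is deliberately coarse (two array relations are joined to Any), variables are assumed to be strings, and the names mk_parray, join and drop_local_indices are hypothetical. The smart constructor follows the array rules stated above.

```ocaml
type var = string

type pequiv =
  | Equal
  | Any
  | PArray of pequiv * (var * pequiv) option (* default, known exception *)

let mk_parray default exc =
  match default, exc with
  | Equal, None | Equal, Some (_, Equal) -> Equal
  | Any, None | Any, Some (_, Any) -> Any
  | _ -> PArray (default, exc)

(* Coarse join for this illustration: Equal is bottom, Any is top. *)
let join a b =
  match a, b with
  | Equal, r | r, Equal -> r
  | _ -> Any

(* Remove references to indices that are not inputs of the predicate. *)
let rec drop_local_indices (is_input : var -> bool) (r : pequiv) : pequiv =
  match r with
  | Equal | Any -> r
  | PArray (def, exc) -> (
      let def = drop_local_indices is_input def in
      match exc with
      | None -> mk_parray def None
      | Some (i, e) ->
          let e = drop_local_indices is_input e in
          if is_input i then mk_parray def (Some (i, e))
          else mk_parray (join def e) None)
```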

8.2.2 Intraprocedural Correlations

In Chapter 7.4 on page 156, we have defined intraprocedural correlation summaries (Definition 7.4.1) as mappings from pairs of variables to correlation maps. In practice, the type intra is the following:

    module PVMap = EMap.Make (struct
      type t = element * element
      let compare = compare
    end)

    module PMap = EMap.Make (struct
      type t = Path.t * Path.t
      let compare = compare
    end)

    type correlation = pequiv PMap.t
    type intra = correlation PVMap.t

    type t =
      | Related of intra
      | NoCorrelation
      | Unreachable

The implemented intraprocedural correlation summary type intra is a mapping from pairs of elements to correlation maps. The element type is shown below.

    (* The type of the elements for which correlations
       are computed and kept intraprocedurally. Ghost
       elements are used only for variants: for a
       variant [v], a ghost element that nests the type
       of the variant [v] is created. These are filtered
       from final results. *)

    type element =
      | Local of var
      | Output of var
      | Ghost of texpr

In practice, we need to distinguish between output variables and local variables. This is important for distinguishing between the final value of an output, i.e., the one correlated with values of the inputs, and its local intermediate values. Furthermore, we need to introduce ghost elements for variants. When constructing a variant v with a constructor C(a, b), for instance, we can keep correlations between the pairs (a, v) and (b, v). However, we fail to capture the information regarding v's construction with C. In order to maintain it, we create a ghost element g_vtyp with v's type, we add the pair (g_vtyp, v) to the intraprocedural summary and associate (ε, ε) ↦ [C ↦ Any] to it. Such pairs are deleted from the intraprocedural predicate summaries; they are only used while analysing a predicate's body.

Unlike the operations discussed in Chapter 7, the implementations of the partial order (Definition 7.4.2) and join (Definition 7.4.3) operations are parameterized by the typing environment mapping variables to types. This has to be threaded through all operations, as it is necessary for the injection operation (Definition 7.3.8). We need to know the variable type onto which the relation is injected. For instance, in order to "fill" the unknown relations for fields or constructors with Any, we must first know what those fields or constructors are.

8.2.3 Dependency and Correlation Analysers

The input program is first parsed and each predicate is analysed in turn. Implicit predicates are treated conservatively. Since their implementation is hidden, a pessimistic assumption must be made. For the dependency analysis, it is considered that everything in their inputs has been read in order to obtain the outputs, for any possible exit label. Similarly, for the correlation analysis, it is considered that there is no correlation between the input and the output variables on any possible exit label.

For inductive predicates, the dependency analysis computes a summary for each case and joins the results for obtaining the dependency summary for the true exit label. The false label is treated conservatively and everything is considered to be read. Since inductive predicates are specification-only predicates that do not generate outputs, the correlation analysis associates a NoCorrelation summary to both labels.

    (* Analyse the body [g] of an explicit predicate *)
    let analyze g =
      let todo = Queue.create () in
      List.iter (fun v -> Queue.push v todo) (G.vertices g);
      let result = init_result g in
      let rec progress r =
        try
          let v = Queue.pop todo in
          let vd = MV.find v r in
          let edges = preds g v in
          let vd' = transfer r v edges in
          if D.leq vd' vd then progress r
          else begin
            List.iter (fun edge ->
              Queue.push (source edge) todo) edges;
            progress (MV.add v (D.join vd vd') r)
          end
        with Queue.Empty -> r
      in
      progress result

The body of each explicit predicate is analysed independently for each possible exit label, using a variation of the worklist algorithm, as shown above in the analyze function. Initially, a map is created, having as many elements as there are nodes in the predicate's body. All of these are initially mapped to Unreachable, the bottom element at the intraprocedural level. All the predicate's exit nodes are loaded into the working queue. Then, a recursive function progress is executed until a fixed point is reached and there are no more nodes left to analyse in the working queue. The first node of the queue is popped and analysed. The node's summary, as stored in the map, is retrieved in vd. The analysis returns a summary vd' for the node. The two summaries vd' and vd are compared, and if the former is at least as precise as the latter (D.leq vd' vd), then the recursive function progress is called directly. Otherwise, before calling progress, the predecessors of the analysed node are pushed into the working queue, and in the map of nodes the join of vd and vd' is associated to the analysed node. Since both analyses are backwards analyses, the dependency and correlation information of a node is based on the dependency or correlation information of its successors in the control flow graph, and the former must be recomputed if the latter are modified. Finally, from the computed intraprocedural dependency summary, all mappings corresponding to local variables are filtered. From the computed correlation summary of an exit label l, all mappings that do not correspond to an input and output variable pair are filtered.
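The worklist scheme described above can be tried out on a toy domain. The sketch below is a self-contained variation, not the prototype: nodes are integers, the abstract value of a node is the set of variable names needed from that node onwards (the empty set playing the role of the bottom element), and the control flow graph is a map from nodes to the variables they read and their successors. All names (cfg, preds, transfer, analyze) are assumptions of this illustration, and OCaml 4.08 or later is assumed for Queue.take_opt and the Int module.

```ocaml
module S = Set.Make (String)
module M = Map.Make (Int)

(* Toy CFG: node -> (variables read at the node, successor nodes). *)
type cfg = (S.t * int list) M.t

let preds (g : cfg) (v : int) : int list =
  M.fold
    (fun u (_, succs) acc -> if List.mem v succs then u :: acc else acc)
    g []

(* Backward transfer: a node needs what it reads, plus whatever its
   successors currently need. *)
let transfer (g : cfg) (r : S.t M.t) (v : int) : S.t =
  let reads, succs = M.find v g in
  List.fold_left (fun acc s -> S.union acc (M.find s r)) reads succs

let analyze (g : cfg) : S.t M.t =
  let todo = Queue.create () in
  M.iter (fun v _ -> Queue.push v todo) g;
  let rec progress r =
    match Queue.take_opt todo with
    | None -> r (* empty queue: fixed point reached *)
    | Some v ->
        let vd = M.find v r in
        let vd' = transfer g r v in
        if S.subset vd' vd then progress r
        else begin
          (* The value of [v] grew: its predecessors must be revisited. *)
          List.iter (fun p -> Queue.push p todo) (preds g v);
          progress (M.add v (S.union vd vd') r)
        end
  in
  (* Every node starts at the bottom element, here the empty set. *)
  progress (M.map (fun _ -> S.empty) g)
```

On a graph with a loop between nodes 0 and 1 and an exit node 2, the information read at the exit propagates backwards through the loop until no summary changes any more. Termination relies on the domain having finite height, which holds here because the set of variable names is finite.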

For the dependency analyser, a command-line flag can be used to disable the usage of deferred dependencies. Also, the well-typedness check of dependency summaries can be enabled similarly.

A parser for dependency information has been implemented as well. This allows us to annotate αSmil programs with the expected results and compare them to the computed ones. A similar parser for the correlation information is planned for the near future.

8.3 Dependency and Correlation Results on ProvenCore Layers

8.3.1 ProvenCore Description

ProvenCore (Lescuyer 2015) is one of the two microkernels entirely specified and developed in Smart at Prove & Run. Unlike Minix 3.1, by which it was inspired, ProvenCore targets ARM architectures and uses a Memory Management Unit for managing virtual address spaces. It is a general-purpose microkernel supporting creation and deletion of processes, execution of programs, synchronous message-passing inter-process communication with timeouts, asynchronous notifications and process-to-process data copies.

The main property ensured by ProvenCore is the isolation property Isolation impliestwo complementary properties namely integrity and confidentiality Integrity refersto ensuring that the resources of a process (its code data and registers) cannot bealtered or interfered with by other processes unless explicitly authorized by the processConfidentiality refers to ensuring that the resources of a process cannot be observed byother processes unless explicitly authorized by the process In other words integrityensures that until a process decides to communicate with other processes it will executeas if it were alone on the system Confidentiality ensures that as long as a process doesnot send its secrets to other processes it can change its secrets without affecting otherprocesses

The isolation property has been formally proven using the interactive proof as-sistant of ProvenTools The proofs also establish functional specifications verified byProvenCore (Lescuyer 2015)

The proof for the isolation property is based on multiple refinements between suc-cessive models from the most abstract on which the isolation property is defined andproven to the most concrete ie the actual model used for code generation Thesesuccessive models are shown in Figure 81

Using multiple abstract models, each more abstract than its predecessor, enables a degree of separation of concerns in the overall proof. The lower-level proofs include a plethora of low-level properties and invariants and are devoid of functional properties, while the higher-level models focus on functional specifications. Each layer of abstraction removes details that are no longer relevant to it and enables changing the representation of the transition system in order to internalize, in the structure of its states, some invariants of the preceding level.

[Figure 8.1 – ProvenCore – Abstract Layers: SPM, RSM, FSP, TDS, ordered from most abstract (SPM) to least abstract (TDS)]

The Security Policy Model (SPM) is the most abstract level and the one at which the isolation property is expressed and proven. The kernel is modeled as an abstract controller, and the various processes are modeled as machines, each possessing its own independent physical resources.

The Refined Security Model (RSM) is an intermediate layer, meant to bridge the wide gap between its successor, the SPM, and its predecessor, the FSP. In the RSM, the machines share the same physical resources, which are managed by the controller.

The Functional Specifications (FSP) layer is a model roughly equivalent to its predecessor, the TDS, in functionality, but unlike the latter it uses data structures and algorithms that facilitate reasoning and formal proof. Its main functional difference with the TDS is that it eliminates MMU address translation, using instead a linear view of the RAM, similarly to the RSM.

The Target of Evaluation Design (TDS) is the model that is used to generate the sequential Smart code of the kernel, as well as the models for hardware components that are not translated into C code but which are necessary for completing the TDS specifications.

For each refinement, a view, i.e. a function from the concrete model state to the abstract model state, is defined. Then a correspondence, or commutation, lemma is proven, establishing that transitions from c to c′ in the concrete model entail transitions from the view of c to the view of c′ in the abstract model. Since the views are not total functions, this requires showing that the views actually exist. In this manner, the higher levels are attained, reaching models that are simpler and more flexible than the TDS, but that still simulate all its possible behaviours (Lescuyer, 2015).
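As a rough illustration of what the commutation lemma states, the following Python sketch checks it by enumeration on a tiny, made-up transition system. The concrete states (a counter with a cached parity bit), the partial view and the abstract steps are all hypothetical; in ProvenCore the lemma is established by interactive proof, not by enumeration.

```python
def view(concrete):
    """Partial view: drop the cached parity bit; undefined if the cache is stale."""
    count, parity = concrete
    return count if parity == count % 2 else None

def commutes(concrete_steps, abstract_steps):
    """Every concrete step c -> c' must map to an abstract step view(c) -> view(c')."""
    for c, c_next in concrete_steps:
        a, a_next = view(c), view(c_next)
        if a is None or a_next is None:       # the views must actually exist
            return False
        if (a, a_next) not in abstract_steps:
            return False
    return True

abstract_steps = {(0, 1), (1, 2)}
good = [((0, 0), (1, 1)), ((1, 1), (2, 0))]
bad = good + [((2, 0), (3, 0))]               # stale parity bit: no view exists

good_ok = commutes(good, abstract_steps)
bad_ok = commutes(bad, abstract_steps)
```

The second system fails the check because the view of its last state is undefined, which mirrors the remark that the existence of the views has to be shown.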

This refinement chain also facilitates reusing parts of one proof effort in other proofs.


8.3.2 Obtained Dependency and Correlation Results

Our dependency and correlation analyses must be evaluated by two different criteria, namely execution time and precision. In this section we discuss the former. The latter will be discussed in the following section.

Both analyses target complex transition systems in general and operating systems in particular. The ideas behind them stemmed directly from the verification effort entailed by ProvenCore. Unlike other static analyses, which are frequently employed in a fully automatic setting, our static analyses are supposed to be used as companion tools in the middle of interactive program verification. They are supposed to be applied often, as steps during interactive proofs. For instance, the dependency and correlation summaries for different predicates might be needed for verifying a single property. These, in turn, may imply a whole-model analysis. Therefore, the dependency and correlation analyses must perform quickly in order to effectively answer frequently asked "questions".

Our analyses have currently been applied to the functional specification of ProvenCore (Lescuyer, 2015). More specifically, they have been applied to the RSM, FSP and TDS layers shown in Figure 8.1. Each of these layers is characterized by a global state with numerous fields and different transitions, i.e. supported commands or system calls, such as fork, exec, exit. Each supported command receives as an input the global state before the transition and returns the state of the system after the transition.

For instance, in RSM, the global states are much simpler compared to the ones in the layers below it, i.e. FSP and TDS. They are modeled by a structure with 6 fields, out of which 3 are modeled by arrays and 2 by structures. The RSM counterpart of the optional table of processes is a store of machines, which are themselves the counterpart of FSP processes. Machines are structures with 7 fields that refer to registers, information regarding inter-process communication or permissions, and code and data segments. Out of the 7 fields, 2 are modeled by variants, 2 by associative arrays and another 2 by structures.

The global state of the FSP layer is modeled by a structure type with 15 fields, including fields that concern process management (for memory allocations, information about processes), interrupt handling (registered handlers, active handlers), scheduling (priority queues, currently running process, process to run next), time management or code data. Among these 15 fields, 9 fields are "composite", themselves being modeled by structures, variants or associative arrays. For instance, among the fields concerning process management, there is a table of optional processes. The processes themselves are modeled by a structure type having 26 fields. Out of the total of 26 fields, 11 are modeled by algebraic data structures or associative arrays, too.

The FSP global state is characterized by over 70 invariants.

In TDS, the global state is a structure having 33 fields, among which 23 are "composite" as well. The processes are structures having 29 fields, among which 14 are modeled by associative arrays or algebraic data types. The global state is characterized by approximately 140 invariants.


In Table 8.3 we give an overview of the global states for each analysed layer. The first column shows the total number of fields. The second column indicates the number of fields that are modeled by associative arrays. Between parentheses we indicate the number of arrays having "composite" elements and elements of atomic or implicit types, respectively. For example, the FSP global state has 6 fields that are modeled by associative arrays, and all 6 of them have "composite" elements. In columns 3, 4 and 5 we show the number of fields that are modeled by structures, variants, and atomic or implicit types, respectively.

Table 8.3 – ProvenCore Abstract Layers – Global State Type

       Global State   Arrays             Structures   Variants   Atomic/Implicit
RSM    6 fields       2 fields (1, 1)    2 fields     0 fields   2 fields
FSP    15 fields      6 fields (6, 0)    0 fields     3 fields   6 fields
TDS    33 fields      14 fields (14, 0)  3 fields     6 fields   10 fields

The global state of each layer contains an array or store of processes or machines. In Table 8.4 we give an overview of the process or machine type for each analysed layer. The table has the same structure as the one described previously for the global state types.

Table 8.4 – ProvenCore Abstract Layers – Process/Machine Type

       Process/Machine   Arrays            Structures   Variants   Atomic/Implicit
RSM    7 fields          2 fields (1, 1)   2 fields     2 fields   1 field
FSP    26 fields         2 fields (0, 2)   5 fields     3 fields   16 fields
TDS    29 fields         1 field (1, 0)    8 fields     5 fields   15 fields

We have applied our dependency and correlation analyses on the RSM, FSP and TDS layers, thus conducting medium-sized experiments. An overview of the characteristics of the 3 ProvenCore layers is included in Table 8.5, Table 8.7 and Table 8.9. In each of these, the first column shows the total number of predicates of the analysed layers. In parentheses we indicate the number of predicates that only read information and return a Boolean-like exit label, i.e. logical properties, as well as the number of implicit predicates, for which a pessimistic assumption is made. The second column shows the total number of lines of code (LoC) for each, including comments and type definitions. The next three columns indicate the number of LoC corresponding to predicates, type definitions and comments, respectively.

We have run the analyses 101 times in a loop on a Lenovo laptop with a Quad-Core Intel Core i7-5500U processor and 8 GB RAM. The system runs Xubuntu GNU/Linux 64-bit, Release 15.10, with OCaml 4.01. Before the first run of each loop, the operating system's cache was dropped using the following command:

echo 3 > /proc/sys/vm/drop_caches

The time measured includes only the execution of the analysis algorithms. It excludes the time required to load the input files, as well as the time spent printing the results.
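The measurement protocol can be pictured with the following Python sketch, which times only the analysed function over repeated runs and reports the same statistics as the timing tables below (minimum, percentiles, maximum, average). The helper names and the nearest-rank percentile definition are assumptions; the thesis does not state how its percentiles were computed.

```python
import statistics
import time

def time_runs(fn, runs=101):
    """Time only the call to `fn`; loading inputs and printing stay outside."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Min, 10th/50th/90th percentiles (nearest rank, an assumption), max, mean."""
    ordered = sorted(samples)

    def pct(p):
        return ordered[round(p * (len(ordered) - 1) / 100)]

    return {
        "min": ordered[0],
        "p10": pct(10),
        "p50": pct(50),
        "p90": pct(90),
        "max": ordered[-1],
        "avg": statistics.fmean(ordered),
    }

# Usage with a stand-in workload; a real run would call the analysis instead.
samples = time_runs(lambda: sum(range(1000)), runs=5)
stats = summarize([float(i) for i in range(101)])
```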

On average, our fully context-insensitive dependency analysis, as presented in Chapter 5, computed the dependency summaries for the 633 RSM/FSP predicates in 0.656 seconds. For the TDS predicates, the dependency summaries were computed in 0.699 seconds on average. These results are indicated in Table 8.5.

Table 8.5 – Abstract Layers – Evaluation Data and Dependency Analysis Timing

          Predicates       Total LoC   Code    Types   Comments   Dependency Avg.
RSM/FSP   633 (235, 65)    9853        8402    596     855        0.656 s
TDS       780 (231, 155)   14000       11306   588     2106       0.699 s

In Table 8.6 we indicate the minimum and maximum execution times for the context-insensitive dependency analysis. Various percentiles are indicated as well.

Table 8.6 – Abstract Layers – Detailed Dependency Analysis Timing (in seconds)

          Min     10%ile   50%ile   90%ile   Max     Avg.
RSM/FSP   0.650   0.651    0.652    0.658    0.730   0.656
TDS       0.690   0.691    0.693    0.718    0.798   0.699

The average execution time of our dependency analysis with the deferred accesses extension is shown in Table 8.7, in the last column, denoted by Avg. On average, our dependency analysis extended with deferred accesses, as presented in Chapter 6, computed the dependency summaries with context-sensitive leaves for the 633 RSM/FSP predicates in 0.779 seconds. For the TDS predicates, the dependency information was computed in 0.919 seconds on average.

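Therefore, using our relaxed form of context-sensitivity led to an increase in execution time of roughly 19% on the RSM/FSP layers (0.656 s to 0.779 s) and roughly 31% on the TDS layer (0.699 s to 0.919 s) on the used benchmarks.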
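The relative overhead of the deferred-accesses extension can be recomputed directly from the reported averages (Tables 8.5 and 8.7). The short Python sketch below does only this arithmetic; the dictionary and function names are illustrative.

```python
# Average execution times in seconds, as reported in Tables 8.5 and 8.7.
context_insensitive = {"RSM/FSP": 0.656, "TDS": 0.699}
deferred = {"RSM/FSP": 0.779, "TDS": 0.919}

def overhead_percent(base, extended):
    """Relative increase of `extended` over `base`, in percent."""
    return 100.0 * (extended - base) / base

overheads = {
    layer: overhead_percent(context_insensitive[layer], deferred[layer])
    for layer in deferred
}
```

With these averages the increase is about 18.8% for the RSM/FSP layers and about 31.5% for the TDS layer.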

Table 8.7 – Abstract Layers – Evaluation Data and Deferred Dependency Analysis Timing

          Predicates       Total LoC   Code    Types   Comments   Deferred Avg.
RSM/FSP   633 (235, 65)    9853        8402    596     855        0.779 s
TDS       780 (231, 155)   14000       11306   588     2106       0.919 s

The detailed timing information for the dependency analysis using deferred accesses is shown in Table 8.8.

Table 8.8 – Abstract Layers – Detailed Deferred Dependency Analysis Timing (in seconds)

          Min     10%ile   50%ile   90%ile   Max     Avg.
RSM/FSP   0.776   0.777    0.779    0.781    0.785   0.779
TDS       0.904   0.905    0.908    0.975    0.999   0.919

The average execution time of our correlation analysis is shown in Table 8.9, in the last column, denoted by Avg. The correlation summaries for the RSM/FSP predicates are computed in 0.426 seconds on average. For the TDS predicates, the correlation summaries are computed in 0.496 seconds on average. Unlike the dependency analysis, which computes information for code as well as specifications, i.e. logical properties, in a unified manner, the correlation analysis only computes information for predicates that actually modify data structures. This partly explains the time difference between the two analyses. We also remark that the possible-constructors analysis is performed simultaneously with the dependency analysis, and this contributes to the difference between the execution times as well.

Table 8.9 – Abstract Layers – Evaluation Data and Correlation Analysis Timing

          Predicates       Total LoC   Code    Types   Comments   Correlation Avg.
RSM/FSP   633 (235, 65)    9853        8402    596     855        0.426 s
TDS       780 (231, 155)   14000       11306   588     2106       0.496 s

The detailed timing information for our correlation analysis is shown in Table 8.10.

Table 8.10 – Abstract Layers – Detailed Correlation Analysis Timing (in seconds)

          Min     10%ile   50%ile   90%ile   Max     Avg.
RSM/FSP   0.424   0.425    0.425    0.427    0.432   0.426
TDS       0.492   0.493    0.494    0.498    0.540   0.496

Generally, static analysis has been considered prohibitive in terms of execution time; it has been avoided in an interactive context and used predominantly in an automatic context. Though currently applied only to medium-sized models, the execution times of both of our analyses are short enough to expect reasonable execution times for larger models as well.²

² It is worth noting that the interprocedural dependency and correlation summaries will not necessarily be computed on the fly during the interactive proof. Rather, they will be computed as part of the build. In contrast, the treatment of a query, once all interprocedural information has been computed, will be executed in real time. Nevertheless, it is desirable to have fast analyses, allowing developers to iterate frequently.

8.3.3 Precision of our Dependency and Correlation Summaries

In this section we illustrate the sort of dependency and correlation summaries that are computed by our analyses. We conclude the section with a brief discussion regarding the precision of our obtained results. Assessing and discussing precision as a metric for usefulness is hard in isolation and can only be done effectively in relation to actual applications. However, we present some statistics in order to give some insight into the proportion of non-trivial information computed. For our current discussion, we focus on the results obtained on the RSM/FSP and the TDS layers.

One of the analysed predicates of the RSM/FSP layers is do_auth. This predicate is a system call clearing or granting an authorization to some process to read from or write to some memory range of the current process. It receives a global state in and an index i as inputs and produces, on the true label, the new global state out, after modifying the permission for the i-th process in the process store.

The code of do_auth performs various system-wide checks before registering the permission change and is therefore not trivial, although its effect is quite limited. Indeed, the correlation results computed by our analysis for the true label of this predicate are shown below:

true: (in, out) ↦ [ (ε, ε) ↦ { … ↦ Equal (14 fields); procs ↦ Any };
                    (procs, procs) ↦ 〈 Equal; i: [ None ↦ Equal;
                                                   Some ↦ v ↦ { … ↦ Equal (25 fields); mem_auth ↦ Any } ] 〉 ]

The analysis detects that, out of the 15 fields of out, only the i-th element of the procs field is changed. Furthermore, it detects that, if this element is an active process, i.e. built with the Some constructor, only the mem_auth field is modified out of the total of 26 fields. Everything else is copied from the input state in.
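To make the reading of such a summary concrete, the following Python sketch encodes a simplified version of the do_auth correlation summary as a nested dictionary and collects the paths that are not known to be Equal. The encoding is an assumption made for illustration: the key "*" stands for "all remaining fields or elements", and the top-level (ε, ε) and (procs, procs) entries are merged into a single tree, which is not how the analysis represents its correlation maps.

```python
EQUAL, ANY = "Equal", "Any"

def changed_paths(corr, prefix=()):
    """Collect the paths whose correlation is Any, i.e. possibly modified."""
    if corr == EQUAL:
        return []
    if corr == ANY:
        return [prefix]
    paths = []
    for key, sub in corr.items():
        paths += changed_paths(sub, prefix + (key,))
    return paths

# Simplified, hypothetical encoding of the summary for the true label.
do_auth_true = {
    "*": EQUAL,                      # the 14 other fields of the global state
    "procs": {
        "*": EQUAL,                  # every element other than the i-th
        "[i]": {
            "None": EQUAL,
            "Some": {"*": EQUAL,     # the 25 other process fields
                     "mem_auth": ANY},
        },
    },
}

modified = changed_paths(do_auth_true)
```

Under this encoding the only possibly modified path is procs, index i, constructor Some, field mem_auth, matching the informal reading given above.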


Combined with dependency summaries for logical properties, this correlation summary would allow us to infer the preservation of all invariants that are not concerned with the memory permissions. All but one of the specified properties for the global state fall into this category. The exception is the relevant memory permissions property

predicate proc_mem_auth_ok(proc proc) -> [true | false]

which verifies a fundamental property that has to hold for all processes in the process store of proc, and states that a process has permissions covering a valid range of memory addresses and referring only to existing processes. After executing do_auth, this property is threatened and needs to be verified only for the i-th process of the store. It is preserved for all others.

The dependency results computed by our analysis for this predicate are shown below. The analysis detects that, for each of the possible execution scenarios, the outcome depends only on 2 out of the 26 fields, namely the stackframe and the memory permissions. The dependency on the stackframe is confined to only one of the 3 fields: the data and stack segment. The memory permissions are given by a variant with 3 constructors, denoting reading and writing permissions or the absence of any permission. Furthermore, besides pinning down the outcome's dependency on 2 out of the 26 fields of the proc structure, the analysis also detects that the absence of any memory permission, indicated by the constructor NONE of the mem_auth variant, is ⊥ for the false execution scenario. In other words, unused permissions cannot threaten the property proc_mem_auth_ok.

false → proc → { mem_auth → [ READ → { base → ⊤; len → ⊤ };
                              WRITE → { base → ⊤; len → ⊤ };
                              NONE → ⊥ ];
                 stackframe → { ds → ⊤ } }
true → proc → { mem_auth → [ READ → { base → ⊤; len → ⊤ };
                             WRITE → { base → ⊤; len → ⊤ };
                             NONE → ];
                stackframe → { ds → ⊤ } }

The relevant memory permissions property is thus only threatened by transitions that add memory permissions or change a process' virtual space layout. Only 2 transitions out of the 25 belong to this category: exec, which resets the process' segments, and do_auth, which adds permissions and was discussed above. In particular, transitions deleting memory permissions do not impact the property, since the absence of permissions, as shown by the dependency of the constructor NONE for the false label, is an impossible case when the property does not hold. This is one of the practical advantages of tracking constructor possibilities simultaneously and of extending the correlation analysis to track the evolution of constructors as well.

In the following, we briefly discuss our dependency summaries obtained on the RSM/FSP layer in terms of precision. An overview is given in Table 8.11. The first column refers to the fully context-insensitive dependency analysis, as presented in Chapter 5. The second column refers to the dependency analysis extended with deferred access maps, as presented in Chapter 6. The first line indicates the total number of predicates, both implicit and explicit. The second line indicates the total number of implicit predicates, for which we are obliged to make a pessimistic assumption and to consider everything needed, given that their implementation is hidden. The third line indicates the number of explicit predicates without inputs, for which empty summaries are retrieved. Our dependency analysis detects the input subset that is read in order to obtain the output. In the case of predicates without inputs, this subset is empty. Most explicit predicates without inputs correspond to wrapper predicates around calls to constructors that take no arguments. Since αSmil is an intermediate language, such predicates are automatically generated and do not necessarily correspond to programmer-written predicates. The next line, line 4, indicates the number of predicates for which we obtain non-trivial information. By non-trivial information we mean dependency summaries in which the dependency associated to at least one input variable is different from ⊤, i.e. Everything, the element conveying no information. With the context-insensitive dependency analysis we obtain non-trivial results for 344 predicates. With the extended dependency analysis we obtain non-trivial results for 403 predicates.

Table 8.11 – RSM/FSP Layers – Evaluation Data and Dependency Summaries

                                  Context-Insensitive   Deferred
Number of Total Predicates        633                   633
Number of Implicit Predicates     65                    65
No Inputs                         26                    26
Number of Non-Trivial Results     344                   403
Number of Trivial Results         289                   230
  • Implicit                      65                    65
  • No Inputs                     26                    26
  • Other                         198                   139
Predicates with Atomic Inputs     31                    31
Completely Read                   71                    71
Overapproximation                 96                    37

The following line, line 5, indicates the total number of predicates for which trivial results are obtained. These include the results for implicit predicates as well as those for predicates without inputs. For the simple version of the dependency analysis, we obtain 198 trivial results, excluding implicit predicates and predicates without inputs. For the extended dependency analysis, we obtain trivial results for 139 predicates, excluding implicit predicates and predicates without inputs. Therefore, for the first version of the analysis, 59 trivial summaries (198 − 139) are a consequence of context-insensitivity. The next 3 lines refer to the 139 predicates for which trivial results are obtained with both versions of the dependency analysis. 31 of them correspond to predicates manipulating only inputs of atomic types, such as int. Such inputs are completely read, and thus the trivial results are justified and do not correspond to an over-approximation. Another 71 correspond to predicates making complex manipulations and actually reading all of their input, such as well-formedness checks. The last 37 trivial results are a consequence of over-approximations made by our analysis. The majority of them correspond to complex predicates making multiple calls to other complex predicates and relying heavily on calls to implicit predicates, for which conservative assumptions are made. For the simple dependency analysis, a further 59 trivial results (96 − 37) are a result of over-approximations related to context-insensitivity.
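The breakdown in Table 8.11 can be cross-checked with a few lines of arithmetic. The Python sketch below only re-derives the "Other" rows and the gap between the two analyses from the figures in the table; the dictionary layout and key names are illustrative.

```python
# Figures copied from Table 8.11 (RSM/FSP layers).
table = {
    "context_insensitive": dict(total=633, implicit=65, no_inputs=26,
                                non_trivial=344, atomic=31,
                                fully_read=71, over_approx=96),
    "deferred": dict(total=633, implicit=65, no_inputs=26,
                     non_trivial=403, atomic=31,
                     fully_read=71, over_approx=37),
}

def other_trivial(col):
    """Trivial results that are neither implicit nor input-free predicates."""
    return col["total"] - col["non_trivial"] - col["implicit"] - col["no_inputs"]

others = {name: other_trivial(col) for name, col in table.items()}

# Trivial results of the simple analysis that the deferred extension removes.
gap = others["context_insensitive"] - others["deferred"]
```

The "Other" rows come out as 198 and 139, each equal to the sum of the three rows below it, and the difference between the two analyses is 59 predicates.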

An overview of the dependency results for the TDS layer is given in Table 8.12. The table follows the same structure as described for Table 8.11.

Table 8.12 – TDS Layer – Evaluation Data and Dependency Summaries

                                  Context-Insensitive   Deferred
Number of Total Predicates        780                   780
Number of Implicit Predicates     155                   155
No Inputs                         15                    15
Number of Non-Trivial Results     386                   458
Number of Trivial Results         394                   322
  • Implicit                      155                   155
  • No Inputs                     15                    15
  • Other                         224                   152
Predicates with Atomic Inputs     49                    49
Completely Read                   59                    59
Overapproximation                 116                   44

We remark that, with the deferred dependencies extension, we obtain more precise dependency summaries for 273 predicates of the RSM/FSP abstract layer. These constitute approximately 50% of the predicates in the used benchmark. For the TDS layer, we obtain more precise results for 308 predicates using the deferred dependencies extension. These constitute approximately 50% of the predicates in the TDS layer for which non-trivial results can be obtained (i.e. excluding implicit predicates and those without inputs). The dependency summaries obtained with the extended analysis are considerably more detailed. For instance, just to give an intuition of the difference between the results obtained for the TDS layer, the file containing the results computed with the context-insensitive dependency analysis contains 7333 lines and its size is 2631 kB, while the file containing the results computed with the extended analysis contains 11547 lines and its size is 5239 kB.

The statistics for the correlation analysis are shown in Table 8.13. Unlike the dependency analysis, which handles both logical properties and predicates generating outputs, the correlation analysis does not handle logical properties. It tracks fine-grained partial equivalences between parts of the input and parts of the output. Therefore, the number of RSM/FSP predicates for which we can obtain non-trivial results (i.e. at least one partial equivalence between an input (sub)element and an output (sub)element on at least one exit label) is lower. Implicit predicates and specification-only predicates are mapped to NoCorrelation, the top element, conveying no information. Out of the 307 predicates left, we obtain non-trivial results for 186 of them. The rest include predicates relying heavily on calls to implicit predicates. They also include complex system calls, such as fork or exec, and auxiliary operations which modify their input entirely.

Table 8.13 – RSM/FSP Layers – Evaluation Data and Correlation Summaries

                                              Correlation Analysis
Number of Total Predicates                    633
Number of Implicit Predicates                 65
Number of Logical Properties (No Outputs)     235
No Inputs                                     26
Number of Non-Trivial Results                 186
Number of Trivial Results                     90
  • Implicit                                  65
  • No Inputs                                 26
  • No Outputs                                235
  • Atomic/Implicit Inputs                    31

An overview of the correlation results for the TDS layer is given in Table 8.14. The table follows the same structure as described for Table 8.13.

Table 8.14 – TDS Layer – Evaluation Data and Correlation Summaries

                                              Correlation Analysis
Number of Total Predicates                    780
Number of Implicit Predicates                 155
Number of Logical Properties (No Outputs)     231
No Inputs                                     15
Number of Non-Trivial Results                 235
Number of Trivial Results                     95
  • Implicit                                  155
  • No Inputs                                 15
  • No Outputs                                231
  • Atomic/Implicit Inputs                    49

8.4 Reasoning about Framing using Correlations and Dependencies

8.4.1 A Decision Procedure

In general, reasoning about framing relies on the frame rule, which is commonly illustrated as follows:

$$\frac{\{P\}\; C\; \{Q\}}{\{P \land R\}\; C\; \{Q \land R\}}$$

The purpose of the frame rule is to enable local reasoning: a property R that holds for a state P will continue to hold after executing a command C, provided that R reads only locations that are unmodified by C. The frame rule, also called the rule of constancy (Reynolds, 1981), applies in its original form to simple languages, which do not use a heap. Separation logic addresses framing for heap-supporting languages.

In our case, the αSmil language with which we are working does not support mutation. Our work is not concerned with heap modifications, but focuses on deep-state modifications. We handle predicates that receive a composite input state and construct a new composite output state, without altering the former. The new output state is constructed by copying the input state and modifying a subset of subelements.

In our context, the frame rule must be reinterpreted as follows: a property R is preserved by a predicate C, receiving an input state P and constructing an output state Q, if the states P and Q agree on the subset on which the property R depends. In other words, a property is preserved by a predicate if the latter only modifies subelements on which the property does not depend. Using the terminology of separation logic, a property R is preserved by a predicate C if the footprint of C is disjoint from the footprint of R. However, we are not concerned with locations, but with subelements of large states, modeled by algebraic data structures and arrays. Therefore, when reasoning about framing, we need to check whether the input subset modified by an operation is disjoint from the subset that properties are reading and depending on.

We have devised two static analyses for automatically computing the footprints of operations and properties. The dependency analysis detects the input subset on which the outcome of an operation or of a property relies. The correlation analysis detects the input subset that is modified by an operation in order to obtain the output. The results of the two analyses are meant to be used and combined by a decision procedure in order to automatically infer the preservation of frame properties.
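The disjointness check behind this reinterpretation of the frame rule can be sketched as follows in Python. Footprints are simplified here to sets of access paths (tuples of field names), and two paths are taken to overlap when one is a prefix of the other. This representation and the example field names are assumptions; the actual summaries also distinguish array indices and constructors.

```python
def overlaps(p, q):
    """Two access paths overlap if one is a prefix of the other."""
    n = min(len(p), len(q))
    return p[:n] == q[:n]

def preserved(modified, read):
    """A property is preserved if it reads nothing the operation modifies."""
    return not any(overlaps(m, r) for m in modified for r in read)

# Footprint of an operation (from a correlation summary, simplified).
op_modifies = [("procs", "mem_auth")]

# Footprints of two properties (from dependency summaries, simplified).
sched_invariant_reads = [("run_queue",), ("current",)]
auth_invariant_reads = [("procs", "mem_auth"), ("procs", "stackframe", "ds")]

sched_ok = preserved(op_modifies, sched_invariant_reads)
auth_ok = preserved(op_modifies, auth_invariant_reads)
```

Here the scheduling property is inferred to be preserved, while the property reading the memory permissions still has to be verified after the operation.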

The decision procedure has not been implemented yet, but based on preliminary experiments we give an intuition about how the dependency and correlation summaries are meant to be unified, what type of queries could be answered, and the mechanism used for answering them.

Concretely, the decision procedure is meant to receive a sequence of atoms, one of which is a query. The query is to be answered based on the correlation summaries computed for the other atoms. Atoms are calls to built-in or user-defined predicates. Queries usually consist of a Boolean built-in statement, such as an equality check or a partial structure equality check, for instance, or a call to a logical predicate having true and false as exit labels and generating no outputs. In a nutshell, the dependency summary computed for the query would have to be transformed and interpreted as a set of correlations that are sufficient to answer the given query affirmatively. This should then be compared to the correlations computed for the atoms. The query can be answered affirmatively if the latter is less than or equal to the former.

We sketch the envisioned mechanism behind our decision procedure on a simple example receiving 4 atoms. One of them is a query, as shown below:

type state = { f : int; g : int; h : int }

v1 = s.f
t = { s with g = w }
v2 = t.f
Q: v1 = v2 - true -

In this case, it is not necessary to first obtain the dependency for the query, marked with Q, and to interpret it as a correlation. The necessary and sufficient correlation for the query to be answered affirmatively can be obtained directly:

(v1, v2) ↦ (ε, ε) ↦ Equal

Separately, we need to extract all the correlation information regarding (v1, v2) from the given atoms. For this, we must first find the chains of correlations connecting the two through other intermediate atoms. Therefore, we begin by building an undirected graph in which every variable appearing in the atoms is added as a node. An edge is added between any nodes representing the input and the output of the same atom.³ For our example, the graph is shown below:

[Graph of atom variables: nodes s, t, v1, v2 and w; edges s-v1, s-t, w-t and t-v2]

³ In general, these graphs will not be acyclic. Further measures will have to be taken for correctly dealing with all cases.


The path connecting v1 and v2 (v1, s, t, v2) is highlighted in green in the graph. In the general case, such paths could be detected using a depth-first search algorithm. Using the detected path between v1 and v2, we build a chain of pairs of variables of the following form:

(v1, s) <-> (s, t) <-> (t, v2)
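The graph construction and the path search can be sketched in a few lines of Python. The representation of atoms as (inputs, output) pairs and the function names are assumptions made for illustration; the decision procedure itself has not been implemented, and cyclic graphs would need the additional care mentioned in the footnote.

```python
def build_graph(atoms):
    """Undirected graph: an edge links each input of an atom to its output."""
    graph = {}
    for inputs, output in atoms:
        for v in inputs:
            graph.setdefault(v, set()).add(output)
            graph.setdefault(output, set()).add(v)
    return graph

def find_path(graph, src, dst, seen=frozenset()):
    """Depth-first search for a simple path from src to dst, or None."""
    if src == dst:
        return [src]
    seen = seen | {src}
    for nxt in sorted(graph[src]):
        if nxt not in seen:
            rest = find_path(graph, nxt, dst, seen)
            if rest is not None:
                return [src] + rest
    return None

# The three non-query atoms of the example: v1 = s.f, t = {s with g = w}, v2 = t.f
atoms = [(["s"], "v1"), (["s", "w"], "t"), (["t"], "v2")]
graph = build_graph(atoms)
path = find_path(graph, "v1", "v2")
chain = list(zip(path, path[1:]))   # consecutive pairs along the path
```

For the example atoms, the search returns the path v1, s, t, v2 and the chain of pairs (v1, s), (s, t), (t, v2).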

These are the unordered pairs for which we need to extract the correlation information contained in the correlation summaries of the atoms. The correlation summaries of our example atoms are the following:

v1 = s.f                (s, v1) ↦ (f, ε) ↦ Equal
t = { s with g = w }    (s, t) ↦ { (f, f) ↦ Equal; (h, h) ↦ Equal }
                        (w, t) ↦ (ε, g) ↦ Equal
v2 = t.f                (t, v2) ↦ (f, ε) ↦ Equal

In the correlation summaries computed by our analysis, correlation maps are associated to pairs of input and output values, i.e. the computed information is expressed between the input and the output variables of an operation. They can be seen as ordered pairs having inputs as the left members and outputs as the right members. However, the correlation information expresses a relation between two runtime values, which can be compared independently of the order in which they appear.⁴ The atoms refer to values that occur in the program at different times, and answering the query is done independently of the order of execution. Therefore, at this level, we can swap the members of the pairs to which correlation maps are associated. This allows us to obtain correlation information expressed in terms of the variable pairs in the chain extracted from the graph of atom variables. For instance, for our example, we would obtain the following:

(v1, s) <-> (s, t) <-> (t, v2)

(v1, s) ↦ (ε, f) ↦ Equal
(s, t) ↦ { (f, f) ↦ Equal; (h, h) ↦ Equal }
(t, v2) ↦ (f, ε) ↦ Equal

From these, we compute the Cartesian product of the correlations appearing in the correlation maps, as follows:

⁴ When the evolution of constructors is tracked as well, the relations will stop being symmetric. Thus, the matrices will have to be transposed.


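$\{c_1\} \times \{c_2, c_3\} \times \{c_4\}$

where

$c_1 = (\varepsilon, f) \mapsto \mathit{Equal}$
$c_2 = (f, f) \mapsto \mathit{Equal}$
$c_3 = (h, h) \mapsto \mathit{Equal}$
$c_4 = (f, \varepsilon) \mapsto \mathit{Equal}$

For our example, the obtained set would be the following:

$\{\,((\varepsilon, f) \mapsto \mathit{Equal},\ (f, f) \mapsto \mathit{Equal},\ (f, \varepsilon) \mapsto \mathit{Equal}),$
$\ \ ((\varepsilon, f) \mapsto \mathit{Equal},\ (h, h) \mapsto \mathit{Equal},\ (f, \varepsilon) \mapsto \mathit{Equal})\,\}$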

For each member of the obtained set, we need to recursively compose the correlations in order to obtain information regarding the values involved in the query. The compose operations would be applied as follows:

$(((c'_1 \circ c'_2) \circ c'_3) \cdots)$

where, for the first element of our example set, $c'_1$, $c'_2$ and $c'_3$ have the following values:

$c'_1 = (\varepsilon, f) \mapsto \mathit{Equal}$
$c'_2 = (f, f) \mapsto \mathit{Equal}$
$c'_3 = (f, \varepsilon) \mapsto \mathit{Equal}$

For our example, we cannot obtain any correlation information regarding $(v_1, v_2)$ by composing the correlations of the second member of the Cartesian product. The first correlation relates the value of $v_1$ to the value of the f field of s, while the second correlation relates the values of the field h of s and t. Thus, in this case, we cannot infer anything regarding $v_1$ and t, nor regarding $v_1$ and $v_2$. However, by composing the correlations of the first member of the Cartesian product, we obtain the following:

$(v_1, v_2) \mapsto \{(\varepsilon, \varepsilon) \mapsto \mathit{Equal}\}$

If, after composing, we had obtained multiple correlations referring to $(v_1, v_2)$, these would have had to be intersected, thus allowing us to extract from the given atoms the most precise correlation information regarding $(v_1, v_2)$. In the general case, the correlation information obtained after the intersection is the one that has to be compared to the correlation computed previously, i.e. the sufficient correlation for the query to be answered affirmatively. For our example, this amounts to comparing:

$(v_1, v_2) \mapsto \{(\varepsilon, \varepsilon) \mapsto \mathit{Equal}\} \ \sqsubseteq_K\ (v_1, v_2) \mapsto \{(\varepsilon, \varepsilon) \mapsto \mathit{Equal}\}$

Based on this, we can conclude that the given query Q will be answered affirmatively for the atoms given in our example.
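The steps walked through above (graph construction, path search, reorientation of pairs, Cartesian product, composition) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the prototype described in this chapter: the names `SUMMARIES`, `build_graph`, `find_path`, `oriented`, `compose` and `correlations_between` are hypothetical, only the `Equal` relation is modelled, composition is defined only when the two middle access paths are identical, a single path is explored, and the final intersection step is omitted.

```python
from itertools import product

EPS = ""  # stands for the empty access path (epsilon)

# Correlation summaries of the example atoms, keyed by (input, output).
SUMMARIES = {
    ("s", "v1"): {("f", EPS): "Equal"},                      # v1 = s.f
    ("s", "t"): {("f", "f"): "Equal", ("h", "h"): "Equal"},  # t = {s with g = w}
    ("w", "t"): {(EPS, "g"): "Equal"},
    ("t", "v2"): {("f", EPS): "Equal"},                      # v2 = t.f
}

def build_graph(summaries):
    """Undirected graph: one node per variable, one edge per (input, output) pair."""
    graph = {}
    for a, b in summaries:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def find_path(graph, start, goal, seen=frozenset()):
    """Depth-first search returning one simple path (list of nodes) or None."""
    if start == goal:
        return [start]
    seen = seen | {start}
    for nxt in sorted(graph.get(start, ())):
        if nxt not in seen:
            rest = find_path(graph, nxt, goal, seen)
            if rest is not None:
                return [start] + rest
    return None

def oriented(summaries, x, y):
    """Correlation map of the unordered pair {x, y}, expressed in the order (x, y)."""
    if (x, y) in summaries:
        return dict(summaries[(x, y)])
    # Swapping the members is sound here only because Equal is symmetric.
    return {(q, p): rel for (p, q), rel in summaries[(y, x)].items()}

def compose(c1, c2):
    """Compose (p, q) -> Equal with (q', r) -> Equal; assumed defined only if q == q'."""
    (p, q), rel1 = c1
    (q2, r), rel2 = c2
    if q == q2 and rel1 == rel2 == "Equal":
        return ((p, r), "Equal")
    return None

def correlations_between(summaries, v_from, v_to):
    """Correlations on (v_from, v_to) obtained by composing along one path."""
    path = find_path(build_graph(summaries), v_from, v_to)
    if path is None or len(path) < 2:
        return set()
    maps = [oriented(summaries, a, b) for a, b in zip(path, path[1:])]
    found = set()
    for combo in product(*(m.items() for m in maps)):
        acc = combo[0]
        for nxt in combo[1:]:
            acc = compose(acc, nxt) if acc is not None else None
        if acc is not None:
            found.add(acc)
    return found

# The query is answered affirmatively if the sufficient correlation is found.
sufficient = ((EPS, EPS), "Equal")
print(sufficient in correlations_between(SUMMARIES, "v1", "v2"))
```

On the example atoms, the member of the Cartesian product that goes through `(h, h)` yields nothing, while the one that goes through `(f, f)` composes to the pair of empty paths, mirroring the discussion above.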


8.4.2 Types of Targeted Queries

The types of queries that are targeted by our approach can be categorized as follows:

• equality of values;

• structure equality on the values of a subset of fields;

• implications of the form logical_property(a) ⇒ logical_property(b), where a and b are related by the facts inferred from the other atoms of the query;

• conjunctions of such queries.

In the general case, we need to reinterpret a dependency summary as a correlation summary. The query's goal is to deduce the equality between pairs of variables. When two such variables are of the same type, we can create a correlation map containing a single correlation. That correlation associates to the pair of paths $(\varepsilon, \varepsilon)$ a partial equivalence relation which mirrors the dependency. The partial equivalence relation is created as follows:

• When the dependency is Everything, the equivalence relation becomes Equal.

• When the dependency is Nothing, the equivalence relation becomes Any.

• Structure, variant and array dependencies are transformed pointwise to structure, variant and array partial relations.

• When the dependency is Impossible, the equivalence relation becomes Any, in the absence of the possible-constructors extension.
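The four rules above amount to a small recursive translation. The following Python sketch shows one possible reading of it; the encoding of dependencies as strings and tagged tuples, and the name `dependency_to_relation`, are assumptions made for this illustration and do not reflect the data types of the actual implementation.

```python
def dependency_to_relation(dep):
    """Translate a dependency into a partial equivalence relation.

    Assumed encoding (hypothetical):
      "Everything" | "Nothing" | "Impossible"
      ("struct", {field: dep}) | ("variant", {constructor: dep}) | ("array", dep)
    """
    if dep == "Everything":
        return "Equal"
    if dep in ("Nothing", "Impossible"):
        # Impossible maps to Any only without the possible-constructors extension.
        return "Any"
    kind, children = dep
    if kind == "array":
        return ("array", dependency_to_relation(children))
    # Structures and variants are translated pointwise.
    return (kind, {name: dependency_to_relation(d) for name, d in children.items()})

# A property reading all of irq_handlers and only the None cells of procs.
dep = ("struct", {
    "irq_handlers": "Everything",
    "procs": ("array", ("variant", {"None": "Everything", "Some": "Nothing"})),
})
print(dependency_to_relation(dep))
```

The resulting relation would then be attached to the pair of empty paths for the two variables whose equality the query asks about.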

We illustrate here some example queries revolving around our do_auth predicate, discussed in Section 8.3.3.

A naive equality query on the entire input and output of do_auth would not be satisfiable, as do_auth does modify the memory authorizations of one process. This is the first sort of supported query:

do_auth(now, i, arg3)[true: after | oob | false]
Q: after = now
⇒ no

The main argument of the do_auth predicate is the global state now, an instance of the global_state structure:⁵

type global_state = {
  procs : array<option<process>>;
  memory_regions : array<mem_region>;
  irq_handlers : array<irq_handler>;
  current_process : int;
}

⁵ Due to confidentiality reasons, the actual definition of the struct has been modified and edited for length.


Since the do_auth predicate only affects the mem_auth of one process in the procs array, we can successfully deduce, for the values of now and after, the equality on the fields memory_regions and current_process. This is the second sort of supported query:

do_auth(now, arg2, arg3)[true: after | oob | false]
Q: after =<memory_regions, current_process> now
⇒ yes

Finally, we can directly deduce that the all_ids_in_handlers_ok_global(state) property is not threatened by the execution of the do_auth predicate:

do_auth(now, arg2, arg3)[true: after | oob | false]
Q: congruent all_ids_in_handlers_ok_global(now)
             all_ids_in_handlers_ok_global(after)
⇒ yes

This property verifies that all the identifiers used by the registered interruption handlers stored in the field irq_handlers are valid. The property has the following dependency summary:

false → state → { irq_handlers → Everything }
true → state → { irq_handlers → Everything }

From the correlation of the do_auth predicate, we know that the irq_handlers field is preserved, and therefore it follows that the property, which only depends on that field, is preserved. Similar properties that do not depend on the procs array, but only on parts or on the entirety of one or more of the other 14 fields, will be preserved as well.

The preservation of properties that have to hold for every process in the array procs will be inferred as well, as long as they do not depend on the mem_auth field of the processes. For instance, the property procs_proc_map_ok_global verifies that each process of the array procs has valid code, data and stack segments. This property has the following dependency summary:

true → state →
  { procs → ⟨ [ None → Everything;
                Some → v → { proc_map → Everything } ] ⟩ }

false → state →
  { procs → ⟨ [ None → Everything;
                Some → v → { proc_map → Everything } ] ⟩ }

Since, for every active process of the array, the property depends only on the proc_map field, it is unaffected by the modification of the mem_auth field. Therefore, the property is preserved for the global state after, obtained after the execution of do_auth. Similar properties that do not depend on the mem_auth field, but only depend on other parts of the data structure, will be preserved as well.

An extension of the decision procedure sketched in Section 8.4.1 could take advantage of additional information regarding array indices. For example, the query could specify that two of the involved array indices are different:


do_auth(now, i, arg3)[true: after | oob | false]
Assert: i != j
Q: congruent mem_auth_ok_global(now, j)
             mem_auth_ok_global(after, j)
⇒ yes

The mem_auth_ok_global(state, j) property checks the well-formedness of the memory permission on the j-th process. The above query is satisfied if the property mem_auth_ok_global holds for all processes other than the i-th. The correlation summary for do_auth states that the elements of the procs array are unmodified by the operation, except for the i-th element. Combined with the dependency summary for mem_auth_ok_global given below, this allows the query to be satisfied.

true → state →
  { procs → ⟨ Nothing ▹ j : [ None → Everything;
                              Some → v → ProcDep1 ] ⟩ }

false → state →
  { procs → ⟨ Nothing ▹ j : [ None → Everything;
                              Some → v → ProcDep2 ] ⟩ }

where ProcDep1 is

  { mem_auth → [ READ → { base → Everything; len → Everything };
                 WRITE → { base → Everything; len → Everything };
                 NONE → Impossible ];
    stackframe → { ds → Everything } }

and ProcDep2 is

  { mem_auth → [ READ → { base → Everything; len → Everything };
                 WRITE → { base → Everything; len → Everything };
                 NONE → Nothing ];
    stackframe → { ds → Everything } }

8.5 Decision Procedure Experiments

We have applied a basic prototype of the decision procedure, using the dependency and correlation summaries computed for the RSMFSP layers of ProvenCore.

Our prototype considers pairs of one logical property and one predicate. The logical property and the predicate must both operate on values of the same type. More precisely, one of the predicate's inputs, as well as one of its outputs, and one of the logical property's inputs must all be of the same type. Our prototype attempts to detect whether the logical property is preserved after the execution of the predicate. If several inputs or outputs are of the same type, all combinations are considered. Most implicit types were not considered when searching for property/predicate pairs, as they are less likely to yield successful results. For example, arguments of a primitive type like int are unlikely to be unaffected by the execution of the predicate.

This prototype automatically inspected all such property/predicate pairs found in the RSMFSP layers. A property was considered to be preserved if its dependency summary for the argument involved, when translated to a set of equalities, formed a subset of the equalities implied by the predicate's correlation summary. Both the true and the false exit labels were considered independently, and the property is considered to be preserved (subject to some conditions) when it is preserved for either or both exit labels. More precisely, given a property $\pi(\bar\imath)[\mathit{true}\,|\,\mathit{false}]$ and a predicate $p(\bar\imath')[\ell' : \bar{o}']$, we report success when the prototype can satisfy the following:

\begin{align}
& \exists\, i \in \bar\imath,\ i' \in \bar\imath',\ o' \in \bar{o}' \ \text{such that}\ \Gamma(i) = \Gamma(i') = \Gamma(o') \tag{8.1}\\
\land\ & \exists\, \ell \in \{\mathit{true}, \mathit{false}\} \tag{8.2}\\
\land\ & E(j) \neq E(k) \,\land\, E'(j) \neq E'(k) \quad \forall j, k \in \bar\imath, \bar\imath', \bar{o}' \tag{8.3}\\
& \text{when } j \text{ and } k \text{ are used as array indices} \tag{8.4}\\
\land\ & \big\langle E\,\big[\,\mathit{Prop}(\bar\imath[i \to i'])[\mathit{true}\,|\,\mathit{false}]\,\big]\big\rangle \xrightarrow{\ \ell\ } E \tag{8.5}\\
\land\ & \big\langle E\,\big[\,\mathit{Pred}(\bar\imath')[\ell' : \bar{o}' \mid \ldots]\,\big]\big\rangle \xrightarrow{\ \ell'\ } E' \tag{8.6}\\
\land\ & \big\langle E'\,\big[\,\mathit{Prop}(\bar\imath[i \to o'])[\mathit{true}\,|\,\mathit{false}]\,\big]\big\rangle \xrightarrow{\ \ell\ } E' \tag{8.7}
\end{align}

where $\bar\imath[i \to i']$ and $\bar\imath[i \to o']$ denote the sequence of variables $\bar\imath$ in which the variable $i$ is replaced by the variable $i'$ (respectively $o'$).
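The core test applied to each pair, namely that the equalities required by the property's dependency summary form a subset of the equalities implied by the predicate's correlation summary, can be illustrated as follows. This is a hedged sketch with hypothetical names (`required_paths`, `is_preserved`, `preserving_labels`) and a deliberately reduced model: dependencies are nested dictionaries over field names, the predicate's correlation summary is flattened to the set of access paths it keeps `Equal`, and arrays, variants and index conditions such as (8.3) and (8.4) are not modelled.

```python
def required_paths(dep, prefix=()):
    """Access paths on which the property depends entirely (dependency Everything)."""
    if dep == "Everything":
        return {prefix}
    if dep in ("Nothing", "Impossible"):
        return set()
    paths = set()
    for field, sub in dep.items():
        paths |= required_paths(sub, prefix + (field,))
    return paths

def is_preserved(dep, preserved):
    """True when every required path lies under a path the predicate keeps Equal."""
    return all(
        any(path[:len(p)] == p for p in preserved)
        for path in required_paths(dep)
    )

def preserving_labels(summary, preserved):
    """Exit labels of the property for which preservation can be inferred."""
    return sorted(label for label, dep in summary.items() if is_preserved(dep, preserved))

# Fields that the example predicate do_auth leaves unchanged (Section 8.4.2);
# the procs field is deliberately absent.
DO_AUTH_EQUAL = {("memory_regions",), ("irq_handlers",), ("current_process",)}

# Dependency summaries on the state argument, one entry per exit label.
HANDLERS_OK = {"true": {"irq_handlers": "Everything"},
               "false": {"irq_handlers": "Everything"}}
WHOLE_STATE = {"true": "Everything", "false": "Everything"}

print(preserving_labels(HANDLERS_OK, DO_AUTH_EQUAL))
print(preserving_labels(WHOLE_STATE, DO_AUTH_EQUAL))
```

A property depending on the whole state is not inferred to be preserved in this model, because the empty path is not among the paths kept equal; this corresponds to the naive equality query of Section 8.4.2.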

This initial prototype was run on the 398 explicit predicates and 235 properties of the RSMFSP layer of ProvenCore. Out of these, we filtered predicate/property pairs for which the property has an input $i$ of the same type as one of the predicate's inputs $i'$ and one of its outputs $o'$. These pairs involve 161 distinct predicates and 165 distinct properties. In total, there were 8250 tuples $(i, i', o', \ell)$ which satisfied the conditions 8.1 and 8.2.

This experiment allowed us, as a first result, to automatically identify 102 predicates for which at least one property is preserved under the conditions 8.1–8.7 stated above. For many predicates, it was possible to show that, after the execution of said predicate, several properties are preserved (up to 33). Figure 8.2 shows an overview of how many properties were inferred to be preserved for each predicate. The blue region at the bottom indicates how many properties are inferred to be preserved for a given predicate, while the red region above shows how many properties were compatible with the predicate but were not inferred to be preserved.

Figure 8.3 shows an overview of how many predicates were inferred to be preserving each property. The blue region at the bottom indicates how many predicates are inferred to be preserving a given property, while the red region above shows how many predicates were compatible with the property but were not inferred to be preserving it.

Figure 8.2 – Distribution of the number of inferred preserved properties. Predicates are sorted along that criterion. [Plot not reproduced; x-axis: predicates, y-axis: number of preserved properties inferred.]

It is worth noting that, in both Figures 8.2 and 8.3, the red zone contains properties (respectively predicates) which could fall into these cases:

• The property is actually threatened by the predicate (respectively, the predicate threatens the property).

• The property is not threatened (respectively, the predicate is not threatening), but proving so requires more information than is obtained by our dependency and correlation analyses. For example, a more precise dependency or correlation analysis (e.g. tracking constructor evolution, as presented in Section 7.6) could be needed. A numerical or value analysis could also help determine that the parts of the input data structure which are modified by the predicate, and on which the logical property also depends, still satisfy the property after the execution of the predicate. Alternatively, the preservation of these properties can be demonstrated using an interactive prover.

• The property is not threatened (respectively, the predicate is not threatening), and the dependency and correlation summaries contain enough information to prove the non-interference of the predicate and property, but our decision procedure prototype failed to infer it. This can be due to a timeout (this initial prototype has not been optimized at all and can take a substantial time in some cases) or to precision losses in the decision procedure prototype itself.

Figure 8.3 – Distribution of the number of inferred predicates for which a property is preserved. Properties are sorted along that criterion. [Plot not reproduced; x-axis: properties, y-axis: number of predicates inferred to preserve the property.]


Chapter 9

Conclusion and Perspectives

There is no real ending. It's just the place where you stop the story.

Frank Herbert

Despite its intuitive simplicity, the frame problem has proved to be an enduring issue with notoriously tedious implications. Its different manifestations have been studied for several decades in various contexts, ranging from Artificial Intelligence, in the context of which it was originally identified, to the field of formal specification and verification. Recently, it has received extensive attention from the object-oriented verification community, where it has been identified as a subsisting problem (Leavens, Leino, and Müller 2007) and an ideal candidate for automation (Meyer 2015). Classical approaches to addressing the frame problem typically rely on separation logic (Reynolds 2005) or ownership types (Clarke, Potter, and Noble 1998). Though the merits of such approaches are indisputable, the manual specification effort that they require is non-negligible as well. Frame properties are an integral part of a complete specification and they are mandatory for proving correctness, but ideally they should impose little additional effort. Programmers should be able to focus on the truly interesting part, namely what code does, and rely on automatic tools for the repetitive and cumbersome task of specifying and verifying frame properties.

Interactive formal verification of complex transition systems is not exempt from the manifestations of the frame problem either. Considerable effort is spent on proving the preservation of the system's invariants, even though in practice the majority of operations have a localised effect on the system and impact only a limited number of invariants at the same time. Identifying those invariants that are unaffected by an operation and automatically proving their preservation can substantially ease the proof burden for the programmer. In this thesis, we have presented an approach towards automatically inferring the preservation of framing-related invariants. It is meant to be used in the context of an interactive theorem prover and employs two different static analyses, namely a dependency analysis and a correlation analysis, whose unified results are meant to establish the disjointness between the data dependencies of a logical property and the modifications performed by an operation. The decision procedure meant to combine the results of the two analyses is still in an incipient stage. However, our preliminary experiments related to automatically answering queries regarding the preservation of certain invariants for unmodified parts are encouraging. We believe that our envisioned approach can become applicable to complex transition systems on a routine basis. Reasoning about framing can come for free, without imposing the specification of additional clauses. We also believe that automatic reasoning about framing can be achieved through static analysis. Generally, static analysis has been considered prohibitive in terms of execution time. It has been predominantly used in an automatic context and avoided in interactive contexts, where queries have to be answered fast so as not to impede the natural flow of an interactive proof. Though our dedicated static analyses are currently applied only to medium-sized models, given their short execution times we believe that reasonable execution times for larger models can be expected as well. Therefore, we surmise that static analysis is applicable in an interactive verification context.

9.1 Contributions

The main contributions of this thesis are the designed and implemented dependency and correlation analyses, which are meant to be used in the context of an interactive theorem prover. Both analyses handle associative arrays and algebraic data types and compute fine-grained results mirroring the layered structures of such types. They target complex transition systems in general and operating systems in particular. These are characterized by states defined by complex compound data structures and by transitions, i.e. state changes that map an input state to an output state. Both of our static analyses are concerned with deep-state manipulations, i.e. accesses and modifications, respectively.

The dependency analysis, presented in Chapter 5, automatically detects the relevant input subset needed for producing certain outputs. It handles functions and their specifications in a unified manner and computes, for each possible execution scenario, a conservative approximation of the input (sub)elements on which their outcome depends. It is a flow-sensitive, path-sensitive, interprocedural data-flow analysis. Furthermore, for variants, an additional analysis is simultaneously conducted for computing the subset of possible constructors on a given execution scenario. Together with the dependency information per se, this additional information about constructors is meant to answer the same question, namely what fragments of the input influence the output, from a different, albeit related, point of view. The first version of the dependency analysis was fully context-insensitive. In order to introduce a relaxed form of context-sensitivity, we have devised an extension based on symbolic paths. This was presented in Chapter 6.

The extension for the dependency analysis is based on computing deferred dependencies, consisting of symbolic access maps in which callers can subsequently inject their specific context information on an as-needed basis. The dependency summaries for each predicate are still computed only once. However, by including nested context-sensitive components at the summaries' leaves, we reduce the precision penalty exerted by the fully context-insensitive approach without sacrificing performance. As discussed in Chapter 8, the deferred dependencies extension led to an increase of 10–20% in execution time on the used benchmarks. In terms of precision, it led to more precise dependency summaries for 50% of the predicates of the same benchmarks.

We surmise that, besides its intended target, other programming activities can rely on our dependency analysis as well. For instance, the analysis can have applications in the testing realm, for designing and generating test suites that avoid redundant testing of the same execution scenario. Classes of inputs that will test the same execution scenario can be automatically determined. The input subelements on which the outputs of a predicate do not depend can be consistently supplied with the same testing value, as they are completely irrelevant for the outcome. On the contrary, the input subelements on which the outputs depend should be targeted and their values should be varied for more comprehensive testing. Furthermore, our dependency analysis could also facilitate unit testing for exceptions, as it computes specific results for every execution scenario of a predicate. Indeed, it is useful to have dedicated test cases which trigger each exception that can be thrown by a function. The set of relevant parts of the input differs for each possible exception and for the regular execution behaviour.

Our second contribution is the correlation analysis, presented in Chapter 7, which detects the flow of input values into output values. It computes a conservative approximation of fine-grained equivalences between the input and the output subelements of a function. The correlation analysis is an interprocedural data-flow analysis that tracks the origin of subparts of the output and relates it to subparts of the inputs, thus summarising the behaviour of functions and detecting not only what is modified, but also how and to what extent. We have defined a partial equivalence type mirroring the layered structure of algebraic data types and associative arrays, and we introduced an intermediate level consisting of access paths and correlations. These allow computing expressive information regarding equivalences between subparts of the inputs and subparts of the outputs in a flexible manner.

Prototypes for both of our analyses have been implemented in OCaml. These were discussed in Chapter 8. We have applied them to a functional specification of ProvenCore (Lescuyer 2015), a general-purpose microkernel that ensures isolation. Results for medium-sized models have been obtained on average in less than 1 second with the dependency analysis and in less than 0.5 seconds with the correlation analysis. Static approaches have long been considered as being confined to small programs. We believe that our preliminary results indicate that it is possible to report conservative, precise information without sacrificing scalability.

We remark that our experience with the design and implementation of the two analyses has been rather different. The dependency analysis is much more complex semantically. This is partly a consequence of the simultaneous possible-constructors analysis, which has an impact on the abstract dependency domain. Deferred dependencies add yet another layer of complexity. However, the implementation proved to be much simpler than the implementation of the correlation analysis. The latter posed challenges due to the intermediate layer of access paths and correlations that we had to add for obtaining expressive, fine-grained information. However, the correlation analysis is simpler from a semantics point of view. It is also worth noting that, for both analyses, an intermediate level below variables needed to be introduced as soon as fine-grained relations between pairs of variables were considered, directly or indirectly. In the case of deferred dependencies, this was not the main goal, but rather a mechanism for obtaining increased precision in specific cases for already pertinent dependency information. In contrast, for the correlation analysis, the inclusion of an intermediate level was imperative for obtaining useful, expressive information in non-trivial cases.

As a first step towards a solution for automatically inferring the preservation of framing-related invariants, we have sketched a decision procedure meant to employ our two static analyses. By uncovering equivalences between inputs and outputs, after having detected that a property only depends on unmodified parts, and by unifying the results, the preservation of invariants for the unmodified parts can be inferred.

9.2 Future Work

We conclude this thesis with some perspectives for practical future work, as well as some theoretical open issues that we wish to address in the future.

Practical Future Work. From a practical point of view, our future work goals revolve around the full implementation of the decision procedure, its integration in the interactive theorem prover developed at Prove & Run, as well as its comprehensive assessment in a real-world context.

Decision Procedure Implementation. Our first and main goal for the near future focuses on the full implementation of the decision procedure, combining our dependency and correlation summaries and answering queries related to the preservation of logical properties. The performance of the algorithm sketched in Section 8.4 should be assessed on real-world examples. The complexity of this algorithm depends on the number of paths relating two endpoints in the graph of the query atoms' variables. It also depends on the number of correlations relating pairs of variables along the chains connecting endpoints. This could lead to a combinatorial explosion of the number of compose operations for large query graphs. Further optimizations should be investigated and applied in the algorithm implementing the decision procedure.

Validation. After having implemented the decision procedure, the precision of our two static analyses employed by it should be comprehensively assessed on various benchmarks.

Some of the theoretical aspects related to our static analyses have been formalized in Coq by Stéphane Lescuyer. However, the actual implementation of the algorithms is not formally connected to the mechanized proofs. Therefore, it would be desirable to extensively test the implementation of the analysis algorithms. This could be done by translating the dependencies and correlations to types in a sufficiently expressive type system, or by inserting runtime guards. These guards would check equalities for correlations and would taint supposedly irrelevant values identified by the dependency analysis, verifying that the output is not tainted. For the correlation analysis, inputs which are correlated to some output values could be given a universally quantified type, the same type appearing in the parts of the output which are supposed to be equal. This is commonly used as a design pattern in functional programming languages to express data-flow constraints via the type system. For the dependency analysis, each part of the input which is supposed to be irrelevant for a predicate's output could be assigned a distinct polymorphic type variable which does not appear in the output. This allows the body of the predicate to take notice of a value's presence without being able to manipulate its contents.
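The runtime-guard variant of this validation idea could look roughly like the following Python sketch. Everything in it is hypothetical and only meant to illustrate the principle: states are modelled as nested dictionaries and lists, `respects_dependencies` taints the inputs that a dependency summary claims to be irrelevant and checks that the result of a property is unchanged, and `respects_correlations` checks the equalities that a correlation summary claims between input and output paths. The example property and predicate are stand-ins, not the ones from ProvenCore.

```python
import copy

TAINT = object()  # sentinel value that no genuine computation should rely on

def get_path(value, path):
    """Follow a sequence of keys or indices into a nested value."""
    for key in path:
        value = value[key]
    return value

def set_path(value, path, new):
    """Overwrite the sub-value found at a non-empty path."""
    for key in path[:-1]:
        value = value[key]
    value[path[-1]] = new

def respects_dependencies(prop, state, irrelevant_paths):
    """Tainting supposedly irrelevant inputs must not change the property's result."""
    tainted = copy.deepcopy(state)
    for path in irrelevant_paths:
        set_path(tainted, path, TAINT)
    try:
        return prop(tainted) == prop(state)
    except Exception:
        # The property inspected a tainted value, so the input was relevant.
        return False

def respects_correlations(pred, state, equal_paths):
    """Every (input_path, output_path) pair claimed Equal must hold after the call."""
    out = pred(copy.deepcopy(state))
    return all(get_path(state, p) == get_path(out, q) for p, q in equal_paths)

# Stand-in state, property and predicate (assumptions for the example only).
state = {"procs": [{"mem_auth": "READ"}], "irq_handlers": [1, 2], "current_process": 0}

def handlers_ok(s):
    return all(h > 0 for h in s["irq_handlers"])

def do_auth_like(s):
    s["procs"][0]["mem_auth"] = "WRITE"
    return s

print(respects_dependencies(handlers_ok, state, [("procs",), ("current_process",)]))
print(respects_correlations(do_auth_like, state,
                            [(("irq_handlers",), ("irq_handlers",))]))
```

Such checks only exercise the summaries on the inputs that are actually run, so they complement rather than replace the mechanized proofs.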

Tool Integration and Support. Another important goal for the near future is the integration of our decision procedure in the ProvenTools interactive prover. A tactic allowing to automate the inference of framing-related invariant preservation should be supported. This goal entails a sequence of other considerations that have to be addressed. Currently, the dependency and correlation analyses handle whole programs and compute summaries for every predicate of the analysed program. Though the execution times of our analyses are low, even these can prove to be cumbersome in a real-world context. Therefore, the two analyses should be adapted so as to allow incrementally analysing only parts of a program. Caching the results of the analyses across invocations of the decision procedure could prove to be efficient as well. Additionally, the mechanism of answering queries regarding invariant preservation should be transparent, allowing users to see the reasoning steps behind the decision procedure. Transparency is necessary for the ProvenTools prover, which targets products that have to be certified. This possibly also requires a more concise output notation for the dependency and correlation summaries, in order to ease the interpretation of results. Currently, they tend to be rather verbose for predicates handling composite values with a large number of subelements.

For the dependency summaries, a parser was implemented, allowing users to annotate predicates with expected dependency information. A similar parser could be written for the correlation summaries. These annotations are a useful tool for testing the analyses on benchmarks for which the correlations and dependencies are known. In addition, they would allow users to annotate programs with constraints on the expected dependencies and correlations, similarly to type annotations in the presence of type inference, and check that these expectations hold.

Finally, the decision procedure and our dependency and correlation analyses could be offered as a software library. A public API should describe and prescribe the expected behavior of our two static analyses and the decision procedure relying on them.

Theoretical Perspective. From a theoretical perspective, several interesting aspects remain open. In a nutshell, these consist in developing support for more sophisticated queries that could be answered by our decision procedure. The precision of our dependency and correlation analyses can be further increased as well.


Decision Procedure. A first interesting theoretical effort revolves around the formalization of our envisioned decision procedure used for inferring framing-related invariants. The types of queries it can answer should be further investigated and extended. For instance, it would be desirable to assert as a hypothesis that certain predicates are known to be valid on some nodes of the graph. We further identified two extensions for our correlation analysis that could increase the number of answered queries.

Constructor Evolution. To increase the number of queries that our decision procedure can answer, one direction to investigate is the extension of our correlation analysis in order to track and compute information regarding the evolution of variant constructors. This additional information should be leveraged in the context of our decision procedure. The formalization and implementation of this extension constitute an interesting effort. Furthermore, other types of relations between variables could be considered as well.

Correlations between Inputs. Another extension of our correlation analysis that would enrich the types of queries that can be answered by our decision procedure consists in tracking correlations between pairs of inputs, in addition to the ones computed between pairs of inputs and outputs. Besides the unified treatment of both actual code and logical properties on the correlation analysis side, this would allow answering queries that consist in a single logical property on multiple input values that are additionally related by other facts. It would also allow detecting aliasing between variables used as array indices.

Numerical Analysis for Arrays. Arrays are a source of precision loss in both of our static analyses. Hence, it would be interesting to investigate the impact of using simple numerical abstractions (congruence modulo and linear abstract domains). The numerical analysis could otherwise be offloaded to an external SMT solver, such as Z3 or Alt-Ergo for instance. Symbolic evaluation of the arithmetic computations should also be possible. This would avoid precision losses when joining two dependencies or correlations with exceptional information on distinct index variables which prove to have the same integer value in practice. Eliminating this source of imprecision would likely benefit the analysis of loops over arrays.

In conclusion, we have devised and implemented two static analyses detecting the data dependencies of a logical property, as well as correlations between the inputs and the outputs of operations. Our first results on a functional model of a microkernel are encouraging, both in terms of precision and speed, making these analyses suitable to use in the context of interactive provers. Aside from incremental improvements on the precision of our analyses, the next steps are to combine them in order to detect invariants which are not affected by the execution of a predicate and to integrate this as a tactic in the ProvenTools theorem prover. We believe that reasoning about framing can come for free, without imposing additional annotations. Inferring the preservation of framing-related invariants through static analysis can become applicable on a routine basis for complex transition systems.

211

Bibliography

Abrial Jean-Raymond Stephen A Schuman and Bertrand Meyer (1980) ldquoSpecifica-tion Languagerdquo In On the Construction of Programs pp 343ndash410

Alpuente Mariacutea Santiago Escobar and Salvador Lucas (2007) ldquoRemoving RedundantArguments Automaticallyrdquo In TPLP 71-2 pp 3ndash35 url httpdxdoiorg101017S1471068406002869

Andreescu Oana F Thomas Jensen and Steacutephane Lescuyer (2015) ldquoDependencyAnalysis of Functional Specifications with Algebraic Data Structuresrdquo In FormalMethods and Software Engineering - 17th International Conference on Formal En-gineering Methods ICFEM 2015 Proceedings pp 116ndash133 doi 101007978-3-319-25423-4_8 url httpdxdoiorg101007978-3-319-25423-4_8

Andreescu Oana Fabiana Thomas Jensen and Steacutephane Lescuyer (2016) ldquoCorrelat-ing Structured Inputs and Outputs in Functional Specificationsrdquo In Software En-gineering and Formal Methods - 14th International Conference SEFM 2016 Heldas Part of STAF 2016 Vienna Austria July 4-8 2016 Proceedings pp 85ndash103doi 101007978-3-319-41591-8_7 url httpdxdoiorg101007978-3-319-41591-8_7

Asati Rahul Amitabha Sanyal Amey Karkare and Alan Mycroft (2014) ldquoLiveness-Based Garbage Collectionrdquo In Compiler Construction - 23rd International Con-ference CC 2014 Held as Part of the European Joint Conferences on Theory andPractice of Software ETAPS 2014 Grenoble France April 5-13 2014 Proceed-ings pp 85ndash106 doi 101007978-3-642-54807-9_5 url httpdxdoiorg101007978-3-642-54807-9_5

Baier Christel and Joost-Pieter Katoen (2008) Principles of Model Checking MITPress isbn 978-0-262-02649-9

Banerjee Anindya Mike Barnett and David A Naumann (2008) ldquoBoogie Meets Re-gions A Verification Experience Reportrdquo In Verified Software Theories Tools Ex-periments Second International Conference VSTTE 2008 Toronto Canada Oc-tober 6-9 2008 Proceedings Ed by Natarajan Shankar and Jim Woodcock BerlinHeidelberg Springer Berlin Heidelberg pp 177ndash191 isbn 978-3-540-87873-5 doi101007978-3-540-87873-5_16 url httpdxdoiorg101007978-3-540-87873-5_16

Banerjee Anindya and David A Naumann (2014) ldquoA Logical Analysis of Framing forSpecifications with Pure Method Callsrdquo In Verified Software Theories Tools andExperiments - 6th International Conference VSTTE 2014 Vienna Austria July17-18 2014 Revised Selected Papers pp 3ndash20 doi 101007978-3-319-12154-3_1

212 BIBLIOGRAPHY

Banerjee Anindya David A Naumann and Stan Rosenberg (2008) ldquoRegional Logicfor Local Reasoning about Global Invariantsrdquo In ECOOP 2008 - Object-OrientedProgramming 22nd European Conference Paphos Cyprus July 7-11 2008 Pro-ceedings pp 387ndash411 doi 101007978-3-540-70592-5_17 url httpdxdoiorg101007978-3-540-70592-5_17

mdash (2013) ldquoLocal Reasoning for Global Invariants Part I Region Logicrdquo In J ACM603 181ndash1856 doi 1011452485982 url httpdoiacmorg1011452485982

Barnes J and Praxis Critical Systems Limited (1997) High Integrity Ada The SPARKApproach Programming Languages Addison-Wesley isbn 9780201175172 urlhttpsbooksgooglefrbooksid=YoBGAAAAYAAJ

Barnett Michael and David A Naumann (2004) ldquoFriends Need a Bit More Maintain-ing Invariants Over Shared Staterdquo In Mathematics of Program Construction 7thInternational Conference MPC 2004 Stirling Scotland UK July 12-14 2004Proceedings pp 54ndash84 doi 10 1007 978 - 3 - 540 - 27764 - 4 _ 5 url http dxdoiorg101007978-3-540-27764-4_5

Barnett Michael Robert DeLine Manuel Faumlhndrich K Rustan M Leino and Wol-fram Schulte (2004) ldquoVerification of Object-Oriented Programs with InvariantsrdquoIn Journal of Object Technology 36 pp 27ndash56 doi 105381jot200436a2url httpdxdoiorg105381jot200436a2

Barnett Michael Bor-Yuh Evan Chang Robert DeLine Bart Jacobs and K RustanM Leino (2005a) ldquoBoogie A Modular Reusable Verifier for Object-Oriented Pro-gramsrdquo In Formal Methods for Components and Objects 4th International Sym-posium FMCO 2005 Amsterdam The Netherlands November 1-4 2005 RevisedLectures pp 364ndash387 doi 10100711804192_17 url httpdxdoiorg10100711804192_17

Barnett Michael Robert DeLine Manuel Faumlhndrich Bart Jacobs K Rustan M LeinoWolfram Schulte and Herman Venter (2005b) ldquoThe Spec Programming SystemChallenges and Directionsrdquo In Verified Software Theories Tools ExperimentsFirst IFIP TC 2WG 23 Conference VSTTE 2005 Zurich Switzerland October10-13 2005 Revised Selected Papers and Discussions pp 144ndash152 doi 101007978-3-540-69149-5_16 url httpdxdoiorg101007978-3-540-69149-5_16

Barnett Mike Manuel Faumlhndrich K Rustan M Leino Peter Muumlller Wolfram Schulteand Herman Venter (2011) ldquoSpecification and Verification The Spec ExperiencerdquoIn Commun ACM 546 pp 81ndash91 doi 10114519531221953145 url httpdoiacmorg10114519531221953145

Berdine Josh Cristiano Calcagno and Peter W OrsquoHearn (2005) ldquoSmallfoot Mod-ular Automatic Assertion Checking with Separation Logicrdquo In Formal Methodsfor Components and Objects 4th International Symposium FMCO 2005 Amster-dam The Netherlands November 1-4 2005 Revised Lectures pp 115ndash137 doi10100711804192_6 url httpdxdoiorg10100711804192_6

mdash (2012) ldquoVerification Condition Generation and Variable Conditions in SmallfootrdquoIn CoRR abs12044804 url httparxivorgabs12044804

BIBLIOGRAPHY 213

Berdine Josh Byron Cook and Samin Ishtiaq (2011) ldquoSLAyer Memory Safety forSystems-Level Coderdquo In Computer Aided Verification - 23rd International Confer-ence CAV 2011 Snowbird UT USA July 14-20 2011 Proceedings pp 178ndash183doi 101007978-3-642-22110-1_15 url httpdxdoiorg101007978-3-642-22110-1_15

Berg Joachim van den and Bart Jacobs (2001) ldquoThe LOOP Compiler for Java andJMLrdquo In Tools and Algorithms for the Construction and Analysis of Systems7th International Conference TACAS 2001 Held as Part of the Joint EuropeanConferences on Theory and Practice of Software ETAPS 2001 Genova Italy April2-6 2001 Proceedings pp 299ndash312 doi 1010073- 540- 45319- 9_21 urlhttpdxdoiorg1010073-540-45319-9_21

Bertot Yves and Pierre Casteacuteran (2004) Interactive Theorem Proving and ProgramDevelopment - CoqrsquoArt The Calculus of Inductive Constructions Texts in The-oretical Computer Science An EATCS Series Springer isbn 978-3-642-05880-6doi 101007978-3-662-07964-5 url httpdxdoiorg101007978-3-662-07964-5

Bertrane Julien Patrick Cousot Radhia Cousot Jeacuterocircme Feret Laurent MauborgneAntoine Mineacute and Xavier Rival (2015) ldquoStatic Analysis and Verification of AerospaceSoftware by Abstract Interpretationrdquo In Foundations and Trends in ProgrammingLanguages 22-3 pp 71ndash190 doi 1015612500000002 url httpdxdoiorg1015612500000002

Blanchet Bruno Patrick Cousot Radhia Cousot Jeacuterocircme Feret Laurent MauborgneAntoine Mineacute David Monniaux and Xavier Rival (2003) ldquoA Static Analyzer forLarge Safety-Critical Softwarerdquo In Proceedings of the ACM SIGPLAN 2003 Con-ference on Programming Language Design and Implementation 2003 San DiegoCalifornia USA June 9-11 2003 pp 196ndash207 doi 101145781131781153url httpdoiacmorg101145781131781153

Bobot Franccedilois and Jean-Christophe Filliacirctre (2012) ldquoSeparation Predicates A Tasteof Separation Logic in First-Order Logicrdquo In Formal Methods and Software Engi-neering - 14th International Conference on Formal Engineering Methods ICFEM2012 Kyoto Japan November 12-16 2012 Proceedings pp 167ndash181 doi 101007978-3-642-34281-3_14 url httpdxdoiorg101007978-3-642-34281-3_14

Borgida Alexander John Mylopoulos and Raymond Reiter (1993) ldquo And NothingElse Changes The Frame Problem in Procedure Specificationsrdquo In Proceedings ofthe 15th International Conference on Software Engineering Baltimore MarylandUSA May 17-21 1993 Pp 303ndash314 url httpportalacmorgcitationcfmid=257572257636

mdash (1995) ldquoOn the Frame Problem in Procedure Specificationsrdquo In IEEE Trans Soft-ware Eng 2110 pp 785ndash798 doi 10110932469460 url httpdxdoiorg10110932469460

Bouissou O Eacute Conquet P Cousot R Cousot J Feret K Ghorbal Eacute GoubaultD Lesens L Mauborgne A Mineacute S Putot X Rival and M Turin (2009)

214 BIBLIOGRAPHY

ldquoSpace Software Validation using Abstract Interpretationrdquo In Proc of the In-ternational Space System Engineering Conference on Data Systems in Aerospace(DASIA 2009) Vol SP-669 httpwww-aprlip6fr~minepubliarticle-bouissou-al-dasia09pdf Istambul Turkey ESA p 7 doi 19215321921553

Burdy Lilian Yoonsik Cheon David R Cok Michael D Ernst Joseph R Kiniry GaryT Leavens K Rustan M Leino and Erik Poll (2005) ldquoAn Overview of JML Toolsand Applicationsrdquo In STTT 73 pp 212ndash232 doi 101007s10009-004-0167-4url httpdxdoiorg101007s10009-004-0167-4

Calcagno Cristiano and Dino Distefano (2011) ldquoInfer An Automatic Program Verifierfor Memory Safety of C Programsrdquo In NASA Formal Methods - Third Interna-tional Symposium NFM 2011 Pasadena CA USA April 18-20 2011 Proceed-ings pp 459ndash465 doi 101007978-3-642-20398-5_33 url httpdxdoiorg101007978-3-642-20398-5_33

Calcagno Cristiano Dino Distefano Peter W OrsquoHearn and Hongseok Yang (2008)ldquoSpace Invading Systems Coderdquo In Logic-Based Program Synthesis and Transfor-mation 18th International Symposium LOPSTR 2008 Valencia Spain July 17-18 2008 Revised Selected Papers pp 1ndash3 doi 101007978-3-642-00515-2_1url httpdxdoiorg101007978-3-642-00515-2_1

mdash (2009) ldquoCompositional Shape Analysis by Means of Bi-Abductionrdquo In Proceedingsof the 36th ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages POPL 2009 pp 289ndash300 doi 10114514808811480917 url httpdoiacmorg10114514808811480917

mdash (2011) ldquoCompositional Shape Analysis by Means of Bi-Abductionrdquo In J ACM586 p 26 doi 10114520496972049700

Cardelli Luca and Peter Wegner (1985) ldquoOn Understanding Types Data Abstractionand Polymorphismrdquo In ACM Comput Surv 174 pp 471ndash522 doi 10114560416042 url httpdoiacmorg10114560416042

Castillo Rosa Francisco Corbera Angeles G Navarro Rafael Asenjo and Emilio LZapata (2008) ldquoComplete Def-Use Analysis in Recursive Programs with DynamicData Structuresrdquo In Euro-Par 2008 Workshops - Parallel Processing VHPC 2008UNICORE 2008 HPPC 2008 SGS 2008 PROPER 2008 ROIA 2008 and DPA2008 Las Palmas de Gran Canaria Spain August 25-26 2008 Revised SelectedPapers pp 273ndash282 doi 101007978-3-642-00955-6_32 url httpdxdoiorg101007978-3-642-00955-6_32

Catantildeo Neacutestor and Marieke Huisman (2003) ldquoCHASE A Static Checker for JMLrsquosAssignable Clauserdquo In Verification Model Checking and Abstract Interpretation4th International Conference VMCAI 2003 New York NY USA January 9-112002 Proceedings pp 26ndash40 doi 10 1007 3 - 540 - 36384 - X _ 6 url http dxdoiorg1010073-540-36384-X_6

Chalin Patrice Joseph R Kiniry Gary T Leavens and Erik Poll (2005) ldquoBeyondAssertions Advanced Specification and Verification with JML and ESCJava2rdquoIn Formal Methods for Components and Objects 4th International SymposiumFMCO 2005 Amsterdam The Netherlands November 1-4 2005 Revised Lectures

BIBLIOGRAPHY 215

pp 342ndash363 doi 10100711804192_16 url httpdxdoiorg10100711804192_16

Chang Bor-Yuh Evan and K Rustan M Leino (2005) ldquoAbstract Interpretation withAlien Expressions and Heap Structuresrdquo In Verification Model Checking andAbstract Interpretation 6th International Conference VMCAI 2005 Proceedingspp 147ndash163 doi 101007978-3-540-30579-8_11 url httpdxdoiorg101007978-3-540-30579-8_11

Clarke David G and Sophia Drossopoulou (2002) ldquoOwnership Encapsulation andthe Disjointness of Type and Effectrdquo In Proceedings of the 2002 ACM SIGPLANConference on Object-Oriented Programming Systems Languages and ApplicationsOOPSLA 2002 Seattle Washington USA November 4-8 2002 Pp 292ndash310 doi101145582419582447 url httpdoiacmorg101145582419582447

Clarke David G John Potter and James Noble (1998) ldquoOwnership Types for Flex-ible Alias Protectionrdquo In Proceedings of the 1998 ACM SIGPLAN Conferenceon Object-Oriented Programming Systems Languages amp Applications (OOPSLArsquo98) Vancouver British Columbia Canada October 18-22 1998 Pp 48ndash64 doi101145286936286947 url httpdoiacmorg101145286936286947

Clarke Edmund M and E Allen Emerson (1981) ldquoDesign and Synthesis of Synchro-nization Skeletons Using Branching-Time Temporal Logicrdquo In Logics of ProgramsWorkshop Yorktown Heights New York May 1981 pp 52ndash71 doi 10 1007BFb0025774 url httpdxdoiorg101007BFb0025774

Cok David R (2005) ldquoReasoning with Specifications Containing Method Calls andModel Fieldsrdquo In Journal of Object Technology 48 pp 77ndash103 doi 105381jot200548a4 url httpdxdoiorg105381jot200548a4

Cousot P and R Cousot (1994) ldquoHigher-Order Abstract Interpretation (and Appli-cation to Comportment Analysis Generalizing Strictness Termination Projectionand PER Analysis of Functional Languages) invited paperrdquo In Proceedings of the1994 International Conference on Computer Languages Toulouse France IEEEComputer Society Press Los Alamitos California pp 95ndash112

Cousot Patrick (2001) ldquoAbstract Interpretation Based Formal Methods and FutureChallengesrdquo In Informatics - 10 Years Back 10 Years Ahead Pp 138ndash156 doi1010073-540-44577-3_10 url httpdxdoiorg1010073-540-44577-3_10

Cousot Patrick and Radhia Cousot (1977) ldquoAbstract Interpretation A Unified Lat-tice Model for Static Analysis of Programs by Construction or Approximation ofFixpointsrdquo In Conference Record of the Fourth ACM Symposium on Principles ofProgramming Languages Los Angeles California USA January 1977 pp 238ndash252 doi 101145512950512973 url httpdoiacmorg101145512950512973

mdash (2010) ldquoA Gentle Introduction to Formal Verification of Computer Systems byAbstract Interpretationrdquo In Logics and Languages for Reliability and Securitypp 1ndash29 doi 103233978-1-60750-100-8-1 url httpdxdoiorg103233978-1-60750-100-8-1

216 BIBLIOGRAPHY

Cousot Patrick Radhia Cousot Jeacuterocircme Feret Laurent Mauborgne Antoine MineacuteDavid Monniaux and Xavier Rival (2005) ldquoThe ASTREEacute Analyzerrdquo In Program-ming Languages and Systems 14th European Symposium on ProgrammingESOP2005 Held as Part of the Joint European Conferences on Theory and Practice ofSoftware ETAPS 2005 Edinburgh UK April 4-8 2005 Proceedings pp 21ndash30doi 101007978-3-540-31987-0_3 url httpdxdoiorg101007978-3-540-31987-0_3

Cousot Patrick Radhia Cousot Jeacuterocircme Feret Antoine Mineacute Laurent MauborgneDavid Monniaux and Xavier Rival (2007) ldquoVarieties of Static Analyzers A Com-parison with ASTREErdquo In First Joint IEEEIFIP Symposium on Theoretical As-pects of Software Engineering TASE 2007 June 5-8 2007 Shanghai China pp 3ndash20 doi 101109TASE200755 url httpdxdoiorg101109TASE200755

Cuoq Pascal Virgile Prevosto and Boris Yakobowski Frama-C Value Analysis UserManual url httpframa-ccomdownloadframa-c-value-analysispdf

Cuoq Pascal Florent Kirchner Nikolai Kosmatov Virgile Prevosto Julien Signolesand Boris Yakobowski (2012) ldquoFrama-C - A Software Analysis Perspectiverdquo InSoftware Engineering and Formal Methods - 10th International Conference SEFM2012 Thessaloniki Greece October 1-5 2012 Proceedings pp 233ndash247 doi 101007978-3-642-33826-7_16 url httpdxdoiorg101007978-3-642-33826-7_16

Cytron Ron Jeanne Ferrante Barry K Rosen Mark N Wegman and F KennethZadeck (1989) ldquoAn Efficient Method of Computing Static Single Assignment FormrdquoIn Conference Record of the Sixteenth Annual ACM Symposium on Principles ofProgramming Languages Austin Texas USA January 11-13 1989 pp 25ndash35 doi1011457527775280 url httpdoiacmorg1011457527775280

Darvas Aacutedaacutem and Peter Muumlller (2006) ldquoReasoning About Method Calls in InterfaceSpecificationsrdquo In Journal of Object Technology 55 pp 59ndash85 doi 105381jot200655a3 url httpdxdoiorg105381jot200655a3

Delmas David and Jean Souyris (2007) ldquoAstreacutee From Research to Industryrdquo In StaticAnalysis 14th International Symposium SAS 2007 Kongens Lyngby DenmarkAugust 22-24 2007 Proceedings pp 437ndash451 doi 101007978-3-540-74061-2_27 url httpdxdoiorg101007978-3-540-74061-2_27

Dietl Werner and Peter Muumlller (2005) ldquoUniverses Lightweight Ownership for JMLrdquoIn Journal of Object Technology 48 pp 5ndash32 doi 105381jot200548a1url httpdxdoiorg105381jot200548a1

Dijkstra Edsger W (1976) A Discipline of Programming Prentice-HallDistefano Dino Peter W OrsquoHearn and Hongseok Yang (2006) ldquoA Local Shape Anal-

ysis Based on Separation Logicrdquo In Proceedings of the 12th International Con-ference on Tools and Algorithms for the Construction and Analysis of SystemsTACASrsquo06 Vienna Austria Springer-Verlag pp 287ndash302 isbn 3-540-33056-9978-3-540-33056-1

Distefano Dino and Matthew J Parkinson (2008) ldquojStar Towards Practical Verifi-cation for Javardquo In Proceedings of the 23rd Annual ACM SIGPLAN Conference

BIBLIOGRAPHY 217

on Object-Oriented Programming Systems Languages and Applications OOPSLA2008 October 19-23 2008 Nashville TN USA pp 213ndash226 doi 10 1145 14497641449782 url httpdoiacmorg10114514497641449782

Drossopoulou Sophia Adrian Francalanza Peter Muumlller and Alexander J Summers(2008) ldquoA Unified Framework for Verification Techniques for Object Invariantsrdquo InECOOP 2008 - Object-Oriented Programming 22nd European Conference PaphosCyprus July 7-11 2008 Proceedings pp 412ndash437 doi 101007978- 3- 540-70592-5_18 url httpdxdoiorg101007978-3-540-70592-5_18

Eclipse Java Development Tools (JDT) httpwwweclipseorgjdt Accessed2016-09-11

Feijs L M G Loe M G and H B M Jonkers (1992) Formal Specification andDesign Cambridge tracts in theoretical computer science Cambridge New YorkCambridge University Press isbn 0-521-43457-2 url httpopacinriafrrecord=b1083844

Flanagan Cormac K Rustan M Leino Mark Lillibridge Greg Nelson James B Saxeand Raymie Stata (2002) ldquoExtended Static Checking for Javardquo In Proceedingsof the 2002 ACM SIGPLAN Conference on Programming Language Design andImplementation (PLDI) Berlin Germany June 17-19 2002 pp 234ndash245 doi101145512529512558 url httpdoiacmorg101145512529512558

Floyd Robert W (1967) ldquoAssigning Meanings to Programsrdquo In Mathematical Aspectsof Computer Science Ed by J T Schwartz Vol 19 Proceedings of Symposia inApplied Mathematics Providence Rhode Island American Mathematical Societypp 19ndash32

Gallier Jean H (1987) Logic for Computer Science Foundations of Automatic Theo-rem Proving Wiley isbn 978-0-471-61546-0

Gharat Pritam M Uday P Khedker and Alan Mycroft (2016) ldquoFlow- and Context-Sensitive Points-To Analysis Using Generalized Points-To Graphsrdquo In Static Anal-ysis - 23rd International Symposium SAS 2016 Edinburgh UK September 8-102016 Proceedings pp 212ndash236 doi 101007978- 3- 662- 53413- 7_11 urlhttpdxdoiorg101007978-3-662-53413-7_11

Greenhouse Aaron and John Boyland (1999) ldquoAn Object-Oriented Effects SystemrdquoIn ECOOPrsquo99 - Object-Oriented Programming 13th European Conference LisbonPortugal June 14-18 1999 Proceedings pp 205ndash229 doi 1010073-540-48743-3_10 url httpdxdoiorg1010073-540-48743-3_10

Gross Thomas R and Peter Steenkiste (1990) ldquoStructured Dataflow Analysis for Ar-rays and its Use in an Optimizing Compilerrdquo In Softw Pract Exper 202 pp 133ndash155 doi 101002spe4380200203 url httpdxdoiorg101002spe4380200203

Guttag John V James J Horning and Jeannette M Wing (1985) ldquoThe Larch Familyof Specification Languagesrdquo In IEEE Software 25 pp 24ndash36 doi 101109MS1985231756 url httpdxdoiorg101109MS1985231756

Guttag John V James J Horning Stephen J Garland Kevin D Jones A Modet andJeannette M Wing (1993a) Larch Languages and Tools for Formal SpecificationTexts and Monographs in Computer Science Springer isbn 978-1-4612-7636-4

218 BIBLIOGRAPHY

doi 101007978-1-4612-2704-5 url httpdxdoiorg101007978-1-4612-2704-5

Guttag John V James J Horning Stephen J Garland Kevin D Jones A Modet andJeannette M Wing (1993b) Larch Languages and Tools for Formal SpecificationTexts and Monographs in Computer Science Springer isbn 978-1-4612-7636-4doi 101007978-1-4612-2704-5 url httpdxdoiorg101007978-1-4612-2704-5

Hammer Christian and Gregor Snelting (2009) ldquoFlow-Sensitive Context-Sensitiveand Object-Sensitive Information Flow Control based on Program DependenceGraphsrdquo In Int J Inf Sec 86 pp 399ndash422 doi 101007s10207-009-0086-1url httpdxdoiorg101007s10207-009-0086-1

Hatcliff John Gary T Leavens K Rustan M Leino Peter Muumlller and Matthew JParkinson (2012) ldquoBehavioral Interface Specification Languagesrdquo In ACM Com-put Surv 443 p 16 doi 10114521876712187678 url httpdoiacmorg10114521876712187678

Heintze Nevin and Olivier Tardieu (2001) ldquoDemand-Driven Pointer Analysisrdquo InProceedings of the ACM SIGPLAN 2001 Conference on Programming LanguageDesign and Implementation PLDI rsquo01 Snowbird Utah USA ACM pp 24ndash34isbn 1-58113-414-2 doi 101145378795378802 url httpdoiacmorg101145378795378802

Hind Michael (2001) ldquoPointer Analysis Havenrsquot We Solved This Problem Yetrdquo InProceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program AnalysisFor Software Tools and Engineering PASTErsquo01 Snowbird Utah USA June 18-19 2001 pp 54ndash61 doi 101145379605379665 url httpdoiacmorg101145379605379665

Hoare C A R (1969) ldquoAn Axiomatic Basis for Computer Programmingrdquo In Com-mun ACM 1210 pp 576ndash580 doi 101145363235363259 url httpdoiacmorg101145363235363259

mdash (1971) ldquoProcedures and Parameters An Axiomatic Approachrdquo In Symposium onSemantics of Algorithmic Languages pp 102ndash116 doi 101007BFb0059696 urlhttpdxdoiorg101007BFb0059696

Horwitz Susan Thomas W Reps and Shmuel Sagiv (1995) ldquoDemand Interproce-dural Dataflow Analysisrdquo In SIGSOFT rsquo95 Proceedings of the Third ACM SIG-SOFT Symposium on Foundations of Software Engineering Washington DC USAOctober 10-13 1995 pp 104ndash115 doi 10 1145 222124222146 url http doiacmorg101145222124222146

Hughes J (1987) ldquoBackwards Analysis of Functional Programsrdquo In IFIP Workshopon Partial Evaluation and Mivxed Computation Ed by Bjoslashrner and Ershov

Hur Chung-Kil Derek Dreyer and Viktor Vafeiadis (2011) ldquoSeparation Logic in thePresence of Garbage Collectionrdquo In Proceedings of the 26th Annual IEEE Sym-posium on Logic in Computer Science LICS 2011 June 21-24 2011 TorontoOntario Canada pp 247ndash256 doi 101109LICS201146 url httpdxdoiorg101109LICS201146

BIBLIOGRAPHY 219

Jacobs Bart and Frank Piessens (2006) ldquoVerification of Programs with Inspector Meth-odsrdquo In In FTfJP 2006

Jacobs Bart Jan Smans and Frank Piessens (2010) ldquoA Quick Tour of the VeriFastProgram Verifierrdquo In Programming Languages and Systems - 8th Asian Sympo-sium APLAS 2010 Shanghai China November 28 - December 1 2010 Proceed-ings pp 304ndash311 doi 101007978-3-642-17164-2_21 url httpdxdoiorg101007978-3-642-17164-2_21

Jacobs Bart Jan Smans Pieter Philippaerts Freacutedeacuteric Vogels Willem Penninckx andFrank Piessens (2011) ldquoVeriFast A Powerful Sound Predictable Fast Verifier for Cand Javardquo In NASA Formal Methods - Third International Symposium NFM 2011Pasadena CA USA April 18-20 2011 Proceedings pp 41ndash55 doi 101007978-3-642-20398-5_4 url httpdxdoiorg101007978-3-642-20398-5_4

Java Native Interface Documentation (JNI) url https docs oracle com javase7docstechnotesguidesjnispecintrohtmlwp725 (Accessed09112016)

Jensen Simon Holm Anders Moslashller and Peter Thiemann (2010) ldquoInterproceduralAnalysis with Lazy Propagationrdquo In Static Analysis - 17th International Sympo-sium SAS 2010 Perpignan France September 14-16 2010 Proceedings pp 320ndash339 doi 101007978-3-642-15769-1_20 url httpdxdoiorg101007978-3-642-15769-1_20

Jhala Ranjit and Rupak Majumdar (2009) ldquoSoftware Model Checkingrdquo In ACMComput Surv 414 211ndash2154 doi 10 1145 1592434 1592438 url http doiacmorg10114515924341592438

Jones Cliff B (1990) Systematic Software Development Using VDM (2Nd Ed) UpperSaddle River NJ USA Prentice-Hall Inc isbn 0-13-880733-7

Jones Neil D and Steven S Muchnick (1979) ldquoFlow Analysis and Optimization of Lisp-Like Structuresrdquo In Conference Record of the Sixth Annual ACM Symposium onPrinciples of Programming Languages 1979 pp 244ndash256 doi 101145567752567776 url httpdoiacmorg101145567752567776

Jones Simon B and Daniel Le Meacutetayer (1989) ldquoComputer-Time Garbage Collectionby Sharing Analysisrdquo In Proceedings of the fourth international conference onFunctional programming languages and computer architecture FPCA 1989 Lon-don UK September 11-13 1989 pp 54ndash74 doi 1011459937099375 urlhttpdoiacmorg1011459937099375

Kassios Ioannis T (2006) ldquoDynamic Frames Support for Framing Dependencies andSharing Without Restrictionsrdquo In FM 2006 Formal Methods 14th InternationalSymposium on Formal Methods Hamilton Canada August 21-27 2006 Proceed-ings pp 268ndash283 doi 10100711813040_19 url httpdxdoiorg10100711813040_19

mdash (2011) ldquoThe Dynamic Frames Theoryrdquo In Formal Asp Comput 233 pp 267ndash288doi 101007s00165-010-0152-5 url httpdxdoiorg101007s00165-010-0152-5

220 BIBLIOGRAPHY

Kennedy Ken (1978) ldquoUse-Definition Chains with Applicationsrdquo In Comput Lang33 pp 163ndash179 doi 1010160096-0551(78)90009-7 url httpdxdoiorg1010160096-0551(78)90009-7

Khedker Uday P Alan Mycroft and Prashant Singh Rawat (2011) ldquoLazy PointerAnalysisrdquo In CoRR abs11125000 url httparxivorgabs11125000

Kildall Gary A (1973) ldquoA Unified Approach to Global Program Optimizationrdquo InConference Record of the ACM Symposium on Principles of Programming Lan-guages 1973 pp 194ndash206 doi 101145512927512945 url httpdoiacmorg101145512927512945

Klein Gerwin Kevin Elphinstone Gernot Heiser June Andronick David Cock PhilipDerrin Dhammika Elkaduwe Kai Engelhardt Rafal Kolanski Michael NorrishThomas Sewell Harvey Tuch and Simon Winwood (2009) ldquoseL4 Formal Verifica-tion of an OS Kernelrdquo In Proceedings of the ACM SIGOPS 22Nd Symposium onOperating Systems Principles SOSP rsquo09 Big Sky Montana USA ACM pp 207ndash220 isbn 978-1-60558-752-3 doi 10114516295751629596 url httpdoiacmorg10114516295751629596

Knoop Jens Oliver Ruumlthing and Bernhard Steffen (1994) ldquoPartial Dead Code Elim-inationrdquo In Proceedings of the ACM SIGPLANrsquo94 Conference on ProgrammingLanguage Design and Implementation (PLDI) Orlando Florida USA June 20-24 1994 pp 147ndash158 doi 101145178243178256 url httpdoiacmorg101145178243178256

Koenig Jason and K Rustan M Leino (2012) ldquoGetting Started with Dafny A GuiderdquoIn Software Safety and Security - Tools for Analysis and Verification pp 152ndash181doi 103233978-1-61499-028-4-152 url httpdxdoiorg103233978-1-61499-028-4-152

Kogtenkov Alexander Bertrand Meyer and Sergey Velder (2015) ldquoAlias CalculusChange Calculus and Frame Inferencerdquo In Sci Comput Program 97P1 pp 163ndash172 issn 0167-6423

Lattner Chris Andrew Lenharth and Vikram S Adve (2007) ldquoMaking Context-Sensitive Points-To Analysis with Heap Cloning Practical for the Real WorldrdquoIn Proceedings of the ACM SIGPLAN 2007 Conference on Programming LanguageDesign and Implementation 2007 pp 278ndash289 doi 10114512507341250766url httpdoiacmorg10114512507341250766

Leavens Gary T Albert L Baker and Clyde Ruby (2006) ldquoPreliminary Design ofJML A Behavioral Interface Specification Language for Javardquo In ACM SIGSOFTSoftware Engineering Notes 313 pp 1ndash38 doi 10114511278781127884 urlhttpdoiacmorg10114511278781127884

Leavens Gary T and Curtis Clifton (2005) ldquoLessons from the JML Projectrdquo In Veri-fied Software Theories Tools Experiments First IFIP TC 2WG 23 ConferenceVSTTE 2005 Zurich Switzerland October 10-13 2005 Revised Selected Papersand Discussions pp 134ndash143 doi 10 1007 978 - 3 - 540 - 69149 - 5 _ 15 urlhttpdxdoiorg101007978-3-540-69149-5_15

Leavens Gary T K Rustan M Leino and Peter Muumlller (2007) ldquoSpecification andVerification Challenges for Sequential Object-Oriented Programsrdquo In Formal Asp

BIBLIOGRAPHY 221

Comput 192 pp 159ndash189 doi 10 1007 s00165 - 007 - 0026 - 7 url http dxdoiorg101007s00165-007-0026-7

Leavens Gary T and Peter Muumlller (2007) ldquoInformation Hiding and Visibility in In-terface Specificationsrdquo In 29th International Conference on Software Engineer-ing (ICSE 2007) Minneapolis MN USA May 20-26 2007 pp 385ndash395 doi101109ICSE200744 url httpdxdoiorg101109ICSE200744

Leavens Gary T Erik Poll Curtis Clifton Yoonsik Cheon Clyde Ruby David Cokand Joseph Kiniry (2006) JML Reference Manual

Lehner Hermann and Peter Muumlller (2010) ldquoEfficient Runtime Assertion Checking ofAssignable Clauses with Datagroupsrdquo In Fundamental Approaches to Software En-gineering 13th International Conference FASE 2010 Held as Part of the JointEuropean Conferences on Theory and Practice of Software ETAPS 2010 PaphosCyprus March 20-28 2010 Proceedings pp 338ndash352 doi 101007978-3-642-12029-9_24 url httpdxdoiorg101007978-3-642-12029-9_24

Leinenbach Dirk and Thomas Santen (2009) ldquoVerifying the Microsoft Hyper-V Hy-pervisor with VCCrdquo In FM 2009 Formal Methods Second World Congress Eind-hoven The Netherlands November 2-6 2009 Proceedings Ed by Ana Cavalcantiand Dennis R Dams Berlin Heidelberg Springer Berlin Heidelberg pp 806ndash809isbn 978-3-642-05089-3 doi 101007978- 3- 642- 05089- 3_51 url httpdxdoiorg101007978-3-642-05089-3_51

Leino K Rustan M This is Boogie 2 Boogie Reference Manual url http researchmicrosoftcomen-usumpeopleleinopaperskrml178pdf

mdash (1998) ldquoData Groups Specifying the Modification of Extended Staterdquo In Pro-ceedings of the 1998 ACM SIGPLAN Conference on Object-Oriented ProgrammingSystems Languages amp Applications (OOPSLA rsquo98) Vancouver British ColumbiaCanada October 18-22 1998 Pp 144ndash153 doi 101145286936286953 urlhttpdoiacmorg101145286936286953

mdash (2001) ldquoExtended Static Checking A Ten-Year Perspectiverdquo In Informatics - 10Years Back 10 Years Ahead Pp 157ndash175 doi 1010073-540-44577-3_11 urlhttpdxdoiorg1010073-540-44577-3_11

mdash (2010) ldquoDafny An Automatic Program Verifier for Functional Correctnessrdquo InLogic for Programming Artificial Intelligence and Reasoning - 16th InternationalConference LPAR-16 Dakar Senegal April 25-May 1 2010 Revised Selected Pa-pers pp 348ndash370 doi 101007978-3-642-17511-4_20 url httpdxdoiorg101007978-3-642-17511-4_20

Leino K Rustan M and Peter Muumlller (2004) ldquoObject Invariants in Dynamic Con-textsrdquo In ECOOP 2004 - Object-Oriented Programming 18th European Confer-ence Oslo Norway June 14-18 2004 Proceedings pp 491ndash516 doi 101007978-3-540-24851-4_22 url httpdxdoiorg101007978-3-540-24851-4_22

mdash (2006) ldquoA Verification Methodology for Model Fieldsrdquo In Programming Languagesand Systems 15th European Symposium on Programming ESOP 2006 Held as Partof the Joint European Conferences on Theory and Practice of Software ETAPS

222 BIBLIOGRAPHY

2006 Vienna Austria March 27-28 2006 Proceedings pp 115ndash130 doi 10 100711693024_9 url httpdxdoiorg10100711693024_9

Leino K Rustan M and Peter Muumlller (2008a) ldquoUsing the Spec Language Method-ology and Tools to Write Bug-Free Programsrdquo In Advanced Lectures on SoftwareEngineering LASER Summer School 20072008 pp 91ndash139 doi 101007978-3-642-13010-6_4 url httpdxdoiorg101007978-3-642-13010-6_4

mdash (2008b) ldquoVerification of Equivalent-Results Methodsrdquo In Programming Languagesand Systems 17th European Symposium on Programming ESOP 2008 Held as Partof the Joint European Conferences on Theory and Practice of Software ETAPS2008 Budapest Hungary March 29-April 6 2008 Proceedings pp 307ndash321 doi101007978-3-540-78739-6_24 url httpdxdoiorg101007978-3-540-78739-6_24

Leino, K. Rustan M., Peter Müller, and Jan Smans (2009). “Verification of Concurrent Programs with Chalice”. In: Foundations of Security Analysis and Design V, FOSAD 2007/2008/2009 Tutorial Lectures, pp. 195–222. doi: 10.1007/978-3-642-03829-7_7. url: http://dx.doi.org/10.1007/978-3-642-03829-7_7.

Leino, K. Rustan M., Peter Müller, and Angela Wallenburg (2008). “Flexible Immutability with Frozen Objects”. In: Verified Software: Theories, Tools, Experiments, Second International Conference, VSTTE 2008, Toronto, Canada, October 6-9, 2008. Proceedings, pp. 192–208. doi: 10.1007/978-3-540-87873-5_17. url: http://dx.doi.org/10.1007/978-3-540-87873-5_17.

Leino, K. Rustan M. and Greg Nelson (1998). “An Extended Static Checker for Modula-3”. In: Compiler Construction, 7th International Conference, CC’98, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS’98, Lisbon, Portugal, March 28 - April 4, 1998, Proceedings, pp. 302–305. doi: 10.1007/BFb0026441. url: http://dx.doi.org/10.1007/BFb0026441.

Leino, K. Rustan M. and Greg Nelson (2002). “Data Abstraction and Information Hiding”. In: ACM Trans. Program. Lang. Syst. 24.5, pp. 491–553. doi: 10.1145/570886.570888. url: http://doi.acm.org/10.1145/570886.570888.

Leino, K. Rustan M., Arnd Poetzsch-Heffter, and Yunhong Zhou (2002). “Using Data Groups to Specify and Check Side Effects”. In: Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Berlin, Germany, June 17-19, 2002, pp. 246–257. doi: 10.1145/512529.512559. url: http://doi.acm.org/10.1145/512529.512559.

Leino, K. Rustan M. and Philipp Rümmer (2010). “A Polymorphic Intermediate Verification Language: Design and Logical Encoding”. In: Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010. Proceedings, pp. 312–327. doi: 10.1007/978-3-642-12002-2_26. url: http://dx.doi.org/10.1007/978-3-642-12002-2_26.

Leroy, Xavier (2009). “A Formally Verified Compiler Back-end”. In: J. Autom. Reasoning 43.4, pp. 363–446. doi: 10.1007/s10817-009-9155-4. url: http://dx.doi.org/10.1007/s10817-009-9155-4.

Leroy, Xavier and François Pessaux (2000). “Type-Based Analysis of Uncaught Exceptions”. In: ACM Trans. Program. Lang. Syst. 22.2, pp. 340–377. doi: 10.1145/349214.349230. url: http://doi.acm.org/10.1145/349214.349230.

Lescuyer, Stéphane (2015). “ProvenCore: Towards a Verified Isolation Micro-Kernel”. In: International Workshop on MILS: Architecture and Assurance for Secure Systems. url: http://mils-workshop-2015.euromils.eu.

Leuschel, Michael and Morten Heine Sørensen (1996). “Redundant Argument Filtering of Logic Programs”. In: Logic Programming Synthesis and Transformation, 6th International Workshop, LOPSTR’96, Stockholm, Sweden, August 28-30, 1996, Proceedings, pp. 83–103. doi: 10.1007/3-540-62718-9_6. url: http://dx.doi.org/10.1007/3-540-62718-9_6.

Lhoták, Ondrej and Laurie J. Hendren (2006). “Context-Sensitive Points-to Analysis: Is It Worth It?” In: Compiler Construction, 15th International Conference, CC 2006, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2006, Vienna, Austria, March 30-31, 2006, Proceedings, pp. 47–64. doi: 10.1007/11688839_5. url: http://dx.doi.org/10.1007/11688839_5.

Liang, Sheng (1999). Java Native Interface: Programmer’s Guide and Reference. 1st. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc. isbn: 0201325772.

Liskov, Barbara and John Guttag (1986). Abstraction and Specification in Program Development. Cambridge, MA, USA: MIT Press. isbn: 0-262-12112-3.

Liu, Yanhong A. (1998). “Dependence Analysis for Recursive Data”. In: Proceedings of the 1998 International Conference on Computer Languages, ICCL 1998, Chicago, IL, USA, May 14-16, 1998, pp. 206–215. doi: 10.1109/ICCL.1998.674171. url: http://dx.doi.org/10.1109/ICCL.1998.674171.

Liu, Yanhong A. and Scott D. Stoller (2003). “Eliminating Dead Code on Recursive Data”. In: Sci. Comput. Program. 47.2-3, pp. 221–242. doi: 10.1016/S0167-6423(02)00134-X. url: http://dx.doi.org/10.1016/S0167-6423(02)00134-X.

Lu, Yi, John Potter, and Jingling Xue (2007). “Validity Invariants and Effects”. In: ECOOP 2007 - Object-Oriented Programming, 21st European Conference, Berlin, Germany, July 30 - August 3, 2007, Proceedings, pp. 202–226. doi: 10.1007/978-3-540-73589-2_11. url: http://dx.doi.org/10.1007/978-3-540-73589-2_11.

Marché, Claude, Christine Paulin-Mohring, and Xavier Urbain (2004). “The KRAKATOA Tool for Certification of JAVA/JAVACARD Programs Annotated in JML”. In: J. Log. Algebr. Program. 58.1-2, pp. 89–106. doi: 10.1016/j.jlap.2003.07.006. url: http://dx.doi.org/10.1016/j.jlap.2003.07.006.

Marché, Claude (2016). The Krakatoa Verification Tool for Java Programs. Krakatoa Tutorial and Reference Manual. url: http://krakatoa.lri.fr/krakatoa.pdf.

Martin-Löf, Per (1984). Intuitionistic Type Theory. Naples: Bibliopolis.

McCarthy, John and Patrick J. Hayes (1969). “Some Philosophical Problems from the Standpoint of Artificial Intelligence”. In: Machine Intelligence. Edinburgh University Press.

Meyer, Bertrand (1991). Eiffel: The Language. Prentice-Hall. isbn: 0-13-247925-7.

Meyer, Bertrand (1992). “Applying Design by Contract”. In: IEEE Computer 25.10, pp. 40–51. doi: 10.1109/2.161279. url: http://dx.doi.org/10.1109/2.161279.

Meyer, Bertrand (1997). Object-Oriented Software Construction, 2nd Edition. Prentice-Hall. isbn: 0-13-629155-4.

Meyer, Bertrand (2010). “Towards a Theory and Calculus of Aliasing”. In: Journal of Object Technology 9.2, pp. 37–74. doi: 10.5381/jot.2010.9.2.c5. url: http://dx.doi.org/10.5381/jot.2010.9.2.c5.

Meyer, Bertrand (2011). “Steps Towards a Theory and Calculus of Aliasing”. In: Int. J. Software and Informatics 5.1-2, pp. 77–115. url: http://www.ijsi.org/ch/reader/view_abstract.aspx?file_no=i77.

Meyer, Bertrand (2015). “Framing the Frame Problem”. In: Dependable Software Systems Engineering, pp. 193–203. doi: 10.3233/978-1-61499-495-4-193. url: http://dx.doi.org/10.3233/978-1-61499-495-4-193.

Midtgaard, Jan (2012). “Control-Flow Analysis of Functional Programs”. In: ACM Comput. Surv. 44.3, p. 10. doi: 10.1145/2187671.2187672. url: http://doi.acm.org/10.1145/2187671.2187672.

Mike Barnett, Rustan Leino, Wolfram Schulte (2005). “The Spec# Programming System: An Overview”. In: CASSIS 2004, Construction and Analysis of Safe, Secure and Interoperable Smart devices. Vol. 3362. Springer, pp. 49–69. url: https://www.microsoft.com/en-us/research/publication/the-spec-programming-system-an-overview/.

Milanova, Ana, Atanas Rountev, and Barbara G. Ryder (2005). “Parameterized Object Sensitivity for Points-To Analysis for Java”. In: ACM Trans. Softw. Eng. Methodol. 14.1, pp. 1–41. doi: 10.1145/1044834.1044835. url: http://doi.acm.org/10.1145/1044834.1044835.

Montenegro, Manuel, Ricardo Peña, and Clara Segura (2015). “Shape Analysis in a Functional Language by Using Regular Languages”. In: Sci. Comput. Program. 111, pp. 51–78. doi: 10.1016/j.scico.2014.12.006. url: http://dx.doi.org/10.1016/j.scico.2014.12.006.

Morgenstern, Leora (1995). “The Problem with Solutions to the Frame Problem”. In: The Robot’s Dilemma Revisited: The Frame Problem in Artificial Intelligence. Ablex. Ablex Publishing Co., pp. 99–133.

Moura, Leonardo Mendonça de and Nikolaj Bjørner (2008). “Z3: An Efficient SMT Solver”. In: Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, pp. 337–340. doi: 10.1007/978-3-540-78800-3_24. url: http://dx.doi.org/10.1007/978-3-540-78800-3_24.

Müller, Peter (2002). Modular Specification and Verification of Object-Oriented Programs. Vol. 2262. Lecture Notes in Computer Science. Springer. isbn: 3-540-43167-5. doi: 10.1007/3-540-45651-1. url: http://dx.doi.org/10.1007/3-540-45651-1.

Müller, Peter, Arnd Poetzsch-Heffter, and Gary T. Leavens (2003). “Modular Specification of Frame Properties in JML”. In: Concurrency and Computation: Practice and Experience 15.2, pp. 117–154. doi: 10.1002/cpe.713. url: http://dx.doi.org/10.1002/cpe.713.

Müller, Peter, Arnd Poetzsch-Heffter, and Gary T. Leavens (2006). “Modular Invariants for Layered Object Structures”. In: Sci. Comput. Program. 62.3, pp. 253–286. doi: 10.1016/j.scico.2006.03.001. url: http://dx.doi.org/10.1016/j.scico.2006.03.001.

Naudziuniene, Daiva, Matko Botincan, Dino Distefano, Mike Dodds, Radu Grigore, and Matthew J. Parkinson (2011). “jStar-Eclipse: An IDE for Automated Verification of Java Programs”. In: SIGSOFT/FSE’11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC’11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011, pp. 428–431. doi: 10.1145/2025113.2025182. url: http://doi.acm.org/10.1145/2025113.2025182.

Naur, Peter (1966). “Proof of Algorithms by General Snapshots”. In: BIT Numerical Mathematics 6.4, pp. 310–316. issn: 1572-9125. doi: 10.1007/BF01966091. url: http://dx.doi.org/10.1007/BF01966091.

Nelson, Greg and Derek C. Oppen (1980). “Fast Decision Procedures Based on Congruence Closure”. In: J. ACM 27.2, pp. 356–364. doi: 10.1145/322186.322198. url: http://doi.acm.org/10.1145/322186.322198.

Nielson, Flemming and Hanne Riis Nielson (1999). “Interprocedural Control Flow Analysis”. In: Programming Languages and Systems, 8th European Symposium on Programming, ESOP’99, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS’99, Amsterdam, The Netherlands, 22-28 March, 1999, Proceedings, pp. 20–39. doi: 10.1007/3-540-49099-X_3. url: http://dx.doi.org/10.1007/3-540-49099-X_3.

Nielson, Flemming, Hanne Riis Nielson, and Chris Hankin (1999). Principles of Program Analysis. Springer. isbn: 978-3-540-65410-0.

Nordio, Martin, Cristiano Calcagno, Bertrand Meyer, Peter Müller, and Julian Tschannen (2010). “Reasoning about Function Objects”. In: Objects, Models, Components, Patterns, 48th International Conference, TOOLS 2010, Málaga, Spain, June 28 - July 2, 2010. Proceedings, pp. 79–96. doi: 10.1007/978-3-642-13953-6_5. url: http://dx.doi.org/10.1007/978-3-642-13953-6_5.

Nordström, Bengt, Kent Petersson, and Jan M. Smith (1990). Programming in Martin-Löf’s Type Theory. Vol. 200. Oxford University Press, Oxford.

O’Callahan, Robert and Daniel Jackson (1997). “Lackwit: A Program Understanding Tool Based on Type Inference”. In: Pulling Together, Proceedings of the 19th International Conference on Software Engineering, Boston, Massachusetts, USA, May 17-23, 1997. Pp. 338–348. doi: 10.1145/253228.253351. url: http://doi.acm.org/10.1145/253228.253351.

O’Hearn, Peter W. (2005). “Scalable Specification and Reasoning: Challenges for Program Logic”. In: Verified Software: Theories, Tools, Experiments, First IFIP TC 2/WG 2.3 Conference, VSTTE 2005, Zurich, Switzerland, October 10-13, 2005, Revised Selected Papers and Discussions, pp. 116–133. doi: 10.1007/978-3-540-69149-5_14. url: http://dx.doi.org/10.1007/978-3-540-69149-5_14.

O’Hearn, Peter W. (2012). “A Primer on Separation Logic (and Automatic Program Verification and Analysis)”. In: Software Safety and Security - Tools for Analysis and Verification, pp. 286–318. doi: 10.3233/978-1-61499-028-4-286. url: http://dx.doi.org/10.3233/978-1-61499-028-4-286.

O’Hearn, Peter W., John C. Reynolds, and Hongseok Yang (2001). “Local Reasoning about Programs that Alter Data Structures”. In: Computer Science Logic, 15th International Workshop, CSL 2001. 10th Annual Conference of the EACSL, Paris, France, September 10-13, 2001, Proceedings, pp. 1–19. doi: 10.1007/3-540-44802-0_1. url: http://dx.doi.org/10.1007/3-540-44802-0_1.

O’Hearn, Peter W., Hongseok Yang, and John C. Reynolds (2004). “Separation and Information Hiding”. In: Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004, Venice, Italy, January 14-16, 2004, pp. 268–280. doi: 10.1145/964001.964024. url: http://doi.acm.org/10.1145/964001.964024.

Padhye, Rohan and Uday P. Khedker (2013). “Interprocedural Data Flow Analysis in Soot Using Value Contexts”. In: Proceedings of the 2nd ACM SIGPLAN International Workshop on State Of the Art in Java Program analysis, SOAP 2013, Seattle, WA, USA, June 20, 2013, pp. 31–36. doi: 10.1145/2487568.2487569. url: http://doi.acm.org/10.1145/2487568.2487569.

Park, Young Gil and Benjamin Goldberg (1992). “Escape Analysis on Lists”. In: Proceedings of the ACM SIGPLAN’92 Conference on Programming Language Design and Implementation (PLDI), San Francisco, California, USA, June 17-19, 1992, pp. 116–127. doi: 10.1145/143095.143125. url: http://doi.acm.org/10.1145/143095.143125.

Parkinson, Matthew J. and Gavin M. Bierman (2005). “Separation Logic and Abstraction”. In: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2005, Long Beach, California, USA, January 12-14, 2005, pp. 247–258. doi: 10.1145/1040305.1040326. url: http://doi.acm.org/10.1145/1040305.1040326.

Parkinson, Matthew J., Richard Bornat, and Cristiano Calcagno (2006). “Variables as Resource in Hoare Logics”. In: 21th IEEE Symposium on Logic in Computer Science (LICS 2006), 12-15 August 2006, Seattle, WA, USA, Proceedings, pp. 137–146. doi: 10.1109/LICS.2006.52. url: http://dx.doi.org/10.1109/LICS.2006.52.

Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press. isbn: 978-0-262-16209-8.

Plotkin, Gordon D. (2004). “A Structural Approach to Operational Semantics”. In: J. Log. Algebr. Program. 60-61, pp. 17–139.

Polikarpova, Nadia, Carlo A. Furia, Yu Pei, Yi Wei, and Bertrand Meyer (2013). “What Good are Strong Specifications?” In: 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013, pp. 262–271. doi: 10.1109/ICSE.2013.6606572. url: http://dx.doi.org/10.1109/ICSE.2013.6606572.

Praun, Christoph von and Thomas R. Gross (2003). “Static Conflict Analysis for Multi-Threaded Object-Oriented Programs”. In: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, San Diego, California, USA, June 9-11, 2003, pp. 115–128. doi: 10.1145/781131.781145. url: http://doi.acm.org/10.1145/781131.781145.

Rakamaric, Zvonimir and Alan J. Hu (2008). “Automatic Inference of Frame Axioms Using Static Analysis”. In: 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2008), pp. 89–98. doi: 10.1109/ASE.2008.19. url: http://dx.doi.org/10.1109/ASE.2008.19.

Rémy, Didier and Jerome Vouillon (1997). “Objective ML: A Simple Object-Oriented Extension of ML”. In: Conference Record of POPL’97: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Papers Presented at the Symposium, Paris, France, 15-17 January 1997, pp. 40–53. doi: 10.1145/263699.263707. url: http://doi.acm.org/10.1145/263699.263707.

Reps, Thomas W., Susan Horwitz, and Shmuel Sagiv (1995). “Precise Interprocedural Dataflow Analysis via Graph Reachability”. In: Conference Record of POPL’95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, USA, January 23-25, 1995, pp. 49–61. doi: 10.1145/199448.199462. url: http://doi.acm.org/10.1145/199448.199462.

Reps, Thomas W. and Todd Turnidge (1996). “Program Specialization via Program Slicing”. In: Partial Evaluation, International Seminar, Dagstuhl Castle, Germany, February 12-16, 1996, Selected Papers, pp. 409–429. doi: 10.1007/3-540-61580-6_20. url: http://dx.doi.org/10.1007/3-540-61580-6_20.

Reynolds, John C. (1981). The Craft of Programming. Prentice Hall International series in computer science. Prentice Hall. isbn: 978-0-13-188862-3.

Reynolds, John C. (2000). “Intuitionistic Reasoning about Shared Mutable Data Structure”. In: Millennial Perspectives in Computer Science. Palgrave, pp. 303–321.

Reynolds, John C. (2002). “Separation Logic: A Logic for Shared Mutable Data Structures”. In: 17th IEEE Symposium on Logic in Computer Science (LICS 2002), 22-25 July 2002, Copenhagen, Denmark, Proceedings, pp. 55–74. doi: 10.1109/LICS.2002.1029817. url: http://dx.doi.org/10.1109/LICS.2002.1029817.

Reynolds, John C. (2005). “An Overview of Separation Logic”. In: Verified Software: Theories, Tools, Experiments, First IFIP TC 2/WG 2.3 Conference, VSTTE 2005, Zurich, Switzerland, October 10-13, 2005, Revised Selected Papers and Discussions, pp. 460–469. doi: 10.1007/978-3-540-69149-5_49. url: http://dx.doi.org/10.1007/978-3-540-69149-5_49.

Robert, Valentin and Xavier Leroy (2012). “A Formally-Verified Alias Analysis”. In: Certified Programs and Proofs - Second International Conference, CPP 2012, Kyoto, Japan, December 13-15, 2012. Proceedings, pp. 11–26. doi: 10.1007/978-3-642-35308-6_5. url: http://dx.doi.org/10.1007/978-3-642-35308-6_5.

Ruf, Erik (1995). “Context-Insensitive Alias Analysis Reconsidered”. In: Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation. PLDI ’95. La Jolla, California, USA: ACM, pp. 13–22. isbn: 0-89791-697-2. doi: 10.1145/207110.207112. url: http://doi.acm.org/10.1145/207110.207112.

Sabelfeld, Andrei and Andrew C. Myers (2003). “Language-Based Information-Flow Security”. In: IEEE Journal on Selected Areas in Communications 21.1, pp. 5–19. doi: 10.1109/JSAC.2002.806121. url: http://dx.doi.org/10.1109/JSAC.2002.806121.

Sagiv, Shmuel, Thomas W. Reps, and Reinhard Wilhelm (1999). “Parametric Shape Analysis via 3-Valued Logic”. In: POPL ’99, Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1999, pp. 105–118. doi: 10.1145/292540.292552. url: http://doi.acm.org/10.1145/292540.292552.

Salcianu, Alexandru and Martin C. Rinard (2005). “Purity and Side Effect Analysis for Java Programs”. In: Verification, Model Checking, and Abstract Interpretation, 6th International Conference, VMCAI 2005, Proceedings, pp. 199–215. doi: 10.1007/978-3-540-30579-8_14. url: http://dx.doi.org/10.1007/978-3-540-30579-8_14.

Shapiro, Marc and Susan Horwitz (1997). “The Effects of the Precision of Pointer Analysis”. In: Static Analysis, 4th International Symposium, SAS ’97, Paris, France, September 8-10, 1997, Proceedings, pp. 16–34. doi: 10.1007/BFb0032731. url: http://dx.doi.org/10.1007/BFb0032731.

Sharir, M. and A. Pnueli (1978). Two Approaches to Interprocedural Data Flow Analysis. New York, NY: New York Univ. Comput. Sci. Dept. url: https://cds.cern.ch/record/120118.

Shostak, Robert E. (1984). “Deciding Combinations of Theories”. In: J. ACM 31.1, pp. 1–12. doi: 10.1145/2422.322411. url: http://doi.acm.org/10.1145/2422.322411.

Smans, Jan, Bart Jacobs, and Frank Piessens (2008). “VeriCool: An Automatic Verifier for a Concurrent Object-Oriented Language”. In: Formal Methods for Open Object-Based Distributed Systems, 10th IFIP WG 6.1 International Conference, FMOODS 2008, Oslo, Norway, June 4-6, 2008, Proceedings, pp. 220–239. doi: 10.1007/978-3-540-68863-1_14. url: http://dx.doi.org/10.1007/978-3-540-68863-1_14.

Smans, Jan, Bart Jacobs, and Frank Piessens (2012). “Implicit Dynamic Frames”. In: ACM Trans. Program. Lang. Syst. 34.1, 2:1–2:58. doi: 10.1145/2160910.2160911. url: http://doi.acm.org/10.1145/2160910.2160911.

Sozeau, Matthieu (2009). “A New Look at Generalized Rewriting in Type Theory”. In: J. Formalized Reasoning 2.1, pp. 41–62. doi: 10.6092/issn.1972-5787/1574. url: http://dx.doi.org/10.6092/issn.1972-5787/1574.

Sozeau, Matthieu and the COQ development team (1997). The Coq Proof Assistant Reference Manual, Version 8.6. Inria.

Sridharan, Manu, Denis Gopan, Lexin Shan, and Rastislav Bodík (2005). “Demand-Driven Points-to Analysis for Java”. In: Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications. OOPSLA ’05. San Diego, CA, USA: ACM, pp. 59–76. isbn: 1-59593-031-0. doi: 10.1145/1094811.1094817. url: http://doi.acm.org/10.1145/1094811.1094817.

Strachey, Christopher (1967). Fundamental Concepts in Programming Languages. Lecture Notes, International Summer School in Computer Programming, Copenhagen. Reprinted in Higher-Order and Symbolic Computation, 13(1/2), pp. 1–49, 2000.

Taghdiri, Mana, Robert Seater, and Daniel Jackson (2006). “Lightweight Extraction of Syntactic Specifications”. In: Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2006, pp. 276–286. doi: 10.1145/1181775.1181809. url: http://doi.acm.org/10.1145/1181775.1181809.

Tip, Frank (1995). “A Survey of Program Slicing Techniques”. In: J. Prog. Lang. 3.3. url: http://compscinet.dcs.kcl.ac.uk/JP/jp030301.abs.html.

Vardi, Moshe Y. and Pierre Wolper (1994). “Reasoning about Infinite Computations”. In: Information and Computation 115, pp. 1–37.

Volpano, Dennis M., Cynthia E. Irvine, and Geoffrey Smith (1996). “A Sound Type System for Secure Flow Analysis”. In: Journal of Computer Security 4.2/3, pp. 167–188. doi: 10.3233/JCS-1996-42-304. url: http://dx.doi.org/10.3233/JCS-1996-42-304.

Wadler, Philip and R. J. M. Hughes (1987). “Projections for Strictness Analysis”. In: Functional Programming Languages and Computer Architecture, Portland, Oregon, USA, September 14-16, 1987, Proceedings, pp. 385–407. doi: 10.1007/3-540-18317-5_21. url: http://dx.doi.org/10.1007/3-540-18317-5_21.

Wand, Mitchell and William D. Clinger (1998). “Set Constraints for Destructive Array Update Optimization”. In: Proceedings of the 1998 International Conference on Computer Languages, ICCL 1998, Chicago, IL, USA, May 14-16, 1998, pp. 184–195. doi: 10.1109/ICCL.1998.674169. url: http://dx.doi.org/10.1109/ICCL.1998.674169.

Wand, Mitchell and Igor Siveroni (1999). “Constraint Systems for Useless Variable Elimination”. In: POPL ’99, Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Antonio, TX, USA, January 20-22, 1999, pp. 291–302. doi: 10.1145/292540.292567. url: http://doi.acm.org/10.1145/292540.292567.

Weiser, Mark (1984). “Program Slicing”. In: IEEE Trans. Software Eng. 10.4, pp. 352–357. doi: 10.1109/TSE.1984.5010248. url: http://dx.doi.org/10.1109/TSE.1984.5010248.

Wing, Jeannette M. (1987). “Writing Larch Interface Language Specifications”. In: ACM Trans. Program. Lang. Syst. 9.1, pp. 1–24. doi: 10.1145/9758.10500. url: http://doi.acm.org/10.1145/9758.10500.

Xtext Documentation. https://eclipse.org/Xtext/. Accessed: 2016-09-11.

Zee, Karen, Viktor Kuncak, and Martin C. Rinard (2008). “Full Functional Verification of Linked Data Structures”. In: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, June 7-13, 2008, pp. 349–361. doi: 10.1145/1375581.1375624. url: http://doi.acm.org/10.1145/1375581.1375624.

Zhao, Yang and John Boyland (2008). “A Fundamental Permission Interpretation for Ownership Types”. In: Second IEEE/IFIP International Symposium on Theoretical Aspects of Software Engineering, TASE 2008, June 17-19, 2008, Nanjing, China, pp. 65–72. doi: 10.1109/TASE.2008.45. url: http://dx.doi.org/10.1109/TASE.2008.45.

Zheng, Xin and Radu Rugina (2008). “Demand-Driven Alias Analysis for C”. In: Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. POPL ’08. San Francisco, California, USA: ACM, pp. 197–208. isbn: 978-1-59593-689-9. doi: 10.1145/1328438.1328464. url: http://doi.acm.org/10.1145/1328438.1328464.

Page 37: Static Analysis of Functional Programs with an Application
Page 38: Static Analysis of Functional Programs with an Application
Page 39: Static Analysis of Functional Programs with an Application
Page 40: Static Analysis of Functional Programs with an Application
Page 41: Static Analysis of Functional Programs with an Application
Page 42: Static Analysis of Functional Programs with an Application
Page 43: Static Analysis of Functional Programs with an Application
Page 44: Static Analysis of Functional Programs with an Application
Page 45: Static Analysis of Functional Programs with an Application
Page 46: Static Analysis of Functional Programs with an Application
Page 47: Static Analysis of Functional Programs with an Application
Page 48: Static Analysis of Functional Programs with an Application
Page 49: Static Analysis of Functional Programs with an Application
Page 50: Static Analysis of Functional Programs with an Application
Page 51: Static Analysis of Functional Programs with an Application
Page 52: Static Analysis of Functional Programs with an Application
Page 53: Static Analysis of Functional Programs with an Application
Page 54: Static Analysis of Functional Programs with an Application
Page 55: Static Analysis of Functional Programs with an Application
Page 56: Static Analysis of Functional Programs with an Application
Page 57: Static Analysis of Functional Programs with an Application
Page 58: Static Analysis of Functional Programs with an Application
Page 59: Static Analysis of Functional Programs with an Application
Page 60: Static Analysis of Functional Programs with an Application
Page 61: Static Analysis of Functional Programs with an Application
Page 62: Static Analysis of Functional Programs with an Application
Page 63: Static Analysis of Functional Programs with an Application
Page 64: Static Analysis of Functional Programs with an Application
Page 65: Static Analysis of Functional Programs with an Application
Page 66: Static Analysis of Functional Programs with an Application
Page 67: Static Analysis of Functional Programs with an Application
Page 68: Static Analysis of Functional Programs with an Application
Page 69: Static Analysis of Functional Programs with an Application
Page 70: Static Analysis of Functional Programs with an Application
Page 71: Static Analysis of Functional Programs with an Application
Page 72: Static Analysis of Functional Programs with an Application
Page 73: Static Analysis of Functional Programs with an Application
Page 74: Static Analysis of Functional Programs with an Application
Page 75: Static Analysis of Functional Programs with an Application
Page 76: Static Analysis of Functional Programs with an Application
Page 77: Static Analysis of Functional Programs with an Application
Page 78: Static Analysis of Functional Programs with an Application
Page 79: Static Analysis of Functional Programs with an Application
Page 80: Static Analysis of Functional Programs with an Application
Page 81: Static Analysis of Functional Programs with an Application
Page 82: Static Analysis of Functional Programs with an Application
Page 83: Static Analysis of Functional Programs with an Application
Page 84: Static Analysis of Functional Programs with an Application
Page 85: Static Analysis of Functional Programs with an Application
Page 86: Static Analysis of Functional Programs with an Application
Page 87: Static Analysis of Functional Programs with an Application
Page 88: Static Analysis of Functional Programs with an Application
Page 89: Static Analysis of Functional Programs with an Application
Page 90: Static Analysis of Functional Programs with an Application
Page 91: Static Analysis of Functional Programs with an Application
Page 92: Static Analysis of Functional Programs with an Application
Page 93: Static Analysis of Functional Programs with an Application
Page 94: Static Analysis of Functional Programs with an Application
Page 95: Static Analysis of Functional Programs with an Application
Page 96: Static Analysis of Functional Programs with an Application
Page 97: Static Analysis of Functional Programs with an Application
Page 98: Static Analysis of Functional Programs with an Application
Page 99: Static Analysis of Functional Programs with an Application
Page 100: Static Analysis of Functional Programs with an Application
Page 101: Static Analysis of Functional Programs with an Application
Page 102: Static Analysis of Functional Programs with an Application
Page 103: Static Analysis of Functional Programs with an Application
Page 104: Static Analysis of Functional Programs with an Application
Page 105: Static Analysis of Functional Programs with an Application
Page 106: Static Analysis of Functional Programs with an Application
Page 107: Static Analysis of Functional Programs with an Application
Page 108: Static Analysis of Functional Programs with an Application
Page 109: Static Analysis of Functional Programs with an Application
Page 110: Static Analysis of Functional Programs with an Application
Page 111: Static Analysis of Functional Programs with an Application
Page 112: Static Analysis of Functional Programs with an Application
Page 113: Static Analysis of Functional Programs with an Application
Page 114: Static Analysis of Functional Programs with an Application
Page 115: Static Analysis of Functional Programs with an Application
Page 116: Static Analysis of Functional Programs with an Application
Page 117: Static Analysis of Functional Programs with an Application
Page 118: Static Analysis of Functional Programs with an Application
Page 119: Static Analysis of Functional Programs with an Application
Page 120: Static Analysis of Functional Programs with an Application
Page 121: Static Analysis of Functional Programs with an Application
Page 122: Static Analysis of Functional Programs with an Application
Page 123: Static Analysis of Functional Programs with an Application
Page 124: Static Analysis of Functional Programs with an Application
Page 125: Static Analysis of Functional Programs with an Application
Page 126: Static Analysis of Functional Programs with an Application
Page 127: Static Analysis of Functional Programs with an Application
Page 128: Static Analysis of Functional Programs with an Application
Page 129: Static Analysis of Functional Programs with an Application
Page 130: Static Analysis of Functional Programs with an Application
Page 131: Static Analysis of Functional Programs with an Application
Page 132: Static Analysis of Functional Programs with an Application
Page 133: Static Analysis of Functional Programs with an Application
Page 134: Static Analysis of Functional Programs with an Application
Page 135: Static Analysis of Functional Programs with an Application
Page 136: Static Analysis of Functional Programs with an Application
Page 137: Static Analysis of Functional Programs with an Application
Page 138: Static Analysis of Functional Programs with an Application
Page 139: Static Analysis of Functional Programs with an Application
Page 140: Static Analysis of Functional Programs with an Application
Page 141: Static Analysis of Functional Programs with an Application
Page 142: Static Analysis of Functional Programs with an Application
Page 143: Static Analysis of Functional Programs with an Application
Page 144: Static Analysis of Functional Programs with an Application
Page 145: Static Analysis of Functional Programs with an Application
Page 146: Static Analysis of Functional Programs with an Application
Page 147: Static Analysis of Functional Programs with an Application
Page 148: Static Analysis of Functional Programs with an Application
Page 149: Static Analysis of Functional Programs with an Application
Page 150: Static Analysis of Functional Programs with an Application
Page 151: Static Analysis of Functional Programs with an Application
Page 152: Static Analysis of Functional Programs with an Application
Page 153: Static Analysis of Functional Programs with an Application
Page 154: Static Analysis of Functional Programs with an Application
Page 155: Static Analysis of Functional Programs with an Application
Page 156: Static Analysis of Functional Programs with an Application
Page 157: Static Analysis of Functional Programs with an Application
Page 158: Static Analysis of Functional Programs with an Application
Page 159: Static Analysis of Functional Programs with an Application
Page 160: Static Analysis of Functional Programs with an Application
Page 161: Static Analysis of Functional Programs with an Application
Page 162: Static Analysis of Functional Programs with an Application
Page 163: Static Analysis of Functional Programs with an Application
Page 164: Static Analysis of Functional Programs with an Application
Page 165: Static Analysis of Functional Programs with an Application
Page 166: Static Analysis of Functional Programs with an Application
Page 167: Static Analysis of Functional Programs with an Application
Page 168: Static Analysis of Functional Programs with an Application
Page 169: Static Analysis of Functional Programs with an Application
Page 170: Static Analysis of Functional Programs with an Application
Page 171: Static Analysis of Functional Programs with an Application
Page 172: Static Analysis of Functional Programs with an Application
Page 173: Static Analysis of Functional Programs with an Application
Page 174: Static Analysis of Functional Programs with an Application
Page 175: Static Analysis of Functional Programs with an Application
Page 176: Static Analysis of Functional Programs with an Application
Page 177: Static Analysis of Functional Programs with an Application
Page 178: Static Analysis of Functional Programs with an Application
Page 179: Static Analysis of Functional Programs with an Application
Page 180: Static Analysis of Functional Programs with an Application
Page 181: Static Analysis of Functional Programs with an Application
Page 182: Static Analysis of Functional Programs with an Application
Page 183: Static Analysis of Functional Programs with an Application
Page 184: Static Analysis of Functional Programs with an Application
Page 185: Static Analysis of Functional Programs with an Application
Page 186: Static Analysis of Functional Programs with an Application
Page 187: Static Analysis of Functional Programs with an Application
Page 188: Static Analysis of Functional Programs with an Application
Page 189: Static Analysis of Functional Programs with an Application
Page 190: Static Analysis of Functional Programs with an Application
Page 191: Static Analysis of Functional Programs with an Application
Page 192: Static Analysis of Functional Programs with an Application
Page 193: Static Analysis of Functional Programs with an Application
Page 194: Static Analysis of Functional Programs with an Application
Page 195: Static Analysis of Functional Programs with an Application
Page 196: Static Analysis of Functional Programs with an Application
Page 197: Static Analysis of Functional Programs with an Application
Page 198: Static Analysis of Functional Programs with an Application
Page 199: Static Analysis of Functional Programs with an Application
Page 200: Static Analysis of Functional Programs with an Application
Page 201: Static Analysis of Functional Programs with an Application
Page 202: Static Analysis of Functional Programs with an Application
Page 203: Static Analysis of Functional Programs with an Application
Page 204: Static Analysis of Functional Programs with an Application
Page 205: Static Analysis of Functional Programs with an Application
Page 206: Static Analysis of Functional Programs with an Application
Page 207: Static Analysis of Functional Programs with an Application
Page 208: Static Analysis of Functional Programs with an Application
Page 209: Static Analysis of Functional Programs with an Application
Page 210: Static Analysis of Functional Programs with an Application
Page 211: Static Analysis of Functional Programs with an Application
Page 212: Static Analysis of Functional Programs with an Application
Page 213: Static Analysis of Functional Programs with an Application
Page 214: Static Analysis of Functional Programs with an Application
Page 215: Static Analysis of Functional Programs with an Application
Page 216: Static Analysis of Functional Programs with an Application
Page 217: Static Analysis of Functional Programs with an Application
Page 218: Static Analysis of Functional Programs with an Application
Page 219: Static Analysis of Functional Programs with an Application
Page 220: Static Analysis of Functional Programs with an Application
Page 221: Static Analysis of Functional Programs with an Application
Page 222: Static Analysis of Functional Programs with an Application
Page 223: Static Analysis of Functional Programs with an Application
Page 224: Static Analysis of Functional Programs with an Application
Page 225: Static Analysis of Functional Programs with an Application
Page 226: Static Analysis of Functional Programs with an Application
Page 227: Static Analysis of Functional Programs with an Application
Page 228: Static Analysis of Functional Programs with an Application
Page 229: Static Analysis of Functional Programs with an Application
Page 230: Static Analysis of Functional Programs with an Application
Page 231: Static Analysis of Functional Programs with an Application
Page 232: Static Analysis of Functional Programs with an Application
Page 233: Static Analysis of Functional Programs with an Application
Page 234: Static Analysis of Functional Programs with an Application
Page 235: Static Analysis of Functional Programs with an Application
Page 236: Static Analysis of Functional Programs with an Application
Page 237: Static Analysis of Functional Programs with an Application
Page 238: Static Analysis of Functional Programs with an Application
Page 239: Static Analysis of Functional Programs with an Application
Page 240: Static Analysis of Functional Programs with an Application
Page 241: Static Analysis of Functional Programs with an Application
Page 242: Static Analysis of Functional Programs with an Application
Page 243: Static Analysis of Functional Programs with an Application
Page 244: Static Analysis of Functional Programs with an Application
Page 245: Static Analysis of Functional Programs with an Application
Page 246: Static Analysis of Functional Programs with an Application
Page 247: Static Analysis of Functional Programs with an Application
Page 248: Static Analysis of Functional Programs with an Application
Page 249: Static Analysis of Functional Programs with an Application
Page 250: Static Analysis of Functional Programs with an Application