
  • Proper Plugin Protocols

    Cost-effective Verification of Frameworks

    Thesis Proposal

    Ciera Jaspan
    School of Computer Science
    Carnegie Mellon University

    [email protected]

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

    Thesis Committee

    Jonathan Aldrich (chair), School of Computer Science, Carnegie Mellon University

    William Scherlis, School of Computer Science, Carnegie Mellon University

    Mary Shaw, School of Computer Science, Carnegie Mellon University

    Gary T. Leavens, School of EECS, University of Central Florida

    [email protected]

  • Contents

    1 Object Collaborations
        Examples from developer forums
        Problem properties
        Revisiting frameworks
        Relationships to specify collaboration constraints
        Analyzing constraints cost effectively

    2 This thesis
        Expected research contributions
        Potential Industry Applications

    3 Frameworks
        Connecting plugins
        Collaboration constraints in frameworks

    4 FUSION: A Relationship-based Specification Language
        FUSION Specifications
        Analysis

    5 Validation
        Validation Plan
        Claims of generality and support for definitions
        Validating the Hypotheses

    6 Research Plan
        Preliminary work
        Timeline
        Risk management
        Extensions

    7 Related work

    References

    A Detailed examination of ASP.NET examples

    B DropDownList Anecdote

  • 1 Object Collaborations

    No object is an island.

    Beck and Cunningham

    No runtime entity exists independently in software, whether it be an object, component, or function. These entities interact and collaborate with each other in structured ways to make a useful program. As programmers, we manipulate these collaborations by performing operations on these entities, such as invoking methods, passing in arguments, setting stateful fields, and sending or receiving data through a port.

    As an example of two entities that interact to do something useful, consider a simple list object with items in it. To add an item, we might call list.add(obj) and to remove an item, we might call list.remove(obj). Notice that both of these operations change the abstract association between these two objects. In fact, there is even another operation, list.contains(obj), that allows the developer to determine if such an association exists. While this collaboration is small, collaborations like this are common and are semantically important.
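    The list collaboration above can be sketched with the standard java.util.List API. This is only an illustration; the class and method names ListAssociationDemo and observe are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class ListAssociationDemo {
    /** Observes the list/object association before add, after add, and after remove. */
    static boolean[] observe() {
        List<Object> list = new ArrayList<>();
        Object obj = new Object();

        boolean before = list.contains(obj); // no association exists yet
        list.add(obj);                       // creates the association
        boolean during = list.contains(obj); // the association can be queried
        list.remove(obj);                    // removes the association
        boolean after = list.contains(obj);
        return new boolean[] { before, during, after };
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```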

    While objects frequently participate in collaborations, not every collaboration is valid. Collaborations are frequently constrained in some way. For example, a list may require that all objects which are added to it be in a particular state. It is possible that the list checks this requirement, or perhaps that the item itself does, but it is also possible that the list assumes that the caller is responsible for enforcing this constraint. Therefore, the programmer must always be aware of which constraints she must abide by. I will refer to constraints on how several entities collaborate as collaboration constraints.

    Collaboration constraints occur with high frequency in software frameworks and are therefore of particular concern to that category of software. A software framework, according to Johnson, is “a reusable design of all or part of a system that is represented by a set of abstract classes and the way their instances interact” [33].¹ Johnson describes frameworks as a form of design reuse; the framework expresses one part of the design, and the code which uses the framework, commonly known as a plugin, completes the other part. To complete this design, the plugin must provide the missing pieces and must coordinate many objects according to the framework-defined design patterns. Johnson notes that the reuse of patterns makes frameworks very powerful as they reuse not only code, but the design as well. However, he also describes frameworks as difficult to use and learn; I will argue that this difficulty stems directly from the collaboration constraints on how objects may coordinate within the framework-defined patterns.

    We can directly observe how difficult it is to use frameworks by inspecting posts on developer help forums, such as those for ASP.NET and Spring. When a developer chooses to post on a help forum, it tells us several things about his current situation:

    • The developer has probably spent several hours trying to figure out the problem himself by searching for tutorials and documentation.

    • The developer has probably asked his colleagues, who also did not know how to fix the problem.

    • The developer has decided that it would be more efficient for him to anonymize the code, post it, and wait possibly several days for a response, rather than continue to puzzle it out alone.

    ¹ As we will see shortly, this definition is not adequate to describe frameworks. I will refine it in Section 3.

    The developers who respond to these posts are either more advanced developers or consultants and employees of companies that will benefit from others using this framework successfully. For example, some Microsoft teams require that employees spend several hours each month answering developer questions on the help forums. The consultants and companies who write these frameworks and associated tools are willing to put in the time to help developers use the framework successfully.

    To better understand object collaboration problems in the context of frameworks, I am systematically looking through the forums and mailing lists of industry frameworks with active developer communities. I have already discovered many discussion threads which could be traced back to a problem where the developer had to coordinate several objects in some valid way.² In the remainder of this section, I present three representative problems from the ASP.NET framework, summarize the properties of collaboration constraints, argue that these constraints are inherent in the design of software frameworks, and introduce a specification language and accompanying static analysis for collaboration constraints.

    Examples from developer forums

    I will now present three representative problems from the ASP.NET help forums. At the core of each of these problems was a violated collaboration constraint. In the section following, I will analyze the common properties of these three collaboration constraints. These examples will also be used later to describe the specification language.

    DropDownList example

    The ASP.NET web application framework allows developers to create web pages with user interface controls on them. A web page that uses the ASP.NET web framework is made up of two files which represent a model and a view. The first file, an ASPX file, is a declarative, HTML-based file which describes the layout of the controls on the page. The ASPX file represents the “view” component of the plugin, and it is used by the framework to create the view of the webpage. The second file represents the model of the plugin and is written in either C# or VB.NET. This file contains code which the framework calls to respond to page lifecycle events and end-user events on the controls. Since this file contains the code behind the web page, it is known as the code-behind file. In combination, these two files create a web page, and they are a plugin to the ASP.NET framework.

    One task that a developer might want to perform is to programmatically change the selection of a drop down list. The ASP.NET framework provides the relevant pieces, as shown in Figure 1.³ Notice that if the developer wants to change the selection of a DropDownList (or any other derived ListControl), she has to access the individual ListItems through the ListItemCollection and change the selection using setSelected. Based on this information, she might naïvely change the selection as shown in Listing 1. Her expectation is that the framework will see that she has selected a new item and will change the selection accordingly.

    Figure 1: ASP.NET ListControl Class Diagram

    Listing 1: Incorrect selection for a DropDownList

        DropDownList list;

        private void Page_Load(object sender, EventArgs e)
        {
            ListItem newSel;
            newSel = list.getItems().findByValue("foo");
            newSel.setSelected(true);
        }

    ² A deep analysis of my data gathering approach and of the threads which exhibit a collaboration constraint can be found in Appendix A.

    ³ To make this code more accessible to those unfamiliar with C#, we are using traditional getter/setter syntax rather than properties.

    When the developer runs this code, she will get the exception shown in Figure 2. The error message clearly describes the problem; a DropDownList had more than one item selected. This error is due to the fact that the developer did not de-select the previously selected item, and, by design, the framework does not do this automatically. While an experienced developer will realize that this was the problem, an inexperienced developer might be confused because she did not select multiple items.
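    The failure mode can be reproduced in a small Java model. This is a sketch under stated assumptions: SketchItem and SketchDropDownList are hypothetical stand-ins, not the real ASP.NET types, and the exception type and message are invented for illustration. The point it shows is that selecting a second item succeeds silently and the single-selection rule is only checked later, when the list is rendered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ListItem and DropDownList; not the real ASP.NET types.
class SketchItem {
    final String value;
    private boolean selected;

    SketchItem(String value, boolean selected) {
        this.value = value;
        this.selected = selected;
    }

    boolean isSelected() { return selected; }

    // The item cannot see its siblings, so nothing is checked here.
    void setSelected(boolean selected) { this.selected = selected; }
}

class SketchDropDownList {
    private final List<SketchItem> items = new ArrayList<>();

    void add(SketchItem item) { items.add(item); }

    SketchItem findByValue(String value) {
        for (SketchItem item : items) {
            if (item.value.equals(value)) return item;
        }
        return null;
    }

    // The single-selection rule is checked only when the list is rendered.
    String render() {
        String selectedValue = null;
        for (SketchItem item : items) {
            if (!item.isSelected()) continue;
            if (selectedValue != null) {
                throw new IllegalStateException("more than one item selected");
            }
            selectedValue = item.value;
        }
        return "<select value='" + selectedValue + "'/>";
    }
}

class DeferredCheckDemo {
    // Returns where the failure surfaced: "setSelected", "render", or "none".
    static String failurePoint() {
        SketchDropDownList list = new SketchDropDownList();
        list.add(new SketchItem("bar", true));   // previously selected item
        list.add(new SketchItem("foo", false));
        try {
            list.findByValue("foo").setSelected(true); // plugin code: succeeds silently
        } catch (IllegalStateException e) {
            return "setSelected";
        }
        try {
            list.render();                             // framework code: the check fires here
        } catch (IllegalStateException e) {
            return "render";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(failurePoint());
    }
}
```

    In this model the exception originates in render, which the plugin never calls directly, so the resulting stack trace would contain no plugin frames.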

    The stack trace in Figure 2 is even more interesting because it does not point to the code where the developer made the selection. In fact, the entire stack trace is from framework code; there is no plugin code referenced at all! At runtime, the framework called the plugin developer’s code in Listing 1, this code ran and returned to the framework, and then the framework discovered the error just before rendering the DropDownList into HTML. To make matters worse, the program control could go back and forth between the framework and plugin several times before finally reaching the check that triggered the exception. Since the developer doesn’t know exactly where the problem occurred, or even what object it occurred on, she must search her code by hand to find the erroneous selection.

    Figure 2: Error with partial stack trace from ASP.NET

    The correct code for this task is in Listing 2. In this code snippet, the developer de-selects the currently selected item before selecting a new item. Further anecdotes and discussion of this example can be found in Appendix B.

    Listing 2: Correctly selecting an item using the ASP.NET API

        1  DropDownList list;
        2
        3  private void Page_Load(object sender, EventArgs e)
        4  {
        5      ListItem newSel, oldSel;
        6      oldSel = list.getSelectedItem();
        7      oldSel.setSelected(false);
        8      newSel = list.getItems().findByValue("foo");
        9      newSel.setSelected(true);
        10 }

    Listing 3: ASPX with a LoginView [markup not recoverable; surviving text content shown]

        You can only set up your account when you are logged in.
        Location

    LoginView example

    On the ASP.NET forums, a developer reported that he was attempting to retrieve a DropDownList within his code-behind file, but his code was throwing a NullReferenceException [51]. His plugin uses a LoginView control, which allows developers to display some controls if the user is logged in, and other controls if the user is not logged in. It achieves this by having two templates which represent these states, as shown in the developer’s ASPX file in Listing 3.

    The developer properly set up a LoginView, including the DropDownList within it, in the ASPX file. The developer then went to his code-behind file in Listing 4, and in the initialization event, attempted to set up the DropDownList with data when the page is viewed for the first time. The typical way to get a sub-control is to call Control.findControl with the appropriate name; findControl will return null only if there is no sub-control with that name. While this line of code was throwing a NullReferenceException, the developer was confused because he had used exactly the name he declared in the ASPX file.

    Listing 4: Incorrect way of retrieving controls in a LoginView

        LoginView LoginScreen;

        private void Page_Load(object sender, EventArgs e)
        {
            if (!isPostBack()) {
                DropDownList list = (DropDownList)
                    LoginScreen.FindControl("LocationList");
                list.DataSource = ...;
                list.DataBind();
            }
        }

    Another developer responded to the post and explained this unusual error. The original developer did correctly set up his controls so that the DropDownList would only show when the user is logged in. However, the LoggedInTemplate does more than just make the controls invisible if no user is logged in; the controls will not even exist in memory unless a user is logged in. Therefore, if a developer wishes to set up data in these controls, he must do so before the control is displayed, but only if the user has logged in. This constraint makes more sense from a security perspective; we do not want any chance of the data within that control leaking out of the system, so it does not exist at all until necessary. The solution proposed was to first check the login status from Request.isAuthenticated(), using the page’s Request object, as shown in the corrected Listing 5.

    Listing 5: Correct way of retrieving controls in a LoginView

        LoginView LoginScreen;

        private void Page_Load(object sender, EventArgs e)
        {
            Request myRequest = getRequest();
            if (!isPostBack() && myRequest.isAuthenticated()) {
                DropDownList list = (DropDownList)
                    LoginScreen.FindControl("LocationList");
                list.DataSource = ...;
                list.DataBind();
            }
        }
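    The LoginView behavior can be modeled in a few lines of Java. This is a minimal sketch, assuming a hypothetical SketchLoginView class that is not the real ASP.NET control: the logged-in template's controls are created only for authenticated requests, so a lookup by the declared name returns null otherwise, and the plugin must guard the lookup as the corrected listing does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a LoginView; names follow the text, not the real API.
class SketchLoginView {
    private final Map<String, Object> controls = new HashMap<>();

    // Controls of the logged-in template are created only for authenticated requests.
    SketchLoginView(boolean authenticated) {
        if (authenticated) {
            controls.put("LocationList", new Object());
        }
    }

    // Returns null when no sub-control with that name exists.
    Object findControl(String name) { return controls.get(name); }
}

class LoginViewDemo {
    // Mirrors the guard in the corrected listing: look up the control only when it can exist.
    static String setUp(boolean isPostBack, boolean authenticated) {
        SketchLoginView loginScreen = new SketchLoginView(authenticated);
        if (!isPostBack && authenticated) {
            Object list = loginScreen.findControl("LocationList");
            return list != null ? "bound" : "missing";
        }
        return "skipped";
    }

    public static void main(String[] args) {
        // Unguarded lookup for an anonymous user: the declared name resolves to null.
        System.out.println(new SketchLoginView(false).findControl("LocationList"));
        System.out.println(setUp(false, true));
        System.out.println(setUp(false, false));
    }
}
```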

    This example quickly becomes more complex if we want to show different controls to different kinds of users. The LoginView also allows us to do this by creating many RoleGroups and associating each with a user role, as shown in Listing 6. If we also want this functionality, we must check the properties of the logged-in user (Listing 7) to determine whether a control is accessible. This adds a great deal of complexity to the plugin, and it is compounded if a user is specified in more than one LoginTemplate.

    Listing 6: ASPX with a LoginView and multiple RoleGroups [markup not recoverable; surviving text content shown]

        You can only set up your account when you are logged in.
        Location

    Listing 7: Correct way of retrieving controls in a LoginView with a RoleGroup

        LoginView LoginScreen;

        private void Page_Load(object sender, EventArgs e)
        {
            Request myRequest = getRequest();
            if (!isPostBack() && myRequest.isAuthenticated() && getUser().isInRole("Admin")) {
                DropDownList list = (DropDownList)
                    LoginScreen.FindControl("LocationList");
                list.DataSource = ...;
                list.DataBind();
            }
        }

    Page Lifecycle example

    When the ASP.NET framework receives a request for a webpage, it creates the HTML for the webpage through a series of callbacks to the plugin which represents this page. This series of potential callbacks is known as the page’s lifecycle, and it occurs every time a page is loaded or re-loaded. Misunderstanding the page lifecycle results in exceptions and unusual behavior at runtime, including null references [49], disappearing controls [54], and missing user input [5]. Several of the postings on the forum were about these issues, and responders frequently pointed these confused developers to the ASP.NET Page Lifecycle documentation [1].

    One developer posted the VB.NET code in Listing 8 to the forum and asked why he was getting a null reference exception on line 15. Three other developers responded with possible problems in the code, but each potential issue they raised turned out to be implemented correctly. Finally, the third developer found the mistake on line 1 of Listing 8.

    Sorry just noticed the event you are using! PreInit. You should be using init for this.

    You need to read the page life cycle overview http://msdn2.microsoft.com/en-us/library/ms178472.aspx

    CreateChildControls will be called on the control between these two events.

    Listing 8: Incorrect usage of the page lifecycle

        1  Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.PreInit
        2
        3      'Generate years for drop down menu
        4      Dim Dates As New Collections.Generic.List(Of System.DateTime)
        5      'Dates.Add(System.DateTime.Now)
        6
        7      If Not Me.IsPostBack Then
        8          ' Add next 5 years
        9          For i As Integer = 0 To 4
        10             Dates.Add(System.DateTime.Now.AddYears(i))
        11         Next
        12     End If
        13
        14     ' DateYear is a statically declared DropDownList
        15     Me.DateYear.DataSource = Dates
        16     Me.DateYear.DataTextField = "Year"
        17
        18     Me.DateYear.DataBind()
        19 End Sub

    In the PreInit callback, no controls are loaded yet, so the framework-injected field DateYear is still null. The Init callback guarantees that all statically declared controls exist, but they do not yet have any data. The controls are guaranteed to contain their original data in the Load callback. In several other forum postings, the users confused the Init and Load events, which results in either no data (if the developer created controls in Load, after the data loading occurred) or null references and clobbered data (if the user attempts to read or write the control’s data while in the Init callback, before data loading occurred). In addition to these three callbacks, there are eight other lifecycle events that a page goes through, and there are another four events for controls which have bound data and four more for authentication events.
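    The three callbacks discussed above can be illustrated with a toy lifecycle driver in Java. Everything here is a hypothetical model (SketchPage, SketchLifecycle, and the dateYear field stand in for the framework and its injected control), and it covers only PreInit, Init, and Load. It records what a plugin would observe about the injected field in each callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page base class; the real lifecycle has many more events.
class SketchPage {
    StringBuilder dateYear; // stands in for a framework-injected control field

    void onPreInit() {}
    void onInit() {}
    void onLoad() {}
}

class SketchLifecycle {
    static void run(SketchPage page) {
        page.onPreInit();                      // no controls exist yet
        page.dateYear = new StringBuilder();   // framework injects declared controls
        page.onInit();                         // controls exist but hold no data
        page.dateYear.append("original data"); // framework loads control data
        page.onLoad();                         // controls hold their original data
    }
}

class RecordingPage extends SketchPage {
    final List<String> log = new ArrayList<>();

    private String state() {
        if (dateYear == null) return "null";
        return dateYear.length() == 0 ? "empty" : "loaded";
    }

    @Override void onPreInit() { log.add("PreInit=" + state()); }
    @Override void onInit() { log.add("Init=" + state()); }
    @Override void onLoad() { log.add("Load=" + state()); }
}

class LifecycleDemo {
    static List<String> observe() {
        RecordingPage page = new RecordingPage();
        SketchLifecycle.run(page);
        return page.log;
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

    In this model, code that dereferences dateYear inside onPreInit would fail with a null reference, which is the mistake the forum responder identified.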

    Problem properties

    These examples, and many others we have found on the forums, demonstrate four interesting properties of collaboration constraints.

    Problem Property 1. Collaboration constraints involve multiple types and objects.

    Listing 1 from the first example referenced three objects, and Listing 2 required four objects to make the proper selection. The framework code that the DropDownList example used was located in four classes (DropDownList, ListControl, ListItemCollection, and ListItem). In the LoginView example, the correct plugin also referenced four objects: the Request object, the LoginView control, the DropDownList control, and the Page in which all this code was running and which owned the Request and the LoginView.

    Problem Property 2. Collaboration constraints are often extrinsic to a type.

    By extrinsic, I mean that the constraint is limiting the use of a type, but it is checked or defined outside of that type. By contrast, an intrinsic constraint is one which limits the class it is defined by; class invariants and single-object protocols are examples of intrinsic constraints. In the DropDownList example, while the DropDownList was the class that checked the constraint (as seen by the stack trace), the constraint itself was on the methods of ListItem. However, the ListItem class is not aware of the DropDownList class or even that it is within a ListControl at all, and therefore it should not be responsible for enforcing the constraint. Likewise, in the Page Lifecycle example, the ability to call certain methods on a Control is limited based on what callback the Page is currently in, and not on any property of the Control itself. This extrinsic nature makes the constraint difficult to check, even at runtime.

    Problem Property 3. Collaboration constraints involve semantic properties such as object identity, primitive values, state, and ordering of operations.

    Listing 9: Selecting on the wrong DropDownList

        DropDownList listA;
        DropDownList listB;

        private void Page_Load(object sender, EventArgs e)
        {
            ListItem newSel, oldSel;
            oldSel = listA.getSelectedItem();
            oldSel.setSelected(false);
            newSel = listB.getItems().findByValue("foo");
            newSel.setSelected(true);
        }

    Collaboration constraints require plugin developers to be aware of the framework’s program semantics. In the DropDownList example, the plugin developer had to be aware of which objects she was using to avoid the problem in Listing 9. In this example, the developer called the correct operations, but on the wrong objects. She also had to be aware of the primitive values (such as true or false) she used on the calls to change the selection. Finally, she had to be aware of the ordering of the operations. In Listing 2, had she swapped lines 6 and 7 with lines 8 and 9, she would have caused unexpected runtime behavior where the selection change does not occur. This behavior occurs because getSelectedItem returns the first selected ListItem that it finds in the DropDownList, and that may be the newly selected item rather than the old item.
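    The ordering pitfall can be demonstrated with a small Java model. This is a sketch, assuming hypothetical OrderItem and OrderDropDownList classes and assuming that the new item "foo" precedes the old item "bar" in list order, which is the case where the text says the selection change is lost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; not the real ASP.NET types.
class OrderItem {
    final String value;
    boolean selected;

    OrderItem(String value, boolean selected) {
        this.value = value;
        this.selected = selected;
    }
}

class OrderDropDownList {
    final List<OrderItem> items = new ArrayList<>();

    // Returns the first selected item in list order, as the text describes.
    OrderItem getSelectedItem() {
        for (OrderItem item : items) {
            if (item.selected) return item;
        }
        return null;
    }

    OrderItem findByValue(String value) {
        for (OrderItem item : items) {
            if (item.value.equals(value)) return item;
        }
        return null;
    }

    List<String> selectedValues() {
        List<String> values = new ArrayList<>();
        for (OrderItem item : items) {
            if (item.selected) values.add(item.value);
        }
        return values;
    }
}

class OrderingDemo {
    private static OrderDropDownList newList() {
        OrderDropDownList list = new OrderDropDownList();
        list.items.add(new OrderItem("foo", false)); // item we want to select
        list.items.add(new OrderItem("bar", true));  // currently selected item
        return list;
    }

    // De-select the old item first, then select the new one.
    static List<String> deselectThenSelect() {
        OrderDropDownList list = newList();
        list.getSelectedItem().selected = false;
        list.findByValue("foo").selected = true;
        return list.selectedValues();
    }

    // Swapped order: the lookup finds "foo" first and undoes the new selection.
    static List<String> selectThenDeselect() {
        OrderDropDownList list = newList();
        list.findByValue("foo").selected = true;
        list.getSelectedItem().selected = false;
        return list.selectedValues();
    }

    public static void main(String[] args) {
        System.out.println(deselectThenSelect());
        System.out.println(selectThenDeselect());
    }
}
```

    With the swapped order, no exception is raised and the old item simply stays selected, which makes this variant of the mistake harder to notice than the one that triggers the render-time check.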

    Problem Property 4. Collaboration constraints span many kinds of files and data.

    The LoginView example shows how collaboration constraints extend across different kinds of programming files. In this example, the ASPX file affected how the programmer could reference and use the objects in the C# code-behind file. The code-behind file also had to use the same strings as the ASPX file for the desired behavior to take place.

    The Page Lifecycle example contains another interesting interaction between these files. The field DateYear was not available because the framework uses dependency injection to automatically set this field for the plugin. Had the plugin set this field itself, the constraint would no longer apply. Whether or not the framework performs the dependency injection in the code-behind file is based on what controls are declared in the ASPX file.

    Revisiting frameworks

    While interesting, the examples raise two concerns about whether solving the problems described constitutes fundamental research in frameworks.

    First, while ASP.NET declares itself a framework, it does not entirely match Johnson’s definition. Johnson states that frameworks are, by definition, object-oriented constructs [34]. Yet frameworks in industry, even those that have an object-oriented basis, frequently interact with plugins through declarative files, annotations, and even aspect-oriented programming [8]. Are these still frameworks, even if they interact with plugins through non-OO mechanisms? While it is not uncommon for industry to create a buzzword by extending a term beyond its useful meaning, I argue that the ubiquity of non-OO mechanisms shows that the original definition was not adequate. In this work, I will be presenting a new set of definitions which will account for current and future industrial frameworks; my definitions and rationale for them can be found in Section 3.

    The second concern is that the designs here appear to be overly complex; readers may wonder if there exist more usable designs, without the described collaboration constraints, which still meet the goals of the system. However, I argue that the collaboration constraints are inherent to the design of frameworks. Framework designs must balance several quality attributes; they are concerned with the ease-of-reuse of the API (API usability), the potential number of plugins which can reuse the framework (reusability and extensibility), and any quality attributes specific to the framework’s domain (scalability, compatibility, reliability, etc.). These quality attributes frequently conflict with each other, and balancing them all makes collaboration constraints a natural artifact of framework designs.

    As an example, let us try to “clean up” the constraint from the DropDownList API by localizing it and encapsulating it within a type, thereby making it of little consequence to the plugin code. To make this design easier to use, and make the constraint non-existent, the DropDownList class should have maintained the selection instead of the ListItem class, and DropDownList should have contained a setSelected(ListItem) method which had the expected behavior. This would achieve the desired results as users could simply find the desired ListItem and change the selection. However, this functionality would then not be reusable by other kinds of ListControls. It could be moved up to the ListControl class itself, but not every derived class of ListControl has this single-select functionality. Therefore, the developers chose to put the functionality in the ListItem class and prioritize internal reuse above API usability.
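    The hypothetical “cleaned up” design can be sketched in Java as follows. This is not the actual ASP.NET API; AltItem and AltDropDownList are invented names, and the sketch only shows what it would look like for the list, rather than the item, to own the selection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical alternative design: the list owns the selection, the item does not.
class AltItem {
    final String value;

    AltItem(String value) { this.value = value; }
}

class AltDropDownList {
    private final List<AltItem> items = new ArrayList<>();
    private AltItem selected; // at most one selection by construction

    void add(AltItem item) { items.add(item); }

    AltItem findByValue(String value) {
        for (AltItem item : items) {
            if (item.value.equals(value)) return item;
        }
        return null;
    }

    // Replaces any previous selection; rejects items that belong to another list.
    void setSelected(AltItem item) {
        if (!items.contains(item)) {
            throw new IllegalArgumentException("item is not in this list");
        }
        selected = item;
    }

    AltItem getSelectedItem() { return selected; }
}

class AltDesignDemo {
    static String selectAfterBar(String value) {
        AltDropDownList list = new AltDropDownList();
        list.add(new AltItem("bar"));
        list.add(new AltItem("foo"));
        list.setSelected(list.findByValue("bar"));
        list.setSelected(list.findByValue(value)); // no manual de-selection needed
        return list.getSelectedItem().value;
    }

    static boolean foreignItemRejected() {
        AltDropDownList list = new AltDropDownList();
        list.add(new AltItem("foo"));
        try {
            list.setSelected(new AltItem("foo")); // same value, but not this list's item
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(selectAfterBar("foo"));
        System.out.println(foreignItemRejected());
    }
}
```

    The single-selection logic now lives in AltDropDownList, so a multi-select list control could not reuse it, which is the reuse cost the paragraph above describes.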

    While the final design choices are debatable, there were tradeoffs which the developers had to work within. The other examples had similar tradeoffs. The LoginView design had to trade off usability of the API versus the security and performance benefits of not creating the sub-controls of LoginView until absolutely necessary. Had the framework developer designed this API to make it more usable, she would have opened up security holes because attackers could also access the control without logging in. The Page Lifecycle example also traded off usability, this time for potential reusability and extensibility; without the many callbacks, either the plugin would have to perform all the framework operations itself, or it would not be able to insert new functionality between each of the framework operations.

    Collaboration constraints are a side effect of designing for reusability, extensibility, and other quality attributes rather than the usability of the API. Therefore, we may never fully solve the problem of collaboration constraints by simply having “better” designs.

    Further discussion of what frameworks are, and the difficulties in designing them, can be found in Section 3.

    Relationships to specify collaboration constraints

    Since we cannot remove collaboration constraints from our design without sacrificing desirable quality attributes, I intend to help developers by specifying these constraints so that plugins can be statically checked for conformance. Recall that the collaboration constraints are limits on how multiple objects collaborate. Therefore, I propose that we model the relationships which exist between objects as named predicates across runtime objects $\ell$.⁴

    $$\mathit{Relationship} = \mathit{Name}(\ell_1, \ldots, \ell_n)$$

    ⁴ $\ell$ is actually a static representation of a runtime object.

    We can use relationships to describe how operations affect the collaborations between objects. Consider the simple collaboration described earlier between a list and the objects within the list. We can describe the operation list.add(obj) as adding relationship Item(list, obj) and the operation list.remove(obj) as removing this relationship. Likewise, the operation list.contains(obj) tells us whether the relationship Item(list, obj) currently exists.
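    The idea of relationships as named predicates over object labels can be sketched in Java. This is only an illustration and not the FUSION implementation: Rel and RelationshipContext are hypothetical names, and labels are modeled as plain strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// A relationship instance: a name applied to object labels, e.g. Item(list, obj).
final class Rel {
    final String name;
    final List<String> labels;

    Rel(String name, String... labels) {
        this.name = name;
        this.labels = Arrays.asList(labels);
    }

    @Override public boolean equals(Object o) {
        return o instanceof Rel
            && name.equals(((Rel) o).name)
            && labels.equals(((Rel) o).labels);
    }

    @Override public int hashCode() { return Objects.hash(name, labels); }
}

// The set of relationships currently known to hold.
class RelationshipContext {
    private final Set<Rel> facts = new HashSet<>();

    void add(Rel r) { facts.add(r); }
    void remove(Rel r) { facts.remove(r); }
    boolean holds(Rel r) { return facts.contains(r); }
}

class RelationshipDemo {
    static boolean[] trace() {
        RelationshipContext ctx = new RelationshipContext();
        Rel item = new Rel("Item", "list", "obj");

        boolean before = ctx.holds(item);
        ctx.add(item);                                              // effect of list.add(obj)
        boolean during = ctx.holds(new Rel("Item", "list", "obj")); // what list.contains(obj) tells us
        ctx.remove(item);                                           // effect of list.remove(obj)
        boolean after = ctx.holds(item);
        return new boolean[] { before, during, after };
    }

    public static void main(String[] args) {
        boolean[] r = trace();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```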

    Since we have modeled relationships as predicates, we can string them together using logical connectives and describe constraints over these collaborations. For example, we can specify the constraint on LoginView.FindControl(String id) as:

    $$\mathit{Name}(\mathit{result}, \mathit{id}) \land \big(\mathit{LoggedInTemplate}(\mathit{this}, \mathit{result}) \implies \mathit{Child}(\mathit{page}, \mathit{this}) \land \mathit{Request}(\mathit{page}, \mathit{request}) \land \mathit{LoggedIn}(\mathit{request})\big)$$

    That is, there must be a control with that name, and if that control is within the LoggedInTemplate of the LoginView, then we must also know that the Request object for the Page has a user logged in.
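    To make the logical structure of this constraint concrete, the following Java sketch evaluates it against a set of known facts. It is an assumption-laden illustration rather than the FUSION analysis: facts are plain strings, and ConstraintDemo, facts, and findControlAllowed are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ConstraintDemo {
    // Builds a set of relationship facts written as plain strings.
    static Set<String> facts(String... known) {
        return new HashSet<>(Arrays.asList(known));
    }

    // Evaluates Name && (LoggedInTemplate => (Child && Request && LoggedIn)).
    static boolean findControlAllowed(Set<String> known) {
        boolean named = known.contains("Name(result, id)");
        boolean inLoggedInTemplate = known.contains("LoggedInTemplate(this, result)");
        boolean loggedInContext = known.contains("Child(page, this)")
            && known.contains("Request(page, request)")
            && known.contains("LoggedIn(request)");
        // An implication p => q is equivalent to (!p || q).
        return named && (!inLoggedInTemplate || loggedInContext);
    }

    public static void main(String[] args) {
        // Control sits in the logged-in template, but no login is known: violated.
        System.out.println(findControlAllowed(facts(
            "Name(result, id)", "LoggedInTemplate(this, result)",
            "Child(page, this)", "Request(page, request)")));
        // Same call with LoggedIn(request) established: satisfied.
        System.out.println(findControlAllowed(facts(
            "Name(result, id)", "LoggedInTemplate(this, result)",
            "Child(page, this)", "Request(page, request)", "LoggedIn(request)")));
    }
}
```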

    Further details about the specifications, including the concrete semantics and the different forms of the specifications for different interaction mechanisms, can be found in Section 4.

    Analyzing constraints cost effectively

    In addition to specifying these constraints, I have also created a static analysis that will check plugins to see if they conform to the specified collaboration constraints. Using this analysis should be less burdensome than the current alternative, which involves the developer discovering the violation himself and then posting on a help forum to find the cause. I believe an analysis will be successful if it can meet the following criteria:

    1. The analysis should direct developers to the source of the fault in the plugin code, as opposed to a code trace of where the error occurs at runtime.

    2. The analysis should minimize the amount of time that developers investigate false positives, while maximizing the number of true positives shown.

    3. The analysis should be general enough that it is more cost-effective for a framework development team to specify the collaboration constraints and reuse the analysis, rather than create a new, custom analysis tool.

    Further details about the analysis, including formal definitions of soundness and completeness, a description of three variants of this analysis, and how the analysis is affected by the presence of aliasing, can be found in Section 4.

  • 2 This thesis

    The SE PhD Page mentions this thing called a “thesis proposal”. A “thesis proposal” is a description of what research you plan to do during your second half of graduate school. After you complete this research, you will graduate.

    Graduates of ISR

    My work introduces the concept of a collaboration constraint, a state-based restriction on how multiple objects may interact. Collaboration constraints occur in many situations, but they occur frequently within software frameworks. My analysis of software frameworks shows that developers who use frameworks frequently violate these constraints and have trouble fixing violations without help from another developer.

    To help developers specify and analyze collaboration constraints, I have created a new abstraction called a relationship. I have instantiated this abstraction in FUSION (Framework Usage SpecificatIONs), a relationship-based specification language and analysis system for software frameworks written in Java and XML. FUSION will allow framework developers to specify collaboration constraints, and it will allow plugin developers to check their code for violations in a cost-effective way.

    Thesis Statement. Collaboration constraints are inherent to the design of software frameworks but are burdensome for plugin developers. These constraints can be defined by specifications that describe the relationships between objects and how relationships change, and a cost-effective static analysis can check that code conforms to the specified constraints.

    This work defines (or re-defines) three concepts and makes two general claims regarding collaboration constraints. This research also investigates seven hypotheses about frameworks and the FUSION specification and analysis system. Section 5 describes the supporting material for the definitions, the form of the argument to show the generality of the claims, and the validation process for the hypotheses.

    Definition 1 (Collaboration Constraint). A collaboration constraint is a precondition that is expressed as a predicate on abstract states of several objects.

    Notice that by combining several collaboration constraints together, a developer can describe a protocol for using multiple objects based on their abstract state.

    Definition 2 (Software Framework). A software framework is a set of reusable modules which enforce that their clients conform to a predefined architecture.

    Claim 1. Collaboration constraints are an essential part of a framework’s design.

    Definition 3 (Relationship). A relationship is a user-defined, abstract state-based association between several objects.

    Claim 2. Relationships are a practical and adoptable technique for specifying collaboration constraints.

    Hypothesis 1. Collaboration constraints are burdensome for developers who are using frameworks.

    Hypothesis 2. FUSION can specify common collaboration constraints in Java and XML frameworks.

    Hypothesis 3. FUSION specifications are incremental; the framework developer can specify a constraint without specifying irrelevant parts of the framework.

    Hypothesis 4. FUSION specifications are composable; the framework developer can add new constraints at a later time without conflicts and can reuse relationships across multiple constraints and even across frameworks.

    Hypothesis 5. FUSION specifications are more expressive and more composable than existing formal methods for specifying collaboration constraints.

    Hypothesis 6. The FUSION analysis can statically discover violations of specified constraints in plugin code, and it can do so either in a sound manner (all possible runtime violations are discovered), a complete manner (only actual runtime violations are discovered), or a hybrid manner (a well-defined point on the sound-complete continuum).

    Hypothesis 7. When run on typical industry code, the hybrid variant will report a higher proportion of true positives than either the sound variant or a combined variant that returns all complete reports plus a random sampling of sound-variant-only reports.

    Expected research contributions

    This research will make three primary contributions to research and industry:

    1. Collaboration Constraints and Frameworks. Provide a precise and useful definition of a “software framework” and show that the collaboration constraints described are essential to the nature of frameworks.

    (a) Present a clear and useful definition of frameworks and plugins, driven by industry constructs and designs rather than historical accident. The definition will not be limited to a particular design paradigm but will abstract over them in a useful manner.

    (b) Present a clear analysis of the interactions between frameworks and plugins that can be used to drive further research in the field. In particular, it will provide a taxonomy of the types of constraints that frameworks impose on plugins based on empirical evidence. This taxonomy can be used to improve error prevention techniques, including better framework design and other analyses.

    (c) Show that the collaboration constraints described are common in practice and are particularly problematic for plugin developers.

    2. Relationships and FUSION. Show that relationships can be used to specify collaboration constraints that occur in Java and XML frameworks.

    (a) Define the relationship abstraction and demonstrate its ability to specify collaboration constraints.

    (b) Present a specification language to describe collaboration constraints that occur in frameworks based on Java and XML. While there are many languages to specify either Java or XML, there are no known specification languages that can describe framework design intent which transcends both languages.

    3. FUSION Analysis. Present a cost-effective static analysis of the specifications that can detect violated collaboration constraints in plugin code.

    (a) Provide a static analysis which checks plugins for conformance to collaboration constraint specifications and directs the developers to the cause of any errors found.

    (b) Present a detailed case study on the differences between three variants of the analysis: a sound version, a complete version, and a hybrid version which is neither sound nor complete, but instead balances the tradeoffs of false positives and false negatives. The case study will detail several sources of imprecision for the static analysis, the effect of this imprecision on the three variants, and the extent to which this imprecision occurs in industry code. The expected sources of imprecision are aliasing, broken behavioral subtyping, and inter-procedural protocols, but any others will be documented as well.

    Potential Industry Applications

    I expect FUSION to be primarily used by industry professionals to specify their frameworks and assist plugin developers with finding problems. In particular, I anticipate that framework developers would adopt this tool incrementally by adding relationship specifications on an on-demand basis; when a plugin developer asks about a constraint on the forum or mailing list, the framework developers can answer the question and then add specifications for that constraint in the next release. After the next release, plugin developers would be able to run the analysis to detect violations of these constraints without any assistance from other developers.

    Many large frameworks, such as Spring and ASP.NET, have generated third-party service companies which sell developer tools and consulting services. I expect these companies would be attracted to this work as a means of increasing business; these service companies could sell specification sets and tools. As the number of constraints in a particular framework increases, I would also expect framework vendors and service companies to build more tools that take advantage of these specifications. For example, a tool that visually describes the constraints would be a useful form of documentation, as would a tool that suggests operations based on the constraints which need to be satisfied.


  • 3 Frameworks

    Framework frām werk NOUN An essential supporting structure of a building, vehicle, or object. A basic structure underlying a system, concept, or text.

    New Oxford American Dictionary

    Software frameworks were created in the object-oriented community to reuse an OO design for several applications. They have been characterized in many ways, including:

    1. “A framework is a reusable design of all or part of a system that is represented by a set of abstract classes and the way their instances interact.” [34]

    2. “A framework is a partial design and implementation for an application in a given problem domain.” [29]

    3. “Inversion of control is a common feature of frameworks...” [21]

    4. “The Hollywood Principle is a key to understanding frameworks. It lets a framework capture architectural and implementation artifacts that don’t vary, deferring the variant parts to application-specific subclasses.” [55]

    However, these descriptions do not accurately represent the “software frameworks” that are in industry use today. Many industry software frameworks have non-OO features such as dependency injection, aspects, and static configuration files. This disconnect has stifled research in the field; very little new work on software frameworks has appeared since around 2001, yet developers continue to have problems using software frameworks.

    I argue that these industry software frameworks are rightly called software frameworks, despite these unusual non-OO technologies, and that earlier definitions of software frameworks which focus on objects and inversion of control are a consequence of the original context. The object-oriented community highly values design reuse, so it is no coincidence that they were the first to investigate software frameworks. However, definitions which limit software frameworks to the object-oriented domain are too narrow for industry practice. As part of this work, I will present a revised definition of a software framework, and the associated term, a plugin.

    My view of software frameworks stems from software architecture concepts. In particular, software frameworks are not simply a module of code reuse, but a module of code that implements and enforces a software architecture. This view is shared by industry developers; the only definition I found which described frameworks in architectural terms was in the book “Software Factories”, by two Microsoft employees [25].⁵ Likewise, I will define a framework, and the associated terms, in architectural vocabulary:

    Definition 2 (Software Framework, or just Framework). A software framework is a set of reusable modules which enforce that their clients conform to a predefined architecture.

    ⁵ In this book, they say that “A framework is developed to bootstrap implementations of products based on a common architectural style.” However, this definition is not quite right as a framework is not solely about bootstrapping.

    Definition 3 (Plugin). A plugin is a module which extends a framework and works within the constraints of a framework’s defined architecture to add specific functionality.⁶

    Definition 4 (Extension point). An extension point is an API which is defined by the framework and is implemented by plugins.

    A framework is not simply a set of modules with a protocol for how to access some reusable functionality. In fact, a framework may have very little functionality; it may only be an implementation to connect plugins together. Regardless, this framework implementation encapsulates the architecture for the final system. Consider the following examples:

    • Open|SpeedShop is a framework for creating distributed dynamic analyses. It has several types of plugins: wizards set up an experiment to run, collectors gather the data, aggregators put data together, analyses run some computation on the data, and views display the results to the user. While the framework does provide some functionality, its primary purpose is connecting these plugins into a pipe-and-filter architecture. In fact, the reusable functionality it provides is handled by some built-in libraries; the framework itself just loads components and connects them together.

    • Eclipse is a framework for developer tools. Eclipse provides a mechanism for plugins to define their own extension points, so that plugins in Eclipse can also be small frameworks and have their own plugins. Eclipse loads the plugins in and then connects them together in an architecture that resembles an acyclic graph of frameworks and plugins.

    • Spring is a framework for web applications. Each web application is forced by Spring to adhere to a model-view-controller architecture. Like Open|SpeedShop, Spring provides some reusable functionality as well, but this functionality is packaged into libraries. In Spring, these libraries may also be plugins and can be replaced by other plugins.

    While each of these frameworks does have some reusable functionality, the functionality could be removed or replaced and it would still be a framework. In fact, any replaced functionality would still have to conform to the framework’s architecture. All of these frameworks also use OO designs, but the designs are not purely object-oriented. The frameworks above use static configuration files, aspects, and dependency injection heavily; objects are only a part of how they interact with plugins. Therefore I argue that a framework is not simply a set of modules with reusable, object-oriented functionality, or even a reusable object-oriented design. A framework is a set of modules which encapsulates a reusable architecture, and this architecture may contain OO designs.
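
    The idea that a framework may do little more than load plugins and connect them can be illustrated with a small sketch. It is not taken from Open|SpeedShop, Eclipse, or Spring; the names PipelineSketch, Filter, Framework, load, and run are hypothetical, and the pipe-and-filter wiring is an assumed simplification of the architecture described above.

```java
import java.util.ArrayList;
import java.util.List;

class PipelineSketch {

    // Extension point: an API defined by the framework and implemented by plugins.
    interface Filter {
        String process(String data);
    }

    // The framework has no functionality of its own; it only loads plugins
    // and connects them into a fixed pipe-and-filter architecture.
    static final class Framework {
        private final List<Filter> stages = new ArrayList<>();

        void load(Filter plugin) {
            stages.add(plugin);
        }

        // The framework, not the plugins, decides the order of the calls.
        String run(String input) {
            String data = input;
            for (Filter stage : stages) {
                data = stage.process(data);
            }
            return data;
        }
    }

    public static void main(String[] args) {
        Framework framework = new Framework();
        framework.load(data -> data.trim());  // first plugin: clean up the input
        framework.load(data -> data + "!");   // second plugin: decorate the result
        System.out.println(framework.run("  result  ")); // prints "result!"
    }
}
```

    Replacing either plugin changes what the system computes, but not the architecture that the framework imposes on it.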

    Connecting plugins

    In the original definition of frameworks, plugins connected to a framework by implementing part of an incomplete object-oriented design pattern [34]. For example, the template method design pattern is very commonly used for this purpose; the framework provides the abstract template and the plugin provides the concrete implementation for the method [24, 21].
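
    A minimal sketch of this template method connection is given here for illustration. The names TemplateMethodSketch, ReportTemplate, body, and UppercasePlugin are hypothetical and do not come from any framework discussed in this proposal.

```java
// Minimal sketch of the template method connection; all names are hypothetical.
class TemplateMethodSketch {

    // Framework side: fixes the overall algorithm and defers one step.
    abstract static class ReportTemplate {
        // Template method: the framework keeps control of the sequence.
        final String render(String input) {
            return "[begin]" + body(input) + "[end]";
        }

        // The incomplete part of the design, to be supplied by a plugin.
        protected abstract String body(String input);
    }

    // Plugin side: provides the concrete implementation.
    static class UppercasePlugin extends ReportTemplate {
        @Override
        protected String body(String input) {
            return input.toUpperCase();
        }
    }

    public static void main(String[] args) {
        ReportTemplate plugin = new UppercasePlugin();
        System.out.println(plugin.render("hello")); // prints "[begin]HELLO[end]"
    }
}
```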

    ⁶ It is interesting to notice that a plugin may be developed by the person who is composing the plugin with the framework, by a third-party, or even by the framework developer. Who develops the plugin is a separate issue from what it is.

    As noted in the earlier section, a framework, as the term is used in industry, does not communicate with a plugin through OO mechanisms alone. A sampling of some of the mechanisms used:

    • Frameworks use dependency injection to inject values into fields or internal structures of the plugin at runtime. This is frequently implemented through meta-data and reflection.

    • Similarly, frameworks use aspects to inject computation to occur before and after plugin computation occurs.

    • Many frameworks use static configuration files for several purposes, including as an architectural description of how to deploy components and as an input to an interpreter that translates the file into another format.

    I will describe the collection of these techniques, including, but not limited to the ones above, as framework interaction mechanisms.

    Definition 5 (Framework interaction mechanism). A framework interaction mechanism is a technology used to manage connections, either static or dynamic, between a framework and a plugin.⁷
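
    As one concrete instance of a framework interaction mechanism, dependency injection through meta-data and reflection can be sketched as follows. This is an illustration under assumptions, not the mechanism of any particular framework: the @Injected annotation, GreeterPlugin, and inject are hypothetical names, and only the standard java.lang.reflect API is used.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class InjectionSketch {

    // Hypothetical meta-data: marks a plugin field that the framework fills in.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Injected {
        String value(); // key of the value to inject
    }

    // Hypothetical plugin: it never assigns 'greeting' itself.
    static class GreeterPlugin {
        @Injected("greeting")
        private String greeting;

        String greet(String name) {
            return greeting + ", " + name;
        }
    }

    // Hypothetical framework code: finds annotated fields by reflection
    // and injects the configured values at run time.
    static void inject(Object plugin, Map<String, Object> values) {
        for (Field field : plugin.getClass().getDeclaredFields()) {
            Injected marker = field.getAnnotation(Injected.class);
            if (marker == null || !values.containsKey(marker.value())) {
                continue;
            }
            try {
                field.setAccessible(true); // the field is private to the plugin
                field.set(plugin, values.get(marker.value()));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot inject " + field.getName(), e);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("greeting", "Hello");
        GreeterPlugin plugin = new GreeterPlugin();
        inject(plugin, values);
        System.out.println(plugin.greet("world")); // prints "Hello, world"
    }
}
```

    Nothing in the plugin's Java code shows where the value comes from, which is one reason such connections are hard to check with purely object-oriented reasoning.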

    Collaboration constraints in frameworks

    As discussed previously, framework designs are difficult to use. This is because these designs are attempting to balance an impossible tradeoff between the ease of using the framework and the number of potential plugins that can use the framework. Consider a simple framework which has a single extension point with a single method. The framework can be used by any plugin which implements this extension point. The framework will then call through this API at the appropriate time, thus allowing the plugin to extend the framework in a structured way. The framework limits what the plugin can extend since it only provides one method, so many customizations are not possible in this scenario.

    If we want the framework to be usable by more plugins, then the API to the framework must be wider, more open, and more fine-grained. These properties increase extensibility of the framework, but also decrease the ease of use.

    • Widening the API means the framework provides more extension points where plugins may add customized code. Thus, the framework supports a wider variety of plugins. However, when an API is wide, the plugin developer may create multiple interacting plugins, and the plugins will have to be consistent with each other. This is more difficult to check than self-consistency, and it is particularly difficult when plugins are developed separately.

    • An open API is one where plugins have access to internal data structures which are normally protected; this access allows plugins to change the state of the framework in ways that might have been unanticipated by the framework developer. An open API has many problems; if the framework developer allows plugins to access internal data structures, then the onus is on the plugin developer to understand and maintain internal invariants of the framework.

    • A fine-grained API allows plugins to add functionality at many stages of a single extension point. Instead of a single method, the extension point may have many methods that get called in a particular sequence, and some may be called even when internal data structures are not yet fully initialized. A fine-grained API provides significant extensibility benefits, but it means plugins must be aware of the protocol and calling context to ensure they do not access invalid framework data or invalidate their own data.

    Each of these problems results in a collaboration constraint. There are constraints on how the plugin collaborates with internal elements, with external elements, and with other plugins as well.

    ⁷ This definition is intentionally vague as I am still investigating the properties of interaction mechanisms.
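
    The cost of a fine-grained API can be seen in a small sketch. It is loosely modeled on the page lifecycle example (controls do not exist in PreInit, exist without data in Init, and are loaded in Load), but the class and method names are hypothetical and the framework driver is an assumed simplification.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleSketch {

    // Fine-grained extension point: three callbacks instead of one method.
    abstract static class Page {
        protected List<String> controls; // null until the framework creates them

        protected void onPreInit() { }
        protected void onInit() { }
        protected void onLoad() { }

        // Framework driver: calls the callbacks in a fixed sequence.
        final void runLifecycle() {
            onPreInit();                   // controls do not exist yet
            controls = new ArrayList<>();
            onInit();                      // controls exist but hold no data
            controls.add("item loaded by the framework");
            onLoad();                      // controls are loaded with data
        }
    }

    // A plugin that respects the protocol.
    static class GoodPage extends Page {
        int itemsSeen = -1;

        @Override
        protected void onLoad() {
            itemsSeen = controls.size();
        }
    }

    // A plugin that violates the constraint by using controls too early.
    static class BadPage extends Page {
        @Override
        protected void onPreInit() {
            controls.clear(); // NullPointerException: controls do not exist yet
        }
    }

    public static void main(String[] args) {
        GoodPage good = new GoodPage();
        good.runLifecycle();
        System.out.println("good plugin saw " + good.itemsSeen + " item(s)");
        try {
            new BadPage().runLifecycle();
        } catch (NullPointerException e) {
            System.out.println("bad plugin failed at run time");
        }
    }
}
```

    Both plugins compile; the violation only shows up when the framework runs the callbacks, which is the situation a static check of collaboration constraints is meant to catch earlier.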


  • 4 FUSION: A Relationship-based Specification Language

    Fusion fyoō zhen NOUN The process or result of joining two or more things together to form a single entity

    New Oxford American Dictionary

    A relationship is a developer-defined, design-level association between several objects. For example, most developers think of a relationship between a data structure and each item within the data structure. The concrete connection between these objects may go through several other objects in the heap, but developers still think of them as related directly. Relationships are a form of design intent, and they represent only the abstract connection between several objects, not a concrete connection.

    Formally, a relationship is a user-named predicate across runtime objects $\ell$:

    $$\mathit{Relationship} = \mathit{Name}(\ell_1, \ldots, \ell_n)$$

    Relationships are typed based on their name, arity, and the type of the objects. A relationship type is therefore a named predicate across types $\tau$:

    $$\mathit{Relationship\ Type} = \mathit{Name}(\tau_1, \ldots, \tau_n)$$
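
    The two definitions can be mirrored in code as a rough illustration. The sketch below is not FUSION's implementation; the class names and the of method are hypothetical, and comparing objects by identity is an assumption based on relationships being predicates over runtime objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RelationshipSketch {

    // Relationship Type = Name(t1, ..., tn)
    static final class RelationshipType {
        final String name;
        final List<Class<?>> parameterTypes;

        RelationshipType(String name, Class<?>... parameterTypes) {
            this.name = name;
            this.parameterTypes = Arrays.asList(parameterTypes);
        }

        // Builds Name(l1, ..., ln) and rejects instances that are not well-typed.
        Relationship of(Object... objects) {
            if (objects.length != parameterTypes.size()) {
                throw new IllegalArgumentException(
                        name + " relates " + parameterTypes.size() + " objects");
            }
            for (int i = 0; i < objects.length; i++) {
                if (!parameterTypes.get(i).isInstance(objects[i])) {
                    throw new IllegalArgumentException(
                            "object " + i + " has the wrong type for " + name);
                }
            }
            return new Relationship(this, objects);
        }
    }

    // Relationship = Name(l1, ..., ln); objects are compared by identity.
    static final class Relationship {
        final RelationshipType type;
        private final Object[] objects;

        Relationship(RelationshipType type, Object[] objects) {
            this.type = type;
            this.objects = objects.clone();
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Relationship)) return false;
            Relationship that = (Relationship) other;
            if (type != that.type || objects.length != that.objects.length) return false;
            for (int i = 0; i < objects.length; i++) {
                if (objects[i] != that.objects[i]) return false;
            }
            return true;
        }

        @Override
        public int hashCode() {
            int hash = System.identityHashCode(type);
            for (Object object : objects) {
                hash = 31 * hash + System.identityHashCode(object);
            }
            return hash;
        }
    }

    public static void main(String[] args) {
        RelationshipType item = new RelationshipType("Item", Object.class, List.class);
        List<String> list = new ArrayList<>();
        Object first = new Object();
        System.out.println(item.of(first, list).equals(item.of(first, list)));        // true
        System.out.println(item.of(new Object(), list).equals(item.of(first, list))); // false
    }
}
```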

    By connecting relationships together using logic, we can specify framework constraints. The relationship abstraction is useful because it handles all the properties of collaboration constraints, as defined earlier in Section 1.

    1. Collaboration constraints involve multiple types and objects. Relationships describe the state across multiple objects.

    2. Collaboration constraints are often extrinsic to a type. Relationships are not owned by any particular type, so crossing a type boundary is not an obstacle.

    3. Collaboration constraints involve semantic properties such as object identity, primitive values, state, and ordering of operations. Relationships refer to object identity directly and can refer to primitives as well. As relationships capture a programmer’s design intent, it is straightforward to capture the semantics of a collaboration constraint.⁸

    4. Collaboration constraints span many kinds of files and data. Relationships are not a language-specific abstraction; they are a design abstraction. Any language with the concept of distinct entities and collaborations between entities can use relationships to describe the collaborations.

    Additionally, relationships are a natural abstraction for developers because they are already implicit in the program knowledge. Consider a program which is making function calls to some API. As this program calls functions, it is implicitly changing the relationships between objects. For example, after a program calls the function List.add(Object item), it knows that there is a relationship between the item and the list. If the functions are returning data, the program can test the output to discover more relationships; a function call to List.contains(Object item) will tell the caller whether there is a relationship between the item and the list.

    ⁸ Specific examples will be shown later.

    Relationships occur not only from the caller’s perspective, but from the callee’s as well. When a function is called, it receives some tacit knowledge about the state of its parameters and the context it was called in. Recall that when an ASP.NET plugin receives a PreInit callback, it receives tacit information regarding the state of its controls and the data within those controls.

    Even declarative programs imply relationships based upon their structure. In the example ASP.NET plugin that used a LoginView control, it assumed knowledge of a DropDownList inside of the LoginView. The relationship between these two objects was not retrieved through any C#-based mechanism; it was declared within the ASPX file.

    The knowledge about relationships at a given program point is modeled by the relationship context. This context maps relationships to their known state. If a developer has no knowledge, then the relationship’s state is unknown. The relationship context provides a model by which we can store and query relationships.

    $$\mathit{Relationship\ Context} = \{\mathit{Relationship} \mapsto t \mid t \in \{\mathit{True}, \mathit{False}, \mathit{Unknown}\} \wedge \mathit{Relationship} \text{ is well-typed}\}$$
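
    A relationship context of this shape can be sketched as a map with a three-valued range. This is an illustration only: the names Truth, RelationshipContext, set, and query are hypothetical, and keying relationships by a printed form such as "Item(item, list)" is an assumption made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

class ContextSketch {

    enum Truth { TRUE, FALSE, UNKNOWN }

    // Relationship context: maps relationships to their known state.
    // The string key is an assumed encoding of a well-typed relationship.
    static final class RelationshipContext {
        private final Map<String, Truth> known = new HashMap<>();

        void set(String relationship, Truth state) {
            known.put(relationship, state);
        }

        // Anything the developer has no knowledge about is Unknown.
        Truth query(String relationship) {
            return known.getOrDefault(relationship, Truth.UNKNOWN);
        }
    }

    public static void main(String[] args) {
        RelationshipContext ctx = new RelationshipContext();
        System.out.println(ctx.query("Item(item, list)")); // UNKNOWN before any call
        ctx.set("Item(item, list)", Truth.TRUE);           // after list.add(item)
        System.out.println(ctx.query("Item(item, list)")); // TRUE
        ctx.set("Item(item, list)", Truth.FALSE);          // after list.remove(item)
        System.out.println(ctx.query("Item(item, list)")); // FALSE
    }
}
```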

    FUSION Specifications

    To show how to use relationships to specify and check collaboration constraints, I have created FUSION (Framework Usage SpecificatIONs). FUSION is a cost-effective, relationship-based system to specify and check collaboration constraints in Java and XML frameworks. The primary purpose of FUSION is to verify that a plugin does not violate any collaboration constraints of the framework. These constraints are specified by the framework developer; the plugin developer writes no specifications. The FUSION analysis then reads the specifications and verifies that plugin code does not break any collaboration constraints. The analysis assumes that the framework code is correct with respect to the specifications. Since the framework developer writes specifications that are shared by all the plugin developers, the system is highly cost-effective. A small effort on behalf of a few people assists the larger community.

    The first half of this section will describe the FUSION specifications, and the second half will describe a cost-effective static analysis to check that plugins conform to these specifications. The specifications of FUSION come in five forms, which are described in detail in this section:⁹

    1. Relationship effects capture changes to relationships between multiple objects based on the framework operations used by a plugin. (In the abstract description above, this is the tacit knowledge obtained by the caller of a function.)

    2. Callback starting states capture the relationships which are known when the framework passes control to the plugin. (This represents tacit knowledge obtained by the callee of a function.)

    3. Constraints describe limits on the collaborations between objects.

    4. Inferred Relationships describe implicit, dynamic relationship effects which can be assumed when some predicate is true.

    5. Configuration Relationships capture the relationships which are known based on the contents of an XML file.

    ⁹ These five have been or will be implemented during this thesis, but this does not preclude the ability to have other specifications for other framework artifacts, as described above.

    Listing 10: Item relationship type

        @RelationshipType({Object.class, List.class})
        public @interface Item {
            String[] value();
            Effect effect(); // Effect is an enum with options ADD, REMOVE, and TEST
            String test() default "";
        }

    Relationship effects. Relationship effects specify the tacit knowledge of the plugin after calling a framework method. Consider a framework developer who is specifying a typical List interface where objects in the list are expected to be in an Item relationship with the list. The framework developer can specify that the method List.add(Object item) has the effect of creating an Item relationship between the item and the list (also known as the target object, or this). Similarly, calling List.remove(Object item) removes the Item relationship between the item and the target object. The plugin can even test the state of this relationship by calling List.contains(Object item) to determine whether there exists an Item relationship between these objects.

    Relationship effects are specified as Java 5 Annotations. The framework developer must first define an annotation type for each relationship type. The definition for the Item relationship type is in Listing 10. The meta-annotation @RelationshipType defines the types which Item works on, in this example, Object and List. The three parameters are required for all relationship types, and will be explained as they are used.

    As relationship types are defined by the framework developer, they have no predefined semantics.¹⁰ Any hierarchy or ownership present, such as Item, is implicit and is only defined by the framework developer. In fact, relationships do not have to reflect any reference paths found in the heap, but may exist only as an abstraction of design intent to the developer. This allows relationships to be treated as an abstraction independent from code. This is a common specification paradigm; relationships have a similar purpose to the model fields in JML specifications [36]. In particular, this means relationships can be used across different types of files (like Java and XML) or across different frameworks (like Spring and Hibernate).

    Once the developer has defined a relationship type, she can annotate methods to show relationship effects. Listing 11 shows the relationship effects for the simple List example. To add or remove a relationship, the developer specifies the objects within the relationship (the value parameter in Listing 10) and the effect desired (the effect parameter in Listing 10). To test the state of a relationship, the developer uses the TEST effect and provides a value for the third parameter. This must be a boolean value which is true if the relationship is added and false if it is removed.

    Relationship effects may refer to any variables used by the specified operation. In the case of method calls, relationships can refer to the parameters, the target of the method call or field access (designated with the name target), and the returned object (designated with result). Relationship effects may also refer to types and primitive values. Finally, parameters can be wild-carded, so Item({"*", "list"}, REMOVE) removes all the Item relationships between list and any other object; this is especially useful to place on methods such as List.clear(), as shown in Listing 11. An example of these relationship effects on the ListControl API can be found in Listing 12; this API uses all three of the effects described and uses wildcards.

    ¹⁰ In the implementation, I have provided pre-defined semantics for the identity relation (==) and the type relation (instanceof) for usability purposes.

    Listing 11: Relationship effects on List

        public interface List {
            @Item({"item", "target"}, ADD)
            public void add(Object item);

            @Item({"item", "target"}, REMOVE)
            public void remove(Object item);

            @Item({"item", "target"}, TEST, "result")
            public boolean contains(Object item);

            @Item({"*", "target"}, REMOVE)
            public void clear();
        }

    Relationship effects allow tools to build up the relationship context at compile time. Listing 13 shows a snippet from a plugin, along with the current relationships after each instruction. For example, after line 4 in Listing 13, we apply the effects declared in Listing 12, lines 7-9. Therefore, at line 5, we learn the two new relationships shown. The relationship context will be used later to evaluate collaboration constraints.
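
    How effects build up a context can be sketched by replaying the first instructions of Listing 13 by hand. This is not the FUSION analysis: the names EffectSketch, Context, apply, applyTest, and lookup are hypothetical, relationships are keyed by variable names, and the wildcard only updates relationships that are already tracked, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EffectSketch {

    enum Effect { ADD, REMOVE }

    static final class Context {
        // Known relationships, keyed by [name, arg1, ..., argn].
        // A missing key means the state of that relationship is unknown.
        private final Map<List<String>, Boolean> facts = new LinkedHashMap<>();

        // ADD makes the relationship true; REMOVE makes it false.
        void apply(String name, Effect effect, String... args) {
            boolean value = (effect == Effect.ADD);
            List<String> pattern = key(name, args);
            if (!pattern.contains("*")) {
                facts.put(pattern, value);
                return;
            }
            // Wildcard: update every tracked relationship that matches.
            for (Map.Entry<List<String>, Boolean> fact : facts.entrySet()) {
                if (matches(pattern, fact.getKey())) {
                    fact.setValue(value);
                }
            }
        }

        // TEST: the relationship takes the value of a boolean that is known.
        void applyTest(String name, boolean testValue, String... args) {
            facts.put(key(name, args), testValue);
        }

        // Returns TRUE, FALSE, or null when nothing is known.
        Boolean lookup(String name, String... args) {
            return facts.get(key(name, args));
        }

        private static List<String> key(String name, String[] args) {
            List<String> parts = new ArrayList<>();
            parts.add(name);
            parts.addAll(Arrays.asList(args));
            return parts;
        }

        private static boolean matches(List<String> pattern, List<String> fact) {
            if (pattern.size() != fact.size()) {
                return false;
            }
            for (int i = 0; i < pattern.size(); i++) {
                if (!pattern.get(i).equals("*") && !pattern.get(i).equals(fact.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        // oldSel = ddl.getSelectedItem();
        ctx.apply("Child", Effect.ADD, "oldSel", "ddl");
        ctx.apply("Selected", Effect.ADD, "oldSel");
        // oldSel.setSelected(false);
        ctx.applyTest("Selected", false, "oldSel");
        System.out.println(ctx.lookup("Child", "oldSel", "ddl")); // true
        System.out.println(ctx.lookup("Selected", "oldSel"));     // false
        System.out.println(ctx.lookup("List", "coll", "ddl"));    // null (unknown)
    }
}
```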

    Callback states. While relationship effects provide information to a caller, callback states provide information to a callee. When frameworks make callbacks into plugin code, there is an implicit contract regarding when the callback will occur and the states of objects at this point. For example, at the start of a Load callback in the Page Lifecycle example from Section 1, the plugin should be aware that the Page’s controls are loaded with data. However, in the Init callback, they exist but are not loaded with data yet, and in the PreInit callback, the controls do not even exist yet.

    The framework developer specifies this using the @State annotation. The @State annotation takes the name of a unary relation on the type of the target object.¹¹ An example of using this annotation for the Page example can be seen in Listing 14.

    These annotations describe the initial relationship context within a method, before any operations occur. In the beginning of a Page Init method, the relationship context would be set to:

    $$\mathit{Initialized}(\mathit{target}) \mapsto \mathit{True},\quad \mathit{PreInit}(\mathit{target}) \mapsto \mathit{False},\quad \mathit{Loaded}(\mathit{target}) \mapsto \mathit{False}$$

    Constraints. Framework developers can specify constraints in propositional logic over relationships. They are written as class-level annotations, but as constraints are extrinsic, they can constrain the operations on any other classes. As the three examples of constraints on the DropDownList class in Listing 15 show, a constraint has four parts:

    1. operation: This is a signature of an operation to be constrained, such as a method call, con-structor call, or even a tag signaling the end of a method. Notice that this may be an oper-ation of another class, as in the first constraint in Listing 15. This makes constraints more

    11The @State annotation is very similar to typestate, and indeed, typestate can be used instead of having a statedeclaration in this form.

    21

  • Listing 12: Partial ListControl API with Relation annotations. Effects are applied in the order they aredefined.1 public class ListControl {2 @List({‘‘result’’, ‘‘target’’}, ADD)3 public ListItemCollection getItems();4

    5 //After this call, we know two pieces of information.6 //The returned item is selected, and it is a child of this7 @Child({‘‘result’’, ‘‘target’’}, ADD)8 @Selected({‘‘result’’}, ADD)9 public ListItem getSelectedItem();

    10 }

    11

    12 public class ListItem {13 //if the return is true, then we know we have a selected item14 //if it is false, we know it was not selected.15 @Selected({‘‘target’’}, TEST, ‘‘return’’)16 public boolean isSelected();17

    18 @Selected({‘‘target’’}, TEST, select)19 public void setSelected(boolean select);20

    21 @Text({‘‘result’’, ‘‘target’’}, ADD)22 public String getText();23

    24 //When we call setText, remove any previous Text relationships,25 //then add one for text26 @Text({‘‘∗’’, ‘‘target’’}, REMOVE)27 @Text({‘‘text’’, ‘‘target’’}, ADD)28 public void setText(String text);29 }

    30

    31 public class ListItemCollection32 @Item({‘‘item’’, ‘‘target’’}, REMOVE)33 public void remove(ListItem item);34

    35 @Item({‘‘item’’, ‘‘target’’}, ADD)36 public void add(ListItem item);37

    38 @Item({‘‘item’’, ‘‘this’’}, TEST, ‘‘result’’)39 public boolean contains(ListItem item);40

    41 @Item({‘‘item’’, ‘‘target’’}, ADD)42 @Text({‘‘text’’, ‘‘result’’}, ADD)43 public ListItem findByText(String text);44

    45 //if we had any items before this, remove them after this call46 @Item({‘‘∗’’, ‘‘target’’}, REMOVE)47 public void clear();48 }

    expressible that a class or protocol invariant.

    2. trigger predicate: This is a logical predicate over relationships. The plugin’s relationship con-text must determine that this predicate holds for this constraint to be triggered. If not, theconstraint is ignored. While operation provides a syntactic trigger for the constraint, trig-ger provides the semantic trigger. The combination of both a syntactic and semantic triggerallows constraints to be more flexible and expressible than many existing protocol-basedsolutions.

    22

  • Listing 13: Comments showing how the relationship context changes after each instruction1 DropDownList ddl = ...;

    2 ListItemCollection coll;

    3 ListItem newSel, oldSel;

    4 oldSel = ddl.getSelectedItem();

    5 //Child(oldSel, ddl), Selected(oldSel)6 oldSel.setSelected(false);7 //Child(oldSel, ddl), !Selected(oldSel)8 coll = ddl.getItems();

    9 //Child(oldSel, ddl), !Selected(oldSel), List(coll, ddl)10 newSel = coll.findByText("foo");

    11 //Child(oldSel, ddl), !Selected(oldSel), List(coll, ddl),12 //Item(newSel, coll), Text(”foo”, newSel)

    Listing 14: Callback specifications

    public class Page {
      @State("PreInit")
      protected void Page_PreInit(object sender, EventArgs e);

      @State("Initialized")
      protected void Page_Init(object sender, EventArgs e);

      @State("Loaded")
      protected void Page_Load(object sender, EventArgs e);
    }

    3. requires predicate: This is another logical predicate over relationships. If the constraint is triggered, then this predicate must be true under the current relationship context. If the requires predicate is not true, this is a broken constraint and the analysis should report a fault in the plugin.

    4. effect list: This is a list of relationship effects. If the constraint is triggered, these effects will be applied to the relationship context. They will be applied regardless of the state of the requires predicate.
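The effect list updates the relationship context, so it may help to see one possible model of that context. The sketch below is a hypothetical illustration, not the FUSION implementation: the class `RelationshipContext` and its methods are invented names, and a relationship that is absent from the map is treated as unknown. The `listing13Trace` method replays the ADD and REMOVE effects implied by the comments in Listing 13.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a relationship context; not the FUSION implementation. */
class RelationshipContext {
    // Maps a relationship such as [Child, oldSel, ddl] to true or false.
    // A relationship that is absent from the map is unknown.
    private final Map<List<String>, Boolean> facts = new LinkedHashMap<>();

    private static List<String> key(String relation, String... objects) {
        List<String> k = new ArrayList<>();
        k.add(relation);
        k.addAll(Arrays.asList(objects));
        return k;
    }

    /** ADD effect: the relationship is now known to hold. */
    void add(String relation, String... objects) {
        facts.put(key(relation, objects), true);
    }

    /** REMOVE effect: the relationship is now known not to hold; "*" matches any object. */
    void remove(String relation, String... pattern) {
        if (!Arrays.asList(pattern).contains("*")) {
            facts.put(key(relation, pattern), false);
            return;
        }
        for (Map.Entry<List<String>, Boolean> e : facts.entrySet()) {
            if (matches(e.getKey(), relation, pattern)) {
                e.setValue(false);
            }
        }
    }

    /** Returns Boolean.TRUE, Boolean.FALSE, or null when the relationship is unknown. */
    Boolean lookup(String relation, String... objects) {
        return facts.get(key(relation, objects));
    }

    private static boolean matches(List<String> k, String relation, String[] pattern) {
        if (!k.get(0).equals(relation) || k.size() != pattern.length + 1) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (!pattern[i].equals("*") && !pattern[i].equals(k.get(i + 1))) {
                return false;
            }
        }
        return true;
    }

    /** Replays the context changes shown in the comments of Listing 13. */
    static RelationshipContext listing13Trace() {
        RelationshipContext ctx = new RelationshipContext();
        ctx.add("Child", "oldSel", "ddl");   // after ddl.getSelectedItem()
        ctx.add("Selected", "oldSel");
        ctx.remove("Selected", "oldSel");    // after oldSel.setSelected(false)
        ctx.add("List", "coll", "ddl");      // after ddl.getItems()
        ctx.add("Item", "newSel", "coll");   // after coll.findByText("foo")
        ctx.add("Text", "foo", "newSel");
        return ctx;
    }
}
```

Keeping REMOVE as an explicit false entry, rather than deleting the key, is what lets the model distinguish "known not to hold" from "unknown", which matters once predicates are evaluated in three-valued logic.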

    In the first @Constraint annotation at the top of Listing 15, the constraint is checking that at every call to ListItem.setSelected(boolean), if the relationship context shows that the argument is false, the target is a Child of a ListControl, and if that ListControl is a DropDownList, then it must also indicate that the ListItem is Selected. Additionally, the context will change so that the DropDownList is not CorrectlySelected. The second constraint is similar to the first and it enforces proper selection of ListItems in a DropDownList. The third constraint ensures that the method does not end in an improper state by utilizing a special “end-of-method” operation to trigger when a plugin callback is about to end.

    The constraints in Listing 15 use three pre-defined relationship types: typing, object identity, and boolean equality. The FUSION analysis does not track these relationships; it instead depends on existing, third-party analyses to support these relationship types. Other pre-defined predicates, such as ones for integer values, can be used if an appropriate analysis is provided to determine the truth of these predicates.
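For readers wondering how several @Constraint annotations can sit on one class, as they do in Listing 15, the following is a minimal sketch of how such an annotation type could be declared in plain Java. The declarations are assumptions made for illustration (the names `ConstraintList`, `DropDownListSpec`, and `ConstraintReader` are hypothetical, and the source does not show FUSION's actual declarations); only the four element names op, trigger, requires, and effect come from the text.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical declaration carrying the four parts of a constraint. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Repeatable(ConstraintList.class)
@interface Constraint {
    String op();        // syntactic trigger
    String trigger();   // semantic trigger
    String requires();  // predicate that must hold when triggered
    String[] effect();  // relationship effects applied when triggered
}

/** Container type that Java requires for a repeatable annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ConstraintList {
    Constraint[] value();
}

/** Stand-in class annotated with two of the constraints from Listing 15. */
@Constraint(
    op = "ListItem.setSelected(boolean select)",
    trigger = "select == true and Child(target, ctrl) and "
            + "ctrl instanceof DropDownList",
    requires = "!CorrectlySelected(ctrl)",
    effect = {"CorrectlySelected(ctrl)"}
)
@Constraint(
    op = "end-of-method",
    trigger = "ctrl instanceof DropDownList",
    requires = "CorrectlySelected(ctrl)",
    effect = {}
)
class DropDownListSpec {
}

class ConstraintReader {
    /** Lists "op requires predicate" for each constraint declared on a class. */
    static List<String> summarize(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (Constraint c : type.getAnnotationsByType(Constraint.class)) {
            lines.add(c.op() + " requires " + c.requires());
        }
        return lines;
    }
}
```

A static analysis would normally read these annotations from source or bytecode rather than through reflection; reflection is used here only to keep the sketch self-contained.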

    Listing 15: DropDownList Selection Constraints and Inferred Relationships

    @Constraint(
      op="ListItem.setSelected(boolean select)",
      trigger="select == false and Child(target, ctrl) and " +
              "ctrl instanceof DropDownList",
      requires="Selected(target)",
      effect={"!CorrectlySelected(ctrl)"}
    )

    @Constraint(
      op="ListItem.setSelected(boolean select)",
      trigger="select == true and Child(target, ctrl) and " +
              "ctrl instanceof DropDownList",
      requires="!CorrectlySelected(ctrl)",
      effect={"CorrectlySelected(ctrl)"}
    )

    @Constraint(
      op="end-of-method",
      trigger="ctrl instanceof DropDownList",
      requires="CorrectlySelected(ctrl)",
      effect={}
    )
    public class DropDownList {...}

    Inferred relationships. In some cases, the relationships between objects are implicit. Consider the ListItemCollection from the DropDownList example. In this example, the framework developer would like to state that items in this list are in a Child relation with the ListControl parent. However, it does not make sense to annotate the ListItemCollection class with this information since ListItemCollections should not know about ListControls.

    Inferred relationships describe these implicit relationships that can be assumed any time some other relationship predicate is true. Listing 16 shows an example for inferring a Child relationship based on the relations Item and List. Whenever the relationship context can show that the trigger predicate is true, it can infer the relationship effects in the infer list. Inferred relationships allow the framework developer to specify relationship effects that would otherwise have to be placed on every location that the predicate is true; this would significantly drive up the cost of adding these specifications. Inferred relationships are therefore a kind of shortcut to make the specifications more adoptable.

    It is possible to produce inferred relationships that directly conflict with the relationship context. To prevent this, the semantics of inferred relationships is that they are ignored in the case of a conflict; that is, relationships from declared relationship effects, callback states, and constraints have a higher precedence. The rationale behind this is that relationship effects, callback states, and constraints are explicitly declared, and this should be reflected by giving them precedence. Additionally, the inferred relationships are only used on an as-needed basis; to generate all possible inferred relations would be expensive for the analysis. An alternative mechanism would be to signal an error, though it is not currently clear whether this would increase the number of false positives.
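The precedence rule and the as-needed evaluation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the FUSION semantics: it hard-codes the single rule from Listing 16 (List(list, ctrl) and Item(item, list) infers Child(item, ctrl)), supports only binary relations, and uses invented names (`InferringContext`, `declare`, `child`).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical context that infers Child on demand and lets declared facts win. */
class InferringContext {
    // Explicitly declared binary relationships, e.g. [List, coll, ddl] -> true.
    private final Map<List<String>, Boolean> declared = new HashMap<>();

    void declare(String relation, String first, String second, boolean holds) {
        declared.put(Arrays.asList(relation, first, second), holds);
    }

    /**
     * Looks up Child(item, ctrl). A declared value always takes precedence;
     * otherwise the rule "List(list, ctrl) and Item(item, list)" is tried.
     * Returns null when nothing is declared and nothing can be inferred.
     */
    Boolean child(String item, String ctrl) {
        Boolean explicit = declared.get(Arrays.asList("Child", item, ctrl));
        if (explicit != null) {
            return explicit; // an inferred value is ignored in case of conflict
        }
        for (Map.Entry<List<String>, Boolean> e : declared.entrySet()) {
            List<String> k = e.getKey();
            boolean isListOfCtrl = k.get(0).equals("List") && k.get(2).equals(ctrl) && e.getValue();
            if (isListOfCtrl
                    && Boolean.TRUE.equals(declared.get(Arrays.asList("Item", item, k.get(1))))) {
                return Boolean.TRUE; // inferred only now, when it was asked for
            }
        }
        return null;
    }
}
```

Because nothing is materialized until `child` is called, the sketch never enumerates all inferable relationships, which mirrors the cost argument in the paragraph above.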

    Configuration Relationships. In addition to Java, FUSION also treats XML files as specifiable and analyzable artifacts. Declarative code, such as XML files, can be thought of as a static representation of the relationships which exist throughout the lifecycle of the plugin. All that is needed is a way to pull these relationships out of their current form and put them into a relationship form.


    Listing 16: DropDownList Inferred Relationships

    @Infer(
      trigger="List(list, ctrl) and Item(item, list)",
      infer={"Child(item, ctrl)"}
    )
    public class DropDownList {...}

    To retrieve these relationships from XML-based files, the framework developer will write XQuery, a query language for querying XML as a database. Each XQuery statement will query the XML for a particular relationship type and will return a set of relationships. Like the callback state specifications, these relationships will be input into the relationship context.

    I am still researching this specification form, and an analysis of the problem of retrieving relationships from declarative files was published [32]. Listing 17 provides some example XQuery to retrieve Child relationships. I anticipate that a common library will define XQuery functions such as isSubtype and type, as the concept of types, as relating to other code artifacts, is not built into XML. There is still work to do to determine how to bind the object labels created in these relationships to the variables in other program artifacts.

    Listing 17: XQuery for several relationship types

    for ctrl in doc(file)/, sub in ctrl/
    where isSubtype(type(ctrl), "System.Web.UI.Control") and isSubtype(type(ctrl), "System.Web.UI.Control")
    return

    for ctrl in doc(file)/, sub in ctrl/AnonymousTemplate/
    where isSubtype(type(ctrl), "System.Web.UI.LoginView") and isSubtype(type(ctrl), "System.Web.UI.Control")
    return

    for ctrl in doc(file)/, sub in ctrl/LoggedInTemplate/
    where isSubtype(type(ctrl), "System.Web.UI.LoginView") and isSubtype(type(ctrl), "System.Web.UI.Control")
    return

    Analysis

    The FUSION analysis will analyze plugin code to make sure it conforms to the specified constraints. This is a modular, branch-sensitive, forward dataflow analysis¹². It is implemented as a plugin to the Crystal static analysis framework [45, 13]. Additionally, there are three variants of this analysis: a sound variant, a complete variant, and a hybrid variant which attempts to balance false positives and false negatives. The changes between the variants are surprisingly minor, as discussed later in this section.

    ¹² By branch-sensitive, we mean that the true and false branches of a conditional may receive different lattice information depending upon the condition. The transfer function on the condition is called twice, once assuming that the result is false, and once assuming that it is true. This is not a path-sensitive analysis; the branch condition is not saved for use after the branches merge together.
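Branch sensitivity, as opposed to path sensitivity, can be illustrated with a small sketch. It is an assumption-laden toy rather than the Crystal or FUSION API: the lattice is a map from a relationship name to a three-valued element, and the names `Tri`, `BranchSensitiveSketch`, `assume`, and `merge` are invented. The transfer function for a condition such as `coll.contains(item)` is applied once per branch, and the knowledge gained is lost again at the merge point.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Three-valued lattice element for a single relationship. */
enum Tri {
    TRUE, FALSE, UNKNOWN;

    /** Join at a merge point: agreement is kept, disagreement becomes UNKNOWN. */
    static Tri join(Tri a, Tri b) {
        return a == b ? a : UNKNOWN;
    }
}

class BranchSensitiveSketch {
    /**
     * Transfer function for a condition that tests one relationship.
     * It is called twice per conditional: once assuming the condition
     * evaluated to true, and once assuming it evaluated to false.
     */
    static Map<String, Tri> assume(Map<String, Tri> before, String testedFact, boolean conditionResult) {
        Map<String, Tri> after = new HashMap<>(before);
        after.put(testedFact, conditionResult ? Tri.TRUE : Tri.FALSE);
        return after;
    }

    /** Merges the lattices of two branches; the branch condition itself is not remembered. */
    static Map<String, Tri> merge(Map<String, Tri> left, Map<String, Tri> right) {
        Set<String> keys = new HashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<String, Tri> merged = new HashMap<>();
        for (String k : keys) {
            merged.put(k, Tri.join(left.getOrDefault(k, Tri.UNKNOWN),
                                   right.getOrDefault(k, Tri.UNKNOWN)));
        }
        return merged;
    }
}
```

Inside the true branch the tested relationship is TRUE and inside the false branch it is FALSE, but after `merge` it is UNKNOWN again, which is the behavior the footnote describes.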

    The FUSION analysis depends on several other analyses, including a boolean constant propagation analysis and an alias analysis. The boolean constant propagation analysis is used to determine the truth of the pre-defined boolean equality predicates. This analysis is also used for the TEST effect on user-defined relations. For this purpose, the relation analysis assumes there is a function B to which it can pass a variable and learn whether the represented value is true, false, or unknown. The alias analysis is used to check the pre-defined typing predicates and object equality predicates. However, it is also used to handle aliasing issues within the FUSION analysis, and to do this, it must provide the functionality described below.

    Dealing with aliases

    The FUSION analysis can use any alias analysis which implements a simple interface. First, it assumes there is a context $L$ that, given any variable $x$, provides a finite set $\bar{\ell}$ of abstract locations that the variable might point to. Second, it assumes a context $\Gamma_\ell$ which maps every abstract location $\ell$ to a type $\tau$. The combination of these two contexts, $\langle \Gamma_\ell, L \rangle$, is represented as the alias lattice $A$.
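The two contexts can be read as a two-method interface. The sketch below is one hypothetical rendering in Java; the names `AliasLattice`, `MapAliasLattice`, `pointsTo`, and `mayAlias` are invented here, and abstract locations and types are represented as plain strings for brevity. The `mayAlias` helper shows the kind of question a client analysis can answer from the location sets alone.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical view of the interface the analysis expects from an alias analysis. */
interface AliasLattice {
    /** The context L: the abstract locations a variable might point to. */
    Set<String> locationsOf(String variable);

    /** The typing context for locations: the type of an abstract location. */
    String typeOf(String location);
}

/** Simple map-backed implementation used only for illustration. */
class MapAliasLattice implements AliasLattice {
    private final Map<String, Set<String>> locations = new HashMap<>();
    private final Map<String, String> types = new HashMap<>();

    /** Records that a variable may point to a location of the given type. */
    void pointsTo(String variable, String location, String type) {
        locations.computeIfAbsent(variable, v -> new HashSet<>()).add(location);
        types.put(location, type);
    }

    @Override
    public Set<String> locationsOf(String variable) {
        return locations.getOrDefault(variable, Collections.<String>emptySet());
    }

    @Override
    public String typeOf(String location) {
        return types.get(location);
    }

    /** Two variables may alias when their abstract location sets overlap. */
    boolean mayAlias(String x1, String x2) {
        return !Collections.disjoint(locationsOf(x1), locationsOf(x2));
    }
}
```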

    The alias lattice must be conservative in its abstraction of the heap, as defined by Definition 6.

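    Definition 6 (Abstraction of Alias Lattice). Assume that a heap $h$ is defined as a set of source variables $x$, each of which points to a runtime location $\ell$ of type $\tau$. Let $H$ be all the possible heaps at a particular program counter. An alias lattice $\langle \Gamma_\ell, L \rangle$ abstracts $H$ at that program counter if and only if

    $$
    \begin{aligned}
    &\forall\, h \in H \,.\ \mathrm{dom}(h) = \mathrm{dom}(L) \text{ and} \\
    &\forall\, (x_1 \hookrightarrow \ell_1 : \tau_1) \in h \,.\ \forall\, (x_2 \hookrightarrow \ell_2 : \tau_2) \in h \,. \\
    &\quad \text{if } x_1 \neq x_2 \text{ and } \ell_1 = \ell_2 \text{ then } \exists\, \ell' \,.\ \ell' \in L(x_1) \text{ and } \ell' \in L(x_2) \text{ and } \tau_1 \ldots
    \end{aligned}
    $$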

    Table 1: Differences between sound, complete, and hybrid variants

    Variant    Trigger predicate checks when...   Requires predicate passes when...
    Sound      True or Unknown                    True
    Complete   True                               True or Unknown
    Hybrid     True                               True

    actually occur in some runtime scenario (if it is complete). For the purposes of these definitions, an error is a dynamic interpretation of the constraint which causes the requires predicate to fail. In the formal semantics, an error is signaled as a failure for the flow function to produce a new lattice for a particular instruction.

    As part of this thesis, I plan to investigate the practical implications of the soundness and completeness tradeoff in the framework domain. To accomplish this, FUSION can be run either as a sound analysis, a complete analysis, or a hybrid analysis which attempts a principled, cost-effective balance of these tradeoffs. The differences between the variants are summarized in Table 1.

    Trigger condition. The trigger predicate determines when the constraint will check the requires predicate and when it will produce effects. The sound variant will trigger a constraint whenever there is even a possibility of it triggering at runtime. Therefore, it triggers when the predicate is either true or unknown. The complete variant can produce no false positives, so it will only check the requires predicate when the trigger predicate is definitely true. Regardless of the variant, if the trigger is either true or unknown, the analysis produces a set of changes to make to the lattice based upon the effects list. The hybrid variant will work the same as the complete variant when determining whether to trigger the constraint. The rationale here is to try to reduce the number of false positives by only checking constraints when they are known to be applicable.

    Error condition. The requires predicate should be true to signal that the operation is safe to use. The sound variant will cause an error whenever the requires predicate is false or unknown. The complete variant, however, can only cause an error if it is sure there is one, so it only flags an error if the requires predicate is definitely false. In this case, the hybrid variant will work the same as the sound variant. If the analysis has come to this point, it already has enough information to determine that the trigger was true. Therefore, we will require that the plugin definitely show that the requires predicate is true, with the expectation that this will reduce the false negatives.

    While many analyses have the choice between soundness and completeness, very few have the hybrid option. This option exists because FUSION allows for unknownness when determining the applicability of a constraint. Other systems for specifying pre- and postconditions have complete knowledge of the applicability of a constraint; either the constraint statically matches the operation being examined, or it does not. Because FUSION constraints have both a syntactic and semantic trigger, and the semantic trigger may be statically unknowable, we open the possibility for a hybrid variant with the following principle.

    Definition 7 (Hybrid Variant Principle). Given a constraint and an operation to verify, check for applicability of the constraint in a complete manner, and then check the constraint in a sound manner.

    It is my belief that most false positives in the sound variant will occur from inapplicable constraints; therefore, we will reduce them by being complete with regard to applicability. Likewise, I believe many false negatives in the complete variant are from not checking the constraint when it is known to be applicable. The hybrid variant will seek to reduce both of these in a balanced manner. My rationale is that if there was enough knowledge in the relationship context to show that the constraint is applicable, there should also be enough knowledge to show that it is valid. Missing knowledge at that stage signals that the plugin developer missed an operation to determine whether the constraint is valid, and therefore it is likely invalid in some circumstance. The preliminary prototype has been promising; the false positives and false negatives have occurred in highly unlikely or highly unreadable code.
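The decision rules in Table 1 are small enough to write down directly. The following sketch encodes them over a three-valued truth type; the names `Truth`, `Variant`, and the method names are illustrative assumptions and are not taken from the FUSION implementation.

```java
/** Result of evaluating a predicate against the relationship context. */
enum Truth { TRUE, FALSE, UNKNOWN }

/** The three analysis variants, encoded from the rows of Table 1. */
enum Variant {
    SOUND, COMPLETE, HYBRID;

    /** Does this variant check the requires predicate for the given trigger result? */
    boolean checksRequires(Truth trigger) {
        return trigger == Truth.TRUE
                || (this == SOUND && trigger == Truth.UNKNOWN);
    }

    /** Does the requires predicate pass under this variant? */
    boolean requiresPasses(Truth requires) {
        return requires == Truth.TRUE
                || (this == COMPLETE && requires == Truth.UNKNOWN);
    }

    /** An error is reported when the constraint is checked and the requires predicate does not pass. */
    boolean reportsError(Truth trigger, Truth requires) {
        return checksRequires(trigger) && !requiresPasses(requires);
    }

    /** Effects are produced for every variant when the trigger is true or unknown. */
    boolean producesEffects(Truth trigger) {
        return trigger != Truth.FALSE;
    }
}
```

For example, with a trigger that is TRUE and a requires predicate that is UNKNOWN, the sound and hybrid variants report an error while the complete variant does not; with a trigger that is UNKNOWN and a requires predicate that is FALSE, only the sound variant reports one.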

    I have formally defined soundness and completeness of the FUSION analysis by assuming an alias analysis which abstracts the heap using $A$, as described previously. For both of these theorems, we let $A_c$ define the concrete heap at some point of a real execution, and we let $A_a$ be a sound approximation of $A_c$. Likewise, we let $B_c$ define the concrete boolean values of a real execution, and we let $B_a$ be a sound approximation of $B_c$. We also let $\rho_a$ and $\rho_c$ be relationship lattices consistent with $A_a$ and $A_c$, where $\rho_a$ is an abstraction of the concrete runtime lattice $\rho_c$, defined as $\rho_c \sqsubseteq \rho_a$.

    For the sound variant, we expect that if the flow function generates a new lattice using the imprecise lattice $\rho_a$, then any more concrete lattice will also produce a new lattice for that instruction. As the flow function only generates a new lattice if it finds no errors, then there may be false positives from when $\rho_a$ produces errors, but there will be no false negatives. To be locally sound for this instruction, the new abstract lattice must conservatively approximate any new concrete lattice. Theorem 1 captures the intuition of local soundness formally. Global soundness follows from local soundness, the monotonicity of the flow function, and the initial conditions of the lattice.

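    Theorem 1 (Local Soundness of Relations Analysis). Let $f_{C;A;B}(\rho, \mathit{instr}) = \rho'$ be the flow function for the relation analysis where $C$ is the list of constraints, $A$ is the alias lattice, $B$ is the boolean constraint lattice, $\rho$ is the relationship lattice, and $\mathit{instr}$ is the instruction the flow function is running on. Also let $\sqsubseteq$ be the precision relation for lattices. The flow function is sound if and only if

    $$
    \begin{aligned}
    &\forall\, C, A_a, A_c, B_a, B_c, \rho_a, \rho_c, \mathit{instr}, \rho_a' \,.\ \exists\, \rho_c' \,. \\
    &\quad A_c \sqsubseteq A_a \text{ and } B_c \sqsubseteq B_a \text{ and } \rho_c \sqsubseteq \rho_a \text{ and } f_{C;A_a;B_a}(\rho_a, \mathit{instr}) = \rho_a' \\
    &\text{implies} \\
    &\quad f_{C;A_c;B_c}(\rho_c, \mathit{instr}) = \rho_c' \text{ and } \rho_c' \sqsubseteq \rho_a'
    \end{aligned}
    $$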

    If the relation analysis is complete, we expect a theorem which is the opposite of the soundness theorem and is shown in Theorem 2. If a flow function generates a new lattice given a lattice $\rho_c$, then it will also generate a new lattice on any abstraction of $\rho_c$. An analysis with this property may produce false negatives, as the analysis can find an error using the concrete lattice yet generate a new lattice using $\rho_a$, but it will produce no false positives. Like the sound analysis, the resulting lattices must maintain their existing precision.

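    Theorem 2 (Local Completeness of Relations Analysis). Let $f_{C;A;B}(\rho, \mathit{instr}) = \rho'$ be the flow function for the relation analysis where $C$ is the list of constraints, $A$ is the alias lattice, $B$ is the boolean constraint lattice, $\rho$ is the relationship lattice, and $\mathit{instr}$ is the instruction the flow function is running on. Also let $\sqsubseteq$ be the precision relation for lattices. The flow function is complete if and only if

    $$
    \begin{aligned}
    &\forall\, C, A_a, A_c, B_a, B_c, \rho_a, \rho_c, \mathit{instr}, \rho_c' \,.\ \exists\, \rho_a' \,. \\
    &\quad A_c \sqsubseteq A_a \text{ and } B_c \sqsubseteq B_a \text{ and } \rho_c \sqsubseteq \rho_a \text{ and } f_{C;A_c;B_c}(\rho_c, \mathit{instr}) = \rho_c' \\
    &\text{implies} \\
    &\quad f_{C;A_a;B_a}(\rho_a, \mathit{instr}) = \rho_a' \text{ and } \rho_c' \sqsubseteq \rho_a'
    \end{aligned}
    $$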

    Proofs of soundness and completeness, for the sound and complete variants respectively, can be found in our associated technical report [14]. This report also contains the abstract semantics of the analysis.


    5 Validation

    A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A hypothesis is a novel suggestion that no one wants to believe. It is guilty, until found effective.

    Edward Teller

    In this section, I will first describe my validation plan, and I will then describe how the plan validates the definitions, claims, and hypotheses defined in Section 2.

    Validation Plan

    The validation plan has three distinct pieces. Each of these will support several of the contributions of this work, as described in the later sections.

    Formalize the system. I have currently formalized all the specifications described in Section 4 except the retrieval of declarative relationships. I have also formalized the semantics of the analysis using the formal specifications and a three address code abstraction of a class-based OO-language. The formalism makes it clear where the three variants of the analysis differ, and I have provided theorems and proofs of soundness and completeness for the sound and complete variants, respectively. The formalism has been documented in an ECOOP paper [31] and tech report [14].

    Classify industry frameworks. I have been investigating software framework designs. I wrote up my initial investigation of JUnit and Spring as a Ph.D. project for the Software Architecture course [30]. In this work, I discovered that there are many mechanisms by which frameworks interact with plugins, including callbacks, dependency injection through annotations, interceptors, and deployment configuration.

    I am currently working on a deeper investigation of frameworks [8]. This work explores several frameworks from different languages and domains to show the similarities of framework design and to show that they are used outside of the class-based OO community. In particular, we have deeply explored ASP.NET (C#, web domain), Open|SpeedShop (C, dynamic analysis domain), and Facebook (PHP, web domain), and we have created a taxonomy which includes 20 industry frameworks.

    Run a case study of FUSION on Spring. I will start the case study by mining the Spring help forums to find collaboration constraints. As I have done previously with the ASP.NET framework, I will record the properties of each collaboration constraint found (see Appendix A for the properties of interest). For each constraint that uses Java and XML only, I will attempt to specify it using FUSION. I will record the number and type of specifications required to specify the issue in a checkable way, and I will also count how many specifications were reused from prior constraints. If the constraint was not specifiable in FUSION, I will record the reason why it was not.

    To test the analysis, I will create several variants of the sample code based upon my own analysis of other ways a developer might violate the same constraint. For each sample, regardless of whether it is original or created, I will run the three variants of the FUSION analysis and record whether it found the defect, and if so, the location where it reported the defect. If the analysis did not report the correct result, I will determine and record the cause of the imprecision. This information will be used to validate several hypotheses, as described later.

    While I am only studying a single framework, Spring contains many of the interaction mechanisms found in industry frameworks, and the taxonomy will show it to be representative of common industry frameworks.

    Claims of generality and support for definitions

    In this work, I have provided definitions for three constructs: collaboration constraints, software frameworks, and relationships. While relationships are my own invention, collaboration constraints and software frameworks are new definitions for existing constructs, and so I must show that my definitions are useful. To show that my definitions are useful, I have analyzed many examples of collaboration constraints and software frameworks and created taxonomies that will help other researchers understand and use these constructs.

    Definition 1 (Collaboration Constraint). A collaboration constraint is a precondition that is expressed as a predicate on abstract states of several objects.

    To show the usefulness of this definition, I have done an empirical evaluation of postings on the ASP.NET help forums. I found that 20% of the comprehensible postings were about a collaboration constraint, and I have shown that these types of constraints are not well supported either by existing developer tools or by formal specification languages. I have created a taxonomy for these constraints (see Appendix A); this classification will help further research in the field by highlighting the characteristics of these constraints that are not handled by existing technologies. I will continue to classify the postings in the Spring help forums along the same taxonomy.

    Definition 2 (Software Framework). A software framework is a set of reusable modules which enforce that their clients conform to a predefined architecture.

    To show the usefulness of this definition, I have examined many industry examples of software frameworks, as detailed further in [8]. In this work, we have shown that frameworks are not limited to class-based, OO languages but can be found in many languages and domains. We found that the common thread of all these frameworks was that they are architectural in nature, and I have been examining more deeply the mechanisms by which frameworks impose their architectural constraints on plugins. This architecturally-focused definition will have broader impact as it will encourage more research on frameworks beyond OO-specific techniques, and we hope it will also have an impact on producing better framework designs.

    I have also made two large claims about the association between collaboration constraints and software frameworks, and the association between collaboration constraints and relationships.

    Claim 1. Collaboration constraints are an essential part of framework design.

    I will provide evidence for this claim by analyzing software frameworks in many contexts and showing that collaboration constraints are an essential outcome of these framework designs. I have provided some examples and an outline of my arguments in Section 3; I will continue my analysis of frameworks to find additional clear and compelling examples.


    Claim 2. Relationships are a practical and adoptable technique for specifying collaboration constraints.

    I have provided evidence that relationships can specify collaboration constraints by analyzing the properties of collaboration constraints and showing that relationships handle each of these properties. Please refer to Section 4 for my analysis of this.

    Regarding practicability and adoptability of relationships for this purpose, I will show that FUSION, a relationship-based specification language, meets the requirements of an adoptable tool, as defined by [?]. In particular, it is incremental (Hypothesis 3), composable (Hypothesis 4), and can be analyzed in several ways to allow the user to select the most cost-effective tradeoff between false positives and false negatives (Hypothesis 7). My work on ERL [14] also shows that it is possible to create human-readable error messages from a broken co