A Service-Oriented Componentization Framework for Java Software Systems
by
Shimin Li
A thesis
presented to the University of Waterloo
in fulfilment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2006
© Shimin Li 2006
I hereby declare that I am the sole author of this thesis.
I authorize the University of Waterloo to lend this thesis to other institutions or individuals
for the purpose of scholarly research.
Shimin Li
I further authorize the University of Waterloo to reproduce this thesis by photocopying or by
other means, in total or in part, at the request of other institutions or individuals for the purpose
of scholarly research.
Shimin Li
Abstract
Service-oriented computing has dramatically changed the way in which we develop software
systems. In the fast-growing global market for services, providing competitive services is
critical for the success of businesses and organizations. Since many competitive services
have already been implemented in existing systems, leveraging the value of an existing
system by exposing all or parts of it as services within a service-oriented environment has be-
come a major concern in today’s industry. In this work, we categorize services embedded in a
system into two categories: i) top-level services, which are not used by any other service but may
contain a hierarchy of low-level services that further describe and modularize the service, and
ii) low-level services, which lie underneath a top-level service and may be agglomerated with other
low-level services to yield a new service with a higher level of granularity. To meet the de-
mand of identifying and reusing the business services embedded in an existing software system,
we present a novel service-oriented componentization framework that automatically supports:
i) identifying critical business services embedded in an existing Java system by utilizing graph
representations of the system models, ii) realizing each identified service as a self-contained com-
ponent that can be deployed as a single unit, and iii) transforming the object-oriented design into
a service-oriented architecture. A toolkit implementing our framework has been developed as an
Eclipse Rich Client Platform (RCP) application. Our initial evaluation has shown that the pro-
posed framework is effective in identifying services from an object-oriented design and migrating
it to a service-oriented architecture.
Acknowledgments
First and foremost, I am deeply indebted to my supervisor, Professor Ladan Tahvildari, for
her patient academic (and personal) guidance over the years. Her passion for doing and
communicating innovative and creative science has been, and always will be, a great source of
inspiration. I feel very privileged to have worked with her.
I wish to thank the members of my dissertation committee: Professor Kostas Kontogiannis
and Professor Sagar Naik, for having accepted to take the time out of their busy schedule to read
my thesis and provide me invaluable comments and inspiring remarks.
I would like to thank all members of the Software Technologies and Applied Research (STAR)
group for their tremendous support and cooperation.
I want to thank my parents who have been extremely understanding and supportive of my
studies. I want to thank my wonderful wife, Wei, who has encouraged me so much over the
years. I also want to thank my lovely son, Zihan, for letting Dad work on his dissertation when
he needed to do so. I feel very lucky to have a family that shares my enthusiasm for academic
pursuits.
Contents
1 Introduction 1
1.1 Problem Description . . . 3
1.2 Thesis Contribution . . . 6
1.3 Thesis Organization . . . 6
2 Related Work 8
2.1 Program Comprehension . . . 8
2.1.1 Feature Locating . . . 9
2.1.2 Software Clustering . . . 12
2.2 Program Migration . . . 13
2.2.1 Migrating Procedural Legacy Systems to Object-Oriented Paradigm . . . 13
2.2.2 Re-Engineering Existing Object-Oriented Systems . . . 15
2.3 Architecture Recovery . . . 17
2.4 Software Reuse . . . 19
2.4.1 Identification of Reusable Components in Source Code . . . 19
2.4.2 Creation of Services from Legacy Systems . . . 21
2.5 Summary . . . 22
3 Service-Oriented Componentization Framework 23
3.1 Framework Overview . . . 24
3.2 Architecture Recovery . . . 25
3.3 Service Identification . . . 26
3.4 Component Generation . . . 27
3.5 System Transformation . . . 28
3.6 Summary . . . 29
4 Architecture Recovery 30
4.1 XML Schema Representation . . . 31
4.1.1 UML Profile for XML Schemas . . . 31
4.1.2 Representing XML Schemas in UML . . . 32
4.2 Modeling Source Code . . . 33
4.2.1 Approach . . . 34
4.2.2 Source Code Models . . . 35
4.3 Modeling Architecture . . . 38
4.3.1 Definitions of Class Relationships . . . 39
4.3.2 Approach . . . 45
4.3.3 Class/Interface Relationship Graph . . . 47
4.3.4 Class/Interface Dependency Graph . . . 49
4.3.5 An Example: Car Rental System . . . 51
4.4 Summary . . . 53
5 Service Identification 54
5.1 Service Representations . . . 54
5.2 Supporting Concepts . . . 56
5.2.1 Graph Techniques . . . 57
5.2.2 Dominance Analysis . . . 59
5.2.3 Modularization Quality Metric . . . 62
5.3 The Proposed Processes . . . 63
5.3.1 Top-Level Service Identification . . . 64
5.3.2 Low-Level Service Identification . . . 68
5.3.3 An Example: Car Rental System . . . 72
5.4 Summary . . . 79
6 Component Generation and System Transformation 80
6.1 Component Generation . . . 81
6.1.1 Approach . . . 81
6.1.2 An Example . . . 85
6.2 System Transformation . . . 89
6.2.1 Approach . . . 89
6.2.2 An Example . . . 91
6.3 Summary . . . 93
7 Empirical Studies 94
7.1 A Prototype for the SOC4J Framework . . . 95
7.1.1 Tool Integration Requirements . . . 95
7.1.2 JComp RCP Application . . . 97
7.2 Evaluation Criteria . . . 100
7.2.1 Component Reusability . . . 100
7.2.2 Architectural Improvement . . . 105
7.3 Case Study: Jetty . . . 106
7.3.1 Statistics of the Jetty . . . 107
7.3.2 Discussions on Obtained Results . . . 107
7.4 Case Study: Apache Ant . . . 113
7.4.1 Statistics of the Apache Ant . . . 113
7.4.2 Discussions on Obtained Results . . . 114
7.5 Summary . . . 118
8 Future Directions and Conclusions 119
8.1 Contributions . . . 119
8.2 Future Work . . . 121
8.3 Conclusions . . . 122
A Top-Level Services of Jetty 123
B Top-Level Services of Apache Ant 125
List of Tables
4.1 The Metric Suite at Class Level . . . 46
7.1 Statistics of the Jetty . . . 107
7.2 Top-Level Services Identified from Jetty . . . 109
7.3 Low-Level Services Identified in Top-Level Service Win32 Server . . . 111
7.4 Some Time and Space Statistics of the SOC4J Framework on the Case Study: Jetty . . . 113
7.5 Statistics of the Apache Ant . . . 114
7.6 Selected Top-Level Services Identified from Apache Ant . . . 114
7.7 Low-Level Services Identified in Top-Level Service WAR File Creation . . . 115
7.8 Some Time and Space Statistics of the SOC4J Framework on the Case Study: Apache Ant . . . 118
A.1 Top-Level Services of Jetty (1) . . . 123
A.2 Top-Level Services of Jetty (2) . . . 124
B.1 Top-Level Services of Apache Ant (1) . . . 125
B.2 Top-Level Services of Apache Ant (2) . . . 126
B.3 Top-Level Services of Apache Ant (3) . . . 127
B.4 Top-Level Services of Apache Ant (4) . . . 128
B.5 Top-Level Services of Apache Ant (5) . . . 129
List of Figures
2.1 The Conceptual Model of Eisenbarth’s Approach . . . 11
2.2 The Block Diagram of the Quality-Based Re-engineering Process . . . 16
2.3 The Dali Workbench . . . 17
3.1 The Architecture of the Service-Oriented Componentization Framework . . . 24
4.1 The Approach for Source Code Modeling . . . 34
4.2 The Meta-Model for Java Package Models . . . 35
4.3 The Meta-Model for Java Source File Models . . . 36
4.4 The Meta-Model for Java Class/Interface Models . . . 37
4.5 The Meta-Model for Java Method/Constructor Models . . . 38
4.6 The Approach for Architecture Modeling . . . 45
4.7 The UML Representation of XML Schema for Nodes in the CIRG . . . 48
4.8 The UML Representation of XML Schema for Nodes in the CIDG . . . 50
4.9 The CIRG of the Car Rental System (CRS) . . . 51
4.10 The CIDG of the Car Rental System (CRS) . . . 52
5.1 The UML Representation of XML Schema for a Service . . . 56
5.2 An Example of a Directed Graph . . . 58
5.3 (a) A Connected Component of the Directed Graph G in Figure 5.2. (b) The Other Connected Component of G. (c) The Only Strongly Connected Component of G. (d) A Rooted Component of Graph (a). (e) The Other Rooted Component of Graph (a) . . . 59
5.4 (a) A Simple Directed Graph. (b) The Dominance Tree Corresponding to the Graph in (a). (c) All Two Maximal Consolidation Subtrees of the Dominance Tree in (b) . . . 60
5.5 Processes in the Service Identification Stage . . . 63
5.6 The MCIDGs of the Car Rental System . . . 73
5.7 The SHG of the Top-Level Service VehicleBooking . . . 74
5.8 The Result SHG of Performing the SHG Transformation on the Original SHG of the Top-Level Service VehicleBooking in the CRS System . . . 75
5.9 The Service Dominance Tree of the SHG in Figure 5.8 . . . 76
5.10 The Reduced Dominance Tree of the Service Dominance Tree in Figure 5.9 . . . 77
5.11 The SHG Reconstructed from the Reduced Service Dominance Tree in Figure 5.10 . . . 78
6.1 The UML Representation of XML Schema for a Component . . . 83
6.2 The UML Class Diagrams of Customer and Person in the CRS System . . . 86
6.3 Part of the UML Class Diagram of the Component Customer . . . 88
6.4 The Meta-Model for the Component-Based Target System . . . 90
6.5 The Service Hierarchy Graphs of the CRS System . . . 92
6.6 The Component Hierarchy Graphs of the CRS System . . . 92
6.7 The Component-Based Car Rental System . . . 93
7.1 The Tool Interconnection for the SOC4J Framework . . . 96
7.2 The Architecture of the JComp Java Componentization Kit . . . 98
7.3 A Snapshot of the JComp Java Componentization Kit . . . 99
7.4 The Component Reusability Model . . . 104
7.5 The Accepted Service View of the Extractor Plug-in . . . 108
7.6 Iterations of the Service Aggregation Process of Top-Level Service Win32 Server . . . 110
7.7 The CHG of Top-Level Component Win32 Server of the Jetty . . . 111
7.8 The Reusability of Components Extracted from Jetty . . . 112
7.9 The CHG of Top-Level Component WAR File Creation of the Apache Ant . . . 116
7.10 The Reusability of Components Extracted from the Apache Ant . . . 117
Chapter 1
Introduction
Billions of dollars are spent each year on computer software. Much of this effort is spent on cre-
ating and testing new source code. To save money, increase productivity, and improve quality and
reliability, academic and industrial institutions have put a lot of effort into reusing existing soft-
ware. The arrival of new software technology creates the need to leverage existing software assets
in order to take advantage of the new technology, but implementing business-critical applications
whenever a new technology arrives is impossible due to the time and resources required. The
only option then is software re-engineering. Examples of new software technologies that have
created a large demand and market for legacy-system re-engineering, wrapping, and evolution
methods include distributed object technology, component technology, the World Wide Web
(WWW), and XML.
Service-oriented computing has the potential to drastically change the way we develop software.
As global markets for services create the potential for reuse at a much greater scale, providing
competitive services to these markets will be critical to realizing this vision as a whole, as well
as to the success of individual businesses. However, much of what would make up competitive
services is already implemented in existing systems. The challenge then is how to
transform the functionality of existing legacy systems fully or partially into services. Identifying,
extracting and re-engineering software components that implement abstractions within existing
systems is a promising cost-effective way to create reusable assets and re-engineer existing soft-
ware systems.
Today, more and more organizations are migrating to service-oriented architectures (SOA) to
achieve net-centric operations. This offers the potential of leveraging legacy systems by exposing
some parts of the system as services within the SOA. However, there is often a lack of effective
engineering approaches for identifying, describing, modeling, and realizing services embedded
in existing software systems. The core of an SOA is a service which is a coarse-grained, dis-
coverable, and self-contained software entity that interacts with applications and other services
through a loosely coupled, often asynchronous, message-based communication model.
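To make this notion concrete, the service definition above can be sketched as a minimal Java interface. All names here (PaymentService, PaymentRequest, PaymentReceipt) are illustrative only, not taken from this thesis:

```java
// Illustrative sketch of a coarse-grained, self-contained service: callers
// exchange message-like request/response objects and never see the classes
// that implement the capability behind the interface.
final class PaymentRequest {
    final String account;
    final long amountCents;
    PaymentRequest(String account, long amountCents) {
        this.account = account;
        this.amountCents = amountCents;
    }
}

final class PaymentReceipt {
    final boolean approved;
    PaymentReceipt(boolean approved) { this.approved = approved; }
}

interface PaymentService {
    PaymentReceipt process(PaymentRequest request);
}

public class ServiceSketch {
    public static void main(String[] args) {
        // A trivial stand-in implementation, just to exercise the interface.
        PaymentService svc = req -> new PaymentReceipt(req.amountCents > 0);
        System.out.println(svc.process(new PaymentRequest("A-1", 500)).approved); // true
    }
}
```

The point of the sketch is the shape of the interaction: one coarse-grained, document-style request in, one result out, which is what distinguishes a service from the fine-grained method calls of an ordinary object-oriented design.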
The reuse of an existing software system requires a comprehensive framework to identify
and extract the critical business services embedded in it. A business service of a software
system is an abstract resource that represents a capability of performing tasks that form a
coherent functionality from the points of view of provider entities and requester entities [40].
Effective system reuse and evolution require both the “big picture” and the lower-level
dependencies between portions of the source code. The focal point of the proposed research is
to exploit the synergy between the areas of Program Comprehension [9, 21, 38, 69, 97, 100],
Architecture Recovery [43, 54, 55, 59, 60, 63, 64], Software Reuse [34, 35], and Program
Migration [57, 80–85, 96].
In this context, our goal is to develop a service-oriented componentization framework that
decomposes an existing object-oriented system to re-modularize the existing assets to support
service functionality. More specifically, the proposed framework should automatically support:
i) identifying critical business services embedded in an existing Java system, ii) realizing each
identified service as a self-contained component, and iii) transforming the object-oriented design
into a service-oriented architecture. To be of practical use, such a re-engineering environment
should be generic in the sense of being able to support different object-oriented existing systems.
In other words, it must be built upon a meta-model of existing object-oriented systems rather than
upon a particular existing system. This avoids the cost of developing a dedicated evolution
environment for each target system. Hence, the environment should be configurable
with a model of the target existing system which parameterizes the evolution environment with
the existing system to be evolved and serves as a basis for specifying the components to be
created.
This research addresses a problem that has challenged the research community for several
years, namely the asset reuse of existing object-oriented systems. It also devises a framework
in which reuse and evolution activities do not occur in a vacuum but can be monitored and
fine-tuned by the user to address specific quality requirements for the extracted components and
the evolved target system, such as component granularity, component reusability, and system
maintainability.
1.1 Problem Description
An effective way of leveraging the value of existing systems is to expose their functionalities as
reusable components to a larger number of clients through well-defined component interfaces.
Each component encapsulates a business service, such as processing a payment, converting
currency, or computing an insurance quotation. In general, we have found that the code of
existing systems represents a set of components with significant reuse potential. However, be-
cause the existing system does not have sufficient architecture or other high level documentation,
it is difficult to understand both the “big picture” and the lower-level dependencies between
portions of the code. From the implementation point of view, the challenge consists of two phases:
• Reverse Engineering: Identifying and extracting the top-level functions of an existing soft-
ware system, and providing service descriptions for these identified functions.
• Forward Engineering: Performing any necessary transformations to migrate the mono-
lithic architecture of the existing systems to a more flexible service-oriented architecture.
In this thesis, we are interested in the reverse engineering challenge. Service identification is
complicated by the usual obstacles of having to deal with potentially large and poorly structured
existing systems. Identifying these service candidates for packaging as reusable components
would require analysis of massive amounts of legacy code or at least graphic representations of
the code. Additionally, it would require intervention of people with background in the business
domain to judge what functions are likely to make reusable services.
The identification of functions suitable for exposure as services can be seen as an instance of
a more generic problem of functional decomposition of existing systems. Here we are required
to abstract the code, or an alternative code representation (e.g., XML or graphs) to higher-level
representations that describe the system architecture in terms of its functional units. Moreover,
access points to these functional units would need to be identified as well.
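As a simplified illustration of such a graph representation (a sketch only, not the CIRG/CIDG format this thesis defines later; the class names are hypothetical), a class-level dependency graph can be kept as an adjacency map, and classes that no other class depends on are natural candidates for the access points just mentioned:

```java
import java.util.*;

// Hypothetical sketch of a class-level dependency graph: nodes are class
// names, and an edge points from a class to a class it uses.
public class DependencyGraph {
    final Map<String, Set<String>> uses = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        uses.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        uses.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    // Classes no other class depends on: candidate access points to
    // functional units of the system.
    Set<String> roots() {
        Set<String> targets = new HashSet<>();
        uses.values().forEach(targets::addAll);
        Set<String> r = new LinkedHashSet<>(uses.keySet());
        r.removeAll(targets);
        return r;
    }

    public static void main(String[] args) {
        DependencyGraph g = new DependencyGraph();
        g.addEdge("BookingService", "Vehicle");   // hypothetical classes
        g.addEdge("BookingService", "Customer");
        g.addEdge("Customer", "Person");
        System.out.println(g.roots()); // prints [BookingService]
    }
}
```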
To reuse the identified services and migrate the existing system’s implementation into a
component-based architecture, it might be necessary to package the identified services into well-
documented and self-contained components, during the forward engineering phase. If service
packaging is required, then it needs techniques for automatically extracting the relevant procedu-
ral elements from existing systems, and creating an interface for components.
Furthermore, a formal description needs to be developed for each service. Service descrip-
tions should document possible dependencies between service invocations, beside syntactic in-
formation on the number and types of parameters. Such descriptions are crucial for developers
to implement applications based on the services extracted and should therefore be presented in a
way they can understand.
We seek a combination of solutions from three different domains in order to tackle the service
identification, service modeling, and service packaging problems:
• Source Code Analysis and Reverse Engineering Technology. We aim to create a frame-
work that has methodological and technological steps to recover higher-level design and
architecture representations of existing software systems based on the source code artifacts.
This includes creation of a suitable representation of design and architectural models that
reflect the functional decomposition of the system. To distinguish these models from each
other, design models are more detailed and refer to different parts of a system, whereas
architectural models are more abstract and refer to the system as a whole. Our starting
point for this line of work is exploring the existing body of work on architecture recovery
and reconstruction, as well as software clustering in searching for suitable algorithms and
ideas.
• UML Technology. Models in the UML will provide a high-level representation of anal-
ysis results and service descriptions that is understandable for both software developers
and business experts. As a universal language, the UML provides standard notations for
almost all aspects of a system. Structural features like data types, operation signatures, and
architectures are captured by class and component diagrams. System behavior, including
scenarios, processes, and protocols are captured by sequence or activity diagrams as well
as state-charts. We use component diagrams to provide a high-level overview of the pro-
posed services, components and their interfaces. Based on this representation, the users
can validate the proposed services.
• Graph Transformation Technology. We utilize the graph transformation technology to
implement mappings between different graphical representations of programs and models.
The strength of the approach lies in the fact that model transformations can be expressed
graphically, based on Meta-Object Facility (MOF) models for the source and target models.
Also in the service identification phase, the graph transformation technology can be used
to agglomerate services.
1.2 Thesis Contribution
This thesis aims to design a framework that helps to reuse the assets of existing systems and
migrate their object-oriented design to service-oriented architectures. This deals with the
long-standing problem of reusing and evolving existing object-oriented systems in the following ways:
• By designing and implementing comprehensive graphic representations of an object-oriented
system at different levels of abstraction.
• By exploring an incremental program comprehension approach, including describing an
object-oriented software system using different concurrent views, each of which addresses
a specific set of concerns of the system.
• By designing and implementing an efficient and effective methodology for identifying and
realizing critical business services embedded in an existing object-oriented system.
• By designing and implementing an object-oriented restructuring methodology that trans-
forms the typically monolithic architectures of existing systems to more flexible service-
oriented architectures.
• By designing and implementing a prototype system that supports the identification and
realization of critical business services embedded in a Java software system and the
componentization of the Java system.
1.3 Thesis Organization
This thesis is organized as follows:
• Chapter 2 reviews the related work, with the aim of putting this thesis in context. It covers
four research areas that form the foundation of this thesis: Program Comprehension,
Architecture Recovery, Software Reuse, and Program Migration.
• Chapter 3 gives an overview of the service-oriented componentization framework for Java
software systems. This framework uses graph representations of an existing object-oriented
software system and graph transformations to identify business services embedded in the
system. Furthermore, the framework realizes each identified service into a self-contained
component and transforms the object-oriented design into a service-oriented architecture.
The proposed framework is composed of four stages: Architecture Recovery, Service
Identification, Component Generation, and System Transformation.
• Chapter 4 discusses reverse engineering techniques used within the architecture recovery
stage to build source code models and architectural models of an existing object-oriented
software system.
• Chapter 5 presents the service identification strategy and algorithm that are used within the
service identification stage to identify critical business services embedded in an existing
object-oriented system.
• Chapter 6 discusses the processes within the component generation stage and the system
transformation stage. It covers the service packaging technique and architecture recon-
struction technique.
• Chapter 7 shows the application of the proposed service-oriented componentization frame-
work on some real world Java projects. The prototype of the framework and framework
evaluation criteria will be introduced. Case studies will be explained and the results will be
discussed.
• Chapter 8 presents the conclusions of this research work and discusses possible directions
future research might take.
• Appendices A and B list and describe the business services identified in the case studies.
Chapter 2
Related Work
In this chapter we review the related work, with the aim of putting this thesis in context. We
survey four research areas that form the foundation of this thesis, namely Program Comprehension,
Program Migration, Architecture Recovery, and Software Reuse. The Program Comprehension
section outlines approaches for locating features in source code and techniques for software
clustering. The Architecture Recovery section presents the technologies used in the software ar-
chitecture recovery domain. The Software Reuse section reviews the techniques for identifying
reusable components in source code and creating services from legacy systems. The Program
Migration section discusses current methodologies for migrating procedural legacy systems to
the object-oriented paradigm and re-engineering existing object-oriented systems. Finally, the
last section summarizes the material presented in this chapter.
2.1 Program Comprehension
The identification of potentially reusable services embedded in an existing system requires an
understanding of the functionality of each part of the system. Program understanding, or analysis
in general includes any activity that uses dynamic or static methods to reveal the properties of
existing systems. It most commonly refers to an examination of source code, without the use
of any specification or execution information. There are two main subjects related to our work:
Feature Locating and Software Clustering.
2.1.1 Feature Locating
A feature is a realized functional requirement of a system [30]. Generally, the term feature also
subsumes non-functional requirements. In the context of this research, only functional features
are relevant; that is, we consider a feature to be an observable behavior of the system that can be
triggered by the user.
Understanding the implementation of a certain feature of a system requires identification of
the computational units of the system that contribute to this feature. In many cases, the map-
ping of features to the source code is poorly documented. Wilde et al. [93] were pioneers in
locating features by taking a fully dynamic approach. The goal of their Software Reconnaissance
is the support of maintenance programmers when they modify or extend the functionality of a
legacy system. Based on the execution of test cases for a particular feature f, several sets of
computational units are identified:
• computational units commonly involved (code executed in all test cases, regardless of f),
• computational units potentially involved in f (code executed in at least one test case that
invokes f),
• computational units indispensably involved in f (code that is executed in all test cases that
invoke f), and
• computational units uniquely involved in f (code executed exactly in those cases where f is
invoked).
A computational unit is an executable part of a system. Examples of computational units are
instructions (like accesses to global variables), basic blocks, routines, classes, compilation units,
components, modules, or subsystems. Since the primary goal is the location of starting points for
further investigations, Wilde et al. focus on locating specific computational units rather than all
required computational units.
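The four sets above amount to plain set operations over per-test execution traces. As a sketch (the traces and routine names below are hypothetical, for illustration only, and each trace is taken to be the set of routines one test case executed):

```java
import java.util.*;

// Sketch of the set computations behind Software Reconnaissance.
// All trace data and routine names are hypothetical.
public class Reconnaissance {

    // Units executed in every trace of the given collection.
    static Set<String> intersectAll(Collection<Set<String>> traces) {
        Iterator<Set<String>> it = traces.iterator();
        Set<String> acc = new HashSet<>(it.next());
        while (it.hasNext()) acc.retainAll(it.next());
        return acc;
    }

    // Units executed in at least one trace.
    static Set<String> unionAll(Collection<Set<String>> traces) {
        Set<String> acc = new HashSet<>();
        for (Set<String> t : traces) acc.addAll(t);
        return acc;
    }

    public static void main(String[] args) {
        // Traces of test cases that invoke feature f ...
        List<Set<String>> withF = List.of(
            Set.of("init", "parse", "render", "log"),
            Set.of("init", "parse", "cache", "log"));
        // ... and traces of test cases that do not invoke f.
        List<Set<String>> withoutF = List.of(
            Set.of("init", "cache", "log"));

        List<Set<String>> all = new ArrayList<>(withF);
        all.addAll(withoutF);

        Set<String> common = intersectAll(all);          // in all test cases
        Set<String> potentially = unionAll(withF);       // in some test invoking f
        Set<String> indispensably = intersectAll(withF); // in all tests invoking f
        Set<String> uniquely = new HashSet<>(unionAll(withF));
        uniquely.removeAll(unionAll(withoutF));          // only when f is invoked

        System.out.println("common        = " + new TreeSet<>(common));        // [init, log]
        System.out.println("potentially   = " + new TreeSet<>(potentially));   // [cache, init, log, parse, render]
        System.out.println("indispensably = " + new TreeSet<>(indispensably)); // [init, log, parse]
        System.out.println("uniquely      = " + new TreeSet<>(uniquely));      // [parse, render]
    }
}
```

On this toy data, the uniquely involved routines (parse, render) are exactly the starting points a maintainer would examine first.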
Another approach, based on dynamic information, was presented by Wong et al. [95]. They
analyzed execution slices of test cases implementing a particular functionality. The process was
described as follows:
1. The invoking input set I (i.e., a set of test cases) that will invoke the feature is identified.
2. The excluding input set E that will not invoke the feature is identified.
3. The program is executed twice, using I and E separately.
4. By comparison of the two resulting execution slices, the computational units can be identi-
fied that implement the feature.
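The comparison in step 4 amounts to a set difference between the two execution slices. A minimal sketch, assuming each slice is represented as the set of covered computational units:

```java
import java.util.*;

// A minimal sketch of Wong et al.'s execution-slice comparison. The feature's
// implementation is approximated by the units present in the invoking slice
// but absent from the excluding slice.
public class SliceComparison {
    public static Set<String> featureUnits(Set<String> invokingSlice,
                                           Set<String> excludingSlice) {
        Set<String> result = new HashSet<>(invokingSlice);
        result.removeAll(excludingSlice);  // units exercised only when the feature runs
        return result;
    }
}
```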
In [94], Wong et al. presented a way to quantify features. Metrics are provided to compute
the dedication of computational units to features, the concentration of features in computational
units, and the disparity between features.
In [21], Chen and Rajlich proposed a semiautomatic method for feature location, in which
the programmer browses the statically derived Abstract System Dependency Graph (ASDG). The
ASDG describes detailed dependencies among routines, types, and variables at the level of global
declarations. The navigation on the ASDG is computer-aided, but the programmer carries out
the search for a feature’s implementation. The method takes advantage of the programmer’s
experience with the analyzed software. It is less suited to locating features for programmers without
prior knowledge of the system, who do not know where to start the search.
Eisenbarth et al. [30] presented a semiautomatic technique that reconstructs the mapping for
features that are triggered by the user and exhibit an observable behavior. The mapping is in
general not injective; that is, a computational unit may contribute to several features. Their tech-
nique allows for the distinction between general and specific computational units with respect to a
given set of features. For a set of features, it also identifies jointly and distinctly required compu-
tational units. The presented technique combines dynamic and static analysis to rapidly focus on
the system’s parts that relate to a specific set of features. Dynamic information is gathered based
on a set of scenarios invoking these features. Figure 2.1 illustrates the conceptual model used
by Eisenbarth et al. It describes the relationships among features, scenarios, and computational
units.
Figure 2.1: The Conceptual Model of Eisenbarth’s Approach.
In [92], Wilde and Rajlich compared two feature locating approaches, namely the Software
Reconnaissance technique and the Dependency Graph Search method. In the presented case
study, both techniques were effective in locating features. Software Reconnaissance proved
to be more suited to large, infrequently changed programs, whereas the Dependency Graph Search
method was found to be more effective if further changes are likely and require a deeper and more
complete understanding.
2.1.2 Software Clustering
Clustering techniques have been used in many disciplines to support the grouping of similar
objects of a system. Clustering analysis is a technique used for combining observations into
groups or clusters such that each group or cluster is homogeneous or compact with respect to
certain characteristics and each group should be different from other groups with respect to the
same characteristics [73]. Clustering analysis takes a set of objects and characteristics with no
apparent structure and imposes a structure upon them with respect to a characteristic. Its primary
objective is to facilitate better understanding of the observations and the subsequent construction
of complex knowledge structures from features and object clusters. Most clustering approaches
attempt to provide solutions for restructuring legacy systems.
Belady and Evangelisti introduced an approach that automatically clusters a software system
in order to reduce its complexity [6]. They also provided a measure for the complexity of a system
after it has been clustered. Their clustering approach was based on the information extracted from
the documentation of the system.
Muller et al. [63, 64] implemented several software clustering heuristics in the Rigi tool that
(i) measure the relative strength between interfaces, (ii) identify omnipresent modules, and (iii)
use the similarity between module names. They introduced the important principles of small
interfaces (the number of elements of a subsystem that interface with other subsystems should be
small compared to the total number of elements in the subsystem) and of few interfaces (a given
subsystem should interface only with a small number of the other subsystems).
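The small-interfaces principle can be checked mechanically. The sketch below computes the fraction of a subsystem's elements that touch the outside world; the representation is illustrative, not Rigi's actual implementation.

```java
import java.util.*;

// A rough sketch of a check for Muller et al.'s small-interfaces principle.
// A subsystem is modeled as a set of elements plus a dependency relation
// between elements; a low ratio indicates a small interface.
public class InterfacePrinciples {

    // Fraction of a subsystem's elements that depend on, or are depended on by,
    // elements outside the subsystem.
    public static double interfaceRatio(Set<String> subsystem,
                                        Map<String, Set<String>> deps) {
        int boundary = 0;
        for (String e : subsystem) {
            boolean external = false;
            for (String target : deps.getOrDefault(e, Collections.emptySet()))
                if (!subsystem.contains(target)) external = true;
            for (Map.Entry<String, Set<String>> d : deps.entrySet())
                if (!subsystem.contains(d.getKey()) && d.getValue().contains(e)) external = true;
            if (external) boundary++;
        }
        return subsystem.isEmpty() ? 0.0 : (double) boundary / subsystem.size();
    }
}
```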
Hutchens and Basili [43] developed an algorithm that clusters procedures into modules by
measuring the interaction between pairs of procedures. Their clustering technique was based on
data bindings. A data binding was defined as an interaction between two procedures based on
the location of variables that are within the static scope of both procedures. Based on the data
bindings, a hierarchy is constructed from which a partition can be derived. They compared their
structures with the developer’s mental model, with satisfactory results, and evaluated the stability
of the system, focusing on how the clustering changed when changes were made to the code.
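A simplified sketch of the underlying idea, assuming each procedure is summarized by the set of shared-scope variables it references; the binding strength between two procedures is then the size of the intersection:

```java
import java.util.*;

// A simplified sketch of Hutchens and Basili's data bindings. Two procedures
// are bound through each variable in the static scope of both that both
// reference; a hierarchy can be built by agglomerating the most strongly
// bound pairs first.
public class DataBindings {
    public static int bindingStrength(Set<String> varsP, Set<String> varsQ) {
        Set<String> shared = new HashSet<>(varsP);
        shared.retainAll(varsQ);  // variables referenced by both procedures
        return shared.size();
    }
}
```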
Mancoridis et al. [55] treated clustering as an optimization problem and used genetic algo-
rithms to overcome the local optima problem of hill-climbing algorithms, which are commonly
used in clustering problems. They implemented a tool called Bunch [54] that can generate better
results faster when users are able to integrate their knowledge into the clustering process. They
also showed how the subsystem structure of a system can be maintained incrementally after the
original structure has been produced.
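The following sketch illustrates one hill-climbing pass in the spirit of Bunch. The score used here (intra-cluster edges minus inter-cluster edges) is a simple stand-in for Bunch's actual modularization quality (MQ) measure.

```java
import java.util.*;

// An illustrative hill-climbing step: repeatedly move a single module to the
// cluster that most improves the score, keeping only improving moves.
public class HillClimb {
    public static int score(Map<String, Integer> cluster, List<String[]> edges) {
        int s = 0;
        for (String[] e : edges)
            s += cluster.get(e[0]).equals(cluster.get(e[1])) ? 1 : -1;
        return s;
    }

    // One pass: try moving each module into each cluster, keep improving moves.
    public static Map<String, Integer> improve(Map<String, Integer> cluster,
                                               List<String[]> edges, int numClusters) {
        Map<String, Integer> best = new HashMap<>(cluster);
        for (String m : cluster.keySet()) {
            for (int c = 0; c < numClusters; c++) {
                Map<String, Integer> trial = new HashMap<>(best);
                trial.put(m, c);
                if (score(trial, edges) > score(best, edges)) best = trial;
            }
        }
        return best;
    }
}
```

Genetic algorithms, as used by Mancoridis et al., replace this greedy step with population-based search to escape the local optima such passes can get stuck in.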
2.2 Program Migration
Program transformation is the act of changing one program into another. The language in which
the program being transformed and the resulting program are written are called the source and
target languages, respectively. Program transformation is used in many areas of software engi-
neering, including compiler construction, software visualization, documentation generation, and
automatic software renovation. There are two main subjects related to our work :Migrating
Procedural Legacy Systems to Object-Oriented ParadigmandRe-Engineering Existing Object-
Oriented Systems.
2.2.1 Migrating Procedural Legacy Systems to Object-Oriented Paradigm
Many researchers have proposed different methodologies for migrating the architecture or the
code of software systems written in a procedural language to comply with object-oriented paradigms.
For instance, Martin and Muller [57] reported case studies on transliterating C source code
to Java using the Ephedra method. The method includes three steps:
• Insertion of C function prototypes,
• Data type and type cast analysis, and
• Transliteration of source code.
By applying the Ephedra method, parts of the C code can be carried over to the Java platform, which
makes it possible to avoid a complete redevelopment of the business logic already present
in the current application. However, the difficulty in using this method is that, since C is a
procedural language and Java is an object-oriented language, not only do the syntax and semantics
of the source code need to be translated, but a paradigm shift is also necessary to move from
procedural to object-oriented code.
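To illustrate the flavor of such a paradigm-preserving transliteration (this example is constructed for illustration and is not Ephedra output):

```java
// The original procedural C code:
//
//   int sum(int *a, int n) {
//       int s = 0;
//       for (int i = 0; i < n; i++) s += a[i];
//       return s;
//   }
//
// A direct transliteration keeps the procedural structure, mapping the pointer
// parameter to a Java array. The paradigm shift to genuinely object-oriented
// code would be a separate, manual step.
public class Transliterated {
    public static int sum(int[] a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) s += a[i];
        return s;
    }
}
```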
Wong and Li [96] proposed a stepwise approach for abstracting object-oriented designs from
procedural source code:
• Abstract the program structure, such as procedure and variable call graphs, and group vari-
ables as well as procedures into classes by using structure similarity and pattern matching,
• Conduct dynamic code partitioning using an execution slice-based technique and visualize
various functionalities in the code, and
• Refine the object-oriented design generated in the previous step, if necessary, with the aid
of simulation.
Web-enabling existing applications offers high leverage and a good return on investment.
The web-enabling process may involve the following issues:
• Wrapping the existing legacy application with Internet technologies. The advantage of this
process is that previous investment into legacy code remains intact. Also, by segregating
the user interface from the business logic module of the legacy application, only that which
is required for making the application “Internet aware” is modified.
• It is important to establish a proof of concept for the proposed solution by web-enabling
a part of the system instead of the whole. This in turn can help in defining the long-term
strategy on the appropriate solution that will best suit the organization.
• An existing legacy application might need to be reconstructed to leverage the existing busi-
ness process.
In [101], Zou and Kontogiannis presented a framework that addresses these issues by migrating
legacy systems into a web-enabled environment using a CORBA wrapper and the SOAP
CORBA IDL translator. The migration process focuses on specifying the identified legacy components
in XML, wrapping them as CORBA objects, and finally deploying the
distributed components into the application server. A scripting language encoded in an
XML format can be used to allow thin clients to communicate with the legacy components.
2.2.2 Re-Engineering Existing Object-Oriented Systems
Computing environments are evolving from mainframe systems to distributed systems. Stand-
alone programs that have been developed using object-oriented technology are not suitable for
these new environments. Hence, many researchers have addressed these issues by re-engineering
the existing object-oriented systems.
Tahvildari and Kontogiannis [80, 86] presented a framework for providing quality-based and
quality-driven re-engineering of object-oriented systems. The framework adopts an incremental
and iterative re-engineering process model that is driven by the soft-goal interdependency graphs.
The re-engineering process includes the following steps as illustrated in Figure 2.2. First, the
source code is represented as an Abstract Syntax Tree. The tree is further decorated using a linker,
with annotations that provide linkage, scope, and type information. Once software artifacts have
been understood, classified and stored during the reverse engineering phase, their behavior can
be made available to the system during the forward engineering phase.
Figure 2.2: The Block Diagram of the Quality-Based Re-engineering Process.
Then, the forward engineering phase aims to produce a new version of a legacy system that operates on the target architecture
and addresses specific non-functional requirements. Finally, the framework uses an iterative
procedure to obtain the new migrant source code by selecting and applying a transformation
which leads to performance or maintainability enhancements. The transformation is selected
from the soft-goal interdependency graphs. The resulting migrant system is then evaluated and
the step is repeated until quality requirements are met.
Fanta and Rajlich [32] re-engineered object-oriented programs to improve program
structure and thus maintainability. A deteriorated C++ application was restructured to move
“misplaced” code and data from their original classes to the classes they naturally belong to.
Gleich and Kohler [37] proposed an approach for transforming object-oriented legacy systems
into modern framework-based architectures in order to improve their maintainability. They also
provided a reference architecture for re-engineering tools and a few tool-prototypes which were
developed at Daimler-Benz.
Xu et al. [97] presented an approach to program restructuring at the functional level based on
the clustering technique with cohesion as the main concern. The approach focused on automated
support for identifying ill-structured or low cohesive functions and providing heuristic advice in
both development and evolution phases. The empirical observations showed that the heuristic
advice provided by the approach can help software designers make better decisions about why and
how to restructure a program.
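As an illustration of the kind of cohesion indicator such approaches rely on, the sketch below computes an LCOM-style count over the attributes each method uses; the exact metric of Xu et al. may differ.

```java
import java.util.*;

// An LCOM-style cohesion indicator: the number of method pairs sharing no
// attribute minus the number of pairs sharing at least one, clamped at zero.
// High values flag low-cohesion units as restructuring candidates.
public class Cohesion {
    public static int lcom(List<Set<String>> methodAttrs) {
        int disjoint = 0, sharing = 0;
        for (int i = 0; i < methodAttrs.size(); i++)
            for (int j = i + 1; j < methodAttrs.size(); j++) {
                Set<String> common = new HashSet<>(methodAttrs.get(i));
                common.retainAll(methodAttrs.get(j));
                if (common.isEmpty()) disjoint++; else sharing++;
            }
        return Math.max(0, disjoint - sharing);
    }
}
```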
2.3 Architecture Recovery
One of the areas in software architecture is architecture recovery through reverse engineering of
existing implementations. Knowing the architecture of a software system may play an impor-
tant role in maintenance and evolution of the system. This knowledge helps the developer to
know where in the system to modify and what parts of the system will be affected by the change.
Moreover, in order to decompose an existing system, there is a need for an efficient architecture recovery process.
Figure 2.3: The Dali Workbench.
Since architecture recovery has received considerable attention recently, numerous articles
have been published on this topic and various frameworks, techniques and tools have been devel-
oped. Basically, existing knowledge, obtained from experts and design documents, and various
tools are necessary to solve the problem. For instance, Kazman and Carriere presented a workbench
for architectural extraction called Dali [48]. Figure 2.3 illustrates Dali’s architecture. In
this workbench, a variety of lexical-based, parser-based and profiling-based tools are used to ex-
amine a system and extract static and dynamic views to be stored in a repository. Analysis of
these views is supported by visualization and specific analysis tools. They enable an interaction
with experts to control the recovery process until the software architecture is reconstructed.
Another architecture recovery approach, called the Architecture Recovery Method (ARM), was
proposed by Guo et al. in [42]. ARM is a semi-automatic analysis method for reconstructing
architectures based on the recognition of architectural patterns. Existing knowledge gained from
design documentation is used to define queries for potential pattern instances which are then ap-
plied automatically to extracted and fused source model views. Human evaluation is required to
determine which of the detected pattern instances are intended and which are false positives or
false negatives. ARM supports patterns at various abstraction levels and uses lower-level patterns
to build higher-level patterns and composite patterns. In this way the approach is aimed particu-
larly at systems that have been developed using design patterns whose implementations have not
eroded over time.
Dominance analysis is a fundamental concept in compiler optimizations and has been used
extensively to identify loops in basic block graphs [61]. It allows one to locate subordinated soft-
ware elements in a rooted dependency graph. Dominance analysis on call graphs of procedural
language applications has been used in reverse engineering to identify modules and subsystems
and recover system architectures [17, 26, 36]. Cimitile and Visaggio [26] first introduced domi-
nance analysis as a method to identify related parts of an imperative system. This idea was further
elaborated on in [17, 36]. The authors applied dominance analysis on call graphs of procedural
language applications to identify modules and subsystems. In this research, we explore the use
of dominance analysis to identify services from an object-oriented application.
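The standard iterative data-flow formulation of dominance can be sketched directly: dom(root) = {root}, and for every other node n, dom(n) = {n} ∪ ⋂ dom(p) over all call-graph predecessors p of n. A node d dominates n if every path from the root to n passes through d, which is what lets dominance analysis expose subordinated elements.

```java
import java.util.*;

// A small sketch of dominance analysis on a rooted call graph, using the
// classic iterative fixed-point computation. Every node must appear as a key
// in the call graph (leaves map to an empty set).
public class Dominance {
    public static Map<String, Set<String>> dominators(
            Map<String, Set<String>> callGraph, String root) {
        Set<String> all = callGraph.keySet();
        Map<String, Set<String>> dom = new HashMap<>();
        for (String n : all)
            dom.put(n, n.equals(root) ? new HashSet<>(Collections.singleton(root))
                                      : new HashSet<>(all));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String n : all) {
                if (n.equals(root)) continue;
                Set<String> newDom = null;
                for (String p : all)  // predecessors of n: nodes p that call n
                    if (callGraph.get(p).contains(n)) {
                        if (newDom == null) newDom = new HashSet<>(dom.get(p));
                        else newDom.retainAll(dom.get(p));
                    }
                if (newDom == null) newDom = new HashSet<>();
                newDom.add(n);
                if (!newDom.equals(dom.get(n))) { dom.put(n, newDom); changed = true; }
            }
        }
        return dom;
    }
}
```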
2.4 Software Reuse
Software reuse enables applications to be developed faster and less expensively. It also offers
numerous other benefits, including:
• Return on Investment. Components built or purchased by a company for one particular
project can be reused in future projects, maximizing the company’s return on investment.
• Adaptability. With component-based development (CBD), applications can be easily adapted
to respond to changing business needs. The modular nature of components enables them
to be easily modified, added, deleted or swapped to provide new or enhanced functionality.
• Reliability. Reusing software components decreases the risk of operational glitches be-
cause the components have already been previously tested in other applications.
Current software reuse techniques include object-orientation, component-based software development,
and service-based development. In this section, we review two topics on software reuse
which are relevant to this research work: Identification of Reusable Components in Source Code
and Creation of Services from Legacy Systems.
2.4.1 Identification of Reusable Components in Source Code
Re-engineering legacy systems into component-based systems involves identifying reusable pieces,
or components, of the legacy system so that the system can be restructured using those pieces.
These components are actually modules of the system’s code that perform certain business func-
tions independently by processing a specific set of data. Once such components are identified
in the system, they can be “mined”, or extracted, and reused to build a component-based sys-
tem [39].
The component identification exercise first requires the software developer to gain an under-
standing of the legacy system. A software system can be understood in the following terms:
• Different elements of the system such as programs, jobs, and data files.
• Relationships that exist between those elements. Also different views can be constructed
based on these elements and their relationships to each other, for instance, a call graph can
be created to show the relationship between various programs.
Once we gain an understanding of how the legacy system is built, we need to break the system
down into components. This can be accomplished by selecting certain points within the system
and expanding the boundaries of those points until all related system elements are included within
the boundaries. The process of expanding these boundaries may be driven primarily by system
queries, documentation on the system, its maintenance history, and the knowledge of those who
have worked with the system in the past.
Component identification approaches can be classified into two categories [39]: Data-Centric
Identification and Event-Centric Identification. The data-centric approach to component
identification involves analyzing the different types of data within the system, identifying the
business functions performed on each type of data and pinpointing where each business function
is performed throughout the system. Once a unique, independent business function is identified
and isolated, it can then be segregated as a component. The event-centric approach to component
identification is used to identify components in event-driven systems such as online Customer
Information Control System (CICS) programs. Most online CICS programs are driven by events
generated either by user input or internal programs. In an event-driven system, any time an
event takes place, specific code within the system is executed. Components can be identified by
triggering an event and isolating the specific business functions that result from that event. In this
research, we focus only on the Data-Centric Identification approaches.
Caldiera and Basili introduced the Computer Aided Reuse Engineering (Care) system, which
describes an algorithmic approach for program understanding, to support identifying reusable
components using a user-defined reusability attribute model based on software metrics in the
context of a procedural paradigm [18].
Etzkorn and Davis presented an approach for identifying reusable classes from object-oriented
systems based on the understanding of comments and identifiers in the source code [31]. Their
tool CHRis uses natural-language techniques to help users decide whether a class implements
certain useful functionality.
In [4], Bansiya and Davis introduced a Quality Model for Object-Oriented Design (QMOOD)
which measures functional, structural and relational details of the system based on high-level
attributes. In the model, they calculate reusability based on coupling, cohesion, and design size.
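As an illustration, the commonly cited QMOOD reusability equation combines four normalized design metrics; the weights below follow that published formulation, but this sketch is an approximation, not a faithful reimplementation of the model.

```java
// A sketch of QMOOD-style reusability scoring over normalized design metrics:
// reusability = -0.25*coupling + 0.25*cohesion + 0.5*messaging + 0.5*designSize.
// Treat the weights as the commonly cited values, not a verified reproduction
// of Bansiya and Davis's model.
public class Qmood {
    public static double reusability(double coupling, double cohesion,
                                     double messaging, double designSize) {
        return -0.25 * coupling + 0.25 * cohesion
                + 0.5 * messaging + 0.5 * designSize;
    }
}
```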
Shin and Kim proposed techniques for transforming an available object-oriented design into
a component-based design [75]. Their techniques focus on formal model specification and trans-
formation.
None of these methods, however, provides hierarchical structures or proposes the reconstruction
of the system’s original architectural design. We aim to develop techniques for recovering
high-level design, to extract the service hierarchy embedded in object-oriented systems, and to
migrate object-oriented designs to service-oriented architectures.
2.4.2 Creation of Services from Legacy Systems
A software service of a software system is an abstract resource that represents a capability of
performing tasks that represent a coherent functionality, from the points of view of both the
provider and the requester of the software [40]. A service should have a well-defined functional
interface and be easily discovered and accessed [99]. A service-based development paradigm,
or services model [34], is one in which components are viewed as services. In this model, services
can interact with one another and be providers or consumers of data and behavior. Some
of the defining characteristics of service-based technologies include modularity, availability, de-
scription, implementation-independence, and publication [34]. In the service-based development
paradigm, a primary focus is upon the definition of the interface needed to access a service (de-
scription) while hiding the details of its implementation (implementation-independence).
Gannod et al. described an architecture-based approach for the creation of services from
legacy components using wrapping (i.e., adapters) and the subsequent integration of these services
with service-requesting client applications [35]. The technique utilizes an architecture description
language to describe components as services and achieves run-time integration using the Jini [47]
middleware technology. The methodology involves two steps for creating services: (i) specification
of components as services; and (ii) generation of services using proxies via the construction
of appropriate adapters and glue code. These services are consequently registered and made
available on a network.
Mehta and Heineman [59, 60] integrated the concepts of features, regression tests, and
component-based software engineering (CBSE) into an approach for evolving procedural legacy
systems. The methodology is divided into three parts: i) selecting test cases by considering
features that need evolution; ii) executing the selected test cases using code profilers to locate the
source code that implements the features, then analyzing and refactoring the located source code to create
components; and iii) comparing pre- and post-evolution maintenance costs.
2.5 Summary
In this chapter, we have reviewed the four principal research fields upon which this thesis is founded:
Program Comprehension, Program Migration, Architecture Recovery, and Software Reuse. The
aim of this chapter is to provide a general background to existing and ongoing research in these
areas. In subsequent chapters, we will present our own contributions in more detail, and also
present a detailed analysis of our approach in comparison to closely related work.
Chapter 3
Service-Oriented Componentization
Framework
Since many competitive services have already been implemented in existing systems, leveraging
the value of an existing system by exposing all or parts of it as services within a service-oriented
environment has become a major concern in today’s industry. The identification of functions suitable
for exposure as services can be seen as an instance of a more generic problem of functional
decomposition of existing systems. To reuse the identified services and migrate the existing
system’s implementation into a service-oriented environment, one needs to package the identified
services into well-documented and self-contained components, during the forward engineering
phase.
In this research, we develop a service-oriented componentization framework for the Java soft-
ware system, which decomposes an existing object-oriented system to re-modularize the existing
assets to support service functionality. More specifically, the proposed framework automatically
supports: i) identifying critical business services embedded in an existing Java system, ii) realizing
each identified service as a self-contained component, and iii) transforming the object-oriented
design into a service-oriented architecture. We name the proposed componentization
framework the SOC4J framework.
This chapter outlines the proposed SOC4J framework, while the details are discussed more
thoroughly in subsequent chapters.
Figure 3.1: The Architecture of the Service-Oriented Componentization Framework.
3.1 Framework Overview
The proposed SOC4J framework uses graph representations of an existing object-oriented soft-
ware system and graph transformations to identify business services embedded in the system.
In this research, we are interested in the reverse engineering challenge. Service identification is
complicated by the usual obstacles of having to deal with potentially large and poorly structured
existing systems. Identifying these service candidates for packaging as reusable components
would require analysis of massive amounts of legacy code or at least graph representations of
the code. Additionally, it would require intervention of people with background in the business
domain to judge what functions are likely to make successful services. The identification of func-
tions suitable for exposure as services can be seen as an instance of a more generic problem of
the functional decomposition of existing systems. Here, we are required to abstract the code,
or an alternative code representation (e.g., XML or graphs), to higher-level representations that
describe the system architecture in terms of its functional units.
Furthermore, the framework realizes each identified service into a self-contained component
and reconstructs the object-oriented design into a service-oriented architecture. To reuse the iden-
tified services and migrate the existing system’s implementation into a component-based archi-
tecture, it is necessary to package the identified services into well-documented and self-contained
components. Service packaging needs techniques for automatically extracting the relevant pro-
cedural elements from the existing system and creating an interface for components. Also, the
restructuring of object-oriented systems requires a comprehensive framework to relate refactor-
ing operations and software transformations with non-functional requirements. As illustrated in
Figure 3.1, the proposed componentization framework is comprised of four stages: Architecture
Recovery, Service Identification, Component Generation, and System Transformation. The
following sections elaborate on each of these stages.
3.2 Architecture Recovery
Software architecture recovery aims at reconstructing views on the architecture as-built. Effective
system reuse and evolution require both the “big picture” and the lower level dependencies be-
tween portions of the source code. As noted earlier, identifying functions suitable for exposure as
services is an instance of the more generic problem of functionally decomposing existing systems:
the code, or an alternative code representation (e.g., XML or graphs), must be abstracted to
higher-level representations that describe the system architecture in terms of its functional units.
In the architecture recovery stage, we aim to create a framework that has methodological
and technological steps to recover higher-level design and architecture representations of existing
software systems based on source code artifacts. This includes the creation of a suitable represen-
tation of design and architectural models that reflect the functional decomposition of the system.
To distinguish them from each other, design models are more detailed and refer to different parts
of a system, whereas architectural models are more abstract and refer to the system as a whole.
There are two goals we are trying to achieve at this stage: i) building complete data models
for Java source code at different abstraction levels to support a wide range of structural analyses
and recovery, and ii) establishing a repository of relationships among classes and interfaces which
can easily be queried in the service identification stage.
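The kind of queryable fact repository envisioned here can be sketched as a store of (subject, relation, object) triples; the relation names are illustrative.

```java
import java.util.*;

// A toy sketch of a source-code fact repository: extracted relationships among
// classes and interfaces are stored as (subject, relation, object) triples and
// queried by subject and relation type during service identification.
public class FactRepository {
    private final List<String[]> facts = new ArrayList<>();

    public void add(String subject, String relation, String object) {
        facts.add(new String[]{subject, relation, object});
    }

    public List<String> query(String subject, String relation) {
        List<String> result = new ArrayList<>();
        for (String[] f : facts)
            if (f[0].equals(subject) && f[1].equals(relation)) result.add(f[2]);
        return result;
    }
}
```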
3.3 Service Identification
Identifying critical business services embedded in an existing Java system is one of the primary
tasks of the SOC4J framework. Essentially, the service identification process of the SOC4J framework
identifies related modules in the system. This process is based on the analysis of the
architectural information recovered in the previous stage.
A business service of a software system is an abstract resource that represents a capability
of performing tasks that represent a coherent functionality from the points of view of both
the provider and the requester. In order to clearly describe and automate the service identification
process, we categorize the services embedded in an object-oriented system into two classes:
i) Top-level services that are not used by another service but may contain a hierarchy of low-level
services further describing the service, and ii) Low-level services that are underneath a top-level
service and may be agglomerated with other low-level services to yield a new service with a
higher level of granularity. Furthermore, a formal description needs to be developed for each
service. Such descriptions should document possible dependencies between service invocations,
besides syntactic information on the number and types of parameters. Such descriptions are crucial
for developers implementing applications based on the extracted services and should therefore
be presented in a way understandable to them.
In the service identification stage, we aim to identify both the top-level services and the low-
level services embedded in an existing system. The proposed service identification approach is
supported by a combination of top-down and bottom-up techniques. In the top-down portion of
the process, we identify the top-level services and the atomic services underneath each top-level
service. In the bottom-up portion, we aggregate the atomic services to identify services with
a higher level of granularity, using graph transformations.
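The bottom-up aggregation step can be sketched as a simple graph transformation: atomic services, viewed here as sets of implementing classes, are merged whenever they overlap. The service-as-class-set representation and the overlap-based merge rule are illustrative assumptions for this sketch, not the framework's actual transformation rules.

```java
import java.util.*;

// Sketch of bottom-up service aggregation: atomic services (modeled as sets
// of implementing class names) that share a class are merged into a single
// coarser-grained service. The merge criterion is an assumption made for
// illustration only.
public class ServiceAggregator {
    public static List<Set<String>> aggregate(List<Set<String>> atomicServices) {
        List<Set<String>> services = new ArrayList<>();
        for (Set<String> atomic : atomicServices) {
            Set<String> merged = new HashSet<>(atomic);
            // Absorb every existing service that overlaps with this one.
            Iterator<Set<String>> it = services.iterator();
            while (it.hasNext()) {
                Set<String> existing = it.next();
                if (!Collections.disjoint(merged, existing)) {
                    merged.addAll(existing);
                    it.remove();
                }
            }
            services.add(merged);
        }
        return services;
    }
}
```

Applied to three atomic services {A, B}, {B, C}, and {D}, the first two share class B and collapse into one service {A, B, C}, while {D} remains separate.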
3.4 Component Generation
An effective way of leveraging the value of existing systems is to expose their functionalities as
reusable components to a larger number of clients through well-defined component interfaces.
Hence, the identified services should be packaged as components so that they can be deployed
and invoked. Moreover, packaging the identified services into components makes it possible to
migrate the existing system's implementation to a component-based architecture. Service packaging
requires techniques for automatically extracting the relevant procedural elements from the existing
system and creating an interface for each component.
The service-oriented architecture (SOA) encourages individual services to be self-contained.
A self-contained component is a component that contains all code necessary to implement its
services and hence it can be deployed independently. At the third stage of the proposed SOC4J
framework, we realize each top-level service, together with the low-level services underneath it,
as self-contained components. More specifically, for each identified service, we extract
all classes and interfaces necessary for implementing the service, generate an interface
for the service, and package these classes/interfaces together with the interface as a JAR file.
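The packaging step described above can be sketched with the standard java.util.jar API. The entry names and class-file bytes below are placeholders; this is not the framework's actual extraction and packaging code.

```java
import java.io.*;
import java.util.Map;
import java.util.jar.*;

// Illustrative sketch: the extracted class files of a service, plus its
// generated interface, are written into a single JAR in memory. Entry names
// and byte contents are placeholders supplied by the caller.
public class ServicePackager {
    public static byte[] packageService(Map<String, byte[]> classFiles) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream jar = new JarOutputStream(buffer, manifest)) {
            for (Map.Entry<String, byte[]> entry : classFiles.entrySet()) {
                jar.putNextEntry(new JarEntry(entry.getKey()));
                jar.write(entry.getValue());
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Helper for inspecting the result; JarInputStream consumes the manifest
    // separately, so only the packaged class entries are counted here.
    public static int countEntries(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            int n = 0;
            while (in.getNextJarEntry() != null) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```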
As Figure 3.1 depicts, the output of this stage is a repository of self-contained components.
The quality of these components is important for the success of the reuse-driven development
process. Key qualities of good reusable components include correctness, complexity, observ-
ability, testability, customizability, and performance. However, most of these qualities are not
directly measurable. In this thesis, we aim at assessing the reusability of the extracted compo-
nents through the analysis of their interfaces and internal methods. Reusability is a high-level
quality of software components and hence results from the combination and interaction of
many low-level properties. We define a component reusability model that decomposes
reusability into quality properties such as complexity, observability, customizability,
and external dependency.
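As a rough illustration of how such a model might combine the low-level properties, the sketch below aggregates four normalized property scores into a single reusability score. The normalization convention and the equal weighting are assumptions made for this example only, not the framework's definition of the model.

```java
// Illustrative sketch of combining low-level quality properties into a
// reusability score. Each input is assumed to be normalized to [0, 1] with
// higher meaning better (e.g., complexity and external dependency would be
// inverted before the call); the equal weights are an assumption.
public class ReusabilityModel {
    public static double reusability(double complexity, double observability,
                                     double customizability, double externalDependency) {
        return (complexity + observability + customizability + externalDependency) / 4.0;
    }
}
```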
3.5 System Transformation
A component-based system is built by combining and interconnecting the components. There-
fore, the component-based approach supports reusability and flexibility. Based on the compo-
nents that realize the identified business services, transforming the monolithic architecture of an
existing object-oriented system to a more flexible service-oriented architecture is another goal of
the proposed SOC4J framework.
In the system transformation stage, we aim at reconstructing an existing Java system into a
component-based system by using the components generated from the source system. A reference
model for the component-based target system has been presented. The system transformation
process should preserve the functionality of the original system. The surrounding parts of the
system should use the newly extracted components in order to avoid the situation where two sets
of classes providing the same functionality exist in the same system.
As Figure 3.1 shows, the output of this stage is a component-based system providing the same
functionality as the original system.
3.6 Summary
In this chapter, we outlined the proposed service-oriented componentization framework. The role
of each stage of the framework has been discussed. We will present the techniques used within
each stage in the subsequent chapters.
Chapter 4
Architecture Recovery
Software architecture recovery aims at reconstructing views on the architecture as-built. Knowing
the architecture of a software system plays an important role in the maintenance and evolution
of the system. This knowledge helps the engineer determine where in the system to make a
modification and what parts of the system will be affected by the change. Moreover, in order to
componentize an existing system, there is a need for an efficient architecture recovery process.
The first stage of the service-oriented componentization framework is the architecture recovery
stage. There are two goals we are trying to achieve at this stage:
• Building complete data models for Java source code at different levels of abstraction to
support a wide range of structural analysis and recovery, and
• Establishing a repository of relationships among classes and interfaces which can easily be
queried in the service identification stage.
This chapter discusses the two main processes contained in the architecture recovery stage: the
Source Code Modeling process and the Architecture Modeling process. In Section 4.1, we discuss
the UML representation of the XML schemas which we define in this thesis. We explain the source
code modeling process in Section 4.2, while the architecture modeling process is discussed in
Section 4.3. Finally, Section 4.4 summarizes this chapter.
4.1 XML Schema Representation
As designed, the output of each stage of the componentization framework is presented as XML
documents. Before we delve into the processes of each stage in the framework, we need an
understandable and formal way to present the XML schemas we define in each stage.
UML [65] is used as the de facto standard for software development; therefore a need
arises to integrate XML schemas into UML-based software development processes. Not only is
the production of XML schemas out of UML models required, but also the integration of XML
schemas as input into the development process, because standard data structures and document
types are part of the requirements [7]. In this section, we describe the UML representation of
the XML schemas that we define in the rest of the thesis.
4.1.1 UML Profile for XML Schemas
Existing work on representing XML schemas in UML has emerged from approaches to platform
specific modeling in UML and transforming these models to XML schemas, with the recognized
need for UML extensions to specify XML schema peculiarities. Booch et al. first presented an
approach to modeling XML schemas using UML notation in [11]. Although based on a prede-
cessor to XML schemas, it introduced UML extensions addressing the modeling of elements and
attributes, model groups, and enumerations that can also be found in recent approaches. Bernauer
et al. [7] summarized and compared the main recent approaches to representing XML schemas in
UML as follows:
• Carlson [19] described an approach based on XMI rules for transforming UML to XML
schemas. Carlson introduced a UML profile which addresses most XML schema con-
cepts, except for simple content complex types, global elements and attributes, and identity
constraints. Regarding semantic equivalence, the profile has some weaknesses in its repre-
sentation of model groups, i.e., sequence, choice, and all elements in XML schemas.
• Provost [67] addressed some of the weaknesses of [19] by covering the representation of
enumerations and other restriction constraints, and of list and union type constructors, al-
though the latter does not conform to UML.
• David Carlson [19] defined a UML profile for representing XML schemas that was based
on the XML conceptual models discussed in [27]. This UML profile added some
enhancements regarding simple types and notations.
• Routledge et al. [71] pointed out the importance of separating the conceptual schema
(i.e., the platform independent model) from the logical schema (i.e., the platform specific
model). This separation is not considered in the other approaches. They treated the
logical schema as a direct, one-to-one representation of the XML schema in terms of a UML
profile. The profile that they defined covers almost all XML schema concepts, but several
of its representations do not conform to UML.
• Bernauer et al. [8] adapted the approach proposed in [71], aiming at a one-to-one represen-
tation of XML schemas in a UML profile. Their approach builds on the existing UML
profiles for XML schemas, with some improvements and extensions.
4.1.2 Representing XML Schemas in UML
By applying a UML profile, we represent the XML schemas defined in this research in UML
notation. We propose three criteria for choosing an existing UML profile for XML schemas:
1. The UML profile provides a semantically equivalent representation of an XML schema in
UML, supporting a bijective mapping between both representations. In order to satisfy this
requirement, the profile has to address the whole range of XML schema concepts such that
any XML schema can be expressed in UML.
2. The UML profile supports round-trip engineering, that is, transformation from XML schema
to UML and back again without loss of schema information.
3. The UML profile maximizes understandability of semantic concepts by users knowledge-
able of UML but not XML schema.
By examining the result of the evaluation performed in [7], we adopt the UML profile de-
fined in [71] to represent the XML schemas throughout this research work. The UML profile
provided in [71] contains classes and associations that represent constructs found in the XML
schema specification [88]. It is intended that every concept in an XML schema has a
corresponding representation in the UML profile (and vice versa). As a result, there is a
one-to-one relationship between the logical (UML notation) and physical (XML schema
notation) representations of an XML schema.
4.2 Modeling Source Code
Fact extraction from source code (i.e., finding pieces of information about the system) is a fun-
damental step of reverse engineering and often has to be performed first. That is, before
performing any high-level reverse engineering analysis or architecture recovery activities, the avail-
able information in the source code has to be extracted and aggregated in a fact base. Such a fact
base forms the foundation for the further analysis tasks that are conducted next. We aim to build a
complete data model set for Java source code at different levels of abstraction to support a wide
range of structural analysis and recovery. These models are essential for representing the sys-
tem at the source code level and for computing reusability attributes for each individual class. The
source code models are presented as XML documents and form the Basic View (BView) of the
system [51].
4.2.1 Approach
There are a number of existing meta-models for representing object-oriented software. Most
of them are aimed at Object-Oriented Analysis and Design (OOAD), the most notable example
being the Unified Modeling Language (UML). However, these meta-models represent software at
the design level. Re-engineering requires information about software at the source code level. We
propose an automated approach for modeling the entities of Java software systems at the source
code level. The approach is based on the Java Compiler Compiler (JavaCC) [44], as Figure 4.1
depicts.
Figure 4.1: The Approach for Source Code Modeling.
Source code parser construction tools have been around for several years. The best known of
these are the famous yacc [98] and lex [50] tools from the Unix domain and their GNU versions
bison [10] and flex [33]. These tools, as well as their successors, allow a stream of input data to
be parsed based on two constructs:
• Tokens. A token is a sequence of input characters that has meaning based upon the desired
syntax. The first step in parser construction is to extract tokens from the input stream. This
generally involves the specification of those tokens in some form of regular expressions.
Token extraction is also known as scanning or lexing (for lexical analysis).

• Backus-Naur Form (BNF) Productions. A BNF production is a set of token sequences
that has meaning based upon the desired syntax. For example, the string "2*3+4" can be
abstractly interpreted as "INTEGER MULT INTEGER ADD INTEGER". The second step
in parser construction is to group the tokens together to form the valid sequences for
the desired syntax.
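The first of these two constructs can be illustrated with a small hand-written scanner for the "2*3+4" example above, mapping the input stream to a token sequence. This is illustrative code only, not output generated by JavaCC.

```java
import java.util.*;

// A hand-written scanner for the "2*3+4" example: the token-extraction step
// of parser construction. Token kinds follow the example in the text; this
// is an illustration, not code generated by JavaCC.
public class TokenScanner {
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                // Consume a maximal run of digits as one INTEGER token.
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INTEGER");
            } else if (c == '*') { tokens.add("MULT"); i++; }
            else if (c == '+') { tokens.add("ADD"); i++; }
            else if (Character.isWhitespace(c)) { i++; }
            else throw new IllegalArgumentException("unexpected character: " + c);
        }
        return tokens;
    }
}
```

The second step, grouping these tokens into BNF productions, would then operate on the returned sequence.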
JavaCC offers an excellent toolkit for generating parser classes in Java. JavaCC generates top-
down, recursive descent parsers. The top-down nature of JavaCC allows it to be used with a wider
variety of grammars than other traditional tools, such as yacc and lex. JavaCC also contains all
parsing information in one file (the JavaCC grammar file). The convention is to name this file
with a .jj extension.
The Interpreter in Figure 4.1 is composed of a set of parser classes which are generated by
JavaCC. It parses the Java source code and outputs a set of raw data about the facts. These raw data
sets are passed to the Model Generator, which builds the source code models.
4.2.2 Source Code Models
Figure 4.2: The Meta-Model for Java Package Models.
Source code models represent Java packages, source files, classes, and methods defined in
a class. We define four meta-models for source code models at different levels of abstraction:
JPackage, JFile, JClass, and JMethod. As designed, source code models are exported and stored
as XML documents. Therefore, these meta-models are XML schemas, presented as UML
models by applying the UML profile for XML schemas discussed in Section 4.1.2.
JPackage
JPackage is the XML schema for modeling Java packages. Figure 4.2 illustrates the JPackage
XML Schema in UML.
JFile
JFile is the XML schema for modeling Java source files. Figure 4.3 illustrates the JFile XML
Schema in UML.
Figure 4.3: The Meta-Model for Java Source File Models.
JClass
JClass is the XML schema for modeling Java classes or interfaces. Figure 4.4 illustrates the
JClass XML Schema in UML.
Figure 4.4: The Meta-Model for Java Class/Interface Models.
JMethod
JMethod is the XML schema for modeling Java methods defined in a class or constructors of a
class. Figure 4.5 illustrates the JMethod XML Schema in UML.
Figure 4.5: The Meta-Model for Java Method/Constructor Models.
4.3 Modeling Architecture
In this thesis, the primary goal of architectural modeling is to establish a repository of rela-
tionships among classes and interfaces which can easily be queried in the service identification
stage. The relationships among classes and interfaces occur at different levels of abstraction, such
as the package level, class level, and method level. In the specific context of our work, we ana-
lyze relationships at the class level. Based on the source code models described in Section 4.2.2,
we identify the relationships between the classes/interfaces and build two architectural models at
different levels of abstraction, namely the Class/Interface Relationship Graph (CIRG) and the
Class/Interface Dependency Graph (CIDG). In addition to the CIRG and CIDG, reusability attributes
for each class are computed and integrated into the graphs. The service identification and extraction
tasks in the next stage are performed upon the transformation of these two graphs. The CIRG and
CIDG are exported as XML documents and form the Structural View (SView) of the system [51].
4.3.1 Definitions of Class Relationships
We aim to identify class/interface relationships at the class level. In order to comply with UML,
the types of relationships considered between two classes (or interfaces) in this thesis are inher-
itance, realization, association, aggregation, composition, and usage, which are adapted from
the UML 2.0 superstructure specification [65]. We formalize the relationships so that we can
automatically detect them in an implementation.

In order to formalize class relationships at the implementation level, we extend the class
relationship property set proposed in [41]:
Generalization Property Given two classes, A and B, A may be a specialized form of B, or B
may provide a contract that A agrees to carry out. We define the generalization property as
follows:

    GE : Class × Class → G, where G = {null, extends, implements}    (4.1)

Hence we have GE(A, B) ∈ {null, extends, implements}. GE(A, B) = extends if
class A is a specialized form of class B; GE(A, B) = implements if B serves as the
contract that A agrees to carry out; otherwise, GE(A, B) = null.
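In Java, the generalization property maps directly onto reflection: GE(A, B) = extends when B is A's direct superclass, and GE(A, B) = implements when B is an interface that A directly implements. The restriction to direct supertypes in this sketch is an assumption for illustration.

```java
// Computing the generalization property GE(A, B) of Equation 4.1 via Java
// reflection. Only direct supertypes are examined, which is an assumption
// of this sketch; a full analysis could walk the whole type hierarchy.
public class GeneralizationProperty {
    public enum GE { NULL, EXTENDS, IMPLEMENTS }

    public static GE ge(Class<?> a, Class<?> b) {
        if (b.equals(a.getSuperclass())) return GE.EXTENDS;
        for (Class<?> implemented : a.getInterfaces())
            if (implemented.equals(b)) return GE.IMPLEMENTS;
        return GE.NULL;
    }
}
```

For example, GE(ArrayList, AbstractList) = extends and GE(String, Comparable) = implements.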
Exclusivity Property An instance of class B involved at a given time in a relationship with an
instance of class A can, or cannot, be in another relationship at the same time. We define
the exclusivity property as follows:

    EX : Class × Class → B, where B = {true, false}    (4.2)

Given two classes, A and B, EX(A, B) ∈ {true, false}. The value true states that an in-
stance of class B can take part in another relationship with another instance of class A or
of another class. The value false indicates that it cannot. The exclusivity property only holds
at a given time and does not prevent possible transferals.
Invocation-Site Property Instances of class A, involved in a relationship, send messages to in-
stances of class B. We name all the set of all possible invocation sites:

    all = {field, array field, collection field, parameter, array parameter,
           collection parameter, local variable, local array, local collection}    (4.3)

We distinguish three levels of invocation sites: fields, parameters, and local variables. Also,
we distinguish "simple" invocation sites, arrays, and collections because they imply differ-
ent sets of programming idioms for their declarations and for their uses, which we need to
individualize when detecting the relationships. We define the invocation-site property as
follows:

    IS : Class × Class → 2^all    (4.4)

Given two classes, A and B, IS(A, B) ⊆ all. The values of the IS property describe the
invocation sites for messages sent from instances of class A to instances of class B. There
can be no message sent from class A to class B, i.e., IS(A, B) = ∅, or messages can be
sent from A through a field (respectively a parameter, a local variable) of type B, an array
field, or a field of type collection.
Lifetime Property Given two classes, A and B, the lifetime property constrains the lifetimes of
all instances of class B with respect to the lifetimes of all instances of class A. We define
the lifetime property as follows:

    LT : Class × Class → ‖, where ‖ = {−, +}    (4.5)

Hence we have LT(A, B) ∈ {−, +}. In programming languages with garbage collection,
LT(A, B) = + if all instances of class B are destroyed before the corresponding instances
of class A, and LT(A, B) = − if they are destroyed after. Writing LT(A, B) ∈ ‖ leaves the
times of destruction of instances of classes A and B unspecified.
Multiplicity Property Given two classes, A and B, the multiplicity property specifies the num-
ber of instances of class B allowed in a relationship with class A. We express this property
as follows:

    MU : Class × Class → 2^(N ∪ {+∞})    (4.6)

Hence we have MU(A, B) ⊂ N ∪ {+∞}. For the sake of simplicity, we use an interval
of the minimum and maximum numbers to represent multiplicity. Also, we only consider
multiplicity at the target end of a relationship.
Once the class relationship properties are defined, we can formalize the considered binary
class relationships at the implementation level as six conjunctions of the above five properties. For-
malizations of the binary class relationships are important because i) they provide formal, language-
independent definitions of the relationships for understanding and communication among soft-
ware engineers, and ii) they are the basis of the detection algorithms needed to bridge the gap
between implementation and design [41].
Inheritance Relationship Given two classes, A and B, let A −<IN>→ B represent that there is
an inheritance relationship between A and B, where A is the source class and B is the
target class. The inheritance relationship signifies that class A shares the structure and
behavior of class B and implies an "is-a-kind-of" relationship. We formalize the inheritance
relationship as follows:

    A −<IN>→ B = (GE(A, B) = extends) ∧ (GE(B, A) = null)    (4.7)
Realization Relationship Given two classes, A and B, let A −<RE>→ B represent that there is a
realization relationship between A and B, where A is the source class and B is the target
class. The realization relationship signifies that class A must realize, or implement, the
behavior specified by class B (in the Java case, B is an interface). We formalize the
realization relationship as follows:

    A −<RE>→ B = (GE(A, B) = implements) ∧ (GE(B, A) = null)    (4.8)
Association Relationship Given two classes, A and B, let A −<AS>→ B represent that there is
an association relationship between A and B, where A is the source class and B is the
target class. The UML specifies that an association represents the ability of one instance
of the source class to send a message to an instance of the target class [65]. This is typi-
cally implemented with a pointer or reference instance variable, although it might also be
implemented as a method parameter, or the creation of a local variable. We formalize the
association relationship as follows:

    A −<AS>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧
                 (IS(A, B) ⊆ all) ∧ (IS(B, A) = ∅) ∧
                 (LT(A, B) ∈ ‖) ∧ (LT(B, A) ∈ ‖) ∧
                 (MU(A, B) = [0, +∞]) ∧ (MU(B, A) = [0, +∞])    (4.9)
Aggregation Relationship Given two classes, A and B, let A −<AG>→ B represent that there is an
aggregation relationship between A and B, where A is the source class and B is the target
class. By the UML specification [65], the aggregation relationship is the typical whole/part
relationship. That is, an instance of the target class (the part) is a part of an instance of the
source class (the whole). The aggregation relationship implies a "has-a" relationship and
is exactly the same as an association, with the exception that instances cannot have cyclic
aggregation relationships. We formalize the aggregation relationship as follows:

    A −<AG>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧
                 (IS(A, B) ⊆ {field, array field, collection field}) ∧
                 (IS(B, A) = ∅) ∧
                 (LT(A, B) ∈ ‖) ∧ (LT(B, A) ∈ ‖) ∧
                 (MU(A, B) = [0, +∞]) ∧ (MU(B, A) = [1, +∞])    (4.10)
Composition Relationship Given two classes, A and B, let A −<CO>→ B represent that there is a
composition relationship between A and B, where A is the source class and B is the target
class. Again, by the UML specification [65], the composition relationship is exactly like
aggregation, with the exception that the lifetime of the 'part' is controlled by the 'whole'.
This control may be direct or transitive. That is, the whole may take direct responsibility
for creating or destroying the part, or it may accept an already created part and later pass
it on to some other whole that assumes responsibility for it. We formalize the composition
relationship as follows:

    A −<CO>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) = true) ∧ (EX(B, A) = false) ∧
                 (IS(A, B) ⊆ {field, array field, collection field}) ∧
                 (IS(B, A) = ∅) ∧
                 (LT(A, B) = +) ∧ (LT(B, A) = −) ∧
                 (MU(A, B) = [1, +∞]) ∧ (MU(B, A) = [1, 1])    (4.11)
Usage Relationship Given two classes, A and B, let A −<US>→ B represent that there is a usage
relationship between A and B, where A is the source class and B is the target class. The
UML specifies that a usage relationship is one in which the client (the source) requires
the presence of the supplier (the target) for its correct functioning or implementation [65].
Furthermore, the UML defines five types of usage relationships: i) the call relationship
signifies that the source operation invokes the target operation, ii) the create relationship
signifies that the source class creates one or more instances of the target class, iii) the
instantiation relationship signifies that one or more methods belonging to instances of the
source class create instances of the target class, iv) the responsibility relationship signifies
that the client has some kind of obligation to the supplier, and v) the send relationship
signifies that instances of the source class send signals to instances of the target class. We
formalize the usage relationship as follows:

    A −<US>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧
                 (IS(A, B) ⊆ all − {field, array field, collection field}) ∧
                 (IS(B, A) = ∅) ∧
                 (LT(A, B) ∈ ‖) ∧ (LT(B, A) ∈ ‖) ∧
                 (MU(A, B) = [1, +∞]) ∧ (MU(B, A) = [0, +∞])    (4.12)
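As an illustration of how these formalizations drive the detection algorithms, the sketch below implements the two generalization-based cases, Equations 4.7 and 4.8, given precomputed GE values in both directions. The enum and method names are hypothetical.

```java
// Sketch of relationship detection for the generalization-based cases:
// Equations 4.7 (inheritance) and 4.8 (realization) reduce to checks of
// the GE property in both directions. GE values are assumed to come from
// an earlier analysis pass; names here are illustrative.
public class RelationshipDetector {
    public enum Ge { NULL, EXTENDS, IMPLEMENTS }

    /** Returns "IN", "RE", or null for the pair (A, B), given GE(A,B) and GE(B,A). */
    public static String detect(Ge geAB, Ge geBA) {
        if (geBA != Ge.NULL) return null;           // both equations require GE(B, A) = null
        if (geAB == Ge.EXTENDS) return "IN";        // Equation 4.7: inheritance
        if (geAB == Ge.IMPLEMENTS) return "RE";     // Equation 4.8: realization
        return null;
    }
}
```

The remaining four relationships would additionally test the EX, IS, LT, and MU conjuncts of Equations 4.9 through 4.12.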
4.3.2 Approach
The architecture modeling process identifies all relationships between the classes/interfaces and
represents the identified relationships in directed graphs. The process also computes the basic
reusability attributes for each class in the system. Figure 4.6 illustrates the architecture modeling
process.
Figure 4.6: The Approach for Architecture Modeling.
As described before, the source code models built by the source code modeling process are
exported as XML documents. First, these source code models are parsed by the XML Parser in
Figure 4.6. Then, the Relationship Extractor identifies all relationships described in Section 4.3.1,
and the Metric Generator computes a set of metrics for each class/interface. We define a metric
suite at the class level to represent the basic reusability attributes of each class in the system.
The metric suite is presented in Table 4.1. The definition of each metric is adapted from SDMet-
rics [72]. Finally, the Graph Generator and Graph Transformer generate the CIRG and CIDG,
respectively. We will give formal definitions of the CIRG and CIDG in the following sections.
lines code: The number of lines of non-comment code in a class.

num attr: The number of attributes in a class. The metric counts all properties regardless of their
type (data type, class, or interface), visibility, changeability (read-only or not), and owner scope
(class-scope, i.e., static, or instance attribute). Not counted are inherited properties, and properties
that are members of an association, i.e., that represent navigable association ends.

num ops: The number of methods in a class. Includes all methods in the class that are explicitly
modeled (overriding methods, constructors), regardless of their visibility, owner scope (class-scope,
i.e., static), or whether they are abstract or not. Inherited operations are not counted.

num pub ops: The number of public methods in a class. Same as metric num ops, but only counts
operations with public visibility. Measures the size of the class in terms of its public interface.

num nested classes: The number of inner classes in a class.

setters: The number of operations with a name starting with 'set'. Note that this metric does not
always yield accurate results. For example, an operation settleAccount will be counted as a setter
method.

getters: The number of operations with a name starting with 'get', 'is', or 'has'. Again, note that
this metric does not always yield accurate results. For example, an operation isolateNode will be
counted as a getter method.

fan in: The number of classes/interfaces that depend on this class. This metric counts incoming
plain UML dependencies and usage dependencies.

fan out: The number of classes/interfaces on which this class depends. This metric counts outgoing
plain UML dependencies and usage dependencies.

Table 4.1: The Metric Suite at Class Level
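The setters and getters metrics of Table 4.1 are plain name-prefix counts, which is exactly why the table warns that settleAccount and isolateNode are miscounted. The two counters can be sketched directly:

```java
import java.util.List;

// The setters/getters metrics of Table 4.1 as name-prefix counts. As the
// table itself warns, prefix matching misclassifies names like
// "settleAccount" and "isolateNode"; this sketch reproduces that behavior.
public class NamingMetrics {
    public static long setters(List<String> methodNames) {
        return methodNames.stream().filter(n -> n.startsWith("set")).count();
    }

    public static long getters(List<String> methodNames) {
        return methodNames.stream()
            .filter(n -> n.startsWith("get") || n.startsWith("is") || n.startsWith("has"))
            .count();
    }
}
```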
4.3.3 Class/Interface Relationship Graph

The CIRG captures the UML-compliant relationships explained in Section 4.3.1. The formal
definition of the CIRG is given as follows:

Definition 4.1. A Labeled Directed Graph (LDG) is a tuple Γ(V, E, L_V, L_E, l_V, l_E), where V
is a set of nodes (or vertices), E is a set of edges (or arcs), L_V is a set of node labels, L_E is a set
of edge labels, l_V : V → L_V is a label function that maps nodes to node labels, and l_E : E → L_E
is a label function that maps edges to edge labels.

Definition 4.2. The Class/Interface Relationship Graph (CIRG) of an object-oriented system is
an LDG as defined in Definition 4.1, where V is the set of all classes/interfaces of the system, l_V(v)
returns the full name (i.e., the package name concatenated with the class or interface name) of v for
any v ∈ V, E = {(v, w) ∈ V × V | v references w}, and l_E(e) returns the types of relationships
between the source node and target node of e for any e ∈ E. The type of a relationship is one of
IN, RE, AS, AG, CO, and US, which represent inheritance, realization, association, aggregation,
composition, and usage, respectively.
Each class or interface of a Java system represents a node of the CIRG of the system. We name a node in the CIRG an RClass, and each node is presented and exported as an XML document. The XML schema for each node is depicted in Figure 4.7. The XML schema shows that four types of information about each CIRG node are captured:
• Property: The property field records the name, the type (i.e., class or interface), the package name, and the Java source file name of the corresponding class or interface.

• Characteristics: The characteristics field records the accessibility (i.e., public, protected, or private) and the implementation status (i.e., concrete class or abstract class) of the corresponding class or interface.
• Metrics: The metrics field records the values of the metrics in Table 4.1 for the corresponding class or interface.

• Relationships: The relationships field records all classes or interfaces that have one of the defined relationships with the corresponding class or interface. The type and the direction of the relationship are also stored.
Figure 4.7: The UML Representation of XML Schema for Nodes in the CIRG.
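To make the schema of Figure 4.7 concrete, the following sketch builds a small hypothetical RClass document and reads it back with Python's standard library. The element names follow the four fields described above (property, characteristics, metrics, relationships); the content itself is invented for illustration.

```python
# Sketch: a hypothetical RClass document with the fields from Figure 4.7,
# parsed with the standard library. The values are invented example data.
import xml.etree.ElementTree as ET

doc = """
<RClass>
  <property>
    <name>Booking</name>
    <type>class</type>
    <package>com.uwstar.crs</package>
    <sourceFile>Booking.java</sourceFile>
  </property>
  <characteristics>
    <accessibility>public</accessibility>
    <implementation>concrete</implementation>
  </characteristics>
  <metrics>
    <num_ops>12</num_ops>
    <num_pub_ops>8</num_pub_ops>
  </metrics>
  <relationships>
    <realization><out><interface>com.uwstar.crs.IBooking</interface></out></realization>
  </relationships>
</RClass>
"""

root = ET.fromstring(doc)
print(root.findtext("property/name"))                  # Booking
print(root.findtext("characteristics/accessibility"))  # public
print(root.findtext("relationships/realization/out/interface"))
```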
4.3.4 Class/Interface Dependency Graph
Class dependencies occur when one class uses the services of another class. For example, this
can happen when a class inherits from another, has an attribute whose type is of another class, or
when one of its methods calls a method on an object of another class. Given two classes v and w, let v ⇝ w denote that class v depends upon class w. We formalize the class dependency as follows:

    v ⇝ w  ⟺  v --<IN>--> w ∨ v --<RE>--> w ∨ v --<AS>--> w ∨ v --<AG>--> w ∨ v --<CO>--> w ∨ v --<US>--> w        (4.13)

where v --<X>--> w denotes an edge of type X from v to w in the CIRG.
Now, we are ready to give the formal definition of the CIDG of an object-oriented system:

Definition 4.3. The Class/Interface Dependency Graph (CIDG) of an object-oriented system is an LDG as defined in Definition 4.1, where V is the set of all classes/interfaces of the system, l_V(v) returns the full name (i.e., package name concatenated with class or interface name) of v for any v ∈ V, E = {(v, w) ∈ V × V | v ⇝ w}, L_E = φ, and hence l_E(e) returns an empty label for any e ∈ E.
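The dependency test of Equation (4.13) can be sketched as a small predicate over a labeled edge set. The representation of the CIRG as a dict and the toy edges below are assumptions for illustration, not the thesis's data structures.

```python
# Sketch of the class-dependency test of Equation (4.13): v depends on w
# iff the CIRG records at least one of the six relationship types on the
# edge (v, w). The CIRG is modeled here as a dict mapping edges to label sets.
RELATIONSHIP_TYPES = {"IN", "RE", "AS", "AG", "CO", "US"}

def depends(cirg_edges, v, w):
    """True iff the edge v -> w carries any of the six relationship types."""
    labels = cirg_edges.get((v, w), set())
    return bool(labels & RELATIONSHIP_TYPES)

# Toy CIRG fragment in the spirit of the CRS example (edges invented).
cirg = {("Car", "Vehicle"): {"IN"}, ("Agent", "Booking"): {"AS", "US"}}
print(depends(cirg, "Car", "Vehicle"))   # True
print(depends(cirg, "Vehicle", "Car"))   # False
```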
Again, each class or interface of a Java system represents a node of the CIDG of the system. We name a node in the CIDG a DClass, and each node is presented and exported as an XML document. The XML schema for each node is depicted in Figure 4.8. The XML schema shows that four types of information about each CIDG node are captured:
• Property: The property field records the name, the type (i.e., class or interface), the package name, and the Java source file name of the corresponding class or interface.
• Characteristics: The characteristics field records the accessibility (i.e., public, protected, or private) and the implementation status (i.e., concrete class or abstract class) of the corresponding class or interface.
• Metrics: The metrics field records the values of the metrics in Table 4.1 for the corresponding class or interface.

• Dependency: The dependency field records all classes or interfaces on which the corresponding class or interface depends, and all classes or interfaces that depend on the corresponding class or interface.
Figure 4.8: The UML Representation of XML Schema for Nodes in the CIDG.
4.3.5 An Example: Car Rental System

To clarify the definitions and algorithms proposed in this thesis, we give examples based on a hypothetical software system at appropriate places. The hypothetical system is a Car Rental System (CRS), which consists of agents, customers, and a vehicle repository. The CRS provides two main business services: i) booking cars, and ii) evaluating cars based on the driving records of the customers. Figure 4.9 shows the CIRG of the CRS system, which captures all relationships among the CRS classes as defined in Section 4.3.1.
Figure 4.9: The CIRG of the Car Rental System (CRS).
Figure 4.10 shows the CIDG of the CRS system. Each node represents a class/interface of
the CRS system, and an edge between two classes/interfaces represents a dependency existing
between these two classes/interfaces. By their definitions, the CIRG is a UML-compliant model,
and the CIDG is a further abstraction of the CIRG. That is, the CIRG and CIDG model the
structure of an object-oriented software system at different levels of abstraction.
Figure 4.10: The CIDG of the Car Rental System (CRS).
4.4 Summary
We have discussed the source code modeling process and the architecture modeling process contained in the architecture recovery stage of the SOC4J framework. The source code modeling process builds a complete set of data models for Java source code at different levels of abstraction. Based on these data models, the architecture modeling process establishes a repository of relationships among classes and interfaces that can easily be queried in the next stage of the SOC4J framework.
Chapter 5
Service Identification
An effective way of leveraging the value of legacy systems is to expose their functionalities as services to a larger number of clients. Identifying critical business services embedded in an existing Java system is one of the primary tasks of the proposed SOC4J framework. This is done in the service identification process of the framework, which is based on analysis of the recovered architectural information obtained in the previous chapter. This chapter discusses the service identification strategy and the algorithms used to identify critical business services embedded in an existing object-oriented system.
In Section 5.1, we discuss how a service is described and modeled. We introduce the supporting techniques used in the service identification process in Section 5.2. The service identification process is presented in Section 5.3. Finally, we give a summary of this chapter in Section 5.4.
5.1 Service Representations
A business service within a software system is an abstract resource that represents a capability
of performing tasks that represent a coherent functionality from the points of view of both the
provider and the requester [40]. We categorize services that are embedded in an object-oriented
CHAPTER 5. SERVICE IDENTIFICATION 55
system into two categories:
• Top-Level Services (TLS): A top-level service is a service that is not used by any other service of the system. However, it may contain a hierarchy of low-level services that further describe the service. From the requester's point of view, top-level services are services provided by the system that can be accessed independently; top-level services are hence independent of each other.

• Low-Level Services (LLS): A low-level service is a service that sits underneath a top-level service and may be agglomerated with other low-level services underneath the same top-level service to yield a new service with a higher level of granularity (i.e., the desired business result).
The SOC4J framework is designed to identify both the top-level services and the low-level services embedded in an existing object-oriented system. In order to clearly describe and automate the identification process, we describe an identified service (either a top-level service or a low-level service) as a tuple:

    (name, C_F, SHG)

In the above tuple, name is the name of the service. C_F is the facade class set of the service; the facade class set contains the classes/interfaces that directly provide the functionality of the service to the outside world. SHG is the Service Hierarchy Graph (SHG) of the top-level service represented by the tuple. The SHG is defined as follows:
Definition 5.1. The Service Hierarchy Graph (SHG) associated with a top-level service is a rooted LDG, where the root r ∈ V represents the top-level service, V \ {r} represents the set of low-level services contained in the top-level service, l_V(v) returns the C_F set of v for any v ∈ V, E = {(v, w) ∈ V × V | v contains w}, L_E = φ, and hence l_E(e) returns an empty label for any e ∈ E.
The SHG shows the structural relationships between the services underneath a top-level service. It gives a high-level representation of services that is understandable by both developers and business experts. Furthermore, the SHG describes the modularization of its corresponding top-level service. There is no SHG associated with a low-level service, that is, SHG = φ for a low-level service, because each low-level service is already presented in the SHG of its top-level service. The SHGs of all top-level services of an object-oriented software system form the service view (ServView) of the system.
The identified services (represented as tuples) are exported and stored as XML documents.
The XML schema for services is illustrated in Figure 5.1.
Figure 5.1: The UML Representation of XML Schema for a Service.
5.2 Supporting Concepts
The proposed service identification approach involves a set of techniques such as graph transformations, dominance analysis on directed graphs, and evaluation of the modularization of a system represented by directed graphs. It is helpful to introduce these techniques prior to explaining the service identification process.
5.2.1 Graph Techniques
Graphs can be used to describe complex object structures in a mathematical way. In the context of software engineering, we can use graphs to formalize object-oriented languages and concepts, especially the UML. In this thesis, we apply graph techniques to assist in service identification. The important graph concepts and techniques involved in this thesis are reviewed as follows:
Definition 5.2. Let G = (V, E) be a directed graph (DG), where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. Given a node v ∈ V, the in-degree of v is the number of edges directed into v, and the out-degree of v is the number of edges directed out of v. A root of G is a node whose in-degree is zero. G is said to be a rooted directed graph iff there is only one root in V.
Definition 5.3. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. Given two nodes v ∈ V and w ∈ V, a path from vertex v to vertex w is a sequence of consecutive edges leading from v to w. A cycle is a path from a node back to the same node. Node w is said to be reachable from node v if there is a path from v to w. G is a directed acyclic graph (DAG) iff there is no cycle in G.
Definition 5.4. A rooted tree is a DG G = (V, E), where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G, such that

1. there is a unique node in V (called the root) which has in-degree 0;

2. every node in V except the root has in-degree 1; and

3. there is a path from the root to every other node in G.
Definition 5.5. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. G is connected if the underlying undirected graph of G is connected, while G is strongly connected if there is a path in G between every pair of nodes in V.
Definition 5.6. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. A connected component of G is a maximal (though not necessarily maximum) connected subgraph of G. A strongly connected component of G is a maximal (though not necessarily maximum) strongly connected subgraph of G. A rooted component is a subgraph of G that consists of a unique root and the collection of all nodes w such that there is a path from the root to w.
Definition 5.7. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. A clique in G is a collection of nodes in V such that each pair of nodes in the collection is joined by an edge. A k-clique is a clique containing exactly k nodes.
Figure 5.2: An Example of a Directed Graph.
For example, given the directed graph G in Figure 5.2, there are two connected components: graphs (a) and (b) in Figure 5.3. The only strongly connected component of G is graph (c) in Figure 5.3. Note that neither the subgraph {2, 5, 7} nor {5, 6, 7} is a strongly connected component of G, because they are not maximal. Graphs (d) and (e) in Figure 5.3 are two rooted components of graph (a) in Figure 5.3. The set {2, 3, 7} is a 3-clique in graph G.
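The first notion in Definition 5.6 (connected components of the underlying undirected graph) can be sketched with a small union-find. The graph below is invented toy data, not the graph of Figure 5.2.

```python
# Sketch: connected components per Definitions 5.5-5.6, computed on the
# underlying undirected graph with union-find. The edge list is invented.
from collections import defaultdict

def connected_components(nodes, edges):
    """Partition nodes into components of the underlying undirected graph."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for v, w in edges:                     # direction is ignored
        parent[find(v)] = find(w)
    groups = defaultdict(set)
    for v in nodes:
        groups[find(v)].add(v)
    return sorted(groups.values(), key=min)

nodes = {1, 2, 3, 4, 5}
edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
print(connected_components(nodes, edges))  # [{1, 2, 3}, {4, 5}]
```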
Figure 5.3: (a) A connected component of the directed graph G in Figure 5.2. (b) The other connected component of G. (c) The only strongly connected component of G. (d) A rooted component of graph (a). (e) The other rooted component of graph (a).
5.2.2 Dominance Analysis
Dominance analysis is a fundamental concept in compiler optimizations and has been used extensively to identify loops in basic block graphs [61]. It allows one to locate subordinated software elements in a rooted dependency graph. Dominance analysis on call graphs of procedural language applications has been used in reverse engineering to identify modules and subsystems and to recover system architectures [17, 26, 36]. In this thesis, we explore the use of dominance analysis on SHGs, which assists us in identifying low-level services underneath a top-level service.

Dominance is a relation between nodes in a rooted directed graph. This relation can be formally defined as follows:
Definition 5.8. Let G = (V, E, r) be a rooted directed graph, where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G. Given any two different nodes v ∈ V and w ∈ V, node v dominates node w, written v dom w, iff every path from root r to w contains v. Node v directly dominates node w, written v ddom w, iff v dom w and every other node that dominates w also dominates v. Node v strongly directly dominates node w, written v sddom w, iff v ddom w and v is a predecessor of w.
Definition 5.9. Let G = (V, E, r) be a rooted directed graph, where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G. The dominance tree corresponding to G is a tree T = (V, E_d, r), where E_d = {(v, w) ∈ V × V | v ddom w ∨ v sddom w}. A ddom subtree of T is a subtree whose root has an incoming ddom edge. An sddom subtree of T is a subtree whose root has an incoming sddom edge. A consolidation subtree of the dominance tree is a subtree that contains only sddom edges. A maximal consolidation subtree is a maximal subtree that contains only sddom edges.
Figure 5.4: (a) A Simple Directed Graph. (b) The Dominance Tree Corresponding to the Graph in (a). (c) The Two Maximal Consolidation Subtrees of the Dominance Tree in (b).
Figure 5.4 shows a simple rooted directed graph, the corresponding dominance tree, and the maximal consolidation subtrees in the dominance tree. Note that the subtree {6, 9} is a ddom subtree and {2, 4, 5, 8} is an sddom subtree. The subtree {7, 10} is a consolidation subtree but not a maximal consolidation subtree, because it is not a maximal subtree that contains only sddom edges. In Figure 5.4, the dominance tree is constructed from an acyclic graph. However, this is not a necessary condition: we can construct a dominance tree from any directed graph as long as it is rooted.
By Definitions 5.8 and 5.9, we can observe the following properties of dominance trees:
Property 5.1. Given a rooted directed graph G = (V, E, r), where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G, let T be the dominance tree corresponding to G. For each node (except the root) in a subtree (either a ddom subtree or an sddom subtree) of T, there is no incoming edge in E from any node outside the subtree.
Property 5.2. Given a rooted directed graph G = (V, E, r), where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G, let T be the dominance tree corresponding to G. For each node (except the root) in a consolidation subtree of T, there is no incoming edge in E from any other node (either inside or outside the subtree) except its parent in T.
In the analysis process of reverse engineering, it is essential to have an effective way of abstracting information. The dominance tree provides such an abstraction. More importantly, it represents a high-level modularization of the software system through its branches: each branch of the dominance tree represents a concept or high-level functionality of the system. In the context of object-oriented design, one benefit of using dominance trees in program comprehension is the reduction of the visualization complexity of the class dependency graph by decreasing the large number of edges. In the class dependency graph of a real-world software system, a class may be referenced by hundreds of classes, and a reduction to a single edge in the dominance tree greatly clarifies the graphic.
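The dominance relation of Definition 5.8 can be sketched with the classic iterative data-flow formulation: dom(r) = {r}, and for v ≠ r, dom(v) = {v} ∪ (the intersection of dom(p) over all predecessors p of v). This is a minimal illustration on an invented diamond-shaped graph, not the graph of Figure 5.4 or the thesis's implementation.

```python
# Sketch: dominator sets of a rooted directed graph per Definition 5.8,
# computed by iterating the set equations to a fixed point.

def dominators(nodes, edges, root):
    preds = {v: set() for v in nodes}
    for v, w in edges:
        preds[w].add(v)
    dom = {v: set(nodes) for v in nodes}   # start from "everything dominates v"
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v == root:
                continue
            if preds[v]:
                new = {v} | set.intersection(*(dom[p] for p in preds[v]))
            else:
                new = {v}                  # unreachable from root except via itself
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom

# A diamond 1 -> {2, 3} -> 4: node 4 is dominated only by itself and the root,
# so in the dominance tree it hangs directly under 1, not under 2 or 3.
dom = dominators({1, 2, 3, 4}, [(1, 2), (1, 3), (2, 4), (3, 4)], root=1)
print(dom[4])  # {1, 4}
```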
5.2.3 Modularization Quality Metric
The modularization quality (MQ) metric was first introduced in [54]. It has been used in a number of software engineering projects to evaluate the quality of software modularization achieved by graph partitioning [24, 76]. Basically, the MQ metric measures the difference between the average intra-connectivity and the average inter-connectivity of a system, and shows how well the system is structured. In this thesis, we use the MQ metric to evaluate how well a top-level service is modularized by its low-level services.
Let C = (G_1, G_2, ..., G_k) be a partition of a given graph G(V, E), where V represents all nodes in G and E represents all edges in G. The MQ metric of the system, which is represented by the graph G, is defined as follows:

    MQ(C, G) = [ Σ_{i=1}^{k} s(G_i, G_i) ] / k  −  [ Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} s(G_i, G_j) ] / [ k(k−1)/2 ]        (5.1)

The function s() used in Formula (5.1) is defined as the ratio of the actual number of edges between two subsets of V of graph G to the maximum number of possible edges between those two sets. Let U and W be two subsets of V (i.e., U ⊆ V and W ⊆ V); then we have

    s(U, W) = e(U, W) / (|U| |W|)        (5.2)

where e(U, W) denotes the number of edges connecting a vertex in U to a vertex in W.
The MQ metric determines the quality of the modularization quantitatively as the trade-off between inter-connectivity and intra-connectivity of subsystems. This trade-off is based on the assumption that well-designed software systems are organized into cohesive subsystems that are loosely interconnected. Hence, the MQ metric is designed to reward the creation of highly cohesive clusters, and to penalize excessive coupling between clusters. The value of the MQ metric is
between −1 (no internal cohesion) and 1 (no external coupling). A straightforward consequence is that a higher MQ value can be interpreted as better modularization, since it corresponds to a partition with either fewer edges connecting vertices from distinct blocks, or more edges lying within the same blocks of the partition, which is what most clustering or modularization algorithms aim to achieve [24].
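The MQ computation of Formulas (5.1) and (5.2) can be sketched directly. The partition and edge list below are invented toy data; s() follows e(U, W) / (|U||W|) literally.

```python
# Sketch: MQ per Formula (5.1), with s(U, W) = e(U, W) / (|U||W|) from
# Formula (5.2). Toy partition and edges, invented for illustration.

def s(edges, U, W):
    """Ratio of actual edges between U and W to the maximum possible."""
    e = sum(1 for (v, w) in edges if (v in U and w in W) or (v in W and w in U))
    return e / (len(U) * len(W))

def MQ(partition, edges):
    """Average intra-connectivity minus average inter-connectivity."""
    k = len(partition)
    intra = sum(s(edges, G, G) for G in partition) / k
    pairs = [(i, j) for i in range(k - 1) for j in range(i + 1, k)]
    if not pairs:                  # a single-block partition has no coupling term
        return intra
    inter = sum(s(edges, partition[i], partition[j]) for i, j in pairs) / len(pairs)
    return intra - inter

clusters = [{1, 2}, {3, 4}]
edges = [(1, 2), (2, 1), (3, 4), (1, 3)]
print(MQ(clusters, edges))  # 0.125: intra (0.5 + 0.25)/2 minus inter 0.25
```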
5.3 The Proposed Processes
In the SOC4J framework, we aim to identify critical business services embedded in an existing Java system. Our service identification process, as shown in Figure 5.5, is supported by a combination of top-down and bottom-up techniques.
Figure 5.5: Processes in Service Identification Stage.
In the top-down portion of the process, we identify the top-level services and the atomic services (to be discussed later) underneath each top-level service. In the bottom-up portion, we aggregate the atomic services to identify services with a higher level of granularity (reusable services). We delve into these two portions in the subsequent two sections.
5.3.1 Top-Level Service Identification

The top-level service identification process is the top-down portion of the proposed service identification process. According to the definition of a top-level service (introduced in Section 5.1), the top-level services of a software system partition the system into independent parts. Each of these independent parts represents a service to the outside world from the user's point of view. We identify the services of a system by starting with its top-level services, and then extracting a service hierarchy for each top-level service to identify the low-level services underneath it.
Algorithm 5.1: CIDG-Transformation

Input: CIDG, the CIDG of the system.
Output: MCIDGs, a set of MCIDGs.

   // decompose the CIDG into connected components
1  MCIDGs ← φ
2  CGraphs ← ConnectedComponents(CIDG)
   // decompose each connected component into a set of rooted components
3  foreach graph g ∈ CGraphs do
4      RGraphs ← RootedComponents(g)
5      MCIDGs ← MCIDGs ∪ RGraphs
6  end
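The rooted-component step of Algorithm 5.1 can be sketched as follows: each root (a node with in-degree zero) yields the subgraph of everything reachable from it. This is a simplified illustration that works on the whole graph at once rather than per connected component, and the toy graph is invented.

```python
# Sketch of the decomposition in Algorithm 5.1: split a dependency graph
# into rooted components, one per in-degree-zero node.

def rooted_components(nodes, edges):
    succs = {v: set() for v in nodes}
    indeg = {v: 0 for v in nodes}
    for v, w in edges:
        succs[v].add(w)
        indeg[w] += 1
    roots = [v for v in nodes if indeg[v] == 0]
    components = []
    for r in roots:
        reached, stack = {r}, [r]     # depth-first reachability from the root
        while stack:
            for w in succs[stack.pop()]:
                if w not in reached:
                    reached.add(w)
                    stack.append(w)
        components.append((r, reached))
    return components

# Two entry points sharing a utility class: node 4 appears in both MCIDGs.
comps = rooted_components({1, 2, 3, 4}, [(1, 3), (3, 4), (2, 4)])
for root, reached in sorted(comps):
    print(root, sorted(reached))   # root 1 reaches {1, 3, 4}; root 2 reaches {2, 4}
```

Note that, as in the thesis, the resulting components may overlap: a class reachable from two entry points belongs to both MCIDGs.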
To identify the top-level services of an existing object-oriented system, the first step is to identify the entry points of the system. In Chapter 4, we modeled the existing system as directed graphs: the class/interface relationship graph (CIRG) and the class/interface dependency graph (CIDG). At this stage, we decompose the CIDG into a set of connected components, each with a unique root, such that each component is an independent subgraph of the CIDG. Algorithm 5.1 describes the decomposition process.
In Algorithm 5.1, function ConnectedComponents() computes and returns all connected components of the given directed graph, while function RootedComponents() decomposes a connected directed graph into a set of rooted components. We name each of the rooted components a modularized CIDG (MCIDG). Essentially, Algorithm 5.1 applies a set of graph transformation rules to transform the CIDG into a set of rooted components (i.e., MCIDGs). Note that the output MCIDGs are subgraphs of the CIDG, and each node in an MCIDG represents a single class or interface of the system. No other class or interface in the system depends upon the unique root of an MCIDG. Consequently, the unique root of each MCIDG might represent an entry point of the system, and each MCIDG might therefore embed a top-level service represented by its root.
As we have mentioned, each node of an MCIDG contains only one class or interface. At this stage, we consider the root of each MCIDG a top-level service candidate and the other nodes the low-level service candidates underneath that top-level service candidate. The second step of the top-level service identification is to generate the top-level service candidates from the MCIDGs. This is achieved by performing three tasks for each top-level service candidate represented by an MCIDG: i) computing the facade class set, ii) building the SHG of the top-level service candidate, and iii) describing the candidate as the tuple defined in Section 5.1.

The final step of the top-level service identification is to validate the top-level service candidates and assign a meaningful name to each accepted top-level service. This is a user-involved procedure: the user retrieves the functionality provided by the candidate by examining the classes/interfaces in its facade class set and, based on that functionality, makes a decision on the candidate.
Algorithm 5.2: Top-Level Service Identification

Input: CIDG, the CIDG of the system.
Output: TLSs, a set of identified top-level services that are represented by (name, C_F, SHG) tuples.

    // decompose the CIDG into a set of rooted components;
    // each rooted component is an MCIDG
1   MCIDGs ← run the CIDG-Transformation algorithm on CIDG
    // generate top-level service candidates and
    // represent them as (name, C_F, SHG) tuples
2   Candidates ← φ
3   foreach MCIDG(V_m, E_m) ∈ MCIDGs do
4       create a new graph G(V, E)
5       V ← φ
6       E ← E_m
7       for i ← 1 to |V_m| do
8           V(i) ← Facade(V_m(i), MCIDG, CIDG)   // V_m(i) denotes the ith node in V_m
9       end
10      create a new tuple T(name, C_F, SHG)
11      T.name ← null
12      T.C_F ← Root(G)
13      T.SHG ← G
14      add tuple T(name, C_F, SHG) to Candidates
15  end
    // validate the top-level service candidates and
    // assign a meaningful name to each accepted service
16  TLSs ← φ
17  foreach tuple T ∈ Candidates do
18      the user validates the candidate by examining T.C_F
19      if T is acceptable then
20          T.name ← a meaningful name for the service
21          add T(name, C_F, SHG) to TLSs
22      end
23  end
Algorithm 5.2 describes the details of these three steps in the top-level service identification process. In Algorithm 5.2, each iteration of the loop on line 3 transforms an MCIDG into a top-level service candidate. Function Facade() computes and returns the facade class sets for a given top-level service candidate and its low-level service candidates. As we have described, the facade class set contains the classes/interfaces that describe the functionality of the service to the outside world. Therefore, function Facade() returns the set of classes/interfaces that have incoming edges from classes/interfaces in the CIDG but not in the MCIDG. Function Root() returns the root of a given directed graph.
The user validates a candidate by examining its facade class set, since the classes in the set represent the functionality of the service. At this stage, the SHG corresponding to each top-level service is built from the MCIDG and can therefore be viewed as a subgraph of the CIDG. In other words, the SHG is an abstraction of an MCIDG that hides the information unnecessary for understanding the service hierarchy. The functionality of each low-level service in the hierarchy is provided by a single class; hence these services are called atomic services. In most cases, these atomic services are too fine-grained and have little reusability. However, the SHG at this stage provides a good starting point for identifying services with a higher level of granularity by using the service aggregation techniques presented in the subsequent section.
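The Facade() criterion described above (classes with incoming edges from the CIDG but not from within the MCIDG) can be sketched as a simple set comprehension. The class names and edges below are invented toy data in the spirit of the CRS example, not the thesis's implementation.

```python
# Sketch of the Facade() criterion: a node of an MCIDG belongs to the facade
# class set if some class outside the MCIDG (but in the CIDG) has an edge
# into it. Names and edges are invented for illustration.

def facade_set(mcidg_nodes, cidg_edges):
    """Classes in the MCIDG that are referenced from outside the MCIDG."""
    return {w for (v, w) in cidg_edges
            if w in mcidg_nodes and v not in mcidg_nodes}

cidg_edges = [("Main", "Booking"), ("Booking", "Vehicle"),
              ("Report", "Vehicle")]          # "Report" lies outside the MCIDG
mcidg = {"Main", "Booking", "Vehicle"}
print(sorted(facade_set(mcidg, cidg_edges)))  # ['Vehicle']
```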
After performing the top-level service identification, the critical top-level services of an existing
system have been identified. Moreover, for each top-level service, we have extracted a service
hierarchy graph (SHG) to model its low-level services. However, at this time, the low-level services
in the SHG are atomic services with little or no reusability. We need to build a new SHG for each
top-level service that contains low-level services with a higher level of granularity. Consequently,
the low-level services in the new SHG are critical business services with better reusability. This
is achieved in the low-level service identification process.
5.3.2 Low-Level Service Identification
The low-level service identification process is the bottom-up portion of the entire service identification
process. The SHGs built in the top-level service identification process are rooted directed
graphs that represent the structural dependency between a top-level service and its low-level services
(atomic services). As we have mentioned, these atomic services are too fine-grained and
therefore have limited reusability. At this stage, we aim to aggregate highly related atomic services
to build a new SHG for each top-level service such that the services contained in the new
SHG have a higher level of granularity and thus present a higher potential for reuse. The service
aggregation is an iterative process, and the desired new SHG is achieved incrementally. The
low-level services obtained from each iteration have a higher level of granularity than those of the
previous iteration and hence modularize the top-level service in a different way. The resulting services of
each iteration are presented to users as an intermediate SHG. An evaluation procedure can be performed
at each iteration to determine whether specific goals have been reached. Users can then
decide to repeat or terminate the process according to the pre-defined termination
criteria.
Algorithm 5.3 describes the low-level service identification process for a given top-level service.
Essentially, it repeatedly runs the service aggregation algorithm (i.e., Algorithm 5.4) on the
low-level services underneath a top-level service until the Termination Criteria are satisfied. Once
the iteration terminates, the final SHG is built for the top-level service. Then, the algorithm
represents the low-level services contained in the newly built SHG as tuples defined in Section 5.1.
Function ComputeMQ() computes the MQ metric of a given top-level service. The MQ metric quantitatively
measures the quality of the modularization of a top-level service as the trade-off between
the inter-connectivity and intra-connectivity of its low-level services.
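As an illustration only, a TurboMQ-style trade-off between intra- and inter-connectivity can be computed as sketched below. The precise MQ definition used by the framework is given elsewhere in the thesis; the data representation and names here are assumptions of this sketch:

```java
import java.util.*;

// Simplified, TurboMQ-style sketch of an MQ-like computation (illustrative
// only; not necessarily the thesis's exact MQ metric). Each cluster i
// contributes a cluster factor CF_i = intra_i / (intra_i + 0.5 * inter_i),
// and MQ is the sum of the cluster factors.
public class ModularizationQuality {
    // cluster: maps each node to its cluster id; edges: dependency edges
    static double mq(Map<String, Integer> cluster, List<String[]> edges) {
        Map<Integer, int[]> counts = new HashMap<>(); // per cluster: {intra, inter}
        for (String[] e : edges) {
            int cu = cluster.get(e[0]), cv = cluster.get(e[1]);
            counts.computeIfAbsent(cu, k -> new int[2]);
            counts.computeIfAbsent(cv, k -> new int[2]);
            if (cu == cv) {
                counts.get(cu)[0]++;      // intra-cluster edge
            } else {
                counts.get(cu)[1]++;      // inter-cluster edge leaving cu
                counts.get(cv)[1]++;      // inter-cluster edge entering cv
            }
        }
        double mq = 0.0;
        for (int[] c : counts.values()) {
            if (c[0] + c[1] > 0) mq += c[0] / (c[0] + 0.5 * c[1]);
        }
        return mq;
    }
}
```

A single cluster containing all its dependencies scores 1.0, while edges crossing cluster boundaries lower the contribution of both clusters involved.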
Based on the modularization of the top-level service and the level of granularity of the low-level
services underneath the top-level service, we define two Termination Criteria to stop the
Algorithm 5.3: Low-Level Service Identification
Input: CIRG: The CIRG of the system; CIDG: The CIDG of the system; T(name, CF, SHG): The top-level service.
Output: LLSs: Identified low-level services represented as (name, CF, SHG) tuples; T(name, CF, SHG): The input top-level service with its newly built SHG.
// compute the MQ metric of the input top-level service
1: ComputeMQ(T.SHG, CIDG);
// aggregate low-level services iteratively
2: repeat
3:     SHGnew ← Run Service Aggregation Alg. on T.SHG;
4:     T.SHG ← SHGnew;
5:     ComputeMQ(T.SHG, CIDG);
6: until Termination Criteria are satisfied;
// represent identified low-level services as tuples
7: LLSs ← φ;
8: foreach non-root node v ∈ T.SHG do
9:     Create a new tuple L(name, CF, SHG);
10:     L.name ← Meaningful name for the service;
11:     L.CF ← lV(v);
12:     L.SHG ← φ;
13:     Add L(name, CF, SHG) to LLSs;
14: end
service aggregation iteration in Algorithm 5.3 :
Termination Criterion 5.1. The top-level service has been nicely modularized by its low-level
services.
Termination Criterion 5.2. Low-level services present an appropriate level of granularity.
In terms of the structure of a top-level service, the low-level services underneath the top-level
service modularize it. By the definition of the MQ metric, the higher the value
of the MQ metric of a top-level service, the better structured the service is. This is based on
the hypothesis that a well-modularized service becomes highly malleable; that is, the service can
evolve in less time and at less cost. On the other hand, the level of granularity of services must
be matched to the level of reusability and flexibility required for a given context. The basis of the
second criterion is the hypothesis that a component that realizes a service with a higher level of
granularity has better reusability.
Algorithm 5.4: Service Aggregation
Input: CIRG: The CIRG of the system; CIDG: The CIDG of the system; SHG: The SHG that contains the low-level services to be aggregated; Heuristic1: Reducing Heuristic 5.1; Heuristic2: Reducing Heuristic 5.2.
Output: SHGnew: A new SHG that contains low-level services with a higher level of granularity.
// SHG transformation
1: SHGnew ← CollapseCliques(SHG, CIRG, CIDG);
2: SHGnew ← CollapseStronglyConnectedComponents(SHGnew);
// dominance tree generation
3: DTree ← GenerateDominanceTree(SHGnew);
// dominance tree reduction
4: ReduceDominanceTree(DTree, Heuristic1);
5: ReduceDominanceTree(DTree, Heuristic2);
// SHG reconstruction
6: SHGnew ← ReconstructSHG(DTree, CIDG);
Algorithm 5.4 aggregates highly related low-level services into a single service with a higher
level of granularity and reconstructs a new SHG containing these newly identified services. The
output SHG contains fewer low-level services, each with a higher level of granularity, than the input
SHG. In other words, it modularizes the corresponding top-level service in a better way.
The service aggregation is based on the dominance analysis on SHGs. As we have explained,
SHGs are rooted directed graphs, hence we can generate dominance trees from SHGs. However,
in order to improve the shape of the generated dominance tree (increase the height of the tree), we
perform a graph transformation on SHGs. The purpose of the graph transformation is to agglom-
erate strongly related services and remove cycles in SHGs. Program units linked by recursion
contribute to the implementation of a single functionality and can, therefore, be regarded as a sin-
gle module. We remove cycles in SHGs by aggregating the services within a cycle into a single
service. Where many services are involved within a cycle, poorer results of the dominance tree
analysis are generally obtained [17, 36]. Our empirical studies in Chapter 7 show that collapsing
strongly related services and removing cycles in SHGs are essential to dominance analysis on
SHGs.
In Algorithm 5.4, function CollapseCliques() collapses the services in a 3-clique in the input
SHG if the similarity of the services in the clique exceeds a user-defined threshold. We have
developed a methodology for computing the similarity between two services, based on the coupling
analysis of the classes that implement these services [52].
Function CollapseStronglyConnectedComponents() iteratively detects the strongly connected
components (described in Section 5.2.1) in a directed graph, collapses all nodes in each
component into one node, and updates the edges accordingly until no strongly connected
component is left. Consequently, the output graph of this function is a directed acyclic graph
(DAG); the output SHG of the SHG transformation contains no cycle.
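The SCC collapse can be sketched with Tarjan's algorithm. The graph representation and names below are assumptions of this sketch; the framework's implementation additionally merges node labels and facade sets when it collapses nodes:

```java
import java.util.*;

// Sketch of CollapseStronglyConnectedComponents(): Tarjan's algorithm
// assigns every node an SCC id; collapsing the SCCs then yields a DAG.
// The adjacency-map representation and names here are assumptions.
public class SccCollapse {
    Map<String, List<String>> g;
    Map<String, Integer> index = new HashMap<>(), low = new HashMap<>(), comp = new HashMap<>();
    Deque<String> stack = new ArrayDeque<>();
    Set<String> onStack = new HashSet<>();
    int counter = 0, comps = 0;

    SccCollapse(Map<String, List<String>> g) {
        this.g = g;
        for (String v : g.keySet())
            if (!index.containsKey(v)) dfs(v);
    }

    void dfs(String v) {
        index.put(v, counter); low.put(v, counter); counter++;
        stack.push(v); onStack.add(v);
        for (String w : g.getOrDefault(v, Collections.emptyList())) {
            if (!index.containsKey(w)) {
                dfs(w);
                low.put(v, Math.min(low.get(v), low.get(w)));
            } else if (onStack.contains(w)) {
                low.put(v, Math.min(low.get(v), index.get(w)));
            }
        }
        if (low.get(v).equals(index.get(v))) {   // v is the root of an SCC
            String w;
            do {
                w = stack.pop(); onStack.remove(w); comp.put(w, comps);
            } while (!w.equals(v));
            comps++;
        }
    }

    // Edges of the collapsed graph: one node per SCC, self-loops dropped,
    // so the result is acyclic.
    Set<String> collapsedEdges() {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : g.entrySet())
            for (String w : e.getValue()) {
                int a = comp.get(e.getKey()), b = comp.get(w);
                if (a != b) out.add(a + "->" + b);
            }
        return out;
    }
}
```

For a cycle A → B → C → A with an extra edge C → D, the three cycle members land in one SCC and the collapsed graph keeps only a single edge from that SCC to D's SCC.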
Once the SHG transformation is done, function GenerateDominanceTree() generates the
service dominance tree from the new SHG. Function ReduceDominanceTree() reduces a dominance
tree by applying a given reducing heuristic. We define two reducing heuristics as follows:
Heuristic 5.1. Remove each maximal consolidation subtree by only keeping the root node of the
subtree.
Agglomerating all services that are parts of a maximal consolidation subtree into a service makes
sense because these services constitute an independent unit that can only be accessed by the rest
of services of the system through the root of the subtree. In order to simplify the visualization,
we only need to present the root because the rest of the subtree is only visible to the root and can
be hidden in the root.
Heuristic 5.2. Remove all leaf nodes in a subtree that contain both ddom and sddom edges,
which are linked to the root of the subtree by sddom edges.
These leaf nodes represent low-level services that are only accessible to the service represented
by the root of the subtree. Therefore, these low-level services can be considered subservices of
the root.
Function ReconstructSHG() recovers the service hierarchy for the services presented in a
service dominance tree. It needs the CIDG to provide extra information since the service dominance
tree is an abstraction of a service hierarchy graph in which some information is lost.
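The dominance computation underlying GenerateDominanceTree() can be sketched with the classic iterative data-flow formulation over a rooted directed graph: dom(root) = {root}, and dom(v) = {v} ∪ ∩ over the predecessors p of v of dom(p). This is the generic textbook algorithm, not necessarily the exact one used by the framework, and the representation is an assumption of this sketch:

```java
import java.util.*;

// Iterative dominator-set computation for a rooted directed graph
// (textbook formulation; illustrative representation and names).
// The immediate dominator of v is the closest strict dominator, from
// which a dominance tree can be built.
public class Dominators {
    static Map<String, Set<String>> domSets(Map<String, List<String>> succ, String root) {
        Set<String> nodes = new TreeSet<>(succ.keySet());
        for (List<String> ws : succ.values()) nodes.addAll(ws);
        // build the predecessor lists
        Map<String, List<String>> preds = new HashMap<>();
        for (Map.Entry<String, List<String>> e : succ.entrySet())
            for (String w : e.getValue())
                preds.computeIfAbsent(w, k -> new ArrayList<>()).add(e.getKey());
        // initialize: root dominates only itself; others start as "all nodes"
        Map<String, Set<String>> dom = new HashMap<>();
        for (String v : nodes) dom.put(v, new TreeSet<>(nodes));
        dom.put(root, new TreeSet<>(Collections.singleton(root)));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String v : nodes) {
                if (v.equals(root)) continue;
                Set<String> d = new TreeSet<>(nodes);
                for (String p : preds.getOrDefault(v, Collections.emptyList()))
                    d.retainAll(dom.get(p));   // intersect predecessor dominators
                d.add(v);
                if (!d.equals(dom.get(v))) { dom.put(v, d); changed = true; }
            }
        }
        return dom;
    }
}
```

In a diamond R → A, R → B, A → C, B → C, neither A nor B dominates C, so dom(C) = {R, C} and C hangs directly under R in the dominance tree.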
After performing the low-level service identification for each top-level service identified
from an existing object-oriented system, the critical low-level services underneath each top-level
service have been identified. Finally, the SHGs of all top-level services yield the ServView of the
system.
5.3.3 An Example : Car Rental System
To further explain the proposed service identification processes, in this section, we identify the
business services embedded in the CRS example by applying the algorithms introduced in the
service identification processes.
First of all, we identify the top-level services of the CRS system by running Algorithm 5.2 on
the CIDG of the CRS system, which is depicted in Figure 4.10. Algorithm 5.1 decomposes the
CIDG into rooted components (i.e., MCIDGs). Figure 5.6 depicts the resulting MCIDGs. Three
MCIDGs are generated from the CRS system: graphs (a), (b), and (c) in Figure 5.6.
Figure 5.6: The MCIDGs of the Car Rental System.
Based on the MCIDGs extracted by Algorithm 5.1, Algorithm 5.2 generates the following
top-level service candidates (TLSCs):

• TLSC1 : (null, {com.uwstar.crs.Booking}, SHG1).

• TLSC2 : (null, {com.uwstar.crs.VehicleEvaluation}, SHG2).

• TLSC3 : (null, {com.uwstar.crs.person.Dealer}, SHG3).
Figure 5.7: The SHG of the Top-Level Service VehicleBooking.
SHG1, SHG2, and SHG3 are graphs (a), (b), and (c) in Figure 5.6, respectively. By examining
the functionality of each top-level service candidate, we find that the candidate
(null, {com.uwstar.crs.person.Dealer}, SHG3)
is not a critical business service; the class com.uwstar.crs.person.Dealer is a dead class.
Hence, after the service validation, we accept two top-level services (TLSs) of the CRS system:
• TLS1 : (VehicleBooking, {com.uwstar.crs.Booking}, SHG1).

• TLS2 : (VehicleEvaluation, {com.uwstar.crs.VehicleEvaluation}, SHG2).
After running Algorithm 5.2, the critical top-level services of the CRS system are identified.
Moreover, for each top-level service, we extract a service hierarchy graph (SHG) to model its low-level
services. Figure 5.7 illustrates the SHG of the identified top-level service VehicleBooking.
At this stage, a low-level service in the SHG is a single class (atomic service) with little or no
reusability. We need to build a new SHG for each top-level service that contains low-level services
(groups of classes) with a higher level of granularity.
Figure 5.8: The Result SHG of Performing the SHG Transformation on the Original SHG of the Top-Level Service VehicleBooking in the CRS System.
Now, we are ready to identify the low-level services underneath the top-level services by running
Algorithm 5.3 on each top-level service. To save space, we only identify the low-level services
underneath the top-level service VehicleBooking.
Figure 5.9: The Service Dominance Tree of the SHG in Figure 5.8.
Essentially, Algorithm 5.3 computes the MQ metric of VehicleBooking and runs Algorithm 5.4
repeatedly. In this example, in order to let the identified low-level services have an appropriate level of
granularity, we use Termination Criterion 5.2 to terminate the service aggregation iteration.
In the first iteration of Algorithm 5.4, Figure 5.8 shows the resulting SHG of performing the SHG
transformation on the original SHG (shown in Figure 5.7) of the top-level service VehicleBooking.
The resulting SHG is obtained by aggregating the strongly related atomic services in the original
SHG. For instance, the two services represented by the nodes
com.uwstar.crs.vehicle.SUV and com.uwstar.crs.vehicle.Vehicle
have an inheritance relationship and thus are agglomerated into one service represented by the
node
{com.uwstar.crs.vehicle.SUV, com.uwstar.crs.vehicle.Vehicle}
in the SHG depicted in Figure 5.8. The facade class set of the agglomerated service contains
com.uwstar.crs.vehicle.SUV and com.uwstar.crs.vehicle.Vehicle because these two classes both
provide services to the outside of the new service. Also, there are three nodes in Figure 5.7 which
form a cycle:
com.uwstar.crs.person.Agent,
com.uwstar.crs.training.TrainingCourse, and
com.uwstar.crs.training.TrainingPlan.

Hence, the low-level services represented by these nodes are agglomerated into a service represented
by the node com.uwstar.crs.person.Agent in Figure 5.8. The facade class set contains only the
class com.uwstar.crs.person.Agent because the other two classes,
com.uwstar.crs.training.TrainingCourse and
com.uwstar.crs.training.TrainingPlan,
do not provide services to the outside of the new service.
Once the SHG transformation is complete, function GenerateDominanceTree() generates
the service dominance tree from the new SHG. Figure 5.9 shows the service dominance tree
of the SHG depicted in Figure 5.8. Function ReduceDominanceTree() reduces the service
dominance tree in Figure 5.9 by applying Heuristic 5.1 and Heuristic 5.2. Figure 5.10 shows
the reduced dominance tree.
Figure 5.10: The Reduced Dominance Tree of the Service Dominance Tree in Figure 5.9.
Function ReconstructSHG() recovers the service hierarchy for the services presented in
the service dominance tree in Figure 5.10. Figure 5.11 shows the SHG reconstructed from the
reduced service dominance tree in Figure 5.10.
Figure 5.11: The SHG Reconstructed from the Reduced Service Dominance Tree in Figure 5.10.
After the first iteration, by examining the MQ metric of the top-level service VehicleBooking
and the granularity of the low-level services underneath it, we know whether or not the
termination criteria are satisfied, and we repeat the service aggregation process if they are not.
If they are satisfied, we terminate the process and identify the following
low-level services for the top-level service VehicleBooking:
• (Car, {com.uwstar.crs.vehicle.Car, com.uwstar.crs.vehicle.Vehicle}, φ)

• (Truck, {com.uwstar.crs.vehicle.Truck, com.uwstar.crs.vehicle.Vehicle}, φ)

• (SUV, {com.uwstar.crs.vehicle.SUV, com.uwstar.crs.vehicle.Vehicle}, φ)

• (VehicleRepository, {com.uwstar.crs.VehicleRepository}, φ)

• (Agent, {com.uwstar.crs.person.Agent}, φ)

• (Customer, {com.uwstar.crs.person.Customer}, φ)
5.4 Summary
In this chapter, we have discussed the two processes contained in the service identification stage
of the SOC4J framework, namely top-level service identification and low-level service identification.
The techniques used in this stage have also been introduced. The critical business services
embedded in an existing system have been identified and modeled. In the subsequent chapter, we
introduce the approach to packaging the identified services into self-contained components and
the methodology for transforming the existing system into a component-based system.
Chapter 6
Component Generation and System
Transformation
In the previous chapter, we presented the methodology for identifying services embedded in an
existing object-oriented software system. We categorize the critical business services embedded
in the system into two categories: top-level services and low-level services. Top-level services
and the low-level services underneath each top-level service can be identified by applying the
proposed approach.
The identified services must be packaged as components so that they can be deployed and
thus invoked. Another goal of the proposed SOC4J framework is to reconstruct the existing system
into a component-based system, based on the components that realize the identified services. This
chapter discusses the service realization process and the system reconstruction process.
In Section 6.1, we discuss how an identified service can be realized as a self-contained com-
ponent. A transformation technique that automatically reconstructs the existing system into a
component-based target system is introduced in Section 6.2. Finally, Section 6.3 gives a sum-
mary of this chapter.
CHAPTER 6. COMPONENT GENERATION AND SYSTEM TRANSFORMATION 81
6.1 Component Generation
Component-based development (CBD) assembles software from reusable components within
frameworks such as CORBA, Sun's Enterprise JavaBeans (EJB), and Microsoft COM. The service-oriented
architecture (SOA) encourages individual services to be self-contained. To reuse the
identified services and migrate the existing system's implementation into a component-based
architecture, it is necessary to package the identified services into well-documented, self-contained
components. A self-contained component contains all the code
necessary to implement its services and hence can be deployed and invoked independently. In
the third stage of the proposed SOC4J framework, we realize each top-level service and the low-level
services contained in its SHG as self-contained components.
6.1.1 Approach
We package each identified service (either a top-level service or a low-level service) to generate a
self-contained component. A component that realizes a top-level service is called a Top-Level
Component (TLC), while a component that realizes a low-level service is called a Low-Level
Component (LLC). In order to explain the component generation process clearly and automate
the process in the implementation, we describe a generated component as a tuple:

(name, if, CF, CC, CHG)

In the above tuple, name is the name of the component, if is the interface that provides the
entry point of the component, CF is the facade class set of the realized service (we also call
it the Facade Class Set of the component), CC is the Constituent Class Set, which contains all
classes/interfaces that are necessary to implement the component, and CHG is the
Component Hierarchy Graph that is associated with a top-level component to describe its low-level
components. The CHG is defined in Definition 6.1. We export and store the generated
component represented by the above tuple as an XML document. The XML schema for the
component is illustrated in Figure 6.1.
Definition 6.1. The Component Hierarchy Graph (CHG) associated with a top-level component
is a rooted LDG, where the root, r ∈ V, represents the top-level component, V \ r represents the
set of low-level components contained in the top-level component, lV(v) returns the name of v
for any v ∈ V, E = {(v, w) ∈ V × V | v contains w}, LE = φ, and hence lE(e) returns an
empty label for any e ∈ E.
The CHG shows the structural relationships between the low-level components underneath a
top-level component. Like the SHG, the CHG gives a high-level representation of the components
that is understandable by both developers and business experts. Also, the CHG describes
the modularization of its top-level component. There is no CHG associated with a low-level component;
that is, CHG = φ for a low-level component, because the low-level component has
already been presented in the CHG of its top-level component. The CHGs of all top-level components
form the component view (CompView) of the system.
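As a hypothetical illustration of the stored form, an instance document for a low-level component might look as follows. The element names are inferred from the schema sketch in Figure 6.1 and may differ from the actual schema:

```xml
<!-- hypothetical component document; element names inferred from Figure 6.1 -->
<component>
  <name>Customer</name>
  <interface>ICustomer</interface>
  <facadeClassSet>
    <class>com.uwstar.crs.person.Customer</class>
  </facadeClassSet>
  <constituentClassSet>
    <class>com.uwstar.crs.person.Customer</class>
    <class>com.uwstar.crs.person.Person</class>
    <class>com.uwstar.crs.record.Record</class>
  </constituentClassSet>
  <componentHierarchyGraph/>
</component>
```

A low-level component such as this one carries an empty componentHierarchyGraph element, whereas a top-level component would list its low-level components there.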
Before we present the technique for automatically generating components, we introduce the
reachability concept in the CIDG and CIRG; we use this concept in the component generation
process.
Definition 6.2. Let G = (V, E) be the CIDG of an existing object-oriented system, where V
represents all nodes (i.e., classes or interfaces) in G and E represents all edges (i.e., dependencies)
in G. Given two classes v ∈ V and w ∈ V, class w is said to be reachable from class v if there
exists a directed path from v to w, denoted by v →* w.
Definition 6.3. Let G = (V, E) be the CIRG of an existing object-oriented system, where V
represents all nodes (i.e., classes or interfaces) in G and E represents all edges (i.e., relationships)
in G. Given two classes v ∈ V and w ∈ V, class w is said to be inheritance (realization)
reachable from class v ∈ CIRG.V if there exists a directed path from v to w and the labels of
all edges in this path contain inheritance (realization) types, denoted by v →IN* w (v →RE* w).
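The reachable set of Definition 6.2 can be computed with a straightforward depth-first traversal; this closure is also the core of Step 3 of the component generation process. The graph representation and names below are assumptions of this sketch:

```java
import java.util.*;

// Sketch of Definition 6.2's reachability: the set of nodes reachable from
// a start class via directed dependency edges. The adjacency-map
// representation and names are illustrative assumptions.
public class Reachability {
    static Set<String> reachable(Map<String, List<String>> g, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(Collections.singletonList(start));
        while (!work.isEmpty()) {
            String v = work.pop();
            for (String w : g.getOrDefault(v, Collections.emptyList()))
                if (seen.add(w)) work.push(w);   // w newly reached: explore it
        }
        return seen;
    }
}
```

Note that, matching the definition, the start class itself is included only if it lies on a cycle back to itself.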
Figure 6.1: The UML Representation of the XML Schema for a Component.
We extend the refactoring approach presented in [90] to automatically generate an interface
for each component corresponding to an identified service. Let serv be an identified service
represented by the tuple

serv(name, CF, SHG)

and comp be the generated component represented by the tuple

comp(name, if, CF, CC, CHG).

The key steps for generating the component are enumerated as follows:
• Step 1: Name the component by copying its service's name: comp.name = serv.name.

• Step 2: Compute the facade class set of the component by copying its service's facade
class set: comp.CF = serv.CF.

• Step 3: Compute the constituent class set of the component:
comp.CC = comp.CF ∪ ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v}.

• Step 4: Create a new interface named if. Modify each class in comp.CF to implement if.
Modify each interface in comp.CF to extend if.

• Step 5: Add declarations of all public methods defined in each class in VIN to if, where
VIN = ⋃_{c ∈ comp.CF} {v ∈ CIRG.V | c →IN* v},
and modify each class in VIN to implement if.

• Step 6: Copy declarations of all public methods declared in each interface in VRE to if,
where
VRE = ⋃_{c ∈ comp.CF} {v ∈ CIRG.V | c →RE* v},
and modify each interface in VRE to extend if.

• Step 7: Add declarations of setter and getter methods for all public class fields declared
in each class in comp.CF ∪ VIN to if, and implement the corresponding setter and getter
methods in the classes where these fields are originally declared.

• Step 8: Add declarations of getter methods for all public class fields declared in each
interface in comp.CF ∪ VRE to if, and implement the corresponding getter methods in
the classes that implement the interfaces where these fields are originally declared.

• Step 9: Assign the newly built interface to the component: comp.if = if.

• Step 10: Generate the component hierarchy graph (CHG) for the component:
comp.CHG = G if serv.SHG ≠ φ (i.e., serv is a top-level service); φ otherwise,
where G is a copy of serv.SHG, except that the names of all nodes in G are changed to the
corresponding service names rather than the facade classes.
Note that the source modification in the above steps does not change the observable behavior
of the original system. Once the tuple (name, if, CF, CC, CHG) for a component has been
constructed, we can package all classes and interfaces within CC, together with the newly created
interface if, into a JAR file named name.jar. The packaged component is self-contained and
loosely coupled and hence can be deployed and used independently.
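To give a flavor of Steps 4–5, the sketch below collects the public instance-method signatures of a facade class and its superclasses, which are the candidate declarations for the generated component interface. It uses reflection purely for illustration; the framework itself performs source-level refactoring, and all names here are assumptions:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.*;

// Illustrative sketch only: gather the public instance methods of a facade
// class and its superclass chain (the inheritance-reachable classes of
// Step 5) as candidate declarations for the generated interface.
public class InterfaceSketch {
    static SortedSet<String> declarations(Class<?> facade) {
        SortedSet<String> decls = new TreeSet<>();
        for (Class<?> c = facade; c != null && c != Object.class; c = c.getSuperclass())
            for (Method m : c.getDeclaredMethods())
                if (Modifier.isPublic(m.getModifiers()) && !Modifier.isStatic(m.getModifiers()))
                    // record a simple "name[parameterTypes]" signature
                    decls.add(m.getName() + Arrays.toString(m.getParameterTypes()));
        return decls;
    }
}
```

Running this on any concrete class yields the method declarations that would have to appear in the component interface for the facade class to implement it.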
6.1.2 An Example
To further describe the component generation process, let us give an example of realizing an
identified service. In Chapter 5, we identified services from the hypothetical CRS system. One
of these, Customer, is a low-level service underneath the top-level service VehicleBooking,
represented by the tuple

serv(name, CF, SHG)

where

serv.name = Customer,
serv.CF = {com.uwstar.crs.person.Customer}, and
serv.SHG = φ.

Let the tuple comp(name, if, CF, CC, CHG) represent the component that realizes the service
Customer. The steps for realizing the service are enumerated as follows (part of the UML class
diagram of the component is shown in Figure 6.3):
1. comp.name = serv.name = Customer.

2. comp.CF = serv.CF = {com.uwstar.crs.person.Customer}.

3. Note that ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} represents all classes or interfaces
that are reachable in the CIDG from the classes in comp.CF.
Figure 6.2: The UML Class Diagrams of Customer and Person in the CRS System.
In this example,

⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} =
{ com.uwstar.crs.person.Person,
com.uwstar.crs.record.CreditRecord,
com.uwstar.crs.record.DrivingRecord,
com.uwstar.crs.record.Record }

Then, we have

comp.CC = comp.CF ∪ ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} =
{ com.uwstar.crs.person.Customer,
com.uwstar.crs.person.Person,
com.uwstar.crs.record.CreditRecord,
com.uwstar.crs.record.DrivingRecord,
com.uwstar.crs.record.Record }
4. Create a new interface named ICustomer. Since there is only one class in comp.CF (i.e.,
com.uwstar.crs.person.Customer), we modify this class to implement ICustomer, as
shown in Figure 6.3.
5. The inheritance reachable class set of class com.uwstar.crs.person.Customer is extracted
as follows:

VIN = {com.uwstar.crs.person.Person}

Figure 6.2 depicts the UML class diagrams of class com.uwstar.crs.person.Customer
and class com.uwstar.crs.person.Person. We add declarations of all public methods
defined in class com.uwstar.crs.person.Person to ICustomer, and we modify class
com.uwstar.crs.person.Person to implement the interface ICustomer. These modifications
are reflected in Figure 6.3.
6. Since the realization reachable class set of class com.uwstar.crs.person.Customer is
empty (i.e., VRE = ∅), no action is needed in this step.

7. As Figure 6.2 shows, there is only one public class field declared in class
com.uwstar.crs.person.Customer
(i.e., id) and no public class field in class com.uwstar.crs.person.Person. We add the
setter method declaration setID(String) and the getter method declaration getID() :
String to the interface ICustomer. We also need to implement these two methods in class
com.uwstar.crs.person.Customer. Listing 6.1 shows the implementation of these two
methods. These modifications are also reflected in Figure 6.3.
8. Again, since VRE = ∅, no action is needed in this step.

9. comp.if = ICustomer.

10. comp.CHG = φ, because the service Customer is a low-level service; hence, the generated
component is a low-level component. If the service were a top-level service, the CHG
of the generated component would be the SHG of the top-level service, except that the node
names in the SHG would be changed to the corresponding service names.
Figure 6.3: Part of the UML Class Diagram of the Component Customer.
Now we are ready to package the following classes (i.e., the constituent class set):

com.uwstar.crs.person.Customer,
com.uwstar.crs.person.Person,
com.uwstar.crs.record.CreditRecord,
com.uwstar.crs.record.DrivingRecord, and
com.uwstar.crs.record.Record

together with the newly created interface ICustomer as a JAR file named Customer.jar.
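Packaging of this kind can be scripted with the standard java.util.jar API. The sketch below is illustrative only; the entry names and contents are placeholders (in practice they would be the bytes of the compiled .class files of the constituent class set), not the framework's actual packaging code:

```java
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class JarPackager {

    // Packages the given entries into a JAR file. For brevity the entry
    // contents are passed in as byte arrays; in practice they would be
    // read from the compiled .class files of the constituent class set.
    public static void pack(String jarName, String[] entryNames,
                            byte[][] contents) throws Exception {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes()
                .put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream out =
                 new JarOutputStream(new FileOutputStream(jarName), manifest)) {
            for (int i = 0; i < entryNames.length; i++) {
                out.putNextEntry(new JarEntry(entryNames[i]));
                out.write(contents[i]);
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder entries mirroring part of the constituent class set.
        pack("Customer.jar",
             new String[] { "com/uwstar/crs/person/Customer.class" },
             new byte[][] { new byte[0] });
        System.out.println("wrote Customer.jar");
    }
}
```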
public class Customer extends Person implements ICustomer {

    public String id; // customer ID
    ...

    public void setID(String id) {
        this.id = id;
    }

    public String getID() {
        return id;
    }
    ...
}

Listing 6.1: The Implementation of methods setID and getID in class Customer.
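To see how the pieces fit together, the sketch below condenses the running example into compilable form. It keeps only a subset of the members shown in Figure 6.3, and the Person fields and method bodies are our own simplifications, not the CRS system's code:

```java
// A hypothetical, trimmed-down reconstruction of the running example.
interface ICustomer {
    void setID(String id);
    String getID();
    void setName(String name);
    String getName();
}

class Person {
    private String name;
    public void setName(String name) { this.name = name; }
    public String getName() { return name; }
}

class Customer extends Person implements ICustomer {
    public String id; // public field wrapped by the newly added setID/getID
    public void setID(String id) { this.id = id; }
    public String getID() { return id; }
}

public class ComponentDemo {
    public static void main(String[] args) {
        ICustomer c = new Customer(); // clients program against the interface
        c.setID("C-001");
        c.setName("Alice");
        System.out.println(c.getID() + " " + c.getName());
    }
}
```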
6.2 System Transformation
One of the primary goals of the proposed SOC4J framework is to transform the monolithic ar-
chitecture of an existing object-oriented system to a more flexible service-oriented architecture.
In the previous stages of the framework, we have identified services and packaged the identi-
fied services into self-contained components. Now, we introduce a reconstruction technique that
automatically reconstructs the existing source system into a component-based target system.
6.2.1 Approach
The reconstruction process is based on the extracted components. In this thesis, extracted com-
ponents are categorized into two classes: top-level components and low-level components. A
top-level component has an associated component hierarchy graph (CHG) that describes the low-
level components contained in it. Each component is self-contained and
has been packaged into a JAR file. Based on the extracted components, we design a meta-model,
depicted in Figure 6.4, for the component-based target system. The target system is composed
of one or more top-level components, as well as a set of classes/interfaces, while each top-level
component may consist of low-level components together with a set of classes and interfaces.
Like a top-level component, a low-level component may contain other low-level sub-components,
classes, and interfaces. In the source system, some classes or interfaces may not be identified as
business services or be contained in identified business services, and are therefore not packaged
into components. In order to preserve the behavior of the system, we have to include these classes
or interfaces in the component-based target system.

[Figure 6.4 depicts the meta-model as a UML diagram: the Target System (a component-based system) contains one or more Top-Level Components (JAR files) and a set of classes/interfaces (Java files); each Top-Level Component contains Low-Level Components (JAR files) and classes/interfaces; and each Low-Level Component in turn contains classes/interfaces and possibly other low-level components.]

Figure 6.4: The Meta-Model for the Component-Based Target System.
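The containment rules of this meta-model can be sketched directly as Java types. The type and field names below are ours, chosen for illustration rather than taken from the framework:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the Figure 6.4 meta-model; names are ours.
class CompilationUnit {               // a Java class or interface (.java file)
    final String name;
    CompilationUnit(String name) { this.name = name; }
}

class Component {                     // packaged as a JAR file
    final String name;
    final List<Component> subComponents = new ArrayList<>(); // nested low-level components
    final List<CompilationUnit> units = new ArrayList<>();   // classes/interfaces
    Component(String name) { this.name = name; }

    // Counts the units in this component and in all nested components.
    int totalUnits() {
        int n = units.size();
        for (Component c : subComponents) n += c.totalUnits();
        return n;
    }
}

class TargetSystem {                  // the component-based target system
    final List<Component> topLevel = new ArrayList<>();
    final List<CompilationUnit> looseUnits = new ArrayList<>(); // not in any service
}

public class MetaModelDemo {
    public static void main(String[] args) {
        TargetSystem sys = new TargetSystem();
        Component booking = new Component("Vehicle Booking");
        Component customer = new Component("Customer");
        customer.units.add(new CompilationUnit("Customer.java"));
        booking.subComponents.add(customer);
        booking.units.add(new CompilationUnit("Booking.java"));
        sys.topLevel.add(booking);
        System.out.println(booking.totalUnits()); // counts nested units too
    }
}
```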
We reconstruct the target system by adopting a bottom-up integration technique that works with
the extracted components, starting with the components in the lowest position of the component
hierarchy. The reconstruction process should not change the observable behavior of the existing
system. The surrounding parts of each component should use the newly extracted components,
in order to avoid a situation where two sets of classes providing the same functionality exist
in the same system. Algorithm 6.1 describes the transformation process, which takes the source
system and the extracted components as input; extracted components are represented as tuples of
the form (name, if, CF, CC). The output of the algorithm is an instance of the meta-model
described in Figure 6.4.

Algorithm 6.1: System-Transformation
Input: An existing object-oriented system and the components extracted from it
Output: A component-based target system

foreach top-level component t do
    while there exists a low-level component in t.CHG do
        // start with the component in the lowest position in the component hierarchy
        c ← a node without descendants in t.CHG
        // retrieve the components that contain component c
        P ← parents of c in t.CHG
        // refactor the parents of component c to use c
        foreach p ∈ P do
            change the code of classes in p.CC that reads (or writes) the public
                fields of classes in c.CF into code that invokes the corresponding
                getter (or setter) methods in interface c.if
            replace the reference types in classes in p.CC that refer to any class
                in c.CF with interface c.if
        end
        // update t.CHG to remove component c
        remove node c from t.CHG
    end
end
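The refactoring inside Algorithm 6.1 (replacing public-field access with interface calls, and declared class types with the component interface) can be illustrated with a hand-worked sketch. The classes here, including the parent-component class BookingAgent, are hypothetical:

```java
interface ICustomer {
    void setID(String id);
    String getID();
}

class Customer implements ICustomer {
    public String id; // the public field the refactoring hides from clients
    public void setID(String id) { this.id = id; }
    public String getID() { return id; }
}

// A hypothetical class from a parent component (a class in p.CC).
class BookingAgent {
    String registerCustomer() {
        // Before the transformation this method would have read and written
        // cust.id directly; afterwards it goes through the interface c.if:
        ICustomer cust = new Customer(); // declared type replaced by the interface
        cust.setID("C-001");             // field write  ->  setter invocation
        return cust.getID();             // field read   ->  getter invocation
    }
}

public class TransformDemo {
    public static void main(String[] args) {
        System.out.println(new BookingAgent().registerCustomer());
    }
}
```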
6.2.2 An Example
To further describe the system transformation process, we give an example of reconstructing the
CRS system into a component-based target system.
Consider the following top-level services identified after the service identification stage:

• (Vehicle Booking, {com.uwstar.crs.Booking}, SHG_VB). The service hierarchy graph
SHG_VB is shown in Figure 6.5 (a).

• (Vehicle Evaluation, {com.uwstar.crs.VehicleEvaluation}, SHG_VE). The service
hierarchy graph SHG_VE is shown in Figure 6.5 (b).

[Figure 6.5 shows the two service hierarchy graphs. SHG_VB (a) contains the nodes com.uwstar.crs.Booking, com.uwstar.crs.VehicleRepository, com.uwstar.crs.vehicle.Car, com.uwstar.crs.vehicle.Truck, com.uwstar.crs.vehicle.SUV, com.uwstar.crs.person.Agent, and com.uwstar.crs.person.Customer; SHG_VE (b) contains the nodes com.uwstar.crs.VehicleEvaluation and com.uwstar.crs.person.Customer.]

Figure 6.5: The Service Hierarchy Graphs of the CRS System.

[Figure 6.6 shows the two component hierarchy graphs. CHG_VB (a) contains the nodes Vehicle Booking, Vehicle Repository, Agent, and Customer; CHG_VE (b) contains the nodes Vehicle Evaluation and Customer.]

Figure 6.6: The Component Hierarchy Graphs of the CRS System.
We have two top-level components generated after the component generation stage, and the
low-level components underneath each top-level component are described in the related compo-
nent hierarchy graph. The two top-level components are described as follows:

• (Vehicle Booking, IBooking, CF1, CHG_VB). The component hierarchy graph CHG_VB
is shown in Figure 6.6 (a).

• (Vehicle Evaluation, IEvaluation, CF1, CHG_VE). The component hierarchy graph
CHG_VE is shown in Figure 6.6 (b).
After running Algorithm 6.1, we get the component-based version of the CRS system as
shown in Figure 6.7. The component-based system has the same functionality as the original system.
[Figure 6.7 shows the component-based Car Rental System as a UML diagram: the application Car Rental System contains the components Vehicle Booking (interface IBooking), Vehicle Evaluation (interface IEvaluation), Vehicle Repository (interface IRepository), Agent (interface IAgent), and Customer (interface ICustomer), together with Dealer.]

Figure 6.7: The Component-Based Car Rental System.
6.3 Summary
In this chapter, we explained the processes contained in the component generation stage and sys-
tem transformation stage of the SOC4J framework. We have discussed how an identified service
can be realized as a self-contained component and how the existing system can be reconstructed
into a component-based system based on the components that realize the identified services.
Chapter 7
Empirical Studies
In this chapter, we perform a set of empirical studies on the proposed SOC4J framework to
assess the service-oriented componentization techniques introduced in this thesis. The proposed
technique has been implemented in a prototype that aims to i) identify critical business services
embedded in an existing Java system, ii) realize identified services into self-contained reusable
components, and iii) transform the existing system into a component-based system. Therefore,
the purpose of the empirical study in this chapter is to test the effectiveness of the proposed SOC4J
framework and assess i) the usefulness in terms of feasibility and effectiveness of the architecture
recovery and representation approach, ii) the usefulness in terms of efficiency and effectiveness
of the business service identification technique, iii) the usefulness in terms of effectiveness of the
identified service modeling and packaging techniques, and iv) the time and space complexity of
the service-oriented componentization technique as a function of source code size.
We outline the implementation of the prototype for the SOC4J framework in Section 7.1. In
Section 7.2, we discuss two evaluation criteria for the proposed framework. We then present
empirical studies on two Java open source projects in Sections 7.3 and 7.4. Finally, we summarize
this chapter in Section 7.5.
CHAPTER 7. EMPIRICAL STUDIES 95
7.1 A Prototype for the SOC4J Framework
As a part of this work, the proposed service-oriented componentization approach has been im-
plemented in a prototype which offers an interactive and integrated environment for i) identify-
ing critical business services embedded in an existing Java system, ii) realizing each identified
service as a self-contained component, and iii) transforming the object-oriented design into a
service-oriented architecture. We have named the prototype JComp, a Java Componentization
Kit. JComp is an integrated tool workbench targeted at rapidly integrating software tools for
prototyping the SOC4J framework. We now examine the tool integration requirements for the
SOC4J framework and discuss the implementation of JComp.
7.1.1 Tool Integration Requirements
As we discussed in Chapter 3, several software tools are needed for the SOC4J framework to
componentize an object-oriented system and re-modularize the existing assets to support service
functionality. Figure 7.1 depicts the tool interconnection of the SOC4J framework. The five
rounded rectangles in the figure represent the tools needed for the SOC4J framework, while the
thick arrow represents the flow of data that integrates the tools within the framework.
The functionality of each tool is outlined as follows:

Source Code Modeling: This tool parses the Java source code and outputs a set of raw fact
data. Based on the extracted facts, the tool further generates the source code models
defined in Chapter 4, including JPackage, JFile, JClass, and JMethod. The raw data set
and source code models are exported as XML documents.
Architecture Modeling: Based on the source code models, this tool identifies all class relation-
ships defined in Chapter 4 and exports the identified relationships in graph representations,
that is, the CIRG and CIDG. Basic reusability attributes for each class in the system are
also computed. The CIRG and CIDG are exported as XML documents.

[Figure 7.1 shows the tool interconnection: Java source code feeds the Source Code Modeling tool, which produces facts and source code models; these flow through the Architecture Modeling tool (producing the CIRG and CIDG), the Service Identification tool (producing identified services), the Component Generation tool (producing self-contained components), and the System Transformation tool (producing the component-based system), all within the integrated tool workbench for the SOC4J framework.]

Figure 7.1: The Tool Interconnection for the SOC4J Framework.
Service Identification: This tool assists users in identifying the business services embedded in
an existing Java system through analysis of the CIRG and CIDG. Firstly, it identifies the
top-level services of the system and builds a service hierarchy graph for each identified
top-level service. Then, it performs a graph transformation on the service hierarchy graph
to identify low-level services for each top-level service.
Component Generation: This tool realizes identified services as self-contained components.
For each identified service, it extracts all classes/interfaces that are necessary for imple-
menting the service, generates an interface for the service, and packages these classes/in-
terfaces together with the interface as a JAR file.
System Transformation: This tool reconstructs an existing Java system into a component-based
system by using the components generated from the source system. The system transforma-
tion process preserves the functionality of the source system.
7.1.2 JComp RCP Application
JComp is built on top of the Eclipse Rich Client Platform (RCP) [68] and hence it is
called an Eclipse RCP application. An Eclipse RCP application is a collection of plug-ins and
the Runtime on which they run. The platform-independent Eclipse RCP architecture makes rich-
client applications easy to write because business logic is organized into reusable components
called plug-ins. Eclipse RCP provides a core set of services, representing a substantial percentage
of the rich client platform development functionality, so that developers do not have to rewrite
infrastructure code. These Eclipse RCP services are available to every application component
plug-in. These services are the interface between a plug-in and the low-level platform-specific
functionality that supports the plug-in, just like a J2EE container is the interface between EJB
and the application server. Moreover, because of the Eclipse open source license, we can use the
technologies that went into Eclipse to create our own commercial-quality programs. The GUI
toolkits used by Eclipse RCP are the same as those used by the Eclipse IDE; they enable
high-performance applications with a native look and feel on any platform they run on.
The architecture of the JComp toolkit is depicted in Figure 7.2. The internals of the JComp are
the same OSGi runtime and GUI toolkit provided by the Eclipse IDE. The OSGi runtime enables
Java code from multiple sources to all run together in a single Java Virtual Machine (JVM). The
OSGi framework automatically loads and runs bundles which are encapsulations of various files.
This provides the mechanism by which plug-ins can be automatically detected and loaded into the
JComp RCP application. The resource manager provides a GUI to show the current configuration;
that is, a list of installed plug-ins. It assists the end user in finding and installing new plug-ins.
It is also capable of scanning through the list of already-installed plug-ins to look for updates to
[Figure 7.2 shows the JComp RCP application layered on the Eclipse RCP Platform: the Platform Runtime (OSGi), a Resource Manager, the SWT and JFace toolkits, and the generic workbench UI, on top of which sit the Parser, Modeler, Extractor, Generator, and Transformer plug-ins.]

Figure 7.2: The Architecture of the JComp Java Componentization Kit.
these plug-ins. The Standard Widget Toolkit (SWT) provides a completely platform-independent
API that is tightly integrated with the operating system’s native windowing environment. Java
widgets actually map to the platform’s native widgets. This gives Java applications a look and
feel that makes them virtually indistinguishable from native applications. The JFace toolkit is
a platform-independent user interface API that extends and interoperates with the SWT. This
library provides a set of components and helper utilities that simplify many of the common tasks
in developing SWT user interfaces. The generic workbench provides extension points that the
plug-ins extend. The plug-ins provide functionality that is integrated into the RCP platform just
as if it were always part of the application.
As depicted in Figure 7.2, each tool described in Section 7.1.1 was implemented as a separate
JComp plug-in. A snapshot of the JComp Java Componentization Kit is depicted in Figure 7.3.
Figure 7.3: A Snapshot of the JComp Java Componentization Kit.
7.2 Evaluation Criteria
Since the proposed framework aims to extract reusable components from an object-oriented
system and migrate the object-oriented design to a service-oriented architecture, the evaluation
criteria need to address component reusability and architectural improvement.
7.2.1 Component Reusability
The components acquired by applying the proposed framework are structurally reusable because
the internal structures are encapsulated and the components are self-contained and thus have no
dependency upon entities outside of them. However, we still need a way to assess their
reusability quantitatively.
Reusability Metric Suite
Components have two relatively static sources of information: the external documentation and
the public interface. The external documentation is an important source of information that can
greatly affect component reusability; such documentation is developed for a human audience,
which makes it harder to measure. On the other hand, component interfaces are easily parsed by a
computer, making them easier to measure. This is an important argument for developing reusabil-
ity metrics based upon component interfaces. In this thesis, we aim to assess the reusability of the
extracted components through the analysis of their interfaces and internal methods as well. We
define a reusability metric suite by selecting and adapting the metrics defined in [13, 25, 70, 91]:
Parameter Per Method (PPM): The PPM metric measures the mean size of the method declara-
tions in the interface of the component, and it is defined as follows:

    PPM = IPC / IMC   if IMC > 0;   0 otherwise.      (7.1)
where the metric IPC (Interface Parameter Count) is the count of parameters of all public
methods in the interface of the component, and the metric IMC (Interface Method Count)
is the count of public methods in the interface of the component.

It is believed that methods with fewer parameters are easier to understand, and so will be
easier to reuse [58]. It follows that component interfaces with a lower PPM will tend to
have lower complexity and hence better understandability.
Reference Parameter Density (RPD): The RPD metric measures the occurrence of reference
parameters in an interface, and it is defined as follows:

    RPD = IRPC / IPC   if IPC > 0;   0 otherwise.      (7.2)
where the metric IRPC (Interface Reference Parameter Count) is the count of reference-
type parameters of all public methods in the interface of the component.

It is believed that the use of references makes it more difficult to understand a pro-
gram [87]. This also applies to interfaces, as arguments passed by reference
tend to be more difficult to understand than arguments passed by value. A higher
RPD indicates that an interface tends to be more difficult to understand. However, it
is often necessary to use reference arguments so that useful functionality can be
implemented. Therefore, a high value is not necessarily evidence of a poor interface, but it
does suggest that good documentation is required [13].
Rate of Component Observability (RCO): The RCO metric measures the percentage of read-
able properties among all fields implemented within the interface of the component, and it is
defined as follows:

    RCO = IRMC / IFRC   if IFRC > 0;   0 otherwise.      (7.3)
where the metric IRMC (Interface Reader Method Count) is the count of public methods
in the interface of the component that read a field, and the metric IFRC (Interface Field and
Reference Count) is the count of fields and references in the interface of the component.

RCO indicates the component's degree of observability for users of the component [91].
To understand the behavior of a component from outside, the observability
of the component should be high. However, when the observability is too high, it may be
difficult for users to find an important readable property among all of the readable properties.
Rate of Component Customizability (RCC): The RCC metric measures the percentage of writable
properties among all fields implemented within the interface of the component, and it is defined
as follows:

    RCC = IWMC / IFRC   if IFRC > 0;   0 otherwise.      (7.4)
where the metric IWMC (Interface Writer Method Count) is the count of public methods
in the interface of the component that write a field.

RCC indicates the component's degree of customizability for users of the component. To
adapt the settings of a component from outside to the user's requirements,
the customizability of the component should be high. However, too high a customizability
violates the encapsulation of the component and leads to greater opportunities for improper
use [91].
Self-Completeness of Component's Return Values (SCCr): The SCCr metric measures the per-
centage of business methods without any return value among all business methods implemented
in the component, and it is defined as follows:

    SCCr = VMC / MC   if MC > 0;   1 otherwise.      (7.5)
where the metric VMC (Void Method Count) is the count of public methods in the compo-
nent that have a void return type, and the metric MC (Method Count) is the count of public
methods in the component.

SCCr indicates the component's degree of self-completeness and external dependency,
based on the return values of methods. The smaller the number of business methods with
return values, the smaller the possibility of the component having an external dependency. High
self-completeness of a component (i.e., low external dependency) leads to high portability
of the component [91].
Self-Completeness of Component's Parameters (SCCp): The SCCp metric measures the per-
centage of business methods without any parameters among all business methods implemented
in the component, and it is defined as follows:

    SCCp = NPMC / MC   if MC > 0;   1 otherwise.      (7.6)
where the metric NPMC (None Parameter Method Count) is the count of public methods
in the component that do not have any parameters.

SCCp indicates the component's degree of self-completeness and external dependency,
based on the parameters of methods. The smaller the number of business methods with
parameters, the smaller the possibility of the component having a dependency outside it [91].
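Interface-level metrics of this kind can be computed mechanically. A minimal sketch of PPM and RPD using Java reflection follows; the sample ICustomer interface is our own illustration, not the framework's generated interface:

```java
import java.lang.reflect.Method;

public class InterfaceMetrics {

    // PPM = IPC / IMC : mean number of parameters per interface method.
    static double ppm(Class<?> iface) {
        Method[] methods = iface.getMethods();
        if (methods.length == 0) return 0.0;
        int ipc = 0;
        for (Method m : methods) ipc += m.getParameterCount();
        return (double) ipc / methods.length;
    }

    // RPD = IRPC / IPC : fraction of parameters that are reference types.
    static double rpd(Class<?> iface) {
        int ipc = 0, irpc = 0;
        for (Method m : iface.getMethods()) {
            for (Class<?> p : m.getParameterTypes()) {
                ipc++;
                if (!p.isPrimitive()) irpc++; // reference-type parameter
            }
        }
        return ipc > 0 ? (double) irpc / ipc : 0.0;
    }

    // A sample interface, invented for illustration only.
    interface ICustomer {
        void setID(String id);          // 1 reference parameter
        String getID();                 // 0 parameters
        void updateCreditRecord(int n); // 1 primitive parameter
    }

    public static void main(String[] args) {
        System.out.println("PPM = " + ppm(ICustomer.class)); // 2 params / 3 methods
        System.out.println("RPD = " + rpd(ICustomer.class)); // 1 of 2 params is a reference
    }
}
```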
Reusability Model
Reusability is a high-level quality of software components and hence it is the result of the combi-
nation and interaction of many low-level properties. The component reusability model typically
shows reusability as being composed of properties such as complexity, observability, customiz-
ability, and external dependency. From the user’s point of view, we define a component reusability
model as illustrated in Figure 7.4. This model is an adaptation of the reusability model intro-
duced by Washizaki et al. [91]. The quality factors are selected only to provide an analysis of
the reusability of a component; factors related to other aspects of component quality that do
not bear on reusability are omitted. The choice of the three fac-
tors affecting reusability has been made on the basis of an analysis of the activities carried out
when reusing a black-box component. We extend Washizaki's model to quantify the complexity
of components by utilizing the Reference Parameter Density (RPD) metric proposed in [13]. Thus,
the adapted model includes aspects related to the Understandability, Adaptability, and Portability
factors given by ISO 9126 [1].
[Figure 7.4 shows the component reusability model as a chain from characteristic to quality factors, criteria, and metrics: the characteristic Reusability comprises the quality factors Understandability, Adaptability, and Portability; these map to the criteria Complexity, Observability, Customizability, and External Dependency, which are measured by the metrics RPD, RCO, RCC, SCCr, and SCCp, respectively.]

Figure 7.4: The Component Reusability Model.
In order to quantify the reusability of the components generated by our framework, based on
the reusability model we formulate the reusability measurement as follows:

    Reusability = w_complexity · RPD + w_observability · RCO
                + w_customizability · RCC + w_ex-dependency · (SCCr + SCCp) / 2      (7.7)
By their definitions, the values of all metrics in the above formula lie in [0, 1]. Since com-
plexity and external dependency have a negative effect on reusability, the weights w_complexity and
w_ex-dependency take values in [−1, 0], while observability and customizability have a
positive effect and hence the weights w_observability and w_customizability take values in
[0, 1]. Nevertheless, the sum of these four weights is set to 1. Consequently, the reusability value
will be in [0, 1], and a higher value represents a higher level of reusability.
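Formula (7.7) is a plain weighted sum. The sketch below evaluates it with the weight values used later in the Jetty study (Section 7.3); the metric values passed in are invented for illustration:

```java
public class ReusabilityScore {

    // Evaluates Formula (7.7) with the weights from the Jetty study.
    static double reusability(double rpd, double rco, double rcc,
                              double sccR, double sccP) {
        double wComplexity = -0.3;      // negative: complexity hurts reusability
        double wObservability = 0.8;
        double wCustomizability = 0.8;
        double wExDependency = -0.3;    // negative: external dependency hurts reusability
        return wComplexity * rpd
             + wObservability * rco
             + wCustomizability * rcc
             + wExDependency * (sccR + sccP) / 2.0;
    }

    public static void main(String[] args) {
        // Illustrative metric values, all in [0, 1].
        System.out.println(reusability(0.4, 0.6, 0.5, 0.7, 0.9));
    }
}
```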
7.2.2 Architectural Improvement
The software architecture of a program or computing system is the structure of the system, which
comprises software components, the externally visible properties of those components, and the
relationships among these components. The more complex a system's structure is, the more dif-
ficult it is to understand, and therefore to maintain. We wish to measure the degree to which
the target (restructured) architecture conforms to the architectural principles of high
intra-module cohesion and low inter-module coupling. In this thesis, we introduce a metric for
determining whether a large software system is "well-structured", based on the concept of
entropy from information theory.
Entropy from an information-theoretic point of view has been proposed in [78] for evaluating
the structuredness of a software design. We adopt the definition of entropy for an object-oriented
design introduced in [20] to compute the entropy of our source systems and target systems, re-
spectively. The smaller the entropy value, the better structured the system is. We then compare
the results to see whether the structures of our target systems are improved. The entropy of an
object-oriented system S with n classes is defined as follows [20]:

    H(S) = − Σ_{i=1}^{n} p(c_i) log₂ p(c_i)      (7.8)
It is assumed that the system is described in a standard class diagram format following UML
notation for associations between classes. For a randomly selected unary association, p(c_i) is de-
fined as the probability that the association leads to class c_i. The existence of such an association
indicates that class c_i provides services to the rest of the system, since it responds to messages
sent to it. Within this context, bi-directional associations are treated as two separate unary as-
sociations. Classes are used as the units for entropy measurement because classes represent the
most important fundamental building blocks of an object-oriented system and are an identifiable
abstraction that is present both in designs and implementations.
To compute the entropy metric of the source system of our framework, let n be the number of
classes/interfaces of the source system; we compute p(c_i) as the ratio of the number of incoming
edges of class c_i over the total number of edges in the CIDG of the source system. To compute
the entropy metric of the target system of our framework, we take n as the total number
of components and classes/interfaces contained in the target system, and we then compute p(c_i)
in the same way as for the source system, except that an association may exist between a
class/interface and a component.
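Given the incoming-edge counts from the CIDG, the entropy computation reduces to a few lines. This sketch uses an invented edge distribution and assumes, as is conventional, that classes with no incoming edges contribute nothing to the sum:

```java
public class DesignEntropy {

    // H(S) = -sum_i p(c_i) * log2 p(c_i), where p(c_i) is the fraction of
    // all dependency edges whose target is class c_i (terms with p = 0 are skipped).
    static double entropy(int[] incomingEdges) {
        int total = 0;
        for (int e : incomingEdges) total += e;
        double h = 0.0;
        for (int e : incomingEdges) {
            if (e == 0) continue;
            double p = (double) e / total;
            h -= p * (Math.log(p) / Math.log(2.0)); // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // Four classes with 2 incoming edges each: a uniform distribution,
        // so the entropy is log2(4) = 2 bits.
        System.out.println(entropy(new int[] { 2, 2, 2, 2 }));
    }
}
```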
7.3 Case Study: Jetty
In this section, we apply the JComp Java Componentization Kit to Jetty [46] to empirically eval-
uate the usefulness of the proposed SOC4J framework.
7.3.1 Statistics of the Jetty
Jetty is an open-source, standards-based, full-featured web server implemented entirely in Java. It
is released under the Apache 2.0 licence and is therefore free for commercial use and distribution.
Jetty can be used as: i) a stand-alone traditional web server for static and dynamic content, ii) a
dynamic content server behind a dedicated HTTP server such as Apache (using the Apache
module mod_proxy), and iii) an embedded component within a Java application.
Project   Version   LOC     Java Source Files   Packages   Classes   Interfaces
Jetty     5.1.10    44125   318                 25         273       47

Table 7.1: Statistics of the Jetty.
As shown in Table 7.1, we work on Jetty version 5.1.10, which was released on April 5, 2006.
It has about 44K LOC and consists of 318 Java source files that define 273 classes
and 47 interfaces distributed over 25 packages.
7.3.2 Discussion of the Obtained Results
In order to componentize the Jetty system, we first applied the JComp Java Componentization Kit
to identify business services embedded in the system. The JComp then generated a self-contained
component for each identified service.
The Parser plug-in of the JComp imported the source code of the Jetty and built a set of
source code models. These source code models were exported and stored as XML documents.
The Modeler plug-in imported the source code models and recovered architectural models that
are represented by the CIRG and CIDG. Like the source code models, the CIRG and CIDG were
exported and stored as XML documents. Firstly, based on the CIRG and CIDG, the Extractor
plug-in, which implements the top-level service identification algorithm (i.e., Algorithm 5.2) and
the low-level service identification algorithm (i.e., Algorithm 5.3), identified 33 top-level service
candidates from the CIDG. We then validated each candidate by examining the facade class set
of these candidates, and accepted 16 top-level services. These 16 top-level services represent the
functionality of Jetty from the point of view of end users. Appendix A lists and describes
all accepted top-level services of the Jetty web server. Figure 7.5 depicts the accepted Service
View of the Extractor plug-in, which displays all accepted top-level services of Jetty. The
unaccepted candidates are dead code, debugging modules, or testing modules. For instance, we
found 8 dead classes in the org.mortbay.util package and a debugging module whose entry point is
the class org.mortbay.servlet.ProxyServlet.

Figure 7.5: The Accepted Service View of the Extractor Plug-in.
ID    Top-Level Service               Classes/Interfaces   Low-Level Services
T1    Win32 Server                    248                  11
T2    Dynamic Servlet Invoker         207                  12
T3    Jetty Server MBean              126                  9
T4    Proxy Request Handler           113                  7
T5    XML Configuration MBean         87                   5
T6    Web Application MBean           86                   6
T7    Administration Servlet          56                   5
T8    CGI Servlet                     49                   5
T9    Host Socket Listener            46                   5
T10   Web Configuration               34                   3
T11   Authentication Access Handler   30                   3
T12   Servlet Response Wrapper        27                   2
T13   IP Access Handler               18                   0
T14   Multipart Form Data Filter      16                   2
T15   HTML Script Block               12                   1
T16   Applet Block                    9                    1

Table 7.2: Top-Level Services Identified from Jetty.
After all the top-level services were validated, the Extractor plug-in then identified low-level
services underneath each top-level service. Table 7.2 shows the atomic services and identified
low-level services for each top-level service. Actually, atomic services of a top-level service are
Java classes or interfaces that implement the top-level service; they are represented by nodes of
the original SHG of the services. For example, as Table 7.3 shows, there are 11 low-level services
identified from top-level service Win32 Server (i.e., top-level service T1). This top-level service
runs Jetty as a Windows HTTP server. When identifying low-level services, we used
Termination Criterion 5.1 described in Chapter 5 to terminate the iteration in Algorithm 5.3 by
setting MQ = 0.75. In cases where the level of granularity of services is crucial, the user may use
the Termination Criterion 5.2 for Algorithm 5.3. As Figure 7.6 shows, we terminated the low-
level service identification process at the fifth iteration. The final low-level services identified for
top-level service Win32 Server are shown in Table 7.3.

[Figure 7.6 sketches the iterations of the service aggregation process, from the original SHG through the first and second iterations to the final iteration.]

Figure 7.6: Iterations of the Service Aggregation Process of Top-Level Service Win32 Server.
To realize each identified service (both top-level service and low-level service), the Generator
plug-in generated a self-contained component for each service. Figure 7.7 illustrates the compo-
nent hierarchy graph (CHG) of the top-level component Win32 Server. There are 11 low-level
components contained in the top-level component. Furthermore, the Generator plug-in measured
the reusability of each generated component by applying the component reusability model and
computing Formula (7.7). In this empirical study, we set w_complexity = −0.3, w_observability = 0.8,
w_customizability = 0.8, and w_ex-dependency = −0.3. Figure 7.8 shows reusability values of the
[Figure: CHG nodes: Win32 Server; Jetty Server; HTTP Connection; HTTP Request; HTTP Response; Security Handler; Service Handlers; Resource Handler; Servlet Handler; Web Application Context; Servlet]
Figure 7.7: The CHG of Top-Level Component Win32 Server of Jetty.
Low-Level Component        Reusability
Jetty Server               0.9
Service Handlers           0.6
Resource Handler           0.7
Security Handler           0.7
Socket Listener            0.8
HTTP Connection            0.9
HTTP Request               0.7
HTTP Response              0.5
Web Application Context    0.6
Servlet                    0.7
Servlet Handler            0.8
Table 7.3: Low-Level Services Identified in Top-Level Service Win32 Server.
top-level components and the average value for the low-level components underneath each top-level component. From Figure 7.8, we observed that all top-level components, except C16, have reusability values above 0.5, and all the average values are between 0.6 and 0.8. Thus, we could conclude that the services identified from the Jetty project have a reasonable level of reusability.
[Figure: bar chart; x-axis: Top-Level Components C1 to C16; y-axis: Reusability (0 to 1); series: Reusability of Top-Level Components, and Average Reusability of Low-Level Components in a Top-Level Component]
Figure 7.8: The Reusability of Components Extracted from Jetty.
The Transformer plug-in transformed Jetty into a component-based system based on the generated components. We named the target system Jetty-JComp. As Algorithm 6.1 ensures, Jetty-JComp has the same functionality as Jetty. Jetty-JComp now contains 16 independent JAR files. Each JAR file provides a top-level service and can be used independently. Also, each independent JAR file is itself a component-based system that consists of a set of JAR files.
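Because each top-level component is packaged as an independent JAR, a client can load and use it in isolation. The sketch below is illustrative, not Jetty-JComp's actual API: the facade class name and JAR location would come from the component's metadata, and here a JDK class is resolved through an empty `URLClassLoader` so the example runs without the actual component JARs being present.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of using one top-level component JAR independently. The facade
// class name and the JAR URLs are supplied by the caller; both are
// hypothetical here, not part of Jetty-JComp's actual interface.
public class ComponentLoader {
    static Object loadFacade(URL[] componentJars, String facadeClass)
            throws Exception {
        try (URLClassLoader loader = new URLClassLoader(
                componentJars, ComponentLoader.class.getClassLoader())) {
            // Instantiate the facade via its public no-argument constructor.
            return loader.loadClass(facadeClass)
                         .getDeclaredConstructor()
                         .newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in: resolve a JDK class through an empty class-loader chain
        // so the sketch runs without a component JAR on disk.
        Object facade = loadFacade(new URL[0], "java.util.ArrayList");
        System.out.println(facade.getClass().getName());
    }
}
```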
We have computed the entropy of both Jetty and Jetty-JComp by applying Formula (7.8). When computing the entropy of Jetty-JComp, we used the component hierarchy graphs instead of the CIDG because Jetty-JComp is comprised of components. We found that the entropy of Jetty-JComp was reduced by 45.5% compared to the original Jetty project. Hence, we can conclude that our transformation dramatically improves the structure of the system.
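Formula (7.8) itself is not reproduced in this excerpt. Assuming it is a Shannon-style entropy computed over a probability distribution derived from the dependency graph (as in information-theoretic design metrics), the before/after comparison amounts to the following sketch; the two distributions are hypothetical.

```java
// Sketch of comparing structural entropy before and after componentization.
// Formula (7.8) is assumed to be a Shannon-style entropy
// H = -sum(p_i * log2(p_i)) over a distribution derived from the CIDG
// (before) or the CHGs (after). Both distributions below are hypothetical.
public class StructuralEntropy {
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0) {
                h -= pi * (Math.log(pi) / Math.log(2)); // log base 2
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double before = entropy(new double[]{0.25, 0.25, 0.25, 0.25}); // 2 bits
        double after  = entropy(new double[]{0.7, 0.1, 0.1, 0.1});
        System.out.printf("entropy reduced by %.1f%%%n",
                100.0 * (before - after) / before);
    }
}
```

A more concentrated distribution (the componentized system) yields lower entropy than a uniform one, which is the direction of the reduction reported above.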
In Table 7.4, we summarize the time and space costs of the proposed service-oriented
Measurement Item                                        Value
Case Study Size (KLOC)                                  44.1
Source Code Modeling Time (min:sec)                     2:18
Source Code Model Space (MB)                            1.43
Architecture Modeling Time (min:sec)                    4:19
Architecture Model Space (MB)                           1.57
Top-Level Service Identification Time (min:sec)         6:45
Average Low-Level Service Identification Time (sec)     66
Table 7.4: Some Time and Space Statistics of the SOC4J Framework on the Case Study: Jetty.
componentization framework as a function of the source code size of the Jetty project. The experiment was carried out on a Windows desktop with an Intel Pentium 4 CPU (3.4 GHz) and 2 GB of memory.
7.4 Case Study: Apache Ant
In this section, we apply the JComp Java Componentization Kit to another Java open-source project, namely Apache Ant [2], to further evaluate the usefulness of the proposed SOC4J framework.
7.4.1 Statistics of the Apache Ant
Apache Ant is a software tool for automating software build processes. It is similar to make but is written in the Java language and is primarily intended for use with Java. The most immediately noticeable difference between Ant and make is that Ant uses a file in XML format to describe the build process and its dependencies, whereas make has its own Makefile format. By default, the XML file is named build.xml. Ant is an Apache project. It is open source software, and is released under the Apache Software License 2.0.
As shown in Table 7.5, we worked on Apache Ant version 1.6.5, the latest version at the time of this study. It has around 86 KLOC of source code and consists of 690 Java source files that define 640 classes and 60 interfaces distributed across 70 packages.
Project Version LOC Java Source Files Packages Classes Interfaces
Apache Ant 1.6.5 86468 690 70 640 60
Table 7.5: Statistics of the Apache Ant.
7.4.2 Discussions on Obtained Results
To componentize the Apache Ant system, as we did for Jetty, we first applied the JComp Java Componentization Kit to identify the business services embedded in the system. Then, JComp generated a self-contained component for each identified service.
ID    Top-Level Service                 Classes/Interfaces    Low-Level Services
T1    Project Building                  205                   34
T3    WAR File Creation                 152                   17
T4    TAR File Creation                 144                   20
T6    JUnit Invocation                  114                   17
T8    JAR File Creation                 113                   17
T11   Unit Test Execution               86                    14
T14   File Content Loading              80                    15
T17   SSH File Copy                     67                    19
T21   Zip File Creation                 57                    15
T25   XML File Checking                 54                    9
T30   Java Class Execution              45                    11
T31   Dependency Manifest Generation    45                    8
T48   GZip File Expansion               34                    4
T49   File Concatenation                34                    6
T53   Telnet Session Generation         34                    8
T63   CVS Repository Retrieval          29                    4
T69   JavaCC Invocation                 26                    5
T74   File Permission Change            23                    5
T85   URL File Retrieval                18                    4
T92   String Replacement                16                    4
Table 7.6: Selected Top-Level Services Identified from Apache Ant.
The Parser plug-in of JComp imported the source code of Apache Ant and built a set
Low-Level Service     Reusability
File Output           0.8
Zip File Set          0.6
Task Generator        0.9
Identity Mapper       0.7
Project Loader        0.5
Zip Scanner           0.9
File Packing          0.8
File Mapper           0.5
File Scanner          0.6
Resource Selector     0.7
File Entry            0.8
Conversion Rules      0.9
Exception Handle      0.7
Resource Factory      0.6
Type Integers         0.5
File Field            0.7
Resource Handler      0.8
Table 7.7: Low-Level Services Identified in Top-Level Service WAR File Creation.
of source code models. These source code models were exported and stored as XML documents.
The Modeler plug-in imported the source code models and recovered architectural models that
are represented by the CIRG and CIDG. Like source code models, the CIRG and CIDG were
exported and stored as XML documents. Based on the CIRG and CIDG, the Extractor plug-in first identified 236 top-level service candidates from the CIDG. Then we validated each candidate by examining its facade class set. Finally, we accepted 101 top-level services. Appendix B lists and describes all accepted top-level services of the Apache Ant system. These 101 top-level services represent the functionality of Apache Ant from the point of view of end users. We also found that some candidates were dead code, debugging modules, or testing modules, and hence did not accept them as top-level services.
After all top-level services were validated, the Extractor plug-in identified the low-level services underneath each top-level service. We randomly selected 20 top-level services from the 101 accepted services to further identify the low-level services underneath each of them. Table 7.6 shows the atomic services and identified low-level services for each selected top-level service. For example, as Table 7.7 shows, 17 low-level services were identified from top-level service WAR File Creation (i.e., top-level service T3). WAR File Creation packages Web applications: it packages a set of files into Web archive (WAR) files that should end up in the WEB-INF/lib, WEB-INF/classes, or WEB-INF directories of the Web Application Archive. We used Termination Criterion 5.2 described in Chapter 5 to terminate the iteration in Algorithm 5.3 by examining the level of granularity of the low-level services.
[Figure: CHG nodes: WAR File Creation; File Output; Task Generator; Zip File Set; Resource Factory; Identity Mapper; Resource Selector; Exception Handle; Resource Handler; Project Loader; File Scanner; Zip Scanner; File Entry; File Mapper; File Packing; File Field; Type Integers; Conversion Rules]
Figure 7.9: The CHG of Top-Level Component WAR File Creation of Apache Ant.
Again, to realize each identified service (both top-level and low-level), the Generator plug-in generated a self-contained component for each service. Figure 7.9 illustrates the component hierarchy graph (CHG) of top-level component WAR File Creation. There are 17
[Figure: bar chart; x-axis: Top-Level Components C1, C3, C4, C6, C8, C11, C14, C17, C21, C25, C30, C31, C48, C49, C53, C63, C69, C74, C85, C92; y-axis: Reusability (0 to 1); series: Reusability of Top-Level Components, and Average Reusability of Low-Level Components in a Top-Level Component]
Figure 7.10: The Reusability of Components Extracted from the Apache Ant.
low-level components contained in the top-level component. Furthermore, the Generator plug-in measured the reusability of each generated component, applying the component reusability model by computing Formula (7.7). As for the Jetty project, we set w_complexity = -0.3, w_observability = 0.8, w_customizability = 0.8, and w_ex-dependency = -0.3. Figure 7.10 shows the reusability values of the top-level components of Apache Ant and the average value for the low-level components underneath each top-level component. From Figure 7.10, we observed that all top-level components, except C30, have reusability values above 0.5, and all the average values are between 0.5 and 0.9. Thus, we could conclude that the services identified from the Apache Ant project have a reasonable level of reusability.
Based on the generated components, the Transformer plug-in transformed Apache Ant into a component-based system. We named the target system Apache Ant-JComp. As Algorithm 6.1 ensures, Apache Ant-JComp has the same functionality as Apache Ant. Apache Ant-JComp now contains 101 independent JAR files. Each JAR file provides a top-level service and can be used independently. Since we further decomposed only 20 top-level components, each of these 20 corresponding JAR files is a component-based system that consists of a set of JAR files (i.e., low-level components). Also, we have computed the entropy of both Apache Ant and Apache Ant-JComp by applying Formula (7.8). Again, when computing the entropy of Apache Ant-JComp, we used the component hierarchy graphs instead of the CIDG because Apache Ant-JComp is comprised of components. We found that the entropy of Apache Ant-JComp was reduced by 16.3% compared to the original Apache Ant project. The reduction in entropy is not as large as for Jetty-JComp because we componentized only 20 of the 101 top-level services identified from the Apache Ant project.
Measurement Item                                        Value
Case Study Size (KLOC)                                  86.5
Source Code Modeling Time (min:sec)                     5:20
Source Code Model Space (MB)                            3.34
Architecture Modeling Time (min:sec)                    9:15
Architecture Model Space (MB)                           3.92
Top-Level Service Identification Time (min:sec)         19:43
Average Low-Level Service Identification Time (sec)     54
Table 7.8: Some Time and Space Statistics of the SOC4J Framework on the Case Study: Apache Ant.
In Table 7.8, we summarize the time and space costs of the proposed service-oriented componentization framework as a function of the source code size of the Apache Ant project. The experiment was carried out on a Windows desktop with an Intel Pentium 4 CPU (3.4 GHz) and 2 GB of memory.
7.5 Summary
The design and implementation of supporting tools are fundamental requirements for assessing the practical use of a re-engineering approach. In this chapter, we developed a toolkit implementing the proposed componentization framework as an Eclipse Rich Client Platform (RCP) application. The important aspects of the proposed framework have been tested through a series of experiments. The empirical studies have shown that the proposed framework is effective in identifying services from an existing Java system and reconstructing it into a component-based system.
Chapter 8
Future Directions and Conclusions
In this chapter, we summarize the findings of this thesis and outline future research directions
that may arise from this research. In Section 8.1, we present the contributions of this thesis, and
in Section 8.2, we discuss some future work that could extend this research. Finally, we make
some concluding remarks for this work in Section 8.3.
8.1 Contributions
The principal contributions of this thesis were stated in Chapter 1. Based on the material already presented, we discuss them in more detail:
• The design and implementation of comprehensive graph representations of an object-oriented system at different levels of abstraction. These graph representations include the class/interface relationship graph (CIRG), the class/interface dependency graph (CIDG), modularized CIDGs (MCIDGs), service hierarchy graphs (SHGs), and component hierarchy graphs (CHGs). Each graph represents the system at a different level of abstraction.
• The exploration of an incremental program comprehension approach, including describing an object-oriented software system using different concurrent views, each of which
addresses a specific set of concerns of the system. The SOC4J framework extracts four
views to understand an object-oriented software system. The extracted source code models
provide the basic view (BView), while the recovered architectural models build the struc-
tural view (SView), the identified top-level services together with their service hierarchy
graphs give the service view (ServView), and the generated top-level components together
with their component hierarchy graphs introduce the component view (CompView) of the
system. Each view assists the user in understanding the system from a different perspective.
• The design and implementation of an efficient and effective methodology for identifying
and realizing critical business services embedded in an existing object-oriented system.
The business services embedded in an existing system were categorized into two classes:
Top-Level Services (TLS) and Low-Level Services (LLS). A top-level service is a service
that is not used by any other services of the system. However, it may contain a hierarchy of
low-level services further describing the service. From the requester's point of view, top-level services are provided by the system and can be accessed independently. A low-level
service is a service that is underneath a top-level service and may be agglomerated with
other low-level services to yield a new service with a higher level of granularity. The service
identification methodology is a combination of top-down and bottom-up techniques. In the
top-down portion of the methodology, we identify the top-level services and the atomic
services underneath each top-level service by identifying the entry points of the system. In
the bottom-up portion, we aggregate the atomic services to identify services with a higher
level of granularity by applying a series of graph transformations. The service aggregation
is performed incrementally.
• The design and implementation of an object-oriented restructuring methodology that transforms the typically monolithic architecture of an existing system into a more flexible service-oriented architecture. For each identified service (both top-level services and low-level services), we generate a self-contained component. A component that realizes a top-level service is called a Top-Level Component (TLC), while a component that realizes a low-level service is called a Low-Level Component (LLC). Based on the extracted components, a meta-model for the component-based target system is designed. We introduce a reconstruction technique that automatically reconstructs the existing source system into a component-based system.
• The design and implementation of a prototype system that supports the identification and realization of critical business services embedded in a Java software system and the componentization of the Java system. The prototype is designed as an Eclipse Rich Client Platform (RCP) application and is named the JComp Java Componentization Kit. A set of JComp plug-ins has been developed to implement the techniques introduced in the framework. A series of empirical studies has been performed with the JComp toolkit.
8.2 Future Work
Several new research questions have arisen from this work. We believe that significant improvements can be made in some aspects of the presented approach. The possible future work is as follows:
• To apply dynamic analysis of system behavior within the first stage of the SOC4J framework to improve the detection of class relationships.
• To investigate algorithmic processes that can be used to automatically categorize the identified services.
• To measure the reusability and maintainability of the extracted components more concisely.
• To verify that our definitions are consensual with respect to developers' intent when performing software re-engineering.
• To apply our componentization toolkit, JComp, to more real-life programs and to validate the results with the program developers.
• To extend our approach to other programming languages, for instance, C++ programs, or even C and COBOL systems.
• To develop our approach with more flavors of binary class relationships, such as shared-aggregation and container relationships.
• To improve the precision of the service identification by considering design patterns, alternate implementations of the algorithms, and alternate definitions of the class relationships.
8.3 Conclusions
In this thesis, we presented a service-oriented componentization framework for Java systems.
The framework componentizes an object-oriented system to re-modularize the existing assets for
supporting service functionality. We introduced an approach for identifying, modeling, and pack-
aging critical business services embedded in an existing system. In addition to producing reusable
components realizing the identified services, the framework also provides a component-based in-
tegration approach to migrate an object-oriented design to a service-oriented architecture. Our
initial evaluation has shown that our framework is effective in identifying services from an object-oriented design and migrating it to a service-oriented architecture. Moreover, the BView, SView, ServView, and CompView built by our framework help users gain an understanding of the system.
Appendix A
Top-Level Services of Jetty
ID    Top-Level Service              Atomic Services    Description
T1    Win32 Server                   248                Runs the Jetty as a Windows HTTP server.
T2    Dynamic Servlet Invoker        207                Invokes anonymous servlets that have not been defined in the web.xml or by other means.
T3    Jetty Server MBean             126                Configures a request log, which records all incoming HTTP requests.
T4    Proxy Request Handler          113                Makes the HTTP/1.1 proxy requests.
T5    XML Configuration MBean        87                 Performs all required configurations for running the SESM applications in Jetty containers.
T6    Web Application MBean          86                 Manages web applications' lifecycle.
T7    Administration Servlet         56                 Jetty Administration Servlet. Allows start and/or stop of server components and control of debug parameters.
T8    CGI Servlet                    49                 Runs CGI servlets on Windows.
T9    Host Socket Listener           46                 Declares a socket listener for a Jetty HTTP server.
T10   Web Configuration              34                 Creates web container configurations.
Table A.1: Top-Level Services of Jetty (1).
ID    Top-Level Service              Atomic Services    Description
T11   Authentication Access Handler  30                 Creates an authentication access handler for HTTP pages.
T12   Servlet Response Wrapper       27                 Wraps a Jetty HTTP response as a 2.2 Servlet response.
T13   IP Access Handler              18                 Creates a handler to authenticate access from certain IP addresses.
T14   Multipart Form Data Filter     16                 Decodes the multipart/form-data stream sent by an HTML form that uses a file input item.
T15   HTML Script Block              12                 Represents the script block in an HTML form.
T16   Applet Block                   9                  Represents the applet block in an HTML form.
Table A.2: Top-Level Services of Jetty (2).
Appendix B
Top-Level Services of Apache Ant
ID    Top-Level Service              Atomic Services    Description
T1    Project Building               205                Runs Ant on a supplied build file.
T2    JAR File Expansion             164                Unzips a jar file.
T3    WAR File Creation              152                Creates Web Application Archive files.
T4    TAR File Creation              144                Creates a tar archive.
T5    Zip File Expansion             117                Unzips a zip file.
T6    SQL Statement Execution        116                Executes a series of SQL statements via JDBC to a database.
T7    JUnit Invocation               114                Runs tests from the JUnit testing framework.
T8    JAR File Creation              113                Jars a set of files.
T9    TAR File Expansion             95                 Expands a tar file.
T10   File Packing                   92                 Packs a file using the GZip or BZip2 algorithm.
T11   Unit Test Execution            86                 Executes a unit test in the org.apache.testlet framework.
T12   WAR File Expansion             83                 Unzips a war file.
T13   RPM Invocation                 81                 Invokes the rpm executable to build a Linux installation file.
T14   File Content Loading           80                 Loads a file's contents as Ant properties.
T15   Metamata MParse Invocation     71                 Invokes the Metamata MParse compiler-compiler on a grammar file.
T16   CAB File Creation              67                 Creates Microsoft CAB Archive files.
Table B.1: Top-Level Services of Apache Ant (1).
ID    Top-Level Service                     Atomic Services    Description
T17   SSH File Copy                         67                 Copies files to or from a remote server using SSH.
T18   Build File DTD Generation             67                 Generates a DTD for Ant build files that contains information about all tasks currently known to Ant.
T19   File Encoding Converting              65                 Converts files from native encodings to ASCII with escaped Unicode.
T20   Task Adding                           59                 Adds a task definition to the current project, such that this new task can be used in the current project.
T21   Zip File Creation                     57                 Creates a zip file.
T22   Macro Task Definition                 56                 Defines a new task as a macro built up upon other tasks.
T23   Path Converting                       56                 Converts a path format from one platform to another platform.
T24   FTP Implementation                    56                 Implements a basic FTP client that can send, receive, list, and delete files, and create directories.
T25   XML File Checking                     54                 Checks that XML files are valid (or only well-formed).
T26   File Expansion                        52                 Expands a file packed using GZip or BZip2.
T27   Directory Property Setting            51                 Sets a property to the value of the specified file up to, but not including, the last path element.
T28   File Availability Property Setting    50                 Sets a property if a specified file, directory, class in the classpath, or JVM system resource is available at runtime.
T29   Path Property Setting                 50                 Sets a property to the last element of a specified path.
T30   Java Class Execution                  45                 Executes a Java class within the running (Ant) VM, or in another VM if the fork attribute is specified.
T31   Dependency Manifest Generation        45                 Generates a manifest that declares all the dependencies in manifest.
T32   Key Generation                        43                 Generates a key in key store.
T33   Property Setting                      43                 Sets a property (by name and value), or set of properties (from a file or resource) in the project.
T34   XML Property File Loading             43                 Loads property values from a well-formed XML file.
T35   Web Proxy Property Setting            43                 Sets Java's web proxy properties.
T36   XML Report Generation                 43                 Generates an XML report of the changes recorded in a CVS repository.
Table B.2: Top-Level Services of Apache Ant (2).
ID    Top-Level Service                       Atomic Services    Description
T37   File Token Identification               40                 Identifies keys in files, delimited by special tokens, and translates them with values read from resource bundles.
T38   Java Class Instrumenting                39                 Instruments Java classes using the iContract DBC preprocessor.
T39   Existing Task Instrumenting             39                 Defines a new task by instrumenting an existing task with default values for attributes or child elements.
T40   File Loading                            39                 Loads a file into a property.
T41   Splash Screen Display                   38                 Displays a splash screen.
T42   File Set Packing                        37                 GZips a set of files.
T43   CVS Pass Entry Adding                   37                 Adds entries to a .cvspass file.
T44   File Checksum Generation                36                 Generates a checksum for a file or set of files.
T45   Default Exclude Pattern Modification    36                 Modifies the list of default exclude patterns from within your build file.
T46   JDepend Invocation                      35                 Invokes the JDepend parser.
T47   Time Stamp Setting                      35                 Sets the DSTAMP, TSTAMP, and TODAY properties in the current project, based on the current date and time.
T48   GZip File Expansion                     34                 Expands a GZip file.
T49   File Concatenation                      34                 Concatenates multiple files into a single one or to Ant's logging system.
T50   Directory Synchronization               34                 Synchronizes two directory trees.
T51   Condition Property Setting              34                 Sets a property if a certain condition holds true.
T52   File Version Checking                   34                 Sets a property if a given target file is newer than a set of source files.
T53   Telnet Session Generation               34                 Automates a remote telnet session.
T54   Attribute Permission Change             33                 Changes the permissions and/or attributes of a file or all files inside the specified directories.
T55   Build File Importing                    32                 Imports another build file and potentially overrides targets in it with users' own targets.
T56   JJTree Invocation                       32                 Invokes the JJTree preprocessor for the JavaCC compiler-compiler.
T57   Resource Search                         32                 Finds a class or resource.
T58   Temp File Generation                    31                 Generates a name for a new temporary file and sets the specified property to that name.
T59   Remote Command Execution                30                 Executes a command on a remote server using SSH.
T60   Manifest Creation                       29                 Creates a manifest file.
Table B.3: Top-Level Services of Apache Ant (3).
ID    Top-Level Service                 Atomic Services    Description
T61   Documentation Generation          29                 Generates code documentation using the javadoc tool.
T62   XSLT Transformation               29                 Processes a set of documents via XSLT.
T63   CVS Repository Retrieval          29                 Handles packages/modules retrieved from a CVS repository.
T64   SMTP Email Sending                28                 Sends SMTP emails.
T65   User Input                        28                 Allows user interaction during the build process by displaying a message and reading a line of input from the console.
T66   JProbe Invocation                 27                 Invokes the JProbe suite.
T67   Stylebook Invocation              26                 Executes the Apache Stylebook documentation generator.
T68   File Comparison                   26                 Compares a set of source files with a set of target files; if any of the source files is newer than any of the target files, all the target files are removed.
T69   JavaCC Invocation                 26                 Invokes the JavaCC compiler-compiler on a grammar file.
T70   Regular Expression Replacement    25                 Replaces the occurrence of a given regular expression with a substitution pattern in a file or set of files.
T71   JJDoc Invocation                  25                 Invokes the JJDoc documentation generator for the JavaCC compiler-compiler.
T72   Current Property Listing          25                 Lists the current properties.
T73   EAA File Creation                 24                 Creates Enterprise Application Archive files.
T74   File Permission Change            23                 Changes the permissions of a file or all files inside the specified directories.
T75   File Deletion                     23                 Deletes either a single file, all files and subdirectories in a specified directory, or a set of files specified by one or more FileSets.
T76   Data Type Adding                  23                 Adds a data-type definition to the current project, such that this new type can be used in the current project.
T77   Change Report File Generation     23                 Generates an XML-formatted report file of the changes between two tags or dates recorded in a CVS repository.
T78   File Move                         21                 Moves a file to a new file or directory, or a set(s) of file(s) to a new directory.
T79   Log Recording                     21                 Runs a listener that records the logging output of the build-process events to a file.
T80   Project Building Termination      21                 Exits the current build by throwing a BuildException, optionally printing additional information.
Table B.4: Top-Level Services of Apache Ant (4).
ID     Top-Level Service                Atomic Services    Description
T81    Property File Creation           21                 Creates or modifies property files.
T82    MMetrics Computation             19                 Computes the metrics of a set of Java source files, using the Metamata Metrics/WebGain Quality Analyzer source-code analyzer.
T83    Script Execution                 19                 Executes a script in an Apache BSF-supported language.
T84    TAB Updating                     18                 Modifies a file to add or remove tabs, carriage returns, line feeds, and EOF characters.
T85    URL File Retrieval               18                 Gets a file from a URL.
T86    Extension Checking               18                 Checks whether an extension is present in a file set or an extension set. If the extension is present, the specified property is set.
T87    Command Execution                17                 Executes a system command.
T88    File Modification Time Change    17                 Changes the modification time of a file and possibly creates it at the same time.
T89    Sound File Execution             17                 Plays a sound file at the end of the build, according to whether the build failed or succeeded.
T90    ANTLR Invocation                 17                 Invokes the ANTLR Translator generator on a grammar file.
T91    JNI Header Generation            17                 Generates JNI headers from a Java class.
T92    String Replacement               16                 Replaces the occurrence of a given string with another string in a selected file.
T93    MAudit Computation               15                 Performs static analysis on a set of Java source-code and byte-code files, using the Metamata Metrics/WebGain Quality Analyzer source-code analyzer.
T94    Directory Creation               15                 Creates a directory.
T95    Text Output                      15                 Echoes text to System.out or to a file.
T96    File Copying                     13                 Copies a file or Fileset to a new file or directory.
T97    File Group Ownership Change      12                 Changes the group ownership of a file or all files inside the specified directories.
T98    Project Filter Setting           12                 Sets a token filter for this project, or reads multiple token filters from a specified file and sets these as filters.
T99    Source Code Extraction           12                 Allows the user to extract the latest edition of the source code from a PVCS repository.
T100   File Ownership Change            11                 Changes the owner of a file or all files inside the specified directories.
T101   JAR File Information Display     9                  Displays the "Optional Package" and "Package Specification" information contained within the specified jars.
Table B.5: Top-Level Services of Apache Ant (5).
Bibliography
[1] Software product evaluation-quality characteristics and guidlines for their use.ISO/IEC
Standard ISO-9129, 1991.
[2] Apache Ant. A Java-based build tool.http://ant.apache.org/, 2006.
[3] Jagdish Bansiya and Carl G Davis. A class cohesion metric for object-oriented designs.
Journal of Object-Oriented Programming, 11:47–52, January 1999.
[4] Jagdish Bansiya and Carl G Davis. A hierarchical model for object-oriented design quality
assessment.IEEE Transactions on Software Engineering, 28:4–17, January 2002.
[5] V. Basili, L. Briand, and W. Melo. A validation of object-oriented design metrics as quality
indicators.IEEE Transactions on Software Engineering, 22:751–761, October 1996.
[6] L. Belady and C. Evangelisti. System partitioning and its measure.Journal of Systems and
Software, 2:23–29, 1981.
[7] Martin Bernauer, Gerti Kappel, and Gerhard Kramler. Repre-
senting XML Schema in UML - a comparison of approaches.
http://www.big.tuwien.ac.at/research/publications/2003/1303.pdf, 2003.
[8] Martin Bernauer, Gerti Kappel, and Gerhard Kramler. A UML profile for XML Schema.
Technical report, Business Informatics Group and Vienna University of Technology, 2003.
130
BIBLIOGRAPHY 131
[9] T. Biggerstaff, B. Mitbander, and D. Webster. The concept assignment problem in pro-
gram understanding. InProceedings of the 15th International Conference on Software
Engineering (ICSE), pages 482–498, Baltimore, Maryland, USA, May 1993.
[10] Bison. The YACC-compatible parser generator.http://dinosaur.compilertools.net/#bison,
2006.
[11] G. Booch, M. Christerson, M. Fuchs, and J. Koistinen. UML for XML Schema mapping
specification.Rational White Paper, December 1999.
[12] B. Borges, K. Holley, and A. Arsanjani. Delving into service-oriented architecture.
http://www.developer.com/java/ent/article.php/3409221, 2006.
[13] Marcus A. S. Boxall and Saeed Araban. Interface metrics for reusability analysis of components. In Proceedings of the Australian Software Engineering Conference (ASWEC), pages 40–51, April 2004.
[14] L. C. Briand, J. W. Daly, and J. K. Wust. A unified framework for coupling measurement in object-oriented systems. IEEE Transactions on Software Engineering, 25:91–121, January-February 1999.
[15] L. C. Briand, S. Morasca, and V. Basili. Measuring and assessing maintainability at the
end of high-level design. In Proceedings of the IEEE Conference on Software Maintenance
(ICSM), pages 74–81, Montreal, Canada, September 1993.
[16] A. Brown, S. Johnston, and K. Kelly. Using service-oriented architecture and component-
based development to build web service applications. Santa Clara, CA: Rational Software
Corporation, 2002.
[17] E. Burd and M. Munro. Evaluating the use of dominance trees for C and COBOL. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 401–410, September 1999.
[18] Gianluigi Caldiera and Victor R. Basili. Identifying and qualifying reusable software components. IEEE Computer, 24:61–70, February 1991.
[19] David Carlson. Modeling XML Applications with UML: Practical e-Business Applications.
Addison Wesley Professional, 2001.
[20] Alexander Chatzigeorgiou and George Stephanides. Entropy as a measure of object-
oriented design quality. In Proceedings of the Balkan Conference in Informatics (BCI),
pages 565–573, November 2003.
[21] K. Chen and V. Rajlich. Case study of feature location using dependence graph. In Proceedings of the 8th International Workshop on Program Comprehension (IWPC), pages 241–249, Limerick, Ireland, June 2000.
[22] S. R. Chidamber and C. F. Kemerer. Towards a metrics suite for object oriented design.
In Proceedings of the Conference on Object-Oriented Programming: Systems, Languages
and Applications (OOPSLA), SIGPLAN Notices 26(11), November 1991.
[23] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE
Transactions on Software Engineering, 20:476–493, June 1994.
[24] Y. Chiricota, F. Jourdan, and G. Melancon. Software components capture using graph
clustering. In Proceedings of the International Workshop on Program Comprehension
(IWPC), pages 217–226, May 2003.
[25] E. Cho, M. Kim, and S. Kim. Component metrics to measure component quality. In Proceedings of the 8th Asia-Pacific Software Engineering Conference (APSEC), pages 419–426, Macau SAR, China, December 2001.
[26] D. Cimitile and G. Visaggio. Software salvaging and call dominance tree. Journal of Systems and Software, 28:117–127, February 1992.
[27] R. Conrad, D. Scheffner, and J. C. Freytag. XML conceptual modeling using UML. In
Proceedings of the 19th International Conference on Conceptual Modeling, pages 558–
571, Salt Lake City, Utah, USA, October 2000.
[28] J. Daly, A. Brooks, J. Miller, J. Topber, and M. Wood. The effect of inheritance depth
on the maintainability of object-oriented software. Empirical Software Engineering: An
International Journal, 1:751–761, February 1996.
[29] J. Eder, G. Kappel, and M. Schrefl. Coupling and cohesion in object-oriented systems.
Technical report, University of Klagenfurt, 1994.
[30] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon. Locating features in source code.
IEEE Transactions on Software Engineering, 29(3):210–224, March 2003.
[31] L. H. Etzkorn and C. G. Davis. Automatically identifying reusable OO legacy code. Computer, 30:66–71, October 1997.
[32] R. Fanta and V. Rajlich. Reengineering object-oriented code. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 238–246, Bethesda, Maryland, March 1998.
[33] Flex. A fast scanner generator. http://dinosaur.compilertools.net/#flex, 2006.
[34] P. Fremantle, S. Weerawarana, and R. Khalaf. Enterprise services. Communications of the
ACM, 45(10):77–80, 2002.
[35] G. C. Gannod, S. V. Mudiam, and T. E. Lindquist. An architectural-based approach for synthesizing and integrating adapters for legacy software. In Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE), pages 128–139, Brisbane, Australia, November 2000.
[36] Jean-François Girard and Rainer Koschke. Finding components in a hierarchy of modules: a step towards architectural understanding. In Proceedings of the 13th International Conference on Software Maintenance (ICSM), pages 58–65, Bari, Italy, October 1997.
[37] U. Gleich and T. Kohler. Tool-support for reengineering of object-oriented systems. In
Proceedings of ESEC-FSE/Workshop on Object-Oriented Reengineering, pages 43–51,
Zurich, Switzerland, September 1997.
[38] W. G. Griswold, J. J. Yuan, and Y. Kato. Exploiting the map metaphor in a tool for software
evolution. In Proceedings of the 23rd International Conference on Software Engineering
(ICSE), pages 265–274, Toronto, Canada, May 2001.
[39] CGI Group. Component mining: An approach for identifying reusable components from
legacy systems. http://www.cgi.com/cgi/pdf/cgiwhpr 07 mining e.pdf, 2004.
[40] W3C Working Group. Web service architecture. http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, 2006.
[41] Yann-Gaël Guéhéneuc and Hervé Albin-Amiot. Recovering binary class relationships: Putting icing on the UML cake. In Proceedings of the 19th Annual ACM Conference on
Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages
301–314, Vancouver, Canada, October 2004.
[42] George Yanbing Guo, Joanne M. Atlee, and Rick Kazman. A software architecture reconstruction method. In Proceedings of the 1st Working IFIP Conference on Software Architecture, pages 225–243, San Antonio, TX, USA, February 1999.
[43] D. Hutchens and V. Basili. System structure analysis: Clustering with data bindings. IEEE
Transactions on Software Engineering, 11(8):749–757, August 1985.
[44] JavaCC. Java compiler compiler. https://javacc.dev.java.net/, 2006.
[45] Jess. A rule engine for the Java platform. http://www.jessrules.com/jess/index.shtml, 2005.
[46] Jetty. A Java HTTP server and servlet container. http://jetty.mortbay.org/jetty/index.html, 2006.
[47] Jini. Jini network technology. http://www.sun.com/software/jini/, 2006.
[48] Rick Kazman and S. Jeromy Carriere. View extraction and view fusion in architectural
understanding. In Proceedings of the 5th International Conference on Software Reuse,
pages 290–299, Victoria, BC, Canada, May 1998.
[49] Wing Lam and Venky Shankararaman. An enterprise integration methodology. IT Professional, 6(2):40–49, 2004.
[50] Lex. A lexical analyzer generator. http://dinosaur.compilertools.net/#lex, 2006.
[51] Shimin Li and Ladan Tahvildari. JComp: A reuse-driven componentization framework for Java applications. In Proceedings of the International Conference on Program Comprehension (ICPC), pages 264–267, Athens, Greece, June 2006.
[52] Shimin Li and Ladan Tahvildari. A service-oriented componentization framework for Java software systems. In Proceedings of the 13th IEEE Working Conference on Reverse Engineering (WCRE), Benevento, Italy, October 2006.
[53] Jing Luo, Renkuan Jiang, Lu Zhang, Hong Mei, and Jiasu Sun. An experimental study of two graph analysis based component capture methods for object-oriented systems. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 217–226, May 2003.
[54] S. Mancoridis, B. Mitchell, Y. Chen, and E. R. Gansner. Bunch: A clustering tool for the recovery and maintenance of software system structures. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 50–62, Oxford, UK, August 1999.
[55] S. Mancoridis, B. Mitchell, C. Rorres, and Y. Chen. Using automatic clustering to produce
high-level system organizations of source code. In Proceedings of International Workshop
on Program Comprehension (IWPC), pages 45–53, Ischia, Italy, June 1998.
[56] M. Marin, A. Deursen, and L. Moonen. Identifying aspects using fan-in analysis. In
Proceedings of the 11th Working Conference on Reverse Engineering (WCRE), pages 132–
141, Delft University of Technology, Netherlands, November 2004.
[57] J. Martin and H. A. Muller. C to Java migration experiences. In Proceedings of the
6th European Conference on Software Maintenance and Reengineering, pages 143–153,
Budapest, Hungary, March 2003.
[58] Steve McConnell. Code Complete. Microsoft Press, Redmond, Washington, USA, 1993.
[59] Alok Mehta and George T. Heineman. Evolving legacy systems features using regression test cases and components. In the 4th International Workshop on Principles of Software Evolution (IWPSE), pages 190–193, Vienna, Austria, September 2001.
[60] Alok Mehta and George T. Heineman. Evolving legacy system features into fine-grained
components. In the 24th International Conference on Software Engineering (ICSE), pages
417–427, Buenos Aires, Argentina, May 2002.
[61] Robert Morgan. Building an Optimizing Compiler. Butterworth-Heinemann, Boston, Massachusetts, 1998.
[62] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, San Francisco, California, 1997.
[63] H. Muller, M. Orgun, S. Tilley, and J. Uhl. A reverse engineering approach to subsystem
structure identification. Journal of Software Maintenance: Research and Practice, 5:181–
204, 1993.
[64] H. Muller and J. Uhl. Composing subsystem structures using (k,2)-partite graphs. In
Proceedings of International Conference on Software Maintenance (ICSM), pages 12–19,
San Diego, November 1990.
[65] OMG. UML 2.0 Superstructure Specification. Object Management Group, Framingham,
Massachusetts, USA, October 2004.
[66] Margaretha W. Price and Steven A. Demurjian. Analyzing and measuring reusability in
object-oriented design. In Proceedings of the 12th ACM SIGPLAN Conference on Object-
Oriented Programming, Systems, Languages, and Applications, pages 22–33, Atlanta,
Georgia, United States, October 1997.
[67] W. Provost. UML for W3C XML Schema design.
http://www.xml.com/pub/a/2002/08/07/wx-suml.html, 2006.
[68] RCP. Rich Client Platform. www.eclipse.org/rcp, 2005.
[69] M. P. Robillard and G. C. Murphy. Concern graphs: Finding and describing concerns using
structural program dependencies. In Proceedings of the 24th International Conference on
Software Engineering (ICSE), pages 406–416, Buenos Aires, Argentina, May 2002.
[70] O. P. Rotaru and M. Dobre. Reusability metrics for software components. In Proceedings
of the 3rd International Conference on Computer Systems and Applications (AICCSA),
pages 24–32, Cairo, Egypt, January 2005.
[71] N. Routledge, L. Bird, and A. Goodchild. UML and XML Schema. In Proceedings of
the 13th Australian Database Conference (ADC), pages 274–281, Melbourne, Australia,
February 2002.
[72] SDMetrics. SDMetrics User Manual. http://www.sdmetrics.com/manual/LOMetrics.html,
2006.
[73] Subhash Sharma. Applied Multivariate Techniques. John Wiley, 1996.
[74] S. C. Shaw, M. Goldstein, M. Munro, and E. Burd. Moral dominance relations for program
comprehension. IEEE Transactions on Software Engineering, 29:851–863, September
2003.
[75] Suk Kyung Shin and Soo Dong Kim. A method to transform object-oriented design into component-based design using Object-Z. In Proceedings of the International Conference on Software Engineering Research, Management and Applications (SERA), pages 274–281, August 2005.
[76] A. Shokoufandeh, S. Mancoridis, and M. Maycock. Applying spectral methods to software
clustering. In Proceedings of the Working Conference on Reverse Engineering (WCRE),
pages 3–10, November 2002.
[77] H. M. Sneed. Encapsulating legacy software for use in client/server systems. In Proceedings of the Working Conference on Reverse Engineering (WCRE), pages 104–119, November 1996.
[78] G. Snider. Measuring the entropy of large software systems. HP Technical Report HPL-
2001-221, 2001.
[79] T. A. Standish. An essay on software reuse. IEEE Transactions on Software Engineering,
10:494–497, September 1984.
[80] Ladan Tahvildari. Quality-Driven Object-Oriented Re-engineering Framework. PhD Thesis, Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada, August 2003.
[81] Ladan Tahvildari. Testing challenges in adoption of component-based software. In Proceedings of the ICSE Workshop on Adoption-Centric Software Engineering (ACSE), pages 21–25, Edinburgh, Scotland, May 2004.
[82] Ladan Tahvildari and Kostas Kontogiannis. Improving design quality using meta-pattern transformations: A metric-based approach. Journal of Software Maintenance and Evolution: Research and Practice (JSME), 16(4), 2003.
[83] Ladan Tahvildari and Kostas Kontogiannis. Develop a multi-objective decision approach for selecting source-code improving transformations. In Proceedings of the 20th International Conference on Software Maintenance (ICSM), pages 427–431, Chicago, Illinois, USA, September 2004.
[84] Ladan Tahvildari and Kostas Kontogiannis. Quality-driven object-oriented code restructuring. In Proceedings of the ICSE Workshop on Software Quality, pages 47–52, Edinburgh, Scotland, May 2004.
[85] Ladan Tahvildari and Kostas Kontogiannis. Requirements driven software evolution. In
Proceedings of the 12th IEEE International Workshop on Program Comprehension (IWPC),
pages 258–269, Bari, Italy, June 2004.
[86] Ladan Tahvildari, Kostas Kontogiannis, and John Mylopoulos. Quality-driven software re-engineering. Journal of Systems and Software (JSS), Special Issue on: Software Architecture - Engineering Quality Attributes, 66(3):225–239, June 2003.
[87] P. Tonella, G. Antoniol, R. Fiutem, and E. Merlo. Points-to analysis for program understanding. In Proceedings of the 5th International Workshop on Program Comprehension (IWPC), pages 90–99, May 1997.
[88] W3C. XML Schema Part I: Structures second edition. http://www.w3.org/TR/xmlschema-1/, 2006.
[89] Ju An Wang. Towards component-based software engineering. Computing Sciences in
Colleges, 16:177–189, October 2000.
[90] H. Washizaki and Y. Fukazawa. A technique for automatic component extraction from
object-oriented programs by refactoring. Science of Computer Programming, 56:99–116,
April 2005.
[91] H. Washizaki, H. Yamamoto, and Y. Fukazawa. A metrics suite for measuring reusability
of software components. In Proceedings of the International Software Metrics Symposium (METRICS), pages 211–223, September 2003.
[92] N. Wilde, M. Buckellew, H. Page, and V. Rajlich. A case study of feature location in unstructured legacy Fortran code. In Proceedings of the 5th European Conference on Software Maintenance and Reengineering (CSMR), pages 68–75, Lisbon, Portugal, March 2001.
[93] N. Wilde and M.C. Scully. Software reconnaissance: Mapping program features to code.
Journal of Software Maintenance: Research and Practice, 7:49–62, January 1995.
[94] W. E. Wong, S. S. Gokhale, and J. R. Hogan. Quantifying the closeness between program
components and features. Journal of Systems and Software, 54(2):87–98, October 2000.
[95] W. E. Wong, S. S. Gokhale, J. R. Hogan, and K. S. Trivedi. Locating program features using execution slices. In Proceedings of the IEEE Symposium on Application-Specific Systems and Software Engineering and Technology, pages 194–203, Richardson, Texas, USA, March 1999.
[96] W. Eric Wong and J. Jenny Li. Redesigning legacy systems into the object-oriented paradigm. In Proceedings of the International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), Hakodate, Hokkaido, Japan, May 2003.
[97] X. Xu, C. H. Lung, M. Zaman, and A. Srinivasan. Program restructure through clustering technique. In Proceedings of the International Workshop on Source Code Analysis and Manipulation (SCAM), pages 75–84, September 2004.
[98] Yacc. Yet another compiler-compiler. http://dinosaur.compilertools.net/#yacc, 2006.
[99] Zhuopeng Zhang, Ruimin Liu, and Hongji Yang. Service identification and packaging
in service oriented reengineering. In Proceedings of the 7th International Conference
on Software Engineering and Knowledge Engineering (SEKE), pages 241–249, Taipei,
Taiwan, China, July 2005.
[100] Wei Zhao, Lu Zhang, Yin Liu, Jiasu Sun, and Fuqing Yang. SNIAFL: Towards a static non-interactive approach to feature location. In Proceedings of the 26th International Conference on Software Engineering (ICSE), pages 293–303, Scotland, UK, May 2004.
[101] Ying Zou and Kostas Kontogiannis. Towards a web-centric legacy system migration. In
Proceedings of ICSE Workshop on Net-Centric Computing (NCC), May 2001.