A Service-Oriented Componentization Framework for Java Software Systems
by
Shimin Li
A thesis
presented to the University of Waterloo
in fulfilment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2006
© Shimin Li 2006
I hereby declare that I am the sole author of this thesis.
I authorize the University of Waterloo to lend this thesis to other institutions or individuals
for the purpose of scholarly research.
Shimin Li
I further authorize the University of Waterloo to reproduce this thesis by photocopying or by
other means, in total or in part, at the request of other institutions or individuals for the purpose
of scholarly research.
Shimin Li
Abstract
Service-oriented computing has dramatically changed the way in which we develop software
systems. In the fast-growing global market for services, providing competitive services is
critical for the success of businesses and organizations. Since many competitive services
have already been implemented in existing systems, leveraging the value of an existing
system by exposing all or parts of it as services within a service-oriented environment has be-
come a major concern in today’s industry. In this work, we categorize services embedded in a
system into two categories: i) top-level services, which are not used by any other service but may
contain a hierarchy of low-level services that further describe and modularize the service, and
ii) low-level services, which lie underneath a top-level service and may be agglomerated with other
low-level services to yield a new service with a higher level of granularity. To meet the de-
mand of identifying and reusing the business services embedded in an existing software system,
we present a novel service-oriented componentization framework that automatically supports:
i) identifying critical business services embedded in an existing Java system by utilizing graph
representations of the system models, ii) realizing each identified service as a self-contained com-
ponent that can be deployed as a single unit, and iii) transforming the object-oriented design into
a service-oriented architecture. A toolkit implementing our framework has been developed as an
Eclipse Rich Client Platform (RCP) application. Our initial evaluation has shown that the pro-
posed framework is effective in identifying services from an object-oriented design and migrating
it to a service-oriented architecture.
Acknowledgments
First and foremost, I am deeply indebted to my supervisor, Professor Ladan Tahvildari, for
her patient academic (and personal) guidance over the years. Her passion for doing and
communicating innovative and creative science has been, and always will be, a great source of
inspiration. I feel very privileged to have worked with her.
I wish to thank the members of my dissertation committee: Professor Kostas Kontogiannis
and Professor Sagar Naik, for having accepted to take the time out of their busy schedule to read
my thesis and provide me invaluable comments and inspiring remarks.
I would like to thank all members of the Software Technologies and Applied Research (STAR)
group for their tremendous support and cooperation.
I want to thank my parents who have been extremely understanding and supportive of my
studies. I want to thank my wonderful wife, Wei, who has encouraged me so much over the
years. I also want to thank my lovely son, Zihan, for letting Dad work on his dissertation when
he needed to do so. I feel very lucky to have a family that shares my enthusiasm for academic
pursuits.
Contents
1 Introduction 1
1.1 Problem Description . . . 3
1.2 Thesis Contribution . . . 6
1.3 Thesis Organization . . . 6
2 Related Work 8
2.1 Program Comprehension . . . 8
2.1.1 Feature Locating . . . 9
2.1.2 Software Clustering . . . 12
2.2 Program Migration . . . 13
2.2.1 Migrating Procedural Legacy Systems to Object-Oriented Paradigm . . . 13
2.2.2 Re-Engineering Existing Object-Oriented Systems . . . 15
2.3 Architecture Recovery . . . 17
2.4 Software Reuse . . . 19
2.4.1 Identification of Reusable Components in Source Code . . . 19
2.4.2 Creation of Services from Legacy Systems . . . 21
2.5 Summary . . . 22
3 Service-Oriented Componentization Framework 23
3.1 Framework Overview . . . 24
3.2 Architecture Recovery . . . 25
3.3 Service Identification . . . 26
3.4 Component Generation . . . 27
3.5 System Transformation . . . 28
3.6 Summary . . . 29
4 Architecture Recovery 30
4.1 XML Schema Representation . . . 31
4.1.1 UML Profile for XML Schemas . . . 31
4.1.2 Representing XML Schemas in UML . . . 32
4.2 Modeling Source Code . . . 33
4.2.1 Approach . . . 34
4.2.2 Source Code Models . . . 35
4.3 Modeling Architecture . . . 38
4.3.1 Definitions of Class Relationships . . . 39
4.3.2 Approach . . . 45
4.3.3 Class/Interface Relationship Graph . . . 47
4.3.4 Class/Interface Dependency Graph . . . 49
4.3.5 An Example: Car Rental System . . . 51
4.4 Summary . . . 53
5 Service Identification 54
5.1 Service Representations . . . 54
5.2 Supporting Concepts . . . 56
5.2.1 Graph Techniques . . . 57
5.2.2 Dominance Analysis . . . 59
5.2.3 Modularization Quality Metric . . . 62
5.3 The Proposed Processes . . . 63
5.3.1 Top-Level Service Identification . . . 64
5.3.2 Low-Level Service Identification . . . 68
5.3.3 An Example: Car Rental System . . . 72
5.4 Summary . . . 79
6 Component Generation and System Transformation 80
6.1 Component Generation . . . 81
6.1.1 Approach . . . 81
6.1.2 An Example . . . 85
6.2 System Transformation . . . 89
6.2.1 Approach . . . 89
6.2.2 An Example . . . 91
6.3 Summary . . . 93
7 Empirical Studies 94
7.1 A Prototype for the SOC4J Framework . . . 95
7.1.1 Tool Integration Requirements . . . 95
7.1.2 JComp RCP Application . . . 97
7.2 Evaluation Criteria . . . 100
7.2.1 Component Reusability . . . 100
7.2.2 Architectural Improvement . . . 105
7.3 Case Study: Jetty . . . 106
7.3.1 Statistics of the Jetty . . . 107
7.3.2 Discussions on Obtained Results . . . 107
7.4 Case Study: Apache Ant . . . 113
7.4.1 Statistics of the Apache Ant . . . 113
7.4.2 Discussions on Obtained Results . . . 114
7.5 Summary . . . 118
8 Future Directions and Conclusions 119
8.1 Contributions . . . 119
8.2 Future Work . . . 121
8.3 Conclusions . . . 122
A Top-Level Services of Jetty 123
B Top-Level Services of Apache Ant 125
List of Tables
4.1 The Metric Suite at Class Level . . . 46
7.1 Statistics of the Jetty . . . 107
7.2 Top-Level Services Identified from Jetty . . . 109
7.3 Low-Level Services Identified in Top-Level Service Win32 Server . . . 111
7.4 Some Time and Space Statistics of the SOC4J Framework on the Case Study: Jetty . . . 113
7.5 Statistics of the Apache Ant . . . 114
7.6 Selected Top-Level Services Identified from Apache Ant . . . 114
7.7 Low-Level Services Identified in Top-Level Service WAR File Creation . . . 115
7.8 Some Time and Space Statistics of the SOC4J Framework on the Case Study: Apache Ant . . . 118
A.1 Top-Level Services of Jetty (1) . . . 123
A.2 Top-Level Services of Jetty (2) . . . 124
B.1 Top-Level Services of Apache Ant (1) . . . 125
B.2 Top-Level Services of Apache Ant (2) . . . 126
B.3 Top-Level Services of Apache Ant (3) . . . 127
B.4 Top-Level Services of Apache Ant (4) . . . 128
B.5 Top-Level Services of Apache Ant (5) . . . 129
List of Figures
2.1 The Conceptual Model of Eisenbarth’s Approach . . . 11
2.2 The Block Diagram of the Quality-Based Re-engineering Process . . . 16
2.3 The Dali Workbench . . . 17
3.1 The Architecture of the Service-Oriented Componentization Framework . . . 24
4.1 The Approach for Source Code Modeling . . . 34
4.2 The Meta-Model for Java Package Models . . . 35
4.3 The Meta-Model for Java Source File Models . . . 36
4.4 The Meta-Model for Java Class/Interface Models . . . 37
4.5 The Meta-Model for Java Method/Constructor Models . . . 38
4.6 The Approach for Architecture Modeling . . . 45
4.7 The UML Representation of XML Schema for Nodes in the CIRG . . . 48
4.8 The UML Representation of XML Schema for Nodes in the CIDG . . . 50
4.9 The CIRG of the Car Rental System (CRS) . . . 51
4.10 The CIDG of the Car Rental System (CRS) . . . 52
5.1 The UML Representation of XML Schema for a Service . . . 56
5.2 An Example of a Directed Graph . . . 58
5.3 (a) A Connected Component of the Directed Graph G in Figure 5.2. (b) The Other Connected Component of G. (c) The Only Strongly Connected Component of G. (d) A Rooted Component of Graph (a). (e) The Other Rooted Component of Graph (a) . . . 59
5.4 (a) A Simple Directed Graph. (b) The Dominance Tree Corresponding to the Graph in (a). (c) All Two Maximal Consolidation Subtrees of the Dominance Tree in (b) . . . 60
5.5 Processes in the Service Identification Stage . . . 63
5.6 The MCIDGs of the Car Rental System . . . 73
5.7 The SHG of the Top-Level Service VehicleBooking . . . 74
5.8 The Result SHG of Performing the SHG Transformation on the Original SHG of the Top-Level Service VehicleBooking in the CRS System . . . 75
5.9 The Service Dominance Tree of the SHG in Figure 5.8 . . . 76
5.10 The Reduced Dominance Tree of the Service Dominance Tree in Figure 5.9 . . . 77
5.11 The SHG Reconstructed from the Reduced Service Dominance Tree in Figure 5.10 . . . 78
6.1 The UML Representation of XML Schema for a Component . . . 83
6.2 The UML Class Diagrams of Customer and Person in the CRS System . . . 86
6.3 Part of the UML Class Diagram of the Component Customer . . . 88
6.4 The Meta-Model for the Component-Based Target System . . . 90
6.5 The Service Hierarchy Graphs of the CRS System . . . 92
6.6 The Component Hierarchy Graphs of the CRS System . . . 92
6.7 The Component-Based Car Rental System . . . 93
7.1 The Tool Interconnection for the SOC4J Framework . . . 96
7.2 The Architecture of the JComp Java Componentization Kit . . . 98
7.3 A Snapshot of the JComp Java Componentization Kit . . . 99
7.4 The Component Reusability Model . . . 104
7.5 The Accepted Service View of the Extractor Plug-in . . . 108
7.6 Iterations of the Service Aggregation Process of Top-Level Service Win32 Server . . . 110
7.7 The CHG of Top-Level Component Win32 Server of the Jetty . . . 111
7.8 The Reusability of Components Extracted from Jetty . . . 112
7.9 The CHG of Top-Level Component WAR File Creation of the Apache Ant . . . 116
7.10 The Reusability of Components Extracted from the Apache Ant . . . 117
Chapter 1
Introduction
Billions of dollars are spent each year on computer software. Much of this effort is spent on cre-
ating and testing new source code. To save money, increase productivity, and improve quality and
reliability, academic and industrial institutions have put a lot of effort into reusing existing soft-
ware. The arrival of new software technology creates the need to leverage existing software assets
in order to take advantage of the new technology, but implementing business-critical applications
whenever a new technology arrives is impossible due to the time and resources required. The
only option then is software re-engineering. Examples of new software technologies that have
created a large demand and market for legacy-system re-engineering, wrapping, and evolution
methods include distributed object technology, component technology, the World Wide Web
(WWW), and XML.
Service-oriented computing has the potential to drastically change the way we develop software.
As global markets for services create the potential for reuse at a much greater scale, providing
competitive services to these markets will be critical to realizing this vision as a whole, as well
as to the success of individual businesses. However, much of what would make up competitive
services is already implemented in existing systems. The challenge then is how to
transform the functionality of existing legacy systems fully or partially into services. Identifying,
extracting and re-engineering software components that implement abstractions within existing
systems is a promising cost-effective way to create reusable assets and re-engineer existing soft-
ware systems.
Today, more and more organizations are migrating to service-oriented architectures (SOA) to
achieve net-centric operations. This offers the potential of leveraging legacy systems by exposing
some parts of the system as services within the SOA. However, there is often a lack of effective
engineering approaches for identifying, describing, modeling, and realizing services embedded
in existing software systems. The core of an SOA is a service which is a coarse-grained, dis-
coverable, and self-contained software entity that interacts with applications and other services
through a loosely coupled, often asynchronous, message-based communication model.
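To make this notion concrete, the service definition above can be sketched as a minimal Java interface. All names here (PaymentService, PaymentRequest, PaymentReceipt) are illustrative only, not taken from this thesis:

```java
// Illustrative sketch of a coarse-grained, self-contained service: callers
// exchange message-like request/response objects and never see the classes
// that implement the capability behind the interface.
final class PaymentRequest {
    final String account;
    final long amountCents;
    PaymentRequest(String account, long amountCents) {
        this.account = account;
        this.amountCents = amountCents;
    }
}

final class PaymentReceipt {
    final boolean approved;
    PaymentReceipt(boolean approved) { this.approved = approved; }
}

interface PaymentService {
    PaymentReceipt process(PaymentRequest request);
}

public class ServiceSketch {
    public static void main(String[] args) {
        // A trivial stand-in implementation, just to exercise the interface.
        PaymentService svc = req -> new PaymentReceipt(req.amountCents > 0);
        System.out.println(svc.process(new PaymentRequest("A-1", 500)).approved); // true
    }
}
```

The point of the sketch is the shape of the interaction: one coarse-grained, document-style request in, one result out, which is what distinguishes a service from the fine-grained method calls of an ordinary object-oriented design.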
The reuse of an existing software system requires a comprehensive framework to identify
and extract the critical business services embedded in it. A business service of a software
system is an abstract resource that represents a capability of performing tasks that form a
coherent functionality from the points of view of provider entities and requester entities [40].
Effective system reuse and evolution require both the “big picture” and the lower-level
dependencies between portions of the source code. The focal point of the proposed research is
to exploit the synergy between the areas of Program Comprehension [9, 21, 38, 69, 97, 100],
Architecture Recovery [43, 54, 55, 59, 60, 63, 64], Software Reuse [34, 35], and Program
Migration [57, 80–85, 96].
In this context, our goal is to develop a service-oriented componentization framework that
decomposes an existing object-oriented system to re-modularize the existing assets to support
service functionality. More specifically, the proposed framework should automatically support:
i) identifying critical business services embedded in an existing Java system, ii) realizing each
identified service as a self-contained component, and iii) transforming the object-oriented design
into a service-oriented architecture. To be of practical use, such a re-engineering environment
should be generic in the sense of being able to support different object-oriented existing systems.
In other words, it must be built upon a meta-model of existing object-oriented systems rather than
upon a particular existing system. This avoids the cost of developing a dedicated evolution
environment for each target system. Hence, the environment should be configurable
with a model of the target existing system which parameterizes the evolution environment with
the existing system to be evolved and serves as a basis for specifying the components to be
created.
This research addresses a problem that has challenged the research community for several
years, namely the asset reuse of existing object-oriented systems. It also devises a framework
in which reuse and evolution activities do not occur in a vacuum but can be monitored and
fine-tuned by the user to address specific quality requirements for the extracted components and
the evolved target system, such as component granularity, component reusability, and system
maintainability.
1.1 Problem Description
An effective way of leveraging the value of existing systems is to expose their functionalities as
reusable components to a larger number of clients through well-defined component interfaces.
Each component encapsulates a business service, such as processing a payment, converting
currency, or computing an insurance quotation. In general, we have found that the code of
existing systems represents a set of components with significant reuse potential. However, be-
cause the existing system does not have sufficient architecture or other high level documentation,
it is difficult to understand both the “big picture” and the lower-level dependencies between
portions of the code. From the implementation point of view, the challenge consists of two phases:
• Reverse Engineering: Identifying and extracting the top-level functions of an existing soft-
ware system, and providing service descriptions for these identified functions.
• Forward Engineering: Performing any necessary transformations to migrate the mono-
lithic architecture of the existing systems to a more flexible service-oriented architecture.
In this thesis, we are interested in the reverse engineering challenge. Service identification is
complicated by the usual obstacles of having to deal with potentially large and poorly structured
existing systems. Identifying these service candidates for packaging as reusable components
would require analysis of massive amounts of legacy code or at least graphic representations of
the code. Additionally, it would require intervention of people with background in the business
domain to judge what functions are likely to make reusable services.
The identification of functions suitable for exposure as services can be seen as an instance of
a more generic problem of functional decomposition of existing systems. Here we are required
to abstract the code, or an alternative code representation (e.g., XML or graphs) to higher-level
representations that describe the system architecture in terms of its functional units. Moreover,
access points to these functional units would need to be identified as well.
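As a simplified illustration of such a graph representation (a sketch only, not the CIRG/CIDG format this thesis defines later; the class names are hypothetical), a class-level dependency graph can be kept as an adjacency map, and classes that no other class depends on are natural candidates for the access points just mentioned:

```java
import java.util.*;

// Hypothetical sketch of a class-level dependency graph: nodes are class
// names, and an edge points from a class to a class it uses.
public class DependencyGraph {
    final Map<String, Set<String>> uses = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        uses.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        uses.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    // Classes no other class depends on: candidate access points to
    // functional units of the system.
    Set<String> roots() {
        Set<String> targets = new HashSet<>();
        uses.values().forEach(targets::addAll);
        Set<String> r = new LinkedHashSet<>(uses.keySet());
        r.removeAll(targets);
        return r;
    }

    public static void main(String[] args) {
        DependencyGraph g = new DependencyGraph();
        g.addEdge("BookingService", "Vehicle");   // hypothetical classes
        g.addEdge("BookingService", "Customer");
        g.addEdge("Customer", "Person");
        System.out.println(g.roots()); // prints [BookingService]
    }
}
```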
To reuse the identified services and migrate the existing system’s implementation into a
component-based architecture, it might be necessary to package the identified services into well-
documented and self-contained components, during the forward engineering phase. If service
packaging is required, then it needs techniques for automatically extracting the relevant procedu-
ral elements from existing systems, and creating an interface for components.
Furthermore, a formal description needs to be developed for each service. Service descrip-
tions should document possible dependencies between service invocations, beside syntactic in-
formation on the number and types of parameters. Such descriptions are crucial for developers
to implement applications based on the services extracted and should therefore be presented in a
way they can understand.
We seek a combination of solutions from three different domains in order to tackle the service
identification, service modeling, and service packaging problems:
• Source Code Analysis and Reverse Engineering Technology. We aim to create a frame-
work that has methodological and technological steps to recover higher-level design and
architecture representations of existing software systems based on the source code artifacts.
This includes creation of a suitable representation of design and architectural models that
reflect the functional decomposition of the system. To distinguish these models from each
other, design models are more detailed and refer to different parts of a system, whereas
architectural models are more abstract and refer to the system as a whole. Our starting
point for this line of work is exploring the existing body of work on architecture recovery
and reconstruction, as well as software clustering in searching for suitable algorithms and
ideas.
• UML Technology. Models in the UML will provide a high-level representation of anal-
ysis results and service descriptions that is understandable for both software developers
and business experts. As a universal language, the UML provides standard notations for
almost all aspects of a system. Structural features like data types, operation signatures, and
architectures are captured by class and component diagrams. System behavior, including
scenarios, processes, and protocols are captured by sequence or activity diagrams as well
as state-charts. We use component diagrams to provide a high-level overview of the pro-
posed services, components and their interfaces. Based on this representation, the users
can validate the proposed services.
• Graph Transformation Technology. We utilize the graph transformation technology to
implement mappings between different graphical representations of programs and models.
The strength of the approach lies in the fact that model transformations can be expressed
graphically, based on Meta-Object Facility (MOF) models for the source and target models.
Also in the service identification phase, the graph transformation technology can be used
to agglomerate services.
1.2 Thesis Contribution
This thesis aims to design a framework that helps to reuse the assets of existing systems and
migrate their object-oriented design to service-oriented architectures. This deals with the
long-standing problem of reusing and evolving existing object-oriented systems in the following ways:
• By designing and implementing comprehensive graphic representations of an object-oriented
system at different levels of abstraction.
• By exploring an incremental program comprehension approach, including describing an
object-oriented software system using different concurrent views, each of which addresses
a specific set of concerns of the system.
• By designing and implementing an efficient and effective methodology for identifying and
realizing critical business services embedded in an existing object-oriented system.
• By designing and implementing an object-oriented restructuring methodology that trans-
forms the typically monolithic architectures of existing systems to more flexible service-
oriented architectures.
• By designing and implementing a prototype system that supports the identification and
realization of critical business services embedded in a Java software system and the
componentization of the Java system.
1.3 Thesis Organization
This thesis is organized as follows:
• Chapter 2 reviews the related work, with the aim of putting this thesis in context. It covers
four research areas that form the foundation of this thesis: Program Comprehension,
Architecture Recovery, Software Reuse, and Program Migration.
• Chapter 3 gives an overview of the service-oriented componentization framework for Java
software systems. This framework uses graph representations of an existing object-oriented
software system and graph transformations to identify business services embedded in the
system. Furthermore, the framework realizes each identified service into a self-contained
component and transforms the object-oriented design into a service-oriented architecture.
The proposed framework is composed of four stages: Architecture Recovery, Service
Identification, Component Generation, and System Transformation.
• Chapter 4 discusses reverse engineering techniques used within the architecture recovery
stage to build source code models and architectural models of an existing object-oriented
software system.
• Chapter 5 presents the service identification strategy and algorithm that are used within the
service identification stage to identify critical business services embedded in an existing
object-oriented system.
• Chapter 6 discusses the processes within the component generation stage and the system
transformation stage. It covers the service packaging technique and architecture recon-
struction technique.
• Chapter 7 shows the application of the proposed service-oriented componentization frame-
work on some real world Java projects. The prototype of the framework and framework
evaluation criteria will be introduced. Case studies will be explained and the results will be
discussed.
• Chapter 8 presents the conclusions of this research work and discusses possible directions
future research might take.
• Appendices A and B list and describe the business services identified in the case studies.
Chapter 2
Related Work
In this chapter we review the related work, with the aim of putting this thesis in context. We
survey four research areas that form the foundation of this thesis, namely Program Comprehension,
Program Migration, Architecture Recovery, and Software Reuse. The Program Comprehension
section outlines approaches for locating features in source code and techniques for software
clustering. The Architecture Recovery section presents the technologies used in the software ar-
chitecture recovery domain. The Software Reuse section reviews the techniques for identifying
reusable components in source code and creating services from legacy systems. The Program
Migration section discusses current methodologies for migrating procedural legacy systems to
the object-oriented paradigm and re-engineering existing object-oriented systems. Finally, the
last section summarizes the material presented in this chapter.
2.1 Program Comprehension
The identification of potentially reusable services embedded in an existing system requires an
understanding of the functionality of each part of the system. Program understanding, or analysis
in general includes any activity that uses dynamic or static methods to reveal the properties of
existing systems. It most commonly refers to an examination of source code, without the use
of any specification or execution information. There are two main subjects related to our work:
Feature Locating and Software Clustering.
2.1.1 Feature Locating
A feature is a realized functional requirement of a system [30]. Generally, the term feature also
subsumes non-functional requirements. In the context of this research, only functional features
are relevant; that is, we consider a feature to be an observable behavior of the system that can be
triggered by the user.
Understanding the implementation of a certain feature of a system requires identification of
the computational units of the system that contribute to this feature. In many cases, the map-
ping of features to the source code is poorly documented. Wilde et al. [93] were pioneers in
locating features by taking a fully dynamic approach. The goal of their Software Reconnaissance
is the support of maintenance programmers when they modify or extend the functionality of a
legacy system. Based on the execution of test cases for a particular feature f, several sets of
computational units are identified:
• computational units commonly involved (code executed in all test cases, regardless of f),
• computational units potentially involved in f (code executed in at least one test case that
invokes f),
• computational units indispensably involved in f (code that is executed in all test cases that
invoke f), and
• computational units uniquely involved in f (code executed exactly in those cases where f is
invoked).
A computational unit is an executable part of a system. Examples of computational units are
instructions (like accesses to global variables), basic blocks, routines, classes, compilation units,
components, modules, or subsystems. Since the primary goal is the location of starting points for
further investigations, Wilde et al. focus on locating specific computational units rather than all
required computational units.
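The four sets above amount to plain set operations over per-test execution traces. As a sketch (the traces and routine names below are hypothetical, for illustration only, and each trace is taken to be the set of routines one test case executed):

```java
import java.util.*;

// Sketch of the set computations behind Software Reconnaissance.
// All trace data and routine names are hypothetical.
public class Reconnaissance {

    // Units executed in every trace of the given collection.
    static Set<String> intersectAll(Collection<Set<String>> traces) {
        Iterator<Set<String>> it = traces.iterator();
        Set<String> acc = new HashSet<>(it.next());
        while (it.hasNext()) acc.retainAll(it.next());
        return acc;
    }

    // Units executed in at least one trace.
    static Set<String> unionAll(Collection<Set<String>> traces) {
        Set<String> acc = new HashSet<>();
        for (Set<String> t : traces) acc.addAll(t);
        return acc;
    }

    public static void main(String[] args) {
        // Traces of test cases that invoke feature f ...
        List<Set<String>> withF = List.of(
            Set.of("init", "parse", "render", "log"),
            Set.of("init", "parse", "cache", "log"));
        // ... and traces of test cases that do not invoke f.
        List<Set<String>> withoutF = List.of(
            Set.of("init", "cache", "log"));

        List<Set<String>> all = new ArrayList<>(withF);
        all.addAll(withoutF);

        Set<String> common = intersectAll(all);          // in all test cases
        Set<String> potentially = unionAll(withF);       // in some test invoking f
        Set<String> indispensably = intersectAll(withF); // in all tests invoking f
        Set<String> uniquely = new HashSet<>(unionAll(withF));
        uniquely.removeAll(unionAll(withoutF));          // only when f is invoked

        System.out.println("common        = " + new TreeSet<>(common));        // [init, log]
        System.out.println("potentially   = " + new TreeSet<>(potentially));   // [cache, init, log, parse, render]
        System.out.println("indispensably = " + new TreeSet<>(indispensably)); // [init, log, parse]
        System.out.println("uniquely      = " + new TreeSet<>(uniquely));      // [parse, render]
    }
}
```

On this toy data, the uniquely involved routines (parse, render) are exactly the starting points a maintainer would examine first.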
Another approach, based on dynamic information, was presented by Wong et al. [95]. They
analyzed execution slices of test cases implementing a particular functionality. The process was
described as follows:
1. The invoking input set I (i.e., a set of test cases) that will invoke the feature is identified.
2. The excluding input set E that will not invoke the feature is identified.
3. The program is executed twice, using I and E separately.
4. By comparison of the two resulting execution slices, the computational units can be identi-
fied that implement the feature.
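The comparison in step 4 amounts to a set difference between the two execution slices. A minimal sketch, assuming each slice is represented as the set of covered computational units:

```java
import java.util.*;

// A minimal sketch of Wong et al.'s execution-slice comparison. The feature's
// implementation is approximated by the units present in the invoking slice
// but absent from the excluding slice.
public class SliceComparison {
    public static Set<String> featureUnits(Set<String> invokingSlice,
                                           Set<String> excludingSlice) {
        Set<String> result = new HashSet<>(invokingSlice);
        result.removeAll(excludingSlice);  // units exercised only when the feature runs
        return result;
    }
}
```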
In [94], Wong et al. presented a way to quantify features. Metrics are provided to compute
the dedication of computational units to features, the concentration of features in computational
units, and the disparity between features.
In [21], Chen and Rajlich proposed a semiautomatic method for feature location, in which
the programmer browses the statically derived Abstract System Dependency Graph (ASDG). The
ASDG describes detailed dependencies among routines, types, and variables at the level of global
declarations. The navigation on the ASDG is computer-aided, but the programmer carries out
the search for a feature’s implementation. The method takes advantage of the programmer’s
experience with the analyzed software. It is less suited to locating features for programmers without
prior knowledge of the system, who do not know where to start the search.
Eisenbarth et al. [30] presented a semiautomatic technique that reconstructs the mapping for
features that are triggered by the user and exhibit an observable behavior. The mapping is in
general not injective; that is, a computational unit may contribute to several features. Their tech-
nique allows for the distinction between general and specific computational units with respect to a
given set of features. For a set of features, it also identifies jointly and distinctly required compu-
tational units. The presented technique combines dynamic and static analysis to rapidly focus on
the system’s parts that relate to a specific set of features. Dynamic information is gathered based
on a set of scenarios invoking these features. Figure 2.1 illustrates the conceptual model used
by Eisenbarth et al. It describes the relationships among features, scenarios, and computational
units.
Figure 2.1: The Conceptual Model of Eisenbarth’s Approach.
In [92], Wilde and Rajlich compared two feature locating approaches, namely the Software
Reconnaissance technique and the Dependency Graph Search method. In the presented case
study, both techniques were effective in locating features. Software Reconnaissance proved
to be more suited to large, infrequently changed programs, whereas the Dependency Graph Search
method was found to be more effective if further changes are likely and require a deeper and more
complete understanding.
2.1.2 Software Clustering
Clustering techniques have been used in many disciplines to support the grouping of similar
objects of a system. Clustering analysis is a technique used for combining observations into
groups or clusters such that each group or cluster is homogeneous or compact with respect to
certain characteristics and each group should be different from other groups with respect to the
same characteristics [73]. Clustering analysis takes a set of objects and characteristics with no
apparent structure and imposes a structure upon them with respect to a characteristic. Its primary
objective is to facilitate better understanding of the observations and the subsequent construction
of complex knowledge structures from features and object clusters. Most clustering approaches
attempt to provide solutions for restructuring legacy systems.
Belady and Evangelisti introduced an approach that automatically clusters a software system
in order to reduce its complexity [6]. They also provided a measure for the complexity of a system
after it has been clustered. Their clustering approach was based on the information extracted from
the documentation of the system.
Muller et al. [63, 64] implemented several software clustering heuristics in the Rigi tool that
(i) measure the relative strength between interfaces, (ii) identify omnipresent modules, and (iii)
use the similarity between module names. They introduced the important principles of small
interfaces (the number of elements of a subsystem that interface with other subsystems should be
small compared to the total number of elements in the subsystem) and of few interfaces (a given
subsystem should interface only with a small number of the other subsystems).
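The small-interfaces principle can be checked mechanically. The sketch below computes the fraction of a subsystem's elements that touch the outside world; the representation is illustrative, not Rigi's actual implementation.

```java
import java.util.*;

// A rough sketch of a check for Muller et al.'s small-interfaces principle.
// A subsystem is modeled as a set of elements plus a dependency relation
// between elements; a low ratio indicates a small interface.
public class InterfacePrinciples {

    // Fraction of a subsystem's elements that depend on, or are depended on by,
    // elements outside the subsystem.
    public static double interfaceRatio(Set<String> subsystem,
                                        Map<String, Set<String>> deps) {
        int boundary = 0;
        for (String e : subsystem) {
            boolean external = false;
            for (String target : deps.getOrDefault(e, Collections.emptySet()))
                if (!subsystem.contains(target)) external = true;
            for (Map.Entry<String, Set<String>> d : deps.entrySet())
                if (!subsystem.contains(d.getKey()) && d.getValue().contains(e)) external = true;
            if (external) boundary++;
        }
        return subsystem.isEmpty() ? 0.0 : (double) boundary / subsystem.size();
    }
}
```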
Hutchens and Basili [43] developed an algorithm that clusters procedures into modules by
measuring the interaction between pairs of procedures. Their clustering technique was based on
data bindings. A data binding was defined as an interaction between two procedures based on
the location of variables that are within the static scope of both procedures. Based on the data
bindings, a hierarchy is constructed from which a partition can be derived. They compared their
structures with the developer’s mental model, with satisfactory results, and evaluated the stability
of the system, focusing on how the clustering changed when changes were made to the code.
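A simplified sketch of the underlying idea, assuming each procedure is summarized by the set of shared-scope variables it references; the binding strength between two procedures is then the size of the intersection:

```java
import java.util.*;

// A simplified sketch of Hutchens and Basili's data bindings. Two procedures
// are bound through each variable in the static scope of both that both
// reference; a hierarchy can be built by agglomerating the most strongly
// bound pairs first.
public class DataBindings {
    public static int bindingStrength(Set<String> varsP, Set<String> varsQ) {
        Set<String> shared = new HashSet<>(varsP);
        shared.retainAll(varsQ);  // variables referenced by both procedures
        return shared.size();
    }
}
```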
Mancoridis et al. [55] treated clustering as an optimization problem and used genetic algo-
rithms to overcome the local optima problem of hill-climbing algorithms, which are commonly
used in clustering problems. They implemented a tool called Bunch [54] that can generate better
results faster when users are able to integrate their knowledge into the clustering process. They
also showed how the subsystem structure of a system can be maintained incrementally after the
original structure has been produced.
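The following sketch illustrates one hill-climbing pass in the spirit of Bunch. The score used here (intra-cluster edges minus inter-cluster edges) is a simple stand-in for Bunch's actual modularization quality (MQ) measure.

```java
import java.util.*;

// An illustrative hill-climbing step: repeatedly move a single module to the
// cluster that most improves the score, keeping only improving moves.
public class HillClimb {
    public static int score(Map<String, Integer> cluster, List<String[]> edges) {
        int s = 0;
        for (String[] e : edges)
            s += cluster.get(e[0]).equals(cluster.get(e[1])) ? 1 : -1;
        return s;
    }

    // One pass: try moving each module into each cluster, keep improving moves.
    public static Map<String, Integer> improve(Map<String, Integer> cluster,
                                               List<String[]> edges, int numClusters) {
        Map<String, Integer> best = new HashMap<>(cluster);
        for (String m : cluster.keySet()) {
            for (int c = 0; c < numClusters; c++) {
                Map<String, Integer> trial = new HashMap<>(best);
                trial.put(m, c);
                if (score(trial, edges) > score(best, edges)) best = trial;
            }
        }
        return best;
    }
}
```

Genetic algorithms, as used by Mancoridis et al., replace this greedy step with population-based search to escape the local optima such passes can get stuck in.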
2.2 Program Migration
Program transformation is the act of changing one program into another. The language in which
the program being transformed and the resulting program are written are called the source and
target languages, respectively. Program transformation is used in many areas of software engi-
neering, including compiler construction, software visualization, documentation generation, and
automatic software renovation. There are two main subjects related to our work :Migrating
Procedural Legacy Systems to Object-Oriented ParadigmandRe-Engineering Existing Object-
Oriented Systems.
2.2.1 Migrating Procedural Legacy Systems to Object-Oriented Paradigm
Many researchers have proposed different methodologies for migrating the architecture or the
code of software systems written in a procedural language to comply with object-oriented paradigms.
For instance, Martin and Muller [57] reported case studies on transliterating C source code
to Java using the Ephedra method. The method includes three steps:
• Insertion of C function prototypes,
• Data type and type cast analysis, and
• Transliteration of source code.
By applying the Ephedra method, parts of the C code can be carried over to the Java platform, which
makes it possible to avoid a complete redevelopment of the business logic already present
in the current application. However, the difficulty in using this method is that, since C is a
procedural language and Java is an object-oriented language, not only do the syntax and semantics
of the source code need to be translated, but a paradigm shift is also necessary to move from
procedural to object-oriented code.
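To illustrate the flavor of such a paradigm-preserving transliteration (this example is constructed for illustration and is not Ephedra output):

```java
// The original procedural C code:
//
//   int sum(int *a, int n) {
//       int s = 0;
//       for (int i = 0; i < n; i++) s += a[i];
//       return s;
//   }
//
// A direct transliteration keeps the procedural structure, mapping the pointer
// parameter to a Java array. The paradigm shift to genuinely object-oriented
// code would be a separate, manual step.
public class Transliterated {
    public static int sum(int[] a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) s += a[i];
        return s;
    }
}
```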
Wong and Li [96] proposed a stepwise approach for abstracting object-oriented designs from
procedural source code:
• Abstract the program structure, such as procedure and variable call graphs, and group vari-
ables as well as procedures into classes by using structure similarity and pattern matching,
• Conduct dynamic code partitioning using an execution slice-based technique and visualize
various functionalities in the code, and
• Refine the object-oriented design generated in the previous step, if necessary, with the aid
of simulation.
Web-enabling existing applications offers high leverage and a good return on investment.
The web-enabling process may involve the following issues:
• Wrapping the existing legacy application with Internet technologies. The advantage of this
process is that previous investment into legacy code remains intact. Also, by segregating
the user interface from the business logic module of the legacy application, only that which
is required for making the application “Internet aware” is modified.
• It is important to establish a proof of concept for the proposed solution by web-enabling
a part of the system instead of the whole. This in turn can help in defining the long-term
strategy on the appropriate solution that will best suit the organization.
• An existing legacy application might need to be reconstructed to leverage the existing busi-
ness process.
In [101], Zou and Kontogiannis presented a framework that addresses these issues by migrating
legacy systems into a web-enabled environment using a CORBA wrapper and the SOAP
CORBA IDL translator. The migration process focuses on specifying the identified legacy components
in XML, wrapping them as CORBA objects, and finally deploying the
distributed components into the application server. A scripting language encoded in an
XML format can be used to allow thin clients to communicate with the legacy components.
2.2.2 Re-Engineering Existing Object-Oriented Systems
Computing environments are evolving from mainframe systems to distributed systems. Stand-
alone programs that have been developed using object-oriented technology are not suitable for
these new environments. Hence, many researchers have addressed these issues by re-engineering
the existing object-oriented systems.
Tahvildari and Kontogiannis [80, 86] presented a framework for providing quality-based and
quality-driven re-engineering of object-oriented systems. The framework adopts an incremental
and iterative re-engineering process model that is driven by the soft-goal interdependency graphs.
The re-engineering process includes the following steps as illustrated in Figure 2.2. First, the
source code is represented as an Abstract Syntax Tree. The tree is further decorated using a linker,
with annotations that provide linkage, scope, and type information. Once software artifacts have
been understood, classified and stored during the reverse engineering phase, their behavior can
be made available to the system during the forward engineering phase.
Figure 2.2: The Block Diagram of the Quality-Based Re-engineering Process.
Then, the forward engineering phase aims to produce a new version of a legacy system that operates on the target architecture
and addresses specific non-functional requirements. Finally, the framework uses an iterative
procedure to obtain the new migrant source code by selecting and applying a transformation
which leads to performance or maintainability enhancements. The transformation is selected
from the soft-goal interdependency graphs. The resulting migrant system is then evaluated and
the step is repeated until quality requirements are met.
Fanta and Rajlich [32] re-engineered object-oriented programs to improve program
structure and thus maintainability. A deteriorated C++ application was restructured to move
“misplaced” code and data from their original classes to the classes they naturally belong to.
Gleich and Kohler [37] proposed an approach for transforming object-oriented legacy systems
into modern framework-based architectures in order to improve their maintainability. They also
provided a reference architecture for re-engineering tools and a few tool-prototypes which were
developed at Daimler-Benz.
Xu et al. [97] presented an approach to program restructuring at the functional level based on
the clustering technique with cohesion as the main concern. The approach focused on automated
support for identifying ill-structured or low cohesive functions and providing heuristic advice in
both development and evolution phases. The empirical observations showed that the heuristic
advice provided by the approach can help software designers make better decisions about why and
how to restructure a program.
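As an illustration of the kind of cohesion indicator such approaches rely on, the sketch below computes an LCOM-style count over the attributes each method uses; the exact metric of Xu et al. may differ.

```java
import java.util.*;

// An LCOM-style cohesion indicator: the number of method pairs sharing no
// attribute minus the number of pairs sharing at least one, clamped at zero.
// High values flag low-cohesion units as restructuring candidates.
public class Cohesion {
    public static int lcom(List<Set<String>> methodAttrs) {
        int disjoint = 0, sharing = 0;
        for (int i = 0; i < methodAttrs.size(); i++)
            for (int j = i + 1; j < methodAttrs.size(); j++) {
                Set<String> common = new HashSet<>(methodAttrs.get(i));
                common.retainAll(methodAttrs.get(j));
                if (common.isEmpty()) disjoint++; else sharing++;
            }
        return Math.max(0, disjoint - sharing);
    }
}
```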
2.3 Architecture Recovery
One of the areas in software architecture is architecture recovery through reverse engineering of
existing implementations. Knowing the architecture of a software system may play an impor-
tant role in maintenance and evolution of the system. This knowledge helps the developer to
know where in the system to modify and what parts of the system will be affected by the change.
Moreover, in order to decompose an existing system, there is a need for an efficient architecture recovery process.
Figure 2.3: The Dali Workbench.
Since architecture recovery has received considerable attention recently, numerous articles
have been published on this topic and various frameworks, techniques and tools have been devel-
oped. Basically, existing knowledge, obtained from experts and design documents, and various
tools are necessary to solve the problem. For instance, Kazman and Carriere presented a workbench
for architectural extraction called Dali [48]. Figure 2.3 illustrates Dali’s architecture. In
this workbench, a variety of lexical-based, parser-based and profiling-based tools are used to ex-
amine a system and extract static and dynamic views to be stored in a repository. Analysis of
these views is supported by visualization and specific analysis tools. They enable an interaction
with experts to control the recovery process until the software architecture is reconstructed.
Another architecture recovery approach, called the Architecture Recovery Method (ARM), was
proposed by Guo et al. in [42]. ARM is a semi-automatic analysis method for reconstructing
architectures based on the recognition of architectural patterns. Existing knowledge gained from
design documentation is used to define queries for potential pattern instances which are then ap-
plied automatically to extracted and fused source model views. Human evaluation is required to
determine which of the detected pattern instances are intended and which are false positives or
false negatives. ARM supports patterns at various abstraction levels and uses lower-level patterns
to build higher-level patterns and composite patterns. In this way the approach is aimed particu-
larly at systems that have been developed using design patterns whose implementations have not
eroded over time.
Dominance analysis is a fundamental concept in compiler optimizations and has been used
extensively to identify loops in basic block graphs [61]. It allows one to locate subordinated soft-
ware elements in a rooted dependency graph. Dominance analysis on call graphs of procedural
language applications has been used in reverse engineering to identify modules and subsystems
and recover system architectures [17, 26, 36]. Cimitile and Visaggio [26] first introduced domi-
nance analysis as a method to identify related parts of an imperative system. This idea was further
elaborated on in [17, 36]. The authors applied dominance analysis on call graphs of procedural
language applications to identify modules and subsystems. In this research, we explore the use
of dominance analysis to identify services from an object-oriented application.
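The standard iterative data-flow formulation of dominance can be sketched directly: dom(root) = {root}, and for every other node n, dom(n) = {n} ∪ ⋂ dom(p) over all call-graph predecessors p of n. A node d dominates n if every path from the root to n passes through d, which is what lets dominance analysis expose subordinated elements.

```java
import java.util.*;

// A small sketch of dominance analysis on a rooted call graph, using the
// classic iterative fixed-point computation. Every node must appear as a key
// in the call graph (leaves map to an empty set).
public class Dominance {
    public static Map<String, Set<String>> dominators(
            Map<String, Set<String>> callGraph, String root) {
        Set<String> all = callGraph.keySet();
        Map<String, Set<String>> dom = new HashMap<>();
        for (String n : all)
            dom.put(n, n.equals(root) ? new HashSet<>(Collections.singleton(root))
                                      : new HashSet<>(all));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String n : all) {
                if (n.equals(root)) continue;
                Set<String> newDom = null;
                for (String p : all)  // predecessors of n: nodes p that call n
                    if (callGraph.get(p).contains(n)) {
                        if (newDom == null) newDom = new HashSet<>(dom.get(p));
                        else newDom.retainAll(dom.get(p));
                    }
                if (newDom == null) newDom = new HashSet<>();
                newDom.add(n);
                if (!newDom.equals(dom.get(n))) { dom.put(n, newDom); changed = true; }
            }
        }
        return dom;
    }
}
```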
2.4 Software Reuse
Software reuse enables applications to be developed faster and less expensively. It also offers
numerous other benefits, including:
• Return on Investment. Components built or purchased by a company for one particular
project can be reused in future projects, maximizing the company’s return on investment.
• Adaptability. With component-based development (CBD), applications can be easily adapted
to respond to changing business needs. The modular nature of components enables them
to be easily modified, added, deleted or swapped to provide new or enhanced functionality.
• Reliability. Reusing software components decreases the risk of operational glitches be-
cause the components have already been previously tested in other applications.
Current software reuse techniques include object-orientation, component-based software development,
and service-based development. In this section, we review two topics on software reuse
which are relevant to this research work: Identification of Reusable Components in Source Code
and Creation of Services from Legacy Systems.
2.4.1 Identification of Reusable Components in Source Code
Re-engineering legacy systems into component-based systems involves identifying reusable pieces,
or components, of the legacy system so that the system can be restructured using those pieces.
These components are actually modules of the system’s code that perform certain business func-
tions independently by processing a specific set of data. Once such components are identified
in the system, they can be “mined”, or extracted, and reused to build a component-based sys-
tem [39].
The component identification exercise first requires the software developer to gain an under-
standing of the legacy system. A software system can be understood in the following terms:
• Different elements of the system such as programs, jobs, and data files.
• Relationships that exist between those elements. Also different views can be constructed
based on these elements and their relationships to each other, for instance, a call graph can
be created to show the relationship between various programs.
Once we gain an understanding of how the legacy system is built, we need to break the system
down into components. This can be accomplished by selecting certain points within the system
and expanding the boundaries of those points until all related system elements are included within
the boundaries. The process of expanding these boundaries may be driven primarily by system
queries, documentation on the system, its maintenance history, and the knowledge of those who
have worked with the system in the past.
Component identification approaches can be classified into two categories [39]: Data-Centric
Identification and Event-Centric Identification. The data-centric approach to component
identification involves analyzing the different types of data within the system, identifying the
business functions performed on each type of data and pinpointing where each business function
is performed throughout the system. Once a unique, independent business function is identified
and isolated, it can then be segregated as a component. The event-centric approach to component
identification is used to identify components in event-driven systems such as online Customer
Information Control System (CICS) programs. Most online CICS programs are driven by events
generated either by user input or internal programs. In an event-driven system, any time an
event takes place, specific code within the system is executed. Components can be identified by
triggering an event and isolating the specific business functions that result from that event. In this
research, we focus only on the Data-Centric Identification approaches.
Caldiera and Basili introduced the Computer Aided Reuse Engineering (Care) system, which
describes an algorithmic approach for program understanding, to support identifying reusable
components using a user-defined reusability attribute model based on software metrics in the
context of a procedural paradigm [18].
Etzkorn and Davis presented an approach for identifying reusable classes from object-oriented
systems based on the understanding of comments and identifiers in the source code [31]. Their
tool CHRis uses natural-language techniques to help users decide whether a class implements
certain useful functionality.
In [4], Bansiya and Davis introduced a Quality Model for Object-Oriented Design (QMOOD)
which measures functional, structural and relational details of the system based on high-level
attributes. In the model, they calculate reusability based on coupling, cohesion, and design size.
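As an illustration, the commonly cited QMOOD reusability equation combines four normalized design metrics; the weights below follow that published formulation, but this sketch is an approximation, not a faithful reimplementation of the model.

```java
// A sketch of QMOOD-style reusability scoring over normalized design metrics:
// reusability = -0.25*coupling + 0.25*cohesion + 0.5*messaging + 0.5*designSize.
// Treat the weights as the commonly cited values, not a verified reproduction
// of Bansiya and Davis's model.
public class Qmood {
    public static double reusability(double coupling, double cohesion,
                                     double messaging, double designSize) {
        return -0.25 * coupling + 0.25 * cohesion
                + 0.5 * messaging + 0.5 * designSize;
    }
}
```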
Shin and Kim proposed techniques for transforming an available object-oriented design into
a component-based design [75]. Their techniques focus on formal model specification and trans-
formation.
None of these methods, however, provides hierarchical structures or proposes the reconstruction
of the system’s original architectural design. We aim to develop techniques for recovering
high-level design, to extract the service hierarchy embedded in object-oriented systems, and to
migrate object-oriented designs to service-oriented architectures.
2.4.2 Creation of Services from Legacy Systems
A software service of a software system is an abstract resource that represents a capability of
performing tasks that represent a coherent functionality, from the points of view of both the
provider and the requester of the software [40]. A service should have a well-defined functional
interface and be easily discovered and accessed [99]. A service-based development paradigm,
or services model [34], is one in which components are viewed as services. In this model, services
can interact with one another and be providers or consumers of data and behavior. Some
of the defining characteristics of service-based technologies include modularity, availability, de-
scription, implementation-independence, and publication [34]. In the service-based development
paradigm, a primary focus is upon the definition of the interface needed to access a service (de-
scription) while hiding the details of its implementation (implementation-independence).
Gannod et al. described an architecture-based approach for the creation of services from
legacy components using wrapping (i.e., adapters) and the subsequent integration of these services
with service-requesting client applications [35]. The technique utilizes an architecture description
language to describe components as services and achieves run-time integration using the Jini [47]
middleware technology. The methodology involves two steps for creating services: (i) specification
of components as services; and (ii) generation of services using proxies via the construction
of appropriate adapters and glue code. These services are consequently registered and made
available on a network.
Mehta and Heineman [59, 60] integrated the concepts of features, regression tests, and
component-based software engineering (CBSE) into an approach for evolving procedural legacy
systems. The methodology is divided into three parts: i) selecting test cases by considering
features that need evolution; ii) executing the selected test cases using code profilers to locate the
source code that implements the features, then analyzing and refactoring the located source code to create
components; and iii) comparing pre- and post-evolution maintenance costs.
2.5 Summary
In this chapter, we have reviewed the four principal research fields upon which this thesis is founded:
Program Comprehension, Program Migration, Architecture Recovery, and Software Reuse. The
aim of this chapter is to provide a general background to existing and ongoing research in these
areas. In subsequent chapters, we will present our own contributions in more detail, and also
present a detailed analysis of our approach in comparison to closely related work.
Chapter 3
Service-Oriented Componentization
Framework
Since many competitive services have already been implemented in existing systems, leveraging
the value of an existing system by exposing all or parts of it as services within a service-oriented
environment has become a major concern in today’s industry. The identification of functions suitable
for exposure as services can be seen as an instance of a more generic problem of functional
decomposition of existing systems. To reuse the identified services and migrate the existing
system’s implementation into a service-oriented environment, one needs to package the identified
services into well-documented and self-contained components, during the forward engineering
phase.
In this research, we develop a service-oriented componentization framework for the Java soft-
ware system, which decomposes an existing object-oriented system to re-modularize the existing
assets to support service functionality. More specifically, the proposed framework automatically
supports: i) identifying critical business services embedded in an existing Java system, ii) realizing
each identified service as a self-contained component, and iii) transforming the object-oriented
design into a service-oriented architecture. We name the proposed componentization
framework the SOC4J framework.
This chapter outlines the proposed SOC4J framework, while the details are discussed more
thoroughly in subsequent chapters.
Figure 3.1: The Architecture of the Service-Oriented Componentization Framework.
3.1 Framework Overview
The proposed SOC4J framework uses graph representations of an existing object-oriented soft-
ware system and graph transformations to identify business services embedded in the system.
In this research, we are interested in the reverse engineering challenge. Service identification is
complicated by the usual obstacles of having to deal with potentially large and poorly structured
existing systems. Identifying these service candidates for packaging as reusable components
would require analysis of massive amounts of legacy code or at least graph representations of
the code. Additionally, it would require intervention of people with background in the business
domain to judge what functions are likely to make successful services. The identification of func-
tions suitable for exposure as services can be seen as an instance of a more generic problem of
the functional decomposition of existing systems. Here, we are required to abstract the code,
or an alternative code representation (e.g., XML or graphs), to higher-level representations that
describe the system architecture in terms of its functional units.
Furthermore, the framework realizes each identified service into a self-contained component
and reconstructs the object-oriented design into a service-oriented architecture. To reuse the iden-
tified services and migrate the existing system’s implementation into a component-based archi-
tecture, it is necessary to package the identified services into well-documented and self-contained
components. Service packaging needs techniques for automatically extracting the relevant pro-
cedural elements from the existing system and creating an interface for components. Also, the
restructuring of object-oriented systems requires a comprehensive framework to relate refactor-
ing operations and software transformations with non-functional requirements. As illustrated in
Figure 3.1, the proposed componentization framework is comprised of four stages: Architecture
Recovery, Service Identification, Component Generation, and System Transformation. The
following sections elaborate on each of these stages.
3.2 Architecture Recovery
Software architecture recovery aims at reconstructing views on the architecture as-built. Effective
system reuse and evolution require both the “big picture” and the lower level dependencies be-
tween portions of the source code. As noted earlier, identifying functions suitable for exposure as
services is an instance of the more generic problem of functionally decomposing existing systems:
the code, or an alternative code representation (e.g., XML or graphs), must be abstracted to
higher-level representations that describe the system architecture in terms of its functional units.
In the architecture recovery stage, we aim to create a framework that has methodological
and technological steps to recover higher-level design and architecture representations of existing
software systems based on source code artifacts. This includes the creation of a suitable represen-
tation of design and architectural models that reflect the functional decomposition of the system.
To distinguish them from each other, design models are more detailed and refer to different parts
of a system, whereas architectural models are more abstract and refer to the system as a whole.
There are two goals we are trying to achieve at this stage: i) building complete data models
for Java source code at different abstraction levels to support a wide range of structural analyses
and recovery, and ii) establishing a repository of relationships among classes and interfaces which
can easily be queried in the service identification stage.
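The kind of queryable fact repository envisioned here can be sketched as a store of (subject, relation, object) triples; the relation names are illustrative.

```java
import java.util.*;

// A toy sketch of a source-code fact repository: extracted relationships among
// classes and interfaces are stored as (subject, relation, object) triples and
// queried by subject and relation type during service identification.
public class FactRepository {
    private final List<String[]> facts = new ArrayList<>();

    public void add(String subject, String relation, String object) {
        facts.add(new String[]{subject, relation, object});
    }

    public List<String> query(String subject, String relation) {
        List<String> result = new ArrayList<>();
        for (String[] f : facts)
            if (f[0].equals(subject) && f[1].equals(relation)) result.add(f[2]);
        return result;
    }
}
```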
3.3 Service Identification
Identifying critical business services embedded in an existing Java system is one of the primary
tasks of the SOC4J framework. Essentially, the service identification process of the SOC4J framework
identifies related modules in the system. This process is based on the analysis of the
architectural information recovered in the previous stage.
A business service of a software system is an abstract resource that represents a capability
of performing tasks that represent a coherent functionality from the points of view of both
the provider and the requester. In order to clearly describe and automate the service identification
process, we categorize the services embedded in an object-oriented system into two classes:
i) Top-level services that are not used by another service but may contain a hierarchy of low-level
services further describing the service, and ii) Low-level services that are underneath a top-level
service and may be agglomerated with other low-level services to yield a new service with a
higher level of granularity. Furthermore, a formal description needs to be developed for each
service. Such descriptions should document possible dependencies between service invocations,
besides syntactic information on the number and types of parameters. Such descriptions are crucial
for developers implementing applications based on the extracted services and should therefore
be presented in a way understandable to them.
In the service identification stage, we aim to identify both the top-level services and the low-
level services embedded in an existing system. The proposed service identification approach is
supported by a combination of top-down and bottom-up techniques. In the top-down portion of
the process, we identify the top-level services and the atomic services underneath each top-level
service. In the bottom-up portion, we aggregate the atomic services to identify services with
a higher level of granularity, using graph transformations.
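The bottom-up aggregation step can be sketched as a simple graph transformation: atomic services, viewed here as sets of implementing classes, are merged whenever they overlap. The service-as-class-set representation and the overlap-based merge rule are illustrative assumptions for this sketch, not the framework's actual transformation rules.

```java
import java.util.*;

// Sketch of bottom-up service aggregation: atomic services (modeled as sets
// of implementing class names) that share a class are merged into a single
// coarser-grained service. The merge criterion is an assumption made for
// illustration only.
public class ServiceAggregator {
    public static List<Set<String>> aggregate(List<Set<String>> atomicServices) {
        List<Set<String>> services = new ArrayList<>();
        for (Set<String> atomic : atomicServices) {
            Set<String> merged = new HashSet<>(atomic);
            // Absorb every existing service that overlaps with this one.
            Iterator<Set<String>> it = services.iterator();
            while (it.hasNext()) {
                Set<String> existing = it.next();
                if (!Collections.disjoint(merged, existing)) {
                    merged.addAll(existing);
                    it.remove();
                }
            }
            services.add(merged);
        }
        return services;
    }
}
```

Applied to three atomic services {A, B}, {B, C}, and {D}, the first two share class B and collapse into one service {A, B, C}, while {D} remains separate.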
3.4 Component Generation
An effective way of leveraging the value of existing systems is to expose their functionalities as
reusable components to a larger number of clients through well-defined component interfaces.
Hence, the identified services should be packaged as components so that they can be deployed
and invoked. Moreover, packaging the identified services into components makes it possible to
migrate the existing system's implementation to a component-based architecture. Service packaging
requires techniques for automatically extracting the relevant procedural elements from the existing
system and creating an interface for each component.
The service-oriented architecture (SOA) encourages individual services to be self-contained.
A self-contained component is a component that contains all code necessary to implement its
services and hence it can be deployed independently. At the third stage of the proposed SOC4J
framework, we realize each top-level service, together with the low-level services underneath it,
as self-contained components. More specifically, for each identified service, we extract
all classes and interfaces necessary for implementing the service, generate an interface
for the service, and package these classes/interfaces together with the interface as a JAR file.
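The packaging step described above can be sketched with the standard java.util.jar API. The entry names and class-file bytes below are placeholders; this is not the framework's actual extraction and packaging code.

```java
import java.io.*;
import java.util.Map;
import java.util.jar.*;

// Illustrative sketch: the extracted class files of a service, plus its
// generated interface, are written into a single JAR in memory. Entry names
// and byte contents are placeholders supplied by the caller.
public class ServicePackager {
    public static byte[] packageService(Map<String, byte[]> classFiles) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream jar = new JarOutputStream(buffer, manifest)) {
            for (Map.Entry<String, byte[]> entry : classFiles.entrySet()) {
                jar.putNextEntry(new JarEntry(entry.getKey()));
                jar.write(entry.getValue());
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Helper for inspecting the result; JarInputStream consumes the manifest
    // separately, so only the packaged class entries are counted here.
    public static int countEntries(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            int n = 0;
            while (in.getNextJarEntry() != null) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```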
As Figure 3.1 depicts, the output of this stage is a repository of self-contained components.
The quality of these components is important for the success of the reuse-driven development
process. Key qualities of good reusable components include correctness, complexity, observ-
ability, testability, customizability, and performance. However, most of these qualities are not
directly measurable. In this thesis, we aim at assessing the reusability of the extracted compo-
nents through the analysis of their interfaces and internal methods. Reusability is a high-level
quality of software components and hence results from the combination and interaction of
many low-level properties. We define a component reusability model that decomposes
reusability into quality properties such as complexity, observability, customizability,
and external dependency.
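As a rough illustration of how such a model might combine the low-level properties, the sketch below aggregates four normalized property scores into a single reusability score. The normalization convention and the equal weighting are assumptions made for this example only, not the framework's definition of the model.

```java
// Illustrative sketch of combining low-level quality properties into a
// reusability score. Each input is assumed to be normalized to [0, 1] with
// higher meaning better (e.g., complexity and external dependency would be
// inverted before the call); the equal weights are an assumption.
public class ReusabilityModel {
    public static double reusability(double complexity, double observability,
                                     double customizability, double externalDependency) {
        return (complexity + observability + customizability + externalDependency) / 4.0;
    }
}
```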
3.5 System Transformation
A component-based system is built by combining and interconnecting the components. There-
fore, the component-based approach supports reusability and flexibility. Based on the compo-
nents that realize the identified business services, transforming the monolithic architecture of an
existing object-oriented system to a more flexible service-oriented architecture is another goal of
the proposed SOC4J framework.
In the system transformation stage, we aim at reconstructing an existing Java system into a
component-based system by using the components generated from the source system. A reference
model for the component-based target system has been presented. The system transformation
process should preserve the functionality of the original system. The surrounding parts of the
system should use the newly extracted components in order to avoid the situation where two sets
of classes providing the same functionality exist in the same system.
As Figure 3.1 shows, the output of this stage is a component-based system providing the same
functionality as the original system.
3.6 Summary
In this chapter, we outlined the proposed service-oriented componentization framework. The role
of each stage of the framework has been discussed. We will present the techniques used within
each stage in the subsequent chapters.
Chapter 4
Architecture Recovery
Software architecture recovery aims at reconstructing views on the architecture as-built. Knowing
the architecture of a software system plays an important role in the maintenance and evolution
of the system. This knowledge helps the engineer determine where in the system to make a
modification and what parts of the system will be affected by the change. Moreover, in order to
componentize an existing system, there is a need for an efficient architecture recovery process.
The first stage of the service-oriented componentization framework is the architecture recovery
stage. There are two goals we are trying to achieve at this stage:
• Building complete data models for Java source code at different levels of abstraction to
support a wide range of structural analysis and recovery, and
• Establishing a repository of relationships among classes and interfaces which can easily be
queried in the service identification stage.
This chapter discusses the two main processes contained in the architecture recovery stage: the
Source Code Modeling process and the Architecture Modeling process. In Section 4.1, we discuss
the UML representation of the XML schemas which we define in this thesis. We explain the source
code modeling process in Section 4.2, while the architecture modeling process is discussed in
Section 4.3. Finally, Section 4.4 summarizes this chapter.
4.1 XML Schema Representation
As designed, the output of each stage of the componentization framework is presented as XML
documents. Before we delve into the processes of each stage in the framework, we need an
understandable and formal way to present the XML schemas we define in each stage.
UML [65] is used as the de facto standard for software development; therefore a need
arises to integrate XML schemas into UML-based software development processes. Not only is
the production of XML schemas out of UML models required, but also the integration of XML
schemas as input into the development process, because standard data structures and document
types are part of the requirements [7]. In this section, we describe the UML representation of
the XML schemas that we define in the rest of the thesis.
4.1.1 UML Profile for XML Schemas
Existing work on representing XML schemas in UML has emerged from approaches to platform
specific modeling in UML and transforming these models to XML schemas, with the recognized
need for UML extensions to specify XML schema peculiarities. Booch et al. first presented an
approach to modeling XML schemas using UML notation in [11]. Although based on a prede-
cessor to XML schemas, it introduced UML extensions addressing the modeling of elements and
attributes, model groups, and enumerations that can also be found in recent approaches. Bernauer
et al. [7] summarized and compared the main recent approaches to representing XML schemas in
UML as follows:
• Carlson [19] described an approach based on XMI rules for transforming UML to XML
schemas. Carlson introduced a UML profile which addresses most XML schema con-
cepts, except for simple content complex types, global elements and attributes, and identity
constraints. Regarding semantic equivalence, the profile has some weaknesses in its repre-
sentation of model groups, i.e., sequence, choice, and all elements in XML schemas.
• Provost [67] addressed some of the weaknesses of [19] by covering the representation of
enumerations and other restriction constraints, and of list and union type constructors, al-
though the latter does not conform to UML.
• David Carlson [19] defined a UML profile for representing XML schemas that was based
on the XML conceptual models discussed in [27]. This UML profile added some
enhancements regarding simple types and notations.
• Routledge et al. [71] pointed out the importance of separating the conceptual schema
(i.e., the platform independent model) from the logical schema (i.e., the platform specific
model). This separation is not considered in the other approaches. They treated the
logical schema as a direct, one-to-one representation of the XML schema in terms of a UML
profile. The profile that they defined covers almost all XML schema concepts, but several
of its representations do not conform to UML.
• Bernauer et al. [8] adapted the approach proposed in [71], aiming at a one-to-one represen-
tation of XML schemas in a UML profile. Their approach builds on the existing UML
profiles for XML schemas, with some improvements and extensions.
4.1.2 Representing XML Schemas in UML
By applying a UML profile, we represent the XML schemas defined in this research in UML
notation. We propose three criteria for choosing an existing UML profile for XML schemas:
1. The UML profile provides a semantically equivalent representation of an XML schema in
UML, supporting a bijective mapping between both representations. In order to satisfy this
requirement, the profile has to address the whole range of XML schema concepts such that
any XML schema can be expressed in UML.
2. The UML profile supports round-trip engineering, that is, transformation from XML schema
to UML and back again without loss of schema information.
3. The UML profile maximizes understandability of semantic concepts by users knowledge-
able of UML but not XML schema.
By examining the result of the evaluation performed in [7], we adopt the UML profile de-
fined in [71] to represent the XML schemas throughout this research work. The UML profile
provided in [71] contains classes and associations that represent constructs found in the XML
schema specification [88]. It is intended that every concept in an XML schema has a
corresponding representation in the UML profile (and vice versa). As a result, there is a
one-to-one relationship between the logical (UML notation) and physical (XML schema
notation) representations of an XML schema.
4.2 Modeling Source Code
Fact extraction from source code (i.e., finding pieces of information about the system) is a fun-
damental step of reverse engineering and often has to be performed first. That is, before
performing any high-level reverse engineering analysis or architecture recovery activities, the avail-
able information in the source code has to be extracted and aggregated in a fact base. Such a fact
base forms the foundation for the further analysis tasks that are conducted next. We aim to build a
complete data model set for Java source code at different levels of abstraction to support a wide
range of structural analysis and recovery. These models are essential for representing the sys-
tem at the source code level and for computing reusability attributes for each individual class. The
source code models are presented as XML documents and form the Basic View (BView) of the
system [51].
4.2.1 Approach
There are a number of existing meta-models for representing object-oriented software. Most
of them are aimed at Object-Oriented Analysis and Design (OOAD), the most notable example
being the Unified Modeling Language (UML). However, these meta-models represent software at
the design level. Re-engineering requires information about software at the source code level. We
propose an automated approach for modeling the entities of Java software systems at the source
code level. The approach is based on the Java Compiler Compiler (JavaCC) [44], as Figure 4.1
depicts.
Figure 4.1: The Approach for Source Code Modeling.
Source code parser construction tools have been around for several years. The best known of
these are the famous yacc [98] and lex [50] tools from the Unix domain and their GNU versions
bison [10] and flex [33]. These tools, as well as their successors, allow a stream of input data to
be parsed based on two constructs:
• Tokens. A token is a sequence of input characters that has meaning based upon the desired
syntax. The first step in parser construction is to extract tokens from the input stream. This
generally involves the specification of those tokens in some form of regular expressions.
Token extraction is also known as scanning or lexing (for lexical analysis).

• Backus-Naur Form (BNF) Productions. A BNF production is a set of token sequences
that has meaning based upon the desired syntax. For example, the string "2*3+4" can be
abstractly interpreted as "INTEGER MULT INTEGER ADD INTEGER". The second step
in parser construction is to group the tokens together to form the valid sequences for
the desired syntax.
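The first of these two constructs can be illustrated with a small hand-written scanner for the "2*3+4" example above, mapping the input stream to a token sequence. This is illustrative code only, not output generated by JavaCC.

```java
import java.util.*;

// A hand-written scanner for the "2*3+4" example: the token-extraction step
// of parser construction. Token kinds follow the example in the text; this
// is an illustration, not code generated by JavaCC.
public class TokenScanner {
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                // Consume a maximal run of digits as one INTEGER token.
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INTEGER");
            } else if (c == '*') { tokens.add("MULT"); i++; }
            else if (c == '+') { tokens.add("ADD"); i++; }
            else if (Character.isWhitespace(c)) { i++; }
            else throw new IllegalArgumentException("unexpected character: " + c);
        }
        return tokens;
    }
}
```

The second step, grouping these tokens into BNF productions, would then operate on the returned sequence.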
JavaCC offers an excellent toolkit for generating parser classes in Java. JavaCC generates top-
down, recursive descent parsers. The top-down nature of JavaCC allows it to be used with a wider
variety of grammars than other traditional tools, such as yacc and lex. JavaCC also contains all
parsing information in one file (the JavaCC grammar file). The convention is to name this file
with a .jj extension.
The Interpreter in Figure 4.1 is composed of a set of parser classes which are generated by
JavaCC. It parses the Java source code and outputs a set of raw data about the facts. These raw data
sets are passed to the Model Generator, which builds the source code models.
4.2.2 Source Code Models
Figure 4.2: The Meta-Model for Java Package Models.
Source code models represent Java packages, source files, classes, and methods defined in
a class. We define four meta-models for source code models at different levels of abstraction:
JPackage, JFile, JClass, and JMethod. As designed, source code models are exported and stored
as XML documents. Therefore, these meta-models are XML schemas, presented as UML
models by applying the UML profile for XML schemas discussed in Section 4.1.2.
JPackage
JPackage is the XML schema for modeling Java packages. Figure 4.2 illustrates the JPackage
XML Schema in UML.
JFile
JFile is the XML schema for modeling Java source files. Figure 4.3 illustrates the JFile XML
Schema in UML.
Figure 4.3: The Meta-Model for Java Source File Models.
JClass
JClass is the XML schema for modeling Java classes or interfaces. Figure 4.4 illustrates the
JClass XML Schema in UML.
Figure 4.4: The Meta-Model for Java Class/Interface Models.
JMethod
JMethod is the XML schema for modeling Java methods defined in a class or constructors of a
class. Figure 4.5 illustrates the JMethod XML Schema in UML.
Figure 4.5: The Meta-Model for Java Method/Constructor Models.
4.3 Modeling Architecture
In this thesis, the primary goal of architectural modeling is to establish a repository of rela-
tionships among classes and interfaces which can easily be queried in the service identification
stage. The relationships among classes and interfaces occur at different levels of abstraction, such
as the package level, class level, and method level. In the specific context of our work, we ana-
lyze relationships at the class level. Based on the source code models described in Section 4.2.2,
we identify the relationships between the classes/interfaces and build two architectural models at
different levels of abstraction, namely the Class/Interface Relationship Graph (CIRG) and the
Class/Interface Dependency Graph (CIDG). In addition to the CIRG and CIDG, reusability attributes
for each class are computed and integrated into the graphs. The service identification and extraction
tasks in the next stage are performed upon the transformation of these two graphs. The CIRG and
CIDG are exported as XML documents and form the Structural View (SView) of the system [51].
4.3.1 Definitions of Class Relationships
We aim to identify class/interface relationships at the class level. In order to comply with UML,
the types of relationships considered between two classes (or interfaces) in this thesis are inher-
itance, realization, association, aggregation, composition, and usage, which are adapted from
the UML 2.0 superstructure specification [65]. We formalize the relationships so that we can
automatically detect them in an implementation.

In order to formalize class relationships at the implementation level, we extend the class
relationship property set proposed in [41]:
Generalization Property Given two classes, A and B, A may be a specialized form of B, or B
may provide a contract that A agrees to carry out. We define the generalization property as
follows:

    GE : Class × Class → G, where G = {null, extends, implements}    (4.1)

Hence we have GE(A, B) ∈ {null, extends, implements}. GE(A, B) = extends if
class A is a specialized form of class B; GE(A, B) = implements if B serves as the
contract that A agrees to carry out; otherwise, GE(A, B) = null.
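In Java, the generalization property maps directly onto reflection: GE(A, B) = extends when B is A's direct superclass, and GE(A, B) = implements when B is an interface that A directly implements. The restriction to direct supertypes in this sketch is an assumption for illustration.

```java
// Computing the generalization property GE(A, B) of Equation 4.1 via Java
// reflection. Only direct supertypes are examined, which is an assumption
// of this sketch; a full analysis could walk the whole type hierarchy.
public class GeneralizationProperty {
    public enum GE { NULL, EXTENDS, IMPLEMENTS }

    public static GE ge(Class<?> a, Class<?> b) {
        if (b.equals(a.getSuperclass())) return GE.EXTENDS;
        for (Class<?> implemented : a.getInterfaces())
            if (implemented.equals(b)) return GE.IMPLEMENTS;
        return GE.NULL;
    }
}
```

For example, GE(ArrayList, AbstractList) = extends and GE(String, Comparable) = implements.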
Exclusivity Property An instance of class B involved at a given time in a relationship with an
instance of class A can, or cannot, be in another relationship at the same time. We define
the exclusivity property as follows:

    EX : Class × Class → B, where B = {true, false}    (4.2)

Given two classes, A and B, EX(A, B) ∈ {true, false}. The value true states that an in-
stance of class B can take part in another relationship with another instance of class A or
of another class. The value false indicates that it cannot. The exclusivity property only holds
at a given time and does not prevent possible transferals.
Invocation-Site Property Instances of class A, involved in a relationship, send messages to in-
stances of class B. We name all the set of all possible invocation sites:

    all = {field, array field, collection field, parameter, array parameter,
           collection parameter, local variable, local array, local collection}    (4.3)

We distinguish three levels of invocation sites: fields, parameters, and local variables. Also,
we distinguish "simple" invocation sites, arrays, and collections because they imply differ-
ent sets of programming idioms for their declarations and for their uses, which we need to
individualize when detecting the relationships. We define the invocation-site property as
follows:

    IS : Class × Class → 2^all    (4.4)

Given two classes, A and B, IS(A, B) ⊆ all. The values of the IS property describe the
invocation sites for messages sent from instances of class A to instances of class B. There
can be no message sent from class A to class B, i.e., IS(A, B) = ∅, or messages can be
sent from A through a field (respectively a parameter, a local variable) of type B, an array
field, or a field of type collection.
Lifetime Property Given two classes, A and B, the lifetime property constrains the lifetimes of
all instances of class B with respect to the lifetimes of all instances of class A. We define
the lifetime property as follows:

    LT : Class × Class → ‖, where ‖ = {−, +}    (4.5)

Hence we have LT(A, B) ∈ {−, +}. In programming languages with garbage collection,
LT(A, B) = + if all instances of class B are destroyed before the corresponding instances
of class A, and LT(A, B) = − if they are destroyed after. Writing LT(A, B) ∈ ‖ leaves the
times of destruction of instances of classes A and B unspecified.
Multiplicity Property Given two classes, A and B, the multiplicity property specifies the num-
ber of instances of class B allowed in a relationship with class A. We express this property
as follows:

    MU : Class × Class → 2^(N ∪ {+∞})    (4.6)

Hence we have MU(A, B) ⊂ N ∪ {+∞}. For the sake of simplicity, we use an interval
of the minimum and maximum numbers to represent multiplicity. Also, we only consider
multiplicity at the target end of a relationship.
Once the class relationship properties are defined, we can formalize the considered binary
class relationships at the implementation level as six conjunctions of the above five properties. For-
malizations of the binary class relationships are important because i) they provide formal, language-
independent definitions of the relationships for understanding and communication among soft-
ware engineers, and ii) they are the basis of the detection algorithms needed to bridge the gap
between implementation and design [41].
Inheritance Relationship Given two classes, A and B, let A −<IN>→ B represent that there is
an inheritance relationship between A and B, where A is the source class and B is the
target class. The inheritance relationship signifies that class A shares the structure and
behavior of class B and implies an "is-a-kind-of" relationship. We formalize the inheritance
relationship as follows:

    A −<IN>→ B = (GE(A, B) = extends) ∧ (GE(B, A) = null)    (4.7)
Realization Relationship Given two classes, A and B, let A −<RE>→ B represent that there is a
realization relationship between A and B, where A is the source class and B is the target
class. The realization relationship signifies that class A must realize, or implement, the
behavior specified by class B (in the Java case, B is an interface). We formalize the
realization relationship as follows:

    A −<RE>→ B = (GE(A, B) = implements) ∧ (GE(B, A) = null)    (4.8)
Association Relationship Given two classes, A and B, let A −<AS>→ B represent that there is
an association relationship between A and B, where A is the source class and B is the
target class. The UML specifies that an association represents the ability of one instance
of the source class to send a message to an instance of the target class [65]. This is typi-
cally implemented with a pointer or reference instance variable, although it might also be
implemented as a method parameter, or the creation of a local variable. We formalize the
association relationship as follows:

    A −<AS>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧
                 (IS(A, B) ⊆ all) ∧ (IS(B, A) = ∅) ∧
                 (LT(A, B) ∈ ‖) ∧ (LT(B, A) ∈ ‖) ∧
                 (MU(A, B) = [0, +∞]) ∧ (MU(B, A) = [0, +∞])    (4.9)
Aggregation Relationship Given two classes, A and B, let A −<AG>→ B represent that there is an
aggregation relationship between A and B, where A is the source class and B is the target
class. By the UML specification [65], the aggregation relationship is the typical whole/part
relationship. That is, an instance of the target class (the part) is a part of an instance of the
source class (the whole). The aggregation relationship implies a "has-a" relationship and
is exactly the same as an association, with the exception that instances cannot have cyclic
aggregation relationships. We formalize the aggregation relationship as follows:

    A −<AG>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧
                 (IS(A, B) ⊆ {field, array field, collection field}) ∧
                 (IS(B, A) = ∅) ∧
                 (LT(A, B) ∈ ‖) ∧ (LT(B, A) ∈ ‖) ∧
                 (MU(A, B) = [0, +∞]) ∧ (MU(B, A) = [1, +∞])    (4.10)
Composition Relationship Given two classes, A and B, let A −<CO>→ B represent that there is a
composition relationship between A and B, where A is the source class and B is the target
class. Again, by the UML specification [65], the composition relationship is exactly like
aggregation, with the exception that the lifetime of the 'part' is controlled by the 'whole'.
This control may be direct or transitive. That is, the whole may take direct responsibility
for creating or destroying the part, or it may accept an already created part and later pass
it on to some other whole that assumes responsibility for it. We formalize the composition
relationship as follows:

    A −<CO>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) = true) ∧ (EX(B, A) = false) ∧
                 (IS(A, B) ⊆ {field, array field, collection field}) ∧
                 (IS(B, A) = ∅) ∧
                 (LT(A, B) = +) ∧ (LT(B, A) = −) ∧
                 (MU(A, B) = [1, +∞]) ∧ (MU(B, A) = [1, 1])    (4.11)
Usage Relationship Given two classes, A and B, let A −<US>→ B represent that there is a usage
relationship between A and B, where A is the source class and B is the target class. The
UML specifies that a usage relationship is one in which the client (the source) requires
the presence of the supplier (the target) for its correct functioning or implementation [65].
Furthermore, the UML defines five types of usage relationships: i) the call relationship
signifies that the source operation invokes the target operation, ii) the create relationship
signifies that the source class creates one or more instances of the target class, iii) the
instantiation relationship signifies that one or more methods belonging to instances of the
source class create instances of the target class, iv) the responsibility relationship signifies
that the client has some kind of obligation to the supplier, and v) the send relationship
signifies that instances of the source class send signals to instances of the target class. We
formalize the usage relationship as follows:

    A −<US>→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧
                 (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧
                 (IS(A, B) ⊆ all − {field, array field, collection field}) ∧
                 (IS(B, A) = ∅) ∧
                 (LT(A, B) ∈ ‖) ∧ (LT(B, A) ∈ ‖) ∧
                 (MU(A, B) = [1, +∞]) ∧ (MU(B, A) = [0, +∞])    (4.12)
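As an illustration of how these formalizations drive the detection algorithms, the sketch below implements the two generalization-based cases, Equations 4.7 and 4.8, given precomputed GE values in both directions. The enum and method names are hypothetical.

```java
// Sketch of relationship detection for the generalization-based cases:
// Equations 4.7 (inheritance) and 4.8 (realization) reduce to checks of
// the GE property in both directions. GE values are assumed to come from
// an earlier analysis pass; names here are illustrative.
public class RelationshipDetector {
    public enum Ge { NULL, EXTENDS, IMPLEMENTS }

    /** Returns "IN", "RE", or null for the pair (A, B), given GE(A,B) and GE(B,A). */
    public static String detect(Ge geAB, Ge geBA) {
        if (geBA != Ge.NULL) return null;           // both equations require GE(B, A) = null
        if (geAB == Ge.EXTENDS) return "IN";        // Equation 4.7: inheritance
        if (geAB == Ge.IMPLEMENTS) return "RE";     // Equation 4.8: realization
        return null;
    }
}
```

The remaining four relationships would additionally test the EX, IS, LT, and MU conjuncts of Equations 4.9 through 4.12.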
4.3.2 Approach
The architecture modeling process identifies all relationships between the classes/interfaces and
represents the identified relationships in directed graphs. The process also computes the basic
reusability attributes for each class in the system. Figure 4.6 illustrates the architecture modeling
process.
Figure 4.6: The Approach for Architecture Modeling.
As described before, the source code models built by the source code modeling process are
exported as XML documents. First, these source code models are parsed by the XML Parser in
Figure 4.6. Then, the Relationship Extractor identifies all relationships described in Section 4.3.1,
and the Metric Generator computes a set of metrics for each class/interface. We define a metric
suite at the class level to represent the basic reusability attributes of each class in the system.
The metric suite is presented in Table 4.1. The definition of each metric is adapted from SDMet-
rics [72]. Finally, the Graph Generator and Graph Transformer generate the CIRG and CIDG,
respectively. We will give formal definitions of the CIRG and CIDG in the following sections.
lines code: The number of lines of non-comment code in a class.

num attr: The number of attributes in a class. The metric counts all properties regardless of their
type (data type, class, or interface), visibility, changeability (read-only or not), and owner scope
(class-scope, i.e., static, or instance attribute). Not counted are inherited properties, and properties
that are members of an association, i.e., that represent navigable association ends.

num ops: The number of methods in a class. Includes all methods in the class that are explicitly
modeled (overriding methods, constructors), regardless of their visibility, owner scope (class-scope,
i.e., static), or whether they are abstract or not. Inherited operations are not counted.

num pub ops: The number of public methods in a class. Same as metric num ops, but only counts
operations with public visibility. Measures the size of the class in terms of its public interface.

num nested classes: The number of inner classes in a class.

setters: The number of operations with a name starting with 'set'. Note that this metric does not
always yield accurate results. For example, an operation settleAccount will be counted as a setter
method.

getters: The number of operations with a name starting with 'get', 'is', or 'has'. Again, note that
this metric does not always yield accurate results. For example, an operation isolateNode will be
counted as a getter method.

fan in: The number of classes/interfaces that depend on this class. This metric counts incoming
plain UML dependencies and usage dependencies.

fan out: The number of classes/interfaces on which this class depends. This metric counts outgoing
plain UML dependencies and usage dependencies.

Table 4.1: The Metric Suite at Class Level
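The setters and getters metrics of Table 4.1 are plain name-prefix counts, which is exactly why the table warns that settleAccount and isolateNode are miscounted. The two counters can be sketched directly:

```java
import java.util.List;

// The setters/getters metrics of Table 4.1 as name-prefix counts. As the
// table itself warns, prefix matching misclassifies names like
// "settleAccount" and "isolateNode"; this sketch reproduces that behavior.
public class NamingMetrics {
    public static long setters(List<String> methodNames) {
        return methodNames.stream().filter(n -> n.startsWith("set")).count();
    }

    public static long getters(List<String> methodNames) {
        return methodNames.stream()
            .filter(n -> n.startsWith("get") || n.startsWith("is") || n.startsWith("has"))
            .count();
    }
}
```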
4.3.3 Class/Interface Relationship Graph

The CIRG captures the UML-compliant relationships explained in Section 4.3.1. The formal
definition of the CIRG is given as follows:

Definition 4.1. A Labeled Directed Graph (LDG) is a tuple Γ(V, E, L_V, L_E, l_V, l_E), where V
is a set of nodes (or vertices), E is a set of edges (or arcs), L_V is a set of node labels, L_E is a set
of edge labels, l_V : V → L_V is a label function that maps nodes to node labels, and l_E : E → L_E
is a label function that maps edges to edge labels.

Definition 4.2. The Class/Interface Relationship Graph (CIRG) of an object-oriented system is
an LDG as defined in Definition 4.1, where V is the set of all classes/interfaces of the system, l_V(v)
returns the full name (i.e., the package name concatenated with the class or interface name) of v for
any v ∈ V, E = {(v, w) ∈ V × V | v references w}, and l_E(e) returns the types of relationships
between the source node and target node of e for any e ∈ E. The type of a relationship is one of
IN, RE, AS, AG, CO, and US, which represent inheritance, realization, association, aggregation,
composition, and usage, respectively.
Each class or interface of a Java system represents a node of the CIRG of the system. We name a node in the CIRG an RClass, and each node is presented and exported as an XML document. The XML schema for each node is depicted in Figure 4.7. The XML schema shows that four types of information about each CIRG node are captured:
• Property: The property field records the name, the type (i.e., class or interface), the package name, and the Java source file name of the corresponding class or interface.

• Characteristics: The characteristics field records the accessibility (i.e., public, protected, or private) and the implementation status (i.e., concrete class or abstract class) of the corresponding class or interface.
• Metrics: The metrics field records the values of the metrics in Table 4.1 for the corresponding class or interface.

• Relationships: The relationships field records all classes or interfaces that have one of the defined relationships with the corresponding class or interface. The type and the direction of the relationship are also stored.
Figure 4.7: The UML Representation of XML Schema for Nodes in the CIRG.
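To make the schema of Figure 4.7 concrete, the following sketch builds a small hypothetical RClass document and reads it back with Python's standard library. The element names follow the four fields described above (property, characteristics, metrics, relationships); the content itself is invented for illustration.

```python
# Sketch: a hypothetical RClass document with the fields from Figure 4.7,
# parsed with the standard library. The values are invented example data.
import xml.etree.ElementTree as ET

doc = """
<RClass>
  <property>
    <name>Booking</name>
    <type>class</type>
    <package>com.uwstar.crs</package>
    <sourceFile>Booking.java</sourceFile>
  </property>
  <characteristics>
    <accessibility>public</accessibility>
    <implementation>concrete</implementation>
  </characteristics>
  <metrics>
    <num_ops>12</num_ops>
    <num_pub_ops>8</num_pub_ops>
  </metrics>
  <relationships>
    <realization><out><interface>com.uwstar.crs.IBooking</interface></out></realization>
  </relationships>
</RClass>
"""

root = ET.fromstring(doc)
print(root.findtext("property/name"))                  # Booking
print(root.findtext("characteristics/accessibility"))  # public
print(root.findtext("relationships/realization/out/interface"))
```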
4.3.4 Class/Interface Dependency Graph
Class dependencies occur when one class uses the services of another class. For example, this
can happen when a class inherits from another, has an attribute whose type is of another class, or
when one of its methods calls a method on an object of another class. Given two classes v and w, let v ⇝ w denote that class v depends upon class w. We formalize the class dependency as follows:

    v ⇝ w  ⟺  v --<IN>--> w ∨ v --<RE>--> w ∨ v --<AS>--> w ∨ v --<AG>--> w ∨ v --<CO>--> w ∨ v --<US>--> w        (4.13)

where v --<X>--> w denotes an edge of type X from v to w in the CIRG.
Now, we are ready to give the formal definition of the CIDG of an object-oriented system:

Definition 4.3. The Class/Interface Dependency Graph (CIDG) of an object-oriented system is an LDG as defined in Definition 4.1, where V is the set of all classes/interfaces of the system, l_V(v) returns the full name (i.e., package name concatenated with class or interface name) of v for any v ∈ V, E = {(v, w) ∈ V × V | v ⇝ w}, L_E = φ, and hence l_E(e) returns an empty label for any e ∈ E.
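The dependency test of Equation (4.13) can be sketched as a small predicate over a labeled edge set. The representation of the CIRG as a dict and the toy edges below are assumptions for illustration, not the thesis's data structures.

```python
# Sketch of the class-dependency test of Equation (4.13): v depends on w
# iff the CIRG records at least one of the six relationship types on the
# edge (v, w). The CIRG is modeled here as a dict mapping edges to label sets.
RELATIONSHIP_TYPES = {"IN", "RE", "AS", "AG", "CO", "US"}

def depends(cirg_edges, v, w):
    """True iff the edge v -> w carries any of the six relationship types."""
    labels = cirg_edges.get((v, w), set())
    return bool(labels & RELATIONSHIP_TYPES)

# Toy CIRG fragment in the spirit of the CRS example (edges invented).
cirg = {("Car", "Vehicle"): {"IN"}, ("Agent", "Booking"): {"AS", "US"}}
print(depends(cirg, "Car", "Vehicle"))   # True
print(depends(cirg, "Vehicle", "Car"))   # False
```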
Again, each class or interface of a Java system represents a node of the CIDG of the system. We name a node in the CIDG a DClass, and each node is presented and exported as an XML document. The XML schema for each node is depicted in Figure 4.8. The XML schema shows that four types of information about each CIDG node are captured:
• Property: The property field records the name, the type (i.e., class or interface), the package name, and the Java source file name of the corresponding class or interface.
• Characteristics: The characteristics field records the accessibility (i.e., public, protected, or private) and the implementation status (i.e., concrete class or abstract class) of the corresponding class or interface.
• Metrics: The metrics field records the values of the metrics in Table 4.1 for the corresponding class or interface.

• Dependency: The dependency field records all classes or interfaces on which the corresponding class or interface depends, and all classes or interfaces that depend on the corresponding class or interface.
Figure 4.8: The UML Representation of XML Schema for Nodes in the CIDG.
4.3.5 An Example: Car Rental System

To clarify the definitions and algorithms proposed in this thesis, we give examples based on a hypothetical software system at appropriate places. The hypothetical system is a Car Rental System (CRS), which consists of agents, customers, and a vehicle repository. The CRS provides two main business services: i) booking cars, and ii) evaluating cars based on the driving records of the customers. Figure 4.9 shows the CIRG of the CRS system, which captures all relationships among the CRS classes as defined in Section 4.3.1.
Figure 4.9: The CIRG of the Car Rental System (CRS).
Figure 4.10 shows the CIDG of the CRS system. Each node represents a class/interface of
the CRS system, and an edge between two classes/interfaces represents a dependency existing
between these two classes/interfaces. By their definitions, the CIRG is a UML-compliant model,
and the CIDG is a further abstraction of the CIRG. That is, the CIRG and CIDG model the
structure of an object-oriented software system at different levels of abstraction.
Figure 4.10: The CIDG of the Car Rental System (CRS).
4.4 Summary
We have discussed the source code modeling process and the architecture modeling process contained in the architecture recovery stage of the SOC4J framework. The source code modeling process builds a complete set of data models for Java source code at different levels of abstraction. Based on these data models, the architecture modeling process establishes a repository of relationships among classes and interfaces that can easily be queried in the next stage of the SOC4J framework.
Chapter 5
Service Identification
An effective way of leveraging the value of legacy systems is to expose their functionalities as services to a larger number of clients. Identifying critical business services embedded in an existing Java system is one of the primary tasks of the proposed SOC4J framework. This is done in the service identification process of the framework, which is based on analysis of the recovered architectural information obtained in the previous chapter. This chapter discusses the service identification strategy and the algorithms used to identify critical business services embedded in an existing object-oriented system.
In Section 5.1, we discuss how a service is described and modeled. We introduce the supporting techniques used in the service identification process in Section 5.2. The service identification process is presented in Section 5.3. Finally, we give a summary of this chapter in Section 5.4.
5.1 Service Representations
A business service within a software system is an abstract resource that represents a capability
of performing tasks that represent a coherent functionality from the points of view of both the
provider and the requester [40]. We categorize services that are embedded in an object-oriented
CHAPTER 5. SERVICE IDENTIFICATION 55
system into two categories:
• Top-Level Services (TLS): A top-level service is a service that is not used by any other service of the system. However, it may contain a hierarchy of low-level services that further describe the service. From the requester's point of view, top-level services are services provided by the system that can be accessed independently; top-level services are hence independent of each other.

• Low-Level Services (LLS): A low-level service is a service that sits underneath a top-level service and may be agglomerated with other low-level services underneath the same top-level service to yield a new service with a higher level of granularity (i.e., the desired business result).
The SOC4J framework is designed to identify both the top-level services and the low-level services embedded in an existing object-oriented system. In order to clearly describe and automate the identification process, we describe an identified service (either a top-level service or a low-level service) as a tuple:

    (name, C_F, SHG)

In the above tuple, name is the name of the service. C_F is the facade class set of the service; the facade class set contains the classes/interfaces that directly provide the functionality of the service to the outside world. SHG is the Service Hierarchy Graph (SHG) of the top-level service represented by the tuple. The SHG is defined as follows:
Definition 5.1. The Service Hierarchy Graph (SHG) associated with a top-level service is a rooted LDG, where the root r ∈ V represents the top-level service, V \ {r} represents the set of low-level services contained in the top-level service, l_V(v) returns the C_F set of v for any v ∈ V, E = {(v, w) ∈ V × V | v contains w}, L_E = φ, and hence l_E(e) returns an empty label for any e ∈ E.
The SHG shows the structural relationships between the services underneath a top-level service. It gives a high-level representation of services that is understandable by both developers and business experts. Furthermore, the SHG describes the modularization of its corresponding top-level service. There is no SHG associated with a low-level service, that is, SHG = φ for a low-level service, because each low-level service is already presented in the SHG of its top-level service. The SHGs of all top-level services of an object-oriented software system form the service view (ServView) of the system.
The identified services (represented as tuples) are exported and stored as XML documents.
The XML schema for services is illustrated in Figure 5.1.
Figure 5.1: The UML Representation of XML Schema for a Service.
5.2 Supporting Concepts
The proposed service identification approach involves a set of techniques such as graph transformations, dominance analysis on directed graphs, and evaluation of the modularization of a system represented by directed graphs. It is helpful to introduce these techniques prior to explaining the service identification process.
5.2.1 Graph Techniques
Graphs can be used to describe complex object structures in a mathematical way. In the context of software engineering, we can use graphs to formalize object-oriented languages and concepts, especially the UML. In this thesis, we apply graph techniques to assist in service identification. The important graph concepts and techniques involved in this thesis are reviewed as follows:
Definition 5.2. Let G = (V, E) be a directed graph (DG), where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. Given a node v ∈ V, the in-degree of v is the number of edges directed into v, and the out-degree of v is the number of edges directed out of v. A root of G is a node whose in-degree is zero. G is said to be a rooted directed graph iff there is only one root in V.
Definition 5.3. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. Given two nodes v ∈ V and w ∈ V, a path from vertex v to vertex w is a sequence of consecutive edges leading from v to w. A cycle is a path from a node back to the same node. Node w is said to be reachable from node v if there is a path from v to w. G is a directed acyclic graph (DAG) iff there is no cycle in G.
Definition 5.4. A rooted tree is a DG G = (V, E), where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G, such that

1. there is a unique node in V (called the root) which has in-degree 0;

2. every node in V except the root has in-degree 1; and

3. there is a path from the root to every other node in G.
Definition 5.5. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. G is connected if the underlying undirected graph of G is connected, while G is strongly connected if there is a path in G between every pair of nodes in V.
Definition 5.6. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. A connected component of G is a maximal (though not necessarily maximum) connected subgraph of G. A strongly connected component of G is a maximal (though not necessarily maximum) strongly connected subgraph of G. A rooted component is a subgraph of G that consists of a unique root and the collection of all nodes w such that there is a path from the root to w.
Definition 5.7. Let G = (V, E) be a DG, where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. A clique in G is a collection of nodes in V such that each pair of nodes in the collection is joined by an edge. A k-clique is a clique containing exactly k nodes.
Figure 5.2: An Example of a Directed Graph.
For example, given the directed graph G in Figure 5.2, there are two connected components: graphs (a) and (b) in Figure 5.3. The only strongly connected component of G is graph (c) in Figure 5.3. Note that neither the subgraph {2, 5, 7} nor {5, 6, 7} is a strongly connected component of G, because they are not maximal. Graphs (d) and (e) in Figure 5.3 are two rooted components of graph (a) in Figure 5.3. The set {2, 3, 7} is a 3-clique in graph G.
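The first notion in Definition 5.6 (connected components of the underlying undirected graph) can be sketched with a small union-find. The graph below is invented toy data, not the graph of Figure 5.2.

```python
# Sketch: connected components per Definitions 5.5-5.6, computed on the
# underlying undirected graph with union-find. The edge list is invented.
from collections import defaultdict

def connected_components(nodes, edges):
    """Partition nodes into components of the underlying undirected graph."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for v, w in edges:                     # direction is ignored
        parent[find(v)] = find(w)
    groups = defaultdict(set)
    for v in nodes:
        groups[find(v)].add(v)
    return sorted(groups.values(), key=min)

nodes = {1, 2, 3, 4, 5}
edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
print(connected_components(nodes, edges))  # [{1, 2, 3}, {4, 5}]
```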
Figure 5.3: (a) A connected component of the directed graph G in Figure 5.2. (b) The other connected component of G. (c) The only strongly connected component of G. (d) A rooted component of graph (a). (e) The other rooted component of graph (a).
5.2.2 Dominance Analysis
Dominance analysis is a fundamental concept in compiler optimizations and has been used extensively to identify loops in basic block graphs [61]. It allows one to locate subordinated software elements in a rooted dependency graph. Dominance analysis on call graphs of procedural language applications has been used in reverse engineering to identify modules and subsystems and to recover system architectures [17, 26, 36]. In this thesis, we explore the use of dominance analysis on SHGs, which assists us in identifying low-level services underneath a top-level service.

Dominance is a relation between nodes in a rooted directed graph. This relation can be formally defined as follows:
Definition 5.8. Let G = (V, E, r) be a rooted directed graph, where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G. Given any two different nodes v ∈ V and w ∈ V, node v dominates node w, written v dom w, iff every path from root r to w contains v. Node v directly dominates node w, written v ddom w, iff v dom w and every other node that dominates w also dominates v. Node v strongly directly dominates node w, written v sddom w, iff v ddom w and v is a predecessor of w.
Definition 5.9. Let G = (V, E, r) be a rooted directed graph, where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G. The dominance tree corresponding to G is a tree T = (V, E_d, r), where E_d = {(v, w) ∈ V × V | v ddom w ∨ v sddom w}. A ddom subtree of T is a subtree whose root has an incoming ddom edge. An sddom subtree of T is a subtree whose root has an incoming sddom edge. A consolidation subtree of the dominance tree is a subtree that contains only sddom edges. A maximal consolidation subtree is a maximal subtree that contains only sddom edges.
Figure 5.4: (a) A Simple Directed Graph. (b) The Dominance Tree Corresponding to the Graph in (a). (c) The Two Maximal Consolidation Subtrees of the Dominance Tree in (b).
Figure 5.4 shows a simple rooted directed graph, the corresponding dominance tree, and the maximal consolidation subtrees in the dominance tree. Note that the subtree {6, 9} is a ddom subtree and {2, 4, 5, 8} is an sddom subtree. The subtree {7, 10} is a consolidation subtree but not a maximal consolidation subtree, because it is not a maximal subtree that contains only sddom edges. In Figure 5.4, the dominance tree is constructed from an acyclic graph. However, this is not a necessary condition: we can construct a dominance tree from any directed graph as long as it is rooted.
By Definitions 5.8 and 5.9, we can observe the following properties of dominance trees:
Property 5.1. Given a rooted directed graph G = (V, E, r), where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G, let T be the dominance tree corresponding to G. For each node (except the root) in a subtree (either a ddom subtree or an sddom subtree) of T, there is no incoming edge in E from any node outside the subtree.
Property 5.2. Given a rooted directed graph G = (V, E, r), where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G, let T be the dominance tree corresponding to G. For each node (except the root) in a consolidation subtree of T, there is no incoming edge in E from any other node (either inside or outside the subtree) except its parent in T.
In the analysis process of reverse engineering, it is essential to have an effective way of abstracting information. The dominance tree provides such an abstraction. More importantly, it represents a high-level modularization of the software system through its branches: each branch of the dominance tree represents a concept or high-level functionality of the system. In the context of object-oriented design, one benefit of using dominance trees in program comprehension is the reduction of the visualization complexity of the class dependency graph by decreasing the large number of edges. In the class dependency graph of a real-world software system, a class may be referenced by hundreds of classes, and a reduction to a single edge in the dominance tree greatly clarifies the graphic.
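The dominance relation of Definition 5.8 can be sketched with the classic iterative data-flow formulation: dom(r) = {r}, and for v ≠ r, dom(v) = {v} ∪ (the intersection of dom(p) over all predecessors p of v). This is a minimal illustration on an invented diamond-shaped graph, not the graph of Figure 5.4 or the thesis's implementation.

```python
# Sketch: dominator sets of a rooted directed graph per Definition 5.8,
# computed by iterating the set equations to a fixed point.

def dominators(nodes, edges, root):
    preds = {v: set() for v in nodes}
    for v, w in edges:
        preds[w].add(v)
    dom = {v: set(nodes) for v in nodes}   # start from "everything dominates v"
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v == root:
                continue
            if preds[v]:
                new = {v} | set.intersection(*(dom[p] for p in preds[v]))
            else:
                new = {v}                  # unreachable from root except via itself
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom

# A diamond 1 -> {2, 3} -> 4: node 4 is dominated only by itself and the root,
# so in the dominance tree it hangs directly under 1, not under 2 or 3.
dom = dominators({1, 2, 3, 4}, [(1, 2), (1, 3), (2, 4), (3, 4)], root=1)
print(dom[4])  # {1, 4}
```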
5.2.3 Modularization Quality Metric
The modularization quality (MQ) metric was first introduced in [54]. It has been used in a number of software engineering projects to evaluate the quality of software modularization achieved by graph partitioning [24, 76]. Basically, the MQ metric measures the difference between the average intra-connectivity and the average inter-connectivity of a system, and shows how well the system is structured. In this thesis, we use the MQ metric to evaluate how well a top-level service is modularized by its low-level services.
Let C = (G_1, G_2, ..., G_k) be a partition of a given graph G(V, E), where V represents all nodes in G and E represents all edges in G. The MQ metric of the system, which is represented by the graph G, is defined as follows:

    MQ(C, G) = [ Σ_{i=1}^{k} s(G_i, G_i) ] / k  −  [ Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} s(G_i, G_j) ] / [ k(k−1)/2 ]        (5.1)

The function s() used in Formula (5.1) is defined as the ratio of the actual number of edges between two subsets of V of graph G to the maximum number of possible edges between those two sets. Let U and W be two subsets of V (i.e., U ⊆ V and W ⊆ V); then we have

    s(U, W) = e(U, W) / (|U| |W|)        (5.2)

where e(U, W) denotes the number of edges connecting a vertex in U to a vertex in W.
The MQ metric determines the quality of the modularization quantitatively as the trade-off between inter-connectivity and intra-connectivity of subsystems. This trade-off is based on the assumption that well-designed software systems are organized into cohesive subsystems that are loosely interconnected. Hence, the MQ metric is designed to reward the creation of highly cohesive clusters, and to penalize excessive coupling between clusters. The value of the MQ metric is
between −1 (no internal cohesion) and 1 (no external coupling). A straightforward consequence is that a higher MQ value can be interpreted as better modularization, since it corresponds to a partition with either fewer edges connecting vertices from distinct blocks, or more edges lying within the same blocks of the partition, which is what most clustering or modularization algorithms aim to achieve [24].
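The MQ computation of Formulas (5.1) and (5.2) can be sketched directly. The partition and edge list below are invented toy data; s() follows e(U, W) / (|U||W|) literally.

```python
# Sketch: MQ per Formula (5.1), with s(U, W) = e(U, W) / (|U||W|) from
# Formula (5.2). Toy partition and edges, invented for illustration.

def s(edges, U, W):
    """Ratio of actual edges between U and W to the maximum possible."""
    e = sum(1 for (v, w) in edges if (v in U and w in W) or (v in W and w in U))
    return e / (len(U) * len(W))

def MQ(partition, edges):
    """Average intra-connectivity minus average inter-connectivity."""
    k = len(partition)
    intra = sum(s(edges, G, G) for G in partition) / k
    pairs = [(i, j) for i in range(k - 1) for j in range(i + 1, k)]
    if not pairs:                  # a single-block partition has no coupling term
        return intra
    inter = sum(s(edges, partition[i], partition[j]) for i, j in pairs) / len(pairs)
    return intra - inter

clusters = [{1, 2}, {3, 4}]
edges = [(1, 2), (2, 1), (3, 4), (1, 3)]
print(MQ(clusters, edges))  # 0.125: intra (0.5 + 0.25)/2 minus inter 0.25
```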
5.3 The Proposed Processes
In the SOC4J framework, we aim to identify critical business services embedded in an existing Java system. Our service identification process, as shown in Figure 5.5, is supported by a combination of top-down and bottom-up techniques.
Figure 5.5: Processes in Service Identification Stage.
In the top-down portion of the process, we identify the top-level services and the atomic services (to be discussed later) underneath each top-level service. In the bottom-up portion, we aggregate the atomic services to identify services with a higher level of granularity (reusable services). We delve into these two portions in the subsequent two sections.
5.3.1 Top-Level Service Identification

The top-level service identification process is the top-down portion of the proposed service identification process. According to the definition of a top-level service (introduced in Section 5.1), the top-level services of a software system partition the system into independent parts. Each of these independent parts represents a service to the outside world from the user's point of view. We identify the services of a system by starting with its top-level services, and then extracting a service hierarchy for each top-level service to identify the low-level services underneath it.
Algorithm 5.1: CIDG-Transformation

Input: CIDG, the CIDG of the system.
Output: MCIDGs, a set of MCIDGs.

   // decompose the CIDG into connected components
1  MCIDGs ← φ
2  CGraphs ← ConnectedComponents(CIDG)
   // decompose each connected component into a set of rooted components
3  foreach graph g ∈ CGraphs do
4      RGraphs ← RootedComponents(g)
5      MCIDGs ← MCIDGs ∪ RGraphs
6  end
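The rooted-component step of Algorithm 5.1 can be sketched as follows: each root (a node with in-degree zero) yields the subgraph of everything reachable from it. This is a simplified illustration that works on the whole graph at once rather than per connected component, and the toy graph is invented.

```python
# Sketch of the decomposition in Algorithm 5.1: split a dependency graph
# into rooted components, one per in-degree-zero node.

def rooted_components(nodes, edges):
    succs = {v: set() for v in nodes}
    indeg = {v: 0 for v in nodes}
    for v, w in edges:
        succs[v].add(w)
        indeg[w] += 1
    roots = [v for v in nodes if indeg[v] == 0]
    components = []
    for r in roots:
        reached, stack = {r}, [r]     # depth-first reachability from the root
        while stack:
            for w in succs[stack.pop()]:
                if w not in reached:
                    reached.add(w)
                    stack.append(w)
        components.append((r, reached))
    return components

# Two entry points sharing a utility class: node 4 appears in both MCIDGs.
comps = rooted_components({1, 2, 3, 4}, [(1, 3), (3, 4), (2, 4)])
for root, reached in sorted(comps):
    print(root, sorted(reached))   # root 1 reaches {1, 3, 4}; root 2 reaches {2, 4}
```

Note that, as in the thesis, the resulting components may overlap: a class reachable from two entry points belongs to both MCIDGs.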
To identify the top-level services of an existing object-oriented system, the first step is to identify the entry points of the system. In Chapter 4, we modeled the existing system as directed graphs: the class/interface relationship graph (CIRG) and the class/interface dependency graph (CIDG). At this stage, we decompose the CIDG into a set of connected components, each with a unique root, such that each component is an independent subgraph of the CIDG. Algorithm 5.1 describes the decomposition process.
In Algorithm 5.1, function ConnectedComponents() computes and returns all connected components of the given directed graph, while function RootedComponents() decomposes a connected directed graph into a set of rooted components. We name each of the rooted components a modularized CIDG (MCIDG). Essentially, Algorithm 5.1 applies a set of graph transformation rules to transform the CIDG into a set of rooted components (i.e., MCIDGs). Note that the output MCIDGs are subgraphs of the CIDG, and each node in an MCIDG represents a single class or interface of the system. No other class or interface in the system depends upon the unique root of an MCIDG. Consequently, the unique root of each MCIDG might represent an entry point of the system, and each MCIDG might therefore embed a top-level service represented by its root.
As we have mentioned, each node of an MCIDG contains only one class or interface. At this stage, we consider the root of each MCIDG a top-level service candidate and the other nodes the low-level service candidates underneath that top-level service candidate. The second step of the top-level service identification is to generate the top-level service candidates from the MCIDGs. This is achieved by performing three tasks for each top-level service candidate represented by an MCIDG: i) computing the facade class set, ii) building the SHG of the top-level service candidate, and iii) describing the candidate as the tuple defined in Section 5.1.

The final step of the top-level service identification is to validate the top-level service candidates and assign a meaningful name to each accepted top-level service. This is a user-involved procedure: the user retrieves the functionality provided by the candidate by examining the classes/interfaces in its facade class set and, based on that functionality, makes a decision on the candidate.
Algorithm 5.2: Top-Level Service Identification

Input: CIDG, the CIDG of the system.
Output: TLSs, a set of identified top-level services that are represented by (name, C_F, SHG) tuples.

    // decompose the CIDG into a set of rooted components;
    // each rooted component is an MCIDG
1   MCIDGs ← run the CIDG-Transformation algorithm on CIDG
    // generate top-level service candidates and
    // represent them as (name, C_F, SHG) tuples
2   Candidates ← φ
3   foreach MCIDG(V_m, E_m) ∈ MCIDGs do
4       create a new graph G(V, E)
5       V ← φ
6       E ← E_m
7       for i ← 1 to |V_m| do
8           V(i) ← Facade(V_m(i), MCIDG, CIDG)   // V_m(i) denotes the ith node in V_m
9       end
10      create a new tuple T(name, C_F, SHG)
11      T.name ← null
12      T.C_F ← Root(G)
13      T.SHG ← G
14      add tuple T(name, C_F, SHG) to Candidates
15  end
    // validate the top-level service candidates and
    // assign a meaningful name to each accepted service
16  TLSs ← φ
17  foreach tuple T ∈ Candidates do
18      the user validates the candidate by examining T.C_F
19      if T is acceptable then
20          T.name ← a meaningful name for the service
21          add T(name, C_F, SHG) to TLSs
22      end
23  end
Algorithm 5.2 describes the details of these three steps in the top-level service identification process. In Algorithm 5.2, each iteration of the loop on line 3 transforms an MCIDG into a top-level service candidate. Function Facade() computes and returns the facade class sets for a given top-level service candidate and its low-level service candidates. As we have described, the facade class set contains the classes/interfaces that describe the functionality of the service to the outside world. Therefore, function Facade() returns the set of classes/interfaces that have incoming edges from classes/interfaces in the CIDG but not in the MCIDG. Function Root() returns the root of a given directed graph.
The user validates a candidate by examining its facade class set, since the classes in the set represent the functionality of the service. At this stage, the SHG corresponding to each top-level service is built from the MCIDG and can therefore be viewed as a subgraph of the CIDG. In other words, the SHG is an abstraction of an MCIDG that hides the information unnecessary for understanding the service hierarchy. The functionality of each low-level service in the hierarchy is provided by a single class; hence these services are called atomic services. In most cases, these atomic services are too fine-grained and have little reusability. However, the SHG at this stage provides a good starting point for identifying services with a higher level of granularity by using the service aggregation techniques presented in the subsequent section.
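The Facade() criterion described above (classes with incoming edges from the CIDG but not from within the MCIDG) can be sketched as a simple set comprehension. The class names and edges below are invented toy data in the spirit of the CRS example, not the thesis's implementation.

```python
# Sketch of the Facade() criterion: a node of an MCIDG belongs to the facade
# class set if some class outside the MCIDG (but in the CIDG) has an edge
# into it. Names and edges are invented for illustration.

def facade_set(mcidg_nodes, cidg_edges):
    """Classes in the MCIDG that are referenced from outside the MCIDG."""
    return {w for (v, w) in cidg_edges
            if w in mcidg_nodes and v not in mcidg_nodes}

cidg_edges = [("Main", "Booking"), ("Booking", "Vehicle"),
              ("Report", "Vehicle")]          # "Report" lies outside the MCIDG
mcidg = {"Main", "Booking", "Vehicle"}
print(sorted(facade_set(mcidg, cidg_edges)))  # ['Vehicle']
```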
After performing the top-level service identification, the critical top-level services of an existing
system have been identified. Moreover, for each top-level service, we have extracted a service
hierarchy graph (SHG) to model its low-level services. However, at this time, the low-level services
in the SHG are atomic services with little or no reusability. We need to build a new SHG for each
top-level service that contains low-level services with a higher level of granularity. Consequently,
the low-level services in the new SHG are critical business services with better reusability. This
is achieved in the low-level service identification process.
5.3.2 Low-Level Service Identification
The low-level service identification process is the bottom-up portion of the entire service identification
process. The SHGs built in the top-level service identification process are rooted directed
graphs that represent the structural dependency between a top-level service and its low-level services
(atomic services). As we have mentioned, these atomic services are too fine-grained and
therefore have limited reusability. At this stage, we aim to aggregate highly related atomic services
to build a new SHG for each top-level service such that the services contained in the new
SHG have a higher level of granularity and thus present a higher potential for reuse. The service
aggregation is an iterative process, and the desired new SHG is achieved incrementally. The
low-level services obtained from each iteration have a higher level of granularity than those of the
previous iteration and hence modularize the top-level service in a different way. The resulting services of
each iteration are presented to users as an intermediate SHG. An evaluation procedure can be performed
at each iteration to determine whether specific goals have been reached. Users can then
decide to repeat or terminate the process according to the pre-defined termination
criteria.
Algorithm 5.3 describes the low-level service identification process for a given top-level service.
Essentially, it repeatedly runs the service aggregation algorithm (i.e., Algorithm 5.4) on the
low-level services underneath a top-level service until the Termination Criteria are satisfied. Once
the iteration terminates, the final SHG is built for the top-level service. Then, the algorithm
represents the low-level services contained in the newly built SHG as tuples defined in Section 5.1.
Function ComputeMQ() computes the MQ metric of a given top-level service. The MQ metric quantitatively
measures the quality of the modularization of a top-level service as the trade-off between
the inter-connectivity and intra-connectivity of its low-level services.
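As an illustration only, a TurboMQ-style trade-off between intra- and inter-connectivity can be computed as sketched below. The precise MQ definition used by the framework is given elsewhere in the thesis; the data representation and names here are assumptions of this sketch:

```java
import java.util.*;

// Simplified, TurboMQ-style sketch of an MQ-like computation (illustrative
// only; not necessarily the thesis's exact MQ metric). Each cluster i
// contributes a cluster factor CF_i = intra_i / (intra_i + 0.5 * inter_i),
// and MQ is the sum of the cluster factors.
public class ModularizationQuality {
    // cluster: maps each node to its cluster id; edges: dependency edges
    static double mq(Map<String, Integer> cluster, List<String[]> edges) {
        Map<Integer, int[]> counts = new HashMap<>(); // per cluster: {intra, inter}
        for (String[] e : edges) {
            int cu = cluster.get(e[0]), cv = cluster.get(e[1]);
            counts.computeIfAbsent(cu, k -> new int[2]);
            counts.computeIfAbsent(cv, k -> new int[2]);
            if (cu == cv) {
                counts.get(cu)[0]++;      // intra-cluster edge
            } else {
                counts.get(cu)[1]++;      // inter-cluster edge leaving cu
                counts.get(cv)[1]++;      // inter-cluster edge entering cv
            }
        }
        double mq = 0.0;
        for (int[] c : counts.values()) {
            if (c[0] + c[1] > 0) mq += c[0] / (c[0] + 0.5 * c[1]);
        }
        return mq;
    }
}
```

A single cluster containing all its dependencies scores 1.0, while edges crossing cluster boundaries lower the contribution of both clusters involved.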
Based on the modularization of the top-level service and the level of granularity of the low-level
services underneath the top-level service, we define two Termination Criteria to stop the
Algorithm 5.3: Low-Level Service Identification
Input: CIRG: The CIRG of the system; CIDG: The CIDG of the system; T(name, CF, SHG): The top-level service.
Output: LLSs: Identified low-level services represented as (name, CF, SHG) tuples; T(name, CF, SHG): The input top-level service with its newly built SHG.
// compute the MQ metric of the input top-level service
1: ComputeMQ(T.SHG, CIDG);
// aggregate low-level services iteratively
2: repeat
3:     SHGnew ← Run Service Aggregation Alg. on T.SHG;
4:     T.SHG ← SHGnew;
5:     ComputeMQ(T.SHG, CIDG);
6: until Termination Criteria are satisfied;
// represent identified low-level services as tuples
7: LLSs ← φ;
8: foreach non-root node v ∈ T.SHG do
9:     Create a new tuple L(name, CF, SHG);
10:     L.name ← Meaningful name for the service;
11:     L.CF ← lV(v);
12:     L.SHG ← φ;
13:     Add L(name, CF, SHG) to LLSs;
14: end
service aggregation iteration in Algorithm 5.3 :
Termination Criterion 5.1. The top-level service has been nicely modularized by its low-level
services.
Termination Criterion 5.2. Low-level services present an appropriate level of granularity.
In terms of the structure of a top-level service, the low-level services underneath the top-level
service modularize it. By the definition of the MQ metric, the higher the value
of the MQ metric of a top-level service, the better structured the service is. This is based on
the hypothesis that a well-modularized service becomes highly malleable; that is, the service can
evolve in less time and at less cost. On the other hand, the level of granularity of services must
be matched to the level of reusability and flexibility required for a given context. The basis of the
second criterion is the hypothesis that a component that realizes a service with a higher level of
granularity has better reusability.
Algorithm 5.4: Service Aggregation
Input: CIRG: The CIRG of the system; CIDG: The CIDG of the system; SHG: The SHG that contains the low-level services to be aggregated; Heuristic1: Reducing Heuristic 5.1; Heuristic2: Reducing Heuristic 5.2.
Output: SHGnew: A new SHG that contains low-level services with a higher level of granularity.
// SHG transformation
1: SHGnew ← CollapseCliques(SHG, CIRG, CIDG);
2: SHGnew ← CollapseStronglyConnectedComponents(SHGnew);
// dominance tree generation
3: DTree ← GenerateDominanceTree(SHGnew);
// dominance tree reduction
4: ReduceDominanceTree(DTree, Heuristic1);
5: ReduceDominanceTree(DTree, Heuristic2);
// SHG reconstruction
6: SHGnew ← ReconstructSHG(DTree, CIDG);
Algorithm 5.4 aggregates highly related low-level services into a single service with a higher
level of granularity and reconstructs a new SHG containing these newly identified services. The
output SHG contains fewer low-level services, each with a higher level of granularity, than the input
SHG. In other words, it modularizes the corresponding top-level service in a better way.
The service aggregation is based on the dominance analysis on SHGs. As we have explained,
SHGs are rooted directed graphs, hence we can generate dominance trees from SHGs. However,
in order to improve the shape of the generated dominance tree (increase the height of the tree), we
perform a graph transformation on SHGs. The purpose of the graph transformation is to agglom-
erate strongly related services and remove cycles in SHGs. Program units linked by recursion
contribute to the implementation of a single functionality and can, therefore, be regarded as a sin-
gle module. We remove cycles in SHGs by aggregating the services within a cycle into a single
service. Where many services are involved within a cycle, poorer results of the dominance tree
analysis are generally obtained [17, 36]. Our empirical studies in Chapter 7 show that collapsing
strongly related services and removing cycles in SHGs are essential to dominance analysis on
SHGs.
In Algorithm 5.4, function CollapseCliques() collapses the services in a 3-clique in the input
SHG if the similarity of the services in the clique exceeds a user-defined threshold. We have
developed a methodology for computing the similarity between two services, based on the coupling
analysis of the classes that implement these services [52].
Function CollapseStronglyConnectedComponents() iteratively detects the strongly connected
components (described in Section 5.2.1) in a directed graph, collapses all nodes in each
component into one node, and updates the edges accordingly until no strongly connected
component is left. Consequently, the output graph of this function is a directed acyclic graph
(DAG); the output SHG of the SHG transformation contains no cycle.
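The SCC collapse can be sketched with Tarjan's algorithm. The graph representation and names below are assumptions of this sketch; the framework's implementation additionally merges node labels and facade sets when it collapses nodes:

```java
import java.util.*;

// Sketch of CollapseStronglyConnectedComponents(): Tarjan's algorithm
// assigns every node an SCC id; collapsing the SCCs then yields a DAG.
// The adjacency-map representation and names here are assumptions.
public class SccCollapse {
    Map<String, List<String>> g;
    Map<String, Integer> index = new HashMap<>(), low = new HashMap<>(), comp = new HashMap<>();
    Deque<String> stack = new ArrayDeque<>();
    Set<String> onStack = new HashSet<>();
    int counter = 0, comps = 0;

    SccCollapse(Map<String, List<String>> g) {
        this.g = g;
        for (String v : g.keySet())
            if (!index.containsKey(v)) dfs(v);
    }

    void dfs(String v) {
        index.put(v, counter); low.put(v, counter); counter++;
        stack.push(v); onStack.add(v);
        for (String w : g.getOrDefault(v, Collections.emptyList())) {
            if (!index.containsKey(w)) {
                dfs(w);
                low.put(v, Math.min(low.get(v), low.get(w)));
            } else if (onStack.contains(w)) {
                low.put(v, Math.min(low.get(v), index.get(w)));
            }
        }
        if (low.get(v).equals(index.get(v))) {   // v is the root of an SCC
            String w;
            do {
                w = stack.pop(); onStack.remove(w); comp.put(w, comps);
            } while (!w.equals(v));
            comps++;
        }
    }

    // Edges of the collapsed graph: one node per SCC, self-loops dropped,
    // so the result is acyclic.
    Set<String> collapsedEdges() {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : g.entrySet())
            for (String w : e.getValue()) {
                int a = comp.get(e.getKey()), b = comp.get(w);
                if (a != b) out.add(a + "->" + b);
            }
        return out;
    }
}
```

For a cycle A → B → C → A with an extra edge C → D, the three cycle members land in one SCC and the collapsed graph keeps only a single edge from that SCC to D's SCC.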
Once the SHG transformation is done, function GenerateDominanceTree() generates the
service dominance tree from the new SHG. Function ReduceDominanceTree() reduces a dominance
tree by applying a given reducing heuristic. We define two reducing heuristics as follows:
Heuristic 5.1. Remove each maximal consolidation subtree by only keeping the root node of the
subtree.
Agglomerating all services that are parts of a maximal consolidation subtree into a service makes
sense because these services constitute an independent unit that can only be accessed by the rest
of services of the system through the root of the subtree. In order to simplify the visualization,
we only need to present the root because the rest of the subtree is only visible to the root and can
be hidden in the root.
Heuristic 5.2. Remove all leaf nodes in a subtree that contain both ddom and sddom edges,
which are linked to the root of the subtree by sddom edges.
These leaf nodes represent low-level services that are only accessible to the service represented
by the root of the subtree. Therefore, these low-level services can be considered subservices of
the root.
Function ReconstructSHG() recovers the service hierarchy for the services presented in a
service dominance tree. It needs the CIDG to provide extra information since the service dominance
tree is an abstraction of a service hierarchy graph in which some information is lost.
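The dominance computation underlying GenerateDominanceTree() can be sketched with the classic iterative data-flow formulation over a rooted directed graph: dom(root) = {root}, and dom(v) = {v} ∪ ∩ over the predecessors p of v of dom(p). This is the generic textbook algorithm, not necessarily the exact one used by the framework, and the representation is an assumption of this sketch:

```java
import java.util.*;

// Iterative dominator-set computation for a rooted directed graph
// (textbook formulation; illustrative representation and names).
// The immediate dominator of v is the closest strict dominator, from
// which a dominance tree can be built.
public class Dominators {
    static Map<String, Set<String>> domSets(Map<String, List<String>> succ, String root) {
        Set<String> nodes = new TreeSet<>(succ.keySet());
        for (List<String> ws : succ.values()) nodes.addAll(ws);
        // build the predecessor lists
        Map<String, List<String>> preds = new HashMap<>();
        for (Map.Entry<String, List<String>> e : succ.entrySet())
            for (String w : e.getValue())
                preds.computeIfAbsent(w, k -> new ArrayList<>()).add(e.getKey());
        // initialize: root dominates only itself; others start as "all nodes"
        Map<String, Set<String>> dom = new HashMap<>();
        for (String v : nodes) dom.put(v, new TreeSet<>(nodes));
        dom.put(root, new TreeSet<>(Collections.singleton(root)));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String v : nodes) {
                if (v.equals(root)) continue;
                Set<String> d = new TreeSet<>(nodes);
                for (String p : preds.getOrDefault(v, Collections.emptyList()))
                    d.retainAll(dom.get(p));   // intersect predecessor dominators
                d.add(v);
                if (!d.equals(dom.get(v))) { dom.put(v, d); changed = true; }
            }
        }
        return dom;
    }
}
```

In a diamond R → A, R → B, A → C, B → C, neither A nor B dominates C, so dom(C) = {R, C} and C hangs directly under R in the dominance tree.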
After performing the low-level service identification for each top-level service identified
from an existing object-oriented system, the critical low-level services underneath each top-level
service have been identified. Finally, the SHGs of all top-level services yield the ServView of the
system.
5.3.3 An Example : Car Rental System
To further explain the proposed service identification processes, in this section, we identify the
business services embedded in the CRS example by applying the algorithms introduced in the
service identification processes.
First of all, we identify the top-level services of the CRS system by running Algorithm 5.2 on
the CIDG of the CRS system, which is depicted in Figure 4.10. Algorithm 5.1 decomposes the
CIDG into rooted components (i.e., MCIDGs). Figure 5.6 depicts the resulting MCIDGs. Three
MCIDGs are generated from the CRS system: graphs (a), (b), and (c) in Figure 5.6.
Figure 5.6: The MCIDGs of the Car Rental System.
Based on the MCIDGs extracted by Algorithm 5.1, Algorithm 5.2 generates the following
top-level service candidates (TLSCs):

• TLSC1 : (null, {com.uwstar.crs.Booking}, SHG1).

• TLSC2 : (null, {com.uwstar.crs.VehicleEvaluation}, SHG2).

• TLSC3 : (null, {com.uwstar.crs.person.Dealer}, SHG3).
Figure 5.7: The SHG of the Top-Level Service VehicleBooking.
SHG1, SHG2, and SHG3 are graphs (a), (b), and (c) in Figure 5.6, respectively. By examining
the functionality of each top-level service candidate, we find that the candidate
(null, {com.uwstar.crs.person.Dealer}, SHG3)
is not a critical business service; the class com.uwstar.crs.person.Dealer is a dead class.
Hence, after the service validation, we accept two top-level services (TLSs) of the CRS system:
• TLS1 : (VehicleBooking, {com.uwstar.crs.Booking}, SHG1).

• TLS2 : (VehicleEvaluation, {com.uwstar.crs.VehicleEvaluation}, SHG2).
After running Algorithm 5.2, the critical top-level services of the CRS system are identified.
Moreover, for each top-level service, we extract a service hierarchy graph (SHG) to model its low-level
services. Figure 5.7 illustrates the SHG of the identified top-level service VehicleBooking.
At this stage, a low-level service in the SHG is a single class (atomic service) with little or no
reusability. We need to build a new SHG for each top-level service that contains low-level services
(groups of classes) with a higher level of granularity.
Figure 5.8: The Result SHG of Performing the SHG Transformation on the Original SHG of the Top-Level Service VehicleBooking in the CRS System.
Now, we are ready to identify the low-level services underneath the top-level services by running
Algorithm 5.3 on each top-level service. To save space, we only identify the low-level services
underneath the top-level service VehicleBooking.
Figure 5.9: The Service Dominance Tree of the SHG in Figure 5.8.
Essentially, Algorithm 5.3 computes the MQ metric of VehicleBooking and runs Algorithm 5.4
repeatedly. In this example, in order to let the identified low-level services have an appropriate level of
granularity, we use Termination Criterion 5.2 to terminate the service aggregation iteration.
In the first iteration of Algorithm 5.4, Figure 5.8 shows the resulting SHG of performing the SHG
transformation on the original SHG (shown in Figure 5.7) of the top-level service VehicleBooking.
The resulting SHG is obtained by aggregating the strongly related atomic services in the original
SHG. For instance, the two services represented by the nodes
com.uwstar.crs.vehicle.SUV and com.uwstar.crs.vehicle.Vehicle
have an inheritance relationship and thus are agglomerated into one service represented by the
node
{com.uwstar.crs.vehicle.SUV, com.uwstar.crs.vehicle.Vehicle}
in the SHG depicted in Figure 5.8. The facade class set of the agglomerated service contains
com.uwstar.crs.vehicle.SUV and com.uwstar.crs.vehicle.Vehicle because these two classes both
provide services to the outside of the new service. Also, there are three nodes in Figure 5.7 which
form a cycle:
com.uwstar.crs.person.Agent,
com.uwstar.crs.training.TrainingCourse, and
com.uwstar.crs.training.TrainingPlan.

Hence, the low-level services represented by these nodes are agglomerated into a service represented
by the node com.uwstar.crs.person.Agent in Figure 5.8. The facade class set contains only the
class com.uwstar.crs.person.Agent because the other two classes,
com.uwstar.crs.training.TrainingCourse and
com.uwstar.crs.training.TrainingPlan,
do not provide services to the outside of the new service.
Once the SHG transformation is complete, function GenerateDominanceTree() generates
the service dominance tree from the new SHG. Figure 5.9 shows the service dominance tree
of the SHG depicted in Figure 5.8. Function ReduceDominanceTree() reduces the service
dominance tree in Figure 5.9 by applying Heuristic 5.1 and Heuristic 5.2. Figure 5.10 shows
the reduced dominance tree.
Figure 5.10: The Reduced Dominance Tree of the Service Dominance Tree in Figure 5.9.
Function ReconstructSHG() recovers the service hierarchy for the services presented in
the service dominance tree in Figure 5.10. Figure 5.11 shows the SHG reconstructed from the
reduced service dominance tree in Figure 5.10.
Figure 5.11: The SHG Reconstructed from the Reduced Service Dominance Tree in Figure 5.10.
After the first iteration, by examining the MQ metric of the top-level service VehicleBooking
and the granularity of the low-level services underneath it, we know whether or not the
termination criteria are satisfied, and we repeat the service aggregation process if they are not.
If they are satisfied, we terminate the process and identify the following
low-level services for the top-level service VehicleBooking:
• (Car, {com.uwstar.crs.vehicle.Car, com.uwstar.crs.vehicle.Vehicle}, φ)

• (Truck, {com.uwstar.crs.vehicle.Truck, com.uwstar.crs.vehicle.Vehicle}, φ)

• (SUV, {com.uwstar.crs.vehicle.SUV, com.uwstar.crs.vehicle.Vehicle}, φ)

• (VehicleRepository, {com.uwstar.crs.VehicleRepository}, φ)

• (Agent, {com.uwstar.crs.person.Agent}, φ)

• (Customer, {com.uwstar.crs.person.Customer}, φ)
5.4 Summary
In this chapter, we have discussed the two processes contained in the service identification stage
of the SOC4J framework, namely top-level service identification and low-level service identification.
The techniques used in this stage have also been introduced. The critical business services
embedded in an existing system have been identified and modeled. In the subsequent chapter, we
introduce the approach to packaging the identified services into self-contained components and
the methodology for transforming the existing system into a component-based system.
Chapter 6
Component Generation and System
Transformation
In the previous chapter, we presented the methodology for identifying services embedded in an
existing object-oriented software system. We categorize the critical business services embedded
in the system into two categories: top-level services and low-level services. Top-level services
and the low-level services underneath each top-level service can be identified by applying the
proposed approach.
The identified services must be packaged as components so that they can be deployed and
thus invoked. Another goal of the proposed SOC4J framework is to reconstruct the existing system
into a component-based system, based on the components that realize the identified services. This
chapter discusses the service realization process and the system reconstruction process.
In Section 6.1, we discuss how an identified service can be realized as a self-contained com-
ponent. A transformation technique that automatically reconstructs the existing system into a
component-based target system is introduced in Section 6.2. Finally, Section 6.3 gives a sum-
mary of this chapter.
CHAPTER 6. COMPONENT GENERATION AND SYSTEM TRANSFORMATION 81
6.1 Component Generation
Component-based development (CBD) assembles software from reusable components within
frameworks such as CORBA, Sun's Enterprise JavaBeans (EJB), and Microsoft COM. The service-oriented
architecture (SOA) encourages individual services to be self-contained. To reuse the
identified services and migrate the existing system's implementation into a component-based
architecture, it is necessary to package the identified services into well-documented, self-contained
components. A self-contained component contains all the code
necessary to implement its services and hence can be deployed and invoked independently. In
the third stage of the proposed SOC4J framework, we realize each top-level service and the low-level
services contained in its SHG as self-contained components.
6.1.1 Approach
We package each identified service (either a top-level service or a low-level service) to generate a
self-contained component. A component that realizes a top-level service is called a Top-Level
Component (TLC), while a component that realizes a low-level service is called a Low-Level
Component (LLC). In order to explain the component generation process clearly and automate
the process in the implementation, we describe a generated component as a tuple:

(name, if, CF, CC, CHG)

In the above tuple, name is the name of the component, if is the interface that provides the
entry point of the component, CF is the facade class set of the realized service (we also call
it the Facade Class Set of the component), CC is the Constituent Class Set, which contains all
classes/interfaces that are necessary to implement the component, and CHG is the
Component Hierarchy Graph that is associated with a top-level component to describe its low-level
components. The CHG is defined in Definition 6.1. We export and store the generated
component represented by the above tuple as an XML document. The XML schema for the
component is illustrated in Figure 6.1.
Definition 6.1. The Component Hierarchy Graph (CHG) associated with a top-level component
is a rooted LDG, where the root, r ∈ V, represents the top-level component, V \ r represents the
set of low-level components contained in the top-level component, lV(v) returns the name of v
for any v ∈ V, E = {(v, w) ∈ V × V | v contains w}, LE = φ, and hence lE(e) returns an
empty label for any e ∈ E.
The CHG shows the structural relationships between the low-level components underneath a
top-level component. Like the SHG, the CHG gives a high-level representation of the components
that is understandable by both developers and business experts. Also, the CHG describes
the modularization of its top-level component. There is no CHG associated with a low-level component;
that is, CHG = φ for a low-level component, because the low-level component has
already been presented in the CHG of its top-level component. The CHGs of all top-level components
form the component view (CompView) of the system.
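As a hypothetical illustration of the stored form, an instance document for a low-level component might look as follows. The element names are inferred from the schema sketch in Figure 6.1 and may differ from the actual schema:

```xml
<!-- hypothetical component document; element names inferred from Figure 6.1 -->
<component>
  <name>Customer</name>
  <interface>ICustomer</interface>
  <facadeClassSet>
    <class>com.uwstar.crs.person.Customer</class>
  </facadeClassSet>
  <constituentClassSet>
    <class>com.uwstar.crs.person.Customer</class>
    <class>com.uwstar.crs.person.Person</class>
    <class>com.uwstar.crs.record.Record</class>
  </constituentClassSet>
  <componentHierarchyGraph/>
</component>
```

A low-level component such as this one carries an empty componentHierarchyGraph element, whereas a top-level component would list its low-level components there.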
Before we present the technique for automatically generating components, we introduce the
reachability concept in the CIDG and CIRG; we use this concept in the component generation
process.
Definition 6.2. Let G = (V, E) be the CIDG of an existing object-oriented system, where V
represents all nodes (i.e., classes or interfaces) in G and E represents all edges (i.e., dependencies)
in G. Given two classes v ∈ V and w ∈ V, class w is said to be reachable from class v if there
exists a directed path from v to w, denoted by v →* w.
Definition 6.3. Let G = (V, E) be the CIRG of an existing object-oriented system, where V
represents all nodes (i.e., classes or interfaces) in G and E represents all edges (i.e., relationships)
in G. Given two classes v ∈ V and w ∈ V, class w is said to be inheritance (realization)
reachable from class v ∈ CIRG.V if there exists a directed path from v to w and the labels of
all edges in this path contain inheritance (realization) types, denoted by v →IN* w (v →RE* w).
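The reachable set of Definition 6.2 can be computed with a straightforward depth-first traversal; this closure is also the core of Step 3 of the component generation process. The graph representation and names below are assumptions of this sketch:

```java
import java.util.*;

// Sketch of Definition 6.2's reachability: the set of nodes reachable from
// a start class via directed dependency edges. The adjacency-map
// representation and names are illustrative assumptions.
public class Reachability {
    static Set<String> reachable(Map<String, List<String>> g, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(Collections.singletonList(start));
        while (!work.isEmpty()) {
            String v = work.pop();
            for (String w : g.getOrDefault(v, Collections.emptyList()))
                if (seen.add(w)) work.push(w);   // w newly reached: explore it
        }
        return seen;
    }
}
```

Note that, matching the definition, the start class itself is included only if it lies on a cycle back to itself.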
Figure 6.1: The UML Representation of the XML Schema for a Component.
We extend the refactoring approach presented in [90] to automatically generate an interface
for each component corresponding to an identified service. Let serv be an identified service
represented by the tuple

serv(name, CF, SHG)

and comp be the generated component represented by the tuple

comp(name, if, CF, CC, CHG).

The key steps for generating the component are enumerated as follows:
• Step 1: Name the component by copying its service's name: comp.name = serv.name.

• Step 2: Compute the facade class set of the component by copying its service's facade
class set: comp.CF = serv.CF.

• Step 3: Compute the constituent class set of the component:
comp.CC = comp.CF ∪ ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v}.

• Step 4: Create a new interface named if. Modify each class in comp.CF to implement if.
Modify each interface in comp.CF to extend if.

• Step 5: Add declarations of all public methods defined in each class in VIN to if, where
VIN = ⋃_{c ∈ comp.CF} {v ∈ CIRG.V | c →IN* v},
and modify each class in VIN to implement if.

• Step 6: Copy declarations of all public methods declared in each interface in VRE to if,
where
VRE = ⋃_{c ∈ comp.CF} {v ∈ CIRG.V | c →RE* v},
and modify each interface in VRE to extend if.

• Step 7: Add declarations of setter and getter methods for all public class fields declared
in each class in comp.CF ∪ VIN to if, and implement the corresponding setter and getter
methods in the classes where these fields are originally declared.

• Step 8: Add declarations of getter methods for all public class fields declared in each
interface in comp.CF ∪ VRE to if, and implement the corresponding getter methods in
the classes that implement the interfaces where these fields are originally declared.

• Step 9: Assign the newly built interface to the component: comp.if = if.

• Step 10: Generate the component hierarchy graph (CHG) for the component:
comp.CHG = G if serv.SHG ≠ φ (i.e., serv is a top-level service); φ otherwise,
where G is a copy of serv.SHG, except that the names of all nodes in G are changed to the
corresponding service names rather than the facade classes.
Note that the source modification in the above steps does not change the observable behavior
of the original system. Once the tuple (name, if, CF, CC, CHG) for a component has been
constructed, we can package all classes and interfaces within CC, together with the newly created
interface if, into a JAR file named name.jar. The packaged component is self-contained and
loosely coupled and hence can be deployed and used independently.
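To give a flavor of Steps 4–5, the sketch below collects the public instance-method signatures of a facade class and its superclasses, which are the candidate declarations for the generated component interface. It uses reflection purely for illustration; the framework itself performs source-level refactoring, and all names here are assumptions:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.*;

// Illustrative sketch only: gather the public instance methods of a facade
// class and its superclass chain (the inheritance-reachable classes of
// Step 5) as candidate declarations for the generated interface.
public class InterfaceSketch {
    static SortedSet<String> declarations(Class<?> facade) {
        SortedSet<String> decls = new TreeSet<>();
        for (Class<?> c = facade; c != null && c != Object.class; c = c.getSuperclass())
            for (Method m : c.getDeclaredMethods())
                if (Modifier.isPublic(m.getModifiers()) && !Modifier.isStatic(m.getModifiers()))
                    // record a simple "name[parameterTypes]" signature
                    decls.add(m.getName() + Arrays.toString(m.getParameterTypes()));
        return decls;
    }
}
```

Running this on any concrete class yields the method declarations that would have to appear in the component interface for the facade class to implement it.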
6.1.2 An Example
To further describe the component generation process, let us give an example of realizing an
identified service. In Chapter 5, we identified services from the hypothetical CRS system. One
of these, Customer, is a low-level service underneath the top-level service VehicleBooking,
represented by the tuple

serv(name, CF, SHG)

where

serv.name = Customer,
serv.CF = {com.uwstar.crs.person.Customer}, and
serv.SHG = φ.

Let the tuple comp(name, if, CF, CC, CHG) represent the component that realizes the service
Customer. The steps for realizing the service are enumerated as follows (part of the UML class
diagram of the component is shown in Figure 6.3):
1. comp.name = serv.name = Customer.

2. comp.CF = serv.CF = {com.uwstar.crs.person.Customer}.

3. Note that ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} represents all classes or interfaces
that are reachable in the CIDG from the classes in comp.CF.
Figure 6.2: The UML Class Diagrams of Customer and Person in the CRS System.
In this example,

⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} =
{ com.uwstar.crs.person.Person,
com.uwstar.crs.record.CreditRecord,
com.uwstar.crs.record.DrivingRecord,
com.uwstar.crs.record.Record }

Then, we have

comp.CC = comp.CF ∪ ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} =
{ com.uwstar.crs.person.Customer,
com.uwstar.crs.person.Person,
com.uwstar.crs.record.CreditRecord,
com.uwstar.crs.record.DrivingRecord,
com.uwstar.crs.record.Record }
4. Create a new interface named ICustomer. Since there is only one class in comp.CF (i.e.,
com.uwstar.crs.person.Customer), we modify this class to implement ICustomer, as
shown in Figure 6.3.
5. The inheritance reachable class set of class com.uwstar.crs.person.Customer is extracted
as follows:

VIN = {com.uwstar.crs.person.Person}

Figure 6.2 depicts the UML class diagrams of class com.uwstar.crs.person.Customer
and class com.uwstar.crs.person.Person. We add declarations of all public methods
defined in class com.uwstar.crs.person.Person to ICustomer, and we modify class
com.uwstar.crs.person.Person to implement the interface ICustomer. These modifications
are reflected in Figure 6.3.
6. Since the realization reachable class set of class com.uwstar.crs.person.Customer is
empty (i.e., VRE = ∅), no action is needed in this step.

7. As Figure 6.2 shows, there is only one public class field declared in class
com.uwstar.crs.person.Customer
(i.e., id) and no public class field in class com.uwstar.crs.person.Person. We add the
setter method declaration setID(String) and the getter method declaration getID() :
String to the interface ICustomer. We also need to implement these two methods in class
com.uwstar.crs.person.Customer. Listing 6.1 shows the implementation of these two
methods. These modifications are also reflected in Figure 6.3.
8. Again, since VRE = ∅, no action is needed in this step.

9. comp.if = ICustomer.

10. comp.CHG = φ, because the service Customer is a low-level service; hence, the generated
component is a low-level component. If the service were a top-level service, the CHG
of the generated component would be the SHG of the top-level service, except that the node
names in the SHG would be changed to the corresponding service names.
Figure 6.3: Part of the UML Class Diagram of the Component Customer.
Now we are ready to package the following classes (i.e., the constituent class set):

com.uwstar.crs.person.Customer,
com.uwstar.crs.person.Person,
com.uwstar.crs.record.CreditRecord,
com.uwstar.crs.record.DrivingRecord, and
com.uwstar.crs.record.Record

together with the newly created interface ICustomer as a JAR file named Customer.jar.
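Packaging of this kind can be scripted with the standard java.util.jar API. The sketch below is illustrative only; the entry names and contents are placeholders (in practice they would be the bytes of the compiled .class files of the constituent class set), not the framework's actual packaging code:

```java
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class JarPackager {

    // Packages the given entries into a JAR file. For brevity the entry
    // contents are passed in as byte arrays; in practice they would be
    // read from the compiled .class files of the constituent class set.
    public static void pack(String jarName, String[] entryNames,
                            byte[][] contents) throws Exception {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes()
                .put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream out =
                 new JarOutputStream(new FileOutputStream(jarName), manifest)) {
            for (int i = 0; i < entryNames.length; i++) {
                out.putNextEntry(new JarEntry(entryNames[i]));
                out.write(contents[i]);
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder entries mirroring part of the constituent class set.
        pack("Customer.jar",
             new String[] { "com/uwstar/crs/person/Customer.class" },
             new byte[][] { new byte[0] });
        System.out.println("wrote Customer.jar");
    }
}
```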
public class Customer extends Person implements ICustomer {

    public String id; // customer ID
    ...

    public void setID(String id) {
        this.id = id;
    }

    public String getID() {
        return id;
    }
    ...
}

Listing 6.1: The Implementation of methods setID and getID in class Customer.
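To see how the pieces fit together, the sketch below condenses the running example into compilable form. It keeps only a subset of the members shown in Figure 6.3, and the Person fields and method bodies are our own simplifications, not the CRS system's code:

```java
// A hypothetical, trimmed-down reconstruction of the running example.
interface ICustomer {
    void setID(String id);
    String getID();
    void setName(String name);
    String getName();
}

class Person {
    private String name;
    public void setName(String name) { this.name = name; }
    public String getName() { return name; }
}

class Customer extends Person implements ICustomer {
    public String id; // public field wrapped by the newly added setID/getID
    public void setID(String id) { this.id = id; }
    public String getID() { return id; }
}

public class ComponentDemo {
    public static void main(String[] args) {
        ICustomer c = new Customer(); // clients program against the interface
        c.setID("C-001");
        c.setName("Alice");
        System.out.println(c.getID() + " " + c.getName());
    }
}
```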
6.2 System Transformation
One of the primary goals of the proposed SOC4J framework is to transform the monolithic ar-
chitecture of an existing object-oriented system to a more flexible service-oriented architecture.
In the previous stages of the framework, we have identified services and packaged the identi-
fied services into self-contained components. Now, we introduce a reconstruction technique that
automatically reconstructs the existing source system into a component-based target system.
6.2.1 Approach
The reconstruction process is based on the extracted components. In this thesis, extracted com-
ponents are categorized into two classes: top-level components and low-level components. A
top-level component has an associated component hierarchy graph (CHG) that describes the low-
level components contained in it. Each component is self-contained and
has been packaged into a JAR file. Based on the extracted components, we design a meta-model,
depicted in Figure 6.4, for the component-based target system. The target system is composed
of one or more top-level components, as well as a set of classes/interfaces, while each top-level
component may consist of low-level components together with a set of classes and interfaces.
Like a top-level component, a low-level component may contain other low-level sub-components,
classes, and interfaces. In the source system, some classes or interfaces may not be identified as
business services or be contained in identified business services, and are therefore not packaged
into components. In order to preserve the behavior of the system, we have to include these classes
or interfaces in the component-based target system.

[Figure 6.4 depicts the meta-model as a UML diagram: the Target System (a component-based system) contains one or more Top-Level Components (JAR files) and a set of classes/interfaces (Java files); each Top-Level Component contains Low-Level Components (JAR files) and classes/interfaces; and each Low-Level Component in turn contains classes/interfaces and possibly other low-level components.]

Figure 6.4: The Meta-Model for the Component-Based Target System.
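The containment rules of this meta-model can be sketched directly as Java types. The type and field names below are ours, chosen for illustration rather than taken from the framework:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the Figure 6.4 meta-model; names are ours.
class CompilationUnit {               // a Java class or interface (.java file)
    final String name;
    CompilationUnit(String name) { this.name = name; }
}

class Component {                     // packaged as a JAR file
    final String name;
    final List<Component> subComponents = new ArrayList<>(); // nested low-level components
    final List<CompilationUnit> units = new ArrayList<>();   // classes/interfaces
    Component(String name) { this.name = name; }

    // Counts the units in this component and in all nested components.
    int totalUnits() {
        int n = units.size();
        for (Component c : subComponents) n += c.totalUnits();
        return n;
    }
}

class TargetSystem {                  // the component-based target system
    final List<Component> topLevel = new ArrayList<>();
    final List<CompilationUnit> looseUnits = new ArrayList<>(); // not in any service
}

public class MetaModelDemo {
    public static void main(String[] args) {
        TargetSystem sys = new TargetSystem();
        Component booking = new Component("Vehicle Booking");
        Component customer = new Component("Customer");
        customer.units.add(new CompilationUnit("Customer.java"));
        booking.subComponents.add(customer);
        booking.units.add(new CompilationUnit("Booking.java"));
        sys.topLevel.add(booking);
        System.out.println(booking.totalUnits()); // counts nested units too
    }
}
```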
We reconstruct the target system by adopting a bottom-up integration technique that works with
the extracted components, starting with the components in the lowest position of the component
hierarchy. The reconstruction process should not change the observable behavior of the existing
system. The surrounding parts of each component should use the newly extracted components,
in order to avoid a situation where two sets of classes providing the same functionality exist
in the same system. Algorithm 6.1 describes the transformation process, which takes the source
system and the extracted components as input; extracted components are represented as tuples of
the form (name, if, CF, CC). The output of the algorithm is an instance of the meta-model
described in Figure 6.4.

Algorithm 6.1: System-Transformation
Input: An existing object-oriented system and the components extracted from it
Output: A component-based target system

foreach top-level component t do
    while there exists a low-level component in t.CHG do
        // start with the component in the lowest position in the component hierarchy
        c ← a node without descendants in t.CHG
        // retrieve the components that contain component c
        P ← parents of c in t.CHG
        // refactor the parents of component c to use c
        foreach p ∈ P do
            change the code of classes in p.CC that reads (or writes) the public
                fields of classes in c.CF into code that invokes the corresponding
                getter (or setter) methods in interface c.if
            replace the reference types in classes in p.CC that refer to any class
                in c.CF with interface c.if
        end
        // update t.CHG to remove component c
        remove node c from t.CHG
    end
end
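The refactoring inside Algorithm 6.1 (replacing public-field access with interface calls, and declared class types with the component interface) can be illustrated with a hand-worked sketch. The classes here, including the parent-component class BookingAgent, are hypothetical:

```java
interface ICustomer {
    void setID(String id);
    String getID();
}

class Customer implements ICustomer {
    public String id; // the public field the refactoring hides from clients
    public void setID(String id) { this.id = id; }
    public String getID() { return id; }
}

// A hypothetical class from a parent component (a class in p.CC).
class BookingAgent {
    String registerCustomer() {
        // Before the transformation this method would have read and written
        // cust.id directly; afterwards it goes through the interface c.if:
        ICustomer cust = new Customer(); // declared type replaced by the interface
        cust.setID("C-001");             // field write  ->  setter invocation
        return cust.getID();             // field read   ->  getter invocation
    }
}

public class TransformDemo {
    public static void main(String[] args) {
        System.out.println(new BookingAgent().registerCustomer());
    }
}
```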
6.2.2 An Example
To further describe the system transformation process, we give an example of reconstructing the
CRS system into a component-based target system.
Consider the following top-level services identified after the service identification stage:

• (Vehicle Booking, {com.uwstar.crs.Booking}, SHG_VB). The service hierarchy graph
SHG_VB is shown in Figure 6.5 (a).

• (Vehicle Evaluation, {com.uwstar.crs.VehicleEvaluation}, SHG_VE). The service
hierarchy graph SHG_VE is shown in Figure 6.5 (b).

[Figure 6.5 shows the two service hierarchy graphs. SHG_VB (a) contains the nodes com.uwstar.crs.Booking, com.uwstar.crs.VehicleRepository, com.uwstar.crs.vehicle.Car, com.uwstar.crs.vehicle.Truck, com.uwstar.crs.vehicle.SUV, com.uwstar.crs.person.Agent, and com.uwstar.crs.person.Customer; SHG_VE (b) contains the nodes com.uwstar.crs.VehicleEvaluation and com.uwstar.crs.person.Customer.]

Figure 6.5: The Service Hierarchy Graphs of the CRS System.

[Figure 6.6 shows the two component hierarchy graphs. CHG_VB (a) contains the nodes Vehicle Booking, Vehicle Repository, Agent, and Customer; CHG_VE (b) contains the nodes Vehicle Evaluation and Customer.]

Figure 6.6: The Component Hierarchy Graphs of the CRS System.
We have two top-level components generated after the component generation stage, and the
low-level components underneath each top-level component are described in the related compo-
nent hierarchy graph. The two top-level components are described as follows:

• (Vehicle Booking, IBooking, CF1, CHG_VB). The component hierarchy graph CHG_VB
is shown in Figure 6.6 (a).

• (Vehicle Evaluation, IEvaluation, CF1, CHG_VE). The component hierarchy graph
CHG_VE is shown in Figure 6.6 (b).
After running Algorithm 6.1, we get the component-based version of the CRS system as
shown in Figure 6.7. The component-based system has the same functionality as the original system.
[Figure 6.7 shows the component-based Car Rental System as a UML diagram: the application Car Rental System contains the components Vehicle Booking (interface IBooking), Vehicle Evaluation (interface IEvaluation), Vehicle Repository (interface IRepository), Agent (interface IAgent), and Customer (interface ICustomer), together with Dealer.]

Figure 6.7: The Component-Based Car Rental System.
6.3 Summary
In this chapter, we explained the processes contained in the component generation stage and sys-
tem transformation stage of the SOC4J framework. We have discussed how an identified service
can be realized as a self-contained component and how the existing system can be reconstructed
into a component-based system based on the components that realize the identified services.
Chapter 7
Empirical Studies
In this chapter, we perform a set of empirical studies on the proposed SOC4J framework to
assess the service-oriented componentization techniques introduced in this thesis. The proposed
technique has been implemented in a prototype that aims to i) identify critical business services
embedded in an existing Java system, ii) realize identified services into self-contained reusable
components, and iii) transform the existing system into a component-based system. Therefore,
the purpose of the empirical study in this chapter is to test the effectiveness of the proposed SOC4J
framework and assess i) the usefulness in terms of feasibility and effectiveness of the architecture
recovery and representation approach, ii) the usefulness in terms of efficiency and effectiveness
of the business service identification technique, iii) the usefulness in terms of effectiveness of the
identified service modeling and packaging techniques, and iv) the time and space complexity of
the service-oriented componentization technique as a function of source code size.
We outline the implementation of the prototype for the SOC4J framework in Section 7.1. In
Section 7.2, we discuss two evaluation criteria for the proposed framework. We then present
empirical studies on two Java open source projects in Sections 7.3 and 7.4. Finally, we summarize
this chapter in Section 7.5.
CHAPTER 7. EMPIRICAL STUDIES 95
7.1 A Prototype for the SOC4J Framework
As a part of this work, the proposed service-oriented componentization approach has been im-
plemented in a prototype which offers an interactive and integrated environment for i) identify-
ing critical business services embedded in an existing Java system, ii) realizing each identified
service as a self-contained component, and iii) transforming the object-oriented design into a
service-oriented architecture. We have named the prototype JComp, a Java Componentization
Kit. JComp is an integrated tool workbench targeted at rapidly integrating software tools for
prototyping the SOC4J framework. We now examine the tool integration requirements for the
SOC4J framework and discuss the implementation of JComp.
7.1.1 Tool Integration Requirements
As we discussed in Chapter 3, several software tools are needed for the SOC4J framework to
componentize an object-oriented system and re-modularize the existing assets to support service
functionality. Figure 7.1 depicts the tool interconnection of the SOC4J framework. The five
rounded rectangles in the figure represent the tools needed for the SOC4J framework, while the
thick arrow represents the flow of data that integrates the tools within the framework.
The functionality of each tool is outlined as follows:

Source Code Modeling: This tool parses the Java source code and outputs a set of raw fact
data. Based on the extracted facts, the tool further generates the source code models
defined in Chapter 4, including JPackage, JFile, JClass, and JMethod. The raw data set
and source code models are exported as XML documents.
Architecture Modeling: Based on the source code models, this tool identifies all class relation-
ships defined in Chapter 4 and exports the identified relationships in graph representations,
that is, the CIRG and CIDG. Basic reusability attributes for each class in the system are
also computed. The CIRG and CIDG are exported as XML documents.

[Figure 7.1 shows the tool interconnection: Java source code feeds the Source Code Modeling tool, which produces facts and source code models; these flow through the Architecture Modeling tool (producing the CIRG and CIDG), the Service Identification tool (producing identified services), the Component Generation tool (producing self-contained components), and the System Transformation tool (producing the component-based system), all within the integrated tool workbench for the SOC4J framework.]

Figure 7.1: The Tool Interconnection for the SOC4J Framework.
Service Identification: This tool assists users in identifying the business services embedded in
an existing Java system through analysis of the CIRG and CIDG. Firstly, it identifies the
top-level services of the system and builds a service hierarchy graph for each identified
top-level service. Then, it performs a graph transformation on the service hierarchy graph
to identify low-level services for each top-level service.
Component Generation: This tool realizes identified services as self-contained components.
For each identified service, it extracts all classes/interfaces that are necessary for imple-
menting the service, generates an interface for the service, and packages these classes/in-
terfaces together with the interface as a JAR file.
System Transformation: This tool reconstructs an existing Java system into a component-based
system by using the components generated from the source system. The system transforma-
tion process preserves the functionality of the source system.
7.1.2 JComp RCP Application
JComp is built on top of the Eclipse Rich Client Platform (RCP) [68] and hence it is
called an Eclipse RCP application. An Eclipse RCP application is a collection of plug-ins and
the Runtime on which they run. The platform-independent Eclipse RCP architecture makes rich-
client applications easy to write because business logic is organized into reusable components
called plug-ins. Eclipse RCP provides a core set of services, representing a substantial percentage
of the rich client platform development functionality, so that developers do not have to rewrite
infrastructure code. These Eclipse RCP services are available to every application component
plug-in. These services are the interface between a plug-in and the low-level platform-specific
functionality that supports the plug-in, just like a J2EE container is the interface between EJB
and the application server. Moreover, because of the Eclipse open source license, we can use the
technologies that went into Eclipse to create our own commercial-quality programs. The GUI
toolkits used by Eclipse RCP are the same as those used by the Eclipse IDE; they enable
high-performance applications with a native look and feel on any platform they run on.
The architecture of the JComp toolkit is depicted in Figure 7.2. The internals of the JComp are
the same OSGi runtime and GUI toolkit provided by the Eclipse IDE. The OSGi runtime enables
Java code from multiple sources to all run together in a single Java Virtual Machine (JVM). The
OSGi framework automatically loads and runs bundles which are encapsulations of various files.
This provides the mechanism by which plug-ins can be automatically detected and loaded into the
JComp RCP application. The resource manager provides a GUI to show the current configuration;
that is, a list of installed plug-ins. It assists the end user in finding and installing new plug-ins.
It is also capable of scanning through the list of already-installed plug-ins to look for updates to
[Figure 7.2 shows the JComp RCP application layered on the Eclipse RCP Platform: the Platform Runtime (OSGi), a Resource Manager, the SWT and JFace toolkits, and the generic workbench UI, on top of which sit the Parser, Modeler, Extractor, Generator, and Transformer plug-ins.]

Figure 7.2: The Architecture of the JComp Java Componentization Kit.
these plug-ins. The Standard Widget Toolkit (SWT) provides a completely platform-independent
API that is tightly integrated with the operating system’s native windowing environment. Java
widgets actually map to the platform’s native widgets. This gives Java applications a look and
feel that makes them virtually indistinguishable from native applications. The JFace toolkit is
a platform-independent user interface API that extends and interoperates with the SWT. This
library provides a set of components and helper utilities that simplify many of the common tasks
in developing SWT user interfaces. The generic workbench provides extension points that the
plug-ins extend. The plug-ins provide functionality that is integrated into the RCP platform just
as if it were always part of the application.
As depicted in Figure 7.2, each tool described in Section 7.1.1 was implemented as a separate
JComp plug-in. A snapshot of the JComp Java Componentization Kit is depicted in Figure 7.3.
Figure 7.3: A Snapshot of the JComp Java Componentization Kit.
7.2 Evaluation Criteria
Since the proposed framework aims to extract reusable components from an object-oriented
system and migrate the object-oriented design to a service-oriented architecture, the evaluation
criteria need to address component reusability and architectural improvement.
7.2.1 Component Reusability
The components acquired by applying the proposed framework are structurally reusable because
the internal structures are encapsulated and the components are self-contained and thus have no
dependency upon entities outside of them. However, we still need a way to assess their
reusability quantitatively.
Reusability Metric Suite
Components have two relatively static sources of information: the external documentation and
the public interface. The external documentation is an important source of information that can
greatly affect component reusability; such documentation is developed for a human audience,
which makes it harder to measure. On the other hand, component interfaces are easily parsed by a
computer, making them easier to measure. This is an important argument for developing reusabil-
ity metrics based upon component interfaces. In this thesis, we aim to assess the reusability of the
extracted components through the analysis of their interfaces and internal methods as well. We
define a reusability metric suite by selecting and adapting the metrics defined in [13, 25, 70, 91]:
Parameter Per Method (PPM): The PPM metric measures the mean size of the method declara-
tions in the interface of the component, and it is defined as follows:

    PPM = IPC / IMC   if IMC > 0;   0 otherwise.      (7.1)
where the metric IPC (Interface Parameter Count) is the count of parameters of all public
methods in the interface of the component, and the metric IMC (Interface Method Count)
is the count of public methods in the interface of the component.

It is believed that methods with fewer parameters are easier to understand, and so will be
easier to reuse [58]. It follows that component interfaces with a lower PPM will tend to
have lower complexity and hence better understandability.
Reference Parameter Density (RPD): The RPD metric measures the occurrence of reference
parameters in an interface, and it is defined as follows:

    RPD = IRPC / IPC   if IPC > 0;   0 otherwise.      (7.2)
where the metric IRPC (Interface Reference Parameter Count) is the count of reference-
type parameters of all public methods in the interface of the component.

It is believed that the use of references makes it more difficult to understand a pro-
gram [87]. This also applies to interfaces, as arguments passed by reference
tend to be more difficult to understand than arguments passed by value. A higher
RPD indicates that an interface tends to be more difficult to understand. However, it
is often necessary to use reference arguments so that useful functionality can be
implemented. Therefore, a high value is not necessarily evidence of a poor interface, but it
does suggest that good documentation is required [13].
Rate of Component Observability (RCO): The RCO metric measures the percentage of read-
able properties among all fields implemented within the interface of the component, and it is
defined as follows:

    RCO = IRMC / IFRC   if IFRC > 0;   0 otherwise.      (7.3)
where the metric IRMC (Interface Reader Method Count) is the count of public methods
in the interface of the component that read a field, and the metric IFRC (Interface Field and
Reference Count) is the count of fields and references in the interface of the component.

RCO indicates the component's degree of observability for users of the component [91].
To understand the behavior of a component from outside, the observability
of the component should be high. However, when the observability is too high, it may be
difficult for users to find an important readable property among all of the readable properties.
Rate of Component Customizability (RCC): The RCC metric measures the percentage of writable
properties among all fields implemented within the interface of the component, and it is defined
as follows:

    RCC = IWMC / IFRC   if IFRC > 0;   0 otherwise.      (7.4)
where the metric IWMC (Interface Writer Method Count) is the count of public methods
in the interface of the component that write a field.

RCC indicates the component's degree of customizability for users of the component. To
adapt the settings of a component from outside to the user's requirements,
the customizability of the component should be high. However, too high a customizability
violates the encapsulation of the component and leads to greater opportunities for improper
use [91].
Self-Completeness of Component's Return Values (SCCr): The SCCr metric measures the per-
centage of business methods without any return value among all business methods implemented
in the component, and it is defined as follows:

    SCCr = VMC / MC   if MC > 0;   1 otherwise.      (7.5)
where the metric VMC (Void Method Count) is the count of public methods in the compo-
nent that have a void return type, and the metric MC (Method Count) is the count of public
methods in the component.

SCCr indicates the component's degree of self-completeness and external dependency,
based on the return values of methods. The smaller the number of business methods with
return values, the smaller the possibility of the component having an external dependency. High
self-completeness of a component (i.e., low external dependency) leads to high portability
of the component [91].
Self-Completeness of Component's Parameters (SCCp): The SCCp metric measures the per-
centage of business methods without any parameters among all business methods implemented
in the component, and it is defined as follows:

    SCCp = NPMC / MC   if MC > 0;   1 otherwise.      (7.6)
where the metric NPMC (None Parameter Method Count) is the count of public methods
in the component that do not have any parameters.

SCCp indicates the component's degree of self-completeness and external dependency,
based on the parameters of methods. The smaller the number of business methods with
parameters, the smaller the possibility of the component having a dependency outside it [91].
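Interface-level metrics of this kind can be computed mechanically. A minimal sketch of PPM and RPD using Java reflection follows; the sample ICustomer interface is our own illustration, not the framework's generated interface:

```java
import java.lang.reflect.Method;

public class InterfaceMetrics {

    // PPM = IPC / IMC : mean number of parameters per interface method.
    static double ppm(Class<?> iface) {
        Method[] methods = iface.getMethods();
        if (methods.length == 0) return 0.0;
        int ipc = 0;
        for (Method m : methods) ipc += m.getParameterCount();
        return (double) ipc / methods.length;
    }

    // RPD = IRPC / IPC : fraction of parameters that are reference types.
    static double rpd(Class<?> iface) {
        int ipc = 0, irpc = 0;
        for (Method m : iface.getMethods()) {
            for (Class<?> p : m.getParameterTypes()) {
                ipc++;
                if (!p.isPrimitive()) irpc++; // reference-type parameter
            }
        }
        return ipc > 0 ? (double) irpc / ipc : 0.0;
    }

    // A sample interface, invented for illustration only.
    interface ICustomer {
        void setID(String id);          // 1 reference parameter
        String getID();                 // 0 parameters
        void updateCreditRecord(int n); // 1 primitive parameter
    }

    public static void main(String[] args) {
        System.out.println("PPM = " + ppm(ICustomer.class)); // 2 params / 3 methods
        System.out.println("RPD = " + rpd(ICustomer.class)); // 1 of 2 params is a reference
    }
}
```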
Reusability Model
Reusability is a high-level quality of software components and hence it is the result of the combi-
nation and interaction of many low-level properties. The component reusability model typically
shows reusability as being composed of properties such as complexity, observability, customiz-
ability, and external dependency. From the user’s point of view, we define a component reusability
model as illustrated in Figure 7.4. This model is an adaptation of the reusability model intro-
duced by Washizaki et al. [91]. The quality factors are selected only to provide an analysis of
the reusability of a component; factors related to other aspects of component quality that do
not bear on reusability are omitted. The choice of the three fac-
tors affecting reusability has been made on the basis of an analysis of the activities carried out
when reusing a black-box component. We extend Washizaki's model to quantify the complexity
of components by utilizing the Reference Parameter Density (RPD) metric proposed in [13]. Thus,
the adapted model includes aspects related to the Understandability, Adaptability, and Portability
factors given by ISO 9126 [1].
[Figure 7.4 shows the component reusability model as a chain from characteristic to quality factors, criteria, and metrics: the characteristic Reusability comprises the quality factors Understandability, Adaptability, and Portability; these map to the criteria Complexity, Observability, Customizability, and External Dependency, which are measured by the metrics RPD, RCO, RCC, SCCr, and SCCp, respectively.]

Figure 7.4: The Component Reusability Model.
In order to quantify the reusability of the components generated by our framework, based on
the reusability model we formulate the reusability measurement as follows:

    Reusability = w_complexity · RPD + w_observability · RCO
                + w_customizability · RCC + w_ex-dependency · (SCCr + SCCp) / 2      (7.7)
By their definitions, the values of all metrics in the above formula lie in [0, 1]. Since com-
plexity and external dependency have a negative effect on reusability, the weights w_complexity and
w_ex-dependency take values in [−1, 0], while observability and customizability have a
positive effect and hence the weights w_observability and w_customizability take values in
[0, 1]. Nevertheless, the sum of these four weights is set to 1. Consequently, the reusability value
will be in [0, 1], and a higher value represents a higher level of reusability.
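Formula (7.7) is a plain weighted sum. The sketch below evaluates it with the weight values used later in the Jetty study (Section 7.3); the metric values passed in are invented for illustration:

```java
public class ReusabilityScore {

    // Evaluates Formula (7.7) with the weights from the Jetty study.
    static double reusability(double rpd, double rco, double rcc,
                              double sccR, double sccP) {
        double wComplexity = -0.3;      // negative: complexity hurts reusability
        double wObservability = 0.8;
        double wCustomizability = 0.8;
        double wExDependency = -0.3;    // negative: external dependency hurts reusability
        return wComplexity * rpd
             + wObservability * rco
             + wCustomizability * rcc
             + wExDependency * (sccR + sccP) / 2.0;
    }

    public static void main(String[] args) {
        // Illustrative metric values, all in [0, 1].
        System.out.println(reusability(0.4, 0.6, 0.5, 0.7, 0.9));
    }
}
```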
7.2.2 Architectural Improvement
The software architecture of a program or computing system is the structure of the system, which
comprises software components, the externally visible properties of those components, and the
relationships among these components. The more complex a system's structure is, the more dif-
ficult it is to understand, and therefore to maintain. We wish to measure the degree to which
the target (restructured) architecture conforms to the architectural principles of high
intra-module cohesion and low inter-module coupling. In this thesis, we introduce a metric for
determining whether a large software system is "well-structured", based on the concept of
entropy from information theory.
Entropy from an information-theoretic point of view has been proposed in [78] for evaluating
the structuredness of a software design. We adopt the definition of entropy for an object-oriented
design introduced in [20] to compute the entropy of our source systems and target systems, re-
spectively. The smaller the entropy value, the better structured the system is. We then compare
the results to see whether the structures of our target systems are improved. The entropy of an
object-oriented system S with n classes is defined as follows [20]:

    H(S) = − Σ_{i=1}^{n} p(c_i) log₂ p(c_i)      (7.8)
It is assumed that the system is described in a standard class diagram format following UML
notation for associations between classes. For a randomly selected unary association, p(c_i) is de-
fined as the probability that the association leads to class c_i. The existence of such an association
indicates that class c_i provides services to the rest of the system, since it responds to messages
sent to it. Within this context, bi-directional associations are treated as two separate unary as-
sociations. Classes are used as the units for entropy measurement because classes represent the
most important fundamental building blocks of an object-oriented system and are an identifiable
abstraction that is present both in designs and implementations.
To compute the entropy metric of the source system of our framework, let n be the number of
classes/interfaces of the source system; we compute p(c_i) as the ratio of the number of incoming
edges of class c_i over the total number of edges in the CIDG of the source system. To compute
the entropy metric of the target system of our framework, we take n as the total number
of components and classes/interfaces contained in the target system, and we then compute p(c_i)
in the same way as for the source system, except that an association may exist between a
class/interface and a component.
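Given the incoming-edge counts from the CIDG, the entropy computation reduces to a few lines. This sketch uses an invented edge distribution and assumes, as is conventional, that classes with no incoming edges contribute nothing to the sum:

```java
public class DesignEntropy {

    // H(S) = -sum_i p(c_i) * log2 p(c_i), where p(c_i) is the fraction of
    // all dependency edges whose target is class c_i (terms with p = 0 are skipped).
    static double entropy(int[] incomingEdges) {
        int total = 0;
        for (int e : incomingEdges) total += e;
        double h = 0.0;
        for (int e : incomingEdges) {
            if (e == 0) continue;
            double p = (double) e / total;
            h -= p * (Math.log(p) / Math.log(2.0)); // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // Four classes with 2 incoming edges each: a uniform distribution,
        // so the entropy is log2(4) = 2 bits.
        System.out.println(entropy(new int[] { 2, 2, 2, 2 }));
    }
}
```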
7.3 Case Study: Jetty
In this section, we apply the JComp Java Componentization Kit to Jetty [46] to empirically eval-
uate the usefulness of the proposed SOC4J framework.
7.3.1 Statistics of the Jetty
Jetty is an open-source, standards-based, full-featured web server implemented entirely in Java. It
is released under the Apache 2.0 licence and is therefore free for commercial use and distribution.
Jetty can be used as: i) a stand-alone traditional web server for static and dynamic content, ii) a
dynamic content server behind a dedicated HTTP server such as Apache (using the Apache
module mod_proxy), and iii) an embedded component within a Java application.
Project   Version   LOC     Java Source Files   Packages   Classes   Interfaces
Jetty     5.1.10    44125   318                 25         273       47

Table 7.1: Statistics of the Jetty.
As shown in Table 7.1, we work on Jetty version 5.1.10, which was released on April 5, 2006.
It has about 44K LOC and consists of 318 Java source files that define 273 classes
and 47 interfaces distributed over 25 packages.
7.3.2 Discussion of the Obtained Results
In order to componentize the Jetty system, we first applied the JComp Java Componentization Kit
to identify business services embedded in the system. The JComp then generated a self-contained
component for each identified service.
The Parser plug-in of the JComp imported the source code of the Jetty and built a set of
source code models. These source code models were exported and stored as XML documents.
The Modeler plug-in imported the source code models and recovered architectural models that
are represented by the CIRG and CIDG. Like the source code models, the CIRG and CIDG were
exported and stored as XML documents. Firstly, based on the CIRG and CIDG, the Extractor
plug-in, which implements the top-level service identification algorithm (i.e., Algorithm 5.2) and
the low-level service identification algorithm (i.e., Algorithm 5.3), identified 33 top-level service
candidates from the CIDG. We then validated each candidate by examining the facade class set
of these candidates, and accepted 16 top-level services. These 16 top-level services represent the
functionality of Jetty from the point of view of end users. Appendix A lists and describes
all accepted top-level services of the Jetty web server. Figure 7.5 depicts the accepted Service
View of the Extractor plug-in, which displays all accepted top-level services of Jetty. The
unaccepted candidates are dead code, debugging modules, or testing modules. For instance, we
found 8 dead classes in the org.mortbay.util package and a debugging module whose entry point is
the class org.mortbay.servlet.ProxyServlet.

Figure 7.5: The Accepted Service View of the Extractor Plug-in.
ID    Top-Level Service               Classes/Interfaces   Low-Level Services
T1    Win32 Server                    248                  11
T2    Dynamic Servlet Invoker         207                  12
T3    Jetty Server MBean              126                  9
T4    Proxy Request Handler           113                  7
T5    XML Configuration MBean         87                   5
T6    Web Application MBean           86                   6
T7    Administration Servlet          56                   5
T8    CGI Servlet                     49                   5
T9    Host Socket Listener            46                   5
T10   Web Configuration               34                   3
T11   Authentication Access Handler   30                   3
T12   Servlet Response Wrapper        27                   2
T13   IP Access Handler               18                   0
T14   Multipart Form Data Filter      16                   2
T15   HTML Script Block               12                   1
T16   Applet Block                    9                    1

Table 7.2: Top-Level Services Identified from Jetty.
After all the top-level services were validated, the Extractor plug-in then identified low-level
services underneath each top-level service. Table 7.2 shows the atomic services and identified
low-level services for each top-level service. Actually, atomic services of a top-level service are
Java classes or interfaces that implement the top-level service; they are represented by nodes of
the original SHG of the services. For example, as Table 7.3 shows, there are 11 low-level services
identified from top-level service Win32 Server (i.e., top-level service T1). This top-level service
runs Jetty as a Windows HTTP server. When identifying low-level services, we used
Termination Criterion 5.1 described in Chapter 5 to terminate the iteration in Algorithm 5.3 by
setting MQ = 0.75. In cases where the level of granularity of services is crucial, the user may use
the Termination Criterion 5.2 for Algorithm 5.3. As Figure 7.6 shows, we terminated the low-
level service identification process at the fifth iteration. The final low-level services identified for
top-level service Win32 Server are shown in Table 7.3.

[Figure 7.6 sketches the iterations of the service aggregation process, from the original SHG through the first and second iterations to the final iteration.]

Figure 7.6: Iterations of the Service Aggregation Process of Top-Level Service Win32 Server.
To realize each identified service (both top-level service and low-level service), the Generator
plug-in generated a self-contained component for each service. Figure 7.7 illustrates the compo-
nent hierarchy graph (CHG) of the top-level component Win32 Server. There are 11 low-level
components contained in the top-level component. Furthermore, the Generator plug-in measured
the reusability of each generated component by applying the component reusability model and
computing Formula (7.7). In this empirical study, we set w_complexity = −0.3, w_observability = 0.8,
w_customizability = 0.8, and w_ex-dependency = −0.3. Figure 7.8 shows reusability values of the
[Figure: CHG nodes: Win32 Server; Jetty Server; HTTP Connection; HTTP Request; HTTP Response; Security Handler; Service Handlers; Resource Handler; Servlet Handler; Web Application Context; Servlet]
Figure 7.7: The CHG of Top-Level Component Win32 Server of Jetty.
Low-Level Component        Reusability
Jetty Server               0.9
Service Handlers           0.6
Resource Handler           0.7
Security Handler           0.7
Socket Listener            0.8
HTTP Connection            0.9
HTTP Request               0.7
HTTP Response              0.5
Web Application Context    0.6
Servlet                    0.7
Servlet Handler            0.8
Table 7.3: Low-Level Services Identified in Top-Level Service Win32 Server.
top-level components and the average value for the low-level components underneath each top-level component. From Figure 7.8, we observed that all top-level components, except C16, have reusability values above 0.5, and all the average values are between 0.6 and 0.8. Thus, we could conclude that the services identified from the Jetty project have a reasonable level of reusability.
[Figure: bar chart; x-axis: Top-Level Components C1 to C16; y-axis: Reusability (0 to 1); series: Reusability of Top-Level Components, and Average Reusability of Low-Level Components in a Top-Level Component]
Figure 7.8: The Reusability of Components Extracted from Jetty.
The Transformer plug-in transformed Jetty into a component-based system based on the generated components. We named the target system Jetty-JComp. As Algorithm 6.1 ensures, Jetty-JComp has the same functionality as Jetty. Jetty-JComp now contains 16 independent JAR files. Each JAR file provides a top-level service and can be used independently. Also, each independent JAR file is itself a component-based system that consists of a set of JAR files.
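Because each top-level component is packaged as an independent JAR, a client can load and use it in isolation. The sketch below is illustrative, not Jetty-JComp's actual API: the facade class name and JAR location would come from the component's metadata, and here a JDK class is resolved through an empty `URLClassLoader` so the example runs without the actual component JARs being present.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of using one top-level component JAR independently. The facade
// class name and the JAR URLs are supplied by the caller; both are
// hypothetical here, not part of Jetty-JComp's actual interface.
public class ComponentLoader {
    static Object loadFacade(URL[] componentJars, String facadeClass)
            throws Exception {
        try (URLClassLoader loader = new URLClassLoader(
                componentJars, ComponentLoader.class.getClassLoader())) {
            // Instantiate the facade via its public no-argument constructor.
            return loader.loadClass(facadeClass)
                         .getDeclaredConstructor()
                         .newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in: resolve a JDK class through an empty class-loader chain
        // so the sketch runs without a component JAR on disk.
        Object facade = loadFacade(new URL[0], "java.util.ArrayList");
        System.out.println(facade.getClass().getName());
    }
}
```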
We have computed the entropy of both Jetty and Jetty-JComp by applying Formula (7.8). When computing the entropy of Jetty-JComp, we used the component hierarchy graphs instead of the CIDG because Jetty-JComp is comprised of components. We found that the entropy of Jetty-JComp was reduced by 45.5% compared to the original Jetty project. Hence, we can conclude that our transformation dramatically improves the structure of the system.
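Formula (7.8) itself is not reproduced in this excerpt. Assuming it is a Shannon-style entropy computed over a probability distribution derived from the dependency graph (as in information-theoretic design metrics), the before/after comparison amounts to the following sketch; the two distributions are hypothetical.

```java
// Sketch of comparing structural entropy before and after componentization.
// Formula (7.8) is assumed to be a Shannon-style entropy
// H = -sum(p_i * log2(p_i)) over a distribution derived from the CIDG
// (before) or the CHGs (after). Both distributions below are hypothetical.
public class StructuralEntropy {
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0) {
                h -= pi * (Math.log(pi) / Math.log(2)); // log base 2
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double before = entropy(new double[]{0.25, 0.25, 0.25, 0.25}); // 2 bits
        double after  = entropy(new double[]{0.7, 0.1, 0.1, 0.1});
        System.out.printf("entropy reduced by %.1f%%%n",
                100.0 * (before - after) / before);
    }
}
```

A more concentrated distribution (the componentized system) yields lower entropy than a uniform one, which is the direction of the reduction reported above.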
In Table 7.4, we summarize the time and space costs of the proposed service-oriented
Measurement Item                                        Value
Case Study Size (KLOC)                                  44.1
Source Code Modeling Time (min:sec)                     2:18
Source Code Model Space (MB)                            1.43
Architecture Modeling Time (min:sec)                    4:19
Architecture Model Space (MB)                           1.57
Top-Level Service Identification Time (min:sec)         6:45
Average Low-Level Service Identification Time (sec)     66
Table 7.4: Some Time and Space Statistics of the SOC4J Framework on the Case Study: Jetty.
componentization framework as a function of the source code size of the Jetty project. The experiment was carried out on a Windows desktop with an Intel Pentium 4 CPU (3.4 GHz) and 2 GB of memory.
7.4 Case Study: Apache Ant
In this section, we apply the JComp Java Componentization Kit to another Java open-source project, namely Apache Ant [2], to further evaluate the usefulness of the proposed SOC4J framework.
7.4.1 Statistics of the Apache Ant
Apache Ant is a software tool for automating software build processes. It is similar to make but is written in the Java language and is primarily intended for use with Java. The most immediately noticeable difference between Ant and make is that Ant uses a file in XML format to describe the build process and its dependencies, whereas make has its own Makefile format. By default, the XML file is named build.xml. Ant is an Apache project. It is open source software, and is released under the Apache Software License 2.0.
As shown in Table 7.5, we worked on Apache Ant version 1.6.5, the latest version at the time of this study. It has around 86 KLOC of source code and consists of 690 Java source files that define 640 classes and 60 interfaces distributed across 70 packages.
Project Version LOC Java Source Files Packages Classes Interfaces
Apache Ant 1.6.5 86468 690 70 640 60
Table 7.5: Statistics of the Apache Ant.
7.4.2 Discussions on Obtained Results
To componentize the Apache Ant system, as we did for Jetty, we first applied the JComp Java Componentization Kit to identify the business services embedded in the system. Then, JComp generated a self-contained component for each identified service.
ID    Top-Level Service                 Classes/Interfaces    Low-Level Services
T1    Project Building                  205                   34
T3    WAR File Creation                 152                   17
T4    TAR File Creation                 144                   20
T6    JUnit Invocation                  114                   17
T8    JAR File Creation                 113                   17
T11   Unit Test Execution               86                    14
T14   File Content Loading              80                    15
T17   SSH File Copy                     67                    19
T21   Zip File Creation                 57                    15
T25   XML File Checking                 54                    9
T30   Java Class Execution              45                    11
T31   Dependency Manifest Generation    45                    8
T48   GZip File Expansion               34                    4
T49   File Concatenation                34                    6
T53   Telnet Session Generation         34                    8
T63   CVS Repository Retrieval          29                    4
T69   JavaCC Invocation                 26                    5
T74   File Permission Change            23                    5
T85   URL File Retrieval                18                    4
T92   String Replacement                16                    4
Table 7.6: Selected Top-Level Services Identified from Apache Ant.
The Parser plug-in of JComp imported the source code of Apache Ant and built a set
Low-Level Service     Reusability
File Output           0.8
Zip File Set          0.6
Task Generator        0.9
Identity Mapper       0.7
Project Loader        0.5
Zip Scanner           0.9
File Packing          0.8
File Mapper           0.5
File Scanner          0.6
Resource Selector     0.7
File Entry            0.8
Conversion Rules      0.9
Exception Handle      0.7
Resource Factory      0.6
Type Integers         0.5
File Field            0.7
Resource Handler      0.8
Table 7.7: Low-Level Services Identified in Top-Level Service WAR File Creation.
of source code models. These source code models were exported and stored as XML documents.
The Modeler plug-in imported the source code models and recovered architectural models that
are represented by the CIRG and CIDG. Like source code models, the CIRG and CIDG were
exported and stored as XML documents. Based on the CIRG and CIDG, the Extractor plug-in first identified 236 top-level service candidates from the CIDG. Then we validated each candidate by examining its facade class set. Finally, we accepted 101 top-level services. Appendix B lists and describes all accepted top-level services of the Apache Ant system. These 101 top-level services represent the functionality of Apache Ant from the point of view of end users. We also found that some candidates were dead code, debugging modules, or testing modules, and hence did not accept them as top-level services.
After all top-level services were validated, the Extractor plug-in identified the low-level services underneath each top-level service. We randomly selected 20 top-level services from the 101 accepted services to further identify the low-level services underneath each of them. Table 7.6 shows the atomic services and identified low-level services for each selected top-level service. For example, as Table 7.7 shows, 17 low-level services were identified from top-level service WAR File Creation (i.e., top-level service T3). WAR File Creation packages Web applications: it packages a set of files into Web archive (WAR) files that should end up in the WEB-INF/lib, WEB-INF/classes, or WEB-INF directories of the Web Application Archive. We used Termination Criterion 5.2 described in Chapter 5 to terminate the iteration in Algorithm 5.3 by examining the level of granularity of the low-level services.
[Figure: CHG nodes: WAR File Creation; File Output; Task Generator; Zip File Set; Resource Factory; Identity Mapper; Resource Selector; Exception Handle; Resource Handler; Project Loader; File Scanner; Zip Scanner; File Entry; File Mapper; File Packing; File Field; Type Integers; Conversion Rules]
Figure 7.9: The CHG of Top-Level Component WAR File Creation of Apache Ant.
Again, to realize each identified service (both top-level and low-level), the Generator plug-in generated a self-contained component for each service. Figure 7.9 illustrates the component hierarchy graph (CHG) of top-level component WAR File Creation. There are 17
[Figure: bar chart; x-axis: Top-Level Components C1, C3, C4, C6, C8, C11, C14, C17, C21, C25, C30, C31, C48, C49, C53, C63, C69, C74, C85, C92; y-axis: Reusability (0 to 1); series: Reusability of Top-Level Components, and Average Reusability of Low-Level Components in a Top-Level Component]
Figure 7.10: The Reusability of Components Extracted from the Apache Ant.
low-level components contained in the top-level component. Furthermore, the Generator plug-in measured the reusability of each generated component, applying the component reusability model by computing Formula (7.7). As for the Jetty project, we set w_complexity = -0.3, w_observability = 0.8, w_customizability = 0.8, and w_ex-dependency = -0.3. Figure 7.10 shows the reusability values of the top-level components of Apache Ant and the average value for the low-level components underneath each top-level component. From Figure 7.10, we observed that all top-level components, except C30, have reusability values above 0.5, and all the average values are between 0.5 and 0.9. Thus, we could conclude that the services identified from the Apache Ant project have a reasonable level of reusability.
Based on the generated components, the Transformer plug-in transformed Apache Ant into a component-based system. We named the target system Apache Ant-JComp. As Algorithm 6.1 ensures, Apache Ant-JComp has the same functionality as Apache Ant. Apache Ant-JComp now contains 101 independent JAR files. Each JAR file provides a top-level service and can be used independently. Since we further decomposed only 20 top-level components, each of these 20 corresponding JAR files is a component-based system that consists of a set of JAR files (i.e., low-level components). Also, we have computed the entropy of both Apache Ant and Apache Ant-JComp by applying Formula (7.8). Again, when computing the entropy of Apache Ant-JComp, we used the component hierarchy graphs instead of the CIDG because Apache Ant-JComp is comprised of components. We found that the entropy of Apache Ant-JComp was reduced by 16.3% compared to the original Apache Ant project. The reduction in entropy is not as large as for Jetty-JComp because we componentized only 20 of the 101 top-level services identified from the Apache Ant project.
Measurement Item                                        Value
Case Study Size (KLOC)                                  86.5
Source Code Modeling Time (min:sec)                     5:20
Source Code Model Space (MB)                            3.34
Architecture Modeling Time (min:sec)                    9:15
Architecture Model Space (MB)                           3.92
Top-Level Service Identification Time (min:sec)         19:43
Average Low-Level Service Identification Time (sec)     54
Table 7.8: Some Time and Space Statistics of the SOC4J Framework on the Case Study: Apache Ant.
In Table 7.8, we summarize the time and space costs of the proposed service-oriented componentization framework as a function of the source code size of the Apache Ant project. The experiment was carried out on a Windows desktop with an Intel Pentium 4 CPU (3.4 GHz) and 2 GB of memory.
7.5 Summary
The design and implementation of supporting tools are fundamental requirements for assessing the practical use of a re-engineering approach. In this chapter, we developed a toolkit implementing the proposed componentization framework as an Eclipse Rich Client Platform (RCP) application. The important aspects of the proposed framework have been tested through a series of experiments. The empirical studies have shown that the proposed framework is effective in identifying services from an existing Java system and reconstructing it into a component-based system.
Chapter 8
Future Directions and Conclusions
In this chapter, we summarize the findings of this thesis and outline future research directions
that may arise from this research. In Section 8.1, we present the contributions of this thesis, and
in Section 8.2, we discuss some future work that could extend this research. Finally, we make
some concluding remarks for this work in Section 8.3.
8.1 Contributions
The principal contributions of this thesis were stated in Chapter 1. Based on the material already presented, we discuss them in more detail:
• The design and implementation of comprehensive graph representations of an object-oriented system at different levels of abstraction. These graph representations include the class/interface relationship graph (CIRG), the class/interface dependency graph (CIDG), modularized CIDGs (MCIDGs), service hierarchy graphs (SHGs), and component hierarchy graphs (CHGs). Each graph represents the system at a different level of abstraction.
• The exploration of an incremental program comprehension approach, including describing an object-oriented software system using different concurrent views, each of which
addresses a specific set of concerns of the system. The SOC4J framework extracts four
views to understand an object-oriented software system. The extracted source code models
provide the basic view (BView), while the recovered architectural models build the struc-
tural view (SView), the identified top-level services together with their service hierarchy
graphs give the service view (ServView), and the generated top-level components together
with their component hierarchy graphs introduce the component view (CompView) of the
system. Each view assists the user in understanding the system from a different perspective.
• The design and implementation of an efficient and effective methodology for identifying
and realizing critical business services embedded in an existing object-oriented system.
The business services embedded in an existing system were categorized into two classes:
Top-Level Services (TLS) and Low-Level Services (LLS). A top-level service is a service
that is not used by any other services of the system. However, it may contain a hierarchy of
low-level services further describing the service. From the requester's point of view, top-level services are provided by the system and can be accessed independently. A low-level
service is a service that is underneath a top-level service and may be agglomerated with
other low-level services to yield a new service with a higher level of granularity. The service
identification methodology is a combination of top-down and bottom-up techniques. In the
top-down portion of the methodology, we identify the top-level services and the atomic
services underneath each top-level service by identifying the entry points of the system. In
the bottom-up portion, we aggregate the atomic services to identify services with a higher
level of granularity by applying a series of graph transformations. The service aggregation
is performed incrementally.
• The design and implementation of an object-oriented restructuring methodology that transforms the typically monolithic architecture of an existing system into a more flexible service-oriented architecture. For each identified service (both top-level services and low-level services), we generate a self-contained component. A component that realizes a top-level service is called a Top-Level Component (TLC), while a component that realizes a low-level service is called a Low-Level Component (LLC). Based on the extracted components, a meta-model for the component-based target system is designed. We introduce a reconstruction technique that automatically reconstructs the existing source system into a component-based system.
• The design and implementation of a prototype system that supports the identification and realization of critical business services embedded in a Java software system and the componentization of the Java system. The prototype is designed as an Eclipse Rich Client Platform (RCP) application and is named the JComp Java Componentization Kit. A set of JComp plug-ins has been developed to implement the techniques introduced in the framework. A series of empirical studies has been performed with the JComp toolkit.
8.2 Future Work
Several new research questions have arisen from this work. We believe that significant improvements can be made in some aspects of the presented approach. The possible future work is as follows:
• To apply dynamic analysis of system behavior within the first stage of the SOC4J framework to improve the detection of class relationships.
• To investigate algorithmic processes that can be used to automatically categorize the identified services.
• To measure the reusability and maintainability of the extracted components more concisely.
• To verify that our definitions are consensual with respect to developers' intent when performing software re-engineering.
• To apply our componentization toolkit, JComp, to more real-life programs and to validate the results with the program developers.
• To extend our approach to other programming languages, for instance, C++ programs, or even C and COBOL systems.
• To develop our approach with more flavors of binary class relationships, such as shared-aggregation and container relationships.
• To improve the precision of the service identification by considering design patterns, alternate implementations of the algorithms, and alternate definitions of the class relationships.
8.3 Conclusions
In this thesis, we presented a service-oriented componentization framework for Java systems.
The framework componentizes an object-oriented system to re-modularize the existing assets for
supporting service functionality. We introduced an approach for identifying, modeling, and pack-
aging critical business services embedded in an existing system. In addition to producing reusable
components realizing the identified services, the framework also provides a component-based in-
tegration approach to migrate an object-oriented design to a service-oriented architecture. Our
initial evaluation has shown that our framework is effective in identifying services from an object-oriented design and migrating it to a service-oriented architecture. Moreover, the BView, SView, ServView, and CompView built by our framework help users gain an understanding of the system.
Appendix A
Top-Level Services of Jetty
ID    Top-Level Service              Atomic Services    Description
T1    Win32 Server                   248                Runs the Jetty as a Windows HTTP server.
T2    Dynamic Servlet Invoker        207                Invokes anonymous servlets that have not been defined in the web.xml or by other means.
T3    Jetty Server MBean             126                Configures a request log, which records all incoming HTTP requests.
T4    Proxy Request Handler          113                Makes the HTTP/1.1 proxy requests.
T5    XML Configuration MBean        87                 Performs all required configurations for running the SESM applications in Jetty containers.
T6    Web Application MBean          86                 Manages web applications' lifecycle.
T7    Administration Servlet         56                 Jetty Administration Servlet. Allows start and/or stop of server components and control of debug parameters.
T8    CGI Servlet                    49                 Runs CGI servlets on Windows.
T9    Host Socket Listener           46                 Declares a socket listener for a Jetty HTTP server.
T10   Web Configuration              34                 Creates web container configurations.
Table A.1: Top-Level Services of Jetty (1).
ID    Top-Level Service              Atomic Services    Description
T11   Authentication Access Handler  30                 Creates an authentication access handler for HTTP pages.
T12   Servlet Response Wrapper       27                 Wraps a Jetty HTTP response as a 2.2 Servlet response.
T13   IP Access Handler              18                 Creates a handler to authenticate access from certain IP addresses.
T14   Multipart Form Data Filter     16                 Decodes the multipart/form-data stream sent by an HTML form that uses a file input item.
T15   HTML Script Block              12                 Represents the script block in an HTML form.
T16   Applet Block                   9                  Represents the applet block in an HTML form.
Table A.2: Top-Level Services of Jetty (2).
Appendix B
Top-Level Services of Apache Ant
ID    Top-Level Service              Atomic Services    Description
T1    Project Building               205                Runs Ant on a supplied build file.
T2    JAR File Expansion             164                Unzips a jar file.
T3    WAR File Creation              152                Creates Web Application Archive files.
T4    TAR File Creation              144                Creates a tar archive.
T5    Zip File Expansion             117                Unzips a zip file.
T6    SQL Statement Execution        116                Executes a series of SQL statements via JDBC to a database.
T7    JUnit Invocation               114                Runs tests from the JUnit testing framework.
T8    JAR File Creation              113                Jars a set of files.
T9    TAR File Expansion             95                 Expands a tar file.
T10   File Packing                   92                 Packs a file using the GZip or BZip2 algorithm.
T11   Unit Test Execution            86                 Executes a unit test in the org.apache.testlet framework.
T12   WAR File Expansion             83                 Unzips a war file.
T13   RPM Invocation                 81                 Invokes the rpm executable to build a Linux installation file.
T14   File Content Loading           80                 Loads a file's contents as Ant properties.
T15   Metamata MParse Invocation     71                 Invokes the Metamata MParse compiler-compiler on a grammar file.
T16   CAB File Creation              67                 Creates Microsoft CAB Archive files.
Table B.1: Top-Level Services of Apache Ant (1).
ID    Top-Level Service                     Atomic Services    Description
T17   SSH File Copy                         67                 Copies files to or from a remote server using SSH.
T18   Build File DTD Generation             67                 Generates a DTD for Ant build files that contains information about all tasks currently known to Ant.
T19   File Encoding Converting              65                 Converts files from native encodings to ASCII with escaped Unicode.
T20   Task Adding                           59                 Adds a task definition to the current project, such that this new task can be used in the current project.
T21   Zip File Creation                     57                 Creates a zip file.
T22   Macro Task Definition                 56                 Defines a new task as a macro built up upon other tasks.
T23   Path Converting                       56                 Converts a path format from one platform to another platform.
T24   FTP Implementation                    56                 Implements a basic FTP client that can send, receive, list, and delete files, and create directories.
T25   XML File Checking                     54                 Checks that XML files are valid (or only well-formed).
T26   File Expansion                        52                 Expands a file packed using GZip or BZip2.
T27   Directory Property Setting            51                 Sets a property to the value of the specified file up to, but not including, the last path element.
T28   File Availability Property Setting    50                 Sets a property if a specified file, directory, class in the classpath, or JVM system resource is available at runtime.
T29   Path Property Setting                 50                 Sets a property to the last element of a specified path.
T30   Java Class Execution                  45                 Executes a Java class within the running (Ant) VM, or in another VM if the fork attribute is specified.
T31   Dependency Manifest Generation        45                 Generates a manifest that declares all the dependencies in manifest.
T32   Key Generation                        43                 Generates a key in key store.
T33   Property Setting                      43                 Sets a property (by name and value), or set of properties (from a file or resource) in the project.
T34   XML Property File Loading             43                 Loads property values from a well-formed XML file.
T35   Web Proxy Property Setting            43                 Sets Java's web proxy properties.
T36   XML Report Generation                 43                 Generates an XML report of the changes recorded in a CVS repository.
Table B.2: Top-Level Services of Apache Ant (2).
ID    Top-Level Service                       Atomic Services    Description
T37   File Token Identification               40                 Identifies keys in files, delimited by special tokens, and translates them with values read from resource bundles.
T38   Java Class Instrumenting                39                 Instruments Java classes using the iContract DBC preprocessor.
T39   Existing Task Instrumenting             39                 Defines a new task by instrumenting an existing task with default values for attributes or child elements.
T40   File Loading                            39                 Loads a file into a property.
T41   Splash Screen Display                   38                 Displays a splash screen.
T42   File Set Packing                        37                 GZips a set of files.
T43   CVS Pass Entry Adding                   37                 Adds entries to a .cvspass file.
T44   File Checksum Generation                36                 Generates a checksum for a file or set of files.
T45   Default Exclude Pattern Modification    36                 Modifies the list of default exclude patterns from within your build file.
T46   JDepend Invocation                      35                 Invokes the JDepend parser.
T47   Time Stamp Setting                      35                 Sets the DSTAMP, TSTAMP, and TODAY properties in the current project, based on the current date and time.
T48   GZip File Expansion                     34                 Expands a GZip file.
T49   File Concatenation                      34                 Concatenates multiple files into a single one or to Ant's logging system.
T50   Directory Synchronization               34                 Synchronizes two directory trees.
T51   Condition Property Setting              34                 Sets a property if a certain condition holds true.
T52   File Version Checking                   34                 Sets a property if a given target file is newer than a set of source files.
T53   Telnet Session Generation               34                 Automates a remote telnet session.
T54   Attribute Permission Change             33                 Changes the permissions and/or attributes of a file or all files inside the specified directories.
T55   Build File Importing                    32                 Imports another build file and potentially overrides targets in it with users' own targets.
T56   JJTree Invocation                       32                 Invokes the JJTree preprocessor for the JavaCC compiler-compiler.
T57   Resource Search                         32                 Finds a class or resource.
T58   Temp File Generation                    31                 Generates a name for a new temporary file and sets the specified property to that name.
T59   Remote Command Execution                30                 Executes a command on a remote server using SSH.
T60   Manifest Creation                       29                 Creates a manifest file.
Table B.3: Top-Level Services of Apache Ant (3).
ID    Top-Level Service                 Atomic Services    Description
T61   Documentation Generation          29                 Generates code documentation using the javadoc tool.
T62   XSLT Transformation               29                 Processes a set of documents via XSLT.
T63   CVS Repository Retrieval          29                 Handles packages/modules retrieved from a CVS repository.
T64   SMTP Email Sending                28                 Sends SMTP emails.
T65   User Input                        28                 Allows user interaction during the build process by displaying a message and reading a line of input from the console.
T66   JProbe Invocation                 27                 Invokes the JProbe suite.
T67   Stylebook Invocation              26                 Executes the Apache Stylebook documentation generator.
T68   File Comparison                   26                 Compares a set of source files with a set of target files; if any of the source files is newer than any of the target files, all the target files are removed.
T69   JavaCC Invocation                 26                 Invokes the JavaCC compiler-compiler on a grammar file.
T70   Regular Expression Replacement    25                 Replaces the occurrence of a given regular expression with a substitution pattern in a file or set of files.
T71   JJDoc Invocation                  25                 Invokes the JJDoc documentation generator for the JavaCC compiler-compiler.
T72   Current Property Listing          25                 Lists the current properties.
T73   EAA File Creation                 24                 Creates Enterprise Application Archive files.
T74   File Permission Change            23                 Changes the permissions of a file or all files inside the specified directories.
T75   File Deletion                     23                 Deletes either a single file, all files and subdirectories in a specified directory, or a set of files specified by one or more FileSets.
T76   Data Type Adding                  23                 Adds a data-type definition to the current project, such that this new type can be used in the current project.
T77   Change Report File Generation     23                 Generates an XML-formatted report file of the changes between two tags or dates recorded in a CVS repository.
T78   File Move                         21                 Moves a file to a new file or directory, or a set(s) of file(s) to a new directory.
T79   Log Recording                     21                 Runs a listener that records the logging output of the build-process events to a file.
T80   Project Building Termination      21                 Exits the current build by throwing a BuildException, optionally printing additional information.
Table B.4: Top-Level Services of Apache Ant (4).
ID     Top-Level Service                Atomic Services    Description
T81    Property File Creation           21                 Creates or modifies property files.
T82    MMetrics Computation             19                 Computes the metrics of a set of Java source files, using the Metamata Metrics/WebGain Quality Analyzer source-code analyzer.
T83    Script Execution                 19                 Executes a script in an Apache BSF-supported language.
T84    TAB Updating                     18                 Modifies a file to add or remove tabs, carriage returns, line feeds, and EOF characters.
T85    URL File Retrieval               18                 Gets a file from a URL.
T86    Extension Checking               18                 Checks whether an extension is present in a file set or an extension set. If the extension is present, the specified property is set.
T87    Command Execution                17                 Executes a system command.
T88    File Modification Time Change    17                 Changes the modification time of a file and possibly creates it at the same time.
T89    Sound File Execution             17                 Plays a sound file at the end of the build, according to whether the build failed or succeeded.
T90    ANTLR Invocation                 17                 Invokes the ANTLR Translator generator on a grammar file.
T91    JNI Header Generation            17                 Generates JNI headers from a Java class.
T92    String Replacement               16                 Replaces the occurrence of a given string with another string in a selected file.
T93    MAudit Computation               15                 Performs static analysis on a set of Java source-code and byte-code files, using the Metamata Metrics/WebGain Quality Analyzer source-code analyzer.
T94    Directory Creation               15                 Creates a directory.
T95    Text Output                      15                 Echoes text to System.out or to a file.
T96    File Copying                     13                 Copies a file or Fileset to a new file or directory.
T97    File Group Ownership Change      12                 Changes the group ownership of a file or all files inside the specified directories.
T98    Project Filter Setting           12                 Sets a token filter for this project, or reads multiple token filters from a specified file and sets these as filters.
T99    Source Code Extraction           12                 Allows the user to extract the latest edition of the source code from a PVCS repository.
T100   File Ownership Change            11                 Changes the owner of a file or all files inside the specified directories.
T101   JAR File Information Display     9                  Displays the "Optional Package" and "Package Specification" information contained within the specified jars.
Table B.5: Top-Level Services of Apache Ant (5).
Bibliography
[1] Software product evaluation-quality characteristics and guidlines for their use.ISO/IEC
Standard ISO-9129, 1991.
[2] Apache Ant. A Java-based build tool.http://ant.apache.org/, 2006.
[3] Jagdish Bansiya and Carl G Davis. A class cohesion metric for object-oriented designs.
Journal of Object-Oriented Programming, 11:47–52, January 1999.
[4] Jagdish Bansiya and Carl G Davis. A hierarchical model for object-oriented design quality
assessment.IEEE Transactions on Software Engineering, 28:4–17, January 2002.
[5] V. Basili, L. Briand, and W. Melo. A validation of object-oriented design metrics as quality
indicators.IEEE Transactions on Software Engineering, 22:751–761, October 1996.
[6] L. Belady and C. Evangelisti. System partitioning and its measure.Journal of Systems and
Software, 2:23–29, 1981.
[7] Martin Bernauer, Gerti Kappel, and Gerhard Kramler. Repre-
senting XML Schema in UML - a comparison of approaches.
http://www.big.tuwien.ac.at/research/publications/2003/1303.pdf, 2003.
[8] Martin Bernauer, Gerti Kappel, and Gerhard Kramler. A UML profile for XML Schema.
Technical report, Business Informatics Group and Vienna University of Technology, 2003.
130
BIBLIOGRAPHY 131
[9] T. Biggerstaff, B. Mitbander, and D. Webster. The concept assignment problem in pro-
gram understanding. InProceedings of the 15th International Conference on Software
Engineering (ICSE), pages 482–498, Baltimore, Maryland, USA, May 1993.
[10] Bison. The YACC-compatible parser generator.http://dinosaur.compilertools.net/#bison,
2006.
[11] G. Booch, M. Christerson, M. Fuchs, and J. Koistinen. UML for XML Schema mapping
specification.Rational White Paper, December 1999.
[12] B. Borges, K. Holley, and A. Arsanjani. Delving into service-oriented architecture.
http://www.developer.com/java/ent/article.php/3409221, 2006.
[13] Marcus A. S. Boxall and Saeed Araban. Interface metrics for reusability analysis of components. In Proceedings of the Australian Software Engineering Conference (ASWEC), pages 40–51, April 2004.
[14] L. C. Briand, J. W. Daly, and J. K. Wust. A unified framework for coupling measurement in object-oriented systems. IEEE Transactions on Software Engineering, 25:91–121, January-February 1999.
[15] L. C. Briand, S. Morasca, and V. Basili. Measuring and assessing maintainability at the
end of high-level design. In Proceedings of the IEEE Conference on Software Maintenance
(ICSM), pages 74–81, Montreal, Canada, September 1993.
[16] A. Brown, S. Johnston, and K. Kelly. Using service-oriented architecture and component-
based development to build web service applications. Santa Clara, CA: Rational Software
Corporation, 2002.
[17] E. Burd and M. Munro. Evaluating the use of dominance trees for C and COBOL. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 401–410, September 1999.
[18] Gianluigi Caldiera and Victor R. Basili. Identifying and qualifying reusable software components. IEEE Computer, 24:61–70, February 1991.
[19] David Carlson. Modeling XML Applications with UML: Practical e-Business Applications.
Addison Wesley Professional, 2001.
[20] Alexander Chatzigeorgiou and George Stephanides. Entropy as a measure of object-
oriented design quality. In Proceedings of the Balkan Conference in Informatics (BCI),
pages 565–573, November 2003.
[21] K. Chen and V. Rajlich. Case study of feature location using dependence graph. In Proceedings of the 8th International Workshop on Program Comprehension (IWPC), pages 241–249, Limerick, Ireland, June 2000.
[22] S. R. Chidamber and C. F. Kemerer. Towards a metrics suite for object oriented design.
In Proceedings of the Conference on Object-Oriented Programming: Systems, Languages
and Applications (OOPSLA), SIGPLAN Notices 26(11), November 1991.
[23] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE
Transactions on Software Engineering, 20:476–493, June 1994.
[24] Y. Chiricota, F. Jourdan, and G. Melancon. Software components capture using graph
clustering. In Proceedings of the International Workshop on Program Comprehension
(IWPC), pages 217–226, May 2003.
[25] E. Cho, M. Kim, and S. Kim. Component metrics to measure component quality. In Proceedings of the 8th Asia-Pacific Software Engineering Conference (APSEC), pages 419–426, Macau SAR, China, December 2001.
[26] D. Cimitile and G. Visaggio. Software salvaging and call dominance tree. Journal of Systems and Software, 28:117–127, February 1992.
[27] R. Conrad, D. Scheffner, and J. C. Freytag. XML conceptual modeling using UML. In
Proceedings of the 19th International Conference on Conceptual Modeling, pages 558–
571, Salt Lake City, Utah, USA, October 2000.
[28] J. Daly, A. Brooks, J. Miller, J. Topber, and M. Wood. The effect of inheritance depth
on the maintainability of object-oriented software. Empirical Software Engineering: An
International Journal, 1:751–761, February 1996.
[29] J. Eder, G. Kappel, and M. Schrefl. Coupling and cohesion in object-oriented systems.
Technical report, University of Klagenfurt, 1994.
[30] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon. Locating features in source code.
IEEE Transactions on Software Engineering, 29(3):210–224, March 2003.
[31] L. H. Etzkorn and C. G. Davis. Automatically identifying reusable OO legacy code. Computer, 30:66–71, October 1997.
[32] R. Fanta and V. Rajlich. Reengineering object-oriented code. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 238–246, Bethesda, Maryland, March 1998.
[33] Flex. A fast scanner generator. http://dinosaur.compilertools.net/#flex, 2006.
[34] P. Fremantle, S. Weerawarana, and R. Khalaf. Enterprise services. Communications of the
ACM, 45(10):77–80, 2002.
[35] G. C. Gannod, S. V. Mudiam, and T. E. Lindquist. An architectural-based approach for synthesizing and integrating adapters for legacy software. In Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE), pages 128–139, Brisbane, Australia, November 2000.
[36] Jean-François Girard and Rainer Koschke. Finding components in a hierarchy of modules: a step towards architectural understanding. In Proceedings of the 13th International Conference on Software Maintenance (ICSM), pages 58–65, Bari, Italy, October 1997.
[37] U. Gleich and T. Kohler. Tool-support for reengineering of object-oriented systems. In
Proceedings of ESEC-FSE/Workshop on Object-Oriented Reengineering, pages 43–51,
Zurich, Switzerland, September 1997.
[38] W. G. Griswold, J. J. Yuan, and Y. Kato. Exploiting the map metaphor in a tool for software
evolution. In Proceedings of the 23rd International Conference on Software Engineering
(ICSE), pages 265–274, Toronto, Canada, May 2001.
[39] CGI Group. Component mining: An approach for identifying reusable components from
legacy systems. http://www.cgi.com/cgi/pdf/cgiwhpr 07 mining e.pdf, 2004.
[40] W3C Working Group. Web service architecture. http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, 2006.
[41] Yann-Gaël Guéhéneuc and Hervé Albin-Amiot. Recovering binary class relationships: Putting icing on the UML cake. In Proceedings of the 19th Annual ACM Conference on
Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages
301–314, Vancouver, Canada, October 2004.
[42] George Yanbing Guo, Joanne M. Atlee, and Rick Kazman. A software architecture reconstruction method. In Proceedings of the 1st Working IFIP Conference on Software Architecture, pages 225–243, San Antonio, TX, USA, February 1999.
[43] D. Hutchens and V. Basili. System structure analysis: Clustering with data bindings. IEEE
Transactions on Software Engineering, 11(8):749–757, August 1985.
[44] JavaCC. Java compiler compiler. https://javacc.dev.java.net/, 2006.
[45] Jess. A rule engine for the Java platform. http://www.jessrules.com/jess/index.shtml, 2005.
[46] Jetty. A Java HTTP server and servlet container. http://jetty.mortbay.org/jetty/index.html, 2006.
[47] Jini. Jini network technology. http://www.sun.com/software/jini/, 2006.
[48] Rick Kazman and S. Jeromy Carriere. View extraction and view fusion in architectural
understanding. In Proceedings of the 5th International Conference on Software Reuse,
pages 290–299, Victoria, BC, Canada, May 1998.
[49] Wing Lam and Venky Shankararaman. An enterprise integration methodology. IT Professional, 6(2):40–49, 2004.
[50] Lex. A lexical analyzer generator. http://dinosaur.compilertools.net/#lex, 2006.
[51] Shimin Li and Ladan Tahvildari. JComp: A reuse-driven componentization framework for Java applications. In Proceedings of the International Conference on Program Comprehension (ICPC), pages 264–267, Athens, Greece, June 2006.
[52] Shimin Li and Ladan Tahvildari. A service-oriented componentization framework for Java software systems. In Proceedings of the 13th IEEE Working Conference on Reverse Engineering (WCRE), Benevento, Italy, October 2006.
[53] Jing Luo, Renkuan Jiang, Lu Zhang, Hong Mei, and Jiasu Sun. An experimental study of two graph analysis based component capture methods for object-oriented systems. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 217–226, May 2003.
[54] S. Mancoridis, B. Mitchell, Y. Chen, and E. R. Gansner. Bunch: A clustering tool for the recovery and maintenance of software system structures. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 50–62, Oxford, UK, August 1999.
[55] S. Mancoridis, B. Mitchell, C. Rorres, and Y. Chen. Using automatic clustering to produce
high-level system organizations of source code. In Proceedings of International Workshop
on Program Comprehension (IWPC), pages 45–53, Ischia, Italy, June 1998.
[56] M. Marin, A. Deursen, and L. Moonen. Identifying aspects using fan-in analysis. In
Proceedings of the 11th Working Conference on Reverse Engineering (WCRE), pages 132–
141, Delft University of Technology, Netherlands, November 2004.
[57] J. Martin and H. A. Muller. C to Java migration experiences. In Proceedings of the
6th European Conference on Software Maintenance and Reengineering, pages 143–153,
Budapest, Hungary, March 2003.
[58] Steve McConnell. Code Complete. Microsoft Press, Redmond, Washington, USA, 1993.
[59] Alok Mehta and George T. Heineman. Evolving legacy systems features using regression test cases and components. In the 4th International Workshop on Principles of Software Evolution (IWPSE), pages 190–193, Vienna, Austria, September 2001.
[60] Alok Mehta and George T. Heineman. Evolving legacy system features into fine-grained
components. In the 24th International Conference on Software Engineering (ICSE), pages
417–427, Buenos Aires, Argentina, May 2002.
[61] Robert Morgan. Building an Optimizing Compiler. Butterworth-Heinemann, Boston, Massachusetts, 1998.
[62] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, San Francisco, California, 1997.
[63] H. Muller, M. Orgun, S. Tilley, and J. Uhl. A reverse engineering approach to subsystem
structure identification. Journal of Software Maintenance: Research and Practice, 5:181–
204, 1993.
[64] H. Muller and J. Uhl. Composing subsystem structures using (k,2)-partite graphs. In
Proceedings of International Conference on Software Maintenance (ICSM), pages 12–19,
San Diego, November 1990.
[65] OMG. UML 2.0 Superstructure Specification. Object Management Group, Framingham,
Massachusetts, USA, October 2004.
[66] Margaretha W. Price and Steven A. Demurjian. Analyzing and measuring reusability in
object-oriented design. In Proceedings of the 12th ACM SIGPLAN Conference on Object-
Oriented Programming, Systems, Languages, and Applications, pages 22–33, Atlanta,
Georgia, United States, October 1997.
[67] W. Provost. UML for W3C XML Schema design.
http://www.xml.com/pub/a/2002/08/07/wx-suml.html, 2006.
[68] RCP. Rich Client Platform. www.eclipse.org/rcp, 2005.
[69] M. P. Robillard and G. C. Murphy. Concern graphs: Finding and describing concerns using
structural program dependencies. In Proceedings of the 24th International Conference on
Software Engineering (ICSE), pages 406–416, Buenos Aires, Argentina, May 2002.
[70] O. P. Rotaru and M. Dobre. Reusability metrics for software components. In Proceedings
of the 3rd International Conference on Computer Systems and Applications (AICCSA),
pages 24–32, Cairo, Egypt, January 2005.
[71] N. Routledge, L. Bird, and A. Goodchild. UML and XML Schema. In Proceedings of
the 13th Australian Database Conference (ADC), pages 274–281, Melbourne, Australia,
February 2002.
[72] SDMetrics. SDMetrics User Manual. http://www.sdmetrics.com/manual/LOMetrics.html,
2006.
[73] Subhash Sharma. Applied Multivariate Techniques. John Wiley, 1996.
[74] S. C. Shaw, M. Goldstein, M. Munro, and E. Burd. Moral dominance relations for program
comprehension. IEEE Transactions on Software Engineering, 29:851–863, September
2003.
[75] Suk Kyung Shin and Soo Dong Kim. A method to transform object-oriented design into component-based design using Object-Z. In Proceedings of the International Conference on Software Engineering Research, Management and Applications (SERA), pages 274–281, August 2005.
[76] A. Shokoufandeh, S. Mancoridis, and M. Maycock. Applying spectral methods to software
clustering. In Proceedings of the Working Conference on Reverse Engineering (WCRE),
pages 3–10, November 2002.
[77] H. M. Sneed. Encapsulating legacy software for use in client/server systems. In Proceedings of the Working Conference on Reverse Engineering (WCRE), pages 104–119, November 1996.
[78] G. Snider. Measuring the entropy of large software systems. HP Technical Report HPL-
2001-221, 2001.
[79] T. A. Standish. An essay on software reuse. IEEE Transactions on Software Engineering,
10:494–497, September 1984.
[80] Ladan Tahvildari. Quality-Driven Object-Oriented Re-engineering Framework. PhD Thesis, Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada, August 2003.
[81] Ladan Tahvildari. Testing challenges in adoption of component-based software. In Proceedings of the ICSE Workshop on Adoption-Centric Software Engineering (ACSE), pages 21–25, Edinburgh, Scotland, May 2004.
[82] Ladan Tahvildari and Kostas Kontogiannis. Improving design quality using meta-pattern transformations: A metric-based approach. Journal of Software Maintenance and Evolution: Research and Practice (JSME), 16(4), 2003.
[83] Ladan Tahvildari and Kostas Kontogiannis. Develop a multi-objective decision approach for selecting source-code improving transformations. In Proceedings of the 20th International Conference on Software Maintenance (ICSM), pages 427–431, Chicago, Illinois, USA, September 2004.
[84] Ladan Tahvildari and Kostas Kontogiannis. Quality-driven object-oriented code restructuring. In Proceedings of the ICSE Workshop on Software Quality, pages 47–52, Edinburgh, Scotland, May 2004.
[85] Ladan Tahvildari and Kostas Kontogiannis. Requirements driven software evolution. In
Proceedings of the 12th IEEE International Workshop on Program Comprehension (IWPC),
pages 258–269, Bari, Italy, June 2004.
[86] Ladan Tahvildari, Kostas Kontogiannis, and John Mylopoulos. Quality-driven software re-engineering. Journal of Systems and Software (JSS), Special Issue on: Software Architecture - Engineering Quality Attributes, 66(3):225–239, June 2003.
[87] P. Tonella, G. Antoniol, R. Fiutem, and E. Merlo. Points-to analysis for program understanding. In Proceedings of the 5th International Workshop on Program Comprehension (IWPC), pages 90–99, May 1997.
[88] W3C. XML Schema Part I: Structures second edition. http://www.w3.org/TR/xmlschema-1/, 2006.
[89] Ju An Wang. Towards component-based software engineering. Computing Sciences in
Colleges, 16:177–189, October 2000.
[90] H. Washizaki and Y. Fukazawa. A technique for automatic component extraction from
object-oriented programs by refactoring. Science of Computer Programming, 56:99–116,
April 2005.
[91] H. Washizaki, H. Yamamoto, and Y. Fukazawa. A metrics suite for measuring reusability
of software components. In Proceedings of the International Software Metrics Symposium (METRICS), pages 211–223, September 2003.
[92] N. Wilde, M. Buckellew, H. Page, and V. Rajlich. A case study of feature location in unstructured legacy Fortran code. In Proceedings of the 5th European Conference on Software Maintenance and Reengineering (CSMR), pages 68–75, Lisbon, Portugal, March 2001.
[93] N. Wilde and M.C. Scully. Software reconnaissance: Mapping program features to code.
Journal of Software Maintenance: Research and Practice, 7:49–62, January 1995.
[94] W. E. Wong, S. S. Gokhale, and J. R. Hogan. Quantifying the closeness between program
components and features. Journal of Systems and Software, 54(2):87–98, October 2000.
[95] W. E. Wong, S. S. Gokhale, J. R. Hogan, and K. S. Trivedi. Locating program features using execution slices. In Proceedings of the IEEE Symposium on Application-Specific Systems and Software Engineering and Technology, pages 194–203, Richardson, Texas, USA, March 1999.
[96] W. Eric Wong and J. Jenny Li. Redesigning legacy systems into the object-oriented paradigm. In Proceedings of the International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), Hakodate, Hokkaido, Japan, May 2003.
[97] X. Xu, C. H. Lung, M. Zaman, and A. Srinivasan. Program restructure through clustering technique. In Proceedings of the International Workshop on Source Code Analysis and Manipulation (SCAM), pages 75–84, September 2004.
[98] Yacc. Yet another compiler-compiler. http://dinosaur.compilertools.net/#yacc, 2006.
[99] Zhuopeng Zhang, Ruimin Liu, and Hongji Yang. Service identification and packaging
in service oriented reengineering. In Proceedings of the 7th International Conference
on Software Engineering and Knowledge Engineering (SEKE), pages 241–249, Taipei,
Taiwan, China, July 2005.
[100] Wei Zhao, Lu Zhang, Yin Liu, Jiasu Sun, and Fuqing Yang. SNIAFL: Towards a static non-interactive approach to feature location. In Proceedings of the 26th International Conference on Software Engineering (ICSE), pages 293–303, Scotland, UK, May 2004.
[101] Ying Zou and Kostas Kontogiannis. Towards a web-centric legacy system migration. In
Proceedings of ICSE Workshop on Net-Centric Computing (NCC), May 2001.