Reverse Engineering Object-Oriented Distributed Systems
Dan C. Cosma
LOOSE Research Group
“Politehnica” University of Timisoara, Romania
Overview
Understanding object-oriented distributed software applications by reverse engineering the source code,
focusing on the distribution-related aspects of the system,
using a structural, technology-aware analysis approach
Distributed Software
The distributed aspect is crucial for understanding
- systems are specifically built for distributed problems
- technology dependence: communication infrastructure
Making the distributed aspect central makes the analysis easy
- without ignoring the local functionality concerns
Methodology for understanding object-oriented distributed systems
meta-model
reverse engineering techniques
metrics
visualization
tool
A reverse engineering process
System Representation
Model
Augments an OO meta-model (Memoria): makes the distributed aspect a main concept
distributable feature -- feature directly involved in the distributed functionality, either by providing remote services, or by directly using such services
frontier classes -- act at the frontier between the system and the communication infrastructure (“communication mediator”)
Model Overview
[Figure: model overview. Labels: System, Mediator, Frontier, Frontier Class, Core Class, Acquaintance Class, Class, Distributable Feature Core, Distributable Feature, Local Feature]
The Approach
0: Initial graph of classes
   vertex: a class
   edge: method call / attribute access / inheritance relation
1: Build the dependency graph of distributable features (DGDF)
2: Separate distinct cores of distributable features
3: Capture coarse-grained architecture of distributable features
4: Assess impact of distributable features
5: Support for restructuring
[Figure labels: Mediator, remote call, frontier class, core class, utility class, core of distr. feat., extracted new feature]
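As an illustration of the initial graph of classes, the sketch below stores one vertex per class and counts the relations between each ordered pair of classes by kind. It is only a sketch under assumptions: the names `EdgeKind` and `build_class_graph`, the dictionary layout, and the example class names are hypothetical and do not come from the presented tool.

```python
from collections import Counter
from enum import Enum


class EdgeKind(Enum):
    """The three relation kinds named on the slide."""
    METHOD_CALL = "method call"
    ATTRIBUTE_ACCESS = "attribute access"
    INHERITANCE = "inheritance"


def build_class_graph(relations):
    """Return {(source, target): Counter of EdgeKind} for class-to-class relations."""
    graph = {}
    for source, target, kind in relations:
        if source == target:
            continue  # assumption: relations inside a single class are ignored
        graph.setdefault((source, target), Counter())[kind] += 1
    return graph


# Hypothetical example input; the class names are invented for illustration.
relations = [
    ("OrderClient", "OrderServiceStub", EdgeKind.METHOD_CALL),
    ("OrderClient", "OrderServiceStub", EdgeKind.METHOD_CALL),
    ("OrderServiceImpl", "OrderService", EdgeKind.INHERITANCE),
    ("OrderServiceImpl", "OrderStore", EdgeKind.ATTRIBUTE_ACCESS),
]
graph = build_class_graph(relations)
```

Keeping the relation kind on each edge leaves room for later steps to weight calls, accesses, and inheritance differently.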
The case studies
Java / RMI
I. Core analysis
Goals
Find the core entities involved in the distributed functionality
Get an overview of the distributed architecture
Identify the Frontier - technology-dependent rules
Build a Core Graph
Start with the Frontier Classes: the best starting points describing involvement in distribution
Incrementally add new vertices and edges until a configurable depth of search is reached
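The incremental construction can be read as a depth-bounded breadth-first search that starts from the frontier classes. The sketch below shows that reading only; the function name `build_core_graph`, the choice to treat dependencies as undirected during the search, and the example class names are assumptions rather than details from the slides.

```python
from collections import deque


def build_core_graph(edges, frontier, max_depth):
    """Grow a graph outward from the frontier classes, up to max_depth hops."""
    # Assumption: dependencies are followed in both directions during the search.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    depth = {cls: 0 for cls in frontier}
    queue = deque(frontier)
    core_edges = set()
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # the configurable depth of search is reached
        for nxt in neighbours.get(current, ()):
            core_edges.add(frozenset((current, nxt)))
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return set(depth), core_edges


# Hypothetical chain of dependencies: Stub - Service - Store - Logger.
edges = [("Stub", "Service"), ("Service", "Store"), ("Store", "Logger")]
vertices, core_edges = build_core_graph(edges, frontier=["Stub"], max_depth=2)
```

With a depth of 2, the search keeps `Stub`, `Service`, and `Store` and leaves `Logger` outside the core graph.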
Identify the Distributable Feature Cores
Detect and remove edges that connect loosely coupled sets of classes - technology-aware and cohesion-based heuristics
The resulting connected components: candidate DF cores
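A minimal sketch of this step follows. The slides do not spell out the technology-aware and cohesion-based heuristics, so a plain threshold on an edge weight stands in for them here; the threshold, the weights, and the name `candidate_df_cores` are all assumptions.

```python
def candidate_df_cores(weighted_edges, min_weight):
    """Drop weak edges, then return the connected components that remain."""
    adjacency = {}
    for (a, b), weight in weighted_edges.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if weight >= min_weight:  # stand-in heuristic: keep only strong couplings
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical weights: the B-C edge is the loose coupling that gets removed.
weighted_edges = {("A", "B"): 5, ("B", "C"): 1, ("C", "D"): 4}
cores = candidate_df_cores(weighted_edges, min_weight=2)
```

In this example the weak `B`-`C` edge is removed, which leaves two candidate cores for the engineer to review.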
Identify the remote communication channels
The engineer reviews the result
Classes in DF cores: ~10%
Architectural preview: the Distributed Architecture Perspective
II. Impact of distribution
Goals
focus on the rest of the classes in the system (the majority)
evaluate their involvement in providing the Distributable Features
identify the classes that follow the main patterns of involvement
make system-level and class-level characterizations
Class involvement
Set of coupling-based metrics
The collaboration of a class with the entire system: Total bidirectional coupling (TBC)
Involvement in providing a particular DF: Acquaintance with a Distributable Feature (ADF)
Involvement in providing all DFs = involvement in distribution: Total Acquaintance with Distributable Features (TADF)
System-wide “distributed awareness”: Average Total Acquaintance with Distributable Features (Average TADF)
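The slides name these metrics without giving formulas, so the sketch below is one plausible reading and not the author's definitions. It assumes that coupling is a count of collaborations per ordered class pair, that ADF counts the collaborations between a class and the core classes of one distributable feature, that TADF sums ADF over all features, and that Average TADF is the mean over a chosen set of classes. All function and class names are hypothetical.

```python
def tbc(coupling, cls):
    """Total bidirectional coupling: all collaborations of cls, in both directions."""
    return sum(n for (a, b), n in coupling.items() if cls in (a, b))


def adf(coupling, cls, core):
    """Acquaintance of cls with one distributable feature, given its core classes."""
    return sum(
        n for (a, b), n in coupling.items()
        if (a == cls and b in core) or (b == cls and a in core)
    )


def tadf(coupling, cls, cores):
    """Total acquaintance: ADF summed over every distributable feature core."""
    return sum(adf(coupling, cls, core) for core in cores)


def average_tadf(coupling, classes, cores):
    """Mean TADF over the given classes (assumed system-wide measure)."""
    return sum(tadf(coupling, c, cores) for c in classes) / len(classes)


# Hypothetical data: two single-class cores and two classes outside them.
coupling = {
    ("Report", "Replicator"): 3,
    ("Replicator", "Report"): 1,
    ("Report", "Formatter"): 6,
    ("Formatter", "PeerManager"): 2,
}
cores = [{"Replicator"}, {"PeerManager"}]
report_tbc = tbc(coupling, "Report")
report_tadf = tadf(coupling, "Report", cores)
system_average = average_tadf(coupling, ["Report", "Formatter"], cores)
```

Here `Report` has a TBC of 10 and a TADF of 4, so less than half of its collaboration is distribution-related under these assumed definitions.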
Visualization
[Figure: axes are Total Collaboration Dispersion, Total Collaboration Intensity, Dispersion of Feature Acquaintance, Intensity of Feature Acquaintance]
- intensity: no. of collaborations
- dispersion: no. of collaborators
- gray: total collaboration
- color: distribution-related collaboration
The Feature Affiliation Perspective
Visualization Example
Part of the visualization for EHCACHE
Patterns of Involvement
How does a class participate in providing the DFs
- The main patterns of involvement were detected (Patterns of acquaintance)
- Define and use a set of detection strategies [Marinescu04] to detect the classes following a certain pattern
- Put the visualization to use: see the interesting classes
Pattern I. Significant Feature Acquaintance (Big Color Box)
Class has significant involvement with the distributed functionality
Detection strategy:
- Total coupling with distributable features is high: TADF ≥ HIGH
AND
- Class is mostly coupled with distributable features: TADF / TBC ≥ AVERAGE
Pattern II. Local Feature Contributor (Big Gray)
Class has significant involvement with local (non-distributed) functionality
Detection strategy:
- Class is strongly coupled with the other classes in the system: TBC ≥ HIGH
AND
- Class has (almost) no relation with the distributable features: TADF / TBC ≤ LOW
Pattern III. Connector Class (Color-Spotted Gray)
Class connects a local feature with a distributed one
Detection strategy:
- Class has significant coupling with the distributable features: TADF ≥ AVERAGE
AND
- Class has significant coupling with other classes in the system: LOW < TADF / TBC ≤ AVERAGE
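The three detection strategies can be written as simple threshold checks on TADF, TBC, and their ratio. The sketch below is illustrative only: the slides give symbolic thresholds (LOW, AVERAGE, HIGH) and no numbers, so every value in `Thresholds` is an assumption, as are the names `Thresholds` and `involvement_patterns`. The example input reuses the ProcessDefinition figures reported for FWS (TADF = 15, TADF/TBC = 0.2, which implies TBC = 75).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Numeric stand-ins for the symbolic LOW / AVERAGE / HIGH thresholds."""
    tadf_average: float
    tadf_high: float
    tbc_high: float
    ratio_low: float
    ratio_average: float


def involvement_patterns(tadf, tbc, t):
    """Return the names of the involvement patterns a class matches."""
    ratio = tadf / tbc if tbc else 0.0
    patterns = []
    if tadf >= t.tadf_high and ratio >= t.ratio_average:
        patterns.append("Significant Feature Acquaintance")
    if tbc >= t.tbc_high and ratio <= t.ratio_low:
        patterns.append("Local Feature Contributor")
    if tadf >= t.tadf_average and t.ratio_low < ratio <= t.ratio_average:
        patterns.append("Connector Class")
    return patterns


# Assumed threshold values, chosen only to make the example concrete.
thresholds = Thresholds(
    tadf_average=5, tadf_high=10, tbc_high=30, ratio_low=0.05, ratio_average=0.5
)
process_definition = involvement_patterns(tadf=15, tbc=75, t=thresholds)
```

Under these assumed thresholds, the ProcessDefinition figures match only the Connector Class pattern.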
[Figure panels: EHCACHE, FWS]
System-level characterization
FWS: 2 DF cores, Average TADF = 3; a lot of gray, i.e. significant local functionality [80 classes belong to a local tool; the system was initially non-distributed]
EHCACHE: 5 DF cores, Average TADF = 9; more color, i.e. more distributed functionality [documentation: the system was redesigned specifically as distributed]
Class-level characterization: Local Feature Contributor / Big Gray
• FWS
- 80 classes: the local tool for visually editing workflow specifications
- 6 classes: belonging to other local features
• EHCACHE
- Fewer than 5 classes
- Cache: highest TBC; heavily used, but local
- ConfigurationHelper: manages configuration files
Class-level characterization: Significant Feature Acquaintance / Big Color Spot
• FWS
- 5 classes, related to the Workflow Engine
- Small number => the functionality is well localized in the system
• EHCACHE
- 12 classes, related to the Cache Peer Manager
- TADF/TBC close to 1 => classes are dedicated to the distributable feature (e.g. Mutex, ConcurrencyUtil, Sync)
Class-level characterization: Connector Class / Color-Spotted Gray
• FWS
- 5 classes
- Most interesting case: ProcessDefinition
  - TADF = 15, TADF/TBC = 0.2
  - Models/stores the internal representation of workflows in execution
  - Links the classes that run the workflow (detected as Significant Feature Acquaintances) with an XML parser that reads the workflow specifications
• EHCACHE
- 6 classes
- Most interesting case: Element
  - Represents the data item cached by the system
  - The only class that has a noticeable relation with Cache Replicator
  - Links the Cache Replicator with the non-distributed feature of the system that actually stores (caches) data
III. Support for Restructuring
Goal
Apply concepts and measurements similar to those used in the analysis
to help the engineer explore / play with
tentative restructuring scenarios
Approach
Visualize (a part of) the graph of classes
Select a set of initial classes
See what happens if they are to be extracted (removed) as a separate unit:
- evaluate the redesign layout: which classes should go with those selected, which should remain in the initial system
- evaluate the cost
Apply such scenarios at will
Helpers
Metrics-based visualization to help select initial classes
- In-group Adequacy (IGA) metric
Compute the forecasted layout
- Acquaintance with Class Group (ACG)
- Configurable threshold value
Computing the extraction cost
- Extraction Cost (EC)
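The slides name ACG and EC but do not define them, so the sketch below only illustrates how a forecast and a cost could be computed. It assumes that ACG is the share of a class's coupling that goes to the selected group, that a class follows the group when its ACG reaches the configurable threshold, and that the extraction cost is the total coupling cut between extracted and remaining classes. These definitions, the function names, and the example class names are hypothetical, and the IGA metric is not covered.

```python
def coupling_with(coupling, cls, group):
    """Collaborations between cls and the classes in group, in both directions."""
    return sum(
        n for (a, b), n in coupling.items()
        if (a == cls and b in group) or (b == cls and a in group)
    )


def forecast_layout(coupling, selected, threshold):
    """Return the classes forecast to leave with the selected ones (single pass)."""
    classes = {c for pair in coupling for c in pair}
    selected = set(selected)
    extracted = set(selected)
    for cls in classes - selected:
        total = coupling_with(coupling, cls, classes - {cls})
        # Assumed ACG: share of the class's coupling directed at the selected group.
        acg = coupling_with(coupling, cls, selected) / total if total else 0.0
        if acg >= threshold:
            extracted.add(cls)
    return extracted


def extraction_cost(coupling, extracted):
    """Assumed EC: total coupling crossing the extracted/remaining boundary."""
    return sum(
        n for (a, b), n in coupling.items()
        if (a in extracted) != (b in extracted)
    )


# Hypothetical scenario: extract the class Editor and see what follows it.
coupling = {
    ("Editor", "Canvas"): 8,
    ("Canvas", "Palette"): 5,
    ("Palette", "Engine"): 4,
    ("Editor", "Engine"): 1,
}
extracted = forecast_layout(coupling, selected={"Editor"}, threshold=0.5)
cost = extraction_cost(coupling, extracted)
```

In this scenario `Canvas` follows `Editor`, and the cost is the 6 collaborations that would cross the new boundary; changing the threshold or the selected set gives a different tentative scenario.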
[Figure: panels a), b), c); axes: dispersion, intensity]
Example
Applying successive scenarios can also help improve the system understanding
IV. The Tool
niSiDe
“non-invasive Structural insight on Distributed environments”
Follows all the steps in the methodology, and provides complete support for analysis
Generates all visualizations and support diagrams
Built for extensibility
Integrated in the iPlasma environment
Conclusions
Contributions
• A methodology for understanding object-oriented distributed systems
• A model for object-oriented distributed systems
• The Distributable Features View (visualization)
• Basic restructuring support as a natural extension to the understanding techniques
• Comprehensive tool support