IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 34, NO. 4, NOVEMBER 2004 475

An Information-Theoretical Framework for Modeling Component-Based Systems

Remzi Seker, Member, IEEE, and Murat M. Tanik, Member, IEEE

Abstract—Software systems tend to be large scale and complex with the inevitable increase in their functionalities. The increasing costs related to system development and maintenance in correlation to the software size require new assessment tools for the newly evolving development methodologies. Taking advantage of existing tools and methodologies in a mature field is beneficial to relatively young, related disciplines. Therefore, this paper brings modeling techniques from a well-developed and mature discipline, information theory, into component-based software (CBS) engineering. Information-theoretic representation and analysis techniques in general, and noiseless information channel concepts in particular, are good candidates to be adopted to model the dynamic behavior of software components and quantify the interaction between them. This modeling approach is realized by first modeling the component integration units of CBS with cubic control flowgraphs. The arcs in these models can be labeled as functions of parameters of their "hidden" components in the originating nodes or arcs, or both. Each of these labeled graphs defines a Shannon language. Then, a set of metrics, labeled as pervasive Shannon metrics, is defined. Four case studies demonstrate the applicability of the proposed metrics for the assessment of CBS.

Index Terms—Capacity, component-based software, metrics, modeling, pervasive Shannon metrics, Shannon languages.

I. INTRODUCTION

THE CURRENT expectations from software systems cause them to grow in size and complexity. In order to achieve the goal of developing large-scale software systems in an efficient way, reuse of existing software becomes of critical importance [1]–[6]. Driven by the reuse principle, component-based software (CBS) engineering is becoming the preferred software development methodology to alleviate the process of building large-scale software systems [4], [7]–[11]. This software development practice is motivated by what has been accomplished in the manufacturing of electronic devices. While building electronic devices, components with well-defined interfaces and functionalities are integrated without knowledge of the actual design and circuit elements inside the components [12], [13]. Hence, a similar concept follows in developing software systems from components; the integrator does not need the source code of the components, only their functionalities and interfaces.

Manuscript received June 18, 2003; revised January 25, 2004 and March 10, 2004. This paper was recommended by Associate Editor S. H. Rubin.

R. Seker is with the Department of Computer and Software Engineering, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114-3900 USA (e-mail: [email protected]).

M. M. Tanik is with the Department of Electrical and Computer Engineering, University of Alabama at Birmingham, Birmingham, AL 35294 (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSMCC.2004.829297

The style of CBS development (CBSD) relies on the reuse of existing, well-defined components developed for integration [5], [14]. Hence, the development niche began to shift its focus from lines of code to coarser-grained components and their interconnections. CBSD is centered around the architecture, separates infrastructure from logic, and helps in dealing with systems complexity, thereby easing the large-scale development process in a cost-effective fashion [10].

The size and complexity of CBS systems require modeling [6]. A model of a system is a representation of the system's attributes of interest that are significant for the purpose of modeling [2], [15], [16]. Therefore, a model may represent some aspects of a system while ignoring others, depending on the purpose for which the model is built. Following Chestnut [15], models can be divided into three categories: iconic, analog, and symbolic. The modeling technique introduced in this paper results in models that cut across the analog and symbolic categories.

Modeling a system is not enough. A set of rules for assessment of the system with respect to the desired parameters is needed. Therefore, metrics based on models are valuable in the assessment of systems. There has been reasonable progress in the software metrics area [17]–[19] to quantify various aspects of software, such as the complexity of a software system. However, much work is needed for the assessment of CBS systems [9], [19]–[22].

There are many definitions available for components [8], [23]. The two common characteristics of all the available definitions are that the components have been developed for composition and that they have a well-defined interface.

Definition 1: A component is an "object" with: 1) an independently developed body for composition with other components and 2) a well-defined run-time interface.

Definition 2: CBSD is a software development methodology in which all life-cycle processes are based on the integration of components.

The definition we provided for CBSD is simplified, especially in terms of the life cycle. More detailed information on the life cycle of CBSD can be found in [24]. Brown and Wallnau [10] compared different component definitions, and they also stated that the need for CBSD arose from the difficulties in describing abstractions via abstract interfaces. Brereton and Budgen [8] stated the advantage of having a loose component definition to prevent a possible technology lock-in. They also noted the correlation between the gap separating requirements specification from component description languages and the difficulty of achieving a satisfactory product via integration.

The increasing size of software systems makes testing cumbersome and expensive. Therefore, providing a reliable artifact of the desired quality becomes a challenge [25], and the myth of testing not being important in a product's success is changing [26].


Hence, reliable, practical, and cost-effective assessment methods are needed to reduce the increasing cost of quality assessment for software systems [20].

Attempts to introduce metrics for CBSD began with trying to apply traditional metrics, similar to those developed for O-O methodologies, to CBSD. This approach, however, does not help much when the CBS is composed of commercial off-the-shelf (COTS) components or components that have been developed in different languages. Dumke and Winkler [27] proposed an O-O measurement framework for the management of CBSD. Cherinka et al. [28] applied static analysis techniques, mainly static slicing, to an application composed of COTS components. The analysis techniques they used were concluded to be useful, but not sufficient, for the maintenance of CBS.

Unavailability of the source code of the components being used in system development renders classical analysis and testing tools ineffective. Moreover, even if the source code for a component is accessible, the components and the application in which they are going to be used may be written in different languages. In addition, the components might have been developed on different platforms. It is also worthwhile to consider that a component may provide more functionality than the portion used within an application, and some of those functionalities might not yet have been tested by the component provider. This makes context-based testing necessary. Moreover, lack of source code alone leaves the integrator with specification-based (black-box) testing techniques [29]. Based on these facts, Harrold et al. [19] provided an approach for testing and analyzing CBS. They proposed that a component should be provided with a testing summary for the subdomains in which the component has been tested.

Communication between the components in a CBS is an important phenomenon to investigate, since the computation performed by a CBS system is based on the communication between its components. Ancona et al. [30] proposed a communication modeling approach called channel reification to model the communication between complex objects in an O-O system. They based their modeling on the computational reflection property. Computational reflection is defined as the activity performed by a computational unit when doing computations about its own computation [31]. Liu and Meersman [32] defined activity based on the specification of communication in O-O database systems. In this paper, we introduce a new modeling technique for CBS and its assessment by using the communication between and within the components.

As CBSs evolve and become distributed, the classical metrics introduced to measure certain quality attributes of a software system fail. This is caused by the absence or limitation of available information about the implementation of a component, especially by the unavailability of the source code.

II. METHODOLOGY

Taking advantage of the tools and methodologies that already exist in another, mature field becomes of critical importance for relatively young and related disciplines. The approach taken in this paper is to bring a modeling technique from a well-developed and mature discipline, specifically the noiseless channel concepts of information theory, and apply them in the relatively new area of software development, CBSD.

A CBS system is composed of component integration units (CIUs) that are, in turn, composed of components. Therefore, a CIU can be a composite component, a subsystem, or a system. The CIUs of a CBS can be represented with control flowgraphs (CFGs), specifically with cubic control flowgraphs (CCFGs). Then, composition/decomposition principles derived from the cubic graph formalism may be utilized to investigate the integration of the system [4], [17], [33]. The CCFG of a CBS is a strongly connected cubic graph. The CCFG of a system, when labeled, defines a Shannon language. Hence, the labeling of arcs is of importance for the specific concern about a parameter or set of parameters related to the system. Then, we apply the notions for computing the capacity of the noiseless channel, which is equal to the capacity of the Shannon language [34] defined on the labeled CCFG, and during this process we derive the proposed metrics.

Labeling an arc in a CCFG as a function of a parameter belonging to its originating node would disregard the "hidden" components on that arc and concentrate on the nodes of the CCFG. In this case, one would consider the components to be located within the nodes of the CCFG. However, if needed, the weights of the arcs can be made functions of their "hidden" components (along the arcs) and of their originating nodes. In this respect, our proposed modeling approach has an advantage compared to the regular use of control flowgraphs. Since a CBS can be decomposed down to its prime components, our approach remains valid for the CIUs as well. In order to show the applicability of the proposed metrics, four case studies are provided to point out some of the application areas of the metrics for CBS.

III. SHANNON LANGUAGES MODELING OF CBS AND PERVASIVE SHANNON METRICS

Representing programs by using graphs enables the application of graph-theoretical notions to solve problems in software engineering [33], [35]–[37]. One such approach is the investigation and derivation of software metrics by using flowgraphs [38].

Every program has an equivalent structured program composed of repetitive, selective, and sequential constructs [39]–[43]. Once a program is represented by a control flowgraph, the regular expression that stands for the control flowgraph can be found [43]. The regular expression written for a control flowgraph can be viewed as the expression standing for a Shannon language, for which capacity calculations can be done.

A specific class of control flowgraphs, CCFGs, is of special interest. A CCFG is a CFG that retains the cubic graph properties. Cubic graphs are preferred because it is possible to transform a graph into a cubic graph while retaining the inherent structure. Cubic graphs, although they can often serve as models for real systems, constitute a small set of graphs on which many problems remain as difficult as they are in the general case [44]. In some cases, restricting a problem to the set of cubic graphs may enable a better solution than keeping the problem in the general case [44].


A structured program can be represented by a CCFG. Then the CCFG can be reduced to its prime components, namely the CCFGs of the three constructs of structured programming [17]. Prather [17] used CCFG properties in designing and analyzing software metrics. Moreover, by utilizing the sequencing and nesting properties of CCFGs, Prather [45] identified a collection of independent metrics and constructed a combination metric.

In CCFGs, the decision nodes are colored black, and they have indegree one and outdegree two. The junction nodes are white, and they have indegree two and outdegree one. Tang [4] developed a framework for the composition and decomposition of CBS systems through the CCFG representation of a system skeleton.
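To make the computations in the rest of this section concrete, the following is a minimal sketch of how a labeled CCFG might be represented in code. The class names, field names, and the toy graph are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Arc:
    """A directed arc of a labeled CCFG with a nonnegative duration t(a)."""
    src: str
    dst: str
    duration: float  # e.g., execution time of the "hidden" components along the arc

@dataclass
class LabeledCCFG:
    """A labeled control flowgraph; in a CCFG, decision nodes have outdegree two
    and junction nodes have indegree two."""
    nodes: list[str]
    arcs: list[Arc] = field(default_factory=list)

    def out_arcs(self, v: str) -> list[Arc]:
        return [a for a in self.arcs if a.src == v]

# A small, purely illustrative strongly connected example (not itself cubic).
g = LabeledCCFG(
    nodes=["n1", "n2", "n3"],
    arcs=[Arc("n1", "n2", 3.0), Arc("n2", "n3", 1.0),
          Arc("n2", "n3", 4.0), Arc("n3", "n1", 2.0)],
)
```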

A. Capacity of Shannon Languages

Shannon's capacity notion, specifically the capacity of noiseless channels, has been well studied [34], [46]–[48]. Because information theory is a well-established, mature field and is widely applicable, its applications span from the investigation of randomness [49] to software engineering [50]–[53]. In general, applications of information theory to software have defined metrics through predefined probability distributions (usually equal probabilities). This paper differs from its counterparts not only in using CCFGs and the capacity notion for noiseless channels through consideration of execution time and latency, but also in not using a graph that represents software labeled with pre-assigned probabilities (for both nodes and arcs).

In order to show how the methodology works, the necessary background is introduced briefly in this section. The graph-theoretical background enables us to calculate the capacity of the language that is defined on the graph. This calculation needs a labeling of the arcs with values of a parameter of interest (e.g., time) and the theory of nonnegative matrices for calculating the capacity (combinatorial capacity). The theory of nonnegative matrices is also used to calculate the Markov process defined on the graph, through which the capacity can be calculated (probabilistic capacity). We now proceed with introducing the necessary background briefly.

Theorem 1: Let $A$ be the adjacency matrix of a digraph $G$. $A$ is irreducible if and only if $G$ is strongly connected.
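As an illustration of Theorem 1, irreducibility can be checked directly on the adjacency matrix using the standard criterion that a nonnegative $n \times n$ matrix $A$ is irreducible iff $(I + A)^{n-1}$ has no zero entry. The function name below is a hypothetical helper, not something defined in the paper.

```python
import numpy as np

def is_strongly_connected(adjacency: np.ndarray) -> bool:
    """Theorem 1 check: A is irreducible iff (I + A)**(n-1) > 0 elementwise,
    which holds iff the digraph is strongly connected."""
    n = adjacency.shape[0]
    reach = np.linalg.matrix_power(np.eye(n) + (adjacency > 0), n - 1)
    return bool(np.all(reach > 0))
```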

Following [34], [46], and [54], the necessary definitions andtheorems for calculating the capacity of Shannon languages areprovided.

Definition 3: A channel that introduces zero noise uncertainty is a noiseless channel.

Definition 4: A Shannon language $L$ is a language defined on a labeled directed graph $G$ with a labeling $\ell$.

Definition 5: Let $s$ be a nonnegative real number. The arc duration partition function for each pair of vertices $u$ and $v$ is defined as

$$Q_{uv}(s) = \sum_{a \in A_{uv}} 2^{-s\, t(a)} \qquad (1)$$

where $A_{uv}$ is the set of arcs directed from $u$ to $v$, $t(a)$ is the duration of arc $a$, and $u, v \in V$, with $V$ being the set of vertices (nodes) of the digraph. One can consider the arc duration partition functions for each pair of vertices as the $(u, v)$ entries of a $|V| \times |V|$ matrix $Q(s)$, which corresponds to the adjacency matrix of the digraph.

Definition 6: The matrix $Q(s)$ is called the partition matrix.

Theorem 2: The combinatorial capacity $C_{\mathrm{comb}}$ of $L$ is given by

$$C_{\mathrm{comb}} = s_0 \qquad (2)$$

where $s_0$ is the unique solution to $\rho(Q(s)) = 1$ and $\rho(\cdot)$ is the spectral radius of the partition matrix $Q(s)$. Moreover, for a strongly connected digraph, $s_0$ is the greatest positive solution of the equation $\det(Q(s) - I) = 0$.

Theorem 3: The $C_{\mathrm{comb}}$ and $C_{\mathrm{prob}}$ (the probabilistic capacity) of the $L$-language are equal:

$$C = C_{\mathrm{comb}} = C_{\mathrm{prob}} = s_0 \qquad (3)$$

where $s_0$ is the unique solution of $\rho(Q(s)) = 1$.

Making a change of variable, such that $x = 2^{-s}$, eases the computations. By doing this, instead of having to solve the exponential equation $\rho(Q(s)) = 1$, solving the polynomial $\det(Q(x) - I) = 0$ and finding the smallest positive real root $x_0$ is easier. Then the combinatorial capacity becomes

$$C = \log_2 \frac{1}{x_0}. \qquad (4)$$

Existence of the desired root (smallest positive real root, the unique root) is guaranteed when the CCFG is strongly connected [47], [48], [54]–[58].
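A minimal sketch of the capacity computation described by (1)–(4), reusing the LabeledCCFG structure sketched above: build the partition matrix under the change of variable $x = 2^{-s}$ and locate the point where its spectral radius equals 1. Solving $\rho(Q(x)) = 1$ rather than $\det(Q(x) - I) = 0$ is an implementation choice; for an irreducible partition matrix the two yield the same root. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

def partition_matrix(g, x: float) -> np.ndarray:
    """Q(x) with entries Q_uv(x) = sum over arcs a from u to v of x**t(a), x = 2**-s."""
    idx = {v: i for i, v in enumerate(g.nodes)}
    Q = np.zeros((len(g.nodes), len(g.nodes)))
    for a in g.arcs:
        Q[idx[a.src], idx[a.dst]] += x ** a.duration
    return Q

def spectral_radius(Q: np.ndarray) -> float:
    return float(max(abs(np.linalg.eigvals(Q))))

def combinatorial_capacity(g) -> tuple[float, float]:
    """Return (x0, C): x0 solves rho(Q(x0)) = 1 and C = log2(1/x0) bits/symbol.
    Assumes the CCFG is strongly connected, arc durations are positive, and
    rho(Q(1)) >= 1 (true whenever some node has more than one outgoing arc)."""
    f = lambda x: spectral_radius(partition_matrix(g, x)) - 1.0
    x0 = brentq(f, 1e-9, 1.0)   # rho(Q(x)) is nondecreasing in x on (0, 1]
    return x0, -np.log2(x0)
```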

B. A Markov Process Defined on a Shannon Language

The Perron–Frobenius theorem assures that the irreducible matrix $Q(s_0)$ has positive left and right Perron eigenvectors $\mathbf{b}$ and $\mathbf{a}$ associated with the greatest eigenvalue 1, such that $\mathbf{b}^{T} Q(s_0) = \mathbf{b}^{T}$ and $Q(s_0)\,\mathbf{a} = \mathbf{a}$.

Following [34] and [59], the conditional probabilities defined on the arcs are defined to be

$$p(a) = 2^{-s_0 t(a)}\, \frac{a_{\tau(a)}}{a_{\sigma(a)}} \qquad (5)$$

where $\tau(a)$ is the destination node for arc $a$ and $\sigma(a)$ is the starting node (origin) of arc $a$.

The corresponding stationary state probabilities for the nodes are defined as

$$\pi_v = \frac{a_v b_v}{\sum_{u \in V} a_u b_u}, \qquad v \in V \qquad (6)$$

where $V$ is the set of vertices of the digraph. The resulting stationary state and transition probabilities defined by (5) and (6) correspond to a Markov process defined on the labeled digraph.
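A sketch of the construction in (5) and (6), assuming the partition matrix has already been evaluated at the capacity-achieving point so that its largest eigenvalue is 1. Function names are hypothetical, and a numerical eigen-decomposition is only one of several ways to obtain the Perron vectors.

```python
import numpy as np

def perron_vectors(Q1: np.ndarray):
    """Left and right Perron eigenvectors b, a of the irreducible matrix Q(x0),
    whose largest eigenvalue is 1 at the capacity-achieving point."""
    evals, evecs = np.linalg.eig(Q1)            # right: Q a = a
    a = np.real(evecs[:, np.argmax(np.real(evals))])
    evals_t, evecs_t = np.linalg.eig(Q1.T)      # left: b^T Q = b^T  <=>  Q^T b = b
    b = np.real(evecs_t[:, np.argmax(np.real(evals_t))])
    return np.abs(b), np.abs(a)                 # Perron vectors can be chosen positive

def stationary_node_probabilities(Q1: np.ndarray) -> np.ndarray:
    """pi_v proportional to a_v * b_v, as in (6)."""
    b, a = perron_vectors(Q1)
    pi = a * b
    return pi / pi.sum()
```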

It may be worthwhile to mention that both Lind and Marcus [48] and Khandekar et al. [34] pointed out the right-resolving property (all the outgoing arcs of a node should have different labels) of the labeling of the digraph standing for the Shannon language, in order to calculate its capacity. In this study, the right-resolving requirement is omitted, because the capacity notion is used as a metric for the assessment of CBS. Moreover, the right-resolving property can be satisfied by introducing a dummy node wherever necessary and "splitting" the label between the two arcs such that their "product" results in the original symbol.

C. Pervasive Shannon Metrics

The advantage of information-theoretical modeling of CBS becomes noteworthy when systems composed of distributed components are considered. In distributed CBSs, all that is provided to the customer or service requester is the interface and a description of the service that will be provided. For the sake of assessing the performance of CBSs, the CCFGs are labeled with exponential functions of the values of the parameter of interest, such as time. Although the pervasive Shannon metrics can be applied even if the control flowgraph is not cubic, this study is limited to CCFGs due to their established relationship to CBS.

Definition 7: The capacity $C$ of a Shannon language $L$ is defined to be the behavioral growth metric.

Definition 8: The inefficiency (symbols generated per bit) of a Shannon language $L$ is defined to be $\varphi = 1/C$ and is referred to as the inefficiency metric.

Remark 1: The unit for capacity is bits per symbol. Inversion of this unit results in symbols per bit, which can be used to measure the inefficiency.

Definition 9: The inactivity of the Shannon language $L$ is defined to be $\eta = 2^{-C}$ and is referred to as the inactivity metric.

Remark 2: Since capacity corresponds to the rate of growth in a system, it can be viewed as the "freedom" of the system. Then, $2^{C}$ can be named the activity of the system.

Due to the close relationship between the pervasive Shannon metrics, using only one of them would be enough. However, we defined all three of them due to the different scalings they possess. One metric may offer better resolution for a particular situation than the others. In the case studies, we will use only the inefficiency and inactivity metrics.
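The three metrics can be read directly off the smallest positive real root $x_0$. The sketch below assumes the readings of Definitions 7–9 used in this section ($\varphi = 1/C$ and the inactivity taken as $2^{-C} = x_0$); the function name and dictionary keys are illustrative.

```python
import math

def pervasive_shannon_metrics(x0: float) -> dict:
    """Metrics derived from the smallest positive real root x0 of det(Q(x) - I) = 0.
    C   : behavioral growth metric (capacity, bits/symbol), assumed positive
    phi : inefficiency metric, 1/C (symbols/bit)
    eta : inactivity metric, taken here as 2**-C = x0 (an assumed reading of Definition 9)."""
    C = -math.log2(x0)
    return {"C": C, "phi": 1.0 / C, "eta": x0}
```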

D. Computation of the Pervasive Shannon Metrics

Fig. 1(a) presents, for demonstration purposes, a simple CBS and its corresponding CCFG. Each of the components that constitute this "small" system may contain thousands of lines of code. The pervasive Shannon metrics can be calculated for this system, and for its decompositions, at every step.

The CCFG shown in Fig. 1(a) is labeled according to the time durations used by the components hidden along the arcs, including the time needed for the decision to be made in case the originating node is a decision node. The arc duration partition functions for this labeled CCFG are as follows:

...

Fig. 1. Program that (a) is considered in need of evolution, and its CIU that is to be replaced, and (b) the same system after the CIU is replaced.

One can assume there is no hidden component along an arc and that the origin node for this arc is a junction node. Then, the time duration for this arc, which may be considered as the execution time for the components along the arc and for components hidden inside the nodes, may be assumed to be zero. This results in the label for that arc being $2^{-s \cdot 0} = 1$.


In this example, for demonstration purposes, the time spent along each arc is different from zero.

The partition matrix $Q(x)$ for this CCFG is found to be as shown in (7). Setting $\det(Q(x) - I) = 0$ then results in a polynomial equation in $x$.

The smallest positive real root $x_0$ of this polynomial is the root needed to calculate the capacity for this CCFG. From $x_0$, the values of $C$, $\varphi$, and $\eta$ at the overall system level are found via (4) and Definitions 7–9.

Since the pervasive Shannon metrics have not been widely used, there is not yet a scale against which the calculated values can be compared. However, these values can be helpful in making decisions by comparing the calculated values for two different systems or even components. Three of the case studies presented in Section IV show this type of comparison. Moreover, this paper assumes a specific application domain in which "higher inefficiency" and "higher inactivity" are not desired. Hence, in other application domains (for example, in a domain where higher self-similarity, by design, is not desirable, we would want higher $C$, which requires smaller $\varphi$ and $\eta$ values), a change in the meanings associated with these metrics may be required.

IV. APPLICATIONS OF THE PERVASIVE SHANNON METRICS

In order to show the usability of the proposed modeling approach and metrics in CBS, four case studies are provided. The areas for which examples are provided are: 1) system evolution (component swapping); 2) identifying performance degradation; 3) "slothful" component and process identification within a system; and 4) component and service ranking.

A. Evolution of CBS Systems

Alleviating system evolution is an important aspect of CBS systems, and of distributed CBS systems in particular. In such a system, the customer may want to exchange a service-providing component with another one that is cheaper or that offers more stability and perhaps security, among other features. In the following example, a criterion based on the inactivity and inefficiency metrics is proposed to assist the system integrator in deciding whether or not to swap a component.

Assume we have a program with the corresponding CCFG as shown in Fig. 1(a). The integrator comes across a new COTS CIU that is supposed to replace the CIU seen inside the dotted rectangle. However, the vendor has not provided the $\varphi$ and $\eta$ values for this new component. The integrator takes the candidate CIU and starts examining it. It is found, by testing with two different data sets, that the candidate terminates, on the average, in three and ten units of time for the respective data sets, on the same system where the actual application is running. This new component seems to be a promising candidate to replace the existing CIU.

In order to decide whether the COTS component should replace the already integrated CIU, the integrator needs to calculate the metric values for the system assuming the replacement has been made. The program and its CCFG after the swap are shown in Fig. 1(b). The computed values of $\varphi$ and $\eta$ for the overall system before and after the swap are summarized in Table I. The change in the values of both parameters suggests a decrease in the activity and efficiency of the system. Therefore, the integrator can conclude that swapping the CIU in question for the proposed COTS component is not a good idea.

We should point out that, for the sake of simplicity, the labeling of arcs as functions of the parameter of interest is limited to one element. However, if one considers a more realistic case, in which a component has more than one parameter value (in this example two, three and ten units of time, which together constitute the label) obtained from testing with different data sets, comparing the components without using the pervasive Shannon metrics becomes harder.

This example demonstrated that the metrics $\varphi$ and $\eta$ have the potential to assist in system maintenance. The discussion continues with a proposal to include these metric values with a component. For this specific example, assuming that the skeleton of the candidate component is known, it must be tested with three different data sets (one for each of the three different paths that can be taken within it). After testing, the candidate must terminate for each data set with average times of four, seven, and eight units at the most. This set of values provides the same metric values of $\varphi$ and $\eta$ for the overall system as if the swap were never made.

(7)


TABLE I: VALUES OF $\varphi$ AND $\eta$ FOR THE OVERALL SYSTEM BEFORE AND AFTER THE SWAP

If the integrator had tested the COTS component with three or more different data sets, making the decision, at the cost of extra testing, might have been easier.
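The swap criterion of this case study can be phrased as a small comparison routine. The sketch below assumes the combinatorial_capacity and pervasive_shannon_metrics helpers from the earlier sketches, and the paper's assumption that higher inefficiency and higher inactivity are undesirable; the function name and its inputs are hypothetical.

```python
def swap_is_advisable(g_before, g_after) -> bool:
    """Advise replacing a CIU only if neither inefficiency nor inactivity worsens.
    g_before / g_after are labeled CCFGs of the system with the old / candidate CIU."""
    x0_before, _ = combinatorial_capacity(g_before)
    x0_after, _ = combinatorial_capacity(g_after)
    m0 = pervasive_shannon_metrics(x0_before)
    m1 = pervasive_shannon_metrics(x0_after)
    # in the application domain assumed by the paper, higher phi / eta are undesirable
    return m1["phi"] <= m0["phi"] and m1["eta"] <= m0["eta"]
```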

B. Identifying Performance Degradation

Assessment of performance for a software system determines the extent to which one can benefit from the system during its operation. Due to the needs of today's service industry, building systems with distributed, service-providing components is of key importance. Assessment of performance in such a system becomes of paramount interest to the system integrator, since there is no control over the services provided by third parties, except for deciding whether to use a particular service or not. Therefore, the next example shows how the integrator can identify degradation in the system's performance resulting from even one specific component, which may be a service-providing component located on another machine.

Consider the distributed CBS system shown in Fig. 2(a). Let us assume that this system has been in use for some time, and there has not been any complaint about its operation, performance, etc. However, the service provider for one of the components replaced that component with another one that has the same interface but some extra functionalities to provide more services for other customers. There is also the possibility that the vendor replaced the component with a newer version that includes some bug fixes. Since the change of a component by its vendor is not transparent to the integrator at the customer side, it is not possible to know that such a change has been made. Moreover, it may be that the vendor's service-providing component became very popular and is being used by many service-requesting customers. The component, for whatever reason, takes a longer time to provide the requested service. The system's CCFG demonstrates this behavior, as shown in Fig. 2(b). Both CCFGs seen in the figure are almost identical except for the parameter values (labels) of the arcs that represent the execution times for the old and new versions of the component.

The $\varphi$ and $\eta$ metric values for the normal operation of the system have been documented. The integrator computes the current metric values and finds that they differ from the documented ones. Comparing these metric values, the integrator notices the degradation in system performance.

Once the integrator is aware of the performance degradation in the system, the possible causes of the poor performance can be investigated. The problem may originate from a hardware failure, from the increasing number of customers on the main system, or even from the network where the system is located.

Fig. 2. A system that has (a) been functioning well and (b) undergone performance degradation.

We assume that the hardware-related checks have been made and that the problem is of the type mentioned in this case study: a specific component that is provided by a third party. The question is whether the integrator can identify the "slothful" component or components that are causing the degradation in the overall performance of the system. This question is addressed in the next case study.

C. “Slothful” Component and Process Identification

In the previous case study, performance degradation in a CBS system was identified as due to the changes undergone by a third-party component. Although the detection of performance degradation was accomplished, the locality of the problem remained unknown. In other words, it is not known which component or CIU is causing the degradation in the performance of the system. Therefore, once the degradation is detected, one needs to identify the "slothful" component or CIU that is causing it. Now, a procedure based on the proposed metrics is provided to locate the "slothful" component or CIU within a system.

Assume we have the distributed system shown in Fig. 2(a). This system is subject to performance degradation, and the integrator is seeking to identify which CIU is causing the problem. The system's CCFG after the performance degradation started is shown in Fig. 2(b). At this point, the integrator can take advantage of the CIU/CCFG-based documentation of the system that was constructed at the design phase and has been updated with each evolution the system has undergone.


Fig. 3. CIUs for the system shown in Fig. 2.

TABLE II: VALUES OF $\varphi$ AND $\eta$ FOR CIU-1 AND CIU-2 OF THE SYSTEM BEFORE AND AFTER UNDERGOING PERFORMANCE DEGRADATION

Composition/decomposition methodologies for CCFGs are well established [4], [17], [60]. The interested reader can refer to Prather [17] or Tang [60] for detailed descriptions of how the composition and decomposition of CCFGs take place. The decomposition of the system into its CIUs before and after degradation is shown in Fig. 3. The $\varphi$ and $\eta$ values for CIU-1 and CIU-2 before and after degradation are summarized in Table II. By examining these values, the integrator concludes that CIU-1 is not the source of the degradation. The changes in both metric values suggest that CIU-2 is where the problem is located. Therefore, the integrator can conclude that the problem location has been identified at the CIU level.

In this case study, the integrator has identified the CIU that is causing the problem. Once the location of the problem has been identified at the CIU level, the following options are available to the integrator:

1) replacing the entire CIU with a functionally equivalent one, using completely different components, possibly from different service-providing vendors, or decomposing the CIU further, finding the problematic irreducible CIU, and replacing that irreducible CIU as explained above;

2) decomposing the CIU further to obtain a better resolution of the source of the problem.

If the problematic CIU is not decomposable any further, the integrator will identify the component within the problematic CIU by examining its components. Then, the service-providing component is replaced with another one, possibly from another vendor.
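One way to automate the CIU-level localization just described is to recompute the metrics per CIU and flag those whose values have worsened relative to the documented baseline. The sketch below assumes the helper functions from the earlier sketches; the function name, arguments, and tolerance are hypothetical.

```python
def locate_slothful_cius(cius_documented: dict, cius_current: dict, tol: float = 1e-6) -> list:
    """Compare documented (design-time) and current metrics of each CIU and
    return the names of CIUs whose inefficiency or inactivity has grown.
    Both arguments map a CIU name to its labeled CCFG."""
    suspects = []
    for name, g_doc in cius_documented.items():
        x_doc, _ = combinatorial_capacity(g_doc)
        x_cur, _ = combinatorial_capacity(cius_current[name])
        m_doc = pervasive_shannon_metrics(x_doc)
        m_cur = pervasive_shannon_metrics(x_cur)
        if m_cur["phi"] > m_doc["phi"] + tol or m_cur["eta"] > m_doc["eta"] + tol:
            suspects.append(name)
    return suspects
```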

D. Component and Service Ranking

The inevitable and growing necessity of providing services via networked computers is pushing software to become distributed. Therefore, the nature of applications is no longer single-computer, single-environment but multicomputer and possibly multienvironment. This change in the practice of development and operation of software raises the following questions.

1) How much is a component or service actually in use while being paid for constantly?

2) Can there be a bias-free ranking mechanism for components and services used, especially for distributed systems?

The first question addresses the economic cost of using a CBS composed of various components and services. The application provider for each component or service charges the user in correlation with the number of components the customer is utilizing within an application. For a large corporation, the money paid for unused components becomes a considerable amount. Therefore, a mechanism is needed to identify the components that are not used as often and then possibly find a way to eliminate those components from the design.

The second question relates more significantly to marketing a product. Vendors may advertise their products in a biased way, and the reliability of a product often remains an unanswered question. Companies also advertise their products by citing larger companies that use them as examples. However, that does not always mean that the product is a reliable one. Therefore, a "relationship-resistant" ranking mechanism for components and services is needed.

Consider a CBS that is composed of ten components, as shown in Fig. 1(a). The arcs' labels may represent either the duration of each arc during execution (averaged and scaled over several executions of the system), in analogy to the approach proposed by Unwala and Cragon [61], [62] for modeling pipelined processors with finite Markov chains, or the weight of the paths based on another parameter of interest. The question to be answered is: "During the constant execution of this system, which components are used most and which are not?" This question will be answered by utilizing the Markov process defined on a CCFG defining a Shannon language. The stationary state probabilities defined for the nodes will serve this purpose.

The left and right Perron eigenvectors $\mathbf{b}$ and $\mathbf{a}$ are seen in (8), where the superscript $T$ stands for transpose. Once the left and right Perron eigenvectors are computed, the component ranking vector can be found as their element-wise product. The resulting vector is seen in Table III. Examining the results, one can see that the components located at the two lowest-ranked nodes are used with the least frequency, less than 4% throughout the execution of the system.


TABLE III: COMPONENT USAGE FOR THE SYSTEM IN FIG. 1(a)

After this verification, the integrator can revisit his design and perhaps try to eliminate those components in a new design.

(8)

Trust and reliability are two important concerns when using services from third parties in composing a system. Often, when a system is composed, security and trust issues result in changing the design or the provider of a service. Consider the same graph shown in Fig. 1(a) as a relationship graph for the services provided by different vendors, with the branches showing the weight of the links between any two services. For example, there may be one direct service link between two components, while at the same time one component calls the services of another by passing through two other nodes that are not of interest. The stationary state probabilities defined on the nodes (components/services) of this CCFG were already computed and tabulated in Table III. Then, one can conclude that the most demanded (popular) components/services are those located on the highest-ranked nodes.
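The ranking procedure of this case study reduces to sorting the nodes by their stationary probabilities. The sketch below reuses the partition_matrix, combinatorial_capacity, and stationary_node_probabilities helpers from the earlier sketches; the function name and the 4% usage threshold are illustrative assumptions.

```python
def rank_components(g) -> list:
    """Rank the nodes (components/services) of a labeled CCFG by their stationary
    probabilities (6), least-used first."""
    x0, _ = combinatorial_capacity(g)
    pi = stationary_node_probabilities(partition_matrix(g, x0))
    return sorted(zip(g.nodes, pi), key=lambda pair: pair[1])

# Nodes whose usage falls below some chosen threshold (e.g., 4%) become candidates
# for elimination in a redesign; the highest-ranked nodes mark the most demanded services.
```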

V. RESULTS AND CONCLUSIONS

This paper introduced some notions from a well-established field, information theory, into a yet-developing and relatively young field, CBS. The notions brought in from information theory to help solve problems in CBS are based on the noiseless channel concepts. Hence, a relationship between the two fields has been established.

The pervasive Shannon metrics were defined within the proposed modeling technique for CBS. A CBS was first modeled as a CCFG, and then the pervasive Shannon metrics were defined by making use of the capacity notion for the Shannon language defined on the respective CCFG. The metrics defined for the noiseless channel modeling approach are the inactivity metric, the behavioral growth metric, and, finally, the inefficiency metric.

The labeling of the CCFG describing the CBS can consist of parameter values of "hidden" components within the nodes or along the arcs, or both. In this respect, we provide an advantage over the regular use of CFGs.

A new viewpoint on CBS has been introduced in this paper: the information-theoretical view enables analysis of software at the specification level, code level, and component level. The proposed modeling technique and metrics are scalable: modeling and analysis of a software system can be done either at the CIU/component level or at the system level. The modeling techniques and metrics hold their applicability at any granularity level. Moreover, the proposed modeling technique and metrics can be used at the design phase to help mitigate architectural risks. Therefore, the modeling technique and metrics proposed in this paper contribute to the set of existing tools and methodologies that help the system designer. Hence, this study contributes to the set of existing metrics for the assessment of CBS.

To demonstrate the worthiness of these metrics, four case studies were presented, each involving a different application of the proposed metrics. The use of the metrics in these case studies helped demonstrate the benefits of using them in CBS systems modeled based on noiseless channels.

Although this study was limited to the use of CCFGs for representing CBS systems, this is not a must. As long as the CFG representing the system is strongly connected, the proposed modeling technique and metrics are applicable. CCFGs were used in this study due to the advantages they provide in dealing with CBS, especially in terms of composition and decomposition.

VI. FUTURE WORK

The pervasive Shannon metrics proposed in this paper can be tested against a set of validation axioms. We think these metrics can be validated by using the validity axioms provided by Tian and Zelkowitz [63], the TZ axioms. It is foreseen that these metrics will not require any modification of these axioms. Modification of one of the TZ axioms was originally proposed by Tian and Zelkowitz [63].

Projective planes have been used in the combinatorial design of congestion-free networks [64], [65]. A projective plane of order $q$ exists when $q$ is a prime or a power of a prime. Desarguesian planes are constructed from a finite field of order $q$, denoted $GF(q)$ [65]. Moreover, Mihalek [66] provided a theorem which states that an algebraic incidence basis with elements from a division ring is a Desarguesian plane. The relationship between Desarguesian planes and CCFGs defining Shannon languages with labelings from a division ring [58] seems to be a promising research direction, with potential applications to the design of congestion-free networks.

Kemeny's constant [67] is related to modeling components based on Shannon languages through the underlying irreducible matrices. Levene and Loizou [68] presented an application of Kemeny's constant to random surfing. One can explore the use of Kemeny's constant in the Shannon language defined on a CCFG to explore "minimized" test paths in the V&V process [69]. Furthermore, the proposed modeling techniques can be utilized in the assessment of critical and shortest paths [70].


The modeling techniques and metrics proposed in this study can be useful in investigating the load-sharing properties of a variety of systems, even at the design phase, and can help the designer with designing uniform systems. Application of the proposed metrics and modeling technique to different service industry methodologies [71], by conducting a case study, may provide more insight into the application procedure of the proposed metrics in this area. Moreover, a detailed study on the analysis of business processes with the proposed metrics and modeling technique may help identify inefficient processes or provide information useful for improving overall business performance.

Exploring the possible applications of the modeling techniques and metrics developed in this paper to the reliability analysis of software seems to be a promising area [72]. Furthermore, investigating the applicability of the proposed modeling techniques and metrics in computer interconnection minimization [73] may provide new application avenues.

Investigating the use of the proposed modeling and measurement technique in assessing component or service retrieval efficiency, using the component library concept, is a promising research direction. Welch [74] used entropy and related notions to measure information retrieval efficiency in library catalogs. Hence, the proposed research direction can benefit from the work of Welch.

ACKNOWLEDGMENT

The authors thank Dr. C. V. Ramamoorthy of the Computer Science Department, University of California at Berkeley, and Dr. A. J. Kornecki of the Department of Computing, Embry-Riddle Aeronautical University, for their invaluable discussions and comments on the subject.

REFERENCES

[1] R. Seker, "Component-based software modeling based on Shannon's information channels," Ph.D. dissertation, Univ. of Alabama at Birmingham, Birmingham, 2002.
[2] A. P. Sage and J. D. Palmer, Software Systems Engineering. New York: Wiley, 1990.
[3] M. M. Tanik and E. S. Chan, Fundamentals of Computing for Software Engineers. New York: Van Nostrand, 1991.
[4] Y. Tang, "A methodology for component-based system integration," Ph.D. dissertation, New Jersey Institute of Technology, Newark, NJ, 1999.
[5] L. K. Jololian, "A meta-semantic language for smart component-adapters," Ph.D. dissertation, New Jersey Institute of Technology, Newark, NJ, 2000.
[6] J. Hopkins, "Component primer," Commun. ACM, vol. 43, no. 10, pp. 27–30, 2000.
[7] K. C. Wallnau, S. A. Hissam, and R. C. Seacord, Building Systems From Commercial Components. Reading, MA: Addison-Wesley, 2002.
[8] P. Brereton and D. Budgen, "Component-based systems: A classification of issues," IEEE Comput., vol. 33, pp. 54–62, Nov. 2000.
[9] W. Emmerich, "Distributed component technologies and their software engineering implications," in Proc. 24th Int. Conf. Software Engineering, Orlando, FL, 2002, pp. 537–546.
[10] A. W. Brown and K. C. Wallnau, "The current state of CBSE," IEEE Software, pp. 37–46, Sept./Oct. 1998.
[11] J. Grundy, R. Mugridge, J. Hosking, and M. Apperley, "Tool integration, collaboration and user interaction issues in component-based software architectures," Technol. Object-Oriented Lang., pp. 299–312, 1998.
[12] P. T. Cox and S. Baoming, "A formal model for component-based software," in Proc. IEEE Symp. Human-Centric Computing Languages and Environments, 2001, pp. 304–311.
[13] E. Martins, C. M. Toyota, and R. L. Yanagawa, "Constructing self-testable software components," in Proc. Int. Conf. Dependable Systems and Networks, 2001, pp. 151–160.
[14] G. T. Heineman and W. T. Councill, Component-Based Software Engineering: Putting the Pieces Together. Reading, MA: Addison-Wesley, 2001.
[15] H. Chestnut, Systems Engineering Tools. New York: Wiley, 1965.
[16] S. Wolfram, A New Kind of Science. Wolfram Media Inc., 2002.
[17] R. E. Prather, "Design and analysis of hierarchical software metrics," ACM Comput. Surveys (CSUR), vol. 27, no. 4, pp. 497–518, 1995.
[18] E. J. Weyuker, "Evaluating software complexity measures," IEEE Trans. Software Eng., vol. 14, pp. 1357–1365, Sept. 1988.
[19] M. J. Harrold, D. Liang, and S. Sinha, "An approach to analyzing and testing component-based systems," in Proc. 1st Int. ICSE Workshop on Testing Distributed Component-Based Systems.
[20] M. J. Harrold, "Testing: A roadmap," in Proc. Conf. Future of Software Engineering, 2000, pp. 61–72.
[21] Y. Wu, D. Pan, and M.-H. Chen, "Techniques for testing component-based software," in Proc. 7th IEEE Int. Conf. Engineering of Complex Computer Systems, 2001, pp. 222–232.
[22] H. Zhu and X. He, "An observational theory of integration testing for component-based software development," in Proc. 25th Annu. Int. Computer Software and Applications Conference (COMPSAC 2001), 2001, pp. 363–368.
[23] C. Szyperski, Component Software: Beyond Object-Oriented Programming. New York: ACM Press/Addison-Wesley, 1998.
[24] X. Cai, M. R. Lyu, W. Kam-Fai, and K. Roy, "Component-based software engineering: Technologies, development frameworks, and quality assurance schemes," in Proc. 7th Asia-Pacific Software Engineering Conf. (APSEC 2000), 2000, pp. 372–379.
[25] A. Fujimura and F. Moore, "Quality on time," Software Quality J., vol. 7, no. 2, pp. 97–106, July 1997.
[26] E. J. Weyuker, T. J. Ostrand, J. Brophy, and B. Prasad, "Clearing a career path for software testers," IEEE Software, vol. 17, pp. 76–82, Mar.–Apr. 2000.
[27] R. R. Dumke and A. S. Winkler, "Managing the component-based software engineering with metrics," in Proc. 5th Int. Symp. Assessment of Software Tools and Technologies, 1997, pp. 104–110.
[28] R. Cherinka, C. M. Overstreet, and J. Ricci, "Maintaining a COTS integrated solution - are traditional static analysis techniques sufficient for this new programming methodology," in Proc. Int. Conf. Software Maintenance, 1998, pp. 160–169.
[29] E. J. Weyuker, "Testing component-based software: A cautionary tale," IEEE Software, vol. 15, no. 5, pp. 54–59, Sept.–Oct. 1998.
[30] M. Ancona, W. Cazzola, G. Dodero, and V. Gianuzzi, "Communication modeling by channel reification," in Proc. Workshop "Advances in Languages for User Modeling," 6th Int. Conf. User Modeling, Chia Laguna, Sardinia, Italy, June 1997, pp. 1–9.
[31] P. Maes, "Concepts and experiments in computational reflection," in Proc. Conf. Object-Oriented Programming Systems, Languages and Applications, 1987, pp. 147–155.
[32] L. Liu and R. Meersman, "The building blocks for specifying communication behavior of complex objects: An activity-driven approach," ACM Trans. Database Syst. (TODS), vol. 21, no. 2, pp. 157–207, 1996.
[33] S. S. Muchnick and N. D. Jones, Program Flow Analysis: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1981.
[34] A. Khandekar, R. McEliece, and E. Rodemich, "The discrete noiseless channel revisited," in Proc. 1999 Int. Symp. Communication Theory and Applications, 1999, pp. 115–137.
[35] C. V. Ramamoorthy, "Analysis of graphs by connectivity considerations," J. ACM (JACM), vol. 13, no. 2, pp. 211–222, 1966.
[36] C. V. Ramamoorthy and S.-B. F. Ho, "Testing large software with automated software evaluation systems," IEEE Trans. Software Eng., vol. SE-1, pp. 46–58, Mar. 1975.
[37] C. V. Ramamoorthy, S.-B. F. Ho, and W. T. Chen, "On the automated generation of program test data," IEEE Trans. Software Eng., vol. SE-2, pp. 293–300, Dec. 1976.
[38] P. van den Broek and K. van den Berg, "Generalized approach to software structure metrics," Software Eng. J., vol. 10, no. 2, pp. 61–67, Mar. 1995.
[39] C. Böhm and G. Jacopini, "Flow diagrams, Turing machines and languages with only two formation rules," Commun. ACM, vol. 9, no. 5, pp. 366–371, 1966.
[40] E. W. Dijkstra, "Letters to the editor: Go to statement considered harmful," Commun. ACM, vol. 11, no. 3, pp. 147–148, 1968.
[41] E. W. Dijkstra, "The humble programmer," Commun. ACM, vol. 15, no. 10, pp. 859–866, 1972.


[42] D. E. Knuth, "Structured programming with go to statements," ACM Comput. Surveys (CSUR), vol. 6, no. 4, pp. 261–301, 1974.
[43] R. E. Prather, "Regular expressions for program computations," Amer. Mathemat. Monthly, vol. 104, no. 2, pp. 120–130, Feb. 1997.
[44] R. Greenlaw and R. Petreschi, "Cubic graphs," ACM Comput. Surveys (CSUR), vol. 27, no. 4, pp. 471–495, 1995.
[45] R. E. Prather, "Convexity and independence in software metric theory," Software Eng. J., vol. 11, no. 4, pp. 238–246, July 1996.
[46] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana: Univ. of Illinois Press, 1963.
[47] K. K. Nambiar, "Shannon's communication channels and word spaces," Mathemat. Comput. Modeling, vol. 34, no. 7–8, pp. 757–759, 2001.
[48] D. Lind and B. Marcus, An Introduction to Symbolic Dynamics and Coding. Cambridge, U.K.: Cambridge Univ. Press, 1999.
[49] G. J. Chaitin, "Randomness and mathematical proof," Sci. Amer., vol. 232, no. 5, pp. 47–52, May 1975.
[50] L. Hellerman, "A measure of computational work," IEEE Trans. Comput., vol. C-21, pp. 439–446, May 1972.
[51] N. Coulter, R. B. Cooper, and M. K. Solomon, "Information-theoretic complexity of program specifications," Comput. J., vol. 30, no. 3, pp. 223–227, 1987.
[52] E. B. Allen and T. M. Khoshgoftaar, "Measuring coupling and cohesion: An information-theory approach," in Proc. 6th IEEE Symp. Software Metrics, Nov. 1999, pp. 119–127.
[53] R. Seker and M. M. Tanik, "Component-based software modeling based on noisy information channels," in Proc. Int. Conf. Computer, Communication, and Control Technologies (CCCT'03), vol. 3, H.-W. Chu, J. Ferrer, and J. M. Pineda, Eds., Orlando, FL, July/Aug. 2003, pp. 144–149.
[54] R. Seker and M. M. Tanik, "Discrete noiseless information channel and search engines," Univ. of Alabama at Birmingham, Tech. Rep. 2002-04-ECE-002, 2002.
[55] H. Minc, Nonnegative Matrices. New York: Wiley, 1988.
[56] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences. New York: Academic, 1979.
[57] R. Seker and M. M. Tanik, "Division ring, information theory, and discrete noiseless communication channel," Univ. of Alabama at Birmingham, Tech. Rep. 2001-12-ECE-018, 2001.
[58] K. K. Nambiar, "Matrices with elements from a division ring," Mathemat. Comput. Modeling, vol. 24, no. 1, pp. 1–3, 1996.
[59] K. K. Nambiar, "Theory of search engines," Comput. Mathemat. Applicat., vol. 42, no. 12, pp. 1523–1526, 2001.
[60] Y. Tang, A. H. Dogru, F. J. Kurfess, and M. M. Tanik, "Computing cyclomatic complexity with cubic graphs," J. Syst. Integrat., vol. 10, no. 4, pp. 395–409, Sept. 2001.
[61] I. H. Unwala and H. G. Cragon, "A Markov chain modeling technique for evaluating pipelined processor designs," in Proc. 37th Midwest Symp. Circuits and Systems, vol. 1, 1994, pp. 319–322.
[62] I. H. Unwala and H. G. Cragon, "Design evaluation of pipelined processors using finite state machine analysis with Markov chains," in Proc. 3rd Int. Conf. Economics of Design, Test, and Manufacturing, 1994, pp. 147–151.
[63] J. Tian and M. V. Zelkowitz, "Complexity measure evaluation and selection," IEEE Trans. Software Eng., vol. 21, pp. 641–650, Aug. 1995.
[64] B. Yener, Y. Ofek, and M. Yung, "Combinatorial design of congestion-free networks," IEEE/ACM Trans. Networking, vol. 5, pp. 989–1000, Dec. 1997.
[65] C. J. Colbourn, "Projective planes and congestion-free networks," Discrete Appl. Mathemat., to be published.
[66] R. J. Mihalek, Projective Geometry and Algebraic Structures. New York: Academic, 1972.
[67] J. G. Kemeny and J. L. Snell, Finite Markov Chains. Princeton, NJ: Van Nostrand, 1960.
[68] M. Levene and G. Loizou, "Kemeny's constant and the random surfer," Amer. Mathemat. Monthly, vol. 109, 2002, to be published.
[69] I. Sommerville, Software Engineering, 6th ed. Reading, MA: Addison-Wesley, 2001.
[70] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.
[71] C. V. Ramamoorthy, "A study of the service industry - functions, features and control," IEICE Trans. Commun., vol. E83-B, no. 5, pp. 885–902, May 2000.
[72] M. Sahinoglu, "Compound-Poisson software reliability model," IEEE Trans. Software Eng., vol. 18, pp. 624–630, 1992.
[73] J. D. Carothers and H. G. Cragon, "Graph-theoretic techniques for computer intraconnection minimization," IEEE Trans. Syst., Man, Cybern., vol. 23, pp. 876–888, May–June 1993.
[74] T. A. Welch, "Bounds of information retrieval efficiency in static file structures," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA.

Remzi Seker (M'96) received the B.Sc. and M.S. degrees from the Electrical and Electronics Engineering Department, University of Cukurova, Adana, Turkey, and the Ph.D. degree in computer engineering from the University of Alabama at Birmingham (UAB) in 2002.

Prior to joining UAB as a graduate student, he was awarded three scholarships for his Ph.D. studies abroad. These scholarships were offered by the Turkish Ministry of National Education, Helsinki University of Technology, and UAB. He has been an Assistant Professor in the Department of Computer and Software Engineering, Embry-Riddle Aeronautical University, Daytona Beach, FL, since January 2003. His research interests are component-based software, dependable safety-critical software, and disaster and infrastructure engineering. His current research sponsors are Guidant Corporation and the Federal Aviation Administration.

Dr. Seker received the Outstanding Graduate Student award in 2000 and 2001 from UAB. He was awarded a Service Award in 2000 and the Outstanding Long Term Service Award in 2003, both from the Society for Design and Process Science. He is a member of Phi Kappa Phi and Tau Beta Pi.

Murat M. Tanik received the Ph.D. degree from the Computer Science Department, Texas A&M University, College Station, in 1978.

He joined the School of Engineering at the University of Alabama at Birmingham (UAB) in 1998 as a Professor. Prior to joining the UAB faculty, he was an Associate Professor and the Director of Electronic Enterprise Engineering at NJIT, Newark, NJ, and the Director of the Software Systems Engineering Institute (SSEI) at The University of Texas at Austin. He is also the Director and Chief Scientist of the Process Sciences Laboratory, a think-tank of process-centered knowledge integration. He has worked on related projects for NASA, Arthur A. Collins (developer of the Apollo moon missions' tracking and communications systems), and for Raymond T. Yeh at International Software Systems, Inc. (ISSI). He was an Associate Professor and the Director of the Software Systems Engineering Technology (SEK) Research Group at Southern Methodist University (SMU). He is co-founder of the interdisciplinary and international society, the Society for Design and Process Science. His publications include co-authoring six books, co-editing eight collected works, and more than 100 journal papers, conference papers, book chapters, and reports funded by various government agencies and corporations. Under his direction, 16 Ph.D. dissertations and 22 M.S. theses have been completed. His research interests include software systems engineering, embedded and intelligent software systems, wireless and time-critical software support, information theory, quantum computing, and integrated systems design and process engineering. Sponsored projects include distributed agent sensor systems, e-business application design for sports medicine, multilifecycle engineering software systems for environmental protection, time-critical systems for the Superconducting Super Collider, intelligent cost-estimation systems software, intelligent time-and-space critical systems, intelligent digital switch maintenance systems, and intelligent user interfaces for factory and laboratory automation. His current and recent sponsors include DoD, the Army, NATO, Texas Instruments, AT&T, E-Systems, Lockheed, CTI-Brazil, EMBRAPA-Brazil, the Superconducting Super Collider, the State of New Jersey, Northern Telecom, IRS, DEC, Abbott Laboratories, the Merle-Collins Foundation, NCIIA, and Texacone, Inc.