Transactions on Information and Communications Technologies vol 4, © 1993 WIT Press, www.witpress.com, ISSN 1743-3517

Planar similarity - A new synthetic metric

F.J. Garlick

Department of Information Science, Portsmouth Business
School, University of Portsmouth, Portsmouth, Hampshire

Abstract

This paper describes a new metric which identifies similarities between programs and uses this similarity as a measure or indicator of quality. The method requires that a set of generic programs be used in design and from these a set of conventional metrics is defined which are then used to generate the similarity metric.

GENERIC PROGRAMMING - A RECIPE FOR INNOVATION

Stephane Grappelli, regarded by many as perhaps the world's greatest exponent of the jazz violin, once said, when speaking about his kind of music:

...practice is good for the fingers but not good for the imagination.

On the face of it this seems unwise to say the least, but the point is that for Jazz the essence of the music is improvisation (innovation) and variations on a theme. Thus when Grappelli is playing Tea for Two he knows what it is supposed to sound like, his ensemble knows and so does the audience, yet paradoxically no two performances ever sound exactly alike.

If we require our programmers to be imaginative, then our training must be more closely aligned with reality. Current training is often suspect in this respect since the tasks assigned are usually unrealistic. Is there, then, a programming methodology that can offer the sort of facility where training is realistic and the materials used are essentially the same as those used in production code? In other words, a kind of cookbook or generic design and coding scheme, or system.

Some modern languages and systems give active support to this idea, notably OOD/OMT (Rumbaugh [1]). The point at issue, then, is why so many programmers apparently work from scratch when they are producing new code. Surely the wise thing to do is to use proven designs as a basis for any new work.

Design Innovation or Stagnation

It has often been said that generic code causes stagnation and reduces the function of the programmer to that of mere coding. This does not seem to be the case in practice, and it can be argued that hanging on to traditional methods is considerably more likely to cause stagnation in terms of design, innovation and personal development. Indeed, with the increasing use of 4GL products, where programmers are developing whole systems, the need for generic systems is of great importance.

At the start of development there is a need to evolve sound generic designs. This requires resources because it can only be done by experienced staff and, to do it well, time is needed to develop and test the designs, gain approval from those who will use them and so on. Once the designs are approved then significant advances should follow in terms of productivity, quality and maintainability. Thus:

Development of generics should be continuous, with staff encouraged to suggest either new generics or changes to existing ones.

Using generics frees the programmer from simply re-inventing the same piece of code. They can therefore concentrate on the application being developed in the knowledge that the major logic is correct.

Design generics carry over into other languages. For example, once a serial update generic is designed it is not a difficult task to implement that same design in a different language.

Generic Designs and Code

Structured programming, the cure-all of the early 80s, has not really given the major increases in productivity which were expected, though it did have a beneficial impact on the quality of delivered systems. What is needed is predictability in terms of the final code, reliability, re-usability and general quality.

Very few of our current methods can meet, in any measurable way, the demands of this last characteristic (predictability), and they are frequently not flexible enough in the sense of allowing the programmer room to breathe. It is my contention that generic design does offer the best hope of achieving all the above characteristics. Generic designs and coding systems may in reality be expressed as a methodology such as OOD/OMT, but the underlying idea is that of abstracting from the real world appropriate programming frameworks that are re-usable.

COMPARISON - A QUALITY CONTROL MECHANISM

This section discusses the role of generic design and suggests its possible use in providing a mechanism of comparison. The idea is to compare a given program implementation with a generic constructed for that application. The difficulty is that of finding a way to sensibly measure the degree of similarity between the generic and programs derived from it. If we can find such a similarity measure then it can be used as a software metric.

Software Quality

There have been many attempts at classifying software characteristics; one of the best known is Boehm's COCOMO model [2]. The model is constructive in the sense that the indicators it suggests are intuitively useful characteristics of any piece of software - that is, they are essentially operationally derived. The model sensibly divides these desirable features into groups, helpfully expressed in the following manner by Kitchenham & Wood [3]:

Quality Drivers: Initial assessment of the quality requirements of a project in terms of features of the organisation, the product, the development process, the personnel, and the project.

Quality Factors: The required properties of the final product.

Quality Metrics or Indicators: These are the quantitative measures with which we can assess the progress toward the final quality requirements.

Several different quality schemes have been advocated based on the factor, criteria, metric model discussed above. The first significant work in the field was due to Boehm and his co-workers [4], this being followed by McCall, Richards and Walters [5] and latterly Bowen, Wigle and Tsai [6]. The quality factors identified by these groups are illustrated in Table 1 below.

Boehm et al          McCall et al        Bowen et al

Efficiency           Efficiency          Efficiency
Reliability          Reliability         Reliability
Human Engineering    Usability           Usability
Modifiability        Flexibility         Expandability
Portability          Portability         Portability
Testability          Testability         Verifiability
Understandability    Reusability         Reusability
                     Maintainability     Maintainability
                     Interoperability    Interoperability
                     Correctness         Correctness
                     Integrity           Integrity
                                         Flexibility

Table 1. Quality factors


Multiplicity of Quality Factors

The increase in the number of factors used by later researchers leads to a corresponding increase in the number of underlying quality criteria. A consequence of this extended list of quality factors and criteria is the need for further quality metrics. However, it is fairly clear that there are numerous synonyms and probably considerable overlap between the various factors. Even so there are still a number of important problems that are not at all easy to resolve. In particular:

How can we objectively decide whether to include or exclude a particular quality factor?

Quality factors are not defined in measurable terms. Thus, it may be impossible to verify relationships that we think may exist between factors.

The factors seem to concentrate on elements that are only indirectly connected to the applications themselves.

Latterly we have seen a move towards better methodologies as a means of quality control. This is natural, for it is well understood that standardisation brings about quality improvements. In this sense there is less emphasis on code metrics and a better understanding that to gain control of the development process, metrics must be available for measuring every part of the process. This is well illustrated by Moller [7] when he states that metrics are driven by business objectives.

METRICS FOR QUALITY CONTROL

In this study it is necessary to understand what is meant by the term Software Metric. Metric is a mathematical term and metric spaces are required to have the following properties, where the function d may be interpreted as distance:

Reflexive: d(x,x) = 0

Symmetric: d(x,y) = d(y,x)

Triangle Inequality: d(x,y) ≤ d(x,z) + d(z,y)
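To see these properties concretely, the following sketch (the distance function and the per-program metric vectors are invented for illustration; they are not taken from the study) checks the three axioms for a simple Euclidean distance over vectors of metric values:

```python
import itertools
import math

def d(x, y):
    """Euclidean distance between two metric-value vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical per-program vectors (decision count, executable lines, comment ratio).
programs = {
    "prog_a": (12.0, 340.0, 0.15),
    "prog_b": (14.0, 310.0, 0.22),
    "prog_c": (30.0, 760.0, 0.05),
}

for p in programs.values():
    assert d(p, p) == 0.0                        # reflexive
for p, q in itertools.permutations(programs.values(), 2):
    assert d(p, q) == d(q, p)                    # symmetric
for p, q, r in itertools.permutations(programs.values(), 3):
    assert d(p, q) <= d(p, r) + d(r, q) + 1e-9   # triangle inequality
print("Euclidean distance over metric vectors satisfies all three axioms")
```

Euclidean distance does satisfy all three axioms, which is one reason distances read off a planar plot can be treated as genuine distances.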

Ideally we would like a software metric to have these properties because it would then be mathematically tractable and we could use metric space theorems to generate further results, make comparisons of one metric with another and so on. Unfortunately, when calculating software metrics the word metric cannot generally be understood in a mathematical way; rather it must be understood in the sense described by Kitchenham [8], who expresses the essential idea in the following manner:


The term software metric is used ... to mean measures (in terms of amounts or counts) related to software products ... this is a fairly loose definition and reflects the fact that the term software metrics is used as a general tag to cover all aspects of quantification ...

Nature and Use of Software Metrics

The normal concept of measurement requires some scale along which equal units exist and a zero position. Software measurements typically have neither equal units of measurement nor anything but an arbitrary zero. Thus it is not possible (at present) for software practitioners to make statements to the effect that one design method produces code whose reliability is a particular multiple of the code produced by some other method.

A Metric of Similarity

The metric described in this paper, planar similarity, is based on the simple premise that if two or more programs perform the same function, then their underlying algorithms must in some sense be similar. It follows that if we can agree on a basic algorithm and corresponding program structure for a particular application, then it might well be possible to measure the degree of similarity. For example, suppose we develop a generic for standard updating of an indexed sequential file; from this generic many update programs will be derived, and using the similarity metric we may be able to detect significant departures from the parent generic in any given program.

In order to obtain planar similarity it will be necessary to use other, more conventional metrics. These metrics are either direct measurements from program code (e.g. decision counts) or synthetics (e.g. decision density). At this stage it is not known with any certainty which metrics will be good indicators of similarity; however it seems likely that such measures do exist and may well vary from one application type to another. Assuming that such metrics can be found, it will be necessary to process them to assess the degree of similarity or otherwise.
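As an illustration of the distinction between direct counts and synthetics, here is a minimal sketch of my own (with assumed counting conventions; the study's actual definitions are, as noted later, arbitrary and not reproduced here) deriving three basic counts and one synthetic from fixed-format COBOL source:

```python
# Illustrative only: count executable, comment and blank lines in
# fixed-format COBOL, then derive one synthetic (code/comment ratio).
def basic_counts(source: str) -> dict:
    executable = comments = blanks = 0
    for line in source.splitlines():
        if not line.strip():
            blanks += 1
        elif len(line) >= 7 and line[6] == "*":   # '*' in column 7 marks a comment
            comments += 1
        else:
            executable += 1
    return {
        "executable_lines": executable,
        "comment_count": comments,
        "blank_line_count": blanks,
        "code_comment_ratio": executable / comments if comments else float("inf"),
    }

sample = """\
      * Read the master file
       READ MASTER-FILE AT END MOVE 'Y' TO EOF-FLAG.

       PERFORM PRINT-LINE.
"""
print(basic_counts(sample))
```

A synthetic such as decision density would be computed analogously, as one count divided by another.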

Multidimensional Scaling

In my experimental work I have used a multidimensional scaling method that will take a set of values and process them to give a measure of closeness. Multidimensional scaling (MDS) deals with the problem of how to measure relationships between objects (in our particular case, programs) when the underlying dimensions are not known. Specifically, MDS reduces all the data to just two dimensions. In this study 26 metrics (dimensions) for each program in the sample set are examined, but the need is to somehow compress this data by a systematic classification scheme, and hopefully this will enable us to understand and organize the basic concepts. Now the best we can say about the type of scale on which metrics are based is that it is likely to be ordinal. That is, we can arrange the metric values derived from the various programs in rank order of magnitude.

Schiffman [9], Manly [10] and others use a very simple example to illustrate how MDS works. Briefly, consider the case of reconstructing a road map when all you know is the distances between towns. To do this you could start with any initial distribution of towns, and then by a series of iterations the towns are moved in the plane until the distances between a particular town and all other towns agree with the initial data.
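The road-map illustration can be sketched directly. The code below uses classical (Torgerson) scaling rather than the iterative rank-order procedure described next, and the town coordinates are invented, but it shows the essential idea: a 2-D layout recovered from distances alone.

```python
import numpy as np

def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Recover a k-dimensional configuration from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]                # k largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Hypothetical town coordinates, used only to generate the distance matrix.
towns = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 5.0]])
D = np.linalg.norm(towns[:, None] - towns[None, :], axis=-1)

X = classical_mds(D)
D_hat = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
assert np.allclose(D, D_hat, atol=1e-8)          # distances are recovered
print("recovered layout reproduces the original distances")
```

The recovered layout is unique only up to rotation, reflection and translation, which is why interpreting the axes (as discussed later) takes judgement.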

For the program data in this study, MDS on the first iteration looks at the rank order of distances along a line and compares this with the rank order of the actual distances found in the data. If a large measure of error (called stress) is found then points are moved to new positions and a further iteration is performed. This process continues until the stress value is sufficiently small.
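The stress value can be made concrete with a small sketch. This computes Kruskal's stress-1 in its metric form (comparing distances directly; the study's nonmetric procedure compares rank orders, but the role of stress is the same): zero for a perfect configuration, growing as points drift from the positions the data demand.

```python
import numpy as np

def stress1(D_target: np.ndarray, X: np.ndarray) -> float:
    """Kruskal stress-1 of configuration X against target distances."""
    D_conf = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    i, j = np.triu_indices(len(X), k=1)          # each pair counted once
    num = np.sum((D_target[i, j] - D_conf[i, j]) ** 2)
    den = np.sum(D_conf[i, j] ** 2)
    return float(np.sqrt(num / den))

# A perfect configuration has zero stress; perturbing one point raises it.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(stress1(D, X))                             # 0.0
print(stress1(D, X + [[0.0, 0.0], [0.3, 0.0], [0.0, 0.0]]))
```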

Once the final configuration is obtained it is largely up to the experimenter to decide what the two dimensions are and whether they are meaningful. In practice, interpretation of dimensions requires a degree of intuition and a reasonably complete knowledge of the properties of the stimuli.

General MDS Procedures

For this paper 4 sets of approximately 10 complete COBOL programs were processed and 26 metrics calculated for each one. MDS was then used to reduce this 26-dimensional space down to a single point for each program. When this is complete the process then finds positions in space for each program relative to every other program point such that the distances between them correspond as closely as possible to rank order differences in the raw data. In order to reduce the computation necessary, a principal component analysis is performed first and the two largest components are selected. In effect this gives us a two-dimensional plane through the subject space. The MDS procedure then uses this plane and tries to account for the remaining dimensions.
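The pre-processing step can be sketched as follows (the programs-by-metrics matrix here is random data standing in for the study's real measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(10, 26))                    # 10 programs x 26 metrics

Z = M - M.mean(axis=0)                           # centre each metric
U, s, Vt = np.linalg.svd(Z, full_matrices=False) # principal axes via the SVD
plane = Z @ Vt[:2].T                             # scores on the two largest components

explained = (s ** 2) / np.sum(s ** 2)            # variance share per component
print(plane.shape)                               # (10, 2)
print(f"plane explains {explained[:2].sum():.1%} of the variance")
```

Each program is now a point on a plane chosen to retain as much of the original variation as possible, and the MDS iterations work from there.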

Similarity Metric

The similarity metric will be a kind of synthetic measure since it will be derived from other, more conventional metrics. The particular conventional metrics used to generate this similarity metric are listed in Table 2. I will not formally define these metrics here; however, the reader must take note that for the most part the definitions themselves will be arbitrary.

Cautionary Notes

Anyone who has done metric studies will tell you that it is relatively easy to collect the actual counts. In this study 13 basic counts were used; however it would have required very little extra effort to make this figure much larger. The problem is that all the definitions are arbitrary and represent essentially personal views. Ideally, we would like to measure some intrinsic properties of the code that demonstrably are sound indicators of program quality.

Basic Counts             Synthetic Values

Arithmetic Count         Arithmetic Density
Blank Line Count         Average Procedure Size
Branch Count             Branch Density
Comment Count            Code/comment ratio
Executable Lines         Eloc-50
Function Count           Function Density
IO Count                 Data/Code ratio
Paragraph Count          Paragraph Density
Section Count            Section Density
Total Lines Count        Data Complexity
Transformation Count     Transformation Density
Unique Variable Count    Efficiency Estimate
Unique Verb Count        Variable/verb ratio

Table 2. Study Metrics.

However, I want to formally define planar similarity, a new synthetic, using the framework proposed by Ross [11] for the software data library.

Name
Planar similarity

Explanation
The distance between points, each point representing a program, on a planar plot produced after statistically scaling a given set of conventional metrics.

Elaboration
The measure applies to any set of conventional metrics so long as the metrics refer to the same type of application. As such planar similarity may be used as both a control and predictor metric.

Measure
Planar similarity is found by calculating any set of conventional metrics and displaying graphically, after scaling, several values obtained from other similar products and the parent generic program or design. The actual planar similarity value between programs can then be measured directly from the plot.
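Reading the value off the plot amounts to a straight-line distance between plotted points. A minimal sketch (all coordinates invented for illustration):

```python
import math

# Hypothetical 2-D coordinates from a scaled planar plot.
points = {
    "generic": (4.1, 0.2),
    "prog_a": (-2.8, 1.9),
    "prog_b": (0.3, 0.4),
}

def planar_similarity(p: str, q: str) -> float:
    """Straight-line distance between two plotted program points."""
    (x1, y1), (x2, y2) = points[p], points[q]
    return math.hypot(x1 - x2, y1 - y2)

for name in ("prog_a", "prog_b"):
    print(f"{name} vs generic: {planar_similarity(name, 'generic'):.2f}")
```

Here the hypothetical prog_b would be judged much closer to the parent generic than prog_a.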

Comparability Data
For the measure to be effective it is necessary to determine which conventional metrics are good indicators of similarity for the application to which the technique is applied. Additionally, it will be necessary to ensure that the programs being compared are validly derived from a known generic.

Examples
Experimental evidence would indicate that conventional metrics such as normalized executable lines of code or decision counts are both reasonable indicators of similarity when a well developed generic is used.

Uses
Planar similarity can be used in several ways depending on the conventional metrics whose closeness to some parent generic or other set of programs is being examined. For example, project leaders might look at several conventional metrics, obtained from a series of application programs after scaling, in order to decide whether team members are adhering to agreed generic design standards. In general it would be used to exaggerate deviations from the norm, the norm usually being expressed in terms of a well developed generic. Additionally, planar similarity could be used as a validating mechanism for any proposed new set of metrics.

One further possible use is to indicate similarity between fault tolerant programs. Thus if several different implementations of the same algorithm are used then it might be possible to assess how well the differing versions meet the specification.

DATA COLLECTION FOR PLANAR SIMILARITY METRIC

Introduction

In order to test the premise that a planar similarity metric would yield useful data on both control and predictive factors in software design it was decided to use a series of programs written by undergraduate students. The students were all familiar with using generic design methods for production of COBOL code so no relearning was involved, and in the majority of cases this was the only commercial programming strategy with which they were familiar. Additionally, about thirteen programs were chosen randomly from an existing commercial system in case the measure turned out to be so coarse that it would even indicate similarity in such a sample.

Data Sample Characteristics

Four samples were taken; three were chosen from the standard assignment schedule of students and one random sample as described above. For the first three samples the programs were chosen so that they represented:

Use of three well defined generics.


Consistent task types; in particular all programs are essentially report programs.

Increasing order of difficulty.

This sample construction was chosen because it was expected that use of generic code would produce similar results for at least the simpler programs. The second and third samples represent relatively complex programs where individual student choices may lead to diverse program implementations.

ANALYSIS OF STATISTICAL FINDINGS

This section discusses the principal component analysis and the MDS final configurations. At the end of a long study it would be pleasing to report some deeply significant results. However this is rarely the case, and though my findings are, I think, quite exciting, they are nevertheless just a step along the way to understanding the nature of software and its development process.

Analysis of MDS Final Configurations

MDS is commonly used when we are not sure about the underlying dimensions of the space involved. To do this we look at the plots (figures 1 to 4) and the objects they represent in order to find the dimensions of a reduced space, and this is where the method ceases to be purely mechanical in nature. There are techniques for helping in this search but none are universally accepted. In practice, as with any statistical technique, close knowledge of the subject space is needed to make sense of the plots. If you cannot identify any dimensions in this way it probably means that your set of measures together do not yield any sensible dimensionality.

In general we look for outliers and their orientation in the plots, since the programs represented by such points are deviant in some way. However, the plots do give some additional information in that it is possible to identify some dimensions. Using these dimensions we can begin to say with some degree of confidence why a particular program is different from the rest.
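One simple way to make the outlier search mechanical (an assumption of mine; the study relied on visual inspection of the plots) is to flag points that lie unusually far from the cluster centroid:

```python
import numpy as np

def outliers(X: np.ndarray, factor: float = 2.0) -> np.ndarray:
    """Indices of points further from the centroid than factor x the mean distance."""
    centroid = X.mean(axis=0)
    dist = np.linalg.norm(X - centroid, axis=1)
    return np.where(dist > factor * dist.mean())[0]

# A tight cluster of four program points plus one deviant program.
X = np.array([[0.1, 0.0], [0.0, 0.2], [-0.1, -0.1], [0.2, 0.1], [5.0, 4.0]])
print(outliers(X))                               # prints [4]
```

The threshold factor is arbitrary, which mirrors the point made above: deciding what counts as deviant ultimately needs knowledge of the subject space.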

Sample 1 Plots

Fig 1 shows the general distribution of points and clearly shows four outliers. One of these is the generic (G) and the others are the three programs A, B, and C. The advantage of the plot is that such deviant programs are easy to identify. An examination of all the programs in the sample shows that points (representing programs) near the centre of the cluster are those that are sound implementations of the specification but naturally differ in small details such as the method of initializing variables or the number of paragraph labels used.


Figure 1. Sample 1 Data with and without Generic

Programs B and C, which are above the main cluster, exhibit weak binding. In particular they are examples of coincidental binding. This is noticeable since the various program modules are not well defined. For example, program B has read instructions mixed up with instructions that set up a print line.

Program A also exhibits weak binding but the problem here is that modules contain repeated code elements. In this particular case the program has too much functionality. Interestingly the generic lies at the other extreme of the scale - logically, we can say by comparison, that the generic has not enough functionality. Clearly this is the case since the generic is a framework upon which we build particular programs; that is, we expect to add functionality to the generic.

Sample 2 Plots

Fig 2 shows the general distribution of the points for the second sample. This time it looks as if we have three outliers, one of them being the generic. The position of the generic can be explained in a similar way to that for the sample 1 plots. The remaining deviant programs, A and B, both show some structural weaknesses, but not in significant amounts. Perhaps the fact that the cluster stretches out along one axis indicates that the programs are indeed functionally similar and in general are correctly modularised, but with small individual differences, as one would expect.

The generic itself seems quite far away from the main cluster and this might indicate that further work is needed to produce a more suitable framework. Also the generic introduces several new features, and it may well be that when students first use it they are a little confused because of its complexity. Again, if this is true then some re-work might be worthwhile.

It is perhaps worth commenting at this point that data structures do not seem to be significant in determining whether a given program is close to the generic or not. Alternatively, one might argue that the structures are so simple that most students in general do not make serious mistakes in this area. However it is more than likely that the metrics are not sufficiently relevant to pick out data structure faults.


Figure 2. Sample 2 Data with and without Generic

Sample 3 Plots

Because of a data collection error one program was included in this set when it should have been placed in sample 2. However, the plot shown in Fig 3 does emphasise that there is something very different about program A, since even the generic is lost in the general cluster and we know from the other examples that this is not the usual case. This mistake was useful for it does highlight how sensitive the procedure can be.

Figure 3. Sample 3 Data with and without Generic

A close examination of the programs shows that they are all quite good implementations of the specification, and this would also cause the cluster to compress, thus further exaggerating the outlier.

Sample 4 Plots

This set represents a completely random choice of programs. All that can be said here is that the points shown in Fig 4 are scattered over the plane with no obvious orientation or particular deviating points. This is as expected from a random sample where no generic was used, so any similarity is likely to be just coincidental.

Principal Component Analysis

Principal component analysis has been used in two ways in this study: first to generate a plane for use in the MDS procedure, and second as a means of generating a reduced set of variables that can account for most of the variability in the original data. Taking the principal components overall, it is clear that in every program the principal components are most likely to be:


1st Component Executable lines of code.

2nd Component Section and Paragraph counts; in COBOL terms these values are essentially procedure counts.

3rd Component Nothing sensible, it seems, can be said regarding this component.

Figure 4. Sample 4 Data with and without Generic

In summary, most of the variation can be explained in terms of simple size metrics. This is not altogether surprising, though it does confirm the findings of other researchers and corroborate the commonsense suspicions of sensible workers in the metrics field. The second component is a little more interesting and points to the fact that modularization is probably an important feature of all software.

Summary Findings Based on MDS Plots

Taking all the MDS and principal component findings together we may summarise them as follows.

Including the generic seems to provide a useful scaling factor and does not in general tend to compress the remaining data to an extent that obscures any outliers. Points that are distant from the main cluster are almost certainly indicative of poor structuring, evidenced by weak binding of one sort or another.

The principal components responsible for the majority of the variation are likely to be line and procedure counts, or synthetics derived from them.

Clusters have generally been found to come from programs that adhere closely to the generic and represent sound implementations of the specification.

There is good evidence that the horizontal axis represents the degree of functionality. Program points lying to the right of the cluster do not have enough functionality - the generic being a prime example - whilst programs lying to the left of the cluster have too much functionality in that they tend to contain repeated functions.

Transactions on Information and Communications Technologies vol 4, © 1993 WIT Press, www.witpress.com, ISSN 1743-3517


The vertical axis seems to measure functional binding, that is, the degree to which a module does just one task. Anything above the main cluster is likely to exhibit coincidental binding, implying that the various program functions have not been properly separated. In a similar way it seems probable that outliers below the main cluster have been excessively functionalized; that is, many of their procedures are trivial.

CONCLUSIONS

Use of generic programming principles proved to be a fruitful way to learn COBOL. Students effectively have sample programs to learn from, and those same samples are being used by other course members. The two major advantages are that it becomes possible to discuss one's code meaningfully with someone else, and that, since the generic is well tried, students are free to concentrate on the specification and need not worry about overall structure or the small processing details common to a whole class of applications.

It would appear that simple counts are as good as, and probably better than, synthetics when it comes to measuring code to obtain quality indicators.

From the MDS plots it seems certain that metric measures will be application-dependent. In the test carried out it was possible to make sensible predictions about particular programs when a generic was involved. However, for plots of a random sample no such inferences could be made.

Planar similarity was proposed. It turns out that such a metric is useful, since it clearly indicates structural weaknesses in code. More research is needed, but it does look as if two of the dimensions of code are binding and functionality.
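One way such a planar metric might flag structural weakness is to measure each program point's distance from the cluster centroid in the two-dimensional plane. The sketch below is an illustrative reading under that assumption, not the paper's exact formula; the threshold, function name, and coordinates are all invented.

```python
import numpy as np

def planar_outliers(coords, k=2.0):
    """Flag points whose distance from the cluster centroid in the
    2-D (MDS) plane exceeds k times the mean distance.  An
    illustrative reading of planar similarity; the threshold rule
    is an assumption, not taken from the paper."""
    centroid = coords.mean(axis=0)
    d = np.linalg.norm(coords - centroid, axis=1)
    return d > k * d.mean()

# Invented plot coordinates: four clustered programs and one outlier.
pts = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
                [3.0, 2.5]])
print(planar_outliers(pts))
```

Points flagged this way would then be inspected by hand, with their position above, below, left, or right of the cluster suggesting which kind of binding or functionality problem to look for.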

In my sample the same programmer's work seemed to feature a little too often among the outliers. It follows that poor programmers, or those not using the correct generic, can be identified for retraining or, if necessary, censure.

The results did not indicate very much in terms of the data structures used. This may be partly a reflection of my own predisposition toward procedurally based methods.

The principal component analysis is a little weak because an adequate sample was not taken. However, it does tend to confirm that simple size measures probably account for most of the variation in metric samples.

516 Software Quality Management

References.

1. Rumbaugh, J. et al., Object-Oriented Modeling and Design, Prentice-Hall, 1991.

2. Boehm, B.W., Software Engineering Economics, Prentice-Hall, 1981.

3. Kitchenham, B.A. and Wood, L.M., Statistical Techniques for Modelling Software Quality, REQUEST/ICL-bak/049/S1/QL-RP/01, 1986.

4. Boehm, B.W. et al., Characteristics of Software Quality, North-Holland, 1978.

5. McCall, J.A. et al., Factors in Software Quality, Volumes I, II and III, RADC reports, NTIS/AD/A-049 014, 015 & 055, 1977.

6. Bowen, T.P. et al., Specification of Software Attributes, Volumes I, II and III, RADC reports (prepared by Boeing), D182-11678-1, 2 & 3, 1984.

7. Moller, K.H. and Paulish, D.J., Software Metrics: A Practitioner's Guide to Improved Product Development, Chapman & Hall, 1993.

8. Kitchenham, B.A. and Walker, J.G., The Meaning of Quality, Draft, 1986.

9. Schiffman, S.S., Reynolds, M.L. and Young, F.W., Introduction to Multidimensional Scaling, Academic Press, 1981.

10. Manly, B.F.J., Multivariate Statistical Methods: A Primer, Chapman & Hall, 1993.

11. Ross, N., Data Definitions, Software Library Report 2.2.2, 1986.
