Software products evaluation system: quality models, metrics and processes—International Standards and Japanese practice

Information and Software Technology 38 (1996) 145-154

Motoei Azuma

Department of Industrial and Management Systems Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169, Japan

Abstract

ISO/IEC JTC1/SC7/WG6 and JSA/INSTAC/STD/WG5 activities are introduced, followed by an outline of product quality evaluation. Then a concept of an evaluation system is proposed. Software evaluation technologies mainly derived from SC7/WG6 and INSTAC activities and some other contributions are explained based on the system concept. ISO/IEC 9126 may be used not only by developers for evaluating their products, but also by a user for selecting a product from alternatives and for many other purposes by many other people. This paper focuses on the evaluation for developers.

Keywords: Product evaluation; Quality evaluation; Quality metrics; Software measurement; Software metrics; Software quality

1. Introduction

As the computer application area has expanded, so has the criticality of computer based systems. Examples are human life critical systems, social life critical systems, and security critical systems. Probably because software in itself is harmless, the public in general seems to be indifferent to software quality. However, it is software quality that has a significant influence on system quality.

A wide variety of off-the-shelf software packages for personal computers, such as word processor, spreadsheet, drawing, and presentation software, have come into use as the utilization of personal computers in the business environment has spread. An especially rapid spread of downsizing, driven by powerful personal computers, local area networks and the Internet, has resulted in software quality problems being highlighted.

In order to develop high quality software, the following are considered to be important.

(1) To make the input to the process better, that is to say to clarify quality requirement and development policy.

(2) To utilize good resources, such as techniques, highly skilled people and better environments.

(3) To design good development processes, measure the processes, control the processes and improve the processes.

(4) To plan, implement and do software product evaluation properly for both intermediate and final products.

A software product is evaluated by the degree to which it satisfies the required quality. To develop software without any goal and try to acquire quality by testing is a waste of time and effort. In order to develop a high quality software product without redundancy, it is necessary to define quality requirements clearly, and to evaluate the product at an early stage of the life-cycle concretely and qualitatively.

0950-5849/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved

SSDI 0950-5849(95)01069-6

ISO/IEC JTC1/SC7 (International Organization for Standardization/International Electrotechnical Commission-Joint Technical Committee 1/Sub-Committee 7) is working on standardization in the area of software engineering. It consists of nine working groups. SC7/WG6 (Working Group 6: Evaluation and Metrics) is developing drafts for a series of international standards on software product evaluation. ‘ISO/IEC 9126: Information technology-Software product evaluation-Quality characteristics and guides for their use’ [1] is the initial output of the series, which is now widely used and regarded as an unexpectedly successful international standard.

INSTAC/STD/WG5 (INSTAC: Information Technologies Research and Standardization Center) was established in 1987 under the sponsorship of MITI (Ministry of International Trade and Industry) for the purpose of supporting SC7/WG6. It has been conducting research and publishing the results as annual reports. The past work was compiled into a book and published by JSA (Japanese Standards Association) in 1994 as Software quality evaluation guide book [2].

In this paper, SC7/WG6 and INSTAC/STD/WG5 activities are introduced, followed by an outline of product quality evaluation. Then a concept of an evaluation system is proposed. Software evaluation technologies mainly derived from SC7/WG6 and INSTAC activities and some other contributions are explained based on the system concept. ISO/IEC 9126 may be used not only by developers for evaluating their product, but also by a user for selecting a product from alternatives and for many other purposes by many other people. This paper focuses on the evaluation for developers.

2. Background

2.1. JTC1/SC7/WG6 activities

Working groups and their conveners are listed in Table 1. It also illustrates major projects. WG6 (Evaluation and Metrics) is responsible for developing the 9126 series and the 14598 series of international standards on software product evaluation as project 7-13.

The ‘ISO/IEC 9126: Information technologies-Quality characteristics and metrics’ series is a revision of the current ISO/IEC 9126 and consists of the following three parts.

Part 1 Quality characteristics
Part 2 External metrics
Part 3 Internal metrics

The purpose of the ‘ISO/IEC 14598: Information technologies-Software product evaluation’ series is to provide a set of requirements, recommendations and some guides for the product evaluation process. It consists of the following six parts.

Part 1 General overview
Part 2 Planning and management
Part 3 Process for developers
Part 4 Process for acquirers
Part 5 Process for evaluators
Part 6 Evaluation modules

In addition to these work items, WG6 has already finished the project on ‘ISO/IEC 12119: Quality requirement and testing’ [3], which was published in 1994. It defines quality requirements and testing directives for software packages, i.e. off-the-shelf software for consumers, based on each of the six quality characteristics defined by 9126. SC7/WG4 developed the new standard ‘ISO/IEC 14102: Guidelines for evaluation and selection of CASE tools’ [4]. It also gives guides based on each of the six quality characteristics defined by ISO/IEC 9126. Among other projects related to software product evaluation, ISO/IEC 14143: Functional size measurement (Committee Draft, WG12) gives a basic size measure which is useful not only as a measure of scale but also as a base for quality metrics. The new work item ‘Measurement and rating of performance of computer-based software systems’ was approved by JTC1 ballot and assigned to WG6; it defines metrics and a process for performance measurement.

Table 1
Working groups in JTC1/SC7

Working group  Title                                                    Convener
WG2            System & Software Documentation                          K. Jonson (UK)
WG4            Tools and Environment                                    T. Vallman (USA)
WG6            Evaluation and Metrics                                   M. Azuma (Japan)
WG7            Life Cycle Process                                       R. Singh (USA)
WG8            Support of Life Cycle Process                            M. Kaplan (USA)
WG9            Software Integrity                                       D. Kiang (USA)
WG10           Process Assessment                                       A. Dorling (UK)
WG11           Software Engineering Data Definition and Representation  P. Eirich (USA)
WG12           Functional Size Measurement                              H. Rehesaar (Australia)

2.2. Corresponding standards organizations in Japan

MITI supports the secretariat of international and national standardization activities. Publication of standards is the responsibility of JSA, which is a non-profit organization.

Technical work, such as making or reviewing a draft and preparing comments, is entrusted to various professional societies and associations. Work in the area of JTC1 is entrusted to IPSJ/ITSCJ (Information Processing Society of Japan/Information Technologies Standardization Commission of Japan). ITSCJ has many sub-committees, each of which corresponds to a sub-committee of JTC1. Japanese SC7 is one of these sub-committees, and the author is its chairman. Its members come from universities, mainframe computer manufacturers, the software industry and representatives of users. It has working groups which correspond to those of JTC1/SC7. Therefore, Japanese national SC7/WG6 (Convener: M. Azuma) is responsible for the 9126 series and 14598 series.

In order to support standardization in information technologies, JSA established INSTAC in 1988. WG5 is one of the working groups in INSTAC and has been working on software product evaluation technologies. Its results have been published annually. Some parts of the results were translated into English and submitted to SC7/WG6.

3. Concept of software evaluation

3.1. What is quality evaluation?

TQM and quality evaluation. TQM (Total Quality Management) has been applied successfully in various industries in Japan. The concept and techniques of TQM were transferred to software management. SWQC (Software Quality Control), to which the author made initial contributions [5], is well known as a successful example. The PDCA (Plan Do Check Action) cycle (Fig. 1) and the quality model (quality deployment), which are accepted as effective techniques for software quality management, are key techniques of TQM and popular in the TQM community.

Quality system and quality evaluation. In this model, in order to make timely and correct actions, it is necessary to measure and assess both products and processes. If there is no information that shows which part of the product and which characteristic of the product needs to be improved, no action is possible for improvement. ISO 9001 Quality system states requirements for implementing a quality system. However, as it is a generic standard, i.e. not a software specific standard, there is no requirement or recommendation concerning quality characteristics, metrics, and evaluation process. In these respects, the ISO/IEC 9126 and 14598 series and the ISO 9000 series complement each other.

Fig. 1. TQM PDCA cycle model.

Process assessment and quality evaluation. SEI (Software Engineering Institute, Pittsburgh, USA) developed CMM (Capability Maturity Model), which is widely accepted [6]. CMM specifies process maturity by five levels, namely from Level 1 to Level 5. Measurement is one of the important conditions to improve the organization’s maturity from Level 2 to Level 3. The top level, that is Level 5, requires an organization to make continuous efforts to achieve quality and productivity improvements. This continuous effort is exactly what excellent Japanese companies, such as Toyota, have made and are still making to achieve excellence.

JTC1/SC7/WG10 (Process Assessment, Convener: A. Dorling) is preparing working drafts for a series of international standards on process assessment with reference to CMM concepts. If technologies are available for measuring software product quality precisely, measures are useful for process assessment too. Thus, a good cycle of process and product assessment can be generated.

Supporting tool and techniques for evaluation. Evaluating a software tool is a kind of product evaluation, because it is also a software product, especially from the software tool developers’ viewpoint. Therefore technologies for product evaluation are useful for this purpose. On the other hand (a) selecting the right technology such as a design method and CASE tools; and (b) predicting and ensuring the product quality, are important for improving quality and productivity from the developer’s viewpoint. For this reason, the international standard ‘ISO/IEC 14102 Guidelines for evaluation and selection of CASE tools’ was developed by JTC1/SC7.

3.2. Purposes of quality evaluation

Evaluation is the key to success for developing or acquiring good software. Evaluation is necessary at various stages of the software life-cycle for various target entities depending on the specific purpose. In general, the purpose of evaluation is to judge how good the target entity is for the specific objectives, which include:

(1) To select something from two or more alternatives.
(2) To estimate or predict values of a target entity.
(3) To assess the effect of a target entity when it is used.
(4) To get information on intermediate products or process for controlling and managing the process.

Evaluation purpose is clarified when the target entity is defined. The following are the examples of evaluation purposes.

Intermediate product evaluation. Product evaluation may be categorized into two categories, i.e. intermediate product evaluation and end product evaluation. Intermediate product evaluation is done at the end of a stage of the life-cycle as a part of formal review. Usually it is done by a developer or a reviewer in the organization. The purposes of intermediate product evaluation may be:

(a) to make a decision whether the intermediate product developed by the subcontractor has sufficient quality to accept;
(b) to make a decision at the end of a process whether the product has sufficient quality to be forwarded to the next process;
(c) to clarify a specific part or attribute which does not meet the requirement or which proves to be the cause of discrepancy;
(d) to predict end product quality; or
(e) to give data for process improvement by analysing the cause of excellent parts or bad parts.

At this stage, usually, it is not possible to execute a program for the evaluation. Therefore, the targets to be measured and evaluated are internal documents, such as specification and source program codes. The most popular tool for intermediate product evaluation is a checklist.

End product evaluation. The end product evaluation is further categorized into three sub-categories by the person who does the evaluation, i.e. developers, acquirers and independent evaluators. Purposes and methods are different depending on the sub-category. JTC1/SC7/WG6 is preparing requirements and guides for each.

(1) Product evaluation by developers

Usually this evaluation means testing stage evaluation. Evaluation is done by executing the target program as testing. Its purpose may be either deciding release of the product, or comparing the product with competitive products.

(2) Product evaluation by acquirers

Acquirers may be either those who acquire software products from the contractor, or those who buy software packages. In the case of the former, the main purpose is to validate the product against the quality requirement for deciding acceptance of the product. In the latter case, selecting a product from alternative products is the main purpose. Usually evaluation must be done by comparing the specifications. Even when testing is permitted, the nature of it is quite different from the former case, because only limited testing is possible.

When the product is to be used as a part of a critical system, e.g. a safety critical system, an acquirer is responsible for assessing the product to see whether it has sufficient quality characteristics for the criticality or not. This must be carried out for both stated and implied needs. Quality assurance of critical software is considered to be the responsibility of both developers and acquirers, even if some quality requirement is not stated.

(3) Product evaluation by independent evaluators

It means evaluation by evaluators in testing laboratories or other evaluators in an independent organization, either in the same corporation or outside of it. There are three types of purposes of evaluations in this category: (a) evaluation by a request from a developer; (b) by a request from an acquirer; and (c) others such as comparing products for a software magazine. Applicability of information, testing environment and other conditions are different for each case.

4. Quality evaluation system

In order to carry out quality evaluation successfully, not only is each individual technique or tool important, but also it must be integrated well with others. Applying a system concept as Quality Evaluation System (QES) is helpful for this integration.

4.1. Conceptual models of software QES

Quality Evaluation System is a human-computer system which receives evaluation request and quality requirement specification as inputs, does planning, designing, implementing, executing evaluation as processes, and transfers results as outputs.

QES may be implemented as a part of a higher level system (larger scope system), such as software development system or software acquisition system, or it may be considered as an independent system.

QES consists of such resources as follows.

QES-R = {Quality model, Metrics, Measurement tools, Evaluation techniques, Data management tools, Evaluators, Computers}

QES Process (QES-P) can be decomposed into Support Process (QES-SP) and Evaluation Process (QES-EP). Each process is decomposed into sub-processes as follows.

QES-P = {QES-EP, QES-SP}

QES-EP = {Evaluation requirement analysis, Planning and implementation, Measurement and rating, Total assessment, Evaluation management}

QES-SP = {Technologies development, Technology transfer, Technology assessment, Technologies and data (experience) management, Standardization, Evaluation support management}

These resources and processes should be integrated, defined formally, and standardized. The concept of the quality model and the metrics for the evaluation system are described in clause 5 and 6 respectively, and the process of the evaluation system is stated in clause 7.
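As an illustration only, the resource and process sets above can be written down as plain data. The following Python sketch is not part of the QES proposal: the variable names (`QES_R`, `QES_EP`, `QES_SP`, `QES_P`) simply mirror the notation in the text, and the helper `find_process` is a hypothetical convenience introduced here.

```python
# Illustrative sketch: the QES sets from the text as plain Python data.
# Names mirror the paper's notation; find_process is a hypothetical helper.

QES_R = frozenset({
    "Quality model", "Metrics", "Measurement tools", "Evaluation techniques",
    "Data management tools", "Evaluators", "Computers",
})

QES_EP = (
    "Evaluation requirement analysis",
    "Planning and implementation",
    "Measurement and rating",
    "Total assessment",
    "Evaluation management",
)

QES_SP = (
    "Technologies development",
    "Technology transfer",
    "Technology assessment",
    "Technologies and data (experience) management",
    "Standardization",
    "Evaluation support management",
)

# QES-P = {QES-EP, QES-SP}
QES_P = {"QES-EP": QES_EP, "QES-SP": QES_SP}


def find_process(sub_process: str) -> str:
    """Return the top-level process (QES-EP or QES-SP) owning a sub-process."""
    for name, sub_processes in QES_P.items():
        if sub_process in sub_processes:
            return name
    raise KeyError(sub_process)


if __name__ == "__main__":
    print(find_process("Measurement and rating"))
```

Writing the sets down in this way is one possible first step towards the formal definition and standardization of resources and processes that the text calls for.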

4.2. QES requirements

QES should satisfy but not be limited to the following requirements.

Repeatability. The same results are expected by repeated evaluation on the same product.

Objectivity. The same results are expected by a different evaluation done by a different person or by a different team.

Quantitativeness. Measurement is quantitative and the presentation of the results is quantitative.

Indicativeness. When some discrepancy or other problems are found by the evaluation their causes and required actions are indicated.

Economicality. The cost required for the evaluation is relatively small, that is, cost effective. Evaluation depending on priority should be possible.

Inclusiveness. The evaluation should cover all quality characteristics.

5. Quality model and quality characteristics

5.1. Quality model

A quality model is a model which deploys quality into a set of characteristics and shows the relationships between them. It provides the basis for specifying quality requirements and evaluating quality of a product.

Quality can be represented by a set of characteristics at various levels of abstraction. ISO/IEC 9126 states: ‘A software quality characteristic may be refined into multiple levels of sub-characteristics.’ Using higher level abstraction is convenient for quick understanding, especially for management. However, more detailed technical work must be done based on lower levels of characteristics.

As initial input to the area, the Boehm model [7] and the McCall model [8] are popular. Quality deployment is a well known and widely used technique of TQM in Japan. M. Azuma and T. Sunazuka developed SQMAT (Software quality measurement and assessment technology) [9], which is a total quality evaluation system that includes a quality model and metrics. Some other companies, such as NTT, Fujitsu, Hitachi and Toshiba, developed quality models and evaluation methods in Japan.

A quality model is a reflection of quality from a specific view. Therefore, any specialist can propose a new model, and there is no single solution. Some models may have logical problems, and may be useless. But it is more likely that each model has a reason to exist. This situation causes a problem of communication among software related people. If two people use the same word with different meanings they cannot communicate with each other. For example, if one person means one thing and assumes that the other understands it in the same way, while the other in fact understands it differently, the misunderstanding may cause a big problem. Therefore the quality model and the associated definitions of characteristics should be standardized, so that people can use the same terms with the same meaning.

ISO/IEC 9126 was developed to give a baseline of quality model (Fig. 2) with associated quality characteristic definitions. INSTAC developed further refinement based on 9126. Fig. 3(a), Fig. 3(b) and Fig. 3(c) show the INSTAC quality model.

Quality characteristics  Subcharacteristics
Functionality            Suitability, Accuracy, Interoperability, Compliance, Security
Reliability              Maturity, Fault tolerance, Recoverability
Usability                Understandability, Learnability, Operability
Efficiency               Time behaviour, Resource behaviour
Maintainability          Analyzability, Changeability, Stability, Testability
Portability              Adaptability, Installability, Conformance, Replaceability

Fig. 2. ISO/IEC 9126-quality characteristics.

INSTAC Quality Model introduced a concept of internal characteristics. Relations between (external) quality characteristics/sub-characteristics and internal characteristics are shown in this model using three different symbols ranging from strong relation to weak relation. However, as the model is developed by a kind of Delphi method, which means expert opinion with feedback, it is not proved to be the best model. More efforts must be made to refine the model based on statistics and other scientific approaches. Therefore it is rather an initial input for further studies.
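The two-level hierarchy of Fig. 2 lends itself to a simple data representation. The sketch below is only an illustration: the dictionary `ISO_9126_MODEL` transcribes Fig. 2, while the helpers `characteristic_of` and `missing_characteristics` are hypothetical names introduced here and are not defined by ISO/IEC 9126 or by INSTAC. The relation strengths between sub-characteristics and internal characteristics in the INSTAC model are not reproduced, because they cannot be read from the figures.

```python
# Illustrative sketch: the ISO/IEC 9126 quality model of Fig. 2 as a mapping
# from each quality characteristic to its sub-characteristics.
# The helper functions are hypothetical and not part of the standard.

ISO_9126_MODEL = {
    "Functionality": ["Suitability", "Accuracy", "Interoperability",
                      "Compliance", "Security"],
    "Reliability": ["Maturity", "Fault tolerance", "Recoverability"],
    "Usability": ["Understandability", "Learnability", "Operability"],
    "Efficiency": ["Time behaviour", "Resource behaviour"],
    "Maintainability": ["Analyzability", "Changeability", "Stability",
                        "Testability"],
    "Portability": ["Adaptability", "Installability", "Conformance",
                    "Replaceability"],
}


def characteristic_of(sub_characteristic: str) -> str:
    """Return the quality characteristic that a sub-characteristic refines."""
    for characteristic, subs in ISO_9126_MODEL.items():
        if sub_characteristic in subs:
            return characteristic
    raise KeyError(sub_characteristic)


def missing_characteristics(requirements: dict) -> list:
    """List characteristics for which no quality requirement is stated."""
    return [c for c in ISO_9126_MODEL if not requirements.get(c)]


if __name__ == "__main__":
    stated = {"Functionality": ["all stated functions are provided"],
              "Efficiency": ["response time within an agreed limit"]}
    print(characteristic_of("Maturity"))
    print(missing_characteristics(stated))
```

Used as a checklist in this way, the model makes it visible which characteristics an evaluation or a requirement statement has not yet covered.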

Fig. 3(a). INSTAC quality model-functionality. [Figure. Sub-characteristics: Suitability, Accurateness, Inter-operability, Compliance, Security.]

Fig. 3(b). INSTAC quality model-reliability. [Figure. Sub-characteristics: Maturity, Fault tolerance, Recoverability.]

[Internal characteristics appearing in Fig. 3(a) and Fig. 3(b) include: Access control, Access audit, Robustness, Completeness, Traceability, Consistency, Self-descriptiveness, Integrity, Modularity, Simplicity, Instrumentalability, Self-containedness.]

Fig. 3(c). INSTAC quality model-usability. [Figure. Sub-characteristics: Understandability, Learnability, Operability; further entries: Installability, Controllability, Communicativeness. Internal characteristics: Uniformity, Expressiveness, Hierarchieness, Informativeness, Metaphorability, Well-equipmentness, Attractiveness, Timeliness, Memorability, Conciseness, Choosability, Guideability, Safety, Labor-saving, Adjustability.]

5.2. Quality characteristics

‘Quality characteristic’ is defined as ‘A set of attributes of a software product by which its quality is described and evaluated’ in ISO/IEC 9126. A quality characteristic is a specific view of quality. An attribute may be considered as ‘a measurable physical or abstract property of an entity’.

Sometimes a community takes a term which has a generic meaning and uses it with a specific meaning. ‘Set’ in the mathematics community and ‘relation’ in the database community are examples of this. When one term is used with different meanings by two or more communities it may cause confusion, and especially when people in both communities have an opportunity to work jointly, it may cause a problem. This problem exists in defining a quality characteristic. ‘Usability’ is an example. It is used by information technologies people and ergonomics people with somewhat different meanings.

INSTAC categorized quality characteristics of a software product as external and internal. When software quality is evaluated, it may be done either by testing or using the software, or by analysing the software, i.e. its documentation or source code. The former is the case of measuring and evaluating external characteristics, and the latter is the case of internal characteristics.

SC7/WG6 started to revise ISO/IEC 9126 (new 9126-1) as stated before. This new version will include sub-characteristics not as an informative annex but as the main body. But we must wait until more research on internal characteristics is done and contributed. WG6 is planning to recommend the new quality model of 9126-1 for use as the default standard model, unless there is a specific model in a specific community such as safety critical systems or usability critical systems.

6. Metrics

6.1. Concept of metrics and measurement

‘Measurement’ is a process to assign a value to an attribute of a target entity. ‘Measure’ is an assigned value as a result of measurement. ‘Metric’ is a scale and associated rule and method to be applied for a measurement process. For example, program size may be measured by such scales as LOC (Lines of Code), FP (Function Points), the memory size which the program requires, and specification pages. A value obtained by measurement, such as LOC or FP, is a measure of the target program size.

But if there is no rule for counting LOC, the same program may have various values depending on how it is counted, as is widely known. The rule and method for counting LOC of a target program is a metric.

Norman categorized scales into nominal, ordinal, interval and ratio. He further categorized targets of measurement by entities and attributes. He categorized entities into products, processes and resources, and attributes into internal and external.

There are two kinds of measures, i.e. direct measures and indirect measures. Norman defined them as: ‘Direct measurement of an attribute is measurement which does not depend on the measurement of any other attribute. Indirect measurement of an attribute is measurement which involves the measurement of one or more other attributes.’ [10]

6.2. External metrics

External metrics are those which are used to measure external characteristics. MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) are popular metrics for measuring reliability.

INSTAC collected metrics in this category and classified them based on the quality model. Some examples are shown in Fig. 4(a), Fig. 4(b) and Fig. 4(c).

$$\text{Functional Specification Change Ratio} = \frac{\text{The Number of Functions Changed}}{\text{The Number of Functions Implemented}}$$

$$\text{Change Request Ratio} = \frac{\text{The Number of Users' Requests for Change}}{\text{Product Scale (KLOC or The Number of Functions)}}$$

Fig. 4(a). Examples of INSTAC external metrics-functionality-suitability.

$$\text{Mean Time To Failure} = \frac{\text{Total Operating Time}}{\text{The Number of Observed Failures}}$$

$$\text{Product Error Density} = \frac{\text{The Number of Errors in Product}}{\text{Volume of Product}}$$

Fig. 4(b). Examples of INSTAC external metrics-reliability-maturity.

$$\text{Default Value Availability Ratio} = \frac{\text{The Number of Operation Commands with Default Value}}{\text{Total Number of Operation Commands}}$$

$$\text{Message Term Consistency} = \frac{\text{The Number of Standardized Terms}}{\text{The Number of Terms in Message}}$$

Fig. 4(c). Examples of INSTAC external metrics-usability-operability-(Controllability).

6.3. Internal metrics

Internal metrics are those which are used to measure internal characteristics. Many companies have statistical reports or at least records of problems, such as bug reports. Complexity metrics, such as cyclomatic complexity [11], and modularity metrics, such as cohesion and coupling [12], are other examples of metrics in this category.

INSTAC collected metrics in this category and classified them based on the quality model. Some examples are shown in Fig. 5.
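Most of the INSTAC example metrics, external as well as internal, are simple ratios of two counts. As a minimal sketch, the external metrics named in Fig. 4 could be computed as follows. The function and parameter names are assumptions made for this illustration, the guard against a non-positive denominator is a design choice of the sketch rather than part of the INSTAC definitions, and the sample figures are invented.

```python
# Illustrative sketch of the ratio-type metrics named in Fig. 4.
# Function names and the sample counts are assumptions for illustration.


def ratio(numerator: float, denominator: float) -> float:
    """Divide two counts, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def functional_specification_change_ratio(changed: int, implemented: int) -> float:
    """Functions changed divided by functions implemented (Fig. 4(a))."""
    return ratio(changed, implemented)


def change_request_ratio(change_requests: int, product_scale: float) -> float:
    """Users' change requests per unit of product scale (Fig. 4(a))."""
    return ratio(change_requests, product_scale)


def mean_time_to_failure(total_operating_time: float, failures: int) -> float:
    """Total operating time divided by observed failures (Fig. 4(b))."""
    return ratio(total_operating_time, failures)


def product_error_density(errors: int, volume: float) -> float:
    """Errors in the product divided by the product volume (Fig. 4(b))."""
    return ratio(errors, volume)


def default_value_availability_ratio(with_default: int, total_commands: int) -> float:
    """Share of operation commands that offer a default value (Fig. 4(c))."""
    return ratio(with_default, total_commands)


def message_term_consistency(standardized_terms: int, terms_in_message: int) -> float:
    """Share of message terms that are standardized terms (Fig. 4(c))."""
    return ratio(standardized_terms, terms_in_message)


if __name__ == "__main__":
    # Invented sample data: 12 change requests against a 48 KLOC product.
    print(change_request_ratio(12, 48.0))
    print(mean_time_to_failure(1000.0, 4))
    print(message_term_consistency(45, 50))
```

All of these are indirect measures in the sense of section 6.1, since each one is derived from two directly counted attributes.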

6.4. Basic metrics

In many cases, when a measure is used for evaluation or selection, the measure should be normalized. For example, the number of faults and the number of failures are important measures. But they are not useful for evaluation unless they are normalized, for example as MTBF or as the number of faults per KLOC.

As a quality characteristic is an abstract property, most measures for it are indirect measures, such as ratios, which are derived from measures of attributes. And there are some measures which are frequently used for composing various indirect measures. We call this class of measures basic measures, and the corresponding metrics basic metrics. KLOC is an example of a basic metric.
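The dependence of a normalized measure on its basic metric can be made concrete with a small sketch. The two counting rules below are assumptions chosen for illustration (they are not rules prescribed by INSTAC or ISO/IEC), and the sample program and fault count are invented. The same program yields two different LOC values, and therefore two different fault densities, which is why the counting rule has to be fixed as part of the metric.

```python
# Illustrative sketch: two hypothetical LOC counting rules applied to the
# same program, and the effect on a normalized measure (faults per KLOC).


def count_physical_lines(source: str) -> int:
    """Rule A: count every line, including blank lines and comments."""
    return len(source.splitlines())


def count_code_lines(source: str) -> int:
    """Rule B: count non-blank lines that are not pure '#' comment lines."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def faults_per_kloc(faults: int, loc: int) -> float:
    """Normalize a fault count by program size in thousands of lines."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return faults * 1000.0 / loc


# Invented sample program used only to exercise the two rules.
SAMPLE = "\n".join([
    "# compute a total",
    "total = 0",
    "",
    "for x in [1, 2, 3]:",
    "    # accumulate",
    "    total += x",
    "print(total)",
])

if __name__ == "__main__":
    for rule in (count_physical_lines, count_code_lines):
        loc = rule(SAMPLE)
        print(rule.__name__, loc, faults_per_kloc(2, loc))
```

Comparing products or projects by fault density is therefore meaningful only when both values were obtained with the same counting rule.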

7. Evaluation process

7.1. Life-cycle and software product evaluation

There are several well-known life-cycle models for software development. The way a software product evaluation process is implemented in a life-cycle may differ depending on the life-cycle model. If the waterfall model is taken as an example, the procedure shown by the U-shape model in Fig. 6 is suggested. This model is still under discussion in SC7/WG6 as a part of 14598-1 General Overview.

This approach is considered as a kind of MBO (Management By Objectives). MBO is a widely accepted management approach, which means that a well-defined goal is a necessary condition for acquiring good results. In the field of software, NEC applied MBO successfully in SWQC. Weinberg demonstrated the importance of a goal in program development by an experiment [13]. The GQM (Goal Question Metric) paradigm proposed by Basili [14], and the OPM (Objectives Principles Metrics) approach by Arthur and Nance [15], are two better defined MBO approaches specific to software quality.

$$\text{Traceable Function Ratio of Specifications} = \frac{\text{The Number of Functions in Specification at Current Phase}}{\text{The Number of Functions in Specification at the Previous Phase}}$$

$$\text{Verifiable Item Ratio} = \frac{\text{The Number of Verifiable Items}}{\text{The Number of Items Checked}}$$

Fig. 5. Examples of INSTAC internal metrics-traceability.

Fig. 6. U-shape model. [Figure; legible label: Internal Review & Verification.]

One feature of the U-shape model approach is that there are two views of quality, i.e. external and internal. Major processes in this model are as follows.

(1) Recognizing needs for new product (pre-development stage)

Developing a product is always triggered by an awareness of needs. Needs may be ambiguous or clear. A user may want to have a better product than what he/she has. Or needs may be just complaint or dissatisfaction with a product. In the case of a software package developer, needs may be a good idea for a new product, or may be a strong wish to develop better products than a competitor’s. Needs may exist in the human mind without being expressed, may be stated verbally or may be described in a document. In any case, they are mostly informal and incomplete.

(2) Defining quality requirement (requirement analysis stage)

In order to develop a good software product, quality requirement must be clearly defined. Without quality requirement, product evaluation is difficult and meaningless, because quality is the ‘totality of characteristics of an entity (product) that bear on its ability to satisfy stated and implied needs’ [16]. Therefore, quality requirements should be stated for every quality characteristic based on a quality model such as ISO/IEC 9126, and evaluated for each of them.

The relative importance of each characteristic differs according to the mission of the system in which the software product is used. For example, time behaviour (performance) of efficiency and security of functionality are very important for a banking system which processes an enormous number of transactions within a limited time frame. Another example is word processor and spreadsheet software, in which usability, interoperability and portability are important. These software products are widely used by a large number of people, most of them not professional computer engineers, for various purposes in various environments.

Though quality requirements are to be stated for every quality characteristic, state-of-the-art requirement definition technique supports only functionality and does not support most of other characteristics. Quality characteristics and their sub-characteristics in ISO/IEC 9126 are useful as a checklist for a quality requirement statement.

As a result of requirement analysis, requirement specification does not always state all requirements. If it does, it is rather a rare case. A product which incorporates all stated needs as requirement is not necessarily a good product. Sometimes it is necessary to cut some minor needs to make the product consistent. On the other hand, requirement analysts should try to take implied needs into requirement specification using logical reasoning, heuristics or other design strategy.

(3) Defining quality design goals (development stage)

If a developer develops a software product without an explicit goal, a quality product cannot be expected. It is impossible, or at best inefficient, to try to achieve good quality only by means of testing.

Quality requirement should be stated from the users’ view. However, it is suggested here that design goals are stated from a developer’s view, because how to achieve required quality is the developer’s responsibility. A developer’s view focuses on internal characteristics or internal attributes of software, such as modularity and traceability.

Design goals should be stated as quantitatively as possible by using metrics. The selection and use of metrics depend on the relative importance of each internal characteristic.

By setting design goals for each internal characteristic and attribute, not only do developers have a clear guide, but the guides for design review and code inspection also become clear.

(4) Preparing metrics and measurement (evaluation planning stage)

Some attributes need to be measured in advance, because some measurements are difficult if no record exists. For example, a metric may be the number of discrepancies found during design review divided by the number of reviewed specification pages. In this case these data items should be counted and recorded in advance. Therefore, the selection of metrics and the associated measurement should be planned at this stage. This is especially important for basic data elements.

(5) Evaluating internal quality (development stage)

Formal design review is a popular method for verifying that the designed specification is correct. Though the names may vary, there are two major design reviews, i.e. the preliminary design review and the critical design review. Guidelines, checklists or other tools are vitally important for successful reviews. Design reviews without any of these tools cannot be expected to have effective results. In order to improve the total quality of the specification, it is necessary to plan a review, to measure, and to summarize the results for each internal characteristic based on a checklist or other tools. Thus, it should be verified by measuring the results that the design goals are properly fulfilled.

(6) Evaluating external quality (testing stage)

At this stage, the program product is ready for execution and can be evaluated by running it, especially in the system testing stage. Testing means executing a program in order to find defects. Defects may be faults or other unsatisfactory attributes of a product. However, in many cases, testing is planned and carried out just to find faults, and does not cover all quality characteristics.

In order to confirm that the product meets all quality requirements, testing should be carefully planned, with test cases designed to cover all quality requirements. This stage is extremely important for assuring quality.

(7) Evaluating ‘in use quality’ (use stage)

External quality characteristics are validated against the quality requirements. However, since, for the various reasons stated before, the quality requirements cannot be expected to equal the needs, a user may become aware of some discrepancies between the user’s needs and the characteristics or attributes of the product during its actual usage.

This is inevitable. Therefore, it is important to follow up these problems, to find a mechanism that supports this follow-up, and to feed the results back to the next generation development cycle.

This U-shape model should be implemented in a specific software life-cycle by customizing it properly. For example, if prototyping is used for the user interface, the usability evaluation of stage (6) should be implemented at an earlier stage of the life-cycle.
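The review metric mentioned for the evaluation planning stage, the number of discrepancies found during design review divided by the number of reviewed specification pages, can be sketched as follows. The record structure, the sample figures and the design goal value are hypothetical; they only show why both basic data elements have to be recorded in advance and how a quantitative design goal can be checked.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    """Basic data elements recorded during one design review."""
    discrepancies_found: int
    pages_reviewed: int


def discrepancies_per_page(records):
    """Discrepancies found divided by specification pages reviewed."""
    total_pages = sum(r.pages_reviewed for r in records)
    if total_pages == 0:
        raise ValueError("no reviewed pages recorded; the metric is undefined")
    total_found = sum(r.discrepancies_found for r in records)
    return total_found / total_pages


# Illustrative figures only, not data from the paper.
records = [ReviewRecord(6, 40), ReviewRecord(3, 20)]
density = discrepancies_per_page(records)

# Hypothetical design goal: at most 0.2 discrepancies per reviewed page.
DESIGN_GOAL = 0.2
goal_met = density <= DESIGN_GOAL
print(f"{density:.2f} discrepancies per page, goal met: {goal_met}")
```

If the page count had not been recorded during the reviews, the metric could not be computed afterwards, which is the point made for basic data elements.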

7.2. Quality evaluation process

ISO/IEC 9126 contains an evaluation process model, shown in Fig. 7. This model will be modified and transferred to the new series, in ISO/IEC 14598-1 General Overview.

Fig. 7. ISO/IEC 9126 evaluation process model (diagram not reproduced; one surviving label reads ‘Quality requirement definition’).

Taking this discussion into consideration, the outline of each process is as follows.

(1) Evaluation requirement analysis

Quality, when evaluated from an external view, is represented by the degree to which the product satisfies stated needs, so quality requirements should be stated clearly for every quality characteristic. Evaluation starts with studying the quality requirements to see whether they are stated exhaustively and clearly. Otherwise, no clear goal of evaluation can be identified.

(2) Planning and implementation

Evaluation planning and preparation are done based on the evaluation goals identified in the previous process. Preparation includes identifying the target configuration, such as documentation and programs, and its attributes for measurement; planning; selecting metrics; establishing measurement and data handling procedures; and defining rules for rating and assessment.

(3) Measurement and rating

The measurement process can be refined into a primary measurement process and a secondary measurement process. The primary measurement process obtains original data by applying metrics to the attributes of the target product. It may be a random sampling process or an exhaustive measurement process, an automated process or a manual process using a checklist, and an objective measurement or a subjective judgement. The secondary process generates the required measure (value) from a set of original data based on a statistical method or other defined functions. ISO/IEC 9126 suggests that the measurement process is followed by the rating process.

(4) Total assessment

In this process, the measures obtained are analysed, summarized or transformed based on a predefined procedure. Judgements and interpretations of the measures are made based on assessment criteria. Then a final report is prepared.
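The chain from primary measurement through secondary measurement and rating to total assessment can be sketched as follows. This is an illustration under stated assumptions, not the procedure defined in ISO/IEC 9126: the sample data, the rating level names, the threshold values and the acceptance rule are all hypothetical.

```python
from statistics import mean

# Primary measurement: original data obtained by applying a metric to the
# target product. Here: sampled response times in seconds (made-up values).
response_times = [0.8, 1.2, 1.5, 1.3]

# Secondary measurement: derive the required measure from the original data
# with a statistical function.
mean_response = mean(response_times)


def rate(value, thresholds):
    """Map a measure to a rating level; lower values are better.

    `thresholds` is a list of (upper_bound, level) pairs in ascending order.
    Both the bounds and the level names are assumptions of this sketch.
    """
    for upper_bound, level in thresholds:
        if value <= upper_bound:
            return level
    return "unacceptable"


# Rating: hypothetical rating rules for time behaviour.
TIME_BEHAVIOUR_LEVELS = [(0.5, "excellent"), (1.0, "good"), (2.0, "fair")]
rating = rate(mean_response, TIME_BEHAVIOUR_LEVELS)

# Total assessment: apply a predefined criterion to the collected ratings.
ratings = {"time behaviour": rating}
ACCEPTED_LEVELS = {"excellent", "good"}
accepted = all(level in ACCEPTED_LEVELS for level in ratings.values())
print(mean_response, rating, accepted)
```

Keeping the rating rules and the acceptance criterion as data, separate from the measurement code, mirrors the requirement that rules for rating and assessment be defined during planning, before any measurement is made.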

7.3. Quality evaluation support process

Support functions, both managerial and technical, are important for effective evaluation. The questionnaire survey on software management by the author and D Mole showed that many European and Japanese managers consider support functions, especially software engineering support, quality assurance support and education support, important for successful software management [17]. Support functions include such processes as:

technology development process including metrics, methodologies and tools; technology transfer process which includes education and direct support; technology assessment process both before and after evaluation; technology and data (experience) management; standardization process; and evaluation support management.

Fig. 8. Supporting process and evaluation process.

An outline of the support process and its relationship to the evaluation process is shown in Fig. 8.

8. Conclusion

Recently, software process assessment and product evaluation have drawn the attention of many people in the field of software quality. As there are so many opinions and contributions, it is difficult to survey all the technologies. In this paper, the author has tried to clarify various concepts and technologies for quality evaluation by means of the static system model, the U-shape model and the evaluation and support process models. These models, I believe, can be used as a framework and can define the position of most current technologies. By using these models it is also possible to identify what is missing.

The work in SC7/WG6 and INSTAC is still going on. The author hopes that this paper can contribute to the understanding of these activities and provide an opportunity for information exchange in the field.

Acknowledgements

The author is grateful to Dr Nigel Bevan, Mr Andrew Chruscicki and Dr Shigeru Nishiyama, who contributed as project editors of 14598-1 General overview, to the other editors of the 9126 and 14598 series, and to the members of SC7/WG6 for their useful suggestions. Thanks are also due to the members of INSTAC/STD/WGS and the national SC7/WG6 members.


References

[1] ISO/IEC 9126, Information technology - Software product evaluation - Quality characteristics and guides for their use (1991).

[2] M Azuma (ed.), Software Quality Evaluation Guide Book, JSA (1994) (Japanese edition).

[3] ISO/IEC 12119, Information technology - Software packages - Quality requirement and testing (1994).

[4] ISO/IEC 14102, Information technology - Guidelines for evaluation and selection of CASE tools.

[5] M A Cusumano, Japan’s Software Factories, Oxford (1991).

[6] W S Humphrey, Managing the Software Process, Addison-Wesley (1989).

[7] B W Boehm, J R Brown and M Lipow, Quantitative evaluation of software quality, Proc. 2nd Int. Conf. on Soft. Eng. (1976) pp 592-605.

[8] J A McCall, P K Richards and G F Walters, Factors in software quality, Rome Air Develop. Center Report, TR-77-369 (1977).

[9] T Sunazuka and M Azuma, Software quality measurement and assessment technology, Proc. 8th ICSE (1985) pp 142-148.

[10] E F Norman, Software Metrics - a Rigorous Approach, Chapman & Hall (1991).

[11] T J McCabe, A software complexity measure, IEEE Trans. on Soft. Eng. Vol 2 No 6 (1976).

[12] G Myers, Composite Structured Design, Van Nostrand (1978).

[13] G M Weinberg and E L Schulman, Goals and performance in computer programming, Human Factors Vol 16 No 1 (1974).

[14] V R Basili and H D Rombach, The TAME project: towards improvement-oriented software environments, IEEE Trans. on Soft. Eng. Vol 14 No 6 (1988) pp 758-773.

[15] J D Arthur and R E Nance, Developing an automated procedure for evaluating software development methodologies and associated products, A final Report. Technical Report SRC-87-007 Systems Research Center and Virginia Tech (1987).

[16] ISO 8402, Quality Vocabulary (1994).

[17] M Azuma and D Mole, Software management practice and metrics: EC and Japan - some results of questionnaire survey, J. of Systems & Software Vol 26 No 5, Elsevier (June 1994).
