
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-11, NO. 12, DECEMBER 1985

Toward High Confidence Software

JOSEPH P. CAVANO

Manuscript received February 28, 1985; revised September 30, 1985. The author is with Rome Air Development Center, Griffiss Air Force Base, NY 13441.

Abstract-Moving toward high confidence software that can meet ever-increasing demands for critical DOD applications will require planning, specifying, selecting, and managing the necessary development and testing activities that will ensure the success of the software project. In order to trust the decisions being made, there must be evidence (i.e., an information base of data and facts) that techniques and tools being chosen for application on critical projects will perform as expected. Today, these expectations are mostly intuitive; there is little hard evidence available to guide acquisition managers and software developers in making necessary decisions.

This paper brings into perspective how software measurements can affect the confidence and trust that the DOD can place in its software. The paper describes a management approach to achieving high confidence software, highlighting software reliability as a key factor in that confidence. The concepts are presented from a DOD perspective but are applicable to any organization interested in managing software reliability. A software reliability measurement methodology is described for 1) specifying software reliability goals and the processes needed to help achieve those goals, 2) predicting and tracking progress during development (leading to corrective action to improve the software), 3) estimating reliability based on testing effort and processes, and 4) assessing achieved software reliability during operational use. The concept for a workable set of software reliability measures is presented to support this methodology. Typical questions that face the DOD are posed, and it is shown how the methodology, once implemented, can help answer those questions.

Index Terms-DOD applications, high confidence software, software reliability measurement methodology.

INTRODUCTION

A DOD acquisition manager tasked with developing mission-critical and safety-intensive software for projects such as the Strategic Defense Initiative is faced with obtaining high confidence software, i.e., by trusting the management and engineering decisions being made throughout the acquisition and software development process and by believing in its reliability.

Recognizing the problem is only the first step; knowing what to do to achieve high confidence software still remains elusive. This is especially true in the area of software reliability, one of the prime factors affecting confidence (or lack of it) in software systems. Current DOD development programs are unable to achieve satisfactory software reliability in a consistent fashion because of the lack of understanding of what conditions truly affect reliability. This situation is compounded when one considers that software reliability requirements for future DOD systems will be much higher as functional demands on the software become more complex, as criticality of the software increases, and as system components become more distributed. In a recent study of DOD long range planning documents, it was found that 80 percent of current and future DOD programs in the 1985-1989 time frame contain a significant software development component, and by the end of the decade, 85 percent of embedded computer resources will be allocated to software [1]. Although many of the DOD development programs surveyed in the above study were able to specify system reliability (usually in terms of mean time between failure), few were able to specify software reliability. Yet, software reliability becomes an obvious confidence issue that an acquisition manager must address when the system depends heavily on the software for mission success.

The challenge facing the acquisition manager is to determine how to improve the confidence in the software DOD must deploy in its major programs. Among the questions that typically remain unanswered because of lack of quantitative information are the following:

1) How does the acquisition manager go about establishing levels of confidence and meaningful requirements for software reliability?

2) What tradeoffs need to be considered among reliability, cost, schedule, and performance?

3) What is the current state-of-practice for software reliability?

4) What development techniques should be required to improve confidence on the project?

5) How can future software reliability be predicted and evaluated at key milestones during the development life cycle (e.g., software specification review, preliminary design review, critical design review, code reviews, and testing reviews)?

6) What types of corrective action can be taken as a result of the above evaluations to increase DOD confidence in its software?

7) How much testing should be performed, and which testing techniques should be required to achieve specified confidence and reliability levels?

8) How can the user assess how well reliability goals were met during deployment of the software?

Until these questions can be answered, it will be difficult to have high confidence in software on critical applications. The Rome Air Development Center (RADC) and the Software Technology for Adaptable, Reliable Systems (STARS) Measurement Programs are seeking solutions to these problems by developing and applying software measurements. To pursue high confidence software, the cooperation of the DOD, industry, and the research community will be required. We need to better understand the effects of software acquisition and software engineering practices on the software that is produced. We need better construction, not just better inspection. We need to discover the causes of software problems so we can prevent them, not just treat their effects. We need to identify which conditions contribute to high confidence or to low confidence, so that we can encourage the former and avoid the latter. We need to evaluate software products earlier in the software development life cycle and to instigate corrective actions as needed to ensure that critical DOD weapon systems achieve required goals. Before substantial improvement can be realized in these areas, valid quantitative information must be available. The lack of valid software reliability models and data has long been recognized as hampering progress in solving these problems [2].

A FRAMEWORK FOR CHARACTERIZING SOFTWARE PRODUCTS AND PROCESSES

Currently there are no widely accepted definitions or measures for describing software across its life cycle. Procedures for data collection vary from project to project, making it difficult to make comparisons among software projects and to benefit from lessons learned on past projects. To overcome these obstacles, a framework is needed to characterize software products and processes across many dimensions. The STARS Measurement Program has adopted the framework shown in Fig. 1 for characterizing Mission Critical Computer Resources (MCCR) in terms of 8 dimensions: 1) function, 2) resources, 3) schedule, 4) performance, 5) quality, 6) personnel, 7) methodology, and 8) environment.

Fig. 1. A framework for characterizing a software product (the eight dimensions with representative attributes such as costs and staffing under resources, life cycle phases under schedule, quality factors such as reliability and maintainability, run time and memory under performance, developer and user roles under personnel, acquisition and development processes under methodology, and the development, maintenance, and operational environments).

Each dimension plays a role in influencing high confidence software. In the past, software studies have typically focused on only one dimension or on only one stage of the software development life cycle. One reason the results of those studies have not been widely accepted is that they take too narrow a view for such a broad area. The RADC and STARS programs seek to integrate all of these dimensions, to cover the complete life cycle, and to determine what role each dimension plays in affecting software reliability and confidence.

As a first step in integrating the above dimensions, an interim set of data collection forms has been produced, consistent with DOD-STD-2167, describing resource expenditure, software characteristics, software testing, software problem/change, software environment, and software evaluation [3]. The next step will be to produce a measurement data collection and analysis guidebook, enhancing and streamlining data collection procedures and suggesting analyses that could be performed on the data. The guidebook will be applied initially on critical DOD projects to evaluate the software development processes and products against quantitative goals required by the application. The resulting data will also be placed in a repository or measurement database where it will be available for evaluating the measures and guiding future projects. A goals-directed approach is being followed for data collection and analysis [4], which consists of 6 steps: 1) establish the goals of the data collection; 2) develop a list of questions of interest; 3) establish data categories; 4) design and test data collection forms; 5) collect and validate data; and 6) analyze data.

Within this framework several types of software measurements are needed:

1) Descriptive measures to provide general information characterizing a software development by type of application or mission, software category, program size, types of faults experienced and their mission impact, extent of software reuse, software engineering processes employed, and capabilities of development, maintenance, and operational environments. Descriptive measures will enable meaningful comparisons to be made across projects by ensuring that projects are similar in size, scope, application, and schedule.

2) Prescriptive measures to instigate a course of corrective action to remedy a potential problem, perhaps in the form of checklists. An example might be checking first for the existence of a traceability matrix for mapping modules to requirements. If a module is discovered which cannot be matched to a requirement, then that discrepancy should be resolved by eliminating the module or updating the matrix (a sketch of such a check appears after this list).

3) Predictive measures applied during development to characterize expected software reliability drivers such as structure or complexity and then use that information to give some indication of future reliability.

4) Appraisal measures to directly characterize what has been achieved on the final product based on operational use.

5) Historical measures to characterize past software development projects. Stored in a data repository, historical measures will facilitate comparisons from a current project to past but similar projects to help track progress or to establish trends or thresholds.
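To make the prescriptive idea in item 2 above concrete, the following Python sketch illustrates one such checklist step. The data structures and names are hypothetical and are not taken from any RADC or STARS tool; the sketch simply flags modules that cannot be matched to any requirement in a traceability matrix so that each discrepancy can be resolved by eliminating the module or updating the matrix.

# Illustrative sketch of a prescriptive traceability check (hypothetical data
# layout): flag modules in the build that do not trace to any requirement.

def untraced_modules(build_modules, traceability_matrix):
    """Return modules that appear in the build but in no requirement mapping.

    build_modules       -- iterable of module names present in the software
    traceability_matrix -- dict mapping requirement id -> list of module names
    """
    traced = {m for mapped in traceability_matrix.values() for m in mapped}
    return sorted(set(build_modules) - traced)

if __name__ == "__main__":
    matrix = {
        "REQ-001": ["nav_filter", "nav_display"],
        "REQ-002": ["msg_router"],
    }
    modules = ["nav_filter", "nav_display", "msg_router", "diag_dump"]
    for name in untraced_modules(modules, matrix):
        # Each discrepancy is resolved by eliminating the module or
        # updating the traceability matrix.
        print(f"corrective action needed: module '{name}' maps to no requirement")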

A SOFTWARE RELIABILITY MEASUREMENT METHODOLOGY

The application of this framework for software measurement will be crucial for managing high confidence software. This paper will discuss a methodology highlighting the quality dimension of this framework (and reliability in particular). It is intended that the methodology will be used as a component within the entire framework. Over the past eight years RADC has developed a hierarchical measurement model for predicting software quality during the software development life cycle [5]. This model decomposes software quality into 13 quality factors: reliability, maintainability, testability, correctness, flexibility, reusability, user-friendliness, efficiency, portability, security, survivability, interoperability, and extensibility. Although high confidence software, in its broadest sense, encompasses the first five factors in the RADC model, this paper focuses on the effect of software reliability.

Recently, RADC has begun enhancing the reliability component of this model and is developing a Software Reliability Measurement Methodology (RELMM) that can be used for managing and improving software reliability [6]. RELMM is aimed at supporting all organizational entities typically involved in a large DOD system acquisition (e.g., in the Air Force this would include the acquisition manager, the end user, the developer, the IV&V agent, the test agent, and the life cycle support agent). RELMM incorporates four related activities: reliability specification, prediction and tracking, estimation, and final reliability assessment (Fig. 2).

Fig. 2. Framework for software reliability (specification, prediction and tracking, estimation, and assessment activities mapped across the development life cycle: the reliability goal is a quantitative statement of the goal, the reliability figure-of-merit number is a quantitative statement about future reliability as a function of metrics, the reliability estimation number is derived from test data or data collected in the operational environment, and the reliability assessment is based on actual performance during operation).

Producing numbers to quantitatively represent software reliability across these activities is an integral part of RELMM and will be driven by tracking and documenting software faults (i.e., manifestations of errors) and software failures (i.e., the inability of a system to perform a required function within specified limits) over the complete life cycle. The concept of software reliability is too complicated to be adequately described by a single number, just as weather cannot be characterized by the single number representing temperature. Until agreement can be logically reached, multiple representations of software reliability (e.g., fault density and failure rate) will be used. An analogy from physics for describing light illustrates a multiple representation; light is treated in some situations as a particle, while at other times it is considered as a wave. As software reliability matures, which representation is best in which situations will be determined, or a better representation will be developed. Since an interpretation of failures over time cannot be made until after unit testing (much too late in the development cycle to guide the software development in terms of meeting software reliability goals), fault density is more suitable early in the development cycle to characterize the software product. Once testing has begun, a failure rate can be computed but must be used with discretion. Using the failure rate to project mean-time-between-failure (a tempting analogy from hardware reliability) can be misleading because software faults are really uncovered by an external event triggering a path/data value which executes the code such that a software fault that had previously been present is now revealed, and this is not truly dependent on time. Failures are also uncovered when portions of the code which have never been exercised before (e.g., error recovery routines) are finally executed, and this occurrence is also not dependent on time in the same sense that hardware failures can be associated with time.

Failure rate was chosen as the basic unit for the reliability numbers used in RELMM because it serves as a common thread throughout system acquisition and deployment. It furnishes a quantitative basis for characterizing software reliability. An equally important issue is the selection of variables which will be measured, combined, and then characterized by failure rate. The failure rate serves as a useful device for describing reliability so that comparisons among projects can be made. The measurements will provide the basis for learning which variables affect reliability and, thereby, the confidence placed on the software.
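For illustration, the two representations discussed above can be computed from routinely tracked fault and failure records. The short Python sketch below is not part of RELMM; the record formats and the units chosen (faults per executable source line, failures per hour of execution time) are assumptions made only for the example.

# Illustrative computation of the two reliability representations named above.
# Units and record formats are assumptions for this sketch, not RELMM rules.

def fault_density(documented_faults, executable_lines):
    """Faults per executable source line (often reported as faults per KLOC)."""
    return documented_faults / executable_lines

def failure_rate(failure_count, execution_hours):
    """Failures per hour of software execution time, once testing has begun."""
    return failure_count / execution_hours

if __name__ == "__main__":
    # Early in development: 42 documented faults against 60 000 executable lines.
    fd = fault_density(42, 60_000)
    print(f"fault density: {fd:.5f} faults/line ({fd * 1000:.2f} faults/KLOC)")
    # During testing: 7 failures observed over 350 hours of execution time.
    print(f"failure rate: {failure_rate(7, 350):.4f} failures/hour")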

Reliability Specification

Reliability specification during the requirements definition phase involves establishing the goals for software reliability required for the missions of the given application. Both the user or customer of the project and the acquisition manager should be involved in producing the specification. The goal will be expressed in measurable terms based on what has actually been achieved on past projects, so that its achievement on actual missions can be readily assessed. At the same time, the necessary software development techniques and software acquisition management practices should be specified to ensure that the goals will be achieved. Early specification of goals and development techniques will make it clear to the developer what is desired and will elevate reliability goals to equal footing with cost, schedule, and performance in the acquisition process. This will serve as a first step toward high confidence software.

A number of sources provide general guidance for specifying goals and selecting the software engineering techniques needed to achieve them [5], [7]-[9]. RADC has developed a quality specification guidebook, providing a set of procedures for enabling a software acquisition manager to identify and specify software quality factor goals, including reliability [5]. The guidebook describes steps for selecting quality factors of importance and assigning initial quality goals based on command and control applications.

Guidance is also available concerning the selection of software engineering techniques or tools based on their ability to improve software quality (e.g., minimize the introduction of software errors, maximize the detection of errors already introduced, and maximize the recovery from failures experienced during operational use). A ranking of software methodologies is given in [9] in terms of their expected effect on software reliability (from "most valuable" to "least valuable"). Another study [8] characterizes processes as either error avoidance or error detection techniques. It presents the results of a survey showing the influence of each technique as it relates to reliability.

RADC has also produced a software test guidebook for selecting appropriate testing techniques, depending on project and software characteristics and the testing confidence level required for the application [7]. The guidebook uses project, software, and test considerations in establishing a testing confidence level. Software testing techniques are selected by matching their capabilities with project considerations based on software categories (e.g., process control, sensor and signal processing, database management, message processing, etc.), test phases and their objectives, and types of software errors (i.e., errors that have occurred in similar projects and are likely to occur in the present project).

If DOD cannot establish the proper goals for a project initially, then it will be impossible to manage the projects to ensure that the proper goals are achieved. Many management decisions are currently based on intuition and unsupported claims, but the foundation for these decisions will be improved by more precise measurements.

Prediction and Tracking

Once the reliability goals are established, the software project would be evaluated at key milestones to ensure that the goal will be achieved. During development, software measures which characterize the software processes employed and their associated products can be used as indicators to predict future reliability and to track progress toward established reliability goals. The assumption behind this concept is that there is a causal relationship between the way software is constructed and its final reliability. Although much effort has gone into developing such measures, the form of these "cause and effect" relationships has not been firmly established. Preliminary evidence suggests that this is possible [10]. If this potential can be realized, the measures will then be a powerful tool in guiding the selection of techniques for developing software and in predicting software reliability in line with the development so that improvements can be made.

Software products during development will also be evaluated at key milestones to identify potential problem areas that should be corrected before development continues, and to indicate whether the required reliability levels will be achieved. For example, a high software complexity measure might indicate that software reliability goals will not be met. Corrective action to rectify this might take the form of redesigning or recoding those units with the highest measured levels of complexity. If that action was not feasible, an alternate method might be to identify those units tagged by higher complexity for additional or more intensive testing. Corrective actions employed and their effect would be tracked to ensure progress is made toward their goal.

TABLE I
PREDICTIVE AND ESTIMATION MEASURES

Predictive reliability figure-of-merit number (RFOM)
  Application type                          A
  Development environment                   D
  Software characteristics                  S
    Requirements and design representation  S1
      Anomaly management                    SA
      Traceability                          ST
      Quality review results                SQ
    Software implementation                 S2
      Language type                         SL
      Program size                          SS
      Modularity                            SM
      Extent of reuse                       SU
      Complexity                            SX
      Standards review results              SR
  RFOM = A * D * S, where S = S1 * S2,
    S1 = SA * ST * SQ, and S2 = SL * SS * SM * SU * SX * SR

Reliability estimation number (REN)
  Failure rate during testing               F
  Test environment                          T
    Test effort                             TE
    Test methodology                        TM
    Test coverage                           TC
  Operating environment                     E
    Workload                                EW
    Input variability                       EV
  REN = F * T during testing, where T = TE * TM * TC
  REN = F * E during OT&E, where E = EW * EV

To support the prediction and tracking activity, a measure is needed as part of the evaluation process for ensuring that the software project is on target with its goals. The predictive reliability figure-of-merit (RFOM) number, a prediction of future reliability consistent with the established goals, will be based on a number of indicators believed to directly influence software reliability. These indicators will be represented as measures pertaining to application type, development environment, and software characteristics (including language type, program size, modularity, extent of reuse, complexity, anomaly management, and results from quality and standards reviews), as shown in Table I [10].

Each of the identified measures will be collected on a project, and their product will yield the RFOM, expressed as a fault density. A fault density will be associated with each application type by examining known faults associated with deployed systems and taking the average for those that fall within each application type. To arrive at the RFOM number, the fault density will then be adjusted up or down according to the values obtained for the other predictive measures. For example, initial work has established a fault density of 0.001 for a class of interactive military systems. This will be used as the starting point for the expected fault density of a new project of that application type. Another entry under software characteristics will be language type. The expected fault density will be adjusted up if an assembly language is used or held constant if a higher order language is used. In this case a multiplier of 1.4 is planned for assembly language programs and 1.0 for higher order languages (the multiplier value takes into consideration the expansion ratio of higher order languages to comparable assembly language statements) [10]. A significant dependency of fault density on language has been established in [11].
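A minimal Python sketch of this calculation is given below. The baseline fault density of 0.001 for a class of interactive military systems and the assembly-language multiplier of 1.4 (1.0 for a higher order language) come from the discussion above; the remaining multiplier values, the factor names, and the function itself are illustrative assumptions following the product form RFOM = A * D * S of Table I, not the calibrated RELMM procedure.

# Sketch of the predictive RFOM calculation from Table I: RFOM = A * D * S,
# with S = S1 * S2. The baseline (A = 0.001 for interactive military systems)
# and the assembly-language multiplier (1.4) are taken from the text; every
# other multiplier value here is an illustrative assumption.

def rfom(application_baseline, development_env, s1_factors, s2_factors):
    """Expected fault density, adjusted by the predictive multipliers."""
    s1 = 1.0
    for value in s1_factors.values():
        s1 *= value
    s2 = 1.0
    for value in s2_factors.values():
        s2 *= value
    return application_baseline * development_env * s1 * s2

if __name__ == "__main__":
    predicted = rfom(
        application_baseline=0.001,            # A: interactive military systems
        development_env=1.1,                   # D: assumed penalty for an immature environment
        s1_factors={"anomaly_mgmt": 0.9,       # SA
                    "traceability": 1.0,       # ST
                    "quality_review": 1.05},   # SQ
        s2_factors={"language": 1.4,           # SL: assembly language (1.0 for HOL)
                    "size": 1.0, "modularity": 0.95, "reuse": 0.9,
                    "complexity": 1.2, "standards_review": 1.0},
    )
    print(f"predicted fault density (RFOM): {predicted:.6f} faults per line")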

Estimation

More precise estimation of software reliability is needed during the test and evaluation phase to evaluate whether the system should be accepted and to determine whether it will perform as expected in the operational environment. A reliability estimation based on data available from this phase is still an "estimate" because it is not known how closely or thoroughly the software is being exercised in the testing process to relate it to future use by the operational user. While prediction deals primarily with software in its "softest" form (e.g., specifications, designs, coded modules) and the processes employed in developing the software, estimation deals with software that can be executed to perform specific functions. A reliability estimation based on software failures observed during operational test and evaluation [9] would be a more direct indication of the software's future reliability than the predictive measures just discussed. Naturally, the estimation activity will be strongly influenced by the types and amount of testing being performed, and this effect must be taken into consideration.

To support the estimation activity, a reliability estimation number (REN) will be based on failures exposed during the testing process, as shown in Table I [10]. The basic unit for expressing REN will be failure rate, obtained by observing failures over software execution time. It will be adjusted by variables which influence the testing process, represented by measures of the test environment (including test effort, test methodology, and test coverage) or the operating environment during operational test and evaluation (OT&E) (including workload and input variability). The observed failure rate will be adjusted up or down depending on the values for the other estimation variables characterizing the test process.
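The adjustment described above can be sketched directly from the Table I form: REN = F * T during testing, with T = TE * TM * TC, and REN = F * E during OT&E, with E = EW * EV. In the Python sketch below the observed failure rate is the only real input; every multiplier value is an illustrative assumption rather than a calibrated RELMM constant.

# Sketch of the reliability estimation number (REN) from Table I.
# During testing:  REN = F * T, where T = TE * TM * TC
# During OT&E:     REN = F * E, where E = EW * EV
# The multiplier values shown are illustrative assumptions only.

def ren_during_testing(observed_failure_rate, test_effort, test_methodology, test_coverage):
    return observed_failure_rate * test_effort * test_methodology * test_coverage

def ren_during_otae(observed_failure_rate, workload, input_variability):
    return observed_failure_rate * workload * input_variability

if __name__ == "__main__":
    f = 7 / 350.0   # 7 failures observed over 350 hours of execution time
    print(f"REN (testing): "
          f"{ren_during_testing(f, test_effort=1.1, test_methodology=0.9, test_coverage=1.2):.4f} failures/hour")
    print(f"REN (OT&E):    "
          f"{ren_during_otae(f, workload=1.3, input_variability=1.1):.4f} failures/hour")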

Final Reliability Assessment

Final reliability assessment is the activity of determining achieved reliability on deployed software systems using actual field data, not test data. The objective here is to determine in an objective fashion how a software system has been performing its operational missions. Instead of an indirect measure of reliability as indicated by the predictive metrics, an assessment of reliability will involve direct observation of software failures experienced by the system in performing its mission. These assessments will help demonstrate whether the reliability goals were met.

Assessments will facilitate quantifying goals for reliability specifications on similar applications in the future. The assessments will be defined in terms of failure rate for comparisons to the predicted and estimated software reliability and will be used in the validation of the measures applied during development. Assessing the reliability on delivered software systems will also promote "lessons learned" and identify weaknesses and deficiencies for future research considerations. It will also improve understanding of what causes software reliability problems.

A reliability assessment benchmark (RAB) will be used to assess actual software reliability performance during mission use in the field. It will be expressed as the ratio of failures per million program executions or execution time during a given reporting period for software performing its operational missions [12]. Essentially, this is an application of the Nelson model [13] to the operational environment [12]. The RAB will have to be recomputed after any significant software release. Other descriptive measures will also be needed as qualifying data to help interpret the benchmark values. Input domain variability is covered as the operational usage profile experienced during the mission.
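A sketch of the benchmark computation follows. The ratio of failures to millions of program executions over a reporting period follows the description above; the function name, record layout, and example values are assumptions made for illustration.

# Illustrative sketch of the reliability assessment benchmark (RAB): failures
# per million program executions during one reporting period of field use.

def rab(failures, program_executions):
    """Failures per million program executions for a given reporting period."""
    return failures / (program_executions / 1_000_000)

if __name__ == "__main__":
    # Example reporting period: 3 failures over 12.5 million mission executions.
    print(f"RAB: {rab(3, 12_500_000):.3f} failures per million executions")
    # The RAB would be recomputed after any significant software release.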

The software reliability measurement numbers (RFOM, REN, RAB) have been chosen on the basis of application of their composite measures to several system developments and statistical analyses of the measured values with the observed failure rate on those systems [10]. Further validation and calibration of the measures are underway. Criteria for evaluating software models in terms of validity, capability, quality of assumptions, applicability, and simplicity have been proposed [14] which could be adapted and followed in evaluating these types of measures. As the numbers mature, experience might indicate that some measures are irrelevant and should be dropped or changed. However, in an area which has been difficult to express quantitatively, these candidate numbers represent a needed starting point for providing evidence of the effect of software acquisition and software engineering decisions on the reliability and confidence of DOD software.

A transformation function from fault density to failure rate has been proposed to provide a relationship between the various measurement numbers [10]. Thus, the predictive RFOM number will be comparable to the REN and, more importantly, both will be comparable to the goal established at the project outset and to the RAB obtained during deployment.

APPLICATION OF THE METHODOLOGY

In summary, let us review the questions posed in the introduction and show how the application of RELMM, its associated measures of software reliability, and a repository of software reliability information will help provide at least some of the answers. These answers will obviously not come overnight. It will require extensive evaluation and data analysis to provide the keys to unlock the door leading to high confidence software.

1) Establishing Confidence and Reliability Levels: Tracking deployed projects in terms of their RAB and storing this information in a repository will provide a quantitative basis for establishing goals for new projects. The repository will be searched for projects of the same application and scope and their RAB reviewed. This would provide the acquisition manager and user with a range of achieved reliability as measured by the benchmark and an indication of whether that was considered satisfactory by the user of the system. The acquisition manager and user would then set software reliability goals based on the unique characteristics of the new application, using the range of the benchmarks as a quantitative yardstick. The confidence in reaching this goal will be determined by where the goal fits within this historical range.

2) Reliability, Cost, Schedule, and Performance Tradeoffs: The software repository will include information for each of these characteristics, and once that is available, future analysis can begin to develop the relationships among them. Acquisition managers and contract developers must begin to give equal weight to the reliability factor. If a contract begins to be squeezed by cost or schedule, software testing often suffers as a result. The repository will allow tracking planned versus actual schedules for testing, and the effect of schedule compression on testing could also be revealed. Other relationships could be explored in a similar fashion.

3) State-of-Practice for Software Reliability: As in 1) above, a repository containing RFOM's, REN's, and RAB's would provide the foundation for characterizing reliability. Reliability trends could be produced by grouping the projects according to a number of categories (e.g., year the project was completed, development process or environment used, application, etc.) to identify both the state-of-practice of reliability and areas that need further research. Threshold values for these measures will also be established.

4) Development Process Needed to Improve Software Reliability: A recent survey [8] shows the percentage of errors that could be either avoided or detected depending on the development process used, based on the judgment of experienced software reliability personnel. By categorizing these processes more precisely, tracking their use in DOD projects, and collecting failure data, these survey results can be made more quantitative. This information would help guide acquisition managers in selecting the appropriate development techniques for their application. It will also be helpful during proposal evaluation in determining the effect of proposed techniques on software reliability.

5) Predicting Software Reliability: The use of the RFOM and REN for prediction and estimation at key milestones would provide an indication of future reliability during development and testing. These numbers will help determine progress toward goals early in the software development life cycle so that an acquisition manager will not be "surprised" with the results of operational test and evaluation.

6) Corrective Action Techniques: Software measurements will help control the software development process by complementing the quality assurance function. An example will help illustrate how a corrective action process might work. Suppose a software reliability goal has been established as part of the reliability specification. Suppose also that a set of processes related to that goal was specified: conducting design walk-throughs (but not code inspections); using the software measurement technology; and testing at a specified level. If the predicted reliability level at a key milestone, for example the critical design review, does not meet the required goal, a number of corrective actions should be pursued. First, an attempt could be made to reduce design complexity by reworking the design. Or a decision could be made to add fault detection techniques (i.e., code reviews) which were not originally planned, to improve reliability. To carry this example further, the reliability of the software could be estimated again during system test. If the estimate does not match the required goal, one course of action might be to conduct OT&E testing at a higher level of effort than originally planned. If the estimated reliability still has not reached the specified level during OT&E, the testing schedule could be lengthened until an acceptable level is reached.

7) Testing Requirements: The use of products such as the Software Test Guidebook [7] would assist in selecting testing techniques based on the level of testing confidence required for the software. A current rule-of-thumb suggests that 40 percent of the development effort should be devoted to testing. Consistent tracking of test effort would help determine how valid this rule-of-thumb is for planning test effort; i.e., how many projects truly spend 40 percent of their effort in testing? (A small sketch of such a repository check appears after this list.) Improved estimation of software reliability during the test phase will also help determine the answer to the "when to stop testing" question.

8) Reliability Assessment: Tracking software reliability in the field and producing the proposed RAB will provide quantitative evidence of software reliability on deployed software. In addition, this measure will help validate prediction and estimation reliability numbers. If that relationship proves sufficiently precise, those numbers, available earlier in the development cycle, could be used to help decide if the software should be accepted.
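The repository check mentioned in item 7 might look like the following Python sketch. The project records, field names, and the interpretation of the 40 percent rule-of-thumb are assumptions made for illustration only.

# Illustrative repository query for item 7: how many projects truly spend
# 40 percent of their development effort in testing? Record layout is assumed.

def testing_fraction(project):
    return project["test_hours"] / project["total_hours"]

def meets_rule_of_thumb(projects, threshold=0.40):
    return [p["name"] for p in projects if testing_fraction(p) >= threshold]

if __name__ == "__main__":
    repository = [
        {"name": "Project A", "test_hours": 22_000, "total_hours": 50_000},
        {"name": "Project B", "test_hours": 9_000,  "total_hours": 31_000},
        {"name": "Project C", "test_hours": 17_500, "total_hours": 42_000},
    ]
    passing = meets_rule_of_thumb(repository)
    print(f"{len(passing)} of {len(repository)} projects devote >= 40% of effort "
          f"to testing: {passing}")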

CONCLUSIONS

The current technology in software reliability based on past research efforts has not been, for the most part, accepted by reliability practitioners. On the one hand, measures of software reliability related to structural characteristics of the software provided predictions of the number of faults expected in a portion of the code. This had little relevance to reliability engineers because of their time orientation for hardware (e.g., failure rate or MTBF). On the other hand, models of software reliability using failure detection rates during testing provide more relevant data; but because of necessary (and sometimes weak) assumptions, the sensitivity to the testing approach, the lateness in application, and the failure to incorporate intrinsic software factors (including mission/input domain variability, functional complexity, application type, development parameters, etc.), these models also did not meet practitioners' needs.

The software reliability measurement methodology should help alleviate these concerns. It describes how software measures can be used to improve DOD confidence in its software. Software measures (RFOM and REN) characterizing the software development process and products will be related to failure rate during operational use by statistical regression. The RFOM will incorporate intrinsic functional complexity of the application, and the REN will incorporate test effort, test methodology, and test coverage. The RAB will incorporate data variability and environmental considerations as experienced during actual mission performance.

ACKNOWLEDGMENT

The author wishes to gratefully acknowledge the discussions and reviews of this paper from A. Vito and J. Palaimo, Rome Air Development Center; R. Iuorno, ITT Research Institute; J. McCall, Science Applications International Corporation; and R. Thibodeau, General Research Corporation.

REFERENCES

[1] S. Redwine et al., "DOD related software technology requirements, practices, and prospects for the future," IDA Paper P-1788, June 1984.
[2] M. Shooman, "Software reliability: A historical perspective," IEEE Trans. Rel., vol. R-33, Apr. 1984.
[3] "STARS interim software data collection forms," DACS Rep., Apr. 1985.
[4] V. Basili and D. Weiss, "A methodology for collecting valid software engineering data," IEEE Trans. Software Eng., vol. SE-10, Nov. 1984.
[5] T. Bowen, G. Wigle, and J. Tsai, "Specification of software quality attributes," RADC-TR-85-37, Oct. 1984.
[6] J. Cavano, "Software reliability measurement: Prediction, estimation, and assessment," J. Syst. Software, issue 4.3, 1984.
[7] E. Presson, "Software test guidebook," RADC-TR-84-53, Mar. 1984.
[8] E. Soistman and K. Ragsdale, "Impact of hardware/software faults on system reliability," RADC-TR-85-228, Apr. 1985.
[9] R. Glass, Software Reliability Guidebook. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[10] J. McCall, H. Hecht, et al., "Methodology for software and system reliability prediction, Phase II," RADC Interim Rep., Mar. 1985.
[11] H. Hecht and M. Hecht, "Trends in software reliability for digital flight control," NASA Ames Res. Center, Apr. 1983.
[12] R. Thibodeau and A. Hughes, "Software reliability benchmark," RADC Interim Rep., Oct. 1984.
[13] T. Thayer et al., "Software reliability study," RADC-TR-76-238, Aug. 1976.
[14] A. Iannino, J. Musa, K. Okumoto, and B. Littlewood, "Criteria for software reliability model comparisons," IEEE Trans. Software Eng., vol. SE-10, Nov. 1984.

Joseph P. Cavano received the B.S. degree in mathematics from Clarkson University, Potsdam, NY, in 1970, and the M.S. degree in industrial engineering operations research from Syracuse University, Syracuse, NY, in 1976.

He is a Computer Scientist at the Rome Air Development Center, Griffiss Air Force Base, NY. He is also Group Leader for the Software Quality Area in the Software Engineering Section and is the Chairman of the STARS Measurement Area Coordinating Team. He has been directing research in developing measurements for software quality, especially in the areas of software reliability and supportability, for the last nine years. He has published papers in the areas of software reliability, management information systems, and database management systems.
