A Many-State Markov Model for Computer Software Performance Parameters

66 IEEE TRANSACTIONS ON RELIABILITY, VOL. R-25, NO. 2, JUNE 1976


Martin L. Shooman, Member IEEE
Ashok K. Trivedi

Abstract: A many-state Markov model has been developed for the purpose of providing performance criteria for computer software. The model provides estimates and closed-form predictions of the availability and of the most probable number of errors that will have been corrected at a given time in the operation of a large software package. The model is based on constant rates for error occurrence $\lambda$ and error correction $\mu$. An interesting application case is when $\lambda$ and $\mu$ are functions of the state of debugging achieved. This case is discussed and solved numerically. Extensions and modifications of the basic model are briefly discussed.

Key Words: Software, Availability, Markov model.

Reader Aids:
Purpose: Advance state of the art
Special math needed for explanations: Probability and reliability
Special math needed for results: Reliability
Result useful to: Software and reliability analysts, software engineers

INTRODUCTION

This paper is a summary of [8].

Definition 1: A Software Bug - b

A bug is a portion of code which, under one or more input conditions, will result in the output of the system being different from its true output, as determined according to the system design specifications.

Definition 2: A Software Error - e

A software error (failure) is the event which occurs when the software system is subjected to an input condition such that, due to the presence in the code of one or more bugs, the resultant output will be different from its true value according to design specifications.

Definition 3: A Software Correction - c

A software correction (repair) is an attempt, after a software error e has been detected, to diagnose and then to remove the bug(s) judged to have caused the error. Note: A software correction is classified as successful (bug removed) or unsuccessful (bug remains, or is modified, or more bugs are added).

MODEL ASSUMPTIONS

1. The software system is large (say $10^5$ words of code).
2. The origin of operation time $t$ is the beginning of the integration and test phase.
3. At $t = 0$, the system contains an unknown number of unknown bugs.
4. All software errors are of a single category, and cause the system to go down.
5. At most one error occurs (is discovered) at a time.
6. When an error occurs, it is repaired successfully before the next error occurs.

MANY-STATE MODEL

Figure 1 illustrates the configuration of states and transition probabilities.

1. The sequence of system up states is $\{n, n-1, n-2, \ldots\}$.
2. The sequence of system down states is $\{m, m-1, m-2, \ldots\}$.
3. The system enters state $(n-k)$ when the $k$-th bug has just been removed.
4. The system enters state $(m-k)$ when the $(k+1)$-th error is discovered.
5. State transitions occur in negligible time intervals.
6. $S(t)$ is the random variable denoting the state of the system at time $t$.
7. Error occurrence rates $\{\lambda_j\}$ and error repair rates $\{\mu_j\}$ must be initially modeled. (The subscript $j$ refers to the system state.)
8. We assume that the transition rates are $\lambda_{n-k} \equiv \lambda(k)$ and $\mu_{m-k} \equiv \mu(k)$; that is, error occurrence and repair rates are functions of the number of bugs ($k$) removed from the system, but are otherwise constant.

SOLUTION AND RESULTS

The differential equations corresponding to the basic Markov model have been solved both exactly (for the case when $\lambda$ and $\mu$ are constant) and numerically (for the general case) using Euler's integration method. These solutions are implemented in various computer programs written in BASIC, FORTRAN, and PL/I.
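The Euler solution described above can be sketched in Python as follows. This is a minimal illustration based on our own reading of Fig. 1, not a reconstruction of the authors' BASIC, FORTRAN, or PL/I programs. We assume the system starts in state $n$ at $t = 0$, that state $n-k$ moves to $m-k$ at rate $\lambda(k)$, and that $m-k$ moves to $n-(k+1)$ at rate $\mu(k)$. This gives the forward equations $dP_{n-k}/dt = -\lambda(k)P_{n-k} + \mu(k-1)P_{m-(k-1)}$ (with the last term absent for $k = 0$) and $dP_{m-k}/dt = \lambda(k)P_{n-k} - \mu(k)P_{m-k}$. The helper names (`solve_many_state`, `validity_horizon`, `two_state_availability`, `lam_eq1`, `mu_eq2`) and the step size `dt` are hypothetical. For constant rates, the up/down indicator is an ordinary two-state process, so the summed up-state probability can be checked against the standard result $A(t) = \mu/(\lambda+\mu) + [\lambda/(\lambda+\mu)]e^{-(\lambda+\mu)t}$.

```python
import math


def solve_many_state(lam, mu, kmax, t_end, dt=1e-3):
    """Euler integration of the truncated many-state chain (illustrative sketch).

    up[k]   approximates P{S(t) = n - k}: k bugs removed, system up.
    down[k] approximates P{S(t) = m - k}: (k + 1)-th error found, system down.
    Returns a list of (t, A(t), B(t)) samples, one per Euler step.
    """
    up = [0.0] * (kmax + 1)
    down = [0.0] * (kmax + 1)
    up[0] = 1.0  # assumed initial condition: system up in state n at t = 0
    history = [(0.0, 1.0, 0.0)]
    for i in range(1, int(round(t_end / dt)) + 1):
        new_up, new_down = up[:], down[:]
        for k in range(kmax + 1):
            fail = lam(k) * up[k] * dt      # flow n - k -> m - k
            repair = mu(k) * down[k] * dt   # flow m - k -> n - (k + 1)
            new_up[k] -= fail
            new_down[k] += fail - repair
            if k < kmax:
                new_up[k + 1] += repair
            # for k == kmax the repaired probability leaves the truncated chain
        up, down = new_up, new_down
        history.append((i * dt, sum(up), sum(down)))
    return history


def validity_horizon(history, eps=0.005):
    """First sampled t with |Sigma(t) - 1| > eps, i.e. where criterion (6) fails."""
    for t, a, b in history:
        if abs(a + b - 1.0) > eps:
            return t
    return None


def two_state_availability(lam, mu, t):
    """Standard availability of an up/down process with constant rates, starting up."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def lam_eq1(k):
    """Error-occurrence rate model (1)."""
    return 2.0 - k / 6.0 if k <= 12 else 0.0


def mu_eq2(k):
    """Constant repair rate model (2)."""
    return 2.0


if __name__ == "__main__":
    # Constant rates: the summed up-state probability should match the closed form.
    hist = solve_many_state(lambda k: 1.0, lambda k: 2.0, kmax=60, t_end=5.0)
    t, a, _ = hist[-1]
    print(f"t={t:.1f}  A_euler={a:.4f}  A_exact={two_state_availability(1.0, 2.0, t):.4f}")

    # Rate models (1) and (2) with kmax = 10, checked against criterion (6).
    hist = solve_many_state(lam_eq1, mu_eq2, kmax=10, t_end=12.0)
    print("validity horizon:", validity_horizon(hist, eps=0.005))
```

The second run applies the rate models (1) and (2) and the accuracy check (6) given below, with $k_{max} = 10$. Its horizon can be compared with the range $0 < t < 8.6$ reported for Fig. 2, but we have not confirmed that this sketch reproduces that figure exactly. Because the only outflow from the truncated chain is the repair out of state $m - k_{max}$, $\Sigma(t)$ can only decrease, which is why it serves as an accuracy check.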

This work was supported by the Air Force Systems Command's Rome Air Development Center, Griffiss Air Force Base, NY, USA.

Models Selected for $\lambda$ and $\mu$


[Fig. 1. A Markov model for software performance evaluation. Up states $n, n-1, n-2, \ldots, n-k, \ldots$; down states $m, m-1, \ldots, m-k, \ldots$. Transitions: $n-k \to m-k$ with probability $\lambda_{n-k}\Delta t$; $m-k \to n-(k+1)$ with probability $\mu_{m-k}\Delta t$; remaining in $n-k$ or $m-k$ with probability $1-\lambda_{n-k}\Delta t$ or $1-\mu_{m-k}\Delta t$.]

[Fig. 2. Availability, $A(t)$, and availability + nonavailability, $\Sigma(t)$ (from numerical solution; $\lambda$ and $\mu$ are given by (1) and (2)). Plot of $\Sigma(t)$ and $A(t)$ versus time $t$ from 0 to 10; vertical axis 0.50 to 1.00; $\epsilon = 0.005$; validity limit $t_1 = 8.6$ marked.]

It seems reasonable to assume that as the number of errors removed from the software system increases, the rate at which errors are discovered will decrease. As a first approximation, we assume a linearly decreasing variation for $\lambda(k)$:

$$
\lambda(k) = \begin{cases} 2 - k/6, & 0 \le k \le 12 \\ 0, & k > 12 \end{cases} \tag{1}
$$

The numbers in (1) are chosen for convenience of illustration. The mean of $\lambda(k)$ over the range $0 \le k \le 12$ is 1.0. We also assume that the rate at which bugs are repaired remains constant:

$$
\mu(k) = 2.0, \qquad k = 0, 1, 2, \ldots \tag{2}
$$

In practice, to avoid evaluating many terms of the infinite series, it is necessary to choose a maximum value $k_{max}$ for the number of bugs removed. For (1) and (2), we empirically chose $k_{max} = 10$ to illustrate how the accuracy of the computations depends on $k_{max}$. Availability and nonavailability of the system are defined as follows:

$$
A(t) = \sum_{k=0}^{k_{max}} \Pr\{S(t) = n - k\} \qquad \text{(availability)} \tag{3}
$$

$$
B(t) = \sum_{k=0}^{k_{max}} \Pr\{S(t) = m - k\} \qquad \text{(nonavailability)} \tag{4}
$$

By choosing $k_{max}$ sufficiently large, $A(t)$ and $B(t)$ can be made arbitrarily close to the true values of availability and nonavailability.

It is also useful to define

$$
\Sigma(t) \equiv A(t) + B(t), \qquad \text{for all } t > 0 \tag{5}
$$

Clearly, if $k_{max}$ is large enough and if the state probabilities $P_j(t)$ are accurately computed, $\Sigma(t) \approx 1$ for all values of $t > 0$. Thus, $\Sigma(t)$ is a check of accuracy.

In order to avoid round-off errors for a given case, we will accept as correct only those values of $A(t)$, $P_{n-k}(t)$, $P_{m-k}(t)$, etc., which correspond to $t$-values satisfying

$$
|\Sigma(t) - 1.0| \le \epsilon \tag{6}
$$

where $\epsilon$ is a predefined number, which we chose as $\epsilon = 0.005$.

The availability function (3) is computed and the results are given in Figure 2. The range of validity of these results is estimated to be $0 < t < 8.6$ for $\epsilon = 0.005$, as demonstrated by $\Sigma(t)$. The increasing curve of $A(t)$ between $t \approx 1.0$ and $t \approx 8.6$ is a consequence of the models we chose for the error occurrence and error correction rates: as bugs are removed, $\lambda(k)$ decreases while $\mu$ stays fixed at 2.0, so successive up periods tend to grow longer relative to the repair periods.

EXTENSIONS OF THE MODEL

It is useful to consider several possible extensions or modifications of our many-state Markov model:

1. Split the errors into critical errors and noncritical errors.
2. Consider the error occurrence and the error repair rates ($\lambda$ and $\mu$) as functions of system operation time $t$, rather than $k$.
3. Allow for the possibilities of unsuccessful repairs and even the introduction of additional errors. See [9].
4. Consider applying the many-state model to real data.

We have investigated each of the above points in considerable detail. In principle, there is no difficulty in considering, say, a third sequence of states $\{Q, Q-1, Q-2, \ldots\}$ corresponding to the system being non-critically down. The modeling of the state transition rates as functions of $t$ (rather than $k$) has been implemented, and this method may be preferable in the application of the model to real data.

By allowing two further classes of transitions, the basic model has been extended to include the possibilities of unsuccessful error repairs and of introducing additional bugs into the system. From the applications point of view, this extension is very desirable.

REFERENCES

[1] M.L. Shooman, Probabilistic Reliability: An Engineering Approach, New York: McGraw-Hill Book Co., 1968.


[2] M.L. Shooman, "Probabilistic models for software reliability prediction," Statistical Computer Performance Evaluation, pp. 485-502, Academic Press, New York, 1972.

[3] M.L. Shooman, "Operational testing and software reliability estimation during program development," Record: 1973 IEEE Symposium on Computer Software Reliability, pp. 51-57, New York, April 1973.

[4] R.E. Barlow, F. Proschan, E.M. Scheuer, "A system debugging model," Operations Research Center, University of California at Berkeley, April 1969.

[5] S.D. Conte, Elementary Numerical Analysis: An Algorithmic Approach, New York: McGraw-Hill Book Co., 1965.

[6] J.G. Estep, "A software availability and reliability model," 1973 IEEE Symposium on Computer Software Reliability, p. 101, New York, April 1973.

[7] A. Trivedi, M. Shooman, "Computer software reliability: Many-state Markov modeling techniques," Polytechnic Institute of New York, Poly EE/EP 75-005, EER 116, March 1975.

[8] A. Trivedi, M. Shooman, "A many-state Markov model for computer software performance parameters," Proc. 1975 International Conference on Reliable Software, Los Angeles, California, April 1975. IEEE Cat. No. 75CH0940-7CSR.

[9] M.L. Shooman, S. Natarajan, "Effect of manpower deployment and error generation on software reliability," Polytechnic Institute of New York MRI Symposium on Computer Software Engineering, April 1976. Hardcover Proceedings available from Polytechnic Press, Attn. Mr. Jerome Fox.

Manuscript received May 3, 1975; revised December 12, 1975, and March 18, 1976.

Martin L. Shooman//Prof. of Electrical Engineering and Computer Science//Polytechnic Institute of New York//333 Jay Street, Brooklyn, NY 11201 USA

Martin L. Shooman is a Professor of Electrical Engineering and Computer Science at the Polytechnic Institute of New York in Brooklyn. He has been active in the reliability field for 20 years and in the software reliability area for 5 years. He has published widely in the reliability literature and is the author of the book Probabilistic Reliability: An Engineering Approach. Dr. Shooman is presently writing a new book, Software Engineering Reliability, Management, Methods.

Ashok K. Trivedi is a Member of the Scientific Staff at Bell-Northern Research, Ottawa, Canada, and is on the Adjunct Staff of the University of Ottawa. He received his Bachelor's degree in Electrical Engineering from the Catholic University of America, Washington, D.C., and his Master's and Ph.D. degrees from the Polytechnic Institute of Brooklyn, New York. This paper is based on work done towards Dr. Trivedi's Doctoral dissertation in Electrical Engineering. His current interests include Reliability Modeling and Computer-Communications Networks.


