commprogbayesian


  • 8/6/2019 commprogbayesian


    To what extent can Bayesian inference be used in the confirmation of the hypothesis "this program is correct"?

    C. D. Peach (691439)

    Abstract

    An essential part of the software life-cycle is the testing component, where it is hoped that all errors within a program can be removed to ensure the highest standards. Identifying individual errors is handled by the process of debugging. However, as complexity increases it becomes extremely difficult to prove that all errors have been removed. Statistical inference methods can use the results of testing procedures to produce levels of confidence in the likelihood that a program will fail during its operational use. Bayesian inference is a method which combines statistics with simple logic in order to yield the probability of a given hypothesis, which in this case is "this program is correct". Throughout this review the basic theory of Bayesian analysis, its application and related research are discussed with minimal mathematical detail in order to assess to what extent Bayesian inference can be used in confirming program correctness.

    1 Introduction

    Program testing is an important process in the software development life-cycle and is a necessity when high standards of quality and reliability are to be met. It is often not possible to prove total reliability, as many test procedures are non-exhaustive due to time and cost constraints. As a result, testing has its main use in the finding of bugs, where errors in the output are used to narrow down the location of errors in a program. Modern software development involves the writing of quite complex programs. If the software is to be made commercially available, for example for use in hospitals, mechanical machinery and logistics, where errors can be disastrous, then the standards required are high. It is then difficult to confirm the dependability requirements of a program with absolute certainty. Instead it is hoped that the confidence levels in a program can be raised through a more accurate analysis of test data, such that a threshold of confidence can be satisfied.

    An important question after testing is what can actually be inferred from the program output. For example, if many tests are undertaken and the program performs successfully on all of them, is the program bug free, does it contain only a few minor errors, or is the testing procedure insufficient to fully test the program in question? A possible way of answering this question is to ensure that the testing procedure simulates real-life application as closely as possible [1]. An operational profile is a quantifiable description of how a system will be used, and test procedures can be geared towards it. This ensures that the reliability achieved is as high as possible for the given test time. A statistical method can then be adopted to infer reliability, e.g. [2], [3], [4]; the main one to be discussed is Bayesian inference.

    1.1 Bayesian Inference

    A possible way to test the correctness of a program is to adopt a Bayesian approach. The method is concerned with finding the probability of failure of a program given information about the testing procedure, such as the quantity of testing and the prior probabilities of program failure.


    Figure 1: A diagram showing the link between deductive and statistical inference [5].

    In many data gathering experiments there is not enough data for deductive reasoning to be used in proving or disproving a given hypothesis. Bayesian inference is a powerful method which combines probability with extended logic. Figure 1 shows how the various aspects of deductive and statistical inference are linked. To introduce the theory of Bayesian inference, the basic notation will be explained and used to derive its basis, known as Bayes' theorem.

    First, the notation is introduced:

    P(A) is the probability of A.

    P(A|B) is the probability of A given B has occurred.

    Consider the probability of say event A followed by event B, given by

    P(AB) = P(A)P(B|A). (1)

    By simple logic it is possible to see that the probability of event B followed by event A, given by

    P(BA) = P(B)P(A|B) (2)

    is equivalent to equation 1. Equating the two expressions yields Bayes' theorem, which can be written as

    P(A)P(B|A) = P(B)P(A|B). (3)

    For the purpose of this review, the application to computer program testing requires us to replace event A with a hypothesis (i.e. this program is correct) and B with a set of information on the testing (i.e. the number of successful tests).
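    Equation 3 is stated in its symmetric form. As a small worked step, dividing both sides by P(B), and assuming P(B) > 0, gives the form in which the theorem is usually applied to a hypothesis A and evidence B:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```

    Read this way, P(A) is what is believed about the hypothesis before testing and P(A|B) is what is believed once the test information B has been taken into account.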


    2 Application of Bayesian Inference to program testing

    In order to show how Bayesian analysis can be applied, an example given in [1] will be used and explained. This will then enable a discussion of past and present work where Bayesian inference has been employed, as well as a comparison to other methods of inference.

    2.1 Basic application

    It is possible to think of a testing procedure as a set of trials in which there is a constant probability of a program failure, otherwise known as Bernoulli trials. In such a case the probability of a program passing, say, N tests, P(N), is given by:

    P(N) = (1 - θ)^N, (4)

    where θ is the probability of failing one test, so that (1 - θ) is the probability of passing one test. Using Bayes' theorem it is possible to make the following substitutions:

    P(A|B) → P(C|N): The posterior probability of correctness given N successful tests, a number between 0 and 1.

    P(A) → P(C): The prior probability of program correctness, a value between 0 and 1 chosen using prior knowledge.

    P(B|A) → P(N|C): The probability of N successful tests given a correct program. A correct program passes every test, so this is equal to 1.

    P(B) → P(N): Normalization factor.

    After substituting these terms into equation 3, the posterior probability of program correctness is given by the following equation:

    P(C|N) = P(N|C)P(C) / P(N). (5)
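    The normalization factor P(N) in equation 5 can be expanded with the law of total probability. The following is a sketch rather than the derivation given in [1]; the symbol \bar{C} ("the program is not correct") is notation introduced here for illustration:

```latex
P(N) = P(N \mid C)\,P(C) + P(N \mid \bar{C})\,P(\bar{C})
     = P(C) + P(N \mid \bar{C})\,\bigl(1 - P(C)\bigr),
\qquad
P(C \mid N) = \frac{P(C)}{P(C) + P(N \mid \bar{C})\,\bigl(1 - P(C)\bigr)}
```

    The second equality uses P(N|C) = 1. The posterior therefore depends on only two quantities: the prior P(C) and the probability that a program which is not correct nevertheless passes all N tests.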

    2.2 Assumptions and estimation of posterior probability

    The use of equation 5 in inferring program correctness revolves around the plotting of the posterior probability against the number of successful tests. In order to use the equation it is necessary to make some important assumptions regarding the prior probability of program correctness and the so-called testability of a program.

    The testability can be defined as the conditional probability that the program fails on a test, given that it contains faults and given a distribution of test inputs. This definition does not take into account that failures may go undetected due to an imperfect oracle, or that faults can be revealed (by observing errors) in the absence of failures [6], [7], [8], [9]. It can be written as follows using the notation introduced:

    Testab = P(F | Pr, f), (6)


    where F corresponds to program failure, Pr represents a given distribution of test inputs and f represents faults in the program. The reason for its introduction, without going into mathematical detail, is that in the description given in [1] the denominator of equation 5 can be written in terms of the testability of a given program. The question is therefore how to provide a sensible, or indeed correct, estimate for the testability of a program. Measuring testability is not just the assigning of a number without justification. A possible path to take is to use sensitivity analysis, a method based on the separation of software failure into three phases: execution of a software fault, creation of an incorrect data state and propagation of this incorrect data state to a discernible output, referred to as PIE (Propagation, Infection, Execution) [10]. The method is explained in more detail in [11].

    The other major assumption in the application of Bayes' theorem involves estimation of the prior probability, i.e. the probability that the program is correct before any test results are considered. This is usually calculated using past information on previous software and testing. [1] mentions that a reasonable belief for the majority of software is that the probability of a program being correct usually has an upper bound of 50%. Figure 2 shows the effect of the testability and prior values on the posterior probability of program correctness. Both of the assumptions mentioned above are important and need to be carefully considered for the Bayesian method to be credible and feasible. The values that are chosen for a given program therefore require sufficient justification.
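    To make the roles of the two assumptions concrete, the following is a minimal Python sketch, not the formulation used in [1]. It assumes that a program containing faults fails each test independently with probability equal to its testability, so that it passes N tests with probability (1 - testability)^N, and that a correct program always passes. The function name and parameters are hypothetical.

```python
def posterior_correct(prior: float, testability: float, n_tests: int) -> float:
    """Posterior probability that the program is correct after n_tests passes.

    Assumption (for illustration only): a faulty program fails each test
    independently with probability `testability`; a correct one never fails.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if not 0.0 <= testability <= 1.0:
        raise ValueError("testability must lie in [0, 1]")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")

    # Probability that a faulty program still passes every test.
    pass_if_faulty = (1.0 - testability) ** n_tests
    # Normalization factor: total probability of observing n_tests passes.
    evidence = prior + (1.0 - prior) * pass_if_faulty
    if evidence == 0.0:
        raise ValueError("observation has zero probability under these inputs")
    return prior / evidence


if __name__ == "__main__":
    # Illustrative values only; they are not taken from [1].
    for n in (0, 10, 100, 1000):
        print(n, posterior_correct(prior=0.5, testability=0.01, n_tests=n))
```

    With no tests the function returns the prior unchanged, and with a testability of zero no number of passed tests moves the posterior, which is why both values need justification.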

    Figure 2: A plot showing the probability of program perfection as a function of successful tests using different values of testability [1].

    The figure above shows a range of scenarios created by the different values of prior and testability. From the plot it may be inferred that, provided the program passes 1.68 × 10^7 tests, the program is statistically correct beyond reasonable doubt.

    2.3 Past and present application

    In this section the aim is to review some applications of Bayesian inference in past and present research on program testing and the estimation of the probability of failure. A lot of the literature is very technical and mathematical. However, the aim of this section is to highlight some issues in the application of the Bayesian approach and to give an overview of the techniques used to overcome these issues.

    A good example of application is Miller et al. (1992) [12], who investigated very high reliability using the random black-box testing model. The black-box model assumes that a program can be treated as a mathematical function with a well defined domain and range [13]. The model is then used to select an input element at random, consistent with an assumed input distribution of test data. The test values are used to run a program so that the output of the program can be evaluated. The theoretical framework applied by [12] used formulae based on Bayesian analysis. The formulae incorporated a particular input distribution and prior assumptions regarding the probability of failure of a program. It was first argued by Voas and co-authors ([6], [8]) that using testability estimates could yield more favourable predictions than the black-box method. Other examples, including [7] and [9], used testability estimates with the logic that if a program is likely to fail under certain testing, yet is successful, then it is likely that the program is fault free. One of the assumptions in [12] was that the software being tested would be revised and resubmitted for testing if it were to fail a test. Their work had three main focuses:

    Estimation of failure when random testing does not reveal failures.

    Adjusting the prior belief of failure when the test distribution does not match the intended operational distribution.

    Combining random testing with other information to estimate the probability of failure.
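    The random black-box testing model described above can be sketched in a few lines of Python. This is an illustration of the general idea, not the procedure of [12]: inputs are drawn from an assumed input distribution, the program is run on each, and its output is compared with an oracle. The function names, the toy program and the perfect oracle are all assumptions made for the example.

```python
import random
from typing import Callable, Sequence


def count_random_test_failures(
    program: Callable[[int], int],
    oracle: Callable[[int], int],
    inputs: Sequence[int],
    weights: Sequence[float],
    n_tests: int,
    seed: int = 0,
) -> int:
    """Run n_tests random tests and return the number of observed failures.

    `inputs` and `weights` together describe the assumed input distribution.
    """
    rng = random.Random(seed)
    # Draw test inputs according to the assumed input distribution.
    drawn = rng.choices(inputs, weights=weights, k=n_tests)
    # A failure is any input on which the program disagrees with the oracle.
    return sum(1 for x in drawn if program(x) != oracle(x))


def buggy_abs(x: int) -> int:
    # Hypothetical program under test: wrong only for the input -3.
    return -3 if x == -3 else abs(x)


if __name__ == "__main__":
    domain = list(range(-5, 6))
    uniform = [1.0] * len(domain)
    print(count_random_test_failures(buggy_abs, abs, domain, uniform, 1000))
```

    If the input distribution gives little or no weight to the failing input, the same faulty program can pass every test, which is the situation the first two focuses listed above are concerned with.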

    Similar work by Thayer et al. used techniques close to those of Miller et al.; however, their work differed in the assumption of the prior probabilities. Thayer et al. assumed a uniform prior distribution whereas Miller et al. allowed for a non-uniform prior distribution. However, it was stated in [14] that their methods would probably not be suitable if ultra-reliability was needed (classified by only one program failure in 10^9 hours [15]) due to insufficient accuracy. [2] underlines the difficulties in validating ultra-high dependability and states that it is an area in much need of research.

    The main reason for discussing Miller et al. (1992) is that it highlights the importance of the prior assumptions. In their analysis they use two parameters a and b (see paper for mathematical details) in order for the assumptions to influence the probability of failure. For complete ignorance of prior knowledge about a piece of software, a and b can be set to a = b = 1. Moving past the simplest case, another example scenario is where a is set to 1 and b is allowed to vary. This encodes the belief that the software is equivalent to software which has successfully passed b random tests. Other methods of inferring the behavior of a piece of software from previous versions include reliability growth models [16]. These revolve around the assumption that detection of an error leads to repair and resubmission. It is the successive numbers of successful program executions which are used as input in the growth models [2].
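    One way to see how two parameters such as a and b can carry prior belief is the standard conjugate update for a Beta prior on the per-test failure probability. The sketch below assumes that a and b play the role of Beta shape parameters; this is a reading adopted here for illustration, and the paper should be consulted for the exact formulation. The function names are hypothetical.

```python
from typing import Tuple


def update_failure_prior(a: float, b: float, n_passed: int) -> Tuple[float, float]:
    """Update a Beta(a, b) prior on the failure probability.

    Assumption (for illustration only): n_passed independent tests were run
    and none of them failed, so only the second shape parameter grows.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    if n_passed < 0:
        raise ValueError("n_passed must be non-negative")
    return a, b + n_passed


def expected_failure_probability(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Start from the "complete ignorance" setting a = b = 1.
    a_post, b_post = update_failure_prior(1.0, 1.0, n_passed=1000)
    print(expected_failure_probability(a_post, b_post))
```

    Under this reading, starting from a = b = 1 and observing n failure-free tests gives the same distribution as starting directly from a = 1 with b increased by n, which is how a larger b can stand in for earlier successful testing.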

    As well as prior probabilities and logical assumptions, there is effort associated with different applications of test data and with using input distributions in a useful way. For example, [12], [17], [18], [19] use so-called partitioning of input data, in which the input domain is binned such that each bin has an associated probability, with each element in a given bin having equal probability of being drawn.
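    The partitioning described above amounts to a two-stage draw: first a bin is chosen according to its probability, then an element is chosen uniformly within that bin. The following Python sketch illustrates this; the function name, the bins and their probabilities are made up for the example and are not taken from the cited papers.

```python
import random
from typing import List, Sequence


def draw_from_partition(
    bins: Sequence[Sequence[int]],
    bin_probs: Sequence[float],
    rng: random.Random,
) -> int:
    """Draw one test input from a partitioned input domain."""
    if len(bins) != len(bin_probs):
        raise ValueError("each bin needs exactly one probability")
    if any(len(b) == 0 for b in bins):
        raise ValueError("bins must not be empty")
    # Stage 1: choose a bin according to its associated probability.
    chosen_bin = rng.choices(bins, weights=bin_probs, k=1)[0]
    # Stage 2: every element of the chosen bin is equally likely.
    return rng.choice(chosen_bin)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical partition: negative, zero and positive inputs.
    bins: List[List[int]] = [[-3, -2, -1], [0], [1, 2, 3, 4]]
    probs = [0.2, 0.1, 0.7]
    print([draw_from_partition(bins, probs, rng) for _ in range(10)])
```

    Choosing the bin probabilities to match the intended use of the program is one way of tying the test inputs to an operational profile.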

    The literature available appears to show that there are many ways to apply Bayesian inference to a problem. The simple picture of Bayes' theorem presented in the previous section requires great thought in the assumptions of prior probabilities, distributions and partitioning of test inputs, as well as in the level of reliability that is to be validated, which highlights the limitations of this type of inference. It appears that the research is centered around the limitations of program testing and the refinement of the application of Bayesian analysis.


    3 Conclusion

    From the discussion in the previous sections it is clear that quantifying program correctness is an area of great interest in which there is always more to learn. The reason for this appears to lie in the application of Bayesian analysis to a given problem and not in the theory itself. The method involves using assumptions about the so-called testability of a program and about the probability of a correct program using prior knowledge. Determining these values can be done in many ways, and so the question is therefore: is it possible to estimate the correct value for these variables? The answer would be no; it could never be feasible to give values with absolute certainty. However, let us suppose that this was possible. I believe that the method, although simple, is a powerful one in which many scenarios can be simulated in order to observe probabilities of program correctness. As a result, if all prior assumptions happen to be correct then the method would be able to infer whether or not a program is correct with certainty.

    In the more realistic case, where assumptions can only be assigned values mathematically and logically, the method is still a perfect method for inference. The method allows for simulation of any scenario (i.e. different combinations of assumed input variables). As a result, by using plots such as the one shown in figure ? it is possible to show the testing results required in order to give a probability of 1 for a program being correct. Thus it can be shown, for example, that if a program passes a number of tests successfully, then in any or many scenarios the number of successes is enough to infer program correctness. However, the testing procedure then needs to be thorough and specific to the real-life use for which a program is intended. This review has examined some of the ways in which a program is tested, and this again is an area where there can always be improvements or developments such that the type of testing is proven to be the best type.

    I believe the greatest need for improvement lies in the testing component and not in the application of the Bayesian analysis. Many scenarios can be tested easily; however, it is not so easy to keep trying different forms of testing, and so it would seem more time and cost effective to work on this area of the testing component.


    References

    [1] A. Bertolino and L. Strigini, On the use of testability measures for dependability assessment, IEEE Transactions on Software Engineering, Vol. 22, No. 2, February 1996.

    [2] B. Littlewood and L. Strigini, Validation of Ultra-High Dependability for Software-based Systems, Communications of the ACM, Vol. 36, No. 11, pp. 69-80, November 1993.

    [3] K. W. Miller, L. J. Morell, R. E. Noonan, S. K. Park, D. M. Nicol, B. W. Murrill, and J. M. Voas, Estimating the Probability of Failure when Testing Reveals No Failures, IEEE Transactions on Software Engineering, Vol. 18, No. 1, pp. 33-44, Jan. 1992.

    [4] D. L. Parnas, A. J. van Schouwen, and S. P. Kwan, Evaluation of Safety-Critical Software, Communications of the ACM, Vol. 33, No. 6, pp. 636-648, June 1990.

    [5] P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences, Cambridge University Press, 2005.

    [6] D. Hamlet and J. Voas, Faults on Its Sleeve: Amplifying Software Reliability Testing, 1993 Int. Symposium on Software Testing and Analysis (ISSTA), Cambridge, Massachusetts, June 28-30, 1993, pp. 89-98, in ACM SIGSOFT Software Eng. Notes, Vol. 18 (3), July 1993.

    [7] J. M. Voas, C. C. Michael and K. W. Miller, Confidently Assessing a Zero Probability of Software Failure, High Integrity Systems, Vol. 1, No. 3, pp. 269-275, 1995.

    [8] J. M. Voas and K. W. Miller, Improving the Software Development Process Using Testability Research, Proc. of the Third Int. Symposium on Soft. Reliability Engineering, Oct. 7-10, 1992, pp. 114-121.

    [9] J. M. Voas and K. W. Miller, Software Testability: The New Verification, IEEE Software, pp. 17-28, May 1995.

    [11] J. Voas, PIE: A Dynamic Failure Based Technique, IEEE Trans. on Software Engineering, Vol. 18, No. 8, pp. 717-727, August 1992.

    [13] H. D. Mills, The new math of computer programming, CACM, Vol. 18, No. 1, pp. 43-48, Jan. 1975.

    [14] D. R. Miller, Making statistical inferences about software reliability, NASA Contractor Rep. No. 4197, Dec. 1988.

    [15] Special Committee 152, Software considerations in airborne system and equipment certification, Radio Tech. Commission for Aeronautics, Washington, DC, DO-178A, Mar. 1985.

    [16] K. W. Miller, L. J. Morell, R. E. Noonan, S. K. Park, D. M. Nicol, B. W. Murrill and J. M. Voas, Estimating the probability of failure when testing reveals no failures, IEEE Trans. on Software Engineering, Vol. 18, No. 1, Jan. 1992.

    [17] B. Jeng and E. Weyuker, Some observations on partition testing, in Proc. ACM SIGSOFT 3rd Symp. on Software Testing, Analysis, and Verification (TAV3), Dec. 1989, pp. 38-47.

    [18] T. A. Thayer, M. Lipow, and E. C. Nelson, Software Reliability (TRW Series of Software Techn., Vol. 2). New York: North-Holland, 1978.

    [19] S. N. Weiss and E. J. Weyuker, An extended domain-based model of software reliability, IEEE Trans. Software Eng., Vol. 14, pp. 1512-1524, Oct. 1988.


    4 Discussion of references

    The references will be discussed in number order:

    [1]: This reference was obtained from IEEE Xplore, the digital library of the IEEE, a leading membership organization for computing professionals. It was mainly used to confirm certain topics, which it explains in a more basic way. Because of its publication pedigree it was regarded as a trusted source. The references [11], [16], [18], [21], [23], [27], [28], [30] were taken from this source: it explained the basic terms, but more specific detail was needed in order for me to learn and to make links in my review. As a result I believe that this reference, along with the ones stated above, are trusted sources.

    [5]: This reference is a book published by Cambridge University Press and was used to understand and explain the basic theory and philosophy behind Bayesian analysis. The book is mainly intended for students and PhD students. Cambridge University Press have published many educational books and are a well respected source. The ideas in this book are well known across the area of statistical inference; only theory is explained, with no opinions or new ideas.

    [12]: This reference was used in the research section of the review and was written by academic professionals of associate professor status or above. The publication appeared in IEEE Xplore and so was again trusted for the reasons previously stated. The references [12], [13], [14], [15] derived from this reference were also trusted due to the caliber of the paper.

    For the remaining references, and also for the ones discussed above, Google Scholar was used with keywords such as Bayesian, program failure and testing. I believe the references obtained from Google Scholar are respectable sources. This is because Google ranks papers in much the same way as an academic equivalent would, with factors such as author quality, citations and popularity being used in the ranking process.
