[IEEE 2012 Ninth International Conference on Information Technology: New Generations (ITNG) - Las Vegas, NV, USA (2012.04.16-2012.04.18)] 2012 Ninth International Conference on Information

Automatic Coverage Evaluation For a Medical Expert System

Ahmad Eyadat Computer Information Systems Department

Yarmouk University Irbid, Jordan

[email protected]

Izzat Alsmadi Computer Information Systems Department

Yarmouk University Irbid, Jordan

[email protected]

Abstract: Software products must be tested to ensure that they are correct and built according to the specified requirements. Testing need not occur only after the software product has been developed. It may also take place in earlier stages of the development process (e.g. the requirements and design stages) to detect errors early, which can be more effective in saving scarce project resources. In this paper, an expert system is built for dental clinical treatment. A graph is automatically generated from a formal model of the possible diseases and symptoms seen in dental clinics. A software application is then built to automatically generate, execute and verify test cases from the formal model of the dental expert system. Coverage is also evaluated on the graph in terms of nodes, edges and paths. Initial test results revealed problems in the developed formal model, which produced several deadlock paths. Several cycles of improvements to the model were implemented based on the test output and coverage results. The results show that such an approach can be very useful in evaluating expert systems in general.

Keywords: formal methods; expert systems; software testing; model checkers.

I. INTRODUCTION

Computer programs should be as free from errors as possible. Working programmers may dedicate more than half of their development time to testing and debugging in order to increase product reliability. This means that we must make sure that the software actually does what it is supposed to do, and does it correctly. In many software processes the testing phase is the final part of the project, so it is the phase most often squeezed by the pressure to reach the market first, which means that some errors in the program may not be uncovered until it reaches the customer. This is not to say that testing is not performed, but that it is not performed as strictly as it should be. Formal proof methods may be used in this phase; however, they can guarantee correctness only for systems of limited size. To address this problem, software testing activities can be automated, which accelerates the process. The advantages of automated testing are that it is computationally cheaper, increases confidence in correctness, is practical with current technology and fits in with established practices and workflows. To automate the process, we need tools that can automatically generate test cases.

Model checking is a process used to check the software model and detect possible errors in the design. While such a process may consume extra resources from the already scarce resources of a software project, its goal is to detect errors early in the development process and so save resources that would, hopefully, be more expensive to spend later. It has been reported that a bug that could have been discovered in the design stage becomes much more expensive to fix if its discovery is delayed to the testing or maintenance stage. An error in the software design itself can be very expensive to modify and fix. This is why several important design qualities, such as low coupling and high cohesion, are desired: they reduce the cost of fixing or updating a software program in later stages or in future releases.

There are many forms in which software can be modeled. In all of them, the process starts from the initial, unstructured requirements written in informal natural language. A formal language, method or tool is then used to convert the requirements or specification from the informal form to a formal one. The formal form can be used as input to model checkers or test case generation tools.

In this paper, the specification of a dental expert system is first collected in its informal natural-language form. The requirements are collected from several sources, including books, papers and dental domain experts. The requirements are then written in a textual Control Flow Graph (CFG) format. A CFG is a graph that describes a system in terms of its control or decision statements (e.g. if, else, when, while). Figure 1 shows the CFG generated for the dental expert system used in this study.

II. LITERATURE REVIEW

Many researchers have studied expert systems, test case generators and software testing. This section reviews work on expert systems in the medical or dental field, with a focus on software testing aspects and especially on test case generators.

2012 Ninth International Conference on Information Technology- New Generations

978-0-7695-4654-4/12 $26.00 © 2012 IEEE

DOI 10.1109/ITNG.2012.77


Barr observed that the performance of a rule-based system is related to the number of test cases for which it reaches the expected solution [1]. The system is modified until it handles a sufficient portion of the test cases. The problem is that there is no guarantee that all possible cases will be covered. To solve this problem, the author introduced an approach that uses coverage measures to evaluate the testing process. The work uses CASNET graph-based analyses, KP-Reducer to analyze the rule base, and the Path Hunter and Path Tracer tools, which determine how broadly the possible execution paths are covered by the test data. The author concluded that information about the extent to which a data set has covered the rule base under test can be used to improve both the test data and the rule base.

Belli and Güldal addressed the problem of using formal methods to test software [2]. Their solution is based on model checking, which can be carried out automatically. The idea is to generate test cases from the specification, which transfers the specification-oriented testing process to model checking; the approach thus combines the advantages of testing and model checking. The work assumes that the model checker visits all reachable states and then checks whether the results match the expected results. The open question when applying such a generator is how to guarantee that all of the relevant requirements have been verified. To address it, the authors noted that existing approaches combine testing and model checking in order to automatically generate test cases, which are then exercised on the real target system. The authors found that this approach has several advantages: first, it reduces cost; second, test cases and test generation are controlled by the coverage of the specification model, which enables effective handling of test termination.

Menzies and Cukic presented a paper on testing knowledge-based models [3]. The authors investigated the relation between the completeness criterion and the size of test suites. Experiments on many models of real-world knowledge-based systems indicate that only a very limited gain in completeness can be achieved through prolonged testing, and that simple search strategies for testing appear to be as powerful as more thorough search algorithms. The paper addresses the questions of how to select test cases that thoroughly test the rule base, what an adequate number of tests is for a given knowledge-based system, and when to stop testing. It compares light testing with thorough testing. From the experiments, the authors found that a few explorations of a search space yield as much information as more thorough searches, so simple random strategies for testing knowledge-based systems can be as powerful as more detailed testing strategies. They concluded that substantial coverage may be achieved by test suites containing a surprisingly small number of randomly chosen tests, and therefore that light testing is the best strategy.

Ahmed and Hermadi argued that using a powerful generator to test paths reduces the cost and time of testing [4]. They proposed test path evaluation based on Genetic Algorithms (GAs) for automating the generation of test data. One problem with this kind of approach is its inefficiency in covering multiple target paths. To solve this problem, the authors designed a GA-based test data generator that is able, in one run, to synthesize multiple test data covering multiple target paths. After implementing this strategy, the authors ran many experiments and found that their GA-based test data generator is more efficient and more effective than generators that produce only one test datum at a time, because a single run yields multiple test data that cover multiple target paths while examining fewer test data.

Harrold presented a roadmap for software testing, one of the oldest forms of verification [5]. Testing is accomplished by executing the software with input test cases and observing the output data. The author concluded that software quality and efficiency will be improved through fundamental research that addresses the challenging problems, the development of methods and tools, and empirical studies. The author also noted that any testing has limitations: testing cannot show the absence of faults, only their presence. Additionally, testing cannot show that the software has certain qualities.

Rayadurgam and Heimdahl proposed a method for automatically generating test cases that satisfy structural coverage criteria [6]. The results produced by the tests are used in software development. The authors aim to help reduce the high cost of developing test cases for safety-critical software applications that require a certain level of coverage for certification. The main part of the work is to formalize structural coverage criteria so that a model checker can provide test sequences that achieve this coverage. As an example, the paper uses MC/DC coverage, for which the model checker generates test sequences. The results suggested that the approach will scale well to larger systems and has the potential to dramatically reduce the costs associated with generating test cases to high levels of structural coverage.

Almaliotis et al. proposed that software cost can be reduced, and verification and validation improved, by building a model checker [7]. The idea is to develop a set of linear temporal logic (LTL) formulae that produce a set of test cases for a given coverage criterion; the problem is that this production is not easily automated. To solve the problem, the authors proposed automating the steps from producing a model program (i.e. a SPIN model) from the source program to generating the expected test cases. Open problems for the work are the need for fine-tuning the source-to-model abstraction, and the program's execution environment and its role in the source-to-model transformation.

Pfaller et al. worked on test case design and on improving test case coverage [8]. The authors introduced an approach that derives test cases along different levels of abstraction during the design phase. Several design models at different abstraction levels are presented. A test case is executed at several levels and finally on the implementation. An advantage of this approach is that it preserves the link from test cases to the corresponding user requirements while avoiding the problem of purely abstract models, which do not reflect inevitable, crucial aspects of the realization. The authors tried to keep the trace between generated test cases and user requirements. Testing at many levels thus produces test cases that cover realization constraints and identify error-prone parts of the system design. The authors concluded that this method is an effective approach for defining new coverage criteria for test case generation that are based on user requirements rather than on structural aspects.

Abu Naser et al. proposed a design for a medical expert system [9]. The paper focused on the language used to represent a patient's medical history and current state in the knowledge base of the expert system, and on how to obtain correct results and effective consultation. To solve the language problem, the authors used CLIPS (C Language Integrated Production System) with a Java interface as a suitable language for developing the expert system. The authors concluded that the system gave good results on many test cases and performed very well when compared to an expert human doctor.

Filho presented a medical expert system that supports the diagnosis and treatment of diabetes [10]. The problem the paper focused on is how to avoid unnecessary search and thereby increase response speed. This is accomplished by having the system adopt strategies of proven clinical effectiveness; the system uses a forward and backward chaining inference mechanism with shortened methods. The designed system has many capabilities, such as judging the possibility of illness, its severity and its potential complications, including a statistical belief, based on the patient's symptoms and laboratory examinations. The system can also give prescriptions and make useful indications and suggestions.

Utting focused on how model-based testing generates test cases automatically from a model of the software product [11]. Test cases with oracle components produce a pass/fail output for each test. The paper also discussed how model-based testing can be incorporated into the proposed grand challenge for a program verifier. The paper concluded that model-based testing has many advantages, such as good coverage of all the behaviors of the product, tests that are linked directly to requirements, and a repeatable and scientific basis for product testing.

Barr also focused on evaluating the performance of rule-based systems [12]. The paper found that performance on the test cases may not accurately predict the performance of the system in actual use. It demonstrated a method for evaluating rule-base coverage. The method relates how representative the test data are of the population on which the rule base will be used, and also accounts for the likelihood of occurrence of different kinds of test cases in the larger population. The paper found that a performance prediction can be obtained without running repetitive test cases, so similar cases are not needed for statistical confirmation; this is done by using information about coverage and the number of paths to compute a safe performance prediction.

Swain et al. discussed how to control software quality and obtain a reliable system [13]. Testing is done in many ways and forms, but this paper focuses on how to build model-based testing (MBT). The problem is how to make model-based testing less costly, less time-consuming and less error-prone. The solution was to build automatic testing of object-oriented software, which automates test case execution. The approach is based solely on analysis of the code and the requirements specification. From the testing results, the authors concluded that the MBT approach brings considerable benefit in terms of increased productivity and reduced development time and costs, and that it can replace code-based testing.

III. GOALS AND APPROACHES

In this section we describe the research methodology. It is composed of several steps: data collection, data processing, building the expert system, test case generation, execution and coverage assessment.

A. Data collection

The initial dataset was collected from the dental center of Jordan University of Science and Technology (JUST) in Irbid, Jordan. Tables 1 and 2 show samples of the collected data for symptoms and diseases.

TABLE I. SAMPLE OF SYMPTOMS DATA

Abnormal dental position | Jaw pain | Dislocation of the jaw | Radiating facial pain
Abnormal tooth enamel | Jaw pain when yawning | Dizziness | Radiating neck pain
Abraded teeth | Jaw popping | Earache | Radiating shoulder pain
Abscess | Limited jaw movement | Eating disorders | Red face on one side
Altered tooth position | Locked jaw | Erosion of tooth enamel | Red gums
Anxiety | Loose tooth | Facial pain | Relief of prior toothache
Bad taste | Loss of hearing | Fever | Sensitive teeth
Bleeding around tooth | Malaise | Grating jaw sound | Severe toothache - later
Bleeding gums | Malocclusion | Grayish film on the gums | Shiny appearance to gums
Breath odor | Mild dental pain | Grinding teeth | Stress or tension

The collected data contains a total of 200 symptoms and 20 diseases.

TABLE II. THE DISEASE NAMES

No. | Disease Name
1 | Ankylosis of teeth
2 | Bruxism
3 | Dental conditions
4 | Dental caries
5 | Dental tissue neoplasm
6 | Teeth grinding
7 | Gum disease
8 | Periodontitis
9 | Toothache
10 | Bleeding Gums
11 | Dental abscess
12 | Jaw joint disorders
13 | Acute Necrotizing
14 | Impacted tooth
15 | Tooth Demineralization
16 | Jaw conditions
17 | Tooth Abrasion
18 | Trench mouth
19 | Gum disorders
20 | Gingivitis


B. Data processing

One of the major objectives in producing the formal model, or CFG, is to put the data into a suitable format for the modeling phase.

The CFG is produced by evaluating all possible paths through the possible symptoms and diseases. The result is verified against related books and with experts in the domain. Table 3 shows a sample of the diseases and their paths; each path lists the edges that must be traversed to reach a final decision on the particular disease.

TABLE III. DATA AFTER PROCESSING: SAMPLE

No. | Disease Name | Symptoms path in the graph
0 | Bleeding Gums | 11
1 | Gingivitis | 11,74,94,99
2 | Dental abscess | 13,34,35,64,67,70,72,77,87,89,93,98,101
3 | Jaw joint disorders | 15,17,28,29,36,39,49,50,55,57,58,59,60,61,62,65,73,76,77,79,83,84,85,86,90,110,112
4 | Acute Necrotizing | 19,37,71,92,100
5 | Impacted tooth | 20,41,43,109
6 | Tooth Demineralization | 24,92,108
7 | Jaw conditions | 28,29,49,50,56,57,58,60,61,77,79,84,85,86,103
8 | Tooth Abrasion | 31,51,52
9 | Trench mouth | 35,44,46,48,78
10 | Gum disorders | 40,42,43,44,45

The following are examples of the preprocessing steps that were necessary to improve the quality of the CFG and reduce the possibility of generating redundant paths or deadlock paths. Redundant paths are paths that can be visited by more than one disease. Deadlock paths are paths of symptoms that reach no known disease.

• Remove all redundant symptoms.
• Combine the set of related symptoms for each disease.

Table 4 shows examples of the combined rules of similar symptoms.
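The two path problems defined above can be checked mechanically. The following is a minimal sketch, not the tool used in this study: it assumes each disease is stored as a tuple of symptom identifiers, and it reads "redundant" as "the same symptom path leads to more than one disease". The mapping `DISEASE_PATHS`, its placeholder values and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample (not the study's data): disease -> symptom path.
DISEASE_PATHS = {
    "Disease A": (1, 2, 3),
    "Disease B": (1, 4),
    "Disease C": (1, 2, 3),  # same path as Disease A, so the path is redundant
}


def find_redundant_paths(disease_paths):
    """Return {path: [diseases]} for every path shared by several diseases."""
    by_path = defaultdict(list)
    for disease, path in disease_paths.items():
        by_path[tuple(path)].append(disease)
    return {path: sorted(names) for path, names in by_path.items() if len(names) > 1}


def is_deadlock(path, disease_paths):
    """A symptom path is a deadlock if it ends at no known disease."""
    known = {tuple(p) for p in disease_paths.values()}
    return tuple(path) not in known


if __name__ == "__main__":
    print(find_redundant_paths(DISEASE_PATHS))
    print(is_deadlock((1, 2), DISEASE_PATHS))
```

Running such a check after each preprocessing cycle would show whether removing and combining symptoms has actually eliminated the shared or dangling paths.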

C. Building the decision tree or CFG

From the refined data summarized in Table 4 we built a decision tree. The decision tree contains nodes, which represent the symptoms, connected by links. A path contains many nodes that finally lead to a disease. Figure 1 shows a condensed version of the dental CFG model.

Figure 1. Dental Expert System CFG
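As an illustration of how such a tree could be stored, the sketch below builds a prefix tree in which each node maps a symptom to a child node. The paper does not specify the data structure, so the nested-dictionary layout, the reserved `DISEASE` key and the function names are assumptions; the two rules are the first two rules of Table 4.

```python
DISEASE = "__disease__"  # reserved key marking where a rule ends (sketch convention)

# First two rules of Table 4: (ordered symptoms, disease).
RULES = [
    (["Bleeding gums"], "Bleeding Gums"),
    (["Bleeding gums", "Swollen gums", "Mouth sores", "Shiny appearance to gums"],
     "Gingivitis"),
]


def build_tree(rules):
    """Build a prefix tree: each node maps a symptom to a child node."""
    root = {}
    for symptoms, disease in rules:
        node = root
        for symptom in symptoms:
            node = node.setdefault(symptom, {})
        node[DISEASE] = disease
    return root


def enumerate_paths(tree, prefix=()):
    """Yield (symptom_path, disease) for every rule stored in the tree."""
    for key, child in tree.items():
        if key == DISEASE:
            yield prefix, child
        else:
            yield from enumerate_paths(child, prefix + (key,))


if __name__ == "__main__":
    for path, disease in enumerate_paths(build_tree(RULES)):
        print(" -> ".join(path), "=>", disease)
```

Because rules that share leading symptoms share nodes, the common prefix "Bleeding gums" is stored once, which is the property that makes the condensed graph smaller than the list of rules.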

D. Generating rules from the decision tree

After the decision tree is built, the desired rules are extracted so that test cases can be generated from them. Each rule represents a path that includes particular symptoms leading to one specific disease. Twenty rules are extracted from the decision tree (one for each disease); Table 4 shows a sample.

TABLE IV. DENTAL EXPERT SYSTEM RULES (SAMPLE)

(1) If Bleeding gums Then Bleeding Gums
(2) If Bleeding gums & Swollen gums & Mouth sores & Shiny appearance to gums Then Gingivitis
(3) If Bleeding gums & Bad breath & Bite changes & Dentures fitting poorly & Loose tooth & Gums pulling away from your teeth & Red gums & Sensitive teeth & Sore gums & Swollen gums Then Gum disease
(4) If Bleeding gums & Bad breath & Bite changes & Dentures fitting poorly & Loose tooth & Relief of prior toothache & Severe discomfort when eating or swallowing & Stress or Tension & Swollen lymph nodes of the head and neck Then Periodontitis
(5) If Abscess & Bad breath & Cervical adenopathy & Chills & Fever & Tooth pain & Trismus & Foul taste in mouth Then Dental caries
(6) If Abscess & Bad breath & Cervical adenopathy & Chills & Fever & Tooth pain & Trismus & Bad taste & Malodorous breath Then Dental conditions
(7) If Anxiety & Depression & Earache & Eating Disorders & Grinding noises when asleep & Headache & Insomnia & Jaw pain & Loss of tooth enamel & Swelling around tooth & Teeth misalignment & Tooth pain & Wear on the teeth Then Teeth grinding


For example, Rule (2) reads: if the patient complains of bleeding gums, swollen gums, mouth sores and a shiny appearance to the gums, then the disease is Gingivitis.
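A rule of this form can be evaluated by checking whether all of its symptoms appear among those reported by the patient. The sketch below is illustrative only and is not the inference mechanism of the expert system: the `diagnose` and `normalize` helpers, the case-insensitive matching and the ordering of results by specificity (rules with more symptoms first) are assumptions. The two rules are Rules (1) and (2) of Table 4.

```python
# Rules (1) and (2) of Table 4 as (set of symptoms, disease).
RULES = [
    ({"Bleeding gums"}, "Bleeding Gums"),
    ({"Bleeding gums", "Swollen gums", "Mouth sores", "Shiny appearance to gums"},
     "Gingivitis"),
]


def normalize(symptoms):
    """Compare symptom names without regard to case or surrounding spaces."""
    return frozenset(s.strip().casefold() for s in symptoms)


def diagnose(reported, rules=RULES):
    """Return the diseases whose rule is fully satisfied, most specific first."""
    reported = normalize(reported)
    fired = [
        (len(condition), disease)
        for condition, disease in rules
        if normalize(condition) <= reported  # every rule symptom was reported
    ]
    fired.sort(key=lambda item: -item[0])
    return [disease for _, disease in fired]


if __name__ == "__main__":
    print(diagnose(["Bleeding gums", "swollen gums", "Mouth sores",
                    "Shiny appearance to gums"]))
```

Note that a patient who satisfies Rule (2) also satisfies Rule (1), since the symptoms of Rule (1) are a subset of those of Rule (2); ordering by specificity is one simple way to report the more informative diagnosis first.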

IV. TEST CASE GENERATION, EXECUTION, AND COVERAGE ASSESSMENT

To evaluate the CFG model, we built a testing tool called the model-based system. The tool consists of two algorithms that generate test suites automatically. Figure 2 shows an example of a report that is automatically generated to evaluate the performance of the generated test cases in terms of edge, node and path coverage.

Figure 2. Random Algorithm Results

Two algorithms were developed to automatically generate test cases from the CFG: a sequential algorithm and a random algorithm. There are two constraints on generating test cases from the CFG. The first is that a test case should follow a valid path in the CFG that represents a disease. The second is that test cases should not be redundant: a test set that keeps visiting the same paths has lower overall coverage.
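The two constraints can be illustrated with a small sketch. It is not the implementation of the testing tool: the toy graph `CFG`, its node names, the function names and the `max_attempts` cut-off are hypothetical, and the graph is assumed to be acyclic with disease nodes as its leaves. The first generator walks the graph at random, so every test case is a valid path; the second adds the non-redundancy constraint by rejecting any path that has already been generated.

```python
import random

# Hypothetical toy CFG: node -> successors. Nodes without successors are diseases.
CFG = {
    "start": ["s1", "s2"],
    "s1": ["s3", "d1"],
    "s2": ["d2"],
    "s3": ["d3"],
    "d1": [], "d2": [], "d3": [],
}


def random_path(cfg, rng, start="start"):
    """Walk from start, choosing successors at random, until a leaf is reached."""
    path = [start]
    while cfg[path[-1]]:
        path.append(rng.choice(cfg[path[-1]]))
    return tuple(path)


def generate_random(cfg, n, rng):
    """Purely random generation: the same path may be produced repeatedly."""
    return [random_path(cfg, rng) for _ in range(n)]


def generate_without_repeats(cfg, n, rng, max_attempts=1000):
    """Random generation plus the constraint that no path is generated twice."""
    seen, suite, attempts = set(), [], 0
    while len(suite) < n and attempts < max_attempts:
        attempts += 1
        path = random_path(cfg, rng)
        if path not in seen:
            seen.add(path)
            suite.append(path)
    return suite


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate_random(CFG, 5, rng))
    print(generate_without_repeats(CFG, 3, rng))
```

The attempt limit is a design choice of this sketch: once every path in the graph has been generated, no new path can be found, so the loop needs some way to stop.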

A. Random algorithm

In this algorithm, test cases are generated randomly from the dental CFG model. The algorithm is purely random and hence does not guarantee that the same paths or diseases will not be revisited by later test cases. This is why coverage is low for this algorithm (Table 5). Coverage is calculated as the ratio of the number of edges, nodes or paths visited by all test cases in the test set to the actual number of edges, nodes or paths in the CFG. Results are shown for 5 test cases. If all 5 test cases visit the same path, the coverage of the whole test set is the same as that of any one of them.
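The coverage calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's code: the toy graph `CFG` and the function names are hypothetical, the graph is assumed to be acyclic, and a test case is represented as a tuple of node names from the start node to a disease.

```python
# Hypothetical toy CFG: node -> successors. Nodes without successors are diseases.
CFG = {
    "start": ["s1", "s2"],
    "s1": ["s3", "d1"],
    "s2": ["d2"],
    "s3": ["d3"],
    "d1": [], "d2": [], "d3": [],
}


def all_paths(cfg, node, prefix=()):
    """Yield every path from node to a leaf (assumes the graph has no cycles)."""
    prefix = prefix + (node,)
    if not cfg[node]:
        yield prefix
    for successor in cfg[node]:
        yield from all_paths(cfg, successor, prefix)


def coverage(cfg, suite, start="start"):
    """Return visited/total ratios for nodes, edges and paths of a test set."""
    total_nodes = set(cfg)
    total_edges = {(u, v) for u, successors in cfg.items() for v in successors}
    total_paths = set(all_paths(cfg, start))
    seen_nodes = {n for path in suite for n in path}
    seen_edges = {edge for path in suite for edge in zip(path, path[1:])}
    seen_paths = set(suite)
    return {
        "node": len(seen_nodes & total_nodes) / len(total_nodes),
        "edge": len(seen_edges & total_edges) / len(total_edges),
        "path": len(seen_paths & total_paths) / len(total_paths),
    }


if __name__ == "__main__":
    one_path = ("start", "s2", "d2")
    print(coverage(CFG, [one_path]))
    print(coverage(CFG, [one_path] * 5))  # identical to the line above
```

Because visited elements are collected in sets, five test cases that follow the same path contribute exactly what one of them contributes, which is the behavior described in the paragraph above.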

TABLE V. RANDOM ALGORITHM RESULTS (TEST SUITE 1)

Test Cases | Path Coverage | Node Coverage | Edge Coverage
5 | 0.09 | 0.15 | 0.15
5 | 0.09 | 0.14 | 0.14
5 | 0.09 | 0.11 | 0.11
5 | 0.09 | 0.07 | 0.07
5 | 0.09 | 0.08 | 0.08
AVG | 0.09 | 0.11 | 0.11

B. Sequential algorithm

In the sequential algorithm, a constraint is added to the random algorithm described above in order to improve coverage: a new test case must not visit a path in the CFG that was already visited by a previous test case. Table 6 shows a sample of the results from this algorithm, which show a significant improvement in coverage relative to the random algorithm.

TABLE VI. SEQUENTIAL ALGORITHM RESULTS

Test Cases | Path Coverage | Node Coverage | Edge Coverage
5 | 0.09 | 0.05 | 0.045
10 | 0.19 | 0.15 | 0.15
15 | 0.29 | 0.21 | 0.21
20 | 0.39 | 0.29 | 0.29
25 | 0.49 | 0.39 | 0.39
30 | 0.58 | 0.53 | 0.53
35 | 0.68 | 0.62 | 0.62
40 | 0.78 | 0.74 | 0.74
45 | 0.88 | 0.85 | 0.86
50 | 0.98 | 0.97 | 0.97

V. CONCLUSION

In this paper we have presented a web-based expert system that assists dental clinics in symptom-disease prediction. After the formal specification of the dental decision support (expert) system was built, a control flow graph was developed from the possible alternatives for symptoms and diseases. The majority of the tasks in this system can be executed automatically, including the generation of test cases from the dental control flow graph. Test case execution and verification are also performed programmatically, based on a set of constraints and algorithms. The algorithms generate test cases from the control flow graph that represent actual test cases or scenarios in the application. The other constraint on automatic test case generation is coverage assessment: new test cases in a test set are generated to find new edges, nodes or paths in the control flow graph and hence increase testing coverage. System evaluation was conducted to assess the accuracy and performance of the process in addition to the coverage. The process shows that formal and graph-based models can be very useful for expert systems, especially where it is necessary to ensure that the expert system does not miss any possible scenario. Coverage assessment based on the graph can be an important indicator for finding unvisited (dead code) paths.

REFERENCES

[1] Valerie Barr. (1998). Application of rule-base coverage measures to expert system Evaluation, AAAI- Journal of Knowledge Based Systems.

[2] Fevzi Belli and Barış Güldal. (2004). Software Testing via Model Checking, Computer and Information Sciences – ISCIS.

[3] Tim Menzies and Bojan Cukic. (2000). Adequacy of Limited Testing for Knowledge Based Systems, International Journal on Artificial Intelligence Tools (IJAIT).

[4] Moataz A. Ahmed and Irman Hermadi. (2008). GA-based multiple paths test data generator, Elsevier Science Ltd. Oxford.

[5] Mary Jean Harrold. (2000). Testing: A Roadmap. In Future of Software Engineering, 22nd International Conference on Software Engineering, June 2000.

[6] Sanjai Rayadurgam and Mats P. E. Heimdahl. (2005). Coverage Based Test-Case Generation using Model Checkers, Kluwer Academic Publishers Hingham, MA

[7] Vasilios Almaliotis, Panagiotis Katsaros, Konstantinos Mokos. (2006). Model Checking for Generation of Test Suites in Software Unit Testing, IEEE

[8] Christian Pfaller, Andreas Fleischmann, Judith Hartmann, Martin Rappl, Sabine Rittmann, Doris Wild. (2006). On the Integration of Design and Test —A Model Based Approach for Embedded Systems, ACM

[9] S. Abu Naser, R. Al-Dahdooh, A. Mushtaha and M. El-Naffar. (2010). Knowledge Management in ESMDA: Expert System for Medical Diagnostic Assistance, AIML Journal.

[10] Antonio Ribeiro Filho. (2008). MDSS, Medical Diagnosis Support System,

[11] Mark Utting. (2007). Position Paper: Model-Based Testing, Springer-Verlag Berlin, Heidelberg

[12] Valerie Barr. (1998). Quantitative Performance Prediction for Rule-Based Expert Systems, American Association for Artificial Intelligence.

[13] Santosh Kumar Swain, Subhendu Kumar Pani, Durga Prasad Mohapatra. (2010). Model Based Object-Oriented Software Testing, Journal of Theoretical and Applied Information Technology (JATIT).

[14] Cu D. Nguyen, Anna Perini and Paolo Tonella. (2008). eCAT: a Tool for Automating Test Cases Generation and Execution in Testing Multi-Agent Systems, AAMAS

[15] R. Jeevarathinam and Antony Selvadoss Thanamani. (2010).Towards Test cases Generation From Software Specification, International Journal of Computer Science and Information Security, IJCSIS, Vol.

[16] Gregoire Hamon, Leonardo de Moura, and John Rushby. (2004). Generating Efficient Test Sets with a Model Checker, IEEE Computer Society Washington, DC.

[17] Maher Lamari. (2007). Towards an Automated Test Generation for the Verification of Model Transformations, ACM.

[18] Dawei Qi, Abhik Roychoudhury, Zhenkai Liang. (2010). Test Generation to Expose Changes in Evolving Programs, ACM.

[19] A. Pretschner, W. Prenninger, S. Wagner, C. Kühnel, M. Baumgartner, B. Sostawa, R. Zölch, T. Stauner. (2005). One Evaluation of Model Based Testing and its Automation, ACM.

[20] Kyungsoo Im, Tacksoo Im, John D. McGregor. (2008). Automating Test Case Definition Using a Domain Specific Language, ACM.

[21] Heiko Stallbaum, Andreas Metzger, Klaus Pohl. (2008). An Automated Technique for Risk-based Test Case Generation and Prioritization, ACM.

[22] D. Jeya Mala, V. Mohan. (2010). Quality Improvement and Optimization of Test Cases – A Hybrid Genetic Algorithm Based Approach, ACM.

[23] M.J. Escalona, J.J. Gutierrez, M. Mejias, G. Aragón, I. Ramos, J. Torres, F.J. Dominguez. (2011). An overview on test generation from functional requirements, Elsevier Inc journal.

[24] Joachim Baumeister. (2010). Advanced empirical testing, Elsevier Inc journal.

[25] Konstantin Rubinov. (2010). Generating Integration Test Cases Automatically, ACM.

[26] M. Sarma, R. Mall. (2008). Automatic generation of test specifications for coverage of system state transitions, Elsevier B.V
