Prioritizing Test Cases for Regression Testing
Sebastian Elbaum, University of Nebraska, Lincoln
Alexey Malishevsky, Oregon State University
Gregg Rothermel, Oregon State University
ISSTA 2000
Defining Prioritization
- Test scheduling during the regression testing stage
- Goal: maximize a criterion (or criteria):
  - Increase rate of fault detection
  - Increase rate of coverage
  - Increase rate of fault-likelihood exposure
Prioritization Requirements
- Definition of a goal, e.g., increase rate of fault detection
- Measurement criterion, e.g., % of faults detected over the life of the test suite
- Prioritization technique, e.g.:
  - Random
  - Total statement coverage
  - Probability of exposing faults
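The coverage-based techniques mentioned here (and studied later as st-total and st-addtl) can be sketched as greedy orderings over per-test statement coverage. A minimal sketch; the coverage data is hypothetical, not from the paper:

```python
def total_coverage_order(coverage):
    """'Total' ordering: sort tests by how many statements each covers."""
    return sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)

def additional_coverage_order(coverage):
    """'Additional' ordering: greedily pick the test that covers the most
    statements not yet covered by earlier picks."""
    remaining = dict(coverage)
    covered = set()
    order = []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

# Hypothetical per-test statement coverage
coverage = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {4, 5}}
total_coverage_order(coverage)       # ["t1", "t2", "t3"]
additional_coverage_order(coverage)  # ["t1", "t3", "t2"]
```

Note how the two orderings diverge: t2 covers more statements than t3 in total, but contributes nothing new once t1 has run, so the "additional" variant demotes it.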
Previous Work
- Goal: increase rate of fault detection
- Measurement: APFD, the weighted average of the percentage of faults detected over the life of the test suite
- Scale: 0-100 (higher means faster detection)
Previous Work (2): Measuring Rate of Fault Detection
[Figure: APFD under three test-case orderings: A-B-C-D-E, C-E-B-A-D, E-D-C-B-A]
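APFD can be sketched in a few lines of Python: with n tests and m faults, APFD = 1 - (TF_1 + ... + TF_m)/(n*m) + 1/(2n), where TF_i is the position of the first test that exposes fault i. The fault-detection matrix below is hypothetical, chosen only to contrast two of the orderings on the slide:

```python
def apfd(order, detects):
    """Weighted Average of the Percentage of Faults Detected.

    order   -- test cases in execution order
    detects -- dict: test case -> set of faults it exposes
    """
    n = len(order)
    m = len(set().union(*detects.values()))
    # TF_i: 1-based position of the first test exposing fault i
    tf = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            tf.setdefault(fault, pos)
    return 1 - sum(tf.values()) / (n * m) + 1 / (2 * n)

# Hypothetical fault-detection data for tests A-E and faults 1-5
detects = {"A": {1, 2}, "B": {3}, "C": {1, 3, 4}, "D": {5}, "E": {2, 4, 5}}
apfd(["A", "B", "C", "D", "E"], detects)  # 0.66
apfd(["C", "E", "B", "A", "D"], detects)  # 0.82
```

Under this data the C-E-B-A-D ordering exposes every fault within two tests, so it earns a higher APFD than the untreated A-B-C-D-E order.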
Previous Work (3): Prioritization Techniques
Summary of Previous Work
- Performed an empirical evaluation of general prioritization techniques
- Even simple techniques generated gains
- Used statement-level techniques
- Still room to improve
Research Questions
- RQ1: Can version-specific test case prioritization (TCP) improve the rate of fault detection?
- RQ2: How does fine technique granularity compare with coarse granularity?
- RQ3: Can the use of fault proneness improve the rate of fault detection?
Addressing the Research Questions
- New family of prioritization techniques
- New series of experiments:
  - Version-specific prioritization, at statement and function level
  - Granularity
  - Contribution of fault proneness
  - Practical implications
Family of Experiments
- 8 programs, 29 versions
- 50 test suites per program, each branch-coverage adequate
- 14 techniques:
  - 2 control techniques (optimal and random)
  - 4 statement level
  - 8 function level
Generic Factorial Design
- Techniques
- Programs (independence of code)
- 50 test suites (independence of suite composition)
- 29 versions (independence of changes)
Experiment 1a: Version Specific (Statement Level)
- RQ1: prioritization works version-specifically at the statement level
- ANOVA: average APFD differs among statement-level techniques
- Bonferroni: St-fep-addtl is significantly better
Experiment 1b: Version Specific (Function Level)
- RQ1: prioritization works version-specifically at the function level
- ANOVA: average APFD differs among function-level techniques
- Bonferroni: Fn-fep is not significantly different from Fn-total
Experiment 2: Granularity
- RQ2: fine granularity has greater prioritization potential
- Statement-level techniques are significantly better than function-level ones
- However, the best function-level techniques beat the worst statement-level ones
Experiment 3: Fault Proneness
- RQ3: incorporating fault likelihood did not significantly increase APFD
- ANOVA: significant differences in average APFD among all function-level techniques
- Bonferroni: surprisingly, techniques using fault likelihood did not rank significantly better
- Reasons:
  - For small changes, fault likelihood does not seem to be worth it
  - We believe it will be worthwhile for larger changes; further exploration required
Function-Level Techniques (Bonferroni groups)

Group  Technique        APFD
A      Fn-fi-fep-addtl  76.34
A B    Fn-fi-fep-total  75.92
A B    Fn-fi-total      75.63
A B    Fn-fep-addtl     75.59
A B    Fn-fep-total     75.48
  B    Fn-total         75.09
C      Fn-fi-addtl      72.62
C      Fn-addtl         71.66
Practical Implications
- APFD:
  - Optimal = 99%
  - Fn-fi-fep-addtl = 98%
  - Fn-total = 93%
  - Random = 84%
- Time:
  - Optimal = 1.3
  - Fn-fi-fep-addtl = 2.0 (+0.7)
  - Fn-total = 11.9 (+10.6)
  - Random = 16.5 (+15.2)
Conclusions
- Version-specific techniques can significantly improve the rate of fault detection during regression testing
- Technique granularity has a noticeable effect:
  - In general, the statement level is more powerful, but
  - advanced function-level techniques beat simple statement-level techniques
- Fault likelihood may not be helpful
Working On
- Controlling the threats: more subjects
- Extending the model
- Discovery of additional factors
- Development of guidelines to choose the best technique
Threats to Validity
- Representativeness of:
  - Program
  - Changes
  - Tests and process
- APFD as a test-efficiency measure
- Tool correctness
Test Suite Avg. Size
FEP Computation
- FEP: probability that a fault causes a failure
- Computed with mutation analysis:
  - Insert mutants
  - Determine how many mutants are exposed by each test case
FEP(t, s) = (# of mutants of s exposed by t) / (# of mutants of s)
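The FEP ratio above is straightforward to compute once mutation analysis has been run. A minimal sketch; the mutant counts and test/statement names are hypothetical:

```python
def fep(kills, mutants):
    """FEP(t, s) for every (test, statement) pair.

    kills   -- dict: (test, stmt) -> number of mutants of stmt that
               the test exposes
    mutants -- dict: stmt -> total number of mutants inserted at stmt
    """
    return {(t, s): k / mutants[s] for (t, s), k in kills.items()}

# Hypothetical mutation-analysis results
mutants = {"s1": 4, "s2": 2}
kills = {("t1", "s1"): 3, ("t1", "s2"): 0,
         ("t2", "s1"): 2, ("t2", "s2"): 2}
fep(kills, mutants)[("t1", "s1")]  # 0.75: t1 exposes 3 of s1's 4 mutants
```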
FI Computation
- FI: fault likelihood
- Associated with measurable software attributes: complexity metrics (size, control flow, and coupling)
- Fault index generated via principal component analysis
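One way to turn several complexity metrics into a single fault index is to standardize them and project onto the first principal component. This is only a rough sketch of that idea, not the paper's exact model; the metric values below are hypothetical:

```python
def fault_index(metrics):
    """Per-function fault index: standardize each complexity metric
    (e.g. size, control flow, coupling), then project the rows onto the
    first principal component, found by power iteration."""
    n, m = len(metrics), len(metrics[0])
    means = [sum(r[j] for r in metrics) / n for j in range(m)]
    stds = [((sum((r[j] - means[j]) ** 2 for r in metrics) / n) ** 0.5) or 1.0
            for j in range(m)]
    X = [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in metrics]
    # covariance matrix of the standardized metrics
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in range(m)]
           for a in range(m)]
    # power iteration for the dominant eigenvector (first principal component)
    v = [1.0] * m
    for _ in range(200):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    if sum(v) < 0:  # the eigenvector's sign is arbitrary; fix it
        v = [-x for x in v]
    return [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]

# Hypothetical [size, control flow, coupling] rows for three functions
fault_index([[10, 1, 2], [20, 3, 4], [40, 9, 8]])
```

With correlated metrics like these, the function whose metrics are uniformly largest receives the highest index, which is the behavior a fault-proneness ranking needs.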
All Techniques (Bonferroni groups)

Group  Technique        APFD
A      Optimal          94.24
B      St-fep-addtl     78.88
C      St-fep-total     76.99
D C    Fn-fi-fep-addtl  76.34
D C    St-total         76.30
D E    Fn-fi-fep-total  75.92
D E    Fn-fi-total      75.63
D E    Fn-fep-addtl     75.59
D E    Fn-fep-total     75.48
F E    Fn-total         75.09
F      St-addtl         74.44
G      Fn-fi-addtl      72.62
G      Fn-addtl         71.66
H      Random           59.73