An Empirical Study on the Adequacy of Testing in Open Source Projects
Pavneet S. Kochhar1, Ferdian Thung1, David Lo1, and Julia Lawall2
1 Singapore Management University, 2 Inria/LIP6, France
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg, [email protected]
Asia-Pacific Software Engineering Conference (APSEC’14)
Open-Source Software, Why Bother?
• A plethora of open-source software is used by many commercial applications
• Large organizations invest time, effort, and money in open-source development
Software Testing, Why Bother?
Functionality -- Requirements
Bugs -- Software reliability
Costs -- Late bugs cost more
Software Testing, Why Bother?
• Horgan and Mathur [1]
  – Adequate testing is critical to developing reliable software
• Tassey [2]
  – Inadequate testing costs the US economy 59 billion dollars annually

[1] J. R. Horgan and A. P. Mathur, "Software testing and reliability," McGraw-Hill, Inc., 1996.
[2] G. Tassey, "The economic impacts of inadequate infrastructure for software testing," National Institute of Standards and Technology, 2002.
Study Goals
• Understand the state of the practice of testing among open-source projects
• Make recommendations to improve the state of the practice

Are open-source projects adequately tested?
Understanding State-of-Practice
• Study a large number of projects
• Check adequacy of testing
  – Execute test cases
  – Assess test adequacy
• Characterize cases of inadequate testing
  – Correlate project metrics with test adequacy
  – At various levels of granularity
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Test Adequacy
• Test adequacy criterion
  – A property that must be satisfied for a test suite to be thorough
  – Often measured by code coverage
• Code coverage
  – Percentage of the code executed by test cases
    • Line coverage
    • Branch coverage
Test Adequacy
CT = number of branches that evaluate to true
CF = number of branches that evaluate to false
B = total number of branches
LC = total number of lines that are executed
EL = total number of lines that are executable

Branch coverage = (CT + CF) / (2 × B)
Line coverage = LC / EL
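The definitions above can be made executable as a rough illustration; the `overall_coverage` combination of the two is an assumption based on Sonar's documented metric, not something stated on the slide:

```python
def line_coverage(lc, el):
    """Line coverage (%): executed lines over executable lines."""
    return 100.0 * lc / el

def branch_coverage(ct, cf, b):
    """Branch coverage (%): each branch contributes a true and a false outcome."""
    return 100.0 * (ct + cf) / (2 * b)

def overall_coverage(ct, cf, lc, b, el):
    # Combined line+branch formula (assumption, following Sonar's metric docs)
    return 100.0 * (ct + cf + lc) / (2 * b + el)

# Example: 10 branches, 8 observed true and 6 observed false;
# 40 of 80 executable lines were run by the test suite.
print(branch_coverage(8, 6, 10))            # 70.0
print(line_coverage(40, 80))                # 50.0
print(overall_coverage(8, 6, 40, 10, 80))   # 54.0
```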
Why Code Coverage?
• Mockus et al. [1]
  – Higher coverage leads to fewer post-release defects
• Berner et al. [2]
  – Judicious use of coverage helps in finding new defects
• Shamasunder [3]
  – Branch and block coverage are correlated with fault detection

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, "Test coverage and post-verification defects: A multiple case study," in ESEM, 2009.
[2] S. Berner, R. Weber, and R. K. Keller, "Enhancing software testing by judicious use of code coverage information," in ICSE, 2007.
[3] S. Shamasunder, "Empirical study - pairwise prediction of fault based on coverage," Master's thesis, 2012.
Source Code Metrics
• Number of lines of code (LOC)
• Cyclomatic complexity (CC)
  – Number of linearly independent paths through the source code
• Number of developers
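McCabe's cyclomatic complexity can be sketched for Python code with the standard `ast` module. This simplified counter (decision points + 1) is illustrative only; production tools, including the one used for the Java projects in this study, handle more constructs:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points.

    A simplification: counts if/for/while/except and boolean operators;
    real analyzers also handle match/switch, comprehension filters, etc.
    """
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp,
                             ast.ExceptHandler)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two extra branch points
            decisions += len(node.values) - 1
    return decisions + 1

snippet = """
def classify(x):
    if x > 0 and x < 100:
        return "small"
    for i in range(3):
        if i == x:
            return "match"
    return "other"
"""
print(cyclomatic_complexity(snippet))  # 5: two ifs, one 'and', one for, +1
```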
Tool Support
• Computes the source code metrics
• Runs test cases
• Computes the overall coverage
• Relies on the Maven directory structure
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Data Collection
• GitHub – the largest site for open-source project development
  – >3,000,000 users and 5,000,000 repositories
• Debian – one of the most popular Linux distributions
Data Collection
• Find projects that use Maven
  – Needed to run Sonar

757 projects + 228 projects → 945 projects (after removing duplicates)
Data Collection
• mvn clean install – compiles the project
• mvn sonar:sonar – runs test cases and gathers statistics

945 projects → 872 projects contain test suites → 327 projects successfully compile, run test cases, and produce coverage
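A minimal sketch of how the two commands above could be driven over many projects; `analyze_project` and the injectable `run` hook are hypothetical illustrations, not the study's actual tooling:

```python
import subprocess

def analyze_project(project_dir, run=subprocess.run):
    """Run the slide's two-step pipeline on one Maven project.

    Returns False when either step fails, mirroring how a project
    would drop out of the 327 that compile, test, and produce coverage.
    `run` is injectable so the flow can be exercised without Maven.
    """
    for cmd in (["mvn", "clean", "install"], ["mvn", "sonar:sonar"]):
        result = run(cmd, cwd=project_dir)
        if result.returncode != 0:
            return False
    return True
```

Processing a corpus is then a loop over project directories, keeping only those where `analyze_project` returns True.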
Data Collection
(Distribution charts: number of lines of code, number of test cases)
Data Collection
(Distribution charts: cyclomatic complexity, number of developers)
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Research Questions
RQ1: What are the coverage levels and test success densities exhibited by different projects?
RQ2: What are the correlations between various software metrics and code coverage at the project level?
RQ3: What are the correlations between various software metrics and code coverage at the source code file level?
Research Questions
RQ1: Coverage Levels & Test Success Densities
RQ1: Coverage
Coverage Level (%)   Number of Projects
0-25                 105
25-50                90
50-75                92
75-100               40

• 40 projects have coverage between 75% and 100%
• Average coverage: 41.96%
• Median coverage: 40.30%

Coverage Level Distribution (chart)
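The table's bands can be reproduced from raw per-project coverage values. The boundary handling (a value of exactly 25 going to the higher band) is an assumption the slide leaves open, and the sample data here is synthetic:

```python
def coverage_histogram(coverages):
    """Count projects per coverage band, mirroring the slide's bins."""
    bands = {"0-25": 0, "25-50": 0, "50-75": 0, "75-100": 0}
    for c in coverages:
        if c < 25:
            bands["0-25"] += 1
        elif c < 50:
            bands["25-50"] += 1
        elif c < 75:
            bands["50-75"] += 1
        else:
            bands["75-100"] += 1
    return bands

# Synthetic coverage values, one per project
print(coverage_histogram([10.0, 41.96, 40.3, 80.5, 62.0]))
# {'0-25': 1, '25-50': 2, '50-75': 1, '75-100': 1}
```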
RQ1: Success Density
• Test success density = passing tests / total tests
• 254 projects have test success density >= 98%
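The density computation and the >= 98% filter can be sketched as follows; the project names and counts are made up for illustration:

```python
def success_density(passing, total):
    """Test success density (%): passing tests over total tests."""
    return 100.0 * passing / total if total else 0.0

# Hypothetical projects: (passing tests, total tests)
projects = {"alpha": (196, 200), "beta": (30, 60), "gamma": (50, 50)}
high = sorted(name for name, (p, t) in projects.items()
              if success_density(p, t) >= 98.0)
print(high)  # ['alpha', 'gamma']
```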
Research Questions
RQ2: Metrics vs. Coverage at Project Level
RQ2: Metrics vs. Coverage (Project)
Lines of Code vs. Coverage
• Spearman's rho = -0.306 (negative correlation)
• p-value = 1.566e-08
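Spearman's rho, used throughout RQ2 and RQ3, is the Pearson correlation computed on ranks. A minimal pure-Python version (the study presumably used a statistics package; p-value computation is omitted here):

```python
def _ranks(values):
    """1-based ranks; ties receive the average rank, as Spearman requires."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Synthetic data: larger projects with strictly lower coverage give rho = -1
print(spearman_rho([1000, 5000, 20000, 80000], [70.0, 55.0, 30.0, 20.0]))
```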
RQ2: Metrics vs. Coverage (Project)
• Spearman's rho = -0.276 (negative correlation)
• p-value = 3.665e-07
Cyclomatic Complexity vs. Coverage
RQ2: Metrics vs. Coverage (Project)
• Spearman's rho = 0.016 (insignificant correlation)
• p-value = 0.763
Number of Developers vs. Coverage
Research Questions
RQ3: Metrics vs. Coverage at File Level
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.180 (small positive correlation)
• p-value < 2.2e-16
Lines of Code vs. Coverage
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.221 (small positive correlation)
• p-value < 2.2e-16
Cyclomatic Complexity vs. Coverage
RQ3: Metrics vs. Coverage (File)
• Spearman's rho = 0.050 (negligible correlation)
• p-value < 2.2e-16
Number of Developers vs. Coverage
Outline
• Motivation and Goals
• Test Adequacy and Code Metrics
• Data Collection
• Empirical Results
• Recommendations
• Related Work
• Conclusion and Future Work
Recommendations
• Practitioners:
  – Need to improve testing efforts, especially for large or complex software projects
  – Need to look into automated test case generation tools
• Researchers:
  – Need to promote new tools that can be easily used by developers
  – Need to develop test case generation tools that can scale to large projects
Threats to Validity
• Internal validity:
  – Sonar might produce incorrect metrics or coverage values
  – Projects may not conform to the Maven directory structure
  – We have performed some manual checks
• External validity:
  – Only analyze 300+ projects from GitHub and Debian
Threats to Validity
• Construct validity:
  – Make use of a standard adequacy criterion
    • Code coverage
  – Make use of standard code metrics
    • Lines of code (LOC)
    • Cyclomatic complexity (CC)
  – Little threat to construct validity
Related Work
• Empirical studies on testing and coverage
  – Mockus et al. study the impact of coverage on the number of post-release defects [1]
  – Shamasunder analyzes the impact of different kinds of coverage on fault detection [2]
  – Gopinath et al. investigate the correlation between coverage and a test suite's effectiveness in killing mutants [3]

[1] A. Mockus, N. Nagappan, and T. T. Dinh-Trong, "Test coverage and post-verification defects: A multiple case study," in ESEM, 2009.
[2] S. Shamasunder, "Empirical study - pairwise prediction of fault based on coverage," Master's thesis, 2012.
[3] R. Gopinath, C. Jensen, and A. Groce, "Code coverage for suite evaluation by developers," in ICSE, 2014.
Related Work
• Test case generation techniques
  – Thummalapenta et al. automatically generate a series of method invocations to produce a target object state [1]
  – Pandita et al. produce test inputs to achieve logical and boundary-value coverage [2]
  – Park et al. combine random testing with static program analysis and concolic execution [3]

[1] S. Thummalapenta et al., "Synthesizing method sequences for high-coverage testing," in OOPSLA, 2011.
[2] R. Pandita et al., "Guided test generation for coverage criteria," in ICSM, 2010.
[3] S. Park et al., "CarFast: Achieving higher statement coverage faster," in FSE, 2012.
Conclusion
• Many open-source projects are poorly tested
  – Only 40/327 projects have high coverage
  – Average coverage: 41.96%
• Coverage is poorer when projects get larger and more complex
• Coverage is better for larger and more complex source code files
• The number of developers is not significantly correlated with coverage
Future Work
• Expand the study to include more projects
  – Address the threats to external validity
• Investigate other software metrics
  – Common cases of poor coverage
• Investigate the amount of effort required to attain a particular level of coverage
  – Cost-effectiveness analysis: effort vs. benefit
Thank you!
Questions? Comments? Advice?
{kochharps.2012,ferdiant.2013,davidlo}@smu.edu.sg
[email protected]