Systematic Architecture Level Fault Diagnosis Using Statistical Techniques
Bachelor Thesis by Fabian Keller
Estimated Costs 2012
as reported by Britton et al. [2013]
11.11.2014 STARDUST - Fabian Keller 2
Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
Fault Diagnosis
what is the current practice?
Goal: pinpoint a single fault or multiple faults
Commonly used techniques:
• System.out.println()
• Symbolic Debugging
• Static Slicing / Dynamic Slicing
There is room for improvement!
Automated Fault Diagnosis
is it possible?
B1 B2 B3 B4 B5 Error
Test1 1 0 0 0 0 0
Test2 1 1 0 0 0 0
Test3 1 1 1 1 1 0
Test4 1 1 1 1 1 0
Test5 1 1 1 1 1 1
Test6 1 1 1 0 1 0
By intuition, a block is more suspicious if:
• it is involved in failing test cases
• it is not involved in passing test cases
Ranking Metrics
… it is possible
Tarantula:
S_T = (#IF / (#IF + #NF)) / (#IF / (#IF + #NF) + #IP / (#IP + #NP))

Jaccard:
S_J = #IF / (#IF + #NF + #IP)

Ochiai:
S_O = #IF / sqrt((#IF + #NF) · (#IF + #IP))

I/N = involved / not involved, F/P = failing / passing
(e.g. #IF = number of failing test cases the block is involved in)
B1 B2 B3 B4 B5 Error
Test1 1 0 0 0 0 0
Test2 1 1 0 0 0 0
Test3 1 1 1 1 1 0
Test4 1 1 1 1 1 0
Test5 1 1 1 1 1 1
Test6 1 1 1 0 1 0
S_T    0.50  0.56  0.63  0.71  0.63
S_J    0.17  0.20  0.25  0.33  0.25
S_O    0.41  0.45  0.50  0.58  0.50

Ranking: 1. B4   2. B3, B5   3. B2   4. B1
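The scores in the table can be reproduced with a short script. A minimal sketch of the computation (the coverage matrix and the three metric formulas are taken from the slides; the ranking is derived from the Ochiai scores):

```python
# Spectrum-based fault localization on the example coverage matrix.
# Each row: coverage of blocks B1..B5 by one test, plus whether the test failed.
import math

coverage = [
    ([1, 0, 0, 0, 0], False),  # Test1
    ([1, 1, 0, 0, 0], False),  # Test2
    ([1, 1, 1, 1, 1], False),  # Test3
    ([1, 1, 1, 1, 1], False),  # Test4
    ([1, 1, 1, 1, 1], True),   # Test5 (failing)
    ([1, 1, 1, 0, 1], False),  # Test6
]

def counts(block):
    """#IF, #NF, #IP, #NP for one block (involved/not involved x failing/passing)."""
    IF = sum(1 for row, failed in coverage if failed and row[block])
    NF = sum(1 for row, failed in coverage if failed and not row[block])
    IP = sum(1 for row, failed in coverage if not failed and row[block])
    NP = sum(1 for row, failed in coverage if not failed and not row[block])
    return IF, NF, IP, NP

def tarantula(IF, NF, IP, NP):
    fail_ratio = IF / (IF + NF)
    pass_ratio = IP / (IP + NP) if IP + NP else 0.0
    return fail_ratio / (fail_ratio + pass_ratio)

def jaccard(IF, NF, IP, NP):
    return IF / (IF + NF + IP)

def ochiai(IF, NF, IP, NP):
    return IF / math.sqrt((IF + NF) * (IF + IP))

scores = {f"B{i + 1}": ochiai(*counts(i)) for i in range(5)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # B4 ranks first: in the failing test, but in the fewest passing ones
```

For B4 this yields S_T ≈ 0.71, S_J ≈ 0.33, S_O ≈ 0.58, matching the table row.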
Commonly Used Data
and its limiting factors
Software-artifact Infrastructure Repository
• Siemens set
• space program
Program         Faulty versions   LOC    Test cases   Description
print_tokens    7                 478    4130         Lexical analyzer
print_tokens2   10                399    4115         Lexical analyzer
replace         32                512    5542         Pattern recognition
schedule        9                 292    2650         Priority scheduler
schedule2       10                301    2710         Priority scheduler
tcas            41                141    1608         Altitude separation
tot_info        23                440    1052         Information measure
space           38                6218   13585        Array definition language
Performance Metrics
how can fault localization performance be evaluated?
• Wasted Effort (WE): the number of non-faulty elements inspected before the fault is reached
  Example ranking: L4, L3, L2, L7, L6, L1, L5, L9, L10, L8
  Wasted Effort (prominent bug): 2 (or 20% of the ranking)
• Proportion of Bugs Localized (PBL): the percentage of bugs localized with WE < p%
• Hit@X: the number of bugs localized after inspecting X ranked elements
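As a minimal sketch of these metrics (the fault position "L2" is an assumption chosen only to reproduce the wasted effort of 2, i.e. 20%, stated above):

```python
# Sketch of the SBFL performance metrics: Wasted Effort, PBL, Hit@X.
# The faulty element "L2" is hypothetical, picked to match the example numbers.

ranking = ["L4", "L3", "L2", "L7", "L6", "L1", "L5", "L9", "L10", "L8"]

def wasted_effort(ranking, faulty):
    """Number of non-faulty elements inspected before the fault is reached."""
    return ranking.index(faulty)

def proportion_localized(results, p):
    """PBL: fraction of bugs whose wasted effort is below p percent of the ranking size.

    results: list of (wasted_effort, ranking_size) pairs, one per bug.
    """
    return sum(1 for we, size in results if 100 * we / size < p) / len(results)

def hit_at_x(results, x):
    """Hit@X: number of bugs whose fault appears within the first x elements."""
    return sum(1 for we, _ in results if we < x)

we = wasted_effort(ranking, "L2")
print(we, 100 * we / len(ranking))  # 2 elements wasted, i.e. 20% of the ranking
```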
AspectJ – Lines of Code
nearly doubled in the examined time span
AspectJ – Commits
active development, typically 50+ commits per month
AspectJ – Bugs
nearly 2500 bugs reported in the examined time span
AspectJ – Data
less than 40% of the investigated bugs are applicable for SBFL
                  AspectJ   AJDT   Sum
All bugs          1544      886    2430
Bugs in iBugs     285       65     350
Classified Bugs   99        11     110
Applicable Bugs   41        1      42
Involved Bugs     20        1      21
What happened?
Bug 36234
workarounds cannot be used as an evaluation oracle
Bug report: "Getting an out of memory error when compiling with Ajc 1.1 RC1 […]"
[Figure: pre-fix vs. post-fix code]
Bug 61411
platform-specific bugs are mostly not present in test suites
Bug report: "[…] highlights a problem that I've seen using ajdoc.bat on Windows […]"
[Figure: pre-fix vs. post-fix code]
Bug 151182
synchronization bugs are mostly not present in test suites
Bug report: "[…] recompiled the aspect using 1.5.2 and tried to run it […], but it fails with a NullPointerException. […]"
[Figure: pre-fix vs. post-fix code]
Research Questions
• RQ1: How does the program size influence fault localization performance?
• RQ2: How many bugs can be found when examining a fixed amount of ranked elements?
• RQ3: How does the program size influence suspiciousness scores produced by different ranking metrics?
• RQ4: Are the fault localization performance metrics currently used by the research community valid?
RQ1: Program Size vs. SBFL Performance?
multiple ranked elements are mapped to the same suspiciousness
RQ4: Are the Performance Metrics Valid?
on average, no bugs can be found in the first 100 lines
RQ4: Are the Performance Metrics Valid?
with luck, 33% of all bugs can be found in the first 1000 lines
Conclusions
there is still some work to be done
• Bugs need more context to be fully understood
• Current metrics cannot be applied to large projects
• SBFL is not feasible for large projects
• New metrics are a starting point for future work
Thank you for your attention!
Questions?
RQ2: examining a fixed amount of ranked elements
inspect more than 100 files to find 50% of all bugs
RQ3: Program Size vs. Suspiciousness
mean suspiciousness drops for larger programs
WAUC: Weighted Area Under Curve