Effective Fault-Localization Techniques for
Concurrent Software
Sangmin Park 08/06/2014
Committee: Rich Vuduc, Mayur Naik, Alex Orso, Milos Prvulovic, Mark Grechanik
(Mary Jean Harrold)
Introduction Background Prior Work User Study Conclusion
Impact of Concurrency Bugs
2
Examples: Northeast Blackout; Facebook IPO glitch
Debugging Concurrency Bugs
3
Concurrency bugs are rated as the most difficult types of bugs
Survey at Microsoft [Godefroid08]
• 72% rated concurrency bugs ‘very hard’ or ‘hard’ to debug
• 83% rated concurrency bugs ‘most severe’ or ‘severe’
What is the hardest bug? #1: Concurrency bugs (40%, 101/255)
StackOverflow
http://bit.ly/sohardest
Debugging Concurrency Bugs
4
Concurrency bugs are difficult to locate, understand, and fix
Difficult to locate:
“Intermittently I get the following error. I would be grateful if anyone could shed any light on this issue.” (* BugID: 27315)
Difficult to locate and understand:
“I’ve noticed and reproduced crashes with the following stack trace. … I have no clues on why this crash occurs.” (* MySQL BugID: 3596)
Difficult to fix:
40% of initial patches to concurrency bugs are themselves buggy, the highest ratio among all software bugs (survey [Yin, FSE11])
Challenges
5
Non-determinism; complex state changes
Debugging Concurrent Programs [McDowell 89]
Debugging Process
6
Software
Testcase
Fault Localization → Fault Understanding → Fault Correction
Debugging Process
7
Software
Testcase
Fault Localization → Fault Understanding → Fault Correction
1. Localize single-variable faults [ICSE 2010] (before proposal)
2. Localize multi-variable faults [ICST 2012] (before proposal)
3. Provide fault explanation [ISSTA 2013] (before proposal)
4. User study (after proposal)
Thesis Statement
8
Dynamic fault-localization techniques can assist developers in locating and
understanding non-deadlock concurrency bugs by identifying
suspicious memory-access patterns and providing calling contexts and methods.
Concurrency Bugs
9
Bug Type | Ratio | Bug Cause
Deadlock | 30% | Mutual exclusion, hold/wait, no preemption, circular wait
Order violation | 22% | Memory-access orders
Atomicity violation | 47% | Memory-access orders
Others | 1% |
* Learning from Mistakes - A Comprehensive Survey on Concurrency Bugs [Lu08].
Order Violation
10
* https://bugzilla.mozilla.org/show_bug.cgi?id=61369
Thread 1:
void init(…) { mThread = CreateThread(); }   // W
Thread 2:
void foo(…) { mState = mThread->State; }     // R
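The same write-before-read dependency can be sketched in Java (a hypothetical reconstruction of the pattern, not the Mozilla code; the class and field names are assumptions). A CountDownLatch enforces the intended W → R order; without the await(), the read can observe the field before it is written:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the order violation above: foo() may read mThread
// before init() has written it. The latch restores the intended order.
class OrderViolationSketch {
    private Thread mThread;                         // shared: written by init(), read by foo()
    private final CountDownLatch initialized = new CountDownLatch(1);

    void init() {
        mThread = new Thread(() -> {});             // W: the write foo() depends on
        initialized.countDown();                    // publish the write
    }

    Thread.State foo() {
        try {
            initialized.await();                    // the fix: wait for init() first
            return mThread.getState();              // R: now guaranteed non-null
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

The latch both blocks the reader and establishes a happens-before edge, so the read sees the initialized reference.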
Atomicity Violation
11
* https://bugzilla.mozilla.org/show_bug.cgi?id=73291
char* str;    // shared vars,
int length;   // locked by L
Thread 1:
lock(L); lptr = str;    unlock(L);   // R
lock(L); llen = length; unlock(L);   // R
Thread 2:
lock(L); str = newStr;       unlock(L);   // W
lock(L); length = newLength; unlock(L);   // W
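In Java the same shape looks like the sketch below (hypothetical names; a reconstruction of the pattern, not the Mozilla code). Each read holds the lock, yet the pair of reads is not atomic, so a writer can slip in between; the fixed reader takes both reads in one critical section:

```java
// Hypothetical sketch of the atomicity violation above: both reads are
// individually locked, but a writer can update str/length between them,
// so lptr and llen can come from different states.
class AtomicityViolationSketch {
    private final Object L = new Object();
    private String str = "abc";
    private int length = 3;

    // Buggy reader: two separately locked reads (R ... gap ... R).
    String[] readUnprotectedPair() {
        String lptr;
        int llen;
        synchronized (L) { lptr = str; }       // R1(str)
        // a writer may run here: W2(str) W2(length)
        synchronized (L) { llen = length; }    // R1(length)
        return new String[] { lptr, String.valueOf(llen) };
    }

    // Fixed reader: one critical section makes the pair atomic.
    String[] readAtomicPair() {
        synchronized (L) {
            return new String[] { str, String.valueOf(length) };
        }
    }

    void write(String s) {
        synchronized (L) { str = s; length = s.length(); }
    }
}
```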
Patterns for Concurrency Bugs
12
Order Violation: R1(x) W2(x)
Atomicity Violation: R1(x) W2(x) W2(y) R1(y)
Patterns for Concurrency Bugs
13
Type | Memory-Access Patterns
Order violation | R1(x) W2(x);  W1(x) R2(x);  W1(x) W2(x)
Atomicity violation (one variable) | R1(x) W2(x) R1(x);  W1(x) W2(x) R1(x);  R1(x) W2(x) W1(x);  W1(x) R2(x) W1(x);  W1(x) W2(x) W1(x)
Atomicity violation (multiple variables) | nine unserializable patterns over two variables x and y (e.g., R1(x) W2(x) W2(y) R1(y))
* The patterns were identified by previous work [Lu06, Vaziri06, Hammer08].
Developed fault-localization techniques for these patterns
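As a concrete illustration (a simplified sketch, not Falcon or Unicorn themselves; the trace representation is an assumption), a detector for one unserializable interleaving, R1(x) W2(x) R1(x), can scan a per-variable access trace:

```java
import java.util.List;

// Minimal sketch of pattern detection: scan a per-variable access trace for
// the unserializable interleaving R1(x) W2(x) R1(x) — two reads by one
// thread split by a write from another thread.
class PatternSketch {
    static class Access {
        final int thread;
        final char op;                     // 'R' or 'W'
        Access(int thread, char op) { this.thread = thread; this.op = op; }
    }

    static boolean hasRwrViolation(List<Access> trace) {
        for (int i = 0; i + 2 < trace.size(); i++) {
            Access a = trace.get(i), b = trace.get(i + 1), c = trace.get(i + 2);
            if (a.op == 'R' && b.op == 'W' && c.op == 'R'
                    && a.thread == c.thread && a.thread != b.thread)
                return true;               // remote write between two local reads
        }
        return false;
    }
}
```

A real tool tracks all pattern shapes per shared variable; this shows only the windowed scan for one of them.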
Prior Work
14
Software
Testcase
Fault Localization → Fault Understanding → Fault Correction
1. Localize single-variable faults [ICSE 2010] (before proposal)
2. Localize multi-variable faults [ICST 2012] (before proposal)
3. Provide fault explanation [ISSTA 2013] (before proposal)
4. User study (after proposal)
Fault Localization for Single-Variable Faults
15
Ranked List for Single-Variable Bugs
1. R-W-R  2. R-W-W  3. R-W-W  4. W-W-W  5. R-W-W  …
Software
Testcase
FALCON
Falcon
Statistical Fault Localization
Dynamic Pattern Detection
Single-Variable Patterns
• Pros: Effective in ranking patterns • Cons: Miss multi-variable faults
(30% of non-deadlock concurrency bugs [Lu 08])
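The statistical step can be sketched as follows. This uses a Jaccard-style suspiciousness formula as an illustration only; the exact metric Falcon uses may differ:

```java
// Illustrative suspiciousness scoring (Jaccard-style; the metric Falcon
// actually uses may differ): a pattern observed mostly in failing runs
// and rarely in passing runs ranks near the top of the list.
class SuspiciousnessSketch {
    // failedWith/passedWith: runs that exhibit the pattern;
    // totalFailed: number of failing runs overall.
    static double suspiciousness(int failedWith, int passedWith, int totalFailed) {
        return (double) failedWith / (totalFailed + passedWith);
    }
}
```

A pattern seen in every failing run and no passing run scores 1.0; one seen equally in both scores much lower, pushing benign interleavings down the ranked list.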
Fault Localization for Multiple-Variable Faults
16
1. R-W-R  2. R-W-W-W  3. R-W-W-R  4. W-W-W-W  5. R-W  …
Ranked List for Single-/Multi-Variable Bugs
UNICORN
Software
Testcase
Unicorn pipeline: Dynamic Pair Detection → (pairs) → Pattern Combination → (patterns) → Statistical Fault Localization
• Pros: Effective in ranking patterns • Cons: Miss contextual information
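Unicorn's combination step can be sketched as chaining detected inter-thread pairs that share an access (a simplification of the actual algorithm; representing pairs as two-element string arrays is an assumption for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of Unicorn's pattern combination: two detected
// inter-thread access pairs are chained into a longer pattern when the
// second access of one pair is the first access of the other.
class CombineSketch {
    // Each pair is {firstAccess, secondAccess}, e.g., {"R1(x)", "W2(x)"}.
    static List<String> combine(List<String[]> pairs) {
        List<String> patterns = new ArrayList<>();
        for (String[] p : pairs)
            for (String[] q : pairs)
                if (p != q && p[1].equals(q[0]))
                    patterns.add(p[0] + "-" + p[1] + "-" + q[1]);
        return patterns;
    }
}
```

Chaining the pairs (R1(x), W2(x)) and (W2(x), R1(x)) yields the three-access pattern R1(x)-W2(x)-R1(x), which the statistical step then ranks.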
Fault Explanation
17
GRIFFIN
Software
Testcase
Bug Graphs (memory accesses +
calling stacks + suspicious methods)
Example bug graph:
Thread 1 (Foo → getS): return s;   // R
Thread 2 (Foo → Bar): a.s = b.s; a.a = b.a; b.s += c.s; b.a += c.a;   // W
Griffin pipeline: Unicorn Fault Localization → (patterns per execution) → Pattern Clustering → (clustered patterns) → Context Reconstruction
Effective in clustering memory accesses and locating the bug at method level
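Griffin's clustering idea can be sketched as grouping ranked patterns that touch the same shared variable (a simplification; Griffin's real clustering and context reconstruction are richer):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of Griffin's pattern clustering: suspicious patterns
// that touch the same shared variable are grouped into one candidate bug
// graph, instead of being reported as isolated list entries.
class ClusterSketch {
    // Each entry is {variableName, patternDescription}.
    static Map<String, List<String>> clusterByVariable(List<String[]> patterns) {
        Map<String, List<String>> clusters = new HashMap<>();
        for (String[] p : patterns)
            clusters.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        return clusters;
    }
}
```

Each cluster is then annotated with calling contexts to form the bug graph shown above.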
18
Usefulness
FALCON
UNICORN GRIFFIN
Goal
19
FALCON
UNICORN GRIFFIN
To determine whether these fault-localization techniques help developers in understanding and
fixing concurrency bugs
3 techniques implemented in Eclipse tools
Debugging Tools
20
Tool | Output | Comments
Tracer (baseline) | Dump of shared-memory accesses from a failing execution | Based on ConcurrencyExplorer (Microsoft); a tool used for debugging*
Unicorn | Ranked list of memory-access patterns | Based on Unicorn; Unicorn subsumes Falcon
Griffin | List of memory accesses with calling context | Based on Griffin
* Other tools (e.g., TIE, Jive, Jove) focus on visualizing thread interactions.
Tool: Tracer
21
Output: Dump of memory accesses
(Screenshot: thread selector; columns for thread, source location, variable, …)
Compared to ConcurrencyExplorer: same output (dump of memory accesses), same layout (tool window + editor)
Tool: Unicorn
23
Output: Ranked list of memory-access patterns
R-W-W pattern
Compared to Tracer: adds memory-access patterns; omits per-access thread identifiers
Tool: Griffin
25
Output: List of memory accesses with calling context
(Screenshot: interleaving view, annotated with clustered accesses, suspicious methods, and calling context)
Compared to Unicorn: adds clustered memory accesses, suspicious methods, and calling context
Hypotheses
• H1 (understanding): participants using Unicorn will understand concurrency bugs better than participants using Tracer
  ✦ Unicorn provides a summary of the bug
• H2 (understanding): participants using Griffin will understand concurrency bugs better than participants using Unicorn or Tracer
  ✦ Griffin provides more context information
• H3 (fix): participants using Unicorn and Griffin will fix concurrency bugs better than participants using Tracer
  ✦ Better understanding leads to better fixes
26
Study Setup
27
3 Subject Programs
32 Developers
Protocol
Study Setup
28
3 Java Programs: - Bank Account (100 LoC) - Shop (300 LoC) - List (25 KLoC)
Subject 1: Bank Account
29
(Figure: Users 1 and 2 each start from a shared $100 balance and concurrently run Deposit $300, Withdraw $100, and Transfer $100; depending on the interleaving, the final balance is inconsistent, e.g., $200 vs. $400.)
• Size: 100 LoC • Difficulty: Easy
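The underlying race is the classic lost update on `balance += amount`, sketched below (hypothetical code, not the actual study subject):

```java
// Sketch of the Bank Account race: balance += amount is a read-modify-write,
// so two unsynchronized threads can interleave and lose an update.
// (Hypothetical code, not the study's subject program.)
class Account {
    private int balance = 100;

    void deposit(int amount) { balance += amount; }                   // racy
    synchronized void depositSafe(int amount) { balance += amount; }  // fixed

    int balance() { return balance; }
}
```

With deposit(), two threads can both read $100, both add, and one update is lost; synchronizing the method serializes the read-modify-write.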
Subject 2: Shop
30
(Figure: multiple Customer threads call GetItem and a Supplier thread calls PutItem on a shared Shop.)
Bug: the program crashes with an exception in Shop.
• Size: 300 LoC • Difficulty: Medium
Subject 3: List
31
Initially, three synchronized lists A, B, C are created. Threads repeatedly call B.add(item) and C.add(item), while other threads call B.clear() and A.addAll(B); under the failing interleaving, the resulting list A contains a null element.
• Size: 25 KLoC • Difficulty: Hard
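The bug class behind this subject can be sketched (hypothetical code, not the study's program): with Collections.synchronizedList, each individual call is atomic, but a composite operation is not, and the documented fix is to synchronize on the list across the whole composite step:

```java
import java.util.List;

// Sketch of the bug class behind the List subject (hypothetical code): on a
// Collections.synchronizedList, isEmpty() and get(0) are each atomic, but a
// concurrent clear() can interleave between them, turning the composite
// check-then-act into a crash or a stale read. Holding the list's own lock
// across the composite operation (the documented idiom) fixes it.
class ListSketch {
    static String firstOrDefault(List<String> b, String fallback) {
        synchronized (b) {                 // fix: make the composite atomic
            return b.isEmpty() ? fallback : b.get(0);
        }
    }
}
```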
Study Setup
32
32 Developers: - Graduate students - Development experience (2–30 years, median 11) - Concurrency experience (7 beginners, 10 experts)
Study Design
33
Factorial design: three tools (T1–T3) crossed with three subjects (S1–S3); each participant received one of six orderings:
1) S1-T1, S2-T2, S3-T3
2) S1-T1, S2-T3, S3-T2
3) S1-T2, S2-T1, S3-T3
4) S1-T2, S2-T3, S3-T1
5) S1-T3, S2-T1, S3-T2
6) S1-T3, S2-T2, S3-T1
Study Setup
34
Protocol: - 1 hr 30 min = 20 min tutorial + 3 × 20 min tasks + 10 min buffer - Task = debug + survey - 5 surveys
Surveys
35
Background: • Programming experience • Concurrency experience
For each task: • Usefulness • Understanding • Fix
Surveys
38
Final: • Rank of the tools • General feedback
Evaluation scores (1-to-5 scale):
• Usefulness • Understanding: graded • Fix: ranking-based
Hypothesis testing:
• For each task, we performed an unpaired t-test between the users of different tools
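The unpaired t-test can be sketched as computing Welch's t statistic over two tool groups' scores (statistic only; the p-value lookup against the t distribution is omitted, and the study's exact test variant may differ):

```java
// Sketch of the per-task unpaired comparison: Welch's t statistic over two
// independent samples of scores. (Illustrative only; the study's exact test
// variant and p-value computation are not shown here.)
class TTestSketch {
    static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    static double sampleVariance(double[] xs) {
        double m = mean(xs), s = 0;
        for (double x : xs) s += (x - m) * (x - m);
        return s / (xs.length - 1);
    }

    // Positive t: group a scored higher on average than group b.
    static double welchT(double[] a, double[] b) {
        return (mean(a) - mean(b))
             / Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
    }
}
```

The statistic is then compared against the t distribution (with Welch's degrees of freedom) to obtain the p-values behind the significance markers in the result tables.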
Overall Result
39
Score Type | Hypothesis | Task 1: Bank Account | Task 2: Shop | Task 3: List
Usefulness | Griffin > Tracer | 0.67 | 2.53 | 2.17
Usefulness | Griffin > Unicorn | -0.14 | 0.31 | 1.44
Usefulness | Unicorn > Tracer | 0.81 | 2.22 | 0.72
Understanding | Griffin > Tracer | 0.96 | 0.18 | 0.98
Understanding | Griffin > Unicorn | 0.62 | 0.07 | 1.11
Understanding | Unicorn > Tracer | -0.07 | 0.11 | -0.13
Fix | Griffin > Tracer | 0.24 | 0.64 | 0.42
Fix | Griffin > Unicorn | 0.51 | 0.09 | 1.11
Fix | Unicorn > Tracer | -0.29 | 0.56 | -0.69
* Numbers = mean difference (-4 to 4); bold = statistically significant (p < 0.05).
Hypothesis Testing
40
H1 (understanding): Unicorn > Tracer
Hypothesis Testing
41
H2 (understanding): Griffin > Unicorn, Tracer
Hypothesis Testing
42
H3 (fix): Unicorn, Griffin > Tracer
• H1 (understanding): Unicorn > Tracer
• H2 (understanding): Griffin > Unicorn, Tracer
• H3 (fix): Unicorn, Griffin > Tracer
Hypothesis Testing
43
Analysis by Tool Preference
• How many participants rated Griffin as the best tool?
• Did these participants actually understand the bugs better?
44
Results by Tool Preference
45
Task | Score Type | Group-T (2) | Group-U (7) | Group-G (21)
Task 1: Bank Account | Understanding | 3.0 | 3.75 | 3.78
Task 1: Bank Account | Fix | 2.0 | 2.37 | 3.05
Task 2: Shop | Understanding | 3.33 | 4.12 | 4.26
Task 2: Shop | Fix | 2.33 | 3.75 | 4.0
Task 3: List | Understanding | 2.66 | 2.75 | 3.05
Task 3: List | Fix | 1.33 | 2.87 | 2.68
* Numbers in headers = # participants; numbers in other cells = average scores
• How many participants rated Griffin as the best tool? 21 (70%)
• Did these participants actually understand the bugs better? Yes
Discussion: Tool Usage
46
(Figure: participants used Griffin to track down the bug and Tracer to confirm it.)
“There are three dimensions to think about: Time vs. Thread vs. Context. Griffin showed these quite effectively. However, the other two tools lacked in these aspects.”
•“Tracer might be useful for simple code. However, overall it won’t scale in real life scenarios because most programs are complex.”
•“Tracer wasn’t very useful on this task because there were too many threads and instructions to keep track of.”
Discussion: Improvements
47
Suggested improvements: fix advice, interactive debugging, visual improvements
Future Work
48
Software
Testcase
Fault Localization → Fault Understanding → Fault Correction
- Increase bug coverage
- Reduce overhead
- Improve visualization
- Support interactive debugging
- Use multiple inputs
- Provide fix advice
Publicly Available Data
49
Data Location
Unicorn http://www.cc.gatech.edu/~sangminp/unicorn
Griffin http://www.cc.gatech.edu/~sangminp/griffin
Subject Programs http://www.cc.gatech.edu/~sangminp/bugs
User Study http://www.cc.gatech.edu/~sangminp/concurrency-study
Contributions
50
Backup Slides
Why did you implement Tracer as an Eclipse plugin?
• To minimize the effect of UI:
• Same IDE: Eclipse
• Similar UI for all debuggers: similar colors, list view
• Language difference: ConcurrencyExplorer targets C# in Visual Studio; our subjects are Java
52
Concurrency Explorer
53
(Screenshot: shared-memory dump window and source editor windows)
ConcurrencyExplorer vs Tracer
54
 | ConcurrencyExplorer | Tracer
Output | Memory dump (source line, thread, object ID) | Memory dump (source line, thread)
UI elements | Window for dump; editor for source | Window for dump; editor for source
IDE | Visual Studio | Eclipse
* ConcurrencyExplorer doesn’t show values of variables.
Tools for Concurrent S/W
55
Jive and Jove
• Link: http://cs.brown.edu/~spr/research/visjove.html
• Show thread interactions
• Not focused on showing bugs
56
Tools for Concurrent S/W
TIE
• Link: https://www.youtube.com/watch?v=kbNXlLAkPgU
• Shows thread interactions
• Not focused on showing bugs
57
Tools for Concurrent S/W
Concurrency Visualizer (for Performance): The snapshot shows inter-thread dependencies.
Tutorial
• Tutorial on Java Concurrency
• Bugs: Order/Atomicity Violations
• Fix Strategies
• Example Program
• Demo on Debugging Tools
58
Survey Links
• Link (Background): https://docs.google.com/forms/d/1xthnR5Ibw8q1qrqn-WrBti5zjFVD1b-nYZZ1S4RurMM/viewform
• Link (Task): https://docs.google.com/forms/d/1SNlg4anVAZmR99EZjErvwG0rnW2nLrLYpwqZjNfQ-yc/viewform
• Link (Final): https://docs.google.com/forms/d/1L3_Intjm6oSwoZp3wWfHIv8Z2jcpNcbbn1nPeuiv1b8/viewform
59
Fix Strategies
60
Study Design
61
Factorial design: three tools (T1–T3) crossed with three subjects (S1–S3), six orderings:
1) S1-T1, S2-T2, S3-T3
2) S1-T1, S2-T3, S3-T2
3) S1-T2, S2-T1, S3-T3
4) S1-T2, S2-T3, S3-T1
5) S1-T3, S2-T1, S3-T2
6) S1-T3, S2-T2, S3-T1
(Table: beginner/expert counts per ordering group.)
• Setup: random distribution of participants
• Results: no significant score differences between groups
Factorial Design
62
https://explorable.com/factorial-design
• “Factorial designs are extremely useful to psychologists and field scientists as a preliminary study, allowing them to judge whether there is a link between variables, whilst reducing the possibility of experimental error and confounding variables.”
• “The main disadvantage is the difficulty of experimenting with more than two factors, or many levels.”
Eclipse Navigation Data
63
Users | Task 1: Bank Account | Task 2: Shop | Task 3: List
Tracer users | 54.5 | 46.11 | 75.1
Unicorn users | 63.4 | 60.6 | 62.33
Griffin users | 59.11 | 69 | 39.11
• Numbers = average navigation actions (clicks + keystrokes)
• For Task 3, Griffin users navigated less, but the result is not statistically significant.
Why is fixing more difficult?
• Many strategies:
  • Adding a lock
  • Adding a condition (if, while)
  • Switching statements, …
• Many decisions within one strategy:
  • Where should we add a lock?
  • Should we use an existing lock or add a new one?
• A fix can become a bug (e.g., adding a new lock can introduce a deadlock)
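The deadlock risk of "adding a lock" can be sketched: if two fixes acquire the same two locks in opposite orders, they can deadlock. Imposing one global lock order avoids it; ordering by identity hash is a common heuristic (with a tie caveat), and the code below is a hypothetical illustration, not a fix from the study:

```java
// Hypothetical sketch: why a new lock can introduce a deadlock, and a common
// remedy. If transfer(a, b, ...) and transfer(b, a, ...) each locked "from"
// then "to", the two calls could deadlock. Ordering both acquisitions by a
// single global criterion (identity hash; ties need a fallback lock) makes
// the order consistent across all threads.
class TransferSketch {
    static void transfer(Object from, Object to, Runnable move) {
        Object first = System.identityHashCode(from) < System.identityHashCode(to) ? from : to;
        Object second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                move.run();            // the critical section, under both locks
            }
        }
    }
}
```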
Limitations
• Participants: sample size and quality
• Factorial design
• Debugging only, no code editing
65
Related Work
• Empirical studies of sequential bugs/debuggers
  • Weiser; Whyline; Parnin & Orso
• Empirical studies of concurrency
  • For writing faster code
  • For education
• Empirical studies of concurrency bugs/debuggers
  • Sadowski and Yi’s study