A Randomized Algorithm for Concurrency Testing Madan Musuvathi Research in Software Engineering Microsoft Research

Page 1: A Randomized Algorithm for Concurrency Testing

A Randomized Algorithm for Concurrency Testing

Madan Musuvathi
Research in Software Engineering
Microsoft Research

Page 2: A Randomized Algorithm for Concurrency Testing

The Concurrency Testing Problem

A closed program = program + test harness
    Test harness encodes both the concurrency scenario and the inputs
    The only nondeterminism is the thread interleavings

Page 3: A Randomized Algorithm for Concurrency Testing

Verification vs Testing

Verification:
    Prove that the program is correct (free of bugs)
    With the minimum amount of resources
Testing: ??

Page 4: A Randomized Algorithm for Concurrency Testing

Verification vs Testing

Verification:
    Prove that the program is correct (free of bugs)
    With the minimum amount of resources
Testing:
    Given a certain amount of resources
    How close to a proof can you get?
    Maximize the number of bugs that you can find
In the limit: Verification == Testing

Page 5: A Randomized Algorithm for Concurrency Testing

Testing is more important than Verification

Undecidability argument
    There will always be programs large enough and properties complex enough that verification cannot be done
Economic argument
    If the cost of a bug is less than the cost of finding the bug (or proving its absence), you are better off shipping buggy software
Engineering argument
    Make software only as reliable as the weakest link in the entire system

Page 6: A Randomized Algorithm for Concurrency Testing

Providing Probabilistic Guarantees

Problem we would like to solve:
    Given a program, prove that it does not do something wrong with probability > 95%
Problem we can hope to solve:
    Given a program that contains a bug, design a testing algorithm that finds the bug with probability > 95%
    Prove optimality: no testing algorithm can do better

Page 7: A Randomized Algorithm for Concurrency Testing

Cuzz: Concurrency Fuzzing

Disciplined randomization of schedules
Probabilistic guarantees
    Every run finds a bug with some (reasonably large) probability
    Repeat runs to increase the chance of finding a bug
Scalable
    In the no. of threads and program size
Effective
    Bugs in IE, Firefox, Office Communicator, Outlook, …
    Bugs found in the first few runs

Page 8: A Randomized Algorithm for Concurrency Testing

Cuzz Demo

Page 9: A Randomized Algorithm for Concurrency Testing

Problem Formulation

P is a class of programs
B is a class of bugs
Given P and B, you design a testing algorithm T
Given T, the adversary picks a program p in P containing a bug b in B
Given p, T generates an input in constant time
Prove that T finds b in p with a probability X(p,B)

Page 10: A Randomized Algorithm for Concurrency Testing

In our case

P is a class of closed terminating concurrent programs
B is a class of bugs
Given P and B, you design a testing algorithm T
Given T, the adversary picks a program p in P containing a bug b in B
Given p, T generates an interleaving in constant time
Prove that T finds b in p with a probability X(p,B)

Page 11: A Randomized Algorithm for Concurrency Testing

Useful parameters

For a closed terminating concurrent program
(Fancy way of saying, a program combined with a concurrency test)
    n : maximum number of threads
    k : maximum number of instructions executed

Page 12: A Randomized Algorithm for Concurrency Testing

What is a “Bug” – first attempt

Bug is defined as a particular buggy interleaving
No algorithm can find the bug with a probability greater than 1/n^k

[Figure: n threads (~ tens), k instructions (~ millions), n^k schedules]

Page 13: A Randomized Algorithm for Concurrency Testing

A Deterministic Algorithm

Provides no guarantees

[Figure: n threads (~ tens), k instructions (~ millions), n^k schedules]

Page 14: A Randomized Algorithm for Concurrency Testing

Randomized Algorithm

Samples the schedule space with some probability distribution
Adversary picks the schedule that is the least probable
Probability of finding the bug <= 1/n^k

[Figure: n threads (~ tens), k instructions (~ millions), n^k schedules]

Page 15: A Randomized Algorithm for Concurrency Testing

Randomized Algorithm

1/n^k is a mighty small number
Hard to design algorithms that find the bug with probability == 1/n^k

[Figure: n threads (~ tens), k instructions (~ millions), n^k schedules]

Page 16: A Randomized Algorithm for Concurrency Testing

A Good Research Trick

When you can't solve a problem, change the problem definition

Page 17: A Randomized Algorithm for Concurrency Testing

Bugs are not adversarial

Usually, if there is one interleaving that finds the bug, there are many interleavings that find the same bug
    This is not true for program inputs
The set of interleavings that find the bug shares the same root cause
The root causes of real bugs are not complicated
    Smart people make stupid mistakes

Page 18: A Randomized Algorithm for Concurrency Testing

Classifying Bugs

Classify concurrency bugs based on a suitable “depth” metric
Adversary can choose any bug, but only within a given depth
Testing algorithm provides better guarantees for bugs with a smaller depth
    Even if worst-case probability is less than 1/n^k
We want real bugs to have small depth
We want to be able to design effective sampling algorithms for finding bugs of a particular depth

Page 19: A Randomized Algorithm for Concurrency Testing

Our Bug Depth Definition

Bug Depth = number of ordering constraints sufficient to find the bug
Best explained through examples

Page 20: A Randomized Algorithm for Concurrency Testing

A Bug of Depth 1

Bug Depth = no. of ordering constraints sufficient to find the bug

Parent
    A: …
    B: fork (child);
    C: p = malloc();
    D: …
    E: …

Child
    F: …
    G: do_init();
    H: p->f ++;
    I: …
    J: …

Possible schedules
    A B C D E F G H I J
    A B F G H C D E I J
    A B F G C D E H I J
    A B F G C H D E I J
    A B F G H I J C D E
    …
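
The schedule list above can be reproduced by brute force. The following is a minimal sketch, assuming the child's steps F to J become runnable only after the fork at B and that the bug is exactly the ordering constraint (H, C), that is, the child dereferences p before the parent allocates it. The helper names `interleavings` and `satisfies` are illustrative and do not come from the slides.

```python
from itertools import combinations

PARENT_REST = ["C", "D", "E"]        # parent steps after the fork at B
CHILD = ["F", "G", "H", "I", "J"]    # child steps, runnable only after B


def interleavings(left, right):
    """Yield every merge of two sequences that keeps each one's own order."""
    total = len(left) + len(right)
    for left_slots in combinations(range(total), len(left)):
        slots = set(left_slots)
        li, ri = iter(left), iter(right)
        yield [next(li) if i in slots else next(ri) for i in range(total)]


def satisfies(schedule, constraint):
    """True if instruction a occurs before instruction b in the schedule."""
    a, b = constraint
    return schedule.index(a) < schedule.index(b)


# Every schedule starts with A B, followed by some merge of the two threads.
schedules = [["A", "B"] + rest for rest in interleavings(PARENT_REST, CHILD)]

# Assumed bug condition: H (p->f++) runs before C (p = malloc()).
buggy = [s for s in schedules if satisfies(s, ("H", "C"))]

print(len(schedules), "schedules,", len(buggy), "satisfy (H, C)")
```

A single constraint is enough to single out the buggy schedules here, which is what makes this a depth-1 bug.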

Page 21: A Randomized Algorithm for Concurrency Testing

A Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Parent
    A: …
    B: p = malloc();
    C: fork (child);
    D: …
    E: if (p != NULL)
    F:     p->f ++;
    G:

Child
    H: …
    I: p = NULL;
    J: …

Possible schedules
    A B C D E F G H I J
    A B C D E H I J F G
    A B C H I D E G J
    A B C D H E F I J G
    A B C H D E I J F G
    …

Page 22: A Randomized Algorithm for Concurrency Testing

Another Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Parent
    A: …
    B: Lock (A);
    C: …
    D: Lock (B);
    E: …

Child
    F: …
    G: Lock (B);
    H: …
    I: Lock (A);
    J: …

Page 23: A Randomized Algorithm for Concurrency Testing

Hypothesis

Most concurrency bugs in practice have a very small depth
What has been empirically validated:
    There are lots of bugs of small depths in real programs

Page 24: A Randomized Algorithm for Concurrency Testing

Defining a Bug

A schedule is a sequence of (dynamic) instructions
S = set of schedules of a closed program
A concurrency bug B is a strict subset of S

Page 25: A Randomized Algorithm for Concurrency Testing

Ordering Constraints

A schedule satisfies an ordering constraint (a,b) if instruction a occurs before instruction b in the schedule

Parent
    A: …
    B: fork (child);
    C: p = malloc();
    D: …
    E: …

Child
    F: …
    G: do_init();
    H: p->f ++;
    I: …
    J: …

A B F G H C D E I J
    Satisfies (H, C)

Page 26: A Randomized Algorithm for Concurrency Testing

Depth of a Bug

S(c1, c2, …, cn) = set of schedules that satisfy the ordering constraints c1, c2, …, cn
A bug B is of depth ≤ d if there exist constraints c1, c2, …, cd such that S(c1, c2, …, cd) ⊆ B

Page 27: A Randomized Algorithm for Concurrency Testing

A Bug of Depth 1

Bug Depth = no. of ordering constraints sufficient to find the bug

Parent
    A: …
    B: fork (child);
    C: p = malloc();
    D: …
    E: …

Child
    F: …
    G: do_init();
    H: p->f ++;
    I: …
    J: …

Possible schedules
    A B C D E F G H I J
    A B F G H C D E I J
    A B F G C D E H I J
    A B F G C H D E I J
    A B F G H I J C D E
    …

Page 28: A Randomized Algorithm for Concurrency Testing

What is the Depth of this Bug

Parent
    A: …
    B: p = malloc();
    C: fork (child);
    D: allocated = 1
    E: p = null;

Child
    F: …
    G: if(allocated)
    H:     p->f++;
    I: …
    J: …

Any buggy interleaving satisfies (D, G) && (E, H)
Bug depth <= 2

Page 29: A Randomized Algorithm for Concurrency Testing

What is the Depth of this Bug

Parent
    A: …
    B: p = malloc();
    C: fork (child);
    D: allocated = 1
    E: p = null;

Child
    F: …
    G: if(allocated)
    H:     p->f++;
    I: …
    J: …

Any interleaving that satisfies (E,G) is buggy
Bug depth == 1
Even though there are buggy interleavings that don’t satisfy (E,G)
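
Both claims on this slide can be checked by enumeration. The sketch below assumes that a schedule is buggy exactly when the child sees allocated == 1 at G and dereferences p at H after the parent has nulled it at E. That reading is an assumption drawn from the slide's constraints (D, G) and (E, H), and the helper names are illustrative.

```python
from itertools import combinations

PARENT_REST = ["D", "E"]             # parent steps after the fork at C
CHILD = ["F", "G", "H", "I", "J"]    # child steps, runnable only after C


def interleavings(left, right):
    """Yield every merge of two sequences that keeps each one's own order."""
    total = len(left) + len(right)
    for left_slots in combinations(range(total), len(left)):
        slots = set(left_slots)
        li, ri = iter(left), iter(right)
        yield [next(li) if i in slots else next(ri) for i in range(total)]


def before(schedule, a, b):
    """True if the schedule satisfies the ordering constraint (a, b)."""
    return schedule.index(a) < schedule.index(b)


def is_buggy(schedule):
    """Assumed bug: G reads allocated == 1 and H dereferences a null p."""
    return before(schedule, "D", "G") and before(schedule, "E", "H")


schedules = [["A", "B", "C"] + rest
             for rest in interleavings(PARENT_REST, CHILD)]

# S((E, G)) is a subset of the bug, so the bug has depth 1.
depth_one = all(is_buggy(s) for s in schedules if before(s, "E", "G"))

# Some buggy schedules still fall outside S((E, G)).
outside = [s for s in schedules if is_buggy(s) and not before(s, "E", "G")]

print(depth_one, len(outside))
```

The depth definition asks only for a sufficient set of constraints, so it does not matter that (E, G) misses some buggy schedules.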

Page 30: A Randomized Algorithm for Concurrency Testing

Let's look at the complicated bug

void AddToCache() {
    // ...
    A: x &= ~(FLAG_NOT_DELETED);
    B: x |= FLAG_CACHED;
    MemoryBarrier();
    // ...
}

AddToCache();
assert( x & FLAG_CACHED );

Page 31: A Randomized Algorithm for Concurrency Testing

The bit operations are not atomic

void AddToCache() {
    A1: t = x & ~(FLAG_NOT_DELETED);
    A2: x = t;
    B1: u = x | FLAG_CACHED;
    B2: x = u;
}

AddToCache();
assert( x & FLAG_CACHED );

Page 32: A Randomized Algorithm for Concurrency Testing

The bug

void AddToCache() {
    A1: t = x & ~(FLAG_NOT_DELETED);
    A2: x = t;
    B1: u = x | FLAG_CACHED;
    B2: x = u;
}

AddToCache();
assert( x & FLAG_CACHED );

void AddToCache() {
    A1: t = x & ~(FLAG_NOT_DELETED);
    A2: x = t;
    B1: u = x | FLAG_CACHED;
    B2: x = u;
}

AddToCache();
assert( x & FLAG_CACHED );
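
To see how the assert can fail, the four steps can be simulated under every interleaving. This is a sketch under assumptions that are not on the slide: two threads each call AddToCache() and then assert, t and u are thread-local, the flag bit values are made up, and x starts with FLAG_NOT_DELETED set.

```python
from itertools import combinations

FLAG_NOT_DELETED = 0x1   # hypothetical bit values, not from the slides
FLAG_CACHED = 0x2
STEPS = ["A1", "A2", "B1", "B2", "ASSERT"]


def assert_fails(schedule):
    """Run one interleaving (a list of thread ids); True if an assert fails."""
    x = FLAG_NOT_DELETED                        # assumed initial value
    regs = [{"t": 0, "u": 0}, {"t": 0, "u": 0}]  # thread-local temporaries
    pc = [0, 0]
    for tid in schedule:
        step = STEPS[pc[tid]]
        if step == "A1":
            regs[tid]["t"] = x & ~FLAG_NOT_DELETED
        elif step == "A2":
            x = regs[tid]["t"]
        elif step == "B1":
            regs[tid]["u"] = x | FLAG_CACHED
        elif step == "B2":
            x = regs[tid]["u"]
        elif not (x & FLAG_CACHED):              # the ASSERT step
            return True
        pc[tid] += 1
    return False


def all_schedules():
    """Yield every interleaving of the two threads' five steps."""
    n = len(STEPS)
    for slots in combinations(range(2 * n), n):
        chosen = set(slots)
        yield [0 if i in chosen else 1 for i in range(2 * n)]


failing = [s for s in all_schedules() if assert_fails(s)]
print(len(failing), "failing interleavings, for example", failing[0])
```

In a failing run, one thread reads x at A1, the other thread completes all of A1 to B2, and the first thread's stale write at A2 then clears FLAG_CACHED before the second thread's assert.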

Page 33: A Randomized Algorithm for Concurrency Testing

Cuzz Guarantee

Given a program that creates at most n threads and executes at most k instructions
Cuzz finds every bug of depth d with probability at least 1/(n·k^(d-1)) in every run of the program
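
As a rough worked example of what this bound buys, the sketch below computes the per-run bound and the number of runs needed to reach 95% overall, assuming the runs are independent. The function names and the sample values of n, k, and d are illustrative only.

```python
import math


def worst_case_bound(n, k, d):
    """Per-run lower bound 1 / (n * k^(d-1)) quoted on the slides."""
    return 1.0 / (n * k ** (d - 1))


def runs_needed(p, confidence=0.95):
    """Smallest r such that 1 - (1 - p)^r >= confidence, for 0 < p <= 1."""
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Illustrative parameter choices, not measurements from the talk.
for n, k, d in [(2, 1000, 1), (10, 1000, 2)]:
    p = worst_case_bound(n, k, d)
    print(f"n={n} k={k} d={d}: bound={p:g}, runs for 95%={runs_needed(p)}")
```

For a depth-1 bug the bound does not depend on k at all, which is why small-depth bugs are cheap to find even in long executions.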

Page 34: A Randomized Algorithm for Concurrency Testing

A Bug of Depth 1

Bug Depth = no. of ordering constraints sufficient to find the bug

Probability of bug >= 1/n
    n: no. of threads (~ tens)

Parent
    A: …
    B: fork (child);
    C: p = malloc();
    D: …
    E: …

Child
    F: …
    G: do_init();
    H: p->f ++;
    I: …
    J: …

Possible schedules
    A B C D E F G H I J
    A B F G H C D E I J
    A B F G C D E H I J
    A B F G C H D E I J
    A B F G H I J C D E
    …

Page 35: A Randomized Algorithm for Concurrency Testing

A Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Probability of bug >= 1/(n·k)
    n: no. of threads (~ tens)
    k: no. of instructions (~ millions)

Parent
    A: …
    B: p = malloc();
    C: fork (child);
    D: …
    E: if (p != NULL)
    F:     p->f ++;
    G:

Child
    H: …
    I: p = NULL;
    J: …

Possible schedules
    A B C D E F G H I J
    A B C D E H I J F G
    A B C H I D E G J
    A B C D H E F I J G
    A B C H D E I J F G
    …

Page 36: A Randomized Algorithm for Concurrency Testing

Another Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Probability of bug >= 1/(n·k)
    n: no. of threads (~ tens)
    k: no. of instructions (~ millions)

Parent
    A: …
    B: Lock (A);
    C: …
    D: Lock (B);
    E: …

Child
    F: …
    G: Lock (B);
    H: …
    I: Lock (A);
    J: …

Page 37: A Randomized Algorithm for Concurrency Testing

Cuzz Algorithm

Inputs:
    n: estimated bound on the number of threads
    k: estimated bound on the number of steps
    d: target bug depth

    // 1. assign random priorities >= d to threads
    for t in [1…n] do
        priority[t] = rand() + d;

    // 2. choose d-1 lowering points at random
    for i in [1...d) do
        lowering[i] = rand() % k;

    steps = 0;
    while (some thread enabled) {
        // 3. Honor thread priorities
        Let t be the highest-priority enabled thread;
        schedule t for one step;
        steps ++;

        // 4. At the ith lowering point, set the priority to i
        if steps == lowering[i] for some i
            priority[t] = i;
    }
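
The pseudocode can be turned into a small runnable sketch. It makes several simplifying assumptions that are not on the slide: each thread is a plain list of step labels and is enabled until it runs out of steps (no fork, locks, or blocking), initial priorities are the distinct values d to d+n-1 in random order, and lowering points are drawn from 1 to k so that they line up with the step counter. The name `cuzz_schedule` is illustrative.

```python
import random


def cuzz_schedule(threads, k, d, rng):
    """Run one randomized schedule; returns the list of (thread, step) pairs.

    threads maps a thread name to its list of step labels.
    """
    names = list(threads)
    n = len(names)
    # 1. Distinct random priorities d .. d+n-1, all >= d (assumed variant).
    priority = dict(zip(names, rng.sample(range(d, d + n), n)))
    # 2. d-1 lowering points, each a step count in 1 .. k (assumed range).
    lowering = {i: rng.randint(1, k) for i in range(1, d)}

    pc = {name: 0 for name in names}
    schedule = []
    steps = 0
    while True:
        enabled = [t for t in names if pc[t] < len(threads[t])]
        if not enabled:
            break
        # 3. Honor thread priorities.
        t = max(enabled, key=priority.get)
        schedule.append((t, threads[t][pc[t]]))
        pc[t] += 1
        steps += 1
        # 4. At the i-th lowering point, drop the running thread to priority i.
        for i, point in lowering.items():
            if steps == point:
                priority[t] = i
    return schedule


# Post-fork portion of the depth-2 example: the bug needs E, then I, then F.
example = {"parent": ["D", "E", "F", "G"], "child": ["H", "I", "J"]}
run = cuzz_schedule(example, k=7, d=2, rng=random.Random(0))
print(" ".join(step for _, step in run))
```

With d = 1 there are no lowering points, so each run simply executes the threads one after another in priority order. Lowered priorities 1 to d-1 always sit below every initial priority, which is what forces a context switch at a lowering point.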

Page 38: A Randomized Algorithm for Concurrency Testing

A Bug of Depth 1

Found when the child has a higher priority than the parent (prob = ½)

Parent (Pri = 1)
    fork (child);
    p = malloc();

Child (Pri = 2)
    do_init();
    p->f ++;

Page 39: A Randomized Algorithm for Concurrency Testing

A Bug of Depth 2

Found when the parent starts with a higher priority and a lowering point is inserted after the branch condition (prob = 1/(2·5) = 1/10)

Parent (Pri = 3, lowered to Pri = 1 at the lowering point)
    p = malloc();
    fork (child);
    if (p != NULL)
        [Lowering Point]
        p->f ++;

Child (Pri = 2)
    p = NULL;

Page 40: A Randomized Algorithm for Concurrency Testing

In Practice, Cuzz Beats its Bound

Cuzz performs far better than the theoretical bound
1. The worst-case bound is based on a conservative analysis
2. We employ various optimizations
3. Programs have LOTS of bugs
    Probability of finding any of the bugs is (roughly) the sum of the probability of finding each
4. The buggy code is executed LOTS of times

Page 41: A Randomized Algorithm for Concurrency Testing

For Some of our Benchmarks

Probability increases with n, stays the same with k
In contrast, worst-case bound = 1/(n·k^(d-1))

[Chart: probability of finding the bug (y-axis, 0 to 0.025) against number of threads (x-axis: 2, 3, 5, 9, 17, 33, 65), with series for 4 items, 16 items, and 64 items]

Page 42: A Randomized Algorithm for Concurrency Testing

Dimension Theory

Any partial-order G can be expressed as an intersection of a set of total orders
This set is called a realizer of G

[Figure: a partial order on a, b, c, d, e, equal to the intersection of the total orders a b c d e and a d b e c]
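
The intersection can be computed directly from the two total orders on the slide. This is a minimal sketch in which a partial order is represented as a set of (x, y) pairs meaning x comes before y. The function name `intersect` is illustrative, and the result is derived from the two listed orders rather than read off the original figure.

```python
from itertools import combinations, permutations


def intersect(total_orders):
    """Return the pairs (x, y) with x before y in every given total order."""
    positions = [{e: i for i, e in enumerate(order)} for order in total_orders]
    elements = total_orders[0]
    return {(x, y) for x, y in permutations(elements, 2)
            if all(pos[x] < pos[y] for pos in positions)}


realizer = [list("abcde"), list("adbec")]
partial_order = intersect(realizer)

# Pairs left unordered are those on which the two total orders disagree.
unordered = [(x, y) for x, y in combinations("abcde", 2)
             if (x, y) not in partial_order and (y, x) not in partial_order]

print(sorted(partial_order))
print(unordered)
```

Each unordered pair appears in one direction in the first total order and in the other direction in the second, which is the realizer property used in the following slides.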

Page 43: A Randomized Algorithm for Concurrency Testing

Property of Realizers

For any unordered pair a and b, a realizer contains two total orders that satisfy (a,b) and (b,a)

[Figure: a partial order on a, b, c, d, e, equal to the intersection of the total orders a b c d e and a d b e c]

Page 44: A Randomized Algorithm for Concurrency Testing

Dimension of a Partial Order

Dimension of G is the size of the smallest realizer of G
Dimension is 2 for this example

[Figure: a partial order on a, b, c, d, e, equal to the intersection of the total orders a b c d e and a d b e c]

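Page 45: A Randomized Algorithm for Concurrency Testing

Why is it called “dimension”

You can encode a partial-order of dimension d as points in a d-dimensional space

[Figure: the partial order on a, b, c, d, e as the intersection (∩) of the total orders a b c d e and a d b e c, with each element plotted as a point on a grid whose two axes run from 0 to 4]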

Page 46: A Randomized Algorithm for Concurrency Testing

Why is it relevant for us

P = Set of all partial orders, B = Set of all bugs of depth 1
If you can uniformly sample the smallest realizer of a partial order p
    Probability of any bug of depth 1 >= 1/dimension(p)

[Figure: a partial order on a, b, c, d, e, equal to the intersection of the total orders a b c d e and a d b e c]

Page 47: A Randomized Algorithm for Concurrency Testing

All this is good, but

Finding the dimension of a partial order is NP-complete
Real programs are not static partial-orders

Page 48: A Randomized Algorithm for Concurrency Testing

Width of a Partial-Order

Width of a partial-order G is the minimum number of total orders needed to cover G
    Width corresponds to the number of “threads” in G
For all G, Dimension(G) <= Width(G)

[Figure: a partial order on a, b, c, d, e, shown as covered by total orders over the same elements]

Page 49: A Randomized Algorithm for Concurrency Testing

Cuzz Algorithm

Cuzz is an online randomized algorithm for uniformly sampling a realizer of size Width(G)
Assign random priorities to “threads” and topologically sort based on the priorities

[Figure: a partial order on a, b, c, d, e, equal to the intersection of the total orders a b c d e and a d b e c]
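
A priority-driven topological sort of this kind can be sketched as follows. The split of the example into two "threads" (a b c and d e) and the cross-thread edges a before d and b before e are assumptions chosen to be consistent with the two total orders above, since the original figure is not recoverable. The name `priority_sort` is illustrative.

```python
import random

# Assumed thread decomposition of the example partial order.
CHAINS = {"T1": ["a", "b", "c"], "T2": ["d", "e"]}
# Assumed cross-thread edges: d waits for a, e waits for b.
PREDS = {"d": {"a"}, "e": {"b"}}


def priority_sort(chains, preds, priority):
    """Topologically sort by always running the highest-priority enabled thread."""
    done, order = set(), []
    pc = {t: 0 for t in chains}
    total = sum(len(steps) for steps in chains.values())
    while len(order) < total:
        # A thread is enabled if its next node has all cross-thread
        # predecessors already executed.
        enabled = [t for t in chains
                   if pc[t] < len(chains[t])
                   and preds.get(chains[t][pc[t]], set()) <= done]
        t = max(enabled, key=priority.get)
        node = chains[t][pc[t]]
        order.append(node)
        done.add(node)
        pc[t] += 1
    return order


# Random priorities: a random permutation of 1 .. number of threads.
names = list(CHAINS)
ranks = random.Random(1).sample(range(1, len(names) + 1), len(names))
print(priority_sort(CHAINS, PREDS, dict(zip(names, ranks))))
```

With two threads there are only two priority assignments, and they yield the two total orders of the realizer, each with probability 1/2.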

Page 50: A Randomized Algorithm for Concurrency Testing

Extension to Larger Depths

Note: a realizer of G covers all possible orderings of an unordered pair
We define a d-realizer of G as a set of total orders that covers all possible orderings of d unordered pairs
d-dimension of G is the size of the smallest d-realizer of G
Theorem
    d-Dimension(G) <= Dimension(G) · k^(d-1), where k is the number of nodes in G
Cuzz is an online algorithm for uniformly sampling over a d-realizer of G

Page 51: A Randomized Algorithm for Concurrency Testing

Optimizations

Need to insert lowering points only at sync. operations
    Sync operations include locks, semaphores, hardware interlocked instructions, racy shared memory accesses
    Based on partial-order reduction in model checking
    Reduces k from ~millions to ~tens of thousands
Reset algorithm after every join point
    Join point = a state in which only one thread is enabled
    Reduces n and k

Page 52: A Randomized Algorithm for Concurrency Testing

Join-Point Optimization

If partial order G is the serial composition of A and B, then
    d-Dimension(G) = Max(d-Dimension(A), d-Dimension(B))
    d-Dimension(G) <= Dimension(G) · 4^(d-1)

[Figure: partial order G over nodes a through g, drawn as the serial composition of components A and B]

Page 53: A Randomized Algorithm for Concurrency Testing

Other Practical Considerations

Lower priority threads can be starved
    Temporarily boost priorities with a very small probability

    High Pri: while(!x) { ; }
    Low Pri:  x = 1;

Perturbation of real-time can result in “false-errors”
    Low priority threads run very slowly
    Some programs use timing based synchronization

    High Pri: sleep(10 sec); p->f++;
    Low Pri:  p = malloc();

Page 54: A Randomized Algorithm for Concurrency Testing

Comparison with Worst-Case Bound

Program             Empirical   Bound
Splash – Barnes     0.5         0.5
Splash – LU         0.5         0.5
Splash – Barnes     0.49        0.5
Pbzip               0.701       0.0001
Work Steal Queue    0.002       0.0003
Dryad               0.164       2×10^-5

Page 55: A Randomized Algorithm for Concurrency Testing

Scalability

Program            LOC     d   n    k sync   k optimized
Splash – FFT       1200    1   2    791      139
Splash – LU        1130    1   2    1517     996
Splash – Barnes    3465    1   2    7917     318
Pbzip2             1978    2   4    1981     1207
TPL WSQ            495     2   4    1488     75
Dryad              16036   2   5    9631     1990
IE                 -       1   25   1.4M     0.13M
Mozilla            245K    1   12   38.4M    3M

Page 56: A Randomized Algorithm for Concurrency Testing

Conclusions

Probabilistic concurrency testing
    Provides reasonable probabilistic bounds on finding bugs
Notion of bug depth
    A classification of concurrency bugs
    Essential for probabilistic bounds
    Many bugs have a very small depth
The initial prototype of Cuzz is very effective
    Finds lots of bugs within the first few hundred runs
    Scales to large programs