A Randomized Algorithm for Concurrency Testing
Madan Musuvathi
Research in Software Engineering
Microsoft Research
The Concurrency Testing Problem
A closed program = program + test harness
  Test harness encodes both the concurrency scenario and the inputs
The only nondeterminism is the thread interleavings
Verification vs Testing
Verification:
  Prove that the program is correct (free of bugs)
  With the minimum amount of resources
Testing:
  Given a certain amount of resources
  How close to a proof can you get?
  Maximize the number of bugs that you can find
In the limit: Verification == Testing
Testing is more important than Verification
Undecidability argument
  There are always going to be programs large enough and properties complex enough for which verification cannot be done
Economic argument
  If the cost of a bug is less than the cost of finding the bug (or proving its absence)
  You are better off shipping buggy software
Engineering argument
  Make software only as reliable as the weakest link in the entire system
Providing Probabilistic Guarantees
Problem we would like to solve:
  Given a program, prove that it does not do something wrong with probability > 95%
Problem we can hope to solve:
  Given a program that contains a bug, design a testing algorithm that finds the bug with probability > 95%
  Prove optimality: no testing algorithm can do better
Cuzz: Concurrency Fuzzing
Disciplined randomization of schedules
Probabilistic guarantees
  Every run finds a bug with some (reasonably large) probability
  Repeat runs to increase the chance of finding a bug
Scalable
  In the no. of threads and program size
Effective
  Bugs in IE, Firefox, Office Communicator, Outlook, …
  Bugs found in the first few runs
Cuzz Demo
Problem Formulation
P is a class of programs
B is a class of bugs
Given P and B, you design a testing algorithm T
Given T, the adversary picks a program p in P containing a bug b in B
Given p, T generates an input in constant time
Prove that T finds b in p with a probability X(p,B)

In our case
P is a class of closed terminating concurrent programs
B is a class of bugs
Given P and B, you design a testing algorithm T
Given T, the adversary picks a program p in P containing a bug b in B
Given p, T generates an interleaving in constant time
Prove that T finds b in p with a probability X(p,B)
Useful parameters
For a closed terminating concurrent program
(Fancy way of saying, a program combined with a concurrency test)
  n : maximum number of threads
  k : maximum number of instructions executed
What is a “Bug” – first attempt
Bug is defined as a particular buggy interleaving
No algorithm can find the bug with a probability greater than 1/n^k
[Figure: n threads (~ tens), each step picks one of them, k instructions (~ millions), giving n^k schedules]
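To make the n^k count concrete, here is a small sketch that enumerates the interleavings of two tiny threads and compares the count with the n^k upper bound. The thread contents and the helper name `interleavings` are illustrative assumptions, not part of the talk.

```python
from math import comb

def interleavings(threads):
    """Yield every merge of the given lists that preserves each list's own order."""
    if all(len(t) == 0 for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail

# Hypothetical example: n = 2 threads, 3 instructions each, so k = 6 steps in total.
threads = [list("abc"), list("xyz")]
n, k = len(threads), sum(len(t) for t in threads)

count = sum(1 for _ in interleavings(threads))
upper_bound = n ** k  # each of the k steps picks one of n threads

print(count, upper_bound)
```

For fixed thread programs the exact count is a multinomial coefficient, which is smaller than n^k but still grows exponentially with k.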
A Deterministic Algorithm
Provides no guarantees
Randomized Algorithm
Samples the schedule space with some probability distribution
Adversary picks the schedule that is the least probable
Probability of finding the bug <= 1/n^k
Randomized Algorithm
1/n^k is a mighty small number
Hard to design algorithms that find the bug with probability == 1/n^k
A Good Research Trick
When you can't solve a problem, change the problem definition
Bugs are not adversarial
Usually, if there is one interleaving that finds the bug, there are many interleavings that find the same bug
  This is not true for program inputs
The set of interleavings that find the bug shares the same root cause
The root causes of real bugs are not complicated
  Smart people make stupid mistakes
Classifying Bugs
Classify concurrency bugs based on a suitable “depth” metric
Adversary can choose any bug, but only within a given depth
Testing algorithm provides better guarantees for bugs with a smaller depth
  Even if worst-case probability is less than 1/n^k
We want real bugs to have small depth
We want to be able to design effective sampling algorithms for finding bugs of a particular depth
Our Bug Depth Definition
Bug Depth = number of ordering constraints sufficient to find the bug
Best explained through examples
A Bug of Depth 1
Bug Depth = no. of ordering constraints sufficient to find the bug

Parent:
    A: …
    B: fork (child);
    C: p = malloc();
    D: …
    E: …

Child:
    F: …
    G: do_init();
    H: p->f ++;
    I: …
    J: …

Possible schedules:
    A B C D E F G H I J
    A B F G H C D E I J
    A B F G C D E H I J
    A B F G C H D E I J
    A B F G H I J C D E
    …
A Bug of Depth 2
Bug Depth = no. of ordering constraints sufficient to find the bug

Parent:
    A: …
    B: p = malloc();
    C: fork (child);
    D: …
    E: if (p != NULL)
    F:     p->f ++;
    G:

Child:
    H: …
    I: p = NULL;
    J: …

Possible schedules:
    A B C D E F G H I J
    A B C D E H I J F G
    A B C H I D E G J
    A B C D H E F I J G
    A B C H D E I J F G
    …
Another Bug of Depth 2
Bug Depth = no. of ordering constraints sufficient to find the bug

Parent:
    A: …
    B: Lock (A);
    C: …
    D: Lock (B);
    E: …

Child:
    F: …
    G: Lock (B);
    H: …
    I: Lock (A);
    J: …
Hypothesis
Most concurrency bugs in practice have a very small depth
What has been empirically validated:
  There are lots of bugs of small depths in real programs
Defining a Bug
A schedule is a sequence of (dynamic) instructions
S = set of schedules of a closed program
A concurrency bug B is a strict subset of S
Ordering Constraints
A schedule satisfies an ordering constraint (a,b) if instruction a occurs before instruction b in the schedule

Parent:
    A: …
    B: fork (child);
    C: p = malloc();
    D: …
    E: …

Child:
    F: …
    G: do_init();
    H: p->f ++;
    I: …
    J: …

Example: the schedule A B F G H C D E I J satisfies (H, C)
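The constraint check itself is a position comparison. A minimal sketch, assuming a schedule is represented as a list of instruction labels (the function name `satisfies` is illustrative, and an instruction that never executes is treated as not satisfying the constraint):

```python
def satisfies(schedule, constraint):
    """Return True if instruction a occurs before instruction b in the schedule."""
    a, b = constraint
    if a not in schedule or b not in schedule:
        return False  # assumption: a constraint on an unexecuted instruction is unsatisfied
    return schedule.index(a) < schedule.index(b)

buggy = "A B F G H C D E I J".split()
serial = "A B C D E F G H I J".split()

print(satisfies(buggy, ("H", "C")))   # H runs before C: the dereference precedes malloc
print(satisfies(serial, ("H", "C")))  # C runs before H in the serial schedule
```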
Depth of a Bug
S(c1, c2, …, cn) = set of schedules that satisfy the ordering constraints c1, c2, …, cn
A bug B is of depth ≤ d if there exist constraints c1, c2, …, cd such that S(c1, c2, …, cd) ⊆ B
What is the Depth of this Bug

Parent:
    A: …
    B: p = malloc();
    C: fork (child);
    D: allocated = 1
    E: p = null;

Child:
    F: …
    G: if(allocated)
    H:     p->f++;
    I: …
    J: …

Any buggy interleaving satisfies (D, G) && (E, H)
Bug depth <= 2
What is the Depth of this Bug

Parent:
    A: …
    B: p = malloc();
    C: fork (child);
    D: allocated = 1
    E: p = null;

Child:
    F: …
    G: if(allocated)
    H:     p->f++;
    I: …
    J: …

Any interleaving that satisfies (E, G) is buggy
Bug depth == 1
Even though there are buggy interleavings that don't satisfy (E, G)
Let's look at the complicated bug

    void AddToCache() {
        // ...
        A: x &= ~(FLAG_NOT_DELETED);
        B: x |= FLAG_CACHED;
        MemoryBarrier();
        // ...
    }

    AddToCache();
    assert( x & FLAG_CACHED );

The bit operations are not atomic

    void AddToCache() {
        A1: t = x & ~(FLAG_NOT_DELETED);
        A2: x = t;
        B1: u = x | FLAG_CACHED;
        B2: x = u;
    }

    AddToCache();
    assert( x & FLAG_CACHED );

The bug

    void AddToCache() {
        A1: t = x & ~(FLAG_NOT_DELETED);
        A2: x = t;
        B1: u = x | FLAG_CACHED;
        B2: x = u;
    }

    AddToCache();
    assert( x & FLAG_CACHED );

    void AddToCache() {
        A1: t = x & ~(FLAG_NOT_DELETED);
        A2: x = t;
        B1: u = x | FLAG_CACHED;
        B2: x = u;
    }

    AddToCache();
    assert( x & FLAG_CACHED );
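A brute-force sketch of why the non-atomic updates matter, assuming the two copies of the code on the slide are two threads sharing x. The flag values, the initial value of x and the helper names are hypothetical choices for illustration only.

```python
FLAG_NOT_DELETED = 0x1  # hypothetical bit values
FLAG_CACHED = 0x2
STEPS = ["A1", "A2", "B1", "B2", "assert"]

def interleavings(threads):
    """Yield every merge of the given lists that preserves each list's own order."""
    if all(len(t) == 0 for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail

def assertion_fails(schedule):
    """Simulate one schedule; True if some thread's assert sees FLAG_CACHED cleared."""
    x = FLAG_NOT_DELETED          # assumed initial value of the shared word
    regs = {0: {}, 1: {}}         # per-thread temporaries t and u
    for tid, step in schedule:
        r = regs[tid]
        if step == "A1":
            r["t"] = x & ~FLAG_NOT_DELETED
        elif step == "A2":
            x = r["t"]
        elif step == "B1":
            r["u"] = x | FLAG_CACHED
        elif step == "B2":
            x = r["u"]
        elif step == "assert" and not (x & FLAG_CACHED):
            return True
    return False

threads = [[(tid, s) for s in STEPS] for tid in (0, 1)]
schedules = list(interleavings(threads))
failing = [s for s in schedules if assertion_fails(s)]
serial = threads[0] + threads[1]

print(len(schedules), len(failing) > 0, assertion_fails(serial))
```

In this model a failure needs one thread to read x at A1 before the flag is set, and then write that stale value back at A2 after the other thread has finished B2 but before it reaches its assert.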
Cuzz Guarantee
Given a program that creates at most n threads and executes at most k instructions
Cuzz finds every bug of depth d with probability at least 1/(n·k^(d-1)) in every run of the program
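As a worked example of what such a guarantee buys, the sketch below evaluates the worst-case bound 1/(n·k^(d-1)) that the deck quotes for its benchmarks, and the number of independent runs needed to reach a target confidence. Both function names are illustrative, and independence between runs is an assumption.

```python
import math
from fractions import Fraction

def worst_case_bound(n, k, d):
    """Worst-case per-run probability 1 / (n * k**(d-1)) of hitting a depth-d bug."""
    return Fraction(1, n * k ** (d - 1))

def runs_for_confidence(p, target=0.95):
    """Smallest r with 1 - (1 - p)**r >= target, assuming independent runs."""
    p = float(p)
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

print(worst_case_bound(n=2, k=5, d=2))            # two threads, five steps, depth 2
print(runs_for_confidence(Fraction(1, 10)))       # runs needed at p = 1/10
print(runs_for_confidence(worst_case_bound(2, 5, 1)))
```

Because the per-run probability is a lower bound, the computed number of runs is an upper bound on what is needed for the given confidence.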
A Bug of Depth 1
Bug Depth = no. of ordering constraints sufficient to find the bug
Probability of bug >= 1/n
  n: no. of threads (~ tens)

Parent:
    A: …
    B: fork (child);
    C: p = malloc();
    D: …
    E: …

Child:
    F: …
    G: do_init();
    H: p->f ++;
    I: …
    J: …

Possible schedules:
    A B C D E F G H I J
    A B F G H C D E I J
    A B F G C D E H I J
    A B F G C H D E I J
    A B F G H I J C D E
    …
A Bug of Depth 2
Bug Depth = no. of ordering constraints sufficient to find the bug
Probability of bug >= 1/(nk)
  n: no. of threads (~ tens)
  k: no. of instructions (~ millions)

Parent:
    A: …
    B: p = malloc();
    C: fork (child);
    D: …
    E: if (p != NULL)
    F:     p->f ++;
    G:

Child:
    H: …
    I: p = NULL;
    J: …

Possible schedules:
    A B C D E F G H I J
    A B C D E H I J F G
    A B C H I D E G J
    A B C D H E F I J G
    A B C H D E I J F G
    …
Another Bug of Depth 2
Bug Depth = no. of ordering constraints sufficient to find the bug
Probability of bug >= 1/(nk)
  n: no. of threads (~ tens)
  k: no. of instructions (~ millions)

Parent:
    A: …
    B: Lock (A);
    C: …
    D: Lock (B);
    E: …

Child:
    F: …
    G: Lock (B);
    H: …
    I: Lock (A);
    J: …
Cuzz Algorithm
Inputs:
  n: estimated bound on the number of threads
  k: estimated bound on the number of steps
  d: target bug depth

    // 1. Assign random priorities >= d to threads
    for t in [1…n] do
        priority[t] = rand() + d;

    // 2. Choose d-1 lowering points at random
    for i in [1...d) do
        lowering[i] = rand() % k;

    steps = 0;
    while (some thread enabled) {
        // 3. Honor thread priorities
        Let t be the highest-priority enabled thread;
        schedule t for one step;
        steps++;

        // 4. At the ith lowering point, set the priority to i
        if steps == lowering[i] for some i
            priority[t] = i;
    }
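The pseudocode above can be turned into a small runnable sketch. This version assumes threads are plain lists of instruction labels that are always enabled until they finish (no fork, locks or blocking), draws the initial priorities as a shuffled range d … d+n-1 as one way to get distinct random priorities >= d, and picks lowering points in 1 … k. The function name and representation are illustrative, not the tool's implementation.

```python
import random

def cuzz_schedule(threads, k, d, rng):
    """Return one schedule (list of labels) for threads: dict name -> list of labels."""
    names = list(threads)
    # 1. Random distinct priorities, all >= d.
    prios = list(range(d, d + len(names)))
    rng.shuffle(prios)
    priority = dict(zip(names, prios))
    # 2. d-1 lowering points; the i-th one lowers the running thread to priority i.
    lowering = {i: rng.randint(1, k) for i in range(1, d)}

    pc = {t: 0 for t in names}
    schedule, steps = [], 0
    while True:
        enabled = [t for t in names if pc[t] < len(threads[t])]
        if not enabled:
            return schedule
        # 3. Honor thread priorities.
        t = max(enabled, key=lambda name: priority[name])
        schedule.append(threads[t][pc[t]])
        pc[t] += 1
        steps += 1
        # 4. At the i-th lowering point, drop the current thread to priority i.
        for i, point in lowering.items():
            if steps == point:
                priority[t] = i

threads = {"parent": ["p1", "p2", "p3"], "child": ["c1", "c2"]}
depth1 = cuzz_schedule(threads, k=5, d=1, rng=random.Random(0))
depth3 = cuzz_schedule(threads, k=5, d=3, rng=random.Random(1))
print(depth1)
print(depth3)
```

With d = 1 there are no lowering points, so the higher-priority thread simply runs to completion before the other one starts.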
A Bug of Depth 1
Found when the child has a higher priority than the parent (prob = ½)

Parent (Pri = 1):
    fork (child);
    p = malloc();

Child (Pri = 2):
    do_init();
    p->f ++;
A Bug of Depth 2
Found when the parent starts with a higher priority and a lowering point is inserted after the branch condition (prob = 1/(2*5) = 1/10)

Parent (Pri = 3):
    p = malloc();
    fork (child);
    if (p != NULL)
        <-- Lowering Point: parent drops to Pri = 1
        p->f ++;

Child (Pri = 2):
    p = NULL;
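The 1/10 figure can be reproduced by enumerating every random choice the algorithm could make on this program. The sketch assumes two threads, k = 5 steps, initial priorities that are a permutation of {2, 3}, one lowering point drawn uniformly from steps 1 to 5, and a child that only becomes enabled once the parent has executed fork. All names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

CODE = {
    "parent": ["malloc", "fork", "check", "deref"],
    "child": ["null"],
}
K = 5  # total number of steps in the program

def finds_bug(parent_prio, child_prio, lowering_point):
    """Run one priority schedule; True if deref executes after p was set to NULL."""
    prio = {"parent": parent_prio, "child": child_prio}
    pc = {"parent": 0, "child": 0}
    forked, p_valid, steps = False, False, 0
    while True:
        enabled = [t for t in CODE
                   if pc[t] < len(CODE[t]) and (t == "parent" or forked)]
        if not enabled:
            return False
        t = max(enabled, key=prio.get)
        ins = CODE[t][pc[t]]
        pc[t] += 1
        steps += 1
        if ins == "malloc":
            p_valid = True
        elif ins == "fork":
            forked = True
        elif ins == "null":
            p_valid = False
        elif ins == "check" and not p_valid:
            pc[t] += 1            # branch not taken: skip the dereference
        elif ins == "deref" and not p_valid:
            return True           # NULL dereference
        if steps == lowering_point:
            prio[t] = 1           # single lowering point for d = 2

outcomes = [finds_bug(pp, cp, lp)
            for pp, cp in permutations([2, 3])
            for lp in range(1, K + 1)]
probability = Fraction(sum(outcomes), len(outcomes))
print(probability)
```

Only one of the ten equally likely choices triggers the failure: the parent starts with the higher priority and is lowered right after the branch condition at step 3.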
In Practice, Cuzz Beats its Bound
Cuzz performs far better than the theoretical bound
1. The worst-case bound is based on a conservative analysis
2. We employ various optimizations
3. Programs have LOTS of bugs
   Probability of finding any of the bugs is (roughly) the sum of the probabilities of finding each
4. The buggy code is executed LOTS of times
For Some of our Benchmarks
Probability increases with n, stays the same with k
In contrast, worst-case bound = 1/(n·k^(d-1))
[Figure: probability of finding the bug (y-axis, 0 to 0.025) versus number of threads (x-axis: 2, 3, 5, 9, 17, 33, 65), with one series each for 4 items, 16 items and 64 items]
Dimension Theory
Any partial-order G can be expressed as an intersection of a set of total orders
This set is called a realizer of G
[Figure: a partial order over a, b, c, d, e = (a b c d e) ∩ (a d b e c)]
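The intersection on this slide can be computed directly. A minimal sketch, assuming each total order is written as a string of element names and a partial order is represented as its set of ordered pairs (the function names are illustrative):

```python
from itertools import combinations

def order_pairs(total_order):
    """All pairs (x, y) such that x comes before y in the total order."""
    return set(combinations(total_order, 2))

def intersect(total_orders):
    """Partial order realized by the given total orders, as a set of (x, y) pairs."""
    return set.intersection(*(order_pairs(t) for t in total_orders))

def unordered_pairs(elements, partial_order):
    """Pairs that the partial order leaves unordered."""
    return {(x, y) for x, y in combinations(sorted(elements), 2)
            if (x, y) not in partial_order and (y, x) not in partial_order}

realizer = ["abcde", "adbec"]
G = intersect(realizer)
free = unordered_pairs("abcde", G)
print(sorted(G))
print(sorted(free))
```

Each pair left unordered by G appears in one direction in the first total order and in the opposite direction in the second, which is the realizer property used in the rest of the talk.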
Property of Realizers
For any unordered pair a and b, a realizer contains two total orders that satisfy (a,b) and (b,a)
[Figure: a partial order over a, b, c, d, e = (a b c d e) ∩ (a d b e c)]
Dimension of a Partial Order
Dimension of G is the size of the smallest realizer of G
Dimension is 2 for this example
[Figure: a partial order over a, b, c, d, e = (a b c d e) ∩ (a d b e c)]
Why is it called “dimension”
You can encode a partial-order of dimension d as points in a d-dimensional space
[Figure: the partial order over a, b, c, d, e = (a b c d e) ∩ (a d b e c), with a, b, c, d, e plotted as points on a grid whose two axes both run from 0 to 4]
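One standard way to build such an encoding, not spelled out on the slide and therefore an assumption here, is to give each element one coordinate per total order in the realizer, equal to its position in that order. Then x precedes y in the partial order exactly when x's point is smaller in every coordinate.

```python
from itertools import permutations

def embed(total_orders):
    """Map each element to the tuple of its positions in the total orders."""
    return {x: tuple(t.index(x) for t in total_orders) for x in total_orders[0]}

def precedes(points, x, y):
    """x is below y in the partial order iff x is smaller in every coordinate."""
    return x != y and all(px < py for px, py in zip(points[x], points[y]))

realizer = ["abcde", "adbec"]
points = embed(realizer)
ordered = {(x, y) for x, y in permutations("abcde", 2) if precedes(points, x, y)}

print(points)
print(sorted(ordered))
```

With the two total orders from the slide this gives five points in the plane, with coordinates between 0 and 4 on each axis.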
Why is it relevant for us
P = Set of all partial orders, B = Set of all bugs of depth 1
If you can uniformly sample the smallest realizer of a partial order p
  Probability of any bug of depth 1 >= 1/dimension(p)
[Figure: a partial order over a, b, c, d, e = (a b c d e) ∩ (a d b e c)]
All this is good, but
Finding the dimension of a partial order is NP-complete
Real programs are not static partial-orders
Width of a Partial-Order
Width of a partial-order G is the minimum number of total orders needed to cover G
  Width corresponds to the number of “threads” in G
For all G, Dimension(G) <= Width(G)
[Figure: the partial order over a, b, c, d, e shown next to the total orders that cover it]
Cuzz Algorithm
Cuzz is an online randomized algorithm for uniformly sampling a realizer of size Width(G)
Assign random priorities to “threads” and topologically sort based on the priorities
[Figure: a partial order over a, b, c, d, e = (a b c d e) ∩ (a d b e c)]
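A sketch of the priority-based topological sort for the running example. It assumes the partial order is split into two "threads", a b c and d e, with cross-thread dependencies a before d and b before e. That decomposition is inferred from the two total orders on the slide rather than taken from the figure, and the function name is illustrative.

```python
def priority_toposort(chains, deps, priority):
    """Topologically sort by always running the highest-priority enabled thread.

    chains:   dict thread -> list of nodes in program order
    deps:     set of (x, y) cross-thread edges meaning x must run before y
    priority: dict thread -> int, higher runs first
    """
    pc = {t: 0 for t in chains}
    done, order = set(), []
    total = sum(len(c) for c in chains.values())
    while len(order) < total:
        enabled = [t for t in chains
                   if pc[t] < len(chains[t])
                   and all(x in done for x, y in deps if y == chains[t][pc[t]])]
        t = max(enabled, key=priority.get)  # raises ValueError if deps are cyclic
        node = chains[t][pc[t]]
        order.append(node)
        done.add(node)
        pc[t] += 1
    return order

chains = {"T1": list("abc"), "T2": list("de")}   # assumed thread decomposition
deps = {("a", "d"), ("b", "e")}                  # assumed cross-thread edges

first = priority_toposort(chains, deps, {"T1": 2, "T2": 1})
second = priority_toposort(chains, deps, {"T1": 1, "T2": 2})
print("".join(first), "".join(second))
```

Under these assumptions the two possible priority assignments produce exactly the two total orders of the realizer, so picking the priorities at random samples that realizer uniformly.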
Extension to Larger Depths
Note: a realizer of G covers all possible orderings of an unordered pair
We define a d-realizer of G as a set of total orders that covers all possible orderings of d unordered pairs
d-dimension of G is the size of the smallest d-realizer of G
Theorem
  d-Dimension(G) <= Dimension(G) · k^(d-1)
  where k is the number of nodes in G
Cuzz is an online algorithm for uniformly sampling over a d-realizer of G
Optimizations
Need to insert lowering points only at sync. operations
  Sync operations include locks, semaphores, hardware interlocked instructions, racy shared memory accesses
  Based on partial-order reduction in model checking
  Reduces k from ~millions to ~ten thousands
Reset algorithm after every join point
  Join point = a state in which only one thread is enabled
  Reduces n and k
Join-Point Optimization
If partial order G is the serial composition of A and B, then
  d-Dimension(G) = Max (d-Dimension(A), d-Dimension(B))
  d-Dimension(G) <= Dimension(G) · 4^(d-1)
[Figure: a partial order G over nodes a to g, drawn as the serial composition of sub-orders A and B]
Other Practical Considerations
Lower priority threads can be starved
  Temporarily boost priorities with a very small probability
      High Pri: while(!x) { ; }
      Low Pri:  x = 1;
Perturbation of real-time can result in “false-errors”
  Low priority threads run very slowly
  Some programs use timing based synchronization
      High Pri: sleep(10 sec); p->f++;
      Low Pri:  p = malloc();
Comparison with Worst-Case Bound

Program            Empirical   Bound
Splash – Barnes    0.5         0.5
Splash – LU        0.5         0.5
Splash – Barnes    0.49        0.5
Pbzip              0.701       0.0001
Work Steal Queue   0.002       0.0003
Dryad              0.164       2x10^-5
Scalability

Program           LOC     d   n    k sync   k optimized
Splash – FFT      1200    1   2    791      139
Splash – LU       1130    1   2    1517     996
Splash – Barnes   3465    1   2    7917     318
Pbzip2            1978    2   4    1981     1207
TPL WSQ           495     2   4    1488     75
Dryad             16036   2   5    9631     1990
IE                -       1   25   1.4M     0.13M
Mozilla           245K    1   12   38.4M    3M
Conclusions
Probabilistic concurrency testing
  Provides reasonable probabilistic bounds of finding bugs
Notion of bug depth
  A classification of concurrency bugs
  Essential for probabilistic bounds
  Many bugs have a very small depth
The initial prototype of Cuzz is very effective
  Finds lots of bugs within the first few hundred runs
  Scales to large programs