CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research


Testing multithreaded programs is HARD

• Specific thread interleavings expose subtle errors
  • Testing often misses these errors
• Even when found, errors are hard to debug
  • No repeatable trace
  • Source of the bug is far away from where it manifests

Concurrency is a real problem

• Windows 2000 hot fixes
  • Concurrency errors most common defects among “detectable errors”
  • Incorrect synchronization and protocol errors most common defects among all coding errors
• Windows Server 2003 late cycle defects
  • Synchronization errors second in the list, next to buffer overruns
• Race conditions can result in security exploits

Current practice

• Concurrency testing == Stress testing
• Example: testing a concurrent queue
  • Create 100 threads performing queue operations
  • Run for days/weeks
  • Pepper the code with sleep( random() )
• Stress increases the likelihood of rare interleavings
  • Makes any error found hard to debug

CHESS: Unit testing for concurrency

• Example: testing a concurrent queue
  • Create 1 reader thread and 1 writer thread
  • Exhaustively try all thread interleavings
• Run the test repeatedly on a specialized scheduler
  • Explore a different thread interleaving each time
  • Use model checking techniques to avoid redundancy
• Check for assertions and deadlocks in every run
  • The error-trace is repeatable
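To make "exhaustively try all thread interleavings" concrete, here is a minimal sketch that models each thread as a list of atomic steps and treats a schedule as a sequence of thread ids. The scenario, the `ScenarioState` struct, and the functions `run_scenario` and `explore_all` are all hypothetical illustrations, not part of CHESS; the real tool drives actual threads rather than a model.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical scenario: a writer publishes `data` through a `ready` flag but
// (bug) sets the flag before writing the data. Thread 0 is the writer,
// thread 1 is the reader; each has two atomic steps.
struct ScenarioState {
    int data = 0;
    bool ready = false;
    bool seen = false;    // reader's local copy of `ready`
    bool failed = false;  // set when the reader's check fails
};

// Runs the scenario under one schedule; returns true if the check failed.
// The same schedule always produces the same result, so a failure is
// repeatable.
bool run_scenario(const std::vector<int>& schedule) {
    ScenarioState s;
    int pc[2] = {0, 0};  // per-thread program counters
    for (int tid : schedule) {
        assert(tid == 0 || tid == 1);
        int step = pc[tid]++;
        if (tid == 0) {
            if (step == 0) s.ready = true;  // bug: flag set first
            else           s.data = 42;
        } else {
            if (step == 0) s.seen = s.ready;
            else if (s.seen && s.data != 42) s.failed = true;
        }
    }
    return s.failed;
}

// Tries every interleaving of the two threads and collects the schedules
// that trigger the failure.
std::vector<std::vector<int>> explore_all() {
    std::vector<int> schedule = {0, 0, 1, 1};  // sorted, so it is the first permutation
    std::vector<std::vector<int>> failing;
    do {
        if (run_scenario(schedule)) failing.push_back(schedule);
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return failing;
}
```

In this model only one of the six schedules fails (writer step, both reader steps, then the second writer step), and passing that schedule back to `run_scenario` reproduces the failure every time.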

Systematic Stress Testing Using CHESS

[Diagram: the program runs the test scenario, TestScenario() { … }, in a loop, While(not done) { TestScenario() }. CHESS sits between the program and the Win32 API, above the kernel (threads, scheduler, synchronization objects).]

• Tester provides a test scenario
• CHESS runs the scenario in a loop
  • Every run takes a different interleaving
  • Every run is repeatable

Conditions on Test Scenario

• Test scenario should terminate in all interleavings
• Test scenario should be idempotent
  • Free all resources (handles, memory, …)
  • Clear the hardware state
• Key observation: existing stress tests already have these properties
  • Because they run repeatedly, forever
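The idempotence condition can be illustrated with a small sketch. Everything here is hypothetical: `g_live_resources` stands in for handles and memory, `TrackedQueue` is an invented resource wrapper, and `runs_are_idempotent` plays the role of the loop that reruns the scenario. The scenario is single-threaded on purpose, since the point is only that each run starts from and returns to the same clean state.

```cpp
#include <cassert>
#include <memory>
#include <queue>

// Hypothetical global counter standing in for handles, memory, and so on.
static int g_live_resources = 0;

// Hypothetical resource wrapper: counts itself while it is alive.
struct TrackedQueue {
    std::queue<int> items;
    TrackedQueue()  { ++g_live_resources; }
    ~TrackedQueue() { --g_live_resources; }
};

// A scenario that terminates and leaves nothing behind: everything it
// allocates is owned by a local unique_ptr and freed on return.
int test_scenario() {
    auto q = std::make_unique<TrackedQueue>();
    q->items.push(1);
    q->items.push(2);
    int sum = 0;
    while (!q->items.empty()) {
        sum += q->items.front();
        q->items.pop();
    }
    return sum;
}

// Driver in the style of a stress loop: rerun the scenario and check that
// no run leaks state into the next one.
bool runs_are_idempotent(int runs) {
    for (int i = 0; i < runs; ++i) {
        if (g_live_resources != 0) return false;  // leftover from a prior run
        if (test_scenario() != 3) return false;   // result must not drift
    }
    return g_live_resources == 0;
}
```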

Perturb the System as Little as Possible

[Diagram: the program's loop, While(not done) { TestScenario() }, runs on top of CHESS, which sits between the program and the Win32 API, above the kernel (threads, scheduler, synchronization objects).]

• Detour Win32 API calls
  • To control and introduce nondeterminism
• Run the system as is
  • On the actual OS, hardware
  • Using system threads, synchronization
• Advantages
  • Avoid reporting false errors
  • Easy to add to existing test frameworks
  • Use existing debuggers

Implementation details

• Handle all the Win32 synchronization mechanisms
  • Critical sections, locks, semaphores, events, …
  • Threadpools
  • Asynchronous procedure calls
  • Timers
  • IO completions
• No modification to the kernel scheduler / Win32 library
• CHESS drives the system along a desired interleaving by ‘hijacking’ the scheduler

Controlling the Scheduling Nondeterminism

• Nondeterministic choices for the scheduler
  • Determine when to context switch
  • On context switch, pick the next runnable thread to run
  • On resource release, wake up one of the waiting threads
• Hijack these choices from the scheduler
  • Ensure at most one thread is runnable
  • No thread is waiting on a resource
  • At chosen schedule points, block the current thread while waking the next thread
• Emulate program execution on a uniprocessor with context switches only at synchronization points
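The idea of blocking the current thread while waking the next one can be sketched in portable C++ with a mutex and a condition variable. This is only an illustration under assumptions: `TokenScheduler`, `wait_for_turn`, `finish_step`, and `run_under_schedule` are hypothetical names, and CHESS itself works by detouring Win32 calls rather than by asking threads to call a scheduler explicitly.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical scheduler: the schedule lists which thread id may take each
// step. A thread blocks until the next entry names it, so only one thread
// makes progress between schedule points.
class TokenScheduler {
public:
    explicit TokenScheduler(std::vector<int> schedule)
        : schedule_(std::move(schedule)) {}

    // Schedule point: block until it is thread `id`'s turn.
    void wait_for_turn(int id) {
        std::unique_lock<std::mutex> lock(mutex_);
        turn_changed_.wait(lock, [&] {
            return next_ < schedule_.size() && schedule_[next_] == id;
        });
    }

    // End of a step: record it, then wake the others so the next thread runs.
    void finish_step(int id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            trace_.push_back(id);
            ++next_;
        }
        turn_changed_.notify_all();
    }

    std::vector<int> trace() {
        std::lock_guard<std::mutex> lock(mutex_);
        return trace_;
    }

private:
    std::mutex mutex_;
    std::condition_variable turn_changed_;
    std::vector<int> schedule_;
    std::size_t next_ = 0;
    std::vector<int> trace_;
};

// Runs `num_threads` real threads; thread i takes as many steps as it has
// entries in `schedule`. Returns the order in which steps actually ran.
std::vector<int> run_under_schedule(const std::vector<int>& schedule,
                                    int num_threads) {
    std::vector<int> steps(num_threads, 0);
    for (int id : schedule) {
        assert(id >= 0 && id < num_threads);
        ++steps[id];
    }
    TokenScheduler scheduler(schedule);
    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id) {
        int count = steps[id];
        threads.emplace_back([&scheduler, id, count] {
            for (int i = 0; i < count; ++i) {
                scheduler.wait_for_turn(id);
                // The thread's real work for this step would go here.
                scheduler.finish_step(id);
            }
        });
    }
    for (auto& t : threads) t.join();
    return scheduler.trace();
}
```

Because every step waits for its turn, the recorded trace matches the requested schedule regardless of how the operating system happens to schedule the underlying threads.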

Partial-order reduction

• Many thread interleavings are equivalent
  • Accesses to separate memory locations by different threads can be reordered
• Avoid exploring equivalent thread interleavings
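A small model shows why interleavings of accesses to separate locations are equivalent. The sketch below assumes each thread is a list of assignments; the names `Step`, `State`, and `final_states` are hypothetical. It runs every interleaving and collects the distinct final states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One step of a modeled thread: assign `second` to the variable named `first`.
using Step = std::pair<std::string, int>;
using State = std::map<std::string, int>;

// Executes every interleaving of two threads' steps (each thread's own
// order is preserved) and returns the set of distinct final states.
std::set<State> final_states(const std::vector<Step>& t0,
                             const std::vector<Step>& t1) {
    // A schedule is a sequence of thread ids; start from the sorted one.
    std::vector<int> schedule(t0.size(), 0);
    schedule.insert(schedule.end(), t1.size(), 1);

    std::set<State> results;
    do {
        State state;
        std::size_t pc0 = 0, pc1 = 0;
        for (int tid : schedule) {
            const Step& s = (tid == 0) ? t0[pc0++] : t1[pc1++];
            state[s.first] = s.second;
        }
        results.insert(state);
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return results;
}
```

When the two threads write disjoint variables, all six interleavings end in the same state, so exploring one of them is enough. When both threads write `x` and `y`, the same six interleavings produce four different final states and cannot be collapsed.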

Partial-order reduction in CHESS

• Algorithm:
  • Assume the program is data-race free
  • Context switch only at synchronization points
  • Check for data-races in each execution
• Theorem:
  • If the algorithm terminates without reporting races, then the program has no assertion failures

Executions on Multi-cores

• CHESS checks for data-races
• If a Test Scenario manifests a bug on a multi-core machine, then CHESS will
  • Either report a data-race
  • Or the bug
• CHESS systematically enumerates all sequentially consistent executions
  • Any data-race free multi-core execution is equivalent to a sequentially consistent execution

State space explosion

[Figure: state graph over (x, y) for two threads, one executing x = 1; y = 1; and the other executing x = 2; y = 2;. Starting from state 0,0, the graph branches on every interleaving of the four assignments.]

State space explosion

[Figure: n threads side by side, each executing k steps, for example x = 1; … y = 1; and x = 2; … y = 2;.]

• n threads, k steps each
• Number of executions = $O(n^{nk})$
• Exponential in both n and k
  • Typically: n < 10, k > 100
• Limits scalability to large programs (large k)
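The growth can be checked on small inputs. The sketch below counts interleavings by brute force in a simplified model where any thread with steps left may run next; `count_from`, `count_interleavings`, and `slide_bound` are hypothetical helper names. In this model the exact count is the multinomial coefficient (nk)! / (k!)^n, which never exceeds the n^(nk) bound quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts complete interleavings: at each point, any thread with steps left
// may take the next step.
std::uint64_t count_from(std::vector<int>& remaining) {
    std::uint64_t total = 0;
    bool done = true;
    for (std::size_t t = 0; t < remaining.size(); ++t) {
        if (remaining[t] == 0) continue;
        done = false;
        --remaining[t];
        total += count_from(remaining);
        ++remaining[t];
    }
    return done ? 1 : total;
}

// Number of interleavings of n threads with k steps each.
// Brute force, so only suitable for small n and k.
std::uint64_t count_interleavings(int n, int k) {
    assert(n >= 1 && k >= 0);
    std::vector<int> remaining(n, k);
    return count_from(remaining);
}

// The n^(n*k) bound, computed directly (overflows for large inputs).
std::uint64_t slide_bound(int n, int k) {
    std::uint64_t result = 1;
    for (int i = 0; i < n * k; ++i) result *= static_cast<std::uint64_t>(n);
    return result;
}
```

For example, two threads of two steps each have 6 interleavings, and three threads of two steps each already have 90.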

Bounding execution depth

• Works very well for message-passing programs
  • Limit the number of message exchanges
  • Message processing code executed atomically
  • Can go ‘deep’ in the state space
• Does not work for multithreaded programs
  • Even toy programs can have a large number of steps (shared-variable accesses)

x = 1;
if (p != 0) {
    x = p->f;
}

Iterative context bounding

[Figure: the thread above, x = 1; if (p != 0) { x = p->f; }, runs alongside a second thread executing p = 0;. The figure labels context switches as either a preemption or a non-preemption.]

Iterative context-bounding algorithm

• The scheduler has a budget of c preemptions
  • Nondeterministically choose the preemption points
• Resort to non-preemptive scheduling after c preemptions
• Once all executions explored with c preemptions
  • Try with c+1 preemptions
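The algorithm above can be sketched on a simplified model in which threads never block and only finish, so the only non-preempting context switches happen when the running thread has no steps left. The functions `count_bounded` and `executions_per_bound` are hypothetical names for this illustration. The sketch counts executions rather than running a real program.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts complete executions that use at most `budget` preemptions.
// `current` is the running thread (-1 before the first step). Switching
// away from a thread that still has steps left is a preemption and costs
// one unit of budget; switching after the running thread has finished is
// a non-preempting context switch and is free.
std::uint64_t count_bounded(std::vector<int>& remaining, int current, int budget) {
    std::uint64_t total = 0;
    bool done = true;
    for (int t = 0; t < static_cast<int>(remaining.size()); ++t) {
        if (remaining[t] == 0) continue;
        done = false;
        bool preempts = current >= 0 && t != current && remaining[current] > 0;
        if (preempts && budget == 0) continue;  // budget exhausted: no more preemptions
        --remaining[t];
        total += count_bounded(remaining, t, budget - (preempts ? 1 : 0));
        ++remaining[t];
    }
    return done ? 1 : total;
}

// Iterative context bounding: explore with c = 0, 1, 2, ... and report how
// many executions each bound admits (n threads, k steps each).
std::vector<std::uint64_t> executions_per_bound(int n, int k, int max_bound) {
    assert(n >= 1 && k >= 0 && max_bound >= 0);
    std::vector<std::uint64_t> counts;
    for (int c = 0; c <= max_bound; ++c) {
        std::vector<int> remaining(n, k);
        counts.push_back(count_bounded(remaining, -1, c));
    }
    return counts;
}
```

For two threads with two steps each, bounds 0, 1, and 2 admit 2, 4, and 6 executions, so the search widens gradually until it covers all six interleavings.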

Iterative context-bounding has desirable properties

Property 0: Easy to implement

Property 1: Polynomial state space

• Terminating program with fixed inputs and deterministic threads
• n threads, k steps each, c preemptions
• Number of executions <= $\binom{nk}{c} \cdot (n+c)! = O((n^2 k)^c \cdot n!)$
• Exponential in n and c, but not in k
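One way to see where the asymptotic form comes from, treating c as a fixed constant, is the following short derivation. It is a sketch of the arithmetic only, using the same symbols n, k, and c as above.

```latex
\begin{align*}
\binom{nk}{c} &\le (nk)^c
  && \text{choose $c$ preemption points among $nk$ steps} \\
(n+c)! &= n! \prod_{i=1}^{c} (n+i) \le n! \, (n+c)^c
  && \text{permute the $n+c$ atomic blocks} \\
\binom{nk}{c} \cdot (n+c)! &\le \bigl(nk \, (n+c)\bigr)^c \cdot n! \\
  &\le (1+c)^c \, (n^2 k)^c \cdot n!
  && \text{since } n + c \le (1+c)\,n \text{ for } n \ge 1 \\
  &= O\bigl((n^2 k)^c \cdot n!\bigr)
  && \text{for fixed } c
\end{align*}
```

The step count k appears only in the polynomial factor $(n^2 k)^c$, which is why the bound is not exponential in k.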

[Figure: the threads' step sequences (x = 1; … y = 1; and x = 2; … y = 2;) are cut at the chosen preemption points into atomic blocks.]

• Choose c preemption points
• Permute n+c atomic blocks

Property 2: Deep exploration possible with small bounds

• A context-bounded execution has unbounded depth
  • A thread may execute an unbounded number of steps within each context
• Even a context-bound of zero yields complete terminating executions

Property 3: Finds the ‘simplest’ error trace

• Finds the smallest number of preemptions to the error
• Number of preemptions is a better metric of error complexity than execution length

Property 4: Coverage metric

• If search terminates with context-bound of c, then any remaining error must require at least c+1 preemptions
• Intuitive estimate for
  • The complexity of the bugs remaining in the program
  • The chance of their occurrence in practice

Property 5: Lots of bugs with small number of preemptions

• A non-blocking implementation of the work-stealing queue algorithm
  • Bounded circular buffer accessed concurrently by readers and stealers
• Developer provided
  • Test harness
  • Three buggy variations of the program
• Each bug found with at most 2 preemptions
  • Executions with 35 preemptions are possible!

Context-bounding + Partial-order reduction

• Algorithm:
  • Assume the program is data-race free
  • Context switch only at synchronization points
  • Explore executions with c preemptions
  • Check for data-races in each execution
• Theorem:
  • If the algorithm terminates without reporting races, then the program has no assertion failures reachable with c preemptions
  • Requires that a thread can block only at synchronization points
  • Proof (Musuvathi and Qadeer, PLDI 2007)

Bugs found

Program              KLOC   Max Num Threads   Bugs Reachable with Preemption Count
                                              0    1    2    3    Total
Bluetooth            0.4    3                 0    1    0    0    1
Work-Stealing Queue  1.3    3                 0    1    2    0    3
Transaction Manager  7.0    2                 0    0    2    1    3
APE                  18.9   4                 2    1    1    -    4
Dryad Channels       16.0   5                 1    5    1    -    7

// Function called by a worker thread
// of RChannelReaderImpl
void RChannelReaderImpl::AlertApplication(RChannelItem* item)
{
    // Notify Application

    // XXX: Preempt here for the bug
    EnterCriticalSection(&m_baseCS);
    // process before exit
    LeaveCriticalSection(&m_baseCS);
}

// Function called by the main thread
void TestChannel(WorkQueue* workQueue, ...)
{
    // Creating a channel
    // allocates worker threads
    RChannelReader* channel = new RChannelReaderImpl(..., workQueue);

    // ... do work here

    channel->Close();
    // wrong assumption that channel->Close()
    // waits for worker threads to be finished

    delete channel;
    // BUG: deleting the channel when
    // worker threads still have a valid
    // reference to the channel
}
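For contrast, the following sketch shows one way the lifetime problem above could be avoided in portable C++: make the close operation actually wait for the worker threads. The `Channel` class here is a hypothetical stand-in written with `std::thread`; it is not the Dryad API and not necessarily how the Dryad bug was fixed.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the channel in the slide; not the Dryad API.
class Channel {
public:
    explicit Channel(int workers) {
        // Each worker holds a reference to this object, as in the slide.
        for (int i = 0; i < workers; ++i)
            workers_.emplace_back([this] { AlertApplication(); });
    }

    // Unlike the buggy assumption, this Close() really waits for the workers.
    void Close() {
        for (auto& w : workers_)
            if (w.joinable()) w.join();
    }

    // Destruction is safe even if the caller forgot to call Close().
    ~Channel() { Close(); }

    int alerts() const { return alerts_.load(); }

private:
    void AlertApplication() { ++alerts_; }  // touches members of *this

    std::atomic<int> alerts_{0};        // declared first, so it exists before workers start
    std::vector<std::thread> workers_;
};
```

With this design, `delete channel` cannot run while a worker still uses the object, because both `Close()` and the destructor join every worker first.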







Facts about Dryad error trace

• Long error trace but requires only one preemption
  • Depth-bounding cannot find it without a lot of luck
• The error trace has 6 non-preempting context switches
  • It is important to leave unbounded the number of non-preempting context switches
• This (and the other 6 errors) in Dryad remained in spite of careful regression testing and months of production use


Coverage vs. Context-bound

Dryad (coverage vs. time)

Current CHESS applications (work in progress)

• Dryad (library for distributed dataflow programming)
• Singularity/Midori (OS in managed code)
• User-mode drivers
• Cosmos (distributed file system)
• SQL database

Conclusion

• Concurrency is important
  • Building robust concurrent software is still a challenge
  • Lack of debugging and testing tools
• CHESS: Concurrency unit-testing
  • Exhaustively try all interleavings
  • Attempt to seamlessly integrate with existing test frameworks
  • Provide replay capability
• Iterative context-bounding algorithm key to the design
