Unit Testing Concurrent Software

  • Published on
    25-May-2015

  • View
    1.153

  • Download
    0

Transcript

  • 1. Unit Testing Concurrent SoftwareWilliam PughNathaniel AyewahDept. of Computer ScienceDept. of Computer ScienceUniv. of MarylandUniv. of MarylandCollege Park, MD College Park, MDpugh@cs.umd.edu ayewah@cs.umd.edu ABSTRACTvarious locking protocols, avoid race conditions, and coordi- There are many diculties associated with developing cor- nate multiple threads. Many developers manage this com- rect multithreaded software, and many of the activities thatplexity by separating the concurrent logic from the business are simple for single threaded software are exceptionally logic, limiting concurrency to small abstractions like latches, hard for multithreaded software. One such example is con- semaphores and bounded buers. These abstractions can structing unit tests involving multiple threads. Given, for then be tested independently, adequately validating the cor- example, a blocking queue implementation, writing a testrectness of most of the concurrent aspects of the application. case to show that it blocks and unblocks appropriately us- One strategy for testing concurrent components is to write ing existing testing frameworks is exceptionally hard. In thislarge test cases that contain many possible interleavings and paper, we describe the MultithreadedTC framework whichrun these test cases many times hopefully inducing some rare allows the construction of deterministic and repeatable unitbut faulty interleavings. Many concurrent test frameworks tests for concurrent abstractions. This framework is not de-are based on this paradigm and provide facilities to increase signed to test for synchronization errors that lead to rare the diversity of interleavings [5, 9]. But this strategy may probabilistic faults under concurrent stress. Rather, thismiss some interleavings that lead to failures, and do not framework allows us to demonstrate that code does provide yield consistent results. specic concurrent functionality (e.g., a thread attemptingA dierent strategy is to test specic interleavings sepa- to acquire a lock is blocked if another thread has the lock). rately. But some critical interleavings are hard to exerciseWe describe the framework and provide empirical compar-because of the presence of blocking and/or timing. This isons against hand-coded tests designed for Suns Java con- paper describes MultithreadedTC, a framework that allows currency utilities library and against previous frameworksa test designer to exercise a specic interleaving of threads that addressed this same issue. The source code for thisin an application. It features a metronome (or clock) that framework is available under an open source license.allows test designers to coordinate threads even in the pres- ence of blocking and timing issues. The clock advances when all threads are blocked; test designers can delay operations Categories and Subject Descriptorswithin a thread until the clock has reached a desired tick. D.2.5 [Software Engineering]: Testing and DebuggingMultithreadedTC also features a concise syntax for speci- testing tools fying threads and eliminates much of the scaolding code required to make multithreaded tests work well with JUnit. General Terms It can also detect deadlocks and livelocks. Experimentation, Reliability, Verication 2. A SIMPLE EXAMPLE Keywords Figure 1 shows four operations on a bounded buer, dis- Java, testing framework, concurrent abstraction, JUnit test tributed over two threads. A bounded buer allows users to cases, MultithreadedTCtake elements that have been put into the container, in the order in which they were put. It has a xed capacity and 1. INTRODUCTION both put and take cause the calling thread to block if the container is full or empty respectively. Concurrent applications are often hard to write and even In this example, the call to put 17 should block thread 1 harder to verify. The application developer has to adhere tountil the assertion take = 42 in thread 2 frees up space in the bounded buer. But how does the test designer guar- antee that thread 1 blocks before thread 2s assertion, and Permission to make digital or hard copies of all or part of this work for does not unblock until after the assertion? personal or classroom use is granted without fee provided that copies areOne option is to put thread 2 to sleep for a xed time pe- not made or distributed for prot or commercial advantage and that copies riod long enough to guarantee that thread 1 has blocked. bear this notice and the full citation on the rst page. To copy otherwise, toBut this introduces unnecessary timing dependence the republish, to post on servers or to redistribute to lists, requires prior specic resulting code does not play well in a debugger, for in- permission and/or a fee. ASE07, November 49, 2007, Atlanta, Georgia, USA.stance and the presence of multiple timing dependent Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.units would make a large example harder to understand.

2. class BoundedBufferTest extends MultithreadedTestCase {ArrayBlockingQueue buf;@Override public void initialize() { buf = new ArrayBlockingQueue(1);}public void thread1() throws InterruptedException { buf.put(42); buf.put(17); assertTick(1);}public void thread2() throws InterruptedException { waitForTick(1); assertTrue(buf.take() == 42); assertTrue(buf.take() == 17); Figure 1: (a) Operations in multiple threads on a} bounded buer of capacity 1 must occur in a par- @Override public void finish() { assertTrue(buf.isEmpty()); ticular order, (b) MultithreadedTC guarantees the} correct order. } // JUnit testpublic void testBoundedBuffer() throws Throwable { Another approach is to make thread 2 wait on a latch that TestFramework.runOnce( new BoundedBufferTest() ); is released at the appropriate time. But this will not work} here because the only other thread available to release the Figure 2: Java version of bounded buer test. latch (thread 1) is blocked.MultithreadedTCs solution is to superimpose an exter- nal clock on both threads. The clock runs in a separate The test sequence is run by invoking one of the run meth- (daemon) thread which periodically checks the status of allods in TestFramework. (This can be done in a JUnit test, test threads. If all the threads are blocked, and at least one as in Figure 2.) TestFramework is also responsible for regu- thread is waiting for a tick, the clock advances to the next lating the clock. It creates a thread that periodically checks desired tick. In Figure 1, thread 2 blocks immediately, andthe status of the clock and test threads and decides when to thread 1 blocks on the call to put 17. At this point, allrelease threads that are waiting for a tick. It can also detect threads are blocked with thread 2 waiting for tick 1. So the deadlock: when all threads are blocked and none of them is clock thread advances the clock to tick 1 and thread 2 iswaiting for a tick or a timeout. More details on creating and free to go. Thread 2 then takes from the buer with the as-running tests are available on the project website at [3]. sertion take = 42, and this releases Thread 1. Hence with the test, we are condent that (1) thread 1 blocked (because the clock advanced) and (2) thread 1 was not released until4. TEST FRAMEWORK EVALUATION after thread 2s rst assertion (because of the nal assertionTo evaluate MultithreadedTC, we performed some quali- in thread 1).tative and quantitative comparisons with other test frame-MultithreadedTC is partly inspired by ConAn [11, 12,works, focusing on how easy it is to write and understand 13], a script-based concurrency testing framework which also test cases, and how expressive each test framework is. Sec- uses a clock to coordinate the activities of multiple threads. tion 4.1 compares MultithreadedTC with TCK tests created We provide a comparison of ConAn and MultithreadedTC to validate Javas concurrency utilities library [1]. We take in Section 4.2.this test suite to represent a typical JUnit-based implemen-tation of multithreaded tests. Section 4.2 compares Multi-threadedTC with ConAn. 3. IMPLEMENTATION DETAILS Figure 2 shows a Java implementation of the test in Figure 4.1 JSR 166 TCK comparison 1 and illustrates some of the basic components of a Multi-JSR (Java Specication Request) 166 is a set of proposals threadTC test case. Each test is a subclass of Multithread-for adding a concurrency utilities library to Java [1]. This edTestCase. Threads are specied using thread methodsincludes components like locks, latches, thread locals and which return void, have no arguments and have names pre- bounded buers. Its TCK (Technology Compatibility Kit) xed with the word thread. Each test may also override includes a suite of JUnit tests that validate the correctness the initialize() or finish() methods provided by the of the librarys implementation against the specication pro- base class. When a test is run, initialize() is invokedvided in the JSR. We examined 258 of these tests which at- rst, then all thread methods are invoked simultaneously tempt to exercise specic interleavings in 33 classes, and im- in separate threads, and nally when all the thread meth-plemented them using MultithreadedTC. These tests do not ods have completed, finish() is invoked. This is analogous measure performance, nor do they look for rare errors that to the organization of setup, test and teardown methods in may occur when a concurrent abstraction is under stress. JUnit [2].In general MultithreadedTC allows the test designer to Each MultithreadedTestCase maintains a clock that starts use simpler constructs that require fewer lines of code. Our at time 0. The clock advances when all threads are blocked evaluation also demonstrates that MultithreadedTC can ex- and at least one thread is waiting for a tick. The methodpress all the strategies used in the TCK tests and is often waitForTick(i) blocks the calling thread until the clock more precise and expressive, especially for tests that do not reaches tick i. A test designer can also temporarily prevent require a timing dependency. Table 1 gives an overview of the clock from advancing when all threads are blocked by the comparison, including the observation that Multithread- using freezeClock(). edTC requires fewer local variables and anonymous inner 3. #ticktime 200#monitor m WriterPreferenceReadWriteLock Table 1: Overall comparison of TCK tests and MTC ... (MultithreadedTC) implementation ldots#begin MeasureTCK MTC #test C1 C13#tick Lines of Code 80037070#thread Bytecode Size 1017K980K#excMonitor m.readLock().attempt(1000); #end *Local variables per method 1.120.12 #valueCheck time() # 1 #end #end *Av. anon. inner classes/method 0.380.01 #end * Metrics measured by the software quality tool Swat4j [4] #tick #thread #excMonitor m.readLock().release(); #end#valueCheck time() # 2 #end threadFailed = false; #end ...#end Thread t = new Thread(new Runnable() { #endpublic void run() { try {Figure 4: The script for a ConAn test to validate a// statement(s) that may cause exception } catch(InterruptedException ie) { Writer Preference Read Write LockthreadFailed = true;fail("Unexpected exception"); } ConAn (Concurrency Analyzer) is a script-based test frame-} });work that, like MultithreadedTC, uses a clock to synchro- t.start(); nize the actions in multiple threads [12]. ConAn also aims to ...make it possible to write succinct test cases by introducing a t.join(); assertFalse(threadFailed); script-based syntax illustrated in Figure 4. To run the test,the script is parsed and translated into a Java test driver Figure 3: Handling threads in JUnit testscontaining methods which interact with some predened Co-nAn library classes and implement the tests specied in thescript. classes, in part because it does not make the test designer Each test is broken into tick blocks that contain one or construct threads manually. The TCK tests generally usemore thread blocks (which may have optional labels). A anonymous inner classes as illustrated in Figure 3. This thread block may span multiple tick blocks by using the syntax is quite verbose, especially since all the useful func- same label. Thread blocks contain valid Java code or use tionality is in the method run().custom script tags to perform common operations like han-Another source of verbosity in Figure 3 is the scaolding dling exceptions (excMonitor) and asserting values and equal- required to handle exceptions. While exceptions in Multi-ities (valueCheck). Any blocking statements in a given tick threadedTC cause the test to fail immediately, exceptionsmay unblock at a later tick, which is conrmed by checking in a JUnit test thread will kill the thread but will not failthe time. the test. To force failure, the TCK tests use additional try- The major dierence between the two frameworks is that catch blocks to catch thread exceptions and set a ag (suchConAn relies on a timer-based clock which ticks at regular as the threadFailed ag in Figure 3). MultithreadedTCintervals, while MultithreadedTC advances the clock when also eliminates the last join statement in Figure 3 because it all threads are blocked. ConAns strategy introduces a tim- checks to make sure all threads complete and uses finish() ing dependence even in tests that do not involve any timing to execute code that must run after all threads complete.like the bounded buer example in Figure 1.Some of the dierences we have just described are quanti-The two frameworks also use dierent paradigms to or- ed in Table 2. This table includes simple counts of the num-ganize tests. ConAn organizes tests by ticks, while Multi- ber of TCK tests that use the constructs described above threadedTC organizes tests by threads. The two paradigms (anonymous inner classes, try-catch blocks and joins) in a are dicult to compare quantitatively as each involves trade- way that is eliminated by the corresponding Multithread- os for the test designer. MultithreadedTC allows designers edTC test. It also counts the total number of times theseto inadvertently introduce indeterminism into tests by plac- constructs are removed.ing a waitForTick at the wrong place, while ConAn forcesdesigners to deterministically put code segments into the 4.2 Comparison to ConAntick in which they are to run. On the other hand ConAnsparadigm can be confusing when writing tests because thecode in one thread is spread out spatially into many ticks, Table 2: The number of TCK tests with constructs while MultithreadedTC places all the code for a thread into that were removed in the MultithreadedTC version one method, consistent with Javas paradigm. Also, Co- and the total number of constructs removed.nAns organization hardcodes the content of ticks and doesnot allow for relative time which is useful if blocking occursConstruct RemovedTestsRemoved in a loop.Anonymous inner classes 216 257Another signicant dierence between the two frameworksThreads join() method193 239 is that ConAn relies on a custom script-based syntax, whiletry-catch blocks104 106 MultithreadedTC is pure Java and borrows metaphors fromThreads sleep() method 198 313 JUnit. This means that test designers using ConAn do nothave access to many of the powerful features provided in 4. lead to faults. Its algorithm also modies the original pro- Table 3: Line count comparisons between Multi- gram to add synchronization. threadedTC and ConAn for tests on the Writer Pref-ConTest and some other frameworks like GroboUtils also erence Read Write Lock provide facilities to run the tests many times in hopes of nd-Test Suite MTCConAn ConAn Drivering rare failing interleavings. ConTest relies on source levelBasic Tests 274829 2192 seeding using standard Java Thread methods like sleep()Tests with Interrupts 456 1386 3535 and yield() to increase the chance that a dierent inter-Tests with Timeouts 389585 1629 leaving is used each time the test is run. Another approachTotal Line Count 1119 2800 7356 would be to take control of the thread scheduler, but thiswould require modifying the Java virtual machine, and makethe nal solution less portable and platform independent. modern integrated development environments (IDEs) such6. CONCLUSIONS as refactoring, syntax highlighting, and code completion. Concurrency in applications is receiving new emphasis as ConAn was partially evaluated using tests written to vali- chip manufacturers develop multi-core processors. Software date an implementation of the WriterPreferenceReadWrite- engineers need more eective approaches of creating fault Lock. We implemented MultithreadedTC versions of these free programs that exploit the level of concurrency these pro- tests and compared the number lines of code of the two im- cessors can oer. MultithreadedTC provides a Java based plementations. This comparison, shown in Table 3 indicates framework for writing tests that exercise specic interleav- that the MultithreadedTC tests are more succinct than theings of concurrent programs. It was designed for testing ConAn scripts and signicantly more succinct than ConAnssmall concurrent abstractions in which all possible interleav- generated Java drivers.ings can be enumerated and tested separately. The frame-work is robust in the face of blocking and timing issues. Theresulting tests are succinct, JUnit compatible, and involvevery little overhead. 5. RELATED WORK The source code and documentation for MultithreadedTCMany java test frameworks provide basic facilities that al- are available under an open source license at [3]. low test designers to run unit tests with multiple threads but do not remove the resulting nondeterminism. JUnit [2] pro- 7. REFERENCES vides an ActiveTestSuite extension which runs all the tests in the suite simultaneously in dierent threads and waits [1] Jsr 166: Concurrency utilities. for the threads to complete. In TestNG [6], tests are run http://www.jcp.org/en/jsr/detail?id=166, 2004. in parallel if the parallel parameter is set. An additional [2] Junit testing framework. http://www.junit.org, 2007. parameter, thread-count, is used to specify the size of the [3] Multithreadedtc. thread pool. Another framework, GroboUtils [5] extends thehttp://code.google.com/p/multithreadedtc/, 2007. JUnit framework to provide support for writing thread safe[4] Swat4j. http://www.codeswat.com, 2007. multithreaded tests without having to write much thread [5] M. Albrecht. Using multi-threaded tests. handling code. A thread is specied by extending a pro- http://groboutils.sourceforge.net/testing- vided TestRunnable class and implementing the runTest() junit/using mtt.html, September method. Other classes are provided to run multiple in-2004. stances of this thread simultaneously, enforce a time limit [6] C. Beust and A. Popescu. Testng: Testing, the next (after which all threads are killed) and regularly monitorgeneration. http://www.testng.org, 2007. the execution of the threads to look for inconsistent states. [7] R. H. Carver and K.-C. Tai. Replay and testing forMultithreadedTCs eort to control the synchronization concurrent programs. IEEE Softw., 8(2):6674, 1991. of multiple threads is partly inspired by ConAn [11, 12,[8] J.-D. Choi and H. Srinivasan. Deterministic replay of 13]. ConAn extends a technique for testing concurrent mon-java multithreaded applications. In SPDT 98: itors introduced by Brinch Hansen [10]. Both ConAn andProceedings of the SIGMETRICS symposium on Hansens method use a clock that is incremented at reg- Parallel and distributed tools, pages 4859, New York, ular time intervals and used to synchronize thread oper-NY, USA, 1998. ACM Press. ations. But ConAn organizes tests into tick blocks while[9] O. Edelstein, E. Farchi, Y. Nir, G. Ratsaby, and S. Ur. Hansens method organizes tests into threads and uses await Multithreaded java program test generation. IBM statements (which is the paradigm followed by Multithread-Systems Journal, 41(1):111125, 2002. edTC).[10] P. B. Hansen. Reproducible testing of monitor. Softw.,Other frameworks and approaches have focused on allow- Pract. Exper., 8(6):721729, 1978. ing tests to run nondeterministically, recording the inter-[11] B. Long. Testing Concurrent Java Components. PhD leavings that fail, and deterministically replaying them by thesis, The University of Queensland, July 2005. transforming the original program. Carver and Tai [7] use a language-based approach (in Ada) that transforms the origi-[12] B. Long, D. Homan, and P. Strooper. Tool support nal program under test into an equivalent program that uses for testing concurrent java components. IEEE semaphores and monitors to control synchronization. Con-Transactions on Software Engineering, 29(6), 2003. Test [9] is a Java testing framework that uses a source level[13] L. Wildman, B. Long, and P. A. Strooper. Testing version of the deterministic replay algorithm introduced in java interrupts and timed waits. In APSEC, pages DejaVu [8] to record and replay specic interleavings that438447, 2004.

Recommended

View more >