
Assessing Unit Test Quality



Page 1: Assessing Unit Test Quality

Welcome!

http://www.sevajug.org

Page 2: Assessing Unit Test Quality

Assessing Unit Test Quality

Matt Harrah
Southeast Virginia Java Users Group
20 Nov 2007

Page 3: Assessing Unit Test Quality

About this presentation

• This presentation is more like a case study than a lecture

• Several different technologies will be discussed
• In particular, we will discuss the suitability of various technologies and approaches to solve a computing problem
• We will also discuss the computing problem itself
• Suggestions and questions are welcomed throughout

Page 4: Assessing Unit Test Quality

Before we begin: Some definitions

• Unit testing – testing a single class to ensure that it performs according to its API specs (i.e., its Javadoc)
• Integration testing – testing that the units interact appropriately
• Functional testing – testing that the integrated units meet the system requirements
• Regression testing – testing that changes to code have not (re-)introduced unexpected changes in performance, inputs, or outputs

Page 5: Assessing Unit Test Quality

Part I: The Problem

Page 6: Assessing Unit Test Quality

Restating the obvious

• The Brass Ring: We generally want our code to be as free of bugs as is economically feasible
• Testing is the only way to know how bug-free your code is
• All four kinds of testing mentioned a minute ago can be automated with repeatable suites of tests

Page 7: Assessing Unit Test Quality

Restating the obvious

• The better your test suite, the more confidence you can have in your code’s correctness
• JUnit is the most commonly used way to automate unit tests for Java code
  – Suites of repeatable tests are commonly built up over time
  – QA involves running these suites of tests on a regular basis

Page 8: Assessing Unit Test Quality

Brief Digression

• JUnit can also be used to do integration, functional, and regression testing
  – Integration tests theoretically should create two objects and test their interactions
  – Functional tests can simulate the user interacting with the system and verify its outcomes
  – Regression testing is typically making sure that changes do not introduce test failures in the growing suite of automated tests

Page 9: Assessing Unit Test Quality

So… if the better your test suite, the better the code, the real question is:

Page 10: Assessing Unit Test Quality

Mock Objects

• Are you isolating your objects under test?
  – If a test uses two objects and the objects interact, a test failure can be attributed to either of the two objects, or to the fact that they were not meant to interact
  – Mock objects are a common solution
    • One and only one real code object is tested – the other objects are “mock objects” which simulate the real objects for test purposes
    • Allows the test writer to simulate conditions that might otherwise be difficult to create
  – This problem is well-known and amply addressed by several products (e.g., EasyMock)
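The idea can be illustrated without any mocking library at all. The names below (OrderProcessor, MailService) are invented for illustration – the talk names no specific code – and a product like EasyMock essentially generates this kind of stand-in for you:

```java
// Hypothetical example: OrderProcessor is the one real object under test;
// MailService is a collaborator we replace with a mock.
interface MailService {
    void send(String to, String body);
}

class OrderProcessor {
    private final MailService mail;

    OrderProcessor(MailService mail) { this.mail = mail; }

    void process(String customer) {
        // ...real order-processing work would happen here...
        mail.send(customer, "Your order has shipped");
    }
}

// Hand-rolled mock: records calls instead of sending real mail,
// so a failure in the test can only be blamed on OrderProcessor.
class MockMailService implements MailService {
    int sendCount = 0;
    String lastRecipient = null;

    public void send(String to, String body) {
        sendCount++;
        lastRecipient = to;
    }
}

public class MockDemo {
    public static void main(String[] args) {
        MockMailService mock = new MockMailService();
        new OrderProcessor(mock).process("alice@example.com");
        System.out.println(mock.sendCount);      // 1
        System.out.println(mock.lastRecipient);  // alice@example.com
    }
}
```

The mock also makes hard-to-create conditions trivial: a `send` that throws would let the test exercise OrderProcessor’s error handling without a broken mail server.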

Page 11: Assessing Unit Test Quality

Code Coverage

• Do you have enough tests? What’s tested and what isn’t?
  – A well-known problem with numerous tools to help, such as Emma, JCoverage, Cobertura, and Clover. These tools monitor which pieces of the code under test get executed during the test suite.
  – All the code that executed during the tests is considered covered; everything else is considered uncovered.
  – This provides a numeric measurement of test coverage (e.g., “Package x has 49% class coverage”)

Page 12: Assessing Unit Test Quality

JUnit Fallacy #1

• “The code is just fine – all our tests pass”
• Test success does not mean the code is fine
• Consider the following test:

    public void testMethod() {
        ; // Do absolutely nothing
    }

This test will pass every time.

Page 13: Assessing Unit Test Quality

What the world really needs

• Some way of measuring how rigorous each test is
  – A test that makes more assertions about the behaviour of the class under test is presumably more rigorous than one that makes fewer assertions
  – If only we had some sort of measure of how many assertions are made per something-or-other

Page 14: Assessing Unit Test Quality

“Assertion Density”

• Assertion density for a test is defined by the equation shown, where
  – A is the assertion density
  – a is the number of assertions made during the execution of the test
  – m is the number of method calls made during the execution of the test
• Yep, I just made this up
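The equation itself was a graphic on the slide and did not survive the transcript. Given the three definitions above, the natural reading – my assumption, not confirmed by the deck – is simply assertions per method call:

```latex
A = \frac{a}{m}
```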

Page 15: Assessing Unit Test Quality

JUnit Fallacy #2

• “Our code is thoroughly tested – Cobertura says we have 95% code coverage”
• Covered is not the same as tested
• Many modules call other modules, which call other modules.

Page 16: Assessing Unit Test Quality

Indirect Testing

[Diagram: Test A tests Class A; Class A calls Class B; Class B calls Class C]

• Class A, Class B, and Class C all execute as Test A runs
• Code coverage tools will register Class A, Class B, and Class C as all covered, even though there was no test specifically written for Class B or Class C

Page 17: Assessing Unit Test Quality

What the world really needs

• Some way of measuring how directly a class is tested
  – A class that is tested directly and explicitly by a test designed for that class is better-tested than one that only gets run when some other class is tested
  – If only we had some sort of “test directness” measure…
  – Perhaps a reduced quality rating the more indirectly a class is tested?

Page 18: Assessing Unit Test Quality

“Testedness”

• Testedness is defined by the formula shown, where
  – t is the testedness
  – d is the test distance
  – n_d is the number of calls at test distance d
• Yep, I made this one up too
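Here too the formula was a slide graphic that is missing from the transcript. One plausible shape – purely my guess, but consistent with the “reduced quality rating the more indirectly a class is tested” idea on the previous slide – would weight each call by an exponentially decaying factor of its test distance:

```latex
t = \frac{\sum_d n_d \cdot 2^{-d}}{\sum_d n_d}
```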

Page 19: Assessing Unit Test Quality

Part II: Solving the Problem

(or, at least attempting to…)

Page 20: Assessing Unit Test Quality

Project PEA

• Project PEA – named after The Princess and the Pea
• Primary goals:
  – Collect and report test directness / testedness of code
  – Collect and report assertion density of tests
• Start with test directness
• Add assertion density later

Page 21: Assessing Unit Test Quality

Project PEA

• Requirements:
  – No modifications to source code or tests required
  – Test results not affected by data gathering
  – XML result set
• Ideals:
  – Fast
  – Few restrictions on how the tests can be run

Page 22: Assessing Unit Test Quality

Approach #1: Static Code Analysis

• From looking at a test’s imports, determine which classes are referenced directly by tests
• From each class, look at what calls what
• Assemble a call network graph
• From each node in the graph, count steps to a test case
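The steps above amount to a small graph search. A sketch, with all names invented (in reality the edges would come from parsing imports and call sites, and the talk shows no PEA source):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of approach #1: a call graph plus a breadth-first
// search from each class back to the nearest test case.
public final class CallGraph {
    // Reverse edges: for each class, the set of classes that call it.
    private final Map<String, Set<String>> callers = new HashMap<String, Set<String>>();
    private final Set<String> tests = new HashSet<String>();

    public void addCall(String from, String to) {
        if (!callers.containsKey(to)) callers.put(to, new HashSet<String>());
        callers.get(to).add(from);
    }

    public void markAsTest(String cls) { tests.add(cls); }

    // Steps from cls back to the nearest test case, or -1 if no test reaches it.
    public int stepsToTest(String cls) {
        Map<String, Integer> dist = new HashMap<String, Integer>();
        Deque<String> queue = new ArrayDeque<String>();
        dist.put(cls, 0);
        queue.add(cls);
        while (!queue.isEmpty()) {
            String cur = queue.remove();
            if (tests.contains(cur)) return dist.get(cur);
            for (String caller : callers.getOrDefault(cur, Collections.<String>emptySet())) {
                if (!dist.containsKey(caller)) {
                    dist.put(caller, dist.get(cur) + 1);
                    queue.add(caller);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        CallGraph g = new CallGraph();
        g.markAsTest("TestA");
        g.addCall("TestA", "ClassA");  // TestA calls ClassA
        g.addCall("ClassA", "ClassB"); // ClassA calls ClassB
        System.out.println(g.stepsToTest("ClassA")); // 1
        System.out.println(g.stepsToTest("ClassB")); // 2
    }
}
```

As the next slides show, the graph itself is the weak point: reflection and polymorphism mean the static edges are wrong or missing.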

Page 23: Assessing Unit Test Quality

Approach #1: Static Code Analysis

• Doesn’t work
• Reflective calls defeat static detection of what is being called
• Polymorphism defeats static detection of what is being called

Page 24: Assessing Unit Test Quality

Approach #1: Static Code Analysis

• Consider:

    class TestMyMap extends AbstractMapTestCase {
        public void testFoo() {
            // Map m is defined in the superclass
            m.put("bar", "baz");
        }
    }

• What concrete class’ put() method is being called?
• Java’s late binding makes static code analysis unsuitable here

Page 25: Assessing Unit Test Quality

Approach #2: Byte Code Instrumentation

• Modify the compiled .class files to call a routine on each method’s entry
• This routine gets a dump of the stack and looks through it until it finds a class that is a subclass of TestCase
• Very similar to what other code coverage tools like Emma do (except those tools don’t examine the stack)
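The talk does not show the routine itself, but its core can be sketched with plain JDK calls. Everything here (the Tabulator name, the name-based test check) is my own illustration; a real implementation would load each class and check whether it actually extends junit.framework.TestCase:

```java
public final class Tabulator {
    // Called (via the injected byte code) on every method entry:
    // walk the current thread's stack until a test frame is found.
    public static int distanceFromTest() {
        StackTraceElement[] frames = Thread.currentThread().getStackTrace();
        // frames[0] is getStackTrace itself and frames[1] is this method,
        // so frames[2] is the instrumented method being entered.
        for (int i = 2; i < frames.length; i++) {
            if (looksLikeTest(frames[i].getClassName())) {
                // 0 = the test method itself, 1 = called directly by a test, ...
                return i - 2;
            }
        }
        return -1; // no test anywhere on the stack
    }

    // Crude name-based stand-in for "is a subclass of TestCase".
    static boolean looksLikeTest(String className) {
        return className.endsWith("Test") || className.endsWith("TestCase");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeTest("com.example.FooTest")); // true
        System.out.println(looksLikeTest("com.example.Foo"));     // false
    }
}
```

Calling this on every method entry is exactly what makes the approach so slow: as the next slides explain, each stack dump forces the JVM to pause the thread.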

Page 26: Assessing Unit Test Quality

Approach #2: Byte Code Instrumentation

• There are several libraries out there for modifying .class files
  – BCEL from Apache
  – ASM from ObjectWeb
• ASM was much easier to use than BCEL
• Wrote an Ant task to go through the class files and add a call to a tabulating routine that examined the stack

Page 27: Assessing Unit Test Quality

Approach #2: Byte Code Instrumentation

• It did work, but it was unbearably slow – typically 1000x slower, sometimes worse
• This is because getting a stack dump is inherently slow
  – Stack dumps are taken on threads
  – Threads are implemented natively
  – To get a dump of the stack, the thread needs to be stopped and the JVM has to pause
• To be viable, PEA cannot use thread stack dumps

Page 28: Assessing Unit Test Quality

Approach #3: Aspect-Oriented Programming

• Use AOP to accomplish the same tasks as the byte code instrumentation
  – Track method entry/exit and maintain a mirror of the stack in the app
  – Calculate and record distance from a test at every method entry
• Avoids the overhead of getting stack dumps from the thread

Page 29: Assessing Unit Test Quality

Approach #3: Aspect-Oriented Programming

• Unsatisfactory – in fact, a complete failure
• Method exits are all over the place in a method
  – Method exits in the byte code do not always correspond to the source structure, particularly where exceptions are concerned
  – Introducing aspect behavior at each exit point can increase the size of the byte code by up to 30%

Page 30: Assessing Unit Test Quality

Approach #3: Aspect-Oriented Programming

• Expanding methods by 30% can (and did) cause them to bump into Java’s 64K limit on method bytecode
  – Instrumented classes would not load
• In addition, AspectJ required you to either:
  – Recompile the source and tests using AspectJ’s compiler; or
  – Create and use your own aspecting-on-the-fly classloader
  – Either way you still hit the 64K barrier

Page 31: Assessing Unit Test Quality

Approach #4: Debugger

• The idea is to write a debugger that monitors the tests in one JVM as they run in another

• The debugger can track method entries and exits as they happen and keep the stack straight

• The code being tested, and the tests themselves, do not need to be aware that the debugger is watching them

Page 32: Assessing Unit Test Quality

Approach #4: Debugger

• Java includes in the SE JDK an architecture called JPDA
  – Java Platform Debugger Architecture
• This architecture allows one JVM to debug another JVM over sockets, shared files, etc.
• It provides an object-oriented API for the debugging JVM
  – Models the debugged JVM as a POJO
  – Provides callbacks for events as they occur in the debugged JVM

Page 33: Assessing Unit Test Quality

Approach #4: Debugger

• JPDA allows you to
  – Specify which events you want to be notified about
  – Specify which packages, etc. should be monitored
  – Pause and restart the other process
  – Inspect the variables, call stacks, etc. of the other process (as long as it’s paused)
• No additional libraries required!

Page 34: Assessing Unit Test Quality

Putting JPDA to work

• First I wrote a debugger process to attach to an already-running JVM
  – By using the <parallel> task in Ant, I can simultaneously launch the debugger and the JUnit tests
  – The debugger will attach to and monitor the other JVM
  – Register interest in callbacks on method entry and exit, exceptions, and JVM death

Page 35: Assessing Unit Test Quality

Putting JPDA to work

• As methods enter and exit, push and pop entries onto a stack maintained in the debugger
  – This effectively mirrors the real stack in the tests
  – Ignore certain packages beyond developer control (such as the JDK itself)
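The debugger-side bookkeeping can be sketched like this, with invented names since the deck shows no PEA source: each monitored thread gets a mirror stack that is pushed on a JPDA method-entry event and popped on method exit, and the test distance is just the depth of the newest frame above the nearest test frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of PEA's mirror of one monitored thread's call stack.
public final class MirrorStack {
    private final Deque<String> frames = new ArrayDeque<String>();

    // Driven by JPDA method-entry / method-exit event callbacks.
    public void methodEntered(String className) { frames.push(className); }

    public void methodExited() {
        if (!frames.isEmpty()) frames.pop();
    }

    // Distance from the newest frame to the nearest test frame:
    // 0 = we are in the test itself, 1 = called directly by a test,
    // -1 = no test on the stack at all.
    public int testDistance() {
        int d = 0;
        for (String cls : frames) {
            if (isTestClass(cls)) return d;
            d++;
        }
        return -1;
    }

    // Stand-in for a real "subclass of junit.framework.TestCase" check.
    private static boolean isTestClass(String className) {
        return className.endsWith("Test") || className.endsWith("TestCase");
    }

    public static void main(String[] args) {
        MirrorStack s = new MirrorStack();
        s.methodEntered("com.example.WidgetTest"); // test method starts
        s.methodEntered("com.example.Widget");     // test calls Widget
        System.out.println(s.testDistance());      // 1 (directly tested)
        s.methodEntered("com.example.Helper");     // Widget calls Helper
        System.out.println(s.testDistance());      // 2 (only indirectly tested)
    }
}
```

Because the mirror is updated incrementally from events, the debugger never needs the expensive full stack dumps that sank approach #2.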

Page 36: Assessing Unit Test Quality

Putting JPDA to work

• As each method is entered, calculate and record its distance in the stack from a test
• Shut down when the other JVM dies
• Just before shutdown, write all the recorded data to a file

Page 37: Assessing Unit Test Quality

Putting JPDA to work

• Remember – an Ant JUnit task can fork multiple JVMs (one per test if you want), so we need to monitor each one
• Multiple JVMs mean multiple files of recorded data that need to be accumulated after all the tests are complete
• Produce an XML file of accumulated results

Page 38: Assessing Unit Test Quality

Results of using JPDA

• Performance is way better than with byte code instrumentation
  – Running with monitoring on slows execution by 100x or less, depending on the code
• The Ant script is kind of complicated
  – JUnit tests and PEA must be run in forked JVMs
  – Special JVM parameters for the debugged process are required
  – JUnit and PEA must be started simultaneously using the <parallel> task (which many people don’t know about)

Page 39: Assessing Unit Test Quality

Results of using JPDA

• The byte code being monitored is completely unchanged
  – No special instrumentation or preparatory build step is required
• An XML file comes out with details about how many method calls were made at what test distance

Page 40: Assessing Unit Test Quality

Results file example

Page 41: Assessing Unit Test Quality

Report Sample

Page 42: Assessing Unit Test Quality

Code Review

Let’s roll that beautiful Java footage!

Page 43: Assessing Unit Test Quality

Future Plans

• Implement assertion density tracking
• Tweak performance
• Make it easier to run the tests with PEA running
  – Perhaps subclass Ant’s <junit> task to run with PEA?
• Documentation (ick)
• Eat my own dog food and use the tool to measure my own JUnit tests
• Sell for $2.5 billion to Scott McNealy and Jonathan Schwartz and retire to Jamaica

Page 44: Assessing Unit Test Quality

Lessons Learned (so far)

• What seems impossible sometimes isn’t
• Creativity is absolutely crucial to solving problems (as opposed to just implementing solutions)
• JPDA is cool – and I had never heard of it
  – I never thought I’d be able to write a debugger, but JPDA made it easy
• The ASM library is also cool – much nicer than BCEL

Page 45: Assessing Unit Test Quality

Wanna help?

• http://pea-coverage.sourceforge.net
• I’d welcome anyone who wants to participate
• Contributing to an open-source project looks good on your resume… hint, hint

Page 46: Assessing Unit Test Quality

Resources

• Java Platform Debugger Architecture – http://java.sun.com/javase/technologies/core/toolsapi/jpda
• ASM – byte code processing library – http://asm.objectweb.org
• BCEL – byte code processing library – http://jakarta.apache.org/bcel
• AspectJ – aspect-oriented Java extension – http://eclipse.org/aspectj
• JUnit – unit testing framework – http://junit.org

Page 47: Assessing Unit Test Quality

Resources

• Emma – code coverage tool – http://emma.sourceforge.net
• Cobertura – code coverage tool – http://cobertura.sourceforge.net
• JCoverage – code coverage tool – http://www.jcoverage.com
• Clover – code coverage tool – http://www.atlassian.com/software/clover