Welcome!
http://www.sevajug.org
Assessing Unit Test Quality
Matt Harrah
Southeast Virginia Java Users Group
20 Nov 2007
About this presentation
• This presentation is more like a case study than a lecture
• Several different technologies will be discussed
• In particular, we will discuss the suitability of various technologies and approaches to solve a computing problem
• We will also discuss the computing problem itself
• Suggestions and questions are welcomed throughout
Before we begin: Some definitions
• Unit testing – testing a single class to ensure that it performs according to its API specs (i.e., its Javadoc)
• Integration testing – testing that the units interact appropriately
• Functional testing – testing that the integrated units meet the system requirements
• Regression testing – testing that changes to code have not (re-)introduced unexpected changes in performance, inputs, or outputs
Part I: The Problem
Restating the obvious
• The Brass Ring: We generally want our code to be as free of bugs as is economically feasible
• Testing is the only way to know how bug-free your code is
• All four kinds of testing mentioned a minute ago can be automated with repeatable suites of tests
Restating the obvious
• The better your test suite, the more confidence you can have in your code’s correctness
• JUnit is the most commonly used way to automate unit tests for Java code
– Suites of repeatable tests are commonly built up over time
– QA involves running these suites of tests on a regular basis
Brief Digression
• JUnit can also be used to do integration, functional, and regression testing
– Integration tests theoretically should create two objects, and test their interactions
– Functional tests can simulate the user interacting with the system and verify its outcomes
– Regression testing is typically making sure that changes do not introduce test failures in the growing suite of automated tests
So… if the better your test suite, the better the code, the real question is:
Mock Objects
• Are you isolating your objects under test?
– If a test uses two objects and the objects interact, a test failure could be caused by either of the two objects, or by the fact that they were not meant to interact
– Mock objects are a common solution
• One and only one real code object is tested – the other objects are “mock objects” which simulate the real objects for test purposes
• Allows the test writer to simulate conditions that might otherwise be difficult to create
– This problem is well-known and amply addressed by several products (e.g., EasyMock)
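To make the isolation idea concrete, here is a minimal hand-rolled sketch in plain Java (no mocking library). Everything in it is hypothetical and not from the talk: `MailSender` stands in for a collaborator, `WelcomeService` is the one real class under test, and `MockMailSender` records calls instead of doing real work. Products such as EasyMock generate this kind of stand-in for you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator interface (an assumption for illustration).
interface MailSender {
    void send(String to, String body);
}

// The one real class under test.
final class WelcomeService {
    private final MailSender sender;

    WelcomeService(MailSender sender) {
        this.sender = sender;
    }

    void welcome(String user) {
        if (user == null || user.isEmpty()) {
            throw new IllegalArgumentException("user required");
        }
        sender.send(user, "Welcome, " + user + "!");
    }
}

// Hand-written mock: records what it was asked to send, sends nothing.
final class MockMailSender implements MailSender {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String to, String body) {
        sent.add(to + ": " + body);
    }
}
```

A test would construct `new WelcomeService(new MockMailSender())`, call `welcome("ann")`, and assert on the mock's `sent` list. If that test fails, only `WelcomeService` can be at fault.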
Code Coverage
• Do you have enough tests? What’s tested and what isn’t?
– Well-known problem with numerous tools to help, such as Emma, JCoverage, Cobertura, and Clover. These tools monitor which pieces of code under test get executed during the test suite.
– All the code that executed during the test is considered covered, and the other code is considered uncovered.
– This provides a numeric measurement of test coverage (e.g., “Package x has 49% class coverage”)
JUnit Fallacy #1
• “The code is just fine – all our tests pass”
• Test success does not mean the code is fine
• Consider the following test:
public void testMethod() {
    ; // Do absolutely nothing
}
This test will pass every time.
What the world really needs
• Some way of measuring how rigorous each test is
– A test that makes more assertions about the behaviour of the class under test is presumably more rigorous than one that makes fewer assertions
– If only we had some sort of measure of how many assertions are made per something-or-other
“Assertion Density”
• Assertion Density for a test is defined by the equation shown, where
– A is the assertion density
– a is the number of assertions made during the execution of the test
– m is the number of method calls made during the execution of the test
• Yep, I just made this up
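The equation itself is not preserved in this transcript. Given the definitions above (assertions per method call), a plausible reading is A = a / m. The sketch below assumes that reading. The class name `AssertionDensity` and the choice to report 0.0 when no method calls were made are my assumptions, not part of the talk.

```java
final class AssertionDensity {
    private AssertionDensity() {
    }

    // Assumed formula: A = a / m (assertions made per method call made).
    static double of(int assertions, int methodCalls) {
        if (assertions < 0 || methodCalls < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        if (methodCalls == 0) {
            // Assumption: a test that calls nothing has density 0 rather than NaN.
            return 0.0;
        }
        return (double) assertions / methodCalls;
    }
}
```

Under this reading, a test that makes 3 assertions across 12 method calls has a density of 0.25, and the do-nothing test from Fallacy #1 scores 0.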
JUnit Fallacy #2
• “Our code is thoroughly tested – Cobertura says we have 95% code coverage”
• Covered is not the same as tested
• Many modules call other modules which call other modules.
Indirect Testing
[Diagram: Test A tests Class A; Class A calls Class B; Class B calls Class C]
• Class A, Class B, and Class C all execute as Test A runs
• Code coverage tools will register Class A, Class B, and Class C as all covered, even though there was no test specifically written for Class B or Class C
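The same situation can be sketched in a few lines of plain Java. The `executed` set is a stand-in for what a coverage tool records; the class names mirror the diagram, and everything else is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class IndirectCoverageDemo {
    // Stand-in for a coverage tool's record of which classes ran.
    static final Set<String> executed = new LinkedHashSet<>();

    static final class C {
        static int value() {
            executed.add("C");
            return 42;
        }
    }

    static final class B {
        static int value() {
            executed.add("B");
            return C.value();
        }
    }

    static final class A {
        static int value() {
            executed.add("A");
            return B.value();
        }
    }

    // "Test A": the only test, and it checks only A's result.
    static void testA() {
        if (A.value() != 42) {
            throw new AssertionError("A returned the wrong value");
        }
    }
}
```

After `testA()` runs, all three classes appear in `executed`, yet nothing was ever asserted about B or C on their own terms.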
What the world really needs
• Some way of measuring how directly a class is tested
– A class that is tested directly and explicitly by a test designed for that class is better-tested than one that only gets run when some other class is tested
– If only we had some sort of “test directness” measure…
– Perhaps a reduced quality rating the more indirectly a class is tested?
“Testedness”
• Testedness is defined by the formula shown, where
– t is the testedness
– d is the test distance
– n_d is the number of calls at test distance d
• Yep, I made this one up too
Part II: Solving the Problem
(or, at least attempting to…)
Project PEA
• Project PEA – Named after The Princess and the Pea
• Primary Goals:
– Collect and report test directness / testedness of code
– Collect and report assertion density of tests
• Start with test directness
• Add assertion density later
Project PEA
• Requirements:
– No modifications to source code or tests required
– Test results not affected by data gathering
– XML result set
• Ideals:
– Fast
– Few restrictions on how the tests can be run
Approach #1: Static Code Analysis
• From looking at a test’s imports, determine which classes are referenced directly by tests
• From each class, look at what calls what
• Assemble a call network graph
• From each node in the graph, count steps to a test case
Approach #1: Static Code Analysis
• Doesn’t work
• Reflective calls defeat static detection of what is being called
• Polymorphism defeats static detection of what is being called
Approach #1: Static Code Analysis
• Consider:
class TestMyMap extends AbstractMapTestCase {
    public void testFoo() {
        // Map m defined in superclass
        m.put("bar", "baz");
    }
}
• What concrete class’s put() method is being called?
• Java’s late binding makes this static code analysis unsuitable
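A small self-contained illustration of the late-binding problem: the call site below only says `Map.put`, and the class whose `put()` actually runs is decided by whatever object arrives at run time. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class LateBindingDemo {
    private LateBindingDemo() {
    }

    // Statically, this is a call to Map.put; the concrete class is unknown here.
    static String putAndReport(Map<String, String> m) {
        m.put("bar", "baz");
        // Only at run time can we say whose put() was executed.
        return m.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(putAndReport(new HashMap<>()));
        System.out.println(putAndReport(new TreeMap<>()));
    }
}
```

A static analyzer looking at `putAndReport` alone cannot tell whether `HashMap.put` or `TreeMap.put` should be counted as called, which is the same difficulty the `TestMyMap` example runs into.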
Approach #2: Byte Code Instrumentation
• Modify the compiled .class files to call a routine on each method’s entry
• This routine gets a dump of the stack and looks through it until it finds a class that is a subclass of TestCase
• Very similar to what other code coverage tools like Emma do (except those tools don’t examine the stack)
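The stack-scanning routine might look roughly like the sketch below. This is not PEA's actual code. To stay free of a JUnit dependency, it takes a predicate on the class name instead of checking for a subclass of TestCase, and it assumes that a method called directly by a test has distance 1. `StackScan`, `Worker`, and `FakeTest` are all hypothetical names.

```java
import java.util.function.Predicate;

final class StackScan {
    private StackScan() {
    }

    // Returns how many frames separate the calling (instrumented) method from
    // the nearest test frame, or -1 if no test frame is on the stack.
    static int distanceToTest(Predicate<String> isTestClass) {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        // frames[0] is this method; frames[1] is the method that called us.
        for (int i = 1; i < frames.length; i++) {
            if (isTestClass.test(frames[i].getClassName())) {
                return i - 1;
            }
        }
        return -1;
    }
}

// Hypothetical code under test; each method plays an "instrumented" method.
final class Worker {
    static int work() {
        return StackScan.distanceToTest(name -> name.endsWith("FakeTest"));
    }

    static int viaHelper() {
        return work();
    }
}

// Hypothetical stand-in for a test case class.
final class FakeTest {
    static int direct() {
        return Worker.work();
    }

    static int indirect() {
        return Worker.viaHelper();
    }
}
```

Here `FakeTest.direct()` reports a distance of 1 and `FakeTest.indirect()` reports 2. Capturing a stack trace on every method entry is the expensive part of this approach.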
Approach #2: Byte Code Instrumentation
• There are several libraries out there for modifying .class files
– BCEL from Apache
– ASM from ObjectWeb
• ASM was much easier to use than BCEL
• Wrote an Ant task to go through class files and add a call to a tabulating routine that examined the stack
Approach #2: Byte Code Instrumentation
• It did work, but it was unbearably slow – typically 1000x slower, and sometimes worse
• This is because getting a stack dump is inherently slow
– Stack dumps are on threads
– Threads are implemented natively
– To get a dump of the stack, the thread needs to be stopped and the JVM has to pause
• To be viable, PEA cannot use thread stack dumps
Approach #3: Aspect-Oriented Programming
• Use AOP to accomplish tasks similar to the byte code instrumentation
– Track method entry/exit and maintain a mirror of the stack in the app
– Calculate and record distance from a test at every method entry
• Avoids the overhead of getting stack dumps from the thread
Approach #3: Aspect-Oriented Programming
• Unsatisfactory – in fact, a complete failure
• Method exits are all over the place in a method
– Method exits in the byte code do not always correspond to the source structure, particularly where exceptions are concerned
– Introducing an aspect behavior at each exit point can increase the size of the byte code by up to 30%
Approach #3: Aspect-Oriented Programming
• Expanding methods by 30% can (and did) cause them to bump into Java’s 64K limit on method bytecode
– Instrumented classes would not load
• In addition, AspectJ required you to either:
– Recompile the source and tests using AspectJ’s compiler; or
– Create and use your own aspecting-on-the-fly classloader
– Either way you still hit the 64K barrier
Approach #4: Debugger
• The idea is to write a debugger that monitors the tests in one JVM as they run in another
• The debugger can track method entries and exits as they happen and keep the stack straight
• The code being tested, and the tests themselves, do not need to be aware that the debugger is watching them
Approach #4: Debugger
• Java includes in the SE JDK an architecture called JPDA
– Java Platform Debugger Architecture
• This architecture allows one JVM to debug another JVM over sockets, shared files, etc.
• It provides an Object-Oriented API for the debugging JVM
– Models the debugged JVM as a POJO
– Provides call-backs for events as they occur in the debugged JVM
Approach #4: Debugger
• JPDA allows you to
– Specify which events you want to be notified about
– Specify which packages, etc. should be monitored
– Pause and restart the other process
– Inspect the variables, call stacks, etc. of the other process (as long as it’s paused)
• No additional libraries required!
Putting JPDA to work
• First I wrote a debugger process to attach to an already running JVM
– By using the <parallel> task in Ant, I can simultaneously launch the debugger and the JUnit tests
– The debugger will attach to and monitor the other JVM
– Register interest in callbacks on method entry and exit, exceptions, and JVM death
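A rough sketch of that attach-and-register step using the JDI classes in `com.sun.jdi` (the Java-level API of JPDA). This is not PEA's source. The class name `PeaAttachSketch`, the host and port parameters, and the choice to exclude only `java.*` are assumptions for illustration, and exception-event registration is left out for brevity. It also assumes the test JVM was launched with a debug agent listening on a socket, for example `-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000`.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.event.MethodEntryEvent;
import com.sun.jdi.event.MethodExitEvent;
import com.sun.jdi.event.VMDeathEvent;
import com.sun.jdi.event.VMDisconnectEvent;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.EventRequestManager;
import com.sun.jdi.request.MethodEntryRequest;
import com.sun.jdi.request.MethodExitRequest;
import java.util.Map;

final class PeaAttachSketch {
    private PeaAttachSketch() {
    }

    // Finds the JDK's built-in socket attach connector.
    static AttachingConnector socketConnector() {
        for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
            if ("com.sun.jdi.SocketAttach".equals(c.name())) {
                return c;
            }
        }
        throw new IllegalStateException("no socket attach connector available");
    }

    // Attaches to the test JVM and listens until it dies or disconnects.
    static void monitor(String host, String port) throws Exception {
        AttachingConnector connector = socketConnector();
        Map<String, Connector.Argument> args = connector.defaultArguments();
        args.get("hostname").setValue(host);
        args.get("port").setValue(port);
        VirtualMachine vm = connector.attach(args);

        EventRequestManager erm = vm.eventRequestManager();
        MethodEntryRequest entry = erm.createMethodEntryRequest();
        entry.addClassExclusionFilter("java.*"); // ignore the JDK itself
        entry.setSuspendPolicy(EventRequest.SUSPEND_NONE);
        entry.enable();
        MethodExitRequest exit = erm.createMethodExitRequest();
        exit.addClassExclusionFilter("java.*");
        exit.setSuspendPolicy(EventRequest.SUSPEND_NONE);
        exit.enable();

        vm.resume(); // needed if the target was started with suspend=y

        while (true) {
            EventSet events = vm.eventQueue().remove();
            for (Event e : events) {
                if (e instanceof MethodEntryEvent) {
                    String cls = ((MethodEntryEvent) e).method().declaringType().name();
                    System.out.println("enter " + cls);
                } else if (e instanceof MethodExitEvent) {
                    String cls = ((MethodExitEvent) e).method().declaringType().name();
                    System.out.println("exit  " + cls);
                } else if (e instanceof VMDeathEvent || e instanceof VMDisconnectEvent) {
                    return; // the test JVM has finished
                }
            }
            events.resume();
        }
    }
}
```

The two `println` calls mark where a real tool would update its own bookkeeping instead of printing.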
Putting JPDA to work
• As methods enter and exit, push and pop entries onto a stack maintained in the debugger
– This effectively mirrors the real stack in the tests
– Ignore certain packages beyond developer control (such as the JDK itself)
Putting JPDA to work
• As each method is entered, calculate and record its distance in the stack from a test
• Shut down when the other JVM dies
• Just before shutdown, write all the recorded data to a file
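The bookkeeping described on these slides (mirror the stack, compute the distance on every entry, tally calls per distance) can be sketched as follows. This is not the PEA implementation. `ShadowStack` is a hypothetical name, test classes are recognised here by a simple `Test` name suffix, and the tally is global rather than per class, all of which are simplifying assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

final class ShadowStack {
    // Mirror of the monitored thread's stack; the head of the deque is the top frame.
    private final Deque<String> frames = new ArrayDeque<>();
    // Number of method calls seen at each test distance d.
    private final Map<Integer, Integer> callsByDistance = new TreeMap<>();

    void onMethodEntry(String className) {
        frames.push(className);
        int d = distanceToTest();
        if (d > 0) { // d == 0 is the test itself; d == -1 means no test on the stack
            callsByDistance.merge(d, 1, Integer::sum);
        }
    }

    void onMethodExit() {
        if (!frames.isEmpty()) {
            frames.pop();
        }
    }

    // Counts frames from the top of the mirrored stack down to the nearest test class.
    private int distanceToTest() {
        int d = 0;
        for (String frame : frames) { // iterates from top of stack to bottom
            if (frame.endsWith("Test")) {
                return d;
            }
            d++;
        }
        return -1;
    }

    Map<Integer, Integer> callsByDistance() {
        return callsByDistance;
    }
}
```

Feeding it the event sequence enter `FooTest`, enter `A`, enter `B`, exit, exit, enter `A`, exit, exit yields two calls at distance 1 and one call at distance 2. Because the mirror lives in the debugger, no stack dump is ever requested from the monitored thread.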
Putting JPDA to work
• Remember – an Ant JUnit task can fork multiple JVMs – one per test if you want, so we need to monitor each one
• Multiple JVMs mean multiple files of recorded data that need to be accumulated after all the tests are complete
• Produce XML file of accumulated results
Results of using JPDA
• Performance is way better than using byte code instrumentation
– Running with monitoring on slows execution by 100x or less, depending on the code
• Ant script is kind of complicated
– JUnit tests and PEA must be run with forked JVMs
– Special JVM parameters for the debugged process are required
– JUnit and PEA must be started simultaneously using the <parallel> task (which many people don’t know about)
Results of using JPDA
• The byte code being monitored is completely unchanged
– No special instrumentation or preparatory build step is required
• XML file comes out with details about how many method calls were made at what test distance
Results file example
Report Sample
Code Review
Let’s roll that beautiful Java footage!
Future Plans
• Implement assertion density tracking
• Tweak performance
• Make it easier to run the tests with PEA running
– Perhaps subclass Ant’s <junit> task to run with PEA?
• Documentation (ick)
• Eat my own dog food and use the tool to measure my own JUnit tests
• Sell for $2.5 billion to Scott McNealy and Jonathan Schwartz and retire to Jamaica
Lessons Learned (so far)
• What seems impossible sometimes isn’t
• Creativity is absolutely crucial to solving problems (as opposed to just implementing solutions)
• JPDA is cool – and I had never heard of it
– I never thought I’d be able to write a debugger but JPDA made it easy
• ASM library is also cool – much nicer than BCEL
Wanna help?
• http://pea-coverage.sourceforge.net
• I’d welcome anyone who wants to participate
• Contributing to an open-source project looks good on your resume hint hint…
Resources
• Java Platform Debugger Architecture: http://java.sun.com/javase/technologies/core/toolsapi/jpda
• ASM – Byte code processing library: http://asm.objectweb.org
• BCEL – Byte code processing library: http://jakarta.apache.org/bcel
• AspectJ – Aspect-oriented Java extension: http://eclipse.org/aspectj
• JUnit – unit testing framework: http://junit.org
Resources
• Emma – code coverage tool: http://emma.sourceforge.net
• Cobertura – code coverage tool: http://cobertura.sourceforge.net
• JCoverage – code coverage tool: http://www.jcoverage.com
• Clover – code coverage tool: http://www.atlassian.com/software/clover