Dynamic Analysis of Algebraic Structure to Optimize Test Generation and Test Case Selection

Anthony J H Simons and Wenwen Zhao

Overview

Lazy Systematic Unit Testing: JWalk testing concept and methodology

The JWalk 1.0 toolset: JWalkTester, JWalkUtility, JWalkEditor, etc.

Dynamic analysis and pruning: extending earlier work to full algebraic analysis

Comparison and evaluation: measure path pruning, before and after; test result prediction, before and after

http://www.dcs.shef.ac.uk/~ajhs/jwalk/

Motivation

State of the art in agile testing
  Test-driven development is good, but…
  …no specification to inform the selection of tests
  …manual test-sets are fallible (missing, redundant cases)
  …reusing saved tests for conformance testing is fallible: state partitions hide paths, faults (Simons, 2005)

Lazy systematic testing method: the insight
  Complete testing requires a specification (even in XP!)
  Infer an up-to-date specification from a code prototype
  Let tools handle systematic test generation and coverage
  Let the programmer focus on novel/unpredicted results

Lazy Systematic Unit Testing

Lazy Specification
  late inference of a specification from evolving code
  semi-automatic, by static and dynamic analysis of code with limited user interaction
  specification evolves in step with modified code

Systematic Testing
  bounded exhaustive testing, up to the specification
  emphasis on completeness, conformance, correctness properties
  after testing, repeatable test quality

http://en.wikipedia.org/wiki/Lazy_systematic_unit_testing

JWalk 1.0 Toolset

JWalk Tester, JWalk Utility, JWalk Editor, JWalk Marker, JWalk Grapher, JWalk SOAR

JWalk Tester

Lazy systematic unit testing for Java
  static analysis: extracts the public API of a compiled Java class
  protocol walk (all paths): explores, validates all interleaved methods to a given path depth
  algebra walk (memory states): explores, validates all observations on all mutator-method sequences
  state walk (high-level states): explores, validates n-switch transition cover for all high-level states



Baseline Approaches

Breadth-first generation
  all constructors and all interleaved methods (e.g. JCrasher, DSD-Crasher, Jov)
  generate-and-filter (e.g. Rostra, Java Pathfinder) by state equivalence class

Computational cost
  exponential growth, memory issues, wasteful over-generation, even if filtering is later applied

  $\#\mathrm{paths} = \sum_{k=0}^{n} c \cdot m^{k}$

Key: c = #constructors, m = #methods, k = depth, n = maximum depth
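The growth of the baseline can be checked with a few lines of arithmetic. The sketch below simply evaluates the sum of c·m^k for k = 0..n; the class and method names are illustrative and are not part of JWalk.

```java
// Illustrative helper (not part of JWalk): evaluates the baseline path count
// #paths = sum over k = 0..n of c * m^k.
final class PathCount {
    private PathCount() {}

    /** Cumulative number of paths generated breadth-first up to the given depth. */
    static long baseline(int constructors, int methods, int depth) {
        long total = 0;
        long power = 1; // m^k, starting at k = 0
        for (int k = 0; k <= depth; k++) {
            total += constructors * power;
            power *= methods;
        }
        return total;
    }
}
```

For example, `PathCount.baseline(1, 78, 3)` evaluates 1 + 78 + 78² + 78³ for a class with one constructor and 78 methods, explored to depth 3.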

Dynamic Pruning

Interleaved analysis
  generate-and-evaluate, pruning active paths on the fly (e.g. JWalk, Randoop)
  remove redundant prefix paths after each test cycle, don't bother to expand in next cycle

Increasing sophistication
  prune prefix paths ending in exceptions (fail again): JWalk, Randoop (2007)
  and prefixes ending in algebraic observers (unchanged): JWalk 0.8 (2007)
  and prefixes ending in algebraic transformers (reentrant): JWalk 1.0 (2009)
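The three pruning rules can be illustrated on a toy example. The sketch below is not JWalk's implementation: it assumes a hypothetical stack with only three methods (push, pop, top), models the stack state as a list of integers, and compares states by content rather than by hashing. In each cycle only paths that reached a novel state are expanded further.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of generate-and-evaluate with pruning (hypothetical, not JWalk code).
final class PruningSketch {
    private PruningSketch() {}

    /** Applies one method to a copy of the state; null models a raised exception. */
    static List<Integer> apply(List<Integer> state, String method, int arg) {
        List<Integer> next = new ArrayList<>(state);
        switch (method) {
            case "push": next.add(arg); return next;
            case "pop":
                if (next.isEmpty()) return null;
                next.remove(next.size() - 1);
                return next;
            case "top": return next.isEmpty() ? null : next;
            default: throw new IllegalArgumentException(method);
        }
    }

    /** Returns the cumulative number of paths evaluated up to maxDepth. */
    static int explore(int maxDepth) {
        String[] methods = {"push", "pop", "top"};
        Set<List<Integer>> seen = new HashSet<>();
        List<List<Integer>> frontier = new ArrayList<>();
        List<Integer> initial = new ArrayList<>();
        seen.add(initial);
        frontier.add(initial);
        int evaluated = 1; // the constructor path new()
        for (int depth = 1; depth <= maxDepth; depth++) {
            List<List<Integer>> novel = new ArrayList<>();
            for (List<Integer> state : frontier) {
                for (String method : methods) {
                    evaluated++;
                    // The argument is synthesized deterministically from the depth.
                    List<Integer> next = apply(state, method, depth);
                    if (next == null) continue;        // exception: prune this prefix
                    if (next.equals(state)) continue;  // unchanged state (observer): prune
                    if (!seen.add(next)) continue;     // reentrant state (transformer): prune
                    novel.add(next);                   // novel state: expand next cycle
                }
            }
            frontier = novel;
        }
        return evaluated;
    }
}
```

In this toy model exactly one path per cycle (the one ending in push) reaches a novel state, so the number of evaluated paths grows linearly with depth, whereas the unpruned baseline for three methods grows as 1 + 3 + 9 + 27 + ….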

Prune Exceptions…

[Figure: tree of Stack method paths over new, push, pop and top. Key: novel state; exception.]

Prune error-prefixes (JWalk0.8, Randoop)

…and Observers

[Figure: tree of Stack method paths over new, push, pop and top. Key: novel state; exception; unchanged state.]

Prune error- and observer-prefixes (JWalk0.8)

Algebraic Pruning

[Figure: tree of Stack method paths over new, push, pop and top. Key: novel state; exception; unchanged state; reentrant state.]

Prune error-, observer- and transformer-prefixes (JWalk1.0)

What is the Same State?

Some earlier approaches
  distinguish observers, mutators by signature (Rostra)
  intrusive state equality predicate methods (ASTOOT)
  external (partial) state equality predicates (Rostra)
  subsumption of execution traces in JVM (Pathfinder)

Some algebraic approaches
  shallow, deep equality under all observers (TACCLE)
    but assumes observations are also comparable
    very costly to compute from first principles
  serialise object states and hash (Henkel & Diwan)
    but not all objects are serialisable
    no control over depth of comparison

Smart State Inspection

Reflection-and-hash
  extract state vector from objects
  compute hash code for each field
  order-sensitive combination hash code

Proper depth control
  shallow or deep equality settings, to chosen depth
  hash on pointer, or recursively invoke algorithm

Fast state comparison
  each test evaluation stores posterior state code
  fast comparison with preceding, or all prior states
  possible to detect unchanged, or reentrant states
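A minimal sketch of the reflection-and-hash idea follows. It is an illustration under stated assumptions, not JWalk's actual algorithm: the names StateHasher, LinkedStack and Node are hypothetical, the combination rule (multiply by 31 and add) is a common convention chosen here, and the field order is whatever getDeclaredFields returns, which is consistent within one JVM run but not guaranteed by the specification.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical illustration of reflection-and-hash with depth control.
final class StateHasher {
    private StateHasher() {}

    /** Computes a state code for obj, descending at most depth levels of references. */
    static int stateCode(Object obj, int depth) {
        if (obj == null) return 0;
        if (isValue(obj)) return obj.hashCode();                // hash on value
        if (depth <= 0) return System.identityHashCode(obj);    // hash on pointer
        int code = 17;
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) continue;
                try {
                    // May fail for JDK-internal classes on Java 9+; intended for user classes.
                    field.setAccessible(true);
                    // Order-sensitive combination of the per-field codes.
                    code = 31 * code + stateCode(field.get(obj), depth - 1);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return code;
    }

    private static boolean isValue(Object obj) {
        return obj instanceof Number || obj instanceof Boolean
            || obj instanceof Character || obj instanceof CharSequence;
    }
}

// Small example class whose state is inspected (hypothetical test subject).
final class Node {
    final int value;
    final Node next;
    Node(int value, Node next) { this.value = value; this.next = next; }
}

final class LinkedStack {
    private Node top;
    private int size;
    void push(int value) { top = new Node(value, top); size++; }
    int pop() { int v = top.value; top = top.next; size--; return v; }
}
```

With a deep setting such as `StateHasher.stateCode(stack, 3)`, an empty LinkedStack and one that has been pushed and then popped yield the same code, which is how a reentrant state could be detected. Equal codes do not strictly prove equal states, since hash collisions are possible.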

Pruning: Stack

Stack   baseline   except.   observ.   algebr.
0              1         1         1         1
1              7         7         7         7
2             43        31        13        13
3            259       139        25        19
4           1555       667        43        25
5           9331      3391        79        31

Pruned: 9,300 redundant paths
Retained: 31 significant paths (best 0.33%)

Table 1: Cumulative paths explored after each test cycle

Pruning: Reservable Book

ResBook   baseline   except.   observ.   algebr.
0                1         1         1         1
1                9         9         9         9
2               73        73        25        25
3              585       561        49        33
4             4681      4185        97        41
5            37449     memex       169        41

Pruned: 37,408 redundant paths
Retained: 41 significant paths (best 0.12%)

Table 2: Cumulative paths explored after each test cycle

Test Result Prediction

Semi-automatic validation
  the user confirms or rejects key results
  these constitute a test oracle, used in prediction
  eventually > 90% test outcomes predicted

JWalk test result prediction rules
  e.g. predict repeat failure: new().pop().push(e) == new().pop()
  e.g. predict same state: target.size().push(e) == target.push(e)
  e.g. predict same result: target.push(e).pop().size() == target.size()
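One way to picture prediction from a test oracle is a lookup keyed by the state in which an observation is made. The sketch below is an assumption-laden illustration, not JWalk's oracle: OracleSketch is a hypothetical class, the stack state is modelled as a list of integers, and the list's own hashCode stands in for a state code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical oracle: confirmed observations are stored per (state code, observer).
final class OracleSketch {
    private final Map<String, String> confirmed = new HashMap<>();

    private static String key(List<Integer> state, String observer) {
        // List.hashCode() stands in for a state code; collisions are possible.
        return state.hashCode() + "#" + observer;
    }

    /** Records a result that the user has confirmed for this state and observer. */
    void confirm(List<Integer> state, String observer, String result) {
        confirmed.put(key(state, observer), result);
    }

    /** Predicts the result if the same state was already confirmed, else empty. */
    Optional<String> predict(List<Integer> state, String observer) {
        return Optional.ofNullable(confirmed.get(key(state, observer)));
    }
}
```

Under these assumptions, once `size` has been confirmed for a target state, the same result is predicted after `push(e).pop()` because the state code is unchanged; a state that was never confirmed yields no prediction and would be presented to the user.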


Kinds of Prediction

Strong prediction
  From known results, guarantee further outcomes in the same equivalence class
  e.g. observer prefixes: empirically checked before making any inference, unchanged state is guaranteed
    target.push(e).size().top() == target.push(e).top()

Weak prediction
  From known facts, guess further outcomes; an incorrect guess will be revealed in the next cycle
  e.g. methods with void type usually return no result, but may raise an exception
    target.pop() predicted to have no result
    target.pop().size() == -1 reveals an error

Test Confirmation – JWalk 0.8

[Figure: tree of Stack method paths over new, push, pop and top. Key: confirm result; confirm error; predicted result.]

Confirm all observations, errors on all state-modifying paths

Test Confirmation – JWalk 1.0

[Figure: tree of Stack method paths over new, push, pop and top. Key: confirm result; confirm error; predicted result.]

Confirm all observations, errors on all primitive algebraic constructions

Confirmations: Stack

Stack   v0.8 alg   v0.8 pro   v1.0 alg   v1.0 pro
0              1          -          1          -
1              5          -          5          -
2              4          -          4          -
3              9          -          4          -
4             12          -          4         +4
5             26          -          4         +8
Total         57         57         22         34

Table 3: Confirmations per test cycle (new oracle)

JWalk 0.8: trained oracle after 57 confirmations
JWalk 1.0: trained oracle after 34 confirmations

Confirmations: Reservable Book

ResBook   v0.8 alg   v0.8 pro   v1.0 alg   v1.0 pro
0                1          -          1          -
1                2          -          2          -
2                8          -          8          -
3               12          -          6          -
4               30          -          6        +20
5               40      memex          -      memex
Total           93         93         23         43

Table 4: Confirmations per test cycle (inherited oracle)

JWalk 0.8: trained oracle after 93 confirmations
JWalk 1.0: trained oracle after 43 confirmations

Why Residual Confirmations?

Prediction based on state equality
  from state equivalence: target.push(e).pop() == target
  predict identical observations: target.push(e).pop().size() == target.size()

Novel states occur in longer protocols
  JWalk has deterministic argument synthesis: elements generated in order: e1, e2, … en
  algebraic reduction yields a novel state:
    target.push(e1).pop().push(e2) == target.push(e2)
    target.push(e2) != target.push(e1) from the oracle
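The effect of deterministic argument synthesis can be replayed on a toy stack. The sketch below is hypothetical (ArgumentSynthesisSketch is not a JWalk class) and assumes that each push in a path receives the next element in the sequence e1, e2, … and that the stack state is simply the list of elements it holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of a method path with elements synthesized in order.
final class ArgumentSynthesisSketch {
    private ArgumentSynthesisSketch() {}

    /** Replays a path on an empty stack; the i-th push receives element "e" + i. */
    static List<String> replay(String... path) {
        List<String> stack = new ArrayList<>();
        int next = 1;
        for (String method : path) {
            if (method.equals("push")) {
                stack.add("e" + next++);
            } else if (method.equals("pop")) {
                stack.remove(stack.size() - 1); // throws on an empty stack
            } else {
                throw new IllegalArgumentException(method);
            }
        }
        return stack;
    }
}
```

Here `replay("push", "pop", "push")` leaves the stack holding e2, while the reduced path `replay("push")` leaves it holding e1. The two states differ, so a result confirmed for the shorter path cannot be reused and a residual confirmation is needed.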

Conclusions …

Test path pruning
  algebraic analysis effective at eliminating redundant paths
  absolutely necessary when testing classes with large APIs
    java.lang.Character: c = 1, m = 78; d3 base = 480,715 paths; alg = 79 paths, stable after 1 cycle
    java.lang.String: c = 13, m = 64; d3 base = 54,093 paths; alg = 845 paths, stable after 1 cycle

More test automation
  presents user with the ideal minimal test-set for judgement
  user only has to confirm all errors and observations on all primitive algebraic constructions

Conclusions

Faster state exploration
  algebra-walking finds the leaves of the algebra-tree faster
  state-walking discovers high-level states faster, by growing only primitive state-modifying paths
  can afford to search to greater test depths

Test result prediction
  algebraic analysis improves predictive power as expected
  but oracle must also have the reduction (and may not)
  future idea: axiom generalisation? (Henkel & Diwan)

Thank You!

And thanks also to:
  Wenwen Zhao: hashing on states for comparison
  Mihai-Gabriel Glont: prototype UI for JWalkTester
  Arne-Michael Toersel: case studies for JWalk


Let’s go JWalking!

Confirmations: Library Book

LibBook   v0.8 alg   v0.8 pro   v1.0 alg   v1.0 pro
0                1          -          1          -
1                2          -          2          -
2                3          -          3          -
3                2          -          -          -
4                3          -          -         +3
5                2          -          -          -
Total           13         13          6          9

Table 5: Confirmations per test cycle (new oracle)

JWalk 0.8: depth-5 oracle after 13 confirmations
JWalk 1.0: depth-5 oracle after 9 confirmations
