A CONTROL INSTRUMENTS COMPANY The Effectiveness of T-way Test Data Generation or Data Driven Testing Michael Ellims

A CONTROL INSTRUMENTS COMPANY

The Effectiveness of T-way Test Data Generation

or Data Driven Testing

Michael Ellims


Overview

The problem with testing

Experimental designs

Adequacy of tests

Experiments in effectiveness

Optimisation


The problems with testing

Expensive
– Estimated at 50% of project cost

Hard
– No good theory on designing tests

Solution: automate the testing process?


Automated testing

Generation is half the problem
– we can generate data but...

We still need an “oracle”
– tests have to pass or fail

Options
– embedded assertions
– formal models
– usually: some human


Automated testing

We want a “simple” method
– easy to understand
– easy to use
– inputs from development, e.g. data dictionary
– data driven testing

Solution: design of experiments techniques?


Design of experiments

Full factorial experiments
– “in which every setting of every factor appears with every setting of every other factor”

Factor == variable

Setting == level == value
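As a minimal sketch of the definition above, a full factorial design can be enumerated as the Cartesian product of the factor settings. The factor names and values here are illustrative assumptions, not data from the study.

```python
from itertools import product

# Hypothetical factors (variables) and their settings (levels/values).
factors = {
    "a": [1, 2, 3],
    "b": [10, 20],
    "c": ["on", "off"],
}


def full_factorial(factors):
    """Return every combination of one setting per factor."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


design = full_factorial(factors)
print(len(design))  # 3 * 2 * 2 = 12 test vectors
```

The size of the design is the product of the number of settings per factor, which is why full factorial testing quickly becomes impractical as variables are added.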


Design of experiments – Latin Square

                 variable 2
                 1   2   3   4
variable 1   1   A   B   C   D
             2   B   C   D   A
             3   C   D   A   B
             4   D   A   B   C


Design of experiments

We have a set of sixteen test vectors
• v1 .. v16
• read from the matrix as follows:

v1 = {1, 1, A}
v2 = {1, 2, B}
v3 = {1, 3, C} …
v16 = {4, 4, C}
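A short sketch of how the sixteen vectors could be read off the square. It assumes a cyclic 4x4 Latin square, which agrees with the four vectors listed above (v1, v2, v3 and v16); the names `square`, `vectors` and `is_latin` are illustrative only.

```python
LETTERS = "ABCD"

# Cyclic Latin square (an assumption): the cell at row r, column c
# holds the letter at position (r + c) mod 4.
square = [[LETTERS[(r + c) % 4] for c in range(4)] for r in range(4)]

# One test vector per cell: (variable 1 level, variable 2 level, letter).
vectors = [(r + 1, c + 1, square[r][c]) for r in range(4) for c in range(4)]


def is_latin(sq):
    """True if every symbol appears exactly once in each row and column."""
    n = len(sq)
    symbols = set(sq[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in sq)
    cols_ok = all({sq[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


print(vectors[0], vectors[1], vectors[2], vectors[15])
print(is_latin(square))
```

Sixteen vectors cover every pairing of the three variables, where the full factorial design would need 4 * 4 * 4 = 64.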


t-way testing : example

Three variables a, b, c
• a has three “valid” values, b has two, c has two
• pairwise or 2-way adequate test set...

a1 a2 a3 a2 a1 a3 a1

b2 b1 b1 b2 b2 b2 b1

c1 c2 c1 c1 c2 c2 c1
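The claim that this set is 2-way adequate can be checked mechanically. The sketch below reads each column of the table above as one test vector (an interpretation of the layout) and lists any t-way value combination that no test covers; the function name `uncovered_tuples` is hypothetical.

```python
from itertools import combinations, product

domains = {"a": ["a1", "a2", "a3"], "b": ["b1", "b2"], "c": ["c1", "c2"]}

# The seven test vectors, one per column of the table.
tests = [
    ("a1", "b2", "c1"),
    ("a2", "b1", "c2"),
    ("a3", "b1", "c1"),
    ("a2", "b2", "c1"),
    ("a1", "b2", "c2"),
    ("a3", "b2", "c2"),
    ("a1", "b1", "c1"),
]


def uncovered_tuples(domains, tests, t):
    """Return the t-way value combinations that no test exercises."""
    names = list(domains)
    missing = []
    for idx in combinations(range(len(names)), t):
        required = set(product(*(domains[names[i]] for i in idx)))
        seen = {tuple(test[i] for i in idx) for test in tests}
        missing.extend(sorted(required - seen))
    return missing


print(uncovered_tuples(domains, tests, 2))       # [] : every pair is covered
print(len(uncovered_tuples(domains, tests, 3)))  # 5 of the 12 triples are missing
```

Seven tests cover all 16 value pairs, while the full factorial design needs 3 * 2 * 2 = 12 tests.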


Evidence

Many papers on 2-way adequate tests
– mostly vs. coverage criteria (good, not great)
– issues with coverage
• some work supports, some doesn’t

Kuhn et al. (series of papers)
– implied higher factors than 2-way needed
– t = 5 or 6

Schroeder et al.


Research Questions

• How good are t-way adequate test sets?
t = 2 to t = 5

• Can we address the oracle problem?
2283 vectors – can’t be reviewed by hand!


Problem...

• Compare against what?
– coverage: too weak
• Statement coverage
• Branch coverage
• MC/DC coverage


Adequacy - code mutation

• Error based testing
– for a limited set of errors
– conceptually simple: coding errors

• Direct measure of test set “goodness”
– Can test N find error X?
– Higher fidelity than statement coverage

• F1: 12 lines but 81 code mutants
• F2: 33 lines but 669 code mutants
• F3: 51 lines but 1297 code mutants


What are code mutants?

if ((a < b) && ((x + y) > q)) ff = jj + 34;    /* original */

if ((a > b) && ((x + y) > q)) ff = jj + 34;    /* < replaced by > */

if ((a < b) || ((x + y) > q)) ff = jj + 34;    /* && replaced by || */

if ((a < b) && ((x - y) > q)) ff = jj + 34;    /* + replaced by - */

if ((a < b) && ((x * y) > q)) ff = jj + 34;    /* + replaced by * */

if ((a < b) && ((x + y) > q)) ff += jj + 34;   /* = replaced by += */

if ((a < b) && ((x + y) > q)) ff = jj + 35;    /* constant 34 replaced by 35 */
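A rough sketch of how mutants like these could be produced: apply one token substitution at a time to the source text. The operator table is a deliberately small, hypothetical subset, and the string-level approach is an assumption for illustration; it is not the tool used in the study. The input is the slide's statement written with balanced parentheses.

```python
import re

# Hypothetical mutation operators: token -> possible replacements.
OPERATORS = {
    "<": [">"],
    "&&": ["||"],
    "+": ["-", "*"],
}

# Longest tokens first so multi-character operators match as a unit.
TOKEN = re.compile(
    "|".join(re.escape(tok) for tok in sorted(OPERATORS, key=len, reverse=True))
)


def mutants(source):
    """Yield one mutant per single-token substitution in `source`."""
    for match in TOKEN.finditer(source):
        for replacement in OPERATORS[match.group()]:
            yield source[:match.start()] + replacement + source[match.end():]


original = "if ((a < b) && ((x + y) > q)) ff = jj + 34;"
generated = list(mutants(original))
for line in generated:
    print(line)
```

Even this tiny operator table yields six mutants from one statement, which is consistent with the large mutant counts reported for short functions.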


Procedure

FOR each vector
    FOR each mutant
        run vector on un-mutated code // oracle!
        run vector on mutant
        compare results
    ENDFOR
ENDFOR
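The procedure above might be sketched as follows, with the un-mutated function acting as the oracle. The function under test and its three mutants are hypothetical stand-ins written by hand, and the oracle run is done once per vector rather than once per mutant, which gives the same results.

```python
def original(a, b):
    """Hypothetical function under test (the un-mutated code)."""
    return a + b if a < b else a - b


def mutant_relational(a, b):  # '<' replaced by '>'
    return a + b if a > b else a - b


def mutant_arithmetic(a, b):  # '+' replaced by '*'
    return a * b if a < b else a - b


def mutant_boundary(a, b):  # '<' replaced by '<='
    return a + b if a <= b else a - b


def killed_mutants(original, mutants, vectors):
    """Return the names of mutants whose output differs on some vector."""
    killed = set()
    for vector in vectors:
        expected = original(*vector)  # oracle: result of un-mutated code
        for name, mutant in mutants.items():
            if mutant(*vector) != expected:
                killed.add(name)
    return killed


mutants = {
    "relational": mutant_relational,
    "arithmetic": mutant_arithmetic,
    "boundary": mutant_boundary,
}
vectors = [(1, 2), (5, 3), (0, 0)]

killed = killed_mutants(original, mutants, vectors)
score = 100 * len(killed) / len(mutants)
print(sorted(killed), round(score, 1))
```

The boundary mutant survives because no vector in this small set has a == b with a non-zero value; adding a vector such as (2, 2) would kill it.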


Experiment 1 - effectiveness

How good is automated testing?
– t-way versus hand generated tests
– t-way versus random tests
– t-way versus random designs


All methods – mutation score

[Chart: mutation score (axis 0 to 120) for 2-way, 3-way, 4-way, 5-way, Rdesign, Random, Base and Hand]


Selected methods – mutation score

[Chart: mutation score (axis 0 to 120) for 5-way, Rdesign, Random and Hand]


Selected methods - raw data

[Chart: raw data (axis 0 to 800) for 5-way, Rdesign, Random and Hand]


Experiment 2 - minimisation

Can we reduce the test set to a manageable size?
– oracle problem: the oracle is a person!
– You can’t examine 1000s of test vectors

Can we get it to run faster?
– 2000 vectors over 2000 mutants
– at two seconds per test...
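The arithmetic behind the run-time concern, worked through under the assumption that every vector is run against every mutant, one test at a time:

```python
vectors = 2000
mutants = 2000
seconds_per_test = 2

# Every vector against every mutant, run sequentially (an assumption).
total_seconds = vectors * mutants * seconds_per_test
total_days = total_seconds / (24 * 60 * 60)

print(total_seconds)         # 8000000
print(round(total_days, 1))  # 92.6
```

At roughly three months per full run, reducing the number of vectors or mutants executed has a direct practical payoff.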


Optimisation

FOR next t-way adequate test set // t = 2 .. 5
    run remaining mutants vs. all remaining vectors
    WHILE a test that kills > 1 mutant remains
        select test that kills most mutants
        mark mutants as dead
    ENDWHILE
ENDFOR

select vectors that kill remaining mutants
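The selection step above resembles a greedy set cover. The sketch below is a simplification, not the procedure from the slides: it works on a single pool of vectors and folds the final "kill remaining mutants" step into the same loop. The `kills` mapping (vector id to the set of mutants it kills) and its contents are hypothetical.

```python
def minimise(kills):
    """Greedily pick vectors until every killable mutant is dead.

    kills: mapping of vector id -> set of mutant ids that vector kills.
    """
    alive = set().union(*kills.values())
    selected = []
    while alive:
        # Pick the vector that kills the most still-living mutants.
        best = max(kills, key=lambda v: len(kills[v] & alive))
        selected.append(best)
        alive -= kills[best]  # mark its mutants as dead
    return selected


# Hypothetical kill data for five vectors and five mutants.
kills = {
    "v1": {1, 2, 3},
    "v2": {3, 4},
    "v3": {4},
    "v4": {1, 2},
    "v5": {5},
}

reduced = minimise(kills)
print(reduced)  # ['v1', 'v2', 'v5']
```

Greedy selection does not guarantee the smallest possible set, but it is simple and keeps every mutant that the original test set killed.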


Time Improvement

[Chart: axis 0 to 50000; series: max, min]


Size Improvement (x5)

[Chart: axis 0 to 3500; functions: _dip_debounce, _aip_median_filter, _sdc_fuel_control, aip_spike_filter, _thc_decide_state, _thc_autocal, _aip_apply_filters, _gov_rpm_err, _sdc_pre_start, _gov_gen_ffd_rpm; series: hand, max, min]


Conclusions

t-way adequate test sets are competitive with hand generated tests.
– 2-way adequate test sets are not
– t >= 3, t = 5 or t = 6 is best

Random Testing...
– Good but...
– NOT reliable
– serious implications for testing research


Issues

Is mutation adequate?
– Equivalent mutants

Too few functions

Simplistic data models

Structures
– N dimensional arrays
– Structures with structure
– Sparse structures


Random Ideas

Mutations as a measure of complexity?
– complexity of code is hard to measure
– possibly too one-dimensional

Mutations as a measure of robustness
– is code that has easily killed mutants “better”?
