
DAIMI (c) Henrik Bærbak Christensen 1

Test Planning

DAIMI (c) Henrik Bærbak Christensen 2

Definition

  Plan: Document that provides a framework or approach for achieving a set of goals.

  Corollary: You have to define the goals in advance.

  Burnstein provides a template for a company testing policy that states the overall goals.

DAIMI (c) Henrik Bærbak Christensen 3

Plan Contents

  A testing plan must address issues like:
– Overall testing objectives: why are we testing, risks, etc.
– Which pieces will be tested?
– Who performs the testing?
– How will testing be performed?
– When will testing be performed?
– How much testing is adequate?

  These dimensions are orthogonal (independent). A decision must be made about where to place your project along each of them.

  Each dimension is a continuum.

DAIMI (c) Henrik Bærbak Christensen 4

Which pieces will be tested?

  Continuum extremes:
– every unit is tested
– no testing at all (i.e. the users do it)

  and variations:
– a systematic approach for choosing what to test…
– ROI (return on investment) is important

• where does one spent test hour find the most defects?
• or the ‘most annoying’ defects?

  Strategies
– “Defect Hunting”, “Allocate by Profile”

DAIMI (c) Henrik Bærbak Christensen 5

Who performs testing?

  Project roles:
– Developer: constructs products
– Tester: detects failures in products

  Remember: roles, not persons.
  Continuum extremes
– the same persons do everything
– roles always split between different persons

  and all kinds of variations in between:
– unit level: often the same person has both roles
• XP pair programming often splits the roles within the pair

– system level: often separate teams

  Testing psychology: do not test your own code…

DAIMI (c) Henrik Bærbak Christensen 6

How will testing be performed?

  Continuum extremes
– specification only (“what”): black-box (see the sketch below)
– implementation also (“how”): white-box

  Levels
– unit, integration, system

  Documentation
– the XP way: move forward as fast as possible
– the CMM way: make as much paper as possible
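To make the black-box end of the continuum concrete, here is a minimal sketch of a specification-derived unit test in Java/JUnit 4. The Converter class, its rate, and the rounding rule are invented for the example; the point is only that the expected value comes from the specification (“what”), not from reading the implementation (“how”).

  import org.junit.Test;
  import static org.junit.Assert.assertEquals;

  // Black-box: the test is derived from the specification only, e.g.
  // "toEuro(dkk) returns the amount in EUR, rounded to two decimals".
  // A white-box test would additionally be designed from the implementation,
  // e.g. to force a particular rounding branch.
  public class ConverterTest {

    // Minimal implementation included only to make the sketch self-contained.
    static class Converter {
      private final double rate; // DKK per EUR
      Converter(double rate) { this.rate = rate; }
      double toEuro(double dkk) { return Math.round(dkk / rate * 100.0) / 100.0; }
    }

    @Test
    public void convertsAndRoundsToTwoDecimals() {
      Converter converter = new Converter(7.45);
      // expected value computed from the specification, not read off the code
      assertEquals(13.42, converter.toEuro(100.00), 0.005);
    }
  }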

DAIMI (c) Henrik Bærbak Christensen 7

When will testing be performed?

  Continuum extremes
– test every unit as it becomes available
• “high-frequency integration”, test-driven development
– delay until all units are available
• “big-bang integration”

  and variations
– defects found early are usually cheaper to fix !!!
• why ???
• Kent Beck says that this is not true !!!

– testing at the end of each increment / milestone

DAIMI (c) Henrik Bærbak Christensen 8

How much testing is adequate?

  Continuum from none to very thorough…
  but when is enough enough ???
– life-critical software; asset transaction handling
– once-used converter; research demo

  Adequacy
– defect detection cost versus the increase in quality
– standards: drug manufacturing vs. furniture manufacturing

  Coverage
– code coverage
– requirement coverage (use cases covered)

DAIMI (c) Henrik Bærbak Christensen 9

Test Plan Format

DAIMI (c) Henrik Bærbak Christensen 10

IEEE Test plan

  IEEE standard for test plans

  The template is independent of the particular testing level
– system, integration, unit

  If followed rigorously at every level, the cost may be very high...

DAIMI (c) Henrik Bærbak Christensen 11

Features to be tested

  Items to be tested
– “Module view”: the actual units to be put under test.

  Features to be tested / not to be tested
– “Use case view”:
• from the users’ perspective
• use cases

DAIMI (c) Henrik Bærbak Christensen 12

Points to note

  Features not tested
– incremental development means the large code base is relatively stable...
– additions + changes to the base

  But – what do we retest?
– everything?
– just the added + changed code?

  Exercise:
– Any ideas?
– What influences our views?

[Figure: two ‘blobs’ labelled “Increment n” and “Inc n+1”; blob size measures code size, position indicates whether the code consists of additions or changes to the base.]

DAIMI (c) Henrik Bærbak Christensen 13

Regression testing

  The simple answer is: test everything all the time
  which is what XP says at the unit level.

  However,
– some tests run slowly (stress, deployment testing)
– or are expensive to make (manual, hardware requirements)

  The question is then
– which test cases exercise the code that has changed???

  Any views?

DAIMI (c) Henrik Bærbak Christensen 14

Test case traceability

  It actually points towards a very important problem, namely traceability between tests, specification, and code units.

  Simple model (ontology)
– the problem is the multiplicity !
– tracing the dependencies !

[Class diagram: test case, code unit, and use case related by derived-from, tested-by, exercise, and implement; every relation end has multiplicity *.]
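A minimal sketch of how such a traceability model could be represented, with the many-to-many relations as plain sets; once the links are recorded, the regression question from the previous slide (“which test cases exercise changed code?”) becomes a simple lookup. All class and method names are illustrative, not from the slides.

  import java.util.HashSet;
  import java.util.Set;

  // Illustrative traceability model: test cases, code units, and use cases
  // connected by many-to-many relations (the '*' multiplicities in the diagram).
  // The implement relation between code unit and use case is omitted for brevity.
  public class Traceability {

    static class UseCase  { final String id; UseCase(String id)  { this.id = id; } }
    static class CodeUnit { final String id; CodeUnit(String id) { this.id = id; } }

    static class TestCase {
      final String id;
      final Set<UseCase>  derivedFrom = new HashSet<>(); // derived-from use case(s)
      final Set<CodeUnit> exercises   = new HashSet<>(); // exercises code unit(s)
      TestCase(String id) { this.id = id; }
    }

    // Regression test selection: re-run every test case that exercises
    // at least one changed code unit.
    static Set<TestCase> selectRegressionTests(Set<TestCase> allTests, Set<CodeUnit> changed) {
      Set<TestCase> selected = new HashSet<>();
      for (TestCase tc : allTests) {
        for (CodeUnit unit : tc.exercises) {
          if (changed.contains(unit)) { selected.add(tc); break; }
        }
      }
      return selected;
    }
  }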

DAIMI (c) Henrik Bærbak Christensen 15

Side bar

  At the CSMR 2004 conference an interesting problem was stated:
– stock trading application
– 80,000 test cases over 7.5 million lines of C++ code
– no traceability between specification, units, and tests

  So – what to do?
– Dynamic analysis
• record the time when each test case runs
• record the time when each method is run (requires instrumentation)
• compare the time stamps! (see the sketch below)
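A rough sketch of the timestamp-comparison idea: each test run records its start/end time, instrumented methods log when they execute, and a method invocation is attributed to the test whose time window contains it. The instrumentation mechanism itself and all names are assumptions; the sketch also assumes tests run one at a time.

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Recover test-to-method traceability by comparing recorded timestamps.
  public class DynamicTraceMatcher {

    // One record per test-case execution: its [start, end] time window.
    record TestRun(String testId, long startMillis, long endMillis) {}

    // One record per instrumented method invocation.
    record MethodRun(String methodId, long timeMillis) {}

    // A method invocation is attributed to the test whose window contains it
    // (assumes sequential test execution, i.e. non-overlapping windows).
    static Map<String, List<String>> methodsPerTest(List<TestRun> tests, List<MethodRun> methods) {
      Map<String, List<String>> result = new HashMap<>();
      for (TestRun t : tests) {
        List<String> hit = new ArrayList<>();
        for (MethodRun m : methods) {
          if (m.timeMillis() >= t.startMillis() && m.timeMillis() <= t.endMillis()) {
            hit.add(m.methodId());
          }
        }
        result.put(t.testId(), hit);
      }
      return result;
    }
  }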

DAIMI (c) Henrik Bærbak Christensen 16

Approach

  Section 5
– managerial information that defines the testing process
• degree of coverage, time and budget limitations, stop-test criteria
– and the actual test cases!

  It is a bit odd to have both the framework for the testing and the tests themselves in the same document. Usually the actual test cases are kept in a separate document.

DAIMI (c) Henrik Bærbak Christensen 17

Pass/Fail Criteria

  Pass/Fail criteria
– at unit level this is often a binary decision
• either it passes (computed = expected)
• or it fails
– higher levels require severity levels
• “save” operation versus “reconfigure button panel”
• allows conditionally passing the test (see the sketch below)
– compare with review terminology
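As a sketch of how severity levels can support a conditional pass at higher test levels: fail outright on severe incidents, pass conditionally on a few minor ones. The severity scale (0 = most severe, as on the next slide) and the thresholds are illustrative assumptions only.

  import java.util.List;

  // Illustrative system-level pass/fail decision based on open incident severities.
  public class PassFailCriteria {

    enum Verdict { PASS, CONDITIONAL_PASS, FAIL }

    // severities: one entry per incident found by the test run (0 = most severe)
    static Verdict judge(List<Integer> severities) {
      long severe = severities.stream().filter(s -> s <= 1).count(); // e.g. broken "save"
      long minor  = severities.stream().filter(s -> s >= 2).count(); // e.g. button panel layout
      if (severe > 0) return Verdict.FAIL;
      if (minor > 3)  return Verdict.FAIL;              // too many minor incidents (assumed limit)
      if (minor > 0)  return Verdict.CONDITIONAL_PASS;  // passes with known minor incidents
      return Verdict.PASS;
    }
  }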

DAIMI (c) Henrik Bærbak Christensen 18

Suspension/Resumption Criteria

  When to suspend testing
– for instance if a severity level 0 defect is encountered
• “back to the developers, no point in wasting more time”

  When to resume:
– redo all tests after a suspension? Or only those not yet run?

DAIMI (c) Henrik Bærbak Christensen 19

Contents

  Deliverables
– what is the output
• test design specifications, test procedures, test cases
• test incident reports, logs, ...

  Tasks
– the work-breakdown structure

  Environment
– software/hardware/tools

  Responsibilities
– roles

DAIMI (c) Henrik Bærbak Christensen 20

Contents

  Staff / Training Needs

  Scheduling
– PERT and Gantt charts

  Risks

DAIMI (c) Henrik Bærbak Christensen 21

Testing Costs

  Estimation, in the form of “staff hours”, is known to be a hard problem.
– historical project data is important
– still, underestimation is more the rule than the exception

  Suggestion (worked example below)
– ‘prototype’ the testing of ‘typical’ use cases/classes and measure the effort (staff hours)
– count/estimate the total number of use cases and classes

  Burnstein
– look at project + organization characteristics, use models, gain experience
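A small worked example of the ‘prototype and extrapolate’ suggestion, with invented numbers: if prototyping the tests for 3 typical use cases takes 18 staff hours, and the system is estimated to have 40 use cases, then

  effort per use case ≈ 18 h / 3 = 6 h
  total testing effort ≈ 6 h × 40 use cases = 240 staff hours

plus a margin, since underestimation is the rule rather than the exception.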

DAIMI (c) Henrik Bærbak Christensen 22

Section 5

  Section 5 contains the actual tests– The design of the tests, IDs– Test cases

• input, expected output, environment

– Procedure• how testing must be done

– especially important for manual

  Test result reports– Test log: “laboratory diary”– Incident report: Report defects

• alternatively in defect tracking tool like bugzilla

  Summary– summary and approval

DAIMI (c) Henrik Bærbak Christensen 23

Monitoring the Testing Process

DAIMI (c) Henrik Bærbak Christensen 24

Motivation

  Testing is a managed process.
– clear goals and planned increments/milestones to achieve them
– progress must be monitored to ensure the plan is kept

DAIMI (c) Henrik Bærbak Christensen 25

Terms

  Project monitoring: activities and tasks defined to periodically check project status.

  Project controlling: developing and applying corrective actions to get the project back on track.

  Usually we just use the term project management to cover both processes.

DAIMI (c) Henrik Bærbak Christensen 26

Measurements

  Measuring should of course be done for a purpose. Thus there are several issues to consider:
– Which measures to collect?
– For what purpose?
– Who will collect them?
– Which tools/forms will be used to collect data?
– Who will analyze data?
– Who will have access to reports?

DAIMI (c) Henrik Bærbak Christensen 27

Purpose

  Why collect data?
  Data is important for monitoring:
– testing status
• indirectly: quality assessment of the product
– tester productivity
– testing costs
– failures

• so we can remove defects

DAIMI (c) Henrik Bærbak Christensen 28

Metrics

  Burnstein’s suggested metrics
– Coverage
– Test case development
– Test execution
– Test harness development

– Tester productivity

– Test cost

– Failure tracking

DAIMI (c) Henrik Bærbak Christensen 29

Coverage

  White-box metrics
– statement (block), branch, flow, path, ...
– ratio
• actual coverage / planned coverage

  Black-box metrics
– # of requirements to be tested
– # of requirements covered
– ECs (equivalence classes) identified
– ECs covered
– ... and their ratios (worked example below)
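A worked example of the ratios, with invented numbers:

  white-box ratio = actual branch coverage / planned branch coverage = 0.76 / 0.85 ≈ 0.89
  black-box ratio = # requirements covered / # requirements to be tested = 34 / 40 = 0.85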

DAIMI (c) Henrik Bærbak Christensen 30

Test Case Development

  Data to collect:
– # of planned test cases
• based upon (time allocated / mean time to complete one test)? (worked example below)

– # of available test cases

– # of unplanned test cases

  So – what does the last measure mean?
– a heavy “waterfall model” smell here?
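A worked example of the planning estimate, with invented numbers: with 120 staff hours allocated to test case development and a mean of 1.5 hours to complete one test case,

  # of planned test cases ≈ 120 h / 1.5 h per test case = 80 test cases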

DAIMI (c) Henrik Bærbak Christensen 31

Test Execution

  Data collected:
– # test cases executed
– ... and passed
– # unplanned test cases executed
– ... and passed

– # of regression tests executed– ... and passed

– and their ratios

DAIMI (c) Henrik Bærbak Christensen 32

XP example

  [From Jeffries’ paper]
– Functional tests ≠ unit tests
• customer owned
• feature oriented
• not running at 100%

– Status
• not developed
• developed and

– pass

– fail

– expected output not validated

DAIMI (c) Henrik Bærbak Christensen 33

Test Harness Development

  Data collected
– LOC of harness (planned, available)

  Comments?
– Who commissions and develops the harness code?

DAIMI (c) Henrik Bærbak Christensen 34

Tester Productivity & Cost

  !

DAIMI (c) Henrik Bærbak Christensen 35

Defects

  Data is collected on detected defects in order to
– evaluate product quality
– evaluate testing effectiveness
– support the stop-test decision
– enable cause analysis
– drive process improvement

  Metrics (worked example below)
– # of incident reports; solved/unsolved; severity levels; defects/KLOC; # of failures; # of defects repaired
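For example (invented numbers): 120 defects found in a 45 KLOC code base gives

  defect density = 120 defects / 45 KLOC ≈ 2.7 defects/KLOC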

DAIMI (c) Henrik Bærbak Christensen 36

Test Completion

  At some point, testing must stop...
  The question is: when?

  Criteria
– planned tests pass
• what about the unplanned ones?
– coverage goals are met
• branch coverage per unit; use case coverage for the system
– a specific number of defects has been found
• estimates from historical data
– the defect detection rate falls below a given level
• e.g. “fewer than 5 defects of severity level > 3 per week”

DAIMI (c) Henrik Bærbak Christensen 37

Test Completion Criteria

  Criteria
– fault seeding ratios are favorable
• seed with “representative defects”
• how many does testing find?
– postulate (worked example below):
• found seeded defects / total seeded defects = found actual defects / total actual defects
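A worked example of the fault seeding estimate (invented numbers): seed 100 representative defects; if testing finds 40 of them and also finds 60 actual defects, then

  found seeded / total seeded = found actual / total actual
  40 / 100 = 60 / total actual  ⇒  total actual ≈ 60 × 100 / 40 = 150

so roughly 150 − 60 = 90 actual defects are estimated to remain.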

DAIMI (c) Henrik Bærbak Christensen 38

Summary

  Plan testing
– what, who, when, how, how much

• all are a continuum where decisions must be made

  Document testing
– IEEE outlines a document template that is probably no worse than many others...

  Monitor testing
– collect data to make sound judgements about
• progress
• stop-testing criteria

  Record incidents – defects found/repaired