Software Testing
Fernando Brito e Abreu
DCTI / ISCTE-IUL QUASAR Research Group
Software Engineering / Fernando Brito e Abreu 2 27-Sep-11
SWEBOK: the 10 Knowledge Areas
Software Requirements
Software Design
Software Construction
Software Testing
Software Maintenance
Software Configuration Management
Software Engineering Management
Software Engineering Process
Software Engineering Tools and Methods
Software Quality
Motivation - The Bad News ...
Software bugs cost the U.S. economy an
estimated $59.5 billion annually, or about 0.6%
of the gross domestic product.
Sw users shoulder more than half of the costs
Sw developers and vendors bear the remainder
of the costs.
Source: The Economic Impacts of Inadequate Infrastructure for
Software Testing, Technical Report, National Institute of
Standards and Technology, USA, May 2002
http://www.nist.gov/director/prog-ofc/report02-3.pdf
Motivation - The GOOD News!
According to the same report:
More than 1/3 of the costs (an estimated $22.2
billion) can be eliminated with earlier and more
effective identification and removal of software
defects.
Savings can mainly occur in the development
stage, when errors are introduced.
More than half of these errors aren't detected until
later in the development process or during post-sale
software use.
Motivation
Reliability is one of the most important software
quality characteristics
Reliability has a strong financial impact:
better image of producer
reduction of maintenance costs
signing or renewal of maintenance contracts,
new developments, etc.
The quest for Reliability is the aim of V&V !
Verification and Validation (V&V)
Verification - product correctness and
consistency in a given development phase, with
respect to the products and standards used as
input to that phase - "Do the Job Right"
Validation - product conformity with specified
requirements - "Do the Right Job"
Basically, there are two complementary V&V techniques:
Reviews (Walkthroughs, Inspections, ...)
Tests
Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
Testing is …
… an activity performed for evaluating product quality, and for improving it, by identifying defects and problems.
… the dynamic verification of the behavior of a program on a finite set of test cases, suitably selected from the usually infinite executions domain, against the expected behavior.
Dynamic versus static verification
Testing always implies executing the program on
(valued) inputs; therefore it is a dynamic technique
The input value alone is not always sufficient to determine a
test, since a complex, nondeterministic system might react to
the same input with different behaviors, depending on its state
Different from testing and complementary to it are static
techniques (described in the Software Quality KA)
Terminology issues
Error
the human cause of a defect's existence (although bugs walk …)
Fault or defect (aka bug)
an incorrectness, omission or undesirable characteristic in a deliverable
the cause of a failure
Failure
an undesired effect (malfunction) observed in the system's delivered service
an incorrectness in the functioning of a system
See: IEEE Standard for SE Terminology (IEEE 610.12-1990)
Testing views
Testing for defect identification
A successful test is one which causes the system to fail
Testing can reveal failures, but it is the faults (defects) that must be removed
Testing to demonstrate (that the software meets its specifications or other desired properties)
A successful test is one where no failures are observed
Fault detection (e.g. in code) through failure exposure is often hard
Identifying all failure-causing input sets (i.e. those sets of inputs that cause a failure to appear) may not be feasible
Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
Test Levels – Objectives of testing
Testing can be aimed at verifying different properties:
Checking if functional specifications are implemented right
aka conformance testing, correctness testing, or functional testing
Checking nonfunctional properties
e.g. performance, reliability evaluation, reliability measurement, usability evaluation, etc.
Stating the objective in precise, quantitative terms allows control to be established over the test process
Often objectives are qualitative or not even stated explicitly
Test Levels – Objectives of testing
Acceptance / Qualification testing
Installation testing
Alpha and beta testing
Conformance / Functional / Correctness testing
Reliability achievement and evaluation
Regression testing
Performance testing
Stress testing
Back-to-back testing
Recovery testing
Configuration testing
Usability testing
Test Levels – Objectives of testing Acceptance / Qualification testing
Checks the system behavior against the
customer’s requirements
The customer may not exist yet, so someone has to
forecast his intended requirements
This testing activity may or may not involve the
developers of the system
Test Levels – Objectives of testing Installation testing
Installation testing can be viewed as system
testing conducted once again according to
hardware configuration requirements
Usually performed in the target environment at the
customer’s premises
Installation procedures may also be verified
e.g. is the customer local expert able to add a new
user in the developed system?
Test Levels – Objectives of testing Alpha and beta testing
Before the software is released, it is sometimes
given to a small, representative set of potential
users for trial use. Those users may be:
in-house (alpha testing)
external (beta testing)
These users report problems with the product
Alpha and beta use is often uncontrolled, and is not
always referred to in a test plan
Test Levels – Objectives of testing Conformance / Functional / Correctness testing
Conformance testing is aimed at validating
whether or not the observed behavior of the
tested software conforms to its specifications
Test Levels – Objectives of testing Reliability achievement and evaluation
Testing is a means to improve reliability
By randomly generating test cases according to
the operational profile, statistical measures of
reliability can be derived
Reliability growth models allow this process to be
modeled
Reliability growth models
Provide a prediction of reliability based on the failures observed under reliability achievement and evaluation
They assume, in general, that:
a growing number of successful tests increases our confidence in the system's reliability
the faults that caused the observed failures are fixed after being found (thus, on average, the product's reliability has an increasing trend)
Reliability growth models
Many models have been published, which are divided
into:
failure-count models
time-between-failures models
Test Levels – Objectives of testing Regression testing (1/2)
Regression testing is:
The “selective retesting of a system or component to verify
that modifications have not caused unintended effects.”
(IEEE610.12-90)
Any repetition of tests intended to show that the software’s
behavior is unchanged, except insofar as required
A technique to combat side-effects!
In practice, the idea is to show that software which
previously passed the tests still does
Test Levels – Objectives of testing Regression testing (2/2)
A trade-off must be made between:
the assurance given by regression testing every time a change is made
… and the resources required to do that
To allow regression testing we must build, incrementally, a test battery
Regression testing is more feasible if we have tools to record and play back test cases
Several commercial user-interface event-capture tools (black-box testing) exist
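Such a battery can be as simple as a list of recorded input/expected-output pairs replayed after every change; a minimal sketch (the discount function and its cases are hypothetical):

```python
def discount(price, is_member):
    """Hypothetical function under test."""
    return round(price * (0.9 if is_member else 1.0), 2)

# The battery grows incrementally; each case added in the past stays in,
# so that old behavior is re-verified after every modification.
battery = [
    ((100.0, True), 90.0),
    ((100.0, False), 100.0),
    ((19.99, True), 17.99),
]

def run_battery():
    """Replay all recorded cases; return the list of regressions found."""
    return [(args, expected, discount(*args))
            for args, expected in battery
            if discount(*args) != expected]
```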
Test Levels – Objectives of testing Performance testing / Stress testing
Aimed at verifying that the software meets the
specified performance requirements:
e.g. volume testing and response time
The performance degradation under increasingly
demanding scenarios should be plotted
If we exercise software at the maximum design
load (or beyond it), we call it stress testing
Test Levels – Objectives of testing Back-to-back testing
A single test set is performed on two
implemented versions of a software product
The results are compared
Whenever a mismatch occurs, at least one of the two
versions is probably exhibiting a failure
Test Levels – Objectives of testing Recovery testing
Aimed at verifying software restart capabilities
after a “disaster”
Recovery testing is a fundamental step in
building a contingency plan
Test Levels – Objectives of testing Configuration testing
When software is built to serve different users,
configuration testing analyzes the software under
the various specified configurations
The problem is similar when the hardware or software
platform varies somehow (e.g. different mobile phone
versions, different browsers)
This is one of the main issues in software
product lines development
See: http://www.sei.cmu.edu/plp/framework.html
Test Levels – Objectives of testing Usability testing
This process evaluates how easy it is for end-users
to use and learn the software, including:
user documentation
initial installation and extension through add-ons
effective support for user tasks
…
Test Levels – The target of the test
Unit testing
the target is a single module
Integration testing
the target is a group of modules (related by purpose, use, behavior, or structure)
System testing
the target is a whole system
Test Levels – The target of the test Unit testing
Verifies the functioning in isolation of software pieces
which are separately testable
Depending on the context, they can be individual subprograms
or a larger component made of tightly related units
Typically, unit testing occurs with:
access to the code being tested
support of debugging tools
the programmers who wrote the code
Test Levels – The target of the test Integration testing
Integration testing is the process of verifying the interaction between software components
Classical integration testing strategies
top-down or bottom-up, used with hierarchically structured software
Modern systematic integration strategies
architecture-driven, which implies integrating the software components or subsystems based on identified functional threads
Except for small, simple software, systematic, incremental integration strategies are usually preferred to putting all the components together at once
The latter is called "big bang" testing
Test Levels – The target of the test System testing
The majority of functional failures should already have
been identified during unit and integration testing
Main concerns:
Assessing if the system complies with the non-functional requirements, such as security, speed, accuracy, and reliability
Assessing if the external interfaces to other applications, utilities, hardware devices, or the operating environment work well
Test Levels Identifying the test set
Test adequacy criteria
Is the test set sufficient?
How much testing is enough?
How many test cases should be selected?
Test selection criteria
How is the test set composed?
Which test cases should be selected?
Test case selection
Proposed test techniques differ essentially in
how they select the test set, which may yield
vastly different degrees of effectiveness
In practice, risk analysis techniques and test
engineering expertise are applied to identify the
most suitable selection criterion under given
conditions
How large should a test battery be?
Even in simple programs, so many test cases are
theoretically possible that exhaustive testing
could require months or years to execute
In practice the whole test set can generally be
considered infinite
Testing always implies a trade-off:
limited resources and schedules on the one hand
inherently unlimited test requirements on the other
After testing …
Even after successful completion of extensive
testing, the software could still contain faults
The remedy for sw failures found after delivery is
provided by corrective maintenance actions
This will be covered in the Software Maintenance KA
Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
Test Techniques
Based on tester's intuition and experience
Specification-based
Code-based
Usage-based
Fault-based
Based on nature of application
Selecting and combining techniques
Functional Tests (Black-Box): actors
A relevant aspect of black-box testing is that it is not
compulsory to use
programming experts to
produce a test battery
Extensive invalid input
characterization heavily
relies on tester experience
Case study: Tool to capture GUI events
(functional test cases)
Functional Test Tools - Visual Test
Grouping of test cases
Test cases
Test battery (test suite)
Reusable test code
Functional Test Tools
Integration with other Rational tools
Reported failures
Test cases to execute in this suite
Assessing Functional Test Coverage
The ReModeler tool from the
QUASAR team takes an
innovative model-based
approach to represent this
kind of testing coverage
The color represents the
percentage of the scenarios of
each use case that were
executed by a given test suite
Test Techniques Based on tester's intuition and experience
Ad hoc testing
Perhaps the most widely practiced technique remains
ad hoc testing
Tests are derived relying on the software engineer’s
skill, intuition, and experience with similar programs
Ad hoc testing might be useful for identifying special
tests, those not easily captured by formalized
techniques
Test Techniques Based on tester's intuition and experience Exploratory testing
Simultaneous learning, test design and execution
The tests are not defined in advance in an established test
plan, but are dynamically designed, executed, and modified
The effectiveness of this approach relies on the tester's
knowledge, which can be derived from many sources:
observed product behavior during previous version testing
familiarity with the application, platform, failure process
type of possible faults and failures
the risk associated with a particular product
…
Test Techniques Specification-based
Equivalence partitioning
Boundary-value analysis
Decision table
Finite-state machine-based
Testing from formal specifications
Random testing
Test Techniques – Specification-based Equivalence partitioning
The input domain is subdivided into a collection
of subsets, or equivalence classes, which are
deemed equivalent according to a specified
relation, and a representative set of tests
(sometimes only one) is taken from each class.
Test Techniques – Specification-based Boundary-value analysis
Test cases are chosen on and near the boundaries of
the input domain of variables, with the underlying
rationale that many faults tend to concentrate near the
extreme values of inputs
An extension of this technique is robustness testing,
wherein test cases are also chosen outside the input
domain of variables, to test program robustness to
unexpected or erroneous inputs
Case study: Equivalence partitioning and
boundary-value analysis
Triangle Classifier
Classic problem proposed in [Myers79] and
[Hetzel84]:
Distinct classification criteria:
side lengths - equilateral, isosceles or scalene
largest angle - acute, right or obtuse
Triangle Classifier: specification
Input:
dimensions of the three sides: three numbers,
separated by commas (or two angles instead).
Algorithm:
If the length of one side is greater than the sum
of the other two, then write "Not a triangle!"
If it is a valid triangle, then write its classification:
according to the largest angle - obtuse, right or
acute
according to the side lengths - scalene, isosceles or
equilateral
Triangle Classifier: exercise
Write a test case battery for the triangle
classifier
For each test case, specify:
input values (including invalid or unexpected
conditions)
corresponding expected output values
Example: 3,4,5 -> scalene, right
Triangle Classifier equivalence partitioning
For a complete test battery, we need to:
divide the solution space into partitions
identify typical cases for each partition
identify frontier cases
identify extreme cases
identify invalid cases
Now it is your turn to work …
Don't turn the page until you have finished!
Triangle Classifier partitions and typical cases
By side lengths:
          SCALENE    ISOSCELES   EQUILATERAL
OBTUSE    10, 6, 5   12, 7, 7    Impossible
RIGHT     5, 4, 3    √18, 3, 3   Impossible
ACUTE     6, 5, 4    7, 7, 4     6, 6, 6
By angles (third angle in parentheses):
          SCALENE          ISOSCELES        EQUILATERAL
OBTUSE    120º, 40º (20º)  120º, 30º (30º)  Impossible
RIGHT     90º, 40º (50º)   90º, 45º (45º)   Impossible
ACUTE     30º, 70º (80º)   30º, 75º (75º)   60º, 60º (60º)
Triangle Classifier boundary values
4.001, 4, 3.999 almost equilateral (scalene acute)
4.0001, 4, 4 almost equilateral (isosceles acute)
3, 4.9999, 5 almost isosceles (scalene acute)
9, 4.9999, 5 almost isosceles (scalene obtuse)
5, 4.0001, 3 almost right (scalene acute)
5.0001, 4, 3 almost right (scalene obtuse)
1, 1, 1.4141 almost right (isosceles acute)
1, 1, 1.4143 almost right (isosceles obtuse)
Triangle Classifier extreme cases
1, 2, 3 line segment!
0, 0, 0 point!
Note: extreme cases are not invalid!
Triangle Classifier: invalid cases
6, 4, 0 null side!
12, 4, 3 not a triangle!
5, 3, 2, 5 four sides!
2, 5 one side missing!
3.45 only one side!
No value!
3, , 4, 6 incorrect format
4A, 3, 7 invalid value
6, -1, 4 negative value
Triangle Classifier invalid cases
As we saw, apparently simple problems often
have subtleties that make testing more
complex than expected!
Frontier (boundary) values and invalid inputs are
the situations most likely to produce failures
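The whole case study fits in a short program; the sketch below implements the classifier specification and can be run against the battery above (treating zero or negative sides as invalid is a simplification of the slides' "point" case):

```python
def classify_triangle(a, b, c):
    """Classify a triangle by its three side lengths.

    Returns a (sides, angle) pair for valid triangles, or a string
    describing why the input is not an ordinary triangle.
    """
    if any(s <= 0 for s in (a, b, c)):
        return "invalid: non-positive side"
    x, y, z = sorted((a, b, c))          # z is the largest side
    if z > x + y:
        return "not a triangle"
    if z == x + y:
        return "degenerate (line segment)"
    sides = ("equilateral" if x == z
             else "isosceles" if x == y or y == z
             else "scalene")
    # Compare the square of the largest side with the sum of the other
    # two squares to classify the largest angle
    if z * z > x * x + y * y:
        angle = "obtuse"
    elif z * z < x * x + y * y:
        angle = "acute"
    else:
        angle = "right"
    return (sides, angle)
```

For example, classify_triangle(3, 4, 5) yields ("scalene", "right"), while classify_triangle(12, 4, 3) reports that the input is not a triangle.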
Test Techniques – Specification-based Decision table
Decision tables represent logical relationships between
conditions (roughly, inputs) and actions (roughly,
outputs)
Test cases are systematically derived by considering
every possible combination of conditions and actions
A related technique is cause-effect graphing
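A minimal sketch of the idea, using a hypothetical login policy: every combination of conditions is enumerated and paired with the action the specification prescribes, yielding one test case per table column:

```python
from itertools import product

def expected_action(valid_user, valid_password, account_locked):
    """Action prescribed by a hypothetical login decision table."""
    if not valid_user:
        return "reject"
    if account_locked:
        return "show lock message"
    return "grant access" if valid_password else "reject"

# Systematic derivation: one test case per combination of conditions
test_cases = [(conditions, expected_action(*conditions))
              for conditions in product([True, False], repeat=3)]
# Three boolean conditions give 2^3 = 8 test cases
```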
Test Techniques – Specification-based Finite-state machine-based
By modeling a program as a finite state machine,
tests can be selected in order to cover states and
transitions on it
Test Techniques – Specification-based Testing from formal specifications
Giving the specifications in a formal language
allows for automatic derivation of functional
test cases
At the same time, it provides a reference output, an
oracle, for checking test results
This is an active research topic
Test Techniques – Specification-based Random testing
Tests are generated in a stochastic (non-deterministic)
way
This form of testing falls under the heading of the
specification-based entry, since at least the input
domain must be known, to be able to pick random
points within it
Test Techniques – Specification-based Random testing
We simulate the data input by generating sequences
of values that may occur in practice
This process must be repeated over and over because, in the
long run, we can generate all possible input combinations
This approach is only feasible with a tool, a test case
generator - its input is some sort of description of the
possible input values, their sequence, and their
probability of occurrence
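A minimal sketch of this idea: inputs are drawn at random from a known input domain and each output is checked against an oracle (here a simple property; the function under test is hypothetical):

```python
import random

def absolute(x):
    """Hypothetical function under test."""
    return x if x >= 0 else -x

def random_test(trials=1000, seed=42):
    """Draw random inputs from the input domain and check an oracle."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-1e6, 1e6)       # the input domain must be known
        y = absolute(x)
        # Oracle: the result is non-negative and has the input's magnitude
        assert y >= 0 and y in (x, -x), f"failure-causing input: {x}"
    return trials
```

Repeating the process with different seeds explores more of the input space, as the slide notes.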
Test Techniques – Specification-based Random testing
Random tests are often used to test compilers,
through the generation of random programs
The description of possible input sequences can be made
with BNF (Backus Naur Form)
Random testing can also be used in testing
communications protocol software
The description of possible input sequences can be made
out of the state machines that describe each of the involved
parties
Test Techniques Code-based (aka white box)
Control-flow-based criteria
Data-flow-based criteria
Test Techniques – Code-based Control-flow-based criteria
Several testing tools allow the generation of
Control Flow Graphs from source code.
By instrumenting the source code, these tools make it
possible to verify graphically the execution of each
edge and node in the graph
Test Techniques – Code-based Control-flow-based criteria
The strongest control-flow-based criterion is path testing, which aims at executing all entry-to-exit control flow paths in the flowgraph
Full path testing is generally not feasible because of loops
Test Techniques – Code-based Control-flow-based criteria
Control-flow-based coverage criteria are aimed at covering all the statements or blocks of statements in a program
Several coverage criteria have been proposed, like condition/decision coverage
A test battery's coverage is the percentage of the total code (e.g. statements or branches/decisions) exercised by that battery
Code coverage is a much less stringent criterion than path coverage
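Statement coverage can be measured by instrumenting the program; the sketch below uses Python's tracing hook as a toy instrumenter (real tools such as coverage.py are far more complete):

```python
import sys

def executed_lines(func, *args):
    """Run func and record which of its source lines were executed
    (a toy statement-coverage tracer)."""
    lines = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            lines.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return lines

def sign(x):          # illustrative function under test
    if x > 0:
        return "positive"
    return "non-positive"

# One test case exercises only one branch; a second input is needed
# before the battery's statement coverage reaches 100%
covered = executed_lines(sign, 5) | executed_lines(sign, -5)
```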
Case study: Graph-based control flow
testing techniques
Control flow graphs
Are a graphical representation of programs that
captures the ways they can be traversed
during execution
nodes represent decisions
directed edges represent sets of sequential
instructions
In more complex code segments, the graph looks like
spaghetti - more tests are needed
Example: tax calculation
Consider an IRS tax system that reads annual
income revenues and determines the
corresponding tax due:
If the total income is less than 25K euros, no tax is
deducted
If it is above that, but less than 100K euros, the tax
is 7%
otherwise it is 15%
Example: tax calculation
Function Calculates_Tax (Int n)
Array of Int income;
Int total, tax;
1. total, tax = 0;
2. for i = 1 to n
3. { read(income[i]);
4.   total = total + income[i] };
5. if total >= 25000 then
6.   tax = total * 0.07
7. else if total >= 100000 then
8.   tax = total * 0.15;
9. return(tax)
(Figure: complete and reduced control flow graphs of the function)
Note: the problem solution is wrong,
because the condition for the 100K
EURO limit should be tested first.
This defect would be caught by
structural testing.
Example: how many test cases?
Based on graph theory, Tom McCabe proposed
the cyclomatic complexity metric that
expresses the minimum number of test cases for
100% test coverage: v(G) = # edges - # nodes + # inputs and outputs
In the current case we obtain:
11 - 9 + 2 = 4 (complete graph)
6 - 4 + 2 = 4 (reduced graph)
Therefore we should be able to produce 4 test cases
that when applied would lead to a 100% coverage.
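The computation can be checked mechanically; the edge list below is our reading of the flowgraph for the numbered statements (an assumption, since the figure itself is not reproduced here):

```python
# v(G) = E - N + 2 for a connected flowgraph with one entry and one exit
edges = [
    (1, 2),                  # initialization to loop header
    (2, 3), (3, 4), (4, 2),  # loop body: read and accumulate
    (2, 5),                  # loop exit
    (5, 6), (5, 7),          # if total >= 25000 then ... else ...
    (7, 8), (7, 9),          # else-if total >= 100000
    (6, 9), (8, 9),          # both branches join at the return
]
nodes = {n for edge in edges for n in edge}

v = len(edges) - len(nodes) + 2   # 11 - 9 + 2
```

This reproduces the slide's result of 4 independent paths, hence 4 test cases.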
Call graphs
Are a graphical representation of the
dependencies of functions, procedures or
methods on each other
nodes (boxes) represent functions, methods, etc.
directed edges represent the invocations made
This kind of white-box testing is often used for
profiling execution snapshots
Call graph based testing
Colors are often used to
represent coverage
percentages
Assessing structural test coverage
The ReModeler tool from
the QUASAR team uses
a model-based approach
to represent this kind of
testing coverage
Each class or package is
colored according to the
percentage of executed
methods
Test Techniques – Code-based Data-flow-based criteria
In data-flow-based testing, the control flowgraph is
annotated with information about how the program
variables are defined, used, and killed (undefined)
The strongest criterion, all definition-use paths, requires
that, for each variable, every control flow path segment
from a definition of that variable to a use of that
definition is executed
In order to reduce the number of paths required, weaker
strategies such as all-definitions and all-uses are
employed
Test Techniques Fault-based
With different degrees of formalization, fault-based
testing techniques devise test cases
specifically aimed at revealing categories of likely
or predefined faults
Two main techniques exist:
Error guessing
Mutation testing
Test Techniques – Fault-based Error guessing
In error guessing, test cases are specifically
designed by software engineers trying to figure
out the most plausible faults in a given program
A good source of information is the history of
faults discovered in earlier projects, as well as
the software engineer’s expertise
Test Techniques – Fault-based Mutation testing
A mutant is a slightly modified version of the program under test, differing from it by a small, syntactic change
Every test case exercises both the original and all generated mutants: if a test case is successful in identifying the difference between the program and a mutant, the latter is said to be “killed”
Originally conceived as a technique to evaluate a test set, mutation testing is also a testing criterion in itself: either tests are randomly generated until enough mutants have been killed, or tests are specifically designed to kill surviving mutants In the latter case, mutation testing can also be categorized as a code-based technique
The underlying assumption of mutation testing, the coupling effect, is that by looking for simple syntactic faults, more complex but real faults will be found
For the technique to be effective, a large number of mutants must be automatically derived in a systematic way.
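A hand-made sketch of the idea (real tools derive mutants automatically and in volume): the mutant below differs from the original by one relational operator, and a test set kills it only if some case distinguishes the two:

```python
def original(x):
    return "big" if x >= 10 else "small"

def mutant(x):
    # Single syntactic change: >= mutated into >
    return "big" if x > 10 else "small"

def kills(test_inputs):
    """True if some test case tells the original and the mutant apart."""
    return any(original(x) != mutant(x) for x in test_inputs)

weak_battery = [0, 5, 100]     # misses the boundary: the mutant survives
strong_battery = [0, 10, 100]  # x == 10 kills the mutant
```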
Test Techniques Usage-based
Operational profile
Software Reliability Engineered Testing
Test Techniques – Usage-based Operational profile
In testing for reliability evaluation, the test
environment must reproduce the operational environment of the software as closely as possible
The idea is to infer, from the observed test results, the future reliability of the software when in actual use
To do this, inputs are assigned a probability distribution, or profile, according to their occurrence in actual operation
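A minimal sketch: test inputs are drawn according to a (hypothetical) operational profile, so that observed failure frequencies estimate operational reliability:

```python
import random

# Hypothetical operational profile: relative frequency of each operation
# as observed (or forecast) in actual use
profile = {"query": 0.70, "update": 0.25, "delete": 0.05}

def draw_operations(n, seed=1):
    """Draw n operations with probabilities given by the profile."""
    rng = random.Random(seed)
    ops, weights = zip(*profile.items())
    return rng.choices(ops, weights=weights, k=n)

sample = draw_operations(10000)
# The sample frequencies approximate the profile, so failures observed
# while replaying it reflect the reliability a real user would see
```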
Test Techniques – Usage-based Software Reliability Engineered Testing
Software Reliability Engineered Testing (SRET)
is a testing method encompassing the whole
development process, whereby testing is
“designed and guided by reliability objectives and
expected relative usage and criticality of different
functions in the field.”
Test Techniques Based on nature of application
Object-oriented testing
Component-based testing
Web-based testing
GUI testing
Testing of concurrent programs
Protocol conformance testing
Testing of real-time systems
Testing of safety-critical systems
Test Techniques Selecting and combining techniques
Specification-based and code-based test
techniques are often contrasted as functional vs.
structural testing
These two approaches to test selection are not to
be seen as alternatives but rather as complementary
in fact, they use different sources of information and
have proved to highlight different kinds of problems
they should be used in combination, depending on
budgetary considerations
Automatic Construction of Test Cases
Test generation is possible from:
model-based specifications
algebraic (formal) specifications
Segmentation (“slicing”) and ramification
(“branch analysis”) techniques are used to
identify partitions
Automatic Construction of Test Cases – TTCN (Tree and Tabular Combined Notation)
1983: ISO TC 97/SC 16, later ISO/IEC JTC 1/SC 21 and CCITT SG VII, as part of the work on OSI conformance testing methodology and framework
Has been widely used since then for describing protocol conformance test suites in standardization organizations such as ITU-T, ISO/IEC, ATM Forum, ETSI, and in industry
1998: TTCN-2, in ISO/IEC and in ITU-T
New features: concurrency mechanism, concepts of module and package, manipulation of ASN.1 encoding
TTCN-3
Automatic Construction of Test Cases – TTCN (Tree and Tabular Combined Notation)
TTCN is a standardized test case format
The main characteristics of TTCN are that:
its tabular notation allows its user to describe easily and naturally, in a tree form, all possible scenarios of stimulus and the various reactions to it between the tester and the target
its verdict system is designed to facilitate judging whether the test result agrees with the test purpose, and
it provides a mechanism to describe appropriate constraints on received messages, so that conformance of the received messages can be automatically evaluated against the test purpose
TTCN-3 example
The following is an example of an Abstract Test Suite (ATS) where we are trying to test a weather service
The tester sends a request consisting of a location, a date and a kind of report to some on-line weather service, and receives a response with confirmation of the location and date, along with the temperature, the wind velocity and the weather conditions at this location
TTCN-3 example
A TTCN-3 ATS is always composed of four sections:
1. type definitions: data structures like in C, but also an easy-to-use concept of lists and sets
2. template definitions: a TTCN-3 template consists of two separate concepts merged into one: test data definition and test data matching rules
3. test case definitions: specify the sequences, and alternatives of sequences, of messages sent to and received from the System Under Test (SUT)
4. test control definitions: define the order of execution of the various test cases
Sample TTCN-3 Abstract Test Suite
module SimpleWeather {
type record weatherRequest {
charstring location,
charstring date,
charstring kind
}
template weatherRequest
ParisWeekendWeatherRequest := {
location := "Paris",
date := "15/06/2006",
kind := "actual"
}
type record weatherResponse {
charstring location,
charstring date,
charstring kind,
integer temperature,
integer windVelocity,
charstring conditions
}
template weatherResponse ParisResponse := {
location := "Paris",
date := "15/06/2006",
kind := "actual",
temperature := (15..30),
windVelocity := (0..20),
conditions := "sunny"
}
Sample TTCN-3 Abstract Test Suite
type port weatherPort message {
in weatherResponse;
out weatherRequest;
}
type component MTCType {
port weatherPort weatherOffice;
}
testcase testWeather() runs on MTCType {
weatherOffice.send(ParisWeekendWeatherRequest);
alt {
[] weatherOffice.receive(ParisResponse) {
setverdict(pass)
}
[] weatherOffice.receive {
setverdict(fail)
}
}
}
control {
execute (testWeather())
}
}
Automatic Construction of Test Cases
Implies the resolution of several problems:
program decomposition (slicing)
classification of the partitions found
selection of test paths
test case generation to exercise those paths
validation of the generated cases
The last problem is solved by the construction of an oracle (software) whose function is to determine whether, for a given test, the program responds according to its specification.
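A minimal sketch of such an oracle, assuming an executable specification is available to check against. The program under test and the property checked here are invented for the example:

```python
# Hypothetical program under test: supposedly computes the
# integer square root of a non-negative integer n.
def program_under_test(n):
    return int(n ** 0.5)

# Executable specification: r is the integer square root of n
# iff r >= 0 and r*r <= n < (r+1)*(r+1).
def spec_holds(n, r):
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)

def oracle(test_inputs):
    """For each generated test input, decide pass/fail against the spec."""
    verdicts = {}
    for n in test_inputs:
        verdicts[n] = spec_holds(n, program_under_test(n))
    return verdicts

verdicts = oracle(range(100))
print(all(verdicts.values()))
```

When a full executable specification exists, the oracle problem reduces to evaluating it on each (input, output) pair; in practice, oracles are often only partial (checking properties rather than exact outputs).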
Automatic Construction of Test Cases – An example
Who? » Siemens + Swiss PTT
What? » SAMSTAG (Sdl And Msc baSed Test cAse Generation)
How to model system & tests?
Target system (SDL): SDL – Specification and Description Language [ITU Z.100]
Test scenarios (MSC): MSC – Message Sequence Chart [ITU Z.120]
TTCN (Tree and Tabular Combined Notation) [ISO/IEC JTC1/SC21]
Automatic Construction of Test Cases – Some tools
Validator (Aonix)
SoftTest (?)
ObjectGEODE TestComposer (Verilog)
TestFactory (Rational)
Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
Test-related Measures – Evaluation of the program under test
Program measurements to aid in planning and designing testing
To guide testing we may use measures based on:
program size, e.g. SLOC or function points
program structure, e.g. McCabe’s metrics or the frequency with which modules call each other
Test-related Measures – Evaluation of the program under test
Fault types, classification, and statistics
Testing literature is rich in classifications and taxonomies of faults
To make testing more effective, it is important to know:
which types of faults could be found in the software under test
the relative frequency with which these faults have occurred in the past
This information can be very useful in making quality predictions, as well as for process improvement
Test-related Measures – Evaluation of the program under test
Fault density
A program under test can be assessed by counting and
classifying the discovered faults by their types
For each fault class, fault density is measured as the
ratio between the number of faults found and the size of
the program
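Fault density as defined above is mechanical to compute. The fault classes and the program size below are made-up numbers:

```python
# Hypothetical counts of discovered faults, classified by type,
# for a program of a given size (in KSLOC).
faults_by_class = {"logic": 18, "interface": 7, "data handling": 5}
size_ksloc = 12.5

def fault_density(faults, size):
    """Faults found per unit of program size, for each fault class."""
    return {cls: count / size for cls, count in faults.items()}

densities = fault_density(faults_by_class, size_ksloc)
for cls, d in sorted(densities.items()):
    print(f"{cls}: {d:.2f} faults/KSLOC")
```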
Test-related Measures – Evaluation of the tests performed
Coverage/thoroughness measures
Several test adequacy criteria require that the test cases systematically exercise a set of elements identified in the program or in the specifications
To evaluate the thoroughness of the executed tests, testers can monitor the elements covered, so that they can dynamically measure the ratio between covered elements and their total number
For example, it is possible to measure the percentage of covered branches in the program flowgraph, or that of the functional requirements exercised among those listed in the specifications document
Code-based adequacy criteria require appropriate instrumentation of the program under test
Example: static and dynamic metrics used to guide white-box testing
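The covered/total ratio can be illustrated with a toy function instrumented by hand (real tools instrument automatically; the function and branch labels here are invented):

```python
covered = set()

def classify(x):
    """Toy function under test, instrumented with branch markers."""
    if x < 0:
        covered.add("neg")      # branch 1
        return "negative"
    if x == 0:
        covered.add("zero")     # branch 2
        return "zero"
    covered.add("pos")          # branch 3
    return "positive"

ALL_BRANCHES = {"neg", "zero", "pos"}

def branch_coverage():
    """Ratio of covered branches to all branches."""
    return len(covered & ALL_BRANCHES) / len(ALL_BRANCHES)

# Two test cases exercise only two of the three branches:
classify(5)
classify(-3)
print(f"branch coverage: {branch_coverage():.0%}")
```

The uncovered "zero" branch points directly at the missing test case, which is exactly how coverage measures guide test design.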
Static Metrics Collection
Some examples collected by white-box tools:
– Number of private, protected and public attributes
– Overloading, overriding and visibility of operations
– Comments density (e.g. JavaDoc comments per class)
– Inheritance metrics (e.g. depth, width, inherited features)
– MOOSE metrics (Chidamber and Kemerer)
– MOOD metrics (Brito e Abreu)
– QMOOD metrics (Jagdish Bansiya)
Static Metrics - e.g. Cantata++
Dynamic Metrics Collection
Class, operation, branch, and exception clause coverage
Example: Multiple Condition Coverage
Measures whether each combination of condition outcomes for a decision has been exercised; are f() and g() called in the following code extract?
if ((a == b || f()) && (c == d || g())) x(); else y();
Note that the expression can evaluate to true without calling f() or g().
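The short-circuit behaviour noted above can be demonstrated directly; Python's `and`/`or` short-circuit just like C's `&&`/`||`, and the call log below exists only for observation:

```python
calls = []

def f():
    calls.append("f")
    return True

def g():
    calls.append("g")
    return True

a = b = 1   # a == b is True, so f() is never evaluated
c = d = 2   # c == d is True, so g() is never evaluated

if (a == b or f()) and (c == d or g()):
    result = "x"
else:
    result = "y"

# The decision evaluates to true, yet neither f nor g was called:
# multiple condition coverage would flag these combinations as untested.
print(result, calls)
```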
Test-related Measures – Evaluation of the tests performed
Fault seeding
Some faults are artificially introduced into the program before testing
When the tests are executed, some of these seeded faults will be revealed, and possibly some faults which were already there will be as well
depending on which of the artificial faults are discovered, and how many, testing effectiveness can be evaluated, and the remaining number of genuine faults can be estimated
Problems:
distribution and representativeness of seeded faults relative to original ones
small sample size on which any extrapolations are based
inserting faults into software involves the obvious risk of leaving them there
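The estimation step rests on a capture–recapture argument: if tests find seeded and genuine faults with equal ease, then the fraction of seeded faults found estimates the fraction of genuine faults found. A sketch with invented figures:

```python
def estimate_genuine_faults(seeded_total, seeded_found, genuine_found):
    """Capture-recapture style estimate of genuine faults, assuming
    seeded and genuine faults are equally likely to be detected."""
    if seeded_found == 0:
        raise ValueError("no seeded faults found: cannot extrapolate")
    detection_rate = seeded_found / seeded_total
    total = genuine_found / detection_rate
    remaining = total - genuine_found
    return total, remaining

# Hypothetical campaign: 50 faults seeded, 40 of them found,
# along with 120 genuine faults.
total, remaining = estimate_genuine_faults(50, 40, 120)
print(total, remaining)  # 150.0 genuine faults estimated, 30.0 still latent
```

The "problems" listed above translate directly into threats to this estimate: unrepresentative seeds bias detection_rate, and a small seeded_total makes it a noisy ratio.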
Test-related Measures – Evaluation of the tests performed
Mutation score
In mutation testing, the ratio of killed mutants to the total number of generated mutants can be a measure of the effectiveness of the executed test set
Summary
Software Testing Fundamentals
Test Levels
Test Techniques
Test-related Measures
Test Process
Test Process – Practical Considerations
Attitudes / Egoless programming
A very important component of successful testing is a collaborative attitude towards testing and quality assurance activities
Managers have a key role in fostering a generally favorable reception towards failure discovery during development and maintenance for instance, by preventing a mindset of code ownership
among programmers, so that they will not feel responsible for failures revealed by their code
Test Process – Practical Considerations
Test guides
The testing phases can be guided by various aims, for example:
risk-based testing, which uses the product risks to prioritize and focus the test strategy
scenario-based testing, in which test cases are defined based on specified software scenarios
Test Process – Practical Considerations
Test documentation and work products
Documentation is an integral part of the formalization of the test process
Test documents may include:
Test Plan
Test Design Specification
Test Procedure Specification
Test Case Specification
Test Log
Test Incident or Problem Report
Test Process – Practical Considerations
Internal vs. independent test team
External members may bring an unbiased, independent perspective
The decision on an internal, external, or blended team should be based upon considerations of:
cost
schedule
maturity levels of the involved organizations
criticality of the application
Test Process – Practical Considerations
Cost/effort estimation and other process measures
Several measures related to the resources spent
on testing, as well as to the relative fault-finding
effectiveness of the various test phases, are
used by managers to control and improve the
test process, such as:
number of test cases specified
number of test cases executed
number of test cases passed
number of test cases failed
Test Process – Practical Considerations
Cost/effort estimation and other process measures
Evaluation of test phase reports can be combined with root cause analysis to evaluate test process effectiveness in finding faults as early as possible
Such an evaluation could be associated with the analysis of risks
Moreover, the resources that are worth spending on testing should be commensurate with the use/criticality of the application:
different techniques have different costs and yield different levels of confidence in product reliability
Test Process – Practical Considerations
Termination
A decision must be made as to how much testing is enough and when a test stage can be terminated
Thoroughness measures, such as … achieved code coverage
functional completeness
estimates of fault density or of operational reliability
… provide useful support, but are not sufficient in themselves
Test Process – Practical Considerations
Termination
The decision also involves considerations about the
costs and risks incurred by the potential for remaining
failures, as opposed to the costs implied by continuing
to test
There are two possible approaches to this problem
Termination based on test efficiency
Termination based on test effectiveness
Test efficiency-based termination
To decide on test termination, or to compare distinct V&V procedures and tools, we need to know their efficiency
walkthroughs, inspections, black-box, white-box?
Efficiency = work produced / resources spent
» Test efficiency = defects found / effort spent = benefit / cost
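With per-period effort and defect counts, the benefit/cost ratio and its decline over time are straightforward to compute. The weekly figures below are invented:

```python
# Hypothetical weekly testing data: effort spent (person-hours)
# and defects found in each week.
effort = [800, 1200, 2000, 2200, 2100, 1900]
defects = [120, 250, 280, 150, 60, 20]

def weekly_efficiency(defects, effort):
    """Benefit/cost ratio per week: defects found per unit of effort."""
    return [d / e for d, e in zip(defects, effort)]

eff = weekly_efficiency(defects, effort)
# Efficiency typically decreases as testing proceeds: more and more
# effort is needed to find each new defect, which can serve as a
# stopping signal once it falls below a planned threshold.
print([round(x, 3) for x in eff])
```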
Test efficiency-based termination
As testing proceeds …
defect density decreases
test efficiency decreases - more and more effort is spent (cost) to find new defects (benefit)
reliability grows - the probability that users experience defect effects (failures) reduces
Case Study – Testing Effort
[charts: testing effort spent per week, and cumulative effort (cost), over weeks 1–16]
Case Study – Defects Found
[charts: defects found per week, and cumulative defects (benefit), over weeks 1–16]
Case Study – Test Efficiency
[charts: benefit/cost ratio (test efficiency) and cost/benefit ratio, over weeks 1–16]
These ratios can be used to set test stopping thresholds
Test effectiveness-based termination
"Testing can only show the presence of bugs but never their absence"
Dijkstra
Is this statement correct?
Test effectiveness-based termination
Test effectiveness allows deciding when tests should be stopped
the test plan should indicate that level (e.g. 90%)
Effectiveness = achieved effect / desired effect
» Test effectiveness = percentage of total defects found
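Given weekly defect counts and an estimate of the total number of defects, cumulative effectiveness and the stopping check against the planned level can be sketched as follows (all numbers invented):

```python
from itertools import accumulate

# Hypothetical weekly defects found, and the estimated total
# number of defects in the product.
weekly_defects = [120, 250, 280, 150, 60, 40, 20, 10]
estimated_total = 1000

def cumulative_effectiveness(weekly, total):
    """Cumulative fraction of total defects found after each week."""
    return [found / total for found in accumulate(weekly)]

def stop_week(effectiveness, threshold=0.9):
    """First week at which the test plan's effectiveness level is reached."""
    for week, e in enumerate(effectiveness, start=1):
        if e >= threshold:
            return week
    return None  # threshold not yet reached: keep testing

eff = cumulative_effectiveness(weekly_defects, estimated_total)
print([f"{e:.0%}" for e in eff])
print(stop_week(eff))  # week at which the 90% level is first reached
```

The hard part in practice is estimated_total, which is exactly what the defect injection technique later in this section tries to supply.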
Test Effectiveness – Case Study
[charts: weekly % of defects found (weekly test effectiveness) and cumulative % of defects found (cumulative test effectiveness), over weeks 1–16]
Conclusion: it is not worth testing beyond a certain point; that point can be based on a given effectiveness threshold
Test Effectiveness
To calculate it we need to know:
the total number of defects, or the number of remaining defects (total = found + remaining)
Remaining defects can be known a posteriori
simply wait for user-reported failures (not a good choice ...)
Even then, we have to set an observation period
observation period = f(system complexity, transaction rate)
some defects may only cause failures after intensive use
Defect Injection Technique
This technique allows estimating remaining defects and therefore obtaining test effectiveness
1. A member of the development team (not necessarily the producer) deliberately inserts some defects into the target system, neither clustered together nor in a contrived way.
2. That person documents the location of the injected defects and delivers that information to the project leader.
3. The target system is passed on to the testing team.
4. Test process effectiveness is assessed through the percentage of injected defects that were found.
5. Remaining (non-injected) defects are then estimated
Defect Injection (continued)
Before the beginning of the test we have:
DOi Original defects (unknown!)
DIi Injected defects (known)
At any moment after the beginning of the test we have:
DOe Original defects found
DIe Injected defects found
DOr = DOi - DOe Original defects remaining (not found)
DIr = DIi - DIe Injected defects remaining (not found)
Defect Injection (continued)
Let:
ERO = DOe / DOi Effectiveness in Original Defects Removal (unknown!)
ERI = DIe / DIi Effectiveness in Injected Defects Removal (known!)
Considering ERO ≈ ERI, which will be close to the truth if the number of injected defects is sufficiently large:
DOi = DOe / ERO ≈ DOe / ERI
DOr = DOi ( 1 - ERO ) = DOe ( 1 / ERO - 1 ) ≈ DOe ( 1 / ERI - 1 )
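The estimate above is mechanical to implement once DIi, DIe and DOe are known; the campaign figures below are invented:

```python
def defect_injection_estimate(di_total, di_found, do_found):
    """Estimate the original defect population, assuming ERO ≈ ERI
    (removal effectiveness is the same for injected and original defects)."""
    eri = di_found / di_total                 # ERI = DIe / DIi (known)
    do_total = do_found / eri                 # DOi ≈ DOe / ERI
    do_remaining = do_found * (1 / eri - 1)   # DOr ≈ DOe (1/ERI - 1)
    return eri, do_total, do_remaining

# Hypothetical campaign: 25 defects injected, 20 of them found,
# alongside 160 original defects found so far.
eri, do_total, do_remaining = defect_injection_estimate(25, 20, 160)
print(eri, do_total, do_remaining)  # 0.8, 200.0 original defects, 40.0 remaining
```

Note that this inherits the same caveats as fault seeding: the injected defects must be representative, and DIi must be large enough for ERI to be a stable ratio.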
Test Process – Test activities
Defect tracking
Detected defects can be analyzed to determine:
when they were introduced into the software
what kind of error caused them to be created
e.g. poorly defined requirements, incorrect variable declaration, memory leak, programming syntax error, …
when they could have been first observed in the software
Test Process – Test activities
Defect tracking
Defect-tracking information is used to determine what aspects of software engineering need improvement and how effective previous analyses and testing have been
This causal analysis allows the introduction of prevention actions
Prevention is better than cure, and is a typical characteristic of higher levels of maturity in the software development process
Defect prevention in CMMI
Bibliography
[Bec02] K. Beck, Test-Driven Development by Example, Addison-Wesley, 2002.
[Bei90] B. Beizer, Software Testing Techniques, International Thomson Press, 1990, Chap. 1-3, 5, 7s4, 10s3, 11, 13.
[Jor02] P. C. Jorgensen, Software Testing: A Craftsman's Approach, second edition, CRC Press, 2004, Chap. 2, 5-10, 12-15, 17, 20.
[Kan99] C. Kaner, J. Falk, and H.Q. Nguyen, Testing Computer Software, 2nd ed., John Wiley & Sons, 1999, Chaps. 1, 2, 5-8, 11-13, 15.
[Kan01] C. Kaner, J. Bach, and B. Pettichord, Lessons Learned in Software Testing, Wiley Computer Publishing, 2001.
[Lyu96] M.R. Lyu, Handbook of Software Reliability Engineering, Mc-Graw-Hill/IEEE, 1996, Chap. 2s2.2, 5-7.
[Per95] W. Perry, Effective Methods for Software Testing, John Wiley & Sons, 1995, Chap. 1-4, 9, 10-12, 17, 19-21.
[Pfl01] S. L. Pfleeger, Software Engineering: Theory and Practice, 2nd ed., Prentice Hall, 2001, Chap. 8, 9.
[Zhu97] H. Zhu, P.A.V. Hall and J.H.R. May, “Software Unit Test Coverage and Adequacy,” ACM Computing Surveys, vol. 29, iss. 4 (Sections 1, 2.2, 3.2, 3.3), Dec. 1997, pp. 366-427.
Applicable standards
(IEEE610.12-90) IEEE Std 610.12-1990 (R2002), IEEE Standard Glossary of Software Engineering Terminology, IEEE, 1990.
(IEEE829-98) IEEE Std 829-1998, Standard for Software Test Documentation, IEEE, 1998.
(IEEE982.1-88) IEEE Std 982.1-1988, IEEE Standard Dictionary of Measures to Produce Reliable Software, IEEE, 1988.
(IEEE1008-87) IEEE Std 1008-1987 (R2003), IEEE Standard for Software Unit Testing, IEEE, 1987.
(IEEE1044-93) IEEE Std 1044-1993 (R2002), IEEE Standard for the Classification of Software Anomalies, IEEE, 1993.
(IEEE1228-94) IEEE Std 1228-1994, Standard for Software Safety Plans, IEEE, 1994.
(IEEE12207.0-96) IEEE/EIA 12207.0-1996 // ISO/IEC12207:1995, Industry Implementation of Int. Std. ISO/IEC 12207:95, Standard for Information Technology-Software Life Cycle Processes, IEEE, 1996.
Black-Box Tools - Web Links
JavaStar (http://www.sun.com/workshop/testingtools/javastar.html)
JavaLoad (http://www.sun.com/workshop/testingtools/javaload.html)
VisualTest, Scenario Recorder, Test Suite Manager (http://www.rational.com/)
SoftTest (http://www.softtest.com/pages/prod_st.htm)
AutoTester (http://www.autotester.com/)
WinRunner (http://www.merc-int.com/products/winrunguide.html)
LoadRunner (http://www.merc-int.com/products/loadrunguide.html)
QuickTest (http://www.mercury.com)
TestComplete (http://www.automatedqa.com)
S-Unit test framework (http://sunit.sourceforge.net)
eValid™ Automated Web Testing Suite (http://www.soft.com/eValid/)
White-Box Tools - Web Links
JavaScope (http://www.sun.com/workshop/testingtools/javascope.html)
JavaSpec ( http://www.sun.com/workshop/testingtools/javaspec.html )
Cantata++ ( http://www.iplbath.com/ )
PureCoverage, Quantify, Purify ( http://www.rational.com/ )
LDRA ( http://www.luna.co.uk/~elverex/ldratb.htm )
McCabe Test (http://www.mccabe.com/?file=./prod/test/data.html )
ATTOL Coverage ( http://www.attol-testware.com/coverage.htm )
Quality Works ( http://www.segue.com )
Panorama (http://www.softwareautomation.com/)