
Page 1: Introduction to Software Testing

Università degli Studi dell’Aquila

L19: Introduction to Software Testing

Henry Muccini

DISIM, University of L’Aquila

www.henrymuccini.com, [email protected]

Page 2: Introduction to Software Testing

The material in these slides may be freely reproduced and distributed, partially or totally, as long as an explicit reference or acknowledgment to the material's author is preserved.

With very special thanks to Antonia Bertolino and Debra J. Richardson, who collaborated on previous versions of these lecture notes

Page 3: Introduction to Software Testing

AGENDA

Software Failures: examples

Verification and Validation

Software Testing: Intro

Software Testing: Basics

Testing Process

Type of Testing

Page 4: Introduction to Software Testing

The Skype pervasive failure:

•Blackout for two hours

•Europe, Japan, Australia, Afghanistan, South Africa, Malaysia and Brazil have been affected

Page 5: Introduction to Software Testing

Therac-25 safety failure:

•approximately 100 times the intended dose of radiation

•3 people died, and 6 were injured

Factors:

•Overconfidence in software

•Confusing reliability with safety

•Lack of defensive design

•Failure to eliminate fault causes

•Inadequate software engineering practices

•…

see article at: http://sunnyday.mit.edu/papers/therac.pdf

Page 6: Introduction to Software Testing

Ash Cloud–related software stress failures

Page 7: Introduction to Software Testing

Trains in the Netherlands (March 22, 2012)

Tens of thousands of people around the large cities weren’t able to travel by train Thursday morning. No trains from and to Amsterdam and Airport Schiphol from early morning until after the morning rush hour. A failure in the back-up system was the cause. ProRail said that there was a fault in the ‘switch software’. The system therefore didn’t start. And then the signals and switches could not be operated.

Other articles suggest that both the primary and the backup system failed, hence no operations.

Links: http://www.elsevier.nl/web/Nieuws/Nederland/334086/Oorzaak-van-treinstoring-blijkt-fout-in-software.htm

http://www.rnw.nl/english/bulletin/trains-amsterdam-running-again

On the impact on people: http://www.dutchnews.nl/news/archives/2012/03/signalling_problems_cause_rail.php

Page 8: Introduction to Software Testing

The Poste Italiane business failure:

Page 9: Introduction to Software Testing

The MIUR system for the maturità (final state) exams: "Maturità 2.0 starts with a flop. The «commissione web» system, the novelty of the 2012 state exam, did not work. The software, developed to allow exam boards to communicate in real time with the MIUR central mainframe about all exam-related activities, broke down before even starting. In the schools of Florence, the boards were unable to enter online the minutes of the opening meetings held this morning."

http://corrierefiorentino.corriere.it/firenze/notizie/cronaca/2012/18-giugno-2012/maturita-20-partenza-flop-201657781657.shtml

Trenitalia reservations:

"The new Ferrovie dello Stato system is a disaster: some users can no longer use their code, but cannot unregister because doing so requires the code. From 1 to 3 at night it does not work, because of maintenance, but they don't tell you …"

http://righedidiomira.blogspot.it/2012/01/sempre-trenitalia-sempre-piu-disservizi.html

Fineco, IMU F24 payment: with the primary-residence deduction, my taxable amount goes below zero and the system breaks down.

Page 10: Introduction to Software Testing

http://violapost.it/?p=7351

Page 11: Introduction to Software Testing
Page 12: Introduction to Software Testing

[Fatal Defect, Ivars Peterson, 1995]

Half the book is about failures in software development

Page 13: Introduction to Software Testing

http://www.wired.com/software/coolapps/news/2005/11/69355?currentPage=all

Page 14: Introduction to Software Testing

http://www.devtopics.com/20-famous-software-disasters/

Page 15: Introduction to Software Testing

NIST (National Institute of Standards and Technology) study in 2002 [NIST]:

→software errors cost the U.S. economy $59.5 billion every year.

Standish Chaos report [Standish]:

→a clear statement of requirements is one of the three main reasons that lead to project success, while incomplete requirements are one of the main reasons for project cancellation.

[NIST] The Economic Impacts of Inadequate Infrastructure for Software Testing. NIST Planning Report 02-3, 2002. http://www.nist.gov/public affairs/releases/n02-10.htm.
[Standish] The Standish Group Report: Chaos. 1995. http://www.projectsmart.co.uk/docs/chaos-report.pdf.

Page 16: Introduction to Software Testing

Validation:

does the software system meet the user's real needs?

are we building the right software?

(valid with respect to users' needs)

Verification:

does the software system meet the requirements specification?

are we building the software right?

(valid with respect to the system specification)

Page 17: Introduction to Software Testing

Software Inspection (static analysis technique)

Debugging (to locate and fix bugs)

Theorem proving

Model checking (to prove a property's correctness)

Software Testing

(None is the absolute perfect solution)

Page 18: Introduction to Software Testing

Completeness & Correctness

• Correctness properties are undecidable

• False positives and false negatives

Timeliness

• The V&V process stops (most of the time) when there is no more time

• Time is one of the stopping rules

Cost-effectiveness

• "Select the less that gives you the most"

• V&V is justified especially when failures are expensive

Page 19: Introduction to Software Testing

SOFTWARE TESTING

Page 20: Introduction to Software Testing

An all-inclusive definition

Software testing consists of:

the dynamic verification of the behavior of a program

on a finite set of test cases

IMP

suitably selected from the (in practice infinite) input domain

against the specified expected behavior

[A. Bertolino]

Page 21: Introduction to Software Testing

What testing is not (citation from Hamlet, 1994):

I've searched hard for defects in this program, found a lot of them, and repaired them. I can't find any more, so I'm confident there aren't any.

� Testing is NOT exhaustive

IMP

⇒ What to test?

⇒ When to stop?

� Testing is NOT cheap

⇒ test less and best!!

Page 22: Introduction to Software Testing
Page 23: Introduction to Software Testing

(1) Testing Process:

� Test Selection (Category partition)

� Test Execution

� Oracle

� Regression Testing

� Systematic vs. Ad Hoc

� Glossary

(2) Type of Testing:

� Black Box and White Box

� Unit, Integration, System

Page 24: Introduction to Software Testing

Testing involves several demanding tasks:

→Test selection ─ how to identify a suitable finite set of test cases

→Test execution ─ how to translate test cases into executable runs

IMP

→Test oracle ─ deciding whether the test outcome is acceptable or not

─ if not, evaluating the impact of the failure and its direct cause (the fault)

→Testing adequacy ─ judging whether the test campaign is sufficient

→Test coverage

Page 25: Introduction to Software Testing

Test selection consists of the identification of a "suitable" and finite set of test cases.

The test selection activity provides guidelines on how to select test cases. It is driven by a "test criterion" and has to produce "suitable" test cases.

Page 26: Introduction to Software Testing

Slide taken from Alex Orso

Page 27: Introduction to Software Testing

Test Criterion:

� A test criterion provides the guidelines, rules, and strategy by which test cases are selected. In general, a test criterion is a means of deciding what a "good" set of test cases shall be (Reference 117 of [Muccini08]).

Suitability:

� A test case is suitable if it contributes to discovering as many failures as possible, according to a test criterion.

[Muccini08] Henry Muccini, Software Testing: Testing New Software Paradigms andNew Artifacts, in: Wiley Encyclopedia of Computer Science and Engineering, John Wiley & Sons, Inc., 2008

Page 28: Introduction to Software Testing

Test Case:

� A test case is a set of inputs, execution conditions, and a pass/fail criterion (Ref. 116 of [Muccini08]).

A test case thus includes not only input data but also any relevant execution conditions and procedures, and includes a way of determining whether the program has passed or failed the test on a particular execution (Ref. 8 of [Muccini08]).

Test Suite:

� A test suite is a collection of test cases.
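The definitions above can be sketched in code. This is a minimal illustration, not from the lecture: the `divide` function, the dictionary encoding, and the helper names are all hypothetical, chosen only to show that a test case bundles inputs, an execution condition, and a pass/fail criterion.

```python
def divide(a, b):
    """Hypothetical unit under test: guarded division."""
    if b == 0:
        raise ValueError("division by zero")
    return a / b

# A test case = input data + execution conditions + a pass/fail criterion.
test_case = {
    "inputs": (10, 4),
    "precondition": lambda a, b: b != 0,  # execution condition
    "expected": 2.5,                      # pass/fail criterion
}

def run_test_case(tc):
    a, b = tc["inputs"]
    assert tc["precondition"](a, b), "precondition not met"
    return divide(a, b) == tc["expected"]

print(run_test_case(test_case))  # → True
```

A test suite, under this encoding, would simply be a list of such dictionaries run in sequence.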

Page 29: Introduction to Software Testing

The EasyLine system is composed of three sub-systems:

�The SP system

�The Mobile app

�The server-side application

Page 30: Introduction to Software Testing

How to select test cases? (test selection technique)

How many test cases? (when to stop --stopping rule?)

Which artefacts to use for selecting test cases? (code, spec?)

Ad hoc or Systematic testing?

Page 31: Introduction to Software Testing

Tester’s intuition and expertise

• “Ad hoc testing” (sometimes quite effective)

• Special cases

Specifications

• Equivalence partitioning

• Boundary-value analysis

• Decision table

• Automated derivation from formal specs (conformance t.)

• ....

Fault-based

• Error guessing/special cases

• Mutation

Usage

• SRET

• Field testing

Code

• Control-flow based

• Data-flow based

Nature of application, e.g.:

• Object Oriented

• Web

• GUI

• Real-time, embedded

• Scientific

• .....

No single technique is the best; a combination of different criteria has empirically been shown to be the most effective approach
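As a small sketch of two of the specification-based criteria above (equivalence partitioning and boundary-value analysis), assume a hypothetical function, not from the lecture, whose specification accepts ages in the range 0–120: partitioning picks one representative per equivalence class, and boundary-value analysis adds the values at and around the class edges.

```python
def accept_age(age):
    """Hypothetical unit under test: valid ages are 0..120 inclusive."""
    return 0 <= age <= 120

# Equivalence partitioning: one representative per class.
partition_reps = {"below range": -5, "in range": 30, "above range": 200}

# Boundary-value analysis: values at and around each boundary.
boundary_values = [-1, 0, 1, 119, 120, 121]

expected = {-5: False, 30: True, 200: False,
            -1: False, 0: True, 1: True, 119: True, 120: True, 121: False}

for age, want in expected.items():
    assert accept_age(age) == want, f"failed for input {age}"
print("all selected test cases pass")
```

Note how the two criteria together yield nine test cases from an input domain that is, in practice, unbounded.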

Page 32: Introduction to Software Testing

Code-based: (code graphs)

→Structural/White Box Testing

─ Test cases selected based on the structure of the code

─ Views the program/component as a white box (also called glass box testing)

[diagram: Source Code ─ (Test) Inputs → Output; internal behavior is visible]

Specification-based: (input–output)

→Functional/Black Box Testing

─ Test cases selected based on the specification

─ Views the program/component as a black box

[diagram: Binary Code or Spec ─ (Test) Inputs → Output; internal behavior is hidden]

Page 33: Introduction to Software Testing

We focus on “systematic” testing:

→Repeatable

→Measurable

─ best tester

IMP

─ coverage

→Based on sampling:

─ Infinite input domain, but finite set of test cases

Page 34: Introduction to Software Testing

There are two main sub-activities to be performed:

→B1) identify those “inputs" which force the execution of the

selected test case,

→B2) put the system in a state from which the specified test

can be launched.

B1 -- Forcing the execution of the test cases derived

according to one criterion might not be obvious

→In code-based testing, we have entry-exit paths over the graph model, and test inputs that execute the corresponding program paths need to be found

Page 35: Introduction to Software Testing

B2 -- put the system in a state from which the specified

test can be launched

→Also called, Test Pre-condition

→In Synchronous Systems:

─ Several runs in sequence are required to put the system in the test pre-condition

→In Concurrent Systems:

─ Non Determinism problem

Replay problem
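A minimal illustration of B2 for the synchronous case (the `Account` class and its operations are hypothetical, not from the slides): a sequence of setup runs drives the system into the pre-condition state, and only then is the actual test launched.

```python
class Account:
    """Hypothetical system under test."""
    def __init__(self):
        self.balance = 0
        self.frozen = False
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        if self.frozen or amount > self.balance:
            return False
        self.balance -= amount
        return True

def test_withdraw_at_limit():
    acct = Account()
    acct.deposit(50)   # setup runs in sequence: establish the
    acct.deposit(50)   # test pre-condition (balance == 100, not frozen)
    assert not acct.frozen and acct.balance == 100
    # the actual test, launched from the pre-condition state:
    return acct.withdraw(100) and acct.balance == 0

print(test_withdraw_at_limit())  # → True
```

In a concurrent system the same setup sequence may not reproduce the same state on every run, which is exactly the non-determinism/replay problem noted above.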

Page 36: Introduction to Software Testing

The EasyLine system is composed of:

�Web services

�Sensors

�Mobile applications

�routing algorithms

�…

Page 37: Introduction to Software Testing

A test oracle is a mechanism for verifying the behavior of test execution

→ extremely costly and error prone to verify

→ oracle design is a critical part of test planning

Sources of oracles

→ input/outcome oracle

→ tester decision

→ regression test suites

→ standardized test suites and oracles

→ gold or existing program

→ formal specification


Page 38: Introduction to Software Testing

Given input d, the expected output is f*(d): the test passes (YES) iff

f(d) = f*(d)

» In some cases easier (e.g., an existing version, an existing formal specification), but generally very difficult (e.g., operational testing)

» A research problem that is not emphasized enough
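The check f(d) = f*(d) can be sketched with a "gold" (trusted existing) program acting as the oracle, one of the oracle sources listed above. The sorting functions here are illustrative assumptions, not from the lecture.

```python
def f_gold(xs):
    """Trusted reference ('gold') implementation: f*(d)."""
    return sorted(xs)

def f_under_test(xs):
    """Hypothetical implementation being verified: insertion sort, f(d)."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

def oracle(d):
    """Verdict: the test on input d passes iff f(d) == f*(d)."""
    return f_under_test(d) == f_gold(d)

print(oracle([3, 1, 2]))  # → True
```

The cost observation above applies directly: this oracle is only as cheap and as trustworthy as the gold program it compares against.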

Page 39: Introduction to Software Testing

Theoretical notions of test adequacy are usually defined in terms of adequacy criteria

→ Coverage metrics (sufficient percentage of the program structure has been exercised)

→ Empirical assurance (failures/test curve flatten out)

→ Error seeding (percentage of seeded faults found is proportional to the percentage of real faults found)

→ Independent testing (faults found in common are representative of total population of faults)

Adequacy criteria are evaluated with respect to a test suite and a program under test
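A toy sketch of the first adequacy criterion above, a coverage metric, instrumented by hand (real tools such as coverage.py automate this; the `classify` function and the three-branch count are illustrative assumptions):

```python
covered = set()  # which branches the suite has exercised so far

def classify(n):
    """Hand-instrumented unit under test with three branches."""
    if n < 0:
        covered.add("negative")
        return "negative"
    elif n == 0:
        covered.add("zero")
        return "zero"
    else:
        covered.add("positive")
        return "positive"

suite = [5, 7]  # an inadequate suite: both inputs hit the same branch
for t in suite:
    classify(t)
print(f"branch coverage: {len(covered) / 3:.0%}")  # → 33%

classify(-1)
classify(0)  # adding test cases raises the adequacy measure
print(f"branch coverage: {len(covered) / 3:.0%}")  # → 100%
```

Note that, as stated above, the measure is relative to both the suite and the program: the same suite can be adequate for one program and inadequate for another.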


Page 40: Introduction to Software Testing

(1) Testing Process:

� Test Selection (Category partition)

� Test Execution

� Oracle

� Regression Testing

� Systematic vs. Ad Hoc

� Glossary

(2) Type of Testing:

� Black Box and White Box

� Unit, Integration, System

Page 41: Introduction to Software Testing

Black box vs White box [in next lectures]

Unit, Integration, System

Performance, Stress

Regression Testing [in next lectures]

Page 42: Introduction to Software Testing
Page 43: Introduction to Software Testing

Unit:

→The unit test purpose is to ensure that the unit satisfies its functional specification and/or that its implemented structure matches the intended design structure

→Unit tests can also be applied to test interfaces or local data structures.

Integration:

→Integration testing is specifically aimed at exposing the problems that arise from the combination of components

→Communicating interfaces among integrated components need to be tested

→Type: big-bang or incremental (top-down, bottom-up, mixed)

System:

→It attempts to reveal bugs which depend on the environment

→Recovery testing, security testing, stress testing and performance testing
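A minimal sketch contrasting the first two levels (the components here are hypothetical, not from the slides): a unit test checks one component against its functional specification in isolation, while an integration test exercises the communicating interface between the combined components.

```python
class Tokenizer:
    """Component A: splits text into whitespace-separated tokens."""
    def split(self, text):
        return text.split()

class Counter:
    """Component B: counts tokens."""
    def count(self, tokens):
        return len(tokens)

class WordCounter:
    """Integrates A and B through their interfaces."""
    def __init__(self):
        self.tokenizer = Tokenizer()
        self.counter = Counter()
    def words(self, text):
        return self.counter.count(self.tokenizer.split(text))

# Unit test: Tokenizer in isolation, against its functional spec.
assert Tokenizer().split("a b c") == ["a", "b", "c"]

# Integration test: exercises the interface between the components.
assert WordCounter().words("testing is not exhaustive") == 4
print("unit and integration tests passed")
```

Here the integration is trivially incremental (bottom-up: each unit is tested first, then their combination); a system test would additionally run `WordCounter` in its target environment.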

Page 44: Introduction to Software Testing

[V-model diagram: on the left (design & analyze; plan & validate/verify), from high to low levels of abstraction: User Requirements, Software Requirements Specification, Architecture Design Specification, Component Specifications, Unit Implementations. On the right (integrate & test), the corresponding levels: Acceptance Testing, System Testing, Integration Testing, Component Testing, Unit Testing. Horizontal axis: time; vertical axis: levels of abstraction.]

Page 45: Introduction to Software Testing

The EasyLine system is composed of three sub-systems:

�The SP system

�The Mobile app

�The server-side application

Page 46: Introduction to Software Testing

Stress testing: designed to test the software in abnormal situations.

→Stress testing attempts to find the limits at which the system will fail through an abnormal quantity or frequency of inputs.

→The test succeeds if the system keeps working when stressed with higher rates of inputs and maximum use of memory or system resources.

Performance testing is usually applied to real-time, embedded systems in which poor performance may have a serious impact on normal execution.

→Performance testing checks the run-time performance of the system and may be coupled with stress testing.

→Performance is not strictly related to functional requirements: functional tests may fail while performance ones succeed.
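The stress idea above can be sketched with a toy driver that increases the input quantity until the system fails; the `TinyQueue` class and its deliberately small capacity are hypothetical, standing in for a real resource limit.

```python
class TinyQueue:
    """Hypothetical system under test with a deliberate capacity limit."""
    CAPACITY = 1000

    def __init__(self):
        self.items = []

    def push(self, item):
        if len(self.items) >= self.CAPACITY:
            raise OverflowError("queue full")
        self.items.append(item)

def stress_until_failure():
    """Drive the system with an abnormal quantity of inputs."""
    q = TinyQueue()
    n = 0
    try:
        while True:
            q.push(n)
            n += 1
    except OverflowError:
        return n  # the limit at which the system fails

print(stress_until_failure())  # → 1000
```

A performance test of the same system would instead time `push` under a normal load and compare the measurement against a stated requirement, which is why the two are often coupled.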