The Past & Future of Software Testing - Cem Kaner · The Past & Future of Software Testing: ... I wrote Testing Computer Software to ... •Let’s face some realities

1How Many Lightbulbs to Change a Tester? Copyright © 2003 Cem Kaner

The Past & Future of Software Testing:How Many Lightbulbs Does It Take

To Change A Tester?

Cem Kaner, J.D., [email protected]

www.kaner.com, www.testingeducation.org

at the

Software Testing Day

Tampere University of Technology

May 4, 2004

Research underlying these slides was partially supported by NSF Grant EIA-0113539ITR/SY+PE: "Improving the Education of Software Testers." Any opinions, findings andconclusions or recommendations expressed in this material are those of the author(s) anddo not necessarily reflect the views of the National Science Foundation (NSF).

1


Old Truths• Many years ago, the software development community

formed a model for the software testing effort. As I interactedwith it from 1980 onward, the model included several "bestpractices” and other shared beliefs about the nature oftesting.

• Beginning in 1983, I wrote Testing Computer Software tofoster rebellion against some of these. And to support manyothers.

The testing community has developed a culture

around these shared beliefs.


Lightbulbs?• Over the years, many

– testing projects have failed– test managers have been fired– products have been successful

despite corporate refusal toconform to testing “standards”

• And yet, most of the same old lore isstill being repeated as the proper guideto testing culture and practice.

Don’t think the old stuff is still with us?Look at ISEB’s current syllabus for test practitioner certification:www1.bcs.org.uk/DocsRepository/00900/913/docs/practsyll.pdf

Or look at IEEE’s SWEBOK, www.swebok.orgThere are many other examples. . . .

What should it take,

for us to learn from

our experiences?

2


We Should Advocate for Waterfall or V models?• In these develop-in-stages lifecycle models, we try to finish each

(stage / phase) before moving on to the next:

from

Robin Goldsmith, TheForgotten Phase,Software Development,July 2002,http://www.sdbestpractices.com/documents/s=8815/sdm0207e/0207e.htm


We Should Advocate for Waterfall or V models?• These models are often harshly criticized as being inherently

high risk.

– they force the project team to lock down details longbefore the implications, costs, complexity, andimplementation difficulty of those details are known.

• The iterative approaches (spiral, RUP, evolutionarydevelopment, XP, etc.) are reactions to these risks.

Why would testers actively advocatefor this type of lifecycle?


We Should Advocate for Waterfall or V models?• Think of the tradeoffs:

-- Features -- Reliability -- Cost -- Time to Market• In the waterfall (and V), we lock down the feature set early, do

the design, write the code (and so spend most of the money thatwe ever will spend on it)

• So, toward the end of the project, what variables are still free?

Reliability versus Time

Sound familiar?Why should we set ourselves up for this grief?

Iterative models are designed to change this tradeoff


Testers Should Refuse to Test if There is No Specification?• This was one of the assertions that motivated me to start writing

Testing Computer Software in 1983. (I disagreed with it.)• It’s still with us. I saw a leading consultant / author publicly give this

advice to a pair of newly-promoted test managers a couple years ago.• Let’s face some realities

– Many development groups choose not to write detailedspecifications. Is this always bad?

– Many product changes are made without updating thespecification, because the spec is not considered final authority inthat company. Is this always bad?

– A tester who refuses to proceed until the engineering process ischanged is hijacking the management of the project.

Jump in front of the train enough times and eventually you will get run over.


Drive Tests From the Specification?• It’s easy to say that test cases should be based on documented

characteristics of the program, for example on the requirementsdocuments or the specifications.

• And there should be at least one thoroughly documented test forevery requirement item or specification item.

– Why only one test per item? Does one test each cover thespectrum of failure risks for these items? (No)

– What about all the items in the program that can’t be tracedback to the requirements docs or the design specification?

A specification is one source of fallible guidance.To the extent that the spec limits what you test,

it is a source of risk. 3


Expected Results are Only Part of the Story

Expected Test Results

Postcondition Data

PostconditionProgram State

EnvironmentalResults

Test Oracle

System UnderTest

Test Inputs

Precondition Data

PreconditionProgram State

EnvironmentalInputs

Obtained Test Results

Postcondition Data

PostconditionProgram State

EnvironmentalResults

Reprinted with permission of Doug Hoffman


A Test Without an Expected Result is Not a Test?• A test that is defined in terms of one expected result is

undefined against the other types of results available fromthat test.

• High-volume automated tests may run until crash. The only“expected result” might be non-crash or conformance toother simple predictions. These techniques exposeinteresting problems that we do not know how to test forotherwise. The expected results have no relationship to theactual risks we are attempting to mitigate.– Are these improper tests?– Narrow visions of test design certainly steer us away

from them.– But they expose critical bugs.


Design Most Tests Early in Development?• Why would anyone want to spend most of their test design

money early in development?

– The earlier in the project, the less we know about how itcan fail, and so the less accurately we can prioritize

One of the core problems of testing is the infinityof possible tests.

Good test design involves selection of a tiny subsetof these tests.

The better we understand the product and its risks,the more wisely we can pick those few tests.


Design Most Tests Early in Development?• “Test then code” is fundamentally different from test-first

programming

Supports exploratory developmentof architecture & design

Supports understanding ofrequirements

Near-zero delay, communicationcost

Usual process inefficiencies anddelays (code, then deliver build,then wait for test results, slow,costly feedback)

Primarily unit tests and low-levelintegration

Primarily acceptance, or system-level tests

The programmer creates 1 test,writes code, gets the code working,refactors, moves to next test

The tester creates many testsand then the programmer codes

Test-first developmentTest then code


Good Engineering RequiresEarly Lockdown of the User Interface?

• How many times do we hear this nonsense from GUI regressiontest tool vendors and their shills?

• The user interface is there for communication with the user.

• Usability engineering is highly iterative

When we do beta testingand discover,

as we always discover,that the product confuses / annoys the user,

Do we really want to push a process that will discourage developers

from making the improvementswe call for in our test results?

4


It’s Their Job to Keep the User Interface StableNot Our Job to Make Our Tests Maintainable

• WHY do some testers think programmers will restructure theirdevelopment process to make things convenient for testers?

• WHY would anyone expect programmers to show test code(and the tester) any respect if the code can’t cope with simplechanges?

• Change is inevitable. Deal with it.

Mommy, mommythose big, bad, nasty programmers

won’t let me do my job!

It’s THEIR fault I can’t get anything done, Mommy!


Good Tests Should be Reused as Regression Tests?

• Let’s distinguish between the change-detectors at the codelevel and UI / System level regression tests

• Change detectors– writing these helped the TDD programmer think through the

design & implementation– near-zero feedback delay and near-zero communication cost

make these tests a strong support for refactoring• System-level regression

– provide no support for implementation / design– are run well after the code is put into a build that is released

to testing (long feedback delay)– run by someone other than the programmer (feedback cost)


Good Tests Should be Reused as Regression Tests?• Maintenance of UI / system-level tests is not free

– change the design discover the inconsistency discoverthe problem is obsolescence of the test change the test

• So we have a cost/benefit analysis to consider carefully:– What information will we obtain from re-use of this test?– What is the value of that information?– How much does it cost to automate the test the first time?– How much maintenance cost for the test over a period of

time?– How much inertia does the maintenance create for the

project?– How much support for rapid does the test suite provide for

the project?


The Concept of Inertia• INERTIA: The resistance to change that we build into a project.

– Intentional inertia:• Change control boards

• User interface freezes– Process-induced inertia

• Costs of change that are imposed by the development process– rewrite the specification– rewrite the tests– re-run all the tests

• Testers advocate for late changes (aka bug fixes)

• What inertia do our processes induce in the project, and towhat extent does our inertia block needed improvements?

5


Testers are The Advocates of Quality?How often have you heard statements like this?

• “Somebody has to look out for the user.”

• “The project needs one group that really cares about quality.”

• “Project managers can’t afford to care about quality.”

When you identify yourself as

The Advocate For Qualityyou’re saying that

“We care about QualityAND THEY DON’T.

That prophesy getsself-fulfilling after a while.

6


Testers Should be Able to Veto a Product Release?• The power to veto shipment is often described as the thing

that distinguishes a “testing” group from a “qualityassurance” group.

• There are serious problems with this “power”– It takes accountability away from the project manager who is

making the development tradeoffs– It lets project managers (and others) play “release chicken”, setting

the test group up to be the bad guys for holding the release (andblowing the schedule)

– It sets testers up to be “the enemy” without any long-term qualitybenefit, and it makes testers the ones to blame if the product fails inthe field.

– The test group may well lack the knowledge to make the rightdecision


Testers Should be Able to Veto a Product Release?

• To achieve better quality, build a better product.

– Better code, better user documentation, bettertraining, better support.)

• To assure better quality,take over management of thegroups that can make the product better.

Complaining about bad quality after the fact, even beating people with a stick after the fact,

will not assure better quality.


So What Do We Do?• If we're not the quality assurance group, what are we?

– Service providers.– We provide testing services.

Testing is a technical investigation of a product, anempirical search for quality-related information of

value to a project's stakeholders.


Testers Should Work WithoutKnowledge of the Underlying Code?

• Why?

– Protects testers from bias? Huh?– Gives us an excuse for hiring people

who can’t code?• Maybe a better reason is discipline:

– Rather than learning about theproduct from the code, theblack box tester has to gather otherclasses of information that theprogrammer probably didn’t consider.

Are we really advocatingignorance-

based testing?

7


So, Testers SHOULD Work WithoutKnowledge of the Code?

• This is a practical question, not a question of principle:

– What does the project gain from a tester who focuses oncustomer benefits, configuration / compatibility, andcoordination with other products?

contrasted with– What does the project gain from a tester who can review code

to determine plausibility of some tests, and can implementcode-aware test tools / techniques (e.g. FIT, simulators usingprobes, event logs, etc.)

– What does the project gain from a tester who activelycollaborates with the programmers in the analysis anddebugging of the code?


Programmers Can’t Catch Their Own Bugs?• A programmer’s public bug rate includes all bugs left in the code when she

gives it to someone else (such as a tester.) Rates of one to three bugs perhundred statements are not unusual.

• A programmer’s private bug rate includes all bugs she makes, including anyshe fixes before passing the program to testing.

• Estimates of private bug rates: 15-150 bugs per 100 statements (e.g.Beizer).

Therefore, programmers must be findingand fixing between 80% and 99.3% of theirown bugs before their code goes into test.

• What does this tell us about our task?

We find bugs by looking into theprogrammer’s (and her tools’) blind spots.

• Merely repeating the types of tests that the programmers did won’t yield morebugs.

• That’s one of the reasons that an alternative approach is so valuable.


Testers Should Work IndependentlyFrom Programmers?

Benefits of collaboration?• Functional testing

Emphasis is on capability / benefits for the user.

The skilled functional tester often gains a deep knowledge ofthe needs of customers and customer-supporting stakeholders.

• Para-functional testingSecurity, usability, accessibility, supportability, localizability,interoperability, installability, performance, scalability.

The customer / user is not an expert in these attributes but hasgreat need of them.Effective testing will often require collaboration and mutualcoaching between programmers and testers.

• Preventative testing (programmer support)


Testers Should Work Independently From Programmers?• Preventative testing (programmer support)

Test-driven development (TDD) can benefit from support fromtesters (pairing with programmers)

Benefits of TDD to the project?– Provides a structure for working from examples, rather than from an

abstraction. (Supports a common learning / thinking style.)– Provides concrete communication with future maintainers.– Provides a unit-level regression-test suite (change detectors)

• support for refactoring

• support for maintenance

– Makes bug finding / fixing more efficient• No roundtrip cost, compared to GUI automation and bug reporting.

• No (or brief) delay in feedback loop compared to external tester loop

– Provides support for experimenting with the language


Our Job is to Find and ReportCoding Errors, Not Design Errors?

This is an old debate.

Programmer-centric vision of the role oftesting

Broad benefit-to-the-business visionof the role of testing

V

E

R

S

U

STesters are specialists, focusing oncoding mistakes and playing a juniorsupport role to programmers

Testers are generalists, learningabout all dimensions of qualityproblems

“Not my job” theory of testing – we shouldrecruit and train for a narrow set ofresponsibilities

We should build diverse test groupswith knowledge in many relevantdomains, including quality attributes

Testers don’t have skills to assessquality-characteristic problems (usability,security, accessibility, etc.).

We can assess for many types ofissue; few companies havespecialists (e.g. UI expert), so wecarry the problem.

A key risk is the long-term de-skilling of the test group.


The IEEE Software Engineering Standards are aBasis for Sound Testing Practice?

• We see this assertion regularly, but

– how many testers have ever read these standards? Have you? (atPNSQC, only half were familiar with the IEEE software processstandards and less than 1% advocated their use in their projects.How can this be an "industry standard"?)

– I think they have a strong bias toward heavyweight processes(massive paperwork, not much engineering).

– I think they have a strong bias toward waterfall, which I see as high-risk engineering.

– I think they have a strong bias toward a One True Way that doesn’tvary with the project context.

– On balance, I think Standard 829 has done more harm than good, andthe other process standards have been largely irrelevant. 8


Document Manual Tests in Full Procedural Detail?

• The claim is that manual tests should be documented in greatprocedural detail so that they can be handed to lessexperienced or less skilled testers, who will

(a) repeat the tests consistently, in the way they wereintended,

(b) learn about test design from executing these tests, and(c) learn the program from testing it, using these tests.

I don’t see any reason to believe that we will achieve any of these benefits. I think this is as close as

we come to an industry worst practice.


Most Testers are Stupid or Non-Technical?• Under this theory,

– We attempt to reduce testing to a set of routine steps• document intensely so that one smart person can

transmit The Great Thought to a bunch of lesser testers• automate intensely so that one smart person and a

machine can replace those lesser testers– It makes sense to replace testers with test-first programmers– Test automation languages should be ultra-simple– Testers’ time/work should be closely supervised with little room

for independent judgment

This vision cannot scale to current product / development complexity 9


Ultimately We Face a Conflict of Visions

Tools help us learn new thingsTools help us replace manual laborwith automated labor

We support and influence a broadgroup of stakeholders;communication skills are vital.

We support a process model or anarrow constituency (programmers orin-house end users or ...)

Breadth-first, then learning-guideddepth, until we run out of ideas ortime. We rely on cognitive triggersand peers to keep the flow ofideas steady (until we run out)

We want our infinite set of tests to feelfinite. We use progress measures (e.g.“coverage”) to gauge minimally-sufficient testing.

We are constantly trying to learnnew things about the product (andhow it can fail)

We are trying to develop a simple,easily taught, easily supervised,control process.

Active LearnerQA / QC


We Should Not Do Exploratory Testing?• Exploratory testing involves simultaneously

– executing tests– learning about the program, its market, its risks– designing new tests based on what we have just learned

• These tests can be automated or manual, whatever will teachus more about what we’re investigating in the moment

Exploratory testing is the tool of the active learner,

the technical investigator,

Rather than the QC automaton.


We Should Not Do Exploratory Testing?• Everyone explores to some degree

– follow-up testing (try to replicate or extent) a just-found bug– regression testing an allegedly-fixed bug

• Some tests provide all their value the first time you use them– Many scenario tests inform the tester of the design and provide

little new information (compared to new scenarios) on reuse• Straw man objections

– Exploratory testing should only be done by experts (SWEBOK)– All testing should be automated, and if impossible to automate,

should be made into a precise routine as if it were automated(Crispin)

– All tests should be captured and turned into regression tests


We Should Not Do Exploratory Testing?• Another way to avoid exploratory testing (and the vision of a

skilled technical investigator) is to find a way to recharacterizeexploration as something easily turned into a routine:– Exploration is “really” based on memorization of past failures: we could

get the same benefit from a fault catalog, a failure catalog, or applicationof PSP.

– Exploration is “really” based on a stereotyped set of attacks: maybe wecan build a program that will generate these attacks.

Reference materials and improved tools certainly help the explorer explore, but ultimately, exploratory testing is about

discovery of things we don’t know about the program and don’t yet have a routine, cost-effective system for exposing

(that can work in the present project’s context).


What’s With All This Outsourcing?• Peter Drucker, Managing in the Next Society, stresses that we

should manufacture remotely but provide services locally. Thelocal service provider is:

– more readily available, more responsive, and more ableto understand what is needed

• So why are companies so willing to obtain their softwaredevelopment services from halfway around the world?

If we adopt development processes that (at great cost)push all communication to paper, demand early decisions

and make late changes difficult and expensive, what benefit is left to the local service provider?

If we spend the money to create the formalistic infrastructure that we would need for outsourcing,

we may as well do the work where it is cheaper, because we have squandered our local advantages.

10

Documents

The Past & Future of Software Testing - Cem Kaner · The Past & Future of Software Testing: ... I wrote Testing Computer Software to ... •Let’s face some realities