Testing Interactive Software: A Challenge for Usability and Reliability

Philippe Palanque, LIIHS-IRIT, University Toulouse 3, 31062 Toulouse, France, [email protected]

Regina Bernhaupt, ICT&S-Center, Universität Salzburg, 5020 Salzburg, Austria, [email protected]

Ronald Boring, Idaho National Laboratory, Idaho Falls 83415, Idaho, [email protected]

Chris Johnson, Dept. of Computing Science, University of Glasgow, Glasgow, G12 8QQ, Scotland, [email protected]

Sandra Basnyat, LIIHS-IRIT, University Toulouse 3, 31062 Toulouse, France, [email protected]

Special Interest Group – CHI 2006 – Montréal – 22nd April 2006

Outline of the SIG
• Short introduction about the SIG (10 min)
• Short presentations (20 min)
  • Software engineering testing for reliability (Philippe)
  • Human reliability for interactive systems testing (Ron)
  • Incident and accident analysis and reporting for testing (Sandra)
  • HCI testing for usability (Regina)
• Gathering feedback from audience (10 min)
• Presentation of some case studies (20 min)
• Listing of issues and solutions for interactive systems testing (20 min)
• Discussion and summary (10 min)

Introduction
• What are interactive applications?
• What is interactive applications testing?
  • Coverage testing
  • Non-regression testing
• Usability versus reliability
  • What about usability testing of a non-reliable interactive application?
  • What about reliable applications with poor usability?

Interactive Systems

A paradigm switch

• Control flow is in the hands of the user
• Interactive application idle, waiting for input from the users
• Code is sliced
• Execution influenced by internal and external states
• Nothing new but …

Classical Behavior

[Figure: flowchart of the classical behavior. Boxes: Read Input, Process Input, Exit?, End. The program reads an input, processes it, and tests the exit condition before reading the next input or ending.]

Event-based Functioning

[Figure: event-based functioning, showing the Application, the Window Manager with its Event Queue, and the States. At startup: Register Event Handlers (EH Registration), Call Window Manager. At runtime: Get next event, Dispatch event to Event Handler 1, Event Handler 2, … Event Handler n, Ack received, Wait for next event. Final box: Finished.]
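The event-based functioning can be illustrated with a minimal sketch. It is a toy model, not the code of any real window manager: the class `WindowManager`, its methods `register`, `post` and `run`, and the event names `click`, `key` and `quit` are all hypothetical names chosen for this example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, object]  # (event type, payload)


class WindowManager:
    """Toy stand-in for a window manager: owns the event queue and the handlers."""

    def __init__(self) -> None:
        self.queue: Deque[Event] = deque()
        self.handlers: Dict[str, List[Callable[[object], None]]] = {}

    def register(self, event_type: str, handler: Callable[[object], None]) -> None:
        # At startup: the application registers its event handlers.
        self.handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type: str, payload: object = None) -> None:
        # User actions are appended to the event queue.
        self.queue.append((event_type, payload))

    def run(self) -> None:
        # At runtime: get the next event, dispatch it, repeat.
        # A real loop would block while waiting for input; this sketch
        # stops when the queue is empty or a "quit" event is read.
        while self.queue:
            event_type, payload = self.queue.popleft()
            if event_type == "quit":
                break
            for handler in self.handlers.get(event_type, []):
                handler(payload)


log: List[str] = []
wm = WindowManager()
wm.register("click", lambda target: log.append(f"click on {target}"))
wm.register("key", lambda char: log.append(f"key {char}"))

# The order of execution follows the order of the user's events,
# not the order in which the handlers appear in the source code.
wm.post("key", "a")
wm.post("click", "OK")
wm.post("quit")
wm.post("click", "ignored")  # posted after quit, never dispatched
wm.run()
print(log)
```

The point for testing is that the code is sliced into handlers whose execution order is chosen by the user, so a test has to cover sequences of events rather than a single path through a main program.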

Safety Critical Interactive Systems

Safety Critical Systems
• Software engineers
• System centered
• Reliability
• Safety requirements (certification)
• Formal specification
• Verification / Proof
• Waterfall model / structured
• Archaic interaction techniques

Interactive Systems
• Usability experts
• User centered
• Usability
• Human factors
• Task analysis & modeling
• Evaluation
• Iterative process / Prototyping
• Novel interaction techniques

Some Well-known Examples (1/2)

Some Well-known Examples

The Shift from Reliability to Fault-Tolerance

• Failures will occur
• Mitigate failures
• Reduce the impact of a failure

A small demo …

Informal Description of a Civil Cockpit Application

• The working mode
  • The tilt selection mode: AUTO or MANUAL (AUTO). The CTRL push-button swaps between the two modes.
  • The stabilization mode: ON or OFF. The CTRL push-button swaps between the two modes. Access to the button is forbidden when in AUTO tilt selection mode.
• The tilt angle: a numeric edit box allows a value to be selected within the range [-15°; 15°]. Modifications are forbidden when in AUTO tilt selection mode.
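The informal description above can be turned into a small executable model, which is one way to make the mode constraints testable. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the initial state (AUTO, stabilization OFF, angle 0) is an assumption, since the description does not give one.

```python
class ForbiddenAction(Exception):
    """Raised when the user attempts an action that the current mode forbids."""


class CockpitPanel:
    TILT_MIN, TILT_MAX = -15.0, 15.0  # range given in the description, in degrees

    def __init__(self) -> None:
        # Assumed initial state; not specified in the informal description.
        self.tilt_mode = "AUTO"
        self.stabilization = "OFF"
        self.tilt_angle = 0.0

    def toggle_tilt_mode(self) -> None:
        self.tilt_mode = "MANUAL" if self.tilt_mode == "AUTO" else "AUTO"

    def toggle_stabilization(self) -> None:
        if self.tilt_mode == "AUTO":
            raise ForbiddenAction("stabilization button is disabled in AUTO")
        self.stabilization = "ON" if self.stabilization == "OFF" else "OFF"

    def set_tilt_angle(self, degrees: float) -> None:
        if self.tilt_mode == "AUTO":
            raise ForbiddenAction("tilt angle cannot be modified in AUTO")
        if not self.TILT_MIN <= degrees <= self.TILT_MAX:
            raise ValueError(f"tilt angle {degrees} outside [-15, 15]")
        self.tilt_angle = degrees


def attempt(action) -> str:
    """Run one user action and classify the outcome."""
    try:
        action()
        return "ok"
    except ForbiddenAction:
        return "forbidden"
    except ValueError:
        return "out of range"


panel = CockpitPanel()
results = [
    attempt(lambda: panel.set_tilt_angle(5.0)),   # still in AUTO
    attempt(panel.toggle_stabilization),          # still in AUTO
    attempt(panel.toggle_tilt_mode),              # AUTO -> MANUAL
    attempt(lambda: panel.set_tilt_angle(5.0)),   # allowed in MANUAL
    attempt(lambda: panel.set_tilt_angle(20.0)),  # outside the range
    attempt(panel.toggle_stabilization),          # allowed in MANUAL
]
print(results)
```

Each entry in `results` corresponds to one sentence of the description, so a failing entry points directly at the requirement that the implementation violates.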

Various perspectives of this Special Interest Group

• Software engineering testing for reliability
• Human reliability testing
• Incident and accident analysis and reporting for testing
• HCI testing for usability

Consequence: Inconvenience

Consequence: Danger

What do we mean by human error?
• Conceptualizing error
  • Humans are natural “error emitters”
  • On average we make around 5-6 errors every hour
  • Under stress and fatigue that rate can increase dramatically
• Most errors are inconsequential or mitigated
  • Many mistakes have no consequences or impact
  • Where there may be consequences, defenses and recovery mechanisms often prevent serious accidents

Human Reliability Analysis (HRA)
• Classic Definition
  • The use of systems engineering and human factors methods in order to render a complete description of the human contribution to risk and to identify ways to reduce that risk
• What’s Missing
  • HRA can be used to predict human performance issues and to identify human contributions to incidents before they occur
  • Can be used to design safe and reliable systems

Performance Shaping Factors (PSFs)
• Are environmental, personal, or task-oriented factors that influence the probability of human error
• Are an integral part of error modeling and characterization
• Are evaluated and used during quantification to obtain a human error rate applicable to a particular set of circumstances
  • Specifically, the basic human error probabilities obtained for generic circumstances are modified (adjusted) per the specific situation
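The adjustment described above can be written as a simple multiplicative form, which matches the steps given later in this deck for the heuristic evaluation matrix ("determine product of multipliers, multiply product by nominal error rate"). The symbols are introduced here for illustration only, and individual HRA methods may apply further corrections.

```latex
\[
  \mathrm{HEP} \;=\; \mathrm{HEP}_{\text{nominal}} \times \prod_{i=1}^{n} \mathrm{PSF}_i
\]
```

With hypothetical values, a nominal error probability of 0.01 and two PSF multipliers of 10 and 0.5 give 0.01 × 10 × 0.5 = 0.05. Because the result is a probability, a product that exceeds 1 has to be capped or otherwise corrected.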


Example: SPAR-H PSFs

Maximizing Human Reliability
• Increasingly, human reliability needs to go beyond being a diagnostic tool to become a prescriptive tool
  • NRC and nuclear industry are looking at new designs for control rooms and want plants designed with human reliability in mind, not simply verified after the design is completed
  • NASA has issued strict Human-Rating Requirements (NPR 8705.2) that all space systems designed to come in contact with humans must demonstrate that they impose minimal risk, they are safe for humans, and they maximize human reliability in the operation of that system
• How do we make reliable human systems?
  • Design and Test: “classic” human factors
  • Model: human reliability analysis

Best Achievable Practices for HR: The Human Reliability Design Triptych

Concluding Thoughts
• Human error is ubiquitous
• Pressing need to design ways to prevent human error
• Impetus comes from safety-critical systems
  • Lessons learned from safety-critical systems potentially apply across the board, even including designing consumer software that is usable
• Designing for human reliability requires merger of two fields
  • Human factors/HCI for design and testing
  • Human reliability for modeling

Incidents and Accidents as a Support for Testing
• Aim: contribute to a design method for safer safety-critical interactive systems
• Inform a formal system model
• Ultimate goals
  • Embedding reliability, usability, efficiency and error tolerance within the end product
  • While ensuring consistency between models

The Approach (1/2)
• Address the issue of system redesign after the occurrence of an incident or accident
• 2 Techniques
  • Events and Causal Factors Analysis
  • Marking Graphs extracted from a system model
• 2 Purposes
  • Ensure current system model accurately models the sequence of events that led to the accident
  • Reveal further scenarios that could eventually lead to similar adverse outcomes

The Approach (2/2)

[Figure: overview of the approach, split into an incident & accident investigation part and a system design part. Boxes: Accident Report, Safety-Case Analysis, Model The System, ECF Analysis, Re-model The System, Formal ICO System Model Including Erroneous Events, Marking Graph Analysis, Re-Design System Model to make Accident Tolerant, Extraction of Relevant Scenarios. Key: Document, Procedure, Data, Decision, Modelling, Part of the whole process.]

[Figure: ECFA chart. Box labels recovered from the chart, in extraction order:]
• Seal on North grinder overheated
• Worker checks valves in containment area are in correct position to make the switch
• South pump idle for 3 days
• South pump grinder motor replaced due to bad seal
• Fuel does not flow to kilns
• Low pressure in the pipes
• Air in the pipes
• Blockage caused by a clog
• Blockage caused by a closed valve
• Kiln op. radios supervisor to inform that fuel is not getting to kilns
• Supervisor is informed that fuel is not getting to kilns
• Kiln op. monitors system through fuel line sensors in control room
• Decision made to switch from North to South DS
• Supervisor and Kiln Operator discuss situation
• Kiln Operator notices seal on North grinder overheated
• Kiln Operator notices fuel does not flow to kilns
• Supervisor bleeds air from ¾” ball valves of fuel pipe system
• Supervisor believes it is an air problem
• Worker bleeds air at south pumps
• Worker unaware of manufacturer’s guidelines against bleeding air while pumps in operation
• Supervisor unaware of manufacturer’s guidelines against bleeding air while pumps in operation
• Worker radios kiln op. to find out if fuel has started to go to kilns
• Kiln op. tells Worker fuel is not going through
• Kiln Operator observes fuel is still not reaching the kiln area
• Low pressure in the kiln area
• Blockage in pipe abruptly removed
• Water-hammer effect
• Grinder exploded & propelled off its base
• Air continues to block the monyo grinder fuel piping
• Supervisor bleeds air of valve south 330 pump while in operation
• Fuel sprayed from grinder base covering Supervisor and Worker
• Fuel ignites
• Kiln area pressure sensor senses low pressure
• Kiln area pressure sensor sends signal to pump motors to activate the ‘Step-Increase’ program
• Fuel starts flowing in the piping
• Max PSIG of 334 per motor can be achieved
• Kiln area pressure sensor sends signal to F-system
• PLC does not respond
• PLC not connected to F-System
• Kiln area pressure sensor senses low pressure
• Supervisor unaware of pump manufacturer advice not to bleed air while pumps in operation
• Follow-up maintenance checks were not performed after new PLC was installed
• New PLC was installed 3 months prior to accident
• FOXBRO was still connected to old PLC system
• Manufacturer's warning to not bleed the lines of air while the pumps were operating
• Manufacturer’s warning not included in training
• PLC did not activate the auto shut-down of pumps (should occur if <60PSI not sensed after 3 mins of startup)
• Fire suppression sensors mounted 40' above floor
• Fireball occurred approx. 20' above floor
• Fire suppression system did not activate
• Kiln control operator saw the fire from the control room and radioed for help
• Control room supervisor activated the manual emergency shut down on the pumps
• Worker and Supervisor’s clothes catch fire
• Supervisor runs outside kiln area
• Supervisor uses extinguisher to extinguish himself
• Worker runs outside
• Supervisor tries to extinguish worker

Timed events:
• Supervisor directed Worker to switch the pumps from North to South (12:45pm)
• Worker goes to containment area where pumps are located (12:46pm)
• Worker shuts down North system pumps & starts South system pumps (12:48pm)
• Pump motors increase speed in order to achieve pressure of 70 PSI (12:48pm)
• F-System sends signal to PLC (12:51pm)
• Supervisor observes pipe entering south grinder begin to shake & vibrate (12:58pm)

ECFA Chart of the Accident

Marking Trees & Graphs

• Marking Tree: identifies the entire set of reachable states
• Is a form of state transition diagram
• Analysis support tools available
• However, can impose considerable overheads when considering complex systems such as those in case study
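To make the idea of a marking graph concrete, here is a minimal sketch that enumerates the reachable markings of a small Petri net by breadth-first search. The net itself is hypothetical (a two-mode switch loosely inspired by the cockpit example earlier in this deck) and is not the ICO model used by the authors; the function names `fire` and `marking_graph` are likewise chosen for this example.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Marking = Tuple[int, ...]  # number of tokens per place, in the order of PLACES

PLACES = ["auto", "manual", "stab_off", "stab_on"]

# Each transition maps to (tokens consumed per place, tokens produced per place).
TRANSITIONS: Dict[str, Tuple[Dict[str, int], Dict[str, int]]] = {
    "to_manual": ({"auto": 1}, {"manual": 1}),
    "to_auto": ({"manual": 1}, {"auto": 1}),
    # Stabilization may only change in MANUAL: the "manual" token is
    # consumed and put back, which acts as a test on that place.
    "switch_stab_on": ({"manual": 1, "stab_off": 1}, {"manual": 1, "stab_on": 1}),
    "switch_stab_off": ({"manual": 1, "stab_on": 1}, {"manual": 1, "stab_off": 1}),
}


def fire(marking: Marking, name: str) -> Optional[Marking]:
    """Return the marking reached by firing `name`, or None if it is not enabled."""
    pre, post = TRANSITIONS[name]
    tokens = dict(zip(PLACES, marking))
    if any(tokens[place] < count for place, count in pre.items()):
        return None
    for place, count in pre.items():
        tokens[place] -= count
    for place, count in post.items():
        tokens[place] += count
    return tuple(tokens[place] for place in PLACES)


def marking_graph(initial: Marking) -> Dict[Marking, List[Tuple[str, Marking]]]:
    """Breadth-first exploration of all markings reachable from `initial`.

    Terminates only for bounded nets; the number of markings can grow very
    quickly with the size of the model.
    """
    graph: Dict[Marking, List[Tuple[str, Marking]]] = {}
    todo = deque([initial])
    while todo:
        marking = todo.popleft()
        if marking in graph:
            continue
        graph[marking] = []
        for name in TRANSITIONS:
            successor = fire(marking, name)
            if successor is not None:
                graph[marking].append((name, successor))
                todo.append(successor)
    return graph


graph = marking_graph((1, 0, 1, 0))  # start in AUTO with stabilization OFF
for marking, edges in graph.items():
    print(marking, "->", edges)
```

Every path through the resulting graph is a candidate scenario, which is why the graph can serve both to check that an accident sequence is reproducible in the model and to look for other sequences leading to the same state.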

The Approach Not Simplified

[Figure: the full approach, split into an INCIDENT AND ACCIDENT INVESTIGATION PART and a SYSTEM DESIGN PART. Boxes: Safety-Cases, Safety-Case Analysis, Accident Report, ECF Analysis, Accident Scenarios, Model The System, Re-model The System, Marking Graph Analysis, All Possible Scenarios of System Model, HERT Data, Including Erroneous Events, Extraction of Relevant Scenarios, Relevant Runnable Scenarios, Simulate Scenarios, System Model Problems, Re-Design System Model to make Accident Tolerant, Re-Designed Model, Check New Model (OK / Not OK), Finish. Key: Document, Modelling, model, Decision, Procedure, Data.]

Usability Evaluation Methods (UEM)

• UEMs conducted by experts
  • Usability Inspection Methods, Guideline Reviews, …
  • Any type of interactive systems
• UEMs involving the user
  • Empirical evaluation, Observations, …
  • Any type of interactive systems (from low-fi prototypes to deployed applications)

Usability Evaluation Methods (UEM)

• Computer supported UEMs
  • Automatic testing based on guidelines, …
  • Task models-based evaluations, metrics-based evaluation, …
  • Applications with standardized interaction techniques (Web, WIMP)

Issues of Reliability and Usability

Testing the usability of a non reliable system?

Constructing reliable systems without considering usability?

Possible ways to enhance, extend, enlarge UEMs to address these needs?

Gathering feedback from the audience through case studies
• Do we need to integrate methods OR develop new methods?
• In favor of integration
  • Joint meetings (including software developers) through brainstorming + rapid prototyping (more problems of non-usable reliable systems)
• Problems
  • Some issues are also related to system reliability (ATMs)
  • Problem of testing a prototype versus testing the system
  • Issues of development time rather than application type
  • Application type has an impact on the processes selected for development
  • Don’t know how to build a reliable interactive system … whatever time we have
  • How can reliability-oriented methods support usability-oriented methods?

Gathering feedback from the audience through case studies
• How to design for testability (both the reliability of the software and the usability)?
• Is testing enough or do we need proof?
• Usability testing is at a higher level of abstraction (goal oriented) while software testing is at a lower level (function oriented)
• Is there an issue with interaction techniques (do we need precise description of interaction techniques and is it useful for usability testing?)
• Automated testing through user-events simulation (how to understand how the user can react to that?)
• Issue of reliability according to the intention of the user, and not only the reliability of the system per se
• Going beyond one instance of use to reproducing the use many times

Gathering feedback from the audience and case studies

• Control Room (Ron)
• Home/Mobile: testing in non-traditional environments (Regina)
• Mining case study (Sandra)

First Case Study: Control Room

Advanced Control Room Design: Transitioning to new domains of Human System Interaction

Problem: Next generation nuclear power plants coupled with advanced instrumentation and controls (I&C), increased levels of automation and onboard intelligence all coupled with large-scale hydrogen production present unique operational challenges.

PBMR Conceptual design

Typical Design

Hybrid Controls

Example

Software interface with:
• Cumbersome dialog box
• No discernible exits
• Good shortcuts

Example

Matrix values: 10, 1, 1, 1, 10, 0.1, 1, 1, 1, 0.1

UCC = 0.1 × 2 = 0.2

Second Case Study: Mobile interfaces

Testing Mobile Interfaces

• Lab or field
• Method selection
• Data gathering/analysis
• Problematic area: testing in non-traditional environments

Non Traditional Environments

• Combine and balance different UEMs according to usability/reliability issues
• Combine Lab and Field
• Select UEMs according to development phase

Third Case Study: Mining Accident

Reminder

Events & Causal Factors Analysis (ECFA)

• Provides scenario of events and causal factors that contributed to the accident
  • Chronologically sequential representation
  • Provides overall picture
  • Relation between factors
• Gain overall perspective of
  • Causal factors such as conditions (pressure, temperature…), evolution of system states

Analysing the accident

• Fatal mining accident involving human operators, piping system & control system
  • Decided to switch from North to South
  • Fuel didn’t arrive at plant kilns
  • Bled pipes while motors in operation
  • Motor speed auto-increase due to low pressure
  • Fuel hammer effect
  • Grinder exploded


ECFA Chart of the Accident

Listing of issues and solutions for interactive systems testing

• Hybrid methods (heuristic evaluation refined (prioritisation of heuristics))
• Remote usability testing
• Task analysis + system modelling
• Cognitive walkthrough (as is)

Towards Solutions

Formal models for supporting usability testing

Formal models for incidents and accidents analysis

Usability and human reliability analysis

Usability Heuristics
• Heuristics are key factors that comprise a usable interface (Nielsen & Molich, 1990)
• Useful in identifying usability problems
• Obvious cost savings for developers
• 9 heuristics identified for use in the present study
• In our framework, these usability heuristics are used as “performance shaping factors” to constitute a usability error probability (UEP)

Heuristic Evaluation and HRA

“Standard” heuristic evaluation

HRA-based heuristic evaluation

Heuristic Evaluation Matrix

Steps

• Determine level of heuristic

• Determine product of heuristic multipliers

• Multiply product by nominal error rate

Consequence Determination

• Strict consequence assignment in PRA/HRA, part of cut sets approach
• More molar approach taken in the present study
  • “Likely effect of usability problem on usage”
  • Not literal consequence model
• Results in usability consequence coefficient (UCC)
• Four consequence levels assigned
  • high, medium, low, and none

Usability Consequence Matrix

Steps

• Determine level of usability consequence

• Multiply UEP by consequence Multiplier

• Usability Consequence Coefficient determines priority of fix

Example

Software interface with:
• Cumbersome dialog box
• No discernible exits
• Good shortcuts

Example

Matrix values: 10, 1, 1, 1, 10, 0.1, 1, 1, 1, 0.1

UCC = 0.1 × 2 = 0.2
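The two sets of steps (heuristic evaluation matrix, then usability consequence matrix) can be sketched as a short calculation that reproduces the numbers of the example. Several things here are assumptions rather than statements from the slides: the first nine values of the row are read as the nine heuristic multipliers and the last value (0.1) as the resulting UEP; the nominal error rate of 0.01 is not given in the deck and is chosen only because it makes that reading consistent; the function names are hypothetical.

```python
import math
from typing import Sequence


def usability_error_probability(
    multipliers: Sequence[float], nominal_error_rate: float
) -> float:
    """Steps 2 and 3 of the heuristic evaluation matrix:
    product of the heuristic multipliers times the nominal error rate."""
    return math.prod(multipliers) * nominal_error_rate


def usability_consequence_coefficient(
    uep: float, consequence_multiplier: float
) -> float:
    """Usability consequence matrix: UEP times the consequence multiplier."""
    return uep * consequence_multiplier


# Assumed reading of the example row: nine heuristic multipliers.
multipliers = [10, 1, 1, 1, 10, 0.1, 1, 1, 1]

# Assumption: nominal error rate, not stated in the slides.
NOMINAL_ERROR_RATE = 0.01

uep = usability_error_probability(multipliers, NOMINAL_ERROR_RATE)
ucc = usability_consequence_coefficient(uep, consequence_multiplier=2)
print(f"UEP = {uep:.2f}, UCC = {ucc:.2f}")
```

Under this reading, the two multipliers of 10 would correspond to the two negative findings and the multiplier of 0.1 to the positive one, and the UCC is what ranks the problem against others when deciding what to fix first.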

Listing of issues and solutions for new interaction techniques testing

[Figure: Roadmap on Testing Interactive Systems. Timeline marks: TODAY, 2006, 2009, 2020. Dimensions: Target Applications, Domains (context); Software Engineering Issues, Notations and Tools; User Interface Interaction Technique. Target applications: Automated autonomous Real-Time Systems (VAL, TCAS), Command and Control Systems, Business Applications, Web Applications, Web systems, Mobile phones, Mobile systems, Gaming, All Types of Applications. Notations and tools: B (Atelier B), Z, …; UML, E/R, …; full concurrency, dynamic instantiation, hardware/software, infinite number of states, tool support, advanced analysis techniques. Interaction techniques: No Interaction Technique, WIMP - hierarchical, Direct Manipulation, Multimodal Interaction, Augmented Reality, Tangible User Interface, Embodied UI. End point: No more usability problems? No more bugs?]

Future Plans and Announcements
• Future plans
  • Web site is set up and will be populated (slides, list of attendees, topics, …): http://liihs.irit.fr/palanque/SIGchi2006.html
• Further work
  • IFIP WG 13.5 on Human Error Safety and System Development, [email protected]
  • NoE ResIST (Resilience for IST), www.resist-noe.org
  • Workshop on Testing in Non-Traditional Environments at CHI 2006
  • MAUSE: www.cost-294.org
• Announcements
  • DSVIS 2006, HCI Aero, HESSD next year

Best Achievable Practices for HR: The Human Reliability Design Triptych

Best Practices for Design
• Compliance with applicable standards and best practices documents
  • Where applicable, ANSI, ASME, IEEE, ISO, or other discipline-specific standards and best practices should be followed
• Consideration of system usability and human factors
  • System should be designed according to usability and human factors standards such as NASA-STD-3000, MIL-STD-1472, or ISO
  • Iterative design-test-redesign-retest cycle
• Traceability of design decisions
  • Where decisions have been made that could affect the functions of the system, these decisions should be clearly documented
• Verified reliability of design solutions
  • Reliability of systems should be documented through vendor data, cross-reference to the operational history of similar existing systems, and/or test results
  • It is especially important to project system reliability throughout the system lifecycle, including considerations for maintenance once the system has been deployed
  • It is also important to incorporate the estimated mean time before failure into the estimated life of the system

Best Practices for Testing
• Controlled studies that avoid confounds or experimental artifacts
  • Testing may include hardware reliability testing, human-system interaction usability evaluation, and software debugging
• Use of maximally realistic and representative scenarios, users, and/or conditions
  • Testing scenarios and conditions should reflect the range of actions the system will experience in actual use, including possible worst-case situations
• Use of humans-in-the-loop testing
  • A system that will be used by humans should always be tested by humans
• Use of valid metrics such as statistically significant results for acceptance criteria
  • Where feasible, the metrics should reflect system or user performance across the entire range of expected circumstances
  • In many cases, testing will involve use of a statistical sample evaluated against a pre-defined acceptance (e.g., alpha) level for “passing” the test
• Documented test design, hypothesis, manipulations, metrics, and acceptance criteria
  • Should include the test design, hypothesis (or hypotheses), manipulations, metrics, and acceptance criteria
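As an illustration of evaluating a statistical sample against a pre-defined alpha level, the sketch below uses an exact one-sided binomial test on task completion. The choice of test, the baseline completion rate of 0.7, the sample of 20 participants and the function names are all assumptions made for this example; the slides do not prescribe a particular statistic.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p0: float) -> float:
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(trials, p0).

    Small values are evidence that the true success rate exceeds p0.
    """
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def passes(successes: int, trials: int, p0: float, alpha: float = 0.05) -> bool:
    """Acceptance criterion fixed before the test: reject 'rate <= p0' at level alpha."""
    return binomial_p_value(successes, trials, p0) < alpha


# Hypothetical study: 20 participants, baseline completion rate of 0.7.
BASELINE = 0.7
for completed in (15, 19):
    p = binomial_p_value(completed, 20, BASELINE)
    verdict = "pass" if passes(completed, 20, BASELINE) else "fail"
    print(f"{completed}/20 completed: p = {p:.4f} -> {verdict}")
```

Fixing `p0` and `alpha` before running the sessions is what makes the result an acceptance criterion and not a post-hoc description, which is in line with the requirement to document the hypothesis and the criteria in advance.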


Best Practices for Modeling
• Compliance with applicable standards and best practices documents
  • E.g., NASA NPR 8705.5, Probabilistic Risk Assessment (PRA) Procedures for NASA Programs and Projects, or NRC NUREG-1792, Good Practices for Implementing Human Reliability Analysis
• Use of established modeling techniques
  • It is better to use an existing, vetted method than to make use of novel techniques and methods that have not been established
• Validation of models to available operational data
  • To ensure a realistic modeling representation, models must be baselined to data obtained from empirical testing or actual operational data
  • Such validation increases the veracity of model extrapolations to novel domains
• Completeness of modeling scenarios at the correct level of granularity
  • A thorough task analysis, a review of relevant past operating experience, and a review by subject matter experts help to ensure the completeness of the model
  • The appropriate level of task decomposition or granularity should be determined according to the modeling method’s requirement, the fidelity required to model success and failure outcomes, and specific requirements of the system that is being designed
• Realistic model end states
  • End states should reflect reasonable and realistic outcomes across the range of operating scenarios
