
Scenario-based assessment of non-functional requirements

Andreas Gregoriades, Alistair Sutcliffe, Member, IEEE

Abstract—This paper describes a method and a tool for validating non-functional requirements in

complex socio-technical systems. The System Requirements Analyser (SRA) tool validates system

reliability and operational performance requirements using scenario-based testing. Scenarios are

transformed into sequences of task steps and the reliability of human agents performing tasks with

computerised technology is assessed using Bayesian Belief Network (BN) models. The tool tests

system performance within an envelope of environmental variations and reports the number of tests

that pass a benchmark threshold. The tool diagnoses problematic areas in scenarios representing

pathways through system models, assists in the identification of their causes and supports comparison

of alternative requirements specifications and system designs. It is suitable for testing socio-technical

systems where operational scenarios are sequential and deterministic, in domains where designs are

incrementally modified so set-up costs of the BNs can be defrayed over multiple tests.

Index Terms—Non-Functional Requirements Validation, Scenario-Based Testing, Bayesian Belief Networks,

Systems Engineering

——————————   ——————————

1 INTRODUCTION

Scenarios have attracted considerable interest as a means of validating requirements

specifications [4, 5, 9, 54]. Foundations of scenario-based approaches were laid by

Hsia and Davis [8, 33], and the influential work of Potts [48] who created the

Inquiry Cycle and later the ScenIC [46] method for scenario-based requirements

validation [46, 47, 48, 49]. The potential of scenario-based requirements validation

has also been recognised by Anderson and Durley [1], Zhu and Jin [71], and Haumer

[28].

Scenarios have been applied to the analysis of non-functional requirements (NFRs)

using dependency tables to assess the relationships between different NFRs [43] and

by modelling the dependencies between goals (representing functional requirements

and non-functional requirements, also called soft goals), and the agents and tasks that

achieve them in the i* language [70]. The “satisfying” or fulfilment of soft goals (i.e.

NFRs) by functional requirements is assessed by inspecting strategic dependency and

rationale models that show goals, agents, tasks and dependency relationships [40, 41,


70]. Although i* support tools do provide limited reasoning support for assessing

dependencies, most validation still requires human expertise. The TROPOS [20]

language supports more formal reasoning about i* models; however, it does not

explicitly assess non-functional requirements.

Unlike functional requirements, which can be deterministically validated, NFRs are

soft variables that cannot be implemented directly; instead, they are satisfied [40] by

a combination of functional requirements. Since many NFRs are influenced by human

properties, they inherit the diverse nature of human characteristics: for example,

assessment of NFRs such as system reliability is influenced by human characteristics

such as ability, stress, concentration, etc. Software engineering and systems

engineering requirements validation methods do not take human factors into account,

even though they are a critical cause of systems failure [30, 31, 52].

In our previous work [61] we developed a method and software tool for scenario-

based requirements validation that prompted designers with questions about potential

problems in a scenario event sequence. The tool used a psychology-based taxonomy

of failure causes [32] with a pathway expansion algorithm that generated alternative

paths from a single seed scenario. It supported an inspection-based process with probe

questions about possible problems and generic requirements as cures for the problems

it identified. However, evaluation of this approach showed that too many scenario

variations were generated and the software developers drowned in excessive detail.

To address this problem, we developed a semi-automated approach to requirements

validation [59], by transforming the taxonomy of human and system failure causes

into a model to predict potential errors in a system design. Bayesian Belief Nets

(BNs) provided a probabilistic reasoning mechanism to predict reliabilities, from

models composed of descriptions of system components and attributes of human

operators [21]. However, the output from the BN model was fed into a paper-based

walkthrough for validating scenarios which was still time-consuming. This led to the

motivation for the research we report in this paper: to create a software tool for scenario-based requirements validation that automates as much of the process as

possible.


The paper is organised in seven further sections. BN and uncertainty modelling is

briefly described; this is followed by the methodology and the tool’s architecture; the

NFR assessment follows. A case study analysis of NFR compliance and validation of

system-level components in a military command and control domain is presented in

which the tool is applied; the BN evaluation is explained; and the paper concludes

with a discussion and proposals for future development of our approach.

2 RELATED WORK

The SRA – System Requirements Analyser – tool described in this paper can be

regarded as a form of model checking which takes place early in the system

development life cycle, and uses BNs to reason about properties of system

components rather than more detailed models of system behaviour.

Model-checking techniques have been used extensively to verify and validate

requirements. However, despite the advantages, formal modelling suffers from a

communication problem between user-stakeholders and the model developers [7, 11],

since formal models are difficult to communicate to the stakeholders who set the

requirements in the first place. The software cost reduction (SCR) system used a

tabular notation for specifying requirements dependencies which is relatively easy for

software developers and end users to understand [34]. Tabular representation based on

the underlying SCR state transition formal model provided a precise, unambiguous

basis for communication among developers, coupled with automated analysis of

specifications. The approach hides the logic associated with most formal methods and

adopts a notation that developers find easier to use.

While tabular representations can improve communication of requirements, a

combination of visualisations, examples and simulation are necessary to explain

complex requirements to end users [6]. Scenario-based representations and animated

simulations help users see the implications of system behaviour and thereby improve

requirements validation [22]. Lalioti [36, 37] suggested potential benefits from

animating requirements validation including an interactive and user-friendly

validation environment for stakeholders.


Animation simulation tools integrated with formal model checkers have been

developed by Dubois in the ALBERT II [10] language and associated requirements

validation-animator tool (animator). The language preserves the structure of the

informal requirements expressed by stakeholders and maintains traceability links to

the formalised software requirements document. The animator validates the

requirements based on scenarios proposed by the stakeholders, allowing them to

cooperatively explore different possible behaviours of the future system. A similar

approach has been adopted in the KAOS language and supporting GRAIL tool which

enable formal reasoning about dependencies between goal models, required system

behaviour and obstacles or constraints [66, 67]. Another similar animator-validator

tool, TROLL [23], uses a formal object-oriented language for modelling information

systems, with syntax and consistency checker tools as well as an animator that generates executable prototypes which can be used for requirements validation. As with

SCR and ALBERT II animators, our approach employs a tabular and graphical

representation of results [29] and runs test scenarios against the system model to

identify problems with the requirements specifications.

Scenario-based requirements analysis methods, pioneered by Potts [46, 47, 48],

proposed that obstacles or difficulties which might prevent a goal being achieved

should challenge requirements and hence promote refinement of the requirements

specification to deal with such obstacles. This approach was developed by van

Lamsweerde [65, 67], who applied formal reasoning to requirements specifications to

infer whether goals could or could not be achieved given constraints imposed by

obstacles. Hierarchical goal decomposition produced specifications of the states to be

achieved and the system behaviour required to reach those states, so considerable

problem refinement is necessary before automated reasoning can be applied. These

approaches also assumed that a limited number of scenarios and their inherent

obstacles are tested. This raises the question of test data coverage, i.e. just what is a

sufficient set of scenarios to enable validation to be completed with confidence?

While we believe there is no quick answer to this vexing problem, one approach is to

automate the process as far as possible so more scenarios can be tested.

Methods for requirements validation in safety critical systems have adopted

hierarchical fault trees to represent the space of possible normal and abnormal system


behaviours and their causal conditions (e.g. THERP [64]). While fault trees can be formalised as state machines with temporal logic to reason about potential failures in deterministic systems [27], the influences of human operators and the system environment are generally not modelled. When they are represented, as performance

shaping factors [35], probabilistic modelling has to be used to reason about the

likelihood of failure of system components based on a description of their properties

and factors such as operator stress and fatigue [68].

Intent specifications provide a hierarchical model to facilitate reasoning about system

goals and requirements in safety critical systems [38]. Goals are decomposed in a

means-ends hierarchy, widely practised in requirements engineering [54, 67]. Intent

specification requirements are assessed by inspecting dependencies between

constraints, design principles and system goals to discover conflicts. Automated

support for reasoning about conflicting system states and behaviour is provided by the

SpecTRM-RL tool which uses a tabular format to represent relationships between

threat events and systems states, based on design assumptions and constraints.

However, intent specifications do not support assessment of human error in systems

or dependencies between human operators and user interfaces.

Assessment of non-functional system requirements, such as system reliability, has to

use probabilistic reasoning since the range of potential system behaviours is either

unknown, in the early requirements phase, or too large to specify. Bayesian Nets

(BNs) have been developed to assess software quality from properties of the code and

software engineering process [13, 15, 16, 18, 19], and for system risk analysis and

management [17]. Fenton and Littlewood’s [16] approach predicts the number of

defects in the system. They estimate software reliability using BNs to reason about

quality probabilities based on information gathered during the software development

process, such as the difficulty of the problem, the complexity of the designed solution,

the programmer’s skill, and the design methods employed. Fenton [17, 42] has

developed large BN models to assess risk at the system level, such as the reliability of

system engineering processes for developing ships, vehicles or the operational

reliability of air traffic control systems. This work has also produced methods and

tools for building large BN models to solve complex real world problems and

improved support for use of BN tools by end users. BNs have also been applied to


evaluating the confidence which might be assigned to different combinations of test

strategies in assuring reliable software [72].

In summary, BNs have been widely applied as a probabilistic reasoning technique in

software engineering and other domains; however, previous work used single nets to

evaluate a set of discrete states pertaining to a software product or development

process. In our earlier work we extended the application of BNs for safety analysis in

systems engineering domains using a semi-automated scenario-based approach [21].

We then developed more automated tools for scenario analysis of NFR conformance

for requirements specifications with multiple BN tests [60]. This paper extends that

work to show the development of a more comprehensive tool architecture which can

be configured with different types of BNs to analyse other non-functional

requirements; description of the scenario-based NFR evaluation method with different

modes of using BNs in scenario analysis; and validation studies of the BNs. An extensive case study is reported, using the tool to analyse a requirements specification

for an aircraft weapons loading system for a future aircraft carrier.

3 MODELLING UNCERTAINTY

Because of the uncertain nature of NFRs it is necessary to model them using

modelling techniques such as Bayesian probability, Dempster-Shafer theory, fuzzy

sets or possibility theory. Following Wright and Cai’s [69] review of the advantages

and disadvantages of stochastic reasoning methods, we adopted Bayesian probability.

They argued that Bayesian probability offered easier combination of multiple

influences on probability than Dempster Shafer and a sounder reasoning mechanism

than fuzzy sets. Bayesian probability provides a decision theory of how to act on the

world in an optimal fashion under circumstances of uncertainty. It also offers a

language and calculus for reasoning about the beliefs that can be reasonably held, in

the presence of uncertainty, about future events, on the basis of available evidence

[45]. BNs are useful for inferring the probabilities of future events, on the basis of

observations or other evidence that may have a causal relationship to the event in

question [12, 19].


BNs are directed acyclic graphs of causal influences, where the nodes represent

variables, and the arcs represent (usually causal) relationships between variables [12].

The example in figure 1 shows two influences on agent stress loading: workload and

duty time. Variables can have any number of states in a BN, so the choice of

measurement scale is left to the analyst's discretion. For the illustration we have assigned each of these variables one of two possible states: high or low.

Fig. 1: Fragment of the proposed BN model.

In the above example, if we know that the duty time is high (bad) and the workload is high (bad), then the overall probability of the agent's stress loading being high (i.e. a bad influence on the human agent) will be greater. In the BN we model this by a network probability table (NPT), as shown in table 1.

Table 1: A network probability table for the BN in figure 1.

Duty Time                 High              Low
Workload              High    Low      High    Low
Stress-loading  High    1      0.4      0.6      0
                Low     0      0.6      0.4      1

Column 1 asserts that if the duty time of a human agent is high (bad) and his/her workload is high, then the probability of stress loading being high (bad) is 1, with zero probability of being low. NPTs are configured by estimating the probabilities for the output variables by an exhaustive pairwise combination of the input variables. BNs can accommodate both probabilities based on subjective judgements (elicited from domain experts) and objective data [17]. When the net and NPTs have been completed, Bayes' theorem is used to calculate the probability of each state of each node in the net. The theorem is shown in equation 1:

P(a|b) = (P(b|a) * P(a)) / P(b)   [1]

Where,
P(a|b) = posterior (unknown) probability of a being true given b is true
P(b|a) = prediction term for b given a is true (from the NPT)
P(a) = prior (input) probability of a
P(b) = input probability of b

or, less formally:

Posterior_Probability = (Likelihood * Prior_Probability) / Evidence

Substituting data from the above example, the calculation is as follows. We want to calculate the probability that duty_time is high given that we have observed that the agent has a high workload. The likelihood P(workload = high | duty_time = high) is 0.6, as given in the network probability table. The prior P(duty_time = high) is 0.5, and the evidence, the input probability of the workload being high, is 0.42. This produces the following calculation (equation 2):

P(duty = high | load = high) = (P(load = high | duty = high) * P(duty = high)) / P(load = high)
P(duty = high | load = high) = (0.6 * 0.5) / 0.42 = 0.71   [2]
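To make the hand calculation concrete, the following minimal Java sketch (Java being the language the SRA implementation uses) reproduces it. The class and method names are ours, the NPT values are those of Table 1, and the priors 0.5 and 0.42 are the illustrative figures used in the text; the sketch does not use the HUGIN engine that the tool relies on.

public class BayesExample {

    // Equation 1: P(a|b) = P(b|a) * P(a) / P(b)
    static double posterior(double likelihood, double prior, double evidence) {
        return likelihood * prior / evidence;
    }

    public static void main(String[] args) {
        // NPT from Table 1: P(stress = high | dutyTime, workload)
        // indices: [dutyTime][workload], 0 = high, 1 = low
        double[][] pStressHigh = {
            {1.0, 0.4},   // dutyTime = high
            {0.6, 0.0}    // dutyTime = low
        };

        // Illustrative priors from the worked example
        double[] pDuty = {0.5, 0.5};      // P(duty = high), P(duty = low)
        double[] pWork = {0.42, 0.58};    // P(workload = high), P(workload = low)

        // Forward propagation: P(stress = high) marginalised over both parents
        double pStress = 0.0;
        for (int d = 0; d < 2; d++)
            for (int w = 0; w < 2; w++)
                pStress += pStressHigh[d][w] * pDuty[d] * pWork[w];
        System.out.printf("P(stress = high) = %.3f%n", pStress);

        // Equation 2: P(duty = high | load = high) = 0.6 * 0.5 / 0.42 = 0.71
        System.out.printf("P(duty = high | load = high) = %.2f%n",
                          posterior(0.6, 0.5, 0.42));
    }
}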

Input evidence values are propagated through the network, updating the values of

other nodes. The network predicts the probability of certain variable(s) being in

particular state(s), given the combination(s) of evidence entered. BN models are

extremely computation-intensive; however, recent propagation algorithms exploit

graphical models’ topological properties to reduce computational complexity [45].

These are used in several commercial inference engines such as HUGIN, which we

used. BNs have to conform to a strict hierarchy since cycles lead to recursive and non-terminating propagation of probabilities by the algorithm. This imposes some

compromises in modelling influences, which can be partially overcome by

introducing additional input nodes to model cyclic influences, although this increases

complexity of the network and the control process for the algorithm.

BNs are currently used in many applications to reason about probabilities of

properties given a set of existing (prior) states; however, they do not naturally lend

themselves to a time series analysis. We examined three possibilities. First was serial

evaluation using an extended net which contained an input node that accepted the

result from the previous run. Hence the output reliability from step 1 became an input

prior state for step 2. This approach had the advantage of being able to explicitly


model the interaction between events; for instance, a high probability of failure at step

1 may make completion of step 2 much more difficult. However, input of a posterior

probability into a BN as a prior observation over many events has doubtful validity,

and we were advised by an expert statistician to avoid this approach. The expert’s

argument was that each run should be assumed to be independent, which would not be

the case if we propagated results between runs. The second approach was to combine

the output probabilities from a sequential run; assuming a BN has been used to assess

the probability of failure in a multi-step scenario, how should N probability

judgements be combined into a single value? One possibility was to use the output

probabilities as input into a “summariser net” that combined all the inputs as prior

observations into a single probability, with the net structure organised to group events

into episodes in a scenario. However, this option also faced the same criticism as the

first, namely converting multiple posterior probabilities into input observations. Our

expert advised that sampling runs, assuming they were independent, was possible, but this required the probabilities of sampling particular runs to be set. This introduced a

subjective sampling bias; accordingly we rejected this option as well.

The third option avoided the net combination problem by converting the output

probability into a Boolean variable by judging each step to have succeeded or failed.

The output calculated probability for each event was compared with a user-defined

target value, and if it surpassed the target it was assigned as a “survivor”; otherwise it was counted as a failure and discounted. This option had the advantage of being able to pinpoint

particular steps in scenarios that were posing reliability problems. Furthermore,

sensitivity analyses could be carried out with multiple BN runs for each step by

varying the environmental conditions, and thus producing frequencies of survivors for

a set number of tests at each scenario event. This enabled investigation of the effect of

environmental conditions on design (x) with a set of scenarios (a, b, c) by counting

the number of surviving BN runs per step, having systematically varied all

combinations of the environmental variables from worst case to best case.
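The counting scheme can be sketched as follows. The runReliabilityBN method below is a hypothetical stand-in for the HUGIN-based evaluation that the tool actually performs; the sketch simply enumerates every best/worst combination of the environmental variables and counts, per task step, the runs whose output probability passes the threshold.

import java.util.HashMap;
import java.util.Map;

public class SurvivorCounter {

    /** Placeholder for the HUGIN-based BN evaluation used by the SRA tool. */
    static double runReliabilityBN(String taskStep, boolean[] envBestCase) {
        return Math.random(); // stand-in: returns P(reliable completion of the step)
    }

    /** Counts "survivor" runs per task step over all environmental combinations. */
    static Map<String, Integer> countSurvivors(String[] taskSteps,
                                               int numEnvVars,
                                               double threshold) {
        Map<String, Integer> survivors = new HashMap<>();
        for (String step : taskSteps) {
            int count = 0;
            // Each environmental variable is set to best or worst case (2^n combinations)
            for (int combo = 0; combo < (1 << numEnvVars); combo++) {
                boolean[] env = new boolean[numEnvVars];
                for (int v = 0; v < numEnvVars; v++)
                    env[v] = ((combo >> v) & 1) == 1;   // true = best case
                if (runReliabilityBN(step, env) >= threshold)
                    count++;                            // this run "survives"
            }
            survivors.put(step, count);
        }
        return survivors;
    }
}

With the twelve environmental variables used in the reliability model, each task step is therefore evaluated under 2^12 = 4096 combinations.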

The SRA tool currently has two BNs: one to evaluate reliability and one to evaluate

performance time. Each BN model has variants with different probability distributions

in the NPTs to deal with variations in the degree of automation between tasks. New

BNs can be added to the tool to evaluate a wide range of NFRs.


3.1 BN MODEL OF SYSTEM RELIABILITY

The BN model of system reliability is based on a taxonomy of influencing factors by

Sutcliffe and Rugg [62] and the slips/mistakes distinction from Reason [52], who

drew on earlier work by Norman [44]. Slips are attention-based lapses and omissions

in skilled behaviour, whereas mistakes are failures in plans and hence fit into

Rasmussen’s [51] rule and knowledge levels of processing. The BN model

distinguishes between tasks that involve highly trained skills and are more prone to

slips (e.g. monitoring tasks) and knowledge-intensive tasks, such as analysis and

planning, that are more prone to mistakes.

According to human error theory [41], the system environmental variables have an

indirect influence on an individual’s ability through increasing the fatigue and stress

levels, as reflected in the BN model in figure 2. An individual’s ability, however, has

a direct effect on mistakes. Organisational factors (management culture, incentives)

have a direct effect on individuals’ motivation [39]. Finally, individuals’

characteristics, such as domain and task knowledge, have a direct effect on mistake-

type errors [53]. Slips are mainly influenced by the user interface, the constraints

(time constraints, interruptions) and the individual’s dedication [52]. Tasks of high

cognitive complexity are considered to be more prone to mistake-errors, while tasks

of physical complexity, such as complex manipulations involving precise movements

and detailed co-ordination, are more prone to slip-errors [59].


Fig. 2: BN model for system reliability. Inputs 1-2 relate to the task, 3-6 are technology attributes, 7-10 are human attributes and 11-22 are environmental variables. Appendix A

describes the nodes and summarises the NPT influences from parent to child nodes.

The first two inputs represent judgement of task complexity; for instance, operating

radar is cognitively and physically easy, whereas interpreting an object on the radar is

cognitively more complex (hence set to high). Inputs 3 to 6 describe technical

component properties, which can be taken from historic data on similar equipment, or

estimated. Inputs 7 to 10 are properties of human agents which are taken from training

data and job descriptions. Input values for the agent’s task knowledge, domain

knowledge, motivation and so forth can be measured using aptitude and psychometric

tests. The next six variables model influences on the human operational environment.

These include the short-term effects of time pressure, distractions and workload,

which can be estimated from narrative scenario descriptions, to the longer-term

influences of management culture and incentives. The final six inputs describe aspects

of the system’s operational environment (noise, lighting, comfort, sea state, visibility

and war/peace status). All the inputs are held in databases containing attributes of

human agents, technology components, and tasks. The environmental variables, sub-


divided into human and system operational aspects, can be entered manually to reflect

a particular scenario or systematically varied.

Task complexity can be either cognitive or physical; for instance, operating radar is

cognitively and physically easy, whereas interpreting an object on the radar is

cognitively more complex (hence set to high). Attributes of technical components can

be taken from historic data on similar equipment, or estimated. Inputs from the human

agent are taken from training data, job descriptions or are measured objectively by

using psychological questionnaires. For instance, general ability and

accuracy/concentration can be measured by intelligence aptitude scales, decision

making and judgement by locus of control scales, whilst domain and task knowledge

can be measured by creating simple tests for a specific task/domain. Input nodes in

the human operational environment include the short-term effects of time pressure,

distractions and workload, which can be estimated from narrative scenario

descriptions, to the longer-term influences of management culture and incentives,

which are judged from contextual scenarios. The final input nodes describe aspects of

the system’s operational environment. All the inputs are contained in files that link the

variables to human agents, technology components, task properties or the

environment, sub-divided into human and system operational aspects. The input

variables are all discrete states (best/worst case) which are derived from the measures

detailed in appendix A.

The BN is run with a range of scenarios that stress-test system design against

operational variables. Scenarios can either be taken from domain-specific operational

procedures or by interviewing users, or postulated to cover a variety of organisational

and work situations that may occur in the domain. The BN produces two outputs: slip-type

errors that apply to skilled tasks (recognise, interpret and act), and mistake errors

pertinent to judgement-style tasks (analyse, plan and decide).

3.2 BN FOR OPERATIONAL PERFORMANCE TIME

The topology and components of the BN for performance time assessment are similar

to the Reliability BN since many of the influences on performance and error are the

same. The Operational Performance Time model has a similar causal network to the

Reliability BN, apart from having one output node (operational performance) rather


than two. As with the Reliability BN illustrated in figure 2, the likelihood influences

expressed in the BN model and its NPTs are based on human factors performance

literature [64] (see also appendix A). For example, a poor physical and operational

environment (time on duty and workload) have an adverse influence on the agent’s

stress and fatigue levels which in turn adversely influence the agent’s concentration

[3]. Input variables are either an expert’s assessment of a quality, e.g. information for

decision support provided in a prototype; or a functionally rich and more expensive

design which would have a higher rating for functionality, situation awareness

support, etc. Different levels of automation are reflected in variations in the BNs. For

example highly automated tasks tend to be quicker and more reliable, but this only

applies if the equipment is well designed and maintained. Hence maintenance has

more influence in highly automated tasks than in minimum automation, and this is

reflected in different NPTs based on the equipment types. Similarly, the type of task

(manual, semi-automated) determines the degree of influence of technology.

Whereas the Reliability BN produces probabilities of reliable completion for each

task step, output from the Operational Performance BN is used to increase a best case

task completion time to reflect the less than ideal properties of human and machine

agents. Each task is assigned a best and worst case completion time, obtained from

domain experts. The estimated task completion time is calculated using the following

formula (equation 3):

ET = (P_low * BT) + (P_high * WT)   [3]

Where,
ET = estimated time
P_low = probability of operational performance time being low
BT = best task-completion time
P_high = probability of operational performance time being high
WT = worst task-completion time

Hence, if the probability of low operational performance time is equal to 1 then the probability of high operational performance time will be 0; this results in the best case completion time. On the other hand, if the probability of low operational performance time is 0.57 and the best and worst times are 3 and 10 sec respectively, then the estimated time is (0.57*3) + (0.43*10) = 6.01 sec. If the threshold value is set

at 75% in the range best-worst case, then this is converted into time with the following formula (equation 4):

Th_sec = (Th% / 100) * BT + (1 - Th% / 100) * WT   [4]

Where,
Th_sec = threshold in seconds
Th% = threshold as a percentage value
BT = best task-completion time
WT = worst task-completion time

Therefore, according to the above example (equation 5):

Th_sec = (75 / 100) * 3 + (1 - 75 / 100) * 10 = 4.75 sec   [5]

Hence any task completion time less than 4.75 is acceptable. For each task-step the

system counts the BN runs with task completion times below the threshold.
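A minimal sketch of equations 3 and 4 follows, using the illustrative figures from the text (best time 3 sec, worst time 10 sec, probability of low performance time 0.57, threshold 75%); the class and method names are ours, not part of the SRA tool.

public class PerformanceTime {

    // Equation 3: ET = (P_low * BT) + (P_high * WT)
    static double estimatedTime(double pLow, double bestTime, double worstTime) {
        double pHigh = 1.0 - pLow;
        return pLow * bestTime + pHigh * worstTime;
    }

    // Equation 4: the threshold percentage expressed in seconds
    static double thresholdSeconds(double thresholdPercent, double bestTime, double worstTime) {
        double f = thresholdPercent / 100.0;
        return f * bestTime + (1.0 - f) * worstTime;
    }

    public static void main(String[] args) {
        double et = estimatedTime(0.57, 3, 10);     // 6.01 sec
        double th = thresholdSeconds(75, 3, 10);    // 4.75 sec
        boolean survivor = et <= th;                // false: this run fails the threshold
        System.out.printf("ET = %.2f sec, threshold = %.2f sec, survivor = %b%n",
                          et, th, survivor);
    }
}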

To reflect the case of reverting to manual when an automated technology fails, highly

automated tasks’ worst completion times are generally set much higher than those of

the manual tasks. This is because the human operator has to diagnose the reason for

failure and then substitute the manual version of the task which will not be familiar.

Hence the worst-case time will be longer than the manual task alone. For instance, the

task “Manually load weapons on trolley” requires 120 sec to complete in best-case

situations and 180 sec in the worst case. On the other hand, the same task with

automated technology could be completed ideally in 70 seconds but in 320 sec in the

worst case. If the automated technology fails to load the weapons correctly then

intervention of a human agent is required to discover the reason for the failure and

then correct the misplacement or manually load the weapons.

4 SRA SYSTEM ARCHITECTURE

Analysis starts with the selection of the i* model to be evaluated, and creating the test

scenarios. Scenarios are narratives taken from real life experience describing

operation of similar systems from which event sequences are extracted. This process

is explained in more depth in sections 5 and 6. A scenario editor tool is provided [24]

which allows the analyst to point to task nodes on the i* diagram; the tool then

presents a list of the technology and human agents which may be associated with the


task. The analyst picks the agents from the list to form a task ‘tuple’ consisting of

<human agent, task, technology agent>. Scenarios are built up in this manner by

following task pathways through the i* model, which is illustrated in figure 3. The

analyst specifies the NFR threshold values, then selects the scenarios and system

database. The SRA loads the required information (for the task and agents in the

scenario) from the domain database. Because of differences between semi- and highly

automated tasks, the system evaluates operational performance for each type of task

using slightly different BN models. Nodes that do not apply to the equipment used are

left undefined and therefore have a neutral influence on operational performance. For

instance, tasks that are highly automated are more dependent on maintenance

compared with semi-automated tasks, whereas highly automated equipment is

generally more reliable as long as it is well designed and maintained. These influences

are reflected in the network probability tables of the BN models.

Fig. 3: System model for a navy command and control combat system represented in the i* notation. To simplify the model only human agents are shown. Scenarios trace pathways

through the model from the radar operator to PWO and then to weapons directors – EWD, WDB or WDV – for a response to the threat.

Furthermore, depending on the task type, the SRA assesses system reliability based on

two types of errors, slips and mistakes. Slips are more common in tasks that are

highly skilled or physical in nature, while mistakes occur in tasks that are cognitively


complex or knowledge-intensive, such as planning [50, 51]. For each BN run the tool

assesses the system reliability and compares it against the pre-defined threshold.

Throughout this process the system keeps track of the number of BN runs that pass

the threshold.

In its current form the tool assesses two NFRs, system reliability and operational

performance time. The BN models are used in a plug-and-play architecture that binds

BN models’ input nodes with the System Requirements Analyser (SRA), enabling a

range of NFRs to be tested using the same set of scenarios.

The SRA tool is composed of the following software components (see figure 4):

• The Session Controller implements the user command interface for selecting

designs and scenarios and executes the algorithm that assesses a set of

scenarios with the BNs. It calls the system reliability or operational

performance BN assessors to execute the BN runs with all possible

environmental combinations.

• The i* model editor allows interactive construction of i* models with typical

CASE tool-type functions.

• The Interactive Scenario Constructor produces test scenarios from the system

model based on user directions. Scenarios are stored in a database in an array

of tuples.

• The Model Controller controls the BN models. It selects the appropriate BN

model for each task step, then populates the input nodes, runs the model and

receives the belief distributions of the output nodes. The Model Controller also

manages the back propagation of the BN model to identify required

technology and agent characteristics.

• The BN assessor modules run the net by calling the HUGIN algorithm for

each task step and for each set of environmental variable combinations. The

output from each run is compared with the desired NFR threshold and the

survivor runs are passed to the results visualiser.


Fig. 4: System Requirements Analyser – conceptual architecture and functional components.

• The Visualiser provides a visual summary of all qualified BN runs for a set of

scenarios for one or more system designs. This enables different designs to be

compared and problem areas in the requirements to be identified, i.e.

task/technical component combinations which show low potential NFR

assessments. The Visualiser displays results at three levels: System, Scenario

and Phase views based on our previous visualisation model [24].

The system can be configured with new BNs by creating a new net and NPTs using

the HUGIN tool. The new BN is then added to the Model and Session Controllers by

editing menus to allow selection of the new NFR analysis and adding any rules to the

Model Controller to select between different model sub-types and NPTs according to

task or agent/equipment types. Currently only one NFR can be analysed in a session;

however, several designs and scenarios can be analysed sequentially. The system

automatically aggregates results from lower-level phase views, to the scenario and

then system design level, allowing two or more designs to be compared using the


same set of scenarios. The system was developed in Java using JBuilder 9 (J2EE).

The user interface was implemented using Swing components while the model

controller interfaces with the HUGIN Decision Engine via the provided Java API. The

connection to the database uses JDBC.
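To illustrate the plug-and-play arrangement, the sketch below shows how a model controller might choose a BN variant from the NFR under analysis and the task's degree of automation. The enum values, file-naming scheme and method names are illustrative assumptions; the real tool selects among its configured nets and drives the HUGIN Decision Engine through its Java API.

public class ModelController {

    enum Automation { MANUAL, SEMI_AUTOMATED, HIGHLY_AUTOMATED }
    enum Nfr { RELIABILITY, PERFORMANCE_TIME }

    /** Picks the BN variant (net plus NPT set) for a given NFR and automation level. */
    static String selectModel(Nfr nfr, Automation automation) {
        String base = (nfr == Nfr.RELIABILITY) ? "reliability" : "performance_time";
        // Different NPT sets reflect the degree of automation; for example,
        // maintenance has more influence for highly automated equipment.
        return base + "_" + automation.name().toLowerCase() + ".net";
    }

    public static void main(String[] args) {
        // A semi-automated task step assessed for reliability
        String model = selectModel(Nfr.RELIABILITY, Automation.SEMI_AUTOMATED);
        // A BN assessor (not shown) would load this net, populate its input nodes
        // from the domain database, run HUGIN for every environmental combination
        // and count the survivor runs for the step.
        System.out.println("Selected model: " + model);
    }
}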

5 NFR ANALYSIS METHOD

The process, illustrated in figure 5, starts by creating the system model, using the i*

modelling language, to describe the characteristics of agents, tasks, resources and soft

goals. Soft goals in this case constitute the NFRs under investigation, while resources

are the equipment used by the agent to perform the task. The domain knowledge

necessary for the development of the i* model is elicited from domain experts. NFRs

and their validation criteria are specified in the requirements specification (e.g. system

reliability should be >= 95% for a system design with a set of operational scenarios

1..n).

The next step converts scenarios, which are narrative stories, into a format that can be

executed by the system. This is achieved by extracting task sequences undertaken by

agents from the narrative. For example in the naval domain a missile attack scenario

narrative is “The enemy aircraft launches a missile, which is detected by the ship’s

radar. The Radar Operator (RO) reports a hostile contact, speeds and bearing to the

Tactical Picture Complier (TPC) who estimates the direction and closing time of the

threat and notifies the incoming missile threat to the Principal Weapons Officer

(PWO). PWO decides to jam the missile’s radar using electronic counter-measures

and issues the command to the Electronic Weapons Director (EWD) … [continues]”.

Scenario narratives can contain implicit tasks which are not articulated because they

are tacit or assumed knowledge, therefore we apply generic task patterns [58] to

define the task sequence. In the above example the generic pattern for command and

control consists of five tasks: Monitor (events), Interpret (threat), Analyse (situation),

Plan (response), and Act.

Using the scenario editor with the i* system model, test scenarios are constructed by

selecting the tasks that are explicit and implicit in the scenario narrative, so for the

above example the task sequence from the Monitor by RO to Plan by PWO followed

by Act (EWD) would be selected. Scenarios are composed of a number of phases and


each phase is composed of a number of task-steps, each one modelled as a

<Agent,Task,Technology> tuple.
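The scenario structure can be summarised as a simple data model: a scenario is a list of phases, each phase a list of task steps, and each step an <agent, task, technology> tuple, with the generic command and control pattern supplying implicit task names. The sketch below is illustrative only; the technology agent names and the phase goal are placeholders rather than entries from the domain database.

import java.util.List;

public class ScenarioModel {

    /** Generic command and control task pattern used to fill in implicit tasks. */
    static final String[] C2_PATTERN = {"Monitor", "Interpret", "Analyse", "Plan", "Act"};

    /** One task step: <human agent, task, technology agent>. */
    static class TaskStep {
        final String humanAgent, task, technologyAgent;
        TaskStep(String humanAgent, String task, String technologyAgent) {
            this.humanAgent = humanAgent;
            this.task = task;
            this.technologyAgent = technologyAgent;
        }
    }

    /** A phase groups the task steps that fulfil a higher order goal. */
    static class Phase {
        final String goal;
        final List<TaskStep> steps;
        Phase(String goal, List<TaskStep> steps) {
            this.goal = goal;
            this.steps = steps;
        }
    }

    public static void main(String[] args) {
        // First phase of the missile attack scenario: electronic counter-measures
        Phase ecm = new Phase("Jam missile radar", List.of(
            new TaskStep("RO",  "Monitor",   "Radar"),
            new TaskStep("TPC", "Interpret", "Tactical picture display"),   // placeholder technology
            new TaskStep("PWO", "Plan",      "Command console"),            // placeholder technology
            new TaskStep("EWD", "Act",       "Electronic counter-measures")
        ));
        List<Phase> scenario = List.of(ecm /* followed by manoeuvre, decoy and defensive missile phases */);
        System.out.println("Phases: " + scenario.size()
            + ", steps in phase 1: " + scenario.get(0).steps.size());
    }
}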

Fig. 5: NFR analysis method, processes and tool support. Ellipses denote method steps, ordered in a dependency sequence; boxes show tool components that support the method

step they are connected to.

In the above missile attack example, the narrative has four phases, each one

representing a command and control sequence: first electronic counter-measures are

tried; in the next phase the ship manoeuvres to avoid the threat; then fires decoys; and

finally destroys the hostile missile with a defensive missile. Phases are used to

structure task sequences that fulfil a higher order goal. Scenarios can be interactively

constructed by pointing to tasks on the system model editor display. The tool then

automatically creates a scenario task sequence by tracing the human and machine

agents involved with each task.

The Compare Design step finds the best system design using the system view bar

chart (see figure 6) to investigate the number of surviving runs for each task step.

Trade-offs between NFRs can be assessed by selecting different BN models (e.g.

reliability, performance time) from the Session Controller menu, while designs can be


compared by changing the database, which loads different technology and human

agents that represent a new design, and repeating the process. NFR thresholds can be

set at the user’s discretion so the tool allows the analyst to compare designs and

desired performance in a more flexible manner than if the variables had been hard

coded.

The best design will generally have more surviving BN runs (as defined in section 3);

however, it is also desirable that the design succeeds in all scenario steps. Each bar in

the system view (see figure 6) corresponds to the cumulative number of surviving

runs for each task-step in a scenario phase. The analyst can easily identify the best

design and pinpoint task steps with low NFR satisfaction rates by focusing on low

scores on the bar chart. Moving the cursor on top of any bar reveals the total number

of surviving runs for the task-step.

The bar chart identifies poorly performing task steps, which can be cross-referenced

to the human and machine agents involved. Right-clicking on any bar reveals the components involved. The domain database can then be queried to find the input

variables. The domain database has an annotation field so the analyst can record

reasons for settings, and refer to these when improvement may have to be made. The

BN models have a limited explanation facility of pop-up tool tips that summarise the

NPT influences (see appendix A) for each parent-child node combination. This

information is then used in the Identify Improvement step. Further advice on generic

requirements for technology to support particular tasks, and improving human

operation, is given in a related part of the toolset which we have described elsewhere

[57].

The best design also needs to be resilient to environmental conditions. This analysis is

supported by the results visualiser in the Assess Environment step. The results

visualiser uses colour coding to identify variables which adversely affect system risks

over a range of scenario steps. In the phase view the influences of environmental

variables on survivor runs are collated into a matrix (figure 6). Columns correspond to

the twelve environmental variables, and rows report the percentage scores that passed

the threshold. The impact of each environmental variable is calculated using equation 6:

IEP(x) = ( Q_EP(x)best / Q_EP(x)All ) * 100   [6]

Where,
Q_EP(x)best = survivor runs with environmental variable (x) set to best case
Q_EP(x)All = total survivor runs for all settings
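Equation 6 amounts to the proportion of surviving runs in which variable x was at its best-case setting. A minimal sketch, with illustrative parameter names:

public class EnvironmentalImpact {

    /**
     * Equation 6: IEP(x) = survivors with variable x at best case
     *                      / total survivors over all settings * 100.
     *
     * @param survivorEnvSettings for each surviving run, the setting of every
     *                            environmental variable (true = best case)
     * @param x                   index of the environmental variable of interest
     */
    static double impact(boolean[][] survivorEnvSettings, int x) {
        int total = survivorEnvSettings.length;
        if (total == 0) return 0.0;
        int bestCase = 0;
        for (boolean[] run : survivorEnvSettings)
            if (run[x]) bestCase++;
        return 100.0 * bestCase / total;
    }
}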

The matrix’s colour coding denotes the level of importance of each parameter;

“green” designates a low risk parameter since it has been assigned to “worst-case”

most of the time. On the other hand, “red” denotes a high risk parameter, due to the high percentage of runs with “best-case” settings. Since the environmental variables which

were set to worst case did not degrade the NFR level below the threshold, if they are

set to best case they can only have a positive effect on the NFR. Conversely, with

variables that were set to best case during the NFR assessment, if set to worst case

they will decrease the NFR so it fails to pass the threshold level, therefore they are

indicated as a risk.

Fig. 6: System visualisation showing the system and phase view of the operational

performance assessment. The Incentives column (1) is worst case (coloured red in display), whereas the Light column (2) is better than average (yellow) and other columns are average

(orange). In this run no best-case (green) runs survived.


In the Identify Improvements step, if an overall design or a particular task step fails to

meet the desired NFR threshold then the back propagation analysis is used to set the

desired NFR value and the BN is back-propagated to discover the necessary settings

of agent or environmental variables to achieve the NFR value. Back propagation can

be used in two modes: all input nodes unconstrained, in which case the BN calculates

the input values required to achieve the user-determined output NFR; or one/few input

nodes unconstrained, in which case the BN calculates the values for these nodes given

settings for the constrained nodes. Back-propagation is usually hypothesis-driven to

focus on where design improvement could be made, so many variables are left with

their original settings, with a few nodes left unconstrained.

The results from the back propagation are compared with the properties of the original

component in order to identify the level of improvement required. For instance, if the

usability of the radar is set to 0.65 (actual) in the database and the assessed usability

from the back propagation is 0.83 (estimated) to achieve the desired NFR for

reliability of 0.85, then the required level of improvement is 0.18, i.e. 0.83 minus

0.65.
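The level of improvement is simply the difference between the back-propagated estimate and the value currently recorded in the domain database; a trivial sketch with illustrative names:

public class ImprovementGap {

    /** Level of improvement needed for one unconstrained node, e.g. 0.83 - 0.65 = 0.18. */
    static double requiredImprovement(double actual, double backPropagatedEstimate) {
        return backPropagatedEstimate - actual;
    }

    public static void main(String[] args) {
        System.out.printf("Radar usability improvement: %.2f%n",
                          requiredImprovement(0.65, 0.83));
    }
}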

Figure 7 depicts the back propagation of the Operational Performance model using an

input set of environmental variables, the agent properties and the required NFR values

defined by the requirement specifications. The monitor windows on top of system

environment, human agent and NFR notes show the input variables. The monitor

windows on top of technology influences depict the distribution of the output nodes.


Fig. 7: Back propagating the BN to identify the cause of the NFR effect in terms of technology characteristics (influence of each one). A sub-set of the Operational Time performance net is

illustrated.

6 CASE STUDY

This case study describes the application of the SRA tool in validating the operational

performance and system reliability of a complex socio-technical system. The

requirements question is to assess the impact of new automated technology on the

task of loading weapons on to aircraft in an aircraft carrier. A description of the

human roles used in the following scenario is provided in table 2 and the technology

components are listed in appendix B.

A request for an air mission arrives in the control room from Carrier Group Strategic

Command. The mix of weapons/fuel tanks/electronic counter-measures pods, etc. is

planned according to the mission type and aircraft assigned to the mission. The Air

Planning Officer (APO) plans the required weapons load and schedules the loading

with the Deputy Air Planning Officer (DAPO). The load plan is communicated to the

Magazine Weapons Supervisor (MWS). The MWS plans the retrieval of weapons


from the magazine and the Magazine Artificer (MA) retrieves the weapons and places

them on a trolley. The trolley is placed on the hoist which lifts it to the flight deck.

The trolley is then moved by the Weapons Artificer (WA) to the specified aircraft.

The Weapons Team Supervisor (WTS) is responsible for organising the WA teams. A

number of checks are performed by the Weapons Loading Controller (WLC) prior to

the loading of the weapons, e.g. check that the aircraft is properly grounded and

engine power is set to off; visually inspect the wing rack to ensure safety pins are

placed and the rack is locked; verify that all cockpit armament selectors are in the off

or safe position. On completion of safety checks the WA positions the trolley under

the aircraft wing, orients the trolley under the desired rack, lifts into position and

attaches the weapons. The trolley has a pneumatic pump to hoist the weapon up to the

wing; however, the final load-and-secure is manual and requires two or more WAs

depending on weapon weight. The process is repeated for the rest of the weapons. On

completion of the loading process the WLC tests the connections between the

weapons and the rack, then the WA removes the trolley. Finally the WLC inspects the

weapons before arming them and reporting completion to the Flight Deck Supervisor.

The process is usually carried out concurrently with two teams, one per aircraft wing.

Table 2. Description of the agent roles.

Roles   Description
APO: Air Planning Officer is responsible for the planning of the weapons load according to mission requirements
DAPO: Deputy Air Planning Officer is accountable to the APO. Responsible for the planning of weapons load and communicating the plan to the magazine
MWS: Magazine Weapons Supervisor is responsible for the effective management of the MAs and the planning of the weapons retrieval
MA: Magazine Artificer is responsible for the retrieval of weapons from the magazine and loading on the transportation equipment
WTS: Weapons Team Supervisor is responsible for the effective management of the weapons loading team
WA: Weapons Artificer is responsible for handling weapon systems on the flight deck and elsewhere
WLC: Weapons Loading Controller manages the flight deck weapon loading process

The scenario task-steps and components used for two prospective designs are shown

in appendix B. Tasks in Design 1 are manual or semi-automated, while in Design 2

they are semi- or fully automated; for instance, the task “Transfer weapons to aircraft”

becomes specialised into “Move trolley to aircraft” and “Drive autoload palette to


aircraft”. The autoload palette has image sensors to detect the correct position on the

aircraft wing and knowledge of the aircraft and weapon type, so it can automatically

hoist and connect the weapons. The second design saves manpower since it can be

operated by one WA, and is potentially more rapid to operate, but it is more

expensive. The systems engineer needs to compare the two designs with a sensitivity

analysis to test different assumptions.

The analyst can easily pinpoint the more reliable design by focusing on the

comparison in the system view. Overall most of the tasks were more reliable in

Design 2 (advanced technology) at the rear of the bar chart in figure 6; however, tasks

“Schedule load” and “Report task completion” had more survivors and hence better

reliability in Design 1. Also both designs had poor reliability for “Move trolley to

aircraft” and the following checking tasks, so these are critical tasks that warrant

further attention. The two designs have equal and acceptable reliability for the Load

Planning task even though Design 2 was automated. Inspection of the agents’

properties and the BN tables shows that the information accuracy and maintenance

technology properties were set to poor because the planning system was a new

prototype, hence the improvement from automation was small. The poor reliability of

“Move trolley to aircraft” in both designs is a consequence of the effect of

environmental variables on human operation. This can be seen in the phase view in

figure 6 which shows that this task and load planning both suffer from adverse

environmental influences. Moving the trolley is primarily a manual task, so the

system selects the NPT tables which minimise the influence of the technology

component; in the Design 2 autoload palette, poor maintenance settings for new

technology reduce the advantage of automation. The adverse environmental

influences on human and machine agents are present for both designs, reflecting the

experience that manoeuvring equipment on a pitching aircraft carrier deck (sea

variable setting) is prone to error. Similarly the subsequent four checking tasks are all

manual and exposed to reliability influences from motivation (slips when not paying

attention) and interruptions in a busy flight deck environment (concurrency variable).

Solutions require human factors knowledge, which might suggest double checks to

improve reliability or improved design to support checking by augmented reality

display of reminders, location of objects to check, etc.


Fig. 8: Task completion time for each task in both designs. The lower part of the bar is the best case time; the upper part is the estimated time taking agent and environment variables

into account.

When the operational performance times are compared (see lower bars at the rear of

figure 8), Design 2 is quicker for nearly all tasks, which is not surprising since it has

more automated tasks. The projected increase from the best case task completion

times for Design 1 reflects the effect of the same variables that also caused poor

reliability.

Completion times for Plan and Schedule load tasks are long for both designs, which

might seem strange since Design 2 partially automated both tasks. However, best case

time even after automation is still long, since human checking is necessary to verify

automated decision making. The projected actual times reflect the poor reliability of

both designs, which can be traced to poor rating of information provided by the

technology, reflecting uncertainty under operational conditions. Most tasks have more

rapid best-case and estimated times in Design 2 because automated processes are

quicker and the time advantage is not changed by the effect of poor reliability in some

tasks, e.g. Planning, Scheduling, and Move trolley to aircraft.
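Purely as an illustration of how a best-case time can be inflated by agent and environmental effects, the following sketch scales the best-case duration by an expected rework overhead derived from a predicted error probability; the scaling rule and the numbers are assumptions, not the SRA's actual performance-time model.

def estimated_time(best_case_minutes: float, p_error: float,
                   rework_factor: float = 0.5) -> float:
    """Best-case time inflated by an assumed rework overhead for likely errors."""
    return best_case_minutes * (1.0 + rework_factor * p_error)

# Example: a 10-minute best-case "Schedule load" step with a 0.4 error
# probability under adverse conditions gives a projected time of 12 minutes.
print(estimated_time(10.0, 0.4))   # -> 12.0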

The next step is to consider the critical environmental variables for both designs,

illustrated in figure 9. Figure 9a shows that incentives, motivation, duty time,
concurrency, and time constraints were all marked as vulnerable for Design 1. Design

2 (figure 9b) in contrast fares better with only motivation, concurrency and

maintenance marked as vulnerable. Maintenance becomes a concern for the second,

more highly automated design and this reflects the NPTs selected for different levels

of automation. Cures, as before, require human factors knowledge; however, some suggestions which can be found in the system database are to increase motivation by improving crew morale, or to provide incentives for these roles. Concurrency is difficult to


cure since so many tasks are prone to interruptions, while the effect of maintenance

depends on the system engineer’s judgement about the effectiveness of planned

maintenance. The tool's role is to point out the problem, which can be cured by changed procedures and management decisions such as increased investment in low-maintenance equipment.

Fig. 9(a): Environmental influences for Design 1. The arrow points to the critical task. Red (darker

shading) indicates adverse environmental variables.

Fig. 9(b): Environmental influences for Design 2. The arrow points to the critical task.

After identifying the most appropriate design, the problematic tasks and the critical

environmental variables, the analyst investigates the improvements required for the

Autoload palette component, which was the weakest link in Design 2. Using the back-

propagation facility, the minimum acceptable reliability is set in the output node, and

the nodes where design or operational environmental changes can be made are left

unconstrained.


Fig. 10: Suggested improvements to the tuple components for Design 2. The circled cells correspond to the required improvements for the generic task "Drive autoload palette to aircraft". Dark-filled cells represent properties that are not applicable to the component.

In this case, equipment maintenance (already identified as a vulnerability) and the

human operator’s experience (the only way to overcome difficult carrier deck

operations) are selected. The BN shows that maintenance needs to be improved by

50%, and the operator's experience by 26% (see figure 10). Translating these into specific

needs requires domain expertise; however, the tool does quantify the degree of

improvement and this can be empirically tested by setting targets in a prototype

system.
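As an illustration of how such back-propagation could be realised, the sketch below uses a hypothetical two-input fragment of a reliability BN and the pgmpy library; the node names, NPT values and the library choice are assumptions, not the SRA's actual models or implementation. It fixes the output node to the required state and reads off the posterior distributions of the unconstrained input nodes.

# Minimal sketch of BN back-propagation on an assumed two-input fragment
# (Maintenance, Experience -> Reliability); NPT values are illustrative only.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("Maintenance", "Reliability"),
                         ("Experience", "Reliability")])

cpd_maint = TabularCPD("Maintenance", 2, [[0.5], [0.5]],
                       state_names={"Maintenance": ["poor", "good"]})
cpd_exp = TabularCPD("Experience", 2, [[0.5], [0.5]],
                     state_names={"Experience": ["low", "high"]})
cpd_rel = TabularCPD(
    "Reliability", 2,
    [[0.9, 0.6, 0.5, 0.1],   # P(fail | Maintenance, Experience)
     [0.1, 0.4, 0.5, 0.9]],  # P(survive | Maintenance, Experience)
    evidence=["Maintenance", "Experience"], evidence_card=[2, 2],
    state_names={"Reliability": ["fail", "survive"],
                 "Maintenance": ["poor", "good"],
                 "Experience": ["low", "high"]})
model.add_cpds(cpd_maint, cpd_exp, cpd_rel)

# "Back-propagation": condition the output node on the acceptable state and
# inspect the posterior over the unconstrained input nodes.
infer = VariableElimination(model)
for node in ["Maintenance", "Experience"]:
    print(infer.query([node], evidence={"Reliability": "survive"}))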

7 VALIDATING THE BN MODELS

We used data mining techniques to test whether the assumptions embedded in the BN models matched the expected influences elicited from domain experts and theory. We simulated all possible permutations of the input model variables and created a database of reliability and performance time predictions for these runs. This produced an extensive set of test data; for example, for one scenario composed of four phases with six task steps in each phase, the tool generated 4 × 6 × 3¹² records. The BN model's

NPT and the causal influences were analysed with the following data mining

techniques: relevance analysis, association rules and classification [25]. Relevance

analysis ranks input parameters of the model based on their relevance to one of the

model’s output parameters (e.g. reliability in our BN). Association rules describe how

often two or more facts co-occur in a data set and were employed to check the causal

associations in our model. Classification partitions large quantities of data into sets

with common characteristics and properties and was used to provide a further check

on the structure of the BN models.
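A minimal sketch of how such a test database could be generated is given below; the three-state variable grid, the predict_reliability stub, the pass/fail threshold and the CSV layout are illustrative assumptions, not the tool's actual implementation.

# Sketch: enumerate all permutations of discretised input variables and
# record a prediction for each one (stand-in for the BN inference call).
import csv
import itertools

VARIABLES = ["sea_state", "workload", "duty_time"]   # illustrative subset of input nodes
STATES = ["good", "moderate", "poor"]                # assumed three-state discretisation
THRESHOLD = 0.7                                      # assumed pass/fail benchmark

def predict_reliability(assignment: dict) -> float:
    """Stand-in for the BN inference call; returns P(task step survives)."""
    return max(0.05, 0.95 - 0.2 * sum(state == "poor" for state in assignment.values()))

with open("bn_test_records.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(VARIABLES + ["reliability", "survived"])
    for combo in itertools.product(STATES, repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, combo))
        reliability = predict_reliability(assignment)
        writer.writerow(list(combo) +
                        [reliability, "Pass" if reliability >= THRESHOLD else "Fail"])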

The initial assumptions made about influences on system reliability and operational

performance were mainly satisfied. However, the relevance analysis revealed that sea

state had only a minor influence on system error, although according to domain


experts, it is a major influence on human error. Several intermediate nodes had diluted

the influence of sea state on system error nodes so it was necessary to alter the BN

causal diagram. The two BN models for assessing operational performance with

different levels of automation showed a similar influence of maintenance on

operational performance, which should not be the case. These inaccuracies were

addressed by altering the BN’s NPTs to increase the prior probability influence for

poor maintenance on automated tasks.

Association analysis identified two rules with high significance levels that were not

explicitly defined in the model:

IF (DutyTime = High) THEN (Survived = Fail)

IF (Workload = High) THEN (Survived = Fail).

These rules indicated that the causal influences of "Duty Time" and "Workload" were
higher than those that had been specified in the BN by the domain experts.

In order to overcome this problem we altered the NPT settings to reduce the

weighting of these nodes and increase the influence of the "Distractions" node, which
appeared weak. Finally, classification analysis pinpointed problems with the crew
motivation and agent ability nodes, which suggested changes to the BN model

structure.
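As an illustration only, the following sketch computes the support and confidence of the first rule over the illustrative records generated above; the file name, column names and state labels ("poor" standing in for "High") are assumptions, not the data mining procedure actually used.

# Sketch: check an association rule of the form
#   IF (DutyTime = High) THEN (Survived = Fail)
# against the illustrative simulated records generated earlier.
import pandas as pd

records = pd.read_csv("bn_test_records.csv")

antecedent = records["duty_time"] == "poor"    # stands in for DutyTime = High
consequent = records["survived"] == "Fail"

support = (antecedent & consequent).mean()
confidence = (antecedent & consequent).sum() / antecedent.sum()
print(f"support={support:.3f}, confidence={confidence:.3f}")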

8 DISCUSSION AND CONCLUSIONS

The main contribution of this research has been to develop automated testing of

requirements specifications and designs for conformance to non-functional

requirements using a set of scenarios and variations in the system environment. This

is a considerable advance over existing tools which support validation of NFRs by

inspection of models [41]. Our automated scenario-based testing tool explicitly

considers environmental influences, and provides visualisations for pinpointing

problematic tasks and components within a design and scenario sequence. The

technology is applicable to problems where requirements are expressed as properties

of components, such as the human and machine agents in our system engineering

domain. However, the configuration costs of the BNs will limit the cost effectiveness

of the technology for new green-field requirements engineering problems; on the

other hand it should pay back in brown-field domains where designs are incrementally

refined, and the set-up costs can be amortised over many generations of testing.


More generally the SRA could be applied to any class of component-based problems

where the selection of components needs to be optimised against non-functional requirement criteria. The architecture is modular and scalable, allowing new

NFRs to be investigated by “plugging in” the appropriate BN. Our work presents a

new view on component-based model-checking using BNs which could, in principle,

be applied to model-checking requirements at lower levels of granularity, such as

black-box software component configuration. The BN approach could apply to any

domain where requirements attributes can be synthesised into a predictive model of

performance, effectiveness, or other non-functional requirements. It can be applied to

problems that can be described by a set of sequential tasks, for instance checking

workflow systems expressed as sequential tasks/functions undertaken by a

collaboration between human and software agents.

The SRA tool was a development from our previous BN requirements analyser [26],

and has partially addressed the difficult problem of scenario-based testing [4, 63].

Although there is no substitute for domain expertise in generating or acquiring

scenarios, our approach can amplify scenario-based validation by systematically

testing a set of assumptions that are implicit within scenarios. This enables areas of

concern to be pinpointed, as well as enabling trade-off analysis between alternative

designs. However, the fidelity of testing depends on the accuracy and sophistication

of the BN models. There is no quick solution to validating complex models of human

error and environmental influences on system failure since exhaustive experiments on

complex systems can never be complete; incorporating human factors into assessment

of systems or user interfaces has to rely on models constructed from theory and

domain expertise [30, 35, 53]. We have followed both approaches in constructing BN

models.

The SRA tool is aimed at requirements investigation in complex socio-technical

systems, and hence it complements model-checking tools which are more appropriate

to later stages in development when specifications of agent behaviour are available,

e.g. SpecTRM-RL [38], KAOS-GRAIL [66, 67]. Other scenario-based requirements

analysis tools such as ARTSCENE [56] help to automatically generate scenario

variations by pathway expansion algorithms that trace normal and

alternative/exception paths through use cases, but no validation support is provided


beyond suggestions for generic requirements which may be applicable to different

scenario events.

The use of BNs by Fenton et al. in their work on software metrics and risk analysis

[12, 15, 18] is closely related to our approach. However, they employed BNs to assess

the quality of software systems based on the properties of system specifications,

development process and code. Their use of BNs assumes a static view whereas we

have extended Bayesian tests for a dynamic view in operational scenarios by

introducing the notion of test survivors to avoid the problems of Bayesian reasoning

over multiple sequence states. They do not consider operational testing with scenarios.

In the JSIMP tool Fenton and Cates [14] provide predictions of project failures based

on BN analysis of project management practices. Users enter scenario information via

a questionnaire interface and obtain probability distributions of unknown variables

using the back-propagation facilities of BNs, also incorporated within our tool.

Although the JSIMP tool has an end-user interface that hides the complexities of the BN
from the user, it does not include visualisation facilities comparable with those of
our SRA tool, which allows the analyst to compare multiple model assessments over a

variety of scenario sequences and environmental conditions.

There is no shortage of scenario-based tools for requirements validation and

verification; however, all these tools use more detailed specifications of system

behaviour which will not exist in the early stages of the requirements process or

domains with black-box component-based design. For instance, Ryser and Glinz [55]

convert natural language scenarios into statecharts which in turn are used to generate

test cases for system validation. In common with our tool, the scenario

conversion process is manual and labour intensive, so one future direction in our work

will be to investigate information extraction tools [57] which may be able to partially

automate generation of scenario event sequences from text-based narratives. Like the

ARTSCENE environment, the SCENT method [55] only provides automated

derivation of possible test cases, and no assistance in validation of requirements

specifications. Zhu and Jin [71] also used formalised scenarios for validating

requirements based on the principles of activity lists [2] but did not provide any

validation for non-functional requirements.


Although our approach has delivered an analysis tool for investigating system

requirements, there are some limitations in its applicability. First we make the

assumption of single-threaded tasks. While this is true for highly trained military

domains in event-driven scenarios, it will not be the case in domains where

opportunistic behaviour is the norm. Another simplification is that we do not model

concurrency and communication in our scenarios. Since our scenarios are single-

threaded, concurrency is not a severe problem; furthermore, we argue that the SRA

tool uses approximate models so its value lies not in diagnosis of a completely

realistic task model but rather in comparative assessment of two (or more) different

designs using the same set of scenarios and analysis approach. Given these

limitations, the SRA provides a reasonable trade-off between modelling effort and

diagnostic power. However, in our ongoing research we are investigating concurrent

scenarios and communication within the BN analysis.

REFERENCES

[1] J. S. Anderson and B. Durley, “Using scenarios in deficiency-driven requirements

engineering,” presented at Requirements Engineering RE'93, 1993.

[2] J. S. Annett and K. D. Duncan, “Task analysis and training design,” Occupational

Psychology, vol. 41, pp. 211-221, 1967.

[3] R. W. Bailey, Human Performance Engineering: A Guide for System Designers. Englewood

Cliffs NJ: Prentice Hall, 1982.

[4] J. M. Carroll, Scenario-based design: Envisioning work and technology in system

development. New York.: Wiley, 1995.

[5] J. M. Carroll, M. B. Rosson, G. Chin, and J. Koenemann, “Requirements development in

scenario-based design,” IEEE Transactions on Software Engineering, vol. 24, pp. 1156 -

1170, 1998.

[6] K. Casey and C. Exton, “A Java 3D Implementation of a Geon Based Visualization tool for

UML,” presented at PPPJ, Kilkenny, Ireland, 2003.

[7] S. J. Cunning, “Test scenario generation from structural requirements specification,”

presented at Symposium on Engineering of Computer-Based Systems (ECBS '99), Nashville,

TN, USA, 1999.

[8] A. Davis and P. Hsia, “Giving voice to requirements engineering,” IEEE Software, vol. 11,

pp. 12-16, 1994.

[9] J. C. S. do Prado Leite and L. M. Cysneiros, “Nonfunctional Requirements: From Elicitation

to Conceptual Models,” IEEE Transactions on Software Engineering, vol. 30, pp. 328-350,

2004.


[10] P. Dubois, E. Dubois, and J. Zeippen, “On the Use of a Formal Representation,” presented at

3rd IEEE International Symposium on Requirements Engineering, Los Alamitos CA, 1997.

[11] G. Engels, “Model-Based Verification and Validation of properties,” Electronic Notes in

Theoretical Computer Science, vol. 82, 2003.

[12] N. Fenton, “Applying Bayesian belief networks to critical systems assessment,” Critical

Systems, vol. 8, pp. 10-13, 1999.

[13] N. Fenton, “A critique of software defect prediction models,” IEEE Transactions on Software

Engineering, vol. 25, pp. 675-689, 1999.

[14] N. Fenton and P. Cates, “JSIMP: BN model and tool for the SIMP project,” Queen Mary

(University of London), London 30 July 2003.

[15] N. Fenton, P. Krause, and M. Neil, “Software Measurement: Uncertainty and Causal

Modeling,” IEEE Software, vol. 10, pp. 116-122, 2002.

[16] N. Fenton and B. Littlewood, Software reliability and metrics: Elsevier, 1991.

[17] N. Fenton and N. Maiden, “Making Decisions: Using BNs and MCDA.” London.: Computer

Science Dept, Queen Mary and Westfield College, 2000.

[18] N. Fenton and M. Neil, “Software metrics: successes, failures and new directions,” Journal of

Systems Software, 2000.

[19] N. Fenton and S. L. Pfleeger, Software Metrics: A Rigorous Approach. London: International

Thomson Computer Press, 1997.

[20] A. Fuxman, M. Pistore, J. Mylopoulos, and P. Traverso, “Model Checking Early

Requirements Specifications in Tropos,” presented at International Symposium on

Requirements Engineering 01, Toronto, Canada, 2001.

[21] J. Galliers, S. Sutcliffe, and S. Minocha, “An impact analysis method for safety-critical user

interface design,” IEEE Transactions on Software Engineering, vol. 6, pp. 341-369, 1999.

[22] A. Gemino, “Empirical comparison of animation and narration in requirements validation,”

Requirements Engineering, vol. 9, pp. 153-168, 2003.

[23] A. Grau and M. Kowsari, “A validation system for object-oriented specifications of

information systems,” presented at 1st East European symposium on advances in databases

and information systems (ADBIS '97), St Petersburg, 1997.

[24] A. Gregoriades, J. E. Shin, and A. G. Sutcliffe. “Human-centred requirements engineering”. In

Proceedings: RE 04, Kyoto Japan,. Los Alamitos CA: IEEE Computer Society Press, pp154-

164, 2004.

[25] A. Gregoriades, A. G. Sutcliffe, and H. Karanikas, “Evaluation of the SRA Tool Using Data

Mining Techniques,” presented at CAiSE 2003, Klagenfurt/Velden, Austria, 2003.

[26] A. Gregoriades, A. G. Sutcliffe, and J. E. Shin, “Assessing the Reliability of Socio-technical

Systems,” presented at 12th Annual Symposium INCOSE, Las Vegas, USA, 2002.

[27] K. M. Hansen, A. P. Ravn, and V. Stavridou, “From safety analysis to software requirements,”

IEEE Transactions on Software Engineering, vol. 24, pp. 573 - 584, 1998.

[28] P. Haumer, K. Pohl, and K. Weidenhaupt, “Requirements elicitation and validation with real

world scenes,” IEEE Transactions on Software Engineering, vol. 24, pp. 1036-1054, 1998.


[29] C. Heitmeyer, J. Kirby, and B. Labaw, “Applying the SCR requirements method to a weapons

control panel: An experience report,” presented at FMSP 98, Clearwater Beach, Florida, USA,

1998.

[30] E. Hollnagel, Cognitive Reliability & Error Analysis Method: Elsevier Science, 1998.

[31] E. Hollnagel, Human Reliability Analysis Context and Control. New York: Academic Press,

1993.

[32] E. Hollnagel, “The phenotype of erroneous actions: Implications for HCI design,” in Human-

computer Interaction and complex systems, G. Weir and J. Alty, Eds. London: Academic

Press, 1990.

[33] P. Hsia, A. Davis, and D. Kung, “Status Report: Requirements engineering,” IEEE Software,

vol. 10, pp. 75-79, 1993.

[34] R. Jeffords and C. Heitmeyer, “A strategy for efficient verifying requirements specification

using composition and invariants,” presented at ESEC/FSE 03, Helsinki, Finland, 2003.

[35] B. I. Kirwan, A Guide to Practical Human Reliability Assessment. London: Taylor and

Francis, 1994.

[36] V. Lalioti, “Animation for validation of business system specifications,” presented at Hawaii

International Conference on System Sciences 30, The dynamics of business systems

engineering, Wailea, Hawaii, January 1997, pp 7-10, 1997.

[37] V. Lalioti and P. Loucopoulos, “Visualisation of conceptual specifications.,” Information

Systems, vol. 19, pp. 291-309, 1994.

[38] N. G. Leveson, “Intent specifications: an approach to building human-centered

specifications,” IEEE Transactions on Software Engineering, vol. 26, pp. 15 - 35, 2000.

[39] N. G. Leveson, Safeware: System Safety and Computers. Reading, MA.: Addison Wesley,

1995.

[40] J. Mylopoulos, L. Chung, and B. Nixon, “Representing and using non-functional

requirements: A process oriented approach,” IEEE Transactions on Software Engineering,

vol. 18, pp. 483-497, 1992.

[41] J. Mylopoulos, L. Chung, and E. Yu, “From Object-Oriented to Goal-Oriented Requirements

Analysis,” Communications of the ACM, vol. 42, pp. 1-7, 1999.

[42] M. Neil, N. Fenton, and L. Nielsen, “Building large-scale Bayesian Networks,” The

Knowledge Engineering Review, vol. 15, pp. 257-284, 2000.

[43] B. Nixon, “Management of performance requirements for information systems,” IEEE

Transactions on Software Engineering, vol. 26, pp. 1122 - 1146, 2000.

[44] D. Norman, The psychology of everyday things. New York: MIT Press, 1988.

[45] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.

San Francisco: Morgan Kaufmann, 1988.

[46] C. Potts, “ ScenIC: A strategy for inquiry-driven requirements determination,” presented at

RE'99: International Symposium on Requirements Engineering, Limerick, Ireland, 1999.

[47] C. Potts and A. Anton, “A Representational Framework for Scenarios of System Use,”

Requirements Engineering, vol. 3, pp. 219-241, 1998.


[48] C. Potts, K. Takahashi, and A. Anton, “Inquiry-Based Requirements Analysis,” IEEE

Software, vol. 11, pp. 21-32, 1994.

[49] C. Potts, K. Takahashi, J. Smith, and K. Ota, “An Evaluation of Inquiry-Based Requirements

Analysis for an Internet Service,” presented at Second International Symposium on

Requirements Engineering, York, UK, 1995.

[50] J. Rasmussen, “Human Error and the Problem of Causality in Analysis of Accidents,”

Philosophical Transactions of the Royal Society of London Series B - Biological Sciences, vol.

327, pp. 449-462, 1990.

[51] J. Rasmussen, “Skills, rules, knowledge; signals, signs, and symbols; and other distinctions in
human performance models,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 13,
pp. 257-266, 1983.

[52] J. Reason, Human Error. New York: Cambridge University Press, 1990.

[53] J. Reason, Managing the Risks of Organizational Accidents. Ashgate: Aldershot, 2000.

[54] C. Rolland, C. Souveyet, and C. B. Achour, “Guiding goal modeling using scenarios,” IEEE

Transactions on Software Engineering, vol. 24, pp. 1055 - 1071, 1998.

[55] J. Ryser and M. Glinz, “A scenario-based approach to validating and testing software systems

using statecharts,” presented at 12th International Conference on Software and Systems

Engineering and their Applications ICSSEA' 99, Paris, France, 1999.

[56] N. Seyff, P. Grunbacher, N. Maiden, and A. Toscar, “Requirements engineering tools go

mobile,” presented at International conference on software engineering (ICSE 04), Scotland,

2004.

[57] J. E. Shin, A. Sutcliffe, and A. Gregoriades, “Scenario Advisor Tool for Requirements

Engineering,” Requirements Engineering, vol. Online

http://www.springerlink.com/app/home/journal.asp?wasp=m3tlwhruwl4u54qmhqvl&referrer=

parent&backto=linkingpublicationresults,1:102830,1, 2004.

[58] A. G. Sutcliffe, The Domain Theory: Patterns for Knowledge and Software Reuse. Mahwah,
NJ: Lawrence Erlbaum Associates, 2002.

[59] A. G. Sutcliffe, J. Galliers, and S. Minocha, “Human Errors and System Requirements,”

presented at 4th IEEE International Symposium on Requirements Engineering, Los Alamitos,

1999.

[60] A. G. Sutcliffe and A. Gregoriades. “Validating Functional System Requirements with

Scenarios.” In Proceedings of 1st IEEE Joint International Conference on Requirements

Engineering, RE02, Essen, Germany Sept 2002, Eds Greenspan S., Siddiqi J., Dubois E. and

Pohl K., pp 181-190. Los Alamitos CA: IEEE Computer Society Press, 2002.

[61] A. Sutcliffe, N. Maiden, S. Minocha, and M. Darrel, “Supporting scenario based requirements

engineering,” IEEE Transactions on software engineering, vol. 24, pp. 1072-1088., 1998.

[62] A. G. Sutcliffe and G. Rugg, “A taxonomy of error types for failure analysis and risk

assessment,” International Journal of Human Computer Interaction, vol. 10, pp. 381-406.,

1998.

[63] A. G. Sutcliffe and M. Ryan, “Assessing the Usability and Efficiency of Design Rationale,”

presented at Human Computer Interaction INTERACT-97, IFIP/Chapman and Hall, 1997.

[64] A. D. Swain and H. Guttmann, “Handbook of human reliability analysis with emphasis on

nuclear power plant applications,” Nuclear Regulatory Commission, Washington, DC, 1983.


[65] A. van Lamsweerde, “Goal-Oriented Requirements Engineering: A Guided Tour,” presented

at Fifth IEEE International Symposium on Requirements Engineering (RE '01), 2001.

[66] A. van Lamsweerde, “Goal-oriented requirements engineering: a roundtrip from research to

practice,” presented at Requirements Engineering Conference, Kyoto, Japan, 2004.

[67] A. van Lamsweerde and E. Letier, “Handling obstacles in goal-oriented requirements

engineering,” IEEE Transactions on Software Engineering, vol. 26, pp. 978 - 1005, 2000.

[68] M. Visser and P. A. Wieringa, “PREHEP: human error probability based process unit

selection,” IEEE Transactions on Software Engineering, vol. 31, pp. 1 - 15, 2001.

[69] D. Wright and K. Cai, “Representing uncertainty for safety critical systems,” City University,

London 1994.

[70] E. Yu and J. Mylopoulos, “Towards Modelling Strategic Actor Relationships for Information

Systems Development, with Examples from Business Process Reengineering,” presented at

4th Workshop on Information Technologies and Systems, Vancouver, B.C., Canada, 1994.

[71] H. Zhu and L. Jin, “Scenario analysis in an automated tool for requirements engineering,”

Requirements Engineering, vol. 5, pp. 2-22, 2000.

[72] H. Ziv and D.J. Richardson. “Constructing Bayesian-network Models of Software Testing and

Maintenance Uncertainties”, International Conference on Software Maintenance, Bari, Italy,

September 1997.


Appendix A: BN models: summary of input nodes and measurements

Node | Description + measure | Worst-case settings
Noise | Ambient noise: decibels (dB) | >100 dB (good <50 dB)
Lighting | Ambient lighting: lux, or legibility of small 10 pt text | 10 pt text not legible at 20 cms
Comfort | Ambient temperature | Temperature <15C or >35C
War/peace | War or peace status on 1 to 4 scale, peacetime to war | War emergency
Sea state | Sea state and hence ship roll and pitch, measured on Beaufort scale 1 to 9 | Beaufort force >8
Visibility | Visibility from vessel in nautical miles | <1 nautical mile
Workload | Agent's workload | >3 concurrent tasks
Duty time | Agent's time on duty and at sea | >3 months continuously at sea
Fatigue | Time on watch, weighted by war/peace | >7 hours on duty at high alert
Time constraints | Time available to complete a task | Response necessary <1 min
Incentives | Incentives: measured by job satisfaction questionnaire | No incentives to improve, rating <2 on 1 to 7 (best) scale
Management culture | Management culture: job satisfaction questionnaire | No leadership, little motivation or responsibility, rating <2 on 1 to 7 (best) scale
Functionality | Support for user's task: equipment satisfaction questionnaire or expert assessment of technical specification | Rating of useful features <2 on 1 to 7 scale where 7 is excellent
Performance | Expert assessment of technical performance | e.g. threat detection/destroy probabilities fail to meet minimum requirements
Reliability | Reliability history: mean time between failures | MTBF >1 in 10 hours' operation
Usability | Usability measured by questionnaire rating or usability testing | >5 errors committed by 95% of users following test task
Distraction | Distractions to normal task operation | >5 interruptions/min
Internal motivation | Agent's internal motivation assessed by questionnaire or task performance test | Rating <2 on 1 to 7 motivation questionnaire, 7 excellent
Cognitive complexity | Cognitive complexity of the task: NASA TLX | Cognitive complexity measure >10 on TLX scale
Physical complexity | Physical complexity measured by number of manipulations, precision, and difficulty; expert assessment; or operational time | Physical complexity in upper 10% of distribution of tasks assessed
Inherited ability | Agent's inherited ability: IQ test or aptitude questionnaire | Agent's score <25% or in lowest 10% of test score distribution
Task knowledge | Agent's task knowledge: quiz score or performance test | Agent's score <25% or in lowest 10% of test score distribution
Domain knowledge | Agent's domain knowledge: quiz score | Agent's score <25% or in lowest 10% of test score distribution
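Purely as an illustration of how such measurements could be discretised into BN node states, the sketch below maps two of the measures to assumed three-state nodes; the state bands are derived only from the worst-case settings in the table and are not the calibration actually used in the SRA.

def noise_state(decibels: float) -> str:
    """Map an ambient noise measurement to an assumed three-state BN node."""
    if decibels < 50:
        return "good"          # good: <50 dB (from the table)
    if decibels <= 100:
        return "moderate"
    return "poor"              # worst case: >100 dB

def duty_time_state(months_at_sea: float) -> str:
    """Map continuous time at sea to an assumed three-state BN node."""
    if months_at_sea < 1:
        return "good"
    if months_at_sea <= 3:
        return "moderate"
    return "poor"              # worst case: >3 months continuously at sea

print(noise_state(110), duty_time_state(4))   # -> poor poor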


Appendix B: Alternative designs for the aircraft carrier’s aircraft weapons loading system

Tasks | Design 1 (Manual): Agent | Design 1: Technology | Design 2 (Increased Automation): Agent | Design 2: Technology
Plan weapons load for mission | APO | Weapons aircraft availability display | APO | Automated weapons aircraft allocation system
Schedule weapons load sequence | DAPO | Flight deck display | APO | Aircraft weapons load scheduler
Communicate load plan to flight deck and magazine | DAPO | Radio | DAPO | Data link
Plan weapons retrieval | MWS | Weapons layout chart | MWS | Weapons layout chart
Retrieve weapons from magazine | MA | Weapons retrieval trolley | MWS | Weapons retrieval robot
Load weapons onto transporter | MA | Weapons trolley | MA | Weapons autoload palette
Place transporter on hoist | MA | Weapons trolley | MA | Weapons autoload palette
Operate hoist | MA | Hoist | MA | Autoload hoist
Transfer weapons to aircraft | WA | Weapons trolley | WA | Weapons autoload palette
Check aircraft is grounded | WTS | Ground cable indicator | WTS | Ground cable indicator
Check safety pins | WTS | Safety pins | WTS | Safety pins
Check armament is set to off | WTS | Armament indicator | WTS | Armament indicator
Check power is off | WTS | Power indicator | WTS | Power indicator
Position weapons loading equipment under aircraft wing | WA | Weapons loading trolley | WA | Weapons autoload palette
Orient weapons loading equipment | WA | Weapons loading trolley | WA | Weapons autoload palette
Lift and position weapons on wing | WA | Weapons loading trolley | WA | Weapons autoload palette
Test weapons connection | WTS | Aircraft weapon mounts | WTS | Weapons autoload palette
Remove weapons loading equipment | WA | Weapons loading trolley | WA | Weapons autoload palette
Inspect weapons | WLC | Weapon racks and mounts | WLC | Weapon racks and mounts
Arm weapons | WTS | Weapon controls | WTS | Weapon controls
Report load completion | WTS | Radio | WTS | Data link


Andreas Gregoriades holds a PhD and MPhil in Computer Science from UMIST

(University of Manchester Institute of Science and Technology). Currently he is

employed as a Research Fellow at the Surrey Defence Technology Centre (DTC). His

research interests cover Artificial Intelligence for smart Decision Support, Systems

Engineering, Human Reliability Assessment and Software Engineering. He has been

involved in a number of EPSRC and European R&D projects in the areas of Complex

Socio-technical Systems Design, Business Process Modelling and Simulation,

Requirements Engineering and Systems Reliability Assessment. He has also acted as

a reviewer for IEEE Transactions on Knowledge and Data Engineering and for

various International Conferences and Workshops.

Alistair Sutcliffe is Professor of Systems Engineering, in the School of Informatics,

University of Manchester. He has been principal investigator on numerous EPSRC

and European Union projects on requirements engineering, multimedia user

interfaces, safety critical systems and cognitive modelling for information retrieval.

He researches in Human Computer Interaction and Software Engineering. In HCI

his particular interests are interaction theory and user interface design methods for web

sites, multimedia, virtual reality, safety critical systems, and design of complex socio-

technical systems. In software engineering he specialises in requirements engineering

methods and tools, scenario based design, knowledge reuse and theories of domain

knowledge. He is on the editorial board of ACM-TOCHI, REJ and JASE, and is the editor of
the ISO standard 14915 part 3 on Multimedia

user interface design. He has over 200 publications including five books and several

edited volumes of papers and was awarded the IFIP silver core in 2000.
