Design of Stress Testing Focused on Resource
Yasuharu Nishi and Yoshinori Iizuka
Department of Chemical System Engineering, Graduate School of Engineering,
The University of Tokyo, Tokyo, 113-8656 Japan
SUMMARY
A stress test is a vital element of software system
testing. Although experiential methods have already been
proposed, no systematic design method using models has
been proposed. In this paper the authors first survey the
viewpoint of experienced engineers, then clarify the impor-
tance of focusing on resources. Next they expand their
concept of resources, then perform a causal analysis of
problems using a resource state model. Based on these
causes, the authors propose a design method and design
process for stress test items. In addition, the authors provide
an applied example of their proposed method. © 2002
Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 85(10):
51–68, 2002; Published online in Wiley InterScience
(www.interscience.wiley.com). DOI 10.1002/ecjc.10044
Key words: software test; system test; stress; load;
resource.
1. Introduction
Software testing can be divided into unit tests, com-
bined tests, function tests, and system tests [1]. The stress
test is one of the important tests to be performed in a system
test. Stress tests are becoming extremely important as par-
allel processing becomes more common and data storage
capacity rises.
Research related to stress tests includes Myers's conceptual
categories [2] and Beizer's experiential method [3].
In addition, the use of the boundary value analysis method [2]
has been suggested by Kaner and colleagues [4].
In practice, efficient stress tests are designed by
combining the methods described above. This is possible
because an experienced engineer establishes the viewpoint
on which the design of the stress test should focus.
On the other hand, Beizer described the need to
simplify the concept of the program as a way to test a
program efficiently [5]. In other words, test design
requires a model, just as product design does. By introducing
a model, test design can be performed as systematically as
product design, and reviews and other design evaluations
can be made simpler.
In prior test design, a data flow model and various
other models used for unit testing have been proposed, but
a model specific to stress test design has not been proposed.
Thus, in this paper the authors propose a model
particular to stress test design and a method for stress test
design based on the perspective of an experienced engineer.
Section 2 describes the problems found in existing methods.
Section 3 investigates and analyzes the perspective to be
focused on in stress test design as seen by an experienced
engineer, then proposes a resource state model for stress test
design. In Section 4 the conditions which result in problems
in the model are clarified through causal analysis using FT
(fault tree) diagrams, and stress is discussed. Section 5 explains the
methods of applying stress, then proposes a design method
for stress tests which focuses on resources. Section 6 offers
applied examples.
Electronics and Communications in Japan, Part 3, Vol. 85, No. 10, 2002. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J83-D-I, No. 10, October 2000, pp. 1070–1086
2. Existing Methods
2.1. Myers's conceptual categories
Myers described the concepts necessary for stress
tests using the following four categories [2]:
• High volume tests (high volumes of data)
• Storage region tests (necessary storage region not
given)
• Stress tests (large amounts of data in brief times)
• Reliability tests (execution over the long term)
Myers goes no further than describing concepts, and so
specific test items cannot be designed.
2.2. Beizer's experiential method
Beizer defined a stress test as a test in which all
resources are filled at the same time by using high back-
ground loads [3]. A resource refers to a storage region in
memory or on a disk. In contrast to Myers's conceptual
categories, storage region tests and reliability tests are not
performed.
Beizer describes a design policy in which tests are
performed with consideration for the threshold values of
system resources. However, because no examples are given,
experience is required to design the specific test items.
2.3. The boundary value analysis method
Kaner's group suggests using the boundary value
analysis method [2] for the design of stress tests [4]. The
boundary value analysis method supplies, as parameters,
values near the boundaries of the input space and result
space defined in the specifications.
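
The idea behind boundary value analysis can be illustrated with a minimal sketch. The function name `boundary_values`, the choice of one step on either side of each boundary, and the 1 to 255 character field are assumptions made for this illustration and are not taken from the cited works.

```python
def boundary_values(lower: int, upper: int) -> list[int]:
    """Return test values at and just around both ends of [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    candidates = [lower - 1, lower, lower + 1, upper - 1, upper, upper + 1]
    # Duplicates appear when the range is very narrow, so remove them.
    return sorted(set(candidates))


# Hypothetical specification: a field that accepts 1 to 255 characters.
print(boundary_values(1, 255))
```

The values just outside the range exercise the error handling, while the values on and just inside the boundaries exercise the limits the specification allows.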
3. Modeling Programs Focused on
Resources
In this section surveys and analysis are performed of
the perspectives of an experienced engineer designing
stress tests. In accordance with these perspectives, the
authors propose a resource state model used to systemati-
cally design stress tests.
3.1. Investigation of perspectives needed for
stress test design
The authors surveyed and analyzed the perspectives
on which experienced engineers focus when designing stress
tests. In order to minimize bias, we employed engineers with
over 3 years of experience at testing organizations which are
asked by various large vendors to perform tests.
The survey method involved showing the tests to the
engineers, then offering system test items as design results.
There were nine types of tests, including groupware,
DBMS, operating systems, and device drivers. In order to
simplify the responses, a brief explanation of Myers's conceptual
categories* was given.
The survey results yielded 411 system test items in
12 categories. First, 144 items which could be used in
stress tests were selected from the items obtained; sec-
ond, 4 items in which the definition of a stress test itself,
that is, a test which imparts high levels of stress, was
included. Next, 138 items for which designs could be
created without the perspective of experience but instead
using the boundary value analysis method were ex-
cluded. In this fashion the perspectives focused on when
designing the 13 items obtained are thought to represent
the perspectives focused on by experienced engineers
when designing stress tests.
Further detailed analysis was performed on these 13
items. The boundary value analysis method was then ap-
plied to the storage capacity of 8 of these items after setting
up an exchange of memory and other information; in addi-
tion, the remaining 5 items became test items used to
increase the exchange time after setting up an exchange of
memory and other information. In other words, all of the test
designs were created after setting up an exchange of memory
and other information.
The majority of what is used for exchanging infor-
mation is memory, a disk or other storage region referred
to as a resource by Beizer.
On the other hand, some items used things such as
the bandwidth of a network, which cannot be regarded as
a storage region.
In addition, analysis results expressed at various lev-
els of abstraction were obtained for the presumed resource,
from abstract expressions like memory to concrete expres-
sions like 32-KB segments.
Based on the above, we can conclude that the perspectives
necessary for the design of stress tests include
"resources" at various levels of abstraction in the broad
sense given by Beizer's definition.
3.2. Outline of the resource state model
A model specific to stress tests is necessary for per-
forming stress test design systematically. In this section the
authors model the operation of a program in accordance
with the resource perspective developed in the previous
*No investigation of efficiency tests, setup tests, or service type tests was
made.
section. This model will be referred to as a resource state
model. Figure 1 shows an outline of the resource state
model.
This research describes a mechanism which gener-
ates problems due to stress by using a resource state model
in order to derive stress test items. Here a problem refers to
the occurrence of an incorrect condition or of output results
that do not match expectations [6].
For instance, this could refer to a problem in which
the content of a database becomes inconsistent because of a
competitive state (race condition) which arises when a CPU
becomes heavily loaded by multiprocess processing, or a
problem in which the state value of a resource becomes
inconsistent because a large number of tasks WRITE to the
resource continuously within a very short time.
In Section 3.3 a resource is modeled, and in Section
3.4 the resource state value and the related exchanges are
modeled. In Section 3.5 the concept of a layer is described
as a model for the level of abstraction of the resource, and
in Section 3.6 a task is described as a model of program
operation.
3.3. Resources
Although Beizer has represented memory, a disk, or
other storage region as a resource [3], based on the analyti-
cal results in Section 3.1, an experienced engineer would
also include network bandwidth as a resource.
Thus, in this paper the definition of a resource will be
expanded to "an object which exchanges information held
by itself in order to execute a program." As a result, network
bandwidth can also represent a resource. Here the information
held in a resource will be referred to as the resource
"state value."
By using this definition the execution of a program
can be taken to be "a chain of exchanges of the respective
state values of several resources."
3.4. Resource state values and exchanging
them
The state values exchanged by resources include not
only the content of a resource, such as the data being input
or output or interim computational results, but also various
resource attributes such as pointers and mutexes.
In order to make their work general, in the authors'
model only the state values absolutely necessary are modeled
under conditions in which exchanges are occurring
among several resources.
When one resource is working with several resources,
first the working resource must be identified and indicated
from among the other resources. If conversely a large
amount of data is passed to it as a form of stress, buffer
overflow may result, and it will not be possible to identify
the resource. In such cases a general protection fault or
similar problem will occur.
When the resource is identified, the content of the
data it holds is passed to the designated resource. The
designated resource then holds the content of the passed
data thereafter.
When data are passed from several resources at the
same time, exclusive control must be implemented in order to
avoid a competitive state, and the resource must be designated
as dominant. In particular, when a lot of data are
passed as stress at the same time, competitive states can
readily occur.
In the authors' model, the resource state value can take
the following three forms.
[Identification state value (ID)]
The identification state value is the state value held
by the resource in order to identify and indicate the resource
to be used from among the various resources. Pointers,
variable names, and sector numbers are examples of the
identification state value.
[Content state value (DATA)]
The content state value is the content of the data to be
exchanged with another resource. The value of the variable,
and the data in memory or on disk represent the content
state value.
[Dominant state value (LOCK)]
The dominant state value is the state value held by the
resource in order to indicate that the exchange is being
performed exclusively under conditions in which exchanges
with other resources are occurring at the same time.
A mutex or an idle Ethernet transmission path is an example
of the dominant state value.

Fig. 1. Resource state model.

The exchange of state values occurring among re-
sources can be taken as representing two operations, those
of referencing and updating state values, when seen from
the viewpoint of one resource.
In order to reference a state value, the ID for the
resource to be referenced is required. In order to update a
state value, the ID and DATA for the resource to be updated
are required. In the authors' model, the ID and other information
needed for such referencing or updating are referred
to as a "parameter." A parameter can be passed using a
reference to another resource such as a pointer, it can be
written explicitly in the program as a variable name, or it can
be passed from the outside as input from the user.
In other words, in order to model the exchange of
state values between resources, "READ" and "WRITE" for
the three types of state values as well as their respective
parameters must be considered.*
Hereafter expressions such as RI and WD will refer to
READ (R.) and WRITE (W.) for the state values, and the
subscript, written here as the second uppercase letter, will
refer to the state value type (ID/DATA/LOCK). Note that P.
indicates the state value passed as a parameter. For instance,
RI would refer to an ID state value (ID) READ, WD to a
content state value (DATA) WRITE, and PI to an ID passed
as a parameter. In addition, IDs which cannot be identified
are called "NULL IDs."
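
The three state values and the READ and WRITE operations on them can be sketched in code. This is only an illustration under assumptions: the class and method names (`Resource`, `ResourceTable`, `read_data`, `write_data`, `occupy`, `release`) are hypothetical, and treating an unknown ID as a NULL ID that raises an exception is one possible reading of the model, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


class ProtectionException(Exception):
    """An ID passed as a parameter could not be identified (NULL ID)."""


@dataclass
class Resource:
    ident: str                   # identification state value (ID)
    data: Optional[str] = None   # content state value (DATA)
    owner: Optional[str] = None  # dominant state value (LOCK): occupying task


class ResourceTable:
    """Hypothetical registry that resolves ID parameters to resources."""

    def __init__(self) -> None:
        self._by_id: dict[str, Resource] = {}

    def add(self, resource: Resource) -> None:
        self._by_id[resource.ident] = resource

    def _identify(self, p_i: Optional[str]) -> Resource:
        # Every operation first resolves the ID parameter (PI).
        if p_i is None or p_i not in self._by_id:
            raise ProtectionException(f"NULL ID: {p_i!r}")
        return self._by_id[p_i]

    def read_data(self, p_i: Optional[str]) -> Optional[str]:
        """READ of the content state value; only the ID is required."""
        return self._identify(p_i).data

    def write_data(self, p_i: Optional[str], p_d: str) -> None:
        """WD: WRITE of the content state value; ID and DATA are required."""
        self._identify(p_i).data = p_d

    def occupy(self, p_i: Optional[str], task: str) -> bool:
        """WRITE to the dominant state value; fails if already occupied."""
        resource = self._identify(p_i)
        if resource.owner is not None:
            return False
        resource.owner = task
        return True

    def release(self, p_i: Optional[str], task: str) -> None:
        """Clear the dominant state value if this task holds it."""
        resource = self._identify(p_i)
        if resource.owner == task:
            resource.owner = None


table = ResourceTable()
table.add(Resource("buffer"))
table.write_data("buffer", "hello")
print(table.read_data("buffer"))
```

Keeping identification in a single helper makes it visible that both referencing and updating depend on the ID parameter, which is the point at which a NULL ID surfaces.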
3.5. Level of abstraction for resources
3.5.1. Layers and resource pools
Based on the analysis results obtained in Section 3.1,
an experienced engineer will represent resources using
various levels of abstraction from abstract expressions such
as memory to concrete expressions such as 32-KB seg-
ments, then will perform boundary value analysis.
As an example let us consider acquiring a memory
block from an OS heap region using an operation like
malloc() in the C language. If a test designer implements
the test design by referring only to the boundary value
representing the maximum size of the heap region, then
there is no need to consider a resource which represents a
memory block. Conversely, if the designer implements the
test by also considering the internal boundary value which
represents the maximum size of the memory block which
can be acquired, then a more concrete resource, such as a
memory block, must also be considered.
Thus, when a test design is implemented by also
considering a more internal boundary value, a more con-
crete resource, that is one which has a lower level of
abstraction, must be used. In the authors' model, the level
of abstraction of a resource is called a "layer," and a resource
in a particular layer is defined as a subdivision of the resource
in an upper layer. In other words, expressing the same
resource as a resource in a lower layer means implementing
test designs by considering the boundary values which
occur at the subdivision. Here, in order to make the distinction
between an upper and lower layer, the resource immediately
above a particular resource will be referred to as the
"resource pool." Figure 2 shows a diagram of a resource
and a resource pool.
3.5.2. Allocating and freeing resources
In the malloc() example above, because there is a
limit to the size of the heap region, memory blocks which
are no longer needed must be returned. When they are not
returned, memory insufficiency and other problems resulting
from memory leaks may occur if the resource acquisition is
performed many times as a form of stress.
In the authors' model, an operation which involves
acquiring a memory block is called "allocating" a resource,
and an operation which involves returning a memory block
is referred to as "freeing" a resource. Also, the size of a
resource which can be allocated will be referred to as the
"size" of the resource pool. The size can be set externally,
and is an important element for test design. Below the
expressions A and F will refer to allocating (A) a resource
and freeing (F) a resource.
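
The interplay of allocating (A), freeing (F), and the pool size can be sketched as follows. The `ResourcePool` class, the `run` helper, and the sizes used are assumptions introduced for this illustration; they stand in for a heap and malloc()/free() rather than reproducing them.

```python
class ResourceExhaustion(Exception):
    """The pool cannot satisfy an allocation request."""


class ResourcePool:
    def __init__(self, size: int) -> None:
        self.size = size                      # externally settable pool size
        self._allocated: dict[int, int] = {}  # resource id -> amount held
        self._next_id = 0

    def free_space(self) -> int:
        return self.size - sum(self._allocated.values())

    def allocate(self, amount: int) -> int:
        """A: allocate a resource of the given size from the pool."""
        if amount > self.free_space():
            raise ResourceExhaustion(f"requested {amount}, free {self.free_space()}")
        rid = self._next_id
        self._next_id += 1
        self._allocated[rid] = amount
        return rid

    def free(self, rid: int) -> None:
        """F: return a previously allocated resource to the pool."""
        del self._allocated[rid]


def run(pool: ResourcePool, repetitions: int, leak: bool) -> int:
    """Allocate one unit per repetition; return how many repetitions succeeded."""
    completed = 0
    for _ in range(repetitions):
        try:
            rid = pool.allocate(1)
        except ResourceExhaustion:
            break
        if not leak:
            pool.free(rid)  # a correct program returns what it no longer needs
        completed += 1
    return completed


print(run(ResourcePool(100), 1000, leak=False))  # 1000: every block is returned
print(run(ResourcePool(100), 1000, leak=True))   # 100: the leak exhausts the pool
```

A single execution of the leaking path succeeds in both cases; the difference only becomes visible when the number of repetitions exceeds the pool size, which is why the pool size matters for test design.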
3.5.3. Overflow and insufficiency
As discussed in Section 3.5.1, when implementing a
test design by considering a resource with a more internal
boundary value, the resource operation must consider more
internal elements.
Fig. 2. Resource and resource pool.
*For convenience, a write to a dominant state value will represent a
dominant operation or a dominant clear operation.
As an example, let us consider an array with five
elements. In this instance, the array is the resource pool, and
the array elements are the resources. If the test designer only
considers the layer that is the array, assignments to the array
can only be performed all at once. On the other hand, if the test
designer considers the layer of the array elements, then an
assignment to the array can be considered as a set of
assignments to the individual array elements.
If six assignments to array elements are performed
here, a problem occurs. This problem is referred to as buffer
overflow. It corresponds to the case in which a whole-array
assignment is performed in a loop and the boundary condition
is not checked appropriately. In other words, because the sixth
assignment goes beyond the array boundary without a boundary
check and is performed in an illegal region, problems such as a
general protection fault or an update to a variable that should
not be updated occur. In particular, if large amounts of data are
passed as stress, these conditions can readily occur.
In this fashion operations must be subdivided in
accordance with the breakdown of resources. If the parameter
PS is used for subdivision, the array above experiences
problems when PS is six for a resource pool whose size is
five. This is called "overflow" and occurs when PS is larger
than the size.
A separate problem occurs when PS is 4. If a whole-array
assignment is performed with PS = 4 and later operations
are performed with PS = 5, an array element which should
have been assigned has not been assigned, and so the output
will be incorrect. This is called "shortage" and occurs when PS
is smaller than the size.
Overflow and shortage are shown in Fig. 3. PS can be
specified directly using a value input to the software, or
indirectly using the data size.
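
Overflow and shortage for the five-element array can be reproduced in a short sketch. The helper names `classify_ps` and `assign_all` are hypothetical, and Python's IndexError is used here as a stand-in for the general protection fault that an unchecked write would cause in a language such as C.

```python
def classify_ps(ps: int, pool_size: int) -> str:
    """Compare the subdivision parameter PS with the size of the resource pool."""
    if ps > pool_size:
        return "overflow"
    if ps < pool_size:
        return "shortage"
    return "ok"


def assign_all(pool: list, ps: int, value: int) -> None:
    """Assign value to the first ps elements with no boundary check."""
    for i in range(ps):
        pool[i] = value  # raises IndexError once i reaches len(pool)


# Shortage: the array is filled with PS = 4 but later read with PS = 5.
pool = [None] * 5
assign_all(pool, 4, 1)
print(pool)  # the fifth element was never assigned

# Overflow: the sixth assignment falls outside the five-element pool.
try:
    assign_all([None] * 5, 6, 1)
except IndexError:
    print("overflow detected")

print(classify_ps(6, 5), classify_ps(4, 5), classify_ps(5, 5))
```

The shortage case raises no error at the time of the assignment; the unassigned element only shows up later as incorrect output, which makes it easy to miss without a test item aimed at it.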
3.6. Program operation
Based on the definitions above, program operation
under the authors' model can be considered to consist of
performing operations which include allocating the individ-
ual resources, repeating the reads and writes, then freeing
in succession.
In general, program operation is implemented using
parallel processing. In parallel processing, when a lot of
processes are passed as stress, deadlock or a starvation state
can occur.
The execution unit during parallel processing under
the authors' model is referred to as a "task" and is defined
as follows. A task consists of operations on one or more
resources. When a resource to be operated on is controlled
by another task, or when the conditions defined in the
execution environment are met, a task transfers execution
to another task and then "waits."
When a task waits, a task which is waiting is selected
based on priority and then executed. This is called a "task
switch" in the authors' model, and the priority for a task
switch is called the "execution priority." When a particular
task's execution priority is sufficiently low, if a lot of
processes are passed as stress, a starvation state will result
with the task stopped in the wait state.
The determination of task selection and the number
of tasks being executed, as well as the number of tasks to
be executed in succession represent important elements of
test design. In addition, the determination of the speed of
resources which affect the wait time is also a part of test
design.
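
How a low execution priority combined with many tasks leads to starvation can be sketched with a small simulation. The scheduler below is an assumption made for illustration: it runs one waiting task per time slot, picks the task with the smallest priority number first, and uses hypothetical task names; it is not the execution environment described by the authors.

```python
import heapq


def run_scheduler(arrivals: dict[int, list[tuple[int, str]]], slots: int) -> list[str]:
    """Run one waiting task per slot, highest priority (lowest number) first."""
    waiting: list[tuple[int, int, str]] = []  # (priority, arrival order, name)
    order = 0
    executed = []
    for slot in range(slots):
        for priority, name in arrivals.get(slot, []):
            heapq.heappush(waiting, (priority, order, name))
            order += 1
        if waiting:
            _, _, name = heapq.heappop(waiting)  # task switch
            executed.append(name)
    return executed


# Without stress: the low-priority task runs once the other task is done.
calm = {0: [(1, "high-0"), (9, "low")]}
print(run_scheduler(calm, slots=3))

# Burst of high-priority tasks: a new one arrives in every slot.
slots = 50
stressed = {s: [(1, f"high-{s}")] for s in range(slots)}
stressed[0].append((9, "low"))
print("low" in run_scheduler(stressed, slots))
```

In the stressed run the low-priority task stays in the wait state for the whole observation period, so the number of tasks and the length of the run are the quantities a test designer would vary.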
4. Causal Analysis of Problems Using FT
Diagrams
In Section 4.1 the problems which can occur in
programs which are expressed using the resource state
model are described. The causes of these problems are
analyzed using FT diagrams in Section 4.2, and stress is
discussed in Section 4.3.
4.1. Problems in the resource state model
A problem represents a situation in which the output
results do not match what was expected or in which an
illegal phenomenon occurs [6]. An illegal phenomenon
refers either to something that is deemed illegal regardless of
the program specifications, such as the occurrence of an error
or the program stopping, or to something for which the
definition of illegality varies depending on the program
specifications, such as a drop in speed. For the latter, the
relationship with specifications is discussed in detail in a
separate publication, and so in this paper only the former will
be dealt with. As a result, in this section problems which
can occur in the resource state model when output results
do not match expectations or when a phenomenon is
deemed illegal regardless of the program specifications will
be clarified.

Fig. 3. Overflow and shortage.

As described in Section 3.3, the execution of a program
can be taken to be a chain of exchanges of resource state
values. The output results can be considered the final link in
the chain, and so when the output results do not match
expectations, this means that the results written to a resource
which comprises the chain do not match expectations.
On the other hand, an illegal phenomenon can be
considered to occur during individual operations in a task
or between operations.
An illegal phenomenon which occurs during an op-
eration can be considered as occurring when the conditions
necessary for the operation to succeed are not satisfied. The
conditions necessary for an operation to succeed can be
broken down into two categories: acquire and
write/read/free. The conditions necessary to successfully
acquire a resource are only that the size of the resource to
be acquired is smaller than the size of the resource pool. If
a large resource is to be acquired as stress, problems such
as a memory insufficiency may occur due to memory leaks.
In contrast, the conditions necessary for a
write/read/free for a resource are that the ID can be
identified and that the resource is not controlled by another
task. If a large amount of data are passed as stress, buffer
overflow may occur, and a problem such as a general
protection fault may occur because the ID cannot be
identified. On the other hand, if a resource is controlled
continuously as a result of stress, then problems such as
deadlock in which a task is not executed may occur.
An illegal phenomenon which occurs between two
operations is related to task switching. Because a task
switch only involves selecting the tasks to be executed, a
problem such as the starvation state in which a task is not
selected or executed may occur.
In this manner there can be said to be four types of
problems in the resource state model. These problems are
represented in the following fashion in the authors' model.
[Internal Inconsistency]
An internal inconsistency occurs when the internal
state value (DATA) of a resource to be read does not match
what is expected. A problem due to a competitive state is an
internal inconsistency.
[Resource Exhaustion]
Resource exhaustion occurs when the size of the
resource pool is smaller than the size of the resource to be
acquired, and the resource cannot be acquired. Memory
lock represents resource exhaustion.
[Protection Exception]
Protection exception occurs when an ID which can-
not be discriminated (null ID) is passed as a parameter. A
general protection fault is a protection exception.
[Wait Stop]
A wait stop occurs when a task remains stopped in the
wait state and is never executed. Deadlock and the starvation
state represent a wait stop.
4.2. Analysis of causes of problems using FT
diagrams
In this section the causes for each of the problems
described in the previous section are analyzed. The analyses
use FT diagrams [e.g., 7].
An FT diagram is a tree which represents hierarchically
the causes of a problem. An element in a branch is
normally called an "event," an element at the root a "top
event," and an element at a leaf a "basic event." A basic
event is an event which cannot be analyzed further using
the information given. Also, conditions called an "AND
gate" and an "OR gate" are provided in order to express
whether or not several causes need to occur
at the same time.
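
The structure of an FT diagram can be expressed as a small tree evaluation. The tree below is a hypothetical example built from terms that appear in this paper; it is not one of the authors' diagrams from the Appendix, and the names `Event` and `occurs` are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    gate: str = "BASIC"  # "AND", "OR", or "BASIC" for a leaf
    children: list["Event"] = field(default_factory=list)


def occurs(event: Event, active: set[str]) -> bool:
    """Decide whether an event occurs, given the set of active basic events."""
    if event.gate == "BASIC":
        return event.name in active
    results = [occurs(child, active) for child in event.children]
    # AND gate: all causes are needed at once; OR gate: any single cause suffices.
    return all(results) if event.gate == "AND" else any(results)


# Hypothetical tree: resource exhaustion as the top event.
top = Event("resource exhaustion", "OR", [
    Event("leak exhausts the pool", "AND", [
        Event("problem in the code"),
        Event("execution in excess of the size"),
    ]),
    Event("input in excess of the size"),
])

print(occurs(top, {"problem in the code"}))
print(occurs(top, {"problem in the code", "execution in excess of the size"}))
```

The AND gate captures the observation that a defect alone does not cause the top event; it has to coincide with a second condition before the problem appears.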
In order to avoid having the FT diagrams become
confusing in this analysis, the authors created supplementary
FT diagrams in which "A illegal," "A extra," "R./W./F.
illegal," "R./W./F. extra," and "timing off" represent top
events. In addition, in cases in which the cause of an internal
inconsistency is an internal inconsistency in another resource
or parameter, the FT diagram was made to be iterative.
Note also that in an FT diagram, WO represents
"occupy" and WR "release." The main FT diagrams and
supplementary FT diagrams the authors created are in the
Appendix.
The authors next classified the derived basic events,
which resulted in seven categories. The basic events and
their categories are shown in Table 1.
4.3. Stress in the resource state model
A problem occurs when the portion of a program that
is the trigger is executed. There are many causes of prob-
lems, such as specification errors and program defects due
to omissions, or hardware that fails after operational chaos.
Regardless, a problem will not necessarily occur even if the
trigger portion is executed.
For instance, a defect such as the lack of the code to
free a memory block which is acquired using malloc() will
not cause a problem merely by executing the defective part.
In order to cause a problem, stress must be applied in a form
which causes a number of executions in excess of the size
of the heap region.
In order to cause these kinds of problems, conditions
in addition to the part which represents the trigger are at
times necessary. When representing a program using the
resource state model, the basic events in Table 1 derived
using the FT diagram in Section 4.2 represent the problem
trigger and the conditions with respect to the problems
described in Section 4.1.
In the authors' research the condition which causes a
problem when executing the part which represents the
problem trigger is referred to as "stress." In other words,
stress in the authors' research can be taken to represent the
conditions which make clear a defect or failure that is not
normally visible. Therefore, items such as the problems in
the program or the problems with hardware that appear in
Table 1 indicate the parts which represent the problem
triggers, and items such as size, frequency, speed, timing,
and illegal input can be interpreted as classifications of
stress. These items will be called "stress items." In the next
section the characteristics and procedures necessary to control
stress items are discussed.
5. Proposal of a Stress Test Design Method
Focused on Resources
In this section the method for passing stress, in other
words the characteristics used to control stress items, is
explained first. Section 5.2 describes the design of stress
test items. In Section 5.3 the process for stress test design
focusing on resources is proposed, and Section 5.4 explains
the estimation of resources themselves.
5.1. Characteristics for controlling stress items
When designing a test, two categories of charac-
teristics must be delineated. The first is the qualitative
characteristics. Here qualitative refers to characteristics
such as what functions are executed, what features and
functions are combined and executed, what sequence is
used for the execution, and what data are used.
The second category is quantitative characteristics.
Here quantitative refers to characteristics such as how many
functions are executed at the same time, how many func-
tions are executed in succession, the size of the data, the
size of the resources, and the speed of the resources.
Because qualitative characteristics vary with software,
deriving a general design policy is problematic. As a
result, in this section quantitative characteristics which can
control stress items are discussed. These characteristics will
be referred to as "stress drivers." In the remainder of this
section examples of stress drivers will be given. As shown in
Table 2, examples will be given for three categories of
characteristics which can be controlled from outside in the
resource state model: characteristics related to tasks and
operations, characteristics related to parameters, and
characteristics related to resources.
Table 2. Stress drivers

Model characteristics                  Stress driver
-------------------------------------  ---------------
Characteristics related to tasks and   Burst stress
operations                             Long-run stress
Characteristics related to parameters  Limit stress
                                       Volume stress
                                       Omission stress
Characteristics related to resources   Speed stress

Table 1. Categorized basic events

Category           Basic event
-----------------  --------------------------------------------------
Program problems   Problems in the code
                   Problems in the design
                   Problems with tasks when performing task switching
                   Low execution priority for a particular task
Hardware problems  Problems with hardware
Size               Execution in excess of the size
                   Input in excess of the size
                   Input below the size
Frequency          Too many tasks to be controlled
                   Too many tasks to be executed
                   The frequency of A is greater than that of F
Speed              Operation speed is fast/slow
Timing             Occurs between W. and R.
                   Occurs before the original F
                   Occurs after the original F
                   Occurs before even WO
                   Occurs before even WR
                   Occurs during a task switch
Illegal input      Input of null ID
                   Input of other resource ID
                   Input of ID that is the same as . . .
                   Input of strange DATA

5.1.1. Characteristics related to tasks
Quantitative characteristics related to tasks include
how many tasks are being executed at the same time and
how many tasks are being executed in succession.
Consequently, "burst stress," in which many tasks are
executed at the same time, and "long-run stress," in which
many tasks are executed in succession, can be mentioned as
stress drivers. The number of tasks executed under burst
stress or long-run stress is derived from the size of the
resources when resource exhaustion or content discrepancies
are to be caused, and from the size of the resource which
runs the tasks undergoing task switches when a wait stop
is to be caused. A test item to reduce the size of the resource
can also be designed when necessary.
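
The dependence of the task count on the resource size can be sketched as follows. The assumption that every task holds the same fixed amount of one resource pool, the function name `tasks_to_exceed`, and the sizes used are illustrative only and are not part of the paper.

```python
def tasks_to_exceed(pool_size: int, per_task: int) -> int:
    """Smallest number of tasks whose combined demand exceeds the pool size."""
    if pool_size < 0 or per_task <= 0:
        raise ValueError("pool_size must be >= 0 and per_task must be > 0")
    # pool_size // per_task tasks still fit; one more task exceeds the pool.
    return pool_size // per_task + 1


# Hypothetical pool of 1024 units, with each task holding 64 units.
print(tasks_to_exceed(1024, 64))

# Reducing the pool size lowers the number of tasks the test has to run.
print(tasks_to_exceed(128, 64))
```

Under burst stress the count would be read as tasks running at the same time, and under long-run stress as tasks run in succession without the resource being returned.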
5.1.2. Characteristics related to parameters
Parameters related to quantitative characteristics can be
designated directly or indirectly, as discussed in Section
3.5.3. Parameters which are designated directly include
PS greater or less than the size, the null ID, other resource
IDs, IDs which are the same as other resources, and strange
DATA. The PS greater or less than the size and the null ID
can be designed using the boundary value analysis method
with the resource size as the boundary. This is referred to
as "limit stress." The remaining characteristics, however,
are not addressed here, because they are qualitative
characteristics for which design cannot be performed without
knowing internal information such as the resource ID or
DATA attributes.
By contrast, parameters which are designated indirectly
include a PS greater or less than the size. A PS greater than
the size can be designed by either setting the test data size
greater than the resource size, or setting the resource size
lower than the test data size. This is called "volume stress."
Overflow can be caused by volume stress. A PS less than
the size can be designed by either setting the test data size
lower than the resource size, or setting the resource size
greater than the test data size. This is called "omission
stress." Omission stress causes operational overflow because
the data are smaller than the resource being operated on;
in other words, the program cannot check for the
insufficiency. As a result, it is vital to identify the
functions which may cause insufficiencies. For instance,
there are cases in which the data size is declared to be
above the specifications, but less data than the
specifications indicate are actually passed.
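The three parameter-related stress drivers above can be
sketched as follows. This is only an illustration under
stated assumptions, not the authors' tooling: the function
names, the one-unit step around the boundary, and the default
factor of 2 are all hypothetical choices made for the example.

```python
def limit_stress_sizes(resource_size: int) -> list[int]:
    """Boundary values for a directly designated size, using the
    resource size as the boundary (the step of 1 is an assumption)."""
    candidates = {resource_size - 1, resource_size, resource_size + 1}
    # Negative sizes are meaningless here, so they are dropped.
    return sorted(size for size in candidates if size >= 0)


def volume_stress_size(resource_size: int, factor: int = 2) -> int:
    """Test data size greater than the resource size (volume stress)."""
    if resource_size < 1 or factor < 2:
        raise ValueError("resource_size must be >= 1 and factor >= 2")
    return resource_size * factor


def omission_stress_size(resource_size: int, factor: int = 2) -> int:
    """Test data size less than the resource size (omission stress)."""
    if resource_size < 1 or factor < 2:
        raise ValueError("resource_size must be >= 1 and factor >= 2")
    return resource_size // factor
```

For a hypothetical resource of size 256, limit stress would
use sizes 255, 256, and 257, volume stress 512, and omission
stress 128. The alternative described in the text, changing
the resource size instead of the data size, is not shown.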
5.1.3. Characteristics related to resources
Quantitative characteristics related to resources
include how large to make the resource size and how high to
set the resource speed. The resource size is not treated as an
independent stress driver, because it is determined by its
relationship to the size of the data given and the number of
tasks being executed. On the other hand, the resource speed
is a characteristic associated with timing problems, and it
affects the next operation and operations being executed at
the same time. Thus, the speed of the resource in question
is derived from the speed of the resource treated by the
next operation or by the tasks being executed at the same
time. The speed can be varied by updating the resource
settings or by exchanging the resources themselves. This is
called "speed stress."
5.2. Stress test item design
In this section, the method of applying stress, that is,
the design of stress test items, is explained by describing
the relationship between a stress item and the stress driver.
The size stress item can be passed as a data overflow
with respect to a resource using volume, limit, burst, or
long-run stress, or as an operation omission with respect
to the resource using omission stress.
The frequency stress item can be passed by burst
stress. Also, because volume and limit stress are subdivided
into a large number of operations, they can pass frequency
stress as well.
The speed stress item can be passed by speed stress.
Also, if the resource in question is linked to virtual memory
and has a trade-off relationship between size and speed,
then thrashing occurs due to the heavy swapping that results
from volume, limit, burst, or long-run stress, and stress
which reduces speed can be created.
For the timing stress item, one analyzes which software
functions correspond to which operations on which resources,
then controls the operational sequence by combining functions
and controls the execution start time of the functions
precisely. Because precisely controlling the execution start
time of functions is difficult in practice, the probability
of execution with the desired timing can instead be raised
by burst and long-run stress. Also, lowering the speed should
make the execution start timing easier to control.
The illegal input stress item can be passed using
overflow resulting from volume, burst, or long-run stress,
or input which exceeds its range as a result of limit stress.
Unlike problems in the program, problems with hardware
occur probabilistically, and occur more frequently with
wear. As a result, stress can be created by burst and
long-run stress, as well as volume and limit stress.
The relationship between the stress items and stress
drivers described above is shown in Table 3.
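The relationship in Table 3 lends itself to a simple lookup
structure. The sketch below transcribes the table into a
Python dictionary; the names STRESS_TABLE, drivers_for, and
items_for are hypothetical and are introduced only for
illustration.

```python
# Table 3 transcribed as: stress item -> {stress passed: [stress drivers]}
STRESS_TABLE = {
    "size": {
        "size is large": ["volume", "limit", "burst", "long-run"],
        "size is small": ["omission"],
    },
    "frequency": {
        "frequency is high": ["burst"],
        "subdivided": ["volume", "limit"],
    },
    "speed": {
        "speed varies": ["speed"],
        # Heavy swapping in virtual memory that lowers speed.
        "thrashing": ["volume", "limit", "burst", "long-run"],
    },
    "timing": {
        "probability is high": ["burst", "long-run"],
        "control is easy": ["speed", "volume", "limit", "burst", "long-run"],
    },
    "illegal input": {
        "input is direct": ["limit"],
        "input is indirect": ["volume", "burst", "long-run"],
    },
    "problems with hardware": {
        "probability is high": ["burst", "long-run", "volume", "limit"],
        "wear": ["long-run"],
    },
}


def drivers_for(item: str) -> set[str]:
    """All stress drivers that can pass the given stress item."""
    return {d for drivers in STRESS_TABLE[item].values() for d in drivers}


def items_for(driver: str) -> list[str]:
    """All stress items that the given stress driver can pass."""
    return [item for item in STRESS_TABLE if driver in drivers_for(item)]
```

A lookup such as items_for("omission") returns only the size
item, while items_for("burst") returns all six items, which
reflects how widely burst stress is used in the table.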
5.3. Stress test design process for resources
In this section the process for designing and
implementing the stress test items derived in the previous
section is explained.
Using the stress test items derived in the previous
section in real software first requires a phase in which the
resources of the software are analyzed. In this phase, all of
the resources must first be described. If a resource is
missing from the description, none of the problems associated
with that resource can be detected. Resources for which no
description is available must therefore be estimated. Such
estimation is touched on in Section 5.4. In addition, the
layer down to which the resources are to be considered must
be decided.
What is necessary next is a phase for the analysis of
functions. In this phase, the functions of the software are
mapped onto the resource state model. First, all of the
functions of the software are described. Next, it must be
determined which resource operations the described functions
correspond to, and which tasks, performing what kinds of
operations, they can be taken to represent. Then, the
before-and-after relationships as well as the possibility of
simultaneous execution can be determined based on the
interdependence between operations for each resource.
Once the analysis of functions is performed, we can
move into the stress design phase. In this phase, the stress
items derived from the resource state model using the FT
diagrams are turned into a real test case. First, the stress
items passed with respect to each resource are analyzed.
Next, the stress drivers being used are analyzed, and char-
acteristic values are calculated by applying the boundary
value analysis method with respect to the characteristic, all
for the purpose of applying stress. In accordance with the
calculated characteristic values, the functions to be tested,
as well as combinations of them, their sequence and timing
are determined, and then test data are designed. Finally, the
necessary items are determined, and the test case is created.
The final phase of test design is the phase in which
the test case priority is determined. First, the importance of
resources is decided with consideration for the severity of
the consequences when problems occur. Resources which are
central to the architecture, or which can result in security
holes, are deemed to be very important. In addition, the
importance of functions based on product characteristics is
also considered, and the test case priority is determined.
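One possible reading of this prioritization is sketched
below. The paper does not give a scoring rule, so the integer
importance scales, the class name StressTestCase, and the
choice to rank by resource importance first and function
importance second are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StressTestCase:
    name: str
    resource_importance: int  # hypothetical scale: higher is more important
    function_importance: int  # hypothetical scale: higher is more important


def prioritize(cases: list[StressTestCase]) -> list[StressTestCase]:
    """Order test cases by resource importance, then function importance."""
    return sorted(
        cases,
        key=lambda c: (c.resource_importance, c.function_importance),
        reverse=True,
    )


cases = [
    StressTestCase("tags not using a buffer", 1, 2),
    StressTestCase("table buffer overflow", 3, 1),
    StressTestCase("tag buffer overflow", 3, 2),
]
ordered_names = [c.name for c in prioritize(cases)]
```

With these hypothetical values, the two overflow cases, whose
resources could lead to a security hole, are ranked ahead of
the case that does not involve a buffer.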
In addition, the test design process also requires
efforts to prevent test item errors and omissions. As a result,
in this process a design review is implemented during each
of the phases described above. In addition, for functions
whose correspondence with a task for a resource cannot be
established due to insufficient resource estimation, test
omissions must be prevented as far as possible by using
stress applied to similar functions or by using the boundary
value analysis method on data passed to the function.
The test execution phase comes after test design.
Figure 4 outlines the procedure above. The design method
proposed in Sections 5.2 and 5.3 will be referred to as
"resource-oriented stress testing."
Under this proposed method, stress test items can be
designed systematically while bearing in mind the cause-
and-effect relationship with problems by using the resource
state model. This is quite different from the Myers category
method and the Beizer example method.
In addition, the boundary value analysis method
typically focuses only on the input and output values in the
specifications. As a result, even if detailed information
related to the design and implementation of tests is
available, it cannot be used effectively to discover the
cause-and-effect relationship with problems.

Table 3. Relation between stress items and stress drivers

Stress item              Stress passed         Stress driver
Size                     Size is large         Volume, limit, burst, long-run
                         Size is small         Omission
Frequency                Frequency is high     Burst
                         Subdivided            Volume, limit
Speed                    Speed varies          Speed
                         Thrashing             Volume, limit, burst, long-run
Timing                   Probability is high   Burst, long-run
                         Control is easy       Speed, volume, limit, burst, long-run
Illegal input            Input is direct       Limit
                         Input is indirect     Volume, burst, long-run
Problems with hardware   Probability is high   Burst, long-run, volume, limit
                         Wear                  Long-run

5.4. Resource estimation
The method proposed in the authors' research is
effective when the test designers have sufficient
information about resources. However, information about
resources which can only be considered after the detailed
design of the software is often not given to the test
designers. In such cases, the test designers must extract
implicit resource information which has not been made known
to them by performing a task referred to here as
"estimation." In particular, the specifications of recent
software are fluid, and so the task of estimation is becoming
increasingly important.
This being said, the task of estimating the presence
of resources and the characteristics of resources depends on
experience, as does test design. Thus, by providing the
perspective for estimation to the test designers beforehand,
precise estimations become possible. A full treatment of this
perspective goes beyond the scope of this research, and so
only examples of perspectives for estimating the resources
themselves are given here.
• Functions which use prior execution results
• Multiple functions which use the same type of
data
• Software behavior
• Knowledge accumulated or acquired beforehand
6. Applied Example
This section provides an applied example of the
proposed method using software which has been shipped
to the marketplace. Also, for reference, results of the expe-
riential method are given, and a comparison and commen-
tary are provided. The test items are two World Wide Web
browsers.
6.1. The tests
For these tests two commercial World Wide Web
browsers were obtained. Between them, these two products
effectively divide the market. They will be referred
to as Browsers A and B. The OS was Windows 95.
The tests were implemented for text functions and
table functions activated by the tags shown in Table 4. The
text functions were the 23 formatting tags classified into
Font Style, Headings, and Phrase in HTML 3.2 [8]. The table
functions were the table tags TABLE, TR, and TD.
6.2. Design of stress test items
Test design using the proposed method was com-
pleted in roughly 2 hours by the authors. For the conven-
tional test design, design results from engineers with over
8 years of experience in test organization were used. The
total time required for design was 4 hours.
6.2.1. Test design using the proposed method
In test design using the proposed method, resource
estimation was first performed in accordance with the
design process proposed in the previous section. By focusing
on the nesting of the tags from the standpoint of "functions
which use prior execution results" for the formatting tags,
a resource which stores the depth of the nesting was
estimated. The estimated resource will be referred to as the
tag buffer. Table 4 shows the tags which use the tag buffer.
For the table tags, a resource which stores the depth of the
nesting was estimated in the same way as for the formatting
tags, but the resource characteristics could not be
estimated. This estimated resource will be referred to as
the table buffer.
Fig. 4. Design process of resource-oriented stress testing.

Next, task analysis was performed. The following
results were obtained: the buffer is acquired and written
to each time the World Wide Web browser reads a tag.
Then, the buffer is read whenever the read data are
displayed, and is then freed each time a World Wide Web
page is displayed.
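The operation order found by this task analysis (acquire,
write, read, free) can be checked mechanically. The sketch
below is a deliberate simplification for a single buffer and
is not the authors' resource state model: the one-letter
operation codes and the function name first_violation are
assumptions made for illustration.

```python
from typing import Optional


def first_violation(ops: str) -> Optional[str]:
    """Return a description of the first out-of-order operation, or None.

    ops is a string over A (acquire), W (write), R (read), F (free),
    applied to one buffer in the order given.
    """
    acquired = False
    written = False
    for step, op in enumerate(ops):
        if op == "A":
            if acquired:
                return f"step {step}: acquire while already acquired"
            acquired, written = True, False
        elif op in ("W", "R", "F"):
            if not acquired:
                return f"step {step}: {op} without acquire"
            if op == "R" and not written:
                return f"step {step}: read before write"
            if op == "W":
                written = True
            if op == "F":
                acquired = False
        else:
            raise ValueError(f"unknown operation {op!r}")
    return None
```

The sequence "AWRF" matches the order described above and
yields no violation, whereas "AWRFR" reads the buffer after
it has been freed and is reported at step 4.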
Only volume stress was designed for the test cases,
and it was designed with respect to the estimated buffers,
that is, for the tags which use an estimated buffer. For tags
which do not use an estimated buffer, the test items using
volume stress were applied as well. Because the resource
characteristics could not be estimated for the formatting
tags, 46 items were designed using nesting depths of 10,000
or 1000, chosen on the basis of execution time limitations.
In order to keep the data file size at roughly the same level
as in the formatting tag test cases, 6 items were designed
for the table tags: a table with 1000 rows and columns, a
nested table with 10 rows and 10 columns, a table with
10,000 rows and columns, and a table with 30 levels of
nesting in 30 rows and 30 columns.
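Test data of this kind can be generated rather than written
by hand. The sketch below is an illustration, not the authors'
generator: the function names are hypothetical, the reading
of 46 items as 23 tags at two nesting depths is an inference
from the figures given, and nesting each inner table in the
first cell only is an assumption, since the paper does not
state where nested tables were placed.

```python
FORMATTING_TAGS = [
    "B", "BIG", "I", "SMALL", "STRIKE", "SUB", "SUP", "TT", "U",  # Font style
    "H1", "H2", "H3", "H4", "H5", "H6",                           # Headings
    "CITE", "CODE", "DFN", "EM", "KBD", "SAMP", "STRONG", "VAR",  # Phrase
]
NESTING_DEPTHS = [1000, 10000]


def nested_tag_html(tag: str, depth: int, text: str = "x") -> str:
    """Nest the same tag `depth` times around a short text."""
    return f"<{tag}>" * depth + text + f"</{tag}>" * depth


def nested_table_html(rows: int, cols: int, levels: int, text: str = "x") -> str:
    """Build `levels` nested tables of rows x cols cells.

    Assumption: each inner table sits in the first cell of its parent.
    """
    if rows < 1 or cols < 1 or levels < 1:
        raise ValueError("rows, cols and levels must all be >= 1")
    inner = text
    for _ in range(levels):
        first_row = "<TR>" + f"<TD>{inner}</TD>" + f"<TD>{text}</TD>" * (cols - 1) + "</TR>"
        other_row = "<TR>" + f"<TD>{text}</TD>" * cols + "</TR>"
        inner = "<TABLE>" + first_row + other_row * (rows - 1) + "</TABLE>"
    return inner


# One (tag, depth) pair per formatting tag test item.
formatting_items = [(tag, depth) for tag in FORMATTING_TAGS for depth in NESTING_DEPTHS]
```

Under these assumptions, nested_table_html(30, 30, 30)
corresponds to the item with 30 rows, 30 columns, and 30
levels of nesting, and formatting_items contains 46 entries.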
As for the priority of the test cases, the test cases
for tags for which a buffer could be estimated were given
priority. This is because, if a buffer overflow occurs in a
World Wide Web browser, a major security hole results.
No design review was implemented.
6.2.2. Test design using the experiential
method
For formatting tags, three tests were designed.
First was a "same tag iteration test," in which the
same tag was used 100 times without nesting and the amount
of data in the tags rose in proportion to the number of
repetitions.
Next, a "same tag nesting test" was designed with 100
levels of nesting for <BIG> and <SMALL>, whose effect
accumulates with multiple designations. For the eight Phrase
tags, a "distinct tag nesting test" was designed in which the
other 15 tags besides Phrase were nested cyclically.
Six types of test data were used: half-width and
full-width numbers, half-width and full-width Katakana
script, full-width Hiragana script, and full-width Kanji
characters.
The test items designed consisted of a total of 198
items: 138 items for the same tag iteration test, 12 items for
the same tag nesting test, and 48 items for the distinct tag
nesting test.
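The item counts above are consistent with one item per tag
and per type of test data. The short check below makes that
arithmetic explicit; the per-tag, per-data-type breakdown is
an inference from the totals rather than something the paper
states, and the variable names are hypothetical.

```python
DATA_TYPES = 6            # six types of test data
ALL_FORMATTING_TAGS = 23  # tags used in the same tag iteration test
SAME_TAG_NESTING_TAGS = 2 # BIG and SMALL
PHRASE_TAGS = 8           # tags used in the distinct tag nesting test

items = {
    "same tag iteration test": ALL_FORMATTING_TAGS * DATA_TYPES,
    "same tag nesting test": SAME_TAG_NESTING_TAGS * DATA_TYPES,
    "distinct tag nesting test": PHRASE_TAGS * DATA_TYPES,
}
total_items = sum(items.values())
```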
For the table tags, a test item was used in which
tables from 1 row and 1 column up to n rows and n columns
were stacked vertically. There were three items, with n at
30, 40, and 50. The test data consisted of half-width numbers.
6.3. Detection results
6.3.1. Detection results using the proposed
method
In the proposed method, 26 problems were detected
using the 53 test cases*. Analysis of similar problems
showed that they could be classified into five categories.
Table 5 shows the detection results.
6.3.2. Detection results using the experiential
method
In the experiential method, 92 problems were de-
tected using 198 test cases. Here as well, analysis of similar
problems showed they could be classified into four catego-
ries. Table 6 shows the detection results.
6.4. Evaluation and discussion
With the proposed method, problems which had not
been detected by the testing of the development
organizations were readily detected. In particular, the
general protection fault detected in the test of table tags
for Browser B represents a security hole for a World Wide
Web browser, and so is a problem which must be detected
and corrected. Detecting a problem like this is highly
significant.
In addition, in the formatting tag test using the pro-
posed method, the problems of extra lines and new lines
being out of position detected under the experiential
method could not be detected. Also, priority for tags which
did not use the tag buffer was lowered.
Nevertheless, reviewing the detection results of the
proposed method after test execution clarified design errors
such as a resource estimation omission: the buffer for the
size of the screen had not been estimated. In the proposed
method, design review focuses on resources, so if an
appropriate design review had been implemented, the resource
estimation omission would have been identified and
recognized as a source of problems which could not be
detected. On the other hand, reviewing the detection results
of the experiential method after test execution was complex
and difficult because each test case had to be considered
individually, and the causes of detection omissions could
not be identified. In the same fashion, a design review under
the experiential method would have to be carried out for
each test case, and so is expected to be complex and
difficult.

*In the test item (30 rows, 30 columns, 30 levels of nesting) for the table
tags in Browser B, the problem which occurred at the same time that
Browser B abnormally ended could not be detected. As a result, a test case
with 5 rows, 5 columns, and 5 levels of nesting was specially designed and
implemented.

Table 4. Tags to be stress-tested and tags guessed to use buffers

             Tags to be tested                             Tags using buffers
Font style   B, BIG, I, SMALL, STRIKE, SUB, SUP, TT, U     B, I
Headings     H1, H2, H3, H4, H5, H6                        H1, H2, H3, H4, H5, H6
Phrase       CITE, CODE, DFN, EM, KBD, SAMP, STRONG, VAR   CITE, DFN, EM, STRONG
Tables       TABLE, TR, TD                                 TABLE

In this example, the number of functions to be tested
was limited, and the qualitative characteristics of the func-
tions were comparatively clear. When designing test items
in practice, the qualitative characteristics of the functions
must be extracted, and then the selection and combination
of functions must be found using an FT diagram. This work
requires interpretation of a resource state model and FT
diagrams, though the interpretation of the model and FT
diagrams, as well as the analysis of tasks may not be so easy,
depending on the test in question.
In addition, listing resources exhaustively remains
difficult, even if the internal structure is known. As a
result, the authors expect examples based on the importance
of resources and on estimation viewpoints to become more
important. In particular, establishing viewpoints for the
estimation of resource characteristics which cannot be
estimated at present represents a vital task.
7. Conclusion
In this paper the authors created a resource state
model, a model of the operation of programs which focuses
on resources. The authors then performed causal analysis
of problems in accordance with their model, and described
a design method for stress test items. In addition, the
authors proposed a stress test design process which focuses
on resources. Using the proposed method, systematic and
efficient stress test design becomes possible.

Table 5. Results of the resource-oriented stress testing

Formatting tag tests
Browser     Test items for which problems were detected   Detected problems
Browser A   Tags using the tag buffer                     Nesting invalid
Browser B   Tags not using the tag buffer                 Upper right character lost

Table tag tests
Browser     Test items for which problems were detected   Detected problems
Browser A   30 rows, 30 columns, 30 levels of nesting     A table frame was inverted, and upper right frame was lost
Browser B   30 rows, 30 columns, 30 levels of nesting     General protection fault and memory insufficiency
Browser B   5 rows, 5 columns, 5 levels of nesting        Upper right frame and lower frame lost

Table 6. Results of non-resource-oriented stress testing

Formatting tag tests
Browser     Test items for which problems were detected   Detected problems
Browser A   Same tag iteration test                       New line position off
Browser B   Same tag iteration test                       New line position off
Browser B   Same tag iteration test                       Extra line present

Table tag tests
Browser     Test items for which problems were detected   Detected problems
Browser A   None                                          None
Browser B   All                                           Upper right frame and lower frame lost

Major problems which cannot be detected during the
development process can be detected by applying the pro-
posed method to real-world software products.
Future topics of research include a proposal for a
detailed analysis method for tasks and the interpretation of
the resource state model and FT diagrams, a proposal for a
viewpoint of estimation and examples based on the impor-
tance of resources, and large-scale use in real-world test
design.
Acknowledgments. The authors express their
gratitude to Professors Hitoshi Kubome and Musashi
Nakashino of the Science and Engineering Faculty of Chuo
University for their valuable input, Professor Shimon Ken
of Musashi Institute of Technology, and the members of the
System Evaluation Office of the Business Products First
Division of CSK Business Systems. In addition, they thank
Mercury Interactive Japan K.K. for providing their power-
ful tools. Also, the authors thank their readers for invaluable
input.
REFERENCES
1. Pressman RS. Software engineering: A practitioner's
approach. McGraw-Hill; 1997.
2. Myers GJ. The art of software testing. John Wiley &
Sons; 1979. Software testing methodology. Kin-
daikagakusha; 1980.
3. Beizer B. Software system testing and quality assur-
ance. ITP; 1996.
4. Kaner C, Falk J, Nguyen HQ. Testing computer soft-
ware. ITP; 1993.
5. Beizer B. Software testing techniques. Van Nostrand
Reinhold; 1990. Software testing techniques. Nikkei
BP Corp.; 1994.
6. Beizer B. Black-box testing. John Wiley & Sons;
1995. An introduction to real-world software testing.
Nikkei BP Corp.; 1997.
7. Shiomi H, Shimaoka J, Ishiyama K. FMEA and FTA
applications. Nikka Giren Publishing; 1984.
8. Raggett D. HTML 3.2 Reference Specification, W3C
Recommendation, 1997.
http://www.w3.org/TR/REC-html32.html
APPENDIX
1. Supplemental FT Diagrams Using Causal
Analysis
Figure A-1 shows a supplemental FT diagram for A
illegal, A extra, and R./W./F illegal. Figure A-2 shows a
supplemental FT diagram for R./W./F extra and timing off.
2. Main FT Diagrams Using Causal Analysis
Figure A-3 shows a main FT diagram for an internal
inconsistency and resource exhaustion. Figure A-4 shows a
main FT diagram for a protection exception and wait stop.
Fig. A-1. Supplemental fault trees (1).
Fig. A-2. Supplemental fault trees (2).
Fig. A-3. Main fault trees (1).
Fig. A-4. Main fault trees (2).
AUTHORS
Yasuharu Nishi (student member) graduated from the Department of Chemical Systems in the School of Engineering at
the University of Tokyo in 1995 and completed his master's program in 1997. Currently he is working on his doctorate there.
He is pursuing research related to software quality, particularly software testing. He is a member of the Information Processing
Council, the Japan Quality Control Society, IEEE-CS, and ACM.
Yoshinori Iizuka graduated from the Department of Statistics in the School of Engineering at the University of Tokyo
in 1970 and completed his master's program in 1975. He then became a lecturer in the College of Electronic Communications.
He became a lecturer in reactive chemistry at the School of Engineering in 1976, an assistant professor in 1984, and in 1994
moved to the Department of Chemical Systems in the Graduate School due to a restructuring there. He has been a professor
there since 1997. He holds a D.Eng. degree, and is pursuing research related to quality control and statistical analysis. He is a
member of the Japan Quality Control Society, the Applied Statistics Association, the Japan Reliability Council, the Chemical
Engineering Association, and ASQ.