
1

Software Dependability Assessment:

A Reality or A Dream?

Karama Kanoun

The Sixth IEEE International Conference on Software Security and Reliability, SERE, Washington D.C., USA, 20-22 June, 2012

2

Complexity

Economic pressure

[From J. Gray, 'Dependability in the Internet era']

Availability    Outage duration/yr
0.999999        32 s
0.99999         5 min 15 s
0.9999          52 min 34 s
0.999           8 h 46 min
0.99            3 d 16 h

[Figure: availability over time (1950-2010), from 9% up to 99.9999%, for computer systems, telephone systems, cellphones, and the Internet]
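The outage durations in the table follow directly from the availability figures. A minimal sketch of that arithmetic (the table appears to use a 365.25-day year, hence small rounding differences with the 365-day year used below):

```python
# Converting steady-state availability into expected outage duration per year.
HOURS_PER_YEAR = 365 * 24  # 8760 h

def yearly_outage(availability: float) -> str:
    """Expected downtime per year for a given steady-state availability."""
    downtime_s = (1.0 - availability) * HOURS_PER_YEAR * 3600
    hours, rem = divmod(downtime_s, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{int(hours)} h {int(minutes)} min {int(seconds)} s"

for a in (0.999999, 0.99999, 0.9999, 0.999, 0.99):
    print(f"{a} -> {yearly_outage(a)}")
# 0.999 -> 8 h 45 min 36 s (the table rounds this to 8 h 46 min)
```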

3

Examples of Historical Failures

June 1980: False alerts at the North American Air Defense command (NORAD)

June 1985 - Jan. 1987: Excessive radiotherapy doses (Therac-25)

Aug. 1986 - 1987: The "Wily Hacker" penetrates several tens of sensitive computing facilities

15 January 1990: 9-hour outage of the long-distance phone network in the USA

February 1991: Scud missed by a Patriot missile (Iraq, Gulf War)

Nov. 1992: Crash of the communication system of the London Ambulance Service

26-27 June 1993: Authorization denials of credit card operations, France

4 June 1996: Failure of Ariane 5's first flight

Feb. 2000: Distributed denial-of-service attacks on large web sites: Yahoo, eBay, …

Aug. 2003: Electricity blackout in the USA and Canada

Oct. 2006: 83,000 email addresses, credit card details, and banking transaction files stolen in the UK

Aug. 2008: Air traffic control computing system failure (USA)

Sep. 2009: Third service interruption of Gmail, Google's email service, during 2009

[Figure: classification of these failures by cause (physical, development, interaction), by the dependability property affected (availability/reliability, safety, confidentiality), and by impact (localized, distributed)]

4

Accidental faults

Number of failures [consequences and outage durations depend upon the application]

                     Dedicated computing systems        Controlled systems
                     (e.g., transaction processing,     (e.g., civil airplanes, phone
                     electronic switching,              network, Internet frontend
                     Internet backend servers)          servers)
Faults               Rank    Proportion                 Rank    Proportion
Physical internal    3       ~10%                       2       15-20%
Physical external    3       ~10%                       2       15-20%
Human interactions   2       ~20%                       1       40-50%
Development          1       ~60%                       2       15-20%

5

Global Information Security Survey 2004 — Ernst & Young
Loss of availability: top ten incidents

Percentage of respondents that indicated the following incidents resulted in an unexpected or unscheduled outage of their critical business systems:

[Bar chart, incidents ranked by percentage of respondents (0-80%):]

Hardware failures

Major virus, Trojan horse, or Internet worms

Telecommunications failure

Software failure

Third-party failure, e.g., service provider

System capacity issues

Operational errors, e.g., wrong software loaded

Infrastructure failure, e.g., fire, blackout

Former or current employee misconduct

Distributed Denial of Service (DDoS) attacks

Non-malicious: 76%    Malicious: 24%

6

Why Software Dependability Assessment?

User / customer

• Confidence in the product

• Acceptable failure rate

Developer / supplier

• During production

  Reduce # faults (zero defect)

  Optimize development

  Increase operational dependability

• During operation

  Maintenance planning

• Long term

  Improve software dependability of next generations

7

Approaches to Software Dependability Assessment

Assessment based on software characteristics

• Language, complexity metrics, application domain, …

Assessment based on measurements

• Observation of the software behavior

Assessment based on controlled experiments

• Ad hoc vs standardized benchmarking

Assessment of the production process

• Maturity models

8

Outline of the Presentation

Assessment based on software characteristics

• Language, complexity metrics, application domain, …

Assessment based on measurements

• Observation of the software behavior

Assessment based on controlled experiments

• Ad hoc vs standardized benchmarking

Assessment of the production process

• Maturity models

9

Software Dependability Assessment — Difficulties

Corrections: a non-repetitive process

No relationship between failures and corrections

Continuous evolution of the usage profile

• According to the development phase

• Within a given phase

Overselling of early reliability "growth" models

Judgement on the quality of the software developers

What is software dependability? Which dependability measures?

Number of faults, fault density, complexity?

MTTF, failure intensity, failure rate?

10

Dependability Measures?

Static measures: complexity metrics, number of faults, fault density, …

Dynamic measures, characterizing the occurrence of failures and corrections: failure intensity, failure rate, MTTF, restart time, recovery time, availability, …

Usage profile & environment

11

Number of Faults vs MTTF

Percentage of faults and corresponding MTTF (published by IBM)

           MTTF (years)
Product    5000    1580    500     158     50      15.8    5       1.58
1          34.2    28.8    17.8    10.3    5.0     2.1     1.2     0.7
2          34.3    28.0    18.2    9.7     4.5     3.2     1.5     0.7
3          33.7    28.5    18.0    8.7     6.5     2.8     1.4     0.4
4          34.2    28.5    18.7    11.9    4.4     2.0     0.3     0.1
5          34.2    28.5    18.4    9.4     4.4     2.9     1.4     0.7
6          32.0    28.2    20.1    11.5    5.0     2.1     0.8     0.3
7          34.0    28.5    18.5    9.9     4.5     2.7     1.4     0.6
8          31.9    27.1    18.4    11.1    6.5     2.7     1.4     1.1
9          31.2    27.6    20.4    12.8    5.6     1.9     0.5     0.0

Highlighted on the original slide: the columns MTTF ≤ 1.58 y and 1.58 y ≤ MTTF ≤ 5 y.
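The message of this table is that the vast majority of residual faults are "rare" ones that are almost never activated in operation, so removing them barely changes the failure rate perceived by users. A quick illustrative calculation (my own, assuming product 1's distribution and a hypothetical total of 1000 residual faults):

```python
# Illustrative only: contribution of each MTTF class to the overall failure
# rate, using product 1's distribution and an assumed 1000 residual faults.
HOURS_PER_YEAR = 8760

mttf_years = [5000, 1580, 500, 158, 50, 15.8, 5, 1.58]
pct_product1 = [34.2, 28.8, 17.8, 10.3, 5.0, 2.1, 1.2, 0.7]
total_faults = 1000  # hypothetical

for mttf, pct in zip(mttf_years, pct_product1):
    n_faults = total_faults * pct / 100
    rate = n_faults / (mttf * HOURS_PER_YEAR)  # failures/hour due to this class
    print(f"MTTF {mttf:>6} y: {pct:4.1f}% of faults -> {rate:.1e} failures/h")
# The 0.7% of faults with MTTF 1.58 years contribute far more failure
# intensity than the 34.2% of faults with MTTF 5000 years.
```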

12

Assessment Based on Measurements

Data collection
• Times to failures / # failures
• Failure impact
• Failure origin
• Corrections

Data processing
• Descriptive statistics
• Trend analysis
• Modelling / prediction
• Non-stationary processes
• Stochastic models
• Model validation

Outputs
• Trend evolution
• Failure modes
• MTTF / failure rate
• MTTR
• Availability
• Correlations
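As a rough illustration of the data-processing step, the basic descriptive statistics can be computed directly from the collected failure times. The data and window size below are made up, not from the talk:

```python
# Hypothetical failure log: cumulative operating hours at which failures occurred.
failure_times = [120.0, 340.0, 610.0, 1150.0, 1900.0, 2950.0, 4400.0]

# Inter-failure times and their mean (an empirical MTTF estimate).
interfailure = [t2 - t1 for t1, t2 in zip([0.0] + failure_times, failure_times)]
mttf = sum(interfailure) / len(interfailure)

# Observed failure intensity: number of failures per 1000-hour window.
window = 1000.0
n_windows = int(failure_times[-1] // window) + 1
intensity = [sum(window * i <= t < window * (i + 1) for t in failure_times)
             for i in range(n_windows)]

print(f"Empirical MTTF: {mttf:.0f} h")
print("Failures per 1000 h window:", intensity)
```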

13

Why Trend Analysis?

[Figure: failure intensity vs. time, decreasing as corrections are applied over successive versions V_i,1, V_i,2, …, V_i,k]

14

Why Trend Analysis?

[Figure: the same curve continued — after changes (usage profile, environment, specifications, …), failure intensity rises again and then decreases over versions V_i+1,1 to V_i+1,4]

15

Example: Electronic Switching System

[Figure: observed failure intensity per month over 31 months, covering validation then operation; a second plot shows the number of systems in operation (about 10 to 40) growing from month 11 to month 31]

16

Electronic Switching System (Cont.)

[Figure: observed cumulative number of failures (up to about 220) over the 31 months, validation then operation]

17

Electronic Switching System (Cont.)

[Figure: observed cumulative number of failures over the 31 months (validation then operation), together with the retrodictive and predictive assessments given by the hyperexponential model]

Hyperexponential model application → maintenance planning

Observed # failures over [20-32] = 33; predicted # failures over [21-32] = 37
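For readers unfamiliar with it, the hyperexponential reliability-growth model expresses a failure rate that decreases smoothly from an initial value towards an asymptotic, residual value. The sketch below uses one common formulation of that model; the parameter values are illustrative, not the ones fitted in this case study:

```python
import math

# Hyperexponential failure-rate model (one common formulation): lambda(t)
# decreases from omega*z_sup + (1 - omega)*z_inf at t = 0 towards the
# residual failure rate z_inf. Parameter values here are illustrative only.
omega = 0.9      # weight of the fast-decreasing part
z_sup = 1e-3     # initial (upper) rate, per hour
z_inf = 5.7e-5   # residual rate, per hour (value quoted on the next slide)

def failure_rate(t: float) -> float:
    num = omega * z_sup * math.exp(-z_sup * t) + (1 - omega) * z_inf * math.exp(-z_inf * t)
    den = omega * math.exp(-z_sup * t) + (1 - omega) * math.exp(-z_inf * t)
    return num / den

for t in (0, 1e3, 5e3, 2e4, 1e5):
    print(f"t = {t:>8.0f} h   lambda(t) = {failure_rate(t):.2e} /h")
# lambda(t) tends to z_inf = 5.7e-5 /h, i.e., an MTTF of roughly 17 500 h.
```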

18

Electronic Switching System (Cont.)

Failure intensity and failure rate in operation (for an average system)

[Figure: observed failure intensity over months 17-31 and the estimate given by the hyperexponential model]

Residual failure rate: 5.7 × 10⁻⁵ /h

19

Electronic Switching System (Cont.)

Failure intensity of the software components

[Figure: four panels — Telephony, Interface, Defense, Management — each showing the observed failure intensity over months 17-31 together with its hyperexponential model fit]

20

Electronic Switching System (Cont.)

Failure intensity and failure rate in operation (for an average system)

Residual failure rate per component:

Component      Residual failure rate
Telephony      1.2 × 10⁻⁶ /h
Interface      2.9 × 10⁻⁵ /h
Defense        1.4 × 10⁻⁵ /h
Management     8.5 × 10⁻⁶ /h
Sum            5.3 × 10⁻⁵ /h

[Figure: observed whole-system failure intensity over months 17-31, the hyperexponential model estimate (residual failure rate 5.7 × 10⁻⁵ /h), and the sum of the component failure intensities estimated by HE]
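As a quick consistency check (my arithmetic, not from the slides), the component rates do add up to the quoted sum, which is close to the 5.7 × 10⁻⁵ /h estimated on the whole system:

```python
# Consistency check of the per-component residual failure rates (per hour).
rates = {
    "Telephony": 1.2e-6,
    "Interface": 2.9e-5,
    "Defense": 1.4e-5,
    "Management": 8.5e-6,
}
total = sum(rates.values())
print(f"Sum of component rates: {total:.2e} /h")  # ~5.3e-05 /h
print(f"Whole-system estimate:  {5.7e-5:.2e} /h")
```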

21

Other Example: Operating System in Operation

Data = times to failure during operation

[Figure: trend statistic u(i) plotted against failure number (1 to about 181), oscillating between -2 and +2; a second plot shows the mean time to failure, roughly stable, against the same failure index]

Trend evolution → stable dependability
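The u(i) statistic plotted above is a trend test; a common choice in this line of work is the Laplace factor, which compares the observed failure instants with what a stationary (homogeneous Poisson) failure process would produce. A sketch of that computation, on made-up inter-failure times, assuming the Laplace test is indeed what is plotted:

```python
import math

def laplace_factor(theta):
    """Laplace trend factor computed from the inter-failure times theta.
    Negative values suggest decreasing failure intensity (reliability growth),
    positive values an increase, and oscillation around 0 a stable behaviour."""
    i = len(theta)
    if i < 2:
        raise ValueError("need at least two inter-failure times")
    instants, acc = [], 0.0
    for x in theta:          # failure instants t_1 .. t_i
        acc += x
        instants.append(acc)
    mean_instant = sum(instants[:-1]) / (i - 1)
    return (mean_instant - instants[-1] / 2) / (instants[-1] * math.sqrt(1 / (12 * (i - 1))))

# Made-up inter-failure times (hours) showing roughly stable behaviour.
theta = [500, 700, 450, 620, 580, 710, 490, 650]
print([round(laplace_factor(theta[:i]), 2) for i in range(2, len(theta) + 1)])
```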

22

Validity of Results

Early validation
  Trend analysis → development follow-up
  Assessment

End of validation
  Trend analysis + assessment
  • operational profile?
  • enough data?
  Limits: 10⁻³/h - 10⁻⁴/h

Operation
  Trend analysis + assessment
  High relevance

  Examples:
  E10-B (Alcatel ESS): 1400 systems, 3 years
    λ = 5 × 10⁻⁶ /h, λc = 10⁻⁷ /h
  Nuclear I&C systems: 8000 systems, 4 years
    λ: 3 × 10⁻⁷ /h - 10⁻⁷ /h, λc = 4 × 10⁻⁸ /h
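Such small failure rates are only measurable because of the very large cumulative observation time. A back-of-the-envelope illustration (my arithmetic, assuming for simplicity that every system runs continuously):

```python
# Cumulative operating hours in the two examples, assuming continuous operation.
HOURS_PER_YEAR = 8760

for name, systems, years, rate in [
    ("E10-B (Alcatel ESS)", 1400, 3, 5e-6),
    ("Nuclear I&C systems", 8000, 4, 3e-7),
]:
    cumulative_h = systems * years * HOURS_PER_YEAR
    expected_failures = rate * cumulative_h
    print(f"{name}: {cumulative_h:.1e} system-hours, "
          f"about {expected_failures:.0f} failures expected at {rate:.0e}/h")
# With only a handful of systems, rates around 1e-6/h or below could not be
# estimated with any statistical confidence.
```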

23

Research Gaps

Applicability to new classes of systems

• Service-oriented systems

• Adaptive and dynamic software systems → on-line assessment

Industry involvement

• Confidentiality of real-life data

• Cost (perceptible overhead, invisible immediate benefits)

Case of Off-The-Shelf software components?

Applicability to safety-critical systems

• During development

Accumulation of experience → software process improvement → assessment of the software process

24

Off-The-Shelf Software Components — Dependability Benchmarking

No information available from software development

Evaluation based on controlled experimentation

• Ad hoc → internal purpose

• Standard → dependability benchmarking: results available & reusable

Evaluation of dependability measures / features in a non-ambiguous way → comparison

Properties: reproducibility, repeatability, portability, representativeness, acceptable cost

25

Benchmarks of Operating Systems

Which OS for my computer system? (Windows, Linux, Mac)

Limited knowledge: functional description

Limited accessibility and observability

Black-box approach → robustness benchmark

26

Robustness Benchmarks

[Figure: layered architecture — application / API / operating system / device drivers / hardware; faults are injected at the API and the OS outcomes are observed]

Faults = corrupted system calls
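A hypothetical, much-simplified sketch of one experiment in such a robustness benchmark: invoke a system call with corrupted parameters and classify the outcome observed at the API (error code returned, exception signalled, or no reaction). Real benchmarks of this kind also detect hangs and kernel panics, which this toy script does not:

```python
import errno
import os

# Toy robustness experiment: corrupted system-call parameters, classified outcome.
# Illustrative sketch only; not the benchmark used in the talk.

def run_experiment(syscall, *corrupted_args):
    try:
        syscall(*corrupted_args)
        return "no error reported"
    except OSError as e:
        return f"error code returned ({errno.errorcode.get(e.errno, e.errno)})"
    except Exception as e:
        return f"exception raised ({type(e).__name__})"

experiments = [
    ("read on an invalid descriptor", os.read, -1, 16),
    ("read with a negative length", os.read, 0, -5),
    ("close on an invalid descriptor", os.close, 10**6),
    ("lseek with an invalid whence", os.lseek, 0, 0, 999),
]

for name, syscall, *args in experiments:
    print(f"{name:32s} -> {run_experiment(syscall, *args)}")
```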

27

OS Response Time

[Bar charts: OS response time in the presence of corrupted system calls vs. without corruption, for Windows (NT4, 2000, XP, NT4 Server, 2000 Server, 2003 Server) and for Linux (2.2.26, 2.4.5, 2.4.26, 2.6.6)]

28

Mean Restart Time

[Bar charts: mean restart time in seconds, in the presence of corrupted system calls vs. without corruption, for Windows (NT4, 2000, XP, NT4 Server, 2000 Server, 2003 Server) and for Linux (2.2.26, 2.4.5, 2.4.26, 2.6.6)]

29

Detailed Restart Time

[Scatter plots: restart time in seconds for each experiment, for Windows XP and for Linux 2.2.26; the outlier experiments correspond to a disk check at restart]

30

More on the Windows Family

[Scatter plot: restart time in seconds per experiment for Windows NT4, 2000, and XP, showing the impact of the application state after failure]

31

Benchmark Characteristics and Limitations

A benchmark should not replace software test and validation

Non-intrusiveness of robustness benchmarks (faults injected outside the benchmark target)

Make use of available inputs and outputs → impact on measures

Balance between cost and degree of confidence

# dependability benchmark measures >> # performance benchmark measures

32

Maturity of Dependability Benchmarks

“Competition” benchmarks — performance benchmarks

• Mature domain

• Cooperative work

• Integrated into system development

• Accepted by all actors for competitive system comparison

“Ad hoc” benchmarks — dependability benchmarks

• Infancy

• Isolated work

• Not explicitly addressed

• Acceptability?

33

Software Process Improvement (SPI)

[Diagram: measurements and controlled experiments → data collection → data processing → measures, driven by the objectives of the analysis]

Capitalize experience: data related to similar projects

Feedback to the software development process

34

Examples of Benefits from SPI Programs

AT&T (quality program):

Customer-reported problems (maintenance program) divided by 10

System test interval divided by 2

New product introduction interval divided by 3

Fujitsu (concurrent development process):

Release cycle reduction: 75%

Motorola (Arlington Heights), mix of methods:

Fault density reduction: 50% within 3.5 years

Raytheon (Electronic Systems), CMM:

Rework cost divided by 2 after two years of experience

Productivity increase: 190%

Product quality multiplied by 4

35

Process improvement → dependability improvement & cost reduction!

[Figure: cost vs. dependability — the cost of dependability, the basic development cost, the cost of scrap/rework, and the resulting total cost, plotted with and without process improvement approaches]

36

Ultimate objective: more reliable software, faster, and cheaper!

37

Karama Kanoun

The Sixth IEEE International Conference on Software Security and Reliability, SERE, Washington D.C., USA, 20-22 June, 2012

Software Dependability Assessment:

A Reality or A Dream?