Section 17 Bayesian Reliability Analysis - UNENE

NSERC-UNENE Industrial Research Chair

Department of Civil and Environmental Engineering

University of Waterloo

Mahesh Pandey

Section 17

Bayesian Reliability Analysis

Learning Objectives

The purpose of this section is to:

⚫ Demonstrate how to perform reliability computations when little or no data is available

⚫ Discuss the idea behind “Weibayes” analysis, and why it is not really a Bayesian approach

⚫ Introduce the concept of material improvement factors (MIFs) used in the reliability assessment of SG tubing materials

⚫ Show how to estimate and update important reliability parameters, such as the constant failure rate λ, using Bayesian “conjugate” distributions

⚫ Describe how industry generic data can readily be used to construct conjugate “prior” distributions

⚫ Explain the key challenges and advantages associated with the Log-Normal prior distribution commonly used in PSA


Introduction


Often the most challenging part of a credible reliability analysis is the lack of data

⚫ Most engineering systems, structures and components (SSCs) are highly reliable by design

⚫ Preventive maintenance (PM) programs are also in place to guard against failures

⚫ Reliability testing of new components may be expensive in terms of both cost and duration

The objective of Bayesian analysis is to compensate for the lack of data with expert judgment and/or generic information

There are two basic types of reliability problems

⚫ Analysis of first failures

⚫ Analysis of repeat failures


First Failure Analysis (Weibull)

The analysis of first failures is typically based on the Weibull distribution

The basic idea is to fit the Weibull distribution to the time-to-(first)-failure data

⚫ Including censoring (i.e., survivors)

If little or no data is available, assumptions must be made regarding the Weibull shape and scale parameters

⚫ “Weibayes”: Assume the shape parameter is known

⚫ Fully Bayesian: Assign a prior distribution to both parameters

This approach is most applicable to the survival or reliability analysis of groups of similar (nominally identical) components

⚫ Used in product design and manufacturing
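The “Weibayes” idea can be illustrated with a minimal sketch. It assumes the commonly quoted Weibayes estimator for the Weibull scale parameter when the shape is taken as known; that formula is not given in these slides, and the fleet data, the assumed shape value, and the function name are hypothetical. Note that nothing is updated with Bayes' theorem here, which is why the method is not really Bayesian.

```python
def weibayes_scale(times, shape, n_failures):
    """Weibayes estimate of the Weibull scale parameter for an assumed shape.

    times      : operating times of all units (failed and surviving)
    shape      : assumed (not estimated) Weibull shape parameter
    n_failures : number of observed failures; with zero failures one
                 failure is assumed, which gives a conservative (low) scale
    """
    r = max(n_failures, 1)
    total = sum(t ** shape for t in times)
    return (total / r) ** (1.0 / shape)


# Hypothetical fleet: three units have run without failure (hours)
eta = weibayes_scale([1000.0, 1500.0, 2000.0], shape=2.0, n_failures=0)
print(f"Weibayes scale estimate: {eta:.0f} hours")
```

The estimate depends strongly on the assumed shape, so in practice it would be worth repeating the calculation over a plausible range of shape values.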


Repeat Failure Analysis (HPP)

The analysis of repeat failures (or the frequency of recurring events) is typically based on the homogeneous Poisson process (HPP) model

Here the interest is in analyzing the number of failures in a given time interval

⚫ Failed components are assumed to be restored to as-good-as-new condition within a short time

If little or no data is available, assumptions must be made regarding the parameter λ in the Poisson distribution

⚫ Fully Bayesian analysis using a conjugate (Gamma) prior

This approach is most applicable to the analysis of systems in continuous operation (i.e., repairs are being made)

⚫ Basis for Probabilistic Safety Assessment (PSA)


PSA Analysis

The Bayesian approach is very useful in PSA

⚫ Analysis of low probability events for which few data are available

Both expert judgment and generic data are used for

characterizing prior distributions

Site specific information is then used to update the prior

distributions to obtain the posteriors

Generic data typically allows the estimation of prior distribution

parameters

⚫ Available from many industry sources and databases (e.g., INPO,

EPRI, NRC, DOE, etc.)

⚫ Allows the use of Bayesian conjugates

Expert judgment in PRA is often incorporated using the

Log-normal distribution with an error factor

Bayesian Update of Failure Probability

Failure Probability (or Proportion)

The parameter p in the Binomial distribution represents the probability of failure or proportion

⚫ From data, p can be estimated simply by dividing the observed number of failures x by the observed number of trials n

⚫ n can also represent the number of identical components in a fleet (i.e., population size)

If data is scarce or not available, we can treat p as a random variable and assign a probability distribution to it

Any type of prior distribution can be used for p; however, it is computationally advantageous and preferable to use a conjugate distribution

The conjugate distribution for the Binomial distribution is the Beta distribution


Beta/Binomial Conjugate

Therefore, we describe the parameter p as a random variable having the Beta distribution

$$ f(p) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, p^{a-1} (1-p)^{b-1}, \qquad 0 \le p \le 1 $$

(no Excel formula available for the PDF)

This is the prior distribution for the parameter p

⚫ i.e., probability of failure

The unknown distribution parameters are a and b where

⚫ a corresponds to the prior number of failures

⚫ b corresponds to the prior number of successes (or no failures)

The mean and variance of p are equal to

$$ E[p] = \frac{a}{a+b}, \qquad \mathrm{Var}[p] = \frac{ab}{(a+b)^2 (a+b+1)} $$

Beta/Binomial Conjugate (cont’d)

The likelihood function in this case is the Binomial distribution

$$ L(x \mid p) = \binom{n}{x} p^x (1-p)^{n-x} $$

=BINOMDIST(x,n,p,FALSE)

The evidence (i.e., observed data) are x and n where

⚫ x is the number of failures

⚫ n is the number of trials (or components in the fleet)

The updating is based on Bayes’ formula

$$ f(p \mid x) = \frac{L(x \mid p)\, f(p)}{\int_0^1 L(x \mid p)\, f(p)\, dp} $$

However, because we are using conjugate distributions, there is no need to evaluate Bayes’ theorem explicitly

⚫ i.e., no integration is required


The posterior distribution is also a Beta distribution

The parameters of the posterior distribution are obtained using simple formulas as

$$ a' = a + x, \qquad b' = b + n - x $$

where x is the number of observed failures and n is the number of trials (or components)

Therefore, the parameters of the prior distribution can be updated directly using the new information

⚫ i.e., there is no need to use the integral form of Bayes’ theorem
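The conjugate update amounts to two additions, which the following minimal sketch makes explicit. The function names and the numerical values (a prior of 2 failures and 98 successes, updated with 1 failure in 10 trials) are hypothetical and chosen only for illustration.

```python
def beta_binomial_update(a, b, x, n):
    """Return posterior Beta parameters (a', b') after x failures in n trials."""
    return a + x, b + (n - x)


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Hypothetical prior: 2 failures and 98 successes; new data: 1 failure in 10 trials
a_post, b_post = beta_binomial_update(2.0, 98.0, x=1, n=10)
mean_post, var_post = beta_mean_var(a_post, b_post)
print(f"a' = {a_post}, b' = {b_post}, posterior mean = {mean_post:.4f}")
```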



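The credibility interval for p can then be obtained by computing the appropriate quantiles from the posterior Beta distribution

⚫ For example, for the 90 % credibility interval, the lower and upper limits are the 5th and 95th percentiles (an area of 0.05 in each tail)

=BETAINV(0.05,a′,b′)
=BETAINV(0.95,a′,b′)

The median value of p can also be obtained as

=BETAINV(0.5,a′,b′)

[Figure: posterior Beta density of p showing the lower limit, median, and upper limit, with Area = 0.05 in each tail]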

Example 1

Consider the failure to start of the turbine train of the auxiliary feedwater system (discussed in a previous example). Nine years of industry data on such events have been analyzed and compiled in NUREG/CR-5500 Vol. 1. Based on the analysis, it was determined that on average, there were 4.2 failures of the turbine train to start in 157.3 demands.

(a) Plot the prior distribution of the failure to start assuming it follows the Beta distribution.

(b) Compute the posterior distribution for a plant where the train has failed to start once in the last eight demands.

(c) Compare the results of the Bayesian approach to the maximum likelihood estimate using the plant data only.

Solution:

⚫ Part (a) The prior parameters of the Beta distribution are a - the number of failures and b - the number of successes (no failures)

Solution (cont’d)

For the generic industry data, we have x = 4.2 (number of failures) and n = 157.3 (number of demands)

The number of successes (no failures in 157.3 demands) is therefore equal to

$$ n - x = 157.3 - 4.2 = 153.1 $$

Therefore, for the generic industry data, we get a = 4.2 and b = 153.1

The prior distribution for the failure to start p is given by the Beta distribution

$$ f(p) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, p^{a-1} (1-p)^{b-1} $$

where a = 4.2 and b = 153.1

Solution (cont’d)

The prior distribution is plotted below

[Figure: prior Beta density plotted against p (failures/demand) for a = 4.2 and b = 153.1; mean = 0.0267, median = 0.0247]

Solution (cont’d)

Part (b) We now have to consider the observations from the actual plant.

⚫ We have x = 1 (number of failures) in n = 8 (number of demands)

⚫ Because of the Beta/Binomial conjugate, the parameters of the posterior distribution for p are easy to compute as

$$ a' = a + x = 4.2 + 1 = 5.2, \qquad b' = b + n - x = 153.1 + 8 - 1 = 160.1 $$

The posterior distribution for p is therefore given by

$$ f(p \mid x) = \frac{\Gamma(a'+b')}{\Gamma(a')\,\Gamma(b')}\, p^{a'-1} (1-p)^{b'-1} $$

where a′ = 5.2 and b′ = 160.1

Solution (cont’d)

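The mean and the median of the posterior distribution are equal to

$$ E[p] = \frac{a'}{a'+b'} = \frac{5.2}{165.3} = 0.0315, \qquad p_{50} = 0.0296 $$

The 90 % credibility interval for the posterior distribution is computed using the =BETAINV() function as

$$ 0.0128 < p < 0.0565 $$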
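For readers working outside Excel, the posterior quantiles can be reproduced with a short standard-library sketch. It is not the method used in the slides: the Beta CDF is obtained by Simpson's rule and inverted by bisection, which takes the place of =BETAINV. The helper names are hypothetical, and the integration is only intended for a > 1 and b > 1, as in this example.

```python
import math


def beta_pdf(p, a, b):
    """Beta(a, b) density, evaluated through log-gamma for numerical stability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def beta_cdf(p, a, b, steps=1000):
    """Beta CDF by Simpson's rule on [0, p]; adequate for a > 1 and b > 1."""
    h = p / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(p, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_quantile(q, a, b):
    """Invert the CDF by bisection (the role played by =BETAINV in Excel)."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example 1: prior a = 4.2, b = 153.1; plant data x = 1 failure in n = 8 demands
a_post, b_post = 4.2 + 1, 153.1 + (8 - 1)
mean = a_post / (a_post + b_post)
lower, median, upper = (beta_quantile(q, a_post, b_post) for q in (0.05, 0.50, 0.95))
print(f"mean={mean:.4f} median={median:.4f} 90% interval=({lower:.4f}, {upper:.4f})")
```

The printed values can be compared against the posterior row of the summary table in this example.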

Solution (cont’d)

Part (c) Compare the results to the plant data only.

⚫ The max. likelihood estimate is equal to p = 1/8 = 0.125

⚫ The 90 % confidence interval was computed previously in Example 15.1 and is equal to 0.0064 < p < 0.47

[Figure: prior distribution, posterior distribution, and max. likelihood estimate (data only) plotted as density against p (failures/demand)]

The posterior stays close to the prior: the prior is very strong and there are very few plant data
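One way to see why the prior dominates is to write the posterior mean as a weighted average of the prior mean and the plant-data estimate. The identity below follows directly from the update formulas a′ = a + x and b′ = b + n − x; the numbers are those of Example 1.

```latex
\begin{aligned}
E[p \mid x] &= \frac{a + x}{a + b + n}
  = \frac{a+b}{a+b+n}\cdot\frac{a}{a+b} \;+\; \frac{n}{a+b+n}\cdot\frac{x}{n} \\[4pt]
  &= \frac{157.3}{165.3}\cdot\frac{4.2}{157.3} \;+\; \frac{8}{165.3}\cdot\frac{1}{8}
  = \frac{5.2}{165.3} \approx 0.0315
\end{aligned}
```

The prior carries a weight of 157.3/165.3 (about 0.952) and the plant data a weight of 8/165.3 (about 0.048), so eight demands can shift the estimate only slightly.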


Solution (cont’d)

The results are summarized below

Distribution       5th %tile   Mean     Median   95th %tile
Prior              0.00954     0.0267   0.0247   0.0506
Posterior          0.0128      0.0315   0.0296   0.0565
Plant Data Only    0.0064      0.125    --       0.47

[Figure: point estimates of p (failures/demand): prior (industry average) = 0.0267, posterior (Bayesian estimate) = 0.0315, max. likelihood (plant data only) = 0.1250]

Bayesian Update of Failure Rate

Poisson Process

The occurrence of recurring events (e.g., repeat failures) is often modelled as a Homogeneous Poisson Process (HPP)

⚫ The failure rate λ is constant (i.e., independent of time)

⚫ Repair time is assumed to be negligible

The probability associated with the number of failures x in a given time interval t is given by the Poisson distribution

⚫ It is also the likelihood function for the Bayesian approach as

$$ L(x \mid \lambda) = \frac{(\lambda t)^x e^{-\lambda t}}{x!} $$

=POISSON(x,λt,FALSE)

The parameter of interest is the unknown λ which represents the (constant) occurrence (e.g., failure) rate

⚫ i.e., “average” number of failures per unit time

Gamma/Poisson Conjugate

The Bayesian conjugate distribution for the Poisson distribution is the Gamma distribution

Therefore, we describe the constant failure rate λ as a random variable having the Gamma distribution

The probability density function (PDF) for the Gamma distribution is

$$ f(\lambda) = \frac{b^a \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}, \qquad \lambda > 0 $$

=GAMMADIST(λ,a,1/b,FALSE)

(Note that the Excel function uses 1/b instead of b)

This is the prior distribution for the failure rate λ

The unknown distribution parameters are a and b where

⚫ a corresponds to the total number of failures

⚫ b corresponds to the total operating time

Gamma/Poisson Conjugate (cont’d)

Because the Gamma distribution and Poisson distribution are conjugates, the posterior distribution is also a Gamma distribution

=GAMMADIST(λ,a′,1/b′,FALSE)

The mean and variance of λ are given as

$$ E[\lambda] = \frac{a}{b}, \qquad \mathrm{Var}[\lambda] = \frac{a}{b^2} $$

The parameters of the posterior distribution are obtained using simple formulas as

$$ a' = a + x, \qquad b' = b + t $$

where x is the number of observed failures and t is the total operating time

Similar to before, the parameters of the prior distribution can be updated directly using the new information
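As with the Beta/Binomial pair, the Gamma/Poisson update needs no integration. The sketch below shows the bookkeeping; the function names and the numbers (a prior equivalent to 1.5 failures in 3 years, followed by 2 failure-free years) are hypothetical.

```python
def gamma_poisson_update(a, b, x, t):
    """Return posterior Gamma parameters (a', b') after x failures in time t."""
    return a + x, b + t


def gamma_mean_var(a, b):
    """Mean and variance of a Gamma distribution with shape a and rate b."""
    return a / b, a / b ** 2


# Hypothetical prior: 1.5 failures in 3 years; new data: 0 failures in 2 years
a_post, b_post = gamma_poisson_update(1.5, 3.0, x=0, t=2.0)
mean_post, var_post = gamma_mean_var(a_post, b_post)
print(f"a' = {a_post}, b' = {b_post}, posterior mean = {mean_post:.2f} failures/year")
```

Failure-free operating time still carries information: the shape parameter is unchanged, but the larger b′ lowers both the mean and the variance of λ.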


Gamma/Poisson Conjugate (cont’d)

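The credibility interval for λ is obtained by computing the appropriate quantiles from the posterior Gamma distribution

⚫ For example, for the 90 % credibility interval, the lower and upper limits are the 5th and 95th percentiles (an area of 0.05 in each tail)

=GAMMAINV(0.05,a′,1/b′)
=GAMMAINV(0.95,a′,1/b′)

The median value of λ can also be obtained as

=GAMMAINV(0.5,a′,1/b′)

The mean value is simply

$$ E[\lambda] = \frac{a'}{b'} $$

[Figure: posterior Gamma density of λ showing the lower limit, median, and upper limit, with Area = 0.05 in each tail]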

Example 2

Consider again the plant planning to upgrade its eddy current inspection system. Rather than relying purely on expert judgment, the system engineer decides to use the failure history of the old equipment to construct the prior distribution for the new technology. Based on the historical data, the engineer has estimated that the old system failed on average 3.2 times per outage campaign.

a) Construct and plot the prior distribution for the equipment failure rate, assuming it follows the Gamma distribution.

b) Compute the posterior distribution for the failure rate given that 2 failures of the new system were observed in the most recent plant outage.

Solution:

⚫ Assume the failure of the eddy current inspection system follows the Homogeneous Poisson process (HPP)

Solution (cont’d)

Part (a): The prior parameters of the Gamma distribution are

⚫ a - the number of events and

⚫ b - the operating time

Using the historical data, we have a = 3.2 (number of failures) and b = 1 (outage duration)

The prior distribution describes the uncertainty in the Poisson rate parameter λ, which is the equipment failure rate

The prior distribution for λ is therefore given by the Gamma distribution

$$ f(\lambda) = \frac{b^a \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)} $$

where a = 3.2 and b = 1

=GAMMADIST(λ,a,1/b,FALSE)

Solution (cont’d)

The prior distribution is plotted below

[Figure: prior Gamma density of the failure rate λ (failures/outage) for a = 3.2 and b = 1]

Solution (cont’d)

Part (b): We now consider the observations from the latest outage.

⚫ We have x = 2 failures in t = 1 outage

⚫ Because of the Poisson/Gamma conjugate, the parameters of the posterior distribution for λ are extremely easy to compute as

$$ a' = a + x = 3.2 + 2 = 5.2, \qquad b' = b + t = 1 + 1 = 2 $$

The posterior distribution for λ is therefore given by

$$ f(\lambda \mid x) = \frac{(b')^{a'} \lambda^{a'-1} e^{-b'\lambda}}{\Gamma(a')} $$

where a′ = 5.2 and b′ = 2

=GAMMADIST(λ,a′,1/b′,FALSE)

Solution (cont’d)

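The mean of the posterior distribution is equal to

$$ E[\lambda] = \frac{a'}{b'} = \frac{5.2}{2} = 2.6 \text{ failures/outage} $$

and the median is obtained as

=GAMMAINV(0.5,5.2,1/2)

The 90 % credibility interval for the posterior distribution is computed using the =GAMMAINV() function as

=GAMMAINV(0.05,5.2,1/2)
=GAMMAINV(0.95,5.2,1/2)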
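The posterior quantiles for Example 2 can also be evaluated without Excel. The following standard-library sketch is an illustration, not the method used in the slides: it integrates the Gamma density with Simpson's rule and inverts the CDF by bisection, standing in for =GAMMAINV. The helper names are hypothetical, and the integration assumes a > 1.

```python
import math


def gamma_pdf(lam, a, b):
    """Gamma density with shape a and rate b (Excel's scale argument is 1/b)."""
    if lam <= 0.0:
        return 0.0
    log_pdf = a * math.log(b) + (a - 1) * math.log(lam) - b * lam - math.lgamma(a)
    return math.exp(log_pdf)


def gamma_cdf(lam, a, b, steps=1000):
    """Gamma CDF by Simpson's rule on [0, lam]; adequate for a > 1."""
    h = lam / steps
    total = gamma_pdf(0.0, a, b) + gamma_pdf(lam, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * gamma_pdf(i * h, a, b)
    return total * h / 3


def gamma_quantile(q, a, b):
    """Invert the CDF: widen the bracket until it contains q, then bisect."""
    lo, hi = 0.0, a / b
    while gamma_cdf(hi, a, b) < q:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example 2 posterior: a' = 3.2 + 2 = 5.2, b' = 1 + 1 = 2
a_post, b_post = 5.2, 2.0
mean = a_post / b_post
lower, median, upper = (gamma_quantile(q, a_post, b_post) for q in (0.05, 0.50, 0.95))
print(f"mean={mean:.2f} median={median:.2f} 90% interval=({lower:.2f}, {upper:.2f})")
```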


Solution (cont’d)

The two distributions are plotted below for comparison

⚫ The plant data only estimate is equal to λ = 2/1 = 2 failures/outage

⚫ The posterior distribution has less uncertainty (i.e., spread) and has moved toward the point estimate

[Figure: prior and posterior Gamma densities of the failure rate λ]

Reliability Analysis

Based on the previous example, what is the probability of

having no failures in the next outage?

We know that the probability is computed using the Poisson distribution; however, the failure rate λ is no longer a single constant value but follows the Gamma distribution

⚫ This means that the probability of no failure in the next outage is

no longer a single value but a distribution!

May be difficult to evaluate analytically

Use the “best estimate” approach

⚫ e.g., use the mean or median failure rate as the “best estimate”

⚫ Could also use some upper percentile for worst case risk etc.

(i.e., bounding analysis)
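A minimal sketch of the “best estimate” approach for the posterior of Example 2 (a′ = 5.2, b′ = 2) is given below. For comparison it also averages exp(−λt) over the Gamma posterior; for this particular Gamma/Poisson pairing that average has a closed form, (b′/(b′ + t))^a′, which is a standard result and not taken from these slides. Variable names are hypothetical.

```python
import math

a_post, b_post = 5.2, 2.0   # posterior Gamma parameters from Example 2
t = 1.0                     # exposure: one outage

# "Best estimate" approach: plug the posterior mean rate into the Poisson model
rate_mean = a_post / b_post
p_best_estimate = math.exp(-rate_mean * t)

# Posterior-averaged probability of zero failures (assumed closed form,
# obtained by integrating exp(-lambda * t) over the Gamma posterior)
p_predictive = (b_post / (b_post + t)) ** a_post

print(f"P(no failure), mean-rate best estimate: {p_best_estimate:.4f}")
print(f"P(no failure), averaged over posterior: {p_predictive:.4f}")
```

The two numbers differ because exp(−λt) is a nonlinear function of λ, so plugging in the mean rate is not the same as averaging the probability over the uncertainty in λ.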


Log-Normal Prior


Lack of Evidence

It is evident that there are many benefits of selecting prior distributions that are conjugate to the failure models

⚫ Bayesian updating is straightforward

⚫ The distributions have tidy algebraic formulas

⚫ The information is easily entered into a PRA program by simply entering the distribution type and associated parameter values

In some cases, however, there may not be enough evidence (e.g., observations of failure) to estimate the prior distribution type and its parameters

Must use expert opinion and engineering judgment

The Log-Normal distribution is often used in this case

⚫ The Log-Normal prior is used extensively in PRA

Log-Normal in PRA

Historically, basic event inputs in Probabilistic Risk Assessment (PRA) have been characterized using the Log-Normal distribution (e.g., WASH-1400)

The use of the Log-Normal distribution is attractive because

⚫ It is always positive (good for modelling non-negative phenomena)

⚫ It has two parameters (allows for flexibility)

⚫ A large amount of existing failure rate knowledge, both plant-specific and generic industry data, is found in this form

Unfortunately, the Log-Normal distribution is not a conjugate to the standard failure models used here (Binomial and Poisson)

⚫ Makes Bayesian analysis more complicated

⚫ i.e., no simple update formulas exist
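Without a conjugate pair, Bayes' theorem has to be evaluated numerically. The sketch below shows one simple way to do this, assuming a Log-Normal prior on the failure rate λ and a Poisson likelihood: the posterior is computed on a grid and normalized with the trapezoidal rule. All names and numbers (prior median of 10^-3 per hour, log-standard deviation of 0.668, 2 failures in 1000 hours) are hypothetical.

```python
import math


def lognormal_pdf(lam, mu, sigma):
    """Log-Normal density with log-mean mu and log-standard deviation sigma."""
    z = (math.log(lam) - mu) / sigma
    return math.exp(-0.5 * z * z) / (lam * sigma * math.sqrt(2 * math.pi))


def poisson_likelihood(x, lam, t):
    """Poisson probability of x failures in time t at rate lam."""
    return math.exp(x * math.log(lam * t) - lam * t - math.lgamma(x + 1))


def trapz(xs, ys):
    """Trapezoidal rule for tabulated values."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def posterior_on_grid(mu, sigma, x, t, points=2001, width=6.0):
    """Return (grid, posterior density) on a log-spaced grid around the prior."""
    us = [mu + width * sigma * (2 * i / (points - 1) - 1) for i in range(points)]
    grid = [math.exp(u) for u in us]
    unnorm = [lognormal_pdf(lam, mu, sigma) * poisson_likelihood(x, lam, t)
              for lam in grid]
    norm = trapz(grid, unnorm)   # the integral in the denominator of Bayes' formula
    return grid, [v / norm for v in unnorm]


# Hypothetical prior and data
mu, sigma = math.log(1e-3), 0.668
grid, post = posterior_on_grid(mu, sigma, x=2, t=1000.0)
post_mean = trapz(grid, [lam * d for lam, d in zip(grid, post)])
print(f"posterior mean failure rate: {post_mean:.2e} per hour")
```

A grid is adequate for a single parameter; problems with several uncertain parameters would normally call for sampling methods instead.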


Error Factor

The uncertainty range in the expert estimate is typically characterized using the error factor

The error factor (EF) for the Log-Normal distribution is defined simply as the ratio of the 95th percentile value to the median (or, equivalently, the ratio of the median value to the 5th percentile value)

$$ EF = \frac{\lambda_{95}}{\lambda_{50}} = \frac{\lambda_{50}}{\lambda_{05}} $$

The error factor describes the amount of “dispersion” or spread in the distribution about the median value

⚫ For example, for EF = 10, the upper bound of the 90 % confidence interval is equal to 10 times the median value and the lower bound is equal to the median value divided by 10

⚫ E.g., for a median of λ = 10^-3, the 90 % confidence interval is 10^-4 < λ < 10^-2
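The definition above fixes the Log-Normal parameters once a median and an error factor are given: the log-mean is ln(median), and the log-standard deviation is ln(EF) divided by the 95th percentile of the standard normal distribution (about 1.645). The sketch below applies this to a median of 10^-3 with EF = 3; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)   # standard normal 95th percentile, about 1.645


def lognormal_from_median_ef(median, ef):
    """Return (mu, sigma) of the Log-Normal with the given median and error factor."""
    return math.log(median), math.log(ef) / Z95


def lognormal_percentile(q, mu, sigma):
    """q-th quantile of a Log-Normal with log-mean mu and log-std sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))


mu, sigma = lognormal_from_median_ef(1e-3, 3.0)
p05 = lognormal_percentile(0.05, mu, sigma)
p95 = lognormal_percentile(0.95, mu, sigma)
mean = math.exp(mu + 0.5 * sigma ** 2)   # the mean lies above the median
print(f"sigma={sigma:.3f} 5th={p05:.2e} 95th={p95:.2e} mean={mean:.2e}")
```

Because the distribution is skewed to the right, the mean exceeds the median, which matters when a point estimate quoted as a "median" is later used as a mean value.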

Error Factor (cont’d)

[Figure: Log-Normal density of λ for a median of λ50 = 0.001 (10^-3) and EF = 3, with λ05 = 3.3E-4 and λ95 = 0.003]

Error Factor (cont’d)

In most cases, the Log-Normal prior is simply characterized by the median value and an error factor

⚫ e.g., an initiating event frequency may have a point estimate equal to 10^-4 (assumed to be the median value) with an error factor of 5

Error factors in PRA generally range from 1.3 to 30

⚫ The smallest factors of 1.3 - 2 typically apply to the higher event frequencies (one or more per reactor year) for which a larger amount of recorded data generally exists

⚫ Factors of 2 - 3 apply to higher valued component failure rates and to single human error rates, which are in the vicinity of 1×10^-3 per demand or per attempt

⚫ The largest factors of 20 - 30 apply to unlikely pipe rupture rates and to multiple human errors being committed
