
Reliability Audit Lab

VEM RAL

DFR – Fundamentals for Engineers

DFR – Design for Reliability



Topics that will be covered:

1. Need for DFR
2. DFR Process
3. Terminology
4. Weibull Plotting
5. System Reliability
6. DFR Testing
7. Accelerated Testing



1. Need for DFR



What Customers Care about:

1. Product Life…. i.e., useful life before wear-out.

2. Minimum Downtime…. i.e., Maximum MTBF.

3. Endurance…. i.e., # operations, robust to environmental changes.

4. Stable Performance…. i.e., no degradation in CTQs.

5. On-time Startup…. i.e., ease of system startup.




Reliable Product Vision

Failure Mode Identification (Pre-Launch): Identify & “eliminate” inherent failure modes before launch. (Minimize Excursions!)

Failure Rate: Start with lower “running rate”, then aggressively “grow” reliability. (Reduce Warranty Costs)

Resources/Costs: Reduce overall costs by employing DFR from the beginning.

Take control of our product quality and aggressively drive to our goals.

[Figure: three panels plotted against time, each comparing DFR with No DFR. Panel 1: number of failure modes identified. Panel 2: failure rate relative to the goal, with the release point marked. Panel 3: resources/costs, with the release point marked and the values 50% and 5% labelled.]



2. DFR - Process



NPI Process

Phases: Specify, Design, Implement (decision points DP0, DP1, DP2, DP3)

• CTQ Identification
• Customer Metrics

Rel. Goal Setting
• Assess Customer needs
• Establish Reliability goals
• Develop Reliability metrics

System Model
• Construct functional block diagrams
• ID critical comps. & failure potential
• Define Reliability model
• Allocate reliability targets

Design
• Apply robust design tools
• DFSS tools
• Generate life predictions
• Begin Growth Testing

Verification
• Execute Reliability Test strategy
• Continue Growth Testing
• Accelerated Tests
• Demonstration Testing
• Agency / Compliance Testing

Production / Field
• Establish audit program
• FRACAS system using ‘Clarify’
• Correlate field data & test results

• Field data analysis



Legacy Product DFR Process . . .

1. Review Historical Data
• Review historical reliability & field failure data
• Review field RMA’s
• Review customer environments & applications

2. Analyze Field & In-house Endurance Test Data
• Develop product Fault Tree Analysis
• Identify and pareto observed failure modes

3. Develop Reliability Profile & Goals
• Develop P-Diagrams & System Block Diagram
• Generate Reliability Weibull plots for operational endurance
• Allocate reliability goals to key subsystems
• Identify reliability gaps between existing product & goals for each subsystem

4. Develop & Execute Reliability Growth Plan
• Determine root cause for all identified failures
• Redesign process or parts to address failure mode pareto
• Validate reliability improvement through accelerated life testing & field betas

5. Institute Reliability Validation Program
• Implement process firewalls & sensors to hold design robustness
• Develop and implement long-term reliability validation audit


Design For Reliability Program Summary

Keys to DFR:

• DFR needs to be part of the entire product development cycle

• Customer reliability expectations & needs must be fully understood

• Reliability must be viewed from a “systems engineering” perspective

• Product must be designed for the intended use environment

• Reliability must be statistically verified (or risk must be accepted)

• Field data collection is imperative (environment, usage, failures)

• Manufacturing & supplier reliability “X’s” must be actively managed



3. DFR - Terminology



What do we mean by

1. Reliability

2. Failure

3. Failure Rate

4. Hazard Rate

5. MTTF / MTBF



1. Reliability R(t): The probability that an item will perform its intended function without failure under stated conditions for a specified period of time

2. Failure: The termination of the ability of the product to perform its intended function

3. Failure Rate [FR, λ]: The ratio of the number of failures within a sample to the cumulative operating time of that sample. (F(t) is reserved below for the cumulative distribution function.)

4. Hazard Rate [h(t)]: The instantaneous rate of failure of an item given that it has survived until that time; sometimes called the instantaneous failure rate.


Failure Rate Calculation Example

EXAMPLE: A sample of 1000 meters is tested for a week, and two of them fail (assume they fail at the end of the week). What is the Failure Rate?

$$\text{Failure Rate} = \frac{\text{failures}}{\text{hours}} = \frac{2}{1000 \times 24 \times 7} = \frac{2}{168{,}000}\ \text{failures/hour} = 1.19 \times 10^{-5}\ \text{failures/hr}$$
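The calculation above can be sketched in a few lines of Python. The function name and its parameters are illustrative choices rather than part of the original material, and the sketch assumes every unit accumulates the full test time, as the example states.

```python
def failure_rate(failures: int, units: int, hours_per_unit: float) -> float:
    """Failures per cumulative operating hour.

    Assumes every unit runs for the whole test period, so the
    cumulative operating time is simply units * hours_per_unit.
    """
    cumulative_hours = units * hours_per_unit
    return failures / cumulative_hours


# 1000 meters tested for one week (7 days * 24 hours), 2 failures
rate = failure_rate(failures=2, units=1000, hours_per_unit=7 * 24)
print(f"{rate:.3g} failures/hr")  # 1.19e-05 failures/hr
```

If failed units were removed part-way through the test, their shorter operating times would have to be summed individually instead.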



Probability Density Function (PDF):

The Probability Density Function (PDF) is the distribution f(t) of times to failure. The probability that the product fails in a short interval from t to t + dt is f(t) dt; f(t) itself is a density, not a probability.

[Figure: probability density function f(t) versus time.]



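Common Distributions

| Probability Distribution | Probability Density Function, f(t) | Variate Range, t |
|---|---|---|
| Exponential | $f(t) = \lambda e^{-\lambda t}$ | $0 \le t < \infty$ |
| Weibull | $f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$ | $0 \le t < \infty$ |
| Normal | $f(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}$ | $-\infty < t < \infty$ |
| LogNormal | $f(t) = \frac{1}{\sigma t\sqrt{2\pi}}\, e^{-\frac{(\ln t-\mu)^2}{2\sigma^2}}$ | $0 \le t < \infty$ |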



Cumulative Distribution Function (CDF):

The Cumulative Distribution Function (CDF) represents the probability that the product fails at some time prior to t. It is the integral of the PDF evaluated from 0 to t.

$$\text{CDF} = F(t) = \int_0^t f(\tau)\,d\tau$$

[Figure: probability density function f(t) versus time; the area under the curve from 0 to t1 is the cumulative distribution function F(t1).]



Reliability Function R(t)

The reliability of a product is the probability that it does not fail before time t. It is therefore the complement of the CDF:

$$R(t) = 1 - F(t) = 1 - \int_0^t f(\tau)\,d\tau$$

or

$$R(t) = \int_t^{\infty} f(\tau)\,d\tau$$

Typical characteristics:
• when t = 0, R(t) = 1
• when t → ∞, R(t) → 0

[Figure: probability density function f(t) versus time; the area under the curve to the right of t is R(t) = 1 - F(t).]



Hazard Function h(t)

The hazard function is defined as the limit of the failure rate as Δt approaches zero. In other words, the hazard function, or instantaneous failure rate, is obtained as

$$h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t+\Delta t)}{\Delta t \cdot R(t)}$$

The hazard function or hazard rate h(t) is the conditional probability of failure per unit time in the interval t to (t + Δt), given that there was no failure up to t. Because [R(t) – R(t+Δt)] / Δt tends to f(t) as Δt approaches zero, it is expressed as

h(t) = f(t) / R(t).
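As a rough numerical check that the limit definition and the ratio f(t) / R(t) agree, the sketch below evaluates both for a Weibull distribution. The parameters β = 2 and η = 1000, the step size and the function names are arbitrary assumptions chosen only for illustration.

```python
import math

BETA, ETA = 2.0, 1000.0  # illustrative Weibull parameters (assumed, not from the text)


def reliability(t: float) -> float:
    """Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / ETA) ** BETA))


def density(t: float) -> float:
    """Weibull PDF f(t) = (beta/eta) * (t/eta)^(beta-1) * R(t)."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1) * reliability(t)


def hazard_from_limit(t: float, dt: float = 1e-3) -> float:
    """Finite-difference version of [R(t) - R(t+dt)] / [dt * R(t)]."""
    return (reliability(t) - reliability(t + dt)) / (dt * reliability(t))


def hazard_from_ratio(t: float) -> float:
    """Closed form h(t) = f(t) / R(t)."""
    return density(t) / reliability(t)


t = 500.0
print(hazard_from_limit(t), hazard_from_ratio(t))  # both close to 0.001 per unit time
```

Shrinking `dt` brings the two values closer together, until floating-point cancellation in the numerator starts to dominate.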



Hazard Functions

As shown, the hazard rate is a function of time.

What type of function does hazard rate exhibit with time?

The general answer is the bathtub-shaped function.

The sample will experience a high failure rate at the beginning of the operation time due to weak or substandard components, manufacturing imperfections, design errors and installation defects. This period of decreasing failure rate is referred to as the “infant mortality region”.

This region is undesirable from both the manufacturer's and the consumer's viewpoints, as it causes unnecessary repair cost for the manufacturer and an interruption of product usage for the consumer.

Early failures can be minimized by applying an adequate burn-in period to systems or components before shipment, by improving the manufacturing process and by improving the quality control of the products.



At the end of the early failure-rate region, the failure rate will eventually reach a constant value. During this constant failure-rate region the failures do not follow a predictable pattern but occur at random due to the changes in the applied load.

The randomness of material flaws or manufacturing flaws will also lead to failures during the constant failure rate region.

The third and final region of the failure-rate curve is the wear-out region. It begins when the failure rate rises significantly above the constant failure-rate value; failures are then no longer attributed to randomness but to the age and wear of the components.

To minimize the effect of the wear-out region, one must use periodic preventive maintenance or consider replacement of the product.



Product's Hazard Rate Vs. Time : “The Bathtub Curve”

[Figure: hazard rate h(t) versus time, in three regions. Infant Mortality: manufacturing defects, h(t) decreasing. Random Failure (Useful Life): random failures, h(t) constant. Wear out: wear-out failures, h(t) increasing.]



Mean Time To Failure [MTTF] -

One of the measures of a system's reliability is the mean time to failure (MTTF). It should not be confused with the mean time between failures (MTBF). When the system is non-repairable, the expected time to failure is the MTTF; when the system is repairable, the expected time between two successive failures is the MTBF.

Now let us consider n identical non-repairable systems and observe the time to failure for each. Assume that the observed times to failure are t_1, t_2, ..., t_n. The estimated mean time to failure, MTTF, is

$$\text{MTTF} = \frac{1}{n}\sum_{i=1}^{n} t_i$$



Useful Life Metrics: Mean Time Between Failures (MTBF)

Mean Time Between Failures [MTBF] - For a repairable item, the ratio of the cumulative operating time to the number of failures for that item (also Mean Cycles Between Failures, MCBF, etc.).

EXAMPLE: A motor is repaired and returned to service six times during its life and provides 45,000 hours of service. Calculate MTBF.

$$\text{MTBF} = \frac{\text{Total operating time}}{\text{\# of failures}} = \frac{45{,}000}{6} = 7{,}500\ \text{hours}$$

MTBF or MTTF is a widely-used metric during the Useful Life period, when the hazard rate is constant.
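The two estimators can be sketched as follows. The function names and the list of observed failure times are hypothetical and serve only to illustrate the formulas; the motor figures are the ones from the example.

```python
def mttf(times_to_failure: list[float]) -> float:
    """Estimated MTTF for non-repairable items: the mean of the observed times to failure."""
    return sum(times_to_failure) / len(times_to_failure)


def mtbf(total_operating_time: float, failures: int) -> float:
    """MTBF for a repairable item: cumulative operating time divided by number of failures."""
    return total_operating_time / failures


# Hypothetical times to failure (hours) for four identical non-repairable units
sample_times = [4200.0, 5100.0, 6300.0, 7400.0]
print(mttf(sample_times))  # 5750.0

# Motor example: 45,000 hours of service, repaired six times
print(mtbf(45_000, 6))  # 7500.0
```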


The Exponential Distribution

If the hazard rate is constant over time, then the product follows the exponential distribution. This is often used for electronic components.

$$h(t) = \lambda = \text{constant}$$

$$\text{MTBF (mean time between failures)} = \frac{1}{\lambda}$$

$$f(t) = \lambda e^{-\lambda t} \qquad F(t) = 1 - e^{-\lambda t} \qquad R(t) = e^{-\lambda t}$$

At MTBF:

$$R(t) = e^{-\lambda t} = e^{-\lambda \cdot \frac{1}{\lambda}} = e^{-1} = 36.8\%$$

Appropriate tool if failure rate is known to be constant



The Exponential Distribution

[Figure, PDF: f(t) versus time to failure from 0 to 5×10^4 for λ = 0.0001, 0.0002 and 0.0003.]

[Figure, CDF: F(t) versus time from 0 to 5×10^4 for λ = 0.0001, 0.0002 and 0.0003.]


Useful Life Metrics: Reliability

A mathematical model for reliability during Useful Life

Reliability can be described by the single-parameter exponential distribution when the Hazard Rate, λ, is constant (i.e. the “Useful Life” portion of the bathtub curve):

$$R = e^{-t/\text{MTBF}} = e^{-\text{FR}\cdot t}$$

Where: t = Mission length (uptime or cycles in question)

EXAMPLE: If MTBF for a motor is 7,500 hours, the probability of operating for 30 days without failure is ...

$$R = e^{-\frac{30 \times 24\ \text{hours}}{7500\ \text{hours}}} = 0.908 = 90.8\%$$
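A minimal sketch of the constant-hazard reliability model, reproducing the 30-day motor example and the value at t = MTBF. The function name is an illustrative choice; the model itself is the exponential form R = e^(-t/MTBF) given in the text.

```python
import math


def exponential_reliability(mission_hours: float, mtbf_hours: float) -> float:
    """R = exp(-t / MTBF); valid only while the hazard rate is constant."""
    return math.exp(-mission_hours / mtbf_hours)


# Motor with MTBF = 7,500 hours, 30-day mission
r_30_days = exponential_reliability(30 * 24, 7500)
print(round(r_30_days, 3))  # 0.908

# Reliability when the mission length equals the MTBF itself
r_at_mtbf = exponential_reliability(7500, 7500)
print(round(r_at_mtbf, 3))  # 0.368
```

The second value illustrates why MTBF should not be read as a guaranteed life: under this model only about 37% of units survive to the MTBF.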



4. DFR – Weibull Plotting



Weibull Probability Distribution

• Originally proposed by the Swedish engineer Waloddi Weibull (1887-1979) in the early 1950s

• Statistically represented fatigue failures

• Weibull probability density function (PDF, distribution of values):

$$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

t = Mission length (time, cycles, etc.)

β = Weibull Shape Parameter, “Slope”

η = Weibull Scale Parameter, “Characteristic Life”

Equation valid for minimum life = 0



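The Weibull Distribution

This powerful and versatile reliability function is capable of modeling most real-life systems because the time dependency of the failure rate can be adjusted.

$$R(t) = 1 - F(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

$$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

$$h(t) = \frac{\beta}{\eta^{\beta}}\, t^{\beta-1}$$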
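The three Weibull functions translate directly into code. The sketch below uses only the standard library; the function names and the sample parameter values are illustrative assumptions.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t: float, beta: float, eta: float) -> float:
    """f(t) = beta * t^(beta-1) / eta^beta * exp(-(t/eta)^beta)."""
    return beta * t ** (beta - 1) / eta ** beta * weibull_reliability(t, beta, eta)


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = (beta / eta^beta) * t^(beta-1)."""
    return (beta / eta ** beta) * t ** (beta - 1)


eta = 1000.0
for beta in (0.5, 1.0, 3.44):
    # Hazard early (t = 100) and late (t = 2000) in life
    print(beta, weibull_hazard(100.0, beta, eta), weibull_hazard(2000.0, beta, eta))
```

With β = 0.5 the hazard falls over time, with β = 1.0 it stays at 1/η, and with β = 3.44 it rises. At t = η the reliability is e^(-1) regardless of β.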



Weibull PDF

• Exponential when β = 1.0
• Approximately normal when β = 3.44
• Time dependent hazard rate

$$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

[Figure: Weibull PDF f(t) versus time from 0 to 2000, for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]



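Weibull Hazard Function

$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)}$$

$$h(t) = \frac{\frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]}{1 - \left\{1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]\right\}} = \frac{\beta}{\eta^{\beta}}\, t^{\beta-1}$$

[Figure: Weibull hazard function h(t) versus time from 0 to 2500, for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]

β < 1: Highest failure rate early, “Infant Mortality”

β > 1: Highest failure rate later, “Wear-Out”

β = 1: Constant failure rate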


Weibull Reliability Function

$$R(t) = 1 - F(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

Reliability is the probability that the part survives to time t.

[Figure: Weibull reliability function R(t) versus time from 0 to 2500, for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]



Summary of Useful Definitions - Weibull Analysis

Beta (β): The slope of the Weibull CDF when plotted on Weibull paper.

B-life: A common way to express values of the cumulative distribution function. B10 refers to the time at which 10% of the parts are expected to have failed.

CDF: The Cumulative Distribution Function expresses the time-dependent probability that a failure occurs at some time before time t.

Eta (η): The characteristic life, or time at which 63.2% of the parts are expected to have failed. Also expressed as the B63.2 life. On Weibull paper this is the time at which the CDF line crosses y = 0.

PDF: The Probability Density Function expresses the expected distribution of failures over time.

Weibull plot: A plot where the x-axis is scaled as ln(time) and the y-axis is scaled as ln(ln(1 / (1 - CDF(t)))). The Weibull CDF plotted on Weibull paper is a straight line, y = β ln(t) - β ln(η), with slope β and y-intercept -β ln(η).
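The Weibull-paper transformation and the B-life definition can be illustrated with a short sketch. It generates exact CDF values for an assumed β = 2 and η = 1000, transforms them to Weibull-plot coordinates, and recovers both parameters from an ordinary least-squares line. All names and numbers are illustrative; fitting real failure data would additionally require estimated plotting positions for the CDF, which the definitions above do not cover.

```python
import math


def weibull_cdf(t: float, beta: float, eta: float) -> float:
    """F(t) = 1 - exp(-(t/eta)^beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_plot_point(t: float, cdf: float) -> tuple[float, float]:
    """Map (t, F) to Weibull-paper coordinates (ln t, ln ln(1/(1-F)))."""
    return math.log(t), math.log(math.log(1.0 / (1.0 - cdf)))


def fit_line(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def b_life(percent: float, beta: float, eta: float) -> float:
    """Time by which `percent` % of parts are expected to have failed."""
    return eta * (-math.log(1.0 - percent / 100.0)) ** (1.0 / beta)


beta_true, eta_true = 2.0, 1000.0  # assumed parameters for the demonstration
times = [200.0, 400.0, 800.0, 1600.0]
points = [weibull_plot_point(t, weibull_cdf(t, beta_true, eta_true)) for t in times]

slope, intercept = fit_line(points)
eta_fit = math.exp(-intercept / slope)  # line crosses y = 0 at t = eta
print(slope, eta_fit)                   # approximately 2.0 and 1000.0
print(b_life(10, beta_true, eta_true))  # B10 life, roughly 325
```

The slope of the fitted line is β and the intercept is -β ln(η), so η is read off where the line crosses y = 0.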



Weibull Analysis

What is a Weibull Plot ?

• Log-log plot of probability of failure versus age for a product or component

• Nominal “best-fit” line, plus confidence intervals

• Easily generated, easily interpreted graphical read-out

• Comparison: test results for a redesigned product can be plotted against original product or against goals

[Figure: example Weibull plot showing the observed failures, the Weibull best-fit line and the confidence bounds on the fit.]



Scale and Shape are the Key Weibull Parameters

Weibull Shape Parameter (β ) and Scale Parameter (η ) Defined

η is called the CHARACTERISTIC LIFE. For the Weibull distribution, the characteristic life is equal to the scale parameter, η. This is the time at which 63.2% of the product will have failed.

β is called the SLOPE. For the Weibull distribution, the slope describes the steepness of the Weibull best-fit line (see following slides for more details). β also has a relationship with the trend of the hazard rate, as shown on the “bathtub curves” on a subsequent slide.


β and the Bathtub Curve

β < 1
• Implies “infant mortality”
• If this occurs:
  - Failed products “not to print”
  - Manufacturing or assembly defects
  - Burn-in can be helpful
• If a component survives infant mortality phase, likelihood of failure decreases with age.

β = 1
• Implies failures are “random”, individually unpredictable
• An old part is as good as a new part (burn-in not appropriate)
• If this occurs:
  - Failures due to external stress, maintenance or human errors
  - Possible mixture of failure modes

1 < β < 4
• Implies mild wearout
• If this occurs:
  - Low cycle fatigue
  - Corrosion or Erosion
  - Scheduled replacement may be cost effective

β > 4
• Implies rapid wearout
• If this occurs, suspect:
  - Material properties
  - Brittle materials like ceramics
• Not a bad thing if it happens after mission life has been exceeded.



5. DFR – System Reliability



System Reliability Evaluation

A system (or a product) is a collection of components arranged according to a specific design in order to achieve desired functions with acceptable performance and reliability measures.

Clearly, the type of components used, their qualities, and the design configuration in which they are arranged have a direct effect on the system performance and its reliability. For example, a designer may use a smaller number of high-quality components and configure them in such a way as to result in a highly reliable system, or a designer may use a larger number of lower-quality components and configure them differently in order to achieve the same level of reliability.

Once the system is configured, its reliability must be evaluated and compared with an acceptable reliability level. If it does not meet the required level, the system should be redesigned and its reliability should be re-evaluated.


Reliability Block Diagram (RBD) Technique

The first step in evaluating a system's reliability is to construct a reliability block diagram, which is a graphical representation of the components of the system and how they are connected. The purpose of the RBD technique is to represent failure and success criteria pictorially and to use the resulting diagram to evaluate System Reliability.

Benefits

The pictorial representation means that models are easily understood and therefore readily checked.

Block diagrams are used to identify the relationship between elements in the system. The overall system reliability can then be calculated from the reliabilities of the blocks using the laws of probability.

Block diagrams can be used for the evaluation of system availability provided that both the repair of blocks and failures are independent events, i.e. provided the time taken to repair a block is dependent only on the block concerned and is independent of repair to any other block.



Elementary models

Before beginning the model construction, consideration should be given to the best way of dividing the system into blocks. It is particularly important that each block should be statistically independent of all other blocks (i.e. no unit or component should be common to a number of blocks).

The most elementary models are the following:
• Series
• Active parallel
• m-out-of-n
• Standby models



Typical RBD configurations and related formulae

Simple Series and Parallel System

a) Series System

[Figure a: units A, B, C, …, Z connected in series.]

Figure a shows the units A, B, C, …, Z constituting a system. The interpretation can be stated as ‘any unit failing causes the system as a whole to fail’, and the system is referred to as an active series system. Under these conditions, the reliability R(s) of the system is given by

$$R_s = R_a \cdot R_b \cdot R_c \cdots R_z$$

b) Parallel System

[Figure b: units X and Y connected in parallel between input I and output O.]

Figure b shows the units X and Y operating in such a way that the system will survive as long as at least one of the units survives. This type of system is referred to as an active parallel system.

$$R_s = 1 - (1 - R_x)(1 - R_y)$$



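A Series / Parallel System

c) Series / Parallel System

[Figure c: two parallel branches between input I and output O; the first branch is A1, B1, C1, …, Z1 in series and the second is A2, B2, C2, …, Z2 in series.]

When blocks such as X and Y themselves comprise sub-blocks in series, the block diagram takes the form illustrated in figure c.

$$R_x = R_{a1} \cdot R_{b1} \cdot R_{c1} \cdots R_{z1}$$

$$R_y = R_{a2} \cdot R_{b2} \cdot R_{c2} \cdots R_{z2}$$

$$R_s = 1 - (1 - R_x)(1 - R_y)$$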
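The series, parallel and series/parallel formulae can be combined in a small sketch. The helper names and the block reliabilities (0.99, 0.98, 0.97) are hypothetical values chosen for illustration, and the blocks are assumed statistically independent, as the elementary models require.

```python
from math import prod


def series(reliabilities: list[float]) -> float:
    """Series system: every block must survive, so reliabilities multiply."""
    return prod(reliabilities)


def parallel(reliabilities: list[float]) -> float:
    """Active parallel system: fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical sub-block reliabilities for the two branches of a series/parallel system
branch_x = series([0.99, 0.98, 0.97])  # Rx = Ra1 * Rb1 * Rc1
branch_y = series([0.99, 0.98, 0.97])  # Ry = Ra2 * Rb2 * Rc2
system = parallel([branch_x, branch_y])  # Rs = 1 - (1 - Rx)(1 - Ry)

print(branch_x)  # about 0.941
print(system)    # about 0.9965
```

Duplicating the whole branch lifts the system reliability well above that of either branch, which is the trade-off between component quality and configuration described earlier.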



m-out-of-n units

d) m-out-of-n System

[Figure d: three identical units X in parallel between input I and output O, feeding a 2/3 voting block.]

The figure represents instances where system success is assured whenever at least m of n identical units are in an operational state. Here m = 2, n = 3.

$$R_s = R_x^3 + 3 R_x^2 F_x, \quad \text{where } F_x = 1 - R_x$$
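The 2-out-of-3 formula is a special case of the binomial sum over the number of surviving units. The sketch below, with an assumed unit reliability of 0.9 and illustrative names, evaluates the general sum and checks it against the closed form for m = 2, n = 3.

```python
from math import comb


def m_out_of_n(m: int, n: int, unit_reliability: float) -> float:
    """Probability that at least m of n identical, independent units survive."""
    r = unit_reliability
    return sum(comb(n, k) * r ** k * (1 - r) ** (n - k) for k in range(m, n + 1))


r_x = 0.9  # assumed unit reliability, for illustration only
two_of_three = m_out_of_n(2, 3, r_x)
closed_form = r_x ** 3 + 3 * r_x ** 2 * (1 - r_x)

print(two_of_three, closed_form)  # both approximately 0.972
```

Setting m = n gives the series result and m = 1 gives the active parallel result, so the same function covers all three elementary models for identical units.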



6. DFR – Reliability Testing



Reliability Testing - Why?

Reliability Testing allows us to:

• Provide a path to “grow” a product’s reliability by identifying weak points in the design.

• Have confidence that our sample-based prediction will accurately reflect the performance of the entire population.

• Determine if a product’s design is capable of performing its intended function for the desired period of time.

• Identify failures caused by severe applications that exceed the ratings, and recognize opportunities for the product to safely perform under more diverse applications.

• Confirm the product’s performance in the field.



Reliability Testing - Measures

Reliability Testing answers questions like …

• What is my product’s Failure Rate?

• Which distribution does my data follow?

• What is the expected life?

• What does my hazard function look like?

• What failure modes are present?

• How “mature” is my product’s reliability?

These metrics and more can be obtained with the right reliability test



Four Major Categories of Reliability Testing

• Reliability Growth Tests (RGT)

• Reliability Demonstration Tests (RDT)

• Production Reliability Acceptance Tests (PRAT)

• Reliability Validation (RV)

- Normal Testing
- Accelerated Testing



Reliability Testing - Growth Testing

Scope: To determine a product’s physical limitations, functional capabilities and inherent failure mechanisms.

Used early & throughout the design process

• Emphasis is on discovering & “eliminating” failure modes

• Failures are welcome. . . represent data sources

• Failures in development = fewer failures in field

• Used with a changing design to drive reliability growth

• Sample size is typically small

• Test Types: Normal or Accelerated Testing

• Can be very helpful early in process when done on competitor products which are sufficiently similar to the new design.



Reliability Testing … Demonstration Testing

Scope: To demonstrate the product’s ability to fulfill reliability, availability & design requirements under realistic conditions.

Used at end of design stages to demonstrate compliance to specification

• Failures are no longer hoped for, because they jeopardize compliance (though it’s still better to catch a problem before rather than after launch!)

• Management tool . . . provides means for verifying compliance

• Provide reliability measurement, typically performed on a static design (subsequent design changes may invalidate the demonstrated reliability results)

• Sample size is typically larger, due to need for degree of confidence in results and increased availability of samples.



Reliability Testing … Production Reliability Acceptance Testing (PRAT)

Scope: To ensure that variation in materials, parts, & processes related to the move from prototypes to full production does not affect product reliability.

Screens and Audits precipitate and detect hidden defects

• Provides feedback for continuous improvement in sourcing/manufacturing

• Performed during full production, verifies that predictions based on prototype results are valid in full production

• Sample size ranges from full (screen) to partial (audit)

• Test Types: Highly Accelerated Stress Screens/Audits (HASS/A), Environmental Stress Screening (ESS), Burn in



Reliability Testing … Validation

Scope: To ensure that the product is performing reliably in the actual customer environment/application.

Reliability Validation tracks field data on Customer Dashboards

• Provides field feedback on the success of the design

• “Testing results” based on actual field data sources

• Helps to improve future design / redesign & prediction methods

• Requires effective data collection & corrective action process

• Sample size depends on the customer & product type



Reliability Testing … The Path

Reliability Tests are critical at all stages!

[Flowchart: two paths, one for new products and one for legacy products. Each stage is paired with a category of reliability test: Growth Testing, Demonstration Testing, Acceptance Testing or Validation Testing.]

NPI (New Products):
• Initial Design: set reliability goals, develop models, initial design, accelerated testing
• NPI Pilot Readiness: mature design, pilot testing
• Implementation: implement production, reliability demonstration, audit programs
• Post-Sales Service: establish service schedule, keep updated dashboards, ensure data collection, improve future design

Legacy Products:
• Field Data Acquisition: complaint generated, create case (Clarify)
• Verification: reproduce failure, reliability verification
• Product Redesign: revise goals, redefine models, product redesign
• Implementation: implement changes, reliability demonstration, audit programs



7. DFR – Accelerated Testing



Accelerated Testing

Scope: Accelerated testing allows designers to make predictions about the life of a product by developing a model that correlates reliability under accelerated conditions to reliability under normal conditions.

BASIC CONCEPT

Results @ high stress + stress-life relationship = Results @ normal stress

[Figure: time to failure versus stress. We test at the elevated stress level in order to predict at the normal stress level.]

Model: The model is how we extrapolate back to normal stress levels.

Common Models:

• Arrhenius: Thermal

• Inverse Power Law: Non-Thermal

• Eyring: Combined
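To make the extrapolation idea concrete, the sketch below computes acceleration factors (life at normal stress divided by life at elevated stress) using the usual textbook forms of the Arrhenius and inverse power law models. These specific equations are not given in the slides, and the activation energy, temperatures, stress levels and exponent are placeholder assumptions that would have to come from knowledge of the actual failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev: float, use_temp_c: float, stress_temp_c: float) -> float:
    """Arrhenius acceleration factor: exp[(Ea/k) * (1/T_use - 1/T_stress)], T in kelvin."""
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


def inverse_power_acceleration(use_stress: float, test_stress: float, exponent: float) -> float:
    """Inverse power law acceleration factor: (S_test / S_use)^n."""
    return (test_stress / use_stress) ** exponent


# Placeholder inputs: Ea = 0.7 eV, 40 °C use, 85 °C test
af_thermal = arrhenius_acceleration(0.7, 40.0, 85.0)

# Placeholder inputs: stress doubled, exponent n = 3
af_power = inverse_power_acceleration(1.0, 2.0, 3.0)

# Life observed at elevated stress scales up to an estimated life at normal stress
life_at_stress_hours = 500.0  # hypothetical test result
print(af_thermal, af_thermal * life_at_stress_hours)
print(af_power)  # 8.0
```

The estimate is only as good as the model and its parameters, which is why the choice of stress, model and stress levels needs careful planning.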



Accelerated Testing

Applicability of technique depends on careful planning and execution

Key steps in planning an accelerated test:

• Choose a stress to elevate: requires an understanding of the anticipated failure mechanism(s) - must be relevant (temp. & vibration usually apply)

• Determine the accelerating model: requires knowledge of the nature of the acceleration of this failure mechanism, as a function of the accelerating stress.

• Select elevated stress levels: requires a previous study of the product’s operating & destructive limits to ensure that the elevated stress level does not introduce new failure modes which would not occur at normal operating stress levels.



Parametric Reliability Models

One of the most important factors that influence the design process of a product or a system is the reliability values of its components.

In order to estimate the reliability of the individual components or the entire system, we may follow one or more of the following approaches.

➢ Historical Data
➢ Operational Life Testing
➢ Burn-In Testing
➢ Accelerated Life Testing



Approach 1 : Historical Data

The failure data for the components can be found in data banks such as

➢ GIDEP (Government-Industry Data Exchange Program),
➢ MIL-HDBK-217 (which includes failure data for components as well as procedures for reliability prediction),
➢ AT&T Reliability Manual and
➢ Bell Communications Research Reliability Manual.

In such data banks and manuals, the failure data are collected from different manufacturers and presented with a set of multiplying factors that relate to different manufacturers' quality levels and environmental conditions.