Reliability Audit Lab
VEM RAL
DFR – Fundamentals for Engineers
DFR – Design for Reliability
Topics that will be covered:
1. Need for DFR
2. DFR Process
3. Terminology
4. Weibull Plotting
5. System Reliability
6. DFR Testing
7. Accelerated Testing
1. Need for DFR
What Customers Care about:
1. Product Life: useful life before wear-out.
2. Minimum Downtime: maximum MTBF.
3. Endurance: number of operations; robustness to environmental changes.
4. Stable Performance: no degradation in CTQs.
5. On-Time Startup: ease of system startup.
Reliable Product Vision
• Identify & “eliminate” inherent failure modes before launch. (Minimize Excursions!)
• Start with a lower “running rate”, then aggressively “grow” reliability. (Reduce Warranty Costs)
• Reduce overall costs by employing DFR from the beginning.
• Take control of our product quality and aggressively drive to our goals.
[Figure: DFR vs. No DFR over time, in three panels: Failure Mode Identification (Pre-Launch), # Failure Modes vs. Time; Failure Rate vs. Time, with Goal and Release marked; Resources/Costs vs. Time, with Release marked]
2. DFR - Process
NPI Process (phases: Specify, Design, Implement; decision points DP0, DP1, DP2, DP3)
Rel. Goal Setting
• CTQ Identification
• Customer Metrics
• Assess Customer needs
• Establish Reliability goals
• Develop Reliability metrics
System Model
• Construct functional block diagrams
• ID critical comps. & failure potential
• Define Reliability model
• Allocate reliability targets
Design
• Apply robust design tools
• DFSS tools
• Generate life predictions
• Begin Growth Testing
Verification
• Execute Reliability Test strategy
• Continue Growth Testing
• Accelerated Tests
• Demonstration Testing
• Agency / Compliance Testing
Production / Field
• Establish audit program
• FRACAS system using ‘Clarify’
• Correlate field data & test results
• Field data analysis
Legacy Product DFR Process . . .
1. Review Historical Data
• Review historical reliability & field failure data
• Review field RMAs
• Review customer environments & applications
2. Analyze Field & In-house Endurance Test Data
• Develop product Fault Tree Analysis
• Identify and pareto observed failure modes
3. Develop Reliability Profile & Goals
• Develop P-Diagrams & System Block Diagram
• Generate Reliability Weibull plots for operational endurance
• Allocate reliability goals to key subsystems
• Identify reliability gaps between existing product & goals for each subsystem
4. Develop & Execute Reliability Growth Plan
• Determine root cause for all identified failures
• Redesign process or parts to address failure mode pareto
• Validate reliability improvement through accelerated life testing & field betas
5. Institute Reliability Validation Program
• Implement process firewalls & sensors to hold design robustness
• Develop and implement long-term reliability validation audit
Design for Reliability Program Summary
DFR needs to be part of the entire product development cycle.
Keys to DFR:
• Customer reliability expectations & needs must be fully understood
• Reliability must be viewed from a “systems engineering” perspective
• Product must be designed for the intended use environment
• Reliability must be statistically verified (or risk must be accepted)
• Field data collection is imperative (environment, usage, failures)
• Manufacturing & supplier reliability “X’s” must be actively managed
3. DFR - Terminology
What do we mean by:
1. Reliability
2. Failure
3. Failure Rate
4. Hazard Rate
5. MTTF / MTBF
1. Reliability R(t): The probability that an item will perform its intended function without failure under stated conditions for a specified period of time.
2. Failure: The termination of the ability of the product to perform its intended function.
3. Failure Rate (λ): The ratio of the number of failures within a sample to the cumulative operating time.
4. Hazard Rate [h(t)]: The instantaneous rate of failure of an item at time t, given that it has survived until that time; sometimes called the instantaneous failure rate.
Failure Rate Calculation Example
EXAMPLE: A sample of 1000 meters is tested for a week, and two of them fail (assume they fail at the end of the week). What is the Failure Rate?
$$\text{Failure Rate} = \frac{2\ \text{failures}}{1000 \times 7 \times 24\ \text{hours}} = \frac{2}{168{,}000}\ \text{failures/hour} = 1.19 \times 10^{-5}\ \text{failures/hr}$$
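The same arithmetic can be written as a small Python sketch. The function name `failure_rate` is hypothetical. As in the example, it assumes that each failed unit accumulated the full test time before failing, so cumulative operating time is simply units × hours.

```python
def failure_rate(failures: int, cumulative_operating_hours: float) -> float:
    """Failures per hour: number of failures divided by cumulative operating time."""
    if cumulative_operating_hours <= 0:
        raise ValueError("cumulative operating time must be positive")
    return failures / cumulative_operating_hours


# 1000 meters on test for one week (7 days x 24 h). Both failures occur at the
# end of the week, so every unit contributes the full 168 hours.
units = 1000
hours_per_unit = 7 * 24
rate = failure_rate(2, units * hours_per_unit)
print(f"{rate:.3g} failures/hour")  # 1.19e-05 failures/hour
```

If failures occurred part-way through the test, only the hours each unit actually ran would be summed into the denominator.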
Probability Density Function (PDF):
The Probability Density Function (PDF) f(t) describes the distribution of times to failure. For a continuous distribution, f(t) is not itself a probability: f(t)·Δt approximates the probability that the product fails in the short interval between t and t + Δt.
[Figure: Probability density function f(t) versus time]
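Common Distributions
Probability Distribution | Probability Density Function, f(t) | Variate Range, t
Exponential | $f(t) = \lambda e^{-\lambda t}$ | $0 \le t < \infty$
Weibull | $f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-(t/\eta)^{\beta}}$ | $0 \le t < \infty$
Normal | $f(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}$ | $-\infty < t < \infty$
LogNormal | $f(t) = \frac{1}{\sigma t\sqrt{2\pi}}\, e^{-\frac{(\ln t-\mu)^2}{2\sigma^2}}$ | $0 < t < \infty$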
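The four densities in the Common Distributions table can be sketched in Python. A quick numerical check shows that each one integrates to 1 over its range. All parameter values are hypothetical, and the midpoint-rule `integrate` helper is only an illustration, not a production integrator.

```python
import math


def exponential_pdf(t, lam):
    return lam * math.exp(-lam * t)


def weibull_pdf(t, beta, eta):
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def normal_pdf(t, mu, sigma):
    return math.exp(-((t - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def lognormal_pdf(t, mu, sigma):
    # Here mu and sigma are the mean and standard deviation of ln(t), not of t.
    return math.exp(-((math.log(t) - mu) ** 2) / (2 * sigma**2)) / (sigma * t * math.sqrt(2 * math.pi))


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Hypothetical parameters; the ranges are wide enough that the tails are negligible.
areas = {
    "Exponential": integrate(lambda t: exponential_pdf(t, 1e-3), 0, 20_000),
    "Weibull": integrate(lambda t: weibull_pdf(t, 2.0, 1000.0), 0, 5_000),
    "Normal": integrate(lambda t: normal_pdf(t, 5.0, 1.0), -5, 15),
    "LogNormal": integrate(lambda t: lognormal_pdf(t, 1.0, 0.5), 0, 500),
}
for name, area in areas.items():
    print(f"{name:<12} area = {area:.6f}")
```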
Cumulative Distribution Function (CDF):
The Cumulative Distribution Function (CDF) represents the probability that the product fails at some time prior to t. It is the integral of the PDF evaluated from 0 to t:
$$\text{CDF} = F(t) = \int_0^t f(\tau)\,d\tau$$
[Figure: Probability density function f(t) versus time, with time t1 marked, illustrating the Cumulative Distribution Function]
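The integral definition of the CDF can be checked numerically. The sketch below integrates an exponential PDF from 0 to t with the trapezoidal rule and compares the result with the closed form 1 − e^(−λt). It also takes the complement 1 − F(t). The failure rate and time are hypothetical, and `cdf_from_pdf` is an illustrative helper name.

```python
import math


def exponential_pdf(t, lam):
    return lam * math.exp(-lam * t)


def cdf_from_pdf(pdf, t, n=10_000):
    """F(t) = integral of pdf from 0 to t, approximated by the trapezoidal rule."""
    if t <= 0:
        return 0.0
    h = t / n
    total = 0.5 * (pdf(0.0) + pdf(t)) + sum(pdf(i * h) for i in range(1, n))
    return total * h


lam = 2e-4     # hypothetical constant failure rate, failures/hour
t1 = 5_000.0   # hours
F = cdf_from_pdf(lambda t: exponential_pdf(t, lam), t1)
R = 1.0 - F    # probability of surviving past t1
print(F, 1.0 - math.exp(-lam * t1))  # numerical vs. closed-form CDF
```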
Reliability Function R(t)
The reliability of a product is the probability that it does not fail before time t. It is therefore the complement of the CDF:
$$R(t) = 1 - F(t) = 1 - \int_0^t f(\tau)\,d\tau \qquad \text{or} \qquad R(t) = \int_t^{\infty} f(\tau)\,d\tau$$
[Figure: Probability density function f(t) versus time; the region beyond t is labeled R(t) = 1 − F(t)]
Typical characteristics:
• when t = 0, R(t) = 1
• as t → ∞, R(t) → 0
Hazard Function h(t)
The hazard function is defined as the limit of the failure rate as Δt approaches zero. In other words, the hazard function, or instantaneous failure rate, is obtained as
$$h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t+\Delta t)}{\Delta t \, R(t)}$$
The hazard rate h(t) is the conditional failure rate in the interval t to (t + Δt), given that there was no failure up to time t. For small Δt, h(t)·Δt approximates the conditional probability of failure in that interval. Because f(t) = −dR(t)/dt, the limit above can be expressed as
$$h(t) = \frac{f(t)}{R(t)}$$
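The two expressions for h(t) can be compared numerically. The sketch below evaluates the limit definition with a small finite Δt for a Weibull reliability function and compares it with f(t)/R(t). The Weibull parameters, the time point, and the step `dt` are hypothetical choices for illustration.

```python
import math


def weibull_reliability(t, beta, eta):
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t, beta, eta):
    return (beta / eta) * (t / eta) ** (beta - 1) * weibull_reliability(t, beta, eta)


def hazard_limit(reliability, t, dt=1e-4):
    """Finite-difference version of h(t) = [R(t) - R(t + dt)] / (dt * R(t))."""
    return (reliability(t) - reliability(t + dt)) / (dt * reliability(t))


beta, eta, t = 2.0, 1000.0, 500.0  # hypothetical Weibull parameters and time
approx = hazard_limit(lambda x: weibull_reliability(x, beta, eta), t)
exact = weibull_pdf(t, beta, eta) / weibull_reliability(t, beta, eta)
print(approx, exact)
```

Shrinking `dt` further eventually loses accuracy to floating-point cancellation in R(t) − R(t + dt), which is why f(t)/R(t) is the practical form.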
Hazard Functions
As shown, the hazard rate is a function of time.
What type of function does the hazard rate exhibit with time?
The general answer is the bathtub-shaped function.
The sample will experience a high failure rate at the beginning of the operation time due to weak or substandard components, manufacturing imperfections, design errors and installation defects. This period of decreasing failure rate is referred to as the “infant mortality region”.
This region is undesirable from both the manufacturer's and the consumer's viewpoints, as it causes unnecessary repair costs for the manufacturer and interrupts product usage for the consumer.
Early failures can be minimized by burning in systems or components before shipment, by improving the manufacturing process and by improving the quality control of the products.
At the end of the early failure-rate region, the failure rate eventually reaches a constant value. During this constant failure-rate region, failures do not follow a predictable pattern but occur at random, due to changes in the applied load.
The randomness of material flaws or manufacturing flaws will also lead to failures during the constant failure-rate region.
The third and final region of the failure-rate curve is the wear-out region. The beginning of the wear-out region is noticed when the failure rate starts to rise significantly above the constant failure-rate value. At that point, failures are no longer attributable to randomness but to the age and wear of the components.
To minimize the effect of the wear-out region, one must use periodic preventive maintenance or consider replacement of the product.
[Figure: Product's Hazard Rate vs. Time, “The Bathtub Curve”. Hazard Rate h(t) versus Time in three regions: Infant Mortality (Manufacturing Defects, h(t) decreasing); Random Failure, Useful Life (Random Failures, h(t) constant); Wear out (Wear-out Failures, h(t) increasing)]
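One common way to sketch a bathtub curve is to add three hazard terms, each dominating one region: a Weibull term with β < 1 for infant mortality, a constant term for the useful life, and a Weibull term with β > 1 for wear-out. This additive model and every parameter value below are assumptions for illustration. They do not describe any particular product.

```python
def weibull_hazard(t, beta, eta):
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t):
    """Hypothetical bathtub: infant-mortality + constant + wear-out hazards added together."""
    infant = weibull_hazard(t, beta=0.5, eta=20_000.0)    # decreasing term (beta < 1)
    constant = 1e-4                                       # random failures, useful life
    wear_out = weibull_hazard(t, beta=5.0, eta=10_000.0)  # increasing term (beta > 1)
    return infant + constant + wear_out


for t in (10, 100, 1_000, 3_000, 8_000, 12_000):
    print(f"t = {t:>6} h   h(t) = {bathtub_hazard(t):.2e} per hour")
```

Printed over time, h(t) falls, flattens near the constant term, and then rises, tracing the three regions of the curve.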
Mean Time To Failure [MTTF]
One of the measures of a system's reliability is the mean time to failure (MTTF). It should not be confused with the mean time between failures (MTBF). For a non-repairable system, the expected time to failure is called the MTTF.
When the system is repairable, the expected time between two successive failures is called the MTBF.
Now let us consider n identical non-repairable systems and observe the time to failure for each of them. Assume that the observed times to failure are t1, t2, ..., tn. The estimated mean time to failure is
$$\text{MTTF} = \frac{1}{n}\sum_{i=1}^{n} t_i$$
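The MTTF estimate is a plain sample mean, sketched here in Python. The function name and the five failure times are hypothetical. The estimator assumes every unit was run to failure; with units still running (censored data), this simple average does not apply.

```python
def mttf(times_to_failure):
    """Estimated MTTF = (1/n) * sum(t_i) for n non-repairable units run to failure."""
    times = list(times_to_failure)
    if not times:
        raise ValueError("need at least one observed time to failure")
    return sum(times) / len(times)


# Hypothetical observed times to failure (hours) for five identical units
observed = [1200.0, 1850.0, 2300.0, 2950.0, 3700.0]
print(mttf(observed))  # 2400.0
```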
Useful Life Metrics: Mean Time Between Failures (MTBF)
Mean Time Between Failures [MTBF]: For a repairable item, the ratio of the cumulative operating time to the number of failures for that item.
EXAMPLE: A motor is repaired and returned to service six times during its life and provides 45,000 hours of service. Calculate MTBF.
$$\text{MTBF} = \frac{\text{Total operating time}}{\text{\# of failures}} = \frac{45{,}000}{6} = 7{,}500\ \text{hours}$$
MTBF or MTTF is a widely used metric during the Useful Life period, when the hazard rate is constant (also Mean Cycles Between Failures, MCBF, etc.).
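The motor example maps directly onto the MTBF definition for a repairable item. In this minimal sketch the function name is hypothetical, and each repair is counted as one failure, as the example implies.

```python
def mtbf(total_operating_hours, number_of_failures):
    """MTBF for a repairable item = cumulative operating time / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("MTBF is undefined with zero failures")
    return total_operating_hours / number_of_failures


# Motor example: 45,000 hours of service with six failures (repairs)
print(mtbf(45_000, 6))  # 7500.0
```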
The Exponential Distribution
If the hazard rate is constant over time, then the product follows the exponential distribution. This is often used for electronic components.
$$h(t) = \lambda = \text{constant}, \qquad \text{MTBF (mean time between failures)} = \frac{1}{\lambda}$$
$$f(t) = \lambda e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t}, \qquad R(t) = e^{-\lambda t}$$
At MTBF:
$$R(t) = e^{-\lambda t} = e^{-\lambda \cdot \frac{1}{\lambda}} = e^{-1} = 36.8\%$$
Appropriate tool if failure rate is known to be constant.
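The exponential formulas can be sketched in Python. The sketch confirms that only about 36.8% of units are expected to survive to the MTBF. The MTBF value reuses the 7,500-hour motor figure purely as an illustration, and the function names are hypothetical.

```python
import math


def exp_pdf(t, lam):
    return lam * math.exp(-lam * t)


def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)


def exp_reliability(t, lam):
    return math.exp(-lam * t)


mtbf = 7_500.0     # hours (illustrative value)
lam = 1.0 / mtbf   # constant hazard rate, failures/hour
print(f"R(MTBF) = {exp_reliability(mtbf, lam):.3f}")  # R(MTBF) = 0.368
```

The result does not depend on the MTBF chosen: R(MTBF) = e^(−1) for any constant λ. This is why a product with a given MTBF is more likely than not to fail before reaching it.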
[Figure: The Exponential Distribution. PDF f(t) versus Time to Failure and CDF F(t) versus Time, each for λ = 0.0001, 0.0002 and 0.0003; time axis from 0 to 5×10^4]
Useful Life Metrics: Reliability
A mathematical model for reliability during Useful Life
Reliability can be described by the single-parameter exponential distribution when the Hazard Rate, λ, is constant (i.e., the “Useful Life” portion of the bathtub curve):
$$R = e^{-t/\text{MTBF}} = e^{-\text{FR}\cdot t}$$
Where: t = Mission length (uptime or cycles in question); FR = failure rate, λ = 1/MTBF.
EXAMPLE: If the MTBF for a motor is 7,500 hours, the probability of operating for 30 days without failure is ...
$$R = e^{-\frac{30 \times 24\ \text{hours}}{7500\ \text{hours}}} = 0.908 = 90.8\%$$
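The 30-day motor example can be reproduced in a few lines of Python. The function name is hypothetical. The formula is valid only while the hazard rate stays constant, so it says nothing about reliability once the motor enters wear-out.

```python
import math


def mission_reliability(mission_length, mtbf):
    """R = exp(-t / MTBF); assumes a constant hazard rate (useful-life region)."""
    return math.exp(-mission_length / mtbf)


r = mission_reliability(30 * 24, 7_500)  # 30-day mission, in hours
print(f"{r:.3f}")                         # 0.908
```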
4. DFR – Weibull Plotting
Weibull Probability Distribution
• Originally proposed by the Swedish engineer Waloddi Weibull in the early 1950s
• Used to statistically represent fatigue failures
• Weibull probability density function (PDF, distribution of values):
$$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-(t/\eta)^{\beta}}$$
where:
t = Mission length (time, cycles, etc.)
β = Weibull Shape Parameter, “Slope”
η = Weibull Scale Parameter, “Characteristic Life”
Equation valid for minimum life = 0.
[Photo: Waloddi Weibull, 1887-1979]
The Weibull Distribution
This powerful and versatile reliability function is capable of modeling most real-life systems because the time dependency of the failure rate can be adjusted.
$$R(t) = 1 - F(t) = e^{-(t/\eta)^{\beta}}$$
$$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-(t/\eta)^{\beta}}$$
$$h(t) = \frac{\beta}{\eta^{\beta}}\, t^{\beta-1}$$
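The three Weibull functions can be sketched directly from the formulas above. The sketch uses η = 1000, the value in this section's plots, together with the plotted β values. The test checks that h(t) = f(t)/R(t), that R(η) = e^(−1) for any β, and that β = 1 reduces to the exponential case. Function names are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t, beta, eta):
    """f(t) = beta * t**(beta-1) / eta**beta * exp(-(t/eta)**beta)."""
    return beta * t ** (beta - 1) / eta**beta * weibull_reliability(t, beta, eta)


def weibull_hazard(t, beta, eta):
    """h(t) = beta / eta**beta * t**(beta-1)."""
    return beta / eta**beta * t ** (beta - 1)


eta = 1000.0  # characteristic life used in the plots
for beta in (0.5, 1.0, 3.44):
    for t in (100.0, 1000.0, 2000.0):
        print(f"beta={beta:<4} t={t:>6.0f}  R={weibull_reliability(t, beta, eta):.3f}  "
              f"h={weibull_hazard(t, beta, eta):.2e}")
```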
Weibull PDF
• Exponential when β = 1.0
• Approximately normal when β = 3.44
• Time-dependent hazard rate
[Figure: Weibull PDF f(t) versus time (0 to 2000) for η = 1000 and β = 0.5, 1.0 and 3.44]
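Weibull Hazard Function
$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)} = \frac{\frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]}{1 - \left\{1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]\right\}} = \frac{\beta}{\eta^{\beta}}\, t^{\beta-1}$$
[Figure: Weibull Hazard Function h(t) versus Time (0 to 2500) for η = 1000 and β = 0.5, 1.0 and 3.44]
β < 1: Highest failure rate early (“Infant Mortality”)
β > 1: Highest failure rate later (“Wear-Out”)
β = 1: Constant failure rate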
Weibull Reliability Function
[Figure: Weibull reliability R(t) versus Time (0 to 2500) for η = 1000 and β = 0.5, 1.0 and 3.44]
$$R(t) = 1 - F(t) = e^{-(t/\eta)^{\beta}}$$
Reliability is the probability that the part survives to time t.
Summary of Useful Definitions - Weibull Analysis
Beta (β): The slope of the Weibull CDF when plotted on Weibull paper.
B-life: A common way to express values of the cumulative distribution function. B10 refers to the time at which 10% of the parts are expected to have failed.
CDF: The Cumulative Distribution Function expresses the time-dependent probability that a failure occurs at some time before time t.
Eta (η): The characteristic life, or the time at which 63.2% of the parts are expected to have failed; also expressed as the B63.2 life. On Weibull paper, it is the time at which the CDF line crosses the 63.2% level (y = 0).
PDF: The Probability Density Function expresses the expected distribution of failures over time.
Weibull plot: A plot where the x-axis is scaled as ln(time) and the y-axis is scaled as ln(ln(1/(1 − CDF(t)))). Because ln(ln(1/(1 − F(t)))) = β ln t − β ln η, the Weibull CDF plotted on Weibull paper is a straight line of slope β. It crosses y = 0 at t = η, and its y-intercept (at ln t = 0) is −β ln η.
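Two of these definitions can be sketched in Python. Solving F(t) = p for t gives the B-life, t = η(−ln(1 − p))^(1/β). Mapping a time to Weibull-paper coordinates shows that the points lie on a line of slope β crossing y = 0 at t = η. The parameters are hypothetical, and `b_life` and `weibull_plot_coords` are illustrative names.

```python
import math


def b_life(p, beta, eta):
    """Time by which a fraction p of parts is expected to fail, i.e. F(t) = p."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


def weibull_plot_coords(t, beta, eta):
    """(x, y) on Weibull paper: x = ln t, y = ln(ln(1 / (1 - F(t))))."""
    F = 1.0 - math.exp(-((t / eta) ** beta))
    return math.log(t), math.log(math.log(1.0 / (1.0 - F)))


beta, eta = 2.0, 1000.0  # hypothetical parameters
print(f"B10 = {b_life(0.10, beta, eta):.0f} h")  # B10 = 325 h
x1, y1 = weibull_plot_coords(300.0, beta, eta)
x2, y2 = weibull_plot_coords(1500.0, beta, eta)
print(f"slope on Weibull paper = {(y2 - y1) / (x2 - x1):.3f}")  # equals beta
```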
Weibull Analysis
What is a Weibull Plot?
• Log-log plot of probability of failure versus age for a product or component
• Easily generated, easily interpreted graphical read-out
• Nominal “best-fit” line, plus confidence intervals
• Comparison: test results for a redesigned product can be plotted against the original product or against goals
[Figure: example Weibull plot showing Observed Failures, the Weibull Best Fit line and the Confidence on Fit bounds]
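One common way to draw the “best-fit” line is median rank regression. Failure times are sorted, and each is given Bernard's approximate median rank F ≈ (i − 0.3)/(n + 0.4). A straight line is then fitted on Weibull-paper coordinates, giving β as the slope and η from the intercept. This is a sketch of that standard technique, not necessarily the method used by any particular Weibull software. It handles only complete (uncensored) data and does not compute the confidence bounds. The failure times in the demo are hypothetical.

```python
import math


def bernard_median_rank(i, n):
    """Bernard's approximation of the median rank of the i-th of n ordered failures."""
    return (i - 0.3) / (n + 0.4)


def fit_weibull_mrr(failure_times):
    """Least-squares fit of y = beta*x - beta*ln(eta) on Weibull-paper coordinates."""
    times = sorted(failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - bernard_median_rank(i, n))) for i in range(1, n + 1)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    intercept = y_mean - beta * x_mean  # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


failure_times = [310.0, 520.0, 640.0, 790.0, 920.0, 1100.0, 1350.0]  # hypothetical hours
beta_hat, eta_hat = fit_weibull_mrr(failure_times)
print(f"beta = {beta_hat:.2f}, eta = {eta_hat:.0f} h")
```

Regressing y on x, as here, is one of two common conventions; some tools regress x on y instead, which gives slightly different estimates for the same data.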
Scale and Shape are the Key Weibull Parameters
Weibull Shape Parameter (β) and Scale Parameter (η) Defined
η is called the CHARACTERISTIC LIFE. For the Weibull distribution, the characteristic life is equal to the scale parameter, η. This is the time at which 63.2% of the product will have failed.
β is called the SLOPE. For the Weibull distribution, the slope describes the steepness of the Weibull best-fit line (see following slides for more details). β also has a relationship with the trend of the hazard rate, as shown on the “bathtub curves” on a subsequent slide.
β and the Bathtub Curve
β < 1
• Implies “infant mortality”
• If this occurs: failed products “not to print”; manufacturing or assembly defects; burn-in can be helpful
• If a component survives the infant mortality phase, its likelihood of failure decreases with age.
β = 1
• Implies failures are “random”, individually unpredictable
• An old part is as good as a new part (burn-in not appropriate)
• If this occurs: failures due to external stress, maintenance or human errors; possible mixture of failure modes
1 < β < 4
• Implies mild wear-out
• If this occurs: low cycle fatigue; corrosion or erosion; scheduled replacement may be cost effective
β > 4
• Implies rapid wear-out
• If this occurs, suspect: material properties; brittle materials like ceramics
• Not a bad thing if it happens after mission life has been exceeded.
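The β ranges above can be encoded as a simple lookup, for example to label fitted results in a report. A fitted β is never exactly 1.0, so this sketch treats values within a hypothetical tolerance band `tol` around 1 as “random”. The band width and the handling of β = 4 (grouped with mild wear-out) are assumptions, not part of the slide.

```python
def interpret_beta(beta, tol=0.05):
    """Map a fitted Weibull slope to the bathtub-curve regions listed above.

    tol is a hypothetical band around 1.0 treated as "random failures".
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if abs(beta - 1.0) <= tol:
        return "random failures"
    if beta < 1.0:
        return "infant mortality"
    if beta <= 4.0:
        return "mild wear-out"
    return "rapid wear-out"


for b in (0.5, 1.02, 2.5, 6.0):
    print(b, interpret_beta(b))
```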
5. DFR – System Reliability
System Reliability Evaluation
A system (or a product) is a collection of components arranged according to a specific design in order to achieve desired functions with acceptable performance and reliability measures.
Clearly, the type of components used, their quality, and the design configuration in which they are arranged have a direct effect on the system performance and its reliability. For example, a designer may use a smaller number of high-quality components and configure them in such a way as to result in a highly reliable system, or a designer may use a larger number of lower-quality components and configure them differently in order to achieve the same level of reliability.
Once the system is configured, its reliability must be evaluated and compared with an acceptable reliability level. If it does not meet the required level, the system should be redesigned and its reliability should be re-evaluated.
Reliability Block Diagram (RBD) Technique
The first step in evaluating a system's reliability is to construct a reliability block diagram, which is a graphical representation of the components of the system and how they are connected. The purpose of the RBD technique is to represent failure and success criteria pictorially and to use the resulting diagram to evaluate System Reliability.
Benefits
The pictorial representation means that models are easily understood and therefore readily checked. Block diagrams are used to identify the relationship between elements in the system. The overall system reliability can then be calculated from the reliabilities of the blocks using the laws of probability.
Block diagrams can be used for the evaluation of system availability provided that both the repair of blocks and failures are independent events, i.e., provided the time taken to repair a block is dependent only on the block concerned and is independent of repair to any other block.
Elementary models
Before beginning the model construction, consideration should be given to the best way of dividing the system into blocks. It is particularly important that each block should be statistically independent of all other blocks (i.e., no unit or component should be common to a number of blocks).
The most elementary models are the following:
• Series
• Active parallel
• m-out-of-n
• Standby models
Typical RBD configurations and related formulae
Simple Series and Parallel System
[Figure a) Series System: units A, B, C, …, Z connected in series]
Figure a shows the units A, B, C, …, Z constituting a system. The interpretation can be stated as ‘any unit failing causes the system as a whole to fail’, and the system is referred to as an active series system. Under these conditions, the reliability R(s) of the system is given by
$$R(s) = R_a \cdot R_b \cdot R_c \cdots R_z$$
[Figure b) Parallel System: units X and Y connected in parallel between input I and output O]
Figure b shows the units X and Y operating in such a way that the system will survive as long as at least one of the units survives. This type of system is referred to as an active parallel system.
$$R(s) = 1 - (1 - R_x)(1 - R_y)$$
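The series and active-parallel formulas generalize to any number of blocks. The sketch below multiplies block reliabilities for a series system and multiplies block unreliabilities for a parallel system. Like the formulas above, it assumes the blocks are statistically independent. The helper names and block reliabilities are hypothetical.

```python
import math


def series(*reliabilities):
    """Active series: every block must survive, R_s = R_a * R_b * ... * R_z."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """Active parallel: system survives if at least one block survives."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


print(f"{series(0.95, 0.98, 0.99):.4f}")  # 0.9217
print(f"{parallel(0.9, 0.9):.4f}")        # 0.9900
```

A series system is always less reliable than its weakest block, while a parallel system is more reliable than its best one. This is the trade-off behind the component-quality versus configuration choice described earlier.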
A Series / Parallel System
[Figure c) Series / Parallel System: path A1, B1, C1, …, Z1 in parallel with path A2, B2, C2, …, Z2, between input I and output O]
When blocks such as X and Y themselves comprise sub-blocks in series, the block diagram takes the form illustrated in figure c:
$$R_x = R_{a1} \cdot R_{b1} \cdot R_{c1} \cdots R_{z1}$$
$$R_y = R_{a2} \cdot R_{b2} \cdot R_{c2} \cdots R_{z2}$$
$$R_s = 1 - (1 - R_x)(1 - R_y)$$
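The series/parallel case composes the two elementary rules: each path is reduced to one series reliability, then the paths are combined in parallel. The four block values per path are hypothetical stand-ins for A1 to Z1 and A2 to Z2.

```python
import math


def series(reliabilities):
    return math.prod(reliabilities)


def parallel(reliabilities):
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical block reliabilities for the two series paths
path_1 = [0.95, 0.97, 0.99, 0.96]  # A1, B1, C1, ..., Z1
path_2 = [0.95, 0.97, 0.99, 0.96]  # A2, B2, C2, ..., Z2
r_x = series(path_1)
r_y = series(path_2)
r_s = parallel([r_x, r_y])
print(f"Rx = {r_x:.4f}, Ry = {r_y:.4f}, Rs = {r_s:.4f}")
```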
m-out-of-n units
The figure represents instances where system success is assured whenever at least m of n identical units are in an operational state. Here m = 2, n = 3.
$$R_s = R_x^3 + 3R_x^2 F_x, \qquad \text{where } F_x = 1 - R_x$$
[Figure d) m-out-of-n System: three identical units X in a 2/3 arrangement between input I and output O]
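The 2-out-of-3 formula is one case of a binomial sum. The system succeeds if m, m+1, …, or n of the identical, independent units survive. The sketch below computes that sum with `math.comb`, which is available from Python 3.8. The unit reliability of 0.9 is hypothetical.

```python
from math import comb


def m_out_of_n(m, n, r):
    """Reliability of an m-out-of-n system of identical, independent units with reliability r."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(m, n + 1))


r = 0.9
print(f"{m_out_of_n(2, 3, r):.3f}")  # 0.972
```

With m = 1 the sum reduces to the active-parallel formula, and with m = n it reduces to the series formula.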
6. DFR – Reliability Testing
Reliability Testing - Why?
Reliability Testing allows us to:
• Provide a path to “grow” a product’s reliability by identifying weak points in the design.
• Have confidence that our sample-based prediction will accurately reflect the performance of the entire population.
• Determine if a product’s design is capable of performing its intended function for the desired period of time.
• Identify failures caused by severe applications that exceed the ratings, and recognize opportunities for the product to safely perform under more diverse applications.
• Confirm the product’s performance in the field.
Reliability Testing - Measures
Reliability Testing answers questions like …
• What is my product’s Failure Rate?
• Which distribution does my data follow?
• What is the expected life?
• What does my hazard function look like?
• What failure modes are present?
• How “mature” is my product’s reliability?
These metrics and more can be obtained with the right reliability test.
Four Major Categories of Reliability Testing
• Reliability Growth Tests (RGT)
• Reliability Demonstration Tests (RDT)
• Production Reliability Acceptance Tests (PRAT)
• Reliability Validation (RV)
- Normal Testing
- Accelerated Testing
Reliability Testing - Growth Testing
Scope: To determine a product’s physical limitations, functional capabilities and inherent failure mechanisms.
Used early & throughout the design process
• Emphasis is on discovering & “eliminating” failure modes
• Failures are welcome . . . they represent data sources
• Failures in development = fewer failures in the field
• Used with a changing design to drive reliability growth
• Sample size is typically small
• Test Types: Normal or Accelerated Testing
• Can be very helpful early in the process when done on competitor products that are sufficiently similar to the new design.
Reliability Testing … Demonstration Testing
Scope: To demonstrate the product’s ability to fulfill reliability, availability & design requirements under realistic conditions.
Used at the end of design stages to demonstrate compliance to specification
• Failures are no longer hoped for, because they jeopardize compliance (though it’s still better to catch a problem before rather than after launch!)
• Management tool . . . provides means for verifying compliance
• Provides reliability measurement, typically performed on a static design (subsequent design changes may invalidate the demonstrated reliability results)
• Sample size is typically larger, due to the need for a degree of confidence in results and the increased availability of samples.
Reliability Testing … Production Reliability Acceptance Testing (PRAT)
Scope: To ensure that variation in materials, parts, & processes related to the move from prototypes to full production does not affect product reliability
Screens and Audits precipitate and detect hidden defects
• Provides feedback for continuous improvement in sourcing/manufacturing
• Performed during full production, verifies that predictions based on prototype results are valid in full production
• Sample size ranges from full (screen) to partial (audit)
• Test Types: Highly Accelerated Stress Screens/Audits (HASS/A), Environmental Stress Screening (ESS), Burn-in
Reliability Testing … Validation
Scope: To ensure that the product is performing reliably in the actual customer environment/application.
Reliability Validation tracks field data on Customer Dashboards
• Provides field feedback on the success of the design
• “Testing results” based on actual field data sources
• Helps to improve future design / redesign & prediction methods
• Requires effective data collection & corrective action process
• Sample size depends on the customer & product type
Reliability Testing … The Path
Reliability Tests are critical at all stages!
[Figure: reliability testing paths for Legacy Products and NPI (New Products). Stages shown: Initial Design (set reliability goals, develop models, accelerated testing); Growth, Demonstration, Acceptance and Validation Testing; NPI Pilot Readiness (mature design) and Pilot Testing; Implementation (implement production or changes, reliability demonstration, audit programs); Post-Sales Service (establish service schedule, keep updated dashboards, ensure data collection, improve future design); Field Data Acquisition (complaint generated, create case, Clarify); Verification (reproduce failure, reliability verification); Product Redesign (revise goals, redefine models, product redesign)]
7. DFR – Accelerated Testing
Accelerated Testing
Scope: Accelerated testing allows designers to make predictions about the life of a product by developing a model that correlates reliability under accelerated conditions to reliability under normal conditions.
BASIC CONCEPT
Results @ high stress + stress-life relationship = Results @ normal stress
[Figure: Time to Failure versus Stress; we test here (elevated stress level) to predict here (normal stress level)]
Model: The model is how we extrapolate back to normal stress levels.
Common Models:
• Arrhenius: Thermal
• Inverse Power Law: Non-Thermal
• Eyring: Combined
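Two of the listed models can be sketched as acceleration factors (AF): the ratio of life at normal stress to life at elevated stress. The sketch uses the standard forms. For Arrhenius, AF = exp[(Ea/k)(1/T_use − 1/T_stress)], with absolute temperatures and Boltzmann's constant k in eV/K. For the inverse power law, AF = (S_stress/S_use)^n. The activation energy, temperatures, stress ratio and exponent n are hypothetical. In practice they come from knowledge of the specific failure mechanism, and the Eyring model is not shown.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between use and stress temperatures given in deg C."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


def inverse_power_law_af(stress_use, stress_test, n):
    """Inverse power law: life ~ stress**(-n), so AF = (S_test / S_use)**n."""
    return (stress_test / stress_use) ** n


# Hypothetical: Ea = 0.7 eV, use at 40 C, test at 85 C
af = arrhenius_af(ea_ev=0.7, t_use_c=40.0, t_stress_c=85.0)
hours_at_stress = 1_000.0
print(f"AF = {af:.1f}; {hours_at_stress:.0f} h at 85 C ~ {af * hours_at_stress:.0f} h at 40 C")
```

The extrapolation is only as good as the model. If the elevated stress introduces a new failure mode, the calculated AF no longer applies.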
Accelerated Testing
Applicability of technique depends on careful planning and execution
Key steps in planning an accelerated test:
• Choose a stress to elevate: requires an understanding of the anticipated failure mechanism(s) - must be relevant (temp. & vibration usually apply)
• Determine the accelerating model: requires knowledge of the nature of the acceleration of this failure mechanism, as a function of the accelerating stress.
• Select elevated stress levels: requires a previous study of the product’s operating & destructive limits to ensure that the elevated stress level does not introduce new failure modes which would not occur at normal operating stress levels.
Parametric Reliability Models
One of the most important factors that influence the design process of a product or a system is the reliability values of its components.
In order to estimate the reliability of the individual components or the entire system, we may follow one or more of the following approaches.
➢ Historical Data
➢ Operational Life Testing
➢ Burn-In Testing
➢ Accelerated Life Testing
Approach 1 : Historical Data
The failure data for the components can be found in data banks such as
➢ GIDEP (Government-Industry Data Exchange Program),
➢ MIL-HDBK-217 (which includes failure data for components as well as procedures for reliability prediction),
➢ AT&T Reliability Manual and
➢ Bell Communications Research Reliability Manual.
In such data banks and manuals, the failure data are collected from different manufacturers and presented with a set of multiplying factors that relate to different manufacturers' quality levels and environmental conditions.