Complex Conjugate History of Reliability
Larry George
©2011 ASQ & Presentation Larry George
Presented live on Jan 13th, 2011
http://reliabilitycalendar.org/The_Reliability_Calendar/Webinars_‐_English/Webinars_‐_English.html
ASQ Reliability Division English Webinar Series
One of the monthly webinars on topics of interest to reliability engineers.
To view the recorded webinar (available to ASQ Reliability Division members only), visit asq.org/reliability.
To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events.
1/10/2011 Problem Solving Tools 1
Complex Conjugate History of Reliability
• SORD*SOTA = Real Reliability
  – SORD = Significant Other Reliability Developments
  – SOTA = State Of The (reliability) Art
• Why?
  – Profit, save our jobs, and protect privacy
  – Do something about reliability, risk, and uncertainty!
• What’s in the future? What’s needed?
What SORDs?
• Nonparametric reliability and failure rate functions for:
  – Grouped, left-and-right-censored, and truncated data
  – Renewal and repairable processes
• Without life data
• Uncertainty: brooms, jackknives and bootstraps, extrapolations, scenarios,…
“Risk is present when future events occur with measurable probability. Uncertainty is present when the likelihood of future events is indefinite or incalculable.” Frank Knight
Examples
• Component D (Weibull vs. nonparametric)
• M88A1 drivetrain parts (Renewal process)
• LED L70 reliability (Black-Scholes)
• Pleasanton O-D matrix and travel times (multivariate, network tomography)
ANCIENT HISTORY
• Discrete failure rate functions, aka actuarial rates ~220 AD
  – Domitius Ulpianus: Roman Legion pension planning, life table
  – John Graunt 1600s life tables
  – Edmond Halley ca 1693 annuities
• Insurance
  – James Dodson, Equitable Life, casualty (1762)
  – Gompertz' Curve (1825) death rate is
    • $a(t) = a e^{bt} + \lambda$ from a double exponential cdf (Weibull)
Gambling and Physics
• Gambling: Pascal, Laplace, Bernoullis, John Kelly, Ed Thorp, Dr. Z
• Utility, game, risk, credibility: von Neumann, Morgenstern, Nash, Harsanyi, Hilary Seal, Bühlmann…
• Financial analysis, hedging, scenarios: Black-Merton-Scholes, Shannon, Thorp, Ziemba
• Physics: Schrödinger wave function, $|\psi(x;t)|^2$ is probability density; Myron Tribus’ statistical thermodynamics, entropy, and reliability
Modern Times (outline)
• Modern histories
• Significant other reliability developments
  – RAND and the US AFLC
  – Barlow, Proschan, Marshall, Saunders, Block, et al.
  – Lajos Takacs, Stephen Vajda
  – Kaplan-Meier
  – Sir David Cox
  – Network tomography
Modern Histories
• Barlow and Proschan reviewed reliability in their first book (1965)
• Nowlan and Heap’s “RCM” appendix D-1 contains more (1978)
• Recent publications about adopted developments [McLinn, Saleh and Marais]
• Psychologists hijack the meaning of reliability
RAND and US AFLC
• RAND adapted actuarial methods for managing expensive, repairable equipment such as aircraft engines ~1960
  – AFI 21-104 is current version
  – Actuarial forecast = Σ n(t)a(t); demand ~Poisson
• MOD-METRIC used to buy $4B of F100-PW-100 engines and spares ~1973
• USPO 5287267, Robin Roundy et al. patented negative binomial demand distribution ~1991
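The actuarial forecast on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the AFI 21-104 procedure: the fleet counts, failure rates, and the helper names `actuarial_forecast` and `poisson_stock_level` are hypothetical, and the stock rule simply assumes demand is Poisson with mean equal to the forecast.

```python
from math import exp

def actuarial_forecast(installed_by_age, failure_rate_by_age):
    """Expected failures next period: sum over ages t of n(t) * a(t)."""
    if len(installed_by_age) != len(failure_rate_by_age):
        raise ValueError("need one failure rate per age group")
    return sum(n * a for n, a in zip(installed_by_age, failure_rate_by_age))

def poisson_stock_level(mean_demand, fill_probability):
    """Smallest stock s with P[Poisson(mean_demand) <= s] >= fill_probability."""
    if not 0.0 < fill_probability < 1.0:
        raise ValueError("fill_probability must be strictly between 0 and 1")
    s = 0
    term = exp(-mean_demand)      # P[demand = 0]
    cumulative = term
    while cumulative < fill_probability:
        s += 1
        term *= mean_demand / s   # P[demand = s] from P[demand = s - 1]
        cumulative += term
    return s

# Hypothetical fleet: units in service by age (years) and their actuarial rates.
installed = [40, 35, 30, 20]
rates = [0.02, 0.05, 0.08, 0.12]

forecast = actuarial_forecast(installed, rates)
stock = poisson_stock_level(forecast, 0.95)
print(f"expected failures: {forecast:.2f}, spares for 95% fill: {stock}")
```

A negative binomial demand model, as in the patent cited above, would replace only `poisson_stock_level`; the forecast itself is unchanged.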
Barlow, Proschan, et al.
• What if failure rate isn’t constant?
  – Tests and bounds: IFR, IFRA, DMRL…
  – Renewal theory, replacement, availability, maintenance
  – FTA, Bayes, system vs. parts
• Coherence, redundancy, multivariate
• Russians too: Kolmogorov, Gnedenko, Belyayev, Gertsbakh,…
  – Inspection, opportunistic maintenance
Hungarians Too
• Asymptotic alternating renewal process (up-down-up-down-) statistics are normally distributed, regardless (Takacs)
  – Even with dependence (1960s)
  – Improve production throughput and reduce variance, http://www.fieldreliability.com/Genie.htm
• Gozintos N next-assembly matrix (Vajda)
  – Products Vector $\times\,(I-N)^{-1}$ = Parts Vector
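The gozinto identity above can be checked on a toy bill of materials. The sketch below assumes N[i][j] is the number of units of item j that go directly into one unit of item i, and that the structure has no cycles, so the inverse equals the finite series I + N + N² + …. The items, quantities, and function names are hypothetical.

```python
def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(len(matrix[0]))]

def total_requirements(products, next_assembly):
    """products * (I - N)^-1, computed as products * (I + N + N^2 + ...)."""
    total = list(products)
    term = list(products)
    for _ in range(len(products)):      # N^k = 0 once k reaches the item count
        term = vec_mat(term, next_assembly)
        if not any(term):
            break
        total = [a + b for a, b in zip(total, term)]
    else:
        raise ValueError("next-assembly matrix has a cycle")
    return total

# Hypothetical items: 0 = bicycle, 1 = wheel, 2 = spoke.
N = [
    [0, 2, 0],    # one bicycle takes 2 wheels
    [0, 0, 32],   # one wheel takes 32 spokes
    [0, 0, 0],    # a spoke has no components
]
products = [10, 4, 0]   # ship 10 bicycles plus 4 spare wheels

print(total_requirements(products, N))
```

The series form avoids a general matrix inverse and fails loudly if the bill of materials is circular.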
Kaplan-Meier npmle
• Nonparametric max. likelihood estimate (npmle) of the reliability function from right-censored ages at failures
  – JASA made Ed Kaplan combine his vacuum tube reliability paper with Paul Meier's biostatistics paper (1958)
  – For dead-forever systems, not repairable
• Odd Aalen did the same for the cumulative failure rate function (Nelson-Aalen estimator)
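A minimal sketch of the Kaplan-Meier product-limit estimate for right-censored ages follows. The sample ages are hypothetical, and the function name `kaplan_meier` is just a label for this illustration.

```python
def kaplan_meier(ages, failed):
    """Return [(age, R(age))] at each distinct failure age.

    ages:   age at failure, or age at censoring for survivors
    failed: True where the age is a failure, False where it is censored
    """
    events = sorted(zip(ages, failed))
    at_risk = len(events)
    reliability = 1.0
    curve = []
    i = 0
    while i < len(events):
        age = events[i][0]
        deaths = removed = 0
        while i < len(events) and events[i][0] == age:
            deaths += events[i][1]      # True counts as 1
            removed += 1
            i += 1
        if deaths:
            reliability *= 1 - deaths / at_risk
            curve.append((age, reliability))
        at_risk -= removed              # failures and censored units leave the risk set
    return curve

# Hypothetical sample: six units, three failures, three still running.
ages = [2, 3, 3, 5, 6, 7]
failed = [True, False, True, True, False, False]
for age, r in kaplan_meier(ages, failed):
    print(age, round(r, 4))
```

Censored units contribute to the number at risk up to their last observed age but never to the failure count, which is what distinguishes this from a naive failures-over-total ratio.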
Sir David Cox PH Model
• Proportional hazards (aka relative risk) model is a “semiparametric” failure rate function of “concomitant” factors z (1972)
  – $a_z(t) = a_0(t)e^{-bz}$: b is regression coeff. vector
  – Easier than multivariate statistics: e.g., calendar time and miles, operating hours
• Biostatisticians adopt PH model for testing hypotheses about z
  – Clinical trials
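To make "proportional" concrete, the sketch below evaluates the failure rate with the slide's sign convention, exp(-b·z), and shows that the ratio to the baseline does not depend on age. The baseline rates, coefficients, and factor values are hypothetical, and no estimation of b is attempted here.

```python
from math import exp

def ph_failure_rate(baseline_rate, coefficients, factors):
    """a_z(t) = a_0(t) * exp(-b . z), following the slide's sign convention."""
    score = sum(b * z for b, z in zip(coefficients, factors))
    return baseline_rate * exp(-score)

# Hypothetical baseline failure rates a_0(t) at ages 1, 2, 3.
baseline = [0.01, 0.02, 0.04]
b = [-0.5, -0.1]      # hypothetical regression coefficients
z = [1.0, 2.0]        # hypothetical concomitant factors, e.g. usage measures

for age, a0 in enumerate(baseline, start=1):
    az = ph_failure_rate(a0, b, z)
    print(age, round(az, 5), round(az / a0, 4))   # the ratio is the same at every age
```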
Finance and Reliability
• Risk and hedging
  – Black-Scholes stochastic differential equation for stock price S: $dS = \mu\,dt + \sigma S\,dW$, where W is Brownian motion
    • Nobel prize to Merton and Scholes (1997) for option price model
    • Hedging, LTCM, SIVs, CDOs, CDSs, mortgage defaults, credit crises, deflation, deleveraging, inflation, unemployment???
  – LED deterioration resembles geometric Brownian motion
  – Scenarios include some black swans
SORD Reliability (outline)
• “Credible Reliability Prediction”
  – Not just MTBF (ASQ RD monograph advert)
• Parametric vs. nonparametric
  – Component D
• LEDs L70
• Help! No life data
• Unforeseen consequences
• Renewal and repair
Parametric vs. Nonparametric
• Parametric distribution if justified
  – Normal variation or asymptotic, weakest link, exponential-Poisson-beta-binomial-Gamma-chi-square, lognormal (rate changes), inverse Gauss,…
• Nonparametric distribution
  – Preserves all information in data
  – Avoids opinions and mathematical convenience
• AIC balances overfitting and likelihood
• Entropy quantifies assumed information
“Rule 1. Original data should be presented in a way that will preserve the relevant information derived from evidence in the original data for all predictions assessed to be useful.” Walter A Shewhart
Component D Weibull vs. nonparametric
• AIC = $2k - 2\ln L$: k = # estimated parameters and L is likelihood function
• Entropy $-\sum p(t)\ln p(t)$ is uncertainty in a random variable’s pdf; less is better

          Weibull   Npmle
AIC       16.683    16.685
Entropy   0.0127    0.0135
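The two criteria in the table can be computed as follows. This is a sketch of the definitions only: the log-likelihoods and probabilities below are hypothetical and are not the Component D data, and for a nonparametric estimate k would be the number of estimated probabilities rather than 2.

```python
from math import log

def aic(num_parameters, log_likelihood):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * num_parameters - 2 * log_likelihood

def entropy(probabilities):
    """Shannon entropy -sum p ln p of a discrete distribution; zero terms are skipped."""
    return -sum(p * log(p) for p in probabilities if p > 0)

# Hypothetical comparison of a 2-parameter fit with a 12-parameter fit.
print(aic(2, -10.0))     # 2-parameter model with ln L = -10
print(aic(12, -3.0))     # 12-parameter model needs a much higher ln L to compete

# Hypothetical failure probabilities by month, plus the surviving mass.
pmf = [0.001, 0.002, 0.003]
pmf.append(1 - sum(pmf))
print(round(entropy(pmf), 4))
```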
Black-Scholes and LEDs
[Figure: Scatter Plot of Data Set 1 Normalized. Vertical axis from 0.96 to 1.02. Horizontal axis: age in hours, 0 to 6574.5; each label is one month in hours.]
L70: P[Age at 70% initial lumens > t]?
• Lumens at age t ~ $N[\mu_t, \sigma_t]$, independent
• Deterioration fits Black-Scholes $dS_t = \mu\,dt + \sigma S_t\,dW_t$, where $S_t$ is 1-(% of initial lumens)
  – Estimate $\mu$ and $\sigma$ from geometric Brownian motion
  – L70 ~inverse Gauss with parameters as functions of 70%, $\mu$ and $\sigma$
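The link between a drifting deterioration process and an inverse Gauss life distribution can be sketched as below. This is a simplification, not the author's fitted model: it assumes lumen loss follows plain Brownian motion with constant drift and volatility (rather than the full geometric form), in which case the first time the loss reaches a threshold is inverse Gaussian with mean threshold/drift and shape (threshold/volatility)². The drift and volatility values are hypothetical.

```python
from math import erf, exp, sqrt

def normal_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def inverse_gauss_reliability(t, threshold, drift, volatility):
    """P[first passage of drift*t + volatility*W(t) to `threshold` occurs after t]."""
    if t <= 0:
        return 1.0
    mean = threshold / drift
    shape = (threshold / volatility) ** 2
    root = sqrt(shape / t)
    cdf = (normal_cdf(root * (t / mean - 1.0))
           + exp(2.0 * shape / mean) * normal_cdf(-root * (t / mean + 1.0)))
    return max(0.0, 1.0 - cdf)

# L70 means 30% lumen loss; drift and volatility per year are hypothetical.
for years in (2, 5, 10, 15, 20):
    r = inverse_gauss_reliability(years, threshold=0.30, drift=0.03, volatility=0.05)
    print(years, round(r, 3))
```

For very large shape/mean ratios the exponential term can overflow; a production version would work in logarithms.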
L70 Weibull vs. Inverse Gauss
[Figure: LED L70 Inverse-Gaussian Mixture and Weibull Reliability Functions. Vertical axis: Reliability, 0 to 1. Horizontal axis: Age, Years, 0 to 20. Curves: IG Mixture, Weibull.]
Help! No Life Data?
• “You need ages at failures and survivors’ ages”
• “It’s too hard to estimate reliability from ships and returns counts”
  – Ships are counts of production, sales, installations, or other installed base
  – Returns are counts of complaints, failures, repairs, or even spares sales
• Follow a sample by S/N? Ships and returns are population data, required by GAAP!
“People’s intuition about random sampling appears to satisfy the law of small numbers, which asserts that the law of large numbers applies to small numbers as well.” Tversky and Kahneman
M/G/∞ and npmle
• Npmle of service distribution from M/G/∞ queue input and output times (1973 NRLQ)
• Richard Barlow and I overlooked potential for reliability
• Works for Mt/G/∞ queues under mild conditions on the nonstationary Poisson Mt
• Extended to renewal processes (recycling)
[Figure: two timeline diagrams, each showing n1, n2 and R1, R2 along a Time axis; labels: Cases, Deaths.]
Nplse: Actuarial Forecasts
• Orjan Hallberg (Ericsson ret.) researches medical problems http://www.hir.nu
• Carl Harris and Ed Rattner used nplse to forecast AIDS deaths from HIV+AIDS conversions and death counts
  – Carl died early of a heart attack, and Ed claims he’s fully retired.
• Dick Mensing: SSE = $\sum[\text{Expected}-\text{Observed}]^2$
  – Expected = actuarial forecast (hindcast)
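The hindcast and its sum of squared errors can be sketched from ships and returns counts alone. This is an illustration under simplifying assumptions, not the nplse itself: units are dead-forever (no renewals), `failure_probs[j]` is the probability a unit fails at age j periods, and `moment_estimate` just solves expected = observed by forward substitution without the constraints (nonnegative, summing to at most 1) a real estimator would impose. All counts are hypothetical.

```python
def hindcast(ships, failure_probs):
    """Expected returns in period t: sum over cohorts s <= t of ships[s] * p[t - s]."""
    return [sum(ships[s] * failure_probs[t - s] for s in range(t + 1))
            for t in range(len(ships))]

def sse(ships, returns, failure_probs):
    """Sum of squared errors between hindcast and observed returns."""
    expected = hindcast(ships, failure_probs)
    return sum((e - r) ** 2 for e, r in zip(expected, returns))

def moment_estimate(ships, returns):
    """Failure probabilities that reproduce the returns exactly (needs ships[0] > 0)."""
    probs = []
    for t in range(len(ships)):
        explained = sum(ships[s] * probs[t - s] for s in range(1, t + 1))
        probs.append((returns[t] - explained) / ships[0])
    return probs

# Hypothetical monthly ships and returns counts.
ships = [100, 120, 150, 130]
returns = [1.0, 3.2, 6.9, 12.9]

estimate = moment_estimate(ships, returns)
print([round(p, 4) for p in estimate])
print(sse(ships, returns, estimate))                    # essentially zero for the exact fit
print(sse(ships, returns, [0.02, 0.02, 0.02, 0.02]))    # a constant-rate guess fits worse
```

With noisy counts the exact fit can go negative, which is one reason to minimize SSE under constraints instead.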
Apple: Unforeseen Consequences
• Boss thinks ships and returns counts are sufficient. Lit. search => 1973 NRLQ article
• Estimate all service parts’ reliability, forecast failures and recommend stock levels
• Dealers scream! Apple had required dealers to buy obsolescent spares
• Apple bought back $36M of obsolescent spares, for $18M, and crushed them. Made me limit returns to ~$6M per quarter.
Repairable Reliability (outline)
• Triad Systems Corp.
• Brie Engineering M88A1
• Larry Ellison, Oracle
Triad Systems Corp.
• New Products manager proposes auto parts demand forecast = Σ n(t)a(t): n(t) = cars by year
  – Fails due to autocorrelation, no pun intended
  – Auto parts sales might be the second, third, or ??? Stores don’t know
  – Derived the nplse failure rate estimates for renewal processes ~1994. Got job. Forecasts are better.
  – Extended to generalized repairable processes (first TTF differs) and npmle ~1999
• Triad US Patent 5765143 actuarial forecast
M88A1
• In 2000, Brie engineer shares M88A1 drivetrain rebuild counts for 1990s, $186k then. Laid off
  – Estimate: ~25% fail in first year. Either problem wasn’t fixed or faulty rebuild. TACOM uninterested.
  – 2005 AVDS 1790 engine backorders. RAND publishes “Velocity Management.” RAND uninterested in actuarial forecasts
  – ASQ Quality Progress 2010 publishes article on greening the engine overhaul process
M88A1
[Figure: M88A1 Drivetrain Component Reliability. Vertical axis: reliability, 0 to 1. Horizontal axis: Age at replacement, years, 0 to 25. One curve per drivetrain component, including engine, transmission, generator, starter, and fuel pump.]
Oracle and Breast Cancer
• Oracle CMM dbs record ages at system failures and the parts that failed
  – They don’t identify parts by serial number, location: TOAD, AIMS?, Other?
  – What if there were duplicate parts?
• Breast cancer recurrences: same side second time or other side???
EM and Hidden Renewals
• EM algorithm (Expectation-Maximization) gives part reliability npmle
  – www.wikipedia.com/EM_algorithm [Dempster, Laird, and Rubin]
• Nplse failure rate estimates and forecasts for renewal processes with missing data (2008)
  – Provisional patent pending application is in procrastination
Two-Part System
• Least Sqs is for both parts, EM is for one
[Figure: Alternative Reliability Estimates. Vertical axis: reliability, 0 to 1. Horizontal axis: Age, Quarters, 0 to 12. Curves: Least Sqs R(t), EM R(t).]
You’re Being Followed
– Pleasanton residents complain about traffic cutting through. City adjusts signal timing to back cars onto freeway. Crash
– City cars follow intruders. Citizens arise (2000)
– Pleasanton gives traffic count data
– Nplse of O-D matrix and travel time distributions
– Traffic manager doesn’t understand O-D, probability distributions, and their use
– City stations cheap labor at major intersections to record license numbers (2009)
• “It’s human nature to doubt statistically significant conclusions based on a sample that is a small fraction of the population” Tversky and Kahneman
Pleasanton
Network Tomography
Northbound: Sunol Blvd.
Southbound: Foothill, Hopyard-Hacienda-Owens, Santa Rita
Westbound: Stanley Blvd
Eastbound: Las Positas, Stoneridge, Foothill
Source-Sink
Pleasanton PM OD matrix
• AKA network tomography
Pmatrix   Pton origin   Thru Pton

O from \ D to ->   From 0   From N   From S   From E   From W   Lambda0   g0       g1
To 0               0.0000   0.8640   0.0000   0.0000   0.7801   6.5128    0.9924   0.8541
To N               0.2136   0.0000   0.0135   0.0000   0.0721   5.9121    0.0001   0.0802
To S               0.1801   0.0285   0.0000   1.0000   0.0000   0.0000    0.0075   0.0656
To E               0.1755   0.0177   0.2679   0.0000   0.1479   0.0000    0.0000   0.0000
To W               0.4308   0.0899   0.7186   0.0000   0.0000
Dealing with Uncertainty
• Randomness (aleatory uncertainty)
  – Reliability function, bounds, and stochastic dominance
• Sample uncertainty vs. population
  – Why sample if you can get population statistics?
• Epistemic, Knightian, unknown unknowns…
  – PRA and “Uncertainty in the URC”
  – Jackknife, bootstrap, broom charts…
  – Nonparametric extrapolations
  – Scenarios
“The analyst should provide a measure of the uncertainty that results from the assumptions underpinning the set of models applied in the analysis and the deliberate and unconscious simplifications made.” Terje Aven
Component D
• Given first year of monthly failure counts, how many will fail in remainder of 3-year warranty?
  – Data are left and right censored. All failure counts were collected on one calendar date. Monthly ships too
  – Some failures are 12 months old, some 11 months….
• “I do not think that a nonparametric approach would work.”
  – It works: facilitates extrapolation, uncertainty
  – Weibull reliability under-forecasts failures
Alternative Reliability Estimates
• 12 months of ships and failures
[Figure: reliability estimates. Vertical axis from 0.998 to 1. Horizontal axis: Age, Months, 0 to 12. Curves include npmle, nplse, naïve, and Weibull mle and lse fits.]
Failure Rate Extrapolation Uncertainty
[Figure: failure rate estimates and extrapolations. Vertical axis from 0 to 0.0005. Horizontal axis: Age, Months, 0 to 36. Curves: npmle, nplse, mle Weibull, lse Weibull.]
Actuarial Forecasts
Method E[Failures]
Npmle 2687
Nplse 2704
Mle Weibull 2066
Lse Weibull 2495
Meeker et al. (Weibull) 2032
Extrapolation Scenarios
• Nonparametric linear extrapolations
  – Jackknife; leave out one month’s data
  – Broom; all 12 months, first 11, first 10…
• W. Weibull recommends power functions for simplicity
• Sensitivity and delta method:
  – derivatives of actuarial forecasts wrt linear extrapolation coeffs are Σ n(t) and Σ t·n(t)
• Future uncertainty???
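The broom and jackknife scenarios can be sketched as repeated straight-line fits to the monthly failure rate estimates, each extrapolated into an actuarial forecast. The rates, the future installed base, and the function names below are hypothetical; ordinary least squares stands in for whatever fitting the author used.

```python
def linear_fit(ages, rates):
    """Ordinary least squares line: rate = c0 + c1 * age. Returns (c0, c1)."""
    n = len(ages)
    mean_t = sum(ages) / n
    mean_a = sum(rates) / n
    sxx = sum((t - mean_t) ** 2 for t in ages)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(ages, rates))
    c1 = sxy / sxx
    return mean_a - c1 * mean_t, c1

def broom_fits(ages, rates, shortest=3):
    """Fit all months, then the first n-1, n-2, ... down to `shortest` months."""
    return [linear_fit(ages[:k], rates[:k])
            for k in range(len(ages), shortest - 1, -1)]

def jackknife_fits(ages, rates):
    """Refit with each month left out in turn."""
    return [linear_fit(ages[:i] + ages[i + 1:], rates[:i] + rates[i + 1:])
            for i in range(len(ages))]

def forecast(installed_by_future_age, coefficients):
    """Actuarial forecast: sum over future ages t of n(t) * (c0 + c1 * t)."""
    c0, c1 = coefficients
    return sum(n * (c0 + c1 * t) for t, n in installed_by_future_age.items())

# Hypothetical first-year monthly failure rates and units exposed at ages 13-36.
months = list(range(1, 13))
rates = [0.00010, 0.00013, 0.00012, 0.00016, 0.00018, 0.00017,
         0.00021, 0.00022, 0.00026, 0.00025, 0.00029, 0.00031]
exposure = {age: 5000 for age in range(13, 37)}

scenarios = [forecast(exposure, fit)
             for fit in broom_fits(months, rates) + jackknife_fits(months, rates)]
print(round(min(scenarios), 1), round(max(scenarios), 1))

# Sensitivities of the forecast to the intercept and slope.
sensitivity_c0 = sum(exposure.values())
sensitivity_c1 = sum(age * n for age, n in exposure.items())
print(sensitivity_c0, sensitivity_c1)
```

The spread between the smallest and largest scenario is a rough, assumption-light indication of extrapolation uncertainty; it is not a confidence interval.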
Possible Reliability Futures
• MTBF no longer a specification?
• Less Weibull? More inverse Gauss?
• Consumer bills of rights? WikiReliability?
  – Do not track by serial number or name (privacy), unless reduced sample uncertainty is worth the costs
• More uncertainty and risk analysis?
  – Risk equity, FMERD…
  – Dempster-Shafer Theory of Evidence, belief
  – Statisticians work on causal inference and vice versa
• What do you think? What’s needed?
REFERENCES
• AFI 21-104, “Selective Management of Selected Gas Turbine Engines,” Air Force Instruction 21-104, Air Force Materiel Command, June 1994, http://afpubs.hq.af.mil
• McLinn, James, “A Short History of Reliability,” ASQ Reliability Review, Vol. 30, No. 1, pp. 11-18, March 2010
• Barlow, Richard E. and Frank Proschan, “Historical Background of the Mathematical Theory of Reliability,” in chapter 1 of Mathematical Theory of Reliability, John Wiley, SIAM, New York, 1965
• Geisler, Murray and H. W. Karr, “The design of military supply tables for spare parts,” Operations Research, Vol. 4, No. 4, pp. 431-442, 1956
• Kamins, Milton and J. J. McCall, “Rules for Planned Replacement of Aircraft and Missile Parts,” RAND RM-2810-PR, Nov. 1961
• Saleh, J. H. and K. Marais, “Highlights from the early (and pre-) history of reliability engineering,” Reliability Engineering and System Safety, Vol. 91, No. 2, pp. 249-256, Feb. 2006
• ISO 26000, “Guidance on Social Responsibility,” Draft International Standard, 2009
• Lee, Miky, Craig Hillman, and Duksoo Kim, “How to predict failure mechanisms in LED and laser diodes,” Aug. 2005, http://www.dfrsolutions.com/uploads/publications/2005_MAE_LED_article.pdf
References by George
• “Estimation of a Hidden Service Distribution of an M/G/∞ Service System,” Naval Research Logistics Quarterly, Vol. 20, No. 3, pp. 549-555, September 1973, co-author A. Agrawal
• “A Note on Estimation of a Hidden Service Distribution of an M/G/∞ Service System,” Random Samples, ASQC Santa Clara Valley, June 1994
• “Origin-Destination Proportions and Travel-Time Distributions Without Surveys,” INFORMS Salt Lake City, May 2000, http://www.fieldreliability.com/OD.ppt
• “Biomedical Survival Analysis vs. Reliability: Comparison, Crossover, and Advances,” The J. of the RIAC, pp. 1-5. Q4-2003, http://www.theriac.org/DeskReference/viewDocument.php?id=85&Scope=reg
• “Failure Modes and Effects Risk Diagnostics,” http://www.fieldreliability.com/FMERD.htm
• “Nonparametric Forecasts from Left-Censored Failures,” http://www.fieldreliability.com/QPMeeker.doc, Dec. 2010
• “LED Reliability Analysis,” ASQ Reliability Review, Vol. 30. No. 4, pp.4-11, http://www.fieldreliability.com/PhilLEDs.doc, Dec. 2010
• Credible Reliability Prediction, ASQ Reliability Division Monograph, http://www.asq.org/reliability/quality-information/publications-reliability.html, 2003