Reliability Engineering for Today’s Technology Developers and Information Analysis Centers (IACs)
David Nicholls, CRE
Director of RMQ Engineering, Quanterion Solutions Incorporated
Presentation to the Technology Alliance of Central New York (TACNY)
13 May 2014
Outline
• Background
– Quanterion
– Information Analysis Centers (IACs)
• High-Level Definitions
• Typical Reliability Program Problems
• Typical Reliability Engineering Activities
• Strategic Reliability Programs
Quanterion Solutions Background
• Formed in 2000 as a NYS “S-Corp”
• Quanterion = QUANTitative + critERION (for decision making)
• Core Competencies
– Reliability/Maintainability/Quality
– Information Analysis Center Operation
– Software Engineering
– Software/Database Development
– Cyber Security
– Materials Applications
– Training Programs
• Certified Reliability Engineers
• Certified Quality Engineers
• 2007 “Leading EDGE Award” Winner
• E. Quint Carr Awards for Engineering Excellence, 2002 & 2008
IACs are DoD Technical Centers of Excellence
• Improve productivity of Researchers, Engineers, and Program Managers in the Defense Research, Development, and Acquisition Communities by collecting, analyzing, synthesizing, and disseminating worldwide Scientific and Technical Information (STI) in clearly defined, specialized fields or subject areas
Quanterion Involved in Two IACs
Cyber Security & Information Systems (CSIAC)
• Merger of 3 “legacy” IACs
– Software (DACS)
– Modeling and Simulation (MSIAC)
– Information Assurance (IATAC)
• CNY Subcontractors: AIS, SRC, Syracuse University, Griffiss Institute, Wetstone, SUNYIT, Utica College
Defense Systems (DSIAC)
• Merger of 6 “legacy” IACs
– Reliability, Maintainability, Quality, Supportability and Interoperability (RIAC)
– Advanced Materials, Manufacturing and Testing (AMMTIAC)
– Chemical Propulsion (CPIAC)
– Military Sensing (SENSIAC)
– Survivability/Vulnerability (SURVIAC)
– Weapons System Technology (WSTIAC)
• Quanterion is the Focal Point for RMQSI (RIAC) and Advanced Materials (AMMTIAC) Technologies
Relevant High-Level Definitions
• Reliability
– The (statistical) probability that an item will perform its intended function for a specified (life unit) interval under stated conditions (RIAC “System Reliability Toolkit”)
– Covers hardware, software, human factors
• Quality
– No universal definition
– “The totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs” (ISO 8402-1986)
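As a rough illustration of the reliability definition above, the sketch below computes the probability that an item performs without failure over a specified interval. It assumes a constant failure rate (the exponential model), which the slides do not state; the function name and the 24-hour interval are hypothetical, and the 100-hour MTBF is borrowed from the requirement discussed later in the deck.

```python
import math


def reliability(interval_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure over the interval.

    Assumption (not from the slides): constant failure rate, so
    R(t) = exp(-t / MTBF).
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if interval_hours < 0:
        raise ValueError("interval must be non-negative")
    return math.exp(-interval_hours / mtbf_hours)


# Hypothetical example: a 24-hour interval for an item with a 100-hour MTBF.
mission_reliability = reliability(24.0, 100.0)
print(f"R(24 h) = {mission_reliability:.3f}")
```

Under this assumption, an item operated for an interval equal to its MTBF has only about a 37% chance of completing that interval without failure, which is why the "specified interval" in the definition matters as much as the MTBF itself.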
Relevant High-Level Definitions
• Reliability vs. Quality
– The terms are not synonymous
– Quality is based on “stated or implied needs” at a discrete point of “measurement” (if measured)
– If “measurements” meet those needs, an item is considered to have acceptable quality
– Poorly stated requirements and bad design practices can result in poor reliability products that still meet quality specifications
– A product can meet its quality requirements, but fail “prematurely” due to poor reliability
Relevant High-Level Definitions
• Reliability Growth
– The (generally) positive improvement in a reliability measure over a duration of (life units) due to the identification and mitigation of failure modes, and subsequent verified effective corrective actions, to system inherent design, operation, maintenance and manufacturing processes, procedures and documentation (RIAC “Achieving System Reliability Growth Through Robust Design and Test”)
– “It’s not rocket science”
• Root (Failure) Cause
– The lowest-level condition (people, process, software, material, documentation or requirements) that is identified as being responsible for precipitating a failure
– If you don’t mitigate root cause, reliability doesn’t grow
– “Mitigate root causes, not symptoms”
Relevant High-Level Definitions
• Return on Investment (ROI)
– Popular financial metric for evaluating the financial consequences of individual investments and actions
– Several different metrics are called ROI
– Best known is Simple ROI
  • Compares the magnitude and timing of investment gains directly with the magnitude and timing of costs
  • A positive ROI means that financial gains compare favorably to invested costs (higher is better)
Relevant High-Level Definitions
• Reliability Growth vs. ROI
– Reliability growth implies positive ROI
  • “I achieved an X% ROI because I estimate/measure $Y in savings through my $Z investment in reliability.”
– For ROI to be accurate, reliability growth resulting from each Reliability Program activity must be measured and tracked
  • Investing $50K in a reliability task that has no impact on a design (no reliability growth) does not yield any return, but is incorrectly lumped into Simple ROI (more later)
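The point about lumping can be made concrete with a minimal sketch that tracks ROI per activity rather than for the program as a whole. Everything here is illustrative: the class and function names are hypothetical, ROI is taken as net savings divided by investment (an assumed convention), and only the $50K no-impact task comes from the slide; the second activity and its figures are invented for contrast.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    investment: float   # dollars spent on the activity
    net_savings: float  # savings attributed to the activity, net of its cost


def roi_percent(net_savings: float, investment: float) -> float:
    """ROI as net savings over investment, in percent (assumed convention)."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return 100.0 * net_savings / investment


activities = [
    # From the slide: $50K spent, no design impact, so no savings.
    Activity("Task with no design impact", 50_000, 0),
    # Hypothetical second activity that did drive a design change.
    Activity("Task that drove a design change", 100_000, 400_000),
]

# Lumped view: one number for the whole program hides the zero-return task.
lumped_roi = roi_percent(
    sum(a.net_savings for a in activities),
    sum(a.investment for a in activities),
)

# Tracked view: one number per activity.
roi_by_activity = {a.name: roi_percent(a.net_savings, a.investment) for a in activities}

print(f"Lumped ROI: {lumped_roi:.0f}%")
for name, roi in roi_by_activity.items():
    print(f"{name}: {roi:.0f}%")
```

With these made-up figures the lumped number looks healthy even though one third of the spending returned nothing, which is the situation the slide warns about.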
Typical Reliability Program Problems (Requirements)
• Specification Requirements
– IT ALL STARTS WITH THE SPECIFICATIONS!!!
– Not always documented?
– Is the right organization defining them?
  • Marketing vs. Engineering vs. Customer(s)
– Incorrect/unclear/ambiguous/missing?
  • Performance requirements
  • Environmental requirements
  • Design requirements
  • Reliability and maintainability (R&M) requirements
– Is “failure” defined; how is it “scored”?
– Scope creep and failure negotiations
Higher costs, schedule delays, customer dissatisfaction
Typical Reliability Program Problems (Requirements)
• US DoD “Requirements” Problems
Typical Reliability Program Problems (Requirements)
• The Warfighter Has Critical Operational Reliability Needs
– “Does Not Care” What Caused a Mission Failure:
• Inherent hardware (wearout)
• Hardware quality (random part quality/variability, manufacturing workmanship)
• Inherent software
• Induced (maintenance or operator)
• No defect found/cannot duplicate
• Inadequate design (e.g., inadequate margins, tolerance stack-up, sneak paths)
• System management (e.g., requirements issues, insufficient resources)
Typical Reliability Program Problems (Requirements)
• US DoD “Requirements” Problems
Typical Reliability Program Problems (Requirements)
• Based on a Robust System Design Approach Using DFR Processes and Reliability Growth Planning/ Tracking to Meet the 100-Hour MTBF Requirement
• …the Warfighter Will Only “See” a 31-Hour MTBF
Typical Reliability Program Problems (Requirements)
• Based on a System Design Using the Same Rigorous DFR Processes and Reliability Growth Planning, What Should the Specified Requirement Have Been?
• The Warfighter Will “See” a 100-Hour MTBF
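One way to read the question "What should the specified requirement have been?" is as a simple scaling exercise. The sketch below assumes the gap between specified and observed MTBF is a constant ratio (31 hours seen for every 100 hours specified); the slides' underlying figures are not in the text, so this proportionality, the function name and the resulting value are assumptions, not the presenter's stated answer.

```python
def required_spec_mtbf(
    target_field_mtbf: float,
    spec_mtbf: float,
    observed_field_mtbf: float,
) -> float:
    """Specification MTBF needed for the field to see the target MTBF.

    Assumption: field MTBF is a fixed fraction of the specified MTBF,
    so the specification is scaled up by the inverse of that fraction.
    """
    if min(target_field_mtbf, spec_mtbf, observed_field_mtbf) <= 0:
        raise ValueError("all MTBF values must be positive")
    field_to_spec_ratio = observed_field_mtbf / spec_mtbf
    return target_field_mtbf / field_to_spec_ratio


# From the slides: a 100-hour requirement yields a 31-hour MTBF in the field.
needed_spec = required_spec_mtbf(
    target_field_mtbf=100.0, spec_mtbf=100.0, observed_field_mtbf=31.0
)
print(f"Specified MTBF needed: {needed_spec:.0f} hours")
```

Under this assumption the requirement would have to be roughly three times higher than the operational need.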
Typical Reliability Program Problems (Design)
• Design Requirements
– Standardized processes for design reliability do not exist or are not consistent across the business?
  • Integration with new businesses
  • Different market segments/customer needs
  • Multiple geographic locations
  • “Not invented here”
– Do existing design processes consider reliability needs?
  • Environmental characterization (where does this come from?)
    – Operational, plus packaging, transportation, handling and installation
  • Part selection/control
  • Use of electronic/mechanical part stress derating
  • Proactive identification/mitigation of failure modes
Typical Reliability Engineering Activities During Design
Reliability Program Activity+ | Direct Contribution to Inherent Reliability Growth? | Caveats
Design Reviews | Yes* | Independent evaluation/critique of a design may identify/mitigate potential failure modes overlooked in the original design, or confirm they have been identified and suitably mitigated.
Dormancy Analysis | Yes | Determines effects of long-term non-operation, where “effects” may include identification of failure modes (assumed to be mitigated after identification).
Durability Assessment (Physics-of-Failure) | Yes | Assessment of adequate mechanical strength requires identification of specific failure modes (assumed to be mitigated after identification).
*…but only if the results of the activity result in an actual design/process change!
+…how much (if anything) should I invest in this (or other) Reliability Program activities?
Typical Reliability Engineering Activities During Design
Reliability Program Activity | Direct Contribution to Inherent Reliability Growth? | Caveats
Failure Modes and Effects Analysis/Criticality Analysis (FMEA/FMECA) | Yes | Explicitly requires identification of all failure modes and factors in the results of failure mode mitigation.
Fault Tolerance | Yes | An important failure mode mitigation technique that can contribute to operational/mission reliability growth (although it may decrease logistic reliability).
Fault Tree Analysis (FTA) | Yes | If mitigation techniques are applied based on the identified failure modes in the FTA, then reliability growth can occur.
Reliability Centered Maintenance | Yes | Identifies failure modes from sources not normally considered. Serves as a mitigation tool for modes identified from other analyses/tests.
Typical Reliability Engineering Activities During Design
Reliability Program Activity | Direct Contribution to Inherent Reliability Growth? | Caveats
Sneak Analysis | Yes | Investigates/identifies failure modes that contribute to unintended failure paths (electronic circuits) or logic flows (software), presenting opportunities for mitigation.
Test Strategy | Yes | Presumption for reliability growth is that (1) it will include tests specifically designed to “discover” failures and (2) all discovered failure modes will be mitigated through appropriate Corrective Actions.
Worst Case Analysis | Yes | Can identify degradation failure modes that will manifest under worst case conditions.
Typical Reliability Program Problems (Test)
• Test Requirements
– Testing for functionality and testability only is performed?
– Testing for environmental qualification only is performed?
  • Designs should function within extremes of tested environments (are “all” environments tested for?)
– Limited testing to determine inherent design reliability?
– Any testing to determine inherent design maintainability (other than testability)?
– Emphasis on reactively testing R&M into products, rather than proactively designing R&M into products?
– Are post-manufacturing reliability screens (burn-in, environmental stress screening) adequate to precipitate infant mortality failures (quality, workmanship, incoming part lot-dependence, latent design defects)?
Typical Reliability Engineering Activities During Test
Test Type | Direct Contribution to Inherent Reliability Growth? | Are Failures Desirable?
Accelerated Stress Test/Highly Accelerated Stress Test (AST/HAST) | Yes | Yes. Primary purpose of test is identification/mitigation of failure modes.
Accelerated Life Test/Highly Accelerated Life Test (ALT/HALT) | No | No. Classical ALT/HALT quantifies reliability life characteristics only. More failures mean shorter life.
Environmental Stress Screening (ESS) | No | Yes. Eliminates infant mortality failures before delivery to the customer to maintain (not grow) inherent reliability.
Highly Accelerated Stress Screening (HASS) | No | Yes. Same as ESS.
Reliability Growth Test (RGT) | Yes | Yes. Primary purpose of test is identification/mitigation of failure modes.
Reliability Demonstration Test/Reliability Qualification Test (RDT/RQT) | No | No. Failures counted/scored against a requirement. More failures may mean a “Reject” decision.
Production Reliability Acceptance Test (PRAT) | No | No. Same as RDT/RQT.
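To show why failures are undesirable in an RDT/RQT, the sketch below models a fixed-duration demonstration test that is accepted only if no more than a set number of failures are scored. It assumes a constant failure rate, so the failure count over the test time follows a Poisson distribution; the function name and the test plan values are hypothetical and are not drawn from any standard test plan.

```python
import math


def accept_probability(
    true_mtbf_hours: float, test_hours: float, max_failures: int
) -> float:
    """Probability of an "Accept" decision in a fixed-duration test.

    Assumption: failures arrive at a constant rate, so the number of
    failures in test_hours is Poisson with mean test_hours / true_mtbf_hours.
    The test is accepted when the scored failures do not exceed max_failures.
    """
    if true_mtbf_hours <= 0 or test_hours < 0 or max_failures < 0:
        raise ValueError("invalid test plan parameters")
    expected_failures = test_hours / true_mtbf_hours
    return sum(
        math.exp(-expected_failures) * expected_failures**k / math.factorial(k)
        for k in range(max_failures + 1)
    )


# Hypothetical plan: 200 test hours, accept with 2 or fewer failures.
for mtbf in (31.0, 100.0, 300.0):
    p = accept_probability(mtbf, test_hours=200.0, max_failures=2)
    print(f"True MTBF {mtbf:>5.0f} h -> P(accept) = {p:.2f}")
```

Each additional scored failure moves the outcome toward "Reject", in contrast to growth testing, where each failure is an opportunity to find and mitigate a failure mode.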
Typical Reliability Program Problems (Manufacturing)
• Manufacturing Requirements
– Basic Question: How can designed-in (inherent) reliability be retained (not degraded) during manufacturing?
  • Adequate emphasis on materials and process controls?
  • Is manufacturing an integral part of a design reliability process that addresses reliability requirements concurrently with other requirements?
  • Is duplication of effort avoided?
  • Is expensive/repetitive rework avoided?
Typical Reliability Engineering Activities During Manufacturing
Reliability Program Activity | Direct Contribution to Inherent Reliability Growth? | Caveats
Design of Experiments (DOE) | No | Provides mechanism for evaluating corrective action alternatives resulting from unacceptable manufacturing process variability, or unexpected failures in manufacturing. Used to restore process control or identify necessary process modifications.
ESS/HASS | No | Precipitates failures resulting from design flaws or workmanship defects at the level of assembly applied, before the manufactured item is delivered to the customer.
FMEA/FMECA | Yes | Results can be used as a troubleshooting aid and to identify necessary repairs and/or corrective actions (CAs), if unanticipated or unexpected process or design failure modes are being discovered.
Note: DOE can also be used proactively in the Design and Test phases to help identify, understand and mitigate failure modes. In that context, it can contribute to reliability growth of the inherent design.
Typical Reliability Engineering Activities During Manufacturing
Reliability Program Activity | Direct Contribution to Inherent Reliability Growth? | Caveats
Failure Reporting, Analysis and Corrective Action System (FRACAS) | Yes | Repository for manufacturing in-process failure data for root failure cause determination. Closed-loop feedback of CAs facilitates growth in current or next-generation processes or systems.
Inspection | No | Ensures that no physical defects exist that result in customer dissatisfaction or an increase in safety/liability risk.
PRAT | No | Ensures that deliverable/delivered items demonstrate the ability to meet the inherent design reliability during customer use. Based on 100% or sample testing.
Statistical Process Control (SPC)/Six-Sigma | No | Identifies whether manufacturing process variability is in control or out of control, and whether failures are random or special cause (correctable).
Typical Reliability Program Problems (O&S/M)
• Operation and Support (or Maintenance) Requirements
– Activities Performed During the Operation and Support (O&S) Phase of the System Life Cycle:
  • Are operating, installation and training procedures implemented correctly?
  • Is repair and warranty service adequate?
  • Expected or unexpected number of warranty failures?
  • Meaningful reliability performance feedback (relevant data and information)?
  • Refurbishment and disposal tasks required?
  • Success in resolving potential wearout issues?
– Are failure modes being introduced during the maintenance process?
  • Inadequate repair personnel skill levels?
  • Inadequate maintenance support test equipment?
  • Lack of guidance (inadequate training)?
  • Inadequate maintenance/repair documentation?
  • Components exposed to “abnormal” repair/replacement scenarios?
Typical Reliability Engineering Activities During O&S/M
Reliability Program Activity | Direct Contribution to Inherent Reliability Growth? | Caveats
FMEA/FMECA | Yes | Results can be used as a troubleshooting tool, helping to minimize incidents of incorrect repair/replacement. It can/should be updated to capture failure modes discovered in the field that were either unanticipated or unexpected.
FRACAS | Yes | Repository for failure data/information related to unanticipated, unexpected or “routine” failures during customer use. Analysis determines root failure cause and effective CA. Closed-loop feedback facilitates necessary design/process changes to grow reliability.
Typical Reliability Program Problems (Data)
• Data and Information Requirements
– Is collected data/information adequate?
  • Is not (or cannot be) captured
  • Not enough data captured (missing relevant data, failure symptoms, extenuating circumstances)
  • R&M-related data not adequately analyzed (resource limitations)
– Taken from all life cycle phases (Requirements, Design, Test, Manufacturing and O&S/M)?
– Capturing/analyzing all failure data/information?
  • Unacceptably high number of returns (warranty and non-warranty)?
  • Cannot Duplicate (CND) failures exist (what percent)?
  • Does root cause failure analysis identify true root cause (not just the symptom)?
  • Is the most effective corrective action being implemented?
  • Are corrective actions verified as effective?
Typical Reliability Engineering Activities for Data
Reliability Program Activity | Direct Contribution to Inherent Reliability Growth? | Caveats
FRACAS | Yes | Repository for failure data/information related to unanticipated, unexpected or “routine” failures during customer use. Analysis determines root failure cause and effective CA. Closed-loop feedback facilitates necessary design/process changes to grow reliability.
• Traditional FRACAS usually covers only formal testing/screening, manufacturing and field data
• Enhanced FRACAS can/should also include:
– Failures during design (less formal tests like DOE, ALT/HALT, software debug, etc.)
– Lessons Learned
– Cost data (to support TLCC and ROI assessments)
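The "enhanced FRACAS" idea above can be sketched as a small record structure that carries the extra items (life cycle phase, lessons learned, cost) alongside the closed-loop fields. This is a hypothetical data model for illustration only; the field names, the closure rule and the sample records are assumptions, not a description of any particular FRACAS tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureRecord:
    phase: str                               # life cycle phase where the failure was found
    symptom: str                             # what was observed
    root_cause: Optional[str] = None         # true root cause, once determined
    corrective_action: Optional[str] = None  # CA chosen to mitigate the root cause
    ca_verified: bool = False                # has the CA been verified as effective?
    cost: float = 0.0                        # cost data, to support TLCC/ROI assessments
    lesson_learned: str = ""

    def is_closed(self) -> bool:
        """Closed-loop rule (assumed): root cause known, CA defined and verified."""
        return (
            self.root_cause is not None
            and self.corrective_action is not None
            and self.ca_verified
        )


def open_records(records: list) -> list:
    """Records whose loop has not been closed yet."""
    return [r for r in records if not r.is_closed()]


def cost_by_phase(records: list) -> dict:
    """Total recorded cost per life cycle phase."""
    totals: dict = {}
    for r in records:
        totals[r.phase] = totals.get(r.phase, 0.0) + r.cost
    return totals


# Hypothetical sample data.
records = [
    FailureRecord(
        phase="Design",
        symptom="Connector overheats during a DOE run",
        root_cause="Undersized contact",
        corrective_action="Specify larger contact",
        ca_verified=True,
        cost=12_000.0,
        lesson_learned="Apply derating to connectors",
    ),
    FailureRecord(phase="Field", symptom="Intermittent reset", cost=3_500.0),
]

print(len(open_records(records)), "record(s) still open")
print(cost_by_phase(records))
```

Keeping the verification flag separate from the corrective action text reflects the earlier question of whether corrective actions are verified as effective, not merely implemented.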
Typical Reliability Program Problems (TLCC)
• The Reliability Program Represents a Simple Choice: “Pay Me Now, or Pay Me Later”
– “Pay me now” sees investment in reliability, starting in the early design/development phase, as being more than offset by downstream savings and/or cost avoidance
– “Pay me later” accepts lower investment in reliability design/development processes and relies on reactive reliability growth (in test)
– “Pay me a lot more, much later” ignores the importance of a reliability program and relies on field/customer experience to identify problems
• Should Justify Each Reliability Program Activity with Cost-Benefit Analysis (Value-Added or Not?)
• Objective of Life Cycle Cost Analysis:
– Choose the most cost-effective approach for utilizing available resources over the entire system/product life cycle
Typical Reliability Program Problems (TLCC)
• Life Cycle Cost Analysis is a Formal, Structured Process for Evaluating and Quantifying the Cost Impacts of Alternative Courses of Action
– Supports trade studies between competing design/process configurations or program approaches
– Measures sensitivity of a specific design/process to changes in specific performance parameters
• For Reliability Growth, TLCC Considers the Long-Term Cost Impacts of Identifying Failure Modes and Mitigating Their Root Causes Based on Where They Are Found and Corrected in the Overall Product/System Life Cycle
Typical Reliability Program Problems (TLCC)
• How Much Should Be Spent on the Reliability Program…
[Figure: Cost versus MTBF. Curves labeled C ACQ, C OM and C LCC, with the minimum of C LCC marked “Optimum TLCC”. Where C ACQ is negligible and C OM dominates: increase MTBF to improve TLCC. Where C OM is negligible and C ACQ dominates: decrease MTBF to improve TLCC.]
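The trade-off in the cost-versus-MTBF figure can be sketched numerically. The curve shapes below are assumptions chosen only to reproduce the qualitative picture: acquisition cost (taken here as the meaning of C ACQ) grows linearly with MTBF, and operation/maintenance cost (taken as C OM) falls inversely with MTBF. The coefficient names and values are hypothetical.

```python
import math


def life_cycle_cost(mtbf: float, acq_cost_per_mtbf_hour: float, om_cost_factor: float) -> float:
    """C LCC = C ACQ + C OM under assumed curve shapes.

    Assumptions: C ACQ = a * MTBF (more reliability costs more to acquire)
    and C OM = b / MTBF (more reliability costs less to operate/maintain).
    """
    if mtbf <= 0:
        raise ValueError("MTBF must be positive")
    c_acq = acq_cost_per_mtbf_hour * mtbf
    c_om = om_cost_factor / mtbf
    return c_acq + c_om


def optimum_mtbf(acq_cost_per_mtbf_hour: float, om_cost_factor: float) -> float:
    """MTBF that minimizes a * MTBF + b / MTBF, which is sqrt(b / a)."""
    return math.sqrt(om_cost_factor / acq_cost_per_mtbf_hour)


# Hypothetical coefficients.
a, b = 1_000.0, 10_000_000.0
best = optimum_mtbf(a, b)
for mtbf in (best / 2, best, best * 2):
    print(f"MTBF {mtbf:6.0f} h -> C LCC = ${life_cycle_cost(mtbf, a, b):,.0f}")
```

On either side of the optimum the total cost rises, so spending more to push MTBF past that point increases life cycle cost rather than reducing it.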
Typical Reliability Program Problems (TLCC)
• …and When/Where Should It Be Spent?
[Figure/table: columns give the phase where the FAILURE MODE was INTRODUCED; rows give the phase where the FAILURE MODE was IDENTIFIED/DISCOVERED. Axis annotations: “Increasing Design for Reliability (DFR) Effectiveness”; “Decreasing DFR Effectiveness and Increasing Cost to Mitigate”.]
Identified/Discovered \ Introduced | Requirements | Design | Code and Unit Test | SW Integration | SW Quality Test | System Integration and Test | SW Maintenance | TOTAL
Requirements | $1,515 | | | | | | | $1,515
Design | $11,810 | $1,555 | | | | | | $13,365
Code and Unit Test | $40,200 | $9,120 | $2,421 | | | | | $51,741
SW Integration | $20,000 | $42,000 | $15,250 | $37 | | | | $77,287
SW Quality Test | $19,100 | $22,300 | $37,000 | $70 | $1 | | | $78,471
System Integration and Test | $89,000 | $11,400 | $11,400 | $500 | 0 | $10 | | $112,310
SW Maintenance | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
TOTAL | $181,625 | $86,375 | $66,071 | $607 | $1 | $10 | 0 | $334,689
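The row and column totals in the matrix above can be recomputed, and the share of cost tied to failure modes that escaped the phase where they were introduced can be pulled out, with a short sketch. The numbers are copied from the slide; treating blank cells as zero and reading the dollar figures as cost attributed to each introduced/identified pair are assumptions, as are the variable names.

```python
PHASES = [
    "Requirements",
    "Design",
    "Code and Unit Test",
    "SW Integration",
    "SW Quality Test",
    "System Integration and Test",
    "SW Maintenance",
]

# COST[i][j]: dollars for failure modes identified in PHASES[i]
# that were introduced in PHASES[j]. Blank cells are assumed to be 0.
COST = [
    [1_515, 0, 0, 0, 0, 0, 0],
    [11_810, 1_555, 0, 0, 0, 0, 0],
    [40_200, 9_120, 2_421, 0, 0, 0, 0],
    [20_000, 42_000, 15_250, 37, 0, 0, 0],
    [19_100, 22_300, 37_000, 70, 1, 0, 0],
    [89_000, 11_400, 11_400, 500, 0, 10, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

n = len(PHASES)
total_by_identified = [sum(row) for row in COST]
total_by_introduced = [sum(COST[i][j] for i in range(n)) for j in range(n)]
grand_total = sum(total_by_identified)

# Cost for failure modes found in a later phase than the one that introduced them.
escaped_cost = sum(COST[i][j] for i in range(n) for j in range(n) if i > j)

print(f"Grand total: ${grand_total:,}")
print(f"Introduced in Requirements: {total_by_introduced[0] / grand_total:.1%} of total")
print(f"Found after the introducing phase: {escaped_cost / grand_total:.1%} of total")
```

Read this way, more than half of the total traces back to failure modes introduced in Requirements, and nearly all of it is spent on failure modes caught after the phase that introduced them.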
Typical Reliability Program Problems (ROI)
A:
1. Invest $1M in basic design (No Reliability Program)
2. Reliability = “X”
3. Invest $100K in Reliability Program
4. No design changes made
5. Reliability = “X”
6. Savings = $0
7. Investment = $100K
8. No ROI
B:
1. Invest $1M in basic design (No Reliability Program)
2. Reliability = “X”
3. Invest $100K in Reliability Program
4. Critical design changes made
5. Reliability = “2X”
6. Savings (2X Reliability) = $1.5M
7. Investment = $100K
8. ROI = 1500%
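The 1500% figure in scenario B is consistent with dividing savings by investment, which amounts to treating the $1.5M as already net of the $100K spent. That reading is an assumption; the slide does not say whether the savings are gross or net.

```latex
\mathrm{ROI}_B = \frac{\text{Savings}}{\text{Investment}}
             = \frac{\$1{,}500{,}000}{\$100{,}000} = 15 = 1500\%,
\qquad
\mathrm{ROI}_A = \frac{\$0}{\$100{,}000} = 0\%
```

If the $1.5M were gross savings, the conventional Simple ROI of (gains − cost) / cost would give 1400% for scenario B instead.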
Strategic R&M
Reliability Centered Maintenance-Based Strategy
Summary
• Reliability and Quality are Not Synonymous
• Reliability Includes Hardware, Software and Human Elements
• Root Failure Causes Include Hardware, Software, Human, Process, Documentation, Requirements and Management
• It All Starts with Requirements
• “Pay Now, Later or Much Later”
• Reliability Activities Must Impact the Design to Grow Reliability
• Certain Reliability Activities Only Ensure Inherent Reliability is Not Degraded
• Investing in a Reliability Program Does Not Guarantee an ROI
• Strategic Reliability Programs Optimize TLCC; They Do Not Maximize Reliability
Contact Information
David Nicholls
Director of RMQ Engineering
100 Seymour Rd – Suite C101
Utica, NY 13502-1311
315.351.4202
[email protected]