Use of Failure Mechanisms to Enhance FMEA and FMECA

Proceedings of the IEEE Workshop on Accelerated Stress Testing & Reliability (ASTR), Austin, Texas, October 3 - 5, 2005.

Identification and Utilization of Failure Mechanisms to Enhance FMEA and FMECA

Sathyanarayan Ganesan, Valérie Eveloy, Diganta Das, and Michael G. Pecht
CALCE Electronic Products and Systems Center
Department of Mechanical Engineering
University of Maryland, College Park, MD 20742

[email protected]

Abstract

Failure mechanisms are the processes by which physical, electrical, chemical and mechanical stresses induce failure. Knowledge of the failure mechanisms that cause product failure is essential to design and qualify reliable products. The standard Failure Modes and Effects Analysis (FMEA) and Failure Modes, Effects and Criticality Analysis (FMECA) procedures do not identify the product failure mechanisms and models, which limits their ability to provide meaningful input to critical procedures such as virtual qualification, root cause analysis, accelerated test programs, and remaining life assessment. This paper proposes a new methodology, namely Failure Modes, Mechanisms and Effects Analysis (FMMEA), which enhances the value of FMEA and FMECA by identifying high priority failure mechanisms and failure models. High priority failure mechanisms determine the operational stresses, and the environmental and operational parameters, that need to be controlled. Models for the failure mechanisms help in the design and development of the product. The proposed FMMEA methodology is applied to an electronic circuit board assembly mounted in an automotive underhood environment.

1. Introduction

The competitive marketplace places demands on manufacturers to identify cost-effective methods of improving the product development process. In particular, the industry has been interested in an efficient approach to understanding potential product failures that might affect product performance over time. Some organizations are either using or requiring the use of Failure Mode and Effects Analysis (FMEA) towards that goal, but companies are generally not satisfied with this methodology, except for the purpose of safety analysis.

FMEA was developed as a formal methodology in the 1950’s at Grumman Aircraft Corporation to analyze the safety of naval aircraft flight control systems [1, 2]. From the 1970’s through the 1990’s, a number of military and professional society standards and procedures were published to define the FMEA methodology [3, 4]. In 1971, the Electronic Industries Association (EIA) G-41 Committee on Reliability published a Bulletin entitled “Failure Mode and Effects Analysis” (FMECA) [5]. Mil-Std 1629, “Procedures for Performing a Failure Mode, Effects and Criticality Analysis,” was released by the United States (US) Department of Defense in 1974 [6]. FMECA was considered an extension of FMEA that included assessing the probability of occurrence and criticality of potential failure modes.

Today, the distinction between FMEA and FMECA is not as clear and both terms are used interchangeably [2, 7]. In 1985, the International Electrotechnical Commission (IEC) introduced IEC 812, “Analysis Techniques for System Reliability – Procedure for Failure Modes and Effects Analysis” [8]. The automotive industry adopted the FMEA practice in the late 1980’s, and in 1993, the Supplier Quality Requirements Task Force, comprising representatives from Chrysler, Ford and GM, introduced FMEA into quality manuals through the QS 9000 process [2]. In 1994, the Society of Automotive Engineers (SAE) published SAE J-1739, “Potential Failure Modes and Effects Analysis in Design and Potential Failure Modes and Effects Analysis in Manufacturing and Assembly Processes,” as a reference manual to provide general guidelines in preparing an FMEA [9]. In 1999, DaimlerChrysler, Ford and GM, as part of the International Automotive Task Force, agreed to recognize the international standard “ISO/TS 16949” [10], which included FMEA and would eventually replace QS 9000. FMEA is also one of the Six Sigma tools [11] and is part of the training of Six Sigma Green and Black Belts.

FMEA is now used across many industries and is referred to as System FMEA, Design FMEA, Process FMEA, Machinery FMEA, Functional FMEA, Interface FMEA and Detailed FMEA, depending upon its application. Although the purpose and terminology can vary according to the product and industry, the main objectives of all FMEA processes are to anticipate problems early in the development process, and either prevent such problems or minimize their consequences [9]. In this paper, FMEA is discussed in the context of its application to the design and use of a product.

2. Failure Modes and Effects Analysis

Electronic hardware is typically a combination of board,
components and interconnects, all with various failure mechanisms by which they can fail in the life-cycle environment. A potential failure mode is the manner in which a failure can occur - that is, the ways in which the item fails to perform its intended design function, or performs the function but fails to meet its objectives [9, 12]. Failure modes are closely related to the functional and performance requirements of the product.

The FMEA methodology is a systematic procedure to recognize and evaluate the potential failure of a product and its effects, and to identify actions that could eliminate or reduce the likelihood of the potential failure to occur [2]. The basic FMEA procedure consists of the following steps:

1) Identify elements or functions in the product
2) Identify all element or function failure modes
3) Determine the effect(s) of each failure mode and its severity
4) Determine the cause(s) of each failure mode and its probability of occurrence
5) Identify the current controls in place to prevent or detect the potential failure modes
6) Assess risk, prioritize failures and assign corrective actions to eliminate or mitigate the risk
7) Document the process

To achieve the greatest value, FMEA should be conducted before a failure mode has been unknowingly built into the product [13]. A typical design FMEA worksheet is shown in Figure 1. For risk assessment, an FMEA uses occurrence and detection probabilities in conjunction with severity criteria to develop a risk priority number (RPN). The RPN is the product of the severity, occurrence and detection ratings. The failure modes are prioritized by their calculated RPNs, and corrective actions are taken to mitigate the risk associated with each potential failure. Once corrective actions are implemented, the severity, occurrence and detection values are reassessed, and a new RPN is calculated. This process continues until the risk level is acceptable.
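The RPN calculation and ranking described above can be illustrated with a minimal Python sketch. The 1 to 10 rating scale, the class and function names, and the example ratings are assumptions made for illustration; they are not taken from the paper or from any particular FMEA standard.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One worksheet row; the 1-10 rating scale is an assumption for this sketch."""
    failure_mode: str
    severity: int
    occurrence: int
    detection: int

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # RPN is the product of severity, occurrence and detection.
        return self.severity * self.occurrence * self.detection


def prioritize(rows):
    """Return the rows sorted from highest to lowest RPN."""
    return sorted(rows, key=lambda row: row.rpn, reverse=True)


# Hypothetical ratings, for illustration only.
rows = [
    FmeaRow("trace corrosion", severity=7, occurrence=2, detection=3),
    FmeaRow("solder joint open", severity=8, occurrence=4, detection=5),
    FmeaRow("intermittent resistance change", severity=6, occurrence=3, detection=7),
]
ranked = prioritize(rows)
```

After a corrective action, the same row would be re-created with the reassessed ratings and the list re-ranked, mirroring the iterative loop in the text.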

A limitation of the FMEA and FMECA procedures is that neither identifies the product failure mechanisms and models in the analysis and reporting process. Failure mechanisms are the processes by which physical, electrical, chemical and mechanical stresses induce failure [14]. Investigation of the possible failure modes and mechanisms of the product aids in developing failure-free and reliable designs. A design team must be aware of the possible failure mechanisms to design hardware capable of withstanding loads without failing. Failure mechanisms and their related physical models are also important for planning tests and screens to audit nominal design and manufacturing specifications, as well as the level of defects introduced by excessive variability in manufacturing and material parameters. Without information on failure mechanisms, FMEA may not provide a meaningful input to critical procedures such as virtual qualification, root cause analysis, accelerated test programs, and remaining life assessment. Another potential shortcoming of the standard FMEA is that it does not use information on environmental and operating conditions at a quantitative level.

In reliability simulation, failure models are used to analytically estimate times to failure distributions. Reliability simulation at the product development stage is essential to reduce the product development cost and time by allowing the identification of weaknesses in the design and evaluating design options. Reliability simulation can only be technically and economically effective if it considers the appropriate failure mechanisms relevant to a particular design and application environment.

Root cause investigations require an understanding of possible failure mechanisms to guide the data collection for incident analysis, and the root cause hypothesis development and verification.

Accelerated testing is based upon the concept that a product will exhibit the same failure mechanism and mode in a short time under high stress conditions as it would exhibit in a longer time under actual life cycle stress conditions. Accelerated tests are used to precipitate failures during product development and qualification. Only with knowledge of the relevant failure mechanisms can one design appropriate tests (e.g., stress levels, physical architecture, and durations) that will precipitate the failures by the same mechanism without resulting in spurious failures. The accelerated test data can be analyzed to estimate times to failure in the field only when the mechanism, and the stresses that affect both the mechanism and the times to failure, are known and understood.

Health and usage monitoring of electronics involves the selection and placement of appropriate sensors into a product to monitor the loads experienced by the system. The constraints on physical space and interfaces available for data collection and transmission limit the number of sensors that can be integrated into a product. Therefore, a prioritized list of failure mechanisms and the environmental conditions that affect them needs to be established to ensure that the appropriate data is collected and utilized for remaining life assessment.

3. Failure Modes, Mechanisms and Effects Analysis Methodology

A novel approach, Failure Modes, Mechanisms and Effects Analysis (FMMEA), is proposed to address weaknesses in the traditional FMEA and FMECA processes. FMMEA is a systematic methodology to identify potential failure mechanisms and models for all potential failure modes, and to prioritize failure mechanisms. FMMEA enhances the value of the FMEA and FMECA methods by identifying high-priority failure mechanisms in order to create an action plan to mitigate their effects. High priority failure mechanisms determine the operational stresses and the environmental and operational parameters that need to be controlled. Models for the failure mechanisms help in the design and development of the product.

[Figure 1 is a blank worksheet titled "Potential Failure Mode and Effects Analysis (Design FMEA)". Header fields: System; Subsystem; Component; FMEA Number; Prepared By; FMEA Date; Design Lead; Key Date; Revision Date; Page _ of _; Core Team. Columns: Item / Function; Potential Failure Mode(s); Potential Effect(s) of Failure; Sev; Potential Cause(s) of Failure; Prob; Current Design Controls; Det; RPN; Recommended Action(s); Responsibility & Target Completion Date; Action Results (Actions Taken, New Sev, New Occ, New Det, New RPN).]

Figure 1. FMEA worksheet [15].

FMMEA is based on understanding the relationships between product requirements and the physical characteristics of the product (and their variation in the production process), the interactions of product materials with loads (stresses at application conditions) and their influence on the product susceptibility to failure with respect to the use conditions. This involves identifying the failure mechanisms and reliability models to quantitatively evaluate the susceptibility to failure. In addition to the information gathered and used for FMEA, FMMEA uses life cycle environmental and operating conditions and the duration of the intended application with knowledge of the active stresses and potential failure mechanisms. The steps of the FMMEA process are summarized in Figure 2, and described in the following sections.

[Figure 2 flowchart boxes: Identify life cycle environmental and operating conditions; Define system and identify elements and functions to be analyzed; Identify potential failure modes; Identify potential failure causes; Identify potential failure mechanisms; Identify failure models; Prioritize failure mechanisms; Document the process.]

3.1. System Definition, Elements and Functions

As illustrated in Figure 2, an FMMEA process begins by defining the system to be analyzed, which is viewed as a composite of subsystems or levels that are integrated to achieve a specific objective. The system is divided into various sub-systems or levels, continuing to the lowest possible level, which is referred to as component or element. The system breakdown can be performed either by function (i.e., according to what the system elements “do”), by location (i.e., according to where the system elements “are”), or by a combination of both (i.e., functional breakdown by location, or vice versa). In a printed circuit board, for example, a location breakdown would include the package, plated through-hole (PTH), metallization, and the board itself.

For each element, all of the associated functions are listed. For example, the primary function of a solder joint is to connect two materials mechanically and electrically. Hence, failure of a solder joint will relate to its inability to perform as a physical and electrical interconnection. Further analysis is conducted on each element thus identified.

3.2. Potential Failure Modes

For all the elements that have been identified, all possible failure modes are listed. For example, in a solder joint the potential failure modes are an open circuit or an intermittent change in resistance, either of which can hamper its functioning as an interconnect. In cases where information on possible failure modes is not available, potential failure modes may be identified using numerical stress analysis, accelerated tests to failure (e.g., HALT), past experience and engineering judgment [12]. A potential failure mode at one level may be the cause of a potential failure mode in a higher level system or subsystem, or be the effect of one in a lower level component.

Figure 2. FMMEA methodology.

3.3. Potential Failure Causes

A failure cause is defined as the circumstances during
design, manufacture, or use that lead to a failure mode [12]. For each failure mode, all possible ways a failure can result are listed. Failure causes are identified by finding the basic reason that may lead to a failure during design, manufacturing, storage, transportation or use. The failure causes can include environmental and operational conditions. In an automotive underhood environment, for example, solder joint failure modes such as open and intermittent change in resistance can be caused by temperature cycling, random vibration, and shock impact. Knowledge of the potential failure causes can help identify the failure mechanisms that drive the failure modes for a given element.

3.4. Potential Failure Mechanisms

Failure mechanisms are determined for each combination
of potential failure mode and cause based on known mechanisms [16]. Studies on electronic material failure mechanisms, and the application of physics-based damage models to the design of reliable electronic products include [17, 18].

The failure mechanisms identified are categorized as either overstress or wearout mechanisms. Catastrophic failures due to a single occurrence of a stress event when the intrinsic strength of the material is exceeded are termed overstress failures. Failure mechanisms due to monotonic accumulation of incremental damage beyond the endurance of the material are termed wearout mechanisms [12]. When the damage exceeds the endurance limit of the component, failure will occur. Unanticipated large stress events can either cause an overstress (catastrophic) failure, or shorten life by causing the accumulation of wearout damage. Examples of such stresses are accidental abuse and acts of God. On the other hand, in well-designed and high-quality hardware, stresses should cause only uniform accumulation of wearout damage; the threshold of damage required to cause eventual failure should not occur within the usage life of the product.

Failure mechanisms frequently occurring in electronics can be classified as electrical performance failures, thermal performance failures, mechanical performance failures, radiation failures, and chemical failures. Electrical performance failures can be caused by individual components with improper electrical parameters, such as resistance, impedance, capacitance, or dielectric properties, or by inadequate shielding from electromagnetic interference (EMI) or particle radiation. Failure modes can manifest as reversible drifts in transient and steady-state responses, such as delay time, rise time, attenuation, signal-to-noise ratio, and crosstalk. Electrical failures common in electronic hardware include overstress mechanisms due to electrical overstress (EOS) and electrostatic discharge (ESD), such as dielectric breakdown, junction breakdown, hot electron injection, surface and bulk trapping, and surface breakdown in semiconductor components, and wearout mechanisms such as electromigration and stress-driven diffusive voiding.

Thermal performance failures can arise due to poor optimization of the heat transfer chain in an electronic assembly. Thermal overstress failures are a result of heating a component beyond critical temperatures such as the glass-transition temperature, melting point, fictile point, or flash point. Some examples of thermal wearout failures are aging due to depolymerization, intermetallic growth, and interdiffusion. Failures due to inadequate thermal design can manifest as components operating at excessive temperature and causing operational parameters to drift beyond specifications, although the degradation is often reversible upon cooling. Such failures can be caused either by direct thermal loads or by electrical resistive loads, which in turn generate excessive localized thermal stresses.

Mechanical performance failures include those that may compromise the product performance without necessarily causing any irreversible material damage, such as abnormal elastic deformation in response to mechanical static loads,
abnormal transient response (such as natural frequency or damping) to dynamic loads, and abnormal time-dependent reversible (anelastic) response, as well as failures that cause material damage, such as buckling, brittle and/or ductile fracture, interfacial separation, fatigue crack initiation and propagation, creep, and creep rupture. For example, excessive elastic deformations in slender structures in electronic packages can sometimes constitute functional failure due to overstress loads, such as excessive flexing of interconnection wires, package lids, or flex circuits in electronic devices, causing shorting and/or excessive crosstalk. However, when the load is removed, the deformations (and consequent functional abnormalities) disappear completely without any permanent damage.

Radiation failures are principally caused by uranium and thorium contaminants, and secondary cosmic rays. Radiation can cause wearout, aging, embrittlement of materials, and overstress soft errors in electronic hardware, such as logic chips.

Chemical failures occur in adverse chemical environments that result in corrosion, oxidation, or ionic surface dendritic growth. There may also be interactions between different types of stresses. For example, metal migration may be accelerated in the presence of chemical contaminants and composition gradients, and thermal loads can accelerate a failure mechanism due to a thermal expansion mismatch.

3.5. Failure Models

Failure models use appropriate stress and damage analysis methods to evaluate the susceptibility to failure based on the time-to-failure or likelihood of a failure for a given geometry, material construction, and set of environmental and operational conditions. Table 1 provides a list of failure models for common failure mechanisms in electronics. Failure models for overstress mechanisms use stress analysis to estimate the likelihood of a failure based on a single exposure to a defined stress condition. The simplest formulation for an overstress model is the comparison of an induced stress versus the strength of the material that must sustain the stress.

Table 1. Examples of failure mechanisms in electronics, relevant loads, and models [19].

Failure Mechanism | Failure Sites | Relevant Loads | Sample Model
Fatigue | Die attach, Wirebond/TAB, Solder leads, Bond pads, Traces, Vias/PTHs, Interfaces | ΔT, Tmean, dT/dt, dwell time, ΔH, ΔV | Nonlinear Power Law (Coffin-Manson)
Corrosion | Metallizations | M, ΔV, T | Eyring (Howard)
Electromigration | Metallizations | T, J | Eyring (Black)
Conductive Filament Formation | Between Metallizations | M, ∇V | Power Law (Rudra)
Stress Driven Diffusion Voiding | Metal Traces | s, T | Eyring (Okabayashi)
Time Dependent Dielectric Breakdown | Dielectric layers | V, T | Arrhenius (Fowler-Nordheim)

Δ: Cyclic range; ∇: Gradient; V: Voltage; T: Temperature; s: Stress; M: Moisture; J: Current density; H: Humidity
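The simplest overstress formulation, a comparison of induced stress against strength, can be sketched as follows. The function names are hypothetical. The two temperatures are the laminate glass transition temperature and the SAE maximum underhood ambient temperature quoted later in the paper; treating ambient air temperature as the induced thermal load is a simplification made only for this illustration.

```python
def overstress_margin(induced_stress: float, strength: float) -> float:
    """Return strength minus induced stress, in the units of the inputs."""
    return strength - induced_stress


def precipitates_failure(induced_stress: float, strength: float) -> bool:
    """Simplest overstress model: failure is predicted when stress reaches strength."""
    return overstress_margin(induced_stress, strength) <= 0.0


# Values quoted in the paper, both in degrees Celsius:
# laminate glass transition temperature (strength) and
# SAE maximum underhood ambient temperature (induced stress).
GLASS_TRANSITION_C = 150.0
MAX_AMBIENT_C = 121.0

margin_c = overstress_margin(MAX_AMBIENT_C, GLASS_TRANSITION_C)
fails = precipitates_failure(MAX_AMBIENT_C, GLASS_TRANSITION_C)
```

A positive margin means this single-exposure check predicts no overstress failure; it says nothing about wearout under repeated exposure.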

In the case of wearout failures, damage is accumulated over a period until the item is no longer able to withstand the applied load. Wearout mechanisms are analyzed using both stress and damage analysis to calculate the time required to induce failure based on a defined stress condition. An appropriate method for combining multiple conditions must be determined for assessing the time to failure. Sometimes, the damage due to the individual loading conditions may be analyzed separately, and the failure assessment results may be combined in a cumulative manner [4].
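One common way to combine damage from several loading conditions in a cumulative manner is a linear damage sum (Palmgren-Miner rule). The paper does not state which combination rule it uses, so the sketch below is an assumption, and the cycles-to-failure values are hypothetical. The 730 thermal cycles per year correspond to two trips per day.

```python
def damage_per_year(exposures):
    """Sum the damage fractions n_i / N_i over all loading conditions.

    exposures: iterable of (cycles_applied_per_year, cycles_to_failure) pairs.
    """
    total = 0.0
    for cycles_per_year, cycles_to_failure in exposures:
        if cycles_to_failure <= 0:
            raise ValueError("cycles_to_failure must be positive")
        total += cycles_per_year / cycles_to_failure
    return total


def years_to_failure(exposures, damage_limit=1.0):
    """Time for the accumulated damage to reach damage_limit (1.0 under Miner's rule)."""
    rate = damage_per_year(exposures)
    if rate == 0.0:
        return float("inf")
    return damage_limit / rate


# Hypothetical cycles-to-failure values, for illustration only.
exposures = [
    (730, 7_300),              # temperature cycling: 2 trips/day * 365 days
    (1_000_000, 20_000_000),   # vibration cycles
]
rate = damage_per_year(exposures)   # about 0.15 damage per year
ttf = years_to_failure(exposures)   # about 6.7 years
```

Linear summation ignores load-sequence effects and interactions between mechanisms, which is one reason the text cautions that an appropriate combination method must be determined case by case.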

Analysis of the system’s susceptibility to failure may be limited by the availability and accuracy of the models used for quantifying the time to failure of the system. It may also be limited by the ability to combine the results of multiple failure models for a single failure site and the ability to combine the results of a single model for multiple stress conditions [12]. If no failure model is available, the appropriate parameter(s) to monitor can be selected based on an empirical model developed from prior field failure data or models derived from accelerated testing.

3.6. Failure Mechanism Prioritization

Ideally all failure mechanisms and their interactions must
be considered for product design and analysis. In the life cycle of a product, several failure mechanisms may be activated by different environmental and operational parameters acting at various stress levels, but in general only a few operational and environmental parameters, and failure mechanisms, are responsible for the majority of the failures. High priority failure mechanisms determine the operational stresses, and environmental and operational parameters, that must be accounted for in the design or be controlled. Prioritization of the failure mechanisms provides an opportunity for effective utilization of resources. The methodology for failure mechanism prioritization is shown in Figure 3.

Initial prioritization of all potential failure mechanisms is based upon environmental and operating conditions. If the stress levels generated by certain operational and environmental conditions are non-existent or negligible, the failure mechanisms that are exclusively dependent on those environmental and operating conditions are assigned a “low” risk level and are eliminated from further consideration.
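The initial screening step can be sketched as a simple filter. The data layout (a mapping from each mechanism to the loads that drive it), the function name, and the load labels are assumptions for illustration; the example mechanisms echo those discussed in the paper.

```python
def initial_prioritization(mechanisms, negligible_loads):
    """Split mechanisms into those retained and those assigned a "low" risk level.

    A mechanism is screened out only if every load that drives it is
    negligible, i.e. it depends exclusively on negligible conditions.
    """
    negligible = set(negligible_loads)
    retained, low_risk = [], []
    for name, driving_loads in mechanisms.items():
        loads = set(driving_loads)
        if loads and loads <= negligible:
            low_risk.append(name)
        else:
            retained.append(name)
    return retained, low_risk


# Hypothetical load labels, for illustration only.
mechanisms = {
    "solder joint fatigue": {"temperature cycling", "random vibration"},
    "electrostatic discharge": {"high voltage"},
    "electromagnetic interference": {"electrical transients"},
}
negligible_loads = {"high voltage", "electrical transients", "radiation"}

retained, low_risk = initial_prioritization(mechanisms, negligible_loads)
```

Mechanisms with no listed loads are retained rather than screened out, so that missing information does not silently remove a mechanism from the analysis.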

For all the failure mechanisms remaining after the initial prioritization, the susceptibility to failure by those mechanisms is evaluated using the identified failure models when such models are available. For overstress mechanisms, the susceptibility to failure is evaluated by conducting a stress analysis to determine if failure is precipitated under the given environmental and operating conditions. For wearout mechanisms, the susceptibility to failure is evaluated by determining the time-to-failure under the given environmental and operating conditions. To determine the combined effect of all wearout failures, the overall time-to-failure is evaluated with all wearout mechanisms acting simultaneously. In cases where no failure model is available, the evaluation is based on past experience, manufacturer data, or handbooks.

[Figure 3 flowchart boxes: Potential failure mechanisms; Initial prioritization; Evaluate failure susceptibility; Evaluate occurrence; Evaluate severity; Final prioritization; RPN; High risk, Medium risk, Low risk.]

Figure 3. Failure mechanism prioritization.

After evaluation of the susceptibility to failure, occurrence ratings are assigned to the failure mechanisms for the environmental and operating conditions experienced by the system. The occurrence ratings are defined in Table 2. For overstress failure mechanisms, the highest occurrence rating, namely “frequent”, is assigned to mechanisms that actually precipitate failure, and the lowest occurrence rating, namely “extremely unlikely”, is assigned to overstress mechanisms that do not precipitate any failure. For wearout failure mechanisms, the ratings are assigned based on a comparison of the individual time-to-failure (TTF) for a given wearout mechanism with the overall time-to-failure, expected product life, past experience and engineering judgment. “Frequent”, “reasonably probable”, “occasional”, “remote” and “extremely unlikely” ratings are assigned to wearout failure mechanisms with very low, low, moderate, high, and very high TTF, respectively.
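The rating assignment can be sketched as below. The paper does not give numerical thresholds separating “very low” from “low” TTF and so on, so the breakpoints, expressed as a ratio of TTF to expected product life, are purely hypothetical, as are the function names.

```python
RATINGS = [
    "Frequent",
    "Reasonably Probable",
    "Occasional",
    "Remote",
    "Extremely Unlikely",
]


def overstress_occurrence(precipitates_failure: bool) -> str:
    """Overstress mechanisms receive either the highest or the lowest rating."""
    return RATINGS[0] if precipitates_failure else RATINGS[-1]


def wearout_occurrence(ttf_years, product_life_years,
                       breakpoints=(0.5, 1.0, 2.0, 5.0)):
    """Map the TTF / product-life ratio to a rating.

    The breakpoints are hypothetical; in practice they would be set from
    past experience and engineering judgment, as the text notes.
    """
    ratio = ttf_years / product_life_years
    for upper_bound, rating in zip(breakpoints, RATINGS):
        if ratio < upper_bound:
            return rating
    return RATINGS[-1]
```

For example, with an assumed 10-year product life, a mechanism with a 3-year TTF falls in the lowest band and is rated “Frequent”, while one with a 100-year TTF is rated “Extremely Unlikely”.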

To provide a qualitative measure of the impact of the failures, each failure mechanism is assigned a severity rating. The impact of the failure is firstly assessed at the lowest level of the system being analyzed, followed by the immediately higher level, and the other intermediate levels, up to system level [9]. The severity ratings are defined in Table 3. Their assignment is primarily based on the impact of the failure mechanism on safety and on the end system functionality. Past experience and engineering judgment may also be used in assigning severity ratings. In rating the severity of a failure, the possible worst case consequence is assumed for the failure mechanism considered.



Table 2. Occurrence ratings.

Rating | Criteria
Frequent | Overstress failure or very low TTF
Reasonably Probable | Low TTF
Occasional | Moderate TTF
Remote | High TTF
Extremely Unlikely | No overstress failure or very high TTF

Table 3. Severity ratings.

Rating | Criteria
Very high or catastrophic | System failure or safety-related catastrophic failure
High | Loss of function or severe injury
Moderate or significant | Gradual performance degradation or minor injury
Low or minor | System operable at reduced performance or no injury
Very low or none | Minor nuisance

Table 4. Risk matrix.

Severity \ Occurrence | Frequent | Reasonably Probable | Occasional | Remote | Extremely Unlikely
Very high or catastrophic | High risk | High risk | High risk | Moderate risk | Moderate risk
High | High risk | High risk | Moderate risk | Moderate risk | Low risk
Moderate or significant | High risk | Moderate risk | Moderate risk | Low risk | Low risk
Low or minor | High risk | Moderate risk | Low risk | Low risk | Low risk
Very low or none | Moderate risk | Moderate risk | Low risk | Low risk | Low risk

A “very high or catastrophic” severity rating is assigned to a failure mode that may involve loss of life or complete failure of the system. A “high” severity rating is assigned to a failure mode that might cause severe injury or a loss of function of the system. A “moderate or significant” rating is assigned to failure modes that may cause minor injury or gradual degradation in performance over time through loss of availability. A “low or minor” rating is assigned to a failure mode that causes no injury or that results in the system operating at reduced performance. A “very low or none” severity rating is associated with a failure that does not cause any injury and has no impact on the system, or may be a minor nuisance.

The final prioritization of the failure mechanisms is performed by rating the failure mechanisms according to three risk levels, namely “low”, “moderate” and “high”, using the risk matrix presented in Table 4. In principle, all failure mechanisms with a “high risk” level are high priority
mechanisms that need to be accounted for and controlled. Further prioritization within a given risk level may be performed depending on the product type, use conditions, or needs and objectives of the organization.
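The final prioritization amounts to a table lookup. The sketch below transcribes the risk matrix of Table 4 into a dictionary; the variable and function names are hypothetical.

```python
OCCURRENCE = [
    "Frequent",
    "Reasonably Probable",
    "Occasional",
    "Remote",
    "Extremely Unlikely",
]

# Rows transcribed from Table 4; columns follow the order of OCCURRENCE.
RISK_MATRIX = {
    "Very high or catastrophic": ["High", "High", "High", "Moderate", "Moderate"],
    "High": ["High", "High", "Moderate", "Moderate", "Low"],
    "Moderate or significant": ["High", "Moderate", "Moderate", "Low", "Low"],
    "Low or minor": ["High", "Moderate", "Low", "Low", "Low"],
    "Very low or none": ["Moderate", "Moderate", "Low", "Low", "Low"],
}


def risk_level(severity: str, occurrence: str) -> str:
    """Look up the risk level for a severity and occurrence rating."""
    return RISK_MATRIX[severity][OCCURRENCE.index(occurrence)]


def high_priority(mechanisms):
    """Return the names of mechanisms whose risk level is "High".

    mechanisms: iterable of (name, severity, occurrence) tuples.
    """
    return [name for name, sev, occ in mechanisms
            if risk_level(sev, occ) == "High"]
```

Unknown rating labels raise `KeyError` or `ValueError` rather than defaulting to a risk level, which keeps mistyped worksheet entries from being silently classified.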

3.7. Documentation

The FMMEA process facilitates the organization,
distribution, and analysis of failure data. In addition, FMMEA also documents the corrective actions considered and implemented based on the results of the FMMEA. After corrective actions are implemented, the FMMEA can be maintained and updated to generate a new list of high priority failure mechanisms.

For products already developed and manufactured, a root-cause analysis of failures that occur during testing and usage may be conducted, and corrective actions taken to eliminate or reduce the impacts of the failures. The documented history and lessons learned provide a framework for FMMEA of future products or future product versions.

3.8. Application of FMMEA to an Electronic Assembly

A printed circuit board (PCB) assembly used in an automotive application was selected to demonstrate the FMMEA process. This assembly consisted of an FR-4 PCB with copper metallization, plated through-holes (PTH), and eight surface mount inductors that were soldered onto the PCB pads using 63Sn-37Pb solder. This assembly was mounted in the engine compartment of a 1997 Toyota 4Runner, and was mechanically connected to the compartment at all four corners of the PCB. Assembly failure was defined as one that would result in breakdown, or no current passage in the event detector circuit. To detect failure, the PTHs were solder filled and an event detector circuit was connected in series with all inductors through the PTHs. The assembly was powered independently from the automobile electrical system using a three-volt battery source. It was verified that no external high current, voltage, magnetic or radiation sources significantly affected the assembly.

The FMMEA worksheet for this application is shown in Table 5, which details the system elements, failure modes, causes and mechanisms, models, susceptibility, occurrence, severity and risk. For all of the elements listed, the corresponding functions, the potential failure modes and their physical locations were identified. For example, for solder joint interconnections the potential failure modes are open and intermittent change in resistance.

For demonstration purposes, it was assumed that the board and its components, and the failure test apparatus, were defect free. This is a valid assumption if proper screening is conducted after manufacture. In addition, it was assumed that no damage was done to the assembly after manufacture. Based on these assumptions, potential failure causes were identified for the failure modes. For example, for the solder joint interconnections, potential failure causes for open and intermittent change in resistance are temperature cycling, random vibration or sudden shock impact caused by vehicle collision.

Based on the potential failure causes that were associated
with the failure modes, the corresponding failure mechanisms were identified. For the solder joints, for example, the mechanisms driving an open circuit and intermittent change in resistance were solder joint fatigue and fracture.

Appropriate failure models were identified for the failure mechanisms from the literature. Product geometry was obtained from design specification, board layout drawing and component manufacturer data sheets. For example, for solder joint fatigue, Coffin-Manson’s [20] model was used for stress and damage analysis due to temperature cycling.
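For reference, the general strain-based Coffin-Manson relation for low-cycle fatigue is given below. The paper does not state the specific form or constants used for the solder joint analysis, so this is the textbook form rather than the authors' implementation. Here $\Delta\varepsilon_p$ is the plastic strain range per cycle, $\varepsilon_f'$ the fatigue ductility coefficient, $c$ the (negative) fatigue ductility exponent, and $N_f$ the number of cycles to failure.

```latex
\frac{\Delta\varepsilon_p}{2} = \varepsilon_f' \,(2 N_f)^{c}
\qquad\Longrightarrow\qquad
N_f = \frac{1}{2}\left(\frac{\Delta\varepsilon_p}{2\,\varepsilon_f'}\right)^{1/c}
```

Because $c$ is negative, a larger strain range per temperature cycle gives a shorter fatigue life, which is why the temperature cycling range is a key load for solder joint fatigue.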

After all potential failure modes, causes, mechanisms and models were identified for each element, an initial prioritization of the failure mechanisms was made based on the life cycle environmental and operating conditions. Temperature, vibration and humidity conditions were based on estimates provided by the Society of Automotive Engineers (SAE) environmental handbook for automotive underhood environments [21], as no corresponding manufacturer field data were available for automotive underhood environments in the Washington, DC area. The SAE handbook specifies a maximum ambient temperature of 121ºC, and a maximum relative humidity of 98% at 38ºC, for automotive underhood environments [21]. The average daily maximum and minimum temperatures in the Washington, DC area over the duration of the study were 27ºC and 16ºC, respectively [22]. The maximum shock level was assumed to be 45G for 3 milliseconds. The car was assumed to operate for an average of three hours per day, divided into two trips of equal duration in the Washington, DC area. Failures induced by electrical overstress (EOS) and electrostatic discharge (ESD) were assigned a “low” risk level for the test assembly under analysis, considering both the absence of active devices and the low voltage level supplied by the battery. Electromagnetic interference (EMI) was also assigned a “low” risk level, as the circuit function was not susceptible to transients.

The occurrence ratings of the wearout failure mechanisms were determined by comparing the time-to-failure of each wearout mechanism with the overall time-to-failure obtained with all wearout mechanisms acting simultaneously. The times-to-failure were calculated using calcePWA¹. In the absence of a failure model for wearout of the inductor winding insulation, the occurrence rating was derived from inductor failure rate data published by Telcordia [23]. From prior knowledge of PCB pad fatigue, this mechanism was assigned a “remote” occurrence rating. The calcePWA predictions indicated that a shock level of 45G for 3 ms would produce no interconnect or board failure. Shock failure mechanisms were therefore assigned an “extremely unlikely” occurrence rating. Since no shock failure was expected to affect the board and second level interconnections, it was assumed that shock would not cause pad failure either, and this pad failure mechanism was also assigned an “extremely unlikely” rating. As the board laminate glass transition temperature, namely 150ºC, exceeded the estimated maximum ambient air temperature, 121ºC [21], glass transition was assigned an “extremely unlikely” rating.
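The comparison of individual and overall times-to-failure can be sketched with the two interconnect values reported in Table 5 (170 days for temperature cycling fatigue and 43 days for random vibration fatigue). The sketch assumes linear damage accumulation, so that the damage rates of the mechanisms simply add; this combination rule is an assumption made here for illustration, since the text states only that the times were computed with calcePWA. Mechanisms with much longer lives are omitted for brevity, and the function names are hypothetical.

```python
def combined_time_to_failure(times_to_failure_days):
    """Overall time to failure when damage rates (1 / TTF) add linearly."""
    total_rate = sum(1.0 / t for t in times_to_failure_days.values())
    return 1.0 / total_rate


def damage_fractions(times_to_failure_days):
    """Share of the total damage contributed by each mechanism at failure."""
    overall = combined_time_to_failure(times_to_failure_days)
    return {name: overall / t for name, t in times_to_failure_days.items()}


# Times-to-failure in days, taken from the interconnect rows of Table 5.
ttf_days = {
    "solder joint fatigue (temperature cycling)": 170.0,
    "solder joint fatigue (random vibration)": 43.0,
}

overall_days = combined_time_to_failure(ttf_days)
fractions = damage_fractions(ttf_days)

print(f"overall time to failure: {overall_days:.1f} days")
for name, share in fractions.items():
    print(f"{name}: {share:.0%} of accumulated damage")
```

Under this assumption, a mechanism whose individual time-to-failure is close to the overall value dominates the damage, which is the sense in which it would earn a higher occurrence rating.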

In terms of severity rating, as the PTHs were only used as terminations for the inductors, a short or open PTH would have no impact on circuit functionality. Consequently, this failure mechanism was assigned a “very low” severity rating. For all other elements, any failure mode would impact circuit functionality. Hence, all failure modes for all other elements were assigned a “very high” severity rating.

Using the risk matrix presented in Table 4, of all the failure mechanisms considered, solder joint fatigue due to temperature cycling and solder joint fatigue due to random vibration were the only mechanisms associated with a high risk, and were thus considered high priority. This was confirmed by the corresponding field experiment, in which the board assembly failed by combined solder joint thermal and vibrational fatigue [24].
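The final selection step can be sketched as a lookup followed by a filter. The mapping below contains only the occurrence and severity pairs that actually appear in Table 5, together with the risk levels listed there; it is not the complete risk matrix of Table 4. The class and function names are hypothetical, and the worksheet is an abridged subset of the table rows.

```python
from typing import NamedTuple


class Mechanism(NamedTuple):
    element: str
    cause: str
    mechanism: str
    occurrence: str
    severity: str


# Partial risk matrix: only the (occurrence, severity) pairs seen in Table 5.
OBSERVED_RISK = {
    ("Frequent", "Very high"): "High",
    ("Remote", "Very high"): "Moderate",
    ("Extremely unlikely", "Very high"): "Moderate",
    ("Remote", "Very low"): "Low",
    ("Occasional", "Very low"): "Low",
}


def risk_level(entry):
    """Look up the risk for one worksheet entry (KeyError if pair is unknown)."""
    return OBSERVED_RISK[(entry.occurrence, entry.severity)]


def high_priority(entries):
    """Keep only the mechanisms whose risk level is 'High'."""
    return [e for e in entries if risk_level(e) == "High"]


# Abridged worksheet, copied from rows of Table 5.
worksheet = [
    Mechanism("PTH", "Temperature cycling", "Fatigue", "Remote", "Very low"),
    Mechanism("Interconnect", "Temperature cycling", "Fatigue", "Frequent", "Very high"),
    Mechanism("Interconnect", "Random vibration", "Fatigue", "Frequent", "Very high"),
    Mechanism("Interconnect", "Sudden impact", "Shock", "Extremely unlikely", "Very high"),
    Mechanism("PCB", "High relative humidity", "CFF", "Occasional", "Very low"),
    Mechanism("PCB", "Random vibration", "Fatigue", "Remote", "Very high"),
]

priority = high_priority(worksheet)
for entry in priority:
    print(f"{entry.element}: {entry.mechanism} due to {entry.cause.lower()}")
```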

4. Summary

FMMEA allows the design team to merge the available scientific knowledge of failure mechanisms with the systematic features of the FMEA template, in keeping with a “design for reliability” philosophy. The part of FMEA that is incorporated in FMMEA makes the identification process systematic, so that all elements are considered and nothing is overlooked. The idea of prioritization embedded in the FMEA process is also used in FMMEA to identify the mechanisms that are likely to cause failures during the product life cycle.

FMMEA differs from FMEA in a few respects. In FMEA, potential failure modes are examined individually, and the combined effects of coexisting failure causes are not considered. FMMEA, on the other hand, considers the impact of failure mechanisms acting simultaneously. FMEA involves precipitation and detection of failure for updating and calculating the Risk Priority Number (RPN), and cannot be applied in cases that involve continuous monitoring of performance degradation over time. By contrast, FMMEA does not require the failure to be precipitated and detected, and the uncertainties associated with estimating detection are not present. FMEA does not use environmental and operating conditions at a quantitative level; at best, these conditions are used to eliminate certain failure modes. FMMEA prioritizes the failure mechanisms using information on the stress levels of environmental and operating conditions, in order to identify high priority mechanisms that must be accounted for in the design or be controlled. This prioritization overcomes the shortcomings of the RPN prioritization used in FMEA, which can provide a false sense of granularity. Thus the use of FMMEA provides additional quantitative information regarding product reliability, and opportunities for improvement, as it takes into account specific failure mechanisms and the stress levels induced by environmental and operating conditions in the analysis process.
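The “false sense of granularity” of the RPN can be illustrated with a short enumeration. The sketch assumes the conventional definition of the RPN as the product of severity, occurrence and detection ratings, each on an integer scale from 1 to 10; rating scales differ between standards, so this scale is an assumption made for illustration.

```python
from collections import Counter
from itertools import product

RATINGS = range(1, 11)  # assumed 1-10 integer scale for each rating

# Count how many (severity, occurrence, detection) triples give each RPN.
rpn_counts = Counter(s * o * d for s, o, d in product(RATINGS, repeat=3))
rpn_values = set(rpn_counts)
distinct = len(rpn_values)

print(f"{sum(rpn_counts.values())} rating combinations, {distinct} distinct RPN values")
print(f"combinations that share RPN = 60: {rpn_counts[60]}")
```

Although the RPN nominally ranges from 1 to 1000, many values in that range can never occur (any prime above 10, for example), and very different rating combinations collapse onto the same number.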

1 A physics-of-failure based virtual reliability assessment tool developed by CALCE Electronic Products and Systems Center, University of Maryland.


Table 5. FMMEA worksheet for a printed circuit board assembly mounted in an automotive underhood environment.

| Element | Potential failure mode | Potential failure cause | Potential failure mechanism | Mechanism type | Failure model | Failure susceptibility | Occurrence | Severity | Risk |
|---|---|---|---|---|---|---|---|---|---|
| PTH | Electrical open in PTH | Temperature cycling | Fatigue | Wearout | CALCE PTH barrel thermal fatigue [25] | > 10 years | Remote | Very low | Low |
| Metallization | Electrical short / open, or change in electrical resistance | High temperature | Electromigration | Wearout | Black [26] | > 10 years | Remote | Very high | Moderate |
| Metallization | Electrical short / open, or change in electrical resistance | High relative humidity | Corrosion | Wearout | Howard [27] | > 10 years | Remote | Very high | Moderate |
| Metallization | Electrical short / open, or change in electrical resistance | Ionic contamination | Corrosion | Wearout | Howard [27] | > 10 years | Remote | Very high | Moderate |
| Component (Inductor) | Electrical short / open between windings and core | High temperature | Wearout of winding insulation | Wearout | No model | --- | Remote* | Very high | Moderate |
| Interconnect | Electrical open, or intermittent change in electrical resistance | Temperature cycling | Fatigue | Wearout | Coffin-Manson [20] | 170 days | Frequent | Very high | High |
| Interconnect | Electrical open, or intermittent change in electrical resistance | Random vibration | Fatigue | Wearout | Steinberg [28] | 43 days | Frequent | Very high | High |
| Interconnect | Electrical open, or intermittent change in electrical resistance | Sudden impact | Shock | Overstress | Steinberg [28] | No failure | Extremely unlikely | Very high | Moderate |
| PCB | Electrical short between PTHs | High relative humidity | CFF | Wearout | Rudra and Pecht [29] | 4.6 years | Occasional | Very low | Low |
| PCB | Crack / fracture | Random vibration | Fatigue | Wearout | Basquin [28] | > 10 years | Remote | Very high | Moderate |
| PCB | Crack / fracture | Sudden impact | Shock | Overstress | Steinberg [28] | No failure | Extremely unlikely | Very high | Moderate |
| PCB | Loss of polymer strength | High temperature | Glass transition | Overstress | No model | No failure | Extremely unlikely | Very high | Moderate |
| PCB | Open | Discharge of high voltage through dielectric material | EOS/ESD | Overstress | No model | Eliminated in first level prioritization | | | Low |
| PCB | Excessive noise | Proximity to high current or magnetic source | EMI | Overstress | No model | Eliminated in first level prioritization | | | Low |
| Pad | Lift / crack | Temperature cycling / Random vibration | Fatigue | Wearout | No model | --- | Remote | Very high | Moderate |
| Pad | Lift / crack | Sudden impact | Shock | Overstress | --- | --- | Extremely unlikely | Very high | Moderate |

* Based on failure rate data for inductors from Telcordia [23].



FMMEA has the potential to offer several benefits to organizations. It provides specific information on stress conditions, so that acceptance and qualification tests yield usable results. The use of failure models at the development stage of a product also allows appropriate “what-if” analysis of proposed technology upgrades. FMMEA can also support several design and development steps considered to be best practices, which can only be performed or enhanced by using knowledge of failure mechanisms and models. These steps include virtual qualification, accelerated testing, root cause analysis, life consumption monitoring and prognostics. The technological and economic benefits of these practices are better realized through the adoption of FMMEA.

FMMEA enhances the value of FMEA by identifying and evaluating the relevant failure mechanisms and models using the stress levels of environmental and operating conditions. It provides a high return on investment by providing knowledge about possible failures and their causes in a quantifiable manner. While FMEA and FMECA are often implemented as a standard requirement or contractual obligation, FMMEA makes the process useful by incorporating scientific knowledge regarding failure mechanisms and models.

5. References

1. Coutinho, J. S., “Failure-Effect Analysis”, Trans. New York Academy of Sciences, Vol. 26, 1964, pp. 564-585.

2. Bowles, J.B., “Fundamentals of Failure Modes and Effects Analysis,” Tutorial Notes Annual Reliability and Maintainability Symposium, 2003.

3. Kara-Zaitri, C., Keller, A.Z., Fleming, P.V., “A Smart Failure Mode and Effect Analysis Package,” Annual Reliability and Maintainability Symposium Proceedings, pp. 414 - 421, 1992.

4. “Guidelines for Failure Mode and Effects Analysis for Automotive, Aerospace and General Manufacturing Industries,” Dyadem Press, Ontario, Canada, 2003.

5. Electronic Industries Association, “Failure Mode and Effect Analyses”, Electronic Industries Association G-41 Committee on Reliability, Reliability Bulletin No. 9, November 1971.

6. United States Department of Defense, “Procedures For Performing A Failure Mode Effects and Criticality Analysis”, US Mil-Std-1629 (ships), November 1, 1974, US Mil-Std-1629A, November 24, 1980, US Mil-Std-1629A/Notice 2, November 28, 1984.

7. Bowles, J.B. and Bonnell, R.D., “Failure Modes, Effects and Criticality Analysis – What Is It and How To Use It,” Tutorial Notes Annual Reliability and Maintainability Symposium, 1998.

8. International Electrotechnical Commission, “Analysis Techniques for system reliability—Procedure for failure mode and effects analysis (FMEA)”, International Electrotechnical Commission, IEC Standard Pub. 812, 1985.

9. SAE Standard SAE J1739 “Potential Failure Mode and Effects Analysis in Design (Design FMEA) and Potential Failure Mode and Effects Analysis in Manufacturing and Assembly Processes (Process FMEA) and Effects Analysis for Machinery (Machinery FMEA)” August 2002.

10. ISO, “ISO/TS 16949 - The Harmonized Standard for the Automotive Supply Chain,” ISO, 2002.

11. Signor, M.C., “The Failure-Analysis Matrix: a Kinder, Gentler Alternative to FMEA for Information Systems,” Annual Reliability and Maintainability Symposium Proceedings, pp. 173-177, January 2002.

12. IEEE Standard 1413.1-2002, IEEE Guide for Selecting and Using Reliability Predictions Based on IEEE 1413, 2003.

13. JEDEC Publication JEP 131 “Process Failure Modes and Effects Analysis (FMEA),” February 1998.

14. Hu, J., Barker, D., Dasgupta, A., and Arora, A., “Role of Failure-mechanism Identification in Accelerated Testing,” Journal of the IES, Vol. 36, No. 4, pp. 39-45, July 1993.

15. Failure Modes and Effects Analysis (FMEA): “A Guide for Continuous Improvement for the Semiconductor Equipment Industry,” Technology Transfer #92020963B-ENG, SEMATECH, 1992.

16. JEDEC Publication JEP 148 “Reliability Qualification of Semiconductor Devices Based on Physics of Failure Risk and Opportunity Assessment,” April 2004.

17. Dasgupta, A. and Pecht, M., “Material Failure Mechanisms and Damage Models,” IEEE Transactions on Reliability, Vol. 40, No. 5, pp. 531-536, December 1991.

18. JEDEC Publication JEP 122-B “Failure Mechanisms and Models for Semiconductor Devices,” August 2003.

19. Lall, P., Pecht, M., and Hakim, E., “Influence of Temperature on Microelectronics and System Reliability”, CRC Press, New York, 1997.

20. Foucher, B., Boullie, J., Meslet, B., Das, D., “A Review of Reliability Predictions Methods for Electronic Devices,” Microelectronics Reliability, Vol. 42, No. 8, pp. 1155-1162, August 2002.

21. Society of Automotive Engineers, Recommended Environmental Practices for Electronic Equipment Design, SAE J1211, Rev. Nov 1978.

22. Monthly Temperature Averages for the Washington, DC Area, <http://www.weather.com/weather/climatology/monthly/USDC0001> accessed August 17, 2003.

23. Telcordia Technologies, Special Report SR-332: “Reliability Prediction Procedure for Electronic Equipment Issue 1,” Telcordia Customer Service, Piscataway, N. J., May 2001.

24. Ramakrishnan, A., and Pecht, M., “A Life Consumption Monitoring Methodology for Electronic Systems,” IEEE Transactions on Components and Packaging Technologies, Vol. 26, No. 3, pp. 625-634, 2003.

25. Bhandarkar, S.M., et al., “Influence of Selected Design Variables on Thermomechanical Stress Distributions in Plated Through Hole Structures,” Transactions of the ASME - Journal of Electronic Packaging, Vol. 114, pp. 8-13, March 1992.

26. Black, J.R., “Physics of Electromigration,” IEEE Proceedings of International Reliability Physics Symposium, pp. 142-149, 1983.

27. Howard, R.T., “Electrochemical Model for Corrosion of Conductors on Ceramic Substrates,” IEEE Transactions on CHMT, Vol. 4, No 4, pp. 520 – 525, December 1981.

28. Steinberg, D.S., “Vibration Analysis for Electronic Equipment,” 2nd Edition, John Wiley & Sons, 1988.

29. Rudra, A.B., Li, M., Pecht, M., and Jennings, D., “Electrochemical Migration in Multichip Modules,” Circuit World, Vol. 22, No. 1, pp. 67-70, 1995.

