An Automated Failure Modes and Effect Analysis Based ...users.aber.ac.uk/nns/publications/papers/PHM09-Snooke-revision1.pdf · An Automated Failure Modes and Effect Analysis Based

An Automated Failure Modes and Effect Analysis BasedVisual Matrix Approach to Sensor Selection and

Diagnosability AssessmentNeal Snooke 1

1 Aberystwyth University, Department of Computer Science, Ceredigion, SY23 3DB, United [email protected]

ABSTRACT

This paper builds on the ability to produce a com-prehensive automated Failure Modes and EffectsAnalysis (FMEA) using qualitative model basedreasoning techniques. The automated FMEA pro-vides a comprehensive set of fault–effect rela-tions by qualitative simulation and can be per-formed early in the design process. The compre-hensive nature of the automated FMEA results ina fault-effect mapping that can be used to investi-gate the diagnosability of the system. A commonrequirement is to facilitate cost reductions by re-moving sensors or to improve diagnosability byincluding additional sensors. Measurements aretypically expensive (in the broadest sense) andthe problem addressed by this paper is how to al-low select a set that fulfills the diagnosability re-quirements of the system. This paper documentsa technique that provides an engineer with easyaccess to information about diagnostic capabilityvia a matrix visualisation technique. The focusof the work was for the fuel system of an Unin-habited Aerial Vehicle (UAV) although the sys-tem has also been used on an automotive electri-cal system, and is applicable to a wide range ofschematic and component based systems.

1 INTRODUCTIONThis paper presents a technique to allow an engineerto investigate the relationship between sensor selec-tion and the ability of a one step diagnostic systemto detect faults. It has been developed as part of AS-TRAEA (ASTRAEA, 2009), a pioneering £32 mil-lion UK aerospace programme which is addressingkey technological and regulatory issues in order toopen up non-segregated airspace to uninhabited au-tonomous aircraft.

Automated failure mode and effects analysis(FMEA) is a technique that is used to provide a com-prehensive and consistent description of the effects of

This is an open-access article distributed under the termsof the Creative Commons Attribution 3.0 United States Li-cense, which permits unrestricted use, distribution, and re-production in any medium, provided the original author andsource are credited.

component faults (Price et al., 1997; 2006). The re-sults can be used to generate symptoms for an onboarddiagnostic application. Conceptually this is the pro-cess of generating an effect→ fault mapping from thefault→ effect mapping provided by the FMEA whileexcluding effects present within nominal observations.It is not the purpose of this paper to describe in de-tail the transformation of the FMEA into a set of di-agnostic symptoms but rather to use the symptoms toassist diagnosability assessment. The FMEA baseddiagnostic system has a comprehensive fixed set ofsymptoms that detect as many of the faults itemisedin the FMEA as possible, and is more closely relatedto a manually coded set of diagnostics than traditionalrun time consistency or abductive MBR approaches(Peischl and Wotawa, 2003) that compare the resultsof running an on-board system model with actual ob-servations (Struss and Dressler, 2003; Struss, 1992;Reiter, 1987).

The proposed diagnostic system is limited to theoperating modes considered in the FMEA, and it canonly detect faults defined in the component library butit does have advantages of fast on-board execution,comprehensive analysis of all possible symptoms, andbehaviour that can be validated for certification pur-poses. This allows combinations of measurements toform symptoms that may not be immediately obviousto an engineer and in any case would be very tedious(and potentially error prone) to generate manually.Symptoms are generated effortlessly since no addi-tional modelling is required beyond that to produce theFMEA. The FMEA requires a library of componentswith failure modes, system schematic, and operatingscenaro (Price et al., 1997). The system behaviouris simulated from a schematic based structural modeland compositional component (nominal and failure)behaviour models. This ensures the correct (qualita-tive) effects are available for structural and functionalfailures thus allowing wider range of failures than apurely structural model, but without the arbitrary mod-elling decisions associated with manually producedcausal models (Console et al., 1989). While mul-tisignal and dependency modelling approaches such asTEAMS-RT (Deb et al., 1995) allow sophisticated test

1

Annual Conference of the Prognostics and Health Management Society, 2009

sequencing and do not require fault models thus al-lowing unforseen faults to be diagnosed, there is sub-stantial modelling effort required specifically to sup-port the diagnosis and diagnosability investigations inthis approach.

It is useful to note that the qualitative simulationmeans that exact systems parameters are not requiredallowing a broad analysis early in the design cycle. Forexample we may have the topology of a new aircraftfuel system design concept but the pipe lengths maynot yet be known, however the symptom when valvea open and pump b on then pressure in pipe x lowfor potential faults {blockage in pipe y, valve z stuckopen} can be generated. In the Automotive industryelectrical systems suffer from these issues to an evengreater extent, where system and harness designs mustbe proposed and analysed for failure characteristicsprior to the availability of detailed spatial or compo-nent parameter information. Qualitative measurementsallow broad regions of system (mis)behaviour to betreated by a relatively small set of symptoms which isgood for assessing broad diagnosability issues. Usingthe qualitative rules on-board requires some additionalwork and in practice measurement thresholding and aBayesian network that allows weighting of fault typesand measurement reliability has been used to rank di-agnoses for the ASTRAEA project, however for thepurpose of investigating diagnosability at design timefor a topologically complex electrical or fluid flow sys-tem, the qualitative regions of behaviour provide rele-vant details based only on logical diagnostic espres-sions of qualitative measurements.

An onboard diagnostic system will only have accessto a limited number of measurements, and the abilityto rapidly investigate at early design stages which mea-surements may be useful for fault detection is valuable.When many hundreds of these symptoms are possi-ble, each requiring selections of measurements, wefind that by providing or excluding measurements theset of usable diagnostic rules and hence system diag-nosability and isolatability is changed. Measurementsare typically expensive (in the broadest sense) and theproblem addressed by this paper is how to allow selecta set that fulfills the diagnosability requirements of thesystem.

The Automated FMEA report itself contains a highlevel description of fault effects in terms of the failureof system function, however more detailed informationconcerning every variable and signal in the system isproduced by the simulation, and diagnostic rules cantherefore be generated utilising very detailed informa-tion. In most systems there are various costs (financial,mass, layout, harness complexity) involved with eachsensor, resulting in a need to compromise between di-agnostic ability and sensing and therefore a small setof the most useful and obtainable measurements needto be selected. Due to the complexity of the mappingbetween sensors, symptoms and faults it is a non trivialtask for an engineer to answer these questions without

tool assistance. Typical issues that require considera-tion are:• Which faults are diagnosable by the system?• Which additional sensors could be included to di-

agnose additional or critical faults?• What is the best ‘diagnostic value’ that can be ob-

tained by adding additional sensors.Some existing optimisation methods are very spe-

cific solutions to an individual system e.g. (Maul etal., 2007; Mushini and Simon, 2005) and do not sup-port schematic and component library based analy-sis. Other approaches are generic but require largemodelling effort to enable varied additional applica-tion specific information to be taken into account (De-bouk et al., 1999; Trave-Massuyes et al., 2006). Evenwhen the information required to assess diagnosabilitycan be modelled, the problem has large search spacesand techniques such as genetic algorithms (GA) areoften used to find solutions (Spanache et al., 2004;Mushini and Simon, 2005; Maul et al., 2007). Ex-perience shows that in many cases there are simplytoo many additional application specific considera-tions that an engineer can resolve but which wouldbe difficult to provide to a fully automated system.For example spatial constraints associated with addingnew sensors for electrical systems where an engineermay have a good idea where it is feasible to add sen-sors, but without a detailed and 3D spatial model in-tegrated with the electrical circuit description it is im-possible for an automated system to decide. A sec-ond example is the knowledge of which sensors arerequired for basic system functionality and thereforehave a very low cost to any diagnostic system andthose which are present for diagnostic purposes only.A system engineer will know this due to his in depthfunctional and causal understanding of the system ar-chitecture but it is very difficult to extract this infor-mation from an electrical circuit diagram. As a finalexample an engineer may know that some parametersare very noisy and should perhaps be avoided (or re-quire additional processing) as inputs to a diagnosticsystem for example a fuel level sensor on an aerobaticaircraft. Modelling could be provided for all of theabove situations however the investment in modellingis high for relatively low return, and we take the al-ternative approach of providing tools that support rel-atively simple models but allow the engineer to easilymake decisions and understand the effects on the po-tential diagnosability of the system.

The following sections of this paper firstly outlinethe FMEA generated symptoms and their characteris-tics and we briefly describe a software tool to allowan engineer to explore the diagnostic system using asimulator. A graphical matrix approach is presentedto assist an engineer to quickly visualize the diagnos-tic behavior of the system. This allows rapid investi-gation of the sensor selection and placement optionsavailable. The technique has been used on several case

2


Bravo on ADTF

Hist o ry nst - o r ig inal versio n 7/ 1/ 2 0 0 8

nst - name chang es 11/ 1/ 2 0 0 8

nst - chang ed eng ines 2 5/ 1/ 2 0 0 8

nst - chang ed eng ine ret urn. 8 / 4 / 2 0 0 8

nst - chang ed names + mo d els. 2 9 / 4 / 2 0 0 8

nst - chang ed p ressure mo d els. 15/ 5/ 2 0 0 8

FL_TR_LH FL1_TR

FL2_TR

FL5_TR FL6_TR FL7_TR

FL4_TR

FL3_TR

FL_TR_RH

TP1_AT_LH

TP3_AT_LH

TP2_AT_LH

RP1_RL_LH

RP2_4_RL_LH

RP2_3_RL_LH

RL_FS_LH

RL2_2_RL_LH

TP1_WT_LH TP2_WT_LH

FL1_1_FS_LH

FL1_2_FS_LH

FL1_3_FS_LH

TP1_FL_LH

TP4_FL_LH

TP5_FL_LH

TP6_FL_LH

TP7_FL_LH

TP8_FL_LH TP3_FL_LHR

P2_1_RL_LH

RP2

_1_R

L_R

H

TP3_

FL_R

H

TP8_FL_RH

TP7_FL_RH

TP6_FL_RH

TP5_FL_RH

TP4_FL_RH

TP2_FL_RH

TP1_FL_RH

RP2_2_RL_RH

RP2_3_RL_RH

FL1_2_FS_RH

FL1_3_FS_RH

FL1_1_FS_RH

TP2_WT_RH TP1_WT_RH

TP2_AT_RH TP1_AT_RH

TK_AT_LH TK_AT_RH

CP_TR

CP_AT_LH CP_AT_RH

CP_FL_LHCP_FL_RH

TJ_AT_LH TJ_AT_RH

TJ_RL_LH TJ_RL_RH

TJ_FL_LH TJ_FL_RH

TJ_FL_FS_LH TJ_FL_FS_RH

TJ_TR_RH

TJ_TR_LH

DV_WT_RHDV_WT_LH

IV_FL_LH IV_FL_RH

DV_AT_LH DV_AT_RH

CK_TR

F_WT_LH F_WT_RH

F_FL_RHF_FL_LH

FC_RL_LH FC_RL_RH

EH_LH EH_RH

FT_FL_LH FT_FL_RH

TP2_FL_LH

RL_FS_RH

TP3_AT_RHRP2_4_RL_RH

RP1_RL_RH

TVL_RL_LH TVL_RL_RH

TVL_FL_RHTVL_FL_LH

TVL_TR_LH TVL_TR_RH

OC_WT_LH OC_WT_RH

PT_FL_LH PT_FL_RH

Figure 1: Fuel system schematic

3


studies including an aircraft fuel system and an auto-motive Daylight Running Light (DTRL) electrical sys-tem and these systems with differing diagnostic char-acteristics are presented as case studies to illustrate thediagnostic system generation.

2 SYMPTOM GENERATION AND THEDIAGNOSTIC SYSTEM

Given a set of symptoms S1..SN derived from anFMEA, each symptom is comprised of a tuple of(Ce,Oe, F ) where both Ce and Oe are logical expres-sions and F is a non empty set of faults that are indi-cated when the symptom is satisfied. Each of these as-sociated faults will have produced an abnormal set ofobservations in the FMEA that will lead to the symp-tom being satisfied. Ce specifies when the symptomis applicable and is termed the symptom condition ex-pression. If Ce is false then the symptom is consideredinvalid and cannot be used. Oe is termed the symptomexpression. If Ce∧Oe evaluates true then one or moreof the faults F are indicated. Table 1 shows the possi-ble states of a symptom.

Table 1: Symptom states

Ce Oe Faults indicatedfalse false ∅ (no fault information)false true ∅true false ¬F (∅ for non negatable symptoms)true true F implicated

The third row illustrates a ‘negatable’ symptom ableto exonerate faults (¬F ) and is the reason for Ceexpressions. We have observed that allowing negat-able symptoms typically leads to fewer symptomsbut requires more terms in the expressions than non-negatable symptoms. The ability to exonorate faultswhen observations are absent is important when thesymptoms are used in some forms of on board diagno-sis based on for example Bayesian networks.

Both Ce and Oe are logical expressions formedfrom boolean observations and the usual logical oper-ators. Observations may be formed from any availablesensor reading, variable, state or system parameter thatcan be observed. Inputs (externally controlled values)are also considered as measurements and in fact thediagnostic system does not need to differentiate inputsand outputs during symptom generation or when inuse, although observations that are required in the con-ditional part of a symptom often turn out to be inputsto satisfy the definition of a symptom. Most sensorsproduce measurements and a comparison operator isnormally used to create an observation (e.g. pressure< 5, or flow 6= high). The use of a qualitative simulator(Price et al., 2003; Lee and Ormsby, 1991; Lee, 2000;Snooke, 2007) makes it unnecessary to consider nu-merical values at the symptom generation stage sinceall measurements produced by the simulator are from

qualitative quantity spaces for example ‘high’, ‘zero’‘lower than expected’ etc. Typical symptom exam-ples for the system in Figure 1 are shown in Table 2.The example symptoms demonstrate qualitative anal-ysis; in the final row we see that when the pump (CP)is on and a valve (TVL) is set, a low flow transducer(FT) observation indicates a possible blockage in twoplaces.

Based on the systems analysed for automatedFMEA from the automotive application areas we findthere are typically hundreds of qualitatively distinctfaults (several for each component) and several po-tential measurements associated with each component(Price, 2000). Although a symptom can predict anynumber of faults, and a fault can be predicted by anynumber of symptoms, we have found that the numberof qualitative symptoms generated is of the same or-der as the number of faults considered in the FMEA.Informally this seems to be for the following reason.Useful symptoms do not require more than few mea-surements, and in fact symptoms that require manymeasurements (>10) are disallowed because they gen-erally occur due to artifactual issues associated withincomplete exercising of the system state space bythe FMEA, or due to approximations in the compo-nent behaviour modelling. In addition if a reason-able level fault isolation is possible (and we assumeit is given the above observation of several measure-ments per component), symptoms on average predicta relatively small number of faults and because symp-toms are generated to be as specific as possible eachfault is on average predicted by a relatively small num-ber of symptoms. Therefore on average the numberof symptoms is of the same order as the number offaults, and since symptoms may require several mea-surements but measurements are on average present inmore than one symptom the number of measurementsis also of the same order as the number of faults. Forthese systems it is feasible to use visual matrices de-picting measurement-symptom-fault relationships asproposed in the next section.

An example automatically generated diagnostic sys-tem with 168 symptoms produced from an automatedFMEA is illustrated in Figure 2 for a twin engine air-craft fuel system in Figure 1 with 184 possible faults.The tool in the Figure allows an engineer to exercise adiagnostic system by inserting known faults in the toppanel. The values determined by the simulation areimmediately shown in middle section. The functionsare derived from a functional model of the system andprovide interpretation of the behaviour for presentationto an engineer in an FMEA output (Bell et al., 2007;Bell and Snooke, 2004; Snooke and Bell, 2002). Func-tions are not used in the evaluation of the symptoms(but do have a role in their generation) and are onlyshown in the interface to allow easy recognition of theoverall effect of the fault to the user. The lower part ofthe screen shows the results of the diagnosis. On theleft are all symptoms where Ce = true. The symp-

4


Table 2: Example symptoms

Ce Oe F

TVL RL LH.position==‘isolation’ TVL FL LH.tellback==‘crossover’ TVL FL LH.stuck crossoverTVL RL LH.position==‘crossover’∧CP FL LH.control==‘on’

OC WT RH.tank level==‘higher thanexpected’

TP4 FL LH.fractureTP2 FL LH.fractureTP4 FL LH.partialblocked

CP FL RH.Control==‘on’∧TVL RL RH.position==‘normal’

FT FL RH.flow==‘low’ FL1 1 FS RH.partialblockedTP5 FL RH.partialblocked

input configurations

fault simulation results

user selected fault

valid symptoms

diagnosis

Figure 2: Diagnostic evaluator interface

tom set is negatable and therefore a check in the I/Ecolumn of Figure 2 indicates that Oe = true for thesymptom and therefore indicates a set of faults. Thereis no check in the I/E column if Oe = false and in thiscase the symptom will exonorate associated faults. Asimple ranking of faults is provided based on the sumof the total number of symptoms indicating and ex-onerating each fault (shown in parenthesis). In thisexample there are nine top ranking faults and theseare in fact indistinguishable from the sensing available.The real diagnostic system includes other informationabout symptom and measurement confidence, usingBayesian methods to provide more fine grained faultranking. This tool simply allows the symptom gener-ation to be exercised. Further down the list faults mayhave negative scores, showing that there is evidencefrom the symptoms that those faults are not present.

The engineer can select or deselect any sensor andthe effect on the diagnosis is shown instantly and thisis useful to check the applicability of specific measure-ments in specific fault scenarios, however it is not suf-ficient to enable an engineer to decide on a set of sen-sors which will cover all possible faults on the system,due to the number of operating modes and faults pos-sible. It is this issue that provides the main focus of theremainder of this paper.

3 FAULT MATRICESThe relationship between observations (sensor mea-surements), symptoms and faults can be representedusing two 2 dimensional matrices as shown by ageneric example in Figure 3. This Figure is intendedonly to show the form of the matrices, for a real systemthere may be hundreds of rows and columns, and it is

5


the visual correlations present in the matrices that pro-vide information to the engineer (zoom/pan is avail-able for larger matrices). A graphical method to as-sess competing requirements was also described by(Thompson et al., 1999) however this was aimed atarchitectural choices rather than sensor selection.

Each symptom is represented by a column in thematrices on the left of the Figure. In the upper matrixany measurement included in the Ce or Oe expressionfor a symptom is indicated by non empty element inthe column representing that symptom. In the lowermatrix the set of faults indicated by each symptom isindicated by a non empty element in the column rep-resenting that symptom. The top matrix shows which

SYMPTOMS

MEASUREMENTS

FAULTS

M1

S11S1

M7

F1

F9

Figure 3: Measurement - Fault Matrix

measurements are required for each symptom and thelower matrix shows the faults that each symptom candiagnose. A colour coding system is used to indicatethe status of each element and these change as addi-tional measurements are included or excluded in themeasurement vector (top right). Green indicates thatan item is available to the diagnostic system (also asmall tick is shown for clarity) and grey indicates thatthe item forms part of a diagnostic relationship but isnot available because it needs a measurement not yetobservable to the diagnostic system. Red is used to ex-plicitly exclude items - for example when the engineerhas decided that a measurement is not available.

Once a measurement is made available it appears asa green element in the measurement vector (top right)and also as green elements for the symptoms that re-quire it in the corresponding row in the top matrix.Any symptoms that have all the necessary measure-ments available to evaluate their Ce and Oe expres-sions have the appropriate element coloured green inthe row vector on the (centre left) indicating that thesymptom can be fully evaluated and hence used to di-agnose its associated faults. The lower left matrix col-umn elements represent associated faults diagnosableby an available symptom, and are coloured green to

indicate a diagnosable fault. Finally any faults thathave one more more available diagnosing symptomsare coloured green in the faults vector (lower right). Alighter green colour (centre dot) in the top matrix in-dicates that a measurement is available to a symptombut the symptom requires further measurements.

If a measurement is excluded by the engineer thenit will be coloured red (a small cross shown) and anysymptoms and faults that therefore cannot be diag-nosed also turn red. Notice that it is necessary for allsymptoms that can diagnose a fault to be excluded be-fore the fault is not diagnosable. Hence, cells that arepink (dot) in the lower matrix indicate a symptom thatcannot be used for a fault that can be diagnosed us-ing an alternative. Elements that form diagnostic re-lationships but are undecided are coloured grey andmay therefore be included or excluded based on theavailability of undecided measurements. These willbe measurements that are neither chosen or excluded,symptoms that require undecided measurements anddo not include excluded measurements, and faults thatcould still be diagnosed if additional symptoms (mea-surements) are included.

A real example for the aircraft fuel system is shownin Figure 4. The GUI follows a similar structure toFigure 3 with the measurement-symptom matrix at thetop left and the symptom-fault matrix lower left. Themeasurements and faults are now shown as textual listswith the order of the lists being the same as the rowsin the matrices. Measurements can be selected or de-selected using the lists and the associated fault sta-tus is updated, together with the colour coding of thematrices. The yellow colour is used to allow sets ofmeasurements to be proposed prior to committing orexcluding them, allowing the incremental change infaults that can be diagnosed to be observed.

The visible patterns in the matrices are formed bythe structure that exists in the fault behaviour of thesystem. The patterns represent correlations betweenmeasurements, faults and symptoms. In addition thematrices are relatively sparse as expected since mea-surements, symptoms and faults form (overlapping)sets and subsets due to the structure of the system andthe predictable behaviour of the system in the pres-ence of faults. Specific patterns in the matrices graph-ically illustrate some characteristics of the diagnosticsystem:

• Highly populated rows in the measurement-symptom matrix shows measurements that partic-ipate in many symptoms and are therefore impor-tant to the diagnostic system.

• Similar patterns existing in more than one rowof the measurement-symptom matrix indicate thatthere are several measurements required as a set,for a given a set of symptoms. In practice wefind measurements (inputs) such as valve posi-tions and switches that affect major system statetypically have this characteristic.

6


Figure 4: Aircraft fuel system matrix example

• Highly populated columns in the measurement-symptom matrix indicate symptoms that requiremany measurements.• Highly populated columns in the fault - symp-

tom matrix indicate symptoms that can diagnosemany faults. These are the symptoms that providecheap detectability, but poor fault isolation.• Similar patterns in several fault - symptom

columns show that there may be a choice ofsymptoms that diagnose the same set of faults.

The ordering of the measurements, symptoms, andfaults will change the appearance of the matrices andwhere possible we would like to group related symp-toms and faults into rectangular blocks that representalternative symptoms that have equivalent diagnosticpower. Reordering the matrices is discussed in section4.1.

4 SENSOR SELECTIONSimply by selecting and deselecting measurements atany point in the measurement selection process an en-gineer can find out which (additional) measurementsprovide the ability to detect many faults in the contextof the currently available measurements. In Figure 4the user has already selected some measurements us-ing the tick boxes and the result of this in terms of the

symptoms and faults that can be diagnosed is shown asgreen elements (darker) and as tick boxes in the faultlist.

There are usually a set of measurements that willdefinitely be available to the diagnostic system, andsome that the engineer knows will be important in thediagnosis of a required set of faults and these can beselected. At some point the question will arise as to thenext set of measurements that diagnose the maximumnumber of faults.

The problem of finding n additional measurementsthat allow the maximum number of faults to be de-tected is exponential in the number of additional mea-surements if a brute force search is carried out. Dueto the localisation of measurement - fault relationshipsit is only useful to use small numbers for n, until anew ‘block’ of elements (measurements, symptomsand faults) is identified. For an exhaustive search if nis the number of additional measurements required andr is the number of unselected measurements remainingthere are r!

(n∗(r!−n)) combinations of measurements toconsider. We has observed that in the early stages sys-tems tend to have a few critical measurements that pro-vide big diagnostic returns and so a relatively small nis adequate to find these, and once a good number ofthe measurements are determined, r becomes small al-

7


lowing larger n in reasonable time, although by thisstage symptoms and faults tend to be closely coupled,so adding a measurement gains a few additional faults,and therefore the next best n measurements provides asuperset of the faults that can be obtained by the nextbest n− 1 measurements.

The best solution may not necessarily be includedin best solutions for larger numbers of measurementsso strict hill climbing solutions do not work in gen-eral and allowing the engineer to choose n based onthe visible structure of the matrices provides a reason-able compromise. No attempt has been made to im-prove the search using other methods (e.g. backtrack-ing heuristics) because the major issue is that there areoften many possible solutions for ‘next best’ combina-tions of n measurements often due to symmetry in de-signs, or sensors equivalent for some diagnostic aspect,or simply different parts of the system structure all ofwhich require the same number of (different) measure-ments. For example there is little point in being ableto diagnose a left hand circuit aircraft fuel system faultand not an equivalent right circuit fault so a symmet-rical left and right of sensors would be added elimi-nating the alternative left or right permutations fromthe next set of measurements. Our experience is that itis better to only consider a small number of measure-ments and then investigate why there are alternatives,make a selection (noting any significant effects on thematrices) and then consider subsequent measurementsassociated with the next region of system structure andbehaviour.

Figure 5: Fuel System - Equally good measurements

The results of a search for the maximum number offaults diagnosable from n measurements naturally fallinto a hierarchy presented to the user that gives accessto the alternative solutions.

1. Top elements of the hierarchy specify the max-imum number of additional faults can be diag-nosed for each additional measurement set size

from 1..n. In the subsequent example in Figure9, the phrase “Best 2 measurements provide 80additional faults” is seen in the lower right.

2. For each of the elements in item 1 the total num-ber of different measurements involved in any ofthe possible solutions are listed. For example“Total 6 measurements used” indicating that thereare 6 distinct measurements that are used in somecombinations (in pairs for best 2 measurements)to form the best solutions.

3. There are often several different sets of measure-ments that can diagnose exactly the same set offaults. This forms the next grouping under item 1for example “4 combinations of 2 measurementsprovide 2 groups of faults” in Figure 5. Thismeans that there are 2 distinct sets of faults di-agnosable but 4 different pairs of measurementsthat have been found that are relevant to the 2 setsof faults.

4. The sets of faults are itemised together with themeasurements required for each fault set is givenunder item 3 showing which different measure-ments can be used to detect the set of faults. Oftenthere will be several similar sets of measurementswith only one different measurement alternative.

The engineer can select any set of measurements atany level in the above categorisation simply by select-ing any item in the hierarchy as illustrated in Figure 5.The impact on the symptom set and fault set is shownhighlighted in yellow on the matrices as in Figure 4where a whole set of measurements has been selected.By selecting alternately different groups of measure-ments the diagnostic effect can be visualised. For ex-ample some sets of measurements provide very smallchanges to a set of faults whereas others may providefor diagnosis of a completely different set of faults.Hovering over the matrices instantly produces a tooltipthat identifies what the element represents (see Figure4).

4.1 The diagonal matrixTo gain a much better understanding of the relation-ships contained within either matrix they can be au-tomatically reformed into an ‘approximate diagonalform’ which places all the non empty matrix elementsas close to an imaginary line from top-left to bottom-right as possible (this is the purpose of the “Order”buttons on the tool interface). The algorithm used issimilar to the well known bubble sort applied alter-nately to row and columns, with the ordering compari-son based on the imbalance of the number of non zerocells from the diagonal. Since the matrices are not gen-erally square a true diagonal matrix in the mathemati-cal sense is not possible.

The concept of a row (or column) weight is used todescribe the number of cells in either a row or columnto either side of the imaginary diagonal line across thematrix. Figure 6 shows an example 6 by 4 matrix.

8


Weight of row 1 = 0-5/3 + 4-5/3 = -5/3 + 7/3 = 2/3

1 2 30 4 5

0

1

2

3

MidPoint of row 1 = 1*5/3 = 5/3

Weight of row 2 = 1-10/3 + 2-10/3 = -7/3 + -4/3 = -11/3MidPoint of row 1 = 2*5/3 = 10/3

Weight of row 1 = 1-5/3 + 2-5/3 = -2/3 + 1/3 = -1/3

0

1

2

3

MidPoint of row 1 = 1*5/3 = 5/3

Weight of row 2 = 0-10/3 + 4-10/3 = -10/3 + 2/3 = -9/3MidPoint of row 1 = 2*5/3 = 10/3

Figure 6: Producing the diagonal matrix

The mid point of rows 1 and 2 are shown by the filledsymbols. The weight of each row is calculated as thesum of the distance (as a cell count) of each active cell(shown grey in Figure 6) from the mid point. In the up-per matrix of the example row 1 has a weight of 2

3 androw 2 has a weight of− 11

3 . By extension, the columnscan be similarly considered. If the imbalance of tworows is defined as the weight of row n−the weight ofrow n + 1, then the rows are swapped if the imbalanceis greater than zero unless the result of swapping therows creates a larger imbalance for the rows. In theexample the imbalance is 2

3 − (− 113 ) = 13

3 . This isgreater than zero and therefore the rows are swappedto produce the matrix shown in the lower part of Figure6, in which the imbalance is − 1

3 − (− 93 ) = 8

3 . Since83 is less than 11

3 the reordered matrix is consideredcloser to diagonal than the original and the swap is re-tained. A similar procedure is then carried out betweenrows 2 and 3, and so on. The overall effect of swapsis to reorder the lists of measurements, symptoms, andfaults. Each pair of rows are repeatedly considered inthe manner of a bubble sort, using the weight mea-sure as the ordering criterion. However, in contrast toa standard sort the weight of a row changes (and istherefore recalculated) when it is moved. The sort isundertaken alternately on rows and columns.

Once each pair of row and column sorts is com-pleted the total imbalance of the entire matrix is calcu-lated as the imbalance sum of all rows plus the imbal-ance sum of all columns. The alternate sorting of rowsand columns continues until no further reduction in thetotal matrix imbalance can be achieved. Once the cho-sen matrix is in diagonal form the unshared axis of theother matrix is sorted to make it as diagonal as pos-

sible. At this point the majority of the weight of thematrix is balanced around the diagonal as closely aspossible. This has the effect of bringing related mea-surements and symptoms (or symptoms and faults) to-gether on the diagonal and allows the user/engineerfurther insight to the diagnostic capability of the sys-tem by producing visual blocks of colour represent-ing the relationship between groups of measurements,symptoms and faults. Disjoint blocks also graphicallyillustrate parts of the system that are diagnosticallyseparate, for example sets of symptoms and measure-ments that are the only possibility for diagnosing a setof faults for some part of a system.

Each row or column sort is effectively a bubble sortwith a worst and average O(n2) complexity where nis the number of measurements, or symptoms, or faultsdependent of which dimension is being sorted. How-ever the matrices have two characteristics that in prac-tice seem to make the average complexity of the wholealgorithm not much worse than this. Firstly the ma-trices are rather sparse and secondly there is a strongrelationship between groups of elements on each axis.For example we find (and expect also) a set of faultsthat can be diagnosed by a set of symptoms using a setof measurements. The algorithm will only need a sin-gle sort on one dimension for a matrix that has a per-fect simple diagonal form since the order of one axiscan be arbitrary and the elements moved onto the di-agonal by reordering the other. The more ‘imperfect’the final diagonal matrix in the sense of the numberof empty elements between the diagonal and any nonzero element in the result, the more iterations of therow and column sort sequence could be needed. Thisis because the solution may require (worst case) a spe-cific ordering of each axis. The matrices are relativelysparse for the reasons outlined in section 2 and thiscombined with the systematic effects of faults and thestructure of the system cause the matrices to have agood ‘compact’ diagonal form, and in fact they willonly be useful if this is the case. Therefore only smallnumber iterations of the sorting should be required thishas been observed experimentally. We also observethat the algorithm is converging towards the solutionand therefore once the first sort is completed on eachaxis, subsequent sorts start with most of the elementsalready in the correct order. The visual effect is thatnon empty elements ‘bubble’ along the diagonal untileach group of elements has achieved its best order onthe diagonal.

The aim is to assist in the selection or removal ofmeasurement and therefore any elements that are al-ready decided are NOT included in the process and aremoved to the bottom or right of the matrix and do notparticipate in the sorting. This is why the diagonal linedoes not extend the full size of the center left matrix inFigure 12 (discussed later) which is also an exampleof a diagonal symptom-fault matrix showing blocks ofelements that represent distinct sets of symptoms thatdiagnose distinct sets of faults for an automotive sys-

9


Figure 7: Fuel system - Selected flow measurements

tem. It is useful to repeatedly make the matrices diag-onal as an interactive activity during the measurementselection process as diagnostic characteristics are dis-covered.

5 EXAMPLE

The benefits of the diagnosability matrices are bestillustrated by a worked example of how an engineermight use the information to select a set of sensorsand generate a diagnostic system. Consider the air-craft fuel system example of Figure 4. In Figure 7 themeasurement matrix has been diagonalised and mostmeasuremets set undecided, and we see that the ma-jority of measurements are needed in several symp-toms because of the horizontal bars in the matrix. Ifthe user/engineer knows that the measurements fromthe flow meters are definitely available to the diagnos-tic system, then these can be selected in the measure-ment list by checking boxes as shown, resulting in theappropriate cells in the matrices turning green. How-ever, it can be seen on the fault matrix that no cellsturn green demonstrating that making these measure-ments available to the diagnostic system would not beenough to allow it to diagnose any fault. The summaryat the top of the window notes that we have chosento make 2 measurements visible but this would not al-

low diagnosis of any faults (0/184). The information“0 faults are not diagnosable” refers to the as yet un-decided measurements and hence by adding additionalmeasurements we could still be able to diagnose all thefaults. If measurements are excluded then the numberof undiagnosable faults may rise, and some systemsmay have undiagnosable faults even with all availablemeasurements if the FMEA had faults that provide noobservable abnormal effect.

The pump control values are computer controlledand hence available to a potential diagnostic system(the engineer knows this even though the FMEA wasonly performed on the fluid system), and can be se-lected, in Figure 8. It can then be seen that these ob-servations are part of a symptom superset of the flowvalues and so the user may appreciate that it might bebetter to use them as a starting point instead of the flowmeters. The flow meter measurements could be des-elected, but this might lead to un-diagnosable faults.In use, none of the cells in the symptom-fault matrixturn red when the flow meter measurements are de-selected, which indicates that no faults are precludedby not using the flow meter measurements, i.e. there isalways an alternative symptom available.

The user can request an exhaustive search for thenext best n measurements that provide the maximum

10


Figure 8: Fuel system - Control valves selected

Figure 9: Fuel System - Result of search for two additional measurements

11


number of fault detections. The search space can belarge so the application firstly will inform the user ofthe search space size. In the example of Figure 9 theseare as follows:

1. 21

2. 210 (as selected in the example of Figure 9)

3. 1330

4. 5985

5. 20349

6. 54264

7. 116280

8. 352716

The result of a search for the next best two measure-ments is seen in Figure 9. The user is able to select thesets of measurements by clicking on any of the itemsunder “Best measurements search results” and will im-mediately see the affected measurements, symptomsand faults highlighted (not shown in the Figure). Ateach level all of the measurements related to lowerlevel categories will be selected. Also in this list adarker font is used to distinguish parts of a measure-ment set that are not part of a shorter best solution. Itcan be seen on lower right of the Figure that by addingone additional measurement six faults can be detected(i.e. the left pressure sensor detects 6 blockage faultsin the left system and the right pressure sensor detects6 blockage faults in the right system). However, it alsopossible to detect 80 faults by adding two measure-ments. Selecting on the Total 6 measurements mes-sage expands it to display all measurements involvedin any pairs that provide these 80 faults, as shown inFigure 5.

The skilled user will appreciate that there are twogroups of faults that can be detected (left and rightvariants). Considering the first set of faults, it is ap-parent that the flow meter measurement is common,plus either of the left flow or return valves. An engi-neer would know that both valves are, in fact, mechan-ically slaved and so the measurements are equivalent,save for a mechanical linkage failure1. If it is knownthat the flow valve is most closely connected to theactuator and return valve slaved to it then this is theone to choose. Thus, the flow left and right metersand flow valves are selected as it is pointless to diag-nose only left or right systems. When this is done, itcan be seen at the top of the resulting window shownin Figure 10 that 116 of the 184 faults are now diag-nosable using 6 measurements, and these are shownas diagnosable (green) in the lower matrix and faultlist when this is scrolled. Viewing a schematic of thesystem colour coded to indicate diagnosable faults willclearly show that the main fuel and supply return faultsare detectable with the subset of symptoms selected atthis point. The skilled user/engineer can continue this

1the mechanical aspects of the system are not modelledor included in the FMEA in this example

process of selecting measurements and reviewing theresulting symptom/fault displays until an optimal se-lection of measurements is made, ideally one that re-sults in all faults being diagnosable with no fault beingun-diagnosable using a minimal number of measure-ments.

It is possible to include features other than simplythe number of faults diagnosed in the definition of bestmeasurements, e.g. the ability of the diagnostic systemto isolate faults based on the number of different setsand intersections of sets of faults diagnosed by eachsymptom. Weighting of measurements and/or faultsaccording to physical features such as cost, accessibil-ity or severity is also possible where such data can beobtained, and will result in modified orderings and se-lections.

6 SYSTEM INSTRUMENTATION

The aircraft fuel system example in the previous sec-tions of this paper had a predefined set of sensors andobservable settings. For other systems the task may beto determine which sensors to add to build a diagnosticsystem. We concentrate on sensors that measure sys-tem parameters within the domain of the simulation,so for example in an electrical network, rising tem-peratures as a fault symptom could not be producedas a symptom unless the simulation were to includea thermal model. For systems that include diagnosisspecific sensors (e.g. vibration sensors) from other do-mains, hand crafted or externally generated symptomscan easily be added to the symptom set and includedin the overall diagnosability analysis, if required.

It is easy to allow the diagnostic generator to haveaccess to any system (simulation) parameter, and as anexample we present an automotive daylight runninglights system (DTRL) allowing the current in everywire in the system as a possible sensor input. Perhapsunsurprisingly, many symptoms are generated basedon the function output observations (lamps) and theinputs that are the triggers for the functionality thatwill cause activity at the observation point. The matri-ces show which observations are diagnostically equiv-alent for various sets of faults, for example the ver-tical ‘stripe’ patterns in the Figure 11 fault - symp-tom matrix. Figure 11 also demonstrates critical inputas a long horisontal bar in the center of the measure-ment matrix (lighting switch position), without whichmost faults cannot be diagnosed. The bar is (green)light coloured because it is clear it must be selectedfor the majority of the symptoms to be usable. Thelower right of the Figure also demonstrates a situationwhere three equivalent alternative measurements maybe used. The number plate lamps have been excludedbecause they are not directly observable by a sensor,leaving a choice between W16 and W27. W27 waschosen and this makes 6 symptoms redundant (red),although there is no effect on the number of faults thatcan be diagnosed.

12


Figure 10: Fuel system - left and right main fuel supply diagnosable

13


Figure 11: Instrumented DTRL system

In Figure 12 the remaining elements have been di-agonalised on the fault symptom matrix and groupsof related faults are clearly seen, each block tends tobe related to a different system function, due to struc-tural locality. Hovering the mouse over each block andlooking at the symptom conditions easily reveals thestates of the system involved, for example the blockunder the mouse pointer is related to the sidelights andthe yellow (light coloured) selected symptoms are allrelated to the dip lights. Following the process untilall faults are accounted for results in the statistics inTable 3. Most systems exhibit this law of diminishingreturns as more sensors are required to identify fewerfaults.

7 CONCLUSION AND FUTUREENHANCEMENTS

The work presented in this paper builds on the recentlydeveloped capability to develop symptom sets basedon an automated simulation based FMEA. It providesan engineer with tools to investigate the diagnosticability of a system or product based on existing oradditional sensing. Both on board and workshop di-agnostic systems could be produced and evaluated bymodifying the visibility of the available observations.The tools have been applied to a number of systemsincluding an aircraft fuel system containing 98 compo-

Table 3: DTRL sensor selection

Measurements(55 total)

Faults (46total)

Symptoms(87 total)

2 17 23 19 44 28 65 35 86 38 108 42 119 43 1310 44 1511 45 1612 46 18

nents and 239 possible faults [Snooke07] and a numberof automotive electrical systems.

Sometimes diagnostics require specific computa-tions or information from additional domains and thesecannot be included unless the system simulation pro-duces the relevant measurements. For specialist di-agnostic data it is possible to include a module intothe system that produces any such computed resultsusing the usual component modeling capabilities in-cluding state machines and general computations. Thesymptom generator will then utilize any of these spe-

14


Figure 12: DTRL fault symptom relationships

cialist measurements that fulfill a diagnostic capabil-ity, allowing an engineer to experiment with a numberof possible specialist measurements, to determine howwell they perform. Some systems contain distinct op-erating modes and symptoms often relate to specificmodes only due to their condition expressions. Thesemodes could be identified and included in the diag-nostic generation process to allow choices to be madeconcerning when faults can be detected during systemoperation. A good deal of this information is alreadycontained in the functional description of the systemand it may therefore be possible to indicate selectedinformation on the matrices via additional colouringor symbolism.

The tool concentrates on optimizing the total num-ber of diagnosable faults. In some applications theability to isolate faults (to a replaceable unit) and theability to diagnose faults in specific operating modesis important. Various graphical notations could be de-veloped to visualise these relationships by colour orspatial grouping or possibly a hierarchical version ofthe matrices that allow rows or columns to be aggre-gated, such an approach may also help in the presen-tation of very large systems if there are disjoint sec-tions to the diagnostic structures present in the matri-ces. In addition there are a number of ranking mea-sures that may be available for fault types, componentfailure instances, or affected system functions, all ofwhich could be used to guide the sensor selection ad-visor. These additions are feasible future additions tothe tools that would allow a more tailored diagnostic

system to be generated.There are a few additions to the graphical interface

that would improve the tool, for example the abilityto select elements by region in the matrices, and topresent lists of the elements within these selected re-gions for inclusion or exclusion. The ability to viewthe current set of diagnosable faults and measurementsneeded by (for example) colouring or labelling compo-nents on the original system schematic as each selec-tion is made may be a useful way of assessing diag-nosability.

8 ACKNOWLEDGEMENTS

Aberystwyth University’s work on the ASTRAEAproject is funded by the Welsh Assembly Govern-ment, by BAE Systems and by Flight Refuelling Lim-ited. The ASTRAEA project is co-funded by theTechnology Strategy Board’s Collaborative Researchand Development programme, following an open com-petition. The DRTL system was kindly providedby Sumitomo Electrical Wiring Systems Ltd. Thiswork is protected by BAE systems patent applications(0910145.2).

REFERENCES(ASTRAEA, 2009) ASTRAEA.

http://www.projectastraea.co.uk/, April 2009.(Bell and Snooke, 2004) J. Bell and N. A. Snooke.

Describing system functions that depend on inter-mittent and sequential behavior. In Proceedings

15


18th International Workshop on Qualitative Rea-soning, QR2004, pages 51–57, 2004.

(Bell et al., 2007) J. Bell, N. A. Snooke, and C. J.Price. A language for functional interpretation ofmodel based simulation. Advanced Engineering In-formatics, 21(4):398–409, Oct 2007.

(Console et al., 1989) Luca Console, Daniele Thesei-der Dupre, and Pietro Torasso. A theory of diagno-sis for incomplete causal models. In In Proc. 11thIJCAI, pages 1311–1317, 1989.

(Deb et al., 1995) S. Deb, K.R. Pattipati, V. Ragha-van, M. Shakeri, and R. Shrestha. Multi-signal flowgraphs: a novel approach for system testabilityanal-ysis and fault diagnosis. Aerospace and ElectronicSystems Magazine, IEEE, 10(5):14–25, 1995.

(Debouk et al., 1999) Rami Debouk, Stephane Lafor-tune, and Demosthenis Teneketzis. On an optimiza-tion problem in sensor selection for failure diagno-sis. In in Proc. of the 38th IEEE Conf. on Deci-sion and Control, pages 4990–4995. University ofMichigan, 1999.

(Lee and Ormsby, 1991) Mark H. Lee and AndrewR. T. Ormsby. A qualitative circuit simulator. InSecond Annual Conference on AI Simulation andPlanning in High Autonomy Systems. IEEE, 1991.

(Lee, 2000) Mark H. Lee. Qualitative modelling oflinear networks in engineering applications. In Pro-ceedings 14th European Conference on ArtificialIntelligence ECAI 2000, pages 161–165, Berlin,2000.

(Maul et al., 2007) William A. Maul, GeorgeKopasakis, Louis M. Santi, Thomas S. Sowers, andAmy Chicatelli. Sensor selection and optimizationfor health assessment of aerospace systems. Tech-nical Report NASA/TM—2007-214822, NASA,http://gltrs.grc.nasa.gov/, 2007.

(Mushini and Simon, 2005) R. Mushini and Dan Si-mon. On optimization of sensor selection for air-craft gas turbine engines. In 18th InternationalConference on Systems Engineering, pages 9–14.ISBN: 0-7695-2359-5, August 2005.

(Peischl and Wotawa, 2003) Bernhard Peischl andFranz Wotawa. Model-based diagnosis or reason-ing from first principles. IEEE Intelligent Systems,pages 32–37, 2003.

(Price et al., 1997) C. J. Price, D. R. Pugh, N. A.Snooke, J. E. Hunt, and M. S. Wilson. Combiningfunctional and structural reasoning for safety anal-ysis of electrical designs. Knowledge EngineeringReview, 12(3):271–287, 1997.

(Price et al., 2003) Christopher J. Price, Neal A.Snooke, and Stuart D. Lewis. Adaptable mod-eling of electrical systems. In Paulo Salles andBert Bredeweg, editors, Proceedings of 17th In-ternational Workshop on Qualitative Reasoning(QR2003), pages 147–153, Brasilia, Brazil, 2003.

(Price et al., 2006) C. J. Price, N. A. Snooke, andS. D. Lewis. A layered approach to automated elec-trical safety analysis in automotive environments.Computers in Industry, 57(5):451–461, 2006.

(Price, 2000) C. J. Price. AutoSteve: automated elec-trical design analysis. In Proceedings ECAI-2000,pages 721–725, 2000.

(Reiter, 1987) R Reiter. A theory of diagnosis fromfirst principles. Artif. Intell., 32(1):57–95, 1987.

(Snooke and Bell, 2002) Neal A. Snooke andJonathan Bell. Abstracting automotive systemmodels from component-based simulation withmulti level behaviour. In Sixteenth InternationalWorkshop on Qualitative Reasoning (QR02), pages151–160, Barcelona, Spain, 2002.

(Snooke, 2007) N. A. Snooke. M2cirq: Qualitativefluid flow modelling for aerospace fmea applica-tions. In Proceedings 21st international workshopon qualitative reasoning, pages 161–169, 2007.

(Spanache et al., 2004) Stefan Spanache, Teresa Es-cobet, and Louise Trave-Massuyes. Sensor place-ment optimisation using genetic algorithms. In Pro-ceeding DX04, pages 179–183, 2004.

(Struss and Dressler, 2003) P. Struss and OskarDressler. A toolbox integrating model-baseddiagnosability analysis and automated generationof diagnostics. In Proceedings of the 14th Inter-national Workshop on Principles of Diagnosis,2003.

(Struss, 1992) Peter Struss. What’s in SD? towards atheory of modelling for diagnosis. In Wlater Ham-scher, Luca Console, and Johan de Kleer, editors,Readings in model-based diagnosis, pages 419–449. Morgan Kaufman, 1992.

(Thompson et al., 1999) H. A. Thompson, A. J. Chip-perfield, P. J. Flemming, and C. Legge. Distributedaero-engine control systems architecture selectionusing multi-objective optimisation. Control Engi-neering Practice, 7(5):655–664, 1999.

(Trave-Massuyes et al., 2006) L. Trave-Massuyes,T. Escobet, and X. Olive. Diagnosability anal-ysis based on component-supported analyticalredundancy relations. IEEE Transactions onSystems, Man, and Cybernetics, Part A: Systemsand Humans, 36(6):1146–1160, 2006.

16

Documents

An Automated Failure Modes and Effect Analysis Based ...users.aber.ac.uk/nns/publications/papers/PHM09-Snooke-revision1.pdf · An Automated Failure Modes and Effect Analysis Based