

    Causal analysis for alarm flood reduction

    Vicent Rodrigo ∗,∗∗ Moncef Chioua ∗∗ Tore Hagglund ∗ Martin Hollender ∗∗

    ∗ Department of Automatic Control, Lund University, SE-221 00 Lund, Sweden (email: [email protected], [email protected]).

    ∗∗ ABB Corporate Research Germany, Wallstadter Strasse 59 (email: [email protected], [email protected])

    Abstract: The introduction of distributed control systems and the high level of interconnectivity of modern process plants have caused alarm flooding to become one of the main problems in alarm management of process plants. A reduction of alarm flood periods contributes to a decrease in plant incidents. In this work, a combination of alarm log, process data and connectivity analysis is used to isolate consequence alarms originating from the same process abnormality and to provide a causal alarm suggestion. The effectiveness of the method is illustrated on an industrial case study of an ethylene plant, a typical example of a large-scale industrial system.

    Keywords: alarm systems, fault detection and diagnosis, root-cause, plant-wide disturbance, plant connectivity, approximate sequence matching, large-scale systems.

    1. INTRODUCTION

    Alarm systems are crucial elements in the operation of process plants as they continuously monitor the process and inform the operators if an undesired process state that requires their assessment or action is reached (IEC-62682, 2014).

    The introduction of distributed control systems in the process industry has increased the number of alarms per operator exponentially (Hollifield and Habibi, 2011). Additionally, modern process plants present a high level of interconnectivity due to steam recirculation, heat integration and the use of complex control systems. A disturbance affecting the process plant can therefore spread through its material, energy and information connections and affect all the process variables on the path. Alarms associated with these process variables are triggered and may overload the control room operators, who will not be able to properly investigate each of these alarms (Wang et al., 2015). This undesired situation, known as an alarm flood, has been recognized as a major cause of most industrial incidents investigated by the US Chemical Safety Board (Beebe et al., 2007). ANSI/ISA-18.2 (2009) defines an alarm flood as the occurrence of more than 10 alarms per 10 minutes and per operator. In such a situation, the operators might not be able to keep the process plant within safe operation, endangering human lives and the environment (Bransby and Jenkinson, 1998).

    Henningsen and Kemmerer (1995) pointed out that in a typical alarm log, a significant part of nuisance alarms fall into the following categories: repetitive alarms, standing alarms and consequence alarms. For removing repetitive and standing alarms, methods such as filtering, time-delay, dead-band and difference functions have been proposed. These techniques succeed in lowering the number of alarms when the process is running under nominal conditions. Nevertheless, if an abnormality occurs and propagates through the plant, these methods are unable to suppress consequence alarms or to provide a proper diagnosis. This task requires identifying all causal relations between alarms (Hollender and Beuthel, 2007).

    Recently, several contributions from academia and industry have proposed systematic methods for the grouping of consequence alarms. Event correlation is applied in Higuchi et al. (2009) and Yang et al. (2012) for this purpose. On the other hand, noticing that a large portion of the alarms occurring in an alarm flood are related to each other and follow a specific pattern, Kabir et al. (2013) and Cheng et al. (2013) suggest using sequence pattern matching for grouping consequence alarms. Kabir et al. (2013) presented a method in which the similarity between alarm flood sequences is determined under the assumption that alarms occur in a consequential manner. This assumption may, however, not always hold, as the order of occurrence of alarms is highly dependent on their limit settings. Postulating that the exact order of alarm occurrences is not as important as their proximity in time, Cheng et al. (2013) proposed an alternative method for pattern matching of alarm flood sequences based on the Smith-Waterman algorithm. Time stamps of the alarms present in the alarm flood sequence are used to blur the order of the alarm sequence when the alarm time stamps are considered close to each other. Lai and Chen (2015) proposed an extension of this algorithm to the case of aligning multiple alarm flood sequences.

    In Schleburg et al. (2013), the authors suggest a method to group consequence alarms based on both a topology model describing the process connectivity and a set of rules built using process knowledge. In this method, every pair of alarms is checked against the set of rules and the plant connectivity to evaluate if two alarms are related.

    While grouping related alarms has a clear benefit in assisting the operators reacting to an alarm flood episode, pointing them to the causal alarm (i.e. the alarm related to the root-cause of the alarm flood) could significantly reduce their reaction time.

    Preprint, 11th IFAC Symposium on Dynamics and Control of Process Systems, including Biosystems, June 6-8, 2016. NTNU, Trondheim, Norway. Copyright © 2016 IFAC

    Causal root-cause analysis for process troubleshooting is the topic of several recent contributions investigating the design of systematic methods for determining abnormality propagation paths and for the causal analysis of plant-wide disturbances. Quantitative data-driven methods detecting directionality between measurements have emerged, such as time delay analysis (Fan and Deyun, 2012) or transfer entropy, which is used to identify the causality between measurements even in the absence of observable time delays between them (Bauer et al., 2007). A combination of quantitative process history methods and qualitative models such as signed directed graphs (SDG) (Fan and Deyun, 2012) is proposed in Thambirajah et al. (2009).

    The present work follows the line of thought in Schleburg et al. (2013) but aims at developing a systematic method for the isolation of the causal alarm in an alarm flood. The proposed approach relies on the use of multiple process information sources: alarm logs, historical process data and a topology model of the process. Even though the data originating from different information sources differ in their nature, they are related to each other.

    By combining alarm log, process data and connectivity analysis, not only can the related alarms of an alarm flood be grouped, but their causal alarm can also be identified.

    2. CAUSAL ANALYSIS FOR ALARM FLOOD REDUCTION

    This section describes an approach for alarm flood reduction and causal alarm determination.

    The proposed approach relies on two assumptions:

    (1) Alarm floods are the result of an abnormality propagating in the process through material, energy and information connections.

    (2) If two alarm flood sequences are similar, then they stem from the same process abnormality.

    Regarding the first assumption, the aim of this work is to reduce alarm floods caused by a disturbance spreading through the plant due to its high level of interconnectivity. As for the second assumption, if two alarm flood sequences share a common pattern and if a large portion of the alarm occurrences in the alarm floods are present in this pattern, it is very likely that they originate from the same process abnormality.

    The method consists of five steps. In the first step, chattering alarms are removed from the alarm log. In the second step, time intervals containing alarm floods are identified and alarm flood sequences are built. Similar alarm flood sequences are clustered in the third step. In the fourth step, the signals and assets associated with the analyzed alarms are identified. In the last step, process data causal analysis followed by connectivity validation is performed in order to suggest a causal alarm. The method’s general workflow is depicted in Fig. 1.

    Fig. 1. Method’s workflow. [Flow diagram: Alarm log → Remove chattering alarms → Alarm log without chattering alarms → Identify alarm flood periods → Alarm flood sequences → Cluster alarm flood sequences → Alarm flood sequences clustered and an abnormality template for each cluster → Map alarms to signals and assets → Abnormality templates and alarm, signal and assets identifiers mapped → Isolate the causal alarm (additional inputs: Topology data, Process data) → Abnormality templates and corresponding causal alarms]

    In the present work it is assumed that an alarm sequence is defined as follows:

    $$A = [a_1, a_2, \ldots, a_M], \quad a_m = (e_m, t_m), \quad m = 1, 2, \ldots, M, \quad e_m \in \Sigma$$

    where $e_m$ is the alarm type, specifically a tag contained in the alphabet of alarm types $\Sigma$, and $t_m$ is its time stamp.

    2.1 Remove chattering alarms

    Chattering alarms are defined by the industrial standard ANSI/ISA-18.2 (2009) as “alarms that repeatedly transition between the alarm state and the normal state in a short period of time”. Since the present work focuses on analyzing consequence alarms, chattering alarms are removed prior to the causal analysis.

    Following Kabir et al. (2013), in order to remove chattering alarms, a minimal admissible time interval length is chosen, and if the time elapsed between the occurrences of two alarms of the same type is shorter than the chosen interval, the second alarm is removed. The minimal admissible time interval length in the present work is taken as 10 min, based on the definition of an alarm flood.
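The chattering filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Alarm` tuple, the function name `remove_chattering` and the choice to measure the elapsed time against the previous raw occurrence of the same tag (rather than the previous kept one) are assumptions made here.

```python
from collections import namedtuple

# Hypothetical record for one alarm occurrence: tag (alarm type) and time in seconds.
Alarm = namedtuple("Alarm", ["tag", "time"])


def remove_chattering(alarms, min_interval=600.0):
    """Drop an alarm when the same tag last occurred less than min_interval seconds earlier.

    Assumption: the gap is measured to the previous occurrence in the raw log,
    so a continuous burst of repeats collapses to its first alarm.
    """
    last_seen = {}  # tag -> time of its most recent occurrence in the raw log
    kept = []
    for alarm in sorted(alarms, key=lambda a: a.time):
        previous = last_seen.get(alarm.tag)
        if previous is None or alarm.time - previous >= min_interval:
            kept.append(alarm)
        last_seen[alarm.tag] = alarm.time
    return kept


log = [
    Alarm("TC11CO", 0.0),
    Alarm("TC11CO", 300.0),   # 300 s after the previous TC11CO: removed
    Alarm("PC12", 310.0),
    Alarm("TC11CO", 700.0),   # 400 s after the previous TC11CO: removed
    Alarm("TC11CO", 2000.0),  # 1300 s after the previous TC11CO: kept
]
filtered = remove_chattering(log)
```

With the 10 min interval used in this work, `filtered` keeps the first and last `TC11CO` occurrences and the single `PC12` alarm.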

    2.2 Identify alarm flood periods

    This step isolates time periods where alarm floods occur and builds the corresponding alarm flood sequences.

    The alarm log is divided into time intervals of 10 min. Intervals with more alarm occurrences than a specified threshold are highlighted. The threshold ($\tau$) is chosen based on the definition of an alarm flood, i.e. more than 10 alarms per 10 min and per operator.

    The duration of alarm floods in process plants ranges from a few minutes to several hours. In order to identify an alarm flood regardless of its duration, consecutive intervals with more alarm occurrences than the specified threshold are merged. Since the alarm log is discretized, the beginning or the end of an alarm flood sequence may be cut off. In order not to lose part of the sequence, the time intervals immediately before and after the consecutive time intervals containing more than $\tau$ alarms are also included.
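As an illustration of this step, the sketch below bins alarm time stamps into fixed intervals, flags the intervals exceeding the threshold, merges consecutive flagged intervals and pads each run with one interval on either side. The function name `find_flood_periods` and the convention that binning starts at the first alarm are assumptions, not details given in the paper.

```python
def find_flood_periods(times, tau=10, width=600.0):
    """Return (start, end) spans, in seconds, of alarm flood periods.

    times: alarm time stamps in seconds; tau: alarm count threshold per interval;
    width: interval length in seconds (10 min by default).
    """
    if not times:
        return []
    t0 = min(times)  # assumption: the first interval starts at the first alarm
    counts = {}
    for t in times:
        idx = int((t - t0) // width)
        counts[idx] = counts.get(idx, 0) + 1
    n_bins = max(counts) + 1
    flagged = [counts.get(i, 0) > tau for i in range(n_bins)]

    periods = []
    i = 0
    while i < n_bins:
        if not flagged[i]:
            i += 1
            continue
        j = i
        while j + 1 < n_bins and flagged[j + 1]:
            j += 1  # merge consecutive flagged intervals
        first = max(i - 1, 0)          # include the interval before the run
        last = min(j + 1, n_bins - 1)  # include the interval after the run
        periods.append((t0 + first * width, t0 + (last + 1) * width))
        i = j + 1
    return periods


# Two quiet alarms, then 12 and 11 alarms in two consecutive intervals, then two stragglers.
example_times = (
    [0.0, 100.0]
    + [600.0 + 10 * k for k in range(12)]
    + [1200.0 + 10 * k for k in range(11)]
    + [1900.0, 3100.0]
)
example_periods = find_flood_periods(example_times, tau=10, width=600.0)
```

In this sketch two floods separated by a single quiet interval yield overlapping padded spans; merging such spans is left out for brevity.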

    2.3 Cluster alarm flood sequences

    Once the alarm flood time intervals are isolated and the alarm flood sequences are built, similar alarm flood sequences are clustered using sequence pattern matching.

    Alarm time stamps are taken into account as the proximity in time of the alarms is more important than the exact order of occurrence. We use the method described in Cheng et al. (2013) based on a modified Smith-Waterman (MSW) algorithm to blur the order of the alarms in the sequence when they occur closely in time. A similarity matrix is built using pairwise similarity indices obtained from the MSW algorithm. Finally, agglomerative hierarchical clustering (AHC) with the average linkage criterion is applied to cluster the alarm flood sequences.

    One main disadvantage of the MSW algorithm is its heavy computational burden. To overcome this problem, a prefiltering stage based on a less precise but less time-consuming method is added (Kabir et al., 2013). Combinations of alarm sequences with low similarity are filtered out, and the MSW index computation is applied only to the subset of alarm sequence combinations with a significant similarity. A pair of alarm sequences is considered similar if the two sequences have a high ratio of alarms in common. If two alarm sequences have few alarms in common, the computation of the MSW similarity index is skipped and a zero similarity index is assumed.

    Two thresholds are used: the prefilter threshold ($\lambda_{pref}$) and the clustering threshold ($\lambda_{clus}$). The first threshold decides whether a pair of alarm sequences is filtered out. The second threshold sets the stopping criterion of the clustering algorithm. The values of these two thresholds are interrelated: for a given clustering threshold, setting the prefilter threshold too high can lead to combinations of alarm sequences with an MSW similarity index large enough for being clustered, but with a prefilter similarity index too low to pass the prefilter. A solution to facilitate the choice of thresholds is to make the prefilter and the MSW similarity indices comparable. If one can ensure that $SI_{prefilter} \geq SI_{MSW}$, then, for a given clustering threshold, setting the prefilter threshold equal to or smaller than the clustering threshold guarantees that all pairs of alarm sequences that have an MSW similarity index high enough to be clustered together will pass the prefilter.

    Given two alarm sequences $A$ and $B$, a similarity index that meets this requirement is:

    $$SI_{prefilter}(A,B) = \frac{|A \cap B|}{\max(|A|, |B|)} \qquad (1)$$

    The numerator represents the number of alarm occurrences common to both sequences, and the denominator is the number of alarm occurrences in the longer sequence, so that the similarity index is normalized between 0 and 1.
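Equation (1) can be illustrated with a few lines of code. The sketch below assumes that $|A \cap B|$ counts common alarm occurrences as a multiset intersection of the alarm types, which is one possible reading of the definition; the function name and the example tags are illustrative only.

```python
from collections import Counter


def prefilter_similarity(tags_a, tags_b):
    """Prefilter similarity index of Eq. (1) for two lists of alarm types.

    Assumption: common occurrences are counted with multiplicity
    (multiset intersection of the two tag lists).
    """
    if not tags_a or not tags_b:
        return 0.0
    common = sum((Counter(tags_a) & Counter(tags_b)).values())
    return common / max(len(tags_a), len(tags_b))


seq_a = ["PC11", "G01", "G02", "PI13"]
seq_b = ["G01", "PC11", "PI14", "PI15", "PI16"]
si = prefilter_similarity(seq_a, seq_b)  # 2 common alarms, longer sequence has 5
```

Here `si` equals 2/5, so this pair would just pass a prefilter threshold of 0.4. Because the index ignores alarm order and timing, it is cheap to evaluate for every pair before the MSW computation.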

    According to assumption (2) in Section 2, sequences belonging to the same cluster stem from the same process abnormality. Therefore, an alarm template can be built for each cluster/process abnormality. This alarm template contains the representative alarms that identify the corresponding process abnormality (Bouchair et al., 2013), see Fig. 1. Alarms that are not related to a process abnormality can trigger during the same time interval. Obviously, these alarms should not be included in the abnormality template. Filtering out these alarms is achieved based on their frequency of occurrence in the training set of alarm floods. The threshold on frequency of occurrence is set to 50%, i.e. an alarm is kept only if it occurs in more than 50% of the sequences belonging to a given cluster.

    Fig. 2. Clustering alarm flood sequences step. [Block diagram: Set of alarm flood sequences → Pre-filter → Reduced set of combinations of alarm flood sequences → Similarity indices → Clustering and create templates → Alarm flood sequences clustered and a fault template for each cluster]

    The third step of the proposed method, where alarm flood sequences are clustered and a set of templates defining the process abnormalities of all clusters is generated, is summarized in Fig. 2 and described in Alg. 1.

    Alg. 1: Cluster alarm flood sequences
    input : Alarm flood sequences $\{A_i\} : i = 1, 2, \ldots, N$
    output: Clusters $\{C_k\} : k = 1, 2, \ldots, K$, with $C_k = \{A_i : i \in 1, 2, \ldots, N\}$, and
            fault templates $\{T_k\} : k = 1, 2, \ldots, K$, with $T_k = \{e : e \in \Sigma\}$

    1 For each pair $\{A_i, A_j\} : i \neq j$ and $i, j = 1, 2, \ldots, N$:
        if ($SI_{pref}(A_i, A_j) \geq \lambda_{pref}$) add to list $L_{MSW}$
    2 Build an $N \times N$ similarity matrix $S$ with
        $s_{i,j} = SI_{MSW}(A_i, A_j)$ if $\{A_i, A_j\} \in L_{MSW}$, and $s_{i,j} = 0$ otherwise
    3 $\{C_k\} = AHC(S, \lambda_{clus})$
    4 For each $C_k$: if (alarm type $e \in \Sigma$ is present in more than 50% of $A_i \in C_k$) add $e$ to $T_k$
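Steps 3 and 4 of Alg. 1 can be sketched as below. This is a plain-Python illustration under stated assumptions: the similarity matrix `S` is taken as given (the MSW index itself is not reproduced here), clustering stops when the best average similarity between two clusters falls below the clustering threshold, and the function names are hypothetical.

```python
from collections import Counter


def average_linkage_clusters(S, lam_clus):
    """Agglomerative clustering on a similarity matrix with average linkage.

    S: symmetric N x N list of lists of similarity indices.
    Merging stops once no pair of clusters reaches lam_clus on average.
    """
    clusters = [[i] for i in range(len(S))]

    def avg_sim(c1, c2):
        return sum(S[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = avg_sim(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < lam_clus:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def build_template(sequences, min_share=0.5):
    """Keep alarm types present in more than min_share of the cluster's sequences."""
    counts = Counter(tag for seq in sequences for tag in set(seq))
    return {tag for tag, n in counts.items() if n / len(sequences) > min_share}


# Illustrative similarity matrix for four alarm flood sequences.
S = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
clusters = average_linkage_clusters(S, lam_clus=0.4)
template = build_template([["A", "B", "C"], ["A", "B"], ["A", "D"]])
```

In this example sequences 0 and 1 are merged, while sequences 2 and 3 remain on their own, and the template retains only the alarm types `A` and `B`.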

    2.4 Map alarms to signals and assets

    The absence of a common naming convention for alarm, signal and asset tags hinders a straightforward mapping between the different process information sources. We propose to map alarms, signals and assets based on the similarity of the symbols contained in their tag names. For instance, the tag name of a temperature controller from the case study is TC11 in the P&ID; this controller has an associated signal whose tag is TC11CO.XAY, and an alarm type TC11CO is assigned to this signal.

    In this work six hashmaps $\{Map_{x \to y} : x \neq y,\ x, y = e, si, as\}$ mapping alarm types ($e$), signals ($si$) and assets ($as$) are generated based on the computation of a similarity index, as described in Alg. 2. Specifically, the similarity index ($SI_{SW}$) is obtained from the original Smith-Waterman algorithm (Smith and Waterman, 1981).

    Alg. 2: Mapping signals and alarms
    input : Sets of all signal tags $Si = \{si_{tag}\}$ and all alarm types $\Sigma$
    output: $Map_{si \to e}$ and $Map_{e \to si}$

    1 For each $si_{tag}$, find $e_{map} \in \Sigma$ such that $SI_{SW}(si_{tag}, e_{map}) = \max_{e \in \Sigma} SI_{SW}(si_{tag}, e)$ and $SI_{SW}(si_{tag}, e_{map}) \geq \lambda_{map}$
    2 Add pair $[si_{tag}, e_{map}]$ to $Map_{si \to e}$ and $[e_{map}, si_{tag}]$ to $Map_{e \to si}$
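A possible rendering of Alg. 2 is sketched below. The scoring values (match +1, mismatch and gap -1), the normalization of the local alignment score by the longer tag length and the default threshold of 0.5 are assumptions chosen for illustration; the paper does not state them.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity_index(a, b):
    """Assumed normalization: alignment score divided by the longer tag length."""
    if not a or not b:
        return 0.0
    return smith_waterman_score(a, b) / max(len(a), len(b))


def map_signals_to_alarms(signal_tags, alarm_types, lam_map=0.5):
    """Map each signal tag to its most similar alarm type if it reaches lam_map."""
    si_to_e, e_to_si = {}, {}
    for si in signal_tags:
        best = max(alarm_types, key=lambda e: similarity_index(si, e))
        if similarity_index(si, best) >= lam_map:
            si_to_e[si] = best
            e_to_si[best] = si
    return si_to_e, e_to_si


# "PC12CO.XAY", "FI99.Y" and the alarm list are made-up tags for this example.
signals = ["TC11CO.XAY", "PC12CO.XAY", "FI99.Y"]
alarm_types = ["TC11CO", "PC12CO", "PI13"]
si_to_e, e_to_si = map_signals_to_alarms(signals, alarm_types)
```

As in Alg. 2, each signal is mapped to a single best-matching alarm type; a signal with several associated alarms would need a list-valued map instead.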


    Fig. 3. P&ID of the cracking and quenching section of the ethylene plant

    2.5 Isolate the causal alarm

    The first alarm with respect to the time of occurrence cannot be considered as the causal alarm, since the time at which an alarm triggers is highly dependent on the settings of the alarm limits. Moreover, the time between the occurrence of a process abnormality and the triggering of the corresponding alarm is probabilistic. Since the causal relationships between signals can be captured from process data, the process data associated with these alarms should be analyzed instead.

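    The method used for this purpose is transfer entropy (Bauer et al., 2007). Transfer entropy uses transition probabilities to estimate the amount of information transferred from one signal to another. For a prediction horizon $h$, the transfer entropy $t(x|y)$ quantifies how much the knowledge of signal $y$ improves the prediction of signal $x$, i.e. the information transferred from $y$ to $x$, and reads:

    $$t(x|y) = \sum p(x_{i+h}, y_i, x_i) \cdot \log \frac{p(x_{i+h} | y_i, x_i)}{p(x_{i+h} | x_i)} \qquad (2)$$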

    The causality is obtained by comparing the amount of information transferred from signal $x$ to signal $y$ with the amount of information transferred from signal $y$ to signal $x$:

    $$t_{x \to y} = t(y|x) - t(x|y) \qquad (3)$$

    If $t_{x \to y}$ is positive, more information is transferred from $x$ to $y$, and therefore $x$ caused $y$. If $t_{x \to y}$ is negative, $y$ caused $x$. If it has a value close to 0, the causality cannot be deduced.
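To make Eqs. (2) and (3) concrete, the sketch below estimates the probabilities by counting after equal-width binning of the signals, with an embedding dimension of one. This binned estimator is a simplification chosen here for brevity and is not the estimator of Bauer et al. (2007); the function names, the number of bins and the synthetic test signals are assumptions.

```python
import random
from collections import Counter
from math import log


def discretize(signal, n_bins=4):
    """Map each sample to an equal-width bin index between 0 and n_bins - 1."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0] * len(signal)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in signal]


def transfer_entropy(x, y, h=1, n_bins=4):
    """Binned estimate of t(x|y) in Eq. (2): information transferred from y to x."""
    xs, ys = discretize(x, n_bins), discretize(y, n_bins)
    n = len(xs) - h
    triples = Counter((xs[i + h], ys[i], xs[i]) for i in range(n))
    pairs_yx = Counter((ys[i], xs[i]) for i in range(n))
    pairs_xx = Counter((xs[i + h], xs[i]) for i in range(n))
    singles = Counter(xs[i] for i in range(n))
    total = 0.0
    for (x_next, y_now, x_now), c in triples.items():
        p_joint = c / n                                          # p(x_{i+h}, y_i, x_i)
        p_given_yx = c / pairs_yx[(y_now, x_now)]                # p(x_{i+h} | y_i, x_i)
        p_given_x = pairs_xx[(x_next, x_now)] / singles[x_now]   # p(x_{i+h} | x_i)
        total += p_joint * log(p_given_yx / p_given_x)
    return total


def causality_measure(x, y, h=1, n_bins=4):
    """t_{x->y} of Eq. (3): positive values suggest that x drives y."""
    return transfer_entropy(y, x, h, n_bins) - transfer_entropy(x, y, h, n_bins)


# Synthetic check: y is x delayed by one sample, so x should appear as the cause.
rng = random.Random(0)
x = [rng.random() for _ in range(2000)]
y = [x[0]] + x[:-1]
t_xy = causality_measure(x, y)
```

For this synthetic pair `t_xy` is expected to be clearly positive, and swapping the arguments flips its sign by construction of Eq. (3).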

    This method may give rise to spurious results (Bauer et al., 2007). In order to reduce their number, the obtained results are validated against the plant connectivity information contained in the topology model of the plant (Thambirajah et al., 2009). CAEX (Computer Aided Engineering eXchange) is a vendor independent object-oriented data exchange format based on the XML schema that has been used in the past years for structuring plant topology models in a flexible and expandable way (see e.g.: Schleburg et al., 2013).

    The topology based validation of the data-driven analysis starts with the identification of the assets corresponding to the alarms in the fault template. The connectivity of the plant is explored using a graph-search algorithm on a graph capturing the connections between assets in the plant. The depth first search algorithm is employed in this work (Yang et al., 2010). The assets suggested as root-causes from the data-driven analysis are taken as starting points of the disturbance propagation path while the remaining assets are taken as secondary disturbed elements. All feasible propagation paths between the suggested root-cause and each of the secondary disturbed assets are explored. If no feasible path is found, the root-cause hypothesis is discarded.

    Once the spurious results of the data-driven analysis have been filtered out and one of the root-cause suggestions has been validated with plant connectivity, the causal alarm is taken as the alarm associated with the root-cause asset (see Alg. 3).
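The connectivity validation can be illustrated with a depth first search on a directed asset graph. The sketch below only checks that a path exists from a suggested root-cause to every secondary disturbed asset; it does not enumerate all feasible paths as the method does. The graph is a made-up toy example whose asset names merely resemble those of the case study, and it is not the plant topology.

```python
def reachable(graph, start):
    """Iterative depth first search: set of assets reachable from start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


def validate_root_causes(graph, suggested, disturbed):
    """Keep the suggested root-cause assets that can reach every disturbed asset."""
    valid = []
    for candidate in suggested:
        reach = reachable(graph, candidate)
        if all(asset in reach for asset in disturbed if asset != candidate):
            valid.append(candidate)
    return valid


# Hypothetical directed connections (material, energy or information links).
toy_graph = {
    "PC11": ["G01", "G02"],
    "G01": ["PC12", "LQE1"],
    "LQE1": ["ZoneA"],
    "ZoneA": ["PI13"],
}
validated = validate_root_causes(
    toy_graph,
    suggested=["PC11", "PC12"],
    disturbed=["G01", "G02", "PC12", "PI13"],
)
```

In this toy graph only `PC11` reaches all disturbed assets, so the hypothesis `PC12` is discarded.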

    Alg. 3: Isolate the causal alarm
    input : Clusters $\{C_k\}$ and fault templates $\{T_k\}$
    output: A causal alarm $a^k_c$ for each fault template $T_k$

    1 For each cluster $C_k$, choose a sequence $A_i : A_i \in C_k$
    2 $\{si_p\} = Map_{e \to si}(\{e_p : e_p \in T_k\})$
    3 Perform process-data causal analysis on signals $\{si_p\}$ to obtain suggested root-cause signals $\{si^k_{sg,q}\} : q \in \mathbb{N}$
    4 $\{as^k_{sg,q}\} = Map_{si \to as}(\{si^k_{sg,q}\})$ and $\{as^k_{sd}\} = Map_{e \to as}(\{e_p\})$
    5 If ($\exists\, as^k_{sg,q}$ with feasible propagation paths to all $as^k_{sd}$): $a^k_c = Map_{as \to e}(as^k_{sg,q})$

    3. INDUSTRIAL CASE STUDY

    The proposed method for determining the causal alarm in an alarm flood sequence is illustrated by an analysis conducted on the cracking and quenching section of an ethylene plant located in Austria.

    The alarm management system of the plant has more than 3800 alarms configured in the monitoring system. The analysis is performed using eight months of collected alarm data, during which more than 48000 alarms occurred. The process data are provided in an Excel file with a one-second sampling period. A CAEX file containing the topology model of the plant is used. The P&ID of the plant section is shown in Fig. 3. The data are analyzed using a prototype software tool developed to guide the user through the steps of the proposed method. The parameters for the alarm analysis are set by the user. The threshold number of alarms ($\tau$) (see Section 2.2) is set to six. Since operators of the ethylene plant control more than one area, the threshold number of alarms is set to a value smaller than 10 alarms per 10 min. Additionally, the value of 0.4 is chosen for the clustering threshold as well as for the prefiltering threshold (see Section 2.3). The parameter $\sigma^2$ from the MSW algorithm (Cheng et al., 2013) is set to 5000, allowing alarms with a time delay of around 3.5 min to be matched.
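One reading that is consistent with these two numbers, offered here as an assumption rather than as the definition used in Cheng et al. (2013), is that $\sigma^2$ is the variance (in s$^2$) of a Gaussian-shaped time weighting and that alarms are treated as matchable up to roughly three standard deviations apart:

```latex
\sigma = \sqrt{5000\ \mathrm{s}^2} \approx 70.7\ \mathrm{s},
\qquad
3\sigma \approx 212\ \mathrm{s} \approx 3.5\ \mathrm{min}
```

Under this reading, a larger $\sigma^2$ widens the time window within which the order of two alarms is blurred.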

    Fig. 4. Clustering results: (a) Clusters (b) Abnormality template of Cluster 1

    After loading the alarm log into the software tool and performing the steps Remove chattering alarms and Identify alarm flood periods, 23 alarm flood periods are identified. Results obtained from the Cluster alarm flood sequences step are depicted in Fig. 4 (a). Three clusters are obtained. Of the 23 alarm sequences analyzed, four do not belong to any cluster and 19 are clustered under one of the three groups. Each of these clusters, containing similar alarm flood sequences, represents a different process abnormality. Only the process abnormality from Cluster 1 is further analyzed. The template of this abnormality is displayed in Fig. 4 (b).

    To perform the data-driven and connectivity analysis, the signals and assets associated with the alarms included in the template are identified. After loading the process data and the topology model of the plant, the automatic mapping using the Smith-Waterman algorithm is performed. The results are displayed in Fig. 5. Tags in red represent alarms, tags in blue signals and tags in green assets. For instance, the valve G02 has an associated signal G02.Y and two alarms G02/010 and G02/011.

    The process signals appearing in the mapping tree of Fig. 5 are considered for the analysis. These signals are displayed in Fig. 6. The results of the process-data analysis are shown in a bubble chart representing the transfer entropy based causality matrix (Bauer et al., 2007). The analysis suggests that the root-cause is the signal PC11.XAY. GI01.Y, PC12.XAY, PI15.Y, PI14.Y, PI13.Y, PI16.Y and GI02.Y are considered secondary disturbed signals (see Fig. 7).

    Fig. 5. Alarms (red) from the abnormality template of Cluster 1 mapped to their corresponding signals (blue) and assets (green)

    Fig. 6. Time trends of the signals included in the analysis

    Fig. 7. Transfer entropy values $t_{x \to y}$

    Finally, the result of the process-data analysis is validated using the process connectivity. The process connectivity analysis found feasible propagation paths starting from the suggested root-cause to all secondary disturbed points. Fig. 8 depicts the obtained propagation paths. The process connectivity analysis suggests that the abnormality spread from the pressure controller PC11 S to the valves G01 and G02 via the information connection between this controller and the valves. From valve G01, the disturbance spread to the pressure controller PC12 S. Simultaneously, the disturbance reached Cracking Zone A, Cracking Zone B, Cracking Zone C and Cracking Zone D via the heat exchangers LQE 1, LQE 2, LQE 3 and LQE 4. Sensors PI13 S, PI14 S, PI15 S and PI16 S were also affected by the propagating disturbance since they are located in the coils of the different cracking zones. The root-cause suggestion from the data-driven analysis is therefore validated by the plant connectivity analysis. An alarm associated with PC11 S, namely alarm PC11 CO 4, is given as the causal alarm suggestion. All the alarms in the template will be grouped under it.

    Fig. 8. Propagation paths found by the connectivity analysis

    The obtained results were confirmed after discussion with site engineers. The analysed abnormal situation corresponds to a shift in the operating mode between cracking and decoking. The transition between these two operating modes triggers a characteristic alarm flood sequence. The site crew mentioned that pressure controller PC11 performs the opening and closing of both valve G01 and valve G02 during the change of operating mode and is therefore the origin of the alarm flood.

    4. CONCLUSIONS

    This paper proposed an innovative way to reduce alarm flooding by using a combination of data from different information sources. Combining alarm log, historical process data and process connectivity analysis made it possible not only to group consequence alarms, but also to determine the causal alarm of an alarm flood sequence. The proposed method was tested on actual data collected from an ethylene plant, where it was able to group related alarms triggered during an alarm flood and to identify their causal alarm.

    REFERENCES

    ANSI/ISA-18.2 (2009). Management of alarm systems for the process industries.

    Bauer, M., Thornhill, N., Cox, J., Caveness, M., and Downs, J. (2007). Finding the direction of disturbance propagation in a chemical process using transfer entropy. IEEE Transactions on Control Systems Technology.

    Beebe, D., Ferrer, S., and Logerot, D. (2007). Alarm floods and plant incidents. ProSys.

    Bouchair, N., Gayet, P., and Charbonnier, S. (2013). Fault isolation by comparison of event lists using a weighted distance. IEEE International Conference on Systems, Man and Cybernetics.

    Bransby, M.L. and Jenkinson, J. (1998). The management of alarm systems: a review of best practice in the procurement, design and management of alarm systems in the chemical and power industries. HSE Research Report CRR 166.

    Cheng, Y., Chen, T., and Izadi, I. (2013). Pattern matching of alarm flood sequences by a modified Smith-Waterman algorithm. Chemical Engineering Research and Design.

    Fan, Y. and Deyun, X. (2012). Progress in root cause and fault propagation analysis of large-scale industrial processes. Journal of Control Science & Engineering.

    Henningsen, A. and Kemmerer, J.P. (1995). Intelligent alarm handling in cement plants. IEEE Industry Applications Magazine.

    Higuchi, F., Yamamoto, I., Takai, T., Noda, M., and Nishitani, H. (2009). Use of event correlation analysis to reduce number of alarms. Computer Aided Chemical Engineering.

    Hollender, M. and Beuthel, C. (2007). Intelligent alarming. ABB Review 1/2007.

    Hollifield, B. and Habibi, E. (2011). Alarm Management: A Comprehensive Guide, Second Edition. ISA.

    IEC-62682 (2014). Management of alarm systems for the process industries.

    Kabir, A., Iyadi, I., Chen, T., Joe, D., and T., B. (2013). Similarity analysis of industrial alarm flood data. IEEE Transactions on Automation Science and Engineering.

    Lai, S. and Chen, T. (2015). Methodology and application of pattern mining in multiple alarm flood sequences. IFAC-PapersOnLine.

    Schleburg, M., Christiansen, L., Fay, A., and Thornhill, N. (2013). A combined analysis of plant connectivity and alarm logs to reduce the number of alerts in an automation system. Journal of Process Control.

    Smith, T. and Waterman, M. (1981). Identification of common molecular subsequences. Journal of Molecular Biology.

    Thambirajah, J., Thornhill, N., Benabbas, L., and Bauer, M. (2009). Cause-and-effect analysis in chemical processes utilizing XML, plant connectivity and quantitative process history. Computers and Chemical Engineering.

    Wang, J., Yang, F., Chen, T., and Shah, S. (2015). An overview of industrial alarm systems: Main causes for alarm overloading, research status, and open problems. Automation Science and Engineering, IEEE Transactions on.

    Yang, F., Shah, S., Xiao, D., and Chen, T. (2012). Improved correlation analysis and visualization of industrial alarm data. ISA Transactions.

    Yang, F., Shah, S.L., and Xiao, D. (2010). Signed directed graph modeling of industrial processes and their validation by data-based methods. Control and Fault-Tolerant Systems (SysTol).
