
Presentation on Alarm Management

Page 1: Presentation on Alarm Management

Abnormal Situation Management

Defining the way things will be.

ASM

Page 2: Presentation on Alarm Management

The birth of ASM...

• ASM grew from an initial focus on alarm management. Most sites are aware that operator overload and alarm floods are common during abnormal operations. As we analyzed the issues around alarm management, we discovered that operator problems with the alarm system were only a symptom of a general issue:
  – the design, implementation, and maintenance of many facilities, systems, and practices.

Page 3: Presentation on Alarm Management

ASM Consortium

• Charter:
  – Research the causes of abnormal situations and create technologies to address this problem
• Deliverables:
  – Technology, best practices, application knowledge, prototypes, metrics
• History:
  – Started in 1994
  – Co-funded by US Govt (NIST)
  – Budget: +$16M USD
• Current Status:
  – Committed through 2002
  – Honeywell leadership
  – Expanding membership

Current Membership:

University Affiliates

BRAD ADAMS WALKER ARCHITECTURE, P.C.

Page 4: Presentation on Alarm Management

Requirements for Safe Operation

• Hazards must be recognized and understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance

In the area of alarm management most companies fail to meet these basic requirements for safe operation

Page 5: Presentation on Alarm Management

Various cost elements

[Figure: plant performance (efficiency) chart. Levels marked: Shut down, Break-even, Operating Target, Current Limit, Theoretical Limit. Regions and annotations: Comfort Margin; Lost opportunity (Cost of comfort); Theoretically possible, currently unsustainable; Future upgrades (e.g., Advanced Control); Profit; Lost Profit; Lost Revenue; Loss; Fixed Costs (Idle Plant); Additional unplanned costs; Equipment damage, etc.; Incident; Accident.]

Losses due to incidents, accidents (about 10% of operating costs)

Savings from reducing the comfort margin

Page 6: Presentation on Alarm Management

A Look At Plant Operations

[Figure: histogram of days per year against daily production (axis marks: < 60%, 95%, 100%), captioned "A typical Production Profile for an Asset Intensive Facility for a calendar year." Bar values, in extraction order: 95, 62, 23, 79, 47, 30, 16, 8 and 5 days. Marker: Production Target set by Enterprise.]

Page 7: Presentation on Alarm Management

Factors Affecting Plant Operations

[Figure: days per year against daily production (axis marks: < 60%, 95%, 100%), with the Plant Operating Target and the Plant Capacity Limit marked. Factors shown: Operational Constraints; Planning Constraints; Plant Availability; Plant Incidents; Production Effectiveness; Asset Utilization; Agility/Flexibility.]

Page 8: Presentation on Alarm Management

Real Life Examples

[Figure: four histograms of plant production data.
  – Total Feed: # Days against Rate (280 to 620), annotated $33.5 M
  – Total Feed: # Days against Rate (280 to 620), annotated $38.5 M
  – Production rate: Frequency against Production rate (112 to 183), annotated 3.2% and 5.8%
  – Feed Rate: Frequency against Feed Rate (457 to 595), annotated 1503 and $24.2M]

This plant had 5.8% in lost capacity!

This plant had $24.2M in lost capacity due to asset availability & incidents!

This plant lost $38.5M!

And this plant lost $33.5M!

Page 9: Presentation on Alarm Management

Site Studies have identified Plant Lost Opportunity

[Figure: days per year against daily production (axis marks: < 60%, 95%, 100%), with the Plant Operating Target, Plant Capacity Limit, Operational Constraints and Planning Constraints marked.]

Between 3-15% in lost capacity is attributed to asset unavailability and incidents.

Figure labels: Plant Availability; Plant Incidents; Production Management; DCS/APC/Optimization efforts; Manufacturing Execution; Scheduling & ERP; Asset Management; Reliability & CMMS.

NEW EMPHASIS!!

Page 10: Presentation on Alarm Management

[Figure: days per year against daily production (axis marks: < 60%, 95%, 100%), showing a Higher Plant Operating Target relative to the Plant Capacity Limit, with Fewer Operational Constraints and Fewer Planning Constraints.]

Emphasis on plant & equipment reliability improvements and reduced incidents can result in a recovery of 3-15% of lost capacity!

Major Profit Potential

Page 11: Presentation on Alarm Management

The Importance of Alarm Management Improvement Project

Alarm management is the proper design, implementation, operation, and maintenance of industrial manufacturing plant alarm systems.

Current alarming practices are leading to incidents.

Major problems are:

• Alarm floods
• Standing alarms
• Poor configuration of alarms
• Nuisance alarms

Technology exists to significantly contribute to effective alarm systems and provide good Situation Awareness
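As a rough illustration of how an alarm flood might be detected from an alarm journal, the sketch below slides a time window over alarm timestamps and reports the spans where the count exceeds a threshold. The function name, the 10-minute window and the threshold of 10 alarms are assumptions chosen for illustration, not values taken from this presentation; a site would set them from its own alarm philosophy.

```python
from collections import deque
from typing import Iterable, List, Tuple


def find_flood_windows(alarm_times: Iterable[float],
                       window_s: float = 600.0,   # assumed window length
                       threshold: int = 10        # assumed flood threshold
                       ) -> List[Tuple[float, float]]:
    """Return (start, end) spans in which more than `threshold` alarms
    fall inside a sliding window of `window_s` seconds."""
    recent: deque = deque()           # timestamps inside the current window
    spans: List[Tuple[float, float]] = []
    for t in sorted(alarm_times):
        recent.append(t)
        # Drop alarms that have aged out of the window.
        while t - recent[0] > window_s:
            recent.popleft()
        if len(recent) > threshold:
            if spans and recent[0] <= spans[-1][1]:
                # Overlaps the previous flood span: extend it.
                spans[-1] = (spans[-1][0], t)
            else:
                spans.append((recent[0], t))
    return spans


if __name__ == "__main__":
    # One alarm every 2.5 seconds for 100 seconds.
    burst = [i * 2.5 for i in range(41)]
    print(find_flood_windows(burst))
```

A journal with only a few widely spaced alarms returns an empty list, so the same function can be run over a whole shift to separate steady-state operation from flood periods.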

Page 12: Presentation on Alarm Management

Alarms identified as contribution

Page 13: Presentation on Alarm Management

A Case

The lightning struck just before 9:00 AM on a Sunday. It immediately started a fire in the crude distillation unit of the refinery. The control operators on duty responded by calling out the fire brigade, and then had to divert their attention to a growing number of alarms while desperately trying to bring the crude unit to a safe emergency shutdown.

Hydrocarbon flow was lost to the deethanizer in the FCCU recovery section, which fed the debutanizer further along. The system was arranged to prevent total loss of liquid level in the two vessels, so the falling level in the deethanizer caused the deethanizer discharge valve to close. This, in turn, caused the level in the debutanizer to drop rapidly and its discharge valve also closed. Heat remained on the debutanizer and the trapped liquid vaporized as the pressure rose, causing the pressure relief valve to “pop” (for the first of three times) into the flare KO drum and then immediately onto the flare itself.

Page 14: Presentation on Alarm Management

continued

In a matter of minutes, the board operator was able to restore flow to the deethanizer. This permitted the deethanizer discharge valve to be opened, allowing renewed flow forward to the debutanizer. The rising level in the debutanizer should have caused the debutanizer discharge valve to open (by the level controller action) and allow flow on to the naphtha splitter. Although the operators in the control room received a signal indicating the valve had opened, the debutanizer was nonetheless filling rapidly with liquid while the naphtha splitter was emptying. The operators were concentrating on the displays which focussed on the problems with the deethanizer and debutanizer, and had no overview of the process available to indicate that, even though the debutanizer discharge valve registered as open, there was no flow going from the debutanizer to the naphtha splitter.

Page 15: Presentation on Alarm Management

Despite attempts to divert the excess, the debutanizer became liquid-logged about an hour later and the pressure relief valve lifted for the second time, venting to the flare via the flare KO drum. Because there were enormous volumes of gas venting, the level of liquid in the flare KO drum was rising to a very high value.

About 2-1/2 hours later, the debutanizer vented to the flare a third time AND CONTINUED VENTING FOR 36 MINUTES. The high level alarm for the flare drum was activated at this time. But with alarms going off every 2 to 3 seconds, there appears to be no evidence that that alarm was ever seen. By this time, the flare KO drum had filled with liquid well beyond its design capacity. The fast-flowing gas through the overfilled drum forced liquid out of the drum’s discharge pipe. The discharge line was not designed for liquid, so the force of the liquid caused a rupture at an elbow. This released over 20 tons of highly flammable hydrocarbon.

Page 16: Presentation on Alarm Management

continued

The ensuing release quickly formed an ominous drifting cloud of vapor and droplets. In a matter of minutes, this cloud found its ignition source 350 feet downwind. The resulting explosion was heard 80 miles away. In the town nearest the plant, few windows still held intact panes, so overpowering was the pressure shock wave from the blast. The last fires in the refinery were eventually extinguished 2 days later. end

Page 17: Presentation on Alarm Management

Stylistic or Cultural Indicators

Top Down: Commitment, Competence, Cognizance

[Figure: safety information system diagram. Labels: data collected & analyzed; Diagnostic and remedial measures; Source Failure Types; Functional Failure Types; General Failure Types; Condition Tokens; Precursors; Unsafe Acts; Errors & Violations; Safety Information System; Interface between the organization & the individual; Management; Workplace; Accidents; Incidents; Near-Misses; 1-10 hit list; Proactive Design; SI Projects; Best Practices; Poor workplace design; High workload; Unsociable hours; Inadequate training; Poor perception of hazards; Alarms; Human Factors; Control room design; Near miss Auditing; Du Pont Training; Workspace; Motivation; Attitude; Group Factors; Working Practice; Organization; Individual.]

Page 18: Presentation on Alarm Management

Various cost elements

[Figure: plant performance (efficiency) chart. Levels marked: Shut down, Break-even, Operating Target, Current Limit, Theoretical Limit. Regions and annotations: Comfort Margin; Lost opportunity (Cost of comfort); Theoretically possible, currently unsustainable; Future upgrades (e.g., Advanced Control); Profit; Lost Profit; Lost Revenue; Loss; Fixed Costs (Idle Plant); Additional unplanned costs; Equipment damage, etc.; Incident; Accident.]

Losses due to incidents, accidents (about 10% of operating costs)

Savings from reducing the comfort margin

Page 19: Presentation on Alarm Management

Managing Abnormal Situations Anatomy of a Disaster from Operations Perspective

Operational Modes:

Normal

Abnormal

Emergency

Plant States:

Normal

Abnormal

Out of Control

Accident

Disaster

Critical Systems:

Decision Support System

Process Equipment,

DCS, Automatic Controls

Plant Management Systems

Safety Shutdown,

Protective Systems,

Hardwired Emergency Alarms

DCS Alarm System

Physical and Mechanical Containment System

Site Emergency Response System

Area Emergency Response System

Operational Goals:

Keep Normal

Return to Normal

Bring to Safe State

Minimize Impact

Plant Activities:

Preventative Monitoring & Testing

Manual Control & Troubleshooting

Firefighting

First Aid

Rescue

Evacuation

Page 20: Presentation on Alarm Management

Summarized Production Data

[Figure: four histograms of plant production data.
  – Total Feed: # Days against Rate (280 to 620), annotated $33.5 M
  – Total Feed: # Days against Rate (280 to 620), annotated $38.5 M
  – Production rate: Frequency against Production rate (112 to 183), annotated 3.2% and 5.8%
  – Feed Rate: Frequency against Feed Rate (457 to 595), annotated 1503 and $24.2M]

Unexpected Upsets Cost 3-8% of Capacity

[Figure: days per year against daily production (axis marks: < 60%, 95%, 100%), with the Plant Operating Target, Plant Capacity Limit, Optimization efforts, Operational Constraints and Planning Constraints marked.]

~ $10 Billion annually in lost production!

Page 21: Presentation on Alarm Management

[Figure: days per year against daily production (axis marks: < 60%, 95%, 100%), showing a Higher Plant Operating Target relative to the Plant Capacity Limit, with Fewer Operational Constraints and Fewer Planning Constraints.]

Focused efforts can result in recovery of 3-8% of capacity

~ $10 Billion potential to the bottom line!

Major Profit Potential

Page 22: Presentation on Alarm Management

Timing diagram of DIN V 19251 as applicable for a single channel SRS with ultimate self tests executed within the PST

[Figure: time axis (t) with three events in sequence:
  – Failure occurrence in the process or in the safeguarding system
  – Failure is detected
  – Safe status of the process assured
Intervals marked on the axis:
  – System internal diagnostic time
  – Time for corrective action
  – Time for reaction of the process on the corrective action
  – Fault Tolerance Time: fault tolerance time of the process, or Process Safety Time (PST)]
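The relationship in the timing diagram can be sketched as a simple budget check: the diagnostic time, the time for corrective action and the time for the process to react should together fit inside the Process Safety Time. The class and function names below are illustrative assumptions, and the numbers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTiming:
    """Intervals between failure occurrence and an assured safe state (seconds)."""
    diagnostic_s: float          # system internal diagnostic time
    corrective_action_s: float   # time for corrective action
    process_reaction_s: float    # time for the process to react to that action

    def total_s(self) -> float:
        return self.diagnostic_s + self.corrective_action_s + self.process_reaction_s


def within_process_safety_time(timing: ResponseTiming, pst_s: float) -> bool:
    """True when the whole response fits inside the Process Safety Time (PST)."""
    return timing.total_s() <= pst_s


if __name__ == "__main__":
    timing = ResponseTiming(diagnostic_s=2.0, corrective_action_s=3.0, process_reaction_s=4.0)
    print(timing.total_s(), within_process_safety_time(timing, pst_s=10.0))
```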

Page 23: Presentation on Alarm Management

Reliability Requirements for Alarms

Claimed PFDavg: 1 – 0.1

Alarm system integrity/reliability requirements:
• Alarms may be integrated into the process control system

Human reliability requirements:
• No special requirements – however the alarm system should be operated, engineered and maintained to the good engineering standards identified in the EEMUA Guide

EEMUA Alarm Systems Guide page 17

Page 24: Presentation on Alarm Management

CONCEPT 1 : RISK REDUCTION

[Figure: risk scale with increasing risk along the axis. Labels: EUC Risk; Risk to meet required Level of Safety; Actual remaining risk; Necessary minimum risk reduction [ R ]; Actual risk reduction; Partial risk covered by External Risk Reduction Facilities; Partial risk covered by Other Technology SRSs; Partial risk covered by E/E/PES SRSs; Risk reduction achieved by all SRSs & External Risk Reduction Facilities.]

Page 25: Presentation on Alarm Management

SAFETY INTEGRITY LEVELS

TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES

Column 1: SAFETY INTEGRITY LEVEL (SIL)
Column 2: DEMAND MODE OF OPERATION (Average probability of failure to perform its design function on demand)
Column 3: CONTINUOUS/HIGH DEMAND MODE OF OPERATION (Average probability of a dangerous failure per year)

SIL    Demand mode          Continuous/high demand mode
4      10^-5 to < 10^-4     10^-5 to < 10^-4
3      10^-4 to < 10^-3     10^-4 to < 10^-3
2      10^-3 to < 10^-2     10^-3 to < 10^-2
1      10^-2 to < 10^-1     10^-2 to < 10^-1
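The demand-mode column of the table can be read as a lookup from an average probability of failure on demand to a SIL. A minimal sketch follows; the function name is an assumption, and each band is treated as including its lower bound and excluding its upper bound, as the "to <" notation in the table suggests.

```python
from typing import Optional

# (SIL, lower bound inclusive, upper bound exclusive), demand mode of operation.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose demand-mode band contains pfd_avg, or None if
    the value falls outside every band in the table."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


if __name__ == "__main__":
    for value in (0.05, 0.005, 5e-4, 5e-5, 0.5):
        print(value, sil_for_pfd(value))
```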

Page 26: Presentation on Alarm Management

Reliability requirements for alarms

Claimed PFDavg: 0.1 – 0.01

Alarm system integrity/reliability requirements:
• Alarm system should be designated as safety related and categorized as SIL 1
• Alarm system should be independent from the process control system

Human reliability requirements:
• The operator should be trained in the management of the specific plant failure that the alarm indicates
• The alarm presentation arrangements should make the claimed alarm very obvious to the operator and distinguishable from other alarms
• The alarm should remain on view to the operator for the whole of the time it is active

EEMUA Alarm Systems Guide page 17

Page 27: Presentation on Alarm Management

Reliability requirements for alarms

Claimed PFDavg: Below 0.01

Alarm system integrity/reliability requirements:
• Alarm system would have to be designated as safety related and categorized as at least SIL 2

Human reliability requirements:
• It is not recommended that claims for a PFDavg below 0.01 are made for any operator action, even if it is multiple alarmed and very simple.

For all credible accident scenarios the designer should demonstrate that the total number of safety related alarms and their maximum rate of presentation does not overload the operator.

EEMUA Alarm Systems Guide page 17
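The three claimed-PFDavg bands from these slides can be captured as a small lookup. This is only a sketch of the slide content: the type and function names are assumptions, the treatment of the exact boundary values 0.1 and 0.01 is an assumption, and the independence field is left as `None` where the slide does not state a requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmClaimBand:
    label: str
    min_sil: Optional[int]                          # None: not safety related
    independent_of_control_system: Optional[bool]   # None: not stated on the slide
    operator_claim_recommended: bool


def band_for_claim(pfd_avg: float) -> AlarmClaimBand:
    """Map a claimed PFDavg for an alarm to the requirements band on the slides."""
    if not 0.0 < pfd_avg <= 1.0:
        raise ValueError("PFDavg must be in the interval (0, 1]")
    if pfd_avg >= 0.1:
        # Alarms may be integrated into the process control system.
        return AlarmClaimBand("1 - 0.1", None, False, True)
    if pfd_avg >= 0.01:
        # Safety related, SIL 1, independent from the process control system.
        return AlarmClaimBand("0.1 - 0.01", 1, True, True)
    # At least SIL 2; claims in this band are not recommended for operator action.
    return AlarmClaimBand("below 0.01", 2, None, False)


if __name__ == "__main__":
    print(band_for_claim(0.05))
```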

Page 28: Presentation on Alarm Management

The Setting of a high pre-trip alarm

[Figure: alarmed variable against time, with points A and B marked. Labels: Limit of largest normal operational fluctuation; Alarm Setting; Abnormal Operating Region; Maximum rate of change of alarmed variable during fault; Time for operator to respond to alarm and correct fault; Limit at which protection operates.]

EEMUA Alarm Systems Guide page 17
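One way to read the figure above is as a feasibility check on the alarm setting: it should sit above the largest normal fluctuation, yet far enough below the protection limit that the operator has time to respond at the maximum rate of change. The sketch below assumes a linear rise at that maximum rate; the function name and the example numbers are illustrative assumptions.

```python
from typing import Optional, Tuple


def high_alarm_band(trip_limit: float,
                    normal_max: float,
                    max_rate_per_s: float,
                    response_time_s: float) -> Optional[Tuple[float, float]]:
    """Return (lowest, highest) acceptable high-alarm settings, or None when
    no setting satisfies both limits.

    trip_limit       limit at which protection operates
    normal_max       limit of largest normal operational fluctuation
    max_rate_per_s   maximum rate of change of the alarmed variable during a fault
    response_time_s  time for the operator to respond to the alarm and correct the fault
    """
    # Highest setting that still leaves the operator the full response time.
    highest = trip_limit - max_rate_per_s * response_time_s
    if highest < normal_max:
        return None  # margin too small: a design trade-off is needed
    return (normal_max, highest)


if __name__ == "__main__":
    print(high_alarm_band(trip_limit=100.0, normal_max=70.0,
                          max_rate_per_s=0.5, response_time_s=30.0))
```

When the function returns `None`, the margin between normal operation and the trip limit is too small for the assumed response time.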

Page 29: Presentation on Alarm Management

[Figure: gas concentration (percentage of LEL, 0 to 120) against time after onset of fault (0 to 80 seconds). Curves: Actual Gas Concentration; Measured Gas Concentration. Annotations: Normal operating level; Gas concentration prior to fault; Fault occurs; Sampling delay; Sensor delay; Error; Error delay; Set trip point; Actual trip point; Shut down system delay; Lower Explosive Limit (LEL); Explosion.]

Page 30: Presentation on Alarm Management

Redesign Choices

• Redesign: redesign the plant or its controls to provide greater margin between the normal operating limits and the trip limits. This is the most desirable solution but is often impractical or too expensive.

• Setting within normal operating limits: set the alarm within the limits of normal operating fluctuations and accept that spurious alarms will occur during large normal disturbances. This is ergonomically very undesirable and will tend to increase alarm rates and reduce operator confidence in the alarm system. In effect it increases the Average Probability of Failure on Demand (PFDavg) for the alarm system as a whole.

• Setting nearer trip limits: set the alarm closer to the trip limits and accept that some fast transients will not be corrected by the operator before they reach the trip level. This will increase the production losses due to plant trips and, because there are more demands on the protection system, tend to make the plant less safe. It also implies an increased PFDavg for the alarm system.

EEMUA Alarm Systems Guide page 17

Page 31: Presentation on Alarm Management

Different Kinds of Events

[Figure: potential impact of initiating event against time, with curves for three kinds of event: Abrupt/Catastrophic, Insidious, and Manageable.]

Page 32: Presentation on Alarm Management

Impact of DCS Alarm System: Awareness of Disturbances

With typical alarm systems, orienting begins after an event creates an abnormal plant state.

The extent of the problem can impact operator’s ability to be fully aware of the locations of process disturbances.

As disturbances propagate the number of conditions to be aware of increases as well as the response requirements and the likelihood of missing important information.

[Figure: potential impact of initiating event against time. Labels: Failure occurrence in the process or in the safeguarding system; Failure is detected; Point of operator awareness; Correct intervention causes return to normal; Safe status of the process assured; Incident.]

Page 33: Presentation on Alarm Management

Impact of DCS Alarm System: Management of Problems

• Standing alarms interfere with orientation
• Alarm floods delay evaluation
• Inadequate filtering interferes with action

[Figure: potential impact of initiating event against time. Labels: Point of operator awareness; Correct intervention causes return to normal; Incident.]

Page 34: Presentation on Alarm Management

Impact of Good Alarm Management in Situation Awareness

• Increases likelihood of awareness of disturbances
• Reduces time to awareness
• Hence, reduces the average impact of initiating events

[Figure: potential impact of initiating event against time. Label: Average shift in awareness with decision support.]

Page 35: Presentation on Alarm Management

Impact of Protection System

[Figure: impact of initiating event against time, divided into SAFE and UN-SAFE regions. Labels: High Alarm; High Emergency; Emergency Alarm; Trip; Trip from SIS; Operator diagnostic time; FTT; Process Safety Time; Profit; Quality; Loss; Incident.]

FTT = Fault Tolerance Time

Page 36: Presentation on Alarm Management

[Figure: potential impact of initiating event against time, with curves for four operator responses: No response, Incorrect, Suboptimal, and Best.]

Page 37: Presentation on Alarm Management

Impact of Decision Support System: Support for Optimal Response

• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness

[Figure: potential impact of initiating event against time.]

Page 38: Presentation on Alarm Management

ASM Alarm Management Solutions

Education for Management, Engineers, Technicians and Operators.

• Alarm Performance Assessment.
• Requirement for alarm optimization tools.
• Alignment with Company & EEMUA Guidelines.
• Alarm Rationalization.
• User Interface Design.
• Decision Support Activities

Page 39: Presentation on Alarm Management

Alarm Management Optimization: Objectives

• Enhance operator effectiveness
  – Avoid alarm floods
  – Identify root causes
  – Eliminate nuisance alarms
• Enhance profitability
  – Reduce variability
  – Maximize plant up time
  – Prevent damage to equipment
• Reduce risk of:
  – Injury to personnel
  – Environmental incidents

Page 40: Presentation on Alarm Management

Alarm Management Optimization: The Process

[Figure: process diagram. Steps, in extraction order:
  – Develop Plant Alarm Management Standards & Philosophy
  – Identify Enhancements
  – Change Management
  – Implement
  – Verify Against Standards
  – Collect Data
  – Analyze]

Page 41: Presentation on Alarm Management

Alarm Management

[Figure: alarm counts before and after optimization (chart labels: 100K, 2K).]

Before - 30 Points Account for ~ 85 % of All Alarms

After - 30 Points Account for ~ 52 % of All Alarms

Alarm Management Optimization

• Increase the effectiveness of the existing alarm system through proven methodology
  – Analyze existing system performance
  – Assist in developing an alarm strategy and educating operations staff
  – Rationalize existing alarm system
• Recommend and apply new alarm management software
  – UserAlert
  – Optimization Suite
• Alarm Rationalization and Documentation
• Alarm Metrics and Analysis
• Advanced Alarm Handlers
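The "30 points account for ~85% of all alarms" figure is a top-N contribution metric. As a rough sketch of how it could be computed from an alarm journal, the function below counts alarms per point tag and returns the share produced by the N most frequent points. The function name and the sample tags are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_n_share(alarm_tags: Iterable[str], n: int = 30) -> float:
    """Fraction of all alarms raised by the n most frequently alarming points."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


if __name__ == "__main__":
    # Hypothetical journal: one tag per alarm occurrence.
    journal = ["FI101"] * 6 + ["LI202"] * 3 + ["PI303"] * 1
    print(f"Top point share: {top_n_share(journal, n=1):.0%}")
```

Tracking this share before and after rationalization gives a simple measure of whether a small set of points still dominates the alarm load.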

Page 42: Presentation on Alarm Management

Alarm Rationalization: Optimization Suite…

• Alarm priority (class) is based on severity and level of impact and time
• Available priority options in TPS:
  – No Action
  – Journal
  – Print
  – Print & Journal
  – Low
  – High
  – Emergency

Page 43: Presentation on Alarm Management

Alarm Rationalization: Optimization Suite…

• Recommends alarm priorities based on plant philosophy
  – Severity of impact
  – Time to respond
  – Trip Point
• Electronically captures plant alarm management philosophy
  – Time to respond rules definition
  – Impact and severity rules definition
• Apply manual priority override
• Use Alarm Impact Templates
• Generate EC Files (Honeywell)
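To make the idea of rule-based priority recommendation concrete, here is a minimal sketch of a severity and time-to-respond matrix with a manual override. It is not the Optimization Suite's actual logic: the severity classes, the 5 and 15 minute thresholds, the rule table and all function names are hypothetical, and only the priority names come from the TPS options listed in this presentation.

```python
from typing import Optional

# Priority options listed for TPS in this presentation.
PRIORITIES = ("No Action", "Journal", "Print", "Print & Journal",
              "Low", "High", "Emergency")

# Hypothetical plant philosophy: (severity, urgency) -> recommended priority.
_RULES = {
    ("minor", "long"): "Journal",
    ("minor", "medium"): "Low",
    ("minor", "short"): "Low",
    ("major", "long"): "Low",
    ("major", "medium"): "High",
    ("major", "short"): "High",
    ("severe", "long"): "High",
    ("severe", "medium"): "Emergency",
    ("severe", "short"): "Emergency",
}


def urgency_class(time_to_respond_min: float) -> str:
    """Classify time to respond using assumed 5 and 15 minute thresholds."""
    if time_to_respond_min < 5:
        return "short"
    if time_to_respond_min < 15:
        return "medium"
    return "long"


def recommend_priority(severity: str,
                       time_to_respond_min: float,
                       override: Optional[str] = None) -> str:
    """Recommend a priority from the rule table, unless manually overridden."""
    if override is not None:
        if override not in PRIORITIES:
            raise ValueError(f"unknown priority: {override}")
        return override
    return _RULES[(severity, urgency_class(time_to_respond_min))]


if __name__ == "__main__":
    print(recommend_priority("severe", 2))
    print(recommend_priority("minor", 60))
    print(recommend_priority("major", 10, override="Low"))
```

Keeping the rules in a single table mirrors the idea of capturing the plant philosophy electronically: changing the philosophy means editing the table, not the code that applies it.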