Retail Market Messaging Incident Report 1



Agenda

• Recap of integrated market messaging system

• Incident Overview

• Current Status

• Incident Observations

• Next Steps


Arrangements before May 2012

[Diagram components: SAP IS-U; Seebeyond Message Hub; Supplier Billing System; MPCC (Supplier); CC&B Supplier Billing System; EMMA (Supplier); Seebeyond EMMA; Seebeyond GEMMA]


Arrangements after Go-Live in October 2012

[Diagram components: SAP IS-U; VPN; Central Messaging Hub; Market Participant Billing System; Market Participant Messaging Application; EMMA (MPCC)]

Common technical harmonisation solution: single instance supported by Northgate

Total Market Messaging Volumes PA (figures shown on the diagram): ~30M; ~4M


Incidents in November and December 2012 and January 2013

[Diagram: Central Messaging Hub connected to five Market Participants, each with a Market Participant Billing System, Market Participant Messaging Application and EMMA]


[Diagram: Central Messaging Hub and five Market Participants (Billing System, Messaging Application, EMMA), with two databases (DB) and the label "Processing Messages" added]


Incident Description

• Serious, multi-faceted performance issues at EMMA level
  – Electric Ireland (EI) EMMA server resources
    • CPU, memory, etc. at upper limits
  – EI EMMA database/application
    • Performance issues and timeouts
    • Database table/storage limits reached
    • System stressed
  – Other MPs' EMMAs had similar issues, but with greatly reduced impact


Incident Description

• Central messaging hub issues
  – Messages backlogged and clogging the system
    • Time-out parameters
  – System resources tied up processing unsent messages, retry after retry
  – Overall performance impacted
  – Systems stressed and data storage management issues (processing folder)
  – Market Participant impacts dependent on volumes and timings
  – Central hub database integrity impacted
  – Presented serious operational issues for system performance
  – Highlighted specific application and database performance weaknesses
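The effect of "retry after retry" on hub resources can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the hub re-attempts every unsent message at a fixed interval for as long as the receiving EMMA is unavailable, and contrasts this with capped exponential backoff, a common alternative that is not stated here to be how the TIBCO hub is configured. The interval, cap, outage length and backlog size are illustrative values, not figures from this report.

```python
def fixed_interval_attempts(outage_s: int, interval_s: int) -> int:
    """Delivery attempts per message when retrying every interval_s seconds."""
    # Attempts occur at t = 0, interval_s, 2 * interval_s, ... while t < outage_s,
    # i.e. the ceiling of outage_s / interval_s.
    return -(-outage_s // interval_s)


def backoff_attempts(outage_s: int, base_s: int, cap_s: int) -> int:
    """Delivery attempts per message when the retry delay doubles, up to cap_s."""
    t, delay, attempts = 0, base_s, 0
    while t < outage_s:
        attempts += 1
        t += delay
        delay = min(delay * 2, cap_s)
    return attempts


if __name__ == "__main__":
    OUTAGE_S = 8 * 3600   # hypothetical 8-hour EMMA outage
    BACKLOG = 10_000      # hypothetical number of unsent messages
    fixed = fixed_interval_attempts(OUTAGE_S, interval_s=60)
    backoff = backoff_attempts(OUTAGE_S, base_s=60, cap_s=3600)
    print(f"fixed 60 s interval: {fixed * BACKLOG:,} attempts")      # 4,800,000
    print(f"backoff capped at 1 h: {backoff * BACKLOG:,} attempts")  # 130,000
```

The comparison only shows that total attempts scale with backlog size multiplied by attempts per message; bounding the second factor (by backoff, a maximum attempt count, or parking messages for manual handling) limits how much capacity one unavailable EMMA can consume.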


TIBCO and Non-TIBCO Related Events

• A number of events occurred that impacted market messaging delivery
  – Only some of these events were the result of the market messaging application incidents
• Summary of the main events that occurred:
  – EMMA and Central Hub performance issues
  – Northgate outages
    • Emergency and planned
  – Market Participants' system outages
    • EMMA and back-end systems (non-TIBCO)
  – Multiple hardware faults
    • ESBN SAP IS-U and PI
  – ESB data centre switch-overs
    • Planned and short notice
  – Connectivity issues


Incident Management

• EMMAs
  – Electric Ireland EMMA
    • Additional EMMA application server resources were added
    • Significant performance improvements resulted
    • EMMA database monitoring fully put in place by EI
  – Eirgrid EMMA
    • Application service resources for outbound messages re-allocated to inbound messages
    • Good performance improvements resulted
  – Airtricity EMMA
    • Database issue resolved
  – Database indexes applied at EMMAs
  – 341 market messaging defect fix applied to EMMAs


Incident Management

• Central messaging hub

– Manual manipulation of messaging folders required to resolve incidents successfully

– Manual operational management of un-sent messages put in place

– Manual reconciliation processes put in place

– Additional central messaging hub server and data storage resources were added

– Database indexes applied
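The manual reconciliation processes listed above come down, at their simplest, to comparing two lists of message identifiers. The minimal Python sketch below assumes, hypothetically, that the hub can export the IDs of messages it sent to an EMMA and that the EMMA can export the IDs it received; the identifiers and the `reconcile` function are illustrative only and do not describe the actual Northgate procedure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReconciliationResult:
    unacknowledged: list[str]  # sent by the hub, not found at the EMMA
    unexpected: list[str]      # found at the EMMA, not in the hub's sent log


def reconcile(hub_sent: set[str], emma_received: set[str]) -> ReconciliationResult:
    """Compare hub and EMMA message ID sets and report both kinds of mismatch."""
    return ReconciliationResult(
        unacknowledged=sorted(hub_sent - emma_received),
        unexpected=sorted(emma_received - hub_sent),
    )


if __name__ == "__main__":
    result = reconcile({"MSG-001", "MSG-002", "MSG-003"}, {"MSG-002", "MSG-003", "MSG-004"})
    print(result.unacknowledged)  # ['MSG-001']
    print(result.unexpected)      # ['MSG-004']
```

In this sketch, IDs in the first list would be candidates for resending and IDs in the second would be candidates for investigation.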


Current Status

• EMMAs are working well
• Central messaging is working well
• Working manual processes are in place to deal with a significant backlog of messages at the central messaging hub, should one occur
• A clear scope of specific TIBCO software changes has been identified for project delivery
• Market Participants (MPs)
  – Monitoring their EMMA databases
  – Monitoring their EMMA servers for performance
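The EMMA server and database monitoring that MPs now carry out could be automated along the lines of the minimal Python sketch below. The metric names and threshold values are hypothetical; the report identifies CPU, memory and database table/storage limits as the resources that were exhausted, but gives no specific limits.

```python
from __future__ import annotations

# Hypothetical alert thresholds (percent of capacity); real values would depend
# on each MP's EMMA server sizing and database configuration.
THRESHOLDS: dict[str, float] = {
    "cpu_percent": 85.0,
    "memory_percent": 90.0,
    "db_storage_percent": 80.0,
}


def check_emma_health(
    metrics: dict[str, float], thresholds: dict[str, float] = THRESHOLDS
) -> list[str]:
    """Return one warning per metric that is missing or at/above its threshold."""
    warnings: list[str] = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            warnings.append(f"{name}: no reading")
        elif value >= limit:
            warnings.append(f"{name}: {value:.1f} >= {limit:.1f}")
    return warnings


if __name__ == "__main__":
    sample = {"cpu_percent": 97.0, "memory_percent": 62.5, "db_storage_percent": 81.0}
    for warning in check_emma_health(sample):
        print(warning)
```

A missing reading is reported rather than ignored, on the assumption that a silent monitor is itself a warning sign.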


Risk Management

• There are various events that could potentially cause a message backlog at the central messaging hub, for example:
  – EMMA performance issues
  – MP loss of data communications services (broadband)
  – MP hardware failure
  – Northgate internet services outage
  – Others
• Mitigation
  – Improvements already made to EMMAs and the central hub
  – MP management of EMMA
  – Increased Northgate awareness and experience of central hub management
  – Planned contingency solution
  – Improved communications overall
  – Remedial project delivery
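Whatever the cause, the size of a backlog and the time needed to clear it follow from simple rates. The Python sketch below assumes, hypothetically, a constant rate of messages queued for an unavailable EMMA and a constant hub delivery capacity once it recovers; the function names and example figures are illustrative and are not taken from the incident.

```python
import math


def backlog_after_outage(inflow_per_hour: float, outage_hours: float) -> float:
    """Messages queued at the hub while the receiving EMMA is unavailable."""
    return inflow_per_hour * outage_hours


def hours_to_drain(backlog: float, capacity_per_hour: float, inflow_per_hour: float) -> float:
    """Hours to clear a backlog once delivery resumes.

    New messages keep arriving, so only capacity above the ongoing inflow
    reduces the backlog; with no spare capacity it never clears.
    """
    spare = capacity_per_hour - inflow_per_hour
    if spare <= 0:
        return math.inf
    return backlog / spare


if __name__ == "__main__":
    queued = backlog_after_outage(inflow_per_hour=5_000, outage_hours=6)
    print(queued)  # 30000
    print(hours_to_drain(queued, capacity_per_hour=20_000, inflow_per_hour=5_000))  # 2.0
```

This is consistent with the report's point that impacts depended on volumes and timings: a longer outage grows the backlog linearly, while a recipient that can only just keep up with new traffic drains it slowly or not at all.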


Incident Observations

• Performance Testing
  – Significant repeated performance testing was carried out with message volumes of 350,000 plus in ~12 hour time periods
    • The current maximum working-day market messaging volume is in the range of ~135,000 (end-of-month figure)
  – Scenario testing coverage
    • The loss of an EMMA that processes large volumes of market messages for a prolonged period, resulting in a significant backlog of market messages at the central messaging hub, was not included in the performance testing scope
• EMMA Application and Database Management
  – Current MP EMMA documentation needs to be updated with relevant material on EMMA housekeeping activities
  – Documentation needs to be issued for comment, and the final version published to all
  – Archiving tools required for EMMA housekeeping
  – MP role and responsibilities in EMMA housekeeping
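The volumes quoted above can be compared directly. The short Python calculation below uses only the report's approximate figures (350,000 messages in about 12 hours of testing; about 135,000 messages on a peak end-of-month working day). It suggests that steady daily volume sat well within the tested range, and that the gap identified was the untested outage-and-backlog scenario rather than volume as such.

```python
# Approximate figures from the Incident Observations slide.
TESTED_VOLUME = 350_000    # messages ("350,000 plus")
TEST_WINDOW_HOURS = 12     # "~12 hour time periods"
PEAK_DAY_VOLUME = 135_000  # "~135,000" end-of-month working-day maximum

tested_rate_per_hour = TESTED_VOLUME / TEST_WINDOW_HOURS
volume_ratio = TESTED_VOLUME / PEAK_DAY_VOLUME

print(f"tested rate: ~{tested_rate_per_hour:,.0f} messages/hour")  # ~29,167
print(f"tested volume / peak day volume: ~{volume_ratio:.1f}x")    # ~2.6x
```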


Incident Observations

• Central Messaging Hub and EMMA
  – TIBCO system review carried out by Wipro as System Implementer
  – Northgate carried out an operational review
  – Application software changes now identified and scoped to deliver improvements in reliability, performance and database integrity
  – Performance testing will be required on all software changes
  – Project required for the delivery of these changes


Next Steps

• Remedial Project Scope and Delivery
  – Delivery of archiving tools for the solution overall
  – Delivery of reconciliation of physical XML market messages
  – Contingency solution for possible future events
  – Update of market participants' EMMA documentation on housekeeping
  – Communications to market participants on EMMA housekeeping
  – Review of the EMMA specification overall for the market
  – Delivery of a messaging enquiry tool (MET) for the central messaging hub
  – Review of TIBCO software modules with regard to software versions, upgrade paths and end of maintenance support
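Archiving tools for EMMA housekeeping would at minimum need a rule for which stored messages can be moved out of the live database. The minimal Python sketch below assumes, hypothetically, an age-based retention rule and a processed/unprocessed flag on each stored message; the `StoredMessage` layout, retention period and `split_for_archive` function are illustrative and are not taken from the EMMA specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class StoredMessage:
    message_id: str
    received: date
    processed: bool


def split_for_archive(
    messages: list[StoredMessage], today: date, retention_days: int
) -> tuple[list[StoredMessage], list[StoredMessage]]:
    """Split messages into (archive, keep) using an age-based retention rule.

    Only processed messages older than the retention window are archived;
    unprocessed messages always stay in the live store, whatever their age.
    """
    cutoff = today - timedelta(days=retention_days)
    archive: list[StoredMessage] = []
    keep: list[StoredMessage] = []
    for message in messages:
        if message.processed and message.received < cutoff:
            archive.append(message)
        else:
            keep.append(message)
    return archive, keep


if __name__ == "__main__":
    store = [
        StoredMessage("MSG-001", date(2013, 1, 1), processed=True),
        StoredMessage("MSG-002", date(2013, 3, 1), processed=True),
        StoredMessage("MSG-003", date(2013, 1, 1), processed=False),
    ]
    to_archive, to_keep = split_for_archive(store, today=date(2013, 4, 1), retention_days=60)
    print([m.message_id for m in to_archive])  # ['MSG-001']
    print([m.message_id for m in to_keep])     # ['MSG-002', 'MSG-003']
```

Keeping unprocessed messages out of the archive regardless of age is a deliberate choice in this sketch, so that housekeeping cannot silently remove messages that still need action.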


Integrated Plan (development in progress)

• Market schema release 2013
• IPT for new market entrants
• Remedial project
• TIBCO product upgrade
• Market Participants' IT plans
