Information Technology and Facilities Report
Jerry Dreyer, Vice President & Chief Information Officer
Board of Directors Meeting, October 18, 2011




Highlights

Service Availability:
• Market Operations IT systems met all SLA targets
• Market Data Transparency IT systems met all SLA targets
• Retail Market IT systems missed one SLA target (Retail Processing Business Hours)
• Grid Operations IT systems met all SLA targets

Retail Market IT system outage with impact to SLA:
• Data Center migration to the new T3 site resulted in an incorrect Internet Protocol (IP) address configuration (9/26)
  • Improperly routed traffic caused messages to queue and slow processing which resulted in 60 minutes of delayed transactions and slow MarkeTrak performance
  • Correct IP configuration change was applied to resolve the issue

Retail Market unplanned outages:
• Two parallel processes encountered contention that affected Electronic Data Interchange (EDI) (9/6)
  • Retail transaction processing was unavailable for 28 minutes (outside of business hours)
  • Short-term resolution: Restarted one of the processes in contention
  • Long-term: Configure processes to run on different systems and eliminate contention. The fix will be implemented with the move to new data center by 10/30
• Hardware failure caused outage of Enterprise Data Warehouse (EDW) (9/26)
  • Caused Get Reports and other components of TML and MIS to be unavailable for 75 minutes
  • Short-term: Manually took component out of service and used redundant component (NIC)
  • Long-term: Replaced hardware to resolve issue

ERCOT Public, October 18, 2011


Highlights Cont’d

Core unplanned outages:
• Grid Operations: There was an automatic LFC local failover caused by an EMS software defect (9/26)
  • The EMS internal communication application locked up and caused a local failover
  • Vendor has a lead on the root cause but is still investigating

Planned Outages:
• Weekend Retail and Market Operations maintenance activity (9/11 and 9/25)
  • Outages lasted 1,471 minutes, which is within the 1,800 minutes allowed via SLA

October Planned Outage Updates:
• For the first time, the EMS and MMS production ran in the new Bastrop Data Center. The team successfully completed the failover with only one missed SCED interval (target is three or fewer) (10/4)
• For the first time, the non-core systems (MIS, CDR, CMM,…) ran in the new Bastrop Data Center. The failover was completed with no issues to report (10/5)
• An extended retail outage was requested through RMS and COPS to execute retail system moves to the new T3 Data Center in Taylor (10/22, 10/23, 10/29, 10/30)
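
The planned-outage budget and the SCED failover target quoted in the bullets are simple threshold checks. A minimal sketch in Python follows; the function and constant names are illustrative assumptions and do not come from any ERCOT tooling. Only the figures (1,471 minutes used, 1,800 minutes allowed, one missed interval against a target of three) are taken from the slide.

```python
# Figures from the slide; all names here are hypothetical, chosen for illustration.
ALLOWED_MINUTES = 1_800  # planned-outage minutes allowed via SLA
USED_MINUTES = 1_471     # planned-outage minutes reported for 9/11 and 9/25


def outage_budget(used: int, allowed: int) -> dict:
    """Return remaining minutes, percent of budget used, and an in-budget flag."""
    return {
        "remaining": allowed - used,
        "percent_used": round(100 * used / allowed, 1),
        "within_sla": used <= allowed,
    }


def sced_failover_ok(missed_intervals: int, target_max: int = 3) -> bool:
    """True when the missed SCED intervals are at or below the target."""
    return missed_intervals <= target_max


if __name__ == "__main__":
    print(outage_budget(USED_MINUTES, ALLOWED_MINUTES))
    print(sced_failover_ok(1))
```

Under these figures the weekend maintenance used roughly 82% of the allowed planned-outage minutes, leaving 329 minutes of headroom.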


2011 Net Service Availability (Retail and Market Ops)

[Bar chart: 2011 Net Service Availability, Year to Date, Retail & Market Operations. Series: Transaction Processing (BH), Transaction Processing (Off BH), TML, MarkeTrak, Retail API, TML Report Explorer, CRR, MIS. Bar labels in extraction order (the series each label belongs to is not preserved): 99.95%, 99.88%, 99.92%, 99.85%, 99.80%, 99.96%, 99.91%, 99.87%.]

SLA targets:
• Transaction Processing: Business Hours (BH) 99.9%; Off Business Hours (Off BH) 99%
• MarkeTrak: 98%
• TML Report Explorer: 99%
• Retail API: 99%
• Texas Market Link (TML): 99%
• Congestion Revenue Rights (CRR): 98%
• Market Information System (MIS): 99%


2011 Net Service Availability (Grid Ops)

[Bar chart: 2011 Net Service Availability, Year to Date, Grid Operations. Series: MMS Aggregate, MMS SCED, EMS Aggregate, EMS LFC, Outage Scheduler (OS), NMMS. Bar labels in extraction order (the series each label belongs to is not preserved): 99.85%, 99.98%, 99.99%, 99.99%, 99.98%, 99.85%.]

SLA targets:
• MMS Aggregate: 99%
• MMS SCED: 99.93%
• EMS Aggregate: 99%
• EMS LFC: 99.93%
• Outage Scheduler: 99%
• NMMS: 97%


2011 Data Center Availability

[Bar chart: 2011 Data Center Availability, Year to Date. Sites: Taylor 1, Taylor 2, Austin, Bastrop. All four bars are labeled 100%. Y-axis runs from 99.92% to 100.0%.]

Target – 99.982%

Outage prevented due to Tier 3 redundancy
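
To put the 99.982% target in concrete terms, an availability percentage can be converted into the downtime it permits over a period. The sketch below is illustrative only: the helper name is hypothetical, and it assumes a 24x7 measurement window and a 365-day year, neither of which the slide states.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def allowed_downtime_minutes(target_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Downtime (in minutes) that still meets the availability target."""
    return (1 - target_percent / 100) * period_minutes


if __name__ == "__main__":
    # Data center availability target taken from the slide.
    per_year = allowed_downtime_minutes(99.982)
    print(f"{per_year:.1f} minutes per year")
```

Under these assumptions the target allows about 94.6 minutes (roughly 1.6 hours) of downtime per year.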


September 2011 Net Service Availability

[Bar chart: September 2011 Net Service Availability, Retail & Market Operations. Series: Transaction Processing (BH), Transaction Processing (Off BH), TML, MarkeTrak, Retail API, TML Report Explorer, CRR, MIS. Bar labels in extraction order (the series each label belongs to is not preserved): 99.90%, 100%, 100%, 100%, 99.82%, 100%, 99.61%, 99.91%.]

SLA targets:
• Transaction Processing: Business Hours (BH) 99.9%; Off Business Hours (Off BH) 99%
• MarkeTrak: 98%
• TML Report Explorer: 99%
• Retail API: 99%
• Texas Market Link (TML): 99%
• Congestion Revenue Rights (CRR): 98%
• Market Information System (MIS): 99%


September 2011 Net Service Availability

[Bar chart: September 2011 Net Service Availability, Grid Operations. Series: MMS Aggregate, MMS SCED, EMS Aggregate, EMS LFC, Outage Scheduler (OS), NMMS. Bar labels in extraction order (the series each label belongs to is not preserved): 100%, 99.99%, 100%, 99.99%, 99.99%, 100%.]

SLA targets:
• MMS Aggregate: 99%
• MMS SCED: 99.93%
• EMS Aggregate: 99%
• EMS LFC: 99.93%
• Outage Scheduler: 99%
• NMMS: 97%


September 2011 Data Center Power Availability

[Bar chart: September 2011 Data Center Availability. Sites: Taylor 1, Taylor 2, Austin, Bastrop. All four bars are labeled 100%. Y-axis runs from 99.92% to 100.0%.]

Target – 99.982%


YTD Availability – Retail Market IT Services

Retail Transaction Processing (Off Business Hours)

[Chart: monthly availability, Jan through Sep. Y-axis runs from 92.00% to 100.0%. No data labels survive in the text.]


YTD Availability – Market Operations

Retail API

[Chart: monthly availability, Jan through Sep. Y-axis runs from 92.00% to 100.0%.]

Target – 99%
YTD – 99.85%
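
A year-to-date figure such as this one is read against the SLA target for its service. The sketch below shows one way to express that comparison, assuming a simple lookup table; the dictionary and function names are hypothetical, and only the target percentages and the Retail API YTD value are taken from the slides.

```python
# SLA targets (percent) as listed on the Retail and Market Operations slides.
# The table layout and names are illustrative assumptions.
SLA_TARGETS = {
    "Transaction Processing (BH)": 99.9,
    "Transaction Processing (Off BH)": 99.0,
    "MarkeTrak": 98.0,
    "TML Report Explorer": 99.0,
    "Retail API": 99.0,
    "TML": 99.0,
    "CRR": 98.0,
    "MIS": 99.0,
}


def meets_sla(service: str, measured_percent: float) -> bool:
    """True when the measured availability is at or above the service's target."""
    return measured_percent >= SLA_TARGETS[service]


def sla_margin(service: str, measured_percent: float) -> float:
    """Percentage points above (positive) or below (negative) the target."""
    return round(measured_percent - SLA_TARGETS[service], 2)


if __name__ == "__main__":
    # Retail API YTD value taken from the slide.
    print(meets_sla("Retail API", 99.85), sla_margin("Retail API", 99.85))
```

With the slide's figures, Retail API sits 0.85 percentage points above its 99% target for the year to date.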


YTD Availability – Grid Operations IT Services


Load Frequency Control Availability


Retail Transaction Processing Availability


Retail Transaction Processing Availability


NMMS Availability

September 2011 Network Model Management System (NMMS) Availability Summary
• 9/22 (3 Minutes): Manual NMMS restart due to database lock

September 2011 NMMS Availability – 99.99%
Target – 97%

[Chart: NMMS Availability (axis 92.00% to 100.0%) with NMMS unscheduled restarts (Restart Count, axis 0 to 12).]
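
The reported NMMS figure is consistent with a single 3-minute outage over the month. The sketch below reproduces the arithmetic under stated assumptions: a 24x7 measurement window over the 30 days of September, with no other downtime counted. The slide does not say how ERCOT defines the window, and the function name is hypothetical.

```python
SEPTEMBER_MINUTES = 30 * 24 * 60  # 43,200 minutes; assumes a 24x7 window


def availability_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Availability as a percentage of the measurement period."""
    return 100 * (1 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    # 3 minutes is the 9/22 manual restart reported on the slide.
    value = availability_percent(3, SEPTEMBER_MINUTES)
    print(f"{value:.2f}%")
```

Rounded to two decimals this gives 99.99%, which agrees with the slide's September figure under the assumptions above.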


TML Report Explorer Availability


Release Management Metrics (Releases)

Awaiting slide


Release Management Metrics (Changes)

Awaiting slide


ERCOT Public Website Metrics

Awaiting slide


ERCOT Public Website Metrics

Awaiting slide
