Panorama: Data Center Events, Trends & Outages
PRESENTATION
Matt Stansberry, VP North America, Uptime Institute
7x24 Carolinas & Atlanta Chapters 2019 SUMMER MEETING
Greenpeace Report Virginia
Renewable Energy and Data Centers in Lock Step?
Interest in Renewable - Industry Wide
Has your organization adopted or is it considering any of the following power generation and storage approaches?

Approach                                         Installed or contracted   Considering   No
Near site (shared campus) generation/microgrid   15%                       19%           66%
On-site renewable (solar, wind, biomass)         13%                       32%           55%
Purchasing renewable power                       20%                       29%           52%

Source: Uptime Institute Global Survey of IT and Data Center Managers 2019, n=418
Bloom Energy, Santa Clara City at Odds on Fuel Cells
• Santa Clara bans non-renewable self-generation
• Fuel cells deemed non-renewable
• Equinix among operators barred from new installations
• Bloom Energy to sue the City, claims motive is to protect funding from utility.
China, Huawei Dispute Escalates
• Huawei cellphone sales collapse. Overall revenue down by $20bn for 2019.
• US companies to be banned from supplying Huawei (and others).
• 1200 US, European companies fear huge impact from Chinese bans.
• Big Chinese data center builds could pose problems for US, European suppliers.
• 5G could split into Chinese/Western versions.
Cyberwars and Paranoia
Internet Giants and the Regulators
The battle builds
• FTC, DoJ clarify roles, prepare for action
• Elizabeth Warren vows to break up tech giants
• EU, China step up vigilance
The Break Up Discussion Begins….
Break up?
• Platform and apps (Search) separated?
• Acquisitions reversed?
• Utility style regulation?
• What does it mean for data centers, networks, colos?
Public Clouds Go Hybrid…and Edge
• Public cloud extends to on-prem
• Simplified centralized management across different venues
• Easy access, easy to move
› AWS Outposts
› Microsoft Azure Stack
› Google Cloud Services Platform
Positions public cloud services to readily run in distributed environments
Enterprise IT is Growing, not Shrinking
Please describe your data center capacity trends for owned and dedicated colocation sites (overall MW, not number of sites)

                                 Growing   Flat   Shrinking
Enterprise-owned data centers    44%       33%    23%
Leased colocation data centers   45%       37%    17%

Source: Uptime Institute Intelligence Capacity Survey Q4 2018, n=272
DCIM and DMaaS
• Uptime Institute research: DCIM is (finally) mainstream
• Integration up the stack is the focus
• DMaaS (data center management as a service) beginning to gain traction
• DMaaS helps democratize artificial intelligence in data centers
• DCIM plus DMaaS – not either/or
Source: Uptime Intelligence report, June 2019
OUTAGES UPDATE
OUTAGES REPORT
Uptime Institute Outage and Incident Tracking
• The Abnormal Incident Reports (AIRs) database.
• Annual Uptime Institute Survey.
• Uptime Institute Research public outage monitoring database.
o Monitors outages in media and other sources
o Running since January 2016
o Reports submitted from Uptime staff around the world
o Covers all causes except security breaches
Uptime Institute Survey 2018 Outages Results
31% of respondents had experienced an IT downtime incident or severe service degradation in the past year
48% had an outage in their own site or a service provider’s in the past three years
80% report that their most recent outage was preventable
Public Outages: 2016 – 2018
Number of Outages
2016: 27
2017: 57
2018: 78
163 total
Source: Various online news sources, Uptime Intelligence, January 2019
Outage Severity Rating
Rating (Service Outage): Impact of Outage
Level 1 (Negligible): Recordable outage but little or no obvious impact on services, no service disruptions
Level 2 (Minimal): Services disrupted. Minimal effect on users/customers/reputation
Level 3 (Significant): Customer/user service disruptions, mostly of limited scope, duration or effect. Minimal or no financial effect. Some reputational or compliance impact(s)
Level 4 (Serious): Disruption of service and/or operation. Ramifications include some financial losses, compliance breaches, reputation damages, possibly safety concerns. Customer losses possible
Level 5 (Severe): Major and damaging disruption of services and/or operations with ramifications including large financial losses, possible safety issues, compliance breaches, customer losses, reputational damage
The Outages Severity Rating system was developed by Uptime Institute © 2019, All Rights Reserved
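One way an operations team might encode the five-level scale for incident tracking is sketched below. This is only an illustration: the class name `OutageSeverity`, the helper `is_major`, and the sample incidents are hypothetical and do not come from Uptime Institute; only the level numbers and labels are taken from the rating table.

```python
from enum import IntEnum


class OutageSeverity(IntEnum):
    """Five-level outage severity scale, using the labels from the rating table."""
    NEGLIGIBLE = 1
    MINIMAL = 2
    SIGNIFICANT = 3
    SERIOUS = 4
    SEVERE = 5


def is_major(level: OutageSeverity) -> bool:
    """Return True for Level 4 (Serious) and Level 5 (Severe) outages."""
    return level >= OutageSeverity.SERIOUS


# Hypothetical incident log, invented purely for illustration.
incidents = [
    ("failover test, no user impact", OutageSeverity.NEGLIGIBLE),
    ("checkout unavailable for two hours", OutageSeverity.SERIOUS),
    ("regional service down for a day", OutageSeverity.SEVERE),
]

# Keep only the incidents that would count as Level 4/5.
major = [name for name, level in incidents if is_major(level)]
print(major)
```

Using an ordered enum keeps the "Level 4 and above" test a simple comparison, which matches how the deck groups Category 4 and 5 outages together.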
Serious Category 4/5 Outages
Serious outages                                        2016   2017   2018
Serious business/service outage (Category 4)           6      27     13
Severe business/mission critical outage (Category 5)   3      5      3
Total                                                  9      32     16

2018
Sector                      Primary cause
Government                  Network
Finance/banking             IT system
Finance/banking             IT system

2017
Sector                      Primary cause
Government                  IT system
Airline                     Network
Airline                     IT system
Airline                     Power
Hosting/managed services    Cooling/mechanical

2016
Sector                      Primary cause
Airline                     Network
Airline                     Power outage
Hosting (Government)        Power outage
Causes of Major Outages Over Three Years
• Power was the most common cause of Level 4/5 outages 2016 - 2018
• IT and Network close behind…
% of total Level 4 & 5 outages over 3 years (total number of Level 4 and 5 outages = 57)
Power: 30%
IT system: 28%
Network: 26%
Security: 7%
Fire suppression: 4%
Fire: 2%
Cooling/mechanical: 2%
Other/external: 2%
Source: Various online news sources, Uptime Institute Intelligence, January 2019
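As a rough cross-check, the sketch below adds up the yearly figures from the "Serious Category 4/5 Outages" table and converts the published percentages into approximate outage counts. The per-cause counts are estimates derived here by rounding, not figures reported by Uptime Institute, and the variable names are illustrative.

```python
# Level 4/5 outage counts per year, from the "Serious Category 4/5 Outages" table.
category_4 = {2016: 6, 2017: 27, 2018: 13}
category_5 = {2016: 3, 2017: 5, 2018: 3}

total = sum(category_4.values()) + sum(category_5.values())
print(total)  # 57, matching the total stated in the chart caption

# Published share (%) of Level 4/5 outages by primary cause, 2016-2018.
cause_share = {
    "Power": 30,
    "IT system": 28,
    "Network": 26,
    "Security": 7,
    "Fire suppression": 4,
    "Fire": 2,
    "Cooling/mechanical": 2,
    "Other/external": 2,
}

# Approximate counts per cause: an estimate obtained by rounding, not source data.
approx_counts = {cause: round(total * pct / 100) for cause, pct in cause_share.items()}
for cause, count in approx_counts.items():
    print(f"{cause}: ~{count}")
```

On these numbers, power, IT system, and network failures each account for roughly 15 to 17 of the 57 outages, which is why the slide describes IT and network as "close behind" power.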
Length of User Disruption
2018 (Resultant disruption)
0-1 hr: 10%
1-4 hrs: 36%
4-12 hrs: 32%
12-24 hrs: 8%
24-48 hrs: 5%
More than 48 hrs: 9%
• More than 50% of service outages were still causing disruption after 4 hours.
• 22% caused more than half a day of disruption.
• Many caused days or weeks of disruption…
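The two percentages quoted in the bullets follow from adding up the duration buckets. A minimal sketch of that arithmetic is shown here; the function name `share_lasting_beyond` is hypothetical, and the shares are the 2018 figures from the chart.

```python
# Share of 2018 outages by length of user disruption (%), ordered shortest to longest.
duration_share = [
    ("0-1 hr", 10),
    ("1-4 hrs", 36),
    ("4-12 hrs", 32),
    ("12-24 hrs", 8),
    ("24-48 hrs", 5),
    ("More than 48 hrs", 9),
]


def share_lasting_beyond(first_bucket: str) -> int:
    """Sum the shares of `first_bucket` and every longer bucket."""
    labels = [label for label, _ in duration_share]
    start = labels.index(first_bucket)
    return sum(pct for _, pct in duration_share[start:])


print(share_lasting_beyond("4-12 hrs"))   # 54: still disrupted after 4 hours
print(share_lasting_beyond("12-24 hrs"))  # 22: more than half a day of disruption
```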
Costs of Outages
• Costs not recorded in public outage data.
• Uptime Institute Survey:
o 39 outages reported costing more than $1m
o One in three outages cost more than $250,000
• Lawsuits and financial adjustments increasingly common.
Source: Uptime Institute Annual Survey 2018
The Cause of Reported Outages is Changing….
Top Three Primary Causes (% of each year's total)
             2017   2018
IT system    32%    35%
Network      19%    32%
Power        28%    11%
Source: Various online news sources, Uptime Institute Intelligence, January 2019
Causes of Outages Cited: IT
IT Systems
• A poorly managed upgrade with insufficient testing at the software level.
• The failure and subsequent data corruption of large disk drives/storage area networks. This is likely caused by hardware failure, exacerbated by configuration/programming errors.
• Failure of synchronization or programming errors across load balancing or traffic management systems.
• Incorrectly programmed failover/synchronization or disaster recovery systems.
• Loss of power to non-backed-up single components (servers, large disk drives).
Causes of Outages Cited: Network
Network
• Fiber cuts outside the data center, with insufficient routing alternatives (common).
• Intermittent failure of major switches, with secondary routers not deployed.
• Major switch failure without backup.
• Incorrect configuration of traffic during maintenance.
• Incorrectly configured routers/software-defined networks.
• Loss of power to non-backed-up single components (switches, routers).
Causes of Outages Cited: Power
Power outages
• Lightning strikes, leading to surges and lost power. Backup software/configuration failed.
• Intermittent failures with transfer switches, leading to failure to start generators, or transfer to second data center.
• UPS failures and failure to transfer to secondary system.
• Operator errors, turning off/misconfiguring power.
• Utility power loss and subsequent failure of generator or UPS.
• Damage to critical IT equipment caused by power surges.
• IT equipment not equipped with dual power supplies to switch to secondary feed.
Outages by Industry Sector: 2016-2018
Source: Various online news sources, Uptime Intelligence, January 2019
• Cloud/internet giants not immune to failure….
• Professional service providers account for most reported outages.
• Finance and airlines disproportionately disrupted by IT failures…
Outages Continue to Trouble the Sector
• Half (50%) had an outage/severe service degradation in the past three years.
• Results closely match those of the 2018 survey.
34% (2018 = 31%): Has your organization experienced an IT service outage or severe service degradation in the last year, either in your own site or a third-party provider?
Source: Uptime Institute Global Data Center Survey of Data Center Operators 2019, n=479
Blackout Hits South America
• Argentina, Uruguay, Paraguay, Chile all suffer power outage.
• 48 million affected for up to 14 hours.
• Trains, traffic lights, elevators, water distribution and elections affected.
• Data centers report few failures to date.
• Caused by transmission failure at the Yacyreta hydroelectric dam.
Lessons from Global Switch and Arup Case?
• Global Switch (Singapore) $17.5m claim against Arup dismissed.
• GS claims negligence and design faults caused outages, re-design and additional power redundancy investment.
• Parties disagree on who understood what, and when, about power availability.
• Judge says: “it remained vague what tier of data center the extension would be.”
Target’s Father’s Day Glitches
Target suffered two separate outages in one weekend.
• Outage 1 – Saturday 15th:
o POS machines unable to process transactions for 2 hours.
o Failure due to maintenance issue.
• Outage 2 – Sunday 16th:
o Checkout machines out for 2 hours at 1800 stores.
o Target unable to process many transactions.
o Data center issue at NCR.
More Trouble at Airlines….
• AeroData data center fault causes weight and balance system failure.
• 100s of flights from multiple airlines affected.
_________________________
• [Unnamed] airline: Major fault caused by UPS disconnection and reconnection during training exercise.
Recap: March 13 - A Bad Day for Internet…
Apple: multiple systems down 4 hours.
Facebook, Instagram, Messenger, WhatsApp down for up to 14 hrs.
Google Cloud Console down for 4 hrs due to software bug/upgrade.
Google Cloud Outage (June)
• Millions lost access to key Google services
• Impact lasted up to 4 hours.
• Server configuration error led to severe network congestion.
• Configuration error replicated across regions.
• Many Google Cloud clients affected.
Also Hit by Network, IT Issues…
June outage caused by “network issue.”
May hosting problem caused by “container inconsistencies.”
June outage caused by “technical issue.”
May – Users blocked for 15 hours or more to fix database issue.
Networks Rival Power as a Cause of Downtime
Two For the Road
• Take a service/application view of outages & how to prevent them
• Complexity & interdependencies will continue to be technical & business issues for operators
Visit www.uptimeinstitute.com for more information
Uptime Institute is a division of The 451 Group, a leading technology industry analyst and data company. Uptime Institute has office locations in the U.S., Mexico, Costa Rica, Brazil, U.K., Spain, U.A.E., Russia, Taiwan, Singapore, and Malaysia.
© 2017 Uptime Institute, LLC. All rights reserved.