
Myths and realities about designing high availability data centers

Page 1: Myths and realities about designing high availability data centers

Myths and realities about designing high availability data centers

Tier III and Tier IV: What do you need to know?

Steven Shapiro, P.E., ATD

Mission Critical Practice Lead

Page 2: Myths and realities about designing high availability data centers

Data Center World – Certified Vendor Neutral

Each presenter is required to certify that their presentation will be vendor-neutral.

As an attendee you have a right to enforce this policy of having no sales pitch within a session by alerting the speaker if you feel the session is not being presented in a vendor neutral fashion. If the issue continues to be a problem, please alert Data Center World staff after the session is complete.

Page 3: Myths and realities about designing high availability data centers

Agenda

• Tier definitions

• Nines

• Tier III/IV issues – one line diagram

• Factors affecting performance

• Reliability and availability

• Causes of critical failures

• Key takeaways

• Questions

Page 4: Myths and realities about designing high availability data centers

Tier Definitions

Page 5: Myths and realities about designing high availability data centers

Things that are not tier-dependent

• Site location

• Facility construction

• Quality of equipment

• Facility commissioning

• Age of site

• Operations and maintenance program

• Personnel training

• Level of personnel coverage

Tier Definitions

Page 6: Myths and realities about designing high availability data centers

• Align business mission and facility performance expectation

• Benchmark against the industry

• Assist in developing business case for capital expenditures

Tier Requirements

User must define tier requirements for a facility

Page 7: Myths and realities about designing high availability data centers

Five 9’s Refers To Availability

• Availability (A) is the long-term average percentage of time that a component or system is in service and satisfactorily performing its intended function.

• Five nines (99.999%) availability means:

  – About 5.3 minutes of downtime each year

  – About 1.75 hours of downtime every 20 years

• Availability does not specify how often an outage occurs

“Nines”
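The downtime behind each count of nines follows from simple arithmetic. A minimal Python sketch, assuming a 365-day year; the function names are illustrative and do not come from the presentation:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years are ignored


def availability_for_nines(nines: int) -> float:
    """Availability in percent for a count of nines, e.g. 5 -> 99.999."""
    return 100.0 * (1.0 - 10.0 ** -nines)


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Long-term average downtime per year, in minutes."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines in range(2, 7):
        a = availability_for_nines(nines)
        per_year = downtime_minutes_per_year(a)
        per_20_years_h = per_year * 20 / 60
        print(f"{nines} nines ({a:.4f}%): "
              f"{per_year:9.2f} min/yr, {per_20_years_h:7.2f} h per 20 yr")
```

The result is an average only. It says nothing about whether the downtime arrives as one long outage or many short ones.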

Page 8: Myths and realities about designing high availability data centers

Tier Requirements

Requirement                | Tier I | Tier II | Tier III            | Tier IV
Number of Delivery Paths   | 1      | 1       | 1 Active, 1 Passive | 2 Active
Redundancy                 | N      | N+1     | N+1                 | 2N Minimum
Compartmentalization       | No     | No      | No                  | Yes
Concurrent Maintainability | No     | No      | Yes                 | Yes
Fault Tolerance            | No     | No      | No                  | Yes
Availability (%)           | 99.671 | 99.749  | 99.982              | 99.995
Downtime (hr/yr)           | 28.8   | 22      | 1.6                 | 0.4
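As a consistency check on the tier table, the downtime row can be recomputed from the availability row. A short Python sketch, assuming an 8,760-hour year and taking the Tier IV availability as 99.995%, the value that agrees with the 0.4 hr/yr downtime entry:

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years are ignored

# Availability in percent per tier. Tier IV is assumed to be 99.995.
TIER_AVAILABILITY = {
    "Tier I": 99.671,
    "Tier II": 99.749,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def downtime_hours_per_year(availability_percent: float) -> float:
    """Average yearly downtime in hours implied by an availability figure."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    for tier, a in TIER_AVAILABILITY.items():
        print(f"{tier}: {downtime_hours_per_year(a):.1f} hr/yr")
```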

Page 9: Myths and realities about designing high availability data centers

• Tier I: $10,000 US/kW of useable UPS Power Output

• Tier II: $11,000 US/kW of useable UPS Power Output

• Tier III: $20,000 US/kW of useable UPS Power Output

• Tier IV: $22,000 US/kW of useable UPS Power Output

• Plus $225 US/SF of computer room

Based on a 15,000 SF white space, +/- 30%

Data Center Costs

From The Uptime Institute
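These figures combine into a rough budget estimate. The Python sketch below is one way to do the arithmetic. The 1,000 kW load in the example is hypothetical, and applying the +/- 30% band to the total is an assumption about how the range is meant to be used:

```python
# US$ per kW of usable UPS power output, by tier (figures from the slide).
COST_PER_KW = {"I": 10_000, "II": 11_000, "III": 20_000, "IV": 22_000}
COST_PER_SF = 225   # US$ per square foot of computer room
TOLERANCE = 0.30    # assumed to apply to the total estimate


def estimate_cost(tier: str, ups_kw: float, computer_room_sf: float):
    """Return (low, nominal, high) construction cost estimates in US$."""
    nominal = COST_PER_KW[tier] * ups_kw + COST_PER_SF * computer_room_sf
    return nominal * (1 - TOLERANCE), nominal, nominal * (1 + TOLERANCE)


if __name__ == "__main__":
    # Hypothetical facility: 1,000 kW of UPS output, 15,000 SF of white space.
    for tier in COST_PER_KW:
        low, nominal, high = estimate_cost(tier, 1_000, 15_000)
        print(f"Tier {tier}: ${nominal:,.0f} (${low:,.0f} to ${high:,.0f})")
```

Because the per-kW term dominates, the estimate is far more sensitive to tier and UPS capacity than to floor area.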

Page 10: Myths and realities about designing high availability data centers

[One-line diagram with callouts: 2N Utility; N+2 Gens; 2N Gen Distribution; 2N UPS; 2N Distribution; Mechanical UPS]

One Line Diagram

Page 11: Myths and realities about designing high availability data centers

2N Utility

Not a tier requirement

Page 12: Myths and realities about designing high availability data centers

Generator Count and Distribution

• 2N generators not a tier requirement

• Some sort of 2N distribution is a Tier III and IV requirement

Page 13: Myths and realities about designing high availability data centers

• UPS can be configured in many ways

• N = the number of modules installed just meets the load (Tier I and II)

• N+1 = the number of modules needed to meet the load plus one additional module (Tier III)

Multi-Module UPS System Configuration

Page 14: Myths and realities about designing high availability data centers

• UPS can be configured in many ways

• 2N systems = twice the number of systems required to meet the load (Tier IV)

• 2(N+1) systems = twice the number of N+1 systems required to meet the load (Tier IV)

Multi-Module UPS System Configuration
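To make the N, N+1, 2N and 2(N+1) labels concrete, the following Python sketch counts installed UPS modules for a given load. The load and module rating in the example are hypothetical, and the helper names are illustrative:

```python
import math


def modules_for_load(load_kw: float, module_kw: float) -> int:
    """N: the fewest modules whose combined rating meets the load."""
    return math.ceil(load_kw / module_kw)


def installed_modules(load_kw: float, module_kw: float) -> dict:
    """Total installed modules under each redundancy scheme."""
    n = modules_for_load(load_kw, module_kw)
    return {
        "N": n,                  # just meets the load
        "N+1": n + 1,            # one spare module
        "2N": 2 * n,             # two systems, each sized for the full load
        "2(N+1)": 2 * (n + 1),   # two systems, each with its own spare
    }


if __name__ == "__main__":
    # Hypothetical case: 1,200 kW critical load, 500 kW modules.
    for scheme, count in installed_modules(1_200, 500).items():
        print(f"{scheme:7s} {count} modules")
```

The module count, and with it the cost, roughly doubles between N+1 and 2(N+1).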

Page 15: Myths and realities about designing high availability data centers

UPS Systems With External Maintenance Bypass

Page 16: Myths and realities about designing high availability data centers

• Mechanical UPS is required to keep data center HVAC systems operational until the generator plant supports the load

• May run CRAC units, secondary or primary pumps, etc.

• Sized to match the data center cooling load and the battery time of the data center UPS

Mechanical UPS

Page 17: Myths and realities about designing high availability data centers

Certain things can be overdone.

How Much Redundancy Is Enough?

Page 18: Myths and realities about designing high availability data centers

The Cost of Reliability

[Chart: Cost ($) versus Reliability; reliability axis marked at 99.0, 99.9, 99.99, 99.999 and 99.9999]

Page 19: Myths and realities about designing high availability data centers

• Location

• Design

• Redundancy level

• Construction

• Quality of equipment

• Thoroughness of commissioning program

• Age

• Operations & maintenance program

• Personnel training

• Level of coverage

Factors Affecting Performance But Not Tier Level

Lurking vulnerabilities

Page 20: Myths and realities about designing high availability data centers

• Document Management

• Maintenance Programs (CMMS)

• Commissioning

• Vendor Management

• Change Management

• Standard and Emergency Operating Procedures

• Training

• Staffing

Factors Affecting Performance But Not Tier Level

Page 21: Myths and realities about designing high availability data centers

• Harmonics Analysis

• EMF Studies

• Short Circuit Studies

• Coordination Studies

• CFD Modeling

[Figure labels: Cold Aisle; Hot Aisle; IT Equipment; Computer Room Air Conditioning Units]

Factors Affecting Performance But Not Tier Level

Page 22: Myths and realities about designing high availability data centers

• Probability of failure/reliability

• Availability

• MTTF

• MTTR

• Susceptibility to natural disasters

• Fault tolerance

• Single points of failure

• Maintainability

• Operational readiness

• Maintenance program

Reliability Considerations
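Several of these terms are linked by standard formulas. The Python sketch below uses the usual steady-state relation A = MTTF / (MTTF + MTTR) and, assuming a constant failure rate, R(t) = exp(-t / MTTF). The two systems compared are hypothetical; they are chosen to have the same availability but very different outage frequency:

```python
import math

HOURS_PER_YEAR = 8760


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time to failure and to repair."""
    return mttf_hours / (mttf_hours + mttr_hours)


def outages_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Average number of failure/repair cycles in one year."""
    return HOURS_PER_YEAR / (mttf_hours + mttr_hours)


def reliability(mission_hours: float, mttf_hours: float) -> float:
    """Probability of no failure over the mission, constant failure rate."""
    return math.exp(-mission_hours / mttf_hours)


if __name__ == "__main__":
    # Hypothetical systems as (MTTF, MTTR) in hours; both are 99.99% available.
    systems = {"A": (9_999.0, 1.0), "B": (999.9, 0.1)}
    for name, (mttf, mttr) in systems.items():
        print(f"System {name}: availability {availability(mttf, mttr):.4%}, "
              f"{outages_per_year(mttf, mttr):.2f} outages/yr, "
              f"1-year reliability {reliability(HOURS_PER_YEAR, mttf):.4f}")
```

System B fails ten times as often as system A, yet both report the same availability, which is why availability alone is not a sufficient design target.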

Page 23: Myths and realities about designing high availability data centers

Single Utility Feeder, Parallel Redundant UPS and Generators, Single-Corded IT Rack

Page 24: Myths and realities about designing high availability data centers

2N UPS, N+1 Generators, ASTSs and Dual-Corded IT Rack

Page 25: Myths and realities about designing high availability data centers

Two Utility Feeders, 2(N+1) UPS, 2(N+1) Generators, ASTSs, Dual Corded IT Rack

Page 26: Myths and realities about designing high availability data centers

Distributed Redundant UPS, N+2 Generators, Two Utility Feeders, ASTSs and Dual Corded IT Rack

Page 27: Myths and realities about designing high availability data centers

Reliability Considerations

Page 28: Myths and realities about designing high availability data centers

• 2(N+1) / system + system with dual utility feeders is the most reliable topology

• There is no significant reliability improvement in using a 2(N+1) UPS configuration over 2N

• Distributed redundant configuration is less reliable than 2N

• Improvement if a second utility feeder is provided

• N+2 and/or 2N generator systems are marginally more reliable than N+1

Reliability Considerations
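One simple way to reason about generator redundancy is a k-out-of-n model: the plant works if at least N of the installed units run. The Python sketch below assumes identical, independent units and a hypothetical per-unit reliability. It ignores common-cause failures (fuel, controls, switchgear), so it is an illustration and not the analysis behind the conclusions above:

```python
from math import comb


def k_of_n_reliability(k: int, n: int, unit_reliability: float) -> float:
    """Probability that at least k of n independent, identical units work."""
    p = unit_reliability
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    needed = 4   # hypothetical N: generators required to carry the load
    p = 0.95     # hypothetical probability that one unit starts and runs
    for label, spares in (("N", 0), ("N+1", 1), ("N+2", 2)):
        r = k_of_n_reliability(needed, needed + spares, p)
        print(f"{label:4s} plant reliability: {r:.6f}")
```

The sketch isolates the generator plant. Whether the gain from an extra unit matters for the whole facility depends on everything else in the power path, which it does not model.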

Page 29: Myths and realities about designing high availability data centers

Emergency Diesel Generators

Study performed by Idaho National Engineering Laboratory at nuclear power plants, February 1996

[Chart categories: Fail to start; Fail after ½ hour; Fail after 8 hours; Fail after 24 hours]

Reliability Considerations

Page 30: Myths and realities about designing high availability data centers

• A hybrid configuration may be most effective

• STSs on the secondary side of the PDU transformer yield a 2-to-1 reliability improvement over 480 V STSs

• Dual cord has higher impact than the use of STSs

• Ultimate reliability: STS + dual cord

• Assess the condition of the mechanical plant in conjunction with the electrical system

• The facility reliability will be driven by the least reliable component (typically the electrical infrastructure)

Reliability Considerations
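The "least reliable component" point and the value of dual cording can be illustrated with the textbook series and parallel reliability formulas. A Python sketch under the assumption of independent failures; all reliability values are hypothetical:

```python
from math import prod


def series_reliability(component_reliabilities) -> float:
    """Every component must work, so the reliabilities multiply."""
    return prod(component_reliabilities)


def parallel_reliability(path_reliabilities) -> float:
    """At least one of several independent paths must work."""
    return 1 - prod(1 - r for r in path_reliabilities)


if __name__ == "__main__":
    electrical, mechanical = 0.99, 0.999  # hypothetical subsystem values
    facility = series_reliability([electrical, mechanical])
    print(f"Facility (series):    {facility:.5f}")
    print(f"Weakest subsystem:    {min(electrical, mechanical):.5f}")

    # Dual-corded load fed from two independent paths of equal reliability.
    print(f"Single cord:          {electrical:.5f}")
    print(f"Dual cord (parallel): {parallel_reliability([electrical] * 2):.5f}")
```

A series chain can never be better than its weakest element, while a second independent path reduces the chance of losing the load to the chance of losing both paths at once.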

Page 31: Myths and realities about designing high availability data centers

• Segregate the system into independent blocks

• Eliminate common source components to minimize fault propagation (e.g., LBS, hot-tie, manual bus ties)

• Move single points of failure as close to the load as possible

• Always maintain two independent sources of power to the critical load

• Optimize the design of monitoring and control circuits

• Keep it simple and minimize human intervention

Fundamentals of High Availability Design

Page 32: Myths and realities about designing high availability data centers

Causes of Critical Failures

• Equipment failure: 28%

• System design: 20%

• Human error: 18%

• Equipment design: 13%

• Installation error: 10%

• Commissioning or test deficiency: 4%

• Maintenance oversight: 4%

• Natural disaster: 3%

Page 33: Myths and realities about designing high availability data centers

• Typically a combination of factors

  – External event (power failure)

  – Equipment failure

  – Human factor

  – Latent failures

• Root cause not always easy to ascertain

• Most major failures occur during change-of-state events

  – Loss of utilities

  – System transfers during maintenance activities

• More maintenance does not necessarily mean higher availability

Causes of Critical Failures

Page 34: Myths and realities about designing high availability data centers

• What reliability level do you really need based on your business case?

• Do you want concurrent maintainability?

• Do you want fault tolerance?

• Minimize single points of failure within systems

• Ensure adequacy of operations, maintenance and testing programs

• Review/develop SOPs and EOPs

• Review/develop existing documentation

• Review/develop training practices

Key Takeaways

Page 35: Myths and realities about designing high availability data centers

Steven Shapiro, P.E., ATD

Mission Critical Practice Lead

(914) 420-3213

[email protected]

http://www.linkedin.com/in/stevenshapirope

Twitter: @stevenshapirope

Questions?

References:

Uptime Institute White Paper: Tier Myths and Misconceptions

Uptime Institute White Paper: Data Center Site Infrastructure Tier Standard: Topology