
The economics of green HPC: a total cost of ownership approach

by Giovanbattista Mattiussi, MBA, MSc, marketing manager, Eurotech S.p.A.

An Eurotech HPC Division White Paper


Table of contents

• Introduction
• Remarks
• Definitions
• Data centers comparison example
• Results
• Economical conclusions
• Aurora from Eurotech

References

• M.K. Patterson, D.G. Costello, P.F. Grimm (Intel Corporation, Hillsboro, Oregon, USA) and M. Loeffler (Intel Corporation, Santa Clara, California, USA) - Data center TCO: a comparison of high-density and low-density spaces

• Rachel A. Dines (Forrester) - Build Or Buy? The Economics Of Data Center Facilities

• Neil Rasmussen - Re-examining the Suitability of the Raised Floor for Data Center Applications

• Jonathan Koomey, Ph.D., with Kenneth Brill, Pitt Turner, John Stanley and Bruce Taylor - A Simple Model for Determining True Total Cost of Ownership for Data Centers

• Eurotech S.p.A. - Aurora systems technical specifications

• Fabrizio Petrini, Kei Davis and José Carlos Sancho - System-Level Fault-Tolerance in Large-Scale Parallel Machines with Buffered Coscheduling

• Narasimha Raju Gottumukkala, Yudan Liu, Chokchai Box Leangsuksun, Raja Nassar and Stephen Scott (College of Engineering & Science, Louisiana Tech University; Oak Ridge National Laboratory) - Reliability Analysis in HPC Clusters


Introduction

The evolution of supercomputers from very expensive special-purpose machines to cheaper high performance computing (HPC) systems has made very fast and advanced systems available to an audience much vaster than just 10 years ago. The availability of much more computing power is helping progress in many strategic fields like science, applied research, national security and business. This has fueled the demand for even more computing power and more storage to hold larger and larger amounts of data. At the same time, HPC has become more pervasive, being progressively adopted in fields like oil & gas, finance, manufacturing, digital media, chemistry, biology, pharmaceutics and cyber security, sectors that, together with the "historical" HPC users (scientific research, space and climate modeling), have seen demand for HPC services soar.

More computing power per installation and more installations mean more electrical power needed and more complex heat management. In the last decade there has been a dramatic rise in the electrical power required to both feed and cool an increased number of servers.

According to the U.S. Environmental Protection Agency (EPA), the amount of energy consumed by data centers doubled between 2000 and 2006 alone. Projecting this into the future means facing harsh problems, like electrical power availability, energy cost and the carbon footprint associated with data centers.

While the cost of powering and cooling a data center has traditionally been low compared to the cost of hardware, this is now changing. As shown in Figure 1, the cost of power could reach that of hardware in the next 3 years, especially in areas like Europe or the northeastern US, where electricity prices are high.

This is particularly true in supercomputing, where computational power and utilization are taken to extreme levels. All of the above suggests that energy efficiency is becoming an unavoidable requirement not only for green IT but also for the economic health of HPC data centers.

Another trend in HPC systems is the increase in density. On the one hand, this raises the heat concentration and makes heat extraction more difficult. On the other hand, more density also brings benefits in terms of cost and environmental impact, especially when, as with liquid cooling, an effective way of extracting the heat is found. High density means low space occupancy, which comes with capital and financial gains related to reduced occupancy and intangible gains related to the lower carbon footprint that saving real estate brings.

Supercomputers occupy the extreme side of computing, like Formula 1 in cars. Such machines are often utilized to the limit of their computational power, with utilization rates close to 100%. Reliability of supercomputers becomes paramount to really exploit their main characteristic of being fast. If an HPC system breaks often, then the advantages of a lot of Flops may fade away quite quickly. Reliability is associated with the cost of maintenance, spare parts and the generally much bigger business cost of an outage.


Reliability basically means fewer faults, or fewer major faults, and hence fewer spare parts. Lowering part replacements has a connection to sustainability, according to which waste and the depletion of the earth's resources should be limited.

Green IT and economics are intimately interwoven. This paper wants to provide some food for thought to help organizations take the right decisions when buying HPC systems, considering both economic and ecologic reasoning. The approach chosen in the following paragraphs is to create a comparison among 3 different data centers using 3 different HPC systems and to detail a hypothesis of total cost of ownership for each of them.

[Figure 1: Rise in power costs compared with new server costs, 2007-2011, in billions of dollars (source: Intel)]

Remarks

This paper focuses on hardware, even if it recognizes the determinant importance of software in making HPC systems greener and at the same time economically more attractive. Software impacts both the capital and operational sides of an HPC return on investment calculation. Better software and architectures mean saving energy, reducing time to solution and time to market, better hardware utilization and improved operational efficiency. Ultimately, it is software that determines how HPC hardware is used and hence discriminates between simple hardware expenditures and well-rounded strategic investments.

It is the integrated approach of optimizing hardware and software investments together that allows reaching the best returns: fixing one side and neglecting the other won't make any business case really sound.


This is particularly true when attempting to make a data center greener. An integrated green IT approach means optimizing all parts of a data center, from processor to building. This paper cannot detail all the actions needed to achieve greener results, but it wants to emphasize, starting from a worked case, that greener and cheaper often go together, side by side.

While storage has its impact on green IT management and economics, the analysis presented in this paper concentrates on computing units, assuming that the same storage is installed in the 3 data centers of the example reported below.


Definitions

Some definitions may help the reader to better understand the remainder of this paper.

TCO stands for total cost of ownership. It is the sum, adjusted for the time value of money, of all of the costs that a customer incurs during the lifetime of a technology solution. For an HPC data center, a common breakdown of those costs includes:

• Purchase of hardware, including cabling and switches
• Purchase of software licences
• Building work for new constructions or extensions, including power adaptation
• Air conditioning and cooling
• Electrical equipment, including power distribution units, transformers, patch panels, UPSes, auto transfer switches, generators…
• Installation of hardware, software, electrical and cooling
• Hardware, software, electrical and cooling maintenance
• Software upgrades and recurring (monthly, yearly) software licences
• Any additional form of lease
• Energy costs
• Disposal costs

PUE stands for power usage effectiveness. It measures how much of the electrical power entering a data center is effectively used for the IT load, which is the energy absorbed by the servers and usefully transformed into computational power. The definition of PUE as a formula is as follows:

PUE = Total Facility Energy Consumption / IT Equipment Energy Consumption

The perfect theoretical PUE is equal to 1, meaning that all of the energy entering the data center is used to feed IT equipment and nothing is wasted.

ERE stands for energy reuse effectiveness and is the ratio between the energy balance of the data center and the energy absorbed by the IT equipment. The data center energy balance is the total energy consumed by the data center minus the part of this energy that is reused outside the data center:

ERE = (Total Facility Energy Consumption - Recovered Energy) / IT Equipment Energy Consumption

A typical example where ERE is a useful metric is liquid cooling, with which the water cooling the IT equipment heats up and moves energy outside the data center, where it can be reused. The ERE can range between 0 and the PUE. As with the PUE, the lower the value, the better for the data center. Practically speaking, ERE as a metric helps in those situations where the PUE is not enough to account for energy reuse. It remedies the once common habit of factoring energy recovery into the PUE and talking about a PUE lower than 1, which is mathematically nonsensical.


CUE stands for carbon usage effectiveness. It measures the total CO2 emissions caused by the data center divided by the IT load, which is the energy consumed by the servers. The formula can be expressed as follows:

CUE = CO2 emitted by the data center (kg CO2eq) / IT equipment energy (kWh)

which, simplified, becomes:

CUE = (Total data center energy / IT equipment energy) x CEF = PUE x CEF

The CEF is the carbon emission factor (kgCO2eq/kWh) of the site, based on the government's published data for the region of operation for that year. The CEF depends on the mix of energy types a site is fed with. A site entirely powered by a hydroelectric or solar power station has a low CEF. A site entirely powered with electricity produced in an oil or coal power station bears a higher CEF. While the CEF can vary with the site considered, utility companies provide regional or national average values that can be used with good approximation.
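To make the relationships between these metrics concrete, the short Python sketch below (not part of the original paper) computes PUE, ERE and CUE from measured energy figures; the function names and the sample numbers are illustrative assumptions, not data from the study.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: total facility energy over IT equipment energy."""
    return total_facility_kwh / it_kwh

def ere(total_facility_kwh: float, recovered_kwh: float, it_kwh: float) -> float:
    """Energy reuse effectiveness: facility energy net of recovered energy, over IT energy."""
    return (total_facility_kwh - recovered_kwh) / it_kwh

def cue(pue_value: float, cef_kg_per_kwh: float) -> float:
    """Carbon usage effectiveness via the simplified form CUE = PUE x CEF."""
    return pue_value * cef_kg_per_kwh

# Illustrative figures: a facility drawing 2.0 GWh/year for 1.0 GWh of IT load,
# reusing 0.3 GWh of heat, on a grid with CEF = 0.59 kgCO2eq/kWh.
total_kwh, it_kwh, recovered_kwh, cef = 2_000_000.0, 1_000_000.0, 300_000.0, 0.59
print(pue(total_kwh, it_kwh))                  # 2.0
print(ere(total_kwh, recovered_kwh, it_kwh))   # 1.7
print(cue(pue(total_kwh, it_kwh), cef))        # 1.18 kgCO2eq per kWh of IT energy
```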

The layout efficiency measures the data center square footage utilization and the efficiency of the data center layout. It is defined as racks per thousand square feet, and so it is a measure of space efficiency related to the electrically active floor in a data center.

The most common layout in today's data centers is repeating rows of racks placed side by side, with alternating cold aisles and hot aisles. The cold aisle supplies air to the servers, which heat it up and discharge it into a hot aisle shared with the next row of servers. The racks stand on a raised floor, which delivers cool air to the cold aisle, while the hot air returns to the air conditioning system.

For instance, in the data center in Figure 2, computer room air conditioning (CRAC) units pull the hot air across chillers and distribute the cooled air beneath the raised floor. In big buildings requiring more ventilation power, CRACs are normally replaced by air handling units (AHUs).

In this hot aisle / cold aisle configuration, a varying number of servers can be fit into each rack based on many factors: cooling capability, power availability, network availability, and floor loading capability (the rack's loaded weight).



Figure 2 - Server rack thermal example - Source: Intel IT, 2007

A work cell is the square footage in a data center that is directly attributable to a rack of servers. In air cooled data centers, a work cell is a repeating unit of hot aisle, rack and cold aisle. In a liquid cooled data center it is the rack, plus the liquid distribution and the maneuvering space in front of and behind the rack.

If a data center uses server liquid cooling technology, the above layout may look obsolete. In an existing data center, liquid cooling technology doesn't require a change of layout, because plumbing, electrical and network works can blend into the existing one. In the case of a new data center that employs solely liquid cooling, most if not all of the air conditioning infrastructure becomes redundant, leaving many opportunities for layout and space optimizations. Some of them are the reduction of the work cell (as defined above), the reduction of the raised floor, and the avoidance of chillers, CRACs and air handling units, replaced by the cooling circuit (pipes) and free coolers (liquid-to-air heat exchangers).



Data centers comparison example

The best way to get a good grip on the TCO concept is to work through an example, where 3 different data centers are compared:
• A data center with a low/medium density HPC system, air cooled
• A data center with a high density HPC system, air cooled
• A data center with a high density HPC system, water cooled

Table 1 summarizes the main differences and hypotheses.

The data centers are located in the same geographical area and hence the external temperature profile is the same for all 3 cases throughout the year.

The measurements for the PUE computation are taken at two points: one at the plug from which the data center is powered, the other after the PSU, immediately before the circuit feeds the computation units. In this way, there is a clean separation between the power absorbed by the IT equipment and the power used for cooling, lighting and other ancillary services.

The type of energy mix that feeds the data center is important to calculate the CUE. The CUE can give an indication of how green a data center is on the basis of its carbon footprint.

In terms of the actual HPC systems installed in the 3 data centers, as mentioned, the analysis will focus on computational nodes, considering the following 3 systems:

• A low/medium density high performance cluster made up of 1U servers, air cooled
• A high density blade HPC server, air cooled
• A high density blade HPC server, water cooled

Comparing systems that are fundamentally different does not always produce fair results, because the risk is to compare apples and oranges, that is, systems with different features and added value that fit a customer best depending on the case. So, in order to level out differences as much as possible, the comparison in this example is made between systems delivering the same computational power (500 Tflop/s) and having the same cluster architecture, the same RAM, the same interconnects and the same software stack. Regarding the processors, the comparison makes 2 different hypotheses. The first is summarized in Table 2 and assumes that 3 different processors are used. The second is summarized in Table 3 and posits that the same processor is used.


| | Air cooled low density data center | Air cooled high density data center | Liquid cooled high density data center |
| Location | Temperate climate | Temperate climate | Temperate climate |
| PUE | 2.3 | 1.8 | 1.05 |
| Power mix | Various energy types | Various energy types | Various energy types |

Table 1 - Comparison between the 3 different data centers



| | Air cooled low density HPC | Air cooled high density HPC | Liquid cooled high density HPC |
| TFLOP/s | 500 | 500 | 500 |
| Architecture | 1U servers, Intel CPUs | Blades, Intel CPUs | Blades, Intel CPUs |
| Processors per node | 2x Intel Xeon X5650 6C 2.66 GHz | 2x Intel Xeon X5690 6C 3.46 GHz | 2x Intel Xeon E5 (Sandy Bridge) 8C 2.60 GHz |
| Layout efficiency | 22 | 22 | 40 |
| GFLOPS/node | 130 | 165 | 340 |
| Total nodes needed | 3846 | 3030 | 1471 |
| Servers/blades per rack | 35 | 100 | 256 |
| Cores per rack | 280 | 1200 | 4096 |
| Server racks needed | 110 | 30 | 6 |
| Total network equipment | 481 | 379 | 184 |
| Total racks for network equipment | 11 | 9 | 4 |
| Total racks needed | 121 | 39 | 10 |
| Occupancy (ft2) of electrically active floor | 5515 | 1787 | 253 |
| Total DC space (ft2) | 11031 | 3575 | 953 |
| (in m2) | 1015 | 329 | 88 |
| Energy consumption | | | |
| Mflops/W | 280 | 450 | 800 |
| Total IT power (kW) | 1,786 | 1,111 | 625 |
| Total DC power (kW) | 4,107 | 2,000 | 656 |
| Reliability | | | |
| MTBF per node (hours) | 670,000 | 670,000 | 804,000 |
| MTBF per system (hours) | 174 | 221 | 547 |
| Price of outage per hour ($) | 2,000 | 2,000 | 2,000 |

Table 2 – Characteristics of the 3 data centers: case of different processor types


| | Air cooled low density HPC | Air cooled high density HPC | Liquid cooled high density HPC |
| TFLOP/s | 500 | 500 | 500 |
| Architecture | 1U servers, Intel CPUs | Blades, Intel CPUs | Blades, Intel CPUs |
| Processors per node | 2x Intel Xeon X5690 6C 3.46 GHz | 2x Intel Xeon X5690 6C 3.46 GHz | 2x Intel Xeon X5690 6C 3.46 GHz |
| Layout efficiency | 22 | 22 | 40 |
| GFLOPS/node | 165 | 165 | 165 |
| Total nodes needed | 3030 | 3030 | 3030 |
| Servers/blades per rack | 35 | 90 | 256 |
| Cores per rack | 420 | 1080 | 3072 |
| Server racks needed | 87 | 34 | 12 |
| Total network equipment | 379 | 379 | 379 |
| Total racks for network equipment | 9 | 9 | 9 |
| Total racks needed | 96 | 43 | 21 |
| Occupancy (ft2) of electrically active floor | 4345 | 1940 | 521 |
| Total DC space (ft2) | 8691 | 3881 | 1221 |
| (in m2) | 800 | 357 | 112 |
| Energy consumption | | | |
| Mflops/W | 450 | 450 | 450 |
| Total IT power (kW) | 1,111 | 1,111 | 1,111 |
| Total DC power (kW) | 2,556 | 2,000 | 1,167 |
| Reliability | | | |
| MTBF per node (hours) | 670,000 | 670,000 | 804,000 |
| MTBF per system (hours) | 221 | 221 | 265 |
| Price of outage per hour ($) | 2,000 | 2,000 | 2,000 |

Table 3 – Characteristics of the 3 data centers: case of same processor type


The comparison involves CPU based systems. The same reasoning can be extended to GPU based or hybrid (CPU and GPU based) systems, but we think it would not be useful to compare a CPU system with a hybrid or GPU based one, due to their different technological nature.

In terms of density, the comparison is made between a low density system of 1U servers and 2 higher density blade server systems, one air cooled and the other liquid cooled. According to the Uptime Institute, on average 75% of a rack in a data center gets filled with computing nodes, the rest with power supplies, switches… In the current example, this sets a figure of 35 and 100 servers per rack, respectively, for the 1U and blade server configurations. For the third, water cooled system, we have taken as reference the Aurora HPC 10-10 from Eurotech, which mounts 256 blades in each 19-inch rack.

In terms of the number of racks and servers needed, using different processor types or the same processor type in the 3 systems being compared results in a different total number of servers and racks.

In the case where the 3 systems use different processors, it is assumed that each node has 2 processors, generating a number of Gflops that varies depending on the processor itself. It follows that, to reach the required total Flops, each system will need a different number of servers (nodes).

If the same processor type is used and if we make the additional assumption that each node has 2 sockets and generates an equal number of Gflops per node, then the 3 systems have the same number of servers.

Nevertheless, in both cases, the density of the systems, that is, how many cores can be packed into each rack, determines how many racks are needed; consequently, the number of racks needed differs across the 3 systems compared.

Following the Uptime institute observations, it has been assumed that the network equipment is roughly 10% of the total IT equipment used in the data center. This has led to the numbers reported in the 2 tables above, where there are also the total racks needed to host the network equipment (assumed to be 1U).

With regard to layout efficiency, a value of 22 standard racks per 1000 ft2 has been assumed, a number obtained from the standard size work cell (48 ft2). We assumed that a liquid cooled system requires a smaller work cell, because it eliminates the plenum requirements for optimal air cooling. For the liquid cooled system, a work cell of 25 ft2 seems a fair assumption and leads to a layout efficiency of 40 racks per 1000 ft2.

In terms of footprint, once the number of racks needed per data center type is determined, knowing the layout efficiency of the data center, a simple calculation gives the total square feet of electrically active floor. The electrically active occupancy has been doubled to include all infrastructure and data center facilities, the additional space being at minimum on the order of 700 ft2.
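As a rough illustration of the sizing chain described above, the following Python sketch (not from the paper) derives nodes, racks and floor space from the stated assumptions. The ~10% network ratio, the layout efficiency and the doubling rule with a ~700 ft2 minimum follow the text; the 42U capacity assumed for racks of 1U network gear is this sketch's own assumption, so the outputs only approximate the tables.

```python
import math

def size_data_center(target_tflops: float, gflops_per_node: float, nodes_per_rack: int,
                     layout_eff_racks_per_kft2: float, network_fraction: float = 0.10,
                     network_units_per_rack: int = 42):
    """Rough sizing of nodes, racks and floor space for one of the example systems."""
    nodes = math.ceil(target_tflops * 1000 / gflops_per_node)    # GFLOPS needed / GFLOPS per node
    network_units = math.ceil(nodes * network_fraction)          # ~10% of IT equipment, 1U each
    server_racks = math.ceil(nodes / nodes_per_rack)
    network_racks = math.ceil(network_units / network_units_per_rack)
    total_racks = server_racks + network_racks
    active_ft2 = total_racks / layout_eff_racks_per_kft2 * 1000  # electrically active floor
    total_ft2 = active_ft2 + max(active_ft2, 700)                # doubled, with a ~700 ft2 minimum add-on
    return nodes, total_racks, active_ft2, total_ft2

# Liquid cooled high density case of Table 2: 500 TFLOP/s, 340 GFLOPS/node,
# 256 blades per rack, layout efficiency of 40 racks per 1000 ft2.
print(size_data_center(500, 340, 256, 40))
# (1471, 10, 250.0, 950.0) - Table 2 reports 1471 nodes, 10 racks, 253 ft2 and 953 ft2
```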



The total power needed in the data center has been calculated by making 3 assumptions regarding the efficiency of the IT equipment, expressed in Mflops/W. The TOP500 list helped us understand typical Mflops/W values per processor type, from which we derived the electrical power needed for the IT equipment to generate the required Flops. The value obtained for the IT power of each data center was multiplied by the PUE of that data center to determine the total electrical power installed in each of them.
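The two-step power calculation just described can be written down in a few lines of Python; this sketch is not part of the paper, but with the Mflops/W and PUE values of Table 2 it reproduces the table's power rows.

```python
def electrical_power_kw(target_tflops: float, mflops_per_watt: float, pue: float):
    """IT power from the Mflops/W efficiency, then total data center power via the PUE."""
    it_kw = target_tflops * 1e6 / mflops_per_watt / 1000   # 1 TFLOP/s = 1e6 MFLOPS
    return it_kw, it_kw * pue

# The three systems of Table 2 (different processor types):
for name, eff, pue in [("air cooled low density", 280, 2.3),
                       ("air cooled high density", 450, 1.8),
                       ("liquid cooled high density", 800, 1.05)]:
    it_kw, dc_kw = electrical_power_kw(500, eff, pue)
    print(f"{name}: IT {it_kw:,.0f} kW, total DC {dc_kw:,.0f} kW")
# air cooled low density: IT 1,786 kW, total DC 4,107 kW
# air cooled high density: IT 1,111 kW, total DC 2,000 kW
# liquid cooled high density: IT 625 kW, total DC 656 kW
```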

System failure rates increase with higher temperature, larger footprints, and more IT components. Efficient cooling and a reduction in the number of components are essential to increase system reliability. We have calculated an average nominal MTBF (mean time between failures) for each node, representative of the reliability of the entire system, aggregating the MTBF values of the single components (memory, processors…), interconnects, switches and power supplies. The overall system MTBF was derived from the single node MTBF through the formula:

MTBF = 1 / (sum over i = 1..N of μi)

where μi is the fault rate of each node (the higher the fault rate, the lower the MTBF) and N is the number of nodes in the system (the more nodes, the lower the system MTBF).

For the water cooled system we have again taken as reference the Eurotech Aurora supercomputer family, in which the limitation of hot spots, the absence of vibrations, soldered memory and 3 independent sensor networks help reduce the fault rate of each node, leading to an increased MTBF of the total system (assumed 20% higher).

To comprehensively consider the systems' availability through their life, an analysis of reliability should be extended to all of the electrical and mechanical components in the data center, including auxiliary electrical equipment (like UPSs) and the cooling system.

To calculate the annualized investment costs, we have used the static annuity method, defining a capital recovery factor (CRF), which contains both the return of investment (depreciation) and the return on investment. The CRF is defined as:

CRF = d(1+d)^L / ((1+d)^L - 1)

where d is the discount rate (assumed 7%) and L is the facility or equipment lifetime. The CRF converts a capital cost into a constant stream of annual payments over the lifetime of the investment whose present value at discount rate d is equal to the initial investment.

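A minimal Python sketch (not part of the paper) combining the two formulas above, using the node counts and node MTBF values of Table 2 and the 7% discount rate stated in the text; with identical nodes the system MTBF reduces to the node MTBF divided by the number of nodes.

```python
def system_mtbf_hours(node_mtbf_hours: float, n_nodes: int) -> float:
    """System MTBF with identical nodes: 1 / (N * node fault rate) = node MTBF / N."""
    return 1.0 / (n_nodes * (1.0 / node_mtbf_hours))

def capital_recovery_factor(d: float, lifetime_years: int) -> float:
    """CRF = d(1+d)^L / ((1+d)^L - 1): spreads a capital cost into equal annual payments."""
    growth = (1.0 + d) ** lifetime_years
    return d * growth / (growth - 1.0)

print(round(system_mtbf_hours(670_000, 3846)))   # ~174 h, air cooled low density system
print(round(system_mtbf_hours(804_000, 1471)))   # ~547 h, liquid cooled system
for life in (3, 10, 15):
    print(life, round(capital_recovery_factor(0.07, life), 3))   # 0.381, 0.142, 0.11
```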


Results

We have summarized the quantitative results of the analysis in the set of tables reported below.

• Table 4 reports the results of the comparison in terms of total TCO (3 years)
• Table 5 reports the annualized TCO results
• Table 6 shows the results of the total TCO (3 years) in the case where the same processor type is used
• Table 7 shows the results of the annualized TCO in the case where the same processor type is used

The calculations consider the costs of the entire data center, including civil, structural and architectural (CSA) work, IT equipment, cooling, electrical capital cost, energy consumption, maintenance and additional operating costs. The capital cost to purchase hardware and software is assumed to be the same for the 3 systems. This hypothesis will be relaxed in the next chapter, where some conclusions are presented.

CSA costs are highly impacted by the density. According to Forrester research, the CSA cost per ft2 is around $220/ft2 (in USA). This is an average number, its variance depending on the real estate costs and building costs, which may vary greatly by area, region or nation.

The permitting and fees for any project are often based on square footage. The specifics of the individual site location would dictate the magnitude of additional savings associated with higher density. Forrester estimates $70 per square foot in building permits and local taxes, which represents a moderate cost in the United States.

The cost of computational fluid dynamics (CFD) depends more on complexity than on the square footage. However, it is possible to assume that for a new and homogenous data center $5/ft2 is a fair number.

Air cooled data centers normally require a higher raised floor than those adopting liquid cooling technologies. In fact, liquid cooling may not require a raised floor at all, but it is safer to assume that some raised floor is needed for cabling and piping. The presence of a higher raised floor requires a similar height increase in the return air plenum, so the overall building height is greater for air cooled systems than for liquid cooled ones. Although building costs are far less sensitive to height than to area, we can assume that the marginal cost of increasing the height of the building as required is 10% of the total building cost. As for the raised floor itself, the cost associated with the height delta between the air and liquid cooling cases can be assumed to be on the order of $2 per ft2.

The IT equipment is normally a large portion of the total budget. The cost of computational node cards, processors, memory and I/O is considered to be the same in the 3 systems being compared. What differs is the number of physical racks needed to deliver the 500 Tflop/s required. To correctly account for this difference, a per-rack cost of $1500 is assumed, with an additional $1500 to move it in and install it. This value is conservative, considering that it depends on how advanced and interconnected the data center is. Each rack is normally associated with rack management hardware, which can be quoted at a similar amount.
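To show how these per-square-foot and per-rack figures roll up into the capital cost lines of the tables, here is a small Python sketch (not from the paper). Charging the CFD design on the electrically active floor only, and pricing rack management hardware at the same $3,000 per rack as the rack itself, are assumptions of the sketch; the results match Table 4 only within rounding.

```python
def site_capital_costs_kusd(active_ft2: float, total_ft2: float, n_racks: int,
                            csa_per_ft2: float = 220.0, permits_per_ft2: float = 70.0,
                            cfd_per_ft2: float = 5.0, rack_cost_usd: float = 3000.0,
                            rack_mgmt_usd: float = 3000.0) -> dict:
    """Area- and rack-proportional capital cost items, in thousands of USD."""
    return {
        "CSA": total_ft2 * csa_per_ft2 / 1000,
        "permits_and_taxes": total_ft2 * permits_per_ft2 / 1000,
        "CFD_design": active_ft2 * cfd_per_ft2 / 1000,   # charged on active floor only (assumption)
        "racks": n_racks * rack_cost_usd / 1000,         # $1,500 purchase + $1,500 move-in/install
        "rack_management": n_racks * rack_mgmt_usd / 1000,
    }

# Air cooled low density case (Tables 2 and 4): 5515 ft2 active, 11031 ft2 total, 121 racks.
print(site_capital_costs_kusd(5515, 11031, 121))
# ~{'CSA': 2427, 'permits_and_taxes': 772, 'CFD_design': 28, 'racks': 363, 'rack_management': 363}
```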



| | Air cooled low density | Air cooled high density | Liquid cooled high density |
| Initial investment costs | | | |
| Cost of IT (HW) | $7,500 | $7,500 | $7,500 |
| Building permits and local taxes | $770 | $250 | $60 |
| CSA capital costs | $2,420 | $780 | $200 |
| Capital cost of taller DC | $240 | $70 | $0 |
| Design cost for CFD | $30 | $10 | $0 |
| Delta cost raised floor | $20 | $10 | $0 |
| Fire suppression and detection | $60 | $30 | $20 |
| Racks | $360 | $110 | $30 |
| Rack management hardware | $360 | $110 | $30 |
| Liquid cooling | $0 | $0 | $320 |
| Total for network equipment | $960 | $750 | $360 |
| Cooling infrastructure/plumbing | $5,350 | $3,330 | $280 |
| Electrical | $7,140 | $4,440 | $2,500 |
| TOTAL CAPITAL COSTS | $25,210 | $17,390 | $11,300 |
| Annual costs (3 years) | | | |
| Cost of energy | $8,660 | $4,980 | $1,630 |
| Retuning and additional CFD | $50 | $20 | $0 |
| Reactive maintenance (cost of outages) | $1,510 | $1,190 | $490 |
| Preventive maintenance | $450 | $450 | $450 |
| Facility and infrastructure maintenance | $2,040 | $1,170 | $420 |
| Lighting | $50 | $20 | $10 |
| TOTAL TCO | $37,770 | $25,020 | $14,100 |

Table 4 – 3 different processors, TCO comparison results (in K USD)

| | Air cooled low density | Air cooled high density | Liquid cooled high density |
| Cost of energy | $2,720 | $1,560 | $510 |
| Retuning and additional CFD | $20 | $10 | $0 |
| Reactive maintenance (cost of outages) | $500 | $390 | $160 |
| Preventive maintenance | $150 | $150 | $150 |
| Facility and infrastructure maintenance | $670 | $380 | $130 |
| Lighting | $20 | $10 | $5 |
| Annualized 3 years capital costs | $3,440 | $3,180 | $2,960 |
| Annualized 10 years capital costs | $1,770 | $1,100 | $440 |
| Annualized 15 years capital costs | $380 | $120 | $30 |
| ANNUALIZED TCO | $9,670 | $6,900 | $4,385 |

Table 5 – 3 different processors, annualized TCO comparison results (in K USD)


| | Air cooled low density | Air cooled high density | Liquid cooled high density |
| Initial investment costs | | | |
| Cost of IT (HW) | $7,500 | $7,500 | $7,500 |
| Building permits and local taxes | $600 | $270 | $80 |
| CSA capital costs | $1,910 | $850 | $260 |
| Capital cost of taller DC | $190 | $80 | $0 |
| Design cost for CFD | $30 | $10 | $0 |
| Delta cost raised floor | $20 | $10 | $0 |
| Fire suppression and detection | $60 | $30 | $20 |
| Racks | $290 | $130 | $70 |
| Rack management hardware | $290 | $130 | $70 |
| Liquid cooling | $0 | $0 | $810 |
| Total for network equipment | $750 | $750 | $750 |
| Cooling infrastructure/plumbing | $3,330 | $3,330 | $500 |
| Electrical | $4,440 | $4,440 | $4,440 |
| Annual costs (3 years) | | | |
| Cost of energy | $5,390 | $4,980 | $2,900 |
| Retuning and additional CFD | $40 | $20 | $0 |
| Reactive maintenance (cost of outages) | $1,190 | $1,190 | $1,000 |
| Preventive maintenance | $150 | $150 | $150 |
| Facility and infrastructure maintenance | $1,360 | $1,180 | $730 |
| Lighting | $40 | $20 | $10 |
| TOTAL TCO | $27,880 | $25,380 | $19,590 |

Table 6 – Same processor type, TCO comparison results (in K USD)

| | Air cooled low density | Air cooled high density | Liquid cooled high density |
| Cost of energy | $1,690 | $1,560 | $910 |
| Retuning and additional CFD | $20 | $10 | $0 |
| Reactive maintenance (cost of outages) | $390 | $390 | $330 |
| Preventive maintenance | $150 | $150 | $150 |
| Facility and infrastructure maintenance | $450 | $390 | $240 |
| Lighting | $20 | $10 | $5 |
| Annualized 3 years capital costs | $3,390 | $3,270 | $3,220 |
| Annualized 10 years capital costs | $1,100 | $1,100 | $820 |
| Annualized 15 years capital costs | $300 | $130 | $40 |
| ANNUALIZED TCO | $7,510 | $7,010 | $5,715 |

Table 7 – Same processor type, annualized TCO comparison results (in K USD)


The third system in the comparison is liquid cooled, and the analysis takes into consideration the additional cost of the liquid cooling structure. The "liquid cooling" item in the tables above refers to the cost of the actual cooling components within the rack. The external piping, dry coolers etc. are grouped under the "Cooling infrastructure" cost item. For instance, in the Eurotech Aurora supercomputers the cooling parts within each rack include liquid distribution bars, piping and the aluminum cold plates that are coupled with each electronic board, allowing direct and efficient heat removal. In other supercomputers or HPC clusters, the cooling system may consist of distribution pipes and micro channels.

In the case of air cooled servers, the cost of the cooling infrastructure includes chillers, AHUs, CRACs, piping, filters… In the case of water cooling, it includes heat exchangers (free coolers), piping, filters and pumps. The dimensioning of the cooling depends on the data center location (which, in this example, is assumed to be the same) and its temperature profile across the year, the kW/ft2 of IT equipment installed and the size of the building. With air cooled systems, it is assumed that approximately $3,000 of cooling infrastructure is required for each kW of IT equipment installed, while the liquid cooling infrastructure is generally cheaper. This is particularly true in the case of a hot water cooled solution, like Aurora from Eurotech. Such technology allows the employment of free coolers instead of the more expensive chillers in most of the world's climate zones. Also, much of the ventilation necessary in an air cooled data center may be avoided. Hot water cooling allows high density systems to be deployed and hence less occupancy, reducing the volume of air that needs to be chilled. As a consequence, the use of water cooled systems can save a big chunk of the money dedicated to the cooling infrastructure.

As regards the electrical equipment (e.g., power distribution units, transformers, patch panels, UPSes, auto transfer switches, generators), it is a cost that can be made proportional, like cooling, to the kW of IT equipment installed. For the present example, a cost of $3,500 per kW of IT equipment installed has been assumed.
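A short Python sketch (not from the paper) of the two per-kW infrastructure items. The $3,000/kW air cooling and $3,500/kW electrical rates are the ones quoted in the text; the electrical lines in Tables 4 and 6 appear to reflect a somewhat higher effective rate, so treat these outputs as order-of-magnitude figures.

```python
def infrastructure_capital_kusd(it_kw: float, cooling_usd_per_kw: float,
                                electrical_usd_per_kw: float = 3500.0):
    """Cooling and electrical plant capital costs scaled on installed IT kW, in thousands of USD."""
    return it_kw * cooling_usd_per_kw / 1000, it_kw * electrical_usd_per_kw / 1000

# Air cooled low density system: 1,786 kW of IT load, ~$3,000/kW of air cooling infrastructure.
cooling_kusd, electrical_kusd = infrastructure_capital_kusd(1786, 3000)
print(cooling_kusd, electrical_kusd)   # ~5358 (Table 4: 5,350) and ~6251 (Table 4: 7,140)
```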

Analyzing now the operating costs, particular attention should be dedicated to the cost of energy. In the comparison being described, the cost of energy is calculated by computing the energy consumed by the data center each year and over 3 years of operation. The cost per kWh has been fixed as:

• $0.11/kWh for System 1
• $0.13/kWh for Systems 2 and 3

System 1's cost per kWh is lower than that of the other 2 systems, because the first data center ends up consuming enough to most probably benefit from volume discounts. Both of the energy unit costs chosen are relatively high for some parts of the US, but generally low for Europe, where the average cost per kWh for industrial use is €0.11 ($0.15). The IT equipment is assumed to bear an average load of 80% and have an availability of 86%, an assumption conservative enough to counterbalance cases in which energy prices are lower than the ones assumed. The calculated cost takes into consideration all of the energy consumed by IT equipment, cooling, ventilation, power conversion and lighting.
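The annual energy bill implied by these assumptions can be checked with a few lines of Python (not part of the paper); with the total DC power values of Table 2 it lands within rounding of the energy lines of Table 5.

```python
def annual_energy_cost_kusd(dc_power_kw: float, price_per_kwh: float,
                            load: float = 0.80, availability: float = 0.86) -> float:
    """Annual energy cost in thousands of USD: installed DC power derated by the
    assumed 80% average load and 86% availability, over 8760 hours."""
    kwh_per_year = dc_power_kw * 8760 * load * availability
    return kwh_per_year * price_per_kwh / 1000

print(annual_energy_cost_kusd(4107, 0.11))   # ~2723, air cooled low density (Table 5: 2,720)
print(annual_energy_cost_kusd(2000, 0.13))   # ~1567, air cooled high density (Table 5: 1,560)
print(annual_energy_cost_kusd(656, 0.13))    # ~514, liquid cooled high density (Table 5: 510)
```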



Two main aspects affect the difference in energy consumption: the efficiency of the single servers (in terms of flops/watt) and the efficiency of the whole data center, in terms of PUE.

We have divided the costs of maintenance into 3 different groups: preventive maintenance for the IT systems, reactive maintenance for the IT systems, and maintenance of all other infrastructure, including the building.

The IT system preventive maintenance includes the cost of IT systems monitoring, fine tuning, software upgrades and patches, consultancy and personnel. It is assumed to be proportional to the flops installed and, for this reason, it is equal for the 3 systems compared.

The reactive maintenance has been associated with outages. To take into consideration the cost of hardware replacements, personnel, support, consultancy and the business impact associated with a fault, a value of $2,000 per hour of outage has been chosen. While we believe this value is a fair and conservative assumption, it is also true that it may vary a lot from sector to sector and company to company. In particular, the business impact cost of an hour of outage may span from a few hundred dollars to millions, depending on the type of outage and the nature of the business of the company using the HPC facility (an obvious contrast being a small manufacturing company versus a large investment bank).

While annual facility and infrastructure maintenance is an unpredictable ongoing cost of a data center, on average companies should expect a base cost of 3% to 5% of the initial construction cost.

The initial investment costs (capital costs) have been annualized using 3 different lifetimes: 3, 10 and 15 years. Capital costs with a 3 year life include all IT equipment costs (including internal routers and switches). Capital costs with a 10 year life include equipment other than IT (electrical and cooling). Capital costs with a 15 year lifetime include all other capital costs (building). We have calculated a capital recovery factor using the different lifetimes and a 7% real discount rate.

[Figure 3: Typical power distribution in a data center at PUE = 2.13 (source: APC) - IT equipment 47%, chiller 23%, CRAC/CRAH 15%, UPS 6%, PDU 3%, humidifier 3%, lighting/auxiliary devices 2%, switchgear/generator 1%]



Economical conclusions

This brief study demonstrates that it makes sense to consider the overall total cost of ownership of an HPC solution and so extend the cost analysis beyond the initial expenditure on hardware and software. Energy efficiency, density and higher reliability play an important role in determining the overall cost of an HPC solution and consequently its value, computed as total benefits minus total costs.

In the example reported in this paper, an assumption was made that the purchase cost of hardware was the same for the 3 systems being compared. We now want to relax this assumption, keeping the distinction between the 2 cases in which different and the same processor types are used.

In the case where the 3 systems have different processor types, the increased energy efficiency, density and flops/$ that newer generation processors bring give an advantage to the system adopting the latest Intel technology. There is a clear benefit in using the latest technology (at least when comparing systems with the same total flops).

If we compare 3 systems mounting different processor types, in order for the annualized TCO of the 3 systems to be the same:
• The 1U server solution cannot equal the TCO of the water cooled solution, even if the IT hardware is given away for free
• The IT hardware of the air cooled blade server solution must be 70-80% cheaper than the IT hardware of the liquid cooled one.

In the case where the 3 systems compared use the same processors, in order for the annualized TCO to be the same:
• The IT hardware of the 1U server solution must be 50-55% cheaper than the IT hardware of the water cooled solution
• The IT hardware of the air cooled blade server solution must be 35-40% cheaper than the IT hardware of the water cooled solution.

It is important to remark that a TCO study is conditioned by the particular organization's situation. Important variables are location, availability of data center space, energy contracts, availability of power… While this is certainly true, the example reported in this paper gives an idea of how useful it is to dig deep: a thorough analysis can lead to more sensible purchases.


Green considerations

We have seen that energy efficiency, density and reliability are areas that contribute to improving the TCO of a high performance computing solution. Incidentally, they are also means to reach greener IT.

Saving energy is a paramount component of any green IT policy. We have pointed out that the approach to energy savings should be multilevel, involving hardware, software, facilities and even human behavior in a data center. Some actions, more than others, can have a decisive impact in making a data center greener, for instance adopting liquid cooling, which lowers the data center PUE down to limits not easily reachable with air cooling.

Also, a reduction in space occupancy ticks many green boxes, when the avoidance or reduction of new construction is taken into consideration. High density HPC solutions help delay the construction of new real estate or limit the size of new buildings, contributing to reducing the carbon footprint and the depletion of the earth's resources, as well as limiting the environmental impact on landscapes.

High reliability means fewer faults, so fewer spare parts, making it possible to save on components that use the earth's resources and need to be disposed of, with the consequent creation of waste.

Apart from the benefits brought to the cost side of the profit equation, green IT can influence the revenue side, when organizations are able to capitalize on their green image or posture. Many companies and public institutions can leverage the intangible value of being green, through an increased reputation that pushes up sales or through the government contributions that are reserved for adopters of green policies and sustainability.

The biggest payback of green and sustainability will be seen over time. According to the Natural Step, a world-renowned sustainability think tank, it is not a matter of if companies and governments need to adopt sustainability measures, but a matter of when, in a context where those who are late to adapt to the new requirements will pay a severe price.

Moving to a more practical dimension, to complete the example made in the previous paragraphs, it is useful to compute the total number of tons of CO2 associated with the operation of the 3 data centers throughout the lifetime of the solution adopted (assumed to be 5 years). Table 8 reports this calculation, in which a CEF (carbon emission factor) of 0.59 kgCO2eq per kWh has been assumed. This is an average value for the US and quite similar to the European average. It is fair to mention that the CEF has great variability depending on country, state, region and the power mix of the data center.

| | Air cooled low density | Air cooled high density | Liquid cooled high density |
| Tons of CO2 | 51,692 | 16,385 | 9,558 |
| CUE | 1.26 | 1.06 | 0.62 |

Table 8 – CO2 equivalent and CUE in the 3 cases under scrutiny



When in operation, the IT equipment of a data center transforms electrical energy into heat. Air cooled data centers remove the heat through air conditioning, while liquid cooled IT equipment conveys the heat out of the building using water. In the latter case, the water can be heated enough to be utilized for producing air conditioning through an absorption chiller, for producing some electrical energy, or simply for heating a building. This practice is called thermal energy recovery and, while still rather marginal due to the average size of present data centers, it will become progressively more important as the average data center size increases. Reusing the energy is an interesting concept from both the economic and the ecologic viewpoints. This is why a specific metric, the ERE, has been introduced to overcome the limitations of the PUE, which is mathematically incapable of accounting for thermal energy recovery.

Aurora from Eurotech

Aurora is the family of green, liquid cooled supercomputers from Eurotech.

Eurotech develops the Aurora products from boards to racks, maintaining production in house in the Eurotech plant in Japan and keeping strict control over quality. The Aurora product line is the synthesis of more than 10 years of Eurotech HPC history and investments in research projects, giving birth to supercomputers that stand out for their advanced technology and forward-thinking concepts.


Aurora supercomputers tick all the boxes to take full advantage of green economics:

Energy efficiency, thanks to liquid cooling and very high power conversion efficiency. The direct on-component water cooling technology allows the use of hot water and free coolers in any climate zone, with no need for expensive and power hungry chillers. In addition, Aurora supercomputers employ power conversion with efficiency up to 97%. An Aurora data center can aim at a PUE of 1.05.

High density, with 4096 cores and 100 Tflop/s per standard rack, Aurora supercomputers are among the densest in the world, permitting savings in space, cabling, air conditioning, power and data center complexity.

High reliability, guaranteed by high quality, liquid cooling that reduces or eliminates data center and on-board hot spots, vibration-free operation, redundancy of all critical components, 3 independent sensor networks, soldered memory and optional SSDs. The Eurotech HPC division has inherited from Eurotech's embedded and rugged PC core business the ability to construct reliable systems. Eurotech excels in building bomb resistant computers and PCs that need to operate in very harsh conditions, so they know a thing or two about reliability.



For more in-depth technical information about the Eurotech Aurora range:

[email protected]

For sales information about the Eurotech Aurora product range:

[email protected]

www.eurotech.com

About Eurotech

Eurotech is a listed global company (ETH.MI) that integrates hardware, software, services and expertise to deliver embedded computing platforms, sub-systems and high performance computing solutions to leading OEMs, system integrators and enterprise customers for successful and efficient deployment of their products and services. Drawing on concepts of minimalist computing, Eurotech lowers power draw, minimizes physical size and reduces coding complexity to bring sensors, embedded platforms, sub-systems, ready-to-use devices and high performance computers to market, specializing in the defense, transportation, industrial and medical segments. By combining domain expertise in wireless connectivity as well as communications protocols, Eurotech architects platforms that simplify data capture, processing and transfer over unified communications networks. Our customers rely on us to simplify their access to state-of-the-art embedded technologies so they can focus on their core competencies.

Learn more about Eurotech at www.eurotech.com
For HPC visit www.eurotech.com/aurora
