Managing distributed UPS energy for effective power capping in data centers


Vasileios Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun, E. Pettis*, D. Tullsen, T. Rosing. Managing distributed UPS energy for effective power capping in data centers. UCSD, *Google. ISCA 2012.


MANAGING DISTRIBUTED UPS ENERGY FOR EFFECTIVE POWER CAPPING IN DATA CENTERS
ISCA 2012
Vasileios Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun, E. Pettis*, D. Tullsen, T. Rosing
UCSD, *Google

Datacenter market is growing
- World is becoming more IT dependent
- Internet users increased from 16% to 30% of world population in 5 years [Internet World Stats]
- Smart phones are projected to jump from 500M in 2011 to 2B in 2015 [Inter.Telecom.Union]
- Internet heavily depends on Datacenters
- Data center power will double in 5 years
- Expected worldwide Datacenter Investment in 2012: 35B$ (equivalent to GDP of Lithuania) [DataCenterDynamics]
Important to build cost-effective Datacenters

Power Oversubscription - Opportunity
[Figure: data center and supporting equipment with no oversubscription vs. with oversubscription; costs split into one-time capital expenses and recurring costs. With oversubscription: more servers, same infrastructure.]
Power Oversubscription: More Cost-effective Data centers

Power Oversubscription - Opportunity
Total Cost of Ownership / Server

Component                 Share
Facility Space            4.5%
Power Infrastructure      7.9%
Cooling Infrastructure    3.3%
Rest                      11.9%
DC opex                   9.9%
UPS LA                    0.2%
Server Depreciation       40.6%
Server Opex               2.0%
PUE overhead              2.6%
Utility Energy            11.7%
Utility Peak              5.5%

[Barroso et al. + APC TCO calc] Assumptions:
- Server cost: 1500$
- 28000 servers (10MW)
- Energy: 4.7c/kWh
- Power: 12$/kW
- Amort. time: DC 10y, servers 4y
- Distributed LA-based UPS
Available at: http://cseweb.ucsd.edu/~tullsen/DCmodeling.html
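As a rough sanity check on the assumptions above, the short sketch below derives two numbers from them. It assumes straight-line depreciation with no interest (the slide does not state the amortization model) and borrows the 350W per-server max power from the later Assumptions slide. The constant names are made up for illustration.

```python
# Values taken from the assumptions listed on the slide.
SERVER_COST_USD = 1500.0
N_SERVERS = 28_000
SERVER_AMORT_YEARS = 4

# Taken from the later "Assumptions" slide (max power per server).
SERVER_MAX_POWER_W = 350.0

# Assumption: straight-line depreciation over the amortization time, no interest.
server_depreciation_per_month = SERVER_COST_USD / (SERVER_AMORT_YEARS * 12)

# Fleet peak if every server drew its max power at the same time.
fleet_peak_mw = N_SERVERS * SERVER_MAX_POWER_W / 1e6

print(f"Server depreciation: {server_depreciation_per_month:.2f} $/month")
print(f"Fleet peak power: {fleet_peak_mw:.1f} MW")
```

Under these assumptions the fleet peak comes out just under the 10MW quoted on the slide.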

Power Oversubscription using Stored Energy
- Leverage diurnal patterns of web services
- Discharge UPS batteries during high activity (once per day)
- Recharge during low activity (once per day)
[Figure: diurnal power profile (power vs. time, M Tu W ... Su) next to its pulse model (power vs. time) with a peak power pulse and a low power pulse. Power profile shaping with UPS stored energy gives the peak power reduction.]
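The pulse model lends itself to a small numeric illustration. The sketch below clips a one-day power profile at a cap and reports how much energy the batteries must supply during the peak and how much headroom is left under the cap for recharging. The profile values, the cap and the function names are hypothetical and are not taken from the talk.

```python
def shave_peak(profile_w, cap_w, step_h=1.0):
    """Clip a power profile at cap_w.

    Returns the capped profile and the energy (Wh) the batteries must supply
    to cover everything above the cap.
    """
    capped = [min(p, cap_w) for p in profile_w]
    battery_wh = sum(max(p - cap_w, 0.0) for p in profile_w) * step_h
    return capped, battery_wh


def recharge_headroom_wh(profile_w, cap_w, step_h=1.0):
    """Energy (Wh) that could be drawn for recharging without exceeding the cap."""
    return sum(max(cap_w - p, 0.0) for p in profile_w) * step_h


# Hypothetical pulse model for one server and one day, in hourly steps:
# 16 hours of low power followed by an 8-hour peak power pulse.
day = [200.0] * 16 + [350.0] * 8
capped, battery_wh = shave_peak(day, cap_w=300.0)
headroom_wh = recharge_headroom_wh(day, cap_w=300.0)

print(f"Peak after shaving: {max(capped):.0f} W")
print(f"Energy from battery: {battery_wh:.0f} Wh")
print(f"Recharge headroom: {headroom_wh:.0f} Wh")
```

Shaving is only sustainable day after day if the recharge headroom, after charging losses, covers the energy discharged during the peak.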

Centralized UPS
- Used in most small / medium data centers
- Scales poorly
- High losses in AC-DC-AC conversion (5-10%)
- Centralized single point of failure, requires redundancy
Increasingly cost-inefficient for large data centers

Distributed UPS
- Used in large data centers (Facebook, Google)
- Scales with data center size
- Avoids AC-DC-AC conversion
- Distributed points of failure
Cheaper UPS solution

Related work and our proposal
- Centralized UPSs for power capping [Govindan, ISCA 2011]
- Distributed UPSs for rare power emergencies [Govindan, ASPLOS 2012]
Our proposal:
- Provision distributed UPS for peak power capping
- Different battery technology
- Shave power on daily basis
[Figure: power hierarchy with utility and diesel generator, PDUs, and racks with UPS batteries.]
Place more servers under same power infrastructure
Better amortize capex costs

Outline
- Introduction
- Choosing the right battery for power shaving
- Datacenter workload and power modeling
- Policies and results
- Conclusions

Competing Battery Technologies
- Lead Acid (LA)
- Lithium Cobalt Oxide (LCO)
- Lithium Iron Phosphate (LFP)

Metrics
Backup UPS: batteries rarely used (3-4 times per year). Proper metrics:
- Cost: Wh / $
- Size: Volumetric Density (Wh / liter)
Backup + peak shaving UPS: batteries used on daily basis. Proper metrics:
- Charge cycles, Cost: Wh * cycles / $
- Size: Volumetric Density (Wh / liter)
- Recharge speed: % charge / hour

Battery Technology Comparison
- Backup: Lead Acid (cheaper)
- Backup + Peak Shaving: Lithium Iron Phosphate (cost effective)

Battery Capacity-Cost Estimation
Datacenter Shaved Energy
Server level Shaved Energy
• Number of servers
• Power supply efficiency
Capacity of server level battery:
• Battery discharge properties
• DoD
• Lifetime capacity loss
• Size
UPS Cost + UPS Depreciation
• UPS Cost = Bat.Cap. * $/Ah
• UPS depr. = UPS Cost / expected battery life
[Figure: power vs. time pulse annotated with PeakReduction and Peak Duration; LFP and Lead Acid batteries.]
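The estimation chain above can be sketched in a few lines. Only the last two formulas (UPS cost and UPS depreciation) are given explicitly on the slide; the rest is one plausible reading. In particular, the sketch assumes that the datacenter shaved energy is the product PeakReduction * PeakDuration shown on the backup slide, that the battery sits behind the power supply so it delivers wall energy times PSU efficiency, that the battery is a 12 V pack, and that lifetime capacity loss is handled by oversizing. The function name, the parameter names and the battery life used in the example are hypothetical.

```python
def battery_plan(peak_reduction_w, peak_duration_h, n_servers, psu_efficiency,
                 dod, price_per_ah, battery_life_years,
                 battery_voltage=12.0, end_of_life_capacity=0.8):
    """Estimate per-server battery capacity, cost and yearly depreciation."""
    # Datacenter shaved energy: area of the shaved part of the peak pulse.
    dc_shaved_wh = peak_reduction_w * peak_duration_h
    # Assumption: battery is on the DC side of the PSU, so it only needs to
    # deliver the wall-side energy multiplied by the PSU efficiency.
    server_wh = dc_shaved_wh / n_servers * psu_efficiency
    # Assumption: only the DoD fraction of the capacity is used per cycle, and
    # the battery is oversized so it still suffices at end of life.
    capacity_ah = server_wh / (battery_voltage * dod * end_of_life_capacity)
    # From the slide: UPS Cost = Bat.Cap. * $/Ah.
    ups_cost = capacity_ah * price_per_ah
    # From the slide: UPS depr. = UPS Cost / expected battery life.
    ups_depreciation = ups_cost / battery_life_years
    return {
        "server_wh": server_wh,
        "capacity_ah": capacity_ah,
        "ups_cost": ups_cost,
        "ups_depreciation_per_year": ups_depreciation,
    }


# Example: shave 50 W per server for 2 hours across 28K servers with an LFP
# battery at 5$/Ah and 60% DoD. The 8-year battery life is a placeholder.
plan = battery_plan(peak_reduction_w=28_000 * 50.0, peak_duration_h=2.0,
                    n_servers=28_000, psu_efficiency=0.80, dod=0.60,
                    price_per_ah=5.0, battery_life_years=8.0)
for name, value in plan.items():
    print(f"{name}: {value:.2f}")
```

Keeping voltage and end-of-life capacity as parameters makes it easy to swap in values for a different battery technology.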

Assumptions

Number of servers    28K
Server Type          Custom Sun Fire X4270
                     - Intel Xeon (8-core), 8 GB Mem.
                     - Idle Power: 175W
                     - Max Power: 350W
PSU efficiency       80%
Workload             Pulse Model, utilization 50%
Batteries            LFP (5$/Ah), LA (2$/Ah)

TCO savings with peak duration
[Figure: TCO savings vs. peak duration for LA and LFP, with the LA size constraint and the LFP size constraint marked.]
- LFP more space- and energy-efficient than LA, can shave more!
- The more we shave, the more we gain!

TCO savings with battery DoD
[Figure: TCO savings vs. DoD, panels (a) LA and (b) LFP, contrasting high DoD and low DoD when shaving the same energy.]
Sweet DoD spot for TCO savings (LA: 40%, LFP: 60%)

Key points for battery selection
When using batteries for peak power shaving:
- Shave as much power as possible (reasonably sized battery)
- There is a DoD sweet spot, maximizing TCO savings
- LFP better technology because:
  - lots of recharges
  - more efficient discharge
  - higher energy density
  - cheaper in the future
What if:
- Servers with unbalanced load?
- Day-to-day variation in demand?

Workload Modeling
- Whole year traffic data from Google Transparency Report
- Apply weights according to web presence: (Search 29.2%, Social Networking 55.8%, Map Reduce 15%)
- Present results for 3 worst consecutive days (11/17/2010-11/19/2010)

Workload Modeling (cont.)
- Model 1000 machine cluster, with 5 PDUs, 10 racks per PDU, 20 servers (2U) per rack
- We simulate load based on M/M/8 queues and scale inter-arrival time according to workload traffic
[Figure: jobs arrive with a given interarrival time at a scheduler (Round Robin or Load-aware), which dispatches them to servers; each server has 8 cores (consumers) and each job has a service time.]
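To make the M/M/8 modeling step concrete, the sketch below simulates one server as a first-come-first-served queue with 8 cores, exponential inter-arrival times and exponential service times, then scales the arrival rate with a traffic level. It is a minimal illustration, not the simulator used in the talk. The base arrival rate, the traffic levels and the linear idle-to-max power model are assumptions; only the 175W and 350W endpoints come from the Assumptions slide.

```python
import heapq
import random

IDLE_W, MAX_W = 175.0, 350.0  # per-server idle and max power (Assumptions slide)


def simulate_mmc(arrival_rate, service_rate, cores=8, n_jobs=20_000, seed=0):
    """Simulate an M/M/c FIFO queue and return the average core utilization."""
    rng = random.Random(seed)
    free_at = [0.0] * cores          # min-heap: time at which each core is free
    heapq.heapify(free_at)
    now = 0.0
    busy_time = 0.0
    last_finish = 0.0
    for _ in range(n_jobs):
        now += rng.expovariate(arrival_rate)       # exponential inter-arrival
        core_free = heapq.heappop(free_at)         # earliest available core
        start = max(now, core_free)                # wait if all cores are busy
        service = rng.expovariate(service_rate)    # exponential service time
        finish = start + service
        heapq.heappush(free_at, finish)
        busy_time += service
        last_finish = max(last_finish, finish)
    return busy_time / (cores * last_finish)


def server_power(utilization):
    """Assumed linear power model between idle and max power."""
    return IDLE_W + (MAX_W - IDLE_W) * utilization


# Hypothetical base rate: with service_rate=1.0 and 8 cores, 8 jobs per time
# unit would saturate the server, so traffic is expressed as a fraction of it.
BASE_RATE = 8.0
for traffic in (0.25, 0.5, 0.75):
    util = simulate_mmc(arrival_rate=BASE_RATE * traffic, service_rate=1.0)
    print(f"traffic={traffic:.2f} utilization={util:.2f} "
          f"power={server_power(util):.0f} W")
```

Scaling the arrival rate is the same as scaling the mean inter-arrival time by the inverse factor, which is how the slide phrases it.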

Policy goals
- Guarantee power budget at specific level of power hierarchy
- Discharge only during high activity, charge only during low activity
- Effective irrespective of job scheduling
- Make battery usage uniform
[Figure: battery state diagram. States: Available, In Use, Not Available, Recharge. Transition labels: Power over Threshold; Power below Threshold; Reached DoD Goal; (Power + Bat. Recharge Power) below Threshold; Recharge Complete.]
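The battery state diagram can be written down as a small transition function. The extracted slide only preserves the state names and the transition labels, so the pairing of labels with edges below is one reading of the diagram and should be treated as an assumption. The function and parameter names are hypothetical.

```python
from enum import Enum


class BatteryState(Enum):
    AVAILABLE = "available"
    IN_USE = "in use"
    NOT_AVAILABLE = "not available"
    RECHARGE = "recharge"


def next_state(state, power_w, threshold_w, recharge_power_w,
               reached_dod_goal=False, recharge_complete=False):
    """Return the next battery state (assumed edge layout, see lead-in)."""
    if state is BatteryState.AVAILABLE:
        # Power over Threshold: start discharging.
        return BatteryState.IN_USE if power_w > threshold_w else state
    if state is BatteryState.IN_USE:
        # Reached DoD Goal: stop discharging until the battery is recharged.
        if reached_dod_goal:
            return BatteryState.NOT_AVAILABLE
        # Power below Threshold: stop discharging, battery stays usable.
        return BatteryState.AVAILABLE if power_w < threshold_w else state
    if state is BatteryState.NOT_AVAILABLE:
        # Recharge only if the recharge draw still fits under the threshold.
        if power_w + recharge_power_w < threshold_w:
            return BatteryState.RECHARGE
        return state
    if state is BatteryState.RECHARGE:
        # Recharge Complete: battery can be used again.
        return BatteryState.AVAILABLE if recharge_complete else state
    raise ValueError(f"unknown state: {state}")


# Example walk through one discharge and recharge cycle (made-up wattages).
s = BatteryState.AVAILABLE
s = next_state(s, power_w=320, threshold_w=300, recharge_power_w=40)
s = next_state(s, power_w=320, threshold_w=300, recharge_power_w=40,
               reached_dod_goal=True)
s = next_state(s, power_w=200, threshold_w=300, recharge_power_w=40)
s = next_state(s, power_w=200, threshold_w=300, recharge_power_w=40,
               recharge_complete=True)
print(s)
```

The check on power plus recharge power is what keeps recharging from pushing the draw back over the budget.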

Uncoordinated Policy
- Applied at the server level
- Easy to implement
- Runs independently per server
- DoD goal set to 60% of battery capacity (LFP)

Uncoordinated Policy Results
Round Robin Scheduling
- Batteries discharge when not required
- Batteries recharge during peak
- Fails to guarantee budget
[Figure: power over time showing a budget violation.]

Uncoordinated Policy Results (cont.)
Load-aware Scheduling
- Batteries discharge all together (wasteful)
- Recharge all together (violates budget)
- Fails to guarantee budget
[Figure: power over time showing a budget violation.]
Coordination is required!

Coordinated Control
- Applied at higher levels (PDU, Cluster)
- Requires remote battery enable/disable, initiate recharge
- Number of batteries enabled proportional to peak magnitude
- Batteries in use are spatially distributed
[Figure: overall power over Day1, Day2, Day3, annotated with the batteries enabled in server equivalents (0, 100, 200, 300), spread across rack1 and rack2.]
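A minimal sketch of the two coordinated-control ideas, proportional enabling and spatial distribution, follows. It assumes that enabling one battery takes roughly one server's draw off the utility, which is how the "server equivalent" labels in the figure are read here, and it spreads the enabled batteries round-robin across racks. The function names, identifiers and wattages are hypothetical.

```python
import math


def batteries_to_enable(measured_w, budget_w, per_battery_w, n_batteries):
    """Number of batteries to enable, proportional to the excess over budget."""
    excess_w = max(measured_w - budget_w, 0.0)
    return min(math.ceil(excess_w / per_battery_w), n_batteries)


def pick_spatially(available_by_rack, count):
    """Pick `count` batteries round-robin across racks to spread them out."""
    queues = [list(ids) for ids in available_by_rack.values()]
    chosen = []
    while len(chosen) < count and any(queues):
        for queue in queues:
            if queue and len(chosen) < count:
                chosen.append(queue.pop(0))
    return chosen


# Example at PDU level: 600 W over budget, each battery offloads about 250 W.
available = {"rack1": ["r1s1", "r1s2"], "rack2": ["r2s1", "r2s2"]}
n = batteries_to_enable(measured_w=5600.0, budget_w=5000.0,
                        per_battery_w=250.0, n_batteries=4)
print(n, pick_spatially(available, n))
```

Rotating which batteries are picked from day to day would also help with the uniform battery usage goal, though that is not shown here.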

Coordinated Policies
[Figure: power over time under the PDU-level and cluster-level policies.]
- Power cap close to Average power (ideal) of 250W
- Peak power reduction of 19%
- 23% more servers
- 6.2% TCO/server reduction
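The step from a 19% peak power reduction to 23% more servers can be checked with a short calculation. It assumes the power infrastructure budget stays fixed and that each server's provisioned peak drops by the stated fraction, so the number of servers that fit grows by 1 / (1 - reduction). This reading is an assumption and is not spelled out on the slide.

```python
def extra_server_fraction(peak_reduction):
    """Extra servers that fit under a fixed budget after a peak reduction."""
    return 1.0 / (1.0 - peak_reduction) - 1.0


extra = extra_server_fraction(0.19)
print(f"{extra:.1%} more servers")  # about 23%
```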

Discussion: Energy proportionality
- Sharper, thinner peaks
- We can shave more power, with same stored energy
Peak power reduction of up to 37.5% with the 40Ah LFP battery
[Figure: overall power over Day1, Day2, Day3 for energy-proportional servers vs. modern servers.]

Concluding remarks
- Battery provisioning of distributed UPS topologies to cap power and oversubscribe data center is beneficial
- Critical to reconsider battery properties (technology, capacity, DoD)
- Coordination of charges and discharges is required
- We cap peak power by 19%, allow 23% more servers and better amortize capex costs
- Achieve 6.2% reduction in TCO/server ($15M for a 28k server DC)

BACKUP SLIDES

TCO savings with battery cost
- LA is stable technology
- LFP advancements expected, due to electric vehicles
TCO savings increase over time with LFP!

When things go wrong?
Scenario 1: Unexpected daily traffic
- We use the additional 35% capacity in our batteries (DoD optimized for TCO savings at 60%)
Scenario 2: Batteries are not replaced immediately
- With 50% of batteries dead we can still reduce peak by 15%
Grouping battery maintenance/replacement for cost savings possible

Exploration of Dead Batteries

Discussion: DVFS
To DVFS or not DVFS?
- Datacenter SLA violations likely during peak load
- DVFS bad during high demand (potential SLA violation)
- Great during low demand (SLA violation unlikely)
- Creates higher margins for aggressive battery capping
[Figure: overall power over Day1, Day2, Day3, no DVFS vs. with DVFS.]

Battery Capacity-Cost Estimation
[Figure: power vs. time pulse annotated with PeakReduction and Peak Duration; LFP and Lead Acid (~twice volume) batteries.]
Shaved Energy = PeakReduction * PeakDuration

Battery Related Assumptions

Workload partitioning

Distributed Algorithm
