Huazhong University of Science and Technology Evaluating Latency-Sensitive Applications’ Performance Degradation in Datacenters with Restricted Power Budget Song Wu, Chuxiong Yan, Haibao Chen, Hai Jin, Wei Guo, Zhen Wang, Deqing Zou [email protected] The 44th International Conference on Parallel Processing (ICPP-15) Beijing, China, September 1-4, 2015

Huazhong University of Science and Technology

Evaluating Latency-Sensitive Applications’ Performance Degradation in Datacenters with Restricted Power Budget

Song Wu, Chuxiong Yan, Haibao Chen, Hai Jin, Wei Guo, Zhen Wang, Deqing Zou [email protected]

The 44th International Conference on Parallel Processing (ICPP-15) Beijing, China, September 1-4, 2015

Outline

Background

Motivation

Approach

Evaluation

Conclusion

Background

ISPs (Internet Service Providers)

Power budget
▫The reserved space of power for servers

Power margin
▫The part of the power budget that is not consumed by the servers

Background

The solution
▫Restricting power budget

The problem
▫May incur power budget violation

We need an evaluation method to assess the resulting performance degradation

Motivation

State-of-the-art
▫PBV (percentage of budget violation)

▫In these two cases, the performance degradation values given by PBV are both

Cannot reflect the affected percentage of the application

Motivation

State-of-the-art
▫PPL (percentage of performance loss)

Cannot reflect the delay of some parts of latency-sensitive applications

Motivation

Latency-sensitive applications
▫Sensitive to brief variations in response time
▫Common among Internet services

The problem
▫The state-of-the-art methods are too coarse-grained

Our target
▫Design an evaluation method for latency-sensitive applications

Approach

CPU Workload (Workload for short)

The actual CPU utilization is capped at the threshold thrld.

Approach

Workload

▫Workload reflects the part of the application that is affected

Approach

Differential Workload
▫Workload in a very narrow time span

Approach

Functions
▫Delay(t). It is used to express the delay of differential Workload at time t.

Approach

Functions
▫WA(t). It is used to express the accumulated Workload at time t.
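
One way to picture WA(t) on a sampled trace is as a backlog: whenever the demanded utilization exceeds thrld, the excess carries over to the next sample. The sketch below assumes a discrete trace of demanded CPU utilization (in percent, one value per sampling interval); the function name accumulated_workload and the discretization are illustrative assumptions, not taken from the slides.

```python
def accumulated_workload(demand, thrld):
    """Return the backlog (accumulated Workload) after each sample.

    demand: demanded CPU utilization per sampling interval, in percent
            (assumed input format).
    thrld:  CPU utilization cap, in percent.
    """
    backlog = 0.0
    wa = []
    for d in demand:
        # Work available in this interval is the new demand plus the leftover
        # backlog; at most `thrld` of it can be served, the rest carries over.
        backlog = max(0.0, backlog + d - thrld)
        wa.append(backlog)
    return wa
```

For example, with a cap of 60 the trace [50, 90, 30, 20] builds a backlog of 30 after the second sample and clears it in the third.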

Approach

Functions
▫TotalWorkload(t). The amount of total Workload submitted to the server between time 0 and t.
▫DelayedWorkload(t). The summation of delayed differential Workload between time 0 and t.

Approach

Metrics
▫What percentage of the application is delayed? PDW (Percentage of Degraded Workload)
▫What is the average delay of this part of the application? AD (Average Delay)

Approach

Metrics’ expression

▫PDW is the percentage of workload whose delay is greater than 0

▫AD is the workload-delay product divided by the delayed workload
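
One way to write the two metrics in terms of the functions Delay(t), TotalWorkload(t) and DelayedWorkload(t) is given below. The symbol w(t) (the rate at which Workload is submitted at time t) and T (the end of the trace) are assumptions introduced here, so this is a sketch consistent with the verbal definitions rather than the authors' exact notation.

```latex
\begin{align}
\mathrm{TotalWorkload}(T) &= \int_0^T w(t)\,dt \\
\mathrm{DelayedWorkload}(T) &= \int_0^T w(t)\,\mathbf{1}[\mathrm{Delay}(t) > 0]\,dt \\
\mathrm{PDW} &= \frac{\mathrm{DelayedWorkload}(T)}{\mathrm{TotalWorkload}(T)} \\
\mathrm{AD} &= \frac{\int_0^T w(t)\,\mathrm{Delay}(t)\,dt}{\mathrm{DelayedWorkload}(T)}
\end{align}
```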

Approach

The algorithm
▫Design an algorithm based on CPU utilization trace
▫Obtain the result in O(n) time
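
A minimal sketch of how PDW and AD could be computed from a CPU utilization trace in a single pass. It is not the authors' algorithm: it assumes a discrete trace of demanded utilization (percent per sampling interval), first-in-first-out service of delayed Workload, and delay measured in whole sampling intervals. The name pdw_and_ad is hypothetical.

```python
from collections import deque


def pdw_and_ad(demand, thrld):
    """Return (PDW, AD) for a demanded-utilization trace capped at thrld.

    PDW is a fraction in [0, 1]; AD is in sampling intervals.
    Assumes FIFO service of pending Workload (an illustrative model).
    """
    if thrld <= 0:
        raise ValueError("thrld must be positive")
    queue = deque()        # pending chunks: [arrival_index, remaining_amount]
    total = 0.0            # all Workload submitted
    delayed = 0.0          # Workload served later than its arrival interval
    weighted_delay = 0.0   # sum of (amount * delay) over delayed Workload
    i, n = 0, len(demand)
    # Keep stepping after the trace ends until the backlog is drained.
    while i < n or queue:
        if i < n and demand[i] > 0:
            queue.append([i, float(demand[i])])
            total += demand[i]
        capacity = float(thrld)
        while queue and capacity > 0:
            arrival, amount = queue[0]
            served = min(amount, capacity)
            capacity -= served
            delay = i - arrival
            if delay > 0:
                delayed += served
                weighted_delay += served * delay
            if served == amount:
                queue.popleft()
            else:
                queue[0][1] = amount - served
        i += 1
    pdw = delayed / total if total else 0.0
    ad = weighted_delay / delayed if delayed else 0.0
    return pdw, ad
```

Each chunk is enqueued once and every interval does at most one partial serve, so the work is linear in the trace length plus the time needed to drain any backlog left at the end.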

Approach

Use Case in Datacenter

Transformation Map + CPU trace → PDW & AD under different budgets

→ The decision of power budget for all servers
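
The flow on this slide can be sketched as a search over candidate budgets. The sketch assumes the Transformation Map is available as a dictionary from power budget to the CPU utilization threshold it implies, and that some function returning (PDW, AD) for a trace and a threshold is supplied by the caller. The names choose_power_budget, transformation_map and evaluate, as well as the acceptance limits, are hypothetical.

```python
def choose_power_budget(trace, transformation_map, evaluate, max_pdw, max_ad):
    """Return the smallest budget whose PDW and AD stay within the limits.

    transformation_map: dict mapping a power budget (e.g. in watts) to the
                        CPU utilization threshold it implies (assumed format).
    evaluate:           callable (trace, thrld) -> (pdw, ad), supplied by
                        the caller.
    Returns None if no candidate budget satisfies both limits.
    """
    for budget in sorted(transformation_map):
        pdw, ad = evaluate(trace, transformation_map[budget])
        if pdw <= max_pdw and ad <= max_ad:
            return budget
    return None
```

Scanning budgets in ascending order returns the tightest budget that still meets the chosen degradation limits for that server.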

Evaluation

The accuracy of methods

A synthetic CPU trace covering the range from 0% to 100%

Evaluation

The accuracy of methods

The average differences of PDW and AD are 2.8% and 3.4%, respectively

Evaluation

The accuracy of methods

The average differences of PBV and PPL are 34.9% and 86.3%, respectively

Evaluation

The accuracy of methods

A real trace from WorldCup98

Evaluation

The accuracy of methods

The average differences of PDW and AD are 3.3% and 7.5%, respectively

Evaluation

The accuracy of methods

The average differences of PBV and PPL are 49.6% and 95.8%, respectively

Summary:
• PDW and AD can accurately evaluate the performance degradation, but PBV and PPL cannot.
• A fluctuating CPU trace may lead to larger differences.

Evaluation

Typical servers

We choose 9 servers in Tencent’s datacenter according to their application types and load

Evaluation

Typical servers

PDW and AD increase as the CPU utilization threshold is lowered;

Lightly loaded servers leave more room for reducing the power budget;

There could be a maximum-benefit point.

Evaluation

Evaluating in datacenter

Evaluate the room for saving power budget across about 25000 servers

About 1/3 of the power budget can be saved with almost no performance degradation

Conclusion

The state-of-the-art
▫Inaccurate for latency-sensitive applications

Our evaluation method
▫Two metrics (PDW and AD)
▫A fine-grained method

Experimental result
▫Our evaluation method is more accurate
▫Substantial space in power budget restriction

Huazhong University of Science and Technology

Thank you!

For any questions, please contact [email protected]

Approach

The derivation process

We can obtain the result of PDW & AD by simultaneous equations