Thermal Aware Resource Management Framework Xi He, Gregor von Laszewski, Lizhe Wang Golisano College of Computing and Information Sciences Rochester Institute

Thermal Aware Resource Management Framework

Xi He, Gregor von Laszewski, Lizhe Wang

Golisano College of Computing and Information Sciences

Rochester Institute of Technology

Rochester, NY [email protected]

Outline

• Introduction
• Motivation
• Thermal-aware Resource Management Framework
• Motivational Examples
• System Model and Problem Definition
• Thermal-aware Task Scheduling Algorithm
• Conclusion

Introduction

Distributed Collaborative Experiment

Introduction

• US data centers consumed 61 billion kilowatt-hours of power in 2006, 1.5 percent of all US electricity use, costing around $4.5 billion [1].
• Energy usage doubled between 2000 and 2006.
• Energy usage is projected to double again by 2011 [1].

• [1] http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf

• Hardware Level: Dynamic Voltage Scaling, Dynamic Frequency Scaling
• Software Level: Virtualization
• Middleware Level: Job Scheduling, Virtual Machine Scheduling
• Data Center Level: Cooling System

Motivation

• Why a thermal-aware resource management framework?
– To allow end users to collaborate easily with each other and to access remote resources.
– To implement Green Computing.
– To monitor temperatures in the data center.

Architecture Overview

Different types of task-temperature profiles

Motivational Examples

Task-temperature profile (Buffalo Data Center)


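Jobs: job1 = (0, 2, 20, f(job1)), job2 = (0, 1, 40, f(job2))

Initial node temperatures: node1 = 40 °C, node2 = 32 °C, node3 = 34 °C, node4 = 32 °C

Schedule A: job1 on node2 and node4, job2 on node3
Resulting temperatures: node1 = 40 °C, node2 = 40 °C, node3 = 40 °C, node4 = 40 °C (max = 40 °C, σ = 0)

Schedule B: job1 on node1 and node2, job2 on node3
Resulting temperatures: node1 = 48 °C, node2 = 40 °C, node3 = 40 °C, node4 = 32 °C (max = 48 °C, σ = 5.6)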
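The two outcomes in this example can be checked with a few lines of Python. This is a minimal sketch that assumes σ is the population standard deviation of the four node temperatures; the names `balanced` and `unbalanced` are illustrative labels, not terms from the slides.

```python
from statistics import pstdev

# Final node temperatures (in degrees C) for the two schedules in the example.
balanced = [40, 40, 40, 40]    # job1 on node2 and node4, job2 on node3
unbalanced = [48, 40, 40, 32]  # job1 on node1 and node2, job2 on node3


def summarize(temps):
    """Return (maximum temperature, population standard deviation)."""
    return max(temps), pstdev(temps)


for label, temps in (("balanced", balanced), ("unbalanced", unbalanced)):
    peak, sigma = summarize(temps)
    print(f"{label}: max={peak} C, sigma={sigma:.2f}")
```

Under that assumption the unbalanced schedule gives σ of about 5.66, which the slide reports as 5.6, while the balanced schedule gives σ = 0.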


System Model

• node_i denotes the i-th node in the data center. Each node has a temperature-time profile that gives the node’s temperature over time.

System Model

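• A job is described as job = (t_start, node_num, t_exe, f_temp(t)), where t_start is the starting time of the job, node_num is the number of processors the job needs, t_exe is how long the job runs, and f_temp(t) is the temperature change caused by running the job, as a function of its execution time.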
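The four job fields described above map naturally onto a small record type. The sketch below is one possible representation, not the authors' implementation: the class name `Job`, the helper `peak_rise`, and the linear profile `linear_profile` are all hypothetical, and the 0.4 degrees-per-second slope is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    t_start: float                    # starting time of the job
    node_num: int                     # number of processors the job needs
    t_exe: float                      # execution time of the job
    f_temp: Callable[[float], float]  # temperature rise after t seconds of execution


def linear_profile(t: float) -> float:
    """Hypothetical task-temperature profile: 0.4 degrees per second."""
    return 0.4 * t


def peak_rise(job: Job) -> float:
    """Temperature rise at the end of the job, read from its profile."""
    return job.f_temp(job.t_exe)


# Same shape as job1 = (0, 2, 20, f(job1)) in the motivational example.
job1 = Job(t_start=0, node_num=2, t_exe=20, f_temp=linear_profile)
print(peak_rise(job1))
```

Keeping the profile as a callable lets measured profiles be swapped in without changing the job record.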

Problem Definition

• Given a set of jobs, find an optimal schedule that assigns each job to nodes so as to minimize the temperature deviation across the computing nodes.
• ΔTemp is the temperature increase that job_k causes.

Problem Definition

• We use the standard deviation of the node temperatures as the metric for measuring the temperature distribution.
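One way to write this metric, assuming the population form of the standard deviation over n node temperatures (the symbols n, Temp_i, and the mean are notation introduced here for illustration):

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathit{Temp}_i - \overline{\mathit{Temp}}\right)^2},
\qquad
\overline{\mathit{Temp}} = \frac{1}{n}\sum_{i=1}^{n} \mathit{Temp}_i
```

The population form is consistent with the motivational example: temperatures of 48, 40, 40, and 32 have mean 40 and σ = √32 ≈ 5.66, reported there as 5.6.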

Algorithm


1. Select the node which has the lowest “current” temperature.
2. Sort jobs in descending order of the temperature rise they cause.
3. For each job
4. Assign the job to the selected node.
5. Update the node’s temperature-time profile.
6. Select the node which has the lowest “current” temperature.
7. End For
8. If a node’s temperature exceeds the threshold, do not choose it in the next round and let it cool down.
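The steps above can be sketched as a short greedy routine. This is a simplified illustration under stated assumptions rather than the authors' implementation: each node's temperature-time profile is collapsed to a single current temperature, cooling is not modeled, a job that needs several nodes raises each of them by the same amount, and the tuple layout, the function name `thermal_aware_schedule`, the temperature rises of 8 and 6, and the threshold of 45 are all hypothetical.

```python
def thermal_aware_schedule(node_temps, jobs, threshold):
    """Greedy sketch: hottest jobs go to the coolest nodes.

    node_temps: dict mapping node name -> current temperature
    jobs: list of (job_name, node_num, temp_rise) tuples (hypothetical layout)
    threshold: nodes hotter than this are skipped so they can cool down
    """
    temps = dict(node_temps)  # work on a copy
    assignment = []
    # Step 2: sort jobs by the temperature rise they cause, largest first.
    for job_name, node_num, temp_rise in sorted(jobs, key=lambda j: j[2], reverse=True):
        # Step 8: nodes above the threshold are not candidates this round.
        candidates = [n for n in temps if temps[n] <= threshold]
        if len(candidates) < node_num:
            raise RuntimeError(f"not enough cool nodes for {job_name}")
        # Steps 1 and 6: take the nodes with the lowest current temperature.
        for node in sorted(candidates, key=lambda n: temps[n])[:node_num]:
            assignment.append((job_name, node))  # Step 4
            temps[node] += temp_rise             # Step 5, as a single value
    return assignment, temps


# Initial temperatures from the motivational example; rises are assumed.
nodes = {"node1": 40, "node2": 32, "node3": 34, "node4": 32}
jobs = [("job1", 2, 8), ("job2", 1, 6)]
assignment, final_temps = thermal_aware_schedule(nodes, jobs, threshold=45)
print(assignment)
print(final_temps)
```

With these assumed rises, the sketch places job1 on node2 and node4 and job2 on node3, leaving every node at 40, which matches the balanced schedule in the motivational example.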

Experiment

[Figure: Task temperature profile. X axis: Execution Time (s), 0 to 180. Y axis: Temperature, 0 to 16. Series1 is shown with two fitted trend lines.]

Logarithmic fit: f(x) = 6.17136207851786 ln(x) − 16.980854076871

Polynomial fit: f(x) = −0.000488906926406926 x² + 0.169975108225108 x − 0.543030303030302
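The two trend lines fitted to the task temperature profile can be evaluated directly. The sketch below copies the coefficients from the chart; the function names `log_fit` and `poly_fit` are illustrative, and the sample execution times are arbitrary points within the plotted range.

```python
import math


def log_fit(x: float) -> float:
    """Logarithmic trend line from the chart (x is execution time in seconds)."""
    return 6.17136207851786 * math.log(x) - 16.980854076871


def poly_fit(x: float) -> float:
    """Second-order polynomial trend line from the chart."""
    return -0.000488906926406926 * x**2 + 0.169975108225108 * x - 0.543030303030302


# Compare the two fits at a few execution times within the plotted range.
for seconds in (20, 60, 100, 160):
    print(seconds, round(log_fit(seconds), 2), round(poly_fit(seconds), 2))
```

Either function could serve as the f_temp(t) term of a job when a measured profile is not available, although which fit is preferable depends on the range of execution times.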

Experiment

[Figure: iCore7 cooling profile. X axis: Time (s), 0 to 140. Y axis: Temperature, 62 to 80. Series1 is shown with a polynomial trend line.]

Result

Configuration    σ (thermal-aware task scheduling)    σ (random task scheduling)
N=10, M=30       6.2                                  13.4
N=20, M=30       5.3                                  11.1
N=20, M=40       7.3                                  16.5

N indicates the number of job groups. M indicates the number of jobs in each group.
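To put these figures in relative terms, the reduction in σ can be computed from the reported values. This is a small arithmetic sketch; the dictionary layout and the function name `reduction_percent` are illustrative.

```python
# (sigma with thermal-aware scheduling, sigma with random scheduling)
results = {
    "N=10, M=30": (6.2, 13.4),
    "N=20, M=30": (5.3, 11.1),
    "N=20, M=40": (7.3, 16.5),
}


def reduction_percent(thermal: float, random_sigma: float) -> float:
    """Relative reduction in sigma compared with random scheduling, in percent."""
    return (random_sigma - thermal) / random_sigma * 100


for config, (thermal, random_sigma) in results.items():
    print(f"{config}: {reduction_percent(thermal, random_sigma):.1f}% lower sigma")
```

For all three configurations the thermal-aware schedule lowers σ by somewhat more than half relative to random scheduling.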

Related Work

• In [1], [2], power reduction is achieved through power-aware task scheduling on DVS-enabled commodity systems, which can adjust the supply voltage and support multiple operating points.

• [1] K. H. Kim, R. Buyya, and J. Kim, “Power aware scheduling of bag-of-tasks applications with deadline constraints on DVS-enabled clusters,” in CCGRID, 2007, pp. 541–548.
• [2] R. Ge, X. Feng, and K. W. Cameron, “Performance-constrained distributed DVS scheduling for scientific applications on power-aware clusters,” in SC, 2005, p. 34.

Related Work

• In [3], [4], a thermodynamic formulation of steady-state hot spots and cold spots in data centers is examined, and several task scheduling algorithms based on that formulation are presented to reduce cooling energy consumption.

• [3] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, “Thermal-aware task scheduling for data centers through minimizing heat recirculation,” in CLUSTER, 2007, pp. 129–138.
• [4] J. D. Moore, J. S. Chase, P. Ranganathan, and R. K. Sharma, “Making scheduling “cool”: Temperature-aware workload placement in data centers,” in USENIX Annual Technical Conference, General Track, 2005, pp. 61–75.

CONCLUSION

My accomplishments in the research:
• Literature review of Grid computing and Cloud computing
• Analytical study of the Buffalo data center operation
• Literature review of scheduling algorithms

Conclusion

• A novel framework to solve the resource management problem.
• A thermal-aware task scheduling algorithm for data centers, which can reduce cooling energy costs.
• Future work
– Investigate other thermal characteristics of data centers.
– Continue the development of the thermal-aware resource management framework.

PUBLICATION

G. von Laszewski, F. Wang, A. Younge, X. He, Z. Guo, and M. Pierce, “Cyberaide javascript: A javascript commodity grid kit,” in GCE08 at SC’08. Austin, TX: IEEE, Nov. 16 2008. [Online]. Available: http://cyberaide.googlecode.com/svn/trunk/papers/08-javascript/vonLaszewski-08-javascript.pdf

G. von Laszewski, A. Younge, X. He, K. Mahinthakumar, and L. Wang, “Experiment and workflow management using cyberaide shell,” in 4th International Workshop on Workflow Systems in e-Science (WSES 09) in conjunction with 9th IEEE International Symposium on Cluster Computing and the Grid. IEEE, 2009.

http://cyberaide.googlecode.com/svn/trunk/papers/

Appendix
