Thermal Modeling and Management of Cluster Storage Systems, Xunfei Jiang, 2014
Thermal Modeling and Management of Cluster Storage Systems
Xunfei Jiang. Advisor: Xiao Qin. Department of Computer Science and Software Engineering, Auburn University, Auburn, AL.
Download the dissertation at: http://www.eng.auburn.edu/~xqin/theses/PhD-Jiang-Thermal_Model.pdf
The abstract of this dissertation can be found at: https://etd.auburn.edu/handle/10415/4202
Thermal Modeling and Management of Storage Systems
Author: Jiang, Xunfei
Abstract: The energy consumption of data storage systems has increased significantly over the past decades, and there is an urgent need to build energy-efficient data storage systems. The computing cost of IT facilities and the cooling cost of air conditioners account for a large portion of the total energy consumption of data centers. Many researchers focus on reducing the computing cost by balancing workloads or powering off idle data nodes. In recent years, growing attention has been paid to decreasing the cooling cost. Temperature is a major contributor to cooling cost, and thermal management has become a popular topic in building energy-efficient data centers. The thermal impacts of processors and memories have been studied extensively in the literature; the thermal impacts of disks, however, have not been fully investigated. In this dissertation, experiments are conducted to characterize the thermal behavior of processors and disks by using real-world benchmarks (e.g., PostMark and Whetstone). The profiling results show that the thermal impact of disks on the overall temperature of a data node is comparable to that of processors. Then, we develop an approach to generate thermal models for estimating the temperatures of processors, disks, and data nodes. We validate the thermal models by comparing their predictions with real measurements from temperature sensors deployed on data nodes. We further propose an energy model to estimate the total energy cost of data nodes. Finally, by applying our thermal and energy models, we propose thermal management strategies for building energy-efficient data centers. These strategies include a thermal-aware task scheduling strategy, thermal-aware data placement strategies for homogeneous and hybrid storage clusters, and a predictive thermal-aware data transmission strategy.
Citation preview
1. Thermal Modeling and Management of Cluster Storage Systems
Xunfei Jiang. Advisor: Xiao Qin. Department of Computer Science and Software Engineering, Auburn University, Auburn, AL.
Download the dissertation at: http://www.eng.auburn.edu/~xqin/theses/PhD-Jiang-Thermal_Model.pdf
The abstract of this dissertation can be found at: https://etd.auburn.edu/handle/10415/4202
2. Outline
Motivation; Related Work; Thermal Modeling; Thermal Management; Results
3. Motivation
On Facebook, 500 terabytes of new data are added every day.
High definition: 1 h -> 1 GB, 144 TB/day
Standard definition: 1 h -> 1/3 GB, 48 TB/day
As of 2011, Emerson reported 509,147 data centers.
4. Motivation (cont.) Energy Consumption of Data Centers
5. Motivation (cont.) Cooling Cost vs. Outlet Temperature
What contributes to outlet temperatures? CPU, disk, motherboard, memory.
Ways to reduce them: power down memory units; change inter-seek time or seek time; reduce CPU frequency.
6. Motivation (cont.) Why study the thermal impact of disks?
Teradata equipment
7. Motivation (cont.) How to reduce energy cost
Diagram labels: Data transfer, Energy Predictor, Best Solution
8. Related Work: Reduce Energy Cost
Computing cost: power off data nodes; workload distribution.
Cooling cost: reduce the outlet temperatures of data nodes; balance temperature distribution; optimize air recirculation.
9. Related Work (cont.) Disk Temperature Models
$dQ/dt = h A \Delta T$
dQ/dt: heat transfer rate; h: heat transfer coefficient; A: area; $\Delta T$: temperature difference
Steady temperature: 45.22 °C; 48 minutes; external temperature: 28 °C
Source: Sudhanva Gurumurthi, Anand Sivasubramaniam, and Vivek K. Natarajan, Disk drive roadmap from the thermal perspective: A case for dynamic thermal management. ISCA05. (The Pennsylvania State University)
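As a rough illustration of the heat-transfer relation on this slide, the sketch below evaluates dQ/dt = h * A * (temperature difference) for the steady and external temperatures quoted above. The heat transfer coefficient and surface area are hypothetical placeholders, not values from the slides or the cited paper.

```python
def heat_transfer_rate(h, area, delta_t):
    """Newton's law of cooling: dQ/dt = h * A * delta_T, in watts."""
    return h * area * delta_t

# Temperatures quoted on the slide.
STEADY_C = 45.22
EXTERNAL_C = 28.0

# Hypothetical placeholders (assumptions, not from the source).
H_W_PER_M2_K = 10.0   # heat transfer coefficient
AREA_M2 = 0.03        # exposed surface area

rate_w = heat_transfer_rate(H_W_PER_M2_K, AREA_M2, STEADY_C - EXTERNAL_C)
print(f"dQ/dt = {rate_w:.3f} W")
```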
10. Related Work (cont.) Disk Temperature Models
Periods: seek time, inter-seek time
Timeline labels: idle, acceleration, coast, deceleration, data transfer
Source: Youngjae Kim, S. Gurumurthi, and A. Sivasubramaniam. Understanding the performance-temperature interactions in disk I/O of server workloads. HPCA2006. (The Pennsylvania State University)
11. Thermal Modeling: Testbed
Node configuration
CPU: Intel(R) Celeron(R) [email protected]
Network: 1 Gigabit Ethernet network card
Disk: WD-500GB SATA disk
Operating system: Ubuntu 10.04 (lucid), Linux kernel 2.6.32-43
Software and tools: exterior temperature sensors and Minigoose; iostat and sensors
12. Model Disk Temperature
Previous disk temperature models: based on heat transfer and disk activity; the data are hard to collect.
Our new method: based on disk utilization; the data are easy to collect.
Disk utilization: percentage of CPU time during which I/O requests were issued to the disk.
13. Model Disk Temperature (cont.) Impact of Transactions on Disks
Benchmark: Postmark
Experiment            1                2                3
File number           100              100              100
File size             1.E+6 - 1.E+8    1.E+6 - 1.E+8    1.E+6 - 1.E+8
Transaction numbers   1000             2000             5000
Task configuration: buffering defaults to true (use buffered stdio functions instead of lower-level raw system calls)
14. Model Disk Temperature (cont.) Impact of Transactions on Disks
Preliminary results (disk utilization 100%)
[Figure: disk temperature (°C, 27.4 to 29) vs. time (min, 0 to 150) for 1000, 2000, and 5000 transactions, with heat-up, steady, and cool-down stages marked]
15. Model Disk Temperature (cont.) Heat Up Stage
Polynomial model: $T_{disk}(t) = a t^2 + b t + c$
Logarithmic model: $T_{disk}(t) = a \ln(t) + b$
where a, b, and c are fitted coefficients.
[Figure: real measurement, polynomial fit, and logarithmic fit; temperature (°C, 27.5 to 28.9) vs. time (min, 0 to 30)]
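The slides do not say how the curves were fitted. A minimal sketch, assuming ordinary least squares, is shown here for the logarithmic form T(t) = a * ln(t) + b: substituting x = ln(t) turns the fit into a straight-line regression. The function names and the sample data are illustrative; the samples are synthetic, generated from a known curve, and are not measurements from the testbed.

```python
import math

def fit_logarithmic(times, temps):
    """Least-squares fit of T(t) = a * ln(t) + b. Returns (a, b)."""
    xs = [math.log(t) for t in times]          # linearize: x = ln(t)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, temps))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mean_error_percent(times, temps, a, b):
    """Mean absolute relative error of the fitted curve, in percent."""
    errors = [abs(a * math.log(t) + b - y) / y for t, y in zip(times, temps)]
    return 100.0 * sum(errors) / len(errors)

# Synthetic samples (NOT measurements): one reading per minute for 30 minutes,
# generated from a known curve so the fit can be checked.
times_min = list(range(1, 31))
temps_c = [0.25 * math.log(t) + 27.7 for t in times_min]

a, b = fit_logarithmic(times_min, temps_c)
print(f"a = {a:.4f}, b = {b:.4f}")
print(f"mean error = {mean_error_percent(times_min, temps_c, a, b):.6f} %")
```

Time must be strictly positive for ln(t), so the samples start at minute 1 rather than minute 0.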
16. Model Disk Temperature (cont.)
Steady stage: $T_{disk} = 28.7$ (°C)
Cool-down stage: the same process as the heat-up stage
17. Impact of Disk Utilization: Various Utilizations
Buffering set to false (lower-level raw system calls); various write block sizes
[Figure: temperature (°C) and utilization (%) vs. write block size (16, 32, 64, 128 KB). Data labels: temperature 28.37, 28.87, 28.89, 29.11; utilization 14.24, 28.9, 53.49, 80.57]
18. Impact of Disk Utilization
Parameters for Models of Disk Temperature under Various Utilizations
Utilization   Polynomial fit                          Logarithmic fit
(%)           t^2       t        const.    Err(%)     ln(t)    const.   Err(%)
14            -0.0018   0.0486   27.6310   1.15       0.2130   27.497   0.18
29            -0.0007   0.0392   27.5599   0.17       0.1983   27.559   0.19
53            -0.0001   0.0257   27.6838   0.27       0.1918   27.733   0.17
80            -0.0018   0.0958   27.4431   0.61       0.2382   27.526   0.21
100           -0.0029   0.1085   27.6833   0.27       0.2733   27.758   0.43
On average, logarithmic models fit the disk temperatures better than polynomial models.
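To make the parameter table on this slide concrete, the sketch below evaluates both model forms with the tabulated coefficients. It assumes time is measured in minutes (matching the axis of the heat-up plot) and that the columns are ordered as quadratic, linear, and constant terms for the polynomial fit and as ln(t) coefficient and constant for the logarithmic fit; the column headers did not survive in the transcript, so that ordering is an assumption.

```python
import math

# Coefficients copied from the table, keyed by disk utilization (%).
POLY = {
    14: (-0.0018, 0.0486, 27.6310),
    29: (-0.0007, 0.0392, 27.5599),
    53: (-0.0001, 0.0257, 27.6838),
    80: (-0.0018, 0.0958, 27.4431),
    100: (-0.0029, 0.1085, 27.6833),
}
LOG = {
    14: (0.2130, 27.497),
    29: (0.1983, 27.559),
    53: (0.1918, 27.733),
    80: (0.2382, 27.526),
    100: (0.2733, 27.758),
}

def poly_temp(util, t_min):
    """Polynomial model: a * t^2 + b * t + c."""
    a, b, c = POLY[util]
    return a * t_min ** 2 + b * t_min + c

def log_temp(util, t_min):
    """Logarithmic model: a * ln(t) + b (t must be positive)."""
    a, b = LOG[util]
    return a * math.log(t_min) + b

for util in sorted(LOG):
    print(util, round(poly_temp(util, 15), 2), round(log_temp(util, 15), 2))
```

With these assumptions, the 100% logarithmic model evaluated at 30 minutes lands close to the 28.7 steady-stage value quoted earlier in the deck, which is a useful sanity check on the column ordering.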
19. Generate Disk Model for a Specific Utilization
WordCount: average disk utilization 18.60%
[Figure: temperature (°C, 27 to 29) vs. time (min, 0 to 30); curves for utilizations 14.24%, 28.92%, 53.49%, 80.57%, and 100%, plus the 18.60% logarithmic fit against the real values]
Precision error: 0.48%
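The slides do not state how the model for an unprofiled utilization such as 18.60% is derived. One plausible approach, sketched here purely as an assumption, is to interpolate the logarithmic coefficients linearly between the two nearest profiled utilizations. Pairing the measured utilizations on this slide with the rounded rows of the parameter table is also an assumption.

```python
import math

# (utilization %, a, b) for T(t) = a * ln(t) + b.
# Utilizations from this slide, coefficients from the parameter table.
LOG_MODELS = [
    (14.24, 0.2130, 27.497),
    (28.92, 0.1983, 27.559),
    (53.49, 0.1918, 27.733),
    (80.57, 0.2382, 27.526),
    (100.0, 0.2733, 27.758),
]

def interpolate_model(util):
    """Linearly interpolate (a, b) between neighboring profiled utilizations."""
    for (u0, a0, b0), (u1, a1, b1) in zip(LOG_MODELS, LOG_MODELS[1:]):
        if u0 <= util <= u1:
            w = (util - u0) / (u1 - u0)
            return a0 + w * (a1 - a0), b0 + w * (b1 - b0)
    raise ValueError("utilization outside the profiled range")

def predict(util, t_min):
    """Predicted disk temperature after t_min minutes (t_min > 0)."""
    a, b = interpolate_model(util)
    return a * math.log(t_min) + b

a, b = interpolate_model(18.60)
print(f"a = {a:.4f}, b = {b:.4f}, T(30 min) = {predict(18.60, 30):.2f}")
```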
20. Impact of Disks on Outlet Temperature: Multiple Disks
Inlet/outlet temperature difference (°C)
# of disks   Initial difference   Peak difference
1            2.45                 2.95
2            2.8                  3.06
3            2.9                  3.38
4            3.38                 3.75
[Figure: temperature difference (°C, 2 to 4) vs. time (min, 0 to 120) for 1, 2, 3, and 4 disks; annotation: increase 0.3 °C]
21. Thermal Profiling of CPU: Impact of CPU on CPU Temperature
Whetstone: floating-point benchmark with small modifications; change the number of LOOPs to vary the CPU utilization.
Interior sensor: monitors CPU temperature.
22. Thermal Profiling of CPU (cont.)
[Figure: min, average, and max CPU temperature (°C, 30 to 60) vs. LOOPS (4000, 8000, 10000, 11900, 11950, 12000)]
[Figure: CPU utilization (%, 0 to 100) vs. LOOPS (4000, 8000, 10000, 11900, 11950, 12000)]
23. Modeling CPU Temperature
Polynomial model: $T_{CPU}(t) = a t^2 + b t + c$
Logarithmic model: $T_{CPU}(t) = a \ln(t) + b$
where a, b, and c are fitted coefficients.
[Figure: real measurement, polynomial fit, and logarithmic fit; temperature (°C, 40 to 52) vs. time (sec, 0 to 600)]
24. CPU Temperature Models
Parameters for Models of CPU Temperature under Various Utilizations
Utilization   Polynomial fit                        Logarithmic fit
(%)           t^2      t        const.   Error(%)   ln(t)   const.   Error(%)
13.8          -9E-06   0.007    40.41    2.25       0.23    40.06    2.52
26.7          -1E-05   0.0011   41.57    6.48       0.57    40.01    2.37
33.1          -1E-06   0.0005   42.01    2.32       0.52    40.20    2.15
65.2          -1E-06   0.004    45.17    4.11       1.10    40.02    3.93
77.9          -2E-05   0.0016   44.96    2.58       1.13    41.55    2.34
90.5          -1E-05   0.0014   45.74    1.87       1.16    42.11    1.31
On average, the logarithmic models are more precise than the polynomial models.
25. Outlet Temperature Model
$T_{outlet} = a + b \cdot T_{CPU} + c \cdot T_{disk} + T_{inlet}$
[Figure: real measurement vs. estimated value; temperature (°C, 24.8 to 26.2) vs. time (min, 0 to 70)]
Precision error: 0.5%
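The outlet model on this slide is a linear combination of component temperatures. The sketch below writes it as a function, following the formula as it appears on the slide, and adds one possible definition of precision error (mean absolute relative error). The coefficient values, the sample readings, and that error definition are hypothetical; the slides give none of them.

```python
def outlet_temperature(t_cpu, t_disk, t_inlet, a, b, c):
    """T_outlet = a + b * T_CPU + c * T_disk + T_inlet, as written on the slide."""
    return a + b * t_cpu + c * t_disk + t_inlet

def precision_error_percent(measured, estimated):
    """Mean absolute relative error in percent (assumed definition)."""
    pairs = list(zip(measured, estimated))
    return 100.0 * sum(abs(e - m) / m for m, e in pairs) / len(pairs)

# Hypothetical coefficients and readings, for illustration only.
A, B, C = -1.0, 0.02, 0.05
estimate = outlet_temperature(t_cpu=45.0, t_disk=28.7, t_inlet=24.0, a=A, b=B, c=C)
print(f"estimated outlet temperature: {estimate:.3f} C")
print(f"error vs. a 25.5 C reading: {precision_error_percent([25.5], [estimate]):.2f} %")
```

In practice the coefficients would be obtained by regressing measured outlet temperatures on the measured CPU, disk, and inlet temperatures.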
26. Energy Prediction Model
Components: Performance Model, Computing Cost Model, Thermal Model, COP Model
Total energy cost: computing cost + cooling cost
Diagram labels: Workload, Utilization, Utilization & Execution Time, Tinlet
27. Energy Prediction Model (cont.)
Performance Model: utilizations of CPU and disks; execution time
Computing Cost Model: [equation not recoverable from the transcript]
28. Energy Prediction Model (cont.) COP (Coefficient of Performance) Model
$COP(T) = P_C / P_{AC}$
$COP(T) = 0.0068 T^2 + 0.0008 T + 0.458$
[Figure: coefficient of performance (heat removed / work, 0 to 8) vs. CRAC supply temperature (°C, 10 to 30)]
Source: Justin Moore, Jeff Chase, Parthasarathy Ranganathan, and Ratnesh Sharma. Making scheduling "cool": temperature-aware workload placement in data centers. ATEC05
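The two COP relations on this slide combine into a cooling-cost estimate: since COP = P_C / P_AC, the cooling power is P_AC = P_C / COP(T). The sketch below works this through, assuming total power is the sum of computing and cooling power. The 100 W computing load and the supply temperatures are hypothetical inputs.

```python
def cop(supply_temp_c):
    """COP curve quoted on the slide (from Moore et al.)."""
    return 0.0068 * supply_temp_c ** 2 + 0.0008 * supply_temp_c + 0.458

def cooling_power(computing_power_w, supply_temp_c):
    """COP = P_C / P_AC, so P_AC = P_C / COP(T)."""
    return computing_power_w / cop(supply_temp_c)

def total_power(computing_power_w, supply_temp_c):
    """Assumed total: computing power plus cooling power."""
    return computing_power_w + cooling_power(computing_power_w, supply_temp_c)

# Hypothetical 100 W computing load at several CRAC supply temperatures.
for temp_c in (15, 20, 25):
    print(temp_c, round(cop(temp_c), 3), round(total_power(100.0, temp_c), 1))
```

Because COP grows with supply temperature, the same computing load costs less to cool when the CRAC can supply warmer air, which is why lowering outlet temperatures reduces the cooling cost.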
29. PTMS: Predictive Thermal Management System
Data transmission between Node 1 and Node 2: Direct Transmission, Archive Transmission, Compressed Transmission
How will data transmission methods affect energy?
30. Preliminary Experiments: Testbed
CPU (both nodes): Intel(R) Celeron(R) [email protected]
Network (both nodes): 1 Gigabit Ethernet network card
Disk: Node 1, WD-500GB SATA disk; Node 2, WD-160GB SATA disk
Operating system: Node 1, Ubuntu 10.04 (lucid), Linux kernel 2.6.32-43; Node 2, Ubuntu 10.04 (lucid), Linux kernel 2.6.32-38
31. Preliminary Experiments
Dataset 1: a single text file, 507.7 MB
Dataset 2: Linux kernel package, 454.8 MB, 40,927 small source code files
34. Preliminary Experiments: Conclusion on Experimental Results
                           Dataset 1   Dataset 2
Least thermal-friendly     CT          CT
Least energy consumption   DT          AT
35. Predictive Thermal Management System (PTMS) Framework
Components: Method Selector, Energy Predictor, Runtime Data Monitor, Monitors on Node 1 through Node n
Messages: Data Transmission Request, Prediction Request, Energy Cost, Method, Data
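The framework routes each transmission request through an energy predictor and a method selector. A minimal sketch of the selection step, assuming the predictor returns one energy estimate per method and the selector simply picks the minimum, is given here. The function name and the predicted values are hypothetical; DT, AT, and CT are the direct, archive, and compressed transmission methods from the earlier slide.

```python
def select_method(predicted_energy_j):
    """Return the transmission method with the lowest predicted energy."""
    if not predicted_energy_j:
        raise ValueError("no predictions supplied")
    return min(predicted_energy_j, key=predicted_energy_j.get)

# Hypothetical predictions in joules for one transfer request.
prediction = {"DT": 1200.0, "AT": 950.0, "CT": 3100.0}
choice = select_method(prediction)
print(f"selected method: {choice}")
```

The preliminary experiments suggest why a per-request choice matters: the least energy-consuming method differed between the two datasets (DT for the single large file, AT for the many small files).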
36. Experiments
Datasets: Human Genome dataset, 60 GB, from the NIH (National Institutes of Health) NCBI; Multimedia dataset, 50 GB, millions of songs
Strategies: DT, AT, CT, PTMS
37. Experimental Results: Human Genome Dataset
Transmission time and total energy cost of transferring the Human Genome dataset
[Figure: execution time (s, 0 to 12000) and energy consumption (J, 0 to 4,000,000) for the transmission strategies DT, AT, CT, and PTMS]
38. Experimental Results (cont.) Multimedia Dataset
Transmission time and total energy cost of transferring the multimedia dataset
[Figure: execution time (s, 0 to 5000) and energy consumption (J, 0 to 4,000,000) for the transmission strategies DT, AT, CT, and PTMS]
39. Experimental Results (cont.) Overall Energy Cost
[Figure: energy consumption (J, 0 to 4,000,000) by strategy. Data labels: DT 1,163,600; AT 819,665; CT 3,503,273; PTMS 816,701]
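The overall energy figures on this slide can be turned into relative savings with simple arithmetic. The sketch below assumes the four data labels pair with DT, AT, CT, and PTMS in the order they appear in the transcript.

```python
# Overall energy consumption in joules; label pairing follows transcript order.
ENERGY_J = {"DT": 1_163_600, "AT": 819_665, "CT": 3_503_273, "PTMS": 816_701}

def saving_percent(baseline_j, candidate_j):
    """Energy saved by the candidate relative to the baseline, in percent."""
    return 100.0 * (baseline_j - candidate_j) / baseline_j

savings = {
    name: saving_percent(ENERGY_J[name], ENERGY_J["PTMS"])
    for name in ("DT", "AT", "CT")
}
for name, pct in savings.items():
    print(f"PTMS vs. {name}: {pct:.2f} % less energy")
```

Under that pairing, PTMS roughly matches the best fixed strategy (AT) while avoiding the much higher cost of always compressing (CT).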
40. Conclusion
Thermal modeling approach: model CPU and disk temperatures; model outlet temperature from CPU, disk, and inlet temperatures.
Predictive Thermal Management System: chooses the most energy-efficient strategy for data transmission.
41. Thank you!
42. Related Work (cont.) Predictive Thermal Management
Performance-Effective Dynamic Thermal Management (DTM): a predictive algorithm; uses response mechanisms.
C-Oracle: predicts the temperature and performance impacts of various thermal management reactions and selects the best reaction.