Predicting Container-Level Power
Consumption in Data Centers using
Machine Learning Approaches
Rasmus Bergström
Computer Science and Engineering, master's level
2020
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
ABSTRACT
Due to the ongoing climate crisis, reducing waste and carbon emissions has become a hot
topic in many fields of study. Cloud data centers account for a large portion of the world’s
energy consumption. In this work, methodologies are developed using machine learning
algorithms to improve prediction of the energy consumption of a container in a data
center. The goal is to share this information with the user ahead of time, so that they
can make educated decisions about their environmental footprint.
This work differentiates itself through its sole focus on optimizing prediction, as opposed
to other approaches in the field, where energy modeling and prediction have been studied
as a means of building advanced scheduling policies in data centers.
In this thesis, a qualitative comparison between various machine learning approaches to
energy modeling and prediction is put forward. These approaches include Linear, Poly-
nomial Linear and Polynomial Random Forest Regression as well as a Genetic Algorithm,
LSTM Neural Networks and Reinforcement Learning.
The best results were obtained using the Polynomial Random Forest Regression, which
produced a Mean Absolute Error of 26.48% when run against data center metrics
gathered after the model was built. This prediction engine was then integrated into a
Proof of Concept application as an educative tool to estimate what metrics of a cloud
job have what impact on the container power consumption.
PREFACE
This work was performed at Xarepo AB. It was made possible by access to
anonymized data from the Oulu and RISE Luleå data centers, as part of the cooperation
within the ArctiqDC project, funded by Interreg North.
Special thanks to Marcus Liwicki, Saleha Javed and Olov Schelén for great input and
support throughout the entire duration of the thesis.
CONTENTS
Chapter 1 – Introduction 1
1.1 Research Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Delimitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter 2 – Related Work 5
2.1 Green Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Resource Allocation Policy . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Workload Analysis and Prediction . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Energy Consumption Analysis and Prediction . . . . . . . . . . . . . . . 8
Chapter 3 – Theory 11
3.1 Contributing Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1 Data Center Related . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 Workload-Related . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3 Environment Related . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Candidate Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.1 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.2 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.3 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . 15
3.2.4 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . 15
Chapter 4 – Method 17
4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.1 Metrics from Data Centers . . . . . . . . . . . . . . . . . . . . . . 18
4.1.2 Weather Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 General Model Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.1 PyMC3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.3 Polynomial Linear Regression . . . . . . . . . . . . . . . . . . . . 23
4.4.4 Polynomial Random Forest Regression . . . . . . . . . . . . . . . 23
4.5 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.5.1 Individuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.5.2 Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.5.3 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.5.4 Mutation & Crossover . . . . . . . . . . . . . . . . . . . . . . . . 25
4.6 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.6.1 Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.6.2 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . 28
4.7 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.8 Proof of Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter 5 – Results 31
5.1 Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.1 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.2 Polynomial Linear Regression . . . . . . . . . . . . . . . . . . . . 35
5.2.3 Random Forest Regression . . . . . . . . . . . . . . . . . . . . . . 38
5.3 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.4 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4.1 Basic ANN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4.2 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . 43
5.5 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Chapter 6 – Evaluation 45
6.1 Qualitative Model Comparison . . . . . . . . . . . . . . . . . . . . . . . . 45
6.1.1 Prediction Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.1.2 Prediction Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.2 Effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Chapter 7 – Discussion 49
7.1 Research Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7.2 Comparison with Previous Work . . . . . . . . . . . . . . . . . . . . . . . 50
7.3 Usefulness of the Findings . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Chapter 8 – Conclusion and Future Work 51
8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
FIGURES
Chapter 1 – Introduction 1
Chapter 2 – Related Work 5
2.4.1 Power consumption prediction results shown in papers over the last decade. 8
Chapter 3 – Theory 11
Chapter 4 – Method 17
4.1.1 Python string containing a PromQL query . . . . . . . . . . . . . . . . . 18
4.1.2 PromQL query expressed using the Python adapter . . . . . . . . . . . 18
4.3.1 General model setup, the model should find a relationship between the
metrics and the power consumption. . . . . . . . . . . . . . . . . . . . . 21
4.5.1 The first version of an individual in the Genetic Algorithm. b is the bias, n
is the number of degrees in the polynomial, m is the number of parameters. 24
4.6.1 The architecture schema for the basic ANN model. . . . . . . . . . . . . 27
4.6.2 An overview of the Encoder-Decoder LSTM chain. . . . . . . . . . . . . 28
4.7.1 How NAF updates parameters. . . . . . . . . . . . . . . . . . . . . . . . 29
4.7.2 The main loop of the Reinforcement Learning approach. . . . . . . . . . 29
Chapter 5 – Results 31
5.1.1 The raw data obtained from the data center. It is difficult to detect
correlations by eye. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.1.2 Principal Component Analysis of container_power with 6 parameters. . 33
5.1.3 Correlation matrix of the parameters and output. The stronger corre-
lation between the power consumption and the network metrics could
partly be resulting from the fact that both are averaged node metrics. . 34
5.2.1 Test sample accuracy of simple linear regression between container_cpu_seconds
and average node_power per container. . . . . . . . . . . . . . . . . . . 36
5.2.2 Polynomial linear regression . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2.3 Polynomial Random Forest Regression . . . . . . . . . . . . . . . . . . . 39
5.3.1 The training accuracy per generation when running the Genetic Algo-
rithm. Due to the large error in the beginning it is very hard to visualize,
but the decrease in error was quite gradual, and the best level of accuracy
was achieved around generation 1000. . . . . . . . . . . . . . . . . . . . 40
5.4.1 The training and validation errors over the course of the training. They
show that, unexpectedly, validation errors are lower than training errors. 41
5.4.2 The training and validation predictions compared to the actual values of
the container_power. Note: The values on the x-axis are indexes, not
epochs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.4.3 The training and validation predictions at epoch 3000, compared to the
actual container_power. The values on the y-axis are the container_power
in Watt-Hours, with the indexes on the x-axis. . . . . . . . . . . . . . . 43
5.5.1 Results of the Reinforcement Learning per Epoch. As can be clearly seen
on the graph, the result did not converge towards a low absolute error. . 44
Chapter 6 – Evaluation 45
Chapter 7 – Discussion 49
Chapter 8 – Conclusion and Future Work 51
TABLES
Chapter 1 – Introduction 1
Chapter 2 – Related Work 5
Chapter 3 – Theory 11
3.1.1 Data center metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 Workload metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3 Environment metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Chapter 4 – Method 17
4.1.1 Data center metric granularity . . . . . . . . . . . . . . . . . . . . . . . 19
Chapter 5 – Results 31
5.2.1 Prediction accuracy of linear regression between container_cpu_seconds
and container_power . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.2 Results of polynomial linear regression. The leftmost column contains
the details for the actual values that the regression is trying to predict,
the rest of the columns show the distribution of the absolute error
achieved with Kth degree polynomial linear regression. The best accuracy
is marked with boldface. . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.3 Results of polynomial random forest regression. The leftmost column
contains the details for the actual values that the regression is trying to
predict, the rest of the columns show the distribution of the absolute
error achieved with Kth degree polynomial random forest regression. The
best accuracy is marked with boldface. . . . . . . . . . . . . . . . . . . 38
Chapter 6 – Evaluation 45
6.1.1 The attempted approaches and their outcomes. . . . . . . . . . . . . . . 47
Chapter 7 – Discussion 49
Chapter 8 – Conclusion and Future Work 51
CHAPTER 1
Introduction
A large portion of today’s computation is carried out in data centers. Cloud providers
transparently manage the entire infrastructure, from cooling and hardware to virtualiza-
tion and horizontal scaling. While substantially reducing the operational effort required
of the user, the opacity of the underlying resources can lead to a diminished association
between the jobs submitted to the data center on the one hand, and the pollution they
cause on the other.
When browsing the offers of the main cloud vendors it is clear that energy consumption
is not meant to drive the customer’s choice of cloud provider.
Foundational to this work is the belief that people are eager to make a difference in
the fight against the raging climate crisis. With access to advanced models for predicting
and understanding the energy consumption of cloud systems, they could be informed of
the size of their environmental footprint ahead of time. This would empower users to
make educated choices with regards to the green footprint of their technology usage.
Most research in the field [1][2][3] is related to making data centers more energy
efficient: energy modeling and prediction have been studied as a means of building
advanced scheduling policies for data centers. This work differentiates itself
from other research in the field in that it deals solely with improving the prediction
accuracy itself, with the intention of informing users.
This section outlines the goal of the project by stating the research question and break-
ing it down into actionable steps. It also covers practicalities such as the delimitations
and evaluation of the project.
1.1 Research Question

Can the prediction of energy consumption of a job in a data center be optimized using machine learning methodologies?
In the context of this thesis, prediction does not refer to foretelling the power consump-
tion at a different (later) time. Instead, the term refers to estimating the consumption
given a collection of other metrics. This is seen as a necessary step toward later being
able to forecast future energy consumption (see Section 8.2).
The research question can be broken down into the following steps,
1. Select metrics that impact the energy consumption of a container running in a data
center.
2. Figure out how to attain data for these metrics and clean it for use.
3. Develop multiple models to predict the energy consumption of a container based
on the metrics chosen.
4. Evaluate the different models and select the best one.
5. Present the metrics and predictions to the user in a way that enables and motivates
them to reduce the environmental impact of their cloud usage.
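The kind of prediction targeted by steps 3 and 4 can be sketched as fitting a regressor that maps a vector of metrics to a power estimate for the same time window. The sketch below uses synthetic data and scikit-learn; the feature names and the power formula are illustrative assumptions, not the thesis pipeline:

```python
# Sketch of "prediction" as used in this thesis: estimating power from other
# metrics measured at the same time, rather than forecasting ahead in time.
# The data here is synthetic; real features would be container metrics such
# as cpu_seconds and memory_bytes.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
cpu_seconds = rng.uniform(0, 100, 500)
memory_bytes = rng.uniform(0, 1e9, 500)
X = np.column_stack([cpu_seconds, memory_bytes])
# Assumed (synthetic) non-linear relationship between metrics and power.
y = 5.0 + 0.3 * cpu_seconds + 1e-9 * memory_bytes * cpu_seconds \
    + rng.normal(0, 2, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on held-out samples: {mae:.2f} Wh")
```

Evaluating on a held-out split, as above, mirrors the thesis's concern with how a model performs on data gathered after the model was built.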
1.2 Delimitations

This project is focused solely on making accurate predictions, with the goal of helping
users make educated decisions about their cloud energy performance. What follows is a list of
concerns that could be meaningful to explore, but that are considered outside the scope
of the project, with motivations as to why.
• Taking any (scheduling) actions based on the results of the predictions. This is
because the goal is to inform users, not to make a more efficient data center.
• Forecasting energy consumption into the future, since it is a related topic but requires
a different approach.
• Changing the prediction model during run-time, since it would require much more
engineering and it is unclear what advantages it would have.
• Varied climates and seasons, because only data gathered during the course of the
project and from the available data centers will be used.
• User research into what kind of presentation has the greatest impact on people’s
desire to sacrifice comfort and ease-of-use for a better environmental footprint,
since that research would preferably be performed once it is better known what
the nature of possible predictions is.
1.3 Evaluation

Since it was unknown at the beginning of the project whether the desired prediction
accuracy could be derived from the data set, the work was exploratory in nature.
The following points were to be guiding when evaluating the
quality of the findings,
1. A qualitative comparison between the different models attempted in the course of
the project (See Section 6.1).
2. An evaluation of the effectiveness of the model when faced with data that it has
not yet encountered (See Section 6.2).
3. A critical reasoning about how useful the information obtained is to reduce the
environmental footprint of the individual users (See Chapter 7.3).
1.4 Thesis Structure

The chapters of this thesis are structured according to convention, with related work and
theory followed by method, results and evaluation. In the cases where the theory was
considered too short to warrant its own section, it was included directly in the method
section. Since the work itself is split over the following parts,
• Contributing Factors
• Data collection and exploration
• Candidate Models (Ordered by complexity)
many of the chapters follow that same division. In order to help the flow of the report,
these parts appear in the same order in each chapter.
CHAPTER 2
Related Work
The project started with an extensive literature review. Because the problem
statement differs from most related research, it was deemed necessary to cast the net
wide, researching many adjacent problems to get a comprehensive understanding of the
field. The chapter is split into four sections.
First, papers regarding Green Computing were explored. Efforts to discover more green
ways to develop and deploy software are not a new phenomenon. This research was aimed
at discovering to what extent cloud emissions had been explored.
Then, in order to learn more about data centers and the ways energy efficiency is
currently measured and optimized, a large collection of papers suggesting scheduling
optimizations was read. Though some of these papers mention energy-awareness,
they did not suggest actual power modeling schemes. Instead, they were valuable for the
insights they gave into the world of data center energy consumption as a whole.
One big difference between the approach taken in this thesis and those of most other
studies is that in those studies, the user-provided workload is considered an input of
unknown weight. In the case of the problem domain explored in this project the user’s
input could be considered known, since the goal is to inform the user of their own impact.
That said, many efforts have been made to model the expected workload in a data
center at a given time. Though the goal is different, theirs was also a prediction endeavor,
and could therefore provide useful lessons for this project.
Finally, the papers most relevant to this research are covered. These are papers that
deal directly or indirectly with cloud center energy consumption analysis and prediction.
Many of these papers still do not discuss the accuracy of their predictions, since prediction
is seen as a means to improve scheduling algorithms, and those algorithms are the main
purpose of the paper.
2.1 Green Computing
Though energy consumption has been on the minds of hardware manufacturers from
the beginning, it has rarely been a main focus for software. In recent years, the intensifying
climate crisis and the proliferation of cloud computing with its virtualization and con-
tainerization have started to change this mindset. Though some findings have been made,
Hindle [4] has outlined the continuing large need for research in the space.
In an instructive article from 2013, Chauhan et al. [5] introduced a framework for
thinking holistically about green infrastructure throughout an organization. They pos-
tulated that in order to achieve lasting impact in a large corporation, the green mindset
has to be present from requirements and design and all the way to test and deployment.
They also suggested that the customers should be given the ability to hold cloud vendors
accountable by tracking energy consumption limits in the Service-Level Agreements.
Ardito et al. [6] used power profiles from mobile devices to show that more often than
not, performant code is green code. They argue that established practices such as refac-
toring and eliminating dead code can have a large impact on device energy consumption.
2.2 Resource Allocation Policy
The main goal of most energy-aware techniques used in data centers is to improve energy
efficiency by implementing better scheduling and resource allocation. This is a hot topic,
with an explosion of papers over the last decade, all suggesting innovative optimization
techniques.
Of these, many have achieved promising results with regards to energy consumption
by presenting novel algorithms for Virtual Machine (VM) allocation (See Berral et al. [7,
8], Qiu et al. [2], Portaluri et al. [9], Fang et al. [10]) that take the maximum energy
consumption of the Physical Machines (PM) into account when deciding where to put
the VMs. Most of these algorithms operate under the assumption that the best way to
improve energy efficiency is to place the VMs as densely as possible on a subset of the
PMs, so that the rest of the PMs can be turned off completely, thus saving energy.
He et al. [11] decided to also account for the energy price and the potential availability
of renewable energy sources. They registered a 60 % improvement compared with a
previous solution (though theirs is the only paper that refers to that previous solution).
Wang et al. [12] used thermal data in addition to the energy consumption data and were
able to improve energy efficiency, though with a slight rise in SLA violations.
Zhou et al. [13], Radhakrishnan et al. [14], Kar et al. [15] and Javed et al. [16] all
used Genetic Algorithms to improve the energy efficiency of clouds when subjected to
various workloads. The difference between many of those solutions and the subject of
this thesis is that their algorithms change the behavior of the data center in response
to the workloads. Shaw et al. [1] and Zhou et al. [17] used Reinforcement Learning to
allocate resources for optimal energy efficiency, also with good results.
2.3 Workload Analysis and Prediction

One of the main challenges in estimating energy consumption in data centers is the
heterogeneous and dynamic nature of the workloads that the cloud is expected to handle.
Many attempts have been made to accurately and confidently estimate such workloads
in order to optimize resource allocation.
Rajarathinam et al. [18] used a non-linear auto-regressive network with exogenous input
and were able to show that their method was superior to the purely statistical method
they used as reference. Qazi et al. [19] based their approach on Chaos Theory and nearest
neighbors classification in order to develop a framework that allowed them to make fine
grained predictions.
Ramezani et al. [20] applied fuzzy workload prediction and a fuzzy logic system to pre-
dict and control future changes in CPU workload. They were able to predict which PMs
would become hotspots by continuously looking for poor VM performance. Kalyampudi
et al. [21] used a Moving Error Rate to predict the workload of various nodes. They were
able to obtain an average error rate of 6.18 % on data sets from 5 different data centers.
Zhang et al. [22] applied deep learning to predict CPU utilization of VMs, both for the
next hour and the next day. They also discuss ways to speed up training using Polyadic
Decomposition. They were able to obtain a Mean Absolute Percentage Error of 0.26
and a Root Mean Square Error of 9.97 for 60-minute predictions.
Nwanganga et al. [23] classified a given workload according to the nearest neighbor
structural similarity to historical workloads and used those previous workloads to pre-
dict the behavior of the new workload with successful results in some cases, but with
varied support and confidence values. They propose introducing more features into the
specification.
Ding et al. [24] combined Moving Average and Median Absolute Deviation in order to
predict the workload. Sadly they did not focus on revealing the accuracy of that model,
but they said that it was effective relative to other models.
2.4 Energy Consumption Analysis and Prediction
An important component of forecasting is the analysis of past data in order to gain
valuable insights. This section covers energy consumption modeling and estimation.
While there have been numerous papers in that domain, it is worth noting that the field
has improved rapidly over the last decade and that recent results are far better than
older ones. Figure 2.4.1 shows some of the previous results over the last 10 years.
Figure 2.4.1: Power consumption prediction results shown in papers over the last decade.
A large problem when exploring the previous work is the lack of access to benchmark
metric and power consumption traces. Each paper seems to be referring to different data
centers and/or datasets. Most authors state that the traces were from peak data center
performance, which is not the case for the datasets used in the course of this thesis. It
is also not the norm to provide access to the traces for reproduction of the results. This
makes it very difficult to establish what the current state of the art actually is.
Earlier attempts, dating back to 2010, can be found in Meisner et al. [25], who modeled
peak power consumption by characterizing the relationship between server utilization and
power supply behavior. They were able to predict the peak power trace with an error
below 20 %. Meanwhile Dhiman et al. [26], using Gaussian Mixture Vector Quantization,
achieved an average error of less than 10 %.
Jaiantilal et al. [3] used linear as well as random forest regression to model energy con-
sumption for scheduling purposes. They did not explicitly state the error they obtained,
but from their graphs it looks like the random forest regression was more effective.
In 2016, Dayarathna et al. [27] performed an in-depth study of the existing literature
on data center power modeling available at that time, and emphasized taking the entire
data center system into account when modeling energy consumption.
Canuto et al. [28] proposed deriving a single model per platform to account for het-
erogeneity in cloud systems. They surmised that the correlation between certain metrics
and energy consumption will vary between platforms, and used a minimum set of indi-
cators for each platform, based on that correlation. At the time, their results were very
promising.
Borghesi et al. [29] used random forest regression to predict job power consumption in
high-power computing scenarios. They reported that training and predicting went very
fast, with a mean error of around 8–9 % over the entire test period (15 % when including
outliers).
Li et al. [30] used extensive power dynamic profiling, auto-encoders and deep learning
models to try and optimize the accuracy of predictions. They presented two models, one
coarse and one fine-grained, and reported 79 % error reduction for certain cases.
Kistowski et al. [31] used multiple linear regression to show that the power consump-
tion of CPU and storage loads could be predicted with a prediction error of less than
15 % across a number of virtualized environment configurations. They further
introduced a heuristic for pruning workloads to avoid using workloads that may lead to
a decrease in prediction accuracy.
Liu et al. [32] used an LSTM-based approach, landing at a mean absolute error rate of
4.42 % on data center power consumption. Ferroni et al. [33] used a divide and conquer
approach to model power consumption of heterogeneous data centers. They were able to
achieve a relative error of 2 % on average and under 4 % in almost all cases. Instead of
building one comprehensive model they identified distinct working states of the system
and built a model for each of them.
Rayan et al. [34] used polynomial regression to predict power consumption as well as
the number of physical machines needed, all based on the daily workload. They did not
share numbers for the error but the graphs seemed to show good results.
Hsu et al. [35] made a feature selection from over 4000 operational trace data variables
and ran them through a non-linear auto-regressive exogenous model. They used sliding
windows and validation data sets for model building and were able to achieve a mean
squared error of 1.13 %.
Khan et al. [36] studied node power consumption and discussed approaches to future
estimation. They covered vast amounts of log data with statistical and machine learning
analysis and were able to estimate plug energy consumption with a mean absolute error
rate of 1.97 %. They found that the biggest impact came from failed jobs, as well as
from the CPU and Memory metrics.
Patil et al. [37] suggested forwarding an ensemble of base predictors (Exponential
Smoothing, Auto-Regressive Integrated Moving Average, Nonlinear Neural Network
and Trigonometric Box-Cox Auto-Regressive Moving Average Trend Seasonal Model) to
a fuzzy neural network with self-adjusting learning rate and momentum weight.
Yi et al. [38] used two LSTMs in tandem to predict the temperature and energy con-
sumption of the processor in the next step of their resource allocation algorithm. They
found that a single LSTM yielded inferior prediction accuracy. With the tandem ap-
proach they achieved a root mean square error of 3 %.
Kistowski et al. [39] introduced an off-line power prediction method that used the
results of standard power rating tools. They used a selection of four different formalisms,
from which they attempted to automatically select the best one. They were able to
achieve an average error of 9.49 % for three workloads running on real-world, physical
servers.
Yi et al. [40] showed that deep reinforcement learning can be effective when allocating
compute-intensive jobs in data centers. They used an expectation maximization algorithm
to construct a Gaussian mixture model. They found that constructing separate LSTM
networks for each of the clusters led to a higher prediction accuracy.
CHAPTER 3
Theory
In this section the supporting theory is put forth. It builds heavily on the research con-
ducted in Chapter 2 and on other sources. Section 3.1 is concerned with a walk-through
of the various data metrics that could be important for the project, while Section 3.2
outlines the theory behind the various methodologies used to perform data analysis and
prediction.
Due to the lack of related work in exactly the same problem domain, the theory portion
of this thesis is limited. Most of the models used were developed by trial and error, and
are covered in Chapter 4.
3.1 Contributing Factors

This section outlines the selection of metrics used as parameters when analyzing and
predicting power consumption. Section 3.1.1 describes actual metrics from the data
center. Section 3.1.2 covers factors related to the workload. Section 3.1.3
covers factors related to the environment in which the data center operates.
Table 3.1.1: Data center metrics
Name Description Unit
cpu_seconds CPU Processing Time Seconds
memory_bytes Memory Usage Bytes
read_bytes File System Read Bytes
write_bytes File System Write Bytes
receive_bytes Network Receive Bytes
transmit_bytes Network Transmit Bytes
power Node Plug Power Watt-Hours
Table 3.1.2: Workload metrics
Name Description Unit
payload_bytes The size of the payload Bytes
payload_cycles Number of cycles to perform job Scalar
3.1.1 Data Center Related
There are hundreds of metrics available from most data center monitoring systems. What
can be difficult when switching from one distributed setup to another is comparing the
metrics; it is easy to end up with an apples-to-oranges comparison. To combat
this, the focus was put on the most straightforward measurements: data that the users
themselves could experiment with.
The metrics chosen can be seen in Table 3.1.1. It is worth noting that many pa-
pers [26][32] focused predominantly on cpu_seconds and memory_bytes for prediction
purposes.
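Metrics like these are typically scraped by a monitoring system such as Prometheus and retrieved with PromQL range queries (as done in Chapter 4). Below is a minimal sketch of assembling such a query URL in Python; the endpoint, metric name and label names are assumptions for illustration and would need to match the actual exporter in use:

```python
# Sketch: build the URL for a Prometheus range-query API call.
# The base URL and the metric/label names below are assumed, not taken
# from the thesis setup.
from urllib.parse import urlencode

def promql_range_query(base_url, query, start, end, step):
    """Return the full URL for a call to Prometheus' /api/v1/query_range."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"

# Per-container CPU seconds, summed over CPUs, as a 5-minute rate.
query = 'sum by (container) (rate(container_cpu_usage_seconds_total[5m]))'
url = promql_range_query("http://prometheus:9090", query,
                         start=1577836800, end=1577923200, step="1h")
print(url)
```

Issuing an HTTP GET against the resulting URL would return a JSON time series per container, which can then be cleaned and joined with the other metrics.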
3.1.2 Workload-Related
The properties of the job the user submits impact its performance. Armed with knowledge
about what impact their decisions make, the user can make educated decisions about how
to optimize. In Table 3.1.2, metrics are highlighted which describe the workload.
Table 3.1.3: Environment metrics

Name                 Description                                Unit
temperature          Air temperature                            Celsius
wind_speed           Wind speed                                 km/h
weather_description  Human-readable description of the weather  String
pressure             Air pressure                               hPa
humidity             Air humidity                               %
time_of_day          Time of day                                Seconds
month                The number of the month                    [1-12]
power_price          The average price of power that day        SEK
3.1.3 Environment Related
A data center does not operate in a vacuum. There are a number of factors in the
environment, and many of them might impact the performance of the data center. The
metrics chosen for the various factors for use in training prediction models can be found
in Table 3.1.3.
3.2 Candidate Models

There are numerous ways in which to perform data analysis and prediction, ranging
from simple linear regression to more advanced approaches. Indeed, as can be seen in
Chapter 2, many different approaches have been tried in adjacent problem domains with
notable success. In order to make an interesting study, it was deemed wise to try a
variety of different approaches and perform a comparative analysis between them.
This section outlines the supporting theory for the approaches attempted during the
project. Section 3.2.1 outlines the theory behind Bayesian Inference, which was chosen
as a statistical baseline to draw upon. Section 3.2.2 explores an evolutionary approach.
Section 3.2.3 covers Artificial Neural Networks, Recurrent Neural Networks and Long
Short-Term Memory, and explores how they can be optimized for the task at hand.
Finally, Section 3.2.4 evaluates the merits of Reinforcement Learning as applied to the
problem formulation.
3.2.1 Bayesian Inference
Bayes’ Theorem [41] is a mathematical framework for estimating the probability of an
event based on some initial belief or knowledge that we have, commonly known as the
prior. The scenario is the following: we have just observed event B, and we are trying
to estimate P(A|B). According to Bayes’ Theorem (see Equation 3.1), we can then use
the prior P(A) to estimate it.

P(A|B) = P(B|A) P(A) / P(B)    (3.1)
Bayes’ Theorem has many applications, one of which is Bayesian Inference, which refers
to the process of extracting properties from data using Bayes’ Theorem. Equation 3.1
can then be rewritten as shown in Equation 3.2, where Θ represents the prior distribution.

P(Θ|data) = P(data|Θ) P(Θ) / P(data)    (3.2)
In other words, one makes an initial assumption about the distribution of the data
given a set of parameters. One uses this prior to make a prediction about the next
data point observed. The actual value can then be compared with the prediction to
find the error, which is used to update the prior distribution. With more observations,
the prior becomes increasingly accurate and ultimately forms the final prediction of
the algorithm. Bayesian inference has the advantage that it can be performed on an
on-line basis and is relatively quick to perform in most cases.
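As an illustration of this update loop, here is a minimal sketch (not from the thesis) of an on-line Bayesian update for the mean of a Gaussian with known observation noise; all numeric values are hypothetical:

```python
def update(mu0, tau0_sq, x, sigma_sq):
    """Combine the prior N(mu0, tau0_sq) with one observation x drawn
    from N(mu, sigma_sq); return the posterior mean and variance."""
    tau_sq = 1.0 / (1.0 / tau0_sq + 1.0 / sigma_sq)
    mu = tau_sq * (mu0 / tau0_sq + x / sigma_sq)
    return mu, tau_sq

mu, var = 0.0, 100.0          # weak initial belief (the prior)
for x in [74.0, 76.0, 75.0]:  # observations arriving on-line
    mu, var = update(mu, var, x, sigma_sq=4.0)
# mu drifts toward the data mean and var shrinks with each observation
```

Each observation pulls the prior toward the data, which is exactly the predict, compare and update cycle described above.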
3.2.2 Genetic Algorithms
Genetic Algorithms are a large group of algorithms inspired by Darwinian evolution
and molecular genetics, more specifically by the biological processes in chromosomes [14].
In essence, Genetic Algorithms are random search algorithms with the ability to
self-organize, adapt and learn [13].

The methodology was originally introduced [42] as a probabilistic optimization algorithm.
In the terms used by Darwin [43], nature (the environment) is represented by the problem
definition, and individuals (chromosomes) are represented by candidate solutions. A set
of individuals is known as a population.
Genetic Algorithms work as follows. To start the process, a population is initialized in
a way that maps to the problem definition. The individuals are then scored using a
fitness function that evaluates how well they solve the problem; this is known as
selection. The fittest individuals are then allowed to reproduce, exchanging genes and
then splitting to create a new generation in the crossover step.

Finally, mutation is allowed to take place by arbitrarily changing a subset of individuals,
after which the new generation is ready to take on nature. This process continues until
some predefined fitness criterion has been met.
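The process above can be condensed into a short generic loop. The sketch below is illustrative, not the thesis implementation: it evolves a single number toward a hypothetical target, with lower fitness meaning better.

```python
import random

def evolve(fitness, init, mutate, crossover, pop_size=50, generations=100):
    """Generic GA loop: sort by fitness (lower is better), keep the
    fittest half, and fill the rest with mutated crossover children."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = [
            mutate(crossover(random.choice(survivors), random.choice(survivors)))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=fitness)

# Toy problem: evolve a number toward 42.
random.seed(0)
best = evolve(
    fitness=lambda x: abs(x - 42.0),
    init=lambda: random.uniform(-100, 100),
    mutate=lambda x: x + random.gauss(0, 1),
    crossover=lambda a, b: (a + b) / 2,
)
```

Keeping the fittest half unchanged acts as a simple form of elitism, so the best fitness can never get worse between generations.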
3.2.3 Artificial Neural Networks
As the name suggests, Artificial Neural Networks (ANNs) take inspiration from the
behavior of biological neurons in order to perform learning tasks. In its simplest form,
an ANN is a layered system: at each layer the neurons assign weights to the inputs from
the previous layer. By running an experiment many times one can then let the error
propagate back through the system, continually reassigning the weights to improve the
output.
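A single forward pass through such a layered system can be sketched in plain NumPy (layer sizes hypothetical; in training, backpropagation would adjust the weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Common activation function: passes positives, zeroes negatives."""
    return np.maximum(0.0, x)

# One hidden layer: 6 input metrics -> 16 hidden units -> 1 output.
W1, b1 = rng.normal(size=(6, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def forward(x):
    """Each layer weights its inputs, adds a bias, and activates."""
    hidden = relu(x @ W1 + b1)
    return hidden @ W2 + b2

x = rng.random(6)         # a vector of six metric values
prediction = forward(x)   # array of shape (1,)
```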
A Recurrent Neural Network (RNN) is an ANN where the result of the previous training
step is taken into account when making the next prediction. RNNs have proven
successful in solving problems in a wide range of domains. One of their major
shortcomings is their inability to remember features far in the past, since more recent
results tend to cloud earlier ones.

A Long Short-Term Memory network (LSTM) is an attempt to combat this problem by
adding channels for accessing such older memory. This approach has been used to
address similar prediction problems in the past. A common thread was to incorporate
a pair of chained LSTM networks, known as an autoencoder, where one network is
responsible for encoding the historical data, and the second for recreating the original
representation from the encoding. This approach leads to a desired loss between the
decoding and encoding, known as a drop-off, that reduces overfitting on a subset of the
data features.
3.2.4 Reinforcement Learning
The goal of regular reinforcement learning is to explore and learn from an environment.
There are two main kinds of reinforcement learning: model-based and model-free. In
model-based reinforcement learning, supervised learning is used to learn about a domain
that is already at least partly known. In the model-free approach, it is assumed that
no knowledge of the environment is available ahead of time. Instead, the algorithm
gives every state in the environment a so-called Q-score: an estimate of the highest
reward obtainable starting from that state.

Model-free learning (or Q-learning) is then performed by going through the possible
actions one by one, estimating the state that would result from each action. The
algorithm then selects the action that would give the highest Q-score. Whenever the
algorithm interacts with the environment, it remembers the outcomes of taking a
certain action in a certain state and uses them to improve the Q-scores. This is the
essence of Reinforcement Learning.
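The Q-score improvement step is the classic tabular Q-learning update; a minimal sketch with hypothetical states and actions:

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward the observed reward plus the best
    discounted Q-score reachable from the next state."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Two states, two actions, all Q-scores start at zero.
Q = {"s0": {"a0": 0.0, "a1": 0.0}, "s1": {"a0": 0.0, "a1": 0.0}}
q_update(Q, "s0", "a1", reward=1.0, next_state="s1")
# Q["s0"]["a1"] is now 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```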
There are two main challenges with applying Reinforcement Learning to predicting
energy consumption.
1. The algorithm as described above is based on the premise that one can cycle through
the list of possible actions in a given state and compare all the outcomes. In other
words, it is assumed that the action space is finite. This is rarely the case in
the physical world; in the problem described in this thesis, the action space is
continuous.

This problem can be addressed. Gu et al. [44, 45] presented two algorithms that
use normalization techniques to apply the techniques described above to problems
with a continuous action space.
2. Reinforcement learning is essentially about finding causality between correlated
actions and rewards. In this problem, however, we cannot actually change the behavior
of the data center in order to reduce energy consumption. In the current problem
definition, the actions do not impact the state (i.e. the accuracy of our prediction
does not change what the next value will be). Thus the algorithm will most likely
not converge.
CHAPTER 4
Method
This chapter outlines the way data was collected and explored, as well as the implemen-
tation of the various models used to predict power consumption. These are organized
first by general approach and then split into subsections based on individual models.
4.1 Data Collection

In Section 3.1, the various metrics were described that were thought to impact power
consumption. This section covers to what extent those metrics were available, and how
they were collected and stored. All the data gathered throughout the thesis was
anonymized and made available at https://github.com/Xarepo/green-data.
Both the data centers supplying data for the project were running Rancher [46] on
top of Kubernetes [47]. This was fortunate since that setup provides data monitoring
out of the box. This data is gathered in real time and stored in a Prometheus [48]
time-series database for up to 7 days before being discarded. Accordingly, it had to
be gathered continuously throughout the project and stored separately. Section 4.1.1
describes that process in detail, including the production of an adapter for effective
extraction of Prometheus data into a Python-friendly format.
The plug power of the nodes in the data center was not part of the monitoring data
provided out of the box by Rancher. Instead, plug power consumption was measured
separately and added to a separate database for simple extraction. This was considered
straightforward enough to not warrant its own section.
The collection of environmental data is covered in Section 4.1.2. Sadly, no metrics were
obtainable for the characteristics of the actual jobs running in the data center (the ones
discussed in Section 3.1.2).
4.1.1 Metrics from Data Centers
Prometheus allows for queries using PromQL, a Domain-Specific Language for time-series
queries. To mitigate the impracticalities of building large query strings, a wrapper layer
was built to allow for rapid query composition using Python syntax.
Prometheus data is queried by metric, with an optional subfield to filter the results of
the query. To get support from Python introspection, all the available metrics were added
to a Python class, providing quick in-editor completion. In order to facilitate filtering, a
class attribute lookup was used to convert each metric into a function accepting a list of
filters.
The result of the adapter layer was that queries that would previously have been written
as Python strings (Figure 4.1.1) could now be written as Python code (Figure 4.1.2),
vastly improving productivity, since PromQL syntax errors were now Python syntax
errors.
query = (
'sum(rate(container_cpu_usage_seconds_total'
+ '{name!~".*prometheus.*", image!="", container_name!="POD"}'
+ '[5m])) by (node)'
)
Figure 4.1.1: Python string containing a PromQL query
query = p_sum(
p_rate(
p.container_cpu_usage_seconds_total([p_ignore_k8s()]),
"5m",
),
["node"],
)
Figure 4.1.2: PromQL query expressed using the Python adapter
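A hedged sketch of how such an adapter could be built (class and helper names here are illustrative, not the thesis implementation): attribute lookup turns each known metric into a function that accepts a list of filters, and small helpers wrap PromQL functions such as sum and rate.

```python
class PromQL:
    """Hypothetical adapter: attribute access builds query fragments."""

    METRICS = {"container_cpu_usage_seconds_total", "container_memory_usage_bytes"}

    def __getattr__(self, name):
        # Only fires for names not found normally; unknown metrics fail
        # immediately, turning PromQL typos into Python AttributeErrors.
        if name not in self.METRICS:
            raise AttributeError(name)

        def metric(filters=None):
            body = ", ".join(filters or [])
            return f"{name}{{{body}}}" if body else name

        return metric

def p_rate(query, window):
    return f"rate({query}[{window}])"

def p_sum(query, by=None):
    suffix = f" by ({', '.join(by)})" if by else ""
    return f"sum({query}){suffix}"

p = PromQL()
query = p_sum(
    p_rate(p.container_cpu_usage_seconds_total(['image!=""']), "5m"),
    ["node"],
)
# query == 'sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (node)'
```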
Table 4.1.1: Data center metric granularity

Name            Granularity
cpu_seconds     Container
memory_bytes    Container
read_bytes      Container
write_bytes     Container
receive_bytes   Pod
transmit_bytes  Pod
power           Node
Metric Granularity
Available information granularity differed by metric. Some data could be obtained for
each container, some at pod level and some data was only available at the node level. In
Table 4.1.1 the granularity of different metrics is listed (Compare to Table 3.1.1).
It was decided to gather data at two levels. All the available data was added up
and gathered at the node level, and the names of these data points were prefixed with
node (node_cpu_seconds, node_power etc.). Additionally, all the data that could be
gathered at container level was also gathered at that granularity, and the names of those
data points were prefixed with container. No data was gathered at pod granularity.
Whenever a data point was used at a finer granularity than was available, as was the
case with container_power, container_receive_bytes and container_transmit_bytes,
the per-container average of that metric on that node was used. This could lead to some
outliers on sparsely used nodes, but was considered the best way to facilitate the use of
those metrics when modeling.
4.1.2 Weather Data
It was considered interesting whether environmental data such as the weather had an
impact on power consumption. To investigate this, accurate weather data was needed.
After some research about the available weather data APIs, it was decided to use the
Weather Underground API [49].
Their API, among other things, gives access to the conditions at Lulea Airport every
half hour. Through it, all the data points in Table 3.1.3 except power_price were
obtainable. For the sake of simplicity, the weather conditions were then extrapolated to
all timestamps within each half hour.
4.2 Data Exploration

The first approach to data exploration was to make scatter plots of the container_power
against each of the parameters available. Unfortunately, on these plots it was very
difficult to discern any correlations with the human eye.
For this reason, Principal Component Analysis (PCA) was performed to try and visu-
alize deeper patterns in the data. Principal components are vectors where the parameters
have been encoded in a way that retains meaningful information about the relationship
between said parameters. The goal of PCA, therefore, is to reduce the number of dimen-
sions in order to visualize relationships between principal components and the output
without losing important data.
A correlation matrix was also made, which is a table that is used to show the correlation
between the different parameters.
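Both exploration steps take only a few lines with scikit-learn and pandas; a sketch with random data standing in for the data center metrics (column names hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((200, 4)),
    columns=["cpu_seconds", "memory_bytes", "read_bytes", "container_power"],
)

# Project the standardized parameters onto two principal components.
features = df.drop(columns="container_power")
scaled = StandardScaler().fit_transform(features)
components = PCA(n_components=2).fit_transform(scaled)  # shape (200, 2)

# Pairwise correlations between all parameters and the output.
corr = df.corr()
```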
4.3 General Model Setup

The general setup of all the models was the following. A collection of metrics was
submitted as parameters to the model (see Figure 4.3.1). In order to prevent overfitting
on past data and to decouple the user impact from the time when the job was submitted,
container names and timestamps were not used as parameters. Instead, relationships
were sought between the parameters and the power consumption of the container.
The goal of each model was thus to take the list of parameters and, using only that
information, make a prediction as to how much power a container with those parameter
values is expected to consume.
Figure 4.3.1: General model setup, the model should find a relationship between the metrics
and the power consumption.
4.4 Bayesian Inference
The Bayesian modeling commenced with reading up on various applications of
Bayesian Inference to prediction problems in Python. From those sources it was
surmised that PyMC3 would be the right library for the job; that approach is covered
in Section 4.4.1.
When those approaches struggled to handle the large amount of data used in the model-
ing, the rest of the attempts were performed using scikit-learn [50]. Sections 4.4.2 – 4.4.3
describe the progression from a simple linear regression to more powerful, polynomial
models.
In all the regression attempts using the Bayesian modeling techniques, the dataset was
split into two smaller sets. The first, roughly 90% of the points, was used to train the
model. The other 10% was kept back for testing. In all the attempts made using Bayesian
Inference, these latter 10% were used to produce the actual prediction results.
4.4.1 PyMC3
The main sources for the initial implementation were the PyMC3 [51] getting started
guides [52], [53] as well as a related blog post [54].
Initially, it was believed that investigating the node data might be sufficient to make
extrapolations about the power consumption. A simple linear regression was attempted,
then a GMM, as well as ordering the data and introducing switchpoints between the
nodes. It was determined that the node data did not contain significant enough
insights, so the node models were discarded and the focus moved to examining the data
at the container level.

Making that shift meant dealing with data that was roughly 20 times larger than the
node data, which made PyMC3 feel very slow. It was therefore decided to move the
statistical modeling to scikit-learn instead.
4.4.2 Linear Regression
Scikit-learn has a built-in model for Linear Regression. All that was needed to perform
linear regression was to provide the input/output pairs and to fit the linear regression
model to the data. In order to judge prediction accuracy the dataset was split into a
training and a test set. Consistently for the regression models, the training was only
performed on the training set, and the final accuracy estimate only calculated using the
test set.
There was only so much information that could be extracted from the dataset using
linear modeling. Thus, the next step was to introduce the other parameters and to
perform polynomial regression.
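The fit and split described above can be sketched as follows (synthetic data in place of the data center metrics):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 1))                              # e.g. cpu seconds
y = 60.0 + 15.0 * X[:, 0] + rng.normal(0, 1.0, 1000)   # e.g. power + noise

# Hold back 10% of the points for testing, as in the thesis.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
```

Training touches only the training set; the final accuracy estimate is computed on the held-back test set alone.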
4.4.3 Polynomial Linear Regression
Scikit-learn has a preprocessing module for expanding a dataset into its polynomial fea-
tures. It takes as parameters the dataset and the degree (called K here) of the expansion.
For example, given the list [a, b] and K = 2, the list [1, a, b, a2, ab, b2] would be returned.
For the polynomial prediction, a pipeline was built that took as input the data, a
list of the desired parameters and K. This algorithm first performed a Kth degree
polynomial expansion and then passed it through the Linear Regression module discussed
in Section 4.4.2.
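The expansion can be verified directly by running the preprocessing module on the example above:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # [a, b] with a = 2, b = 3
expanded = PolynomialFeatures(degree=2).fit_transform(X)
# columns [1, a, b, a^2, ab, b^2] -> [1, 2, 3, 4, 6, 9]
```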
4.4.4 Polynomial Random Forest Regression
The Polynomial Random Forest Regression worked in the same way as the Polynomial
Linear Regression described in Section 4.4.3. A pipeline was built that took as input
the data, a list of the desired parameters and K. This algorithm first performed a Kth
degree polynomial expansion and then passed it through the RandomForestRegressor
module from scikit-learn.
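When the two stages are composed as a scikit-learn pipeline, swapping the final estimator is a one-line change; a sketch with synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1]  # a hypothetical nonlinear target

# Kth-degree expansion followed by the random forest regressor.
model = make_pipeline(
    PolynomialFeatures(degree=3),
    RandomForestRegressor(n_estimators=50, random_state=0),
)
model.fit(X, y)
pred = model.predict(X[:5])
```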
4.5 Genetic Algorithm
The Genetic Algorithm modeling commenced with reading up on various implementations
of Genetic Algorithms in Python. Different libraries were considered and DEAP [55]
chosen as a helpful framework for building genetic models. Sections 4.5.1 – 4.5.4 cover
the definition of individuals, fitness, selection and crossover respectively.
4.5.1 Individuals
In an approach very similar to the polynomial regression used in the Bayesian Inference
model, it was decided to define an individual (See Figure 4.5.1) as a bias b and a two-
dimensional (m × n) list W , where m signified the number of parameters supplied to
the model, and n the degree of the polynomial expansion. These values were initialized
according to a uniform distribution.
Figure 4.5.1: The first version of an individual in the Genetic Algorithm. b is the bias, n is
the number of degrees in the polynomial, m is the number of parameters.
Later, using the same idea as in Section 4.4.3, the individuals were simplified to be
defined as a one-dimensional list with the same length as the number of variables obtained
by running PolynomialFeatures from scikit-learn on the parameters.
4.5.2 Fitness
A prediction for a point I = [i_1, ..., i_m] was defined as the result of Equation 4.1,
using the values from the individual whose fitness was to be determined. For each
generation, a sample of 500 points was randomly selected, and the fitness of an
individual was defined as its average prediction error over that sample.

prediction = b + w_{1,1} i_1 + w_{1,2} (i_1)^2 + ... + w_{1,n} (i_1)^n
               + w_{2,1} i_2 + w_{2,2} (i_2)^2 + ... + w_{2,n} (i_2)^n
               + ...
               + w_{m,1} i_m + w_{m,2} (i_m)^2 + ... + w_{m,n} (i_m)^n    (4.1)

Later, when the individuals were simplified to a one-dimensional list, a prediction was
redefined as the dot product between the individual and the polynomial expansion of I.
4.5.3 Selection
Tournament selection was used to determine which individuals to bring into the next
generation. It refers to the procedure of repeatedly choosing a small number (3 in this
case) of random individuals from the population and comparing their fitness. At each
step, the individual with the best fitness is selected for the next generation. This
process continues until the population of the next generation has reached the same size
as the previous one.
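The procedure can be sketched in a few lines (a minimal illustration, with lower fitness taken as better):

```python
import random

def tournament_select(population, fitness, k=3):
    """One tournament round: pick k random individuals, keep the fittest."""
    contenders = random.sample(population, k)
    return min(contenders, key=fitness)

def next_generation(population, fitness, k=3):
    """Repeat tournaments until the new generation matches the old size."""
    return [tournament_select(population, fitness, k) for _ in population]

random.seed(0)
pop = [5.0, 1.0, 9.0, 3.0]                      # toy "individuals"
new_pop = next_generation(pop, fitness=lambda x: x)
```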
4.5.4 Mutation & Crossover
The uniform initialization of the weights meant that the initial fitness was very poor.
To combat this, mutation was performed very aggressively: the mutation probability was
set very high (80%) and a custom mutation algorithm was introduced. This aggressive
approach was a trade-off. It had the benefit that the results would start converging
quickly towards the best possible outcome, though it also meant that the final value
might lack some precision.
The custom mutation worked as follows. Each value in the individual went through a
step where it could be altered in one of the four following ways:
• 1/4 chance — it would stay the same
• 1/4 chance — it would be doubled
• 1/4 chance — it would be halved
• 1/4 chance — its sign would be inverted
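The four equally likely alterations above can be sketched as:

```python
import random

def mutate_value(v):
    """Custom per-value mutation: equal chance of keeping, doubling,
    halving, or sign-flipping the value."""
    choice = random.randrange(4)
    if choice == 0:
        return v          # stays the same
    if choice == 1:
        return v * 2      # doubled
    if choice == 2:
        return v / 2      # halved
    return -v             # sign inverted
```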
For mating, two-point crossover was used between each of the rows in the 2-dimensional
list. The bias was left unchanged by crossover.
4.6 Artificial Neural Networks

The initial approach to building a prediction engine was to look at previous approaches
and try to reproduce the state of the art. In essence, this meant starting with a chained
LSTM approach, as used in some of the papers covered in Section 2.4.

When this approach did not yield great results, it was decided to start from the
beginning and build up increasingly complex models in order to gain understanding and
thus be able to make more intelligent decisions going forward. This section therefore
covers the progression from a basic ANN model to the more advanced models.
4.6.1 Basic Model
The architecture of the first model attempted can be found in Figure 4.6.1. It was
implemented as a basic ANN with one hidden layer, using ReLU (see Equation 4.2) as
the activation function.

ReLU(x) = 0 if x ≤ 0, otherwise x    (4.2)
Over time this approach grew to be seen as a very useful starting point, since it allowed
for implementing saving, loading and good plotting without being as computationally
heavy as the more complicated approaches.
Figure 4.6.1: The architecture schema for the basic ANN model.
4.6.2 Long Short-Term Memory
The LSTM modeling commenced with reading up on LSTM in general, specifically on
Colah’s blog [56]. Various implementations of LSTM in Python were considered, and
the first implementation was inspired by [57]. That model is based on two chained
LSTM networks.

The first one takes the input parameters and encodes the time series into a fixed-length
vector. The second takes this vector and interprets it back to a prediction in the desired
domain. An overview of the architecture can be found in Figure 4.6.2.
Figure 4.6.2: An overview of the Encoder-Decoder LSTM chain.
4.7 Reinforcement Learning

The idea was to tackle the continuous action space by basing the reinforcement learning
model on an implementation [58] of Normalized Advantage Functions [44].

A data center environment was created for the agent to explore. This environment
returned the actual data center values grouped by container and sorted by timestamp.
The agent was then to make a guess at the next container_power, and the loss was
defined as the absolute error of the guess. For visualizations of the parameter
optimization and the reinforcement learning process, please refer to Figures 4.7.1 – 4.7.2.
Figure 4.7.1: How NAF updates parameters.
Figure 4.7.2: The main loop of the Reinforcement Learning approach.
4.8 Proof of Concept

The purpose of this work is to help educate users on the effects of their power
consumption and to motivate them to reduce their environmental footprint. As part of
the project, a Proof of Concept application was built in order to demonstrate how this
could be done. The Proof of Concept has three parts, which are described below.
The first part is a monitoring engine where the user can run queries against the cloud
center in real time in order to gain insights about the distribution of the metrics studied.
This part does not necessitate a prediction engine but is useful in giving an introduction
as to what the metrics are.
The second part is the power consumption estimation engine. It allows the user to
submit six different metrics that describe the characteristics of their planned job, and to
see an estimation as to what the power consumption of that job could be given those
characteristics.
The third part is very similar to the second. Instead of submitting six metrics, however,
the user is asked to submit five. A graph is then shown of the estimated power
consumption over the sixth parameter, given the five fixed parameters. It is believed
that this view could help a user understand in which scenarios which metrics have the
most impact.
CHAPTER 5
Results
This chapter shows the results of the various approaches outlined in Chapter 4. In
Section 5.1, the graphs resulting from the data exploration can be found. Then, in
Section 5.2, the results of the different regression attempts are covered. In Sections
5.3 – 5.5, the results of the Genetic Algorithm, Neural Networks and Reinforcement
Learning, respectively, are outlined.
5.1 Data Exploration

This section contains the results of the data exploration performed on the raw data as
described in Section 4.2. The dependencies between container_power and the data
center metrics can be found in Figure 5.1.1. It is very hard to detect any correlations
among these metrics with the human eye.

The result of the PCA can be found in Figure 5.1.2. It shows that there seem to be
clusters within the parameter space that have the same or similar container_power.
The correlation matrix can be found in Figure 5.1.3. It shows that the strongest
correlation is with the network metrics, which could be partly due to the fact that they
are averaged node metrics.
Figure 5.1.1: The raw data obtained from the data center. It is difficult to detect correlations
by eye.
Figure 5.1.2: Principal Component Analysis of container_power with 6 parameters.
Figure 5.1.3: Correlation matrix of the parameters and output. The stronger correlation
between the power consumption and the network metrics could partly be resulting from the fact
that both are averaged node metrics.
Table 5.2.1: Prediction accuracy of linear regression between container_cpu_seconds and
container_power

       Actual        Predicted     Error
count  96287.000000  96287.000000  96287.000000
mean   74.961997     74.797272     22.084664
std    49.375480     1.463845      44.141939
min    12.923077     74.522052     0.006016
25%    54.090909     74.523434     7.649254
50%    69.222222     74.530191     20.432554
75%    91.538462     74.565144     28.015783
max    1890.000000   94.599362     1815.477917
5.2 Bayesian Inference
This section contains the results for the Bayesian Analysis. Section 5.2.1 shows the results
for the linear regression outlined in Section 4.4.2 and Sections 5.2.2 – 5.2.3 contain the
results for the polynomial regression attempts described in Sections 4.4.3 – 4.4.4.
5.2.1 Linear Regression
In Table 5.2.1, the prediction accuracy of linear regression between container_cpu_seconds
and the average node_power per container is displayed. Figure 5.2.1 shows that there
seems to be a very slight correlation between higher CPU usage and power consumption
(though there were some quite impactful outliers).
5.2.2 Polynomial Linear Regression
The result of performing Kth degree polynomial linear regression on the metrics to predict
container_power can be found in Table 5.2.2 for K ∈ [1..7]. It shows that K = 4 yielded
the lowest mean absolute error, but that the overall accuracy was best at K = 3. The
graphs for the K = 3 polynomial linear regression can be found in Figure 5.2.2.
It is worth noting that the improvements made by going from linear to polynomial
linear regression were very small. Although these prediction results were precise enough
to be guiding in many applications of such predictions, they were still quite far off the
current state of the art.
Figure 5.2.1: Test sample accuracy of simple linear regression between
container_cpu_seconds and average node_power per container.
Table 5.2.2: Results of polynomial linear regression. The leftmost column contains the details
for the actual values that the regression is trying to predict; the rest of the columns show the
distribution of the absolute error achieved with Kth degree polynomial linear regression.
The best accuracy is marked with boldface.

       Actual  K = 1    K = 2    K = 3    K = 4    K = 5    K = 6    K = 7
count  96287   96287    96287    96287    96287    96287    96287    96287
mean   74.96   23.06    23.04    21.54    21.53    21.61    22.05    23.74
std    49.38   41.31    41.31    37.94    38.16    41.39    86.05    596.99
min    12.92   6.38e-4  6.38e-4  1.23e-4  7.12e-3  8.85e-4  5.64e-4  8.59e-4
25%    54.09   8.70     8.70     7.32     7.77     7.75     7.83     7.84
50%    69.22   18.55    18.55    20.66    19.79    20.26    20.25    20.24
75%    91.54   30.14    30.14    27.38    28.11    28.09    28.31    28.18
max    1890.0  1819.8   1819.8   1815.1   1816.1   5126.4   20967    84823
Figure 5.2.2: Polynomial linear regression
Table 5.2.3: Results of polynomial random forest regression. The leftmost column contains the
details for the actual values that the regression is trying to predict; the rest of the columns show
the distribution of the absolute error achieved with Kth degree polynomial random forest
regression. The best accuracy is marked with boldface.

       Actual   K = 2      K = 3      K = 4
count  365655   365655     365655     365655
mean   57.48    0.7180     0.6443     0.6670
std    31.51    7.132      7.129      7.431
min    6.30     0          0          0
25%    36.06    2.487e-14  2.132e-14  1.421e-14
50%    58.00    8.527e-14  7.816e-14  7.105e-14
75%    70.00    0.167      0.094      0.079
max    1260.00  1134       1144       1145
5.2.3 Random Forest Regression
The result of performing Kth degree polynomial random forest regression on the metrics
to predict container_power can be found in Table 5.2.3 for K ∈ [2..4]. There were some
issues with displaying the results of K > 4, and since the performance beyond that point
degraded for each K it was decided to leave those values out of the report.
The table shows that in general, K = 3 yielded the lowest mean absolute error, and
the best overall accuracy. The graphs for the K = 3 polynomial random forest regression
can be found in Figure 5.2.3.
The results for the Polynomial Random Forest Regression were very good, with an
error percentage of 1.10%. To validate these findings on new data, test data was
collected from 3 new days and the trained model was exposed to that data instead.
Sadly, the model performed quite poorly in that case, with an error percentage of 26.48%.
Figure 5.2.3: Polynomial Random Forest Regression
5.3 Genetic Algorithm

This section shows the results of the Genetic Algorithm described in Section 4.5. It was
possible to get the Genetic Algorithm to converge at a mean absolute training error of
around 68, which represents an error of around 89%. After the change was made to the
individuals to use PolynomialFeatures, the algorithm converged at an error of around
28%, though with very small differences between the individual predictions.
In Figure 5.3.1, the training accuracy per generation is shown. Due to the large initial
error the curve is hard to visualize, but the decrease in error was quite gradual, and the
best level of accuracy was achieved around generation 1000. This figure is representative
of both representations of individuals.
Figure 5.3.1: The training accuracy per generation when running the Genetic Algorithm. Due
to the large error in the beginning it is very hard to visualize, but the decrease in error was quite
gradual, and the best level of accuracy was achieved around generation 1000.
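As a toy illustration of the general idea (a numpy-only sketch, not the thesis implementation), an evolutionary loop of this kind evolves candidate model coefficients against a mean-absolute-error fitness, keeping an elite and producing offspring via crossover and Gaussian mutation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: power as a linear function of three metrics plus noise.
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=500)

def fitness(pop):
    """Mean absolute error of each individual's linear prediction (lower is better)."""
    preds = X @ pop[:, :3].T + pop[:, 3]              # shape (500, n_individuals)
    return np.mean(np.abs(preds - y[:, None]), axis=0)

pop = rng.normal(size=(50, 4))                        # 3 weights + 1 bias per individual
for generation in range(200):
    errs = fitness(pop)
    elite = pop[np.argsort(errs)[:10]]                # keep the 10 best individuals
    # Offspring: coordinate-wise crossover of random elite parents, then mutation.
    parents = elite[rng.integers(0, 10, size=(40, 2))]
    mask = rng.random(size=(40, 4)) < 0.5
    children = np.where(mask, parents[:, 0], parents[:, 1])
    children += rng.normal(scale=0.05, size=children.shape)
    pop = np.vstack([elite, children])

best = pop[np.argmin(fitness(pop))]
print("best MAE:", fitness(pop).min())
```

With a purely linear individual the best such a search can do is recover a linear regression, which matches the observation above that the Genetic Algorithm behaved like one.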
5.4 Artificial Neural Networks

This section contains the results for the Artificial Neural Networks. Section 5.4.1 shows
the results for the basic ANN described in Section 4.6.1 and Section 5.4.2 contains the
results for the LSTM approach detailed in Section 4.6.2.
5.4.1 Basic ANN
Training did not behave as expected: the training predictions improved at a much slower
rate than the validation predictions, as can be seen in Figure 5.4.1. It is believed that
this is due to a bug somewhere in the model, and locating that bug is part of the next
steps. The accuracy after 9900 epochs can be found in Figure 5.4.2.
The lowest error was found in Epoch 9995 (out of 10000 total epochs). The error kept
shrinking, but any tests run yielded very poor results. Note that this inaccuracy is
probably due to the training/validation issues described earlier.
Figure 5.4.1: The training and validation errors over the course of the training. They show
that, unexpectedly, validation errors are lower than training errors.
Figure 5.4.2: The training and validation predictions compared to the actual values of the
container_power. Note: The values on the x-axis are indexes, not epochs.
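A common way to rule out a data-handling bug of this kind is to sanity-check the split with a model known to behave well. A sketch (synthetic data, Ridge as a stand-in regressor):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=2000)

# Hold out a fixed validation set *before* any fitting, so the two
# error curves are computed on disjoint data.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = Ridge().fit(X_tr, y_tr)
train_mae = mean_absolute_error(y_tr, model.predict(X_tr))
val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(train_mae, val_mae)
```

For a healthy setup, training error should not exceed validation error by a wide margin; the reverse pattern in Figure 5.4.1 is what points to a bug in how the sets were fed to the model.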
5.4.2 Long Short-Term Memory
The lowest error was found in Epoch 3056, where the mean absolute error was around 70,
which is an error of almost 100%. Figure 5.4.3 shows the result of the prediction in
Epoch 3000.
Figure 5.4.3: The training and validation predictions at epoch 3000, compared to the actual
container_power. The values on the y-axis are the container_power in Watt-Hours, with the
indexes on the x-axis.
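As background for this setup, an LSTM consumes fixed-length windows of the series rather than individual rows. A small numpy sketch (hypothetical window length and feature count) of turning a metric trace into (samples, timesteps, features) arrays:

```python
import numpy as np

def make_windows(series: np.ndarray, window: int):
    """Slice a (T, features) trace into (T - window, window, features) inputs,
    using the value immediately after each window as the target."""
    X = np.stack([series[i : i + window] for i in range(len(series) - window)])
    y = series[window:, 0]        # predict the first feature (e.g. power)
    return X, y

trace = np.random.default_rng(7).normal(size=(100, 4))   # 100 steps, 4 metrics
X, y = make_windows(trace, window=10)
print(X.shape, y.shape)   # (90, 10, 4) (90,)
```

The resulting arrays can then be fed to an LSTM layer, which expects exactly this three-dimensional shape.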
5.5 Reinforcement Learning

This section shows the results of the Reinforcement Learning approach outlined in
Section 4.7. Unfortunately, due to the issues discussed in Section 3.2.4, no converging model
using Reinforcement Learning was obtained. Please refer to Figure 5.5.1 for the accuracy
over time when running the Reinforcement Learning. Due to time constraints and since
it was suspected from the beginning that the methodology was unsuited for the task, it
was decided to not continue improving the model beyond this point.
Figure 5.5.1: Results of the Reinforcement Learning per Epoch. As can be clearly seen on the
graph, the result did not converge towards a low absolute error.
CHAPTER 6
Evaluation
This chapter contains the evaluation of the results. The criteria for the evaluation can be
found in Section 1.3. Section 6.1 contains a qualitative comparison between the models
tried in this thesis and their results. Section 6.2 then evaluates the effectiveness of the
best model against the state of the art.
6.1 Qualitative Model Comparison

This section discusses the advantages and disadvantages of each of the approaches tried
throughout the project. The approaches are then scored according to prediction accuracy,
prediction time and overall potential. An evaluation of each of these properties can be
found in Sections 6.1.1 – 6.1.3.
6.1.1 Prediction Accuracy
Prediction accuracy is the most important property. The spectrum of success was very
wide, with a number of solutions either not converging at all or converging at very poor
estimates (more than 100% error). These included the Neural Network and Reinforcement
Learning approaches. The Neural Network approach was not studied very extensively as
part of the project, so it is possible that there are more gains to be had there.
Among the working solutions, the Genetic Algorithm never amounted to more than a
glorified linear regression, potentially due to the similarities between the approaches. It
achieved an error of around 23%, but the characteristics of the predictions were such that
the algorithm was more or less making the same guess every time, which is not ideal since
the goal was to learn from the parameters.
Then there were the Bayesian Inference methods. It should be said up front that these
outperformed the rest when it came to prediction accuracy, with Polynomial Random
Forest Regression clearly being the winner, though the difference is much less clear when
running the model on completely new test data.
The Bayesian Inference methods are believed to have performed best because the approach
is more straightforward: since they work in a purely statistical manner, the risk of
overfitting is reduced. Though the LSTM-based approaches proved unsuccessful in this
project, it is believed that with enough experimentation they could potentially be
leveraged to achieve even better results.
6.1.2 Prediction Time
This section deals only with how quickly a model reaches its optimal prediction strength.
This is important since the goal of the project is related to green energy, and it therefore
behooves the algorithm to be energy efficient as well.
Due to the train/validation issue described in Section 5.4.1, the basic ANN model never
converged (within 10000 epochs), probably because it only ever considered the training set.
The slowest algorithm of them all was by far the LSTM approach, which would converge
after roughly three days of computation (Running on a Ubuntu 18.04 VM, 16GB RAM).
Then came the Genetic Algorithm which was optimized using the techniques described
in Section 4.5.4 and after that would converge in around 12 hours.
The Polynomial Random Forest Regression also took quite some time to compute,
roughly 8 hours on a dataset of around 3.5 million points. The Polynomial Linear
Regression was a lot faster, taking only around 20 minutes. The fastest algorithms were the
Simple Linear Regression and the Reinforcement Learning approach (though the latter
converged to an error of around 4285%, and it is debatable whether that counts as
convergence).
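Build times like those reported here can be captured uniformly with a timer around the fitting step. A minimal sketch (synthetic data, scikit-learn's LinearRegression as a stand-in for any of the models):

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(10000, 5))
y = X @ rng.normal(size=5)

# Time only the model build, not data loading or preprocessing.
start = time.perf_counter()
model = LinearRegression().fit(X, y)
build_seconds = time.perf_counter() - start
print(f"build time: {build_seconds:.3f} s")
```

Wrapping each candidate model's fit call in the same timer makes the build-time column of Table 6.1.1 directly comparable across approaches.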
Table 6.1.1: The attempted approaches and their outcomes.
Name Build Time Mean Absolute Error
Linear Regression 10 seconds 29.46%
Polynomial Linear Regression 20 minutes 28.72%
Polynomial Random Forest 8 hours 26.48%
Genetic Algorithm 12 hours Around 28%
Basic ANN Didn’t converge —
LSTM 3 days Almost 100%
Reinforcement Learning Didn’t converge —
6.1.3 Summary
From Sections 6.1.1 – 6.1.2 and as shown in Table 6.1.1, we see that the Polynomial
Random Forest Regression was the most accurate of the solutions, but that it only
outperformed the other regression approaches by percentage points when run against the
final test set.
It can thus be concluded that if the best absolute prediction is wanted, the Polynomial
Random Forest Regression is recommended, and that the Polynomial Linear Regression
can be seen as a happy medium between model build time and prediction accuracy.
6.2 Effectiveness

The best Mean Absolute Error was achieved using Polynomial Random Forest Regression
with K = 3, and had an error of 26.48% when run against test data gathered after the
model was built. This is not even close to the state of the art as found in Section 2.4.
Some ideas as to why the results differ so much from those of previous attempts are listed
below.
• None of the papers studied released the entire data trace they were working from,
so validating the models against their findings can only be subjective.
• Most other papers focused on making predictions on data center peak power per-
formance. It could be that the sparse utilization of the data centers used in this
report leads to metrics that are harder to predict.
• Different papers used different ways of measuring errors (e.g. absolute error, relative
error to a previous solution, etc.).
• There are other factors that are not included in the data set (e.g. GPU utilization,
server room temperature).
• Due to the coarse granularity of some of the measurements, container averages are
used for power and network utilization. This could lead both to imprecision and
overfitting on the averages.
Regardless of how the predictions compare to the state of the art, they can still serve
as a motivator to help users be more thoughtful about their cloud usage, which was the
goal all along.
CHAPTER 7
Discussion
This chapter discusses the impact of the results, in addition to what was covered in
Section 6.2. Section 7.1 discusses the research question and to what extent it was
answered. Section 7.2 touches on the comparison between this project and previous work.
Section 7.3 then discusses how useful the findings are to the end user.
7.1 Research Question

Throughout the project, the research question has guided all activities towards finding
machine learning approaches that optimize the prediction of energy consumption. The goal
was to determine whether such algorithms would help optimize prediction accuracy.
It can be concluded that the answer to the question with regards to the dataset used
throughout the project is that yes, machine learning methodologies are helpful when
predicting power consumption. Currently the best approach is Polynomial Random Forest
Regression, but it is expected that even better approaches could be found with further
study.
7.2 Comparison with Previous Work

When studying the previous work performed by others, it became clear that no consensus
existed as to what constitutes a good prediction. This is understandable, since every
dataset is subject to the conditions in the cloud center where it was gathered: the number
of unknowns is too large at this stage for a complete model that would hold for all data
centers, locations, workloads and schedulers.
The discrepancy between the state of the art predictions in the papers and the pre-
dictions achieved in this thesis can thus be explained by the different conditions under
which this data center was operating (Sparse Load, no GPU measurements etc.). For
more on this, see Section 6.2.
7.3 Usefulness of the Findings

Through the work performed during the thesis, it could be shown that there is a correlation
between user behavior and container energy consumption. In general, however, the
prediction models tend to reflect other patterns in the data in addition to the user's
impact, making it difficult to give detailed guidance with regards to a user's
environmental footprint. Thus, at this point, the research should be used mainly to raise
awareness and to motivate the user to think about the environmental impact of their cloud
usage.
Given that the prediction results will probably keep improving as more work is performed
in this and adjacent fields, and as the climate crisis unfolds, the relevance of the
concept to user behavior is expected to keep increasing.
CHAPTER 8
Conclusion and Future Work
This chapter summarizes the thesis by outlining what was accomplished and what re-
search is yet to be done. The conclusion of the project can be found in Section 8.1 and
the notes on future work can be found in Section 8.2.
8.1 Conclusion

In this thesis the problem domain of cloud center energy consumption analysis and
estimation has been researched and discussed. The research question was whether the
application of machine learning algorithms would help optimize prediction accuracy.
Various models have been built to perform such predictions.
The state of the art was found to be subjective based on the dataset, and no consensus
exists as to what dataset to use as a baseline when making comparisons. The best results
in this project were found using a Polynomial Random Forest Regression algorithm with
3 as the degree of the polynomial expansion. On the test set, that approach achieved a
mean absolute error of 26.48%.
It is concluded that the prediction accuracy can be useful in the use case described in
this thesis, as an educative estimate of what metrics of a cloud job have what impact
on power consumption. In addition, the literature review, data analysis and initial
modeling work could be useful in future studies (see Section 8.2).
8.2 Future Work

This research could be taken in many different directions, in order to gain an even greater
understanding of the relationship between customer workloads and energy consumption,
as well as to be able to forecast future energy consumption based on predicted workloads.
The data in this thesis was obtained mainly from the RISE data center in Lulea. That
data center was seeing quite sparse usage and was thus operating far from peak
performance. This could have a huge impact on the nature of the results, and the
relationship between the load on the data center as a whole and the impact of each
container needs to be investigated further.
It is believed that the user’s behavior impacts the data center’s energy consumption,
but that impact can easily be drowned out by other, more powerful trends in the data.
Further studies are needed to develop heuristics to make sure that it is the user’s impact
that is being predicted, and not factors beyond their control.
The heuristics used throughout this thesis for splitting power consumption between
containers were very straightforward. It is believed that better heuristics exist, and
that the right heuristics could provide valuable insights into data center dynamics.
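As an example of the kind of straightforward heuristic referred to here (a hypothetical sketch, not necessarily the one used in the thesis), a node's measured power can be attributed to its containers in proportion to their CPU usage:

```python
import numpy as np

def split_node_power(node_power: float, container_cpu: np.ndarray) -> np.ndarray:
    """Attribute a node's measured power to its containers in proportion to
    each container's CPU usage (a deliberately simple heuristic)."""
    shares = container_cpu / container_cpu.sum()
    return node_power * shares

# A 200 W node hosting three containers with CPU usages 2.0, 1.0 and 1.0.
watts = split_node_power(200.0, np.array([2.0, 1.0, 1.0]))
print(watts)  # [100.  50.  50.]
```

A better heuristic might also account for the node's idle power draw, which this proportional split silently attributes to the busiest containers.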
There is more work to be done with regards to fine tuning the models and finding even
better ways of predicting the energy consumption. This work can be seen as a starting
point in that endeavor.
Another interesting angle to study would be what kind of prompts would be most
helpful to motivate users to improve the environmental footprint of their cloud usage.
This could be done in the form of user research, as well as through looking at adjacent
attempts to help people be more environmentally aware.
This project was all about predicting energy consumption based on available metrics.
An interesting follow-up study would be to try to project the behavior of the cloud
into the future. The research performed in this thesis could help ensure that one does
not have to introduce power measuring devices for each node in each data center where
such forecasting is to be done.
If those metrics can be forecast (given a certain user workload and the time at which it
is submitted to the data center), it would be possible to give even better predictions
to the user, based on the most recent data for that node. Such a system would have to be
easy to use in order to invite active participation.
Bibliography
[1] R. Shaw, E. Howley, and E. Barrett, “An advanced reinforcement learning approach
for energy-aware virtual machine consolidation in cloud data centers,” 2017 12th In-
ternational Conference for Internet Technology and Secured Transactions (ICITST),
2017.
[2] Y. Qiu, C. Jiang, Y. Wang, D. Ou, Y. Li, and J. Wan, “Energy aware virtual machine
scheduling in data centers,” Energies, vol. 12, no. 4, p. 646, 2019.
[3] A. Jaiantilal, Y. Jiang, and S. Mishra, “Modeling cpu energy consumption for energy
efficient scheduling,” Proceedings of the 1st Workshop on Green Computing - GCM
10, 2010.
[4] A. Hindle, “Green software engineering: The curse of methodology,” 2016 IEEE
23rd International Conference on Software Analysis, Evolution, and Reengineering
(SANER), 2016.
[5] N. S. Chauhan and A. Saxena, “A green software development life cycle for cloud
computing,” IT Professional, vol. 15, no. 1, pp. 8–34, 2013.
[6] L. Ardito, G. Procaccianti, M. Torchiano, and A. Vetro, “Understanding green soft-
ware development: A conceptual framework,” IT Professional, vol. 17, no. 1, pp.
44–50, 2015.
[7] J. L. Berral, Inigo Goiri, R. Nou, F. Julia, J. Guitart, R. Gavalda, and J. Torres, “To-
wards energy-aware scheduling in data centers using machine learning,” Proceedings
of the 1st International Conference on Energy-Efficient Computing and Networking
- e-Energy 10, 2010.
[8] J. L. Berral, R. Gavalda, and J. Torres, “Adaptive scheduling on power-aware man-
aged data-centers using machine learning,” 2011 IEEE/ACM 12th International
Conference on Grid Computing, 2011.
[9] G. Portaluri, D. Adami, A. Gabbrielli, S. Giordano, and M. Pagano, “Power
consumption-aware virtual machine placement in cloud data center,” IEEE Trans-
actions on Green Communications and Networking, vol. 1, no. 4, pp. 541–550, 2017.
[10] Q. Fang, J. Wang, and Q. Gong, “Qos-driven power management of data centers
via model predictive control,” IEEE Transactions on Automation Science and En-
gineering, vol. 13, no. 4, pp. 1557–1566, 2016.
[11] H. He and H. Shen, “Green-aware online resource allocation for geo-distributed cloud
data centers on multi-source energy,” 2016 17th International Conference on Parallel
and Distributed Computing, Applications and Technologies (PDCAT), 2016.
[12] J. V. Wang, C.-T. Cheng, and C. K. Tse, “A power and thermal-aware virtual
machine allocation mechanism for cloud data centers,” 2015 IEEE International
Conference on Communication Workshop (ICCW), 2015.
[13] D.-M. Zhao, J.-T. Zhou, and K. Li, “An energy-aware algorithm for virtual machine
placement in cloud computing,” IEEE Access, vol. 7, pp. 55 659–55 668, 2019.
[14] A. Radhakrishnan and K. Saravanan, “Energy aware resource allocation model for
iaas optimization,” Studies in Big Data Cloud Computing for Optimization: Foun-
dations, Applications, and Challenges, pp. 51–71, 2018.
[15] I. Kar, R. R. Parida, and H. Das, “Energy aware scheduling using genetic algorithm
in cloud data centers,” 2016 International Conference on Electrical, Electronics, and
Optimization Techniques (ICEEOT), 2016.
[16] S. Javed, W. Manzoor, N. Akhtar, and D. K. Zafar, “Optimization of resource
allocation scheduling in cloud computing by genetic algorithm,” vol. 2, no. 1, 2013.
[17] X. Zhou, K. Wang, W. Jia, and M. Guo, “Reinforcement learning-based adaptive
resource management of differentiated services in geo-distributed data centers,” 2017
IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), 2017.
[18] V. R. Rajarathinam, J. Rajarathinam, and H. Gupta, “Power-aware meta scheduler
with non-linear workload prediction for adaptive virtual machine provisioning,” In-
telligent Computing Theory Lecture Notes in Computer Science, pp. 826–837, 2014.
[19] K. Qazi, Y. Li, and A. Sohn, “Workload prediction of virtual machines for har-
nessing data center resources,” 2014 IEEE 7th International Conference on Cloud
Computing, 2014.
[20] F. Ramezani and M. Naderpour, “A fuzzy virtual machine workload prediction
method for cloud environments,” 2017 IEEE International Conference on Fuzzy
Systems (FUZZ-IEEE), 2017.
[21] P. S. L. Kalyampudi, P. V. Krishna, S. Kuppani, and V. Saritha, “A work load
prediction strategy for power optimization on cloud based data centre using deep
machine learning,” Evolutionary Intelligence, 2019.
[22] Q. Zhang, L. T. Yang, Z. Yan, Z. Chen, and P. Li, “An efficient deep learning
model to predict cloud workload for industry informatics,” IEEE Transactions on
Industrial Informatics, vol. 14, no. 7, pp. 3170–3178, 2018.
[23] F. Nwanganga and N. Chawla, “Using structural similarity to predict future work-
load behavior in the cloud,” 2019 IEEE 12th International Conference on Cloud
Computing (CLOUD), 2019.
[24] W. Ding, F. Luo, C. Gu, H. Lu, and Q. Zhou, “Performance-to-power ratio aware
resource consolidation framework based on reinforcement learning in cloud data
centers,” IEEE Access, pp. 1–1, 2020.
[25] D. Meisner and T. F. Wenisch, “Peak power modeling for data center servers with
switched-mode power supplies,” Proceedings of the 16th ACM/IEEE international
symposium on Low power electronics and design - ISLPED 10, 2010.
[26] G. Dhiman, K. Mihic, and T. Rosing, “A system for online power prediction in
virtualized environments using gaussian mixture models,” Proceedings of the 47th
Design Automation Conference on - DAC 10, 2010.
[27] M. Dayarathna, Y. Wen, and R. Fan, “Data center energy consumption modeling:
A survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 732–794,
2016.
[28] M. Canuto, R. Bosch, M. Macias, and J. Guitart, “A methodology for full-system
power modeling in heterogeneous data centers,” Proceedings of the 9th International
Conference on Utility and Cloud Computing - UCC 16, 2016.
[29] A. Borghesi, A. Bartolini, M. Lombardi, M. Milano, and L. Benini, “Predictive
modeling for job power consumption in hpc systems,” Lecture Notes in Computer
Science High Performance Computing, pp. 181–199, 2016.
[30] Y. Li, H. Hu, Y. Wen, and J. Zhang, “Learning-based power prediction for data
centre operations via deep neural networks,” Proceedings of the 5th International
Workshop on Energy Efficient Data Centres - E2DC 16, 2016.
[31] J. V. Kistowski, M. Schreck, and S. Kounev, “Predicting power consumption in vir-
tualized environments,” Computer Performance Engineering Lecture Notes in Com-
puter Science, pp. 79–93, 2016.
[32] N. Liu, X. Lin, and Y. Wang, “Data center power management for regulation service
using neural network-based power prediction,” 2017 18th International Symposium
on Quality Electronic Design (ISQED), 2017.
[33] M. Ferroni, A. Corna, A. Damiani, R. Brondolin, J. A. Colmenares, S. Hofmeyr, J. D.
Kubiatowicz, and M. D. Santambrogio, “Power consumption models for multi-tenant
server infrastructures,” ACM Transactions on Architecture and Code Optimization,
vol. 14, no. 4, pp. 1–22, 2017.
[34] A. Rayan and Y. Nah, “Energy-aware resource prediction in virtualized data centers:
A machine learning approach,” 2018 IEEE International Conference on Consumer
Electronics - Asia (ICCE-Asia), 2018.
[35] Y.-F. Hsu, K. Matsuda, and M. Matsuoka, “Self-aware workload forecasting in data
center power prediction,” 2018 18th IEEE/ACM International Symposium on Clus-
ter, Cloud and Grid Computing (CCGRID), 2018.
[36] K. N. Khan, S. Scepanovic, T. Niemi, J. K. Nurminen, S. V. Alfthan, and O.-P.
Lehto, “Analyzing the power consumption behavior of a large scale data center,”
SICS Software-Intensive Cyber-Physical Systems, vol. 34, no. 1, pp. 61–70, 2018.
[37] S. V, A. M, C. D. H, G. M. Chethana, and K. S, “A weighted ensemble of auto-
matic algorithms for virtual machine performance prediction in cloud,” International
Journal of Current Engineering and Scientific Research, vol. 6, no. 6, pp. 198–203,
2019.
[38] D. Yi, X. Zhou, Y. Wen, and R. Tan, “Toward efficient compute-intensive job allo-
cation for green data centers: A deep reinforcement learning approach,” 2019 IEEE
39th International Conference on Distributed Computing Systems (ICDCS), 2019.
[39] J. V. Kistowski, J. Grohmann, N. Schmitt, and S. Kounev, “Predicting server power
consumption from standard rating results,” Proceedings of the 2019 ACM/SPEC
International Conference on Performance Engineering - ICPE 19, 2019.
[40] D. Yi, X. Zhou, Y. Wen, and R. Tan, “Efficient compute-intensive job allocation in
data centers via deep reinforcement learning,” IEEE Transactions on Parallel and
Distributed Systems, pp. 1–1, 2020.
[41] T. Bayes, “An essay towards solving a problem in the doctrine of chances,”
Philosophical Transactions of the Royal Society of London, vol. 53, pp. 370–418, 1763.
[42] J. H. Holland, “Outline for a logical theory of adaptive systems,” Journal of the
ACM (JACM), vol. 9, no. 3, pp. 297–314, 1962.
[43] C. Darwin, “On the origin of species by means of natural selection, or, the preser-
vation of favoured races in the struggle for life,” 1859.
[44] S. Gu, T. P. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep
q-learning with model-based acceleration,” CoRR, 2016. [Online]. Available:
http://arxiv.org/abs/1603.00748
[45] S. Gu, “Sample-efficient deep reinforcement learning for continuous control,” Ph.D.
dissertation.
[46] “Rancher – run kubernetes everywhere.” [Online]. Available: https://rancher.com/
(Accessed 2020-03-02).
[47] “Kubernetes — production-grade container orchestration.” [Online]. Available:
https://kubernetes.io/ (Accessed 2020-03-02).
[48] “Prometheus — from metrics to insight.” [Online]. Available: https://prometheus.
io/ (Accessed 2020-03-02).
[49] “Weather underground.” [Online]. Available: https://www.wunderground.com/
about/data (Accessed 2020-04-13).
[50] “scikit-learn - machine learning in python.” [Online]. Available: https:
//scikit-learn.org/stable/ (Accessed 2020-03-30).
[51] J. Salvatier, T. Wiecki, and C. Fonnesbeck, “Probabilistic programming in python
using pymc3,” 01 2016.
[52] “Getting started with pymc3.” [Online]. Available: https://docs.pymc.io/
notebooks/getting_started.html (Accessed 2020-02-25).
[53] “General api quickstart.” [Online]. Available: https://docs.pymc.io/notebooks/
api_quickstart.html (Accessed 2020-02-25).
[54] “Probabilistic programming & bayesian methods for hack-
ers.” [Online]. Available: https://camdavidsonpilon.github.io/
Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/ (Accessed 2020-
02-25).
[55] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, and C. Gagne,
“DEAP: Evolutionary algorithms made easy,” Journal of Machine Learning Re-
search, vol. 13, pp. 2171–2175, Jul. 2012.
[56] “Understanding lstm networks.” [Online]. Available: https://colah.github.io/posts/
2015-08-Understanding-LSTMs/ (Accessed 2020-02-26).
[57] Chandler, “A pytorch example to use rnn for financial prediction.” [Online].
Available: https://chandlerzuo.github.io/blog/2017/11/darnn (Accessed 2020-02-
26).
[58] Ikostrikov, “Reimplementation of continuous deep q-learning with model-based
acceleration and continuous control with deep reinforcement learning,” Jan. 2020.
[Online]. Available: https://github.com/ikostrikov/pytorch-ddpg-naf (Accessed
2020-03-02).