Predicting Container-Level Power
Consumption in Data Centers using
Machine Learning Approaches
Rasmus Bergström
Computer Science and Engineering, master's level
2020
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
ABSTRACT
Due to the ongoing climate crisis, reducing waste and carbon emissions has become a hot
topic in many fields of study. Cloud data centers account for a large portion of the world’s
energy consumption. In this work, methodologies are developed using machine learning
algorithms to improve prediction of the energy consumption of a container in a data
center. The goal is to share this information with the user ahead of time, so that they
can make educated decisions about their environmental footprint.
This work differentiates itself through its sole focus on optimizing prediction, as opposed
to other approaches in the field, where energy modeling and prediction have been studied
as a means of building advanced scheduling policies in data centers.
In this thesis, a qualitative comparison between various machine learning approaches to
energy modeling and prediction is put forward. These approaches include Linear, Poly-
nomial Linear and Polynomial Random Forest Regression as well as a Genetic Algorithm,
LSTM Neural Networks and Reinforcement Learning.
The best results were obtained using the Polynomial Random Forest Regression, which
produced a Mean Absolute Error of 26.48% when run against data center metrics
gathered after the model was built. This prediction engine was then integrated into a
Proof of Concept application as an educative tool to estimate what metrics of a cloud
job have what impact on the container power consumption.
PREFACE
This work was performed at Xarepo AB. It was made possible by access to
anonymized data from the Oulu and RISE Luleå data centers, as part of the cooperation
within the ArctiqDC project, funded by Interreg North.
Special thanks to Marcus Liwicki, Saleha Javed and Olov Schelén for great input and
support throughout the entire duration of the thesis.
CONTENTS
Chapter 1 – Introduction 1
1.1 Research Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Delimitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter 2 – Related Work 5
2.1 Green Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Resource Allocation Policy . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Workload Analysis and Prediction . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Energy Consumption Analysis and Prediction . . . . . . . . . . . . . . . 8
Chapter 3 – Theory 11
3.1 Contributing Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1 Data Center Related . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 Workload-Related . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3 Environment Related . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Candidate Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.1 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.2 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.3 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . 15
3.2.4 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . 15
Chapter 4 – Method 17
4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.1 Metrics from Data Centers . . . . . . . . . . . . . . . . . . . . . . 18
4.1.2 Weather Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 General Model Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.1 PyMC3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.3 Polynomial Linear Regression . . . . . . . . . . . . . . . . . . . . 23
4.4.4 Polynomial Random Forest Regression . . . . . . . . . . . . . . . 23
4.5 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.5.1 Individuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.5.2 Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.5.3 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.5.4 Mutation & Crossover . . . . . . . . . . . . . . . . . . . . . . . . 25
4.6 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.6.1 Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.6.2 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . 28
4.7 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.8 Proof of Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter 5 – Results 31
5.1 Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.1 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.2 Polynomial Linear Regression . . . . . . . . . . . . . . . . . . . . 35
5.2.3 Random Forest Regression . . . . . . . . . . . . . . . . . . . . . . 38
5.3 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.4 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4.1 Basic ANN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4.2 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . 43
5.5 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Chapter 6 – Evaluation 45
6.1 Qualitative Model Comparison . . . . . . . . . . . . . . . . . . . . . . . . 45
6.1.1 Prediction Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.1.2 Prediction Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.2 Effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Chapter 7 – Discussion 49
7.1 Research Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7.2 Comparison with Previous Work . . . . . . . . . . . . . . . . . . . . . . . 50
7.3 Usefulness of the Findings . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Chapter 8 – Conclusion and Future Work 51
8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
FIGURES
Chapter 1 – Introduction 1
Chapter 2 – Related Work 5
2.4.1 Power consumption prediction results shown in papers over the last decade. 8
Chapter 3 – Theory 11
Chapter 4 – Method 17
4.1.1 Python string containing a PromQL query . . . . . . . . . . . . . . . . . 18
4.1.2 PromQL query expressed using the Python adapter . . . . . . . . . . . 18
4.3.1 General model setup, the model should find a relationship between the
metrics and the power consumption. . . . . . . . . . . . . . . . . . . . . 21
4.5.1 The first version of an individual in the Genetic Algorithm. b is the bias, n
is the number of degrees in the polynomial, m is the number of parameters. 24
4.6.1 The architecture schema for the basic ANN model. . . . . . . . . . . . . 27
4.6.2 An overview of the Encoder-Decoder LSTM chain. . . . . . . . . . . . . 28
4.7.1 How NAF updates parameters. . . . . . . . . . . . . . . . . . . . . . . . 29
4.7.2 The main loop of the Reinforcement Learning approach. . . . . . . . . . 29
Chapter 5 – Results 31
5.1.1 The raw data obtained from the data center. It is difficult to detect
correlations by eye. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.1.2 Principal Component Analysis of container_power with 6 parameters. . 33
5.1.3 Correlation matrix of the parameters and output. The stronger corre-
lation between the power consumption and the network metrics could
partly be resulting from the fact that both are averaged node metrics. . 34
5.2.1 Test sample accuracy of simple linear regression between container_cpu_seconds
and average node_power per container. . . . . . . . . . . . . . . . . . . 36
5.2.2 Polynomial linear regression . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2.3 Polynomial Random Forest Regression . . . . . . . . . . . . . . . . . . . 39
5.3.1 The training accuracy per generation when running the Genetic Algo-
rithm. Due to the large error in the beginning it is very hard to visualize,
but the decrease in error was quite gradual, and the best level of accuracy
was achieved around generation 1000. . . . . . . . . . . . . . . . . . . . 40
5.4.1 The training and validation errors over the course of the training. They
show that, unexpectedly, validation errors are lower than training errors. 41
5.4.2 The training and validation predictions compared to the actual values of
the container_power. Note: The values on the x-axis are indexes, not
epochs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.4.3 The training and validation predictions at epoch 3000, compared to the
actual container_power. The values on the y-axis are the container_power
in Watt-Hours, with the indexes on the x-axis. . . . . . . . . . . . . . . 43
5.5.1 Results of the Reinforcement Learning per Epoch. As can be clearly seen
on the graph, the result did not converge towards a low absolute error. . 44
Chapter 6 – Evaluation 45
Chapter 7 – Discussion 49
Chapter 8 – Conclusion and Future Work 51
TABLES
Chapter 1 – Introduction 1
Chapter 2 – Related Work 5
Chapter 3 – Theory 11
3.1.1 Data center metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 Workload metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3 Environment metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Chapter 4 – Method 17
4.1.1 Data center metric granularity . . . . . . . . . . . . . . . . . . . . . . . 19
Chapter 5 – Results 31
5.2.1 Prediction accuracy of linear regression between container_cpu_seconds
and container_power . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.2 Results of polynomial linear regression. The leftmost column contains
the details for the actual values that the regression is trying to predict,
the rest of the columns show the distribution of the absolute error
achieved with Kth degree polynomial linear regression. The best accuracy
is marked with boldface. . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.3 Results of polynomial random forest regression. The leftmost column
contains the details for the actual values that the regression is trying to
predict, the rest of the columns show the distribution of the absolute
error achieved with Kth degree polynomial random forest regression. The
best accuracy is marked with boldface. . . . . . . . . . . . . . . . . . . 38
Chapter 6 – Evaluation 45
6.1.1 The attempted approaches and their outcomes. . . . . . . . . . . . . . . 47
Chapter 7 – Discussion 49
Chapter 8 – Conclusion and Future Work 51
CHAPTER 1
Introduction
A large portion of today’s computation is carried out in data centers. Cloud providers
transparently manage the entire infrastructure, from cooling and hardware to virtualiza-
tion and horizontal scaling. While substantially reducing the operational effort required
of the user, the opacity of the underlying resources can lead to a diminished association
between the jobs submitted to the data center on the one hand, and the pollution they
cause on the other.
When browsing the offers of the main cloud vendors it is clear that energy consumption
is not meant to drive the customer’s choice of cloud provider.
Foundational to this work is the belief that people are eager to make a difference in
the fight against the raging climate crisis. With access to advanced models for predicting
and understanding the energy consumption of cloud systems, they could be informed of
the size of their environmental footprint ahead of time. This would empower users to
make educated choices with regards to the green footprint of their technology usage.
Most research in the field [1][2][3] is related to making data centers more energy
efficient: energy modeling and prediction have been studied as a means of building
advanced scheduling policies for data centers. This work differentiates itself
from other research in the field in that it deals solely with improving the prediction
accuracy itself, with the intention of informing users.
This section outlines the goal of the project by stating the research question and break-
ing it down into actionable steps. It also covers practicalities such as the delimitations
and evaluation of the project.
1.1 Research Question

Can the prediction of energy consumption of a job in a data center be optimized using machine learning methodologies?
In the context of this thesis, prediction does not refer to foretelling the power consump-
tion at a different (later) time. Instead, the term refers to estimating the consumption
given a collection of other metrics. This is seen as a necessary step toward later being
able to forecast future energy consumption (see Section 8.2).
The research question can be broken down into the following steps,
1. Select metrics that impact the energy consumption of a container running in a data
center.
2. Figure out how to attain data for these metrics and clean it for use.
3. Develop multiple models to predict the energy consumption of a container based
on the metrics chosen.
4. Evaluate the different models and select the best one.
5. Present the metrics and predictions to the user in a way that enables and motivates
them to reduce the environmental impact of their cloud usage.
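The kind of prediction targeted by steps 3 and 4 can be sketched as fitting a regressor that maps a vector of metrics to a power estimate for the same time window. The sketch below uses synthetic data and scikit-learn; the feature names and the power formula are illustrative assumptions, not the thesis pipeline:

```python
# Sketch of "prediction" as used in this thesis: estimating power from other
# metrics measured at the same time, rather than forecasting ahead in time.
# The data here is synthetic; real features would be container metrics such
# as cpu_seconds and memory_bytes.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
cpu_seconds = rng.uniform(0, 100, 500)
memory_bytes = rng.uniform(0, 1e9, 500)
X = np.column_stack([cpu_seconds, memory_bytes])
# Assumed (synthetic) non-linear relationship between metrics and power.
y = 5.0 + 0.3 * cpu_seconds + 1e-9 * memory_bytes * cpu_seconds \
    + rng.normal(0, 2, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on held-out samples: {mae:.2f} Wh")
```

Evaluating on a held-out split, as above, mirrors the thesis's concern with how a model performs on data gathered after the model was built.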
1.2 Delimitations

This project is focused solely on making accurate predictions, with the goal of helping
users make educated decisions about their cloud energy performance. What follows is a list of
concerns that could be meaningful to explore, but that are considered outside the scope
of the project, with motivations as to why.
• Taking any (scheduling) actions based on the results of the predictions. This is
because the goal is to inform users, not to make a more efficient data center.
• Forecasting energy consumption into the future, since it is a related topic but requires
a different approach.
• Changing the prediction model during run-time, since it would require much more
engineering and it is unclear what advantages it would have.
• Varied climates and seasons, because only data gathered during the course of the
project and from the available data centers will be used.
• User research into what kind of presentation has the greatest impact on people’s
desire to sacrifice comfort and ease-of-use for a better environmental footprint,
since that research would preferably be performed once it is better known what
the nature of possible predictions is.
1.3 Evaluation

Since it was unknown at the beginning of the project whether the desired prediction
accuracy could be derived from the data set, the work was exploratory in nature.
The following points were to be guiding when evaluating the
quality of the findings,
1. A qualitative comparison between the different models attempted in the course of
the project (See Section 6.1).
2. An evaluation of the effectiveness of the model when faced with data that it has
not yet encountered (See Section 6.2).
3. A critical reasoning about how useful the information obtained is to reduce the
environmental footprint of the individual users (See Chapter 7.3).
1.4 Thesis Structure

The chapters of this thesis are structured according to convention, with related work and
theory followed by method, results and evaluation. In the cases where the theory was
considered too short to warrant its own section, it was included directly in the method
section. Since the work itself is split over the following parts,
• Contributing Factors
• Data collection and exploration
• Candidate Models (Ordered by complexity)
many of the chapters follow that same division. In order to help the flow of the report,
these parts appear in the same order in each chapter.
CHAPTER 2
Related Work
The project started with an extensive literature review. Because the problem
statement differs from most related research, it was deemed necessary to cast the net
wide, researching many adjacent problems to get a comprehensive understanding of the
field. The chapter is split into four sections.
First, papers regarding Green Computing were explored. Efforts to discover more green
ways to develop and deploy software are not a new phenomenon. This research was aimed
at discovering to what extent cloud emissions had been explored.
Then, in order to learn more about data centers and the ways energy efficiency is
currently measured and optimized, a large collection of papers suggesting scheduling
optimizations was read. Though some of these papers mention energy-awareness,
they did not suggest actual power modeling schemes. Instead, they were valuable for the
insights they gave into the world of data center energy consumption as a whole.
One big difference between the approach taken in this thesis and those of most other
studies is that in those studies, the user-provided workload is considered an input of
unknown weight. In the case of the problem domain explored in this project the user’s
input could be considered known, since the goal is to inform the user of their own impact.
That said, many efforts have been made to model the expected workload in a data
center at a given time. Though the goal is different, theirs was also a prediction endeavor,
and could therefore provide useful lessons for this project.
Finally, the papers most relevant to this research are covered. These are papers that
deal directly or indirectly with cloud center energy consumption analysis and prediction.
Many of these papers still do not discuss the accuracy of their predictions, since prediction
is seen as a means to improve scheduling algorithms, and those algorithms are the main
purpose of the paper.
2.1 Green Computing
Though energy consumption has been on the minds of hardware manufacturers from
the beginning, it has rarely been a main focus for software. In recent years, the intensifying
climate crisis and the proliferation of cloud computing with its virtualization and con-
tainerization have started to change this mindset. Though some findings have been made,
Hindle [4] has outlined the continuing large need for research in the space.
In an instructive article from 2013, Chauhan et al. [5] introduced a framework for
thinking holistically about green infrastructure throughout an organization. They pos-
tulated that in order to achieve lasting impact in a large corporation, the green mindset
has to be present from requirements and design and all the way to test and deployment.
They also suggested that the customers should be given the ability to hold cloud vendors
accountable by tracking energy consumption limits in the Service-Level Agreements.
Ardito et al. [6] used power profiles from mobile devices to show that more often than
not, performant code is green code. They argue that established practices such as refac-
toring and eliminating dead code can have a large impact on device energy consumption.
2.2 Resource Allocation Policy
The main goal of most energy-aware techniques used in data centers is to improve energy
efficiency by implementing better scheduling and resource allocation. This is a hot topic,
with an explosion of papers over the last decade, all suggesting innovative optimization
techniques.
Of these, many have achieved promising results with regards to energy consumption
by presenting novel algorithms for Virtual Machine (VM) allocation (See Berral et al. [7,
8], Qiu et al. [2], Portaluri et al. [9], Fang et al. [10]) that take the maximum energy
consumption of the Physical Machines (PM) into account when deciding where to put
the VMs. Most of these algorithms operate under the assumption that the best way to
improve energy efficiency is to place the VMs as densely as possible on a subset of the
PMs, so that the rest of the PMs can be turned off completely, thus saving energy.
He et al. [11] decided to also account for the energy price and the potential availability
of renewable energy sources. They registered a 60 % improvement compared with a
previous solution (though theirs is the only paper that refers to that previous solution).
Wang et al. [12] used thermal data in addition to the energy consumption data and were
able to improve energy efficiency, though with a slight rise in SLA violations.
Zhou et al. [13], Radhakrishnan et al. [14], Kar et al. [15] and Javed et al. [16] all
used Genetic Algorithms to improve the energy efficiency of clouds when subjected to
various workloads. The difference between many of those solutions and the subject of
this thesis is that their algorithms change the behavior of the data center in response
to the workloads. Shaw et al. [1] and Zhou et al. [17] used Reinforcement Learning to
allocate resources for optimal energy efficiency, also with good results.
2.3 Workload Analysis and Prediction

One of the main challenges in estimating energy consumption in data centers is the
heterogeneous and dynamic nature of the workloads that the cloud is expected to handle.
Many attempts have been made to accurately and confidently estimate such workloads
in order to optimize resource allocation.
Rajarathinam et al. [18] used a non-linear auto-regressive network with exogenous input
and were able to show that their method was superior to the purely statistical method
they used as reference. Qazi et al. [19] based their approach on Chaos Theory and nearest
neighbors classification in order to develop a framework that allowed them to make fine
grained predictions.
Ramezani et al. [20] applied fuzzy workload prediction and a fuzzy logic system to pre-
dict and control future changes in CPU workload. They were able to predict which PMs
would become hotspots by continuously looking for poor VM performance. Kalyampudi
et al. [21] used a Moving Error Rate to predict the workload of various nodes. They were
able to obtain an average error rate of 6.18 % on data sets from 5 different data centers.
Zhang et al. [22] applied deep learning to predict CPU utilization of VMs, both for the
next hour and the next day. They also discuss ways to speed up training using Polyadic
Decomposition. They were able to obtain a Mean Absolute Percentage Error of 0.26
and a Root Mean Square Error of 9.97 for 60-minute predictions.
Nwanganga et al. [23] classified a given workload according to the nearest neighbor
structural similarity to historical workloads and used those previous workloads to pre-
dict the behavior of the new workload with successful results in some cases, but with
varied support and confidence values. They propose introducing more features into the
specification.
Ding et al. [24] combined Moving Average and Median Absolute Deviation in order to
predict the workload. Sadly they did not focus on revealing the accuracy of that model,
but they said that it was effective relative to other models.
2.4 Energy Consumption Analysis and Prediction
An important component of forecasting is the analysis of past data in order to gain
valuable insights. This section covers energy consumption modeling and estimation.
While there have been numerous papers in that domain, it is worth noting that the field
has improved rapidly over the last decade and that recent results are far better than
older ones. Figure 2.4.1 shows some of the previous results over the last 10 years.
Figure 2.4.1: Power consumption prediction results shown in papers over the last decade.
A large problem when exploring the previous work is the lack of access to benchmark
metric and power consumption traces. Each paper seems to be referring to different data
centers and/or datasets. Most authors state that the traces were from peak data center
performance, which is not the case for the datasets used in the course of this thesis. It
is also not the norm to provide access to the traces for reproduction of the results. This
makes it very difficult to establish what the current state of the art actually is.
Earlier attempts, dating back to 2010, can be found in Meisner et al. [25], who modeled
peak power consumption by characterizing the relationship between server utilization and
power supply behavior. They were able to predict the peak power trace with an error
below 20 %. Meanwhile Dhiman et al. [26], using Gaussian Mixture Vector Quantization,
achieved an average error of less than 10 %.
Jaiantilal et al. [3] used linear as well as random forest regression to model energy con-
sumption for scheduling purposes. They did not explicitly state the error they obtained,
but from their graphs it looks like the random forest regression was more effective.
In 2016, Dayarathna et al. [27] performed an in-depth study of the existing literature
on data center power modeling available at that time, and emphasized taking the entire
data center system into account when modeling energy consumption.
Canuto et al. [28] proposed deriving a single model per platform to account for het-
erogeneity in cloud systems. They surmised that the correlation between certain metrics
and energy consumption will vary between platforms, and used a minimum set of indi-
cators for each platform, based on that correlation. At the time, their results were very
promising.
Borghesi et al. [29] used random forest regression to predict job power consumption in
high-power computing scenarios. They reported that training and predicting went very
fast, with a mean error of around 8–9 % over the entire test period (15 % when including
outliers).
Li et al. [30] used extensive power dynamic profiling, auto-encoders and deep learning
models to try and optimize the accuracy of predictions. They presented two models, one
coarse and one fine-grained, and reported 79 % error reduction for certain cases.
Kistowski et al. [31] used multiple linear regression to show that the power consump-
tion of CPU and storage loads could be predicted with a prediction error of less than
15 % across a number of virtualized environment configurations. They further
introduced a heuristic for pruning workloads to avoid using workloads that may lead to
a decrease in prediction accuracy.
Liu et al. [32] used an LSTM-based approach, landing at a mean absolute error rate of
4.42 % on data center power consumption. Ferroni et al. [33] used a divide and conquer
approach to model power consumption of heterogeneous data centers. They were able to
achieve a relative error of 2 % on average and under 4 % in almost all cases. Instead of
building one comprehensive model they identified distinct working states of the system
and built a model for each of them.
Rayan et al. [34] used polynomial regression to predict power consumption as well as
the number of physical machines needed, all based on the daily workload. They did not
share numbers for the error but the graphs seemed to show good results.
Hsu et al. [35] made a feature selection from over 4000 operational trace data variables
and ran them through a non-linear auto-regressive exogenous model. They used sliding
windows and validation data sets for model building and were able to achieve a mean
squared error of 1.13 %.
Khan et al. [36] studied node power consumption and discussed approaches to future
estimation. They covered vast amounts of log data with statistical and machine learning
analysis and were able to estimate plug energy consumption with a mean absolute error
rate of 1.97 %. They found that the biggest impact came from failed jobs, as well as
from the CPU and Memory metrics.
Patil et al. [37] suggested forwarding an ensemble of base predictors (Exponential
Smoothing, Auto-Regressive Integrated Moving Average, Nonlinear Neural Network
and Trigonometric Box-Cox Auto-Regressive Moving Average Trend Seasonal Model) to
a fuzzy neural network with self-adjusting learning rate and momentum weight.
Yi et al. [38] used two LSTMs in tandem to predict the temperature and energy con-
sumption of the processor in the next step of their resource allocation algorithm. They
found that a single LSTM yielded inferior prediction accuracy. With the tandem ap-
proach they achieved a root mean square error of 3 %.
Kistowski et al. [39] introduced an off-line power prediction method that used the
results of standard power rating tools. They used a selection of four different formalisms,
from which they attempted to automatically select the best one. They were able to
achieve an average error of 9.49 % for three workloads running on real-world, physical
servers.
Yi et al. [40] showed that deep reinforcement learning can be effective when allocating
compute-intensive jobs in data centers. They used an expectation maximization algorithm
to construct a Gaussian mixture model. They found that constructing separate LSTM
networks for each of the clusters led to a higher prediction accuracy.
CHAPTER 3
Theory
In this section the supporting theory is put forth. It builds heavily on the research con-
ducted in Chapter 2 and on other sources. Section 3.1 is concerned with a walk-through
of the various data metrics that could be important for the project, while Section 3.2
outlines the theory behind the various methodologies used to perform data analysis and
prediction.
Due to the lack of related work in exactly the same problem domain, the theory portion
of this thesis is limited. Most of the models used were developed by trial and error, and
are covered in Chapter 4.
3.1 Contributing Factors

This section outlines the selection of metrics used as parameters when analyzing and
predicting power consumption. Section 3.1.1 describes actual metrics from the data
center. Section 3.1.2 covers factors related to the workload. Section 3.1.3
covers factors related to the environment in which the data center operates.
Table 3.1.1: Data center metrics
Name Description Unit
cpu_seconds CPU Processing Time Seconds
memory_bytes Memory Usage Bytes
read_bytes File System Read Bytes
write_bytes File System Write Bytes
receive_bytes Network Receive Bytes
transmit_bytes Network Transmit Bytes
power Node Plug Power Watt-Hours
Table 3.1.2: Workload metrics
Name Description Unit
payload_bytes The size of the payload Bytes
payload_cycles Number of cycles to perform job Scalar
3.1.1 Data Center Related
There are hundreds of metrics available from most data center monitoring systems. What
can be difficult when switching from one distributed setup to another is comparing the
metrics; it is easy to end up with an apples-to-oranges comparison. To combat
this, the focus was put on the most straightforward measurements: data that the users
themselves could experiment with.
The metrics chosen can be seen in Table 3.1.1. It is worth noting that many pa-
pers [26][32] focused predominantly on cpu_seconds and memory_bytes for prediction
purposes.
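Metrics like these are typically scraped by a monitoring system such as Prometheus and retrieved with PromQL range queries (as done in Chapter 4). Below is a minimal sketch of assembling such a query URL in Python; the endpoint, metric name and label names are assumptions for illustration and would need to match the actual exporter in use:

```python
# Sketch: build the URL for a Prometheus range-query API call.
# The base URL and the metric/label names below are assumed, not taken
# from the thesis setup.
from urllib.parse import urlencode

def promql_range_query(base_url, query, start, end, step):
    """Return the full URL for a call to Prometheus' /api/v1/query_range."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"

# Per-container CPU seconds, summed over CPUs, as a 5-minute rate.
query = 'sum by (container) (rate(container_cpu_usage_seconds_total[5m]))'
url = promql_range_query("http://prometheus:9090", query,
                         start=1577836800, end=1577923200, step="1h")
print(url)
```

Issuing an HTTP GET against the resulting URL would return a JSON time series per container, which can then be cleaned and joined with the other metrics.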
3.1.2 Workload-Related
The properties of the job the user submits impact its performance. Armed with knowledge
about what impact their decisions make, the user can make educated decisions about how
to optimize. In Table 3.1.2, metrics are highlighted which describe the workload.
Table 3.1.3: Environment metrics

Name                 Description                                Unit
temperature          Air temperature                            Celsius
wind_speed           Wind speed                                 km/h
weather_description  Human-readable description of the weather  String
pressure             Air pressure                               hPa
humidity             Air humidity                               %
time_of_day          Time of day                                Seconds
month                The number of the month                    [1-12]
power_price          The average price of power that day        SEK
3.1.3 Environment Related
A data center does not operate in a vacuum. There are a number of factors in the
environment, and many of them might impact the performance of the data center. The
metrics chosen for the various factors for use in training prediction models can be found
in Table 3.1.3.
3.2 Candidate Models

There are numerous ways in which to perform data analysis and prediction, ranging
from simple linear regression to more advanced approaches. Indeed, as can be seen in
Chapter 2, many different approaches have been tried in adjacent problem domains with
notable success. In order to make an interesting study, it was deemed wise to try a
variety of different approaches and perform a comparative analysis between them.
This section outlines the supporting theory for the approaches attempted during the
project. Section 3.2.1 outlines the theory behind Bayesian Inference, which was chosen
as a statistical baseline to draw upon. Section 3.2.2 explores an evolutionary approach.
Section 3.2.3 covers Artificial Neural Networks, Recurrent Neural Networks and Long
Short-Term Memory, and explores how they can be optimized for the task at hand.
Finally, Section 3.2.4 evaluates the merits of Reinforcement Learning as applied to the
problem formulation.
3.2.1 Bayesian Inference
Bayes’ Theorem [41] is a mathematical framework for estimating the probability of an
event based on some initial belief or knowledge that we have, commonly known as the
prior. The scenario is the following: we have just observed event B, and we are trying
to estimate P(A|B). According to Bayes’ Theorem (see Equation 3.1), we can then use
the prior P(A) to estimate it.

P(A|B) = P(B|A) P(A) / P(B)    (3.1)
Bayes’ Theorem has many applications, one of which is Bayesian Inference, which refers
to the process of extracting properties from data using Bayes’ Theorem. Equation 3.1
can then be rewritten as shown in Equation 3.2, where Θ represents the prior distribution.

P(Θ|data) = P(data|Θ) P(Θ) / P(data)    (3.2)
In other words, one makes an initial assumption about the distribution of the data
given a set of parameters. One uses this prior to make a prediction about the next
data point observed. The actual value can then be compared with the prediction to
find the error, which is used to update the prior distribution. With more observations,
the prior becomes increasingly accurate and ultimately forms the final prediction of
the algorithm. Bayesian inference has the advantage that it can be performed on an
on-line basis and is relatively quick to perform in most cases.
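As an illustration of this update loop, here is a minimal sketch (not from the thesis) of an on-line Bayesian update for the mean of a Gaussian with known observation noise; all numeric values are hypothetical:

```python
def update(mu0, tau0_sq, x, sigma_sq):
    """Combine the prior N(mu0, tau0_sq) with one observation x drawn
    from N(mu, sigma_sq); return the posterior mean and variance."""
    tau_sq = 1.0 / (1.0 / tau0_sq + 1.0 / sigma_sq)
    mu = tau_sq * (mu0 / tau0_sq + x / sigma_sq)
    return mu, tau_sq

mu, var = 0.0, 100.0          # weak initial belief (the prior)
for x in [74.0, 76.0, 75.0]:  # observations arriving on-line
    mu, var = update(mu, var, x, sigma_sq=4.0)
# mu drifts toward the data mean and var shrinks with each observation
```

Each observation pulls the prior toward the data, which is exactly the predict, compare and update cycle described above.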
3.2.2 Genetic Algorithms
Genetic Algorithms are a large group of algorithms inspired by Darwinian evolution
and molecular genetics, more specifically by the biological processes in chromosomes [14].
In essence, Genetic Algorithms are random search algorithms with the ability to
self-organize, adapt and learn [13].

The methodology was originally introduced [42] as a probabilistic optimization algorithm.
In the terms used by Darwin [43], nature (the environment) is represented by the problem
definition, and individuals (chromosomes) are represented by candidate solutions. A set
of individuals is known as a population.
Genetic Algorithms work as follows. To start the process, a population is initialized in
a way that maps to the problem definition. The individuals are then scored using a
fitness function that evaluates how well they solve the problem; this is known as
selection. The fittest individuals are then allowed to reproduce, exchanging genes and
then splitting to create a new generation in the crossover step.

Finally, mutation is allowed to take place by arbitrarily changing a subset of individuals,
after which the new generation is ready to take on nature. This process continues until
some predefined fitness criterion has been met.
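The process above can be condensed into a short generic loop. The sketch below is illustrative, not the thesis implementation: it evolves a single number toward a hypothetical target, with lower fitness meaning better.

```python
import random

def evolve(fitness, init, mutate, crossover, pop_size=50, generations=100):
    """Generic GA loop: sort by fitness (lower is better), keep the
    fittest half, and fill the rest with mutated crossover children."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = [
            mutate(crossover(random.choice(survivors), random.choice(survivors)))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=fitness)

# Toy problem: evolve a number toward 42.
random.seed(0)
best = evolve(
    fitness=lambda x: abs(x - 42.0),
    init=lambda: random.uniform(-100, 100),
    mutate=lambda x: x + random.gauss(0, 1),
    crossover=lambda a, b: (a + b) / 2,
)
```

Keeping the fittest half unchanged acts as a simple form of elitism, so the best fitness can never get worse between generations.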
3.2.3 Artificial Neural Networks
As the name suggests, Artificial Neural Networks (ANNs) take inspiration from the
behavior of biological neurons in order to perform learning tasks. In its simplest form,
an ANN is a layered system: at each layer the neurons assign weights to the inputs from
the previous layer. By running an experiment many times one can then let the error
propagate back through the system, continually reassigning the weights to improve the
output.
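A single forward pass through such a layered system can be sketched in plain NumPy (layer sizes hypothetical; in training, backpropagation would adjust the weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Common activation function: passes positives, zeroes negatives."""
    return np.maximum(0.0, x)

# One hidden layer: 6 input metrics -> 16 hidden units -> 1 output.
W1, b1 = rng.normal(size=(6, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def forward(x):
    """Each layer weights its inputs, adds a bias, and activates."""
    hidden = relu(x @ W1 + b1)
    return hidden @ W2 + b2

x = rng.random(6)         # a vector of six metric values
prediction = forward(x)   # array of shape (1,)
```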
A Recurrent Neural Network (RNN) is an ANN where the result of the previous training
step is taken into account when making the next prediction. RNNs have proven
successful in solving problems in a wide range of domains. One of their major
shortcomings is their inability to remember features far in the past, since more recent
results tend to cloud earlier ones.

A Long Short-Term Memory network (LSTM) is an attempt to combat this problem by
adding channels for accessing such older memory. This approach has been used to
address similar prediction problems in the past. A common thread was to incorporate
a pair of chained LSTM networks, known as an autoencoder, where one network is
responsible for encoding the historical data, and the second for recreating the original
representation from the encoding. This approach leads to a desired loss between the
decoding and encoding, known as a drop-off, that reduces overfitting on a subset of the
data features.
3.2.4 Reinforcement Learning
The goal of regular reinforcement learning is to explore and learn from an environment.
There are two main kinds of reinforcement learning: model-based and model-free. In
model-based reinforcement learning, supervised learning is used to learn about a domain
that is already at least partly known. In the model-free approach, it is assumed that
no knowledge of the environment is available ahead of time. Instead, the algorithm
gives every state in the environment a so-called Q-score: an estimate of the highest
reward obtainable starting from that state.

Model-free learning (or Q-learning) is then performed by going through the possible
actions one by one, estimating the state that would result from each action. The
algorithm then selects the action that would give the highest Q-score. Whenever the
algorithm interacts with the environment, it remembers the outcomes of taking a
certain action in a certain state and uses them to improve the Q-scores. This is the
essence of Reinforcement Learning.
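The Q-score improvement step is the classic tabular Q-learning update; a minimal sketch with hypothetical states and actions:

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward the observed reward plus the best
    discounted Q-score reachable from the next state."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Two states, two actions, all Q-scores start at zero.
Q = {"s0": {"a0": 0.0, "a1": 0.0}, "s1": {"a0": 0.0, "a1": 0.0}}
q_update(Q, "s0", "a1", reward=1.0, next_state="s1")
# Q["s0"]["a1"] is now 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```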
There are two main challenges with applying Reinforcement Learning to predicting
energy consumption.
1. The algorithm as described above is based on the premise that one can cycle through
the list of possible actions in a given state and compare all the outcomes. In other
words, it is assumed that the action space is finite. This is rarely the case in
the physical world; in the problem described in this thesis, the action space is
continuous.

This problem can be addressed. Gu et al. [44, 45] presented two algorithms that
use normalization techniques to apply the techniques described above to problems
with a continuous action space.
2. Reinforcement learning is essentially about finding causality between correlated
actions and rewards. In this problem, however, we cannot actually change the behavior
of the data center in order to reduce energy consumption. In the current problem
definition, the actions do not impact the state (i.e. the accuracy of our prediction
does not change what the next value will be). Thus the algorithm will most likely
not converge.
CHAPTER 4
Method
This chapter outlines the way data was collected and explored, as well as the implemen-
tation of the various models used to predict power consumption. These are organized
first by general approach and then split into subsections based on individual models.
4.1 Data Collection

In Section 3.1, the various metrics were described that were thought to impact power
consumption. This section covers to what extent those metrics were available, and how
they were collected and stored. All the data gathered throughout the thesis was
anonymized and made available at https://github.com/Xarepo/green-data.
Both the data centers supplying data for the project were running Rancher [46] on
top of Kubernetes [47]. This was fortunate since that setup provides data monitoring
out of the box. This data is gathered in real time and stored in a Prometheus [48]
time-series database for up to 7 days before being discarded. Accordingly, it had to
be gathered continuously throughout the project and stored separately. Section 4.1.1
describes that process in detail, including the production of an adapter for effective
extraction of Prometheus data into a Python-friendly format.
The plug power of the nodes in the data center was not part of the monitoring data
provided out of the box by Rancher. Instead, plug power consumption was measured
separately and added to a separate database for simple extraction. This was considered
straightforward enough to not warrant its own section.
The collection of environmental data is covered in Section 4.1.2. Sadly, no metrics were
obtainable for the characteristics of the actual jobs running in the data center (the ones
discussed in Section 3.1.2).
4.1.1 Metrics from Data Centers
Prometheus allows for queries using PromQL, a Domain-Specific Language for time-series
queries. To mitigate the impracticalities of building large query strings, a wrapper layer
was built to allow for rapid query composition using Python syntax.
Prometheus data is queried by metric, with an optional subfield to filter the results of
the query. To get support from Python introspection, all the available metrics were added
to a Python class, providing quick in-editor completion. In order to facilitate filtering, a
class attribute lookup was used to convert each metric into a function accepting a list of
filters.
The result of the adapter layer was that queries that would previously have been written
as Python strings (Figure 4.1.1) could now be written as Python code (Figure 4.1.2),
vastly improving productivity, since PromQL syntax errors were now Python syntax
errors.
query = (
'sum(rate(container_cpu_usage_seconds_total'
+ '{name!~".*prometheus.*", image!="", container_name!="POD"}'
+ '[5m])) by (node)'
)
Figure 4.1.1: Python string containing a PromQL query
query = p_sum(
p_rate(
p.container_cpu_usage_seconds_total([p_ignore_k8s()]),
"5m",
),
["node"],
)
Figure 4.1.2: PromQL query expressed using the Python adapter
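A hedged sketch of how such an adapter could be built (class and helper names here are illustrative, not the thesis implementation): attribute lookup turns each known metric into a function that accepts a list of filters, and small helpers wrap PromQL functions such as sum and rate.

```python
class PromQL:
    """Hypothetical adapter: attribute access builds query fragments."""

    METRICS = {"container_cpu_usage_seconds_total", "container_memory_usage_bytes"}

    def __getattr__(self, name):
        # Only fires for names not found normally; unknown metrics fail
        # immediately, turning PromQL typos into Python AttributeErrors.
        if name not in self.METRICS:
            raise AttributeError(name)

        def metric(filters=None):
            body = ", ".join(filters or [])
            return f"{name}{{{body}}}" if body else name

        return metric

def p_rate(query, window):
    return f"rate({query}[{window}])"

def p_sum(query, by=None):
    suffix = f" by ({', '.join(by)})" if by else ""
    return f"sum({query}){suffix}"

p = PromQL()
query = p_sum(
    p_rate(p.container_cpu_usage_seconds_total(['image!=""']), "5m"),
    ["node"],
)
# query == 'sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (node)'
```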
Table 4.1.1: Data center metric granularity

Name            Granularity
cpu_seconds     Container
memory_bytes    Container
read_bytes      Container
write_bytes     Container
receive_bytes   Pod
transmit_bytes  Pod
power           Node
Metric Granularity
Available information granularity differed by metric. Some data could be obtained for
each container, some at pod level and some data was only available at the node level. In
Table 4.1.1 the granularity of different metrics is listed (Compare to Table 3.1.1).
It was decided to gather data at two levels. All the available data was added up
and gathered at the node level, and the names of these data points were prefixed with
node (node_cpu_seconds, node_power etc.). Additionally, all the data that could be
gathered at container level was also gathered at that granularity, and the names of those
data points were prefixed with container. No data was gathered at pod granularity.
Whenever a data point was used at a finer granularity than was available, as was the
case with container_power, container_receive_bytes and container_transmit_bytes,
the per-container average of that metric on that node was used. This could lead to some
outliers on sparsely used nodes, but was considered the best way to facilitate the use of
those metrics when modeling.
4.1.2 Weather Data
It was considered interesting whether environmental data such as the weather had an
impact on power consumption. To investigate this, accurate weather data was needed.
After some research about the available weather data APIs, it was decided to use the
Weather Underground API [49].
Their API, among other things, gives access to the conditions at Lulea Airport every
half hour. Through it, all the data points in Table 3.1.3 except power_price were
obtainable. For the sake of simplicity, the weather conditions were then extrapolated to
all timestamps within each half hour.
4.2 Data Exploration

The first approach to data exploration was to make scatter plots of the container_power
against each of the parameters available. Unfortunately, on these plots it was very
difficult to discern any correlations with the human eye.
For this reason, Principal Component Analysis (PCA) was performed to try and visu-
alize deeper patterns in the data. Principal components are vectors where the parameters
have been encoded in a way that retains meaningful information about the relationship
between said parameters. The goal of PCA, therefore, is to reduce the number of dimen-
sions in order to visualize relationships between principal components and the output
without losing important data.
A correlation matrix was also made, which is a table that is used to show the correlation
between the different parameters.
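Both exploration steps take only a few lines with scikit-learn and pandas; a sketch with random data standing in for the data center metrics (column names hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((200, 4)),
    columns=["cpu_seconds", "memory_bytes", "read_bytes", "container_power"],
)

# Project the standardized parameters onto two principal components.
features = df.drop(columns="container_power")
scaled = StandardScaler().fit_transform(features)
components = PCA(n_components=2).fit_transform(scaled)  # shape (200, 2)

# Pairwise correlations between all parameters and the output.
corr = df.corr()
```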
4.3 General Model Setup

The general setup of all the models was the following. A collection of metrics was
submitted as parameters to the model (see Figure 4.3.1). In order to prevent overfitting
on past data and to decouple the user impact from the time when the job was submitted,
container names and timestamps were not used as parameters. Instead, relationships
were sought between the parameters and the power consumption of the container.
The goal of each model was thus to take the list of parameters and, using only that
information, make a prediction as to how much power a container with those parameter
values is expected to consume.
Figure 4.3.1: General model setup, the model should find a relationship between the metrics
and the power consumption.
4.4 Bayesian Inference
The Bayesian modeling commenced with reading up on various applications of
Bayesian Inference to prediction problems in Python. From those sources it was
surmised that PyMC3 would be the right library for the job; that approach is covered
in Section 4.4.1.
When those approaches struggled to handle the large amount of data used in the model-
ing, the rest of the attempts were performed using scikit-learn [50]. Sections 4.4.2 – 4.4.3
describe the progression from a simple linear regression to more powerful, polynomial
models.
In all the regression attempts using the Bayesian modeling techniques, the dataset was
split into two smaller sets. The first, roughly 90% of the points, was used to train the
model. The other 10% was kept back for testing. In all the attempts made using Bayesian
Inference, these latter 10% were used to produce the actual prediction results.
4.4.1 PyMC3
The main sources for the initial implementation were the PyMC3 [51] getting started
guides [52], [53] as well as a related blog post [54].
Initially, it was believed that investigating the node data might be sufficient to make
extrapolations about the power consumption. A simple linear regression was attempted,
then a GMM, as well as ordering the data and introducing switchpoints between the
nodes. It was determined that the node data did not contain significant enough
insights, so the node models were discarded and the focus moved to examining the data
at the container level.

Making that shift meant dealing with data that was roughly 20 times larger than the
node data, which made PyMC3 feel very slow. It was therefore decided to move the
statistical modeling to scikit-learn instead.
4.4.2 Linear Regression
Scikit-learn has a built-in model for Linear Regression. All that was needed to perform
linear regression was to provide the input/output pairs and to fit the linear regression
model to the data. In order to judge prediction accuracy the dataset was split into a
training and a test set. Consistently for the regression models, the training was only
performed on the training set, and the final accuracy estimate only calculated using the
test set.
There was only so much information that could be extracted from the dataset using
linear modeling. Thus, the next step was to introduce the other parameters and to
perform polynomial regression.
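The fit and split described above can be sketched as follows (synthetic data in place of the data center metrics):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 1))                              # e.g. cpu seconds
y = 60.0 + 15.0 * X[:, 0] + rng.normal(0, 1.0, 1000)   # e.g. power + noise

# Hold back 10% of the points for testing, as in the thesis.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
```

Training touches only the training set; the final accuracy estimate is computed on the held-back test set alone.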
4.4.3 Polynomial Linear Regression
Scikit-learn has a preprocessing module for expanding a dataset into its polynomial fea-
tures. It takes as parameters the dataset and the degree (called K here) of the expansion.
For example, given the list [a, b] and K = 2, the list [1, a, b, a2, ab, b2] would be returned.
For the polynomial prediction, a pipeline was built that took as input the data, a
list of the desired parameters and K. This algorithm first performed a Kth degree
polynomial expansion and then passed it through the Linear Regression module discussed
in Section 4.4.2.
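The expansion can be verified directly by running the preprocessing module on the example above:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # [a, b] with a = 2, b = 3
expanded = PolynomialFeatures(degree=2).fit_transform(X)
# columns [1, a, b, a^2, ab, b^2] -> [1, 2, 3, 4, 6, 9]
```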
4.4.4 Polynomial Random Forest Regression
The Polynomial Random Forest Regression worked in the same way as the Polynomial
Linear Regression described in Section 4.4.3. A pipeline was built that took as input
the data, a list of the desired parameters and K. This algorithm first performed a Kth
degree polynomial expansion and then passed it through the RandomForestRegressor
module from scikit-learn.
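When the two stages are composed as a scikit-learn pipeline, swapping the final estimator is a one-line change; a sketch with synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1]  # a hypothetical nonlinear target

# Kth-degree expansion followed by the random forest regressor.
model = make_pipeline(
    PolynomialFeatures(degree=3),
    RandomForestRegressor(n_estimators=50, random_state=0),
)
model.fit(X, y)
pred = model.predict(X[:5])
```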
4.5 Genetic Algorithm
The Genetic Algorithm modeling commenced with reading up on various implementations
of Genetic Algorithms in Python. Different libraries were considered and DEAP [55]
chosen as a helpful framework for building genetic models. Sections 4.5.1 – 4.5.4 cover
the definition of individuals, fitness, selection and crossover respectively.
4.5.1 Individuals
In an approach very similar to the polynomial regression used in the Bayesian Inference
model, it was decided to define an individual (See Figure 4.5.1) as a bias b and a two-
dimensional (m × n) list W , where m signified the number of parameters supplied to
the model, and n the degree of the polynomial expansion. These values were initialized
according to a uniform distribution.
Figure 4.5.1: The first version of an individual in the Genetic Algorithm. b is the bias, n is
the number of degrees in the polynomial, m is the number of parameters.
Later, using the same idea as in Section 4.4.3, the individuals were simplified to be
defined as a one-dimensional list with the same length as the number of variables obtained
by running PolynomialFeatures from scikit-learn on the parameters.
4.5.2 Fitness
A prediction for a point I = [i_1, ..., i_m] was defined as the result of Equation 4.1,
using the values from the individual whose fitness was to be determined. For each
generation, a sample of 500 points was randomly selected, and the fitness of an
individual was defined as its average prediction error over that sample.

prediction = b + w_{1,1} i_1 + w_{1,2} (i_1)^2 + ... + w_{1,n} (i_1)^n
               + w_{2,1} i_2 + w_{2,2} (i_2)^2 + ... + w_{2,n} (i_2)^n
               + ...
               + w_{m,1} i_m + w_{m,2} (i_m)^2 + ... + w_{m,n} (i_m)^n    (4.1)

Later, when the individuals were simplified to a one-dimensional list, a prediction was
redefined as the dot product between the individual and the polynomial expansion of I.
4.5.3 Selection
Tournament selection was used to determine which individuals to bring into the next
generation. It refers to the procedure of repeatedly choosing a small number (3 in this
case) of random individuals from the population and comparing their fitness. At each
step, the individual with the best fitness is selected for the next generation. This
process continues until the population of the next generation has reached the same size
as the previous one.
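The procedure can be sketched in a few lines (a minimal illustration, with lower fitness taken as better):

```python
import random

def tournament_select(population, fitness, k=3):
    """One tournament round: pick k random individuals, keep the fittest."""
    contenders = random.sample(population, k)
    return min(contenders, key=fitness)

def next_generation(population, fitness, k=3):
    """Repeat tournaments until the new generation matches the old size."""
    return [tournament_select(population, fitness, k) for _ in population]

random.seed(0)
pop = [5.0, 1.0, 9.0, 3.0]                      # toy "individuals"
new_pop = next_generation(pop, fitness=lambda x: x)
```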
4.5.4 Mutation & Crossover
The uniform initialization of the weights meant that the initial fitness was very poor.
To combat this, mutation was performed very aggressively: the mutation probability was
set very high (80%) and a custom mutation algorithm was introduced. This aggressive
approach was a trade-off. It had the benefit that the results would start converging
quickly towards the best possible outcome, though it also meant that the final value
might lack some precision.
The custom mutation worked as follows. Each value in the individual went through a
step where it could be altered in one of the four following ways:
• 1/4 chance — it would stay the same
• 1/4 chance — it would be doubled
• 1/4 chance — it would be halved
• 1/4 chance — its sign would be inverted
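The four equally likely alterations above can be sketched as:

```python
import random

def mutate_value(v):
    """Custom per-value mutation: equal chance of keeping, doubling,
    halving, or sign-flipping the value."""
    choice = random.randrange(4)
    if choice == 0:
        return v          # stays the same
    if choice == 1:
        return v * 2      # doubled
    if choice == 2:
        return v / 2      # halved
    return -v             # sign inverted
```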
For mating, two-point crossover was used between each of the rows in the 2-dimensional
list. The bias was left unchanged by crossover.
4.6 Artificial Neural Networks

The initial approach to building a prediction engine was to look at previous approaches
and try to reproduce the state of the art. In essence, this meant starting with a chained
LSTM approach, as used in some of the papers covered in Section 2.4.

When this approach did not yield great results, it was decided to start from the
beginning and build up increasingly complex models in order to gain understanding and
thus be able to make more intelligent decisions going forward. This section therefore
covers the progression from a basic ANN model to the more advanced models.
4.6.1 Basic Model
The architecture of the first model attempted can be found in Figure 4.6.1. It was
implemented as a basic ANN with one hidden layer, using ReLU (see Equation 4.2) as
the activation function.

ReLU(x) = 0 if x ≤ 0, otherwise x    (4.2)
Over time this approach grew to be seen as a very useful starting point, since it allowed
for implementing saving, loading and good plotting without being as computationally
heavy as the more complicated approaches.
Figure 4.6.1: The architecture schema for the basic ANN model.
4.6.2 Long Short-Term Memory
The LSTM modeling commenced with reading up on LSTM in general, specifically on
Colah’s blog [56]. Various implementations of LSTM in Python were considered, and
the first implementation was inspired by [57]. That model is based on two chained
LSTM networks.

The first one takes the input parameters and encodes the time series into a fixed-length
vector. The second takes this vector and interprets it back to a prediction in the desired
domain. An overview of the architecture can be found in Figure 4.6.2.
Figure 4.6.2: An overview of the Encoder-Decoder LSTM chain.
4.7 Reinforcement Learning

The idea was to tackle the continuous action space by basing the reinforcement learning
model on an implementation [58] of Normalized Advantage Functions [44].

A data center environment was created for the agent to explore. This environment
returned the actual data center values grouped by container and sorted by timestamp.
The agent was then to make a guess at the next container_power, and the loss was
defined as the absolute error of the guess. For visualizations of the parameter
optimization and the reinforcement learning process, please refer to Figures 4.7.1 – 4.7.2.
Figure 4.7.1: How NAF updates parameters.
Figure 4.7.2: The main loop of the Reinforcement Learning approach.
4.8 Proof of Concept

The purpose of this work is to help educate users on the effects of their power
consumption and to motivate them to reduce their environmental footprint. As part of
the project, a Proof of Concept application was built in order to demonstrate how this
could be done. The Proof of Concept has three parts, which are described below.
The first part is a monitoring engine where the user can run queries against the cloud
center in real time in order to gain insights about the distribution of the metrics studied.
This part does not necessitate a prediction engine but is useful in giving an introduction
as to what the metrics are.
The second part is the power consumption estimation engine. It allows the user to
submit six different metrics that describe the characteristics of their planned job, and to
see an estimation as to what the power consumption of that job could be given those
characteristics.
The third part is very similar to the second. Instead of submitting six metrics, however,
the user is asked to submit five. A graph is then shown of the estimated power
consumption over the sixth parameter, given the five fixed parameters. It is believed
that this view could help a user understand in which scenarios which metrics have the
most impact.
CHAPTER 5
Results
This chapter shows the results of the various approaches outlined in Chapter 4. In
Section 5.1, the graphs resulting from the data exploration can be found. Then, in
Section 5.2, the results of the different regression attempts are covered. In Sections
5.3 – 5.5, the results of the Genetic Algorithm, Neural Networks and Reinforcement
Learning, respectively, are outlined.
5.1 Data Exploration

This section contains the results of the data exploration performed on the raw data as
described in Section 4.2. The dependencies between container_power and the data
center metrics can be found in Figure 5.1.1. It is very hard to detect any correlations
among these metrics with the human eye.

The result of the PCA can be found in Figure 5.1.2. It shows that there seem to be
clusters within the parameter space that have the same or similar container_power.
The correlation matrix can be found in Figure 5.1.3. It shows that the strongest
correlation is with the network metrics, which could be partly due to the fact that they
are averaged node metrics.
Figure 5.1.1: The raw data obtained from the data center. It is difficult to detect correlations
by eye.
Figure 5.1.2: Principal Component Analysis of container_power with 6 parameters.
Figure 5.1.3: Correlation matrix of the parameters and output. The stronger correlation
between the power consumption and the network metrics could partly be resulting from the fact
that both are averaged node metrics.
Table 5.2.1: Prediction accuracy of linear regression between container_cpu_seconds and
container_power

       Actual        Predicted     Error
count  96287.000000  96287.000000  96287.000000
mean   74.961997     74.797272     22.084664
std    49.375480     1.463845      44.141939
min    12.923077     74.522052     0.006016
25%    54.090909     74.523434     7.649254
50%    69.222222     74.530191     20.432554
75%    91.538462     74.565144     28.015783
max    1890.000000   94.599362     1815.477917
5.2 Bayesian Inference
This section contains the results for the Bayesian Analysis. Section 5.2.1 shows the results
for the linear regression outlined in Section 4.4.2 and Sections 5.2.2 – 5.2.3 contain the
results for the polynomial regression attempts described in Sections 4.4.3 – 4.4.4.
5.2.1 Linear Regression
In Table 5.2.1, the prediction accuracy of linear regression between container_cpu_seconds
and the average node_power per container is displayed. Figure 5.2.1 shows that there
seems to be a very slight correlation between higher CPU usage and power consumption
(though there were some quite impactful outliers).
5.2.2 Polynomial Linear Regression
The result of performing Kth degree polynomial linear regression on the metrics to predict
container_power can be found in Table 5.2.2 for K ∈ [1..7]. It shows that K = 4 yielded
the lowest mean absolute error, but that the overall accuracy was best at K = 3. The
graphs for the K = 3 polynomial linear regression can be found in Figure 5.2.2.
It is worth noting that the improvements made by going from linear to polynomial
linear regression were very small. Although these prediction results were precise enough
to be guiding in many applications of such predictions, they were still quite far off the
current state of the art.
Figure 5.2.1: Test sample accuracy of simple linear regression between
container_cpu_seconds and average node_power per container.
Table 5.2.2: Results of polynomial linear regression. The leftmost column contains the details
for the actual values that the regression is trying to predict; the rest of the columns show the
distribution of the absolute error achieved with Kth degree polynomial linear regression.
The best accuracy is marked with boldface.

       Actual  K = 1    K = 2    K = 3    K = 4    K = 5    K = 6    K = 7
count  96287   96287    96287    96287    96287    96287    96287    96287
mean   74.96   23.06    23.04    21.54    21.53    21.61    22.05    23.74
std    49.38   41.31    41.31    37.94    38.16    41.39    86.05    596.99
min    12.92   6.38e-4  6.38e-4  1.23e-4  7.12e-3  8.85e-4  5.64e-4  8.59e-4
25%    54.09   8.70     8.70     7.32     7.77     7.75     7.83     7.84
50%    69.22   18.55    18.55    20.66    19.79    20.26    20.25    20.24
75%    91.54   30.14    30.14    27.38    28.11    28.09    28.31    28.18
max    1890.0  1819.8   1819.8   1815.1   1816.1   5126.4   20967    84823
Figure 5.2.2: Polynomial linear regression
Table 5.2.3: Results of polynomial random forest regression. The leftmost column contains the
details for the actual values that the regression is trying to predict; the rest of the columns show
the distribution of the absolute error achieved with Kth degree polynomial random forest
regression. The best accuracy is marked with boldface.

       Actual   K = 2      K = 3      K = 4
count  365655   365655     365655     365655
mean   57.48    0.7180     0.6443     0.6670
std    31.51    7.132      7.129      7.431
min    6.30     0          0          0
25%    36.06    2.487e-14  2.132e-14  1.421e-14
50%    58.00    8.527e-14  7.816e-14  7.105e-14
75%    70.00    0.167      0.094      0.079
max    1260.00  1134       1144       1145
5.2.3 Random Forest Regression
The result of performing Kth degree polynomial random forest regression on the metrics
to predict container_power can be found in Table 5.2.3 for K ∈ [2..4]. There were some
issues with displaying the results of K > 4, and since the performance beyond that point
degraded for each K it was decided to leave those values out of the report.
The table shows that in general, K = 3 yielded the lowest mean absolute error, and
the best overall accuracy. The graphs for the K = 3 polynomial random forest regression
can be found in Figure 5.2.3.
The results for the Polynomial Random Forest Regression were very good, with an
error percentage of 1.10%. To validate these findings on new data, test data was
collected from 3 new days and the trained model was exposed to that data instead.
Sadly, the model performed quite poorly in that case, with an error percentage of 26.48%.
Figure 5.2.3: Polynomial Random Forest Regression
5.3 Genetic Algorithm

This section shows the results of the Genetic Algorithm described in Section 4.5. It was
possible to get the Genetic Algorithm to converge at a mean absolute training error of
around 68, which represents an error of around 89%. After the change was made to the
individuals to use PolynomialFeatures, the algorithm converged at an error of around
28%, though with very small differences between the individual predictions.
In Figure 5.3.1, the training accuracy per generation is shown. Due to the large initial
error the curve is hard to visualize, but the decrease in error was quite gradual, and the
best level of accuracy was achieved around generation 1000. This figure is representative
of both representations of individuals.
Figure 5.3.1: The training accuracy per generation when running the Genetic Algorithm. Due
to the large error in the beginning it is very hard to visualize, but the decrease in error was quite
gradual, and the best level of accuracy was achieved around generation 1000.
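As a toy illustration of the general idea (a numpy-only sketch, not the thesis implementation), an evolutionary loop of this kind evolves candidate model coefficients against a mean-absolute-error fitness, keeping an elite and producing offspring via crossover and Gaussian mutation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: power as a linear function of three metrics plus noise.
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=500)

def fitness(pop):
    """Mean absolute error of each individual's linear prediction (lower is better)."""
    preds = X @ pop[:, :3].T + pop[:, 3]              # shape (500, n_individuals)
    return np.mean(np.abs(preds - y[:, None]), axis=0)

pop = rng.normal(size=(50, 4))                        # 3 weights + 1 bias per individual
for generation in range(200):
    errs = fitness(pop)
    elite = pop[np.argsort(errs)[:10]]                # keep the 10 best individuals
    # Offspring: coordinate-wise crossover of random elite parents, then mutation.
    parents = elite[rng.integers(0, 10, size=(40, 2))]
    mask = rng.random(size=(40, 4)) < 0.5
    children = np.where(mask, parents[:, 0], parents[:, 1])
    children += rng.normal(scale=0.05, size=children.shape)
    pop = np.vstack([elite, children])

best = pop[np.argmin(fitness(pop))]
print("best MAE:", fitness(pop).min())
```

With a purely linear individual the best such a search can do is recover a linear regression, which matches the observation above that the Genetic Algorithm behaved like one.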
5.4 Artificial Neural Networks

This section contains the results for the Artificial Neural Networks. Section 5.4.1 shows
the results for the basic ANN described in Section 4.6.1 and Section 5.4.2 contains the
results for the LSTM approach detailed in Section 4.6.2.
5.4.1 Basic ANN
Training did not behave as expected: the training predictions improved at a much slower
rate than the validation predictions, as can be seen in Figure 5.4.1. It is believed that
this is due to a bug somewhere in the model, and locating that bug is part of the next
steps. The accuracy after 9900 epochs can be found in Figure 5.4.2.
The lowest error was found in Epoch 9995 (out of 10000 total epochs). The error kept
shrinking, but any tests run yielded very poor results. Note that this inaccuracy is
probably due to the training/validation issues described earlier.
Figure 5.4.1: The training and validation errors over the course of the training. They show
that, unexpectedly, validation errors are lower than training errors.
Figure 5.4.2: The training and validation predictions compared to the actual values of the
container_power. Note: The values on the x-axis are indexes, not epochs.
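A common way to rule out a data-handling bug of this kind is to sanity-check the split with a model known to behave well. A sketch (synthetic data, Ridge as a stand-in regressor):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=2000)

# Hold out a fixed validation set *before* any fitting, so the two
# error curves are computed on disjoint data.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = Ridge().fit(X_tr, y_tr)
train_mae = mean_absolute_error(y_tr, model.predict(X_tr))
val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(train_mae, val_mae)
```

For a healthy setup, training error should not exceed validation error by a wide margin; the reverse pattern in Figure 5.4.1 is what points to a bug in how the sets were fed to the model.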
5.4.2 Long Short-Term Memory
The lowest error was found in Epoch 3056, where the mean absolute error was around 70,
which is an error of almost 100%. Figure 5.4.3 shows the result of the prediction in
Epoch 3000.
Figure 5.4.3: The training and validation predictions at epoch 3000, compared to the actual
container_power. The values on the y-axis are the container_power in Watt-Hours, with the
indexes on the x-axis.
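As background for this setup, an LSTM consumes fixed-length windows of the series rather than individual rows. A small numpy sketch (hypothetical window length and feature count) of turning a metric trace into (samples, timesteps, features) arrays:

```python
import numpy as np

def make_windows(series: np.ndarray, window: int):
    """Slice a (T, features) trace into (T - window, window, features) inputs,
    using the value immediately after each window as the target."""
    X = np.stack([series[i : i + window] for i in range(len(series) - window)])
    y = series[window:, 0]        # predict the first feature (e.g. power)
    return X, y

trace = np.random.default_rng(7).normal(size=(100, 4))   # 100 steps, 4 metrics
X, y = make_windows(trace, window=10)
print(X.shape, y.shape)   # (90, 10, 4) (90,)
```

The resulting arrays can then be fed to an LSTM layer, which expects exactly this three-dimensional shape.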
5.5 Reinforcement Learning

This section shows the results of the Reinforcement Learning approach outlined in
Section 4.7. Unfortunately, due to the issues discussed in Section 3.2.4, no converging model
using Reinforcement Learning was obtained. Please refer to Figure 5.5.1 for the accuracy
over time when running the Reinforcement Learning. Due to time constraints and since
it was suspected from the beginning that the methodology was unsuited for the task, it
was decided to not continue improving the model beyond this point.
Figure 5.5.1: Results of the Reinforcement Learning per Epoch. As can be clearly seen on the
graph, the result did not converge towards a low absolute error.
CHAPTER 6
Evaluation
This chapter contains the evaluation of the results. The criteria for the evaluation can be
found in Section 1.3. Section 6.1 contains a qualitative comparison between the models
tried in this thesis and their results. Section 6.2 then evaluates the effectiveness of the
best model against the state of the art.
6.1 Qualitative Model Comparison

This section discusses the advantages and disadvantages of each of the approaches tried
throughout the project. The approaches are then scored according to prediction accuracy,
prediction time and overall potential. An evaluation of each of these properties can be
found in Sections 6.1.1 – 6.1.3.
6.1.1 Prediction Accuracy
Prediction accuracy is the most important property. The spectrum of success was very
wide, with a number of solutions either not converging at all or converging at very poor
estimates (more than 100% error). These included the Neural Network and Reinforcement
Learning approaches. The Neural Network approach was not studied very extensively as
part of the project, so it is possible that there are more gains to be had there.
Among the working solutions, the Genetic Algorithm never amounted to more than a
glorified linear regression, potentially due to the similarities between the approaches. It
achieved an error of around 23%, but the characteristics of the predictions were such that
the algorithm was more or less making the same guess every time, which is not ideal since
the goal was to learn from the parameters.
Then there were the Bayesian Inference methods. It should be said up front that these
outperformed the rest when it came to prediction accuracy, with Polynomial Random
Forest Regression clearly being the winner, though the difference is much less clear when
running the model on completely new test data.
The Bayesian Inference methods are believed to have performed best because the approach
is more straightforward: since they work in a purely statistical manner, the risk of
overfitting is reduced. Though the LSTM-based approaches proved unsuccessful in this
project, it is believed that with enough experimentation they could potentially be
leveraged to achieve even better results.
6.1.2 Prediction Time
This section deals only with how quickly a model reaches its optimal prediction strength.
This is important since the goal of the project is related to green energy, and it therefore
behooves the algorithm to be energy efficient as well.
Due to the train/validation issue described in Section 5.4.1, the basic ANN model never
converged (within 10000 epochs), probably because it only ever considered the training set.
The slowest algorithm of them all was by far the LSTM approach, which would converge
after roughly three days of computation (Running on a Ubuntu 18.04 VM, 16GB RAM).
Then came the Genetic Algorithm which was optimized using the techniques described
in Section 4.5.4 and after that would converge in around 12 hours.
The Polynomial Random Forest Regression also took quite some time to compute,
roughly 8 hours on a dataset of around 3.5 million points. The Polynomial Linear
Regression was a lot faster, taking only around 20 minutes. The fastest algorithms were the
Simple Linear Regression and the Reinforcement Learning approach (though the latter
converged to an error of around 4285%, and it is debatable whether that counts as
convergence).
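Build times like those reported here can be captured uniformly with a timer around the fitting step. A minimal sketch (synthetic data, scikit-learn's LinearRegression as a stand-in for any of the models):

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(10000, 5))
y = X @ rng.normal(size=5)

# Time only the model build, not data loading or preprocessing.
start = time.perf_counter()
model = LinearRegression().fit(X, y)
build_seconds = time.perf_counter() - start
print(f"build time: {build_seconds:.3f} s")
```

Wrapping each candidate model's fit call in the same timer makes the build-time column of Table 6.1.1 directly comparable across approaches.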
Table 6.1.1: The attempted approaches and their outcomes.
Name Build Time Mean Absolute Error
Linear Regression 10 seconds 29.46%
Polynomial Linear Regression 20 minutes 28.72%
Polynomial Random Forest 8 hours 26.48%
Genetic Algorithm 12 hours Around 28%
Basic ANN Didn’t converge —
LSTM 3 days Almost 100%
Reinforcement Learning Didn’t converge —
6.1.3 Summary
From Sections 6.1.1 – 6.1.2 and as shown in Table 6.1.1, we see that the Polynomial
Random Forest Regression was the most accurate of the solutions, but that it only
outperformed the other regression approaches by percentage points when run against the
final test set.
It can thus be concluded that if the best absolute prediction is wanted, the Polynomial
Random Forest Regression is recommended, and that the Polynomial Linear Regression
can be seen as a happy medium between model build time and prediction accuracy.
6.2 Effectiveness

The best Mean Absolute Error was achieved using Polynomial Random Forest Regression
with K = 3, and had an error of 26.48% when run against test data gathered after the
model was built. This is not even close to the state of the art as found in Section 2.4.
Some ideas as to why the results differ so much from those of previous attempts are listed
below.
• None of the papers studied released the entire data trace they were working from,
so validating the models against their findings can only be subjective.
• Most other papers focused on making predictions on data center peak power per-
formance. It could be that the sparse utilization of the data centers used in this
report leads to metrics that are harder to predict.
• Different papers used different ways of measuring errors (e.g. absolute error, relative
error to a previous solution, etc.).
• There are other factors that are not included in the data set (e.g. GPU utilization,
server room temperature).
• Due to the coarse granularity of some of the measurements, container averages are
used for power and network utilization. This could lead both to imprecision and
overfitting on the averages.
Regardless of how the predictions compare to the state of the art, they can still serve
as a motivator to help users be more thoughtful about their cloud usage, which was the
goal all along.
CHAPTER 7
Discussion
This chapter discusses the impact of the results, in addition to what was covered in
Section 6.2. Section 7.1 discusses the research question and to what extent it was
answered. Section 7.2 touches on the comparison between this project and previous work.
Section 7.3 then discusses how useful the findings are to the end user.
7.1 Research Question

Throughout the project, the research question has guided all activities towards finding
machine learning approaches that optimize the prediction of energy consumption. The goal
was to determine whether such algorithms would help optimize prediction accuracy.
It can be concluded that the answer to the question with regards to the dataset used
throughout the project is that yes, machine learning methodologies are helpful when
predicting power consumption. Currently the best approach is Polynomial Random Forest
Regression, but it is expected that even better approaches could be found with further
study.
7.2 Comparison with Previous Work

When studying the previous work performed by others, it became clear that no consensus
existed as to what constitutes a good prediction. This is understandable, since every
dataset is subject to the conditions in the cloud center where it was gathered: the number
of unknowns is too large at this stage for a complete model that would hold for all data
centers, locations, workloads and schedulers.
The discrepancy between the state of the art predictions in the papers and the pre-
dictions achieved in this thesis can thus be explained by the different conditions under
which this data center was operating (Sparse Load, no GPU measurements etc.). For
more on this, see Section 6.2.
7.3 Usefulness of the Findings

Through the work performed during the thesis, it could be shown that there is a correlation
between user behavior and container energy consumption. In general, however, the
prediction models tend to reflect other patterns in the data in addition to the user's
impact, making it difficult to give detailed guidance with regards to a user's
environmental footprint. Thus, at this point, the research should be used mainly to raise
awareness and to motivate the user to think about the environmental impact of their cloud
usage.
Given that the prediction results will probably keep improving as more work is performed
in this and adjacent fields, and as the climate crisis unfolds, the relevance of the
concept to user behavior is expected to keep increasing.
CHAPTER 8
Conclusion and Future Work
This chapter summarizes the thesis by outlining what was accomplished and what re-
search is yet to be done. The conclusion of the project can be found in Section 8.1 and
the notes on future work can be found in Section 8.2.
8.1 Conclusion

In this thesis the problem domain of cloud center energy consumption analysis and
estimation has been researched and discussed. The research question was whether the
application of machine learning algorithms would help optimize prediction accuracy.
Various models have been built to perform such predictions.
The state of the art was found to be subjective based on the dataset, and no consensus
exists as to what dataset to use as a baseline when making comparisons. The best results
in this project were found using a Polynomial Random Forest Regression algorithm with
3 as the degree of the polynomial expansion. On the test set, that approach achieved a
mean absolute error of 26.48%.
It is concluded that the prediction accuracy can be useful in the use case described in
this thesis, as an educative estimate of what metrics of a cloud job have what impact
on power consumption. In addition, the literature review, data analysis and initial
modeling work could be useful in future studies (see Section 8.2).
8.2 Future Work

This research could be taken in many different directions, in order to gain an even greater
understanding of the relationship between customer workloads and energy consumption,
as well as to be able to forecast future energy consumption based on predicted workloads.
The data in this thesis was obtained mainly from the RISE data center in Lulea. That
data center was seeing quite sparse usage and was thus operating far from peak
performance. This could have a huge impact on the nature of the results, and the
relationship between the load on the data center as a whole and the impact of each
container needs to be investigated further.
It is believed that the user’s behavior impacts the data center’s energy consumption,
but that impact can easily be drowned out by other, more powerful trends in the data.
Further studies are needed to develop heuristics to make sure that it is the user’s impact
that is being predicted, and not factors beyond their control.
The heuristics used throughout this thesis for splitting power consumption between
containers were very straightforward. It is believed that better heuristics exist, and
that the right heuristics could provide valuable insights into data center dynamics.
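As an example of the kind of straightforward heuristic referred to here (a hypothetical sketch, not necessarily the one used in the thesis), a node's measured power can be attributed to its containers in proportion to their CPU usage:

```python
import numpy as np

def split_node_power(node_power: float, container_cpu: np.ndarray) -> np.ndarray:
    """Attribute a node's measured power to its containers in proportion to
    each container's CPU usage (a deliberately simple heuristic)."""
    shares = container_cpu / container_cpu.sum()
    return node_power * shares

# A 200 W node hosting three containers with CPU usages 2.0, 1.0 and 1.0.
watts = split_node_power(200.0, np.array([2.0, 1.0, 1.0]))
print(watts)  # [100.  50.  50.]
```

A better heuristic might also account for the node's idle power draw, which this proportional split silently attributes to the busiest containers.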
There is more work to be done with regards to fine tuning the models and finding even
better ways of predicting the energy consumption. This work can be seen as a starting
point in that endeavor.
Another interesting angle to study would be what kind of prompts would be most
helpful to motivate users to improve the environmental footprint of their cloud usage.
This could be done in the form of user research, as well as through looking at adjacent
attempts to help people be more environmentally aware.
This project was all about predicting energy consumption based on available metrics.
An interesting follow-up study would be to try to project the behavior of the cloud
into the future. The research performed in this thesis could help ensure that one does
not have to introduce power measuring devices for each node in each data center where
such forecasting is to be done.
If those metrics can be forecast (given a certain user workload and the time at which it
is submitted to the data center), it would be possible to give even better predictions
to the user, based on the most recent data for that node. Such a system would have to be
easy to use in order to invite active participation.
Bibliography
[1] R. Shaw, E. Howley, and E. Barrett, “An advanced reinforcement learning approach
for energy-aware virtual machine consolidation in cloud data centers,” 2017 12th In-
ternational Conference for Internet Technology and Secured Transactions (ICITST),
2017.
[2] Y. Qiu, C. Jiang, Y. Wang, D. Ou, Y. Li, and J. Wan, “Energy aware virtual machine
scheduling in data centers,” Energies, vol. 12, no. 4, p. 646, 2019.
[3] A. Jaiantilal, Y. Jiang, and S. Mishra, “Modeling cpu energy consumption for energy
efficient scheduling,” Proceedings of the 1st Workshop on Green Computing - GCM
10, 2010.
[4] A. Hindle, “Green software engineering: The curse of methodology,” 2016 IEEE
23rd International Conference on Software Analysis, Evolution, and Reengineering
(SANER), 2016.
[5] N. S. Chauhan and A. Saxena, “A green software development life cycle for cloud
computing,” IT Professional, vol. 15, no. 1, pp. 8–34, 2013.
[6] L. Ardito, G. Procaccianti, M. Torchiano, and A. Vetro, “Understanding green soft-
ware development: A conceptual framework,” IT Professional, vol. 17, no. 1, pp.
44–50, 2015.
[7] J. L. Berral, Inigo Goiri, R. Nou, F. Julia, J. Guitart, R. Gavalda, and J. Torres, “To-
wards energy-aware scheduling in data centers using machine learning,” Proceedings
of the 1st International Conference on Energy-Efficient Computing and Networking
- e-Energy 10, 2010.
[8] J. L. Berral, R. Gavalda, and J. Torres, “Adaptive scheduling on power-aware man-
aged data-centers using machine learning,” 2011 IEEE/ACM 12th International
Conference on Grid Computing, 2011.
[9] G. Portaluri, D. Adami, A. Gabbrielli, S. Giordano, and M. Pagano, “Power
consumption-aware virtual machine placement in cloud data center,” IEEE Trans-
actions on Green Communications and Networking, vol. 1, no. 4, pp. 541–550, 2017.
[10] Q. Fang, J. Wang, and Q. Gong, “Qos-driven power management of data centers
via model predictive control,” IEEE Transactions on Automation Science and En-
gineering, vol. 13, no. 4, pp. 1557–1566, 2016.
[11] H. He and H. Shen, “Green-aware online resource allocation for geo-distributed cloud
data centers on multi-source energy,” 2016 17th International Conference on Parallel
and Distributed Computing, Applications and Technologies (PDCAT), 2016.
[12] J. V. Wang, C.-T. Cheng, and C. K. Tse, “A power and thermal-aware virtual
machine allocation mechanism for cloud data centers,” 2015 IEEE International
Conference on Communication Workshop (ICCW), 2015.
[13] D.-M. Zhao, J.-T. Zhou, and K. Li, “An energy-aware algorithm for virtual machine
placement in cloud computing,” IEEE Access, vol. 7, pp. 55 659–55 668, 2019.
[14] A. Radhakrishnan and K. Saravanan, “Energy aware resource allocation model for
iaas optimization,” Studies in Big Data Cloud Computing for Optimization: Foun-
dations, Applications, and Challenges, pp. 51–71, 2018.
[15] I. Kar, R. R. Parida, and H. Das, “Energy aware scheduling using genetic algorithm
in cloud data centers,” 2016 International Conference on Electrical, Electronics, and
Optimization Techniques (ICEEOT), 2016.
[16] S. Javed, W. Manzoor, N. Akhtar, and D. K. Zafar, “Optimization of resource
allocation scheduling in cloud computing by genetic algorithm,” vol. 2, no. 1, 2013.
[17] X. Zhou, K. Wang, W. Jia, and M. Guo, “Reinforcement learning-based adaptive
resource management of differentiated services in geo-distributed data centers,” 2017
IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), 2017.
[18] V. R. Rajarathinam, J. Rajarathinam, and H. Gupta, “Power-aware meta scheduler
with non-linear workload prediction for adaptive virtual machine provisioning,” In-
telligent Computing Theory Lecture Notes in Computer Science, pp. 826–837, 2014.
[19] K. Qazi, Y. Li, and A. Sohn, “Workload prediction of virtual machines for har-
nessing data center resources,” 2014 IEEE 7th International Conference on Cloud
Computing, 2014.
[20] F. Ramezani and M. Naderpour, “A fuzzy virtual machine workload prediction
method for cloud environments,” 2017 IEEE International Conference on Fuzzy
Systems (FUZZ-IEEE), 2017.
[21] P. S. L. Kalyampudi, P. V. Krishna, S. Kuppani, and V. Saritha, “A work load
prediction strategy for power optimization on cloud based data centre using deep
machine learning,” Evolutionary Intelligence, 2019.
[22] Q. Zhang, L. T. Yang, Z. Yan, Z. Chen, and P. Li, “An efficient deep learning
model to predict cloud workload for industry informatics,” IEEE Transactions on
Industrial Informatics, vol. 14, no. 7, pp. 3170–3178, 2018.
[23] F. Nwanganga and N. Chawla, “Using structural similarity to predict future work-
load behavior in the cloud,” 2019 IEEE 12th International Conference on Cloud
Computing (CLOUD), 2019.
[24] W. Ding, F. Luo, C. Gu, H. Lu, and Q. Zhou, “Performance-to-power ratio aware
resource consolidation framework based on reinforcement learning in cloud data
centers,” IEEE Access, pp. 1–1, 2020.
[25] D. Meisner and T. F. Wenisch, “Peak power modeling for data center servers with
switched-mode power supplies,” Proceedings of the 16th ACM/IEEE international
symposium on Low power electronics and design - ISLPED 10, 2010.
[26] G. Dhiman, K. Mihic, and T. Rosing, “A system for online power prediction in
virtualized environments using gaussian mixture models,” Proceedings of the 47th
Design Automation Conference on - DAC 10, 2010.
[27] M. Dayarathna, Y. Wen, and R. Fan, “Data center energy consumption modeling:
A survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 732–794,
2016.
[28] M. Canuto, R. Bosch, M. Macias, and J. Guitart, “A methodology for full-system
power modeling in heterogeneous data centers,” Proceedings of the 9th International
Conference on Utility and Cloud Computing - UCC 16, 2016.
[29] A. Borghesi, A. Bartolini, M. Lombardi, M. Milano, and L. Benini, “Predictive
modeling for job power consumption in hpc systems,” Lecture Notes in Computer
Science High Performance Computing, pp. 181–199, 2016.
[30] Y. Li, H. Hu, Y. Wen, and J. Zhang, “Learning-based power prediction for data
centre operations via deep neural networks,” Proceedings of the 5th International
Workshop on Energy Efficient Data Centres - E2DC 16, 2016.
[31] J. V. Kistowski, M. Schreck, and S. Kounev, “Predicting power consumption in vir-
tualized environments,” Computer Performance Engineering Lecture Notes in Com-
puter Science, pp. 79–93, 2016.
[32] N. Liu, X. Lin, and Y. Wang, “Data center power management for regulation service
using neural network-based power prediction,” 2017 18th International Symposium
on Quality Electronic Design (ISQED), 2017.
[33] M. Ferroni, A. Corna, A. Damiani, R. Brondolin, J. A. Colmenares, S. Hofmeyr, J. D.
Kubiatowicz, and M. D. Santambrogio, “Power consumption models for multi-tenant
server infrastructures,” ACM Transactions on Architecture and Code Optimization,
vol. 14, no. 4, pp. 1–22, 2017.
[34] A. Rayan and Y. Nah, “Energy-aware resource prediction in virtualized data centers:
A machine learning approach,” 2018 IEEE International Conference on Consumer
Electronics - Asia (ICCE-Asia), 2018.
[35] Y.-F. Hsu, K. Matsuda, and M. Matsuoka, “Self-aware workload forecasting in data
center power prediction,” 2018 18th IEEE/ACM International Symposium on Clus-
ter, Cloud and Grid Computing (CCGRID), 2018.
[36] K. N. Khan, S. Scepanovic, T. Niemi, J. K. Nurminen, S. V. Alfthan, and O.-P.
Lehto, “Analyzing the power consumption behavior of a large scale data center,”
SICS Software-Intensive Cyber-Physical Systems, vol. 34, no. 1, pp. 61–70, 2018.
[37] S. V, A. M, C. D. H, G. M. Chethana, and K. S, “A weighted ensemble of auto-
matic algorithms for virtual machine performance prediction in cloud,” International
Journal of Current Engineering and Scientific Research, vol. 6, no. 6, pp. 198–203,
2019.
[38] D. Yi, X. Zhou, Y. Wen, and R. Tan, “Toward efficient compute-intensive job allo-
cation for green data centers: A deep reinforcement learning approach,” 2019 IEEE
39th International Conference on Distributed Computing Systems (ICDCS), 2019.
[39] J. V. Kistowski, J. Grohmann, N. Schmitt, and S. Kounev, “Predicting server power
consumption from standard rating results,” Proceedings of the 2019 ACM/SPEC
International Conference on Performance Engineering - ICPE 19, 2019.
[40] D. Yi, X. Zhou, Y. Wen, and R. Tan, “Efficient compute-intensive job allocation in
data centers via deep reinforcement learning,” IEEE Transactions on Parallel and
Distributed Systems, pp. 1–1, 2020.
[41] T. Bayes, “An essay towards solving a problem in the doctrine of chances,”
Philosophical Transactions of the Royal Society of London, vol. 53, pp. 370–418, 1763.
[42] J. H. Holland, “Outline for a logical theory of adaptive systems,” Journal of the
ACM (JACM), vol. 9, no. 3, pp. 297–314, 1962.
[43] C. Darwin, “On the origin of species by means of natural selection, or, the preser-
vation of favoured races in the struggle for life,” 1859.
[44] S. Gu, T. P. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep
q-learning with model-based acceleration,” CoRR, 2016. [Online]. Available:
http://arxiv.org/abs/1603.00748
[45] S. Gu, “Sample-efficient deep reinforcement learning for continuous control,” Ph.D.
dissertation.
[46] “Rancher – run kubernetes everywhere.” [Online]. Available: https://rancher.com/
(Accessed 2020-03-02).
[47] “Kubernetes — production-grade container orchestration.” [Online]. Available:
https://kubernetes.io/ (Accessed 2020-03-02).
[48] “Prometheus — from metrics to insight.” [Online]. Available: https://prometheus.
io/ (Accessed 2020-03-02).
[49] “Weather underground.” [Online]. Available: https://www.wunderground.com/
about/data (Accessed 2020-04-13).
[50] “scikit-learn - machine learning in python.” [Online]. Available: https:
//scikit-learn.org/stable/ (Accessed 2020-03-30).
[51] J. Salvatier, T. Wiecki, and C. Fonnesbeck, “Probabilistic programming in python
using pymc3,” 01 2016.
[52] “Getting started with pymc3.” [Online]. Available: https://docs.pymc.io/
notebooks/getting_started.html (Accessed 2020-02-25).
[53] “General api quickstart.” [Online]. Available: https://docs.pymc.io/notebooks/
api_quickstart.html (Accessed 2020-02-25).
[54] “Probabilistic programming & bayesian methods for hack-
ers.” [Online]. Available: https://camdavidsonpilon.github.io/
Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/ (Accessed 2020-
02-25).
[55] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, and C. Gagne,
“DEAP: Evolutionary algorithms made easy,” Journal of Machine Learning Re-
search, vol. 13, pp. 2171–2175, Jul. 2012.
[56] “Understanding lstm networks.” [Online]. Available: https://colah.github.io/posts/
2015-08-Understanding-LSTMs/ (Accessed 2020-02-26).
[57] Chandler, “A pytorch example to use rnn for financial prediction.” [Online].
Available: https://chandlerzuo.github.io/blog/2017/11/darnn (Accessed 2020-02-
26).
[58] Ikostrikov, “Reimplementation of continuous deep q-learning with model-based
acceleration and continuous control with deep reinforcement learning,” Jan. 2020.
[Online]. Available: https://github.com/ikostrikov/pytorch-ddpg-naf (Accessed
2020-03-02).