Scuola Politecnica e delle Scienze di Base
Master's Degree Course in Computer Engineering

Master's Thesis in Real-Time Systems

Real-time monitoring of microservices-based software systems
Academic Year 2016/2017

Supervisor: Prof. Marcello Cinque
Co-supervisor: Ing. Raffaele Della Corte
Candidate: Raffaele Iorio, matr. M63/000492
[Dedication]
Index Index .................................................................................................................................................. IIIIntroduction .......................................................................................................................................... 5Chapter 1: The world of microservices ................................................................................................ 7
1.1 Before the microservices ............................................................................................................ 71.2 Microservices ............................................................................................................................. 9
1.2.1 Communication between microservices ........................................................................... 10Chapter 2: How to monitor microservices? ....................................................................................... 14
2.1 Real-time log analysis .............................................................................................................. 162.1.1 ELK Stack ......................................................................................................................... 16
2.2 Consideration on the state of the art ........................................................................................ 18Chapter 3: A new approach ................................................................................................................ 19
3.1 Rule-based logging .................................................................................................................. 193.2 Sniffing or proxy? .................................................................................................................... 213.3 What information do I need in the log? ................................................................................... 22
Chapter 4: MetroFunnel ..................................................................................................................... 244.1 Software Architecture .............................................................................................................. 244.2 Software workflow ................................................................................................................... 284.3 Docker version ......................................................................................................................... 31
4.3.1 Image and Dockerfile ........................................................................................................ 334.4 Elastic stack configuration ....................................................................................................... 344.5 User and operating manual ...................................................................................................... 36
Chapter 5: Comparison ...................................................................................................................... 405.1 Clearwater IMS ........................................................................................................................ 40
5.1.1 Clearwater-live-test ........................................................................................................... 445.1.2 Basic operation .................................................................................................................. 44
5.2 Testbed ..................................................................................................................................... 455.3 Performance Analysis .............................................................................................................. 47
5.3.1 Log Size ............................................................................................................................ 495.3.2 Data incoming ................................................................................................................... 515.3.3 Execution time .................................................................................................................. 545.3.4 Bandwidth ......................................................................................................................... 68
5.4 Failure Analysis ....................................................................................................................... 705.4.1 Failure 503 ........................................................................................................................ 715.4.2 Failure 181 ........................................................................................................................ 725.4.3 Failure 502 –Homestead-prov (forced kill) ...................................................................... 745.4.4 Failure 502 – Homer (forced kill) ..................................................................................... 775.4.5 Failure 504 – Overload ..................................................................................................... 785.4.6 Further considerations on failure analysis ........................................................................ 82
Conclusions ........................................................................................................................................ 84
Future developments .......................................................................................................................... 85Bibliography ...................................................................................................................................... 86
Real-time monitoring of microservices-based software systems
5
Introduction
Monitoring is one of the most widespread practices for assessing the operating status of a
system. However, progress in development techniques is not always matched by
comparable progress in monitoring techniques.
This is the case with microservices and their monitoring.
Microservices, a recent evolution of service-oriented architecture (SOA), first appeared in
studies around 2007/08 and reached widespread adoption in 2014/15.
Although a few years have passed since their rise, to date there is still no tool designed
specifically to monitor microservices; all that is available are various tools and techniques
that can be adapted to them.
This adaptation means that it is not always possible to obtain the right information from
monitoring; furthermore, considerable configuration is required before monitoring can
actually start.
The idea pursued in this work is to make available to developers and the community a
tool specific to microservices that is easy to install, transparent and non-intrusive.
This was achieved by taking the best of current solutions without inheriting their faults;
moreover, the principles of rule-based logging, an effective technique for log generation,
have been applied.
The result is MetroFunnel, a monitoring tool specific to microservices.
Subsequently, to validate the project, a test application was chosen, Clearwater IMS, on
which MetroFunnel was compared with the various techniques and tools available.
We first assessed the performance impact MetroFunnel produces compared to the other
solutions, and then the number of failures each solution detects.
As will be seen, MetroFunnel achieves excellent results from many points of view, at a
minimal, and in some respects negligible, performance cost.
Therefore, the proposed solution is ready to use as a valid tool for monitoring
microservices and, at the same time, is a valid starting point for even better future
developments.
Chapter 1 starts with a little history and explains why microservices were born.
Chapter 2 illustrates the techniques available today for monitoring microservices,
showing their strengths and weaknesses and, above all, what is currently missing.
Chapter 3 presents the idea behind MetroFunnel: packet sniffing to generate logs,
exploiting the principles of rule-based logging.
Chapter 4 describes the architecture of MetroFunnel and its behavior, with various
examples of logs and of the tool in operation.
Finally, Chapter 5 presents a comparison of the different solutions.
Chapter 1: The world of microservices
This chapter is an introduction to the world of microservices; we start with a bit of
history, showing the architectural models of applications and their evolution up to the
birth of microservices.
Then we go into more detail, explaining the operating principles of microservices and the
various development techniques, showing their advantages and disadvantages.
1.1 Before the microservices

In the beginning, there was the "monolith": applications developed and distributed as a
single entity. Monolithic applications are easy to implement because they have only one
code base, typically gathered in a single project that is distributed within a single package.
Another possible definition is: a software application whose modules cannot be executed
independently. This definition highlights another criticality of monoliths: they are very
difficult to use in distributed systems without an appropriate framework or an ad-hoc
solution [4], [15].
This type of architecture lends itself well to small applications, or in any case applications
not subject to change, but it becomes problematic when developing complex and rapidly
evolving applications.
The disadvantages of monolithic applications quickly became clear to developers, so a
new architectural model was needed. It started with the principle of decomposition, in
particular logical decomposition, which allowed more efficient scalability.
This architecture, called multi-tier, is generally made up of a data layer, a business logic
layer, and a presentation layer, dividing the program according to the tasks it has to
manage. The data layer takes care of storage, the presentation layer handles interaction
with the user, and the logic layer contains all the information-processing parts.
The next step was to break down applications based on business functionality (for
example, managing an inventory, inserting items, checking availability, etc.) rather than
on a stack-level division like the multi-tier one. An application is seen as a collection of
services and functionality, and thus the first applications based on service-oriented
architecture (SOA) were developed. The advantages of an SOA architecture are [1], [4]:
• Dynamism - New instances of the same service can be launched to split the load on
the system;
• Modularity and reuse - Complex services are composed of simpler ones. The same
services can be used by different systems;
• Distributed development – By agreeing on the interfaces of the distributed system,
distinct development teams can develop partitions of it in parallel;
• Integration of heterogeneous and legacy systems - Services merely have to
implement standard protocols to communicate.
Precisely, the last point was one of the determining factors in the birth of microservices.
As previously written, connecting different and heterogeneous systems required
implementing ad-hoc communication mechanisms; usually, a proprietary Enterprise
Service Bus (ESB) product is used for this task. This proprietary product conceals within
itself the complexity of coordinating the various components. Over time, changing the
configuration of the ESB becomes increasingly difficult, and one tends to stop using the
ESB and make the components call each other directly, reintroducing the very problems
the ESB had set out to resolve [5].
Therefore, independent, easily modifiable and replaceable services were needed, which
communicate in a simple way.
1.2 Microservices

The development of the microservices architecture is closely linked to the rise of the
DevOps and Agile methodologies.
DevOps, a contraction of "development" and "operations", is a software development
methodology that focuses on communication, collaboration and integration between
developers and IT operations staff.
A definition proposed is [6]:
DevOps is a set of practices intended to reduce the time between committing a change to a
system and the change being placed into normal production, while ensuring high quality.
Among the practices promoted by Agile methods are the formation of small,
cross-functional and self-organized development teams, iterative and incremental
development, adaptive planning, and the direct and continuous involvement of the client
in the development process.
SOAs, as they stood, were not sufficient: there was high coupling between the various
services (coupling indicates the degree to which each component of a piece of software
depends on the others), which prevented the independent development of services and the
rapid release of new software versions required by the two methodologies.
The services were then broken down not only on the basis of their business functionality,
but also according to the Single Responsibility Principle (SRP). According to the SRP, it
is good to gather together the elements that change for the same reason, and to keep
separate the elements that change for different reasons.
Microservices follow the same principle: they group operations that do the same things
and make them independent of the others. A microservice must be releasable in a way
that is completely independent of the others and, above all, completely transparent to its
consumers.
The main features of a microservice are:
• Small Size
• Independency
• Flexibility
• Modularity
A microservice is generally smaller than a service, but the word "micro" must not
mislead: a microservice is not only small but, above all, simple, in the spirit of the UNIX
philosophy [7]:
Write programs that do one thing and do it well. Write programs to work together. Write
programs to handle text streams, because that is a universal interface.
An application must keep up with the ever-changing business environment and support
all the modifications necessary for an organisation to stay competitive in the market;
therefore microservices must be flexible and independent of each other.
Moreover, an application must be modular, composed of isolated components, each
contributing to the overall system behaviour.
Now it was necessary to find a simple and effective method of communication for
microservices, one that overcame the limits and problems faced by SOAs.
As often happens, it is easier to use a methodology that works than to develop a new one;
this explains why HTTP and REST are so closely linked to microservice architectures.
1.2.1 Communication between microservices
There are various studies in which techniques for microservices communication are
discussed; however, all agree that communication must be simple, light and efficient [5],
[15]. For example, one can use memory sharing (a database), message exchange, or
specific ad-hoc protocols. The most important thing is to maintain the independence of
each microservice from the others.
Database sharing is the easiest method of communication: it is simple and, above all,
quick to implement, but it has a major flaw: all microservices need to know the database
implementation. Moreover, if you need to change the database to implement a new
microservice, you must be sure that the change does not affect the other microservices, so
that they remain independent of each other.
As with SOAs, implementing an ad-hoc protocol or bus leads, as the number of
microservices grows, to a considerable increase in the difficulty of implementing and
maintaining the bus, thus hindering the development of new microservices.
The message exchange model is based on two simple message types, request and
response: a client sends a request and waits for a response from the server.
There are two possible solutions for implementing the request/response model: RPC
(Remote Procedure Call) and REST (REpresentational State Transfer).
In the RPC model, a call is made to a local function, which will be executed on a remote
service, of which we do not necessarily know the location.
There are several types of RPC-based technologies: some of them exhibit a separate
interface that makes the communication between clients and servers easier, even if they are
made with different technologies.
Some examples are SOAP (Simple Object Access Protocol), a protocol specification for
exchanging structured information, and WSDL (Web Services Description Language), a
formal XML-based language used to create "documents" describing a Web Service.
WSDL describes the public interface of a Web Service and indicates how to interact with
it.
The disadvantage is the high coupling between client and server, due precisely to the
WSDL model: each change to the server requires regenerating the WSDL model, and
therefore a new reading of it by the client in order to use the new implementation.
REST is an architectural style inspired by the web, which exploits what the web itself
already exposes; the main difference, compared to RPC, is the concept of resource.
RPC exposes services, while REST exposes "resources". In the context of microservices,
a resource is something that a microservice knows well, such as a "Customer" or an
"Article". However, the application must know the format of the returned information (its
representation), typically an HTML, XML or JSON (JavaScript Object Notation, a format
suited to data interchange between client and server applications) document, but it could
also be an image or any other content.
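As a minimal sketch of the idea of representation, consider a hypothetical "Customer" resource (the field names and values below are invented for the example): the service returns only its serialized JSON form, regardless of how the data is stored internally.

```python
import json

# Hypothetical "Customer" resource; the service's internal storage could be
# completely different (rows in a database, an object graph, etc.).
customer = {
    "id": 1,
    "name": "Mario Rossi",
    "email": "mario.rossi@example.com",
}

# The representation is simply the serialized JSON document that the service
# would return in the body of an HTTP response.
representation = json.dumps(customer)
print(representation)
```

The same resource could equally be served as XML or HTML: only the representation changes, not the resource itself.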
Another principle introduced by REST, useful for developing totally decoupled services,
is HATEOAS (Hypermedia As The Engine Of Application State). This principle states
that in a REST application the client needs to know very little about the application in
order to use it; ideally, the only thing it needs to know is the entry Uniform Resource
Identifier (URI).
A URI is a sequence of characters that uniquely identifies a generic resource. Examples
of resources identified by URIs are: a web address (URL), a document, an image, a file, a
service, an e-mail address, etc.
The server creates different representations of the resource; however, how the resource is
exposed externally is completely separate from the way it is stored internally. The
protocol most used to implement REST is HTTP, but it is not necessarily the only one
supported.
In any case, the methods exposed by the HTTP protocol match the REST style perfectly,
because it is possible to perform all the CRUD operations (Create, Read, Update and
Delete) on resources by means of the different types of HTTP request:
• GET: used to read the status of a resource
• POST: used to create a resource
• PUT: used to modify a resource
• DELETE: used to delete a resource.
Now, through some examples, let's see how the HTTP methods relate to resources.
Suppose we have a "test" application that manages a "users" resource, allowing
consumers to create, modify, insert and delete users; the operations on the resources will
be:
1. GET /test/users/1
2. GET /test/users
3. POST /test/users/2
4. DELETE /test/users/3
In the first example, the expression "/test/users/1" indicates the resource, which in this
case represents the user with identifier 1 of the "test" application.
In the second example, instead, we want a list of all the resources contained in the
collection identified by the URI "/test/users".
In example 3, we want to create the resource identified by the URI "/test/users/2".
In example 4, instead, we want to delete the resource identified by the URI
"/test/users/3".
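The four operations above can be sketched as the HTTP request lines a client would send on the wire; this is only an illustration of the method-to-resource mapping, and no request is actually transmitted.

```python
# The four example operations; comments restate their intent.
operations = [
    ("GET", "/test/users/1"),     # read the user with identifier 1
    ("GET", "/test/users"),       # list all users in the collection
    ("POST", "/test/users/2"),    # create the user with identifier 2
    ("DELETE", "/test/users/3"),  # delete the user with identifier 3
]

def request_line(method: str, uri: str) -> str:
    """Build the HTTP/1.1 request line for a method/URI pair."""
    return f"{method} {uri} HTTP/1.1"

for method, uri in operations:
    print(request_line(method, uri))
```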
Below, a typical architecture based on microservices is shown.
Figure 1 - Microservices architecture
Chapter 2: How to monitor microservices?
After having seen how microservices work, we can analyse the techniques and tools that
can monitor their functioning. Currently, the monitoring of microservices can be divided
according to various aspects and application domains.
For example, tools are available to monitor microservices during development, in the
validation and design phase (examples of API validation tools are Postman [22],
Apiwatcher [23], etc.); they verify that the response to a microservice request is the
expected one, breaking down the message and its payload and comparing them with the
expected values.
Then there are all the tools that allow monitoring of computers and network resources;
very good tools in this category are, for example, Nagios [13] and Amazon CloudWatch
[20].
There are also tools like Hystrix [21] (by Netflix) that, by manually adding code, wrap all
calls to external systems (which could be microservice calls) to improve control over
latency and failures and to stop cascading failures; Hystrix also allows the monitoring of
resources.
Finally, there are tools for analysing logs in real time, which obtain information from the
logs the application generates.
As we are interested in monitoring during the normal operation of the application, and not
in the development phase, the API validation tools cannot be used.
Tools like Hystrix allow the monitoring of resources, but only by adding code to the
application; moreover, such monitoring must be seen as an additional functionality with
respect to the main purpose of Hystrix, which is to improve application reliability by
preventing cascading failures from bringing down the entire application.
Let's move on to tools like Amazon Cloudwatch and Nagios.
The first is a commercial tool for collecting and monitoring parameters and log files,
setting alarms and reacting automatically to changes in AWS (Amazon Web Services)
resources. It can be used to achieve system-wide visibility into resource utilization,
application performance, and operational health. The information obtained in this way
can be used to correct operation and keep application performance optimal.
However, CloudWatch can only be used for applications running on AWS, and only a
commercial version exists.
Nagios is a valid suite of monitoring tools. In particular, by exploiting the URL
monitoring provided by the Nagios XI program, it is possible to obtain some information
on the status of microservices (a URL also identifies a resource, and therefore a
microservice); it is also possible to monitor an entire website, obtaining general
information on its operating status.
In the first case, Nagios requires a configuration for each URL to be monitored; since the
number of resources in a microservices-based system is considerable, this approach can
require a lot of configuration time. In addition, every modification requires a new
configuration.
In the second case, it is possible to monitor an entire website, but at the cost of losing the
granularity of the information at the resource level, and therefore at the microservice
level.
It is important to note that although microservices use REST and HTTP, they are not
necessarily served through a web server; they can also be served by a different
application listening on certain TCP ports. In these cases, Nagios may not work properly,
or may not work at all.
Moreover, Nagios is a very demanding tool in terms of recommended hardware
requirements: to monitor up to 250 services, 40 GB of disk and from 1 to 4 GB of RAM
are recommended [13].
Based on these considerations, let's study the main advantages and disadvantages of
real-time log analysis.
2.1 Real-time log analysis

Currently, the best method for monitoring microservices is certainly the analysis of logs
in real time. This is possible thanks to tools that read the log generated by the application,
extract the right information and display it on screen. Many applications are available for
this purpose; in particular, we consider those distributed by Elastic [12], and specifically
the ELK stack.
2.1.1 ELK Stack
The ELK stack consists mainly of 4 components, as can be seen from the following figure.
Beats is a family of programs for data collection; in particular, Filebeat is a lightweight
shipper that sends the logs to the next component in the chain. Filebeat is useful above all
in distributed settings, where various physical nodes generate log files.
Logstash is the component of the chain that receives the data, filters it according to a
configuration file, applies any changes to it, and standardizes it according to a
user-defined template.
Elasticsearch is the component that takes care of data storage; it also indexes the received
data, for quick access and comparison.
Figure 2 - ELK Stack

Kibana is the software for the graphical interface and data visualization; through Kibana
it is possible to create graphs from the received data, to show only the data of interest
through filtering, to visualize the trend of some parameters over time, and so on.
To date, the ELK stack is one of the best tools for performing real-time log analysis;
however, it is not always easy to implement, due in particular to the Logstash
configuration for data filtering. Below are two log excerpts from two different
applications.
[pid: 2177|app: 0|req: 6/8] 127.0.0.1 () {42 vars in 675 bytes} [Fri Mar 3 05:20:00 2017] GET /api/users/3/ => generated 669 bytes in 16 msecs (HTTP/1.1 200) 4 headers in 134 bytes (2 switches on core 0)

10-01-2018 10:33:01.794 UTC INFO homestead.py:272: Sending HTTP PUT request to http://homestead-prov:8889/private/6505550742%40example.com
As you can see, even though the information they contain is similar (timestamp, request
URI, etc.), their filtering is totally different; to the human eye it may seem easy, but for
automatic filtering through an application like Logstash it corresponds to two different
configuration files. For example, an extract of the Logstash filter configuration for the
first application is shown below [19]:
filter {
  grok {
    match => { "message" => "\[pid: %{NUMBER:pid}\|app: %{NUMBER:id}\|req: %{NUMBER:currentReq}/%{NUMBER:totalReq}\] %{IP:remoteAddr} \(%{WORD:remoteUser}?\) \{%{NUMBER:CGIVar} vars in %{NUMBER:CGISize} bytes\} %{SYSLOG5424SD:timestamp} %{WORD:method} %{URIPATHPARAM:uri} \=\> generated %{NUMBER:resSize} bytes in %{NUMBER:resTime} msecs \(HTTP/%{NUMBER:httpVer} %{NUMBER:status}\) %{NUMBER:headers} headers in %{NUMBER:headersSize} bytes %{GREEDYDATA:coreInfo}" }
  }
}
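For comparison, here is a minimal Python sketch of the kind of pattern the second log format would require; the regular expression is illustrative, not the Logstash configuration actually used, but it shows that nothing can be reused from the first pattern even though both logs carry a timestamp, an HTTP method and a URI.

```python
import re

# Sample log line from the second application.
line = ("10-01-2018 10:33:01.794 UTC INFO homestead.py:272: "
        "Sending HTTP PUT request to "
        "http://homestead-prov:8889/private/6505550742%40example.com")

# Hypothetical pattern for the second format: date-time with timezone,
# log level, source file and line, then the HTTP method and target URI.
pattern = re.compile(
    r"(?P<timestamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} \w+) "
    r"(?P<level>\w+) (?P<source>[\w.]+:\d+): "
    r"Sending HTTP (?P<method>\w+) request to (?P<uri>\S+)"
)

match = pattern.match(line)
print(match.group("method"), match.group("uri"))
```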
Furthermore, it is important to consider that in this case you have two logs with a
well-defined data format; in all the cases where there is no standard for writing the log, or
the application does not generate a log file at all, monitoring microservices through
real-time log analysis is very difficult, if not impossible.
2.2 Considerations on the state of the art

As this chapter has shown, tools to monitor microservices do exist; however, they differ
in ease of implementation and in the information they can provide. With tools like
Nagios, it is simple to obtain information on how a website works, but not with a
granularity at the level of the single microservice; vice versa, it is possible to obtain
information on the single microservice, but at the cost of considerable configuration.
With real-time log analysis we obtain much more information, but it requires many
configurations strictly dependent on the application to be monitored.
What is missing is a monitoring tool for all microservices-based applications that
provides a good amount of information and, at the same time, is easy to implement.
Surely, among the existing methods and tools, real-time log analysis is the one that
promises the best results. However, we must find a monitoring approach that generates
logs that are easy to interpret, and that is transparent to the application, without changing
its behaviour.
To be transparent to the application, we cannot create frameworks or libraries based on
code instrumentation, which would add, during the development phase, annotations for
the generation of easily interpretable logs.
Furthermore, there is no single implementation of microservices; we have seen that the
main method is REST, but REST APIs can be developed with various frameworks, for
example Spring or JAX-WS in Java, or with PHP code and an Apache web server, and so
on.
However, all these applications have in common the use of HTTP for the exchange of
messages; therefore we must find a method to generate logs starting from the messages
exchanged over HTTP, from which useful information for microservice monitoring can
be obtained.
Chapter 3: A new approach
As explained in the previous chapter, we want to provide a monitoring approach that is
completely transparent to the application and easy to use.
To do this, we chose to generate logs starting from the messages exchanged: since
requests and responses follow the HTTP protocol, all applications look alike at this level,
and it is therefore possible to generate a generic log for any type of microservices-based
application.
After explaining the criterion by which the exchange of messages can be associated with
the functioning of the microservices, we show how the exchanged messages can be
captured, by packet sniffing or by using a proxy.
After that, the issues concerning the information needed in a log will be addressed.
3.1 Rule-based logging

We have to connect the transit of network packets to the functioning, or not, of the
microservices; by capturing the exchanged messages we obtain information about the
HTTP requests and responses of the applications, but this must be properly analysed in
order to understand whether the monitored microservices are working properly.
The proposed approach is inspired by rule-based logging (described in the study [3]), which aims to make logs effective for analysing software failures.
This study shows the main patterns used within the code for the generation of error logs, such as the construct if (condition) then log_error(), which is the most common pattern among the applications examined.
These patterns are in most cases inserted at the end of the development cycle, often without knowledge of the program structure. As a consequence, failures occurring in the system are often not reported in the logs: the study shows that "around 60% of failures caused by software faults go unreported by current logging mechanisms".
The proposed approach is based on rules that define what to write and where to write it.
Rather than giving the application the task of detecting the occurrence of a failure and reporting it in the log, the approach verifies failures at a later time, through the analysis of the information written in the logs.
One of the proposed rules, for example, for verifying the correct execution of a service, is to log the start of the service as the first line of code and its end as the last line. These two rules are identified in the study as LR-1, Service STart (SST), and LR-2, Service ENd (SEN).
In this way, if the log contains a different number of SST and SEN entries, you can identify the cases in which the function terminated before completing correctly, whether due to an error, an unexpected exception, the termination of the whole application, or a timeout.
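The SST/SEN rules can be sketched as follows; class and method names here are illustrative, not taken from the cited study:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of rules LR-1 (Service STart) and LR-2 (Service ENd): every service
// logs SST as its first statement and SEN as its last. A crash, exception or
// timeout leaves an SST without a matching SEN, which later log analysis
// can detect by simply counting entries.
public class RuleBasedLogging {
    static final List<String> log = new ArrayList<>();

    static void logEvent(String rule, String service) {
        log.add(rule + " " + service);
    }

    static int monitoredService(int input) {
        logEvent("SST", "monitoredService"); // LR-1: first line of code
        int result = input * 2;              // service body
        logEvent("SEN", "monitoredService"); // LR-2: last line of code
        return result;
    }

    // A non-zero result means some service started but never finished.
    static long unmatchedStarts() {
        long sst = log.stream().filter(e -> e.startsWith("SST")).count();
        long sen = log.stream().filter(e -> e.startsWith("SEN")).count();
        return sst - sen;
    }
}
```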
Current logging mechanisms, instead, can miss failures entirely, since the application may terminate before the corresponding information has actually been written to the log.
As reported at the end of the study, the rule-based approach detects 94% of failures against 34% for current best practices, but it makes the cause of a failure harder to pinpoint, because the logged information lacks details such as "file could not be opened, a service was invoked with bad parameters, etc.".
This approach can be useful for our purpose, but how can we use it without modifying the application by inserting additional code?
Based on the design and behaviour of microservices, the following assumption can be made: the arrival of an HTTP request can be assimilated to the start of a service, while the reply sent can be assimilated to the end of the service.
In this way we can adopt the same principle as rule-based logging, but without any modification to the application. The approach proposed in this thesis leverages
the previous assumption to create an ad-hoc log containing all the information needed to monitor microservices.
3.2 Sniffing or proxy?
Two approaches are possible to capture the packets exchanged on the network: using a proxy or sniffing. By proxy, we mean an application that acts as an intermediary between client requests and the resources on the server; in this case the messages must not be modified in any way, only the information they carry is read.
Sniffing means the passive interception of data passing through the network; neither the server nor the client is aware of the presence of a sniffer.
Both solutions are valid, but each has weaknesses. With packet sniffing there is a risk of packet loss (being passive, it is not always possible to read all the information in a packet in time, or to capture every packet in transit), which creates false positives: if the response to a request is not intercepted, the rule-based logging seen in the previous paragraph cannot associate any response with that request, and therefore labels it as an incorrect execution of the microservice. The proxy, on the other hand, requires a minimum of application-dependent configuration, as it needs to know which TCP ports to forward the traffic to. However, this application-dependent configuration is not the only reason that led us to discard the proxy solution: an in-depth analysis highlighted other shortcomings, detailed in the following.
First, the use of a proxy creates a single point of failure: every effort to make the application robust and scalable is in vain, because everything depends on a single node, the proxy itself. Second, as the load increases, performance becomes strictly dependent on the proxy, with no possibility of tuning it or using workarounds. Moreover, even simply switching monitoring on and off is no longer trivial, since in both cases the nodes must be reconfigured appropriately.
Creating a highly reliable and scalable proxy would have been questionable both in terms of working hours and final cost; an excellent but expensive product involves
reducing the potential market share that can be addressed, leading the project to failure.
With the packet sniffing approach we have the potential problem of packet loss (as we will see later, with minimal additional CPU effort there are no losses and therefore no false positives), but certainly a simpler use of the monitoring tool and, above all, a definitely lower cost.
At this point we analyse what information can be extracted from the packets transiting on the network that is useful for monitoring the microservices.
3.3 What information do I need in the log?
In summary, we have found a method to monitor a microservices-based application while remaining transparent, without having to modify it or even know its implementation details and operation.
Now we need to understand what information can be useful for monitoring.
First of all, we must be able to identify the microservices; this is possible because this information corresponds to two fields of the HTTP request header: the method and the URL.
Then we need to know the result of the request, so we pick up the Response Code field from the HTTP response header.
We may also want to discriminate between source and destination nodes, so we need the TCP and IP headers. In particular, from the TCP header we take the Source Port and Destination Port fields; from the IP header, the Source Address and Destination Address.
This is the minimum information we need; we could also capture the additional data sent via JSON, XML and so on, for a better understanding of a possible failure, but this is not provided at this stage of development and in this version of the application.
One aspect of monitoring is also the performance evaluation of microservices, so we need to know their execution times. To do this we can compare the arrival time of a request packet with the instant in which the response packet is ready to be sent; the difference between these times can, with a small approximation, be assimilated to the execution time of the
microservice. It should be noted that, since both the request and the response are captured on the server, all delays due to transmission, retransmissions and network traffic are already excluded.
In order to evaluate performance, it can also be useful to know how many other requests the microservice, or the application, is handling at the same time, so we added two additional fields to the log, representing respectively the number of requests pending at the instant the request arrives and the number of requests pending at the instant the reply is sent.
For quick filtering of the information it is useful to have an alert level in the log, and this will therefore be our last field.
At this point all that remains is to create an application that can track received requests and
replies sent, and that generates a log with the fields identified above.
For quick access to the information, and to easily import it into third-party programs such as Logstash, Elasticsearch and Kibana for visualization, or into statistical analysis programs such as JMP, we chose a simple data format based on CSV.
Since we will certainly have simultaneous connections to the same microservice, we want both the request and the response in a single line, so as to have quick feedback on unanswered requests without having to search the log to understand which responses correspond to which requests. In this way we also avoid repeating the same information, since IP addresses and TCP ports are simply reversed between request and response. But this is an implementation detail that will be analysed in the next chapter.
So our log line, representing the beginning and end of a microservice invocation, will be the following:
Method, Url, IP source, TCP port source, IP Destination, TCP port destination,
Response code, Duration, Pending request at the beginning, Pending request at the end,
Info.
GET, /test/users/1, 127.0.0.1, 46594, 127.0.0.1, 8080, 200, 2.312776, 1, 0, Request –
Response
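As an illustration of the format, a possible serialization of one event into such a line (a sketch, not MetroFunnel's actual code, which is described in the next chapter):

```java
// Illustrative sketch of how one captured request/response pair maps onto
// the CSV line format above. Field order follows the log format of this
// chapter; method and class names are hypothetical.
public class LogLine {
    static String toCsv(String method, String url,
                        String srcIp, int srcPort, String dstIp, int dstPort,
                        int responseCode, double durationMs,
                        int pendingAtStart, int pendingAtEnd, String info) {
        return String.join(", ",
                method, url,
                srcIp, Integer.toString(srcPort),
                dstIp, Integer.toString(dstPort),
                Integer.toString(responseCode), Double.toString(durationMs),
                Integer.toString(pendingAtStart), Integer.toString(pendingAtEnd),
                info);
    }
}
```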
Chapter 4: MetroFunnel
After having seen in the previous chapter the high-level principle of operation of this new approach, we now go into more detail and introduce MetroFunnel. This application, developed in Java, analyses the packets passing on the network, filters those with HTTP headers, and extracts the information seen in the previous chapter to generate a log. Using this log, based on rule-based logging, it is possible to perform real-time analysis with the ELK stack shown in chapter 2, to monitor the microservices and detect failures.
Before developing the application, research was carried out to verify whether a similar product existed, or one that could be adapted to extract the information seen in the previous chapter from HTTP packets, but with a negative result: none allowed us to do what we wanted, and the necessary changes would have been difficult to implement. For this reason, we chose to create a project from scratch.
The architecture of MetroFunnel will now be presented; we will then describe its functioning, also through some examples, followed by how the Docker version was made, the ELK stack configuration and, finally, a short operating manual describing the various phases of operation.
4.1 Software Architecture
MetroFunnel is a multithreaded Java application that can analyse the traffic exchanged on several interfaces simultaneously; a thread is instantiated for each open interface. During start-up, it is possible to customize the capture of the packets and the
verification of requests without a response through a timeout, by specifying the TCP ports to be analysed and setting the timer. It should be added that the filter on TCP ports acts passively, checking the source and destination TCP port numbers of the packets; it does not interfere in any way with the open sockets.
The project has been divided into two packages: PacketSniffer and LogManagement. PacketSniffer is the package responsible for capturing packets, filtering (if requested) on the TCP port, and saving the fields related to requests and responses.
It consists of a single file, Sniffer.java, and is the core of the application; the sniffer operates in promiscuous mode, so it analyses all the packets passing on the network, not just those addressed to the open interface.
So we searched for frameworks and libraries that could help develop the sniffer, and found the following useful for our purpose:
• Pcap4J
• jpcap
• jNetPcap
All are Java wrappers of the libpcap/WinPcap library, providing objects with methods and attributes for rapid development. The choice among them was based on functionality, available versions, documentation, etc.
In the end we decided to use jNetPcap which, with respect to the aspects illustrated above, is the most complete from every point of view. Moreover, it uses JNI (Java Native Interface), while its competitors use JNA (Java Native Access), which allows jNetPcap to perform better than the others.
In addition, there is both an open source version, which is the one we are going to use, and a commercial version that promises better performance and technical support, which could be used in the future for an improved, commercial version of MetroFunnel. Version 1.4r1425 for 64-bit Linux will be used, while for Java, JDK Java SE 8u151 will be used. Because it uses the libpcap library through its wrapper, MetroFunnel needs root privileges to work.
LogManagement is the package responsible for writing the log; it consists of two files:
1. EventLog.java
2. EventLogManager.java
EventLog, which corresponds to the single event to be recorded in the log, includes the data seen in chapter 3, relating to both the request and the response. EventLogManager, on the other hand, takes care of the physical writing to file, and also has the methods for the request→response matching and for timeout verification.
Since MetroFunnel is developed as a multithreaded application, the only method running is
the run method of the Sniffer class; the other methods present inside the class are related to
the configuration phase during startup.
In the EventLogManager class, there are methods to create an event, search for an event,
insert response, check timeout and print to file.
In the EventLog class, in addition to the constructor method, there are methods for entering
the response and for the request/response association.
The architecture is as follows:
Figure 3 - MetroFunnel architecture
4.2 Software workflow
An activity diagram showing the operation of MetroFunnel is now presented.
Figure 4 - MetroFunnel Activity
For every packet that arrives, MetroFunnel verifies that it has an HTTP header; otherwise it is discarded. If so, it saves the information from the headers below the HTTP level: from the IP header it saves the source and destination IP addresses, and from the TCP header the TCP source and destination ports.
At this point it checks whether filtering is to be performed. If capture was set up for packets coming from and going to all ports, it proceeds directly to the next step. If instead one or more port numbers have been set, it checks whether the source or destination TCP port matches one of those listed; if at least one of the two numbers matches, it continues, otherwise the packet is discarded.
Then, it analyses the packet's HTTP header, verifying whether it is a request or a response. Due to a known bug of the jNetPcap library, caused by an incorrect use of the pointer to the reassembled message, it was not possible to use the method the library provides, so a manual check was implemented: MetroFunnel inspects the first 4 bytes of the header, and if they spell "HTTP" the message is a response, otherwise it is a request.
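The manual check just described can be sketched as follows (illustrative code, not the actual Sniffer class):

```java
// A response starts with its status line, e.g. "HTTP/1.1 200 OK", so the
// payload begins with the bytes "HTTP"; a request starts with the method,
// e.g. "GET /test/users/1 HTTP/1.1".
public class HttpDirection {
    static boolean isResponse(byte[] payload) {
        if (payload.length < 4) return false;
        return payload[0] == 'H' && payload[1] == 'T'
            && payload[2] == 'T' && payload[3] == 'P';
    }
}
```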
Based on this check, the method picks up the necessary fields and, in the case of a request, creates a new event object and inserts it into the list of pending events. If the message is a response, the method searches the event list for the first event matching the IP address and TCP port pairs, inverted between source and destination, calculates the execution time and writes the event to the log. In this way, the events in the log are ordered according to the responses, not the arrival times of the requests.
This association between response and request is possible because of how the HTTP protocol works. The HTTP 1.0 standard provides that every request receives a response before a new request is issued, so it is not possible to have several outstanding requests from the same source IP/TCP pair. HTTP 1.1 introduces request pipelining, but the responses must arrive in the same order as the requests; this explains why it is sufficient to take the first compatible event.
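The matching criterion can be sketched as follows (a simplified stand-in for the EventLogManager search; names and the stripped-down event type are illustrative):

```java
import java.util.List;

// A response matches the FIRST pending request whose (IP, port) pairs are
// the mirror image of its own. This is correct under HTTP/1.0 (one request
// at a time per connection) and under HTTP/1.1 pipelining, since responses
// must arrive in request order.
public class EventMatcher {
    static class Pending {
        final String srcIp, dstIp;
        final int srcPort, dstPort;
        Pending(String srcIp, int srcPort, String dstIp, int dstPort) {
            this.srcIp = srcIp; this.srcPort = srcPort;
            this.dstIp = dstIp; this.dstPort = dstPort;
        }
    }

    static int findMatch(List<Pending> pending, String rspSrcIp, int rspSrcPort,
                         String rspDstIp, int rspDstPort) {
        for (int i = 0; i < pending.size(); i++) {
            Pending p = pending.get(i);
            // request source == response destination, and vice versa
            if (p.srcIp.equals(rspDstIp) && p.srcPort == rspDstPort
                    && p.dstIp.equals(rspSrcIp) && p.dstPort == rspSrcPort) {
                return i;
            }
        }
        return -1; // no compatible request: already timed out, or packet lost
    }
}
```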
If there is no event compatible with the response, it means that too short a timer was set and the request has already timed out. In this case the log will contain two rows relating to the same event: one as an unanswered request, the other as a response without a request. When analysing the log, if such pairs of events cancel each other out, it means there were no lost packets; otherwise at least one packet was lost.
Next, MetroFunnel checks whether any other pending requests have been running for longer than the set timeout; in that case it writes each such request to the log, inserting "999" as the response code, and deletes it from the list. Since the requests are ordered by arrival time, if the first request has been pending for less than the timeout, the search stops immediately and MetroFunnel waits for another packet.
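The timeout sweep with early exit can be sketched like this (illustrative; the real EventLogManager stores full events rather than bare timestamps):

```java
import java.util.Deque;

// Pending requests are kept in arrival order, so the scan can stop at the
// first request younger than the timeout: everything behind it is younger
// still. Expired entries would be logged with response code "999".
public class TimeoutSweep {
    static int expireOld(Deque<Long> arrivalTimes, long now, long timeoutMs) {
        int expired = 0;
        while (!arrivalTimes.isEmpty()
                && now - arrivalTimes.peekFirst() > timeoutMs) {
            arrivalTimes.pollFirst(); // write to log with code 999, then drop
            expired++;
        }
        return expired;
    }
}
```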
A series of examples is now shown to clarify the cases described above. This is the basic case: a request and a correct response.
1. GET, /test/users/1, 127.0.0.1, 46594, 127.0.0.1, 8080, 200, 60010.567, 1, 0, Request
– Response
Here is an example of a false positive: a timed-out request and a response, not associated with any request, carrying a positive response code.
1. GET, /test/users/1, 127.0.0.1, 46594, 127.0.0.1, 8080, 999, 60010.567, 1, 0, Request
– TIMEOUT
2. NULL, NULL, 127.0.0.1, 46594, 127.0.0.1, 8080, 200, 7564.334, 1, 0, NO
REQUEST – Response
This is a timed-out request with no response received later; it can be either a false positive, if MetroFunnel failed to capture the response, or a real failure, if the response was indeed never sent.
1. GET, /test/users/1, 127.0.0.1, 46594, 127.0.0.1, 8080, 999, 60010.567, 1, 0, Request
– TIMEOUT
Here, instead, is a request that received the error code 502; in this case the timeout was high enough for the response to be recorded.
1. GET, /test/users/1, 127.0.0.1, 46594, 127.0.0.1, 8080, 502, 7564.334, 1, 0, Request
– Response
Finally, an example of a request that expired by timeout, after which the response with error code 502 was received; in this case it is not a false positive, because the response contains an error code, and both lines refer to the same failure.
1. GET, /test/users/1, 127.0.0.1, 46594, 127.0.0.1, 8080, 999, 60010.567, 1, 0, Request
– TIMEOUT
2. NULL, NULL, 127.0.0.1, 46594, 127.0.0.1, 8080, 502, 7564.334, 1, 0, NO
REQUEST – Response
4.3 Docker version
It was decided to develop a Docker [11] version of MetroFunnel, in order to increase its portability and facilitate its use. According to a Linux.com article [14]:
Docker is a tool that can package an application and its dependencies in a virtual
container that can run on any Linux server. This helps enable flexibility and portability on
where the application can run, whether on premises, public cloud, private cloud, bare
metal, etc.
Docker provides an additional layer of abstraction and automation of operating-system-level virtualization on Windows and Linux. Operating-system-level virtualization, also called containerization, is a kernel feature of an operating system that allows multiple isolated instances of user space.
Docker uses the resource isolation features of the Linux kernel to allow independent
containers to run within a single Linux instance. In particular, it uses:
• cgroups, to provide resource limiting, including the CPU, memory, block I/O, and
network
• kernel namespaces, to isolate an application's view of the operating environment,
including process trees, network, user IDs and mounted file systems
• union-capable file systems, such as OverlayFS, to combine multiple directories into one that appears to contain their combined contents
Docker can use different interfaces to access the virtualization features of the Linux kernel. It includes the libcontainer library, to use the virtualization facilities provided by the Linux kernel directly, or it can use libvirt, LXC (Linux Containers) or systemd-nspawn as abstracted virtualization interfaces.
So, thanks to the features described above, a Docker container, unlike a virtual machine, does not require its own operating system: it relies on the kernel's functionality to isolate the application's view of the operating system.
In Docker we have images and containers: a Docker image is a lightweight, standalone, executable package of a piece of software that includes everything needed to run it (code, runtime, system tools, system libraries, settings); a Docker container is a running instance of that image.
Figure 5 - Kernel functionality used by Docker

By default, each container's access to the host machine's CPU cycles is unlimited, but it is possible to set various constraints to limit a given container's share of them. The scheduler used is CFS (Completely Fair Scheduler), a process scheduler introduced in release 2.6.23 (October 2007) of the Linux kernel, where it is the default. It handles CPU resource allocation for executing processes, and aims to maximize overall CPU utilization while also maximizing interactive performance. Through this scheduler, CPU usage is evenly divided among all the running containers that request it.
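Such CPU constraints are set with standard docker run flags; two illustrative invocations (the image name is the one built later in this chapter, other arguments are omitted for brevity):

```shell
# Cap the container at 1.5 CPUs' worth of cycles per CFS period
docker run --cpus=1.5 metrofunnelimage

# Halve the container's relative CFS weight (default is 1024)
docker run --cpu-shares=512 metrofunnelimage
```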
4.3.1 Image and Dockerfile
To create a Docker image, we must first create a folder containing a Dockerfile and the additional files necessary for the build. In our case these are:
• MetroFunnel.jar - the Jar file to execute MetroFunnel
• libjnetpcap.so
• libjnetpcap-pcap100.so
• Dockerfile
The two libjnetpcap files are the native libraries that MetroFunnel requires in order to work.
The Dockerfile for generating the Docker image is the following:
Figure 6 - MetroFunnel Dockerfile

Line 1 represents the base starting image; lines 2 to 10 are the operations performed on the base image, in particular: repository update, system update, addition of the Java repository, installation of Java, installation
of the libpcap library, and cleanup. Lines 11-13 add the previously listed files to our image. Line 14 sets the command to execute when the container runs, i.e. the command to launch MetroFunnel.
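The Dockerfile in Figure 6 is not reproduced here; the following is a hypothetical reconstruction based on the description above (the actual base image, package names and paths may differ):

```dockerfile
# Line 1: base starting image (assumed here to be an Ubuntu image)
FROM ubuntu:16.04
# Lines 2-10: repository update, system update, Java repository,
# Java and libpcap installation, cleanup
RUN apt-get update && apt-get upgrade -y && \
    apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:webupd8team/java && \
    apt-get update && \
    apt-get install -y oracle-java8-installer libpcap-dev && \
    apt-get clean
# Lines 11-13: add the jar and the jNetPcap native libraries
ADD MetroFunnel.jar /
ADD libjnetpcap.so /usr/lib/
ADD libjnetpcap-pcap100.so /usr/lib/
# Line 14: command executed when the container starts
CMD ["java", "-jar", "/MetroFunnel.jar"]
```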
The command to execute the Docker container is the following:
docker run --net=host --privileged -v MetroFunnelData:/MetroFunnelData -it --rm --name=MetroFunnel metrofunnelimage
As you can see, apart from the usual parameters, the -v parameter for connecting the volume has been added. This volume is necessary to let the log files survive the termination of the container and to allow another container to read them; in this case, to allow Filebeat to read the MetroFunnel log files. (When using the Docker version of MetroFunnel, you need to add the -v parameter to the Filebeat run command as well, to link it to the same volume.) Furthermore, the --privileged parameter has been added to give the container access to all the devices of the physical machine.
4.4 Elastic stack configuration
Given the log, it is possible to configure the Elastic stack to operate correctly and display the data in real time.
The stack used is the one shown in chapter 2, where a typical monitoring solution was presented: Filebeat for reading the log, Logstash for parsing, Elasticsearch for indexing and storage and, finally, Kibana for visualization. The Docker versions of all of them have been used, creating for each a customized image with the configuration files.
Unlike the example shown in chapter 2, we no longer have the problem of heterogeneous logs: our log is always the same, regardless of the application being monitored.
The configuration of Filebeat concerns only the path for reading the log and the IP address where Logstash resides. With the standard version of MetroFunnel, the log is produced in the program's execution folder; the Docker version instead writes the files to a Docker volume, so the log path is the volume, as explained in the previous paragraph.
The configuration of Logstash, instead, includes the parsing of the log and is shown in full:
As you can see, the configuration of Logstash is also very simple; in addition to configuring the input and the output, the filtering part consists only in identifying the fields and converting the numeric parameters.
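The pipeline in Figure 7 is not reproduced here; a hypothetical reconstruction consistent with that description (field names follow the log format of chapter 3, while the port and host are examples) could be:

```conf
input {
  beats { port => 5044 }
}
filter {
  # Identification of the fields of the CSV log line
  csv {
    separator => ","
    columns => ["method", "url", "ip_source", "tcp_port_source",
                "ip_destination", "tcp_port_destination",
                "response_code", "duration",
                "pending_request_start", "pending_request_end", "info"]
  }
  # Conversion of the numeric parameters
  mutate {
    convert => {
      "tcp_port_source"       => "integer"
      "tcp_port_destination"  => "integer"
      "response_code"         => "integer"
      "duration"              => "float"
      "pending_request_start" => "integer"
      "pending_request_end"   => "integer"
    }
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```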
Finally, the configurations of Elasticsearch and Kibana also concern only the IP address, so they are not shown.
At this point we show the complete architecture of MetroFunnel, highlighting its different versions; the figure also shows the possibility, offered by the ELK stack, of separating the data collection node from the data visualization node.
Figure 7 – Logstash configuration file
4.5 User and operating manual
An example of MetroFunnel operation is now shown. At execution time, it displays all the available interfaces and then asks for the number of interfaces (physical or virtual) that you want to monitor.
Figure 8 - Complete architecture of MetroFunnel
Figure 9 - List of available interfaces
Next, the list of interfaces is shown again and MetroFunnel asks for the reference ID of the
interface to be monitored; you must enter the ID of the network interface on which the
microservices packets pass.
After entering the interface ID, the successful creation of the log file is shown, with its name; MetroFunnel then asks whether you want to enter TCP port numbers to filter the data. This check is made on each packet in transit, and the requests and responses of packets having one of those TCP port numbers, whether as source or destination, are recorded.
You can enter multiple values separated by spaces, or enter any to capture everything without filtering.
Figure 10 - Insert the interface number ID
Figure 11 - Insert TCP port number
Figure 12 - Capture on any TCP port
Figure 13 - List of TCP port number
Finally, it asks for the maximum time before a request is considered expired; this value is unique for all requests, regardless of the method and the reference URL.
If you chose to monitor multiple interfaces simultaneously, all the previous options are requested separately for each interface.
After everything has been entered, the program starts monitoring the packets in transit on the network, displaying them on screen when the corresponding response is received or the timeout expires.
Then you can start the Elastic stack, with the previous configuration, to see the operation of
the microservices in real time.
The first image shows the Kibana interface: at the top, the number of events occurring over time; in the middle, the log rows received. By setting a filter on the log parameters, you can view only the events of interest, for example by response code, a particular method or URL, etc.
The second image shows how easy it is to create graphs; in this case, the average execution times are shown by method.
Figure 14 - Example of capture
Figure 16 - Kibana
Figure 15 - Kibana histogram visualization
Chapter 5: Comparison
In this chapter, the functional and performance differences between the classic approach and MetroFunnel are highlighted. To do this we chose an application, Clearwater IMS; this chapter introduces its features and functionality.
This application was chosen because it is widespread at the enterprise level, well known, and repeatedly used as a test application. Moreover, it is a substantial project, composed of various nodes each performing its own function; it can be distributed, with the application's nodes residing on different physical nodes; and, especially important for our purpose, a version developed as microservices is available.
5.1 Clearwater IMS
Project Clearwater is an open-source IMS core, developed by Metaswitch Networks and released under the GNU GPLv3. IMS (the IP Multimedia Subsystem) is the standards-based architecture adopted by the largest telcos as the basis of their IP-based voice, video and messaging services, replacing legacy circuit-switched systems and previous-generation VoIP systems based on softswitching.
Clearwater provides SIP-based call control for voice and video communications and for
SIP-based messaging applications. You can use Clearwater as a standalone solution for
mass-market VoIP services, relying on its built-in set of basic calling features and
standalone subscriber database, or you can deploy Clearwater as an IMS core in conjunction
with other elements such as Telephony Application Servers and a Home Subscriber Server.
The Docker version of Clearwater has been chosen; it is implemented as microservices
deployed as 11 nodes distributed in as many Docker containers:
1. Etcd
2. Astaire
3. Bono
4. Cassandra
5. Chronos
6. Ellis
7. Homer
8. Homestead
9. Homestead-prov
10. Ralf
11. Sprout
Etcd is a distributed reliable key-value store for the most critical data of a distributed
system, with a focus on being:
• Simple: well-defined, user-facing API (gRPC)
• Secure: automatic TLS with optional client cert authentication
• Fast: benchmarked 10,000 writes/sec
• Reliable: properly distributed using Raft
Figure 17 - Clearwater architecture
Astaire pro-actively resynchronises data across a cluster of Memcached nodes, allowing faster scale-up/scale-down. Memcached is a distributed, RAM-based object cache system, used to improve the speed and decrease the loading times of dynamic database-driven websites by caching the required data and reducing the load on the database servers. Astaire works with the Project Clearwater MemcachedStore to create a dynamically scalable, geographically redundant, highly consistent transient data store.
Bono is Clearwater's edge proxy. It provides limited P-CSCF function and some of Clearwater's S-CSCF function.
P-CSCF (Proxy Call Session Control Function) is a SIP proxy and is the first node crossed by an IMS terminal; all signalling messages pass through it, and it can inspect every message.
S-CSCF (Serving Call Session Control Function) is the main node at the signalling level. It generally acts as a stateful SIP proxy, receiving SIP messages from users, checking their authenticity and forwarding them to other Bono instances or to one of the Sprout instances.
Chronos is a distributed, redundant, reliable timer service. It is designed to be generic to
allow it to be used as part of any service infrastructure. It is designed to scale out horizontally
to handle large loads on the system and also supports elastic, lossless scaling up and down
of the cluster to handle extra load on the service.
Ellis contains the user database and the pool of numbers that can be allocated. It does not
contain per-line configuration - it stores all this directly in Homestead and Homer, accessing
them over their defined HTTP APIs.
Ellis is mainly written in Python. It uses Tornado for HTTP and MySQL as the underlying
database. Virtualenv is used to manage dependencies.
It provides a web GUI and underlying HTTP API for user and line creation, number
allocation, and configuration of iFCs and call services.
Homer is the XDMS (XML Document Management Server) component in Clearwater. It
provides storage, management and subscription to documents.
Homestead is a RESTful CRUD server built using C++ on top of Cassandra. It is designed
to be easily extensible and makes some assumptions about how you'll want to store your
data in Cassandra.
Ralf is a component of the Metaswitch Clearwater project, designed to act as the CTF
(Charging Trigger Function) for Clearwater nodes in an IMS compliant deployment. It
converts JSON bodies in HTTP requests from IMS components into Diameter Rf ACRs. It
uses memcached to store Rf session information for the duration of a session, and it uses
Chronos to send regular INTERIM ACRs to keep the session alive.
Sprout is Clearwater's SIP router. It provides most of Clearwater's S-CSCF function. It
generally acts as a stateful SIP proxy. It provides registrar function, storing registration
information in a memcached store distributed across all sprout instances. It also provides
application server function, retrieving Initial Filter Criteria documents from Homestead and
acting on them. As well as supporting external application servers, sprout has built-in
support for MMTEL services.
Cassandra is a non-relational, open-source, distributed database management system optimized for managing large amounts of data. It preserves all the information generated by the other nodes, in particular Ellis, Homer and Homestead.
Features:
• Decentralized: the nodes in the cluster are identical; there is no single point of failure.
• Fault-tolerant: data is automatically replicated on multiple nodes. Replication across different data centers is supported, and node replacement can be done without downtime.
• Tunable consistency: the level of consistency (for both writes and reads) can be adjusted, for example from "writes never fail" to "block until all replicas are readable".
• Elastic: read and write throughput increases linearly as new machines (nodes) are added, without downtime and without interrupting any application.
For further details and specifications of Clearwater, please refer to the documentation and official website of the Clearwater project.
5.1.1 Clearwater-live-test
A test framework is made available for the application: a suite composed of 80 tests. Of these 80, the 50 that work by default, without further configuration of Clearwater, were chosen. Tests in the framework are essentially short Ruby programs. These programs use the Quaff library to talk SIP to the Clearwater nodes for calls, and the rest-client library to communicate with Ellis for provisioning.
5.1.2 Basic operation
Through log analysis, packet analysis using Wireshark, and the logs generated by MetroFunnel, we observed the following (simplified) pattern of operation of the test suite. It is shown to clarify the operation of Clearwater and to give the reader an idea of what is actually monitored; it will also be used for the analysis of the failures shown in the following paragraphs.
First of all, the client container sends a session opening request, with the POST method and URI "/session", to Ellis through TCP port 80.
Subsequently we have the following operations:
1. Client request to register a telephone number to Ellis
[POST, /accounts/[email protected]/numbers/]
a. This triggers, in sequence, 8 subsequent requests from the Ellis node to the Homer and Homestead-prov nodes
b. Only after receiving their replies is the client's request answered
2. (optional) Registration of the additional telephone numbers required for the test: each test requires a different number of telephone numbers, depending on the test being performed, from a minimum of 1 to a maximum of 4. Each execution of the method registers only one number at a time, repeating steps a and b of the previous point
3. Execution of the test through the SIP protocol, with microservice requests that vary in number and type depending on the test
4. Client request to Ellis for cancellation of the telephone number
[DELETE, /accounts/[email protected]/numbers/sip%3AXXXXXX]
a. This triggers, in sequence, 6 subsequent requests from the Ellis node to the Homer and Homestead-prov nodes
b. Only after receiving their replies is the client's request answered
5. (optional) Cancellation of the additional telephone numbers. Each execution deletes only one number at a time, always performing steps a and b of the previous point
These steps are all performed for each repetition and for each individual test of the entire
suite.
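The steps above can be sketched, purely for illustration, as the ordered sequence of HTTP operations the client sends to Ellis. The helper below is hypothetical (the real tests are Ruby programs using Quaff and rest-client); `numbers_needed` is an assumed parameter, and XXXXXX is left elided as in the log excerpts:

```python
# Hypothetical sketch of the (simplified) live-test operation pattern.
# Each POST triggers 8 internal requests from Ellis to Homer/Homestead-prov,
# each DELETE triggers 6; those internal requests are not modelled here.

def provisioning_flow(numbers_needed):
    """Return the ordered (method, URI) pairs the client sends to Ellis (TCP 80)."""
    steps = [("POST", "/session")]  # session opening request
    for _ in range(numbers_needed):  # steps 1-2: register one number at a time
        steps.append(("POST", "/accounts/[email protected]/numbers/"))
    # step 3: the SIP test itself runs here, outside Ellis
    for _ in range(numbers_needed):  # steps 4-5: cancel one number at a time
        steps.append(("DELETE", "/accounts/[email protected]/numbers/sip%3AXXXXXX"))
    return steps

flow = provisioning_flow(2)  # a test needing 2 telephone numbers
```
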
5.2 Testbed
The following test system was configured to compare the Classic and MetroFunnel solutions.
Server:
• CPU: Intel I3-2100 3.1GHz
• RAM: 6 GB DDR3
• LAN: Realtek Gigabit
• OS: Ubuntu 16.04.03 LTS
The client side used a virtual machine hosted on an Apple physical machine.
Client:
• Host: Apple MacBook Pro (13.3 Early 2015) – 8GB RAM DDR3 – Thunderbolt
Ethernet Gigabit – OS: MacOS HighSierra 10.13.1 - VMware Fusion 8.1.0
• VM:
o CPU: 1 core dedicated
o RAM: 4GB
o OS: Ubuntu 16.04.03
For the connection, a category 6 (Gigabit) RJ45 crossover cable with a length of 1 meter was used. The bandwidth available between the two machines was verified through the iperf tool. 20 tests were carried out, as shown in the table:
Table 1 - iperf tests (Mbps)

test  Server machine  Simultaneous TCP connections  Detected on i3 (Mbps)  Detected on Apple (Mbps)
 1    i3               1                            879                    881
 2    i3               1                            926                    928
 3    i3               5                            880                    891
 4    i3               5                            866                    867
 5    i3              10                            874                    886
 6    i3              10                            820                    821
 7    i3              15                            702                    773
 8    i3              15                            723                    802
 9    i3              20                            650                    715
10    i3              20                            705                    759
11    Apple            1                            939                    938
12    Apple            1                            930                    929
13    Apple            5                            932                    932
14    Apple            5                            940                    939
15    Apple           10                            938                    938
16    Apple           10                            935                    934
17    Apple           15                            942                    936
18    Apple           15                            939                    852
19    Apple           20                            915                    829
20    Apple           20                            942                    853
min                                                 650                    715
In 10 tests, iperf was run as a server on the machine with the i3 processor, while the other machine acted as a client with a varying number of simultaneous TCP connections. In the other 10 tests, the reverse setup was used, running the iperf server on the Apple machine. For each test, the values measured by both machines were recorded. Of these values, the minimum was chosen in order to position the analysis in the worst-case scenario, thus assuming an available bandwidth of 650 Mbps.
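The worst-case choice can be reproduced directly from the Table 1 values, as a minimal sketch:

```python
# Worst-case bandwidth selection from the 40 iperf measurements (Mbps, Table 1).
i3_side    = [879, 926, 880, 866, 874, 820, 702, 723, 650, 705,
              939, 930, 932, 940, 938, 935, 942, 939, 915, 942]
apple_side = [881, 928, 891, 867, 886, 821, 773, 802, 715, 759,
              938, 929, 932, 939, 938, 934, 936, 852, 829, 853]

# The minimum across both sides gives the worst-case available bandwidth.
worst_case = min(i3_side + apple_side)
```
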
5.3 Performance Analysis
In this phase the performance of the different solutions is compared. Four test cases with variable load were performed:
1. Monitoring off (indicated in the tables as Off)
2. Monitoring through the analysis of the Clearwater internal logs (indicated as Classic)
3. Monitoring through the analysis of the MetroFunnel Standard log (indicated as MetroFunnel)
4. Monitoring through the analysis of the MetroFunnelDocker log (indicated as MetroFunnelDocker)
In this phase the log is not actually analysed; rather, its effect on the performance of the various solutions is studied. In particular, we take into consideration:
• the size of the logs
• the amount of data exchanged on the network
• the execution time of the test suite
• the bandwidth used
With the Classic solution, the 10 Clearwater containers that generate logs were modified by instantiating Filebeat on each of them to ship the logs. By default, each component logs to /var/log/<service>/ at log level 2 (which only includes errors and very high-level events). More detailed logs can be obtained by enabling debug logging; no log-management parameter was modified.
With the MetroFunnel solutions, we analyse how much performance degradation occurs on the server in the two cases. For the Standard version, there is a Java process writing the log and a Filebeat process shipping it. For the Docker version, there are two additional containers (MetroFunnel, generating the logs, and Filebeat, shipping them) on top of Clearwater's eleven containers.
To exercise the application, we use the live tests described in the previous paragraph, but with a modification to the Rakefile and to the startup Ruby script so as to execute a single test several times. (The suite has a REPEAT parameter, which allows executing the entire suite several times, whereas we wanted to repeat the same test several times before executing the next one.) This change was made so as to have, simultaneously, multiple clients performing the same test over a longer period of time.
With the changes made, we created the Docker image that will be used to run the containers for the test. To simulate workload variation, an ad-hoc bash script was created, which takes 2 parameters: the number of containers and the number of repetitions of the single test. After a few trials, we chose to fix the repetitions at 5 per test, while the number of containers serves as the load index of the system workload.
With a load of 15 containers, the Ellis node crashes (as we will see later in the failure analysis) during the execution of the test suites, failing to correctly handle all the simultaneous connections for registering and deleting telephone numbers. Therefore, we chose to stay a little below the operating limit: the workload starts at 1 container and goes up to a maximum of 12 containers.
The purpose of the test is not to stress-test Clearwater, but to analyse the performance of the monitoring solutions and compare them. These load values were chosen so that all tests pass without errors, because at each error the client waits 30 seconds before closing the related test with a timeout, distorting the results.
To validate the results and, at the same time, reduce the total number of tests needed for complete coverage, 5 repeated tests were carried out at 3 different load values: 20% (2 containers), 50% (6 containers) and 100% (12 containers).
To reduce the complexity of the test system, only Clearwater, plus any monitoring, is instantiated on the Server machine. On the Client machine, the live-test containers are instantiated; from the second scenario onwards, the two containers Logstash and ElasticSearch are added, which are necessary in all the other scenarios.
The amount of data exchanged on the network and the duration of the tests are measured on the client machine. The data exchanged is calculated as the difference between the total incoming data at the end of the test and the total incoming data at the beginning of the test. These two values are taken through the Linux nload tool, launched before the relevant test and stopped immediately after it finishes.
The duration of the tests, on the other hand, is measured using the "time" command, prepended to the launch of each test container; the durations of the whole suite, and therefore of each client container, are then written in the table, where the average is calculated.
The values for the size of the logs, instead, are taken from the Server machine.
5.3.1 Log Size
In the Clearwater column, the sum of all logs in all containers is shown. Clearwater is restarted for each test, so the size of the logs depends on the startup plus the logging due to the live test.
• In Clearwater, startup (10 min) produces a log of 250-300 KB
• With MetroFunnel, startup produces a log of 100 KB
• After 30 min without live tests, the Clearwater log is 310 KB
• After 30 min without live tests, the MetroFunnel log is 250 KB
This difference is due to the heartbeat packets, which are not reported in the Clearwater logs. The measures expressed in the table are in MB.
Table 2 - Log size (MB)

Load  Clearwater*  MetroFunnel  MetroFunnelDocker  Difference
 1     8,1          3,3          3,3               -59,26%
 2    15,9          6,5          6,5               -59,12%
 3    23,6          9,7          9,7               -58,90%
 4    31,4         12,9         12,9               -58,92%
 5    39,1         16,1         16,1               -58,82%
 6    46,9         19,3         19,3               -58,85%
 7    54,7         22,5         22,5               -58,87%
 8    62,4         25,8         25,8               -58,65%
 9    70,2         29,0         29,0               -58,69%
10    78,0         32,3         32,3               -58,59%
11    85,7         35,5         35,5               -58,58%
12    93,5         38,7         38,7               -58,61%

As can be seen, regardless of the load, there is a saving of almost 60% on the size of the logs.
Clearwater logs include information on the microservices as well as information regarding the SIP protocol, the latter amounting to about 15% of the total log size. This share can be measured manually; filtering out the SIP-related log lines automatically is only possible through Logstash, and therefore only after the logs have been sent. Therefore, if the logs are compared on the same information content, MetroFunnel yields a saving of almost 50%. In the measurements concerning the comparison of incoming data, as is appropriate, the logs are sent whole, leaving to Logstash the burden of filtering the information.
Below is the table containing the log sizes of the repeated tests for the 3 selected load values.

Table 3 - Log size test repeated (MB)

Source log          Load  Rep 1  Rep 2  Rep 3  Rep 4  Rep 5
Clearwater           2    15,9   15,8   15,8   15,8   15,8
                     6    46,9   46,9   46,9   46,9   46,9
                    12    93,5   93,6   93,5   93,5   93,5
MetroFunnel          2     6,5    6,6    6,5    6,5    6,5
                     6    19,3   19,3   19,4   19,3   19,3
                    12    38,7   38,7   38,8   38,7   38,7
MetroFunnelDocker    2     6,5    6,5    6,5    6,5    6,5
                     6    19,3   19,3   19,4   19,3   19,3
                    12    38,7   38,7   38,7   38,8   38,7
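The percentage difference in the last column of Table 2 can be recomputed from the Clearwater and MetroFunnel columns; a quick check:

```python
# Log-size saving per load level, recomputed from the Table 2 values (MB).
clearwater  = [8.1, 15.9, 23.6, 31.4, 39.1, 46.9, 54.7, 62.4, 70.2, 78.0, 85.7, 93.5]
metrofunnel = [3.3, 6.5, 9.7, 12.9, 16.1, 19.3, 22.5, 25.8, 29.0, 32.3, 35.5, 38.7]

# Percentage difference of MetroFunnel with respect to Clearwater, per load.
savings = [(mf - cw) / cw * 100 for cw, mf in zip(clearwater, metrofunnel)]
```

All twelve values fall in a narrow band around -59%, confirming that the saving is essentially independent of the load.
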
As can be seen, the log values are almost identical in all repetitions, except for a sporadic 100 KB difference, equal to a 1.5% error in the worst case, present in only 6 of the 45 tests carried out. So, without further analysis, we can assume the trend is confirmed.
5.3.2 Data incoming
After launching Logstash and ElasticSearch on the client machine, we compare the amounts of incoming data exchanged by the machine under the different solutions, that is, how much the shipping of logs from the Server machine to the Client machine weighs. The measures expressed in the table are in KB.
Table 4 - Data incoming (KB)

Load  Off         Classic     MetroFunnel  MetroFunnelDocker
 1     4275,160    6369,280    5836,80      5856,010
 2     7966,720   12134,400   10649,60     10567,680
 3    11683,840   18073,600   15360,00     15349,760
 4    15472,640   24074,240   20121,60     20193,280
 5    19200,000   30023,680   25384,96     25210,880
 6    22999,040   34211,840   29767,68     29931,520
 7    26736,640   41748,480   34703,36     34846,720
 8    30515,200   47656,960   39802,88     39864,320
 9    34406,400   53565,440   44728,32     44789,760
10    38195,200   59494,400   49838,08     49776,640
11    41963,520   65413,120   54937,60     54917,120
12    45772,800   69314,560   59668,48     59760,640
As can be seen, and as is easily imagined, the data for the two different versions of MetroFunnel overlap almost completely. The following table shows the percentage increase in incoming data for the different solutions.
Table 5 - Ratio data incoming

Load  Classic/Off  MetroFunnel/Off  MetroFunnelDocker/Off  MetroFunnel/Classic
 1    48,98%       36,53%           36,98%                  -8,36%
 2    52,31%       33,68%           32,65%                 -12,24%
 3    54,69%       31,46%           31,38%                 -15,01%
 4    55,59%       30,05%           30,51%                 -16,42%
 5    56,37%       32,21%           31,31%                 -15,45%
 6    48,75%       29,43%           30,14%                 -12,99%
 7    56,15%       29,80%           30,33%                 -16,88%
 8    56,17%       30,44%           30,64%                 -16,48%
 9    55,68%       30,00%           30,18%                 -16,50%
10    55,76%       30,48%           30,32%                 -16,23%
11    55,88%       30,92%           30,87%                 -16,01%
12    51,43%       30,36%           30,56%                 -13,92%
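The ratios in the table are obtained directly from the Table 4 values; for example, at load 12:

```python
# Overhead of each solution versus no monitoring, from Table 4 (KB, load 12).
off, classic, metrofunnel = 45772.800, 69314.560, 59668.480

classic_overhead = (classic / off - 1) * 100       # Classic vs. Off
mf_overhead      = (metrofunnel / off - 1) * 100   # MetroFunnel vs. Off
mf_vs_classic    = (metrofunnel / classic - 1) * 100
```
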
As for the log size, 5 repeated tests are performed on the 3 load values chosen to confirm
the trend. The measures expressed in the table are in KB.
Figure 18 - Data incoming (KB/s)
Table 6 - Data incoming repeated (KB)

Monitoring          Load  Rep 1     Rep 2     Rep 3     Rep 4     Rep 5     Average
Off                  2     7966,72   8007,68   8007,68   8007,68   8017,92   8001,54
                     6    22999,04  23152,64  23695,36  23173,12  23173,12  23238,66
                    12    45772,80  45998,08  46254,08  46059,52  46039,04  46024,70
Classic              2    12134,40  11970,56  12042,24  12154,88  12216,32  12103,68
                     6    34211,84  34017,28  33904,64  34273,28  34058,24  34093,06
                    12    69314,56  68751,36  68823,04  69294,08  68730,88  68982,78
MetroFunnel          2    10649,60  10465,28  10076,16  10414,08  10414,08  10403,84
                     6    29767,68  29173,76  29112,32  29952,00  29798,40  29560,83
                    12    59668,48  59648,00  59586,56  59330,56  59781,12  59602,94
MetroFunnelDocker    2    10567,68  10700,80  10741,76  10608,64  10690,56  10661,89
                     6    29931,52  29788,16  29358,08  29829,12  29808,64  29743,10
                    12    59760,64  59648,00  59688,96  59330,56  59781,12  59641,86
The difference between the various results is due to some packet retransmissions. This difference is totally negligible with respect to the total amount of data exchanged, so the averages are taken. The ratios between the various averages were then computed, and the results can be seen in the following graph.
The Classic approach involves an increase in incoming data of about 50%, compared with a 30% increase for the MetroFunnel solution. This translates into a saving of about 15% of the exchanged data.
Figure 19 - Ratio data incoming
5.3.3 Execution time
At this point we take into consideration the impact on the test execution times. The execution time of each single test suite is measured; therefore, with a load equal to 2 (2 simultaneous containers) there are 2 measurements, with a load of 3 there are 3 measurements, and so on. The average over the different containers was then computed for each load value and reported in the table as a single value. The values expressed in the table are in seconds.

Table 7 - Execution time (s)

Load  Off       Classic   MF        MFD
 1     637,709   644,754   641,278   647,060
 2     664,087   673,417   660,916   673,788
 3     679,662   685,640   684,978   691,898
 4     697,546   713,891   721,792   734,914
 5     753,103   769,174   781,832   799,084
 6     852,233   864,870   885,647   913,038
 7     966,282   981,108  1001,491  1034,346
 8    1068,321  1084,562  1114,020  1144,939
 9    1188,074  1222,121  1238,558  1275,878
10    1307,912  1345,457  1379,418  1413,562
11    1448,595  1467,049  1514,946  1560,206
Figure 20 - Execution time (s)
At the first 3 load levels, the various solutions show minimal differences in execution times; conversely, as soon as 3 client containers are exceeded, the differences become more marked. The Classic solution has the smallest impact on execution time, while the MetroFunnel solution produces a greater impact. It can also be noted that the Docker version of MetroFunnel (MetroFunnelDocker) has an even worse effect on execution times than the Standard dual-process solution.
Below is a plot of the ratios between the execution times of the 3 monitoring solutions and the execution times in the absence of monitoring.
Since the results show no precise trend, and indeed a certain randomness, the repeated tests were subjected to an in-depth statistical analysis to verify whether these results were due to measurement errors or whether the type of monitoring is a significant factor in the execution times. The table below shows the average values of the execution times measured with the different types of monitoring and the different load values.
Figure 21 - Rate execution time
As anticipated, even under the same conditions (Load and Monitoring), there is a certain randomness in the results across the different repetitions.
We therefore chose to perform an ANOVA test, to check whether the difference between the monitoring methodologies was due to random, uncontrollable phenomena or to the monitoring itself. First it was verified whether the monitoring factor was significant, and then how important this factor is.
For this analysis, we chose to split it by load. Note that we did not carry out a two-factor analysis (Load and Monitoring) to see which of the two factors is more influential; rather, we ran 3 separate analyses of the single Monitoring factor, one for each load value, to establish how important the Monitoring factor is at loads of 20% (2 containers), 50% (6 containers) and 100% (12 containers).
Then, using the JMP software, the normality and homoscedasticity of the residuals were first verified, and the appropriate ANOVA test was chosen based on the results.
Table 8 - Execution time test repeated
Although the visual test is passed, the Shapiro-Wilk test is not; therefore, for the monitoring results with load 2, the residuals are not normal.
Figure 22 - Residual normality test with load 2
In this case, both the visual test and the Shapiro-Wilk test are passed, so the residuals with load 6 are normal.
Figure 23 - Residual normality test with load 6
Also in this case, both the visual test and the Shapiro-Wilk test are passed, so the residuals with load 12 are normal.
After analysing the normality of the residuals, the homoscedasticity tests of the residuals are shown, divided according to the load.
Figure 24 - Residual normality test with load 12
As can be seen, the Levene test is passed, so the residuals with load 2 are homoscedastic.
Figure 25 - Homoscedasticity test of residuals with load 2
The Levene test is not passed, so the residuals with load 6 are not homoscedastic.
Figure 26 - Homoscedasticity test of residuals with load 6
The Levene test is passed, so the residuals with load 12 are homoscedastic.
In summary, the residuals have the following characteristics.
Table 9 - Residual test summary

Load  Normality  Homoscedasticity  Test
 2    NO         YES               Kruskal-Wallis
 6    YES        NO                Welch's test
12    YES        YES               ANOVA - F test

The test column reports the test that will be executed for each load value.
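The decision rule behind the table can be summarised as a small selection function (a sketch of the logic only, not of the JMP workflow):

```python
# Choice of significance test from the residual diagnostics (Table 9):
# non-normal residuals -> non-parametric test; normal but heteroscedastic
# residuals -> Welch's test; both assumptions satisfied -> classic ANOVA F-test.
def choose_test(normal, homoscedastic):
    if not normal:
        return "Kruskal-Wallis"
    if not homoscedastic:
        return "Welch's test"
    return "ANOVA F-test"

test_for_load_2 = choose_test(normal=False, homoscedastic=True)
```
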
Figure 27 - Homoscedasticity test of residuals with load 12
The null hypothesis is rejected; the monitoring factor is significant with a load of 2.
Figure 28 - ANOVA test with load 2
The null hypothesis is rejected; the monitoring factor is significant with a load of 6.
Figure 29 - ANOVA test with load 6
The null hypothesis is rejected; the monitoring factor is significant with a load of 12.
All three analyses are passed, so the monitoring factor is significant for all three load values. At this point, we computed the importance of this factor in the results. The calculated values of SST, SSA and SSE are shown in the table.
Figure 30 - ANOVA test with load 12
As can be seen from the table, the Monitoring factor accounts for 62.70% of the variation in execution time with a load of 20%, for 93.18% with a load of 6 containers, and for 86.23% at maximum load.
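As a reminder of how these quantities are obtained, the one-way decomposition SST = SSA + SSE can be computed on toy data (the values below are illustrative only, not the thesis measurements):

```python
# One-way sums of squares on toy data: SST = SSA + SSE, and the ratio
# SSA/SST is the share of total variation explained by the factor.
groups = {"Off": [10.0, 10.2, 9.8], "Classic": [11.0, 11.1, 10.9], "MF": [10.5, 10.6, 10.4]}

all_obs = [x for g in groups.values() for x in g]
grand_mean = sum(all_obs) / len(all_obs)

sst = sum((x - grand_mean) ** 2 for x in all_obs)                       # total
ssa = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
          for g in groups.values())                                     # between groups
sse = sum((x - sum(g) / len(g)) ** 2
          for g in groups.values() for x in g)                          # within groups

importance = ssa / sst  # fraction of variation explained by the factor
```
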
Given the importance of the factor, the graph below shows the ratios between the execution times of the various monitoring solutions and the execution times in the absence of monitoring for the repeated tests, averaging these values.
We can therefore state that the Classic monitoring has an incidence of about 2% on execution times, which does not vary excessively with the load. By contrast, MetroFunnel monitoring, in both the Standard version and the Docker version, produces a load-dependent effect on execution times, equal to 3% and 6% respectively with a load of 6, and to 6% and 7.5% with a load of 12.
Table 10 - Calculation of the importance of the monitoring factor
Figure 31 - Rate execution time test repeated
The performance difference between the Standard and the Docker version of MetroFunnel is due to how the CPU is managed by the Linux kernel and by Docker. Both use CFS (Completely Fair Scheduler), through the management of cgroups and CPU shares. cgroups is a feature of the Linux kernel for allocating and limiting the resources (in this case, CPU) of groups of processes; every running process belongs to a cgroup, which is scheduled through its CPU share. The CPU share assigns a time slot to each cgroup, within which every task in the cgroup has the opportunity to perform its operations.
The Standard version and the Docker version of MetroFunnel differ precisely in the cgroup they belong to. In the first case, the Standard version can use all the CPU share assigned to its cgroup, while in the second case the quota is divided among the various containers, because all the containers belong to the same cgroup. Thus, the Docker version uses part of the CPU cycles that, with the Standard version, would be used by the other containers, slowing down the execution of the processes running in those containers.
In fact, by computing statistics on the microservice execution times from the logs produced by MetroFunnel, Standard version and Docker version, with the test at maximum load, we see that the times of all microservices are dilated by 1-2 milliseconds, depending on the microservice. Multiplying this delay by the number of requests (about 250 thousand) divided by the number of simultaneous test suites (12), a difference of about 40 seconds is obtained, which is roughly the difference in average execution time between the two tests.
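This back-of-the-envelope calculation can be written out explicitly:

```python
# Rough check of the Standard vs. Docker gap, using the figures from the text.
per_request_delay_s = 0.002   # observed 1-2 ms dilation per request (2 ms here)
total_requests = 250_000      # about 250 thousand requests in the full run
simultaneous_suites = 12      # requests spread over 12 concurrent test suites

extra_time_s = per_request_delay_s * total_requests / simultaneous_suites
```

With a 2 ms delay this gives about 42 seconds (about 21 seconds with 1 ms), consistent with the observed ~40-second difference.
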
The execution times are calculated as the difference between the timestamps of the incoming request and of the outgoing response; since these timestamps have nanosecond resolution, and the observed difference is on the order of milliseconds, it can be categorically excluded that it depends only on measurement errors.
5.3.4 Bandwidth
After taking into account the incoming data and the test execution time as a function of the load, we move on to analyse the incoming bandwidth used during the execution of the tests. The bandwidth value is calculated from the previous results, namely the incoming data and the execution times as a function of the load. The results in the table are average values expressed in KB/s.

Table 11 - Bandwidth incoming (KB/s)

Load  Off    Classic  MetroFunnel  MetroFunnelDocker
 1     6,70   9,88     9,10         9,05
 2    11,97  18,00    16,10        15,68
 3    17,11  26,61    22,55        22,14
 4    22,11  33,65    27,80        27,45
 5    25,41  38,12    32,44        31,46
 6    26,57  39,48    33,48        32,53
 7    27,58  42,88    34,92        33,62
 8    28,43  43,75    35,66        34,67
 9    28,89  43,70    35,96        35,03
10    29,14  44,05    35,61        35,09
11    28,84  44,51    36,42        34,86
12    28,90  43,35    36,60        34,67
Figure 32 - Bandwidth incoming (KB/s)
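For example, the load-1 row of the table follows directly from Tables 4 and 7:

```python
# Average incoming bandwidth = incoming data / execution time
# (load 1 values taken from Table 4, in KB, and Table 7, in seconds).
data_kb = {"Off": 4275.160, "Classic": 6369.280, "MetroFunnel": 5836.80}
time_s  = {"Off": 637.709,  "Classic": 644.754,  "MetroFunnel": 641.278}

bandwidth = {k: data_kb[k] / time_s[k] for k in data_kb}  # KB/s
```
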
As can be seen from the table, the bandwidth values used are well below the measured bandwidth values shown at the beginning of the chapter. This indicates that none of the results shown have been altered by random network-traffic phenomena. Moreover, from the graph it is possible to notice that:
• As is easily understood, in the absence of monitoring there is less bandwidth consumption.
• The Classic solution is the one with the highest bandwidth consumption.
• The Standard version of MetroFunnel has a slightly higher bandwidth consumption than the Docker version. This is because, for the same incoming data, it has a shorter execution time.
• Given the above considerations, it follows that in the ratio between incoming data and execution time, the dominant factor is the incoming data, because the growth factor of the numerator (incoming data) is greater than the growth factor of the denominator (execution time).
Also in this case, in order to validate the results, the bandwidth values were calculated from the results of the 5 repeated tests at the 3 load values, using the incoming data and the execution times. The first table shows the average values; the second shows the rate of bandwidth increase and the bandwidth saved with the MetroFunnel solution compared with the Classic one.

Table 12 - Bandwidth incoming test repeated (KB/s)

Load  Off    Classic  MetroFunnel  MetroFunnelDocker
 2    12,03  17,87    15,65        15,83
 6    27,24  39,24    33,54        32,78
12    29,49  43,20    36,16        35,60

Table 13 - Ratio bandwidth incoming

Load  Classic/Off  MetroFunnel/Off  MetroFunnelDocker/Off  MetroFunnel/Classic
 2    48,57%       30,14%           31,59%                 -12,41%
 6    44,06%       23,13%           20,35%                 -14,53%
12    46,48%       22,61%           20,69%                 -16,29%
As can be seen, the MetroFunnel solution brings roughly a 20% increase in bandwidth, while the Classic solution brings an increase of about 45%. This means a bandwidth saving of about 16%.
Figure 33 - Bandwidth incoming (KB/s)
5.4 Failure Analysis
At this point we move on to the functional comparison between the Classic and MetroFunnel methodologies. In this phase we verify that the information in the MetroFunnel log is actually useful for monitoring purposes and is at least comparable to the information in the default Clearwater log.
Five failure cases are shown. The first three are spontaneous failures of the application; the other two were forced by killing the processes inside the containers.
For obvious reasons, the spontaneous failures that occurred during the analysis with monitoring Off or Classic were not taken into account, because no MetroFunnel log is available for comparison. Instead, in the analysis phase with monitoring via MetroFunnel, in both the Standard version and the Docker version, it was possible to retrieve the internal Clearwater logs manually (and therefore not via Filebeat) and save them in a local folder on the Server machine, in order to perform the analysis and comparisons later.
This clarification is added so as not to lead the reader to think that the failures occurred only with monitoring through MetroFunnel, or that the monitoring was somehow one of the triggers of those failures.
5.4.1 Failure 503
The test was in progress using the Standard version of MetroFunnel; Clearwater had been restarted and the 10 minutes expected for startup had elapsed. At this point the test with 5 simultaneous containers was launched on the client side, and the shells running the test containers all showed the following error message:
RuntimeError thrown:
Account creation failed with HTTP code 503, body
{"status": 503, "message": "Service Unavailable", "reason":
"No available numbers", "detail": {}, "error": true}
The MetroFunnel log contains information such as the following:
1. [POST,/accounts/[email protected]/numbers/,
192.168.1.2,46594,172.18.0.12,80,503,2.312776,1,0, Request - Response]
With the help of the test execution pattern seen in section 5.1.2, we can see that the registration requests arrive at the Ellis node (172.18.0.12 - TCP 80), which responds with code 503. In addition, the log does not contain the consequent requests to the Homer and Homestead-prov nodes (patterns 1a and 1b). Therefore, the Ellis node does not send those requests; otherwise they would be present in the log, either as a normal line or as a request reported by MetroFunnel as a timeout.
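The MetroFunnel entries shown above follow a fixed comma-separated layout: method, URL, source IP and port, destination IP and port, status code, elapsed time, two counters, and a record type. As a minimal sketch of how such a line could be parsed for automated analysis (the field names here are our own assumptions, not an API defined by the tool):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    method: str
    url: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    status: int
    elapsed: float
    kind: str  # e.g. "Request - Response", "Request - TIMEOUT"

def parse_line(line: str) -> Entry:
    # Drop the optional surrounding brackets, then split on commas.
    parts = [p.strip() for p in line.strip().strip("[]").split(",")]
    return Entry(
        method=parts[0], url=parts[1],
        src_ip=parts[2], src_port=int(parts[3]),
        dst_ip=parts[4], dst_port=int(parts[5]),
        status=int(parts[6]), elapsed=float(parts[7]),
        kind=parts[10],
    )

entry = parse_line(
    "[POST,/accounts/[email protected]/numbers/,"
    "192.168.1.2,46594,172.18.0.12,80,503,2.312776,1,0, Request - Response]"
)
print(entry.dst_ip, entry.dst_port, entry.status)  # 172.18.0.12 80 503
```

A parser along these lines is what makes the pattern checks used in the rest of this chapter easy to automate.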
We now show an extract from the Ellis log:
1. 05-01-2018 17:24:18.650 UTC ERROR homestead.py:41: Failed to ping
Homestead at http://homestead-prov:8889/ping. Have you configured your
HOMESTEAD_URL?
2. …
3. 05-01-2018 17:34:27.160 UTC WARNING numbers.py:128: No available numbers
4. 05-01-2018 17:34:27.160 UTC WARNING _base.py:138: 503 POST
/accounts/[email protected]/numbers/ (0.0.0.0): No available numbers
5. 05-01-2018 17:34:27.160 UTC ERROR web.py:1447: 503 POST
/accounts/[email protected]/numbers/ (0.0.0.0) 3.48ms
In the other Clearwater nodes there is no log information that can help in the interpretation of the failure.

The information regarding the microservices is the same; in Clearwater there is, in addition, an error concerning the failed ping to the homestead-prov node. Probably, because this problem occurred during the startup phase, the Ellis node already knows that the Homestead-prov node is unavailable, so it does not forward the number registration requests but automatically responds with code 503.

In this case the Clearwater log adds information, but only because the problem occurred during startup; as we will see later, in the event of problems during normal operation the information in the MetroFunnel log is comparable to that in the Clearwater logs.
5.4.2 Failure 181
During the tests, occasionally and with different load values, a single test within the entire suite, related to a single client container, failed with the following message:

Endpoint threw exception:
- Expected 100, got 181 (call ID 070443892a6498e2825f51731f5bcaff)

Reading the error message, it is easy to understand that this is not a failure at the microservice level, as there is no status code 181 in the HTTP/1.1 standard. To demonstrate this, we first show two extracts from the MetroFunnel log, two extracts from the Clearwater logs, and then an extract from the log that the live-test framework produces when a test fails.
Looking at the MetroFunnel log, we see that all the microservice calls end with code 200/201 and that there are no request timeouts; in particular, there are two requests with the same call-id, both of which ended with code 200:
1. [POST, /call-id/070443892a6498e2825f51731f5bcaff,
172.18.0.11,57338,172.18.0.9,10888, 200, 0.467959]
2. [POST, /call-id/070443892a6498e2825f51731f5bcaff,
172.18.0.11,41848,172.18.0.9,10888, 200,0.980225]
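The duplicate call-id check described above can also be sketched programmatically. The two log lines below are the ones just quoted, reduced to their comma-separated fields; the splitting logic is a deliberate simplification:

```python
from collections import Counter

# The two Ralf calls from the MetroFunnel extract above, as raw CSV lines.
lines = [
    "POST,/call-id/070443892a6498e2825f51731f5bcaff,172.18.0.11,57338,172.18.0.9,10888,200,0.467959",
    "POST,/call-id/070443892a6498e2825f51731f5bcaff,172.18.0.11,41848,172.18.0.9,10888,200,0.980225",
]
calls = [line.split(",") for line in lines]

# Every call succeeded (2xx), so no microservice-level failure occurred...
assert all(200 <= int(c[6]) < 300 for c in calls)

# ...yet the same call-id URL appears twice: the retry is visible in the
# centralized monitoring log.
dup = Counter(c[1] for c in calls)
print({url: n for url, n in dup.items() if n > 1})
# {'/call-id/070443892a6498e2825f51731f5bcaff': 2}
```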
These two lines are extracted from the Ralf node log:
1. 30-12-2017 11:17:27.073 UTC 200 POST /call-id/
070443892a6498e2825f51731f5bcaff 0.000247 seconds
2. 30-12-2017 11:17:28.986 UTC 200 POST /call-id/
070443892a6498e2825f51731f5bcaff 0.000147 seconds
No other Clearwater log contains information with the same call-id, or information that can be associated with the error message.
The live-test log is instead the following:
Endpoint on 46839 received:
SIP/2.0 181 Call Is Being Forwarded
Content-Length: 0
Via: SIP/2.0/TCP
So we can say that the failure is not at the microservice level, but is due to the SIP protocol, which is not the object of monitoring.
5.4.3 Failure 502 – Homestead-prov (forced kill)

This failure is due to the forced termination of the homestead-prov process, inside the Homestead-prov container, during the execution of a test suite. To do this, the following command was used:

docker exec homestead-prov kill -9 216
The following error message is displayed on the client:
Account creation failed with HTTP code 502, body {"status": 502, "message": "Bad
Gateway", "reason": "Upstream request failed", "detail": {"Upstream error": "502",
"Upstream URL": "http://homestead-prov:8889/private/6505550742%40example.com"},
"error": true}
The MetroFunnel log is the following; as explained in chapter 4, the order in which the rows appear in the log follows the replies received, not the requests sent.
1. PUT,/private/6505550742%40example.com,
172.18.0.12,48260,172.18.0.7,8889,502,0.432941,2,1,Request - Response
2. GET,/private/6505550742%40example.com/associated_implicit_registration_sets,
172.18.0.12,48262,172.18.0.7,8889,502,0.443208,2,1,Request - Response
3. PUT,/org.etsi.ngn.simservs/users/sip%3A6505550742%40example.com/simservs.x
ml, 172.18.0.12,35254,172.18.0.8,7888,200,0.258912,2,1,Request - Response
4. GET,/public/sip%3A6505550742%40example.com/associated_private_ids,
172.18.0.12,48266,172.18.0.7,8889,502,0.231437,2,1,Request - Response
5. POST,/accounts/[email protected]/numbers/,
192.168.1.2,51456,172.18.0.12,80,502,24.941935,1,0,Request - Response
As you can see, all the Homestead-prov requests (lines 1, 2 and 4) have code 502. The request to the Homer node (172.18.0.8 – TCP 7888), line 3, has code 200. Finally, there is the response to the client, with code 502, indicating that the registration of the number was not successful.
We now show an extract of the Ellis node log, followed by the Homer and Homestead-prov logs. As explained in the previous chapters, by default the Ellis log contains only the requests sent, with log level INFO; the responses are not added. Only in the event of an error are the responses added to the log of the node that made the request, with log level WARNING or ERROR.
1. 10-01-2018 10:33:01.794 UTC INFO homestead.py:272: Sending HTTP PUT
request to http://homestead-prov:8889/private/6505550742%40example.com
2. 10-01-2018 10:33:01.795 UTC INFO homestead.py:272: Sending HTTP GET
request to http://homestead-
prov:8889/private/6505550742%40example.com/associated_implicit_registration_
sets
3. 10-01-2018 10:33:01.797 UTC INFO xdm.py:29: Sending HTTP PUT request to
http://homer:7888/org.etsi.ngn.simservs/users/sip%3A6505550742%40example.co
m/simservs.xml
4. 10-01-2018 10:33:01.798 UTC WARNING utils.py:53: Non-OK HTTP response.
HTTP 502: Bad Gateway
5. 10-01-2018 10:33:01.798 UTC WARNING numbers.py:180: Failed to update all
the backends
6. 10-01-2018 10:33:01.798 UTC INFO homestead.py:253: Sending HTTP GET
request to http://homestead-
prov:8889/public/sip%3A6505550742%40example.com/associated_private_ids
7. 10-01-2018 10:33:01.799 UTC WARNING utils.py:53: Non-OK HTTP response.
HTTP 502: Bad Gateway
8. 10-01-2018 10:33:01.804 UTC WARNING utils.py:53: Non-OK HTTP response.
HTTPResponse(code=502,request_time=0.0060460567474365234,buffer=<_io.B
ytesIO object at
0x7f4fc9c56e90>,_body=None,time_info={},request=<tornado.httpclient.HTTPR
equest object at 0x7f4fc9bcac90>,effective_url='http://homestead-
prov:8889/public/sip%3A6505550742%40example.com/associated_private_ids',he
aders={'Date': 'Wed, 10 Jan 2018 10:33:01 GMT', 'Content-Length': '181',
'Content-Type': 'text/html', 'Connection': 'close', 'Server': 'nginx/1.4.6
(Ubuntu)'},error=HTTPError('HTTP 502: Bad Gateway',))
9. 10-01-2018 10:33:01.807 UTC WARNING numbers.py:192: Backed out changes
after failure
10. 10-01-2018 10:33:01.818 UTC ERROR web.py:1447: 502 POST
/accounts/[email protected]/numbers/ (0.0.0.0) 29.00ms
Lines 1, 2, 3 and 6 are the various requests made to the Homestead-prov and Homer nodes. Lines 4-5 and 7-8 contain information about the failed responses to the Homestead-prov requests. Finally, line 10 shows the response sent by Ellis to the client.

In the Homestead-prov log, as expected, there are no rows with a timestamp later than the termination of the process.
The Homer log instead is the following:
1. 10-01-2018 10:33:01.799 UTC INFO base.py:259: Received request from
localhost - PUT
http://http_homer/org.etsi.ngn.simservs/users/sip%3A6505550742%40example.co
m/simservs.xml
2. 10-01-2018 10:33:01.800 UTC INFO xsd.py:51: Performing XSD validation
3. 10-01-2018 10:33:01.802 UTC INFO base.py:272: Sending 200 response to
localhost for PUT
http://http_homer/org.etsi.ngn.simservs/users/sip%3A6505550742%40example.co
m/simservs.xml
As you can see, the information present in the MetroFunnel log and that obtained from the Clearwater logs are roughly the same.
5.4.4 Failure 502 – Homer (forced kill)
This failure is due to the forced termination of the homer process, inside the Homer container, during the execution of a test suite. To do this, the following command was used:

docker exec homer kill -9 221
The following error message is displayed on the client:
Account creation failed with HTTP code 502, body {"status": 502, "message": "Bad
Gateway", "reason": "Upstream request failed", "detail": {"Upstream error": "502",
"Upstream URL":
"http://homer:7888/org.etsi.ngn.simservs/users/sip%3A6505550622%40example.com/sim
servs.xml"}, "error": true}
The error message is the same as in the previous case, except for the referenced URL. For simplicity, only the log rows that differ are reported.
MetroFunnel log:
1. PUT,/org.etsi.ngn.simservs/users/sip%3A6505550622%40example.com/simservs.x
ml, 172.18.0.12,39156,172.18.0.8,7888,502,0.368791,3,2,Request – Response
2. POST,/accounts/[email protected]/numbers/
,172.18.0.13,55378,172.18.0.12,80,502,56.252505,2,1,Request - Response
Ellis log:
1. 10-01-2018 10:40:24.378 UTC INFO xdm.py:29: Sending HTTP PUT request to
http://homer:7888/org.etsi.ngn.simservs/users/sip%3A6505550622%40example.co
m/simservs.xml
2. 10-01-2018 10:40:24.383 UTC WARNING utils.py:53: Non-OK HTTP response.
HTTPResponse(code=502,request_time=0.004508018493652344,buffer=<_io.Byt
esIO object at
0x7f23a2249a70>,_body=None,time_info={},request=<tornado.httpclient.HTTP
Request object at
0x7f23a2244f90>,effective_url='http://homer:7888/org.etsi.ngn.simservs/users/sip
%3A6505550622%40example.com/simservs.xml',headers={'Date': 'Wed, 10 Jan
2018 10:40:24 GMT', 'Content-Length': '181', 'Content-Type': 'text/html',
'Connection': 'close', 'Server': 'nginx/1.4.6 (Ubuntu)'},error=HTTPError('HTTP
502: Bad Gateway',))
3. ….
4. 10-01-2018 10:40:24.543 UTC ERROR web.py:1447: 502 POST
/accounts/[email protected]/numbers/ (0.0.0.0) 379.43ms
Also in this case, the information present in the MetroFunnel log and that obtained from the Clearwater logs are roughly the same.
5.4.5 Failure 504 – Overload
As mentioned previously, with a load value of 15 containers the Ellis node crashes, which is why a load value above 12 was not used in the performance assessment. In this phase, however, we take advantage of this limit to compare how many, and which, failures can be detected with the Clearwater logs and with the MetroFunnel log.

After a couple of tests, all the containers show error messages like this:
RuntimeError thrown:
Account creation failed with HTTP code 504, body <html>
<head><title>504 Gateway Time-out</title></head>
or the following, depending on whether the failure occurred during the insertion of a telephone number or during its deletion:
Leaked sip:[email protected], DELETE returned 504
Failed
RestClient::GatewayTimeout thrown:
504 Gateway Timeout
The MetroFunnel log is as follows (only a few lines are shown):
1. DELETE,
/accounts/[email protected]/numbers/sip%3A6505550627%40example.com
,192.168.1.2,47564,172.18.0.12,80,999,9175.693857,19,23, Request – TIMEOUT
2. DELETE,
/accounts/[email protected]/numbers/sip%3A6505550664%40example.com
,192.168.1.2,50094,172.18.0.12,80,999,8971.865984,20,22, Request - TIMEOUT
3. POST,
/accounts/[email protected]/numbers/,192.168.1.2,33282,172.18.0.12,80,
999,8192.985622,22,15, Request – TIMEOUT
In particular, the MetroFunnel log detects 203 requests in timeout: 36 DELETE, 158 POST, 8 PUT and 1 GET.

Furthermore, by analysing the source and destination IP addresses, it is possible to establish that, of these 203, 193 correspond to requests made by the client node, while the other 10 correspond to requests made by the Ellis node to the Homer and Homestead-prov nodes.

These 193 are also found in the MetroFunnel log as responses not associated with any request; they are the responses generated by the HTTP protocol, and in fact they all have 500/502/504 as response codes:
1. NULL, NULL, 192.168.1.2,47564,172.18.0.12,80,500,1134430.152055,1,0,
NO REQUEST - Response
2. NULL, NULL, 192.168.1.2,50094,172.18.0.12,80,502,1134431.107602,1,0,
NO REQUEST - Response
3. NULL, NULL, 192.168.1.2,33282,172.18.0.12,80,502,1134432.563724,2,1,
NO REQUEST - Response
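The breakdown of the timeouts by HTTP method and by origin (client node versus internal Ellis requests) can be sketched as follows; the IP addresses are taken from the extracts above, the three sample lines are abbreviated versions of entries shown in this section, and the parsing is a deliberate simplification:

```python
from collections import Counter

CLIENT_IP = "192.168.1.2"  # the client node, per the extracts above

# A few of the timeout lines shown above (commas only, brackets dropped).
log = [
    "DELETE,/accounts/[email protected]/numbers/sip%3A6505550627%40example.com,192.168.1.2,47564,172.18.0.12,80,999,9175.693857,19,23,Request - TIMEOUT",
    "POST,/accounts/[email protected]/numbers/,192.168.1.2,33282,172.18.0.12,80,999,8192.985622,22,15,Request - TIMEOUT",
    "PUT,/org.etsi.ngn.simservs/users/sip%3A6505550490%40example.com/simservs.xml,172.18.0.12,52844,172.18.0.8,7888,999,9210.065918,19,3,Request - TIMEOUT",
]
rows = [line.split(",") for line in log]
timeouts = [r for r in rows if r[-1].strip() == "Request - TIMEOUT"]

# Count timeouts per HTTP method, and classify by who sent the request.
by_method = Counter(r[0] for r in timeouts)
by_origin = Counter("client" if r[2] == CLIENT_IP else "internal"
                    for r in timeouts)
print(dict(by_method))  # {'DELETE': 1, 'POST': 1, 'PUT': 1}
print(dict(by_origin))  # {'client': 2, 'internal': 1}
```

Run over the full log, the same grouping yields the 36/158/8/1 method split and the 193-versus-10 origin split reported above.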
In MetroFunnel it is possible to set a timeout, through which MetroFunnel can identify unanswered requests; in this case we are not faced with false positives, because the responses detected beyond the timeout, and therefore not associated with any request, all carry 50X error codes as response codes.

Therefore, they must be considered as two lines related to the same event: in the first, MetroFunnel detects the failure; the second is the confirmation of the failure.
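This two-line pairing, a TIMEOUT entry later confirmed by a 50X response on the same client socket, can be sketched like this; the socket and status values are taken from the extracts above:

```python
# Match each timed-out request with the late response arriving on the same
# (source IP, source port) socket, as done manually in the text.
timeouts = {                      # socket -> method of the timed-out request
    ("192.168.1.2", 47564): "DELETE",
    ("192.168.1.2", 50094): "DELETE",
    ("192.168.1.2", 33282): "POST",
}
late_responses = {                # socket -> status code of the late answer
    ("192.168.1.2", 47564): 500,
    ("192.168.1.2", 50094): 502,
    ("192.168.1.2", 33282): 502,
}

# A timeout is confirmed when the late answer on the same socket is a 50X.
confirmed = {
    sock: (method, late_responses[sock])
    for sock, method in timeouts.items()
    if sock in late_responses and 500 <= late_responses[sock] < 600
}
print(len(confirmed))  # 3: every timeout is confirmed by a 50X late answer
```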
At this point we analyse in more depth the other 10 requests that MetroFunnel detects as failures. To do this, we compare the requests and responses found in the logs of Ellis, Homer and Homestead-prov. For each request URL detected as failed, we look at how many times that URL appears in the various logs; this comparison is possible because, in the communications between Ellis and the Homer and Homestead-prov nodes, the URL contains the telephone number associated with the request.
The MetroFunnel log related to the failures is as follows:
1. PUT,
/org.etsi.ngn.simservs/users/sip%3A6505550490%40example.com/simservs.xml,17
2.18.0.12,52844,172.18.0.8,7888,999,9210.065918,19,3, Request – TIMEOUT
2. DELETE, /private/6505550086%40example.com,
172.18.0.12,38582,172.18.0.7,8889,999,9209.536692,20,2, Request – TIMEOUT
3. DELETE, /private/6505550036%40example.com,
172.18.0.12,38586,172.18.0.7,8889,999,9209.063506,21,1, Request – TIMEOUT
4. PUT, /irs/315d5e00-a64d-44cc-a283-323f380006c1/service_profiles/ccca53d4-
40b5-4dcc-99d1-7b8f960665ea/filter_criteria,
172.18.0.12,38578,172.18.0.7,8889,999,9213.987397,1,8, Request – TIMEOUT
5. DELETE,
/org.etsi.ngn.simservs/users/sip%3A6505550086%40example.com/simservs.xml,
172.18.0.12,52838,172.18.0.8,7888,999,9213.405119,2,7, Request – TIMEOUT
6. PUT,
/org.etsi.ngn.simservs/users/sip%3A6505550906%40example.com/simservs.xml,
172.18.0.12,52830,172.18.0.8,7888,999,9212.817478,3,6, Request – TIMEOUT
7. PUT,
/org.etsi.ngn.simservs/users/sip%3A6505550308%40example.com/simservs.xml,
172.18.0.12,52834,172.18.0.8,7888,999,9211.938706,4,5, Request – TIMEOUT
8. DELETE,
/org.etsi.ngn.simservs/users/sip%3A6505550036%40example.com/simservs.xml,
172.18.0.12,52842,172.18.0.8,7888,999,8175.936295,19,1, Request – TIMEOUT
9. GET, /public/sip%3A6505550996%40example.com/associated_private_ids,
172.18.0.12,38428,172.18.0.7,8889,999,8174.563955,21,2, Request – TIMEOUT
10. PUT,
/org.etsi.ngn.simservs/users/sip%3A6505550652%40example.com/simservs.xml,
172.18.0.12,52678,172.18.0.8,7888,999,8176.211097,18,3, Request - TIMEOUT
The table shows the number of times the same URL appears within the logs.

Table 14 - Number of rows with the same URL

Failure   MetroFunnel   Ellis   Homer   Homestead
   1          5+1        6+1     5+5       -
   2         19+1       20+1      -      19+19
   3         11+1       12+1      -      11+11
   4           1         1+1      -        0
   5          5+1        6+1     5+5       -
   6          6+1        7+1     6+6       -
   7          8+1        9+1     8+8       -
   8          4+1        5+1     4+4       -
   9          3+1        4+1      -       4+4
  10          6+1        7+1     7+7       -
In the MetroFunnel column, the number of requests with an answer plus the number of unanswered requests is indicated.

In the Ellis column, given how the log is organized, the number of requests sent plus the number of incorrect answers received is reported (as already described in the previous paragraph, only the incorrect answers Ellis receives appear in its log).

The Homer and Homestead columns show the number of requests received plus the number of responses sent.
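The counts in Table 14 can be reproduced by counting, in each node's log, the lines that mention a given request URL. A minimal sketch, using toy stand-in log lines for failure 9 (the real files and message formats differ):

```python
def count_url(lines, url):
    """Count the log lines mentioning a given request URL."""
    return sum(1 for line in lines if url in line)

url = "6505550996%40example.com/associated_private_ids"

# Toy stand-ins for failure 9: Ellis sent the request 4 times and logged
# one bad answer (the table's 4+1); Homestead-prov received all 4 and
# answered all 4 (the table's 4+4).
ellis = (["Sending HTTP GET request to .../" + url] * 4
         + ["Non-OK HTTP response for .../" + url])
homestead = (["Received request GET .../" + url] * 4
             + ["Sending 200 response for .../" + url] * 4)

print(count_url(ellis, url), count_url(homestead, url))  # 5 8
```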
As you can see, in all 10 cases where MetroFunnel reported a request timeout, Ellis reports that it received a non-OK HTTP response, as can be seen in the log below:
20-01-2018 10:45:43.420 UTC WARNING utils.py:53: Non-OK HTTP response.
HTTPResponse(code=599,request_time=30.03533101081848,buffer=None,_body=None,t
ime_info={},request=<tornado.httpclient.HTTPRequest object at
0x7fbb69be0710>,effective_url='http://homestead-
prov:8889/public/sip%3A6505550996%40example.com/associated_private_ids',headers=
{},error=HTTPError('HTTP 599: Timeout',))
In both the Homer and Homestead-prov logs, instead, it is possible to notice that the number of requests received is one unit lower than the number of requests detected both by MetroFunnel and in the Ellis log.

A very particular case concerns failures 9 and 10: in both, MetroFunnel detects a request in timeout and Ellis detects an incorrect answer, while in the Homer and Homestead-prov logs the number of requests received and answered correctly (with response code 20X) corresponds to the number of requests sent by Ellis.
Having verified that the 203 failures detected by MetroFunnel are all real, let us check how many failures are detected within the Clearwater logs; since the presence of the previous 10 failures has already been ascertained, we look for the remaining 193.

In the Ellis log there are 68 lines with “ERROR” level, of which 50 can be associated with client requests, while the other 18 relate to caught exceptions. Even if we consider all these events as failed client requests, the number of failures detected is much lower than the number detected by MetroFunnel (68 instead of 193).

So in this case MetroFunnel managed to capture all the failures, while only 35% of them are present in the Clearwater log.
5.4.6 Further considerations on failure analysis
As the analysis shows, in cases where the failure occurs in the secondary nodes, those not directly connected to the clients (Homer and Homestead-prov), both the MetroFunnel log and the Ellis log contain all the failures.
When the failing node is instead the first one connected to the client, the Clearwater logs contain 65% fewer failures, a value close to what is also found with rule-based logging.
With the MetroFunnel log it is possible to know the execution time of each microservice call, which enables performance monitoring; this is not possible with the Clearwater logs, as that information is not present.

Furthermore, since the data is preformatted, as shown during the performance analyses, it is easy to import into programs such as JMP for statistical analysis, or into Kibana for fast performance monitoring.
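Because each MetroFunnel entry carries the elapsed time of a call, per-endpoint latency statistics fall out directly. A minimal sketch, with (method, URL, elapsed) tuples standing in for parsed log lines; the abbreviated URLs and the grouping key are illustrative choices, not the tool's own:

```python
from collections import defaultdict
from statistics import mean

# Elapsed times taken from entries shown earlier in this chapter
# (URLs abbreviated for readability).
entries = [
    ("PUT", "/org.etsi.ngn.simservs/users/x/simservs.xml", 0.258912),
    ("PUT", "/org.etsi.ngn.simservs/users/y/simservs.xml", 0.368791),
    ("GET", "/public/x/associated_private_ids", 0.231437),
]

# Group response times by method plus the last path segment,
# a coarse stand-in for an "endpoint" key.
times = defaultdict(list)
for method, url, elapsed in entries:
    times[(method, url.rsplit("/", 1)[-1])].append(elapsed)

for key, values in times.items():
    print(key, round(mean(values), 6))
```

The same aggregation is what a Kibana dashboard or a JMP import performs over the full preformatted log.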
Another difference in favour of MetroFunnel is the centralized log. The Clearwater nodes, and therefore their logs, are distributed, even if they all reside on the same physical machine; with MetroFunnel, instead, it is possible to monitor the traffic exchanged at the network interface level (in this case the Docker bridge, the virtual interface that connects all the containers), making it possible to monitor all connected nodes at the same time and to obtain a single log containing both the requests and the responses, even if interleaved.

Although the information content is the same, the ease of understanding and the speed with which the information can be obtained are clearly in favour of MetroFunnel. Moreover, since the MetroFunnel log is unique, it is much easier to recognize the execution patterns, while with a distributed and fragmented log this is much more complicated and, above all, almost impossible to do in real time.
Conclusions
MetroFunnel, together with the ELK stack configured as shown in the previous chapters, is a tool ready to be used to monitor microservices; it is the instrument that was missing, not a simple improvement of something that already existed.

The principle of operation may seem simple, but it builds on approaches already in use, adding the study of rule-based logging.

MetroFunnel is effective (it can monitor performance and detect failures), transparent (neither the application nor the users need to be informed that monitoring is active) and non-intrusive (the behaviour of the application is not changed).

The results show that it is a valid tool, both for its limited performance impact and for the detection of failures, which in the tests performed proved to be complete when compared with the internal logs of the test application.
Future developments
MetroFunnel can be a starting point for further developments, such as improving debugging in the event of failure; this can be done by including in the log the information exchanged between microservices via JSON or XML.

Another point that can be improved is the performance impact on the server: since the tests were performed on a simple desktop machine, on a better-performing server with more resources available (CPU and RAM) this impact is very likely to be smaller; moreover, by adopting more efficient algorithms, or by using better-performing support libraries, it can be reduced further.