Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Article
A cloud-to-edge approach to support predictiveanalytics in robotics industry
Simone Panicucci1 , Nikolaos Nikolakis2 , Tania Cerquitelli3 , Francesco Ventura3 , StefanoProto 3 , Enrico Macii4, Sotiris Makris2, David Bowden5, Paul Becker6, Niamh O’Mahony3,Lucrezia Morabito1, Chiara Napione1, Angelo Marguglio7, Guido Coppo8, Salvatore Andolina8
1 COMAU S.p.A., Turin, Italy; {name.surname}@comau.com2 Laboratory for Manufacturing Systems & Automation, Department of Mechanical Engineering and
Aeronautics, University of Patras, Patras, Greece; {name.surname}@lms.mech.upatras.gr3 Department of Control and Computer engineering, Politecnico di Torino, Turin, Italy;
{name.surname}@polito.it4 Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, Turin, Italy;
{name.surname}@polito.it5 DELL EMC, Cork, Ireland; {name.surname}@dell.com6 Fraunhofer Gesellschaft zur Förderung der angewandten Forschung, Aachen, Germany;
{name.surname}@ipt.fraunhofer.de7 Engineering Ingegneria Informatica S.p.A., Palermo, Italy; {name.surname}@eng.it8 SynArea Consultants S.r.l., Turin, Italy; {name.surname}@synarea.com
Version March 11, 2020 submitted to Electronics
Abstract: Data management and processing to enable predictive analytics in cyber physical systems1
holds the promise of creating insight over underlying processes, discovering anomalous behaviours2
and predicting imminent failures threatening a normal and smooth production process. In this3
context, proactive strategies can be adopted, as enabled by predictive analytics. Predictive analytics4
in turn can make a shift in traditional maintenance approaches to more effective optimising their5
cost and transforming maintenance from a necessary evil to a strategic business factor. Empowered6
by the aforementioned, this paper discusses on a novel methodology for RUL estimation enabling7
predictive maintenance of industrial equipment using partial knowledge over its degradation function8
and the parameters that are affecting it. Moreover, the design and prototype implementation9
of a plug-n-play end-to-end cloud architecture, supporting predictive maintenance of industrial10
equipment is presented integrating the aforementioned concept as a service. This is achieved by11
integrating edge gateways, data stores at both the edge and the cloud, and various applications,12
such as predictive analytics, visualization and scheduling, integrated as services in the cloud system.13
The proposed approach has been implemented into a prototype and tested in an industrial use case14
related to the maintenance of a robotic arm. Obtained results show the effectiveness and the efficiency15
of the proposed methodology in supporting predictive analytics in the era of Industry 4.0.16
Keywords: Machine learning; predictive maintenance; visualization techniques; data management;17
big data architecture.)18
1. Introduction19
The industrial world is undergoing a shift to industry 4.0, where physical processes merge20
with their virtual counterparts towards increased flexibility and predictability. In this context, data21
acquisition and processing can enable smart functionalities. Data analysis techniques, such as clustering22
and anomaly detection, are used for enabling predictive analytics. Hence, powerful and reliable23
cyber-physical system (CPS) architectures [1] are becoming prominent to effectively analyze such large24
Submitted to Electronics, pages 1 – 21 www.mdpi.com/journal/electronics
Version March 11, 2020 submitted to Electronics 2 of 21
amounts of data, creating insight into the production process, and, thus, enabling its improvement, as25
well as competitive business advantages.26
The proposed architecture, the “SERENA” system, can identify the symptoms of imminent27
machine failure, through the characterization of the current dynamics of the process/machine using28
data collected from the factory. A scalable and modular approach has been taken in the design of29
the architecture, decoupling the overall design from any specific set of technologies. In particular,30
there have been designed and implemented three different services for enabling predictive analytics,31
including a self-assessment service to automatically detect the drift in the data under analysis and32
trigger the building of a new predictive model, and a data-driven estimation of the remaining useful33
life.34
This prototype has been validated in a real-world scenario involving anomaly detection on a35
robotic axis and concerning the maintenance requirements caused by wrong belt tensioning effect. The36
visualization service enables a real-time data stream and machine visualization, while the predictive37
analytics services generate the estimated remaining useful life (RUL) value, which is consumed by the38
scheduling service to proactively schedule the maintenance activities.39
This paper is organized as follows. Section 2 discusses the state-of-the-art architectures and40
methodologies in the era of Industry 4.0. Section 3 describes the proposed architecture enabling41
predictive maintenance in industrial environments equipped with Internet of Thing devices. In42
addition, Section 4 presents the services of the SERENA system, while Section 5 introduces the43
industrial use case. Section 6 presents some results achieved by exploiting the proposed architecture44
in a robotic industry. Finally, Section 7 draws conclusions and discusses future developments of the45
proposed approach.46
2. Related work47
The expression Industry 4.0 has been minted by research and industrial partners during the last48
decade. In this context, a large variety of studies and research projects have been conducted. [2] shows49
an exemplary case study on big data analytics in industrial production environments. Outcome of50
this is a cross-industry methodology, capable of process improvements with data mining. In [1] a51
distributed system architecture was used to enable predictive maintenance on a self-tuning engine by52
combining dynamic use of different prediction algorithms with data interpretation functionalities to53
allow better understanding by the end user. While in [3] an overview of IoT platforms with applications54
in enhanced living environments and in healthcare systems is given highlighting the applications, the55
challenges, the opportunities of these new technologies.56
The differences between multi-class classifiers and deep learning techniques are discussed in [4].57
A comparative experimental analysis of exploratory techniques was presented in [5]. The scalability of58
big data methodologies is discussed in [6] using an example from the energy domain industry. The59
authors of [7] present a proposal for on-demand remote sensing of data to save network and processing60
capacities by extracting features as soon as possible in the data flow.61
The linking between existing production system, their legacy systems and advanced Internet62
of Things (IoT) technologies is discussed in [6]. For this purpose, virtualization and a cloud-based63
micro-service architecture are key technologies. The topic of distributed micro-service architectures64
for big data applications is also discussed in [8]. Therein, the author describes an approach done65
by the Open Geospatial Consortium (OGC) to process large quantities of data collected by earth66
observation satellites. By bringing micro-services to the place where data is captured and stored67
instead of moving these large quantities of data to one central processing node, unnecessary data68
transport can be significantly reduced. In [9] a scalable full-stack distributed engine tailored to69
energy data collection and forecasting in the smart-city context enabling the integration of the70
heterogeneous data measurements gathered from real-world systems and the forecasting of average71
power demand. Moreover, related to fog computing, in [10] the authors introduce a new architecture72
enabling on-demand computation to offload the computation in the cloud architecture, reducing73
Version March 11, 2020 submitted to Electronics 3 of 21
unnecessary network overhead, by properly selecting the most effective edge devices as computation74
delegates.75
In the context of maintenance strategies, the condition based approach is considered as effective76
[11], but also complex to implement and integrate to a production system [12]. However, the77
determination of the current equipment condition is an important issue in the maintenance process78
chain. Using a combination of raw data and intelligent trend filtering, anomalies can be detected79
in various ways, as discussed in [13]. Usually identified anomalies or the status of machine’s80
degradation are expressed in terms of a machine’s remaining useful life (RUL). In that way, a81
complex statistical analysis is directly mapped to a value with physical meaning for the maintenance82
personnel, responsible for further actions. [14] and [15] explore ways to estimate the RUL and derive83
a detailed maintenance scheduling strategy. It is important to note that to adequately estimate an84
equipment’s condition using raw data as a basis, a dataset of proper quality is necessary. This85
requires data corresponding to both normal and abnormal working conditions. Methods for data86
quality improvements are discussed by the authors of [16]. The authors describe how datasets can be87
visualized, grouped, classified, clustered and evaluated in order to detect and remove outliers.88
This paper extends the work presented in [17] by (i) presenting an up-to-date architecture of89
the SERENA system, (ii) describing in further detail the main components of the predictive analytics90
service, and (iii) providing additional experimental results.91
3. Proposed Architecture92
The SERENA system is built on a lightweight and flexible micro-services architecture as shown in93
Figure 1, with the various applications implemented as independent Docker services. The services are94
integrated to seamlessly support maintenance engineers and personnel by providing the following95
main functionalities: (i) predictive maintenance analytics; (ii) dynamic maintenance scheduling; and96
(iii) assisted plant maintenance, using augmented reality devices. Communications are facilitated97
through a central message broker. With respect to the cloud-based nature of the architecture and its98
services, the SERENA data model is based on a JSON-LD [18] mapping of the MIMOSA XML schema99
[19]. A Docker orchestration layer manages the instantiation and life cycle of the services. The adoption100
of virtualization technologies like Docker, creates an agnostic middle-ware, enabling the SERENA101
system to be deployable on a wide variety of host infrastructures while providing scalability and102
resiliency. The Docker orchestration layer also provides the overlay network that allows the services to103
transparently communicate, irrespective of where they are deployed.104
A standard practice for companies not providing IT products or services, is renting cloud space105
and web servers to avoid the related capital and maintenance cost, it is obvious that production106
equipment and sensors are in a distance to the actual rented cloud infrastructure. With respect to107
the aforementioned, the SERENA architecture extends the IoT cloud concept to directly include108
these devices. This is accomplished by wrapping the edge applications as Docker services, and109
managing their deployment and lifecycle from the Docker swarm as docker workers or managers in110
case of a complex swarm(s). This in turn can enable workload balancing in terms of deploying and111
executing docker applications to the appropriate network nodes, as well as managing the activation112
or deactivation of additional services at a certain node. In case of network bandwidth or latency113
constraints, the service can be moved to the edge of the cloud to analyze and convert the raw data into114
features, at source; alternatively, if high performance compute resources are required, such as tensor115
processing units (TPUs) for training machine learning (ML) models, services can be run on the central116
cloud infrastructure or in remote public clouds. Thus, the SERENA architecture considers both cloud117
and edge nodes, all being part of the same hybrid SERENA cloud system, centrally and holistically118
managed as one or more Docker swarm service network.119
In the context of the SERENA project, a specific gateway has been designed and developed based120
on an Intel NUC, which has 8 GB of RAM and an Intel i5, as well as some analog inputs. The edge121
component, referring to an edge gateway (hardware and software) allows building real-time streaming122
Version March 11, 2020 submitted to Electronics 4 of 21
Figure 1. The SERENA Architecture
applications, that reliably collects data among different heterogeneous systems or applications and123
transforms or reacts to the streams of data collected in the plant. It can scale in both data size and124
velocity. The edge gateway is located on the factory floor collecting sensor data coming from industrial125
equipment as well as context information and the operating condition of the equipment. It includes126
the following key functionalities:127
• The data flow engine, managing the data streams128
• A feature engineering component pre-processing incoming raw data into values that are of grater129
meaning for the predictive analytics, such as the current average/rms value instead of simple130
amplitude.131
• The rules for performing the aforementioned pre-processing132
• A pre-trained predictive analytics model providing real time predictions/classifications133
All components are deployed as docker containers under the control of a docker orchestration manager.134
Wrapping edge functionalities as docker services gives the system the flexibility to execute the135
application workloads wherever it makes the most sense: if there are network bandwidth or latency136
constraints, the service can be moved to the edge of the cloud to analyze and convert the raw data into137
feature engineering, at source; alternatively, if high performance computing resources are required,138
such as tensor processing units (TPUs) for training machine learning (ML) models, services can be run139
on the central cloud infrastructure or in remote public clouds.140
The SERENA system has been designed and implemented aiming to be technology independent.141
The reason being is to allow any integrator or developer to customize it and tailor its reference142
technologies depending on a case specific technology limitations. Whilst a reference architecture and143
implementation are provided, the system has been designed in such a way as to give the developer the144
freedom to choose the technologies that conform to their own corporate guidelines or meet specific145
implementation challenges.146
In particular, the technologies used for implementing the SERENA system during the proof of147
concept stage along with the data storage mechanisms are transparent to the other services. Thus,148
older technologies can be swapped for newer more powerful ones, without having to rebuild the entire149
system. As an example, the preferred data storage platform is HDFS [20], but other databases, such as150
Cassandra [21], may be preferred in specific cases.151
Version March 11, 2020 submitted to Electronics 5 of 21
4. Services152
The architecture presented in the previous section has been designed to enable predictive analytics153
of industrial equipment. Towards that, a set of application services has been integrated. In the following154
sections the main services are presented in greater detail.155
4.1. The predictive analytic service156
A general-purpose service suited to fulfill the condition monitoring and maintenance needs of157
modern companies in the context of industry 4.0 is proposed. The architecture of the analytical service158
proposed in this research, reported in Figure 2,159
is flexible and it can be customized for any other use cases of interest (e.g., [22]). The service is160
based on four main steps: feature engineering, predictive analytics, model validation, self-assessment.
Figure 2. Analytic architecture161
Feature engineering. Industrial sensors monitor many production processes, that are usually162
repeated periodically and characterized by a specific duration. The feature engineering component163
is in charge of transforming and processing raw sensor data to extract the main features describing164
the signals. They are identified at the first stages of the raw data analysis process (i.e. clustering,165
correlation analysis, PCA, other) and implemented at a latter stage to reduce the amount of data166
communicated from the edge to the cloud. Specifically, each monitored cycle is divided in several167
segments over the time domain, to better extract the variability separately for each sub-cycle. Each168
segment is thus characterized by many statistical features (e.g. mean, standard deviation, quartiles,169
kurtosis, skewness, root mean squared error, sum of absolute values, number of elements over the170
mean, absolute energy, mean absolute change). These features are then used by the other steps of the171
analytical process to predict the outcome for each cycle.172
Since the great number of computed statistical features, it is possible that the time domain feature173
computation step produces a huge number of attributes and that, in some cases, this can affect the174
performance of the successive analysis. However, since it is possible that some features are highly175
correlated with some others in the dataset, the information gained by these attributes could be176
redundant and it can possibly produce noise in the model building phase. Thus, to select the features177
that contain the most valuable information, the analytics engine includes a feature selection technique178
based on the Pearson correlation test [23]. Computing the correlation of each couple of attributes and179
removing those that are correlated the most, on average over all the (other) attributes, it is possible180
to identify which are the attributes that can be discarded without losing any accuracy in the model181
construction.182
Predictive analytics. The predictive analytics block has the role to learn from the historical dataset183
and build a prediction model, to forecast the correct label for new incoming sensor data in the real184
time prediction phase. The model building process exploits a training phase necessary to extract the185
latent relations which exist between an historical set of signals and its labels (events to predict). The186
Version March 11, 2020 submitted to Electronics 6 of 21
information contained in the features is analyzed by different scalable and transparent classification187
algorithms, available in the Apache Spark framework [24,25]: (i) Decision Tree [26] (DT) , (ii) Random188
Forest [27] (RF), (iii) Gradient Boosted Tree [28] (GBT). The chosen classification algorithms have189
been specifically picked-up for their interpretability: domain experts can indeed easily understand190
and inspect the main relationships between the input data and the outcome of the predictions. Once191
the model is built, the labels (and the probability to belong to a specific label) for new incoming192
sensors data can be forecasted by the real time prediction. The trained model is pushed down to the193
edge to fulfill the need for short response time of the predictive models, in order to enable in-time194
corrective/preventive actions to take place on the shopfloor. Also for safety reasons for the human195
personnel as well as for preventing any robot/machine damage, that would be associated to additional196
expenditures. This would need a pre-trained model deployed close to the robot, getting data and197
processing in short time, providing results easy to understand and interpret by the human(s) in close198
proximity outputs.199
Model validation. The performance of the predictive model built in the previous step has to be200
evaluated by the Validation block. In such phase, all the trained classification models are validated201
using the Stratified K-Fold Cross validation strategy and the values of precision, recall and f-measure202
are calculated for each class of interest. This step is necessary to understand which algorithm better203
fits the requirements of the given classification task (i.e. have higher values for the f-measure and the204
other metrics). The indexes computed to evaluate the training of the model require the information205
of the real class of the test data, namely a subset of labeled data kept apart during the training phase206
through a certain sampling strategy. Because of this reason, these approaches are not eligible to be207
used to evaluate the model performances on unlabeled data over time.208
Self-assessment. It is common, especially in industrial production environments, that the nature209
of the collected data changes over time due to equipment degradation, the adoption of new machinery210
or to some environmental factors. The Self-assessment phase allows to identify if, due to changes in211
the production environment or in case of new labels not included in the dataset, the predictive model212
performance has degraded and so decides whether to trigger the update and retrain of model with213
newly arrived data.214
The main idea behind the self-assessment approach is to exploit the changes in the geometrical215
distribution of the data over time to identify when the predictive model is no more performing216
as expected. Moreover, in real use cases, the ground truth label is often not available, thus, only217
unsupervised techniques can be exploited. The proposed methodology exploits a new, scalable version218
of a well known unsupervised index able to measure the intra-cohesion and inter-separation of clusters219
of data along. The exploited metric is the Descriptor Silhouette (DS) index proposed in [29]. It provides220
an approximated estimation of the Silhouette index. The DS index ranges in [−1, 1], as the standard221
Silhouette index, where -1 represents a point that is distant from the distribution of the points to which222
it is assigned (bad assignment), while 1 describe a well assignment for the point in exam. Each record223
in the dataset is characterized by a DS score each time the self-assessment is executed, producing a224
Silhouette curve. Given a dataset characterized by class labels, it is possible to measure the cohesion225
and the separation of the data, before and after the collection of new incoming data. It has been proved226
experimentally that a variation of the geometric distribution of the incoming data w.r.t. the historical227
data corresponds to the presence of new unknown data. Then, the self-assessment service produces a228
measure of the percentage of degradation of the predictive model performance given by the following229
relation:230
DEG(c, t) = α ∗MAAPE(Silt0 , Silt) ∗Nc
N(1)
The degradation is calculated for each class c learned by the predictive model and at specific231
timestamp t. The degradation is obtained calculating the Mean Arctangent Absolute Percentage Error232
(MAAPE) [30] between the DS curve calculated with the training data and the DS curve calculated233
with all the data collected until time t. This error is then weighted by the number of points predicted234
Version March 11, 2020 submitted to Electronics 7 of 21
to belong to class c (Nc) w.r.t. the cardinality of the whole dataset (N). The coefficient α instead, give235
the sign to the degradation, describing if the mean silhouette is increased (negative sign) or decreased236
(positive sign), after the arrival of new unknown data. A positive sign of α represents a lower cohesion237
in the data, since the mean silhouette after the arrival of new data is lowered and thus the degradation238
is increased (the degradation is positive).239
Remaining Useful Life estimation.240
A widely used indicator in industry about the condition of its assets is the Remaining Useful Life241
(RUL). The RUL of an asset can be defined as the time left to the end of its expected operational life and242
as it regards the functionality for which it was purchased for. The same asset operating under different243
conditions can be characterised by different degradation level, and as a result varying RUL values at244
the same point of time. As a consequence, the creation of a generic model, statistical or otherwise, for245
estimating the RUL value is extremely challenging if at all feasible [31]. Nevertheless, the accurate246
estimation of the RUL value is a key component towards enabling predictive maintenance of a system.247
Thus, several efforts are underway to provide a solution to this problem. However, state-of-the-art248
algorithms, such as liner regression models, Hidden Markov Models, or Long Short-Term Memory249
Neural Networks, require extensive and complex modelling and training to estimate the RUL value,250
with limited efficiency.251
To that end and towards overcoming the aforementioned limitations, this work proposes a novel252
methodology for RUL estimation. In particular, data collected out of different phases of an asset253
operational life are used to model its behavior at specified and labelled timestamps and support the254
identification of deviations from its nominal or ideal operational profile and other as well labelled255
profiles. More specifically, data acquired at the deployment of an asset on a shop floor can be considered256
as an ideal operational profile from which degradation starts to gradually deteriorate its operational257
behavior. Afterwards and during its usage for production purposes, data are used to estimate the258
deviation from this nominal profile. The distribution of the nominal set of key variables is calculated.259
Then the probability of each selected variable to belong to the nominal distribution is estimate using260
the Gaussian kernel density estimation [32]. This probability representing the deviation of the two261
profiles can then be associated to the RUL indicator or similar, facilitating its estimation. The lower is262
the probability of each selected variable to belong to the data distribution of the correct functioning of263
the machinery, the faster is the degradation of object functionality, thus the lower is the RUL.264
It should be mentioned that the set of selected as important variables for the aforementioned265
analysis can either be a-priori known, based on existing engineering/production knowledge or266
identified during the course of the data analysis by extracting features that may be used for pattern267
recognition and anomaly detection enabling in turn predictive maintenance activities. Afterwards,268
as the degradation level of the monitored equipment increases a drift from the initial distribution is269
detected, that can be associated to a decrease in the value of the RUL indicator. The deviation from270
the nominal profile or "distribution drift" is evaluated through the self-assessment service and for the271
selected features.272
Hence, the estimation of the RUL value in the context of SERENA and as presented in this work273
relies upon two main factors: (1) the joint probability of the selected features with respect to their274
nominal profile or distribution, and (2) the presence of probability drifting in the collected data over275
time. Both factors consist functions of time t. At a random time t, the percentage of the RUL value can276
be defined as follows:277
RUL(t) =1
Nt
X(t)
∑x
((100− DEG(t))
S
∏s
P(xs ∈ K(Xs(t0))
)(2)
where278
• X(t0) refers to the set of historical data collected from the the beginning of the operational life to a279
time t0 corresponding to the nominal or ideal operational profile of the asset under consideration,280
• X(t) refers to the new data collected from t0 to time t,281
Version March 11, 2020 submitted to Electronics 8 of 21
• Nt corresponds to the number of new signals collected up to time t,282
• S is the set of relevant features for characterizing the degradation level of the asset,283
• K(Xs(t0)) denotes the distribution of the feature s ∈ S estimated from the historical data X(t0),284
• P(xs ∈ K(Xs(t0)) refers to the probability of feature s as estimated out of a new set of data x to285
belong in the distribution K(Xs(t0)),286
• DEG(t) is the overall degradation estimated by the data collected from t0 to t as measured out of287
the Equation 1.288
As a result, the equation 2 estimates the percentage of RUL as a function of time by calculating the289
mean of the joint probabilities of each selected feature to belong to the ideal or nominal operational290
distribution, for each incoming set of data, weighted by 100 minus the percentage reflecting the291
probability drift at a time t. The contribution of the drifting probability is calculated as 100 minus292
the percentage of degradation, reflected in the collected data, since an opposite analogy is considered293
between the RUL and the drift values.294
Using the aforementioned concept as a basis, the SERENA system is able to estimate the RUL295
value of an asset, machine, robot, other. Moreover, considering a human-centered cyber physical296
production system, the proposed approach supports its customised tuning and configuration by the297
human user towards addressing different scenarios and assets. It should be noted that an important298
aspect of the presented approach and as part of the implemented SERENA system, is the knowledge299
of the domain expert for configuring the system and identifying the relevant or important features for300
the analysis. This knowledge is considered as part of the experience acquired during production and301
not knowledge that an equipment manufacturer could provide, unless monitoring the asset during its302
operational lifetime.303
In addition the proposed RUL estimation approach supports the correlation to the drift measured304
by 1 to unknown or unmodelled knowledge collected over time through the analysed features. This305
phenomenon is considered to represent degradation that cannot be effectively modelled due to its306
complexity and unknown correlation to the known asset variables and monitored characteristics.307
4.2. The scheduling service308
The result of the aforementioned predictive analytics service, a forecasted failure time horizon, is309
consumed by a scheduling service [33]. The aim of the service is to prevent the predicted failure, by310
assigning the required maintenance activities to operators within the given time-frame. This service311
can be extended to consider the current production plan, hence fitting the maintenance activities within312
a given time slot to optimize production outputs.313
The scheduling service has been implemented in Java following a client-server model. The314
service inputs include the monitored equipment, RUL value, maintenance tasks, including precedence315
relations and default duration per operation experience, and a number of potential operators with their316
characteristics, such as experience level. The server side includes a multi-criterion decision making317
framework, evaluating the alternative scheduling configurations, ranking them and selecting the318
highest ranked one. The client side communicates with the server side via restful APIs, supporting319
editing of tasks, resources, equipment, time series visualization, and process plan Gantt visualization.320
The process time required to create a new schedule depends on the complexity of the schedule,321
referring to the number of tasks, resources, and their dependencies, along with the evaluated criteria.322
4.3. The visualization service323
The aim of the visualization service is to present the data, coming from the manufacturing324
process, in an effective and intuitive manner. It is an essential support for the maintenance engineers,325
giving them the opportunity to evaluate the status of the remote manufacturing process, and to the326
maintenance operator, who is responsible for performing the maintenance activity.327
The visualization service can be considered as a set of information pages that integrate data and328
plant information coming from the manufacturing process, with the results of the predictive analysis.329
Version March 11, 2020 submitted to Electronics 9 of 21
To effectively present real-time errors and predictions to the user in an intuitive way, the service330
uses a simplified, but realistic, 3D representation of the machinery. In addition, the user can display331
maintenance guidance and information pertaining to the mechanical and electrical components of the332
machine.333
A Unity application that provides a real-time animation of the RobotBox 3D model via a web334
browser has been designed and developed. The live animation of the 3D model is achieved through a335
real-time data feed from the RobotBox controller, and the predictive information and machine status336
is obtained from the other SERENA services. The status of the component of interest is highlighted337
on the 3D model, along with its RUL. The 3D Virtual Procedure is an animated step-by-step guide,338
which shows the correct procedure to perform the maintenance operation. The concept is to evolve339
the typical text-based maintenance manual into an interactive 3D maintenance visualization, and so340
collaboratively guide the operator through the maintenance procedure.341
The visualization service is also charged to show to the end user the information produced by the342
analytic services. The visualization service and the analytic services communicate through RESTFUL343
APIs, enabling also the monitoring of the performance for the predictive models used in the production344
environment. The monitoring of the predictive models performance, as explained in previous sections,345
is an important activity that should be always taken into account when dealing with real applications.346
The visualization service enable the access to the information related to this task simplifying the347
understanding by non-expert users and the reliability of the models in production environments.348
The information about the predictive model performance are showed in an interactive dashboard. The349
dashboard reports the precision and the recall information for the predictive model in production.350
Moreover, the dashboard shows an interactive section that exploits bar charts to describe the351
degradation of the model performance over time, for each learned class. The user can interact with the352
dashboard to see the details about the degradation at a specific timestamp. For the selected timestamp353
and for each learned class, the dashboard shows the silhouette lines produced by the self-assessment354
service though an interpolated scatter plot.355
More dashboard details are reported in Section 6-Self-assessment.356
4.4. Cloud management and communication services357
The cloud management and communication services provide the base functionality that358
implements the SERENA system.359
Docker orchestrator service. The orchestrator controls the deployment and manages the life cycle360
of all services in the SERENA cloud. It ensures that the services are running and will redeploy them if361
part of the underlying infrastructure fails. Critical services, such as the central message broker, are362
deployed as resilient and scalable service clusters. The SERENA system implements its own local363
image registry, where the service images are stored. Running a local image registry reduces the security364
risk of directly accessing an Internet Docker registry, and ensures that the required images are always365
available from a trusted source.366
As the edge gateways are part of the SERENA cloud, the orchestrator can automatically deploy367
services all the way to the edge. The analytics models are periodically retrained in the cloud, then368
wrapped as a Docker service and deployed to the gateways. The specific manufacturing process that369
the gateway supports is configured via the orchestrator using function description labels. Based on the370
labels the orchestrator will automatically deploy the correct services to the gateway.371
Edge gateway services. The edge gateways collect and analyze raw sensor data from the372
manufacturing process, and are fully integrated parts of the SERENA cloud. Typically, two types373
of service are deployed to the gateway: a data flow engine and a pre-trained data analytics or374
predictive analytics . The data flow engine, based on Node-RED [34], collects raw sensor data from the375
manufacturing process and passes it to the analytics service. The analytics service converts the raw376
time series data into features and encoding them into JSON-LD. JSON-LD is an extension of JSON that377
supports Linked Data, which allows data and metadata to be combined into a single context. The data378
Version March 11, 2020 submitted to Electronics 10 of 21
flow engine then forwards the JSON-LD to the central message broker for distribution to the other379
SERENA services. Regarding predictive analytics, the gateway represents a core component since380
it allows to fulfill different use case requirements and/or constraints. In case of low latency is not381
needed in prediction (as is required in case of mechanical reaction in the actual process) or analytics382
algorithms require a huge amount of historical data which does not fit in the gateway storage; we383
can both train and execute predictive algorithms in the cloud in order to better optimize hardware384
resources and not overload equipment close to production. Otherwise, if the application requires to385
have the results of analytics services with high-performance constraints, the SERENA architecture386
offers a mechanism to deploy to the local gateway a model, trained in the cloud, whose output could387
be used in near real-time applications to easily perform the prediction.388
Central message broker service. The central message broker service is the main communication389
hub of the SERENA system. To ensure its scalability and high availability, it is deployed as a resilient390
Docker cluster service. The broker exposes REST, MQTT and web service endpoints, which distribute391
messages between the cloud services and the edge gateways. In addition, the broker is intended to392
act as the access point for external facilities. Security is an important feature of the SERENA system,393
and the broker, as the communications hub, provides secure channels between the gateways and394
cloud services. It also validates the authenticity of incoming messages, and whether the requester is395
authorized to use the requested service.396
Repository services. The SERENA architecture provides a number of repositories which hold397
various types of data used by the system, including the features and raw data repositories as well398
as maintenance manuals, data artefacts and information required by other SERENA services, such399
as maintenance tasks for the scheduler. The raw data repository is used for training the predictive400
analytics models. In addition, a metadata repository is included for storing contextual information401
connecting distributed stored and collected data. The repositories are implemented as stateless Docker402
services, the state being stored in external virtual volumes. Implementing the repositories as stateless403
services, enables a plug-n-play and dynamic deployment like the other services in the system.404
5. Use case description405
Given the high throughput of automation lines and the great number of robots involved in406
production, a common issue is the robot belts, working continuously for days at high cycles per minute407
ratios, suffering tensioning problems. The belts are used as transmission for the robot axes, and setting408
the tension is crucial for obtaining high levels of performance and precision.409
In factory plants, preventive maintenance policies are used to avoid production stops. Hence, the410
acquisition and analysis of failure data for predictive analytics is a challenge. To overcome the lack of411
failure data, since it would have been too complicated and expensive to manipulate an entire robot, a412
test-bed (Figure 3) has been created, the RobotBox. The RobotBox is made of a motor equipped with413
an encoder (to read the actual motor axis position), a belt, a gearbox reducer, and a 5 kilos weight to414
simulate an actual robot application. All components have been taken from a six axis industrial robot.415
The decision to build the test-bed (RobotBox) instead of using an entire robot is not only due to the416
cost. Indeed, this simplification allows to better isolate the system from environmental factors such as417
weight of the links, inertia, vibration and so on. Thus, using the RobotBox allows to collect less noisy418
data and generalize the knowledge acquired to the entire robot axes.419
In order to create a high quality dataset, changing the belt tension levels in a reproducible manner420
was necessary. Hence a slider was installed allowing a continuous change of the tension and a421
centesimal indicator, to measure the belt tensioning with respect to a pre-configured position. The422
distance between the gearbox and the motor is proportional to the belt tensioning.423
In this work, only the current (in Amperes) used by the RobotBox motor is taken into account and424
used as input for the analytic component. The current is tracked by the RobotBox controller every 2425
milliseconds. Nevertheless, future works will consider fusing data of different sources.426
Version March 11, 2020 submitted to Electronics 11 of 21
Figure 3. RobotBox components.
The gateway collects data from the RobotBox and pre-processes the data in order to extract feature427
engineering for the analytics. The results of this processing step enriched by the result of already428
deployed analytics models, remaining useful life or otherwise, is transmitted afterwards to the cloud429
node as a JSON file. Currently and for the use case presented in this work, almost 1 GB of data are430
collected and transmitted per day, including raw data, extracted features and analytics results of431
already deployed at the gateway models. Nevertheless, depending on the application, it is could be432
sufficient to collect a smaller dataset with a greater time window.433
Regarding the computational power and the bandwidth, the most impacting process is the434
prediction of the status of the machine, which is performed in runtime and at the gateway level. The435
response time, hence the hardware setup requirements in terms of connectivity, data transmission and436
processing/storage power, for the use case under discussion is expected to be within some minutes.437
This is addressed by the aforementioned setup. More complex and time or resources consuming438
analyses including the training and testing of new predictive models with new datasets as collected439
through time, takes place at the cloud and when a new model is created this replaces the one deployed440
at the gateway.441
6. Experimental setting and results442
In this section the experimental setting and results related to the predictive analytics service are443
discussed in order to demonstrate its ability in correctly performing the prediction on real data. The444
analytics service, including both the model building phase and the real-time prediction, has been445
developed on the top of the big-data analytic framework Apache Spark [24], along with the scalable446
machine learning library MLLib [25]. The effectiveness of the proposed methodology is validated447
through a set of experiments, all performed on a Intel i7 8-cores processor workstation, with a 32GB448
main memory.
Figure 4. Samples of electricity consumption for different tensioning values.449
Version March 11, 2020 submitted to Electronics 12 of 21
Table 1. Tension level dataset.
Classes # of cycles Dataset %
0 5,952 35.30%1 1,591 9.44%2 1,937 11.70%3 3,174 18.82%4 4,172 24.74%
Total cycles 16,862 100%
Figure 5. Sample current signal collected from the RobotBox with the segments highlighted.
Using the slider installed on the RobotBox, 5 different classes of tensioning have been defined:450
classes 0 and 1 refer to low tensioning, class 2 to correct tensioning and classes 3 and 4 to high451
tensioning. Figure 4 shows an example of the current amplitudes values in Ampere for different level452
of tensioning. Table 1 shows the number of cycles in the dataset under analysis, divided by class.453
From the experimental results, the analytic service presented in section 4.1 turns out to be454
effective in predicting the class relative to the belt tensioning values given measurements of the current455
amplitude.456
Features computation. Each incoming motor current signal has been divided into 24 segments457
(Figure 5 shows the segments over a sample signal collected from the RobotBox use case), and each458
segment has been characterized by statistical features (e.g. mean, std, quartiles, Kurtosis, skewness),459
for a total of 350 features.460
Features selection. Then, through the correlation test, the number of features is reduced choosing461
a proper threshold. From this step both the performance and quality of the predictive model will462
be affected. Figure 6 shows the f-measure obtained without the feature selection step and with two463
different thresholds for the correlation test, 0.3 and 0.5. From the image it is clear that the threshold464
of 0.5 is outperforming the other two options with all the tested classifier while demonstrating the465
effectiveness of the proposed feature selection strategy.466
467
Predictive analytics. The service performs a grid search over three different algorithms available468
in the MLLib library: Decision Tree (DT), Gradient Boosted Tree (GBT), Random Forest (RF). Each469
tested configuration has been validated performing a 3 fold cross-validation.470
Table 2 shows the performance for each class of the tested classifiers after the grid-search with the471
stratified k-fold cross-validation. The grid-search has been performed to analyze an exhaustive range472
of algorithm configurations including the max-depth of the trees ranging in {10, 25, 50, 100, 250, 500}473
and the number of estimators ranging in {10, 25, 50, 100, 250, 500} (just in case of RF). Based on the474
grid-search results, the proposed approach automatically selects the Random Forest (RF) configured475
with max_depth:10 and n_estimators:250 showing better results for all the classes in term of precision,476
recall and f-measure. RF and GBT have similar performances despite class 4 where RF is performing477
slightly better. Table 3 shows the top 10 important features for the RF model along with their feature478
importance values calculated accordingly to [35]. Feature names are composed by the name of the479
statistics and the segment identifier on which they have been calculated with the following format:480
<name_of_the_statistics>-<segment_id>. From the feature importance it is possible to notice that the481
Version March 11, 2020 submitted to Electronics 13 of 21
Figure 6. Average f-measure for the different classifiers.
Table 2. Classification results, correlation threshold 0.5.
Classifier Class Precision Recall F-measure
RF 0 0.99 0.99 0.99RF 1 0.99 0.99 0.99RF 2 0.99 0.99 0.99RF 3 0.98 0.96 0.97RF 4 0.84 0.93 0.85
GBT 0 0.99 0.99 0.99GBT 1 0.99 0.99 0.99GBT 2 0.99 0.99 0.99GBT 3 0.98 0.96 0.97GBT 4 0.83 0.89 0.82
DT 0 0.99 0.99 0.99DT 1 0.98 0.96 0.97DT 2 0.99 0.95 0.97DT 3 0.97 0.86 0.90DT 4 0.70 0.92 0.72
most frequent statistical feature exploited by the RF model is the mean of the absolute change (i.e.,482
the mean of the numerical differences between each couple of consecutive values in the segment)483
calculated in the first part of the input signals (segments 1, 4, 5, 6 and 10 in Figure 5). Also segment 1484
(i.e., at the beginning of the cycle in the acceleration phase) appears three times and segment 10 (i.e.,485
the brake phase before idle phase) two times in the top ten features meaning that these segments are486
relevant for the prediction process. The best model is then saved on HDFS or deployed on the edge487
enabling the real-time prediction step to be available both on the cloud or on premises.488
In case of high performance constraints of the production environment, the model, trained on the489
cloud, can be deployed directly to the edge gateway. In this case the latency of the predictive analytics490
can be drastically reduced.491
In the actual real case scenario, the edge gateway maintains a queue of incoming current signals and492
runs the edge prediction service that performs both the features computation and the prediction for493
each new incoming signal. The current implementation of the prediction service on the edge is able to494
perform the above two tasks (i.e., feature computation and real-time prediction) for multiple incoming495
signals, parallelizing the tasks through a local deployment of the same spark service available in the496
cloud. However, to test the latency of the predictive analytic service deployed on the edge a dual core497
gateway with 4GB of ram has been exploited receiving a flow of 1000 RobotBox signals in sequence.498
To assure its performance a sequential scenario has been simulated in which signals are evaluated499
Version March 11, 2020 submitted to Electronics 14 of 21
Table 3. Feature importance for the top 10 features extracted from the best RF model trained by thepredictive analytics service.
Feature name Importanceskewness-1 0.048
mean_abs_change-6 0.048mean_abs_change-1 0.045mean_abs_change-5 0.044mean_abs_change-4 0.040
mean_abs_change-10 0.038kurtosis-3 0.030
mean_abs_change-7 0.029kurtosis-1 0.023
third_quartile-10 0.023
in sequence one by one with no delay between each other. The edge prediction service reached, in500
the sequential scenario, an average computation time of 2.73 seconds with a standard deviation of501
0.36. Considering that a robot cycle in the RobotBox use case lasts around 24 seconds, the performance502
reached by the edge prediction service are more than sufficient to have a real-time answer.503
Self-assessment. The self-assessment service provides information about the reliability of the504
model, built on the historical data, over time.505
To better evaluate the proposed approach, a model considering only the classes of data that506
usually are known by the domain experts has been generated. This is a likely assumption from the507
moment that usually, and especially in industrial use cases, for a given machinery, only some behaviors508
are known. Indeed, monitoring the malfunctioning of a machinery is a difficult task, since having a509
complete description of all the possible wrong behaviors is very complex: this motivation supports the510
role of this service in the SERENA cloud.511
To assess the effectiveness of this service a model has been trained on the tension level dataset512
exploiting only the elements of classes 2 and 4 since they are supposed to be the known classes at513
training time. Then, a flow of new incoming data has been simulated exploiting the remaining classes514
in the dataset i.e. classes 0, 1 and 3. The test dataset has been injected in the SERENA platform and the515
self-assessment service has been triggered at specific timestamps. The test trigger’s timestamps has516
been measured in number of incoming samples to keep the analysis time independent and process517
independent.518
For this reason, the self-assessment service has been triggered following the pattern in Table 4. The519
cardinality of the data used to train the predictive model is reported in column Training (t0), then, for520
each timestamp (tn) the number of record for each class label injected is reported. Row New data %521
shows the percentage of data that is new w.r.t. the data used to train the model, while row Drift %522
reports the percentage of drift included at each timestamp.523
From Table 4 it is possible to notice that from time t1 to time t5 only data from known classes have524
been injected, then, starting from time t6, class 0 is injected, followed by class 1 (from time t8) and then525
by class 3 (from time t9).526
The graphs in Figure 7 are extracted from the SERENA dashboard. They show the degradation of the527
model trained on classes 2 and 4 over time. Figures 7a and 7b show the percentage of degradation528
measured by the self-assessment cloud service. From the charts it is possible to notice that until time t5529
no degradation occurs, reflecting what expected from the injection pattern: only data with a known530
distribution has been collected by the system and the performance of the model remains unchanged.531
Then, at time t6 class 0 starts to be recorded by the system, however, this data has an unknown532
distribution to the model under analysis, since class 0 was not present in the training dataset. At time533
t6 the self-assessment service shows a clear degradation of model performance, correctly identifying534
the presence of the new unknown distribution of data. The degradation then, continue to increase535
until time t10 correctly identifying the increasing of the drift with the new incoming data.536
Version March 11, 2020 submitted to Electronics 15 of 21
In particular it is possible to notice from Figure 7a that label 2 is the class mostly affected by the537
presence of concept drift, while label 4 is slightly affected only when new data belonging to the538
unknown labels 1 and 3 are injected in the system.539
Thus, the self-assessment service allows to correctly detect per-class predictive model performance540
degradation over time.541
The interactive dashboard allows also to select a specific timestamp showing the details of the542
degradation w.r.t. the information learned at training time. It is possible to analyse the details of the543
silhouette curves calculated by the system at training time (silhouette base) and calculated after the544
arrival of new incoming data (silhouette degraded) comparing their trends in Figures 7c ,7d for what545
concern degradation at time t1 and Figures 7e ,7f for time t10.546
Comparing the curves in Figures 7c and 7d they show that no degradation has occurred at time t1 since547
the curves are almost overlapped. While, analysing the curves in Figures 7e and 7f is possible to notice548
that the most degraded class is 2, showing a highly degraded silhouette curve (dotted curve) w.r.t. the549
base silhouette calculated at training time (continuous curve). Class 4 is instead softly degraded since550
it was affected mainly by the new classes 1 and 3.551
In this case, a quantitative comparison of the proposed approach with state-of-the-art552
methodologies proposed to estimate model degradation over time was not performed, since to the553
best of the authors knowledge none of the available data analytics solutions in literature [36–38] is554
capable of self-evaluating a predictive model degradation over time when ground-truth class labels555
are not available. Thus, any comparison to such approaches would not be objective. Techniques like556
[36] are limited to detect abrupt concept drift due to context changes and they are not able to correctly557
deal with slowly drift as in the context of degrading flows of production data. In addition, strategies558
like [38] have been developed and tested in very domain specific use cases, while the generality of the559
approach proposed here has been proved in [29,39]. Furthermore, the proposed concept drift detection560
strategy has been proved to reach near real-time performance even in big-data environments [39] and561
to the best of authors knowledge no other solutions addressed this scenario.562
Remaining Useful Life (RUL) estimation. The current implementation of the RUL estimation563
features a tunable approach for estimating the level of degradation of a machinery. In the presented564
experiments, relevant features to be included in the estimation of the RUL have been defined by a565
domain expert and validated under a robotic industry use case.566
As described in Section 4.1 the motor current signals are splitted in 24 segments to characterize each567
phase of the robotic operation. A sample current signal along with its segments is depicted in Figure 5.568
When dealing with mechanical machinery, domain experts usually posses the knowledge to recognise569
the most arduous phases of robot activity that, in long term, are the causes of its degradation. In this570
specific use case the phases in which the motor is breaking down before the idle phase (segments571
10 and 11 in Figure 5) have been selected as the phases affecting most its remaining useful life.572
To summarize the current signals in segments 10 and 11 the mean of the current values has been573
calculated separately for each segment and each RobotBox cycle. Figures 8a and 8b show respectively574
the Gaussian kernel densities as estimated for the nominal/correct classes 2 and 4 and for the classes575
describing the machinery degradation i.e. class 0. However, in a real-life setting, only the correct576
functioning of the machinery is known as used in the proposed RUL estimation (i.e., the distribution577
of classes 2 and 4) while the distribution of signals when the machinery is degrading are not available578
(i.e., class 0). Nevertheless all of them are reported only to demonstrate that the Gaussian kernel579
densities represent a simple and easy to understand and interpret approach for summarizing the580
nominal profile and behavior of an asset versus profiles representing different evolving degradation581
stages. Given the distribution of the correct and known operational profiles (i.e. classes 2 and 4) the582
probability of a new robot cycle, in terms of current mean value in segments 10 and 11, to belong to583
the distribution of classes 2 and 4 is an indication of whether the robot cycle under analysis is still584
included in a correct range of values or not and its probability. In other words, if the probability of585
a feature (current mean in segment 10) belonging to the distribution of classes 2 or 4 is high it can586
Version March 11, 2020 submitted to Electronics 16 of 21
Table 4. Self-assessment service, data injection pattern.
Label Training (t0) t1 t2 t3 t4 t5 t6 t7 t8 t9 t100 - - - - - - 2000 4000 5952 5952 59521 - - - - - - - - 48 1591 15912 1500 250 473 473 473 473 473 473 473 473 4733 - - - - - - - - - 457 31744 3000 - 400 800 1172 1172 1172 1172 1172 1172 1172
Tot 4500 250 873 1273 1645 1645 3645 5645 7645 9645 12362New data% 5.26 16.25 22.05 26.77 26.77 44.75 55.64 62.95 68.19 73.31
Drift % 0.00 0.00 0.00 0.00 0.00 24.55 39.43 49.40 56.56 63.56
(a) Degradation for class 2 over time. (b) Degradation for class 4 over time.
(c) Comparisons of the silhouette curves base,computed at training time, and degraded,computed at time t1 for label 2.
(d) Comparisons of the silhouette curvesbase, computed at training time, and degraded,computed at time t1 for label 4.
(e) Comparisons of the silhouette curves base,computed at training time, and degraded,computed at time t10 for label 2.
(f) Comparisons of the silhouette curves base,computed at training time, and degraded,computed at time t1 for label 4.
Figure 7. SERENA cloud dashboard for the self-assessment service.
be interpreted that the RobotBox is still working as expected. Then if the probability decreases over587
time it means that the RobotBox is gradually moving to higher degradation levels, thus the probability588
Version March 11, 2020 submitted to Electronics 17 of 21
(a) Distribution of the mean of thecurrent at segment 10.
(b) Distribution of the mean of thecurrent at segment 11.
Figure 8. Distributions of the mean of the current signals for segments 10 and 11 for classes 2 and 4describing the correct functioning of the machinery and for class 0 describing the incorrect functioningof the machinery. The density has been computed exploiting the Gaussian kernel density estimation.
distribution of the current signals probably fits under class 0.589
590
To simulate a real scenario for the estimation of the RUL of the RobotBox, where only its correct591
functioning is known, the same data injection pattern described in Table 4 has been exploited from592
time t1 to time t7 using classes 2 and 4 to simulate the correct functioning of the machine and class593
0 to simulate the its incorrect functioning. In particular, in the context of SERENA, the Gaussian594
distribution of the current mean value in segments 10 and 11 is calculated separately by analysing all595
historical signals collected from the beginning of the RobotBox life to time t0. Then, for each timestamp596
tn the percentage of RUL is estimated via Equation 2 with new data from each cycle as shown in597
Table 4.598
Figure 9 shows the RUL value estimated at each timestamp from t1 to t7 along with their error bands.599
As expected, the trend of the remaining useful life is clearly descending over time. At the time t1 only600
data belonging to the correct functioning is collected showing an estimated RUL of almost 62% with601
an error ranging between 40% and 85%. As soon as more data belonging to the correct distribution are602
collected the RUL slowly decreases as a result of the RobotBox’s natural degradation. Then, starting603
from time t6, data belonging to the wrong (unknown) functioning of the machine are injected and the604
RUL value is decreased to 13.30% at time t6 and to 7.18% at time t7. These results were assessed with605
domain experts in the robotics industry, verifying that the high error in the RUL estimation in time t1606
is a result of the limited number of robot cycles analyzed.607
Scheduling. In the current experiment, a schedule triggered by a RUL estimation, as provided by608
the predictive analytics service, was generated in approximately 11 msec, and included the execution609
of two tasks; machine inspection and replacement of the gearbox, along with three potential resources;610
(1) a team of one newcomer and one of middle experience, (2) one newcomer and one expert and (3)611
one expert. The difference in task completion time as well as their cost are presented in the Table 5, per612
task.613
3D model visualization. This service is part of the SERENA cloud, and visualizes the 3D model of614
the RobotBox, along with its real time movement, prediction information and a 3D virtual maintenance615
procedure. Figure 10 shows a screenshot of the service interface that includes the real time position616
chart of the axis of rotation, which at runtime is mirrored by the movement of the yellow arm. The top617
bar provides links to: the ”Charts” panel, which presents visual charts of the analytics information618
coming from the prediction service; and the ”Maintenance” panel, which provides a step-by-step619
animated guide.620
Version March 11, 2020 submitted to Electronics 18 of 21
Figure 9. Remaining Useful Time percentage estimated by SERENA over time. The data injectionpattern refers to the one showed in Table 4 from time t1 to time t7.
Table 5. Information used by the scheduling service for the experiment.
Operator Task Time (min) Cost (Euros/min)Newcomer, Middle Machine inspection 20 0.25Newcomer, Expert Machine inspection 120 0.25
Expert Machine inspection 15 0.4Newcomer, Middle Replace gearbox 100 0.4Newcomer, Expert Replace gearbox 10 0.5
Expert Replace gearbox 80 0.5
Figure 10. Visualization service screenshot.
7. Conclusions and future applications621
This paper presents a lightweight architecture merging cloud based and edge deployed622
components towards a first end-to-end implementation of a predictive analytics platform, with respect623
to cyber-physical features and the vision of Industry 4.0. In this regard, the proposed architecture624
has been designed with respect to some well-established needs of industrial enterprises, such as625
compatibility with both the on-premise and the in-the-cloud environments, exploitation of reliable and626
largely supported big data platforms, high levels of horizontal scalability, easy deployment through627
containerized software modules.628
Version March 11, 2020 submitted to Electronics 19 of 21
In addition, a novel and generic methodology for RUL estimation towards enabling predictive629
maintenance activities in a cyber physical production system is introduced. Different operational630
profiles of the monitored asset(s) are created and compared to a nominal one, created at the very631
beginning of the assets setup and operation. It should be noted that the proposed approach supports632
the connection of the data collected for the RUL estimation to the existing maintenance plan, optimising633
the amount of data collected. For example, considering a conventional maintenance plan of two weeks,634
positive results out of the proposed approach can be demonstrated with a dataset of only two weeks.635
in this way, the proposed approach is capable of being applied to a versatile set of domains and assets,636
without requiring extensive and costly monitoring systems and numerous sensors.637
A prototype has been implemented and validated in an industrial use case concerning the638
predictive maintenance of a robotic manipulator. A set of services has been integrated to evaluate the639
proposed architecture and its potential. The results demonstrate that the integrated solution achieved640
to bridge the gap between machine data acquisition and cloud processing and enable the generation of641
predictive analytics and strategies. The use of containerization technologies poses additional effort642
for industrial applications, including workload balancing, automating deployments, auto-discovery643
of nodes, as well as the configuration effort for creating the cluster. As the proposed approach is not644
constrained to any specific set of technologies, it has the ability to evolve as the project progresses into645
the next phase.646
In conclusion, future work will focus on integrating additional functionalities to the overall647
architecture, such as data security features and increasing the robustness of the integrated solution.648
As it regards the data analytics, further investigation and research is required to effectively analyze649
different kinds of industrial processes (including those that are very slowly degrading over time) and650
being able to perform the prediction with different time horizons. Moreover, since the analytics part is651
a critical aspect for enabling predictive maintenance solutions, particular focus will be given on testing652
and improving the proposed approaches in the context of other real-life settings, on both on-premise653
and (public) cloud.654
Author Contributions: Data curation, Lucrezia Morabito and Chiara Napione; Methodology, Nikolakis Nikolaos,655
Tania Cerquitelli and Simone Panicucci; Software, David Bowden, Francesco Ventura, Stefano Proto, Paul Becker,656
Guido Coppo and Salvatore Andolina; Validation, Simone Panicucci, Francesco Ventura and Stefano Proto; Writing657
– original draft, Nikolakis Nikolaos and Tania Cerquitelli; Writing – review & editing Enrico Macii, Sotiris Makris,658
Niamh O’Mahony and Angelo Marguglio.659
Funding: The research leading to these results has been partially funded by European Commission under660
the H2020-IND-CE-2016-17 program, FOF-09-2017, Grant agreement no. 767561 “SERENA" project, VerSatilE661
plug-and-play platform enabling REmote predictive mainteNAnce.662
Conflicts of Interest: “The authors declare no conflict of interest.”663
References664
1. Apiletti, D.; Barberis, C.; Cerquitelli, T.; Macii, A.; Macii, E.; Poncino, M.; Ventura, F. iSTEP, an Integrated665
Self-Tuning Engine for Predictive Maintenance in Industry 4.0. IEEE International Conference on666
Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big667
Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications,668
ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018, Melbourne, Australia, December 11-13, 2018, 2018,669
pp. 924–931.670
2. Niño, M.; Blanco, J.M.; Illarramendi, A. Business understanding, challenges and issues of Big Data671
Analytics for the servitization of a capital equipment manufacturer. 2015 IEEE International Conference on672
Big Data (Big Data), 2015, pp. 1368–1377.673
3. Marques, G.; Pitarma, R.; M. Garcia, N.; Pombo, N. Internet of Things Architectures, Technologies,674
Applications, Challenges, and Future Directions for Enhanced Living Environments and Healthcare675
Systems: A Review. Electronics 2019, 8, 1081. doi:10.3390/electronics8101081.676
4. Mis̆kuf, M.; Zolotová, I. Comparison between multi-class classifiers and deep learning with focus on677
industry 4.0. 2016 Cybernetics Informatics (K I), 2016, pp. 1–5.678
Version March 11, 2020 submitted to Electronics 20 of 21
5. Apiletti, D.; Baralis, E.; Cerquitelli, T.; Garza, P.; Pulvirenti, F.; Venturini, L. Frequent itemsets mining for679
Big Data: a comparative analysis. Big Data Research 2017, 9, 67–83.680
6. Babiceanu, R.F.; Seker, R. Big Data and virtualization for manufacturing cyber-physical systems: A survey681
of the current status and future outlook. Computers in Industry 2016, 81, 128 – 137. Emerging {ICT} concepts682
for smart, safe and sustainable industrial systems.683
7. Huang, Z.; Zhong, A.; Li, G. On-Demand Processing for Remote Sensing Big Data Analysis. 2017684
IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017685
IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017, pp.686
1241–1245.687
8. Simonis, I. Container-based architecture to optimize the integration of microservices into cloud-based688
data-intensive application scenarios. Proceedings of the 12th European Conference on Software689
Architecture: Companion Proceedings. ACM, 2018, p. 34.690
9. Acquaviva, A.; Apiletti, D.; Attanasio, A.; Baralis, E.; Bottaccioli, L.; Cerquitelli, T.; Chiusano, S.; Macii, E.;691
Patti. Forecasting Heating Consumption in Buildings: A Scalable Full-Stack Distributed Engine. Electronics692
2019, 8, 491. doi:10.3390/electronics8050491.693
10. Jin, Y.; Lee, H. On-Demand Computation Offloading Architecture in Fog Networks. Electronics 2019,694
8, 1076. doi:10.3390/electronics8101076.695
11. Xiang, Y.; Zhu, Z.; Coit, D.W.; Feng, Q. Condition-based Maintenance Under Performance-based696
Contracting. Comput. Ind. Eng. 2017, 111, 391–402.697
12. Mehta, P.; Werner, A.; Mears, L. Condition Based Maintenance-systems Integration and Intelligence Using698
Bayesian Classification and Sensor Fusion. J. Intell. Manuf. 2015, 26, 331–346.699
13. Murphree, J. Machine learning anomaly detection in large systems. 2016 IEEE AUTOTESTCON, 2016, pp.700
1–9.701
14. Zhang, C.; Lim, P.; Qin, A.K.; Tan, K.C. Multiobjective Deep Belief Networks Ensemble for Remaining702
Useful Life Estimation in Prognostics. IEEE Transactions on Neural Networks and Learning Systems 2016,703
PP, 1–13.704
15. Asmai, S.A.; Basari, A.S.H.; Shibghatullah, A.S.; Ibrahim, N.K.; Hussin, B. Neural network prognostics705
model for industrial equipment maintenance. 2011 11th International Conference on Hybrid Intelligent706
Systems (HIS), 2011, pp. 635–640.707
16. Chen, Y.; Zhu, F.; Lee, J. Data quality evaluation and improvement for prognostic modeling using visual708
assessment based data partitioning method. Computers in Industry 2013, 64, 214 – 225.709
17. Bowden, D.; Marguglio, A.; Morabito, L.; Napione, C.; Panicucci, S.; Nikolakis, N.; Makris, S.; Coppo, G.;710
Andolina, S.; Macii, A.; Macii, E.; O’Mahony, N.; Becker, P.; Jung, S. A Cloud-to-edge Architecture for711
Predictive Analytics. Proceedings of the Workshops of the EDBT/ICDT 2019 Joint Conference, EDBT/ICDT712
2019, Lisbon, Portugal, March 26, 2019., 2019.713
18. JSON-LD. https://json-ld.org/. Accessed: 2019-11-15.714
19. MIMOSA. http://www.mimosa.org/specifications/osa-eai-3-2-3/. Accessed: 2019-11-15.715
20. Shvachko, K.; Kuang, H.; Radia, S.; Chansler, R. The Hadoop Distributed File System. Proceedings of the716
2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST); IEEE Computer Society:717
Washington, DC, USA, 2010; MSST ’10, pp. 1–10.718
21. Lakshman, A.; Malik, P. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev.719
2010, 44, 35–40.720
22. Proto, S.; Ventura, F.; Apiletti, D.; Cerquitelli, T.; Baralis, E.; Macii, E.; Macii, A. PREMISES, a Scalable721
Data-Driven Service to Predict Alarms in Slowly-Degrading Multi-Cycle Industrial Processes. 2019 IEEE722
International Congress on Big Data, BigData Congress 2019, Milan, Italy, July 8-13, 2019, 2019, pp. 139–143.723
23. Ross, S.M. Introduction to probability models; Academic press, 2014.724
24. Zaharia, M.; Xin, R.S.; Wendell, P.; Das, T.; Armbrust, M.; Dave, A.; Meng, X.; Rosen, J.; Venkataraman, S.;725
Franklin, M.J.; Ghodsi, A.; Gonzalez, J.; Shenker, S.; Stoica, I. Apache Spark: A Unified Engine for Big Data726
Processing. Commun. ACM 2016, 59, 56–65.727
25. Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D.; Freeman, J.; Tsai, D.; Amde, M.;728
Owen, S.; Xin, D.; Xin, R.; Franklin, M.J.; Zadeh, R.; Zaharia, M.; Talwalkar, A. MLlib: Machine Learning in729
Apache Spark. J. Mach. Learn. Res. 2016, 17, 1235–1241.730
26. Breiman, L. Classification and regression trees; Routledge, 2017.731
Version March 11, 2020 submitted to Electronics 21 of 21
27. Breiman, L. Random Forests. Machine Learning 2001, 45, 5–32. doi:10.1023/A:1010933404324.732
28. Friedman, J.H. Greedy function approximation: a gradient boosting machine. Annals of statistics 2001, pp.733
1189–1232.734
29. Ventura, F.; Proto, S.; Apiletti, D.; Cerquitelli, T.; Panicucci, S.; Baralis, E.; Macii, E.; Macii, A. A New735
Unsupervised Predictive-Model Self-Assessment Approach That SCALEs. 2019 IEEE International736
Congress on Big Data, BigData Congress 2019, Milan, Italy, July 8-13, 2019, 2019, pp. 144–148.737
30. Kim, S.; Kim, H. A new metric of absolute percentage error for intermittent demand forecasts. International738
Journal of Forecasting 2016, 32, 669–679.739
31. Kleinbaum, D.G.; Klein, M. Survival analysis; Vol. 3, Springer, 2010.740
32. Parzen, E. On Estimation of a Probability Density Function and Mode. Ann. Math. Statist. 1962,741
33, 1065–1076. doi:10.1214/aoms/1177704472.742
33. Nikolakis, N.; Papavasileiou, A.; Dimoulas, K.; Bourmpouchakis, K.; Makris, S. On a versatile scheduling743
concept of maintenance activities for increased availability of production resources. Procedia CIRP 2018,744
78, 172 – 177. 6th CIRP Global Web Conference – Envisaging the future manufacturing, design, technologies745
and systems in innovation era (CIRPe 2018).746
34. Node-RED. https://nodered.org/. Accessed: 2019-11-15.747
35. Hastie, T.; Tibshirani, R.; Friedman, J. The elements of statistical learning: data mining, inference, and prediction;748
Springer Science & Business Media, 2009.749
36. Vorburger, P.; Bernstein, A. Entropy-based Concept Shift Detection. Sixth International Conference on750
Data Mining (ICDM’06), 2006, pp. 1113–1118. doi:10.1109/ICDM.2006.66.751
37. Bifet, A.; Gavalda, R. Learning from time-changing data with adaptive windowing. Proceedings of the752
2007 SIAM international conference on data mining. SIAM, 2007, pp. 443–448.753
38. Klinkenberg, R.; Joachims, T. Detecting Concept Drift with Support Vector Machines. ICML, 2000, pp.754
487–494.755
39. Cerquitelli, T.; Proto, S.; Ventura, F.; Apiletti, D.; Baralis, E. Towards a Real-Time Unsupervised756
Estimation of Predictive Model Degradation. Proceedings of Real-Time Business Intelligence757
and Analytics; Association for Computing Machinery: New York, NY, USA, 2019; BIRTE 2019.758
doi:10.1145/3350489.3350494.759
c© 2020 by the authors. Submitted to Electronics for possible open access publication760
under the terms and conditions of the Creative Commons Attribution (CC BY) license761
(http://creativecommons.org/licenses/by/4.0/).762