Rongze Li1†, Zhengtian Chu1†, Wangkai Jin2†, Yaohua Wang3, Xiao Hu3∗
Abstract—Remaining Useful Life (RUL) is an essential factor in the Prognostics and Health Management (PHM) field. Reliable and accurate RUL estimation from condition-monitoring data can maximize system performance and reduce maintenance costs. Recently, with a surge of interest in deep learning (DL) and the rise of computational power, many state-of-the-art neural networks have been introduced in the PHM field. However, the previously proposed networks have drawbacks in handling sequential tasks. For example, the widely used Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network suffer from long-term dependency and gradient vanishing problems. In this paper, we adopt the Temporal Convolutional Network (TCN), which excels at sequential data processing and avoids the potential problems shared by the aforementioned models. We leverage TCN on the C-MAPSS dataset from NASA to examine its performance in RUL estimation. Our experimental results show that TCN outperforms all previously proposed neural networks for RUL estimation, which indicates the potential of TCN applications in the PHM field.
Index Terms—Alternative to RNN, Prognostics and Health Management, Remaining Useful Life, Temporal Convolutional Network.
I. INTRODUCTION
Prognostics and Health Management (PHM) is a discipline that mainly focuses on studying the failure mechanisms of a system. Applying PHM methodologies to manufacturing/industrial systems can unlock a system's full potential while guaranteeing its safety by spotting potentially faulty components at an early stage. By processing sensor data as input, PHM approaches can predict how long a system or component will keep working until failure. Examples of implementations in industry are [1] and [2]. The predicted time is referred to as the remaining useful life (RUL), which is essential in fault detection and maintenance decision-making.

1Rongze Li and Zhengtian Chu are with the University of Nottingham.
2Wangkai Jin is with the University of Nottingham, Ningbo, China.
3Yaohua Wang and Xiao Hu are with the National University of Defense Technology.
∗The corresponding author is Xiao Hu, Email: [email protected].
†Work was done during an internship at the National University of Defense Technology.
RUL has already become one of the standard criteria in industry, and many experts have been striving to enhance the accuracy of RUL prediction in diverse scenarios. There are mainly three types of approaches to RUL estimation: model-based prognostics, data-driven prognostics, and hybrid approaches [3]. Model-based prognostics emphasize the implementation of physical models, which can be built at several levels (e.g., micro or macro). They perform compellingly in scenarios where the degradation mechanism is known and failure thresholds can be defined. However, time and cost are two obstacles for researchers seeking to implement or reproduce these models. Data-driven prognostics, which use sensor data to simplify implementation and lower costs, are applied more often in sequence learning. They can be further categorized into statistical and classical Machine Learning (ML) approaches. With a surge of interest in Deep Learning (DL), researchers have applied many state-of-the-art deep neural networks in the PHM field. The general advantages of DL over ML in data-driven approaches are: 1) deeper network architectures contribute to more precise feature extraction; 2) DL performs better at processing temporal data; 3) DL has an advanced ability to handle large amounts of high-dimensional data. Therefore, data-driven approaches are strong at predicting system dysfunction from run-to-failure data. They have wider confidence intervals than model-based prognostics, which accounts for uncertainty in prediction error, degradation changes, human operations, etc. Hybrid approaches incorporate the advantages of both aforementioned prognostic approaches, and their practice is closer to real-world situations.
Recently, researchers have been exploring data-driven approaches, especially DL-based approaches, to enhance prediction accuracy. Deep learning methods perform well in processing high-volume, high-dimensional Time Series Data (TSD). This type of data requires sequence modeling, and many deep learning networks are capable of it. However, these networks still face various drawbacks caused by flawed architecture design. Networks like CNN [4], which is strong at feature extraction, perform poorly at keeping time coherence. Networks like RNN [5] and LSTM [6] were proposed to solve this problem, but they have other issues such as gradient vanishing and longer execution times.
To address these challenges, this paper applies the Temporal Convolutional Network (TCN) [7] to RUL estimation. The TCN model incorporates the strengths of causal convolution, residual connections, and dilated convolution, following a convolutional neural network (CNN) framework for sequence modeling. It excels at avoiding gradient vanishing/explosion problems, speeding up training, and changing the receptive field flexibly. Experimental results on the C-MAPSS dataset provided by NASA [8] show the exceptional performance of the proposed approach. A systematic study is performed to test TCN's effectiveness among DL-based approaches, and its results show that TCN achieves superior accuracy in fault prognostics in less time.
The rest of the paper is organized as follows. Section II introduces the key features and evaluation of the TCN model; Section III presents the experimental study and evaluation of different network architectures on C-MAPSS data; Section IV provides discussion and future work plans; Section V reviews related work; Section VI concludes the paper.
II. TEMPORAL CONVOLUTIONAL NETWORK FOR
RUL ESTIMATION
A. Brief Introduction of TCN
The convolutional neural network (CNN) is a classical neural network that excels at image processing thanks to its strong feature-extraction capability. At present, CNNs are widely used in many fields, such as face recognition, autonomous driving, and security. Nevertheless, there was no mature CNN model for timing problems until the advent of the temporal convolutional network (TCN) proposed by [7]. TCN has shown great ability in solving sequence problems and can serve as a better alternative to RNN/LSTM for them. The following sections illustrate its working principles and main advantages.
B. The principle of TCN
Generally speaking, TCN has two main characteristics. First, it maintains a causal relationship between the layers of the network, which means that the convolution output at step t is determined solely by the convolution results of steps before t. Thus, data coherence and time coherence are better protected than with the limited historical storage and possible data loss of LSTM's memory cell. Second, the architecture can be flexibly adjusted to any sequence length and can be mapped to the interfaces required by the output, similar to the RNN framework. Compared with the traditional CNN structure, TCN adds four core parts to the design: sequence modeling, causal convolutions, dilated convolutions, and residual connections. This section introduces the architecture and working principles through these four parts.
1) Sequence Modeling: A simple sequence modeling task illustrates the sequence modeling characteristics of TCN. Assume an input sequence i0, ..., iT is given, and the task is to predict the outputs O0, ..., OT at every step. The model should predict the corresponding output Ot at a particular time point t. The key constraint of sequence modeling is that the output at time t must be generated from only the recorded inputs up to time t, not from post-positional information, following the sequence of the data flow. The one-to-one mapping from it to Ot in a sequence modeling network can be simply expressed as:

O0, ..., OT = f(i0, ..., iT)    (1)

After the prediction, it is necessary to establish a corresponding evaluation mechanism to assess the quality of the prediction results and control the whole training procedure. The C-MAPSS dataset meets the demands of sequential tasks and is suitable for the TCN architecture due to its characteristics.
Fig. 1. An example of causal convolutions.
2) Causal Convolutions: From the sequence modeling discussion above, two principles of TCN can be summarized. First, the length of the model's output always remains the same as the input length. Second, the TCN stays blind to 'future' information and always depends on previous inputs to complete the prediction. To satisfy the first principle, the TCN utilizes the 1D fully-convolutional network (FCN) [10]. The core idea of the FCN is zero-padding, which guarantees that each output layer keeps the same length as the input layer throughout the propagation of the network. For the second principle, the TCN uses causal convolutions to prevent future information leakage: the current output yT is predicted from the previous inputs x0, ..., xT and the previous layers' outputs y0, ..., yT−1 so that yT approaches the actual value.
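As a minimal illustration (a sketch of ours, not the paper's code), the two principles can be written in a few lines of NumPy: left zero-padding preserves the input length, and each output depends only on current and past inputs.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1D convolution: left-pad with k-1 zeros so the output
    has the same length as the input, and output[t] depends only on
    x[0..t], never on 'future' values."""
    k = len(w)
    padded = np.concatenate([np.zeros(k - 1), x])
    # y[t] = sum_i w[i] * x[t - i]
    return np.array([sum(w[i] * padded[(k - 1) + t - i] for i in range(k))
                     for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.5])   # kernel size k = 2: averages current and previous step
y = causal_conv1d(x, w)    # [0.5, 1.5, 2.5, 3.5]
```

In a deep learning framework this corresponds to a 1D convolution with causal padding; the sketch above just makes the padding and dependency structure explicit.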
Fig. 2. An example of dilated convolutions.
3) Dilated Convolutions: Although the causal convolutional structure above prevents future information leakage, it increases the number of layers in the network while having to keep extremely long historical sequences. As Figure 1 shows, the marked output in the upper right corresponds to a receptive field of five (the 5 black balls in the input sequence), obtained through five layers. The size of the receptive field thus has a positive linear correlation with the depth of the network, which may burden the learning process. To simplify the network and relieve memory pressure, TCN applies dilated convolutions [11], forming an exponential relationship between the size of the receptive field and the number of layers [12]. The following equation demonstrates the principle:
F(s) = (x ∗d f)(s) = Σ_{i=0}^{k−1} f(i) · x_{s−d·i}    (2)
where d is the dilation factor, k is the filter size, and the index s − d · i means convolving only over former states. x is the input sequence and f : {0, ..., k−1} → R is the filter. The operation F convolves the inputs with a fixed step of d between every two adjacent filter taps. Figure 2 shows dilated convolutions with d = 1, 2, 4, respectively; as d grows, the network becomes dilated and skips intermediate historical data. Therefore, this method can keep a large receptive field with fewer layers and simplify learning tasks.
4) Residual Connections: The shortcut path in ResNet [16] enables a model to learn the difference information; it effectively lets the network modify an identity mapping, avoiding gradient vanishing and gradient exploding problems in deep models. For TCN, if the model needs to record a large amount of historical information, the final receptive field can be vast and the network can become extremely deep. Hence, TCN adopts residual connections to keep such depth trainable. Each residual block consists of two convolutional layers with ReLU [17] activations and normalization; weight normalization is adopted as the normalization operation. In addition, spatial dropout [18] is added after the activation function. An illustration of the detailed residual block construction is in Figure 3.
Fig. 3. The profile of one residual block in TCN.
Figure 4 shows a sample residual block of TCN with kernel size 3 and dilation factor 1.
Fig. 4. A sample residual block of TCN with kernel size 3 and dilation factor 1.
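Stripped of weight normalization and dropout, the residual computation described above can be sketched as follows (a simplified illustration of ours; the paper's block also contains normalization and spatial dropout):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, layer_fn):
    """Two transformed layers plus an identity skip connection:
    the stack only needs to learn the residual, and the `+ x`
    term gives gradients a direct path through deep networks."""
    out = relu(layer_fn(x))
    out = relu(layer_fn(out))
    return out + x   # a 1x1 convolution would reconcile shapes if they differed

# With a zero transformation the block reduces to the identity map,
# which is exactly what keeps very deep stacks trainable.
x = np.array([1.0, -2.0, 3.0])
y = residual_block(x, lambda v: np.zeros_like(v))   # equals x
```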
C. Advantages and Disadvantages
This section summarizes the strengths and weaknesses of TCN.
1) Advantages:
(1) CNN can conduct convolution operations in parallel. Therefore, TCN can process long sequences efficiently while preserving long-term memory in both training and validation.
(2) Stable gradients. TCN has a backpropagation path different from the sequence's time direction, which avoids the gradient exploding and vanishing problems that deep RNNs suffer from.
(3) TCN can possess a sizeable receptive field even with shallow layers. Therefore, TCN is more flexible in the model's memory size, and it is easy to migrate to other fields.
(4) TCN can accept input sequences of any length by sliding its one-dimensional convolutional kernels. Therefore, it can be flexibly utilized on distinct tasks.
2) Disadvantages:
(1) To maintain long-term memory and generate predictions, TCN needs to occupy more memory during the testing phase.
(2) When TCN migrates to a different field, the required history length and receptive field will differ. Hence, migration may weaken the expressive power of the TCN model.
III. EXPERIMENTS
A. Dataset Description
The NASA C-MAPSS dataset selected in this experiment is widely used in research on remaining useful life prediction. It has four sub-datasets; each sub-dataset contains a different number of turbofan engines operating under several conditions and fault modes. Each sub-dataset is also divided into a training set and a test set of multiple multivariate time series. Each row describes a snapshot of one turbofan engine within a life cycle. The first column is the engine ID; the second is the current operating cycle; columns 3-5 are three operational settings; and columns 6-26 record the 21 sensor measurements [13].
B. Data Preprocessing
1) Feature Selection: There are 21 sensor measurements and three setting values to choose from. However, some of these values show no apparent fluctuation and contribute little to the experiment. To reduce the complexity of the experiment and shorten training time, this work discards them. Accordingly, the following features are dropped in FD001: the three setting values and sensors 1, 5, 10, 16, 18, and 19. The remaining number of features is 15.
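The paper selects the discarded features by hand; as an illustrative automated variant (our assumption, not the authors' procedure), near-constant features can be filtered by variance:

```python
import numpy as np

def drop_flat_features(X, names, threshold=1e-8):
    """Keep only columns whose variance exceeds the threshold;
    near-constant sensors carry no degradation information."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], [n for n, k in zip(names, keep) if k]

# Toy example: sensor "s2" never fluctuates, so it is dropped.
X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])
X_kept, names_kept = drop_flat_features(X, ["s1", "s2"])   # keeps only "s1"
```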
2) Data Normalization: This paper normalizes all selected features through standard-score normalization. Let µi be the mean of the i-th feature in the corresponding dataset, σi the corresponding standard deviation, xi the data to be normalized, and x′i the normalized value:

x′i = (xi − µi) / σi    (3)
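Applied column-wise, Eq. (3) is a one-liner; a minimal sketch:

```python
import numpy as np

def z_score(X):
    """Standard-score normalization per feature (Eq. (3)):
    x' = (x - mu) / sigma, computed column-wise."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Xn = z_score(X)   # each column now has zero mean and unit variance
```

In practice µi and σi would be computed on the training set and reused on the test set to avoid leakage (a standard caveat, not stated in the paper).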
Fig. 5. Piece-wise RUL of C-MAPSS data set (Maximum limit RUL is
125 time cycles)
3) Training Label Calculation: If RUL is calculated directly from the current cycle and the total cycle count, a linear relationship is obtained. This can be inaccurate: when an engine has just started working, performance degradation is negligible or absent, and only after the engine has run for some time does a relatively apparent downward trend appear. Therefore, a linear relationship overestimates the RUL at the beginning of engine use. This paper chose a piece-wise linear function [14], [15] to compute the training labels, as Figure 5 shows. The maximum RUL was capped at 125 time cycles in this experiment.
Fig. 6. Sample sensor data (sensors 2, 3, 4, 12) in the four training datasets: (a) FD001, (b) FD002, (c) FD003, (d) FD004.
4) Data Visualization: Building an efficient and practical model requires discovering potential patterns in the data. Figure 6 shows how the values of one example sensor from each of the four datasets fluctuate over the cycle time. Note that the value fluctuation of these sensors in FD001 and FD003 is much smaller than in FD002 and FD004. In FD002 and FD004, the operation setting values can be clustered into 6 clusters [14], producing six different working conditions. While the engine is working, the operating condition continuously switches and fault modes may occur, which leads to significant fluctuations in the data and makes prediction more difficult. This also implies that a traditional CNN cannot meet the requirements well when modeling the C-MAPSS dataset; a sequence model is needed to handle operating-condition switching and fault modes.
C. Experimental Network Architecture
TCN has shown high performance in sequence modeling tasks in diverse fields. To test the advantages introduced in Section II and verify its feasibility for RUL estimation, we apply TCN to the C-MAPSS dataset and compare its performance with other innovative network models.
• Long Short-Term Memory. LSTM is a representative and effective member of the RNN family, a group of networks famous for their strength in sequence modeling tasks. By introducing a memory cell to store historical data, LSTM can remember sequence information longer than a plain RNN, which mitigates the gradient vanishing and explosion problems to some extent.
• One-Dimensional Convolutional Neural Network. 1DCNN is a typical CNN architecture widely applied in signal verification and natural language processing. Its critical advantage is that it works well for time series analysis, because its one-dimensional kernel can fully extract the features of the input sequence by scanning it thoroughly from start to end.
• Deep Convolutional Neural Network. DCNN is also a convolutional neural network model that is good at feature extraction. It was first introduced to fault diagnosis and prognosis by [14]. Based on that pioneering work, a modified DCNN is used to compare performance with TCN.
TABLE I
CONSTRUCTION DETAILS OF THE TCN LAYER

Parameter             Value
Number of filters     128
Padding               'causal'
Dropout rate          0.5
Batch normalization   True
Layer normalization   True
2) Network Architecture Settings: Detailed network settings are introduced below: 1) The TCN layer is described in Table I. The whole TCN structure utilizes three stacks of residual blocks, with the dilation factors set to 1, 2, 4, 8, 16, and 32 in each residual block. Additionally, dropout, batch normalization, and layer normalization were added to improve the generalization ability of the TCN model. The other three networks are shown in Figure 7. 2) The LSTM model has five hidden layers, including three LSTM layers and two fully connected layers, followed by a one-dimensional output layer. 3) The 1DCNN network consists of five one-dimensional convolutional layers, three one-dimensional pooling layers, one flatten layer, and two fully connected layers; normalization layers such as batch normalization and dropout were also added but are not shown in Figure 7. 4) The DCNN architecture includes six convolutional layers and three pooling layers, with the flatten layer and fully connected layers at the end of the network. All of these settings were obtained through repeated experiments yielding high-quality results. Each model was trained with the same learning rate (0.001), batch size (512), and number of epochs (200).
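A quick sanity check on these settings is the receptive field the stacked blocks produce. With two causal convolutions per residual block, the field is 1 + 2·(k−1)·Σd per stack; assuming kernel size 3 (taken from the sample block in Figure 4, not stated explicitly for the full model), the three stacks with dilations 1...32 cover well over 700 time steps:

```python
def tcn_receptive_field(kernel_size, dilations, stacks=1):
    """Receptive field of a TCN whose residual blocks each contain
    two causal convolutions: every convolution with dilation d adds
    (kernel_size - 1) * d steps of history, plus the current step."""
    return 1 + 2 * (kernel_size - 1) * sum(dilations) * stacks

# Settings from Section III-C (kernel size 3 is an assumption):
rf = tcn_receptive_field(3, [1, 2, 4, 8, 16, 32], stacks=3)   # 757
```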
(a) DCNN (b) LSTM (c) 1DCNN
Fig. 7. Brief illustrations of the DCNN, LSTM, and 1DCNN network architectures.
D. Evaluation of Experimental Results
1) Evaluation Methodology: This paper chose the Root Mean Square Error (RMSE) for evaluating RUL estimation:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} h_i² )    (4)

RMSE measures the deviation between the estimated and real values and is often used as the standard metric for such models. In training, this work selected Mean Square Error (MSE) as the loss function. However, since MSE values often reach thousands or even tens of thousands, they are difficult to interpret. RMSE is the square root of MSE, which describes model performance better without affecting the results.
Moreover, a score function provided by [15] is also used as an evaluation metric for RUL estimation. Let n be the total number of samples in a test set, and h_t = RÛL_t − RUL_t, i.e., the estimated RUL minus the actual RUL. The score is

S = Σ_{t=1}^{n} (e^{−h_t/13} − 1), when h_t < 0
S = Σ_{t=1}^{n} (e^{h_t/10} − 1),  when h_t ≥ 0    (5)

For RMSE, the penalty curve is symmetric whether the model overestimates or underestimates RUL. Unlike RMSE, when the model overestimates RUL, that is, h_t ≥ 0, the penalty in (5) rises faster than when h_t < 0. If the RUL is overestimated, the engine will remain running after the end of its safe work cycle, which may cause system failure and lead to more serious consequences. Therefore, a greater penalty is imposed in this case.
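Both metrics take only a few lines of NumPy (a sketch of Eqs. (4) and (5); the asymmetric divisors 10 and 13 follow the standard C-MAPSS score of [15]):

```python
import numpy as np

def rmse(h):
    """Eq. (4): root mean square of the errors h_t."""
    h = np.asarray(h, dtype=float)
    return float(np.sqrt(np.mean(np.square(h))))

def phm_score(h):
    """Eq. (5): asymmetric exponential penalty. Overestimates
    (h >= 0) use divisor 10 and grow faster than underestimates
    (h < 0), which use divisor 13."""
    h = np.asarray(h, dtype=float)
    return float(np.sum(np.where(h >= 0,
                                 np.exp(h / 10.0) - 1.0,
                                 np.exp(-h / 13.0) - 1.0)))

h = np.array([-10.0, 0.0, 10.0])
# RMSE treats the +/-10 errors identically; the score penalizes the
# overestimate (+10) more heavily than the underestimate (-10).
```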
Fig. 8. Training loss curve of TCN in FD001
2) Experimental Results and Evaluation: Figure 8 displays the training loss curve. TCN reduces the loss sharply in the first 40 epochs, and the curve flattens after about 70 epochs. According to the metrics introduced above, a
TABLE II
EXPERIMENTAL RESULTS FOR DIFFERENT LEARNING METHODS

         FD001            FD002             FD003             FD004
Model    RMSE   Score     RMSE   Score      RMSE   Score      RMSE   Score
TCN      11.58  195.1     14.67  1020       12.67  228.2      17.00  1810
LSTM     12.52  291.7     21.42  3897       13.54  347.3      24.21  5806
1DCNN    13.56  326.1     21.01  3800       13.75  378.2      22.72  4178
DCNN     14.41  357.4     23.74  4050       14.00  484.1      24.23  5293
DBN[9]   15.21  417.59    27.12  9031.64    14.71  442.43     29.88  7954.51
MLP[9]   16.78  560.59    28.78  14026.72   18.47  479.85     30.96  10444.35
SVM[9]   40.72  7703.33   52.99  316483.31  46.32  22541.58   59.96  141122.19
(a) Test unit 21 (b) Test unit 24
(c) Test unit 34 (d) Test unit 81
Fig. 9. Comparison between four engines life-time RUL prediction
results and the actual RUL.
network with a lower score performs better in RUL estimation. In this experiment, TCN obtains the lowest score among all seven deep learning models. Table II displays the final RMSE values and scores. It is worth noting that TCN improves the scores on the four datasets by about 33%, 73%, 34%, and 68% compared with the second-best model, LSTM. For insight into TCN's prognostics, Figure 9 compares the predicted and actual RUL for four test units, numbered 21, 24, 34, and 81. Overall, the predictions on these four test units fit the real values well, despite slight fluctuations over roughly 50 cycles of each unit's time span. From these four plots, it is clear that the predicted RUL is close to the actual values, and the estimates almost form a linear degradation following the actual curves.
Moreover, the score improvement varies greatly across datasets. The scores on FD002 and FD004 improve dramatically compared to those on FD001 and FD003. The likely reason is that FD002 and FD004 contain more densely fluctuating data, which TCN processes better than LSTM does. Nevertheless, the overall scores on FD002 and FD004 are still much higher than those on FD001 and FD003, which might be because the larger and more complex data in FD002 and FD004 make training more difficult. The following section details the comparisons between these networks.
3) Comparison Between Networks: In this section, the experimental results are evaluated and analyzed from the following perspectives.
• Comparison between 1DCNN and DCNN. As Table II shows, the RMSE results of 1DCNN are about 6%, 11%, 2%, and 6% lower on the four sub-datasets than those of DCNN. Simultaneously, the scores improve by about 9%, 6%, 22%, and 21%, respectively. 1DCNN slightly improves on DCNN, which indicates that fully sliding one-dimensional kernels can learn sequential features to some extent. Hence, 1DCNN can also be a choice for simple sequence modeling tasks.
• Comparison between TCN and 1DCNN. Although 1DCNN has shown its strength in processing sequential data, Table II shows that further improvement is possible. As TCN applies 1D convolutions in its architecture, the comparison between TCN and 1DCNN indicates the effectiveness of dilated convolutions and residual connections.
• Comparison between TCN and RNN. This comparison is made according to each network's experimental results and principles.
It is worth mentioning that this comparative trial was designed to verify whether TCN can replace RNN in the PHM field. According to Table II, the RMSE results of TCN are about 8%, 32%, 6%, and 30% lower on the four sub-datasets than those of LSTM. Simultaneously, the scores improve by around 33%, 73%, 34%, and 68%, respectively. The improvement is dramatic for FD002 and FD004 but less apparent for FD001 and FD003. This is again caused by the differences in data volatility in each dataset: FD002 and FD004 are harder sequence modeling tasks. The results show that TCN has more advantages in processing sequence modeling tasks. The reasons for RNN's failure are explained below. RNNs have two main problems: unstable gradients and non-parallelism. To address the first, TCN applies residual connections, which pass information through useful blocks and skip useless ones, allowing deeper networks. An experiment was designed to compare gradient stability in the TCN and RNN models; during testing, fluctuations in the test loss reflected changes in stability. Based on the FD002 training task, network depth was increased by adding residual blocks to the TCN model or LSTM layers to the LSTM model. Figure 10 shows the negative impact of the unstable gradient. For LSTM, the test loss kept decreasing up to three LSTM layers, but gradient vanishing occurred at five layers and the trained model's accuracy declined. For TCN, in contrast, the training results grew more accurate as the number of blocks increased, which means the residual connections effectively stabilize gradients. Second, non-parallelizability dramatically slows the training of the LSTM network and consumes excessive computing resources; this limitation also reduces the expressive ability of the LSTM model. TCN, by contrast, can perform parallel computing and keep memory consumption low during training thanks to causal convolution. Overall, the experiment provides new insight into the application of TCN and shows that TCN is promising to break the monopoly of RNN, and even replace it, in more research and industrial fields.
Fig. 10. The change of test loss with increased residual blocks for
TCN and LSTM layers for LSTM based on FD002 data set.
IV. DISCUSSION AND FUTURE WORK
This work demonstrates significant advantages of TCN in remaining useful life estimation. Compared with the performance of other network architectures such as RNN, CNN, and LSTM when training on C-MAPSS, TCN has lower training loss and a faster loss convergence rate. Such advantages should be exploited more widely in the PHM field and extended to other fields.
A. Extension on the Application of TCN on PHM
In addition to fault detection for mechanical components such as engines, TCN has broad application prospects in other PHM fields. For example, in some railway-transit signal systems, train delays caused by failures of trackside equipment and the signal system occur frequently [19]. To detect faults in time, a TCN model can be trained on the various kinds of data recorded before a fault occurs. The reference data include temperature, humidity, and other external environmental conditions, a series of parameters of parts on the track, and signal fluctuations of the system. Exploring the potential relationships among these data may improve fault-detection accuracy. Moreover, TCN can also be used to predict the health of intelligent machines.
B. TCN in Other Fields
Because TCN handles sequence modeling problems, it can be applied to speech processing, language modeling, and time series prediction. [7] showed that TCN is more accurate and faster on many language and music modeling tasks, such as the text8 dataset [20] and the JSB Chorales music dataset [21]. At the same time, TCN, as an innovation on the CNN model, can also achieve high-quality results in computer vision. For example, in sign language translation, there is no explicit mapping between sign language actions and text words in sentence meaning. A TCN model can not only capture the actions in hierarchical views but also help learn correlations between adjacent features, reducing difficulty and improving accuracy [22]. Future experiments will build on this model and continue improving it for compatibility with work in other fields.
V. RELATED WORK
In addition to the work mentioned above, some related work in the RUL estimation realm is introduced here. The combined use of an enhanced deep LSTM and Gaussian Mixture Models (GMMs) was introduced by M. Sayah [23] in 2021. A novel prediction architecture [24], also taking LSTM as a basic component, was proposed by R. Guo in 2021; their prediction mode is based on empirical mode decomposition (EMD) and LSTM. Similarly, LSTM was combined with a genetic algorithm to predict remaining useful life by Yang, K. [25]; this integration was named GAPLS-LSTM. Consequently, many researchers have used LSTM as the essential part of their prediction architectures, with the multi-algorithm combination being what differentiates their work. Additionally, some works utilized algorithms other than LSTM. For instance, DCNN was combined with Bayesian optimization and adaptive batch normalization (AdaBN) by J. Li [26], and a remaining-life prediction method based on fuzzy evaluation-Gaussian process regression (FE-GPR) was attempted by W. Kang [27]. In this way, many kinds of methods have been applied to the remaining useful life prediction problem.
VI. CONCLUSION
This work adopted the temporal convolutional network (TCN) to predict turbofans' RUL based on the C-MAPSS dataset. As described, the TCN adds four core parts to the design: sequence modeling, causal convolutions, dilated convolutions, and residual connections. It is good at learning sequential features of data and keeping long historical information. This paper selected three other networks (LSTM, 1DCNN, and DCNN) to train on the same input data in order to verify the effectiveness of the TCN. The experimental results showed that the TCN model is more accurate than the other compared network models, which indicates that migrating the TCN to the PHM field can have a positive impact. Moreover, the analysis of the results shows TCN's potential to replace the RNN in sequence modeling tasks thanks to its stable gradients and parallel computing capability. The current research and experiments preliminarily illustrate the significant advantages of TCN in remaining useful life estimation.
VII. ACKNOWLEDGMENT
We thank all members of the User-Centric Computing Group and the reviewers of ICPHM2021 for their valuable suggestions and feedback. This research is supported by The Science and Technology Planning Project of Hunan Province (2019RS2027).
REFERENCES
[1] K. Jamali, Achieving reasonable conservatism in nuclear safety
analyses, Reliab. Eng. Syst. Saf. 137 (2015) 112–119.
[2] J. Park, W. Jung, A systematic framework to investigate the
coverage of abnormal operating procedures in nuclear power plants,
Reliab. Eng. Syst. Saf. 138 (2015) 21–30.
[3] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, D. Siegel,
Prognostics and health management design for rotary machinery
systems—Reviews, methodology and applications, Mech. Syst. Signal
Process. 42 (2014) 314–334.
[4] G. S. Babu, P. Zhao, and X.-L. Li, “Deep convolutional neural
network based regression approach for estimation of remaining
useful life,” in International Conference on Database Systems for
Advanced Applications. Springer, 2016, pp. 214–228.
[5] I. Sutskever, J. Martens, and G. E. Hinton, “Generating text
with recurrent neural networks,” in Proc. 28th Int. Conf. Mach.
Learn., 2011, pp. 1017–1024.
[6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,”
Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[7] S. Bai, J. Z. Kolter, and V. Koltun, "An empirical evaluation of
generic convolutional and recurrent networks for sequence modeling,"
arXiv:1803.01271 [cs.LG].
[8] E. Ramasso, A. Saxena, Review and analysis of algorithmic
approaches developed for prognostics on CMAPSS dataset, in:
Conference of the Prognostics and Health Management Society,
2015.
[9] C. Zhang, P. Lim, A.K. Qin, K.C. Tan, Multiobjective deep
belief networks ensemble for remaining useful life estimation in
prognostics, IEEE Trans. Neural Netw. Learn. Syst. 28 (2017)
2306–2318.
[10] Shelhamer E, Long J, Darrell T. Fully Convolutional Networks
for Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell.
2017;39(4):640-651. doi:10.1109/TPAMI.2016.2572683
[11] Oord, Aaron van den, et al. Wavenet: A generative model for
raw audio. arXiv preprint arXiv:1609.03499 (2016).
[12] F. Yu and V. Koltun, "Multi-scale context aggregation by
dilated convolutions," arXiv:1511.07122 [cs.CV].
[13] E. Ramasso and A. Saxena, "Performance benchmarking and
analysis of prognostic methods for CMAPSS datasets," International
Journal of Prognostics and Health Management, vol. 5, no. 2,
pp. 1–15, 2014.
[14] G. S. Babu, P. Zhao, and X.-L. Li, “Deep convolutional neural
network based regression approach for estimation of remaining
useful life,” in International Conference on Database Systems for
Advanced Applications. Springer, 2016, pp. 214–228.
[15] F. O. Heimes, “Recurrent neural networks for remaining useful
life estimation,” in Prognostics and Health Management, 2008. PHM
2008. International Conference on. IEEE, 2008, pp. 1–6.
[16] K. He, X. Zhang, S. Ren and J. Sun, ”Deep residual learning
for image recognition,” 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770- 778, doi:
10.1109/CVPR.2016.90.
[17] Nair, Vinod and Hinton, Geoffrey E. Rectified linear units
improve restricted Boltzmann machines. In ICML, 2010.
[18] Srivastava, Nitish, Hinton, Geoffrey E, Krizhevsky, Alex,
Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way
to prevent neural networks from overfitting. JMLR, 15(1),
2014.
[19] Oyebande, B.O.; Renfrew, A.C.: 'Condition monitoring of
railway electric point machines', IEE Proceedings - Electric Power
Applications, 2002, 149, (6), p. 465-473, DOI:
10.1049/ip-epa:20020499
[20] Mikolov T, Sutskever I, Deoras A, et al. Subword language
modeling with neural networks[J]. preprint
(http://www.fit.vutbr.cz/imikolov/rnnlm/char.pdf), 2012, 8: 67.
[21] Allan, Moray, and Christopher Williams. "Harmonising chorales
by probabilistic inference." Advances in neural information
processing systems. 2005.
[22] Guo D, Wang S, Tian Q, et al. Dense Temporal Convolution
Network for Sign Language Translation[C]//IJCAI. 2019: 744-750.
[23] Sayah, M., Guebli, D., Noureddine, Z. et al. Deep LSTM
Enhancement for RUL Prediction Using Gaussian Mixture Models.
Aut. Control Comp. Sci. 55, 15–25 (2021).
https://doi.org/10.3103/S0146411621010089
[24] R. Guo, Y. Wang, H. Zhang and G. Zhang, "Remaining Useful Life
Prediction for Rolling Bearings Using EMD-RISI-LSTM," in IEEE
Transactions on Instrumentation and Measurement, vol. 70, pp.
1-12, 2021, Art no. 3509812, doi: 10.1109/TIM.2021.3051717.
[25] Yang, K, Wang, Y, Yao, Y-n, Fan, S-d. Remaining useful life
prediction via long-short time memory neural network with novel
partial least squares and genetic algorithm. Qual Reliab Eng Int.
2021; 37: 1080– 1098. https://doi.org/10.1002/qre.2782
[26] J. Li and D. He, "A Bayesian Optimization AdaBN-DCNN Method
With Self-Optimized Structure and Hyperparameters for Domain
Adaptation Remaining Useful Life Prediction," in IEEE Access, vol.
8, pp. 41482-41501, 2020, doi: 10.1109/ACCESS.2020.2976595.