DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020
Simulating Fetal ECG Using Machine Learning on Ultrasound Images
MATHILDA VILLOT BERLING
JULIA ÖNERUD
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES IN CHEMISTRY, BIOTECHNOLOGY AND HEALTH
This project was performed in collaboration with the Center for Fetal Medicine, Department of Obstetrics and Gynecology, Karolinska University Hospital.
Supervisors: Jonas Johnson and Lotta Herling
Simulating Fetal ECG Using Machine Learning on Ultrasound Images
Simulering av foster-EKG genom maskininlärning på ultraljudsbilder
MATHILDA VILLOT BERLING, JULIA ÖNERUD
Degree project in medical engineering, first level, 15 hp
Supervisors at KTH: Tobias Nyberg, Mattias Mårtensson
Examiner: Mats Nilsson
KTH Royal Institute of Technology School of Engineering Sciences in Chemistry, Biotechnology and Health
SE-141 86 Flemingsberg, Sweden http://www.kth.se/cbh
2020
Abstract
ECG is used clinically to detect a multitude of medical conditions, such as heart problems like arrhythmias and heart failure, and to give a good general image of the function of the heart with a quick and harmless examination. In many clinical cases, normal ECG measurements cannot be taken, such as with fetuses, where ECG signals from the mother's own body hinder the measurement. This paper examines the use of machine learning algorithms to simulate ECG traces from ultrasound data alone. The algorithms are trained on ultrasound and ECG data acquired simultaneously from the same patient; the training data was taken from samples acquired from 100 adult patients. The results of simulating an ECG with this method indicate good possibilities for future usefulness, where machine learning to acquire a simulated ECG can help clinicians evaluate fetal heart function, as well as in other cases where ECG cannot be measured normally.
Keywords: ECG, Ultrasound, Fetal-ECG, Heart, Machine learning, Simulated
Sammanfattning
ECG is used clinically to detect a variety of conditions, such as heart failure and arrhythmias, but also to give a general picture of heart function through a quick and harmless examination. In many clinical cases, however, normal ECG measurement is not possible, such as for fetuses, where ECG signals from the mother's own body interfere with the measurement. This paper examines the use of machine learning algorithms to simulate ECG traces from ultrasound data alone. These algorithms are trained on ultrasound and ECG data obtained simultaneously from the same examination of a patient. The ultrasound data used here comes from 100 measurements of different adult patients. The results of the investigation of the ECG simulation method indicate good potential for future use, as machine learning algorithms for simulating ECG can assist clinicians in evaluating fetal heart function, or in other cases where ECG cannot be measured normally.
Keywords: ECG, Ultrasound, Fetal ECG, Heart, Machine learning, Simulated
Contents
Abstract
Sammanfattning
Contents
Abbreviations
Glossary
1 Introduction
1.1 Aim
2 Background
2.1 Tissue Doppler
2.2 Electrocardiography
2.3 Fetal Heart Physiology
2.4 Importance of Fetal Echocardiography
2.5 Machine Learning
2.6 Artificial Neural Networks
2.7 Statistical Methods
2.8 Machine Learning in Fetal Cardiology
3 Method
3.1 Datasets
3.2 Programming language and hardware
3.3 Initial processing of training datasets
3.4 Further processing of training datasets
3.4.2 Dataset 1B: Tissue Doppler dataset of one heart cycle
3.4.4 Dataset 2B: Cine-loop dataset, rate of change
3.4.5 Dataset 2C, 2D, 2E and 2F: Cine-loop dataset, rate of change and minimized data
3.5 Training the algorithm
3.6 Evaluating performance
3.6.1 Dataset 1A and 1B: The tissue Doppler datasets
3.6.2 Dataset 2: The cine-loop datasets
3.6.3 Dataset 2F: The best performing cine-loop dataset
3.7 Testing on fetal data
4 Results
4.1 Dataset 1A: The tissue Doppler dataset of multiple heart cycles
4.2 Dataset 1B: The tissue Doppler dataset of one heart cycle
4.3 Dataset 2B through E: The cine-loop datasets
4.4 Dataset 2F: The cine-loop dataset, rate of change and minimized data
5 Discussion
5.1 Results on dataset 1
5.2 Results on dataset 2
5.3 Fetal data results
5.4 Results regarding aim
5.5 General improvements of models
5.6 Improvements for results on fetal data
5.7 Results of study on the future of fetal diagnostics
6 Conclusion
7 References
Appendix 1: Optimised parameters for learning models
Abbreviations
ML – Machine learning
ANN – Artificial neural network
MSE – Mean squared error
ECG – Electrocardiogram
PCC – Pearson correlation coefficient
ROI – Region of interest
Glossary
Hyperparameters – settings such as the number of layers and nodes in an ANN
Mean squared error – the average of the squared differences between predicted and true values
Variance – the flexibility of a model
Bias – the systematic error of a model
Overfitting – a model highly adapted to its training data that generalizes poorly
R-squared – a statistical measure based on explained variance
Pearson – a statistical measure based on linear correlation
Cine-loop – echocardiography images stored digitally as a sequence with a determined number of frames
1 Introduction
Electrocardiography (ECG) is used clinically to detect a multitude of medical conditions, such as arrhythmias and heart failure, and to give a good general image of the function of the heart [1]. In cases of fetal cardiac dysfunction and structural cardiac anomalies, it is important for both the fetus and the mother that these problems are detected prenatally, to minimise perinatal complications and to allow more time to prepare for possible post-birth surgeries or interventions. However, the results from classical ECG are of significantly diminished quality when performed on a fetus compared to a postnatal patient, because the mother's own electrical signals from the heart and body add a large amount of noise to the measurement [2]. A method of obtaining a fetal ECG could therefore be an important tool for diagnosing cardiac conditions.
Echocardiography can be used to obtain images of the fetal heart, and with tissue Doppler ultrasonography the velocity of the fetal heart walls can be measured in order to evaluate fetal heart function. Tissue Doppler ultrasonography shows promise in assessing fetal cardiac function, but it requires an experienced sonographer spending a large amount of time analysing the data. An ECG curve could therefore be of great help in evaluating fetal cardiac function more accurately and efficiently, owing to the simplicity of the ECG.
Many studies have examined different methods of extracting the fetal ECG signal from the ECG signal of the mother, via filtering and data separation, but these methods have accuracy problems because the noise cannot be completely removed [2]. The solution proposed here is instead to obtain the supplementary ECG from the less noisy ultrasound measurements by using algorithms trained with machine learning. These algorithms learn from data from adult patients containing both ultrasound measurements and classical ECG measurements. Once trained, the algorithms are tested on prenatal patients. The aim of this project is therefore to create machine learning algorithms that can produce an ECG from ultrasound data and to assess how well they work.
1.1 Aim
The aim of this project was to train a model that could produce a plausible simulated ECG curve from tissue Doppler ultrasound data sampled from fetuses, and, for unseen ultrasound data from adults, also achieve:
1) A result better than noise
2) P-wave, QRS-complex and T-wave visible in every heart cycle for 90% of the test
samples
3) A mean Pearson correlation coefficient (PCC) score of at least 0.7
2 Background
The background of this project is twofold: knowledge about the diagnostic techniques and physiology, combined with the science of machine learning. This background section provides the insights needed to understand our study in both these respects.
2.1 Tissue Doppler
Tissue Doppler is a form of echocardiography that measures the velocity of the myocardium (heart muscle) throughout one or more heartbeats using the Doppler effect. The Doppler effect is the principle that ultrasound reflected back from an object has an altered frequency depending on the velocity of the object it is reflected on [3]. Therefore, by comparing the frequencies sent out and received by the transducer, motion can be deduced [3]. The velocities of the myocardium, the heart valves and the blood can all be used to find signs of heart defects and problems, which makes tissue Doppler one of the more important modalities when it comes to cardiovascular defects and diseases.
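For reference, the tissue velocity follows from the measured frequency shift via the standard Doppler relation (this equation is not written out in the source; the symbols are the conventional ones):

```latex
v = \frac{c \, \Delta f}{2 f_0 \cos\theta}
```

where v is the tissue velocity, c the speed of sound in tissue, Δf the measured frequency shift, f0 the transmitted frequency, and θ the angle between the ultrasound beam and the direction of motion.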
2.2 Electrocardiography
Electrocardiography is the process of creating an ECG, a graph of the measured electrical activity of the heart. The electrical activity is measured using a number of electrodes placed in direct contact with the skin, which detect the small electrical changes in the body that result from cardiac muscle depolarization and repolarization during each heartbeat [4]. The ECG contains three main parts, known as the P-wave, the QRS complex and the T-wave (Figure 1). The P-wave represents atrial depolarization, the T-wave represents the repolarization of the ventricles, and the most important part, the QRS complex, represents a combination of the depolarization of the left and right ventricles and the contraction of the large ventricular muscles [5]. Clinicians can quickly detect cardiac anomalies by looking at the ECG, and these three main parts in particular, to see whether the amplitude of a certain part looks abnormal or a certain interval is too long.
Figure 1: Left: Schematic diagram of normal sinus rhythm for a human heart as seen on ECG (with English labels), with the P wave representing the depolarization of the left and right atrium, the QRS complex representing electrical impulses spreading through the ventricles (ventricular depolarization), and the T wave representing ventricular repolarization [6]. Right: ECG (green) and simultaneous tissue Doppler (yellow) in a combined plot.
2.3 Fetal Heart Physiology
The heart of the fetus is markedly different from an adult heart, both in physiology and function. These differences are partly due to the fetus still being in stages of development, having a much higher number of stem cells in circulation and vastly different circulatory needs compared to an adult [7]. One clear difference that exists due to this is the much higher heart rate of a fetus compared to an adult, with heart rates between 120 and 160 bpm considered normal [8]. The fetus is also fully dependent on the placenta, which is located inside the womb with connections to both the uterus and the liquid-filled sac within which the fetus is held. Oxygen and nourishment are transferred through the placenta and via the umbilical cord to the fetus, and there is no direct contact between the circulatory systems of the fetus and the mother. The lungs of the fetus are filled with amniotic fluid during gestation, and only a small amount of blood is pumped past the lungs [7].
Figure 2: Fetal Circulatory System-02.jpg, CC BY 3.0 License [9]
Since there is less need for blood to pass the lungs while they are filled with amniotic fluid, the fetal heart does not have a separate pulmonary artery and aorta; instead they are connected by a blood vessel called the ductus arteriosus. This extra blood vessel closes after birth, and the pulmonary artery and aorta become separate. There is also an opening between the left and right atria in the fetal heart, called the foramen ovale, which allows blood to flow directly from the right atrium to the left (Figure 2) [9]. As with the ductus arteriosus, the foramen ovale also closes and disappears shortly after birth [7].
2.4 Importance of Fetal Echocardiography
Fetal ECG is important because it would aid the clinician in correctly diagnosing the fetus, which in turn makes two things possible. Firstly, it helps with planning the perinatal management and identifying what kind of intervention may be required in the delivery room or within the first days of life. Secondly, it helps to identify fetuses who may benefit from fetal cardiac intervention, meaning medication or surgery on the fetus's heart while in the womb [10]. Fetal echocardiography is used to detect arrhythmias, a collection of ailments in which the heart beats irregularly. Examples include atrioventricular block (AV-block), an impairment of the electrical signals in which the atria and ventricles beat asynchronously, and supraventricular extrasystole (SVES), an early depolarisation that causes the heart to beat irregularly [5].
2.5 Machine Learning
Machine learning (ML) is the science of computational learning, combining statistics with computer science to build algorithms that can process data and derive complex conclusions and models that would otherwise be impossible to discern. ML algorithms can be categorized in different ways, but all are based on inputs, the measured data, which in turn affect the output of a system [11].
2.6 Artificial Neural Networks
There are many different machine learning methods for achieving a well-performing model; a very flexible and versatile one is the artificial neural network (ANN), a nonlinear statistical model [12]. According to Rebala [13], ANNs were initially created to mimic, in an oversimplified manner, the function of neurons in the human brain: each neuron is modelled with multiple inputs and one single output, and the neurons are connected to each other in a network. He further states that the neurons, or nodes, form columns, or layers; nodes are not connected to each other within a layer, but the layers are connected to each other via each node (Figure 3). In computer science terms, the artificial neuron is simply a function, regulated by a weight factor that controls the strength of its impact on other artificial neurons, or functions, via their connection [13].
Figure 3: Schematic overview of an ANN; the rectangles represent the different layers, circles are the artificial neurons/nodes, and thin lines represent the connections between nodes. a) input layer b) hidden layer c) output layer.
The input layer of artificial neurons gathers input data from the given dataset and sends that information to the next layer, weighted accordingly [14]. The middle, or hidden, layers process this input with an activation function, typically a sigmoid function, and send the information to the output layer, which in turn produces an output [13]. An artificial neural network can have multiple layers and multiple nodes in each layer; these are examples of hyperparameters [15]. Sharma [16] explains the sigmoid functions as a class of functions with similar shapes and attributes, resembling an 'S' shape (Figure 4, right); examples include Softmax, the logistic function and tanh. Their purpose as activation functions is to make the connections non-linear, which is needed for the ANN to find complex correlations [17]. Another activation function is ReLU (Figure 4, left), which according to Sharma [16] is the most commonly used activation function today.
Figure 4: Example of a Sigmoid function with typical 'S' shape (right) and ReLU function (left).
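The two activation functions in Figure 4 are easy to state directly; a small illustration in plain NumPy (not taken from the thesis):

```python
import numpy as np

def logistic(x):
    # Logistic sigmoid: squashes any input into (0, 1), giving the 'S' shape
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU: passes positive values through unchanged, clips negatives to 0
    return np.maximum(0.0, x)

print(logistic(0.0))              # 0.5
print(relu(-3.0), relu(2.5))      # 0.0 2.5
```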
For the artificial neural network to correctly model the behaviour of a system, it needs to be trained on a given dataset. The standard approach to training is to change the weights according to the stochastic gradient descent (SGD) method, although other methods exist [18]. The SGD method uses partial derivatives of a loss function (a function defining how wrong the algorithm is) with respect to the weights in order to find a local minimum of the loss function and change the weights accordingly [13]. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is another extensively used optimisation algorithm; although it has limitations, it often converges faster than standard SGD [19]. A variant of the SGD method is Adam, presented by Scikit-learn [20] as an SGD-based optimiser that works well on larger datasets. They further explain that LBFGS is more useful for smaller datasets, with faster convergence and better results.
The multilayer perceptron is a basic ANN, with inputs flowing through the network in a unidirectional way, forwards [18]. The documentation for the Multilayer Perceptron Regressor (MLPRegressor) from Scikit-learn describes the learning algorithm with 23 tuning parameters, for example the hidden layer sizes, the activation function and the solver (optimisation method) [20]. The documentation further shows that the available solvers are LBFGS, SGD and Adam. The MLPRegressor has five main methods: "fit", which uses training data (both input and target) to train the model; "predict", which predicts an output for a given input after training; "score", which evaluates the model; and "get_params" and "set_params", which configure the parameters [20].
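As a minimal sketch of the Scikit-learn API described above (the toy data and network settings are invented for this example, not taken from the thesis):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy regression data, invented for this example: learn y = sin(x)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel()

# Small network; LBFGS tends to suit small datasets like this one
model = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                     solver="lbfgs", max_iter=2000, random_state=0)
model.fit(X, y)                   # train on input X and target y
y_pred = model.predict([[0.5]])   # predict for an unseen input
print(model.score(X, y))          # R-squared on the training data
```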
2.7 Statistical Methods
The most common way to evaluate the accuracy of an ML regression model is the mean squared error (MSE) [17]. Since there is little interest in how well the model performs on training data, an unseen portion of the dataset, called the testing data, is used to assess model performance. Overfitting is a common problem in ML algorithms, not least in artificial neural networks. Overfitting occurs when the algorithm is too flexible with regard to the training portion of the dataset and perceives patterns occurring randomly in the training dataset that are not properties of the system (Figure 5) [21]. When the loss function on the training data is driven to its minimum, the model is usually overfitted. To prevent overfitting there are a number of methods that reduce the flexibility of the model, for example weight decay or an early stopping rule [12].
Figure 5: Example of a regression problem a) Overfitted example b) Not overfitted example c) true regression
R-squared (R2) is a commonly used statistical measure quantifying how much of the variance in a regression problem the model explains; it measures the overall fit of the model on a scale from negative infinity to 1, with higher scores indicating a better fit [22]. Another statistical measure is the Pearson correlation coefficient (PCC), which measures the strength of the linear association between two variables on a scale from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating uncorrelated variables [23].
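All three measures are available through standard library calls; a small sketch on invented example signals (the traces below stand in for a true and a predicted ECG):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

# Invented example signals: a "true" ECG-like trace and an offset prediction
t = np.linspace(0, 1, 100)
y_true = np.sin(2 * np.pi * t)
y_pred = y_true + 0.1

mse = mean_squared_error(y_true, y_pred)   # lower is better
r2 = r2_score(y_true, y_pred)              # at most 1, can be negative
pcc, _ = pearsonr(y_true, y_pred)          # -1 to 1; 1 = perfect linear fit
print(mse, r2, pcc)
```

Note that the constant offset leaves the PCC at 1 while still costing MSE and R2, which is why the thesis reports several measures side by side.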
2.8 Machine Learning in Fetal Cardiology
Garcia-Canadilla et al. [24] state that ML in fetal cardiology is of great interest and under active development, since evaluation of cardiac function and structures in fetuses often faces challenges, for example fetal movement, small heart size and inexperienced medical personnel. Garcia-Canadilla et al. also state that ML can facilitate assessment of the fetal heart, for example by improving image acquisition, extracting information for evaluation and diagnosing abnormalities. Many papers have been published on extracting the maternal ECG from abdominal ECG readings to produce a fetal ECG or fetal QRS complexes with the use of machine learning methods. For example, Yu et al. [25] propose using independent component analysis, Muduli et al. [26] focus on deep learning, and Lukosevicius et al. [27] propose a method using an ANN. Another approach, from Sulas et al. [28], is to use data from pulsed-wave Doppler to extract features, including the fetal heartbeat, using an ANN.
One issue that arises when ML methods are used to diagnose conditions is the "black-box" effect, which is especially apparent in deep learning methods [24]. The "black-box" effect is problematic because the decisions made by the model cannot be logically followed by medical personnel; most ML methods are completely non-transparent [29].
3 Method
Several datasets were used in this project to test different methods of training the algorithm. The training datasets (datasets 1 and 2) consist of adult data with corresponding ECG, and the testing dataset (dataset 3) consists of fetal data without corresponding ECG.
3.1 Datasets
The training datasets were based on adult ultrasound data with regular heart rhythm, imaged with a Vivid S6 ultrasound imaging system equipped with an M4S-RS (1.9-4.1 MHz) phased-array transducer (GE CV Ultrasound, Haifa, Israel), with corresponding ECG data recorded simultaneously. The data was exported using the EchoPAC software, version 201 (GE Vingmed Ultrasound AS, Horten, Norway).
Two types of data were exported in the EchoPAC software, which gave rise to dataset 1 and dataset 2. An overview of the training datasets can be found in figure 7.
Dataset 1 consisted of 100 color tissue Doppler ultrasound and ECG samples from 100 different adult patients, with lengths ranging between 1 and 3 seconds. The Doppler ultrasound data were exported by placing a region of interest (ROI) on the septal wall of the heart (Figure 6) in 'q-analysis' mode and exporting the processed velocity curve of that ROI to a .txt file. The ECG data was stored in the same .txt file.
Dataset 2 consisted of 100 ultrasound cine-loops from 100 different adult patients, with cine-loop lengths ranging between 1 and 4 seconds, saved as .avi files, along with corresponding ECG data for each cine-loop, saved in a .txt file.
A testing dataset consisting of fetal data was also used to evaluate the performance of the models on fetal data. The fetal data consisted of tissue Doppler ultrasound data from two patient groups. The first group, dataset 3A, consisted of four samples from fetuses with normal heart function. The second group, dataset 3B, consisted of four samples from fetuses with irregular heart rhythm, where arrhythmias such as AV-block and SVES were present.
Figure 6: Placement of the ROI when extracting tissue velocity data in the EchoPAC software
Figure 7: Overview of the training datasets based on adult ultrasound data
3.2 Programming language and hardware
The language used for processing, training, visualising and evaluating on both datasets was
Python 3.7 (Python Software Foundation, Wilmington, DE, United states), with accompanied
libraries such as Numpy, Scikit-learn, Matplotlib and Scipy. The MLPRegressor from Scikit-
learn was used as our learning algorithm for all datasets. The processing, training and
evaluation of the models were done on a 2017 Macbook Air with 1,8 GHz Intel Core i5
processor for dataset 1 and on a stationary computer using a AMD ryzen 3900x cpu and a
AMD Radeon RX 5700 XT gpu on Windows 10 for dataset 2.
3.3 Initial processing of training datasets
The .txt files from dataset 1 were extracted into arrays and looped to normalise the length to 3 seconds. An interpolation function (interp1d, SciPy) was used on both the tissue Doppler data and the ECG to resample the data to 500 points, so that all data shared a common x-axis. The data was also smoothed using a Savitzky-Golay filter.
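The resampling and smoothing steps for dataset 1 might be sketched as follows (the array contents, filter window and polynomial order are illustrative assumptions, not values from the thesis):

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

# Illustrative velocity trace; in the thesis this comes from the exported .txt file
t = np.linspace(0.0, 3.0, 120)        # original time axis after length normalisation
velocity = np.sin(2 * np.pi * t)      # stand-in for the tissue Doppler curve

# Resample to 500 points on a common x-axis, as done for both velocity and ECG
t_common = np.linspace(0.0, 3.0, 500)
velocity_500 = interp1d(t, velocity)(t_common)

# Smooth with a Savitzky-Golay filter (window and polynomial order assumed)
velocity_smooth = savgol_filter(velocity_500, window_length=21, polyorder=3)
print(velocity_smooth.shape)  # (500,)
```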
Dataset 2 consisted of cine-loops, each made up of consecutive images, called frames, which in turn consisted of grey-level pixel values. The initial processing of these cine-loops was simply to retrieve these pixel values and store them in an array that could more easily be used by the later functions and neural networks.
The measurements in both datasets were visually evaluated for quality, in order to categorize individual measurements into three groups: low, medium and high quality. For dataset 1, the quality depended on a number of issues: ECG data and velocity curve not aligned, ECG or velocity reading null or noisy, and velocity reading not sampled on the correct ROI (incorrect shape of the plot). For dataset 2, the quality was evaluated based on null or noisy data and grainy or low-resolution frames. In either dataset, if the ECG was found to be upside-down, it was flipped to show a correct trace.
3.4 Further processing of training datasets
Each dataset was processed with different methods, resulting in two new datasets based on dataset 1 and four new datasets based on dataset 2. The processing of these datasets is presented in this section. All datasets were normalized using the mean and standard deviation, see equation 1. This makes the mean of each dataset zero and its standard deviation one. See figure 7 for an overview of the training datasets. Datasets 1A and 2A were used with no further processing.
Equation 1: Normalization formula, z = (x − μ) / σ, where z denotes the normalized data point, x the original data point, and μ and σ the mean and standard deviation of the dataset.
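Equation 1 amounts to standard z-score normalization, which might be sketched as:

```python
import numpy as np

def z_normalize(x):
    # z = (x - mean) / std: zero-mean, unit-standard-deviation version of the data
    return (x - np.mean(x)) / np.std(x)

data = np.array([1.0, 2.0, 3.0, 4.0])
z = z_normalize(data)
print(z.mean(), z.std())  # approximately 0.0 and 1.0
```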
3.4.2 Dataset 1B: Tissue Doppler dataset of one heart cycle
Dataset 1B consisted of the velocity traces and ECG data from dataset 1 divided into heart cycles. Using the ECG, the data was cut from R-peak to R-peak so that each sample of the dataset consisted of one heart cycle; in this process, high and medium quality data was used. Since each heart cycle has a unique length, all samples had different lengths and unique x-axes. The samples were therefore normalised using an interpolation function to 300 points per heart cycle. The resulting dataset consisted of velocity inputs of one heart cycle, sampled 300 times, and corresponding ECG targets of one heart cycle, sampled 300 times. After quality evaluation and segmentation into heart cycles, dataset 1B consisted of 162 heart cycles.
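A rough sketch of the R-peak-to-R-peak segmentation and fixed-length resampling (the synthetic signals and peak-detection settings are illustrative assumptions; the thesis does not specify how R-peaks were located):

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import find_peaks

# Invented example: a 3 s trace at 500 samples with a sharp periodic "R-peak" train
fs = 500 / 3.0
t = np.linspace(0.0, 3.0, 500)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 20     # stand-in ECG with narrow peaks
velocity = np.cos(2 * np.pi * 1.2 * t)      # stand-in tissue velocity trace

# Locate R-peaks (height/distance thresholds assumed), then cut peak to peak
peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.4 * fs))
cycles = []
for start, stop in zip(peaks[:-1], peaks[1:]):
    # Each variable-length cycle is resampled to a fixed 300 points
    x_old = np.linspace(0.0, 1.0, stop - start)
    x_new = np.linspace(0.0, 1.0, 300)
    cycles.append((interp1d(x_old, velocity[start:stop])(x_new),
                   interp1d(x_old, ecg[start:stop])(x_new)))
print(len(cycles), cycles[0][0].shape)
```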
3.4.4 Dataset 2B: Cine-loop dataset, rate of change
Dataset 2B was processed by looking at how fast the pixels of the frames changed. This was done by creating new frames whose pixels represent the change in the original pixels over multiple images. The result was a lower total number of frames than before, now containing information about how much the pixel values had changed over a set number of cine-loop frames, instead of information about the current state. The number of frames per cine-loop varied between 50 and 259, and the corresponding ECG lengths did not match the number of frames, so in order to correlate the cine-loops to the ECG they had to be interpolated into arrays of the same size. Both the ECG data and the cine-loop data were therefore interpolated into arrays of length 64, one array per original cine-loop and one per corresponding ECG.
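Such rate-of-change frames can be sketched with simple frame differencing (the step of 4 frames and the loop dimensions are assumptions for illustration; the thesis does not give these values):

```python
import numpy as np

# Invented stand-in for one grey-level cine-loop: (frames, height, width)
rng = np.random.default_rng(0)
loop = rng.integers(0, 256, size=(120, 64, 64)).astype(np.float64)

step = 4  # number of frames over which change is measured (assumed)
# Each new frame holds the pixel-wise change over `step` original frames,
# so the result has fewer frames than the original loop
rate_frames = loop[step:] - loop[:-step]
print(rate_frames.shape)  # (116, 64, 64)
```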
3.4.5 Dataset 2C, 2D, 2E and 2F: Cine-loop dataset, rate of change and minimized data
Datasets 2C through 2E were processed in similar ways. For each frame in the original cine-loops, a new frame was created in which the pixel values in a square area of the original frame were averaged into one pixel of the new frame. For dataset 2C that square was 4x4 pixels, for 2D 8x8 pixels and for 2E 16x16 pixels, reducing the image size by factors of 16, 64 and 256, respectively. After that, the same processing as for 2B was applied to all three datasets. Dataset 2F was processed in the same way as 2E, but using only high quality data.
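The block averaging above might be sketched as follows (frame size invented for the example; the reshape trick assumes the frame dimensions are divisible by the block size, so any remainder rows/columns are cropped):

```python
import numpy as np

def block_average(frame, k):
    # Average each k x k block of pixels into a single pixel,
    # shrinking the image area by a factor of k * k
    h, w = frame.shape
    cropped = frame[:h - h % k, :w - w % k]
    return cropped.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

frame = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
print(block_average(frame, 16).shape)  # (4, 4)
```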
3.5 Training the algorithm
The inputs and targets from each dataset were split into a training and a testing group using the train_test_split method from Scikit-learn, with 30% of the dataset held out for testing. The MLPRegressor was then trained on the training dataset using the method 'fit', with the inputs of the dataset as the input (X) and the targets of the dataset as the true values of the output (y).
For the tissue Doppler datasets 1A and 1B, the optimal parameters of the MLPRegressor were chosen with an optimiser function. Three optimisers were constructed, one for each type of solver: 'Adam', 'SGD' and 'LBFGS'. For each type of activation ('tanh', 'logistic' and 'ReLU') and each type of solver, the remaining parameters of the MLPRegressor were iterated one by one over a chosen interval corresponding to that specific parameter, see appendix 2.
The parameter value that gave the model with the best PCC score was then selected, and the same process was repeated for the next parameter. The result was nine models, one for each combination of activation and solver, each optimised on its parameters. The nine models were evaluated on performance.
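The one-parameter-at-a-time search described above resembles a greedy coordinate-wise search; a sketch under assumed parameter grids (the toy data, grids and fixed solver/activation choice are illustrative, not the thesis's actual intervals):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Invented toy data standing in for velocity inputs and ECG targets
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=150)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def pcc_score(params):
    # Fit one candidate model and score it by PCC on the held-out data
    model = MLPRegressor(solver="lbfgs", activation="relu", max_iter=1000,
                         random_state=0, **params).fit(X_tr, y_tr)
    return pearsonr(y_te, model.predict(X_te))[0]

# Assumed search grids; each parameter is tuned in turn, keeping earlier choices
grids = {"hidden_layer_sizes": [(16,), (32,), (64,)],
         "alpha": [1e-4, 1e-3, 1e-2]}
best = {}
for name, values in grids.items():
    scores = [pcc_score({**best, name: v}) for v in values]
    best[name] = values[int(np.argmax(scores))]
print(best)
```

In the thesis this loop would be repeated for each of the nine solver/activation combinations, yielding one optimised model per combination.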
For datasets 2A through 2F, the same parameter optimisation method was used, with optimisation of two parameters for the 'LBFGS' solver and three for the 'Adam' solver. The 'SGD' solver was not used past initial testing for dataset 2.
Because datasets 2A through 2F differ in size, different neural net layer sizes and structures were used. For datasets 2A through 2D, the neural net was simple-layered, meaning that it had only one hidden layer with fewer than 50 neurons; for datasets 2E and 2F, the neural net had many hidden layers with between 32 and 384 neurons, see appendix 1.
3.6 Evaluating performance
The evaluation of the models was implemented differently for the different datasets. The statistical measures used for evaluation were the PCC, the MSE and the R2 score. All datasets except 2A were evaluated on the statistical measures and a visual score from 0 to 10.
For datasets 1A and 2F, the data had been divided into low, medium and high quality, and an evaluation was required of which portions to use. This evaluation was a test devised with three different portions of datasets 1 and 2, respectively: only high quality data; high and medium quality data; and data of all qualities, each tested on the best performing model in the dataset group. The portion with the best PCC over 10 iterations with random partitions of training and testing data was chosen. Dataset 1B and datasets 2B through 2E used high and medium quality data, without an evaluation test.
3.6.1 Dataset 1A and 1B: The tissue Doppler datasets
For datasets 1A and 1B, the evaluation was done iteratively 10 times for all combinations of solver and activation function. The solver and activation combination that received the best overall score was chosen as the best performing and was further tested on the different data quality portions.
3.6.2 Dataset 2: The cine-loop datasets
Due to the size of datasets 2B through 2F, the long runtime of each optimization and the large differences in performance depending on pre-processing, datasets 2A through 2F were evaluated by comparing different types of pre-processing and a few key parameter values.
3.6.3 Dataset 2F: The best performing cine-loop dataset
Dataset 2F was found to be the best performing of the cine-loop datasets, and was therefore chosen for more thorough evaluation. For datasets 2A through 2E, only one train/test split of 70% training and 30% testing data was trained and optimized on, due to time constraints; for 2F, many randomized 70%/30% train/test splits were used, and the models were evaluated on the average result over all of the splits, in order to evaluate the effectiveness of the algorithms trained on the dataset more reliably.
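Averaging over repeated random splits might be sketched as follows (the toy data, model settings and split count of 10 are assumptions; the thesis does not state how many splits were used):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Invented toy data standing in for the processed cine-loop inputs and ECG targets
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=20)

scores = []
for seed in range(10):  # repeated randomized 70/30 splits (count assumed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    model = MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs",
                         max_iter=1000, random_state=0).fit(X_tr, y_tr)
    scores.append(pearsonr(y_te, model.predict(X_te))[0])
print(np.mean(scores))  # average PCC over all splits
```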
3.7 Testing on fetal data
Since the fetal data consisted only of tissue Doppler data, it was tested on the models trained on datasets 1A and 1B. The fetal data was pre-processed to fit the inputs of these datasets; because the model trained on dataset 1B takes heart cycle data as input, the fetal data was cut into heart cycles by manual segmentation. Since the fetal data has no corresponding ECG, only visual correlation could be shown, with no statistical measures.
4 Results
The results are presented based on models trained and tested on each dataset. The statistical
measures presented are PCC (closer to 1 indicates a better fit), MSE (lower error is better), R2
(closer to 1 indicates a better fit) and visual score where 0 is lowest and 10 highest. For each
of the datasets, the quality of gathered data was evaluated. This evaluation showed that out of
100 samples from dataset 1, 51 were high quality, 24 medium quality and 22 low quality. Out
of 100 samples from dataset 2, 52 were high quality, 31 medium quality and 17 low quality.
Examples of ECG data from each evaluated quality in dataset 1 are presented in figure 8.
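A minimal sketch of how the three statistical measures reported below could be computed with scipy and scikit-learn; the true and simulated signals here are synthetic placeholders, not data from the study.

```python
# Sketch: compute PCC, MSE and R2 between a true and a simulated ECG trace.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

true_ecg = np.sin(np.linspace(0, 20, 500))                 # placeholder signal
simulated = true_ecg + np.random.default_rng(0).normal(0, 0.1, 500)

pcc = pearsonr(true_ecg, simulated)[0]          # closer to 1 is a better fit
mse = mean_squared_error(true_ecg, simulated)   # lower error is better
r2 = r2_score(true_ecg, simulated)              # closer to 1 is a better fit
print(f"PCC={pcc:.3f}  MSE={mse:.4f}  R2={r2:.3f}")
```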
Figure 8: Examples of results of the data quality evaluation for dataset 1 for the ECG signal (yellow). X-axis in seconds and
Y-axis in 10 V.
4.1 Dataset 1A: The tissue Doppler dataset of multiple heart cycles
The results of the 10 trials for the different optimised models are presented in figure 9, and
the average scores in table 1. The best performing model across a majority of the statistical
measures in table 1 uses the combination of solver and activation LBFGS and ReLU, with an
average PCC of 0.517 and a visual score of 5.8. This model was further tested on the different
data quality portions (table 2); the results show that performance was best when only
high-quality data was used. Parameter results for the optimised models are presented in
appendix 1. The combination SGD and ReLU could not be trained.
Figure 9: Test results for each model performance trained on dataset 1A (all quality data) based on the statistical measures and visual scores. All axes have dimensionless values.
Table 1: Average of statistical scores for each combination of solver and activation for models trained on dataset 1A

Solver & activation    Average PCC    Average MSE    Average R2    Average visual score
LBFGS + ReLU           0.517          2662            0.0777       5.8
LBFGS + logistic       0.401          2795           -0.086        5.0
LBFGS + tanh           0.380          2952           -0.018        5.0
Adam + logistic        0.352          3341           -0.010        3.7
Adam + ReLU            0.254          5384           -1.029        3.7
Adam + tanh            0.385          2845            0.015        3.0
SGD + logistic         0.425          3137           -0.069        5.9
SGD + tanh             0.424          3753           -0.259        5.3
Table 2: Results of the data portion test on the best performing model

Data portion                    Average PCC over 10 iterations
High quality                    0.538
High and medium quality         0.517
High, medium and low quality    0.506
The visual results for the best performing model (solver LBFGS, activation ReLU), trained on
adult data using only high-quality data, are presented in figure 10.
Figure 10: Visual results on adult data for best performing model trained on dataset 1A using only high quality data. Red
curve indicates predicted ECG, green curve indicates true ECG and yellow curve indicates tissue Doppler velocity. X-axis in
seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
The results on fetal data 3A (normal heart function and rhythm) for the best performing model
(solver LBFGS, activation ReLU) trained on dataset 1A are presented in figure 11b, and the
results on fetal data 3B (irregular heart rhythm) in figure 11a. Since no corresponding ECG
exists for the fetal data, only visual results can be presented. For 3A an ECG curve is present,
and for 3B an ECG curve is slightly present, but in neither case does the model correctly
identify the heart cycles.
Figure 11: Visual results on a) abnormal heart function, b) healthy heart function tissue Doppler data from prenatal patients
of the best performing model trained on dataset 1A using only high-quality data. Red curve indicates predicted ECG and yellow curve fetal tissue Doppler velocity. Abnormal heart function ailments in a) are as follows: top left: Arrhythmia, top
right: long QT, bottom left: SVES, bottom right: AV-block III. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
4.2 Dataset 1B: The tissue Doppler dataset of one heart cycle
The results of the 10 trials for the different optimised models are presented in figure 12, and
the average scores in table 3. The best performing model across a majority of the statistical
measures in table 3 uses the combination of solver and activation Adam and ReLU, with an
average PCC of 0.752 and a visual score of 8.2, although the model with solver Adam and
activation tanh and the model with solver LBFGS and activation tanh also performed well.
Parameter results for the optimised models are presented in appendix 1.
Figure 12: Test results for each model performance trained on dataset 1B (medium and high quality data) based on the statistical measures and visual scores. All axes have dimensionless values.
Table 3: Average of statistical scores for each combination of solver and activation for models trained on dataset 1B

Solver & activation    Average PCC    Average MSE    Average R2    Average visual score
LBFGS + ReLU            0.720         1588            0.511        7.7
LBFGS + logistic        0.734         1544            0.531        7.5
LBFGS + tanh            0.752         1657            0.549        7.7
Adam + logistic         0.727         1629            0.513        7.4
Adam + ReLU             0.752         1546            0.550        8.2
Adam + tanh             0.723         1627            0.511        8.4
SGD + logistic          0.734         1598            0.524        7.5
SGD + tanh              0.725         1500            0.522        7.7
SGD + ReLU             -0.582         7419           -1.398        0.9
The visual results on adult data for the best performing model (solver Adam, activation
ReLU), trained on high and medium quality data, are shown in figure 13.
Figure 13: Results on adult data for best performing model with high and medium quality data, yellow: tissue Doppler
velocity, green: true ECG, red: predicted ECG. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
The results on fetal data 3A (normal heart function and rhythm) for the best performing model
(solver Adam, activation ReLU) trained on dataset 1B are presented in figure 14; the ECG
curve could be accurately predicted, with some exceptions. The results on fetal data 3B
(irregular heart rhythm) for the same model are presented in figure 15. Only visual results can
be presented since no corresponding ECG exists for the fetal data.
Figure 14: Visual results on healthy heart function tissue Doppler data from prenatal patients from highest scoring model
trained on dataset 1B using high and medium quality data. Red curve indicates predicted ECG and yellow curve fetal tissue
Doppler velocity. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
Figure 15: Visual results on abnormal heart function tissue Doppler from prenatal patients from highest scoring model
trained on dataset 1B using high and medium quality data. Red curve indicates predicted ECG and yellow curve fetal tissue Doppler velocity. Abnormal heart function ailments are as follows: top left: Arrhythmia, top right: long QT, bottom left:
SVES, bottom right: AV-block III. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
4.3 Dataset 2B through E: The cine-loop datasets
Table 4 shows performance results from neural net training on datasets 2B through 2E with
different activation and solver settings, optimised towards a higher PCC score and evaluated
on three statistical measures and one visual measure. Parameter results for the optimised
models are presented in appendix 1. The best performing model across a majority of the
statistical measures was trained on datasets 2D-2E with the combination of solver and
activation Adam and tanh, with an average PCC of 0.425 and a visual score of 5.
Table 4: Statistical scores for each combination of solver and activation for models trained on datasets 2B-C and 2D-E

Dataset & combination    Average PCC    Average MSE    Average R2    Average visual score
2B-C ReLU, Adam          0.135          2071000        -973          0
2B-C tanh, Adam          0.374             2465        -0.156        4
2B-C ReLU, LBFGS         0.293             2600        -0.224        3
2B-C tanh, LBFGS         0.334             3080        -0.449        4
2D-E ReLU, Adam          0.333             2690        -0.265        0
2D-E tanh, Adam          0.425             2352        -0.107        5
2D-E ReLU, LBFGS         0.365             2642        -0.243        3
2D-E tanh, LBFGS         0.407             2847        -0.340        6
Figures 16-18 show plots from the neural net training on datasets 2B through 2E together
with their visual scores. These illustrate how different degrees of similarity between the
simulated and real ECG graphs translate into visual scores.
Figure 16: Dataset 2E, tanh activation and LBFGS solver. Visual score 6. In each graph in figures 16-18, the simulated ECG
is the red line while the green line is the real ECG. X-axis in seconds and Y-axis in 10V.
Figure 17: Dataset 2C, ReLU activation and LBFGS solver. Visual score 4. X-axis in seconds and Y-axis in 10V. The simulated ECG is the red line while the green line is the real ECG
Figure 18: Dataset 2B, ReLU activation and Adam solver. Visual score 0. X-axis in seconds and Y-axis in 10V. The simulated ECG is the red line while the green line is the real ECG
4.4 Dataset 2F: The cine-loop dataset, rate of change and minimized data
Table 5 shows performance results from neural net training on dataset 2F, while figure 19
shows graphs of simulated ECG compared to real ECG from training done on one train/test
split of dataset 2F. Parameter results for the optimised models are presented in appendix 1.
The performance results were acquired by training the neural net 11 times with different
randomised test/train splits and averaging the results. In each of these 11 runs, 8 results were
acquired for 8 different combinations of layer complexity, activation settings and solver
settings. For all combinations, parameters were optimised towards a higher PCC score. The
performance results comprise three statistical measures and one visual. The best performing
model across a majority of the statistical measures uses the combination of solver and
activation LBFGS and ReLU with complex layers, with an average PCC of 0.637 and a
visual score of 7.
Table 5: Statistical scores for each combination of solver, activation and layer complexity for dataset 2F

Layers & combination           Average PCC    Average MSE    Average R2    Average visual score
Simple layers, ReLU, Adam      0.420          627478         -152.8        1
Simple layers, tanh, Adam      0.580            5949         -0.169        6
Simple layers, ReLU, LBFGS     0.607            5623         -0.085        6
Simple layers, tanh, LBFGS     0.632            5804         -0.142        7
Complex layers, ReLU, Adam     0.610            6391         -0.258        4
Complex layers, tanh, Adam     0.540            6287         -0.227        7
Complex layers, ReLU, LBFGS    0.637            5584         -0.072        7
Complex layers, tanh, LBFGS    0.604            5975         -0.177        8
Figure 19: Dataset F, complex layer, activation ReLU and solver LBFGS. Visual score 8. X-axis in seconds and Y-axis in
10V. The simulated ECG is the red line while the green line is the real ECG
5 Discussion
The results of the different machine learning models showed promising signs of being able to
produce a fetal ECG from ultrasound data. They also indicated which improvements could
strengthen the case that an adult training set can be used to predict fetal ECG. Continuation
of connected projects would further the development and indicate any clinical usability of
the method.
5.1 Results on dataset 1
As seen in figure 9, performance on dataset 1A varied considerably between test runs: for the
same model, the PCC score could fluctuate between approximately 0.15 and 0.50. Since the
test and training data were shuffled between iterations, this variance could be due to
inconsistent data - depending on which data happens to fall into the training set, the model
performs differently. In comparison, dataset 1B varied less between test runs, with the largest
PCC fluctuation being approximately 0.6 to 0.75 (see figure 12). Since the data in dataset 1B
was cut into heart cycles, it was less varied and more consistent, which could explain this
difference between datasets 1A and 1B. Performance on dataset 1A was also best when only
high-quality data was used (table 2), which further indicates that low-quality data harms the
model more than a smaller amount of data does.
Comparing tables 1 and 3 shows that the models were more diverse in performance on
dataset 1A than on dataset 1B, with the exception of SGD + ReLU on dataset 1B. This can be
explained by the same reasoning as above - the data is more consistent for 1B and can be
more easily interpreted by any model. SGD + ReLU performed considerably worse on dataset
1B than the other models, and on dataset 1A it produced unviable values. A possible
conclusion is that this combination of solver and activation function does not suit dataset 1
at all.
Both datasets had their best performing model with activation ReLU, which is unsurprising
since it is the most used activation function today [16]. However, the two datasets differed in
the best performing solver: LBFGS for dataset 1A and Adam for dataset 1B. The Adam
solver generally performs better on larger datasets [20], which could be an explanation, since
dataset 1B, being cut into heart cycles, contains more training/testing samples.
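Since the models appear to be scikit-learn MLPRegressor networks [20] (see the parameter dictionaries in appendix 1), the solver and activation combinations compared above map directly onto two constructor arguments. A sketch, with the hidden layer size as an illustrative assumption:

```python
# Sketch: build one MLPRegressor per solver/activation pair, as compared
# in tables 1 and 3. The hidden layer size here is an assumption.
from itertools import product
from sklearn.neural_network import MLPRegressor

solvers = ["lbfgs", "adam", "sgd"]
activations = ["relu", "logistic", "tanh"]

models = {
    (s, a): MLPRegressor(solver=s, activation=a,
                         hidden_layer_sizes=(200,), max_iter=500)
    for s, a in product(solvers, activations)
}
print(len(models))  # 9 combinations (8 were trainable on dataset 1A)
```

Each model is then fitted and scored on the same split, so only the solver/activation choice differs between runs.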
Regarding both visual and statistical measures, the best performing model on dataset 1B
outperforms the best performing model on dataset 1A; visually this can be seen by comparing
figures 10 and 13, and statistically by comparing tables 1 and 3. The model trained on dataset
1A predicted some ECG curves nearly perfectly, whereas others are mostly noise with no
distinguishable pattern. The model trained on dataset 1B is much more consistent and rarely
predicts noise, although one could argue that it could be overfitted to the healthy adult heart
and fail to detect changes or medical conditions that alter heart function. More data on
patients with different heart functions would be needed to assess this.
5.2 Results on dataset 2
Observations from early testing on dataset 2 showed that the performance of the trained
algorithms varied drastically with pre-processing, solver type and activation type, and less so
with layer size and the maximum number of iterations. The remaining parameters made little
to no difference, and since their optimisation was omitted due to time constraints, they are
also omitted from the results. The solver type SGD was not used for dataset 2, as it never
produced a result better than noise during initial testing.
The performance results for datasets 2B through 2E, shown in table 4, indicated
improvement that correlated with the amount of pre-processing. The smaller the dataset, the
faster the neural net training and the higher the average PCC scores. Visually, the
improvement due to pre-processing can be seen by comparing figures 17 and 16: figure 17
shows a typical simulated ECG curve for the low pre-processing dataset 2C, and figure 16
one for the high pre-processing dataset 2E. Algorithms trained on low pre-processing
datasets such as 2B and 2C generally missed more QRS complexes and were visually scored
lower than those trained on the high pre-processing datasets. Some of the graphs for datasets
2B and 2C are visually close to noise, similar to the one in figure 18, while for datasets 2D
and 2E all trained algorithms produced simulated ECG curves resembling the correct ones,
except for those using the combination of ReLU activation and Adam solver. As seen in
figure 16, the simulated ECGs made using dataset 2E only rarely missed QRS complexes and
were visually very similar to the real ECG.
The different combinations of activations and solvers gave noticeably different results, with
the combination of ReLU activation and Adam solver only producing results better than
noise for datasets 2E and 2F, while tanh activation with both the Adam and LBFGS solvers
produced statistically and visually good results. For dataset 2, as shown in tables 4 and 5, the
solver type LBFGS consistently outperformed the solver type Adam in terms of statistical
and visual results, with Adam only catching up in performance on the smallest dataset 2E
and on dataset 2F. This could be explained by how the different solvers operate, with Adam
using a type of gradient descent [30] and LBFGS using a more complex quasi-Newton
approach [31].
In terms of visual score, the correlation with dataset size was also clear, with the smaller
datasets having less noisy graphs and more often finding the QRS complex as well as the
P-wave and the T-wave. Dataset 2F had the best performance in both visual and statistical
score, only rarely failing to find the different parts of the heart cycle in testing. In some
cases, the simulated ECG visually resembles a normal ECG more than the corresponding
real ECG does, as shown in figure 18. Such real ECGs could in the future be removed by
adding a pre-processing step that checks each real ECG for similarity to known normal ECG
morphologies, including arrhythmias, and removes it if it is too dissimilar from all of them.
How this would best be done needs further research.
In the evaluation, only dataset 2F was evaluated based on an average of different
test/training splits. This was done to save time, since the optimisation for each iteration took
over thirty minutes. Dataset 2F achieved the best results of all the cine-loop datasets and was
therefore chosen for more thorough evaluation. It could be argued that all of the datasets
should have been evaluated based on averages, but the time investment required was judged
not to be worth it.
5.3 Fetal data results
The fetal datasets 3A and 3B were tested on the best performing models trained on datasets
1A and 1B. The results on dataset 1A were not satisfactory, since the model could not
accurately identify the different heart cycles (see figure 11). This could be due to the higher
fetal heart rate compared to adults, if the model was overfitted to the adult heart rate. The
results on dataset 1B were better (see figure 14): the ECG curve could in most cases be
predicted. Because the inputs were cut into heart cycles for models trained on dataset 1B, the
ECG was easier to predict. Abnormalities in the tissue Doppler curves from fetal dataset 3B
gave results unlike a normal ECG for the model trained on dataset 1B, as seen in figure 15.
This is probably because the model was not trained on adult data containing irregular rhythm
and thus has not learned what the corresponding ECG would look like.
Generally, the results of the best performing model trained on dataset 1B indicate that an
algorithm trained on adult data could predict a potential fetal ECG for a patient with regular
heart rhythm. For predicting a fetal ECG of a patient with irregular heart rhythm, however,
more training and testing data would be necessary to assess feasibility and performance.
5.4 Results regarding the aim
The first aim of this project was to produce a model that could generate a plausible simulated
ECG curve from tissue Doppler ultrasound data. Figure 14 shows that the best performing
model on dataset 1B could produce an ECG signal with the characteristic features of an
ECG; it is therefore considered plausible, and this aim is consequently considered met.
The second aim was to obtain statistical results for the models when tested on adult data. The
first statistical aim was to produce a result better than noise; an example of a result equal to
noise can be found in figure 18, which received a visual score of 0. The best performing
model in this paper obtained an average visual score of 8.2, which is considered better than
noise. The second statistical aim was to have the P-wave, QRS complex and T-wave visible
in every heart cycle for 90 % of the test samples; examples of models achieving this are
shown in figures 13 and 19. The third statistical aim was a PCC score greater than 0.7, which
was met by the best performing model trained on dataset 1B (see table 3).
5.5 General improvements of models
As exemplified in figure 8, the datasets for this study contained a lot of low-quality data,
which reduced the amount of useful data for training and testing. Improving overall data
quality would increase the amount of viable data and produce more accurate models for both
datasets. The amount of data could also be increased so the models can more accurately learn
the correlation between the movement of the heart and the ECG. More data from diverse
ranges of heart function and rhythm could also help all models correctly present an ECG for
each abnormal heart function case.
For dataset 2 there are many possible improvements, apart from simply having more data.
Since a lot of pre-processing is done to the cine-loop data to make it usable for machine
learning, changes to that pre-processing can lead to large improvements in terms of better
algorithms or faster training times. Given more time, many different ways of reducing frame
size and measuring change over time in the cine-loops could be tested. This could be done
automatically via code, but on a good home computer testing just one pre-processing type
would take around ten hours, meaning weeks to find the best type.
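The two pre-processing directions mentioned above, reducing frame size and measuring change over time, could be sketched as follows. The frame dimensions and block-averaging factor are assumptions for illustration, not the thesis's actual pipeline.

```python
# Sketch: two cine-loop pre-processing ideas - reducing frame size by block
# averaging, and capturing rate of change via frame-to-frame differences.
import numpy as np

# Placeholder cine-loop: 50 frames of 64x64 pixels (assumed dimensions).
loop = np.random.default_rng(0).random((50, 64, 64))

def downscale(frames, factor=4):
    """Block-average each frame by `factor` in both spatial dimensions."""
    f, h, w = frames.shape
    return frames.reshape(f, h // factor, factor,
                          w // factor, factor).mean(axis=(2, 4))

small = downscale(loop)                      # shape (50, 16, 16)
rate = np.diff(small, axis=0)                # change between frames, (49, 16, 16)
features = rate.reshape(rate.shape[0], -1)   # one feature row per transition
print(features.shape)                        # (49, 256)
```

Each pre-processing variant changes the feature count fed to the network, which is why it has such a large effect on both training time and accuracy.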
The size of dataset 2 meant that each single neural net training iteration took a very long
time, so an optimisation over a large number of parameters would take too long to compute.
Therefore, only the variables that contributed most to the end score were optimised. With
enough time, the remaining parameters could be optimised as well, possibly achieving even
better results.
The highest scoring model in dataset 1 was the one trained on dataset 1B. Since this dataset
uses data portioned into heart cycles, the ECG and tissue Doppler data must be manually
processed to fit the model, which is time-consuming and not ideal in a clinical setting. An
improvement would therefore be to cut the data into heart cycle portions automatically, for
example using another ML model, to save time and simplify processing in a clinical setting.
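The suggested automatic heart-cycle cutting could, for instance, be approached with simple peak detection on the tissue Doppler velocity trace. The synthetic trace, sampling rate and minimum peak distance below are all assumptions for illustration.

```python
# Sketch: detect one peak per cycle in a velocity trace and split the signal
# between consecutive peaks to obtain heart-cycle portions.
import numpy as np
from scipy.signal import find_peaks

fs = 100                                      # samples per second (assumed)
t = np.arange(0, 10, 1 / fs)
velocity = np.sin(2 * np.pi * 1.2 * t)        # synthetic ~1.2 Hz "heartbeat"

# Require peaks at least 0.5 s apart to avoid splitting within a cycle.
peaks, _ = find_peaks(velocity, distance=int(0.5 * fs))
cycles = [velocity[a:b] for a, b in zip(peaks[:-1], peaks[1:])]
print(f"found {len(peaks)} peaks -> {len(cycles)} complete cycles")
```

Real fetal traces are noisier than this synthetic example, so a learned segmenter or additional filtering would likely be needed in practice.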
5.6 Improvements for results on fetal data
The ultrasound and ECG data used to create all the algorithms in this paper came from adult
patients, while the goal was to simulate ECG for fetuses. The differences in cardiac
physiology between an adult and a fetus make the use of adult data imperfect, but the
unavailability of prenatal ECG makes it a good starting point. One possible improvement
would be to use ultrasound and ECG data acquired from animal fetuses with physiology
similar to humans as training data.
5.7 Results of study on the future of fetal diagnostics
Earlier studies on fetal ECG detection have focused more on extracting the fetal ECG from
ECG measurements done on the mother; the paper "A robust fetal ECG detection method for
abdominal recordings" [32] is an example, proposing the use of a priori information about
interference signals to distinguish the fetal ECG from the mother's signal and noise. Our
study has instead utilised the possibility of getting fetal ECG data without interference from
the mother's body, which could give less noisy results due to less noisy input. The results of
this study indicated that the correlation between ultrasound data and ECG data can be found
and used to simulate fetal ECG. Further studies could develop the method proposed in this
paper with a larger, higher quality training dataset more similar in function to the fetal data,
as well as data with a multitude of variations: abnormal and normal hearts with regular and
irregular heart rhythm. Further processing of data could also benefit the results. The
proposed method does not, however, reduce the "black-box" effect that it and many other
ML methods exhibit. The correlation between ultrasound data and ECG is still largely
unknown, which could hinder adoption of the method since it might be considered
unconfirmed.
Although the highest scoring model in terms of PCC was trained on dataset 1B, continuation
of this project would be better served by datasets similar to dataset 2E, since dataset 1B
requires manual pre-processing while dataset 2E only uses automatic pre-processing, which
can more easily be scaled up to larger amounts of data.
6 Conclusion
We have shown that the proposed method of using ML algorithms to produce a simulated
ECG curve from ultrasound is a viable and informative route for obtaining an adult ECG, and
a potentially informative route for obtaining a possible fetal ECG. We have also obtained the
statistical aims in this paper for one of the models presented.
Specifically, a multilayer perceptron network with approximately 50 manually processed
training samples can predict an adult ECG with an average PCC score of 0.75 to the true
ECG, using tissue Doppler ultrasound as the input parameter. We have also shown that using
automatically processed cine-loops as the input parameter in a similar multilayer perceptron
network is also a promising method for simulating adult ECG. Finally, we have results
supporting that the proposed method of obtaining an ECG in adults could also be used for
fetuses, and that our best performing model can produce a plausible fetal ECG.
We therefore conclude that the aims of this paper were met. Further development of the
proposed method could strengthen the claims of this paper and lead to a clinical method of
obtaining a fetal ECG.
7 References
[1] WHO, "Electrocardiograph, ECG," Core medical equipment - Information, 2011.
[Online]. Available:
https://www.who.int/medical_devices/innovation/electrocardiograph.pdf.
[2] K. Jeffrey, S. Elizabeth, S. Robin, and V.-C. Lilliam, "ABDOMINAL FETAL EKG
NOISE REMOVAL. 171," Pediatric Research, vol. 39, no. S4, p. 31, 1996, doi:
10.1203/00006450-199604001-00190.
[3] M. Poessel, "Waves, motion and frequency: the Doppler effect," Einstein Online, vol.
5, 2011.
[4] L. S. Lilly, ed. , Pathophysiology of Heart Disease: A Collaborative Project of
Medical Students and Faculty sixth ed ed. Lippincott Williams & Wilkins, 2016.
[5] M. L. P. Å. Öberg, Medicin och Teknik. Studentlitteratur, 2016.
[6] ACLS Medical Training. "The Basics of ECG." https://www.aclsmedicaltraining.com/basics-of-
ecg/ (accessed).
[7] I. Texas Heart, "Fetal Heart," (in eng), TexasHeart. [Online]. Available:
https://www.texasheart.org/heart-health/heart-information-center/topics/the-fetal-
heart/.
[8] B. A. Pildner von Steinburg S, Lederer C, Grunow S, Schiermeier S, Hatzmann W,
Schneider KM, Daumer M., "What is the “normal” fetal heart rate?," PeerJ, vol.
1:e82, 2013.
[9] J. G. Betts et al., "Fetal Development," in Anatomy and Physiology. Houston, Texas:
OpenStax, 2013.
[10] L. Sanapo, J. D. Pruetz, M. Słodki, M. B. Goens, A. J. Moon-Grady, and M. T.
Donofrio, "Fetal echocardiography for planning perinatal and delivery room care of
neonates with congenital heart disease," Echocardiography, vol. 34, no. 12, pp. 1804-
1821, 2017, doi: 10.1111/echo.13672.
[11] C. R. Deo, "Machine Learning in Medicine," Circulation, vol. 132, no. 20, pp. 1920-
1930, 2015, doi: 10.1161/CIRCULATIONAHA.115.001593.
[12] T. Hastie, J. Friedman, and R. Tibshirani, The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. New York, NY: New York, NY: Springer New
York, 2001.
[13] G. Rebala, An Introduction to Machine Learning, 1st ed. 2019.. ed. Cham : Springer
International Publishing : Imprint: Springer, 2019.
[14] J. Dacombe. "An introduction to Artificial Neural Networks." Medium.
https://medium.com/@jamesdacombe/an-introduction-to-artificial-neural-networks-
with-example-ad459bb6941b (accessed 10 May, 2020).
[15] Z. Z. Li, Z. Y. Zhong, and L. W. Jin, "Identifying best hyperparameters for deep
architectures using random forests," vol. 8994, ed, 2015, pp. 29-42.
[16] S. Sharma. "Activation Functions in Neural Networks." Medium.
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
(accessed 10 May, 2020).
[17] G. James, An Introduction to Statistical Learning with Applications in R, 1st ed. 2013..
ed. New York, NY : Springer New York : Imprint: Springer, 2013.
[18] M. Titterington, "Neural networks," Wiley Interdisciplinary Reviews: Computational
Statistics, vol. 2, no. 1, pp. 1-8, 2010, doi: 10.1002/wics.50.
[19] S. Yatawatta, H. Spreeuw, and F. Diblen, "Improving LBFGS Optimizer in PyTorch:
Knowledge Transfer from Radio Interferometric Calibration to Machine Learning,"
ed, 2018, pp. 386-387.
[20] Scikit-learn. "MLPRegressor." Scikit-learn. https://scikit-
learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
(accessed 11 May, 2020).
[21] A. Prieditis and S. Sapp, "Lazy overfitting control," vol. 7988, ed, 2013, pp. 481-491.
[22] N. Redell, "Shapley Decomposition of R-Squared in Machine Learning Models,"
2019.
[23] P. Sedgwick, "Pearson’s correlation coefficient," BMJ : British Medical Journal, vol.
345, no. jul04 1, 2012, doi: 10.1136/bmj.e4483.
[24] P. Garcia-Canadilla, S. Sanchez-Martinez, F. Crispi, and B. Bijnens, "Machine
Learning in Fetal Cardiology: What to Expect," Fetal diagnosis and therapy, pp. 1-10,
2020, doi: 10.1159/000505021.
[25] Q. Yu et al., "Automatic identifying of maternal ECG source when applying ICA in
fetal ECG extraction," Biocybernetics and Biomedical Engineering, vol. 38, no. 3, pp.
448-455, 2018, doi: 10.1016/j.bbe.2018.03.003.
[26] P. R. Muduli, R. R. Gunukula, and A. Mukherjee, "A deep learning approach to fetal-
ECG signal reconstruction," ed, 2016, pp. 1-6.
[27] M. Lukosevicius and V. Marozas, "Noninvasive fetal QRS detection using Echo State
Network," vol. 40, ed, 2013, pp. 205-208.
[28] E. Sulas, E. Ortu, L. Raffo, M. Urru, R. Tumbarello, and D. Pani, "Automatic
Recognition of Complete Atrioventricular Activity in Fetal Pulsed-Wave Doppler
Signals," vol. 2018-, ed, 2018, pp. 917-920.
[29] J. L. Rojo-Alvarez, A. Arenal-Maiz, and A. Artes-Rodriguez, "Support vector black-
box interpretation in ventricular arrhythmia discrimination," IEEE Engineering in
Medicine and Biology Magazine, vol. 21, no. 1, pp. 27-35, 2002, doi:
10.1109/51.993191.
[30] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," 2014.
[31] J. G. G. Andrew, "Scalable Training of L1 Regularized Log-Linear Models,"
presented at the International Conference on Machine Learning, 2007.
[32] S. M. M. Martens, C. Rabotti, M. Mischi, and R. J. Sluijter, "A robust fetal ECG
detection method for abdominal recordings," Physiol Meas, vol. 28, no. 4, pp. 373-
388, 2007, doi: 10.1088/0967-3334/28/4/004.
Appendix 1: Optimised parameters for learning models
Dataset 1A
Solver & activation
Parameters
LBFGS + ReLU {'activation': 'relu', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
201, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 23000,
'max_iter': 40, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None, 'shuffle':
True, 'solver': 'lbfgs', 'tol': 0.0006000000000000001, 'validation_fraction':
0.45, 'verbose': False, 'warm_start': False}
LBFGS + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
261, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 32000,
'max_iter': 65, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None, 'shuffle':
True, 'solver': 'lbfgs', 'tol': 0.0008, 'validation_fraction': 0.2, 'verbose': False,
'warm_start': False}
LBFGS + tanh {'activation': 'tanh', 'alpha': 0.00014000000000000001, 'batch_size': 'auto',
'beta_1': 0.3, 'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08,
'hidden_layer_sizes': 221, 'learning_rate': 'constant', 'learning_rate_init':
0.95, 'max_fun': 22000, 'max_iter': 70, 'momentum': 0.9,
'n_iter_no_change': 5010, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0008,
'validation_fraction': 0.25, 'verbose': False, 'warm_start': False}
Adam + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1':
0.7, 'beta_2': 0.5, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 281, 'learning_rate': 'constant',
'learning_rate_init': 0.04, 'max_fun': 15000, 'max_iter': 70, 'momentum':
0.9, 'n_iter_no_change': 30, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol':
0.0006, 'validation_fraction': 0.1, 'verbose': False,
'warm_start': False}
Adam + ReLU {'activation': 'relu', 'alpha': 4e-05, 'batch_size': 'auto', 'beta_1':
0.7, 'beta_2': 0.7, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 181, 'learning_rate': 'constant',
'learning_rate_init': 0.1, 'max_fun': 15000, 'max_iter':
20, 'momentum': 0.9, 'n_iter_no_change': 40, 'nesterovs_momentum': True,
'power_t': 0.5, 'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol':
0.0008, 'validation_fraction': 0.3, 'verbose': False, 'warm_start': False}
Adam + tanh {'activation': 'tanh', 'alpha': 4e-05, 'batch_size': 'auto', 'beta_1':
0.5, 'beta_2': 0.7, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 221, 'learning_rate': 'constant',
'learning_rate_init': 0.06, 'max_fun': 15000, 'max_iter':
60, 'momentum': 0.9, 'n_iter_no_change': 50, 'nesterovs_momentum': True,
'power_t': 0.5, 'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol':
0.0008, 'validation_fraction': 0.05, 'verbose': False, 'warm_start': False}
SGD + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 241, 'learning_rate': 'adaptive', 'learning_rate_init':
0.01, 'max_fun': 15000, 'max_iter': 35, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol': 0.0004,
'validation_fraction': 0.35, 'verbose': False, 'warm_start':
False}
SGD + tanh {'activation': 'tanh', 'alpha': 0.00018, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 161, 'learning_rate': 'constant', 'learning_rate_init':
0.01, 'max_fun': 15000, 'max_iter': 65, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol': 0.0002,
'validation_fraction': 0.35, 'verbose': False, 'warm_start':
False}
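The keys in these listings correspond one-to-one to the constructor arguments of scikit-learn's multilayer-perceptron estimators, so a tuned configuration can be applied by unpacking the dict. A minimal sketch, assuming scikit-learn is installed; `MLPRegressor` is used here for illustration (the report's estimator class is not restated in this appendix), and entries left at the library defaults are omitted:

```python
from sklearn.neural_network import MLPRegressor

# Optimised settings for LBFGS + ReLU on dataset 1A (from the table above);
# parameters at their scikit-learn defaults are omitted for brevity.
params = {
    "activation": "relu",
    "alpha": 0.00012,
    "early_stopping": True,
    "hidden_layer_sizes": 201,
    "learning_rate_init": 0.95,
    "max_fun": 23000,
    "max_iter": 40,
    "solver": "lbfgs",
    "tol": 0.0006,
    "validation_fraction": 0.45,
}

model = MLPRegressor(**params)  # ready for model.fit(X_train, y_train)
```

With `solver='lbfgs'`, scikit-learn ignores the SGD/Adam-specific entries (`momentum`, `beta_1`, `beta_2`), so dropping them from the dict does not change the model.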
Dataset 1B
Solver & activation
Parameters
LBFGS + ReLU {'activation': 'relu', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
1, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 16000,
'max_iter': 45, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0004, 'validation_fraction': 0.1,
'verbose': False, 'warm_start': False}
LBFGS + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
121, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 30000,
'max_iter': 45, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0002, 'validation_fraction': 0.2,
'verbose': False, 'warm_start': False}
LBFGS + tanh {'activation': 'tanh', 'alpha': 0.0001, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
221, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 26000,
'max_iter': 65, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0008, 'validation_fraction': 0.05,
'verbose': False, 'warm_start': False}
Adam + logistic {'activation': 'logistic', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1':
0.9, 'beta_2': 0.9, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 221, 'learning_rate': 'constant',
'learning_rate_init': 0.05, 'max_fun': 15000, 'max_iter': 65, 'momentum':
0.9, 'n_iter_no_change': 40, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol': 0.0008,
'validation_fraction': 0.35, 'verbose': False, 'warm_start':
False}
Adam + ReLU {'activation': 'relu', 'alpha': 6e-05, 'batch_size': 'auto',
'beta_1': 0.3, 'beta_2': 0.7,
'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes': 281,
'learning_rate': 'constant', 'learning_rate_init': 0.04, 'max_fun': 15000,
'max_iter': 65, 'momentum': 0.9, 'n_iter_no_change': 50,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'adam', 'tol': 0.0004, 'validation_fraction': 0.4,
'verbose': False, 'warm_start': False}
Adam + tanh {'activation': 'tanh', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1':
0.7, 'beta_2': 0.9, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 81, 'learning_rate': 'constant',
'learning_rate_init': 0.03, 'max_fun': 15000, 'max_iter': 65, 'momentum':
0.9, 'n_iter_no_change': 40, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol': 0.0008,
'validation_fraction': 0.3, 'verbose': False, 'warm_start': False}
SGD + logistic {'activation': 'logistic', 'alpha': 2e-05, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 181, 'learning_rate': 'constant', 'learning_rate_init':
0.001, 'max_fun': 37000, 'max_iter': 65, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol':
0.0006, 'validation_fraction': 0.25, 'verbose': False,
'warm_start': False}
SGD + tanh {'activation': 'tanh', 'alpha': 6e-05, 'batch_size': 'auto',
'beta_1': 0.9, 'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 141, 'learning_rate': 'constant', 'learning_rate_init':
0.001, 'max_fun': 20000, 'max_iter': 25, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol':
0.0006, 'validation_fraction': 0.3, 'verbose': False,
'warm_start': False}
SGD + ReLU {'activation': 'relu', 'alpha': 0.0001, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 201, 'learning_rate': 'constant', 'learning_rate_init':
0.001, 'max_fun': 45000, 'max_iter': 10, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol': 0.0004,
'validation_fraction': 0.1, 'verbose': False, 'warm_start': False}
Dataset 2B
Solver & activation
Parameters
LBFGS + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.1, max_fun=20, max_iter=20,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.1, max_fun=35, max_iter=35,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.12, max_fun=70, max_iter=5,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=70,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Dataset 2C
Solver & activation
Parameters
LBFGS + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=50, learning_rate='constant',
learning_rate_init=0.1, max_fun=5, max_iter=5, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.1, max_fun=45, max_iter=45,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=30, learning_rate='constant',
learning_rate_init=0.08, max_fun=70, max_iter=5,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=50, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=20,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0008,
validation_fraction=0.1, verbose=False, warm_start=False)
Dataset 2D
Solver & activation
Parameters
LBFGS + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=110, learning_rate='constant',
learning_rate_init=0.1, max_fun=5, max_iter=5, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=170, learning_rate='constant',
learning_rate_init=0.1, max_fun=25, max_iter=25,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.08, max_fun=70, max_iter=70,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0008,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=150, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=10,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Dataset 2E
Solver & activation
Parameters
LBFGS + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=150, learning_rate='constant',
learning_rate_init=0.1, max_fun=5, max_iter=5, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1, max_fun=15,
max_iter=15, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='lbfgs', tol=0.01, validation_fraction=0.1,
verbose=False, warm_start=False)
LBFGS + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=110, learning_rate='constant',
learning_rate_init=0.1, max_fun=25, max_iter=25,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1, max_fun=5,
max_iter=5, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='lbfgs', tol=0.01, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.15, max_fun=70,
max_iter=25,
momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True,
power_t=0.5, random_state=20, shuffle=True, solver='adam',
tol=0.0002, validation_fraction=0.1, verbose=False,
warm_start=False)
Adam + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.02, max_fun=70,
max_iter=15, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=130, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=5,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant',
learning_rate_init=0.06,
max_fun=70, max_iter=20, momentum=0.9,
n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Dataset 2F
Solver & activation
Parameters
LBFGS + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=50, learning_rate='constant',
learning_rate_init=0.1, max_fun=45, max_iter=45,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1,
max_fun=40,
max_iter=40, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5,
random_state=20,
shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1,
verbose=False, warm_start=False)
LBFGS + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=150, learning_rate='constant',
learning_rate_init=0.1, max_fun=20, max_iter=20,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1, max_fun=60,
max_iter=60, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='lbfgs', tol=0.01, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=70,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True,
power_t=0.5,
random_state=20, shuffle=True, solver='adam',
tol=0.0002,
validation_fraction=0.1, verbose=False,
warm_start=False)
Adam + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.02, max_fun=70,
max_iter=15, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=170, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=15,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.12, max_fun=70,
max_iter=25, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Appendix 2: Parameter intervals for optimization
Parameter Interval
learning_rate_init 0.01-0.2 with step 0.01
beta_1 and beta_2 0.1-1 with step 0.2
alpha 0.00002-0.0002 with step 0.00002
validation_fraction 0.05-0.55 with step 0.05
max_iter 5-75 with step 5
tol 0.0002-0.001 with step 0.0002
n_iter_no_change 10-60 with step 10
hidden_layer_sizes 1-300 with step 20
max_fun 10000-50000 with step 1000
learning_rate 'constant', 'invscaling', 'adaptive'
power_t 0-2 with step 0.1
momentum 0.001-1 with step 0.1
nesterovs_momentum True, False
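The intervals above define the candidate values searched during optimisation. Expanding a float interval by repeated addition accumulates rounding error (adding 0.2 to 0.1 three times yields 0.7000000000000001 rather than 0.7), so it is safer to compute each candidate from its index and round. A small stdlib-only sketch; the helper name `frange` and the subset of parameters shown are illustrative, not taken from the report:

```python
def frange(start, stop, step, ndigits=6):
    """Inclusive float range. Each value is computed directly as
    start + i*step and rounded, so rounding error cannot accumulate
    across steps; a small tolerance guards the upper endpoint."""
    n = int(round((stop - start) / step)) + 1
    values = [round(start + i * step, ndigits) for i in range(n)]
    return [v for v in values if v <= stop + abs(step) * 1e-6]

# A subset of the intervals from the table above, expanded into candidates.
param_grid = {
    "learning_rate_init": frange(0.01, 0.2, 0.01),
    "beta_1": frange(0.1, 1, 0.2),
    "alpha": frange(0.00002, 0.0002, 0.00002),
    "tol": frange(0.0002, 0.001, 0.0002),
    "max_iter": list(range(5, 80, 5)),
    "hidden_layer_sizes": list(range(1, 301, 20)),
    "learning_rate": ["constant", "invscaling", "adaptive"],
}
```

A grid built this way produces clean values such as 0.7 and 0.0006, matching the step sizes stated in the table.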
TRITA CBH-GRU-2020:146