DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020
Simulating Fetal ECG Using Machine Learning on Ultrasound Images
MATHILDA VILLOT BERLING
JULIA ÖNERUD
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES IN CHEMISTRY, BIOTECHNOLOGY AND HEALTH
This project was performed in collaboration with the Center for Fetal Medicine, Department of Obstetrics and Gynecology, Karolinska University Hospital.
Supervisors: Jonas Johnson and Lotta Herling
Simulating Fetal ECG Using Machine Learning on Ultrasound Images
Simulering av foster-EKG genom maskininlärning på ultraljudsbilder
MATHILDA VILLOT BERLING, JULIA ÖNERUD
Degree project in medical engineering, first level, 15 hp
Supervisors at KTH: Tobias Nyberg, Mattias Mårtensson
Examiner: Mats Nilsson
KTH Royal Institute of Technology School of Engineering Sciences in Chemistry, Biotechnology and Health
SE-141 86 Flemingsberg, Sweden http://www.kth.se/cbh
2020
Abstract
ECG is used clinically to detect a multitude of medical conditions, such as heart problems like arrhythmias and heart failure, and to give a good general image of the function of the heart with a quick and harmless examination. In many clinical cases, normal ECG measurements cannot be taken, such as with fetuses, where ECG signals from the mother's own body hinder the measurement. This paper examines the use of machine learning algorithms to simulate ECG traces from ultrasound data alone. The algorithms are trained on ultrasound and ECG data acquired simultaneously from the same patient; the training data was taken from samples acquired from 100 adult patients. The results of simulating an ECG with this method indicate good possibilities for future usefulness, where machine learning to acquire a simulated ECG can help clinicians evaluate fetal heart function, as well as in other cases where ECG cannot be measured normally.
Keywords: ECG, Ultrasound, Fetal-ECG, Heart, Machine learning, Simulated
Sammanfattning
ECG is used clinically to detect a variety of conditions, such as heart failure and arrhythmias, but also to give a general picture of heart function through a quick and harmless examination. In many clinical cases, however, normal ECG measurement is not possible, such as for fetuses, where ECG signals from the mother's own body interfere with the measurement. This paper examines the use of machine learning algorithms to simulate ECG traces from ultrasound data alone. These algorithms are trained on ultrasound and ECG data obtained simultaneously from the same examination of a patient. The ultrasound data used here comes from 100 measurements of different adult patients. The results of the investigation of the ECG simulation method indicate good potential for future use, as machine learning algorithms for simulating ECG can assist clinicians in evaluating fetal heart function, or in other cases where ECG cannot be measured normally.
Keywords: ECG, Ultrasound, Fetal ECG, Heart, Machine learning, Simulated
Contents
Abstract
Sammanfattning
Contents
Abbreviations
Glossary
1 Introduction
1.1 Aim
2 Background
2.1 Tissue Doppler
2.2 Electrocardiography
2.3 Fetal Heart Physiology
2.4 Importance of Fetal Echocardiography
2.5 Machine Learning
2.6 Artificial Neural Networks
2.7 Statistical Methods
2.8 Machine Learning in Fetal Cardiology
3 Method
3.1 Datasets
3.2 Programming language and hardware
3.3 Initial processing of training datasets
3.4 Further processing of training datasets
3.4.2 Dataset 1B: Tissue Doppler dataset of one heart cycle
3.4.4 Dataset 2B: Cine-loop dataset, rate of change
3.4.5 Dataset 2C, 2D, 2E and 2F: Cine-loop dataset, rate of change and minimized data
3.5 Training the algorithm
3.6 Evaluating performance
3.6.1 Dataset 1A and 1B: The tissue Doppler datasets
3.6.2 Dataset 2: The cine-loop datasets
3.6.3 Dataset 2F: The best performing cine-loop dataset
3.7 Testing on fetal data
4 Results
4.1 Dataset 1A: The tissue Doppler dataset of multiple heart cycles
4.2 Dataset 1B: The tissue Doppler dataset of one heart cycle
4.3 Dataset 2B through E: The cine-loop datasets
4.4 Dataset 2F: The cine-loop dataset, rate of change and minimized data
5 Discussion
5.1 Results on dataset 1
5.2 Results on dataset 2
5.3 Fetal data results
5.4 Results regarding aim
5.5 General improvements of models
5.6 Improvements for results on fetal data
5.7 Results of study on the future of fetal diagnostics
6 Conclusion
7 References
Appendix 1: Optimised parameters for learning models
Abbreviations
ML – Machine learning
ANN – Artificial neural network
MSE – Mean squared error
ECG – Electrocardiogram
PCC – Pearson correlation coefficient
ROI – Region of interest
Glossary
Hyperparameters – settings such as the number of layers and nodes in an ANN
Mean squared error – the average of the squared differences between predicted and true values
Variance – the flexibility of a model
Bias – the systematic error of a model
Overfitting – a model highly adapted to its training data that generalizes poorly
R-squared – a statistical measure based on explained variance
Pearson – a statistical measure based on linear correlation
Cine-loop – echocardiography images stored digitally as a sequence with a determined number of frames
1 Introduction
Electrocardiography (ECG) is used clinically to detect a multitude of medical conditions, such as arrhythmias and heart failure, and to give a good general image of the function of the heart [1]. In cases of fetal cardiac dysfunction and structural cardiac anomalies, it is important for both the fetus and the mother that these problems are detected prenatally, to minimise perinatal complications and to allow more time to prepare for possible post-birth surgeries or interventions. However, the results from classical ECG are of significantly diminished quality when performed on a fetus compared to a postnatal patient, because the mother's own electrical signals from the heart and body add a large amount of noise to the measurement [2]. A method of obtaining a fetal ECG could therefore be an important tool for diagnosing cardiac conditions.
Echocardiography can be used to obtain images of the fetal heart, and with tissue Doppler ultrasonography the velocity of the fetal heart walls can be measured in order to evaluate fetal heart function. Tissue Doppler ultrasonography shows promise in assessing fetal cardiac function, but it requires an experienced sonographer spending a large amount of time analysing the data. An ECG curve could therefore be of great help in evaluating fetal cardiac function more accurately and efficiently, owing to the simplicity of the ECG.
Many studies have examined different methods of extracting the fetal ECG signal from the ECG signal of the mother, via filtering and data separation, but these methods have accuracy problems because the noise cannot be completely removed [2]. The solution proposed here is instead to obtain the supplementary ECG from the less noisy ultrasound measurements by using algorithms trained with machine learning. These algorithms learn from data from adult patients containing both ultrasound measurements and classical ECG measurements. Once trained, the algorithms are tested on prenatal patients. The aim of this project is therefore to create machine learning algorithms that can produce an ECG from ultrasound data and to assess how well they work.
1.1 Aim
The aim of this project was to train a model that could produce a plausible simulated ECG curve from tissue Doppler ultrasound data sampled from fetuses, and, for unseen ultrasound data from adults, also achieve:
1) A result better than noise
2) P-wave, QRS-complex and T-wave visible in every heart cycle for 90% of the test
samples
3) A mean Pearson correlation coefficient (PCC) score of at least 0.7
2 Background
The background of this project is twofold: knowledge about the diagnostic techniques and physiology, combined with the science of machine learning. This background section provides the insights needed to understand our study in both these respects.
2.1 Tissue Doppler
Tissue Doppler is a form of echocardiography that measures the velocity of the myocardium (heart muscle) throughout one or more heartbeats using the Doppler effect. The Doppler effect is the principle that ultrasound reflected back from an object has an altered frequency depending on the velocity of the object it is reflected on [3]. Therefore, by comparing the frequencies sent out and received by the transducer, motion can be deduced [3]. The velocities of the myocardium, the heart valves and the blood can all be used to find signs of heart defects and problems, which makes tissue Doppler one of the more important modalities when it comes to cardiovascular defects and diseases.
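For reference, the tissue velocity follows from the measured frequency shift via the standard Doppler relation (this equation is not written out in the source; the symbols are the conventional ones):

```latex
v = \frac{c \, \Delta f}{2 f_0 \cos\theta}
```

where v is the tissue velocity, c the speed of sound in tissue, Δf the measured frequency shift, f0 the transmitted frequency, and θ the angle between the ultrasound beam and the direction of motion.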
2.2 Electrocardiography
Electrocardiography is the process of creating an ECG, a graph of the measured electrical activity of the heart. The electrical activity is measured using a number of electrodes placed in direct contact with the skin, which detect the small electrical changes in the body that result from cardiac muscle depolarization and repolarization during each heartbeat [4]. The ECG contains three main parts, known as the P-wave, the QRS complex and the T-wave (Figure 1). The P-wave represents atrial depolarization, the T-wave represents the repolarization of the ventricles, and the most important part, the QRS complex, represents a combination of the depolarization of the left and right ventricles and the contraction of the large ventricular muscles [5]. Clinicians can quickly detect cardiac anomalies by looking at the ECG, and these three main parts in particular, to see whether the amplitude of a certain part looks abnormal or a certain interval is too long.
Figure 1: Left: Schematic diagram of normal sinus rhythm for a human heart as seen on ECG (with English labels), with the P wave representing the depolarization of the left and right atrium, the QRS complex representing electrical impulses spreading through the ventricles (ventricular depolarization), and the T wave representing ventricular repolarization [6]. Right: ECG (green) and simultaneous tissue Doppler (yellow) in a combined plot.
2.3 Fetal Heart Physiology
The heart of the fetus is markedly different from an adult heart, both in physiology and function. These differences are partly due to the fetus still being in stages of development, having a much higher number of stem cells in circulation and vastly different circulatory needs compared to an adult [7]. One clear difference that exists due to this is the much higher heart rate of a fetus compared to an adult, with heart rates between 120 and 160 bpm considered normal [8]. The fetus is also fully dependent on the placenta, which is located inside the womb with connections to both the uterus and the liquid-filled sac within which the fetus is held. Oxygen and nourishment are transferred through the placenta and via the umbilical cord to the fetus, and there is no direct contact between the circulatory systems of the fetus and the mother. The lungs of the fetus are filled with amniotic fluid during gestation, and only a small amount of blood is pumped past the lungs [7].
Figure 2: Fetal Circulatory System-02.jpg, CC BY 3.0 License [9]
Since there is less need for blood to pass the lungs while they are filled with amniotic fluid, the fetal heart does not have a separate pulmonary artery and aorta; instead they are connected by a blood vessel called the ductus arteriosus. This extra blood vessel closes after birth, and the pulmonary artery and aorta become separate. There is also an opening between the left and right atria in the fetal heart, called the foramen ovale, which allows blood to flow directly from the right atrium to the left (Figure 2) [9]. As with the ductus arteriosus, the foramen ovale also closes and disappears shortly after birth [7].
2.4 Importance of Fetal Echocardiography
Fetal ECG is important because it would aid the clinician in correctly diagnosing the fetus, which in turn makes two things possible. Firstly, it helps with planning the perinatal management and identifying what kind of intervention may be required in the delivery room or within the first days of life. Secondly, it helps to identify fetuses who may benefit from fetal cardiac intervention, meaning medication or surgery on the fetus's heart while in the womb [10]. Fetal echocardiography is used to detect arrhythmias, a collection of ailments in which the heart beats irregularly. Examples include atrioventricular block (AV-block), an impairment of the electrical signals in which the atria and ventricles beat asynchronously, and supraventricular extrasystole (SVES), an early depolarisation that causes the heart to beat irregularly [5].
2.5 Machine Learning
Machine learning (ML) is the science of computational learning, combining statistics with computer science to build algorithms that can process data and derive complex conclusions and models that would otherwise be impossible to discern. ML algorithms can be categorized in different ways, but all are based on inputs, the measured data, which in turn affect the output of a system [11].
2.6 Artificial Neural Networks
There are many different machine learning methods for achieving a well-performing model; a very flexible and versatile one is the artificial neural network (ANN), a nonlinear statistical model [12]. According to Rebala [13], ANNs were initially created to mimic, in an oversimplified manner, the function of neurons in the human brain: each neuron is modelled with multiple inputs and one single output, and the neurons are connected to each other in a network. He further states that the neurons, or nodes, form columns, or layers; nodes are not connected to each other within a layer, but the layers are connected to each other via each node (Figure 3). In computer science terms, the artificial neuron is simply a function, regulated by a weight factor that controls the strength of its impact on other artificial neurons, or functions, via their connection [13].
Figure 3: Schematic overview of an ANN; the rectangles represent the different layers, circles are the artificial neurons/nodes, and thin lines represent the connections between nodes. a) input layer b) hidden layer c) output layer.
The input layer of artificial neurons gathers input data from the given dataset and sends that information to the next layer, weighted accordingly [14]. The middle, or hidden, layers process this input with an activation function, typically a sigmoid function, and send the information to the output layer, which in turn produces an output [13]. An artificial neural network can have multiple layers and multiple nodes in each layer; these are examples of hyperparameters [15]. Sharma [16] explains the sigmoid functions as a class of functions with similar shapes and attributes, resembling an 'S' shape (Figure 4, right); examples include Softmax, the logistic function and tanh. Their purpose as activation functions is to make the connections non-linear, which is needed for the ANN to find complex correlations [17]. Another activation function is ReLU (Figure 4, left), which according to Sharma [16] is the most commonly used activation function today.
Figure 4: Example of a Sigmoid function with typical 'S' shape (right) and ReLU function (left).
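The two activation functions in Figure 4 are easy to state directly; a small illustration in plain NumPy (not taken from the thesis):

```python
import numpy as np

def logistic(x):
    # Logistic sigmoid: squashes any input into (0, 1), giving the 'S' shape
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU: passes positive values through unchanged, clips negatives to 0
    return np.maximum(0.0, x)

print(logistic(0.0))              # 0.5
print(relu(-3.0), relu(2.5))      # 0.0 2.5
```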
For the artificial neural network to correctly model the behaviour of a system, it needs to be trained on a given dataset. The standard approach to training is to change the weights according to the stochastic gradient descent (SGD) method, although other methods exist [18]. The SGD method uses partial derivatives of a loss function (a function defining how wrong the algorithm is) with respect to the weights in order to find a local minimum of the loss function and change the weights accordingly [13]. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is another extensively used optimisation algorithm; although it has limitations, it often converges faster than standard SGD [19]. A variant of the SGD method is Adam, presented by Scikit-learn [20] as an SGD-based optimiser that works well on larger datasets. They further explain that LBFGS is more useful for smaller datasets, with faster convergence and better results.
The multilayer perceptron is a basic ANN, with inputs flowing through the network in a unidirectional way, forwards [18]. The documentation for the Multilayer Perceptron Regressor (MLPRegressor) from Scikit-learn describes the learning algorithm with 23 tuning parameters, for example the hidden layer sizes, the activation function and the solver (optimisation method) [20]. The documentation further shows that the available solvers are LBFGS, SGD and Adam. The MLPRegressor has five main methods: "fit", which uses training data (both input and target) to train the model; "predict", which predicts an output for a given input after training; "score", which evaluates the model; and "get_params" and "set_params", which configure the parameters [20].
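As a minimal sketch of the Scikit-learn API described above (the toy data and network settings are invented for this example, not taken from the thesis):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy regression data, invented for this example: learn y = sin(x)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel()

# Small network; LBFGS tends to suit small datasets like this one
model = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                     solver="lbfgs", max_iter=2000, random_state=0)
model.fit(X, y)                   # train on input X and target y
y_pred = model.predict([[0.5]])   # predict for an unseen input
print(model.score(X, y))          # R-squared on the training data
```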
2.7 Statistical Methods
The most common way to evaluate the accuracy of an ML regression model is the mean squared error (MSE) [17]. Since there is little interest in how well the model performs on training data, an unseen portion of the dataset, called the testing data, is used to assess model performance. Overfitting is a common problem in ML algorithms, not least in artificial neural networks. Overfitting occurs when the algorithm is too flexible with regard to the training portion of the dataset and perceives patterns occurring randomly in the training dataset that are not properties of the system (Figure 5) [21]. When the loss function on the training data is driven to its minimum, the model is usually overfitted. To prevent overfitting there are a number of methods that reduce the flexibility of the model, for example weight decay or an early stopping rule [12].
Figure 5: Example of a regression problem a) Overfitted example b) Not overfitted example c) true regression
R-squared (R2) is a commonly used statistical measure quantifying how much of the variance in a regression problem the model explains; it measures the overall fit of the model on a scale from negative infinity to 1, with higher scores indicating a better fit [22]. Another statistical measure is the Pearson correlation coefficient (PCC), which measures the strength of the linear association between two variables on a scale from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating uncorrelated variables [23].
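All three measures are available through standard library calls; a small sketch on invented example signals (the traces below stand in for a true and a predicted ECG):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

# Invented example signals: a "true" ECG-like trace and an offset prediction
t = np.linspace(0, 1, 100)
y_true = np.sin(2 * np.pi * t)
y_pred = y_true + 0.1

mse = mean_squared_error(y_true, y_pred)   # lower is better
r2 = r2_score(y_true, y_pred)              # at most 1, can be negative
pcc, _ = pearsonr(y_true, y_pred)          # -1 to 1; 1 = perfect linear fit
print(mse, r2, pcc)
```

Note that the constant offset leaves the PCC at 1 while still costing MSE and R2, which is why the thesis reports several measures side by side.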
2.8 Machine Learning in Fetal Cardiology
Garcia-Canadilla et al. [24] state that ML in fetal cardiology is of great interest and under active development, since evaluation of cardiac function and structures in fetuses often faces challenges, for example fetal movement, small heart size and inexperienced medical personnel. Garcia-Canadilla et al. also state that ML can facilitate assessment of the fetal heart, for example by improving image acquisition, extracting information for evaluation and diagnosing abnormalities. Many papers have been published on extracting the maternal ECG from abdominal ECG readings to produce a fetal ECG or fetal QRS complexes with the use of machine learning methods. For example, Yu et al. [25] propose using independent component analysis, Muduli et al. [26] focus on deep learning, and Lukosevicius et al. [27] propose a method using an ANN. Another approach, from Sulas et al. [28], is to use data from pulsed-wave Doppler to extract features, including the fetal heartbeat, using an ANN.
One issue that arises when ML methods are used to diagnose conditions is the "black-box" effect, which is especially apparent in deep learning methods [24]. The "black-box" effect is problematic because the decisions made by the model cannot be logically followed by medical personnel; most ML methods are completely non-transparent [29].
3 Method
Several datasets were used in this project to test different methods of training the algorithm. The training datasets (datasets 1 and 2) consist of adult data with corresponding ECG, and the testing dataset (dataset 3) consists of fetal data without corresponding ECG.
3.1 Datasets
The training datasets were based on adult ultrasound data with regular heart rhythm, imaged with a Vivid S6 ultrasound imaging system equipped with an M4S-RS (1.9-4.1 MHz) phased-array transducer (GE CV Ultrasound, Haifa, Israel), with corresponding ECG data recorded simultaneously. The data was exported using the EchoPAC software, version 201 (GE Vingmed Ultrasound AS, Horten, Norway).
Two types of data were exported in the EchoPAC software, which gave rise to dataset 1 and dataset 2. An overview of the training datasets can be found in figure 7.
Dataset 1 consisted of 100 color tissue Doppler ultrasound and ECG samples from 100 different adult patients, with lengths ranging between 1 and 3 seconds. The Doppler ultrasound data were exported by placing a region of interest (ROI) on the septal wall of the heart (Figure 6) in 'q-analysis' mode and exporting the processed velocity curve of that ROI to a .txt file. The ECG data was stored in the same .txt file.
Dataset 2 consisted of 100 ultrasound cine-loops from 100 different adult patients, with cine-loop lengths ranging between 1 and 4 seconds, saved as .avi files, along with corresponding ECG data for each cine-loop, saved in a .txt file.
A testing dataset consisting of fetal data was also used to evaluate the performance of the models on fetal data. The fetal data consisted of tissue Doppler ultrasound data from two patient groups. The first group, dataset 3A, consisted of four samples from fetuses with normal heart function. The second group, dataset 3B, consisted of four samples from fetuses with irregular heart rhythm, where arrhythmias such as AV-block and SVES were present.
Figure 6: Placement of the ROI when extracting tissue velocity data in the EchoPAC software
Figure 7: Overview of the training datasets based on adult ultrasound data
3.2 Programming language and hardware
The language used for processing, training, visualising and evaluating on both datasets was
Python 3.7 (Python Software Foundation, Wilmington, DE, United states), with accompanied
libraries such as Numpy, Scikit-learn, Matplotlib and Scipy. The MLPRegressor from Scikit-
learn was used as our learning algorithm for all datasets. The processing, training and
evaluation of the models were done on a 2017 Macbook Air with 1,8 GHz Intel Core i5
processor for dataset 1 and on a stationary computer using a AMD ryzen 3900x cpu and a
AMD Radeon RX 5700 XT gpu on Windows 10 for dataset 2.
3.3 Initial processing of training datasets
The .txt files from dataset 1 were extracted into arrays and looped to normalise the length to 3 seconds. An interpolation function (interp1d, SciPy) was used on both the tissue Doppler data and the ECG to resample the data to 500 points, so that all data shared a common x-axis. The data was also smoothed using a Savitzky-Golay filter.
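The resampling and smoothing steps for dataset 1 might be sketched as follows (the array contents, filter window and polynomial order are illustrative assumptions, not values from the thesis):

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

# Illustrative velocity trace; in the thesis this comes from the exported .txt file
t = np.linspace(0.0, 3.0, 120)        # original time axis after length normalisation
velocity = np.sin(2 * np.pi * t)      # stand-in for the tissue Doppler curve

# Resample to 500 points on a common x-axis, as done for both velocity and ECG
t_common = np.linspace(0.0, 3.0, 500)
velocity_500 = interp1d(t, velocity)(t_common)

# Smooth with a Savitzky-Golay filter (window and polynomial order assumed)
velocity_smooth = savgol_filter(velocity_500, window_length=21, polyorder=3)
print(velocity_smooth.shape)  # (500,)
```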
Dataset 2 consisted of cine-loops, each made up of consecutive images, called frames, which in turn consisted of grey-level pixel values. The initial processing of these cine-loops was simply to retrieve these pixel values and store them in an array that could more easily be used by the later functions and neural networks.
The measurements in both datasets were visually evaluated for quality, in order to categorize individual measurements into three groups: low, medium and high quality. For dataset 1, the quality depended on a number of issues: ECG data and velocity curve not aligned, ECG or velocity reading null or noisy, and velocity reading not sampled on the correct ROI (incorrect shape of the plot). For dataset 2, the quality was evaluated based on null or noisy data and grainy or low-resolution frames. In either dataset, if the ECG was found to be upside-down, it was flipped to show a correct trace.
3.4 Further processing of training datasets
Each dataset was processed with different methods, resulting in two new datasets based on dataset 1 and four new datasets based on dataset 2. The processing of these datasets is presented in this section. All datasets were normalized using the mean and standard deviation, see equation 1. This makes the mean of each dataset zero and its standard deviation one. See figure 7 for an overview of the training datasets. Datasets 1A and 2A were used with no further processing.
Equation 1: Normalization formula, z = (x − μ) / σ, where z denotes the normalized data point, x the original data point, and μ and σ the mean and standard deviation of the dataset.
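Equation 1 amounts to standard z-score normalization, which might be sketched as:

```python
import numpy as np

def z_normalize(x):
    # z = (x - mean) / std: zero-mean, unit-standard-deviation version of the data
    return (x - np.mean(x)) / np.std(x)

data = np.array([1.0, 2.0, 3.0, 4.0])
z = z_normalize(data)
print(z.mean(), z.std())  # approximately 0.0 and 1.0
```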
3.4.2 Dataset 1B: Tissue Doppler dataset of one heart cycle
Dataset 1B consisted of the velocity traces and ECG data from dataset 1 divided into heart cycles. Using the ECG, the data was cut from R-peak to R-peak so that each sample of the dataset consisted of one heart cycle; in this process, high and medium quality data was used. Since each heart cycle has a unique length, all samples had different lengths and unique x-axes. The samples were therefore normalised using an interpolation function to 300 points per heart cycle. The resulting dataset consisted of velocity inputs of one heart cycle, sampled 300 times, and corresponding ECG targets of one heart cycle, sampled 300 times. After quality evaluation and segmentation into heart cycles, dataset 1B consisted of 162 heart cycles.
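A rough sketch of the R-peak-to-R-peak segmentation and fixed-length resampling (the synthetic signals and peak-detection settings are illustrative assumptions; the thesis does not specify how R-peaks were located):

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import find_peaks

# Invented example: a 3 s trace at 500 samples with a sharp periodic "R-peak" train
fs = 500 / 3.0
t = np.linspace(0.0, 3.0, 500)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 20     # stand-in ECG with narrow peaks
velocity = np.cos(2 * np.pi * 1.2 * t)      # stand-in tissue velocity trace

# Locate R-peaks (height/distance thresholds assumed), then cut peak to peak
peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.4 * fs))
cycles = []
for start, stop in zip(peaks[:-1], peaks[1:]):
    # Each variable-length cycle is resampled to a fixed 300 points
    x_old = np.linspace(0.0, 1.0, stop - start)
    x_new = np.linspace(0.0, 1.0, 300)
    cycles.append((interp1d(x_old, velocity[start:stop])(x_new),
                   interp1d(x_old, ecg[start:stop])(x_new)))
print(len(cycles), cycles[0][0].shape)
```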
3.4.4 Dataset 2B: Cine-loop dataset, rate of change
Dataset 2B was processed by looking at how fast the pixels of the frames changed. This was done by creating new frames whose pixels represent the change in the original pixels over multiple images. The result was a lower total number of frames than before, now containing information about how much the pixel values had changed over a set number of cine-loop frames, instead of information about the current state. The number of frames per cine-loop varied between 50 and 259, and the corresponding ECG lengths did not match the number of frames, so in order to correlate the cine-loops to the ECG they had to be interpolated into arrays of the same size. Both the ECG data and the cine-loop data were therefore interpolated into arrays of length 64, one array per original cine-loop and one per corresponding ECG.
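Such rate-of-change frames can be sketched with simple frame differencing (the step of 4 frames and the loop dimensions are assumptions for illustration; the thesis does not give these values):

```python
import numpy as np

# Invented stand-in for one grey-level cine-loop: (frames, height, width)
rng = np.random.default_rng(0)
loop = rng.integers(0, 256, size=(120, 64, 64)).astype(np.float64)

step = 4  # number of frames over which change is measured (assumed)
# Each new frame holds the pixel-wise change over `step` original frames,
# so the result has fewer frames than the original loop
rate_frames = loop[step:] - loop[:-step]
print(rate_frames.shape)  # (116, 64, 64)
```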
3.4.5 Dataset 2C, 2D, 2E and 2F: Cine-loop dataset, rate of change and minimized data
Datasets 2C through 2E were processed in similar ways. For each frame in the original cine-loops, a new frame was created in which the pixel values in a square area of the original frame were averaged into one pixel of the new frame. For dataset 2C that square was 4x4 pixels, for 2D 8x8 pixels and for 2E 16x16 pixels, reducing the image size by factors of 16, 64 and 256, respectively. After that, the same processing as for 2B was applied to all three datasets. Dataset 2F was processed in the same way as 2E, but using only high quality data.
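The block averaging above might be sketched as follows (frame size invented for the example; the reshape trick assumes the frame dimensions are divisible by the block size, so any remainder rows/columns are cropped):

```python
import numpy as np

def block_average(frame, k):
    # Average each k x k block of pixels into a single pixel,
    # shrinking the image area by a factor of k * k
    h, w = frame.shape
    cropped = frame[:h - h % k, :w - w % k]
    return cropped.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

frame = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
print(block_average(frame, 16).shape)  # (4, 4)
```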
3.5 Training the algorithm
The inputs and targets from each dataset were split into a training and a testing group using the train_test_split method from Scikit-learn, with 30% of the dataset held out for testing. The MLPRegressor was then trained on the training dataset using the method 'fit', with the inputs of the dataset as the input (X) and the targets of the dataset as the true values of the output (y).
For the tissue Doppler datasets 1A and 1B, the optimal parameters of the MLPRegressor were chosen with an optimiser function. Three optimisers were constructed, one for each type of solver: 'Adam', 'SGD' and 'LBFGS'. For each type of activation ('tanh', 'logistic' and 'ReLU') and each type of solver, the remaining parameters of the MLPRegressor were iterated one by one over a chosen interval corresponding to that specific parameter, see appendix 2.
The parameter value that gave the model with the best PCC score was then selected, and the same process was repeated for the next parameter. The result was nine models, one for each combination of activation and solver, each optimised on its parameters. The nine models were evaluated on performance.
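The one-parameter-at-a-time search described above resembles a greedy coordinate-wise search; a sketch under assumed parameter grids (the toy data, grids and fixed solver/activation choice are illustrative, not the thesis's actual intervals):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Invented toy data standing in for velocity inputs and ECG targets
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=150)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def pcc_score(params):
    # Fit one candidate model and score it by PCC on the held-out data
    model = MLPRegressor(solver="lbfgs", activation="relu", max_iter=1000,
                         random_state=0, **params).fit(X_tr, y_tr)
    return pearsonr(y_te, model.predict(X_te))[0]

# Assumed search grids; each parameter is tuned in turn, keeping earlier choices
grids = {"hidden_layer_sizes": [(16,), (32,), (64,)],
         "alpha": [1e-4, 1e-3, 1e-2]}
best = {}
for name, values in grids.items():
    scores = [pcc_score({**best, name: v}) for v in values]
    best[name] = values[int(np.argmax(scores))]
print(best)
```

In the thesis this loop would be repeated for each of the nine solver/activation combinations, yielding one optimised model per combination.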
For datasets 2A through 2F, the same parameter optimisation method was used, with optimisation of two parameters for the 'LBFGS' solver and three for the 'Adam' solver. The 'SGD' solver was not used past initial testing for dataset 2.
Because datasets 2A through 2F differ in size, different neural net layer sizes and structures were used. For datasets 2A through 2D, the neural net was simple-layered, meaning that it had only one hidden layer with fewer than 50 neurons; for datasets 2E and 2F, the neural net had many hidden layers with between 32 and 384 neurons, see appendix 1.
3.6 Evaluating performance
The evaluation of the models was implemented differently for the different datasets. The statistical measures used for evaluation were the PCC, the MSE and the R2 score. All datasets except 2A were evaluated on the statistical measures and a visual score from 0 to 10.
For datasets 1A and 2F, the data had been divided into low, medium and high quality, and an evaluation was required of which portions to use. This evaluation was a test devised with three different portions of datasets 1 and 2, respectively: only high quality data; high and medium quality data; and data of all qualities, each tested on the best performing model in the dataset group. The portion with the best PCC over 10 iterations with random partitions of training and testing data was chosen. Dataset 1B and datasets 2B through 2E used high and medium quality data, without an evaluation test.
3.6.1 Dataset 1A and 1B: The tissue Doppler datasets
For datasets 1A and 1B, the evaluation was done iteratively 10 times for all combinations of solver and activation function. The solver and activation combination that received the best overall score was chosen as the best performing and was further tested on the different data quality portions.
3.6.2 Dataset 2: The cine-loop datasets
Due to the size of datasets 2B through 2F, the long runtime of each optimization and the large differences in performance depending on pre-processing, datasets 2A through 2F were evaluated by comparing different types of pre-processing and a few key parameter values.
3.6.3 Dataset 2F: The best performing cine-loop dataset
Dataset 2F was found to be the best performing of the cine-loop datasets, and was therefore chosen for more thorough evaluation. For datasets 2A through 2E, only one train/test split of 70% training and 30% testing data was trained and optimized on, due to time constraints; for 2F, many randomized 70%/30% train/test splits were used, and the models were evaluated on the average result over all of the splits, in order to evaluate the effectiveness of the algorithms trained on the dataset more reliably.
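Averaging over repeated random splits might be sketched as follows (the toy data, model settings and split count of 10 are assumptions; the thesis does not state how many splits were used):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Invented toy data standing in for the processed cine-loop inputs and ECG targets
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=20)

scores = []
for seed in range(10):  # repeated randomized 70/30 splits (count assumed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    model = MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs",
                         max_iter=1000, random_state=0).fit(X_tr, y_tr)
    scores.append(pearsonr(y_te, model.predict(X_te))[0])
print(np.mean(scores))  # average PCC over all splits
```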
3.7 Testing on fetal data
Since the fetal data consisted only of tissue Doppler data, it was tested on the models trained on datasets 1A and 1B. The fetal data was pre-processed to fit the inputs of these datasets; because the model trained on dataset 1B takes heart cycle data as input, the fetal data was cut into heart cycles by manual segmentation. Since the fetal data has no corresponding ECG, only visual correlation could be shown, with no statistical measures.
4 Results
The results are presented based on models trained and tested on each dataset. The statistical
measures presented are PCC (closer to 1 indicates a better fit), MSE (lower error is better), R2
(closer to 1 indicates a better fit) and visual score where 0 is lowest and 10 highest. For each
of the datasets, the quality of gathered data was evaluated. This evaluation showed that out of
100 samples from dataset 1, 51 were high quality, 24 medium quality and 22 low quality. Out
of 100 samples from dataset 2, 52 were high quality, 31 medium quality and 17 low quality.
Examples of ECG data from each evaluated quality in dataset 1 are presented in figure 8.
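A minimal sketch of how the three statistical measures reported below could be computed with scipy and scikit-learn; the true and simulated signals here are synthetic placeholders, not data from the study.

```python
# Sketch: compute PCC, MSE and R2 between a true and a simulated ECG trace.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

true_ecg = np.sin(np.linspace(0, 20, 500))                 # placeholder signal
simulated = true_ecg + np.random.default_rng(0).normal(0, 0.1, 500)

pcc = pearsonr(true_ecg, simulated)[0]          # closer to 1 is a better fit
mse = mean_squared_error(true_ecg, simulated)   # lower error is better
r2 = r2_score(true_ecg, simulated)              # closer to 1 is a better fit
print(f"PCC={pcc:.3f}  MSE={mse:.4f}  R2={r2:.3f}")
```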
Figure 8: Examples of results of the data quality evaluation for dataset 1 for the ECG signal (yellow). X-axis in seconds and
Y-axis in 10 V.
4.1 Dataset 1A: The tissue Doppler dataset of multiple heart cycles
The results of the 10 trials for the different optimised models are presented in figure 9, and
the average scores in table 1. The best performing model across a majority of the statistical
measures in table 1 uses the combination of solver and activation LBFGS and ReLU, with an
average PCC of 0.517 and a visual score of 5.8. This model was further tested on the different
data quality portions (table 2); the results show that performance was best when only
high-quality data was used. Parameter results for the optimised models are presented in
appendix 1. The combination SGD and ReLU could not be trained.
Figure 9: Test results for each model performance trained on dataset 1A (all quality data) based on the statistical measures and visual scores. All axes have dimensionless values.
Table 1: Average of statistical scores for each combination of solver and activation for models trained on dataset 1A

Solver & activation    Average PCC    Average MSE    Average R2    Average visual score
LBFGS + ReLU           0.517          2662            0.0777       5.8
LBFGS + logistic       0.401          2795           -0.086        5.0
LBFGS + tanh           0.380          2952           -0.018        5.0
Adam + logistic        0.352          3341           -0.010        3.7
Adam + ReLU            0.254          5384           -1.029        3.7
Adam + tanh            0.385          2845            0.015        3.0
SGD + logistic         0.425          3137           -0.069        5.9
SGD + tanh             0.424          3753           -0.259        5.3
Table 2: Results of the data portion test on the best performing model

Data portion                    Average PCC over 10 iterations
High quality                    0.538
High and medium quality         0.517
High, medium and low quality    0.506
The visual results for the best performing model (solver LBFGS, activation ReLU), trained on
adult data using only high-quality data, are presented in figure 10.
Figure 10: Visual results on adult data for best performing model trained on dataset 1A using only high quality data. Red
curve indicates predicted ECG, green curve indicates true ECG and yellow curve indicates tissue Doppler velocity. X-axis in
seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
The results on fetal data 3A (normal heart function and rhythm) for the best performing model
(solver LBFGS, activation ReLU) trained on dataset 1A are presented in figure 11b, and the
results on fetal data 3B (irregular heart rhythm) in figure 11a. Since no corresponding ECG
exists for the fetal data, only visual results can be presented. For 3A an ECG curve is present,
and for 3B an ECG curve is slightly present, but in neither case does the model correctly
identify the heart cycles.
Figure 11: Visual results on a) abnormal heart function, b) healthy heart function tissue Doppler data from prenatal patients
of the best performing model trained on dataset 1A using only high-quality data. Red curve indicates predicted ECG and yellow curve fetal tissue Doppler velocity. Abnormal heart function ailments in a) are as follows: top left: Arrhythmia, top
right: long QT, bottom left: SVES, bottom right: AV-block III. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
4.2 Dataset 1B: The tissue Doppler dataset of one heart cycle
The results of the 10 trials for the different optimised models are presented in figure 12, and
the average scores in table 3. The best performing model across a majority of the statistical
measures in table 3 uses the combination of solver and activation Adam and ReLU, with an
average PCC of 0.752 and a visual score of 8.2, although the model with solver Adam and
activation tanh and the model with solver LBFGS and activation tanh also performed well.
Parameter results for the optimised models are presented in appendix 1.
Figure 12: Test results for each model performance trained on dataset 1B (medium and high quality data) based on the statistical measures and visual scores. All axes have dimensionless values.
Table 3: Average of statistical scores for each combination of solver and activation for models trained on dataset 1B

Solver & activation    Average PCC    Average MSE    Average R2    Average visual score
LBFGS + ReLU            0.720         1588            0.511        7.7
LBFGS + logistic        0.734         1544            0.531        7.5
LBFGS + tanh            0.752         1657            0.549        7.7
Adam + logistic         0.727         1629            0.513        7.4
Adam + ReLU             0.752         1546            0.550        8.2
Adam + tanh             0.723         1627            0.511        8.4
SGD + logistic          0.734         1598            0.524        7.5
SGD + tanh              0.725         1500            0.522        7.7
SGD + ReLU             -0.582         7419           -1.398        0.9
The visual results on adult data for the best performing model (solver Adam, activation
ReLU), trained on high and medium quality data, are shown in figure 13.
Figure 13: Results on adult data for best performing model with high and medium quality data, yellow: tissue Doppler
velocity, green: true ECG, red: predicted ECG. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
The results on fetal data 3A (normal heart function and rhythm) for the best performing model
(solver Adam, activation ReLU) trained on dataset 1B are presented in figure 14; the ECG
curve could be accurately predicted, with some exceptions. The results on fetal data 3B
(irregular heart rhythm) for the same model are presented in figure 15. Only visual results can
be presented since no corresponding ECG exists for the fetal data.
Figure 14: Visual results on healthy heart function tissue Doppler data from prenatal patients from highest scoring model
trained on dataset 1B using high and medium quality data. Red curve indicates predicted ECG and yellow curve fetal tissue
Doppler velocity. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
Figure 15: Visual results on abnormal heart function tissue Doppler from prenatal patients from highest scoring model
trained on dataset 1B using high and medium quality data. Red curve indicates predicted ECG and yellow curve fetal tissue Doppler velocity. Abnormal heart function ailments are as follows: top left: Arrhythmia, top right: long QT, bottom left:
SVES, bottom right: AV-block III. X-axis in seconds and Y-axis in 10V (ECG) and cm/s (tissue Doppler).
4.3 Dataset 2B through E: The cine-loop datasets
Table 4 shows performance results from neural net training on datasets 2B through 2E with
different activation and solver settings, optimised towards a higher PCC score and evaluated
on three statistical measures and one visual measure. Parameter results for the optimised
models are presented in appendix 1. The best performing model across a majority of the
statistical measures was trained on datasets 2D-2E with the combination of solver and
activation Adam and tanh, with an average PCC of 0.425 and a visual score of 5.
Table 4: Statistical scores for each combination of solver and activation for models trained on datasets 2B-C and 2D-E

Dataset & combination    Average PCC    Average MSE    Average R2    Average visual score
2B-C ReLU, Adam          0.135          2071000        -973          0
2B-C tanh, Adam          0.374             2465        -0.156        4
2B-C ReLU, LBFGS         0.293             2600        -0.224        3
2B-C tanh, LBFGS         0.334             3080        -0.449        4
2D-E ReLU, Adam          0.333             2690        -0.265        0
2D-E tanh, Adam          0.425             2352        -0.107        5
2D-E ReLU, LBFGS         0.365             2642        -0.243        3
2D-E tanh, LBFGS         0.407             2847        -0.340        6
Figures 16-18 show plots from the neural net training on datasets 2B through 2E together
with their visual scores. These illustrate how different degrees of similarity between the
simulated and real ECG graphs translate into visual scores.
Figure 16: Dataset 2E, tanh activation and LBFGS solver. Visual score 6. In each graph in figures 16-18, the simulated ECG
is the red line while the green line is the real ECG. X-axis in seconds and Y-axis in 10V.
Figure 17: Dataset 2C, ReLU activation and LBFGS solver. Visual score 4. X-axis in seconds and Y-axis in 10V. The simulated ECG is the red line while the green line is the real ECG
Figure 18: Dataset 2B, ReLU activation and Adam solver. Visual score 0. X-axis in seconds and Y-axis in 10V. The simulated ECG is the red line while the green line is the real ECG
4.4 Dataset 2F: The cine-loop dataset, rate of change and minimized data
Table 5 shows performance results from neural net training on dataset 2F, while figure 19
shows graphs of simulated ECG compared to real ECG from training done on one train/test
split of dataset 2F. Parameter results for the optimised models are presented in appendix 1.
The performance results were acquired by training the neural net 11 times with different
randomised test/train splits and averaging the results. In each of these 11 runs, 8 results were
acquired for 8 different combinations of layer complexity, activation settings and solver
settings. For all combinations, parameters were optimised towards a higher PCC score. The
performance results comprise three statistical measures and one visual. The best performing
model across a majority of the statistical measures uses the combination of solver and
activation LBFGS and ReLU with complex layers, with an average PCC of 0.637 and a
visual score of 7.
Table 5: Statistical scores for each combination of solver, activation and layer complexity for dataset 2F

Layers & combination           Average PCC    Average MSE    Average R2    Average visual score
Simple layers, ReLU, Adam      0.420          627478         -152.8        1
Simple layers, tanh, Adam      0.580            5949         -0.169        6
Simple layers, ReLU, LBFGS     0.607            5623         -0.085        6
Simple layers, tanh, LBFGS     0.632            5804         -0.142        7
Complex layers, ReLU, Adam     0.610            6391         -0.258        4
Complex layers, tanh, Adam     0.540            6287         -0.227        7
Complex layers, ReLU, LBFGS    0.637            5584         -0.072        7
Complex layers, tanh, LBFGS    0.604            5975         -0.177        8
Figure 19: Dataset F, complex layer, activation ReLU and solver LBFGS. Visual score 8. X-axis in seconds and Y-axis in
10V. The simulated ECG is the red line while the green line is the real ECG
5 Discussion
The results of the different machine learning models showed promising signs of being able to
produce a fetal ECG from ultrasound data. They also indicated which improvements could
strengthen the case that an adult training set can be used to predict fetal ECG. Continuation
of connected projects would further the development and indicate any clinical usability of
the method.
5.1 Results on dataset 1
As seen in figure 9, performance on dataset 1A varied considerably between test runs: for the
same model, the PCC score could fluctuate between approximately 0.15 and 0.50. Since the
test and training data were shuffled between iterations, this variance could be due to
inconsistent data - depending on which data happens to fall into the training set, the model
performs differently. In comparison, dataset 1B varied less between test runs, with the largest
PCC fluctuation being approximately 0.6 to 0.75 (see figure 12). Since the data in dataset 1B
was cut into heart cycles, it was less varied and more consistent, which could explain this
difference between datasets 1A and 1B. Performance on dataset 1A was also best when only
high-quality data was used (table 2), which further indicates that low-quality data harms the
model more than a smaller amount of data does.
Comparing tables 1 and 3 shows that the models were more diverse in performance on
dataset 1A than on dataset 1B, with the exception of SGD + ReLU on dataset 1B. This can be
explained by the same reasoning as above - the data is more consistent for 1B and can be
more easily interpreted by any model. SGD + ReLU performed considerably worse on dataset
1B than the other models, and on dataset 1A it produced unviable values. A possible
conclusion is that this combination of solver and activation function does not suit dataset 1
at all.
Both datasets had their best performing model with activation ReLU, which is unsurprising
since it is the most used activation function today [16]. However, the two datasets differed in
the best performing solver: LBFGS for dataset 1A and Adam for dataset 1B. The Adam
solver generally performs better on larger datasets [20], which could be an explanation, since
dataset 1B, being cut into heart cycles, contains more training/testing samples.
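Since the models appear to be scikit-learn MLPRegressor networks [20] (see the parameter dictionaries in appendix 1), the solver and activation combinations compared above map directly onto two constructor arguments. A sketch, with the hidden layer size as an illustrative assumption:

```python
# Sketch: build one MLPRegressor per solver/activation pair, as compared
# in tables 1 and 3. The hidden layer size here is an assumption.
from itertools import product
from sklearn.neural_network import MLPRegressor

solvers = ["lbfgs", "adam", "sgd"]
activations = ["relu", "logistic", "tanh"]

models = {
    (s, a): MLPRegressor(solver=s, activation=a,
                         hidden_layer_sizes=(200,), max_iter=500)
    for s, a in product(solvers, activations)
}
print(len(models))  # 9 combinations (8 were trainable on dataset 1A)
```

Each model is then fitted and scored on the same split, so only the solver/activation choice differs between runs.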
Regarding both visual and statistical measures, the best performing model on dataset 1B
outperforms the best performing model on dataset 1A; visually this can be seen by comparing
figures 10 and 13, and statistically by comparing tables 1 and 3. The model trained on dataset
1A predicted some ECG curves nearly perfectly, whereas others are mostly noise with no
distinguishable pattern. The model trained on dataset 1B is much more consistent and rarely
predicts noise, although one could argue that it could be overfitted to the healthy adult heart
and fail to detect changes or medical conditions that alter heart function. More data on
patients with different heart functions would be needed to assess this.
5.2 Results on dataset 2
Observations from early testing on dataset 2 showed that the performance of the trained
algorithms varied drastically with pre-processing, solver type and activation type, and less so
with layer size and the maximum number of iterations. The remaining parameters made little
to no difference, and since their optimisation was omitted due to time constraints, they are
also omitted from the results. The solver type SGD was not used for dataset 2, as it never
produced a result better than noise during initial testing.
The performance results for datasets 2B through 2E, shown in table 4, indicated
improvement that correlated with the amount of pre-processing. The smaller the dataset, the
faster the neural net training and the higher the average PCC scores. Visually, the
improvement due to pre-processing can be seen by comparing figures 17 and 16: figure 17
shows a typical simulated ECG curve for the low pre-processing dataset 2C, and figure 16
one for the high pre-processing dataset 2E. Algorithms trained on low pre-processing
datasets such as 2B and 2C generally missed more QRS complexes and were visually scored
lower than those trained on the high pre-processing datasets. Some of the graphs for datasets
2B and 2C are visually close to noise, similar to the one in figure 18, while for datasets 2D
and 2E all trained algorithms produced simulated ECG curves resembling the correct ones,
except for those using the combination of ReLU activation and Adam solver. As seen in
figure 16, the simulated ECGs made using dataset 2E only rarely missed QRS complexes and
were visually very similar to the real ECG.
The different combinations of activations and solvers gave noticeably different results, with
the combination of ReLU activation and Adam solver only producing results better than
noise for datasets 2E and 2F, while tanh activation with both the Adam and LBFGS solvers
produced statistically and visually good results. For dataset 2, as shown in tables 4 and 5, the
solver type LBFGS consistently outperformed the solver type Adam in terms of statistical
and visual results, with Adam only catching up in performance on the smallest dataset 2E
and on dataset 2F. This could be explained by how the different solvers operate, with Adam
using a type of gradient descent [30] and LBFGS using a more complex quasi-Newton
approach [31].
In terms of visual score, the correlation with dataset size was also clear, with the smaller
datasets having less noisy graphs and more often finding the QRS complex as well as the
P-wave and the T-wave. Dataset 2F had the best performance in both visual and statistical
score, only rarely failing to find the different parts of the heart cycle in testing. In some
cases, the simulated ECG visually resembles a normal ECG more than the corresponding
real ECG does, as shown in figure 18. Such real ECGs could in the future be removed by
adding a pre-processing step that checks each real ECG for similarity to known normal ECG
morphologies, including arrhythmias, and removes it if it is too dissimilar from all of them.
How this would best be done needs further research.
In the evaluation, only dataset 2F was evaluated based on an average of different
test/training splits. This was done to save time, since the optimisation for each iteration took
over thirty minutes. Dataset 2F achieved the best results of all the cine-loop datasets and was
therefore chosen for more thorough evaluation. It could be argued that all of the datasets
should have been evaluated based on averages, but the time investment required was judged
not to be worth it.
5.3 Fetal data results
The fetal datasets 3A and 3B were tested on the best performing models trained on datasets
1A and 1B. The results on dataset 1A were not satisfactory, since the model could not
accurately identify the different heart cycles (see figure 11). This could be due to the higher
fetal heart rate compared to adults, if the model was overfitted to the adult heart rate. The
results on dataset 1B were better (see figure 14): the ECG curve could in most cases be
predicted. Because the inputs were cut into heart cycles for models trained on dataset 1B, the
ECG was easier to predict. Abnormalities in the tissue Doppler curves from fetal dataset 3B
gave results unlike a normal ECG for the model trained on dataset 1B, as seen in figure 15.
This is probably because the model was not trained on adult data containing irregular rhythm
and thus has not learned what the corresponding ECG would look like.
Generally, the results of the best performing model trained on dataset 1B indicate that an
algorithm trained on adult data could predict a potential fetal ECG for a patient with regular
heart rhythm. For predicting a fetal ECG of a patient with irregular heart rhythm, however,
more training and testing data would be necessary to assess feasibility and performance.
5.4 Results regarding the aim
The first aim of this project was to produce a model that could generate a plausible simulated
ECG curve from tissue Doppler ultrasound data. Figure 14 shows that the best performing
model on dataset 1B could produce an ECG signal with the characteristic features of an
ECG; it is therefore considered plausible, and this aim is consequently considered met.
The second aim was to obtain statistical results for the models when tested on adult data. The
first statistical aim was to produce a result better than noise; an example of a result equal to
noise can be found in figure 18, which received a visual score of 0. The best performing
model in this paper obtained an average visual score of 8.2, which is considered better than
noise. The second statistical aim was to have the P-wave, QRS complex and T-wave visible
in every heart cycle for 90 % of the test samples; examples of models achieving this are
shown in figures 13 and 19. The third statistical aim was a PCC score greater than 0.7, which
was met by the best performing model trained on dataset 1B (see table 3).
5.5 General improvements of models
As exemplified in figure 8, the datasets for this study contained a lot of low-quality data,
which reduced the amount of useful data for training and testing. Improving overall data
quality would increase the amount of viable data and produce more accurate models for both
datasets. The amount of data could also be increased so the models can more accurately learn
the correlation between the movement of the heart and the ECG. More data from diverse
ranges of heart function and rhythm could also help all models correctly present an ECG for
each abnormal heart function case.
For dataset 2 there are many possible improvements, apart from simply having more data.
Since a lot of pre-processing is done to the cine-loop data to make it usable for machine
learning, changes to that pre-processing can lead to large improvements in terms of better
algorithms or faster training times. Given more time, many different ways of reducing frame
size and measuring change over time in the cine-loops could be tested. This could be done
automatically via code, but on a good home computer testing just one pre-processing type
would take around ten hours, meaning weeks to find the best type.
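The two pre-processing directions mentioned above, reducing frame size and measuring change over time, could be sketched as follows. The frame dimensions and block-averaging factor are assumptions for illustration, not the thesis's actual pipeline.

```python
# Sketch: two cine-loop pre-processing ideas - reducing frame size by block
# averaging, and capturing rate of change via frame-to-frame differences.
import numpy as np

# Placeholder cine-loop: 50 frames of 64x64 pixels (assumed dimensions).
loop = np.random.default_rng(0).random((50, 64, 64))

def downscale(frames, factor=4):
    """Block-average each frame by `factor` in both spatial dimensions."""
    f, h, w = frames.shape
    return frames.reshape(f, h // factor, factor,
                          w // factor, factor).mean(axis=(2, 4))

small = downscale(loop)                      # shape (50, 16, 16)
rate = np.diff(small, axis=0)                # change between frames, (49, 16, 16)
features = rate.reshape(rate.shape[0], -1)   # one feature row per transition
print(features.shape)                        # (49, 256)
```

Each pre-processing variant changes the feature count fed to the network, which is why it has such a large effect on both training time and accuracy.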
The size of dataset 2 meant that each single neural net training iteration took a very long
time, so an optimisation over a large number of parameters would take too long to compute.
Therefore, only the variables that contributed most to the end score were optimised. With
enough time, the remaining parameters could be optimised as well, possibly achieving even
better results.
The highest scoring model in dataset 1 was the one trained on dataset 1B. Since this dataset
uses data portioned into heart cycles, the ECG and tissue Doppler data must be manually
processed to fit the model, which is time-consuming and not ideal in a clinical setting. An
improvement would therefore be to cut the data into heart cycle portions automatically, for
example using another ML model, to save time and simplify processing in a clinical setting.
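The suggested automatic heart-cycle cutting could, for instance, be approached with simple peak detection on the tissue Doppler velocity trace. The synthetic trace, sampling rate and minimum peak distance below are all assumptions for illustration.

```python
# Sketch: detect one peak per cycle in a velocity trace and split the signal
# between consecutive peaks to obtain heart-cycle portions.
import numpy as np
from scipy.signal import find_peaks

fs = 100                                      # samples per second (assumed)
t = np.arange(0, 10, 1 / fs)
velocity = np.sin(2 * np.pi * 1.2 * t)        # synthetic ~1.2 Hz "heartbeat"

# Require peaks at least 0.5 s apart to avoid splitting within a cycle.
peaks, _ = find_peaks(velocity, distance=int(0.5 * fs))
cycles = [velocity[a:b] for a, b in zip(peaks[:-1], peaks[1:])]
print(f"found {len(peaks)} peaks -> {len(cycles)} complete cycles")
```

Real fetal traces are noisier than this synthetic example, so a learned segmenter or additional filtering would likely be needed in practice.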
5.6 Improvements for results on fetal data
The ultrasound and ECG data used to create all the algorithms in this paper came from adult
patients, while the goal was to simulate ECG for fetuses. The differences in cardiac
physiology between an adult and a fetus make the use of adult data imperfect, but the
unavailability of prenatal ECG makes it a good starting point. One possible improvement
would be to use ultrasound and ECG data acquired from animal fetuses with physiology
similar to humans as training data.
5.7 Results of study on the future of fetal diagnostics
Earlier studies on fetal ECG detection have focused more on extracting the fetal ECG from
ECG measurements done on the mother; the paper "A robust fetal ECG detection method for
abdominal recordings" [32] is an example, proposing the use of a priori information about
interference signals to distinguish the fetal ECG from the mother's signal and noise. Our
study has instead utilised the possibility of getting fetal ECG data without interference from
the mother's body, which could give less noisy results due to less noisy input. The results of
this study indicated that the correlation between ultrasound data and ECG data can be found
and used to simulate fetal ECG. Further studies could develop the method proposed in this
paper with a larger, higher quality training dataset more similar in function to the fetal data,
as well as data with a multitude of variations: abnormal and normal hearts with regular and
irregular heart rhythm. Further processing of data could also benefit the results. The
proposed method does not, however, reduce the "black-box" effect that it and many other
ML methods exhibit. The correlation between ultrasound data and ECG is still largely
unknown, which could hinder adoption of the method since it might be considered
unconfirmed.
Although the highest scoring model in terms of PCC was trained on dataset 1B, continuation
of this project would be better served by datasets similar to dataset 2E, since dataset 1B
requires manual pre-processing while dataset 2E only uses automatic pre-processing, which
can more easily be scaled up to larger amounts of data.
6 Conclusion
We have shown that the proposed method of using ML algorithms to produce a simulated
ECG curve from ultrasound is a viable and informative route for obtaining an adult ECG, and
a potentially informative route for obtaining a possible fetal ECG. We have also obtained the
statistical aims in this paper for one of the models presented.
Specifically, a multilayer perceptron network with approximately 50 manually processed
training samples can predict an adult ECG with an average PCC score of 0.75 to the true
ECG, using tissue Doppler ultrasound as the input parameter. We have also shown that using
automatically processed cine-loops as the input parameter in a similar multilayer perceptron
network is also a promising method for simulating adult ECG. Finally, we have results
supporting that the proposed method of obtaining an ECG in adults could also be used for
fetuses, and that our best performing model can produce a plausible fetal ECG.
We therefore conclude that the aims of this paper were met. Further development of the
proposed method could strengthen the claims of this paper and lead to a clinical method of
obtaining a fetal ECG.
7 References
[1] WHO, "Electrocardiograph, ECG," Core medical equipment - Information, 2011.
[Online]. Available:
https://www.who.int/medical_devices/innovation/electrocardiograph.pdf.
[2] K. Jeffrey, S. Elizabeth, S. Robin, and V.-C. Lilliam, "ABDOMINAL FETAL EKG
NOISE REMOVAL. 171," Pediatric Research, vol. 39, no. S4, p. 31, 1996, doi:
10.1203/00006450-199604001-00190.
[3] M. Poessel, "Waves, motion and frequency: the Doppler effect," Einstein Online, vol.
5, 2011.
[4] L. S. Lilly, ed. , Pathophysiology of Heart Disease: A Collaborative Project of
Medical Students and Faculty sixth ed ed. Lippincott Williams & Wilkins, 2016.
[5] M. L. P. Å. Öberg, Medicin och Teknik. Studentlitteratur, 2016.
[6] ACLS Medical Training. "The Basics of ECG." https://www.aclsmedicaltraining.com/basics-of-
ecg/ (accessed).
[7] I. Texas Heart, "Fetal Heart," (in eng), TexasHeart. [Online]. Available:
https://www.texasheart.org/heart-health/heart-information-center/topics/the-fetal-
heart/.
[8] B. A. Pildner von Steinburg S, Lederer C, Grunow S, Schiermeier S, Hatzmann W,
Schneider KM, Daumer M., "What is the “normal” fetal heart rate?," PeerJ, vol.
1:e82, 2013.
[9] J. G. Betts et al., "Fetal Development," in Anatomy and Physiology. Houston, Texas:
OpenStax, 2013.
[10] L. Sanapo, J. D. Pruetz, M. Słodki, M. B. Goens, A. J. Moon-Grady, and M. T.
Donofrio, "Fetal echocardiography for planning perinatal and delivery room care of
neonates with congenital heart disease," Echocardiography, vol. 34, no. 12, pp. 1804-
1821, 2017, doi: 10.1111/echo.13672.
[11] C. R. Deo, "Machine Learning in Medicine," Circulation, vol. 132, no. 20, pp. 1920-
1930, 2015, doi: 10.1161/CIRCULATIONAHA.115.001593.
[12] T. Hastie, J. Friedman, and R. Tibshirani, The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. New York, NY: New York, NY: Springer New
York, 2001.
[13] G. Rebala, An Introduction to Machine Learning, 1st ed. 2019.. ed. Cham : Springer
International Publishing : Imprint: Springer, 2019.
[14] J. Dacombe. "An introduction to Artificial Neural Networks." Medium.
https://medium.com/@jamesdacombe/an-introduction-to-artificial-neural-networks-
with-example-ad459bb6941b (accessed 10 May, 2020).
[15] Z. Z. Li, Z. Y. Zhong, and L. W. Jin, "Identifying best hyperparameters for deep
architectures using random forests," vol. 8994, ed, 2015, pp. 29-42.
[16] S. Sharma. "Activation Functions in Neural Networks." Medium.
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
(accessed 10 May, 2020).
[17] G. James, An Introduction to Statistical Learning with Applications in R, 1st ed. 2013..
ed. New York, NY : Springer New York : Imprint: Springer, 2013.
[18] M. Titterington, "Neural networks," Wiley Interdisciplinary Reviews: Computational
Statistics, vol. 2, no. 1, pp. 1-8, 2010, doi: 10.1002/wics.50.
[19] S. Yatawatta, H. Spreeuw, and F. Diblen, "Improving LBFGS Optimizer in PyTorch:
Knowledge Transfer from Radio Interferometric Calibration to Machine Learning,"
ed, 2018, pp. 386-387.
[20] Scikit-learn. "MLPRegressor." Scikit-learn. https://scikit-
learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
(accessed 11 May, 2020).
[21] A. Prieditis and S. Sapp, "Lazy overfitting control," vol. 7988, ed, 2013, pp. 481-491.
[22] N. Redell, "Shapley Decomposition of R-Squared in Machine Learning Models,"
2019.
[23] P. Sedgwick, "Pearson’s correlation coefficient," BMJ : British Medical Journal, vol.
345, no. jul04 1, 2012, doi: 10.1136/bmj.e4483.
[24] P. Garcia-Canadilla, S. Sanchez-Martinez, F. Crispi, and B. Bijnens, "Machine
Learning in Fetal Cardiology: What to Expect," Fetal diagnosis and therapy, pp. 1-10,
2020, doi: 10.1159/000505021.
[25] Q. Yu et al., "Automatic identifying of maternal ECG source when applying ICA in
fetal ECG extraction," Biocybernetics and Biomedical Engineering, vol. 38, no. 3, pp.
448-455, 2018, doi: 10.1016/j.bbe.2018.03.003.
[26] P. R. Muduli, R. R. Gunukula, and A. Mukherjee, "A deep learning approach to fetal-
ECG signal reconstruction," ed, 2016, pp. 1-6.
[27] M. Lukosevicius and V. Marozas, "Noninvasive fetal QRS detection using Echo State
Network," vol. 40, ed, 2013, pp. 205-208.
[28] E. Sulas, E. Ortu, L. Raffo, M. Urru, R. Tumbarello, and D. Pani, "Automatic
Recognition of Complete Atrioventricular Activity in Fetal Pulsed-Wave Doppler
Signals," vol. 2018-, ed, 2018, pp. 917-920.
[29] J. L. Rojo-Alvarez, A. Arenal-Maiz, and A. Artes-Rodriguez, "Support vector black-
box interpretation in ventricular arrhythmia discrimination," IEEE Engineering in
Medicine and Biology Magazine, vol. 21, no. 1, pp. 27-35, 2002, doi:
10.1109/51.993191.
[30] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," 2014.
[31] J. G. G. Andrew, "Scalable Training of L1 Regularized Log-Linear Models,"
presented at the International Conference on Machine Learning, 2007.
[32] S. M. M. Martens, C. Rabotti, M. Mischi, and R. J. Sluijter, "A robust fetal ECG
detection method for abdominal recordings," Physiol Meas, vol. 28, no. 4, pp. 373-
388, 2007, doi: 10.1088/0967-3334/28/4/004.
Appendix 1: Optimised parameters for learning models
Dataset 1A
Solver & activation
Parameters
LBFGS + ReLU {'activation': 'relu', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
201, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 23000,
'max_iter': 40, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None, 'shuffle':
True, 'solver': 'lbfgs', 'tol': 0.0006000000000000001, 'validation_fraction':
0.45, 'verbose': False, 'warm_start': False}
LBFGS + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
261, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 32000,
'max_iter': 65, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None, 'shuffle':
True, 'solver': 'lbfgs', 'tol': 0.0008, 'validation_fraction': 0.2, 'verbose': False,
'warm_start': False}
LBFGS + tanh {'activation': 'tanh', 'alpha': 0.00014000000000000001, 'batch_size': 'auto',
'beta_1': 0.3, 'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08,
'hidden_layer_sizes': 221, 'learning_rate': 'constant', 'learning_rate_init':
0.95, 'max_fun': 22000, 'max_iter': 70, 'momentum': 0.9,
'n_iter_no_change': 5010, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0008,
'validation_fraction': 0.25, 'verbose': False, 'warm_start': False}
Adam + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1':
0.7, 'beta_2': 0.5, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 281, 'learning_rate': 'constant',
'learning_rate_init': 0.04, 'max_fun': 15000, 'max_iter': 70, 'momentum':
0.9, 'n_iter_no_change': 30, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol':
0.0006, 'validation_fraction': 0.1, 'verbose': False,
'warm_start': False}
Adam + ReLU {'activation': 'relu', 'alpha': 4e-05, 'batch_size': 'auto', 'beta_1':
0.7, 'beta_2': 0.7, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 181, 'learning_rate': 'constant',
'learning_rate_init': 0.1, 'max_fun': 15000, 'max_iter':
20, 'momentum': 0.9, 'n_iter_no_change': 40, 'nesterovs_momentum': True,
'power_t': 0.5, 'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol':
0.0008, 'validation_fraction': 0.3, 'verbose': False, 'warm_start': False}
Adam + tanh {'activation': 'tanh', 'alpha': 4e-05, 'batch_size': 'auto', 'beta_1':
0.5, 'beta_2': 0.7, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 221, 'learning_rate': 'constant',
'learning_rate_init': 0.06, 'max_fun': 15000, 'max_iter':
60, 'momentum': 0.9, 'n_iter_no_change': 50, 'nesterovs_momentum': True,
'power_t': 0.5, 'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol':
0.0008, 'validation_fraction': 0.05, 'verbose': False, 'warm_start': False}
SGD + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 241, 'learning_rate': 'adaptive', 'learning_rate_init':
0.01, 'max_fun': 15000, 'max_iter': 35, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol': 0.0004,
'validation_fraction': 0.35, 'verbose': False, 'warm_start':
False}
SGD + tanh {'activation': 'tanh', 'alpha': 0.00018, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 161, 'learning_rate': 'constant', 'learning_rate_init':
0.01, 'max_fun': 15000, 'max_iter': 65, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol': 0.0002,
'validation_fraction': 0.35, 'verbose': False, 'warm_start':
False}
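The keys in these listings correspond one-to-one to the constructor arguments of scikit-learn's multilayer-perceptron estimators, so a tuned configuration can be applied by unpacking the dict. A minimal sketch, assuming scikit-learn is installed; `MLPRegressor` is used here for illustration (the report's estimator class is not restated in this appendix), and entries left at the library defaults are omitted:

```python
from sklearn.neural_network import MLPRegressor

# Optimised settings for LBFGS + ReLU on dataset 1A (from the table above);
# parameters at their scikit-learn defaults are omitted for brevity.
params = {
    "activation": "relu",
    "alpha": 0.00012,
    "early_stopping": True,
    "hidden_layer_sizes": 201,
    "learning_rate_init": 0.95,
    "max_fun": 23000,
    "max_iter": 40,
    "solver": "lbfgs",
    "tol": 0.0006,
    "validation_fraction": 0.45,
}

model = MLPRegressor(**params)  # ready for model.fit(X_train, y_train)
```

With `solver='lbfgs'`, scikit-learn ignores the SGD/Adam-specific entries (`momentum`, `beta_1`, `beta_2`), so dropping them from the dict does not change the model.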
Dataset 1B
Solver & activation
Parameters
LBFGS + ReLU {'activation': 'relu', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
1, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 16000,
'max_iter': 45, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0004, 'validation_fraction': 0.1,
'verbose': False, 'warm_start': False}
LBFGS + logistic {'activation': 'logistic', 'alpha': 8e-05, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
121, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 30000,
'max_iter': 45, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0002, 'validation_fraction': 0.2,
'verbose': False, 'warm_start': False}
LBFGS + tanh {'activation': 'tanh', 'alpha': 0.0001, 'batch_size': 'auto', 'beta_1': 0.3,
'beta_2': 0.5, 'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes':
221, 'learning_rate': 'constant', 'learning_rate_init': 0.95, 'max_fun': 26000,
'max_iter': 65, 'momentum': 0.9, 'n_iter_no_change': 5010,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'lbfgs', 'tol': 0.0008, 'validation_fraction': 0.05,
'verbose': False, 'warm_start': False}
Adam + logistic {'activation': 'logistic', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1':
0.9, 'beta_2': 0.9, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 221, 'learning_rate': 'constant',
'learning_rate_init': 0.05, 'max_fun': 15000, 'max_iter': 65, 'momentum':
0.9, 'n_iter_no_change': 40, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol': 0.0008,
'validation_fraction': 0.35, 'verbose': False, 'warm_start':
False}
Adam + ReLU {'activation': 'relu', 'alpha': 6e-05, 'batch_size': 'auto',
'beta_1': 0.3, 'beta_2': 0.7,
'early_stopping': True, 'epsilon': 1e-08, 'hidden_layer_sizes': 281,
'learning_rate': 'constant', 'learning_rate_init': 0.04, 'max_fun': 15000,
'max_iter': 65, 'momentum': 0.9, 'n_iter_no_change': 50,
'nesterovs_momentum': True, 'power_t': 0.5, 'random_state': None,
'shuffle': True, 'solver': 'adam', 'tol': 0.0004, 'validation_fraction': 0.4,
'verbose': False, 'warm_start': False}
Adam + tanh {'activation': 'tanh', 'alpha': 0.00012, 'batch_size': 'auto', 'beta_1':
0.7, 'beta_2': 0.9, 'early_stopping':
True, 'epsilon': 1e-08, 'hidden_layer_sizes': 81, 'learning_rate': 'constant',
'learning_rate_init': 0.03, 'max_fun': 15000, 'max_iter': 65, 'momentum':
0.9, 'n_iter_no_change': 40, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'adam', 'tol': 0.0008,
'validation_fraction': 0.3, 'verbose': False, 'warm_start': False}
SGD + logistic {'activation': 'logistic', 'alpha': 2e-05, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 181, 'learning_rate': 'constant', 'learning_rate_init':
0.001, 'max_fun': 37000, 'max_iter': 65, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol':
0.0006, 'validation_fraction': 0.25, 'verbose': False,
'warm_start': False}
SGD + tanh {'activation': 'tanh', 'alpha': 6e-05, 'batch_size': 'auto',
'beta_1': 0.9, 'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 141, 'learning_rate': 'constant', 'learning_rate_init':
0.001, 'max_fun': 20000, 'max_iter': 25, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol':
0.0006, 'validation_fraction': 0.3, 'verbose': False,
'warm_start': False}
SGD + ReLU {'activation': 'relu', 'alpha': 0.0001, 'batch_size': 'auto', 'beta_1': 0.9,
'beta_2': 0.999, 'early_stopping': False, 'epsilon': 1e-08,
'hidden_layer_sizes': 201, 'learning_rate': 'constant', 'learning_rate_init':
0.001, 'max_fun': 45000, 'max_iter': 10, 'momentum': 0.9,
'n_iter_no_change': 10, 'nesterovs_momentum': True, 'power_t': 0.5,
'random_state': None, 'shuffle': True, 'solver': 'sgd', 'tol': 0.0004,
'validation_fraction': 0.1, 'verbose': False, 'warm_start': False}
Dataset 2B
Solver & activation
Parameters
LBFGS + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.1, max_fun=20, max_iter=20,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.1, max_fun=35, max_iter=35,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.12, max_fun=70, max_iter=5,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=70,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Dataset 2C
Solver & activation
Parameters
LBFGS + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=50, learning_rate='constant',
learning_rate_init=0.1, max_fun=5, max_iter=5, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.1, max_fun=45, max_iter=45,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=30, learning_rate='constant',
learning_rate_init=0.08, max_fun=70, max_iter=5,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=50, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=20,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0008,
validation_fraction=0.1, verbose=False, warm_start=False)
Dataset 2D
Solver & activation
Parameters
LBFGS + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=110, learning_rate='constant',
learning_rate_init=0.1, max_fun=5, max_iter=5, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=170, learning_rate='constant',
learning_rate_init=0.1, max_fun=25, max_iter=25,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + ReLU (activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.08, max_fun=70, max_iter=70,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0008,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh (activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=150, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=10,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Dataset 2E
Solver & activation
Parameters
LBFGS + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=150, learning_rate='constant',
learning_rate_init=0.1, max_fun=5, max_iter=5, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1, max_fun=15,
max_iter=15, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='lbfgs', tol=0.01, validation_fraction=0.1,
verbose=False, warm_start=False)
LBFGS + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=110, learning_rate='constant',
learning_rate_init=0.1, max_fun=25, max_iter=25,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1, max_fun=5,
max_iter=5, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='lbfgs', tol=0.01, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.15, max_fun=70,
max_iter=25,
momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True,
power_t=0.5, random_state=20, shuffle=True, solver='adam',
tol=0.0002, validation_fraction=0.1, verbose=False,
warm_start=False)
Adam + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.02, max_fun=70,
max_iter=15, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=130, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=5,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant',
learning_rate_init=0.06,
max_fun=70, max_iter=20, momentum=0.9,
n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Dataset 2F
Solver & activation
Parameters
LBFGS + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=50, learning_rate='constant',
learning_rate_init=0.1, max_fun=45, max_iter=45,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1,
max_fun=40,
max_iter=40, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5,
random_state=20,
shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1,
verbose=False, warm_start=False)
LBFGS + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=150, learning_rate='constant',
learning_rate_init=0.1, max_fun=20, max_iter=20,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='lbfgs', tol=0.01,
validation_fraction=0.1, verbose=False, warm_start=False)
LBFGS + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.1, max_fun=60,
max_iter=60, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='lbfgs', tol=0.01, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + ReLU
Simple layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=10, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=70,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True,
power_t=0.5,
random_state=20, shuffle=True, solver='adam',
tol=0.0002,
validation_fraction=0.1, verbose=False,
warm_start=False)
Adam + ReLU
Complex layers
(activation='relu', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.02, max_fun=70,
max_iter=15, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Adam + tanh
Simple layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=170, learning_rate='constant',
learning_rate_init=0.19, max_fun=70, max_iter=15,
momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=20, shuffle=True, solver='adam', tol=0.0002,
validation_fraction=0.1, verbose=False, warm_start=False)
Adam + tanh
Complex layers
(activation='tanh', alpha=0.0006, batch_size='auto', beta_1=0.6,
beta_2=0.6, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=(384, 320, 256, 256, 192, 128),
learning_rate='constant', learning_rate_init=0.12, max_fun=70,
max_iter=25, momentum=0.9, n_iter_no_change=10,
nesterovs_momentum=True, power_t=0.5, random_state=20,
shuffle=True, solver='adam', tol=0.0002, validation_fraction=0.1,
verbose=False, warm_start=False)
Appendix 2: Parameter intervals for optimization
Parameter Interval
learning_rate_init 0.01-0.2 with step 0.01
beta_1 and beta_2 0.1-1 with step 0.2
alpha 0.00002-0.0002 with step 0.00002
validation_fraction 0.05-0.55 with step 0.05
max_iter 5-75 with step 5
tol 0.0002-0.001 with step 0.0002
n_iter_no_change 10-60 with step 10
hidden_layer_sizes 1-300 with step 20
max_fun 10000-50000 with step 1000
learning_rate 'constant', 'invscaling', 'adaptive'
power_t 0-2 with step 0.1
momentum 0.001-1 with step 0.1
nesterovs_momentum True, False
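The intervals above define the candidate values searched during optimisation. Expanding a float interval by repeated addition accumulates rounding error (adding 0.2 to 0.1 three times yields 0.7000000000000001 rather than 0.7), so it is safer to compute each candidate from its index and round. A small stdlib-only sketch; the helper name `frange` and the subset of parameters shown are illustrative, not taken from the report:

```python
def frange(start, stop, step, ndigits=6):
    """Inclusive float range. Each value is computed directly as
    start + i*step and rounded, so rounding error cannot accumulate
    across steps; a small tolerance guards the upper endpoint."""
    n = int(round((stop - start) / step)) + 1
    values = [round(start + i * step, ndigits) for i in range(n)]
    return [v for v in values if v <= stop + abs(step) * 1e-6]

# A subset of the intervals from the table above, expanded into candidates.
param_grid = {
    "learning_rate_init": frange(0.01, 0.2, 0.01),
    "beta_1": frange(0.1, 1, 0.2),
    "alpha": frange(0.00002, 0.0002, 0.00002),
    "tol": frange(0.0002, 0.001, 0.0002),
    "max_iter": list(range(5, 80, 5)),
    "hidden_layer_sizes": list(range(1, 301, 20)),
    "learning_rate": ["constant", "invscaling", "adaptive"],
}
```

A grid built this way produces clean values such as 0.7 and 0.0006, matching the step sizes stated in the table.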
TRITA CBH-GRU-2020:146