arXiv:2108.10226v1 [eess.SP] 18 Aug 2021


ECG-Based Heart Arrhythmia Diagnosis Through Attentional Convolutional Neural Networks

Ziyu Liu
University of New South Wales
Sydney, Australia
[email protected]

Xiang Zhang
Harvard Medical School, Harvard University
Boston, USA
xiang [email protected]

Abstract: The electrocardiography (ECG) signal is a widely used measurement of an individual's heart condition, and much effort has been devoted to automatic heart arrhythmia diagnosis based on machine learning. However, traditional machine learning models require a large investment of time and effort for raw data preprocessing and feature extraction, and they suffer from poor classification performance. Here, we propose a novel deep learning model, named Attention-Based Convolutional Neural Networks (ABCNN), that takes advantage of CNN and multi-head attention to work directly on raw ECG signals and automatically extract the informative dependencies for accurate arrhythmia detection. To evaluate the proposed approach, we conduct extensive experiments over a benchmark ECG dataset. Our main task is to distinguish arrhythmia from normal heartbeats and, at the same time, accurately recognize the heart diseases from five arrhythmia types. We also provide a convergence analysis of ABCNN and intuitively show the meaningfulness of the extracted representation through visualization. The experimental results show that the proposed ABCNN outperforms the widely used baselines, which brings us one step closer to an intelligent heart disease diagnosis system.

I. INTRODUCTION

Electrocardiography (ECG) records the electrical activity of the heart over a specific period of time by placing electrodes on the subject's chest surface [1]–[3]. The electrodes detect the significant electrical variations on the skin that arise from the electrophysiologic pattern of the heart muscle, which depolarizes and repolarizes during each heartbeat. The regular waves of a normal ECG are referred to as the PQRST complex; they are produced by electrical conduction in different parts of the heart and follow the typical conduction path within the heart. The ECG signal is rich in critical information, including the basic state of heart activity and the position and size of the heart chambers. It is therefore a commonly used noninvasive cardiac diagnostic procedure in a variety of scenarios, including the diagnosis of heart disease, heart rate monitoring [4], home-care tracking [5], congestive cardiac failure detection [6], remote medicine services [7] and health care applications embedded in mobile devices [8].

Among the heart diseases that can be detected and recognized from an ECG test, heart arrhythmia can occur with any disturbance in the heart's electrical system [9]. The symptoms usually present as an abnormal heart rate or an irregular heartbeat, such as a racing or slow heartbeat, and may also include trouble concentrating, chest pain, shortness of breath, dizziness and fainting. Heart arrhythmias fall into two groups, fatal and non-fatal (common), and both deserve attention. Arrhythmias of the first kind (such as tachycardia, bradycardia and ventricular fibrillation) are considered life-threatening and need professional treatment without delay; in some cases, invasive medical devices such as an artificial pacemaker or a cardiac defibrillator are also required for more serious conditions. The other group, non-fatal heart arrhythmias, does not pose an immediate danger to life, but proper treatment is still necessary. Thus, accurate and timely heart arrhythmia diagnosis is crucial.

However, the ECG-based diagnosis of heart arrhythmia faces several challenges. First, most existing studies are based on manually extracted ECG features (e.g., RR intervals [10]) and signal embeddings (e.g., the Discrete Wavelet Transform and the Fast Fourier Transform) [11]. The pre-processing and feature engineering require domain knowledge and manual selection, which are time-consuming and dependent on expertise. For example, a commonly used method in ECG processing is QRS complex detection, which locates the QRS shape within the whole ECG heartbeat period. QRS complex detection requires manual operation and may lose information outside the QRS wave. To overcome these issues, in this work we operate directly on the raw ECG data and skip the pre-processing and feature engineering.

Recently, deep learning has demonstrated success in several research areas such as computer vision [12], natural language processing [13], and brain-computer interfaces (BCI) [14]. In particular, Convolutional Neural Networks (CNNs), one typical deep learning structure, have been widely used for processing digital signals owing to salient features such as regularized structure, good spatial locality and translation invariance. In a traditional CNN, all the input neurons are equally weighted. In ECG data, however, different periods of the heartbeat differ in importance. For example, the QRS complex lasts for only 0.08 to 0.12 seconds, which is 29 to 43 sampling slices, yet it carries a considerable part of the information. To deal with this problem, inspired by the success of the attention mechanism [], we propose the Attention-Based Convolutional Neural Network (ABCNN), with an attention layer that constrains the model to focus on the valuable signals. The proposed ABCNN is designed to automatically extract the distinguishable information from the raw ECG data for heart arrhythmia detection. The attention mechanism is designed to attend to the most informative part of the raw data, such as the QRS complex. Unlike the traditional QRS complex detection method, the attention in ABCNN is learned automatically by the model from the training data. We summarize our contributions as follows:

• We propose a novel approach, ABCNN, to automatically extract distinctive representations directly from the raw ECG signal and precisely recognize heart arrhythmia. The proposed ABCNN combines an attention mechanism with a CNN to pay more attention to the important information in ECG signals.

• The proposed ABCNN is evaluated on a well-known heart arrhythmia dataset. The experimental results demonstrate that the proposed ABCNN outperforms a wide range of competitive state-of-the-art baselines.

The well-cleaned dataset and a reusable implementation will be made public after acceptance.

II. PRELIMINARY KNOWLEDGE AND RELATED WORK

A. Preliminary knowledge of heart arrhythmia

Heart arrhythmia occurs when the heart does not beat properly, which is accompanied by abnormal electrical impulses in the heart. A heartbeat that is too fast, too slow or irregular can directly impair the circulatory system, which pumps blood to keep the organs functioning well. Atrial fibrillation (AFib) is the most common type of heart arrhythmia and can cause stroke and heart failure. About 33.5 million people worldwide are diagnosed with AFib [15]. Therefore, the detection and diagnosis of heart arrhythmia, whether common or fatal, deserve attention.

Following the ANSI/AAMI (Association for the Advancement of Medical Instrumentation) EC57:1998 standard, which covers the detection of cardiac rhythm disturbances and ST segment measurement, heartbeats are commonly divided into five typical classes: N (normal), S (supraventricular), V (ventricular), F (fusion) and Q (questionable/unknown). The descriptions and the corresponding categories in a benchmark dataset (MIT-BIH [16]) are presented in Table I.

• Normal heartbeat (class N): a healthy heart rate ranging from 50 to 100 beats per minute.

• Supraventricular ectopic beat (class S): includes the atrial premature beat, aberrated atrial premature beat, nodal premature beat and supraventricular premature or ectopic beat. Among them, the atrial premature beat is the most common heartbeat irregularity and does not usually require formal treatment, although frequent atrial premature contractions (APC) may trigger palpitations. On the ECG it typically appears as a normal QRS complex following an abnormal P pattern. The supraventricular premature or ectopic beat is an atrial contraction initiated by ectopic foci. It usually appears as a negative P wave, a narrow QRS complex and no compensatory pause.

• Ventricular ectopic beat (class V): includes the premature ventricular contraction and the ventricular escape beat. A premature ventricular contraction is a premature beat arising from an ectopic focus within the ventricles. It has a broad QRS complex (> 120 ms) with abnormal morphology, occurs earlier than the expected time of the next sinus impulse, and is followed by a full compensatory pause.

• Fusion beat (class F): owing to its morphological similarity to other types, this type of heartbeat is very hard to characterize.

• Unknown beat (class Q): includes the paced beat, the fusion of paced and normal beat, and the unclassifiable beat. The paced beat is a kind of arrhythmia triggered at a specific rate and speed.
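The grouping of MIT-BIH beat symbols into the five AAMI classes can be written down as a small lookup table. The sketch below transcribes the symbols listed in Table I; the names `AAMI_CLASSES`, `SYMBOL_TO_AAMI` and `to_aami` are illustrative and do not come from the authors' implementation.

```python
# Mapping from AAMI class to the MIT-BIH beat symbols it covers (Table I).
# All identifiers here are hypothetical names chosen for this sketch.
AAMI_CLASSES = {
    "N": ["N", "L", "R", "e", "j"],
    "S": ["A", "a", "J", "S"],
    "V": ["V", "E"],
    "F": ["F"],
    "Q": ["/", "f", "Q"],
}

# Invert the table so that each MIT-BIH symbol points at its AAMI class.
SYMBOL_TO_AAMI = {
    symbol: aami
    for aami, symbols in AAMI_CLASSES.items()
    for symbol in symbols
}


def to_aami(symbol):
    """Return the AAMI class of a MIT-BIH beat symbol, or None if unmapped."""
    return SYMBOL_TO_AAMI.get(symbol)


print(to_aami("L"), to_aami("a"), to_aami("/"))
```

Symbols are case-sensitive: "A" and "a", or "F" and "f", belong to different entries.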

B. Traditional ECG classification

A large number of studies on heart arrhythmia classification and diagnosis using traditional machine learning approaches have been developed in past decades [1], [17]–[20]. These studies aim to build a machine learning model (such as SVM [17], random forest [18], Naïve Bayes [1] or a shallow neural network [19]) to help doctors and experts in the field obtain a more comprehensive analysis and diagnosis. For example, Mohebbi et al. [20] present a feature reduction method based on linear discriminant analysis and feed the extracted features to an SVM to recognize abnormal ECG patterns.

However, traditional machine learning methods face several issues that cannot be neglected. First, the complicated and messy pre-processing of raw ECG data costs time and effort, yet it is not certain that it improves the final results. Furthermore, feature engineering (such as wavelet decomposition and time-domain/frequency-domain feature extraction) also requires plenty of time, and the manually extracted features may not lead to optimal classification results.

C. Deep learning-based ECG classification

Classifiers built on top of deep neural networks are also an active research trend [2], [21]–[24]. For instance, Kiranyaz et al. [25] develop a patient-specific ECG classification and monitoring system in which, for each user, only a relatively small amount of common and individual-specific data is fed into a 1-D CNN. Feature extraction and classification can thus be combined into a single learning framework, which further enhances the performance of the classification system. In [26], Acharya et al. propose a computer-aided diagnosis system based on a CNN to automatically discriminate different ECG signal segments. The adopted CNN contains 11 layers followed by an output layer with 4 nodes, each of which represents one ECG class of recurring life-threatening arrhythmias. In the experiments, ECG signals of 2 and 5 seconds' duration are used, with QRS complex detection removed.


TABLE I: MIT-BIH heart arrhythmia dataset ECG classes regrouped according to the AAMI standard

| AAMI ECG class | Description | MIT-BIH database ECG classes |
|---|---|---|
| N | Normal heartbeat that is not included in the S, V, F or Q classes | Normal beat (N); Left bundle branch block beat (L); Right bundle branch block beat (R); Atrial escape beat (e); Nodal (junctional) escape beat (j) |
| S | Supraventricular ectopic beat | Atrial premature beat (A); Aberrated atrial premature beat (a); Nodal (junctional) premature beat (J); Supraventricular premature or ectopic beat (S) |
| V | Ventricular ectopic beat | Premature ventricular contraction (V); Ventricular escape beat (E) |
| F | Fusion beat | Fusion of ventricular and normal beat (F) |
| Q | Unknown beat | Paced beat (/); Fusion of paced and normal beat (f); Unclassifiable beat (Q) |

Although the aforementioned algorithms employ deep neural network structures, some form of feature extraction (such as the wavelet-based method [21]) is still required. More importantly, they all assign the same weights across different components of the ECG signal, which prevents the model from focusing on the truly distinctive patterns. Moreover, the accuracy of heart arrhythmia recognition is not good enough for practical deployment.

III. METHODOLOGY

A. Overview and motivation

CNNs have achieved great success in many research topics, such as computer vision [12] and recommendation systems [25], owing to their excellent high-level feature learning ability. Recurrent Neural Networks (RNNs, such as LSTM) are designed to capture temporal features, but they struggle with long dependencies and are computationally expensive for long sequences. Since our ECG signals are long sequences, we build our model on a CNN base. In general, a CNN architecture is composed of several key layers: the convolutional layer, the pooling layer, and the fully connected layer. To discover the internal dependencies among different features in each ECG signal, we propose the ABCNN model. For better understanding, we take a real ECG sample from the MIT-BIH dataset, a benchmark ECG collection, as an example to present our model. The ECG signals are acquired in two channels (leads) with a sampling rate of 360 Hz. From each heartbeat we select a sub-period of the signal around the R peak (240 sampling slices, i.e., about 0.67 second at 360 Hz), with the R peak in the middle of the selected segment. Thus, the ECG signal of each heartbeat is arranged as a matrix with shape [2, 240], where the rows denote channels and the columns denote sampling slices.
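The segment selection described above can be illustrated with a short NumPy sketch. It assumes that the R-peak sample index of each beat is already known (for example from the dataset annotations) and uses a synthetic two-lead recording in place of real data; `extract_beat` is a hypothetical helper, not part of the authors' code.

```python
import numpy as np

FS = 360       # sampling rate in Hz, as stated in the text
WINDOW = 240   # sampling slices kept per heartbeat


def extract_beat(signal, r_index, window=WINDOW):
    """Cut `window` samples centred on the R peak.

    signal: array of shape [2, T] (two leads).
    r_index: sample index of the R peak (assumed to be given).
    Returns an array of shape [2, window], or None when the window
    would run past either end of the recording.
    """
    half = window // 2
    start, stop = r_index - half, r_index + half
    if start < 0 or stop > signal.shape[1]:
        return None
    return signal[:, start:stop]


# Synthetic 10-second, two-lead recording used only to show the shapes.
recording = np.random.default_rng(0).normal(size=(2, 10 * FS))
beat = extract_beat(recording, r_index=1000)
print(beat.shape)  # (2, 240)
```

Beats whose window would cross the start or end of the recording are skipped here; padding them would be an equally reasonable choice.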

In a traditional CNN, all input neurons are equally weighted. In ECG data, however, different periods of the heartbeat differ in importance. For example, the QRS complex lasts for only 0.08 to 0.12 seconds (around 10% of the time in a heartbeat) but contributes the most crucial information. To deal with this problem, we propose ABCNN with an attention layer that constrains the model to focus on the valuable signals. In other words, the attention layer assigns different weights to the input neurons in order to force the model to pay attention to the most informative nodes.
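As a quick check of the figures quoted above, the QRS duration can be converted into sampling slices at the 360 Hz rate of the dataset. This is plain arithmetic; the helper name `seconds_to_samples` is made up for the sketch.

```python
FS = 360  # sampling rate in Hz, as stated in the text


def seconds_to_samples(seconds, fs=FS):
    """Convert a duration in seconds to the nearest whole number of samples."""
    return round(seconds * fs)


qrs_min = seconds_to_samples(0.08)
qrs_max = seconds_to_samples(0.12)
print(qrs_min, qrs_max)  # 29 43

# Share of a 240-slice heartbeat segment occupied by the QRS complex.
print(f"{qrs_min / 240:.1%} to {qrs_max / 240:.1%}")
```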

B. Attention-Based Convolutional Neural Networks

The designed ABCNN is composed of the following layers (Figure 1): the input layer, which receives the input data; the attention layer, which measures the importance of the input sample and controls how much attention the model should pay to each signal slice; the 1st convolutional and 1st pooling layers; the 2nd convolutional and 2nd pooling layers; the 1st and 2nd fully connected layers; and the output layer, which generates the output labels. Next, we introduce each component of ABCNN in detail. The convolutional layer contains a number of kernels that perform the convolution operation on the ECG signals [27]. Afterward, a pooling layer and a non-linear embedding are employed to extract the spatial information from the raw input data. Generally, a pooling layer is applied after each convolutional layer in order to reduce the dimension of the representation and lighten the computational load. Moreover, it decreases the number of parameters and helps prevent overfitting. A fully connected layer contains a number of neurons, each of which is connected to every neuron in the successive layer, while there are no connections within the same layer.

1) Notation: Since ABCNN works on intra-sample relevance, we analyze how the model processes a single ECG sample (a single heartbeat). Denote the representation in the $i$-th layer ($i = 1, 2, \cdots, 8$) by $X_i \in \mathbb{R}^{[n_i, K_i, d_i]}$, where $n_i$, $K_i$ and $d_i$ denote the number of channels, the representation dimension, and the representation depth of the $i$-th layer, respectively. In the input layer, a single ECG sample is denoted by $X_1 \in \mathbb{R}^{[2, 240, 1]}$. In the following, we use exact numbers to denote the shape of the data for better understanding.

2) Multi-head attention: We design ABCNN to pay more attention to the important ECG peaks by integrating an attention mechanism. Specifically, we assign a weight to each time slice in the input ECG sample. The attention weights are learned automatically from the ECG signal through a nonlinear mapping. For an input segment with 240 time slices, we use a nonlinear fully-connected layer to learn an attention weight vector with 240 elements, where each element corresponds to the importance of a certain time slice. We then broadcast the weights to the same shape as $X_i$ and apply element-wise multiplication to adjust the scale of each dimension. Furthermore, to increase the expressive power, we introduce multi-head attention to independently calculate multiple sets of attention weights [28]. Then, we take the average of all attention heads' results as the final attention weights.

[Figure 1: workflow diagram. Labels: ECG signals; ECG sample [2, 240]; multi-head attention weights; weights broadcast; 1st conv [2, 240, 2]; 1st pooling [2, 120, 2]; 2nd conv [2, 120, 4]; 2nd pooling [2, 60, 4]; fully-connected; diagnosis results (N, S, V, F, Q)]

Fig. 1: The workflow of the proposed ABCNN. The model learns an attention weight for each time slice of the input ECG sample in order to pay more attention to the important signatures. The weights are broadcast to the same shape as the sample and update the sample through element-wise multiplication. The five nodes in the last layer produce the probabilities that the input ECG sample is associated with each heart arrhythmia type.

3) ABCNN structure: In the first convolutional layer, we set the convolutional kernels with shape [1, 1] and the stride as [1, 1]. The stride denotes the sliding distances along the x-axis and the y-axis during the kernel's convolution operation. We use zero-padding to keep the shape consistent; as a result, the representation size remains unchanged through the convolution. There are two kernels in the first convolutional layer, so the representation depth changes from 1 to 2 and the shape of $X_2$ is [2, 240, 2].
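The multi-head attention step in 2) can be sketched in NumPy as follows. Several details are assumptions made for illustration, since the text does not fix them: the fully-connected layer takes the flattened [2, 240] sample (480 values) as input, the nonlinearity is a sigmoid, and random matrices stand in for learned parameters. The function names are hypothetical.

```python
import numpy as np


def multi_head_attention_weights(x, head_params):
    """Average the per-head attention weights for one ECG sample.

    x: array of shape [2, 240].
    head_params: list of (W, b) pairs, one per head, with W of shape
    [480, 240] and b of shape [240] (assumed layer sizes).
    Returns a weight vector of shape [240] with entries in (0, 1).
    """
    flat = x.reshape(-1)                              # [480]
    heads = []
    for W, b in head_params:
        logits = flat @ W + b                         # one fully connected layer
        heads.append(1.0 / (1.0 + np.exp(-logits)))   # sigmoid (assumed)
    return np.mean(heads, axis=0)                     # average over heads


def apply_attention(x, weights):
    """Broadcast the [240] weights over the channel axis and rescale x."""
    return x * weights[np.newaxis, :]


rng = np.random.default_rng(0)
n_heads = 4  # kept small for the illustration
params = [
    (rng.normal(scale=0.05, size=(480, 240)), np.zeros(240))
    for _ in range(n_heads)
]
x = rng.normal(size=(2, 240))
w = multi_head_attention_weights(x, params)
attended = apply_attention(x, w)
print(w.shape, attended.shape)  # (240,) (2, 240)
```

Each time slice receives a single weight that is shared by both leads, which is what broadcasting along the channel axis expresses.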

We employ ReLU as the activation function in order to non-linearize the convolved representation after each pooling layer. In the 1st pooling layer, the pooling window is set as [1, 2] and the moving stride is set as [1, 1]. The pooling method is max pooling, so the output of each pooling window is its maximum value. Through the pooling layer, the representation shape shrinks according to the pooling parameters; thus, the shape of $X_3$ is [2, 120, 2]. Similarly, the 2nd convolutional layer has 4 kernels with size [1, 2] and stride [1, 1], and its output has shape [2, 120, 4]. The 2nd pooling layer uses the same window and stride as the 1st pooling layer, and its output has shape [2, 60, 4].

In the two fully-connected layers, the informative features extracted by the previous layers are flattened into a 1-D vector. For instance, the representation of the 2nd pooling layer ($X_5$, which has size [2, 60, 4]) is unfolded into a 1-D vector ($X_6$) of size [1, 480]. Finally, the output of our model is $X_8$ with size [1, 5]. Cross-entropy is employed as the cost function, which is optimized by the popular Adam optimizer.
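The shapes quoted in this subsection can be traced with a few lines of plain Python. One assumption is needed: a [1, 2] pooling window halves the length (240 to 120, then 120 to 60) only if it advances two slices at a time, so the sketch uses a step of 2 along the time axis. That step is an assumption of this sketch, chosen to reproduce the reported shapes, and the helper names are hypothetical.

```python
def conv_same(shape, n_kernels):
    """Zero-padded, stride-1 convolution: size unchanged, depth = n_kernels."""
    channels, length, _ = shape
    return (channels, length, n_kernels)


def max_pool(shape, window=2, step=2):
    """Max pooling along the time axis; `step=2` is assumed (see above)."""
    channels, length, depth = shape
    return (channels, (length - window) // step + 1, depth)


def flatten(shape):
    """Unfold a [channels, length, depth] representation into one row."""
    channels, length, depth = shape
    return (1, channels * length * depth)


x1 = (2, 240, 1)                  # input sample
x2 = conv_same(x1, n_kernels=2)   # 1st convolution
x3 = max_pool(x2)                 # 1st pooling
x4 = conv_same(x3, n_kernels=4)   # 2nd convolution
x5 = max_pool(x4)                 # 2nd pooling
x6 = flatten(x5)                  # input to the fully connected layers

for name, shape in zip(["X1", "X2", "X3", "X4", "X5", "X6"],
                       [x1, x2, x3, x4, x5, x6]):
    print(name, shape)
```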

IV. EXPERIMENTAL SETTING

A. Dataset

Our model is evaluated on the benchmark MIT-BIH heart arrhythmia dataset [16], which has been widely used in automatic arrhythmia classification. The ECG signals in this database come from 4000 long-term Holter recordings, each lasting 24 hours, gathered from 47 subjects of various ages (60% inpatients and 40% outpatients; 25 males aged from 32 to 89 years; 22 females aged from 23 to 89 years). Among these, 48 records of adequate quality are selected, each around 30 minutes long. The dataset contains two subsets, where the first subset contains 23 records that represent a number of classic heartbeat signals. The second subset is intended to cover less common but clinically significant arrhythmias (e.g., junctional, supraventricular and complex ventricular arrhythmias, and conduction abnormalities). The ECG signals are collected in two channels (MLII and V5) and the sampling frequency is 360 Hz. Each heartbeat lasts around 1 second; we therefore select a sub-period of each heartbeat (240 sampling slices, i.e., about 0.67 second at 360 Hz). The R peak is in the middle of the selected segment. Thus, the ECG signal of each heartbeat is arranged as a matrix with shape [2, 240], where the rows denote channels and the columns denote sampling slices.

B. Multi-class arrhythmia diagnosis scenario

According to the characteristics of heart arrhythmia, we split the heartbeats from the MIT-BIH dataset into 5 typical classes. All 48 records (which include a variety of rare but clinically important phenomena) are used for automatic heart arrhythmia classification. We select 2,000 samples from each record and obtain 96,000 samples in total. Every sample includes 240 sampling points with two numerical values for each point. We randomly split the dataset into a training set (80%) and a testing set (20%). For each sample, the R peak is placed at the middle of the time window so that the major features of different heart arrhythmias are included. Standard (z-score) normalization is applied to the raw ECG data to prevent the influence of the channels' varying scales.
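A minimal NumPy sketch of the z-score normalization and the random 80/20 split follows. The text does not say over which axes the statistics are computed; the sketch assumes one mean and one standard deviation per channel, taken over all samples and time slices. Synthetic data stands in for the MIT-BIH samples and the function names are hypothetical.

```python
import numpy as np


def zscore_per_channel(samples):
    """Standardize each channel over all samples and time slices.

    samples: array of shape [n_samples, 2, 240].
    """
    mean = samples.mean(axis=(0, 2), keepdims=True)
    std = samples.std(axis=(0, 2), keepdims=True)
    return (samples - mean) / std


def random_split(samples, labels, train_fraction=0.8, seed=0):
    """Shuffle the indices once, then cut them into training and testing parts."""
    order = np.random.default_rng(seed).permutation(len(samples))
    cut = int(train_fraction * len(samples))
    train, test = order[:cut], order[cut:]
    return samples[train], labels[train], samples[test], labels[test]


# Synthetic stand-in: two channels with deliberately different scales.
rng = np.random.default_rng(1)
data = rng.normal(loc=[[3.0], [-1.0]], scale=[[2.0], [0.5]], size=(1000, 2, 240))
labels = rng.integers(0, 5, size=1000)

normed = zscore_per_channel(data)
x_train, y_train, x_test, y_test = random_split(normed, labels)
print(x_train.shape, x_test.shape)  # (800, 2, 240) (200, 2, 240)
```

Computing the statistics on the training part only and reusing them for the testing part is a common alternative that avoids sharing information across the split.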

C. Evaluation metrics

Because the number of samples associated with different arrhythmia types is highly imbalanced, we mainly adopt ROC (Receiver Operating Characteristic) curves and AUC (Area Under the Curve) to describe the model's performance, as they consider not only the predicted results but also the predictive probabilities (the confidence level of the predicted label). Specifically, an important advantage of ROC is that the curve is insensitive to the training data size. For example, the ROC curve of a classifier does not fluctuate noticeably whether it is trained on 1 million samples or 10 billion samples. The value of AUC falls in the range [0.5, 1]; the higher the better. Although the commonly used evaluation metrics (such as accuracy, precision, recall and F-1 score) are less suitable because they are all sensitive to the sample amount, we use them to show the model's performance on each single class.

[Figure 2: (a) confusion matrix over classes N, S, V, F, Q; (b) ROC curves with AUC for classes N, S, V, F, Q]

Fig. 2: Confusion matrix and ROC curves

TABLE II: Comparison between the proposed method and commonly used baselines in multiclass arrhythmia detection.

| Literature | Model | AUC |
|---|---|---|
| [29] | SVM | 0.9514 ± 0.0053 |
| [18] | RF | 0.9704 ± 0.0041 |
| [30] | KNN | 0.9126 ± 0.0087 |
| [26] | CNN | 0.9655 ± 0.0251 |
| [3] | LSTM | 0.6297 ± 0.03363 |
| Ours | ABCNN | 0.9896 ± 0.0109 |

TABLE III: Summary of diagnosis results in each specific arrhythmia category.

| Label | Precision | Recall | F-1 | Support |
|---|---|---|---|---|
| N | 0.9902 | 0.9968 | 0.9965 | 16884 |
| S | 0.9476 | 0.9061 | 0.9263 | 1298 |
| V | 0.9795 | 0.9571 | 0.9682 | 1351 |
| F | 0.9147 | 0.7662 | 0.8339 | 154 |
| Q | 0.995 | 0.996 | 0.9955 | 993 |
| Average | 0.9865 | 0.9868 | 0.9865 | 20680 |
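To make the AUC metric used in this subsection concrete, the sketch below computes it as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve. The one-vs-rest extension to five classes is an assumption about how per-class AUC could be obtained, not a description of the authors' evaluation code, and the function names are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative.

    scores: predicted probability of the positive class.
    labels: 1 for positive, 0 for negative. Ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(prob_rows, true_classes, n_classes=5):
    """Per-class AUC: class k is positive, every other class is negative."""
    return [
        binary_auc([row[k] for row in prob_rows],
                   [1 if c == k else 0 for c in true_classes])
        for k in range(n_classes)
    ]


print(binary_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of samples; it is written for clarity, and a rank-based formulation would be preferable for test sets of the size reported here.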

D. Hyper-parameter tuning

Through preliminary hyper-parameter tuning, we obtain the optimal setting of the proposed ABCNN: the first convolutional layer has 4 kernels with size [2, 2] and stride [1, 1]; the first pooling layer has kernel size [1, 2] and stride [1, 1]; the second convolutional layer has 8 kernels with the same kernel size and stride as the first convolutional layer; the second pooling layer has kernel size [1, 2] and stride [1, 1]. The two fully connected layers have 240 and 60 hidden neurons, respectively. The output layer has 5 nodes corresponding to the 5 ECG classes. The number of heads for the attention mechanism is 32. We use the Adam optimizer with a learning rate of 0.0005. We use an early-stopping strategy that stops training if the testing loss has not decreased over the previous 20 consecutive epochs. A dropout layer with a dropout rate of 0.3 is added to the first fully-connected layer. Unless otherwise specified, the experimental settings (e.g., the training/testing split, learning rate and number of epochs) are the same for all methods.
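The settings listed above can be collected in a configuration dictionary, and the early-stopping rule can be sketched as a small function. The values are taken from the text; the key names, the function name and the exact reading of the rule (stop once the testing loss has failed to improve on its best value for 20 consecutive epochs) are assumptions of this sketch.

```python
# Hyper-parameters as reported in the text; key names are illustrative.
ABCNN_CONFIG = {
    "conv1": {"kernels": 4, "size": (2, 2), "stride": (1, 1)},
    "pool1": {"size": (1, 2), "stride": (1, 1)},
    "conv2": {"kernels": 8, "size": (2, 2), "stride": (1, 1)},
    "pool2": {"size": (1, 2), "stride": (1, 1)},
    "fc_units": (240, 60),
    "n_classes": 5,
    "attention_heads": 32,
    "learning_rate": 0.0005,
    "dropout_rate": 0.3,
    "early_stopping_patience": 20,
}


def early_stopping_epoch(test_losses, patience=20):
    """Return the epoch index at which training stops, or None if it never does.

    Training stops once the testing loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(test_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# The loss improves twice and then plateaus above its best value.
losses = [1.0, 0.5] + [0.6] * 30
print(early_stopping_epoch(losses, ABCNN_CONFIG["early_stopping_patience"]))  # 21
```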

E. Comparison Methods

This work compares the proposed approach with a number of widely used machine learning models. The baselines include Support Vector Machine (SVM), Random Forest (RF) and K-nearest neighbors (KNN). The key parameters of the baselines are: KNN with 3 nearest neighbors; SVM with an RBF kernel; RF with 50 trees. To further demonstrate the effectiveness of ABCNN, we also compare with a standard CNN and with long short-term memory (LSTM), another very popular deep learning architecture. For a fair comparison, the CNN baseline has the same parameters as ABCNN (introduced in the previous subsection) except for the attention module. For the LSTM baseline, we reimplement the model with 2 LSTM layers, each with 6 cells; the number of time steps is 240. More experimental settings are presented in our public code.

V. RESULTS

A. Classification results

The comparison of ABCNN with several competitive baselines is shown in Table II. We observe that our approach achieves the highest classification performance, with an AUC of 98.96%, and outperforms all the well-known traditional machine learning-based and deep learning-based classifiers. For a fair comparison, we use AUC to evaluate the results because the dataset is imbalanced. For more detail on each heart arrhythmia type, the classification report in Table III shows the precision, recall, F-1 score and support for each class. The support denotes the number of samples of the corresponding class in the testing set. The confusion matrix, ROC curves and AUC scores for each class are presented in Figure 2. The X-axis of the ROC figure is log-scaled. Class F has the lowest accuracy and AUC score, which shows that fusion beats are the most difficult arrhythmia type to recognize. Moreover, we learn from the confusion matrix that a number of fusion beats (16.88%) are misclassified as class 0 (normal beats). This is reasonable because a fusion beat combines a ventricular beat and a normal beat. Class N and class Q obtain the highest performance because they contain the highest and the second highest number of samples in the dataset.

[Figure 3: two scatter plots, (a) raw data and (b) after classification; horizontal axis component 1, vertical axis component 2; legend class 0 to class 4]

Fig. 3: Visualization of raw data and learned representation.

B. Convergence analysis

For a novel and robust approach, convergence behavior is a crucial criterion. Thus, we assess how the accuracy and cost change as the number of training iterations increases. In Figure 4,

[Figure 4: line plot. Horizontal axis: Iterations (0 to 1000); left vertical axis: Accuracy (0.91 to 0.99); right vertical axis: Cost (0.0 to 4.0).]

Fig. 4: Accuracy and testing loss vary with the number of iterations in ABCNN. The highest accuracy achieved is 98.57%.

the left vertical axis shows the accuracy and the right vertical axis shows the cost. We can observe from the figure that the accuracy climbs to around 98% in fewer than 300 iterations. However, at around the 350th iteration, the accuracy drops to about 96% and the cost increases slightly. This is because the loss function of a neural network is non-convex, so gradient-based optimization cannot guarantee finding the global optimum. Nevertheless, ABCNN can find a low-cost local optimum, although it may jump out of one local optimum and then settle into another. This causes fluctuations in the accuracy and cost, but both tend to converge over a long training process such as 1000 iterations. Finally, we can see that our model converges at about 98.5% accuracy after 800 iterations.
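The point about local optima can be illustrated with a toy example. The sketch below runs plain gradient descent on a one-dimensional non-convex function with two minima. The function, learning rate, and starting points are assumptions chosen for illustration and have no connection to the actual ABCNN loss or optimizer.

```python
def loss(x):
    # Toy non-convex cost with two minima (illustrative only, not the ABCNN loss).
    return x**4 - 3 * x**2 + x


def grad(x):
    # Analytical derivative of the toy cost.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=1000):
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


x_left = gradient_descent(-2.0)   # settles in the deeper (global) minimum
x_right = gradient_descent(2.0)   # settles in the shallower (local) minimum

print(f"start -2.0 -> x = {x_left:.4f}, loss = {loss(x_left):.4f}")
print(f"start +2.0 -> x = {x_right:.4f}, loss = {loss(x_right):.4f}")
```

Both runs converge, but to different points with different costs, which is why convergence alone does not imply that the global optimum has been found.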

C. Visualization

To intuitively present the results of the proposed classification model, we visualize the raw data of the MIT-BIH dataset and the features extracted by ABCNN, with 5 labels. We first adopt Principal Component Analysis (PCA) to reduce the raw data to two dimensions so that it can be visualized in a 2D coordinate system. The visualization is plotted on the test dataset. Figure 3 shows the raw data and the ABCNN-processed features for heart arrhythmia detection. It can be clearly observed that the classified samples are clustered better than the raw data.
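The PCA projection used for this kind of plot can be sketched in a few lines of NumPy. The function name `pca_2d` and the random input matrix are hypothetical; in practice the rows would be the test samples, either raw signals or learned features. This is a generic PCA via singular value decomposition and not necessarily the exact routine used to produce Figure 3.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T


# Synthetic stand-in for the test set (assumption): 200 samples, 240 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 240))
Z = pca_2d(X)  # Z[:, 0] is "component 1", Z[:, 1] is "component 2"
```

Each row of `Z` gives the 2D coordinates of one sample, which can then be scattered and colored by class label.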

VI. DISCUSSION AND FUTURE WORK

In this section, we discuss the limitations of the proposed ABCNN and directions for future work. ABCNN is shown to be effective, with competitive heart arrhythmia diagnosis performance on the benchmark dataset. However, several open challenges still require future attention.

First, deep learning algorithms depend heavily on parameter tuning. In this work, we conduct pre-experiments to tune hyper-parameters such as the learning rate, number of convolutional layers, kernel size, and pooling size. However, more detailed and systematic hyper-parameter tuning methods, e.g., the Orthogonal Array method [31], should be adopted to save time and obtain optimal parameters.
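To give a sense of why an orthogonal array saves time, the sketch below builds a standard L9 array for four three-level factors: 9 trials instead of the 81 required by a full grid, with every pair of factors still covering all 9 level combinations exactly once. The candidate levels for each hyper-parameter are hypothetical, and this is a generic orthogonal-array construction rather than the exact procedure of [31].

```python
from itertools import combinations, product


def l9_orthogonal_array():
    """L9 array: 9 runs over 4 factors with 3 levels each (level indices 0 to 2)."""
    return [
        (a, b, (a + b) % 3, (a + 2 * b) % 3)
        for a, b in product(range(3), repeat=2)
    ]


def is_orthogonal(array, n_levels=3):
    """Check that every pair of columns contains each level pair exactly once."""
    n_cols = len(array[0])
    for i, j in combinations(range(n_cols), 2):
        pairs = [(row[i], row[j]) for row in array]
        if len(set(pairs)) != n_levels**2 or len(pairs) != n_levels**2:
            return False
    return True


# Hypothetical candidate levels for the hyper-parameters named in the text.
levels = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "conv_layers": [1, 2, 3],
    "kernel_size": [3, 5, 7],
    "pool_size": [2, 3, 4],
}

names = list(levels)
trials = [
    {name: levels[name][idx] for name, idx in zip(names, row)}
    for row in l9_orthogonal_array()
]
full_grid_size = 3 ** len(names)

print(f"{len(trials)} orthogonal-array trials vs. {full_grid_size} full-grid trials")
```

Each dictionary in `trials` is one configuration to train and evaluate; the best level of each factor is then typically chosen by comparing the average result per level.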

Second, heart arrhythmia detection still faces a number of obstacles before wide deployment in real-world environments. For example, online testing is necessary to validate the performance in online settings. As is well known, some machine learning algorithms perform excellently offline but cannot handle online testing. This is one of the key research directions for the future.

VII. CONCLUSION

In this paper, we propose a novel and robust heart arrhythmia detection model, called ABCNN, by combining a multi-head attention mechanism and a convolutional neural network. The proposed model works directly on the raw ECG data without feature engineering. The attention layer forces the model to focus on the most informative ECG signals. The CNN structure is employed to automatically discover the spatial dependencies in the input data, which are then used for classification. To evaluate the effectiveness of the proposed ABCNN, we conduct extensive experiments on the well-known MIT-BIH heart arrhythmia dataset. Our approach achieves an AUC of 98.96% for arrhythmia diagnosis. The experimental results outperform the widely used baselines, which demonstrates that ABCNN is effective in heart arrhythmia detection.

REFERENCES

[1] T. Soman and P. O. Bobbie, “Classification of arrhythmia using machine learning techniques,” WSEAS Transactions on Computers, vol. 4, no. 6, pp. 548–552, 2005.

[2] S. Chauhan and L. Vig, “Anomaly detection in ECG time signals via deep long short-term memory networks,” in 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2015, pp. 1–7.

[3] B. Hou, J. Yang, P. Wang, and R. Yan, “LSTM-based auto-encoder model for ECG arrhythmias classification,” IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 4, pp. 1232–1240, 2019.

[4] A. Schumann, N. Wessel, A. Schirdewan, K. J. Osterziel, and A. Voss, “Potential of feature selection methods in heart rate variability analysis for the classification of different cardiovascular diseases,” Statistics in Medicine, vol. 21, no. 15, pp. 2225–2242, 2002.

[5] S. S. Lobodzinski and A. A. Jadalla, “Integrated heart failure telemonitoring system for homecare,” Cardiology Journal, vol. 17, no. 2, pp. 200–204, 2010.

[6] A. Hossen and B. Al-Ghunaimi, “Identification of patients with congestive heart failure by recognition of sub-bands spectral patterns,” in Conf Proc of World Academy of Science, Engineering and Technology, vol. 34. Citeseer, 2008, pp. 21–24.

[7] J.-c. Hsieh and M.-W. Hsu, “A cloud computing based 12-lead ECG telemedicine service,” BMC Medical Informatics and Decision Making, vol. 12, no. 1, p. 77, 2012.

[8] E. B. Mazomenos, D. Biswas, A. Acharyya, T. Chen, K. Maharatna, J. Rosengarten, J. Morgan, and N. Curzen, “A low-complexity ECG feature extraction algorithm for mobile healthcare applications,” IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 2, pp. 459–469, 2013.

[9] S. L, “The only EKG book you will ever need,” JAMA, vol. 261, no. 3, p. 453, 1989. [Online]. Available: http://dx.doi.org/10.1001/jama.1989.03420030127049

[10] D. Sadhukhan and M. Mitra, “R-peak detection algorithm for ECG using double difference and RR interval processing,” Procedia Technology, vol. 4, pp. 873–877, 2012.

[11] A. A. Shinde and P. Kanjalkar, “The comparison of different transform based methods for ECG data compression,” in 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies. IEEE, 2011, pp. 332–335.

[12] Z. Huang, R. Wang, S. Shan, and X. Chen, “Face recognition on large-scale video in the wild with hybrid Euclidean-and-Riemannian metric learning,” Pattern Recognition, vol. 48, no. 10, pp. 3113–3124, 2015.

[13] R. Socher, Y. Bengio, and C. D. Manning, “Deep learning for NLP (without magic),” in Tutorial Abstracts of ACL 2012. Association for Computational Linguistics, 2012, pp. 5–5.

[14] L. Deng, D. Yu et al., “Deep learning: methods and applications,” Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.

[15] S. S. Chugh, R. Havmoeller, K. Narayanan, D. Singh, M. Rienstra, E. J. Benjamin, R. F. Gillum, Y.-H. Kim, J. H. McAnulty Jr, Z.-J. Zheng et al., “Worldwide epidemiology of atrial fibrillation: a global burden of disease 2010 study,” Circulation, vol. 129, no. 8, pp. 837–847, 2014.

[16] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.

[17] S. Osowski, L. T. Hoai, and T. Markiewicz, “Support vector machine-based expert system for reliable heartbeat recognition,” IEEE Transactions on Biomedical Engineering, vol. 51, no. 4, pp. 582–589, 2004.

[18] R. Mahajan, R. Kamaleswaran, J. A. Howe, and O. Akbilgic, “Cardiac rhythm classification from a short single lead ECG recording via random forest,” in 2017 Computing in Cardiology (CinC). IEEE, 2017, pp. 1–4.

[19] R. d. F. Dalvi, G. T. Zago, and R. V. Andreao, “Heartbeat classification system based on neural networks and dimensionality reduction,” Research on Biomedical Engineering, vol. 32, no. 4, pp. 318–326, 2016.

[20] M. Mohebbi and H. Ghassemian, “Detection of atrial fibrillation episodes using SVM,” in 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2008). IEEE, 2008, pp. 177–180.

[21] O. Yildirim, “A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification,” Computers in Biology and Medicine, vol. 96, pp. 189–202, 2018.

[22] M. M. Al Rahhal, Y. Bazi, H. AlHichri, N. Alajlan, F. Melgani, and R. R. Yager, “Deep learning approach for active classification of electrocardiogram signals,” Information Sciences, vol. 345, pp. 340–354, 2016.

[23] P. Rajpurkar, A. Y. Hannun, M. Haghpanahi, C. Bourn, and A. Y. Ng, “Cardiologist-level arrhythmia detection with convolutional neural networks,” arXiv preprint arXiv:1707.01836, 2017.

[24] S. Sakib, M. M. Fouda, and Z. M. Fadlullah, “A rigorous analysis of biomedical edge computing: An arrhythmia classification use-case leveraging deep learning,” in 2020 IEEE International Conference on Internet of Things and Intelligence System (IoTaIS). IEEE, 2021, pp. 136–141.

[25] S. Kiranyaz, T. Ince, and M. Gabbouj, “Real-time patient-specific ECG classification by 1-D convolutional neural networks,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 3, pp. 664–675, 2016.

[26] U. R. Acharya, H. Fujita, O. S. Lih, Y. Hagiwara, J. H. Tan, and M. Adam, “Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network,” Information Sciences, vol. 405, pp. 81–90, 2017.

[27] X. Zhang, L. Yao, X. Wang, J. J. Monaghan, D. Mcalpine, and Y. Zhang, “A survey on deep learning-based non-invasive brain signals: recent advances and new frontiers,” Journal of Neural Engineering, 2020.

[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.

[29] T. Barrella and S. McCandlish, “Identifying arrhythmia from electrocardiogram data,” 2014.

[30] R. Saini, N. Bindal, and P. Bansal, “Classification of heart diseases from ECG signals using wavelet transform and KNN classifier,” in International Conference on Computing, Communication & Automation. IEEE, 2015, pp. 1208–1215.

[31] X. Zhang, X. Chen, L. Yao, C. Ge, and M. Dong, “Deep neural network hyperparameter optimization with orthogonal array tuning,” in International Conference on Neural Information Processing. Springer, 2019, pp. 287–295.