CAB: Classifying Arrhythmias based on Imbalanced Sensor Data

  • Wang, Yilin (Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science & Technology, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology) ;
  • Sun, Le (Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science & Technology, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology) ;
  • Subramani, Sudha (Victoria University)
  • Received : 2021.04.25
  • Accepted : 2021.06.26
  • Published : 2021.07.31

Abstract

Intelligently detecting anomalies in health sensor data streams (e.g., electrocardiograms, ECGs) can advance the e-health industry. The physiological signals of patients are collected through sensors. Timely diagnosis and treatment save medical resources, promote physical health, and reduce complications. However, automatically classifying ECG data is difficult because the features of ECGs are hard to extract, and the volume of labeled ECG data is limited, which affects the classification performance. In this paper, we propose a Generative Adversarial Network (GAN)-based deep learning framework (called CAB) for heart arrhythmia classification. CAB focuses on improving the detection accuracy based on a small number of labeled samples. It is trained on class-imbalanced ECG data. Augmenting the ECG data with a GAN model mitigates the impact of data scarcity. After data augmentation, CAB classifies the ECG data using a Bidirectional Long Short Term Memory Recurrent Neural Network (Bi-LSTM). Experiment results show that CAB performs better than state-of-the-art methods. The overall classification accuracy of CAB is 99.71%. The F1-scores of classifying Normal beats (N), Supraventricular ectopic beats (S), Ventricular ectopic beats (V), Fusion beats (F), and Unclassifiable beats (Q) are 99.86%, 97.66%, 99.05%, 98.57%, and 99.88%, respectively.

1. Introduction

Heart arrhythmia is a common disease in elderly people [1]. It can lead to serious health problems and even sudden death [2]. Therefore, it is important to have immediate diagnosis and treatment for heart arrhythmias [3]. Electrocardiogram (ECG) is a popular sensor signal for diagnosing heart diseases [4]. However, manually analyzing the complex ECG signals is difficult [5] [6]. Therefore, it is necessary to develop efficient methods to assist experts in detecting heart arrhythmias automatically [7].

The Association for the Advancement of Medical Instrumentation (AAMI) groups the heartbeats of arrhythmia patients into five categories: Normal beat (N), Ventricular ectopic beat (V), Supraventricular ectopic beat (S), Unclassifiable beat (Q), and Fusion beat (F) [8]. Heartbeats in class N are normal or bundle branch block heartbeats. Class S includes abnormal supraventricular heartbeats and premature heartbeats. Class V refers to ventricular ectopic beats, which originate in the ventricles rather than being paced by the sinus node. Class F refers to fusion heartbeats, and class Q refers to unclassifiable heartbeats. Fig. 1 shows the heartbeat types contained in each category.

Fig. 1. Five categories of heartbeat types defined by AAMI.

In ECG data, some classes, such as F and Q, are rare. Data imbalance is therefore a significant problem in ECG classification. With too few samples, the classification model cannot be adequately trained, which lowers the accuracy of heartbeat classification, especially for rare heartbeats. To solve the data imbalance problem, several data augmentation methods have been proposed. A commonly used algorithm for time series data augmentation is the Synthetic Minority Oversampling Technique (SMOTE) [9]. However, SMOTE has the following limitations: (1) it selects neighbors blindly; and (2) it cannot overcome problems in the data distribution and easily causes distribution marginalization [9]. We therefore propose a data augmentation method based on a Generative Adversarial Network (GAN). It can generate high-quality ECG data to prevent data imbalance from affecting classification.
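For reference, the interpolation step at the core of SMOTE can be sketched as follows. This is a minimal illustration and not the implementation evaluated in [9]: the function name `smote_sample` and the toy beats are hypothetical, and the sketch uses only the single nearest neighbor, whereas SMOTE normally picks one of the k nearest neighbors at random.

```python
import math
import random

def smote_sample(minority, index, rng):
    """Create one synthetic sample by interpolating between a minority
    sample and its nearest minority-class neighbor (brute-force search)."""
    x = minority[index]
    # Nearest neighbor by Euclidean distance, excluding the sample itself.
    neighbor = min(
        (s for j, s in enumerate(minority) if j != index),
        key=lambda s: math.dist(x, s),
    )
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

# Hypothetical toy "beats" with three time ticks each.
beats = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.1], [5.0, 5.0, 5.0]]
synthetic = smote_sample(beats, 0, random.Random(0))
```

Every synthetic point produced this way lies on the line segment between two existing samples of the minority class.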

GAN is a generative model [10] for data generation. It is composed of a generator and a discriminator. When the generator is fixed, the discriminator is optimized to maximize its discrimination accuracy. When the discriminator is fixed, the generator is optimized to minimize that accuracy [11]. A Long Short Term Memory Recurrent Neural Network (LSTM) is a special kind of Recurrent Neural Network (RNN). Its cell state is more complex than that of an ordinary RNN [12]. An LSTM keeps information in gated cells that can store, write, or read it [13].

In this paper, we propose a GAN-based framework (called CAB) to Classify Arrhythmias (N, S, V, F, and Q) based on imBalanced ECG sensor data. In CAB, we design a GAN to generate data for the heartbeat categories with sparse samples. In addition, we develop a stacked Bidirectional Long Short Term Memory Recurrent Neural Network (Bi-LSTM) to classify heartbeats. Our main contributions are as follows:

• We design a GAN-based data augmentation algorithm for ECG data. The algorithm can effectively capture the characteristics of ECG data to generate high-quality ECG data. It then generates enough heartbeat data for the classes with sparse samples to train the classification model.

• We propose a deep neural network based on stacked Bi-LSTM. The network is trained on the original data and the data augmented by the GAN. CAB classifies heartbeats based on the morphology of heartbeats. It can extract timing features, thus improving the accuracy of heartbeat classification based on a small set of labeled samples.

• We conduct comprehensive experiments to evaluate CAB by comparing it with state-of-the-art methods. The overall classification accuracy is 99.71%. The F1-scores of classifying N, S, V, F, and Q heartbeats are 99.86%, 97.66%, 99.05%, 98.57%, and 99.88%, respectively.

The rest of this paper is structured as follows: Section 2 introduces related work; Section 3 introduces CAB; Section 4 describes the experiments and results; and Section 5 concludes the paper.

2. Related Work

2.1 Arrhythmia Heartbeat Classification Methods

2.1.1 Machine Learning Classification Methods

Elhaj et al. [14] proposed a classification method combining a Bayesian algorithm with an Extreme Learning Machine (ELM), which can avoid the problem of overfitting. Mondéjar-Guerra et al. [15] proposed a classification method based on multiple support vector machines. This method uses the time intervals and the morphology of heartbeats as features. Shi et al. [16] developed a classification method using weighted extreme gradient boosting. Garcia et al. [17] proposed a new ECG representation based on an ECG vector, called Temporal VectorCardioGram (TVCG). The classification method uses Support Vector Machines (SVMs) as a classifier and performs feature selection with a Particle Swarm Optimization (PSO) algorithm. Alfaras et al. [18] proposed an automatic and fast ECG arrhythmia classifier based on a brain-inspired machine learning method called Echo State Networks. Houssein et al. [19] proposed a heartbeat classification method based on Twin Support Vector Machines (TWSVM) and a hybrid of Particle Swarm Optimization and the Gravitational Search Algorithm (PSOGSA). Empirical Mode Decomposition (EMD) is applied to ECG noise removal and feature extraction. PSOGSA is used to find the optimal parameters of TWSVM to improve the classification process.

2.1.2 Deep Learning Classification Methods

Jiang et al. [20] proposed a sequence-to-sequence (seq2seq) model based on LSTM with Convolutional Neural Network (CNN)-based embedding. Specifically, local channel-wise attention is used to highlight more discriminative features. Li et al. [21] developed a DNN based on the residual network. Chazal et al. [22] used the ECG morphology, heartbeat interval, and RR interval to select the optimal classifier configuration. Niu et al. [23] designed a heartbeat classification framework based on Multi-Perspective CNN. This method automatically learns features and classifies heartbeats based on the symbolic representation of heartbeat. Hannun et al. [24] proposed a deep neural network (DNN) that can classify 12 kinds of ECG signals, including 10 arrhythmias, one sinus rhythm, and one noise. Saadatnejad et al. [25] proposed a heartbeat classification framework composed of wavelet transform and multiple LSTM. Lu et al. [26] proposed a hybrid ECG classification model based on LSTM and CNN. The model can learn the structural features of ECG signals and mine the time correlation between ECG signal points.

Li et al. [27] proposed a heartbeat classification model based on the Bi-LSTM and Bi-LSTM Attention algorithms. Wang et al. [28] proposed a classification method based on semantic coding, with Bi-LSTM as the classification model. The model uses a Stacked Denoising AutoEncoder (SDAE) as the encoder to automatically learn the semantic encoding of the heartbeat. Yildirim [29] proposed a wavelet sequence classification model called DBLSTM-WS based on a deep Bi-LSTM network. This model divides the ECG signal into sub-bands of different scales based on wavelets and inputs them into the LSTM network. Wu et al. [30] proposed an ECG signal analysis method based on a residual network and Bi-LSTM. This method uses the statistical time domain, frequency domain, nonlinear domain, and deep features of the ECG signal to distinguish basic heart diseases. The combination of the residual network and Bi-LSTM attends better to the sequence information contained in the ECG signal. Li et al. [31] proposed a heartbeat classification method based on General CNN.

Li et al. [32] proposed a customizable CNN-based automatic patient-specific classification method and a channel-wise attention module to selectively emphasize the informative features. Zhai and Tin [33] proposed a CNN-based classifier. According to the change of heart rate, the single-channel ECG signal is divided into beats, and the beat values are converted into a two-beat coupling matrix. Feeding the coupling matrix into the CNN classifier captures not only the morphology of beats in the ECG but also the beat-to-beat correlation. Xia et al. [34] proposed a sparse constraint SDAE to learn ECG features. Mousavi and Afghah [35] developed an automatic heartbeat classification method using deep CNN and sequence-to-sequence models for unbalanced datasets. Xu et al. [36] proposed a signal alignment method based on deep learning and a feature extraction method using DNN. Wang et al. [37] proposed an improved CNN for accurate classification. Each convolution layer of the CNN uses kernels of different sizes, making full use of the characteristics of different scales.

2.2 Data Augmentation Methods

Shaker et al. [38] proposed two methods: (1) an end-to-end deep learning method, (2) a two-stage hierarchical learning method based on deep CNNs. The designed GAN for the balance of the dataset is based on fully connected layers. Wang et al. [39] proposed a classification method using stacked residual networks and LSTM. They used an Auxiliary Classifier Generative Adversarial Network (ACGAN) for data augmentation. Zhou et al. [40] proposed an arrhythmia classification system based on GAN. The generator of GAN is used for data augmentation. The discriminator is used as the arrhythmia classifier. Golany et al. [41] proposed an Ordinary Differential Equation (ODE) system. The ODE system uses GAN to generate ECG data. Golany and Radinsky [42] proposed a patient-specific ECG classification method using semi-supervised approach. A GAN-based augmentation model is designed to learn and synthesize patient-specific ECG signals.

Acharya et al. [43] designed a 9-layer CNN to classify heartbeat and a method based on Z score normalization to augment data. Rajesh and Dhuli [9] proposed a nonstationary nonlinear decomposition method to extract features. The AdaBoost integrated classifier is used for heartbeat classification. Besides, Re-sampling, SMOTE, and Distribution based data sampling are used for data augmentation. Sellami and Hwang [44] proposed a new deep CNN for heartbeat classification. A batch weighted loss function is proposed to quantify the loss to overcome the problem of data imbalance. Romdhane et al. [45] proposed a CNN-based method and an optimized loss function for data augmentation. Nazi et al. [46] proposed a classification framework based on the dot Residual LSTM network. Conditional Variational AutoEncoder (CVAE) and LSTM network are used to increase training samples to solve data imbalance.

2.3 Abnormal Heartbeat Detection

Atrial Fibrillation (AF) is the most common arrhythmia, and the incidence of AF increases with age. To prevent atrial fibrillation from affecting people’s health, it must be diagnosed and treated as soon as possible. Many studies are devoted to the detection of AF. These studies can screen out AF. Zhou et al. [47] proposed an algorithm based on GAN called BeatGAN for detecting abnormal heartbeats. Lai et al. [48] proposed a lightweight CNN to automatically identify AF, using representative rhythmic features of AF. Shen et al. [49] developed a 50-layer CNN to detect AF. Fan et al. [50] proposed a Multi-Scale deep CNN (MS-CNN) fusion algorithm. MS-CNN uses double-stream convolutional network architecture with filters of different sizes to capture features of different scales. Zhao et al. [51] proposed a Bayesian spectral time representation method for ECG signals based on the state-space model and Kalman filter. A dense CNN is used as a classifier.

There are also some studies that classify ECG records. Dang et al. [52] proposed a model based on Bi-LSTM and CNN for AF heartbeat detection. Wu et al. [53] used the feature of the wavelet coefficient matrix based on the continuous wavelet transform. And a CNN with a specific structure is used as a detection model. Limam and Precioso [54] designed a classification network using Convolutional Recurrent Neural Network (CRNN). The network is composed of two independent CNNs, one RNN, and an SVM. Zabihi et al. [55] selected the ECG signal features from time, frequency, time-frequency domains, and phase space reconstruction. Finally, a random forest classifier is used to classify the selected features.

Many studies have used the MIT-BIH dataset to classify N, S, V, F, and Q heartbeats. Li et al. [27] developed a Bi-LSTM network for heartbeat classification. Chen et al. [56] proposed an SVM algorithm to classify heartbeats. Wang et al. [57] developed a two-layer classifier. Each layer contains two independent fully connected networks. Acharya et al. [43] designed a nine-layer CNN for classification. Wang et al. [37] developed an improved CNN, where the kernel sizes of the convolution layers differ from each other. Experiment results show that CAB performs better than the classification models in [27], [37], [43], [56], [57], [59], [60] (see Section 4).

3. The Proposed Framework for Heartbeat Classification

CAB consists of two parts: (1) a GAN for generating data to solve the problem of class imbalance; and (2) a stacked Bi-LSTM for heartbeat classification. The framework of CAB is shown in Fig. 2. The GAN generates data for the classes with small samples. And then the generated data combined with the original data are used to train the classification model.

Fig. 2. Structure of CAB framework.

3.1 GAN for Data Augmentation

Fig. 3 shows the GAN structure of CAB. The input of the discriminator is a heartbeat sequence 𝑋𝑖 = {𝑥1, 𝑥2, ⋯ , 𝑥𝑡}, where 𝑥𝑗 (𝑗 ∈ [1, ⋯ , 𝑡]) is the ECG signal value at the 𝑗-th time tick. The input of the generator is a sequence of noise signals 𝑍𝑖 = {𝑧1, 𝑧2, ⋯ , 𝑧𝑡}, where 𝑍𝑖 is random noise drawn from a Gaussian distribution. Finally, the GAN outputs the generated ECG sequence 𝑋𝑖′ = {𝑥1′, 𝑥2′, ⋯ , 𝑥𝑡′}. The objective function of the GAN is given by (1).

\(V(D, G)=\mathbb{E}_{X_{i} \sim p_{\text{data}}\left(X_{i}\right)}\left[\log D\left(X_{i}\right)\right]+\mathbb{E}_{Z_{i} \sim p_{Z_{i}}\left(Z_{i}\right)}\left[\log \left(1-D\left(G\left(Z_{i}\right)\right)\right)\right]\)       (1)
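To make the objective concrete, the sketch below estimates the standard GAN value function, i.e. the mean of log D(X) over real samples plus the mean of log(1 − D(G(Z))) over generated samples, from a handful of discriminator outputs. The function name `gan_value` and the numeric outputs are illustrative assumptions, not values from the experiments.

```python
import math

def gan_value(d_real, d_fake):
    """Estimate V(D, G) from discriminator outputs in (0, 1).

    d_real: D(X_i) for real heartbeats.
    d_fake: D(G(Z_i)) for generated heartbeats.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives V = log(0.5) + log(0.5) = -log(4).
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes the value toward 0.
v_confident = gan_value([0.9, 0.95], [0.1, 0.05])
```

The discriminator is trained to raise this value, while the generator is trained to lower it by making D(G(Z)) larger.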

Fig. 3. Structure of the generator and discriminator of CAB.

Both the discriminator and generator are implemented on a full-connection basis. The generator adds the Batch Normalization to process the data, and the discriminator uses the Flatten layer to transform multi-dimensional inputs into one-dimensional data. The generator has five repeated blocks. Each block has a fully connected layer, a LeakyReLu layer, and a Batch Normalization. The discriminator has five repeated blocks too. Each block consists of a fully connected layer and a LeakyReLu layer.
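The two kinds of repeated block described above can be sketched in plain Python as follows. This is only an illustration of the layer ordering: the layer widths, the LeakyReLU slope of 0.2, and the batch normalization without learned scale and shift are assumptions, since the text does not specify them.

```python
def dense(batch, weights, bias):
    """Fully connected layer; `weights` has shape [inputs][outputs]."""
    return [
        [sum(x * w for x, w in zip(row, col)) + b
         for col, b in zip(zip(*weights), bias)]
        for row in batch
    ]

def leaky_relu(batch, alpha=0.2):
    """LeakyReLU with an assumed negative slope `alpha`."""
    return [[v if v > 0 else alpha * v for v in row] for row in batch]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch (no learned scale or shift)."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [
        [(v - m) / (var + eps) ** 0.5 for v, m, var in zip(row, means, variances)]
        for row in batch
    ]

def generator_block(batch, weights, bias):
    """Fully connected layer, then LeakyReLU, then Batch Normalization."""
    return batch_norm(leaky_relu(dense(batch, weights, bias)))

def discriminator_block(batch, weights, bias):
    """Fully connected layer followed by LeakyReLU."""
    return leaky_relu(dense(batch, weights, bias))

# Hypothetical batch of two 2-dimensional inputs and identity weights.
batch = [[1.0, -1.0], [0.0, 2.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
g_out = generator_block(batch, weights, bias)
d_out = discriminator_block(batch, weights, bias)
```

Stacking five such blocks with suitable widths would give the generator and discriminator layouts described in the text.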

3.2 Stacked Bi-LSTM for Classification

Our classification model combines the stacked Bi-LSTM with the fully connected layer. It also adds several dropout layers to prevent overfitting. ReLu is used as the activation function. After using GAN to generate enough data, we integrate the generated data with the original data (represented by 𝑋𝑖′′ = {𝑥1′′, 𝑥2′′, ⋯ , 𝑥𝑡′′}). 𝑋𝑖′′ is input to the Stacked Bi-LSTM. At last, the classification model outputs the probability of each type of heartbeat. The type with the highest probability is the type of the input heartbeat.
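The final decision step can be illustrated with a short sketch. It assumes a softmax output layer and the class order N, S, V, F, Q, neither of which is stated explicitly in the text; the logits are made-up numbers.

```python
import math

CLASSES = ["N", "S", "V", "F", "Q"]  # assumed output order

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    """Return the label with the highest probability and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Hypothetical network outputs for one heartbeat.
label, probs = predict_class([0.1, 0.3, 2.5, -1.0, 0.0])
```

With these example scores the third entry is the largest, so the heartbeat would be labeled V.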

In each LSTM cell, the forget gate decides which information is discarded. Equation (2) is the activation function of the forget gate. At time tick 𝑡, the inputs of the forget gate are the current input 𝑥𝑡′′ and the previous cell output ℎ𝑡−1. The input gate adds useful new information to the cell state. Equation (3) decides which information needs to be updated. Equation (4) determines the candidate content \(\tilde{C}_{t}\) for the update. Equation (5) adds the new information to form the cell state 𝐶𝑡, which replaces the old state 𝐶𝑡−1. The output gate selects important information from the current state as the output of the cell: Equation (6) determines 𝑜𝑡, i.e., the output information, and Equation (7) transforms 𝑜𝑡 into the final output ℎ𝑡.

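\(f_{t}=\sigma\left(W_{f} \cdot\left[h_{t-1}, x_{t}^{\prime \prime}\right]+b_{f}\right)\)       (2)

\(i_{t}=\sigma\left(W_{i} \cdot\left[h_{t-1}, x_{t}^{\prime \prime}\right]+b_{i}\right)\)       (3)

\(\tilde{C}_{t}=\tanh \left(W_{c} \cdot\left[h_{t-1}, x_{t}^{\prime \prime}\right]+b_{c}\right)\)       (4)

\(C_{t}=f_{t} * C_{t-1}+i_{t} * \tilde{C}_{t}\)       (5)

\(o_{t}=\sigma\left(W_{o} \cdot\left[h_{t-1}, x_{t}^{\prime \prime}\right]+b_{o}\right)\)       (6)

\(h_{t}=o_{t} * \tanh \left(C_{t}\right)\)       (7)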
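Equations (2)-(7) translate almost line by line into code. The sketch below performs one cell update in plain Python; the parameter dictionary layout, the helper names, and the all-zero weights used in the example are assumptions chosen to keep the arithmetic easy to check.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Compute W·v + b for a matrix W (list of rows), bias b, and vector v."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM cell update following Equations (2)-(7)."""
    concat = h_prev + x_t                                              # [h_{t-1}, x_t'']
    f_t = [sigmoid(v) for v in affine(p["Wf"], p["bf"], concat)]       # (2)
    i_t = [sigmoid(v) for v in affine(p["Wi"], p["bi"], concat)]       # (3)
    c_tilde = [math.tanh(v) for v in affine(p["Wc"], p["bc"], concat)] # (4)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]           # (5)
    o_t = [sigmoid(v) for v in affine(p["Wo"], p["bo"], concat)]       # (6)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                 # (7)
    return h_t, c_t

# Hypothetical cell with hidden size 2 and input size 1, all weights zero.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {"Wf": zeros, "bf": [0.0, 0.0], "Wi": zeros, "bi": [0.0, 0.0],
          "Wc": zeros, "bc": [0.0, 0.0], "Wo": zeros, "bo": [0.0, 0.0]}
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved: `c_t` becomes `[0.5, -0.5]`.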

CAB uses a Bi-LSTM network, whose output at each time tick is determined by both earlier and later inputs. A forward LSTM combined with a backward LSTM forms the Bi-LSTM. In addition, a stacked structure is used to deepen the model. The Bi-LSTM is trained with the Adam optimization algorithm and the sparse categorical cross-entropy loss function. The parameter settings of the Stacked Bi-LSTM are shown in Table 1.
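The bidirectional idea itself is independent of the cell type and can be sketched generically. In the illustration below a running sum stands in for the LSTM cell, and the function names are hypothetical; the point is only how the forward and backward passes are aligned per time tick.

```python
def run_direction(step, sequence, state):
    """Apply `step(x, state) -> new_state` along the sequence and
    collect the state after every time tick."""
    outputs = []
    for x in sequence:
        state = step(x, state)
        outputs.append(state)
    return outputs

def bidirectional(step_fwd, step_bwd, sequence, init_state):
    """Pair forward and backward states that belong to the same time tick."""
    forward = run_direction(step_fwd, sequence, init_state)
    # Process the reversed sequence, then flip the results back into order.
    backward = run_direction(step_bwd, sequence[::-1], init_state)[::-1]
    return list(zip(forward, backward))

def running_sum(x, state):
    """Toy recurrent step standing in for an LSTM cell."""
    return state + x

states = bidirectional(running_sum, running_sum, [1, 2, 3], 0)
# states == [(1, 6), (3, 5), (6, 3)]
```

At each tick the first element summarizes the inputs seen so far and the second summarizes the inputs still to come, which is the property the text describes.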

Table 1. The numbers of cells in different layers

4. Experiment

The experiment is based on the MIT-BIH arrhythmia database. It is performed on a computer with an NVIDIA GeForce GTX 950M GPU and 3049 MB of memory.

4.1 Arrhythmia Dataset

The MIT-BIH arrhythmia database contains 48 two-channel ECG recordings from Beth Israel Hospital. Each record lasts half an hour. The ECG signals come from 47 patients and are digitized at 360 samples per second per channel, with 11-bit resolution over a 10 mV range. In addition, two or more experts annotated each record independently and resolved their disagreements [59].

4.2 Experimental Setup

The experiment includes the following steps: (1) heartbeat segmentation; (2) data generation; (3) training, testing, and validation dataset partition; and (4) model training and evaluation.

4.2.1 Heartbeats Segmentation

The heartbeat segmentation procedure follows the method of [47]. First, we de-noise the ECG signals and locate the R peaks. An ECG record is divided by taking 140 time ticks before each R peak and 180 time ticks after it, so a heartbeat contains a total of 320 time ticks. Fig. 4 shows the heartbeat segmentation. According to AAMI [8], the signals in the recordings named 102, 104, 107, and 218 are of poor quality, so we do not extract the signals in these four recordings. The number of extracted heartbeats of each type is shown in Table 2.
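The windowing step can be sketched as below, assuming the R-peak positions are already known. The function name and the toy record are hypothetical, and the sketch assumes that the R-peak sample itself is counted within the 180 ticks after the peak so that each window holds exactly 320 ticks.

```python
def segment_heartbeats(signal, r_peaks, before=140, after=180):
    """Cut one fixed-length window per R peak.

    Each window covers `before` ticks ahead of the peak and `after` ticks
    starting at the peak, i.e. before + after ticks in total. Peaks too
    close to either end of the record are skipped.
    """
    beats = []
    for r in r_peaks:
        start, end = r - before, r + after
        if start >= 0 and end <= len(signal):
            beats.append(signal[start:end])
    return beats

# Hypothetical record of 1000 ticks with R peaks at ticks 100, 400, and 900.
record = [0.0] * 1000
beats = segment_heartbeats(record, [100, 400, 900])
```

In this example only the peak at tick 400 has a complete window, so a single 320-tick beat is returned.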

Fig. 4. Heartbeat segmentation.

Table 2. Heartbeat distributions in MIT-BIH arrhythmia dataset

4.2.2 Data Generation

As can be seen from Table 2, the numbers of different types of heartbeat vary greatly. The heartbeat numbers of F and Q are much fewer than the numbers of the other three types. To solve this class imbalance problem, we use GAN to respectively expand the samples of F and Q to 2500.
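The amount of synthetic data this requires is a simple difference per class. The sketch below uses hypothetical class counts purely for illustration; the actual counts are those reported in Table 2.

```python
def synthetic_needed(counts, target=2500, minority=("F", "Q")):
    """Number of generated beats required to bring each minority class
    up to `target` samples (never negative)."""
    return {c: max(0, target - counts[c]) for c in minority}

# Hypothetical class counts, not the values from Table 2.
example_counts = {"N": 90000, "S": 2700, "V": 7000, "F": 800, "Q": 15}
needed = synthetic_needed(example_counts)
# needed == {"F": 1700, "Q": 2485}
```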

4.2.3 Training, Testing, and Validation Dataset Partition

We first partition the original dataset into 90% for training and 10% for testing, and then partition the training portion into 80% for training and 20% for validation. Fig. 1 shows that each category contains several subclasses. To distribute the subclasses evenly, the data are shuffled before partitioning. We divide all the ECG sequences into ten equal portions, one for testing and the other nine for training. This operation is repeated ten times, shifting the testing portion each time, and the evaluation metrics are computed in every round. Finally, the average performance of CAB is calculated.
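The rotation of the testing portion can be sketched with the standard library alone. The function name, the fixed seed, and the sample count are assumptions made for the illustration; the further 80%/20% split of each training portion into training and validation data is omitted for brevity.

```python
import random

def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs after one global shuffle.

    Each fold serves as the testing portion exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Hypothetical dataset of 100 heartbeats.
splits = list(ten_fold_splits(100))
```

Averaging a metric over the ten `(train, test)` pairs gives the kind of average performance reported in Section 4.3.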

4.2.4 Model Training

Before the training, we shuffle the training set again to effectively avoid the occurrence of overfitting. The settings of important parameters are shown in Table 3. The classification model uses Adam [60] as the optimizer. The loss function is the sparse categorical cross-entropy. Equation (8) expresses the cross-entropy.

\(C=-\frac{1}{n} \sum_{x}[y \ln a+(1-y) \ln (1-a)]\)       (8)

where 𝑛 is the number of categories, 𝑦 is the desired output, 𝑎 is the actual output of the neuron, and 𝑥 is the input.
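Equation (8) is written in the two-class form. For integer class labels, the sparse categorical cross-entropy named in the text is commonly computed as the mean negative log-probability assigned to the true class. The sketch below follows that common definition; the probabilities and labels are made-up examples.

```python
import math

def sparse_categorical_cross_entropy(probs, labels):
    """Mean of -ln(p[label]) over samples.

    probs:  one probability vector per sample.
    labels: the integer index of the true class for each sample.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

# Hypothetical predictions for two heartbeats over five classes.
example_probs = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.05, 0.05],
]
example_labels = [0, 1]
loss = sparse_categorical_cross_entropy(example_probs, example_labels)
# loss equals -(ln 0.7 + ln 0.6) / 2
```

The loss approaches 0 as the probability assigned to the true class approaches 1.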

Table 3. The important parameter settings of classification model

4.2.5 Evaluation Metrics

Four metrics are used to evaluate the performance of CAB: accuracy (ACC), precision (PRE), recall (REC), and F1-score (F1) (see (9)-(12)).

\(A C C=\frac{T P+T N}{T P+T N+F P+F N}\)       (9)

\(\mathrm{PRE}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}\)       (10)

\(\mathrm{REC}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\)       (11)

\(F_{1}=\frac{2 * P R E * R E C}{P R E+R E C}\)       (12)

where TP, FP, TN, and FN represent the true positives, false positives, true negatives, and false negatives, respectively.
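For a multi-class problem, these counts are usually taken one class at a time in a one-vs-rest fashion. The sketch below computes (9)-(12) for one class from a confusion matrix; the one-vs-rest treatment, the function name, and the example matrix are assumptions made for illustration.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest ACC, PRE, REC, and F1 for class k.

    confusion[i][j] counts beats of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k beats that were missed
    fp = sum(row[k] for row in confusion) - tp   # other beats labeled as k
    tn = total - tp - fn - fp
    acc = (tp + tn) / total
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return acc, pre, rec, f1

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 0, 10]]
acc, pre, rec, f1 = per_class_metrics(cm, 0)
```

For class 0 in this example, TP = 8, FN = 2, FP = 2, and TN = 18, so precision, recall, and F1 are all 0.8 and accuracy is 26/30.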

4.3 Results

4.3.1 Training

The average ACC, PRE, REC, and F1 of CAB based on the training set are 99.71%, 99.30%, 98.73%, and 99.01% respectively (see Fig. 6). The ACCs of training and validation for 100 epochs are shown in Fig. 5. We can see that the ACCs increase when the epoch number increases.

Fig. 6. The classification PRE, REC, and F1 of CAB and No-GAN based on the training set.

Fig. 5. The training history on the training set.

Fig. 6 also shows the training results based on the raw unbalanced data, i.e. without using the GAN to balance classes (called No-GAN). The average ACC, PRE, REC, and F1 of No-GAN based on the training set are 97.92%, 97.92%, 94.09%, and 95.76% respectively. We can see from Fig. 6 that the average performance of CAB is better than the average performance of No-GAN. No-GAN performs as well as CAB in terms of classifying N, S, and V heartbeats. However, CAB performs much better than No-GAN for classifying F and Q heartbeats. The average PRE, REC, and F1 of CAB are 2%, 5%, and 3% higher than those of No-GAN respectively.

4.3.2 Testing

The average ACC, PRE, REC, and F1 of CAB on the testing set are 99.48%, 98.81%, 97.93%, and 98.27%, respectively (see Fig. 7). The average ACC, PRE, REC, and F1 of No-GAN on the testing set are 98.52%, 73.95%, 72.81%, and 73.41%, respectively. Based on these figures, the average PRE, REC, and F1 of CAB are each about 25 percentage points higher than those of No-GAN.

Fig. 7. The classification PRE, REC, and F1 on testing set of CAB and No-GAN.

We compare CAB with five state-of-the-art arrhythmia classification models based on the MIT-BIH dataset: Li et al. [27], Chen et al. [56], Wang et al. [57], Acharya et al. [43], and Wang et al. [37]. The comparison results are shown in Table 4. The average ACC of CAB is higher than the ACCs of the other work. It is 7%, 6%, and 6% higher than the average ACCs of [56], [57], and [43], respectively. For 'N' classification, the F1 of CAB is around 10% higher than the F1 of each of [56], [57], and [43]. For 'S' classification, the F1 of CAB is around 60%, 30%, 10%, and 15% higher than those of [56], [57], [43], and [37], respectively. For 'V' classification, the F1 of CAB is around 30%, 10%, and 6% higher than those of [56], [57], and [43], respectively. For 'F' classification, the F1 of CAB is around 25%, 6%, and 17% higher than those of [57], [43], and [37], respectively.

Table 4. Comparison of CAB with state-of-the-art methods

Overall, CAB performs better on average than No-GAN and the classification models in [27], [37], [43], [56], [57], [59], [60]. The experiment results show that using a GAN is an effective way to solve the data imbalance problem in heartbeat classification.

5. Conclusion

In this paper, we propose CAB, a heartbeat classification framework for arrhythmia detection based on the morphology of heartbeats. CAB solves the class imbalance problem by using a GAN to generate samples for the heartbeat categories with sparse samples. We design a heartbeat classification model based on Bi-LSTM to capture information between time series. Using the ECG data augmented by the GAN to train the classification model can improve the accuracy of heartbeat classification. Experiment results show that CAB performs better than the state-of-the-art heartbeat classification models. The overall classification accuracy is 99.71%. The F1-scores of classifying N, S, V, F, and Q heartbeats are 99.86%, 97.66%, 99.05%, 98.57%, and 99.88%, respectively. CAB realizes the intelligent classification of heartbeats. It automatically classifies the received ECG data and analyzes the type of arrhythmia in time. This can save the energy of cardiologists and help prevent complications. CAB helps cardiologists make a timely diagnosis and promotes the development of telemedicine. It can bring convenience to patients and doctors and save time and medical resources. In addition, CAB can continuously monitor people's heart health, help reduce the prevalence of heart diseases, and promote physical health. In the future, we will design a general classification model and test it on more arrhythmia datasets. We also plan to extract frequency-domain and time-domain information using the wavelet transform. The accurate classification of finer-grained arrhythmia heartbeats will also be explored.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (Grant No. 61702274) and PAPD.

References

  1. D. Kim, H. Kim, and J. Kwak, "Secure sharing scheme of sensitive data in the precision medicine system," Computers, Materials & Continua, vol. 64, no. 3, pp. 1527-1553, Jun. 2020. https://doi.org/10.32604/cmc.2020.010535
  2. T. S. Dillon, Y. P. Chen, E. Chang, M. Mohania, and V. Ramakonar, "Conjoint knowledge discovery utilizing data and content with applications in business, bio-medicine, transport logistics and electrical power systems," Computer Systems Science and Engineering, vol. 35, no.5, pp. 321-334, Jan. 2020. https://doi.org/10.32604/csse.2020.35.321
  3. S. and A. Vincent, "Effective and efficient ranking and re-ranking feature selector for healthcare analytics," Intelligent Automation & Soft Computing, vol. 26, no.2, pp. 261-268, 2020.
  4. V. S. Naresh, S. S. Pericherla, P. Sita, and S. Reddi, "Internet of things in healthcare: architecture, applications, challenges, and solutions," Computer Systems Science and Engineering, vol. 35, no.6, pp. 411-421, Nov. 2020. https://doi.org/10.32604/csse.2020.35.411
  5. K. N. Wang, J. S. Bell, E. Y. Chen, J. F. Gilmartin-Thomas, and J. Ilomaki, "Medications and prescribing patterns as factors associated with hospitalizations from long-term care facilities: a systematic review," Drugs & Aging, vol. 35, no. 5, pp. 423-457, Mar. 2018. https://doi.org/10.1007/s40266-018-0537-3
  6. E. J. d. S. Luz, W. R. Schwartz, G. Camara-Chavez, and D. Menotti, "ECG-based heartbeat classification for arrhythmia detection: A survey," Computer Methods and Programs in Biomedicine, vol. 127, pp. 144-164, Apr. 2016. https://doi.org/10.1016/j.cmpb.2015.12.008
  7. Y. Chen, X. Qin, L. Zhang, and B. Yi, "A novel method of heart failure prediction based on DPCNN-Xgboost model," Computers, Materials & Continua, vol. 65, no. 1, pp. 495-510, 2020. https://doi.org/10.32604/cmc.2020.011278
  8. AAMI, "Testing and reporting performance results of cardiac rhythm and ST segment measurement algorithms," ANSI/AAMI EC38, Tech. Rep., 1998.
  9. K. N. Rajesh and R. Dhuli, "Classification of imbalanced ECG beats using re-sampling techniques and adaboost ensemble classifier," Biomedical Signal Processing and Control, vol. 41, pp. 242-254, Mar. 2018. https://doi.org/10.1016/j.bspc.2017.12.004
  10. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, "Generative adversarial networks," ArXiv, vol. abs/1406.2661, Jun. 2014.
  11. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, G. Klambauer, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a Nash equilibrium," ArXiv, vol. abs/1706.08500, Nov. 2017.
  12. Z. C. Lipton, "A critical review of recurrent neural networks for sequence learning," ArXiv, vol. abs/1506.00019, May 2015.
  13. F. A. Gers and E. Schmidhuber, "LSTM recurrent networks learn simple context-free and context-sensitive languages," IEEE Transactions on Neural Networks, vol. 12, no. 6, pp. 1333-1340, Nov. 2001. https://doi.org/10.1109/72.963769
  14. F. A. Elhaj, N. Salim, T. Ahmed, A. R. Harris, and T. T. Swee, "Hybrid classification of bayesian and extreme learning machine for heartbeat classification of arrhythmia detection," in Proc. of 6th ICT International Student Project Conference (ICT-ISPC), pp. 1-4, May. 2017.
  15. V. Mondejar-Guerra, J. Novo, J. Rouco, M. G. Penedo, and M. Ortega, "Heartbeat classification fusing temporal and morphological information of ECGs via ensemble of classifiers," Biomedical Signal Processing and Control, vol. 47, pp. 41-48, Jan. 2019. https://doi.org/10.1016/j.bspc.2018.08.007
  16. H. Shi, H. Wang, Y. Huang, L. Zhao, C. Qin, and C. Liu, "A hierarchical method based on weighted extreme gradient boosting in ECG heartbeat classification," Computer Methods and Programs in Biomedicine, vol. 171, pp. 1-10, Apr. 2019.
  17. G. Garcia, G. Moreira, D. Menotti, and E. Luz, "Inter-patient ECG heartbeat classification with temporal VCG optimized by pso," Scientific Reports, vol. 7, no. 1, pp. 1-11, Sep. 2017. https://doi.org/10.1038/s41598-016-0028-x
  18. M. Alfaras, M. C. Soriano, and S. Ortin, "A fast machine learning model for ECG-based heartbeat classification and arrhythmia detection," Frontiers in Physics, vol. 7, p. 103, Jul. 2019. https://doi.org/10.3389/fphy.2019.00103
  19. E. H. Houssein, A. A. Ewees, and M. Abd ElAziz, "Improving twin support vector machine based on hybrid swarm optimizer for heartbeat classification," Pattern Recognition and Image Analysis, vol. 28, no. 2, pp. 243-253, Jun. 2018.
  20. K. Jiang, S. Liang, L. Meng, Y. Zhang, P. Wang, and W. Wang, "A two-level attention-based sequence-to-sequence model for accurate inter-patient arrhythmia detection," in Proc. of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, pp. 1029-1033, Dec. 2020.
  21. Z. Li, D. Zhou, L. Wan, J. Li, and W. Mou, "Heartbeat classification using deep residual convolutional neural network from 2-lead electrocardiogram," Journal of Electrocardiology, vol. 58, pp. 105-112, Jan. 2020. https://doi.org/10.1016/j.jelectrocard.2019.11.046
  22. P. De Chazal, M. O'Dwyer, and R. B. Reilly, "Automatic classification of heartbeats using ECG morphology and heartbeat interval features," IEEE Transactions on Biomedical Engineering, vol. 51, no. 7, pp. 1196-1206, Jul. 2004. https://doi.org/10.1109/tbme.2004.827359
  23. J. Niu, Y. Tang, Z. Sun, and W. Zhang, "Inter-patient ECG classification with symbolic representations and multi-perspective convolutional neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 5, pp. 1321-1332, May 2019. https://doi.org/10.1109/jbhi.2019.2942938
  24. A. Y. Hannun, P. Rajpurkar, M. Haghpanahi, G. H. Tison, C. Bourn, M. P. Turakhia, and A. Y. Ng, "Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network," Nature Medicine, vol. 25, no. 1, pp. 65-69, Jan. 2019. https://doi.org/10.1038/s41591-018-0268-3
  25. S. Saadatnejad, M. Oveisi, and M. Hashemi, "LSTM-based ECG classification for continuous monitoring on personal wearable devices," IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 2, pp. 515-523, Feb. 2020. https://doi.org/10.1109/jbhi.2019.2911367
  26. P. Lu, S. Guo, Y. Wang, L. Qi, X. Han, and Y. Wang, "ECG classification based on long short-term memory networks," in Proc. of 2nd International Conference on Healthcare Science and Engineering, Springer Singapore, pp. 129-140, May 2019.
  27. R. Li, X. Zhang, H. Dai, B. Zhou, and Z. Wang, "Interpretability analysis of heartbeat classification based on heartbeat activity's global sequence features and Bi-LSTM-attention neural network," IEEE Access, vol. 7, pp. 109870-109883, 2019. https://doi.org/10.1109/ACCESS.2019.2933473
  28. E. K. Wang, X. Zhang, and L. Pan, "Automatic classification of CAD ECG signals with SDAE and bidirectional long short-term network," IEEE Access, vol. 7, pp. 182873-182880, Aug. 2019. https://doi.org/10.1109/ACCESS.2019.2936525
  29. Ö. Yildirim, "A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification," Computers in Biology and Medicine, vol. 96, pp. 189-202, May 2018. https://doi.org/10.1016/j.compbiomed.2018.03.016
  30. X. Wu, X. Wang, J. Ma, Q. Li, and T. Zhao, "A short-term ECG signal classification method based on residual network and bi-directional LSTM," in Proc. of International Communication Engineering and Cloud Computing Conference, pp. 19-22, Oct. 2019.
  31. Y. Li, Y. Pang, J. Wang, and X. Li, "Patient-specific ECG classification by deeper CNN from generic to dedicated," Neurocomputing, vol. 314, pp. 336-346, Nov. 2018. https://doi.org/10.1016/j.neucom.2018.06.068
  32. F. Li, J. Wu, M. Jia, Z. Chen, and Y. Pu, "Automated heartbeat classification exploiting convolutional neural network with channel-wise attention," IEEE Access, vol. 7, pp. 122955-122963, Aug. 2019. https://doi.org/10.1109/ACCESS.2019.2938617
  33. X. Zhai and C. Tin, "Automated ECG classification using dual heartbeat coupling based on convolutional neural network," IEEE Access, vol. 6, pp. 27465-27472, May 2018. https://doi.org/10.1109/ACCESS.2018.2833841
  34. Y. Xia, H. Zhang, L. Xu, Z. Gao, H. Zhang, H. Liu, and S. Li, "An automatic cardiac arrhythmia classification system with wearable electrocardiogram," IEEE Access, vol. 6, pp. 16529-16538, Feb. 2018. https://doi.org/10.1109/ACCESS.2018.2807700
  35. S. Mousavi and F. Afghah, "Inter- and intra-patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1308-1312, Apr. 2019.
  36. S. S. Xu, M.-W. Mak, and C.-C. Cheung, "Towards end-to-end ECG classification with raw signal extraction and deep neural networks," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 4, pp. 1574-1584, Jul. 2018. https://doi.org/10.1109/jbhi.2018.2871510
  37. H. Wang, H. Shi, X. Chen, L. Zhao, Y. Huang, and C. Liu, "An improved convolutional neural network based approach for automated heartbeat classification," Journal of Medical Systems, vol. 44, no. 2, pp. 1-9, Dec. 2020. https://doi.org/10.1007/s10916-019-1451-x
  38. A. M. Shaker, M. Tantawi, H. A. Shedeed, and M. F. Tolba, "Generalization of convolutional neural networks for ECG classification using generative adversarial networks," IEEE Access, vol. 8, pp. 35592-35605, Feb. 2020. https://doi.org/10.1109/ACCESS.2020.2974712
  39. P. Wang, B. Hou, S. Shao, and R. Yan, "ECG arrhythmias detection using auxiliary classifier generative adversarial network and residual network," IEEE Access, vol. 7, pp. 100910-100922, Jul. 2019. https://doi.org/10.1109/ACCESS.2019.2930882
  40. Z. Zhou, X. Zhai, and C. Tin, "Fully automatic electrocardiogram classification system based on generative adversarial network with auxiliary classifier," Expert Systems with Applications, vol. 174, p. 114809, Jul. 2021. https://doi.org/10.1016/j.eswa.2021.114809
  41. T. Golany, K. Radinsky, and D. Freedman, "SimGANs: Simulator-based generative adversarial networks for ECG synthesis to improve deep ECG classification," in Proc. of International Conference on Machine Learning, PMLR, pp. 3597-3606, Jul. 2020.
  42. T. Golany and K. Radinsky, "PGANs: Personalized generative adversarial networks for ECG synthesis to improve patient-specific deep ECG classification," in Proc. of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 557-564, Jul. 2019.
  43. U. R. Acharya, S. L. Oh, Y. Hagiwara, J. H. Tan, M. Adam, A. Gertych, and R. San Tan, "A deep convolutional neural network model to classify heartbeats," Computers in Biology and Medicine, vol. 89, pp. 389-396, Oct. 2017. https://doi.org/10.1016/j.compbiomed.2017.08.022
  44. A. Sellami and H. Hwang, "A robust deep convolutional neural network with batch-weighted loss for heartbeat classification," Expert Systems with Applications, vol. 122, pp. 75-84, May 2019. https://doi.org/10.1016/j.eswa.2018.12.037
  45. T. F. Romdhane and M. A. Pr, "Electrocardiogram heartbeat classification based on a deep convolutional neural network and focal loss," Computers in Biology and Medicine, vol. 123, p. 103866, Aug. 2020. https://doi.org/10.1016/j.compbiomed.2020.103866
  46. Z. A. Nazi, A. Biswas, M. A. Rayhan, and T. Azad Abir, "Classification of ECG signals by dot residual LSTM network with data augmentation for anomaly detection," in Proc. of 22nd International Conference on Computer and Information Technology (ICCIT), pp. 1-5, Dec. 2019.
  47. B. Zhou, S. Liu, B. Hooi, X. Cheng, and J. Ye, "BeatGAN: Anomalous rhythm detection using adversarially generated time series," in Proc. of International Joint Conference on Artificial Intelligence (IJCAI), pp. 4433-4439, 2019.
  48. D. Lai, X. Zhang, Y. Bu, Y. Su, and C.-S. Ma, "An automatic system for real-time identifying atrial fibrillation by using a lightweight convolutional neural network," IEEE Access, vol. 7, pp. 130074-130084, Sep. 2019. https://doi.org/10.1109/ACCESS.2019.2939822
  49. Y. Shen, M. Voisin, A. Aliamiri, A. Avati, A. Hannun, and A. Ng, "Ambulatory atrial fibrillation monitoring using wearable photoplethysmography with deep learning," in Proc. of 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1909-1916, Jul. 2019.
  50. X. Fan, Q. Yao, Y. Cai, F. Miao, F. Sun, and Y. Li, "Multiscaled fusion of deep convolutional neural networks for screening atrial fibrillation from single lead short ECG recordings," IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 6, pp. 1744-1753, Aug. 2018. https://doi.org/10.1109/jbhi.2018.2858789
  51. Z. Zhao, S. Sarkka, and A. B. Rad, "Spectro-temporal ECG analysis for atrial fibrillation detection," in Proc. of 28th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, pp. 1-6, Sep. 2018.
  52. H. Dang, M. Sun, G. Zhang, X. Qi, X. Zhou, and Q. Chang, "A novel deep arrhythmia-diagnosis network for atrial fibrillation classification using electrocardiogram signals," IEEE Access, vol. 7, pp. 75577-75590, May 2019. https://doi.org/10.1109/ACCESS.2019.2918792
  53. Z. Wu, X. Feng, and C. Yang, "A deep learning method to detect atrial fibrillation based on continuous wavelet transform," in Proc. of 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, pp. 1908-1912, Jul. 2019.
  54. M. Limam and F. Precioso, "Atrial fibrillation detection and ECG classification based on convolutional recurrent neural network," in Proc. of 2017 Computing in Cardiology (CinC), IEEE, pp. 1-4, Sep. 2017.
  55. M. Zabihi, A. B. Rad, A. K. Katsaggelos, S. Kiranyaz, S. Narkilahti, and M. Gabbouj, "Detection of atrial fibrillation in ECG hand-held devices using a random forest classifier," in Proc. of 2017 Computing in Cardiology (CinC), IEEE, pp. 1-4, Sep. 2017.
  56. S. Chen, W. Hua, Z. Li, J. Li, and X. Gao, "Heartbeat classification using projected and dynamic features of ECG signal," Biomedical Signal Processing and Control, vol. 31, pp. 165-173, Jan. 2017. https://doi.org/10.1016/j.bspc.2016.07.010
  57. H. Wang, H. Shi, K. Lin, C. Qin, L. Zhao, Y. Huang, and C. Liu, "A high-precision arrhythmia classification method based on dual fully connected neural network," Biomedical Signal Processing and Control, vol. 58, p. 101874, Apr. 2020. https://doi.org/10.1016/j.bspc.2020.101874
  58. G. B. Moody and R. G. Mark, "The impact of the MIT-BIH arrhythmia database," IEEE Engineering in Medicine & Biology Magazine, vol. 20, no. 3, pp. 45-50, May 2001. https://doi.org/10.1109/51.932724
  59. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. of International Conference on Learning Representations (ICLR), San Diego, 2015.