http://dx.doi.org/10.6109/jicce.2022.20.1.31

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease  

Wiharto, Wiharto (Department of Informatics, Sebelas Maret University)
Suryani, Esti (Department of Informatics, Sebelas Maret University)
Setyawan, Sigit (Department of Medicine, Sebelas Maret University)
Putra, Bintang PE (Department of Informatics, Sebelas Maret University)
Abstract
Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. The large number of examination attributes required to diagnose CHD is a particular obstacle during the pandemic, when demand for health services is high. A precise machine learning model that diagnoses CHD from a minimal number of examination attributes would allow examinations and healthcare actions to be carried out quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification. The feature selection stage uses a hybrid of a support vector machine and a genetic algorithm (SVM-GA) combined with a fast correlation-based filter (FCBF). Tested on the Z-Alizadeh Sani dataset, the proposed system achieved an accuracy of 94.60% and an area under the curve (AUC) of 97.5% while using only 8 of the 54 examination attributes. In terms of performance, the proposed model falls in the very good category.
Keywords
coronary heart disease; genetic algorithm; feature selection; ensemble learning; support vector machine
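
The abstract names FCBF as one part of the feature selection stage without giving details. As a rough illustration only, the sketch below shows the symmetric uncertainty measure that FCBF is commonly built on, followed by a simplified relevance-and-redundancy filter. It is not the authors' implementation: the function names, the toy data, and the 0.05 threshold are all hypothetical, and the SVM-GA wrapper, data balancing, and ensemble stages are not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both sequences are constant, nothing to share
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    info_gain = hx + hy - hxy  # mutual information between X and Y
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, target, threshold=0.0):
    """Simplified FCBF-style filter over a dict of name -> discrete column.

    1. Keep features whose SU with the target exceeds `threshold`.
    2. Visit them from most to least relevant and drop any feature that is
       at least as correlated with an already kept feature as it is with
       the target (treated as redundant).
    """
    relevance = {
        name: symmetric_uncertainty(col, target) for name, col in features.items()
    }
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        if all(
            symmetric_uncertainty(features[name], features[kept]) < relevance[name]
            for kept in selected
        ):
            selected.append(name)
    return selected


# Hypothetical toy data: the target is (a OR b); a_dup mirrors a; noise is unrelated.
features = {
    "a": [1, 1, 1, 1, 0, 0, 0, 0],
    "a_dup": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [1, 1, 0, 0, 1, 1, 0, 0],
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],
}
target = [1, 1, 1, 1, 1, 1, 0, 0]

relevance_a = symmetric_uncertainty(features["a"], target)
selected = fcbf(features, target, threshold=0.05)
print(selected)  # ['a', 'b']
```

In this toy run the mirrored attribute is dropped as redundant and the unrelated attribute falls below the threshold, which is the kind of reduction a filter stage can provide before a wrapper search. Continuous attributes would need to be discretized before they are passed to these functions.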