http://dx.doi.org/10.22937/IJCSNS.2021.21.9.3

Investigating Non-Laboratory Variables to Predict Diabetic and Prediabetic Patients from Electronic Medical Records Using Machine Learning  

Mukhtar, Hamid (College of Computers and Information Technology, Taif University)
Al Azwari, Sana (College of Computers and Information Technology, Taif University)
Publication Information
International Journal of Computer Science & Network Security / v.21, no.9, 2021, pp. 19-30
Abstract
Diabetes Mellitus (DM) is one of the common chronic diseases and leads to severe health complications that may cause death. The disease affects individuals, the community, and the government because of the continuous monitoring, lifelong commitment, and cost of treatment it requires. The World Health Organization (WHO) considers Saudi Arabia one of the top 10 countries in diabetes prevalence worldwide. Since most medical services are provided by the government, the cost of treatment in terms of hospital and clinic visits and lab tests represents a real burden, given the large scale of the disease. The ability to predict the diabetic status of a patient without laboratory tests, by screening on a few personal features, can lessen the health and economic burden caused by diabetes alone. The goal of this paper is to investigate the prediction of diabetic and prediabetic patients by considering factors other than the laboratory tests that physicians generally require. Medical records obtained from local hospitals were processed into a dataset that classifies patients into three classes: diabetic, prediabetic, and non-diabetic. The three machine learning algorithms we applied achieved good accuracy, precision, and recall on the dataset. Further analysis was performed on the data to identify important non-laboratory patient variables for diabetes classification. The importance of five variables from a person's basic health data (gender, physical activity level, hypertension, BMI, and age) was investigated to find their contribution to a patient being diabetic, prediabetic, or normal. Our analysis showed strong agreement with the risk factors for diabetes and prediabetes stated by the American Diabetes Association (ADA) and other health institutions worldwide.
We conclude that class-specific analysis of the disease can identify important factors specific to the Saudi population, and that managing these factors can help control the disease. We also provide some recommendations learned from this research.
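The workflow summarized in the abstract (three-class classification from five non-laboratory variables, followed by a feature-importance analysis) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the three algorithms, so a random forest is used here as one plausible choice; the records are synthetic and generated at random, since the hospital data is not available; and the names FEATURES, CLASSES, make_synthetic_records, and train_and_evaluate are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# The five non-laboratory variables named in the abstract.
FEATURES = ["gender", "physical_activity", "hypertension", "bmi", "age"]
# Label encoding assumed for this sketch: 0, 1, 2.
CLASSES = ["non-diabetic", "prediabetic", "diabetic"]


def make_synthetic_records(n=600, seed=0):
    """Generate random stand-in records; NOT the paper's hospital data."""
    rng = np.random.default_rng(seed)
    gender = rng.integers(0, 2, n)        # 0/1 encoding (assumption)
    activity = rng.integers(0, 3, n)      # 0=low, 1=medium, 2=high (assumption)
    hypertension = rng.integers(0, 2, n)  # 0=no, 1=yes
    bmi = rng.normal(28.0, 5.0, n)
    age = rng.integers(18, 80, n)
    X = np.column_stack([gender, activity, hypertension, bmi, age]).astype(float)
    # Arbitrary illustrative score, not a clinical rule.
    score = (0.05 * (age - 45) + 0.15 * (bmi - 28) + 1.0 * hypertension
             - 0.5 * activity + rng.normal(0.0, 1.0, n))
    # Split the score into three equally sized classes (labels 0, 1, 2).
    y = np.digitize(score, np.quantile(score, [1 / 3, 2 / 3]))
    return X, y


def train_and_evaluate(seed=0):
    """Train a random forest and return test metrics and feature importances."""
    X, y = make_synthetic_records(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Macro averaging weights the three classes equally.
        "precision_macro": precision_score(
            y_test, y_pred, average="macro", zero_division=0),
        "recall_macro": recall_score(
            y_test, y_pred, average="macro", zero_division=0),
    }
    # Impurity-based importances; they sum to 1 across the five features.
    importances = dict(zip(FEATURES, model.feature_importances_))
    return metrics, importances


if __name__ == "__main__":
    metrics, importances = train_and_evaluate()
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.3f}")
```

Any numbers this prints describe the synthetic data only and say nothing about the paper's results. The random forest's built-in importances are global, so a class-specific analysis like the one the abstract describes would need a per-class method on top of this.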
Keywords
prediabetics; prediction; feature importance; feature contribution; PIMA diabetes dataset;