http://dx.doi.org/10.6109/jkiice.2021.25.10.1296

Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient  

Jung, Juho (Applied Artificial Intelligence, Sungkyunkwan University)
Lee, Naeun (Applied Artificial Intelligence, Sungkyunkwan University)
Kim, Sumin (Applied Artificial Intelligence, Sungkyunkwan University)
Seo, Gaeun (Applied Artificial Intelligence, Sungkyunkwan University)
Oh, Hayoung (College of Computing and Informatics, Sungkyunkwan University)
Abstract
With the recent worldwide increase in diabetes incidence, researchers have applied a range of machine learning and deep learning techniques to predict the disease. In this work, we present a machine learning model for predicting diabetes, built on data from a hospital in Frankfurt, Germany. We handle outliers with the interquartile range (IQR) method, apply Pearson correlation, and compare the diabetes prediction performance of Decision Tree, Random Forest, KNN (k-nearest neighbors), SVM (support vector machine), and Bayesian Network models, as well as the ensemble techniques XGBoost, Voting, and Stacking. Across the various scenarios, XGBoost showed the best performance, with 97% accuracy. The study is therefore meaningful in that the model can be used to predict diabetes, which is prevalent in modern society, accurately and to help prevent it.
Keywords
Stacking; Ensemble; Diabetes prediction; Machine learning; Interquartile range (IQR)
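
The two preprocessing steps named in the abstract, IQR-based outlier handling and Pearson correlation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say whether outliers were dropped or replaced, so dropping rows is an assumption here, as is the conventional 1.5 × IQR fence. The column names (`glucose`, `bmi`, `outcome`) and all records are hypothetical toy values, not the hospital data.

```python
import math
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies within the IQR fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy records invented for illustration only (assumption, not real data).
records = [
    {"glucose": 85, "bmi": 22.1, "outcome": 0},
    {"glucose": 90, "bmi": 24.3, "outcome": 0},
    {"glucose": 100, "bmi": 26.0, "outcome": 0},
    {"glucose": 150, "bmi": 31.2, "outcome": 1},
    {"glucose": 160, "bmi": 33.5, "outcome": 1},
    {"glucose": 170, "bmi": 35.0, "outcome": 1},
    {"glucose": 600, "bmi": 29.0, "outcome": 1},  # implausible reading
]

# Step 1: remove rows whose glucose value falls outside the IQR fences.
cleaned = drop_outlier_rows(records, "glucose")

# Step 2: correlate each remaining feature with the outcome label.
outcome = [row["outcome"] for row in cleaned]
for feature in ("glucose", "bmi"):
    r = pearson([row[feature] for row in cleaned], outcome)
    print(f"{feature}: r = {r:.3f}")
```

In a pipeline like the one the abstract describes, the cleaned records and the features most correlated with the outcome would then be passed to the classifiers being compared. How the correlation coefficients were actually used for feature selection is not stated in the abstract.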