Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient

  • Jung, Juho (Applied Artificial Intelligence, Sungkyunkwan University) ;
  • Lee, Naeun (Applied Artificial Intelligence, Sungkyunkwan University) ;
  • Kim, Sumin (Applied Artificial Intelligence, Sungkyunkwan University) ;
  • Seo, Gaeun (Applied Artificial Intelligence, Sungkyunkwan University) ;
  • Oh, Hayoung (College of Computing and Informatics, Sungkyunkwan University)
  • Received : 2021.05.26
  • Reviewed : 2021.07.09
  • Published : 2021.10.31

Abstract

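As the incidence of diabetes rises worldwide, studies continue to attempt diabetes prediction with various machine learning and deep learning techniques. This study presents a model that predicts diabetes by applying machine learning methods to data from Frankfurt Hospital in Germany. We handle outliers with the interquartile range (IQR) method, apply Pearson correlation analysis, and compare the diabetes prediction performance of Decision Tree, Random Forest, KNN, SVM, and the ensemble methods XGBoost, Voting, and Stacking. In our experiments, the Stacking ensemble performed best, with an accuracy of 98.75%. This study is therefore meaningful in that the model can be used to accurately predict and prevent diabetes, which is prevalent in modern society.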

With the recent worldwide increase in diabetes incidence, researchers have sought to predict diabetes using various machine learning and deep learning techniques. In this work, we present a model that predicts diabetes by applying machine learning techniques to data from Frankfurt Hospital in Germany. We handle outliers using the interquartile range (IQR), apply Pearson correlation analysis, and compare the diabetes prediction performance of Decision Tree, Random Forest, KNN (k-nearest neighbors), SVM (support vector machine), Bayesian Network, and the ensemble techniques XGBoost, Voting, and Stacking. Across the scenarios tested, XGBoost showed the best performance, with 97% accuracy. This study is therefore meaningful in that the model can be used to accurately predict and prevent diabetes, which is prevalent in modern society.
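The two preprocessing steps named in the abstract, IQR-based outlier handling and Pearson correlation, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how outliers are treated, so the sketch assumes the common Tukey rule (values outside Q1 - 1.5·IQR and Q3 + 1.5·IQR are dropped). The helper names, the field names `glucose` and `outcome`, and the toy records are all hypothetical.

```python
from math import sqrt
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies inside the fences.

    Dropping rows is an assumption; clipping or imputing are alternatives.
    """
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records; these are not values from the hospital dataset.
patients = [
    {"glucose": 85, "outcome": 0},
    {"glucose": 90, "outcome": 0},
    {"glucose": 95, "outcome": 0},
    {"glucose": 140, "outcome": 1},
    {"glucose": 150, "outcome": 1},
    {"glucose": 160, "outcome": 1},
    {"glucose": 900, "outcome": 0},  # implausible reading, acts as an outlier
]

# Step 1: remove rows whose glucose value falls outside the IQR fences.
cleaned = drop_outlier_rows(patients, "glucose")

# Step 2: measure how strongly the feature correlates with the label.
r = pearson([p["glucose"] for p in cleaned], [p["outcome"] for p in cleaned])

print(len(cleaned), round(r, 3))
```

In a workflow like the one the abstract outlines, features would be filtered this way before the classifiers are trained, and the correlation with the outcome could guide which features to keep. Any selection threshold is a design choice that the abstract does not state.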

Keywords

Project Information

This work presents the results of a study conducted under the "Convergence and Open Sharing System" Project, supported by the Ministry of Education and the National Research Foundation of Korea.
