A Study on Diabetes Management System Based on Logistic Regression and Random Forest

  • ByungJoo Kim (Department of Electrical and Electronics Engineering, Youngsan University)
  • Received : 2024.04.15
  • Accepted : 2024.04.28
  • Published : 2024.06.30

Abstract

To advance diabetes diagnosis, this study introduces a novel two-step machine learning approach that combines the probabilistic predictions of Logistic Regression with the classification strength of Random Forest. Diabetes is a pervasive chronic disease affecting millions of people worldwide, and precise, early detection is needed to mitigate its long-term complications. Traditional diagnostic methods, while effective, often require invasive testing and may not fully exploit the patterns hidden in patient data. To address this gap, our approach first uses Logistic Regression to estimate the probability that diabetes is present, and then uses Random Forest to classify individuals as diabetic, pre-diabetic, or non-diabetic based on the computed probabilities. This design draws on the complementary strengths of the two algorithms (Logistic Regression's ability to produce fine-grained probability estimates and Random Forest's robustness in classification) to improve diagnostic accuracy. Applied to a comprehensive diabetes dataset, the model achieves a marked improvement in diagnostic precision, with better performance metrics than the other machine learning approaches it was compared against. These findings suggest that integrating complementary machine learning models can strengthen clinical decision-making and offers a promising route to the early and accurate diagnosis of diabetes and potentially other complex diseases.
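The abstract gives no implementation details, so the following is only a minimal sketch of the two-step idea under stated assumptions, written in pure Python so it runs without third-party packages. Stage 1 fits a Logistic Regression by batch gradient descent on a binary diabetes label; stage 2 trains a small Random Forest (bootstrap samples, a random feature subset at each split, Gini impurity, majority vote) that assigns one of the three classes. Feeding the forest the stage-1 probability together with the original features, the toy data, the hyperparameters (`lr`, `epochs`, `n_trees`, `max_depth`), and all function names such as `fit_two_stage` are hypothetical choices for illustration, not the author's setup.

```python
import math
import random
from collections import Counter


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Stage 1: logistic regression fitted by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a Gini-based tree; each split considers only a random subset of features."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class label
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # sampled features are constant here: make a leaf
        return Counter(y).most_common(1)[0][0]

    def grow(part):
        return build_tree([r for r, _ in part], [c for _, c in part],
                          depth + 1, max_depth, n_feats, rng)

    _, f, t = best
    return (f, t,
            grow([(r, c) for r, c in zip(X, y) if r[f] <= t]),
            grow([(r, c) for r, c in zip(X, y) if r[f] > t]))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node  # leaf: class label


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Random forest: bootstrap samples, random feature subsets, majority vote."""
    rng = random.Random(seed)
    n_feats = max(1, int(math.sqrt(len(X[0]))))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, n_feats, rng))
    return lambda x: Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def fit_two_stage(X, y_binary, y_three):
    p = fit_logistic(X, y_binary)      # stage 1: estimated P(diabetes | x)
    Z = [[p(x)] + list(x) for x in X]  # assumption: stage-2 input = probability + features
    forest = fit_forest(Z, y_three)    # stage 2: three-class random forest
    return p, lambda x: forest([p(x)] + list(x))


# Hypothetical toy data: two standardized features (think glucose, BMI), ordered by class.
X, y_binary, y_three = [], [], []
for k, label in enumerate(["non-diabetic", "pre-diabetic", "diabetic"]):
    for i in range(10):
        X.append([1.5 * k - 1.5 + 0.1 * i, 1.0 * k - 1.0 + 0.05 * i])
        y_three.append(label)
        y_binary.append(1 if label == "diabetic" else 0)

p, classify = fit_two_stage(X, y_binary, y_three)
for x in (X[0], X[15], X[-1]):
    print(f"P(diabetes) = {p(x):.3f} -> {classify(x)}")
```

In practice, scikit-learn's `LogisticRegression` and `RandomForestClassifier` would replace the hand-written learners. For brevity the sketch trains stage 2 on in-sample stage-1 probabilities; out-of-fold probabilities (for example from `cross_val_predict(..., method="predict_proba")`) keep the forest from over-trusting a probability that was fitted on the same rows.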
