A Study on Diabetes Management System Based on Logistic Regression and Random Forest

ByungJoo Kim

doi:10.7236/IJASC.2024.13.2.61

International journal of advanced smart convergence

Volume 13 Issue 2 / Pages 61-68 / 2024 / 2288-2847 (pISSN) / 2288-2855 (eISSN)

The Institute of Internet, Broadcasting and Communication (한국인터넷방송통신학회)


ByungJoo Kim (Department of Electrical and Electronics Engineering Youngsan University)

Received : 2024.04.15
Accepted : 2024.04.28
Published : 2024.06.30

https://doi.org/10.7236/IJASC.2024.13.2.61

Abstract

This study introduces a novel two-step machine learning approach to diabetes diagnosis that combines the probabilistic predictions of Logistic Regression with the classification strength of Random Forest. Diabetes is a pervasive chronic disease affecting millions of people worldwide, and precise, early detection is needed to mitigate long-term complications. Traditional diagnostic methods, while effective, often entail invasive testing and may not fully exploit the patterns hidden in patient data. To address this gap, we first use Logistic Regression to estimate the likelihood that diabetes is present, and then use Random Forest to classify individuals as diabetic, pre-diabetic, or non-diabetic based on the computed probabilities. This design draws on the strengths of both algorithms (Logistic Regression's proficiency in estimating nuanced probabilities and Random Forest's robustness in classification) and adds a refinement step intended to enhance diagnostic accuracy. Applying the model to a comprehensive diabetes dataset, we demonstrate a marked improvement in diagnostic precision, as evidenced by superior performance metrics compared with other machine learning approaches. Our findings underscore the potential of integrating diverse machine learning models to improve clinical decision-making, offering a promising avenue for the early and accurate diagnosis of diabetes and potentially other complex diseases.
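The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it assumes scikit-learn, uses synthetic data in place of the diabetes dataset (which is not specified on this page), and assumes that the Random Forest receives only the Logistic Regression probability as its input. The feature weights, class thresholds, and hyperparameters are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (assumption): 4 numeric features and a latent risk score.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
risk = X @ np.array([1.5, 1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=600)

# Hypothetical three-way labels: 0 = non-diabetic, 1 = pre-diabetic, 2 = diabetic.
y = np.digitize(risk, bins=[-0.5, 0.8])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: Logistic Regression estimates P(diabetic) from the raw features.
log_reg = LogisticRegression(max_iter=1000)
is_diabetic_train = (y_train == 2).astype(int)

# Out-of-fold probabilities keep the second stage from training on
# predictions the first stage made for rows it had already seen.
p_train = cross_val_predict(
    log_reg, X_train, is_diabetic_train, cv=5, method="predict_proba"
)[:, 1]

# Refit on the full training split to score the held-out patients.
log_reg.fit(X_train, is_diabetic_train)
p_test = log_reg.predict_proba(X_test)[:, 1]

# Step 2: Random Forest maps the estimated probability to one of three categories.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(p_train.reshape(-1, 1), y_train)
y_pred = forest.predict(p_test.reshape(-1, 1))

accuracy = float((y_pred == y_test).mean())
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Feeding the second stage out-of-fold probabilities is a design choice of this sketch, not something stated in the abstract. A variant could also pass the original features to the Random Forest alongside the probability.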

