Obesity Level Prediction Based on Data Mining Techniques

  • Alqahtani, Asma (Computer Science Department, College of Science and Humanities, Imam Abdulrahman Bin Faisal University) ;
  • Albuainin, Fatima (Computer Science Department, College of Science and Humanities, Imam Abdulrahman Bin Faisal University) ;
  • Alrayes, Rana (Computer Science Department, College of Science and Humanities, Imam Abdulrahman Bin Faisal University) ;
  • Al muhanna, Noura (Computer Science Department, College of Science and Humanities, Imam Abdulrahman Bin Faisal University) ;
  • Alyahyan, Eyman (Computer Science Department, College of Science and Humanities, Imam Abdulrahman Bin Faisal University) ;
  • Aldahasi, Ezaz (Computer Science Department, College of Science and Humanities, Imam Abdulrahman Bin Faisal University)
  • Received : 2021.03.05
  • Published : 2021.03.30

Abstract

Obesity affects individuals of all genders and ages worldwide; consequently, many studies have sought to identify the factors that cause it. This study develops an effective method for predicting obesity levels using supervised data mining techniques such as Random Forest and Multi-Layer Perceptron (MLP), to help tackle this global epidemic. The dataset was collected from individuals in Mexico, Peru, and Colombia aged 14 to 61 years, with varying eating habits and physical conditions. It comprises 2111 instances and 17 attributes, and each instance is labelled with the class attribute NObesity, which assigns one of seven categories: Insufficient Weight, Normal Weight, Overweight Levels I and II, and Obesity Types I to III. Random Forest achieved the highest accuracy, outperforming MLP with an overall classification rate of 96.7%.
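To make the reported metric concrete, the following is a minimal sketch of how an overall classification rate over the seven NObesity categories could be computed from true and predicted labels. The function names (`classification_rate`, `confusion_counts`) and the toy labels are assumptions introduced for illustration; they are not the authors' code or data, and the sketch does not reproduce the Random Forest or MLP models themselves.

```python
from collections import Counter

# The seven class labels named in the abstract.
CLASSES = [
    "Insufficient Weight",
    "Normal Weight",
    "Overweight Level I",
    "Overweight Level II",
    "Obesity Type I",
    "Obesity Type II",
    "Obesity Type III",
]


def classification_rate(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs across all instances."""
    return Counter(zip(y_true, y_pred))


# Toy labels invented for illustration only (hypothetical, not the paper's data).
toy_true = [CLASSES[0], CLASSES[1], CLASSES[1], CLASSES[4], CLASSES[6]]
toy_pred = [CLASSES[0], CLASSES[1], CLASSES[2], CLASSES[4], CLASSES[6]]

rate = classification_rate(toy_true, toy_pred)
counts = confusion_counts(toy_true, toy_pred)
print(f"overall classification rate: {rate:.1%}")
```

In a comparison like the one described, the same function would be applied to each classifier's predictions on held-out data, and the confusion counts would show which neighbouring categories (for example, Normal Weight and Overweight Level I) are most often confused.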
