Comparative analysis of model performance for predicting cafeteria customers using unstructured data

  • Seungsik Kim (Department of Statistics, Kyungpook National University) ;
  • Nami Gu (Department of Statistics, Pusan National University) ;
  • Jeongin Moon (Department of Statistics, Yeungnam University) ;
  • Keunwook Kim (Daegu Digital Innovation Agency, Big Data Utilization Center) ;
  • Yeongeun Hwang (Industrial Complex Promotion Department, Korea Industrial Complex Corporation) ;
  • Kyeongjun Lee (Department of Mathematics and Big Data Science, Kumoh National Institute of Technology)
  • Received : 2023.03.17
  • Accepted : 2023.06.04
  • Published : 2023.09.30

Abstract

This study aimed to predict the number of meals served in a group cafeteria using machine learning. Menu features were created through Word2Vec embeddings and clustering, and a stacking ensemble was constructed with Random Forest, Gradient Boosting, and CatBoost as sub-models. Among the sub-models, CatBoost performed best, and the stacking ensemble improved predictive performance by about 8%. The study also found that the date variable had the greatest influence on the number of diners, followed by the menu characteristics and the remaining variables. The implications of the study include the potential of machine learning to improve predictive performance and reduce food waste, as well as the removal of subjective judgment from menu classification. Limitations include the small number of data cases and the model's weakness when menus contain new dishes or foreign words that do not appear in the training data. Future studies should aim to address these limitations.
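
The paper itself provides no code, but the two technical steps summarized in the abstract can be illustrated with short sketches. First, a minimal sketch of the menu-feature step, assuming dish names are tokenized per daily menu and using gensim's Word2Vec followed by k-means clustering; the example data, vector size, and number of clusters are illustrative assumptions, not the authors' settings:

    # Sketch: embed dish names with Word2Vec, then cluster the embeddings so
    # each dish receives a categorical "menu type" feature.
    from gensim.models import Word2Vec
    from sklearn.cluster import KMeans

    # Each daily menu is a list of dish-name tokens (hypothetical example data).
    menus = [
        ["rice", "kimchi_stew", "fried_egg"],
        ["rice", "pork_cutlet", "salad"],
        ["noodles", "kimchi", "fish_cake_soup"],
    ]

    # Train Word2Vec on the menu "sentences" to obtain one vector per dish.
    w2v = Word2Vec(sentences=menus, vector_size=50, window=5, min_count=1, sg=1, seed=0)
    dishes = list(w2v.wv.index_to_key)
    vectors = [w2v.wv[d] for d in dishes]

    # Cluster the dish vectors; the cluster label becomes the menu feature.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
    dish_cluster = dict(zip(dishes, km.labels_))
    print(dish_cluster)

Second, a minimal sketch of a stacking ensemble over the three sub-models, here built with scikit-learn's StackingRegressor; the linear meta-learner, hyperparameters, and random placeholder features are assumptions made for illustration:

    # Sketch: Random Forest, Gradient Boosting, and CatBoost as sub-models,
    # combined by a linear meta-learner via out-of-fold stacking.
    import numpy as np
    from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                                  StackingRegressor)
    from sklearn.linear_model import LinearRegression
    from catboost import CatBoostRegressor

    # X would hold date and menu-cluster features; y the observed number of diners.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = rng.normal(size=100)

    stack = StackingRegressor(
        estimators=[
            ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
            ("gb", GradientBoostingRegressor(random_state=0)),
            ("cat", CatBoostRegressor(iterations=200, verbose=0, random_seed=0)),
        ],
        final_estimator=LinearRegression(),  # meta-learner choice is an assumption
        cv=5,
    )
    stack.fit(X, y)
    print(stack.predict(X[:3]))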
