Comparative analysis of model performance for predicting the customer of cafeteria using unstructured data

  • Seungsik Kim (Department of Statistics, Kyungpook National University) ;
  • Nami Gu (Department of Statistics, Pusan National University) ;
  • Jeongin Moon (Department of Statistics, Yeungnam University) ;
  • Keunwook Kim (Daegu Digital Innovation Agency, Big Data Utilization Center) ;
  • Yeongeun Hwang (Industrial Complex Promotion Department, Korea Industrial Complex Corporation) ;
  • Kyeongjun Lee (Department of Mathematics and Big Data Science, Kumoh National Institute of Technology)
  • Received : 2023.03.17
  • Accepted : 2023.06.04
  • Published : 2023.09.30

Abstract

This study aimed to predict the number of meals served in a group cafeteria using machine learning methodology. Features of the menu were created through the Word2Vec methodology and clustering, and a stacking ensemble model was constructed using Random Forest, Gradient Boosting, and CatBoost as sub-models. Results showed that CatBoost performed best among the sub-models, while the stacking ensemble showed a further 8% improvement in performance. The study also found that the date variable had the greatest influence on the number of diners in a cafeteria, followed by menu characteristics and other variables. The implications of the study include the potential for machine learning methodology to improve predictive performance and reduce food waste, as well as the removal of subjective elements in menu classification. Limitations of the research include the limited number of data cases and a model structure that weakens when new menus or foreign words absent from the training data appear. Future studies should aim to address these limitations.
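
As an illustration of the pipeline the abstract describes, the sketch below embeds menu items with Word2Vec, averages the item vectors per meal, clusters the meal vectors into a categorical menu feature, and fits a stacking ensemble of Random Forest, Gradient Boosting, and CatBoost. This is not the authors' code: the libraries (gensim, scikit-learn, catboost), the toy data, the column names (menu, weekday, diners), the cluster count, the meta-learner, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): menu-text features via Word2Vec + clustering,
# then a stacking ensemble with Random Forest, Gradient Boosting, and CatBoost.
# Toy data, column names, cluster count, and hyperparameters are assumptions.
import numpy as np
import pandas as pd
from gensim.models import Word2Vec
from sklearn.cluster import KMeans
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from catboost import CatBoostRegressor

# Toy data: each row is one meal service with its menu items and observed head count.
df = pd.DataFrame({
    "menu": [["kimchi", "bulgogi", "rice"], ["curry", "rice", "salad"],
             ["bulgogi", "soup", "rice"],   ["pasta", "salad", "soup"],
             ["kimchi", "curry", "rice"],   ["soup", "rice", "salad"],
             ["bulgogi", "pasta", "rice"],  ["curry", "soup", "salad"]],
    "weekday": [0, 1, 2, 3, 4, 0, 1, 2],                  # date feature (day of week)
    "diners":  [420, 365, 410, 300, 390, 350, 405, 330],  # target: meals served
})

# 1) Embed menu items with Word2Vec, average the item vectors per meal,
#    and cluster the meal vectors to obtain a categorical menu-type feature.
w2v = Word2Vec(sentences=df["menu"], vector_size=16, window=3, min_count=1, seed=0)
meal_vecs = np.vstack([np.mean([w2v.wv[w] for w in menu], axis=0) for menu in df["menu"]])
df["menu_cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(meal_vecs)

X, y = df[["weekday", "menu_cluster"]], df["diners"]

# 2) Stacking ensemble: the sub-models' predictions are combined by a meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf",  RandomForestRegressor(n_estimators=200, random_state=0)),
        ("gb",  GradientBoostingRegressor(random_state=0)),
        ("cat", CatBoostRegressor(iterations=200, verbose=0, random_seed=0)),
    ],
    final_estimator=LinearRegression(),
    cv=2,  # small CV folds only because the toy data set is tiny
)
stack.fit(X, y)
print(stack.predict(X))  # predicted number of diners per meal service
```

In practice the feature set would also include the other date and menu variables the study mentions, and model comparison would be done on a held-out test set rather than on the training data as in this toy example.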
