http://dx.doi.org/10.6109/jkiice.2021.25.2.171

Store Sales Prediction Using Gradient Boosting Model  

Choi, Jaeyoung (Library and Information Science, Sungkyunkwan University)
Yang, Heeyoon (Library and Information Science, Sungkyunkwan University)
Oh, Hayoung (College of Computing & Informatics, Sungkyunkwan University)
Abstract
Rapid advances in machine learning have led to diverse applications, not only in industry but also in daily life. Applications of machine learning to financial data have also attracted interest. Herein, we apply machine learning algorithms to store sales data and present future applications for fintech enterprises. We use several imputation methods to handle missing data and apply three gradient boosting algorithms (XGBoost, LightGBM, and CatBoost) to predict the future revenue of individual stores. We found that median imputation of missing data combined with the XGBoost algorithm yielded the best accuracy. The proposed method can benefit both stores and fintech enterprises: stores can receive financial assistance in advance from fintech companies, while these companies can offer financial support to the stores at low risk.
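As an illustration of the median imputation step named in the abstract, the following is a minimal sketch using only the Python standard library. The function name `impute_median` and the sample revenue figures are hypothetical and are not taken from the paper; the authors' actual preprocessing pipeline and data layout are not described here.

```python
from statistics import median
from typing import List, Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical monthly revenue for one store; None marks a missing month.
monthly_revenue = [120.0, None, 95.0, 110.0, None, 130.0]
filled = impute_median(monthly_revenue)
print(filled)
```

A series filled in this way could then be passed to a gradient boosting regressor such as XGBoost; that step is omitted here because the abstract does not specify the features or hyperparameters used.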
Keywords
CatBoost; LightGBM; Machine learning; Sales prediction; XGBoost