Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul (Computer Information Systems, California State University)
  • Shuvadeep Kundu (Computer Information Systems, California State University)
  • Jongwook Woo (CIS Department, California State University)
  • Received : 2020.12.08
  • Accepted : 2021.04.15
  • Published : 2021.06.30

Abstract

This paper analyzes and predicts liquor sales in the state of Iowa by building predictive models with several machine learning algorithms. We use Azure ML and Spark ML for the predictive analysis, representing a legacy machine learning (ML) system and a Big Data ML system, respectively. The Iowa liquor sales dataset covers 2012 to 2019 and comprises 24 columns and approximately 1.8 million rows. We conclude by comparing the models built with different algorithms and their accuracy in predicting sales on both Azure ML and Spark ML. With the sample data set on the legacy Azure ML system, the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time. With the entire data set on the Big Data Spark system, the Decision Tree Regression model in Spark ML has the highest accuracy and the quickest computing time.
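The comparison described in the abstract, ranking regression models by prediction error and by computing time, can be illustrated with a minimal sketch. This is not the authors' Azure ML or Spark ML pipeline. It uses only the Python standard library, and every name in it (`fit_linear`, `fit_stump`, `evaluate`, and the synthetic `bottles` and `sales` columns) is a hypothetical stand-in. The data are randomly generated and deliberately linear, so the ranking it prints says nothing about the results reported in the paper.

```python
import math
import random
import time


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_linear(xs, ys):
    """Ordinary least squares with one feature; returns a predict function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_stump(xs, ys):
    """Depth-1 regression tree: a single split that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best_gain, best = -1.0, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # Maximising this quantity is equivalent to minimising the split's SSE.
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if gain > best_gain:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
            best = (threshold, left_sum / i, right_sum / (n - i))
    threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def evaluate(name, fit, train, test):
    """Train one model and report its test RMSE and training time in seconds."""
    train_x, train_y = train
    test_x, test_y = test
    start = time.perf_counter()
    predict = fit(train_x, train_y)
    elapsed = time.perf_counter() - start
    error = rmse(test_y, [predict(x) for x in test_x])
    return {"model": name, "rmse": error, "seconds": elapsed}


# Synthetic stand-in data (an assumption, not the Iowa dataset):
# sale amount grows linearly with bottles sold, plus Gaussian noise.
random.seed(42)
bottles = [random.uniform(1, 50) for _ in range(2000)]
sales = [12.5 * b + random.gauss(0, 5) for b in bottles]
train = (bottles[:1500], sales[:1500])
test = (bottles[1500:], sales[1500:])

results = [
    evaluate("linear regression", fit_linear, train, test),
    evaluate("decision stump", fit_stump, train, test),
]
for row in sorted(results, key=lambda r: r["rmse"]):
    print(f'{row["model"]:<18} RMSE={row["rmse"]:.2f}  fit={row["seconds"]:.4f}s')
```

Reporting error and training time side by side mirrors the two criteria the abstract uses. On a real dataset the models would be trained with the platform's own regressors rather than these simplified ones.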
