Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest

다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측

  • Received : 2022.04.01
  • Accepted : 2022.06.14
  • Published : 2022.08.31

Abstract

The COVID-19 virus appeared in 2019 and is extremely contagious. Because it is so infectious, it has had a large impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of new confirmed COVID-19 cases from COVID-19 infection status data (open data provided by the Ministry of Health and Welfare) and Google Mobility Data, which reports mobility across several categories of places. The data are divided into two datasets. The first dataset combines the COVID-19 infection status data with all six variables of the Google Mobility Data. The second dataset combines the COVID-19 infection status data with only two variables of the Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance is compared using the mean absolute error (MAE). We also perform a correlation analysis for the random forest model and the multiple linear regression model.
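
The mean absolute error used for the comparison is the average of the absolute differences between observed and predicted case counts. The following is a minimal sketch of such a comparison, not the authors' code: the function name, the labels `model_a` and `model_b`, and all numbers are made up for illustration and do not reflect the paper's data or results.

```python
def mean_absolute_error(actual, predicted):
    """Return the average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily new-case counts and outputs of two hypothetical models.
# These values are invented for illustration only.
actual_cases = [100, 120, 90, 110]
predictions = {
    "model_a": [110, 115, 100, 100],
    "model_b": [102, 118, 95, 108],
}

# Compute one MAE per model; the lower value indicates the closer fit.
mae_by_model = {
    name: mean_absolute_error(actual_cases, predicted)
    for name, predicted in predictions.items()
}
best_model = min(mae_by_model, key=mae_by_model.get)

print(mae_by_model)
print(best_model)
```

Because MAE is expressed in the same unit as the target (here, cases per day), the values for two models or two feature sets can be compared directly, provided they are evaluated on the same held-out days.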
