http://dx.doi.org/10.7472/jksii.2020.21.5.139

Prediction of infectious diseases using multiple web data and LSTM  

Kim, Yeongha (Department of Computer Science, Sangmyung University)
Kim, Inhwan (Department of Computer Science, Sangmyung University)
Jang, Beakcheol (Department of Computer Science, Sangmyung University)
Publication Information
Journal of Internet Computing and Services / v.21, no.5, 2020, pp. 139-148
Abstract
Infectious diseases have long plagued humankind, and predicting and preventing them remains a major challenge. For this reason, many studies have attempted to predict infectious disease outbreaks. Most early studies relied on epidemiological data from the Centers for Disease Control and Prevention (CDC). However, CDC data are updated only once a week, which makes it difficult to predict the number of disease cases in real time. With the emergence of diverse Internet media driven by recent advances in IT, studies have begun to predict the occurrence of infectious diseases from web data, but most of the studies we reviewed used a single source of web data. Forecasting from a single web data source has two drawbacks: it is difficult to collect a large amount of training data, and it is difficult to make accurate predictions for recent outbreaks such as COVID-19. We therefore aim to show experimentally that LSTM models using multiple web data sources predict the occurrence of infectious diseases more accurately than models using a single web data source, and we propose a model suited to predicting infectious diseases. In our experiment, we predicted the occurrence of malaria and epidemic parotitis (mumps) using single-web-data models and our proposed model. We collected a total of 104 weeks of news, SNS, and search query data, of which 75 weeks were used as training data and 29 weeks as validation data. When predicting the validation data, our proposed model achieved the highest Pearson correlation coefficients among the compared models (0.94 and 0.86) and the lowest RMSE (0.19 and 0.07).
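The evaluation protocol in the abstract (104 weekly observations split in time order into 75 training weeks and 29 validation weeks, with predictions scored by the Pearson correlation coefficient and RMSE) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names are our own, and the demo series and "predictions" are synthetic placeholders rather than the paper's data.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], n_train: int) -> Tuple[List[float], List[float]]:
    """Split a weekly series in time order (no shuffling): the first n_train
    weeks for training, the remaining weeks for validation."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must leave at least one week on each side")
    return list(series[:n_train]), list(series[n_train:])


def pearson_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("Pearson r is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty equal-length sequences")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    # Synthetic 104-week normalized series (NOT the paper's data).
    weeks = [0.5 + 0.4 * math.sin(2 * math.pi * w / 52) for w in range(104)]
    train, valid = chronological_split(weeks, n_train=75)
    print(len(train), len(valid))  # 75 29
    # Stand-in "predictions": a slightly perturbed copy of the validation weeks.
    preds = [v + 0.05 * math.cos(i) for i, v in enumerate(valid)]
    print(round(pearson_r(valid, preds), 3), round(rmse(valid, preds), 3))
```

A higher Pearson r means the predicted curve follows the shape of the observed curve more closely, while a lower RMSE means smaller absolute errors; reporting both covers trend agreement as well as magnitude.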
Keywords
Machine Learning; Predict infectious diseases; Web data; LSTM;
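The abstract names an LSTM as the forecasting model but does not give its architecture, input encoding, or hyperparameters. As a rough sketch of how an LSTM could consume several web-data signals at once, the code below runs a single LSTM cell over a window of weekly feature vectors (assumed here to be scaled news, SNS, and search query values) and maps the final hidden state to a next-week estimate with a linear readout. The helper names, hidden size, window length, and initialization are all hypothetical, and the weights are random and untrained, so the printed number is not a meaningful forecast.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# (W, U, b): input weights, recurrent weights, biases, each keyed by gate name.
Params = Tuple[Dict[str, Matrix], Dict[str, Matrix], Dict[str, Vector]]
GATES = "ifog"  # input, forget, output, candidate


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_params(n_in: int, n_hidden: int, seed: int = 0) -> Params:
    """Small random weights (hypothetical, untrained, for illustration only)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    W = {k: mat(n_hidden, n_in) for k in GATES}
    U = {k: mat(n_hidden, n_hidden) for k in GATES}
    # A forget-gate bias of 1.0 is a common initialization heuristic.
    b = {k: [1.0 if k == "f" else 0.0] * n_hidden for k in GATES}
    return W, U, b


def lstm_step(x: Sequence[float], h: Vector, c: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One LSTM time step: returns the new hidden state h and cell state c."""
    W, U, b = params
    pre = {
        k: [wx + uh + bk for wx, uh, bk in zip(matvec(W[k], x), matvec(U[k], h), b[k])]
        for k in GATES
    }
    i = [sigmoid(z) for z in pre["i"]]    # input gate
    f = [sigmoid(z) for z in pre["f"]]    # forget gate
    o = [sigmoid(z) for z in pre["o"]]    # output gate
    g = [math.tanh(z) for z in pre["g"]]  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def forecast_next_week(window: Sequence[Sequence[float]], params: Params,
                       w_out: Sequence[float], b_out: float) -> float:
    """Run the cell over a window of weekly feature vectors, then apply a linear readout."""
    n_hidden = len(params[2]["i"])
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in window:
        h, c = lstm_step(x, h, c, params)
    return sum(w * hj for w, hj in zip(w_out, h)) + b_out


if __name__ == "__main__":
    # Hypothetical weekly [news, SNS, search] features, already scaled to [0, 1].
    window = [[0.2, 0.5, 0.1], [0.3, 0.6, 0.2], [0.4, 0.4, 0.3], [0.5, 0.7, 0.2]]
    params = init_params(n_in=3, n_hidden=8)
    print(forecast_next_week(window, params, w_out=[0.1] * 8, b_out=0.0))
```

In practice one would typically use a deep learning library's LSTM layer and fit the weights on the training weeks; the hand-written cell only makes the gate arithmetic explicit.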