Prediction of spatio-temporal AQI data

KyeongEun Kim, MiRu Ma, KyeongWon Lee

doi:10.29220/CSAM.2023.30.2.119

Communications for Statistical Applications and Methods

Volume 30, Issue 2, Pages 119-133, 2023
pISSN 2287-7843, eISSN 2383-4757

The Korean Statistical Society

KyeongEun Kim (Department of Statistics, Seoul National University)
MiRu Ma (Department of Statistics, Sungkyunkwan University)
KyeongWon Lee (Department of Statistics, Seoul National University)

Received : 2022.07.21
Accepted : 2023.01.15
Published : 2023.03.31

https://doi.org/10.29220/CSAM.2023.30.2.119

Abstract

With the rapid growth of the economy and of fossil fuel consumption, the concentration of air pollutants has increased significantly, and air pollution is no longer a problem confined to small areas. Using R and Python, we conduct a statistical analysis of actual air quality data covering all of South Korea. Covariates include SO₂, CO, O₃, NO₂, PM₁₀, precipitation, wind speed, wind direction, vapor pressure, local pressure, sea level pressure, temperature, humidity, and others. The main goal of this paper is to predict spatio-temporal air quality index (AQI) data. Observations in large spatio-temporal datasets such as AQI data are correlated both spatially and temporally, and computing predictions or forecasts that account for this dependence structure is often infeasible. The likelihood function of a full spatio-temporal model can therefore be complicated, and specialized modeling approaches are needed for statistically reliable prediction. In this paper, we propose several methods for these large spatio-temporal AQI data. First, we propose a random-effects model with spatio-temporal basis functions, a classical statistical approach. Next, we apply a neural network model, a deep learning method based on artificial neural networks. Finally, we introduce a random forest model, a machine learning method closer to computational science. We then compare the forecasting performance of the three methods using predictive diagnostics. All three methods predicted normal levels of PM₂.₅ well, but their performance appears to be poor for extreme values.
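The first approach, a random-effects model with spatio-temporal basis functions, can be illustrated with a minimal sketch. The abstract does not specify the basis, the variance parameters, or the diagnostics, so everything below is an assumption: tensor-product Gaussian basis functions on a regular grid of centers, independent Gaussian random effects with a known variance ratio `lam`, synthetic data standing in for the AQI dataset, and RMSE/MAE as the predictive diagnostics. All function names (`gaussian_basis`, `fit_mixed_model`, `predict`, `rmse`, `mae`) are hypothetical. With the variance ratio fixed, Henderson's mixed-model equations give the best linear unbiased estimate of the fixed effects and the best linear unbiased predictor of the random effects.

```python
import numpy as np


def gaussian_basis(coords, times, s_centers, t_centers, s_scale, t_scale):
    """Evaluate tensor-product Gaussian basis functions phi_k(s, t) (assumed form).

    coords: (n, 2) locations; times: (n,) times. Returns an (n, Ks * Kt) matrix.
    """
    ds2 = ((coords[:, None, :] - s_centers[None, :, :]) ** 2).sum(axis=2)  # (n, Ks)
    dt2 = (times[:, None] - t_centers[None, :]) ** 2                      # (n, Kt)
    phi_s = np.exp(-ds2 / (2 * s_scale**2))
    phi_t = np.exp(-dt2 / (2 * t_scale**2))
    # Row-wise outer product: each space-time basis function is phi_s * phi_t.
    return (phi_s[:, :, None] * phi_t[:, None, :]).reshape(len(times), -1)


def fit_mixed_model(X, Phi, y, lam):
    """Solve the mixed-model equations for y = X beta + Phi alpha + eps,
    assuming alpha ~ N(0, s_a^2 I), eps ~ N(0, s_e^2 I), lam = s_e^2 / s_a^2 known."""
    p, k = X.shape[1], Phi.shape[1]
    A = np.block([[X.T @ X, X.T @ Phi],
                  [Phi.T @ X, Phi.T @ Phi + lam * np.eye(k)]])
    b = np.concatenate([X.T @ y, Phi.T @ y])
    sol = np.linalg.solve(A, b)
    return sol[:p], sol[p:]  # fixed effects beta, random effects alpha


def predict(X, Phi, beta, alpha):
    return X @ beta + Phi @ alpha


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


# Synthetic stand-in data (not the paper's AQI data).
rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0, 1, size=(n, 2))
times = rng.uniform(0, 1, size=n)
covariate = rng.normal(size=n)  # placeholder for a meteorological covariate
X = np.column_stack([np.ones(n), covariate])
signal = np.sin(2 * np.pi * coords[:, 0]) * np.cos(2 * np.pi * times)
y = 1.0 + 0.5 * covariate + signal + rng.normal(scale=0.1, size=n)

grid = np.linspace(0, 1, 4)
s_centers = np.array([[a, b] for a in grid for b in grid])  # 16 spatial centers
t_centers = np.linspace(0, 1, 5)                            # 5 temporal centers
Phi = gaussian_basis(coords, times, s_centers, t_centers, s_scale=0.25, t_scale=0.2)

# Points are i.i.d., so the first 300 rows act as a random training split.
train, test = slice(0, 300), slice(300, n)
beta, alpha = fit_mixed_model(X[train], Phi[train], y[train], lam=1.0)
y_hat = predict(X[test], Phi[test], beta, alpha)

# Covariates-only baseline (ordinary least squares, no random effects).
beta_ols, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
y_ols = X[test] @ beta_ols

print("basis-function model RMSE:", rmse(y[test], y_hat), "MAE:", mae(y[test], y_hat))
print("covariates-only RMSE:", rmse(y[test], y_ols))
```

In practice the variance components would be estimated (for example by maximum likelihood or REML) rather than fixed, and holding out the most recent time points instead of a random subset would better reflect a forecasting setting.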

Keywords

Acknowledgement

This research was supported by the Basic Research Program through the National Research Foundation of Korea (NRF) funded by the MSIT (NRF-2020R1A4A1018207).
