Hourly Prediction of Particulate Matter (PM2.5) Concentration Using Time Series Data and Random Forest

시계열 데이터와 랜덤 포레스트를 활용한 시간당 초미세먼지 농도 예측

  • 이득우 (Department of Software Convergence, Soongsil University) ;
  • 이수원 (School of Software, Soongsil University)
  • Received : 2019.08.23
  • Accepted : 2019.12.19
  • Published : 2020.04.30

Abstract

PM2.5, airborne particulate matter even finer than PM10, has become a major environmental issue. Because PM2.5 can cause eye and respiratory diseases and can penetrate even the blood vessels of the brain, predicting its concentration is important. Prediction is difficult, however, because the formation and movement of PM2.5 have not yet been clearly explained. Methods are therefore needed that not only predict PM2.5 accurately but also produce interpretable results. To predict hourly PM2.5 in Seoul, we propose a method that preprocesses ground observation data from different sources into time series and trains a random forest with an adjusted bootstrap number. With this method, the prediction model is trained uniformly on the hourly information and its results are interpretable. We evaluated prediction performance in comparative experiments, in which the proposed method outperformed the other models on all labels. The method also revealed the importance of variables related to the formation of PM2.5 and to the influence of China.

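Fine particulate matter (PM2.5), which has recently emerged as a major topic among environmental problems, is suspended matter even smaller than PM10. PM2.5 causes eye and respiratory diseases and can penetrate even the cerebral blood vessels, so it is important to predict its level hour by hour and prepare accordingly. However, no clear explanation of the formation and movement of PM2.5 has yet been given, which makes prediction difficult. A prediction method is therefore needed that not only predicts PM2.5 but also has explanatory power for its results. This study aims to predict hourly PM2.5 in Seoul; to that end, it proposes preprocessing different sets of ground observation data into time series and using a random forest with an adjusted bootstrap number for training and prediction. This method lets the prediction model learn the hour-by-hour information in the input data in a balanced way and makes the prediction results explainable. In comparative experiments with existing models, conducted to evaluate prediction accuracy, the proposed method showed the best prediction performance on all labels, and it showed that variables related to the formation of PM2.5 and variables related to the influence of China have an important effect on the prediction results.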
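
The abstract gives no implementation details, so the following is only a minimal Python sketch of two ideas it names: preprocessing an hourly series into time-series training rows, and drawing a bootstrap sample whose size can be adjusted. The function names `make_windows` and `bootstrap_sample`, the parameters `n_lags`, `horizon`, and `n_samples`, and the sample readings are all hypothetical. Reading "adjusted bootstrap number" as the number of rows drawn per tree is an assumption, not something the abstract states.

```python
import random


def make_windows(series, n_lags, horizon):
    """Turn an hourly series into (features, labels) rows.

    Each row uses the previous `n_lags` hourly values as features and
    the next `horizon` hourly values as labels (one label per hour ahead).
    """
    rows = []
    for t in range(n_lags, len(series) - horizon + 1):
        features = series[t - n_lags:t]
        labels = series[t:t + horizon]
        rows.append((features, labels))
    return rows


def bootstrap_sample(rows, n_samples, rng):
    """Draw `n_samples` rows with replacement.

    A random forest draws one such sample per tree; `n_samples` is the
    knob assumed here to correspond to the "bootstrap number".
    """
    return [rows[rng.randrange(len(rows))] for _ in range(n_samples)]


# Hypothetical hourly PM2.5 readings, invented purely for illustration.
pm25 = [12, 15, 20, 26, 31, 28, 24, 19, 17, 22]

rows = make_windows(pm25, n_lags=3, horizon=2)
rng = random.Random(0)  # fixed seed so the draw is repeatable
sample = bootstrap_sample(rows, n_samples=4, rng=rng)
```

In this sketch each row pairs three past hours with the two hours that follow, so a separate label exists for each hour ahead. A full model would fit one decision tree per bootstrap sample and average the trees' predictions; that step is omitted here.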
