http://dx.doi.org/10.7780/kjrs.2020.36.6.3.7

Evaluation and Predicting PM10 Concentration Using Multiple Linear Regression and Machine Learning  

Son, Sanghun (Division of Earth Environmental System Science (Major of Spatial Information Engineering), Pukyong National University)
Kim, Jinsoo (Department of Spatial Information Engineering, Pukyong National University)
Publication Information
Korean Journal of Remote Sensing / v.36, no.6_3, 2020, pp. 1711-1720
Abstract
Particulate matter (PM) of anthropogenic origin, generated during the recent period of rapid industrialization and urbanization, moves and disperses according to weather conditions and adversely affects the human skin and respiratory system. This study predicts PM10 concentrations in Seoul by using meteorological factors as the input dataset for multiple linear regression (MLR), support vector machine (SVM), and random forest (RF) models, and compares and evaluates the performance of these models. First, PM10 concentration data obtained at 39 air quality monitoring sites (AQMS) in Seoul were divided into training and validation datasets at an 8:2 ratio. Nine meteorological factors (mean, maximum, and minimum temperature; precipitation; average and maximum wind speed; wind direction; yellow dust; and relative humidity), obtained from the automatic weather system (AWS), formed the input dataset for the models. The coefficients of determination (R²) between the observed PM10 concentrations and those predicted by the MLR, SVM, and RF models were 0.260, 0.772, and 0.793, respectively, so the RF model predicted PM10 concentrations best. Among the AQMS used for model validation, the Gwanak-gu and Gangnam-daero AQMS are relatively close to AWS stations, and the SVM and RF models were highly accurate at these sites. The Jongno-gu AQMS is relatively far from the AWS stations, but because PM10 concentrations from its two adjacent AQMS were used for model training, both models were also highly accurate there. By contrast, the Yongsan-gu AQMS is relatively far from both the other AQMS and the AWS stations, and both models performed poorly there.
Keywords
PM10 concentration; Meteorological Variables; Multiple Linear Regression; Support Vector Machine; Random Forest
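The evaluation workflow in the abstract (an 8:2 split of the AQMS data, a regression model fitted on meteorological predictors, and R² on the validation set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Because the abstract names specific validation AQMS, the sketch assumes the 8:2 split was made by monitoring site rather than by individual records. The data are synthetic: 39 hypothetical site IDs, three made-up predictors standing in for three of the nine meteorological factors, and invented coefficients. Only MLR is implemented (ordinary least squares through the normal equations, standard library only). SVM and RF are not reproduced here.

```python
import random


def split_by_site(rows, seed=0):
    """Shuffle site IDs; assign 80% of sites to training and 20% to validation (assumed site-level split)."""
    sites = sorted({site for site, _, _ in rows})
    random.Random(seed).shuffle(sites)
    cut = len(sites) * 8 // 10  # integer 8:2 split of the site list
    train_sites = set(sites[:cut])
    train = [r for r in rows if r[0] in train_sites]
    valid = [r for r in rows if r[0] not in train_sites]
    return train, valid


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X, y):
    """Ordinary least squares: solve (A^T A) beta = A^T y, where A has a leading intercept column."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def predict_mlr(beta, X):
    """Linear prediction: intercept plus the weighted sum of the predictors."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data: 39 hypothetical sites x 20 days, three made-up predictors.
rng = random.Random(42)
rows = []
for s in range(39):
    site = f"site_{s:02d}"  # hypothetical AQMS identifier
    for _ in range(20):
        temp = rng.uniform(-10, 35)    # mean temperature (deg C), illustrative
        wind = rng.uniform(0, 10)      # average wind speed (m/s), illustrative
        humid = rng.uniform(20, 100)   # relative humidity (%), illustrative
        pm10 = 80 - 0.5 * temp - 3.0 * wind + 0.1 * humid + rng.gauss(0, 5)
        rows.append((site, (temp, wind, humid), pm10))

train, valid = split_by_site(rows)
beta = fit_mlr([f for _, f, _ in train], [y for _, _, y in train])
pred = predict_mlr(beta, [f for _, f, _ in valid])
r2 = r_squared([y for _, _, y in valid], pred)
print(f"MLR validation R^2 = {r2:.3f}")
```

The `split_by_site` and `r_squared` helpers would score any regressor that exposes a fit/predict pair. For example, scikit-learn's `SVR` and `RandomForestRegressor` could stand in for `fit_mlr`/`predict_mlr` to compare the three model types on the same held-out sites. The synthetic target here is linear by construction, so MLR scores well on it. That result says nothing about the real Seoul data, for which the abstract reports an MLR R² of 0.260.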