http://dx.doi.org/10.29220/CSAM.2018.25.2.199

Forecasting daily PM10 concentrations in Seoul using various data mining techniques  

Choi, Ji-Eun (Department of Statistics, Ewha Womans University)
Lee, Hyesun (Department of Statistics, Ewha Womans University)
Song, Jongwoo (Department of Statistics, Ewha Womans University)
Publication Information
Communications for Statistical Applications and Methods / v.25, no.2, 2018, pp. 199-215
Abstract
Interest in $PM_{10}$ concentrations has increased greatly in Korea due to recent increases in air pollution levels. We therefore consider a model for forecasting the next-day $PM_{10}$ concentration based on the principal elements of air pollution, weather information, and Beijing $PM_{2.5}$. An accurate forecast of the next-day $PM_{10}$ concentration level would be useful to policy makers and the public. This paper aims to forecast the daily mean $PM_{10}$, the daily maximum $PM_{10}$, and the four stages of $PM_{10}$ provided by the Ministry of Environment using various data mining techniques. We use seven models to forecast the daily $PM_{10}$: five regression models (linear regression, random forest, gradient boosting, support vector machine, neural network) and two time series models (ARIMA, ARFIMA). The linear regression model performs best in the $PM_{10}$ concentration forecast, and the linear regression and random forest models perform best in the $PM_{10}$ class forecast. The results also indicate that $PM_{10}$ in Seoul is influenced by Beijing $PM_{2.5}$ and by air pollution from power stations on the west coast.
Keywords
$PM_{10}$ concentration; linear regression; random forest; gradient boosting; support vector machine; neural network; ARFIMA
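The abstract describes regressing the next-day $PM_{10}$ on current pollution and weather predictors and then assigning one of four stages. The paper's data and fitted models are not reproduced on this page, so the following is only a minimal sketch of that idea under stated assumptions: the data are synthetic, the two predictors (today's $PM_{10}$ and today's Beijing $PM_{2.5}$) stand in for the paper's full predictor set, the generating coefficients are arbitrary, and the stage cutoffs of 30, 80, and 150 are assumed rather than taken from the source. All function names are hypothetical.

```python
import math
import random


def make_synthetic(n=400, seed=0, noise_sd=5.0):
    """Synthetic series (NOT the paper's data): tomorrow's PM10 depends on
    today's PM10 and today's Beijing PM2.5, with arbitrary coefficients."""
    rng = random.Random(seed)
    pm10 = [50.0]
    beijing = [rng.uniform(20, 200)]
    for _ in range(n):
        beijing.append(rng.uniform(20, 200))
        # beijing[-2] is the value for the same day as pm10[-1]
        pm10.append(10 + 0.5 * pm10[-1] + 0.1 * beijing[-2]
                    + rng.gauss(0, noise_sd))
    return pm10, beijing


def lagged_design(pm10, beijing):
    """Pair today's predictors with the next day's PM10 as the response."""
    n = len(pm10) - 1
    X = [(pm10[t], beijing[t]) for t in range(n)]
    y = [pm10[t + 1] for t in range(n)]
    return X, y


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, x):
    """Linear prediction for one predictor tuple x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def rmse(actual, forecast):
    """Root mean squared forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast))
                     / len(actual))


def pm10_stage(value, cutoffs=(30, 80, 150)):
    """Map a concentration to stage 1..4; the cutoffs are an assumption."""
    return 1 + sum(value > c for c in cutoffs)


if __name__ == "__main__":
    pm10, beijing = make_synthetic()
    X, y = lagged_design(pm10, beijing)
    # Chronological split: fit on the first 300 days, forecast the rest
    beta = fit_ols(X[:300], y[:300])
    forecasts = [predict(beta, x) for x in X[300:]]
    print("coefficients:", [round(b, 3) for b in beta])
    print("test RMSE:", round(rmse(y[300:], forecasts), 2))
    print("stage of last forecast:", pm10_stage(forecasts[-1]))
```

The chronological train/test split keeps every forecast strictly out of sample, which is the natural evaluation setup for a next-day forecast. The class forecast here simply thresholds the regression output, which is one possible approach and not necessarily the one used in the paper.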