http://dx.doi.org/10.36498/kbigdt.2021.6.2.39

Comparison of Machine Learning Techniques in Urban Weather Prediction using Air Quality Sensor Data  

Jong-Chan Park (Department of Statistics, Inha University)
Heon Jin Park (Department of Statistics, Inha University)
Publication Information
The Journal of Bigdata, v.6, no.2, 2021, pp. 39-49
Abstract
Large and diverse weather data are now being collected by sensors from many sources, and machine learning is widely being applied to predict fine dust concentration. This study compares PM10 and PM2.5 prediction models using data from 840 outdoor air quality meters installed throughout the city. Predicting the fine dust concentration 5 minutes ahead allows information to be provided in real time, and can serve as the basis for developing models that predict 10 minutes, 30 minutes, and 1 hour ahead. The data were preprocessed, including noise removal and missing value replacement, and derived variables that account for temporal and spatial factors were created. Model parameters were selected using the response surface method. XGBoost, Random Forest, and Deep Learning (Multilayer Perceptron) were used as predictive models; the difference between the measured and predicted fine dust concentrations was examined, and performance was compared across the models.
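The preprocessing and 5-minute-ahead setup described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact procedures, so everything below is an assumption made for illustration: linear interpolation for missing values, a centered rolling median for noise removal, lagged readings as the temporal derived variables, a 1-minute sampling interval (so a horizon of 5 steps), and RMSE as the comparison metric. The function names and the sample readings are hypothetical, and spatial variables are not covered.

```python
import math
from statistics import median


def fill_missing(values):
    """Replace None entries by linear interpolation between known neighbours.

    Leading and trailing gaps are filled with the nearest known value.
    (Assumed method; the abstract only says missing values were replaced.)
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def rolling_median(values, window=3):
    """Centered rolling median; the window shrinks at the series edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def make_supervised(values, n_lags=3, horizon=5):
    """Build (lag features, target) pairs; the target lies `horizon` steps ahead."""
    rows = []
    for t in range(n_lags - 1, len(values) - horizon):
        features = values[t - n_lags + 1: t + 1]
        rows.append((features, values[t + horizon]))
    return rows


def rmse(actual, predicted):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical PM10 readings from one sensor, with gaps (None) and one spike.
raw = [30.0, 32.0, None, 36.0, 90.0, 35.0, 34.0, None, 33.0, 31.0, 30.0, 29.0]
clean = rolling_median(fill_missing(raw), window=3)
rows = make_supervised(clean, n_lags=3, horizon=5)

# Persistence baseline: predict that the value 5 steps ahead equals the latest reading.
actual = [target for _, target in rows]
baseline = [features[-1] for features, _ in rows]
print(f"persistence RMSE: {rmse(actual, baseline):.2f}")
```

The rows returned by `make_supervised` could then be passed to any of the three model families named in the abstract, with the persistence baseline giving a reference point for judging whether a model adds value.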
Keywords
PM10; PM2.5; machine learning; spatio-temporal model; weather data