http://dx.doi.org/10.11001/jksww.2021.35.6.507

A study on applying random forest and gradient boosting algorithm for Chl-a prediction of Daecheong lake  

Lee, Sang-Min (Department of Environmental Engineering, Pukyong National University)
Kim, Il-Kyu (Department of Environmental Engineering, Pukyong National University)
Publication Information
Journal of Korean Society of Water and Wastewater / v.35, no.6, 2021, pp. 507-516
Abstract
This study applied machine learning, which has recently been widely used for prediction, to algae forecasting in Daecheong Lake. The study site was the Chudong (CD) point, a representative point of the lake. Chlorophyll-a (Chl-a) concentration was used as the target variable for algae prediction. To predict the Chl-a concentration, a data set of water quality and quantity factors was compiled, and random forest and gradient boosting algorithms were run in Python. Before running the algorithms, the correlation between Chl-a and the water quality and quantity data was analyzed, and the ten factors of highest importance were extracted. In terms of the performance indices, gradient boosting achieved an RMSE of 2.72 mg/m³, an MSE of 7.40 (mg/m³)², and an R² of 0.66. Gradient boosting also gave the better result in the residual analysis, and overall it was the better-performing algorithm. After hyperparameter tuning, gradient boosting again performed best, with an RMSE of 2.44 mg/m³.
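The performance indices named in the abstract (MSE, RMSE, and R²) can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the observed and predicted Chl-a values are hypothetical and chosen only for illustration. The sketch assumes the standard definitions, under which RMSE is the square root of MSE, so the reported RMSE of 2.72 mg/m³ corresponds to an MSE of about 7.40 (mg/m³)².

```python
import math


def regression_metrics(observed, predicted):
    """Return (mse, rmse, r2) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Sum of squared residuals between observations and predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mse = ss_res / n              # units: (mg/m3)^2 for Chl-a in mg/m3
    rmse = math.sqrt(mse)         # units: mg/m3
    mean_obs = sum(observed) / n
    # Total sum of squares around the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2


# Hypothetical Chl-a values in mg/m3 (made up, not taken from the study).
observed = [4.0, 9.5, 15.2, 7.3, 12.1, 5.8]
predicted = [5.1, 8.0, 12.9, 8.4, 10.2, 6.5]

mse, rmse, r2 = regression_metrics(observed, predicted)
print(f"MSE={mse:.2f} (mg/m3)^2, RMSE={rmse:.2f} mg/m3, R2={r2:.2f}")
```

Because RMSE is expressed in the same units as the target variable, it is the index most directly comparable to measured Chl-a concentrations, which may be why the abstract reports the tuned result as an RMSE alone.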
Keywords
Chlorophyll-a (Chl-a); Machine learning; Daecheong Lake; RMSE; Hyperparameter