A study on applying random forest and gradient boosting algorithm for Chl-a prediction of Daecheong lake

  • Lee, Sang-Min (이상민) (Department of Environmental Engineering, Pukyong National University)
  • Kim, Il-Kyu (김일규) (Department of Environmental Engineering, Pukyong National University)
  • Received : 2021.10.26
  • Accepted : 2021.12.14
  • Published : 2021.12.15

Abstract

This study applied machine learning, which has recently come into wide use for prediction, to algae forecasting in Daecheong Lake. The study site was the Chudong (CD) point, a representative point of the lake. Chlorophyll-a (Chl-a) concentration was used as the target variable for algae prediction. To predict the Chl-a concentration, a data set of water quality and water quantity factors was compiled. The random forest and gradient boosting algorithms were run in Python. Before running the algorithms, the correlation between Chl-a and the water quality and quantity data was analyzed, and the ten most important factors were extracted. In terms of the performance indices, gradient boosting achieved an RMSE of 2.72 mg/m³, an MSE of 7.40 (mg/m³)², and an R² of 0.66. Gradient boosting also performed well in the residual analysis, and it gave the better result when the algorithms were run. After hyperparameter tuning, gradient boosting again gave the better result, with an RMSE of 2.44 mg/m³.
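The three performance indices named in the abstract are related: RMSE is the square root of MSE, so MSE carries squared concentration units, and the reported pair is consistent (2.72² ≈ 7.40). The sketch below shows one common way to compute MSE, RMSE, and R² from paired observations and predictions. It is an illustration only, not the authors' code: the function name `regression_metrics` and the sample Chl-a values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return (MSE, RMSE, R2) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)      # residual sum of squares
    mse = ss_res / n                            # units: (mg/m3)^2
    rmse = math.sqrt(mse)                       # units: mg/m3
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot                  # coefficient of determination
    return mse, rmse, r2


# Hypothetical Chl-a concentrations (mg/m3), made up for illustration only.
observed = [4.0, 9.5, 12.0, 6.5, 15.0]
predicted = [5.0, 8.0, 13.5, 6.0, 12.5]

mse, rmse, r2 = regression_metrics(observed, predicted)
print(f"MSE={mse:.2f} (mg/m3)^2, RMSE={rmse:.2f} mg/m3, R2={r2:.2f}")
```

Because RMSE is in the same units as the target variable, it is usually the easier of the two error measures to read against observed Chl-a concentrations.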

Acknowledgement

This study was carried out as part of the "Pukyong National University Autonomous Creative Academic Research Fund Support Program (2021)".
