http://dx.doi.org/10.11001/jksww.2020.34.4.277

Comparison of machine learning algorithms for Chl-a prediction in the middle of Nakdong River (focusing on water quality and quantity factors)  

Lee, Sang-Min (Department of Environmental Engineering, Pukyong National University)
Park, Kyeong-Deok (Department of Marine Design Convergence engineering, Pukyong National University)
Kim, Il-Kyu (Department of Environmental Engineering, Pukyong National University)
Publication Information
Journal of Korean Society of Water and Wastewater / v.34, no.4, 2020, pp. 277-288
Abstract
In this study, we applied machine learning algorithms to predict chlorophyll-a (Chl-a) as a measure of algae, using water quality and quantity data from the middle Nakdong River area. First, we analyzed the correlation between Chl-a and the water quality and quantity data. We then extracted ten factors of high importance from the water quality and quantity data for the two weirs, and used the algorithms to predict how these ten factors affected Chl-a occurrence. Four algorithms were run in Python: decision tree, random forest, elastic net, and gradient boosting. The root mean square error (RMSE) was used to compare the algorithms. Gradient boosting gave an RMSE of 10.55 at the Gangjeonggoryeong (GG) site and 11.43 at the Dalsung (DS) site, showing excellent results at both sites. The predictions of the four algorithms were also evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The AUC was 0.877 at the GG site and 0.951 at the DS site, which suggests that the algorithm's predictive ability is excellent.
Keywords
Chlorophyll-a (Chl-a); Machine learning; Gradient boosting; Nakdong River; RMSE; ROC curve;
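The two evaluation measures named in the abstract, RMSE and AUC, can be illustrated with a minimal sketch. It is not the authors' code: the observed and predicted values are made up, and the threshold used to turn Chl-a concentrations into event/non-event labels for the AUC (`THRESHOLD`) is a hypothetical choice, since the abstract does not state how the ROC classes were defined.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the
    higher score. Tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up Chl-a values for illustration only (not data from the study).
observed = [12.0, 30.0, 8.0, 45.0, 20.0, 60.0]
predicted = [15.0, 25.0, 10.0, 40.0, 28.0, 50.0]

# Hypothetical cut-off that marks an observation as a "high Chl-a" event.
THRESHOLD = 25.0
labels = [1 if o >= THRESHOLD else 0 for o in observed]

model_rmse = rmse(observed, predicted)
model_auc = auc(labels, predicted)
print(f"RMSE = {model_rmse:.2f}, AUC = {model_auc:.3f}")
```

With these made-up numbers the RMSE is about 6.15 and the AUC is 8/9, about 0.889. A lower RMSE means predictions are closer to the observations, while an AUC near 1 means the predictions rank high-Chl-a cases above low ones. The same two functions could be applied to the output of each of the four algorithms to compare them.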
Citations & Related Records
Times Cited By KSCI: 6