http://dx.doi.org/10.15681/KSWE.2022.38.6.292

Comparison of Chlorophyll-a Prediction and Analysis of Influential Factors in Yeongsan River Using Machine Learning and Deep Learning  

Sun-Hee, Shim (Department of Environmental Science and Engineering, Ewha Womans University)
Yu-Heun, Kim (Department of Environmental Science and Engineering, Ewha Womans University)
Hye Won, Lee (Department of Environmental Science and Engineering, Ewha Womans University)
Min, Kim (Severe Storm Research Center, Ewha Womans University)
Jung Hyun, Choi (Department of Environmental Science and Engineering, Ewha Womans University)
Abstract
The Yeongsan River, one of the four largest rivers in South Korea, has struggled with water quality management because of algal blooms. The problem has worsened, especially since the construction of two weirs on the mainstream of the Yeongsan River. Therefore, prediction and factor analysis of Chlorophyll-a (Chl-a) concentration are needed for effective water quality management. In this study, Chl-a prediction models were developed and their performance was evaluated using machine learning and deep learning methods: Deep Neural Network (DNN), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). In addition, the correlation analysis and the feature importance results were compared to identify the major factors affecting Chl-a concentration. All models showed high prediction performance, with R² values of 0.9 or higher. In particular, XGBoost showed the highest prediction accuracy of 0.95 on the test data. The feature importance results suggested that Ammonia (NH3-N) and Phosphate (PO4-P) were major factors common to all three models for managing Chl-a concentration. These results confirm that the three methods, DNN, RF, and XGBoost, are powerful tools for predicting water quality parameters. Furthermore, comparing feature importance with correlation analysis would give a more accurate assessment of the major influential factors.
Keywords
Chlorophyll-a; Deep learning; Feature importance; Machine learning; Yeongsan River
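The abstract relies on two quantities: the R² score used to judge prediction performance and the correlation between Chl-a and a candidate factor. As a rough illustration only, the sketch below computes both in plain Python. It is not the authors' implementation. The function names and all numeric values are hypothetical and are not data from the study, and the model-based feature importance the paper compares against is not reproduced here.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values for illustration only (not measurements from the paper).
chl_a_observed = [10.0, 20.0, 30.0, 40.0, 50.0]   # assumed Chl-a observations
chl_a_predicted = [12.0, 18.0, 33.0, 39.0, 48.0]  # assumed model output
po4_p = [0.02, 0.03, 0.03, 0.05, 0.06]            # assumed PO4-P values

score = r_squared(chl_a_observed, chl_a_predicted)
corr = pearson_r(po4_p, chl_a_observed)
print(f"R^2 = {score:.3f}, Pearson r (PO4-P vs Chl-a) = {corr:.3f}")
```

A correlation coefficient captures only a linear, pairwise relationship, whereas feature importance reflects how a fitted model actually uses a variable. That difference is one reason the two can rank factors differently and are worth comparing side by side.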