http://dx.doi.org/10.5389/KSAE.2021.63.2.051

Evaluation of Surrogate Monitoring Parameters for SS and T-P Using Multiple Linear Regression and Random Forest  

Jeung, Minhyuk (Department of Rural and Bio-Systems Engineering, Chonnam National University)
Beom, Jina (Department of Rural and Bio-Systems Engineering, Chonnam National University)
Choi, Dongho (Presidential Water Commission Support Department Planning and Operation, Republic of Korea Presidential Water Commission)
Kim, Young-joo (Department of Cadastre and Civil Engineering, VISION College of Jeonju)
Her, Younggu (Tropical Research and Education, Department of Agricultural and Biological Engineering, University of Florida)
Yoon, Kwangsik (Department of Rural and Bio-Systems Engineering, Chonnam National University)
Publication Information
Journal of The Korean Society of Agricultural Engineers / v.63, no.2, 2021, pp. 51-60
Abstract
Effective nonpoint source (NPS) pollution management requires frequent water quality monitoring, which is often too costly to implement in practice. Statistical techniques and machine learning methods make it possible to identify and focus on fundamental environmental variables that are closely related to the NPS pollutants of interest. This study developed surrogate models that predict the concentrations of suspended sediment (SS) and total phosphorus (T-P) from turbidity and runoff discharge rates using multiple linear regression (MLR) and random forest (RF) methods. The RF models provided acceptable performance in predicting SS and T-P, especially when runoff discharge rates were high, and outperformed the MLR models in all cases. This finding highlights the potential of RF models as a tool for identifying fundamental environmental variables that can be measured relatively inexpensively, or are freely available, yet still provide the information required to quantify the concentrations of NPS pollutants. The analysis of relative importance showed that the temporal variations of SS and T-P concentrations were explained more effectively by turbidity than by runoff discharge rate. This study demonstrated that advanced statistical techniques such as machine learning can help improve the efficiency of NPS pollutant monitoring.
Keywords
Surrogate monitoring; non-point source; machine learning; influence factor
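
The surrogate-modelling idea summarized in the abstract (predicting SS from turbidity and runoff discharge with MLR and with RF, then comparing relative importance) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the synthetic data, its value ranges, the response formula, and every function name (`make_data`, `fit_mlr`, `build_tree`, `fit_forest`, `r2`) are assumptions made for the example. The forest is a simplified bagged-tree ensemble in which each split considers one randomly chosen predictor, and which model scores higher depends entirely on the assumed data.

```python
import random

def make_data(n, seed):
    """Hypothetical samples: (turbidity, discharge) -> SS concentration."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        turb = rng.uniform(1.0, 200.0)  # assumed turbidity range (NTU)
        q = rng.uniform(0.1, 20.0)      # assumed discharge range (m^3/s)
        # Assumed response with an interaction term; not from the study.
        y.append(0.5 * turb + 0.05 * turb * q + rng.gauss(0.0, 5.0))
        X.append((turb, q))
    return X, y

def fit_mlr(X, y):
    """Least squares for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0,) + x for x in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def predict_mlr(b, x):
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

def sse(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v)

def build_tree(X, y, rng, imp, depth=0, max_depth=6, min_leaf=5):
    """Regression tree; each split uses one randomly chosen predictor."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return sum(y) / len(y)          # leaf: mean of the targets
    f = rng.randrange(len(X[0]))
    parent, best_gain, best_thr = sse(y), 0.0, None
    for thr in rng.sample([x[f] for x in X], min(20, len(X))):
        left = [t for x, t in zip(X, y) if x[f] <= thr]
        right = [t for x, t in zip(X, y) if x[f] > thr]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    if best_thr is None:
        return sum(y) / len(y)
    imp[f] += best_gain  # credit the SSE reduction to the split predictor
    lo = [(x, t) for x, t in zip(X, y) if x[f] <= best_thr]
    hi = [(x, t) for x, t in zip(X, y) if x[f] > best_thr]
    kids = [build_tree([x for x, _ in s], [t for _, t in s], rng, imp,
                       depth + 1, max_depth, min_leaf) for s in (lo, hi)]
    return (f, best_thr, kids[0], kids[1])

def predict_tree(node, x):
    while isinstance(node, tuple):      # internal nodes are tuples
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=30, seed=1):
    """Bagged trees; returns the trees and normalized importances."""
    rng = random.Random(seed)
    imp = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]  # bootstrap resample
        trees.append(build_tree([X[i] for i in idx],
                                [y[i] for i in idx], rng, imp))
    total = sum(imp) or 1.0
    return trees, [v / total for v in imp]

def predict_forest(trees, x):
    return sum(predict_tree(t, x) for t in trees) / len(trees)

def r2(obs, pred):
    """1 - SSE/SST on held-out observations."""
    m = sum(obs) / len(obs)
    return 1.0 - (sum((o - p) ** 2 for o, p in zip(obs, pred))
                  / sum((o - m) ** 2 for o in obs))

X, y = make_data(300, seed=0)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]
coef = fit_mlr(X_tr, y_tr)
trees, importance = fit_forest(X_tr, y_tr)
mlr_r2 = r2(y_te, [predict_mlr(coef, x) for x in X_te])
rf_r2 = r2(y_te, [predict_forest(trees, x) for x in X_te])
print(f"MLR R2={mlr_r2:.3f}  RF R2={rf_r2:.3f}")
print("relative importance (turbidity, discharge):", importance)
```

With field data, an established implementation such as scikit-learn's `RandomForestRegressor` would normally replace the hand-written forest; the from-scratch version is used here only to keep the example free of external dependencies.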