Evaluation of Surrogate Monitoring Parameters for SS and T-P Using Multiple Linear Regression and Random Forest

Evaluation of Surrogate Monitoring Techniques for SS and T-P Using Multiple Linear Regression Analysis and Random Forest

  • Jeung, Minhyuk (Department of Rural and Bio-Systems Engineering, Chonnam National University) ;
  • Beom, Jina (Department of Rural and Bio-Systems Engineering, Chonnam National University) ;
  • Choi, Dongho (Presidential Water Commission Support Department Planning and Operation, Republic of Korea Presidential Water Commission) ;
  • Kim, Young-joo (Department of Cadastre and Civil Engineering, VISION College of Jeonju) ;
  • Her, Younggu (Tropical Research and Education, Department of Agricultural and Biological Engineering, University of Florida) ;
  • Yoon, Kwangsik (Department of Rural and Bio-Systems Engineering, Chonnam National University)
  • Received : 2020.11.27
  • Accepted : 2021.02.03
  • Published : 2021.03.31

Abstract

Effective nonpoint source (NPS) pollution management requires frequent water quality monitoring, which, however, is often costly to implement in practice. Statistical techniques and machine learning methods make it possible to identify, and focus on, fundamental environmental variables that are closely related to the NPS pollutants of interest. This study developed surrogate models that predict the concentrations of suspended sediment (SS) and total phosphorus (T-P) from turbidity and runoff discharge rate using multiple linear regression (MLR) and random forest (RF) methods. The RF models provided acceptable performance in predicting SS and T-P, especially at high runoff discharge rates, and outperformed the MLR models in all cases. This finding highlights the potential of RF models as a tool for identifying fundamental environmental variables that are inexpensive to measure or freely available yet still carry the information needed to quantify NPS pollutant concentrations. The relative importance analysis showed that the temporal variations of SS and T-P concentrations could be explained more effectively by those of turbidity than by those of runoff discharge rate. This study demonstrated that advanced statistical techniques such as machine learning can help improve the efficiency of NPS pollutant monitoring.
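The abstract does not spell out the model form. As a hedged illustration only, an MLR surrogate of this kind can be sketched as ordinary least squares of SS on turbidity and runoff discharge, scored with the Nash-Sutcliffe efficiency (NSE), a goodness-of-fit statistic widely used in hydrology. The linear form, the choice of NSE as the "acceptable performance" criterion, the function names (`fit_mlr`, `predict_mlr`, `nse`), and the toy data are all assumptions for illustration, not the authors' fitted model or data.

```python
# Minimal MLR surrogate sketch (stdlib only): SS ~ b0 + b1*turbidity + b2*discharge.
# Hypothetical names and toy data; not the study's actual model or measurements.
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS via the normal equations (X^T X) beta = X^T y, with an intercept column."""
    rows = [[1.0, *xi] for xi in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict_mlr(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply the fitted coefficients: intercept plus weighted predictors."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], xi)) for xi in X]


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical paired observations: turbidity (NTU), discharge (m3/s), SS (mg/L).
turbidity = [5.0, 12.0, 30.0, 55.0, 80.0, 120.0]
discharge = [0.4, 2.5, 1.0, 6.0, 3.0, 8.0]
ss = [2 + 1.5 * t + 0.3 * q for t, q in zip(turbidity, discharge)]

X = [[t, q] for t, q in zip(turbidity, discharge)]
beta = fit_mlr(X, ss)
fitted = predict_mlr(beta, X)
print("coefficients:", [round(b, 6) for b in beta])
print("NSE:", round(nse(ss, fitted), 6))
```

Because the toy SS values were generated from an exact linear relation, the fit recovers coefficients of about 2.0, 1.5 and 0.3 with an NSE of 1; real monitoring data would give an imperfect fit. In practice a library routine (for example R's `lm`) would replace the hand-written solver.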

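The RF models and the relative importance analysis can likewise be illustrated with a toy, dependency-free sketch: each tree is grown on a bootstrap sample and considers only a random subset of predictors at each split, predictions are averaged across trees, and importance is measured as the increase in mean squared error when one predictor is randomly shuffled. This is a swapped-in simplification (permutation importance computed on the training data), not necessarily the importance measure the authors used; the function names, hyperparameters (`n_trees`, `max_depth`, `min_leaf`, `n_try`), and synthetic data are hypothetical. A practical analysis would use an established implementation such as R's `randomForest` package or scikit-learn's `RandomForestRegressor`.

```python
# Toy random forest regressor with permutation importance (stdlib only).
# Illustrative sketch: hyperparameters and synthetic data are hypothetical.
import random


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def build_tree(X, y, depth, max_depth, min_leaf, n_try, rng):
    """Grow a regression tree; each split considers a random subset of n_try predictors."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return _mean(y)  # leaf value: mean target in this node
    best = None  # (sse, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_try):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = _mean(left), _mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        return _mean(y)
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_try, rng)

    return (f, thr, grow(li), grow(ri))


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=30, max_depth=4, min_leaf=2, n_try=1, seed=0):
    """Bootstrap aggregation of randomized trees (the random forest idea)."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, min_leaf, n_try, rng))
    return trees


def predict_forest(trees, X):
    """Average the predictions of all trees for each row."""
    return [_mean(predict_tree(t, row) for t in trees) for row in X]


def permutation_importance(trees, X, y, seed=0):
    """Increase in MSE when one predictor column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    base = _mean((p - obs) ** 2 for p, obs in zip(predict_forest(trees, X), y))
    scores = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        rng.shuffle(col)
        Xp = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
        mse = _mean((p - obs) ** 2 for p, obs in zip(predict_forest(trees, Xp), y))
        scores.append(mse - base)
    return scores


# Hypothetical synthetic data in which SS depends on turbidity only.
data_rng = random.Random(1)
turbidity = [data_rng.uniform(1, 100) for _ in range(60)]
discharge = [data_rng.uniform(0.1, 10) for _ in range(60)]
ss = [2 + 1.5 * t + data_rng.gauss(0, 2) for t in turbidity]
X = [[t, q] for t, q in zip(turbidity, discharge)]

forest = fit_forest(X, ss)
importance = permutation_importance(forest, X, ss)
print("importance (turbidity, discharge):", [round(s, 1) for s in importance])
```

Because the synthetic SS values were generated from turbidity alone, shuffling turbidity raises the error far more than shuffling discharge, mirroring on toy data the kind of ranking the abstract reports for the real measurements.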