Development of a model to predict water quality using an automated machine learning algorithm

  • Jungsu Park (Department of Civil and Environmental Engineering, Hanbat National University)
  • Received : 2022.10.04
  • Accepted : 2022.10.31
  • Published : 2022.12.15

Abstract

The management of algal blooms is essential for the proper management of water supply systems and for maintaining the safety of drinking water. Chlorophyll-a (Chl-a) is a commonly used indicator of algal concentration. In recent years, advanced machine learning models have been increasingly used to predict Chl-a in freshwater systems. Although machine learning models perform well in various fields, developing them requires considerable time and labor from experts. Automated machine learning (auto ML) is an emerging field of machine learning research that aims to build machine learning models while minimizing the time and labor required in the development process. This study developed an auto ML model to predict Chl-a using auto-sklearn, one of the most widely used open-source auto ML frameworks. Its performance was compared with that of two other popular ensemble machine learning models, random forest (RF) and XGBoost (XGB). Model performance was evaluated using three indices: root mean squared error (RMSE), the RMSE-observation standard deviation ratio (RSR), and the Nash-Sutcliffe coefficient of efficiency. The RSR values of auto ML, RF, and XGB were 0.659, 0.684, and 0.638, respectively (lower is better). The results show that auto ML outperformed RF and that XGB showed better prediction performance than auto ML, although the differences between the models were not significant. Shapley value analysis, an explainable machine learning method, was used to provide a quantitative interpretation of the predictions made by the auto ML model developed in this study. The results of this study suggest that auto ML is potentially applicable to water quality prediction.
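The three evaluation indices named above can be written down compactly. The following is a minimal sketch, assuming the definitions commonly used in hydrologic model evaluation: RSR as the RMSE divided by the standard deviation of the observations, and the Nash-Sutcliffe efficiency (NSE) as one minus the ratio of the residual sum of squares to the sum of squared deviations of the observations from their mean. Whether the study used the population or the sample standard deviation is not stated here, so the population form below is an assumption. With that choice the two ratio indices are tied by RSR = sqrt(1 - NSE), which is why a lower RSR and a higher NSE always move together.

```python
import math
from typing import Sequence


def _check(obs: Sequence[float], sim: Sequence[float]) -> None:
    # Observed and simulated series must be non-empty and aligned one-to-one.
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")


def _sum_sq_about_mean(obs: Sequence[float]) -> float:
    # Sum of squared deviations of the observations from their own mean.
    mean_obs = sum(obs) / len(obs)
    return sum((o - mean_obs) ** 2 for o in obs)


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((obs - sim)^2))."""
    _check(obs, sim)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def rsr(obs: Sequence[float], sim: Sequence[float]) -> float:
    """RMSE divided by the (population) standard deviation of the observations.

    0 is a perfect fit; lower values are better.
    """
    _check(obs, sim)
    sd_obs = math.sqrt(_sum_sq_about_mean(obs) / len(obs))
    if sd_obs == 0:
        raise ValueError("observations have zero variance")
    return rmse(obs, sim) / sd_obs


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_observed.

    1 is a perfect fit; 0 means the model is no better than predicting the mean.
    """
    _check(obs, sim)
    ss_obs = _sum_sq_about_mean(obs)
    if ss_obs == 0:
        raise ValueError("observations have zero variance")
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - ss_res / ss_obs
```

For example, calling `rmse`, `rsr`, and `nse` on the same pair of observed and predicted Chl-a series gives all three indices from one set of residuals; the values here are computed on whatever hold-out data the caller supplies, not on the study's dataset.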
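Shapley value analysis attributes a single prediction to the input variables. The sketch below computes exact Shapley values by enumerating every coalition of features, which makes the weighting explicit. It assumes that an "absent" feature is replaced by a single baseline value; the SHAP library instead averages over background data, so this is a simplification for illustration. The two-feature model and its inputs are hypothetical and are not the model or variables used in the study. Because enumeration costs on the order of 2^n model evaluations, it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def _masked(x: Sequence[float], baseline: Sequence[float], present: set) -> List[float]:
    """Features in `present` take their value from x; all others from the baseline."""
    return [x[k] if k in present else baseline[k] for k in range(len(x))]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, treating absent features as set to baseline."""
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S that excludes i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                # Marginal contribution of feature i when added to coalition S
                gain = f(_masked(x, baseline, present | {i})) - f(_masked(x, baseline, present))
                phi[i] += weight * gain
    return phi


def toy_chl_a_model(v: Sequence[float]) -> float:
    """Hypothetical model: v[0] = water temperature, v[1] = total phosphorus (illustrative only)."""
    return 0.8 * v[0] + 10.0 * v[1] + 0.5 * v[0] * v[1]


if __name__ == "__main__":
    x = [25.0, 0.1]          # hypothetical sample to explain
    baseline = [15.0, 0.05]  # hypothetical reference input
    phi = shapley_values(toy_chl_a_model, x, baseline)
    print("attributions:", phi)
    print("sum:", sum(phi), "prediction - baseline:", toy_chl_a_model(x) - toy_chl_a_model(baseline))
```

The attributions always add up to the difference between the prediction and the baseline prediction (the efficiency property), which is what lets Shapley values be read as a quantitative breakdown of a single prediction. For tree ensembles such as RF and XGB, practical tools use faster tree-specific algorithms rather than this brute-force enumeration.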

Acknowledgement

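1. This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. 2022 R1F1A1065518) (50%).
2. This research was supported by a grant (22UGCP-B157945-03) from the Korea Agency for Infrastructure Technology Advancement (KAIA), funded by the Korean government (Ministry of Land, Infrastructure and Transport) in 2022 (50%).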