Development of a water quality prediction model for mineral springs in the metropolitan area using machine learning

  • Yeong-Woo Lim (Graduate School of Business IT, Kookmin University) ;
  • Ji-Yeon Eom (Graduate School of Business IT, Kookmin University) ;
  • Kee-Young Kwahk (College of Business Administration/Graduate School of Business IT, Kookmin University)
  • Received : 2022.11.15
  • Accepted : 2023.03.11
  • Published : 2023.03.31

Abstract

As the COVID-19 pandemic wore on, people tired of indoor life visited nearby mountains and national parks far more often to relieve depression and lethargy. One place where many of these visitors pause to catch their breath and rest is the mineral spring. Mineral springs are found not only in mountains and national parks but also, here and there, in neighborhood parks and along walking trails, and about 600 of them are located in the Seoul metropolitan area alone. However, because water quality tests are conducted irregularly and by hand, people drink the spring water without knowing the current test results. This study therefore explores the factors that affect mineral spring water quality, gathers data scattered across various sources, and develops a model that can predict the water quality of mineral springs in real time. Owing to the limits of data collection, we restricted the study area to Seoul and Gyeonggi-do and obtained water quality test data from 2015 to 2020 for about 300 mineral springs in 18 cities where the data are well managed. From the many factors thought to affect whether spring water meets water quality standards, 10 were selected after two rounds of review. Using AutoML, an automated machine learning technique that has recently attracted attention, we derived the top 5 models by prediction performance from about 20 machine learning methods; among them, the CatBoost model performed best, with a classification accuracy of 75.26%. In addition, we used the SHAP method to examine the absolute influence of each variable on the prediction. The most important factor was whether the spring had been judged nonconforming in the immediately preceding water quality test. Average temperature, a past record of two consecutive nonconforming results, the temperature on the day of the test, and the altitude of the mineral spring were also confirmed to influence whether the water quality was nonconforming.
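The two history-based predictors named in the abstract (a nonconforming result in the immediately preceding test, and a past record of two consecutive nonconforming results) could be derived from each spring's chronological test record along the following lines. This is a minimal sketch, not the authors' implementation: the abstract does not say how these variables were encoded, so the function name `history_features`, the boolean encoding, and the sample record are all assumptions made for illustration.

```python
from typing import List, Tuple


def history_features(results: List[bool]) -> List[Tuple[bool, bool]]:
    """Derive history features for one spring (hypothetical encoding).

    `results` is that spring's test record in chronological order,
    where True means the test was judged nonconforming.
    For each test, returns a pair:
      (previous test was nonconforming,
       two consecutive nonconforming results occurred at some earlier point)
    Only information available before the test is used, so the features
    do not leak the outcome being predicted.
    """
    features = []
    prev_failed = False        # no earlier test exists before the first one
    streak = 0                 # current run of consecutive nonconforming results
    had_two_in_a_row = False   # becomes True once any run reaches length 2
    for failed in results:
        features.append((prev_failed, had_two_in_a_row))
        streak = streak + 1 if failed else 0
        if streak >= 2:
            had_two_in_a_row = True
        prev_failed = failed
    return features


# Illustrative record only; not data from the study.
record = [False, True, True, False, True]
for failed, (prev, twice) in zip(record, history_features(record)):
    print(failed, prev, twice)
```

Computing the features before updating the running state is the design choice that matters here: each row describes only what was known before that test was taken.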
