Optimization of Soil Contamination Distribution Prediction Error using Geostatistical Technique and Interpretation of Contributory Factor Based on Machine Learning Algorithm

  • Hosang Han (Energy and Mineral Resources Engineering, Kangwon National University) ;
  • Jangwon Suh (Energy Resources and Chemical Engineering, Kangwon National University) ;
  • Yosoon Choi (Energy Resources Engineering, Pukyong National University)
  • Received : 2023.04.08
  • Accepted : 2023.05.04
  • Published : 2023.06.28

Abstract

When a soil contamination map is created with geostatistical techniques, several factors can affect its prediction error. In this study, a grid-based soil contamination map was created with Ordinary Kriging from sampling data of heavy metal concentrations in the soil of abandoned mine areas. Five factors judged to affect the prediction error of the map were selected, and the change in the root mean squared error (RMSE) between predicted and measured values was analyzed with the leave-one-out technique as the options and settings of each factor were varied. A machine learning algorithm was then used to identify the top three factors affecting the RMSE. The analysis showed that, for the Standard interpolation, the Variogram Model, Minimum Neighbors, and Anisotropy factors had the largest impact on RMSE. Among the variogram models, the Spherical model gave the lowest RMSE. For Minimum Neighbors, RMSE was lowest at a setting of 3 and rose as the setting increased. For Anisotropy, not accounting for anisotropy proved more appropriate. By combining geostatistics and machine learning, the study produced a highly reliable soil contamination map at the local scale and identified which factors matter most when interpolating a small set of soil heavy metal samples.
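The leave-one-out RMSE evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hand-written Ordinary Kriging solver in NumPy, uses every remaining sample as a neighbor (so it has no Minimum Neighbors setting), ignores anisotropy, and takes the variogram parameters (`nugget`, `sill`, `range_`) as placeholder values rather than fitted ones. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def spherical(h, nugget, sill, range_):
    """Spherical semivariogram: rises to `sill` at distance `range_`, flat beyond it."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / range_, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r**3)
    return np.where(h > 0, gamma, 0.0)  # gamma(0) = 0 by definition

def ordinary_kriging(xy, z, target, variogram):
    """Predict the value at `target` from samples (xy, z) by solving the OK system."""
    n = len(z)
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Left-hand side: sample-to-sample semivariances, bordered by the
    # unbiasedness constraint (weights sum to 1, Lagrange multiplier last).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(dists)
    A[n, n] = 0.0
    # Right-hand side: sample-to-target semivariances, then the constraint value 1.
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - target, axis=1))
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ z)

def loo_rmse(xy, z, variogram):
    """Leave each sample out in turn, predict it from the rest, and return the RMSE."""
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        pred = ordinary_kriging(xy[keep], z[keep], xy[i], variogram)
        errors.append(pred - z[i])
    return float(np.sqrt(np.mean(np.square(errors))))

if __name__ == "__main__":
    # Synthetic stand-in data (assumption): 30 sample points with a mild trend plus noise.
    gen = np.random.default_rng(42)
    xy = gen.uniform(0.0, 500.0, size=(30, 2))
    z = 50.0 + 0.05 * xy[:, 0] + gen.normal(0.0, 5.0, size=30)
    # Vary one setting (the variogram range) and compare the resulting LOO RMSE.
    for a in (100.0, 200.0, 400.0):
        vg = lambda h, a=a: spherical(h, nugget=5.0, sill=30.0, range_=a)
        print(f"range={a:.0f}  LOO RMSE={loo_rmse(xy, z, vg):.3f}")
```

Repeating the loop over other settings (a different variogram function, a neighbor limit, a directional distance metric) would give the kind of RMSE table from which factor importance could then be ranked.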

Acknowledgement

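This research was supported by the Energy & Mineral Resources Development Association of Korea, with funding provided by the Korean government (Ministry of Trade, Industry and Energy) in 2021 (Data Science-Based Oil and Gas Exploration Consortium).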
