Estimation of Spatial Distribution Using the Gaussian Mixture Model with Multivariate Geoscience Data

Estimation of Spatial Distribution Using Multivariate Geoscience Data and a Gaussian Mixture Model (translated from the Korean title)

  • Received : 2022.08.14
  • Accepted : 2022.08.23
  • Published : 2022.08.30

Abstract

Spatial estimation of geoscience data (geo-data) is challenging because of spatial heterogeneity, data scarcity, and high dimensionality, so an estimation method that accounts for these characteristics is needed. In this study, we propose applying the Gaussian mixture model (GMM), a machine learning algorithm, to multivariate data for robust spatial prediction. The approach was tested on soil chemical concentration data from a former smelting area. The concentrations of As and Pb determined by ex-situ inductively coupled plasma atomic emission spectroscopy (ICP-AES) were the primary variables to be interpolated, while the other metal concentrations determined by ICP-AES and all data determined by in-situ portable X-ray fluorescence (PXRF) were used as auxiliary variables in GMM and ordinary cokriging (OCK). From the multidimensional auxiliary variables, important variables were selected using a random forest-based variable selection method. Compared with ordinary kriging (OK) and OCK using univariate or bivariate data, GMM with the selected multivariate auxiliary data reduced the root-mean-square error (RMSE) by up to 0.11 for As and 0.33 for Pb and increased the correlation coefficient (r) by up to 0.31 for As and 0.46 for Pb. The use of GMM thus improved the spatial interpretation of anthropogenic metals in soil. The multivariate spatial approach can be applied to understand complex and heterogeneous geological and geochemical features.
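One common way to predict a primary variable from an auxiliary variable with a fitted GMM is to take the conditional expectation under the mixture. The abstract does not state whether the authors used this procedure, so the following is only a minimal sketch under that assumption. It uses one auxiliary variable and one primary variable, and the function names and the two-component parameters (a "background" and a "contaminated" population in log10 units) are hypothetical illustrations, not values from the study.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_conditional_mean(x, components):
    """Expected primary value y given auxiliary value x under a 2-D GMM.

    Each component is a dict with keys weight, mean_x, mean_y, var_x, cov_xy
    (x = auxiliary variable, y = primary variable).
    """
    # Weighted marginal likelihood of x under each component.
    likelihoods = [
        c["weight"] * normal_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    estimate = 0.0
    for c, lik in zip(components, likelihoods):
        # Conditional mean of y given x for one bivariate Gaussian component.
        cond_mean = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        # Weight by the posterior probability that x belongs to this component.
        estimate += (lik / total) * cond_mean
    return estimate


# Hypothetical mixture parameters (assumed, not taken from the paper).
components = [
    {"weight": 0.7, "mean_x": 1.0, "mean_y": 1.1, "var_x": 0.04, "cov_xy": 0.03},
    {"weight": 0.3, "mean_x": 2.0, "mean_y": 2.2, "var_x": 0.09, "cov_xy": 0.06},
]

# Predicted primary value at a location where the auxiliary reading is 1.0.
prediction = gmm_conditional_mean(1.0, components)
print(round(prediction, 3))
```

With more auxiliary variables the same idea applies, with the scalar variance and covariance replaced by the corresponding covariance blocks of each component.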

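Spatial heterogeneity, sparsity, and high dimensionality make it difficult to estimate the spatial distribution of geoscience data (geo-data). Many applications in the geosciences therefore need a spatial estimation technique that can account for the inherent characteristics of geo-data. This study aimed to provide a spatial prediction method using the Gaussian Mixture Model (GMM), one of the machine learning algorithms. To verify the performance of the proposed technique, we used soil concentration data from a former smelter site analyzed with a portable X-ray fluorescence analyzer (PXRF) and inductively coupled plasma atomic emission spectroscopy (ICP-AES). As and Pb analyzed by ICP-AES were the primary variables, and the remaining data were used as auxiliary variables. A random forest-based variable selection method was applied to select important variables from the multidimensional auxiliary variables. The results of GMM using the multivariate data built from ICP-AES and PXRF were compared with those of ordinary kriging (OK) and ordinary cokriging (OCK) using univariate and bivariate data. GMM gave a lower root-mean-square error (RMSE; improved by up to 0.11 for As and 0.33 for Pb) and a higher correlation (r; improved by up to 0.31 for As and 0.46 for Pb) than OK and OCK. This indicates that using GMM can improve the interpretation of the extent of soil contamination. This study demonstrated that a multivariate spatial estimation approach can be applied effectively to understand the characteristics of complex and heterogeneous geological and geochemical data.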
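The two scores used to compare the methods, RMSE and the correlation coefficient r, can be computed as in the sketch below. It assumes r is the Pearson correlation between observed and predicted values, which the abstract does not state explicitly. The function names and the sample numbers are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of cross-products and sums of squares about the means.
    cross = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cross / math.sqrt(ss_o * ss_p)


# Hypothetical validation values (assumed for illustration only).
observed = [1.2, 1.5, 2.1, 1.8, 2.4]
predicted = [1.3, 1.4, 1.9, 1.9, 2.2]

print("RMSE:", rmse(observed, predicted))
print("r:", pearson_r(observed, predicted))
```

A lower RMSE and a higher r for one method on the same validation points indicate better agreement with the measured concentrations.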

Acknowledgement

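This work was supported by the Basic Research Projects of the Korea Institute of Geoscience and Mineral Resources (22-3415, 22-3117, 22-3412-2) and by the Subsurface Environmental Pollution and Risk Management Technology Development Program of the Korea Environmental Industry and Technology Institute (project number: 2018002440002), for which we are grateful.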
