http://dx.doi.org/10.9719/EEG.2022.55.2.127

Predicting As Contamination Risk in Red River Delta using Machine Learning Algorithms  

Ottong, Zheina J. (School of Earth Sciences and Environmental Engineering, Gwangju Institute of Science and Technology (GIST))
Puspasari, Reta L. (School of Earth Sciences and Environmental Engineering, Gwangju Institute of Science and Technology (GIST))
Yoon, Daeung (Chonnam National University)
Kim, Kyoung-Woong (School of Earth Sciences and Environmental Engineering, Gwangju Institute of Science and Technology (GIST))
Publication Information
Economic and Environmental Geology / v.55, no.2, 2022, pp. 127-135
Abstract
Excessive arsenic (As) in groundwater is a major health problem worldwide. In the Red River Delta in Vietnam, several million residents face a high risk of chronic As poisoning. As is released into groundwater by a natural process, the microbially driven reductive dissolution of Fe(III) oxides. Residents of the Red River Delta, unaware of the contamination, extract this groundwater through private tube wells for drinking and daily use. Long-term consumption of As-contaminated groundwater can lead to various health problems. A predictive model would therefore be useful for identifying wells at risk of contamination in the Red River Delta. This study used four machine learning algorithms to predict the probability of As contamination at study sites in the Red River Delta, Vietnam. The gradient boosting machine (GBM) was the best-performing model, with accuracy, precision, sensitivity, and specificity of 98.7%, 100%, 95.2%, and 100%, respectively. It also yielded the highest areas under the curve (AUC), 92% for the precision-recall curve (PRC) and 96% for the receiver operating characteristic (ROC) curve, with Eh and Fe as the most important variables. Partial dependence plots of As concentration against the model parameters showed that a high probability of elevated As is associated with shallow well depth, low Eh, and low SO₄²⁻, along with high PO₄³⁻ and NH₄⁺. These conditions trigger the reductive dissolution of iron phases, thus releasing As into groundwater.
Keywords
groundwater arsenic; machine learning; predictive model; random forest; gradient boosting;
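
The four classification metrics quoted in the abstract all derive from a single confusion matrix. The sketch below shows the standard definitions in plain Python. The counts used (20 true positives, 0 false positives, 56 true negatives, 1 false negative) are a hypothetical example chosen only because they reproduce the reported percentages. The abstract does not give the study's test-set counts, and the names `ConfusionMatrix` and `classification_metrics` are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive = high-As well)."""
    tp: int  # high-As wells predicted as high-As
    fp: int  # low-As wells predicted as high-As
    tn: int  # low-As wells predicted as low-As
    fn: int  # high-As wells predicted as low-As


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Return accuracy, precision, sensitivity, and specificity as fractions."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "precision": cm.tp / (cm.tp + cm.fp),
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # also called recall
        "specificity": cm.tn / (cm.tn + cm.fp),
    }


# ASSUMPTION: hypothetical counts, not the study's data. They are one
# combination that matches the percentages reported in the abstract.
hypothetical = ConfusionMatrix(tp=20, fp=0, tn=56, fn=1)
metrics = classification_metrics(hypothetical)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, a precision and specificity of 100% both follow from having no false positives, while the single false negative accounts for the sensitivity and accuracy falling below 100%.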