Peng Xia , Yifu Zhao , Xianjun Xie , Junxia Li , Kun Qian , Haoyu You , Jingxian Zhang , Weili Ge , Hongjie Pan , Yanxin Wang
Title: Machine learning prediction of health risk and spatial dependence of geogenic contaminated groundwater from the Hetao Basin, China
DOI: 10.1016/j.gexplo.2024.107497
Journal: Journal of Geochemical Exploration, volume 262, Article 107497 (Impact Factor 3.4; JCR Q1, Geochemistry & Geophysics)
Publication date: 2024-05-07
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0375674224001134
Abstract
Geogenic contaminated groundwater (GCG), characterized by elevated arsenic, fluoride, and iodine levels, presents a significant challenge to public health and government management. Conventional survey-based approaches, which collect groundwater samples, run physicochemical tests, and spatially interpolate the results into regional maps of groundwater chemistry, are inefficient and costly. More importantly, they do not account for actual hydrogeological conditions or for the characteristics of pollutant transport and enrichment. To address this issue, we used Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost) to analyze the likelihood of occurrence of arsenic, fluoride, and iodine, as well as their spatial distribution, in shallow groundwater from the Hetao Basin. Our study incorporated 20 indicators related to meteorology, soil physicochemical properties, and groundwater conditions, along with 1505 labeled samples consisting of groundwater arsenic, fluoride, and iodine concentrations and their corresponding coordinates. The study then analyzed the meteorological, soil physicochemical, and groundwater indicators automatically by constructing machine learning models from the available data. To select the best prediction model, we quantitatively evaluated the prediction performance of each machine learning model: accuracy (AC), area under the curve (AUC), and mean squared error (MSE) were calculated to assess the predictions of the spatial distribution of GCG, and the best-performing model was then selected.
The results showed that the XGBoost algorithm provided the best predictions for groundwater with arsenic concentrations above 10 μg/L and fluoride concentrations exceeding 1.5 mg/L, whereas the RF model provided the best predictions for groundwater with arsenic concentrations surpassing 50 μg/L and iodine concentrations exceeding 100 μg/L. Groundwater health risk zones were then delineated using the best-performing prediction model for each case, and demographic analysis was conducted in both the direct and potential groundwater risk zones. Model predictions indicated that hundreds of thousands of people in the Hetao Basin face a public health crisis caused by high concentrations of arsenic, fluoride, and iodine in groundwater. These findings underscore the significant health challenge in the study area. Considering the agricultural development and increasing groundwater use in the area, our findings can guide local governments in managing the extent of groundwater development, establishing control zones, and enhancing protection measures for populations at risk from groundwater contamination.
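The thresholds and evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the field names, the toy labels and probabilities, the 0.5 classification cutoff, and the rule of ranking models by AUC (with lower MSE as a tie-breaker) are all assumptions made for illustration. Only the four concentration thresholds and the three metric names (AC, AUC, MSE) come from the abstract.

```python
from typing import Dict, Sequence, Tuple

# Exceedance thresholds quoted in the abstract. Field names are hypothetical.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "As_10": ("arsenic_ug_per_L", 10.0),
    "As_50": ("arsenic_ug_per_L", 50.0),
    "F_1.5": ("fluoride_mg_per_L", 1.5),
    "I_100": ("iodine_ug_per_L", 100.0),
}


def label_sample(sample: Dict[str, float]) -> Dict[str, int]:
    """Turn measured concentrations into 0/1 exceedance labels."""
    return {
        name: int(sample[field] > limit)
        for name, (field, limit) in THRESHOLDS.items()
    }


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5) -> float:
    """Fraction of samples classified correctly at an assumed probability cutoff."""
    hits = sum(int(p >= cutoff) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def mse(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best(y_true: Sequence[int], model_probs: Dict[str, Sequence[float]]):
    """Score every model and pick one by highest AUC, then lowest MSE (an assumed rule)."""
    scores = {
        name: {"AC": accuracy(y_true, p), "AUC": auc(y_true, p), "MSE": mse(y_true, p)}
        for name, p in model_probs.items()
    }
    best = max(scores, key=lambda m: (scores[m]["AUC"], -scores[m]["MSE"]))
    return best, scores


if __name__ == "__main__":
    # Toy hold-out labels and predicted probabilities, invented for illustration.
    y_true = [1, 0, 1, 1, 0, 0]
    model_probs = {
        "model_a": [0.9, 0.2, 0.7, 0.4, 0.3, 0.6],
        "model_b": [0.8, 0.1, 0.9, 0.7, 0.4, 0.2],
    }
    best, scores = select_best(y_true, model_probs)
    for name, s in scores.items():
        print(name, {k: round(v, 3) for k, v in s.items()})
    print("selected:", best)
```

In practice the probabilities would come from fitted SVM, RF, AdaBoost, and XGBoost classifiers evaluated on held-out samples, with one binary target per threshold. Scoring each target separately is what allows different algorithms to be selected for different contaminants, as the abstract reports.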
About the journal:
Journal of Geochemical Exploration is mostly dedicated to the publication of original studies in exploration and environmental geochemistry and related topics.
Contributions of primary interest to the journal include research that applies innovative methods to:
define the genesis and the evolution of mineral deposits including transfer of elements in large-scale mineralized areas.
analyze complex systems at the boundaries between bio-geochemistry, metal transport and mineral accumulation.
evaluate effects of historical mining activities on the surface environment.
trace pollutant sources and define their fate and transport models in the near-surface and surface environments involving solid, fluid and aerial matrices.
assess and quantify natural and technogenic radioactivity in the environment.
determine geochemical anomalies and set baseline reference values using compositional data analysis, multivariate statistics and geo-spatial analysis.
assess the impacts of anthropogenic contamination on ecosystems and human health at local and regional scale to prioritize and classify risks through deterministic and stochastic approaches.
Papers presenting newly developed methods in analytical geochemistry, for application in the field or in the laboratory, are also within the journal's topics of interest.