Probabilistic mapping of imbalanced data for groundwater contamination using classification algorithms: Performance and reliability

Groundwater for Sustainable Development (IF 4.9; JCR Q2, Engineering, Environmental). Published: 2025-02-01. DOI: 10.1016/j.gsd.2024.101393
Yang Qiu, Aiguo Zhou, Hanxiang Xiong, Defang Zhang, Cheng Su, Shizheng Zhou, Lin Go, Chi Yang, Hao Cui, Wei Fan, Yao Yu, Fawang Zhang, Chuanming Ma
Groundwater for Sustainable Development, Vol. 28, Article 101393. https://www.sciencedirect.com/science/article/pii/S2352801X24003163
Citations: 0

Abstract

The probabilistic mapping of groundwater contamination is a crucial foundation for sustainable groundwater management. However, groundwater data are often imbalanced, which makes precise and reliable probability mapping difficult. This study focused on the Jianghan Plain and evaluated the performance and reliability of various sampling and ensemble techniques using a small, imbalanced dataset (n = 246, Class0/Class1 = 0.84/0.16). The probabilistic maps revealed significant spatial variability: high-probability areas were concentrated in the western (Yichang City), eastern (Wuhan), and northern regions (north bank of the Han River), while low-probability areas were in the central and southern regions. Over-sampling methods outperformed the others by maintaining class balance and enhancing the reliability of the mapping outcomes. For the over-sampling methods, the high-to-very-high probability areas ranged from 15.5% to 18.9%, with larger very-low-to-low areas (60.5%–66.3%). In contrast, the under-sampling and ensemble methods produced larger high-to-very-high probability areas (34.0%–53.1%) and smaller very-low-to-low areas (21.6%–46.3%). Over-sampling methods also achieved higher F1 scores (0.27–0.33) and precision (0.375–0.43) than the other methods. SHAP (SHapley Additive exPlanations) analysis demonstrated that over-sampling methods balance the dataset while preserving its information integrity, enhancing the credibility of the mapping results. Conversely, the ensemble methods faced challenges in statistical analysis, which hindered interpretability. We strongly recommend that probabilistic mapping of groundwater contamination adequately account for dataset imbalance and not rely solely on metrics such as AUC (area under the ROC curve) and OA (overall accuracy). For small datasets like the one in this study, SMOTE and ADASYN are the recommended sampling methods: they not only yield high-precision mapping results but also ensure interpretability, thereby providing a more reliable basis for sustainable groundwater management.
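To make the abstract's warning about AUC and OA concrete, the sketch below (plain Python, standard library only) illustrates two points under stated assumptions. First, on a dataset with the study's 0.84/0.16 class ratio, a trivial model that always predicts Class0 reaches an overall accuracy of about 0.84 while its Class1 precision, recall, and F1 are all zero. Second, a minimal SMOTE-style over-sampler creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours. The labels, feature values, helper names (`overall_accuracy`, `precision_recall_f1`, `smote_like`), and k = 3 are hypothetical illustrations, not the authors' data or code. The abstract does not specify the implementation used; in practice a maintained library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`, `ADASYN`) would normally be used.

```python
import math
import random

# Hypothetical toy labels mirroring the study's class ratio (n = 246,
# Class0/Class1 = 0.84/0.16); these are NOT the study's data.
N_TOTAL = 246
N_POS = round(0.16 * N_TOTAL)  # 39 minority (Class1) samples
y_true = [1] * N_POS + [0] * (N_TOTAL - N_POS)
y_pred = [0] * N_TOTAL  # trivial model that always predicts Class0


def overall_accuracy(y_true, y_pred):
    """Overall accuracy (OA): fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def smote_like(minority, n_new, k=3, seed=0):
    """Hypothetical minimal SMOTE-style over-sampler.

    Each synthetic sample lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


if __name__ == "__main__":
    print(round(overall_accuracy(y_true, y_pred), 2))  # 0.84
    print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)

    # Hypothetical scaled 2-D feature vectors for the minority class.
    minority = [(0.10, 0.20), (0.15, 0.25), (0.30, 0.10), (0.20, 0.30), (0.25, 0.15)]
    print(len(smote_like(minority, n_new=4)))  # 4
```

The interpolation step is what distinguishes SMOTE from simple random duplication of minority samples. ADASYN builds on the same idea but adaptively generates more synthetic samples around minority points that have more majority-class neighbours, i.e. those that are harder to learn.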

Source journal
Groundwater for Sustainable Development
Subject category: Social Sciences / Geography, Planning and Development
CiteScore: 11.50
Self-citation rate: 10.20%
Articles published: 152
Journal description: Groundwater for Sustainable Development is directed at different stakeholders and professionals, including governmental and non-governmental organizations, international funding agencies, universities, public water institutions, public health and other public/private sector professionals, and other relevant institutions. It is aimed at professionals, academics, and students in disciplines such as groundwater and its connection to surface hydrology and the environment, soil sciences, engineering, ecology, microbiology, atmospheric sciences, analytical chemistry, hydro-engineering, water technology, environmental ethics, economics, public health, policy, the social sciences, legal disciplines, or any other area connected with water issues. The objectives of this journal are to facilitate:
• The improvement of effective and sustainable management of water resources across the globe.
• The improvement of human access to groundwater resources in adequate quantity and good quality.
• The meeting of the increasing demand for drinking and irrigation water needed for food security, contributing to socially and economically sound human development.
• The creation of a global inter- and multidisciplinary platform and forum to improve our understanding of groundwater resources and to advocate their effective and sustainable management and protection against contamination.
• Interdisciplinary information exchange and the stimulation of scientific research in groundwater-related sciences and in the social and health sciences required to achieve the United Nations Millennium Development Goals for sustainable development.
Latest articles in this journal
• Radon concentration in water sources and its associated cancer risks to the populace of Ede, South Western Nigeria
• Risk assessment of not meeting environmental objectives related to protection of human health and groundwater quality: The tiered approach in the context of the EU water framework directive
• Assessing groundwater quality and solute sources in highly anthropized areas. The case of Abuja Federal Capital Territory, Nigeria
• Assessing the effects of ENSO-induced climate variability on shallow coastal groundwater reserves of north Patagonia, Argentina
• Groundwater vulnerability assessment using modified DRASTIC method with integrated hydrological model