Strategic imputation of groundwater data using machine learning: Insights from diverse aquifers in the Chao-Phraya River Basin
Yaggesh Kumar Sharma, Seokhyeon Kim, Amir Saman Tayerani Charmchi, Doosun Kang, Okke Batelaan
Groundwater for Sustainable Development, Volume 28, Article 101394. Published 2025-02-01. DOI: 10.1016/j.gsd.2024.101394
Impact factor: 4.9. JCR: Q2 (Engineering, Environmental).
https://www.sciencedirect.com/science/article/pii/S2352801X24003175
Citations: 0
Abstract
Effective groundwater monitoring is essential for sustainable water management, particularly in data-sparse regions. To address inconsistencies in groundwater level data, we developed a machine learning framework for robust data imputation and tested it in the Chao-Phraya River (CPR) Basin, a region facing significant groundwater challenges due to high population density and ecological importance. Our study evaluated five models, namely K-Nearest Neighbors (KNN), Multiple Imputation by Chained Equations (MICE), Multilayer Perceptron (MLP), Random Forest (RF), and Soft Imputation (SI), to fill gaps in monthly groundwater level data across various locations, aquifer depths, and data loss scenarios. Results show that MICE performs well in high-density well environments, while SI excels with lower well density, maintaining Pearson correlation coefficients (R) above 0.80 and RMSE values below 6 even at 10% data loss. The Coefficient of Variation (COV) analysis also confirmed that the imputed data remain stable and reliable. However, the study also reveals a significant decrease in model performance in regions with fewer wells, as indicated by increased RMSE and reduced R. Our findings indicate that machine learning models are capable of handling groundwater level observations with missing data, and that the well density in a region has a significant impact on these models' performance. Imputation techniques should therefore be tailored to each aquifer's specific characteristics and surroundings to obtain accurate groundwater data.
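The evaluation loop described in the abstract (hide part of a well's record, impute it, then score the imputed values against the hidden ones with RMSE and Pearson R) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names (`knn_impute`, `make_synthetic_levels`, `evaluate`) are hypothetical, and a plain nearest-neighbour average stands in for the KNN model named in the abstract.

```python
import math
import random


def rmse(truth, estimate):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def pearson_r(x, y):
    """Pearson correlation coefficient R."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def knn_impute(table, target, k=3):
    """Fill None entries in column `target` of a months-by-wells table.

    For each month with a gap, find the k months whose readings at the
    other wells are closest (Euclidean distance) and average the target
    well's observed level over those months.
    """
    others = [j for j in range(len(table[0])) if j != target]
    observed = [row for row in table if row[target] is not None]
    filled = []
    for row in table:
        if row[target] is not None:
            filled.append(row[target])
            continue
        nearest = sorted(
            observed,
            key=lambda o: sum((o[j] - row[j]) ** 2 for j in others),
        )[:k]
        filled.append(sum(o[target] for o in nearest) / len(nearest))
    return filled


def make_synthetic_levels(n_months=120, n_wells=5, seed=0):
    """Assumed toy data: shared seasonal cycle plus per-well offset and noise."""
    rng = random.Random(seed)
    offsets = [rng.uniform(5.0, 20.0) for _ in range(n_wells)]
    return [
        [off + 2.0 * math.sin(2 * math.pi * m / 12) + rng.gauss(0, 0.2) for off in offsets]
        for m in range(n_months)
    ]


def evaluate(loss_fraction, target=0, seed=1):
    """Mask a fraction of one well's record, impute it, and score the hidden values."""
    full = make_synthetic_levels()
    rng = random.Random(seed)
    n_masked = max(2, int(loss_fraction * len(full)))
    masked_months = sorted(rng.sample(range(len(full)), n_masked))
    gappy = [list(row) for row in full]
    for m in masked_months:
        gappy[m][target] = None
    filled = knn_impute(gappy, target)
    truth = [full[m][target] for m in masked_months]
    estimate = [filled[m] for m in masked_months]
    return rmse(truth, estimate), pearson_r(truth, estimate)


if __name__ == "__main__":
    for loss in (0.1, 0.3, 0.5):
        err, r = evaluate(loss)
        print(f"loss={loss:.0%}  RMSE={err:.3f}  R={r:.3f}")
```

Scoring only the masked months keeps the metrics from being inflated by values the imputer never had to estimate. In this toy setup every month has complete readings at the other wells; real records with gaps across several wells at once would need a distance that skips missing entries.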
About the journal:
Groundwater for Sustainable Development is directed to different stakeholders and professionals, including government and non-governmental organizations, international funding agencies, universities, public water institutions, public health and other public/private sector professionals, and other relevant institutions. It is aimed at professionals, academics and students in disciplines such as groundwater and its connection to surface hydrology and environment, soil sciences, engineering, ecology, microbiology, atmospheric sciences, analytical chemistry, hydro-engineering, water technology, environmental ethics, economics, public health, policy, as well as social sciences, legal disciplines, or any other area connected with water issues. The objectives of this journal are to facilitate:
• The improvement of effective and sustainable management of water resources across the globe.
• The improvement of human access to groundwater resources in adequate quantity and good quality.
• The meeting of the increasing demand for drinking and irrigation water needed for food security, to contribute to socially and economically sound human development.
• The creation of a global inter- and multidisciplinary platform and forum to improve our understanding of groundwater resources and to advocate their effective and sustainable management and protection against contamination.
• Interdisciplinary information exchange and the stimulation of scientific research in the groundwater-related sciences and the social and health sciences required to achieve the United Nations Millennium Development Goals for sustainable development.