Improving fecal bacteria estimation using machine learning and explainable AI in four major rivers, South Korea.

IF 8.2 | Zone 1 (Environmental Science and Ecology) | JCR Q1, ENVIRONMENTAL SCIENCES | Science of the Total Environment | Pub Date: 2024-12-20 | Epub Date: 2024-11-19 | DOI: 10.1016/j.scitotenv.2024.177459
SungMin Suh, JunGi Moon, Sangjin Jung, JongCheol Pyo
Citations: 0

Abstract

This study addresses the critical public health issue of fecal coliform contamination in the four major rivers of South Korea (the Han, Nakdong, Geum, and Yeongsan rivers) by combining machine learning (ML) algorithms with explainable artificial intelligence (XAI) to improve both prediction accuracy and interpretability. Traditional and ML models alike often struggle to estimate fecal coliform levels accurately because of the complexity of environmental variables and limited data. To address this limitation, we employed two tree-based models (random forest [RF] and extreme gradient boosting [XGBoost]) and two neural network models (a deep neural network [DNN] and a convolutional neural network [CNN]). We also used SHapley Additive exPlanations (SHAP) to quantify how each variable influences the models' predictions. Using a dataset from the National Institute of Environmental Research covering 16 water quality parameters and meteorological data from 2014 to 2022, the XGBoost and CNN models improved the accuracy of fecal coliform estimation. The best result was obtained with XGBoost, which achieved a validation Nash-Sutcliffe efficiency (NSE) of 0.597 in the Han River. In addition, SHAP analysis identified the significant factors influencing fecal coliform concentrations across the different river environments. The results indicated that the XGBoost model provided superior estimation accuracy and explained the contributions of the variables. The SHAP results quantified the contribution of each water quality variable to the XGBoost model's fecal coliform estimates. This study improves understanding of the relationship between water quality variables and the mechanisms of fecal coliform contamination in the four major rivers of South Korea.
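The abstract reports model skill as a Nash-Sutcliffe efficiency and attributes predictions with SHAP. As a minimal sketch in plain Python (the function names, sample series, and toy models below are hypothetical illustrations, not the study's code or data), the first function computes NSE = 1 - Σ(O_t - S_t)² / Σ(O_t - Ō)², where O_t are observations, S_t simulations, and Ō the observed mean. The second computes exact Shapley values by enumerating every feature coalition, assuming a single baseline input for "absent" features. This brute-force version is not the optimized tree algorithm that the SHAP library offers for tree ensembles such as XGBoost, and real SHAP usage typically averages over a background dataset; it only illustrates the quantity SHAP estimates.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means no better than predicting the
    observed mean; negative values are worse than the mean.
    """
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))  # model error
    sst = sum((o - mean_obs) ** 2 for o in observed)  # variance around the mean
    if sst == 0:
        raise ValueError("observed series has zero variance; NSE is undefined")
    return 1.0 - sse / sst


def exact_shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Assumption: a feature outside the coalition takes its baseline value
    (single-reference simplification). Cost grows as 2**n_features, so
    this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Build an input that uses x for coalition members, baseline otherwise.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Hypothetical observed vs. simulated concentrations.
    obs = [120.0, 340.0, 80.0, 560.0, 210.0]
    sim = [150.0, 300.0, 100.0, 500.0, 260.0]
    print(round(nash_sutcliffe_efficiency(obs, sim), 3))
    # Hypothetical toy model with an interaction between features 0 and 1.
    toy_model = lambda z: z[0] * z[1] + 4 * z[2]
    print(exact_shapley_values(toy_model, [2.0, 3.0, 1.0], [1.0, 0.0, 0.5]))
```

Read against this definition, a validation NSE of 0.597 means the model's squared error is about 40% of the variance of the observations around their mean. The Shapley values always sum to predict(x) - predict(baseline), which is what lets SHAP split a single prediction into per-variable contributions.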

Source journal: Science of the Total Environment (category: Environmental Science)
CiteScore: 17.60
Self-citation rate: 10.20%
Articles published: 8726
Review time: 2.4 months
About the journal: The Science of the Total Environment is an international journal dedicated to scientific research on the environment and its interaction with humanity. It covers a wide range of disciplines and seeks to publish innovative, hypothesis-driven, and impactful research that explores the entire environment, including the atmosphere, lithosphere, hydrosphere, biosphere, and anthroposphere. The journal's updated Aims & Scope emphasizes the importance of interdisciplinary environmental research with broad impact. Priority is given to studies that advance fundamental understanding and explore the interconnectedness of multiple environmental spheres. Field studies are preferred, while laboratory experiments must demonstrate significant methodological advancements or mechanistic insights with direct relevance to the environment.
Latest articles in this journal:
Evaluating spatial effect of transportation planning factors on taxi CO2 emissions.
Enhancing seagrass restoration success: Detecting and quantifying mechanisms of wave-induced dislodgement.
Plastic pollution and marine mussels: Unravelling disparities in research efforts, biological effects and influences of global warming.
Insights into organophosphorus insecticide malathion induced reproductive toxicity and intergenerational effect in zebrafish (Danio rerio).
Modeling of heteroaggregation driven buoyant microplastic settling: Interaction with multiple clay particles.