Explainable artificial intelligence for the interpretation of ensemble learning performance in algal bloom estimation.

IF 2.5 | CAS Zone 4, Environmental Science and Ecology | JCR Q3, ENGINEERING, ENVIRONMENTAL | Water Environment Research | Pub Date: 2024-10-01 | DOI: 10.1002/wer.11140
Jungsu Park, Byeongchan Seong, Yeonjeong Park, Woo Hyoung Lee, Tae-Young Heo
Citation count: 0

Abstract

Chlorophyll-a (Chl-a) concentrations, a key indicator of algal blooms, were estimated using the XGBoost machine learning model with 23 variables, including water quality and meteorological factors. Model performance was evaluated using three indices: root mean square error (RMSE), RMSE-observation standard deviation ratio (RSR), and Nash-Sutcliffe efficiency. Nine datasets were created by averaging 1-hour data to cover time frequencies ranging from 1 hour to 1 month. The datasets with relatively high observation frequencies (1-24 h) maintained stable performance, with an RSR ranging between 0.61 and 0.65. However, the model's performance declined significantly for datasets with weekly and monthly intervals. Shapley value (SHAP) analysis, an explainable artificial intelligence (XAI) method, was further applied to provide a quantitative understanding of how environmental factors in the watershed affect the model's performance, and to enhance the practical applicability of the model in the field. The number of input variables for model construction was increased sequentially from 1 to 23, starting with the variable with the highest SHAP value and ending with the lowest. The model's performance plateaued once five or more variables were included, demonstrating that stable performance could be achieved using only a small number of variables, including relatively easily measured data collected by real-time sensors, such as pH, dissolved oxygen, and turbidity. This result highlights the practicality of employing machine learning models and real-time sensor-based measurements for effective on-site water quality management.

PRACTITIONER POINTS: XAI quantifies the effects of environmental factors on algal bloom prediction models. The effects of input variable frequency and seasonality were analyzed using XAI. XAI analysis on key variables ensures cost-effective model development.
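The three performance indices named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the commonly used definitions in which RSR is RMSE divided by the standard deviation of the observations (population form) and Nash-Sutcliffe efficiency (NSE) is one minus the ratio of squared errors to the squared deviations of the observations from their mean. The function names and the sample values are hypothetical and chosen only for demonstration.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    n = len(obs)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)


def rsr(obs, sim):
    """RMSE divided by the population standard deviation of the observations.

    Assumption: population (divide-by-n) standard deviation, so that
    RSR equals sqrt(sum of squared errors) / sqrt(sum of squared deviations).
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    return rmse(obs, sim) / sd_obs


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    squared_deviation = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / squared_deviation


# Made-up Chl-a values purely for illustration (not data from the study).
obs = [10.0, 12.0, 15.0, 11.0, 20.0]
sim = [11.0, 11.5, 14.0, 12.5, 18.0]

print("RMSE:", rmse(obs, sim))
print("RSR: ", rsr(obs, sim))
print("NSE: ", nse(obs, sim))
```

Under these assumed definitions the two normalized indices are linked by NSE = 1 - RSR^2, so a lower RSR always corresponds to a higher NSE. If a sample (divide-by-n-minus-1) standard deviation were used instead, the identity would hold only approximately.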

Source journal
Water Environment Research (Environmental Science; Engineering, Environmental)
CiteScore: 6.30
Self-citation rate: 0.00%
Articles published: 138
Review time: 11 months
Journal description: Published since 1928, Water Environment Research (WER) is an international multidisciplinary water resource management journal for the dissemination of fundamental and applied research in all scientific and technical areas related to water quality and resource recovery. WER's goal is to foster communication and interdisciplinary research between water sciences and related fields such as environmental toxicology, agriculture, public and occupational health, microbiology, and ecology. In addition to original research articles, short communications, case studies, reviews, and perspectives are encouraged.