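A novel hybrid variable cross layer-based machine learning model improves the accuracy and interpretation of energy intensity prediction of wastewater treatment plant.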

IF 8; Zone 2, Environmental Science and Ecology; Q1 ENVIRONMENTAL SCIENCES; Journal of Environmental Management; Pub Date: 2024-11-13; DOI: 10.1016/j.jenvman.2024.123209

Yucheng Li, Chen Cai, Erwu Liu, Xiaofeng Lin, Ying Zhang, Hongjing Chen, Zhongqing Wei, Xiangfeng Huang, Ru Guo, Kaiming Peng, Jia Liu

Journal of Environmental Management, volume 371, 123209 (2024).

Citations: 0

Abstract

Energy intensity (EI) prediction for wastewater treatment plants (WWTPs) suffers from inaccuracy and poor interpretability owing to poor data quality, complex mechanisms, and numerous confounding variables. This study devises a novel hybrid variable cross layer-based machine learning (VCL-ML) model, which generates new knowledge from monitoring indicators (e.g., COD) and then embeds both the domain knowledge and the monitoring indicators into the ML model. The VCL-ML model achieves a root-mean-square error (RMSE) of 0.021 kWh/m³, an 8.7% improvement over the conventional ML (Con-ML) model. Shapley additive explanations (SHAP) show that domain knowledge features rank high and carry interpretable meaning for the model; examples are capacity utilization (CU), which measures the efficiency of resource use, and the total nitrogen remaining rate (TN_rr), which indicates nitrogen retention in the system. Partial dependence interactions between domain knowledge features (e.g., sludge yield) and monitoring indicators (e.g., influent pH) could help interpret real-world behavior. Comparing feature categories between the VCL-ML and Con-ML models shows that temporal information (e.g., month) and removal information (e.g., TN_rr) played an important role in the performance improvement. This result highlights the strong correlation of WWTP energy intensity with pollutant removal and temporal information, while the contribution of other, redundant features is weakened. The VCL-ML model improves the accuracy and interpretability of EI prediction for WWTPs and can support their optimal operation and sustainable management.
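The abstract does not give formulas for its domain knowledge features or say how the 8.7% figure is computed. The sketch below is therefore only an illustration under stated assumptions: CU is taken as treated flow divided by design capacity, TN_rr as effluent total nitrogen divided by influent total nitrogen, and the improvement as a relative RMSE reduction. All function names, record keys, and numbers are hypothetical and are not taken from the paper.

```python
import math


def capacity_utilization(flow_m3_per_day, design_capacity_m3_per_day):
    """ASSUMED definition of CU: treated flow divided by design capacity."""
    return flow_m3_per_day / design_capacity_m3_per_day


def tn_remaining_rate(tn_influent_mg_l, tn_effluent_mg_l):
    """ASSUMED definition of TN_rr: share of influent total nitrogen left in the effluent."""
    return tn_effluent_mg_l / tn_influent_mg_l


def add_domain_features(record):
    """Return a copy of a monitoring record with the two derived features appended.

    The keys ("flow", "design_capacity", "tn_in", "tn_out") are hypothetical.
    """
    out = dict(record)
    out["CU"] = capacity_utilization(record["flow"], record["design_capacity"])
    out["TN_rr"] = tn_remaining_rate(record["tn_in"], record["tn_out"])
    return out


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted energy intensity."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction of a new model relative to a baseline model."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Made-up monitoring record, for illustration only.
record = {"flow": 40000.0, "design_capacity": 50000.0, "tn_in": 40.0, "tn_out": 10.0}
features = add_domain_features(record)
print(features["CU"], features["TN_rr"])

# Made-up observed and predicted EI values in kWh/m³.
print(rmse([0.30, 0.35], [0.32, 0.33]))
```

Both the derived features and the raw monitoring indicators would then be passed to the ML model together. How the paper's cross layer actually combines them is not described in the abstract, so that step is not sketched here.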

Source journal

Journal of Environmental Management (Environmental Sciences)

CiteScore: 13.70
Self-citation rate: 5.70%
Articles published: 2477
Review time: 84 days

Journal description: The Journal of Environmental Management publishes peer-reviewed, original research on all aspects of management and the managed use of the environment, both natural and man-made. Critical review articles are also welcome; submission of these is strongly encouraged.