A Machine Learning Framework for Enhanced Assessment of Sewer System Operation under Data Constraints and Skewed Distributions

ACS ES&T engineering | IF 6.7 | JCR Q1 (ENGINEERING, ENVIRONMENTAL) | Pub Date: 2024-09-25 | DOI: 10.1021/acsestengg.4c00477
Wan-Xin Yin, Yu-Qi Wang, Jia-Qiang Lv, Jia-Ji Chen, Shuai Liu, Zheng Pang, Ye Yuan, Hong-Xu Bao, Hong-Cheng Wang* and Ai-Jie Wang*
Volume 5, Issue 1, pages 126–136. Full text: https://pubs.acs.org/doi/10.1021/acsestengg.4c00477

Abstract

In sewer management, accurate machine learning (ML) simulation of the physicobiochemical processes during sewage transport is essential, yet it is hindered by skewed distributions and data constraints. To address this issue, the present study introduces an algorithm, the Automatic Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (AutoSMOGN), designed to mitigate the adverse effects of skewed data set distributions. The findings reveal that integrating the AutoSMOGN algorithm with ML models significantly improves the accuracy of gaseous H₂S concentration predictions. Among the ML models, ensemble learning models were the most accurate in forecasting gaseous H₂S concentrations in sewer environments, achieving the highest coefficient of determination (R²) of 0.80. Furthermore, the study validates the effectiveness of the AutoSMOGN algorithm in addressing skewed distributions, as evidenced by its acceptable predictive performance on a full-sequence data set (R² of 0.52) and, when applied to other variables, R² values of 0.88 for total nitrogen and 0.66 for total organic carbon. These results underscore the potential of the AutoSMOGN algorithm to contribute to the development of new control and optimization strategies, thereby improving the maintenance and operational efficacy of sewer systems.
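The abstract does not describe how AutoSMOGN works internally, so the following is only a minimal sketch of the general idea behind SMOGN-style oversampling for regression: rare target values are replicated synthetically, either by interpolating between nearby rare cases or by adding Gaussian noise. It is not the authors' algorithm. The function names (`smogn_like_oversample`, `r2_score`), the parameters (`rare_quantile`, `k`, `n_new_per_rare`, `noise_scale`), the quantile-based definition of "rare", and the safe-radius heuristic are all assumptions made for illustration, and the demo data are randomly generated rather than sewer measurements.

```python
import numpy as np

def smogn_like_oversample(X, y, rare_quantile=0.9, k=3, n_new_per_rare=2,
                          noise_scale=0.05, seed=0):
    """Simplified SMOGN-style oversampling for regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: targets above the chosen quantile form the rare region.
    threshold = np.quantile(y, rare_quantile)
    rare_idx = np.flatnonzero(y > threshold)
    if rare_idx.size < 2:
        return X, y  # not enough rare cases to interpolate between

    Xr, yr = X[rare_idx], y[rare_idx]
    # Pairwise Euclidean distances among rare cases only.
    dists = np.linalg.norm(Xr[:, None, :] - Xr[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a case is never its own neighbour
    k_eff = min(k, len(Xr) - 1)
    feat_std = X.std(axis=0)
    y_std = y.std()

    new_X, new_y = [], []
    for i in range(len(Xr)):
        neighbours = np.argsort(dists[i])[:k_eff]
        # Heuristic "safe" radius: half the median distance to the k neighbours.
        safe_radius = 0.5 * np.median(dists[i, neighbours])
        for _ in range(n_new_per_rare):
            j = rng.choice(neighbours)
            if dists[i, j] < safe_radius:
                # Close neighbour: interpolate features and target together.
                lam = rng.random()
                new_X.append(Xr[i] + lam * (Xr[j] - Xr[i]))
                new_y.append(yr[i] + lam * (yr[j] - yr[i]))
            else:
                # Distant neighbour: perturb the seed case with Gaussian noise.
                new_X.append(Xr[i] + rng.normal(0.0, noise_scale * feat_std))
                new_y.append(yr[i] + rng.normal(0.0, noise_scale * y_std))
    return np.vstack([X, np.array(new_X)]), np.concatenate([y, np.array(new_y)])

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Demo on synthetic data with a right-skewed target (not real sewer data).
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(200, 3))
y_demo = demo_rng.lognormal(mean=0.0, sigma=1.0, size=200)
X_aug, y_aug = smogn_like_oversample(X_demo, y_demo)
print(len(y_demo), "->", len(y_aug))
```

In a workflow of the kind the abstract describes, only the training split would be augmented in this way, and R² would then be computed on untouched held-out data so that synthetic samples do not inflate the score.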

Source journal
ACS ES&T engineering (ENGINEERING, ENVIRONMENTAL)
CiteScore: 8.50
Self-citation rate: 0.00%
Articles published: 0
Journal description: ACS ES&T Engineering publishes impactful research and review articles across all realms of environmental technology and engineering, employing a rigorous peer-review process. As a specialized journal, it aims to provide an international platform for research and innovation, inviting contributions on materials technologies, processes, data analytics, and engineering systems that can effectively manage, protect, and remediate air, water, and soil quality, as well as treat wastes and recover resources. The journal encourages research that supports informed decision-making within complex engineered systems and is grounded in mechanistic science and analytics, describing intricate environmental engineering systems. It considers papers presenting novel advancements, spanning from laboratory discovery to field-based application. However, case or demonstration studies lacking significant scientific advancements and technological innovations are not within its scope. Contributions containing experimental and/or theoretical methods, rooted in engineering principles and integrated with knowledge from other disciplines, are welcomed.
Latest articles from this journal
Modular, On-Site Solutions with Lightweight Anomaly Detection for Sustainable Nutrient Management in Agriculture
Using Novosphingobium aromaticivorans for Concurrent Production of Intracellular and Extracellular Products from Aromatics Extracted from Poplar Biomass
Influence of Membrane Ion Sorption on Ammonium Transport in Donnan Dialysis with Cation Exchange Membranes
Assessing the Accuracy of Property Model Predictions for Cost Optimization of Desalination Technologies
Nutrient Separation Systems: Current Progress and Future Opportunities