Hongjiao Pang, Yawen Ben, Yong Cao, Shen Qu, Chengzhi Hu
Water Research, Volume 268, Article 122777. Published 2024-11-09. DOI: 10.1016/j.watres.2024.122777
https://www.sciencedirect.com/science/article/pii/S0043135424016762
Time series-based machine learning for forecasting multivariate water quality in full-scale drinking water treatment with various reagent dosages
Accurately predicting drinking water quality is critical for intelligent water supply management and for maintaining the stability and efficiency of water treatment processes. This study presents an optimized time series machine learning approach for accurately predicting multivariate drinking water quality, explicitly considering the time-dependent effects of reagent dosing. By leveraging data from a full-scale treatment plant, we constructed feature-engineered time series datasets incorporating influent water quality parameters, reagent dosages, and effluent water characteristics. Seven predictive models, including both traditional machine learning (ML) and deep learning (DL) models, were developed and rigorously evaluated against a naive mean baseline model. Our results demonstrate that traditional ML models, enhanced with time feature engineering, rivaled the performance of both widely used DL models and the naive mean baseline model. Specifically, an XGBoost model achieved superior prediction accuracy in dynamically forecasting four water quality characteristics at a 12-hour time lag step, outperforming the naive baseline model by 3–4% in terms of Mean Absolute Percentage Error (MAPE). This finding underscores the importance of incorporating a 12-hour interval to effectively capture the delayed impact of reagent dosing on water quality prediction. Furthermore, SHAP model interpretability analysis provided valuable insights into the XGBoost model's decision-making process, revealing its strong data-driven foundation aligned with established water treatment principles. This research highlights the significant potential of optimized machine learning techniques for enhancing water purification processes and enabling more informed, data-driven decision-making in the water supply industry.
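The abstract's evaluation setup (lagged features, a naive mean baseline, and MAPE) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the assumption of hourly sampling (so that a lag of 12 rows corresponds to 12 hours) are all hypothetical, and the XGBoost model itself is omitted.

```python
from typing import List, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (assumes no actual value is zero)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def make_lagged_rows(
    influent: Sequence[float],
    dosage: Sequence[float],
    effluent: Sequence[float],
    lag: int,
) -> Tuple[List[Tuple[float, float, float]], List[float]]:
    """Pair features observed at step t with the effluent value at step t + lag."""
    n = len(effluent)
    if not (len(influent) == len(dosage) == n):
        raise ValueError("all series must have the same length")
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < series length")
    features = [(influent[t], dosage[t], effluent[t]) for t in range(n - lag)]
    targets = [effluent[t + lag] for t in range(n - lag)]
    return features, targets


def naive_mean_forecast(train_targets: Sequence[float], horizon: int) -> List[float]:
    """Predict the training-set mean for every future step."""
    mean_value = sum(train_targets) / len(train_targets)
    return [mean_value] * horizon


if __name__ == "__main__":
    # Hypothetical hourly toy series; real plant data would replace these.
    hours = range(48)
    influent = [5.0 + 0.1 * (h % 24) for h in hours]
    dosage = [20.0 + 0.5 * (h % 12) for h in hours]
    effluent = [0.3 + 0.01 * (h % 24) for h in hours]

    # Assumed hourly sampling, so lag=12 stands in for the 12-hour lag step.
    features, targets = make_lagged_rows(influent, dosage, effluent, lag=12)
    split = 24  # chronological split: earlier rows train, later rows test
    baseline = naive_mean_forecast(targets[:split], len(targets[split:]))
    print("naive mean baseline MAPE (%):", mape(targets[split:], baseline))
```

Any trained model's predictions on the same held-out rows can be passed to `mape` in the same way, which makes the comparison against the baseline direct. The split is chronological rather than random so that no future observation leaks into training.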
About the journal:
Water Research, along with its open access companion journal Water Research X, serves as a platform for publishing original research papers covering various aspects of the science and technology related to the anthropogenic water cycle, water quality, and its management worldwide. The audience targeted by the journal comprises biologists, chemical engineers, chemists, civil engineers, environmental engineers, limnologists, and microbiologists. The scope of the journal includes:
• Treatment processes for water and wastewaters (municipal, agricultural, industrial, and on-site treatment), including resource recovery and residuals management;
• Urban hydrology including sewer systems, stormwater management, and green infrastructure;
• Drinking water treatment and distribution;
• Potable and non-potable water reuse;
• Sanitation, public health, and risk assessment;
• Anaerobic digestion, solid and hazardous waste management, including source characterization and the effects and control of leachates and gaseous emissions;
• Contaminants (chemical, microbial, anthropogenic particles such as nanoparticles or microplastics) and related water quality sensing, monitoring, fate, and assessment;
• Anthropogenic impacts on inland, tidal, coastal and urban waters, focusing on surface and ground waters, and point and non-point sources of pollution;
• Environmental restoration, linked to surface water, groundwater and groundwater remediation;
• Analysis of the interfaces between sediments and water, and between water and atmosphere, focusing specifically on anthropogenic impacts;
• Mathematical modelling, systems analysis, machine learning, and beneficial use of big data related to the anthropogenic water cycle;
• Socio-economic, policy, and regulations studies.