Advanced deep learning model for predicting water pollutants using spectral data and augmentation techniques: A case study of the Middle and Lower Yangtze River, China

IF 7.8 | CAS Zone 2 (Environmental Science and Ecology) | JCR Q1, ENGINEERING, CHEMICAL | Process Safety and Environmental Protection | Pub Date: 2025-03-25 | DOI: 10.1016/j.psep.2025.107058

Guohao Zhang, Cailing Wang, Hongwei Wang, YU Tao

Process Safety and Environmental Protection, Volume 197, Article 107058. Article page: https://www.sciencedirect.com/science/article/pii/S0957582025003258

Citations: 0

Abstract

Deep learning has demonstrated significant advantages in modeling nonlinear relationships within high-dimensional spectral data, making it widely applicable to water quality monitoring. However, the diversity of model selection and construction strategies has led to large fluctuations in predictive performance, particularly on high-dimensional data. This study constructs an integrated deep learning framework for predicting water pollutant concentrations, built from several key modules: data preprocessing, frequency decomposition, feature enhancement, sample augmentation, and decoder regression prediction. First, an improved wavelet transform algorithm addresses the inability of the raw data to resolve fine-grained features, accurately extracting the periodic and fluctuating characteristics of the data. Second, an encoder module based on the Informer architecture enhances the features of the different frequency components and, through distillation, further improves feature quality and the correlation between features and labels. Third, an improved generative adversarial network tackles the small-sample problem by augmenting the limited dataset, thereby improving its overall quality. Finally, a decoder module that combines an optimization algorithm with an improved convolutional neural network (IMCPSO-RCNN) overcomes the shortcomings of traditional models in hyperparameter optimization and predictive performance, achieving efficient and accurate regression prediction of pollutant concentrations. A case study in the middle and lower reaches of the Yangtze River shows that the model outperforms the compared models in prediction accuracy, achieving coefficients of determination (R²) of 0.9785, 0.9733, and 0.9741 for total nitrogen (TN), chemical oxygen demand (COD), and total phosphorus (TP), respectively. The corresponding root mean square error (RMSE) values are 0.0601, 0.6248, and 0.0023, and the mean absolute error (MAE) values are 0.0252, 0.2810, and 0.0006. Ablation experiments validate the necessity and effectiveness of each model component. This research offers an efficient and unified deep learning solution for monitoring water pollutants.
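The frequency-decomposition stage splits each spectrum into a smooth low-frequency trend and high-frequency fluctuations. The abstract does not specify the authors' "improved" wavelet transform, so the sketch below assumes a plain orthonormal Haar wavelet purely to illustrate the idea of multi-level decomposition into approximation and detail coefficients; the function names and the synthetic spectrum are hypothetical and not taken from the study.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (plain Haar, not the paper's improved transform)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        x = np.append(x, x[-1])  # pad odd lengths by repeating the last band
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-frequency trend coefficients
    detail = (even - odd) / np.sqrt(2.0)  # high-frequency fluctuation coefficients
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt for even-length input: recovers the original samples."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_wavedec(x, levels):
    """Multi-level decomposition, ordered [cA_n, cD_n, ..., cD_1]."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)  # recurse on the approximation only
        details.append(detail)
    return [approx] + details[::-1]

# Synthetic 64-band "spectrum" (illustrative only, not data from the study)
rng = np.random.default_rng(0)
bands = np.linspace(0.0, 1.0, 64)
spectrum = np.sin(2 * np.pi * 3 * bands) + 0.05 * rng.standard_normal(64)
cA3, cD3, cD2, cD1 = haar_wavedec(spectrum, levels=3)
print(cA3.size, cD3.size, cD2.size, cD1.size)  # 8 8 16 32
```

Because the Haar basis is orthonormal, the total energy of the coefficients equals that of the input, so the decomposition reorganizes the spectrum by frequency without losing information; each sub-band could then be passed to a downstream encoder.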
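The decoder's hyperparameters are tuned by IMCPSO, an improved particle swarm optimizer whose modifications the abstract does not describe. As a rough sketch, the code below shows standard global-best PSO instead, minimizing a hypothetical quadratic surrogate for validation loss over two assumed hyperparameters (log10 learning rate and dropout). In a real pipeline the objective would train the regression network and return a validation error such as RMSE.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO over the box [lower, upper]; returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))  # random initial positions
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # each particle's best position
    pbest_f = np.array([objective(p) for p in pos])
    g = np.argmin(pbest_f)
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]         # swarm-wide best
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # inertia + pull toward personal best + pull toward global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)           # keep particles inside the box
        f = np.array([objective(p) for p in pos])
        improved = f < pbest_f
        pbest[improved] = pos[improved]
        pbest_f[improved] = f[improved]
        g = np.argmin(pbest_f)
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest, gbest_f

def surrogate_val_loss(p):
    """Hypothetical stand-in for 'train the network, return validation loss'."""
    log_lr, dropout = p
    return (log_lr + 3.0) ** 2 + 4.0 * (dropout - 0.2) ** 2

best, loss = pso_minimize(surrogate_val_loss, lower=[-5.0, 0.0], upper=[-1.0, 0.5])
print(best, loss)  # should land near log_lr = -3, dropout = 0.2
```

The inertia and acceleration values (w = 0.7, c1 = c2 = 1.5) are common textbook defaults, not settings reported for IMCPSO.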
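The reported scores use the standard regression metrics R² = 1 − SS_res/SS_tot, RMSE = sqrt(mean(residual²)) and MAE = mean(|residual|). The short helper below computes them. The function name and toy values are illustrative only and do not reproduce the study's data.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for paired observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total sum of squares (undefined if y_true is constant)
    return {
        "R2": 1.0 - ss_res / ss_tot,               # dimensionless
        "RMSE": np.sqrt(np.mean(resid ** 2)),      # same units as the target
        "MAE": np.mean(np.abs(resid)),             # same units as the target
    }

m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({k: round(float(v), 4) for k, v in m.items()})
# {'R2': 0.98, 'RMSE': 0.1581, 'MAE': 0.15}
```

RMSE and MAE carry the units of each pollutant's concentration. They can therefore be compared only within one pollutant, which is why COD's RMSE (0.6248) and TP's (0.0023) differ by orders of magnitude while their R² values are almost equal.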

Synopsis

This deep learning framework enhances water quality monitoring by accurately predicting pollutant concentrations, informing environmental policy and water system management.




Source journal

Process Safety and Environmental Protection (Environmental Science; Engineering, Chemical)

CiteScore: 11.40
Self-citation rate: 15.40%
Articles published: 929
Review time: 8.0 months

Journal description: The Process Safety and Environmental Protection (PSEP) journal is a leading international journal that publishes high-quality, original research papers in engineering, specifically on the safety of industrial processes and on environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice. PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers. PSEP's articles are abstracted and indexed by a range of databases and services, which helps make the journal's research accessible and recognized in the academic and professional communities. These databases include ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC. This wide coverage disseminates the journal's content to a global audience interested in process safety and environmental engineering.