Machine learning-based evolution of water quality prediction model: an integrated robust framework for comparative application on periodic return and jitter data
Authors: Xizhi Nong, Yi He, Lihua Chen, Jiahua Wei
Journal: Environmental Pollution (impact factor 7.6; JCR Q1, Environmental Sciences)
Publication date: 2025-02-09
DOI: https://doi.org/10.1016/j.envpol.2025.125834
Abstract
Accurate water quality prediction is essential for the sustainable management of surface water resources. Current deep learning models struggle to forecast water quality reliably because environmental conditions are non-stationary and the interactions among environmental factors are intricate. This study introduces a novel, multi-level coupled machine learning framework that integrates data denoising, feature selection, and Long Short-Term Memory (LSTM) networks to enhance predictive accuracy. The findings demonstrate that the LSTM model incorporating data denoising pre-processing captures non-stationary water quality patterns more effectively than the baseline model, enhancing prediction performance (R² increased by 1.01%). The best-performing model, which used the wavelet transform, exhibited superior adaptability and predictability, achieving modest but statistically significant increases in R² of 0.81% and 0.51% relative to the models incorporating moving average and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), respectively. The integrated models varied in their suitability for time series characterized by different patterns of variability (stability vs. instability, periodicity vs. non-periodicity). To assess the reliability and robustness of the proposed water quality prediction models under varying conditions, we conducted multi-step-ahead predictions (t+1 and t+3 days) with two training configurations (80%-20% and 70%-30% splits) for dissolved oxygen and the permanganate index at four monitoring stations within the world's largest long-distance inter-basin water diversion project. The integration of data denoising techniques with LSTM networks substantially improves the prediction of dynamic water quality indices in complex environmental settings. Future research should explore the scalability of this framework across different geographical and climatic conditions to further validate its effectiveness and utility in global water resource management.
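The evaluation design described in the abstract (denoise the series, build t+1 and t+3 day targets, split chronologically at 80%-20% and 70%-30%, score with R²) can be illustrated with a minimal sketch. This is not the authors' pipeline. The synthetic series, the 5-day trailing moving average, the 7-day lookback, and all function names are assumptions made for illustration, and a simple persistence forecast stands in for the LSTM so that the example needs only the Python standard library.

```python
import math
import random


def moving_average(series, window=5):
    """Trailing moving average that uses only past and current values."""
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def make_windows(features, targets, lookback, horizon):
    """Pair each `lookback`-day input window with the target `horizon` days ahead."""
    inputs, outputs = [], []
    for end in range(lookback, len(features) - horizon + 1):
        inputs.append(features[end - lookback:end])  # days end-lookback .. end-1
        outputs.append(targets[end + horizon - 1])   # day (end-1) + horizon
    return inputs, outputs


def chronological_split(items, train_fraction):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def synthetic_series(n_days=400, seed=0):
    """Hypothetical seasonal signal plus noise; not real monitoring data."""
    rng = random.Random(seed)
    return [8.0 + 2.0 * math.sin(2 * math.pi * day / 365) + rng.gauss(0, 0.3)
            for day in range(n_days)]


if __name__ == "__main__":
    raw = synthetic_series()
    denoised = moving_average(raw, window=5)
    for horizon in (1, 3):
        for train_fraction in (0.8, 0.7):
            # Inputs come from the denoised series; targets are raw observations.
            inputs, targets = make_windows(denoised, raw, lookback=7, horizon=horizon)
            _, test_inputs = chronological_split(inputs, train_fraction)
            _, test_targets = chronological_split(targets, train_fraction)
            # Persistence stand-in: a real model would be fit on the training part.
            predictions = [window[-1] for window in test_inputs]
            score = r_squared(test_targets, predictions)
            print(f"t+{horizon}, train fraction {train_fraction:.0%}: R^2 = {score:.3f}")
```

A trailing window is used here so that smoothing never draws on future observations. Swapping in a wavelet or CEEMDAN denoiser, or an actual LSTM, would change only `moving_average` and the prediction line.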