Improved prediction of chlorophyll-a concentrations using advancing graph neural network variants
Authors: Sunghyun Yoon, Kuk-Hyun Ahn
Journal: Science of the Total Environment, volume 979, Article 179481
Journal metrics: impact factor 8.0; JCR Q1 (Environmental Sciences); category: Environmental Science and Ecology
Publication date: 2025-06-01 (Epub 2025-04-24)
Publication type: Journal Article
DOI: 10.1016/j.scitotenv.2025.179481
URL: https://www.sciencedirect.com/science/article/pii/S0048969725011180
Citation count: 0
Abstract
Accurate estimation of harmful algal blooms is essential for protecting surface water. Chlorophyll-a (Chl-a), commonly used as a proxy for algal concentration, is influenced by a broad range of weather and physicochemical factors that operate across various spatial and temporal scales. This study proposes a deep learning (DL)-based framework for long-term Chl-a simulation, consisting of two separate blocks that jointly process multi-modal sources: one incorporates irregularly measured water quality observations, and the other integrates climate data measured at constant time steps. Alongside a fully connected network for encoding the irregular water quality observations, we benchmark several state-of-the-art graph neural network (GNN) architectures, including ChebNet and Graph Convolutional Network (GCN), for encoding the continuous climate data. Specifically, we represent water quality stations as nodes in a graph, model the spatiotemporal dependencies between these nodes, and use the learned relationships to predict Chl-a simultaneously across all nodes in the graph. Additionally, we introduce a gating mechanism to integrate the outputs from the two blocks. The performance of the advanced GNN models is evaluated using a daily dataset from the upper Han River basins in South Korea. The results indicate that the proposed models are promising, outperforming several baseline models developed for similar objectives with improvements of up to 47 % in R². In particular, the combination of the GCN algorithm with Long Short-Term Memory (LSTM) in our DL framework achieves superior performance. Further analyses of the gating mechanism show that it improves R² by 12 % compared to the model without the gating mechanism. We conclude that the proposed GNN-variant framework shows promise as a robust machine learning-based approach for aggregating spatiotemporal information to achieve reliable Chl-a predictions.
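The two-block design with gated integration described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the gate's exact form, so the sigmoid gate over concatenated embeddings is an assumption, as are the station graph, the layer sizes, and all function names. The GCN step uses the standard symmetric normalization with self-loops, the weights are random rather than trained, and the temporal (LSTM) component is left out for brevity.

```python
import numpy as np


def normalized_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, x, w):
    """One graph convolution over all stations: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_fusion(h_climate, h_quality, w_gate, b_gate):
    """Assumed gate form: element-wise convex blend of the two embeddings."""
    g = sigmoid(np.concatenate([h_climate, h_quality], axis=1) @ w_gate + b_gate)
    return g * h_climate + (1.0 - g) * h_quality, g


rng = np.random.default_rng(0)
n_stations, n_climate, n_quality, hidden = 4, 5, 3, 8

# Hypothetical graph: four stations connected in a chain along a river.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
x_climate = rng.normal(size=(n_stations, n_climate))  # regular climate inputs
x_quality = rng.normal(size=(n_stations, n_quality))  # water quality inputs

# Block 1: graph convolution encodes climate data across stations.
a_norm = normalized_adjacency(adj)
h_climate = gcn_layer(a_norm, x_climate, rng.normal(size=(n_climate, hidden)))

# Block 2: a fully connected layer encodes water quality observations.
h_quality = np.maximum(x_quality @ rng.normal(size=(n_quality, hidden)), 0.0)

# Gate decides, per station and per feature, how much of each block to keep.
fused, gate = gated_fusion(
    h_climate, h_quality, rng.normal(size=(2 * hidden, hidden)), np.zeros(hidden)
)

# Linear read-out gives one Chl-a value per station, all nodes at once.
chl_a_pred = fused @ rng.normal(size=(hidden, 1))
print(chl_a_pred.shape)
```

Because the gate lies between 0 and 1, each fused feature is a weighted average of the climate and water quality embeddings, which is one simple way a model can learn to lean on whichever source is more informative at a given station.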
About the journal:
The Science of the Total Environment is an international journal dedicated to scientific research on the environment and its interaction with humanity. It covers a wide range of disciplines and seeks to publish innovative, hypothesis-driven, and impactful research that explores the entire environment, including the atmosphere, lithosphere, hydrosphere, biosphere, and anthroposphere.
The journal's updated Aims & Scope emphasizes the importance of interdisciplinary environmental research with broad impact. Priority is given to studies that advance fundamental understanding and explore the interconnectedness of multiple environmental spheres. Field studies are preferred, while laboratory experiments must demonstrate significant methodological advancements or mechanistic insights with direct relevance to the environment.