Comparative analysis of k-nearest neighbors distance metrics for retrieving coastal water quality based on concurrent in situ and satellite observations
Bonyad Ahmadi , Mehdi Gholamalifard , Seyed Mahmoud Ghasempouri , Tiit Kutser
Marine Pollution Bulletin, vol. 214, Article 117816. Journal Article, published 2025-03-13. DOI: 10.1016/j.marpolbul.2025.117816. Impact factor: 5.3 (JCR Q1, Environmental Sciences). Full text: https://www.sciencedirect.com/science/article/pii/S0025326X25002917
Citations: 0
Abstract
Monitoring extensive areas of coastal waters with sufficient frequency using in situ (ship-based) methods is time-consuming and expensive. Satellite remote sensing is much more cost-effective, because satellites can detect Optically Active Constituents (OACs) in water. It is therefore crucial to know the concentrations of OACs in the study area in order to develop and validate remote sensing methods suitable for assessing water quality in this region. The Pars Special Economic Energy Zone (PSEEZ), a major hub of natural gas extraction in the Persian Gulf, has undergone rapid industrial expansion since 1998, intensifying environmental pressures and necessitating high-resolution, frequent water quality assessments. Despite the significance of this region, however, it lacks a structured, long-term monitoring framework. To develop satellite-based remote sensing methods for this region, we measured the OACs chlorophyll-a, coloured dissolved organic matter (CDOM) and turbidity, and tested the performance of Landsat 8, Sentinel-2 (MSI), and Sentinel-3 (OLCI) in retrieving them. We tested the k-Nearest Neighbors (k-NN) machine learning algorithm. The choice of distance metric had a significant influence on the accuracy of OAC retrieval. In turbidity retrieval, the Euclidean Distance (ED) raised the regression slope to 0.90 (a 55.17 % improvement over the Fuzzy Mahalanobis Distance (FD)) and reduced the RMSLE to 0.51, corresponding to an approximate 160 % enhancement in precision. For CDOM, the RMSLE values for ED and FD were 0.39 and 0.48, respectively, an 18.75 % improvement in favour of ED. Furthermore, bias analysis revealed deviations of 1–6 % from the reference data, with the lowest values observed for the Mahalanobis Distance (MD) with MSI and for FD with OLCI. In chlorophyll-a retrieval, the choice of distance metric directly affected the accuracy of the OLCI sensor, inducing a bidirectional bias (both overestimation and underestimation) whose direction depended on the selected metric. These results underscore the importance of optimizing distance metric selection to enhance prediction accuracy and mitigate systematic errors in remote sensing applications. The results also showed that k-NN performed substantially better than the other algorithms evaluated in the study area, achieving significantly higher accuracy metrics and thereby establishing k-NN as the optimal framework for satellite-based water quality monitoring in environmentally sensitive regions like PSEEZ.
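To illustrate why the distance metric matters in k-NN retrieval, the following is a minimal sketch and not the authors' implementation. It makes several assumptions: the feature vectors and turbidity values are made-up toy numbers; the second metric is a variance-scaled Euclidean distance, which coincides with the Mahalanobis distance only when the feature covariance matrix is diagonal and is not the paper's MD or FD; and RMSLE is taken in its common log(1 + x) form, which the abstract does not specify. The helper `relative_improvement` reproduces the abstract's CDOM figure, since (0.48 − 0.39) / 0.48 = 18.75 %.

```python
import math
from statistics import mean, pvariance


def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_variance_scaled(train_x):
    """Return a distance that divides each squared difference by the
    feature's variance. Assumption: this is a diagonal-covariance
    simplification of the Mahalanobis distance, used for illustration."""
    variances = [pvariance(column) for column in zip(*train_x)]

    def dist(a, b):
        return math.sqrt(
            sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances))
        )

    return dist


def knn_predict(train_x, train_y, query, k, dist):
    """Average the targets of the k training samples nearest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    return mean(y for _, y in ranked[:k])


def rmsle(pred, obs):
    """Root mean squared logarithmic error, assumed log(1 + x) form."""
    return math.sqrt(
        mean((math.log1p(p) - math.log1p(o)) ** 2 for p, o in zip(pred, obs))
    )


def relative_improvement(worse, better):
    """Fractional reduction of an error metric such as RMSLE."""
    return (worse - better) / worse


# Hypothetical toy data: two features on very different scales,
# paired with made-up turbidity values.
train_x = [(0.010, 10.0), (0.012, 50.0), (0.030, 11.0), (0.032, 52.0)]
train_y = [1.0, 1.2, 3.0, 3.2]
query = (0.011, 48.0)
observed = [1.1]  # hypothetical in situ value for the query point

pred_euclid = knn_predict(train_x, train_y, query, 2, euclidean)
pred_scaled = knn_predict(train_x, train_y, query, 2, make_variance_scaled(train_x))

err_euclid = rmsle([pred_euclid], observed)
err_scaled = rmsle([pred_scaled], observed)

print("Euclidean prediction:", pred_euclid, "RMSLE:", err_euclid)
print("Variance-scaled prediction:", pred_scaled, "RMSLE:", err_scaled)
print("CDOM improvement from the abstract:", relative_improvement(0.48, 0.39))
```

On this toy data the unscaled Euclidean distance is dominated by the large-valued second feature, so the two metrics select different neighbours and return different predictions. Which metric is preferable for real sensor data is an empirical question, as the abstract's sensor-specific results indicate.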
About the journal:
Marine Pollution Bulletin is concerned with the rational use of maritime and marine resources in estuaries, the seas and oceans, as well as with documenting marine pollution and introducing new forms of measurement and analysis. A wide range of topics are discussed as news, comment, reviews and research reports, not only on effluent disposal and pollution control, but also on the management, economic aspects and protection of the marine environment in general.