Detecting floating litter in freshwater bodies with semi-supervised deep learning

Water Research (IF 12.4; Zone 1, Environmental Science and Ecology; JCR Q1, Engineering, Environmental). Pub Date: 2024-11-15. Epub Date: 2024-09-11. DOI: 10.1016/j.watres.2024.122405

Tianlong Jia, Rinze de Vries, Zoran Kapelan, Tim H.M. van Emmerik, Riccardo Taormina

Water Research, volume 266, Article 122405.


Abstract

Researchers and practitioners have made extensive use of supervised deep learning methods to quantify floating litter in rivers and canals. These methods require large amounts of labeled data for training. Labeling is expensive and laborious, so the open datasets available in the field are small compared to the comprehensive datasets for computer vision, e.g., ImageNet. Fine-tuning models pre-trained on these larger datasets helps improve litter detection performance and reduces data requirements. Yet, the effectiveness of using features learned from generic datasets is limited in large-scale monitoring, where automated detection must adapt across different locations, environmental conditions, and sensor settings. To address this issue, we propose a two-stage semi-supervised learning method to detect floating litter based on Swapping Assignments between multiple Views of the same image (SwAV). SwAV is a self-supervised learning approach that learns the underlying feature representation from unlabeled data. In the first stage, we used SwAV to pre-train a ResNet50 backbone architecture on about 100k unlabeled images. In the second stage, we added new layers to the pre-trained ResNet50 to create a Faster R-CNN architecture, and fine-tuned it with a limited number of labeled images ($\approx$1.8k images with 2.6k annotated litter items). We developed and validated our semi-supervised floating litter detection methodology on images collected in canals and waterways of Delft (the Netherlands) and Jakarta (Indonesia). We tested out-of-domain generalization performance in a zero-shot fashion using additional data from Ho Chi Minh City (Vietnam), Amsterdam and Groningen (the Netherlands). We benchmarked our results against the same Faster R-CNN architecture trained via supervised learning alone by fine-tuning ImageNet pre-trained weights.
The findings indicate that the semi-supervised learning method matches or surpasses the supervised learning benchmark when tested on new images from the same training locations. We measured better performance when little data ($\approx$200 images with about 300 annotated litter items) is available for fine-tuning, and with respect to reducing false positive predictions. More importantly, the proposed approach demonstrates clear superiority in generalization to the unseen locations, with improvements in average precision of up to 12.7%. We attribute this superior performance to the more effective high-level feature extraction that results from SwAV pre-training on relevant unlabeled images. Our findings highlight a promising direction to leverage semi-supervised learning for developing foundational models, which have revolutionized artificial intelligence applications in most fields. By scaling our proposed approach with more data and compute, we can make significant strides in monitoring to address the global challenge of litter pollution in water bodies.
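The abstract reports results as average precision (AP) and counts of false positive predictions, but does not spell out the evaluation protocol. As an illustration only, the sketch below shows one common way such numbers are computed for a single "litter" class: detections are ranked by confidence, greedily matched to ground-truth boxes by intersection over union (IoU), and AP is taken as the area under the precision-recall curve. The IoU threshold of 0.5, the all-point interpolation, and all function and variable names are assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[int, float, Box]],  # (image_id, confidence, box)
    ground_truth: Dict[int, List[Box]],        # image_id -> annotated litter boxes
    iou_threshold: float = 0.5,                # assumed threshold, not from the paper
) -> Tuple[float, int]:
    """Return (AP, number of false positives) for one object class."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Walk through detections from most to least confident.
    tp_flags: List[bool] = []
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A detection is a true positive only if its best-overlapping
        # ground-truth box is above the threshold and still unmatched.
        if best_j >= 0 and best_iou >= iou_threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Running precision and recall along the ranked list.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt if n_gt else 0.0)

    # Make precision monotonically non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap, fp


# Toy data (hypothetical): two images, one annotated item each, three detections.
toy_gt = {0: [(0, 0, 10, 10)], 1: [(5, 5, 15, 15)]}
toy_dets = [
    (0, 0.9, (0, 0, 10, 10)),     # correct detection
    (0, 0.8, (50, 50, 60, 60)),   # false positive, e.g. a reflection on the water
    (1, 0.7, (5, 5, 15, 15)),     # correct detection
]
ap, fp = average_precision(toy_dets, toy_gt)
print(round(ap, 3), fp)  # 0.833 1
```

Under this protocol, a model that suppresses spurious detections (such as the second toy detection) raises precision at every recall level, which is how a reduction in false positives translates into a higher AP.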




Source journal

Water Research (Environmental Science; Engineering, Environmental)

CiteScore: 20.80

Self-citation rate: 9.40%

Articles published: 1307

Review time: 38 days

Journal description: Water Research, along with its open access companion journal Water Research X, serves as a platform for publishing original research papers covering various aspects of the science and technology related to the anthropogenic water cycle, water quality, and its management worldwide. The audience targeted by the journal comprises biologists, chemical engineers, chemists, civil engineers, environmental engineers, limnologists, and microbiologists. The scope of the journal includes:
• Treatment processes for water and wastewaters (municipal, agricultural, industrial, and on-site treatment), including resource recovery and residuals management;
• Urban hydrology including sewer systems, stormwater management, and green infrastructure;
• Drinking water treatment and distribution;
• Potable and non-potable water reuse;
• Sanitation, public health, and risk assessment;
• Anaerobic digestion, solid and hazardous waste management, including source characterization and the effects and control of leachates and gaseous emissions;
• Contaminants (chemical, microbial, anthropogenic particles such as nanoparticles or microplastics) and related water quality sensing, monitoring, fate, and assessment;
• Anthropogenic impacts on inland, tidal, coastal and urban waters, focusing on surface and ground waters, and point and non-point sources of pollution;
• Environmental restoration, linked to surface water, groundwater and groundwater remediation;
• Analysis of the interfaces between sediments and water, and between water and atmosphere, focusing specifically on anthropogenic impacts;
• Mathematical modelling, systems analysis, machine learning, and beneficial use of big data related to the anthropogenic water cycle;
• Socio-economic, policy, and regulations studies.