Underwater small and occlusion object detection with feature fusion and global context decoupling head-based YOLO
Lei Deng, Shaojuan Luo, Chunhua He, Huapan Xiao, Heng Wu
DOI: 10.1007/s00530-024-01410-z · Published 2024-07-17
Abstract
Underwater light scattering and absorption, together with camera or target motion, often cause blurring, distortion, and color deviation in underwater imaging, which poses significant challenges for underwater target detection. Numerous detectors have been proposed to address these challenges, including YOLO-series models, RCNN-based variants, and Transformer-based variants. However, previous detectors often perform poorly on small targets and under target occlusion. To tackle these issues, we propose a YOLO detection method based on feature fusion and a global context decoupling head. Specifically, we propose an efficient feature fusion module to address the loss of small-target feature information that makes such targets difficult to detect accurately. We also use self-supervision to recalibrate the feature information between levels, achieving full integration of semantic information across different levels. We design a decoupled head that focuses on global context information, which better filters out complex background information and thereby achieves effective detection of targets against occluded backgrounds. Finally, we replace simple upsampling with a content-aware reassembly module in the YOLO backbone, alleviating to some extent the imprecise localization and identification of small targets caused by feature loss. The experimental results indicate that the proposed method achieves superior performance compared with other state-of-the-art single-stage and two-stage detection networks. Specifically, on the UTDAC2020 dataset, the proposed method attains mAP50-95 and mAP50 scores of 54.4% and 87.7%, respectively.
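The abstract does not include the authors' implementation, so the sketch below is only a minimal PyTorch illustration of the kind of content-aware reassembly upsampling (CARAFE-style) that the abstract describes as replacing simple upsampling. The module name `CARAFEUpsample` and the parameters `k_up`, `k_enc`, and `mid_channels` are assumptions chosen for illustration, not the paper's actual code: a per-location reassembly kernel is predicted from the feature content and used to weight a local neighbourhood for every upsampled position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CARAFEUpsample(nn.Module):
    """Content-aware reassembly (CARAFE-style) upsampling sketch.

    Instead of copying the nearest source pixel, each upsampled location is a
    weighted sum of a k_up x k_up neighbourhood of its source pixel, with the
    weights predicted from the feature content itself.
    """

    def __init__(self, channels, scale=2, k_up=5, k_enc=3, mid_channels=64):
        super().__init__()
        self.scale = scale
        self.k_up = k_up
        # Kernel prediction branch: channel compression + content encoding.
        self.compress = nn.Conv2d(channels, mid_channels, kernel_size=1)
        self.encode = nn.Conv2d(
            mid_channels,
            (scale * k_up) ** 2,
            kernel_size=k_enc,
            padding=k_enc // 2,
        )

    def forward(self, x):
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up

        # 1. Predict one reassembly kernel per upsampled location.
        kernels = self.encode(self.compress(x))      # (B, s^2*k^2, H, W)
        kernels = F.pixel_shuffle(kernels, s)        # (B, k^2, sH, sW)
        kernels = F.softmax(kernels, dim=1)          # normalise each kernel

        # 2. Gather k x k neighbourhoods of the source feature map.
        patches = F.unfold(x, kernel_size=k, padding=k // 2)  # (B, C*k^2, H*W)
        patches = patches.view(b, c, k * k, h, w)
        # Each upsampled location reuses the neighbourhood of its source pixel.
        patches = patches.repeat_interleave(s, dim=3).repeat_interleave(s, dim=4)

        # 3. Reassemble: weighted sum of the neighbourhood with the kernel.
        out = (patches * kernels.unsqueeze(1)).sum(dim=2)  # (B, C, sH, sW)
        return out


if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)
    up = CARAFEUpsample(channels=256, scale=2)
    print(up(feat).shape)  # torch.Size([1, 256, 40, 40])
```

In a YOLO-style neck, a module like this would simply take the place of the nearest-neighbour `Upsample` layer, so higher-resolution fused features keep more of the fine detail that small underwater targets depend on.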