IF-USOD: Multimodal information fusion interactive feature enhancement architecture for underwater salient object detection

IF 14.7 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Information Fusion Pub Date : 2024-11-23 DOI:10.1016/j.inffus.2024.102806
Genji Yuan , Jintao Song , Jinjiang Li
{"title":"IF-USOD: Multimodal information fusion interactive feature enhancement architecture for underwater salient object detection","authors":"Genji Yuan ,&nbsp;Jintao Song ,&nbsp;Jinjiang Li","doi":"10.1016/j.inffus.2024.102806","DOIUrl":null,"url":null,"abstract":"<div><div>Underwater salient object detection (USOD) has garnered increasing attention due to its superior performance in various underwater visual tasks. Despite the growing interest, research on USOD remains in its nascent stages, with existing methods often struggling to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. The multimodal information fusion can render previously indiscernible objects more detectable, as capturing complementary features from diverse source images enables a more accurate depiction of objects. In this work, we explore an innovative approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from source images. Here, we employ a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the relevant features. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy to facilitate multi-scale feature fusion, which improves the detection accuracy of underwater salient objects by generating both coarse and fine salient predictions. Extensive experimental evaluations demonstrate the state-of-the-art model performance of our proposed method.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"117 ","pages":"Article 102806"},"PeriodicalIF":14.7000,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253524005840","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Underwater salient object detection (USOD) has garnered increasing attention due to its superior performance in various underwater visual tasks. Despite the growing interest, research on USOD remains in its nascent stages, with existing methods often struggling to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. The multimodal information fusion can render previously indiscernible objects more detectable, as capturing complementary features from diverse source images enables a more accurate depiction of objects. In this work, we explore an innovative approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from source images. Here, we employ a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the relevant features. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy to facilitate multi-scale feature fusion, which improves the detection accuracy of underwater salient objects by generating both coarse and fine salient predictions. Extensive experimental evaluations demonstrate the state-of-the-art model performance of our proposed method.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
IF-USOD:用于水下突出物体探测的多模态信息融合交互式特征增强架构
水下突出物体检测(USOD)因其在各种水下视觉任务中的卓越表现而受到越来越多的关注。尽管人们的兴趣与日俱增,但有关水下突出物体检测的研究仍处于起步阶段,现有方法往往难以捕捉突出物体的长距离上下文特征。此外,这些方法经常忽略多模态信息的互补性。多模态信息融合可以使以前无法识别的物体更容易被检测到,因为从不同的源图像中捕捉互补特征可以更准确地描述物体。在这项工作中,我们探索了一种创新方法,将 RGB 和深度信息与交互式特征增强相结合,以推进水下突出物体的检测。我们的方法首先利用变压器和卷积神经网络架构的优势,从源图像中提取特征。在此,我们采用了两阶段训练策略,旨在优化特征融合。随后,我们利用自注意和交叉注意机制对提取的特征之间的相关性进行建模,从而放大相关特征。最后,为了充分利用不同网络层的特征,我们引入了跨尺度学习策略,以促进多尺度特征融合,通过生成粗略和精细的突出预测,提高水下突出物体的检测精度。广泛的实验评估证明了我们提出的方法具有最先进的模型性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Information Fusion
Information Fusion 工程技术-计算机:理论方法
CiteScore
33.20
自引率
4.30%
发文量
161
审稿时长
7.9 months
期刊介绍: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.
期刊最新文献
[formula omitted]-MGSVM: Controllable multi-granularity support vector algorithm for classification and regression Diff-PC: Identity-preserving and 3D-aware controllable diffusion for zero-shot portrait customization Advancements in perception system with multi-sensor fusion for embodied agents Insight at the right spot: Provide decisive subgraph information to Graph LLM with reinforcement learning A novel hybrid model combining Vision Transformers and Graph Convolutional Networks for monkeypox disease effective diagnosis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1