Feature differences reduction and specific features preserving network for RGB-T salient object detection

IF 4.2; Region 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Image and Vision Computing; Pub Date: 2024-12-01; Epub Date: 2024-10-18; DOI: 10.1016/j.imavis.2024.105302

Qiqi Xu, Zhenguang Di, Haoyu Dong, Gang Yang

Image and Vision Computing, Volume 152, Article 105302. Journal Article. https://www.sciencedirect.com/science/article/pii/S0262885624004074

Citations: 0

Abstract

In RGB-T salient object detection (SOD), accurate detection depends on making effective use of the distinct characteristics of the RGB and thermal modalities. Most previous methods focus only on reducing the differences between modalities, and may therefore ignore the modality-specific features that are crucial for salient object detection, leading to suboptimal results. To address this issue, we propose an RGB-T SOD network that simultaneously reduces modality differences and preserves specific features. Specifically, we construct a modality differences reduction and specific features preserving module (MDRSFPM), which aims to bridge the gap between modalities and enhance the specific features of each modality. In MDRSFPM, a dynamic vector generated by the interaction of RGB and thermal features is used to reduce modality differences; a dual branch then processes the RGB and thermal modalities separately, combining channel-level and spatial-level operations to preserve their respective specific features. In addition, a multi-scale global feature enhancement module (MGFEM) enhances global contextual information to guide the subsequent decoding stage, so that the model can more easily localize salient objects. Furthermore, a fully fusion and gate module (FFGM) uses dynamically generated importance maps to selectively filter and fuse features during decoding. Extensive experiments demonstrate that the proposed model clearly outperforms other state-of-the-art models on three publicly available RGB-T datasets. Our code will be released at https://github.com/JOOOOKII/FRPNet.
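The abstract says only that FFGM uses dynamically generated importance maps to selectively filter and fuse features; it does not give the formulation. As a rough illustration of the general idea of gated fusion, and not the authors' implementation, the sketch below computes a per-pixel gate in (0, 1) and uses it to blend an RGB feature map with a thermal one. The function name `gated_fusion`, the weights `w_rgb`, `w_thermal`, `bias`, and the gate formula itself are all assumptions standing in for what would be a learned layer in a real network. Plain Python is used so the sketch runs without dependencies.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb, thermal, w_rgb=1.0, w_thermal=1.0, bias=0.0):
    """Blend two same-shaped 2D feature maps using a per-pixel importance map.

    Hypothetical stand-in for a learned gate: the importance of the RGB
    feature at each position is sigmoid(w_rgb * r - w_thermal * t + bias).
    The fused value is a convex combination of the two inputs.
    """
    same_rows = len(rgb) == len(thermal)
    if not same_rows or any(len(r) != len(t) for r, t in zip(rgb, thermal)):
        raise ValueError("feature maps must have the same shape")

    fused, gate_map = [], []
    for r_row, t_row in zip(rgb, thermal):
        # Importance map for this row: one gate value per position.
        g_row = [sigmoid(w_rgb * r - w_thermal * t + bias)
                 for r, t in zip(r_row, t_row)]
        gate_map.append(g_row)
        # A gate near 1 keeps the RGB feature; near 0 keeps the thermal one.
        fused.append([g * r + (1.0 - g) * t
                      for g, r, t in zip(g_row, r_row, t_row)])
    return fused, gate_map


if __name__ == "__main__":
    rgb = [[1.0, 0.0]]
    thermal = [[1.0, 4.0]]
    fused, gate = gated_fusion(rgb, thermal)
    print(gate)
    print(fused)
```

Because the gate lies strictly between 0 and 1, each fused value stays between the two input values at that position. Where the thermal response dominates, the output follows the thermal feature, which is one simple way to read "selectively filter and fuse".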




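Source journal

Image and Vision Computing (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months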

About the journal: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.