Salient Object Detection From Arbitrary Modalities.

Nianchang Huang, Yang Yang, Ruida Xi, Qiang Zhang, Jungong Han, Jin Huang
IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Published 2024-11-01. DOI: 10.1109/TIP.2024.3486225 (https://doi.org/10.1109/TIP.2024.3486225)

Abstract

In many real-life applications, the types and number of inputs available to a salient object detection (SOD) algorithm may change dynamically. However, existing SOD algorithms are mainly designed or trained for one particular type of input and fail to generalize to other types. Consequently, a separate SOD algorithm must be prepared in advance for each input type, which incurs substantial hardware and research costs. In contrast, this paper proposes a new type of SOD task, termed Arbitrary Modality SOD (AM SOD). Its most prominent characteristic is that both the modality types and the number of modalities may be arbitrary or change dynamically. The former means that the inputs to an AM SOD algorithm may be arbitrary modalities such as RGB, depth, or any combination of them. The latter means that the number of input modalities may vary as the input type changes, e.g., a single-modality RGB image, dual-modality RGB-Depth (RGB-D) images, or triple-modality RGB-Depth-Thermal (RGB-D-T) images. As a preliminary solution to these challenges, we propose a modality switch network (MSN). Specifically, a modality switch feature extractor (MSFE) is first designed to effectively extract discriminative features from each modality by introducing modality indicators, which generate weights for modality switching. Subsequently, a dynamic fusion module (DFM), based on a novel Transformer structure, is proposed to adaptively fuse features from a variable number of modalities. Finally, a new dataset, named AM-XD, is constructed to facilitate research on AM SOD. Extensive experiments demonstrate that our AM SOD method can effectively cope with changes in the type and number of input modalities for robust salient object detection. Our code and AM-XD dataset will be released on https://github.com/nexiakele/AMSODFirst.
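The two ideas named in the abstract, indicator-driven switching and fusion over a variable number of modalities, can be illustrated with a toy sketch. This is not the authors' MSN, MSFE, or DFM: the gate table, the function names `indicator`, `switch`, and `fuse`, and the use of plain attention pooling in place of the paper's Transformer structure are all assumptions made here for illustration.

```python
import math

MODALITIES = ("rgb", "depth", "thermal")

# Hypothetical gate table: one row per modality, one column per feature channel.
# In a trained network such weights would be learned; these values are made up.
GATE_TABLE = [
    [1.0, 0.5],  # rgb
    [0.5, 1.0],  # depth
    [0.8, 0.8],  # thermal
]


def indicator(name):
    """Return a one-hot modality indicator for a known modality name."""
    if name not in MODALITIES:
        raise ValueError(f"unknown modality: {name}")
    return [1.0 if m == name else 0.0 for m in MODALITIES]


def switch(name, features):
    """Scale features from a shared extractor by indicator-generated weights."""
    if len(features) != len(GATE_TABLE[0]):
        raise ValueError("feature length must match the gate table width")
    ind = indicator(name)
    weights = [
        sum(i * row[c] for i, row in zip(ind, GATE_TABLE))
        for c in range(len(features))
    ]
    return [w * f for w, f in zip(weights, features)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features):
    """Attention-pool {modality: vector} into one vector, for any count >= 1.

    The query is the mean of the inputs (an assumption of this sketch), so
    the same code path handles one, two, or three modalities.
    """
    if not features:
        raise ValueError("at least one modality is required")
    vectors = list(features.values())
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all modality features must share one dimension")
    query = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    scores = [sum(q * x for q, x in zip(query, v)) / math.sqrt(dim) for v in vectors]
    weights = softmax(scores)
    fused = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return fused, dict(zip(features, weights))
```

Calling `fuse` with one, two, or three entries needs no code changes, which is the property the AM SOD setting asks for. With a single modality the attention weight is 1 and the input passes through unchanged.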
