Attention enhanced machine instinctive vision with human-inspired saliency detection

IF 4.2 · CAS Region 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Image and Vision Computing · Pub Date: 2024-11-04 · DOI: 10.1016/j.imavis.2024.105308
Habib Khan , Muhammad Talha Usman , Imad Rida , JaKeoung Koo
Image and Vision Computing, Volume 152, Article 105308. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S026288562400413X
Citations: 0

Abstract

Salient object detection (SOD) enables machines to recognize and accurately segment visually prominent regions in images. Despite recent advancements, existing approaches often lack progressive fusion of low- and high-level features, effective multi-scale feature handling, and precise boundary detection. Moreover, the robustness of these models under varied lighting conditions remains a concern. To overcome these challenges, we present the Attention Enhanced Machine Instinctive Vision framework for SOD. The proposed framework leverages a strategy of Multi-stage Feature Refinement with an Optimal Attentions-Driven Framework (MFRNet). Multi-level features are extracted from six stages of the EfficientNet-B7 backbone, enabling effective fusion of low- and high-level details across various scales at the later stage of the framework. We introduce the Spatial-optimized Feature Attention (SOFA) module, which refines spatial features from the three initial-stage feature maps. The multi-scale features extracted from the backbone are passed through convolutional feature transformation and spatial attention mechanisms to refine the low-level information. The SOFA module then concatenates and upsamples these refined features, producing a comprehensive spatial representation across levels. Moreover, the proposed Context-Aware Channel Refinement (CACR) module integrates dilated convolutions with optimized dilation rates, followed by channel attention, to capture multi-scale contextual information from the three later-stage layers. Furthermore, our progressive feature fusion strategy combines high-level semantic information and low-level spatial details through multiple residual connections, ensuring robust feature representation and effective gradient backpropagation. To enhance robustness, we train our network on augmented data featuring low- and high-brightness adjustments, improving its ability to handle diverse lighting conditions.
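The paper does not give code in this abstract; as a rough illustration of the brightness-based augmentation described above, here is a minimal NumPy sketch in which each training image is duplicated with darkened and brightened variants. The factor values and function name are assumptions for illustration, not the authors' settings.

```python
import numpy as np

def brightness_variants(image, factors=(0.5, 1.0, 1.5)):
    """Return copies of `image` scaled by each brightness factor.

    `image` is a float array with values in [0, 1]; each scaled copy is
    clipped back into that range, simulating low/normal/high lighting.
    """
    return [np.clip(image * f, 0.0, 1.0) for f in factors]

img = np.full((4, 4, 3), 0.6)          # uniform mid-gray test image
low, normal, high = brightness_variants(img)
```

In a real training pipeline these variants would be generated on the fly by the data loader, so the model sees each scene under several lighting conditions per epoch.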
Extensive experiments on four benchmark datasets (ECSSD, HKU-IS, DUTS, and PASCAL-S) validate the proposed framework's effectiveness, demonstrating superior performance compared to existing state-of-the-art (SOTA) methods in the domain. Code, qualitative results, and trained weights will be available at https://github.com/habib1402/MFRNet-SOD.
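The channel-attention step that the abstract describes inside the CACR module (applied after the dilated convolutions) can be sketched in a framework-free way as a generic squeeze-and-excitation-style gate. This is not the authors' implementation; the random weights, reduction ratio, and names are placeholders for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, reduction=4):
    """Reweight the channels of `feat` (C, H, W) by global context.

    Squeeze: global average pool each channel down to one scalar.
    Excite: a two-layer bottleneck MLP followed by a sigmoid gate,
    producing one weight in (0, 1) per channel.
    """
    c = feat.shape[0]
    # Random weights stand in for parameters learned during training.
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    squeeze = feat.mean(axis=(1, 2))                      # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))    # (C,) in (0, 1)
    return feat * gate[:, None, None]                     # broadcast over H, W

feat = rng.standard_normal((8, 16, 16))
out = channel_attention(feat)
```

Because the gate is strictly between 0 and 1, the module can only attenuate channels relative to their input magnitude; the network learns which channels' contextual responses to keep.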
Source journal: Image and Vision Computing (Engineering Technology; Engineering: Electrical & Electronic)
CiteScore: 8.50
Self-citation rate: 8.50%
Articles per year: 143
Review time: 7.8 months
Journal introduction: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.
Latest articles in this journal:
- CF-SOLT: Real-time and accurate traffic accident detection using correlation filter-based tracking
- TransWild: Enhancing 3D interacting hands recovery in the wild with IoU-guided Transformer
- Machine learning applications in breast cancer prediction using mammography
- Channel and Spatial Enhancement Network for human parsing
- Non-negative subspace feature representation for few-shot learning in medical imaging