Visual attention methods in deep learning: An in-depth survey

Information Fusion | Impact Factor 14.7 | CAS Tier 1, JCR Q1 (Computer Science, Artificial Intelligence) | Published: 2024-04-08 | DOI: 10.1016/j.inffus.2024.102417
Mohammed Hassanin, Saeed Anwar, Ibrahim Radwan, Fahad Shahbaz Khan, Ajmal Mian
Citations: 0

Abstract

Inspired by the human cognitive system, attention is a mechanism that imitates human cognitive awareness of specific information, amplifying critical details to focus on the essential aspects of the data. Deep learning has employed attention to boost performance in many applications. Interestingly, the same attention design can be applied to different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be combined in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey of attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers cover only a single category, self-attention, out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques, categorizing them by their most prominent features. We begin our discussion by introducing the fundamental concepts behind the success of the attention mechanism. Next, we cover essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks and basic formulations with primary usage, and review applications specifically for computer vision. We also discuss the challenges and general open questions related to attention mechanisms. Finally, we recommend possible future research directions for deep attention. All the information about visual attention methods in deep learning is provided at https://github.com/saeed-anwar/VisualAttention
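The abstract singles out self-attention as the category transformers build on. As a minimal, illustrative sketch (not taken from the paper — the function name, dimensions, and random projections here are hypothetical), scaled dot-product self-attention can be written in a few lines of numpy:

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of token embeddings X.

    Returns the attended outputs and the attention-weight matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project into query/key/value spaces
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise similarities, scaled by sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                # attention-weighted mix of values

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                # 4 tokens, 8-dim embeddings (arbitrary)
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out, weights = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of the value vectors, with mixing weights computed from query–key similarity; the survey's other attention categories (channel, spatial, etc.) vary how these weights are produced.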

Source journal: Information Fusion (Engineering & Technology — Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Average review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.
Latest articles in this journal:
- Multimodal fusion for large-scale traffic prediction with heterogeneous retentive networks
- Scalable data fusion via a scale-based hierarchical framework: Adapting to multi-source and multi-scale scenarios
- High performance RGB-Thermal Video Object Detection via hybrid fusion with progressive interaction and temporal-modal difference
- Tensor-based unsupervised feature selection for error-robust handling of unbalanced incomplete multi-view data
- Evolving intra- and inter-session graph fusion for next item recommendation