Qiuxia Lai, Yongwei Nie, Yu Li, Hanqiu Sun, Qiang Xu
Spatial attention for human-centric visual understanding: An Information Bottleneck method
Computer Vision and Image Understanding, Volume 249, Article 104180
DOI: 10.1016/j.cviu.2024.104180
Published: 2024-09-24
Citations: 0
Abstract
The selective visual attention mechanism in the Human Visual System (HVS) restricts the amount of information that reaches human visual awareness, allowing the brain to perceive high-fidelity natural scenes in real time at limited computational cost. This selectivity acts as an “Information Bottleneck (IB)” that balances information compression and predictive accuracy. However, such information constraints are rarely explored in attention mechanisms for deep neural networks (DNNs). This paper introduces an IB-inspired spatial attention module for DNNs, which generates an attention map by minimizing the mutual information (MI) between the attentive content and the input while maximizing that between the attentive content and the output. We develop this IB-inspired attention mechanism based on a novel graphical model and explore various implementations of the framework. We show that our approach can yield attention maps that neatly highlight the regions of interest while suppressing the background, and that are interpretable with respect to the DNNs' decision-making. To validate the effectiveness of the proposed IB-inspired attention mechanism, we apply it to various computer vision tasks including image classification, fine-grained recognition, cross-domain classification, semantic segmentation, and object detection. Extensive experiments demonstrate that it bootstraps standard DNN structures quantitatively and qualitatively for these tasks.
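The IB objective described above is typically optimized via a variational bound: a KL term to a fixed prior upper-bounds the MI between the attentive content and the input, while a task cross-entropy lower-bounds the MI between the attentive content and the output. Below is a minimal NumPy sketch of this generic variational-IB framing applied to a spatial attention map; the fixed log-variance, the random linear classifier head, and the loss weighting `beta` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vib_attention_step(feats, labels, W_cls, beta=1e-3):
    """Toy variational-IB loss for a spatial attention map (illustrative only).

    feats:  (B, H, W, C) feature maps
    labels: (B,) integer class ids
    W_cls:  (C, K) hypothetical linear classifier weights
    """
    B, H, Wd, C = feats.shape
    # A stand-in "encoder" predicts a per-location Gaussian over the
    # attentive content z (log-variance fixed here for simplicity).
    mu = feats.mean(axis=-1, keepdims=True)           # (B, H, W, 1)
    logvar = -np.ones_like(mu)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps               # reparameterized sample
    attn = 1.0 / (1.0 + np.exp(-z))                   # attention map in (0, 1)
    # Rate term: KL(q(z|x) || N(0, 1)) upper-bounds I(Z; X) -> compression.
    rate = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).mean()
    # Distortion term: cross-entropy of a linear head on the attended,
    # pooled features lower-bounds I(Z; Y) (up to a constant) -> prediction.
    pooled = (attn * feats).mean(axis=(1, 2))         # (B, C)
    probs = softmax(pooled @ W_cls)                   # (B, K)
    ce = -np.log(probs[np.arange(B), labels] + 1e-12).mean()
    return ce + beta * rate, attn

# Usage with random stand-in features and labels:
feats = rng.standard_normal((2, 8, 8, 16))
labels = np.array([0, 1])
W_cls = rng.standard_normal((16, 4))
loss, attn = vib_attention_step(feats, labels, W_cls)
```

The weight `beta` trades off compression against predictive accuracy: larger values push the attention map to suppress more of the input, mirroring the IB trade-off the abstract describes.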
Journal Introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems