{"title":"深度神经网络中多种视觉注意机制的集成","authors":"Fernando Martinez, Yijun Zhao","doi":"10.1109/COMPSAC57700.2023.00180","DOIUrl":null,"url":null,"abstract":"Inspired by the success of various visual attention techniques in computer vision, we introduce a novel method for integrating multiple attention mechanisms to boost model performance. Our approach involves augmenting a base model with a Parallel Visual Attention Encoder (PVAE) branch, which concurrently employs two different attention modules (modified large kernel attention and modified convolutional block attention) to capture essential visual features. To reduce the training cost incurred by these additional components, we apply an encoder for efficient feature extraction and dimensionality reduction before applying the attention modules. The proposed PVAE architecture can be combined with cutting-edge models (e.g., EfficientNet, ResNet, DenseNet, etc.) to create a Parallel Visual Attention Network (PVAN). We evaluate the efficacy of our approach by devising a PVAN with EfficientNet as the base model for the task of classifying dog breeds. Our experimental results demonstrate the effectiveness of the proposed hybrid visual attention architecture, which achieves superior performance compared to the base model and models with a single attention mechanism. We further present an interactive web application developed for the general public to identify dog breeds using their photographs to test our model’s performance in real-life scenarios.","PeriodicalId":296288,"journal":{"name":"2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Integrating Multiple Visual Attention Mechanisms in Deep Neural Networks\",\"authors\":\"Fernando Martinez, Yijun Zhao\",\"doi\":\"10.1109/COMPSAC57700.2023.00180\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Inspired by the success of various visual attention techniques in computer vision, we introduce a novel method for integrating multiple attention mechanisms to boost model performance. Our approach involves augmenting a base model with a Parallel Visual Attention Encoder (PVAE) branch, which concurrently employs two different attention modules (modified large kernel attention and modified convolutional block attention) to capture essential visual features. To reduce the training cost incurred by these additional components, we apply an encoder for efficient feature extraction and dimensionality reduction before applying the attention modules. The proposed PVAE architecture can be combined with cutting-edge models (e.g., EfficientNet, ResNet, DenseNet, etc.) to create a Parallel Visual Attention Network (PVAN). We evaluate the efficacy of our approach by devising a PVAN with EfficientNet as the base model for the task of classifying dog breeds. Our experimental results demonstrate the effectiveness of the proposed hybrid visual attention architecture, which achieves superior performance compared to the base model and models with a single attention mechanism. 
We further present an interactive web application developed for the general public to identify dog breeds using their photographs to test our model’s performance in real-life scenarios.\",\"PeriodicalId\":296288,\"journal\":{\"name\":\"2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)\",\"volume\":\"135 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COMPSAC57700.2023.00180\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC57700.2023.00180","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Integrating Multiple Visual Attention Mechanisms in Deep Neural Networks
Inspired by the success of various visual attention techniques in computer vision, we introduce a novel method for integrating multiple attention mechanisms to boost model performance. Our approach augments a base model with a Parallel Visual Attention Encoder (PVAE) branch, which concurrently employs two different attention modules (modified large kernel attention and modified convolutional block attention) to capture essential visual features. To reduce the training cost incurred by these additional components, we apply an encoder for efficient feature extraction and dimensionality reduction before applying the attention modules. The proposed PVAE architecture can be combined with cutting-edge models (e.g., EfficientNet, ResNet, DenseNet) to create a Parallel Visual Attention Network (PVAN). We evaluate the efficacy of our approach by devising a PVAN with EfficientNet as the base model for the task of classifying dog breeds. Our experimental results demonstrate the effectiveness of the proposed hybrid visual attention architecture, which achieves superior performance compared to the base model and to models with a single attention mechanism. We further present an interactive web application that lets the general public identify dog breeds from their photographs, allowing us to test our model's performance in real-life scenarios.
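
The sketch below illustrates the overall idea described in the abstract, assuming a PyTorch setting. The module names (PVAEBranch, PVAN), the specific encoder layers, channel sizes, kernel sizes, and the fusion-by-concatenation step are illustrative assumptions; the paper's exact "modified" large kernel attention, "modified" convolutional block attention, and encoder designs are not reproduced here.

import torch
import torch.nn as nn


class LargeKernelAttention(nn.Module):
    """LKA-style attention: decomposed large-kernel depthwise convs gate the input."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)  # local depthwise conv
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)  # long-range depthwise conv
        self.pw = nn.Conv2d(dim, dim, 1)  # pointwise channel mixing

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # gate the input features with the attention map


class ConvBlockAttention(nn.Module):
    """CBAM-style attention: channel attention followed by spatial attention."""
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(),
                                 nn.Conv2d(dim // reduction, dim, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        # channel attention from global average- and max-pooled descriptors
        avg = torch.mean(x, dim=(2, 3), keepdim=True)
        mx = torch.amax(x, dim=(2, 3), keepdim=True)
        x = x * torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        # spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))


class PVAEBranch(nn.Module):
    """Parallel branch: a small conv encoder reduces dimensionality, then both
    attention modules run on the encoded features in parallel."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(  # cheap feature extraction / downsampling before attention
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU())
        self.lka = LargeKernelAttention(dim)
        self.cbam = ConvBlockAttention(dim)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        z = self.encoder(x)
        out = torch.cat([self.pool(self.lka(z)), self.pool(self.cbam(z))], dim=1)
        return out.flatten(1)  # shape (B, 2 * dim)


class PVAN(nn.Module):
    """Base backbone plus the PVAE branch, fused by concatenation before the classifier."""
    def __init__(self, backbone, backbone_dim, num_classes, branch_dim=64):
        super().__init__()
        self.backbone = backbone  # e.g. an EfficientNet feature extractor returning pooled features
        self.branch = PVAEBranch(dim=branch_dim)
        self.classifier = nn.Linear(backbone_dim + 2 * branch_dim, num_classes)

    def forward(self, x):
        feats = self.backbone(x)  # expected shape (B, backbone_dim)
        return self.classifier(torch.cat([feats, self.branch(x)], dim=1))

As a usage note under the same assumptions, the backbone could be an EfficientNet-B0 with its classification head removed (its pooled features have dimension 1280 in the torchvision implementation), and num_classes would be the number of dog breeds in the target dataset.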