{"title":"基于互惠近邻的自适应稀疏注意力模块","authors":"Zhonggui Sun, Can Zhang, Mingzhu Zhang","doi":"10.1117/1.jei.33.3.033038","DOIUrl":null,"url":null,"abstract":"The attention mechanism has become a crucial technique in deep feature representation for computer vision tasks. Using a similarity matrix, it enhances the current feature point with global context from the feature map of the network. However, the indiscriminate utilization of all information can easily introduce some irrelevant contents, inevitably hampering performance. In response to this challenge, sparsing, a common information filtering strategy, has been applied in many related studies. Regrettably, their filtering processes often lack reliability and adaptability. To address this issue, we first define an adaptive-reciprocal nearest neighbors (A-RNN) relationship. In identifying neighbors, it gains flexibility through learning adaptive thresholds. In addition, by introducing a reciprocity mechanism, the reliability of neighbors is ensured. Then, we use A-RNN to rectify the similarity matrix in the conventional attention module. In the specific implementation, to distinctly consider non-local and local information, we introduce two blocks: the non-local sparse constraint block and the local sparse constraint block. The former utilizes A-RNN to sparsify non-local information, whereas the latter uses adaptive thresholds to sparsify local information. As a result, an adaptive sparse attention (ASA) module is achieved, inheriting the advantages of flexibility and reliability from A-RNN. In the validation for the proposed ASA module, we use it to replace the attention module in NLNet and conduct experiments on semantic segmentation benchmarks including Cityscapes, ADE20K and PASCAL VOC 2012. With the same backbone (ResNet101), our ASA module outperforms the conventional attention module and its some state-of-the-art variants.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"169 1","pages":""},"PeriodicalIF":1.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive sparse attention module based on reciprocal nearest neighbors\",\"authors\":\"Zhonggui Sun, Can Zhang, Mingzhu Zhang\",\"doi\":\"10.1117/1.jei.33.3.033038\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The attention mechanism has become a crucial technique in deep feature representation for computer vision tasks. Using a similarity matrix, it enhances the current feature point with global context from the feature map of the network. However, the indiscriminate utilization of all information can easily introduce some irrelevant contents, inevitably hampering performance. In response to this challenge, sparsing, a common information filtering strategy, has been applied in many related studies. Regrettably, their filtering processes often lack reliability and adaptability. To address this issue, we first define an adaptive-reciprocal nearest neighbors (A-RNN) relationship. In identifying neighbors, it gains flexibility through learning adaptive thresholds. In addition, by introducing a reciprocity mechanism, the reliability of neighbors is ensured. Then, we use A-RNN to rectify the similarity matrix in the conventional attention module. In the specific implementation, to distinctly consider non-local and local information, we introduce two blocks: the non-local sparse constraint block and the local sparse constraint block. The former utilizes A-RNN to sparsify non-local information, whereas the latter uses adaptive thresholds to sparsify local information. As a result, an adaptive sparse attention (ASA) module is achieved, inheriting the advantages of flexibility and reliability from A-RNN. In the validation for the proposed ASA module, we use it to replace the attention module in NLNet and conduct experiments on semantic segmentation benchmarks including Cityscapes, ADE20K and PASCAL VOC 2012. With the same backbone (ResNet101), our ASA module outperforms the conventional attention module and its some state-of-the-art variants.\",\"PeriodicalId\":54843,\"journal\":{\"name\":\"Journal of Electronic Imaging\",\"volume\":\"169 1\",\"pages\":\"\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Electronic Imaging\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1117/1.jei.33.3.033038\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Electronic Imaging","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1117/1.jei.33.3.033038","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Adaptive sparse attention module based on reciprocal nearest neighbors
The attention mechanism has become a crucial technique in deep feature representation for computer vision tasks. Using a similarity matrix, it enhances the current feature point with global context from the feature map of the network. However, the indiscriminate utilization of all information can easily introduce some irrelevant contents, inevitably hampering performance. In response to this challenge, sparsing, a common information filtering strategy, has been applied in many related studies. Regrettably, their filtering processes often lack reliability and adaptability. To address this issue, we first define an adaptive-reciprocal nearest neighbors (A-RNN) relationship. In identifying neighbors, it gains flexibility through learning adaptive thresholds. In addition, by introducing a reciprocity mechanism, the reliability of neighbors is ensured. Then, we use A-RNN to rectify the similarity matrix in the conventional attention module. In the specific implementation, to distinctly consider non-local and local information, we introduce two blocks: the non-local sparse constraint block and the local sparse constraint block. The former utilizes A-RNN to sparsify non-local information, whereas the latter uses adaptive thresholds to sparsify local information. As a result, an adaptive sparse attention (ASA) module is achieved, inheriting the advantages of flexibility and reliability from A-RNN. In the validation for the proposed ASA module, we use it to replace the attention module in NLNet and conduct experiments on semantic segmentation benchmarks including Cityscapes, ADE20K and PASCAL VOC 2012. With the same backbone (ResNet101), our ASA module outperforms the conventional attention module and its some state-of-the-art variants.
期刊介绍:
The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.