He-Yen Hsieh, Ding-Jie Chen, Cheng-Wei Chang, Tyng-Luh Liu
{"title":"Aggregating Bilateral Attention for Few-Shot Instance Localization","authors":"He-Yen Hsieh, Ding-Jie Chen, Cheng-Wei Chang, Tyng-Luh Liu","doi":"10.1109/WACV56688.2023.00626","DOIUrl":null,"url":null,"abstract":"Attention filtering under various learning scenarios has proven advantageous in enhancing the performance of many neural network architectures. The mainstream attention mechanism is established upon the non-local block, also known as an essential component of the prominent Transformer networks, to catch long-range correlations. However, such unilateral attention is often hampered by sparse and obscure responses, revealing insufficient dependencies across images/patches, and high computational cost, especially for those employing the multi-head design. To overcome these issues, we introduce a novel mechanism of aggregating bilateral attention (ABA) and validate its usefulness in tackling the task of few-shot instance localization, reflecting the underlying query-support dependency. Specifically, our method facilitates uncovering informative features via assessing: i) an embedding norm for exploring the semantically-related cues; ii) context awareness for correlating the query data and support regions. ABA is then carried out by integrating the affinity relations derived from the two measurements to serve as a lightweight but effective query-support attention mechanism with high localization recall. We evaluate ABA on two localization tasks, namely, few-shot action localization and one-shot object detection. Extensive experiments demonstrate that the proposed ABA achieves superior performances over existing methods.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV56688.2023.00626","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Attention filtering under various learning scenarios has proven advantageous in enhancing the performance of many neural network architectures. The mainstream attention mechanism is established upon the non-local block, also known as an essential component of the prominent Transformer networks, to catch long-range correlations. However, such unilateral attention is often hampered by sparse and obscure responses, revealing insufficient dependencies across images/patches, and high computational cost, especially for those employing the multi-head design. To overcome these issues, we introduce a novel mechanism of aggregating bilateral attention (ABA) and validate its usefulness in tackling the task of few-shot instance localization, reflecting the underlying query-support dependency. Specifically, our method facilitates uncovering informative features via assessing: i) an embedding norm for exploring the semantically-related cues; ii) context awareness for correlating the query data and support regions. ABA is then carried out by integrating the affinity relations derived from the two measurements to serve as a lightweight but effective query-support attention mechanism with high localization recall. We evaluate ABA on two localization tasks, namely, few-shot action localization and one-shot object detection. Extensive experiments demonstrate that the proposed ABA achieves superior performances over existing methods.