{"title":"基于语义引导的高阶区域注意嵌入的零次学习","authors":"Rui Zhang, Xiangyu Xu, Qi Zhu","doi":"10.1109/ICACI52617.2021.9435883","DOIUrl":null,"url":null,"abstract":"In zero-shot learning, knowledge transfer problem is the major challenge, which can be achieved by exploring the pattern between visual and semantic space. However, only aligning the global visual features with semantic vectors may ignore some discriminative differences. The local region features are not only implicitly related with semantic vectors, but also contain more discriminative information. Besides, most of the previous methods only consider the first-order statistical features, which may fail to capture the complex relations between categories. In this paper, we propose a semantic-guided high-order region attention embedding model that leverages the second-order information of both global features and local region features via different attention modules in an end-to-end fashion. First, we devise an encoder-decoder part to reconstruct the visual feature maps guided by semantic attention. Then, the original and new feature maps are simultaneously fed into their respective following branches to calculate region attentive and global attentive features. After that, a second-order pooling module is integrated to form higher-order features. The comprehensive experiments on four popular datasets of CUB, AWA2, SUN and aPY show the efficiency of our proposed model for zero-shot learning task and a considerable improvement over the state-of-the-art methods under generalized zero-shot learning setting.","PeriodicalId":382483,"journal":{"name":"2021 13th International Conference on Advanced Computational Intelligence (ICACI)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic-Guided High-Order Region Attention Embedding for Zero-Shot Learning\",\"authors\":\"Rui Zhang, Xiangyu Xu, Qi Zhu\",\"doi\":\"10.1109/ICACI52617.2021.9435883\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In zero-shot learning, knowledge transfer problem is the major challenge, which can be achieved by exploring the pattern between visual and semantic space. However, only aligning the global visual features with semantic vectors may ignore some discriminative differences. The local region features are not only implicitly related with semantic vectors, but also contain more discriminative information. Besides, most of the previous methods only consider the first-order statistical features, which may fail to capture the complex relations between categories. In this paper, we propose a semantic-guided high-order region attention embedding model that leverages the second-order information of both global features and local region features via different attention modules in an end-to-end fashion. First, we devise an encoder-decoder part to reconstruct the visual feature maps guided by semantic attention. Then, the original and new feature maps are simultaneously fed into their respective following branches to calculate region attentive and global attentive features. After that, a second-order pooling module is integrated to form higher-order features. 
The comprehensive experiments on four popular datasets of CUB, AWA2, SUN and aPY show the efficiency of our proposed model for zero-shot learning task and a considerable improvement over the state-of-the-art methods under generalized zero-shot learning setting.\",\"PeriodicalId\":382483,\"journal\":{\"name\":\"2021 13th International Conference on Advanced Computational Intelligence (ICACI)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 13th International Conference on Advanced Computational Intelligence (ICACI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICACI52617.2021.9435883\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 13th International Conference on Advanced Computational Intelligence (ICACI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACI52617.2021.9435883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semantic-Guided High-Order Region Attention Embedding for Zero-Shot Learning
In zero-shot learning, the central challenge is knowledge transfer, which can be achieved by exploring the mapping between the visual and semantic spaces. However, aligning only global visual features with semantic vectors may ignore discriminative differences. Local region features are not only implicitly related to semantic vectors but also contain more discriminative information. Moreover, most previous methods consider only first-order statistical features, which may fail to capture the complex relations between categories. In this paper, we propose a semantic-guided high-order region attention embedding model that leverages the second-order information of both global and local region features via different attention modules in an end-to-end fashion. First, we devise an encoder-decoder component that reconstructs the visual feature maps under the guidance of semantic attention. Then, the original and reconstructed feature maps are fed into their respective branches to compute region-attentive and global-attentive features. Finally, a second-order pooling module is integrated to form higher-order features. Comprehensive experiments on four popular datasets, CUB, AWA2, SUN, and aPY, demonstrate the effectiveness of our model on the zero-shot learning task and show a considerable improvement over state-of-the-art methods under the generalized zero-shot learning setting.
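To make the two named building blocks concrete, below is a minimal PyTorch sketch of semantic-guided spatial attention and second-order (covariance) pooling. The class names, layer sizes, and the signed-square-root normalization are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of the abstract's two components; shapes and
# normalization choices are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidedAttention(nn.Module):
    """Re-weights spatial locations of a feature map by a class-semantic vector."""
    def __init__(self, feat_dim: int, sem_dim: int):
        super().__init__()
        # Project the semantic (attribute) vector into the visual feature space.
        self.proj = nn.Linear(sem_dim, feat_dim)

    def forward(self, fmap: torch.Tensor, sem: torch.Tensor) -> torch.Tensor:
        # fmap: (B, C, H, W); sem: (B, S)
        b, c, h, w = fmap.shape
        query = self.proj(sem)                       # (B, C)
        flat = fmap.view(b, c, h * w)                # (B, C, HW)
        # Similarity of each spatial location to the semantic query,
        # softmax-normalized over the spatial dimension.
        scores = torch.einsum('bc,bcn->bn', query, flat) / c ** 0.5
        attn = F.softmax(scores, dim=-1)             # (B, HW)
        # Re-weight locations and restore the spatial layout.
        return (flat * attn.unsqueeze(1)).view(b, c, h, w)

def second_order_pooling(fmap: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Covariance-style pooling: captures pairwise channel interactions."""
    b, c, h, w = fmap.shape
    flat = fmap.view(b, c, h * w)
    # Outer product averaged over locations -> (B, C, C) second-order statistic.
    cov = torch.bmm(flat, flat.transpose(1, 2)) / (h * w)
    # Signed square root + L2 normalization, as commonly used with
    # bilinear/second-order pooling.
    vec = cov.flatten(1)                             # (B, C*C)
    vec = torch.sign(vec) * torch.sqrt(vec.abs() + eps)
    return F.normalize(vec, dim=1)
```

In such a pipeline, `fmap` would come from a CNN backbone and `sem` from class attribute vectors; since the pooled output grows as C², the channel dimension is typically reduced before second-order pooling.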