PSVMA+: Exploring Multi-Granularity Semantic-Visual Adaption for Generalized Zero-Shot Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 18.6), Pub Date: 2024-09-25, DOI: 10.1109/TPAMI.2024.3467229

Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao

Vol. 47, No. 1, pp. 51-66. Journal Article. https://ieeexplore.ieee.org/document/10693541/


Abstract

Generalized zero-shot learning (GZSL) aims to recognize unseen categories using knowledge from the seen domain, which requires effective interaction between visual features and attribute semantic features. However, GZSL suffers from insufficient visual-semantic correspondence caused by attribute diversity and instance diversity. Attribute diversity refers to the varying semantic granularity of attribute descriptions, which range from low-level (specific, directly observable) to high-level (abstract, highly generic) characteristics. This diversity makes it difficult to collect adequate visual cues for all attributes at a single granularity. In addition, diverse visual instances that share the same attributes introduce semantic ambiguity, leading to vague visual patterns. To tackle these problems, we propose a multi-granularity progressive semantic-visual mutual adaption (PSVMA+) network, which gathers sufficient visual elements across granularity levels to remedy the granularity inconsistency. PSVMA+ explores semantic-visual interactions at different granularity levels, making both the visual and the semantic elements aware of multiple granularities. At each granularity level, the dual semantic-visual transformer module (DSVTM) recasts the shared attributes into instance-centric attributes and aggregates the semantic-related visual regions, thereby learning unambiguous visual features that accommodate various instances. Because different granularities contribute unequally, PSVMA+ employs selective cross-granularity learning to leverage knowledge from reliable granularities and adaptively fuses multi-granularity features into comprehensive representations. Experimental results demonstrate that PSVMA+ consistently outperforms state-of-the-art methods.
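The abstract gives no implementation details, so the following is only a rough, illustrative sketch of two ideas it names: adapting shared attribute vectors to a particular image through attention over visual patches, and fusing per-granularity attribute scores with adaptive weights. It is not the authors' DSVTM or their fusion rule. All function names (`attend`, `attribute_scores`, `fuse`), the residual update, the max-over-patches scoring, and the `level_logits` weights are assumptions made for illustration; in a real model the projections and fusion weights would presumably be learned.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys):
    """Single-head attention with no learned projections (a simplification):
    each query becomes an attention-weighted average of the keys."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(wi * k[d] for wi, k in zip(w, keys)) for d in range(dim)])
    return out

def attribute_scores(attributes, patches):
    """Hypothetical stand-in for one granularity level.
    Shared attributes are shifted toward what they attend to in this image
    (a residual update), then scored by their best-matching patch."""
    attended = attend(attributes, patches)
    adapted = [[a + v for a, v in zip(attr, att)]
               for attr, att in zip(attributes, attended)]
    return [max(dot(a, p) for p in patches) for a in adapted]

def fuse(per_level_scores, level_logits):
    """Adaptive fusion: softmax-normalised weights over granularity levels.
    level_logits are placeholders for whatever reliability signal a model learns."""
    weights = softmax(level_logits)
    n = len(per_level_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, per_level_scores))
            for i in range(n)]

# Toy usage: two attributes, a coarse level (1 patch) and a fine level (3 patches).
attrs = [[1.0, 0.0], [0.0, 1.0]]
coarse = [[1.0, 0.0]]
fine = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [attribute_scores(attrs, coarse), attribute_scores(attrs, fine)]
fused = fuse(scores, [0.0, 0.0])  # equal logits give a plain average
```

With equal `level_logits` the fusion reduces to a plain average of the per-level scores; unequal logits shift weight toward the level treated as more reliable, which is the intuition behind the selective, adaptive fusion the abstract describes.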

