Disentangling Before Composing: Learning Invariant Disentangled Features for Compositional Zero-Shot Learning

Tian Zhang, Kongming Liang, Ruoyi Du, Wei Chen, Zhanyu Ma
IEEE Transactions on Pattern Analysis and Machine Intelligence, published online 2024-10-28.
DOI: 10.1109/TPAMI.2024.3487222

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set. Previous works mainly project an image and its corresponding composition into a common embedding space to measure their compatibility score. However, attributes and objects share the visual representations learned this way, leading the model to exploit spurious correlations and to be biased towards seen compositions. Instead, we reconsider CZSL as an out-of-distribution generalization problem. If an object is treated as a domain, we can learn object-invariant features to recognize attributes attached to any object reliably, and vice versa. Specifically, we propose an invariant feature learning framework that aligns different domains at the representation and gradient levels to capture the intrinsic characteristics associated with the tasks. To further facilitate and encourage the disentanglement of attributes and objects, we propose an "encoding-reshuffling-decoding" process that helps the model avoid spurious correlations by randomly regrouping the disentangled features into synthetic features. Ultimately, our method improves generalization by learning to disentangle features that represent the two independent factors of attributes and objects. Experiments demonstrate that the proposed method achieves state-of-the-art or competitive performance in both closed-world and open-world scenarios. Code is available at https://github.com/PRIS-CV/Disentangling-before-Composing.
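The "reshuffling" step of the encoding-reshuffling-decoding process can be sketched as follows. This is a minimal illustrative sketch and not the authors' released implementation: features are represented as plain Python lists, and the function name `reshuffle_compositions` and its arguments are hypothetical. The idea shown is that, once attribute and object features are disentangled, the object halves are randomly re-paired with the attribute halves within a batch to form synthetic compositions the model must still classify correctly.

```python
import random

def reshuffle_compositions(attr_feats, obj_feats, seed=0):
    """Randomly re-pair disentangled attribute and object features
    within a batch to form synthetic compositions (hypothetical sketch
    of the 'reshuffling' step; feature vectors are plain lists)."""
    rng = random.Random(seed)
    idx = list(range(len(obj_feats)))
    rng.shuffle(idx)  # random permutation of object features across the batch
    # Concatenate each attribute feature with a shuffled object feature,
    # producing compositions that may never have been seen together.
    return [attr + obj_feats[i] for attr, i in zip(attr_feats, idx)]

# Example: three (attribute, object) pairs are regrouped into synthetic pairs.
synthetic = reshuffle_compositions([[1.0], [2.0], [3.0]],
                                   [[10.0], [20.0], [30.0]])
```

Because each synthetic feature still carries a valid attribute label and a valid object label, a decoder trained on these regrouped features cannot rely on spurious attribute-object co-occurrence statistics from the training set.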
