Disentangling Before Composing: Learning Invariant Disentangled Features for Compositional Zero-Shot Learning

Tian Zhang, Kongming Liang, Ruoyi Du, Wei Chen, Zhanyu Ma
IEEE Transactions on Pattern Analysis and Machine Intelligence, published online 2024-10-28.
DOI: 10.1109/TPAMI.2024.3487222

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set. Previous works mainly project an image and its corresponding composition into a common embedding space to measure their compatibility score. However, attributes and objects share the visual representations learned this way, leading the model to exploit spurious correlations and to be biased towards seen compositions. Instead, we reconsider CZSL as an out-of-distribution generalization problem. If an object is treated as a domain, we can learn object-invariant features to recognize attributes attached to any object reliably, and vice versa. Specifically, we propose an invariant feature learning framework that aligns different domains at the representation and gradient levels to capture the intrinsic characteristics associated with the tasks. To further facilitate and encourage the disentanglement of attributes and objects, we propose an "encoding-reshuffling-decoding" process that helps the model avoid spurious correlations by randomly regrouping the disentangled features into synthetic features. Ultimately, our method improves generalization by learning to disentangle features that represent the two independent factors of attributes and objects. Experiments demonstrate that the proposed method achieves state-of-the-art or competitive performance in both closed-world and open-world scenarios. Code is available at https://github.com/PRIS-CV/Disentangling-before-Composing.
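The "reshuffling" step of the encoding-reshuffling-decoding process can be sketched as follows. This is a minimal illustrative sketch and not the authors' released implementation: features are represented as plain Python lists, and the function name `reshuffle_compositions` and its arguments are hypothetical. The idea shown is that, once attribute and object features are disentangled, the object halves are randomly re-paired with the attribute halves within a batch to form synthetic compositions the model must still classify correctly.

```python
import random

def reshuffle_compositions(attr_feats, obj_feats, seed=0):
    """Randomly re-pair disentangled attribute and object features
    within a batch to form synthetic compositions (hypothetical sketch
    of the 'reshuffling' step; feature vectors are plain lists)."""
    rng = random.Random(seed)
    idx = list(range(len(obj_feats)))
    rng.shuffle(idx)  # random permutation of object features across the batch
    # Concatenate each attribute feature with a shuffled object feature,
    # producing compositions that may never have been seen together.
    return [attr + obj_feats[i] for attr, i in zip(attr_feats, idx)]

# Example: three (attribute, object) pairs are regrouped into synthetic pairs.
synthetic = reshuffle_compositions([[1.0], [2.0], [3.0]],
                                   [[10.0], [20.0], [30.0]])
```

Because each synthetic feature still carries a valid attribute label and a valid object label, a decoder trained on these regrouped features cannot rely on spurious attribute-object co-occurrence statistics from the training set.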
