{"title":"Generalized zero-shot learning via discriminative and transferable disentangled representations.","authors":"Chunyu Zhang, Zhanshan Li","doi":"10.1016/j.neunet.2024.106964","DOIUrl":null,"url":null,"abstract":"<p><p>In generalized zero-shot learning (GZSL), it is required to identify seen and unseen samples under the condition that only seen classes can be obtained during training. Recent methods utilize disentanglement to make the information contained in visual features semantically related, and ensuring semantic consistency and independence of the disentangled representations is the key to achieving better performance. However, we think there are still some limitations. Firstly, due to the fact that only seen classes can be obtained during training, the recognition of unseen samples will be poor. Secondly, the distribution relations of the representation space and the semantic space are different, and ignoring the discrepancy between them may impact the generalization of the model. In addition, the instances are associated with each other, and considering the interactions between them can obtain more discriminative information, which should not be ignored. Thirdly, since the synthesized visual features may not match the corresponding semantic descriptions well, it will compromise the learning of semantic consistency. To overcome these challenges, we propose to learn discriminative and transferable disentangled representations (DTDR) for generalized zero-shot learning. Firstly, we exploit the estimated class similarities to supervise the relations between seen semantic-matched representations and unseen semantic descriptions, thereby gaining better insight into the unseen domain. 
Secondly, we use cosine similarities between semantic descriptions to constrain the similarities between semantic-matched representations, thereby facilitating the distribution relation of semantic-matched representation space to approximate the distribution relation of semantic space. And during the process, the instance-level correlation can be taken into account. Thirdly, we reconstruct the synthesized visual features into the corresponding semantic descriptions to better establish the associations between them. The experimental results on four datasets verify the effectiveness of our method.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"183 ","pages":"106964"},"PeriodicalIF":6.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1016/j.neunet.2024.106964","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/30 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In generalized zero-shot learning (GZSL), a model must recognize samples from both seen and unseen classes even though only seen classes are available during training. Recent methods use disentanglement to make the information contained in visual features semantically related, and ensuring semantic consistency and independence of the disentangled representations is key to achieving better performance. However, we identify several remaining limitations. First, because only seen classes are available during training, recognition of unseen samples suffers. Second, the distribution relations of the representation space and the semantic space differ, and ignoring this discrepancy may harm the model's generalization. Moreover, instances are related to one another, and modeling their interactions yields additional discriminative information that should not be ignored. Third, the synthesized visual features may not match their corresponding semantic descriptions well, which compromises the learning of semantic consistency. To overcome these challenges, we propose learning discriminative and transferable disentangled representations (DTDR) for generalized zero-shot learning. First, we exploit estimated class similarities to supervise the relations between seen semantic-matched representations and unseen semantic descriptions, thereby gaining better insight into the unseen domain. Second, we use cosine similarities between semantic descriptions to constrain the similarities between semantic-matched representations, encouraging the distribution relation of the semantic-matched representation space to approximate that of the semantic space; this process also takes instance-level correlations into account. Third, we reconstruct the synthesized visual features back into their corresponding semantic descriptions to better establish the associations between them. Experimental results on four datasets verify the effectiveness of our method.
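The second component described above, constraining pairwise similarities between representations to match pairwise cosine similarities between semantic descriptions, can be sketched as a distribution-alignment loss. The following is an illustrative sketch only, not the authors' published implementation: the function names (`cosine_sim_matrix`, `distribution_alignment_loss`) and the choice of a mean-squared penalty between the two similarity matrices are assumptions made here for clarity.

```python
import numpy as np

def cosine_sim_matrix(X):
    # Row-normalize, then take all pairwise cosine similarities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def distribution_alignment_loss(reps, semantics):
    """Hypothetical alignment loss: push the pairwise cosine-similarity
    structure of the semantic-matched representation space toward that of
    the semantic-description space. Because every pair of instances in the
    batch contributes, instance-level correlations are taken into account."""
    S_rep = cosine_sim_matrix(reps)       # similarities between representations
    S_sem = cosine_sim_matrix(semantics)  # similarities between semantic descriptions
    return np.mean((S_rep - S_sem) ** 2)
```

When the two similarity structures coincide the loss is zero; any mismatch between a pair of representations and the corresponding pair of semantic descriptions increases it.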
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.