{"title":"用于零次学习的语义增强跨模态GAN","authors":"Haotian Sun, Jiwei Wei, Yang Yang, Xing Xu","doi":"10.1145/3469877.3490581","DOIUrl":null,"url":null,"abstract":"The goal of Zero-shot Learning (ZSL) is to recognize categories that are not seen during the training process. The traditional method is to learn an embedding space and map visual features and semantic features to this common space. However, this method inevitably encounters the bias problem, i.e., unseen instances are often incorrectly recognized as the seen classes. Some attempts are made by proposing another paradigm, which uses generative models to hallucinate the features of unseen samples. However, the generative models often suffer from instability issues, making it impractical for them to generate fine-grained features of unseen samples, thus resulting in very limited improvement. To resolve this, a Semantic Enhanced Cross-modal GAN (SECM GAN) is proposed by imposing the cross-modal association for improving the semantic and discriminative property of the generated features. Specifically, we first train a cross-modal embedding model called Semantic Enhanced Cross-modal Model (SECM), which is constrained by discrimination and semantics. Then we train our generative model based on Generative Adversarial Network (GAN) called SECM GAN, in which the generator generates cross-modal features, and the discriminator distinguishes true cross-modal features from generated cross-modal features. We deploy SECM as a weak constraint of GAN, which makes reliance on GAN get reduced. We evaluate extensive experiments on three widely used ZSL datasets to demonstrate the superiority of our framework.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic Enhanced Cross-modal GAN for Zero-shot Learning\",\"authors\":\"Haotian Sun, Jiwei Wei, Yang Yang, Xing Xu\",\"doi\":\"10.1145/3469877.3490581\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The goal of Zero-shot Learning (ZSL) is to recognize categories that are not seen during the training process. The traditional method is to learn an embedding space and map visual features and semantic features to this common space. However, this method inevitably encounters the bias problem, i.e., unseen instances are often incorrectly recognized as the seen classes. Some attempts are made by proposing another paradigm, which uses generative models to hallucinate the features of unseen samples. However, the generative models often suffer from instability issues, making it impractical for them to generate fine-grained features of unseen samples, thus resulting in very limited improvement. To resolve this, a Semantic Enhanced Cross-modal GAN (SECM GAN) is proposed by imposing the cross-modal association for improving the semantic and discriminative property of the generated features. Specifically, we first train a cross-modal embedding model called Semantic Enhanced Cross-modal Model (SECM), which is constrained by discrimination and semantics. Then we train our generative model based on Generative Adversarial Network (GAN) called SECM GAN, in which the generator generates cross-modal features, and the discriminator distinguishes true cross-modal features from generated cross-modal features. We deploy SECM as a weak constraint of GAN, which makes reliance on GAN get reduced. 
We evaluate extensive experiments on three widely used ZSL datasets to demonstrate the superiority of our framework.\",\"PeriodicalId\":210974,\"journal\":{\"name\":\"ACM Multimedia Asia\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Multimedia Asia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3469877.3490581\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Multimedia Asia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3469877.3490581","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semantic Enhanced Cross-modal GAN for Zero-shot Learning
The goal of Zero-shot Learning (ZSL) is to recognize categories that are never seen during training. The traditional approach learns an embedding space into which both visual features and semantic features are mapped. However, this approach inevitably suffers from the bias problem: unseen instances are often misclassified as seen classes. An alternative paradigm instead uses generative models to hallucinate features for unseen classes. However, generative models often suffer from training instability, which prevents them from producing fine-grained features for unseen samples and thus yields only limited improvement. To address this, we propose a Semantic Enhanced Cross-modal GAN (SECM GAN), which imposes a cross-modal association to improve the semantic consistency and discriminability of the generated features. Specifically, we first train a cross-modal embedding model, the Semantic Enhanced Cross-modal Model (SECM), under both discriminative and semantic constraints. We then train a generative adversarial network, SECM GAN, in which the generator produces cross-modal features and the discriminator distinguishes real cross-modal features from generated ones. SECM is deployed as a weak constraint on the GAN, reducing the framework's reliance on the GAN alone. Extensive experiments on three widely used ZSL datasets demonstrate the superiority of our framework.
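To make the training setup described above concrete, the sketch below shows one way a feature-generating GAN can be combined with a pretrained cross-modal embedding acting as a weak constraint. It is a minimal PyTorch illustration assuming attribute-conditioned feature generation; all module definitions, layer sizes, and the weight `lambda_secm` are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Minimal sketch of a feature-generating GAN with a weak semantic-embedding
# constraint, in the spirit of the SECM GAN described in the abstract.
# Dimensions, module names, and the loss weight are assumptions.
import torch
import torch.nn as nn

ATTR_DIM, NOISE_DIM, FEAT_DIM, EMB_DIM = 85, 64, 2048, 256

class Generator(nn.Module):
    """Hallucinates a visual feature from a class attribute vector plus noise."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ATTR_DIM + NOISE_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, FEAT_DIM), nn.ReLU())
    def forward(self, attr, z):
        return self.net(torch.cat([attr, z], dim=1))

class Discriminator(nn.Module):
    """Scores whether a (feature, attribute) pair looks real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + ATTR_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1))
    def forward(self, feat, attr):
        return self.net(torch.cat([feat, attr], dim=1))

class SECM(nn.Module):
    """Cross-modal embedding (assumed pretrained): visual features and
    attributes are projected into a shared space where matched pairs lie close."""
    def __init__(self):
        super().__init__()
        self.vis_proj = nn.Linear(FEAT_DIM, EMB_DIM)
        self.sem_proj = nn.Linear(ATTR_DIM, EMB_DIM)
    def alignment_loss(self, feat, attr):
        # 1 - cosine similarity between the two projections, averaged over the batch
        return (1 - nn.functional.cosine_similarity(
            self.vis_proj(feat), self.sem_proj(attr))).mean()

G, D, secm = Generator(), Discriminator(), SECM()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
lambda_secm = 0.1  # small weight: SECM acts only as a weak constraint

# One illustrative generator step on a dummy batch of seen-class data.
# In a full training loop, D is updated alternately and SECM stays frozen.
attr = torch.randn(32, ATTR_DIM)
z = torch.randn(32, NOISE_DIM)

fake_feat = G(attr, z)
adv_loss = bce(D(fake_feat, attr), torch.ones(32, 1))  # fool the discriminator
sem_loss = secm.alignment_loss(fake_feat, attr)        # stay semantically consistent
(adv_loss + lambda_secm * sem_loss).backward()
opt_g.step()
```

Once trained on seen classes, such a generator can synthesize features for unseen classes from their attribute vectors, after which a standard classifier can be trained on the combined real and synthetic features; the small `lambda_secm` reflects the abstract's point that SECM is only a weak constraint, so the framework does not rely on the GAN alone.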