Learning Unsupervised Side Information for Zero-Shot Learning
Fan Zhang
2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), November 2021
DOI: 10.1109/CONF-SPML54095.2021.00070
Zero-Shot Learning (ZSL) aims to recognize images of unseen classes that do not appear during training, and has attracted growing research interest in recent years. Side information is key to ZSL, since it transfers knowledge between seen and unseen classes. Human-annotated attributes, the most popular form of side information, require substantial human effort and time during data collection, while unsupervised side information such as word2vec performs poorly because it lacks the ability to represent visual information. In this paper, we propose to perform ZSL with CLIP features, which are learned from paired images and natural language without human annotation effort. Extensive experiments on two benchmark datasets, AWA2 and CUB, demonstrate that our method achieves impressive accuracy gains over word2vec, and even beats human-annotated attributes in some circumstances.
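The role of side information described above can be illustrated with a minimal sketch: unseen classes are recognized by matching an image feature against per-class side-information embeddings (here random stand-ins, not real CLIP or word2vec outputs) using cosine similarity. The class names and the `classify` helper are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

# Hypothetical class-name embeddings standing in for real side
# information (in practice, e.g., CLIP text features obtained by
# encoding a prompt such as "a photo of a zebra").
rng = np.random.default_rng(0)
dim = 8
class_embeddings = {
    "zebra": rng.normal(size=dim),
    "horse": rng.normal(size=dim),
    "whale": rng.normal(size=dim),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(image_feature, embeddings):
    """Assign the unseen-class label whose side-information embedding
    is most similar to the image feature."""
    return max(embeddings, key=lambda c: cosine(image_feature, embeddings[c]))

# A synthetic image feature that aligns with the "zebra" embedding;
# a real system would produce this with a (projected) visual encoder.
image_feature = class_embeddings["zebra"] + 0.1 * rng.normal(size=dim)
print(classify(image_feature, class_embeddings))
```

The key point is that no labeled images of the unseen classes are needed: recognition reduces to comparing visual features against embeddings derived purely from the side information.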