Discriminative Feature Focus via Masked Autoencoder for Zero-Shot Learning
JingQi Yang, Cheng Xie, Peng Tang
Computer Supported Cooperative Work-The Journal of Collaborative Computing, vol. 10, no. 1, pp. 417-422
Published: 2023-05-24 · DOI: 10.1109/CSCWD57460.2023.10152773
Citations: 0
Abstract
Zero-shot learning (ZSL) is an important research area in computer-supported cooperative work in design, especially in the field of visual collaborative computing. ZSL typically uses transferable semantic features to represent visual features, so that unseen classes can be predicted without training on samples from those classes. Existing ZSL models have attempted to learn region features within a single image, but the localization of discriminative attributes in the visual features is typically neglected. To address this problem, we propose a zero-shot learning model based on a pre-trained Masked Autoencoder (MAE). It uses multi-head self-attention in Transformer blocks to capture the most discriminative local features from a partial perspective, considering both the positional and the contextual information of the entire sequence of patches, which is consistent with the human attention mechanism when recognizing objects. Further, it uses a Multilayer Perceptron (MLP) to map visual features into the semantic space, relating visual and semantic attributes, and predicts the semantic information used to determine the class label during inference. Both quantitative and qualitative experimental results on three popular ZSL benchmarks show that the proposed method achieves a new state of the art in both generalized and conventional zero-shot learning. The source code of the proposed method is available at https://github.com/yangjingqi99/MAE-ZSL
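The inference pipeline the abstract describes (an MLP maps visual features into the semantic space, and the class label is determined by matching the predicted semantic vector against per-class attribute vectors) can be sketched as follows. This is a minimal illustration with random placeholder weights and hypothetical dimensions, not the paper's trained model; the actual method extracts the visual feature with a pre-trained MAE.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_to_semantic(visual_feat, W1, b1, W2, b2):
    """Map a visual feature vector into the semantic (attribute) space
    with a two-layer MLP. Weights are random placeholders here, not the
    paper's trained parameters."""
    hidden = np.maximum(W1 @ visual_feat + b1, 0.0)  # ReLU hidden layer
    return W2 @ hidden + b2

def predict_class(sem_pred, class_attributes):
    """ZSL inference step: pick the class whose attribute vector is most
    similar (cosine similarity) to the predicted semantic vector."""
    a = class_attributes / np.linalg.norm(class_attributes, axis=1, keepdims=True)
    s = sem_pred / np.linalg.norm(sem_pred)
    return int(np.argmax(a @ s))

# Hypothetical sizes: a 768-d visual feature (e.g. a ViT/MAE token),
# an 85-d attribute space, and 10 unseen classes.
d_vis, d_hid, d_sem, n_cls = 768, 256, 85, 10
W1, b1 = rng.standard_normal((d_hid, d_vis)) * 0.01, np.zeros(d_hid)
W2, b2 = rng.standard_normal((d_sem, d_hid)) * 0.01, np.zeros(d_sem)
class_attrs = rng.standard_normal((n_cls, d_sem))  # per-class attribute vectors

feat = rng.standard_normal(d_vis)          # stand-in for an MAE visual feature
sem = mlp_to_semantic(feat, W1, b1, W2, b2)
label = predict_class(sem, class_attrs)
print(label)
```

Because the class attribute vectors are available for unseen classes, this matching step is what lets the model assign labels it was never trained on.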
Journal description:
Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW.
The CSCW Journal arose in response to the growing interest in the design, implementation, and use of technical systems (including computing, information, and communications technologies) that support people working cooperatively, and its scope continues to encompass the multifarious aspects of research within CSCW and related areas.
The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where ‘work’ is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.