Discriminative Feature Focus via Masked Autoencoder for Zero-Shot Learning

IF 2.0 · JCR Q3, Computer Science, Interdisciplinary Applications · CAS Region 3 (Computer Science) · Computer Supported Cooperative Work-The Journal of Collaborative Computing · Pub Date: 2023-05-24 · DOI: 10.1109/CSCWD57460.2023.10152773
JingQi Yang, Cheng Xie, Peng Tang
Volume 10, Issue 1, pp. 417-422 · Journal Article · Citations: 0

Abstract

Zero-shot learning (ZSL) is an important research area in computer-supported cooperative work in design, especially in visual collaborative computing. ZSL typically uses transferable semantic features to represent visual features so that unseen classes can be predicted without training on unseen samples. Existing ZSL models have attempted to learn region features within a single image, but they typically neglect the discriminative attribute localization of visual features. To address this problem, we propose a zero-shot learning model based on a pre-trained Masked Autoencoder (MAE). It uses multi-head self-attention in Transformer blocks to capture the most discriminative local features from a partial perspective, considering both the positional and the contextual information of the entire sequence of patches, which is consistent with the human attention mechanism for recognizing objects. Further, it uses a multilayer perceptron (MLP) to map visual features into the semantic space, relating visual and semantic attributes, and predicts the semantic information used to determine the class label during inference. Quantitative and qualitative results on three popular ZSL benchmarks show that the proposed method achieves a new state of the art in both generalized and conventional zero-shot learning. The source code of the proposed method is available at https://github.com/yangjingqi99/MAE-ZSL
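The inference pipeline the abstract describes (patch features from an MAE encoder, an MLP mapping into the semantic attribute space, then a nearest-attribute class lookup) can be sketched as below. This is a minimal illustration, not the authors' implementation: all shapes, the mean-pooling step, the random stand-in weights, and the cosine-similarity matching rule are assumptions for the sake of the example.

```python
# Hedged sketch: MAE-style patch features -> MLP -> semantic space ->
# nearest class-attribute match. Weights and class attributes are
# random stand-ins, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
num_patches, feat_dim, attr_dim, num_classes = 196, 768, 85, 10

# Stand-in for the MAE encoder output: one feature vector per image patch.
patch_features = rng.standard_normal((num_patches, feat_dim))

# Global visual feature: mean-pool the patch sequence (a common choice;
# the paper's exact pooling may differ).
visual = patch_features.mean(axis=0)

# Minimal one-hidden-layer MLP mapping the visual feature to attr_dim
# semantic attributes (ReLU hidden layer).
W1 = rng.standard_normal((feat_dim, 256)) * 0.02
W2 = rng.standard_normal((256, attr_dim)) * 0.02
semantic_pred = np.maximum(visual @ W1, 0.0) @ W2

# Per-class semantic attribute vectors. Seen and unseen classes share
# this space, which is what enables zero-shot transfer.
class_attrs = rng.standard_normal((num_classes, attr_dim))

def cosine(a, b):
    # Cosine similarity with a small epsilon for numerical safety.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

# Inference: the predicted label is the class whose attribute vector is
# most similar to the predicted semantic vector.
scores = np.array([cosine(semantic_pred, c) for c in class_attrs])
pred_class = int(scores.argmax())
print(pred_class)
```

At training time the MLP would be fit so that `semantic_pred` matches the ground-truth attribute vector of the image's (seen) class; at test time the same mapping scores unseen-class attribute vectors, which is the zero-shot step.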
Source Journal

Computer Supported Cooperative Work-The Journal of Collaborative Computing (Computer Science, Interdisciplinary Applications)

CiteScore: 6.40
Self-citation rate: 4.20%
Articles per year: 31
Review time: >12 weeks
Journal description: Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW. The CSCW Journal arose in response to the growing interest in the design, implementation and use of technical systems (including computing, information, and communications technologies) which support people working cooperatively, and its scope remains to encompass the multifarious aspects of research within CSCW and related areas. The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where 'work' is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.