Discriminative Feature Focus via Masked Autoencoder for Zero-Shot Learning

IF 2.0 · JCR Q3, Computer Science, Interdisciplinary Applications · CAS Region 3 (Computer Science) · Computer Supported Cooperative Work-The Journal of Collaborative Computing · Pub Date: 2023-05-24 · DOI: 10.1109/CSCWD57460.2023.10152773
JingQi Yang, Cheng Xie, Peng Tang
Volume 10, Issue 1, pp. 417-422 · Journal Article · Citations: 0

Abstract

Zero-shot learning (ZSL) is an important research area in computer-supported cooperative work in design, especially in visual collaborative computing. ZSL typically uses transferable semantic features to represent visual features so that unseen classes can be predicted without training on unseen samples. Existing ZSL models have attempted to learn region features within a single image, but they typically neglect the discriminative attribute localization of visual features. To address this problem, we propose a zero-shot learning model based on a pre-trained Masked Autoencoder (MAE). It uses multi-head self-attention in Transformer blocks to capture the most discriminative local features from a partial perspective, considering both the positional and the contextual information of the entire sequence of patches, which is consistent with the human attention mechanism for recognizing objects. Further, it uses a multilayer perceptron (MLP) to map visual features into the semantic space, relating visual and semantic attributes, and predicts the semantic information used to determine the class label during inference. Quantitative and qualitative results on three popular ZSL benchmarks show that the proposed method achieves a new state of the art in both generalized and conventional zero-shot learning. The source code of the proposed method is available at https://github.com/yangjingqi99/MAE-ZSL
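The inference pipeline the abstract describes (patch features from an MAE encoder, an MLP mapping into the semantic attribute space, then a nearest-attribute class lookup) can be sketched as below. This is a minimal illustration, not the authors' implementation: all shapes, the mean-pooling step, the random stand-in weights, and the cosine-similarity matching rule are assumptions for the sake of the example.

```python
# Hedged sketch: MAE-style patch features -> MLP -> semantic space ->
# nearest class-attribute match. Weights and class attributes are
# random stand-ins, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
num_patches, feat_dim, attr_dim, num_classes = 196, 768, 85, 10

# Stand-in for the MAE encoder output: one feature vector per image patch.
patch_features = rng.standard_normal((num_patches, feat_dim))

# Global visual feature: mean-pool the patch sequence (a common choice;
# the paper's exact pooling may differ).
visual = patch_features.mean(axis=0)

# Minimal one-hidden-layer MLP mapping the visual feature to attr_dim
# semantic attributes (ReLU hidden layer).
W1 = rng.standard_normal((feat_dim, 256)) * 0.02
W2 = rng.standard_normal((256, attr_dim)) * 0.02
semantic_pred = np.maximum(visual @ W1, 0.0) @ W2

# Per-class semantic attribute vectors. Seen and unseen classes share
# this space, which is what enables zero-shot transfer.
class_attrs = rng.standard_normal((num_classes, attr_dim))

def cosine(a, b):
    # Cosine similarity with a small epsilon for numerical safety.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

# Inference: the predicted label is the class whose attribute vector is
# most similar to the predicted semantic vector.
scores = np.array([cosine(semantic_pred, c) for c in class_attrs])
pred_class = int(scores.argmax())
print(pred_class)
```

At training time the MLP would be fit so that `semantic_pred` matches the ground-truth attribute vector of the image's (seen) class; at test time the same mapping scores unseen-class attribute vectors, which is the zero-shot step.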
Source Journal

Computer Supported Cooperative Work-The Journal of Collaborative Computing (Computer Science, Interdisciplinary Applications)

CiteScore: 6.40
Self-citation rate: 4.20%
Articles per year: 31
Review time: >12 weeks
Journal description: Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW. The CSCW Journal arose in response to the growing interest in the design, implementation and use of technical systems (including computing, information, and communications technologies) which support people working cooperatively, and its scope remains to encompass the multifarious aspects of research within CSCW and related areas. The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where 'work' is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.