A Vision Transformer Architecture for Open Set Recognition

2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA). Pub Date: 2022-12-01. DOI: 10.1109/ICMLA55696.2022.00034

Feiyang Cai, Zhenkai Zhang, Jie Liu


Abstract

Deep neural networks have demonstrated strong performance on image classification in the closed set setting, where the test data come from the same distribution as the training data. However, in the more realistic open set scenario, traditional classifiers with incomplete knowledge cannot handle test data from classes not seen during training. Open set recognition (OSR) aims to address this problem by simultaneously identifying unknown classes and distinguishing among known classes. In this paper, we propose a novel approach to OSR based on the vision transformer (ViT). Specifically, our approach employs two separate training stages. First, a ViT model is trained to perform closed set classification. Then, an additional detection head is attached to the embedded features extracted by the ViT and trained to force the representations of known data into compact, class-specific clusters. Test examples are identified as known or unknown based on their distance to the cluster centers. To the best of our knowledge, this is the first work to leverage ViT for OSR, and our extensive evaluation on several OSR benchmark datasets shows that our approach significantly outperforms other baseline methods and achieves new state-of-the-art performance.
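The distance-based decision rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the distance metric, how the cluster centers are obtained, or how the rejection threshold is chosen, so the sketch assumes Euclidean distance, per-class mean embeddings as centers, and a hand-picked threshold. The function names `class_centers` and `open_set_predict` are hypothetical, and the 2-D toy vectors stand in for the features a detection head would produce.

```python
import numpy as np


def class_centers(features, labels, num_classes):
    """Mean embedding per known class (assumed stand-in for learned cluster centers)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def open_set_predict(features, centers, threshold):
    """Return the nearest-center class index, or -1 when the sample is rejected as unknown."""
    # Pairwise Euclidean distances, shape (n_samples, n_classes).
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    min_dist = dists.min(axis=1)
    # A sample farther than `threshold` from every center is treated as unknown.
    return np.where(min_dist <= threshold, nearest, -1)


# Toy embeddings: two compact known-class clusters around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
train_x = np.concatenate([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(5.0, 0.1, size=(50, 2)),
])
train_y = np.repeat([0, 1], 50)
centers = class_centers(train_x, train_y, num_classes=2)

# Two samples near the known clusters and one far from both.
test_x = np.array([[0.05, -0.02], [5.1, 4.9], [2.5, 2.5]])
preds = open_set_predict(test_x, centers, threshold=1.0)
print(preds)
```

In this toy setup the first two test points fall close to a center and receive a known-class label, while the third lies far from both centers and is flagged as unknown. In practice the threshold would presumably be tuned on held-out data; that tuning step is an assumption here, not something the abstract describes.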


