Introspective Deep Metric Learning

IF 20.8 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2023-09-05 DOI:10.48550/arXiv.2205.04449

Cheng-Hao Wang, Wenzhao Zheng, Zheng Hua Zhu, Jie Zhou, Jiwen Lu

{"title":"Introspective Deep Metric Learning","authors":"Cheng-Hao Wang, Wenzhao Zheng, Zheng Hua Zhu, Jie Zhou, Jiwen Lu","doi":"10.48550/arXiv.2205.04449","DOIUrl":null,"url":null,"abstract":"This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, which ignore the existence of uncertainty in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce overconfident judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. Our framework attains state-of-the-art performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval. We further evaluate our framework for image classification on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets, which shows that equipping existing data mixing methods with the proposed introspective metric consistently achieves better results (e.g., +0.44% for CutMix on ImageNet-1K).","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":20.8000,"publicationDate":"2023-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.04449","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 2

Abstract

This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, which ignore the existence of uncertainty in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce overconfident judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. Our framework attains state-of-the-art performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval. We further evaluate our framework for image classification on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets, which shows that equipping existing data mixing methods with the proposed introspective metric consistently achieves better results (e.g., +0.44% for CutMix on ImageNet-1K).

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

反思性深度度量学习

本文提出了一种内省式深度度量学习（IDML）框架，用于图像的不确定性比较。传统的深度度量学习方法侧重于学习判别嵌入来描述图像的语义特征，而忽略了噪声或语义模糊导致的每个图像中存在的不确定性。在没有意识到这些不确定性的情况下进行训练会导致模型在训练过程中过度拟合注释标签，并在推理过程中产生过度自信的判断。受此启发，我们认为，一个好的相似性模型应该考虑语义差异，并意识到不确定性，以更好地处理模糊图像，从而进行更稳健的训练。为了实现这一点，我们建议不仅使用语义嵌入，还使用伴随的不确定性嵌入来表示图像，该嵌入分别描述图像的语义特征和模糊性。我们进一步提出了一种内省相似性度量，以在考虑图像语义差异和歧义的情况下对图像进行相似性判断。对所提出的度量的梯度分析表明，它使模型能够以自适应和较慢的速度学习，以应对训练过程中的不确定性。我们的框架在广泛使用的用于图像检索的CUB200-2011、Cars196和Stanford Online Products数据集上获得了最先进的性能。我们在ImageNet-1K、CIFAR-10和CIFAR-100数据集上进一步评估了我们的图像分类框架，这表明，为现有的数据混合方法配备所提出的内省度量始终可以获得更好的结果（例如，ImageNet-1K上的CutMix为+0.44%）。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Pattern Analysis and Machine Intelligence 工程技术-工程：电子与电气

CiteScore

28.40

自引率

3.00%

发文量

885

审稿时长

8.5 months

期刊介绍： The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.

期刊最新文献

FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels Partial Scene Text Retrieval BokehMe++: Harmonious Fusion of Classical and Neural Rendering for Versatile Bokeh Creation DiffI2I: Efficient Diffusion Model for Image-to-Image Translation A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning