Introspective Deep Metric Learning

IF 20.8 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2023-09-05 DOI:10.48550/arXiv.2205.04449
Cheng-Hao Wang, Wenzhao Zheng, Zheng Hua Zhu, Jie Zhou, Jiwen Lu
{"title":"Introspective Deep Metric Learning","authors":"Cheng-Hao Wang, Wenzhao Zheng, Zheng Hua Zhu, Jie Zhou, Jiwen Lu","doi":"10.48550/arXiv.2205.04449","DOIUrl":null,"url":null,"abstract":"This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, which ignore the existence of uncertainty in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce overconfident judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. Our framework attains state-of-the-art performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval. We further evaluate our framework for image classification on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets, which shows that equipping existing data mixing methods with the proposed introspective metric consistently achieves better results (e.g., +0.44% for CutMix on ImageNet-1K).","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":20.8000,"publicationDate":"2023-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.04449","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 2

Abstract

This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, which ignore the existence of uncertainty in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce overconfident judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. Our framework attains state-of-the-art performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval. We further evaluate our framework for image classification on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets, which shows that equipping existing data mixing methods with the proposed introspective metric consistently achieves better results (e.g., +0.44% for CutMix on ImageNet-1K).
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
反思性深度度量学习
本文提出了一种内省式深度度量学习(IDML)框架,用于图像的不确定性比较。传统的深度度量学习方法侧重于学习判别嵌入来描述图像的语义特征,而忽略了噪声或语义模糊导致的每个图像中存在的不确定性。在没有意识到这些不确定性的情况下进行训练会导致模型在训练过程中过度拟合注释标签,并在推理过程中产生过度自信的判断。受此启发,我们认为,一个好的相似性模型应该考虑语义差异,并意识到不确定性,以更好地处理模糊图像,从而进行更稳健的训练。为了实现这一点,我们建议不仅使用语义嵌入,还使用伴随的不确定性嵌入来表示图像,该嵌入分别描述图像的语义特征和模糊性。我们进一步提出了一种内省相似性度量,以在考虑图像语义差异和歧义的情况下对图像进行相似性判断。对所提出的度量的梯度分析表明,它使模型能够以自适应和较慢的速度学习,以应对训练过程中的不确定性。我们的框架在广泛使用的用于图像检索的CUB200-2011、Cars196和Stanford Online Products数据集上获得了最先进的性能。我们在ImageNet-1K、CIFAR-10和CIFAR-100数据集上进一步评估了我们的图像分类框架,这表明,为现有的数据混合方法配备所提出的内省度量始终可以获得更好的结果(例如,ImageNet-1K上的CutMix为+0.44%)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
28.40
自引率
3.00%
发文量
885
审稿时长
8.5 months
期刊介绍: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.
期刊最新文献
FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels Partial Scene Text Retrieval BokehMe++: Harmonious Fusion of Classical and Neural Rendering for Versatile Bokeh Creation DiffI2I: Efficient Diffusion Model for Image-to-Image Translation A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1