Deep supervision network with contrastive learning for zero-shot sketch-based image retrieval

Applied Soft Computing (IF 7.2; Region 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2024-11-19 DOI: 10.1016/j.asoc.2024.112474

Zhenqiu Shu, Guangyao Zhuo, Jun Yu, Zhengtao Yu

Applied Soft Computing, vol. 167, Article 112474. https://www.sciencedirect.com/science/article/pii/S1568494624012481


Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) is an extremely challenging cross-modal retrieval task. In ZS-SBIR, hand-drawn sketches are used as queries to retrieve corresponding natural images in zero-shot scenarios. Existing methods use diverse loss functions to guide deep neural networks (DNNs) to align the feature representations of sketches and images. In general, these methods supervise only the last layer of the DNN and rely on back-propagation to update every other layer. However, this strategy cannot effectively optimize the intermediate layers, potentially hindering retrieval performance. To address this issue, we propose a deep supervision network with contrastive learning (DSNCL) approach for ZS-SBIR. Specifically, we employ a novel deep supervision training method that attaches multiple projection heads to the intermediate layers of the DNN. These projection heads map multi-level features to a normalized embedding space and are trained by contrastive learning. The proposed method guides the intermediate layers to learn invariance to various data augmentations, thereby aligning the feature representations of sketches and images. This significantly narrows the domain gap and the semantic gap between the two modalities. In addition, applying contrastive learning directly to the intermediate layers makes them easier to optimize. Furthermore, we investigate the cross-batch metric (CBM) learning mechanism, which constructs a semantic queue that stores samples from different batches for metric learning, to further improve performance in ZS-SBIR applications. Comprehensive experimental results on the Sketchy and TU-Berlin datasets show that DSNCL outperforms existing state-of-the-art methods.
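The two mechanisms named in the abstract, contrastive supervision at several levels and a queue of samples from earlier batches, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the abstract does not give the loss formula, so an InfoNCE-style loss is assumed here, plain lists stand in for projection-head outputs, and every name (`info_nce`, `deep_supervision_loss`, `SemanticQueue`, `temperature`, `weights`) is hypothetical.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length (the normalized embedding space)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Assumed InfoNCE-style loss for one anchor: -log softmax of the positive."""
    a = l2_normalize(anchor)
    candidates = [positive] + list(negatives)
    logits = [dot(a, l2_normalize(c)) / temperature for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]


def deep_supervision_loss(sketch_levels, image_levels, negative_levels, weights=None):
    """Sum the contrastive loss over every supervised level, not only the last."""
    weights = weights or [1.0] * len(sketch_levels)
    return sum(
        w * info_nce(s, i, n)
        for w, s, i, n in zip(weights, sketch_levels, image_levels, negative_levels)
    )


class SemanticQueue:
    """FIFO memory of (embedding, label) pairs kept across batches."""

    def __init__(self, max_size):
        self.items = deque(maxlen=max_size)  # oldest entries are evicted first

    def enqueue(self, embeddings, labels):
        self.items.extend(zip(embeddings, labels))

    def negatives_for(self, label):
        """Stored embeddings whose class label differs from `label`."""
        return [e for e, stored in self.items if stored != label]


# Toy usage: two supervised levels, negatives drawn from an earlier batch.
queue = SemanticQueue(max_size=4)
queue.enqueue([[0.0, 1.0], [-1.0, 0.0]], ["cat", "dog"])
negs = queue.negatives_for("cat")
loss = deep_supervision_loss(
    sketch_levels=[[1.0, 0.1], [0.9, 0.0]],
    image_levels=[[1.0, 0.0], [1.0, 0.2]],
    negative_levels=[negs, negs],
)
```

In a real network each level would have its own projection head and its own queue entries; the toy usage reuses one list of negatives for both levels only to stay short.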




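Source journal

Applied Soft Computing (Engineering and Technology - Computer Science: Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 6.90%

Articles published: 874

Review time: 10.9 months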

About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore continuously updated with new articles, and the publication time is short.