Deep supervision network with contrastive learning for zero-shot sketch-based image retrieval

Zhenqiu Shu, Guangyao Zhuo, Jun Yu, Zhengtao Yu

Applied Soft Computing, Volume 167, Article 112474 (published 19 November 2024). DOI: 10.1016/j.asoc.2024.112474. Available at: https://www.sciencedirect.com/science/article/pii/S1568494624012481
Abstract
Zero-shot sketch-based image retrieval (ZS-SBIR) is an extremely challenging cross-modal retrieval task in which hand-drawn sketches are used as queries to retrieve corresponding natural images in zero-shot scenarios. Existing methods employ diverse loss functions to guide deep neural networks (DNNs) to align the feature representations of sketches and images. In general, these methods supervise only the last layer of the DNN and update all layers through back-propagation. However, this strategy cannot effectively optimize the intermediate layers of DNNs, potentially hindering retrieval performance. To address this issue, we propose a deep supervision network with contrastive learning (DSNCL) approach for ZS-SBIR. Specifically, we employ a novel deep supervision training scheme that attaches multiple projection heads to the intermediate layers of DNNs. These projection heads map multi-level features into a normalized embedding space and are trained with contrastive learning. The proposed method encourages the intermediate layers to learn invariance to various data augmentations, thereby aligning the feature representations of sketches and images and significantly narrowing both the domain gap and the semantic gap. Moreover, because contrastive learning directly optimizes the intermediate layers of DNNs, it effectively reduces their optimization difficulty. Furthermore, we investigate a cross-batch metric (CBM) learning mechanism, which constructs a semantic queue that stores samples from previous batches for metric learning, to further improve performance in ZS-SBIR applications. Comprehensive experimental results on the Sketchy and TU-Berlin datasets validate the superiority of our DSNCL method over existing state-of-the-art methods.
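To make the two mechanisms in the abstract concrete, the following PyTorch sketch illustrates deep supervision with projection heads trained by contrastive learning. This is a minimal illustration, not the authors' exact implementation: the embedding dimension, temperature, and the ResNet-like channel widths (256/512/1024/2048) of the supervised stages are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps an intermediate feature map into a normalized embedding space."""
    def __init__(self, in_channels, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # pool spatial dims -> (B, C, 1, 1)
            nn.Flatten(),                     # -> (B, C)
            nn.Linear(in_channels, emb_dim),  # -> (B, emb_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)  # unit-norm embeddings

def info_nce(z_sketch, z_image, tau=0.1):
    """Contrastive loss: matched sketch/image pairs are positives,
    all other pairs in the batch serve as negatives."""
    logits = z_sketch @ z_image.t() / tau                 # (B, B) similarities
    targets = torch.arange(z_sketch.size(0), device=z_sketch.device)
    return F.cross_entropy(logits, targets)

# One projection head per supervised intermediate stage
# (channel widths are assumptions for illustration).
heads = nn.ModuleList([ProjectionHead(c) for c in (256, 512, 1024, 2048)])

def deep_supervision_loss(sketch_feats, image_feats):
    """sketch_feats / image_feats: lists of intermediate feature maps, one per
    supervised stage, for a batch of matched sketch/image pairs. Summing the
    per-stage contrastive losses supervises every stage directly."""
    return sum(info_nce(h(s), h(i))
               for h, s, i in zip(heads, sketch_feats, image_feats))
```

The cross-batch metric (CBM) mechanism can likewise be sketched as a fixed-size semantic queue of embeddings from earlier batches, in the spirit of memory-queue contrastive methods. The queue size and the use of queued embeddings purely as extra negatives are assumptions, not necessarily the authors' exact design.

```python
class SemanticQueue:
    """FIFO queue storing embeddings from earlier batches, so metric learning
    is not limited to negatives found within the current batch."""
    def __init__(self, emb_dim=128, size=4096):
        self.feats = F.normalize(torch.randn(size, emb_dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, z):
        n = z.size(0)
        idx = (self.ptr + torch.arange(n)) % self.feats.size(0)
        self.feats[idx] = z.detach().cpu()
        self.ptr = (self.ptr + n) % self.feats.size(0)

def cbm_loss(anchor, positive, queue, tau=0.1):
    """Anchor vs. its matched positive, with queued embeddings from
    previous batches serving as additional negatives."""
    pos = (anchor * positive).sum(dim=1, keepdim=True)    # (B, 1)
    neg = anchor @ queue.feats.to(anchor.device).t()      # (B, queue size)
    logits = torch.cat([pos, neg], dim=1) / tau
    targets = torch.zeros(anchor.size(0), dtype=torch.long,
                          device=anchor.device)           # positive at index 0
    return F.cross_entropy(logits, targets)
```

In a training step, one would compute `cbm_loss` on the final-layer embeddings and then `enqueue` them, so each batch sees a large, slowly refreshed pool of cross-batch negatives at little extra memory cost.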
About the journal
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.