Semantic-Enhanced Proxy-Guided Hashing for Long-Tailed Image Retrieval
Authors: Hongtao Xie; Yan Jiang; Lei Zhang; Pandeng Li; Dongming Zhang; Yongdong Zhang
Journal: IEEE Transactions on Multimedia, vol. 26, pp. 9499-9514
Publication date: 2024-04-29
DOI: 10.1109/TMM.2024.3394684
URL: https://ieeexplore.ieee.org/document/10509797/
Journal impact factor: 8.4 (JCR Q1, Computer Science, Information Systems)
Citations: 0
Abstract
Hashing has been studied extensively for large-scale image retrieval because of its efficient computation and storage. Deep hashing methods typically train models on category-balanced data and suffer serious performance deterioration when the training samples are long-tailed. Recently, several long-tailed hashing methods have addressed this emerging problem for practical purposes. However, existing methods still face the challenge that fixed category centers with limited semantic information cannot effectively improve the discriminative ability of tail-category hash codes. To tackle this issue, we propose a novel method called Semantic-enhanced Proxy-guided Hashing. We leverage two sets of learnable category proxies, one in the feature space and one in the Hamming space, which describe category semantics by being updated continuously along with the whole model via back-propagation. Based on this, we introduce the Mahalanobis distance metric to characterize relationships accurately and to enhance the semantic representation of both proxies and samples concurrently, improving the hash learning process. Moreover, we capture the multilateral correlations between proxies and samples in the feature space and extend a hypergraph neural network to transfer semantic knowledge from proxies to samples in the Hamming space. Extensive experiments show that our method achieves state-of-the-art performance and surpasses existing methods by 1.47%–7.56% MAP on long-tailed benchmarks, demonstrating the superiority of learnable category proxies and the effectiveness of our proposed learning algorithm for long-tailed hashing.
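As a rough illustration of the Mahalanobis distance between samples and category proxies mentioned in the abstract, the sketch below computes d(x, p) = sqrt((x - p)^T M (x - p)) with M = L^T L. It is not the authors' implementation: the function names, the factor matrix `L`, and the toy values are assumptions made for illustration, and in the paper's setting the proxies and the metric would be learned via back-propagation, which this sketch does not do.

```python
import math


def mahalanobis(x, p, L):
    """Return sqrt((x - p)^T M (x - p)) with M = L^T L (positive semidefinite)."""
    diff = [xi - pi for xi, pi in zip(x, p)]
    # Projecting diff with L gives a vector whose squared norm is diff^T M diff.
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in L]
    return math.sqrt(sum(v * v for v in projected))


def nearest_proxy(x, proxies, L):
    """Return the index of the category proxy closest to x under the metric."""
    distances = [mahalanobis(x, p, L) for p in proxies]
    return min(range(len(proxies)), key=distances.__getitem__)


# Hypothetical toy values: two category proxies in a 2-D feature space.
proxies = [[0.0, 0.0], [4.0, 0.0]]
# Hypothetical metric factor; it weights the second feature dimension more heavily.
L = [[1.0, 0.0], [0.0, 2.0]]
sample = [1.0, 1.0]

closest = nearest_proxy(sample, proxies, L)
```

Factoring the metric as M = L^T L is one common way to keep it positive semidefinite while its entries change; whether the paper parameterizes the metric this way is not stated in the abstract.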
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.