Unsupervised Dual Deep Hashing With Semantic-Index and Content-Code for Cross-Modal Retrieval

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 18.6) | Pub Date: 2024-09-24 | DOI: 10.1109/TPAMI.2024.3467130

Bin Zhang, Yue Zhang, Junyu Li, Jiazhou Chen, Tatsuya Akutsu, Yiu-Ming Cheung, Hongmin Cai

Volume 47, Issue 1, pp. 387-399 | Journal Article | https://ieeexplore.ieee.org/document/10689647/

Citations: 0

Abstract


Hashing technology has exhibited great cross-modal retrieval potential due to its appealing retrieval efficiency and storage effectiveness. Most current supervised cross-modal retrieval methods heavily rely on accurate semantic supervision, which is intractable for annotations with ever-growing sample sizes. By comparison, the existing unsupervised methods rely on accurate sample similarity preservation strategies with intensive computational costs to compensate for the lack of semantic guidance, which causes these methods to lose the power to bridge the semantic gap. Furthermore, both kinds of approaches need to search for the nearest samples among all samples in a large search space, whose process is laborious. To address these issues, this paper proposes an unsupervised dual deep hashing (UDDH) method with semantic-index and content-code for cross-modal retrieval. Deep hashing networks are utilized to extract deep features and jointly encode the dual hashing codes in a collaborative manner with a common semantic index and modality content codes to simultaneously bridge the semantic and heterogeneous gaps for cross-modal retrieval. The dual deep hashing architecture, comprising the head code on semantic index and tail codes on modality content, enhances the efficiency for cross-modal retrieval. A query sample only needs to search for the retrieved samples with the same semantic index, thus greatly shrinking the search space and achieving superior retrieval efficiency. UDDH integrates the learning processes of deep feature extraction, binary optimization, common semantic index, and modality content code within a unified model, allowing for collaborative optimization to enhance the overall performance. Extensive experiments are conducted to demonstrate the retrieval superiority of the proposed approach over the state-of-the-art baselines.
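The retrieval-time idea described above (search only among samples that share the query's semantic index, then compare content codes) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not cover how UDDH learns its codes. The class `DualCodeIndex`, the item names, and the bit patterns are hypothetical, and ranking tail codes by Hamming distance is an assumption, since the abstract does not name the distance used.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


class DualCodeIndex:
    """Toy index: items are bucketed by head code and ranked by tail code.

    Hypothetical structure for illustration only; the head code stands in
    for the semantic index and the tail code for the modality content code.
    """

    def __init__(self):
        # head code -> list of (item_id, tail code)
        self.buckets = defaultdict(list)

    def add(self, item_id: str, head: int, tail: int) -> None:
        self.buckets[head].append((item_id, tail))

    def search(self, head: int, tail: int, top_k: int = 3):
        # Only the bucket sharing the query's head code is scanned,
        # so the rest of the database is never compared.
        candidates = self.buckets.get(head, [])
        # Assumed ranking: Hamming distance on tail codes, ties broken by id.
        ranked = sorted(candidates, key=lambda it: (hamming(tail, it[1]), it[0]))
        return [(item_id, hamming(tail, t)) for item_id, t in ranked[:top_k]]


index = DualCodeIndex()
# Made-up text items: (id, head code, tail code).
index.add("text_cat_1", 0b01, 0b10110010)
index.add("text_cat_2", 0b01, 0b00010111)
index.add("text_car_1", 0b10, 0b10110010)

# Made-up image query whose head code is 0b01.
results = index.search(head=0b01, tail=0b10110011)
print(results)  # [('text_cat_1', 1), ('text_cat_2', 3)]
```

In this toy data, `text_car_1` is only one bit away from the query's tail code, yet it is never examined because its head code differs. That is the search-space reduction the abstract attributes to the semantic index.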
