Lunke Fei, Zhihao He, Wai Keung Wong, Qi Zhu, Shuping Zhao, Jie Wen
Pattern Recognition, Volume 160, Article 111225. Published 2024-11-26. DOI: 10.1016/j.patcog.2024.111225. URL: https://www.sciencedirect.com/science/article/pii/S0031320324009762
Semantic decomposition and enhancement hashing for deep cross-modal retrieval
Deep hashing has garnered considerable interest and has shown impressive performance in the domain of retrieval. However, the majority of the current hashing techniques rely solely on binary similarity evaluation criteria to assess the semantic relationships between multi-label instances, which presents a challenge in overcoming the feature gap across various modalities. In this paper, we propose semantic decomposition and enhancement hashing (SDEH) by extensively exploring the multi-label semantic information shared by different modalities for cross-modal retrieval. Specifically, we first introduce two independent attention-based feature learning subnetworks to capture the modality-specific features with both global and local details. Subsequently, we exploit the semantic features from multi-label vectors by decomposing the shared semantic information among multi-modal features such that the associations of different modalities can be established. Finally, we jointly learn the common hash code representations of multimodal information under the guidelines of quadruple losses, making the hash codes informative while simultaneously preserving multilevel semantic relationships and feature distribution consistency. Comprehensive experiments on four commonly used multimodal datasets offer strong support for the exceptional effectiveness of our proposed SDEH.
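The limitation of binary similarity criteria mentioned in the abstract can be illustrated with a small sketch. It is not the SDEH formulation, which the abstract does not specify. It only assumes that instances carry multi-hot label vectors, and it uses Jaccard overlap as a stand-in for a graded (multilevel) similarity. All function names and label vectors below are hypothetical.

```python
from typing import Sequence


def binary_similarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Return 1 if the two multi-label vectors share at least one label, else 0."""
    return int(any(x and y for x, y in zip(a, b)))


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Graded similarity (an assumption, not the paper's measure):
    shared labels divided by labels present in either vector."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return shared / union if union else 0.0


def hamming_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions where two equal-length binary hash codes differ."""
    if len(u) != len(v):
        raise ValueError("hash codes must have the same length")
    return sum(1 for x, y in zip(u, v) if x != y)


# Hypothetical label vectors over four classes.
image_labels = [1, 1, 1, 0]
text_close = [1, 1, 1, 0]  # same three labels as the image
text_far = [1, 0, 0, 1]    # shares only one label with the image

# Binary similarity cannot tell the two text instances apart.
print(binary_similarity(image_labels, text_close),
      binary_similarity(image_labels, text_far))

# A graded measure separates them.
print(jaccard_similarity(image_labels, text_close),
      jaccard_similarity(image_labels, text_far))
```

In a hashing-based retrieval system, candidates are typically ranked by `hamming_distance` between the query code and the database codes, so a training signal that distinguishes the two text instances above gives the learned codes more ranking information than a 0/1 label.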
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.