{"title":"Text modality enhanced based deep hashing for multi-label cross-modal retrieval","authors":"Huan Liu, Jiang Xiong, Nian Zhang, Jing Zhong","doi":"10.1109/icaci55529.2022.9837775","DOIUrl":null,"url":null,"abstract":"In the past few years, due to the strong feature learning capability of deep neural networks, deep cross-modal hashing (DCMHs) has made considerable progress. However, there exist two problems in most DCMHs methods: (1) most extisting DCMHs methods utilize single labels to calculate the semantic similarity of instances, which overlooks the fact that, in the field of cross-modal retrieval, most benchmark datasets as well as practical applications have multiple labels. Therefore, single labels based DCMHs methods cannot accurately calculate the semantic similarity of instances and may decrease the performance of the learned DCMHs models. (2) Most DCMHs models are built on the image-text modalities, nevertheless, as the initial feature space of the text modality is quite sparse, the learned hash projection function based on these sparse features for the text modality is too weak to map the original text into robust hash codes. To solve these two problems, in this paper, we propose a text modality enhanced based deep hashing for multi-label cross-modal retrieval (TMEDH) method. TMEDH firstly defines a multi-label based semantic similarity calculation formula to accurately compute the semantic similarity of cross-modal instances. Secondly, TMEDH introduces a text modality enhanced module to compensate the sparse features of the text modality by fuse the multi-label information into the features of the text. Extensive ablation experiments as well as comparative experiments on two cross-modal retrieval datasets demonstrate that our proposed TMEDH method achieves state-of-the-art performance.","PeriodicalId":412347,"journal":{"name":"2022 14th International Conference on Advanced Computational Intelligence (ICACI)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Advanced Computational Intelligence (ICACI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icaci55529.2022.9837775","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
In the past few years, owing to the strong feature learning capability of deep neural networks, deep cross-modal hashing (DCMH) has made considerable progress. However, most DCMH methods suffer from two problems. (1) Most existing DCMH methods use single labels to calculate the semantic similarity of instances, overlooking the fact that, in cross-modal retrieval, most benchmark datasets as well as practical applications carry multiple labels. Single-label-based DCMH methods therefore cannot accurately calculate the semantic similarity of instances, which may degrade the performance of the learned models. (2) Most DCMH models are built on the image-text modalities; however, because the initial feature space of the text modality is quite sparse, the hash projection function learned from these sparse text features is too weak to map the original text into robust hash codes. To address these two problems, we propose a text modality enhanced based deep hashing for multi-label cross-modal retrieval (TMEDH) method. TMEDH first defines a multi-label-based semantic similarity formula to accurately compute the semantic similarity of cross-modal instances. Second, TMEDH introduces a text modality enhancement module that compensates for the sparse features of the text modality by fusing the multi-label information into the text features. Extensive ablation and comparative experiments on two cross-modal retrieval datasets demonstrate that the proposed TMEDH method achieves state-of-the-art performance.
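The abstract names two components without detail: a multi-label semantic similarity formula and a text enhancement module that fuses label information into sparse text features. The sketch below illustrates one plausible reading of each idea in NumPy; the Jaccard-style similarity, the random projection matrix W, and the blending weight alpha are illustrative assumptions, not the formulation actually used in TMEDH.

```python
import numpy as np

def multilabel_similarity(L):
    """Jaccard-style semantic similarity from binary multi-label vectors.

    S[i, j] = |labels_i AND labels_j| / |labels_i OR labels_j|, so pairs sharing
    more labels are treated as more similar; a single-label scheme would
    collapse this to a 0/1 indicator.
    """
    inter = L @ L.T                           # shared labels per pair
    counts = L.sum(axis=1, keepdims=True)     # labels per instance
    union = counts + counts.T - inter         # inclusion-exclusion
    return inter / np.maximum(union, 1e-12)

def enhance_text_features(T, L, alpha=0.5):
    """Blend a projection of the label vector into the sparse text features.

    W stands in for a learned projection; the point is only that the text
    branch no longer relies on the sparse bag-of-words features alone.
    """
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((L.shape[1], T.shape[1]))
    return (1.0 - alpha) * T + alpha * (L @ W)

# Toy usage: 3 instances, 4 labels, 6-dimensional bag-of-words text features.
L = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)
T = np.random.default_rng(1).random((3, 6))

S = multilabel_similarity(L)        # S[0, 1] = 1/3 (one shared label of three distinct), S[0, 2] = 0
T_enhanced = enhance_text_features(T, L)
```

In the paper's setting these ingredients would feed a pairwise similarity-preserving hashing loss; the toy example only shows why graded multi-label overlap carries more information than a single shared/not-shared bit.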