Text modality enhanced based deep hashing for multi-label cross-modal retrieval

Huan Liu, Jiang Xiong, Nian Zhang, Jing Zhong
{"title":"Text modality enhanced based deep hashing for multi-label cross-modal retrieval","authors":"Huan Liu, Jiang Xiong, Nian Zhang, Jing Zhong","doi":"10.1109/icaci55529.2022.9837775","DOIUrl":null,"url":null,"abstract":"In the past few years, due to the strong feature learning capability of deep neural networks, deep cross-modal hashing (DCMHs) has made considerable progress. However, there exist two problems in most DCMHs methods: (1) most extisting DCMHs methods utilize single labels to calculate the semantic similarity of instances, which overlooks the fact that, in the field of cross-modal retrieval, most benchmark datasets as well as practical applications have multiple labels. Therefore, single labels based DCMHs methods cannot accurately calculate the semantic similarity of instances and may decrease the performance of the learned DCMHs models. (2) Most DCMHs models are built on the image-text modalities, nevertheless, as the initial feature space of the text modality is quite sparse, the learned hash projection function based on these sparse features for the text modality is too weak to map the original text into robust hash codes. To solve these two problems, in this paper, we propose a text modality enhanced based deep hashing for multi-label cross-modal retrieval (TMEDH) method. TMEDH firstly defines a multi-label based semantic similarity calculation formula to accurately compute the semantic similarity of cross-modal instances. Secondly, TMEDH introduces a text modality enhanced module to compensate the sparse features of the text modality by fuse the multi-label information into the features of the text. Extensive ablation experiments as well as comparative experiments on two cross-modal retrieval datasets demonstrate that our proposed TMEDH method achieves state-of-the-art performance.","PeriodicalId":412347,"journal":{"name":"2022 14th International Conference on Advanced Computational Intelligence (ICACI)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Advanced Computational Intelligence (ICACI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icaci55529.2022.9837775","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In the past few years, owing to the strong feature learning capability of deep neural networks, deep cross-modal hashing (DCMHs) methods have made considerable progress. However, most DCMHs methods suffer from two problems. (1) Most existing DCMHs methods use single labels to calculate the semantic similarity of instances, overlooking the fact that, in cross-modal retrieval, most benchmark datasets as well as practical applications carry multiple labels. Single-label-based DCMHs methods therefore cannot accurately calculate the semantic similarity of instances, which may degrade the performance of the learned models. (2) Most DCMHs models are built on the image and text modalities; however, because the initial feature space of the text modality is quite sparse, a hash projection function learned from these sparse features is too weak to map the original text into robust hash codes. To solve these two problems, we propose a text modality enhanced deep hashing method for multi-label cross-modal retrieval (TMEDH). TMEDH first defines a multi-label-based semantic similarity formula to accurately compute the semantic similarity of cross-modal instances. Second, TMEDH introduces a text modality enhancement module that compensates for the sparse features of the text modality by fusing the multi-label information into the text features. Extensive ablation experiments as well as comparative experiments on two cross-modal retrieval datasets demonstrate that the proposed TMEDH method achieves state-of-the-art performance.
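The abstract names the two components but gives neither formula. As a rough illustration only, the sketch below shows one common way such pieces are built in multi-label cross-modal hashing: a graded label-cosine similarity (rather than the binary "share at least one label" rule used with single labels) and a simple additive label-to-text fusion. The cosine form, the projection matrix W, the mixing weight alpha, and all function names here are assumptions for illustration, not the paper's definitions.

```python
# Hypothetical sketch (not the authors' code): a multi-label semantic
# similarity and a label-fused text feature, using only NumPy.
import numpy as np

def multi_label_similarity(L1: np.ndarray, L2: np.ndarray) -> np.ndarray:
    """Pairwise semantic similarity from multi-hot label matrices.

    L1: (n, c) multi-hot labels of the query instances.
    L2: (m, c) multi-hot labels of the database instances.
    Returns an (n, m) matrix in [0, 1]: the cosine of the label vectors,
    so instances sharing more labels score as more similar, instead of a
    binary "related / unrelated" decision.
    """
    norm1 = np.linalg.norm(L1, axis=1, keepdims=True)  # (n, 1)
    norm2 = np.linalg.norm(L2, axis=1, keepdims=True)  # (m, 1)
    # Small epsilon guards against all-zero label vectors.
    return (L1 @ L2.T) / (norm1 @ norm2.T + 1e-12)

def enhance_text_features(text_feat: np.ndarray, labels: np.ndarray,
                          W: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Toy text-enhancement step: project the multi-hot label vector into
    the text feature space (W stands in for a learned matrix) and mix it
    additively into the sparse text feature."""
    return text_feat + alpha * (labels @ W)

# Usage: 4 instances, 5 possible labels, 8-dim text features.
rng = np.random.default_rng(0)
L = rng.integers(0, 2, size=(4, 5)).astype(float)
S = multi_label_similarity(L, L)   # (4, 4) graded similarity matrix
T = rng.random((4, 8))             # stand-in for sparse text features
W = rng.random((5, 8))             # stand-in for a learned projection
T_enh = enhance_text_features(T, L, W)
print(S.shape, T_enh.shape)        # (4, 4) (4, 8)
```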