Deep Semantic Correlation Learning Based Hashing for Multimedia Cross-Modal Retrieval

Xiaolong Gong, Linpeng Huang, Fuwei Wang
Published in: 2018 IEEE International Conference on Data Mining (ICDM), 2018-11-01
DOI: 10.1109/ICDM.2018.00027 (https://doi.org/10.1109/ICDM.2018.00027)
Citations: 8

Abstract

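For many large-scale multimedia datasets and web content, hashing-based nearest neighbor search methods for cross-modal retrieval have attracted considerable attention because of their fast query speed and low storage cost. Most existing hashing methods map different modalities into a Hamming embedding in a supervised way, where the semantic information comes from a large manual label matrix and each sample in each modality is usually encoded by a sparse label vector. However, previous studies did not address the challenges of semantic correlation learning and did not make the best use of the prior semantic information. As a result, they cannot preserve accurate semantic similarities, which often degrades the learning of the hash functions. To fill this gap, we propose a novel Deep Semantic Correlation learning based Hashing framework (DSCH) that generates unified hash codes in an end-to-end deep learning architecture for the cross-modal retrieval task. The major contribution of this work is to automatically construct the semantic correlation between data representations and to demonstrate how this correlation information can be used to generate hash codes for new samples. In particular, DSCH integrates a latent semantic embedding with a unified hash embedding to strengthen the similarity information among multiple modalities. Furthermore, additional graph regularization is employed in the framework to capture both inter-modal and intra-modal correspondences. The model learns the semantic correlation and the unified hash codes simultaneously, which improves the effectiveness of cross-modal retrieval. Experimental results on two large datasets show that the proposed approach achieves higher accuracy than several state-of-the-art cross-modal methods.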
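The fast query speed and low storage cost mentioned in the abstract come from comparing short binary codes with Hamming distance. The following is a minimal sketch of that retrieval step only, not of DSCH itself: the sign-based `sign_hash` function, the 8-bit codes, and the item names are all hypothetical stand-ins for the unified hash codes that a trained model would produce.

```python
from typing import Dict, List, Tuple


def sign_hash(features: List[float]) -> int:
    """Pack the signs of a real-valued vector into an integer hash code.

    Assumption: a simple sign threshold stands in for a learned hash function.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def hamming_search(
    query: int, database: Dict[str, int], top_k: int = 2
) -> List[Tuple[str, int]]:
    """Return the top_k database items closest to the query in Hamming space."""
    ranked = sorted(
        ((item_id, hamming_distance(query, code)) for item_id, code in database.items()),
        key=lambda pair: (pair[1], pair[0]),  # sort by distance, then by id
    )
    return ranked[:top_k]


# Hypothetical 8-bit codes for image items (placeholders for learned codes).
image_codes = {
    "img_cat": 0b10110010,
    "img_dog": 0b10100110,
    "img_car": 0b01001101,
}

# Hypothetical text-query embedding; its signs give the query's hash code.
text_query = sign_hash([0.9, -0.2, 0.4, 0.7, -0.5, -0.1, 0.3, -0.8])
results = hamming_search(text_query, image_codes)
print(results)
```

Because both modalities share one code space in this sketch, a text query can be ranked directly against image codes with an XOR and a bit count, which is why storage and lookup stay cheap even for large collections.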