Cross-Media Image-Text Retrieval Combined with Global Similarity and Local Similarity

Zhixin Li, Feng Ling, Canlong Zhang
DOI: 10.1109/DSAA.2019.00029
Published in: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), October 2019
Citations: 3

Abstract

In this paper, we study the problem of image-text matching with the goal of achieving better semantic alignment between images and text. Previous work typically extracts image and text features with pre-trained networks and projects them directly into a common subspace, varies the loss function on that basis, or uses an attention mechanism to match image region proposals directly to text phrases. None of these approaches captures the joint semantics of image and text well. In this study, we propose a cross-media retrieval method based on both global and local representations. We construct a cross-media two-level network, containing subnets that handle global and local features respectively, to explore better semantic matching between images and text. Specifically, we use a self-attention network to obtain a macro representation of the whole image, and combine local fine-grained patches with an attention mechanism. A two-level alignment framework then lets the two subnets reinforce each other in learning different representations for cross-media retrieval. The innovation of this study lies in using more comprehensive image and text features to design the two kinds of similarity and combining them into an overall score. Experimental results show that this method is effective for image-text retrieval: on the Flickr30K and MS-COCO datasets, the model achieves better recall than many current state-of-the-art cross-media retrieval models.
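The core idea of combining a global similarity with an attention-based local similarity can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the cosine scoring, softmax temperature, and weighting factor `alpha` are all illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def global_similarity(img_vec, txt_vec):
    """Similarity between whole-image and whole-sentence embeddings."""
    return cosine(img_vec, txt_vec)

def local_similarity(regions, words, temperature=4.0):
    """For each word, attend over image regions, then score the word
    against its attention-weighted region context; average over words.
    regions: (n_regions, d), words: (n_words, d)."""
    scores = np.zeros(len(words))
    for i, w in enumerate(words):
        sims = np.array([cosine(r, w) for r in regions])
        attn = np.exp(temperature * sims)
        attn /= attn.sum()                 # softmax attention over regions
        context = attn @ regions           # attention-weighted region vector
        scores[i] = cosine(w, context)
    return float(scores.mean())

def combined_similarity(img_vec, txt_vec, regions, words, alpha=0.5):
    """Weighted sum of the global and local similarities (alpha is an
    assumed hyperparameter, not a value from the paper)."""
    return alpha * global_similarity(img_vec, txt_vec) + \
           (1 - alpha) * local_similarity(regions, words)
```

At retrieval time, a score like this would be computed between a query and every candidate in the other modality, and candidates ranked by the combined score.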