{"title":"细粒度跨媒体检索中的局部自关注","authors":"Chen Wang, Yazhou Yao, Qiong Wang, Zhenmin Tang","doi":"10.1145/3469877.3490590","DOIUrl":null,"url":null,"abstract":"Due to the heterogeneity gap, the data representation of different media is inconsistent and belongs to different feature spaces. Therefore, it is challenging to measure the fine-grained gap between them. To this end, we propose an attention space training method to learn common representations of different media data. Specifically, we utilize local self-attention layers to learn the common attention space between different media data. We propose a similarity concatenation method to understand the content relationship between features. To further improve the robustness of the model, we also train a local position encoding to capture the spatial relationships between features. In this way, our proposed method can effectively reduce the gap between different feature distributions on cross-media retrieval tasks. It also improves the fine-grained recognition performance by attaching attention to high-level semantic information. Extensive experiments and ablation studies demonstrate that our proposed method achieves state-of-the-art performance. At the same time, our approach provides a new pipeline for fine-grained cross-media retrieval. The source code and models are publicly available at: https://github.com/NUST-Machine-Intelligence-Laboratory/SAFGCMHN.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Local Self-Attention on Fine-grained Cross-media Retrieval\",\"authors\":\"Chen Wang, Yazhou Yao, Qiong Wang, Zhenmin Tang\",\"doi\":\"10.1145/3469877.3490590\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the heterogeneity gap, the data representation of different media is inconsistent and belongs to different feature spaces. Therefore, it is challenging to measure the fine-grained gap between them. To this end, we propose an attention space training method to learn common representations of different media data. Specifically, we utilize local self-attention layers to learn the common attention space between different media data. We propose a similarity concatenation method to understand the content relationship between features. To further improve the robustness of the model, we also train a local position encoding to capture the spatial relationships between features. In this way, our proposed method can effectively reduce the gap between different feature distributions on cross-media retrieval tasks. It also improves the fine-grained recognition performance by attaching attention to high-level semantic information. Extensive experiments and ablation studies demonstrate that our proposed method achieves state-of-the-art performance. At the same time, our approach provides a new pipeline for fine-grained cross-media retrieval. 
The source code and models are publicly available at: https://github.com/NUST-Machine-Intelligence-Laboratory/SAFGCMHN.\",\"PeriodicalId\":210974,\"journal\":{\"name\":\"ACM Multimedia Asia\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Multimedia Asia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3469877.3490590\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Multimedia Asia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3469877.3490590","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Local Self-Attention on Fine-grained Cross-media Retrieval
Due to the heterogeneity gap, data from different media are represented inconsistently and lie in different feature spaces, which makes measuring the fine-grained differences between them challenging. To this end, we propose an attention-space training method that learns common representations of data from different media. Specifically, we use local self-attention layers to learn a common attention space across media, and we propose a similarity-concatenation method to model the content relationships between features. To further improve the robustness of the model, we also train a local position encoding that captures the spatial relationships between features. In this way, the proposed method effectively reduces the gap between feature distributions in cross-media retrieval tasks and improves fine-grained recognition by attending to high-level semantic information. Extensive experiments and ablation studies demonstrate that our method achieves state-of-the-art performance, and our approach provides a new pipeline for fine-grained cross-media retrieval. The source code and models are publicly available at: https://github.com/NUST-Machine-Intelligence-Laboratory/SAFGCMHN.
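To make the mechanism concrete, the following is a minimal PyTorch sketch of a local self-attention layer with a learned local position encoding, in the spirit of the abstract's description. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the module name, the window size, and the additive relative-position bias (used here as a simple stand-in for the paper's similarity-concatenation formulation) are all hypothetical.

```python
# Illustrative sketch only; names and design choices are hypothetical and
# do not reproduce the SAFGCMHN repository's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalSelfAttention2d(nn.Module):
    """Self-attention restricted to a k x k neighborhood of each position.

    The content similarity (query . key) is combined with a learned
    relative-position term before the softmax, loosely following the
    abstract's "local self-attention" and "local position encoding" ideas.
    """

    def __init__(self, dim, window=7):
        super().__init__()
        assert window % 2 == 1, "window size must be odd"
        self.window = window
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)
        # One learned scalar bias per relative offset inside the window.
        self.pos_bias = nn.Parameter(torch.zeros(window * window))
        self.scale = dim ** -0.5

    def forward(self, x):  # x: (B, C, H, W), e.g. backbone feature maps
        B, C, H, W = x.shape
        q = self.q(x)
        # Unfold keys/values into the k*k local neighborhood of each position.
        k = F.unfold(self.k(x), self.window, padding=self.window // 2)
        v = F.unfold(self.v(x), self.window, padding=self.window // 2)
        k = k.view(B, C, self.window * self.window, H * W)
        v = v.view(B, C, self.window * self.window, H * W)
        q = q.view(B, C, 1, H * W)
        # Content term plus learned relative-position term.
        attn = (q * k).sum(dim=1) * self.scale        # (B, k*k, H*W)
        attn = attn + self.pos_bias[None, :, None]
        attn = attn.softmax(dim=1)
        out = (attn.unsqueeze(1) * v).sum(dim=2)      # (B, C, H*W)
        return out.view(B, C, H, W)


if __name__ == "__main__":
    layer = LocalSelfAttention2d(dim=256, window=7)
    feats = torch.randn(2, 256, 14, 14)  # stand-in for backbone features
    print(layer(feats).shape)            # torch.Size([2, 256, 14, 14])
```

Restricting attention to a k x k window keeps the cost linear in the number of spatial positions, which is one plausible reason to prefer local over global self-attention on the dense feature maps that fine-grained retrieval relies on.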