{"title":"Semantic Maximum Relevance and Modal Alignment for Cross-Modal Retrieval","authors":"Pingping Sun, Baohua Qiang, Zhiguang Liu, Xianyi Yang, Guangyong Xi, Weigang Liu, Ruidong Chen, S. Zhang","doi":"10.1145/3581807.3581857","DOIUrl":null,"url":null,"abstract":"With the increasing abundance of multimedia data resources, researches on mining the relationship between different modalities to achieve refined cross-modal retrieval are gradually emerging. In this paper, we propose a novel Semantic Maximum Relevance and Modal Alignment (SMR-MA) for Cross-Modal Retrieval, which utilizes the pre-trained model with abundant image text information to extract the features of each image text, and further promotes the modal information interaction between the same semantic categories through the modal alignment module and the multi-layer perceptron with shared weights. In addition, multi-modal embedding is distributed to the normalized hypersphere, and angular edge penalty is applied between feature embedding and weight in angular space to maximize the classification boundary, thus increasing both intra-class similarity and inter-class difference. Comprehensive analysis experiments on three benchmark datasets demonstrate that the proposed method has superior performance in cross-modal retrieval tasks and is significantly superior to the state-of-the-art cross-modal retrieval methods.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
With the increasing abundance of multimedia data resources, research on mining the relationships between different modalities to achieve fine-grained cross-modal retrieval is gradually emerging. In this paper, we propose a novel Semantic Maximum Relevance and Modal Alignment (SMR-MA) method for cross-modal retrieval, which uses a pre-trained model rich in image-text information to extract features from each image and text, and further promotes modal information interaction within the same semantic category through a modal alignment module and a multi-layer perceptron with shared weights. In addition, the multi-modal embeddings are distributed on a normalized hypersphere, and an angular margin penalty is applied between the feature embeddings and the class weights in angular space to maximize the classification margin, thereby increasing intra-class similarity and inter-class difference. Comprehensive experiments on three benchmark datasets demonstrate that the proposed method achieves superior performance on cross-modal retrieval tasks and significantly outperforms state-of-the-art cross-modal retrieval methods.
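The abstract does not give the exact formulation, so the following is only a minimal sketch of the two ideas it names: a shared-weight multi-layer perceptron that projects image and text features onto a common unit hypersphere, and an ArcFace-style additive angular margin penalty between embeddings and class weights. The layer sizes, the margin m, and the scale s are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedProjection(nn.Module):
    """One MLP whose weights are shared by both modalities."""

    def __init__(self, in_dim: int = 512, embed_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so embeddings lie on the unit hypersphere.
        return F.normalize(self.mlp(x), dim=-1)


class AngularMarginLoss(nn.Module):
    """Additive angular margin between normalized embeddings and class weights
    (ArcFace-style; assumed here, not taken from the paper)."""

    def __init__(self, embed_dim: int, num_classes: int, s: float = 30.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between embeddings and normalized class weights.
        cosine = F.linear(emb, F.normalize(self.weight, dim=-1)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cosine)
        # Add the margin m only to the angle of the ground-truth class,
        # which enlarges the decision boundary in angular space.
        target = F.one_hot(labels, num_classes=self.weight.size(0)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)


# Usage: image and text features (e.g. from a pre-trained encoder) of the same
# semantic category are pushed toward the same region of the hypersphere.
proj = SharedProjection()
criterion = AngularMarginLoss(embed_dim=128, num_classes=10)
img_feat, txt_feat = torch.randn(8, 512), torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
loss = criterion(proj(img_feat), labels) + criterion(proj(txt_feat), labels)
loss.backward()
```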