{"title":"Semantic Maximum Relevance and Modal Alignment for Cross-Modal Retrieval","authors":"Pingping Sun, Baohua Qiang, Zhiguang Liu, Xianyi Yang, Guangyong Xi, Weigang Liu, Ruidong Chen, S. Zhang","doi":"10.1145/3581807.3581857","DOIUrl":null,"url":null,"abstract":"With the increasing abundance of multimedia data resources, researches on mining the relationship between different modalities to achieve refined cross-modal retrieval are gradually emerging. In this paper, we propose a novel Semantic Maximum Relevance and Modal Alignment (SMR-MA) for Cross-Modal Retrieval, which utilizes the pre-trained model with abundant image text information to extract the features of each image text, and further promotes the modal information interaction between the same semantic categories through the modal alignment module and the multi-layer perceptron with shared weights. In addition, multi-modal embedding is distributed to the normalized hypersphere, and angular edge penalty is applied between feature embedding and weight in angular space to maximize the classification boundary, thus increasing both intra-class similarity and inter-class difference. Comprehensive analysis experiments on three benchmark datasets demonstrate that the proposed method has superior performance in cross-modal retrieval tasks and is significantly superior to the state-of-the-art cross-modal retrieval methods.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
With the increasing abundance of multimedia data resources, research on mining the relationships between different modalities to achieve fine-grained cross-modal retrieval is gradually emerging. In this paper, we propose a novel Semantic Maximum Relevance and Modal Alignment (SMR-MA) method for cross-modal retrieval, which uses a pre-trained model rich in image-text information to extract features from each image and text, and further promotes modal information interaction within the same semantic category through a modal alignment module and a multi-layer perceptron with shared weights. In addition, the multi-modal embeddings are distributed on a normalized hypersphere, and an angular margin penalty is applied between the feature embeddings and the class weights in angular space to maximize the classification margin, thereby increasing intra-class similarity and inter-class difference. Comprehensive experiments on three benchmark datasets demonstrate that the proposed method achieves superior performance on cross-modal retrieval tasks and significantly outperforms state-of-the-art cross-modal retrieval methods.
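The abstract does not give the exact formulation, so the following is only a minimal sketch of the two ideas it names: a shared-weight multi-layer perceptron that projects image and text features onto a common unit hypersphere, and an ArcFace-style additive angular margin penalty between embeddings and class weights. The layer sizes, the margin m, and the scale s are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedProjection(nn.Module):
    """One MLP whose weights are shared by both modalities."""

    def __init__(self, in_dim: int = 512, embed_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so embeddings lie on the unit hypersphere.
        return F.normalize(self.mlp(x), dim=-1)


class AngularMarginLoss(nn.Module):
    """Additive angular margin between normalized embeddings and class weights
    (ArcFace-style; assumed here, not taken from the paper)."""

    def __init__(self, embed_dim: int, num_classes: int, s: float = 30.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between embeddings and normalized class weights.
        cosine = F.linear(emb, F.normalize(self.weight, dim=-1)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cosine)
        # Add the margin m only to the angle of the ground-truth class,
        # which enlarges the decision boundary in angular space.
        target = F.one_hot(labels, num_classes=self.weight.size(0)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)


# Usage: image and text features (e.g. from a pre-trained encoder) of the same
# semantic category are pushed toward the same region of the hypersphere.
proj = SharedProjection()
criterion = AngularMarginLoss(embed_dim=128, num_classes=10)
img_feat, txt_feat = torch.randn(8, 512), torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
loss = criterion(proj(img_feat), labels) + criterion(proj(txt_feat), labels)
loss.backward()
```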