Shengtang Guo, Huaxiang Zhang, Li Liu, Dongmei Liu, Xu Lu, Liujian Li
Journal of Visual Communication and Image Representation, Volume 103, Article 104258. Published 2024-08-01. Impact factor: 2.6 (JCR Q2, Computer Science, Information Systems).
DOI: 10.1016/j.jvcir.2024.104258
https://www.sciencedirect.com/science/article/pii/S1047320324002141
Hypergraph clustering based multi-label cross-modal retrieval
Most existing cross-modal retrieval methods struggle to establish semantic connections between modalities because of the inherent heterogeneity among them. To establish such connections, align relevant semantic features across modalities, and fully capture the important information within each modality, this paper exploits the strength of hypergraphs in representing higher-order relationships and proposes an image-text retrieval method based on hypergraph clustering. Specifically, we construct hypergraphs to capture feature relationships within the image and text modalities, as well as between image and text. This allows us to model complex relationships between features of different modalities and to explore semantic connectivity within and across modalities. To compensate for potential semantic feature loss during the construction of the hypergraph neural network, we design a weight-adaptive coarse- and fine-grained feature fusion module for semantic supplementation. Comprehensive experimental results on three common datasets demonstrate the effectiveness of the proposed method.
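The abstract does not say how the hypergraphs are built or how features are propagated over them. As a rough illustration only, the sketch below assumes a common construction in which each feature vector spawns one hyperedge containing itself and its k nearest neighbours, followed by a single mean-based smoothing step (nodes to hyperedges, then hyperedges back to nodes). The function names `knn_hyperedges` and `propagate`, the Euclidean distance, and the unweighted averaging are all assumptions for illustration, not the authors' method.

```python
import math


def knn_hyperedges(features, k):
    """Assumed construction: one hyperedge per node, holding the node
    itself plus its k nearest neighbours (Euclidean distance)."""
    edges = []
    for i, fi in enumerate(features):
        # Rank every other node by its distance to node i.
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(fi, features[j]),
        )
        edges.append({i, *others[:k]})
    return edges


def propagate(features, edges):
    """One smoothing step over the hypergraph:
    hyperedge feature = mean of its member nodes,
    new node feature  = mean of the hyperedges that contain the node."""
    dim = len(features[0])
    edge_feats = [
        [sum(features[v][d] for v in e) / len(e) for d in range(dim)]
        for e in edges
    ]
    out = []
    for v in range(len(features)):
        # Every node belongs to at least its own hyperedge, so this is non-empty.
        incident = [ef for e, ef in zip(edges, edge_feats) if v in e]
        out.append(
            [sum(ef[d] for ef in incident) / len(incident) for d in range(dim)]
        )
    return out
```

Calling `knn_hyperedges` on a mixed list of image and text feature vectors would, under these assumptions, yield hyperedges that can span both modalities, which is one way to read "between image and text" in the abstract. Unlike an ordinary graph edge, each hyperedge can link more than two features at once, which is the higher-order relationship the abstract refers to.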
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.