Interactive Fusion and Correlation Network for Three-Modal Images Few-Shot Semantic Segmentation
Authors: Haolan He; Xianguo Dong; Xiaofei Zhou; Bo Wang; Jiyong Zhang
Journal: IEEE Signal Processing Letters (Impact Factor 3.2; JCR Q2, Engineering, Electrical & Electronic)
Published: 2024-09-09
DOI: 10.1109/LSP.2024.3456634
URL: https://ieeexplore.ieee.org/document/10669915/
Citations: 0
Abstract
This letter presents a novel method for few-shot semantic segmentation of three-modal images. Some previous methods fuse the modalities before feature correlation, but this alters the original visual information that subsequent feature matching relies on. Others are built on early correlation learning, which can lose detail and thereby impair multi-modal integration. To address these challenges, we build a novel interactive fusion and correlation network (IFCNet). Specifically, the proposed fusing and correlating (FC) module performs feature correlation and attention-based multi-modal fusion interactively, which establishes effective inter-modal complementarity and benefits intra-modal query-support correlation. Furthermore, we add a multi-modal correlation (MC) module, which leverages multi-layer cosine similarity maps to enrich multi-modal visual correspondence. Experiments on the VDT-2048-5$^{i}$ dataset show that the network outperforms existing state-of-the-art methods in both 1-shot and 5-shot settings. An ablation analysis validates the contributions of the FC and MC modules to the overall segmentation accuracy.
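The abstract mentions multi-layer cosine similarity maps between query and support features but gives no implementation details. The following is a minimal NumPy sketch of that general idea, not the authors' MC module. The function names, the modality keys ("visible", "depth", "thermal"), the tensor layout, the support-mask handling, and the assumption that all layers share one spatial size are illustrative choices, not taken from the paper.

```python
import numpy as np


def cosine_correlation(query_feat, support_feat, support_mask, eps=1e-8):
    """Cosine similarity between every query and every support position.

    query_feat:   (C, Hq, Wq) feature map of the query image
    support_feat: (C, Hs, Ws) feature map of the support image
    support_mask: (Hs, Ws) binary foreground mask (assumed already resized)
    returns:      (Hq, Wq, Hs, Ws) similarity map with negatives clipped to 0
    """
    c, hq, wq = query_feat.shape
    _, hs, ws = support_feat.shape
    # Zero out background support features so they cannot match anything.
    s = (support_feat * support_mask[None]).reshape(c, -1)
    q = query_feat.reshape(c, -1)
    # L2-normalise along the channel axis; eps guards against zero vectors.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    corr = q.T @ s  # (Hq*Wq, Hs*Ws) matrix of cosine similarities
    return np.clip(corr, 0.0, None).reshape(hq, wq, hs, ws)


def multimodal_multilayer_correlation(query_feats, support_feats, support_mask):
    """Stack one similarity map per (modality, layer) pair.

    query_feats / support_feats: dict mapping a modality name to a list of
    per-layer (C, H, W) arrays. Hypothetical layout chosen for this sketch.
    """
    maps = []
    for modality in query_feats:
        for q, s in zip(query_feats[modality], support_feats[modality]):
            maps.append(cosine_correlation(q, s, support_mask))
    return np.stack(maps, axis=0)  # (modalities * layers, Hq, Wq, Hs, Ws)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    modalities = ("visible", "depth", "thermal")  # assumed modality names
    feats = {m: [rng.standard_normal((4, 3, 3)) for _ in range(2)] for m in modalities}
    mask = np.ones((3, 3))
    stacked = multimodal_multilayer_correlation(feats, feats, mask)
    print(stacked.shape)
```

In this sketch the correlation within each modality is computed independently and only stacked at the end; how IFCNet actually interleaves fusion with correlation is described in the letter itself and is not reproduced here.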
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented, within one year of their appearance, at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also at several workshops organized by the Signal Processing Society.