MiLNet: Multiplex Interactive Learning Network for RGB-T Semantic Segmentation

Jinfu Liu, Hong Liu, Xia Li, Jiale Ren, Xinhua Xu
{"title":"MiLNet: Multiplex Interactive Learning Network for RGB-T Semantic Segmentation","authors":"Jinfu Liu;Hong Liu;Xia Li;Jiale Ren;Xinhua Xu","doi":"10.1109/TIP.2025.3544484","DOIUrl":null,"url":null,"abstract":"Semantic segmentation methods enhance robust and reliable understanding under adverse illumination conditions by integrating complementary information from visible and thermal infrared (RGB-T) images. Existing methods primarily focus on designing various feature fusion modules between different modalities, overlooking that feature learning is the critical aspect of scene understanding. In this paper, we propose a novel module-free Multiplex Interactive Learning Network (MiLNet) for RGB-T semantic segmentation, which adeptly integrates multi-model, multi-modal, and multi-level feature learning, fully exploiting the potential of multiplex feature interaction. Specifically, robust knowledge is transferred from the vision foundation model to our task-specific model to enhance its segmentation performance. In the task-specific model, an asymmetric simulated learning strategy is introduced to facilitate mutual learning of geometric and semantic information between high- and low-level features across modalities. Additionally, an inverse hierarchical fusion strategy based on feature learning pairs is adopted and further refined using multilabel and multiscale supervision. Experimental results on the MFNet and PST900 datasets demonstrate that MiLNet outperforms state-of-the-art methods in terms of mIoU. As a limitation, the model’s performance under few-sample conditions could be improved further. The code and results of our method are available at <uri>https://github.com/Jinfu-pku/MiLNet</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1686-1699"},"PeriodicalIF":13.7000,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10908980/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Semantic segmentation methods achieve robust and reliable scene understanding under adverse illumination conditions by integrating complementary information from visible and thermal infrared (RGB-T) images. Existing methods focus primarily on designing feature fusion modules between modalities, overlooking that feature learning is the critical aspect of scene understanding. In this paper, we propose a novel module-free Multiplex Interactive Learning Network (MiLNet) for RGB-T semantic segmentation, which integrates multi-model, multi-modal, and multi-level feature learning to fully exploit the potential of multiplex feature interaction. Specifically, robust knowledge is transferred from a vision foundation model to our task-specific model to enhance its segmentation performance. Within the task-specific model, an asymmetric simulated learning strategy enables mutual learning of geometric and semantic information between high- and low-level features across modalities. Additionally, an inverse hierarchical fusion strategy based on feature-learning pairs is adopted and further refined with multilabel and multiscale supervision. Experimental results on the MFNet and PST900 datasets demonstrate that MiLNet outperforms state-of-the-art methods in terms of mIoU. One limitation is that performance under few-sample conditions still has room for improvement. The code and results of our method are available at https://github.com/Jinfu-pku/MiLNet.
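The abstract describes the three learning components only at a high level, and no implementation details are given on this page. The following PyTorch-style sketch is therefore purely illustrative: the class names (`FoundationDistill`, `AsymmetricMutualLearning`, `InverseHierarchicalFusion`), the 1x1 projection layers, and the L2-based imitation losses are our assumptions, not the paper's actual design. It shows one plausible reading of (a) foundation-model knowledge transfer, (b) asymmetric cross-modal mutual learning between high- and low-level features, and (c) inverse (deep-to-shallow) hierarchical fusion with a prediction head per scale so that multiscale supervision can be applied; multilabel supervision is omitted for brevity.

```python
# Illustrative sketch only -- the MiLNet abstract does not specify its
# implementation. Every class and loss below is a hypothetical stand-in
# for the three ideas the abstract names; see github.com/Jinfu-pku/MiLNet
# for the authors' actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def _match(x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Resize x spatially to match ref (assumed bilinear resampling)."""
    return F.interpolate(x, size=ref.shape[-2:], mode="bilinear",
                         align_corners=False)


class FoundationDistill(nn.Module):
    """(a) Transfer knowledge from a frozen vision foundation model by
    matching projected student features to teacher features (assumed L2)."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(student_dim, teacher_dim, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(_match(self.proj(student_feat), teacher_feat),
                          teacher_feat.detach())


class AsymmetricMutualLearning(nn.Module):
    """(b) Low-level features of one modality imitate the high-level
    semantics of the other, and vice versa; 'asymmetric' is read here as
    a separate projection per direction (our assumption)."""
    def __init__(self, low_dim: int, high_dim: int):
        super().__init__()
        self.low2high = nn.Conv2d(low_dim, high_dim, kernel_size=1)
        self.high2low = nn.Conv2d(high_dim, low_dim, kernel_size=1)

    def forward(self, rgb_low, thermal_high):
        # Low-level RGB features simulate high-level thermal semantics ...
        sem = F.mse_loss(_match(self.low2high(rgb_low), thermal_high),
                         thermal_high.detach())
        # ... and high-level thermal features simulate low-level RGB geometry.
        geo = F.mse_loss(_match(self.high2low(thermal_high), rgb_low),
                         rgb_low.detach())
        return sem + geo


class InverseHierarchicalFusion(nn.Module):
    """(c) Fuse features from the deepest (semantic) level down to the
    shallowest (geometric) level, emitting a segmentation map per scale."""
    def __init__(self, dims, num_classes: int):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(d, num_classes, 1) for d in dims)
        self.merge = nn.ModuleList(
            nn.Conv2d(dims[i] + dims[i + 1], dims[i], 1)
            for i in range(len(dims) - 1))

    def forward(self, feats):
        # feats: shallow-to-deep list; fuse in inverse (deep-to-shallow) order.
        fused = feats[-1]
        logits = [self.heads[-1](fused)]
        for i in range(len(feats) - 2, -1, -1):
            fused = self.merge[i](
                torch.cat([feats[i], _match(fused, feats[i])], dim=1))
            logits.append(self.heads[i](fused))
        return logits  # one prediction per scale, deepest first
```

A training step under this reading would sum the distillation loss, the mutual-learning loss, and a cross-entropy term on every entry of `logits` against the (resized) ground truth; the relative loss weights are hyperparameters the abstract does not disclose.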