MiLNet: Multiplex Interactive Learning Network for RGB-T Semantic Segmentation

Jinfu Liu, Hong Liu, Xia Li, Jiale Ren, Xinhua Xu
{"title":"MiLNet: Multiplex Interactive Learning Network for RGB-T Semantic Segmentation","authors":"Jinfu Liu;Hong Liu;Xia Li;Jiale Ren;Xinhua Xu","doi":"10.1109/TIP.2025.3544484","DOIUrl":null,"url":null,"abstract":"Semantic segmentation methods enhance robust and reliable understanding under adverse illumination conditions by integrating complementary information from visible and thermal infrared (RGB-T) images. Existing methods primarily focus on designing various feature fusion modules between different modalities, overlooking that feature learning is the critical aspect of scene understanding. In this paper, we propose a novel module-free Multiplex Interactive Learning Network (MiLNet) for RGB-T semantic segmentation, which adeptly integrates multi-model, multi-modal, and multi-level feature learning, fully exploiting the potential of multiplex feature interaction. Specifically, robust knowledge is transferred from the vision foundation model to our task-specific model to enhance its segmentation performance. In the task-specific model, an asymmetric simulated learning strategy is introduced to facilitate mutual learning of geometric and semantic information between high- and low-level features across modalities. Additionally, an inverse hierarchical fusion strategy based on feature learning pairs is adopted and further refined using multilabel and multiscale supervision. Experimental results on the MFNet and PST900 datasets demonstrate that MiLNet outperforms state-of-the-art methods in terms of mIoU. As a limitation, the model’s performance under few-sample conditions could be improved further. The code and results of our method are available at <uri>https://github.com/Jinfu-pku/MiLNet</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1686-1699"},"PeriodicalIF":13.7000,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10908980/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Semantic segmentation methods achieve robust and reliable scene understanding under adverse illumination conditions by integrating complementary information from visible and thermal infrared (RGB-T) images. Existing methods focus primarily on designing feature fusion modules between modalities, overlooking that feature learning is the critical aspect of scene understanding. In this paper, we propose a novel module-free Multiplex Interactive Learning Network (MiLNet) for RGB-T semantic segmentation, which integrates multi-model, multi-modal, and multi-level feature learning to fully exploit the potential of multiplex feature interaction. Specifically, robust knowledge is transferred from a vision foundation model to our task-specific model to enhance its segmentation performance. Within the task-specific model, an asymmetric simulated learning strategy enables mutual learning of geometric and semantic information between high- and low-level features across modalities. Additionally, an inverse hierarchical fusion strategy based on feature-learning pairs is adopted and further refined with multilabel and multiscale supervision. Experimental results on the MFNet and PST900 datasets demonstrate that MiLNet outperforms state-of-the-art methods in terms of mIoU. One limitation is that performance under few-sample conditions still has room for improvement. The code and results of our method are available at https://github.com/Jinfu-pku/MiLNet.
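The abstract describes the three learning components only at a high level, and no implementation details are given on this page. The following PyTorch-style sketch is therefore purely illustrative: the class names (`FoundationDistill`, `AsymmetricMutualLearning`, `InverseHierarchicalFusion`), the 1x1 projection layers, and the L2-based imitation losses are our assumptions, not the paper's actual design. It shows one plausible reading of (a) foundation-model knowledge transfer, (b) asymmetric cross-modal mutual learning between high- and low-level features, and (c) inverse (deep-to-shallow) hierarchical fusion with a prediction head per scale so that multiscale supervision can be applied; multilabel supervision is omitted for brevity.

```python
# Illustrative sketch only -- the MiLNet abstract does not specify its
# implementation. Every class and loss below is a hypothetical stand-in
# for the three ideas the abstract names; see github.com/Jinfu-pku/MiLNet
# for the authors' actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def _match(x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Resize x spatially to match ref (assumed bilinear resampling)."""
    return F.interpolate(x, size=ref.shape[-2:], mode="bilinear",
                         align_corners=False)


class FoundationDistill(nn.Module):
    """(a) Transfer knowledge from a frozen vision foundation model by
    matching projected student features to teacher features (assumed L2)."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(student_dim, teacher_dim, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(_match(self.proj(student_feat), teacher_feat),
                          teacher_feat.detach())


class AsymmetricMutualLearning(nn.Module):
    """(b) Low-level features of one modality imitate the high-level
    semantics of the other, and vice versa; 'asymmetric' is read here as
    a separate projection per direction (our assumption)."""
    def __init__(self, low_dim: int, high_dim: int):
        super().__init__()
        self.low2high = nn.Conv2d(low_dim, high_dim, kernel_size=1)
        self.high2low = nn.Conv2d(high_dim, low_dim, kernel_size=1)

    def forward(self, rgb_low, thermal_high):
        # Low-level RGB features simulate high-level thermal semantics ...
        sem = F.mse_loss(_match(self.low2high(rgb_low), thermal_high),
                         thermal_high.detach())
        # ... and high-level thermal features simulate low-level RGB geometry.
        geo = F.mse_loss(_match(self.high2low(thermal_high), rgb_low),
                         rgb_low.detach())
        return sem + geo


class InverseHierarchicalFusion(nn.Module):
    """(c) Fuse features from the deepest (semantic) level down to the
    shallowest (geometric) level, emitting a segmentation map per scale."""
    def __init__(self, dims, num_classes: int):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(d, num_classes, 1) for d in dims)
        self.merge = nn.ModuleList(
            nn.Conv2d(dims[i] + dims[i + 1], dims[i], 1)
            for i in range(len(dims) - 1))

    def forward(self, feats):
        # feats: shallow-to-deep list; fuse in inverse (deep-to-shallow) order.
        fused = feats[-1]
        logits = [self.heads[-1](fused)]
        for i in range(len(feats) - 2, -1, -1):
            fused = self.merge[i](
                torch.cat([feats[i], _match(fused, feats[i])], dim=1))
            logits.append(self.heads[i](fused))
        return logits  # one prediction per scale, deepest first
```

A training step under this reading would sum the distillation loss, the mutual-learning loss, and a cross-entropy term on every entry of `logits` against the (resized) ground truth; the relative loss weights are hyperparameters the abstract does not disclose.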