CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes

IF 5.3 · Zone 2 (Computer Science) · JCR Q2 (Robotics) · IEEE Robotics and Automation Letters · Pub Date: 2025-01-30 · DOI: 10.1109/LRA.2025.3536218

Tim Brödermann;Christos Sakaridis;Yuqian Fu;Luc Van Gool

IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3134-3141. Journal Article. https://ieeexplore.ieee.org/document/10858375/

Citation count: 0

Abstract

Leveraging multiple sensors is crucial for robust semantic perception in autonomous driving, as each sensor type has complementary strengths and weaknesses. However, existing sensor fusion methods often treat sensors uniformly across all conditions, leading to suboptimal performance. By contrast, we propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities. We further introduce modality-specific feature adapters that align diverse sensor inputs into a shared latent space, enabling efficient integration with a single, shared pre-trained backbone. By dynamically adapting sensor fusion to the actual condition, our model significantly improves robustness and accuracy, especially in adverse-condition scenarios. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, and also sets the new state of the art on DeLiVER.
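The pipeline outlined in the abstract (classify the condition from RGB, form a Condition Token, adapt each modality into a shared space, then fuse under the token's guidance) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion operator, so a simple softmax-gated weighted sum stands in for it, and the condition labels, modality names, dimensions, and random "learned" weights below are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights; a real model would train these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def condition_aware_fuse(rgb_feat, modality_feats, params):
    # 1) Classify the environmental condition from the RGB feature.
    cond_probs = softmax(matvec(params["cond_classifier"], rgb_feat))

    # 2) Condition token: probability-weighted mix of per-condition embeddings
    #    (an assumed construction, chosen for simplicity).
    embeddings = params["cond_embeddings"]
    token = [
        sum(p * emb[i] for p, emb in zip(cond_probs, embeddings))
        for i in range(len(embeddings[0]))
    ]

    # 3) Modality-specific adapters project each input into a shared latent space.
    adapted = {
        name: matvec(params["adapters"][name], feat)
        for name, feat in modality_feats.items()
    }

    # 4) The token gates the modalities; gate rows follow sorted modality names.
    names = sorted(adapted)
    weights = softmax(matvec(params["gate"], token))
    latent_dim = len(adapted[names[0]])
    fused = [
        sum(w * adapted[n][i] for w, n in zip(weights, names))
        for i in range(latent_dim)
    ]
    return cond_probs, dict(zip(names, weights)), fused


# Hypothetical setup: labels, modalities, and sizes are illustrative only.
rng = random.Random(0)
CONDITIONS = ["clear", "fog", "rain", "snow"]
IN_DIMS = {"camera": 6, "lidar": 4, "radar": 3}
TOKEN_DIM, LATENT_DIM = 5, 8

params = {
    "cond_classifier": random_matrix(len(CONDITIONS), IN_DIMS["camera"], rng),
    "cond_embeddings": random_matrix(len(CONDITIONS), TOKEN_DIM, rng),
    "adapters": {n: random_matrix(LATENT_DIM, d, rng) for n, d in IN_DIMS.items()},
    "gate": random_matrix(len(IN_DIMS), TOKEN_DIM, rng),
}
feats = {n: [rng.uniform(0.0, 1.0) for _ in range(d)] for n, d in IN_DIMS.items()}

cond_probs, weights, fused = condition_aware_fuse(feats["camera"], feats, params)
print(dict(zip(CONDITIONS, cond_probs)))
print(weights)
```

The point of the sketch is only the data flow: the per-modality weights depend on the predicted condition rather than being fixed, which is the contrast the abstract draws with fusion methods that treat sensors uniformly across conditions.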

Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.