CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes

IF 5.3 | Zone 2, Computer Science | Q2, ROBOTICS | IEEE Robotics and Automation Letters | Pub Date: 2025-01-30 | DOI: 10.1109/LRA.2025.3536218
Tim Brödermann, Christos Sakaridis, Yuqian Fu, Luc Van Gool
IEEE Robotics and Automation Letters, Volume 10, Issue 4, pages 3134-3141 (Journal Article)
Citations: 0

Abstract

Leveraging multiple sensors is crucial for robust semantic perception in autonomous driving, as each sensor type has complementary strengths and weaknesses. However, existing sensor fusion methods often treat sensors uniformly across all conditions, leading to suboptimal performance. By contrast, we propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities. We further introduce modality-specific feature adapters to align diverse sensor inputs into a shared latent space, enabling efficient integration with a single, shared pre-trained backbone. By dynamically adapting sensor fusion to the actual condition, our model significantly improves robustness and accuracy, especially in adverse-condition scenarios. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, and also sets a new state of the art on DeLiVER.
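
The data flow described in the abstract (RGB-based condition classification, a Condition Token, modality-specific adapters feeding one shared backbone, and condition-guided fusion) can be sketched as follows. This is a minimal, hypothetical NumPy illustration with random weights, not the authors' implementation: the condition list, the modality channel counts, the linear adapters, the one-layer "backbone", and the softmax weighting over modalities are all assumptions chosen for brevity, and the weighting stands in for whatever fusion module the paper actually uses.

```python
import numpy as np

# All names, sizes, and the condition list are illustrative assumptions,
# not values taken from the CAFuser paper.
CONDITIONS = ["clear", "fog", "rain", "snow", "night"]
MODALITIES = {"rgb": 3, "lidar": 2, "radar": 2, "event": 1}  # input channels per modality
D = 8  # shared latent dimension (hypothetical)

rng = np.random.default_rng(0)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


# Modality-specific adapters: one linear map per modality into the shared space.
adapters = {m: rng.standard_normal((c, D)) for m, c in MODALITIES.items()}
# A single shared "backbone" (here just one linear layer + ReLU) used by every modality.
backbone_w = rng.standard_normal((D, D))
# Condition classifier head on pooled RGB features, plus one embedding per condition.
cond_head = rng.standard_normal((MODALITIES["rgb"], len(CONDITIONS)))
cond_embed = rng.standard_normal((len(CONDITIONS), D))
# Maps the condition token to one fusion logit per modality (same order as MODALITIES).
fusion_head = rng.standard_normal((D, len(MODALITIES)))


def encode(modality, x):
    """x: (N, C_m) per-pixel features -> (N, D) features from the shared backbone."""
    z = x @ adapters[modality]               # align to the shared latent space
    return np.maximum(z @ backbone_w, 0.0)   # same backbone weights for every modality


def condition_token(rgb):
    """Classify the condition from mean-pooled RGB; return (probs, token)."""
    probs = softmax(rgb.mean(axis=0) @ cond_head)
    return probs, probs @ cond_embed         # probability-weighted condition embedding


def fuse(inputs):
    probs, token = condition_token(inputs["rgb"])
    weights = softmax(token @ fusion_head)   # one non-negative weight per modality
    feats = [encode(m, inputs[m]) for m in MODALITIES]
    fused = sum(w * f for w, f in zip(weights, feats))
    return fused, probs, weights


# Usage: random per-pixel inputs for N = 16 locations.
N = 16
inputs = {m: rng.standard_normal((N, c)) for m, c in MODALITIES.items()}
fused, probs, weights = fuse(inputs)
print(CONDITIONS[int(probs.argmax())], weights.round(3), fused.shape)
```

Because the fusion weights are computed from the condition token, the same sensor inputs can be combined differently under, say, fog versus clear weather. In a trained model the adapters, heads, and embeddings would be learned rather than random, and the fusion would likely operate on spatial features rather than a single global weighting.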

Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal scope: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.