自动驾驶中鸟瞰感知鲁棒性的基准测试与改进

IF 18.6 IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-01-29 DOI:10.1109/TPAMI.2025.3535960

Shaoyuan Xie;Lingdong Kong;Wenwei Zhang;Jiawei Ren;Liang Pan;Kai Chen;Ziwei Liu

{"title":"自动驾驶中鸟瞰感知鲁棒性的基准测试与改进","authors":"Shaoyuan Xie;Lingdong Kong;Wenwei Zhang;Jiawei Ren;Liang Pan;Kai Chen;Ziwei Liu","doi":"10.1109/TPAMI.2025.3535960","DOIUrl":null,"url":null,"abstract":"Recent advancements in bird’s eye view (BEV) representations have shown remarkable promise for in-vehicle 3D perception. However, while these methods have achieved impressive results on standard benchmarks, their robustness in varied conditions remains insufficiently assessed. In this study, we present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. This suite incorporates a diverse set of camera corruption types, each examined over three severity levels. Our benchmarks also consider the impact of complete sensor failures that occur when using multi-modal models. Through RoboBEV, we assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction. Our analyses reveal a noticeable correlation between the model’s performance on in-distribution datasets and its resilience to out-of-distribution challenges. Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data. Furthermore, we observe that leveraging extensive temporal information significantly improves the model’s robustness. Based on our observations, we design an effective robustness enhancement strategy based on the CLIP model. The insights from this study pave the way for the development of future BEV models that seamlessly combine accuracy with real-world robustness.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"3878-3894"},"PeriodicalIF":18.6000,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Benchmarking and Improving Bird’s Eye View Perception Robustness in Autonomous Driving\",\"authors\":\"Shaoyuan Xie;Lingdong Kong;Wenwei Zhang;Jiawei Ren;Liang Pan;Kai Chen;Ziwei Liu\",\"doi\":\"10.1109/TPAMI.2025.3535960\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advancements in bird’s eye view (BEV) representations have shown remarkable promise for in-vehicle 3D perception. However, while these methods have achieved impressive results on standard benchmarks, their robustness in varied conditions remains insufficiently assessed. In this study, we present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. This suite incorporates a diverse set of camera corruption types, each examined over three severity levels. Our benchmarks also consider the impact of complete sensor failures that occur when using multi-modal models. Through RoboBEV, we assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction. Our analyses reveal a noticeable correlation between the model’s performance on in-distribution datasets and its resilience to out-of-distribution challenges. Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data. Furthermore, we observe that leveraging extensive temporal information significantly improves the model’s robustness. Based on our observations, we design an effective robustness enhancement strategy based on the CLIP model. The insights from this study pave the way for the development of future BEV models that seamlessly combine accuracy with real-world robustness.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 5\",\"pages\":\"3878-3894\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-01-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10857618/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10857618/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

最近在鸟瞰（BEV）表示方面的进展显示出车内3D感知的非凡前景。然而，尽管这些方法在标准基准上取得了令人印象深刻的结果，但它们在各种条件下的稳健性仍然没有得到充分的评估。在这项研究中，我们提出了RoboBEV，一个广泛的基准套件，旨在评估BEV算法的弹性。该套件包含了一组不同的相机损坏类型，每个类型都经过三个严重级别的检查。我们的基准还考虑了使用多模态模型时发生的完全传感器故障的影响。通过RoboBEV，我们评估了33个最先进的基于bev的感知模型，涵盖了检测、地图分割、深度估计和占用率预测等任务。我们的分析揭示了模型在分布内数据集上的性能与其对分布外挑战的弹性之间的显著相关性。我们的实验结果还强调了预训练和无深度BEV转换等策略在增强对分布外数据的鲁棒性方面的有效性。此外，我们观察到利用广泛的时间信息显着提高了模型的鲁棒性。基于我们的观察，我们设计了一种有效的基于CLIP模型的鲁棒性增强策略。这项研究的见解为未来BEV模型的发展铺平了道路，这些模型将精度与现实世界的鲁棒性无缝结合。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Benchmarking and Improving Bird’s Eye View Perception Robustness in Autonomous Driving

Recent advancements in bird’s eye view (BEV) representations have shown remarkable promise for in-vehicle 3D perception. However, while these methods have achieved impressive results on standard benchmarks, their robustness in varied conditions remains insufficiently assessed. In this study, we present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. This suite incorporates a diverse set of camera corruption types, each examined over three severity levels. Our benchmarks also consider the impact of complete sensor failures that occur when using multi-modal models. Through RoboBEV, we assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction. Our analyses reveal a noticeable correlation between the model’s performance on in-distribution datasets and its resilience to out-of-distribution challenges. Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data. Furthermore, we observe that leveraging extensive temporal information significantly improves the model’s robustness. Based on our observations, we design an effective robustness enhancement strategy based on the CLIP model. The insights from this study pave the way for the development of future BEV models that seamlessly combine accuracy with real-world robustness.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE transactions on pattern analysis and machine intelligence

自引率

0.00%

发文量

期刊最新文献

Calibrating Biased Distribution in VFM-Derived Latent Space via Cross-Domain Geometric Consistency. Penny-Wise and Pound-Foolish in AI-Generated Image Detection. 50 Years of Automated Face Recognition. Soft Label Pruning and Quantization for Large-Scale Dataset Distillation. On the Adversarial Transferability of Generalized "Skip Connections".