Multi-modal information fusion for multi-task end-to-end behavior prediction in autonomous driving

Neurocomputing · IF 6.5 · CAS Region 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2025-06-14 · Epub Date: 2025-03-01 · DOI: 10.1016/j.neucom.2025.129857
Guo Baicang, Liu Hao, Yang Xiao, Cao Yuan, Jin Lisheng, Wang Yinlin
{"title":"Multi-modal information fusion for multi-task end-to-end behavior prediction in autonomous driving","authors":"Guo Baicang ,&nbsp;Liu Hao ,&nbsp;Yang Xiao ,&nbsp;Cao Yuan ,&nbsp;Jin Lisheng ,&nbsp;Wang Yinlin","doi":"10.1016/j.neucom.2025.129857","DOIUrl":null,"url":null,"abstract":"<div><div>Behavior prediction in autonomous driving is increasingly achieved through end-to-end frameworks that predict vehicle states from multi-modal information, streamlining decision-making and enhancing robustness in time-varying road conditions. This study proposes a novel multi-modal information fusion-based, multi-task end-to-end model that integrates RGB images, depth maps, and semantic segmentation data, enhancing situational awareness and predictive precision. Utilizing a Vision Transformer (ViT) for comprehensive spatial feature extraction and a Residual-CNN-BiGRU structure for capturing temporal dependencies, the model fuses spatiotemporal features to predict vehicle speed and steering angle with high precision. Through comparative, ablation, and generalization tests on the Udacity and self-collected datasets, the proposed model achieves steering angle prediction errors of MSE 0.012 rad, RMSE 0.109 rad, and MAE 0.074 rad, and speed prediction errors of MSE 0.321 km/h, RMSE 0.567 km/h, and MAE 0.373 km/h, outperforming existing driving behavior prediction models. Key contributions of this study include the development of a channel difference attention mechanism and advanced spatiotemporal feature fusion techniques, which improve predictive accuracy and robustness. These methods effectively balance computational efficiency and predictive performance, contributing to practical advancements in driving behavior prediction.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"634 ","pages":"Article 129857"},"PeriodicalIF":6.5000,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225005296","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/1 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Behavior prediction in autonomous driving is increasingly achieved through end-to-end frameworks that predict vehicle states from multi-modal information, streamlining decision-making and enhancing robustness in time-varying road conditions. This study proposes a novel multi-modal information fusion-based, multi-task end-to-end model that integrates RGB images, depth maps, and semantic segmentation data, enhancing situational awareness and predictive precision. Utilizing a Vision Transformer (ViT) for comprehensive spatial feature extraction and a Residual-CNN-BiGRU structure for capturing temporal dependencies, the model fuses spatiotemporal features to predict vehicle speed and steering angle with high precision. Through comparative, ablation, and generalization tests on the Udacity and self-collected datasets, the proposed model achieves steering angle prediction errors of MSE 0.012 rad, RMSE 0.109 rad, and MAE 0.074 rad, and speed prediction errors of MSE 0.321 km/h, RMSE 0.567 km/h, and MAE 0.373 km/h, outperforming existing driving behavior prediction models. Key contributions of this study include the development of a channel difference attention mechanism and advanced spatiotemporal feature fusion techniques, which improve predictive accuracy and robustness. These methods effectively balance computational efficiency and predictive performance, contributing to practical advancements in driving behavior prediction.
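The abstract outlines the pipeline at a high level: per-modality ViT encoding of RGB, depth, and segmentation frames, a channel difference attention step to fuse the modalities, and a Residual-CNN-BiGRU temporal head feeding two regression outputs (steering angle and speed). The PyTorch sketch below is a minimal illustration of that pipeline only; every module name, layer size, and the exact form of the channel difference attention are assumptions, not the authors' implementation.

```python
# Minimal sketch of the architecture the abstract describes. All names,
# dimensions, and the attention formula are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelDifferenceAttention(nn.Module):
    """Hypothetical channel attention: weights channels by how much two
    modality embeddings differ, then blends the two accordingly."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = self.mlp(a - b)            # per-channel gate from the difference
        return w * a + (1.0 - w) * b


class ResidualCNNBiGRU(nn.Module):
    """Temporal head: 1-D conv with a residual connection, then a BiGRU."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D) sequence of fused per-frame features.
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = x + torch.relu(h)          # residual CNN block
        out, _ = self.gru(x)           # (B, T, 2 * hidden)
        return out[:, -1]              # last timestep summarizes the clip


class MultiModalBehaviorNet(nn.Module):
    def __init__(self, dim: int = 192):
        super().__init__()
        # Placeholder ViT-style encoder (patch embedding + transformer);
        # a pretrained ViT could be substituted here.
        self.patch = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fuse_rgb_depth = ChannelDifferenceAttention(dim)
        self.fuse_with_seg = ChannelDifferenceAttention(dim)
        self.temporal = ResidualCNNBiGRU(dim)
        self.steering_head = nn.Linear(256, 1)  # rad
        self.speed_head = nn.Linear(256, 1)     # km/h

    def encode(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B*T, 3, H, W) -> (B*T, D) via mean-pooled ViT tokens.
        tokens = self.patch(frames).flatten(2).transpose(1, 2)
        return self.encoder(tokens).mean(dim=1)

    def forward(self, rgb, depth, seg):
        # Each input: (B, T, 3, H, W); depth and segmentation maps are
        # assumed rendered as 3-channel images for the shared encoder.
        B, T = rgb.shape[:2]
        feats = [self.encode(x.flatten(0, 1)).view(B, T, -1)
                 for x in (rgb, depth, seg)]
        fused = self.fuse_rgb_depth(feats[0], feats[1])
        fused = self.fuse_with_seg(fused, feats[2])
        h = self.temporal(fused)
        return self.steering_head(h), self.speed_head(h)


if __name__ == "__main__":
    model = MultiModalBehaviorNet()
    rgb = depth = seg = torch.randn(2, 5, 3, 64, 64)
    steer, speed = model(rgb, depth, seg)
    print(steer.shape, speed.shape)  # torch.Size([2, 1]) torch.Size([2, 1])
```

The two regression heads share the fused spatiotemporal feature, which is one plausible reading of the "multi-task" framing; the paper may instead use task-specific decoders or loss weighting not shown here.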