Decouple Ego-View Motions for Predicting Pedestrian Trajectory and Intention

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society (IF 13.7) | Pub Date: 2024-08-26 | DOI: 10.1109/TIP.2024.3445734

Zhengming Zhang; Zhengming Ding; Renran Tian

Volume 33, pages 4716-4727. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10648593/

Citations: 0

Abstract

Pedestrian trajectory prediction is a critical component of autonomous driving in urban environments, allowing vehicles to anticipate pedestrian movements and facilitate safer interactions. While egocentric-view-based algorithms can reduce the sensing and computation burdens of 3D scene reconstruction, accurately predicting pedestrian trajectories and interpreting their intentions from this perspective requires a better understanding of the coupled vehicle (camera) and pedestrian motions, which has not been adequately addressed by existing models. In this paper, we present a novel egocentric pedestrian trajectory prediction approach that uses a two-tower structure and multi-modal inputs. One tower, the vehicle module, receives only the initial pedestrian position and ego-vehicle actions and speed, while the other, the pedestrian module, receives additional prior pedestrian trajectory and visual features. Our proposed action-aware loss function allows the two-tower model to decompose pedestrian trajectory predictions into two parts, caused by ego-vehicle movement and pedestrian movement, respectively, even when only trained on combined ego-view motions. This decomposition increases model flexibility and provides a better estimation of pedestrian actions and intentions, enhancing overall performance. Experiments on three publicly available benchmark datasets show that our proposed model outperforms all existing algorithms in ego-view pedestrian trajectory prediction accuracy.
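The decomposition described in the abstract can be illustrated with a small sketch. The abstract does not give the towers' internals or the exact form of the action-aware loss, so everything below is an assumption made for illustration: the two towers are replaced by hand-written per-step offsets, the helper names `compose_trajectory` and `action_aware_loss` are hypothetical, and the penalty term (discouraging pedestrian-tower output at steps labeled as not moving) is only one plausible way such a loss might separate the two motion sources.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]  # (x, y) offset or position in image coordinates


def compose_trajectory(start: Vec, ego_part: Sequence[Vec],
                       ped_part: Sequence[Vec]) -> List[Vec]:
    """Sum the per-step offsets from both towers and accumulate from the start position."""
    x, y = start
    path: List[Vec] = []
    for (ex, ey), (px, py) in zip(ego_part, ped_part):
        x += ex + px
        y += ey + py
        path.append((x, y))
    return path


def action_aware_loss(pred: Sequence[Vec], target: Sequence[Vec],
                      ped_part: Sequence[Vec], ped_is_moving: Sequence[bool],
                      weight: float = 1.0) -> float:
    """ASSUMED loss form (not taken from the paper): mean squared error on the
    combined trajectory, plus a penalty on pedestrian-tower offsets at steps
    where the pedestrian is labeled as not moving."""
    n = len(target)
    mse = sum((px - tx) ** 2 + (py - ty) ** 2
              for (px, py), (tx, ty) in zip(pred, target)) / n
    idle = sum(dx * dx + dy * dy
               for (dx, dy), moving in zip(ped_part, ped_is_moving)
               if not moving) / n
    return mse + weight * idle


# Two decompositions that produce the same combined ego-view trajectory.
start = (100.0, 50.0)
target = [(102.0, 50.0), (105.0, 50.5)]
moving = [False, True]  # hypothetical action labels: standing, then walking

ego_a, ped_a = [(2.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (1.0, 0.5)]
ego_b, ped_b = [(0.0, 0.0), (0.0, 0.0)], [(2.0, 0.0), (3.0, 0.5)]

pred_a = compose_trajectory(start, ego_a, ped_a)
pred_b = compose_trajectory(start, ego_b, ped_b)
loss_a = action_aware_loss(pred_a, target, ped_a, moving)
loss_b = action_aware_loss(pred_b, target, ped_b, moving)
```

Both decompositions match the observed combined trajectory equally well, but under this assumed penalty only the first one, which attributes the motion during the standing step to the ego-vehicle, is free of the extra cost. That is the sense in which a loss informed by pedestrian actions could separate the two sources even when only combined motion is observed.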

