Transformer with Global and Local Interaction for Pedestrian Trajectory Prediction
Authors: Lingyue Kong, Kun Jiang, Yuanda Wang
Venue: 2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT)
Published: 2022-12-09
DOI: 10.1109/ACAIT56212.2022.10137826 (https://doi.org/10.1109/ACAIT56212.2022.10137826)
Citation count: 0
Abstract
Accurate prediction of pedestrian trajectories is crucial for autonomous driving systems and service robots. In this paper, we further analyze pedestrian interaction patterns and propose a novel graph-based model, named GL-Net, with two encoders and one decoder. Our model first formulates the short-term spatio-temporal interaction between pedestrians within a single frame using the single sequence encoder. In this module, we use a graph attention network (GAT) and a graph-based transformer in parallel to extract local and global spatial interaction features, respectively. A set of candidate trajectories is then generated by the long sequence encoder, which extracts the full temporal dependence in the historical pedestrian trajectory and infers long-term pedestrian intention. To account for the inherent uncertainty caused by the multimodal nature of pedestrian motion, we add Gaussian noise to our spatio-temporal embedding. Evaluations on the ETH and UCY datasets show that our model achieves better performance than previous graph-based models. Moreover, our model produces more reasonable trajectories at the point of social interaction and strikes a better balance between capturing spatial interaction features and generating temporal sequences than other models.
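The parallel local/global idea in the abstract can be illustrated with a small toy sketch. It is not the GL-Net implementation: the distance-based attention weights are a stand-in for the GAT and the graph-based transformer, and the names `attention_pool`, `embed_frame`, `radius`, and `noise_std` are hypothetical, chosen only for this example. The local branch attends to pedestrians within a radius, the global branch attends to everyone in the frame, the two results are concatenated, and Gaussian noise is added to the fused embedding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(positions, radius=None):
    """Attention-weighted average of positions for each pedestrian.

    Assumption for illustration: the attention score is the negative
    Euclidean distance. radius=None attends to every pedestrian (global);
    a finite radius restricts attention to nearby pedestrians (local).
    """
    pooled = []
    for xi, yi in positions:
        dists = [math.hypot(xi - xj, yi - yj) for xj, yj in positions]
        # Each pedestrian is always its own neighbour (distance 0).
        idx = [j for j, d in enumerate(dists) if radius is None or d <= radius]
        weights = softmax([-dists[j] for j in idx])
        px = sum(w * positions[j][0] for w, j in zip(weights, idx))
        py = sum(w * positions[j][1] for w, j in zip(weights, idx))
        pooled.append((px, py))
    return pooled


def embed_frame(positions, radius=2.0, noise_std=0.1, rng=None):
    """Fuse local and global features for one frame, then add Gaussian noise."""
    if rng is None:
        rng = random.Random(0)
    local_feats = attention_pool(positions, radius)   # local branch
    global_feats = attention_pool(positions, None)    # global branch
    embeddings = []
    for loc, glob in zip(local_feats, global_feats):
        fused = list(loc) + list(glob)                # concatenate branches
        embeddings.append([v + rng.gauss(0.0, noise_std) for v in fused])
    return embeddings


if __name__ == "__main__":
    frame = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    for vec in embed_frame(frame):
        print([round(v, 3) for v in vec])
```

In this toy frame, the pedestrian at (10, 0) has no neighbours within the radius, so its local feature is just its own position, while its global feature is pulled slightly toward the other two pedestrians. Sampling the noise several times with different seeds would give several different embeddings for the same observed frame, which is one simple way to obtain multiple candidate outputs.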