SAN: Scene Anchor Networks for Joint Action-Space Prediction
Faris Janjos, Maxim Dolgov, Muhamed Kuric, Yinzhe Shen, J. M. Zöllner
2022 IEEE Intelligent Vehicles Symposium (IV), June 2022. DOI: 10.1109/iv51971.2022.9827239
Abstract
In this work, we present a novel multi-modal trajectory prediction architecture. We decompose the uncertainty of future trajectories into higher-level scene characteristics and lower-level motion characteristics, and model multi-modality along each dimension separately. Scene uncertainty is captured jointly: diversity of scene modes is ensured by training multiple separate anchor networks, each of which specializes in a different scene realization. At the same time, each network outputs multiple trajectories that cover the smaller deviations possible within a given scene mode, thus capturing motion modes. In addition, we train our architectures with an outlier-robust regression loss function that trades off between the outlier-sensitive L2 loss and the outlier-insensitive L1 loss. Our scene anchor model improves on the state of the art on the INTERACTION dataset, outperforming the StarNet architecture from our previous work.
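To make the two ideas in the abstract concrete, here is a minimal PyTorch sketch: an ensemble of separate anchor networks (one per scene mode), each emitting several trajectories (motion modes), trained with a winner-take-all objective under a robust loss. All names, layer sizes, the winner-take-all scheme, and the choice of the Huber loss as the L1/L2 trade-off are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch only; module names, sizes, winner-take-all training,
# and the Huber loss are assumptions, not the paper's exact method.
import torch
import torch.nn as nn


class AnchorNetwork(nn.Module):
    """One anchor network: specializes in a single scene mode and outputs
    K trajectories covering smaller motion deviations within that mode."""

    def __init__(self, feat_dim: int, horizon: int, num_motion_modes: int):
        super().__init__()
        self.horizon = horizon
        self.num_motion_modes = num_motion_modes
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_motion_modes * horizon * 2),  # (x, y) per step
        )

    def forward(self, scene_feat: torch.Tensor) -> torch.Tensor:
        out = self.head(scene_feat)                 # (B, K * T * 2)
        return out.view(-1, self.num_motion_modes, self.horizon, 2)


class SceneAnchorEnsemble(nn.Module):
    """M separate anchor networks, one per scene mode."""

    def __init__(self, feat_dim: int, horizon: int,
                 num_scene_modes: int, num_motion_modes: int):
        super().__init__()
        self.anchors = nn.ModuleList(
            AnchorNetwork(feat_dim, horizon, num_motion_modes)
            for _ in range(num_scene_modes)
        )

    def forward(self, scene_feat: torch.Tensor) -> torch.Tensor:
        # Stack per-network outputs: (B, M, K, T, 2), M scene x K motion modes.
        return torch.stack([net(scene_feat) for net in self.anchors], dim=1)


def robust_wta_loss(pred: torch.Tensor, gt: torch.Tensor,
                    delta: float = 1.0) -> torch.Tensor:
    """Winner-take-all training with a Huber (smooth-L1-style) loss:
    quadratic like L2 for small errors, linear like L1 for outliers.
    Only the best-matching (scene, motion) mode receives gradient, which
    encourages the anchor networks to specialize to different scenes."""
    B, M, K, T, _ = pred.shape
    err = torch.norm(pred - gt[:, None, None], dim=-1)     # (B, M, K, T)
    huber = torch.where(err <= delta,
                        0.5 * err ** 2,
                        delta * (err - 0.5 * delta))
    per_mode = huber.mean(dim=-1).view(B, M * K)           # avg over horizon
    return per_mode.min(dim=-1).values.mean()              # winner take all


# Hypothetical usage with toy shapes:
model = SceneAnchorEnsemble(feat_dim=64, horizon=30,
                            num_scene_modes=4, num_motion_modes=3)
scene_feat = torch.randn(8, 64)          # batch of encoded scene features
gt = torch.randn(8, 30, 2)               # ground-truth future positions
loss = robust_wta_loss(model(scene_feat), gt)
loss.backward()
```

Because the minimum in the winner-take-all loss is taken over all M x K hypotheses, each anchor network is only updated when one of its trajectories is the best match, which is one simple way the separate networks can end up covering distinct scene realizations.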