A Novel Cross-Attention-Based Pedestrian Visual–Inertial Odometry With Analyses Demonstrating Challenges in Dense Optical Flow

Ilari Pajula, Niclas Joswig, Aiden Morrison, Nadia Sokolova, Laura Ruotsalainen

IEEE Journal of Indoor and Seamless Positioning and Navigation, vol. 2, pp. 25–35, published 18 December 2023. DOI: 10.1109/JISPIN.2023.3344077. Full text: https://ieeexplore.ieee.org/document/10363184/
Visual–inertial odometry (VIO), the fusion of visual and inertial sensor data, has been shown to be effective for navigation in global-navigation-satellite-system-denied environments. Recently, dense-optical-flow-based, end-to-end trained deep learning VIO models have achieved superior performance in outdoor navigation. In this article, we introduce a novel visual–inertial sensor fusion approach based on a vision transformer architecture with a cross-attention mechanism, specifically designed to better integrate potentially poor-quality optical-flow features with inertial data. Although optical-flow-based VIO models have achieved superior performance in outdoor vehicle navigation, both in accuracy and in ease of calibration, we show that their suitability for indoor pedestrian navigation still falls far short of existing feature-matching-based methods. We compare the performance of traditional VIO models against deep-learning-based VIO models on the KITTI benchmark dataset and on our custom pedestrian navigation dataset. We show that end-to-end trained VIO models using optical flow are significantly outperformed by simpler visual odometry models utilizing feature matching. Our findings indicate that, owing to its robustness against occlusion and camera shake, feature matching is better suited for indoor pedestrian navigation, whereas dense optical flow remains viable for vehicular data. Therefore, the most feasible way forward is to integrate our novel model with feature-based visual data encoding.
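To make the architectural idea concrete: the cross-attention fusion described in the abstract can be sketched as a block in which inertial tokens act as queries attending over visual (optical-flow) tokens, so that unreliable flow features receive low attention weight rather than corrupting the fused motion estimate. The following PyTorch sketch is an illustrative assumption, not the authors' implementation; the class name CrossAttentionFusion, the fusion direction, the token counts, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention block for visual-inertial fusion.

    Inertial tokens (queries) attend over optical-flow tokens
    (keys/values), so poor-quality flow features can be down-weighted
    by the attention weights. Shapes and wiring are assumptions, not
    taken from the paper.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.norm_ff = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, imu_tokens: torch.Tensor, flow_tokens: torch.Tensor) -> torch.Tensor:
        # imu_tokens: (batch, n_imu, dim); flow_tokens: (batch, n_flow, dim)
        q = self.norm_q(imu_tokens)
        kv = self.norm_kv(flow_tokens)
        fused, _ = self.attn(q, kv, kv)      # cross-attention: IMU queries, flow keys/values
        x = imu_tokens + fused               # residual around attention
        return x + self.ff(self.norm_ff(x))  # residual around feed-forward


# Hypothetical usage: 10 IMU tokens and 32 flow tokens per window, batch of 4.
fusion = CrossAttentionFusion(dim=256, num_heads=8)
out = fusion(torch.randn(4, 10, 256), torch.randn(4, 32, 256))
print(out.shape)  # torch.Size([4, 10, 256])
```

Letting the inertial stream drive the queries is one plausible reading of "better integrate potentially poor-quality optical-flow features with inertial data"; the paper itself specifies the actual wiring of the transformer.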