A Novel Cross-Attention-Based Pedestrian Visual–Inertial Odometry With Analyses Demonstrating Challenges in Dense Optical Flow

Ilari Pajula, Niclas Joswig, Aiden Morrison, Nadia Sokolova, Laura Ruotsalainen

IEEE Journal of Indoor and Seamless Positioning and Navigation, vol. 2, pp. 25–35, published 18 December 2023. DOI: 10.1109/JISPIN.2023.3344077. Full text: https://ieeexplore.ieee.org/document/10363184/
Visual–inertial odometry (VIO), the fusion of visual and inertial sensor data, has been shown to be effective for navigation in global-navigation-satellite-system-denied environments. Recently, dense-optical-flow-based, end-to-end trained deep learning VIO models have achieved superior performance in outdoor navigation. In this article, we introduce a novel visual–inertial sensor fusion approach based on a vision transformer architecture with a cross-attention mechanism, specifically designed to better integrate potentially poor-quality optical-flow features with inertial data. Although optical-flow-based VIO models have achieved superior performance in outdoor vehicle navigation, both in accuracy and in ease of calibration, we show that their suitability for indoor pedestrian navigation still falls far short of existing feature-matching-based methods. We compare the performance of traditional VIO models against deep-learning-based VIO models on the KITTI benchmark dataset and on our custom pedestrian navigation dataset. We show that end-to-end trained VIO models using optical flow are significantly outperformed by simpler visual odometry models utilizing feature matching. Our findings indicate that, owing to its robustness against occlusion and camera shake, feature matching is better suited for indoor pedestrian navigation, whereas dense optical flow remains viable for vehicular data. Therefore, the most feasible way forward is to integrate our novel model with feature-based visual data encoding.
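To make the architectural idea concrete: the cross-attention fusion described in the abstract can be sketched as a block in which inertial tokens act as queries attending over visual (optical-flow) tokens, so that unreliable flow features receive low attention weight rather than corrupting the fused motion estimate. The following PyTorch sketch is an illustrative assumption, not the authors' implementation; the class name CrossAttentionFusion, the fusion direction, the token counts, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention block for visual-inertial fusion.

    Inertial tokens (queries) attend over optical-flow tokens
    (keys/values), so poor-quality flow features can be down-weighted
    by the attention weights. Shapes and wiring are assumptions, not
    taken from the paper.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.norm_ff = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, imu_tokens: torch.Tensor, flow_tokens: torch.Tensor) -> torch.Tensor:
        # imu_tokens: (batch, n_imu, dim); flow_tokens: (batch, n_flow, dim)
        q = self.norm_q(imu_tokens)
        kv = self.norm_kv(flow_tokens)
        fused, _ = self.attn(q, kv, kv)      # cross-attention: IMU queries, flow keys/values
        x = imu_tokens + fused               # residual around attention
        return x + self.ff(self.norm_ff(x))  # residual around feed-forward


# Hypothetical usage: 10 IMU tokens and 32 flow tokens per window, batch of 4.
fusion = CrossAttentionFusion(dim=256, num_heads=8)
out = fusion(torch.randn(4, 10, 256), torch.randn(4, 32, 256))
print(out.shape)  # torch.Size([4, 10, 256])
```

Letting the inertial stream drive the queries is one plausible reading of "better integrate potentially poor-quality optical-flow features with inertial data"; the paper itself specifies the actual wiring of the transformer.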