Fast Monocular Visual-Inertial Initialization Leveraging Learned Single-View Depth

Robotics: Science and Systems XIX. Pub Date: 2023-07-10. DOI: 10.15607/RSS.2023.XIX.072

Nate Merrill, Patrick Geneva, Saimouli Katragadda, Chuchu Chen, G. Huang

Citations: 1

Abstract

In monocular visual-inertial navigation systems, it is ideal to initialize as quickly and robustly as possible. State-of-the-art initialization methods typically make linear approximations using the image features and inertial information to initialize in closed form, and then refine the states with a nonlinear optimization. While the standard methods typically wait for a 2 s data window, a recent work has shown that it is possible to initialize faster (0.5 s) by adding constraints from a robust, but only up-to-scale, monocular depth network in the nonlinear optimization. To further expedite the initialization, in this work we instead use the scale-less depth measurements in the linear initialization step that precedes the nonlinear one, which requires only a single depth image for the first frame. We show that the typical independent estimation of each feature state in the closed-form solution can be replaced by estimating just the scale and offset parameters of the learned depth map. Interestingly, our formulation makes it possible to construct small minimal problems in a RANSAC loop, whereas the typical linear system’s minimal problem is quite large and includes every feature state. Experiments show that our method can improve the overall initialization performance on popular public datasets (EuRoC MAV and TUM-VI) over state-of-the-art methods. For the TUM-VI dataset, we show superior initialization performance with only a 0.3 s window of data, which is the smallest window reported to date, and show that our method can initialize more often, more robustly, and more accurately in different challenging scenarios.
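To illustrate why a two-parameter depth model permits small minimal problems, the sketch below fits a scale `a` and offset `b` inside a RANSAC loop using two-point samples. It is a toy example and not the paper's linear system, which also involves inertial terms. The affine model `z ≈ a * d + b`, the availability of reference depths `z`, and the names `fit_scale_offset`, `ransac_scale_offset`, `iters`, and `thresh` are all assumptions made for illustration.

```python
import numpy as np


def fit_scale_offset(d, z):
    """Least-squares fit of the assumed affine model z ≈ a * d + b."""
    A = np.stack([d, np.ones_like(d)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, z, rcond=None)
    return float(a), float(b)


def ransac_scale_offset(d, z, iters=200, thresh=0.05, seed=0):
    """Robustly estimate (a, b) from learned depths d and reference depths z.

    Each hypothesis is solved from only two samples, because the unknowns
    are the two depth-map parameters rather than one state per feature.
    """
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(d.shape[0], dtype=bool)

    for _ in range(iters):
        i, j = rng.choice(d.shape[0], size=2, replace=False)
        if abs(d[i] - d[j]) < 1e-9:
            continue  # degenerate pair: both samples share the same depth
        # Closed-form solution of the two-point minimal problem.
        a = (z[i] - z[j]) / (d[i] - d[j])
        b = z[i] - a * d[i]
        inliers = np.abs(a * d + b - z) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    if best_inliers.sum() < 2:
        raise ValueError("RANSAC did not find a consistent hypothesis")

    # Refine on the consensus set with ordinary least squares.
    a, b = fit_scale_offset(d[best_inliers], z[best_inliers])
    return a, b, best_inliers
```

Calling `ransac_scale_offset(d, z)` returns the refined `a`, `b`, and a boolean inlier mask. With one unknown per feature, a minimal sample would have to grow with the number of features, which is the contrast the abstract draws.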


