Fast Monocular Visual-Inertial Initialization Leveraging Learned Single-View Depth

Nate Merrill, Patrick Geneva, Saimouli Katragadda, Chuchu Chen, G. Huang
Robotics: Science and Systems XIX, published 2023-07-10. DOI: 10.15607/RSS.2023.XIX.072 (https://doi.org/10.15607/RSS.2023.XIX.072)

Abstract

In monocular visual-inertial navigation systems, it is ideal to initialize as quickly and robustly as possible. State-of-the-art initialization methods typically make linear approximations using the image features and inertial information in order to initialize in closed form, and then refine the states with a nonlinear optimization. While standard methods typically wait for a 2 s data window, a recent work has shown that it is possible to initialize faster (0.5 s) by adding constraints from a robust but only up-to-scale monocular depth network in the nonlinear optimization. To further expedite initialization, in this work we instead leverage the scale-less depth measurements in the linear initialization step, which is performed before the nonlinear one and requires only a single depth image for the first frame. We show that the typical independent estimation of each feature state in the closed-form solution can be replaced by estimating just the scale and offset parameters of the learned depth map. Interestingly, our formulation makes it possible to construct small minimal problems in a RANSAC loop, whereas the minimal problem of the typical linear system is quite large and includes every feature state. Experiments show that our method can improve the overall initialization performance on popular public datasets (EuRoC MAV and TUM-VI) over state-of-the-art methods. For the TUM-VI dataset, we show superior initialization performance with only a 0.3 s window of data, which is the smallest ever reported, and show that our method can initialize more often, more robustly, and more accurately in different challenging scenarios.
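To illustrate why replacing per-feature states with two depth-map parameters yields small minimal problems, the following is a toy sketch only, not the paper's formulation. It assumes that sparse reference depths `z_ref` are somehow available for the features, and that they relate to the network depths `d_net` by an affine map `z_ref ≈ a * d_net + b`; the paper's own linear system is not given in the abstract and is not reproduced here. All function and variable names are hypothetical. With only two unknowns, each RANSAC hypothesis needs just two features.

```python
import numpy as np


def fit_scale_offset(d_net, z_ref):
    """Least-squares fit of z_ref ≈ a * d_net + b; returns (a, b)."""
    A = np.column_stack([d_net, np.ones_like(d_net)])
    (a, b), *_ = np.linalg.lstsq(A, z_ref, rcond=None)
    return a, b


def ransac_scale_offset(d_net, z_ref, iters=200, thresh=0.05, seed=0):
    """RANSAC over 2-point minimal problems for the affine depth model.

    Assumption: thresh is an absolute depth residual in the units of z_ref.
    Returns (a, b, inlier_mask) after a least-squares refit on the inliers.
    """
    rng = np.random.default_rng(seed)
    n = d_net.size
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        # Minimal sample: two features determine scale a and offset b.
        i, j = rng.choice(n, size=2, replace=False)
        if abs(d_net[i] - d_net[j]) < 1e-9:
            continue  # degenerate pair, the scale is not observable
        a = (z_ref[i] - z_ref[j]) / (d_net[i] - d_net[j])
        b = z_ref[i] - a * d_net[i]
        inliers = np.abs(a * d_net + b - z_ref) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 2:
        raise ValueError("RANSAC found no usable consensus set")
    a, b = fit_scale_offset(d_net[best_inliers], z_ref[best_inliers])
    return a, b, best_inliers


# Synthetic demo: true scale 4.0, true offset 0.5, 10 of 50 features corrupted.
demo_rng = np.random.default_rng(1)
d_net = demo_rng.uniform(0.1, 1.0, size=50)
z_ref = 4.0 * d_net + 0.5 + demo_rng.normal(0.0, 0.01, size=50)
z_ref[:10] += demo_rng.uniform(1.0, 3.0, size=10)
a, b, inliers = ransac_scale_offset(d_net, z_ref)
print(f"scale={a:.3f} offset={b:.3f} inliers={int(inliers.sum())}/50")
```

In this toy the minimal sample size stays at two regardless of how many features are tracked, whereas a system that carries one depth state per feature would grow with the feature count.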