SLAM2: Simultaneous Localization and Multimode Mapping for indoor dynamic environments

Pattern Recognition (IF 7.6; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-02-01. Epub Date: 2024-09-28. DOI: 10.1016/j.patcog.2024.111054

Zhihao Lin, Qi Zhang, Zhen Tian, Peizhuo Yu, Ziyang Ye, Hanyang Zhuang, Jianglin Lan

Pattern Recognition, vol. 158, Article 111054 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0031320324008057


Abstract

Traditional visual Simultaneous Localization and Mapping (SLAM) methods based on point features are often limited by strong static assumptions and texture information, resulting in inaccurate camera pose estimation and object localization. To address these challenges, we present SLAM², a novel semantic RGB-D SLAM system that can obtain accurate estimation of the camera pose and the 6DOF pose of other objects, resulting in complete and clean static 3D model mapping in dynamic environments. Our system makes full use of the point, line, and plane features in space to enhance the camera pose estimation accuracy. It combines the traditional geometric method with a deep learning method to detect both known and unknown dynamic objects in the scene. Moreover, our system is designed with a three-mode mapping method, including dense, semi-dense, and sparse, where the mode can be selected according to the needs of different tasks. This makes our visual SLAM system applicable to diverse application areas. Evaluation in the TUM RGB-D and Bonn RGB-D datasets demonstrates that our SLAM system achieves the most advanced localization accuracy and the cleanest static 3D mapping of the scene in dynamic environments, compared to state-of-the-art methods. Specifically, our system achieves a root mean square error (RMSE) of 0.018 m in the highly dynamic TUM w/half sequence, outperforming ORB-SLAM3 (0.231 m) and DRG-SLAM (0.025 m). In the Bonn dataset, our system demonstrates superior performance in 14 out of 18 sequences, with an average RMSE reduction of 27.3% compared to the next best method.
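The RMSE figures quoted in the abstract can be read with the help of a small sketch. The code below is not the authors' evaluation code; it is a minimal illustration that assumes the estimated and ground-truth camera positions are already time-associated and expressed in the same frame (trajectory alignment is omitted). The function names `trajectory_rmse` and `relative_reduction` are hypothetical, and the example positions are invented purely for illustration.

```python
import math


def trajectory_rmse(estimated, ground_truth):
    """Root mean square of the translational error, in metres.

    Assumes both inputs are equal-length sequences of (x, y, z) positions
    that are already paired by timestamp and aligned to a common frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline, ours):
    """Fractional error reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


# Toy example with made-up positions (not data from the paper).
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.03), (1.0, 0.04, 0.0)]
print(f"toy RMSE: {trajectory_rmse(est, gt):.4f} m")

# Relative reductions implied by the RMSE values quoted in the abstract
# for the TUM w/half sequence.
print(f"vs ORB-SLAM3: {relative_reduction(0.231, 0.018):.1%}")
print(f"vs DRG-SLAM:  {relative_reduction(0.025, 0.018):.1%}")
```

On the quoted numbers, 0.018 m against 0.231 m and 0.025 m corresponds to roughly a 92% and a 28% reduction for that single sequence. These are plain arithmetic on the abstract's figures and are separate from the 27.3% average the abstract reports for the Bonn dataset.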




Source journal

Pattern Recognition (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months

Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.