{"title":"Visual SLAM algorithm in dynamic environment based on deep learning","authors":"Yingjie Yu, Shuai Chen, Xinpeng Yang, Changzhen Xu, Sen Zhang, Wendong Xiao","doi":"10.1108/ir-04-2024-0166","DOIUrl":null,"url":null,"abstract":"<h3>Purpose</h3>\n<p>This paper proposes a self-supervised monocular depth estimation algorithm under multiple constraints, which can generate the corresponding depth map end-to-end based on RGB images. On this basis, based on the traditional visual simultaneous localisation and mapping (VSLAM) framework, a dynamic object detection framework based on deep learning is introduced, and dynamic objects in the scene are culled during mapping.</p><!--/ Abstract__block -->\n<h3>Design/methodology/approach</h3>\n<p>Typical SLAM algorithms or data sets assume a static environment and do not consider the potential consequences of accidentally adding dynamic objects to a 3D map. This shortcoming limits the applicability of VSLAM in many practical cases, such as long-term mapping. In light of the aforementioned considerations, this paper presents a self-supervised monocular depth estimation algorithm based on deep learning. Furthermore, this paper introduces the YOLOv5 dynamic detection framework into the traditional ORBSLAM2 algorithm for the purpose of removing dynamic objects.</p><!--/ Abstract__block -->\n<h3>Findings</h3>\n<p>Compared with Dyna-SLAM, the algorithm proposed in this paper reduces the error by about 13%, and compared with ORB-SLAM2 by about 54.9%. In addition, the algorithm in this paper can process a single frame of image at a speed of 15–20 FPS on GeForce RTX 2080s, far exceeding Dyna-SLAM in real-time performance.</p><!--/ Abstract__block -->\n<h3>Originality/value</h3>\n<p>This paper proposes a VSLAM algorithm that can be applied to dynamic environments. 
The algorithm consists of a self-supervised monocular depth estimation part under multiple constraints and the introduction of a dynamic object detection framework based on YOLOv5.</p><!--/ Abstract__block -->","PeriodicalId":501389,"journal":{"name":"Industrial Robot","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Industrial Robot","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/ir-04-2024-0166","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Purpose
This paper proposes a self-supervised monocular depth estimation algorithm under multiple constraints, which generates the corresponding depth map end-to-end from RGB images. Building on the traditional visual simultaneous localisation and mapping (VSLAM) framework, a deep-learning-based dynamic object detection framework is then introduced, and dynamic objects in the scene are removed during mapping.
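The "multiple constraints" of self-supervised monocular depth training typically include a photometric reconstruction error between the target frame and a view synthesised from a neighbouring frame, plus an edge-aware smoothness term on the predicted disparity. The sketch below illustrates two such losses in NumPy; the exact constraints used in the paper are not specified here, and all function names are illustrative assumptions.

```python
import numpy as np

def photometric_l1(target, reconstructed):
    """Mean absolute intensity difference between the target frame and the
    source frame warped into the target view (illustrative L1 term only)."""
    return np.mean(np.abs(target - reconstructed))

def edge_aware_smoothness(disparity, image):
    """Penalise disparity gradients, down-weighted where the image itself
    has strong gradients (edges), so depth discontinuities survive."""
    d_dx = np.abs(np.diff(disparity, axis=1))          # (H, W-1)
    d_dy = np.abs(np.diff(disparity, axis=0))          # (H-1, W)
    i_dx = np.mean(np.abs(np.diff(image, axis=1)), axis=-1)
    i_dy = np.mean(np.abs(np.diff(image, axis=0)), axis=-1)
    return np.mean(d_dx * np.exp(-i_dx)) + np.mean(d_dy * np.exp(-i_dy))

# Toy demo: a perfect reconstruction has zero photometric error.
rng = np.random.default_rng(0)
target = rng.random((8, 8, 3))
disparity = rng.random((8, 8))
loss = photometric_l1(target, target) + 0.1 * edge_aware_smoothness(disparity, target)
print(loss)
```

In practice these terms are minimised jointly over a training set; the weighting (0.1 above) is a hyperparameter, not a value from the paper.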
Design/methodology/approach
Typical SLAM algorithms and data sets assume a static environment and do not consider the consequences of inadvertently adding dynamic objects to a 3D map. This shortcoming limits the applicability of VSLAM in many practical cases, such as long-term mapping. In light of these considerations, this paper presents a self-supervised monocular depth estimation algorithm based on deep learning. Furthermore, it introduces the YOLOv5 dynamic detection framework into the traditional ORB-SLAM2 algorithm to remove dynamic objects.
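A common way to combine a detector with ORB-SLAM2 is to discard ORB keypoints that fall inside bounding boxes of dynamic classes before tracking and mapping. The sketch below shows that culling step in NumPy; the detection format (x1, y1, x2, y2, class_id), the class indices, and the function name are assumptions for illustration, not the paper's actual interface.

```python
import numpy as np

# Hypothetical set of YOLOv5 class indices treated as dynamic
# (e.g. person and car in COCO indexing).
DYNAMIC_CLASSES = {0, 2}

def cull_dynamic_keypoints(keypoints, detections):
    """keypoints: (N, 2) array of pixel coordinates (u, v).
    detections: iterable of (x1, y1, x2, y2, class_id) boxes.
    Returns only the keypoints outside every dynamic-class box."""
    keep = np.ones(len(keypoints), dtype=bool)
    for x1, y1, x2, y2, cls in detections:
        if cls not in DYNAMIC_CLASSES:
            continue  # features on static objects are kept
        u, v = keypoints[:, 0], keypoints[:, 1]
        inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
        keep &= ~inside
    return keypoints[keep]

kps = np.array([[10, 10], [50, 50], [90, 90]], dtype=float)
dets = [(40, 40, 60, 60, 0)]  # a "person" box covering the middle keypoint
print(cull_dynamic_keypoints(kps, dets))
```

Only the surviving static-scene features are then fed to pose estimation and map building, which is what keeps moving objects out of the long-term map.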
Findings
The algorithm proposed in this paper reduces the error by about 13% compared with Dyna-SLAM and by about 54.9% compared with ORB-SLAM2. In addition, it can process a single image frame at 15–20 FPS on a GeForce RTX 2080, far exceeding Dyna-SLAM in real-time performance.
Originality/value
This paper proposes a VSLAM algorithm that can be applied to dynamic environments. The algorithm combines self-supervised monocular depth estimation under multiple constraints with a YOLOv5-based dynamic object detection framework.