Visual simultaneous localization and mapping (vSLAM) algorithm based on improved Vision Transformer semantic segmentation in dynamic scenes

Mengyuan Chen, Hangrong Guo, Runbang Qian, Guangqiang Gong, Hao Cheng
{"title":"Visual simultaneous localization and mapping (vSLAM) algorithm based on improved Vision Transformer semantic segmentation in dynamic scenes","authors":"Mengyuan Chen, Hangrong Guo, Runbang Qian, Guangqiang Gong, Hao Cheng","doi":"10.5194/ms-15-1-2024","DOIUrl":null,"url":null,"abstract":"Abstract. Identifying dynamic objects in dynamic scenes remains a challenge for traditional simultaneous localization and mapping (SLAM) algorithms. Additionally, these algorithms are not able to adequately inpaint the culling regions that result from excluding dynamic objects. In light of these challenges, this study proposes a novel visual SLAM (vSLAM) algorithm based on improved Vision Transformer semantic segmentation in dynamic scenes (VTD-SLAM), which leverages an improved Vision Transformer semantic segmentation technique to address these limitations. Specifically, VTD-SLAM utilizes a residual dual-pyramid backbone network to extract dynamic object region features and a multiclass feature transformer segmentation module to enhance the pixel weight of potential dynamic objects and to improve global semantic information for precise identification of potential dynamic objects. The method of multi-view geometry is applied to judge and remove the dynamic objects. Meanwhile, according to static information in the adjacent frames, the optimal nearest-neighbor pixel-matching method is applied to restore the static background, where the feature points are extracted for pose estimation. With validation in the public dataset TUM (The Entrepreneurial University Dataset) and real scenarios, the experimental results show that the root-mean-square error of the algorithm is reduced by 17.1 % compared with dynamic SLAM (DynaSLAM), which shows better map composition capability.\n","PeriodicalId":502917,"journal":{"name":"Mechanical Sciences","volume":"59 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mechanical Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5194/ms-15-1-2024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Abstract. Identifying dynamic objects in dynamic scenes remains a challenge for traditional simultaneous localization and mapping (SLAM) algorithms, which are also unable to adequately inpaint the culled regions left behind when dynamic objects are excluded. In light of these challenges, this study proposes a novel visual SLAM (vSLAM) algorithm based on improved Vision Transformer semantic segmentation in dynamic scenes (VTD-SLAM). Specifically, VTD-SLAM utilizes a residual dual-pyramid backbone network to extract dynamic-object region features, and a multiclass feature transformer segmentation module to enhance the pixel weight of potential dynamic objects and improve global semantic information, enabling precise identification of potential dynamic objects. Multi-view geometry is then applied to verify and remove the dynamic objects. Meanwhile, using static information from adjacent frames, an optimal nearest-neighbor pixel-matching method restores the static background, from which feature points are extracted for pose estimation. Validated on the public TUM (Technical University of Munich) RGB-D dataset and in real scenarios, the experimental results show that the root-mean-square error of the algorithm is reduced by 17.1 % compared with DynaSLAM, demonstrating better map composition capability.
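The multi-view geometry step mentioned in the abstract is the usual way to confirm that a segmented region is actually moving rather than merely movable. Below is a minimal sketch of such a depth-consistency check, assuming a pinhole RGB-D camera; the intrinsics, pose convention, and threshold are illustrative assumptions, not values from the paper.

```python
# Sketch of a multi-view geometry consistency check for flagging dynamic
# points: back-project a pixel from the current frame, reproject it into a
# reference frame, and compare the predicted depth with the depth measured
# there. All numeric values below are assumptions for illustration.
import numpy as np

K = np.array([[525.0, 0.0, 319.5],   # assumed pinhole intrinsics
              [0.0, 525.0, 239.5],   # (close to TUM RGB-D defaults)
              [0.0, 0.0, 1.0]])

def is_dynamic(u, v, z_cur, T_ref_cur, depth_ref, tau=0.05):
    """Flag pixel (u, v) with depth z_cur (m) in the current frame as dynamic.

    T_ref_cur : 4x4 pose taking current-camera points into the reference camera
    depth_ref : reference-frame depth image (HxW, metres)
    tau       : depth-consistency threshold in metres (assumed value)
    """
    # Back-project the pixel into a 3-D point in the current camera frame.
    p_cur = z_cur * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Move the point into the reference camera frame.
    p_ref = (T_ref_cur @ np.append(p_cur, 1.0))[:3]
    if p_ref[2] <= 0:                       # behind the reference camera
        return True
    # Project into the reference image and read the measured depth there.
    uv_ref = (K @ p_ref) / p_ref[2]
    ur, vr = int(round(uv_ref[0])), int(round(uv_ref[1]))
    h, w = depth_ref.shape
    if not (0 <= ur < w and 0 <= vr < h):
        return True                         # no overlap: cannot verify
    # A static point shows the same depth from both views; a large mismatch
    # means the point moved between frames (or is occluded).
    return abs(depth_ref[vr, ur] - p_ref[2]) > tau
```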
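For the background-restoration step, a hedged sketch of filling culled dynamic regions from static pixels in adjacent frames is shown below. It assumes the neighboring frames are already registered to the current view, which is a simplification of the optimal nearest-neighbor pixel matching the abstract describes.

```python
# Sketch of restoring the culled dynamic region from static pixels in adjacent
# frames: for each missing pixel, copy the value from the temporally nearest
# neighbor frame in which that pixel was observed as static. Frame-to-frame
# warping is omitted here (an assumption), so frames are treated as registered.
import numpy as np

def inpaint_from_neighbors(frame, mask, neighbor_frames, neighbor_masks):
    """Fill dynamic pixels of `frame` from the nearest static observation.

    frame           : HxWx3 image with dynamic regions to fill
    mask            : HxW boolean, True where a dynamic object was removed
    neighbor_frames : adjacent frames, ordered nearest-in-time first
    neighbor_masks  : their dynamic-object masks (same convention as `mask`)
    """
    out = frame.copy()
    unfilled = mask.copy()
    for nb_frame, nb_mask in zip(neighbor_frames, neighbor_masks):
        # Usable pixels: still missing here, and static in this neighbor.
        take = unfilled & ~nb_mask
        out[take] = nb_frame[take]
        unfilled &= ~take
        if not unfilled.any():
            break
    return out, unfilled   # unfilled pixels had no static observation at all
```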
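Finally, the reported 17.1 % figure refers to the root-mean-square error of absolute trajectory error (ATE), the standard accuracy measure on the TUM RGB-D benchmark. A minimal sketch of that metric follows; the baseline value is a placeholder, and the trajectory alignment (e.g. Horn/Umeyama) that the full benchmark tooling performs first is omitted.

```python
# Sketch of the RMSE-of-ATE metric behind the reported 17.1 % improvement over
# DynaSLAM. Trajectories are assumed already time-associated and aligned; the
# baseline number below is hypothetical, used only to show what the reported
# relative reduction would imply.
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of translational error between estimated and ground-truth
    trajectories, both given as Nx3 arrays of camera positions (metres)."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # per-pose Euclidean error
    return float(np.sqrt(np.mean(err ** 2)))

# Illustrative numbers only: what a 17.1 % reduction would mean in metres.
baseline = 0.035                       # hypothetical DynaSLAM RMSE
improved = baseline * (1 - 0.171)      # implied VTD-SLAM RMSE
print(f"baseline {baseline:.4f} m -> improved {improved:.4f} m")
```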