Enhanced outdoor visual localization using Py-Net voting segmentation approach.

Frontiers in Robotics and AI (IF 2.9, Q2 Robotics) · Pub Date: 2024-10-09 · eCollection Date: 2024-01-01 · DOI: 10.3389/frobt.2024.1469588
Jing Wang, Cheng Guo, Shaoyi Hu, Yibo Wang, Xuhui Fan

Abstract

Camera relocalization determines the position and orientation of a camera in 3D space. Although methods based on scene coordinate regression yield highly accurate results in indoor scenes, they perform poorly in outdoor scenarios because of the larger scale and greater complexity of such scenes. A visual localization method, Py-Net, is therefore proposed herein. Py-Net is based on voting segmentation and comprises a main encoder containing a Py-layer and two branch decoders. The Py-layer combines pyramid convolutions and 1 × 1 convolution kernels to extract features across multiple levels with fewer parameters, enhancing the model's ability to capture scene information. Coordinate attention was added at the end of the encoder for feature correction, improving the model's robustness to interference. To prevent the feature loss caused by repetitive structures and low-texture regions in the scene, deep over-parameterized convolution modules were incorporated into the segmentation and voting decoders. Landmark segmentation and voting maps were used to establish the relation between images and landmarks in 3D space, reducing anomalies and achieving high precision with a small number of landmarks. The experimental results show that, in multiple outdoor scenes, Py-Net achieves lower distance and angle errors than existing methods. Additionally, compared with VS-Net, which also uses a voting segmentation structure, Py-Net reduces the number of parameters by 31.85% and the model size from 236 MB to 170 MB.
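The core idea behind a pyramid-convolution block is that parallel convolutions with different kernel sizes capture context at multiple scales, and a 1 × 1 convolution fuses the branches back together with few extra parameters. The sketch below is a minimal single-channel NumPy illustration of that general idea, with random weights; the kernel sizes and fusion step are assumptions for illustration, not the paper's actual Py-layer implementation.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D convolution of a single-channel map x with kernel k."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def py_layer_sketch(x, kernel_sizes=(3, 5, 7), seed=0):
    """Pyramid-convolution sketch: parallel branches at several kernel sizes
    see progressively larger neighborhoods; a 1x1-conv-style weighted sum
    then fuses the multi-scale branch outputs into one feature map."""
    rng = np.random.default_rng(seed)
    branches = [conv2d_same(x, rng.standard_normal((k, k)) / k**2)
                for k in kernel_sizes]
    fuse = rng.standard_normal(len(branches))  # plays the role of 1x1 conv weights
    return sum(w * b for w, b in zip(fuse, branches))

feat = np.arange(64.0).reshape(8, 8)   # toy single-channel feature map
out = py_layer_sketch(feat)
print(out.shape)  # (8, 8): spatial size preserved by 'same' padding
```

In a real network each branch would have many channels and learned weights, but the structure is the same: multi-scale extraction followed by a cheap pointwise fusion, which is how the Py-layer keeps the parameter count low.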

Source journal: Frontiers in Robotics and AI
CiteScore: 6.50
Self-citation rate: 5.90%
Articles published: 355
Review time: 14 weeks
Journal description: Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.
Latest articles from this journal:
- Embedding-based pair generation for contrastive representation learning in audio-visual surveillance data.
- Advanced robotics for automated EV battery testing using electrochemical impedance spectroscopy.
- Pig tongue soft robot mimicking intrinsic tongue muscle structure.
- A fast monocular 6D pose estimation method for textureless objects based on perceptual hashing and template matching.
- Semantic segmentation using synthetic images of underwater marine-growth.