Title: Enhanced outdoor visual localization using Py-Net voting segmentation approach
Authors: Jing Wang, Cheng Guo, Shaoyi Hu, Yibo Wang, Xuhui Fan
Journal: Frontiers in Robotics and AI, Vol. 11, Article 1469588 (JCR Q2, Robotics; IF 2.9)
Published: 2024-10-09 (eCollection)
DOI: 10.3389/frobt.2024.1469588
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11497456/pdf/
Citations: 0
Abstract
Camera relocalization determines the position and orientation of a camera in 3D space. Although methods based on scene coordinate regression yield highly accurate results in indoor scenes, they exhibit poor performance in outdoor scenarios due to the larger scale and greater complexity of outdoor environments. A visual localization method, Py-Net, is therefore proposed herein. Py-Net is based on voting segmentation and comprises a main encoder containing the Py-layer and two branch decoders. The Py-layer combines pyramid convolution with 1 × 1 convolution kernels to extract features across multiple levels, using fewer parameters while enhancing the model's ability to capture scene information. Coordinate attention was added at the end of the encoder for feature correction, which improved the model's robustness to interference. To prevent feature loss caused by repetitive structures and low-texture regions in the scene, deep over-parameterized convolution modules were incorporated into the segmentation (seg) and voting (vote) decoders. Landmark segmentation and voting maps were used to establish the relation between images and landmarks in 3D space, reducing anomalies and achieving high precision with a small number of landmarks. The experimental results show that, in multiple outdoor scenes, Py-Net achieves lower distance and angle errors than existing methods. Additionally, compared to VS-Net, which also uses a voting segmentation structure, Py-Net reduces the number of parameters by 31.85% and decreases the model size from 236 MB to 170 MB.
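To make the abstract's description of the Py-layer concrete, below is a minimal PyTorch sketch of a pyramid-convolution block fused by a 1 × 1 convolution. The kernel sizes, group counts, and channel split are illustrative assumptions, not values taken from the paper; the sketch only shows the general idea of multi-level feature extraction with grouped convolutions to keep the parameter count low.

```python
import torch
import torch.nn as nn


class PyLayerSketch(nn.Module):
    """Illustrative Py-layer-style block (assumed structure, not the paper's exact design):
    parallel grouped convolutions with increasing kernel sizes form a pyramid of
    receptive fields, and a 1x1 convolution fuses the levels."""

    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7), groups=(1, 4, 8)):
        super().__init__()
        # in_ch and out_ch // len(kernel_sizes) must be divisible by each group count.
        assert out_ch % len(kernel_sizes) == 0
        branch_ch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2, groups=g, bias=False)
            for k, g in zip(kernel_sizes, groups)
        ])
        # 1x1 convolution mixes the multi-scale branches at low parameter cost.
        self.fuse = nn.Conv2d(out_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.bn(self.fuse(y)))


# Example usage with assumed channel sizes:
layer = PyLayerSketch(64, 96)
x = torch.randn(1, 64, 120, 160)
print(layer(x).shape)  # torch.Size([1, 96, 120, 160])
```

The grouped pyramid branches use far fewer weights than a single full-rank large-kernel convolution of comparable receptive field, which is consistent with the abstract's claim of multi-level feature extraction with fewer parameters.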
Journal introduction:
Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.