NeSLAM: Neural Implicit Mapping and Self-Supervised Feature Tracking With Depth Completion and Denoising
Authors: Tianchen Deng; Yanbo Wang; Hongle Xie; Hesheng Wang; Rui Guo; Jingchuan Wang; Danwei Wang; Weidong Chen
Journal: IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 12309-12321
Publication date: 2025-02-11
Publication type: Journal Article
DOI: 10.1109/TASE.2025.3541064
URL: https://ieeexplore.ieee.org/document/10879467/
Impact factor: 6.4 (JCR Q1, Automation & Control Systems)
Region category: Computer Science (region 2)
Open access: no
Citations: 0
Abstract
In recent years, there have been significant advancements in 3D reconstruction and dense RGB-D SLAM systems. One notable development is the application of Neural Radiance Fields (NeRF) in these systems, which use an implicit neural representation to encode 3D scenes. However, the depth images obtained from consumer-grade RGB-D sensors are often sparse and noisy, which poses significant challenges for 3D reconstruction and degrades the accuracy of the represented scene geometry. Furthermore, existing methods select random pixels for camera tracking, leading to inaccurate localization in real-world indoor environments. To this end, we present NeSLAM, a framework that achieves accurate and dense depth estimation, robust camera tracking, and realistic synthesis of novel views. First, a depth completion and denoising network is designed to provide a dense geometry prior and guide the optimization of the neural implicit representation. Second, we propose a NeRF-based self-supervised feature tracking algorithm for robust real-time tracking. Experiments on various indoor datasets demonstrate the effectiveness and accuracy of the system in reconstruction, tracking quality, and novel view synthesis.
Note to Practitioners: Traditional SLAM methods usually represent the scene with a sparse point cloud, which limits their scene representation capability. Our method combines a neural implicit representation with a depth completion and denoising network and a feature tracking method, achieving accurate scene reconstruction and pose estimation in various indoor scenes. The depth completion and denoising network provides accurate depth information together with an associated depth uncertainty, which is used to improve geometric consistency. The NeRF-based self-supervised feature tracking method improves the accuracy and robustness of camera tracking. The experimental results demonstrate the accuracy and effectiveness of this method in different scenes.
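The abstract says the network's depth estimate comes with an uncertainty that is used to improve geometric consistency, but it does not give the loss. The sketch below is one plausible illustration under stated assumptions, not the authors' formulation. It renders an expected depth along a single ray with standard NeRF-style volume rendering, then scores it against a depth prior with a Gaussian negative log-likelihood, so that low-uncertainty pixels constrain the geometry more strongly. The function names, the per-pixel standard deviation `prior_std`, and the Gaussian form are all hypothetical choices made for this example.

```python
import math


def render_depth(sample_depths, densities):
    """Expected depth along one ray via NeRF-style volume rendering.

    sample_depths: increasing distances of the samples along the ray.
    densities: non-negative volume density at each sample.
    Returns (expected_depth, accumulated_weight).
    """
    transmittance = 1.0  # probability the ray has not been absorbed yet
    depth = 0.0
    total_weight = 0.0
    for i, (t, sigma) in enumerate(zip(sample_depths, densities)):
        # Spacing to the next sample; the last sample reuses the previous spacing.
        if i + 1 < len(sample_depths):
            delta = sample_depths[i + 1] - t
        else:
            delta = t - sample_depths[i - 1]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        depth += weight * t
        total_weight += weight
        transmittance *= 1.0 - alpha
    return depth, total_weight


def uncertainty_weighted_depth_loss(rendered, prior_depth, prior_std, eps=1e-6):
    """Gaussian negative log-likelihood of the rendered depth under the prior.

    ASSUMPTION: the depth prior is modelled as N(prior_depth, prior_std^2).
    This is an illustrative choice; the paper's actual loss may differ.
    """
    var = prior_std ** 2 + eps
    return 0.5 * ((rendered - prior_depth) ** 2 / var + math.log(var))


if __name__ == "__main__":
    # A ray whose density is concentrated at the second of three samples.
    d, w = render_depth([1.0, 2.0, 3.0], [0.0, 1000.0, 0.0])
    # The same 0.1 m residual is penalised more when the prior is confident.
    confident = uncertainty_weighted_depth_loss(d, d + 0.1, prior_std=0.01)
    uncertain = uncertainty_weighted_depth_loss(d, d + 0.1, prior_std=0.5)
    print(d, w, confident > uncertain)
```

In this formulation, the same depth residual costs more at a pixel where the prior reports a small standard deviation than at one where it reports a large one. That is one way a denoising network's confidence could steer the optimization of an implicit map.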
About the journal:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.