Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino
Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency, and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on par results in terms of reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.
{"title":"Resolution Where It Counts: Hash-based GPU-Accelerated 3D Reconstruction via Variance-Adaptive Voxel Grids","authors":"Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino","doi":"10.1145/3777909","DOIUrl":"https://doi.org/10.1145/3777909","url":null,"abstract":"Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency, and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on par results in terms of reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"204 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taehei Kim, Jihun Shin, Hyeshim Kim, Hyuckjin Jang, Jiho Kang, Sung-Hee Lee
We propose a multi-user Mixed Reality (MR) telepresence system that allows users to interact by seamlessly visualizing remote environments and avatars overlaid onto their local physical space. Building on prior shared-space approaches, our method first aligns overlapping rooms to maximize a shared space —a common area containing matched real and virtual objects where all users can interact. Uniquely, our system extends beyond this shared space by visualizing non-shared spaces, the remaining part of each room, allowing users to inhabit these distinct areas. To address the issue of overlap between non-shared spaces, we dynamically adjust their visibility based on user proximity, using a Voronoi diagram to prioritize subspaces closer to each user. Visualizing the surrounding space of each user conveys spatial context, helping others interpret their behavior within their environment. Visibility is updated in real time as users move, maintaining a coherent sense of spatial awareness. Through a user study, we demonstrate that our system enhances enjoyment, spatial understanding, and presence compared to shared-space-only approaches. Quantitative results further show that our dynamic visibility modulation improves both personal space preservation and space accessibility relative to static methods. Overall, our system provides users with a seamless, dynamically connected, and shared multi-room environment.
{"title":"Voronoi Rooms: Dynamic Visibility Modulation of Overlapping Spaces for Telepresence","authors":"Taehei Kim, Jihun Shin, Hyeshim Kim, Hyuckjin Jang, Jiho Kang, Sung-Hee Lee","doi":"10.1145/3777900","DOIUrl":"https://doi.org/10.1145/3777900","url":null,"abstract":"We propose a multi-user Mixed Reality (MR) telepresence system that allows users to interact by seamlessly visualizing remote environments and avatars overlaid onto their local physical space. Building on prior shared-space approaches, our method first aligns overlapping rooms to maximize a <jats:italic toggle=\"yes\">shared space</jats:italic> —a common area containing matched real and virtual objects where all users can interact. Uniquely, our system extends beyond this shared space by visualizing non-shared spaces, the remaining part of each room, allowing users to inhabit these distinct areas. To address the issue of overlap between non-shared spaces, we dynamically adjust their visibility based on user proximity, using a Voronoi diagram to prioritize subspaces closer to each user. Visualizing the surrounding space of each user conveys spatial context, helping others interpret their behavior within their environment. Visibility is updated in real time as users move, maintaining a coherent sense of spatial awareness. Through a user study, we demonstrate that our system enhances enjoyment, spatial understanding, and presence compared to shared-space-only approaches. Quantitative results further show that our dynamic visibility modulation improves both personal space preservation and space accessibility relative to static methods. Overall, our system provides users with a seamless, dynamically connected, and shared multi-room environment.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"6 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Light Transport Operators (LTOs) represent a fundamental concept in computer graphics, modeling single bounces of light within a virtual environment as linears operators on infinite dimensional spaces. While the LTOs play a crucial role in rendering, prior studies have primarily focused on spectral analyses of the light field rather than the operators themselves. This paper presents a rigorous investigation into the spectral properties of the LTOs. Due to their non-compact nature, traditional spectral analysis techniques face challenges in this setting. However, many practical rendering methods effectively employ compact approximations, suggesting that non-compactness is not an absolute barrier. We show the relevance of such approximations and establish various path integral formulations of their spectrum. These findings enhance the theoretical understanding of light transport and offer new perspectives for improving rendering efficiency and accuracy.
{"title":"Spectral Theory of Light Transport Operators","authors":"Cyril Soler, Kartic Subr","doi":"10.1145/3774756","DOIUrl":"https://doi.org/10.1145/3774756","url":null,"abstract":"Light Transport Operators (LTOs) represent a fundamental concept in computer graphics, modeling single bounces of light within a virtual environment as linears operators on infinite dimensional spaces. While the LTOs play a crucial role in rendering, prior studies have primarily focused on spectral analyses of the light field rather than the operators themselves. This paper presents a rigorous investigation into the spectral properties of the LTOs. Due to their non-compact nature, traditional spectral analysis techniques face challenges in this setting. However, many practical rendering methods effectively employ compact approximations, suggesting that non-compactness is not an absolute barrier. We show the relevance of such approximations and establish various path integral formulations of their spectrum. These findings enhance the theoretical understanding of light transport and offer new perspectives for improving rendering efficiency and accuracy.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"53 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145434325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Yang, Yongqing Liang, Xin Li, Congyi Zhang, Guying Lin, Cheng Lin, Alla Sheffer, Scott Schaefer, John Keyser, Wenping Wang
Piecewise parametric surfaces have long been established as prevalent geometric representations; however, they often require surface refinement or sophisticated quadrangulation to accurately represent complex geometries. Geometric deep learning has shown that neural networks can provide greater representational power than conventional methods. Nevertheless, approaches using a single parametric surface for shape fitting struggle to capture fine-grained geometric details, while multi-patch methods fail to ensure seamless connections between adjacent patches. We present Neural Piecewise Parametric Surfaces ( NeuPPS ), the first piecewise neural surface representation that allows for coarse patch layouts composed of arbitrary n -sided surface patches to model complex surface geometries with high precision, offering enhanced flexibility compared to traditional parametric surfaces. This new surface representation guarantees, by construction, the continuity between adjacent patches, a property that other neural patch-based approaches cannot ensure. Two novel components are introduced: a learnable feature complex and a continuous mapping function approximated by multi-layer perceptrons (MLPs). We apply the proposed NeuPPS to surface fitting and shape space learning tasks. Extensive experiments demonstrate the advantages of NeuPPS over traditional parametric representations and existing patch-based learning approaches.
{"title":"NeuPPS: Neural Piecewise Parametric Surfaces","authors":"Lei Yang, Yongqing Liang, Xin Li, Congyi Zhang, Guying Lin, Cheng Lin, Alla Sheffer, Scott Schaefer, John Keyser, Wenping Wang","doi":"10.1145/3771546","DOIUrl":"https://doi.org/10.1145/3771546","url":null,"abstract":"Piecewise parametric surfaces have long been established as prevalent geometric representations; however, they often require surface refinement or sophisticated quadrangulation to accurately represent complex geometries. Geometric deep learning has shown that neural networks can provide greater representational power than conventional methods. Nevertheless, approaches using a single parametric surface for shape fitting struggle to capture fine-grained geometric details, while multi-patch methods fail to ensure seamless connections between adjacent patches. We present <jats:italic toggle=\"yes\">Neural Piecewise Parametric Surfaces</jats:italic> ( <jats:italic toggle=\"yes\">NeuPPS</jats:italic> ), the <jats:italic toggle=\"yes\">first</jats:italic> piecewise neural surface representation that allows for coarse patch layouts composed of <jats:italic toggle=\"yes\"> arbitrary <jats:italic toggle=\"yes\">n</jats:italic> -sided surface patches </jats:italic> to model complex surface geometries with high precision, offering enhanced <jats:italic toggle=\"yes\">flexibility</jats:italic> compared to traditional parametric surfaces. This new surface representation guarantees, by construction, the continuity between adjacent patches, a property that other neural patch-based approaches cannot ensure. Two novel components are introduced: a learnable feature complex and a continuous mapping function approximated by multi-layer perceptrons (MLPs). We apply the proposed <jats:italic toggle=\"yes\">NeuPPS</jats:italic> to surface fitting and shape space learning tasks. Extensive experiments demonstrate the advantages of <jats:italic toggle=\"yes\">NeuPPS</jats:italic> over traditional parametric representations and existing patch-based learning approaches.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"69 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145396367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana Dodik, Vincent Sitzmann, Justin Solomon, Oded Stein
Bounded bihramonic weights are a popular tool used to rig and deform characters for animation, to compute reduced-order simulations, and to define feature descriptors for geometry processing. They necessitate tetrahedralizing the volume bounded by the surface, introducing the possibility of meshing artifacts or tetrahedralization failure. We introduce a mesh-free and robust automatic skinning technique that generates weights comparable to the current state of the art, but works reliably even on open surfaces, triangle soups, and point clouds where current methods fail. We achieve this through the use of a specialized Lagrangian representation enabled by the advent of hardware ray-tracing, which circumvents the need for finite elements while optimizing the biharmonic energy and enforcing boundary conditions. The flexibility of our formulation allows us to integrate artistic control through weight painting during the optimization. We offer a thorough qualitative and quantitative evaluation of our method.
{"title":"Robust Biharmonic Skinning Using Geometric Fields","authors":"Ana Dodik, Vincent Sitzmann, Justin Solomon, Oded Stein","doi":"10.1145/3771928","DOIUrl":"https://doi.org/10.1145/3771928","url":null,"abstract":"Bounded bihramonic weights are a popular tool used to rig and deform characters for animation, to compute reduced-order simulations, and to define feature descriptors for geometry processing. They necessitate tetrahedralizing the volume bounded by the surface, introducing the possibility of meshing artifacts or tetrahedralization failure. We introduce a <jats:italic toggle=\"yes\">mesh-free</jats:italic> and <jats:italic toggle=\"yes\">robust</jats:italic> automatic skinning technique that generates weights comparable to the current state of the art, but works reliably even on open surfaces, triangle soups, and point clouds where current methods fail. We achieve this through the use of a specialized Lagrangian representation enabled by the advent of hardware ray-tracing, which circumvents the need for finite elements while optimizing the biharmonic energy and enforcing boundary conditions. The flexibility of our formulation allows us to integrate artistic control through weight painting during the optimization. We offer a thorough qualitative and quantitative evaluation of our method.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"19 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145396487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kwanggyoon Seo, Rene Culaway, Byeong-Uk Lee, Junyong Noh
Manipulating the emotion of a performer in a video is a challenging task. The lip motion needs to be preserved while performing the desired changes in the emotion of the subject; however, simply utilizing existing image-based editing methods sabotages the original lip synchronization. We tackle this problem by utilizing a pretrained StyleGAN paired with a landmark-based editing module that modifies the bias present in the edit direction used in image manipulation. The proposed editing module consists of a latent-based landmark detection network and an editing network that modifies the editing direction to match the original lip synchronization while preserving the desired emotion manipulation results. This is realized by taking the facial landmarks as control points. Both networks operate on the latent space, which enables fast training and inference. We show that the proposed method runs significantly faster and performs better in terms of visual quality than alternative approaches, which was validated through a perceptual study. The proposed method can also be extended to perform face reenactment to generate a talking-head video from a single image and face image manipulation using facial landmarks as control points.
{"title":"Emotion Manipulation for Talking-Head Videos via Facial Landmarks","authors":"Kwanggyoon Seo, Rene Culaway, Byeong-Uk Lee, Junyong Noh","doi":"10.1145/3770576","DOIUrl":"https://doi.org/10.1145/3770576","url":null,"abstract":"Manipulating the emotion of a performer in a video is a challenging task. The lip motion needs to be preserved while performing the desired changes in the emotion of the subject; however, simply utilizing existing image-based editing methods sabotages the original lip synchronization. We tackle this problem by utilizing a pretrained StyleGAN paired with a landmark-based editing module that modifies the bias present in the edit direction used in image manipulation. The proposed editing module consists of a latent-based landmark detection network and an editing network that modifies the editing direction to match the original lip synchronization while preserving the desired emotion manipulation results. This is realized by taking the facial landmarks as control points. Both networks operate on the latent space, which enables fast training and inference. We show that the proposed method runs significantly faster and performs better in terms of visual quality than alternative approaches, which was validated through a perceptual study. The proposed method can also be extended to perform face reenactment to generate a talking-head video from a single image and face image manipulation using facial landmarks as control points.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"24 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145282651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural radiance fields, or NeRFs, have become the de facto approach for high-quality view synthesis from a collection of images captured from multiple viewpoints. However, many issues remain when capturing images in-the-wild under challenging conditions, such as in low light, high dynamic range, or with rapid motion, leading to smeared reconstructions with noticeable artifacts. In this work, we introduce quanta radiance fields , a novel class of neural radiance fields that are trained at the granularity of individual photons using single-photon cameras (SPCs). We develop theory and practical computational techniques for building radiance fields and estimating dense camera poses from unconventional, stochastic, and high-speed binary frame sequences captured by SPCs. We demonstrate, both via simulations and a SPC hardware prototype, high-fidelity reconstructions under high-speed motion, in low light, and for extreme dynamic range settings.
{"title":"Radiance Fields from Photons","authors":"Sacha Jungerman, Aryan Garg, Mohit Gupta","doi":"10.1145/3770578","DOIUrl":"https://doi.org/10.1145/3770578","url":null,"abstract":"Neural radiance fields, or NeRFs, have become the de facto approach for high-quality view synthesis from a collection of images captured from multiple viewpoints. However, many issues remain when capturing images in-the-wild under challenging conditions, such as in low light, high dynamic range, or with rapid motion, leading to smeared reconstructions with noticeable artifacts. In this work, we introduce <jats:italic toggle=\"yes\">quanta radiance fields</jats:italic> , a novel class of neural radiance fields that are trained at the granularity of individual photons using single-photon cameras (SPCs). We develop theory and practical computational techniques for building radiance fields and estimating dense camera poses from unconventional, stochastic, and high-speed binary frame sequences captured by SPCs. We demonstrate, both via simulations and a SPC hardware prototype, high-fidelity reconstructions under high-speed motion, in low light, and for extreme dynamic range settings.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"112 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145241484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jintao Lu, He Zhang, Yuting Ye, Takaaki Shiratori, Sebastian Starke, Taku Komura
Animating human-scene interactions such as picking and placing a wide range of objects with different geometries is a challenging task, especially in a cluttered environment where interactions with complex articulated containers are involved. The main difficulty lies in the sparsity of the motion data compared to the wide variation of the objects and environments, as well as the poor availability of transition motions between different actions, increasing the complexity of the generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. Firstly, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal such as the target object selected by the user. Next, we develop a neural implicit planner that generates hand trajectories to guide reaching and leaving motions across diverse object shapes/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in a more realistic and effective multi-objective control of the character. Our system can synthesize a rich variety of natural pick-and-place movements that adapt to different object geometries, container articulations, and scene layouts.
{"title":"CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions","authors":"Jintao Lu, He Zhang, Yuting Ye, Takaaki Shiratori, Sebastian Starke, Taku Komura","doi":"10.1145/3770746","DOIUrl":"https://doi.org/10.1145/3770746","url":null,"abstract":"Animating human-scene interactions such as picking and placing a wide range of objects with different geometries is a challenging task, especially in a cluttered environment where interactions with complex articulated containers are involved. The main difficulty lies in the sparsity of the motion data compared to the wide variation of the objects and environments, as well as the poor availability of transition motions between different actions, increasing the complexity of the generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. Firstly, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal such as the target object selected by the user. Next, we develop a neural implicit planner that generates hand trajectories to guide reaching and leaving motions across diverse object shapes/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in a more realistic and effective multi-objective control of the character. Our system can synthesize a rich variety of natural pick-and-place movements that adapt to different object geometries, container articulations, and scene layouts.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145203278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The introduction of the neural implicit representation has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations, such as TSDF, it substantially improves the mapping completeness and memory efficiency. However, the lack of reconstruction details and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation comprised of an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such mixed representation allows for detail-rich reconstruction with bounded time and memory budget, contrasting with the overly-smoothed results by the purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to the existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. Furthermore, we adopt a local moving volume to factorize the whole mixed scene representation with a divide-and-conquer design to facilitate efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses all state-of-the-art ones, including those based either on explicit or implicit representations, in terms of the accuracy of both mapping and tracking on large-scale scenes.
{"title":"RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction","authors":"Yuqing Lan, Chenyang Zhu, Shuaifeng Zhi, Jiazhao Zhang, Zhoufeng Wang, Renjiao Yi, Yijie Wang, Kai Xu","doi":"10.1145/3769007","DOIUrl":"https://doi.org/10.1145/3769007","url":null,"abstract":"The introduction of the neural implicit representation has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations, such as TSDF, it substantially improves the mapping completeness and memory efficiency. However, the lack of reconstruction details and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation comprised of an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such mixed representation allows for detail-rich reconstruction with bounded time and memory budget, contrasting with the overly-smoothed results by the purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to the existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. Furthermore, we adopt a local moving volume to factorize the whole mixed scene representation with a divide-and-conquer design to facilitate efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses all state-of-the-art ones, including those based either on explicit or implicit representations, in terms of the accuracy of both mapping and tracking on large-scale scenes.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"38 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145089117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a general method for computing local parameterizations rooted at a point on a surface, where the surface is described only through a signed implicit function and a corresponding projection function. Using a two-stage process, we compute several points radially emanating from the map origin, and interpolate between them with a spline surface. The narrow interface of our method allows it to support several kinds of geometry such as signed distance functions, general analytic implicit functions, triangle meshes, neural implicits, and point clouds. We demonstrate the high quality of our generated parameterizations on a variety of examples, and show applications in local texturing and surface curve drawing.
{"title":"Local Surface Parameterizations via Smoothed Geodesic Splines","authors":"Abhishek Madan, David Levin","doi":"10.1145/3767323","DOIUrl":"https://doi.org/10.1145/3767323","url":null,"abstract":"We present a general method for computing local parameterizations rooted at a point on a surface, where the surface is described only through a signed implicit function and a corresponding projection function. Using a two-stage process, we compute several points radially emanating from the map origin, and interpolate between them with a spline surface. The narrow interface of our method allows it to support several kinds of geometry such as signed distance functions, general analytic implicit functions, triangle meshes, neural implicits, and point clouds. We demonstrate the high quality of our generated parameterizations on a variety of examples, and show applications in local texturing and surface curve drawing.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"18 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145084085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}