CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions
Jintao Lu, He Zhang, Yuting Ye, Takaaki Shiratori, Sebastian Starke, Taku Komura
Animating human-scene interactions, such as picking and placing a wide range of objects with different geometries, is a challenging task, especially in a cluttered environment where interactions with complex articulated containers are involved. The main difficulty lies in the sparsity of motion data relative to the wide variation of objects and environments, as well as the poor availability of transition motions between different actions, which complicates generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. First, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal, such as the target object selected by the user. Next, we develop a neural implicit planner that generates hand trajectories to guide reaching and leaving motions across diverse object shapes/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in more realistic and effective multi-objective control of the character. Our system can synthesize a rich variety of natural pick-and-place movements that adapt to different object geometries, container articulations, and scene layouts.
{"title":"CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions","authors":"Jintao Lu, He Zhang, Yuting Ye, Takaaki Shiratori, Sebastian Starke, Taku Komura","doi":"10.1145/3770746","DOIUrl":"https://doi.org/10.1145/3770746","url":null,"abstract":"Animating human-scene interactions such as picking and placing a wide range of objects with different geometries is a challenging task, especially in a cluttered environment where interactions with complex articulated containers are involved. The main difficulty lies in the sparsity of the motion data compared to the wide variation of the objects and environments, as well as the poor availability of transition motions between different actions, increasing the complexity of the generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. Firstly, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal such as the target object selected by the user. Next, we develop a neural implicit planner that generates hand trajectories to guide reaching and leaving motions across diverse object shapes/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in a more realistic and effective multi-objective control of the character. Our system can synthesize a rich variety of natural pick-and-place movements that adapt to different object geometries, container articulations, and scene layouts.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145203278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction
Yuqing Lan, Chenyang Zhu, Shuaifeng Zhi, Jiazhao Zhang, Zhoufeng Wang, Renjiao Yi, Yijie Wang, Kai Xu
The introduction of neural implicit representations has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations, such as TSDF, they substantially improve mapping completeness and memory efficiency. However, the lack of reconstruction detail and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation composed of an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such a mixed representation allows for detail-rich reconstruction within a bounded time and memory budget, in contrast to the overly smoothed results of purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. In addition, we adopt a local moving volume to factorize the whole mixed scene representation with a divide-and-conquer design to facilitate efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses state-of-the-art methods, whether based on explicit or implicit representations, in terms of the accuracy of both mapping and tracking on large-scale scenes.
{"title":"RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction","authors":"Yuqing Lan, Chenyang Zhu, Shuaifeng Zhi, Jiazhao Zhang, Zhoufeng Wang, Renjiao Yi, Yijie Wang, Kai Xu","doi":"10.1145/3769007","DOIUrl":"https://doi.org/10.1145/3769007","url":null,"abstract":"The introduction of the neural implicit representation has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations, such as TSDF, it substantially improves the mapping completeness and memory efficiency. However, the lack of reconstruction details and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation comprised of an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such mixed representation allows for detail-rich reconstruction with bounded time and memory budget, contrasting with the overly-smoothed results by the purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to the existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. Furthermore, we adopt a local moving volume to factorize the whole mixed scene representation with a divide-and-conquer design to facilitate efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses all state-of-the-art ones, including those based either on explicit or implicit representations, in terms of the accuracy of both mapping and tracking on large-scale scenes.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"38 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145089117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Surface Parameterizations via Smoothed Geodesic Splines
Abhishek Madan, David Levin
We present a general method for computing local parameterizations rooted at a point on a surface, where the surface is described only through a signed implicit function and a corresponding projection function. Using a two-stage process, we compute several points radially emanating from the map origin and interpolate between them with a spline surface. The narrow interface of our method allows it to support several kinds of geometry, such as signed distance functions, general analytic implicit functions, triangle meshes, neural implicits, and point clouds. We demonstrate the high quality of our generated parameterizations on a variety of examples, and show applications in local texturing and surface curve drawing.
{"title":"Local Surface Parameterizations via Smoothed Geodesic Splines","authors":"Abhishek Madan, David Levin","doi":"10.1145/3767323","DOIUrl":"https://doi.org/10.1145/3767323","url":null,"abstract":"We present a general method for computing local parameterizations rooted at a point on a surface, where the surface is described only through a signed implicit function and a corresponding projection function. Using a two-stage process, we compute several points radially emanating from the map origin, and interpolate between them with a spline surface. The narrow interface of our method allows it to support several kinds of geometry such as signed distance functions, general analytic implicit functions, triangle meshes, neural implicits, and point clouds. We demonstrate the high quality of our generated parameterizations on a variety of examples, and show applications in local texturing and surface curve drawing.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"18 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145084085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many-Worlds Inverse Rendering
Ziyi Zhang, Nicolas Roussel, Wenzel Jakob
Discontinuous visibility changes remain a major bottleneck when optimizing surfaces within a physically based inverse renderer. Many previous works have proposed sophisticated algorithms and data structures to sample visibility silhouettes more efficiently. Our work presents another solution: instead of evolving a surface locally, we extend differentiation to hypothetical surface patches anywhere in 3D space. We refer to this as a “many-worlds” representation because it models a superposition of independent surface hypotheses that compete to explain the reference images. These hypotheses do not interact through shadowing or scattering, leading to a new transport law that distinguishes our method from prior work based on exponential random media. The complete elimination of visibility-related discontinuity handling bypasses the most complex and costly component of prior inverse rendering methods, while the extended derivative domain promotes rapid convergence. We demonstrate that the resulting Monte Carlo algorithm solves physically based inverse problems with both reduced per-iteration cost and fewer total iterations.
{"title":"Many-Worlds Inverse Rendering","authors":"Ziyi Zhang, Nicolas Roussel, Wenzel Jakob","doi":"10.1145/3767318","DOIUrl":"https://doi.org/10.1145/3767318","url":null,"abstract":"Discontinuous visibility changes remain a major bottleneck when optimizing surfaces within a physically based inverse renderer. Many previous works have proposed sophisticated algorithms and data structures to sample visibility silhouettes more efficiently. Our work presents another solution: instead of evolving a surface locally, we extend differentiation to hypothetical surface patches anywhere in 3D space. We refer to this as a “many-worlds” representation because it models a superposition of independent surface hypotheses that compete to explain the reference images. These hypotheses do not interact through shadowing or scattering, leading to a new transport law that distinguishes our method from prior work based on exponential random media. The complete elimination of visibility-related discontinuity handling bypasses the most complex and costly component of prior inverse rendering methods, while the extended derivative domain promotes rapid convergence. We demonstrate that the resulting Monte Carlo algorithm solves physically based inverse problems with both reduced per-iteration cost and fewer total iterations.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"14 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145072127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A fast, efficient, and robust feature protected denoising method
Mengyu Luo, Jian Wang
This paper proposes a fast, efficient, and robust feature-protected 3D mesh denoising method based on a modified Lengyel-Epstein (LE) model, primarily aiming to ensure volume stability and deliver superior denoising results. Compared with the original model, our main change is to replace the fixed parameters with a function ζ(X). The modified model is then discretized using a seven-point difference scheme and solved by an explicit Euler method. Notably, our approach requires no training samples or upfront training time, significantly enhancing overall computational efficiency.
{"title":"A fast, efficient, and robust feature protected denoising method","authors":"Mengyu Luo, Jian Wang","doi":"10.1145/3765902","DOIUrl":"https://doi.org/10.1145/3765902","url":null,"abstract":"This paper proposes a fast, efficient, and robust feature protected 3D mesh denoising method based on a modified Lengyel-Epstein (LE) model, primarily aiming to ensure volume stability and deliver superior denoising results. Compared with the original model, we mainly introduce a function expression <jats:italic toggle=\"yes\">ζ</jats:italic> ( <jats:italic toggle=\"yes\">X</jats:italic> ) to replace the fixed parameters. The modified model is then discretized using a seven-point difference scheme and solved by an explicit Euler method. Notably, our approach requires no training samples or upfront training time, significantly enhancing overall computational efficiency.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"3 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144987697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SymX: Energy-based Simulation from Symbolic Expressions
José Fernández-Fernández, Fabian Löschner, Lukas Westhofen, Andreas Longva, Jan Bender
Optimization time integrators are effective at solving complex multi-physics problems including deformable solids with non-linear material models, contact with friction, strain limiting, etc. For challenging problems, Newton-type optimizers are often used, which necessitates first- and second-order derivatives of the global non-linear objective function. Manually differentiating, implementing, testing, optimizing, and maintaining the resulting code is extremely time-consuming, error-prone, and precludes quick changes to the model, even when using tools that assist with parts of such a pipeline. We present SymX, an open-source framework that computes the required derivatives of the different energy contributions by symbolic differentiation, generates optimized code, compiles it on the fly, and performs the global assembly. The user only has to provide the symbolic expression of each energy for a single representative element in its corresponding discretization, and our system will determine the assembled derivatives for the whole simulation. We demonstrate the versatility of SymX in complex simulations featuring different non-linear materials, high-order finite elements, rigid body systems, adaptive discretizations, frictional contact, and coupling of multiple interacting physical systems. SymX offers derivative performance on par with SymPy, an established off-the-shelf symbolic engine, and produces simulations at least one order of magnitude faster than TinyAD, an alternative state-of-the-art solution.
{"title":"SymX: Energy-based Simulation from Symbolic Expressions","authors":"José Fernández-Fernández, Fabian Löschner, Lukas Westhofen, Andreas Longva, Jan Bender","doi":"10.1145/3764928","DOIUrl":"https://doi.org/10.1145/3764928","url":null,"abstract":"Optimization time integrators are effective at solving complex multi-physics problems including deformable solids with non-linear material models, contact with friction, strain limiting, etc. For challenging problems, Newton-type optimizers are often used, which necessitates first- and second-order derivatives of the global non-linear objective function. Manually differentiating, implementing, testing, optimizing, and maintaining the resulting code is extremely time-consuming, error-prone, and precludes quick changes to the model, even when using tools that assist with parts of such pipeline. We present SymX, an open source framework that computes the required derivatives of the different energy contributions by symbolic differentiation, generates optimized code, compiles it on-the-fly, and performs the global assembly. The user only has to provide the symbolic expression of each energy for a single representative element in its corresponding discretization and our system will determine the assembled derivatives for the whole simulation. We demonstrate the versatility of SymX in complex simulations featuring different non-linear materials, high-order finite elements, rigid body systems, adaptive discretizations, frictional contact, and coupling of multiple interacting physical systems. SymX’s derivatives offer performance on par with SymPy, an established off-the-shelf symbolic engine, and produces simulations at least one order of magnitude faster than TinyAD, an alternative state-of-the-art integral solution.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"24 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144930903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Neural Reflectance Field Model for Accurate Relighting in RTI Applications
Shambel Fente Mengistu, Filippo Bergamasco, Mara Pistellato
Reflectance Transformation Imaging (RTI) is a computational photography technique in which an object is acquired from a fixed point of view under different light directions. The aim is to estimate the light transport function at each point so that the object can be interactively relit in a physically accurate way, revealing its surface characteristics. In this paper, we propose a novel RTI approach describing surface reflectance as an implicit neural representation acting as a “relightable image” for a specific object. We propose to represent the light transport function with a Neural Reflectance Field (NRF) model, feeding it with pixel coordinates, light direction, and a latent vector encoding the per-pixel reflectance in a neighbourhood. These vectors, computed during training, allow more accurate relighting than a purely implicit representation (i.e., one relying only on positional encoding), enabling the NRF to handle complex surface shading. Moreover, they can be efficiently stored with the learned NRF for compression and transmission. As an additional contribution, we propose a novel synthetic dataset containing objects of various shapes and materials created with physically based rendering software. An extensive experimental section shows that the proposed NRF accurately models the light transport function for challenging datasets in synthetic and real-world scenarios.
{"title":"A Neural Reflectance Field Model for Accurate Relighting in RTI Applications","authors":"Shambel Fente Mengistu, Filippo Bergamasco, Mara Pistellato","doi":"10.1145/3759452","DOIUrl":"https://doi.org/10.1145/3759452","url":null,"abstract":"Reflectance Transformation Imaging (RTI) is a computational photography technique in which an object is acquired from a fixed point-of-view with different light directions. The aim is to estimate the light transport function at each point so that the object can be interactively relighted in a physically-accurate way, revealing its surface characteristics. In this paper, we propose a novel RTI approach describing surface reflectance as an implicit neural representation acting as a ”relightable image” for a specific object. We propose to represent the light transport function with a Neural Reflectance Field (NRF) model, feeding it with pixel coordinates, light direction, and a latent vector encoding the per-pixel reflectance in a neighbourhood. These vectors, computed during training, allow a more accurate relighting than a pure implicit representation (i.e., relying only on positional encoding) enabling the NRF to handle complex surface shadings. Moreover, they can be efficiently stored with the learned NRF for compression and transmission. As an additional contribution, we propose a novel synthetic dataset containing objects of various shapes and materials created with a physically based rendering software. An extensive experimental section shows that the proposed NRF accurately models the light transport function for challenging datasets in synthetic and real-world scenarios.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"27 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144915648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PatchEX: High-Quality Real-Time Temporal Supersampling through Patch-based Parallel Extrapolation
Akanksha Dixit, Smruti R. Sarangi
High-refresh-rate displays have become very popular in recent years due to the need for superior visual quality in gaming, professional displays, and specialized applications such as medical imaging. However, high-refresh-rate displays alone do not guarantee a superior visual experience; the GPU needs to render frames at a matching rate. Otherwise, we observe disconcerting visual artifacts such as screen tearing and stuttering. Real-time frame generation is an effective technique to increase frame rates by predicting new frames from other rendered frames. There are two methods in this space: interpolation and extrapolation. Interpolation-based methods provide good image quality at the cost of a higher runtime because they also require the next rendered frame. On the other hand, extrapolation methods are much faster at the cost of quality. This paper introduces PatchEX, a novel frame extrapolation method that aims to provide the quality of interpolation at the speed of extrapolation. It segments each frame into foreground and background regions and employs a novel neural network to generate the final extrapolated frame. Additionally, a wavelet transform (WT)-based filter pruning technique is applied to compress the network, significantly reducing the runtime of the extrapolation process. Our results demonstrate that PatchEX achieves a 61.32% and 49.21% improvement in PSNR over the latest extrapolation methods ExtraNet and ExtraSS, respectively, while being 3× and 2.6× faster.
{"title":"PatchEX: High-Quality Real-Time Temporal Supersampling through Patch-based Parallel Extrapolation","authors":"Akanksha Dixit, Smruti R. Sarangi","doi":"10.1145/3759247","DOIUrl":"https://doi.org/10.1145/3759247","url":null,"abstract":"High-refresh rate displays have become very popular in recent years due to the need for superior visual quality in gaming, professional displays and specialized applications such as medical imaging. However, high-refresh rate displays alone do not guarantee a superior visual experience; the GPU needs to render frames at a matching rate. Otherwise, we observe disconcerting visual artifacts such as screen tearing and stuttering. Real-time frame generation is an effective technique to increase frame rates by predicting new frames from other rendered frames. There are two methods in this space: interpolation and extrapolation. Interpolation-based methods provide good image quality at the cost of a higher runtime because they also require the next rendered frame. On the other hand, extrapolation methods are much faster at the cost of quality. This paper introduces <jats:italic toggle=\"yes\">PatchEX</jats:italic> , a novel frame extrapolation method that aims to provide the quality of interpolation at the speed of extrapolation. It smartly segments each frame into foreground and background regions and employs a novel neural network to generate the final extrapolated frame. Additionally, a wavelet transform (WT)-based filter pruning technique is applied to compress the network, significantly reducing the runtime of the extrapolation process. Our results demonstrate that <jats:italic toggle=\"yes\">PatchEX</jats:italic> achieves a 61.32% and 49.21% improvement in PSNR over the latest extrapolation methods ExtraNet and ExtraSS, respectively, while being 3 × and 2.6 × faster, respectively.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"19 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iris3D: 3D Generation via Synchronized Diffusion Distillation
Yixun Liang, Weiyu Li, Rui Chen, Fei-Peng Tian, Jiarui Liu, Ying-Cong Chen, Ping Tan, Xiao-Xiao Long
We introduce Iris3D, a novel 3D content generation system that generates vivid textures and detailed 3D shapes while preserving the input information. Our system integrates a Multi-View Large Reconstruction Model (MVLRM [25]) to generate a coarse 3D mesh and introduces a novel optimization scheme called Synchronized Diffusion Distillation (SDD) for refinement. Unlike previous refinement methods based on Score Distillation Sampling (SDS), which suffer from unstable optimization and geometric over-smoothing due to ambiguities across different views and modalities, our method effectively distills consistent multi-view and multi-modal priors from 2D diffusion models in a training-free manner. This enables robust optimization of 3D representations. Additionally, because SDD is training-free, it preserves the diffusion model’s prior knowledge and mitigates potential degradation. This characteristic makes it highly compatible with advanced 2D diffusion techniques such as IP-Adapters and ControlNet, allowing for more controllable 3D generation with additional conditioning signals. Experiments demonstrate that our method produces high-quality 3D results with plausible textures and intricate geometric details.
{"title":"Iris3D: 3D Generation via Synchronized Diffusion Distillation","authors":"Yixun Liang, Weiyu Li, Rui Chen, Fei-Peng Tian, Jiarui Liu, Ying-Cong Chen, Ping Tan, Xiao-Xiao Long","doi":"10.1145/3759249","DOIUrl":"https://doi.org/10.1145/3759249","url":null,"abstract":"We introduce Iris3D, a novel 3D content generation system that generates vivid textures and detailed 3D shapes while preserving the input information. Our system integrates a Multi-View Large Reconstruction Model (MVLRM [25]) to generate a coarse 3D mesh and introduces a novel optimization scheme called Synchronized Diffusion Distillation (SDD) for refinement. Unlike previous refined methods based on Score Distillation Sampling (SDS), which suffer from unstable optimization and geometric over-smoothing due to ambiguities across different views and modalities, our method effectively distills consistent multi-view and multi-modal priors from 2D diffusion models in a training-free manner. This enables robust optimization of 3D representations. Additionally, because SDD is training-free, it preserves the diffusion’s prior knowledge and mitigates potential degradation. This characteristic makes it highly compatible with advanced 2D diffusion techniques like IP-Adapters and ControlNet, allowing for more controllable 3D generation with additional conditioning signals. Experiments demonstrate that our method produces high-quality 3D results with plausible textures and intricate geometric details.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GS-ROR²: Bidirectional-guided 3DGS and SDF for Reflective Object Relighting and Reconstruction
Zuoliang Zhu, Beibei Wang, Jian Yang
3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets and reconstructing faithful geometry with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometry. In contrast, volumetric signed distance field (SDF) methods provide robust geometry reconstruction, but their expensive ray marching hinders real-time application and slows training. Moreover, these methods struggle to capture sharp geometric details. To this end, we propose to guide 3DGS and SDF bidirectionally in a complementary manner, including an SDF-aided Gaussian splatting for efficient optimization of the relighting model and a GS-guided SDF enhancement for high-quality geometry reconstruction. At the core of our SDF-aided Gaussian splatting is the mutual supervision of the depth and normal between blended Gaussians and the SDF, which avoids the expensive volume rendering of the SDF. Thanks to this mutual supervision, the learned blended Gaussians are well-constrained at minimal time cost. As the Gaussians are rendered in a deferred shading mode, the alpha-blended Gaussians are smooth, while individual Gaussians may still be outliers, yielding floater artifacts. Therefore, we introduce an SDF-aware pruning strategy to remove Gaussian outliers located far from the surface defined by the SDF, avoiding the floater issue. This way, our GS framework provides reasonable normals and achieves realistic relighting, while the mesh obtained from truncated SDF (TSDF) fusion of depth remains problematic. We therefore design a GS-guided SDF refinement, which utilizes the blended normals from the Gaussians to fine-tune the SDF. Equipped with this efficient enhancement, our method can further provide high-quality meshes for reflective objects at the cost of 17% extra training time. Consequently, our method outperforms existing Gaussian-based inverse rendering methods in terms of relighting and mesh quality. Our method also exhibits competitive relighting/mesh quality compared to NeRF-based methods with at most 25%/33% of the training time, and it renders at 200+ frames per second on an RTX 4090. Our code is available at https://github.com/NK-CS-ZZL/GS-ROR.
{"title":"GS-ROR 2 : Bidirectional-guided 3DGS and SDF for Reflective Object Relighting and Reconstruction","authors":"Zuoliang Zhu, Beibei Wang, Jian Yang","doi":"10.1145/3759248","DOIUrl":"https://doi.org/10.1145/3759248","url":null,"abstract":"3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets and reconstructing faithful geometry with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. In contrary, volumetric signed distance field (SDF) methods provide robust geometry reconstruction, while the expensive ray marching hinders its real-time application and slows the training. Besides, these methods struggle to capture sharp geometric details. To this end, we propose to guide 3DGS and SDF bidirectionally in a complementary manner, including an SDF-aided Gaussian splatting for efficient optimization of the relighting model and a GS-guided SDF enhancement for high-quality geometry reconstruction. At the core of our SDF-aided Gaussian splatting is the <jats:italic toggle=\"yes\">mutual supervision</jats:italic> of the depth and normal between blended Gaussians and SDF, which avoids the expensive volume rendering of SDF. Thanks to this mutual supervision, the learned blended Gaussians are well-constrained with a minimal time cost. As the Gaussians are rendered in a deferred shading mode, the alpha-blended Gaussians are smooth, while individual Gaussians may still be outliers, yielding floater artifacts. Therefore, we introduce an SDF-aware pruning strategy to remove Gaussian outliers located distant from the surface defined by SDF, avoiding the floater issue. This way, our GS framework provides reasonable normal and achieves realistic relighting, while the mesh of truncated SDF (TSDF) fusion from depth is still problematic. Therefore, we design a GS-guided SDF refinement, which utilizes the blended normal from Gaussians to finetune SDF. Equipped with the efficient enhancement, our method can further provide high-quality meshes for reflective objects at the cost of 17% extra training time. Consequently, our method outperforms the existing Gaussian-based inverse rendering methods in terms of relighting and mesh quality. Our method also exhibits competitive relighting/mesh quality compared to NeRF-based methods with at most 25%/33% of training time and allows rendering at 200+ frames per second on an RTX4090. Our code is available at https://github.com/NK-CS-ZZL/GS-ROR.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"9 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}