STGlight: Online Indoor Lighting Estimation via Spatio-Temporal Gaussian Fusion
Shiyuan Shen, Zhongyun Bao, Hong Ding, Wenju Xu, Tenghui Lai, Chunxia Xiao
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763350
Estimating lighting in indoor scenes is particularly challenging due to the diverse distribution of light sources and the complexity of scene geometry. Previous methods have mainly focused on spatial variability and consistency for a single image, or on temporal consistency for video sequences. However, these approaches fail to achieve spatio-temporal consistency in video lighting estimation, which restricts applications such as compositing animated models into videos. In this paper, we propose STGlight, a lightweight and effective method for spatio-temporally consistent video lighting estimation. Our network processes a stream of LDR RGB-D video frames while maintaining incrementally updated global representations of both geometry and lighting, enabling the prediction of HDR environment maps at arbitrary locations in each frame. We model indoor lighting with three components: visible light sources providing direct illumination, ambient lighting approximating indirect illumination, and local environment textures producing high-quality specular reflections on glossy objects. To capture spatially varying lighting, we represent scene geometry with point clouds, which support efficient spatio-temporal fusion and allow us to handle moderately dynamic scenes. To ensure temporal consistency, we apply a transformer-based fusion block that propagates lighting features across frames. Building on this, we further handle dynamic lighting, such as moving objects or changing light conditions, by applying intrinsic decomposition to the point cloud and integrating the decomposed components with a neural fusion module. Experiments show that our online method can effectively predict lighting for any position within the video stream, while maintaining spatial variability and spatio-temporal consistency. Code is available at: https://github.com/nauyihsnehs/STGlight.
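As a rough geometric illustration of what a "local environment texture" queried from a fused point cloud can mean, the sketch below splats point colors into an equirectangular map centered at a query position. This is a minimal sketch with hypothetical names, ignoring occlusion ordering and the paper's learned HDR prediction and neural fusion.

```python
import numpy as np

# Minimal sketch, not the paper's method: project fused scene points into an
# equirectangular map centered at a query location. All names are hypothetical.
def local_envmap(points, colors, query, height=64):
    width = 2 * height
    env = np.zeros((height, width, 3))
    d = points - query                                   # directions query -> points
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8)
    theta = np.arccos(np.clip(d[:, 1], -1.0, 1.0))       # polar angle from +Y (up)
    phi = np.arctan2(d[:, 2], d[:, 0])                   # azimuth in the XZ plane
    v = np.minimum((theta / np.pi * height).astype(int), height - 1)
    u = np.minimum(((phi + np.pi) / (2 * np.pi) * width).astype(int), width - 1)
    env[v, u] = colors                                   # last point written wins
    return env
```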
{"title":"STGlight: Online Indoor Lighting Estimation via Spatio-Temporal Gaussian Fusion","authors":"Shiyuan Shen, Zhongyun Bao, Hong Ding, Wenju Xu, Tenghui Lai, Chunxia Xiao","doi":"10.1145/3763350","DOIUrl":"https://doi.org/10.1145/3763350","url":null,"abstract":"Estimating lighting in indoor scenes is particularly challenging due to diverse distribution of light sources and complexity of scene geometry. Previous methods mainly focused on spatial variability and consistency for a single image or temporal consistency for video sequences. However, these approaches fail to achieve spatio-temporal consistency in video lighting estimation, which restricts applications such as compositing animated models into videos. In this paper, we propose STGlight, a lightweight and effective method for spatio-temporally consistent video lighting estimation, where our network processes a stream of LDR RGB-D video frames while maintaining incrementally updated global representations of both geometry and lighting, enabling the prediction of HDR environment maps at arbitrary locations for each frame. We model indoor lighting with three components: visible light sources providing direct illumination, ambient lighting approximating indirect illumination, and local environment textures producing high-quality specular reflections on glossy objects. To capture spatial-varying lighting, we represent scene geometry with point clouds, which support efficient spatio-temporal fusion and allow us to handle moderately dynamic scenes. To ensure temporal consistency, we apply a transformer-based fusion block that propagates lighting features across frames. Building on this, we further handle dynamic lighting with moving objects or changing light conditions by applying intrinsic decomposition on the point cloud and integrating the decomposed components with a neural fusion module. Experiments show that our online method can effectively predict lighting for any position within the video stream, while maintaining spatial variability and spatio-temporal consistency. Code is available at: https://github.com/nauyihsnehs/STGlight.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"115 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Highly-Efficient Hybrid Simulation System for Flight Controller Design and Evaluation of Unmanned Aerial Vehicles
Jiwei Wang, Wenbin Song, Yicheng Fan, Yang Wang, Xiaopei Liu
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763283
Unmanned aerial vehicles (UAVs) have demonstrated remarkable efficacy across diverse fields. Nevertheless, developing flight controllers tailored to a specific UAV design, particularly in environments with strong fluid-interactive dynamics, remains challenging. Conventional controller design experience often falls short in such cases, rendering it infeasible to apply time-tested practices. Consequently, a simulation test bed becomes indispensable for controller design and evaluation prior to implementation on the physical UAV. This platform should allow for meticulous adjustment of controllers and should transfer to real-world systems without significant performance degradation. Existing simulators predominantly hinge on empirical models for the sake of efficiency, often overlooking the dynamic interplay between the UAV and the surrounding airflow. This makes it difficult to mimic more complex flight maneuvers, such as an abrupt midair halt inside a narrow channel, in which the UAV may experience strong fluid-structure interactions. On the other hand, simulators that resolve the complex surrounding airflow are extremely slow and inadequate for supporting the design and evaluation of flight controllers. In this paper, we present a remedy for highly efficient UAV flight simulation: a hybrid modeling approach that combines our novel far-field adaptive block-based fluid simulator with parametric empirical models near the boundary of the UAV, with the model parameters calibrated automatically. With this newly devised simulator, a broader spectrum of flight scenarios can be explored for controller design and assessment, encompassing those influenced by strong close-proximity effects, or situations where multiple UAVs operate in close quarters. The practical worth of our simulator has been validated through comparisons with actual UAV flight data. We further showcase its utility in designing flight controllers for fixed-wing, multi-rotor, and hybrid UAVs, and even exemplify its application when multiple UAVs are involved, underlining the unique value of our system for flight controller development.
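To make the hybrid coupling concrete, here is a heavily simplified sketch of how a per-step body force might combine an empirical rotor model with the local air velocity sampled from a far-field fluid solver. This is an illustrative assumption, not the paper's calibrated model; every parameter name is made up.

```python
import numpy as np

# Hypothetical sketch: thrust from a parametric empirical rotor model, drag from
# the relative wind supplied by the far-field fluid simulation. All quantities
# are expressed in the body frame; k_t, rho, c_d, area are made-up parameters.
def hybrid_body_force(rotor_speeds, local_air_vel, body_vel, k_t, rho, c_d, area):
    thrust = k_t * np.sum(np.asarray(rotor_speeds) ** 2)      # empirical near-body model
    v_rel = np.asarray(body_vel) - np.asarray(local_air_vel)  # wind sampled from the grid
    drag = -0.5 * rho * c_d * area * np.linalg.norm(v_rel) * v_rel
    return np.array([0.0, 0.0, thrust]) + drag                # thrust along the body z-axis
```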
{"title":"A Highly-Efficient Hybrid Simulation System for Flight Controller Design and Evaluation of Unmanned Aerial Vehicles","authors":"Jiwei Wang, Wenbin Song, Yicheng Fan, Yang Wang, Xiaopei Liu","doi":"10.1145/3763283","DOIUrl":"https://doi.org/10.1145/3763283","url":null,"abstract":"Unmanned aerial vehicles (UAVs) have demonstrated remarkable efficacy across diverse fields. Nevertheless, developing flight controllers tailored to a specific UAV design, particularly in environments with strong fluid-interactive dynamics, remains challenging. Conventional controller design experiences often fall short in such cases, rendering it infeasible to apply time-tested practices. Consequently, a simulation test bed becomes indispensable for controller design and evaluation prior to its actual implementation on the physical UAV. This platform should allow for meticulous adjustment of controllers and should be able to transfer to real-world systems without significant performance degradation. Existing simulators predominantly hinge on empirical models due to high efficiency, often overlooking the dynamic interplay between the UAV and the surrounding airflow. This makes it difficult to mimic more complex flight maneuvers, such as an abrupt midair halt inside narrow channels, in which the UAV may experience strong fluid-structure interactions. On the other hand, simulators considering the complex surrounding airflow are extremely slow and inadequate to support the design and evaluation of flight controllers. In this paper, we present a novel remedy for highly-efficient UAV flight simulations, which entails a hybrid modeling that deftly combines our novel far-field adaptive block-based fluid simulator with parametric empirical models situated near the boundary of the UAV, with the model parameters automatically calibrated. With this newly devised simulator, a broader spectrum of flight scenarios can be explored for controller design and assessment, encompassing those influenced by potent close-proximity effects, or situations where multiple UAVs operate in close quarters. The practical worth of our simulator has been authenticated through comparisons with actual UAV flight data. We further showcase its utility in designing flight controllers for fixed-wing, multi-rotor, and hybrid UAVs, and even exemplify its application when multiple UAVs are involved, underlining the unique value of our system for flight controllers.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"34 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ultrafast and Controllable Online Motion Retargeting for Game Scenarios
Tianze Guo, Zhedong Chen, Yi Jiang, Linjun Wu, Xilei Wei, Lang Xu, Yeshuang Lin, He Wang, Xiaogang Jin
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763351
Geometry-aware online motion retargeting is crucial for real-time character animation in gaming and virtual reality. However, existing methods often rely on complex optimization procedures or deep neural networks, which constrain their applicability in real-time scenarios. Moreover, they offer limited control over fine-grained motion details involved in character interactions, resulting in less realistic outcomes. To overcome these limitations, we propose a novel optimization framework for ultrafast, lightweight motion retargeting with joint-level control (e.g., control over joint positions and bone orientations). Our approach introduces a semantic-aware objective grounded in a spherical geometry representation, coupled with a bone-length-preserving algorithm that iteratively solves this objective. This formulation preserves spatial relationships among spheres, thereby maintaining motion semantics, mitigating interpenetration, and ensuring contact. It is lightweight and computationally efficient, making it particularly suitable for time-critical real-time deployment scenarios. Additionally, we incorporate a heuristic optimization strategy that enables rapid convergence and precise joint-level control. We evaluate our method against state-of-the-art approaches on the Mixamo dataset, and experimental results demonstrate that it achieves comparable performance while delivering an order-of-magnitude speedup.
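The bone-length-preserving iteration is not spelled out in the abstract; as a hedged sketch of the general idea, one pass of a simple length-restoring projection (assuming parents precede children in the joint array, with hypothetical names) could look like this:

```python
import numpy as np

# Minimal sketch, not the paper's solver: after joints have been moved toward
# their retargeting targets, pull each child back along its parent-child
# direction so the character's original bone lengths are restored.
def restore_bone_lengths(joints, parents, rest_lengths):
    out = np.array(joints, dtype=float)            # (J, 3) joint positions
    for j, p in enumerate(parents):                # parents[j] = -1 for the root
        if p < 0:
            continue
        d = out[j] - out[p]
        n = np.linalg.norm(d)
        if n > 1e-8:
            out[j] = out[p] + d / n * rest_lengths[j]   # rest length to the parent
    return out
```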
{"title":"Ultrafast and Controllable Online Motion Retargeting for Game Scenarios","authors":"Tianze Guo, Zhedong Chen, Yi Jiang, Linjun Wu, Xilei Wei, Lang Xu, Yeshuang Lin, He Wang, Xiaogang Jin","doi":"10.1145/3763351","DOIUrl":"https://doi.org/10.1145/3763351","url":null,"abstract":"Geometry-aware online motion retargeting is crucial for real-time character animation in gaming and virtual reality. However, existing methods often rely on complex optimization procedures or deep neural networks, which constrain their applicability in real-time scenarios. Moreover, they offer limited control over fine-grained motion details involved in character interactions, resulting in less realistic outcomes. To overcome these limitations, we propose a novel optimization framework for ultrafast, lightweight motion retargeting with joint-level control (i.e., controls over joint position, bone orientation, etc,). Our approach introduces a semantic-aware objective grounded in a spherical geometry representation, coupled with a bone-length-preserving algorithm that iteratively solves this objective. This formulation preserves spatial relationships among spheres, thereby maintaining motion semantics, mitigating interpenetration, and ensuring contact. It is lightweight and computationally efficient, making it particularly suitable for time-critical real-time deployment scenarios. Additionally, we incorporate a heuristic optimization strategy that enables rapid convergence and precise joint-level control. We evaluate our method against state-of-the-art approaches on the Mixamo dataset, and experimental results demonstrate that it achieves comparable performance while delivering an order-of-magnitude speedup.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"10 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaussian Integral Linear Operators for Precomputed Graphics
Haolin Lu, Yash Belhe, Gurprit Singh, Tzu-Mao Li, Toshiya Hachisuka
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763321
Integral linear operators play a key role in many graphics problems, but solutions obtained via Monte Carlo methods often suffer from high variance. A common strategy to improve the efficiency of integration across various inputs is to precompute the kernel function. Traditional methods typically rely on basis expansions for both the input and output functions. However, using fixed output bases can restrict the precision of output reconstruction and limit the compactness of the kernel representation. In this work, we introduce a new method that approximates both the kernel and the input function using Gaussian mixtures. This formulation allows the integral operator to be evaluated analytically, leading to improved flexibility in kernel storage and output representation. Moreover, our method naturally supports the sequential application of multiple operators and enables closed-form operator composition, which is particularly beneficial in tasks involving chains of operators. We demonstrate the versatility and effectiveness of our approach across a variety of graphics problems, including environment map relighting, boundary value problems, and fluorescence rendering.
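The closed-form evaluation rests on the standard identity that the product of two Gaussians integrates to a Gaussian in their means, int N(x; m1, v1) N(x; m2, v2) dx = N(m1; m2, v1 + v2). The sketch below assumes, purely for illustration, a separable 1D Gaussian-mixture kernel; the paper's actual parameterization is not specified in the abstract.

```python
import numpy as np

def gauss(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Hypothetical separable kernel K(y, x) = sum_j w_j N(y; a_j, s_j) N(x; b_j, t_j)
# applied to an input mixture f(x) = sum_i v_i N(x; m_i, r_i). Because
# int N(x; b, t) N(x; m, r) dx = N(b; m, t + r), the output is again a
# Gaussian mixture and needs no numerical quadrature.
def apply_operator(y, kernel, f):
    out = 0.0
    for (w, a, s, b, t) in kernel:   # (weight, out-mean, out-var, in-mean, in-var)
        for (v, m, r) in f:          # (weight, mean, var) of the input mixture
            out += w * v * gauss(b, m, t + r) * gauss(y, a, s)
    return out
```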
{"title":"Gaussian Integral Linear Operators for Precomputed Graphics","authors":"Haolin Lu, Yash Belhe, Gurprit Singh, Tzu-Mao Li, Toshiya Hachisuka","doi":"10.1145/3763321","DOIUrl":"https://doi.org/10.1145/3763321","url":null,"abstract":"Integral linear operators play a key role in many graphics problems, but solutions obtained via Monte Carlo methods often suffer from high variance. A common strategy to improve the efficiency of integration across various inputs is to precompute the kernel function. Traditional methods typically rely on basis expansions for both the input and output functions. However, using fixed output bases can restrict the precision of output reconstruction and limit the compactness of the kernel representation. In this work, we introduce a new method that approximates both the kernel and the input function using Gaussian mixtures. This formulation allows the integral operator to be evaluated analytically, leading to improved flexibility in kernel storage and output representation. Moreover, our method naturally supports the sequential application of multiple operators and enables closed-form operator composition, which is particularly beneficial in tasks involving chains of operators. We demonstrate the versatility and effectiveness of our approach across a variety of graphics problems, including environment map relighting, boundary value problems, and fluorescence rendering.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"34 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Glare Pattern Depiction: High-Fidelity Physical Computation and Physiologically-Inspired Visual Response
Yuxiang Sun, Gladimir V. G. Baranoski
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763356
When observing an intense light source, humans perceive dense radiating spikes known as glare/starburst patterns. These patterns are frequently used in computer graphics applications to enhance the perception of brightness (e.g., in games and films). Previous works have computed the physical energy distribution of glare patterns under daytime conditions using approximations like Fresnel diffraction. These techniques are capable of producing visually believable results, particularly when the pupil remains small. However, they are insufficient under nighttime conditions, when the pupil is significantly dilated and the assumptions behind the approximations no longer hold. To address this, we employ the Rayleigh-Sommerfeld diffraction solution, from which Fresnel diffraction is derived as an approximation, as our baseline reference. In pursuit of performance and visual quality, we also employ Ochoa's approximation and the Chirp Z transform to efficiently generate high-resolution results for computer graphics applications. By also taking into account background illumination and certain physiological characteristics of the human photoreceptor cells, particularly the visual threshold of light stimulus, we propose a framework capable of producing plausible visual depictions of glare patterns for both daytime and nighttime scenes.
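For readers who want a baseline to experiment with: the angular-spectrum method is mathematically equivalent to the first Rayleigh-Sommerfeld solution and is straightforward on a fixed FFT grid. This hedged sketch does not reproduce Ochoa's approximation or the chirp-Z output resampling the paper uses for high-resolution results.

```python
import numpy as np

# Angular-spectrum propagation of a sampled complex aperture field u0 over a
# distance z. Equivalent to the first Rayleigh-Sommerfeld solution (no
# paraxial/Fresnel approximation), but without chirp-Z output resampling.
def angular_spectrum_propagate(u0, wavelength, dx, z):
    ny, nx = u0.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    fxx, fyy = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * fxx) ** 2 - (wavelength * fyy) ** 2
    kz = (2.0 * np.pi / wavelength) * np.sqrt(np.maximum(arg, 0.0))
    h = np.where(arg > 0.0, np.exp(1j * kz * z), 0.0)    # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(u0) * h)
```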
{"title":"Glare Pattern Depiction: High-Fidelity Physical Computation and Physiologically-Inspired Visual Response","authors":"Yuxiang Sun, Gladimir V. G. Baranoski","doi":"10.1145/3763356","DOIUrl":"https://doi.org/10.1145/3763356","url":null,"abstract":"When observing an intense light source, humans perceive dense radiating spikes known as glare/starburst patterns. These patterns are frequently used in computer graphics applications to enhance the perception of brightness (e.g., in games and films). Previous works have computed the physical energy distribution of glare patterns under daytime conditions using approximations like Fresnel diffraction. These techniques are capable of producing visually believable results, particularly when the pupil remains small. However, they are insufficient under nighttime conditions, when the pupil is significantly dilated and the assumptions behind the approximations no longer hold. To address this, we employ the Rayleigh-Sommerfeld diffraction solution, from which Fresnel diffraction is derived as an approximation, as our baseline reference. In pursuit of performance and visual quality, we also employ Ochoa's approximation and the Chirp Z transform to efficiently generate high-resolution results for computer graphics applications. By also taking into account background illumination and certain physiological characteristics of the human photoreceptor cells, particularly the visual threshold of light stimulus, we propose a framework capable of producing plausible visual depictions of glare patterns for both daytime and nighttime scenes.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"155 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artifact-Resilient Real-Time Holography
Victor Chu, Oscar Pueyo-Ciutad, Ethan Tseng, Florian Schiffers, Grace Kuo, Nathan Matsuda, Alberto Redo-Sanchez, Douglas Lanman, Oliver Cossairt, Felix Heide
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763361
Holographic near-eye displays promise unparalleled depth cues, high-resolution imagery, and realistic three-dimensional parallax at a compact form factor, making them promising candidates for emerging augmented and virtual reality systems. However, existing holographic display methods often assume ideal viewing conditions and overlook real-world factors such as eye floaters and eyelashes—obstructions that can severely degrade perceived image quality. In this work, we propose a new metric that quantifies hologram resilience to artifacts and apply it to computer-generated holography (CGH) optimization. We call this Artifact-Resilient Holography (ARH). We begin by introducing a simulation method that models the effects of pre- and post-pupil obstructions on holographic displays. Our analysis reveals that eyebox regions dominated by low frequencies—produced especially by the smooth-phase holograms broadly adopted in recent holography work—are vulnerable to visual degradation from dynamic obstructions such as floaters and eyelashes. In contrast, random-phase holograms spread energy more uniformly across the eyebox spectrum, enabling them to diffract around obstructions without producing prominent artifacts. By characterizing a random-phase eyebox using the Rayleigh distribution, we derive a differentiable metric in the eyebox domain. We then apply this metric to train a real-time neural network-based phase generator, enabling it to produce artifact-resilient 3D holograms that preserve visual fidelity across a range of practical viewing conditions—enhancing both robustness and user interactivity.
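The derived metric itself is not given in the abstract. As a purely hypothetical proxy for "how Rayleigh-like" an eyebox is, one could exploit the fact that a Rayleigh-distributed amplitude A satisfies E[A]^2 / E[A^2] = pi/4, a scale-free statistic that is easy to penalize differentiably (shown in NumPy here; an autodiff framework would be used in practice).

```python
import numpy as np

# Hypothetical proxy, not the paper's metric: treat the Fourier transform of the
# SLM field as the eyebox and penalize deviation of E[A]^2 / E[A^2] from pi/4,
# its value for a Rayleigh distribution. Smooth-phase holograms concentrate
# energy near DC and score poorly; random-phase holograms spread energy and
# score close to pi/4.
def rayleigh_penalty(slm_field):
    eyebox = np.fft.fftshift(np.fft.fft2(slm_field))
    amp = np.abs(eyebox)
    stat = amp.mean() ** 2 / (amp ** 2).mean()
    return (stat - np.pi / 4.0) ** 2
```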
{"title":"Artifact-Resilient Real-Time Holography","authors":"Victor Chu, Oscar Pueyo-Ciutad, Ethan Tseng, Florian Schiffers, Grace Kuo, Nathan Matsuda, Alberto Redo-Sanchez, Douglas Lanman, Oliver Cossairt, Felix Heide","doi":"10.1145/3763361","DOIUrl":"https://doi.org/10.1145/3763361","url":null,"abstract":"Holographic near-eye displays promise unparalleled depth cues, high-resolution imagery, and realistic three-dimensional parallax at a compact form factor, making them promising candidates for emerging augmented and virtual reality systems. However, existing holographic display methods often assume ideal viewing conditions and overlook real-world factors such as eye floaters and eyelashes—obstructions that can severely degrade perceived image quality. In this work, we propose a new metric that quantifies hologram resilience to artifacts and apply it to computer generated holography (CGH) optimization. We call this Artifact Resilient Holography (ARH). We begin by introducing a simulation method that models the effects of pre- and post-pupil obstructions on holographic displays. Our analysis reveals that eyebox regions dominated by low frequencies—produced especially by the smooth-phase holograms broadly adopted in recent holography work—are vulnerable to visual degradation from dynamic obstructions such as floaters and eyelashes. In contrast, random phase holograms spread energy more uniformly across the eyebox spectrum, enabling them to diffract around obstructions without producing prominent artifacts. By characterizing a random phase eyebox using the Rayleigh Distribution, we derive a differentiable metric in the eyebox domain. We then apply this metric to train a real-time neural network-based phase generator, enabling it to produce artifact-resilient 3D holograms that preserve visual fidelity across a range of practical viewing conditions—enhancing both robustness and user interactivity.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"26 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation
Tianyu Huang, Wangguandong Zheng, Tengfei Wang, Yuhao Liu, Zhenwei Wang, Junta Wu, Jie Jiang, Hui Li, Rynson Lau, Wangmeng Zuo, Chunchao Guo
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763330
Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image along a user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion: a unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observations to ensure global coherence; 2) Long-Range World Exploration: an efficient world cache with point culling and auto-regressive inference with smooth video sampling, enabling iterative scene extension with context-aware consistency; and 3) Scalable Data Engine: a video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training-data curation without manual 3D annotations. Collectively, these designs result in a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications. Code is available at: https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager.
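A minimal geometric sketch of the "world cache with point culling" idea: keep only cached points that project into the next camera's view. Names are hypothetical, and the paper's cache additionally handles redundancy and conditioning for the diffusion model.

```python
import numpy as np

# Hypothetical sketch: retain only world-cache points with positive depth that
# project inside the next frame's image bounds. K is the 3x3 intrinsics matrix,
# w2c the 4x4 world-to-camera transform.
def cull_world_cache(points_world, K, w2c, width, height):
    p_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    p_cam = (p_h @ w2c.T)[:, :3]                 # points in the camera frame
    z = p_cam[:, 2]
    uvw = p_cam @ K.T
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-8)
    keep = (z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < width) \
                   & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    return points_world[keep]
```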
{"title":"Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation","authors":"Tianyu Huang, Wangguandong Zheng, Tengfei Wang, Yuhao Liu, Zhenwei Wang, Junta Wu, Jie Jiang, Hui Li, Rynson Lau, Wangmeng Zuo, Chunchao Guo","doi":"10.1145/3763330","DOIUrl":"https://doi.org/10.1145/3763330","url":null,"abstract":"Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present <jats:italic toggle=\"yes\">Voyager</jats:italic> , a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion : A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observation to ensure global coherence 2) Long-Range World Exploration : An efficient world cache with point culling and an auto-regressive inference with smooth video sampling for iterative scene extension with context-aware consistency, and 3) Scalable Data Engine : A video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Collectively, these designs result in a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications. Code for this paper are at https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"34 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auto Hair Card Extraction for Smooth Hair with Differentiable Rendering
Zhongtian Zheng, Tao Huang, Haozhe Su, Xueqi Ma, Yuefan Shen, Tongtong Wang, Yin Yang, Xifeng Gao, Zherong Pan, Kui Wu
ACM Transactions on Graphics, 2025-12-04. https://doi.org/10.1145/3763295
Hair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited number of cards and textures while preserving the hairstyle appearance. Our key idea is a novel differentiable representation where each strand is encoded as a projected 2D curve in the texture space, which enables end-to-end optimization with differentiable rendering while respecting the structures of the hair geometry. Based on this representation, we develop a novel algorithm pipeline, where we first cluster hair strands into initial hair cards and project the strands into the texture space. We then conduct a two-stage optimization, where our first stage optimizes the orientation of each hair card separately, and after strand projection, our second stage conducts joint optimization over the entire hair card model for fine-tuning. Our method is evaluated on a range of hairstyles, including straight, wavy, curly, and coily hair. To capture the appearance of short or coily hair, our method comes with support for hair caps and cross-card.
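As a hedged illustration of one early step (clustering aside), fitting a card plane to a strand cluster and projecting its points into 2D texture-space coordinates can be done with a PCA; the paper then optimizes card orientations and the projected curves with differentiable rendering, which this sketch does not attempt.

```python
import numpy as np

# Minimal sketch with hypothetical names: fit a card plane to one cluster of
# strand points via SVD/PCA and project the points to 2D card coordinates.
def fit_card_and_project(strand_points):
    p = np.asarray(strand_points, dtype=float)   # (N, 3) points of one strand cluster
    center = p.mean(axis=0)
    _, _, vt = np.linalg.svd(p - center, full_matrices=False)
    axes = vt[:2]                                # two dominant directions span the card
    normal = vt[2]                               # least-variance direction = card normal
    uv = (p - center) @ axes.T                   # 2D coordinates on the card plane
    return center, axes, normal, uv
```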
{"title":"Auto Hair Card Extraction for Smooth Hair with Differentiable Rendering","authors":"Zhongtian Zheng, Tao Huang, Haozhe Su, Xueqi Ma, Yuefan Shen, Tongtong Wang, Yin Yang, Xifeng Gao, Zherong Pan, Kui Wu","doi":"10.1145/3763295","DOIUrl":"https://doi.org/10.1145/3763295","url":null,"abstract":"Hair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited number of cards and textures while preserving the hairstyle appearance. Our key idea is a novel differentiable representation where each strand is encoded as a projected 2D curve in the texture space, which enables end-to-end optimization with differentiable rendering while respecting the structures of the hair geometry. Based on this representation, we develop a novel algorithm pipeline, where we first cluster hair strands into initial hair cards and project the strands into the texture space. We then conduct a two-stage optimization, where our first stage optimizes the orientation of each hair card separately, and after strand projection, our second stage conducts joint optimization over the entire hair card model for fine-tuning. Our method is evaluated on a range of hairstyles, including straight, wavy, curly, and coily hair. To capture the appearance of short or coily hair, our method comes with support for hair caps and cross-card.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"12 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Resolution Where It Counts: Hash-based GPU-Accelerated 3D Reconstruction via Variance-Adaptive Voxel Grids
Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino
ACM Transactions on Graphics, 2025-11-20. https://doi.org/10.1145/3777909
Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on-par reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.
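A CPU-side sketch of the two core ideas: a flat hash keyed by integer block coordinates, and per-block running SDF variance (Welford's update) driving refinement decisions. The paper's data structure is a GPU hash table with per-block resolutions; names and thresholds below are hypothetical.

```python
import numpy as np

# Minimal CPU sketch, not the paper's GPU implementation: voxel blocks live in a
# flat dict keyed by integer block coordinates; each entry keeps running SDF
# statistics via Welford's algorithm, and high-variance blocks are flagged for
# finer resolution.
class SdfStats:
    __slots__ = ("n", "mean", "m2")
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
    def update(self, sdf):
        self.n += 1
        d = sdf - self.mean
        self.mean += d / self.n
        self.m2 += d * (sdf - self.mean)
    @property
    def variance(self):
        return self.m2 / self.n if self.n > 1 else 0.0

blocks = {}  # flat hash: integer block coordinate -> running SDF statistics

def integrate_sample(point, sdf, block_size=0.4):
    key = tuple(np.floor(np.asarray(point) / block_size).astype(int))
    blocks.setdefault(key, SdfStats()).update(sdf)

def needs_refinement(key, var_threshold=1e-3):
    stats = blocks.get(key)
    return stats is not None and stats.variance > var_threshold
```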
{"title":"Resolution Where It Counts: Hash-based GPU-Accelerated 3D Reconstruction via Variance-Adaptive Voxel Grids","authors":"Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino","doi":"10.1145/3777909","DOIUrl":"https://doi.org/10.1145/3777909","url":null,"abstract":"Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency, and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on par results in terms of reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"204 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Voronoi Rooms: Dynamic Visibility Modulation of Overlapping Spaces for Telepresence
Taehei Kim, Jihun Shin, Hyeshim Kim, Hyuckjin Jang, Jiho Kang, Sung-Hee Lee
ACM Transactions on Graphics, 2025-11-20. https://doi.org/10.1145/3777900
We propose a multi-user Mixed Reality (MR) telepresence system that allows users to interact by seamlessly visualizing remote environments and avatars overlaid onto their local physical space. Building on prior shared-space approaches, our method first aligns overlapping rooms to maximize a shared space, a common area containing matched real and virtual objects where all users can interact. Uniquely, our system extends beyond this shared space by visualizing non-shared spaces, the remaining part of each room, allowing users to inhabit these distinct areas. To address the issue of overlap between non-shared spaces, we dynamically adjust their visibility based on user proximity, using a Voronoi diagram to prioritize subspaces closer to each user. Visualizing the surrounding space of each user conveys spatial context, helping others interpret their behavior within their environment. Visibility is updated in real time as users move, maintaining a coherent sense of spatial awareness. Through a user study, we demonstrate that our system enhances enjoyment, spatial understanding, and presence compared to shared-space-only approaches. Quantitative results further show that our dynamic visibility modulation improves both personal space preservation and space accessibility relative to static methods. Overall, our system provides users with a seamless, dynamically connected, and shared multi-room environment.
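The proximity rule can be pictured as a discrete Voronoi assignment: each non-shared subspace is made visible to the user whose position is nearest to it. A minimal sketch under that assumption, with hypothetical names:

```python
import numpy as np

# Hypothetical sketch: assign each non-shared subspace to the nearest user, so
# overlapping remote subspaces never render for the same viewer simultaneously.
def assign_visibility(subspace_centers, user_positions):
    centers = np.asarray(subspace_centers, dtype=float)   # (S, 3) subspace centers
    users = np.asarray(user_positions, dtype=float)       # (U, 3) user positions
    dists = np.linalg.norm(centers[:, None, :] - users[None, :, :], axis=-1)
    return dists.argmin(axis=1)   # index of the owning user for each subspace
```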
{"title":"Voronoi Rooms: Dynamic Visibility Modulation of Overlapping Spaces for Telepresence","authors":"Taehei Kim, Jihun Shin, Hyeshim Kim, Hyuckjin Jang, Jiho Kang, Sung-Hee Lee","doi":"10.1145/3777900","DOIUrl":"https://doi.org/10.1145/3777900","url":null,"abstract":"We propose a multi-user Mixed Reality (MR) telepresence system that allows users to interact by seamlessly visualizing remote environments and avatars overlaid onto their local physical space. Building on prior shared-space approaches, our method first aligns overlapping rooms to maximize a <jats:italic toggle=\"yes\">shared space</jats:italic> —a common area containing matched real and virtual objects where all users can interact. Uniquely, our system extends beyond this shared space by visualizing non-shared spaces, the remaining part of each room, allowing users to inhabit these distinct areas. To address the issue of overlap between non-shared spaces, we dynamically adjust their visibility based on user proximity, using a Voronoi diagram to prioritize subspaces closer to each user. Visualizing the surrounding space of each user conveys spatial context, helping others interpret their behavior within their environment. Visibility is updated in real time as users move, maintaining a coherent sense of spatial awareness. Through a user study, we demonstrate that our system enhances enjoyment, spatial understanding, and presence compared to shared-space-only approaches. Quantitative results further show that our dynamic visibility modulation improves both personal space preservation and space accessibility relative to static methods. Overall, our system provides users with a seamless, dynamically connected, and shared multi-room environment.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"6 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}