Remanufacturing effectively extends component lifespans by restoring used or end-of-life parts to like-new or even superior condition, with an emphasis on maximizing reutilized material, especially for high-cost materials. Hybrid manufacturing technology combines the capabilities of additive and subtractive manufacturing; its ability to both add and remove material enables the remanufacturing of complex shapes, and it is increasingly being applied in remanufacturing. How to effectively plan the process of additive and subtractive hybrid remanufacturing (ASHRM) to maximize material reutilization has become a key focus of attention. However, current ASHRM process planning methods lack strict consideration of collision-free constraints, hindering practical application. This paper introduces a computational framework that tackles ASHRM process planning for general shapes while strictly considering these constraints. We separate global and local collision-free constraints, employing clipping planes and a graph to tackle them, respectively, ultimately maximizing the reutilized volume while ensuring these constraints are satisfied. Additionally, we optimize the setup of the target model to further increase the reutilized volume. Extensive experiments and physical validations on a 5-axis hybrid manufacturing platform demonstrate the effectiveness of our method across various 3D shapes, achieving an average material reutilization of 69% across 12 cases. Code is publicly available at https://github.com/fanchao98/Waste-to-Value.
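To make the clipping-plane idea concrete, here is a minimal voxel-based toy, not the paper's algorithm: a candidate clipping plane splits space, and old-part material kept on one side that also lies inside the target model counts as reutilized volume. The boolean-grid setup, function names, and scoring rule are illustrative assumptions.

```python
import numpy as np

# Toy illustration (not the paper's method): score a candidate clipping plane
# by the volume of old-part material that can be kept and reused inside the
# target model. Both shapes are boolean occupancy grids on the same lattice.

def reutilized_volume(old_part, target, normal, offset, voxel_size=1.0):
    """Volume of old-part voxels on the 'keep' side of the plane n.x <= offset
    that are also inside the target model."""
    nx, ny, nz = old_part.shape
    xs, ys, zs = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    centers = (np.stack([xs, ys, zs], axis=-1) + 0.5) * voxel_size
    keep_side = centers @ np.asarray(normal, dtype=float) <= offset
    reused = old_part & target & keep_side          # kept material still needed
    return reused.sum() * voxel_size ** 3

# Example: a used 20^3 block, a spherical target, and two candidate planes.
old_part = np.ones((20, 20, 20), dtype=bool)
xs, ys, zs = np.meshgrid(*[np.arange(20) + 0.5] * 3, indexing="ij")
target = (xs - 10) ** 2 + (ys - 10) ** 2 + (zs - 10) ** 2 <= 8 ** 2

for offset in (8.0, 14.0):
    v = reutilized_volume(old_part, target, normal=(0, 0, 1), offset=offset)
    print(f"plane z <= {offset}: reutilized volume = {v:.0f}")
```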
{"title":"Waste-to-Value: Reutilized Material Maximization for Additive and Subtractive Hybrid Remanufacturing","authors":"Fanchao Zhong, Zhenmin Zhang, Liyuan Wang, Xin Yan, Jikai Liu, Lin Lu, Haisen Zhao","doi":"10.1145/3763313","DOIUrl":"https://doi.org/10.1145/3763313","url":null,"abstract":"Remanufacturing effectively extends component lifespans by restoring used or end-of-life parts to like-new or even superior conditions, with an emphasis on maximizing reutilized material, especially for high-cost materials. Hybrid manufacturing technology combines the capabilities of additive and subtractive manufacturing, with the ability to add and remove material, enabling it to remanufacture complex shapes and is increasingly being applied in remanufacturing. How to effectively plan the process of additive and subtractive hybrid remanufacturing (ASHRM) to maximize material reutilization has become a key focus of attention. However, current ASHRM process planning methods lack strict consideration of collision-free constraints, hindering practical application. This paper introduces a computational framework to tackle ASHRM process planning for general shapes with strictly considering these constraints. We separate global and local collision-free constraints, employing clipping planes and graph to tackle them respectively, ultimately maximizing the reutilized volume while ensuring these constraints are satisfied. Additionally, we also optimize the setup of the target model that is conducive to maximizing the reutilized volume. Extensive experiments and physical validations on a 5-axis hybrid manufacturing platform demonstrate the effectiveness of our method across various 3D shapes, achieving an average material reutilization of 69% across 12 cases. Code is publicly available at https://github.com/fanchao98/Waste-to-Value.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"125 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating lighting in indoor scenes is particularly challenging due to the diverse distribution of light sources and the complexity of scene geometry. Previous methods mainly focused on spatial variability and consistency for a single image or temporal consistency for video sequences. However, these approaches fail to achieve spatio-temporal consistency in video lighting estimation, which restricts applications such as compositing animated models into videos. In this paper, we propose STGlight, a lightweight and effective method for spatio-temporally consistent video lighting estimation. Our network processes a stream of LDR RGB-D video frames while maintaining incrementally updated global representations of both geometry and lighting, enabling the prediction of HDR environment maps at arbitrary locations for each frame. We model indoor lighting with three components: visible light sources providing direct illumination, ambient lighting approximating indirect illumination, and local environment textures producing high-quality specular reflections on glossy objects. To capture spatially varying lighting, we represent scene geometry with point clouds, which support efficient spatio-temporal fusion and allow us to handle moderately dynamic scenes. To ensure temporal consistency, we apply a transformer-based fusion block that propagates lighting features across frames. Building on this, we further handle dynamic lighting with moving objects or changing light conditions by applying intrinsic decomposition on the point cloud and integrating the decomposed components with a neural fusion module. Experiments show that our online method can effectively predict lighting for any position within the video stream, while maintaining spatial variability and spatio-temporal consistency. Code is available at: https://github.com/nauyihsnehs/STGlight.
{"title":"STGlight: Online Indoor Lighting Estimation via Spatio-Temporal Gaussian Fusion","authors":"Shiyuan Shen, Zhongyun Bao, Hong Ding, Wenju Xu, Tenghui Lai, Chunxia Xiao","doi":"10.1145/3763350","DOIUrl":"https://doi.org/10.1145/3763350","url":null,"abstract":"Estimating lighting in indoor scenes is particularly challenging due to diverse distribution of light sources and complexity of scene geometry. Previous methods mainly focused on spatial variability and consistency for a single image or temporal consistency for video sequences. However, these approaches fail to achieve spatio-temporal consistency in video lighting estimation, which restricts applications such as compositing animated models into videos. In this paper, we propose STGlight, a lightweight and effective method for spatio-temporally consistent video lighting estimation, where our network processes a stream of LDR RGB-D video frames while maintaining incrementally updated global representations of both geometry and lighting, enabling the prediction of HDR environment maps at arbitrary locations for each frame. We model indoor lighting with three components: visible light sources providing direct illumination, ambient lighting approximating indirect illumination, and local environment textures producing high-quality specular reflections on glossy objects. To capture spatial-varying lighting, we represent scene geometry with point clouds, which support efficient spatio-temporal fusion and allow us to handle moderately dynamic scenes. To ensure temporal consistency, we apply a transformer-based fusion block that propagates lighting features across frames. Building on this, we further handle dynamic lighting with moving objects or changing light conditions by applying intrinsic decomposition on the point cloud and integrating the decomposed components with a neural fusion module. Experiments show that our online method can effectively predict lighting for any position within the video stream, while maintaining spatial variability and spatio-temporal consistency. Code is available at: https://github.com/nauyihsnehs/STGlight.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"115 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiwei Wang, Wenbin Song, Yicheng Fan, Yang Wang, Xiaopei Liu
Unmanned aerial vehicles (UAVs) have demonstrated remarkable efficacy across diverse fields. Nevertheless, developing flight controllers tailored to a specific UAV design, particularly in environments with strong fluid-interactive dynamics, remains challenging. Conventional controller-design experience often falls short in such cases, rendering it infeasible to apply time-tested practices. Consequently, a simulation test bed becomes indispensable for controller design and evaluation prior to actual implementation on the physical UAV. This platform should allow for meticulous adjustment of controllers and should transfer to real-world systems without significant performance degradation. Existing simulators predominantly hinge on empirical models for their high efficiency, often overlooking the dynamic interplay between the UAV and the surrounding airflow. This makes it difficult to mimic more complex flight maneuvers, such as an abrupt midair halt inside a narrow channel, in which the UAV may experience strong fluid-structure interactions. On the other hand, simulators that resolve the complex surrounding airflow are extremely slow and inadequate for supporting the design and evaluation of flight controllers. In this paper, we present a novel remedy for highly efficient UAV flight simulation: a hybrid modeling approach that combines our novel far-field adaptive block-based fluid simulator with parametric empirical models near the UAV boundary, with the model parameters automatically calibrated. With this newly devised simulator, a broader spectrum of flight scenarios can be explored for controller design and assessment, encompassing those influenced by potent close-proximity effects or situations where multiple UAVs operate in close quarters. The practical worth of our simulator has been authenticated through comparisons with actual UAV flight data. We further showcase its utility in designing flight controllers for fixed-wing, multi-rotor, and hybrid UAVs, and even exemplify its application when multiple UAVs are involved, underlining the unique value of our system for flight controller development.
{"title":"A Highly-Efficient Hybrid Simulation System for Flight Controller Design and Evaluation of Unmanned Aerial Vehicles","authors":"Jiwei Wang, Wenbin Song, Yicheng Fan, Yang Wang, Xiaopei Liu","doi":"10.1145/3763283","DOIUrl":"https://doi.org/10.1145/3763283","url":null,"abstract":"Unmanned aerial vehicles (UAVs) have demonstrated remarkable efficacy across diverse fields. Nevertheless, developing flight controllers tailored to a specific UAV design, particularly in environments with strong fluid-interactive dynamics, remains challenging. Conventional controller design experiences often fall short in such cases, rendering it infeasible to apply time-tested practices. Consequently, a simulation test bed becomes indispensable for controller design and evaluation prior to its actual implementation on the physical UAV. This platform should allow for meticulous adjustment of controllers and should be able to transfer to real-world systems without significant performance degradation. Existing simulators predominantly hinge on empirical models due to high efficiency, often overlooking the dynamic interplay between the UAV and the surrounding airflow. This makes it difficult to mimic more complex flight maneuvers, such as an abrupt midair halt inside narrow channels, in which the UAV may experience strong fluid-structure interactions. On the other hand, simulators considering the complex surrounding airflow are extremely slow and inadequate to support the design and evaluation of flight controllers. In this paper, we present a novel remedy for highly-efficient UAV flight simulations, which entails a hybrid modeling that deftly combines our novel far-field adaptive block-based fluid simulator with parametric empirical models situated near the boundary of the UAV, with the model parameters automatically calibrated. With this newly devised simulator, a broader spectrum of flight scenarios can be explored for controller design and assessment, encompassing those influenced by potent close-proximity effects, or situations where multiple UAVs operate in close quarters. The practical worth of our simulator has been authenticated through comparisons with actual UAV flight data. We further showcase its utility in designing flight controllers for fixed-wing, multi-rotor, and hybrid UAVs, and even exemplify its application when multiple UAVs are involved, underlining the unique value of our system for flight controllers.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"34 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tianze Guo, Zhedong Chen, Yi Jiang, Linjun Wu, Xilei Wei, Lang Xu, Yeshuang Lin, He Wang, Xiaogang Jin
Geometry-aware online motion retargeting is crucial for real-time character animation in gaming and virtual reality. However, existing methods often rely on complex optimization procedures or deep neural networks, which constrain their applicability in real-time scenarios. Moreover, they offer limited control over fine-grained motion details involved in character interactions, resulting in less realistic outcomes. To overcome these limitations, we propose a novel optimization framework for ultrafast, lightweight motion retargeting with joint-level control (e.g., control over joint positions, bone orientations, etc.). Our approach introduces a semantic-aware objective grounded in a spherical geometry representation, coupled with a bone-length-preserving algorithm that iteratively solves this objective. This formulation preserves spatial relationships among spheres, thereby maintaining motion semantics, mitigating interpenetration, and ensuring contact. It is lightweight and computationally efficient, making it particularly suitable for time-critical, real-time deployment scenarios. Additionally, we incorporate a heuristic optimization strategy that enables rapid convergence and precise joint-level control. We evaluate our method against state-of-the-art approaches on the Mixamo dataset, and experimental results demonstrate that it achieves comparable performance while delivering an order-of-magnitude speedup.
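The bone-length-preserving iterative solve is not spelled out in the abstract; the sketch below shows a generic constraint-projection loop in the same spirit: retargeted joint positions are repeatedly nudged so each bone recovers its rest length, while joints under explicit joint-level control stay pinned and their parents absorb the correction. The function name, relaxation scheme, and pinning rule are assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Generic bone-length projection sketch (an assumption, not the paper's exact
# solver): iteratively nudge child joints so every bone recovers its rest
# length, while joints under explicit joint-level control stay pinned.

def project_bone_lengths(joints, parents, rest_lengths, pinned=(), iters=20):
    joints = joints.copy()                       # (J, 3) world positions
    pinned = set(pinned)
    for _ in range(iters):
        for j, p in enumerate(parents):
            if p < 0:                            # root has no incoming bone
                continue
            bone = joints[j] - joints[p]
            norm = np.linalg.norm(bone)
            if norm < 1e-8:
                continue
            correction = (rest_lengths[j] - norm) * bone / norm
            if j in pinned:                      # keep controlled joint fixed,
                joints[p] -= correction          # move the parent instead
            else:
                joints[j] += correction
    return joints

# Tiny 3-joint chain: root -> elbow -> wrist, with the wrist pinned at a contact.
parents = [-1, 0, 1]
rest_lengths = np.array([0.0, 1.0, 1.0])
retargeted = np.array([[0.0, 0.0, 0.0], [1.3, 0.0, 0.0], [2.1, 0.4, 0.0]])
fixed = project_bone_lengths(retargeted, parents, rest_lengths, pinned={2})
print(fixed)
```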
{"title":"Ultrafast and Controllable Online Motion Retargeting for Game Scenarios","authors":"Tianze Guo, Zhedong Chen, Yi Jiang, Linjun Wu, Xilei Wei, Lang Xu, Yeshuang Lin, He Wang, Xiaogang Jin","doi":"10.1145/3763351","DOIUrl":"https://doi.org/10.1145/3763351","url":null,"abstract":"Geometry-aware online motion retargeting is crucial for real-time character animation in gaming and virtual reality. However, existing methods often rely on complex optimization procedures or deep neural networks, which constrain their applicability in real-time scenarios. Moreover, they offer limited control over fine-grained motion details involved in character interactions, resulting in less realistic outcomes. To overcome these limitations, we propose a novel optimization framework for ultrafast, lightweight motion retargeting with joint-level control (i.e., controls over joint position, bone orientation, etc,). Our approach introduces a semantic-aware objective grounded in a spherical geometry representation, coupled with a bone-length-preserving algorithm that iteratively solves this objective. This formulation preserves spatial relationships among spheres, thereby maintaining motion semantics, mitigating interpenetration, and ensuring contact. It is lightweight and computationally efficient, making it particularly suitable for time-critical real-time deployment scenarios. Additionally, we incorporate a heuristic optimization strategy that enables rapid convergence and precise joint-level control. We evaluate our method against state-of-the-art approaches on the Mixamo dataset, and experimental results demonstrate that it achieves comparable performance while delivering an order-of-magnitude speedup.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"10 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integral linear operators play a key role in many graphics problems, but solutions obtained via Monte Carlo methods often suffer from high variance. A common strategy to improve the efficiency of integration across various inputs is to precompute the kernel function. Traditional methods typically rely on basis expansions for both the input and output functions. However, using fixed output bases can restrict the precision of output reconstruction and limit the compactness of the kernel representation. In this work, we introduce a new method that approximates both the kernel and the input function using Gaussian mixtures. This formulation allows the integral operator to be evaluated analytically, leading to improved flexibility in kernel storage and output representation. Moreover, our method naturally supports the sequential application of multiple operators and enables closed-form operator composition, which is particularly beneficial in tasks involving chains of operators. We demonstrate the versatility and effectiveness of our approach across a variety of graphics problems, including environment map relighting, boundary value problems, and fluorescence rendering.
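The closed-form evaluation mentioned in the abstract is easy to illustrate in 1D. If the kernel is a mixture of separable Gaussian components K(x, y) = sum_i w_i N(x; a_i, alpha_i) N(y; b_i, beta_i) and the input is f(y) = sum_j v_j N(y; m_j, s_j), then the integral of K(x, y) f(y) over y is again a Gaussian mixture in x, because the product of two Gaussians integrates to N(b_i; m_j, beta_i + s_j). The separable-component form and all variable names below are our own simplification for illustration, not necessarily the paper's parameterization; the sketch checks the closed form against brute-force quadrature.

```python
import numpy as np

def gauss(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Kernel: K(x, y) = sum_i w[i] * N(x; a[i], alpha[i]) * N(y; b[i], beta[i])
# Input:  f(y)    = sum_j v[j] * N(y; m[j], s[j])
# Output: (Kf)(x) = sum_{i,j} w[i]*v[j] * N(b[i]; m[j], beta[i]+s[j]) * N(x; a[i], alpha[i])

def apply_operator(w, a, alpha, b, beta, v, m, s, x):
    # cross-term weights from the analytic Gaussian-Gaussian integral
    cross = gauss(b[:, None], m[None, :], beta[:, None] + s[None, :])  # (I, J)
    out_weights = (w[:, None] * v[None, :] * cross).sum(axis=1)        # (I,)
    return sum(wi * gauss(x, ai, vi) for wi, ai, vi in zip(out_weights, a, alpha))

# Sanity check against brute-force quadrature.
rng = np.random.default_rng(0)
I, J = 4, 3
w, a, alpha = rng.uniform(0.5, 1, I), rng.uniform(-1, 1, I), rng.uniform(0.1, 0.3, I)
b, beta = rng.uniform(-1, 1, I), rng.uniform(0.1, 0.3, I)
v, m, s = rng.uniform(0.5, 1, J), rng.uniform(-1, 1, J), rng.uniform(0.1, 0.3, J)

x = np.linspace(-3, 3, 5)
y = np.linspace(-8, 8, 20001)
K = sum(wi * gauss(x[:, None], ai, al) * gauss(y[None, :], bi, be)
        for wi, ai, al, bi, be in zip(w, a, alpha, b, beta))
f = sum(vj * gauss(y, mj, sj) for vj, mj, sj in zip(v, m, s))
brute = np.trapz(K * f[None, :], y, axis=1)

print(np.max(np.abs(apply_operator(w, a, alpha, b, beta, v, m, s, x) - brute)))
```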
{"title":"Gaussian Integral Linear Operators for Precomputed Graphics","authors":"Haolin Lu, Yash Belhe, Gurprit Singh, Tzu-Mao Li, Toshiya Hachisuka","doi":"10.1145/3763321","DOIUrl":"https://doi.org/10.1145/3763321","url":null,"abstract":"Integral linear operators play a key role in many graphics problems, but solutions obtained via Monte Carlo methods often suffer from high variance. A common strategy to improve the efficiency of integration across various inputs is to precompute the kernel function. Traditional methods typically rely on basis expansions for both the input and output functions. However, using fixed output bases can restrict the precision of output reconstruction and limit the compactness of the kernel representation. In this work, we introduce a new method that approximates both the kernel and the input function using Gaussian mixtures. This formulation allows the integral operator to be evaluated analytically, leading to improved flexibility in kernel storage and output representation. Moreover, our method naturally supports the sequential application of multiple operators and enables closed-form operator composition, which is particularly beneficial in tasks involving chains of operators. We demonstrate the versatility and effectiveness of our approach across a variety of graphics problems, including environment map relighting, boundary value problems, and fluorescence rendering.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"34 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When observing an intense light source, humans perceive dense radiating spikes known as glare/starburst patterns. These patterns are frequently used in computer graphics applications to enhance the perception of brightness (e.g., in games and films). Previous works have computed the physical energy distribution of glare patterns under daytime conditions using approximations like Fresnel diffraction. These techniques are capable of producing visually believable results, particularly when the pupil remains small. However, they are insufficient under nighttime conditions, when the pupil is significantly dilated and the assumptions behind the approximations no longer hold. To address this, we employ the Rayleigh-Sommerfeld diffraction solution, from which Fresnel diffraction is derived as an approximation, as our baseline reference. In pursuit of performance and visual quality, we also employ Ochoa's approximation and the Chirp Z transform to efficiently generate high-resolution results for computer graphics applications. By also taking into account background illumination and certain physiological characteristics of the human photoreceptor cells, particularly the visual threshold of light stimulus, we propose a framework capable of producing plausible visual depictions of glare patterns for both daytime and nighttime scenes.
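For reference, the first Rayleigh-Sommerfeld solution can be evaluated by direct summation over a sampled pupil using the textbook kernel h = (z / 2π) (1/r - ik) exp(ikr) / r², which reduces to the familiar 1/(iλ) obliquity form when r is much larger than the wavelength. The sketch below is a brute-force toy at low resolution, not the paper's Ochoa/Chirp-Z-accelerated pipeline; the pupil model, occluder, and sampling parameters are placeholders.

```python
import numpy as np

# Brute-force Rayleigh-Sommerfeld (first kind) propagation of a sampled pupil
# field to an observation plane. Toy resolution only; the paper accelerates
# this with Ochoa's approximation and the Chirp Z transform.

wavelength = 550e-9                  # green light [m]
k = 2.0 * np.pi / wavelength
z = 25e-3                            # propagation distance [m] (placeholder)

# Dilated circular pupil with a crude eyelash-like occluding strip (placeholder).
n, radius = 128, 3e-3
coords = np.linspace(-radius, radius, n)
X, Y = np.meshgrid(coords, coords)
pupil = (X**2 + Y**2 <= radius**2).astype(complex)
pupil[np.abs(Y - 2e-3) < 0.05e-3] = 0.0
dA = (coords[1] - coords[0]) ** 2

# Small observation patch around the optical axis.
obs = np.linspace(-2e-5, 2e-5, 32)
U = np.zeros((32, 32), dtype=complex)
for ix, xo in enumerate(obs):
    for iy, yo in enumerate(obs):
        r = np.sqrt((xo - X) ** 2 + (yo - Y) ** 2 + z ** 2)
        h = (z / (2.0 * np.pi)) * (1.0 / r - 1j * k) * np.exp(1j * k * r) / r**2
        U[iy, ix] = np.sum(pupil * h) * dA

intensity = np.abs(U) ** 2
print(intensity.max(), intensity.sum())
```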
{"title":"Glare Pattern Depiction: High-Fidelity Physical Computation and Physiologically-Inspired Visual Response","authors":"Yuxiang Sun, Gladimir V. G. Baranoski","doi":"10.1145/3763356","DOIUrl":"https://doi.org/10.1145/3763356","url":null,"abstract":"When observing an intense light source, humans perceive dense radiating spikes known as glare/starburst patterns. These patterns are frequently used in computer graphics applications to enhance the perception of brightness (e.g., in games and films). Previous works have computed the physical energy distribution of glare patterns under daytime conditions using approximations like Fresnel diffraction. These techniques are capable of producing visually believable results, particularly when the pupil remains small. However, they are insufficient under nighttime conditions, when the pupil is significantly dilated and the assumptions behind the approximations no longer hold. To address this, we employ the Rayleigh-Sommerfeld diffraction solution, from which Fresnel diffraction is derived as an approximation, as our baseline reference. In pursuit of performance and visual quality, we also employ Ochoa's approximation and the Chirp Z transform to efficiently generate high-resolution results for computer graphics applications. By also taking into account background illumination and certain physiological characteristics of the human photoreceptor cells, particularly the visual threshold of light stimulus, we propose a framework capable of producing plausible visual depictions of glare patterns for both daytime and nighttime scenes.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"155 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Victor Chu, Oscar Pueyo-Ciutad, Ethan Tseng, Florian Schiffers, Grace Kuo, Nathan Matsuda, Alberto Redo-Sanchez, Douglas Lanman, Oliver Cossairt, Felix Heide
Holographic near-eye displays promise unparalleled depth cues, high-resolution imagery, and realistic three-dimensional parallax in a compact form factor, making them promising candidates for emerging augmented and virtual reality systems. However, existing holographic display methods often assume ideal viewing conditions and overlook real-world factors such as eye floaters and eyelashes, obstructions that can severely degrade perceived image quality. In this work, we propose a new metric that quantifies hologram resilience to such artifacts and apply it to computer-generated holography (CGH) optimization. We call this Artifact-Resilient Holography (ARH). We begin by introducing a simulation method that models the effects of pre- and post-pupil obstructions on holographic displays. Our analysis reveals that eyebox regions dominated by low frequencies, produced especially by the smooth-phase holograms broadly adopted in recent holography work, are vulnerable to visual degradation from dynamic obstructions such as floaters and eyelashes. In contrast, random-phase holograms spread energy more uniformly across the eyebox spectrum, enabling them to diffract around obstructions without producing prominent artifacts. By characterizing a random-phase eyebox using the Rayleigh distribution, we derive a differentiable metric in the eyebox domain. We then apply this metric to train a real-time neural-network-based phase generator, enabling it to produce artifact-resilient 3D holograms that preserve visual fidelity across a range of practical viewing conditions, enhancing both robustness and user interactivity.
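The paper's differentiable eyebox metric is not reproduced here; purely as an illustration of scoring how "Rayleigh-like" an eyebox amplitude distribution is, one could take the eyebox field as the Fourier transform of a phase-only SLM field and evaluate a Rayleigh negative log-likelihood over its amplitudes. Everything in the sketch below (the FFT eyebox proxy, the NLL score, all names) is an assumption for illustration only.

```python
import numpy as np

# Illustrative only: score how closely eyebox amplitudes follow a Rayleigh
# distribution (uniform energy spread) as a rough proxy for obstruction
# resilience. The eyebox-as-FFT model and the NLL score are assumptions.

def eyebox_amplitudes(phase):
    field = np.exp(1j * phase)                       # phase-only SLM field
    eyebox = np.fft.fftshift(np.fft.fft2(field))     # far-field / pupil-plane proxy
    return np.abs(eyebox).ravel()

def rayleigh_nll(amplitudes, eps=1e-12):
    a = amplitudes + eps
    sigma2 = np.mean(a ** 2) / 2.0                   # Rayleigh MLE of the scale
    # negative log-likelihood per sample under Rayleigh(sigma)
    return np.mean(-np.log(a) + np.log(sigma2) + a ** 2 / (2.0 * sigma2))

rng = np.random.default_rng(1)
smooth_phase = np.outer(np.linspace(0, 2, 256), np.linspace(0, 2, 256))  # low-frequency
random_phase = rng.uniform(0, 2 * np.pi, (256, 256))

print("smooth-phase NLL:", rayleigh_nll(eyebox_amplitudes(smooth_phase)))
print("random-phase NLL:", rayleigh_nll(eyebox_amplitudes(random_phase)))
```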
{"title":"Artifact-Resilient Real-Time Holography","authors":"Victor Chu, Oscar Pueyo-Ciutad, Ethan Tseng, Florian Schiffers, Grace Kuo, Nathan Matsuda, Alberto Redo-Sanchez, Douglas Lanman, Oliver Cossairt, Felix Heide","doi":"10.1145/3763361","DOIUrl":"https://doi.org/10.1145/3763361","url":null,"abstract":"Holographic near-eye displays promise unparalleled depth cues, high-resolution imagery, and realistic three-dimensional parallax at a compact form factor, making them promising candidates for emerging augmented and virtual reality systems. However, existing holographic display methods often assume ideal viewing conditions and overlook real-world factors such as eye floaters and eyelashes—obstructions that can severely degrade perceived image quality. In this work, we propose a new metric that quantifies hologram resilience to artifacts and apply it to computer generated holography (CGH) optimization. We call this Artifact Resilient Holography (ARH). We begin by introducing a simulation method that models the effects of pre- and post-pupil obstructions on holographic displays. Our analysis reveals that eyebox regions dominated by low frequencies—produced especially by the smooth-phase holograms broadly adopted in recent holography work—are vulnerable to visual degradation from dynamic obstructions such as floaters and eyelashes. In contrast, random phase holograms spread energy more uniformly across the eyebox spectrum, enabling them to diffract around obstructions without producing prominent artifacts. By characterizing a random phase eyebox using the Rayleigh Distribution, we derive a differentiable metric in the eyebox domain. We then apply this metric to train a real-time neural network-based phase generator, enabling it to produce artifact-resilient 3D holograms that preserve visual fidelity across a range of practical viewing conditions—enhancing both robustness and user interactivity.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"26 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image along a user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion: a unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observations to ensure global coherence; 2) Long-Range World Exploration: an efficient world cache with point culling and auto-regressive inference with smooth video sampling, enabling iterative scene extension with context-aware consistency; and 3) Scalable Data Engine: a video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training-data curation without manual 3D annotations. Collectively, these designs yield a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications. Code for this paper is available at https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager.
{"title":"Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation","authors":"Tianyu Huang, Wangguandong Zheng, Tengfei Wang, Yuhao Liu, Zhenwei Wang, Junta Wu, Jie Jiang, Hui Li, Rynson Lau, Wangmeng Zuo, Chunchao Guo","doi":"10.1145/3763330","DOIUrl":"https://doi.org/10.1145/3763330","url":null,"abstract":"Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present <jats:italic toggle=\"yes\">Voyager</jats:italic> , a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion : A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observation to ensure global coherence 2) Long-Range World Exploration : An efficient world cache with point culling and an auto-regressive inference with smooth video sampling for iterative scene extension with context-aware consistency, and 3) Scalable Data Engine : A video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Collectively, these designs result in a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications. Code for this paper are at https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"34 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhongtian Zheng, Tao Huang, Haozhe Su, Xueqi Ma, Yuefan Shen, Tongtong Wang, Yin Yang, Xifeng Gao, Zherong Pan, Kui Wu
Hair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited number of cards and textures while preserving the hairstyle appearance. Our key idea is a novel differentiable representation in which each strand is encoded as a projected 2D curve in the texture space, which enables end-to-end optimization with differentiable rendering while respecting the structure of the hair geometry. Based on this representation, we develop a novel algorithmic pipeline: we first cluster hair strands into initial hair cards and project the strands into the texture space. We then conduct a two-stage optimization, where the first stage optimizes the orientation of each hair card separately and, after strand projection, the second stage conducts joint optimization over the entire hair card model for fine-tuning. Our method is evaluated on a range of hairstyles, including straight, wavy, curly, and coily hair. To capture the appearance of short or coily hair, our method also supports hair caps and cross-cards.
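To ground the "strand as projected 2D curve" representation, the sketch below fits a planar card to a strand cluster with PCA and expresses each strand as a 2D curve in the card's (u, v) texture space. The PCA card fit and the joint normalization to unit texture coordinates are our assumptions; the paper additionally optimizes card orientations and the projected curves with differentiable rendering.

```python
import numpy as np

# Sketch (assumed setup): fit a planar card to a strand cluster with PCA and
# express each strand as a 2D curve in the card's texture space.

def fit_card_frame(points):
    """Return (origin, u_axis, v_axis) of a least-squares plane through points."""
    origin = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - origin, full_matrices=False)
    return origin, vt[0], vt[1]          # two dominant in-plane directions

def project_strand(strand, origin, u_axis, v_axis):
    """Project an (N, 3) strand polyline to (N, 2) card-plane coordinates."""
    local = strand - origin
    return np.stack([local @ u_axis, local @ v_axis], axis=1)

def to_texture_space(curves2d):
    """Normalize a list of 2D curves jointly into [0, 1]^2 texture coordinates."""
    all_pts = np.concatenate(curves2d, axis=0)
    lo, hi = all_pts.min(axis=0), all_pts.max(axis=0)
    return [(c - lo) / np.maximum(hi - lo, 1e-8) for c in curves2d]

# Two synthetic strands forming a small cluster.
t = np.linspace(0, 1, 50)
strand_a = np.stack([0.02 * np.sin(6 * t), t, 0.01 * t], axis=1)
strand_b = strand_a + np.array([0.01, 0.0, 0.002])

origin, u_axis, v_axis = fit_card_frame(np.concatenate([strand_a, strand_b]))
curves = [project_strand(s, origin, u_axis, v_axis) for s in (strand_a, strand_b)]
uv_curves = to_texture_space(curves)
print(uv_curves[0][:3])
```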
{"title":"Auto Hair Card Extraction for Smooth Hair with Differentiable Rendering","authors":"Zhongtian Zheng, Tao Huang, Haozhe Su, Xueqi Ma, Yuefan Shen, Tongtong Wang, Yin Yang, Xifeng Gao, Zherong Pan, Kui Wu","doi":"10.1145/3763295","DOIUrl":"https://doi.org/10.1145/3763295","url":null,"abstract":"Hair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited number of cards and textures while preserving the hairstyle appearance. Our key idea is a novel differentiable representation where each strand is encoded as a projected 2D curve in the texture space, which enables end-to-end optimization with differentiable rendering while respecting the structures of the hair geometry. Based on this representation, we develop a novel algorithm pipeline, where we first cluster hair strands into initial hair cards and project the strands into the texture space. We then conduct a two-stage optimization, where our first stage optimizes the orientation of each hair card separately, and after strand projection, our second stage conducts joint optimization over the entire hair card model for fine-tuning. Our method is evaluated on a range of hairstyles, including straight, wavy, curly, and coily hair. To capture the appearance of short or coily hair, our method comes with support for hair caps and cross-card.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"12 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino
Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on-par reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.
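A minimal CPU sketch of the core idea, assuming (our choice) a Python dict in place of the paper's GPU spatial hash: SDF samples are accumulated per block with Welford-style running statistics, and any block whose SDF variance exceeds a threshold is additionally stored at a finer level. The two-level refinement rule, thresholds, and block keying are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

# CPU sketch of a variance-adaptive voxel grid keyed by a flat hash map
# (a Python dict here; the paper uses a GPU spatial hash). Each block keeps
# running SDF statistics and is refined where the local variance is high.

COARSE, FINE = 0.4, 0.1          # voxel sizes per level (illustrative)
VAR_THRESHOLD = 0.01             # refine when SDF variance exceeds this

class Block:
    __slots__ = ("n", "mean", "m2")
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
    def update(self, sdf):       # Welford running mean/variance
        self.n += 1
        delta = sdf - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (sdf - self.mean)
    @property
    def variance(self):
        return self.m2 / self.n if self.n > 1 else 0.0

def key(point, voxel_size, level):
    return (level, *np.floor(np.asarray(point) / voxel_size).astype(int))

grid = defaultdict(Block)

def integrate(point, sdf):
    coarse = grid[key(point, COARSE, 0)]
    coarse.update(sdf)
    if coarse.variance > VAR_THRESHOLD:          # detail here: also store finely
        grid[key(point, FINE, 1)].update(sdf)

# Integrate noisy SDF observations of a unit sphere.
rng = np.random.default_rng(2)
points = rng.uniform(-1.5, 1.5, size=(20000, 3))
sdfs = np.linalg.norm(points, axis=1) - 1.0 + rng.normal(0, 0.02, 20000)
for p, s in zip(points, sdfs):
    integrate(p, float(s))

coarse_blocks = sum(1 for k in grid if k[0] == 0)
fine_blocks = sum(1 for k in grid if k[0] == 1)
print(f"coarse blocks: {coarse_blocks}, refined blocks: {fine_blocks}")
```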
{"title":"Resolution Where It Counts: Hash-based GPU-Accelerated 3D Reconstruction via Variance-Adaptive Voxel Grids","authors":"Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino","doi":"10.1145/3777909","DOIUrl":"https://doi.org/10.1145/3777909","url":null,"abstract":"Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency, and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on par results in terms of reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"204 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}