When observing an intense light source, humans perceive dense radiating spikes known as glare/starburst patterns. These patterns are frequently used in computer graphics applications to enhance the perception of brightness (e.g., in games and films). Previous works have computed the physical energy distribution of glare patterns under daytime conditions using approximations like Fresnel diffraction. These techniques are capable of producing visually believable results, particularly when the pupil remains small. However, they are insufficient under nighttime conditions, when the pupil is significantly dilated and the assumptions behind the approximations no longer hold. To address this, we employ the Rayleigh-Sommerfeld diffraction solution, from which Fresnel diffraction is derived as an approximation, as our baseline reference. In pursuit of performance and visual quality, we also employ Ochoa's approximation and the Chirp Z transform to efficiently generate high-resolution results for computer graphics applications. By also taking into account background illumination and certain physiological characteristics of the human photoreceptor cells, particularly the visual threshold of light stimulus, we propose a framework capable of producing plausible visual depictions of glare patterns for both daytime and nighttime scenes.
{"title":"Glare Pattern Depiction: High-Fidelity Physical Computation and Physiologically-Inspired Visual Response","authors":"Yuxiang Sun, Gladimir V. G. Baranoski","doi":"10.1145/3763356","DOIUrl":"https://doi.org/10.1145/3763356","url":null,"abstract":"When observing an intense light source, humans perceive dense radiating spikes known as glare/starburst patterns. These patterns are frequently used in computer graphics applications to enhance the perception of brightness (e.g., in games and films). Previous works have computed the physical energy distribution of glare patterns under daytime conditions using approximations like Fresnel diffraction. These techniques are capable of producing visually believable results, particularly when the pupil remains small. However, they are insufficient under nighttime conditions, when the pupil is significantly dilated and the assumptions behind the approximations no longer hold. To address this, we employ the Rayleigh-Sommerfeld diffraction solution, from which Fresnel diffraction is derived as an approximation, as our baseline reference. In pursuit of performance and visual quality, we also employ Ochoa's approximation and the Chirp Z transform to efficiently generate high-resolution results for computer graphics applications. By also taking into account background illumination and certain physiological characteristics of the human photoreceptor cells, particularly the visual threshold of light stimulus, we propose a framework capable of producing plausible visual depictions of glare patterns for both daytime and nighttime scenes.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"155 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Victor Chu, Oscar Pueyo-Ciutad, Ethan Tseng, Florian Schiffers, Grace Kuo, Nathan Matsuda, Alberto Redo-Sanchez, Douglas Lanman, Oliver Cossairt, Felix Heide
Holographic near-eye displays promise unparalleled depth cues, high-resolution imagery, and realistic three-dimensional parallax at a compact form factor, making them promising candidates for emerging augmented and virtual reality systems. However, existing holographic display methods often assume ideal viewing conditions and overlook real-world factors such as eye floaters and eyelashes—obstructions that can severely degrade perceived image quality. In this work, we propose a new metric that quantifies hologram resilience to artifacts and apply it to computer-generated holography (CGH) optimization. We call this Artifact Resilient Holography (ARH). We begin by introducing a simulation method that models the effects of pre- and post-pupil obstructions on holographic displays. Our analysis reveals that eyebox regions dominated by low frequencies—produced especially by the smooth-phase holograms broadly adopted in recent holography work—are vulnerable to visual degradation from dynamic obstructions such as floaters and eyelashes. In contrast, random-phase holograms spread energy more uniformly across the eyebox spectrum, enabling them to diffract around obstructions without producing prominent artifacts. By characterizing a random-phase eyebox using the Rayleigh distribution, we derive a differentiable metric in the eyebox domain. We then apply this metric to train a real-time neural network-based phase generator, enabling it to produce artifact-resilient 3D holograms that preserve visual fidelity across a range of practical viewing conditions—enhancing both robustness and user interactivity.
{"title":"Artifact-Resilient Real-Time Holography","authors":"Victor Chu, Oscar Pueyo-Ciutad, Ethan Tseng, Florian Schiffers, Grace Kuo, Nathan Matsuda, Alberto Redo-Sanchez, Douglas Lanman, Oliver Cossairt, Felix Heide","doi":"10.1145/3763361","DOIUrl":"https://doi.org/10.1145/3763361","url":null,"abstract":"Holographic near-eye displays promise unparalleled depth cues, high-resolution imagery, and realistic three-dimensional parallax at a compact form factor, making them promising candidates for emerging augmented and virtual reality systems. However, existing holographic display methods often assume ideal viewing conditions and overlook real-world factors such as eye floaters and eyelashes—obstructions that can severely degrade perceived image quality. In this work, we propose a new metric that quantifies hologram resilience to artifacts and apply it to computer generated holography (CGH) optimization. We call this Artifact Resilient Holography (ARH). We begin by introducing a simulation method that models the effects of pre- and post-pupil obstructions on holographic displays. Our analysis reveals that eyebox regions dominated by low frequencies—produced especially by the smooth-phase holograms broadly adopted in recent holography work—are vulnerable to visual degradation from dynamic obstructions such as floaters and eyelashes. In contrast, random phase holograms spread energy more uniformly across the eyebox spectrum, enabling them to diffract around obstructions without producing prominent artifacts. By characterizing a random phase eyebox using the Rayleigh Distribution, we derive a differentiable metric in the eyebox domain. We then apply this metric to train a real-time neural network-based phase generator, enabling it to produce artifact-resilient 3D holograms that preserve visual fidelity across a range of practical viewing conditions—enhancing both robustness and user interactivity.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"26 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145673927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image along a user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion: a unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observations to ensure global coherence; 2) Long-Range World Exploration: an efficient world cache with point culling and auto-regressive inference with smooth video sampling for iterative scene extension with context-aware consistency; and 3) Scalable Data Engine: a video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Collectively, these designs result in a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications. Code for this paper is available at https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager.
{"title":"Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation","authors":"Tianyu Huang, Wangguandong Zheng, Tengfei Wang, Yuhao Liu, Zhenwei Wang, Junta Wu, Jie Jiang, Hui Li, Rynson Lau, Wangmeng Zuo, Chunchao Guo","doi":"10.1145/3763330","DOIUrl":"https://doi.org/10.1145/3763330","url":null,"abstract":"Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present <jats:italic toggle=\"yes\">Voyager</jats:italic> , a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion : A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observation to ensure global coherence 2) Long-Range World Exploration : An efficient world cache with point culling and an auto-regressive inference with smooth video sampling for iterative scene extension with context-aware consistency, and 3) Scalable Data Engine : A video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Collectively, these designs result in a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications. Code for this paper are at https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"34 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhongtian Zheng, Tao Huang, Haozhe Su, Xueqi Ma, Yuefan Shen, Tongtong Wang, Yin Yang, Xifeng Gao, Zherong Pan, Kui Wu
Hair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited number of cards and textures while preserving the hairstyle appearance. Our key idea is a novel differentiable representation in which each strand is encoded as a projected 2D curve in the texture space, which enables end-to-end optimization with differentiable rendering while respecting the structure of the hair geometry. Based on this representation, we develop a novel algorithmic pipeline in which we first cluster hair strands into initial hair cards and project the strands into the texture space. We then conduct a two-stage optimization: the first stage optimizes the orientation of each hair card separately, and after strand projection, the second stage jointly optimizes the entire hair card model for fine-tuning. Our method is evaluated on a range of hairstyles, including straight, wavy, curly, and coily hair. To capture the appearance of short or coily hair, our method also supports hair caps and cross-cards.
{"title":"Auto Hair Card Extraction for Smooth Hair with Differentiable Rendering","authors":"Zhongtian Zheng, Tao Huang, Haozhe Su, Xueqi Ma, Yuefan Shen, Tongtong Wang, Yin Yang, Xifeng Gao, Zherong Pan, Kui Wu","doi":"10.1145/3763295","DOIUrl":"https://doi.org/10.1145/3763295","url":null,"abstract":"Hair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited number of cards and textures while preserving the hairstyle appearance. Our key idea is a novel differentiable representation where each strand is encoded as a projected 2D curve in the texture space, which enables end-to-end optimization with differentiable rendering while respecting the structures of the hair geometry. Based on this representation, we develop a novel algorithm pipeline, where we first cluster hair strands into initial hair cards and project the strands into the texture space. We then conduct a two-stage optimization, where our first stage optimizes the orientation of each hair card separately, and after strand projection, our second stage conducts joint optimization over the entire hair card model for fine-tuning. Our method is evaluated on a range of hairstyles, including straight, wavy, curly, and coily hair. To capture the appearance of short or coily hair, our method comes with support for hair caps and cross-card.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"12 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino
Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on-par reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.
{"title":"Resolution Where It Counts: Hash-based GPU-Accelerated 3D Reconstruction via Variance-Adaptive Voxel Grids","authors":"Lorenzo De Rebotti, Emanuele Giacomini, Giorgio Grisetti, Luca Di Giammarino","doi":"10.1145/3777909","DOIUrl":"https://doi.org/10.1145/3777909","url":null,"abstract":"Efficient and scalable 3D surface reconstruction from range data remains a core challenge in computer graphics and vision, particularly in real-time and resource-constrained scenarios. Traditional volumetric methods based on fixed-resolution voxel grids or hierarchical structures like octrees often suffer from memory inefficiency, computational overhead, and a lack of GPU support. We propose a novel variance-adaptive, multi-resolution voxel grid that dynamically adjusts voxel size based on the local variance of signed distance field (SDF) observations. Unlike prior multi-resolution approaches that rely on recursive octree structures, our method leverages a flat spatial hash table to store all voxel blocks, supporting constant-time access and full GPU parallelism. This design enables high memory efficiency, and real-time scalability. We further demonstrate how our representation supports GPU-accelerated rendering through a parallel quad-tree structure for Gaussian Splatting, enabling effective control over splat density. Our open-source CUDA/C++ implementation achieves up to 13× speedup and 4× lower memory usage compared to fixed-resolution baselines, while maintaining on par results in terms of reconstruction accuracy, offering a practical and extensible solution for high-performance 3D reconstruction.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"204 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taehei Kim, Jihun Shin, Hyeshim Kim, Hyuckjin Jang, Jiho Kang, Sung-Hee Lee
We propose a multi-user Mixed Reality (MR) telepresence system that allows users to interact by seamlessly visualizing remote environments and avatars overlaid onto their local physical space. Building on prior shared-space approaches, our method first aligns overlapping rooms to maximize a shared space—a common area containing matched real and virtual objects where all users can interact. Uniquely, our system extends beyond this shared space by visualizing non-shared spaces, the remaining parts of each room, allowing users to inhabit these distinct areas. To address the issue of overlap between non-shared spaces, we dynamically adjust their visibility based on user proximity, using a Voronoi diagram to prioritize subspaces closer to each user. Visualizing the surrounding space of each user conveys spatial context, helping others interpret their behavior within their environment. Visibility is updated in real time as users move, maintaining a coherent sense of spatial awareness. Through a user study, we demonstrate that our system enhances enjoyment, spatial understanding, and presence compared to shared-space-only approaches. Quantitative results further show that our dynamic visibility modulation improves both personal space preservation and space accessibility relative to static methods. Overall, our system provides users with a seamless, dynamically connected, and shared multi-room environment.
{"title":"Voronoi Rooms: Dynamic Visibility Modulation of Overlapping Spaces for Telepresence","authors":"Taehei Kim, Jihun Shin, Hyeshim Kim, Hyuckjin Jang, Jiho Kang, Sung-Hee Lee","doi":"10.1145/3777900","DOIUrl":"https://doi.org/10.1145/3777900","url":null,"abstract":"We propose a multi-user Mixed Reality (MR) telepresence system that allows users to interact by seamlessly visualizing remote environments and avatars overlaid onto their local physical space. Building on prior shared-space approaches, our method first aligns overlapping rooms to maximize a <jats:italic toggle=\"yes\">shared space</jats:italic> —a common area containing matched real and virtual objects where all users can interact. Uniquely, our system extends beyond this shared space by visualizing non-shared spaces, the remaining part of each room, allowing users to inhabit these distinct areas. To address the issue of overlap between non-shared spaces, we dynamically adjust their visibility based on user proximity, using a Voronoi diagram to prioritize subspaces closer to each user. Visualizing the surrounding space of each user conveys spatial context, helping others interpret their behavior within their environment. Visibility is updated in real time as users move, maintaining a coherent sense of spatial awareness. Through a user study, we demonstrate that our system enhances enjoyment, spatial understanding, and presence compared to shared-space-only approaches. Quantitative results further show that our dynamic visibility modulation improves both personal space preservation and space accessibility relative to static methods. Overall, our system provides users with a seamless, dynamically connected, and shared multi-room environment.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"6 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145554480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Light Transport Operators (LTOs) represent a fundamental concept in computer graphics, modeling single bounces of light within a virtual environment as linear operators on infinite-dimensional spaces. While LTOs play a crucial role in rendering, prior studies have primarily focused on spectral analyses of the light field rather than the operators themselves. This paper presents a rigorous investigation into the spectral properties of LTOs. Due to their non-compact nature, traditional spectral analysis techniques face challenges in this setting. However, many practical rendering methods effectively employ compact approximations, suggesting that non-compactness is not an absolute barrier. We show the relevance of such approximations and establish various path integral formulations of their spectrum. These findings enhance the theoretical understanding of light transport and offer new perspectives for improving rendering efficiency and accuracy.
{"title":"Spectral Theory of Light Transport Operators","authors":"Cyril Soler, Kartic Subr","doi":"10.1145/3774756","DOIUrl":"https://doi.org/10.1145/3774756","url":null,"abstract":"Light Transport Operators (LTOs) represent a fundamental concept in computer graphics, modeling single bounces of light within a virtual environment as linears operators on infinite dimensional spaces. While the LTOs play a crucial role in rendering, prior studies have primarily focused on spectral analyses of the light field rather than the operators themselves. This paper presents a rigorous investigation into the spectral properties of the LTOs. Due to their non-compact nature, traditional spectral analysis techniques face challenges in this setting. However, many practical rendering methods effectively employ compact approximations, suggesting that non-compactness is not an absolute barrier. We show the relevance of such approximations and establish various path integral formulations of their spectrum. These findings enhance the theoretical understanding of light transport and offer new perspectives for improving rendering efficiency and accuracy.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"53 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145434325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Yang, Yongqing Liang, Xin Li, Congyi Zhang, Guying Lin, Cheng Lin, Alla Sheffer, Scott Schaefer, John Keyser, Wenping Wang
Piecewise parametric surfaces have long been established as prevalent geometric representations; however, they often require surface refinement or sophisticated quadrangulation to accurately represent complex geometries. Geometric deep learning has shown that neural networks can provide greater representational power than conventional methods. Nevertheless, approaches using a single parametric surface for shape fitting struggle to capture fine-grained geometric details, while multi-patch methods fail to ensure seamless connections between adjacent patches. We present Neural Piecewise Parametric Surfaces (NeuPPS), the first piecewise neural surface representation that allows for coarse patch layouts composed of arbitrary n-sided surface patches to model complex surface geometries with high precision, offering enhanced flexibility compared to traditional parametric surfaces. This new surface representation guarantees, by construction, the continuity between adjacent patches, a property that other neural patch-based approaches cannot ensure. Two novel components are introduced: a learnable feature complex and a continuous mapping function approximated by multi-layer perceptrons (MLPs). We apply the proposed NeuPPS to surface fitting and shape space learning tasks. Extensive experiments demonstrate the advantages of NeuPPS over traditional parametric representations and existing patch-based learning approaches.
{"title":"NeuPPS: Neural Piecewise Parametric Surfaces","authors":"Lei Yang, Yongqing Liang, Xin Li, Congyi Zhang, Guying Lin, Cheng Lin, Alla Sheffer, Scott Schaefer, John Keyser, Wenping Wang","doi":"10.1145/3771546","DOIUrl":"https://doi.org/10.1145/3771546","url":null,"abstract":"Piecewise parametric surfaces have long been established as prevalent geometric representations; however, they often require surface refinement or sophisticated quadrangulation to accurately represent complex geometries. Geometric deep learning has shown that neural networks can provide greater representational power than conventional methods. Nevertheless, approaches using a single parametric surface for shape fitting struggle to capture fine-grained geometric details, while multi-patch methods fail to ensure seamless connections between adjacent patches. We present <jats:italic toggle=\"yes\">Neural Piecewise Parametric Surfaces</jats:italic> ( <jats:italic toggle=\"yes\">NeuPPS</jats:italic> ), the <jats:italic toggle=\"yes\">first</jats:italic> piecewise neural surface representation that allows for coarse patch layouts composed of <jats:italic toggle=\"yes\"> arbitrary <jats:italic toggle=\"yes\">n</jats:italic> -sided surface patches </jats:italic> to model complex surface geometries with high precision, offering enhanced <jats:italic toggle=\"yes\">flexibility</jats:italic> compared to traditional parametric surfaces. This new surface representation guarantees, by construction, the continuity between adjacent patches, a property that other neural patch-based approaches cannot ensure. Two novel components are introduced: a learnable feature complex and a continuous mapping function approximated by multi-layer perceptrons (MLPs). We apply the proposed <jats:italic toggle=\"yes\">NeuPPS</jats:italic> to surface fitting and shape space learning tasks. Extensive experiments demonstrate the advantages of <jats:italic toggle=\"yes\">NeuPPS</jats:italic> over traditional parametric representations and existing patch-based learning approaches.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"69 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145396367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana Dodik, Vincent Sitzmann, Justin Solomon, Oded Stein
Bounded biharmonic weights are a popular tool used to rig and deform characters for animation, to compute reduced-order simulations, and to define feature descriptors for geometry processing. They necessitate tetrahedralizing the volume bounded by the surface, introducing the possibility of meshing artifacts or tetrahedralization failure. We introduce a mesh-free and robust automatic skinning technique that generates weights comparable to the current state of the art, but works reliably even on open surfaces, triangle soups, and point clouds where current methods fail. We achieve this through the use of a specialized Lagrangian representation enabled by the advent of hardware ray-tracing, which circumvents the need for finite elements while optimizing the biharmonic energy and enforcing boundary conditions. The flexibility of our formulation allows us to integrate artistic control through weight painting during the optimization. We offer a thorough qualitative and quantitative evaluation of our method.
{"title":"Robust Biharmonic Skinning Using Geometric Fields","authors":"Ana Dodik, Vincent Sitzmann, Justin Solomon, Oded Stein","doi":"10.1145/3771928","DOIUrl":"https://doi.org/10.1145/3771928","url":null,"abstract":"Bounded bihramonic weights are a popular tool used to rig and deform characters for animation, to compute reduced-order simulations, and to define feature descriptors for geometry processing. They necessitate tetrahedralizing the volume bounded by the surface, introducing the possibility of meshing artifacts or tetrahedralization failure. We introduce a <jats:italic toggle=\"yes\">mesh-free</jats:italic> and <jats:italic toggle=\"yes\">robust</jats:italic> automatic skinning technique that generates weights comparable to the current state of the art, but works reliably even on open surfaces, triangle soups, and point clouds where current methods fail. We achieve this through the use of a specialized Lagrangian representation enabled by the advent of hardware ray-tracing, which circumvents the need for finite elements while optimizing the biharmonic energy and enforcing boundary conditions. The flexibility of our formulation allows us to integrate artistic control through weight painting during the optimization. We offer a thorough qualitative and quantitative evaluation of our method.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"19 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145396487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kwanggyoon Seo, Rene Culaway, Byeong-Uk Lee, Junyong Noh
Manipulating the emotion of a performer in a video is a challenging task. The lip motion needs to be preserved while performing the desired changes in the emotion of the subject; however, simply utilizing existing image-based editing methods sabotages the original lip synchronization. We tackle this problem by utilizing a pretrained StyleGAN paired with a landmark-based editing module that modifies the bias present in the edit direction used in image manipulation. The proposed editing module consists of a latent-based landmark detection network and an editing network that modifies the editing direction to match the original lip synchronization while preserving the desired emotion manipulation results. This is realized by taking the facial landmarks as control points. Both networks operate on the latent space, which enables fast training and inference. We show that the proposed method runs significantly faster and performs better in terms of visual quality than alternative approaches, which was validated through a perceptual study. The proposed method can also be extended to perform face reenactment to generate a talking-head video from a single image and face image manipulation using facial landmarks as control points.
{"title":"Emotion Manipulation for Talking-Head Videos via Facial Landmarks","authors":"Kwanggyoon Seo, Rene Culaway, Byeong-Uk Lee, Junyong Noh","doi":"10.1145/3770576","DOIUrl":"https://doi.org/10.1145/3770576","url":null,"abstract":"Manipulating the emotion of a performer in a video is a challenging task. The lip motion needs to be preserved while performing the desired changes in the emotion of the subject; however, simply utilizing existing image-based editing methods sabotages the original lip synchronization. We tackle this problem by utilizing a pretrained StyleGAN paired with a landmark-based editing module that modifies the bias present in the edit direction used in image manipulation. The proposed editing module consists of a latent-based landmark detection network and an editing network that modifies the editing direction to match the original lip synchronization while preserving the desired emotion manipulation results. This is realized by taking the facial landmarks as control points. Both networks operate on the latent space, which enables fast training and inference. We show that the proposed method runs significantly faster and performs better in terms of visual quality than alternative approaches, which was validated through a perceptual study. The proposed method can also be extended to perform face reenactment to generate a talking-head video from a single image and face image manipulation using facial landmarks as control points.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"24 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145282651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}