Lane detection is a fundamental task in intelligent driving systems. However, the slender and sparse structure of lanes, combined with the dominance of irrelevant background regions in road scenes, makes accurate lane localization particularly challenging, especially under complex and adverse conditions. To address these issues, we propose a novel Region-Aware Sparse Attention Network (RSANet), designed to selectively enhance lane-relevant features while suppressing background interference. Specifically, we introduce a Region-guided Pooling Predictor (RPP) that generates lane region activation maps to guide the backbone network toward informative areas. To improve the multi-scale feature fusion capability of the Feature Pyramid Network (FPN), we propose a Bilateral Pooling Attention Module (BPAM) that captures discriminative features by jointly modeling dependencies along both the channel and spatial dimensions. Furthermore, a Lane-guided Sparse Attention Mechanism (LSAM) efficiently aggregates global contextual information from the most relevant spatial regions to reinforce lane prior representations while significantly reducing redundant computation. Extensive experiments on benchmark datasets demonstrate that RSANet outperforms state-of-the-art methods across a variety of challenging scenarios. Notably, RSANet achieves an F1@50 score of 80.04% on the CULane dataset, a clear improvement over prior methods.
{"title":"Region-Aware Sparse Attention Network for Lane Detection","authors":"Yan Deng, Guoqiang Xiao","doi":"10.1111/cgf.70246","DOIUrl":"https://doi.org/10.1111/cgf.70246","url":null,"abstract":"<p>Lane detection is a fundamental task in intelligent driving systems. However, the slender and sparse structure of lanes, combined with the dominance of irrelevant background regions in road scenes, makes accurate lane localization particularly challenging, especially under complex and adverse conditions. To address these issues, we propose a novel Region-Aware Sparse Attention Network (RSANet), which is designed to selectively enhance lane-relevant features while suppressing background interference. Specifically, we introduce the Region-guided Pooling Predictor (RPP) that generates lane region activation maps to guide the backbone network in focusing on informative areas. To improve the multi-scale feature fusion capability of the Feature Pyramid Network (FPN), we propose the Bilateral Pooling Attention Module (BPAM) that captures discriminative features by jointly modeling dependencies along both the channel and spatial dimensions. Furthermore, the Lane-guided Sparse Attention Mechanism (LSAM) efficiently aggregates global contextual information from the most relevant spatial regions to reinforce lane prior representations while significantly reducing redundant computation. Extensive experiments on benchmark datasets demonstrate that RSANet outperforms state-of-the-art methods in a variety of challenging scenarios. Notably, RSANet achieves an F1@50 score of 80.04% on the CULane dataset that shows notable improvements.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open-vocabulary 3D object detection has gained significant interest due to its critical applications in autonomous driving and embodied AI. Existing detection methods, whether offline or online, typically rely on dense point cloud reconstruction, which imposes substantial computational overhead and memory constraints, hindering real-time deployment in downstream tasks. To address this, we propose a novel reconstruction-free online framework tailored for memory-efficient and real-time 3D detection. Specifically, given streaming posed RGB-D video input, we leverage Cubify Anything as a pre-trained visual foundation model (VFM) for single-view 3D object detection, coupled with CLIP to capture open-vocabulary semantics of detected objects. To fuse the bounding boxes detected in different views into a unified set, we employ an association module that establishes multi-view correspondences and an optimization module that fuses the 3D bounding boxes of the same instance. The association module combines 3D Non-Maximum Suppression (NMS) with a box correspondence matching module. The optimization module uses an IoU-guided, efficient random optimization technique based on particle filtering to enforce multi-view consistency of the 3D bounding boxes while minimizing computational complexity. Extensive experiments on the CA-1M and ScanNetV2 datasets demonstrate that our method achieves state-of-the-art performance among online methods. Benefiting from this reconstruction-free paradigm for 3D object detection, our method generalizes well across diverse scenarios, enabling real-time perception even in environments exceeding 1000 square meters.
{"title":"BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion","authors":"Yuqing Lan, Chenyang Zhu, Zhirui Gao, Jiazhao Zhang, Yihan Cao, Renjiao Yi, Yijie Wang, Kai Xu","doi":"10.1111/cgf.70254","DOIUrl":"https://doi.org/10.1111/cgf.70254","url":null,"abstract":"<p>Open-vocabulary 3D object detection has gained significant interest due to its critical applications in autonomous driving and embodied AI. Existing detection methods, whether offline or online, typically rely on dense point cloud reconstruction, which imposes substantial computational overhead and memory constraints, hindering real-time deployment in downstream tasks. To address this, we propose a novel reconstruction-free online framework tailored for memory-efficient and real-time 3D detection. Specifically, given streaming posed RGB-D video input, we leverage Cubify Anything as a pre-trained visual foundation model (VFM) for single-view 3D object detection, coupled with CLIP to capture open-vocabulary semantics of detected objects. To fuse all detected bounding boxes across different views into a unified one, we employ an association module for correspondences of multi-views and an optimization module to fuse the 3D bounding boxes of the same instance. The association module utilizes 3D Non-Maximum Suppression (NMS) and a box correspondence matching module. The optimization module uses an IoU-guided efficient random optimization technique based on particle filtering to enforce multi-view consistency of the 3D bounding boxes while minimizing computational complexity. Extensive experiments on CA-1M and ScanNetV2 datasets demonstrate that our method achieves state-of-the-art performance among online methods. Benefiting from this novel reconstruction-free paradigm for 3D object detection, our method exhibits great generalization abilities in various scenarios, enabling real-time perception even in environments exceeding 1000 square meters.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HuiGuang Huang, Dong-Yi Wu, Yulin Wang, Yu Cao, Tong-Yee Lee
This paper presents a novel, fully automated method for generating view-independent abstract wire art from 3D models. The main challenge in creating line art is to strike a balance among abstraction, structural clarity, 3D perception, and consistent aesthetics across different viewpoints. Existing approaches, such as extracting wire art from meshes or reconstructing it from images, tend to produce disorganized, cumbersome wires and can typically guarantee a good appearance only from specific viewpoints. To overcome these problems, we propose a paradigm shift: instead of predicting line segments directly, we cast wire art generation as an optimization-driven manifold-fitting problem. This allows us to abstract the 3D model while retaining the key properties necessary for appealing line art, including structural topology and connectivity, and to maintain the three-dimensionality of the line art across multiple viewpoints. Experimental results show that our view-independent method outperforms previous methods in terms of line simplicity, shape fidelity, and visual consistency.
{"title":"View-Independent Wire Art Modeling via Manifold Fitting","authors":"HuiGuang Huang, Dong-Yi Wu, Yulin Wang, Yu Cao, Tong-Yee Lee","doi":"10.1111/cgf.70247","DOIUrl":"https://doi.org/10.1111/cgf.70247","url":null,"abstract":"<p>This paper presents a novel fully automated method for generating view-independent abstract wire art from 3D models. The main challenge in creating line art is to strike a balance among abstraction, structural clarity, 3D perception, and consistent aesthetics from different viewpoints. Many existing approaches have been proposed, including extracting wire art from mesh, reconstructing it from pictures, etc. But they all suffer from the fact that the wires are usually very unorganized and cumbersome and usually can only guarantee the observation effect of specific viewpoints. To overcome these problems, we propose a paradigm shift: instead of predicting the line segments directly, we consider the generation of wire art as an optimization-driven manifold-fitting problem. Thus we can abstract/generalize the 3D model while retaining the key properties necessary for appealing line art, including structural topology and connectivity, and maintain the three-dimensionality of the line art with a multi-perspective view. Experimental results show that our view-independent method outperforms previous methods in terms of line simplicity, shape fidelity, and visual consistency.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, 2D Gaussian Splatting (2DGS) has demonstrated superior geometry reconstruction quality compared to the popular 3DGS by using 2D surfels to approximate thin surfaces. However, it falls short when dealing with glossy surfaces, resulting in visible holes in these areas. We find that reflection discontinuity causes the issue: to fit the jump from diffuse to specular reflection across viewing angles, a depth bias is introduced into the optimized Gaussian primitives. To address this, we first replace the depth distortion loss in 2DGS with a novel depth convergence loss, which imposes a strong constraint on depth continuity. Then, we rectify the depth criterion used to determine the actual surface, fully accounting for all the intersecting Gaussians along the ray. Qualitative and quantitative evaluations across various datasets reveal that our method significantly improves reconstruction quality, with more complete and accurate surfaces than 2DGS. Code is available at https://github.com/XiaoXinyyx/Unbiased_Surfel.
{"title":"Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction","authors":"Yixin Yang, Yang Zhou, Hui Huang","doi":"10.1111/cgf.70252","DOIUrl":"https://doi.org/10.1111/cgf.70252","url":null,"abstract":"<p>Recently, 2D Gaussian Splatting (2DGS) has demonstrated superior geometry reconstruction quality than the popular 3DGS by using 2D surfels to approximate thin surfaces. However, it falls short when dealing with glossy surfaces, resulting in visible holes in these areas. We find that the reflection discontinuity causes the issue. To fit the jump from diffuse to specular reflection at different viewing angles, depth bias is introduced in the optimized Gaussian primitives. To address that, we first replace the depth distortion loss in 2DGS with a novel depth convergence loss, which imposes a strong constraint on depth continuity. Then, we rectify the depth criterion in determining the actual surface, which fully accounts for all the intersecting Gaussians along the ray. Qualitative and quantitative evaluations across various datasets reveal that our method significantly improves reconstruction quality, with more complete and accurate surfaces than 2DGS. Code is available at https://github.com/XiaoXinyyx/Unbiased_Surfel.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural fields have emerged as a powerful framework for representing continuous multidimensional signals such as images and videos, 3D and 4D objects and scenes, and radiance fields. While neural fields are efficient, achieving a high-quality representation requires wide and deep neural networks, which are slow to train and evaluate. Although several acceleration techniques have been proposed, they either trade memory for faster training and/or inference, rely on thousands of fitted primitives with considerable optimization time, or compromise the smooth, continuous nature of neural fields. In this paper, we introduce Gaussian Neural Fields (GNF), a novel compact neural decoder that maps learned feature grids into continuous non-linear signals, such as RGB images, Signed Distance Functions (SDFs), and radiance fields, using a single compact layer of Gaussian kernels defined in a high-dimensional feature space. Our key observation is that neurons in traditional MLPs perform simple computations, usually a dot product followed by an activation function, necessitating wide and deep MLPs or high-resolution feature grids to model complex functions. We show that replacing MLP-based decoders with Gaussian kernels whose centers are learned features yields highly accurate representations of 2D (RGB), 3D (geometry), and 5D (radiance field) signals with just a single layer of such kernels. This representation is highly parallelizable, operates on low-resolution grids, and trains in under 15 seconds for 3D geometry and under 11 minutes for view synthesis. GNF matches the accuracy of deep MLP-based decoders with far fewer parameters and significantly higher inference throughput. The source code is publicly available at https://grbfnet.github.io/.
{"title":"GNF: Gaussian Neural Fields for Multidimensional Signal Representation and Reconstruction","authors":"Abelaziz Bouzidi, Hamid Laga, Hazem Wannous, Ferdous Sohel","doi":"10.1111/cgf.70232","DOIUrl":"https://doi.org/10.1111/cgf.70232","url":null,"abstract":"<p>Neural fields have emerged as a powerful framework for representing continuous multidimensional signals such as images and videos, 3D and 4D objects and scenes, and radiance fields. While efficient, achieving high-quality representation requires the use of wide and deep neural networks. These, however, are slow to train and evaluate. Although several acceleration techniques have been proposed, they either trade memory for faster training and/or inference, rely on thousands of fitted primitives with considerable optimization time, or compromise the smooth, continuous nature of neural fields. In this paper, we introduce Gaussian Neural Fields (GNF), a novel compact neural decoder that maps learned feature grids into continuous non-linear signals, such as RGB images, Signed Distance Functions (SDFs), and radiance fields, using a single compact layer of Gaussian kernels defined in a high-dimensional feature space. Our key observation is that neurons in traditional MLPs perform simple computations, usually a dot product followed by an activation function, necessitating wide and deep MLPs or high-resolution feature grids to model complex functions. In this paper, we show that replacing MLP-based decoders with Gaussian kernels whose centers are learned features yields highly accurate representations of 2D (RGB), 3D (geometry), and 5D (radiance fields) signals with just a single layer of such kernels. This representation is highly parallelizable, operates on low-resolution grids, and trains in under 15 seconds for 3D geometry and under 11 minutes for view synthesis. GNF matches the accuracy of deep MLP-based decoders with far fewer parameters and significantly higher inference throughput. The source code is publicly available at https://grbfnet.github.io/.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Julian Kaltheuner, Alexander Oebel, Hannah Droege, Patrick Stotko, Reinhard Klein
Dynamic surface reconstruction of objects from point cloud sequences is a challenging field in computer graphics. Existing approaches either require multiple regularization terms or rely on extensive training data, which leads to compromises in reconstruction accuracy, over-smoothing, or poor generalization to unseen objects and motions. To address these limitations, we introduce Preconditioned Deformation Grids, a novel technique for estimating coherent deformation fields directly from unstructured point cloud sequences without requiring or forming explicit correspondences. Key to our approach is the use of multi-resolution voxel grids that capture the overall motion at varying spatial scales, enabling a more flexible deformation representation. In conjunction with grid-based Sobolev preconditioning of the gradient-based optimization, we show that applying a Chamfer loss between the input point clouds as well as to an evolving template mesh is sufficient to obtain accurate deformations. To ensure temporal consistency along the object surface, we include a weak isometry loss on mesh edges, which complements the main objective without constraining deformation fidelity. Extensive evaluations demonstrate that our method achieves superior results, particularly for long sequences, compared to state-of-the-art techniques.
{"title":"Preconditioned Deformation Grids","authors":"Julian Kaltheuner, Alexander Oebel, Hannah Droege, Patrick Stotko, Reinhard Klein","doi":"10.1111/cgf.70269","DOIUrl":"https://doi.org/10.1111/cgf.70269","url":null,"abstract":"<p>Dynamic surface reconstruction of objects from point cloud sequences is a challenging field in computer graphics. Existing approaches either require multiple regularization terms or extensive training data which, however, lead to compromises in reconstruction accuracy as well as over-smoothing or poor generalization to unseen objects and motions. To address these limitations, we introduce <i>Preconditioned Deformation Grids</i>, a novel technique for estimating coherent deformation fields directly from unstructured point cloud sequences without requiring or forming explicit correspondences. Key to our approach is the use of multi-resolution voxel grids that capture the overall motion at varying spatial scales, enabling a more flexible deformation representation. In conjunction with incorporating grid-based Sobolev preconditioning into gradient-based optimization, we show that applying a Chamfer loss between the input point clouds as well as to an evolving template mesh is sufficient to obtain accurate deformations. To ensure temporal consistency along the object surface, we include a weak isometry loss on mesh edges which complements the main objective without constraining deformation fidelity. Extensive evaluations demonstrate that our method achieves superior results, particularly for long sequences, compared to state-of-the-art techniques.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/cgf.70269","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. T. Jones, Z. Zhang, F. Hähnlein, W. Matusik, M. Ahmad, V. Kim, A. Schulz
Parametric CAD systems use domain-specific languages (DSLs) to represent geometry as programs, enabling both flexible modeling and structured editing. With the rise of large language models (LLMs), there is growing interest in generating such programs from natural language. This raises a key question: what kind of DSL best supports both CAD generation and editing, whether performed by a human or an AI? In this work, we introduce AIDL, a hierarchical, solver-aided DSL designed to align with the strengths of LLMs while remaining interpretable and editable by humans. AIDL enables high-level reasoning by breaking problems into abstract components and structural relationships, while offloading low-level geometric reasoning to a constraint solver. We evaluate AIDL in a 2D text-to-CAD setting using a zero-shot prompt-based interface and compare it to OpenSCAD, a widely used CAD DSL that appears in LLM training data. AIDL produces results that are visually competitive and significantly easier to edit. Our findings suggest that language design is a powerful complement to model training and prompt engineering for building collaborative AI–human tools in CAD. Code is available at https://github.com/deGravity/aidl.
{"title":"A Solver-Aided Hierarchical Language for LLM-Driven CAD Design","authors":"B. T. Jones, Z. Zhang, F. Hähnlein, W. Matusik, M. Ahmad, V. Kim, A. Schulz","doi":"10.1111/cgf.70250","DOIUrl":"https://doi.org/10.1111/cgf.70250","url":null,"abstract":"<p>Parametric CAD systems use domain-specific languages (DSLs) to represent geometry as programs, enabling both flexible modeling and structured editing. With the rise of large language models (LLMs), there is growing interest in generating such programs from natural language. This raises a key question: what kind of DSL best supports both CAD generation and editing, whether performed by a human or an AI? In this work, we introduce AIDL, a hierarchical, solver-aided DSL designed to align with the strengths of LLMs while remaining interpretable and editable by humans. AIDL enables high-level reasoning by breaking problems into abstract components and structural relationships, while offloading low-level geometric reasoning to a constraint solver. We evaluate AIDL in a 2D text-to-CAD setting using a zero-shot prompt-based interface and compare it to OpenSCAD, a widely used CAD DSL that appears in LLM training data. AIDL produces results that are visually competitive and significantly easier to edit. Our findings suggest that language design is a powerful complement to model training and prompt engineering for building collaborative AI–human tools in CAD. Code is available at https://github.com/deGravity/aidl.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Co-speech gesture generation, driven by emotional expression and synergistic bodily movements, is essential for applications such as virtual avatars and human-robot interaction. Existing co-speech gesture generation methods face two fundamental limitations: (1) they produce inexpressive gestures because they ignore the temporal evolution of emotion; and (2) they generate incoherent and unnatural motions because they either oversimplify the body as a whole or model its parts independently. To address these limitations, we propose EmoDiffGes, a diffusion-based framework grounded in embodied emotion theory that unifies dynamic emotion conditioning and part-aware synergistic modeling. Specifically, a Dynamic Emotion-Alignment Module (DEAM) first extracts dynamic emotional cues and injects emotion guidance into the generation process. Then, a Progressive Synergistic Gesture Generator (PSGG) iteratively refines region-specific latent codes while maintaining full-body coordination, leveraging a Body Region Prior for part-specific encoding and a Progressive Inter-Region Synergistic Flow for global motion coherence. Extensive experiments validate the effectiveness of our method, showcasing its potential for generating expressive, coordinated, and emotionally grounded human gestures.
{"title":"EmoDiffGes: Emotion-Aware Co-Speech Holistic Gesture Generation with Progressive Synergistic Diffusion","authors":"Xinru Li, Jingzhong Lin, Bohao Zhang, Yuanyuan Qi, Changbo Wang, Gaoqi He","doi":"10.1111/cgf.70261","DOIUrl":"https://doi.org/10.1111/cgf.70261","url":null,"abstract":"<p>Co-speech gesture generation, driven by emotional expression and synergistic bodily movements, is essential for applications such as virtual avatars and human-robot interaction. Existing co-speech gesture generation methods face two fundamental limitations: (1) producing inexpressive gestures due to ignoring the temporal evolution of emotion; (2) generating incoherent and unnatural motions as a result of either holistic body oversimplification or independent part modeling. To address the above limitations, we propose EmoDiffGes, a diffusion-based framework grounded in embodied emotion theory, unifying dynamic emotion conditioning and part-aware synergistic modeling. Specifically, a Dynamic Emotion-Alignment Module (DEAM) is first applied to extract dynamic emotional cues and inject emotion guidance into the generation process. Then, a Progressive Synergistic Gesture Generator (PSGG) iteratively refines region-specific latent codes while maintaining full-body coordination, leveraging a Body Region Prior for part-specific encoding and Progressive Inter-Region Synergistic Flow for global motion coherence. Extensive experiments validate the effectiveness of our methods, showcasing the potential for generating expressive, coordinated, and emotionally grounded human gestures.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce ClothingTwin, a novel end-to-end framework for reconstructing 3D digital twins of clothing that capture both the outer and inner fabric —without the need for manual mannequin removal. Traditional 2D “ghost mannequin” photography techniques remove the mannequin and composite partial inner textures to create images in which the garment appears as if it were worn by a transparent model. However, extending such techniques to photorealistic 3D Gaussian Splatting (3DGS) is far more challenging. Achieving consistent inner-layer compositing across the large sets of images used for 3DGS optimization quickly becomes impractical if done manually. To address these issues, ClothingTwin introduces three key innovations. First, a specialized image acquisition protocol captures two sets of images for each garment: one worn normally on the mannequin (outer layer exposed) and one worn inside-out (inner layer exposed). This eliminates the need to painstakingly edit out mannequins in thousands of images and provides full coverage of all fabric surfaces. Second, we employ a mesh-guided 3DGS reconstruction for each layer and leverage Non-Rigid Iterative Closest Point (ICP) to align outer and inner point-clouds despite distinct geometries. Third, our enhanced rendering pipeline—featuring mesh-guided back-face culling, back-to-front alpha blending, and recalculated spherical harmonic angles—ensures photorealistic visualization of the combined outer and inner layers without inter-layer artifacts. Experimental evaluations on various garments show that ClothingTwin outperforms conventional 3DGS-based methods, and our ablation study validates the effectiveness of each proposed component.
{"title":"ClothingTwin: Reconstructing Inner and Outer Layers of Clothing Using 3D Gaussian Splatting","authors":"Munkyung Jung, Dohae Lee, In-Kwon Lee","doi":"10.1111/cgf.70240","DOIUrl":"https://doi.org/10.1111/cgf.70240","url":null,"abstract":"<p>We introduce ClothingTwin, a novel end-to-end framework for reconstructing 3D digital twins of clothing that capture both the outer and inner fabric —without the need for manual mannequin removal. Traditional 2D “ghost mannequin” photography techniques remove the mannequin and composite partial inner textures to create images in which the garment appears as if it were worn by a transparent model. However, extending such method to photorealistic 3D Gaussian Splatting (3DGS) is far more challenging. Achieving consistent inner-layer compositing across the large sets of images used for 3DGS optimization quickly becomes impractical if done manually. To address these issues, ClothingTwin introduces three key innovations. First, a specialized image acquisition protocol captures two sets of images for each garment: one worn normally on the mannequin (outer layer exposed) and one worn inside-out (inner layer exposed). This eliminates the need to painstakingly edit out mannequins in thousands of images and provides full coverage of all fabric surfaces. Second, we employ a mesh-guided 3DGS reconstruction for each layer and leverage Non-Rigid Iterative Closest Point (ICP) to align outer and inner point-clouds despite distinct geometries. Third, our enhanced rendering pipeline—featuring mesh-guided back-face culling, back-to-front alpha blending, and recalculated spherical harmonic angles—ensures photorealistic visualization of the combined outer and inner layers without inter-layer artifacts. Experimental evaluations on various garments show that ClothingTwin outperforms conventional 3DGS-based methods, and our ablation study validates the effectiveness of each proposed component.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/cgf.70240","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present FlowCapX, a physics-enhanced framework for flow reconstruction from sparse video inputs, addressing the challenge of jointly optimizing complex physical constraints and sparse observational data over long time horizons. Existing methods often struggle to capture turbulent motion while maintaining physical consistency, limiting reconstruction quality and downstream tasks. Focusing on velocity inference, our approach introduces a hybrid framework that strategically separates representation and supervision across spatial scales. At the coarse level, we resolve sparse-view ambiguities via a novel optimization strategy that aligns long-term observation with physics-grounded velocity fields. By emphasizing vorticity-based physical constraints, our method enhances physical fidelity and improves optimization stability. At the fine level, we prioritize observational fidelity to preserve critical turbulent structures. Extensive experiments demonstrate state-of-the-art velocity reconstruction, enabling velocity-aware downstream tasks such as accurate flow analysis, scene augmentation with tracer visualization, and re-simulation. Our implementation is released at https://github.com/taoningxiao/FlowCapX.git.
{"title":"FlowCapX: Physics-Grounded Flow Capture with Long-Term Consistency","authors":"N. Tao, L. Zhang, X. Ni, M. Chu, B. Chen","doi":"10.1111/cgf.70274","DOIUrl":"https://doi.org/10.1111/cgf.70274","url":null,"abstract":"<p>We present <b>FlowCapX</b>, a physics-enhanced framework for flow reconstruction from sparse video inputs, addressing the challenge of jointly optimizing complex physical constraints and sparse observational data over long time horizons. Existing methods often struggle to capture turbulent motion while maintaining physical consistency, limiting reconstruction quality and downstream tasks. Focusing on velocity inference, our approach introduces a hybrid framework that strategically separates representation and supervision across spatial scales. At the coarse level, we resolve sparse-view ambiguities via a novel optimization strategy that aligns long-term observation with physics-grounded velocity fields. By emphasizing vorticity-based physical constraints, our method enhances physical fidelity and improves optimization stability. At the fine level, we prioritize observational fidelity to preserve critical turbulent structures. Extensive experiments demonstrate state-of-the-art velocity reconstruction, enabling velocity-aware downstream tasks, e.g., accurate flow analysis, scene augmentation with tracer visualization and re-simulation. Our implementation is released at ://github.com/taoningxiao/FlowCapX.git.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 7","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145297032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}