
Computer Graphics Forum: Latest Publications

Feature Disentanglement in GANs for Photorealistic Multi-view Hair Transfer
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70245
Jiayi Xu, Zhengyang Wu, Chenming Zhang, Xiaogang Jin, Yaohua Ji

Fast and highly realistic multi-view hair transfer plays a crucial role in evaluating the effectiveness of virtual hair try-on systems. However, GAN-based generation and editing methods face persistent challenges in feature disentanglement. Achieving pixel-level, attribute-specific modifications—such as changing hairstyle or hair color without affecting other facial features—remains a long-standing problem. To address this limitation, we propose a novel multi-view hair transfer framework that leverages a hair-only intermediate facial representation and a 3D-guided masking mechanism. Our approach disentangles tri-plane facial features into spatial geometric components and global style descriptors, enabling independent and precise control over hairstyle and hair color. By introducing a dedicated intermediate representation focused solely on hair and incorporating a two-stage feature fusion strategy guided by the generated 3D mask, our framework achieves fine-grained local editing across multiple viewpoints while preserving facial integrity and improving background consistency. Extensive experiments demonstrate that our method produces visually compelling and natural results in side-to-front view hair transfer tasks, offering a robust and flexible solution for high-fidelity hair reconstruction and manipulation.
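The mask-guided fusion idea can be illustrated with a minimal sketch (shapes and names here are assumptions, and the paper's actual two-stage fusion operates on tri-plane features rather than plain arrays): the generated hair mask gates which spatial locations take their features from the hair-only branch, leaving the remaining facial and background features untouched.

```python
import numpy as np

def fuse_features(face_feat, hair_feat, hair_mask):
    # Blend hair-branch features into the face features only where the mask
    # is active; non-hair regions (identity, background) pass through.
    m = hair_mask[..., None]  # broadcast the (H, W) mask over the channel axis
    return m * hair_feat + (1.0 - m) * face_feat

# Toy 2x2 feature maps with 3 channels: mask selects the diagonal.
face = np.ones((2, 2, 3))
hair = np.zeros((2, 2, 3))
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
fused = fuse_features(face, hair, mask)
```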

Citations: 0
Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70272
Zaiqiang Wu, I-Chao Shen, Takeo Igarashi

Per-garment virtual try-on methods collect garment-specific datasets and train networks tailored to each garment to achieve superior results. However, these approaches often struggle with loose-fitting garments due to two key limitations: (1) They rely on human body semantic maps to align garments with the body, but these maps become unreliable when body contours are obscured by loose-fitting garments, resulting in degraded outcomes; (2) They train garment synthesis networks on a per-frame basis without utilizing temporal information, leading to noticeable jittering artifacts. To address the first limitation, we propose a two-stage approach for robust semantic map estimation. First, we extract a garment-invariant representation from the raw input image. This representation is then passed through an auxiliary network to estimate the semantic map. This enhances the robustness of semantic map estimation under loose-fitting garments during garment-specific dataset generation. To address the second limitation, we introduce a recurrent garment synthesis framework that incorporates temporal dependencies to improve frame-to-frame coherence while maintaining real-time performance. We conducted qualitative and quantitative evaluations to demonstrate that our method outperforms existing approaches in both image quality and temporal coherence. Ablation studies further validate the effectiveness of the garment-invariant representation and the recurrent synthesis framework.
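Why temporal dependencies damp jitter can be seen in a rough stand-in (this exponential moving average is purely illustrative and is not the paper's learned recurrent synthesis network): each output frame reuses the previous output, so per-frame noise is attenuated.

```python
import numpy as np

def temporal_smooth(frames, alpha=0.8):
    # Exponential moving average over a frame sequence: each output is a blend
    # of the current frame and the previous *output*, damping frame-to-frame
    # flicker at the cost of a slight lag.
    out = [frames[0]]
    for f in frames[1:]:
        out.append(alpha * f + (1.0 - alpha) * out[-1])
    return out

frames = [np.zeros(4), np.ones(4), np.ones(4)]
smoothed = temporal_smooth(frames)
```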

Citations: 0
RT-HDIST: Ray-Tracing Core-based Hausdorff Distance Computation
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70229
Young Woo Kim, Jaehong Lee, Duksu Kim

The Hausdorff distance is a fundamental metric with widespread applications across various fields. However, computing it remains expensive, especially for large-scale datasets. This work targets the exact point-to-point Hausdorff distance on point sets. We present RT-HDIST, the first Hausdorff distance algorithm accelerated by ray-tracing cores (RT-cores). By reformulating the Hausdorff distance problem as a series of nearest-neighbor searches and introducing a novel quantized voxel-index space, RT-HDIST achieves significant reductions in computational overhead while maintaining exact results. Extensive benchmarks demonstrate up to a two-order-of-magnitude speedup over prior state-of-the-art methods, underscoring RT-HDIST's potential for real-time and large-scale applications.
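The reformulation the abstract describes, the Hausdorff distance as two directed nearest-neighbor searches, can be written as a brute-force CPU sketch (without the RT-core acceleration, BVH, or quantized voxel-index space that make RT-HDIST fast):

```python
import numpy as np

def directed_hausdorff(a, b):
    # For every point in a, find the distance to its nearest neighbor in b,
    # then take the worst case (maximum) over a.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(a, b):
    # The symmetric Hausdorff distance is the max of the two directed ones.
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.0], [3.0, 0.0]])
```

This brute force is O(|a|·|b|); the point of RT-HDIST is to replace the inner nearest-neighbor search with hardware-accelerated ray queries while keeping the result exact.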

Citations: 0
Joint Deblurring and 3D Reconstruction for Macrophotography
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70253
Yifan Zhao, Liangchen Li, Yuqi Zhou, Kai Wang, Yan Liang, Juyong Zhang

Macro lenses offer high resolution and large magnification, and 3D models of small, detailed objects can provide richer information. However, defocus blur in macrophotography is a long-standing problem that severely hinders both clear imaging of the captured objects and their high-quality 3D reconstruction. Traditional image deblurring methods require large numbers of images and annotations, and there is currently no multi-view 3D reconstruction method for macrophotography. In this work, we propose a joint deblurring and 3D reconstruction method for macrophotography. Starting from captured multi-view blurry images, we jointly optimize a clear 3D model of the object and the defocus blur kernel of each pixel. The entire framework adopts differentiable rendering to self-supervise the optimization of the 3D model and the defocus blur kernels. Extensive experiments show that, from a small number of multi-view images, our method achieves not only high-quality image deblurring but also recovery of high-fidelity 3D appearance.
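Defocus blur is commonly modeled as convolution with a disk-shaped point-spread function whose radius grows with the distance from the focal plane; a minimal sketch of such a kernel follows (a generic textbook model, not the per-pixel kernels the paper optimizes):

```python
import numpy as np

def disk_kernel(radius):
    # Normalized disk point-spread function: uniform weight inside a circle
    # of the given radius, zero outside. Convolving a sharp image with this
    # kernel approximates defocus blur.
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = (x**2 + y**2 <= radius**2).astype(np.float64)
    return k / k.sum()

k = disk_kernel(1)  # 3x3 cross: center plus its 4-neighborhood
```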

Citations: 0
Projective Displacement Mapping for Ray Traced Editable Surfaces
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70235
Rama Hoetzlein

Displacement mapping is an important tool for modeling detailed geometric features. We explore the problem of authoring complex surfaces while ray tracing interactively. Current techniques for ray tracing displaced surfaces rely on acceleration structures that require dynamic rebuilding when edited. These techniques are typically used for massive static scenes or the compression of detailed source assets. Our interest lies in modeling and look development of artistic features with real-time ray tracing. We introduce projective displacement mapping as a direct sampling method combined with a hardware BVH. Quality and performance are improved over existing methods with smoothed displaced normals, thin feature sampling, tight prism bounds and ray bi-linear patch intersections.

Citations: 0
Hybrid Sparse Transformer and Feature Alignment for Efficient Image Completion
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70255
L. Chen, H. Sun

In this paper, we propose an efficient single-stage hybrid architecture for image completion. Existing transformer-based image completion methods often struggle with accurate content restoration, largely due to their ineffective modeling of corrupted channel information and the attention noise introduced by softmax-based mechanisms, which results in blurry textures and distorted structures. Additionally, these methods frequently fail to maintain texture consistency, either relying on imprecise mask sampling or incurring substantial computational costs from complex similarity calculations. To address these limitations, we present two key contributions: a Hybrid Sparse Self-Attention (HSA) module and a Feature Alignment Module (FAM). The HSA module enhances structural recovery by decoupling spatial and channel attention with sparse activation, while the FAM enforces texture consistency by aligning encoder and decoder features via a mask-free, energy-gated mechanism without additional inference cost. Our method achieves state-of-the-art image completion results with the fastest inference speed among single-stage networks, as measured by PSNR, SSIM, FID, and LPIPS on CelebA-HQ, Places2, and Paris datasets.
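The "attention noise" point can be made concrete with a generic top-k sparse attention sketch (illustrative only; the paper's HSA additionally decouples spatial and channel attention, which is not shown here). Masking low-scoring keys before the softmax stops near-irrelevant positions from receiving small but nonzero weight:

```python
import numpy as np

def topk_sparse_attention(q, k, v, topk=2):
    # Scaled dot-product attention where each query attends only to its
    # top-k keys; the remaining scores are masked to -inf before the
    # softmax, so they receive exactly zero weight.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    thresh = np.sort(scores, axis=-1)[:, -topk][:, None]
    scores = np.where(scores >= thresh, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

q = np.array([[1.0, 0.0]])
k = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
v = np.array([[1.0], [2.0], [3.0]])
out = topk_sparse_attention(q, k, v, topk=1)
```

With topk equal to the number of keys this reduces to ordinary softmax attention; smaller values trade coverage for sharper, less noisy weights.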

Citations: 0
G-SplatGAN: Disentangled 3D Gaussian Generation for Complex Shapes via Multi-Scale Patch Discriminators
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70256
Jiaqi Li, Haochuan Dang, Zhi Zhou, Junke Zhu, Zhangjin Huang

Generating 3D objects with complex topologies from monocular images remains a challenge in computer graphics, due to the difficulty of modeling varying 3D shapes with disentangled, steerable geometry and visual attributes. NeRF-based methods, in particular, suffer from slow volumetric rendering and limited structural controllability. Recent advances in 3D Gaussian Splatting provide a more efficient alternative, but generative modeling with separate control over structure and appearance remains underexplored. In this paper, we propose G-SplatGAN, a novel 3D-aware generation framework that combines the rendering efficiency of 3D Gaussian Splatting with disentangled latent modeling. Starting from a shared Gaussian template, our method uses dual modulation branches to modulate geometry and appearance from independent latent codes, enabling precise shape manipulation and controllable generation. We adopt a progressive adversarial training scheme with multi-scale, patch-based discriminators to capture both global structure and local detail. Our model requires no 3D supervision and is trained on monocular images with known camera poses, reducing data reliance while supporting real-image inversion through a geometry-aware encoder. Experiments show that G-SplatGAN achieves superior performance in rendering speed, controllability, and image fidelity, offering a compelling solution for controllable 3D generation using Gaussian representations.
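Multi-scale patch discriminators consume local crops of the rendering at several resolutions, so the critic sees both fine texture and coarser structure. A minimal sketch of the patch extraction (array layout, patch size, and the strided downsampling are assumptions for illustration, not the paper's exact pipeline):

```python
import numpy as np

def extract_patches(img, patch=4, stride=4):
    # Slide a window over an (H, W, C) image and stack the crops.
    h, w = img.shape[:2]
    out = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            out.append(img[y:y + patch, x:x + patch])
    return np.stack(out)

def multi_scale_patches(img, scales=(1, 2)):
    # One patch set per scale: downsample by striding, then crop fixed-size
    # patches, so coarser scales cover wider image context per patch.
    return [extract_patches(img[::s, ::s]) for s in scales]

patches = multi_scale_patches(np.zeros((8, 8, 3)))
```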

Citations: 0
Using Saliency for Semantic Image Abstractions in Robotic Painting
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70259
Michael Stroh, Patrick Paetzold, Daniel Berio, Rebecca Kehlbeck, Frederic Fol Leymarie, Oliver Deussen, Noura Faraj

We present an adaptive, semantics-based abstraction approach that balances aesthetic quality and structural coherence within the practical constraints of robotic painting. We apply panoptic segmentation with color-based over-segmentation to partition images into meaningful regions aligned with semantic objects, while providing flexible abstraction levels. Automatic parameter selection for region merging is enabled by semantic saliency maps, derived from Out-of-Distribution segmentation techniques in combination with machine learning methods for feature detection. This preserves the boundaries of salient objects while simplifying less prominent regions. A graph-based community detection step further refines the abstraction by grouping regions according to local connectivity and semantic coherence. The runtime of our method outperforms optimization-based image vectorization methods, enabling the efficient generation of multiple abstraction levels that can serve as hierarchical layers for robotic painting. We demonstrate the quality of our method by showing abstraction results, robotic paintings with the e-David robot, and a comparison to other abstraction methods.

Citations: 0
GS-Share: Enabling High-fidelity Map Sharing with Incremental Gaussian Splatting
IF 2.9 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2025-10-11 DOI: 10.1111/cgf.70248
Xinran Zhang, Hanqi Zhu, Yifan Duan, Yanyong Zhang

Constructing and sharing 3D maps is essential for many applications, including autonomous driving and augmented reality. Recently, 3D Gaussian splatting has emerged as a promising approach for accurate 3D reconstruction. However, a practical map-sharing system that features high-fidelity, continuous updates, and network efficiency remains elusive. To address these challenges, we introduce GS-Share, a photorealistic map-sharing system with a compact representation. The core of GS-Share includes anchor-based global map construction, virtual-image-based map enhancement, and incremental map update. We evaluate GS-Share against state-of-the-art methods, demonstrating that our system achieves higher fidelity, particularly for extrapolated views, with improvements of 11%, 22%, and 74% in PSNR, LPIPS, and Depth L1, respectively. Furthermore, GS-Share is significantly more compact, reducing map transmission overhead by 36%.
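PSNR, the fidelity metric quoted above, is defined from the mean squared error between a reference and a test image; a standard implementation (assuming images normalized to [0, 1]):

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    # Peak signal-to-noise ratio in decibels: 10 * log10(peak^2 / MSE).
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak**2 / mse)

ref = np.zeros((4, 4))
noisy = np.full((4, 4), 0.5)  # constant error of 0.5 -> MSE = 0.25
val = psnr(ref, noisy)
```

Higher is better, which is why an 11% PSNR gain on extrapolated views indicates noticeably closer renderings.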

Citations: 0
Multimodal 3D Few-Shot Classification via Gaussian Mixture Discriminant Analysis
IF 2.9 · CAS Tier 4 (Computer Science) · Q2 (Computer Science, Software Engineering) · Pub Date: 2025-10-11 · DOI: 10.1111/cgf.70268
Yiqi Wu, Huachao Wu, Ronglei Hu, Yilin Chen, Dejun Zhang

While pre-trained 3D vision-language models are becoming increasingly available, there remains a lack of frameworks that can effectively harness their capabilities for few-shot classification. In this work, we propose PointGMDA, a training-free framework that combines Gaussian Mixture Models (GMMs) with Gaussian Discriminant Analysis (GDA) to perform robust classification using only a few labeled point cloud samples. Our method estimates GMM parameters per class from support data and computes mixture-weighted prototypes, which are then used in GDA with a shared covariance matrix to construct decision boundaries. This formulation allows us to model intra-class variability more expressively than traditional single-prototype approaches, while maintaining analytical tractability. To incorporate semantic priors, we integrate CLIP-style textual prompts and fuse predictions from geometric and textual modalities through a hybrid scoring strategy. We further introduce PointGMDA-T, a lightweight attention-guided refinement module that learns residuals for fast feature adaptation, improving robustness under distribution shift. Extensive experiments on ModelNet40 and ScanObjectNN demonstrate that PointGMDA outperforms strong baselines across a variety of few-shot settings, with consistent gains under both training-free and fine-tuned conditions. These results highlight the effectiveness and generality of our probabilistic modeling and multimodal adaptation framework. Our code is publicly available at https://github.com/djzgroup/PointGMDA.
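The geometric core described in the abstract — fit a per-class GMM on the support set, collapse it to a mixture-weighted prototype, then classify with a shared-covariance discriminant — can be sketched on toy data. This is an illustrative reconstruction under those stated assumptions, not the authors' code (which lives at the linked repository); the feature dimensions, class means, and helper names here are all made up:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dim = 4

# Toy stand-in for per-class support features (e.g. pooled point-cloud
# embeddings): 2 classes, 20 labeled support samples each.
support = {
    0: rng.normal(loc=0.0, scale=1.0, size=(20, dim)),
    1: rng.normal(loc=3.0, scale=1.0, size=(20, dim)),
}

prototypes = {}
pooled_cov = np.zeros((dim, dim))
n_total = 0
for label, feats in support.items():
    # Per-class GMM on the support data, as in the abstract.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(feats)
    # Mixture-weighted prototype: component means weighted by mixture weights.
    prototypes[label] = gmm.weights_ @ gmm.means_
    # Pool within-class scatter into one shared covariance (the GDA assumption).
    pooled_cov += len(feats) * np.cov(feats, rowvar=False, bias=True)
    n_total += len(feats)
pooled_cov = pooled_cov / n_total + 1e-3 * np.eye(dim)  # regularize for invertibility
precision = np.linalg.inv(pooled_cov)

def classify(x):
    # Shared-covariance discriminant: pick the prototype with the smallest
    # Mahalanobis distance to the query feature.
    dists = {c: (x - m) @ precision @ (x - m) for c, m in prototypes.items()}
    return min(dists, key=dists.get)

query = rng.normal(loc=3.0, scale=1.0, size=dim)
print(classify(query))  # → 1
```

Because every class shares one covariance, the decision boundaries stay linear and the whole procedure needs no gradient training — which is what makes the training-free few-shot setting tractable. The paper's multimodal fusion with CLIP-style text prompts sits on top of this geometric score and is not shown here.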

Citations: 0
Journal
Computer Graphics Forum