
Computer Animation and Virtual Worlds: Latest Publications

SCNet: A Dual-Branch Network for Strong Noisy Image Denoising Based on Swin Transformer and ConvNeXt
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-06-03 | DOI: 10.1002/cav.70030
Chuchao Lin, Changjun Zou, Hangbin Xu

Image denoising plays a vital role in restoring high-quality images from noisy inputs and directly impacts downstream vision tasks. Traditional methods often fail under strong noise, causing detail loss or excessive smoothing. While recent convolutional neural network (CNN)-based and Transformer-based models have shown progress, they struggle to jointly capture global structure and preserve local details. To address this, we propose SCNet, a dual-branch fusion network tailored for strong-noise denoising. It combines a Swin Transformer branch for global context modeling with a ConvNeXt branch for fine-grained local feature extraction. Their outputs are adaptively merged via a Feature Fusion Block using joint spatial and channel attention, ensuring semantic consistency and texture fidelity. A multi-scale upsampling module and the Charbonnier loss further improve structural accuracy and visual quality. Extensive experiments on four benchmark datasets show that SCNet outperforms state-of-the-art methods, especially under severe noise, and proves effective in real-world tasks such as mural image restoration.
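For readers unfamiliar with the Charbonnier loss mentioned in the abstract, the following minimal PyTorch sketch shows the standard formulation; the epsilon value and tensor shapes are illustrative assumptions, not details taken from the paper.

import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # sqrt(diff^2 + eps^2), averaged over all elements; behaves like L2 near zero
    # and like L1 for large residuals, which keeps gradients stable under heavy noise.
    diff = pred - target
    return torch.sqrt(diff * diff + eps * eps).mean()

# Illustrative usage on a random batch of RGB images.
pred = torch.rand(2, 3, 64, 64)
target = torch.rand(2, 3, 64, 64)
print(charbonnier_loss(pred, target).item())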

Citations: 0
AIKII: An AI-Enhanced Knowledge Interactive Interface for Knowledge Representation in Educational Games
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-06-02 | DOI: 10.1002/cav.70052
Dake Liu, Huiwen Zhao, Wen Tang, Wenwen Yang

The use of generative AI to create responsive and adaptive game content has attracted considerable interest within the educational game design community, highlighting its potential as a tool for enhancing players' understanding of in-game knowledge. However, designing effective player-AI interaction to support knowledge representation remains unexplored. This paper presents AIKII, an AI-enhanced Knowledge Interaction Interface designed to facilitate knowledge representation in educational games. AIKII employs various interaction channels to represent in-game knowledge and support player engagement. To investigate its effectiveness and user learning experience, we implemented AIKII into The Journey of Poetry, an educational game centered on learning Chinese poetry, and conducted interviews with university students. The results demonstrated that our method fosters contextual and reflective connections between players and in-game knowledge, enhancing player autonomy and immersion.

Citations: 0
DTGS: Defocus-Tolerant View Synthesis Using Gaussian Splatting
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-06-02 | DOI: 10.1002/cav.70045
Xinying Dai, Li Yao

Defocus blur poses a significant challenge for 3D reconstruction, as traditional methods often struggle to maintain detail and accuracy in blurred regions. Building upon recent advances in 3D Gaussian Splatting (3DGS), we propose an architecture for 3D scene reconstruction from defocused, blurry images. Because the point clouds initialized by SfM are sparse, we improve the scene representation by filling in new Gaussians where the Gaussian field is insufficient. During the optimization phase, we adjust the gradient field based on the depth values of the Gaussians and introduce a perceptual loss into the objective function to reduce reconstruction bias caused by blurriness and enhance the realism of the rendered results. Experimental results on both synthetic and real datasets show that our method outperforms existing approaches in terms of reconstruction quality and robustness, even under challenging defocus blur conditions.
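The perceptual loss referred to above is usually computed by comparing images in the feature space of a frozen pretrained network; the sketch below is one common VGG16-based variant. The choice of backbone and layer cutoff are assumptions, since the abstract does not specify them.

import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(torch.nn.Module):
    # Compares rendered and reference images in VGG16 feature space rather than pixel space.
    def __init__(self, cutoff: int = 16):  # 16 modules roughly corresponds to relu3_3
        super().__init__()
        self.features = vgg16(weights=VGG16_Weights.DEFAULT).features[:cutoff].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)  # the backbone stays frozen

    def forward(self, rendered: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # Inputs are assumed to be (B, 3, H, W) tensors already normalized for VGG.
        return F.mse_loss(self.features(rendered), self.features(reference))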

Citations: 0
Joint-Learning: A Robust Segmentation Method for 3D Point Clouds Under Label Noise
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-06-01 | DOI: 10.1002/cav.70038
Mengyao Zhang, Jie Zhou, Tingyun Miao, Yong Zhao, Xin Si, Jingliang Zhang

Most point cloud segmentation methods are trained on clean datasets and are easily affected by label noise. We present a novel method called Joint-learning, which is the first attempt to apply a dual-network framework to point cloud segmentation with noisy labels. Two networks are trained simultaneously, and each network selects clean samples to update its peer network. The communication between the two networks lets them exchange the knowledge they have learned, yielding good robustness and generalization ability. Subsequently, adaptive sample selection is proposed to maximize the learning capacity. When the accuracies of both networks stop improving, the selection rate is reduced, which results in cleaner selected samples. To further reduce the impact of noisy labels, for unselected samples we provide a joint label correction algorithm that rectifies their labels via the two networks' predictions. We conduct extensive experiments on the S3DIS and ScanNet-v2 datasets under different types and rates of noise. Both quantitative and qualitative results verify the reasonableness and effectiveness of the proposed method. By comparison, our method is substantially superior to the state-of-the-art methods and achieves the best results in all noise settings. The average performance improvement is more than 7.43%, with a maximum of 11.42%.
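The dual-network "clean samples update the peer" idea is closely related to small-loss co-teaching; the sketch below illustrates that generic mechanism in PyTorch. It is not the authors' exact selection rule: the flat per-sample batch layout, the cross-entropy criterion, and the select_rate schedule are all assumptions.

import torch
import torch.nn.functional as F

def joint_update_step(net_a, net_b, opt_a, opt_b, feats, labels, select_rate):
    # Rank samples by loss under each network; the lowest-loss ones are treated as clean.
    with torch.no_grad():
        loss_a = F.cross_entropy(net_a(feats), labels, reduction="none")
        loss_b = F.cross_entropy(net_b(feats), labels, reduction="none")
    k = max(1, int(select_rate * labels.numel()))
    picks_a = torch.topk(loss_a, k, largest=False).indices  # A's clean picks train B
    picks_b = torch.topk(loss_b, k, largest=False).indices  # B's clean picks train A

    opt_a.zero_grad()
    F.cross_entropy(net_a(feats[picks_b]), labels[picks_b]).backward()
    opt_a.step()

    opt_b.zero_grad()
    F.cross_entropy(net_b(feats[picks_a]), labels[picks_a]).backward()
    opt_b.step()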

Citations: 0
Talking Face Generation With Lip and Identity Priors
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-05-28 | DOI: 10.1002/cav.70026
Jiajie Wu, Frederick W. B. Li, Gary K. L. Tam, Bailin Yang, Fangzhe Nan, Jiahao Pan

Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpose methods often struggle with accurate lip synchronization, identity preservation, and natural facial movements. To address these limitations, we propose a novel architecture that combines an alignment model with a rendering model. The rendering model synthesizes identity-consistent lip movements by leveraging facial landmarks derived from speech, a partially occluded target face, multi-reference lip features, and the input audio. Concurrently, the alignment model estimates optical flow using the occluded face and a static reference image, enabling precise alignment of facial poses and lip shapes. This collaborative design enhances the rendering process, resulting in more realistic and identity-preserving outputs. Extensive experiments demonstrate that our method significantly improves lip synchronization and identity retention, establishing a new benchmark in talking face video generation.
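The alignment step described above typically ends with warping the reference image by the estimated optical flow; one common way to apply such a flow in PyTorch is shown below. The pixel-space flow convention and the align_corners choice are assumptions, not details from the paper.

import torch
import torch.nn.functional as F

def warp_with_flow(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # image: (B, C, H, W); flow: (B, 2, H, W) giving per-pixel (dx, dy) offsets in pixels.
    _, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=image.device, dtype=image.dtype),
        torch.arange(w, device=image.device, dtype=image.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # sampling positions in pixels
    # grid_sample expects coordinates normalized to [-1, 1].
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(image, torch.stack((gx, gy), dim=-1), align_corners=True)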

Citations: 0
Precise Motion Inbetweening via Bidirectional Autoregressive Diffusion Models
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-05-28 | DOI: 10.1002/cav.70040
Jiawen Peng, Zhuoran Liu, Jingzhong Lin, Gaoqi He

Conditional motion diffusion models have demonstrated significant potential in generating natural and plausible motions in response to constraints such as keyframes, which makes them well suited to the motion inbetweening task. However, most methods struggle to match the keyframe constraints accurately, resulting in unsmooth transitions between the keyframes and the generated motion. In this article, we propose Bidirectional Autoregressive Motion Diffusion Inbetweening (BAMDI) to generate seamless motion between the start and target frames. The main idea is to recast the motion diffusion model in an autoregressive paradigm that predicts motion subsequences adjacent to both the start and target keyframes, infilling the missing frames over several iterations. This helps to improve the local consistency of the generated motion. Additionally, bidirectional generation ensures smoothness at both the start and target keyframes. Experiments show that our method achieves state-of-the-art performance compared with other diffusion-based motion inbetweening methods.
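The bidirectional, iterative infilling described above can be pictured as a schedule that fills the gap inwards from both keyframes; the toy sketch below only illustrates that ordering. The chunk size and the symmetric schedule are assumptions, and the diffusion sampling itself is omitted.

def bidirectional_infill_order(num_frames: int, chunk: int):
    # Yield the frame indices filled at each iteration, moving inwards from
    # both the start and the target keyframe until the gap is closed.
    left, right = 1, num_frames - 2  # frames 0 and num_frames - 1 are the keyframes
    while left <= right:
        forward = list(range(left, min(left + chunk, right + 1)))
        left += len(forward)
        backward = list(range(max(right - chunk + 1, left), right + 1))
        right -= len(backward)
        yield forward, backward

# Example: a 12-frame clip filled 2 frames at a time from each side.
for fwd, bwd in bidirectional_infill_order(12, 2):
    print("forward:", fwd, "backward:", bwd)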

Citations: 0
PG-VTON: Front-And-Back Garment Guided Panoramic Gaussian Virtual Try-On With Diffusion Modeling
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-05-27 | DOI: 10.1002/cav.70054
Jian Zheng, Shengwei Sang, Yifei Lu, Guojun Dai, Xiaoyang Mao, Wenhui Zhou

Virtual try-on (VTON) technology enables the rapid creation of realistic try-on experiences, which makes it highly valuable for the metaverse and e-commerce. However, 2D VTON methods struggle to convey depth and immersion, while existing 3D methods require multi-view garment images and face challenges in generating high-fidelity garment textures. To address the aforementioned limitations, this paper proposes a panoramic Gaussian VTON framework guided solely by front-and-back garment information, named PG-VTON, which uses an adapted local controllable diffusion model for generating virtual dressing effects in specific regions. Specifically, PG-VTON adopts a coarse-to-fine architecture consisting of two stages. The coarse editing stage employs a local controllable diffusion model with a score distillation sampling (SDS) loss to generate coarse garment geometries with high-level semantics. Meanwhile, the refinement stage applies the same diffusion model with a photometric loss not only to enhance garment details and reduce artifacts but also to correct unwanted noise and distortions introduced during the coarse stage, thereby effectively enhancing realism. To improve training efficiency, we further introduce a dynamic noise scheduling (DNS) strategy, which ensures stable training and high-fidelity results. Experimental results demonstrate the superiority of our method, which achieves geometrically consistent and highly realistic 3D virtual try-on generation.
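Score distillation sampling (SDS), used in the coarse stage, perturbs the rendered image with noise and uses the diffusion model's noise prediction as a gradient signal; a hedged sketch of that gradient is given below. The denoiser(x_t, t, cond) call, the classifier-free guidance scale, and the weighting w(t) = 1 - alpha_bar_t are illustrative assumptions rather than the paper's exact choices.

import torch

def sds_gradient(rendered, denoiser, alphas_cumprod, cond, t, guidance_scale=7.5):
    # rendered: differentiably rendered image from the 3D scene, shape (B, C, H, W).
    alpha_bar = alphas_cumprod[t]
    noise = torch.randn_like(rendered)
    x_t = alpha_bar.sqrt() * rendered + (1.0 - alpha_bar).sqrt() * noise  # forward diffusion

    with torch.no_grad():  # the diffusion prior is frozen
        eps_cond = denoiser(x_t, t, cond)
        eps_uncond = denoiser(x_t, t, None)
        eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    w = 1.0 - alpha_bar
    return w * (eps_hat - noise)  # applied via rendered.backward(gradient=...)

In practice the returned tensor is fed back with rendered.backward(gradient=...), so only the 3D scene parameters receive the update while the diffusion prior stays fixed.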

Citations: 0
A Robust 3D Mesh Segmentation Algorithm With Anisotropic Sparse Embedding
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-05-27 | DOI: 10.1002/cav.70042
Mengyao Zhang, Wenting Li, Yong Zhao, Xin Si, Jingliang Zhang

3D mesh segmentation, as a very challenging problem in computer graphics, has attracted considerable interest. The most popular methods in recent years are data-driven methods. However, such methods require a large amount of accurately labeled data, which is difficult to obtain. In this article, we propose a novel mesh segmentation algorithm based on anisotropic sparse embedding. We first over-segment the input mesh and get a collection of patches. Then these patches are embedded into a latent space via an anisotropic L1-regularized optimization problem. In the new space, the patches that belong to the same part of the mesh will be closer, while those belonging to different parts will be farther. Finally, we can easily generate the segmentation result by clustering. Various experimental results on the PSB and COSEG datasets show that our algorithm is able to get perception-aware results and is superior to the state-of-the-art algorithms. In addition, the proposed algorithm can robustly deal with meshes with different poses, different triangulations, noises, missing regions, or missing parts.
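Anisotropic L1-regularized problems of this kind are typically attacked with proximal (soft-thresholding) updates in which each latent coordinate gets its own shrinkage threshold; the sketch below shows only that standard building block, not the authors' full solver, and the per-coordinate weights are assumptions.

import numpy as np

def anisotropic_soft_threshold(v: np.ndarray, lam: np.ndarray) -> np.ndarray:
    # Proximal operator of sum_i lam_i * |x_i|: shrink each coordinate by its own threshold.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Illustrative call: a larger lam_i drives the corresponding embedding coordinate to zero faster.
v = np.array([0.8, -0.3, 0.05])
lam = np.array([0.1, 0.1, 0.2])
print(anisotropic_soft_threshold(v, lam))  # -> [ 0.7 -0.2  0. ]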

Citations: 0
UTMCR: 3U-Net Transformer With Multi-Contrastive Regularization for Single Image Dehazing
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-05-26 | DOI: 10.1002/cav.70029
HangBin Xu, ChangJun Zou, ChuChao Lin

Convolutional neural networks have a long history of development in single-image dehazing, but they have gradually been overtaken by Transformer frameworks owing to their limited global modeling capability and large number of parameters. However, the existing Transformer network structure adopts a single U-Net structure, which limits its multi-level, multi-scale feature fusion and modeling capability. Therefore, we propose an end-to-end dehazing network (UTMCR-Net). The network consists of two parts: (1) the UT module, which connects three U-Net networks in series, with the backbone replaced by the Dehazeformer block. By connecting three U-Net networks in series, we improve the global modeling capability and capture multi-scale information at different levels to achieve multi-level, multi-scale feature fusion. (2) The MCR module, which improves the original contrastive regularization method by splitting the output of the UT module into four equal blocks, to which contrastive regularization is then applied separately. Specifically, we use three U-Net networks to enhance the global modeling capability of UTMCR as well as its multi-scale feature fusion capability. The image dehazing ability is further enhanced by the MCR module. Experimental results show that our method achieves better results on most datasets.
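The "four equal blocks" step in the MCR module amounts to a 2x2 spatial split of the network output before contrastive regularization is applied per block; a minimal version of that split is sketched below. How the positive and negative pairs are then formed is not shown and is left as described in the paper.

import torch

def split_into_quadrants(x: torch.Tensor):
    # x: (B, C, H, W) with even H and W; returns top-left, top-right, bottom-left, bottom-right.
    h, w = x.shape[-2], x.shape[-1]
    top, bottom = x[..., : h // 2, :], x[..., h // 2 :, :]
    return (top[..., : w // 2], top[..., w // 2 :],
            bottom[..., : w // 2], bottom[..., w // 2 :])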

Citations: 0
Decoupling Density Dynamics: A Neural Operator Framework for Adaptive Multi-Fluid Interactions
IF 0.9 | CAS Region 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2025-05-26 | DOI: 10.1002/cav.70027
Yalan Zhang, Yuhang Xu, Xiaokun Wang, Angelos Chatzimparmpas, Xiaojuan Ban

The dynamic interface prediction of multi-density fluids presents a fundamental challenge across computational fluid dynamics and graphics, rooted in nonlinear momentum transfer. We present Density-Conditioned Dynamic Convolution, a novel neural operator framework that establishes a differentiable density-dynamics mapping through decoupled operator response. The core theoretical advancement lies in continuously adaptive neighborhood kernels that transform local density distributions into tunable filters, enabling unified representation from homogeneous media to multi-phase fluids. Experiments demonstrate autonomous evolution of physically consistent interface separation patterns in density-contrast scenarios, including cocktail and bidirectional hourglass flows. Quantitative evaluation shows improved computational efficiency compared to an SPH method and qualitatively plausible interface dynamics with a larger time step size.
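As a rough illustration of a density-conditioned operator, the toy module below generates per-neighbor filter weights from local density and the neighbor offset, then aggregates neighbor features with them. It is a conceptual sketch only; the tensor layout, the small MLP, and the inputs chosen for conditioning are assumptions, not the paper's operator.

import torch
import torch.nn as nn

class DensityConditionedConv(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 32):
        super().__init__()
        # Maps (offset xyz, neighbor density, center density) to one filter weight per channel.
        self.kernel_net = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim)
        )

    def forward(self, offsets, feats, rho_center, rho_neighbor):
        # offsets: (N, K, 3), feats: (N, K, F), rho_center: (N, 1, 1), rho_neighbor: (N, K, 1)
        cond = torch.cat([offsets, rho_neighbor, rho_center.expand_as(rho_neighbor)], dim=-1)
        weights = self.kernel_net(cond)      # (N, K, F): the kernel adapts to local density
        return (weights * feats).sum(dim=1)  # aggregate over the K neighbors -> (N, F)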

Citations: 0