
Latest Publications: Computer Animation and Virtual Worlds

Talking Face Generation With Lip and Identity Priors
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-28 · DOI: 10.1002/cav.70026
Jiajie Wu, Frederick W. B. Li, Gary K. L. Tam, Bailin Yang, Fangzhe Nan, Jiahao Pan

Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpose methods often struggle with accurate lip synchronization, identity preservation, and natural facial movements. To address these limitations, we propose a novel architecture that combines an alignment model with a rendering model. The rendering model synthesizes identity-consistent lip movements by leveraging facial landmarks derived from speech, a partially occluded target face, multi-reference lip features, and the input audio. Concurrently, the alignment model estimates optical flow using the occluded face and a static reference image, enabling precise alignment of facial poses and lip shapes. This collaborative design enhances the rendering process, resulting in more realistic and identity-preserving outputs. Extensive experiments demonstrate that our method significantly improves lip synchronization and identity retention, establishing a new benchmark in talking face video generation.
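A minimal sketch of the alignment-plus-rendering data flow described above, assuming PyTorch; every module here (AlignmentModel, RenderingModel, backward_warp) is a toy placeholder, since the abstract does not specify the actual network architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentModel(nn.Module):
    """Predicts a dense flow field from the occluded face and a static reference."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # per-pixel (dx, dy), normalized coords
        )

    def forward(self, occluded, reference):
        return self.net(torch.cat([occluded, reference], dim=1))

class RenderingModel(nn.Module):
    """Synthesizes the output frame from the occluded face, the flow-aligned
    reference, averaged multi-reference lip crops, landmarks, and audio."""
    def __init__(self, audio_dim=128, lmk_dim=136):
        super().__init__()
        self.cond = nn.Linear(audio_dim + lmk_dim, 64)
        self.dec = nn.Sequential(
            nn.Conv2d(3 + 3 + 3 + 64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, occluded, aligned_ref, lip_refs, landmarks, audio):
        c = self.cond(torch.cat([landmarks, audio], dim=-1))          # (B, 64)
        c = c[:, :, None, None].expand(-1, -1, *occluded.shape[-2:])  # broadcast spatially
        return self.dec(torch.cat([occluded, aligned_ref, lip_refs.mean(1), c], dim=1))

def backward_warp(image, flow):
    """Warp `image` by a flow field given in normalized [-1, 1] coordinates."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    return F.grid_sample(image, grid + flow.permute(0, 2, 3, 1), align_corners=True)

# Toy forward pass through both models.
B, H, W = 1, 64, 64
occluded, reference = torch.rand(B, 3, H, W), torch.rand(B, 3, H, W)
lip_refs = torch.rand(B, 4, 3, H, W)                  # four reference lip crops
flow = AlignmentModel()(occluded, reference)
aligned = backward_warp(reference, flow)
frame = RenderingModel()(occluded, aligned, lip_refs,
                         torch.rand(B, 136), torch.rand(B, 128))
print(frame.shape)  # torch.Size([1, 3, 64, 64])
```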

Citations: 0
Precise Motion Inbetweening via Bidirectional Autoregressive Diffusion Models
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-28 · DOI: 10.1002/cav.70040
Jiawen Peng, Zhuoran Liu, Jingzhong Lin, Gaoqi He

Conditional motion diffusion models have demonstrated significant potential in generating natural and plausible motions in response to constraints such as keyframes, which makes them well suited to the motion in-betweening task. However, most methods struggle to match the keyframe constraints accurately, resulting in unsmooth transitions between the keyframes and the generated motion. In this article, we propose Bidirectional Autoregressive Motion Diffusion Inbetweening (BAMDI) to generate seamless motion between start and target frames. The main idea is to transfer the motion diffusion model to an autoregressive paradigm, which predicts subsequences of motion adjacent to both the start and target keyframes, infilling the missing frames over several iterations. This helps improve the local consistency of the generated motion. Additionally, bidirectional generation ensures smoothness at both the start and target keyframes. Experiments show our method achieves state-of-the-art performance compared with other diffusion-based motion in-betweening methods.
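The recursive breakdown can be illustrated with a toy midpoint-infilling routine; here plain linear interpolation stands in for the bidirectional diffusion sampler, an assumption made purely for illustration.

```python
# Toy sketch of recursive, bidirectional infilling between two keyframes.
import numpy as np

def predict_midpoint(start_pose, target_pose):
    # Placeholder for the conditional diffusion step: here, linear interpolation.
    return 0.5 * (start_pose + target_pose)

def inbetween(poses, lo, hi):
    """Recursively fill poses[lo+1:hi] given keyframes at indices lo and hi."""
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    poses[mid] = predict_midpoint(poses[lo], poses[hi])  # condition on both ends
    inbetween(poses, lo, mid)                            # refine left half
    inbetween(poses, mid, hi)                            # refine right half

# 10-frame sequence with only the endpoints known (3-DoF toy "pose" per frame).
poses = np.zeros((10, 3))
poses[0], poses[-1] = np.array([0.0, 0.0, 0.0]), np.array([9.0, 9.0, 9.0])
inbetween(poses, 0, len(poses) - 1)
print(poses[:, 0])  # monotone fill between the two keyframes
```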

Citations: 0
PG-VTON: Front-And-Back Garment Guided Panoramic Gaussian Virtual Try-On With Diffusion Modeling
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-27 · DOI: 10.1002/cav.70054
Jian Zheng, Shengwei Sang, Yifei Lu, Guojun Dai, Xiaoyang Mao, Wenhui Zhou

Virtual try-on (VTON) technology enables the rapid creation of realistic try-on experiences, which makes it highly valuable for the metaverse and e-commerce. However, 2D VTON methods struggle to convey depth and immersion, while existing 3D methods require multi-view garment images and face challenges in generating high-fidelity garment textures. To address the aforementioned limitations, this paper proposes a panoramic Gaussian VTON framework guided solely by front-and-back garment information, named PG-VTON, which uses an adapted local controllable diffusion model for generating virtual dressing effects in specific regions. Specifically, PG-VTON adopts a coarse-to-fine architecture consisting of two stages. The coarse editing stage employs a local controllable diffusion model with a score distillation sampling (SDS) loss to generate coarse garment geometries with high-level semantics. Meanwhile, the refinement stage applies the same diffusion model with a photometric loss not only to enhance garment details and reduce artifacts but also to correct unwanted noise and distortions introduced during the coarse stage, thereby effectively enhancing realism. To improve training efficiency, we further introduce a dynamic noise scheduling (DNS) strategy, which ensures stable training and high-fidelity results. Experimental results demonstrate the superiority of our method, which achieves geometrically consistent and highly realistic 3D virtual try-on generation.
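A schematic of the two-stage coarse-to-fine optimization might look as follows; sds_loss and photometric_loss are placeholder functions, since the abstract does not detail the garment-guided diffusion model, the SDS formulation, or the DNS schedule.

```python
# Two-stage optimization skeleton: coarse geometry via an SDS-style term,
# then refinement via a photometric term. Both losses are toy stand-ins.
import torch

params = torch.randn(1000, 14, requires_grad=True)  # toy per-Gaussian parameters

def sds_loss(p):
    # Placeholder for score distillation sampling against the diffusion prior.
    return (p ** 2).mean()

def photometric_loss(p):
    # Placeholder for the image-space photometric refinement term.
    return (p - 1.0).abs().mean()

opt = torch.optim.Adam([params], lr=1e-2)
for step in range(2000):
    coarse_stage = step < 1000                      # stage switch at the halfway point
    loss = sds_loss(params) if coarse_stage else photometric_loss(params)
    opt.zero_grad()
    loss.backward()
    opt.step()
```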

Citations: 0
A Robust 3D Mesh Segmentation Algorithm With Anisotropic Sparse Embedding
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-27 · DOI: 10.1002/cav.70042
Mengyao Zhang, Wenting Li, Yong Zhao, Xin Si, Jingliang Zhang

3D mesh segmentation, a very challenging problem in computer graphics, has attracted considerable interest. The most popular methods in recent years are data-driven. However, such methods require a large amount of accurately labeled data, which is difficult to obtain. In this article, we propose a novel mesh segmentation algorithm based on anisotropic sparse embedding. We first over-segment the input mesh to obtain a collection of patches. These patches are then embedded into a latent space via an anisotropic $L_1$-regularized optimization problem. In the new space, patches that belong to the same part of the mesh lie closer together, while those belonging to different parts lie farther apart. Finally, we generate the segmentation result by clustering. Experimental results on the PSB and COSEG datasets show that our algorithm obtains perception-aware results and is superior to state-of-the-art algorithms. In addition, the proposed algorithm robustly handles meshes with different poses, different triangulations, noise, missing regions, or missing parts.
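A toy version of the embed-then-cluster pipeline, assuming an ordinary (isotropic) $L_1$-regularized least-squares embedding solved by ISTA soft-thresholding; the paper's anisotropic weighting and its actual patch features are not reproduced here.

```python
# Sparse embedding of patch features via ISTA, then clustering of the codes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))   # per-patch feature vectors (toy)
D = rng.normal(size=(16, 8))     # embedding dictionary (toy)
lam, step = 0.1, 0.01            # L1 weight and ISTA step size

Z = np.zeros((200, 8))           # sparse codes, one row per patch
for _ in range(300):
    G = (Z @ D.T - X) @ D                                  # gradient of 0.5*||ZD^T - X||^2
    Z = Z - step * G                                       # gradient step
    Z = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0) # soft-threshold (prox of L1)

# Patches close in the sparse code space end up in the same segment.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(labels))
```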

Citations: 0
UTMCR: 3U-Net Transformer With Multi-Contrastive Regularization for Single Image Dehazing
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-26 · DOI: 10.1002/cav.70029
HangBin Xu, ChangJun Zou, ChuChao Lin

Convolutional neural networks have a long history in single-image dehazing, but they have gradually been overtaken by Transformer-based frameworks because of their insufficient global modeling capability and large parameter counts. However, existing Transformer networks adopt a single U-Net structure, which is insufficient for multi-level, multi-scale feature fusion and modeling. Therefore, we propose an end-to-end dehazing network (UTMCR-Net). The network consists of two parts: (1) the UT module, which connects three U-Net networks in series, with the backbone replaced by the Dehazeformer block. Connecting three U-Nets in series improves global modeling and captures multi-scale information at different levels, achieving multi-level, multi-scale feature fusion. (2) The MCR module, which improves the original contrastive regularization method by splitting the output of the UT module into four equal blocks, which are then compared and learned with contrastive regularization, respectively. In short, the three U-Net networks enhance the global modeling and multi-scale feature fusion capability of UTMCR, and the MCR module further strengthens its dehazing ability. Experimental results show that our method achieves better results on most datasets.
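The two structural ideas, three U-Net stages in series and the MCR-style split of the output into four equal blocks, can be sketched as below; the unet_stub and the per-block L1 terms are simplified stand-ins for the Dehazeformer-backboned stages and the full positive/negative contrastive loss.

```python
import torch
import torch.nn as nn

def unet_stub():
    # Placeholder for one Dehazeformer-backboned U-Net stage.
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

ut = nn.Sequential(unet_stub(), unet_stub(), unet_stub())  # three stages in series

def four_blocks(img):
    """Split a (B, C, H, W) tensor into its four spatial quadrants."""
    h, w = img.shape[-2] // 2, img.shape[-1] // 2
    return [img[..., :h, :w], img[..., :h, w:], img[..., h:, :w], img[..., h:, w:]]

hazy, clear = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
pred = ut(hazy)
# Per-quadrant L1 terms in place of the full contrastive regularization loss.
mcr = sum(nn.functional.l1_loss(p, c)
          for p, c in zip(four_blocks(pred), four_blocks(clear)))
print(mcr.item())
```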

Citations: 0
Decoupling Density Dynamics: A Neural Operator Framework for Adaptive Multi-Fluid Interactions
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-26 · DOI: 10.1002/cav.70027
Yalan Zhang, Yuhang Xu, Xiaokun Wang, Angelos Chatzimparmpas, Xiaojuan Ban

The dynamic interface prediction of multi-density fluids presents a fundamental challenge across computational fluid dynamics and graphics, rooted in nonlinear momentum transfer. We present Density-Conditioned Dynamic Convolution, a novel neural operator framework that establishes a differentiable density-dynamics mapping through decoupled operator responses. The core theoretical advancement lies in continuously adaptive neighborhood kernels that transform local density distributions into tunable filters, enabling a unified representation from homogeneous media to multi-phase fluids. Experiments demonstrate the autonomous evolution of physically consistent interface separation patterns in density-contrast scenarios, including cocktail and bidirectional hourglass flows. Quantitative evaluation shows improved computational efficiency over an SPH method at a larger time step size, along with qualitatively plausible interface dynamics.
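One plausible reading of the density-conditioned kernel is a dynamic-filter layer in which a small MLP maps each particle's local density to the weights applied over its neighborhood features; the shapes and the weight-generator design below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DensityConditionedFilter(nn.Module):
    def __init__(self, in_ch=8, out_ch=8, n_neighbors=16):
        super().__init__()
        self.out_ch, self.k, self.in_ch = out_ch, n_neighbors, in_ch
        # Maps a scalar density to one linear filter over the flattened neighborhood.
        self.weight_gen = nn.Sequential(
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, in_ch * n_neighbors * out_ch),
        )

    def forward(self, neighbor_feats, density):
        # neighbor_feats: (N, k, in_ch); density: (N, 1)
        W = self.weight_gen(density).view(-1, self.out_ch, self.k * self.in_ch)
        x = neighbor_feats.reshape(neighbor_feats.shape[0], -1, 1)  # (N, k*in_ch, 1)
        return torch.bmm(W, x).squeeze(-1)                          # (N, out_ch)

feats, rho = torch.rand(1024, 16, 8), torch.rand(1024, 1)
out = DensityConditionedFilter()(feats, rho)
print(out.shape)  # torch.Size([1024, 8])
```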

Citations: 0
Weisfeiler-Lehman Kernel Augmented Product Representation for Queries on Large-Scale BIM Scenes
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-26 · DOI: 10.1002/cav.70043
Huiqiang Hu, Changyan He, Xiaojun Liu, Jinyuan Jia, Ting Yu

To achieve efficient querying of BIM products in large-scale virtual scenes, this study introduces a Weisfeiler-Lehman (WL) kernel augmented representation for Building Information Modeling (BIM) products based on Product Attributed Graphs (PAGs). Unlike conventional data-driven approaches that demand extensive labeling and preprocessing, our method directly processes raw BIM product data to extract stable semantic and geometric features. Initially, a PAG is constructed to encapsulate product features. Subsequently, a WL kernel enhanced multi-channel node aggregation strategy is employed to integrate BIM product attributes effectively. Leveraging the bijective relationship in graph isomorphism, an unsupervised convergence mechanism based on attribute value differences is established. Experiments demonstrate that our method converges within an average of 3 iterations, completes graph isomorphism testing in minimal time, and attains an average query accuracy of 95%. This approach outperforms 1-WL and 3-WL methods, especially in handling products with topologically isomorphic but oppositely attributed spaces.
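A compact 1-WL color-refinement sketch with a convergence check (refinement only splits color classes, so a stable class count means the partition has stabilized); the adjacency lists, attribute labels, and histogram-intersection similarity below are toy stand-ins for the PAG structure and the paper's kernel.

```python
from collections import Counter

def wl_colors(adj, init_labels, max_iters=10):
    """1-WL color refinement on an attributed graph (adjacency-list form)."""
    labels = dict(init_labels)
    for it in range(max_iters):
        # New color = own label combined with the sorted multiset of neighbor labels.
        new = {v: hash((labels[v], tuple(sorted(labels[u] for u in adj[v]))))
               for v in adj}
        if len(set(new.values())) == len(set(labels.values())):
            return new, it + 1          # class count stable -> partition converged
        labels = new
    return labels, max_iters

def wl_similarity(colors_a, colors_b):
    """Histogram-intersection similarity between two WL color multisets."""
    ca, cb = Counter(colors_a.values()), Counter(colors_b.values())
    return sum((ca & cb).values()) / max(len(colors_a), len(colors_b))

# A small attributed graph: adjacency lists plus initial attribute labels.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
lab = {0: "wall", 1: "wall", 2: "door", 3: "wall"}
colors, iters = wl_colors(adj, lab)
print(iters, wl_similarity(colors, colors))  # converges in a few rounds; self-sim = 1.0
```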

Citations: 0
Risk-Aware Pedestrian Behavior Using Reinforcement Learning in Mixed Traffic
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-25 · DOI: 10.1002/cav.70031
Cheng-En Cai, Sai-Keung Wong, Tzu-Yu Chen

This paper introduces a reinforcement learning method to simulate agents crossing roads in unsignalized, mixed-traffic environments. These agents represent individual pedestrians or small groups. The method ensures that agents adopt safe interactions with nearby dynamic obstacles (bikes, motorcycles, or cars) by considering factors such as conflict zones and post-encroachment times. Risk assessments based on interaction times encourage agents to avoid hazardous behaviors. Additionally, risk-informed reward terms incentivize agents to perform safe actions, while collision penalties deter collisions. The method achieved collision-free crossings and demonstrated normal, conservative, and aggressive pedestrian behaviors in various scenarios. Finally, ablation tests revealed the impact of reward weights, reward terms, and key agent state components. The weights of reward terms can be adjusted to achieve either conservative or aggressive pedestrian crossing behaviors, balancing road crossing efficiency and safety.
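A hedged sketch of a risk-shaped reward in the spirit of the description above; the actual reward terms and weights are not given in the abstract, and the post-encroachment-time (PET) risk term below is an illustrative assumption.

```python
def crossing_reward(progress, pet, collided,
                    w_progress=1.0, w_risk=0.5, w_collision=10.0,
                    pet_safe=2.0):
    """Reward = progress toward goal - risk penalty - collision penalty.

    progress: distance gained toward the far curb this step (m)
    pet:      post-encroachment time to the nearest conflicting vehicle (s)
    collided: whether a collision occurred this step
    """
    risk = max(0.0, (pet_safe - pet) / pet_safe)  # grows as PET drops below threshold
    reward = w_progress * progress - w_risk * risk
    if collided:
        reward -= w_collision
    return reward

# Larger w_risk yields conservative crossings; smaller yields aggressive ones.
print(crossing_reward(progress=0.4, pet=1.2, collided=False))  # 0.2
```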

Citations: 0
Motion In-Betweening via Recursive Keyframe Prediction
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-25 · DOI: 10.1002/cav.70035
Rui Zeng, Ju Dai, Junxuan Bai, Junjun Pan

Motion in-betweening is a flexible and efficient technique for generating 3-dimensional animations. In this paper, we propose a keyframe-driven method that effectively addresses the pose ambiguity issue and achieves robust in-betweening performance. We introduce a keyframe-driven synthesis framework: at each recursion step, the key poses at both ends predict a new pose at the midpoint. This recursive breakdown reduces motion ambiguity by decomposing the in-betweening sequence into an integration of short clips. A hybrid positional encoding scales the hidden states to adapt to long- and short-term dependencies. Additionally, we employ a temporal refinement network to capture local motion relationships, thereby enhancing the consistency of the predicted pose sequence. Through comprehensive evaluations, including both quantitative and qualitative comparisons, the proposed model demonstrates its competitiveness in prediction accuracy and in-betweening flexibility.

Citations: 0
GSFaceMorpher: High-Fidelity 3D Face Morphing via Gaussian Splatting
IF 0.9 · CAS Tier 4 (Computer Science) · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-05-23 · DOI: 10.1002/cav.70036
Xiwen Shi, Hao Zhao, Yi Jiang, Hao Xu, Ziyi Yang, Yiqian Wu, Qingbiao Wu, Xiaogang Jin

High-fidelity 3D face morphing aims to achieve seamless transitions between realistic 3D facial representations of different identities. Although 3D Gaussian Splatting (3DGS) excels in high-quality rendering, its application to morphing is hindered by the lack of Gaussian primitive correspondence and variations in primitive quantities. To address this, we propose GSFaceMorpher, which is a novel framework for high-fidelity 3D face morphing based on 3DGS. Our method constructs an auxiliary model that bridges the source and target face models by aligning the geometry through Radial Basis Function (RBF) warping and optimizing the appearance in the image space. This auxiliary model enables smooth parameter interpolation, whereas a diffusion-based refinement step enhances critical facial details through attention replacement from the reference faces. Experiments demonstrate that our method produces visually coherent and high-fidelity morphing sequences, significantly outperforming NeRF-based baselines in terms of both quantitative metrics and user preferences. Our work establishes a new benchmark for high-fidelity 3D face morphing with applications in visual effects, animation, and immersive experiences.
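The RBF geometry-alignment step can be sketched with SciPy's RBFInterpolator: fit a thin-plate-spline warp from source to target facial landmarks, then apply it to the source model's Gaussian centers. The landmark and center arrays are random placeholders.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
src_landmarks = rng.normal(size=(68, 3))                          # source face landmarks
dst_landmarks = src_landmarks + 0.05 * rng.normal(size=(68, 3))   # target landmarks
gaussian_centers = rng.normal(size=(5000, 3))                     # source 3DGS centers

# Fit the warp on landmark correspondences, then move every Gaussian center.
warp = RBFInterpolator(src_landmarks, dst_landmarks, kernel="thin_plate_spline")
warped_centers = warp(gaussian_centers)
print(warped_centers.shape)  # (5000, 3)
```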

Citations: 0