Computer Animation and Virtual Worlds: latest articles

Uniform gradient magnetic field and spatial localization method based on Maxwell coils for virtual surgery simulation

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2247

Yi Huang, Xutian Deng, Xujie Zhao, Wenxuan Xie, Zhiyong Yuan, Jianhui Zhao

With the development of virtual reality technology, simulated surgery has become a low-risk surgical training method, and it requires high-precision positioning of surgical instruments. In this paper we design and validate a novel electromagnetic positioning method based on a uniform gradient magnetic field. We employ Maxwell coils to generate the uniform gradient magnetic field and propose two positioning algorithms based on the magnetic field, namely the linear equation positioning algorithm and the magnetic field fingerprint positioning algorithm. After validating the feasibility of the proposed positioning system through simulation, we construct a prototype system and conduct practical experiments. The experimental results demonstrate that the positioning system exhibits excellent accuracy and speed in both simulation and real-world applications. The positioning accuracy remains consistently high, showing no significant variation as the positions of the surgical instruments change.
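The two positioning ideas named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an ideal Maxwell coil pair aligned with the z axis, whose field near the centre is linear in position with gradient matrix diag(-g/2, -g/2, g). The function names, the gradient value, and the nearest-neighbour fingerprint lookup are all illustrative assumptions.

```python
# Sketch only: ideal uniform-gradient field B(r) = B0 + G r with
# G = diag(-g/2, -g/2, g), the divergence-free form near the centre of a
# z-aligned Maxwell coil pair. All names here are hypothetical.

def field_at(position, g, b0=(0.0, 0.0, 0.0)):
    """Ideal field (tesla) at a position (metres) for axial gradient g (T/m)."""
    x, y, z = position
    return (b0[0] - 0.5 * g * x, b0[1] - 0.5 * g * y, b0[2] + g * z)


def locate_linear(measured, g, b0=(0.0, 0.0, 0.0)):
    """Invert the linear field model to recover the sensor position."""
    bx, by, bz = (m - o for m, o in zip(measured, b0))
    return (-2.0 * bx / g, -2.0 * by / g, bz / g)


def locate_by_fingerprint(measured, fingerprints):
    """Return the stored position whose recorded field is closest to the reading.

    `fingerprints` is a list of (position, field) pairs recorded beforehand;
    a plain nearest-neighbour search is an assumption, not the paper's method.
    """
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(fingerprints, key=lambda item: squared_distance(item[1], measured))[0]


# Usage: simulate a reading and recover the position both ways.
gradient = 0.5  # T/m, illustrative value
reading = field_at((0.01, -0.02, 0.03), gradient)
print(locate_linear(reading, gradient))

grid = [(0.0, 0.0, 0.0), (0.01, -0.02, 0.03), (0.02, 0.0, 0.0)]
table = [(p, field_at(p, gradient)) for p in grid]
print(locate_by_fingerprint(reading, table))
```

The linear solve needs no lookup table but relies on the field actually being linear. A fingerprint table can absorb deviations from the ideal model, at the cost of a calibration pass.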

Citations: 0

Facial action units detection using temporal context and feature reassignment

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2246

Sipeng Yang, Hongyu Huang, Ying Sophie Huang, Xiaogang Jin

Facial action units (AUs) encode the activations of facial muscle groups and play a crucial role in expression analysis and facial animation. However, current deep learning AU detection methods primarily focus on single-image analysis, which limits their ability to exploit rich temporal context for robust results. Moreover, the available datasets remain limited in scale, so models trained on them tend to overfit. This paper proposes a novel AU detection method that integrates spatial and temporal data with inter-subject feature reassignment for accurate and robust AU predictions. Our method first extracts regional features from facial images. Then, to capture both temporal context and identity-independent features, we introduce a temporal feature combination and feature reassignment (TC&FR) module, which transforms single-image features into a cohesive temporal sequence and fuses features across multiple subjects. This transformation encourages the model to use identity-independent features and temporal context, leading to more robust predictions. Experimental results demonstrate the improvements brought by the proposed modules and the state-of-the-art (SOTA) results achieved by our method.

Citations: 0

AG-SDM: Aquascape generation based on stable diffusion model with low-rank adaptation

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2252

Muyang Zhang, Jinming Yang, Yuewei Xian, Wei Li, Jiaming Gu, Weiliang Meng, Jiguang Zhang, Xiaopeng Zhang

As an amalgamation of landscape design and ichthyology, aquascaping seeks to create visually captivating aquatic environments with artistic appeal. Traditional aquascaping methods, governed by rigid principles such as composition and color coordination, may inadvertently limit the aesthetic potential of the landscapes. In this paper, we propose Aquascape Generation based on Stable Diffusion Models (AG-SDM), which prioritizes aesthetic principles and color coordination to offer guidance for real artists in aquascape creation. We curated and annotated three aquascape datasets with varying aspect ratios to accommodate diverse landscape design requirements regarding dimensions and proportions. Leveraging the Fréchet Inception Distance (FID) metric, we trained AGFID for quality assessment. Extensive experiments validate that AG-SDM excels at generating hyper-realistic underwater landscape images that closely resemble real flora, and achieves state-of-the-art performance in aquascape image generation.

Citations: 0

POST: Prototype-oriented similarity transfer framework for cross-domain facial expression recognition

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2260

Zhe Guo, Bingxin Wei, Qinglin Cai, Jiayi Liu, Yi Wang

Facial expression recognition (FER) is a popular research topic in computer vision. Most deep learning expression recognition methods perform well on a single dataset but may struggle in cross-domain FER, where they are applied to different datasets. Cross-dataset FER also suffers from difficulties such as feature distribution deviation and discriminator degradation. To address these issues, we propose a prototype-oriented similarity transfer framework (POST) for cross-domain FER. The bidirectional cross-attention Swin Transformer (BCS Transformer) module is designed to aggregate local facial feature similarities across different domains, enabling the extraction of relevant cross-domain features. The dual learnable category prototypes are designed to represent potential space samples for both source and target domains, enhancing domain alignment by leveraging both cross-domain and domain-specific features. We further introduce the self-training resampling (STR) strategy to enhance similarity transfer. Experimental results with the RAF-DB dataset as the source domain and the CK+, FER2013, JAFFE and SFEW 2.0 datasets as the target domains show that our approach achieves much higher performance than state-of-the-art cross-domain FER methods.

Citations: 0

Multi-style cartoonization: Leveraging multiple datasets with generative adversarial networks

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2269

Jianlu Cai, Frederick W. B. Li, Fangzhe Nan, Bailin Yang

Scene cartoonization aims to convert photos into stylized cartoons. While generative adversarial networks (GANs) can generate high-quality images, previous methods focus on individual images or single styles, ignoring relationships between datasets. We propose a novel multi-style scene cartoonization GAN that leverages multiple cartoon datasets jointly. Our main technical contribution is a multi-branch style encoder that disentangles representations to model styles as distributions over entire datasets rather than individual images. Combined with a multi-task discriminator and perceptual losses optimized across collections, our model achieves state-of-the-art diverse stylization while preserving semantics. Experiments demonstrate that by learning from inter-dataset relationships, our method translates photos into cartoon images with improved realism and abstraction fidelity compared to prior art, without iterative re-training for new styles.

Citations: 0

Multi-scale edge aggregation mesh-graph-network for character secondary motion

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2241

Tianyi Wang, Shiguang Liu

As an enhancement to skinning-based animation, lightweight secondary motion methods for 3D characters are in wide demand across many application scenarios. To address the dependence of data-driven methods on ground truth data, we propose, for the first time in this domain, a self-supervised training strategy that needs no ground truth data. Specifically, we construct a self-supervised training framework by casting the step-wise implicit integration problem as an optimization problem based on physical energy terms. Furthermore, we introduce a multi-scale edge aggregation mesh-graph block (MSEA-MG Block), which significantly enhances network performance. This enables our model to make vivid predictions of secondary motion for 3D characters with arbitrary structures. Experiments indicate that our method, without requiring ground truth data for model training, achieves comparable or even superior performance, both quantitatively and qualitatively, relative to state-of-the-art data-driven approaches in the field.

Citations: 0

A novel transformer-based graph generation model for vectorized road design

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2267

Peichi Zhou, Chen Li, Jian Zhang, Changbo Wang, Hong Qin, Long Liu

Road network design, an important part of landscape modeling, is of great significance to automatic driving, video game development, and disaster simulation. To date, this task remains labor-intensive, tedious and time-consuming. Many improved techniques have been proposed during the last two decades. Nevertheless, most state-of-the-art methods still encounter problems of intuitiveness, usefulness and/or interactivity. Departing from conventional road design, this paper advocates an improved road modeling framework for automatic and interactive road production driven by geographical maps (including elevation, water, and vegetation maps). Our method combines the capability of flexible image generation models with a powerful transformer architecture to produce a vectorized road network. First, we construct a dataset that includes road graphs, density maps and their corresponding geographical maps. Second, we develop a density map generation network, based on an image translation model with an attention mechanism, to predict a road density map. Using the density map facilitates faster convergence and better performance, and the map also serves as the input for road graph generation. Third, we employ the transformer architecture to evolve density maps into road graphs. Our comprehensive experimental results have verified the efficiency, robustness and applicability of the newly proposed framework for road design.

Citations: 0

PIPformers: Patch based inpainting with vision transformers for generalize paintings

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2270

Jeyoung Lee, Hochul Kang

Image inpainting is a long-standing problem in computer vision. Since the rise of deep learning, it has advanced steadily alongside convolutional neural networks and generative adversarial networks. It has since expanded in several directions, such as guided image filling and inpainting with various kinds of masks, and the related task of image out-painting has also emerged. Meanwhile, since the introduction of the vision transformer, many computer vision problems have been tackled with it. In this paper, we use the vision transformer to address generalized image painting: filling in an image regardless of whether the missing areas lie inside or outside the image, and without guidance. To that end, we define the painting problem as one of dropping image content in patch units, which suits the vision transformer. We solve it with a simple network structure obtained by slightly modifying the vision transformer to fit the problem, and we name this network PIPformers. PIPformers achieved better PSNR, RMSE and SSIM values than the methods in the other papers compared.
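Two of the metrics named in the abstract above, RMSE and PSNR, have standard definitions. The following is a minimal sketch of them over flat lists of pixel values. The function names and the 8-bit peak value of 255 are assumptions made for illustration, and SSIM is omitted because it requires windowed local statistics.

```python
import math


def rmse(reference, restored):
    """Root-mean-square error between two equally sized pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(reference, restored))
    return math.sqrt(total / len(reference))


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = rmse(reference, restored)
    if error == 0.0:
        return math.inf
    return 20.0 * math.log10(max_value / error)


# Usage: compare a tiny "ground truth" patch with an inpainted estimate.
truth = [120.0, 130.0, 140.0, 150.0]
estimate = [122.0, 128.0, 142.0, 148.0]
print(rmse(truth, estimate), psnr(truth, estimate))
```

Lower RMSE and higher PSNR both indicate a reconstruction closer to the reference. The two are monotonically related for a fixed peak value.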

Citations: 0

UnderwaterImage2IR: Underwater impulse response generation via dual-path pre-trained networks and conditional generative adversarial networks

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2243

Yisheng Zhang, Shiguang Liu

In acoustic simulation, the methods that are widely applied and proven highly effective rely on accurately capturing the impulse response (IR) and its convolution relationship. This article introduces a novel approach, named UnderwaterImage2IR, that generates acoustic IRs from underwater images using dual-path pre-trained networks. The technique aims to achieve cross-modal conversion from underwater visual images to acoustic information with high accuracy at low cost. Our method integrates dual-path pre-trained networks with conditional generative adversarial networks (CGANs) to generate acoustic IRs that match the observed scenes. One branch of the network extracts spatial features from images, while the other is dedicated to recognizing underwater characteristics. These features are fed into the CGAN, which is trained to generate acoustic IRs corresponding to the observed scenes, thereby achieving high-accuracy acoustic simulation efficiently. Experimental results, compared against the ground truth and evaluated by human experts, demonstrate the significant advantages of our method in generating underwater acoustic IRs and indicate its potential for underwater acoustic simulation.
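The convolution relationship mentioned in the abstract above is the standard one: the sound heard in an environment is the dry source signal convolved with that environment's impulse response. A minimal sketch of direct discrete convolution follows. The function name and the sample values are illustrative assumptions, and the code covers only how an IR is applied, not how the paper's network generates one.

```python
def apply_impulse_response(dry_signal, impulse_response):
    """Direct discrete convolution of a dry signal with an impulse response.

    Each input sample launches a scaled, delayed copy of the impulse
    response; the output is the sum of all those copies.
    """
    if not dry_signal or not impulse_response:
        return []
    output = [0.0] * (len(dry_signal) + len(impulse_response) - 1)
    for i, sample in enumerate(dry_signal):
        for j, tap in enumerate(impulse_response):
            output[i + j] += sample * tap
    return output


# Usage: a two-sample source through an IR with a direct path and one echo.
source = [1.0, 2.0]
ir = [1.0, 0.5]  # direct sound, then an echo at half amplitude one sample later
print(apply_impulse_response(source, ir))
```

Direct convolution costs time proportional to the product of the two lengths. For long IRs an FFT-based convolution is the usual choice, which is outside the scope of this sketch.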

Citations: 0

Microfacet rendering with diffraction compensation

IF 1.1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds

Pub Date : 2024-05-17 DOI: 10.1002/cav.2253

Xudong Yang, Aoran Lyu, Chuhua Xian, Hongmin Cai

Traditional microfacet rendering models usually consider only the straight-line propagation of light and do not account for diffraction when calculating the radiance of outgoing light. However, ignoring the energy contributed by diffraction can lead to darker rendering results when the object's surface has many small details. To address this issue, we introduce a diffraction energy term in the microfacet model to compensate for the energy loss caused by diffraction. Starting from the Fresnel-Kirchhoff diffraction theorem, we combine it with the Cook-Torrance model. By incorporating the computed diffraction radiance into the outgoing radiance of the microfacet, we obtain a diffraction-compensated BRDF (Bidirectional Reflectance Distribution Function) model. Experimental results demonstrate that our proposed method is effective at compensating for outgoing light and produces more realistic rendering results.
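For orientation, the Cook-Torrance specular term referred to in the abstract above has the form D·F·G / (4 (n·l)(n·v)). The sketch below evaluates it using a GGX distribution, the Schlick Fresnel approximation, and a separable Smith shadowing term; these component choices are assumptions, since the abstract does not say which ones the authors use. The `diffraction_term` argument is a hypothetical placeholder added to the result; the paper derives its actual term from the Fresnel-Kirchhoff theorem, which is not reproduced here.

```python
import math


def ggx_distribution(n_dot_h, alpha):
    """GGX normal distribution function D for roughness alpha."""
    a2 = alpha * alpha
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)


def schlick_fresnel(v_dot_h, f0):
    """Schlick approximation of the Fresnel reflectance F."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5


def smith_g1(n_dot_x, alpha):
    """Smith masking term for one direction under GGX."""
    a2 = alpha * alpha
    return 2.0 * n_dot_x / (n_dot_x + math.sqrt(a2 + (1.0 - a2) * n_dot_x * n_dot_x))


def cook_torrance(n_dot_l, n_dot_v, n_dot_h, v_dot_h, alpha, f0):
    """Specular Cook-Torrance BRDF value: D * F * G / (4 (n.l)(n.v))."""
    if n_dot_l <= 0.0 or n_dot_v <= 0.0:
        return 0.0
    d = ggx_distribution(n_dot_h, alpha)
    f = schlick_fresnel(v_dot_h, f0)
    g = smith_g1(n_dot_l, alpha) * smith_g1(n_dot_v, alpha)
    return d * f * g / (4.0 * n_dot_l * n_dot_v)


def compensated_brdf(n_dot_l, n_dot_v, n_dot_h, v_dot_h, alpha, f0,
                     diffraction_term=0.0):
    """Cook-Torrance plus a caller-supplied diffraction term (placeholder)."""
    return cook_torrance(n_dot_l, n_dot_v, n_dot_h, v_dot_h, alpha, f0) + diffraction_term


# Usage: normal incidence on a moderately rough dielectric.
print(compensated_brdf(1.0, 1.0, 1.0, 1.0, alpha=0.5, f0=0.04, diffraction_term=0.01))
```

Treating the compensation as a simple additive term keeps the base model untouched, so setting `diffraction_term` to zero recovers the uncompensated Cook-Torrance value.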

Citations: 0
