With the development of virtual reality technology, simulated surgery has become a low-risk surgical training method, and high-precision positioning of surgical instruments is required in virtual simulated surgery. In this paper we design and validate a novel electromagnetic positioning method based on a uniform gradient magnetic field. We employ Maxwell coils to generate the uniform gradient magnetic field and propose two magnetic-field-based positioning algorithms, namely a linear equation positioning algorithm and a magnetic field fingerprint positioning algorithm. After validating the feasibility of the proposed positioning system through simulation, we construct a prototype system and conduct practical experiments. The experimental results demonstrate that the positioning system exhibits excellent accuracy and speed in both simulation and real-world applications, and the positioning accuracy remains consistently high, showing no significant variation with changes in the positions of the surgical instruments.
{"title":"Uniform gradient magnetic field and spatial localization method based on Maxwell coils for virtual surgery simulation","authors":"Yi Huang, Xutian Deng, Xujie Zhao, Wenxuan Xie, Zhiyong Yuan, Jianhui Zhao","doi":"10.1002/cav.2247","DOIUrl":"https://doi.org/10.1002/cav.2247","url":null,"abstract":"<p>With the development of virtual reality technology, simulation surgery has become a low-risk surgical training method and high-precision positioning of surgical instruments is required in virtual simulation surgery. In this paper we design and validate a novel electromagnetic positioning method based on a uniform gradient magnetic field. We employ Maxwell coils to generate the uniform gradient magnetic field and propose two positioning algorithms based on magnetic field, namely the linear equation positioning algorithm and the magnetic field fingerprint positioning algorithm. After validating the feasibility of proposed positioning system through simulation, we construct a prototype system and conduct practical experiments. The experimental results demonstrate that the positioning system exhibits excellent accuracy and speed in both simulation and real-world applications. The positioning accuracy remains consistent and high, showing no significant variation with changes in the positions of surgical instruments.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial action units (AUs) encode the activations of facial muscle groups, playing a crucial role in expression analysis and facial animation. However, current deep learning AU detection methods primarily focus on single-image analysis, which limits the exploitation of rich temporal context for robust outcomes. Moreover, the scale of available datasets remains limited, so models trained on them tend to suffer from overfitting. This paper proposes a novel AU detection method integrating spatial and temporal data with inter-subject feature reassignment for accurate and robust AU predictions. Our method first extracts regional features from facial images. Then, to effectively capture both temporal context and identity-independent features, we introduce a temporal feature combination and feature reassignment (TC&FR) module, which transforms single-image features into a cohesive temporal sequence and fuses features across multiple subjects. This transformation encourages the model to utilize identity-independent features and temporal context, thus ensuring robust prediction outcomes. Experimental results demonstrate the enhancements brought by the proposed modules and the state-of-the-art (SOTA) results achieved by our method.
{"title":"Facial action units detection using temporal context and feature reassignment","authors":"Sipeng Yang, Hongyu Huang, Ying Sophie Huang, Xiaogang Jin","doi":"10.1002/cav.2246","DOIUrl":"https://doi.org/10.1002/cav.2246","url":null,"abstract":"<p>Facial action units (AUs) encode the activations of facial muscle groups, playing a crucial role in expression analysis and facial animation. However, current deep learning AU detection methods primarily focus on single-image analysis, which limits the exploitation of rich temporal context for robust outcomes. Moreover, the scale of available datasets remains limited, leading models trained on these datasets to tend to suffer from overfitting issues. This paper proposes a novel AU detection method integrating spatial and temporal data with inter-subject feature reassignment for accurate and robust AU predictions. Our method first extracts regional features from facial images. Then, to effectively capture both the temporal context and identity-independent features, we introduce a temporal feature combination and feature reassignment (TC&FR) module, which transforms single-image features into a cohesive temporal sequence and fuses features across multiple subjects. This transformation encourages the model to utilize identity-independent features and temporal context, thus ensuring robust prediction outcomes. Experimental results demonstrate the enhancements brought by the proposed modules and the state-of-the-art (SOTA) results achieved by our method.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As an amalgamation of landscape design and ichthyology, aquascaping endeavors to create visually captivating aquatic environments imbued with artistic allure. Traditional aquascaping methodologies, governed by rigid principles such as composition and color coordination, may inadvertently curtail the aesthetic potential of the landscapes. In this paper, we propose Aquascape Generation based on Stable Diffusion Models (AG-SDM), which prioritizes aesthetic principles and color coordination to offer guiding principles for real artists in aquascape creation. We meticulously curated and annotated three aquascape datasets with varying aspect ratios to accommodate diverse landscape design requirements regarding dimensions and proportions. Leveraging the Fréchet Inception Distance (FID) metric, we trained AGFID for quality assessment. Extensive experiments validate that AG-SDM excels in generating hyper-realistic underwater landscape images that closely resemble real flora, and achieves state-of-the-art performance in aquascape image generation.
{"title":"AG-SDM: Aquascape generation based on stable diffusion model with low-rank adaptation","authors":"Muyang Zhang, Jinming Yang, Yuewei Xian, Wei Li, Jiaming Gu, Weiliang Meng, Jiguang Zhang, Xiaopeng Zhang","doi":"10.1002/cav.2252","DOIUrl":"https://doi.org/10.1002/cav.2252","url":null,"abstract":"<p>As an amalgamation of landscape design and ichthyology, aquascape endeavors to create visually captivating aquatic environments imbued with artistic allure. Traditional methodologies in aquascape, governed by rigid principles such as composition and color coordination, may inadvertently curtail the aesthetic potential of the landscapes. In this paper, we propose Aquascape Generation based on Stable Diffusion Models (AG-SDM), prioritizing aesthetic principles and color coordination to offer guiding principles for real artists in Aquascape creation. We meticulously curated and annotated three aquascape datasets with varying aspect ratios to accommodate diverse landscape design requirements regarding dimensions and proportions. Leveraging the Fréchet Inception Distance (FID) metric, we trained AGFID for quality assessment. Extensive experiments validate that our AG-SDM excels in generating hyper-realistic underwater landscape images, closely resembling real flora, and achieves state-of-the-art performance in aquascape image generation.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial expression recognition (FER) is one of the popular research topics in computer vision. Most deep learning expression recognition methods perform well on a single dataset but may struggle in cross-domain FER applications when applied to different datasets. Cross-dataset FER also suffers from difficulties such as feature distribution deviation and discriminator degradation. To address these issues, we propose a prototype-oriented similarity transfer framework (POST) for cross-domain FER. A bidirectional cross-attention Swin Transformer (BCS Transformer) module is designed to aggregate local facial feature similarities across different domains, enabling the extraction of relevant cross-domain features. Dual learnable category prototypes are designed to represent potential space samples for both the source and target domains, ensuring enhanced domain alignment by leveraging both cross-domain and domain-specific features. We further introduce a self-training resampling (STR) strategy to enhance similarity transfer. Experimental results with the RAF-DB dataset as the source domain and the CK+, FER2013, JAFFE, and SFEW 2.0 datasets as the target domains show that our approach achieves much higher performance than state-of-the-art cross-domain FER methods.
{"title":"POST: Prototype-oriented similarity transfer framework for cross-domain facial expression recognition","authors":"Zhe Guo, Bingxin Wei, Qinglin Cai, Jiayi Liu, Yi Wang","doi":"10.1002/cav.2260","DOIUrl":"https://doi.org/10.1002/cav.2260","url":null,"abstract":"<p>Facial expression recognition (FER) is one of the popular research topics in computer vision. Most deep learning expression recognition methods perform well on a single dataset, but may struggle in cross-domain FER applications when applied to different datasets. FER under cross-dataset also suffers from difficulties such as feature distribution deviation and discriminator degradation. To address these issues, we propose a prototype-oriented similarity transfer framework (POST) for cross-domain FER. The bidirectional cross-attention Swin Transformer (BCS Transformer) module is designed to aggregate local facial feature similarities across different domains, enabling the extraction of relevant cross-domain features. The dual learnable category prototypes is designed to represent potential space samples for both source and target domains, ensuring enhanced domain alignment by leveraging both cross-domain and specific domain features. We further introduce the self-training resampling (STR) strategy to enhance similarity transfer. The experimental results with the RAF-DB dataset as the source domain and the CK+, FER2013, JAFFE and SFEW 2.0 datasets as the target domains, show that our approach achieves much higher performance than the state-of-the-art cross-domain FER methods.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scene cartoonization aims to convert photos into stylized cartoons. While generative adversarial networks (GANs) can generate high-quality images, previous methods focus on individual images or single styles, ignoring relationships between datasets. We propose a novel multi-style scene cartoonization GAN that leverages multiple cartoon datasets jointly. Our main technical contribution is a multi-branch style encoder that disentangles representations to model styles as distributions over entire datasets rather than individual images. Combined with a multi-task discriminator and perceptual losses optimized across collections, our model achieves state-of-the-art diverse stylization while preserving semantics. Experiments demonstrate that, by learning from inter-dataset relationships, our method translates photos into cartoon images with improved realism and abstraction fidelity compared to prior art, without iterative re-training for new styles.
{"title":"Multi-style cartoonization: Leveraging multiple datasets with generative adversarial networks","authors":"Jianlu Cai, Frederick W. B. Li, Fangzhe Nan, Bailin Yang","doi":"10.1002/cav.2269","DOIUrl":"https://doi.org/10.1002/cav.2269","url":null,"abstract":"<p>Scene cartoonization aims to convert photos into stylized cartoons. While generative adversarial networks (GANs) can generate high-quality images, previous methods focus on individual images or single styles, ignoring relationships between datasets. We propose a novel multi-style scene cartoonization GAN that leverages multiple cartoon datasets jointly. Our main technical contribution is a multi-branch style encoder that disentangles representations to model styles as distributions over entire datasets rather than images. Combined with a multi-task discriminator and perceptual losses optimizing across collections, our model achieves state-of-the-art diverse stylization while preserving semantics. Experiments demonstrate that by learning from inter-dataset relationships, our method translates photos into cartoon images with improved realism and abstraction fidelity compared to prior arts, without iterative re-training for new styles.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As an enhancement to skinning-based animation, lightweight secondary motion methods for 3D characters are in wide demand across many application scenarios. To address the dependence of data-driven methods on ground truth data, we propose a self-supervised training strategy that, for the first time in this domain, requires no ground truth data. Specifically, we construct a self-supervised training framework by modeling the stepwise implicit integration problem as an optimization problem based on physical energy terms. Furthermore, we introduce a multi-scale edge aggregation mesh-graph block (MSEA-MG Block), which significantly enhances network performance. This enables our model to make vivid predictions of secondary motion for 3D characters with arbitrary structures. Empirical experiments indicate that our method, without requiring ground truth data for model training, achieves comparable or even superior performance, quantitatively and qualitatively, to state-of-the-art data-driven approaches in the field.
{"title":"Multi-scale edge aggregation mesh-graph-network for character secondary motion","authors":"Tianyi Wang, Shiguang Liu","doi":"10.1002/cav.2241","DOIUrl":"https://doi.org/10.1002/cav.2241","url":null,"abstract":"<p>As an enhancement to skinning-based animations, light-weight secondary motion method for 3D characters are widely demanded in many application scenarios. To address the dependence of data-driven methods on ground truth data, we propose a self-supervised training strategy that is free of ground truth data for the first time in this domain. Specifically, we construct a self-supervised training framework by modeling the implicit integration problem with steps as an optimization problem based on physical energy terms. Furthermore, we introduce a multi-scale edge aggregation mesh-graph block (MSEA-MG Block), which significantly enhances the network performance. This enables our model to make vivid predictions of secondary motion for 3D characters with arbitrary structures. Empirical experiments indicate that our method, without requiring ground truth data for model training, achieves comparable or even superior performance quantitatively and qualitatively compared to state-of-the-art data-driven approaches in the field.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Road network design, an important part of landscape modeling, is of great significance in autonomous driving, video game development, and disaster simulation. To date, this task remains labor-intensive, tedious, and time-consuming. Many improved techniques have been proposed during the last two decades; nevertheless, most state-of-the-art methods still encounter problems of intuitiveness, usefulness, and/or interactivity. Departing from conventional road design, this paper advocates an improved road modeling framework for automatic and interactive road production driven by geographical maps (including elevation, water, and vegetation maps). Our method integrates the capability of flexible image generation models with a powerful transformer architecture to produce a vectorized road network. We first construct a dataset that includes road graphs, density maps, and their corresponding geographical maps. Second, we develop a density map generation network based on an image translation model with an attention mechanism to predict a road density map; the density map facilitates faster convergence and better performance, and also serves as the input for road graph generation. Third, we employ a transformer architecture to convert density maps into road graphs. Comprehensive experimental results verify the efficiency, robustness, and applicability of our newly proposed framework for road design.
{"title":"A novel transformer-based graph generation model for vectorized road design","authors":"Peichi Zhou, Chen Li, Jian Zhang, Changbo Wang, Hong Qin, Long Liu","doi":"10.1002/cav.2267","DOIUrl":"https://doi.org/10.1002/cav.2267","url":null,"abstract":"<p>Road network design, as an important part of landscape modeling, shows a great significance in automatic driving, video game development, and disaster simulation. To date, this task remains labor-intensive, tedious and time-consuming. Many improved techniques have been proposed during the last two decades. Nevertheless, most of the state-of-the-art methods still encounter problems of intuitiveness, usefulness and/or interactivity. As a rapid deviation from the conventional road design, this paper advocates an improved road modeling framework for automatic and interactive road production driven by geographical maps (including elevation, water, vegetation maps). Our method integrates the capability of flexible image generation models with powerful transformer architecture to afford a vectorized road network. We firstly construct a dataset that includes road graphs, density map and their corresponding geographical maps. Secondly, we develop a density map generation network based on image translation model with an attention mechanism to predict a road density map. The usage of density map facilitates faster convergence and better performance, which also serves as the input for road graph generation. Thirdly, we employ the transformer architecture to evolve density maps to road graphs. Our comprehensive experimental results have verified the efficiency, robustness and applicability of our newly-proposed framework for road design.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image inpainting is a long-standing problem in computer vision. Since the advent of deep learning, image inpainting has advanced steadily alongside convolutional neural networks and generative adversarial networks, and has been extended to settings such as guided image filling and inpainting with various masking schemes; the related field of image out-painting has also been pioneered. Meanwhile, following the recent introduction of the vision transformer, various computer vision problems have been revisited with it. In this paper, we tackle generalized image inpainting with a vision transformer: filling missing regions with plausible content regardless of whether those regions lie inside or outside the image, and without any guidance. To that end, the inpainting problem is formulated as dropping image content in patch units, which suits the vision transformer, and we solve it with a simple network structure created by slightly modifying the vision transformer to fit the problem. We name this network PIPformers. PIPformers achieves better PSNR, RMSE, and SSIM values than previous methods.
{"title":"PIPformers: Patch based inpainting with vision transformers for generalize paintings","authors":"Jeyoung Lee, Hochul Kang","doi":"10.1002/cav.2270","DOIUrl":"https://doi.org/10.1002/cav.2270","url":null,"abstract":"<p>Image inpainting is a field that has been traditionally attempted in the field of computer vision. After the development of deep learning, image inpainting has been advancing endlessly together with convolutional neural networks and generative adversarial networks. Thereafter, it has been expanded to various fields such as image filing through guiding and image inpainting using various masking. Furthermore, the field termed image out-painting has also been pioneered. Meanwhile, after the recent announcement of the vision transformer, various computer vision problems have been attempted using the vision transformer. In this paper, we are trying to solve the problem of image generalization painting using the vision transformer. This is an attempt to fill images with painting no matter whether the areas where painting is missing are in or out of the images, and without guiding. To that end, the painting problem was defined as a problem to drop images in patch units for easy use in the vision transformer. And we solved the problem with a simple network structure created by slightly modifying the vision transformer to fit the problem. We named this network PIPformers. PIPformers achieved better values than other papers compared to PSNR, RMSE and SSIM.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2270","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the field of acoustic simulation, widely applied and highly effective methods rely on accurately capturing the impulse response (IR) and its convolution relationship. This article introduces a novel approach, named UnderwaterImage2IR, that generates acoustic IRs from underwater images using dual-path pre-trained networks. This technique aims to achieve cross-modal conversion from underwater visual images to acoustic information with high accuracy at low cost. Our method integrates dual-path pre-trained networks with conditional generative adversarial networks (CGANs) to generate acoustic IRs that match the observed scenes. One branch of the network focuses on extracting spatial features from images, while the other is dedicated to recognizing underwater characteristics. These features are fed into the CGAN, which is trained to generate acoustic IRs corresponding to the observed scenes, thereby achieving high-accuracy acoustic simulation in an efficient manner. Experimental results, compared with the ground truth and evaluated by human experts, demonstrate the significant advantages of our method in generating underwater acoustic IRs, further proving its potential application in underwater acoustic simulation.
{"title":"UnderwaterImage2IR: Underwater impulse response generation via dual-path pre-trained networks and conditional generative adversarial networks","authors":"Yisheng Zhang, Shiguang Liu","doi":"10.1002/cav.2243","DOIUrl":"https://doi.org/10.1002/cav.2243","url":null,"abstract":"<p>In the field of acoustic simulation, methods that are widely applied and have been proven to be highly effective rely on accurately capturing the impulse response (IR) and its convolution relationship. This article introduces a novel approach, named as UnderwaterImage2IR, that generates acoustic IRs from underwater images using dual-path pre-trained networks. This technique aims to achieve cross-modal conversion from underwater visual images to acoustic information with high accuracy at a low cost. Our method utilizes deep learning technology by integrating dual-path pre-trained networks and conditional generative adversarial networks conditional generative adversarial networks (CGANs) to generate acoustic IRs that match the observed scenes. One branch of the network focuses on the extraction of spatial features from images, while the other is dedicated to recognizing underwater characteristics. These features are fed into the CGAN network, which is trained to generate acoustic IRs corresponding to the observed scenes, thereby achieving high-accuracy acoustic simulation in an efficient manner. Experimental results, compared with the ground truth and evaluated by human experts, demonstrate the significant advantages of our method in generating underwater acoustic IRs, further proving its potential application in underwater acoustic simulation.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional microfacet rendering models usually consider only the straight-line propagation of light and do not take the diffraction effect into account when calculating the radiance of outgoing light. However, ignoring the energy contributed by diffraction can lead to darker rendering results when the object's surface has many small details. To address this issue, we introduce a diffraction energy term in the microfacet model to compensate for the energy loss caused by diffraction. Starting from the Fresnel-Kirchhoff diffraction theorem, we combine it with the Cook-Torrance model. By incorporating the computed diffraction radiance into the outgoing radiance of the microfacet, we obtain a diffraction-compensated BRDF (bidirectional reflectance distribution function) model. Experimental results demonstrate that our proposed method has a significant effect in compensating for outgoing light and produces more realistic rendering results.
{"title":"Microfacet rendering with diffraction compensation","authors":"Xudong Yang, Aoran Lyu, Chuhua Xian, Hongmin Cai","doi":"10.1002/cav.2253","DOIUrl":"https://doi.org/10.1002/cav.2253","url":null,"abstract":"<p>The traditional microfacet rendering models usually only consider the straight propagation of light and do not take into account the diffraction effect when calculating the radiance of outgoing light. However, ignoring the energy generated by diffraction can lead to darker rendering results when the object's surface has many small details. To address this issue, we introduce a diffraction energy term in the microfacet model to compensate for the energy loss caused by diffraction. Starting from the Fresnel-Kirchhoff diffraction theorem, we combine it with the Cook-Torrance model. By incorporating the computed diffraction radiance into the outgoing radiance of the microfacet, we obtain a diffraction-compensated BRDF (Bidirectional Reflectance Distribution Function) model. Experimental results demonstrate that our proposed method has a significant effect in compensating for outgoing light and produces more realistic rendering results.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}