
Latest publications in Computer Animation and Virtual Worlds

De-NeRF: Ultra-high-definition NeRF with deformable net alignment
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-30 · DOI: 10.1002/cav.2240
Jianing Hou, Runjie Zhang, Zhongqi Wu, Weiliang Meng, Xiaopeng Zhang, Jianwei Guo

Neural Radiance Field (NeRF) can render complex 3D scenes with viewpoint-dependent effects. However, little work has explored its limitations in high-resolution environments, especially when upscaled to ultra-high resolution (e.g., 4K). Specifically, existing NeRF-based methods face severe limitations in reconstructing high-resolution real scenes, for example, a large number of parameters, misalignment of the input data, and over-smoothing of details. In this paper, we present a novel and effective framework, called De-NeRF, based on NeRF and a deformable convolutional network, to achieve high-fidelity view synthesis in ultra-high-resolution scenes: (1) incorporating a deformable convolution unit that resolves the misalignment of high-resolution input data; (2) presenting a density-based sparse-voxel approach that greatly reduces training time while rendering results with higher accuracy. Compared to existing high-resolution NeRF methods, our approach improves the rendering quality of high-frequency details and achieves better visual effects in 4K high-resolution scenes.
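The misalignment fix described above rests on deformable sampling: instead of reading a feature map at fixed grid positions, each output location reads from a position shifted by a learned offset. A minimal NumPy sketch of that core operation — the authors' network is not reproduced here, and `deformable_align` with externally supplied offsets is only illustrative (real deformable convolutions predict the offsets with a small conv branch):

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at fractional coordinates (y, x) with bilinear interpolation."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0
    y0c, x0c = np.clip(y0, 0, h - 1), np.clip(x0, 0, w - 1)
    y1c, x1c = np.clip(y0 + 1, 0, h - 1), np.clip(x0 + 1, 0, w - 1)
    top = (1 - wx) * img[y0c, x0c] + wx * img[y0c, x1c]
    bot = (1 - wx) * img[y1c, x0c] + wx * img[y1c, x1c]
    return (1 - wy) * top + wy * bot

def deformable_align(feat, offsets):
    """Resample a feature map at offset-shifted positions.

    offsets has shape (H, W, 2): a per-pixel (dy, dx). In a deformable
    convolution these offsets are learned; here they are just an input.
    """
    h, w = feat.shape
    out = np.empty_like(feat)
    for i in range(h):
        for j in range(w):
            dy, dx = offsets[i, j]
            out[i, j] = bilinear_sample(feat, i + dy, j + dx)
    return out
```

With zero offsets the operation is the identity; a constant offset of one pixel reproduces a shifted copy, which is exactly the behaviour that lets the network compensate misaligned high-resolution inputs.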

Citations: 0
Screen-space Streamline Seeding Method for Visualizing Unsteady Flow in Augmented Reality
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-30 · DOI: 10.1002/cav.2250
Hyunmo Kang, JungHyun Han

Streamlines are a popular choice in many flow visualization techniques due to their simplicity and intuitiveness. This paper presents a novel streamline seeding method tailored for visualizing unsteady flow in augmented reality (AR). Our method prioritizes the visible part of the flow field to enhance the quality of the flow representation and reduce the computational cost. Being an image-based method, it evenly samples 2D seeds from the screen space. Then, a ray is fired through each 2D seed, and the point on the ray with the largest entropy is selected as the 3D seed for a streamline. By advecting such 3D seeds in the velocity field, which is continuously updated in real time, the unsteady flow is visualized more naturally, and temporal coherence is achieved with no extra effort. Our method is tested using an AR application that visualizes airflow from a virtual air conditioner. Comparison with the baseline methods shows that our method is suitable for visualizing unsteady flow in AR.
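The seeding step can be sketched directly: fire a ray through each screen-space seed, probe the flow around a few sample points along the ray, and keep the sample whose local velocity directions have the highest Shannon entropy. A hedged NumPy illustration — the entropy estimator and the random probe scheme are simplified stand-ins, not the paper's exact formulation:

```python
import numpy as np

def direction_entropy(vectors, bins=8):
    """Shannon entropy (bits) of the in-plane direction distribution of velocities."""
    angles = np.arctan2(vectors[:, 1], vectors[:, 0])
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def pick_seed(ray_points, velocity_at, n_probe=16, radius=0.1, seed=0):
    """Return the point on the ray whose neighbourhood has the most varied flow."""
    rng = np.random.default_rng(seed)
    best, best_h = ray_points[0], -1.0
    for p in ray_points:
        probes = p + rng.uniform(-radius, radius, size=(n_probe, 3))
        vels = np.array([velocity_at(q) for q in probes])
        h = direction_entropy(vels)
        if h > best_h:
            best, best_h = p, h
    return np.asarray(best)
```

On a synthetic field that is uniform everywhere except for a small vortex, the ray point nearest the vortex wins, because a uniform neighbourhood has zero direction entropy.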

Citations: 0
PR3D: Precise and realistic 3D face reconstruction from a single image
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-30 · DOI: 10.1002/cav.2254
Zhangjin Huang, Xing Wu

Reconstructing the three-dimensional (3D) shape and texture of the face from a single image is a significant and challenging task in computer vision and graphics. In recent years, learning-based reconstruction methods have exhibited outstanding performance, but their effectiveness is severely constrained by the scarcity of available training data with 3D annotations. To address this issue, we present the PR3D (Precise and Realistic 3D face reconstruction) method, which consists of high-precision shape reconstruction based on semi-supervised learning and high-fidelity texture reconstruction based on StyleGAN2. In shape reconstruction, we use in-the-wild face images and 3D annotated datasets to train the auxiliary encoder and the identity encoder, encoding the input image into parameters of FLAME (a parametric 3D face model). Simultaneously, a novel semi-supervised hybrid landmark loss is designed to more effectively learn from in-the-wild face images and 3D annotated datasets. Furthermore, to meet the real-time requirements in practical applications, a lightweight shape reconstruction model called fast-PR3D is distilled through teacher–student learning. In texture reconstruction, we propose a texture extraction method based on face reenactment in StyleGAN2 style space, extracting texture from the source and reenacted face images to constitute a facial texture map. Extensive experiments have demonstrated the state-of-the-art performance of our method.
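The semi-supervised idea — always supervise with 2D landmarks, and add a 3D term only when a sample carries 3D annotations — can be sketched as follows. The pinhole `project`, the weights, and the squared-error form are illustrative assumptions, not the paper's exact hybrid landmark loss:

```python
import numpy as np

def project(points3d, f=500.0, c=(112.0, 112.0)):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels."""
    return f * points3d[:, :2] / points3d[:, 2:3] + np.array(c)

def hybrid_landmark_loss(pred3d, lm2d_gt, lm3d_gt=None, w2d=1.0, w3d=1.0):
    """2D reprojection error on every sample; 3D error only where labels exist."""
    loss = w2d * np.mean(np.sum((project(pred3d) - lm2d_gt) ** 2, axis=1))
    if lm3d_gt is not None:  # in-the-wild images simply pass lm3d_gt=None
        loss += w3d * np.mean(np.sum((pred3d - lm3d_gt) ** 2, axis=1))
    return float(loss)
```

In-the-wild images contribute only the first term, while 3D-annotated datasets contribute both, so one loss covers the mixed training set.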

Citations: 0
Design of a lightweight and easy-to-wear hand glove with multi-modal tactile perception for digital human
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-30 · DOI: 10.1002/cav.2258
Zhigeng Pan, Hongyi Ren, Chang Liu, Ming Chen, Mithun Mukherjee, Wenzhen Yang

Within the field of human–computer interaction, data gloves play an essential role in establishing a connection between virtual and physical environments for the realization of digital humans. To enhance the credibility of human–virtual hand interactions, we aim to develop a system incorporating data-glove-embedded technology. Our proposed system collects a wide range of information (temperature, bending, and pressure of the fingers) arising during natural interactions and afterwards reproduces it within the virtual environment. Furthermore, we implement a novel traversal polling technique to facilitate the streamlined aggregation of multi-channel sensors, which mitigates the hardware complexity of the embedded system. The experimental results indicate that the data glove acquires real-time hand interaction information with a high degree of precision and effectively displays hand posture in real time using Unity3D. The data glove's lightweight and compact design facilitates its versatile use in virtual reality interactions.
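The traversal polling idea — one loop visiting every sensor channel in a fixed order over a shared bus, rather than a dedicated line per sensor — might look like this. The channel layout and the `read_channel` callback are hypothetical, since the glove's firmware interface is not described in the abstract:

```python
from typing import Callable, Dict, List

class TraversalPoller:
    """Aggregate many sensor channels by polling them round-robin over one bus."""

    def __init__(self, read_channel: Callable[[str, int], float],
                 layout: Dict[str, int]):
        # read_channel(kind, idx) stands in for mux-select + ADC read on the MCU
        self.read_channel = read_channel
        self.layout = layout  # e.g. {"temperature": 5, "bend": 5, "pressure": 5}

    def poll_once(self) -> Dict[str, List[float]]:
        """One traversal: visit every (kind, index) pair in a fixed order."""
        return {kind: [self.read_channel(kind, i) for i in range(count)]
                for kind, count in self.layout.items()}
```

Because every sensor shares one read path, adding a modality only extends the layout table instead of adding wiring — the hardware-complexity saving the abstract mentions.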

Citations: 0
Soccer match broadcast video analysis method based on detection and tracking
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-29 · DOI: 10.1002/cav.2259
Hongyu Li, Meng Yang, Chao Yang, Jianglang Kang, Xiang Suo, Weiliang Meng, Zhen Li, Lijuan Mao, Bin Sheng, Jun Qi

We propose a comprehensive soccer match video analysis pipeline tailored for broadcast footage, which encompasses three pivotal stages: soccer field localization, player tracking, and soccer ball detection. First, we introduce sports camera calibration to seamlessly map soccer field images from match videos onto a standardized two-dimensional soccer field template. This addresses the challenge of consistent analysis across video frames amid continuous camera angle changes. Second, given challenges such as occlusions, high-speed movements, and dynamic camera perspectives, obtaining accurate position data for the players and the ball is non-trivial. To mitigate this, we curate a large-scale, high-precision soccer ball detection dataset and devise a robust detection model, which achieves an mAP50–95 of 80.9%. Additionally, we develop a high-speed, efficient, and lightweight tracking model to ensure precise player tracking. Through the integration of these modules, our pipeline focuses on real-time analysis of the current camera lens content during matches, facilitating rapid and accurate computation and analysis while offering intuitive visualizations.
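The field-localization stage amounts to estimating a homography from the broadcast frame to the 2D template; with four or more point correspondences (e.g., line intersections on the pitch), the standard direct linear transform recovers it. A sketch under that assumption — this is the classical DLT, not the paper's calibration model:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: H such that dst ~ H @ [x, y, 1] (4+ point pairs)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    h = vt[-1]                      # null-space vector = flattened H
    return (h / h[-1]).reshape(3, 3)

def to_template(H, pts):
    """Map (N, 2) image points into template coordinates."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

Once H is fitted per frame, every detected player or ball position can be pushed through `to_template`, giving template coordinates that stay consistent while the broadcast camera pans and zooms.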

Citations: 0
Graph-based control framework for motion propagation and pattern preservation in swarm flight simulations
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-29 · DOI: 10.1002/cav.2276
Feixiang Qi, Bojian Wang, Meili Wang

Simulation of swarm motion is a crucial research area in computer graphics and animation, widely used in applications such as biological behavior research, robotic swarm control, and the entertainment industry. In this paper, we address the challenge of preserving structural relations among the individuals in swarm flight simulations by proposing an innovative motion control framework that uses a graph-based hierarchy to describe patterns within a swarm and allows the swarm to perform flight motions along externally specified paths. In addition, this study designs motion propagation strategies with different focuses for varied application scenarios, analyzes the effects of information transfer latencies on pattern preservation under these strategies, and optimizes the control algorithms at the mathematical level. This study not only establishes a complete set of control methods for group flight simulations but also offers excellent scalability: it can be combined with other techniques in this field to provide new solutions for group behavior simulations.
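One way to picture the graph-based hierarchy and its latency analysis: treat the swarm as a tree rooted at a leader, give each member a fixed offset from its parent, and let each edge introduce one step of information-transfer delay. A toy NumPy sketch — the tree, the fixed offsets, and the one-step-per-edge delay are illustrative assumptions, not the paper's framework:

```python
import numpy as np

def propagate(parents, offsets, leader_path):
    """Propagate a leader trajectory down a tree with one step of delay per edge.

    parents: dict child -> parent (node 0 is the leader; children are numbered
             so that every parent is processed before its children)
    offsets: dict child -> fixed 2D offset relative to its parent
    leader_path: (T, 2) array of leader positions
    Returns dict node -> (T, 2) trajectory.
    """
    traj = {0: np.asarray(leader_path, dtype=float)}
    for node in sorted(parents):
        p = traj[parents[node]]
        delayed = np.vstack([p[:1], p[:-1]])      # parent's pose one step ago
        traj[node] = delayed + np.asarray(offsets[node], dtype=float)
    return traj
```

In a chain 0 → 1 → 2, node 2 trails the leader by two steps while offsets accumulate — exactly the kind of pattern distortion a latency analysis has to quantify and compensate.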

Citations: 0
SADNet: Generating immersive virtual reality avatars by real-time monocular pose estimation
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-29 · DOI: 10.1002/cav.2233
Ling Jiang, Yuan Xiong, Qianqian Wang, Tong Chen, Wei Wu, Zhong Zhou

Generating immersive virtual reality avatars is a challenging task in VR/AR applications, which map physical human body poses to avatars in virtual scenes for an immersive user experience. However, most existing work is time-consuming and limited by datasets, and thus does not satisfy the immersive and real-time requirements of VR systems. In this paper, we aim to generate 3D real-time virtual reality avatars from a monocular camera to solve these problems. Specifically, we first design a self-attention distillation network (SADNet) for effective human pose estimation, guided by a pre-trained teacher. Secondly, we propose a lightweight pose mapping method for human avatars that utilizes the camera model to map 2D poses to 3D avatar keypoints, generating real-time human avatars with pose consistency. Finally, we integrate our framework into a VR system, displaying the generated 3D pose-driven avatars on helmet-mounted display devices for an immersive user experience. We evaluate SADNet on two publicly available datasets. Experimental results show that SADNet achieves a state-of-the-art trade-off between speed and accuracy. In addition, we conducted a user experience study on the performance and immersion of virtual reality avatars. Results show that the pose-driven 3D human avatars generated by our method are smooth and attractive.
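The camera-model mapping from 2D poses to 3D avatar keypoints can be illustrated with pinhole back-projection: given intrinsics and a depth per joint, each pixel lifts to a unique camera-space point. A minimal sketch — the intrinsics and per-joint depths here are assumed inputs, not SADNet's learned quantities:

```python
import numpy as np

def project(pts3d, f=600.0, c=(320.0, 240.0)):
    """Pinhole projection: (N, 3) camera-space points -> (N, 2) pixels."""
    return pts3d[:, :2] * f / pts3d[:, 2:3] + np.array(c)

def backproject(kpts2d, depth, f=600.0, c=(320.0, 240.0)):
    """Lift (N, 2) pixel keypoints to (N, 3) camera-space points at depth(s)."""
    uv = np.asarray(kpts2d, dtype=float) - np.array(c)
    z = np.broadcast_to(np.reshape(depth, (-1, 1)).astype(float), (len(uv), 1))
    return np.hstack([uv * z / f, z])
```

`backproject(project(p), p[:, 2])` round-trips exactly, which is the consistency property that keeps the driven avatar's pose aligned with the estimated 2D pose.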

Citations: 0
S-LASSIE: Structure and smoothness enhanced learning from sparse image ensemble for 3D articulated shape reconstruction
IF 1.1 · CAS Tier 4 (Computer Science) · Q4 Computer Science, Software Engineering · Pub Date: 2024-05-29 · DOI: 10.1002/cav.2277
Jingze Feng, Chong He, Guorui Wang, Meili Wang

In computer vision, the task of 3D reconstruction from monocular sparse images poses significant challenges, particularly in the field of animal modelling. The diverse morphology of animals, their varied postures, and the variable conditions of image acquisition significantly complicate the task of accurately reconstructing their 3D shape and pose from a monocular image. To address these complexities, we propose S-LASSIE, a novel technique for 3D reconstruction of quadrupeds from monocular sparse images. It requires only 10–30 images of similar breeds for training. To effectively mitigate depth ambiguities inherent in monocular reconstructions, S-LASSIE employs a multi-angle projection loss function. In addition, our approach, which involves fusion and smoothing of bone structures, resolves issues related to disjointed topological structures and uneven connections at junctions, resulting in 3D models with comprehensive topologies and improved visual fidelity. Our extensive experiments on the Pascal-Part and LASSIE datasets demonstrate significant improvements in keypoint transfer, overall 2D IOU and visual quality, with an average keypoint transfer and overall 2D IOU of 59.6% and 86.3%, respectively, which are superior to existing techniques in the field.
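A multi-angle projection loss can be pictured as rendering the predicted keypoints from several camera angles and penalising the 2D error in each: a depth error that one view barely sees becomes a lateral error in another, which is how extra views mitigate monocular depth ambiguity. A hedged sketch — the viewpoints, camera model, and averaging are illustrative, not the paper's exact loss:

```python
import numpy as np

def rot_y(theta):
    """Rotation about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(pts, R, t, f=500.0):
    """Pinhole projection after a rigid camera transform."""
    cam = pts @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3]

def multi_angle_loss(pred3d, gt3d, angles=(0.0, np.pi / 6, -np.pi / 6)):
    """Mean squared 2D reprojection error over several synthetic viewpoints."""
    t = np.array([0.0, 0.0, 5.0])  # cameras orbit the object at distance 5
    loss = 0.0
    for a in angles:
        R = rot_y(a)
        d = project(pred3d, R, t) - project(gt3d, R, t)
        loss += np.mean(np.sum(d ** 2, axis=1))
    return float(loss / len(angles))
```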

Citations: 0
Face attribute translation with multiple feature perceptual reconstruction assisted by style translator
IF 1.1 CAS Tier 4 (Computer Science) Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-05-29 DOI: 10.1002/cav.2273
Shuqi Zhu, Jiuzhen Liang, Hao Liu

Improving the accuracy and disentanglement of attribute translation, and maintaining the consistency of face identity have been hot topics in face attribute translation. Recent approaches employ attention mechanisms to enable attribute translation in facial images. However, due to the lack of accuracy in the extraction of style code, the attention mechanism alone is not precise enough for the translation of attributes. To tackle this, we introduce a style translator module, which partitions the style code into attribute-related and unrelated components, enhancing latent space disentanglement for more accurate attribute manipulation. Additionally, many current methods use per-pixel loss functions to preserve face identity. However, this can sacrifice crucial high-level features and textures in the target image. To address this limitation, we propose a multiple-perceptual reconstruction loss to better maintain image fidelity. Extensive qualitative and quantitative experiments in this article demonstrate significant improvements over state-of-the-art methods, validating the effectiveness of our approach.
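The multiple-perceptual reconstruction idea — comparing images at several feature levels rather than pixel-by-pixel only — can be sketched with toy average-pooled "features". A real implementation would compare feature maps from a pretrained network, so everything below (function names, pooling factors) is illustrative:

```python
def downsample(img, factor):
    # Average-pool a 2D grayscale image (list of rows) by `factor`.
    h, w = len(img), len(img[0])
    return [[sum(img[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
             / factor ** 2
             for x in range(0, w - factor + 1, factor)]
            for y in range(0, h - factor + 1, factor)]

def multi_level_perceptual_loss(a, b, factors=(1, 2, 4)):
    # Compare the two images at several resolutions and average the
    # per-level mean absolute differences. Coarse levels respond to
    # structure that a single per-pixel term would treat the same as
    # high-frequency noise.
    total = 0.0
    for f in factors:
        fa, fb = downsample(a, f), downsample(b, f)
        n = len(fa) * len(fa[0])
        total += sum(abs(x - y) for ra, rb in zip(fa, fb)
                     for x, y in zip(ra, rb)) / n
    return total / len(factors)
```

Identical images score zero at every level; mismatched images are penalized at every scale, which is the sense in which a multi-level loss preserves high-level features that a purely per-pixel loss can sacrifice.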

{"title":"Face attribute translation with multiple feature perceptual reconstruction assisted by style translator","authors":"Shuqi Zhu,&nbsp;Jiuzhen Liang,&nbsp;Hao Liu","doi":"10.1002/cav.2273","DOIUrl":"https://doi.org/10.1002/cav.2273","url":null,"abstract":"<p>Improving the accuracy and disentanglement of attribute translation, and maintaining the consistency of face identity have been hot topics in face attribute translation. Recent approaches employ attention mechanisms to enable attribute translation in facial images. However, due to the lack of accuracy in the extraction of style code, the attention mechanism alone is not precise enough for the translation of attributes. To tackle this, we introduce a style translator module, which partitions the style code into attribute-related and unrelated components, enhancing latent space disentanglement for more accurate attribute manipulation. Additionally, many current methods use per-pixel loss functions to preserve face identity. However, this can sacrifice crucial high-level features and textures in the target image. To address this limitation, we propose a multiple-perceptual reconstruction loss to better maintain image fidelity. Extensive qualitative and quantitative experiments in this article demonstrate significant improvements over state-of-the-art methods, validating the effectiveness of our approach.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
KDPM: Knowledge-driven dynamic perception model for evacuation scene simulation
IF 1.1 CAS Tier 4 (Computer Science) Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-05-29 DOI: 10.1002/cav.2279
Kecheng Tang, Jiawen Zhang, Yuji Shen, Chen Li, Gaoqi He

Evacuation scene simulation has become an important approach for public safety decision-making. Although existing research has considered various factors, including social forces and panic emotions, how complex environmental factors affect human psychology and behavior remains underexplored. The main idea of this paper is to model complex evacuation environmental factors from the perspective of knowledge and to explore pedestrians' emergency response mechanisms to this knowledge. Thus, this paper proposes a knowledge-driven dynamic perception model (KDPM) for evacuation scene simulation. The model combines three modules: knowledge dissemination, dynamic scene perception, and stress response. Both scenario knowledge and hazard-source knowledge are extracted and expressed. An improved intelligent-agent perception model is designed based on position determination. Moreover, a general adaptation syndrome (GAS) model is presented for the first time by introducing a modified stress-system model. Experimental results show that the proposed model agrees more closely with real data sets.
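The abstract does not give the stress-response equations, but one plausible reading of a GAS-style stress update is a leaky integrator: stress accumulates under a perceived hazard stimulus and decays toward baseline once the stimulus is removed. All parameters below are invented for illustration, not taken from the paper:

```python
def step_stress(stress, stimulus, dt=0.1, gain=2.0, decay=0.8):
    # One Euler step of a leaky-integrator stress response: stress rises
    # in proportion to the perceived hazard stimulus and relaxes toward
    # baseline when the stimulus disappears; output is clamped to [0, 1].
    stress += dt * (gain * stimulus - decay * stress)
    return min(max(stress, 0.0), 1.0)
```

Driving the update with a sustained stimulus saturates stress near its ceiling (roughly the alarm/resistance phases of GAS), while removing the stimulus yields exponential recovery.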

{"title":"KDPM: Knowledge-driven dynamic perception model for evacuation scene simulation","authors":"Kecheng Tang,&nbsp;Jiawen Zhang,&nbsp;Yuji Shen,&nbsp;Chen Li,&nbsp;Gaoqi He","doi":"10.1002/cav.2279","DOIUrl":"https://doi.org/10.1002/cav.2279","url":null,"abstract":"<p>Evacuation scene simulation has become one important approach for public safety decision-making. Although existing research has considered various factors, including social forces, panic emotions, and so forth, there is a lack of consideration of how complex environmental factors affect human psychology and behavior. The main idea of this paper is to model complex evacuation environmental factors from the perspective of knowledge and explore pedestrians' emergency response mechanisms to this knowledge. Thus, a knowledge-driven dynamic perception model (KDPM) for evacuation scene simulation is proposed in this paper. This model combines three modules: knowledge dissemination, dynamic scene perception, and stress response. Both scenario knowledge and hazard source knowledge are extracted and expressed. The improved intelligent agent perception model is designed by adopting position determination. Moreover, a general adaptation syndrome (GAS) model is first presented by introducing a modified stress system model. Experimental results show that the proposed model is closer to the reality of real data sets.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal: Computer Animation and Virtual Worlds