SADNet: Generating immersive virtual reality avatars by real-time monocular pose estimation

IF 0.9 | Zone 4 (Computer Science) | JCR Q4, COMPUTER SCIENCE, SOFTWARE ENGINEERING | Computer Animation and Virtual Worlds | Pub Date: 2024-05-29 | DOI: 10.1002/cav.2233

Ling Jiang, Yuan Xiong, Qianqian Wang, Tong Chen, Wei Wu, Zhong Zhou

Volume 35, Issue 3 | Journal Article | Publisher page: https://onlinelibrary.wiley.com/doi/10.1002/cav.2233

Citations: 0

Abstract


Generating immersive virtual reality avatars, which maps a user's physical body pose onto an avatar in a virtual scene, is a challenging task in VR/AR applications. However, most existing methods are time-consuming and limited by their datasets, so they do not meet the immersion and real-time requirements of VR systems. In this paper, we address these problems by generating real-time 3D virtual reality avatars from a monocular camera. Specifically, we first design a self-attention distillation network (SADNet), guided by a pre-trained teacher, for effective human pose estimation. Second, we propose a lightweight pose mapping method that uses the camera model to map 2D poses to 3D avatar keypoints, producing real-time human avatars with consistent poses. Finally, we integrate the framework into a VR system that displays the generated 3D pose-driven avatars on helmet-mounted display (HMD) devices for an immersive user experience. We evaluate SADNet on two publicly available datasets; experimental results show that it achieves a state-of-the-art trade-off between speed and accuracy. In addition, we conducted a user study on the performance and immersion of the virtual reality avatars, and the results show that the pose-driven 3D human avatars generated by our method are smooth and attractive.
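The abstract does not spell out how the pre-trained teacher guides SADNet. As a minimal sketch, assuming a heatmap-based student that is trained against both ground-truth heatmaps and the teacher's predicted heatmaps, a common knowledge-distillation objective blends the two mean-squared errors. The function names `mse` and `distillation_loss` and the weight `alpha` are hypothetical, and the self-attention backbone itself is not modeled here.

```python
from typing import List

Heatmap = List[List[float]]  # one H x W heatmap per keypoint (assumed representation)


def mse(a: List[Heatmap], b: List[Heatmap]) -> float:
    """Mean squared error over all keypoints and pixels; shapes are assumed to match."""
    total, count = 0.0, 0
    for ha, hb in zip(a, b):
        for row_a, row_b in zip(ha, hb):
            for va, vb in zip(row_a, row_b):
                total += (va - vb) ** 2
                count += 1
    if count == 0:
        raise ValueError("heatmaps must be non-empty")
    return total / count


def distillation_loss(student: List[Heatmap],
                      teacher: List[Heatmap],
                      ground_truth: List[Heatmap],
                      alpha: float = 0.5) -> float:
    """Hypothetical blend: alpha * supervised loss + (1 - alpha) * teacher-matching loss."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    supervised = mse(student, ground_truth)  # fit the labeled keypoints
    imitation = mse(student, teacher)        # stay close to the teacher's heatmaps
    return alpha * supervised + (1.0 - alpha) * imitation
```

With `alpha = 1.0` the student learns from ground truth alone; lowering `alpha` pulls its heatmaps toward the teacher's, which is the usual way a small real-time network inherits accuracy from a larger one.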
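The abstract states that the pose mapping uses the camera model to lift 2D poses to 3D avatar keypoints but gives no further detail. The sketch below assumes a standard pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a single assumed depth shared by the whole body, and inverts the projection u = fx * X / Z + cx, v = fy * Y / Z + cy. The names `Intrinsics`, `back_project` and `map_pose_to_3d` are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def back_project(u: float, v: float, depth: float,
                 k: Intrinsics) -> Tuple[float, float, float]:
    """Invert the pinhole projection for a pixel (u, v) at a given depth Z."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def map_pose_to_3d(keypoints_2d: Dict[str, Tuple[float, float]],
                   depth: float,
                   k: Intrinsics) -> Dict[str, Tuple[float, float, float]]:
    """Lift every 2D keypoint to camera space, assuming one shared body depth."""
    return {name: back_project(u, v, depth, k)
            for name, (u, v) in keypoints_2d.items()}
```

For example, with `k = Intrinsics(500, 500, 320, 240)` and a depth of 2.0, a head keypoint at pixel (320, 140) maps to (0.0, -0.4, 2.0). A single monocular image does not give per-joint depth directly, which is why a sketch like this needs a shared-depth assumption or a separate depth estimate.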


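Source journal

Computer Animation and Virtual Worlds (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: (not given)
Review time: 6-12 weeks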

Journal description: With the advent of very powerful PCs and high-end graphics cards, there has been incredible development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only the beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.