SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data
Jose Luis Ponton, Haoran Yun, Andreas Aristidou, Carlos Andujar, Nuria Pelechano
ACM Transactions on Graphics · Published 2023-10-31 · DOI: https://doi.org/10.1145/3625264
Abstract
Accurate and reliable human motion reconstruction is crucial for creating natural interactions of full-body avatars in Virtual Reality (VR) and entertainment applications. As the Metaverse and social applications gain popularity, users are seeking cost-effective solutions to create full-body animations that are comparable in quality to those produced by commercial motion capture systems. To provide affordable solutions, however, it is important to minimize the number of sensors attached to the subject's body. Unfortunately, reconstructing the full-body pose from sparse data is a heavily under-determined problem. Approaches that use IMU sensors face challenges in reconstructing the pose due to positional drift and pose ambiguity. In recent years, some mainstream VR systems have released 6-degree-of-freedom (6-DoF) tracking devices that provide both positional and rotational information. Nevertheless, most solutions for reconstructing full-body poses from these devices rely on traditional inverse kinematics (IK) solvers, which often produce discontinuous and unnatural poses. In this article, we introduce SparsePoser, a novel deep learning-based solution for reconstructing a full-body pose from a reduced set of six tracking devices. Our system incorporates a convolution-based autoencoder that synthesizes high-quality continuous human poses by learning the human motion manifold from motion capture data. Then, we employ a learned IK component, composed of multiple lightweight feed-forward neural networks, to adjust the hands and feet toward the corresponding trackers. We extensively evaluate our method on publicly available motion capture datasets and with real-time live demos. We show that our method outperforms state-of-the-art techniques using IMU sensors or 6-DoF tracking devices, and can be used by users with different body dimensions and proportions.
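To make the two-stage pipeline described above more concrete, the following is a minimal, hypothetical PyTorch sketch of that structure: a temporal convolutional autoencoder mapping windows of six-tracker signals to full-body poses, followed by small feed-forward networks that refine end-effector limbs toward their trackers. All tensor shapes, layer sizes, joint counts, and rotation encodings here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch (not the authors' code): a 1D-convolutional autoencoder over
# windows of sparse tracker signals, plus small per-limb feed-forward "IK" networks
# that refine hand/foot joints toward their trackers. Sizes are assumptions only.
import torch
import torch.nn as nn

N_TRACKERS = 6          # head, pelvis, two hands, two feet (6-DoF each)
TRACKER_DIM = 9         # e.g., 3D position + 6D rotation per tracker (assumed encoding)
N_JOINTS = 22           # assumed skeleton size
JOINT_DIM = 6           # assumed 6D rotation representation per joint

class MotionAutoencoder(nn.Module):
    """Temporal conv autoencoder: sparse tracker windows -> full-body pose windows."""
    def __init__(self, latent=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(N_TRACKERS * TRACKER_DIM, latent, kernel_size=5, padding=2),
            nn.ELU(),
            nn.Conv1d(latent, latent, kernel_size=5, padding=2),
            nn.ELU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(latent, latent, kernel_size=5, padding=2),
            nn.ELU(),
            nn.Conv1d(latent, N_JOINTS * JOINT_DIM, kernel_size=5, padding=2),
        )

    def forward(self, trackers):              # trackers: (batch, frames, N_TRACKERS * TRACKER_DIM)
        x = trackers.transpose(1, 2)           # convolve over the time axis
        pose = self.decoder(self.encoder(x))   # (batch, N_JOINTS * JOINT_DIM, frames)
        return pose.transpose(1, 2)

class LimbIKNet(nn.Module):
    """Lightweight feed-forward refiner for one limb (e.g., a hand or a foot)."""
    def __init__(self, n_limb_joints=4):
        super().__init__()
        in_dim = n_limb_joints * JOINT_DIM + TRACKER_DIM  # current limb pose + its tracker
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ELU(),
            nn.Linear(128, n_limb_joints * JOINT_DIM),
        )

    def forward(self, limb_pose, tracker):     # per-frame refinement toward the tracker
        delta = self.net(torch.cat([limb_pose, tracker], dim=-1))
        return limb_pose + delta                # residual correction of limb joint rotations
```

In a real-time setting, overlapping windows of tracker signals would be pushed through the autoencoder and the per-limb refiners applied frame by frame before retargeting to the user's skeleton; again, this only mirrors the structure described in the abstract, not the released system.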
About the Journal
ACM Transactions on Graphics (TOG) is a peer-reviewed scientific journal that aims to disseminate the latest findings of note in the field of computer graphics. It has been published since 1982 by the Association for Computing Machinery. Starting in 2003, all papers accepted for presentation at the annual SIGGRAPH conference are printed in a special summer issue of the journal.