Object Motion Guided Human Motion Synthesis

Jiaman Li, Jiajun Wu, C. K. Liu
{"title":"Object Motion Guided Human Motion Synthesis","authors":"Jiaman Li, Jiajun Wu, C. K. Liu","doi":"10.1145/3618333","DOIUrl":null,"url":null,"abstract":"Modeling human behaviors in contextual environments has a wide range of applications in character animation, embodied AI, VR/AR, and robotics. In real-world scenarios, humans frequently interact with the environment and manipulate various objects to complete daily tasks. In this work, we study the problem of full-body human motion synthesis for the manipulation of large-sized objects. We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework that can generate full-body manipulation behaviors from only the object motion. Since naively applying diffusion models fails to precisely enforce contact constraints between the hands and the object, OMOMO learns two separate denoising processes to first predict hand positions from object motion and subsequently synthesize full-body poses based on the predicted hand positions. By employing the hand positions as an intermediate representation between the two denoising processes, we can explicitly enforce contact constraints, resulting in more physically plausible manipulation motions. With the learned model, we develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated. Through extensive experiments, we demonstrate the effectiveness of our proposed pipeline and its ability to generalize to unseen objects. Additionally, as high-quality human-object interaction datasets are scarce, we collect a large-scale dataset consisting of 3D object geometry, object motion, and human motion. Our dataset contains human-object interaction motion for 15 objects, with a total duration of approximately 10 hours.","PeriodicalId":7077,"journal":{"name":"ACM Transactions on Graphics (TOG)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics (TOG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3618333","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Modeling human behaviors in contextual environments has a wide range of applications in character animation, embodied AI, VR/AR, and robotics. In real-world scenarios, humans frequently interact with the environment and manipulate various objects to complete daily tasks. In this work, we study the problem of full-body human motion synthesis for the manipulation of large-sized objects. We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework that can generate full-body manipulation behaviors from only the object motion. Since naively applying diffusion models fails to precisely enforce contact constraints between the hands and the object, OMOMO learns two separate denoising processes to first predict hand positions from object motion and subsequently synthesize full-body poses based on the predicted hand positions. By employing the hand positions as an intermediate representation between the two denoising processes, we can explicitly enforce contact constraints, resulting in more physically plausible manipulation motions. With the learned model, we develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated. Through extensive experiments, we demonstrate the effectiveness of our proposed pipeline and its ability to generalize to unseen objects. Additionally, as high-quality human-object interaction datasets are scarce, we collect a large-scale dataset consisting of 3D object geometry, object motion, and human motion. Our dataset contains human-object interaction motion for 15 objects, with a total duration of approximately 10 hours.
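To make the two-stage design concrete, the sketch below shows a hypothetical cascaded pipeline in the spirit of OMOMO: one denoiser conditioned on object motion predicts hand positions, and a second denoiser conditioned on those predicted hand positions produces the full-body pose, with contact constraints applicable at the intermediate hand representation. This is a minimal illustrative assumption written in PyTorch, not the authors' implementation; the module names, feature dimensions, MLP denoisers, and simplified sampling loop are all placeholders.

```python
# Illustrative sketch only (not the paper's released code): a two-stage
# conditional pipeline where stage 1 maps object motion to hand positions
# and stage 2 maps hand positions to a full-body pose.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy conditional denoiser: predicts the clean signal from a noisy input and a condition."""
    def __init__(self, x_dim, cond_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_noisy, cond, t):
        # t is the diffusion timestep, used here as a single scalar feature per sample
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x_noisy, cond, t_feat], dim=-1))

def sample(denoiser, cond, x_dim, steps=50):
    """Highly simplified ancestral sampling loop (stand-in for a real diffusion sampler)."""
    x = torch.randn(cond.shape[0], x_dim)
    for step in reversed(range(steps)):
        t = torch.full((cond.shape[0],), step)
        x = denoiser(x, cond, t)                 # predict the clean signal
        if step > 0:
            x = x + 0.05 * torch.randn_like(x)   # re-inject a small amount of noise
    return x

# Assumed feature sizes: object motion (9), two 3D hand positions (6), body pose (66)
obj_dim, hand_dim, body_dim = 9, 6, 66
stage1 = Denoiser(hand_dim, obj_dim)   # object motion -> hand positions
stage2 = Denoiser(body_dim, hand_dim)  # hand positions -> full-body pose

object_motion = torch.randn(4, obj_dim)          # batch of per-frame object-motion features
hands = sample(stage1, object_motion, hand_dim)
# Contact constraints could be enforced here, e.g. by projecting the predicted
# hand positions onto the object surface before conditioning stage 2.
body = sample(stage2, hands, body_dim)
print(hands.shape, body.shape)
```

The key design point this sketch mirrors is that hand positions serve as an explicit intermediate representation between the two denoising processes, which is what allows contact constraints to be imposed directly rather than hoping a single end-to-end model satisfies them.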