Object Motion Guided Human Motion Synthesis

Jiaman Li, Jiajun Wu, C. K. Liu
{"title":"Object Motion Guided Human Motion Synthesis","authors":"Jiaman Li, Jiajun Wu, C. K. Liu","doi":"10.1145/3618333","DOIUrl":null,"url":null,"abstract":"Modeling human behaviors in contextual environments has a wide range of applications in character animation, embodied AI, VR/AR, and robotics. In real-world scenarios, humans frequently interact with the environment and manipulate various objects to complete daily tasks. In this work, we study the problem of full-body human motion synthesis for the manipulation of large-sized objects. We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework that can generate full-body manipulation behaviors from only the object motion. Since naively applying diffusion models fails to precisely enforce contact constraints between the hands and the object, OMOMO learns two separate denoising processes to first predict hand positions from object motion and subsequently synthesize full-body poses based on the predicted hand positions. By employing the hand positions as an intermediate representation between the two denoising processes, we can explicitly enforce contact constraints, resulting in more physically plausible manipulation motions. With the learned model, we develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated. Through extensive experiments, we demonstrate the effectiveness of our proposed pipeline and its ability to generalize to unseen objects. Additionally, as high-quality human-object interaction datasets are scarce, we collect a large-scale dataset consisting of 3D object geometry, object motion, and human motion. Our dataset contains human-object interaction motion for 15 objects, with a total duration of approximately 10 hours.","PeriodicalId":7077,"journal":{"name":"ACM Transactions on Graphics (TOG)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics (TOG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3618333","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Modeling human behaviors in contextual environments has a wide range of applications in character animation, embodied AI, VR/AR, and robotics. In real-world scenarios, humans frequently interact with the environment and manipulate various objects to complete daily tasks. In this work, we study the problem of full-body human motion synthesis for the manipulation of large-sized objects. We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework that can generate full-body manipulation behaviors from only the object motion. Since naively applying diffusion models fails to precisely enforce contact constraints between the hands and the object, OMOMO learns two separate denoising processes to first predict hand positions from object motion and subsequently synthesize full-body poses based on the predicted hand positions. By employing the hand positions as an intermediate representation between the two denoising processes, we can explicitly enforce contact constraints, resulting in more physically plausible manipulation motions. With the learned model, we develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated. Through extensive experiments, we demonstrate the effectiveness of our proposed pipeline and its ability to generalize to unseen objects. Additionally, as high-quality human-object interaction datasets are scarce, we collect a large-scale dataset consisting of 3D object geometry, object motion, and human motion. Our dataset contains human-object interaction motion for 15 objects, with a total duration of approximately 10 hours.
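To make the two-stage design concrete, the sketch below shows a hypothetical cascaded pipeline in the spirit of OMOMO: one denoiser conditioned on object motion predicts hand positions, and a second denoiser conditioned on those predicted hand positions produces the full-body pose, with contact constraints applicable at the intermediate hand representation. This is a minimal illustrative assumption written in PyTorch, not the authors' implementation; the module names, feature dimensions, MLP denoisers, and simplified sampling loop are all placeholders.

```python
# Illustrative sketch only (not the paper's released code): a two-stage
# conditional pipeline where stage 1 maps object motion to hand positions
# and stage 2 maps hand positions to a full-body pose.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy conditional denoiser: predicts the clean signal from a noisy input and a condition."""
    def __init__(self, x_dim, cond_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_noisy, cond, t):
        # t is the diffusion timestep, used here as a single scalar feature per sample
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x_noisy, cond, t_feat], dim=-1))

def sample(denoiser, cond, x_dim, steps=50):
    """Highly simplified ancestral sampling loop (stand-in for a real diffusion sampler)."""
    x = torch.randn(cond.shape[0], x_dim)
    for step in reversed(range(steps)):
        t = torch.full((cond.shape[0],), step)
        x = denoiser(x, cond, t)                 # predict the clean signal
        if step > 0:
            x = x + 0.05 * torch.randn_like(x)   # re-inject a small amount of noise
    return x

# Assumed feature sizes: object motion (9), two 3D hand positions (6), body pose (66)
obj_dim, hand_dim, body_dim = 9, 6, 66
stage1 = Denoiser(hand_dim, obj_dim)   # object motion -> hand positions
stage2 = Denoiser(body_dim, hand_dim)  # hand positions -> full-body pose

object_motion = torch.randn(4, obj_dim)          # batch of per-frame object-motion features
hands = sample(stage1, object_motion, hand_dim)
# Contact constraints could be enforced here, e.g. by projecting the predicted
# hand positions onto the object surface before conditioning stage 2.
body = sample(stage2, hands, body_dim)
print(hands.shape, body.shape)
```

The key design point this sketch mirrors is that hand positions serve as an explicit intermediate representation between the two denoising processes, which is what allows contact constraints to be imposed directly rather than hoping a single end-to-end model satisfies them.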