D&D: Learning Human Dynamics from Dynamic Camera

Computer Vision - ECCV ...: ... European Conference on Computer Vision: Proceedings. Publication date: 2022-09-19. DOI: 10.48550/arXiv.2209.08790

Jiefeng Li, Siyuan Bian, Chaoshun Xu, Gang Liu, Gang Yu, Cewu Lu

Pages: 479-496

Citations: 17

Abstract

3D human pose estimation from monocular video has recently improved significantly. However, most state-of-the-art methods are kinematics-based and are therefore prone to physically implausible motions with pronounced artifacts. Current dynamics-based methods can predict physically plausible motion but are restricted to simple scenarios with a static camera. In this work, we present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from in-the-wild videos captured by a moving camera. D&D introduces inertial force control (IFC), which explains 3D human motion in the non-inertial local frame by accounting for the inertial forces induced by the dynamic camera. To learn ground contact from limited annotations, we develop probabilistic contact torque (PCT), which is computed by differentiable sampling from contact probabilities and used to generate motions. The contact state can thus be weakly supervised by encouraging the model to generate correct motions. Furthermore, we propose an attentive PD controller that adjusts target pose states using temporal information to obtain smooth and accurate pose control. Our approach is entirely neural and runs without offline optimization or simulation in physics engines. Experiments on large-scale 3D human motion benchmarks demonstrate the effectiveness of D&D, which outperforms both state-of-the-art kinematics-based and dynamics-based methods. Code is available at https://github.com/Jeffsjtu/DnD
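IFC rests on the fact that a camera-attached frame is non-inertial: Newton's second law only holds there once fictitious forces are added. As a rough illustration (not the authors' implementation), the sketch below evaluates the standard fictitious-force terms (translational, Euler, Coriolis, centrifugal) for a point mass, with all vectors expressed in camera coordinates. The function and argument names are hypothetical, and how D&D estimates the camera's acceleration and rotation or feeds these forces into its network is not shown.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def scale(k: float, a: Vec3) -> Vec3:
    """Multiply a vector by a scalar."""
    return (k * a[0], k * a[1], k * a[2])


def add(*vs: Vec3) -> Vec3:
    """Component-wise sum of any number of vectors."""
    return (sum(v[0] for v in vs), sum(v[1] for v in vs), sum(v[2] for v in vs))


def inertial_force(
    mass: float,
    r: Vec3,        # position of the point mass in the camera frame
    v: Vec3,        # velocity of the point mass relative to the camera frame
    a_frame: Vec3,  # linear acceleration of the camera-frame origin
    omega: Vec3,    # angular velocity of the camera frame
    alpha: Vec3,    # angular acceleration of the camera frame
) -> Vec3:
    """Total fictitious force on a point mass in a non-inertial frame.

    F = -m*a_frame - m*(alpha x r) - 2m*(omega x v) - m*(omega x (omega x r))
    """
    translational = scale(-mass, a_frame)
    euler = scale(-mass, cross(alpha, r))
    coriolis = scale(-2.0 * mass, cross(omega, v))
    centrifugal = scale(-mass, cross(omega, cross(omega, r)))
    return add(translational, euler, coriolis, centrifugal)
```

For example, a 70 kg body filmed by a camera accelerating at 2 m/s² along +x, with no rotation, appears to be pushed by a force of 140 N along -x, which a physics-based model must explain even though no real force acts.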
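The abstract does not say which differentiable sampler PCT uses. The sketch below assumes a binary Concrete (Gumbel-sigmoid) relaxation, a common reparameterization for Bernoulli variables, and weights the torque r × f of each candidate contact point by its relaxed contact sample. Inputs, names, and the default temperature are hypothetical; the code is plain Python, so it only shows the reparameterized form through which an autodiff framework would propagate gradients back to the contact probabilities.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def stable_sigmoid(x: float) -> float:
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relaxed_bernoulli(p: float, temperature: float, u: float) -> float:
    """Binary Concrete (Gumbel-sigmoid) sample in [0, 1].

    u ~ Uniform(0, 1) carries all the randomness, so the output is a smooth
    function of p (reparameterization). As temperature -> 0 the sample
    approaches a hard 0/1 contact decision that equals 1 with probability p.
    """
    eps = 1e-6
    p = min(max(p, eps), 1.0 - eps)
    u = min(max(u, eps), 1.0 - eps)
    logit = math.log(p) - math.log(1.0 - p) + math.log(u) - math.log(1.0 - u)
    return stable_sigmoid(logit / temperature)


def probabilistic_contact_torque(
    contact_probs: Sequence[float],
    contact_points: Sequence[Vec3],  # lever arm from the body root to each contact point
    contact_forces: Sequence[Vec3],  # candidate ground-reaction force at each point
    temperature: float = 0.5,
    noise: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> Vec3:
    """Sum of r x f over contact points, each weighted by a relaxed contact sample."""
    rng = rng or random.Random()
    total = [0.0, 0.0, 0.0]
    for i, (p, r, f) in enumerate(zip(contact_probs, contact_points, contact_forces)):
        u = noise[i] if noise is not None else rng.random()
        s = relaxed_bernoulli(p, temperature, u)
        tau = cross(r, f)
        for k in range(3):
            total[k] += s * tau[k]
    return (total[0], total[1], total[2])
```

Because the contact decision enters only through a smooth sample, a loss on the generated motion can, in an autodiff setting, adjust the contact probabilities without per-frame contact labels, which matches the weak supervision the abstract describes.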
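An attentive PD controller can be pictured as two steps: blend a temporal window of candidate target poses into one target, then apply the PD law τ = k_p(q_target - q) - k_d q̇ to each degree of freedom. In this minimal sketch the attention scores are plain inputs standing in for whatever network D&D uses to produce them, and the gains are fixed; all names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a temporal window."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_target(
    window: Sequence[Sequence[float]],  # candidate target poses, one per frame
    scores: Sequence[float],            # hypothetical attention scores, one per frame
) -> List[float]:
    """Attention-weighted average of the candidate target poses."""
    weights = softmax(scores)
    n_dof = len(window[0])
    return [sum(w * pose[j] for w, pose in zip(weights, window)) for j in range(n_dof)]


def pd_torque(
    q: Sequence[float],
    q_dot: Sequence[float],
    q_target: Sequence[float],
    kp: Sequence[float],
    kd: Sequence[float],
) -> List[float]:
    """Per-DoF PD law: tau = kp * (q_target - q) - kd * q_dot."""
    return [
        kp_j * (t - x) - kd_j * v
        for x, v, t, kp_j, kd_j in zip(q, q_dot, q_target, kp, kd)
    ]


# Usage: two frames with equal scores give an evenly blended target.
target = attentive_target([[0.0, 1.0], [2.0, 3.0]], [0.0, 0.0])  # -> [1.0, 2.0]
tau = pd_torque([0.0, 0.0], [1.0, 0.0], target, [10.0, 10.0], [2.0, 2.0])  # -> [8.0, 20.0]
```

Smoothing the target rather than the output torque keeps the PD law itself unchanged, so the attention step only decides where each joint is driven, not how stiffly.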
