Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models

arXiv - CS - Computer Vision and Pattern Recognition · Pub Date: 2024-09-18 · DOI: arxiv-2409.11920

Lorenzo Mandelli, Stefano Berretti

Citations: 0

Abstract

In this paper, we address the challenge of generating realistic 3D human motions for action classes never seen during training. Our approach decomposes complex actions into simpler movements, specifically ones observed during training, by leveraging the knowledge of human motion contained in GPT models. These simpler movements are then combined into a single, realistic animation by exploiting the properties of diffusion models. We claim that this decomposition and subsequent recombination of simple movements can synthesize an animation that accurately represents the complex input action. The method operates at inference time and can be integrated with any pre-trained diffusion model, enabling the synthesis of motion classes not present in the training data. We evaluate our method by dividing two benchmark human motion datasets into basic and complex actions, and then compare its performance against state-of-the-art methods.
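The recombination step can be illustrated with masked compositional guidance, a technique borrowed from composable diffusion models in which the noise predictions for several text conditions are added around an unconditional prediction. This is an assumption: the abstract does not state the exact combination rule, so the paper's formulation may differ. All names below are hypothetical: `SubAction`, `build_mask`, `compose_noise`, the hand-written plan standing in for a GPT decomposition, and the toy denoiser standing in for a pre-trained motion diffusion model. A sub-action's frame range plays the role of temporal composition and its joint set the role of spatial composition.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Toy motion representation: motion[frame][joint], one scalar per joint.
Motion = List[List[float]]


@dataclass
class SubAction:
    """One simple movement from a hypothetical GPT decomposition."""
    text: str               # prompt for a basic action seen during training
    frames: range           # temporal span where this sub-action applies
    joints: Sequence[int]   # joints it controls (spatial span)
    weight: float = 1.0     # guidance weight w_i


def build_mask(sub: SubAction, n_frames: int, n_joints: int) -> Motion:
    """Binary mask m_i: 1.0 where the sub-action is active, 0.0 elsewhere."""
    return [
        [1.0 if (f in sub.frames and j in sub.joints) else 0.0 for j in range(n_joints)]
        for f in range(n_frames)
    ]


def compose_noise(
    x_t: Motion,
    t: int,
    denoiser: Callable[[Motion, int, str], Motion],
    plan: List[SubAction],
) -> Motion:
    """Assumed rule: eps = eps_uncond + sum_i w_i * m_i * (eps_i - eps_uncond)."""
    n_frames, n_joints = len(x_t), len(x_t[0])
    eps_u = denoiser(x_t, t, "")            # unconditional prediction (empty prompt)
    out = [row[:] for row in eps_u]         # start from a copy of eps_uncond
    for sub in plan:
        eps_c = denoiser(x_t, t, sub.text)  # prediction conditioned on this sub-action
        mask = build_mask(sub, n_frames, n_joints)
        for f in range(n_frames):
            for j in range(n_joints):
                out[f][j] += sub.weight * mask[f][j] * (eps_c[f][j] - eps_u[f][j])
    return out


def toy_denoiser(x_t: Motion, t: int, prompt: str) -> Motion:
    """Hypothetical stand-in for a pre-trained model: constant output per prompt."""
    return [[float(len(prompt)) for _ in row] for row in x_t]


plan = [
    SubAction("walk", frames=range(0, 4), joints=(0, 1)),     # hypothetical leg joints
    SubAction("wave hand", frames=range(2, 6), joints=(2,)),  # hypothetical arm joint
]
x_t = [[0.0] * 3 for _ in range(6)]  # 6 frames, 3 joints
eps = compose_noise(x_t, t=10, denoiser=toy_denoiser, plan=plan)
# eps[2] == [4.0, 4.0, 9.0]: both sub-actions are active at frame 2, on different joints
```

Each sub-action only changes the cells its mask covers, so the walking prompt drives the leg joints in the early frames while the waving prompt drives the arm joint later; where masks overlap, the guided terms simply add, as in the conjunction operator of composable diffusion. In a real sampler this composed prediction would replace the single conditional noise estimate at every denoising step.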


