Robust Diffusion-based Motion In-betweening

IF 2.7 · CAS Zone 4 (Computer Science) · Q2 (Computer Science, Software Engineering) · Computer Graphics Forum · Pub Date: 2024-11-07 · DOI: 10.1111/cgf.15260
Jia Qin, Peng Yan, Bo An
{"title":"Robust Diffusion-based Motion In-betweening","authors":"Jia Qin,&nbsp;Peng Yan,&nbsp;Bo An","doi":"10.1111/cgf.15260","DOIUrl":null,"url":null,"abstract":"<p>The emergence of learning-based motion in-betweening techniques offers animators a more efficient way to animate characters. However, existing non-generative methods either struggle to support long transition generation or produce results that lack diversity. Meanwhile, diffusion models have shown promising results in synthesizing diverse and high-quality motions driven by text and keyframes. However, in these methods, keyframes often serve as a guide rather than a strict constraint and can sometimes be ignored when keyframes are sparse. To address these issues, we propose a lightweight yet effective diffusion-based motion in-betweening framework that generates animations conforming to keyframe constraints. We incorporate keyframe constraints into the training phase to enhance robustness in handling various constraint densities. Moreover, we employ relative positional encoding to improve the model's generalization on long range in-betweening tasks. This approach enables the model to learn from short animations while generating realistic in-betweening motions spanning thousands of frames. We conduct extensive experiments to validate our framework using the newly proposed metrics K-FID, K-Diversity, and K-Error, designed to evaluate generative in-betweening methods. Results demonstrate that our method outperforms existing diffusion-based methods across various lengths and keyframe densities. We also show that our method can be applied to text-driven motion synthesis, offering fine-grained control over the generated results.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Graphics Forum","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/cgf.15260","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

The emergence of learning-based motion in-betweening techniques offers animators a more efficient way to animate characters. However, existing non-generative methods either struggle to support long transition generation or produce results that lack diversity. Meanwhile, diffusion models have shown promising results in synthesizing diverse and high-quality motions driven by text and keyframes. However, in these methods, keyframes often serve as a guide rather than a strict constraint and can sometimes be ignored when keyframes are sparse. To address these issues, we propose a lightweight yet effective diffusion-based motion in-betweening framework that generates animations conforming to keyframe constraints. We incorporate keyframe constraints into the training phase to enhance robustness in handling various constraint densities. Moreover, we employ relative positional encoding to improve the model's generalization on long-range in-betweening tasks. This approach enables the model to learn from short animations while generating realistic in-betweening motions spanning thousands of frames. We conduct extensive experiments to validate our framework using the newly proposed metrics K-FID, K-Diversity, and K-Error, designed to evaluate generative in-betweening methods. Results demonstrate that our method outperforms existing diffusion-based methods across various lengths and keyframe densities. We also show that our method can be applied to text-driven motion synthesis, offering fine-grained control over the generated results.
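The abstract names two mechanisms a reader may want to picture concretely: keyframe constraints injected during training at varying densities, and relative positional encoding so a model trained on short clips generalizes to long sequences. Below is a minimal, hypothetical PyTorch sketch of how such a scheme could look; the `denoiser` interface, the density range, and every name here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal, hypothetical sketch (not the authors' code). It illustrates:
#  (1) sampling keyframe constraints of varying density at training time, and
#  (2) attention with a clipped relative positional bias for length generalization.
# All names, shapes, and hyperparameters below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sample_keyframe_mask(batch: int, frames: int, device) -> torch.Tensor:
    """Pick keyframes with a per-sequence density drawn at random, so training
    covers both sparse and dense constraint settings."""
    density = torch.empty(batch, 1, device=device).uniform_(0.02, 0.5)
    mask = torch.rand(batch, frames, device=device) < density
    mask[:, 0] = True   # in-betweening always constrains the endpoints
    mask[:, -1] = True
    return mask  # (batch, frames); True marks a hard keyframe constraint


class RelativeBiasAttention(nn.Module):
    """Self-attention whose scores get a learned bias indexed by the clipped
    frame offset, so the layer is agnostic to absolute sequence length."""

    def __init__(self, dim: int, heads: int = 4, max_offset: int = 64):
        super().__init__()
        self.heads, self.max_offset = heads, max_offset
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)
        self.rel_bias = nn.Embedding(2 * max_offset + 1, heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = (
            z.view(b, t, self.heads, -1).transpose(1, 2)
            for z in self.qkv(x).chunk(3, dim=-1)
        )
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (b, h, t, t)
        pos = torch.arange(t, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_offset, self.max_offset)
        scores = scores + self.rel_bias(rel + self.max_offset).permute(2, 0, 1)
        out = (scores.softmax(-1) @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(out)


def training_step(denoiser, x0, alphas_bar, opt):
    """One diffusion step in which keyframes stay clean and act as constraints;
    the loss only scores the frames the model is free to synthesize."""
    b, t, _ = x0.shape
    mask = sample_keyframe_mask(b, t, x0.device)                 # (b, t)
    step = torch.randint(0, len(alphas_bar), (b,), device=x0.device)
    a = alphas_bar[step].view(b, 1, 1)
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise                  # forward diffusion
    xt = torch.where(mask[..., None], x0, xt)                    # keyframes stay clean
    pred = denoiser(xt, step, mask)   # hypothetical signature: predicts clean poses
    loss = F.mse_loss(pred[~mask], x0[~mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Clamping the frame offset means any two frames more than `max_offset` apart share one bias entry, so attention behaves the same at test-time lengths never seen in training; this is one standard way a model trained on short animations can be run over sequences spanning thousands of frames, in the spirit of the abstract's claim.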

Source journal: Computer Graphics Forum
Category: Engineering & Technology, Computer Science: Software Engineering
CiteScore: 5.80
Self-citation rate: 12.00%
Annual publications: 175
Review time: 3-6 weeks
Journal introduction: Computer Graphics Forum is the official journal of Eurographics, published in cooperation with Wiley-Blackwell, and is a unique, international source of information for computer graphics professionals interested in graphics developments worldwide. It is now one of the leading journals for researchers, developers and users of computer graphics in both commercial and academic environments. The journal reports on the latest developments in the field throughout the world and covers all aspects of the theory, practice and application of computer graphics.
Latest articles in this journal:
DiffPop: Plausibility-Guided Object Placement Diffusion for Image Composition
Front Matter
LGSur-Net: A Local Gaussian Surface Representation Network for Upsampling Highly Sparse Point Cloud
𝒢-Style: Stylized Gaussian Splatting
iShapEditing: Intelligent Shape Editing with Diffusion Models