MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion

arXiv - CS - Multimedia | Pub Date: 2024-09-18 | DOI: arxiv-2409.12140

Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla

Abstract

We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models and sample videos will be made available at: https://motion-rag.github.io/
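
The multi-part retrieval and spatial composition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the part names, the tiny hand-written embeddings, the cosine-similarity ranking, and the `retrieve_per_part` and `compose` helpers are all assumptions made for illustration, and the part-level query embeddings are taken as given rather than produced by an LLM or a trained encoder.

```python
import math

# Hypothetical body-part split; the abstract does not say which parts are used.
PARTS = ("torso", "hands", "legs")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_per_part(part_queries, database):
    """For each body part, pick the database motion whose embedding for
    that part is most similar to the part-level query embedding."""
    return {
        part: max(
            database,
            key=lambda name: cosine(query, database[name]["embeddings"][part]),
        )
        for part, query in part_queries.items()
    }


def compose(best, database):
    """Spatial composition: take each part's motion data from the motion
    retrieved for that part and merge them into one sample."""
    return {part: database[name]["motion"][part] for part, name in best.items()}


def make_entry(name, embedding):
    """Build a toy database entry; motion data is a placeholder label."""
    return {
        "embeddings": {part: list(embedding) for part in PARTS},
        "motion": {part: f"{name}:{part}" for part in PARTS},
    }


# Toy database with made-up 3-dimensional embeddings.
database = {
    "walk": make_entry("walk", [1.0, 0.0, 0.0]),
    "wave": make_entry("wave", [0.0, 1.0, 0.0]),
    "sit": make_entry("sit", [0.0, 0.0, 1.0]),
}

# Assumed part-level queries for a description like "a person walks while waving".
part_queries = {
    "torso": [0.9, 0.1, 0.0],
    "hands": [0.1, 0.9, 0.0],
    "legs": [0.8, 0.0, 0.2],
}

best = retrieve_per_part(part_queries, database)
composed = compose(best, database)
print(best)
print(composed)
```

In this sketch the hands come from one retrieved motion and the torso and legs from another, which is the sense in which part-specific retrieval could assemble a sample for a description that has no single matching motion in the database. How the composed sample then conditions the diffusion model is outside what the abstract states and is not modeled here.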
