MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion
Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
arXiv - CS - Multimedia · 2024-09-18 · arXiv:2409.12140
Abstract
We introduce MoRAG, a novel multi-part, fusion-based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models, and sample videos will be made available at: https://motion-rag.github.io/
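
To make the multi-part retrieval and spatial composition idea concrete, here is a minimal sketch of how per-part retrieval followed by full-body composition could work. This is not the MoRAG implementation (which is released at the project page above); every name here (`PartMotionDB`, `embed_text`, `retrieve_multi_part`, `compose_spatially`) is hypothetical, and it assumes text queries are embedded as L2-normalized vectors and motions are stored per body part.

```python
# Hypothetical sketch of multi-part motion retrieval + spatial composition.
# None of these classes/functions come from the MoRAG codebase; they only
# illustrate the strategy described in the abstract.
from dataclasses import dataclass
import numpy as np

BODY_PARTS = ["torso", "left_arm", "right_arm", "left_leg", "right_leg"]

@dataclass
class RetrievedPart:
    part: str
    motion: np.ndarray   # (frames, joints_in_part, 3) joint positions
    score: float         # cosine similarity of text embeddings

class PartMotionDB:
    """Toy motion database indexed by text embeddings.

    motions is a list of dicts mapping body-part name -> motion array,
    aligned row-for-row with the embeddings matrix.
    """
    def __init__(self, embeddings: np.ndarray, motions: list):
        self.embeddings = embeddings  # (num_entries, dim), L2-normalized
        self.motions = motions

    def nearest(self, query: np.ndarray, part: str) -> RetrievedPart:
        scores = self.embeddings @ query          # cosine similarity
        best = int(np.argmax(scores))
        return RetrievedPart(part, self.motions[best][part], float(scores[best]))

def retrieve_multi_part(text: str, db: PartMotionDB, embed_text) -> dict:
    """Retrieve the best-matching motion independently for each body part.

    embed_text is assumed to return a unit-norm (dim,) embedding; an LLM
    could first rewrite/spell-correct `text`, as the abstract suggests.
    """
    query = embed_text(text)
    return {part: db.nearest(query, part) for part in BODY_PARTS}

def compose_spatially(parts: dict, num_frames: int) -> np.ndarray:
    """Stitch per-part motions into one full-body reference sequence."""
    composed = []
    for part in BODY_PARTS:
        motion = parts[part].motion
        # Resample each part to a common length so frames stay aligned.
        idx = np.linspace(0, len(motion) - 1, num_frames).astype(int)
        composed.append(motion[idx])
    return np.concatenate(composed, axis=1)       # (num_frames, all_joints, 3)
```

Under this sketch, the composed full-body sequence would serve as the retrieved "additional knowledge" that conditions a motion diffusion model; because each part is retrieved independently, a description never seen as a whole (e.g., a novel combination of arm and leg actions) can still be assembled from part-level matches.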