Human Motion Synthesis: A Diffusion Approach for Motion Stitching and In-Betweening

Michael Adewole, Oluwaseyi Giwa, Favour Nerrise, Martins Osifeko, Ajibola Oyedeji

arXiv:2409.06791 (arXiv - CS - Human-Computer Interaction), published 2024-09-10
Citations: 0
Abstract
Human motion generation is an important area of research in many fields. In this work, we tackle the problem of motion stitching and in-betweening. Current methods either require manual effort or are incapable of handling longer sequences. To address these challenges, we propose a diffusion model with a transformer-based denoiser to generate realistic human motion. Our method demonstrates strong performance in generating in-betweening sequences, transforming a variable number of input poses into smooth, realistic motion sequences of 75 frames at 15 fps, for a total duration of 5 seconds. We evaluate our method using quantitative metrics such as Fréchet Inception Distance (FID), Diversity, and Multimodality, along with visual assessments of the generated outputs.
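Of the metrics named above, FID is the standard Fréchet distance between two Gaussians fitted to feature embeddings of real and generated motion. As a minimal sketch (the feature extractor and data here are placeholders, not the paper's actual evaluation pipeline), the computation looks like:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: (n_samples, dim) arrays, e.g. motion-encoder
    features of real and generated sequences.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; drop the tiny
    # imaginary component that numerical error can introduce.
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature sets give an FID near zero, and the score grows as the generated distribution drifts from the real one.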