Music-to-Dance Generation with Multiple Conformer
Mingao Zhang, Changhong Liu, Yong Chen, Zhenchun Lei, Mingwen Wang
Proceedings of the 2022 International Conference on Multimedia Retrieval
Published: 2022-06-27
DOI: 10.1145/3512527.3531430 (https://doi.org/10.1145/3512527.3531430)
Music-to-dance generation must account for both the kinematics of dance, which are highly complex and non-linear, and the connection between music and dance movement, which is far from deterministic. Existing approaches attempt to address the limited-creativity problem, but the task remains challenging for three reasons. First, it is a long-horizon sequence-to-sequence task. Second, the extracted motion keypoints are noisy. Third, both the music sequence and the dance motion sequence contain local and global dependencies. To address these issues, we propose a novel autoregressive generative framework that predicts future motions from past motions and the music. The framework contains a music conformer, a motion conformer, and a cross-modal conformer: the first two encode the music and motion sequences, and the cross-modal conformer is adapted to the noisy dance motion data so that it not only captures local and global dependencies within and across the sequences but also reduces the effect of the noise. Quantitative and qualitative experimental results on a publicly available music-to-dance dataset demonstrate that our method improves greatly upon the baselines and can generate long-term, coherent dance motions that are well coordinated with the music.
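
The following is a minimal PyTorch sketch of the kind of autoregressive cross-modal predictor the abstract describes: past motion and music are encoded separately, fused, and the next motion frame is predicted and fed back in. It is an illustration only; standard TransformerEncoder layers stand in for the paper's conformer blocks, and all module names, feature dimensions, and the concatenation-based fusion are assumptions, not the authors' implementation.

# Hedged sketch: generic transformer encoders substitute for the music,
# motion, and cross-modal conformers; dimensions and fusion are illustrative.
import torch
import torch.nn as nn


class CrossModalMotionPredictor(nn.Module):
    def __init__(self, music_dim=128, motion_dim=51, d_model=256,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.music_proj = nn.Linear(music_dim, d_model)
        self.motion_proj = nn.Linear(motion_dim, d_model)

        def encoder():
            layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=n_heads,
                dim_feedforward=4 * d_model, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=n_layers)

        # Per-modality encoders (stand-ins for the music/motion conformers).
        self.music_encoder = encoder()
        self.motion_encoder = encoder()
        # Encoder over the fused sequence (stand-in for the cross-modal conformer).
        self.cross_encoder = encoder()
        self.head = nn.Linear(d_model, motion_dim)

    def forward(self, music, past_motion):
        # music:       (batch, T_music, music_dim)   acoustic features
        # past_motion: (batch, T_motion, motion_dim) e.g. flattened keypoints
        m = self.music_encoder(self.music_proj(music))
        p = self.motion_encoder(self.motion_proj(past_motion))
        fused = self.cross_encoder(torch.cat([m, p], dim=1))
        # Predict the next motion frame from the last fused token.
        return self.head(fused[:, -1])


# Autoregressive rollout: each predicted frame is appended to the past motion.
model = CrossModalMotionPredictor()
music = torch.randn(1, 120, 128)   # dummy music features
motion = torch.randn(1, 30, 51)    # dummy seed motion (e.g. 17 joints x 3 coords)
for _ in range(10):
    next_frame = model(music, motion)                      # (1, 51)
    motion = torch.cat([motion, next_frame[:, None]], dim=1)
print(motion.shape)  # torch.Size([1, 40, 51])

In the paper's setting the encoders would be conformer blocks (self-attention plus convolution modules), which is what lets the model capture both global and local dependencies; the rollout loop above shows only the autoregressive structure.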