Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

arXiv - CS - Sound. Pub date: 2024-09-13. DOI: arxiv-2409.09214

Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang, Dongya Jia, Feihu La, Duc Le, Bochen Li, Chumin Li, Hui Li, Xingxing Li, Shouda Liu, Wei-Tsung Lu, Yiqing Lu, Andrew Shaw, Janne Spijkervet, Yakun Sun, Bo Wang, Ju-Chiang Wang, Yuping Wang, Yuxuan Wang, Ling Xu, Yifeng Yang, Chao Yao, Shuo Zhang, Yang Zhang, Yilin Zhang, Hang Zhao, Ziyi Zhao, Dejian Zhong, Shicen Zhou, Pei Zou

Abstract

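We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio. We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music.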
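The abstract lists four kinds of multi-modal control input for vocal music generation. As a purely illustrative sketch, a request object bundling those inputs might look like the following. The class name, the field names, the assumption that lyrics are always required, and the MusicXML score format are all hypothetical and are not taken from Seed-Music's code or any published API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical request type: the optional fields mirror the four input
# modalities named in the abstract. Names and formats are assumptions only.
@dataclass
class MusicGenerationRequest:
    lyrics: str                                   # assumed required for vocal music
    style_description: Optional[str] = None       # free-text style prompt
    audio_reference_path: Optional[str] = None    # path to a reference audio clip
    musical_score: Optional[str] = None           # e.g. a MusicXML string (assumed format)
    voice_prompt_path: Optional[str] = None       # path to a short voice sample

    def active_controls(self) -> list[str]:
        """Return the names of the optional controls given as non-empty values."""
        controls = {
            "style_description": self.style_description,
            "audio_reference": self.audio_reference_path,
            "musical_score": self.musical_score,
            "voice_prompt": self.voice_prompt_path,
        }
        # Dicts preserve insertion order, so the result order is deterministic.
        return [name for name, value in controls.items() if value]
```

For example, a request with only a style prompt and a voice sample reports `["style_description", "voice_prompt"]` as its active controls. Keeping every control optional reflects the abstract's framing of these inputs as alternative ways to steer generation rather than mandatory ones.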
