{"title":"Music Style Transfer With Diffusion Model","authors":"Hong Huang, Yuyi Wang, Luyao Li, Jun Lin","doi":"arxiv-2404.14771","DOIUrl":null,"url":null,"abstract":"Previous studies on music style transfer have mainly focused on one-to-one\nstyle conversion, which is relatively limited. When considering the conversion\nbetween multiple styles, previous methods required designing multiple modes to\ndisentangle the complex style of the music, resulting in large computational\ncosts and slow audio generation. The existing music style transfer methods\ngenerate spectrograms with artifacts, leading to significant noise in the\ngenerated audio. To address these issues, this study proposes a music style\ntransfer framework based on diffusion models (DM) and uses spectrogram-based\nmethods to achieve multi-to-multi music style transfer. The GuideDiff method is\nused to restore spectrograms to high-fidelity audio, accelerating audio\ngeneration speed and reducing noise in the generated audio. Experimental\nresults show that our model has good performance in multi-mode music style\ntransfer compared to the baseline and can generate high-quality audio in\nreal-time on consumer-grade GPUs.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2404.14771","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Previous studies on music style transfer have focused mainly on one-to-one style conversion, which is relatively limited. When converting between multiple styles, earlier methods required designing multiple modes to disentangle the complex style of the music, resulting in large computational costs and slow audio generation. Existing music style transfer methods also generate spectrograms with artifacts, leading to significant noise in the generated audio. To address these issues, this study proposes a music style transfer framework based on diffusion models (DM) that uses spectrogram-based methods to achieve multi-to-multi music style transfer. The GuideDiff method is used to restore spectrograms to high-fidelity audio, accelerating generation and reducing noise in the output. Experimental results show that, compared to the baselines, the model performs well in multi-mode music style transfer and can generate high-quality audio in real time on consumer-grade GPUs.
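
The abstract describes a pipeline in which a diffusion model performs style transfer on spectrograms and GuideDiff then restores the spectrograms to audio. The paper's implementation details are not given here; the following is a minimal, hypothetical PyTorch sketch of one DDPM-style reverse denoising step applied to a mel-spectrogram conditioned on a target-style embedding. All names (`denoiser`, `style_emb`, the noise-schedule tensor) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of one reverse-diffusion step on a mel-spectrogram,
# conditioned on a target-style embedding. Not the paper's implementation.
import torch

def ddpm_reverse_step(denoiser, x_t, t, style_emb, betas):
    """Sample x_{t-1} from x_t using the standard DDPM posterior mean.

    denoiser  -- network predicting the added noise eps(x_t, t, style) (assumed)
    x_t       -- noisy mel-spectrogram, shape (batch, 1, n_mels, frames)
    t         -- integer timestep (shared by the whole batch here)
    style_emb -- embedding of the target music style (assumed conditioning)
    betas     -- 1-D tensor of noise-schedule variances beta_1..beta_T
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    # Predict the noise component, then form the DDPM posterior mean.
    eps = denoiser(x_t, torch.full((x_t.size(0),), t), style_emb)
    coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps) / torch.sqrt(alphas[t])

    if t > 0:
        noise = torch.randn_like(x_t)
        return mean + torch.sqrt(betas[t]) * noise
    return mean  # final step: the denoised, style-transferred spectrogram
```

In such a pipeline the resulting spectrogram would still need to be converted back to a waveform, which is the role the abstract assigns to GuideDiff.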