Hierarchical Symbolic Pop Music Generation with Graph Neural Networks
Wen Qing Lim, Jinhua Liang, Huan Zhang
arXiv - EE - Audio and Speech Processing, 2024-09-12, arXiv:2409.08155

Music is inherently made up of complex structures, and representing them as
graphs helps to capture multiple levels of relationships. While music
generation has been explored using various deep generation techniques, research
on graph-related music generation is sparse. Earlier graph-based approaches
generated only melodies, and recent work on polyphonic generation
does not account for longer-term structure. In this paper, we
explore a multi-graph approach to represent both the rhythmic patterns and
phrase structure of Chinese pop music. Consequently, we propose a two-step
approach that aims to generate polyphonic music with coherent rhythm and
long-term structure. We train two Variational Auto-Encoder (VAE) networks: one on a
MIDI dataset to generate 4-bar phrases, and another on song structure labels to
generate the full song structure. Our work shows that the models are able to learn
most of the structural nuances in the training dataset, including chord and
pitch frequency distributions, and phrase attributes.
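
The two-step idea described above (first sample a song-level structure, then fill each slot with a 4-bar phrase) can be illustrated with a minimal sketch. This is not the authors' implementation: the two sampler functions are hypothetical stand-ins for the trained VAE decoders, the output is monophonic for brevity, the note fields and the C major pitch set are assumptions, and reusing one phrase per structure label is an assumed simplification that shows how a structure sequence can impose repetition.

```python
import random
from dataclasses import dataclass

BEATS_PER_BAR = 4      # assumption: 4/4 time
BARS_PER_PHRASE = 4    # the abstract specifies 4-bar phrases
PHRASE_BEATS = BEATS_PER_BAR * BARS_PER_PHRASE


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI pitch number
    onset: float     # position in beats
    duration: float  # length in beats


def sample_structure(rng, n_phrases=8, labels=("A", "B", "C")):
    """Hypothetical stand-in for the structure model: a sequence of phrase labels."""
    return [rng.choice(labels) for _ in range(n_phrases)]


def sample_phrase(rng):
    """Hypothetical stand-in for the phrase model: one note per beat over 4 bars."""
    scale = [60, 62, 64, 65, 67, 69, 71]  # C major, chosen only for illustration
    return [Note(rng.choice(scale), float(beat), 1.0) for beat in range(PHRASE_BEATS)]


def generate_song(seed=0):
    """Step 1: sample the structure. Step 2: fill each slot with a phrase."""
    rng = random.Random(seed)
    structure = sample_structure(rng)
    phrase_bank = {}  # label -> phrase, so repeated labels repeat the same material
    song = []
    for index, label in enumerate(structure):
        if label not in phrase_bank:
            phrase_bank[label] = sample_phrase(rng)
        offset = index * PHRASE_BEATS  # shift the phrase to its place in the song
        for note in phrase_bank[label]:
            song.append(Note(note.pitch, note.onset + offset, note.duration))
    return structure, song


if __name__ == "__main__":
    structure, song = generate_song(seed=1)
    print("structure:", " ".join(structure))
    print("notes:", len(song))
```

In a real system the stubs would be replaced by decoding latent samples from the two trained networks; the sketch only shows how the outputs of the two stages could be combined into one timeline.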