Hierarchical Symbolic Pop Music Generation with Graph Neural Networks
Wen Qing Lim, Jinhua Liang, Huan Zhang
arXiv - EE - Audio and Speech Processing, 2024-09-12
https://doi.org/arxiv-2409.08155
Abstract
Music is inherently made up of complex structures, and representing them as
graphs helps to capture multiple levels of relationships. While music
generation has been explored using various deep generation techniques, research
on graph-related music generation is sparse. Earlier graph-based approaches
generated only melodies, and more recent work on polyphonic music generation
does not account for longer-term structure. In this paper, we
explore a multi-graph approach to represent both the rhythmic patterns and
phrase structure of Chinese pop music. Consequently, we propose a two-step
approach that aims to generate polyphonic music with coherent rhythm and
long-term structure. We train two Variational Auto-Encoder networks: one on a
MIDI dataset to generate 4-bar phrases, and another on song structure labels to
generate the full song structure. Our results show that the models learn
most of the structural nuances in the training dataset, including chord and
pitch frequency distributions and phrase attributes.
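
The two-step idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the two trained Variational Auto-Encoders are replaced by random stand-in samplers, and the label vocabulary, the note layout, the 4/4 meter, the reuse of one phrase per label, and all function names (sample_structure, sample_phrase, generate_song) are assumptions made only for illustration. The stand-in phrase sampler is also monophonic, unlike the polyphonic output the abstract describes.

```python
import random
from typing import Dict, List, Tuple

# A note is (onset_in_beats, midi_pitch, duration_in_beats); this layout is an assumption.
Note = Tuple[float, int, float]

BEATS_PER_BAR = 4      # assumes 4/4 time
BARS_PER_PHRASE = 4    # the abstract states 4-bar phrases


def sample_structure(rng: random.Random) -> List[str]:
    """Stand-in for the structure model: returns a sequence of phrase labels."""
    # Hypothetical label vocabulary; the abstract does not list the actual labels.
    middle = [rng.choice(["verse", "chorus", "bridge"]) for _ in range(rng.randint(4, 8))]
    return ["intro"] + middle + ["outro"]


def sample_phrase(rng: random.Random) -> List[Note]:
    """Stand-in for the phrase model: returns one quarter note per beat for 4 bars."""
    scale = [60, 62, 64, 65, 67, 69, 71]  # C major, purely illustrative
    total_beats = BEATS_PER_BAR * BARS_PER_PHRASE
    return [(float(beat), rng.choice(scale), 1.0) for beat in range(total_beats)]


def generate_song(seed: int = 0) -> Tuple[List[str], List[Note]]:
    """Step 1: sample the song structure. Step 2: fill each slot with a phrase."""
    rng = random.Random(seed)
    labels = sample_structure(rng)
    phrase_bank: Dict[str, List[Note]] = {}
    song: List[Note] = []
    for index, label in enumerate(labels):
        # Reuse one phrase per label so repeated sections match (a simplification).
        if label not in phrase_bank:
            phrase_bank[label] = sample_phrase(rng)
        offset = index * BEATS_PER_BAR * BARS_PER_PHRASE
        song.extend((onset + offset, pitch, dur) for onset, pitch, dur in phrase_bank[label])
    return labels, song


if __name__ == "__main__":
    labels, song = generate_song(seed=0)
    print(labels)
    print(len(song), "notes")
```

Generating the structure first and then filling each slot keeps long-range repetition (for example, a recurring chorus) under the control of the structure step, while the phrase step only has to produce locally coherent 4-bar segments.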