Automatic Music Composition with Transformers

Yi-Hsuan Yang
DOI: 10.1145/3463946.3469111
Journal: International Journal of Mobile Computing and Multimedia Communications (JCR Q4, Telecommunications; Impact Factor 0.400)
Publication date: 2021-08-21
Publication type: Journal Article
Citations: 1

Abstract

In this talk, I will first give a brief overview of recent deep learning-based approaches for automatic music generation in the symbolic domain. I will then talk about our own research that employs self-attention based architectures, a.k.a. Transformers, for symbolic music generation. A naive approach with Transformers would treat music as a sequence of text-like tokens. But, our research demonstrates that Transformers can generate higher-quality music when music is not treated simply as text. In particular, our Pop Music Transformer model, published at ACM Multimedia 2020, employs a novel beat-based representation of music that informs self-attention models with the bar-beat metrical structure present in music. This approach greatly improves the rhythmic structure of the generated music. A more recent model we published at AAAI 2021, named the Compound Word Transformer, exploits the fact that a musical note is associated with multiple attributes such as pitch, duration and velocity. Instead of predicting tokens corresponding to these different attributes one-by-one at inference time, the Compound Word Transformer predicts them altogether jointly, greatly reducing the sequence length needed to model a full-length song and also making it easier to model the dependency among these attributes.
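The contrast between the "text-like" encoding and the compound-word encoding described above can be sketched in a few lines. This is an illustrative simplification, not the papers' exact vocabularies: the token names, attribute set, and beat markers are hypothetical stand-ins for the representations used in Pop Music Transformer (REMI) and Compound Word Transformer.

```python
# Contrast a flat, text-like token sequence with a compound-word sequence
# for the same short musical passage. Each note carries several attributes
# (pitch, duration, velocity), as noted in the abstract.
notes = [
    {"pitch": 60, "duration": 1.0, "velocity": 80},
    {"pitch": 64, "duration": 0.5, "velocity": 72},
    {"pitch": 67, "duration": 0.5, "velocity": 72},
]

def flat_tokens(notes):
    """Text-like encoding: one token per attribute, predicted one-by-one.
    Beat-based markers ("Bar", "Beat_1") expose the metrical structure
    to the self-attention model."""
    seq = ["Bar", "Beat_1"]
    for n in notes:
        seq += [f"Pitch_{n['pitch']}",
                f"Duration_{n['duration']}",
                f"Velocity_{n['velocity']}"]
    return seq

def compound_tokens(notes):
    """Compound-word style: a note's attributes are grouped into one
    super-token whose fields the model predicts jointly at each step."""
    seq = [("Bar", "Beat_1", None)]  # metrical event as its own group
    for n in notes:
        seq.append((f"Pitch_{n['pitch']}",
                    f"Duration_{n['duration']}",
                    f"Velocity_{n['velocity']}"))
    return seq

flat = flat_tokens(notes)
compound = compound_tokens(notes)
print(len(flat), len(compound))  # 11 vs 4: the compound sequence is ~3x shorter
```

Because self-attention cost grows quadratically with sequence length, grouping attributes this way is what lets a single model cover a full-length song; it also lets the attribute fields of one step condition on each other directly rather than across several autoregressive steps.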
Source journal: International Journal of Mobile Computing and Multimedia Communications
CiteScore: 1.40
Self-citation rate: 16.70%
Articles published: 23