Prevailing Research Areas for Music AI in the Era of Foundation Models

Megan Wei, Mateusz Modrzejewski, Aswin Sivaraman, Dorien Herremans
arXiv - CS - Sound. Published 2024-09-14. DOI: arxiv-2409.09378 (https://doi.org/arxiv-2409.09378)

Abstract

In tandem with recent advances in foundation model research, there has been a surge of generative music AI applications in the past few years. As the idea of AI-generated or AI-augmented music becomes more mainstream, many researchers in the music AI community may be wondering what avenues of research are left. With regard to generative music models, we outline the current areas of research with significant room for exploration. First, we pose the question of the foundational representation of these generative models and investigate approaches towards explainability. Next, we discuss the current state of music datasets and their limitations. We then give an overview of different generative models, ways of evaluating these models, and their computational constraints and limitations. Subsequently, we highlight applications of these generative models, including extensions to multiple modalities and integration with artists' workflows and with music education systems. Finally, we survey the potential copyright implications of generative music and discuss strategies for protecting the rights of musicians. While it is not meant to be exhaustive, our survey calls attention to a variety of research directions enabled by music foundation models.