The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos

Igor Cardoso, Rubens O. Moraes, Lucas N. Ferreira
arXiv:2404.04420 · arXiv - CS - Sound · Journal Article · Published 2024-04-05
Citations: 0

Abstract

Neural models are one of the most popular approaches for music generation, yet there aren't standard large datasets tailored for learning music directly from game data. To address this research gap, we introduce a novel dataset named NES-VMDB, containing 98,940 gameplay videos from 389 NES games, each paired with its original soundtrack in symbolic format (MIDI). NES-VMDB is built upon the Nintendo Entertainment System Music Database (NES-MDB), encompassing 5,278 music pieces from 397 NES games. Our approach involves collecting long-play videos for 389 games of the original dataset, slicing them into 15-second-long clips, and extracting the audio from each clip. Subsequently, we apply an audio fingerprinting algorithm (similar to Shazam) to automatically identify the corresponding piece in the NES-MDB dataset. Additionally, we introduce a baseline method based on the Controllable Music Transformer to generate NES music conditioned on gameplay clips. We evaluated this approach with objective metrics, and the results showed that the conditional CMT improves musical structural quality when compared to its unconditional counterpart. Moreover, we used a neural classifier to predict the game genre of the generated pieces. Results showed that the CMT generator can learn correlations between gameplay videos and game genres, but further research has to be conducted to achieve human-level performance.
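The clip-slicing step described in the abstract (long-play videos cut into 15-second clips, audio extracted from each) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and dropping a trailing remainder shorter than 15 seconds is an assumption the paper does not state.

```python
import subprocess

CLIP_SECONDS = 15  # clip length used by NES-VMDB

def clip_spans(duration_s, clip_len=CLIP_SECONDS):
    """Non-overlapping (start, end) spans covering a video of `duration_s`
    seconds; a trailing remainder shorter than clip_len is dropped
    (an assumption -- the paper does not specify remainder handling)."""
    return [(s, s + clip_len)
            for s in range(0, int(duration_s) - clip_len + 1, clip_len)]

def ffmpeg_audio_cmd(video_path, start, out_wav, clip_len=CLIP_SECONDS):
    """Build an ffmpeg command extracting one clip's audio as 16-bit PCM WAV
    (-vn drops the video stream; -ss/-t select the clip window)."""
    return ["ffmpeg", "-i", video_path, "-ss", str(start), "-t", str(clip_len),
            "-vn", "-acodec", "pcm_s16le", out_wav]

def extract_clips(video_path, duration_s):
    """Run ffmpeg once per clip (requires ffmpeg on PATH)."""
    for i, (start, _) in enumerate(clip_spans(duration_s)):
        subprocess.run(ffmpeg_audio_cmd(video_path, start, f"clip_{i:05d}.wav"),
                       check=True)
```

For example, a 47-second video yields three clips at offsets 0 s, 15 s, and 30 s, each handed to ffmpeg for audio extraction.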
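The Shazam-style audio fingerprinting mentioned in the abstract pairs spectrogram peaks into time-frequency hashes that survive noise and re-encoding. The sketch below is a toy illustration of that idea, not the authors' implementation; window size, peaks per frame, and fan-out values are all assumptions.

```python
import numpy as np

def spectral_peaks(signal, win=1024, hop=512, peaks_per_frame=3):
    """Return (frame_index, freq_bin) landmarks: the strongest
    spectrogram bins in each analysis frame."""
    window = np.hanning(win)
    landmarks = []
    for i, start in enumerate(range(0, len(signal) - win, hop)):
        mag = np.abs(np.fft.rfft(signal[start:start + win] * window))
        top = np.argsort(mag)[-peaks_per_frame:]  # strongest bins this frame
        landmarks.extend((i, int(b)) for b in top)
    return landmarks

def hashes(landmarks, fan_out=5, max_dt=20):
    """Pair each anchor peak with a few later peaks, Shazam-style:
    the hash key is (anchor_freq, target_freq, time_delta)."""
    landmarks = sorted(landmarks)
    out = []
    for idx, (t1, f1) in enumerate(landmarks):
        for t2, f2 in landmarks[idx + 1: idx + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                out.append(((f1, f2, dt), t1))  # key plus anchor time
    return out
```

Matching a 15-second clip against NES-MDB would then amount to counting hash-key collisions per candidate piece and checking that the matched anchor times line up at a consistent offset.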