HSP Datasets: Insights on Song Popularity Prediction

M. Vötter, Maximilian Mayerl, Günther Specht, Eva Zangerle
Journal: Int. J. Semantic Comput.
Published: 2022-05-12
DOI: https://doi.org/10.1142/s1793351x22400104
Citations: 1

Abstract

Estimating the success of a song before its release is an important task in the music industry. This work uses audio descriptors to predict the success (popularity) of a song, where typical measures of success are chart measures such as peak position and streaming measures such as listener count. Currently, a wide range of datasets is used for that purpose, but most of them are not publicly available; likewise, the available datasets are restricted in size, available features, or popularity measures. This substantially impedes the evaluation of the predictive power of a wide range of models. Therefore, we present two novel datasets called HSP-S and HSP-L based on data from AcousticBrainz, Billboard Hot 100, the Million Song Dataset, and last.fm. Both datasets contain audio features and mel-spectrograms as well as streaming listener and play counts. The larger HSP-L dataset contains 73,482 songs, whereas the smaller HSP-S dataset contains 7,736 songs and additionally features Billboard Hot 100 chart measures. In contrast to previous publicly available datasets, our datasets contain substantially more songs and richer and more diverse features. We solely utilize data from the public domain, allowing us to evaluate and compare a wide range of models on our datasets. To demonstrate the use of the datasets, we perform regression and classification (popular/unpopular) tasks on both datasets using a wide variety of models to predict song popularity for all provided target measures of success.
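The abstract does not say how the regression targets or the popular/unpopular labels are derived from the raw popularity measures. As a rough illustration only, the sketch below shows one common way such targets could be built from a listener count: a log transform for regression and a median split for classification. Both choices, the function name `make_targets`, and the synthetic counts are assumptions made here for illustration; none of them is taken from the paper or from the HSP datasets.

```python
import math
import random
import statistics


def make_targets(listener_counts):
    """Derive a regression target and a binary label from raw listener counts.

    Hypothetical helper: the transform and threshold are assumptions,
    not the procedure described by the paper.
    """
    # Assumption: log1p compresses the heavy-tailed count distribution.
    log_counts = [math.log1p(c) for c in listener_counts]
    # Assumption: a song is "popular" if it lies strictly above the median.
    threshold = statistics.median(log_counts)
    labels = [1 if value > threshold else 0 for value in log_counts]
    return log_counts, labels


# Synthetic stand-in data; these numbers are NOT drawn from HSP-S or HSP-L.
rng = random.Random(0)
counts = [int(rng.lognormvariate(8, 2)) for _ in range(1000)]

log_counts, labels = make_targets(counts)
print(f"popular: {sum(labels)}, unpopular: {len(labels) - sum(labels)}")
```

A median split yields roughly balanced classes, which keeps plain accuracy interpretable; a different threshold (for example, a top percentile) would produce imbalanced classes and call for other evaluation metrics.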