Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features

Carlos Hernandez-Olivan, J. R. Beltrán, David Diaz-Guerra
Journal: Int. J. Interact. Multim. Artif. Intell.
DOI: 10.9781/ijimai.2021.10.005 (https://doi.org/10.9781/ijimai.2021.10.005)
Publication date: 2020-08-17
Publication type: Journal Article
Citations: 9

Abstract

Analyzing the structure of musical pieces remains a challenge for Artificial Intelligence, especially in the field of Deep Learning. It requires first identifying the structural boundaries of each piece. This boundary analysis has recently been studied with unsupervised methods and with end-to-end techniques such as Convolutional Neural Networks (CNN), which take Mel-Scaled Log-magnitude Spectrograms (MLS), Self-Similarity Matrices (SSM) or Self-Similarity Lag Matrices (SSLM) as inputs and are trained with human annotations. The published studies, both unsupervised and end-to-end, pre-process the audio in different ways, with different distance metrics and audio features, so a generalized pre-processing method for computing the model inputs is missing. The objective of this work is to establish such a general method by comparing the inputs calculated with different pooling strategies, distance metrics and audio features, also taking into account the computing time needed to obtain them. We also determine the most effective combination of inputs to feed to the CNN, and thereby the most efficient way to extract the structural boundaries of the music pieces. With an adequate combination of input matrices and pooling strategies we obtain an $F_1$ score of 0.411, which outperforms the previous result obtained under the same conditions.
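To make the terms in the abstract concrete, the following is a minimal sketch of how a pooling strategy, a distance metric and a lag representation fit together. It is not the authors' pipeline: the function names, the toy two-dimensional "frames", the pooling factor and the maximum lag are all illustrative assumptions, and the matrix here stores raw distances (0 means identical frames) without any of the further normalization a real SSM or SSLM computation may apply.

```python
import math

def max_pool(frames, factor):
    """Max-pool consecutive frames along the time axis in groups of `factor`."""
    pooled = []
    for start in range(0, len(frames) - factor + 1, factor):
        group = frames[start:start + factor]
        # Take the maximum of each feature dimension over the group.
        pooled.append([max(column) for column in zip(*group)])
    return pooled

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0

def euclidean_distance(u, v):
    """Alternative metric, interchangeable with cosine_distance below."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def self_similarity_matrix(frames, distance):
    """Entry [i][j] is the distance between frame i and frame j."""
    n = len(frames)
    return [[distance(frames[i], frames[j]) for j in range(n)] for i in range(n)]

def lag_matrix(ssm, max_lag):
    """Entry [i][l-1] is the distance between frame i and frame i - l.

    Positions with no past frame (i - l < 0) are filled with None.
    """
    n = len(ssm)
    return [
        [ssm[i][i - lag] if i - lag >= 0 else None for lag in range(1, max_lag + 1)]
        for i in range(n)
    ]

# Toy feature sequence (hypothetical): two alternating "sections".
frames = [[1, 0], [0.5, 0], [0, 1], [0, 2], [1, 0], [2, 0], [0, 1], [0, 1]]

pooled = max_pool(frames, factor=2)                    # 8 frames -> 4 frames
ssm = self_similarity_matrix(pooled, cosine_distance)  # 4 x 4 distance matrix
sslm = lag_matrix(ssm, max_lag=2)                      # 4 x 2 lag matrix
```

Passing `euclidean_distance` instead of `cosine_distance` changes only the metric, which is the kind of variation the comparison in the abstract is concerned with. In the toy data, pooled frames 0 and 2 point in the same direction, so their cosine distance is zero and the repetition shows up at lag 2.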