Comparative Analysis of Three Improved Deep Learning Architectures for Music Genre Classification

International Journal of Information Technology and Computer Science, Pub Date: 2021-04-08, DOI: 10.5815/IJITCS.2021.02.01

Quazi Ghulam Rafi, Mohammed Noman, Sadia Zahin Prodhan, S. Alam, Dipannyta Nandi

Citations: 6

Abstract

Among the many music information retrieval (MIR) tasks, music genre classification stands out. A music genre is a category of music that emerged through a complex interplay of cultures, musicians, and market forces. Genres are used to characterize similarities between compositions and to organize collections. Earlier researchers extracted various hand-crafted features and built classifiers on top of them, but this approach had a major drawback: it relied on domain expertise. More recently, researchers have applied deep learning models to MIR tasks because of their remarkable classification accuracy. Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and the hybrid Convolutional Recurrent Neural Network (CRNN) are prominent deep learning models for music genre classification and other MIR tasks, and various architectures of these models have achieved state-of-the-art results. In this study, we review and discuss three such architectures that have already been used to classify the genre of music tracks 29-30 seconds long. Specifically, we analyze three improved architectures: the Bottom-up Broadcast Neural Network (BBNN) [1] for CNN, the Independently Recurrent Neural Network (IndRNN) [2] for RNN, and the CRNN in Time and Frequency dimensions (CRNN-TF) [3] for CRNN. Almost all of these architectures achieved the highest classification accuracy among the variants of their base deep learning model. This study therefore provides a comparative analysis of three of the most impressive architectural variants of the main deep learning models used for music genre classification, and it presents all three model families (CNN, RNN, and CRNN) in a single study. We also propose two ways to improve the performance of the RNN (IndRNN) and CRNN (CRNN-TF) architectures.
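The abstract names IndRNN [2] without describing what distinguishes it from a standard RNN. The sketch below writes out its core step in plain Python. It assumes the recurrence from the original IndRNN proposal, h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), where the recurrent weight u is a vector rather than a matrix. The function names, the single-layer setup, and the toy weights are hypothetical illustrations. This is not the implementation evaluated in this study, which may add stacked layers, normalization, or other components.

```python
from typing import List, Sequence

# Assumed formulation (original IndRNN proposal), not this study's code:
#   h_t = relu(W x_t + u * h_{t-1} + b), with u a per-neuron vector.


def relu(v: float) -> float:
    """Rectified linear unit, the activation assumed here for IndRNN."""
    return max(0.0, v)


def indrnn_step(x: Sequence[float], h_prev: Sequence[float],
                W: Sequence[Sequence[float]], u: Sequence[float],
                b: Sequence[float]) -> List[float]:
    """One hypothetical IndRNN time step.

    A vanilla RNN mixes all previous states through a full matrix U h_{t-1}.
    Here the previous state is multiplied element-wise by u, so neuron n
    only sees its own past value h_prev[n].
    """
    h = []
    for n in range(len(h_prev)):
        # Input projection: row n of W dotted with the current input frame.
        input_term = sum(w * xi for w, xi in zip(W[n], x))
        # Independent recurrence: one scalar recurrent weight per neuron.
        h.append(relu(input_term + u[n] * h_prev[n] + b[n]))
    return h


def indrnn_forward(xs: Sequence[Sequence[float]],
                   W: Sequence[Sequence[float]], u: Sequence[float],
                   b: Sequence[float]) -> List[List[float]]:
    """Run the recurrence over a sequence of frames, starting from h_0 = 0."""
    h = [0.0] * len(u)
    states = []
    for x in xs:
        h = indrnn_step(x, h, W, u, b)
        states.append(h)
    return states


if __name__ == "__main__":
    # Toy example: 2 input features, 2 hidden units, 3 time steps.
    W = [[1.0, 0.0], [0.0, 1.0]]
    u = [0.5, -1.0]
    b = [0.0, 0.0]
    xs = [[1.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
    print(indrnn_forward(xs, W, u, b))  # [[1.0, 2.0], [0.5, 0.0], [1.25, 1.0]]
```

Each hidden unit carries only one scalar recurrent weight u[n], so its state depends on its own history alone. In the toy run, the second unit's negative recurrent weight drives it to zero at the second step, and this has no effect on the first unit.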
