Quazi Ghulam Rafi, Mohammed Noman, Sadia Zahin Prodhan, S. Alam, Dipannyta Nandi
{"title":"Comparative Analysis of Three Improved Deep Learning Architectures for Music Genre Classification","authors":"Quazi Ghulam Rafi, Mohammed Noman, Sadia Zahin Prodhan, S. Alam, Dipannyta Nandi","doi":"10.5815/IJITCS.2021.02.01","DOIUrl":null,"url":null,"abstract":"Among the many music information retrieval (MIR) tasks, music genre classification is noteworthy. The categorization of music into different groups that came to existence through a complex interplay of cultures, musicians, and various market forces to characterize similarities between compositions and organize collections is known as a music genre. The past researchers extracted various hand-crafted features and developed classifiers based on them. But the major drawback of this approach was the requirement of field expertise. However, in recent times researchers, because of the remarkable classification accuracy of deep learning models, have used similar models for MIR tasks. Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and the hybrid model, Convolutional Recurrent Neural Network (CRNN), are such prominently used deep learning models for music genre classification along with other MIR tasks and various architectures of these models have achieved state-of-the-art results. In this study, we review and discuss three such architectures of deep learning models, already used for music genre classification of music tracks of length of 29-30 seconds. In particular, we analyze improved CNN, RNN, and CRNN architectures named Bottom-up Broadcast Neural Network (BBNN) [1], Independent Recurrent Neural Network (IndRNN) [2] and CRNN in Time and Frequency dimensions (CRNNTF) [3] respectively, almost all of the architectures achieved the highest classification accuracy among the variants of their base deep learning model. Hence, this study holds a comparative analysis of the three most impressive architectural variants of the main deep learning models that are prominently used to classify music genre and presents the three architecture, hence the models (CNN, RNN, and CRNN) in one study. We also propose two ways that can improve the performances of the RNN (IndRNN) and CRNN (CRNN-TF) architectures.","PeriodicalId":130361,"journal":{"name":"International Journal of Information Technology and Computer Science","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Information Technology and Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/IJITCS.2021.02.01","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Among the many music information retrieval (MIR) tasks, music genre classification is noteworthy. The categorization of music into different groups that came to existence through a complex interplay of cultures, musicians, and various market forces to characterize similarities between compositions and organize collections is known as a music genre. The past researchers extracted various hand-crafted features and developed classifiers based on them. But the major drawback of this approach was the requirement of field expertise. However, in recent times researchers, because of the remarkable classification accuracy of deep learning models, have used similar models for MIR tasks. Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and the hybrid model, Convolutional Recurrent Neural Network (CRNN), are such prominently used deep learning models for music genre classification along with other MIR tasks and various architectures of these models have achieved state-of-the-art results. In this study, we review and discuss three such architectures of deep learning models, already used for music genre classification of music tracks of length of 29-30 seconds. In particular, we analyze improved CNN, RNN, and CRNN architectures named Bottom-up Broadcast Neural Network (BBNN) [1], Independent Recurrent Neural Network (IndRNN) [2] and CRNN in Time and Frequency dimensions (CRNNTF) [3] respectively, almost all of the architectures achieved the highest classification accuracy among the variants of their base deep learning model. Hence, this study holds a comparative analysis of the three most impressive architectural variants of the main deep learning models that are prominently used to classify music genre and presents the three architecture, hence the models (CNN, RNN, and CRNN) in one study. We also propose two ways that can improve the performances of the RNN (IndRNN) and CRNN (CRNN-TF) architectures.