Sonic Signatures: Sequential Model-driven Music Genre Classification with Mel Spectrograms

Rudresh Pillai, Neha Sharma, Deepak Upadhyay, Sarishma Dangi, Rupesh Gupta
DOI: 10.1109/ICAECT60202.2024.10468856
Published in: 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), pp. 1-6
Publication date: 2024-01-11
Citations: 0

Abstract

Music genres, with their diverse sonic landscapes and distinct characteristics, have long been a subject of interest in audio analysis. This research investigates the application of digital image processing to music genre classification, using Mel spectrogram images derived from audio files. The study employs a sequential deep learning approach to analyze the GTZAN dataset, which consists of 10,000 Mel spectrogram images representing ten distinct music genres. The dataset was systematically partitioned into three segments, allowing thorough training and evaluation of the model: 60% for training, 20% for validation, and 20% for testing. The sequential model, grounded in deep learning principles, effectively captures complex genre-specific characteristics from Mel spectrograms to achieve accurate genre categorization. The model's parameters were refined using 6,000 training images and 2,000 validation images; a subsequent evaluation on a distinct set of 2,000 test images revealed an accuracy of 94%. Throughout the research, accuracy and loss graphs were used to monitor the model's learning progress during training, and examination of the confusion matrix in the testing phase provided further insight into the model's effectiveness, confirming its strength in accurately categorizing music genres. This research contributes substantially to the advancement of autonomous systems capable of accurately classifying music genres from spectrogram representations.
The model's 94% accuracy attests to its effectiveness and suggests applications in recommendation systems, music indexing, and content organization, underscoring its contribution to audio content analysis and classification approaches.
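The 60/20/20 partition described in the abstract (6,000 training, 2,000 validation, and 2,000 test images out of 10,000) can be sketched as follows. This is a minimal illustration using only the Python standard library; the file names and random seed are assumptions for the example, not details taken from the paper.

```python
import random

def partition_dataset(items, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle items and split them into train/validation/test subsets.

    The remaining fraction (here 20%) becomes the test set.
    """
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = items[:]                # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# 10,000 spectrogram image identifiers, matching the paper's dataset size
images = [f"spec_{i:05d}.png" for i in range(10_000)]
train, val, test = partition_dataset(images)
print(len(train), len(val), len(test))  # 6000 2000 2000
```

In practice the split would typically be stratified per genre (1,000 images per class in a balanced ten-genre dataset) so that each subset preserves the class distribution; the sketch above shows only the unstratified ratio.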