Sonic Signatures: Sequential Model-driven Music Genre Classification with Mel Spectrograms

Rudresh Pillai, Neha Sharma, Deepak Upadhyay, Sarishma Dangi, Rupesh Gupta
DOI: 10.1109/ICAECT60202.2024.10468856
Published in: 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), pp. 1-6
Publication date: 2024-01-11
Citations: 0

Abstract

Music genres, with their diverse sonic landscapes and distinct characteristics, have long been a subject of interest in audio analysis. This research investigates the application of digital image processing to music genre classification, using Mel spectrogram images derived from audio files. The study employs a sequential deep learning approach to analyze the GTZAN dataset, which consists of 10,000 Mel spectrogram images representing ten distinct music genres. The dataset was systematically partitioned into three segments, allowing thorough training and evaluation of the model: 60% for training, 20% for validation, and 20% for testing. The sequential model, grounded in deep learning principles, effectively captures complex genre-specific characteristics from Mel spectrograms to achieve accurate genre categorization. The model's parameters were refined using 6,000 training images and 2,000 validation images; a subsequent evaluation on a distinct set of 2,000 test images revealed an accuracy of 94%. Throughout the research, accuracy and loss graphs were used to monitor the model's learning progress during training, and examination of the confusion matrix in the testing phase provided further insight into the model's effectiveness, confirming its strength in accurately categorizing music genres. This research contributes substantially to the advancement of autonomous systems capable of accurately classifying music genres from spectrogram representations.
The model's 94% accuracy attests to its effectiveness and suggests applications in recommendation systems, music indexing, and content organization, underscoring its contribution to audio content analysis and classification approaches.
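The 60/20/20 partition described in the abstract (6,000 training, 2,000 validation, and 2,000 test images out of 10,000) can be sketched as follows. This is a minimal illustration using only the Python standard library; the file names and random seed are assumptions for the example, not details taken from the paper.

```python
import random

def partition_dataset(items, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle items and split them into train/validation/test subsets.

    The remaining fraction (here 20%) becomes the test set.
    """
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = items[:]                # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# 10,000 spectrogram image identifiers, matching the paper's dataset size
images = [f"spec_{i:05d}.png" for i in range(10_000)]
train, val, test = partition_dataset(images)
print(len(train), len(val), len(test))  # 6000 2000 2000
```

In practice the split would typically be stratified per genre (1,000 images per class in a balanced ten-genre dataset) so that each subset preserves the class distribution; the sketch above shows only the unstratified ratio.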