{"title":"Sonic Signatures: Sequential Model-driven Music Genre Classification with Mel Spectograms","authors":"Rudresh Pillai, Neha Sharma, Deepak Upadhyay, Sarishma Dangi, Rupesh Gupta","doi":"10.1109/ICAECT60202.2024.10468856","DOIUrl":null,"url":null,"abstract":"Music genres, with their diverse sonic landscapes and distinct characteristics, have been a subject of profound interest in audio analysis. This research investigates the application of digital image processing in the field of music genre classification, utilizing Mel spectrogram images obtained from audio files. This study employs a sequential approach to analyze the 'GTZAN Dataset,' which consists of 10,000 documented Mel spectrogram images that represent ten distinct music genres. The dataset was partitioned in a systematic manner into three separate segments. This partitioning allowed for thorough training and evaluation of the model, with a distribution ratio of 60% for training, 20% for validation, and 20% for testing. The sequential model, which is based on deep learning tenets effectively captures complex genre-specific characteristics from Mel spectrograms in order to achieve accurate music genre categorization. By utilizing a dataset consisting of 6,000 training photos and 2,000 validation photos, the model's parameters underwent refinement. Subsequently, an evaluation was conducted on a distinct set of 2,000 test photographs, which unveiled a remarkable accuracy rate of 94%. During the course of the research, performance metrics such as accuracy and loss graphs were employed to monitor the learning progress of the model during the training phase. Moreover, the examination of the confusion matrix in the testing phase provided insight into the effectiveness of the model, resulting in notable performance measurements. This confirms the model's strength in accurately categorizing music genres. 
This research makes a substantial contribution towards the advancement of autonomous systems that possess the ability to accurately classify music genres by utilizing spectrogram representations. The model's accuracy of 94% serves as evidence of its effectiveness, indicating its possible applications in systems for recommendations, music indexing, and content organization. This emphasizes its significant contribution to the field of audio content analysis and classification approaches.","PeriodicalId":518900,"journal":{"name":"2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)","volume":"44 2","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAECT60202.2024.10468856","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Music genres, with their diverse sonic landscapes and distinct characteristics, have long been a subject of interest in audio analysis. This research investigates the application of digital image processing to music genre classification, using Mel spectrogram images derived from audio files. The study applies a sequential deep-learning model to the GTZAN dataset, comprising 10,000 Mel spectrogram images representing ten distinct music genres. The dataset was systematically partitioned into three segments, with 60% for training, 20% for validation, and 20% for testing, allowing thorough training and evaluation of the model. The sequential model, grounded in deep-learning principles, effectively captures complex genre-specific characteristics from Mel spectrograms to achieve accurate genre categorization. The model's parameters were refined on 6,000 training images and 2,000 validation images; evaluation on a distinct set of 2,000 test images yielded an accuracy of 94%. Accuracy and loss curves were used to monitor learning during training, and the confusion matrix computed in the testing phase provided further insight into the model's effectiveness, confirming its strength in accurately categorizing music genres. This research contributes to the advancement of autonomous systems capable of accurately classifying music genres from spectrogram representations.
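The 60/20/20 partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, shuffling strategy, and seed are assumptions made for the example.

```python
import random

def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle and partition items into train / validation / test splits.

    Mirrors the 60/20/20 split used in the paper; the seed and
    shuffling approach here are illustrative choices.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# With 10,000 spectrogram images this yields 6,000 / 2,000 / 2,000,
# matching the training, validation, and test set sizes reported.
train_set, val_set, test_set = split_dataset(range(10_000))
```

Shuffling before splitting ensures each partition draws from all ten genres rather than from contiguous blocks of the dataset.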
The model's 94% accuracy attests to its effectiveness and points to applications in recommendation systems, music indexing, and content organization, underscoring its contribution to audio content analysis and classification.
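The confusion-matrix evaluation mentioned in the abstract can be illustrated with a small pure-Python sketch. The helper names and the toy labels below are hypothetical, chosen only to show how the matrix relates to the reported accuracy metric.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a confusion matrix: rows are true genres, columns are predictions."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def accuracy(matrix):
    """Accuracy = sum of the diagonal (correct predictions) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy example with 3 of the 10 genre labels (0-2):
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
m = confusion_matrix(y_true, y_pred, 3)
print(accuracy(m))  # 4 of 6 correct -> ~0.667
```

Off-diagonal entries reveal which genre pairs the model confuses, which is the kind of insight the paper draws from its testing-phase confusion matrix.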