Music genre classification using convolution temporal pooling network

IF 3 | Zone 4 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Multimedia Tools and Applications | Pub Date: 2024-09-02 | DOI: 10.1007/s11042-024-20163-5
Vijayameenakshi T. M, Swapna T. R
Citations: 0

Abstract



Music genre classification is one of the most interesting topics in digital music. Genre classification is inherently subjective: different listeners may perceive genres differently, and some songs are hard to classify accurately because they belong to several genres. Genres are very broad, ill-defined categories, so genre-based measures are inherently coarse and inaccurate, and not every piece of music fits cleanly into a single genre. Many deep-neural-network papers perform sound recognition and classification on image representations of audio, which do not affect the time–frequency representation of the signal. The traditional approach applies waveform augmentation to the audio signal, thereby increasing the network's training speed. This paper considers music genre classification with the convolution temporal pooling framework and explores the impact of using the SpecAugment method to augment the spectrogram itself. The augmented spectrogram is then fed into a convolutional temporal pooling network, whose temporal and pooling layers identify genre patterns and classify songs by genre. The model also predicts the duplication that occurs in a given sample. We apply the model to GTZAN, a widely used dataset for music genre classification. The method improves the identification of Rock and Pop songs and eliminates the replication of songs. The trained model reports an accuracy of 0.75 when trained on 30-s audio files.
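
The abstract says the spectrogram itself is augmented with SpecAugment but gives no parameters. As a rough illustration only, the frequency- and time-masking part of a SpecAugment-style augmentation might look like the NumPy sketch below. The function name `spec_augment`, the default mask widths, and the choice to fill masked bands with the spectrogram mean are all assumptions made for this example, not details taken from the paper, and the time-warping step of SpecAugment is left out.

```python
import numpy as np


def spec_augment(spec, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Return a masked copy of `spec` (shape: freq_bins x time_frames).

    Hypothetical helper: mask counts, widths, and the mean fill value
    are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)  # leave the input untouched
    n_freq, n_time = out.shape
    fill = out.mean()

    # Frequency masking: blank out a random band of consecutive frequency bins.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), n_freq)
        start = int(rng.integers(0, n_freq - width + 1))
        out[start:start + width, :] = fill

    # Time masking: blank out a random run of consecutive time frames.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), n_time)
        start = int(rng.integers(0, n_time - width + 1))
        out[:, start:start + width] = fill

    return out


# Usage on a random stand-in array (not real audio features).
rng = np.random.default_rng(0)
mel = rng.random((128, 1292))
augmented = spec_augment(mel, rng=rng)
```

Because the masks are drawn afresh on every call, the same clip yields a different masked spectrogram each time it is seen during training, which is the general idea behind augmenting the spectrogram rather than the waveform.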

Source journal
Multimedia Tools and Applications (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 7.20
Self-citation rate: 16.70%
Articles published: 2439
Review time: 9.2 months
Journal description: Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists and engineers who are involved in multimedia system research, design and applications. All papers are peer reviewed. Specific areas of interest include multimedia tools, multimedia applications, and prototype multimedia systems and platforms.