Music source feature extraction based on improved attention mechanism and phase feature

Weina Yu
{"title":"Music source feature extraction based on improved attention mechanism and phase feature","authors":"Weina Yu","doi":"10.1016/j.sasc.2024.200149","DOIUrl":null,"url":null,"abstract":"<div><div>Music source feature extraction is an important research direction in music information retrieval and music recommendation system. To extract the features of music sources more effectively, the study introduces the jump attention mechanism and combines it with the convolutional attention module. Also, a feature extraction module based on Unet + + and spatial attention module is proposed. In addition, the phase feature information of the mixed music signals is utilized to improve the network performance. Results showed that this model was studied to perform well in music source separation experiments of vocals and accompaniment. For vocal separation on the MIR-1K dataset, the model achieves 11.25 dB, 17.34 dB, and 13.83 dB for each metric, respectively. Meanwhile, for drum separation on the DSD100 dataset, the model achieves a median signal-to-source distortion ratio of 4.36 dB, which is 2.91 dB better than that of the Spectral Hierarchical Network model. For the separation of the bass sound and the human voice, the model's in the separation of bass and human voice, the median distortion ratio of the model is as high as 4.87 dB and 6.09 dB, which is better than that of the Spectral Hierarchical Network model. 
This indicates the significant performance advantages in feature extraction and separation of music sources, and it has important application values in music production and speech recognition.</div></div>","PeriodicalId":101205,"journal":{"name":"Systems and Soft Computing","volume":"6 ","pages":"Article 200149"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Systems and Soft Computing","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772941924000784","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Music source feature extraction is an important research direction in music information retrieval and music recommendation systems. To extract the features of music sources more effectively, this study introduces a jump attention mechanism and combines it with the convolutional attention module. A feature extraction module based on Unet++ and a spatial attention module is also proposed. In addition, the phase information of the mixed music signal is used to improve network performance. Results showed that the model performs well in music source separation experiments on vocals and accompaniment. For vocal separation on the MIR-1K dataset, the model achieves 11.25 dB, 17.34 dB, and 13.83 dB on the three evaluation metrics, respectively. For drum separation on the DSD100 dataset, the model achieves a median source-to-distortion ratio of 4.36 dB, which is 2.91 dB better than that of the Spectral Hierarchical Network model. For the separation of bass and vocals, the model's median distortion ratios reach 4.87 dB and 6.09 dB respectively, again exceeding the Spectral Hierarchical Network model. These results indicate significant performance advantages in the feature extraction and separation of music sources, with important application value in music production and speech recognition.
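The abstract does not give implementation details for how the "phase feature information of the mixed music signals" is obtained. As a minimal, hypothetical sketch (the frame length, hop size, and window choice below are illustrative assumptions, not values from the paper), the phase feature is typically the angle of the complex short-time Fourier transform of the mixture, computed alongside the usual magnitude spectrogram:

```python
import numpy as np

def stft_phase_features(signal, frame_len=1024, hop=256):
    """Split a mixed signal into per-frame magnitude and phase spectra.

    Illustrative sketch only: the paper reports using phase information
    of the mixed signal; this shows how magnitude and phase are obtained
    from a short-time Fourier transform with a Hann window.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)   # complex spectrogram
    magnitude = np.abs(spectrum)             # feature used by most models
    phase = np.angle(spectrum)               # the additional phase feature
    return magnitude, phase

# The complex spectrum is recoverable from the two features:
#   spectrum == magnitude * exp(1j * phase)
```

Because magnitude and phase together reconstruct the complex spectrum exactly, a network that consumes both (rather than magnitude alone) has access to the full information in the mixture, which is the motivation the abstract gives for adding the phase feature.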