Deep Audio Embeddings and Attention Based Music Emotion Recognition

S. Gupta
Published in: 2023 15th International Conference on Developments in eSystems Engineering (DeSE)
Publication date: 2023-01-09
DOI: 10.1109/DeSE58274.2023.10100058 (https://doi.org/10.1109/DeSE58274.2023.10100058)
Citations: 1

Abstract

Emotion is an intricate quality of music that is extremely hard to capture, even with refined feature engineering techniques. The emotion of a song is an important feature that can be used for various music information retrieval (MIR) tasks such as recommendation systems, music therapy, and automatic playlist generation. In this research, we investigate the application of L3-Net deep audio embeddings with an attention-based deep neural network model that uses positional encoding to recognize musical emotions. In addition, we constructed a master dataset from the 4Q audio emotion dataset and the Bi-modal emotion dataset, and it serves as the main dataset in this research. The L3-Net deep audio embeddings are used as the input features of the neural network model, which requires no feature engineering or other audio-based features. We propose two attention-based neural network models, one with and one without recurrent layers. The positional encoding mechanism helps the ACNN model learn the sequential information in the audio embeddings without any recurrent layers. We conclude that the ACNN model performs better than the other models, with an F1-score of 0.79 using the AdamP optimizer.
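The abstract does not say which positional encoding the models use. As a minimal sketch, assuming the common sinusoidal scheme from Transformer models, the code below adds position codes to a sequence of frame-level embeddings so that a model without recurrent layers can still tell frames apart by their order. The function names, the toy 6-dimensional frames, and the constant 10000 are illustrative assumptions, not details taken from the paper; real L3-Net embeddings are much wider.

```python
import math


def sinusoidal_positional_encoding(num_frames, dim):
    """Return a num_frames x dim table of sinusoidal position codes."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_frames):
        row = []
        # Each even index i shares one frequency with index i + 1.
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table


def add_positions(embeddings):
    """Add position codes element-wise to a sequence of frame embeddings."""
    num_frames = len(embeddings)
    dim = len(embeddings[0])
    codes = sinusoidal_positional_encoding(num_frames, dim)
    return [
        [value + code for value, code in zip(frame, code_row)]
        for frame, code_row in zip(embeddings, codes)
    ]


# Hypothetical stand-in for embedding output: 4 frames of 6 zeros each.
frames = [[0.0] * 6 for _ in range(4)]
encoded = add_positions(frames)
```

Because the toy frames are all zeros, `encoded` holds the position codes themselves, so identical frames at different time steps become distinguishable before they reach an attention layer.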