An Efficient Bi-Modal Fusion Framework for Music Emotion Recognition

IF 9.8 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Affective Computing, vol. 16, no. 2, pp. 999-1015 | Pub Date: 2024-10-24 | DOI: 10.1109/TAFFC.2024.3486340 | https://ieeexplore.ieee.org/document/10735097/
Yao Xiao;Haoxin Ruan;Xujian Zhao;Peiquan Jin;Li Tian;Zihan Wei;Xuebo Cai;Yixin Wang;Liang Liu
{"title":"An Efficient Bi-Modal Fusion Framework for Music Emotion Recognition","authors":"Yao Xiao;Haoxin Ruan;Xujian Zhao;Peiquan Jin;Li Tian;Zihan Wei;Xuebo Cai;Yixin Wang;Liang Liu","doi":"10.1109/TAFFC.2024.3486340","DOIUrl":null,"url":null,"abstract":"Current methods for Music Emotion Recognition (MER) face challenges in effectively extracting features sensitive to emotions, especially those rich in temporal detail. Moreover, the narrow scope of music-related modalities impedes data integration from multiple sources, while including multiple modalities often leads to redundant information, which can degrade performance. To address these issues, we propose a lightweight framework for music emotion recognition that improves the extraction of features that are both sensitive to emotions and rich in temporal information and that integrates data from both audio and MIDI modalities while minimizing redundancy. Our approach involves developing two innovative unimodal encoders to learn embeddings from audio and MIDI-like features. Additionally, we introduce a Bi-modal Fusion Attention Model (BFAM) that integrates features from low-level to high-level semantic information across different modalities. Experimental evaluations on the EMOPIA and VGMIDI datasets show that our unimodal networks achieve accuracies that are 6.1% and 4.4% higher than baseline algorithms for MIDI and audio on the EMOPIA dataset, respectively. Furthermore, our BFAM achieves a 15.2% improvement in accuracy over the baseline, reaching 82.2%, which underscores its effectiveness for bi-modal MER applications.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 2","pages":"999-1015"},"PeriodicalIF":9.8000,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10735097/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Current methods for Music Emotion Recognition (MER) face challenges in effectively extracting features sensitive to emotions, especially those rich in temporal detail. Moreover, the narrow scope of music-related modalities impedes data integration from multiple sources, while including multiple modalities often leads to redundant information, which can degrade performance. To address these issues, we propose a lightweight framework for music emotion recognition that improves the extraction of features that are both sensitive to emotions and rich in temporal information and that integrates data from both audio and MIDI modalities while minimizing redundancy. Our approach involves developing two innovative unimodal encoders to learn embeddings from audio and MIDI-like features. Additionally, we introduce a Bi-modal Fusion Attention Model (BFAM) that integrates features from low-level to high-level semantic information across different modalities. Experimental evaluations on the EMOPIA and VGMIDI datasets show that our unimodal networks achieve accuracies that are 6.1% and 4.4% higher than baseline algorithms for MIDI and audio on the EMOPIA dataset, respectively. Furthermore, our BFAM achieves a 15.2% improvement in accuracy over the baseline, reaching 82.2%, which underscores its effectiveness for bi-modal MER applications.
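The abstract does not describe the internal structure of the unimodal encoders or of BFAM, so the following is only a minimal, hypothetical sketch of multi-level cross-modal attention fusion between audio and MIDI embeddings, written in PyTorch. All module names, dimensions, the number of fusion levels, and the four-class output are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the paper's exact BFAM architecture is not given in the
# abstract, so the layout below (cross-attention per level, mean pooling, linear head)
# is an assumption made for exposition.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """One fusion level: each modality attends to the other via multi-head attention."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.audio_to_midi = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.midi_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_audio = nn.LayerNorm(dim)
        self.norm_midi = nn.LayerNorm(dim)

    def forward(self, audio, midi):
        # audio, midi: (batch, seq_len, dim) embeddings from the unimodal encoders
        a_fused, _ = self.audio_to_midi(audio, midi, midi)   # audio queries, MIDI keys/values
        m_fused, _ = self.midi_to_audio(midi, audio, audio)  # MIDI queries, audio keys/values
        return self.norm_audio(audio + a_fused), self.norm_midi(midi + m_fused)


class BiModalFusionSketch(nn.Module):
    """Stacks several cross-modal fusion levels, pools both streams, and classifies."""

    def __init__(self, dim: int = 256, num_levels: int = 3, num_classes: int = 4):
        super().__init__()
        self.levels = nn.ModuleList([CrossModalAttention(dim) for _ in range(num_levels)])
        # num_classes=4 assumes quadrant-style emotion labels; adjust to the target label space.
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio, midi):
        for level in self.levels:
            audio, midi = level(audio, midi)
        pooled = torch.cat([audio.mean(dim=1), midi.mean(dim=1)], dim=-1)
        return self.classifier(pooled)


if __name__ == "__main__":
    model = BiModalFusionSketch()
    audio_emb = torch.randn(8, 100, 256)  # stand-in for audio-encoder output
    midi_emb = torch.randn(8, 120, 256)   # stand-in for MIDI-encoder output (lengths may differ)
    print(model(audio_emb, midi_emb).shape)  # torch.Size([8, 4])
```

In a BFAM-style system the random tensors would be replaced by outputs of the trained audio and MIDI encoders, and the fused features would more plausibly be drawn from different encoder depths (low-level to high-level) rather than from a single embedding level as in this sketch.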
Source Journal

IEEE Transactions on Affective Computing
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS
CiteScore: 15.00
Self-citation rate: 6.20%
Publication volume: 174
Journal description: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.