A Low-Rank Matching Attention Based Cross-Modal Feature Fusion Method for Conversational Emotion Recognition

IF 9.8 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IEEE Transactions on Affective Computing Pub Date : 2024-11-14 DOI:10.1109/TAFFC.2024.3498443

Yuntao Shou;Huan Liu;Xiangyong Cao;Deyu Meng;Bo Dong

{"title":"A Low-Rank Matching Attention Based Cross-Modal Feature Fusion Method for Conversational Emotion Recognition","authors":"Yuntao Shou;Huan Liu;Xiangyong Cao;Deyu Meng;Bo Dong","doi":"10.1109/TAFFC.2024.3498443","DOIUrl":null,"url":null,"abstract":"Conversational emotion recognition (CER) is an important research topic in human-computer interactions. Although recent advancements in transformer-based cross-modal fusion methods have shown promise in CER tasks, they tend to overlook the crucial intra-modal and inter-modal emotional interaction or suffer from high computational complexity. To address this, we introduce a novel and lightweight cross-modal feature fusion method called Low-Rank Matching Attention Method (LMAM). LMAM effectively captures contextual emotional semantic information in conversations while mitigating the quadratic complexity issue caused by the self-attention mechanism. Specifically, by setting a matching weight and calculating inter-modal features attention scores row by row, LMAM requires only one-third of the parameters of self-attention methods. We also employ the low-rank decomposition method on the weights to further reduce the number of parameters in LMAM. As a result, LMAM offers a lightweight model while avoiding overfitting problems caused by a large number of parameters. Moreover, LMAM is able to fully exploit the intra-modal emotional contextual information within each modality and integrates complementary emotional semantic information across modalities by computing and fusing similarities of intra-modal and inter-modal features simultaneously. Experimental results verify the superiority of LMAM compared with other popular cross-modal fusion methods on the premise of being more lightweight. Also, LMAM can be embedded into any existing state-of-the-art CER methods in a plug-and-play manner, and can be applied to other multi-modal recognition tasks, e.g., session recommendation and humour detection, demonstrating its remarkable generalization ability.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 2","pages":"1177-1189"},"PeriodicalIF":9.8000,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10753050/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Conversational emotion recognition (CER) is an important research topic in human-computer interactions. Although recent advancements in transformer-based cross-modal fusion methods have shown promise in CER tasks, they tend to overlook the crucial intra-modal and inter-modal emotional interaction or suffer from high computational complexity. To address this, we introduce a novel and lightweight cross-modal feature fusion method called Low-Rank Matching Attention Method (LMAM). LMAM effectively captures contextual emotional semantic information in conversations while mitigating the quadratic complexity issue caused by the self-attention mechanism. Specifically, by setting a matching weight and calculating inter-modal features attention scores row by row, LMAM requires only one-third of the parameters of self-attention methods. We also employ the low-rank decomposition method on the weights to further reduce the number of parameters in LMAM. As a result, LMAM offers a lightweight model while avoiding overfitting problems caused by a large number of parameters. Moreover, LMAM is able to fully exploit the intra-modal emotional contextual information within each modality and integrates complementary emotional semantic information across modalities by computing and fusing similarities of intra-modal and inter-modal features simultaneously. Experimental results verify the superiority of LMAM compared with other popular cross-modal fusion methods on the premise of being more lightweight. Also, LMAM can be embedded into any existing state-of-the-art CER methods in a plug-and-play manner, and can be applied to other multi-modal recognition tasks, e.g., session recommendation and humour detection, demonstrating its remarkable generalization ability.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于低库匹配注意力的对话情绪识别跨模态特征融合方法

会话情感识别（CER）是人机交互领域的一个重要研究课题。尽管最近基于变压器的跨模态融合方法在CER任务中显示出了希望，但它们往往忽略了关键的模态内和模态间的情感交互，或者计算复杂性高。为了解决这个问题，我们引入了一种新颖的轻量级跨模态特征融合方法，称为低秩匹配注意方法（lam）。LMAM有效地捕获了会话中的语境情感语义信息，同时减轻了自注意机制引起的二次复杂度问题。具体来说，通过设置匹配权值，逐行计算多式联运特征注意得分，LMAM只需要自注意方法三分之一的参数。我们还在权重上采用了低秩分解方法，进一步减少了lmm中的参数数量。因此，lmm提供了一个轻量级的模型，同时避免了由大量参数引起的过拟合问题。此外，lam能够充分利用每个模态内的情感语境信息，并通过同时计算和融合模态内和模态间特征的相似性来整合跨模态的互补情感语义信息。实验结果验证了lmm在更轻量化的前提下与其他流行的跨模态融合方法相比的优越性。此外，lmm可以以即插即用的方式嵌入到任何现有的最先进的CER方法中，并且可以应用于其他多模态识别任务，例如会话推荐和幽默检测，显示出其显著的泛化能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Affective Computing COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, CYBERNETICS

CiteScore

15.00

自引率

6.20%

发文量

174

期刊介绍： The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.