Conditional selection with CNN augmented transformer for multimodal affective analysis

Impact Factor: 8.4 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | CAAI Transactions on Intelligence Technology | Pub Date: 2024-03-22 | DOI: 10.1049/cit2.12320

Jianwen Wang, Shiping Wang, Shunxin Xiao, Renjie Lin, Mianxiong Dong, Wenzhong Guo

Citation: Jianwen Wang, Shiping Wang, Shunxin Xiao, Renjie Lin, Mianxiong Dong, Wenzhong Guo. "Conditional selection with CNN augmented transformer for multimodal affective analysis." CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 917-931, 2024. DOI: 10.1049/cit2.12320

Full text: https://onlinelibrary.wiley.com/doi/10.1049/cit2.12320

PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12320

Citations: 0

Abstract

Attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language and its nonverbal context information. One is to generate sparse attention coefficients associated with acoustic and visual modalities, which helps locate critical emotional semantics. The other is fusing complementary cross-modal representation to construct optimal salient feature combinations of multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross-modal attention. As a result, the located nonverbal features are not only salient but also complementary to sentiment words directly. Experimental results show that the authors’ method achieves state-of-the-art performance on several multimodal affective analysis datasets.
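The two ideas named in the abstract (convolution over nonverbal sequences, then attention conditioned on sentiment words) can be illustrated with a small sketch. This is not the authors' Conditional Transformer Fusion Network: the function names, the fixed smoothing kernel, the toy data, and the choice to use the sentiment-word vector as the attention query are all assumptions made for illustration, and the sparsity of the attention coefficients is not modelled here.

```python
import math

def conv1d_same(seq, kernel):
    """Depthwise 1-D convolution over time with zero padding.

    seq: list of feature vectors (one per time step).
    kernel: odd-length list of weights shared by every feature dimension.
    Returns a sequence of the same length as seq.
    """
    half = len(kernel) // 2
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        row = [0.0] * dim
        for k, w in enumerate(kernel):
            s = t + k - half          # source time step for this kernel tap
            if 0 <= s < len(seq):     # positions outside the sequence count as zeros
                for d in range(dim):
                    row[d] += w * seq[s][d]
        out.append(row)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(condition, nonverbal):
    """Scaled dot-product attention with text-side queries.

    condition: list of query vectors (hypothetical sentiment-word embeddings).
    nonverbal: list of vectors used as both keys and values.
    Returns (attention weights per query, fused vector per query).
    """
    dim = len(condition[0])
    scale = math.sqrt(dim)
    weights, fused = [], []
    for q in condition:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in nonverbal]
        w = softmax(scores)
        weights.append(w)
        fused.append([sum(w[t] * nonverbal[t][d] for t in range(len(nonverbal)))
                      for d in range(dim)])
    return weights, fused

# Toy acoustic sequence: five time steps, two features, one strong frame in the middle.
acoustic = [[0.0, 0.0], [0.1, 0.0], [2.0, 1.0], [0.1, 0.0], [0.0, 0.0]]
# Hypothetical embedding of a single sentiment word, used as the condition.
sentiment_word = [[1.0, 0.5]]

local = conv1d_same(acoustic, [0.25, 0.5, 0.25])
weights, fused = cross_modal_attention(sentiment_word, local)
```

In this toy run the largest attention weight falls on the middle frame, which is the frame whose convolved features align best with the condition vector. A learned model would replace the fixed kernel and raw vectors with trained convolution filters and projection matrices.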


Source journal

CAAI Transactions on Intelligence Technology (Computer Science, Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 3.90%

Articles published: 134

Review time: 35 weeks

About the journal: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.