AdaFN-AG: Enhancing multimodal interaction with Adaptive Feature Normalization for multimodal sentiment analysis

Weilong Liu, Hua Xu, Yu Hua, Yunxian Chi, Kai Gao

Intelligent Systems with Applications, Volume 23, Article 200410. Published online 22 June 2024. DOI: 10.1016/j.iswa.2024.200410. Available at: https://www.sciencedirect.com/science/article/pii/S266730532400084X

Abstract

In multimodal sentiment analysis, achieving effective fusion among text, acoustic, and visual modalities for improved sentiment prediction is a crucial research topic. Recent studies typically employ tensor-based or attention-based mechanisms for multimodal fusion. However, the former fails to achieve satisfactory prediction performance, and the latter complicates the computation of fusion between non-textual modalities. This paper therefore proposes a multimodal sentiment analysis model based on Adaptive Feature Normalization and an Attention Gating mechanism (AdaFN-AG). First, for the highly synchronized non-textual modalities, we design the Adaptive Feature Normalization (AdaFN) method, which focuses on the interaction of sentiment features rather than the association of temporal features. In AdaFN, acoustic and visual modality features interact across modalities through normalization, inverse normalization, and mix-up operations, with learnable weights adaptively regulating the strength of the cross-modal interaction. In parallel, we design an Attention Gating mechanism that enables cross-modal interaction between textual and non-textual modalities through cross-attention and captures temporal associations, while a gating module regulates the intensity of these interactions. Additionally, we employ self-attention to capture the intrinsic correlations within single-modal features. We conduct experiments on three benchmark datasets for multimodal sentiment analysis; the results indicate that AdaFN-AG outperforms the baselines on the majority of evaluation metrics. These experiments validate that AdaFN-AG improves performance by matching each type of cross-modal interaction with an appropriate method while conserving computational resources, and they confirm the generalization capability of the AdaFN method.
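The abstract describes AdaFN only at a high level (normalization, inverse normalization, mix-up, and adaptive weights), and the paper's code is not reproduced here. The following is a minimal sketch of how such a cross-modal feature-normalization exchange could look, in the spirit of adaptive instance normalization: each modality is normalized by its own time-axis statistics, inverse-normalized with the other modality's statistics, and blended with the original features under a learnable weight. All names (AdaFN, alpha_a, eps) and design details are assumptions for illustration, not the authors' implementation, and both modalities are assumed to share a common feature dimension.

```python
import torch
import torch.nn as nn


class AdaFN(nn.Module):
    """Illustrative cross-modal adaptive feature normalization (sketch)."""

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        # Learnable scalars that adaptively regulate interaction strength.
        self.alpha_a = nn.Parameter(torch.tensor(0.5))
        self.alpha_v = nn.Parameter(torch.tensor(0.5))
        self.eps = eps

    def _stats(self, x: torch.Tensor):
        # Per-sample, per-feature statistics over the time dimension.
        mu = x.mean(dim=1, keepdim=True)
        sigma = x.std(dim=1, keepdim=True) + self.eps
        return mu, sigma

    def forward(self, acoustic: torch.Tensor, visual: torch.Tensor):
        # Inputs: (batch, time, dim) sequences; feature dims must match.
        mu_a, sig_a = self._stats(acoustic)
        mu_v, sig_v = self._stats(visual)
        # Normalize each modality, then inverse-normalize with the other
        # modality's statistics (a "mix-up" of distributional cues).
        a2v = ((acoustic - mu_a) / sig_a) * sig_v + mu_v
        v2a = ((visual - mu_v) / sig_v) * sig_a + mu_a
        # Adaptively blend original and cross-normalized features.
        w_a = torch.sigmoid(self.alpha_a)
        w_v = torch.sigmoid(self.alpha_v)
        return w_a * acoustic + (1 - w_a) * a2v, w_v * visual + (1 - w_v) * v2a
```

Because this exchange operates only on feature statistics, it avoids the pairwise attention maps that the abstract identifies as the costly part of attention-based fusion between non-textual modalities.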
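For the text/non-text path, the abstract pairs cross-attention with a gate that scales the interaction intensity. Below is one plausible reading of that idea: the text sequence attends to a non-textual sequence, and a gate computed from the query stream and the attended result decides, element-wise, how much of the attended signal is admitted. The module structure (concatenation followed by a sigmoid-activated linear layer, residual connection, layer norm) is a hypothetical design choice, not the paper's published architecture.

```python
import torch
import torch.nn as nn


class AttentionGate(nn.Module):
    """Illustrative cross-attention with a gating module (sketch)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate derived from the query stream and the attended result.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, other: torch.Tensor):
        # text: (batch, T_t, dim) queries; other: (batch, T_o, dim) keys/values.
        attended, _ = self.cross_attn(text, other, other)
        g = self.gate(torch.cat([text, attended], dim=-1))
        # The gate regulates the intensity of the cross-modal interaction.
        return self.norm(text + g * attended)
```

Under these assumptions, a fused text representation would be obtained as, e.g., AttentionGate(dim=128)(text_feats, acoustic_feats), with both streams first projected to a shared 128-dimensional space.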
