{"title":"Resolving multimodal ambiguity via knowledge-injection and ambiguity learning for multimodal sentiment analysis","authors":"Xianbing Zhao , Xuejiao Li , Ronghuan Jiang , Buzhou Tang","doi":"10.1016/j.inffus.2024.102745","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal Sentiment Analysis (MSA) utilizes complementary multimodal features to predict sentiment polarity, which mainly involves language, vision, and audio modalities. Existing multimodal fusion methods primarily consider the complementarity of different modalities, while neglecting the ambiguity caused by conflicts between modalities (i.e. the text modality predicts positive sentiment while the visual modality predicts negative sentiment). To well diminish these conflicts, we develop a novel multimodal ambiguity learning framework, namely RMA, Resolving Multimodal Ambiguity via Knowledge-Injection and Ambiguity Learning for Multimodal Sentiment Analysis. Specifically, We introduce and filter external knowledge to enhance the consistency of cross-modal sentiment polarity prediction. Immediately, we explicitly measure ambiguity and dynamically adjust the impact between the subordinate modalities and the dominant modality to simultaneously consider the complementarity and conflicts of multiple modalities during multimodal fusion. Experiments demonstrate the dominantity of our proposed model across three public multimodal sentiment analysis datasets CMU-MOSI, CMU-MOSEI, and MELD.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"115 ","pages":"Article 102745"},"PeriodicalIF":14.7000,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253524005232","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Multimodal Sentiment Analysis (MSA) exploits complementary multimodal features, mainly from the language, vision, and audio modalities, to predict sentiment polarity. Existing multimodal fusion methods primarily consider the complementarity of different modalities while neglecting the ambiguity caused by conflicts between modalities (e.g., the text modality predicts positive sentiment while the visual modality predicts negative sentiment). To diminish these conflicts, we develop a novel multimodal ambiguity learning framework, RMA (Resolving Multimodal Ambiguity via Knowledge-Injection and Ambiguity Learning for Multimodal Sentiment Analysis). Specifically, we introduce and filter external knowledge to enhance the consistency of cross-modal sentiment polarity prediction. We then explicitly measure ambiguity and dynamically adjust the influence of the subordinate modalities on the dominant modality, so that both the complementarity and the conflicts of multiple modalities are taken into account during multimodal fusion. Experiments on three public multimodal sentiment analysis datasets (CMU-MOSI, CMU-MOSEI, and MELD) demonstrate the superiority of our proposed model.
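To make the ambiguity-learning idea concrete, below is a minimal sketch of ambiguity-gated fusion: per-modality sentiment heads detect polarity conflicts, a divergence between the dominant (text) prediction and each subordinate (vision/audio) prediction serves as an ambiguity score, and that score down-weights the conflicting modality before fusion. This is an illustrative assumption, not the authors' implementation; the class name `AmbiguityGatedFusion`, the symmetric-KL ambiguity measure, and the `exp(-ambiguity)` gate are all hypothetical choices.

```python
# Illustrative sketch only (not the RMA paper's code): ambiguity-gated fusion
# of a dominant text modality with subordinate vision/audio modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AmbiguityGatedFusion(nn.Module):
    def __init__(self, dim: int, num_classes: int = 3):
        super().__init__()
        # Per-modality sentiment heads, used only to detect polarity conflicts.
        self.text_head = nn.Linear(dim, num_classes)
        self.vis_head = nn.Linear(dim, num_classes)
        self.aud_head = nn.Linear(dim, num_classes)
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, h_text, h_vis, h_aud):
        # Unimodal sentiment distributions, shape (batch, num_classes).
        p_t = F.softmax(self.text_head(h_text), dim=-1)
        p_v = F.softmax(self.vis_head(h_vis), dim=-1)
        p_a = F.softmax(self.aud_head(h_aud), dim=-1)
        # Ambiguity as symmetric KL divergence between the dominant and each
        # subordinate prediction: high divergence means conflicting polarity.
        amb_v = 0.5 * (F.kl_div(p_v.log(), p_t, reduction="none").sum(-1)
                       + F.kl_div(p_t.log(), p_v, reduction="none").sum(-1))
        amb_a = 0.5 * (F.kl_div(p_a.log(), p_t, reduction="none").sum(-1)
                       + F.kl_div(p_t.log(), p_a, reduction="none").sum(-1))
        # Gates in (0, 1]: agreeing modalities pass through, conflicting ones
        # are attenuated, so fusion keeps complementarity while limiting conflict.
        w_v = torch.exp(-amb_v).unsqueeze(-1)
        w_a = torch.exp(-amb_a).unsqueeze(-1)
        return self.fuse(torch.cat([h_text, w_v * h_vis, w_a * h_aud], dim=-1))
```

One design note on this sketch: keeping the text features ungated encodes the abstract's dominant/subordinate asymmetry, while the exponential gate makes the adjustment dynamic per sample rather than a fixed modality weighting.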
Journal Introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines that drive its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.