Multimodal Sentiment Analysis With Mutual Information-Based Disentangled Representation Learning

Authors: Hao Sun; Ziwei Niu; Hongyi Wang; Xinyao Yu; Jiaqing Liu; Yen-Wei Chen; Lanfen Lin
Journal: IEEE Transactions on Affective Computing, vol. 16, no. 3, pp. 1606-1617
DOI: 10.1109/TAFFC.2025.3529732
Published: 2025-01-15
URL: https://ieeexplore.ieee.org/document/10842969/
Citations: 0
Abstract
Multimodal sentiment analysis seeks to utilize various types of signals to identify underlying emotions and sentiments. A key challenge in this field lies in multimodal representation learning, which aims to develop effective methods for integrating multimodal features into cohesive representations. Recent advancements include two notable approaches: one focuses on decomposing multimodal features into modality-invariant and -specific components, while the other emphasizes the use of mutual information to enhance the fusion of modalities. Both strategies have demonstrated effectiveness and yielded remarkable results. In this paper, we propose a novel learning framework that combines the strengths of these two approaches, termed mutual information-based disentangled multimodal representation learning. Our approach involves estimating different types of information during feature extraction and fusion stages. Specifically, we quantitatively assess and adjust the proportions of modality-invariant, -specific, and -complementary information during feature extraction. Subsequently, during fusion, we evaluate the amount of information retained by each modality in the fused representation. We employ mutual information or conditional mutual information to estimate each type of information content. By reconciling the proportions of these different types of information, our approach achieves state-of-the-art performance on popular sentiment analysis benchmarks, including CMU-MOSI and CMU-MOSEI.
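The paper itself does not include code, but the core idea in the abstract, splitting each modality into invariant and specific parts and balancing them with mutual-information estimates, can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical illustration only: the encoder architecture, the InfoNCE-style estimator, the loss weights, and the function names (infonce_lower_bound, DisentangledEncoder, disentanglement_losses) are all assumptions for exposition, not the authors' implementation, which may instead use conditional-MI estimators such as CLUB for the terms being minimized.

```python
# Hypothetical sketch of MI-based disentanglement for two modalities.
# NOT the paper's implementation; sizes, estimator, and weights are illustrative.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def infonce_lower_bound(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """InfoNCE lower bound on I(z_a; z_b) for a batch of paired samples.

    Maximizing this bound pulls paired (z_a, z_b) together and pushes unpaired
    samples apart, increasing the estimated mutual information between views.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / 0.1                      # similarity matrix, temperature 0.1
    labels = torch.arange(z_a.size(0), device=z_a.device)
    # log N minus cross-entropy is (up to constants) the InfoNCE MI estimate.
    return math.log(z_a.size(0)) - F.cross_entropy(logits, labels)


class DisentangledEncoder(nn.Module):
    """Splits one modality's features into invariant and specific parts."""

    def __init__(self, in_dim: int, hid_dim: int = 128):
        super().__init__()
        self.invariant = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                       nn.Linear(hid_dim, hid_dim))
        self.specific = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                      nn.Linear(hid_dim, hid_dim))

    def forward(self, x):
        return self.invariant(x), self.specific(x)


def disentanglement_losses(text_feat, audio_feat, enc_t, enc_a,
                           w_inv=1.0, w_spec=0.1):
    """Combine MI terms in the spirit of the abstract, with assumed weights.

    - Invariant term: maximize the estimated MI between the two modalities'
      invariant representations, so the InfoNCE bound enters with a minus sign.
    - Specific term: discourage each modality's specific representation from
      duplicating its invariant one. The paper would bound this with a
      (conditional) MI estimator; here a cross-subspace InfoNCE penalty is
      only a rough stand-in.
    """
    t_inv, t_spec = enc_t(text_feat)
    a_inv, a_spec = enc_a(audio_feat)

    loss_invariant = -infonce_lower_bound(t_inv, a_inv)
    loss_specific = (infonce_lower_bound(t_inv, t_spec)
                     + infonce_lower_bound(a_inv, a_spec))
    return w_inv * loss_invariant + w_spec * loss_specific


if __name__ == "__main__":
    enc_t, enc_a = DisentangledEncoder(300), DisentangledEncoder(74)
    text = torch.randn(32, 300)    # e.g. pooled word embeddings
    audio = torch.randn(32, 74)    # e.g. acoustic descriptors
    loss = disentanglement_losses(text, audio, enc_t, enc_a)
    loss.backward()
    print(float(loss))
```

In practice these disentanglement terms would be added to the task loss (sentiment regression on CMU-MOSI/MOSEI), and the fusion stage would similarly weigh how much information each modality contributes to the fused representation; those details are not specified in the abstract and are omitted here.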
About the journal:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also covers how techniques for sensing and simulating affect can deepen our understanding of human emotions and processes, as well as the design, implementation, and evaluation of systems whose usability takes affect into account. Surveys of existing work that offer new perspectives on the historical and future directions of the field are also welcome.