SSLMM: Semi-Supervised Learning with Missing Modalities for Multimodal Sentiment Analysis

Information Fusion, Volume 120, Article 103058 · IF 15.5 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Region 1 (Computer Science) · Pub Date: 2025-03-07 · DOI: 10.1016/j.inffus.2025.103058
Yiyu Wang, Haifang Jian, Jian Zhuang, Huimin Guo, Yan Leng
{"title":"SSLMM: Semi-Supervised Learning with Missing Modalities for Multimodal Sentiment Analysis","authors":"Yiyu Wang ,&nbsp;Haifang Jian ,&nbsp;Jian Zhuang ,&nbsp;Huimin Guo ,&nbsp;Yan Leng","doi":"10.1016/j.inffus.2025.103058","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal Sentiment Analysis (MSA) integrates information from text, audio, and visuals to understand human emotions, but real-world applications face two challenges: (1) expensive annotation costs reduce the effectiveness of fully supervised methods, and (2) missing modality severely impact model robustness. While there are studies addressing these issues separately, few focus on solving both within a single framework. In real-world scenarios, these challenges often occur together, necessitating an algorithm that can handle both. To address this, we propose a Semi-Supervised Learning with Missing Modalities (SSLMM) framework. SSLMM combines self-supervised learning, alternating interaction information, semi-supervised learning, and modality reconstruction to tackle label scarcity and modality missing simultaneously. Firstly, SSLMM captures latent structural information through self-supervised pre-training. It then fine-tunes the model using semi-supervised learning and modality reconstruction to reduce dependence on labeled data and improve robustness to modality missing. The framework uses a graph-based architecture with an iterative message propagation mechanism to alternately propagate intra-modal and inter-modal messages, capturing emotional associations within and across modalities. Experiments on CMU-MOSI, CMU-MOSEI, and CH-SIMS demonstrate that under the condition where the proportion of labeled samples and the missing modality rate are both 0.5, SSLMM achieves binary classification (negative vs. positive) accuracies of 80.2%, 81.7%, and 77.1%, respectively, surpassing existing methods.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103058"},"PeriodicalIF":15.5000,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525001319","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Multimodal Sentiment Analysis (MSA) integrates information from text, audio, and visual signals to understand human emotions, but real-world applications face two challenges: (1) expensive annotation costs reduce the effectiveness of fully supervised methods, and (2) missing modalities severely impact model robustness. While prior studies address these issues separately, few solve both within a single framework. In real-world scenarios, these challenges often occur together, necessitating an algorithm that handles both. To address this, we propose a Semi-Supervised Learning with Missing Modalities (SSLMM) framework. SSLMM combines self-supervised learning, alternating interaction information, semi-supervised learning, and modality reconstruction to tackle label scarcity and missing modalities simultaneously. First, SSLMM captures latent structural information through self-supervised pre-training. It then fine-tunes the model with semi-supervised learning and modality reconstruction to reduce dependence on labeled data and improve robustness to missing modalities. The framework uses a graph-based architecture with an iterative message-propagation mechanism that alternately propagates intra-modal and inter-modal messages, capturing emotional associations within and across modalities. Experiments on CMU-MOSI, CMU-MOSEI, and CH-SIMS show that when the proportion of labeled samples and the missing-modality rate are both 0.5, SSLMM achieves binary classification (negative vs. positive) accuracies of 80.2%, 81.7%, and 77.1%, respectively, surpassing existing methods.
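The mechanism sketched in the abstract, alternating intra-modal and inter-modal message propagation over per-modality features, can be made concrete with a short sketch. The module below is a hypothetical reconstruction based only on the abstract: the class name `AlternatingPropagation`, the use of attention as the message function, the feature dimension, and the iteration count are all illustrative assumptions, not the authors' implementation.

```python
# A minimal, hypothetical sketch of the alternating intra-/inter-modal
# message propagation described in the abstract. Names, dimensions, the
# choice of attention as the message function, and the iteration count
# are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn


class AlternatingPropagation(nn.Module):
    """Alternates intra-modal and inter-modal message passing over
    per-modality sequence features (text, audio, visual)."""

    def __init__(self, dim: int = 128, n_iters: int = 3):
        super().__init__()
        self.n_iters = n_iters
        # Intra-modal mixer per modality: self-attention within one modality.
        self.intra = nn.ModuleDict({
            m: nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            for m in ("text", "audio", "visual")
        })
        # Inter-modal mixer: a modality's nodes attend to the other modalities.
        self.inter = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats: dict) -> dict:
        # feats maps modality name -> (batch, seq_len, dim); a missing
        # modality is simply absent from the dict.
        for _ in range(self.n_iters):
            # Intra-modal step: refine each modality with its own context.
            feats = {m: x + self.intra[m](x, x, x, need_weights=False)[0]
                     for m, x in feats.items()}
            # Inter-modal step: attend to the concatenation of the others.
            updated = {}
            for m, x in feats.items():
                others = [v for k, v in feats.items() if k != m]
                if others:  # nothing to fuse if only one modality survives
                    ctx = torch.cat(others, dim=1)
                    x = x + self.inter(x, ctx, ctx, need_weights=False)[0]
                updated[m] = x
            feats = updated
        return feats


# Usage with a missing visual modality:
if __name__ == "__main__":
    x = {"text": torch.randn(2, 10, 128), "audio": torch.randn(2, 20, 128)}
    fused = AlternatingPropagation()(x)
    print({m: tuple(t.shape) for m, t in fused.items()})
```

Because messages flow only between the modalities actually present, the same module accepts both full and partial inputs, which is consistent with the abstract's claim of robustness to missing modalities; a reconstruction head could then be attached to impute an absent modality's features.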
Source Journal
Information Fusion (Engineering & Technology - Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published per year: 161
Review time: 7.9 months
About the Journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.
Latest Articles in This Journal
PCFNet: Period–channel fusion network for multivariate time series forecasting — towards multi-period dependency modeling
Learning Spatio-Temporal Affine Representation Subspace for Video-based Person Re-Identification
From Unimodal to Flexible: A Survey of Generalized Biometric Systems
Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey
Consensus Learning Framework Boosting Co-clustering