DCCMA-Net: Disentanglement-based cross-modal clues mining and aggregation network for explainable multimodal fake news detection
Siqi Wei, Zheng Wang, Meiling Li, Xuanning Liu, Bin Wu
Information Processing & Management, Volume 62, Issue 4, Article 104089. Published 2025-02-14.
DOI: 10.1016/j.ipm.2025.104089
URL: https://www.sciencedirect.com/science/article/pii/S0306457325000317
Impact factor: 7.4 (JCR Q1, Computer Science, Information Systems)
Citation count: 0
Abstract
Multimodal fake news detection plays an important role in safeguarding social security. Compared with text-only news, multimodal news data contains rich cross-modal clues that can improve detection effectiveness: modality-common semantic enhancement, modality-specific semantic complementation, and modality-specific semantic inconsistency. However, most existing studies do not disentangle modality-specific and modality-common semantics and instead treat them as an entangled whole. Consequently, these studies can explore the interactions between modalities only implicitly, which results in a lack of explainability. To address this, we propose a Disentanglement-based Cross-modal Clues Mining and Aggregation Network for explainable fake news detection, called DCCMA-Net. Specifically, DCCMA-Net decomposes each modality into two distinct representations: a modality-common representation that captures semantics shared across modalities, and a modality-specific representation that captures semantics unique to each modality. Leveraging these disentangled representations, DCCMA-Net then explicitly and comprehensively mines the three cross-modal clues: modality-common semantic enhancement, modality-specific semantic complementation, and modality-specific semantic inconsistency. Since not all clues contribute equally to the decision, DCCMA-Net introduces an adaptive attention aggregation module that assigns a contribution weight to each clue. Finally, DCCMA-Net aggregates the clues according to their contribution weights to obtain highly discriminative news representations for detection, and highlights the most contributive clues as explanations for the detection results. Extensive experiments demonstrate that DCCMA-Net outperforms existing methods, achieving detection accuracy improvements of 2.53%, 4.01%, and 3.99% on the Weibo, PHEME, and Gossipcop datasets, respectively. Moreover, the explainability accuracy of DCCMA-Net exceeds that of current state-of-the-art methods on the Weibo dataset.
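The abstract does not give the form of the adaptive attention aggregation module, so the following is only a minimal sketch of the general idea (attention-weighted fusion of three clue vectors, with the highest-weighted clue reported as the explanation), not the authors' implementation. The dot-product scoring, the `query` vector, the clue names, and the toy values are all assumptions introduced for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_clues(clues, query):
    """Fuse clue vectors with attention weights.

    clues: dict mapping a clue name to a feature vector (list of floats).
    query: hypothetical scoring vector (would be learned in a real model).
    Returns the fused vector, the per-clue weights, and the top clue name.
    """
    names = list(clues)
    # Assumed scoring rule: dot product between each clue and the query.
    scores = [sum(c * q for c, q in zip(clues[n], query)) for n in names]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the clue vectors gives the fused news representation.
    fused = [sum(w * clues[n][i] for w, n in zip(weights, names))
             for i in range(dim)]
    # The clue with the largest weight serves as the explanation.
    top = max(zip(weights, names))[1]
    return fused, dict(zip(names, weights)), top

# Toy values, invented for illustration only.
clues = {
    "common_enhancement": [0.2, 0.1],
    "specific_complementation": [0.0, 0.3],
    "specific_inconsistency": [1.0, 0.9],
}
query = [1.0, 1.0]

fused, weights, top = aggregate_clues(clues, query)
print(top, weights)
```

In this toy input the inconsistency clue has the largest score, so it receives the largest weight and is the one surfaced as the explanation; a trained model would learn the scoring parameters rather than fix them by hand.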
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.