Uni-Modal Event-Agnostic Knowledge Distillation for Multimodal Fake News Detection

IF 8.9 | Region 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Knowledge and Data Engineering | Pub Date: 2024-10-10 | DOI: 10.1109/TKDE.2024.3477977

Guofan Liu; Jinghao Zhang; Qiang Liu; Junfei Wu; Shu Wu; Liang Wang

IEEE Transactions on Knowledge and Data Engineering, Volume 36, Issue 12, Pages 9490-9503. Full text: https://ieeexplore.ieee.org/document/10713273/

Citations: 0

Abstract



Uni-Modal Event-Agnostic Knowledge Distillation for Multimodal Fake News Detection

With the rapid expansion of multimodal content on online social media, automatic detection of multimodal fake news has received much attention. The multimodal joint training commonly used in existing methods is expected to benefit from thoroughly leveraging cross-modal features, yet these methods still suffer from insufficient learning of uni-modal features. Due to the heterogeneity of multimodal networks, optimizing a single objective inevitably makes the models prone to rely on a specific modality while leaving other modalities under-optimized. On the other hand, simply expecting each modality to play a significant role in identifying all rumors is also inappropriate, as multimodal fake news often involves tampering in only one modality. Therefore, finding the genuine tampering on a per-sample basis becomes the key to unlocking the full power of each modality in a collaborative manner. To address these issues, we propose a Uni-modal Event-agnostic Knowledge Distillation framework (UEKD), which aims to transfer the knowledge contained in the fine-grained predictions of uni-modal teachers to the multimodal student model through modality-specific distillation. Specifically, we find that uni-modal teachers simply trained on the whole training set tend to memorize event-specific noise and thus make correct but biased predictions, failing to reflect the genuine degree of tampering in each modality. To tackle this problem, we propose to train and validate the teacher models on different domains of the training dataset in a cross-validation manner, as the predictions from out-of-domain teachers can be regarded as event-agnostic knowledge without spurious connections to event-specific information. Finally, to balance the convergence speeds across modalities, we dynamically monitor the involvement of each modality during training, through which we can identify the more under-optimized modalities and re-weight the distillation loss accordingly. Our method can serve as a plug-and-play module for existing multimodal fake news detection backbones. Extensive experiments on three public datasets and four state-of-the-art fake news detection backbones show that our proposed method can improve performance by a large margin.
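The abstract names three ingredients: teachers that predict only on events they were not trained on, a modality-specific distillation term, and a re-weighting of that term toward under-optimized modalities. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the nearest-centroid "teacher" stands in for a real uni-modal network, the student is assumed to expose one prediction per modality, the "involvement" score is taken as given, and the softmax-over-negative-involvement weighting is a hypothetical choice, since the abstract does not state the formula.

```python
import math
from collections import defaultdict

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_centroids(xs, ys):
    """Toy stand-in teacher (assumption): one mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict_proba(centroids, x, classes=(0, 1)):
    """Softmax over negative squared distances to each class centroid.
    Assumes every class in `classes` appeared in the teacher's training fold."""
    scores = []
    for c in classes:
        dist = sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
        scores.append(-dist)
    return softmax(scores)

def out_of_event_teacher_probs(xs, ys, events):
    """Leave-one-event-out: each sample is scored by a teacher
    that was fitted without any sample from that sample's event."""
    probs = [None] * len(xs)
    for held_out in sorted(set(events)):
        train = [i for i, e in enumerate(events) if e != held_out]
        teacher = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        for i, e in enumerate(events):
            if e == held_out:
                probs[i] = predict_proba(teacher, xs[i])
    return probs

def kl_divergence(teacher_p, student_p, eps=1e-12):
    """KL(teacher || student) for two discrete distributions."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher_p, student_p))

def reweighted_distillation_loss(teacher_probs, student_probs, involvement):
    """Weighted sum of per-modality KL terms for one sample.
    teacher_probs / student_probs: dict modality -> class distribution.
    involvement: dict modality -> score, lower meaning more under-optimized.
    Weighting rule (hypothetical): softmax of the negated involvement."""
    names = sorted(teacher_probs)
    weights = softmax([-involvement[m] for m in names])
    loss = sum(w * kl_divergence(teacher_probs[m], student_probs[m])
               for w, m in zip(weights, names))
    return loss, dict(zip(names, weights))

# Usage with made-up data: four samples from two events, two classes.
xs = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0], [0.9, 1.0]]
ys = [0, 1, 0, 1]
events = ["a", "a", "b", "b"]
text_teacher = out_of_event_teacher_probs(xs, ys, events)

loss, weights = reweighted_distillation_loss(
    teacher_probs={"text": text_teacher[0], "image": [0.6, 0.4]},
    student_probs={"text": [0.5, 0.5], "image": [0.5, 0.5]},
    involvement={"text": 0.2, "image": 0.8},
)
print(weights, loss)
```

In this sketch the modality with the lower involvement score ("text") receives the larger distillation weight, which mirrors the abstract's stated goal of giving more attention to under-optimized modalities. In practice the toy teacher would be replaced by a trained uni-modal network and the KL term would typically be combined with the backbone's own classification loss.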


Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months

Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.